U.S. patent application number 17/455220, for a memory management device, was published by the patent office on 2022-06-09.
The applicant listed for this patent is Lantiq Beteiligungs-GmbH & Co. KG. The invention is credited to Ritesh BANERJEE, Jiaxiang SHI, and Ingo VOLKENING.
United States Patent Application 20220179792
Kind Code: A1
Application Number: 17/455220
Family ID: 1000006156892
Publication Date: June 9, 2022
BANERJEE, Ritesh; et al.
MEMORY MANAGEMENT DEVICE
Abstract
Memory modules and associated devices and methods are provided
using a memory copy function between a cache memory and a main
memory that may be implemented in hardware. Address translation may
additionally be provided.
Inventors: BANERJEE, Ritesh (Bangalore, IN); SHI, Jiaxiang (Singapore, SG); VOLKENING, Ingo (Singapore, SG)
Applicant: Lantiq Beteiligungs-GmbH & Co. KG, Neubiberg, DE
Family ID: 1000006156892
Appl. No.: 17/455220
Filed: November 17, 2021
Related U.S. Patent Documents

Application Number    Filing Date     Patent Number
15/527,138            May 16, 2017
PCT/EP2015/077507     Nov 24, 2015
17/455,220
Current U.S. Class: 1/1
Current CPC Class: G06F 12/08 20130101; G06F 12/0811 20130101; G06F 2212/1024 20130101; G06F 12/0835 20130101; G06F 12/1081 20130101
International Class: G06F 12/0831 20060101; G06F 12/0811 20060101; G06F 12/1081 20060101

Foreign Application Data

Date           Code    Application Number
Nov 25, 2014   SG      10201407795P
Claims
1. A memory module for a computing device, the memory module
comprising: a main memory, at least one cache memory, a memory copy
device being connected with the main memory and with the cache
memory, wherein the memory copy device comprises at least one
Direct Memory Access (DMA) port for accessing data in the main
memory and data in the cache memory, the memory copy device further
comprising a cached access module device for reading and writing
data between the cache memory and the main memory via the DMA port
for maintaining data integrity and coherence between the cache
memory and the main memory, and an interconnect module coupled to
the DMA port, the main memory and the at least one cache memory,
wherein the interconnect module is configured to present a request
received from the DMA port to either a port of the main memory or a
port coupled to the cache memory based on an address range of a
memory address of the request.
2. The memory module of claim 1, wherein the memory copy device
further comprises: an address translation device for translating
between a memory physical address and a memory virtual address.
3. The memory module of claim 1, wherein the memory copy device is
implemented in hardware.
4. The memory module of claim 1, wherein the memory copy device is
configured to perform the reading and writing without having to use
a processor external to the memory copy device.
5. A computing device comprising: at least one processing core
module; and a memory module comprising: a main memory, at least one
cache memory, a memory copy device being connected with the main
memory and with the cache memory, wherein the memory copy device
comprises at least one Direct Memory Access (DMA) port for
accessing data in the main memory and data in the cache memory, the
memory copy device further comprising a cached access module device
for reading and writing data between the cache memory and the main
memory via the DMA port for maintaining data integrity and
coherence between the cache memory and the main memory, and an
interconnect module coupled to the DMA port, the main memory and
the at least one cache memory, wherein the interconnect module is
configured to present a request received from the DMA port to
either a port of the main memory or a port coupled to the cache
memory based on an address range of a memory address of the
request, wherein the processing core module is configured to store
data in the memory module and to read data from the memory
module.
6. The computing device of claim 5 wherein the memory copy device
further comprises: an address translation device for translating
between a memory physical address and a memory virtual address.
7. The computing device of claim 5, wherein the memory copy device
is implemented in hardware.
8. The computing device of claim 5, wherein the memory copy device
is configured to perform the reading and writing without having to
use a processor external to the memory copy device.
9. The computing device of claim 5, wherein the memory copy device
of the memory module is configured to perform memory copy
operations between the main memory and the cache memory of the
memory module without using the processing core module.
10. The computing device of claim 5, further comprising a network
module to operate in a network.
11. The computing device of claim 10, wherein the network module
comprises one or more devices selected from a group consisting of a
router device, a gateway device, and a Network Attached Storage
(NAS) device.
12. A method of operating a memory module comprising: translating,
in a hardware device, between a memory physical address and a memory
virtual address; and, in the hardware device, reading and writing
data between a cache memory and a main memory via Direct Memory
Access in order to maintain data integrity and coherence between
the cache memory and the main memory, wherein the memory module is
a memory module comprising: a main memory, at least one cache
memory, a memory copy device being connected with the main memory
and with the cache memory, wherein the memory copy device comprises
at least one Direct Memory Access (DMA) port for accessing data in
the main memory and data in the cache memory, the memory copy
device further comprising a cached access module device for reading
and writing data between the cache memory and the main memory via
the DMA port, and an interconnect module coupled to the DMA port,
the main memory and the at least one cache memory, wherein the
interconnect module is configured to present a request received
from the DMA port to either a port of the main memory or a port
coupled to the cache memory based on an address range of a memory
address of the request, and the hardware device is the memory copy
device of the memory module.
13. The method of claim 12, wherein the reading and writing is
performed at least in part using the translating.
Description
REFERENCE TO RELATED APPLICATION
[0001] This application is a continuation of U.S. patent
application Ser. No. 15/527,138 filed on May 16, 2017, which is a
National Phase entrance of PCT Application No. PCT/EP2015/077507
filed on Nov. 24, 2015, which claims priority to Singapore Patent
Application No. 10201407795P filed on Nov. 25, 2014, the contents
of which are incorporated herein by reference in their entireties as
if fully set forth.
TECHNICAL FIELD
[0002] The application relates to memory management devices and to
corresponding methods.
BACKGROUND
[0003] Memories are generally used to store data. Memory access
times, together with other parameters like processor speed, may
determine a speed of operation of a system. For example, in
conventional systems processors may not be able to perform other
tasks, e.g. executing instructions, while accessing a memory.
Therefore, memory management approaches have been developed to
enable efficient memory access and/or efficient handling of data
stored in a memory.
[0004] US 20130057562 A1 shows techniques for accessing graphics
data stored in a memory. In some implementations, the techniques
may include performing data modification tasks independently from
an execution of one or more programs for example on a graphics
processing unit (GPU), which is an example for a processor. For
instance, the techniques of this document may allow the GPU to
separate tasks related to memory management and data modification
from tasks related to data processing, and allow for independent
execution of these tasks. As one example, techniques described in
this document may allow the GPU to retrieve data from memory or
store data in memory, and in parallel with performing such
functions, execute instructions of one or more programs. As another
example, techniques described in this patent may allow the GPU to
convert data into a more suitable form, which instructions to be
executed then utilize, independently from the execution of the
instructions themselves.
[0005] However, there is still a need for improved memory devices,
memory management systems and associated methods which may for
example reduce the burden for a processor regarding memory
operations.
SUMMARY
[0006] The independent claims define various aspects of the present
application. The dependent claims define further embodiments.
Features from different claims may be combined unless noted
otherwise.
[0007] According to one embodiment, the application provides a
memory module, also referred to as data memory module herein, for a
computing device.
[0008] The computing device is provided with a software program
that includes instructions for finding, organizing, and/or changing
data. An example of the computing device is a computer or server.
Another example may be an integrated system provided with a memory,
a processor and other components. The data memory module is used
for storing e.g. the above data, which is used by the computing
device.
[0009] The data memory module may include a main memory (e.g. main
memory unit), at least one cache memory (e.g. cache memory unit),
and a memory copy device (e.g. memory copy unit).
[0010] The cache memory is usually a smaller and faster memory,
which stores copies of frequently used data of the main memory. A
central processing unit (CPU) of a computer or other processor like
a graphics processing unit (GPU) often uses the cache memory to
reduce average time to access data from the main memory. In
practice, a predetermined waiting time is usually required to
synchronize data between the main memory and the cache memory.
[0011] This synchronization is especially important when large
amounts of data are involved and may keep data in the cache memory
and the main memory consistent with each other.
[0012] The memory copy device is communicatively connected with the
main memory and with the cache memory.
[0013] The memory copy device comprises at least one Direct Memory
Access (DMA) port, an address translation device and a cached
access module device.
[0014] The DMA port is used for accessing data in the main memory
and data in the cache memory. Via the DMA port, the memory copy
device is able to access data independently of a processor like a
CPU or GPU.
[0015] The address translation device is used for translating
between a memory physical address and a memory virtual address
and/or between a cache memory address and a main memory
address.
[0016] The cached access module device is used for reading and
writing data between the cache memory and the main memory via the
DMA port for maintaining data integrity and coherence.
[0017] The DMA port allows transfer of data between the main memory
and the cache memory independently of a processor like a CPU or
GPU. In effect, in embodiments this makes the processor free to
perform other tasks during this access of data. This thereby allows
for a more efficient use of computing resources, which is
especially important when large amounts of data are transferred
between the main memory and the cache memory.
[0018] This form of data transfer between the main memory and the
cache memory may largely be based on hardware to perform the data
transfer. For example, the memory copy device may be implemented as
an integrated circuit like an application specific integrated
circuit (ASIC) and may be integrated together with the cache memory
and the main memory. This form of data transfer may be faster and
useful for transferring of large amounts of data.
[0019] This approach is different from many conventional
implementations that use a processor like a CPU for the data
transfer between cache memory and main memory, which is based
largely on software running on the processor to do the data
transfer.
[0020] According to another aspect, the instant application also
provides a computing device. The computing device includes at least
one processing core module and the above memory module. The
processing core module stores data in the memory module and reads
data from the memory module.
[0021] According to yet another aspect, the instant application
also provides a computer server. The computer server comprises a
network module that includes the above computing device.
[0022] The network module may include one or more devices selected
from a group consisting of a router device, a gateway device, and a
Network Attached Storage (NAS) device.
[0023] According to yet another aspect, the instant application
provides a method of operating a memory module.
[0024] The method includes translating an address between memory
physical address and memory virtual address or between a cache
memory address and a main memory address, e.g. by an address
translation device. The method further includes (e.g. after the
address translation) reading and/or writing data between a cache
memory and a main memory via a Direct Memory Access (DMA) (e.g. a
corresponding DMA port) in order to maintain data integrity and
coherence between the cache memory and the main memory. This may
e.g. be performed by a cached access module device.
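The two steps in paragraph [0024] — translating the address, then moving the data via DMA — can be pictured with a minimal sketch. The DMA transfer is modeled here as a plain byte copy between two memory images; all names and constants are illustrative assumptions, not taken from the application.

```python
# Illustrative model of the method of [0024]: translate an address,
# then move data between cache memory and main memory (the "DMA"
# transfer is modeled here as a byte copy between two bytearrays).
VIRT_BASE = 0x8000000            # assumed virtual base address

main_memory = bytearray(1024)    # stand-in for the main memory
cache_memory = bytearray(1024)   # stand-in for the cache memory

def translate(vaddr: int) -> int:
    """Step 1: translate a virtual address to a physical one."""
    return vaddr - VIRT_BASE

def dma_copy(src: bytearray, src_off: int,
             dst: bytearray, dst_off: int, n: int) -> None:
    """Step 2: move n bytes, keeping both memory images consistent."""
    dst[dst_off:dst_off + n] = src[src_off:src_off + n]

# Write a dirty cache line back to main memory at a translated address.
cache_memory[0:4] = b"\xde\xad\xbe\xef"
dma_copy(cache_memory, 0, main_memory, translate(VIRT_BASE + 16), 4)
```

In an actual device both steps would run in hardware; the sketch only shows the ordering the method describes.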
[0025] According to another aspect, the instant application
provides a computer with a memory device, a memory copy device, and
a Central Processing Unit (CPU). The memory copy device may be a
Hardware Memory Copy (HWMemCopy) engine.
[0026] The memory device in this aspect includes a main memory, a
cache memory, and a buffer memory. The memory device may be
implemented using a Dynamic Random Access Memory (DRAM) and/or a
Static Random Access Memory (SRAM).
[0027] Different from other memories of devices such as routers,
gateways, and Network Attached Storage (NAS) devices, the memory
copy device does not use resources of a processor like a CPU. The
memory copy device may improve networking throughput and/or may
improve or maximise application performance e.g. on embedded CPUs.
It may efficiently keep load relating to memory data away from a
processor like a CPU or GPU. This load may comprise load due to
memory address translation and copying data, which are described
below.
[0028] Software applications of the computer often use a virtual
address scheme for memories for easier implementation.
[0029] The virtual address corresponds to a physical address. In
one example, the memory device has a starting virtual address of
0x8000000, which corresponds to a physical address of
0x0000000.
[0030] The memory copy device may provide efficient translation of
a virtual memory address to its corresponding physical memory
address.
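The fixed-offset mapping in the example of paragraph [0029] can be sketched as follows; the function names are illustrative, not taken from the application.

```python
# Sketch of the fixed-offset translation from paragraph [0029]:
# virtual base 0x8000000 maps to physical base 0x0000000.
VIRT_BASE = 0x8000000
PHYS_BASE = 0x0000000

def virt_to_phys(vaddr: int) -> int:
    """Translate a virtual memory address to its physical counterpart."""
    return vaddr - VIRT_BASE + PHYS_BASE

def phys_to_virt(paddr: int) -> int:
    """Inverse mapping, physical back to virtual."""
    return paddr - PHYS_BASE + VIRT_BASE
```

For example, virtual address 0x8000200 translates to physical address 0x200 under this mapping.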
[0031] The memory copy device may also copy data efficiently from a
source memory segment to a destination memory segment while
maintaining coherency among the main memory, the cache memory and
the buffer memory.
[0032] This coherency may involve copying or writing data in the
buffer memory, also referred to as source buffer memory in this
case, to the main memory, and if needed, to the cache memory. The
data is sometimes referred to as "dirty" lines.
[0033] Similarly, the coherency can also involve copying from the
buffer memory, also referred to as destination buffer memory in
this case, to the main memory, and if needed, to the cache memory.
After this, memory eviction of the cache memory (destination) may
be performed, and new data may be written to the destination buffer
memory.
[0034] A method of operating the memory copy device according to an
embodiment will now be described. The method includes a step of
address translation and a step of cache memory coherency.
[0035] Referring to the address translation, during initialization
or when a mapping between a source memory segment and a destination
memory segment is changed, a software driver or other entity
configures or generates address translation rules for the memory
copy device.
[0036] The software driver then provides a memory copy (memcopy)
request command to the memory copy device. The memory copy
(memcopy) request command indicates whether a source address
and/or a destination address are to be translated, for optimizing
use of virtual addresses which are available in or used in a
software for a particular operation.
[0037] The memory copy device later performs an appropriate
translation of addresses based on the generated address translation
rules. The memory copy device may perform translation of addresses
for an unmapped address or can implement a simplified Memory
Management Unit (MMU) logic for mapped addresses.
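A software picture of paragraphs [0035] to [0037]: the driver installs translation rules once, and the engine applies them per request only when the command's flag asks for it. The class and method names, and the policy of passing unmapped addresses through unchanged, are assumptions for illustration.

```python
# Sketch of driver-configured translation rules ([0035]) applied per
# memcopy request ([0036]-[0037]). Names and ranges are illustrative.
class TranslationRule:
    def __init__(self, virt_base: int, phys_base: int, size: int):
        self.virt_base = virt_base
        self.phys_base = phys_base
        self.size = size

    def covers(self, addr: int) -> bool:
        return self.virt_base <= addr < self.virt_base + self.size

class AddressTranslator:
    def __init__(self):
        self.rules = []

    def configure(self, rules):
        """Called at initialization or when a mapping changes ([0035])."""
        self.rules = list(rules)

    def translate(self, addr: int, translate_flag: bool) -> int:
        """Apply translation only if the request command asks for it."""
        if not translate_flag:
            return addr  # address is used as-is
        for rule in self.rules:
            if rule.covers(addr):
                return addr - rule.virt_base + rule.phys_base
        return addr  # assumed policy: unmapped addresses pass through

xlat = AddressTranslator()
xlat.configure([TranslationRule(0x8000000, 0x0000000, 0x100000)])
```

With this configuration, `xlat.translate(0x8000400, True)` yields 0x400, while the same address with the flag cleared is returned unchanged.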
[0038] Referring to the cache memory coherency, the software driver
may indicate, with the memcopy request command, whether data is to
be read from a source address and/or a destination address, or
whether data is to be written to/through a cache memory.
[0039] If an access to a buffer memory of a cache memory is
required, the memory copy device then may translate the respective
address for pointing to a cache memory region address that
corresponds to a physical address space.
[0040] The memory copy device later may initiate a Direct Memory
Access (DMA) read from the buffer memory or may initiate a DMA
write to the buffer memory to maintain the cache memory coherent
e.g. to the main memory and/or buffer memory.
[0041] The DMA read and/or the DMA write may be done in a manner
that traverses a Cache Memory Sub-system and Coherency Manager.
[0042] This may be done by using driver sideband signals or by
performing address transformations or translations to a cache
memory address space. One way this may be achieved is to present
transformed "accessthrough" cache memory addresses to an
interconnect, which then routes the same through a CPU cache memory
subsystem (that may include the cache memory) using an I/O cache
memory coherency port while presenting to it the original address
(before cache memory address transformation).
[0043] In short, in some aspects the instant application provides a
hardware memory copy module for performing a memory copy
conventionally provided as a software command. This may be done by
performing an address translation between a physical address and a
virtual address. Combined with support of a homogeneous data path
from a source memory to a destination memory, software flags may be
used to indicate whether data from the source memory is sent to a
physical memory address or to a cache (virtual) memory address.
This may allow the source memory and the destination memory to have
different attributes.
[0044] In some aspects, the method of memory copy used by a system,
with respect to physical and virtual addresses, may be detected by
checking an Open Source driver used by the system.
[0045] The memory copy device provided by some aspects of the
instant application may be different from a system that uses two
copy processes. Such a system has more overhead and requires more
system bandwidth to achieve or perform the same task.
[0046] The above is merely a brief overview over some features or
elements of some embodiments and is not to be construed as
limiting, as other embodiments may comprise different features or
elements. Features or elements from different embodiments may be
combined with each other.
BRIEF DESCRIPTION OF THE DRAWINGS
[0047] FIG. 1 illustrates a memory copy engine.
DETAILED DESCRIPTION
[0048] In the following detailed description, details are provided
to describe embodiments of the application. It shall be apparent to
one skilled in the art, however, that the embodiments may be
practiced without such details. In other words, a description of an
embodiment with a plurality of features or elements is merely to
provide a better understanding to the skilled person, but is not to
be construed as indicating that all these features or elements are
necessary for implementation of an embodiment.
[0049] Some embodiments described may have similar parts. The
similar parts may have the same names or similar reference numbers.
The description of one such part applies by reference to another
similar part, where appropriate, thereby reducing repetition of
text and providing a more concise description. This, however, does
not imply that the similar parts are necessarily implemented in the
same manner.
[0050] FIG. 1 shows a computing module 10 illustrating an
embodiment. While module 10 will be described as comprising a
plurality of modules or systems, two or more of these modules or
systems may also be implemented together as a single module or
system.
[0051] The computing module 10 includes a processor sub-system 13,
a Double Data Rate (DDR) synchronous Dynamic Random-Access Memory
(DRAM) module 16, and a hardware memory copy (HWMemCopy) engine
19.
[0052] The processor sub-system 13 comprises a first computing core
21 with a first cache memory 24, a second computing core 26 with a
second cache memory 28, and an Input Output Control Port (IOCU)
module 30. In other embodiments, only one such core or more than
two cores may be provided. The cores may be implemented in a single
processor (e.g. CPU or GPU), but may also be provided in different
processors.
[0053] The HWMemCopy engine 19 includes an address translation
module 32 and a cached access module 34 with a memory input/output
(I/O) module 37, with a command register 40, and with a result
register 43.
[0054] The memory I/O module 37 is connected to an Interconnect
module 46, which is connected to ports 48 of the IOCU module 30 and
to DDR ports 51 of the DDR DRAM module 16.
[0055] The command register 40 is connected to computing cores 21
and 26.
[0056] Result register 43 is connected to a Yield Manager module 52
that is in turn connected to the computing cores 21 and 26.
[0057] A method of using the HWMemCopy engine 19 is described
below.
[0058] A software driver writes to registers of Address Translation
module 32 of the HWMemCopy engine 19 for configuring translation of
virtual memory address to physical memory address by the HWMemCopy
engine 19. Instead of a software driver (running e.g. on one or
both of cores, 21, 26) also another entity, e.g. a hardware module,
may perform the corresponding functions in other embodiments.
[0059] The software driver also configures Cache Memory Access
Transform parameters in the HWMemCopy engine 19 to enable the
HWMemCopy engine 19 to perform cache memory I/O operation.
[0060] When a memory copy operation is initiated, the software
driver provides the Command Register 40 of the HWMemCopy engine 19
with a source buffer memory address and a destination buffer memory
address. The software driver also provides the Command Register 40
with flag data regarding source address translation, flag data
regarding destination address translation, flag data regarding the
cache source I/O buffer memory, and/or flag data regarding the
cache destination I/O buffer memory.
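The contents of such a request can be pictured as a simple record; the field names below are assumptions chosen to mirror the description, not names taken from the application.

```python
# Illustrative layout of a memcopy request written to the Command
# Register ([0060]). Field names are assumptions, not from the patent.
from dataclasses import dataclass

@dataclass
class MemcopyCommand:
    src_addr: int                 # source buffer memory address
    dst_addr: int                 # destination buffer memory address
    translate_src: bool = False   # flag: translate source address
    translate_dst: bool = False   # flag: translate destination address
    cache_src_io: bool = False    # flag: cached source I/O buffer memory
    cache_dst_io: bool = False    # flag: cached destination I/O buffer memory

cmd = MemcopyCommand(src_addr=0x8000000, dst_addr=0x0100000,
                     translate_src=True)
```

A hardware implementation would pack these fields into register bits; the record form only shows which decisions the driver communicates per request.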
[0061] The Address Translation Module 32 of the HWMemCopy engine 19
then performs translation of the source main memory address and the
destination main memory address, when needed.
[0062] The Address Translation Module 32 may also perform
translation of the source cache memory address and the destination
cache memory address, when needed.
[0063] HWMemCopy engine 19 then reads data from a source buffer
memory and writes corresponding data to a destination buffer memory
using the Memory I/O module 37 that couples a DMA read channel to a
DMA write channel.
[0064] A DMA-R engine 55 of the Memory I/O module 37 may later send
the physical source memory address and bytes-per-burst data to the
Interconnect module 46 to read the source buffer memory.
[0065] The Interconnect module 46 decides to present the request to
either the DDR port 51 or to the IOCU port 48 based on an address
range of the address, e.g. based on whether the address belongs to
an address range for the DDR memory, which may be an example for a
main memory, or to another memory like a cache memory or the
processor sub-system in general.
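The routing decision of paragraph [0065] can be sketched as a single range check; the window boundaries below are assumptions for illustration, not values from the application.

```python
# Sketch of the interconnect routing decision ([0065]): a request is
# steered to the DDR port or the IOCU port based purely on which
# address range its address falls into. The range is an assumption.
DDR_BASE = 0x00000000
DDR_SIZE = 0x10000000   # assumed size of the main-memory window

def route(addr: int) -> str:
    """Return the target port for a DMA request address."""
    if DDR_BASE <= addr < DDR_BASE + DDR_SIZE:
        return "DDR"    # address belongs to the main-memory range
    return "IOCU"       # cache memory / processor sub-system
```

Under these assumed boundaries, an address like 0x00200000 routes to the DDR port, while 0x80000000 routes to the IOCU port.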
[0066] If the Interconnect module 46 directs the address to the
IOCU port 48, the address has been adjusted to reflect its physical
memory address.
[0067] The cache memory sub-system then reads data from the DDR
port 51 if the received address is not in the cache memory 24
and/or 28.
[0068] The data, which is read by the DMA-R engine channel, is
passed to the DMA-W engine channel, which transfers the destination
address and bytes per burst to the Interconnect module 46 for
writing the data to the destination buffer memory.
[0069] The Interconnect module 46 later decides to present the data
either to the DDR port 51 or to the IOCU port 48 based on the
memory address range of the destination memory address.
[0070] If the Interconnect module 46 transfers the memory address
to IOCU port 48, the memory address has been adjusted to reflect
its physical memory address of the destination memory.
[0071] The cache memory sub-system later writes the desired data
via the DDR port 51 to maintain cache coherency. Some data in the
cache memory 24 and 28 may not be coherent and need not be written
to the DDR port 51 for improving performance. Also, cache data in a
level one (L1) data cache can be replaced appropriately.
[0072] Once all burst segments of a DMA operation are complete, the
HWMemCopy engine 19 issues an interrupt signal to the respective
requesting CPU core 21 or 26 regarding completion of the requested
memory copy (memcpy) operation.
[0073] This manner of memory copy has the advantage of not hogging
CPU time. The HWMemCopy engine 19 especially improves networking
throughput and maximizes application performance for embedded
CPUs.
[0074] This is different from many kinds of software systems using
a memory copy, also called a memcpy function, which significantly
hogs CPU time as the CPU has to run the respective software.
Examples of systems using such a software approach are routers,
gateways, and Network Attached Storage (NAS) devices. In
embodiments, the above-described techniques may be used in such
systems instead of the conventional software approach.
[0075] Although the above description contains much specificity,
this should not be construed as limiting the scope of the
embodiments but merely providing a more detailed illustration.
[0076] The above stated advantages of some of the embodiments
should not be construed as limiting the scope of the embodiments
but merely to explain possible achievements if the described
embodiments are put into practice. Other embodiments may not have
such advantages as described. Thus, the scope of the application
should be determined by the claims and their equivalents, rather
than by the examples given.
* * * * *