U.S. patent application number 10/401411, for DMA prefetch, was filed with the patent office on 2003-03-27 and published on 2004-09-30.
This patent application is currently assigned to International Business Machines Corporation. Invention is credited to Kahle, James Allan.
Application Number | 10/401411 |
Publication Number | 20040193754 |
Family ID | 32989444 |
Filed Date | 2003-03-27 |
United States Patent Application | 20040193754 |
Kind Code | A1 |
Kahle, James Allan | September 30, 2004 |
DMA prefetch
Abstract
A method and an apparatus are provided for prefetching data from
a system memory to a cache for a direct memory access (DMA)
mechanism in a computer system. A DMA mechanism is set up for a
processor. A load access pattern of the DMA mechanism is detected.
At least one potential load of data is predicted based on the load
access pattern. In response to the prediction, the data is
prefetched from a system memory to a cache before a DMA command
requests the data.
Inventors: | Kahle, James Allan (Austin, TX) |
Correspondence Address: | Gregory W. Carr, 670 Founders Square, 900 Jackson Street, Dallas, TX 75202, US |
Assignee: | International Business Machines Corporation, Armonk, NY |
Family ID: | 32989444 |
Appl. No.: | 10/401411 |
Filed: | March 27, 2003 |
Current U.S. Class: | 710/22 |
Current CPC Class: | G06F 13/28 20130101 |
Class at Publication: | 710/022 |
International Class: | G06F 013/28 |
Claims
1. A method for prefetching data from a system memory to a cache
for a direct memory access (DMA) mechanism in a computer system,
the method comprising the steps of: setting up the DMA mechanism
for a processor; detecting a load access pattern of the DMA
mechanism; predicting at least one potential load of data based on
the load access pattern; and in response to the prediction,
prefetching the data from the system memory to the cache before a
DMA command requests the data.
2. The method of claim 1, further comprising the step of, in
response to a DMA load request of the data, loading the data from
the cache.
3. The method of claim 1, wherein the computer system includes a
plurality of processors sharing the cache, further comprising the
step of loading the data from the cache to one or more of the
plurality of processors.
4. The method of claim 2, further comprising the step of issuing
the DMA load request of the data.
5. The method of claim 1, wherein the load access pattern includes
a pattern of consecutively loading two or more pieces of data
adjacently stored in a logical address space.
6. The method of claim 5, wherein the step of predicting at least
one potential load of data based on the load access pattern
comprises the step of predicting a potential load of a first piece
of data after a DMA load request of a second piece of data stored
adjacently to the first piece of data in a logical address
space.
7. The method of claim 1, wherein the processor includes a local
store, and wherein the data is loaded from the cache to the local
store of the processor.
8. A computer system comprising: a processor having a local store;
a memory flow controller (MFC) included in the processor, the MFC
having a load access pattern leading to a prediction of at least
one potential load of data; a system memory; and a cache coupled
between the processor and the system memory, wherein, in response
to the prediction, the data is prefetched from the system memory to
the cache before the MFC requests the data.
9. The computer system of claim 8, wherein the MFC, in response to
a DMA load request of the data, loads the data from the cache to
the local store.
10. A multiprocessor computer system comprising: one or more
processors, each processor having a local store; one or more memory
flow controllers (MFCs) each included in each processor, a first
MFC having a load access pattern leading to a prediction of at
least one potential load of data; a system memory; and a cache
coupled between at least one processor and the system memory,
wherein, in response to the prediction, the data is prefetched from
the system memory to the cache before the first MFC requests the
data.
11. The multiprocessor computer system of claim 10, wherein at
least one of the processors is a synergistic processor complex
(SPC).
12. The multiprocessor computer system of claim 11, wherein the
synergistic processor complex (SPC) includes a synergistic
processor unit (SPU).
13. An apparatus for prefetching data from a system memory to a
cache for a direct memory access (DMA) mechanism in a computer
system, the apparatus comprising: means for setting up a DMA
mechanism for a processor; means for detecting a load access
pattern of the DMA mechanism; means for predicting at least one
potential load of data based on the load access pattern; and means
for in response to the prediction, prefetching the data from a
system memory to a cache before a DMA command requests the
data.
14. The apparatus of claim 13, further comprising means for, in
response to a DMA load request of the data, loading the data from
the cache.
15. The apparatus of claim 13, wherein the computer system includes
a plurality of processors sharing the cache, the apparatus further
comprising means for loading the data from the cache to one or more
of the plurality of processors.
16. The apparatus of claim 14, further comprising means for issuing
the DMA load request of the data.
17. The apparatus of claim 13, wherein the load access pattern
includes a pattern of consecutively loading two or more pieces of
data adjacently stored in a logical address space.
18. The apparatus of claim 17, wherein the means for predicting at
least one potential load of data based on the load access pattern
comprises means for predicting a potential load of a first piece of
data after a DMA load request of a second piece of data stored
adjacently to the first piece of data in a logical address
space.
19. The apparatus of claim 13, wherein the processor includes a
local store, and wherein the data is loaded from the cache to the
local store of the processor.
20. A computer program product for prefetching data from a system
memory to a cache for a direct memory access (DMA) mechanism in a
computer system, the computer program product having a medium with
a computer program embodied thereon, the computer program
comprising: computer program code for setting up a DMA mechanism
for a processor; computer program code for detecting a load access
pattern of the DMA mechanism; computer program code for predicting
at least one potential load of data based on the load access
pattern; and computer program code for, in response to the
prediction, prefetching the data from a system memory to a cache
before a DMA command requests the data.
21. The computer program product of claim 20, the computer program
further comprising computer program code for, in response to a DMA
load request of the data, loading the data from the cache.
22. The computer program product of claim 20, wherein the computer
system includes a plurality of processors sharing the cache, the
computer program further comprising computer program code for
loading the data from the cache to one or more of the plurality of
processors.
23. The computer program product of claim 21, the computer program
further comprising computer program code for issuing the DMA load
request of the data.
24. The computer program product of claim 20, wherein the load
access pattern includes a pattern of consecutively loading two or
more pieces of data adjacently stored in a logical address
space.
25. The computer program product of claim 24, wherein the computer
program code for predicting at least one potential load of data
based on the load access pattern comprises computer program code
for predicting a potential load of a first piece of data after a
DMA load request of a second piece of data stored adjacently to the
first piece of data in a logical address space.
26. The computer program product of claim 20, wherein the processor
includes a local store, and wherein the data is loaded from the
cache to the local store of the processor.
Description
BACKGROUND OF THE INVENTION
[0001] 1. Field of the Invention
[0002] The invention relates generally to memory management and,
more particularly, to prefetching data to a cache in a direct
memory access (DMA) mechanism.
[0003] 2. Description of the Related Art
[0004] In a multiprocessor design, a DMA mechanism is used to move
information from one type of memory to another. The DMA mechanism,
such as a DMA engine or DMA controller, also moves information from
a system memory to a local store of a processor. When a DMA command
tries to move information from the system memory to the local store
of the processor, there is some delay in fetching the information
from the system memory.
[0005] Therefore, a need exists for a system and method for
prefetching data from a system memory to a cache for a direct
memory access (DMA) mechanism in a computer system.
SUMMARY OF THE INVENTION
[0006] The present invention provides a method and an apparatus for
prefetching data from a system memory to a cache for a direct
memory access (DMA) mechanism in a computer system. A DMA mechanism
is set up for a processor. A load access pattern of the DMA
mechanism is detected. At least one potential load of data is
predicted based on the load access pattern. In response to the
prediction, the data is prefetched from a system memory to a cache
before a DMA command requests the data.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] For a more complete understanding of the present invention
and the advantages thereof, reference is now made to the following
descriptions taken in conjunction with the accompanying drawings,
in which:
[0008] FIG. 1 shows a block diagram illustrating a single processor
computer system adopting a cache along with a direct memory access
(DMA) mechanism;
[0009] FIG. 2 shows a block diagram illustrating a multiprocessor
computer system adopting a cache along with a DMA mechanism;
and
[0010] FIG. 3 shows a flow diagram illustrating a prefetching
mechanism applicable to a DMA mechanism as shown in FIGS. 1 and
2.
DETAILED DESCRIPTION
[0011] In the following discussion, numerous specific details are
set forth to provide a thorough understanding of the present
invention. However, it will be apparent to those skilled in the art
that the present invention may be practiced without such specific
details. In other instances, well-known elements have been
illustrated in schematic or block diagram form in order not to
obscure the present invention in unnecessary detail.
[0012] It is further noted that, unless indicated otherwise, all
functions described herein may be performed in either hardware or
software, or some combination thereof. In a preferred embodiment,
however, the functions are performed by a processor such as a
computer or an electronic data processor in accordance with code
such as computer program code, software, and/or integrated circuits
that are coded to perform such functions, unless indicated
otherwise.
[0013] Referring to FIG. 1 of the drawings, the reference numeral
100 generally designates a single processor computer system
adopting a cache in a direct memory access (DMA) mechanism. The
single processor computer system 100 comprises a synergistic
processor complex (SPC) 102, which includes a synergistic processor
unit (SPU) 104, a local store 106, and a memory flow controller
(MFC) 108. The single processor computer system also includes an
SPU's L1 cache (SL1 cache) 109 and a system memory 110. The SPC 102
is coupled to the SL1 cache 109 via a connection 112. The SL1 cache
109 is coupled to the system memory 110 via a connection 114. The
MFC 108 functions as a DMA controller.
[0014] Once the MFC 108 is set up to perform data transfers between
the system memory 110 and the local store 106, a load access
pattern of the MFC 108 is detected. The load access pattern
generally contains information on the data being transferred. The
load access pattern can be used to predict future data transfers
and prefetch data to the SL1 cache 109 before the MFC 108 actually
requests the data. When the MFC 108 actually requests the data, the
MFC 108 does not have to go all the way back to the system memory
110 to retrieve the data. Instead, the MFC 108 accesses the SL1
cache 109 to retrieve the data and transfer the data to the local
store 106.
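By way of illustration only, the benefit described in the preceding
paragraph can be sketched as a simple latency model. The cycle counts
and the function name transfer_cycles are assumptions made for this
sketch and do not appear in the disclosure. The model also assumes
that each prefetch completes before the MFC issues the matching
request.

```python
# Illustrative latency model. Both cycle counts are assumed values
# chosen for the example; the disclosure gives no timing figures.
SYSTEM_MEMORY_CYCLES = 200  # hypothetical cost of a load served by system memory
CACHE_CYCLES = 20           # hypothetical cost of a load served by the cache


def transfer_cycles(num_loads, prefetched_loads):
    """Return the cycles spent waiting on data for num_loads DMA loads,
    of which prefetched_loads were already placed in the cache."""
    if not 0 <= prefetched_loads <= num_loads:
        raise ValueError("prefetched_loads must be between 0 and num_loads")
    loads_from_memory = num_loads - prefetched_loads
    return (prefetched_loads * CACHE_CYCLES
            + loads_from_memory * SYSTEM_MEMORY_CYCLES)
```

Under these assumed costs, transfer_cycles(8, 0) gives 1600 and
transfer_cycles(8, 7) gives 340, which shows how the saving grows with
the number of loads that are correctly predicted.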
[0015] Preferably, the MFC 108 checks the SL1 cache 109 first for
any data. If there is a hit, the MFC 108 transfers the data from
the SL1 cache 109 to the local store 106. If there is a miss, the
MFC 108 transfers the data from the system memory 110 to the local
store 106.
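The hit and miss handling just described may be sketched as follows.
This is a minimal software sketch, not the disclosed hardware: the
cache, the system memory, and the local store are modeled as plain
dictionaries keyed by address, and the function name dma_load is
hypothetical.

```python
def dma_load(address, cache, system_memory, local_store):
    """Copy one block into the local store, checking the cache first.

    Returns the name of the source that supplied the data. All three
    memories are modeled as dictionaries; this is an assumption of the
    sketch.
    """
    if address in cache:
        # Hit: the block was prefetched, so the cache supplies it.
        data = cache[address]
        source = "cache"
    else:
        # Miss: the block comes from system memory. The paragraph above
        # does not say the cache is filled on a miss, so it is left as is.
        data = system_memory[address]
        source = "system_memory"
    local_store[address] = data
    return source
```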
[0016] FIG. 2 is a block diagram illustrating a multiprocessor
computer system 200 adopting a cache in a DMA mechanism. The
multiprocessor computer system 200 has one or more synergistic
processor complexes (SPCs) 202. The SPC 202 has a synergistic
processor unit (SPU) 204, a local store 206, and a memory flow
controller (MFC) 208. The multiprocessor computer system 200
further comprises an SPU's L1 cache (SL1 cache) 210 and a system
memory 212. The SL1 cache 210 is coupled between the SPC 202 and
the system memory 212 via connections 216 and 218. Note here that
the single SL1 cache 210 is used to interface with all the SPCs
202. In different implementations, however, a plurality of caches
may be used. Additionally, the multiprocessor computer system 200
comprises a processing unit (PU) 220, which includes an L1 cache
222. The multiprocessor computer system 200 further comprises an L2
cache 224 coupled between the PU 220 and the system memory 212 via
connections 226 and 228.
[0017] Once the MFC 208 is set up to perform data transfers between
the system memory 212 and the local store 206, a load access
pattern of the MFC 208 is detected. The load access pattern
generally contains information on the data being transferred. The
load access pattern can be used to predict future data transfers
and prefetch data to the SL1 cache 210 before the MFC 208 actually
requests the data. When the MFC 208 actually requests the data, the
MFC 208 does not have to go all the way back to the system memory
212 to retrieve the data. Instead, the MFC 208 accesses the SL1
cache 210 to retrieve the data and transfer the data to the local
store 206.
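As a rough illustration of the shared arrangement of FIG. 2, the
sketch below lets several MFC objects draw on one cache object, so
that a block prefetched once can be loaded into more than one local
store. The class names SharedCache and Mfc and their methods are
hypothetical and are chosen only for this example.

```python
class SharedCache:
    """Hypothetical stand-in for the single SL1 cache shared by all SPCs."""

    def __init__(self, system_memory):
        self.system_memory = system_memory  # dictionary keyed by address
        self.lines = {}                     # blocks currently held in the cache

    def prefetch(self, address):
        """Copy one block from system memory into the cache."""
        self.lines[address] = self.system_memory[address]


class Mfc:
    """Hypothetical model of one SPC's memory flow controller."""

    def __init__(self, cache):
        self.cache = cache
        self.local_store = {}

    def load(self, address):
        """Move one block into this SPC's local store.

        Returns True on a cache hit and False on a miss.
        """
        hit = address in self.cache.lines
        source = self.cache.lines if hit else self.cache.system_memory
        self.local_store[address] = source[address]
        return hit
```

In this sketch, a single call to prefetch on the shared cache makes
the same block available as a hit to every Mfc constructed with that
cache.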
[0018] Now referring to FIG. 3, shown is a flow diagram
illustrating a prefetching mechanism 300 applicable to a DMA
mechanism as shown in FIGS. 1 and 2.
[0019] In step 302, the DMA mechanism is set up for a processor. In
FIG. 1, for example, the MFC 108 is set up for the SPC 102. In FIG.
2, for example, the MFC 208 is set up for the SPC 202. In step 304,
a load access pattern of the DMA mechanism is detected. In
streaming data, for example, a load of a first piece of data leads
to a subsequent load of a second piece of data stored adjacently to
the first piece of data in a logical address space. Therefore, in
this example, it is very likely that the second piece of data will
be requested to be loaded soon after the load of the first
piece.
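One possible way to detect the streaming pattern of this example is
sketched below. The disclosure does not specify a detection
algorithm, so the function is_sequential, the block_size parameter,
and the min_run threshold are all assumptions of this sketch.

```python
def is_sequential(addresses, block_size, min_run=2):
    """Return True if the most recent min_run loads touched blocks that
    are adjacent in the logical address space, in ascending order.

    addresses is the history of load addresses, oldest first.
    """
    if min_run < 2:
        raise ValueError("min_run must be at least 2")
    if len(addresses) < min_run:
        return False
    recent = addresses[-min_run:]
    # Every consecutive pair must be exactly one block apart.
    return all(later - earlier == block_size
               for earlier, later in zip(recent, recent[1:]))
```

A larger min_run makes the detector more conservative, at the cost of
starting to prefetch later in the stream.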
[0020] In step 306, at least one potential load of data is
predicted based on the load access pattern. In the same example,
the second piece of data is predicted to be loaded soon. In step
308, in response to the prediction, the data is prefetched from the
system memory to the cache before a DMA command requests the data.
In step 310, in response to a DMA load request of the data, the
data is loaded from the cache.
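Steps 302 through 310 may be pictured together with the following
software simulation. It is a sketch under stated assumptions rather
than the disclosed implementation: the block size BLOCK, the function
name run_stream, the dictionary-based memories, and the simple
next-block predictor are all hypothetical.

```python
BLOCK = 128  # hypothetical block size in bytes


def run_stream(requests, system_memory):
    """Simulate a sequence of DMA load requests with next-block prefetch.

    Returns (cache_hits, local_store). Memories are dictionaries keyed
    by address.
    """
    # Step 302: set up the DMA mechanism (empty cache and local store).
    cache = {}
    local_store = {}
    hits = 0
    previous = None

    for address in requests:
        # Step 310: serve the DMA load request from the cache when possible.
        if address in cache:
            hits += 1
            local_store[address] = cache[address]
        else:
            local_store[address] = system_memory[address]

        # Step 304: detect two consecutive loads of adjacent blocks.
        if previous is not None and address - previous == BLOCK:
            # Step 306: predict that the next adjacent block will be loaded.
            predicted = address + BLOCK
            # Step 308: prefetch it before any DMA command requests it.
            if predicted in system_memory:
                cache[predicted] = system_memory[predicted]
        previous = address

    return hits, local_store
```

For a stream of five adjacent blocks starting at address 0, the first
two requests miss while the pattern is being established and the
remaining three are served from the cache.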
[0021] It will be understood from the foregoing description that
various modifications and changes may be made in the preferred
embodiment of the present invention without departing from its true
spirit. This description is intended for purposes of illustration
only and should not be construed in a limiting sense. The scope of
this invention should be limited only by the language of the
following claims.
* * * * *