U.S. patent application number 13/620199 was published by the patent office on 2013-05-16 as publication number 20130124904 for a memory subsystem and method. This patent application is currently assigned to GOOGLE INC. The applicants listed for this patent are Suresh Natarajan Rajan and David T. Wang. The invention is credited to Suresh Natarajan Rajan and David T. Wang.
United States Patent Application 20130124904
Kind Code: A1
Application Number: 13/620199
Family ID: 47721345
Inventors: Wang; David T.; et al.
Publication Date: May 16, 2013
MEMORY SUBSYSTEM AND METHOD
Abstract
One embodiment of the present invention sets forth an interface
circuit configured to combine time-staggered data bursts returned
by multiple memory devices into a larger contiguous data burst. As
a result, an accurate timing reference for data transmission may be
obtained that retains the use of data (DQ) and data strobe (DQS)
signals in an infrastructure-compatible system while eliminating
the cost of the idle cycles required for data bus turnarounds when
switching from reading from one memory device to reading from
another, or from writing to one memory device to writing to
another, thereby increasing memory system bandwidth relative to
prior art approaches.
Inventors: Wang; David T. (Thousand Oaks, CA); Rajan; Suresh Natarajan (San Jose, CA)

Applicant:
Name | City | State | Country
Wang; David T. | Thousand Oaks | CA | US
Rajan; Suresh Natarajan | San Jose | CA | US

Assignee: GOOGLE INC. (Mountain View, CA)
Family ID: 47721345
Appl. No.: 13/620199
Filed: September 14, 2012
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
12144396 | Jun 23, 2008 | 8386722
13620199 | |
Current U.S. Class: 713/401
Current CPC Class: G06F 13/1689 (20130101); G11C 7/22 (20130101); G06F 1/12 (20130101)
Class at Publication: 713/401
International Class: G06F 1/12 (20060101) G06F001/12
Claims
1. (canceled)
2. A sub-system comprising: a plurality of memory devices
comprising a first memory device, wherein a timing for a data burst
from each of the plurality of memory devices is provided by a
respective, different, data strobe (DQS) signal; an interface
circuit comprising: a plurality of memory data signal interfaces
comprising a first memory data signal interface, a number of the
plurality of memory data signal interfaces being equal to a number
of the plurality of memory devices, each memory data signal
interface including a respective data (DQ) path and a respective
data strobe (DQS) path coupled to a corresponding memory device of
the plurality of memory devices, wherein the first memory data
signal interface is coupled to the first memory device; a system
control signal interface coupled to a memory controller, the system
control signal interface configured to receive a first read command
from the memory controller; and emulation and command translation
logic configured to: select the first memory data signal interface
based on the first read command; receive a first data burst from
the first memory data signal interface, wherein a timing reference
for the first data burst is provided by a DQS signal of the first
memory device; delay the first data burst to align a phase
difference between the DQS signal of the first memory device and a
clock signal of the interface circuit; and transmit the delayed
first data burst to the memory controller.
3. The sub-system of claim 2, wherein the plurality of memory
devices further comprises a second memory device, the plurality of
memory data signal interfaces further comprises a second memory
data signal interface, the system control signal interface is
further configured to receive a second read command from the memory
controller, and the emulation and command translation logic is
further configured to: select the second memory data signal
interface based on the second read command; receive a second data
burst from the second memory data signal interface, wherein a
timing of the second data burst is provided by a DQS signal of the
second memory device; delay the second data burst to align a phase
difference between the DQS signal of the second memory device and
the clock signal; combine the delayed first data burst and the
delayed second data burst into a contiguous data burst; and
transmit the contiguous data burst to the memory controller.
4. The sub-system of claim 3, wherein the emulation and command
translation logic is further configured to: emulate a virtual
memory device using at least the first memory device and the second
memory device, wherein a memory capacity of the virtual memory
device is equal to a combined memory capacity of the first memory
device and the second memory device; and present the virtual memory
device to the memory controller.
5. The sub-system of claim 2, wherein the interface circuit further
comprises initialization and configuration logic, the
initialization and configuration logic configured to: select the
first memory data signal interface; issue a calibration read
command, via the first memory data signal interface, to read test
data stored at the first memory device; receive the test data from
the first memory device across the first memory data signal
interface; determine the phase difference between the DQS signal of
the first memory device and the clock signal based on a timing of
the received test data; and set a delay within the first memory
data signal interface corresponding to the first memory device.
6. The sub-system of claim 5, wherein selecting the first memory
data signal interface further comprises receiving a calibration
request and selecting the first memory data signal interface in
response to the received calibration request.
7. The sub-system of claim 2, wherein the interface circuit further
comprises data path logic, and wherein the data path logic is
configured to concatenate two or more data bursts to eliminate an
inter-device command scheduling constraint between the two or more
data bursts.
8. The sub-system of claim 7, wherein the inter-device command
scheduling constraint includes a rank-to-rank data bus turnaround
time or an on-die-termination (ODT) control switching time.
9. An interface circuit comprising: a plurality of memory data
signal interfaces comprising a first memory data signal interface,
each memory data signal interface including a respective data (DQ)
path and a respective data strobe (DQS) path coupled to a
respective, different, memory device; a system control signal
interface coupled to a memory controller, the system control signal
interface configured to receive a first read command from the
memory controller; and emulation and command translation logic
configured to: select the first memory data signal interface based
on the first read command; receive a first data burst from the
first memory data signal interface, wherein a timing reference for
the first data burst is provided by a DQS signal of a memory device
coupled to the first memory data signal interface; delay the first
data burst to align a phase difference between the DQS signal of
the memory device and a clock signal of the interface circuit; and
transmit the delayed first data burst to the memory controller.
10. The interface circuit of claim 9, wherein the plurality of
memory data signal interfaces further comprises a second memory
data signal interface, the system control signal interface is
further configured to receive a second read command from the memory
controller, and the emulation and command translation logic is
further configured to: select the second memory data signal
interface based on the second read command; receive a second data
burst from the second memory data signal interface, wherein a
timing reference for the second data burst is provided by a DQS
signal of a memory device coupled to the second memory data signal
interface; delay the second data burst to align a phase difference
between the clock signal and the DQS signal of the memory device
coupled to the second memory data signal interface; combine the
delayed first data burst and the delayed second data burst into a
contiguous data burst; and transmit the contiguous data burst to
the memory controller.
11. The interface circuit of claim 10, wherein the emulation and
command translation logic is further configured to: emulate a
virtual memory device using at least the memory device coupled to
the first memory data signal interface and the memory device
coupled to the second memory data signal interface, wherein a
memory capacity of the virtual memory device is equal to a combined
memory capacity of the two memory devices; and present the virtual
memory device to the memory controller.
12. The interface circuit of claim 9, further comprising
initialization and configuration logic, the initialization and
configuration logic configured to: select the first memory data
signal interface; issue a calibration read command, via the first
memory data signal interface, to read test data stored at the
memory device coupled to the first memory data signal interface;
receive the test data across the first memory data signal
interface; determine the phase difference between the DQS signal of
the memory device coupled to the first memory data signal interface
and the clock signal based on a timing of the received test data;
and set a delay within the first memory data signal interface
corresponding to the memory device coupled to the first memory data
signal interface.
13. The interface circuit of claim 12, wherein selecting the first
memory data signal interface further comprises receiving a
calibration request and selecting the first memory data signal
interface in response to the received calibration request.
14. The interface circuit of claim 9 further comprising data path
logic, the data path logic configured to concatenate two or more
data bursts to eliminate an inter-device command scheduling
constraint between the two or more data bursts.
15. The interface circuit of claim 14, wherein the inter-device
command scheduling constraint includes a rank-to-rank data bus
turnaround time or an on-die-termination (ODT) control switching
time.
16. A computer-implemented method, comprising: receiving, by an
interface circuit, a first read command from a memory controller;
selecting a first memory data signal interface of a plurality of
memory data signal interfaces based on the first read command;
receiving a second read command from a memory controller; selecting
a second memory data signal interface of the plurality of memory
data signal interfaces based on the second read command; receiving
a first data burst from the first memory data signal interface,
wherein a timing reference for the first data burst is provided by
a first DQS signal of a memory device coupled to the first memory
data signal interface; receiving a second data burst from the
second memory data signal interface, wherein a timing reference for
the second data burst is provided by a second, different, DQS
signal of a memory device coupled to the second memory data signal
interface; delaying the first data burst to align a phase
difference between the first DQS signal and a clock signal of the
interface circuit; delaying the second data burst to align a phase
difference between the second DQS signal and the clock signal of
the interface circuit; concatenating the delayed first data burst
and the delayed second data burst into a contiguous data burst; and
transmitting the contiguous data burst to the memory
controller.
17. The method of claim 16, further comprising: selecting the first
memory data signal interface; issuing a calibration read command,
via the first memory data signal interface, to read test data
stored at the memory device coupled to the first memory data signal
interface; receiving the test data across the first memory data
signal interface; determining the phase difference between the
first DQS signal and the clock signal based on a timing of the received
test data; and setting a delay within the first memory data signal
interface corresponding to the memory device coupled to the first
memory data signal interface.
18. The method of claim 16, further comprising: emulating a virtual
memory device using at least the memory device coupled to the first
memory data signal interface and the memory device coupled to the
second memory data signal interface, wherein a memory capacity of
the virtual memory device is equal to a combined memory capacity of
the two memory devices; and presenting the virtual memory device to
the memory controller.
19. The method of claim 16, wherein concatenating the delayed first
data burst and the delayed second data burst into a contiguous data
burst further comprises concatenating the delayed first data burst
and the delayed second data burst into the contiguous data burst to
eliminate an inter-device command scheduling constraint between the
first data burst and the second data burst.
20. The method of claim 19, wherein the inter-device command
scheduling constraint includes a rank-to-rank data bus turnaround
time or an on-die-termination (ODT) control switching time.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. patent
application Ser. No. 12/144,396, filed Jun. 23, 2008, the subject
matter of which is hereby incorporated herein by reference.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] Embodiments of the present invention generally relate to
memory subsystems and, more specifically, to improvements to such
memory subsystems.
[0004] 2. Description of the Related Art
[0005] Memory circuit speeds remain relatively constant, but the
required data transfer speeds and bandwidth of memory systems are
increasing, currently doubling every three years. The result is
that more commands must be scheduled, issued and pipelined in a
memory system to increase bandwidth. However, command scheduling
constraints that exist in the memory systems limit the command
issue rates, and consequently, limit the increase in bandwidth.
[0006] In general, there are two classes of command scheduling
constraints that limit command scheduling and command issue rates
in memory systems: inter-device command scheduling constraints, and
intra-device command scheduling constraints. These command
scheduling constraints and other timing constraints and timing
parameters are defined by manufacturers in their memory device data
sheets and by standards organizations such as JEDEC.
[0007] Examples of inter-device (between devices) command
scheduling constraints include rank-to-rank data bus turnaround
times, and on-die-termination (ODT) control switching times. The
inter-device command scheduling constraints typically arise because
the devices share a resource (for example a data bus) in the memory
sub-system.
[0008] Examples of intra-device (inside devices) command-scheduling
constraints include column-to-column delay time (tCCD), row-to-row
activation delay time (tRRD), four-bank activation window time
(tFAW), and write-to-read turn-around time (tWTR). The intra-device
command-scheduling constraints typically arise because parts of the
memory device (e.g. column, row, bank, etc.) share a resource
inside the memory device.
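As an illustration (not part of the patent text), the two classes of constraints above can be modeled as a simple issue-time check. The class and all cycle counts below are placeholders for illustration, not JEDEC values:

```python
class TimingChecker:
    """Tracks last-issue times and rejects commands that would violate
    inter- or intra-device command scheduling constraints."""

    def __init__(self, tCCD=2, tRRD=4, tWTR=6, turnaround=2):
        self.tCCD = tCCD              # column-to-column delay (cycles)
        self.tRRD = tRRD              # row-to-row activation delay (cycles)
        self.tWTR = tWTR              # write-to-read turnaround (cycles)
        self.turnaround = turnaround  # rank-to-rank data bus turnaround (cycles)
        self.last_col = None          # cycle of last column command
        self.last_act = None          # cycle of last row activation
        self.last_write = None        # cycle of last write
        self.last_rank = None         # rank that last drove the shared data bus

    def can_issue(self, cycle, cmd, rank):
        if cmd == "ACT":
            # Intra-device: row-to-row activation delay.
            return self.last_act is None or cycle - self.last_act >= self.tRRD
        if cmd in ("READ", "WRITE"):
            if self.last_col is not None and cycle - self.last_col < self.tCCD:
                return False  # intra-device: column-to-column delay
            if cmd == "READ" and self.last_write is not None \
                    and cycle - self.last_write < self.tWTR:
                return False  # intra-device: write-to-read turnaround
            if self.last_rank is not None and rank != self.last_rank \
                    and cycle - self.last_col < self.tCCD + self.turnaround:
                return False  # inter-device: rank-to-rank bus turnaround
            return True
        return True

    def issue(self, cycle, cmd, rank):
        assert self.can_issue(cycle, cmd, rank)
        if cmd == "ACT":
            self.last_act = cycle
        else:
            self.last_col = cycle
            self.last_rank = rank
            if cmd == "WRITE":
                self.last_write = cycle
```

Note how the inter-device check penalizes only a change of bus master: back-to-back reads from the same rank need only tCCD, while reads from different ranks pay the extra turnaround. This is exactly the idle-cycle cost the interface circuit described later is designed to eliminate.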
[0009] In implementations involving more than one memory device,
some technique must be employed to assemble the various
contributions from each memory device into a word or command or
protocol as may be processed by the memory controller. Various
conventional implementations, in particular designs within the
classification of Fully Buffered DIMMs (FBDIMMs, a type of
industry-standard memory module), are designed to be capable of such
assembly. However, there are several problems associated with such
an approach. One problem is that the FBDIMM approach introduces
significant latency (see description, below). Another problem is
that the FBDIMM approach requires a specialized memory controller
capable of processing the assembly.
[0010] As memory speed increases, the introduction of latency
becomes more and more of a detriment to the operation of the memory
system. Even modern FBDIMM-type memory systems introduce tens of
nanoseconds of delay as the packet is assembled. As will be shown
in the disclosure to follow, the latency introduced need not be so
severe.
[0011] Moreover, the implementation of the FBDIMM-type memory
devices required corresponding changes in the behavior of the
memory controller, and thus FBDIMMs are not backward compatible
with industry-standard memory systems. As will be shown in the
disclosure to follow, various embodiments of the present invention
may be used with previously existing memory controllers, without
modification to their logic or interfacing requirements.
[0012] In order to appreciate the extent of the introduction of
latency in an FBDIMM-type memory system, one needs to refer to FIG.
1. FIG. 1 shows an FBDIMM-type memory system 100 wherein multiple
DRAMs (D0, D1, . . . , D7, D8) are in communication via a
daisy-chained interconnect. The buffer 105 is situated between two
memory circuits (e.g., D1 and D2). In the READ path, the buffer 105
is capable of presenting to memory D_N the data retrieved from
D_M (M>N). Of course, in a conventional FBDIMM-type system,
the READ data from each successively higher memory D_M must be
merged with the data of memory D_N, and this function is
implemented via pass-through and merging logic 106. As can be seen,
such an operation occurs sequentially at each buffer 105, and
latency is thus cumulatively introduced.
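The cumulative nature of this delay can be illustrated with a toy model. A minimal sketch, where the per-buffer delay is a made-up placeholder rather than a measured FBDIMM figure:

```python
def fbdimm_read_latency(dimm_index, per_buffer_delay_ns=3.0):
    """Pass-through latency for a READ from the DIMM at `dimm_index` in a
    daisy-chained FBDIMM channel (0 = nearest the memory controller).
    The command crosses one buffer per intervening DIMM on the way out
    (southbound) and the data crosses them again on the way back
    (northbound), so the penalty grows linearly with distance."""
    return 2 * dimm_index * per_buffer_delay_ns
```

With nine DIMMs (D0 through D8, as in FIG. 1) and a few nanoseconds per buffer hop, a READ from the far end already accumulates tens of nanoseconds, consistent with the delays described above.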
[0013] As the foregoing illustrates, what is needed in the art is a
memory subsystem and method that overcome the shortcomings of prior
art systems.
SUMMARY OF THE INVENTION
[0014] One embodiment of the present invention sets forth an
interface circuit configured to combine a plurality of data bursts
returned by a plurality of memory devices into a contiguous data
burst. The interface circuit includes a system control signal
interface adapted to receive a first command from a memory
controller and emulation and command translation logic adapted to
translate a first address associated with the first command, issue
the first command to a first memory device within the plurality of
memory devices corresponding to the first address, and determine
that the first command is a read command. The emulation and command
translation logic is further adapted to select a memory data signal
interface corresponding to the first memory device, receive a first
data burst from the first memory device, delay the first data burst
to eliminate a first clock-to-data phase between the first memory
device and the interface circuit, and re-drive the first data burst
to the memory controller.
[0015] One advantage of the disclosed interface circuit is that it
can provide higher memory performance by not requiring idle bus
cycles to turn around the data bus when switching from reading from
one memory device to reading from another memory device, or from
writing to one memory device to writing to another memory
device.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] So that the manner in which the above recited features of
the present invention can be understood in detail, a more
particular description of the invention, briefly summarized above,
may be had by reference to embodiments, some of which are
illustrated in the appended drawings. It is to be noted, however,
that the appended drawings illustrate only typical embodiments of
this invention and are therefore not to be considered limiting of
its scope, for the invention may admit to other equally effective
embodiments.
[0017] FIG. 1 illustrates an FBDIMM-type memory system, according
to prior art;
[0018] FIG. 2A illustrates major logical components of a computer
platform, according to prior art;
[0019] FIG. 2B illustrates major logical components of a computer
platform, according to one embodiment of the present invention;
[0020] FIG. 2C illustrates a hierarchical view of the major logical
components of a computer platform shown in FIG. 2B, according to
one embodiment of the present invention;
[0021] FIG. 3A illustrates a timing diagram for multiple memory
devices in a low data rate memory system, according to prior
art;
[0022] FIG. 3B illustrates a timing diagram for multiple memory
devices in a higher data rate memory system, according to prior
art;
[0023] FIG. 3C illustrates a timing diagram for multiple memory
devices in a high data rate memory system, according to prior
art;
[0024] FIG. 4A illustrates a data flow diagram showing how time
separated bursts are combined into a larger contiguous burst,
according to one embodiment of the present invention;
[0025] FIG. 4B illustrates a waveform corresponding to FIG. 4A
showing how time separated bursts are combined into a larger
contiguous burst, according to one embodiment of the present
invention;
[0026] FIG. 4C illustrates a flow diagram of method steps showing
how the interface circuit can optionally make use of a training or
clock-to-data phase calibration sequence to independently track the
clock-to-data phase relationship between the memory components and
the interface circuit, according to one embodiment of the present
invention;
[0027] FIG. 4D illustrates a flow diagram showing the operations of
the interface circuit in response to the various commands,
according to one embodiment of the present invention;
[0028] FIGS. 5A through 5F illustrate a computer platform that
includes at least one processing element and at least one memory
module, according to various embodiments of the present
invention.
DETAILED DESCRIPTION
[0029] FIG. 2A illustrates major logical components of a computer
platform 200, according to prior art. As shown, the computer
platform 200 includes a system 220 and an array of memory
components 210 interconnected via a parallel interface bus 240. As
also shown, the system 220 further includes a memory controller
225.
[0030] FIG. 2B illustrates major logical components of a computer
platform 201, according to one embodiment of the present invention.
As shown, the computer platform 201 includes the system 220 (e.g.,
a processing unit) that further includes the memory controller 225.
The computer platform 201 also includes an array of memory
components 210 interconnected to an interface circuit 250, which is
connected to the system 220 via the parallel interface bus 240. In
various embodiments, the memory components 210 may include logical
or physical components. In one embodiment, the memory components
210 may include DRAM devices. In such a case, commands from the
memory controller 225 that are directed to the DRAM devices respect
all of the command-scheduling constraints (e.g. tRRD, tCCD, tFAW,
tWTR, etc.). In the embodiment of FIG. 2B, none of the memory
components 210 is in direct communication with the memory
controller 225. Instead, all communication between the memory
controller 225 and the memory components 210 is carried out through
the interface circuit 250. In other embodiments, only some of the
communication between the memory controller 225 and the memory
components 210 is carried out through the interface circuit
250.
[0031] FIG. 2C illustrates a hierarchical view of the major logical
components of the computer platform 201 shown in FIG. 2B, according
to one embodiment of the present invention. FIG. 2C depicts the
computer platform 201 as comprising wholly separate components,
namely the system 220 (e.g., a motherboard) and the
memory components 210 (e.g. logical or physical memory
circuits).
[0032] In the embodiment shown, the system 220 further comprises a
memory interface 221, logic for retrieval and storage of external
memory attribute expectations 222, memory interaction attributes
223, a data processing engine 224 (e.g., a CPU), and various
mechanisms to facilitate a user interface 225. In various
embodiments, the system 220 is designed to the specifics of various
standards, in particular the standard defining the interfaces to
JEDEC-compliant semiconductor memory (e.g., DRAM, SDRAM, DDR2, DDR3,
etc.). The specifics of these standards address physical
interconnection and logical capabilities. In different embodiments,
the system 220 may include a system BIOS program capable of
interrogating the memory components 210 (e.g. DIMMs) as a way to
retrieve and store memory attributes. Further, various external
memory embodiments, including JEDEC-compliant DIMMs, include an
EEPROM device known as a serial presence detect (SPD) where the
DIMM's memory attributes are stored. It is through the interaction
of the BIOS with the SPD and the interaction of the BIOS with the
physical memory circuits' physical attributes that the memory
attribute expectations and memory interaction attributes become
known to the system 220.
[0033] As also shown, the computer platform 201 includes one or
more interface circuits 250 electrically disposed between the
system 220 and the memory components 210. The interface circuit 250
further includes several system-facing interfaces, for example, a
system address signal interface 271, a system control signal
interface 272, a system clock signal interface 273, and a system
data signal interface 274. Similarly, the interface circuit 250
includes several memory-facing interfaces, for example, a memory
address signal interface 275, a memory control signal interface
276, a memory clock signal interface 277, and a memory data signal
interface 278.
[0034] In FIG. 2C, the memory data signal interface 278 is
specifically illustrated as a separate, independent interface. This
illustration is specifically designed to demonstrate the functional
operation of the seamless burst merging capability of the interface
circuit 250, and should not be construed as a limitation on the
implementation of the interface circuit. In other embodiments, the
memory data signal interface 278 may be composed of more than one
independent interface. Furthermore, specific implementations of
the interface circuit 250 may have a memory address signal
interface 275 that is similarly composed of more than one
independently operable memory address signal interface, and
multiple, independent interfaces may exist for each of the signal
interfaces included within the interface circuit 250.
[0035] An additional characteristic of the interface circuit 250 is
the presence of emulation and command translation logic 280, data
path logic 281, and initialization and configuration logic 282. The
emulation and command translation logic 280 is configured to
receive and, optionally, store electrical signals (e.g. logic
levels, commands, signals, protocol sequences, communications) from
or through the system-facing interfaces, and process those signals.
In various embodiments, the emulation and command translation logic
280 may respond to signals from the system-facing interfaces by
presenting signals to the system 220, process those signals with
other information previously
stored, present signals to the memory components 210, or perform
any of the aforementioned operations in any order.
[0036] The emulation and command translation logic 280 is capable
of adopting a personality, and such personality defines the
physical memory component attributes. In various embodiments of the
emulation and command translation logic 280, the personality can be
set via any combination of bonding options, strapping, programmable
strapping, the wiring between the interface circuit 250 and the
memory components 210, and actual physical attributes (e.g. value
of mode register, value of extended mode register) of the physical
memory connected to the interface circuit 250 as determined at some
moment when the interface circuit 250 and memory components 210 are
powered up.
[0037] The data path logic 281 is configured to receive internally
generated control and command signals from the emulation and
command translation logic 280, and use the signals to direct the
flow of data through the interface circuit 250. The data path logic
281 may alter the burst length, burst ordering, data-to-clock
phase-relationship, or other attributes of data movement through
the interface circuit 250.
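As a concrete illustration of the burst-merging behavior of the data path logic 281, the following sketch (with assumed data structures; the patent does not specify an implementation) delays each DQS-timed burst by its calibrated offset and splices the results into one seamless burst:

```python
def merge_bursts(bursts, phase_delays):
    """bursts: list of (start_cycle, data_words) as captured per DQS domain.
    phase_delays: per-interface delay (cycles) set during calibration.
    Returns (aligned_start, contiguous_data): one back-to-back burst with
    no idle turnaround cycles between the contributing devices."""
    aligned = []
    for (start, words), delay in zip(bursts, phase_delays):
        # Retime each burst from its DQS domain into the local clock domain.
        aligned.append((start + delay, words))
    aligned.sort(key=lambda b: b[0])
    out_start, out = aligned[0][0], list(aligned[0][1])
    cursor = out_start + len(out)
    for start, words in aligned[1:]:
        assert start <= cursor, "gap: bursts are not seamless after alignment"
        out.extend(words)  # concatenate the next burst back-to-back
        cursor += len(words)
    return out_start, out
```

For example, a four-word burst starting at cycle 0 and a second burst captured at cycle 3 with a one-cycle calibrated delay splice into a single eight-word burst with no intervening idle cycles.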
[0038] The initialization and configuration logic 282 is capable of
using internally stored initialization and configuration logic to
optionally configure all other logic blocks and signal interfaces
in the interface circuit 250. In one embodiment, the emulation and
command translation logic 280 is able to receive a configuration
request from the system control signal interface 272, and configure
the emulation and command translation logic 280 to adopt different
personalities.
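The calibration flow performed by the initialization and configuration logic 282 (described further with FIG. 4C) can be sketched as follows; the helper names and the nanosecond-based arithmetic are assumptions for illustration, not the patent's implementation:

```python
def calibrate_interface(read_test_burst, expected_pattern, clock_period_ns):
    """Issue a calibration read of known test data, measure when it arrives
    relative to the interface circuit's clock, and return the delay (ns) to
    program into this memory data signal interface so that later bursts from
    the attached device land phase-aligned to the local clock.

    read_test_burst() returns (arrival_time_ns, data) for one calibration
    read; raises if the test pattern did not read back correctly."""
    arrival_ns, data = read_test_burst()
    if data != expected_pattern:
        raise RuntimeError("calibration read failed: bad test pattern")
    # Phase of arrival within one period of the interface circuit's clock.
    phase_ns = arrival_ns % clock_period_ns
    # Delay by the remainder of the period so the burst aligns to a clock edge.
    return (clock_period_ns - phase_ns) % clock_period_ns
```

Running this once per memory data signal interface yields the independent, per-device delays that the data path logic then applies when merging bursts.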
[0039] More illustrative information will now be set forth
regarding various optional architectures and features of different
embodiments with which the foregoing frameworks may or may not be
implemented, per the desires of the user. It should be noted that
the following information is set forth for illustrative purposes
and should not be construed as limiting in any manner. Any of the
following features may be optionally incorporated with or without
the other features described.
Industry-Standard Operation
[0040] In order to discuss specific techniques for inter- and
intra-device delays, some discussion of access commands and how
they are used is foundational.
[0041] Typically, access commands directed to industry-standard
memory systems such as DDR2 and DDR3 SDRAM memory systems may be
required to respect command-scheduling constraints that limit the
available memory bandwidth. Note: the use of DDR2 and DDR3 in this
discussion is purely illustrative, and is not to be construed as
limiting in scope.
[0042] In modern DRAM devices, the memory storage cells are
arranged into multiple banks, each bank having multiple rows, and
each row having multiple columns. The memory storage capacity of
the DRAM device is equal to the number of banks times the number of
rows per bank times the number of columns per row times the number
of storage bits per column. In industry-standard DRAM devices (e.g.
SDRAM, DDR, DDR2, DDR3, and DDR4 SDRAM, GDDR2, GDDR3 and GDDR4
SGRAM, etc.), the number of banks per device, the number of rows
per bank, the number of columns per row, and the column sizes are
determined by a standards-setting organization such as JEDEC. For
example, the JEDEC standards require that a 1 Gb DDR2 or DDR3 SDRAM
device with a four-bit wide data bus have eight banks per device,
8192 rows per bank, 2048 columns per row, and four bits per column.
Similarly, a 2 Gb device with a four-bit wide data bus must have
eight banks per device, 16384 rows per bank, 2048 columns per row,
and four bits per column. A 4 Gb device with four-bit wide data bus
must have eight banks per device, 32768 rows per bank, 2048 columns
per row, and four bits per column. In the 1 Gb, 2 Gb and 4 Gb
devices, the row size is constant, and the number of rows doubles
with each doubling of device capacity. Thus, a 2 Gb or a 4 Gb
device may be emulated by using multiple 1 Gb and 2 Gb devices, and
by directly translating row-activation commands to row-activation
commands and column-access commands to column-access commands. This
emulation is possible because the 1 Gb, 2 Gb, and 4 Gb devices all
have the same row size.
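The row-size-preserving emulation described above amounts to a direct address translation; the sketch below is an illustrative assumption, not the patent's implementation. With two physical devices of half capacity, the extra row-address bit of the emulated device simply selects which physical device receives the row-activation command:

```python
def translate_row_activate(bank, row, rows_per_bank_small):
    """Map an ACT(bank, row) aimed at the emulated (double-capacity) device
    onto (device_index, bank, row) for one of two smaller physical devices
    that have the same row size. Because the row size is unchanged, the
    row-activation translates one-for-one; only the device select differs."""
    device = row // rows_per_bank_small        # top row-address bit selects the device
    physical_row = row % rows_per_bank_small   # remaining bits address the row
    return device, bank, physical_row
```

Column-access commands translate the same way, which is why this emulation works between generations whose row size is constant but fails when the row size doubles, as discussed next.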
[0043] The JEDEC standards require that an 8 Gb device with a
four-bit wide data bus interface must have eight banks per device,
32768 rows per bank, 4096 columns per row, and four bits per
column--thus doubling the row size of the 4 Gb device.
Consequently, an 8 Gb device cannot necessarily be emulated by
using multiple 1 Gb, 2 Gb or 4 Gb devices and simply translating
row-activation commands to row-activation commands and
column-access commands to column-access commands.
[0044] Now, with an understanding of how access commands are used,
the following presents various additional techniques that may
optionally be employed in different embodiments to address various
possible issues.
[0045] FIG. 3A illustrates a timing diagram for multiple memory
devices (e.g., SDRAM devices) in a low data rate memory system,
according to prior art. FIG. 3A illustrates that multiple SDRAM
devices in a low data rate memory system can share the data bus
without needing idle cycles between data bursts. That is, in a low
data rate system, the inter-device delays involved are small
relative to a clock cycle. Therefore, multiple devices may share
the same bus: even though there may be some timing uncertainty when
one device stops being the bus master and another device becomes
the bus master, the data cycle is not delayed or corrupted. This
scheme, using time division access to the bus, has been shown to
work for time multiplexed bus masters in low data rate memory
systems--without the requirement to include idle cycles to switch
between the different bus masters.
[0046] As the speed of the clock increases, the inter- and
intra-device delays comprise successively more and more of a clock
cycle (as a ratio). At some point, these delays become sufficiently
large (relative to a clock cycle) that the multiple devices on a
shared bus must be managed. In particular, and as shown in FIG. 3B,
a one cycle delay is then needed between the end of a read data
burst of a first device on a shared bus and the beginning of a read
data burst of a second device on the same bus. FIG. 3B illustrates
that, at the clock rate shown, multiple memory devices (e.g., DDR
SDRAM, DDR2 SDRAM, DDR3 SDRAM devices) sharing the data bus must
necessarily incur minimally a one cycle penalty when switching from
one memory device driving the data bus to another memory device
driving the data bus.
[0047] FIG. 3C illustrates a timing diagram for multiple memory
devices in a high data rate memory system, according to prior art.
FIG. 3C shows command cycles, timing constraints 310 and 320, and
idle cycles 330 on the memory data bus. As the clock rate further increases, the
inter- and intra-device delay may become as long as one or more
clock cycles. In such a case, switching between a first memory
device and a second memory device would introduce one or more idle
cycles 330. Embodiments of the invention herein might be
advantageously applied to reduce or eliminate idle time 330 between
the data transfers 328 and 329.
[0048] Continuing the discussion of FIG. 3C, the timing diagram
shows a limitation preventing full bandwidth utilization in a DDR3
SDRAM memory system. For example, in an embodiment involving DDR3
SDRAM memory systems, any two row-access commands directed to a
single DRAM device may not be scheduled closer together than a
period of time defined by the timing parameter tRRD. As another
example, at most four row-access commands may be scheduled within a
period of time defined by the timing parameter tFAW to a single
DRAM device. Moreover, consecutive column-read access commands and
consecutive column-write access commands cannot be scheduled to a
given DRAM device any closer than tCCD, where tCCD equals four
cycles (eight half-cycles of data) in DDR3 DRAM devices. This
situation is shown in the left portion of the timing
diagram of FIG. 3C at 305. Row-access or row-activation commands
are shown as ACT in the figures. Column-access commands are shown
as READ or WRITE in the figures. Thus, for example, in memory
systems that require a data access in a data burst of four
half-cycles as shown in FIG. 3C, the tCCD constraint prevents
column accesses from being scheduled consecutively. FIG. 3C shows
that the constraints 310 and 320 imposed on the DRAM commands sent
to a given device restrict the command rate, resulting in idle
cycles or bubbles 330 on the data bus and reducing the bandwidth.
Again, embodiments of the invention herein might be advantageously
applied to reduce or eliminate idle time 330 between the data
transfers 328 and 329.
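Purely as a hedged illustration of how these per-device constraints produce idle cycles (the tRRD and tFAW cycle counts below are expository placeholders; only tCCD = 4 is given in the text above for DDR3), the scheduling rules may be sketched as:

```python
# Illustrative sketch only: earliest legal issue cycles for commands
# directed to a single DRAM device under tRRD, tFAW, and tCCD.

class SingleDeviceScheduler:
    def __init__(self, tRRD=4, tFAW=20, tCCD=4):
        self.tRRD, self.tFAW, self.tCCD = tRRD, tFAW, tCCD
        self.acts = []        # issue cycles of past ACT commands
        self.last_col = None  # issue cycle of the last column access

    def issue_activate(self, requested):
        t = requested
        if self.acts:
            t = max(t, self.acts[-1] + self.tRRD)   # tRRD spacing
        if len(self.acts) >= 4:
            t = max(t, self.acts[-4] + self.tFAW)   # at most 4 ACTs per tFAW
        self.acts.append(t)
        return t

    def issue_column_access(self, requested):
        t = requested
        if self.last_col is not None:
            t = max(t, self.last_col + self.tCCD)   # tCCD spacing
        self.last_col = t
        return t
```

With a four half-cycle burst occupying two clock cycles, the tCCD = 4 spacing leaves two idle data-bus cycles between consecutive bursts to the same device, corresponding to the bubbles 330 described above.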
[0049] As illustrated in FIGS. 3A-3C, idle-cycle-less data bus
switching was possible with slower speed DRAM memory systems, such
as SDRAM memory systems, but not with higher speed DRAM memory
systems such as DDR SDRAM, DDR2 SDRAM, and DDR3 SDRAM devices. The
reason is that in any memory system where multiple memory devices
share the same data bus, the skew and jitter characteristics of
address, clock, and data signals introduce timing uncertainties
into the access protocol of the memory system. When the memory
controller wishes to stop accessing one memory device and switch to
accessing a different device, the differences in address, clock,
and data signal skew and jitter characteristics of the two
different memory devices reduce the amount of time that the memory
controller can use to reliably capture data. A slow-speed SDRAM
memory system is designed to operate at speeds no higher than 200
MHz, with data bus cycle times longer than 5 nanoseconds (ns).
Consequently, timing uncertainties introduced by inter-device skew
and jitter characteristics may be tolerated as long as they are
sufficiently smaller than the cycle time of the memory system--for
example, 1 ns. However, in higher speed memory systems, where data
bus cycle times are comparable in duration to, or shorter than, one
nanosecond, a one-nanosecond uncertainty in skew or jitter between
signal timing from different devices means that memory controllers
can no longer reliably capture data from different devices without
accounting for the inter-device skew and jitter characteristics.
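The timing argument above can be restated, purely for exposition, as a small calculation (the tolerance fraction is an assumption introduced here; the 200 MHz / 5 ns and 1 ns figures restate the example in the text):

```python
# Illustrative sketch only: skew/jitter uncertainty that is a small
# fraction of the data-bus cycle time can be absorbed without idle
# cycles; otherwise whole idle cycles must cover the uncertainty.
import math

def turnaround_cycles(skew_ns, cycle_ns, tolerable_fraction=0.2):
    if skew_ns <= tolerable_fraction * cycle_ns:
        return 0                          # seamless hand-off possible
    return math.ceil(skew_ns / cycle_ns)  # idle cycles required
```

Under this sketch, a 5 ns-cycle SDRAM system absorbs 1 ns of inter-device skew with no idle cycles, while a 1 ns-cycle system cannot.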
[0050] As illustrated in FIG. 3B, DDR SDRAM, DDR2 and DDR3 SDRAM
memory systems use the DQS signal to provide a source-synchronous
timing reference between the DRAM devices and the memory
controller. The use of the DQS signal provides accurate timing
control at the cost of idle cycles that must be incurred when a
first bus master (DRAM device) stops driving the DQS signal, and a
second bus master (DRAM device) starts to drive the DQS signal for
at least one cycle before the second bus master places the data
burst on the shared data bus. The placement of multiple DRAM
devices on the same shared data bus is a desirable configuration
from the perspective of enabling a higher capacity memory system
and providing a higher degree of parallelism to the memory
controller. However, the required use of the DQS signal
significantly lowers the sustainable bandwidth of the memory
system.
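The bandwidth cost of the DQS hand-off noted above admits a simple back-of-the-envelope sketch (the burst and turnaround parameters are expository assumptions, not figures from the text):

```python
# Illustrative arithmetic only: sustained data-bus utilization when
# every switch between bus masters (DRAM devices) costs idle
# turnaround cycles, versus a seamless (zero-turnaround) hand-off.

def bus_utilization(burst_cycles, turnaround_cycles):
    # Fraction of cycles carrying data when consecutive bursts come
    # from different devices sharing the bus.
    return burst_cycles / (burst_cycles + turnaround_cycles)
```

For example, a four-cycle burst with a one-cycle DQS hand-off sustains only 80% of the peak bandwidth, while eliminating the turnaround restores full utilization.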
[0051] The advantage of the infrastructure-compatible burst merging
interface circuit 250 illustrated in FIGS. 2B and 2C and described
in greater detail below is that it can provide the higher capacity,
higher parallelism that the memory controller desires while
retaining the use of the DQS signal in an infrastructure-compatible
system to provide the accurate timing reference for data
transmission that is critical for modern memory systems, without
the cost of the idle cycles required for the multiple bus masters
(DRAM devices) to switch from one DRAM device to another.
Elimination of Idle Data-Bus Cycles Using an Interface Circuit
[0052] FIG. 4A illustrates a data flow diagram through the data
signal interfaces 278, Data Path Logic 281 and System Data Signal
Interface 274 of FIG. 2C, showing how data bursts returned by
multiple memory devices in response to multiple, independent read
commands to different memory devices connected respectively to Data
Path A, synchronized by Data Strobe A, Data Path B, synchronized by
Data Strobe B, and Data Path C, synchronized by Data Strobe C are
combined into a larger contiguous burst, according to one
embodiment of the present invention. In particular, data burst B
(B0, B1, B2, B3) 4A20 is slightly overlapping with data burst A
(A0, A1, A2, A3) 4A10. Also, data burst C 4A30 overlaps with
neither data burst A 4A10 nor data burst B 4A20. As
described in greater detail in FIGS. 4C and 4D, various logic
components of the interface circuit 250 illustrated in FIG. 2C are
configured to re-time overlapping or non-overlapping bursts to
obtain the contiguous burst of data 4A40. In various embodiments, the
logic required to implement the ordering and concatenation of
overlapping or non-overlapping bursts may be implemented using
registers, multiplexors, and combinational logic. As shown in FIG.
4A, the assembled, contiguous burst of data 4A40 is indeed
contiguous and properly ordered.
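The ordering and concatenation just described may be sketched, purely for exposition, as follows (a real implementation would use registers, multiplexors, and combinational logic as noted above; the list operations and sample arrival times here are assumptions for illustration):

```python
# Illustrative sketch only: merging the three staggered bursts of
# FIG. 4A into one contiguous, properly ordered burst.

def merge_bursts(bursts):
    """bursts: list of (arrival_time, beats) per data path, listed in
    command-issue order (A, B, C, ...)."""
    merged = []
    for _arrival, beats in bursts:   # command order, not arrival order
        merged.extend(beats)         # concatenate with no idle beats
    return merged

bursts = [
    (0, ["A0", "A1", "A2", "A3"]),
    (3, ["B0", "B1", "B2", "B3"]),   # slightly overlaps burst A
    (9, ["C0", "C1", "C2", "C3"]),   # arrives after a gap
]
```

Whether the incoming bursts overlap or are separated by gaps, the merged output carries no idle beats between them.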
[0053] FIG. 4A shows that the data returned by the memory devices
can have different phase relationships relative to the clock signal
of the interface circuit 250. FIG. 4D shows how the interface
circuit 250 may use the knowledge of the independent clock-to-data
phase relationships to delay each data burst to the interface
circuit 250 to the same clock domain, and re-drive the data bursts
to the system interface as one single, contiguous, burst.
[0054] FIG. 4B illustrates a waveform corresponding to FIG. 4A
showing how the three time separated bursts from three different
memory devices are combined into a larger contiguous burst,
according to one embodiment of the present invention. FIG. 4B shows
that, as viewed from the perspective of the interface circuit 250,
the data burst A0-A1-A2-A3, arriving from one of the memory
components 210 to memory data signal interface A as a response to
command (Cmd) A issued by the memory controller 225, can have a
data-to-clock relationship that is different from data burst
B0-B1-B2-B3, arriving at memory signal interface B, and a data
burst C0-C1-C2-C3 can have yet a third clock-to-data timing
relationship with respect to the clock signal of the interface
circuit 250. FIG. 4B shows that once the respective data bursts are
re-synchronized to the clocking domain of the interface circuit
250, the different data bursts can be driven out of the system data
interface Z as a contiguous data burst.
[0055] FIG. 4C illustrates a flow diagram of method steps showing
how the interface circuit 250 can optionally make use of a training
or clock-to-data phase calibration sequence to independently track
the clock-to-data phase relationship between the memory components
210 and the interface circuit 250, according to one embodiment of
the present invention. In implementations where the clock-to-data
phase relationships are static, the training or calibration
sequence is not needed to set the respective delays in the memory
data signal interfaces. While the method steps are described with
relation to the computer platform 201 illustrated in FIGS. 2B and
2C, any system performing the method steps, in any order, is within
the scope of the present invention.
[0056] The training or calibration sequence is typically performed
after the initialization and configuration logic 282 receives
either an interface circuit initialization or calibration request.
The goal of the training or calibration sequence is to establish
the clock-to-data phase relationship between the data from a given
memory device among the memory components 210 and a given memory
data signal interface 278. The method begins in step 402, where the
initialization and configuration logic 282 selects one of the
memory data signal interfaces 278. As shown in FIG. 4C, memory data
signal interface A may be selected. Then, the initialization and
configuration logic 282 may, optionally, issue one or more commands
through the memory control signal interface 276 and optionally,
memory address signal interface 275, to one or more of the memory
components 210 connected to memory data signal interface A. The
commands issued through the memory controller signal interface 276
and optionally, memory address signal interface 275, will have the
effect of getting the memory components 210 to receive or return
previously received data in a predictable pattern, sequence, and
timing so that the interface circuit 250 can determine the
clock-to-data phase relationships between the memory device and the
specific memory data signal interface. In specific DRAM memory
systems such as DDR2 and DDR3 SDRAM memory systems, multiple
clocking relationships must all be tracked, including clock-to-data
and clock-to-DQS. For the purposes of this application, the
clock-to-data phase relationship is taken to encompass all clocking
relationships on a specific memory data interface, including but
not limited to clock-to-data and clock-to-DQS.
[0057] In step 404, the initialization and configuration logic 282
performs training to determine the clock-to-data phase relationship
between the memory data interface A and data from memory components
210 connected to the memory data interface A. In step 406, the
initialization and configuration logic 282 directs the memory data
interface A to set the respective delay adjustments so that
clock-to-data phase variances of each of the memory components 210
connected to the memory data interface A can be eliminated. In step
408, the initialization and configuration logic 282 determines
whether all memory data signal interfaces 278 within the interface
circuit 250 have been calibrated. If so, the method ends in step
410 with the interface circuit 250 entering the normal operation
regime. If, however, the initialization and configuration logic 282
determines that not all memory data signal interfaces 278 have been
calibrated, then in step 412, the initialization and configuration
logic 282 selects a memory data signal interface that has not yet
been calibrated. The method then proceeds to step 402, described
above.
[0058] The flow diagram of FIG. 4C shows that the memory data
signal interfaces 278 are trained sequentially, and after memory
data interface A has been trained, memory data interface B is
similarly trained, and respective delays set for data interface B.
The process is then repeated until all of the memory data signal
interfaces 278 have been trained and respective delays are set. In
other embodiments, the respective memory data signal interfaces 278
may be trained in parallel. After the calibration sequence is
complete, control returns to the normal flow diagram as illustrated
in FIG. 4D.
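The sequential training loop of FIG. 4C may be sketched, purely for exposition, as follows (`measure_phase` stands in for the training-pattern exchange of step 404, and the delay model of negating the measured phase is an assumption introduced here for illustration):

```python
# Illustrative sketch only of the sequential calibration loop of
# FIG. 4C (steps 402-412), ending in normal operation (step 410).

def calibrate_all(interfaces, measure_phase):
    delays = {}
    remaining = list(interfaces)      # e.g., ["A", "B", "C"]
    while remaining:                  # step 408: loop until all done
        iface = remaining.pop(0)      # steps 402/412: select interface
        phase = measure_phase(iface)  # step 404: training
        delays[iface] = -phase        # step 406: set compensating delay
    return delays                     # step 410: normal operation
```

In other embodiments, as noted above, the interfaces may instead be trained in parallel rather than by this sequential loop.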
[0059] FIG. 4D illustrates a flow diagram of method steps showing
the operations of the interface circuit 250 in response to the
various commands, according to one embodiment of the present
invention. While the method steps are described with relation to
the computer platform 201 illustrated in FIGS. 2B and 2C, any
system performing the method steps, in any order, is within the
scope of the present invention.
[0060] The method begins in step 420, where the interface circuit
250 enters the normal operation regime. In step 422, the system control
signal interface 272 determines whether a new command has been
received from the memory controller 225. If so, then, in step 424,
the emulation and command translation logic 280 translates the
address and issues the command to one or more memory components 210
through the memory address signal interface 275 and the memory
control signal interface 276. Otherwise, the system control signal
interface 272 waits for the new command (i.e., the method returns
to step 422, described above).
[0061] In the general case, the emulation and command translation
logic 280 may perform a series of complex actions to handle
different commands. However, a description of all commands is
not vital to the enablement of the seamless burst merging
functionality of the interface circuit 250, and the flow diagram in
FIG. 4D describes only those commands that are vital to the
enablement of the seamless burst merging functionality.
Specifically, the READ command, the WRITE command and the
CALIBRATION command are important commands for the seamless burst
merging functionality.
[0062] In step 426, the emulation and command translation logic 280
determines whether the new command is a READ command. If so, then
the method proceeds to step 428, where the emulation and command
translation logic 280 receives data from the memory component 210
via the memory data signal interface 278. In step 430, the
emulation and command translation logic 280 directs the data path
logic 281 to select the memory data signal interface 278 that
corresponds to one of the memory components 210 that the READ
command was issued to. In step 432, the emulation and command
translation logic 280 aligns the data received from the memory
component 210 to match the clock-to-data phase with the interface
circuit 250. In step 434, the emulation and command translation
logic 280 directs the data path logic 281 to move the data from the
selected memory data signal interface 278 to the system data signal
interface 274 and re-drives the data out of the system data signal
interface 274. The method then returns to step 422, described
above.
[0063] If, however, in step 426, the emulation and command
translation logic determines that the new command is not a READ
command, the method then proceeds to step 436, where the emulation
and command translation logic determines whether the new command is
a WRITE command. If so, then, in step 438, the emulation and
command translation logic 280 directs the data path logic 281 to
receive data from the memory controller 225 via the system data
signal interface 274. In step 440, the emulation and command
translation logic 280 selects the memory data signal interface 278
that corresponds to the memory component 210 that is the target of
the WRITE command and directs the data path logic 281 to move the
data from the system data signal interface 274 to the selected
memory data signal interface 278. In step 442, the selected memory
data signal interface 278 aligns the data from system data signal
interface 274 to match the clock-to-data phase relationship of the
data with the target memory component 210. In step 444, the memory
data signal interface 278 re-drives the data out to the memory
component 210. The method then returns to step 422, described
above.
[0064] If, however, in step 436, the emulation and command
translation logic determines that the new command is not a WRITE
command, the method then proceeds to step 446, where the emulation
and command translation logic determines whether the new command is
a CALIBRATION command. If so, then the method ends at step 448,
where the emulation and command translation logic 280 issues a
calibration request to the initialization and configuration logic
282. The calibration sequence has been described in FIG. 4C.
[0065] The flow diagram in FIG. 4D illustrates the functionality of
the burst merging interface circuit 250 for individual commands. As
an example, FIG. 4A illustrates the functionality of the burst
merging interface circuit for the case of three consecutive read
commands. FIG. 4A shows that data bursts A0, A1, A2 and A3 may be
received by Data Path A, data bursts B0, B1, B2 and B3 may be
received by Data Path B, and data bursts C0, C1, C2 and C3 may be
received by Data Path C, wherein the respective data bursts may all
have different clock-to-data phase relationships and, in fact, part of
the data bursts may overlap in time. However, through the mechanism
illustrated in the flow diagram contained in FIG. 4D, data bursts
from Data Paths A, B, and C are all phase aligned to the clock
signal of the interface circuit 250 before they are driven out of
the system data signal interface 274 and appear as a single
contiguous data burst with no idle cycles necessary between the
bursts. FIG. 4B shows that once the different data bursts from
different memory circuits are time aligned to the same clock signal
used by the interface circuit 250, the memory controller 225 can
issue commands with minimum spacing--constrained only by the full
utilization of the data bus--and the seamless burst merging
functionality occurs as a natural by-product of the clock-to-data
phase alignment of data from the individual memory components 210
connected via parallel data paths to the interface circuit 250.
[0066] FIG. 5A illustrates a compute platform 500A that includes a
platform chassis 510, and at least one processing element that
consists of or contains one or more boards, including at least one
motherboard 520. Of course, the platform 500A as shown might comprise
a single case and a single power supply and a single motherboard.
However, it might also be implemented in other combinations where a
single enclosure hosts a plurality of power supplies and a
plurality of motherboards or blades.
[0067] The motherboard 520 in turn might be organized into several
partitions, including one or more processor sections 526 consisting
of one or more processors 525 and one or more memory controllers
524, and one or more memory sections 528. Of course, as is known in
the art, the notion of any of the aforementioned sections is purely
a logical partitioning, and the physical devices corresponding to
any logical function or group of logical functions might be
implemented fully within a single logical boundary, or one or more
physical devices for implementing a particular logical function
might span one or more logical partitions. For example, the
function of the memory controller 524 might be implemented in one
or more of the physical devices associated with the processor
section 526, or it might be implemented in one or more of the
physical devices associated with the memory section 528.
[0068] FIG. 5B illustrates one exemplary embodiment of a memory
section, such as, for example, the memory section 528, in
communication with a processor section 526. In particular, FIG. 5B
depicts embodiments of the invention as is possible in the context
of the various physical partitions on structure 520. As shown, one
or more memory modules 530.sub.1-530.sub.N each contain one or more
interface circuits 550.sub.1-550.sub.N and one or more DRAMs
542.sub.1-542.sub.N positioned on (or within) a memory module
530.sub.1.
[0069] It must be emphasized that although the memory is labeled
variously in the figures (e.g., memory, memory components, DRAM,
etc.), the memory may take any form including, but not limited to,
DRAM, synchronous DRAM (SDRAM), double data rate synchronous DRAM
(DDR SDRAM, DDR2 SDRAM, DDR3 SDRAM, etc.), graphics double data
rate synchronous DRAM (GDDR SDRAM, GDDR2 SDRAM, GDDR3 SDRAM, etc.),
quad data rate DRAM (QDR DRAM), RAMBUS XDR DRAM (XDR DRAM), fast
page mode DRAM (FPM DRAM), video DRAM (VDRAM), extended data out
DRAM (EDO DRAM), burst EDO RAM (BEDO DRAM), multibank DRAM (MDRAM),
synchronous graphics RAM (SGRAM), phase-change memory, flash
memory, and/or any other type of volatile or non-volatile
memory.
[0070] Many other partition boundaries are possible and
contemplated, including positioning one or more interface circuits
550 between a processor section 526 and a memory module 530 (see
FIG. 5C), or implementing the function of the one or more interface
circuits 550 within the memory controller 524 (see FIG. 5D), or
positioning one or more interface circuits 550 in a one-to-one
relationship with the DRAMs 542.sub.1-542.sub.N and a memory module
530 (see FIG. 5E), or implementing the one or more interface circuits
550 within a processor section 526 or even within a processor 525
(see FIG. 5F). Furthermore, the system 220 illustrated in FIGS. 2B
and 2C is analogous to the computer platform 500 and 510
illustrated in FIGS. 5A-5F, the memory controller 225 illustrated
in FIGS. 2B and 2C is analogous to the memory controller 524
illustrated in FIGS. 5A-5F, the interface circuit 250 illustrated
in FIGS. 2B and 2C is analogous to the interface circuits 550
illustrated in FIGS. 5A-5F, and the memory components 210
illustrated in FIGS. 2B and 2C are analogous to the DRAMs 542
illustrated in FIGS. 5A-5F. Therefore, all discussions of FIGS. 2B,
2C, and 4A-4D apply with equal force to the systems illustrated in
FIGS. 5A-5F.
[0071] One advantage of the disclosed interface circuit is that the
idle cycles required to switch from one memory device to another
memory device may be eliminated while still maintaining accurate
timing reference for data transmission. As a result, memory system
bandwidth may be increased, relative to the prior art approaches,
without changes to the system interface or commands.
[0072] While the foregoing is directed to embodiments of the
present invention, other and further embodiments of the invention
may be devised without departing from the basic scope thereof.
Therefore, the scope of the present invention is determined by the
claims that follow.
* * * * *