U.S. patent application number 10/122113 was filed with the patent office on 2002-11-07 for high speed bus interface for non-volatile integrated circuit memory supporting continuous transfer.
This patent application is currently assigned to Dell Products, L.P.. Invention is credited to Dancer, Norman S., Frandeen, James W., Groff, Everett E., Harris, George W. JR., Nespor, Jeffery S., Nolan, Shari J..
Application Number | 20020166023 10/122113 |
Document ID | / |
Family ID | 27736902 |
Filed Date | 2002-11-07 |
United States Patent
Application |
20020166023 |
Kind Code |
A1 |
Nolan, Shari J. ; et
al. |
November 7, 2002 |
High speed bus interface for non-volatile integrated circuit memory
supporting continuous transfer
Abstract
A memory system with non-volatile integrated circuit memory
devices including an interface for a high speed bus is described,
supporting continuous writes at the bus speed, without the
possibility of buffer overrun during most conditions. The system
comprises a memory bus, a system buffer, an array of non-volatile
storage units, such as flash memory devices, and an interconnect
system supporting data transfer among the components. The array
includes sets and subsets of non-volatile storage units, referred
to herein for convenience as platters having multiple banks, banks
having multiple columns, and columns having multiple storage units.
The storage units comprise integrated circuit memory having page
buffers with input ports. In one example, the array includes two
platters, eight banks per platter, four columns per bank, and eight
storage units per column, for a total of 256 storage units. The
system buffer includes at least the same number of stores as
columns in each bank. The stores comprise FIFOs from one to
sixteen cycles deep. A triple nested loop is used to manage
continuous transfer of data from the high speed bus into the much
slower non-volatile integrated circuit memory.
Inventors: |
Nolan, Shari J.; (San Jose,
CA) ; Nespor, Jeffery S.; (Pleasanton, CA) ;
Harris, George W. JR.; (Mountain View, CA) ; Dancer,
Norman S.; (San Jose, CA) ; Groff, Everett E.;
(San Jose, CA) ; Frandeen, James W.; (Soquel,
CA) |
Correspondence
Address: |
Stephen A. Terrile
HAMILTON & TERRILE, LLP
PO Box 203518
Austin
TX
78720
US
|
Assignee: |
Dell Products, L.P.
|
Family ID: |
27736902 |
Appl. No.: |
10/122113 |
Filed: |
April 11, 2002 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10122113 | Apr 11, 2002 | |
09292536 | Apr 15, 1999 | 6401161 |
Current U.S.
Class: |
711/103 ;
711/157; 711/167; 711/5 |
Current CPC
Class: |
G06F 13/1689
20130101 |
Class at
Publication: |
711/103 ;
711/167; 711/157; 711/5 |
International
Class: |
G06F 012/00 |
Claims
What is claimed is:
1. An apparatus comprising: a plurality of banks of non-volatile
storage units, each bank having a number of columns of non-volatile
storage units, each non-volatile storage unit having an input
buffer for storing a page of data, the page having a page size,
having an input coupled to the input buffer accepting an input
portion of data of a page at a memory speed, the non-volatile
storage units storing the data of the page from the input buffer
within a memory write time; a plurality of interface buffers
coupled to the input bus; an input bus having an input bus speed
which is faster than the memory speed, the input bus being coupled
to the plurality of interface buffers; a bus system, connecting
each of the plurality of interface buffers being coupled to one of
the columns in each of the plurality of banks of non-volatile
storage units, supplying data from the plurality of buffers to the
inputs of the non-volatile storage units in selected at the memory
speed; and wherein the number of banks of non-volatile storage
units in each set being greater than or equal to the memory write
time multiplied by the memory speed divided by the page size and
the number of columns in each bank being greater than or equal to
the input bus speed divided by the memory speed.
2. The apparatus of claim 1, further comprising control logic for
accepting a burst data transfer including a Y-bit word every input
bus cycle for a plurality of cycles, over the input bus and storing
the Y-bit words of the burst data to the plurality of banks of
non-volatile storage units at the input bus speed.
3. The apparatus of claim 2, wherein the control logic further
comprises a logic for selecting a starting page in the plurality of
sets of non-volatile storage units for storing the data burst.
4. The apparatus of claim 2, wherein the control logic further
comprises a logic for providing a destination page and control
information to the plurality of banks of non-volatile storage
units.
5. The apparatus of claim 2, wherein the control logic further
comprises a logic for transferring portions of data from the
plurality of buffers to the plurality of banks of non-volatile
storage units on every interval of the input bus speed.
6. The apparatus of claim 5, wherein there are at least N interface
buffers f (f=0 to N-1) having a depth of Z cycles, at least N
columns c (c=0 to N-1) in each of at least M banks b
(b=0 to M-1), and the page buffers in the non-volatile memory units
include storage for at least X input cycles i (i=0 to X-1), and
wherein the logic employs a process supporting continuous writes of
16000 input bus cycles or more comprising writing data to bank b,
column c, page address i in the array at the input bus speed in a
given cycle i+c+b+Z from the interface buffer f, then incrementing
f and c for f and c going from 0 to N-1, and then incrementing i,
for i going from 0 to X-1, and then incrementing b for b going from
0 to M-1.
7. The apparatus of claim 1, wherein the input bus has an input bus
data width, and each of the plurality of buffers is capable of
accepting data the size of the input bus data width, and the
parallel combination of the input buffers of the non-volatile
storage units in each column are capable of accepting data the size
of the input bus data width.
8. The apparatus of claim 7, wherein the input bus data width is 64
bits.
9. The apparatus of claim 7, wherein a burst data transfer is
accepted over the input bus for storage in the plurality of sets of
non-volatile storage, and the burst data received in data portions,
each data portion being the size of the input data bus data width,
and the burst data transfer comprised of 16,384 data portions.
10. The apparatus of claim 1, wherein each column comprises one or
more integrated circuit non-volatile storage elements.
11. The apparatus of claim 10, wherein each non-volatile storage
element comprises a flash memory device.
12. The apparatus of claim 1, wherein the input bus speed is about
66 megahertz (MHz), the memory speed is about 16.5 MHz, the page
size is 512 Y-bit words, and the memory write time is greater than
100 microseconds.
13. The apparatus of claim 12, wherein the number of interface
buffers is four and the number of non-volatile memory banks is
eight.
14. A method for storing data from an input bus at an input bus
speed to an array of integrated circuit, non-volatile memory
devices, the memory devices including page buffers having storage
for at least X input cycles (i=0 to X-1) and accepting data at an
array speed which is slower than the input bus speed, the method
comprising: arranging the array of integrated circuit, non-volatile
memory devices in at least N columns c (c=0 to N-1)
in each of at least M banks b (b=0 to M-1); providing at least N
interface buffers f (f=0 to N-1) having a depth of Z cycles coupled
to the input bus for receiving data in input bus cycle n+f to
interface buffer f at the input bus clock speed for f going from 0
to N-1; and writing data to bank b, column c, page address i in the
array at the input bus speed in a given cycle i+c+b+Z from the
interface buffer f, then incrementing f and c for f and c going
from 0 to N-1, and then incrementing i, for i going from 0 to X-1,
and then incrementing b for b going from 0 to M-1.
15. The method of claim 14, wherein the array speed is less than
one third and greater than one fourth the input bus speed, X is
greater than or equal to 256, N is greater than or equal to 4, and
M is greater than or equal to 8.
16. The method of claim 14, wherein the input bus has an input bus
data width, and each of the plurality of buffers is capable of
accepting data the size of the input bus data width, and the
parallel combination of the input buffers of the non-volatile
storage units in each column are capable of accepting data the size
of the input bus data width.
17. The method of claim 16, wherein the input bus data width is 64
bits.
18. The method of claim 16, wherein a burst data transfer is
accepted over the input bus for storage in the plurality of sets of
non-volatile storage, and the burst data received in data portions,
each data portion being the size of the input data bus data width,
and the burst data transfer comprised of 16,384 data portions.
Description
BACKGROUND OF THE INVENTION
[0001] 1. Field of the Invention
[0002] This invention relates to the management of interfaces
between high speed buses and memory. In particular, the invention
relates to an arrangement of non-volatile integrated circuit
memory, such as flash memory, that supports operation with a high
speed bus.
[0003] 2. Description of the Related Art
[0004] Large scale data storage systems are being used in an
increasing variety of settings. Thus, flexibility in the design of
the access systems used with these systems is becoming increasingly
important. One approach to improving the flexibility which has
evolved is called a storage area network ("SAN"). In the SAN
environment, heterogeneous storage systems are being deployed which
allow for greater flexibility in the use and management of data. In
a SAN, the storage systems are interconnected by high-speed
communication channels, such as the fiber channel networks. Thus,
for the best performance, the interfaces to the memory systems in
the SAN must be as fast as possible.
[0005] One kind of memory system which has not been widely applied
to the SAN environment is non-volatile solid-state memory, such as
memory systems using integrated circuit flash memory devices. One
reason non-volatile solid state memory is not in wide-spread use
arises from the relatively slow processes used for storing data in
such devices. It is difficult for a system based on an array of
flash memory integrated circuits, for example, to keep up with a
high-speed communication channel feeding data.
[0006] The current generation of flash memory modules, represented
by devices such as the Toshiba TC5825FT, generally has a relatively
long write period which varies in length over the life of the
device from about 200 µs to as much as 1000 µs or more per
write cycle. Read operations are much faster, but can still take 10
µs or more. Furthermore, the memory modules have on-chip
buffers, which accept data bytes at a clock speed up to about 20
MHz, for example. Standard bus speeds are generally much faster and
carry eight bytes per cycle. For example, the PCI bus
typically operates at 33 or 66 MHz and carries 64 bits or 8
bytes per cycle. This means that there cannot be a write to the
flash memory module during each bus cycle.
[0007] In order to transfer data from a computer bus to flash
memory, typically a buffer is used. The buffer is designed to be
big enough to hold the data received over the bus as the flash
memory write cycles occur. For a representative system using
current generation flash memory modules, a 16 KB first in,
first out ("FIFO") buffer is required at the interface between the
flash device and a 66 MHz, 64 bit PCI bus. The buffers often
require extra board space, and are easily overrun by large data
transfer operations.
[0008] Thus, this configuration does not permit the flash memory to
be used in a sustained transfer of large files at the same speed as
the computer bus. Further, if a faster bus is used, the performance
of the flash memory becomes progressively worse compared to the
capacity of the bus.
[0009] Accordingly, what is needed is a method and apparatus for
interfacing a high speed bus with a flash memory or other
non-volatile solid state memory devices.
SUMMARY OF THE INVENTION
[0010] A memory system with an array of non-volatile solid state
memory devices including an interface for a high speed bus is
described, supporting continuous writes at the bus speed of very
large blocks of data, without the possibility of buffer overrun
during most conditions.
[0011] An apparatus comprises a memory bus, a plurality of
interface buffers, an array of non-volatile storage units, such as
flash memory devices, and an interconnect system supporting data
transfer among the components. The array includes sets and subsets
of non-volatile storage units, referred to herein for convenience
as platters having multiple banks, banks having multiple columns,
and columns having multiple storage units. In one example, the
array includes two platters, eight banks per platter, four columns
per bank, and eight storage units per column, for a total of 256
storage units. Of course other configurations fall within the
present invention using different combinations of units per column,
columns per bank, and banks per platter.
[0012] The non-volatile storage units each have an input buffer for
storing a page of data, and an input port coupled to input pins on
the unit and to the input buffer. The page size and the size of the
input port can vary, but for example, a page is 256, 512 or 1024
bytes, and the input port can accept one or two 8-bit bytes per
storage unit clock cycle.
[0013] In one embodiment supporting continuous writes, there are at
least N interface buffers f (f=0 to N-1), the interface buffers
having a depth of Z cycles, at least N columns c (c=0 to N-1) in
each of at least M banks b (b=0 to M-1), and the input buffers in
the non-volatile memory units include storage for at least X
addresses in a page (i=0 to X-1). Logic in the system employs a
process supporting continuous writes comprising writing data to
bank b, page address i, and column c in a given input cycle
i+c+b+Z from the interface buffer f to column c, for f and c going
from 0 to N-1, and then incrementing i, for i going from 0 to X-1,
and then incrementing b for b going from 0 to M-1. Z in preferred
implementations ranges from 1 to 16.
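The write order described in this paragraph can be sketched as three nested loops. This is only one reading of the text, assuming that the column index c (together with the interface buffer index f) advances fastest, then the page address i, then the bank b. The function name `write_schedule` and its parameter names are illustrative and do not come from the application.

```python
from itertools import islice


def write_schedule(num_banks, page_entries, num_columns):
    """Yield (bank, page_address, column, interface_buffer) in write order.

    Assumed ordering: column c and buffer f advance fastest, then page
    address i, then bank b.
    """
    for b in range(num_banks):            # b = 0 .. M-1
        for i in range(page_entries):     # i = 0 .. X-1
            for c in range(num_columns):  # c = f = 0 .. N-1
                yield b, i, c, c          # interface buffer f feeds column c


# Example parameters taken from the text: M = 8 banks, X = 512 page
# addresses, N = 4 columns per bank.
first_steps = list(islice(write_schedule(8, 512, 4), 6))
total_steps = sum(1 for _ in write_schedule(8, 512, 4))
print(first_steps)
print(total_steps)
```

With these example parameters the schedule visits 8 x 512 x 4 = 16,384 positions, which matches the number of bus-width data portions in one burst as recited in the claims.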
[0014] The memory speed at which the input buffer can accept data
can vary. In the following example, a typical speed of 16.5 MHz is
used. The non-volatile storage units take a certain write time to
store the page of data from the input buffer into the memory. The
columns of non-volatile storage units are each coupled to a
corresponding interface buffer by a memory bus. The memory bus
supplies data from the interface buffers to the inputs of the
non-volatile storage units at the memory speed. The input bus is
coupled to the interface buffers to supply them with data. The
input bus speed is typically several times faster than the memory
speed. For example, the input bus speed might be 66 MHz as compared
to a memory speed of 16.5 MHz. The write time for flash memory
devices includes a write wait time plus a setup time plus the time
to write the number of bytes required. For a column of eight
devices with one byte input ports, a bus eight bytes wide can
supply data to be written in one storage unit cycle in the column.
For an input buffer of 512 bytes, 512 storage unit cycles are
used to fill the input buffers of the column of devices. Thus, in
512 storage unit cycles, 4096 (4K) bytes are stored in the column
to be written into the non-volatile memory. The total time,
considering zero wait states, is one storage unit cycle for a
command, three cycles for address, 512 cycles for data, and the
memory wait time. Thus, this total time ranges, for example, from
about 232.182 µs to 1032.182 µs, with the bus coupled to the input
port busy for 32.182 µs.
[0015] With a 16.5 MHz storage unit clock, 4 interleaved columns
are used in each bank to keep up with a 66 MHz PCI bus. This
provides for storage of 16K bytes within each 32.182 µs per bank
interval at the speed of the incoming PCI bus. At the end of the
per bank interval, the system switches to the next bank on the
platter. The number of banks on the platter is selected so that a
total write time of, for example, about 250 µs elapses before the
system reverts
to the first bank. Multiple platters can be coupled in parallel
with logical memory addressing for added memory capacity or in a
series to handle longer write times.
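The figures in this paragraph can be checked with a little arithmetic. The sketch below assumes that one column fill moves 512 x 8 bytes and takes the 32.182 µs per bank interval as given in the text. The variable names are illustrative, and treating the rotation as either eight or seven intervals is an assumption about how the "about 250 µs" figure is counted.

```python
BYTES_PER_COLUMN_FILL = 512 * 8   # 512 storage unit cycles x 8 one-byte ports
COLUMNS_PER_BANK = 4
BANKS_PER_PLATTER = 8
BANK_INTERVAL_US = 32.182         # per bank interval, as stated in the text

# Data accepted during one per bank interval.
bytes_per_bank_interval = BYTES_PER_COLUMN_FILL * COLUMNS_PER_BANK

# Time for the controller to visit every bank once.
full_rotation_us = BANKS_PER_PLATTER * BANK_INTERVAL_US

# Time spent on the other banks before returning to a given bank.
other_banks_us = (BANKS_PER_PLATTER - 1) * BANK_INTERVAL_US

print(bytes_per_bank_interval, full_rotation_us, other_banks_us)
```

With eight banks, a full rotation comes to roughly 257 µs and the time spent away from any one bank to roughly 225 µs, both of the same order as the example write time.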
[0016] The number of non-volatile storage banks in each array is
going to be at least as great as the memory write time multiplied
by the memory speed divided by the page size. For example, if the
memory speed is 16.5 MHz, the page size is 512 bytes and the memory
write time is 200 µs, at least seven banks must be provided.
More can be provided and in one embodiment, eight banks are used
with these clock speed and input buffer parameters.
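The bank count in this example follows from rounding up the write time multiplied by the memory speed and divided by the page size. A minimal sketch of that calculation is given below. The function name `min_banks` and the choice of units (microseconds and MHz, whose product is a cycle count) are assumptions made for illustration.

```python
from fractions import Fraction
from math import ceil


def min_banks(write_time_us, memory_speed_mhz, page_size):
    """Smallest whole number of banks covering one memory write time.

    Microseconds multiplied by MHz gives storage unit cycles; dividing by
    the page size gives the number of page fills that fit in one write.
    """
    cycles = Fraction(str(write_time_us)) * Fraction(str(memory_speed_mhz))
    return ceil(cycles / page_size)


average_case = min_banks(200, 16.5, 512)   # 3300 / 512 = 6.45, rounded up
worst_case = min_banks(1000, 16.5, 512)    # 16500 / 512 = 32.2, rounded up
print(average_case, worst_case)
```

Exact fractions are used instead of floating point so that the rounding step cannot be affected by representation error.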
[0017] In one embodiment, the system includes control logic for
accepting burst data transfers over the input bus and storing the
burst data in the non-volatile storage units.
[0018] In one embodiment, the system includes logic for selecting a
starting page in the non-volatile storage units to store the data
burst.
[0019] In one embodiment, the system includes control logic for
providing a destination page and control information to the
non-volatile storage units.
[0020] In one embodiment, the system includes logic for enabling
the individual non-volatile storage columns. For example, the first
non-volatile storage unit of each of the banks can be enabled or
selected.
[0021] In one embodiment, the system includes logic for
transferring portions of data from the interface buffers to the
non-volatile storage columns at every interval of the input bus
speed.
[0022] In one embodiment, a triple round-robin is used to transfer
the data from the plurality of interface buffers to the
non-volatile storage units. The outermost round-robin selects one
of the columns in each set. The middle round-robin selects among
the entries of the page size of the input buffer in the
non-volatile storage units. The innermost round-robin selects one
of the banks in the plurality of banks in a round-robin fashion.
Then data is transferred from the selected interface buffer to the
selected column.
[0023] In one embodiment, the burst data is received in 16,384 data
portions each the width of the input bus of, for example, 64 bits
per portion.
BRIEF DESCRIPTION OF THE FIGURES
[0024] FIG. 1 is a block diagram of an interface between a high
speed bus and a non-volatile storage.
[0025] FIG. 2 is a block diagram of an arrangement of a set of
non-volatile storage units.
[0026] FIG. 3 is a timing diagram showing the relationship between
the operation of the high speed bus and the non-volatile
storage.
[0027] FIG. 4 is a process flow diagram demonstrating a method for
interfacing a high speed bus with non-volatile storage.
[0028] FIG. 5 is a process flow diagram demonstrating a method for
storing a data burst to a non-volatile storage.
DETAILED DESCRIPTION
[0029] A. System Overview
[0030] FIG. 1 is a block diagram of a memory system including an
interface between a high speed input bus 100 and an array of
non-volatile storage devices. This interface can be used to allow
non-volatile storage to match the speed and capacity of a high
speed input bus 100 such as a PCI bus. FIG. 1 shows the
configuration for interfacing flash memory non-volatile storage
units operating at 16.5 MHz, and having a write wait time of over
200 µs, with a 66 MHz, 64-bit wide PCI input
bus 100. Types of non-volatile storage other than flash memory can
be used. One of the characteristics of non-volatile storage units
is that they operate at a slower speed than a high speed computer
bus.
[0031] This paragraph lists the elements of the system shown in
FIG. 1. FIG. 1 includes a high speed input bus 100, a bridge chip
102, a local bus 104, a set of control lines 106, a controller 108,
first in, first out ("FIFO") interface buffers (herein, "interface
buffers") 110A-110D, a FIFO select 118, a set of control lines 120,
and banks of non-volatile storage units (herein also referred to as
"banks") 122A-122H. The banks of non-volatile storage units
122A-122H include columns of non-volatile storage units (herein
also referred to as "columns" or "columns of units") 130A-130D.
[0032] The input bus 100 is coupled to the bridge chip 102. The
local bus 104 couples the bridge chip 102 and the interface buffers
110A-110D. The set of control lines 106 couples the bridge chip 102
and the controller 108. The controller 108 is coupled to the
interface buffers 110A-110D by the FIFO select 118. The interface
buffers 110A-110D are coupled to the corresponding banks of
non-volatile storage units 122A-122H by the memory bus 140
operating at the memory unit clock speed (e.g. 16.5 MHz). The
interface buffers 110A-110D may be as small as one cycle deep, or
more preferably, four to sixteen cycles deep to allow for safety
against variations in transfer latencies. Each 64 or 66 bit wide
interface buffer 110A-110D is coupled respectively to a
corresponding column 130A-130D in the bank 122A, and to a
corresponding column of units 130A-130D in each of the other banks
of non-volatile storage units 122B-122H in this example. For the 64
bit wide embodiment of input bus 100, eight sets of eight bits from
each interface buffer 110A-110D are coupled in parallel to the
input ports of the eight memory units in the corresponding column
130A-130D. This way, 64 bits are written in parallel to the eight
bit input ports of eight chips, and in 512 such cycles, the input
buffers 200A, 202A, 204A, 206A, 208A, 210A, 212A, 214A (herein also
collectively "200A-214A"), shown in FIG. 2, on the chips in the
columns 130A-130D of a bank among banks 122A-122H are filled. The
controller 108 then connects the interface buffers 110A-110D to the
next bank among banks 122A-122H.
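The parallel write of one 64-bit entry to eight 8-bit input ports can be pictured as splitting the word into eight byte lanes, one per storage unit in the column. The sketch below is illustrative only. The lane order (lowest byte to the first unit) is an assumption, since the text does not say which bits go to which chip, and the function name is hypothetical.

```python
def split_into_byte_lanes(word64):
    """Split one 64-bit word into eight 8-bit values, one per storage unit."""
    if not 0 <= word64 < (1 << 64):
        raise ValueError("expected an unsigned 64-bit value")
    # Lane k carries bits 8k .. 8k+7 (assumed lane order).
    return [(word64 >> (8 * k)) & 0xFF for k in range(8)]


lanes = split_into_byte_lanes(0x0807060504030201)
print(lanes)
```

Each storage unit in the column then receives one byte per storage unit cycle, so 512 such cycles fill a 512-byte input buffer on each of the eight chips.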
[0033] The input bus 100 is a bus such as the 66 MHz 64 bit PCI
bus, or some other sort of bus supplying several gigabits per
second or more. Data flows over the input bus 100 into a bridge
chip 102 that decodes the control signals on the input bus 100. The
bridge chip 102 identifies data on the input bus 100 that is to be
stored in, or retrieved from, the non-volatile storage. The data
can temporarily reside on the bridge chip 102. In some embodiments,
the local bus 104 is coupled to a random access memory (not shown),
like high speed synchronous dynamic random access memory (SDRAM).
This additional memory can provide temporary storage of data prior
to the transfer of the data to the flash memory. This additional
memory may also be used to maintain a memory map or some other
table keeping track of where data is stored in the flash
memory.
[0034] The data is usually transferred across the input bus 100 in
data bursts. Each data burst will be comprised of a number of bus
size portions of data. In the case of the PCI input bus 100, the
data width is 64 bits. Also, the PCI input bus 100 can carry two
bits of parity information, making the total data width 66 bits if
parity information is being stored. In one embodiment, the typical
block of data sent in burst mode is 16,384 bits (16K bits) in 256
cycles at 64 bits per cycle. If parity is included on the input bus
100, 16,896 bits in 256 cycles with two bits of parity are
transferred. In one alternative, the two extra bits can be buffered
in a separate buffer 2 bits wide by 256 cycles deep. The parity
data in this embodiment is transferred to the non-volatile storage
units in 16 extra cycles. Alternatively, the columns 130A-130D and
interface buffers 110A-110D can be made 66 or more bits wide,
rather than 64, to accommodate real time, continuous parity data
transfer.
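The burst sizes quoted in this paragraph follow directly from the cycle count and the bus width. A short check, using illustrative constant names, is given below.

```python
BUS_CYCLES_PER_BURST = 256
DATA_BITS_PER_CYCLE = 64
PARITY_BITS_PER_CYCLE = 2

# Bits moved in one burst without parity.
data_bits = BUS_CYCLES_PER_BURST * DATA_BITS_PER_CYCLE

# Parity bits accumulated over the same burst.
parity_bits = BUS_CYCLES_PER_BURST * PARITY_BITS_PER_CYCLE

# Bits moved in one burst when parity travels with the data.
total_bits = data_bits + parity_bits

print(data_bits, parity_bits, total_bits)
```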
[0035] The controller 108 controls the flow of information from the
bridge chip 102 to the banks 122A-122H. The controller 108 also
maintains a table of where data is stored in the banks 122A-122H.
This table can be
maintained in the controller 108 or in a memory coupled to the
controller 108. The functions of the bridge chip 102 and the
controller 108 can be combined. The controller 108 may be a field
programmable gate array (FPGA), a microprocessor, or some other
type of controller. The controller 108 receives signals from the
bridge chip 102 over the set of control lines 106. The set of
control lines 106 indicates the operation to be performed. The
operations include, for example, read, write, block erase, setup
with and without parity, byte access, and idle.
[0036] The controller 108 responds to signals sent over the set of
control lines 106 by changing the signals on the FIFO select 118
and the set of control lines 120. The controller 108 can enable the
inputs to one or all of the interface buffers 110A-110D by altering
the signals sent over the FIFO select 118.
[0037] In the illustrated embodiment, the non-volatile storage
units that comprise the columns (e.g. 130A to 130D) of flash memory
units in the banks 122A-122H use the same inputs for addresses,
data, and instructions. Therefore, when addresses are being
provided from the bridge chip 102, or from some other source, the
controller 108 will enable all of the interface buffers 110A-110D.
Then, the controller 108 will transfer the address and instruction
information to selected columns (130A-130D) that comprise the banks
of non-volatile storage units 122A-122H from the interface buffers
110A-110D.
[0038] Once the actual data to be written to the non-volatile
storage is on the bridge chip 102, the controller 108 round-robins
the data into the interface buffers 110A-110D. In this example, the
interface buffer 110A would get the data from a first input bus
cycle after the address information. The interface buffer 110B
would get the data from a second input bus cycle. The interface
buffer 110C would get the data from the third input bus cycle. The
interface buffer 110D would get the data from the fourth input bus
cycle and the round-robin would start again at interface buffer
110A.
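The round-robin assignment described here amounts to taking the input bus cycle number modulo the number of interface buffers. In the sketch below the cycle index is assumed to start at zero with the first data cycle after the address information, and the function name `buffer_for_cycle` is hypothetical.

```python
INTERFACE_BUFFERS = ["110A", "110B", "110C", "110D"]


def buffer_for_cycle(data_cycle):
    """Return the interface buffer that receives the given data cycle.

    data_cycle counts input bus cycles from zero, starting at the first
    cycle after the address information.
    """
    return INTERFACE_BUFFERS[data_cycle % len(INTERFACE_BUFFERS)]


assignment = [buffer_for_cycle(n) for n in range(6)]
print(assignment)
```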
[0039] At the same time that the controller 108 is performing a
round robin on the input from the bridge chip 102 into the
interface buffers 110A-110D, the controller 108 is performing a
triple loop process to transfer the data from the front of the
interface buffers 110A-110D into the non-volatile storage units
200-214 across memory bus 140. The outermost loop selects among
the first to the fourth columns 130A-130D. The middle loop is on
the number of entries that make up each page of the input buffers
200A-214A of the non-volatile storage units 200-214. In this
example, the middle loop ranges over the 512 entries of 64 bits
each in the page, or 528 entries if parity information is being
stored in a separate buffer at the interface. The innermost loop
is on the banks 122A-122H.
[0040] The triply nested loop structure is such that on each clock
period of the clock on the input bus 100, one data portion is being
transferred to an interface buffer 110A-110D while another is being
stored into a column 130A-130D from an interface buffer 110A-110D.
The one to one or better mapping of input to output cycles on the
interface buffers 110A to 110D ensures that no overrun condition
will happen in normal circumstances, and supports continuous
transfer of data from a high speed input bus 100 to the
non-volatile storage units 200-214. Further, the interface buffers
110A-110D do not need to be very large. Because of the arrangement
of the non-volatile storage units into banks of non-volatile
storage units 122A-122H, an entry will be removed from an interface
buffer 110A-110D just as another entry is stored in the interface
buffer 110A-110D. For this reason, the interface buffers 110A-110D
can have a depth of 1, constituting a single entry register. In
some embodiments, each interface buffer 110A-110D has a depth of 16
entries. It is also not necessary to use a FIFO buffer, as other
types of buffers can be used. Each entry in the interface buffers
110A-110D should be capable of carrying the full data width of the
input bus 100, for example 64 bits of data. If parity information
is being preserved on the 64 bit PCI input bus 100, the data width
would be 66 bits, and an extra interface buffer of the same type as
110A-110D, as mentioned above, could be used because the parity
would be supplied at the end of the data with additional bus clock
cycles.
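The claim that small buffers suffice can be illustrated with a toy occupancy model. The sketch below is not the controller logic of the application. It assumes that each input bus cycle delivers one entry to the buffers in round-robin order, and that each buffer releases one entry every N bus cycles (one storage unit cycle), starting a fixed number of bus cycles after its first entry arrives. The function name and the `drain_delay` parameter are hypothetical.

```python
from collections import deque


def max_fifo_occupancy(num_buffers, bus_cycles, drain_delay):
    """Return the largest number of entries held by any buffer."""
    fifos = [deque() for _ in range(num_buffers)]
    peak = 0
    for t in range(bus_cycles):
        # Drain: the buffer filled drain_delay cycles ago releases an entry.
        if t >= drain_delay:
            f_out = (t - drain_delay) % num_buffers
            if fifos[f_out]:
                fifos[f_out].popleft()
        # Fill: the entry arriving in bus cycle t goes to buffer t mod N.
        fifos[t % num_buffers].append(t)
        peak = max(peak, max(len(q) for q in fifos))
    return peak


depth_needed_fast = max_fifo_occupancy(4, 16384, drain_delay=1)
depth_needed_slow = max_fifo_occupancy(4, 16384, drain_delay=8)
print(depth_needed_fast, depth_needed_slow)
```

In this model the occupancy stays bounded for the whole burst because each buffer is drained at the same rate at which it is filled. A longer drain delay only raises the fixed depth required, which is consistent with the text's suggestion that a few entries of depth provide margin against latency variations.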
[0041] In the example shown, the banks of non-volatile storage
units 122A-122H comprise four columns (e.g. 130A-130D) of
non-volatile storage units. In this example, each column 130A-130D
comprises eight non-volatile storage units 200, 202, 204, 206, 208,
210, 212, 214 (herein also collectively "200-214"), shown in FIG.
2. The columns of non-volatile storage units 130A-130D are part of
the banks of non-volatile storage units 122A-122H.
[0042] Each non-volatile storage unit 200-214 may comprise
multiple non-volatile storage elements. One type of non-volatile
storage that can be used is flash memory. In one embodiment,
Toshiba TC8256FT flash memory elements are used. Each Toshiba
TC8256FT flash memory module holds 64 M bits, or 8 M bytes, without
parity. In embodiments supporting parity, the chips have
additional capacity to store the parity bits. The modules are
organized into 16,384 pages of 512 entries of 64 bits each, or
528 entries if parity information is being stored.
[0043] The Toshiba TC8256FT flash memory elements receive data 8
bits at a time. For that reason, multiple Toshiba TC8256FT flash
memory modules will be grouped to form a single column of
non-volatile storage units (i.e., one of columns 130A-130D) capable
of holding the full data width of the input bus 100. In the case of
PCI, there are 64 bits of data; accordingly, each of the columns of
non-volatile storage units 130A-130D could be comprised of eight
Toshiba TC8256FT flash memory elements. In this configuration, each
column of non-volatile storage units 130A-130D has 64 MB of memory
and each bank of non-volatile storage units 122A-122H has 256 MB of
memory, for a total storage capacity of 2 GB of flash memory per
platter. Depending on the application, larger or smaller flash
memory units may be used.
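The capacities in this paragraph follow from multiplying out the example configuration of eight elements per column, four columns per bank, and eight banks per platter. A short check, with illustrative constant names, is given below.

```python
MB_PER_ELEMENT = 8          # 64 Mbit per flash element, without parity
ELEMENTS_PER_COLUMN = 8
COLUMNS_PER_BANK = 4
BANKS_PER_PLATTER = 8

mb_per_column = MB_PER_ELEMENT * ELEMENTS_PER_COLUMN
mb_per_bank = mb_per_column * COLUMNS_PER_BANK
mb_per_platter = mb_per_bank * BANKS_PER_PLATTER

print(mb_per_column, mb_per_bank, mb_per_platter)
```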
[0044] The example shown is for a 66 MHz PCI input bus 100 with one
type of non-volatile storage units 200-214, the Toshiba TC8256FT
flash memory module. More generally, the configuration of sets and
non-volatile storage units 200-214 can be computed based on the
timing characteristics of the input bus 100 and the non-volatile
storage units 200-214 used in the system. The minimum number of
interface buffers 110A-110D can be computed by using Equation 1:
$$\frac{\text{bus speed}}{\text{memory speed}} \qquad (1)$$
[0045] The bus speed is the clock speed at which the input bus 100
is running. The memory speed is the clock speed at which the input
buffer 200A-214A of the non-volatile storage unit 200-214 can
accept data. For a 100 MHz input bus 100 and a non-volatile storage
unit 200-214 with [a] an input buffer 200A-214A capable of
accepting data at 16.5 MHz, the required number of buffers
110A-110D would be the next higher integer from [ 100/16.5]
(100/16.5), or 7. If the [page] input buffers 200A-214A of the
non-volatile storage units 200-214 could accept data at 20 MHz, the
same 100 MHz bus would only require 5 columns 130A-130D. The number
of columns of non-volatile storage units 130A-130D in each bank
122A-122H is identical to or greater than the number of interface
buffers 110A-110D.
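As a rough illustration of Equation 1, the following Python sketch computes the minimum number of interface buffers as the bus speed divided by the memory speed, rounded up to the next integer. The function name `min_interface_buffers` is hypothetical and not part of the described system; clock rates are passed in kHz as integers (an assumption of this sketch) so that the rounding is exact.

```python
def min_interface_buffers(bus_khz: int, memory_khz: int) -> int:
    """Equation 1 rounded up: ceil(bus speed / memory speed)."""
    if bus_khz <= 0 or memory_khz <= 0:
        raise ValueError("clock rates must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-bus_khz // memory_khz)


# Values from the text: a 100 MHz bus with 16.5 MHz and 20 MHz input buffers.
print(min_interface_buffers(100_000, 16_500))
print(min_interface_buffers(100_000, 20_000))
# The 66 MHz PCI example with 16.5 MHz input buffers.
print(min_interface_buffers(66_000, 16_500))
```

For the 66 MHz PCI example the ratio is exactly four, which matches the four interface buffers 110A-110D.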
[0046] The number of non-volatile storage units 200-214 in each set
can vary based on the characteristics of the non-volatile storage
unit 200-214 and the design specifications. If flash memory is
used, there may be different performance characteristics for the
non-volatile storage portion of the flash module over the lifetime
of the flash memory module. Depending on the application, a
different write time should be used to calculate the number of
non-volatile storage units 200-214 per set.
[0047] In some applications, the average write time should be used.
In others, the worst case numbers are more appropriate. For
example, the Toshiba TC8256FT flash memory module has a worst case
write time of 1000 µs, but an average write time over the useful
life of 200 µs. Depending on the application and the length of
time that the module will be used, a different write time should be
used in designing the configuration of the non-volatile storage. In
one embodiment, the average write time is used. In another
embodiment, the worst case write time is used.
[0048] The minimum number of banks per platter can be computed
using Equation 2:

$$\frac{\text{flash write time}}{\text{writes per page} \times \text{flash clock period}} = \frac{\text{flash write time} \times \text{flash clock rate}}{\text{writes per page}} \tag{2}$$
[0049] For example, if a 200 µs write time is used for the flash
memory units, then given the rate at which the input buffer
200A-214A of the non-volatile storage unit can accept data, 16.5
MHz, and the page size, 512 entries, the number of banks needed can
be computed using Equation 2. Here, the computation results in a
minimum number of banks of the next greater integer from

$$\frac{200\ \mu\text{s} \times 16.5\ \text{MHz}}{512} \approx 6.45,$$

[0050] or 7.
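The bank computation of Equation 2 can be sketched in Python as follows. This is a minimal illustration, not the described implementation: the function name `min_banks` and the choice of microseconds and kHz as units are assumptions made here so that the arithmetic stays exact.

```python
from fractions import Fraction
from math import ceil


def min_banks(write_time_us: int, flash_clock_khz: int, writes_per_page: int) -> int:
    """Equation 2 rounded up to the next integer."""
    if min(write_time_us, flash_clock_khz, writes_per_page) <= 0:
        raise ValueError("all arguments must be positive")
    # microseconds * kHz / 1000 = flash clock cycles elapsed during one page write
    cycles_per_write = Fraction(write_time_us * flash_clock_khz, 1000)
    # One bank absorbs writes_per_page cycles of input while another is writing.
    return ceil(cycles_per_write / writes_per_page)


# Average write time from the text: 200 microseconds at 16.5 MHz, 512 entries per page.
print(min_banks(200, 16_500, 512))
# Worst-case write time from the text: 1000 microseconds.
print(min_banks(1000, 16_500, 512))
```

The second call illustrates how strongly the choice between average and worst-case write time affects the required number of banks.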
[0051] In this example, eight banks are present in each platter,
one more than the computed minimum of seven. This is done because
the exact number of banks in each platter can be tuned to the
application. In one embodiment, the burst data transfer size is
16,384 64-bit portions. With eight banks of four columns each,
there are 32 columns of non-volatile storage units in total per
platter. Each column of non-volatile storage units has a page
buffer that can hold 512 64-bit pieces of information. Therefore,
with 32 columns of non-volatile storage units in eight banks, a
single page of all of the non-volatile storage units will hold the
data burst (512 × 32 = 16,384). The memory map is also simple with
this configuration because a block can be located by a single
address, its page number, which is the same in all of the flash
memory units. Further, using eight banks instead of seven allows a
greater tolerance for the flash memory to perform as slowly as
approximately 250 µs on write operations.
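The burst size and the write-time tolerance mentioned above can be checked numerically. The sketch below assumes the arrangement given in the abstract (eight banks per platter, four columns per bank) and the 16.5 MHz input rate and 512-entry page used elsewhere in this description; the function name `write_time_budget_us` is hypothetical.

```python
COLUMNS_PER_BANK = 4
ENTRIES_PER_PAGE = 512
FLASH_CLOCK_MHZ = 16.5


def write_time_budget_us(banks: int) -> float:
    """Time for the input stream to cycle through `banks` banks, in microseconds.

    Rearranging Equation 2, a page write must finish within this budget
    for the stream to continue without stalling.
    """
    # clock cycles divided by MHz yields microseconds
    return banks * ENTRIES_PER_PAGE / FLASH_CLOCK_MHZ


# One page across every column of eight banks holds a full burst.
burst_entries = 8 * COLUMNS_PER_BANK * ENTRIES_PER_PAGE

print(burst_entries)
print(round(write_time_budget_us(7), 1), round(write_time_budget_us(8), 1))
```

With seven banks the budget is a little over 217 µs; with eight it rises to roughly 248 µs, which is the source of the approximately 250 µs tolerance.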
[0052] The Toshiba TC8256FT flash memory elements use only a single
set of inputs to provide addressing, instructions, and data to the
flash memory module. Accordingly, the set of control lines 120 will
not provide address information if the Toshiba TC8256FT flash
memory element is used. Instead, the address and instructions are
provided over the same inputs that couple the [FIFO] interface
buffers [110A-D] 110A-110D to the non-volatile storage columns
130A-130D. In one embodiment, each block of data arrives as a data
burst of 16,384 64-bit entries, and accordingly an entire data
burst is stored on the same page in all of the flash memory units.
Thus, the
destination page and write instruction can be loaded into all of
the [FIFO] interface buffers [110A-D] 110A-110D with the FIFO
select 118 set so that all of the [FIFO] interface buffers
110A-110D get the destination page and write instruction. The
destination page and write instruction can then be transferred from
the [FIFO] interface buffers [110A-D] 110A-110D to all of the
non-volatile storage units 200-214 in the banks [122A-H] 122A-122H.
Depending upon the configuration of the set of control lines 120,
this may require a double loop through all of the columns 130A-130D
and all of the banks 122A-122H, or it may be possible to simply
loop through all of the buffers and activate all of the columns
130A-130D simultaneously.
[0053] B. Banks of Columns of Non-Volatile Storage Units
[0054] FIG. 2 is a block diagram of an arrangement of a column
[130] 130A of non-volatile storage units 200-214. FIG. 2 includes a
controller 108, [FIFO] interface buffer 110A, a FIFO select 118, a
set of control lines 120, and a column 130A of non-volatile storage
units [200, 202, 204, 206, 208, 210, 212, 214] 200-214. In each of
the eight banks a column (e.g. 230A) corresponding to a single
interface buffer 110A is connected to the interface buffer 110A.
The non-volatile storage column 130A is comprised of eight
non-volatile storage [elements] units 200-214. Each of the other
interface buffers 110B, 110C, and 110D is connected in a similar
fashion to corresponding columns (not shown) in the bank.
[0055] The controller 108 is connected to the [FIFO] interface
buffer 110A by the FIFO select 118. The [FIFO] interface buffer
110A is coupled to one non-volatile storage column 130A in each
bank by a 64 bit wide memory bus 140. The [bus] lines of memory bus
140 are then divided across the non-volatile storage [elements]
units that make up each column. Bits 0-7 of the memory bus 140 are
coupled to non-volatile storage [element] unit 200. Bits 8-15 are
coupled to non-volatile storage [element] unit 202, and so on. In
this fashion, the 64 bit memory bus 140 is coupled to the eight
8-bit non-volatile storage [elements] units 200-214 that [comprise]
constitute this non-volatile storage column 130A. The set of
control lines 120 is coupled to the chip enable, write enable and
other control inputs of the non-volatile storage units 200-214 in
each of the [column] columns 130A-130D.
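The division of the 64-bit memory bus 140 across the eight 8-bit storage units can be illustrated with a small Python sketch. The function name `split_into_byte_lanes` and the list `UNIT_NUMERALS` are hypothetical names introduced here; the only behavior taken from the description is that bits 0-7 go to unit 200, bits 8-15 to unit 202, and so on.

```python
UNIT_NUMERALS = [200, 202, 204, 206, 208, 210, 212, 214]


def split_into_byte_lanes(word: int) -> list[int]:
    """Split a 64-bit word into eight bytes, lane 0 holding bits 0-7."""
    if not 0 <= word < (1 << 64):
        raise ValueError("word must fit in 64 bits")
    return [(word >> (8 * lane)) & 0xFF for lane in range(8)]


word = 0x0807060504030201
for unit, byte in zip(UNIT_NUMERALS, split_into_byte_lanes(word)):
    # Each storage unit sees only its own 8-bit slice of the bus.
    print(f"unit {unit}: 0x{byte:02X}")
```

Because every unit in a column receives its slice on the same clock, one 64-bit entry is stored with a single transfer per column.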
[0056] Each of the non-volatile storage [elements] units 200-214 is
comprised of a non-volatile memory and an input buffer 200A-214A
that is capable of storing a page of data [and a non-volatile
memory]. [The] Each input buffer 200A-214A is loaded with the data
and then the non-volatile memory is written. [The] Each input
buffer 200A-214A is capable of accepting data at a limited rate.
Memory elements such as the Toshiba TC8256FT flash module can
accept data at rates up to 20 MHz. With current non-volatile
storage units, writing the buffered page to the non-volatile memory
takes a relatively long period such as 250 µs, which is several
thousand clock cycles of a clock running at 20 MHz. Other
non-volatile memory devices having read
while write capability, different page sizes, different input port
sizes, and the like can be utilized as well, with appropriate
changes in the bus widths and timing.
[0057] C. Timing
[0058] FIG. 3 is a timing diagram showing the relationship between
the operation of the high speed input bus 100 and the non-volatile
storage. FIG. 3 includes a Bus Clock 300, [a] an interface buffer
110A clock 302, [a] an interface buffer 110B clock 304, [a] an
interface buffer 110C clock 306, [a] an interface buffer 110D clock
308 (herein, "interface buffer clocks," or "clocks," collectively,
302-308), and reference points 310-326. In this example, the target
address is page 5, and the timing shown corresponds to the middle
of a transfer.
[0059] The bus clock 300 is running at 66 MHz. At each of the
reference points 310-326, a portion of the data burst is loaded
into one of the four [FIFO] interface buffers 110A-110D. At
reference point 310, interface buffer 110A is loaded. At reference
point 312, interface buffer 110B is loaded. At reference point 314,
interface buffer 110C is loaded. At reference point 316, interface
buffer 110D is loaded, and the process continues from reference
points 318-326. The clocks 302-308 for the interface buffers
110A-110D are running at 16.5 MHz. The clocks 302-308 for the
interface buffers 110A-110D each start on a rising edge of the
bus clock 300. However, each of the four
interface buffer clocks 302-308 starts on a different clock phase
so that the interface buffer clocks 302-308 are each one period of
the bus clock 300 off from one another. This enables the interface
buffers 110A-110D to be emptied in a round-robin fashion at the
same overall rate as the bus clock 300.
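The phase relationship among the four interface buffer clocks can be modeled with a simple sketch. Numbering the bus clock cycles from zero and assigning phase 0 to interface buffer 110A are assumptions of this illustration, as are the function names.

```python
NUM_BUFFERS = 4  # interface buffers 110A-110D


def buffer_for_bus_cycle(cycle: int) -> int:
    """Index of the interface buffer serviced on a given bus clock cycle."""
    return cycle % NUM_BUFFERS


def buffer_clock_edges(buffer_index: int, bus_cycles: int) -> list[int]:
    """Bus cycles on which the given buffer's quarter-rate clock ticks."""
    return [c for c in range(bus_cycles) if buffer_for_bus_cycle(c) == buffer_index]


for index, name in enumerate(["110A", "110B", "110C", "110D"]):
    # Each clock ticks every fourth bus cycle, offset by one bus period.
    print(name, buffer_clock_edges(index, 12))
```

Taken together, the four staggered clocks cover every bus cycle exactly once, which is why the buffers drain at the same overall rate as the bus.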
[0060] At reference point 310, interface buffer 110D clock 308 is
in the middle of transferring the [byte] entry 510 of page 5 from
[FIFO] interface buffer 110D to column 130D. Prior to reference
point 310, the first 509 entries have been loaded into all of the
input buffers 200A-214A and stored. Prior to reference point 310,
the 510th entry has been placed into the input buffers 200A-214A of
the first three columns 130A-130C. By reference point 312, the
transfer from [FIFO] interface buffer 110D of the 510th entry to
the input buffers 200A-214A of column 130D will be completed.
Although the transfer to the input [buffer] buffers 200A-214A of
the non-volatile storage unit is then complete, three more cycles
are required to finish the storing of the data in the device.
[0061] Now, the transfer of the 511th entry can begin. At each of
the reference points 310-316, one entry will be transferred from
the corresponding interface buffer 110A-110D to the 511th entry of
the input [buffer] buffers 200A-214A of the columns 130A-130D.
[0062] At reference point 318, the selected bank will change so
that the second unit in the platter of non-volatile storage units
200-214 receives data, in this example also at page 5, but not
necessarily so. This is important because, once the entry 511
(assuming no parity) was stored into the input buffer 200A-214A,
the page was filled and the input buffer 200A-214A will write out
the buffered data to the non-volatile memory units 200-214. In the
example shown in FIG. 3, the first selected bank is bank 122B, and
at reference point 318, the bank changes to bank 122C.
[0063] At reference points 318-324, the first [byte] entry of the
fifth page of the next bank will be written to the selected
non-volatile storage unit 200-214 in each of the sets from the
corresponding buffer.
[0064] Because the interface buffer clocks 302-308 correspond with
the bus clock 300, in the case where there is an interrupt on the
bus clock 300, the timing of any interface buffer clocks 302-308
can be held until the interrupt is complete.
[0065] D. Setup
[0066] FIG. 4 is a process flow diagram demonstrating a method for
interfacing a high speed bus with non-volatile storage.
[0067] The process starts at step 400, where a request is received
to store a data burst at a target address. In one embodiment, each
data burst is 16,384 64-bit entries. Other data burst sizes can be
supported.
[0068] Next, at step 404, addressing information and commands are
placed in the buffers. The addressing information is the target
page. The command is that a page is going to be written. By
providing this information to the columns, the input will be
prepared to receive data, and when each 64 bit word is received,
the input buffers of the non-volatile storage units will begin to
write that data to the column. In other embodiments, each
non-volatile storage unit has addressing and command lines separate
from the data lines. In that case, at step 404, the addressing and
commands are provided to the non-volatile storage units themselves
and control can proceed at step 408, skipping over step 406.
[0069] Next at step 406, the destination address and commands are
written to columns. Depending on the configuration of the control
lines and the buffers, it may be possible to do this in a single
loop through all of the buffers. In other configurations, a double
loop between each of the buffers and all of the columns may be
required.
[0070] Next at step 408, the data burst is received and stored in
the columns. Then the "write complete" of the page is verified.
This process can be performed by the method of FIG. 5.
[0071] The method can also support reading data bursts from the
non-volatile storage and placing them on the bus at high speed. The
method of FIG. 4 can be used by selecting a read location at step
402 and then loading the data from columns into the buffers and
then onto the bus at step 408.
[0072] E. Write Process
[0073] FIG. 5 is a process flow diagram demonstrating a method for
storing a data burst to non-volatile storage. This can be used at
step 408 of FIG. 4 to store the data burst into the non-volatile
storage.
[0074] The process starts at step 500, with an input location set
at bank b, column c, page address i. At step 502, that location is
written from the interface buffer f corresponding to column c.
Next, the algorithm determines whether all columns in the bank have
been written (step 504). If they have not all been written, then
the algorithm branches to step 506 and increments the column c
along with the interface buffer f. The process returns to step 502
to write the updated location. If at step 504 all the columns in
the bank have been written, then c is reset and the algorithm
determines whether all the bytes in the page have been written
(step 508). If not all bytes in the page have been written, then
the algorithm branches to step 510 and increments the parameter i.
It then branches to step 502 to write the updated location. If at
step 508 all the bytes in the page have been written, then i is
reset and the algorithm determines whether all the banks in the
platter have been written (step 512). If at step 512 more banks
need to be written, then the algorithm branches to step 514 to
increment the bank b. The algorithm then returns to step 502 to
write the updated location. If at step 512 all banks have been
written, then the process is done (step 516).
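The flow of FIG. 5 amounts to three nested loops, with columns advancing fastest, then the position within the page, then the bank. The following Python generator is a minimal sketch of that ordering only; the function name `write_order` is hypothetical, the actual transfer of data is omitted, and the interface buffer index f is assumed to equal the column index c.

```python
from typing import Iterator


def write_order(banks: int, columns: int, entries_per_page: int) -> Iterator[tuple[int, int, int]]:
    """Yield (bank b, column c, page position i) in the order locations are written."""
    for b in range(banks):                  # steps 512/514: next bank
        for i in range(entries_per_page):   # steps 508/510: next position in the page
            for c in range(columns):        # steps 504/506: next column and buffer
                yield b, c, i               # step 502: write this location


# A tiny configuration makes the ordering easy to inspect.
for location in write_order(banks=2, columns=2, entries_per_page=2):
    print(location)

# The example configuration (8 banks, 4 columns, 512 entries) covers one burst.
print(sum(1 for _ in write_order(8, 4, 512)))
```

Because the column index changes on every step, consecutive bus cycles always address different interface buffers, which is what allows each slower column to keep pace with the bus.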
[0075] This triply looped process enables one entry of information
to be moved from the bus to a FIFO buffer for each clock cycle of
the bus. The process also allows one entry to be moved from a FIFO
buffer to the column each clock cycle. This provides an interface
between the bus and the non-volatile storage.
[0076] The method can also support reading data bursts from the
non-volatile storage and placing them on the bus at high speed. The
method of FIG. 5 can be used by reading the next byte from the
column into the selected buffer at step 512 and moving the current
entry in the selected buffer onto the bus at step 514.
[0077] F. Conclusion
[0078] Thus, a method and apparatus for interfacing a high speed
bus with non-volatile storage have been described. The apparatus
supports matching a high speed bus such as a 66 MHz bus with the
much slower flash memory modules that may be used for non-volatile
storage to provide throughput equivalent to that of the bus.
[0079] The foregoing description of various embodiments of the
invention has been presented for purposes of illustration and
description. It is not intended to limit the invention to the
precise forms disclosed. Many modifications and equivalent
arrangements will be apparent.
ATTACHMENT B
Original Specification Showing Changes Highlighted
* * * * *