U.S. patent application number 10/157,541 was filed with the patent office on 2002-05-28 and published on 2002-10-03 as publication number 20020141244 for parallel erase operations in memory systems. Invention is credited to Bruce, Ricardo H. and Bruce, Rolando H.

United States Patent Application 20020141244
Kind Code: A1
Bruce, Ricardo H.; et al.
October 3, 2002

Parallel erase operations in memory systems
Abstract
An apparatus for and method of memory operation having a memory,
a cache containing a plurality of entries with a plurality of the
entries to be written to memory, a detector for detecting in the
cache the plurality of entries to be written to memory, and a
processor for erasing a first portion of the memory to accommodate
the plurality of entries to be written to memory and writing to the
first portion of the memory the plurality of entries to be written
to memory wherein an erase operation is followed by a plurality of
sequential write operations.
Inventors: Bruce, Ricardo H. (Union City, CA); Bruce, Rolando H. (S. San Francisco, CA)
Correspondence Address: Stephen R. Uriarte, 45550 Northport Loop E., Fremont, CA 94538, US
Family ID: 26941028
Appl. No.: 10/157,541
Filed: May 28, 2002
Related U.S. Patent Documents

Application Number | Filing Date  | Patent Number
10/157,541         | May 28, 2002 |
09/819,423         | Mar 27, 2001 |
60/250,642         | Nov 30, 2000 |
Current U.S. Class: 365/185.33; 711/E12.04
Current CPC Class: G11C 16/10 20130101; G06F 12/0804 20130101; G06F 2212/2022 20130101; G11C 16/16 20130101
Class at Publication: 365/185.33
International Class: G11C 016/04
Claims
The invention claimed is:
1. A method of memory operation comprising: providing a memory;
providing a cache containing a plurality of entries with a
plurality of the entries to be written to memory; detecting in the
cache the plurality of entries to be written to memory; erasing a
first portion of the memory to accommodate the plurality of entries
to be written to memory; and writing to the first portion of the
memory the plurality of entries to be written to memory wherein an
erase operation is followed by a plurality of sequential write
operations.
2. The method as claimed in claim 1 wherein the detecting uses
circuitry selected from a group of circuitry consisting of a local
processor, a direct memory access controller, a memory-specific
direct memory access controller, and a combination thereof.
3. The method as claimed in claim 1 including: detecting a second
plurality of entries to be written to memory; and writing to the
first portion of the memory the second plurality of entries to be
written to memory wherein a plurality of erase operations followed
by a plurality of sequential write operations is performed in
parallel.
4. The method as claimed in claim 1 including: detecting a second
plurality of entries to be written to memory; erasing a second
portion of the memory to accommodate the second plurality of
entries to be written to memory; and writing to the second portion
of the memory the second plurality of entries to be written to
memory wherein a plurality of erase operations followed by a
plurality of sequential write operations is performed in
parallel.
5. The method as claimed in claim 1 wherein the erasing is
performed on a basis selected from a group consisting of a
schedule, a demand, and a combination thereof.
6. A method of flash memory operation comprising: providing a flash
memory; providing a cache containing a plurality of entries with a
plurality of dirty entries to be written to flash memory; detecting
in the cache the plurality of dirty entries to be written to flash
memory; erasing a first portion of the flash memory to accommodate
the plurality of dirty entries to be written to flash memory; and
writing to the first portion of the flash memory the plurality of
dirty entries to be written to flash memory wherein an erase
operation is followed by a plurality of sequential write
operations.
7. The method as claimed in claim 6 wherein the detecting uses
circuitry selected from a group of circuitry consisting of a local
processor, a direct memory access controller, a flash-specific
direct memory access controller, and a combination thereof.
8. The method as claimed in claim 6 including: detecting a second
plurality of dirty entries to be written to flash memory; and
writing to the first portion of the flash memory the second
plurality of dirty entries to be written to flash memory wherein a
plurality of erase operations followed by a plurality of sequential
write operations is performed in parallel.
9. The method as claimed in claim 6 including: detecting a second
plurality of dirty entries to be written to flash memory; erasing a
second portion of the flash memory to accommodate the second
plurality of dirty entries to be written to flash memory; and
writing to the second portion of the flash memory the second
plurality of dirty entries to be written to flash memory wherein a
plurality of erase operations followed by a plurality of sequential
write operations is performed in parallel.
10. The method as claimed in claim 6 wherein the erasing is
performed on a basis selected from a group consisting of a
schedule, a demand, and a combination thereof.
11. A memory system comprising: a memory; a cache containing a
plurality of entries with a plurality of the entries to be written
to memory; a detector for detecting in the cache the plurality of
entries to be written to memory; and a processor for erasing a
first portion of the memory to accommodate the plurality of entries
to be written to memory and for writing to the first portion of the
memory the plurality of entries to be written to memory wherein an
erase operation is followed by a plurality of sequential write
operations.
12. The memory system as claimed in claim 11 wherein the detector uses
circuitry selected from a group of circuitry consisting of a local
processor, a direct memory access controller, a memory-specific
direct memory access controller, and a combination thereof.
13. The memory system as claimed in claim 11 wherein: the detector
detects a second plurality of entries to be written to memory; and
the processor writes to the first portion of the memory the second
plurality of entries to be written to memory wherein a plurality of
erase operations followed by a plurality of sequential write
operations is performed in parallel.
14. The memory system as claimed in claim 11 wherein: the detector
detects a second plurality of entries to be written to memory; and
the processor erases a second portion of the memory to accommodate
the second plurality of entries to be written to memory and writes
to the second portion of the memory the second plurality of entries
to be written to memory wherein a plurality of erase operations
followed by a plurality of sequential write operations is performed
in parallel.
15. The memory system as claimed in claim 11 wherein the processor performs erases on a basis selected from a group consisting of
a schedule, a demand, and a combination thereof.
16. A memory system of flash memory operation comprising: a flash
memory; a cache containing a plurality of entries with a plurality
of dirty entries to be written to flash memory; a detector for
detecting in the cache the plurality of dirty entries to be written
to flash memory; and a processor for erasing a first portion of the
flash memory to accommodate the plurality of dirty entries to be
written to flash memory and writing to the first portion of the
flash memory the plurality of dirty entries to be written to flash
memory wherein an erase operation is followed by a plurality of
sequential write operations.
17. The memory system as claimed in claim 16 wherein the detector
uses circuitry selected from a group of circuitry consisting of a
local processor, a direct memory access controller, a
flash-specific direct memory access controller, and a combination
thereof.
18. The memory system as claimed in claim 16 wherein: the detector
detects a second plurality of dirty entries to be written to flash
memory; and the processor writes to the first portion of the flash memory the
second plurality of dirty entries to be written to flash memory
wherein a plurality of erase operations followed by a plurality of
sequential write operations is performed in parallel.
19. The memory system as claimed in claim 16 wherein: the detector
detects a second plurality of dirty entries to be written to flash
memory; and the processor erases a second portion of the flash
memory to accommodate the second plurality of dirty entries to be
written to flash memory and writes to the second portion of the
flash memory the second plurality of dirty entries to be written to
flash memory wherein a plurality of erase operations followed by a
plurality of sequential write operations is performed in
parallel.
20. The memory system as claimed in claim 16 wherein the processor
performs erases on a basis selected from a group consisting of a
schedule, a demand, and a combination thereof.
Description
CROSS-REFERENCE TO RELATED APPLICATION(S)
[0001] This application is a continuing application claiming the benefit of U.S. patent application Ser. No. 09/819,423, filed Mar. 27, 2001, which in turn claims priority to U.S. Provisional Patent Application Ser. No. 60/250,642, filed Nov. 30, 2000.
FIELD OF THE INVENTION
[0002] The present invention relates generally to memory storage
systems, and more particularly to flash memory systems.
BACKGROUND OF THE INVENTION
[0003] Computer systems have traditionally used hard disk systems
with rotating magnetic disks as data storage media. However, disk
drives are disadvantageous in that they are bulky and require high-precision moving mechanical parts. They are also not rugged, are prone to reliability problems, and consume significant amounts of power.
[0004] More recently, these hard disk systems are being replaced by
semiconductor systems. These semiconductor systems use electrically
erasable programmable read-only-memory (EEPROM) technology as
memory storage cells as a substitute for the hard-disk magnetic
media. The EEPROMs have the capability of electrically erasing data
stored on the memory and replacing it with other data. However,
programming the EEPROM is relatively slow since input/output of
data and addressing is in a serial format. Additionally, special
"high" voltages are required when programming the EEPROM. Even
further, EEPROMs are typically only available in relatively small
memory sizes such as 8 Kbyte or 16 Kbyte sizes. As more and more
non-volatile memory space is required at lower power consumption
for portable electronic apparatus, alternatives to EEPROM are
required.
[0005] "Flash" EEPROM, also known as "flash memory", has been the
answer. Large regions of flash memory can be erased at one time
which makes reprogramming flash memory faster than reprogramming
EEPROM and which is the origin of the term "flash". Additionally,
it has lower stand-by power consumption than EEPROM. Also, in
replacing hard disk systems, these flash memory systems are
sometimes referred to as flash "disk" systems and similar
descriptive terminology is used, even though no rotating magnetic
disks are used.
[0006] In the flash memory system, a plurality of flash memory
chips are arranged in banks that share some of the control signals
from a buffer chip. The flash memory chips are nonvolatile
semiconductor-memory chips that retain data when power is no longer
applied.
[0007] The flash memory chips are divided into pages and blocks. A
64 Mbit flash chip typically has 512-byte pages, which happens to
match the sector size for IDE and small-computer system interface
(SCSI) hard disks. Rather than writing to or reading from just one
word in the page, the entire page must be read or written at the
same time; individual bytes cannot be written. Thus flash memory
operations are inherently slow since an entire page must be read or
written.
[0008] Flash memory is also not truly random-access. While reads
can be to random pages, writes require that memory cells must first
be erased before information is placed in them; i.e., a write (or
program) operation is always preceded by an erase operation.
[0009] The erase operation is done in one of several ways. For
example, in some flash memories, the entire chip is erased at one
time. If not all the information in the chip is to be erased, the
information must first be temporarily saved, and is usually written
into another memory (typically a RAM). The information is then
restored into the nonvolatile flash memory by programming back into
the chip.
[0010] In other flash memories, the memory is divided into blocks
that are each separately erasable, but only one at a time. By
selecting the desired block and going through the erase sequence
the designated area is erased. While the need for temporary memory is reduced, erasing various areas of the memory still requires a time-consuming sequential approach.
[0011] In still other flash memories, the memory is divided into
sectors where all cells within each sector are erasable together.
Each sector can be addressed separately and selected for erase.
[0012] In even other flash memories, certain numbers of blocks are
reserved to be pre-erased and a logical block address (LBA) to
physical block address (PBA) translation must be performed.
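For illustration, the following minimal Python sketch shows one way such a logical-to-physical translation can redirect a write to a pre-erased reserved block; the table layout, pool handling, and names are assumptions for this sketch and are not taken from any particular flash memory discussed here.

    # Minimal sketch of LBA-to-PBA translation with reserved, pre-erased blocks.
    # A write to a logical block is redirected to a pre-erased physical block,
    # and the superseded physical block is queued for a later erase.

    class BlockTranslator:
        def __init__(self, num_blocks, num_reserved):
            # logical block address -> physical block address
            self.lba_to_pba = {lba: lba for lba in range(num_blocks)}
            # pool of reserved physical blocks that are already erased
            self.erased_pool = list(range(num_blocks, num_blocks + num_reserved))
            self.pending_erase = []

        def remap_for_write(self, lba):
            """Return a pre-erased PBA for this LBA and retire the old PBA."""
            new_pba = self.erased_pool.pop(0)
            self.pending_erase.append(self.lba_to_pba[lba])
            self.lba_to_pba[lba] = new_pba
            return new_pba

    t = BlockTranslator(num_blocks=4, num_reserved=2)
    print(t.remap_for_write(1))   # 4: the write lands on a pre-erased block
    print(t.pending_erase)        # [1]: the stale block awaits erasure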
[0013] While flash reads can be to random pages, flash writes
require that larger regions, such as a sector, block, or chip be
erased in a flash erase operation before a flash write can be
performed. For example, in block erases, a block of 16 pages must
be erased together, while all 512 bytes on a page must be written
together.
[0014] In all these flash memories, flash erase operations are
significantly slower than flash read or write operations. Further,
only one erase operation per flash memory chip can be active at a
time.
[0015] Since the time taken by the flash erase and the write
operations affects the operating speed of the entire flash memory
system, a way of speeding up these operations has been long sought,
but has equally as long eluded those skilled in the art.
[0016] Working from another direction, those skilled in the art
have developed cache memories to speed up the performance of
computer systems having slower access devices, such as flash
memory. Typically, a part of system RAM is used as a cache for
temporarily holding the most recently accessed data from the flash
memory system. The next time the data is needed, it may be obtained
from the fast cache instead of the slow flash memory system. This
technique works well in situations where the same data is
repeatedly operated on. This is the case in most structures and
programs since the computer tends to work within a small area of
memory at a time in running a program.
[0017] Most of the conventional cache designs are read caches for
speeding up reads from flash memory. In some cases, write caches
are used for speeding up writes to flash memory. However, in the
case of writes to flash memory systems, data is written to flash
memory directly every time a write occurs, while being written into the cache at the same time. This is done because of concern for loss of
updated data files in case of power loss. If the write data is only
stored in the cache memory, which is a volatile memory, a loss of
power will result in new updated files being lost from the cache
before having the old data updated in nonvolatile flash memory. The
system will then be operating on the old data when these files are
used in further processing. The need to write to flash memory every
time is considered by those skilled in the art to defeat the
benefits of the caching mechanism for writes. Read caching does not
have this concern since the data that could be lost from cache has
a backup in flash memory.
[0018] Those skilled in the art have also used direct-memory access
(DMA) to facilitate data transfers. While DMA is efficient for
transfers of raw data to a memory, flash memory chips also require
command and address sequences to set up the relatively long flash
operations. Unfortunately, DMA is not well suited to transfer
addresses and commands since it is designed to transfer long
strings of data beginning at a starting address through an ending
address.
[0019] Thus, those skilled in the art working from different
directions have encountered what appears to be an insurmountable
bottleneck in speeding up flash memory systems to match faster and
faster host computer system processors.
SUMMARY OF THE INVENTION
[0020] A method of memory operation provides a memory and a cache containing a plurality of entries with a plurality of the entries to be written to memory, detects in the cache the plurality of entries to be written to memory, erases a first portion of the memory to accommodate the plurality of entries to be written to memory, and writes to the first portion of the memory the plurality of entries to be written to memory, in which an erase operation is followed by a plurality of sequential write operations. Since the time taken by the flash erase and the write
operations affects the operating speed of the entire flash memory
system, the present invention provides a way of substantially
speeding up these operations.
[0021] A memory system has a memory, a cache containing a
plurality of entries with a plurality of the entries to be written
to memory, a detector for detecting in the cache the plurality of
entries to be written to memory, and a processor for erasing a
first portion of the memory to accommodate the plurality of entries
to be written to memory and writing to the first portion of the
memory the plurality of entries to be written to memory in which an
erase operation is followed by a plurality of sequential write
operations. Since the time taken by the flash erase and the write
operations affects the operating speed of the entire flash memory
system, the present invention provides a fast memory system.
[0022] The above and additional advantages of the present invention
will become apparent to those skilled in the art from a reading of
the following detailed description when taken in conjunction with
the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] FIG. 1 is a block diagram of a flash memory system in
accordance with one embodiment of the present invention;
[0024] FIG. 2 is a time chart showing conventional alternating
flash erase and write operations, parallel flash erase and write
operations of the present invention, and parallel-parallel flash
erase and write operations in accordance with another embodiment of
the present invention.
[0025] FIG. 3 shows an example list of cache blocks that require
write transactions.
[0026] FIG. 4 shows example erase command sequences performed in
response to certain cache blocks shown in FIG. 3 in accordance with
one embodiment of the present invention.
[0027] FIG. 5 shows example write command sequences performed in
response to certain erase command sequences shown in FIG. 4 in
accordance with one embodiment of the present invention.
[0028] FIG. 6 shows write command sequences that may be asserted on
the same flash bus by two different DMA controllers so as to
interleave the write commands in accordance with a further
embodiment of the present invention.
[0029] FIG. 7 shows a method of performing a parallel-erase
operation in accordance with one embodiment of the present
invention.
[0030] FIG. 8 shows a method of interleaving write command
sequences in accordance with yet another embodiment of the present
invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0031] FIG. 1 is a block diagram of a flash memory system 10 having
at least one flash-specific DMA controller coupled to each flash
bus used in the system. In the embodiment shown, two DMA
controllers are used with DMA controller 12 coupled to flash bus 16
and DMA controller 14 coupled to flash bus 18. The number of DMA
controllers for each flash bus or the number of flash buses shown
is not intended to limit the present invention in any way and may
be increased to improve performance. Flash memory systems are
known, such as that described in U.S. Pat. No. 5,822,251.
[0032] In operation, local processor 20 sends high-level requests
to flash chips 24 via local bus 22. Each request is translated into
a sequence of commands, address bytes and data transfers ("command
sequence") by either DMA controller 12 or 14. DMA controller 12 or
14 in turn transfers the command sequence to a flash buffer chip
("buffer chip") by using shared data/address/command lines that
comprise the flash bus. Each buffer chip 26 is coupled to at least one bank of flash chips 24 via a buffer bus, such as buffer buses 40 and 42. In addition to the shared data/address/command lines, flash
buses 16 and 18 also have lines for transmitting an encoded
command. The encoded command is used to select and control a
plurality of buffer chips 26. In one embodiment, there are 8
multiplexed data/address/command lines and 2 encoded command lines.
Thus, each flash bus has an 8-bit portion destined for the buffer
chips or for the flash memory chips 24, and a 2-bit portion sent
only to buffer chips 26.
[0033] Each buffer chip 26 buffers at least one bank of flash
memory chips 24 and also serves as a protocol converter by using a
protocol defined for the flash buses to transceive command
sequences on the flash buses and by converting the protocol to
another protocol expected by a flash memory chip, such as flash
chip 114a, 114b, 114c or 114d. This could be as simple as
converting flash bus commands to the appropriate sequence of signal
transitions to a flash memory chip, or could involve translation of
commands or addresses, or even more complex sequencing. In one
embodiment, the commands on the flash bus are kept similar to those
expected by the flash chips to minimize the cost of conversion and
thus keep buffer chips 26 simple.
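As a purely illustrative sketch of the bus format described above: the 8-bit data/address/command portion and the 2-bit encoded command portion come from the text, but the specific 2-bit code values and opcode bytes below are hypothetical.

    # Sketch of framing one erase command sequence for a flash bus with an
    # 8-bit data/address/command portion and a 2-bit encoded command portion.
    # The 2-bit code values and opcode bytes are assumptions for illustration.

    CMD_SELECT_BUFFER = 0b01   # hypothetical: select/control a buffer chip
    CMD_DATA_PHASE    = 0b00   # hypothetical: byte destined for a flash chip

    def erase_command_sequence(buffer_id, block_addr):
        """Yield (two_bit_code, eight_bit_byte) pairs for one erase sequence."""
        yield (CMD_SELECT_BUFFER, buffer_id)      # 2-bit portion: pick buffer chip
        yield (CMD_DATA_PHASE, 0x60)              # assumed 'erase setup' opcode
        for shift in (0, 8, 16):                  # three address bytes, LSB first
            yield (CMD_DATA_PHASE, (block_addr >> shift) & 0xFF)
        yield (CMD_DATA_PHASE, 0xD0)              # assumed 'erase confirm' opcode

    for code, byte in erase_command_sequence(buffer_id=2, block_addr=0x01F4):
        print(f"encoded={code:02b}  bus_byte=0x{byte:02X}")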
[0034] DMA controllers 12 and 14 drive flash buses 16 and 18,
respectively. Flash buses 16 and 18 can operate at the same time,
allowing flash operations to be initiated and processed in
parallel. Each buffer chip 26 controls at least one bank of flash
memory chips 24. Each bank can be separately accessed, allowing
flash chips to perform flash operations in parallel with flash
chips in other banks. In addition, each flash chip within the same
bank can also be separately accessed, allowing flash operations to
be performed in parallel on more than one flash chip within the
same bank of flash chips. Hence, not only can parallel flash
operations be performed on different flash chips belonging to
separate banks but also on flash chips belonging to the same bank
of flash chips.
[0035] Each buffer chip 26 may be coupled to any number, such as
four, of banks of flash chips 24 although only two banks per buffer
chip are shown in FIG. 1 to avoid over-complicating the present
invention. Each bank has eight flash memory chips 24 although only
four flash chips are shown, such as flash-chips 114a through 114d
and 116a through 116d. Additional banks of flash memory chips can
be added to an existing buffer bus, or modules of flash memory
chips with a buffer chip can be coupled to a flash bus. The ability
to add additional flash buses facilitates expansion since any
number of buffer chips can be added. Buffer chips monitor flash
operations performed by flash chips, permitting the buffer chips to
indicate to DMA controllers 12 and 14 which flash chips 24 are busy
and enabling the DMA controllers to perform additional flash
operations to other flash chips.
[0036] DMA controllers 12 and 14 may be contained in a single
Application-Specific Integrated Circuit (ASIC) 28. The ASIC 28
connects DMA controllers 12 and 14 to local bus 22. In another
embodiment, the DMA controllers are integrated in any chip in the
flash memory system 10, which facilitates data transfer between the
flash chips and local bus 22. For example, one DMA controller may
be integrated with each of buffer chips 26 instead of being
integrated together in the ASIC 28.
[0037] Local bus 22 connects a cache 30, local processor 20, and an
interface controller 32, such as a small-computer system interface
(SCSI), ATA/IDE, or another interface controller, to DMA
controllers 12 and 14. Host requests from a host 34 are received by
interface controller 32 and driven onto local bus 22. Local
processor 20 responds to the host requests by storing host data
into cache 30 for writes, or reading data from the flash memory
chips 24 or from cache 30. A read-only memory (ROM) 36 contains
firmware code of routines that execute on local processor 20 to
respond to host requests. Other system-maintenance routines are
stored on ROM 36, such as wear-leveling and copy-back routines.
[0038] Cache 30 is under firmware control by local processor 20,
and thus the local processor's local memory 38 and cache 30 may
share the same physical memory. Cache 30 is implemented using DRAM
although this is not intended to limit the present invention in any
way. Cache 30 is used as a cache for temporarily holding the most
recently accessed data from flash memory system 10. The next time
the data is needed by the host 34, it may be obtained from cache 30
instead of the relatively slower flash memory 25. This technique
works well in situations where the same data is repeatedly operated
on as is the case in most structures and programs since host 34
tends to work within a small area of memory at a time in running a
program. In one embodiment, cache 30 has a cache size of 32 MB for a
256 MB flash memory.
[0039] Local processor 20 also tracks data stored in cache 30 and
can determine if the data, such as a cache block or cache
sector/page, is "dirty", or has been updated more recently than the
copy of the data stored in flash memory 25. This permits host 34 to
use the most recent copy of the data and, when the dirty data is to
be "victimized", or replaced, with other data, the dirty data is
first written to flash memory 25 so that any changes that were made
to the dirty data will be preserved. This technique is well known
to those skilled in the art. This cache coherency process permits
local processor 20 to determine when dirty cache data is ready to
be written to flash memory 25.
[0040] Local processor 20 initiates write or read transactions by
sending high-level commands to one of the DMA controllers 12 or 14.
DMA controller 12 or 14 then generates the corresponding command
sequences. Many command sequences may be needed, such as for block
reads and writes. A block read requires that many page read command
sequences be performed, each sequence generally sending command and
address bytes to flash memory chips 24 through the buffer chips 26.
Some flash chips also have a sequential read mode where command and
address bytes need only be sent for the first page in a
sequence.
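A block read, as described above, fans out into one page-read command sequence per page. A minimal sketch follows, assuming the 512-byte page and 16-page block sizes given earlier and a simplified, hypothetical sequence record:

    # Expand one high-level block-read request into per-page command sequences.
    PAGES_PER_BLOCK = 16    # block size from the text's example
    PAGE_SIZE = 512         # page size from the text's example

    def block_read_sequences(block_addr):
        """One command-and-address sequence per page of the block."""
        for page in range(PAGES_PER_BLOCK):
            yield {"cmd": "READ_PAGE", "block": block_addr,
                   "page": page, "bytes": PAGE_SIZE}

    seqs = list(block_read_sequences(block_addr=7))
    print(len(seqs), seqs[0])   # 16 sequences; each reads an entire page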
[0041] Local processor 20 uses DMA transfers to move data between
one of the DMA controllers coupled to the flash buses and cache 30.
The DMA transfers may be performed by local processor 20 using
program control or may be facilitated using at least one additional
DMA controller (not shown) that is either integrated with local
processor 20 or implemented separately and coupled to local bus
22.
[0042] Conventional flash memory, such as flash chips 114a through
114d and 116a through 116d, operates differently from DRAM devices in many respects. For instance, a flash block selected for a write transaction must first be erased before the block can be written with data. Performing
and completing an erase cycle before performing a write cycle adds
an additional delay. Referring to FIG. 2, a time chart 50 is shown
that depicts alternating erase and write cycles 52 that are
performed in conventional flash memory. The erase cycles have flash
erase times 61 through 64, respectively, and the write cycles have
write times 65 through 68, respectively. In a typical flash chip
used by the inventors listed herewith, each erase cycle takes
approximately 4 ms to perform, while performing each write cycle
takes approximately 3 ms. Thus, the total elapsed time to perform
four write transactions, for example, is the first erase cycle time
61 plus the first write cycle time 65 plus the second erase cycle
time 62 plus the second write cycle time 66 plus the third erase
cycle time 63 plus the third write cycle time 67 plus the fourth
erase cycle time 64 plus the fourth write cycle time 68. At 4
milliseconds per erase cycle and 3 milliseconds per write cycle, the
total time for four conventional erases and writes is 28 ms.
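The arithmetic can be checked directly; a one-line sketch:

    # Four strictly alternating erase (4 ms) and write (3 ms) cycles, in series.
    ERASE_MS, WRITE_MS, TRANSACTIONS = 4, 3, 4
    print(TRANSACTIONS * (ERASE_MS + WRITE_MS))   # 28 ms, as stated above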
[0043] The present invention minimizes the above-described cumulative delays by determining which cache blocks need write
transactions and which of the erase cycles associated with the
write transactions can be performed before performing the write
cycles associated with the erase cycles. The erase cycles are
performed as a group (in "parallel") before performing the write
cycles. This approach reduces the total time to perform write
transactions when compared to traditional methods. Performing erase
cycles in parallel ("parallel erase operation") may only be done
using separate flash chips, although flash chips belonging to the
same bank of flash chips are considered as separate flash chips and
may be erased as part of the parallel operation.
[0044] For example, FIG. 3 shows a list of cache blocks that
require write transactions. Specifically, cache blocks 106, 108,
110, and 112 require a write transaction to flash chips 114a, 114d,
116a and 116c, respectively. Cache blocks 118 and 120 require a
write transaction to flash chips 114a and 116a, respectively, while
cache blocks 122 and 124 require a write transaction to flash chips
114c and 114b, respectively. Cache blocks 106, 108, 110 and 112
qualify for a parallel erase operation since they require write
transactions to separate flash chips. Cache blocks 118 and 120 also
qualify for a parallel erase operation but cannot be performed in
parallel with the parallel erase operation for cache blocks 106,
108, 110 and 112 because they require erase cycles to the same
flash chips as cache blocks 106 and 110. Consequently, the present invention, through either DMA controller 14 or local processor 20 executing program code, selects one set of cache blocks at a time as eligible for write transactions that involve a parallel erase operation. Upon completion of all of the erase cycles by the respective flash chips, the DMA controller then transmits a series of write commands as a group to the erased flash chips via flash bus 18. The erased flash chips then perform write cycles on the flash blocks that were erased.
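One way to read the selection rule described above is as an in-order scan of the pending cache blocks that closes a group at the first chip conflict. The following Python sketch, seeded with the FIG. 3 assignments, reproduces the grouping in the text; the stop-at-first-conflict rule is an interpretation for illustration, not a definitive statement of the patented method.

    # Collect the first group of cache blocks whose erase cycles all target
    # different flash chips, stopping at the first conflict so that queue
    # order is preserved (an assumed reading of the selection rule).

    def first_parallel_group(queue):
        """Return (group, remainder); group members are on distinct chips."""
        group, used_chips = [], set()
        for i, (cache_block, flash_chip) in enumerate(queue):
            if flash_chip in used_chips:
                return group, queue[i:]
            used_chips.add(flash_chip)
            group.append((cache_block, flash_chip))
        return group, []

    # Pending write transactions per FIG. 3: (cache block, target flash chip).
    queue = [(106, "114a"), (108, "114d"), (110, "116a"), (112, "116c"),
             (118, "114a"), (120, "116a"), (122, "114c"), (124, "114b")]
    g1, rest = first_parallel_group(queue)
    g2, rest = first_parallel_group(rest)
    print([b for b, _ in g1])   # [106, 108, 110, 112]
    print([b for b, _ in g2])   # [118, 120, 122, 124]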
[0045] Write transactions are then performed on the next set of cache blocks that are eligible for a parallel erase operation, such as cache blocks 118 and 120. The cache blocks that are also eligible for this group or parallel erase are cache blocks 122 and 124, as shown in FIG. 3. This set of cache blocks was not included with the first group of cache blocks because it is preceded by cache blocks 118 and 120, which required erase cycles to be performed by the same flash chips used in the first group of cache blocks.
[0046] In yet another embodiment of the present invention, the ideal number of dirty cache entries that triggers an erase is heuristically determined to achieve optimal performance, as would be evident to those skilled in the art. One factor in the
determination is that a flash memory cell is currently capable of
being cycled a limited number of times before the erases
irrevocably damage the memory cell. Thus, one objective is to
minimize the number of erases and another is to spread the erases
over different flash memory chips 24.
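A sketch of those two objectives follows; the threshold value and the erase-count bookkeeping are assumptions, not figures from the text.

    # Trigger a group erase only once enough dirty entries accumulate, and
    # prefer the least-erased chip when a block may be placed on several chips.
    DIRTY_THRESHOLD = 4   # hypothetical heuristic value

    def should_erase(dirty_entries):
        return len(dirty_entries) >= DIRTY_THRESHOLD

    def pick_chip(candidate_chips, erase_counts):
        """Spread wear by choosing the candidate with the fewest erases."""
        return min(candidate_chips, key=lambda chip: erase_counts[chip])

    counts = {"114a": 120, "114b": 95, "114c": 101}
    print(should_erase(["e1", "e2", "e3", "e4"]))          # True
    print(pick_chip(["114a", "114b", "114c"], counts))     # 114b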
[0047] The determination as to which of the cache blocks requiring
a write transaction qualify for a group erase may be performed
while the blocks are either in cache 30 or in a pending queue,
which, in one embodiment of the present invention, is provided for each DMA controller coupled to a flash bus, such as DMA controller 12 and DMA controller 14. In addition, the set of blocks selected is the first eligible group that will not cause the parallel erase cycles to be performed on the same flash chip at the same time, although this is not intended to limit the present invention in any way.
[0048] FIG. 4 is a representation of the erase command sequences
for cache blocks 106, 108, 110 and 112 that are performed by DMA
controller 14 and the resulting erase cycles 126, 128, 130 and 132
performed by flash chips 114a, 114d, 116a and 116c, respectively,
on the flash blocks (not shown) that correspond to cache blocks
106, 108, 110 and 112 in accordance with one embodiment of the
present invention. FIG. 5 is a representation of the write command
sequences for cache blocks 106, 108, 110, and 112 that are
performed by DMA controller 14 and the resulting write cycles 134,
136, 138 and 140 performed by flash chips 114a, 114d, 116a and 116c on the flash blocks (not shown) that correspond to cache blocks 106, 108, 110
and 112 in accordance with one embodiment of the present
invention.
[0049] As seen in FIG. 4, erase command sequences 100a, 100b, 100c
and 100d directed at flash chips 114a, 114d, 116a and 116c,
respectively, are launched in a sequential manner by DMA controller
14 on flash bus 18, rendering flash bus 18 unavailable 133 during
the launching of the commands. The erase command sequences are
launched in response to the write transactions requested for cache
blocks 106, 108, 110 and 112. Flash chips 114a, 114d, 116a and 116c
receive their respective erase command sequence and each consumes
approximately four (4) milliseconds to perform an erase cycle in
response to each erase command sequence. Although the erase command
sequences are asserted sequentially, each erase command sequence
takes only microseconds to complete, enabling the flash chips to
perform the erase cycles as a group or in parallel.
[0050] Referring to FIG. 5, write command sequences 101a, 101b,
101c and 101d directed at flash chips 114a, 114d, 116a and 116c,
respectively, are launched in a sequential manner by DMA controller
14 on flash bus 18. Like the erase command sequences above, this
renders flash bus 18 busy 142 during the launching of the commands.
The write command sequences are launched as part of the write
transactions requested for cache blocks 106, 108, 110 and 112, and
after the erase cycle that corresponds to the write cycle has been
completed. Flash chips 114a, 114d, 116a and 116c receive their
respective write command sequence and each consumes approximately
three (3) milliseconds to perform a write cycle in response to each
write command sequence.
[0051] Thus, although the erase and write commands are launched on
flash bus 18 sequentially, the commands consume a fraction of the
time that a flash device consumes when performing an erase or write
operation since the time to assert the command sequences on flash
bus 18 take microseconds to perform rather than milliseconds. This
permits the flash chips to perform the erase operations essentially
in parallel, greatly reducing the cumulative time to complete write
transactions when compared to the example discussed in FIG. 2.
Performance is further enhanced because more than one DMA
controller is provided. Each DMA controller is able to launch new
flash operations simultaneously, allowing different flash chips to
perform separate flash operations at the same time and to further
increase parallelism.
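A rough timing model makes the gain concrete. Command sequences hold the bus only for microseconds while the chips erase and write concurrently; the 50-microsecond per-sequence figure below is an assumption, while the 4 ms and 3 ms figures come from the text.

    # Compare the conventional serial schedule of FIG. 2 with the parallel
    # schedule: assert four erase sequences back to back, let all four chips
    # erase at once, then do the same for the writes.
    SEQ_MS = 0.05                      # assumed bus time per command sequence
    ERASE_MS, WRITE_MS, CHIPS = 4, 3, 4

    serial = CHIPS * (ERASE_MS + WRITE_MS)
    parallel = (CHIPS * SEQ_MS + ERASE_MS) + (CHIPS * SEQ_MS + WRITE_MS)
    print(f"serial:   {serial} ms")      # 28 ms
    print(f"parallel: {parallel} ms")    # 7.4 ms under these assumptions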
[0052] FIG. 6 shows write command sequences that may be asserted on
the same flash bus by two different DMA controllers so as to
interleave the write commands in accordance with a further
embodiment of the present invention. To minimize over-complicating
the discussion herein, write command sequences 101a, 101b, 101c and
101d shown in FIG. 5 are also shown in FIG. 6. Time line 142
represents write command sequences 144a, 144b, 144c and 144d that
are asserted by another DMA controller on the same flash bus used
by DMA controller 14, which in this example is flash bus 18. Write
command sequences 144a, 144b, 144c and 144d are asserted only when
the write command sequences asserted by DMA controller 14 are in an
idle mode, such as after a write command sequence has been received
by a flash chip and DMA controller 14 is waiting to receive an
acknowledgement from the flash chip that it completed a program
cycle. Each write command sequence 144a, 144b, 144c and 144d is
followed by a corresponding program command sequence 148a, 148b,
148c and 148d, respectively. During this program command sequence,
the present invention permits a different DMA controller to assert
a write command sequence on flash bus 18 that is then followed by a
program sequence, such as program sequence 150a. Program sequence
150a preferably occurs during the same period as the occurrence of
write command sequence 101b. Thus, by interleaving write command
sequences asserted by different DMA controllers on the same flash
bus, bandwidth efficiency on flash bus 18 may be improved since
each write command sequence may be followed by another write
command sequence. Write command sequences 144a through 144d
correspond to previous erase command sequences (not shown).
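The interleaving can be sketched as a greedy schedule of a shared bus: a controller may assert its next sequence whenever the bus is idle, and each chip programs off-bus after receiving a sequence. The timing values and controller names below are illustrative assumptions.

    import heapq

    def interleave(controllers, seq_ms=0.05, program_ms=3.0):
        """Return (start_time, controller, sequence) events on the shared bus."""
        bus_free, events = 0.0, []
        pending = [(0.0, cid, list(s)) for cid, s in controllers.items()]
        heapq.heapify(pending)
        while pending:
            ready, cid, seqs = heapq.heappop(pending)
            start = max(ready, bus_free)       # wait only for the bus, not the chips
            events.append((start, cid, seqs[0]))
            bus_free = start + seq_ms          # bus is held just for the sequence
            if seqs[1:]:                       # chip programs while the bus is free
                heapq.heappush(pending, (bus_free + program_ms, cid, seqs[1:]))
        return events

    ctrls = {"DMA14": ["101a", "101b"], "DMA2": ["144a", "144b"]}
    for start, cid, seq in interleave(ctrls):
        print(f"t={start:5.2f} ms  {cid} asserts {seq}")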
[0053] In operation, local processor 20 or the DMA controller 14
detects which cache entries need to be written to flash memory,
such as cache entries that are dirty, and causes an erase to be
performed first to open up flash memory space on a schedule or on
demand. An example of erasing on a schedule would be erasing after a predetermined number of writes, and an example of erasing on demand would be when a cache entry is dirty but will be delayed in being written, yet the location of the stale entry in the flash memory is known so it can be erased in advance. This latter technique also permits entries to be
moved around in the flash memory to even the wear on the memory
cells.
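The two triggers can be expressed as a small policy; the write count per scheduled erase and the field names below are hypothetical.

    WRITES_PER_SCHEDULED_ERASE = 8      # assumed schedule parameter

    def erase_trigger(write_count, stale_pba):
        """Decide whether to erase now: on demand if a stale flash location
        is known for a dirty entry, on schedule after every Nth write."""
        if stale_pba is not None:
            return ("demand", stale_pba)
        if write_count % WRITES_PER_SCHEDULED_ERASE == 0:
            return ("schedule", None)
        return (None, None)

    print(erase_trigger(write_count=5, stale_pba=0x2A))   # ('demand', 42)
    print(erase_trigger(write_count=8, stale_pba=None))   # ('schedule', None)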
[0054] Increasing parallelism as described above has a cost: the local processor 20 will have to spend more and more of its time managing writes and less time performing other necessary operations, which will slow down the overall operation of the flash memory system 10. To address this problem with the local processor 20, the present inventors suggest, in yet another embodiment of the present invention, using at least one dedicated DMA controller for transferring data between the DMA controllers attached to flash buses, such as DMA controllers 12 and 14, and cache 30. Preferably, there
is one dedicated DMA controller for each DMA controller attached to
the flash bus. During data transfer, transfer is controlled by the
dedicated DMA controller, obviating the need for control by a
program in the local processor 20 and, at the end of the transfer,
the relevant status could be posted to the local processor 20. When
local processor 20 sends a data transfer command to the dedicated
DMA controller, it specifies management information that comprises
a transfer start address and a transfer count for each of the
transfer source and transfer destination. The area specified in
this manner is transferred in sequence as one group of data
blocks.
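The management information described above maps naturally onto a small descriptor; the field names and the status record below are illustrative assumptions.

    from dataclasses import dataclass

    @dataclass
    class DmaDescriptor:
        src_start: int    # transfer start address, source (e.g. cache 30)
        src_count: int    # transfer count, source
        dst_start: int    # transfer start address, destination
        dst_count: int    # transfer count, destination

    def run_transfer(desc, post_status):
        """The dedicated DMA controller moves the group of data blocks and
        posts status to the local processor only at the end of the transfer."""
        moved = min(desc.src_count, desc.dst_count)
        post_status({"moved": moved, "ok": True})

    run_transfer(DmaDescriptor(0x1000, 4096, 0x8000, 4096), print)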
[0055] FIG. 7 is a process flow showing a method of performing a
parallel erase operation in accordance with one embodiment of the
present invention.
[0056] At reference 200, a plurality of entries or cache blocks
that require a write transaction involving erase cycles to
different flash chips are detected.
[0057] At reference 202, the flash chips are erased approximately in parallel by asserting the erase command sequences for each write transaction in sequence on the flash bus.
[0058] At reference 204, write command sequences are sequentially
asserted on the flash bus. Note that each write command sequence includes a program command sequence.
[0059] In yet a further embodiment of the present invention as
shown in FIG. 8, reference 206 may be performed by interleaving
write command sequences that correspond to prior erase command
sequences on the same flash bus as the write command sequences
asserted in reference 204.
[0060] While the invention has been described in conjunction with a
specific best mode, it is to be understood that many alternatives,
modifications, and variations will be apparent to those skilled in
the art in light of the foregoing description. Accordingly, it is
intended to embrace all such alternatives, modifications, and
variations that fall within the spirit and scope of the included
claims. All matters hithertofore set forth or shown in the
accompanying drawings are to be interpreted in an illustrative and
non-limiting sense.
* * * * *