Apparatus And Method For Handling Temperature Dependent Failures In A Memory Device Puthoor; Aneesh ; et al. [Western Digital Technologies, Inc.]

Apparatus And Method For Handling Temperature Dependent Failures In A Memory Device

Puthoor; Aneesh ; et al.

Patent Application Summary

U.S. patent application number 16/737551 was filed with the patent office on 2020-01-08 and published on 2021-07-08 for apparatus and method for handling temperature dependent failures in a memory device. This patent application is currently assigned to Western Digital Technologies, Inc. The applicant listed for this patent is Western Digital Technologies, Inc. Invention is credited to Aneesh Puthoor, Narendhiran Chinnaanangur Ravimohan, Harvijay Singh.

Application Number	20210210157 16/737551
Family ID	1000005666523
Filed Date	2020-01-08

United States Patent Application	20210210157
Kind Code	A1
Puthoor; Aneesh ; et al.	July 8, 2021

APPARATUS AND METHOD FOR HANDLING TEMPERATURE DEPENDENT FAILURES IN A MEMORY DEVICE

Abstract

Devices, methods, and systems for managing temperature dependent failures in a memory device. An erase failure of a memory block is detected, and the block is marked as a grown bad block if the memory device temperature is below a threshold temperature. If the temperature exceeds the threshold temperature, it is determined whether memory cells of the block exceed a first threshold voltage. If the memory cells of the block exceed the first threshold voltage, the block is marked as a potential grown bad block. If the memory cells of the block are below the first threshold voltage, it is determined whether a number of the memory cells of the block exceed a second threshold voltage. If the memory cells of the block are below the second threshold voltage, the block is programmed. If the memory cells of the block exceed the second threshold voltage, the block is marked for error correction and programmed.

Inventors:

Puthoor; Aneesh; (Bangalore, IN) ; Singh; Harvijay; (Bangalore, IN) ; Ravimohan; Narendhiran Chinnaanangur; (Bangalore, IN)

Applicant:

Name	City	State	Country	Type
Western Digital Technologies, Inc.	San Jose	CA	US

Assignee:

Western Digital Technologies, Inc.
San Jose
CA

Family ID:

1000005666523

Appl. No.:

16/737551

Filed:

January 8, 2020

Current U.S. Class:	1/1
Current CPC Class:	G11C 29/88 20130101; G11C 16/3445 20130101; G11C 29/42 20130101; G11C 29/702 20130101; G11C 29/50004 20130101; G11C 16/26 20130101; G11C 29/783 20130101
International Class:	G11C 29/50 20060101 G11C029/50; G11C 29/00 20060101 G11C029/00; G11C 29/42 20060101 G11C029/42; G11C 16/34 20060101 G11C016/34; G11C 16/26 20060101 G11C016/26

Claims

1. A memory device configured to manage temperature dependent failures, the memory device comprising: circuitry configured to detect an erase failure of a memory block of the memory device; circuitry configured to, if a temperature of the memory device does not exceed a threshold temperature, mark the memory block as a grown bad block; circuitry configured to, if the temperature exceeds the threshold temperature, determine whether cell voltages of memory cells of the memory block exceed a first threshold cell voltage; circuitry configured to, if the cell voltages exceed the first threshold cell voltage, mark the memory block as a potential grown bad block; circuitry configured to, if the cell voltages do not exceed the first threshold cell voltage, determine whether a number of the cell voltages exceed a second threshold cell voltage; circuitry configured to, if the number of the cell voltages do not exceed the second threshold cell voltage, program the memory block; and circuitry configured to, if the number of the cell voltages exceed the second threshold cell voltage, mark the memory block for error correction and program the memory block.

2. The memory device of claim 1, further comprising circuitry configured to, if the memory block is marked as the potential grown bad block, mark the memory block as the grown bad block if a second erase failure of the memory block is detected at a temperature below the threshold temperature.

3. The memory device of claim 1, wherein determining whether the cell voltages exceed the first threshold cell voltage comprises determining whether cell voltages of a threshold number of the memory cells of the memory block exceed the first threshold cell voltage.

4. The memory device of claim 1, wherein determining whether the cell voltages exceed the second threshold cell voltage comprises determining whether cell voltages of a threshold number of the memory cells of the memory block exceed the second threshold cell voltage.

5. The memory device of claim 1, further comprising circuitry configured to throttle a speed of the memory device if a number of memory blocks of the memory device marked as possible grown bad blocks exceeds a threshold number of bad blocks.

6. The memory device of claim 1, wherein the temperature of the memory device comprises a junction temperature.

7. The memory device of claim 1, wherein the first threshold cell voltage and the second threshold cell voltage are based on a characteristic of the memory device.

8. The memory device of claim 1, wherein the first threshold cell voltage and the second threshold cell voltage are determined empirically.

9. The memory device of claim 1, wherein marking the memory block for error correction comprises setting a soft bit window.

10. The memory device of claim 1, wherein marking the memory block for error correction comprises increasing a size of a soft bit window.

11. A method for managing temperature dependent failures in a memory device, the method comprising: detecting an erase failure of a memory block of the memory device; if a temperature of the memory device does not exceed a threshold temperature, marking the memory block as a grown bad block; if the temperature exceeds the threshold temperature, determining whether cell voltages of memory cells of the memory block exceed a first threshold cell voltage; if the cell voltages exceed the first threshold cell voltage, marking the memory block as a potential grown bad block; if the cell voltages do not exceed the first threshold cell voltage, determining whether a number of the cell voltages exceed a second threshold cell voltage; if the number of the cell voltages do not exceed the second threshold cell voltage, programming the memory block; and if the number of the cell voltages exceed the second threshold cell voltage, marking the memory block for error correction and programming the memory block.

12. The method of claim 11, further comprising, if the memory block is marked as the potential grown bad block, marking the memory block as the grown bad block if a second erase failure of the memory block is detected at a temperature below the threshold temperature.

13. The method of claim 11, wherein determining whether the cell voltages exceed the first threshold cell voltage comprises determining whether cell voltages of a threshold number of the memory cells of the memory block exceed the first threshold cell voltage.

14. The method of claim 11, wherein determining whether the cell voltages exceed the second threshold cell voltage comprises determining whether cell voltages of a threshold number of the memory cells of the memory block exceed the second threshold cell voltage.

15. The method of claim 11, further comprising throttling a speed of the memory device if a number of memory blocks of the memory device marked as possible grown bad blocks exceeds a threshold number of bad blocks.

16. The method of claim 11, wherein the temperature of the memory device comprises a junction temperature.

17. The method of claim 11, wherein the first threshold cell voltage and the second threshold cell voltage are based on a characteristic of the memory device.

18. The method of claim 11, wherein the first threshold cell voltage and the second threshold cell voltage are determined empirically.

19. The method of claim 11, wherein marking the memory block for error correction comprises setting a soft bit window or increasing a size of the soft bit window.

20. A memory device, comprising: a memory including a plurality of memory cells; and means to detect an erase failure of a memory block of the memory, including: means to mark the memory block as a grown bad block, if a temperature of the memory device does not exceed a threshold temperature; means to determine whether cell voltages of memory cells of the memory block exceed a first threshold cell voltage, if the temperature exceeds the threshold temperature; means to mark the memory block as a potential grown bad block, if the cell voltages exceed the first threshold cell voltage; means to determine whether a number of the cell voltages memory cells of the block exceed a second threshold cell voltage, if the cell voltages do not exceed the first threshold cell voltage; means to program the memory block, if the number of the cell voltages do not exceed the second threshold cell voltage; and means to mark the memory block for error correction and program the memory block, if the number of the cell voltages exceed the second threshold cell voltage.

Description

FIELD OF INVENTION

[0001] This application relates to non-volatile memory devices, and more particularly, to managing temperature dependent failures of such devices.

BACKGROUND

[0002] Non-volatile memory devices typically include a number of memory cells implemented on a semiconductor die. Typically, the temperature of the memory increases under operational load. In some cases, the increased temperature can cause read, write, and/or erase operations to fail for the memory cells.

[0003] In some cases, a plurality of semiconductor dies is stacked within a package to increase the capacity of the memory. The thickness of one or more of the dies may be reduced in order to stack a greater number of dies in the package, to stack the same number of dies in a smaller sized package, or to stack the same number of dies of a different technology in the same sized package. However, a decrease in die thickness can make the dies more sensitive to high temperatures and increase leakage currents, which may cause voltages to clamp or NAND operations to fail.

SUMMARY

[0004] The present application is directed to devices, methods, and systems for managing temperature dependent failures in a memory device. An erase failure of a memory block is detected, and the block is marked as a grown bad block if the memory device temperature is below a threshold temperature. If the temperature exceeds the threshold temperature, it is determined whether memory cells of the block exceed a first threshold voltage. If the memory cells of the block exceed the first threshold voltage, the block is marked as a potential grown bad block. If the memory cells of the block are below the first threshold voltage, it is determined whether a number of the memory cells of the block exceed a second threshold voltage. If the memory cells of the block are below the second threshold voltage, the block is programmed. If the memory cells of the block exceed the second threshold voltage, the block is marked for error correction and programmed.

BRIEF DESCRIPTION OF THE DRAWINGS

[0005] FIG. 1A is a block diagram illustrating an example non-volatile storage system;

[0006] FIG. 1B is a block diagram illustrating an example storage module that includes plural non-volatile storage systems;

[0007] FIG. 1C is a block diagram illustrating an example hierarchical storage system;

[0008] FIG. 2A is a block diagram illustrating components of an example controller;

[0009] FIG. 2B is a block diagram illustrating components of an example non-volatile memory die;

[0010] FIG. 3 is a line graph illustrating example cell voltage distributions of memory cells in a block;

[0011] FIG. 4 is a line graph illustrating further example cell voltage distributions of memory cells in a block; and

[0012] FIG. 5 is a flow chart illustrating an example method for managing temperature dependent block erase failures.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0013] Some implementations provide a memory device configured to manage temperature dependent failures. The memory includes circuitry configured to detect an erase failure of a memory block of the memory device. The memory also includes circuitry configured to mark the block as a grown bad block, if a temperature of the memory device does not exceed a threshold temperature. The memory also includes circuitry configured to determine whether memory cells of the block exceed a first threshold voltage, if the temperature exceeds the threshold temperature. The memory also includes circuitry configured to mark the block as a potential grown bad block, if the memory cells of the block exceed the first threshold voltage. The memory also includes circuitry configured to determine whether a number of the memory cells of the block exceeds a second threshold voltage, if the memory cells of the block do not exceed the first threshold voltage. The memory also includes circuitry configured to program the block, if the memory cells of the block do not exceed the second threshold voltage. The memory also includes circuitry configured to mark the block for error correction and program the block, if the memory cells of the block exceed the second threshold voltage.

[0014] In some implementations, the memory device also includes circuitry configured to mark the block as the grown bad block if the block is marked as the potential grown bad block, and if a second erase failure of the block is detected at a temperature below the threshold temperature. In some implementations, determining whether memory cells of the block exceed the first threshold voltage includes determining whether a threshold number of the memory cells of the block exceed the first threshold voltage. In some implementations, determining whether memory cells of the block exceed the second threshold voltage includes determining whether a threshold number of the memory cells of the block exceed the second threshold voltage. In some implementations, the memory device also includes circuitry configured to throttle a speed of the memory device if a number of memory blocks of the memory device marked as possible grown bad blocks exceeds a threshold number of bad blocks. In some implementations, the temperature of the memory device includes a junction temperature. In some implementations, the first threshold voltage and the second threshold voltage are based on a characteristic of the memory device. In some implementations, the first threshold voltage and the second threshold voltage are determined empirically. In some implementations, marking the block for error correction includes setting a soft bit window. In some implementations, marking the block for error correction includes increasing a size of a soft bit window.

[0015] Some implementations provide a method for managing temperature dependent failures in a memory device. The method includes detecting an erase failure of a memory block of the memory device. The method also includes marking the block as a grown bad block if a temperature of the memory device does not exceed a threshold temperature. The method also includes determining whether memory cells of the block exceed a first threshold voltage, if the temperature exceeds the threshold temperature. The method also includes marking the block as a potential grown bad block, if the memory cells of the block exceed the first threshold voltage. The method also includes determining whether a number of the memory cells of the block exceeds a second threshold voltage, if the memory cells of the block do not exceed the first threshold voltage. The method also includes programming the block, if the memory cells of the block do not exceed the second threshold voltage. The method also includes marking the block for error correction and programming the block, if the memory cells of the block exceed the second threshold voltage.
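
The decision flow of the method can be illustrated with a minimal Python sketch. This is only one possible reading of the steps above, not the implementation described in this application. The names `Action`, `count_above`, and `handle_erase_failure`, the parameters `n1` and `n2` (the threshold numbers of cells), and all numeric values are hypothetical. The sketch also assumes that the second threshold voltage is lower than the first, which this passage does not state.

```python
from enum import Enum, auto


class Action(Enum):
    MARK_GROWN_BAD = auto()
    MARK_POTENTIAL_GROWN_BAD = auto()
    PROGRAM = auto()
    MARK_FOR_ECC_AND_PROGRAM = auto()


def count_above(cell_voltages, level):
    """Count the cells whose voltage exceeds the given level."""
    return sum(1 for v in cell_voltages if v > level)


def handle_erase_failure(temp_c, cell_voltages, *, temp_threshold_c,
                         v1, v2, n1=1, n2=1):
    """Choose an action for a block whose erase just failed.

    v1 and v2 are the first and second threshold cell voltages; n1 and n2
    are hypothetical threshold numbers of cells for each comparison.
    """
    # Failure at or below the threshold temperature: treat as a hard error.
    if temp_c <= temp_threshold_c:
        return Action.MARK_GROWN_BAD
    # Hot device and enough cells above the first threshold voltage:
    # defer judgment, mark as a potential grown bad block.
    if count_above(cell_voltages, v1) >= n1:
        return Action.MARK_POTENTIAL_GROWN_BAD
    # Enough cells above the second threshold voltage: still usable,
    # but flag the block for error correction before programming.
    if count_above(cell_voltages, v2) >= n2:
        return Action.MARK_FOR_ECC_AND_PROGRAM
    # Otherwise the block is simply programmed.
    return Action.PROGRAM
```

With a hypothetical threshold temperature of 85 and threshold voltages of 2.0 and 1.0, a failure at 25 degrees yields `MARK_GROWN_BAD` regardless of the cell voltages, while a failure at 95 degrees yields one of the other three actions depending on how many cells sit above each threshold.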

[0016] In some implementations, the method includes marking the block as the grown bad block if the block is marked as the potential grown bad block and a second erase failure of the block is detected at a temperature below the threshold temperature. In some implementations, determining whether memory cells of the block exceed the first threshold voltage includes determining whether a threshold number of the memory cells of the block exceed the first threshold voltage. In some implementations, determining whether memory cells of the block exceed the second threshold voltage includes determining whether a threshold number of the memory cells of the block exceed the second threshold voltage. In some implementations, the method includes throttling a speed of the memory device if a number of memory blocks of the memory device marked as possible grown bad blocks exceeds a threshold number of bad blocks. In some implementations, the temperature of the memory device includes a junction temperature. In some implementations, the first threshold voltage and the second threshold voltage are based on a characteristic of the memory device. In some implementations, the first threshold voltage and the second threshold voltage are determined empirically. In some implementations, marking the block for error correction includes setting a soft bit window. In some implementations, marking the block for error correction includes increasing a size of a soft bit window.
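
The follow-up handling of potential grown bad blocks (promotion after a second erase failure at low temperature, and throttling when too many blocks are pending) might be tracked as in the following sketch. The class `BlockTracker`, its method names, and the use of in-memory sets are illustrative assumptions; the application does not specify a data structure.

```python
class BlockTracker:
    """Hypothetical bookkeeping for potential and confirmed grown bad blocks."""

    def __init__(self, temp_threshold_c, max_potential_bad):
        self.temp_threshold_c = temp_threshold_c
        self.max_potential_bad = max_potential_bad
        self.potential_bad = set()
        self.grown_bad = set()

    def mark_potential(self, block):
        # A block already confirmed bad stays confirmed.
        if block not in self.grown_bad:
            self.potential_bad.add(block)

    def on_second_erase_failure(self, block, temp_c):
        """Promote a potential grown bad block to a grown bad block if it
        fails to erase again at a temperature below the threshold."""
        if block in self.potential_bad and temp_c < self.temp_threshold_c:
            self.potential_bad.discard(block)
            self.grown_bad.add(block)
            return True
        return False

    def should_throttle(self):
        # Throttle once the number of potential grown bad blocks exceeds
        # the configured threshold number.
        return len(self.potential_bad) > self.max_potential_bad
```

Throttling here is reduced to a boolean query; how the device speed is actually lowered is outside the scope of the sketch.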

[0017] Storage systems suitable for use in implementing aspects of these embodiments are shown in FIGS. 1A-1C. FIG. 1A is a block diagram illustrating an example non-volatile storage system 100 according to an example implementation of aspects of the subject matter described herein. Referring to FIG. 1A, non-volatile storage system 100 includes a controller 102 and non-volatile memory that may include one or more non-volatile memory dies 104. A die may include a collection of non-volatile memory cells, and associated circuitry for managing the physical operation of those non-volatile memory cells, that are formed on a single semiconductor substrate. Controller 102 interfaces with a host system and transmits command sequences for read, program, and erase operations to non-volatile memory die 104.

[0018] Controller 102 (which may include a non-volatile memory controller (e.g., a flash, ReRAM, PCM, or MRAM controller)) may include processing circuitry, a microprocessor or processor, and a computer-readable medium that stores computer-readable program code (e.g., firmware) executable by the microprocessor or processor, logic gates, switches, an application specific integrated circuit (ASIC), a programmable logic controller, and/or an embedded microcontroller, for example. Controller 102 can be configured with hardware and/or firmware to perform the various functions described below and shown in the flow diagrams. In some implementations, some of the components shown as being internal to the controller are stored external to the controller, and/or different components can be used. Additionally, the phrase "operatively in communication with" may mean directly in communication with or indirectly (wired or wireless) in communication with through one or more components, which may or may not be shown or described herein.

[0019] As used herein, a non-volatile memory controller is a device that manages data stored on non-volatile memory and communicates with a host, such as a computer or electronic device. In some implementations, a non-volatile memory controller may include various functionality in addition to or instead of the specific functionality described herein. For example, the non-volatile memory controller may include hardware and/or software to format the non-volatile memory to ensure the memory is operating properly, map out bad non-volatile memory cells, and/or allocate spare cells to be substituted for future failed cells. In some implementations, a subset of the spare cells can be used to hold firmware to operate the non-volatile memory controller and implement other features. In some implementations, if a host needs to read data from or write data to the non-volatile memory, it communicates with the non-volatile memory controller to facilitate the read or write. In some implementations, if the host provides a logical address from which data is to be read or to which data is to be written, the non-volatile memory controller converts the logical address received from the host to a physical address in the non-volatile memory. Alternatively, in some implementations, the host provides the physical address. The non-volatile memory controller may also include hardware and/or software to perform various memory management functions, such as, but not limited to, wear leveling (distributing writes to avoid wearing out specific blocks of memory that would otherwise be repeatedly written to) and garbage collection (after a block is full, moving only the valid pages of data to a new block, so the full block can be erased and reused).
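
The logical-to-physical address conversion can be pictured with a toy translation layer. This is a sketch under stated assumptions: the class `SimpleFTL`, its page-granular map, and the out-of-place write policy are illustrative choices, not details given in this application.

```python
class SimpleFTL:
    """Toy logical-to-physical translation with out-of-place writes."""

    def __init__(self, num_physical_pages):
        self.l2p = {}                         # logical address -> physical page
        self.free = list(range(num_physical_pages))
        self.flash = {}                       # physical page -> stored data

    def write(self, logical_addr, data):
        # Assumption: each write goes to a fresh physical page; the page
        # previously mapped to this logical address becomes stale and
        # would later be reclaimed by garbage collection.
        physical = self.free.pop(0)
        self.flash[physical] = data
        self.l2p[logical_addr] = physical
        return physical

    def read(self, logical_addr):
        # Translate the host's logical address, then fetch the data.
        return self.flash[self.l2p[logical_addr]]
```

Rewriting the same logical address twice lands on two different physical pages, and a read returns the most recently written data.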

[0020] Non-volatile memory die 104 may include any suitable non-volatile storage medium, including resistive random-access memory (ReRAM), magnetoresistive random-access memory (MRAM), phase-change memory (PCM), NAND flash memory cells and/or NOR flash memory cells. The memory cells can take the form of solid-state (e.g., flash) memory cells and can be one-time programmable, few-time programmable, or many-time programmable. The memory cells can also be single-level cells (SLC), multiple-level cells (MLC), triple-level cells (TLC), or use other memory cell level technologies, now known or later developed. Also, the memory cells can be fabricated in a two-dimensional or three-dimensional fashion.

[0021] The interface between controller 102 and non-volatile memory die 104 may be any suitable flash interface, such as Toggle Mode 200, 400, or 800. In some implementations, storage system 100 may include a card based system, such as a secure digital (SD) or a micro secure digital (micro-SD) card. In some implementations, storage system 100 may be part of an embedded storage system.

[0022] Although, in the example illustrated in FIG. 1A, non-volatile storage system 100 (sometimes referred to herein as a storage module) is shown with a single channel between controller 102 and non-volatile memory die 104, the subject matter described herein is not limited to having a single memory channel. For example, in some storage system architectures (such as those shown in FIGS. 1B and 1C), any suitable number of memory channels (e.g., 2, 4, 8 or more memory channels) may exist between the controller and the memory device, depending on controller capabilities. It is noted that more than a single channel may exist between the controller and the memory die, even if a single channel is shown in the drawings.

[0023] FIG. 1B is a block diagram illustrating an example storage module 200 that includes a plurality of non-volatile storage systems 100. Storage module 200 may include a storage controller 202 that interfaces with a host and with storage system 204, which includes a plurality of non-volatile storage systems 100. The interface between storage controller 202 and non-volatile storage systems 100 may include a bus interface, such as a serial advanced technology attachment (SATA) interface, peripheral component interconnect express (PCIe) interface, or dual-data-rate (DDR) interface. Storage module 200, in some implementations, may include a solid state drive (SSD), or a non-volatile dual in-line memory module (NVDIMM), e.g., as may be found in server PCs or portable computing devices, such as laptop computers and tablet computers.

[0024] FIG. 1C is a block diagram illustrating an example hierarchical storage system 250. Hierarchical storage system 250 includes a plurality of storage controllers 202, each of which controls a respective storage system 204. Host systems 252 may access memories within the storage system via a bus interface. In some implementations, the bus interface may be a non-volatile memory host controller interface specification express (NVMe) or Fibre Channel over Ethernet (FCoE) interface. In some implementations, the system illustrated in FIG. 1C may include a rack mountable mass storage system that is accessible by multiple host computers, e.g., as may be found in a data center or other location where mass storage is needed.

[0025] FIG. 2A is a block diagram illustrating components of controller 102 in more detail. Controller 102 includes a front end module 108 that interfaces with a host, a back end module 110 that interfaces with the one or more non-volatile memory dies 104, and various other components that perform functions which will now be described in detail. A component may include a packaged functional hardware unit designed for use with other components, a portion of a program code (e.g., software or firmware) executable by a processor, microprocessor, or processing circuitry that usually performs a particular function or related functions, and/or a self-contained hardware or software component that interfaces with a larger system, for example. Components of the controller 102 may include a parallelism module 111, which will be discussed in more detail below and can be implemented in hardware or software/firmware to perform the algorithms and methods discussed herein and shown in the attached drawings.

[0026] Referring again to modules of the controller 102, in some implementations, a buffer manager/bus controller manages buffers in random access memory (RAM) 116 and controls the internal bus arbitration of controller 102. A read only memory (ROM) 118 stores system boot code. Although illustrated in FIG. 2A as located separately from the controller 102, in some implementations one or both of the RAM 116 and ROM 118 may be located within the controller. In yet other embodiments, portions of RAM and ROM may be located both within the controller 102 and outside the controller.

[0027] Front end module 108 includes a host interface 120 and a physical layer interface (PHY) 122 that provides an electrical interface with the host and/or next level storage controller. The choice of the type of host interface 120 may depend on the type of memory being used. Examples of host interfaces 120 include, but are not limited to, SATA, SATA Express, SAS, Fibre Channel, USB, PCIe, and NVMe. The host interface 120 typically facilitates transfer of data, control signals, and timing signals.

[0028] Back end module 110 includes an error correction controller (ECC) engine 124 that encodes the data bytes received from the host, and decodes and error corrects the data bytes read from the non-volatile memory. A command sequencer 126 generates command sequences, such as program and erase command sequences, to be transmitted to non-volatile memory die 104. A RAID (Redundant Array of Independent Drives) module 128 manages generation of RAID parity and recovery of failed data. The RAID parity may be used as an additional level of integrity protection for the data being written into the memory device 104. In some implementations, the RAID module 128 may be a part of the ECC engine 124. A memory interface 130 provides the command sequences to non-volatile memory die 104 and receives status information from non-volatile memory die 104. In some implementations, memory interface 130 may be a double data rate (DDR) interface, such as a Toggle Mode 200, 400, or 800 interface. A flash control layer 132 controls the overall operation of back end module 110.
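
As a rough illustration of how parity can recover failed data, the following sketch uses plain byte-wise XOR parity. This is an assumption made for the example: the application does not say which parity scheme RAID module 128 uses, and the function names `xor_parity` and `recover` are hypothetical.

```python
from functools import reduce


def xor_parity(chunks):
    """Byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def recover(surviving_chunks, parity):
    """Rebuild a single missing chunk from the survivors and the parity."""
    return xor_parity(list(surviving_chunks) + [parity])
```

Because XOR is its own inverse, combining the parity with every surviving chunk reproduces the one chunk that was lost; a single parity chunk cannot recover two simultaneous losses.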

[0029] Storage system 100 also includes other discrete components 140. Components 140 may include external electrical interfaces, external RAM, resistors, capacitors, and/or other components that may interface with controller 102. In some implementations, one or more of the physical layer interface 122, RAID module 128, media management layer 138 and buffer management/bus controller are optional components that are not necessary in the controller 102.

[0030] FIG. 2B is a block diagram illustrating components of non-volatile memory die 104 in more detail. Non-volatile memory die 104 includes peripheral circuitry 141 and non-volatile memory array 142. Non-volatile memory array 142 includes the non-volatile memory cells used to store data. The non-volatile memory cells may be any suitable non-volatile memory cells, including ReRAM, MRAM, PCM, NAND flash memory cells and/or NOR flash memory cells in a two dimensional and/or three dimensional configuration. Non-volatile memory die 104 further includes a data cache 156 that caches data. Peripheral circuitry 141 includes a state machine 152 that provides status information to the controller 102.

[0031] The memory cells of non-volatile memory die 104 are organized as and/or may be accessed in blocks, where each block includes a number of the memory cells. For example, if non-volatile memory array 142 includes 2 gigabits of flash memory cells (e.g., where each cell stores one bit), the storage can be organized as 2048 blocks, with 64 pages per block, each page including 2112 bytes, made up of a 2048-byte data storage area and a 64-byte spare area. The 64-byte spare area is usable for any suitable purpose, such as error correction, wear leveling, or other functions. These dimensions are exemplary only; any suitable number of flash memory cells, blocks, pages, and bytes can be used. In some implementations, peripheral circuitry 141 includes at least one temperature sensor which is usable to sense or infer a temperature of the memory cells (e.g., junction temperature). In some implementations, the temperature sensor is located elsewhere on die 104 and/or storage system 100.
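
The page and block arithmetic for this example geometry can be checked with a few lines of Python. The constant and function names are hypothetical; the page and spare sizes are those given in the paragraph above.

```python
PAGE_DATA_BYTES = 2048    # data storage area per page
PAGE_SPARE_BYTES = 64     # spare area per page
PAGES_PER_BLOCK = 64

page_bytes = PAGE_DATA_BYTES + PAGE_SPARE_BYTES        # 2112 bytes per page
block_bytes = PAGES_PER_BLOCK * page_bytes             # data plus spare per block
block_data_bytes = PAGES_PER_BLOCK * PAGE_DATA_BYTES   # data area only per block


def blocks_needed(data_capacity_bytes):
    """Blocks required to hold the given data capacity (ceiling division)."""
    return -(-data_capacity_bytes // block_data_bytes)
```

Each block therefore holds 128 KiB of data plus 4 KiB of spare area, and a data area of 2 gigabits (256 MiB, counted in binary units) works out to 2048 such blocks.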

[0032] Various operations can be performed on the blocks of a non-volatile memory array, including block erase and programming operations. An erase operation writes all of the bit values of the cells in a block to one (or zero, depending on convention). In some implementations this is done by reducing the charge stored in each memory cell below a threshold voltage. The threshold voltage can be referred to as an erase verify level. In some implementations, a block erase command returns a status bit which indicates whether the block erase was successful. In some implementations, a separate command is used to determine the status of the block erase. The status typically indicates that the block erase was successful if the charge stored in each memory cell (or a threshold number of memory cells) of the block is below the erase verify level. If any (or a threshold number) of the memory cells has a charge with a voltage above the erase verify level, the status will indicate a block erase failure.
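
The erase verify check can be sketched as a simple predicate over per-cell voltages. This is an illustrative model only: the function name `erase_succeeded`, the parameter `fail_threshold`, and the voltage values in any call are assumptions, and a real device reports this result through a status bit rather than exposing cell voltages.

```python
def erase_succeeded(cell_voltages, erase_verify_level, fail_threshold=1):
    """Return True if the block erase passes verification.

    The erase fails when at least `fail_threshold` cells remain above the
    erase verify level; the default of 1 models the "any cell" case.
    """
    failing = sum(1 for v in cell_voltages if v > erase_verify_level)
    return failing < fail_threshold
```

Raising `fail_threshold` models implementations that tolerate a small number of cells above the erase verify level before reporting a block erase failure.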

[0033] Block erase failures can arise due to various factors, and are typically classified as either hard or soft errors. Hard errors are permanent failures due to manufacturing defects or damage due to wear or other factors. A cell having a hard error is not reliable (e.g., cannot be erased or programmed reliably). Blocks which include hard errors can be referred to as "bad blocks" (BB). Those blocks which become bad (i.e., acquire hard errors) during the course of operation (e.g., due to wear) can be referred to as "developed bad blocks" or "grown bad blocks" (GBB).

[0034] FIG. 3 is a line graph 300 illustrating example cell voltage distributions 302, 304, 306 for memory cells under various circumstances. Distribution 302 illustrates a distribution of voltages for cells in a block which has been fully programmed, e.g., using a program command. Distribution 302 shows that the stored charges of all of the cells in the block have a voltage which exceeds an erase verify voltage level 308. Distribution 304 illustrates a distribution of voltages for the cells after having been fully erased, e.g., using a block erase command. Distribution 304 shows that the stored charges of all of the cells in the block have a voltage which is below the erase verify voltage level 308.

[0035] Distribution 306 illustrates a distribution of voltages for the cells after a failed erase operation, e.g., due to hard errors, or other errors. Distribution 306 shows that the stored charges of all of the cells in the block have a voltage which is above the erase verify voltage level 308, even though the voltages are lower than the programmed voltage levels illustrated by distribution 302. If the block exhibits a distribution of voltages above an erase verify voltage level (e.g., erase verify voltage level 308), the block may be considered to be a bad block or GBB.

[0036] Typically, bad blocks are marked as such in a header of the block, and/or in a list or table stored in a good block of the memory (and/or in system memory, or another suitable location). In some implementations, if a block is marked bad, it is removed from the logical mapping of the memory and a spare good block is substituted in the logical mapping. There are a limited number of spare good blocks, and thus the memory may be subject to failure if the number of spare good blocks is exceeded by the number of grown bad blocks. Accordingly, it may be advantageous in some circumstances to avoid marking blocks as bad unless they are confirmed to include hard errors.
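
The substitution of spare blocks, and the failure that follows once the spares run out, can be illustrated with a toy model. The class `BlockMap`, its attributes, and the use of an in-memory set in place of a bad block table are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


class BlockMap:
    """Toy logical-to-physical block map with a finite pool of spare blocks."""

    def __init__(self, mapping: Dict[int, int], spares: List[int]) -> None:
        self.mapping = dict(mapping)        # logical block -> physical block
        self.spares = list(spares)          # unused good physical blocks
        self.bad_blocks: Set[int] = set()   # stands in for the bad block table

    def retire(self, logical: int) -> int:
        """Mark the physical block behind `logical` bad and map in a spare."""
        if not self.spares:
            # More bad blocks than spares: the failure case described above.
            raise RuntimeError("no spare good blocks left")
        self.bad_blocks.add(self.mapping[logical])
        self.mapping[logical] = self.spares.pop(0)
        return self.mapping[logical]
```

Because every call to `retire` consumes a spare, marking a block bad on the strength of a transient failure shortens the usable life of the memory, which is the motivation for confirming hard errors first.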

[0037] A soft or transient error is a temporary failure where the cell is not reliable temporarily and/or under specific circumstances. Transient errors may occur due to particle strikes, electrostatic discharges, or charge leakage due to high temperature, for example. Temperature related transient errors may arise due to excessive temperatures of the transistor junctions of the memory cells. High junction temperatures can occur due to heavy usage of the memory array, excessive ambient temperatures, and so forth. Susceptibility of the memory cells to temperature related transient errors can be exacerbated by die thinning. For example, reducing the thickness of a die on which memory cells are implemented (e.g., by back grinding) can reduce the temperature at which the memory cell experiences unacceptable charge leakage, and may cause NAND operations, such as block erase operations, to fail in some cases.

[0038] In some implementations, soft errors due to temperature are avoided by preventing operation of the memory cells at a temperature where the memory cells are susceptible to soft errors. For example, in some implementations, junction temperatures are monitored using a temperature sensor located in periphery circuitry of the memory die, or in another suitable location. If the junction temperature exceeds a threshold temperature, the operation of the memory is halted or throttled (e.g., the speed and/or number of memory operations is reduced). In some cases, a staged approach is implemented where the operation of the memory is throttled at a first threshold temperature (e.g., 60 degrees centigrade), and halted at a second, higher threshold temperature (e.g., 70 degrees centigrade). The monitoring and/or halting and throttling may be controlled by a memory controller (e.g., controller 102 as shown and described with respect to FIGS. 1A-C and 2B) or any other suitable hardware and/or software.
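
A minimal sketch of the staged approach follows, using the example thresholds of 60 and 70 degrees centigrade from the text. The names `ThermalAction` and `thermal_action` are assumptions, and the sketch takes "exceeds" literally, so a reading exactly at a threshold does not trigger that stage.

```python
from enum import Enum


class ThermalAction(Enum):
    NORMAL = "normal operation"
    THROTTLE = "throttle memory operations"
    HALT = "halt memory operations"


def thermal_action(junction_temp_c: float,
                   throttle_at_c: float = 60.0,
                   halt_at_c: float = 70.0) -> ThermalAction:
    """Pick the staged response for a junction temperature reading."""
    if throttle_at_c >= halt_at_c:
        raise ValueError("the halt threshold must be the higher threshold")
    if junction_temp_c > halt_at_c:
        return ThermalAction.HALT
    if junction_temp_c > throttle_at_c:
        return ThermalAction.THROTTLE
    return ThermalAction.NORMAL
```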

[0039] Throttling and/or halting memory operations may cause the memory to fail product performance specifications for block budget or otherwise exhibit undesirable performance characteristics for applications where junction temperature (e.g., due to load or high ambient temperature) is expected to regularly exceed the threshold. Further, not throttling and/or halting memory operations may cause the memory to fail block budget specifications, if a large number of blocks are marked bad due to failures under temperature dependent soft-error conditions. Accordingly, it may be desired to handle soft errors due to temperature in a manner which avoids or mitigates these issues.

[0040] In some circumstances, a block which fails erase due to high temperature may not be immediately marked as a GBB, and may still be used for programming under certain circumstances. This may have the advantage of maintaining compliance with block budget and/or performance specifications.

[0041] In some implementations, a block which fails erase due to high temperature is tested to determine whether it should be marked as a potential grown bad block (PGBB), or whether it can still be programmed successfully under the current conditions. Such a test can be referred to as a "program feasibility read". In a program feasibility read, the controller or other suitable circuitry performs an erase operation on the block under test and senses the voltage level of the memory cells of the block. The distribution of voltage levels of the memory cells is compared with one or more voltage thresholds to determine whether the memory cells can be successfully programmed, and in some implementations, under what conditions the memory cells can be successfully programmed.

[0042] FIG. 4 is a line graph 400 illustrating example cell voltage distributions 402, 404, 406 for memory cells of a block during a program feasibility read. In some implementations, the memory cells of the block are tested using a program feasibility read if an erase operation fails while the junction temperature of the block exceeds a threshold TH.sub.TEMP (e.g., 100 degrees centigrade). During the program feasibility read of the block, an erase operation is performed on the block and the voltage level of the memory cells is sensed (e.g., by the controller or other suitable circuitry). The distribution of voltage levels of the memory cells is compared against a first threshold TH.sub.READ1 408 and a second threshold TH.sub.READ2 410 in this example. In some implementations, first threshold TH.sub.READ1 408 and second threshold TH.sub.READ2 410 are based on a characteristic of the memory cells, or are determined empirically.

[0043] If the distribution of voltage levels includes cells (or a threshold number of cells) which exceed TH.sub.READ1 408, the block may be considered to be a PGBB in some implementations. Distribution 402 illustrates a distribution of voltages for cells in a block where cells (or a threshold number of cells) have a voltage exceeding TH.sub.READ1 408. Accordingly, a block corresponding to distribution 402 is marked as a PGBB in some implementations. A PGBB can be marked as such in a header of the block, and/or a PGBB list or table stored in a good block of the memory (and/or in system memory, or another suitable location). In some implementations, a PGBB is treated as a GBB (i.e., not used for programming) under certain temperature conditions (e.g., above a threshold temperature TH.sub.TEMP) and is tested again under other temperature conditions (e.g., below threshold temperature TH.sub.TEMP) to determine whether or not to mark it as an actual GBB.
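
The temperature-dependent treatment of a PGBB can be sketched as follows. The text does not specify how the later test is performed, so this sketch assumes it is a repeated erase attempt supplied as a callable, `retest_erase`, which returns True on success. That callable, the names `PgbbDecision` and `review_pgbb`, and the outcome of clearing the PGBB mark after a successful retest are all assumptions.

```python
from enum import Enum
from typing import Callable


class PgbbDecision(Enum):
    HOLD = "treat as GBB for now; do not program"
    MARK_GBB = "mark as an actual grown bad block"
    CLEAR = "clear the PGBB mark (assumed outcome)"


def review_pgbb(temp_c: float,
                th_temp_c: float,
                retest_erase: Callable[[], bool]) -> PgbbDecision:
    """Decide what to do with a block currently marked as a PGBB."""
    if temp_c > th_temp_c:
        # Still above TH_TEMP: keep the block out of use without retesting.
        return PgbbDecision.HOLD
    # At or below TH_TEMP: retest to see whether the failure persists.
    return PgbbDecision.CLEAR if retest_erase() else PgbbDecision.MARK_GBB
```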

[0044] If the distribution of voltage levels includes cells (or a threshold number of cells) which exceed TH.sub.READ2 410, but do not exceed TH.sub.READ1 408, the block may be considered to be a "marginal failure" in some implementations. Distribution 404 illustrates a distribution of voltages for cells in a block where cells (or a threshold number of cells) have a voltage exceeding TH.sub.READ2 410, but not exceeding TH.sub.READ1 408. Accordingly, a block corresponding to distribution 404 is marked as a marginal failure in some implementations. A marginal failure can be marked as such in a header of the block, and/or a marginal failure list or table stored in a good block of the memory (and/or in system memory, or another suitable location). In some implementations, a marginal failure block can be reliably programmed, despite the erase failure, if error correction is used. In some implementations, the error correction includes setting a soft bit window, or increasing the size of a soft bit window.

[0045] If the distribution of voltage levels does not include cells (or a threshold number of cells) which exceed TH.sub.READ2 410, the block may be considered to be a second type of marginal failure in some implementations. Distribution 406 illustrates a distribution of voltages for cells in a block where no cells (or fewer than a threshold number of cells) have a voltage exceeding TH.sub.READ2 410. Accordingly, a block corresponding to distribution 406 is marked as a second type of marginal failure in some implementations. A second type of marginal failure can be marked as such in a header of the block, and/or a second type of marginal failure list or table stored in a good block of the memory (and/or in system memory, or another suitable location). In some implementations, a second type of marginal failure block can be reliably programmed (e.g., above the threshold temperature TH.sub.TEMP), despite the erase failure and without error correction (or without additional error correction in some implementations).
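
The three outcomes of the program feasibility read described with respect to FIG. 4 can be summarized in a short sketch. The names `ReadResult`, `classify_feasibility_read`, and `min_cells` are hypothetical. The sketch assumes that TH.sub.READ1 is the higher of the two thresholds, as implied by the description of distribution 404, and models the sensed voltages as a list of floats.

```python
from enum import Enum
from typing import Sequence


class ReadResult(Enum):
    PGBB = "potential grown bad block"
    MARGINAL = "marginal failure"
    MARGINAL_TYPE2 = "second type of marginal failure"


def classify_feasibility_read(cell_voltages: Sequence[float],
                              th_read1: float,
                              th_read2: float,
                              min_cells: int = 1) -> ReadResult:
    """Classify a block from the voltages sensed in a program feasibility read.

    min_cells (hypothetical) is the "threshold number of cells" that must
    exceed a voltage threshold for that threshold to count as exceeded.
    """
    if th_read1 <= th_read2:
        raise ValueError("this sketch assumes TH_READ1 > TH_READ2")
    above_read1 = sum(1 for v in cell_voltages if v > th_read1)
    above_read2 = sum(1 for v in cell_voltages if v > th_read2)
    if above_read1 >= min_cells:
        return ReadResult.PGBB             # as for distribution 402
    if above_read2 >= min_cells:
        return ReadResult.MARGINAL         # as for distribution 404
    return ReadResult.MARGINAL_TYPE2       # as for distribution 406
```

Checking TH.sub.READ1 first means a block with cells above both thresholds is classified as a PGBB rather than as a marginal failure.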

[0046] FIG. 5 is a flow chart illustrating an example method 500 for managing temperature dependent block erase failures. The various elements of method 500 are presented in an example order; however, these elements can be performed in any suitable order, or separated, or combined, as appropriate. Any of the techniques discussed herein can be used for any of these elements, as appropriate.

[0047] On condition 502 that an erase failure is detected, it is determined whether a number of blocks in the memory that have been marked as PGBB is above a threshold (e.g., TH.sub.BLOCK).

[0048] On condition 504 that the number of blocks in the memory that have been marked as PGBB is above the threshold, active throttling is applied in 506. On condition 504 that the number of blocks in the memory that have been marked as PGBB is not above the threshold, a temperature of the memory (e.g., junction temperature) is checked in 508.

[0049] On condition 510 that the temperature of the memory does not exceed a threshold temperature (e.g., TH.sub.TEMP), the block under test is marked as a GBB in 512. Otherwise, on condition 510 that the temperature of the memory does exceed the threshold temperature, a program feasibility read operation is initiated in 514.

[0050] On condition 516 that the voltage distribution of the cells in the block exceeds a first threshold (e.g., TH.sub.READ1), the block is marked as a PGBB in 518. Otherwise, on condition 520 that the voltage distribution of the cells in the block exceeds a second threshold (e.g., TH.sub.READ2), the block is marked for error correction, such as soft bit read, or additional error correction, such as an increased soft bit window (e.g., is marked as a marginal failure as shown and described regarding FIG. 4) in 522 and used for programming in 524. On condition 520 that the voltage distribution of the cells in the block does not exceed the second threshold, the block is used for programming in 526.
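
The decision flow of method 500, from detection of an erase failure onward, can be sketched as a single function. This is a hedged illustration and not the disclosed implementation. The names `Action`, `on_erase_failure`, `feasibility_read`, and `min_cells` are hypothetical. The program feasibility read is modeled as a callable returning sensed cell voltages, and what follows active throttling in 506 is not described in this passage, so the sketch simply returns at that point.

```python
from enum import Enum
from typing import Callable, Sequence


class Action(Enum):
    THROTTLE = "apply active throttling (506)"
    MARK_GBB = "mark block as GBB (512)"
    MARK_PGBB = "mark block as PGBB (518)"
    PROGRAM_WITH_EC = "mark for error correction (522) and program (524)"
    PROGRAM = "program (526)"


def on_erase_failure(pgbb_count: int, th_block: int,
                     temp_c: float, th_temp_c: float,
                     feasibility_read: Callable[[], Sequence[float]],
                     th_read1: float, th_read2: float,
                     min_cells: int = 1) -> Action:
    """Walk the conditions of method 500 once an erase failure is detected (502)."""
    # 504: the number of blocks marked as PGBB is above TH_BLOCK.
    if pgbb_count > th_block:
        return Action.THROTTLE
    # 508/510: the temperature does not exceed TH_TEMP.
    if temp_c <= th_temp_c:
        return Action.MARK_GBB
    # 514: program feasibility read, performed only on the hot path.
    voltages = feasibility_read()

    def count_above(threshold: float) -> int:
        return sum(1 for v in voltages if v > threshold)

    # 516: the distribution exceeds TH_READ1.
    if count_above(th_read1) >= min_cells:
        return Action.MARK_PGBB
    # 520: the distribution exceeds TH_READ2 but not TH_READ1.
    if count_above(th_read2) >= min_cells:
        return Action.PROGRAM_WITH_EC
    return Action.PROGRAM
```

Passing the feasibility read as a callable keeps the read from being performed when the flow ends at throttling or GBB marking, mirroring the order of the flow chart.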

[0051] It is noted that conditions 516 and 520 are illustrated separately for ease of description, but can be combined into one condition as desired. It is also noted that various implementations may use portions of method 500, or incorporate method 500 or portions thereof into another method.

[0052] It should be understood that many variations are possible based on the disclosure herein. Although features and elements are described above in particular combinations, each feature or element can be used alone without the other features and elements or in various combinations with or without other features and elements.

[0053] The foregoing detailed description of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teachings. The described embodiments were chosen in order to best explain the principles of the invention and its practical application, to thereby enable others skilled in the art to best utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated.

* * * * *