U.S. patent application number 16/680330, published by the patent office on 2021-05-13, is directed to a fast-converging bit-flipping decoder for low-density parity-check codes. The applicant listed for this patent is SK hynix Inc. The invention is credited to Hongwei Duan, Haobo Wang, Jiangnan Xia, Chenrong Xiong, and Fan Zhang.
United States Patent Application 20210143836
Kind Code: A1
Zhang; Fan; et al.
May 13, 2021
FAST-CONVERGING BIT-FLIPPING DECODER FOR LOW-DENSITY PARITY-CHECK
CODES
Abstract
Disclosed are devices, systems and methods for improving the convergence of a bit-flipping decoder in a non-volatile memory
device. An example method includes receiving a noisy codeword, the
codeword having been generated based on a parity check matrix of a
low-density parity-check code and provided to a communication
channel prior to reception by the bit-flipping decoder, and
performing a single decoding iteration on the received noisy
codeword, the single decoding iteration spanning a plurality of
stages. In some embodiments, performing a single decoding iteration
includes computing a metric corresponding to a single column of the
parity check matrix, flipping at least one bit in the single column
upon a determination that the metric exceeds a flipping threshold,
computing, subsequent to the flipping, a syndrome as a product of
the noisy codeword and the parity check matrix, and updating the
flipping threshold upon a determination that the syndrome is not
zero.
Inventors: Zhang; Fan (Fremont, CA); Xiong; Chenrong (San Jose, CA); Wang; Haobo (San Jose, CA); Duan; Hongwei (San Jose, CA); Xia; Jiangnan (San Jose, CA)

Applicant: SK hynix Inc.; Icheon-si, KR
Family ID: 1000004468245
Appl. No.: 16/680330
Filed: November 11, 2019
Current U.S. Class: 1/1
Current CPC Class: H04L 1/0051 (2013.01); G06F 11/1068 (2013.01); H03M 13/1108 (2013.01)
International Class: H03M 13/11 (2006.01); G06F 11/10 (2006.01)
Claims
1. A method of improving convergence of a bit-flipping decoder in a
non-volatile memory device, comprising: receiving a noisy codeword,
the codeword having been generated based on a parity check matrix
of a low-density parity-check (LDPC) code and provided to a
communication channel prior to reception by the bit-flipping
decoder; and performing a single decoding iteration on the received
noisy codeword, the single decoding iteration spanning a plurality
of stages, wherein, performing the single decoding iteration
includes, for each of the plurality of stages: computing a metric
corresponding to a single column of the parity check matrix,
flipping at least one bit in the single column upon a determination
that the metric exceeds a flipping threshold, computing, subsequent
to the flipping, a syndrome as a product of the noisy codeword and
the parity check matrix, and updating the flipping threshold upon a
determination that the syndrome is not zero, wherein updating the
flipping threshold for a first stage of the plurality of stages is
based on a first set of rules, and wherein updating the flipping
threshold for a second stage subsequent to the first stage is based
on a second set of rules different from the first set of rules, and
wherein the flipping threshold is updated in the middle of the
single decoding iteration.
2. The method of claim 1, wherein updating the flipping threshold
is based on an index of the decoding iteration, a number of
non-zero values in the syndrome, or a number of bits flipped in a
column of the parity check matrix in a previous decoding
iteration.
3. The method of claim 1, wherein a set of rules comprises a finite
state machine (FSM).
4. The method of claim 3, wherein the FSM is configured to maximize
a throughput of the bit-flipping decoder.
5. The method of claim 1, wherein flipping the at least one bit
comprises flipping multiple bits in the single column.
6. The method of claim 1, wherein each stage of the plurality of
stages corresponds to a predetermined number of clock cycles.
7. The method of claim 1, wherein an index of the single decoding
iteration is less than a maximum number of decoding iterations.
8. A system for improving convergence of a bit-flipping decoder in
a non-volatile memory device, comprising: a processor and a memory
including instructions stored thereupon, wherein the instructions
upon execution by the processor cause the processor to: receive a
noisy codeword, the codeword having been generated based on a
parity check matrix of a low-density parity-check (LDPC) code and
provided to a communication channel prior to reception by the
bit-flipping decoder; and perform a single decoding iteration on
the received noisy codeword, the single decoding iteration spanning
a plurality of stages, wherein, performing the single decoding
iteration includes, for each of the plurality of stages: computing
a metric corresponding to a single column of the parity check
matrix, flipping at least one bit in the single column upon a
determination that the metric exceeds a flipping threshold,
computing, subsequent to the flipping, a syndrome as a product of
the noisy codeword and the parity check matrix, and updating the
flipping threshold upon a determination that the syndrome is not
zero, wherein updating the flipping threshold for a first stage of
the plurality of stages is based on a first set of rules, and
wherein updating the flipping threshold for a second stage
subsequent to the first stage is based on a second set of rules
different from the first set of rules, and wherein the flipping
threshold is updated in the middle of the single decoding
iteration.
9. The system of claim 8, wherein updating the flipping threshold
is based on an index of the decoding iteration, a number of
non-zero values in the syndrome, or a number of bits flipped in a
column of the parity check matrix in a previous decoding
iteration.
10. The system of claim 8, wherein a set of rules comprises a
finite state machine (FSM).
11. The system of claim 10, wherein the FSM is configured to
maximize a throughput of the bit-flipping decoder.
12. The system of claim 8, wherein flipping the at least one bit
comprises flipping multiple bits in the single column.
13. The system of claim 8, wherein each stage of the plurality of
stages corresponds to a predetermined number of clock cycles.
14. The system of claim 8, wherein an index of the single decoding
iteration is less than a maximum number of decoding iterations.
15. A non-transitory computer-readable storage medium having
instructions stored thereupon for improving convergence of a
bit-flipping decoder in a non-volatile memory device, the
instructions being configured to be executed by a processor, and
the storage medium comprising: instructions for receiving a noisy
codeword, the codeword having been generated based on a parity
check matrix of a low-density parity-check (LDPC) code and provided
to a communication channel prior to reception by the bit-flipping
decoder; and instructions for performing a single decoding
iteration on the received noisy codeword, the single decoding
iteration spanning a plurality of stages, wherein, the instructions
for performing the single decoding iteration include, for each of
the plurality of stages: instructions for computing a metric
corresponding to a single column of the parity check matrix,
instructions for flipping at least one bit in the single column
upon a determination that the metric exceeds a flipping threshold,
instructions for computing, subsequent to the flipping, a syndrome
as a product of the noisy codeword and the parity check matrix, and
instructions for updating the flipping threshold upon a
determination that the syndrome is not zero, wherein updating the
flipping threshold for a first stage of the plurality of stages is
based on a first set of rules, and wherein updating the flipping
threshold for a second stage subsequent to the first stage is based
on a second set of rules different from the first set of rules, and
wherein the flipping threshold is updated in the middle of the
single decoding iteration.
16. The storage medium of claim 15, wherein updating the flipping
threshold is based on an index of the decoding iteration, a number
of non-zero values in the syndrome, or a number of bits flipped in
a column of the parity check matrix in a previous decoding
iteration.
17. The storage medium of claim 15, wherein a set of rules
comprises a finite state machine (FSM).
18. The storage medium of claim 17, wherein the FSM is configured
to maximize a throughput of the bit-flipping decoder.
19. The storage medium of claim 15, wherein flipping the at least
one bit comprises flipping multiple bits in the single column.
20. The storage medium of claim 15, wherein each stage of the
plurality of stages corresponds to a predetermined number of clock
cycles.
Description
TECHNICAL FIELD
[0001] This patent document generally relates to non-volatile
memory devices, and more specifically, to error correction in
non-volatile memory devices.
BACKGROUND
[0002] Data integrity is an important feature for any data storage device and for data transmission. Use of strong error-correction codes
(ECCs) is recommended for various types of data storage devices
including NAND flash memory devices.
[0003] Solid-state drives (SSDs) use multi-level NAND flash devices for persistent storage. However, multi-level NAND flash devices can be inherently unreliable and generally need to use ECCs to allow a dramatic increase in data reliability at the expense of extra storage space for ECC parity bits. There is a demand for ECCs that can provide data protection with improved convergence properties.
SUMMARY
[0004] Embodiments of the disclosed technology relate to methods,
devices and systems for improving the convergence of a bit-flipping
decoder for low-density parity-check (LDPC) codes. The methods and
devices described in the present document advantageously, among
other features and benefits, enable faster convergence in
bit-flipping decoding algorithms, thereby providing higher
throughput and lower power consumption.
[0005] In an example aspect, a method for improving convergence of
a bit-flipping decoder includes receiving a noisy codeword, the
codeword having been generated based on a parity check matrix of an
LDPC code and provided to a communication channel prior to
reception by the bit-flipping decoder; and performing a single
decoding iteration on the received noisy codeword, the single
decoding iteration spanning a plurality of stages, wherein,
performing a single decoding iteration includes, for each of the
plurality of stages: computing a metric corresponding to a single
column of the parity check matrix, flipping at least one bit in the
single column upon a determination that the metric exceeds a
flipping threshold (T), computing, subsequent to the flipping, a
syndrome (s) as a product of the noisy codeword and the parity
check matrix, and updating the flipping threshold upon a
determination that the syndrome is not zero, wherein updating the
flipping threshold for a first stage of the plurality of stages is
based on a first set of rules, and wherein updating the flipping
threshold for a second stage subsequent to the first stage is based
on a second set of rules different from the first set of rules.
[0006] In another example aspect, the above-described method may be implemented by an apparatus that comprises a processor (e.g., a memory controller of a non-volatile memory device).
[0007] In yet another example aspect, these methods may be embodied
in the form of processor-executable instructions and stored on a
computer-readable program medium.
[0008] The subject matter described in this patent document can be
implemented in specific ways that provide one or more of the
following features.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] FIG. 1 illustrates an example of a memory system.
[0010] FIG. 2 is an illustration of an example non-volatile memory
device.
[0011] FIG. 3 is an example diagram illustrating the cell voltage level distribution (V_th) of a non-volatile memory device.

[0012] FIG. 4 is another example diagram illustrating the cell voltage level distribution (V_th) of a non-volatile memory device.

[0013] FIG. 5 is an example diagram illustrating the cell voltage level distribution (V_th) of a non-volatile memory device before and after program interference.

[0014] FIG. 6 is an example diagram illustrating the cell voltage level distribution (V_th) of a non-volatile memory device as a function of the reference voltage.
[0015] FIG. 7 is a performance plot illustrating the fail-bit (FB)
distribution and codeword failure rate (CFR) of a bit-flipping
decoder in an exemplary NAND device.
[0016] FIG. 8 illustrates a flowchart of an example method for
improving convergence of a bit-flipping decoder.
[0017] FIG. 9A illustrates a flowchart of another example method
for improving convergence of a bit-flipping decoder.
[0018] FIG. 9B illustrates a flowchart of yet another example method for improving convergence of a bit-flipping decoder.
DETAILED DESCRIPTION
[0019] Low-density parity-check (LDPC) codes are an important part
of linear block error-correcting codes (ECCs), which have found
widespread use in data storage systems. LDPC codes can be decoded
by using two classes of decoding algorithms: soft-information
message passing algorithms, e.g., min-sum or sum-product
algorithms, and hard-decision algorithms such as bit-flipping
algorithms. The soft-information decoding algorithms provide very good decoding performance but require large computational resources, and they therefore exhibit very high complexity in hardware realizations. In contrast, hardware implementations of hard-decision decoders have been shown to have low complexity and reduced latency requirements, due to their simple computation units and smaller connection networks, and have been developed to provide comparable error correction performance. In other systems,
combinations of hard- and soft-decoding implementations are
employed.
[0020] In most bit-flipping decoder architectures, the correction capability of a decoder is optimized so that it can correct more data and reduce the traffic going to the min-sum hard (MSH)
decoder. However, based on data collected from thousands of NAND
devices, this perspective may be inappropriate because correcting
more data only marginally reduces the traffic to the MSH decoder.
Thus, embodiments of the disclosed technology focus on, amongst
other features and benefits, increasing the convergence of the
bit-flipping decoder, thereby resulting in higher decoder
throughput and lower power consumption requirements.
[0021] FIGS. 1-6 overview a non-volatile memory system in which
embodiments of the disclosed technology may be implemented.
[0022] FIG. 1 is a block diagram of an example of a memory system
100 implemented based on some embodiments of the disclosed
technology. The memory system 100 includes a memory module 110 that
can be used to store information for use by other electronic
devices or systems. The memory system 100 can be incorporated
(e.g., located on a circuit board) in other electronic devices and
systems. Alternatively, the memory system 100 can be implemented as
an external storage device such as a USB flash drive and a
solid-state drive (SSD).
[0023] The memory module 110 included in the memory system 100 can
include memory areas (e.g., memory arrays) 102, 104, 106, and 108.
Each of the memory areas 102, 104, 106, and 108 can be included in
a single memory die or in multiple memory dice. The memory die can
be included in an integrated circuit (IC) chip.
[0024] Each of the memory areas 102, 104, 106, and 108 includes a
plurality of memory cells. Read, program, or erase operations can
be performed on a memory unit basis. Thus, each memory unit can
include a predetermined number of memory cells. The memory cells in
a memory area 102, 104, 106, or 108 can be included in a single
memory die or in multiple memory dice.
[0025] The memory cells in each of memory areas 102, 104, 106, and
108 can be arranged in rows and columns in the memory units. Each
of the memory units can be a physical unit. For example, a group of
a plurality of memory cells can form a memory unit. Each of the
memory units can also be a logical unit. For example, the memory
unit can be a bank, block, or page that can be identified by a unique address such as a bank address, block address, or page address. During a read or write operation, the unique address
associated with a particular memory unit can be used to access that
particular memory unit. Based on the unique address, information
can be written to or retrieved from one or more memory cells in
that particular memory unit.
[0026] The memory cells in the memory areas 102, 104, 106, and 108
can include non-volatile memory cells. Examples of non-volatile
memory cells include flash memory cells, phase change memory (PRAM)
cells, magnetoresistive random-access memory (MRAM) cells, or other
types of non-volatile memory cells. In an example implementation
where the memory cells are configured as NAND flash memory cells,
the read or write operation can be performed on a page basis.
However, an erase operation in a NAND flash memory is performed on
a block basis.
[0027] Each of the non-volatile memory cells can be configured as a
single-level cell (SLC) or multiple-level memory cell. A
single-level cell can store one bit of information per cell. A
multiple-level memory cell can store more than one bit of
information per cell. For example, each of the memory cells in the
memory areas 102, 104, 106, and 108 can be configured as a
multi-level cell (MLC) to store two bits of information per cell, a
triple-level cell (TLC) to store three bits of information per
cell, or a quad-level cell (QLC) to store four bits of information
per cell. In another example, each of the memory cells in memory
area 111 can be configured to store at least one bit of information
(e.g., one bit of information or multiple bits of information), and
each of the memory cells in memory area 112 can be configured to
store more than one bit of information.
[0028] As shown in FIG. 1, the memory system 100 includes a controller module 120. The controller module 120 includes a memory interface 121 to communicate with the memory module 110, a host interface 126 to communicate with a host (not shown), a processor 124 to execute firmware-level code, and caches and memories 122 and 123 to temporarily or persistently store executable firmware/instructions and associated information. In some implementations, the controller module 120 can include an error correction engine 125 to perform error correction operations on information stored in the memory module 110. Error correction engine 125 can be configured to detect/correct single bit errors or multiple bit errors. In another implementation, error correction engine 125 can be located in the memory module 110.
[0029] The host can be a device or a system that includes one or
more processors that operate to retrieve data from the memory
system 100 or store or write data into the memory system 100. In
some implementations, examples of the host can include a personal
computer (PC), a portable digital device, a digital camera, a
digital multimedia player, a television, and a wireless
communication device.
[0030] In some implementations, the controller module 120 can also include a host interface 126 to communicate with the host. Host interface 126 can include components that comply with at least one of several host interface specifications, including but not limited to, Serial Advanced Technology Attachment (SATA), Serial Attached Small Computer System Interface (SAS), and Peripheral Component Interconnect Express (PCIe).
[0031] FIG. 2 illustrates an example of a memory cell array
implemented based on some embodiments of the disclosed
technology.
[0032] In some implementations, the memory cell array can include a NAND flash memory array that is partitioned into many blocks, and each block contains a certain number of pages. Each block includes a plurality of memory cell strings, and each memory cell string includes a plurality of memory cells.
[0033] In some implementations where the memory cell array is NAND
flash memory array, read and write (program) operations are
performed on a page basis, and erase operations are performed on a
block basis. All the memory cells within the same block must be
erased at the same time before performing a program operation on
any page included in the block. In an implementation, NAND flash
memories may use an even/odd bit-line structure. In another
implementation, NAND flash memories may use an all-bit-line
structure. In the even/odd bit-line structure, even and odd bit-lines are interleaved along each word-line and are alternately accessed so that each pair of even and odd bit-lines can share peripheral circuits such as page buffers. In the all-bit-line structure, all the bit-lines are accessed at the same time.
[0034] FIG. 3 illustrates an example of threshold voltage
distribution curves in a multi-level cell device, wherein the
number of cells for each program/erase state is plotted as a
function of the threshold voltage. As illustrated therein, the
threshold voltage distribution curves include the erase state
(denoted "ER" and corresponding to "11") with the lowest threshold
voltage, and three program states (denoted "P1", "P2" and "P3"
corresponding to "01", "00" and "10", respectively) with read
voltages in between the states (denoted by the dotted lines). In
some embodiments, each of the threshold voltage distributions of
program/erase states has a finite width because of differences in
material properties across the memory array.
[0035] In writing more than one data bit in a memory cell, fine
placement of the threshold voltage levels of memory cells is needed
because of the reduced distance between adjacent distributions.
This is achieved by using incremental step pulse programming (ISPP), i.e., memory cells on the same word-line are repeatedly programmed using a program-and-verify approach with a staircase program voltage applied to the word-lines. Each programmed state is associated with a verify voltage that is used in verify operations and sets the target position of each threshold voltage distribution window.
[0036] Read errors can be caused by distorted or overlapped
threshold voltage distribution. An ideal memory cell threshold
voltage distribution can be significantly distorted or overlapped
due to, e.g., program and erase (P/E) cycle, cell-to-cell
interference, and data retention errors, which will be discussed in
the following, and such read errors may be managed in most
situations by using error correction codes (ECC).
[0037] FIG. 4 illustrates an example of ideal threshold voltage
distribution curves 410 and an example of distorted threshold
voltage distribution curves 420. The vertical axis indicates the
number of memory cells that has a particular threshold voltage
represented on the horizontal axis.
[0038] For n-bit multi-level cell NAND flash memory, the threshold voltage of each cell can be programmed to 2^n possible values. In an ideal multi-level cell NAND flash memory, each value corresponds to a non-overlapping threshold voltage window.
[0039] Flash memory P/E cycling causes damage to the tunnel oxide of the floating gate or the charge trapping layer of cell transistors, which results in threshold voltage shifts and thus gradually degrades the memory device noise margin. As P/E cycles increase, the margin
between neighboring distributions of different programmed states
decreases and eventually the distributions start overlapping. The
data bit stored in a memory cell with a threshold voltage
programmed in the overlapping range of the neighboring
distributions may be misjudged as a value other than the original
targeted value.
[0040] FIG. 5 illustrates an example of a cell-to-cell interference
in NAND flash memory. The cell-to-cell interference can also cause
threshold voltages of flash cells to be distorted. The threshold
voltage shift of one memory cell transistor can influence the
threshold voltage of its adjacent memory cell transistor through a parasitic capacitance-coupling effect between the interfering cell and the victim cell. The amount of the cell-to-cell interference may be affected by the NAND flash memory bit-line structure. In the even/odd bit-line structure, memory cells on one word-line are alternately connected to even and odd bit-lines, and even cells are programmed ahead of odd cells in the same word-line. Therefore, even cells and odd cells experience different amounts of cell-to-cell interference. Cells in the all-bit-line structure suffer less cell-to-cell interference than even cells in the even/odd bit-line structure, and the all-bit-line structure can effectively support high-speed current sensing to improve the memory read and verify speed.
[0041] The dotted lines in FIG. 5 denote the nominal distributions
of P/E states (before program interference) of the cells under
consideration, and the "neighbor state value" denotes the value
that the neighboring state has been programmed to. As illustrated
in FIG. 5, if the neighboring state is programmed to P1, the
threshold voltage distributions of the cells under consideration
shift by a specific amount. However, if the neighboring state is programmed to P2, which has a higher threshold voltage than P1, the shift is greater than when the neighboring state is P1. Similarly, the shift in the threshold voltage distributions is greatest when the neighboring state is programmed to P3.
[0042] FIG. 6 illustrates an example of a retention error in NAND
flash memory by comparing normal threshold-voltage distribution and
shifted threshold-voltage distribution. The data stored in NAND
flash memories tend to get corrupted over time and this is known as
a data retention error. Retention errors are caused by loss of
charge stored in the floating gate or charge trap layer of the cell
transistor. Due to wear of the floating gate or charge trap layer,
memory cells with more program erase cycles are more likely to
experience retention errors. In the example of FIG. 6, comparing
the top row of voltage distributions (before corruption) and the
bottom row of distributions (contaminated by retention error)
reveals a shift to the left.
[0043] FIG. 7 is a performance plot illustrating the fail-bit (FB)
distribution and codeword failure rate (CFR) of a bit-flipping
decoder in an exemplary NAND device (e.g., the NAND device
described in the context of FIGS. 1-6). FIG. 7 illustrates NAND
fail-bit (FB) count distributions on the left-hand y-axis for
different values of erase/write cycles (e.g., EWCYC=3000, 5000 and
7000), and the codeword failure rate for a bit-flipping decoder
(denoted "BF") and an improved bit-flipping decoder (denoted "iBF")
on the right-hand y-axis. In an example, the iBF decoder is a version of the BF decoder in which the energy function (or flipping energy or indicator) is based on channel information, e.g., whether or not the current hard decision is different from the hard decision corresponding to the received bit.
[0044] As illustrated in FIG. 7, an orders-of-magnitude reduction in the CFR of a bit-flipping decoder translates to only a tiny traffic reduction to the min-sum hard (MSH) decoder. This is because a majority of the traffic, even at end-of-life (EOL) conditions, has low FB counts. Only a small percentage of the traffic will have an FB count that is close to the waterfall point of the bit-flipping decoder's CFR curve. Thus, embodiments of the
disclosed technology focus on, amongst other features and benefits,
optimizing or improving the operations of the bit-flipping decoder
to handle the majority of traffic, which is characterized by a low
FB count. As discussed earlier, this optimization results in an
increased decoder throughput and lower power consumption
requirements.
[0045] An exemplary operation of the bit-flipping decoder assumes that x = [x_0, x_1, . . . , x_{N-1}] is a bit vector and H = [h_{ij}] is an M×N low-density parity-check matrix with a binary value h_{ij} at row i and column j, and with each row of H providing a parity check for x. If x is a codeword of H, then xH^T = 0. It is further assumed that x is transmitted over a noisy channel and that the output of the channel is y = [y_0, y_1, . . . , y_{N-1}], which is processed by the receiver. The parity-check matrix of an LDPC code can be represented by a bipartite graph called a Tanner graph that is composed of two groups of nodes: check nodes and variable nodes.
[0046] In an example, the receiver computes the syndrome of y as a binary vector s = [s_0, s_1, . . . , s_{M-1}] = yH^T, with the weight of s (denoted ||s|| and equivalent to the number of "1" entries in s) representing the number of unsatisfied check nodes. Furthermore, the number of unsatisfied check nodes connected to variable node j is denoted ||s_j||.
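To make the syndrome computation concrete, the following Python sketch (illustrative only; the helper names and the small matrix H are hypothetical, not from the patent) computes s = yH^T over GF(2) and the per-column counts ||s_j||:

```python
def syndrome(y, H):
    """s = y * H^T over GF(2): s[i] is the parity of the bits of y selected by row i of H."""
    return [sum(h & b for h, b in zip(row, y)) % 2 for row in H]

def unsat_counts(s, H):
    """||s_j|| for each variable node j: number of unsatisfied checks connected to column j."""
    m, n = len(H), len(H[0])
    return [sum(s[i] for i in range(m) if H[i][j]) for j in range(n)]

# Tiny illustrative parity-check matrix (not from the patent).
H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 0, 0, 1, 1]]
y = [0, 1, 0, 0, 0, 0]   # all-zero codeword with one channel error in bit 1

s = syndrome(y, H)        # checks 0 and 1 are unsatisfied: s == [1, 1, 0]
```

Note that in this toy example the erroneous bit (column 1) attains the largest ||s_j||, which is exactly the property the flipping energy exploits.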
[0047] An example algorithm for improving convergence of a bit-flipping decoder, according to embodiments of the disclosed technology, assumes E(i) is the flipping energy (or indicator) of the i-th column, T is the flipping threshold, s is the syndrome, t is the iteration index (or number) such that 0 ≤ t ≤ t_max, d_j^(t) denotes the decision of variable node j at the t-th iteration, d^(t) = [d_0^(t), d_1^(t), . . . , d_{N-1}^(t)], and s^(t) = d^(t) H^T.

[0048] Lastly, if it is assumed that d^(-1) = [d_0^(-1), d_1^(-1), . . . , d_{N-1}^(-1)] = y, then s^(-1) = yH^T.
[0049] In some embodiments, the iterative decoding algorithm is described as follows:

[0050] Operation 0: Set t=0, calculate s^(-1), and go to Operation 1;

[0051] Operation 1: If the maximum number of iterations is exceeded, stop decoding. Else, go to Operation 2;

[0052] Operation 2: If s=0, stop decoding. Otherwise, go to Operation 3;

[0053] Operation 3: For all of the variable nodes, perform the following operations:

[0054] update the flipping indicator (or energy), E(j);

[0055] update the flipping threshold, T;

[0056] flip the j-th bit if E(j) > T;

[0057] compute and update s; and

[0058] go to Operation 1.
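Operations 0-3 can be sketched in Python as follows. This is an illustrative rendering, not the patent's implementation: the flipping energy here is simply the unsatisfied-check count for the column, and the threshold T is held fixed rather than updated by the adaptive rules described below the operations.

```python
def bit_flip_decode(y, H, T=1, t_max=20):
    """Sketch of Operations 0-3: iterate until the syndrome is zero or t_max is reached."""
    m, n = len(H), len(H[0])
    d = list(y)                                            # d^(-1) = y
    calc_s = lambda d: [sum(H[i][j] & d[j] for j in range(n)) % 2 for i in range(m)]
    s = calc_s(d)                                          # Operation 0: s^(-1)
    for t in range(t_max):                                 # Operation 1: max-iteration check
        if not any(s):                                     # Operation 2: s == 0 => decoded
            return d, t
        for j in range(n):                                 # Operation 3: sweep the variable nodes
            E = sum(s[i] for i in range(m) if H[i][j])     # simplified flipping energy ~ ||s_j||
            if E > T:                                      # flip the j-th bit if E(j) > T
                d[j] ^= 1
                s = calc_s(d)                              # compute and update s
    return d, t_max

# Toy parity-check matrix and received word (illustrative values).
H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 0, 0, 1, 1]]
decoded, iters = bit_flip_decode([0, 1, 0, 0, 0, 0], H)
```

Recomputing s immediately after each flip, rather than once per iteration, is what lets a single sweep correct several dependent errors.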
[0059] In some embodiments, the flipping indicator E(j) is updated as follows:

[0060] if d_j^(t) = y_j, then E(j) = ||s_j||; otherwise, E(j) = ||s_j|| + α.

[0061] Herein, α can be any positive value. For example, if α is set to 1, then E(j) = ||s_j|| + 1 when d_j^(t) ≠ y_j.
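As a minimal sketch of this indicator update (the function name and toy values are hypothetical, not from the patent):

```python
def flipping_energy(j, s, H, d_j, y_j, alpha=1):
    """E(j) = ||s_j||, plus alpha when the current decision d_j disagrees with the channel bit y_j."""
    e = sum(s[i] for i in range(len(H)) if H[i][j])
    return e + alpha if d_j != y_j else e

# Toy example: 3 checks, 6 variable nodes (illustrative values).
H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 0, 0, 1, 1]]
s = [1, 1, 0]   # checks 0 and 1 currently unsatisfied
```

The alpha term biases the decoder toward flipping a bit back when its current decision already disagrees with what was received from the channel.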
[0062] In some embodiments, the adaptive flipping threshold T is generated based on the iteration number t, the number of unsatisfied check nodes of the previous iteration ||s||, and/or the number of variable nodes flipped in the previous iteration, denoted p herein. In an example finite state machine (FSM) that can be used to adaptively update the flipping threshold, the following computations are performed:

[0063] if (t == 0) and (||s|| < β_1), then T = δ_1;

[0064] else if (t == 0), then T = δ_1 - 1;

[0065] else if (t == 1), then T = δ_1;

[0066] else if (p == 0), then T = T - 1;

[0067] else if (t > ε) and (||s|| < β_2), then T = δ_2;

[0068] else if (||s|| < β_3) and (p > θ_1) and (T + 1 < δ_3), then T = T + 1;

[0069] else if (||s|| < β_3) and (p > θ_1), then T = δ_3;

[0070] else if (p ≤ θ_2) and (||s|| > β_3), then T = max(δ_4, T - 1);

[0071] else T = T.
[0072] Herein, β_1, β_2, β_3, δ_1, δ_2, δ_3, δ_4, θ_1, θ_2 and ε are parameters that are predetermined or that can be configured adaptively. In an example, β_1=50, β_2=20, β_3=9999, δ_1=5, δ_2=4, δ_3=5, δ_4=3, θ_1=10, θ_2=10, and ε=10. In another example, these parameters may be optimized to maximize the throughput of the bit-flipping decoder.
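The FSM of paragraphs [0063]-[0071] translates directly into an if/else chain. The default parameter values below are illustrative assumptions patterned on the example in [0072]; the argument names (s_weight for ||s||, p for the number of bits flipped in the previous iteration) are labels chosen here, not the patent's interface.

```python
def update_threshold(T, t, s_weight, p,
                     beta1=50, beta2=20, beta3=9999,
                     delta1=5, delta2=4, delta3=5, delta4=3,
                     theta1=10, theta2=10, eps=10):
    """One step of the example threshold FSM ([0063]-[0071]).

    T is the current threshold, t the iteration index, s_weight = ||s||
    the number of unsatisfied checks, and p the number of variable nodes
    flipped in the previous iteration. Defaults are illustrative.
    """
    if t == 0 and s_weight < beta1:
        return delta1
    elif t == 0:
        return delta1 - 1
    elif t == 1:
        return delta1
    elif p == 0:
        return T - 1
    elif t > eps and s_weight < beta2:
        return delta2
    elif s_weight < beta3 and p > theta1 and (T + 1) < delta3:
        return T + 1
    elif s_weight < beta3 and p > theta1:
        return delta3
    elif p <= theta2 and s_weight > beta3:
        return max(delta4, T - 1)
    return T
```

Note that the branch ordering matters: the p==0 case (no bits flipped, so the threshold is lowered) must be tested before the later, more specific conditions.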
[0073] In some embodiments, the error correction mechanism flips a
single bit. In other embodiments, errors are corrected by flipping
multiple bits.
[0074] FIG. 8 illustrates a flowchart of an example method for improving the convergence of a bit-flipping decoder. Maintaining the notation of the previously described example, the method 800 begins with initializing the decoder 805 (e.g., by running a dummy iteration) to generate the bit-flipping threshold T. Similar to Operation 3 in the above description, the flipping energy for the i-th column E(i) is computed 810, and compared to the bit-flipping threshold 815, which results in either a bit being flipped (Δ=1 if E(i)≥T) or a bit not being flipped (Δ=0 if E(i)<T).
[0075] The conditional bit-flipping is followed by recomputing the
syndrome (s_new) and the checksum value (C) 820. If the checksum
value is equal to zero (operation 825), then the bit-flipping
decoder is stopped since the received codeword has been correctly
decoded. If the checksum is not equal to zero, then the syndrome
and checksum values are updated 830, and the threshold is updated
835. In an example, the threshold is updated based on the iteration
number, the number of unsatisfied check nodes of the previous
iteration, and/or the number of variable nodes flipped in the
previous iteration, as described above.
[0076] The column index (i) is then incremented, and whether the last column has been processed is determined 840. If all columns have been
processed, then the iteration index is incremented and the next
iteration is initialized 850. If the iteration index has exceeded
the maximum number of iterations 845, then the bit-flipping decoder
is stopped. Otherwise, control returns to operation 810 where the
flipping energy for the (i+1)-th column E(i+1) is computed.
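The per-column (staged) schedule of method 800 can be sketched as follows. The key difference from the conventional loop is that the syndrome/checksum is re-checked after every column, so decoding can stop mid-iteration (operation 825). The fixed per-iteration threshold here is a simplifying assumption; in the method described above, operation 835 updates it via the FSM.

```python
def staged_bit_flip_decode(y, H, T=1, max_iters=20, alpha=1):
    """Sketch of method 800's per-column schedule: one stage per column,
    with the checksum tested after each stage (operations 820/825).
    The threshold T is held constant here as a simplification."""
    M, N = len(H), len(H[0])
    d = list(y)
    s = [sum(H[m][n] * d[n] for n in range(N)) % 2 for m in range(M)]
    for t in range(max_iters):
        for i in range(N):                           # one stage per column
            E = sum(s[m] for m in range(M) if H[m][i])   # operation 810
            if d[i] != y[i]:
                E += alpha
            if E >= T:                               # operation 815 (Δ = 1)
                d[i] ^= 1
                for m in range(M):                   # recompute syndrome 820
                    if H[m][i]:
                        s[m] ^= 1
            if not any(s):                           # operation 825: C == 0
                return d, True                       # stop mid-iteration
        # operation 835 would update T here (e.g., via the FSM)
    return d, not any(s)
```

If a codeword has only one or two bit errors near the first columns, this schedule terminates after a few stages rather than completing the full iteration.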
[0077] FIG. 9A illustrates a flowchart of an example method 900 for
improving convergence of a bit-flipping decoder. The method 900
includes, at operation 910, receiving a noisy codeword, the
codeword having been generated based on a parity check matrix of a
low-density parity-check (LDPC) code and provided to a
communication channel prior to reception by the bit-flipping
decoder.
[0078] The method 900 includes, at operation 920, performing a
single decoding iteration on the received noisy codeword, the
single decoding iteration spanning a plurality of stages.
[0079] FIG. 9B illustrates a flowchart related to operation 920 of
FIG. 9A. In some embodiments, performing the single decoding
iteration includes, for each of the plurality of stages, computing
a metric corresponding to a single column of the parity check
matrix (922).
[0080] Operation 920 includes flipping at least one bit in the
single column upon a determination that the metric exceeds a
flipping threshold (924).
[0081] Operation 920 includes computing, subsequent to the
flipping, a syndrome as a product of the noisy codeword and the
parity check matrix (926).
[0082] Operation 920 includes updating the flipping threshold upon a determination that the syndrome is not zero, the flipping threshold for a first stage of the plurality of stages being updated based on a first set of rules, and the flipping threshold for a second stage subsequent to the first stage being updated based on a second set of rules different from the first set of rules (928).
[0083] In some embodiments, updating the flipping threshold is
based on an index of the decoding iteration, a number of non-zero
values in the syndrome, or a number of bits flipped in a column of
the parity check matrix in a previous decoding iteration.
[0084] In some embodiments, a set of rules comprises a finite state
machine (FSM).
[0085] In some embodiments, the FSM is configured to maximize a throughput of the bit-flipping decoder. FIG. 7 illustrated that orders-of-magnitude reductions in the CFR of a bit-flipping decoder translate to only a small reduction in the traffic to the min-sum hard (MSH) decoder. Thus, embodiments of the disclosed technology focus on, amongst other features and benefits, improving the convergence of the bit-flipping decoder instead of simply reducing the traffic to the MSH decoder.
[0086] In some embodiments, flipping the at least one bit comprises
flipping multiple bits in the single column. In an example,
flipping multiple bits can improve the convergence speed.
[0087] In some embodiments, each stage of the plurality of stages
corresponds to a predetermined number of clock cycles.
[0088] Typically, a single bit-flipping decoder iteration operates over several clock cycles (e.g., number of clock cycles K>>1), and the bit-flipping threshold (amongst other parameters) is only updated after an iteration is completed, i.e., the decoder parameters are updated for the next multi-clock-cycle iteration if the iteration index has not exceeded the maximum number of iterations. Embodiments of the disclosed technology update the bit-flipping threshold after a predetermined number of clock cycles (e.g., K=1, 2, 5 and so on), which can occur in the middle of an iteration.
[0089] In some embodiments, an index of the single decoding
iteration is less than a maximum number of decoding iterations.
[0090] LDPC codes in data storage systems are typically configured for operation at a signal-to-noise ratio (SNR) that is higher than that of a typical wireless channel. Consequently, the number of errors encountered in the received (or read) codeword is smaller than in the case of a wireless channel. Thus, embodiments of the disclosed technology are configured to perform LDPC bit-flipping decoding operations at time increments smaller than a complete iteration (e.g., per clock cycle or per a predetermined number of clock cycles). This ensures that if a codeword only has 1-2 bit errors, and they get corrected after processing the first few columns, then the remainder of the iteration need not be completed as in a conventional bit-flipping decoder, because the checksum and syndrome are checked after each bit-flipping operation. This advantageously increases the decoder throughput and reduces power consumption.
[0091] The methods and algorithms described above (including
methods 800 and 900) can be configured to update the syndrome at
every clock cycle, which is applied to the flipping indicator (or
energy) calculation after a certain number of cycles of processing
delay. The methods can be further configured to update the flipping
threshold at every single clock cycle (or after every predetermined
number of clock cycles).
[0092] In some embodiments, different finite state machines (FSMs)
can be used to update the threshold at different clock cycles. In
an example, FSM.sub.1 is used to update the threshold at a first
clock cycle, and FSM.sub.2 is used to update the threshold at a
second clock cycle and so on. In another example, FSM.sub.1 is used
initially, FSM.sub.2 is used after a first predetermined number of
clock cycles, and FSM.sub.3 is used for the last set of clock
cycles. In yet another example, and if a quasi-cyclic LDPC
(QC-LDPC) code is being used, the FSM is configured based on
properties of the circulant weights of the parity check matrix. The
FSMs can be optimized to maximize the throughput of the
bit-flipping decoder.
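The cycle-dependent FSM selection described in [0092] can be illustrated with a small dispatcher. The schedule shape (a list of (start_cycle, fsm) pairs) and the cut-over cycles are hypothetical choices for illustration, not the patent's interface.

```python
def threshold_fsm_for_cycle(cycle, fsm_schedule):
    """Pick which threshold-update FSM applies at a given clock cycle.

    fsm_schedule is a list of (start_cycle, fsm) pairs sorted by
    start_cycle; the FSM whose start_cycle is the largest one not
    exceeding `cycle` is selected.
    """
    chosen = fsm_schedule[0][1]
    for start, fsm in fsm_schedule:
        if cycle >= start:
            chosen = fsm
    return chosen
```

For example, a schedule of [(0, FSM1), (4, FSM2), (12, FSM3)] uses FSM1 initially, FSM2 from the fourth clock cycle, and FSM3 for the remaining cycles, matching the second example in [0092].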
[0093] Implementations of the subject matter and the functional
operations described in this patent document can be implemented in
various systems, digital electronic circuitry, or in computer
software, firmware, or hardware, including the structures disclosed
in this specification and their structural equivalents, or in
combinations of one or more of them. Implementations of the subject
matter described in this specification can be implemented as one or
more computer program products, i.e., one or more modules of
computer program instructions encoded on a tangible and
non-transitory computer readable medium for execution by, or to
control the operation of, data processing apparatus. The computer
readable medium can be a machine-readable storage device, a
machine-readable storage substrate, a memory device, a composition
of matter effecting a machine-readable propagated signal, or a
combination of one or more of them. The term "data processing unit"
or "data processing apparatus" encompasses all apparatus, devices,
and machines for processing data, including by way of example a
programmable processor, a computer, or multiple processors or
computers. The apparatus can include, in addition to hardware, code
that creates an execution environment for the computer program in
question, e.g., code that constitutes processor firmware, a
protocol stack, a database management system, an operating system,
or a combination of one or more of them.
[0094] A computer program (also known as a program, software,
software application, script, or code) can be written in any form
of programming language, including compiled or interpreted
languages, and it can be deployed in any form, including as a
stand-alone program or as a module, component, subroutine, or other
unit suitable for use in a computing environment. A computer
program does not necessarily correspond to a file in a file system.
A program can be stored in a portion of a file that holds other
programs or data (e.g., one or more scripts stored in a markup
language document), in a single file dedicated to the program in
question, or in multiple coordinated files (e.g., files that store
one or more modules, sub programs, or portions of code). A computer
program can be deployed to be executed on one computer or on
multiple computers that are located at one site or distributed
across multiple sites and interconnected by a communication
network.
[0095] The processes and logic flows described in this
specification can be performed by one or more programmable
processors executing one or more computer programs to perform
functions by operating on input data and generating output. The
processes and logic flows can also be performed by, and apparatus
can also be implemented as, special purpose logic circuitry, e.g.,
FPGA (field programmable gate array) or ASIC (application specific
integrated circuit).
[0096] Processors suitable for the execution of a computer program
include, by way of example, both general and special purpose
microprocessors, and any one or more processors of any kind of
digital computer. Generally, a processor will receive instructions
and data from a read only memory or a random access memory or both.
The essential elements of a computer are a processor for performing
instructions and one or more memory devices for storing
instructions and data. Generally, a computer will also include, or
be operatively coupled to receive data from or transfer data to, or
both, one or more mass storage devices for storing data, e.g.,
magnetic, magneto optical disks, or optical disks. However, a
computer need not have such devices. Computer readable media
suitable for storing computer program instructions and data include
all forms of nonvolatile memory, media and memory devices,
including by way of example semiconductor memory devices, e.g.,
EPROM, EEPROM, and flash memory devices. The processor and the
memory can be supplemented by, or incorporated in, special purpose
logic circuitry.
[0097] While this patent document contains many specifics, these
should not be construed as limitations on the scope of any
invention or of what may be claimed, but rather as descriptions of
features that may be specific to particular embodiments of
particular inventions. Certain features that are described in this
patent document in the context of separate embodiments can also be
implemented in combination in a single embodiment. Conversely,
various features that are described in the context of a single
embodiment can also be implemented in multiple embodiments
separately or in any suitable subcombination. Moreover, although
features may be described above as acting in certain combinations
and even initially claimed as such, one or more features from a
claimed combination can in some cases be excised from the
combination, and the claimed combination may be directed to a
subcombination or variation of a subcombination.
[0098] Similarly, while operations are depicted in the drawings in
a particular order, this should not be understood as requiring that
such operations be performed in the particular order shown or in
sequential order, or that all illustrated operations be performed,
to achieve desirable results. Moreover, the separation of various
system components in the embodiments described in this patent
document should not be understood as requiring such separation in
all embodiments.
[0099] Only a few implementations and examples are described and
other implementations, enhancements and variations can be made
based on what is described and illustrated in this patent
document.
* * * * *