U.S. patent application number 14/305585, for a trend-analysis scheme for reliably reading data values from memory, was published by the patent office on 2014-10-02.
This patent application is currently assigned to LSI Corporation. The applicants and inventors listed for this patent are Vishal Deep Ajmera, Sandesh Kadirudyavara Ven Gowda, Santosh Narayanan, and Benzeer Bava Arackal Pazhayakath.
Publication Number | 20140298148 |
Application Number | 14/305585 |
Family ID | 49326205 |
Filed Date | 2014-06-16 |
United States Patent Application | 20140298148 |
Kind Code | A1 |
Narayanan; Santosh; et al. | October 2, 2014 |
TREND-ANALYSIS SCHEME FOR RELIABLY READING DATA VALUES FROM MEMORY
Abstract
In one embodiment, a scheme is provided for reliably reading data
values, such as rapidly-changing counter values, from a memory
location. Instead of performing a single read operation, a set of N
consecutive read operations is performed to obtain a set of N
samples. Since, for counter values and the like, the frequency of
occurrence of out-of-sequence values is relatively low, it is
expected that a majority of the N samples will be in sequence. Of
these N samples, the largest subset of monotonically non-decreasing
values is selected. The median value of this subset is returned as
a reliable result of the read operation.
Inventors: | Narayanan; Santosh; (Bangalore, IN); Pazhayakath; Benzeer Bava Arackal; (Bangalore, IN); Ajmera; Vishal Deep; (Bangalore, IN); Gowda; Sandesh Kadirudyavara Ven; (Bangalore, IN) |
|
Applicant: |
Name | City | State | Country | Type |
Narayanan; Santosh | Bangalore | | IN | |
Pazhayakath; Benzeer Bava Arackal | Bangalore | | IN | |
Ajmera; Vishal Deep | Bangalore | | IN | |
Gowda; Sandesh Kadirudyavara Ven | Bangalore | | IN | |
Assignee: | LSI Corporation, San Jose, CA |
Family ID: |
49326205 |
Appl. No.: |
14/305585 |
Filed: |
June 16, 2014 |
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number |
13444186 | Apr 11, 2012 | 8782504 |
14305585 | | |
Current U.S. Class: | 714/807 |
Current CPC Class: | G06F 11/1004 20130101; H03M 13/09 20130101 |
Class at Publication: | 714/807 |
International Class: | G06F 11/10 20060101 G06F011/10 |
Claims
1. A method for obtaining a stored data value from a memory
location of a memory, the method comprising: (a) reading a sequence
of N stored data values from a memory location of a memory to
obtain a sequence of N read data values; (b) for at least one read
data value of the sequence, determining whether the read data value
follows an expected trend relative to at least one other read data
value of the sequence; and (c) if the read data value follows the
expected trend, then outputting a value generated based on the
sequence of N read data values.
2. The invention of claim 1, wherein: the memory contains a stored
checksum value generated by an algorithm using one or more of the N
stored data values as input values, and further comprising: reading
the stored checksum value from the memory to obtain a read checksum
value; using the algorithm to generate a calculated checksum value,
wherein the algorithm uses one or more of the N read data values as
input values; and comparing the calculated checksum value with the
stored checksum value.
3. The invention of claim 2, wherein steps (b) and (c) are
performed only if the calculated checksum value matches the stored
checksum value.
4. The invention of claim 1, wherein: an identification of the
specified memory location for performing step (a) is included as
part of a read request, and further comprising: determining the
actual memory location from which the sequence of N stored data
values was read in step (a); and comparing the specified memory
location with the actual memory location.
5. The invention of claim 4, wherein steps (b) and (c) are
performed only if the specified memory location matches the actual
memory location.
6. The invention of claim 1, wherein the expected trend is a
monotonically-increasing trend.
7. The invention of claim 1, wherein the outputted value is a
median value of a selected subset of values in the set of N read
data values.
8. The invention of claim 7, wherein the selected subset of values
is the subset that has the largest number of values from among all
subsets of values that follow the expected trend in the set of N
read data values.
9. The invention of claim 8, wherein the largest number of values
is greater than or equal to N/2.
10. The invention of claim 1, further comprising, if the read data
value does not follow the expected trend, then repeatedly
performing steps (a) through (c) until the read data value follows
the expected trend.
11. The invention of claim 10, further comprising ceasing the
repetition of steps (a) through (c) if a maximum number M of
iterations is reached.
12. The invention of claim 11, wherein the value of M is adaptively
determined based on one or more prior read operations.
13. The invention of claim 1, wherein the value of N is adaptively
determined based on one or more prior read operations.
14. The invention of claim 1, further comprising: selecting the
subset that has the largest number of values from among all subsets
of values that follow the expected trend in the set of N read data
values; if the number of values in the selected subset of values is
greater than an upper threshold, then reducing the value of N by a
fixed step size; and if the number of values in the selected subset
of values is smaller than a lower threshold, then increasing the
value of N by a fixed step size.
15. Apparatus for obtaining a reliable stored data value from a
memory location of a memory, the apparatus adapted to: (a) read a
sequence of N stored data values from a memory location of a memory
to obtain a sequence of N read data values; (b) for at least one
read data value of the sequence, determine whether the read data
value follows an expected trend relative to at least one other read
data value of the sequence; and (c) if the read data value follows
the expected trend, then output a value generated based on the
sequence of N read data values.
16. The invention of claim 15, wherein the expected trend is a
monotonically-increasing trend.
17. The invention of claim 15, wherein the outputted value is a
median value of a selected subset of values in the set of N read
data values.
18. The invention of claim 17, wherein the selected subset of
values is the subset that has the largest number of values from
among all subsets of values that follow the expected trend in the
set of N read data values.
19. The invention of claim 18, wherein the largest number of values
is greater than or equal to N/2.
20. The invention of claim 15, further comprising: selecting the
subset that has the largest number of values from among all subsets
of values that follow the expected trend in the set of N read data
values; if the number of values in the selected subset of values is
greater than an upper threshold, then reducing the value of N by a
fixed step size; and if the number of values in the selected subset
of values is smaller than a lower threshold, then increasing the
value of N by a fixed step size.
Description
CROSS-REFERENCE TO RELATED APPLICATION
[0001] The instant application is a divisional of co-pending U.S.
application Ser. No. 13/444,186, filed Apr. 11, 2012, the
disclosure of which is hereby incorporated by reference into the
instant application.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The present invention relates, generally, to the reading of
stored values from a memory location, and more particularly but not
exclusively, to the identification of erroneous values returned
during such reads.
[0004] 2. Description of the Related Art
[0005] This section introduces aspects that may help facilitate a
better understanding of the invention. Accordingly, the statements
of this section are to be read in this light and are not to be
understood as admissions about what is prior art or what is not
prior art.
[0006] Various applications involve the use of memory locations
that contain continuously-changing values. For example, in
networking systems such as routers, switches, gateways, and mobile
backhaul systems, packet counters and byte counters are implemented
at various stages of packet processing, based on various fields in
the packets being processed. Network-protocol stack implementations
employ packet-level and byte-level counters at various levels of
granularity. Such counters are of paramount importance, because
they are linked to standard, management information base
(MIB)-level counters, which are used by service providers, e.g.,
for billing, accounting, and monitoring purposes.
[0007] Often, in network processor-based implementations, counters
are implemented with a fixed resolution (in terms of the number of
bits used to represent a counter) and are often read periodically
to account for rollovers. Such counters are implemented in the data
plane, i.e., the processing units where packet-forwarding and/or
packet-switching decisions are made in real time. The control
plane, i.e., the configuration unit, periodically reads these
counter values from data-plane memory and interprets those
values.
[0008] In hardware-based implementations, such as programmable
network processors, such counters are typically maintained in
memory elements by the data plane and are periodically read by the
control plane for interpretation. A common problem that occurs
during this process is the manifestation of spurious read errors.
Such errors can lead to inaccurate statistics, from which recovery
can be difficult or impossible.
[0009] For example, if the control plane reads a spurious value
from the data plane for a counter, this can often result in an
irrecoverable error. Such an error can have a disastrous impact on
network billing and other processes that rely on the counters and
packet statistics.
[0010] More specifically, such packet and/or byte counters may be
maintained, e.g., in a statistics engine memory space or in a
traffic-manager (TM) and/or traffic-shaper (TS) parameter memory
space.
[0011] ASI memory can be used to maintain packet statistics during
an initial packet-classification stage, before the packets are
handled by a traffic-manager engine. Since the traffic manager can
drop packets based on traffic-management algorithms, ASI counters
might not be sufficiently accurate for statistics regarding
transmitted packets.
[0012] TM/TS-parameter memory has a fixed number of bytes used for
maintaining counts of packets and/or bytes transmitted from a
particular queue of an interface. TM/TS counters, for each of the queues of
the interface, are periodically read by the control-plane software,
which estimates the number of packets and/or the number of bytes
transmitted over the interface.
[0013] Periodic reading of the TM/TS-parameter memory by the
control plane is performed to estimate counter rollovers. The
control plane detects a counter rollover by comparing the current
read value with the previous value. If the current value is smaller
than the previous value, then a rollover is detected, and a
control-plane version of the counter (e.g., an MIB counter) is
incremented appropriately.
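The rollover-detection rule in paragraph [0013] can be sketched in Python; this is an illustrative reading, not text from the application, and the function name, the 32-bit width, and the exact wrap arithmetic (which follows the (2^32-1-prev)+curr convention used in the FIG. 2 example) are assumptions:

```python
MAX32 = (1 << 32) - 1  # assumed fixed-width 32-bit hardware counter

def update_app_counter(app, prev_read, curr_read, counter_max=MAX32):
    """Fold one periodic read into the wider application-level counter.
    A current value smaller than the previous one is taken as a rollover."""
    if curr_read >= prev_read:
        return app + (curr_read - prev_read)
    # Rollover: count up to the maximum, then from zero to the new value.
    return app + (counter_max - prev_read) + curr_read
```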
[0014] Transient TM/TS-parameter memory-read errors can cause the
periodic return of spurious values for certain counters. In turn,
these spurious read values cause incorrect counter increments in
the control-plane version of the counters.
[0015] Even more serious is the case when the spurious value is
such that it causes the control-plane logic to detect a false
counter rollover. False counter-rollover detection occurs when the
read value in one instance is a spurious value that is smaller than
the previously-read value. False counter rollovers can cause huge
counter discrepancies and consequently irrecoverable errors. This
is because the control plane, unaware of the error, will continue
maintaining and accumulating incorrect counter values.
[0016] When memory-read accesses take place at very high data rates
(e.g., as is typical of Gigabit Ethernet or faster interfaces),
three principal types of memory-read errors are observed:
[0017] 1. The memory-read operation may return a spurious value
from the correct memory location (the "first type" of error).
[0018] 2. The memory-read operation may return a correct value, but
from an incorrect parameter-memory location (the "second type" of
error).
[0019] 3. The memory-read operation may return a correct value, but
that value is presented out of sequence to the corresponding
application (the "third type" of error).
[0020] Known solutions to the first type of error, including
performing repeated read operations and selecting the value that
occurs most often, do not properly handle the scenario of a
rapidly-changing value at the read memory location. Moreover,
although such solutions reduce the occurrence of the first type of
error, they do not reduce the occurrences of the second and third
types of error.
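The known repeated-read approach mentioned in paragraph [0020] can be sketched as follows (a hypothetical Python illustration; `read_fn` and the sample count are not from the application). For a stable stored value it returns the mode of the samples, but for a counter that changes between reads no value may repeat, which is why this approach fails here:

```python
from collections import Counter

def majority_read(read_fn, n=5):
    # Repeat the read n times and return the most frequent sample
    # together with its occurrence count.
    samples = [read_fn() for _ in range(n)]
    value, count = Counter(samples).most_common(1)[0]
    return value, count
```

For a rapidly-changing counter, every sample may be distinct, so `most_common` degenerates to an arbitrary pick; this motivates the trend analysis described later in the application.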
SUMMARY OF THE INVENTION
[0021] Certain embodiments of the invention aim to characterize and
identify memory-read errors in a reliable fashion so that the
incidence of errors can be reduced or eliminated.
[0022] In one embodiment, the present invention provides a method
and apparatus for obtaining a reliable stored data value from a
memory location of a memory. A sequence of N stored data values is
read from a memory location of a memory to obtain a sequence of N
read data values. For at least one read data value of the sequence,
it is determined whether the read data value follows an expected
trend relative to at least one other read data value of the
sequence. If the read data value follows the expected trend, then a
value (e.g., a median value) generated based on the sequence of N
read data values is outputted.
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] FIG. 1a is a block diagram illustrating an exemplary network
processor in one embodiment of the invention;
[0024] FIG. 1b is a flow diagram illustrating an exemplary sequence
of read operations in the network processor of FIG. 1a;
[0025] FIG. 2 is a table showing, for each of a plurality of time
instants, (i) the actual hardware counter value maintained in
memory, (ii) the value returned as part of the memory-read
operation, and (iii) the resultant application-level counter value
calculated by the control-plane logic, including the interpretation
of a false counter rollover; and
[0026] FIGS. 3a and 3b are collectively a flow diagram illustrating
an exemplary method employing a combination of checksum,
address-aliasing, and trend-analysis techniques to identify and/or
correct errors in connection with a read operation from memory used
to store rapidly-changing counter values.
DETAILED DESCRIPTION
[0027] Certain embodiments of the invention employ a method for
reliably reading a continuously-changing value maintained in a
memory location, including a set of techniques for detecting errors
and identifying reliable values in a set of successive memory-read
operations, as well as a fallback mechanism to signal a memory-read
error in the event none of those techniques indicates a correct
value with a sufficient level of confidence.
[0028] In one embodiment, the memory location is a counter in a
network processor, such as a traffic-manager (TM) and/or
traffic-shaper (TS) parameter memory, and a method consistent with
the invention is implemented to reduce or eliminate errors that
occur during the reading out of packet-counter values stored in the
parameter memory. In this scenario, run-time environment (RTE)
software or driver software accesses the parameter memory during
read operations issued by a functional-programming layer (or
application layer). This embodiment of the invention provides a
mechanism to analyze and characterize parameter-memory reads when
these locations are used to store byte-level and/or packet-level
counters.
[0029] An application-level counter typically has a much higher bit
width than the corresponding counter maintained in the data plane.
To keep its counter properly updated, the application periodically
performs a read operation on the data-plane counter and accumulates
the difference value in the higher bit-width application counter.
If a false counter rollover is detected, then the error in the
interpreted value of the counter can be relatively large. Due to
the continuous accumulation of the difference value, even a very
infrequent read error can cause a relatively large error to persist
in the application-level counter, as will be demonstrated by the
following example.
Exemplary Network Processor
[0030] FIG. 1a illustrates an exemplary network processor 100,
which includes a data plane 102 and a control plane 104. Switching
(i.e., packet forwarding) is performed in data plane 102, while
routing (i.e., the exchange of routing information) is performed in
control plane 104.
[0031] Data plane 102 includes external interface 106, a classifier
engine 108, a traffic manager/traffic shaper (TM/TS) 110, a
hardware mechanism 112, and a queue memory 114 that includes one or
more packet/byte counters 116. External interface 106, which may
include a plurality of physical-layer interfaces, including, e.g.,
MAC, 16-bit UTOPIA 2, SPI3, and POS-PHY2 interfaces, communicates
and exchanges data packets with one or more physical-layer devices.
Classifier engine 108 serves as a co-processor to optimize
access-control list (ACL) processing and Layer-2 MAC-address
lookups. TM/TS 110 decides where, when, and how incoming and
outgoing data is routed. TM/TS 110, which is coupled to DRAM
memory (not shown), strips, adds, and modifies packet headers and
also makes routing and scheduling decisions. Hardware mechanism 112
reads packet and byte counters 116 out of queue memory 114.
[0032] Control plane 104 includes processors 120a, 120b, a memory
122, one or more application-level counters 124, and an advanced
extensible interface (AXI) 126. Processors 120a, 120b are coupled
to memory 122, where instructions and/or program data
reside. Processors 120a, 120b are responsible for performing a
number of packet-specific operations that may be less time-critical
than those performed in data plane 102, including, e.g., execution
of routing protocols, management of routing tables, and so forth.
Application-level counters 124 are counter registers that are
maintained for software executed at the application level and are
updated by control-plane logic based on packet and byte counters
116. AXI interface 126 provides a bus interface for control plane
104 to fetch instructions from external memory, read data from
external memory, write data to external memory, and access
peripheral devices.
[0033] It should be understood that network processor 100 may
include one or more other additional elements not shown in FIG. 1a
and may include fewer than all elements shown in FIG. 1a. For
clarity, certain elements of network processor 100 are omitted from
FIG. 1a, e.g., a memory controller, a security-protocol processor,
an internal data-path memory, a management port, a segmentation
engine, protocol data unit (PDU) buffers, a queue controller, a
link list controller, a policing engine, a statistics engine, or
the like.
Exemplary Sequence of Read Operations and Counter Corruption
[0034] FIG. 1b illustrates an exemplary sequence 150 of read
operations in network processor 100. First, at step 151, data
packets are received from external interface 106. Next, at step
152, classifier engine 108 sends those packets to queue memory 114.
Then, at step 153, TM/TS 110 updates corresponding packet and byte
counters 116 in queue memory 114 to reflect those packets received.
Once the queue-level packet and byte counters 116 are updated, (i)
at step 154, the data packets are forwarded to external interface
106, and (ii) at step 155, hardware mechanism 112 reads packet and
byte counters 116 out of queue memory 114. Next, at step 156,
control plane 104 receives the packet- and byte-counter values that
were read out of queue memory 114 by hardware mechanism 112. Next,
at step 157, the control-plane logic (as provided, e.g., from
memory 122) uses the read-out counter values to update
corresponding application-level counters 124. In the scenario of
FIG. 1b, an exemplary illustration of counter corruption due to a
spurious memory-read error occurring during step 155 will now be
discussed with reference to the table of FIG. 2, which shows, for
each of time instants i through (i+2), (i) the actual 32-bit
hardware counter value maintained in memory, (ii) the value
returned as part of the memory-read operation, and (iii) the
resultant application-level counter value calculated by the
control-plane logic (including the interpretation of a false
counter rollover).
[0035] As shown, in this example, at time instant i, the actual
value maintained in the data plane is 20000, which value is
correctly read out and returned in the control plane as 20000.
Consequently, the control-plane logic calculates and stores 20000
as the application-level version of the counter.
[0036] Next, at time instant (i+1), a read error occurs when the
actual value maintained in the data plane is 40000, but that value
is not correctly read out. Instead, an erroneous value of only 1000
is returned in the control plane. Based on this erroneous value,
the control-plane logic detects a false counter rollover due to the
fact that the value of 1000 returned in the control plane at time
instant (i+1) is smaller than the value of 20000 returned in the
previous iteration at time instant i. On this basis, the
control-plane logic calculates the application-level counter value
as ((2^32-1-20000)+1000), or 4294948295, which is a highly
erroneous value.
[0037] Subsequently, at time instant (i+2), the actual value
maintained in the data plane is 60000, which value is correctly
read out and returned in the control plane as 60000. However,
because the application-level counter was corrupted at time instant
(i+1) due to the spurious value of 1000 that was returned, the
control-plane logic now calculates and stores the erroneous value
of (4294948295+(60000-1000)), or 4295007295, as the control-plane
version of the counter.
[0038] Thus, it can be seen that, even though the memory location
in the data plane maintains the correct counter value at all times,
an erroneous application-level counter value results from the
incorrect value returned to the control plane during the flawed
memory-read operation at time instant (i+1).
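The arithmetic of the FIG. 2 example can be checked directly (an illustrative Python rendering, assuming the 32-bit counter width stated in the example):

```python
MAX32 = (1 << 32) - 1          # 4294967295, maximum 32-bit counter value

prev_read = 20000              # value returned at instant i
spurious = 1000                # spurious value returned at instant (i+1)

# 1000 < 20000, so the control plane infers a (false) rollover and
# computes ((2^32 - 1 - 20000) + 1000):
app = (MAX32 - prev_read) + spurious
assert app == 4294948295

# Instant (i+2): the correct value 60000 is read; the normal delta
# (60000 - 1000) is accumulated on top of the corrupted counter:
app += 60000 - spurious
assert app == 4295007295
```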
Experimental Data and Types of Errors
[0039] To solve the problems that arise from transient spurious
memory-read errors, such as those discussed in the foregoing
example, the inventors obtained experimental data by performing
several hours of soak tests (i.e., tests to verify stability and
performance characteristics over an extended period of time) on
multiple different platforms using the LSI APP3K network processor.
The frequency of errors, as well as the maximum burstiness of such
errors (i.e., the ratio of erroneous reads to the total reads in a
limited set of N consecutive reads), was studied in soak tests
spanning several days. Maximum burstiness is an important
parameter, because the detection of errors, and the consequent
selection of a correct value in a sample set containing both good
and spurious values, depends on it. To illustrate this
point, if there are 50% spurious values in N consecutive read
operations, then the detection of spurious values is very
difficult. On the other hand, if there are only 10% spurious values
in a set of N consecutive reads, then the detection and filtering
of spurious values is much easier. Traffic was pumped into the
network at various rates in these experiments to study the effect
of traffic rate on these transient read errors.
[0040] Based on the experimental observations, two parameters were
found to have a direct bearing on the frequency as well as maximum
burstiness of the memory-read errors: (i) hardware platform and
(ii) traffic rate.
[0041] The error rate and burstiness of memory-read errors were
found to be different in different hardware platforms, which may be
due to various hardware-parameter differences across various
platforms, as well as other operating-environment parameters (e.g.,
temperature) that could also potentially influence such errors.
[0042] It was further observed that the higher the traffic rate,
the higher the error rate and burstiness of memory-read errors. At
relatively low rates (e.g., 100 Mbps), there were few incidents of
memory-read errors. On the other hand, at Gigabit Ethernet rates
(e.g., 1000 Mbps and faster), the rate and burstiness of
memory-read errors were found to be much higher.
[0043] In the network processors, read errors observed in the case
of TM/TS-parameter reads were found to be caused by: (i) the read
operation from the control plane returning a spurious value instead
of the correct value maintained in the read memory location (the
"first type" of error), (ii) address-aliasing errors, namely, the
read operation returning a correct value but from a memory location
other than the intended memory location (the "second type" of
error), and (iii) the read operation returning a correct value from
the memory location but in an out-of-sequence manner when multiple
read operations are scheduled from the control plane (the "third
type" of error).
[0044] The set of experiments that were conducted verified that the
memory locations always maintained the correct value, and that the
sequence of read operations taking place was the cause of the
issue, i.e., the first, second, and/or third types of error were
responsible for generating the read error. By running these tests
for several days, it was also confirmed empirically that, when
erroneous values are returned during the first type of error, the
error usually occurred in only one or two bytes of the sequence of
bytes read in a single read operation.
Exemplary Scheme for Error Reduction
[0045] The first step in providing a mechanism for reducing and/or
eliminating errors is to reliably identify an erroneous value when
a read operation is performed from the control plane. In certain
embodiments of the invention, three different techniques can be
used, either alone or in combination with one another, to identify
erroneous values.
[0046] The first technique employs a checksum procedure for all of
the data bytes read in a typical read operation, to verify the
integrity of the read-out data. Initially, a first checksum value
is calculated in the data plane when the logic writes the data,
and this first checksum value is stored along with the data values.
The same checksum algorithm is later used in the control-plane read
operation to generate a second checksum value, which is compared
against the stored first checksum value for the data bytes read. If
there are N bytes of data read out during a read operation, and a
16-bit checksum is used, then each read operation reads a total of
(N+2) bytes of data. An erroneous value can be returned during a
read of any one or more data bytes, including a read of the data
values themselves or a read of the corresponding checksum value. In
either of these cases, the read-out checksum and the calculated
checksum will not match. For the foregoing checksum technique, a
single checksum value can be generated for each individual counter
or memory location, or alternatively, a single checksum value can
be generated for a plurality of counters or memory locations.
Although this is a reliable error-detection mechanism for the first
type of error described above, the use of checksum values does not
detect the second and third types of errors. The reason for this is
that the second and third types of errors do not involve data
corruption, but rather, those types of errors are due to a read of
correct data from an incorrect, unintended, or unexpected memory
location. Moreover, a checksum mechanism alone might yield a
"false positive," such as when more than one error occurs in a
single read. For example, if one data bit is incorrectly read as a
1 and a different data bit is incorrectly read as a 0, then the
data might still pass the checksum test, because the correct
checksum will still be computed. As another example, an error can
occur in the data and another error in the checksum such that the
data will still pass the checksum test.
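A minimal sketch of the checksum technique follows (illustrative Python; the application does not fix a particular algorithm, only that writer and reader use the same one, so the simple 16-bit additive checksum here is an assumption):

```python
def checksum16(data: bytes) -> int:
    # Illustrative 16-bit additive checksum over the data bytes.
    return sum(data) & 0xFFFF

def read_is_consistent(data: bytes, stored_checksum: int) -> bool:
    # The (N + 2)-byte read returns the data and its stored checksum;
    # a mismatch flags the first type of error (a spurious value) but
    # says nothing about the second or third types.
    return checksum16(data) == stored_checksum
```

Note that for such an additive checksum, two compensating byte errors (one byte read high by the amount another is read low) leave the sum unchanged, which is exactly the kind of "false positive" a checksum alone cannot catch.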
[0047] The second technique returns the base-memory location of the
read-out data values along with the data values themselves. This
method is used to detect the address-aliasing errors that occur in
the second type of error. An effective way to verify whether the
data is received from the desired memory location is to return the
address of the memory location that is being read out along with
the data that is read out. In one embodiment, parameter memory is
used to represent a counter that is specific to each queue. Since
the control plane-based read operation of this data is per-queue,
if the data plane returns an identifier of the corresponding queue
along with the data bytes, then the control plane can verify
whether the data is read from the parameter memory of the correct
queue. Although this is a reliable error-detection mechanism for
the second type of error described above, returning memory
locations along with data does not detect the first and third types
of errors.
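The address-verification idea can be sketched as follows (hypothetical Python; the tuple layout and names are assumptions, not from the application). The data plane returns the queue identifier alongside the counter bytes, and the control plane rejects any read whose identifier does not match the queue it requested:

```python
def verify_queue_read(requested_queue_id, returned):
    # `returned` is assumed to be (queue_id, counter_bytes) as
    # delivered by the data plane.
    queue_id, counter_bytes = returned
    if queue_id != requested_queue_id:
        # Second type of error: correct data, wrong memory location.
        raise IOError("address-aliasing error: requested queue %d, got %d"
                      % (requested_queue_id, queue_id))
    return counter_bytes
```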
[0048] The third technique performs a trend analysis to identify
out-of-sequence values. This method can be used to detect the third
type of error, which is the most-complex error scenario and cannot
be detected by the first (checksum) or second (memory-address
information) techniques. The trend-analysis technique employs prior
knowledge of the nature and sequence of the values being read. In
this approach, instead of a single read, a series of consecutive
read operations are performed. Subsequently, the values in each of
the read samples in the set are analyzed to detect out-of-sequence
reads. Further details of the trend-analysis technique will now be
explained.
[0049] In the packet-counter scenario, the data that is updated by
the data plane and read periodically by the control plane is a set
of data-packet counters, each of which continuously increases
monotonically, except in the event of counter rollover. In other
words, if any two consecutive reads for a particular packet counter
are performed at instants t_1 and t_2 such that t_2 > t_1, the
value at instant t_2 should be no lower than the value at instant
t_1. The only exception to this trend
is counter overflow, i.e., the case where the counter value exceeds
the fixed bit width and restarts from zero. In other words, each
counter is normally expected to show a monotonically non-decreasing
trend across consecutive read operations.
[0050] The trend-analysis technique for extracting a value in the
correct sequence is based on reading a set of consecutive samples.
Instead of performing a single read operation, a set of N
consecutive read operations are performed, with each read operation
corresponding to a single sample. Since the frequency of occurrence
of out-of-sequence values is low, the expectation is that, of the N
samples, a majority of the samples will be in sequence. In one
embodiment, of these N samples, the subset of monotonically
non-decreasing values that has the largest number of values is
selected. For example, suppose N=100, and there are three subsets
within those 100 values having values that are monotonically
non-decreasing: a first subset of 7 values, a second subset of 42
values, and a third subset of 23 values. The "correct" value to
select (i.e., a reliable value to return as a result of the read
operation) would be the median value of the second subset (of 42
monotonically non-decreasing values), because that subset has the
largest number of values (i.e., 42) from among the three subsets
(i.e., 7, 42, and 23).
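The selection described in this paragraph can be sketched as follows. This is an illustrative sketch only; the function names are hypothetical, and the actual implementation is not specified by this disclosure.

```python
def longest_nondecreasing_run(samples):
    # Scan once, tracking the longest contiguous run in which each
    # value is >= its predecessor (the expected counter trend).
    # Assumes at least one sample.
    best_start, best_len, start = 0, 1, 0
    for i in range(1, len(samples)):
        if samples[i] < samples[i - 1]:   # sequence broken: start a new run
            start = i
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return samples[best_start:best_start + best_len]

def select_value(samples):
    # Return the median of the largest non-decreasing subset as the
    # reliable result of the read operation.
    run = longest_nondecreasing_run(samples)
    return run[len(run) // 2]
```

For example, in the sample list `[1, 2, 9, 3, 4, 5, 6, 2, 3]`, the largest non-decreasing run is `[3, 4, 5, 6]`, so `select_value` returns its median, 5.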
[0051] The trend-analysis technique may not work correctly if the
subset size is less than N/2. Accordingly, if the subset size is
smaller than N/2, then the set of N consecutive reads is rejected,
and another set of N reads is performed. This set of N reads is
performed a maximum of M times. If none of the M read operations
results in a correct value, with each operation extracting N
samples, then an irrecoverable error is returned to the application
layer. This irrecoverable-error scenario has an extremely low
probability and was never actually encountered during the
inventors' experimental scenarios, even under heavy traffic load
and live-network testing.
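The rejection-and-retry logic of this paragraph might be sketched as follows (hypothetical names; `read_fn` stands in for whatever primitive performs one memory read):

```python
def _runs(samples):
    # Split the sample list into maximal non-decreasing runs.
    runs, start = [], 0
    for i in range(1, len(samples)):
        if samples[i] < samples[i - 1]:
            runs.append(samples[start:i])
            start = i
    runs.append(samples[start:])
    return runs

def reliable_read(read_fn, n, m):
    # Perform up to m sets of n consecutive reads; a set is accepted
    # only if its largest non-decreasing run spans at least n/2 samples.
    for _ in range(m):
        samples = [read_fn() for _ in range(n)]
        best = max(_runs(samples), key=len)
        if 2 * len(best) >= n:
            return best[len(best) // 2]   # median of the accepted run
    # All m sets rejected: report an irrecoverable error upward.
    raise IOError("no valid sample set in %d attempts" % m)
```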
[0052] The trend-analysis technique can correctly handle the
counter-rollover condition. If rollover occurs in the middle of a
set of N reads, then it might not be possible to find a subset of
at least N/2 non-decreasing values. However, the subsequent iteration of N
reads compensates for rollover occurring in the middle of a set of
N reads.
[0053] An adaptive-tuning method can be used to optimize parameter
values for the number N of samples read, as well as the maximum
number M of such reads, both of which can be tunable parameters. In
a practical system, these values can be hardcoded based on
empirical data obtained through experimental and/or live-network
testing. However, these parameters can also be made adaptive based
on previous read operations. If the number of values in the largest
monotonically non-decreasing subset is greater than an
upper-threshold value, then the value of N can be reduced in the
next iteration by a fixed step size, identified by the variable
step. On the other hand, if the number of values in that subset is
smaller than a lower-threshold value, then the value of N can be
increased by the fixed step size step. These adjustments can be
bounded by keeping track of a minimum value N_lt for N and a
maximum value N_ut for N. Mathematically, this can be represented
as:
{N : N ∈ [N_lt, N_lt + step, N_lt + 2*step, . . . , N_ut]}. (1)
In Equation (1), the variable N_lt refers to the
lower-threshold value of N, i.e., the minimum value, and the
variable N_ut refers to the upper-threshold value of N, i.e.,
the maximum value. The actual value of N can vary between lower
threshold N_lt and upper threshold N_ut. The value of N can
be increased or decreased by fixed integer step sizes. These
thresholds and step sizes can be configured based on empirical
data. The step sizes for increasing N and decreasing N do not
necessarily have to be the same. In practical cases, a fixed value
of N can also suffice.
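The bounded adjustment of N, as also reflected in flowchart steps 316-320, can be sketched as follows (hypothetical function; the step size and the bounds N_lt and N_ut are assumed to be tunable inputs):

```python
def adapt_n(n, accepted, step, n_lt, n_ut):
    # After a successful read (a large-enough run was found), shrink N
    # by one step; after a rejected set, grow N by one step. In both
    # cases N stays inside [n_lt, n_ut].
    if accepted:
        return n - step if n - step >= n_lt else n
    return n + step if n + step <= n_ut else n
```

For example, with step = 10 and bounds [50, 150], a successful read at N = 100 tunes N down to 90, while a rejected set at N = 145 leaves N unchanged because 155 would exceed the upper bound.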
[0054] By combining the first, second, and third techniques
described above, as will now be explained with reference to
flowchart 300 of FIGS. 3a and 3b, all occurrences of the first,
second, and third types of error can reliably be detected. It is
noted that, in this embodiment, the checksum steps (including steps
303, 304, and 305) and address-aliasing steps (including steps 303
and 306) are performed for the entire set of read counter values,
while the trend-analysis steps (including steps 310 and 311) are
performed for each separate counter value.
[0055] First, at step 301, an outer-loop counter is initialized by
setting the value of counter j to zero. Next, at step 302, an
inner-loop iteration count is initialized by setting the value of
counter i to zero.
[0056] Next, at step 303, the TM/TS-parameter memory is read to
obtain (i) a set of two or more data values, (ii) a stored checksum
for those data values (preferably, for each read operation, a
single checksum value is retrieved that corresponds to a plurality
of counters), and (iii) the base address of the TM/TS memory at
which those data values are stored.
[0057] Next, at step 304, a checksum for the read values is
calculated, using the same algorithm that originally generated and
wrote the stored checksum read out at step 303. Next, at step 305,
the calculated checksum is compared with the read checksum. If the
values do not match,
then, at step 307, the read values are rejected, and the method
returns to step 303 so that the values and checksum can be re-read.
If the values match, then the method proceeds to step 306.
[0058] At step 306, a determination is made whether the address of
the memory location being read out matches the intended memory
address for the current data read. If the memory location does not
match (indicating that data values were read from an unintended
memory address location), then, at step 307, the read values are
rejected, and the method returns to step 303 so that the values and
checksum can be re-read. If the memory locations match, then the
method proceeds to step 308.
[0059] At step 308, a determination is made whether the number of
read iterations has reached a given threshold N (the selection of N
is discussed in further detail above). If threshold N has not been
reached, then, at step 309, the value of counter i is incremented
by 1, and the method returns to step 303 so that the values and
checksum can be re-read. If the threshold has been reached, then
the method proceeds to step 310. It is noted that, in some
embodiments, the value of N is a constant, and in other
embodiments, N takes the form of a small range of values whose
upper limit N_ut and lower limit N_lt are constants
selected based on empirical knowledge of error rates.
[0060] At step 310, the N read samples for the counter are arranged
in order from i=0, . . . , N-1. Next, at step 311, a determination
is made whether the N samples are monotonically non-decreasing.
This is done based on the rationale that, for any particular packet
counter, consecutive values should be non-decreasing (except at
counter rollover), so successive reads should always exhibit a
monotonically non-decreasing trend. If, at step 311, it is
determined that the N samples are monotonically non-decreasing,
then the method proceeds to step 315. If not, then the method
proceeds to step 312.
[0061] At step 315, the median value of the read N values is
selected as the read value of the counter. Next, at step 316, a
determination is made whether N minus step size step is less than
the lower-threshold value N_lt of N. If so, then the method
proceeds to step 318. If not, then, at step 317, the value of N is
decremented by step size step, and the method then proceeds to step
318. At step 318, the method terminates with a complete and
successful read condition.
[0062] At step 312, the value of counter j is incremented by 1.
Next, at step 313, a determination is made whether the number of
sample sets of size N is less than the maximum number M of reads. If
not, then the method proceeds to step 314. If so, then, at step
319, a determination is made whether N plus step size step is
greater than the upper-threshold value N_ut of N. If so, then
the method returns to step 302. If not, then, at step 320, the
value of N is incremented by step size step, and the method then
proceeds to step 302.
[0063] At step 314, the method terminates with a read failure
condition.
[0064] It is noted that, in the foregoing method, the inner loop
eventually counts up to a value of N, which represents the number
of samples used for trend analysis. If a valid value exists in this
set, then the operation completes with a success condition.
Otherwise, another set of N reads is performed, for a maximum of M
times. If all M read iterations fail, then the operation completes
with a read failure being reported to the application. To achieve
adaptive step-size adjustment, the value of N can be increased or
decreased in integer steps, in a small range (e.g., in steps 317
and 320). This adaptive method analyzes the statistical pattern of
memory-read errors and uses this analysis to tune the memory-read
mechanism.
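Putting the checksum, address-aliasing, and trend-analysis checks of flowchart 300 together, a condensed sketch might look like the following. This is illustrative only: the disclosure does not specify the checksum algorithm, so CRC-32 is used as a stand-in, and `fetch()` is a hypothetical primitive that abstracts one read of the whole counter block.

```python
import struct
import zlib

def checksum(values):
    # Stand-in for the unspecified checksum algorithm of step 304:
    # CRC-32 over the packed 32-bit counter values.
    return zlib.crc32(struct.pack("<%dI" % len(values), *values))

def longest_run(seq):
    # Largest contiguous non-decreasing run within seq.
    best_start, best_len, start = 0, 1, 0
    for i in range(1, len(seq)):
        if seq[i] < seq[i - 1]:
            start = i
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return seq[best_start:best_start + best_len]

def read_counters(fetch, expected_addr, n, m):
    # fetch() -> (values, stored_checksum, base_addr): one read of the
    # whole counter block (step 303). Returns one value per counter,
    # or None on read failure (step 314).
    for _ in range(m):                       # outer loop (counter j)
        sets = []
        while len(sets) < n:                 # inner loop (counter i)
            values, stored, addr = fetch()
            # Steps 304-306: reject on checksum or address mismatch.
            if checksum(values) != stored or addr != expected_addr:
                continue                     # step 307: reject and re-read
            sets.append(values)
        results, ok = [], True
        for k in range(len(sets[0])):        # trend analysis per counter
            run = longest_run([s[k] for s in sets])
            if 2 * len(run) < n:             # run too short: reject this set
                ok = False
                break
            results.append(run[len(run) // 2])   # step 315: median value
        if ok:
            return results                   # step 318: successful read
    return None                              # step 314: read failure
```

A production version would also bound the inner re-read loop; it is left unbounded here for brevity.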
Alternative Embodiments
[0065] Although the foregoing description discusses a method in
which checksum, address-aliasing, and trend-analysis techniques are
all used, to identify as many types of errors as possible, fewer
than all three techniques could be used in alternative
embodiments. In one embodiment, a sequence of read
operations is performed by employing a combination of a
trend-analysis technique with only one other of these techniques,
with the goal of reliable detection of memory-read errors and the
selection of a correct value. In another embodiment, only a
trend-analysis technique is used, without any checksum or
address-aliasing techniques being used at all.
[0066] The term "memory," as used herein, can refer to a single
portion of a physical memory device, an entire physical memory
device, a plurality of physical memory devices, or portions of a
plurality of physical memory devices.
[0067] The term "memory location," as used herein and in the
appended claims, means one or more addressable areas for the
storage of data. Those one or more addressable areas can be
individual memory cells and/or groups of memory cells. Those memory
cells and/or groups of memory cells can be read and/or written
simultaneously (e.g., as sections or entire rows in a memory
array), or can alternatively be read and/or written at different
times and in various sequences. Those memory cells and/or groups of
memory cells may be contiguous or non-contiguous and may reside on
a single physical memory device or span multiple physical memory
devices.
[0068] While, in certain embodiments, the memory location is used
to store a rapidly-changing counter value, it should be understood
that, in other embodiments, the memory location can be used for
storing and retrieving other types of data and does not necessarily
have to contain rapidly-changing and/or counter values.
[0069] The terms "median" and "median value," as used herein, refer
to (i) a value in a set of values at which (or near which) about
50% of the other values in the set are smaller and about 50% of the
other values in the set are greater, or alternatively, (ii) a value
that is not part of the set of values itself, but at which (or near
which) about 50% of the values in the set are smaller and about 50%
of the values in the set are greater (e.g., if there is an even
number of values in the set, then there is no single middle value
in the set, and so the median might then be defined as the mean of
the two middle values).
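For illustration, Python's statistics.median follows exactly this convention:

```python
from statistics import median

# Odd-sized set: the single middle value itself (clause (i)).
print(median([3, 7, 9]))       # 7

# Even-sized set: there is no single middle value, so the median is
# taken as the mean of the two middle values (clause (ii)).
print(median([3, 7, 9, 11]))   # 8.0
```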
[0070] Further, although specific embodiments are described herein
in which a median value generated based on a sequence of N read
data values, or a subset thereof, is used as an output value, it
should be recognized that other values based on the N read data
values, or a subset thereof, could be outputted in alternative
embodiments. For example, the first or last value in the set of N
read values, the average value, a weighted average value, or some
other value based on one or more of the N read values could be
outputted instead of outputting a median value. It should be
understood that the terms "trend" and "expected trend," as used
herein, include not only monotonically-increasing trends, as
described in the embodiments discussed herein, but also
monotonically-decreasing trends and other types of trends.
[0071] Reference herein to "one embodiment" or "an embodiment"
means that a particular feature, structure, or characteristic
described in connection with the embodiment can be included in at
least one embodiment of the invention. The appearances of the
phrase "in one embodiment" in various places in the specification
are not necessarily all referring to the same embodiment, nor are
separate or alternative embodiments necessarily mutually exclusive
of other embodiments. The same applies to the term
"implementation."
[0072] The present invention may be implemented as circuit-based
processes, including possible implementation as a single integrated
circuit (such as an ASIC or an FPGA), a multi-chip module, a single
card, or a multi-card circuit pack. As would be apparent to one
skilled in the art, various functions of circuit elements may also
be implemented as processing blocks in a software program. Such
software may be employed in, for example, a digital signal
processor, micro-controller, or general-purpose computer.
[0073] The present invention can be embodied in the form of methods
and apparatuses for practicing those methods. The present invention
can also be embodied in the form of program code embodied in
tangible media, such as magnetic recording media, optical recording
media, solid state memory, floppy diskettes, CD-ROMs, hard drives,
or any other machine-readable storage medium, wherein, when the
program code is loaded into and executed by a machine, such as a
computer, the machine becomes an apparatus for practicing the
invention. The present invention can also be embodied in the form
of program code, for example, whether stored in a storage medium,
loaded into and/or executed by a machine, or transmitted over some
transmission medium or carrier, such as over electrical wiring or
cabling, through fiber optics, or via electromagnetic radiation,
wherein, when the program code is loaded into and executed by a
machine, such as a computer, the machine becomes an apparatus for
practicing the invention. When implemented on a general-purpose
processor, the program code segments combine with the processor to
provide a unique device that operates analogously to specific logic
circuits. The present invention can also be embodied in the form of
a bitstream or other sequence of signal values electrically or
optically transmitted through a medium, stored magnetic-field
variations in a magnetic recording medium, etc., generated using a
method and/or an apparatus of the present invention.
[0074] It will be further understood that various changes in the
details, materials, and arrangements of the parts which have been
described and illustrated in order to explain the nature of this
invention may be made by those skilled in the art without departing
from the scope of the invention as expressed in the following
claims.
[0075] The use of figure numbers and/or figure reference labels in
the claims is intended to identify one or more possible embodiments
of the claimed subject matter in order to facilitate the
interpretation of the claims. Such use is not to be construed as
necessarily limiting the scope of those claims to the embodiments
shown in the corresponding figures.
[0076] It should be understood that the steps of the exemplary
methods set forth herein are not necessarily required to be
performed in the order described, and the order of the steps of
such methods should be understood to be merely exemplary. Likewise,
additional steps may be included in such methods, and certain steps
may be omitted or combined, in methods consistent with various
embodiments of the present invention.
[0077] The embodiments covered by the claims in this application
are limited to embodiments that (1) are enabled by this
specification and (2) correspond to statutory subject matter.
Non-enabled embodiments and embodiments that correspond to
non-statutory subject matter are explicitly disclaimed even if they
fall within the scope of the claims.
* * * * *