U.S. patent application number 12/912735 was filed with the patent office on 2010-10-26 and published on 2012-04-26 for scalable prediction failure analysis for memory used in modern computers.
This patent application is currently assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Invention is credited to Tu T. Dang, Michael C. Elles, Juan Q. Hernandez, Dwayne A. Lowe, Challis L. Purrington.
Application Number | 20120102367 12/912735 |
Family ID | 45974011 |
Filed Date | 2010-10-26 |
United States Patent
Application |
20120102367 |
Kind Code |
A1 |
Dang; Tu T. ; et
al. |
April 26, 2012 |
Scalable Prediction Failure Analysis For Memory Used In Modern
Computers
Abstract
One embodiment provides a method for scalable predictive failure
analysis. Embodiments of the method may include gathering memory
information for memory on a user computer system having at least
one processor. Further, the method includes selecting one or more
memory-related parameters. Further still, the method includes
calculating based on the gathering and the selecting, a single bit
error value for the scalable predictive failure analysis through
calculations for each of the one or more memory-related parameters
that utilize the memory information. Yet further, the method
includes setting, based on the calculating, the single bit error
value for the user computer system.
Inventors: |
Dang; Tu T.; (Cary, NC) ; Elles; Michael C.; (Apex, NC) ;
Hernandez; Juan Q.; (Garner, NC) ; Lowe; Dwayne A.; (Durham, NC) ;
Purrington; Challis L.; (Raleigh, NC) |
Assignee: |
INTERNATIONAL BUSINESS MACHINES
CORPORATION
Armonk
NY
|
Family ID: |
45974011 |
Appl. No.: |
12/912735 |
Filed: |
October 26, 2010 |
Current U.S.
Class: |
714/47.3 ;
714/E11.02 |
Current CPC
Class: |
G06F 11/0754 20130101;
G06F 3/0673 20130101; G06F 3/0653 20130101; G06F 11/1048 20130101;
G06F 11/079 20130101; G06F 3/0619 20130101; G06F 11/076 20130101;
G11C 2029/0409 20130101; G11C 29/42 20130101; G06F 2201/81
20130101; G11C 29/50004 20130101; G06F 11/0727 20130101 |
Class at
Publication: |
714/47.3 ;
714/E11.02 |
International
Class: |
G06F 11/00 20060101
G06F011/00 |
Claims
1. A method for scalable predictive failure analysis, the method
comprising: gathering memory information for memory on a user
computer system having at least one processor; selecting one or
more memory-related parameters; calculating, based on the gathering
and the selecting, a single bit error value for the scalable
predictive failure analysis through calculations for each of the
one or more memory-related parameters that utilize the memory
information; and setting, based on the calculating, the single bit
error value for the user computer system.
2. The method of claim 1, further comprising detecting, subsequent
to the setting, one or more single bit errors for the memory.
3. The method of claim 1, further comprising comparing, subsequent
to the setting, a counted number of single bit errors for the
memory to the value.
4. The method of claim 1, further comprising alerting, subsequent
to the setting, if a counted number of single bit errors for the
memory at least equals the single bit error value.
5. The method of claim 1, further comprising returning to sleep,
subsequent to the setting, if a counted number of single bit errors
for the memory fails to exceed the single bit error value.
6. The method of claim 1, further comprising re-setting, according
to the method, the single bit error value for the user computer
system upon a memory replacement.
7. The method of claim 1, further comprising reporting the single
bit error value and any results from the method on a display
associated with the user computer system.
8. A computer program product for scalable predictive failure
analysis: a computer readable storage device; first program
instructions to gather memory information for memory on a user
computer system having at least one processor; second program
instructions to select one or more memory-related parameters; third
program instructions to calculate based on the gather and the
select, a single bit error value for the scalable predictive
failure analysis through calculations for each of the one or more
memory-related parameters that utilize the memory information;
fourth program instructions to set, based on the calculate, the
single bit error value for the user computer system; and wherein
the first, second, third, and fourth program instructions are
stored on the computer readable storage device.
9. The computer program product of claim 8, further comprising
fifth program instructions to detect, subsequent to the set, one or
more single bit errors for the memory; and wherein the fifth
program instructions are stored on the computer readable storage
device.
10. The computer program product of claim 8, further comprising
fifth program instructions to compare, subsequent to the set, a
counted number of single bit errors for the memory to the value;
and wherein the fifth program instructions are stored on the
computer readable storage device.
11. The computer program product of claim 8, further comprising
fifth program instructions to alert, subsequent to the set, if a
counted number of single bit errors for the memory at least equals
the single bit error value; and wherein the fifth program
instructions are stored on the computer readable storage
device.
12. The computer program product of claim 8, further comprising
fifth program instructions to return to sleep, subsequent to the
set, if a counted number of single bit errors for the memory fails
to exceed the single bit error value; and wherein the fifth program
instructions are stored on the computer readable storage
device.
13. The computer program product of claim 8, further comprising
fifth program instructions to re-set, according to the method, the
single bit error value for the user computer system upon a memory
replacement; and wherein the fifth program instructions are stored
on the computer readable storage device.
14. A system for scalable predictive failure analysis, the system
comprising: a processor, a computer readable memory and a computer
readable storage device; first program instructions to gather
memory information for memory on a user computer system having at
least one processor; second program instructions to select one or
more memory-related parameters; third program instructions to
calculate based on the gather and the select, a single bit error
value for the scalable predictive failure analysis through
calculations for each of the one or more memory-related parameters
that utilize the memory information; fourth program instructions to
set, based on the calculate, the single bit error value for the
user computer system; and wherein the first, second, third, and
fourth program instructions are stored on the computer readable
storage device for execution by the processor via the computer
readable memory.
15. The system of claim 14, further comprising fifth program
instructions to detect, subsequent to the set, one or more single
bit errors for the memory; and wherein the fifth program
instructions are stored on the computer readable storage device for
execution by the processor via the computer readable memory.
16. The system of claim 14, further comprising fifth program
instructions to compare, subsequent to the set, a counted number of
single bit errors for the memory to the value; and wherein the
fifth program instructions are stored on the computer readable
storage device for execution by the processor via the computer
readable memory.
17. The system of claim 14, further comprising fifth program
instructions to alert, subsequent to the set, if a counted number
of single bit errors for the memory at least equals the single bit
error value; and wherein the fifth program instructions are stored
on the computer readable storage device for execution by the
processor via the computer readable memory.
18. The system of claim 14, further comprising fifth program
instructions to return to sleep, subsequent to the setting, if a
counted number of single bit errors for the memory fails to exceed
the single bit error value; and wherein the fifth program
instructions are stored on the computer readable storage device for
execution by the processor via the computer readable memory.
19. The system of claim 14, further comprising fifth program
instructions to re-set, according to the method, the single bit
error value for the user computer system upon a memory replacement;
and wherein the fifth program instructions are stored on the
computer readable storage device for execution by the processor via
the computer readable memory.
20. The system of claim 14, further comprising fifth program
instructions to report the single bit error value and any results
from the method on a display associated with the user computer
system; and wherein the fifth program instructions are stored on
the computer readable storage device.
Description
BACKGROUND
[0001] Memory correctable errors are becoming a major issue in
today's modern personal computers, especially since supported
memory sizes often reach terabytes instead of gigabytes. To that
end, complex predictive failure analyses are desirable in order to
anticipate and prevent mild to catastrophic system failures
involving data loss and damage due to memory errors.
BRIEF SUMMARY
[0002] One embodiment provides a method for scalable predictive
failure analysis. Embodiments of the method may include gathering
memory information for memory on a user computer system having at
least one processor. Further, the method includes selecting one or
more memory-related parameters from a plurality. Further still, the
method includes calculating based on the gathering and the
selecting, a single bit error value for the scalable predictive
failure analysis through calculations for each of the one or more
memory-related parameters that utilize the memory information. Yet
further, the method includes setting, based on the calculating, the
single bit error value for the user computer system.
[0003] Another embodiment provides a computer program product for
scalable predictive failure analysis. The computer program product
includes a computer readable storage device. Further, the computer
program product includes first program instructions to gather
memory information for memory on a user computer system having at
least one processor. Further still, the computer program product
includes second program instructions to select one or more
memory-related parameters. Yet further, the computer program
product includes third program instructions to calculate based on
the gather and the select (i.e., performing the instructions to
gather and to select), a single bit error value for the scalable
predictive failure analysis through calculations for each of the
one or more memory-related parameters that utilize the memory
information. Still further, the computer program product includes
fourth program instructions to set, based on the calculate (i.e.,
performing the instructions to calculate), the single bit error
value for the user computer system, wherein the first, second,
third, and fourth program instructions are stored on the computer
readable storage device.
[0004] Another embodiment provides a system for scalable predictive
failure analysis. The system includes a processor, a computer
readable memory and a computer readable storage device. Further,
the system includes first program instructions to gather memory
information for memory on a user computer system having at least
one processor, wherein the memory may be the same, part of or
different from the computer readable memory. Further still, the
system includes second program instructions to select one or more
memory-related parameters. Yet further, the system includes third
program instructions to calculate, based on the gather and the
select, a single bit error value for the scalable predictive
failure analysis through calculations for each of the one or more
memory-related parameters that utilize the memory information.
Further still, the system includes fourth program instructions to
set, based on the calculate, the single bit error value for the
user computer system. The first, second, third, and fourth program
instructions of the system are stored on the computer readable
storage device for execution by the processor via the computer
readable memory.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
[0005] So that the manner in which the above recited features,
advantages and objects of the present disclosure are attained and
can be understood in detail, a more particular description of this
disclosure, briefly summarized above, may be had by reference to
the embodiments thereof which are illustrated in the appended
drawings.
[0006] It is to be noted, however, that the appended drawings
illustrate only example embodiments of this disclosure, and,
therefore, are not to be considered limiting of its scope, for this
disclosure may admit to other equally effective
embodiments.
[0007] FIG. 1 depicts an example embodiment of a system for
scalable predictive failure analysis in accordance with this
disclosure.
[0008] FIG. 2 depicts a block diagram of an example embodiment of a
computer system suitable for scalable predictive failure analysis,
such as a user computer system.
[0009] FIG. 3 depicts an example embodiment of a flowchart to show
a method for scalable predictive failure analysis in accordance
with this disclosure.
[0010] FIG. 4 depicts another diagram of an example embodiment of a
computer system suitable for scalable predictive failure analysis,
such as a user computer system.
DETAILED DESCRIPTION
[0011] The following is a detailed description of example
embodiments with accompanying drawings. The example embodiments are
in such detail as to communicate the invention. However, the amount
of detail offered is not intended to limit the anticipated
variations of embodiments; on the contrary, the intention is to
cover all modifications, equivalents, and alternatives falling
within the spirit and scope of the present invention as defined by
the appended claims.
[0012] Generally speaking, systems, methods and media for scalable
predictive failure analysis (SPFA) for single bit errors (SBE) in
memory are disclosed. Embodiments include gathering, for a user
computer system, memory information, such as memory size,
synchronous dynamic random access memory (SDRAM) technology on the
module, module packaging, memory failure mode and vendor quality.
Calculation of the SBE value ensues through combining
calculation(s) for each of the selected memory-related parameters,
wherein the selecting optionally occurs subsequent or prior to the
gathering. The calculated SBE value is set and valid for the user
computer system until powering down or changing memory components
in the user computer system. Accordingly, the SBE value is scalable
because the value is determined for the particular user computer
system--not simply a fixed, generic value. Alerts, whether audible
or visible, may occur based on comparing counted SBEs to the
scalable SBE value. The alerts provide credible predictive failure
analysis to avert system memory failures while incorporating the
realities of the unique complexities for the particular user
computer system.
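By way of a non-limiting illustration, the comparison of counted SBEs
to the scalable SBE value may be sketched in Python as follows. The
function name check_sbe_count and the returned strings are
hypothetical and do not appear in this disclosure. Treating a count
equal to the SBE value as an alert condition is an assumption of this
sketch.

```python
def check_sbe_count(counted_sbes: int, sbe_value: int) -> str:
    """Compare a counted number of SBEs to the SBE value set for the system.

    Returns "alert" when the count at least equals the SBE value,
    otherwise "sleep" (no action until the next check).
    """
    if counted_sbes >= sbe_value:
        return "alert"
    return "sleep"


if __name__ == "__main__":
    # Hypothetical counts checked against a hypothetical SBE value of 256.
    for count in (10, 255, 256, 300):
        print(count, check_sbe_count(count, 256))
```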
[0013] In general, the routines executed to implement the
embodiments of the invention may be part of a specific application,
component, program, module, object, or sequence of instructions.
The computer program of the present invention typically is
comprised of a multitude of instructions that will be translated by
the native computer into a machine-readable format and hence
executable instructions. Also, programs are comprised of variables
and data structures that either reside locally to the program or
are found in memory or on storage devices. In addition, various
programs described herein may be identified based upon the
application for which they are implemented in a specific embodiment
of the invention. However, it should be appreciated that any
particular program nomenclature herein is used merely for
convenience, and thus the invention should not be limited to use
solely in any specific application identified and/or implied by
such nomenclature.
[0014] While specific embodiments will be described below with
reference to particular configurations of hardware and/or software,
those of skill in the art will realize that embodiments of the
present invention may advantageously be implemented with other
substantially equivalent hardware, software systems, manual
operations, or any combination of any or all of these. The
invention can take the form of an entirely hardware embodiment, an
entirely software embodiment or an embodiment containing both
hardware and software elements. In a preferred embodiment, the
invention is implemented in software, which includes but is not
limited to firmware, resident software, microcode, etc. Moreover,
embodiments of the invention may also be implemented via parallel
processing using a parallel computing architecture, such as one
using multiple discrete systems (e.g., plurality of computers,
etc.) or an internal multiprocessing architecture (e.g., a single
system with parallel processing capabilities).
[0015] Any combination of one or more computer readable medium(s)
may be utilized. The computer readable medium may be a computer
readable signal medium or a computer readable storage medium. A
computer readable storage medium may be, for example, but not
limited to, an electronic, magnetic, optical, electromagnetic,
infrared, or semiconductor system, apparatus, or device, or any
suitable combination of the foregoing. More specific examples (a
non-exhaustive list) of the computer readable storage medium would
include the following: an electrical connection having one or more
wires, a portable computer diskette, a hard disk, a random access
memory (RAM), a read-only memory (ROM), an erasable programmable
read-only memory (EPROM or Flash memory), an optical fiber, a
portable compact disc read-only memory (CD-ROM), an optical storage
device, a magnetic storage device, or any suitable combination of
the foregoing. In the context of this document, a computer readable
storage medium may be any tangible medium that can contain, or
store a program for use by or in connection with an instruction
execution system, apparatus, or device.
[0016] A computer readable signal medium may include a propagated
data signal with computer readable program code embodied therein,
for example, in baseband or as part of a carrier wave. Such a
propagated signal may take any of a variety of forms, including,
but not limited to, electro-magnetic, optical, or any suitable
combination thereof. A computer readable signal medium may be any
computer readable medium that is not a computer readable storage
medium and that can communicate, propagate, or transport a program
for use by or in connection with an instruction execution system,
apparatus, or device.
[0017] Program code embodied on a computer readable medium may be
transmitted using any appropriate medium, including but not limited
to wireless, wireline, optical fiber cable, RF, etc., or any
suitable combination of the foregoing.
[0018] Computer program code for carrying out operations for
aspects of the present invention may be written in any combination
of one or more programming languages, including an object oriented
programming language such as Java, Smalltalk, C++ or the like and
conventional procedural programming languages, such as the "C"
programming language or similar programming languages. The program
code may execute entirely on the user's computer, partly on the
user's computer, as a stand-alone software package, partly on the
user's computer and partly on a remote computer or entirely on the
remote computer or server. In the latter scenario, the remote
computer may be connected to the user's computer through any type
of network, including a local area network (LAN) or a wide area
network (WAN), or the connection may be made to an external
computer (for example, through the Internet using an Internet
Service Provider).
[0019] Aspects of embodiments of the invention described herein may
be stored or distributed on computer-readable medium as well as
distributed electronically over the Internet or over other
networks, including wireless networks. Data structures and
transmission of data (including wireless transmission) particular
to aspects of the invention are also encompassed within the scope
of the invention. Furthermore, the invention can take the form of a
computer program product accessible from a computer-readable medium
providing program code for use by or in connection with a computer
or any instruction execution system. For the purposes of this
description, a computer-usable or computer readable medium can be
any apparatus that can contain, store, communicate, propagate, or
transport the program for use by or in connection with the
instruction execution system, apparatus, or device. The medium may
be an electronic, magnetic, optical, electromagnetic, infrared, or
semiconductor system (or apparatus or device) or a propagation
medium. Examples of a computer-readable medium include a
semiconductor or solid state memory, magnetic tape, a removable
computer diskette, a random access memory (RAM), a read-only memory
(ROM), a rigid magnetic disk and an optical disk. Current examples
of optical disks include compact disk-read only memory (CD-ROM),
compact disk-read/write (CD-R/W) and DVD.
[0020] Each software program described herein may be operated on
any type of data processing system, such as a personal computer,
server, etc. A data processing system suitable for storing and/or
executing program code may include at least one processor coupled
directly or indirectly to memory elements through a system bus. The
memory elements may include local memory employed during execution
of the program code, bulk storage, and cache memories which provide
temporary storage of at least some program code in order to reduce
the number of times code must be retrieved from bulk storage during
execution. Input/output (I/O) devices (including but not limited to
keyboards, displays, pointing devices, etc.) may be coupled to the
system either directly or through intervening I/O controllers.
Network adapters may also be coupled to the system to enable the
data processing system to become coupled to other data processing
systems or remote printers or storage devices through intervening
private or public networks, including wireless networks. Modems,
cable modems and Ethernet cards are just a few of the currently
available types of network adapters.
[0021] Turning now to the drawings, FIG. 1 depicts a user computer
system 100 having a collection of cooperating, algorithmic modules
for SPFA calculations. The enabling logic for modules 110, 115,
120, 130, 140, 145 is reduced to software and/or hardware. The
modules 110, 115, 120, 130, 140, 145, are located, for example,
within the operating system of a user computer system 100. In
alternative example embodiments, any of the modules 110, 115, 120,
130, 140, 145 may be located remotely but in network communication
with the user computer system 100. In one example of remote location,
some of the modules 110, 115, 120, 130, 140, 145 are located on
other computer systems, with manipulations and calculations of
the generated data being the subject of a Web service.
[0022] Regardless of individual logic location, the system 100 has
accessible logic to gather memory information for memory 105 on the
user computer system 100. The gathering module 110 gathers memory
information, such as memory size, synchronous dynamic random access
memory (SDRAM) technology on the module, module packaging, memory
failure mode and vendor quality, for memory 105 under test on the
particular user computer system 100. For example, memory information
for memory 105 could be a module size of 2 GB for a single-rank dual
in-line memory module (DIMM). Below, further discussion of memory
information occurs in combination with discussion of selected
memory-based parameters.
[0023] The system 100 also includes logic, denominated as a
configuration module 120 in FIG. 1, for selecting one or more
memory-related parameters from a plurality of such parameters. A
user or administrator, for example, of the user computer system 100
selects which memory-related parameters to include in the SPFA
calculations. The selecting may occur through textual entry,
radio-button selection, or other method for selecting options through a display
coupled to the user computer system 100. The selected
memory-related parameters, themselves, directly correlate to memory
information. That is, memory information regarding memory size
correlates to the memory-related parameter for memory size, memory
information regarding module packaging correlates to the
memory-related parameter for module packaging, and so forth.
[0024] In communication with both the gathering and configuration
modules 110, 120, the calculation module 130 includes logic to
calculate a combination of the selected memory-related parameters.
The SPFA uses the selected number of memory-related parameters,
which one considers critical to maintain a functioning memory
subsystem, in order to calculate the SBE value. The setting module
140 then sets the calculated SBE value for the system 100.
Evaluation of exemplary memory-related parameters and combination
of the same for calculation of the SBE value now ensues.
[0025] Memory module size is a memory-related parameter for
possible inclusion in the SPFA calculation for the memory 105. For
such, the following exemplary scale is provided for a correctable
SBE value based on the actual capacity of each module or
module-pairs installed in the system:
TABLE 1
Module Size | Scale Factor (n) | PFA threshold in time window
2 GB        | 1                | x
4 GB        | 2                | 2x
8 GB        | 4                | 4x
16 GB       | 8                | 8x
32 GB       | 16               | 16x
Referring to Table 1, and assuming x=256 SBEs for a baseline PFA
count within a 24-hour window, a larger memory 105 DIMM
logically permits more SBEs before meeting or exceeding a set SBE
value, i.e., a threshold. For example, the memory-based parameter
for memory module size would allow 256 SBEs for a 2 GB DIMM, 512
SBEs for a 4 GB DIMM, 1024 SBEs for an 8 GB DIMM, 2048 SBEs for a 16
GB DIMM, and 4096 SBEs for a 32 GB DIMM before memory failure
is signaled by visual and/or audio alert through use of the detection
and comparison modules 115, 145.
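As a hedged illustration of the Table 1 scaling, the per-module
threshold may be computed as the scale factor n multiplied by the
baseline x. The names MODULE_SIZE_SCALE and size_threshold are
hypothetical and are introduced only for this sketch.

```python
# Scale factors n from Table 1, keyed by module size in GB.
MODULE_SIZE_SCALE = {2: 1, 4: 2, 8: 4, 16: 8, 32: 16}


def size_threshold(module_gb: int, baseline_x: int = 256) -> int:
    """Return n * x, the SBE threshold for a module of the given size."""
    return MODULE_SIZE_SCALE[module_gb] * baseline_x


if __name__ == "__main__":
    for gb in sorted(MODULE_SIZE_SCALE):
        print(f"{gb} GB DIMM: {size_threshold(gb)} SBEs")
```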
[0026] In addition to memory module size, another possibly selected
memory-related parameter for inclusion in the calculation of the
SBE value is SDRAM technology on the memory module 105. For such,
the following exemplary scale is provided:
TABLE 2
Number of Ranks | Scale Factor (m) | PFA threshold in time window
1 (Single)      | 1                | y
2 (Dual)        | 1.2              | y/1.2
4 (Quad)        | 1.6              | y/1.6
Referring to Table 2, and assuming y=1024 for a baseline PFA count
within a 24-hour window, a memory 105 DIMM with fewer ranks permits
a higher SBE value. For example, the memory-based parameter for
SDRAM technology would allow 1024 SBEs for a single-rank DIMM, 853
SBEs for a dual-rank DIMM (1024/1.2, rounded down), and 640 SBEs for
a quad-rank DIMM before
alerting the user or another system in network communication with
the system 100 of memory failure of a module or other memory device
needing repair or replacement, whereupon the latter at least
suggests a new SBE value should be re-set by re-calculation.
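The division by the rank scale factor m may be sketched as follows.
The names RANK_SCALE and rank_threshold are hypothetical. Truncating
the quotient to a whole number of errors is an assumption of this
sketch, and exact fractions are used only to avoid floating-point
rounding.

```python
from fractions import Fraction

# Scale factors m from Table 2, keyed by number of ranks.
RANK_SCALE = {1: Fraction(1), 2: Fraction("1.2"), 4: Fraction("1.6")}


def rank_threshold(ranks: int, baseline_y: int = 1024) -> int:
    """Return y / m, truncated to a whole number of SBEs."""
    return int(baseline_y / RANK_SCALE[ranks])


if __name__ == "__main__":
    for ranks in sorted(RANK_SCALE):
        print(f"{ranks}-rank DIMM: {rank_threshold(ranks)} SBEs")
```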
[0027] Still another memory-related parameter for inclusion in the
calculation of the SBE value is module packaging of the memory 105
on the particular user computer system 100. For such, the following
exemplary scale is provided:
TABLE 3
SDRAM Data Width                               | Scale Factor (k) | PFA threshold in time window
x8 (with no IBM® Chipkill™ technology support) | 1                | z
x8 (with IBM® Chipkill™ support)               | 2                | 2z
x4 (with IBM® Chipkill™ support)               | 2.5              | 2.5z
IBM® Chipkill™ is an advanced error checking and correcting
(ECC) computer technology that has the ability to correct multi-bit
memory errors on a single SDRAM. Referring to Table 3, and assuming
z=256 for a baseline PFA count within a 24-hour window, a memory 105
DIMM with additional advanced ECC protection, i.e., Chipkill™,
affords a higher SBE value due to this individual PFA metric. For
example, the memory-based parameter regarding Chipkill™ would
allow 256 SBEs for an x8 DIMM with no Chipkill™, 512 SBEs for an x8
DIMM with Chipkill™, and 640 SBEs for an x4 DIMM with
Chipkill™.
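A minimal sketch of the Table 3 lookup follows, assuming the module
packaging is described by a data width string and a flag for advanced
ECC support. The names WIDTH_SCALE and width_threshold and the key
layout are hypothetical.

```python
# Scale factors k from Table 3, keyed by (data width, advanced ECC support).
WIDTH_SCALE = {
    ("x8", False): 1.0,
    ("x8", True): 2.0,
    ("x4", True): 2.5,
}


def width_threshold(data_width: str, advanced_ecc: bool, baseline_z: int = 256) -> int:
    """Return k * z, the SBE threshold for the given module packaging."""
    return int(WIDTH_SCALE[(data_width, advanced_ecc)] * baseline_z)


if __name__ == "__main__":
    for (width, ecc), _ in WIDTH_SCALE.items():
        print(width, ecc, width_threshold(width, ecc))
```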
[0028] Yet another memory-related parameter for optional inclusion
in the calculation of the SBE value is memory failure mode of the
memory 105 on the particular user computer system 100. Here, this
memory-related parameter regards single count reduction for a
single memory address. That is, a correctable SBE that occurs
repeatedly at the same memory address on memory 105 DIMM is counted
as one failure instead of counting the repeats as multiple
failures.
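The single count reduction may be illustrated by counting distinct
failing addresses rather than individual error events. The sketch
assumes the correctable errors are available as a sequence of integer
addresses, and the name count_unique_sbes is hypothetical.

```python
from typing import Iterable


def count_unique_sbes(error_addresses: Iterable[int]) -> int:
    """Count each failing memory address once, however often it repeats."""
    return len(set(error_addresses))


if __name__ == "__main__":
    # Hypothetical log: address 0x1000 repeats three times.
    log = [0x1000, 0x1000, 0x2000, 0x1000]
    print(count_unique_sbes(log))
```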
[0029] Another example of a memory-related parameter for optional
inclusion in the calculation of the SBE value is vendor quality of
the memory 105 on the particular user computer system 100. For
such, the following exemplary scale is provided:
TABLE 4
Vendor and Product  | Scale Factor (m)
Vendor A, Product 1 | 1
Vendor A, Product 2 | 0.8
Vendor B, Product 1 | 1
Vendor C, Product 1 | 0.5
Table 4 represents a memory vendor quality/reliability matrix on a
per product basis. A memory vendor can have multiple products, each
of which could have a different quality/reliability rating. A quality
scale rating, such as that of Table 4, may be used for calculating
the SBE value. A memory 105 DIMM from a supplier with a lower quality
score yields a lower PFA threshold value for this memory-related
parameter. A DIMM with a lower quality score would require
replacement or repair sooner as compared to one with a higher quality
score, provided all other contributing
PFA memory-related parameters to the SBE value are constant.
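Table 4 lists only scale factors, so the following sketch assumes
that the factor multiplies a baseline count supplied by the caller.
That multiplication, the baseline argument, and the names
VENDOR_QUALITY and quality_threshold are assumptions made for
illustration and are not stated in this disclosure.

```python
from fractions import Fraction

# Quality scale factors from Table 4, keyed by (vendor, product).
VENDOR_QUALITY = {
    ("Vendor A", "Product 1"): Fraction(1),
    ("Vendor A", "Product 2"): Fraction("0.8"),
    ("Vendor B", "Product 1"): Fraction(1),
    ("Vendor C", "Product 1"): Fraction("0.5"),
}


def quality_threshold(vendor: str, product: str, baseline: int) -> int:
    """Scale a caller-supplied baseline by the vendor quality factor."""
    return int(baseline * VENDOR_QUALITY[(vendor, product)])


if __name__ == "__main__":
    # Hypothetical baseline of 1000 SBEs; a lower score gives a lower threshold.
    for key in VENDOR_QUALITY:
        print(key, quality_threshold(*key, baseline=1000))
```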
[0030] For calculation purposes, combination of the selected,
memory-related parameters may be through simple addition,
multiplication, a mixture of the two, or any other combination
method so as to yield a reliable, relative, and meaningful SBE
value for SPFA. For example, the foregoing five memory-related
parameters may calculate an SBE value according to:
$\mathrm{PFA}_{(sum)} = \mathrm{PFA}_{(a)} + \mathrm{PFA}_{(b)} + \mathrm{PFA}_{(c)} + \mathrm{PFA}_{(d)} + \mathrm{PFA}_{(e)}$.
The value of each memory-related PFA threshold and time window(s)
should be defined by the subject matter expert on the system design
team. That is, the illustrative tables provided herein are neither
the sole nor necessarily appropriate values to use because the same
are solely intended as examples. Whether a hardware built-in memory
test, power-on memory test (i.e., power-on self-test), system
in run time, or memory diagnostic test, this disclosure enables a
selectable and scalable PFA for memory 105 that thwarts
consequences of memory failures for a particular user computer
system 100.
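The additive combination may be sketched as follows, under the
assumption that each selected memory-related parameter has already
produced its own threshold. The name combine_thresholds and the
dictionary keys are hypothetical. The example values are taken from
the illustrative tables (a 4 GB module, a quad-rank DIMM, and an x4
DIMM with advanced ECC support).

```python
from typing import Dict, Iterable


def combine_thresholds(per_parameter: Dict[str, int], selected: Iterable[str]) -> int:
    """Sum the thresholds of only the selected memory-related parameters."""
    return sum(per_parameter[name] for name in selected)


# Per-parameter thresholds taken from the illustrative tables.
thresholds = {"module_size": 512, "rank": 640, "data_width": 640}
sbe_value = combine_thresholds(thresholds, ["module_size", "rank", "data_width"])

if __name__ == "__main__":
    print(sbe_value)
```

Multiplication or a mixture of the two could replace the sum in
combine_thresholds without changing the selection step.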
[0031] FIG. 2 depicts a block diagram of one embodiment of a
computer system 200 suitable for use in scalable predictive failure
analysis. Other configurations of the computer system 200 are
possible, including a computer having capabilities other than those
ascribed herein and possibly beyond those capabilities, and they
may, in other embodiments, be any combination of processing devices
such as workstations, servers, mainframe computers, notebook or
laptop computers, desktop computers, PDAs, mobile phones, wireless
devices, set-top boxes, or the like. At least certain of the
components of computer system 200 may be mounted on a multi-layer
planar or motherboard (which may itself be mounted on the chassis)
to provide a means for electrically interconnecting the components
of the computer system 200.
[0032] In the depicted embodiment, the computer system 200 includes
a processor 202, storage 204, memory 206, a user interface adapter
208, and a display adapter 210 connected to a bus 212 or other
interconnect. The bus 212 facilitates communication between the
processor 202 and other components of the computer system 200, as
well as communication between components. Processor 202 may include
one or more system central processing units (CPUs) or processors to
execute instructions, such as an IBM® PowerPC® processor,
an Intel® Pentium® processor, an Advanced Micro Devices,
Inc. processor or any other suitable processor. IBM and PowerPC are
trademarks of International Business Machines Corporation,
registered in many jurisdictions worldwide. Intel and Pentium are
trademarks or registered trademarks of Intel Corporation or its
subsidiaries in the United States or other countries. The processor
202 may utilize storage 204, which may be non-volatile storage such
as one or more hard drives, tape drives, diskette drives, CD-ROM
drive, DVD-ROM drive, or the like. The processor 202 may also be
connected to memory 206 via bus 212, such as via a memory
controller hub (MCH). System memory 206 may include volatile memory
such as random access memory (RAM) or double data rate (DDR)
synchronous dynamic random access memory (SDRAM). In the disclosed
systems, for example, the processor 202 may execute instructions to
perform functions, such as gathering memory information and
selecting memory-related parameters for inclusion in SPFA
calculations. Information generated before, during, or after the
calculations may be stored temporarily or permanently in storage
204 or memory 206.
[0033] Turning now to FIG. 3, another aspect of scalable predictive
failure analysis for memory associated with a particular user
computer system is disclosed. Depicted is an example embodiment of
a flowchart 300 for improved predictive failure analysis after the
SBE value has been set for the user computer system. Flowchart
300 is for a system, such as system 100, notably involving the
logic associated with the detection and comparison modules 115, 145
of FIG. 1.
[0034] Returning to FIG. 3, flowchart 300 starts 305 by the system
detecting 310 SBEs on a DIMM via a system management interrupt
(SMI). When the user computer system boots, the BIOS or other BIOS
implementation, such as Unified Extensible Firmware Interface
(UEFI), establishes the interrupt factors. Upon the memory
controller detecting 310 an SBE, an SMI is triggered to wake up the
BIOS to check 320 the memory-related parameters and the SBE counts
accumulated so far. Decision block 330 queries whether the SBE
count value is at least equal to the set SBE value. If yes 340, then
the flowchart 300 issues 350 an SPFA alert and optionally provides
repair actions, such as displaying a visual notice to replace the
specific faulty memory module or suggesting reparative procedures.
If no 335, then the flowchart 300 returns to sleep, at least until
the next SBE is counted, because the count of SBEs for the
particular user computer system is less than the set SBE value.
Subsequent to issuing 350 the alert with optional actions, or to
the no 335 branch, the flowchart ends 375.
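The decision logic of flowchart 300 can be sketched in Python as follows. This is an illustrative model only, not firmware: the class name `SbeMonitor`, the method `on_sbe_detected`, the per-DIMM counting, and the alert text are assumptions introduced here, and time windows are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class SbeMonitor:
    """Hypothetical model of the check performed each time an SBE is reported."""

    sbe_value: int                              # SBE value set for this system
    counts: dict = field(default_factory=dict)  # assumed: counts kept per DIMM

    def on_sbe_detected(self, dimm_id):
        """Count one SBE, then compare the count with the set SBE value.

        Returns an alert string when the count is at least the set SBE
        value (the "yes" branch); otherwise returns None (the "no" branch,
        where the handler goes back to sleep until the next SBE).
        """
        self.counts[dimm_id] = self.counts.get(dimm_id, 0) + 1
        if self.counts[dimm_id] >= self.sbe_value:
            return f"SPFA alert: replace memory module {dimm_id}"
        return None


# Usage: with a set SBE value of 3, the third SBE on a DIMM raises the alert.
monitor = SbeMonitor(sbe_value=3)
results = [monitor.on_sbe_detected("DIMM0") for _ in range(3)]
```

In this sketch the first two calls return `None` and the third returns the alert string, mirroring the no 335 and yes 340 branches of decision block 330.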
[0035] FIG. 4 illustrates information handling system 401 which is
a simplified example of a computer system, such as shown in FIG. 2
for use in scalable predictive failure analysis, and capable of
performing the operations described herein. Computer system 401
includes processor 400 which is coupled to host bus 405. A level
two (L2) cache memory 410 is also coupled to the host bus 405.
Host-to-PCI bridge 415 is coupled to main memory 420, includes
cache memory and main memory control functions, and provides bus
control to handle transfers among PCI bus 425, processor 400, L2
cache 410, main memory 420, and host bus 405. As an alternative to
the foregoing, the level 2 cache 410, memory controller and the
north bridge may be integrated into the CPU; then, the system main
memory is connected to the memory controller, which is inside the
CPU. PCI bus 425 provides an interface for a variety of devices
including, for example, LAN card 430. PCI-to-ISA bridge 435
provides bus control to handle transfers between PCI bus 425 and
ISA bus 440, universal serial bus (USB) functionality 445, IDE
device functionality 450, power management functionality 455, and
can include other functional elements not shown, such as a
real-time clock (RTC), DMA control, interrupt support, and system
management bus support. Peripheral devices and input/output (I/O)
devices can be attached to various interfaces 460 (e.g., parallel
interface 462, serial interface 464, infrared (IR) interface 466,
keyboard interface 468, mouse interface 470, fixed disk (HDD) 472,
removable storage device 474) coupled to ISA bus 440.
Alternatively, many I/O devices can be accommodated by a super I/O
controller (not shown) attached to ISA bus 440.
[0036] BIOS 480 is coupled to ISA bus 440, and incorporates the
necessary processor executable code for a variety of low-level
system functions and system boot functions. BIOS 480 can be stored
in any computer readable medium, including magnetic storage media,
optical storage media, flash memory, random access memory, read
only memory, and communications media conveying signals encoding
the instructions (e.g., signals from a network). In order to attach
computer system 401 to another computer system to copy files over a
network, LAN card 430 is coupled to PCI bus 425 and to PCI-to-ISA
bridge 435. Similarly, to connect computer system 401 to an ISP for
Internet access over a telephone line connection, modem
475 is connected to serial interface 464 and PCI-to-ISA bridge 435.
[0037] While the computer systems described in FIGS. 2 and 4 are
capable of executing the disclosure described herein, these
computer systems are simply examples of computer systems and user
computer systems. Those skilled in the art will appreciate that
many other computer system designs are capable of performing the
disclosure described herein.
[0038] Another embodiment of the disclosure is implemented as a
program product for use within a device such as, for example, those
systems and methods depicted in FIGS. 1 and 3. The program(s) of
the program product defines functions of the embodiments (including
the methods described herein) and can be contained on a variety of
media including but not limited to: (i) information permanently
stored on non-volatile storage-type accessible media (e.g., writable
and readable as well as read-only memory devices within a computer
such as ROM, flash memory, CD-ROM disks readable by a CD-ROM
drive); (ii) alterable information stored on writable storage-type
accessible media (e.g., readable floppy disks within a diskette
drive or hard-disk drive); and (iii) information conveyed to a
computer through a network. The latter embodiment specifically
includes information downloaded onto either permanent or even merely
momentary storage-type accessible media from the World Wide Web, an
internet, and/or other networks, such as those known, discussed
and/or explicitly referred to herein. Such data-bearing media, when
carrying computer-readable instructions that direct the functions
of the present disclosure, represent embodiments of the present
disclosure.
[0039] In general, the routines executed to implement the
embodiments of this disclosure may be part of an operating system
or a specific application, component, program, module, object, or
sequence of instructions. The computer program of this disclosure
typically comprises a multitude of instructions that will be
translated by the native computer into a machine-readable format
and hence executable instructions. Also, programs are comprised of
variables and data structures that either reside locally to the
program or are found in memory or on storage devices. In addition,
various programs described hereinafter may be identified based upon
the application for which they are implemented in a specific
embodiment of this disclosure. However, it should be appreciated
that any particular program nomenclature that follows is used
merely for convenience, and thus this disclosure should not be
limited to use solely in any specific application identified and/or
implied by such nomenclature.
[0040] While the foregoing is directed to example embodiments of
this disclosure, other and further embodiments of this disclosure
may be devised without departing from the basic scope thereof, and
the scope thereof is determined by the claims that follow.
* * * * *