U.S. patent application number 13/420231, for a method for simultaneous calibration of mass spectra and identification of peptides in proteomic analysis, was filed with the patent office on 2012-03-14 and published on 2012-09-06.
This patent application is currently assigned to CEDARS-SINAI MEDICAL CENTER. Invention is credited to Robert A. Grothe, Jr.
Publication Number | 20120223224 |
Application Number | 13/420231 |
Family ID | 37482329 |
Publication Date | 2012-09-06 |
United States Patent
Application |
20120223224 |
Kind Code |
A1 |
Grothe, JR.; Robert A. |
September 6, 2012 |
METHOD FOR SIMULTANEOUS CALIBRATION OF MASS SPECTRA AND
IDENTIFICATION OF PEPTIDES IN PROTEOMIC ANALYSIS
Abstract
The invention relates to a mass spectrometry calibration system
that may be performed in real-time using the information contained
within a sample without the addition of specific calibrants. When
applied to a sample, such as a proteomic sample, the calibration
system may identify the exact masses of peptides in the sample. The
system involves the use of mathematical algorithms that iteratively
estimate the error in the measurement and update the calibration
parameters accordingly, thereby resulting in peptide mass
identification.
Inventors: |
Grothe, JR.; Robert A.;
(Burlingame, CA) |
Assignee: |
CEDARS-SINAI MEDICAL CENTER
Los Angeles
CA
|
Family ID: |
37482329 |
Appl. No.: |
13/420231 |
Filed: |
March 14, 2012 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
11914588 | Nov 16, 2007 | 8158930 |
PCT/US06/21321 | May 31, 2006 | |
13420231 | | |
60686684 | Jun 2, 2005 | |
Current U.S.
Class: |
250/282 ;
250/281 |
Current CPC
Class: |
H01J 49/0009
20130101 |
Class at
Publication: |
250/282 ;
250/281 |
International
Class: |
B01D 59/44 20060101
B01D059/44 |
Claims
1. A method of producing a calibrated mass spectrum, comprising: a)
providing a sample comprising an elemental composition; b)
subjecting the sample to mass spectrometry whereby a mass
spectrometry output is obtained; c) providing input parameters; d)
converting the mass spectrometry output to mass values using the
input parameters; e) estimating error and elemental composition
probabilities based on the mass values; f) updating the input
parameters based on the estimated error and elemental composition
probabilities; g) applying the updated input parameters to the mass
spectrometry output to produce updated mass values; and h)
repeating steps d through g until convergence is reached, whereby a
calibrated mass spectrum is produced.
2. The method of claim 1, wherein the input parameters are selected
from the group consisting of a mass database, initial calibration
parameters, an initial error estimate, updated calibration
parameters, an updated error estimate, and combinations
thereof.
3. The method of claim 1, wherein the mass spectrometry is Fourier
transform mass spectrometry.
4. The method of claim 1, wherein the mass spectrometry output
comprises cyclotron frequencies.
5. The method of claim 1, wherein the elemental composition
probabilities are peptide probabilities.
6. The method of claim 1, wherein the sample is selected from the
group consisting of blood, plasma, serum, spinal fluid, urine,
sweat, saliva, tears, breast aspirate, prostate fluid, seminal
fluid, vaginal fluid, stool, cervical scraping, cytes, amniotic
fluid, intraocular fluid, mucous, moisture in breath, animal
tissue, cell lysates, tumor tissue, hair, skin, buccal scrapings,
nails, bone marrow, cartilage, prions, bone powder, ear wax, and
combinations thereof.
7. The method of claim 1, wherein the elemental composition
comprises at least one peptide.
8. The method of claim 1, wherein the sample is selected from the
group consisting of hydrocarbons, petroleum products, nucleotides,
combinatorial samples, polymeric samples, and combinations
thereof.
9. The method of claim 1, wherein the sample is a petroleum
product.
10. The method of claim 1, wherein the estimating the error and
elemental composition probabilities comprises using an
Expectation-Maximization algorithm.
11. The method of claim 1, wherein the estimating the error and
elemental composition probabilities comprises using a spline
algorithm.
12. A mass spectrometry calibration system, comprising: a) a mass
spectrometry device to analyze a sample and produce a mass
spectrometry output; and b) calibration software configured to: i)
receive input parameters, ii) convert the mass spectrometry output
to mass values using the input parameters, iii) estimate error and
elemental composition probabilities based on the mass values, iv)
update input parameters based on the estimated error and elemental
composition probabilities, v) apply the updated input parameters to
the mass spectrometry output to produce updated mass values, and
vi) repeat steps ii through v until convergence is reached, whereby
a calibrated mass spectrum is produced.
13. The system of claim 12, wherein the input parameters are
selected from the group consisting of a mass database, initial
calibration parameters, an initial error estimate, updated
calibration parameters, an updated error estimate, and combinations
thereof.
14. The system of claim 12, wherein the mass spectrometry device is
a Fourier transform mass spectrometer.
15. The system of claim 12, wherein the mass spectrometry output
comprises cyclotron frequencies.
16. The system of claim 12, wherein the elemental composition
probabilities are peptide probabilities.
17. The system of claim 12, wherein the sample is selected from the
group consisting of blood, plasma, serum, spinal fluid, urine,
sweat, saliva, tears, breast aspirate, prostate fluid, seminal
fluid, vaginal fluid, stool, cervical scraping, cytes, amniotic
fluid, intraocular fluid, mucous, moisture in breath, animal
tissue, cell lysates, tumor tissue, hair, skin, buccal scrapings,
nails, bone marrow, cartilage, prions, bone powder, ear wax, and
combinations thereof.
18. The system of claim 12, wherein the sample comprises at least
one peptide.
19. The system of claim 12, wherein the sample is selected from the
group consisting of hydrocarbons, petroleum products, nucleotides,
combinatorial samples, polymeric samples, and combinations
thereof.
20. The system of claim 12, wherein the sample is a petroleum
product.
21. The system of claim 12, wherein the software is configured to
estimate the error and the elemental composition probabilities
using an Expectation-Maximization algorithm.
22. The system of claim 12, wherein the software is configured to
estimate the error and the elemental composition probabilities
using a spline algorithm.
23. A computer-readable medium having computer-executable
instructions that when executed perform a method, the method
comprising: a) converting a mass spectrometry output to mass values
using input parameters; b) estimating error and elemental
composition probabilities based on the mass values; c) updating the
input parameters based on the estimated error and elemental
composition probabilities; d) applying the updated input parameters
to the mass spectrometry output to produce updated mass values; and
e) repeating steps b through d until convergence is reached,
whereby a calibrated mass spectrum is produced.
24. The computer-readable medium of claim 23, wherein the input
parameters are selected from the group consisting of a mass
database, initial calibration parameters, an initial error
estimate, and combinations thereof.
25. The computer-readable medium of claim 23, wherein the
estimating the error and the elemental composition probabilities
uses an Expectation-Maximization algorithm.
26. The computer-readable medium of claim 23, wherein the
estimating the error and the elemental composition probabilities
uses a spline algorithm.
27. The computer-readable medium of claim 23, wherein the mass
spectrometry output is produced by a Fourier transform mass
spectrometer.
28. The computer-readable medium of claim 23, wherein the mass
spectrometry output comprises cyclotron frequencies.
29. The computer-readable medium of claim 23, wherein the elemental
composition probabilities are peptide probabilities.
Description
FIELD OF INVENTION
[0001] The invention relates to the calibration of mass spectra
obtained in connection with proteomic analysis and to the
identification of peptides in connection with the same.
BACKGROUND OF THE INVENTION
[0002] In conventional ion cyclotron resonance ("ICR") mass
spectrometers, such as those typically used in connection with
Fourier Transform Mass Spectrometry ("FTMS"), charged particles are
directed into a magnetic field such that the mass to charge ratio
(M/Z) of the particles can be measured. In one application of this
technology, as described in U.S. Pat. No. 4,959,543, which is
incorporated by reference herein in its entirety, charged particles
are subjected to a high voltage pulse and caused to be accelerated
to larger radii of gyration relative to the particles' natural
radii of gyration. Once excited in this fashion, the charged
particles move in circular orbits at frequencies given by the
cyclotron equation, ω = B/(M/Z) (where B is the magnetic field
strength and ω is the angular frequency). The excited
cyclotron motions induce transient signals on a pair of parallel
electrodes positioned inside the magnet; the transient signals are
a measure of the cyclotron frequency of the particles. In fact, the
transient signals are actually a composite of the cyclotron
frequencies of all of the ions present in the magnet. By
implementing certain Fourier transform mathematics (e.g., a Fast
Fourier Transform, or "FFT," algorithm to extract the frequency and
amplitude for each frequency component), these transient signals
are converted into a frequency spectrum (i.e., frequency peaks
corresponding to each ionic species in the instrument). In this
first order model, measured frequencies are converted into M/Z
through calibration values when the magnetic field strength (B) is
known. There are a number of commercially available products that
implement the FTMS technique; for example, Thermo, Bruker, and
IonSpec all produce FTMS instruments that generally function in
this manner.
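As a rough illustration of the first-order relation above, the following Python sketch converts between M/Z and the unperturbed cyclotron frequency in SI units. The function names, the 7 tesla field, and the M/Z value of 1000 are illustrative assumptions and are not taken from this specification. The form f = eB/(2π(m/z)) is the standard SI statement of ω = B/(M/Z), with the elementary charge and the dalton written out explicitly.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs (exact SI value)
DALTON_KG = 1.66053906660e-27          # kilograms per dalton (CODATA 2018)


def cyclotron_frequency_hz(mz_da, field_tesla):
    """Unperturbed cyclotron frequency f = e*B / (2*pi*(m/z)) in hertz."""
    return ELEMENTARY_CHARGE_C * field_tesla / (2.0 * math.pi * mz_da * DALTON_KG)


def mz_from_frequency(freq_hz, field_tesla):
    """Invert the first-order model: m/z in daltons per elementary charge."""
    return ELEMENTARY_CHARGE_C * field_tesla / (2.0 * math.pi * freq_hz * DALTON_KG)


# Assumed example: an ion of m/z 1000 in an assumed 7 T magnet.
freq = cyclotron_frequency_hz(1000.0, 7.0)
print(f"cyclotron frequency: {freq:.1f} Hz")
print(f"recovered m/z: {mz_from_frequency(freq, 7.0):.6f}")
```

Under these assumptions the frequency falls in the neighborhood of 100 kHz. The sketch ignores the trapping-field and space-charge shifts discussed in the following paragraphs.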
[0003] As noted above, FTMS exploits the property that an ion of
mass M and charge Z placed in a magnetic field of strength B
undergoes orbital motion with angular frequency B/(M/Z). In a mass
spectrometer, ions must be trapped by an external electrostatic
field producing a slight shift in the cyclotron frequency given
above. Additional frequency shifts are produced by the
electrostatic field produced by the population of ions in the
instrument, known as the "space-charge effect" (Gorshov et al.,
Amer. Society Mass Spectrom. 4:855-868, 1991). Variations in the
frequency observed for a particular ion (with fixed M/Z) can be due
to fluctuations in the strength of the magnetic field, trapping
voltage, or the "space-charge" effect. Of these three factors, the
space-charge effect is believed to be the most difficult to control
and to model. Variations in the space-charge effect are significant
in liquid-chromatography mass spectrometry (LCMS), the standard
technique used in analysis of proteomic samples. These variations
are best corrected by active real-time calibration.
[0004] Efforts to extract accurate mass information from FTMS by
mass calibration have been previously investigated. See L. K. Zhang
et al., Mass Spectrometry Reviews, 24:286-309 (2005). Previous
methods of FTMS mass calibration include the use of "internal"
calibrants, and/or the use of "external" calibrants. In external,
or "off-line" calibration, a set of standard molecules of known
mass are measured by the instrument separately from the
experimental sample. The differences between the measured and true
masses are known with certainty, and the calibration parameters are
adjusted to minimize these differences. The primary limitation of
external calibration is that the calibration parameters do not
remain constant from one scan to the next, largely due to the space
charge effect. See E. B. Ledford, Jr. et al., Anal. Chem.,
56:2744-2748 (1984).
[0005] Internal or "on-line" calibration involves the infusion of
standard molecules of known mass into an experimental sample, or
directly into the mass spectrometer in parallel with the sample,
and measuring the mass of the standards and experimental sample in
the same scan. However, the signal from the calibrant molecules may
obscure a signal arising from the sample through "ion suppression".
Ion suppression occurs because the total ion capacity of an FTMS
instrument is generally fixed. Therefore, the calibrant molecules
are analyzed at the expense of analyte ions, reducing the measured
analyte signal.
[0006] A number of methods have attempted to perform calibration
without added calibrants in a process called "direct calibration".
One approach (described in M. Mann, Proceedings of the 43rd
ASMS Conference on Mass Spectrometry and Allied Topics, Atlanta,
1995) is based upon Mann's insight that peptide masses are confined
to clusters of values spaced roughly 1 Dalton (10-100 ppm) apart
throughout the spectrum (Wool et al., Proteomics, 2:1365-1373,
2002). While this method may be useful for low mass accuracy mass
spectrometers (e.g., MALDI-TOF), it is not suitable for use with
higher mass-accuracy systems such as FTMS. In these methods,
peptides are either matched to a distribution (not identified) or
only peptides that are known to be in the sample a priori are
identified.
[0007] Another direct calibration method uses the known mass
spacings between different charge states of the same molecule as
calibration constraints (Bruce et al., JASMS 11:416-421, 2000).
However, this method is unable to match the accuracy of FTMS
frequency measurements. Yanofsky et al. disclose a method for an
internal recalibration of an FTICR-MS analysis (Anal. Chem
77:7246-7254, 2005). However, this method is a limited approach
that uses the knowledge of a particular class of proteins, and
requires partial knowledge of the sample components. Direct
calibration methods have also been used to identify components in
wine (Cooper, H. J., and Marshall, A. G., J. Agric. Food Chem,
49:5710-5718), and petroleum products (Marshall A. G. et al., Acc.
Chem. Res. 37:53-59, 2004). These methods, however, also require a
priori knowledge of the masses of some of the species in the
sample.
[0008] There is a need in the art for improved calibration and
peptide identification techniques in connection with mass
spectrometry that obviate at least some of the aforementioned
limitations of currently available technology.
SUMMARY OF THE INVENTION
[0009] The invention disclosed herein relates to systems and methods useful for
producing calibrated mass spectrometry spectra using components of
a mass spectrometry sample as calibrants.
[0010] Embodiments of the present invention relate to methods of producing a
calibrated mass spectrum, comprising: providing a sample comprising
an elemental composition, subjecting the sample to mass
spectrometry whereby a mass spectrometry output is obtained,
providing input parameters, converting the mass spectrometry output
to mass values using the input parameters, estimating error and
elemental composition probabilities based on the mass values,
updating the input parameters based on the estimated error and
elemental composition probabilities, applying the updated input
parameters to the mass spectrometry output to produce updated mass
values, and repeating several of these steps until convergence is
reached, whereby a calibrated mass spectrum is produced.
[0011] Further embodiments of the present invention relate to
methods wherein the input parameters are selected from the group
consisting of a mass database, initial calibration parameters, an
initial error estimate, updated calibration parameters, an updated
error estimate, and combinations thereof.
[0012] Still further embodiments of the present invention relate to
methods wherein the mass spectrometry is Fourier transform mass
spectrometry.
[0013] Other embodiments of the present invention relate to methods
wherein the mass spectrometry output comprises cyclotron
frequencies, and wherein the elemental composition probabilities
are peptide probabilities.
[0014] Additional embodiments of the present invention relate to
methods wherein the sample is selected from the group consisting of
blood, plasma, serum, spinal fluid, urine, sweat, saliva, tears,
breast aspirate, prostate fluid, seminal fluid, vaginal fluid,
stool, cervical scraping, cytes, amniotic fluid, intraocular fluid,
mucous, moisture in breath, animal tissue, cell lysates, tumor
tissue, hair, skin, buccal scrapings, nails, bone marrow,
cartilage, prions, bone powder, ear wax, and combinations
thereof.
[0015] Alternative embodiments of the present invention relate to
methods wherein the elemental composition comprises at least one
peptide.
[0016] Other embodiments of the present invention relate to methods
wherein the sample is selected from the group consisting of
hydrocarbons, petroleum products, nucleotides, combinatorial
samples, polymeric samples, and combinations thereof.
[0017] Other embodiments of the present invention relate to methods
wherein the sample is a petroleum product.
[0018] Other embodiments of the present invention relate to methods
wherein the estimating the error and elemental composition
probabilities comprises using an Expectation-Maximization algorithm
and/or using a spline algorithm.
[0019] Embodiments of the present invention relate to mass
spectrometry calibration systems, comprising a mass spectrometry
device to analyze a sample and produce a mass spectrometry output,
and calibration software configured to receive input parameters,
convert the mass spectrometry output to mass values using the input
parameters, estimate error and elemental composition probabilities
based on the mass values, update input parameters based on the
estimated error and elemental composition probabilities, apply the
updated input parameters to the mass spectrometry output to produce
updated mass values, and repeat several of these steps until
convergence is reached, whereby a calibrated mass spectrum is
produced.
[0020] Further embodiments of the present invention relate to mass
spectrometry calibration systems wherein the input parameters are
selected from the group consisting of a mass database, initial
calibration parameters, an initial error estimate, updated
calibration parameters, an updated error estimate, and combinations
thereof.
[0021] Still further embodiments of the present invention relate to
mass spectrometry calibration systems wherein the mass spectrometry
device is a Fourier transform mass spectrometer.
[0022] Other embodiments of the present invention relate to mass
spectrometry calibration systems wherein the mass spectrometry
output comprises cyclotron frequencies, and wherein the elemental
composition probabilities are peptide probabilities.
[0023] Further embodiments of the present invention relate to mass
spectrometry calibration systems wherein the sample is selected
from the group consisting of blood, plasma, serum, spinal fluid,
urine, sweat, saliva, tears, breast aspirate, prostate fluid,
seminal fluid, vaginal fluid, stool, cervical scraping, cytes,
amniotic fluid, intraocular fluid, mucous, moisture in breath,
animal tissue, cell lysates, tumor tissue, hair, skin, buccal
scrapings, nails, bone marrow, cartilage, prions, bone powder, ear
wax, and combinations thereof.
[0024] Still further embodiments of the present invention relate to
mass spectrometry calibration systems wherein the sample comprises
at least one peptide.
[0025] Additional embodiments of the present invention relate to
mass spectrometry calibration systems wherein the sample is
selected from the group consisting of hydrocarbons, petroleum
products, nucleotides, combinatorial samples, polymeric samples,
and combinations thereof.
[0026] Other embodiments of the present invention relate to mass
spectrometry calibration systems wherein the sample is a petroleum
product.
[0027] Further embodiments of the present invention relate to mass
spectrometry calibration systems wherein the software is configured
to estimate the error and the elemental composition probabilities
using an Expectation-Maximization algorithm, and/or using a spline
algorithm.
[0028] Embodiments of the present invention also relate to a
computer-readable medium having computer-executable instructions
that when executed perform a method, the method comprising
converting a mass spectrometry output to mass values using input
parameters, estimating error and elemental composition
probabilities based on the mass values, updating the input
parameters based on the estimated error and elemental composition
probabilities, applying the updated input parameters to the mass
spectrometry output to produce updated mass values, and repeating
several of these steps until convergence is reached, whereby a
calibrated mass spectrum is produced.
[0029] Further embodiments of the present invention relate to
computer-readable media wherein the input parameters are selected
from the group consisting of a mass database, initial calibration
parameters, an initial error estimate, and combinations
thereof.
[0030] Still further embodiments of the present invention relate to
computer-readable media wherein the estimating the error and the
elemental composition probabilities uses an
Expectation-Maximization algorithm and/or a spline algorithm.
[0031] Other embodiments of the present invention relate to
computer-readable media wherein the mass spectrometry output is
produced by a Fourier transform mass spectrometer.
[0032] Additional embodiments of the present invention relate to
computer-readable media wherein the mass spectrometry output
comprises cyclotron frequencies.
[0033] Further embodiments of the present invention relate to
computer-readable media wherein the elemental composition
probabilities are peptide probabilities.
BRIEF DESCRIPTION OF THE FIGURES
[0034] FIG. 1 depicts a flow chart, illustrating a method of
simultaneous calibration of mass spectra and elemental composition
identification in accordance with an embodiment of the present
invention.
[0035] FIG. 2A shows a distribution of peptide masses in the human
proteome in accordance with an embodiment of the present
invention.
[0036] FIG. 2B is an inset of FIG. 2A in accordance with an
embodiment of the present invention. It shows nominal mass clusters
near 1,000 Da.
[0037] FIG. 2C is an inset of FIG. 2B in accordance with an
embodiment of the present invention. The panel shows five
individual peptide masses designated by the peak numbers A through
E.
[0038] FIG. 3A shows the estimation of frequencies from a mass
spectrum in accordance with an embodiment of the present
invention.
[0039] FIG. 3B shows a graph depicting the conversion of
frequencies to masses by estimating calibration parameters in
accordance with an embodiment of the present invention.
[0040] FIG. 4 shows a more detailed overview of the calibration
process in accordance with an embodiment of the present
invention.
[0041] FIG. 5 shows the results of a calibration test in accordance
with an embodiment of the present invention.
DESCRIPTION OF THE INVENTION
[0042] Unless defined otherwise, technical and scientific terms
used herein have the same meaning as commonly understood by one of
ordinary skill in the art to which this invention belongs. One
skilled in the art will recognize many methods and materials
similar or equivalent to those described herein, which could be
used in the practice of the present invention. Indeed, the present
invention is in no way limited to the methods and materials
described.
[0043] Embodiments of the present invention relate to systems and
methods for calibration and peptide identification in connection
with mass spectrometry; in particular, with FTMS. Furthermore, the
present invention exploits the natural relationship between peptide
identification and calibration to solve two related problems
simultaneously, and to iteratively improve the solutions for each.
Most conventional calibration methods require calibrant molecules
of known mass to be added to a sample. The present invention,
however, is based upon an iterative process of identifying
components in the sample and using these identified components as
calibrants.
[0044] While preferred embodiments of the inventive systems and
methods relate to peptide calibration, they may readily be applied
to other types of chemicals or compounds. As used herein, the
general term "elemental composition" includes all types of
compounds, including peptides, that may be analyzed using the
systems and methods disclosed herein.
[0045] Most calibration methods in current use require the addition
of calibrant molecules of known mass into a sample. In contrast,
the inventive direct calibration methods use the components of the
sample alone to provide dozens of calibrants covering the entire
mass spectrum. Direct calibration methods save time and materials,
simplify the experimental apparatus and protocol, perform
calibration in real time each time a spectrum is generated, and
avoid the obscuration of information that can result from ion
suppression, resulting in significant improvements in accuracy. The
higher mass accuracy of FTMS systems allows the identification of elemental
compositions from a large pool of candidates, for example, human
tryptic peptides or petroleum components. Increased calibration
accuracy results from the ability to use more species in the
calibration and the positive feedback between identification and
calibration.
[0046] FIG. 1 shows a general overview of the calibration system
(100). First, a sample may be analyzed by mass spectrometry to
produce a mass spectrometry output (101). For example, with FTMS,
the mass spectrometry output comprises cyclotron frequencies. The
mass spectrometry output, along with other initial input parameters
(102), such as a mass database (ENSEMBL, for example), calibration
parameters, and error estimates may be used to convert the mass
spectrometry output to mass values (103). The error as well as the
probabilities for the elemental compositions may then be estimated
(104), and the calibration parameters may be updated (105). The
updated calibration parameters may then be used to again convert
mass spectrometry output to mass values. Steps 103 through 105 may
be repeated any number of times until the data reach convergence. The
converged data, or converged calibration output, may then be stored
or displayed in any suitable computer-readable or printed format
(106). In certain embodiments of the invention, the output of the
mass spectrometry calibration system is a calibrated mass
spectrum.
[0047] In accordance with an embodiment of the present invention,
calibration may be performed in real-time using the information
contained in a sample without the addition of specific calibrants.
A sample comprising peptides, for example, a proteomic sample, may
be subjected to mass spectrometry, for example, FTMS, using
instruments and methods that are well known in the art. As shown in
FIGS. 2A through 2C, individual human tryptic peptide masses may be
resolved at around 1 ppm accuracy. Table 1 shows, for example, the
number of peptide mass values that may be analyzed. FIG. 2A shows
the entire distribution of mass values in the human proteome. FIG.
2B is an inset of the region of FIG. 2A (inset region designated by
the rectangular bar). This figure shows the nominal mass clusters
near 1000 Da. FIG. 2C is an inset of the region of FIG. 2B (inset
region designated by the rectangular bar). This figure shows five
individual peptide masses. The box below the graph designates the
mass for peaks A through E in the figure.
TABLE 1
human protein sequences (as provided by IPI, ENSEMBL) | 50,071
ideal tryptic peptides | 2,515,788
distinct sequences | 808,076
distinct masses | 356,933
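To make the role of mass accuracy concrete, the following Python sketch looks up the database masses that fall within a given parts-per-million window of an observed mass. The five database values and the observed mass are made-up numbers chosen only for illustration, and the function names are hypothetical. The point of the sketch is that a narrow window can single out one candidate where a wide window leaves several.

```python
from bisect import bisect_left, bisect_right


def ppm_error(observed, exact):
    """Signed relative error of an observed mass, in parts per million."""
    return (observed - exact) / exact * 1e6


def candidates_within(sorted_masses, observed, tol_ppm):
    """Database masses within +/- tol_ppm of the observed mass.

    The window half-width is taken relative to the observed mass,
    which is an assumption of this sketch.
    """
    half_width = observed * tol_ppm * 1e-6
    lo = bisect_left(sorted_masses, observed - half_width)
    hi = bisect_right(sorted_masses, observed + half_width)
    return sorted_masses[lo:hi]


# Made-up, sorted database masses (Da) near 1000 Da, for illustration only.
database = [999.4520, 999.4811, 999.5022, 999.5175, 999.5389]
observed = 999.5026

print(candidates_within(database, observed, 1.0))   # narrow 1 ppm window
print(candidates_within(database, observed, 50.0))  # wide 50 ppm window
print(round(ppm_error(observed, 999.5022), 2))      # error against the nearest entry
```

With these numbers the 1 ppm window retains a single candidate, while the 50 ppm window retains four.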
[0048] In FTMS, an ionized peptide's mass-to-charge ratio is
determined by estimating the frequency of its circular motion
induced by a centripetal magnetic force. The ion induces an image
charge, or transient voltage signal, on either of two parallel
detection plates as it passes. The observed frequency is calculated
from a peak in the Fourier transform of the transient voltage
between the plates.
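The frequency extraction step can be sketched in Python as follows. This is a toy example under stated assumptions: the transient is synthetic and noise-free, the two frequencies (100 Hz and 220 Hz) and the sampling rate are arbitrary values far below real cyclotron frequencies, and a direct O(n^2) discrete Fourier transform is used for self-containment where a production system would use an FFT. All function names are hypothetical.

```python
import cmath
import math


def synth_transient(freqs_hz, amps, sample_rate_hz, n_samples):
    """Sum of cosines standing in for the composite transient signal."""
    return [
        sum(a * math.cos(2.0 * math.pi * f * k / sample_rate_hz)
            for f, a in zip(freqs_hz, amps))
        for k in range(n_samples)
    ]


def dft_magnitudes(signal):
    """Magnitudes of the first n/2 DFT bins (direct evaluation, not an FFT)."""
    n = len(signal)
    mags = []
    for b in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * b * k / n)
                  for k, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def peak_frequencies(mags, sample_rate_hz, n_samples, threshold):
    """Frequencies of local maxima in the magnitude spectrum above a threshold."""
    peaks = []
    for b in range(1, len(mags) - 1):
        if mags[b] > threshold and mags[b] > mags[b - 1] and mags[b] >= mags[b + 1]:
            peaks.append(b * sample_rate_hz / n_samples)
    return peaks


sample_rate, n = 1024.0, 256  # bin spacing is 1024 / 256 = 4 Hz
signal = synth_transient([100.0, 220.0], [1.0, 0.5], sample_rate, n)
mags = dft_magnitudes(signal)
print(peak_frequencies(mags, sample_rate, n, threshold=0.25 * max(mags)))
```

Both toy frequencies were chosen to fall exactly on DFT bins, so no windowing or peak interpolation is needed here. Real transients would generally require both.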
[0049] The "observed" mass is derived in a two-step process: 1)
extraction of ion frequencies, and 2) conversion of frequencies to
mass by calibration. As shown in FIGS. 3A and 3B, calibration of the FT mass
spectrometer is the process by which each observed frequency (a
peak in a spectrum) is converted into a mass-to-charge value. In
FTMS, the measured quantity is frequency, and mass "measurements"
are derived from frequencies. Calibration may be thought of as an
optimization problem: given a family of calibration equations such
that there is a one-to-one correspondence with vectors of
real-valued parameters, choose an equation (or equivalently
parameter values) that minimizes a cost function. In this case, the
cost function is the estimated variance of the normalized
error.
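The optimization view of calibration can be illustrated with a minimal Python sketch. It assumes the simplest one-parameter family m = A/f, assumes that the exact mass behind each frequency is already known, takes "normalized error" to mean the relative error (m - M)/M, and treats that error as zero-mean so that its variance is estimated by the mean square. These are assumptions of the sketch, as are the numerical values and function names.

```python
def normalized_errors(freqs, exact_masses, a):
    """Relative errors (m - M) / M under the one-parameter model m = a / f."""
    return [(a / f - m) / m for f, m in zip(freqs, exact_masses)]


def cost(freqs, exact_masses, a):
    """Mean squared normalized error, used as the variance estimate."""
    errs = normalized_errors(freqs, exact_masses, a)
    return sum(e * e for e in errs) / len(errs)


def best_a(freqs, exact_masses):
    """Closed-form minimizer of sum((a*u - 1)^2) with u = 1 / (f * M)."""
    u = [1.0 / (f * m) for f, m in zip(freqs, exact_masses)]
    return sum(u) / sum(x * x for x in u)


A_TRUE = 1.0749e8                      # assumed calibration constant (Hz * Da)
masses = [500.2, 1000.5, 2000.0]       # assumed exact masses (Da)
freqs = [A_TRUE / m for m in masses]   # noise-free synthetic frequencies

a_off = A_TRUE * (1.0 + 5e-6)          # a calibration that is 5 ppm too high
a_fit = best_a(freqs, masses)
print(cost(freqs, masses, a_off) ** 0.5)  # RMS normalized error, about 5e-6
print(cost(freqs, masses, a_fit) ** 0.5)  # essentially zero after the fit
```

Each choice of A corresponds to one calibration equation, and the fit selects the one with the smallest cost.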
[0050] FIG. 4 shows the calibration process for FTMS in more
detail. Table 2 shows the definitions of the symbols used in FIG.
4. Box 401 comprises the input parameters. The input parameters
include M, which denotes a peptide mass database; A^(0) and B^(0),
the initial calibration parameters; f, the observed frequencies from
the mass spectrometer; and σ^(0), the initial error
estimate. A^(0), B^(0), and σ^(0) are only used
in the first iteration. The values A^(0) and B^(0) are used
to convert the observed frequencies to mass values (402). The value
σ^(0) is used to calculate initial peptide mass
distributions.
TABLE 2
Symbol | Definition
f = (f_1 ... f_n) | observed frequencies
M = (M_1 ... M_N) | peptide mass database
A^(k), B^(k) | calibration parameters
σ^(k) | error estimate
m^(k) = (m^(k)_1 ... m^(k)_n) | calibrated mass
p^(k) = [p_ij^(k)], i = 1 ... n, j = 1 ... N | probability matrix
p_ij | probability that frequency i came from mass M_j
[0051] The mass values are then subjected to an iterative process
wherein a mathematical algorithm, such as the
Expectation-Maximization (EM) algorithm, is applied, allowing for
the estimation of the error and of the probabilities that are
assigned to the mass values
(403). A comprehensive description of the EM algorithm is provided
in a publication by Dempster et al. (J. Royal Statistical Society
B, 39:1-38, 1977), which is incorporated herein by reference in its
entirety. The use of the EM algorithm for calibration is described
in the Examples. The revised error estimates allow for the
calculation of updated calibration parameters (404), A^(k) and
B^(k). These calibration parameters are then re-applied to the
observed frequencies to produce updated mass values. The processes
designated by boxes 402 through 404 are
repeated until the updated calibration parameters no longer change
from their values in the preceding iteration. This stage is
referred to as "convergence" (405).
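The loop of boxes 402 through 405 can be sketched in Python as a generic EM iteration. This is a minimal sketch under stated assumptions and not necessarily the formulation used in the Examples: the calibration equation is assumed to be m = A/f + B/f^2, the normalized error is assumed Gaussian with standard deviation σ, every database mass is given equal prior weight, and σ is kept above an assumed floor to avoid division by zero on noise-free data. The database, the "true" parameters, and all function names are hypothetical.

```python
import math

# Assumed "true" calibration, used only to synthesize toy frequencies.
A_TRUE, B_TRUE = 1.0749e8, -1.0e8
# Hypothetical mass database M (Da); every second entry is absent from the sample.
DATABASE = [500.20, 500.65, 800.41, 800.90, 1000.50,
            1000.95, 1500.75, 1501.30, 2000.00, 2000.80]
PRESENT = [500.20, 800.41, 1000.50, 1500.75, 2000.00]


def calibrate(freq, a, b):
    """Assumed calibration equation m = A/f + B/f^2 (box 402)."""
    return a / freq + b / freq ** 2


def frequency_for(mass):
    """Positive root of mass*f^2 - A*f - B = 0, the inverse of calibrate()."""
    return (A_TRUE + math.sqrt(A_TRUE ** 2 + 4.0 * mass * B_TRUE)) / (2.0 * mass)


def e_step(masses, database, sigma):
    """Box 403: p[i][j], probability that peak i came from database mass j."""
    p = []
    for m in masses:
        w = [math.exp(-0.5 * ((m - ref) / (ref * sigma)) ** 2) for ref in database]
        total = sum(w)
        if total == 0.0:  # every weight underflowed: fall back to uniform
            p.append([1.0 / len(database)] * len(database))
        else:
            p.append([x / total for x in w])
    return p


def m_step(freqs, database, p, sigma_floor):
    """Box 404: weighted least squares for A, B, then the error estimate sigma."""
    suu = suv = svv = su = sv = 0.0
    for f, row in zip(freqs, p):
        x = 1.0 / f
        for ref, pij in zip(database, row):
            u, v = x / ref, x * x / ref  # normalized residual is a*u + b*v - 1
            suu += pij * u * u
            suv += pij * u * v
            svv += pij * v * v
            su += pij * u
            sv += pij * v
    det = suu * svv - suv * suv
    a = (su * svv - sv * suv) / det
    b = (suu * sv - suv * su) / det
    sse = sum(pij * ((calibrate(f, a, b) - ref) / ref) ** 2
              for f, row in zip(freqs, p) for ref, pij in zip(database, row))
    return a, b, max(math.sqrt(sse / len(freqs)), sigma_floor)


def em_calibrate(freqs, database, a, b, sigma,
                 tol=1e-9, max_iter=50, sigma_floor=1e-7):
    """Repeat boxes 402-404 until A and B stop changing (box 405)."""
    for _ in range(max_iter):
        masses = [calibrate(f, a, b) for f in freqs]
        p = e_step(masses, database, sigma)
        a_new, b_new, sigma = m_step(freqs, database, p, sigma_floor)
        converged = (abs(a_new - a) <= tol * abs(a_new)
                     and abs(b_new - b) <= tol * abs(b_new))
        a, b = a_new, b_new
        if converged:
            break
    masses = [calibrate(f, a, b) for f in freqs]
    p = e_step(masses, database, sigma)
    best = [database[max(range(len(database)), key=row.__getitem__)] for row in p]
    return a, b, sigma, masses, best


freqs = [frequency_for(m) for m in PRESENT]
# Start 5 ppm high in A, with no second-order term and a 20 ppm error estimate.
a, b, sigma, masses, best = em_calibrate(freqs, DATABASE, A_TRUE * (1 + 5e-6), 0.0, 20e-6)
worst_ppm = max(abs(m - t) / t * 1e6 for m, t in zip(masses, PRESENT))
print(best)
print(worst_ppm)
```

The toy database is deliberately sparse, with neighboring entries hundreds of ppm apart, so the initial calibration error of roughly 10 to 20 ppm cannot cause misassignment. A realistic peptide database is far denser, and a practical implementation would restrict each peak to nearby candidates instead of scoring every database entry.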
[0052] In general, the frequency is inserted into a calibration
equation to obtain the mass-to-charge ratio of the ionized peptide.
The calibration equation has a set of parameters whose values are
taken to be fixed in the initial step of the calculation.
Subsequently, the calibration parameters are tuned to minimize the
estimated normalized error.
[0053] The second step is to estimate the charge on the peptide by
examining the positions of adjacent peaks that are presumed to be
species with identical elemental composition and charge, differing
only in isotopic composition. Since these mass differences between
isotopes are approximately one atomic mass unit, a peptide with
charge z would produce a set of uniformly spaced peaks separated
by 1/z units in mass-to-charge.
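By way of illustration only, the charge estimate described above may be sketched as follows. The function name estimate_charge and the parameter max_charge are assumptions introduced for this sketch and are not part of the disclosure. The isotopic spacing is taken to be exactly one mass unit, which is the approximation stated above.

```python
def estimate_charge(mz_peaks, max_charge=6):
    """Estimate the charge z from adjacent isotopic peaks spaced ~1/z apart.

    mz_peaks: mass-to-charge positions of peaks presumed to differ only in
    isotopic composition. max_charge is a hypothetical upper bound.
    """
    peaks = sorted(mz_peaks)
    if len(peaks) < 2:
        raise ValueError("need at least two isotopic peaks")
    # Mean spacing between neighboring peaks in mass-to-charge units.
    spacings = [hi - lo for lo, hi in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    # Spacing is approximately 1/z, so z is approximately 1/spacing.
    z = round(1.0 / mean_spacing)
    return max(1, min(max_charge, z))
```

For example, peaks at 500.00, 500.50 and 501.00 are spaced 0.5 units apart, which corresponds to a charge of 2.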
[0054] To first order, the mass-to-charge ratio is linearly
proportional to the period of the ion's revolution; the constant of
proportionality is the magnitude of the magnetic field. The very
high accuracy of the FTMS, however, exposes systematic errors in
the simple first-order model. Higher-order effects depend upon the
geometry of the analytic chamber and the "space-charge
effect"--interactions between multiple ionic species present within
the chamber. A term that depends upon the square of the period is
commonly used to account for these effects. A review by Zhang et
al. describes some of the development of these models (Mass
Spectrometry Reviews 24:286-309, 2005).
[0055] For example, a collection of peptide mass measurements and a
database of exact peptide mass values may be provided. There are
several databases comprising exact peptide mass values that are
known in the art. For example, the ENSEMBL database (Hubbard T. et
al., Nucleic Acids Res 33:D447-D453, 2005) and the European
Bioinformatics Institute (EBI) both provide comprehensive lists of
peptides and peptide masses. Alternatively, the calculated masses
of an "in silico" tryptic digest of a proteome, for example, the
human proteome, may be used as a peptide mass database. For
elemental compositions other than peptides, such as petroleum
products, polymers, or combinatorial libraries, alternative mass
databases may be used that are apparent to those of skill in the
art.
[0056] The calibration process proceeds iteratively. At each step,
the calibration parameters are updated to minimize the variance of
the normalized error using the current estimate of the probability
mass distribution for the exact mass identity (elemental
composition, e.g., peptide). The updated calibration parameters
change the mass values that are computed from the observed
frequencies. These new values will result in a new (initial)
estimate for the normalized error variance. This initial estimate
will be refined by the EM algorithm, resulting in an updated
estimate of the normalized error variance and a new set of
probability mass distributions for the exact mass identity of each
measurement. This procedure of iterating calibration steps and
applications of the EM algorithm to update the exact mass
probabilities is repeated to convergence. "Convergence," as used
herein, is reached when successive iterations result in
essentially the same values of the calibration parameters A and B.
An example of this process is shown in Example 4.
[0057] The calibration system disclosed herein may be used with a
number of different mass spectrometry systems and configurations
that are known in the art. While an embodiment involves the use of
the calibration system with FTMS, it may also be used with other
types of mass spectrometry such as time-of-flight (TOF) mass
spectrometry, given that the mass accuracy is sufficient.
[0058] The calibration system disclosed herein may be used on a
variety of different sample types. In a preferred embodiment, the
calibration system is used with samples comprising peptides in a
biological sample. For example, a proteomic sample may be analyzed.
A wide array of biological samples may be obtained and used in
conjunction with alternate embodiments of the system (e.g., a body
fluid, such as blood, plasma, serum, CSF (spinal fluid), urine,
sweat, saliva, tears, breast aspirate, prostate fluid, seminal
fluid, vaginal fluid, stool, cervical scraping, cytes, amniotic
fluid, intraocular fluid, mucous, moisture in breath, animal
tissue, cell lysates, tumor tissue, hair, skin, buccal scrapings,
nails, bone marrow, cartilage, prions, bone powder, ear wax, etc.).
In addition, non-mammalian biological samples may be analyzed using
the systems and methods disclosed herein. For example, samples of
elemental compositions obtained from plants, bacteria, fungi, soil,
and water may be analyzed.
[0059] In addition to biological samples comprising peptides, the
calibration systems and methods disclosed herein may be used to
analyze any number of different types of samples that will be
readily apparent to those of skill in the art. Other examples of
chemical compounds or elemental compositions that may be analyzed
in this manner include but are by no means limited to
polynucleotides, hydrocarbon or petroleum products, combinatorial
libraries, and polymeric samples. Further, the calibration system
may also be used to analyze the compounds or elemental compositions
present in liquids such as wine or other beverages. The calibration
method requires that most components belong to a finite, but large
set of possible elemental compositions. The size of this set can be
as large as 10.sup.5-10.sup.6, and is limited only by the accuracy
of the MS instrument.
[0060] For peptide applications of the calibration system, samples
may be prepared using any suitable method. Many such methods are
known in the art. For example, a proteomic sample may be digested
with a protease such as trypsin to produce smaller peptides. Prior
to introduction into the mass spectrometer, the peptides may be
fractionated by a variety of methods, including chromatographic
methods such as reverse-phase, size exclusion, or ion exchange
chromatography, or by electrophoretic methods such as SDS-PAGE.
[0061] The mass spectrometry calibration system disclosed herein
generally comprises "calibration software" that facilitates the
mathematical calculations necessary for calibration. The
calibration software may be stored as machine readable code on a
computer that may be in communication with the mass spectrometry
system. Alternatively, the calibration system may be applied to the
output of a mass spectrometer separately from the mass spectrometry
system. The software may be stored on any suitable computational
device. For example, the software as well as the means for its
execution may be integrated with the mass spectrometry instrument,
or housed separately on a computer or any type of suitable
electronic storage device. Examples include but are by no means
limited to hard disks or drives, CD-ROMs, DVDs, and removable
storage devices such as USB drives and flash drives. Nearly any
hardware, firmware, software, operating system, database platform,
networking technique or other conventional computer tool can be
configured to operate in connection with the system and methods of
the present invention, as will be appreciated by those of skill in
the art.
[0062] In an alternative embodiment of the invention, an algorithm
is utilized that finds a spline curve (continuous in first
derivative) that minimizes the weighted squared distance to
identified masses. The use of a spline in a high-order, locally
deformable calibration model to fit a large number of calibrants is
believed to be one of the novel features of the instant invention.
The spline may be constructed from segments of the form
M/Z=A/f+B/f.sup.2+C. The weight associated with each calibrant
point reflects the probability that a given mass has been
identified correctly. Each spline segment may contain at least N
points (e.g., N=10, N=20, etc.) to prevent overfitting. Indeed,
generally speaking, the estimation of calibration (spline)
parameters is the solution to a constrained optimization problem.
The solution is the point where the vector normal to the constraint
space (sets of parameters which are valid splines--i.e., smooth
curves) is parallel to the gradient of the objective function
(i.e., the sum of squared differences between observed and
calculated mass values). Example 6 demonstrates how a spline
algorithm may be used in the calibration process.
Example 1
Assessment of a Peptide's Exact Mass from a Mass Measurement with
Known Error
[0063] In this Example, the mass of a peptide is measured, and the
measured mass is denoted as .beta.. To make an inference about the
true mass of the peptide from the measured value, a quantitative
model of the measurement process is needed. The measurement of a
peptide with mass .alpha. can be modeled as the sum of the true
mass .alpha. plus an error term, e.
[0064] The error term, denoted by "e", is a normally distributed
random variable with mean zero and variance
.sigma..sup.2. The conditional probability density,
p(.beta.|.alpha.), evaluated at .beta. is given below.
$$p(\beta \mid \alpha) = (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{(\beta-\alpha)^2}{2\sigma^2}\right) \tag{1}$$
[0065] For the purposes of this example, a database of all possible
exact mass values may be provided, and the set of these values may
be denoted by {.alpha..sub.1, .alpha..sub.2, . . . , .alpha..sub.r}. Peptide
exact mass assessment involves assigning probabilities to the
possible mass values, p(.alpha..sub.j|.beta.), j in [1 . .
. r], given the measured value .beta.. These probabilities may be
computed in terms of our measurement model and Bayes' Law.
$$p(\alpha_j \mid \beta) = \frac{p(\alpha_j)\,p(\beta \mid \alpha_j)}{\sum_{k=1}^{r} p(\alpha_k)\,p(\beta \mid \alpha_k)} \tag{2}$$
[0066] The factor p(.alpha..sub.j) in the above equation
denotes the a priori (before measurement) probability that the
peptide has mass .alpha..sub.j. If there is no a priori information
about the peptide mass values, p(.alpha..sub.j)=1/r, for all j in
[1 . . . r]. Alternatively, it is possible to assign theoretical a
priori probabilities to peptide elemental compositions.
[0067] Although the above equation assigns non-zero probability to
all possible mass values, the probability assigned to values
differing from .beta. by more than 5.sigma. is quite small and can
be neglected. In some cases, only one exact mass value will have
significant probability.
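As a non-limiting illustration, Equations 1 and 2 may be evaluated as in the following sketch. The function names gaussian_density and posterior are assumptions made for this sketch, and a uniform prior of 1/r is used when no prior is supplied.

```python
import math


def gaussian_density(beta, alpha, sigma):
    """Equation 1: density of measuring beta when the true mass is alpha."""
    var = sigma ** 2
    return math.exp(-((beta - alpha) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior(beta, candidates, sigma, priors=None):
    """Equation 2: probability of each candidate exact mass given beta."""
    r = len(candidates)
    if priors is None:
        priors = [1.0 / r] * r  # no a priori information
    weights = [p * gaussian_density(beta, a, sigma) for p, a in zip(priors, candidates)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("measurement is too far from every candidate mass")
    return [w / total for w in weights]
```

With candidates 999.990, 1000.000 and 1000.010, a measured value of 1000.002 and an error of 0.002, nearly all of the probability falls on 1000.000, since the other candidates lie four or more standard deviations away.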
Example 2
Estimation of Mass Measurement Error Variance from Measurements of
Known Peptides
[0068] A related calculation is the estimation of the variance of
the mass measurement error e from a collection of measurements of
peptides of known masses. For example, in this case, one may have q
peptides with masses .alpha..sub.m(1), .alpha..sub.m(2), . . .
.alpha..sub.m(q) respectively. Each peptide in sequence may be
measured resulting in measured values .beta..sub.1, .beta..sub.2, .
. . , .beta..sub.q respectively. That is, for each i from 1 to q,
.beta..sub.i is the measured value of the ith peptide, whose true
mass is .alpha..sub.m(i).
[0069] If the measurement errors are known to be independent
and identically distributed normal random variables with mean zero,
the maximum likelihood estimate of the variance of the error may be
computed. Let .sigma..sup.2 denote the (unknown) variance of the
error. The probability density for the measured value of a peptide
with mass .alpha..sub.m(i), evaluated at the value .beta..sub.i is
given by Equation 1.
[0070] Let q-component vectors .alpha. and .beta. denote the
ordered collections of true and measured masses respectively. Then
the probability density for the entire set of measured values,
evaluated at .beta., is given by Equation 3.
$$p(\beta \mid \alpha, \sigma^2) = (2\pi\sigma^2)^{-q/2} \prod_{i=1}^{q} \exp\left(-\frac{(\beta_i-\alpha_{m(i)})^2}{2\sigma^2}\right) = (2\pi\sigma^2)^{-q/2} \exp\left(-\frac{\lVert\beta-\alpha\rVert^2}{2\sigma^2}\right) \tag{3}$$
[0071] where .parallel..beta.-.alpha..parallel..sup.2 denotes the
squared Euclidean distance between .beta. and .alpha., that is, the
sum of the squared component differences.
[0072] Let {circumflex over (.sigma.)}.sup.2 denote the
maximum-likelihood estimate of the error variance, the value of
.sigma..sup.2 that maximizes the right-hand side of Equation 3. It
is equivalent and more convenient, to maximize the logarithm of
this quantity. First, the first-derivative is evaluated with
respect to .sigma..sup.2.
$$\frac{\partial}{\partial\sigma^2} \log p(\beta \mid \alpha, \sigma^2) = \frac{\partial}{\partial\sigma^2}\left(-\frac{q}{2}\log(2\pi\sigma^2) - \frac{\lVert\beta-\alpha\rVert^2}{2\sigma^2}\right) = -\frac{q}{2\sigma^2} + \frac{\lVert\beta-\alpha\rVert^2}{2(\sigma^2)^2} \tag{4}$$
[0073] The log-likelihood has zero first-derivative at {circumflex
over (.sigma.)}.sup.2, and its value is determined as shown in
Equation 5.
$$\left.\frac{\partial}{\partial\sigma^2} \log p(\beta \mid \alpha, \sigma^2)\right|_{\sigma^2=\hat{\sigma}^2} = 0 \;\Rightarrow\; \hat{\sigma}^2 = \frac{\lVert\beta-\alpha\rVert^2}{q} = \frac{1}{q}\sum_{i=1}^{q}(\beta_i-\alpha_{m(i)})^2 \tag{5}$$
[0074] The maximum-likelihood estimate of the variance is simply
the mean of the squared difference between measured and true
values.
[0075] In mass spectrometry, the average magnitude of the error,
for repeated measurements of the same peptide, is linearly
proportional to the mass of the measured peptide. Furthermore, the
measurement accuracy of a mass spectrometer is characterized by the
average magnitude of the error expressed in parts per million (ppm)
of the measured mass. For example, a peptide of mass .alpha. is
measured and the resulting measurement error is e. That is, the
measured value is .alpha.+e. Let e' denote the normalized
measurement error (expressed in ppm) defined by Equation 6.
$$e' = 10^{6}\,\frac{e}{\alpha} \tag{6}$$
[0076] Let (.sigma.').sup.2 denote the variance of the normalized
error. Let ({circumflex over (.sigma.)}').sup.2 denote the
maximum-likelihood estimate of this quantity. The estimation of the
normalized error variance is similar to that of the unnormalized
error variance and given by Equation 7.
$$(\hat{\sigma}')^2 = \frac{1}{q}\sum_{i=1}^{q}\left(\frac{\beta_i-\alpha_{m(i)}}{10^{-6}\,\alpha_{m(i)}}\right)^2 \tag{7}$$
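The estimate of Equation 7 may be computed, for example, as in the following sketch. The function name normalized_error_variance is an assumption introduced here. The inputs are paired lists of measured values and known true masses.

```python
def normalized_error_variance(measured, true_masses):
    """Equation 7: mean squared error in ppm^2 for peptides of known mass."""
    if len(measured) != len(true_masses) or not measured:
        raise ValueError("need equally sized, non-empty lists")
    q = len(measured)
    # Each deviation is divided by 1e-6 times the true mass to express it in ppm.
    return sum(((b - a) / (1e-6 * a)) ** 2 for b, a in zip(measured, true_masses)) / q
```

For instance, errors of +1 ppm and -2 ppm on two peptides give a normalized error variance of (1 + 4)/2 = 2.5 ppm squared.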
Example 3
Estimation of Measurement Error from Measurements of Unidentified
Peptides
[0077] In the previous two examples, it was demonstrated 1) how to
assess a peptide's exact mass from a mass measurement when the
measurement error is known and 2) how to estimate the measurement
error from a collection of known peptides. In this Example, the
maximum likelihood estimate of the normalized measurement error
variance from measurements of unidentified peptides will be
derived. This solution will be interpreted in terms of the
solutions of the problems in Examples 1 and 2.
[0078] In this Example, one has a database of all possible exact
mass values denoted by .alpha.=(.alpha..sub.1, .alpha..sub.2, . . . ,
.alpha..sub.r) and a collection of mutually independently measured
peptide masses .beta.=(.beta..sub.1, .beta..sub.2, . . . , .beta..sub.q).
There exists a mapping m: [1 . . . q] -> [1 . . . r] such that for
each i in [1 . . . q] measured value .beta..sub.i resulted from
measuring a peptide with mass .alpha..sub.m(i). If this mapping
were known, it would be possible to estimate the normalized error
variance directly as described in Example 2. In this sense, the
quantities {.alpha., .beta., m} form a complete data set. Let
({circumflex over (.sigma.)}').sup.2|.alpha.,.beta.,m denote the
estimate of (.sigma.').sup.2 given .alpha., .beta., and m. Because
the mapping is not known, m must instead be inferred (or better,
averaged over its possible realizations) to estimate
(.sigma.').sup.2 for the incomplete data set {.alpha., .beta.}.
[0079] One possible method for constructing this estimate would be
to start with an initial (incorrect) estimate of
(.sigma.').sup.2. Let .left brkt-bot.({circumflex over
(.sigma.)}').sup.2.right brkt-bot..sub.0 denote this initial
estimate. Then, assuming that the error parameter is actually .left
brkt-bot.({circumflex over (.sigma.)}').sup.2.right
brkt-bot..sub.0, for each measurement .beta..sub.i, calculate the
probability that the exact mass value is .alpha..sub.j. These probabilities
p(.alpha..sub.j|.beta..sub.i,.left brkt-bot.({circumflex over
(.sigma.)}').sup.2.right brkt-bot..sub.0) are computed by substituting
.beta..sub.i for .beta. in Equation 2 and
(10.sup.-6.alpha..sub.j).sup.2[({circumflex over
(.sigma.)}').sup.2].sub.0 for .sigma..sup.2 in Equation 1.
[0080] Then, the updated estimate of the measurement error is the
weighted average of the squared normalized deviation over each pair
of measurement and possible exact
mass value (.beta..sub.i, .alpha..sub.j). The weights are the
probabilities p(.alpha..sub.j|.beta..sub.i,.left
brkt-bot.({circumflex over (.sigma.)}').sup.2.right brkt-bot..sub.0)
computed above. In general, if .left brkt-bot.({circumflex over
(.sigma.)}').sup.2.right brkt-bot..sub.n
denotes the estimated variance after n iterations, the subsequent
estimate .left brkt-bot.({circumflex over (.sigma.)}').sup.2.right
brkt-bot..sub.n+1 is given by
Equation 8.
$$\left[(\hat{\sigma}')^2\right]_{n+1} = \frac{1}{q}\sum_{i=1}^{q}\sum_{j=1}^{r}\left(\frac{\beta_i-\alpha_j}{10^{-6}\,\alpha_j}\right)^2 p\left(\alpha_j \mid \beta_i, \left[(\hat{\sigma}')^2\right]_n\right) \tag{8}$$
[0081] Like Equation 7, Equation 8 is the average of the observed
deviations between the measured and exact mass. In Equation 8, each
possible exact mass value is weighted by its conditional
probability given the measured value .beta..sub.i and the previous
estimate of the normalized error variance, .left
brkt-bot.({circumflex over (.sigma.)}').sup.2.right
brkt-bot..sub.n. These probabilities are computed as shown in
Equation 2. Equation 8 reduces to Equation 7 if
p(.alpha..sub.j|.beta..sub.i,.left brkt-bot.({circumflex over
(.sigma.)}').sup.2.right brkt-bot..sub.n) is set equal to
.delta..sub.j,m(i), i.e., with probability one, the exact mass
corresponding to measurement .beta..sub.i is .alpha..sub.m(i).
[0082] The formal derivation of Equation 8 using the EM algorithm
is given in Example 5.
[0083] Starting from an initial estimate of the normalized error
variance (e.g. .left brkt-bot.({circumflex over
(.sigma.)}').sup.2.right brkt-bot..sub.0=1), Equation 8 is
recalculated repeatedly until the estimate converges. This process
is guaranteed to converge to the maximum likelihood estimate of the
normalized error variance, as it is a realization of the
generalized Expectation-Maximization (EM) algorithm.
[0084] Each step of the EM algorithm averages over all possible
"completions" of the data, in this case, all possible peptide
identifications. As the algorithm converges to a stable estimate of
the error, it also produces increasingly accurate probabilistic
peptide identifications.
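The iteration of Equation 8 may be sketched as follows, purely by way of example. The names posterior_row and em_variance, the uniform prior, the stopping tolerance tol and the iteration cap max_iter are assumptions made for this sketch. Posterior weights are computed in log space so that candidates far from a measurement underflow to zero rather than causing an error.

```python
import math


def posterior_row(beta, candidates, var_ppm):
    """Equation 2 with a uniform prior and variance (1e-6*alpha)^2 * var_ppm."""
    logw = [-0.5 * ((beta - a) / (1e-6 * a)) ** 2 / var_ppm - math.log(a)
            for a in candidates]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]


def em_variance(measured, candidates, var0=1.0, tol=1e-9, max_iter=500):
    """Repeat Equation 8 until the normalized error variance stops changing."""
    var = var0
    for _ in range(max_iter):
        acc = 0.0
        for b in measured:
            row = posterior_row(b, candidates, var)
            acc += sum(pj * ((b - a) / (1e-6 * a)) ** 2
                       for pj, a in zip(row, candidates))
        new_var = acc / len(measured)
        if new_var <= 0.0:
            return 0.0  # every measurement coincides with a candidate mass
        if abs(new_var - var) <= tol * max(var, new_var):
            return new_var
        var = new_var
    return var
```

When the candidate masses are widely separated relative to the error, each measurement is assigned almost entirely to its nearest candidate and the result approaches the value that Equation 7 would give for the correct identifications.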
Example 4
Calibration of Fourier-Transform Mass Spectra
A Two-Parameter Calibration from a Spectrum of Unknown Peptides
[0085] A set of frequencies (f.sup.obs.sub.1, f.sup.obs.sub.2, . .
. f.sup.obs.sub.q) corresponding to the cyclotron motion of the
monoisotopic species of a peptide may be extracted from the
spectrum. It is also assumed that the charges of the peptides can
be determined unambiguously from the sequence of frequencies
of isotopically related species. Let (z.sub.1, z.sub.2, . . .
z.sub.q) denote the corresponding charges.
[0086] Let A and B denote undetermined calibration parameters in
the following functional form relating observed frequencies to
mass-over-charge ratio:
$$\left(\frac{m}{z}\right)^{\mathrm{obs}} = A\,\frac{1}{f^{\mathrm{obs}}} + B\,\frac{1}{(f^{\mathrm{obs}})^2}$$
[0087] Solving for the mass, the related equation below is
obtained:
$$m^{\mathrm{obs}} = z\left(A\,\frac{1}{f^{\mathrm{obs}}} + B\,\frac{1}{(f^{\mathrm{obs}})^2}\right)$$
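As a minimal sketch of the relation above, the observed mass may be computed from an observed frequency, a charge and the two calibration parameters as follows. The function name mass_from_frequency is an assumption introduced for illustration.

```python
def mass_from_frequency(f_obs, z, A, B):
    """Observed mass m = z * (A / f + B / f^2) for frequency f and charge z."""
    if f_obs <= 0:
        raise ValueError("frequency must be positive")
    return z * (A / f_obs + B / f_obs ** 2)
```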
[0088] The calibration problem involves finding values A* and B*
that minimize the estimated average squared (normalized) difference
between the true value of the mass and the value calculated from
the observed frequency, the charge, and the calibration parameters
as in the above equation.
[0089] It will be shown that the values of A* and B* may be
determined by solving two linear equations in two unknowns.
[0090] It is assumed that the possible exact mass values are given
by {.alpha..sub.1, .alpha..sub.2, . . . , .alpha..sub.r}. The
expected squared error is given in Equation 8
where .beta..sub.i is replaced by m.sub.i.sup.obs. In addition, the
probabilities assigned to the exact mass values will be taken as
fixed. As a shorthand notation, let p.sub.ij represent the quantity
p(.alpha..sub.j|m.sub.i.sup.obs,({circumflex over
(.sigma.)}').sup.2).
[0091] Equation 8 is re-written in this new notation.
$$\hat{\sigma}^2 = \frac{1}{q}\sum_{i=1}^{q}\sum_{j=1}^{r}\left(\frac{m_i^{\mathrm{obs}}-\alpha_j}{10^{-6}\,\alpha_j}\right)^2 p_{ij}$$
[0092] Then, m.sub.i.sup.obs is replaced with the calibration
formula.
$$\hat{\sigma}^2 = \frac{1}{q}\sum_{i=1}^{q}\sum_{j=1}^{r}\left(\frac{z_i\left(A\,\frac{1}{f_i^{\mathrm{obs}}} + B\,\frac{1}{(f_i^{\mathrm{obs}})^2}\right)-\alpha_j}{10^{-6}\,\alpha_j}\right)^2 p_{ij}$$
[0093] Now both sides are differentiated with respect to each
calibration parameter.
$$\frac{\partial(\hat{\sigma}^2)}{\partial A} = \frac{2}{q}\sum_{i=1}^{q}\sum_{j=1}^{r}\left(\frac{z_i\left(A\,\frac{1}{f_i^{\mathrm{obs}}} + B\,\frac{1}{(f_i^{\mathrm{obs}})^2}\right)-\alpha_j}{10^{-6}\,\alpha_j}\right)\left(\frac{z_i}{f_i^{\mathrm{obs}}\,10^{-6}\,\alpha_j}\right) p_{ij}$$
$$\frac{\partial(\hat{\sigma}^2)}{\partial B} = \frac{2}{q}\sum_{i=1}^{q}\sum_{j=1}^{r}\left(\frac{z_i\left(A\,\frac{1}{f_i^{\mathrm{obs}}} + B\,\frac{1}{(f_i^{\mathrm{obs}})^2}\right)-\alpha_j}{10^{-6}\,\alpha_j}\right)\left(\frac{z_i}{(f_i^{\mathrm{obs}})^2\,10^{-6}\,\alpha_j}\right) p_{ij}$$
[0094] When the above derivatives are evaluated at (A*,B*), each is
equal to zero, since (A*,B*) minimizes {circumflex over
(.sigma.)}.sup.2.
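$$\frac{A^*}{q}\sum_{i=1}^{q}\sum_{j=1}^{r}\left(\frac{z_i^2}{(f_i^{\mathrm{obs}})^2}\right)\left(\frac{1}{(10^{-6}\alpha_j)^2}\right)p_{ij} + \frac{B^*}{q}\sum_{i=1}^{q}\sum_{j=1}^{r}\left(\frac{z_i^2}{(f_i^{\mathrm{obs}})^3}\right)\left(\frac{1}{(10^{-6}\alpha_j)^2}\right)p_{ij} = \frac{1}{q}\sum_{i=1}^{q}\sum_{j=1}^{r}\left(\frac{\alpha_j}{(10^{-6}\alpha_j)^2}\right)\left(\frac{z_i}{f_i^{\mathrm{obs}}}\right)p_{ij}$$
$$\frac{A^*}{q}\sum_{i=1}^{q}\sum_{j=1}^{r}\left(\frac{z_i^2}{(f_i^{\mathrm{obs}})^3}\right)\left(\frac{1}{(10^{-6}\alpha_j)^2}\right)p_{ij} + \frac{B^*}{q}\sum_{i=1}^{q}\sum_{j=1}^{r}\left(\frac{z_i^2}{(f_i^{\mathrm{obs}})^4}\right)\left(\frac{1}{(10^{-6}\alpha_j)^2}\right)p_{ij} = \frac{1}{q}\sum_{i=1}^{q}\sum_{j=1}^{r}\left(\frac{\alpha_j}{(10^{-6}\alpha_j)^2}\right)\left(\frac{z_i}{(f_i^{\mathrm{obs}})^2}\right)p_{ij}$$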
[0095] The two equations above are re-written as a single matrix
equation.
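$$\begin{bmatrix} \sum_{i=1}^{q}\frac{z_i^2}{(f_i^{\mathrm{obs}})^2}\sum_{j=1}^{r}\frac{p_{ij}}{\alpha_j^2} & \sum_{i=1}^{q}\frac{z_i^2}{(f_i^{\mathrm{obs}})^3}\sum_{j=1}^{r}\frac{p_{ij}}{\alpha_j^2} \\ \sum_{i=1}^{q}\frac{z_i^2}{(f_i^{\mathrm{obs}})^3}\sum_{j=1}^{r}\frac{p_{ij}}{\alpha_j^2} & \sum_{i=1}^{q}\frac{z_i^2}{(f_i^{\mathrm{obs}})^4}\sum_{j=1}^{r}\frac{p_{ij}}{\alpha_j^2} \end{bmatrix} \begin{bmatrix} A^* \\ B^* \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{q}\frac{z_i}{f_i^{\mathrm{obs}}}\sum_{j=1}^{r}\frac{p_{ij}}{\alpha_j} \\ \sum_{i=1}^{q}\frac{z_i}{(f_i^{\mathrm{obs}})^2}\sum_{j=1}^{r}\frac{p_{ij}}{\alpha_j} \end{bmatrix}$$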
[0096] Finally, the optimal values of the calibration parameters
may be solved.
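$$\begin{bmatrix} A^* \\ B^* \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{q}\frac{z_i^2}{(f_i^{\mathrm{obs}})^2}\sum_{j=1}^{r}\frac{p_{ij}}{\alpha_j^2} & \sum_{i=1}^{q}\frac{z_i^2}{(f_i^{\mathrm{obs}})^3}\sum_{j=1}^{r}\frac{p_{ij}}{\alpha_j^2} \\ \sum_{i=1}^{q}\frac{z_i^2}{(f_i^{\mathrm{obs}})^3}\sum_{j=1}^{r}\frac{p_{ij}}{\alpha_j^2} & \sum_{i=1}^{q}\frac{z_i^2}{(f_i^{\mathrm{obs}})^4}\sum_{j=1}^{r}\frac{p_{ij}}{\alpha_j^2} \end{bmatrix}^{-1} \begin{bmatrix} \sum_{i=1}^{q}\frac{z_i}{f_i^{\mathrm{obs}}}\sum_{j=1}^{r}\frac{p_{ij}}{\alpha_j} \\ \sum_{i=1}^{q}\frac{z_i}{(f_i^{\mathrm{obs}})^2}\sum_{j=1}^{r}\frac{p_{ij}}{\alpha_j} \end{bmatrix}$$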
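The two-by-two system above may be solved, for example, as in the following sketch, which uses Cramer's rule rather than an explicit matrix inverse (an implementation choice made here, not a requirement of the method). The name update_calibration and the layout of p as a list of rows, with p[i][j] the probability that frequency i came from candidate mass j, are assumptions of this sketch.

```python
def update_calibration(freqs, charges, candidates, p):
    """Solve the 2x2 linear system for the calibration parameters (A*, B*)."""
    s22 = s23 = s24 = r1 = r2 = 0.0
    for f, z, row in zip(freqs, charges, p):
        w2 = sum(pij / a ** 2 for pij, a in zip(row, candidates))  # sum_j p_ij / alpha_j^2
        w1 = sum(pij / a for pij, a in zip(row, candidates))       # sum_j p_ij / alpha_j
        s22 += z * z / f ** 2 * w2
        s23 += z * z / f ** 3 * w2
        s24 += z * z / f ** 4 * w2
        r1 += z / f * w1
        r2 += z / f ** 2 * w1
    det = s22 * s24 - s23 * s23
    if det == 0.0:
        raise ValueError("singular system: need at least two distinct frequencies")
    A = (r1 * s24 - r2 * s23) / det
    B = (s22 * r2 - s23 * r1) / det
    return A, B
```

When every measurement is assigned with probability one to its correct exact mass and the frequencies follow the calibration relation exactly, this solution reproduces the parameters that generated the frequencies.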
[0097] After the new values A* and B* have been used to recalculate
the observed masses, m.sub.i.sup.obs, the error estimate may be
reduced. As a result, the probabilities assigned to the exact
masses for each measurement p.sub.ij shift so that more weight is
placed upon candidates that are close to the calculated mass value.
The EM algorithm may be run again to simultaneously determine the
overall error and the individual probabilities. After the
probabilities are updated, the values of A* and B* that have just
been calculated are no longer optimal and may be recalculated. This
procedure of iterating calibration steps and applications of the EM
algorithm to update the exact mass probabilities is repeated to
convergence.
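One possible arrangement of the alternating procedure described in this Example is sketched below. It is an illustration under stated assumptions and not a definitive implementation: the function names, the uniform prior, the fixed iteration counts outer and inner, the convergence thresholds and the variance floor of 1e-12 are all choices made for this sketch.

```python
import math


def e_step(masses, candidates, var_ppm):
    """Posterior p[i][j] under a uniform prior and ppm-normalized Gaussian error."""
    p = []
    for b in masses:
        logw = [-0.5 * ((b - a) / (1e-6 * a)) ** 2 / var_ppm - math.log(a)
                for a in candidates]
        top = max(logw)
        w = [math.exp(v - top) for v in logw]
        total = sum(w)
        p.append([v / total for v in w])
    return p


def m_step(masses, candidates, p):
    """Equation 8: probability-weighted mean squared deviation in ppm^2."""
    acc = 0.0
    for b, row in zip(masses, p):
        acc += sum(pij * ((b - a) / (1e-6 * a)) ** 2 for pij, a in zip(row, candidates))
    return max(acc / len(masses), 1e-12)  # assumed floor, avoids division by zero


def solve_ab(freqs, charges, candidates, p):
    """Solve the 2x2 system for (A*, B*) with the probabilities held fixed."""
    s22 = s23 = s24 = r1 = r2 = 0.0
    for f, z, row in zip(freqs, charges, p):
        w2 = sum(pij / a ** 2 for pij, a in zip(row, candidates))
        w1 = sum(pij / a for pij, a in zip(row, candidates))
        s22 += z * z / f ** 2 * w2
        s23 += z * z / f ** 3 * w2
        s24 += z * z / f ** 4 * w2
        r1 += z / f * w1
        r2 += z / f ** 2 * w1
    det = s22 * s24 - s23 * s23
    return (r1 * s24 - r2 * s23) / det, (s22 * r2 - s23 * r1) / det


def calibrate(freqs, charges, candidates, A, B, var_ppm=100.0, outer=25, inner=25):
    """Alternate EM passes and calibration updates until A and B stop changing."""
    for _ in range(outer):
        masses = [z * (A / f + B / f ** 2) for f, z in zip(freqs, charges)]
        for _ in range(inner):
            p = e_step(masses, candidates, var_ppm)
            var_ppm = m_step(masses, candidates, p)
        A_new, B_new = solve_ab(freqs, charges, candidates, p)
        done = (abs(A_new - A) <= 1e-12 * abs(A)
                and abs(B_new - B) <= 1e-9 * max(abs(B), 1.0))
        A, B = A_new, B_new
        if done:
            break
    # var_ppm is the estimate from the last EM pass, made before the final update.
    return A, B, var_ppm
```

The starting values of A and B would ordinarily come from a prior calibration of the instrument. The initial variance should be large enough that the correct candidates receive appreciable probability on the first pass.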
Example 5
Derivation of the Update Step in the Application of the EM
Algorithm
[0098] By definition of the EM algorithm, the estimate of the
normalized error variance in step n+1, .left brkt-bot.({circumflex
over (.sigma.)}').sup.2.right brkt-bot..sub.n+1, is the value that
maximizes the function Q (the expectation) calculated from the
estimate obtained in step n, .left brkt-bot.({circumflex over
(.sigma.)}').sup.2.right brkt-bot..sub.n.
$$\left[(\hat{\sigma}')^2\right]_{n+1} = \underset{(\sigma')^2 \in \mathbb{R}^{+}}{\operatorname{argmax}}\; Q\left((\sigma')^2 \mid \left[(\hat{\sigma}')^2\right]_n\right) \tag{9}$$
[0099] The function Q is defined as the expectation of the
log-likelihood of the complete data given the undetermined
normalized error variance, (.sigma.').sup.2. The complete data is
the set of observed measurements .beta. plus the exact masses of
the measured peptides, denoted by the mapping m. The possible
completions of the data, the exact peptide masses, are considered
to be drawn from the conditional distribution given the
measurements .beta. with the normalized error variance taken to be
.left brkt-bot.({circumflex over (.sigma.)}').sup.2.right
brkt-bot..sub.n.
$$Q\left((\sigma')^2 \mid \left[(\hat{\sigma}')^2\right]_n\right) = E\left[\log p(\beta, m \mid \alpha, (\sigma')^2) \mid \alpha, \beta, \left[(\hat{\sigma}')^2\right]_n\right] = \sum_{m \in [1 \ldots r]^q} \log p(\beta, m \mid \alpha, (\sigma')^2)\; p\left(m \mid \alpha, \beta, \left[(\hat{\sigma}')^2\right]_n\right) \tag{10}$$
[0100] The value of (.sigma.').sup.2 that maximizes Q has zero
first-derivative. The first derivative of Q is given by Equation
11.
$$\frac{\partial Q\left((\sigma')^2 \mid \left[(\hat{\sigma}')^2\right]_n\right)}{\partial (\sigma')^2} = \sum_{m \in [1 \ldots r]^q} \frac{\partial \log p(\beta, m \mid \alpha, (\sigma')^2)}{\partial (\sigma')^2}\; p\left(m \mid \alpha, \beta, \left[(\hat{\sigma}')^2\right]_n\right) \tag{11}$$
[0101] The probability of the complete data, which appears in the
right hand side of Equation 11, can be expressed as a product of
probabilities. These factors are expressed in terms of individual
measurements in Equations 13 and 14.
$$p(\beta, m \mid \alpha, (\sigma')^2) = p(\beta \mid \alpha, (\sigma')^2, m)\, p(m) \tag{12}$$
$$p(\beta \mid \alpha, (\sigma')^2, m) = \prod_{i=1}^{q} p(\beta_i \mid \alpha_{m_i}, (\sigma')^2) \tag{13}$$
$$p(m) = \prod_{i=1}^{q} p(\alpha_{m_i}) \tag{14}$$
[0102] The log-likelihood of the complete data, which appears in
the right-hand side of Equation 11, can be expressed as a sum of
terms by combining equations 12, 13, and 14.
$$\begin{aligned} \log p(\beta, m \mid \alpha, (\sigma')^2) &= \sum_{i=1}^{q} \log p(\beta_i \mid \alpha_{m_i}, (\sigma')^2) + \sum_{i=1}^{q} \log p(\alpha_{m_i}) \\ &= -\frac{1}{2(\sigma')^2}\sum_{i=1}^{q}\left(\frac{\beta_i-\alpha_{m_i}}{10^{-6}\,\alpha_{m_i}}\right)^2 - \frac{q}{2}\log\left((\sigma')^2\right) - \frac{1}{2}\sum_{i=1}^{q}\log\left(2\pi(10^{-6}\alpha_{m_i})^2\right) + \sum_{i=1}^{q}\log p(\alpha_{m_i}) \end{aligned} \tag{15}$$
[0103] The derivative of the log-likelihood of the complete data
with respect to (.sigma.').sup.2 is given in Equation 16.
$$\frac{\partial \log p(\beta, m \mid \alpha, (\sigma')^2)}{\partial (\sigma')^2} = \frac{1}{2\left[(\sigma')^2\right]^2}\sum_{i=1}^{q}\left(\frac{\beta_i-\alpha_{m_i}}{10^{-6}\,\alpha_{m_i}}\right)^2 - \frac{q}{2(\sigma')^2} \tag{16}$$
[0104] Then, the right-hand side of Equation 16 is substituted into
Equation 11 to obtain the first derivative of Q.
$$\frac{\partial Q\left((\sigma')^2 \mid \left[(\hat{\sigma}')^2\right]_n\right)}{\partial (\sigma')^2} = \frac{1}{2\left[(\sigma')^2\right]^2}\sum_{m \in [1 \ldots r]^q}\sum_{i=1}^{q}\left(\frac{\beta_i-\alpha_{m_i}}{10^{-6}\,\alpha_{m_i}}\right)^2 p\left(m \mid \alpha, \beta, \left[(\hat{\sigma}')^2\right]_n\right) - \frac{q}{2(\sigma')^2} \tag{17}$$
[0105] To determine the value of (.sigma.').sup.2 that maximizes Q,
the right-hand side of Equation 17 is set to zero and solved for
(.sigma.').sup.2. This value is the updated estimate of the
normalized error variance.
$$\left[(\hat{\sigma}')^2\right]_{n+1} = \frac{1}{q}\sum_{m \in [1 \ldots r]^q}\sum_{i=1}^{q}\left(\frac{\beta_i-\alpha_{m_i}}{10^{-6}\,\alpha_{m_i}}\right)^2 p\left(m \mid \alpha, \beta, \left[(\hat{\sigma}')^2\right]_n\right) \tag{18}$$
[0106] The multi-dimensional sum in the right-hand side of Equation
18 can be simplified by virtue of the separability of
p(m|.alpha.,.beta.,.left brkt-bot.({circumflex over
(.sigma.)}').sup.2.right brkt-bot..sub.n).
$$p\left(m \mid \alpha, \beta, \left[(\hat{\sigma}')^2\right]_n\right) = \prod_{i=1}^{q} p\left(\alpha_{m_i} \mid \beta_i, \left[(\hat{\sigma}')^2\right]_n\right) \tag{19}$$
[0107] Next, exchange the order of summation and expand the vector
sum in the right-hand side of Equation 18 explicitly.
$$\left[(\hat{\sigma}')^2\right]_{n+1} = \frac{1}{q}\sum_{i=1}^{q}\sum_{m_1=1}^{r} p\left(\alpha_{m_1} \mid \beta_1, \left[(\hat{\sigma}')^2\right]_n\right)\sum_{m_2=1}^{r} p\left(\alpha_{m_2} \mid \beta_2, \left[(\hat{\sigma}')^2\right]_n\right)\cdots\sum_{m_q=1}^{r} p\left(\alpha_{m_q} \mid \beta_q, \left[(\hat{\sigma}')^2\right]_n\right)\left(\frac{\beta_i-\alpha_{m_i}}{10^{-6}\,\alpha_{m_i}}\right)^2 \tag{20}$$
[0108] Then, rearrange Equation 20, separating each term in the sum
as a product of q terms.
$$\left[(\hat{\sigma}')^2\right]_{n+1} = \frac{1}{q}\sum_{i=1}^{q}\left(\sum_{m_i=1}^{r} p\left(\alpha_{m_i} \mid \beta_i, \left[(\hat{\sigma}')^2\right]_n\right)\left(\frac{\beta_i-\alpha_{m_i}}{10^{-6}\,\alpha_{m_i}}\right)^2\right)\prod_{k \neq i}\left(\sum_{m_k=1}^{r} p\left(\alpha_{m_k} \mid \beta_k, \left[(\hat{\sigma}')^2\right]_n\right)\right) \tag{21}$$
[0109] However, each term in the product indexed by k is the sum of
disjoint probabilities and therefore unity. To obtain the form in
Equation 8, the index on the inner sum is changed from m.sub.i to
j.
$$\left[(\hat{\sigma}')^2\right]_{n+1} = \frac{1}{q}\sum_{i=1}^{q}\sum_{j=1}^{r}\left(\frac{\beta_i-\alpha_j}{10^{-6}\,\alpha_j}\right)^2 p\left(\alpha_j \mid \beta_i, \left[(\hat{\sigma}')^2\right]_n\right) \tag{22}$$
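The equivalence between the sum over all mappings in Equation 18 and the factored double sum of Equation 8 can be checked numerically on a small case, as in the sketch below. The function names and the uniform prior are assumptions of this sketch. The brute-force version enumerates all r.sup.q mappings and is therefore practical only for very small q and r.

```python
import itertools
import math


def posterior_row(b, candidates, var_ppm):
    """p(alpha_j | beta_i) under a uniform prior, computed in log space."""
    logw = [-0.5 * ((b - a) / (1e-6 * a)) ** 2 / var_ppm - math.log(a)
            for a in candidates]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]


def sq_dev(b, a):
    """Squared normalized (ppm) deviation between measurement b and mass a."""
    return ((b - a) / (1e-6 * a)) ** 2


def update_factored(measured, candidates, var_ppm):
    """Equation 8: one sum over measurements and one over candidate masses."""
    acc = 0.0
    for b in measured:
        row = posterior_row(b, candidates, var_ppm)
        acc += sum(pj * sq_dev(b, a) for pj, a in zip(row, candidates))
    return acc / len(measured)


def update_bruteforce(measured, candidates, var_ppm):
    """Equation 18: explicit sum over every mapping m in [1..r]^q."""
    q, r = len(measured), len(candidates)
    rows = [posterior_row(b, candidates, var_ppm) for b in measured]
    acc = 0.0
    for m in itertools.product(range(r), repeat=q):
        prob = 1.0
        for i in range(q):
            prob *= rows[i][m[i]]  # Equation 19: p(m | ...) factors over measurements
        acc += prob * sum(sq_dev(measured[i], candidates[m[i]]) for i in range(q))
    return acc / q
```

The two functions agree to within floating-point rounding, which reflects the step in the derivation where each inner sum of probabilities equals one.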
Example 6
Use of a Spline Algorithm
[0110] A spline is a smooth function defined on some domain,
consisting of a set of smooth segment functions defined on
subdomains that form a partition of the original domain. A spline
is formed by concatenation of the segment functions. To obtain a
smooth spline, constraints are imposed upon the values of the
segment functions and their derivatives at the subdomain
boundaries. For a spline to be continuous and have n continuous
derivatives requires n+1 constraints at each boundary point.
[0111] In data analysis, a model function that best fits the data
is chosen from a family of related functions, each indexed by a
vector of parameter values. When the parameters represent physical
quantities, the model function represents an estimate of the state
of a system from a set of measurements.
[0112] In some cases, a given physical model is a good description
of a process only for disjoint local regions of a domain space. A
family of functions can be extended to model a larger class of
phenomena by connecting them to form splines. The domain space
(the independent variable) is partitioned into regions, each of
which is characterized by its own local set of parameter values.
The values of the spline parameters in a subdomain are guided by
the measurement values from its own subdomain, but also coupled to
the parameter values in other domains by virtue of the spline
constraints.
[0113] Calibration in FTMS involves generalizing the relationship
between the measured cyclotron frequency of an ion and its
mass-to-charge ratio from a set of observed frequencies of ions of
known mass-to-charge ratios. The form of the calibration function
is based upon the magnetic and electrostatic forces encountered by
ions in an analytic cell. There are a variety of different
calibration functions, but the most widely used involves two
parameters, A and B (Ledford, E. B. et al., Mass Calibration, Int J
Mass Spectrom Ion Process 56: 2744-2748 (1984))
$$m/z = \frac{A}{f_{\mathrm{obs}}} + \frac{B}{f_{\mathrm{obs}}^{2}} \tag{23}$$
[0114] Because the motion of ions in an FTMS cell is not fully
understood, the parameter values are semi-empirical. Parameter A
corresponds to the centripetal magnetic force and the radial
component of the electrostatic trapping force. Parameter B
corresponds to the "space-charge effect".
[0115] The space-charge effect describes the electrostatic
repulsion between analyte ions of different species, causing a net
outward force, and a decrease in frequency. The value of parameter
B has been shown to be roughly linear in the total number of ions
in the analytic cell (Easterling M. L. et al., Anal Chem 71:624-632
(1999)). However, the space-charge effect is fundamentally a local
rather than a global phenomenon, with ions influenced
disproportionately more by ions of similar frequency. Therefore,
the local spectral density of ions appears to affect the observed
frequency. Local distortions in the calibration relation have been
reported (Masselon C. et al., JASMS 13: 99-106 (2002)).
[0116] Spline parameters may be used to estimate the local
variations in the calibration parameters with the ultimate goal of
improving the accuracy of the estimated m/z values. The frequency
domain is partitioned into regions. The choice of partition is
driven by the data. Each subdomain has its own local values of
calibration parameters A and B, and an additional parameter D,
introduced for technical reasons. The first spline segment has
three degrees of freedom; each additional spline segment introduces
three parameters; two of these are required to satisfy the spline
constraints; the remaining degree of freedom can be used to fit the
data.
[0117] The calibration relation between mass-to-charge-ratio and
frequencies in the range [f.sub.lo, f.sub.hi) may be determined
using a spline as the calibration relation. Let s denote a spline
of N segments defined on this region. Let P=(f.sub.0, f.sub.1, . .
. f.sub.N) with f.sub.0=f.sub.lo, f.sub.N=f.sub.hi, and
f.sub.i<f.sub.j for i<j denote a partition of the range
[f.sub.lo,f.sub.hi). Let s.sub.i for i in 1 . . . N denote the segment
function defined on the subdomain [f.sub.i-1,f.sub.i). For
notational convenience, let I(f) denote the index of the subdomain
that contains f.
$$ I(f) = i \iff f \in [f_{i-1}, f_i) \qquad (24) $$
[0118] Let s(f) denote the value of the spline evaluated at f. This
is defined as the value of the segment function indexed by I(f)
evaluated at f.
$$ s(f) = s_{I(f)}(f) \qquad (25) $$
[0119] Let A.sub.i, B.sub.i denote the local calibration parameters
in [f.sub.i-1,f.sub.i), and let D.sub.i denote the local shift
applied to this region in order to generate a globally smooth
spline.
$$ s_i(f) = \frac{A_i}{f} + \frac{B_i}{f^{2}} + D_i, \qquad f \in [f_{i-1}, f_i) \qquad (26) $$
[0120] Combining Equations 25 and 26, the calibration relation
generalized to splines is given by
$$ s(f) = \frac{A_{I(f)}}{f} + \frac{B_{I(f)}}{f^{2}} + D_{I(f)} \qquad (27) $$
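The segment lookup I(f) and the piecewise relation of Equation 27 can be sketched as follows. This is only an illustration under stated assumptions: the function names, the two-segment partition, and the parameter values are hypothetical, and the half-open intervals are handled with NumPy's `searchsorted`.

```python
import numpy as np

def segment_index(f, knots):
    """Return I(f): the 1-based i with knots[i-1] <= f < knots[i] (Equation 24)."""
    i = int(np.searchsorted(knots, f, side="right"))
    if i < 1 or i > len(knots) - 1:
        raise ValueError("f lies outside [f_lo, f_hi)")
    return i

def spline_mz(f, knots, A, B, D):
    """Evaluate s(f) of Equation 27; A, B, D hold one value per segment."""
    k = segment_index(f, knots) - 1   # 0-based position in the arrays
    return A[k] / f + B[k] / f**2 + D[k]

# Hypothetical two-segment partition (Hz) and per-segment parameters.
knots = np.array([5.0e4, 1.0e5, 4.0e5])
A = np.array([1.075e8, 1.075e8])
B = np.array([-3.455e8, -3.455e8])
D = np.array([0.0, 0.0])
mz = spline_mz(2.15e5, knots, A, B, D)
```

With identical parameters in every segment and D = 0, the spline reduces to the global relation of Equation 23.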
[0121] Let x denote the vector of 3N parameters, combining the
three local parameters for each of the N spline segments.
$$ x = [A_1 \;\; B_1 \;\; D_1 \;\; \cdots \;\; A_N \;\; B_N \;\; D_N]^{T} \qquad (28) $$
[0122] Equation 27 may be written as a product of a row vector
r.sup.T(f) and vector x.
s(f)=r.sup.T(f)x (29)
[0123] Row vector r.sup.T(f) has 3N columns, all but three of which
are zero: columns 3I(f)-2, 3I(f)-1, and 3I(f) contain entries 1/f,
1/f.sup.2, and 1.
[0124] In general, the expression for column i of r.sup.T(f) can be
expressed as follows:
$$ r^{T}(f)(i) = \delta\!\left( \lfloor (i+2)/3 \rfloor,\; I(f) \right) f^{\,3\lfloor i/3 \rfloor - i} \qquad (30) $$
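A sketch of how the row vector r.sup.T(f) might be assembled is given below. The function name and the example partition are hypothetical; the only facts taken from the text are that the row has 3N columns and that the three columns belonging to segment I(f) hold 1/f, 1/f.sup.2, and 1.

```python
import numpy as np

def design_row(f, knots):
    """Row vector r^T(f): zero except for the block of segment I(f)."""
    n_seg = len(knots) - 1
    i = int(np.searchsorted(knots, f, side="right"))   # 1-based I(f)
    if not 1 <= i <= n_seg:
        raise ValueError("f lies outside [f_lo, f_hi)")
    row = np.zeros(3 * n_seg)
    row[3 * (i - 1): 3 * i] = [1.0 / f, 1.0 / f**2, 1.0]
    return row

# Hypothetical two-segment partition (Hz); f falls in segment 2.
knots = np.array([5.0e4, 1.0e5, 4.0e5])
f = 2.0e5
row = design_row(f, knots)

# s(f) = r^T(f) x for a parameter vector x = [A1, B1, D1, A2, B2, D2].
x = np.array([1.0e8, -3.0e8, 0.1, 1.1e8, -3.5e8, 0.2])
s_f = row @ x
```

The product `row @ x` picks out the parameters of the segment that contains f, which is the content of Equation 29.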
[0125] The 2(N-1) constraints on parameter vector x that must be
satisfied for s to be a smooth spline can be represented by a
matrix Equation.
Cx=0 (31)
[0126] C denotes a constraint matrix of 2(N-1) rows, one for each
constraint, and 3N columns, one for each parameter. For example,
the constraint that the spline s be continuous at f.sub.1, requires
that the following condition holds:
$$ s_1(f_1) = \frac{A_1}{f_1} + \frac{B_1}{f_1^{2}} + D_1 = s_2(f_1) = \frac{A_2}{f_1} + \frac{B_2}{f_1^{2}} + D_2 \qquad (32a) $$
[0127] Equivalently, in matrix form,
$$ \left[ \frac{1}{f_1} \;\; \frac{1}{f_1^{2}} \;\; 1 \;\; -\frac{1}{f_1} \;\; -\frac{1}{f_1^{2}} \;\; -1 \;\; 0 \;\cdots\; 0 \right] x = 0 \qquad (32b) $$
[0128] The constraint that the first derivative of s be continuous
at f.sub.1 requires
$$ \left.\frac{ds_1}{df}\right|_{f_1} = -\frac{A_1}{f_1^{2}} - \frac{2B_1}{f_1^{3}} = \left.\frac{ds_2}{df}\right|_{f_1} = -\frac{A_2}{f_1^{2}} - \frac{2B_2}{f_1^{3}} \qquad (33a) $$
[0129] Equivalently, in matrix form,
$$ \left[ -\frac{1}{f_1^{2}} \;\; -\frac{2}{f_1^{3}} \;\; 0 \;\; \frac{1}{f_1^{2}} \;\; \frac{2}{f_1^{3}} \;\; 0 \;\; 0 \;\cdots\; 0 \right] x = 0 \qquad (33b) $$
[0130] Let C.sub.1 denote the banded diagonal matrix of N-1
continuity constraints, and C.sub.2 denote the banded diagonal
matrix of N-1 first-derivative constraints. Then, C is the matrix
formed by stacking C.sub.1 and C.sub.2.
$$ C = \begin{bmatrix} C_1 \\ C_2 \end{bmatrix} \qquad (34) $$
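[0131] The general entries (in row i column j) of C.sub.1 and
C.sub.2 respectively are given below, where row i imposes the
constraint at the knot f.sub.i that joins segments i and i+1.
$$ C_1(i,j) = \left[ \delta\!\left( \lfloor (j+2)/3 \rfloor,\, i \right) - \delta\!\left( \lfloor (j+2)/3 \rfloor,\, i+1 \right) \right] f_i^{\,3\lfloor j/3 \rfloor - j} \qquad (35a) $$
$$ C_2(i,j) = \left[ \delta\!\left( \lfloor (j+2)/3 \rfloor,\, i \right) - \delta\!\left( \lfloor (j+2)/3 \rfloor,\, i+1 \right) \right] \left( 3\lfloor j/3 \rfloor - j \right) f_i^{\,3\lfloor j/3 \rfloor - j - 1} \qquad (35b) $$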
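The stacked constraint matrix can be sketched as below. The function name and the three-segment partition are hypothetical, and the sign convention (segment i minus segment i+1) is one of two equivalent choices for a homogeneous constraint; the entries themselves follow the continuity and first-derivative conditions of Equations 32a and 33a.

```python
import numpy as np

def constraint_matrix(knots):
    """Stack continuity (C1) and first-derivative (C2) constraints, Equation 34."""
    n_seg = len(knots) - 1
    C1 = np.zeros((n_seg - 1, 3 * n_seg))
    C2 = np.zeros((n_seg - 1, 3 * n_seg))
    for i in range(1, n_seg):              # interior knot f_i joins segments i, i+1
        f = knots[i]
        left = slice(3 * (i - 1), 3 * i)   # columns of A_i, B_i, D_i
        right = slice(3 * i, 3 * (i + 1))  # columns of A_{i+1}, B_{i+1}, D_{i+1}
        value = np.array([1.0 / f, 1.0 / f**2, 1.0])
        slope = np.array([-1.0 / f**2, -2.0 / f**3, 0.0])
        C1[i - 1, left], C1[i - 1, right] = value, -value
        C2[i - 1, left], C2[i - 1, right] = slope, -slope
    return np.vstack([C1, C2])

# Hypothetical three-segment partition (Hz).
knots = np.array([5.0e4, 1.0e5, 2.0e5, 4.0e5])
C = constraint_matrix(knots)

# A spline with the same A, B, D in every segment is trivially smooth.
x = np.tile([1.075e8, -3.455e8, 0.0], 3)
residual = C @ x
```

For N segments the result has 2(N-1) rows and 3N columns, matching the dimensions stated for C.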
[0132] Let f denote the vector whose components are the measured
frequencies of K distinct ions.
f=[f.sup.obs.sub.1 . . . f.sup.obs.sub.K].sup.T (36)
[0133] Let m denote the vector that contains the corresponding
(known) m/z values of these ions.
m=[m.sub.1 . . . m.sub.K].sup.T (37)
[0134] Let m.sup.calc denote the vector of values calculated from
corresponding f.sup.obs using the vector of calibration parameters
x and the calibration relation in Equation 27.
m.sup.calc=[m.sup.calc.sub.1 . . . m.sup.calc.sub.K].sup.T
(38a)
m.sup.calc.sub.i=s(f.sup.obs.sub.i) (38b)
[0135] Let e denote the weighted squared error between the known
m/z values and the corresponding calculated values.
$$ e = \sum_{k=1}^{K} w_k \left( m_k^{calc} - m_k \right)^{2} \qquad (39) $$
[0136] It may be assumed that the errors are normally distributed
with the standard error proportional to the mass. Therefore, the
weights are given by the inverse mass squared.
$$ w_k = \frac{1}{m_k^{2}} \qquad (40) $$
[0137] The goal is to find the parameter vector x that minimizes
e subject to the constraint Cx=0, i.e. the smooth calibration
spline that best fits the observed data. Because the log-likelihood
is equal to -e (plus some terms that can be ignored because they
are independent of x), if x minimizes e it also maximizes the data
likelihood.
[0138] Because the constraint is linear, the solution to the
constrained optimization problem exists in closed form and can be
found using the method of Lagrange multipliers.
[0139] To construct the solution, Equation 38 may be expressed in
matrix form. First the vector m.sup.calc may be expressed in terms
of a matrix Equation. To do so, matrix R may be constructed by
stacking the row vectors defined by Equation 30 evaluated for each
observed frequency.
$$ R = \begin{bmatrix} r^{T}(f^{obs}_{1}) \\ \vdots \\ r^{T}(f^{obs}_{K}) \end{bmatrix} \qquad (41) $$
[0140] Then, combining Equation 41 with Equations 29, 38a, and 38b, the
vector m.sup.calc is the product of matrix R and parameter vector
x.
m.sup.calc=Rx (42)
[0141] Next, a diagonal matrix W is defined whose entries are the
weights defined in Equation 40.
W(i,j)=.delta.(i,j)w.sub.j (43)
[0142] Then, combining Equations 42 and 43 with Equation 39, a
matrix expression for the squared error is obtained.
e=(Rx-m).sup.TW(Rx-m) (44)
[0143] Let x* denote the value of x that minimizes e subject to the
constraint Cx=0.
$$ x^{*} = (R^{T}WR)^{-1}R^{T}Wm - (R^{T}WR)^{-1}C^{T}\left[ C(R^{T}WR)^{-1}C^{T} \right]^{-1} C(R^{T}WR)^{-1}R^{T}Wm \qquad (45) $$
[0144] This is the set of parameters that describe a
maximum-likelihood spline relation between observed frequencies and
m/z.
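The closed-form solution of Equation 45 can be sketched numerically as follows. This is an illustration, not the implementation used in the disclosure: the function name is hypothetical, the test data are synthetic random numbers, and linear solves are used in place of explicit matrix inverses as a numerical convenience. It assumes R.sup.TWR is invertible, which requires enough observed peaks in every spline segment.

```python
import numpy as np

def constrained_wls(R, w, m, C):
    """Minimise (Rx-m)^T W (Rx-m) subject to Cx = 0, with W = diag(w) (Equation 45)."""
    RtW = R.T * w                              # R^T W without forming W explicitly
    G = RtW @ R                                # R^T W R
    x_free = np.linalg.solve(G, RtW @ m)       # unconstrained weighted least squares
    GinvCt = np.linalg.solve(G, C.T)           # (R^T W R)^-1 C^T
    lam = np.linalg.solve(C @ GinvCt, C @ x_free)
    return x_free - GinvCt @ lam               # subtract the constraint correction

# Synthetic numbers for illustration only.
rng = np.random.default_rng(0)
R = rng.normal(size=(12, 4))
m = rng.normal(size=12)
w = rng.uniform(0.5, 2.0, size=12)
C = rng.normal(size=(1, 4))
x_star = constrained_wls(R, w, m, C)
```

The first term of Equation 45 is the unconstrained weighted least-squares estimate; the second term removes the component that violates Cx=0.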
[0145] When calibration is performed on samples without analytes of
known mass-to-charge ratio, the maximum likelihood vector of spline
parameters can also be written in terms of Equation 45, except that
the matrices W and R and the vector m must be modified.
[0146] When an ion mass is not known, its mass is characterized by
a probability mass function. For example, suppose that the m.sub.k
could be any of the following n.sub.k values m.sub.k1, m.sub.k2, .
. . or m.sub.knk. Suppose also that the probability that the true
m/z value is equal to each of these values is p.sub.k1, p.sub.k2, .
. . and p.sub.knk respectively. In the case of uncertain m/z
values, the expectation of the squared error is minimized, where
the error is taken to be a random variable.
$$ e = \sum_{k=1}^{K} \sum_{i=1}^{n_k} p_{ki}\, w_k \left( m_k^{calc} - m_{ki} \right)^{2} \qquad (46) $$
[0147] The term e may be written in matrix form by collapsing the
double-sum in Equation 46 into a single sum. The vector m may be
constructed as shown in Equation 37, except that each scalar known
mass m.sub.k may be replaced with the vector of n.sub.k candidate
mass values (m.sub.k1, m.sub.k2, . . . m.sub.knk). Likewise, the
vector m.sup.calc may be constructed as shown in Equation 38a,
except that each scalar calculated mass m.sup.calc.sub.k may be
replaced with a vector containing n.sub.k copies of
m.sup.calc.sub.k. The diagonal matrix of weights, originally
defined by Equation 43, is similarly modified. In place of each
scalar diagonal entry, a block-diagonal matrix is formed, with K
blocks denoted by W.sub.k.
W=diag(W.sub.k) (47)
[0148] The matrix W.sub.k is itself a diagonal matrix with n.sub.k
entries. Each weight is the product of the inverse mass squared and
the candidate probability.
W.sub.k(i,j)=.delta.(i,j)p.sub.kiw.sub.k (48)
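The expansion described in Equations 46 to 48 might be sketched as below. The function name and all numbers are hypothetical. Because the text does not state how w.sub.k is evaluated when m.sub.k is unknown, the per-peak weights are taken here as an input supplied by the caller, which is an assumption of this sketch.

```python
import numpy as np

def expand_candidates(R, w, candidates, probs):
    """Expand R, m and the diagonal of W for uncertain masses (Equations 46-48)."""
    rows, masses, weights = [], [], []
    for r_k, w_k, m_k, p_k in zip(R, w, candidates, probs):
        for m_ki, p_ki in zip(m_k, p_k):
            rows.append(r_k)             # n_k copies of row k, so Rx repeats m_k^calc
            masses.append(m_ki)          # candidate mass m_ki
            weights.append(p_ki * w_k)   # Equation 48
    return np.array(rows), np.array(masses), np.array(weights)

# Two peaks: the first has one candidate, the second has two.
R = np.array([[1.0, 0.0], [0.5, 2.0]])
w = np.array([1.0, 2.0])
candidates = [[10.0], [20.0, 21.0]]
probs = [[1.0], [0.25, 0.75]]
R_exp, m_exp, w_exp = expand_candidates(R, w, candidates, probs)

# The single sum over expanded rows equals the double sum of Equation 46.
x = np.array([3.0, 4.0])
e_expanded = np.sum(w_exp * (R_exp @ x - m_exp) ** 2)
```

The expanded arrays can then be passed to the same constrained least-squares formula as in the known-mass case.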
Example 7
Calibration Test with Simulated Data
Calibration of Tryptic Peptide Mixtures does not Require
Calibration Standards
[0149] A simulation experiment was performed to validate a
calibration program that used probabilistic peptide identifications
rather than known calibrant masses. Peptide masses were selected
randomly from a database of human proteome tryptic peptides. A set
of ion cyclotron frequencies was calculated from the mass values
assuming all peptides had +1 charge and using values for the
calibration parameters that are typical for the LTQ-FT. Observed
frequencies were simulated by adding random shifts to the
calculated frequencies. Calibration errors were introduced by
random shifts to the chosen calibration parameter values. For
errors of typical size (e.g. 1 ppm), it was possible to recalibrate
the spectra without using knowledge of the original mass values,
but only that the peptides were randomly selected from the
database. To allow discovery of modified peptides, a database of
"typical" tryptic peptide chemical formulas was constructed. The
database contains the most frequently occurring chemical formulas
of fragments that would be generated by tryptic digest of random
amino acid sequences.
[0150] The data simulation consisted of three parts: selection of
peptide masses, conversion of masses to cyclotron frequencies, and
introduction of random errors in the frequency values.
[0151] The spectrum was driven by the selection of peptide masses
at random from a database that contains an in silico tryptic digest
of the human proteome. The resulting digest produced 342,623
distinct mass values. Peptide masses were chosen uniformly at
random from this list. The number of peptides in the spectrum was a
variable parameter.
[0152] To ionize a peptide of neutral mass m.sub.N, the charge z
was chosen to be defined by Equation 49.
$$ z = \lceil m_N / 2000 \rceil \qquad (49) $$
[0153] The mass of the ion m.sub.I is the neutral mass plus the
mass of z protons. The mass of a proton m.sub.p is 1.007276 Da.
m.sub.I=m.sub.N+zm.sub.p (50)
[0154] The ideal cyclotron frequency depends upon the mass to
charge ratio of the ion.
m.sub.I/z=(m.sub.N+zm.sub.p)/z=m.sub.N/z+m.sub.p (51)
[0155] Hereafter, m/z (dropping the subscript I) was used to denote
the mass to charge ratio of the ion.
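The charge assignment and the resulting mass-to-charge ratio can be illustrated with a short sketch. The function name is hypothetical; the proton mass is the value quoted in the text, and the charge is taken as the ceiling of m.sub.N/2000 so that m.sub.N/z never exceeds 2000.

```python
import math

PROTON_MASS = 1.007276  # Da, as quoted in the text

def ion_mz(neutral_mass):
    """Return (z, m/z) for a peptide of the given positive neutral mass in Da."""
    z = math.ceil(neutral_mass / 2000.0)            # charge of Equation 49
    ion_mass = neutral_mass + z * PROTON_MASS       # Equation 50
    return z, ion_mass / z

z1, mz1 = ion_mz(1500.0)   # singly charged
z2, mz2 = ion_mz(2500.0)   # doubly charged
```

Each added proton contributes m.sub.p to the ion mass, so the ratio works out to m.sub.N/z plus one proton mass.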
[0156] The choice for z placed an upper limit of (approximately)
2,000 on m/z, which is typical for FTMS data collection in
proteomic experiments. Each m/z value was converted into an ideal
cyclotron frequency. Typically, the calibration relation is defined
in terms of the ideal cyclotron frequency for an ion. For example,
the common relation was used as shown in Equation 52.
$$ m/z = \frac{A}{f} + \frac{B}{f^{2}} \qquad (52) $$
[0157] Note that the second term in the right-hand side of Equation
52 is small compared with the first term. In some calculations,
like analysis of the effect of frequency measurement error upon the
mass-to-charge ratio (see below), the following approximation was
acceptable.
$$ m/z \approx \frac{A}{f} \qquad (53) $$
[0158] Solving Equation 52 for f gives two solutions.
$$ f = \frac{A \pm \sqrt{A^{2} + 4B\,(m/z)}}{2\,(m/z)} \qquad (54) $$
[0159] The smaller of the two frequencies is the magnetron
frequency. The larger value, the cyclotron frequency, was the one
desired; it is slightly smaller than A/(m/z). The values
1.075*10.sup.8 and -3.455*10.sup.8 were chosen for A and B,
respectively. These values, referred to as A.sub.true and
B.sub.true, approximate typical values for the Thermo LTQ-FT. Using
these calibration parameters, each m/z value was plugged into
Equation 54 to generate an ideal cyclotron frequency, referred to
as f.sub.true. The values of A.sub.true and B.sub.true were not
available to the calibration program that subsequently analyzed the
simulated data.
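A sketch of the conversion from m/z to an ideal cyclotron frequency follows. It takes the larger root of the quadratic obtained from m/z = A/f + B/f.sup.2; the function name is hypothetical and the parameter values are those quoted in the text.

```python
import math

def cyclotron_frequency(mz, A, B):
    """Larger root of (m/z) f**2 - A f - B = 0, i.e. the cyclotron frequency."""
    disc = A * A + 4.0 * B * mz
    if disc < 0.0:
        raise ValueError("no real frequency for these parameters")
    return (A + math.sqrt(disc)) / (2.0 * mz)

A_true, B_true = 1.075e8, -3.455e8      # values quoted in the text
f_true = cyclotron_frequency(1000.0, A_true, B_true)

# Round trip back through the calibration relation.
mz_back = A_true / f_true + B_true / f_true**2
```

With B negative, the square root is slightly smaller than A, so the returned frequency lies just below A/(m/z).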
[0160] A mean-zero Gaussian random variable was added to each
cyclotron frequency to simulate additive measurement error, denoted
by e in Equation 55. The resulting frequency was denoted by
f.sub.obs.
f.sub.obs=f.sub.true+e (55)
[0161] The standard deviation of the random error e was set to be
proportional to the true frequency.
$$ \sigma_e = \frac{x}{10^{6}}\, f_{true} \qquad (56) $$
[0162] The term x denoted the measurement error in
parts-per-million (ppm). Note that a given ppm error in the
frequency produces an approximately equivalent ppm error in mass,
as can be derived by differentiating both sides of Equation 53.
$$ \frac{|d(m/z)|}{m/z} \approx \frac{|df|}{f} \qquad (57) $$
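The noise model and the ppm equivalence between frequency and mass errors can be checked with a small simulation. This is a sketch under stated assumptions: the function names, the seed, the sample size, and the uniform frequency range are arbitrary illustrative choices, and only the proportional Gaussian error and the values of A and B come from the text.

```python
import numpy as np

def add_measurement_noise(f_true, ppm, rng):
    """f_obs = f_true + e, with e Gaussian and sigma_e = ppm * 1e-6 * f_true."""
    sigma = ppm * 1e-6 * f_true
    return f_true + rng.normal(0.0, sigma)

def mz_of(f, A=1.075e8, B=-3.455e8):
    """Calibration relation m/z = A/f + B/f**2 with the values quoted in the text."""
    return A / f + B / f**2

rng = np.random.default_rng(1)                       # fixed seed, illustrative only
f_true = rng.uniform(5.0e4, 4.0e5, size=10_000)      # hypothetical frequencies (Hz)
f_obs = add_measurement_noise(f_true, ppm=1.0, rng=rng)

freq_ppm = 1e6 * (f_obs - f_true) / f_true
mass_ppm = 1e6 * (mz_of(f_obs) - mz_of(f_true)) / mz_of(f_true)
rms_freq_ppm = float(np.sqrt(np.mean(freq_ppm**2)))
```

The mass error has nearly the same magnitude as the frequency error and the opposite sign, because m/z varies approximately as 1/f.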
[0163] The error in this approximation is insignificant for typical
calibration parameters. The simulated data consisted of a set of
"observed" cyclotron frequencies, generated as described above. The
number of observed frequencies was a variable parameter, which was
denoted by N. The performance of the algorithm depended upon N as
described below.
[0164] In addition to the parameters controlling the data
simulation, there were a number of parameters that controlled the
algorithm. The most important of these were the initial estimates of
the calibration parameters A and B. These initial estimates are
denoted by A.sub.0 and B.sub.0 respectively. In practice, these
parameters may be the last known calibration parameters for the
machine--either the output of the algorithm on the previous scan or
the result of calibration on a previous run.
[0165] In testing the algorithm, the chosen values differed
slightly from the true values of A and B described above to
simulate realistic errors in calibration. Analysis may be helpful
in determining how to appropriately miscalibrate spectra.
[0166] Consider the effect of errors in both A and B upon m/z by
modifying Equation 52.
$$ \Delta(m/z) = \frac{\Delta A}{f} + \frac{\Delta B}{f^{2}} \qquad (58) $$
[0167] Setting .DELTA.(m/z) to zero and solving for .DELTA.B
indicates that the calibration error will be equal to zero for some
value of f. Let f.sub.0 denote the value where the calibration
error is zero.
$$ \Delta B = -\Delta A\, f_0 \qquad (59) $$
[0168] Combining Equations 58 and 59, produces an Equation for the
calibration error in m/z as a function of .DELTA.A and f.sub.0.
$$ \Delta(m/z) = \frac{\Delta A}{f} \left[ 1 - \frac{f_0}{f} \right] \qquad (60) $$
[0169] Combining Equation 60 with (53), produces an approximation
for the normalized calibration error.
$$ \frac{\Delta(m/z)}{m/z} \approx \frac{\Delta A}{A} \left[ 1 - \frac{f_0}{f} \right] \qquad (61) $$
[0170] The root-mean-squared normalized calibration error in a
spectrum with observed frequencies (f.sub.1 . . . f.sub.N) can be
approximated from Equation 61. Replacing the true frequencies with
the observed frequencies should not significantly change the
estimate.
$$ \mathrm{rms}\!\left[ \frac{\Delta(m/z)}{m/z} \right] \approx \frac{|\Delta A|}{A} \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left[ 1 - \frac{f_0}{f_i} \right]^{2} } \qquad (62) $$
[0171] The mean of the bracketed term in Equation 61 vanishes when
f.sub.0 is chosen to be the reciprocal of the average reciprocal
frequency (the harmonic mean of the frequencies). This value of
f.sub.0, denoted by f.sub.0* in Equation 63, eliminates systematic
calibration errors in a given spectrum.
$$ f_0^{*} = \left( \frac{1}{N} \sum_{i=1}^{N} \frac{1}{f_i} \right)^{-1} \qquad (63) $$
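The choice of zero-error frequency and the matching perturbation of B can be sketched as follows. The function names and the four example frequencies are hypothetical. The perturbation uses .DELTA.B = -.DELTA.A f.sub.0, the sign that makes .DELTA.A/f + .DELTA.B/f.sup.2 vanish at f = f.sub.0 and that agrees with the bracketed form [1 - f.sub.0/f].

```python
import numpy as np

def zero_point_frequency(f):
    """f0*: reciprocal of the mean reciprocal frequency (harmonic mean)."""
    return 1.0 / np.mean(1.0 / f)

def miscalibrate(A_true, B_true, ppm_A, f0):
    """Shift A by ppm_A parts per million and choose dB so the error is zero at f0."""
    dA = A_true * ppm_A * 1e-6
    dB = -dA * f0
    return A_true + dA, B_true + dB

# Hypothetical observed frequencies (Hz) and the parameter values quoted in the text.
f = np.array([6.0e4, 9.0e4, 1.5e5, 3.0e5])
f0 = zero_point_frequency(f)
A_init, B_init = miscalibrate(1.075e8, -3.455e8, ppm_A=2.0, f0=f0)

# The bracketed term [1 - f0/f] averages to zero over the spectrum.
mean_bracket = float(np.mean(1.0 - f0 / f))
# The induced m/z error vanishes at f0.
err_at_f0 = (A_init - 1.075e8) / f0 + (B_init + 3.455e8) / f0**2
```

With this f.sub.0 the positive and negative normalized errors across the spectrum cancel on average, although individual peaks still carry errors of up to roughly the ppm shift applied to A.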
[0172] The first six parameters describe the generation of
simulated data. The values of A.sub.true and B.sub.true are typical
calibration parameters that have been encountered when running
the Thermo LTQ-FT. The values of A.sub.init and B.sub.init were
chosen to introduce miscalibration. A.sub.init differed from
A.sub.true by 2 ppm. From Equation 61, it was observed that this
introduced calibration errors bounded above by 2 ppm for large
masses. The value of B.sub.init was chosen so that f.sub.0
(Equation 59) would be near the center of the spectrum. This
combination of A.sub.init and B.sub.init placed the zero point
for the calibration at m/z .about.2000.
[0173] The number of peaks was arbitrarily set to 50 to represent a
typical mass spectrum. The algorithm may perform better given more
peaks. The measurement error describes the normalized rms deviation
between the true cyclotron frequency and the observed value.
[0174] The last three parameters governed the calibration
algorithm. In the above example, the initial error estimate was
intentionally chosen to be much larger than the actual error. The
number of iterations for the error estimator and calibrator were
chosen to be much larger than what is typically required for
convergence.
[0175] The algorithm proved to be robust to a variety of
conditions. The data are shown in FIG. 5. In the high mass region
inset of FIG. 5, the true masses lie on the x-axis. The first
dashed vertical line denotes a low-confidence identification
because several candidates are within .+-.1.sigma. of the true mass
value. The second dotted line denotes a high-confidence
identification because there is only one candidate within
.+-.1.sigma. of the true mass value. There were no candidates in
.+-.1.sigma.. In summary, 50 random human tryptic peptides were
analyzed (m=[0,2000], z=1).
[0176] The parameters characterizing the simulated data were the
number of peptides in the spectrum and the measurement error. The
performance of the calibration algorithm would be expected to
increase with the number of peptides. This is because the initial
convergence of the algorithm depends upon being able to
unambiguously identify at least a small number of peptide masses.
The probability that this condition is satisfied increases
exponentially with the number of peptides in the spectrum.
Similarly, the performance of the algorithm would be inversely
correlated with the size of the measurement error. Large errors may
make it difficult to identify peptide masses.
[0177] While the description above refers to particular embodiments
of the present invention, it should be readily apparent to people
of ordinary skill in the art that a number of modifications may be
made without departing from the spirit thereof. The presently
disclosed embodiments are, therefore, to be considered in all
respects as illustrative and not restrictive.
* * * * *