U.S. patent application number 13/110541 was filed with the patent office on 2011-05-18 and published on 2012-01-19 for normalization of mass spectra acquired by mass spectrometric imaging.
This patent application is currently assigned to BRUKER DALTONIK GMBH. Invention is credited to Soren-Oliver DEININGER, Eryk WOLSKI.
Application Number | 20120016598 13/110541 |
Document ID | / |
Family ID | 44576073 |
Filed Date | 2011-05-18 |
United States Patent
Application |
20120016598 |
Kind Code |
A1 |
DEININGER; Soren-Oliver ; et
al. |
January 19, 2012 |
NORMALIZATION OF MASS SPECTRA ACQUIRED BY MASS SPECTROMETRIC
IMAGING
Abstract
Mass spectra acquired by imaging mass spectrometry (IMS), in
particular MALDI imaging of tissue sections, are each normalized by
one of: the p-norm of the mass spectrum transformed by applying an
exclusion list, the p-norm of the mass spectrum transformed by
square rooting the intensity values, the median of the mass
spectrum, and the median absolute deviation of the noise level of
the mass spectrum.
Inventors: |
DEININGER; Soren-Oliver;
(Leipzig, DE) ; WOLSKI; Eryk; (Bremen,
DE) |
Assignee: |
BRUKER DALTONIK GMBH
Bremen
DE
|
Family ID: |
44576073 |
Appl. No.: |
13/110541 |
Filed: |
May 18, 2011 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
61347026 | May 21, 2010 |
Current U.S.
Class: |
702/28 |
Current CPC
Class: |
H01J 49/0036 20130101;
H01J 49/0004 20130101; G01N 33/6851 20130101; H01J 49/164
20130101 |
Class at
Publication: |
702/28 |
International
Class: |
G06F 19/00 20110101
G06F019/00; H01J 49/26 20060101 H01J049/26 |
Claims
1. A method for normalizing mass spectra of a mass spectrometric
imaging data set, comprising normalizing each mass spectrum by one
of: (i) applying an intensity value exclusion list to that mass
spectrum and then computing the p-norm of the resulting mass
spectrum, (ii) square rooting the intensity values of that mass
spectrum and then computing the p-norm of the resulting mass
spectrum, (iii) the median of the mass spectrum, and (iv) the
median absolute deviation of the noise level of the mass
spectrum.
2. The method of claim 1, further comprising: (a) deriving a first
mass image from the mass spectrometric imaging data set wherein
each mass spectrum has been normalized by one of (i)-(iv); (b)
normalizing each mass spectrum by computing a p-norm of that mass
spectrum without applying an exclusion list; (c) deriving a second
mass image from the mass spectra normalized in step (b); and (d)
comparing the first and second mass images and normalizing the mass
spectra using the method in step (b) when the first and second mass
images are substantially similar otherwise normalizing the mass
spectra using the method in step (a).
3. A method for normalizing mass spectra of a mass spectrometric
imaging data set comprising the steps: (a) calculating first
normalization factors for each mass spectrum by a p-norm; (b)
calculating second normalization factors for each mass spectrum by
one of: the p-norm of the mass spectrum transformed by applying an
exclusion list, the p-norm of the mass spectrum transformed by
square rooting the intensity values, the median of the mass
spectrum, and the median absolute deviation of the noise level of
the mass spectrum; (c) normalizing each mass spectrum by the
corresponding first normalization factor, when the first and second
normalization factors match in a statistical test, otherwise
normalizing each mass spectrum by the corresponding second
normalization factor.
4. The method of claim 3, wherein the statistical test is a
correlation.
5. The method of claim 4, wherein a correlation coefficient greater
than 0.8 indicates a match of the first and second normalization
factors.
6. The method of claim 3, wherein the statistical test is a
chi-square analysis of the distributions of the normalization
factors.
7. The method of one of claims 1 to 3, wherein the mass spectra of
the mass spectrometric imaging data set are acquired by MALDI
imaging.
8. The method according to claim 7, wherein a tissue section is
analyzed by MALDI imaging to produce the mass spectrometric imaging
data set.
9. The method of one of claims 1 to 3, wherein the p-norm is the
total ion count.
10. The method of one of claims 1 to 3, wherein the mass
spectrometric imaging data set is acquired from a tissue section
and the method further comprises the steps of deriving a mass image
from the normalized mass spectra; and displaying the mass image in
order to determine and visualize the spatial distribution of
compounds in the tissue section.
Description
BACKGROUND
[0001] The invention provides methods for normalizing mass spectra
acquired by imaging mass spectrometry (IMS), particularly by
imaging tissue sections using matrix assisted laser
desorption/ionization (MALDI). Histology is the science of human,
animal and plant tissues, in particular, their structure and
function. A histologic examination of a tissue sample determines
the kind and state of the tissue, e.g. the type(s) and
differentiations of the tissue sample, bacterial and parasitic
pathogens in the tissue sample, the disease state of the tissue
sample or any other change compared to a normal state.
[0002] In routine examination, the kind and state of a tissue
sample are determined by optically imaging tissue sections,
acquired by microscopes or scanners. Usually, the tissue sections
are only a few micrometers thick and are stained to increase the
contrast of the optical images and emphasize structures in the
tissue sections. Histology has mainly been based on morphologic
characteristics since the kind and state of a tissue sample are
determined according to the presence of specific structures of
tissue and cells and their staining properties.
[0003] Imaging mass spectrometry (IMS) is a technique used to
determine (and visualize) the spatial distribution of compounds in
a sample by acquiring spatially resolved mass spectra. In recent
years, IMS has been increasingly used to analyze the spatial
distributions of compounds in tissue sections (Caprioli; U.S. Pat.
No. 5,808,300 A), particularly by using matrix assisted laser
desorption/ionization (MALDI). However, IMS can also be used to
analyze other types of samples, like plates of thin layer
chromatography (Maier-Posner; U.S. Pat. No. 6,414,306 B1), gels of
an electrophoresis or blot membranes. All spatially resolved mass
spectra of a sample constitute a mass spectrometric imaging data
set S(x, y, m). The mass spectrometric imaging data set S(x, y, m)
of a sample can be viewed as a collection of multiple mass images
S(x, y, m.sub.k) of different masses or mass ranges m.sub.k, that
is, S(x, y, m) can be divided into mass ranges each generating a
mass image.
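The relationship between the data set S(x, y, m) and a mass image S(x, y, m.sub.k) can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary layout, the function name `mass_image` and the choice of summing the intensities within a mass range are hypothetical and are not prescribed by the disclosure.

```python
def mass_image(dataset, masses, m_low, m_high):
    """Derive a mass image: one value per pixel for the range m_low <= m <= m_high.

    dataset maps a pixel (x, y) to its list of intensity values;
    masses is the common mass axis of all spectra (assumed layout).
    """
    selected = [k for k, m in enumerate(masses) if m_low <= m <= m_high]
    # Assumption: intensities inside the mass range are summed per pixel.
    return {pixel: sum(spectrum[k] for k in selected)
            for pixel, spectrum in dataset.items()}


# Hypothetical three-pixel data set with four mass channels.
masses = [1000.0, 1001.0, 1002.0, 1003.0]
dataset = {
    (0, 0): [1.0, 5.0, 2.0, 0.0],
    (0, 1): [0.0, 7.0, 1.0, 1.0],
    (1, 0): [2.0, 0.0, 0.0, 3.0],
}
image = mass_image(dataset, masses, 1001.0, 1002.0)
```

Each entry of `image` is the brightness of one pixel for the selected mass range; a different range yields a different mass image from the same data set.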
[0004] Caprioli has established a raster scan method to acquire
spatially resolved MALDI mass spectra of tissue sections. A tissue
section is prepared on a sample plate with a matrix layer and then
scanned with laser pulses of a focused laser beam in the x- and
y-directions, often with several hundred pixels in both directions.
In order to raster an entire tissue section, the sample plate is
moved by a stage along the x- and y-direction. Every pixel (focus
region of the laser beam) on the tissue section is irradiated at
least once in the imaging process, and usually ten to a hundred
times. The ions generated in the multiple MALDI processes are
analyzed in a mass analyzer, most often a time-of-flight mass
spectrometer with axial ion injection. The multiple mass spectra
acquired at a single pixel are added to a sum spectrum and the sum
spectrum is assigned to the pixel.
[0005] If the concentrations of compounds are sufficiently high in
the tissue section, the spatial distribution can be determined by
IMS. The tissue section is characterized by the spatial
distribution of compounds, i.e. by molecular information. The
compounds can be all kinds of biological substances, like proteins,
nucleic acids, lipids and sugars, or drugs. Chemical modifications
of compounds, in particular posttranslational modifications of
proteins and metabolites of drugs, can be determined across the
tissue section. In general, IMS generates spatially resolved mass
spectra and thus provides high content molecular information as
well as morphologic information, the latter at a limited spatial
resolution compared with the optical images.
[0006] According to Suckau et al. (U.S. Pat. No. 7,873,478 B2), the
spatial distribution of a tissue kind and state can be determined
by combining at least two different mass signals at each pixel with
predetermined mathematical or logical expressions to generate a
measure representing the tissue kind and state at that spot. The
different mass signals represent different compounds, i.e., two or
more different mass images are combined with predetermined
mathematical or logical expressions into a state image of the tissue
section. The state image is often displayed together with an
optical image of the tissue section.
[0007] Normalization is the process of multiplying (or dividing) a
mass spectrum with an intensity-scaling factor (normalization
factor f) to expand or reduce the range of the intensity axis. It
is used to compare mass spectra of varying intensity (Baggerly
2003, Morris 2005, Norris 2007, Smith 2006, Villanueva 2005, Wagner
2003, Wolski 2006, Wu 2003; see list at the end of the disclosure).
In general, a mass spectrum S is a vector of multiple intensity
values s.sub.i (i=1 . . . N) at corresponding masses m.sub.i. The
mass spectrum S is multiplied or divided by the normalization
factor to generate a normalized mass spectrum.
[0008] Intrinsic properties of a tissue and the preparation of a
tissue section for MALDI imaging may influence the normalization of
the acquired mass spectra and can lead to artifacts in normalized
mass images. For example, an inhomogeneous spatial distribution of
salts or endogenous compounds can suppress the formation of ions in
the MALDI process and lead to an inhomogeneous mass image of a
compound that is homogeneously distributed in the tissue section.
The mass signals of lipids being present in the tissue can be much
more intense than signals of peptides or proteins. Therefore, there
is risk that highly concentrated lipids suppress the formation of
peptide and protein ions.
[0009] Further, MALDI imaging requires the preparation of a matrix
layer on the tissue section. The properties of the matrix layer,
particularly the size of matrix crystals and their spatial
distribution on the tissue section, can affect mass signals of
compounds, like proteins, irrespective of their concentration in
the tissue section. That is of interest since the resolution of a
MALDI mass image can actually be higher than the size of the matrix
crystals. A contamination of the MALDI ion source can fade the
image brightness during the acquisition of the entire MALDI imaging
data set.
[0010] Besides using an optimized and stable preparation, the
influence of the tissue and its preparation on mass images can be
minimized by proper normalization. A failure to apply normalization
can also lead to artifacts in mass images. A normalization is also
required to compare mass spectra across different imaging data sets
in cohort studies, e.g., for biomarker discovery.
[0011] The most commonly used normalization procedures in mass
spectrometry are normalization on the total ion count (TIC) as well
as the vector norm. The TIC-norm and the vector norm are special
cases of the so called p-norm of a mass spectrum S:
$$ \|S\| = \left( \sum_i s_i^p \right)^{1/p} $$
[0012] For p=1, the normalization is based on the sum of all
intensity values s.sub.i in the mass spectrum S, which is equal to
the total ion count (TIC). The TIC-normalized mass spectra have the
same integrated area under the spectrum. The normalization factor
of the TIC norm is:
$$ f_{TIC} = \sum_i s_i $$
[0013] For p=2, the p-norm equals the vector norm. The
normalization factor of the vector norm is:
$$ f_{vector} = \sqrt{\sum_i s_i^2} $$
[0014] For p → ∞, the p-norm leads to the maximum norm,
in which the normalization is done on the most intense peak of
the mass spectrum (and which is sometimes used in LC-MS based
label-free approaches). The larger the exponent p becomes, the
higher the influence of high-intensity signals on the result of the
normalization becomes. This is also true for noise spectra. In the
maximum norm, the highest intensity value in a noise spectrum will
be the same as the highest intensity pixel of the most intense
signal of other spectra. Noise spectra are therefore considerably
amplified by increased p, and are therefore expected to be least
problematic in TIC normalization.
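The three special cases of the p-norm described above can be illustrated with a short Python sketch. The function names `p_norm` and `normalize` and the example intensities are assumptions made for illustration; absolute values are taken so that the sketch also tolerates negative intensities, which the formulas above do not address.

```python
import math


def p_norm(spectrum, p):
    """Return the p-norm of a list of intensity values (p >= 1)."""
    if math.isinf(p):
        # Limit p -> infinity: maximum norm (most intense value).
        return max(abs(s) for s in spectrum)
    return sum(abs(s) ** p for s in spectrum) ** (1.0 / p)


def normalize(spectrum, factor):
    """Divide every intensity value by the normalization factor."""
    if factor == 0:
        raise ValueError("normalization factor is zero")
    return [s / factor for s in spectrum]


spectrum = [3.0, 4.0, 0.0, 12.0]      # hypothetical intensity values
f_tic = p_norm(spectrum, 1)           # total ion count
f_vector = p_norm(spectrum, 2)        # vector (Euclidean) norm
f_max = p_norm(spectrum, math.inf)    # maximum norm
tic_normalized = normalize(spectrum, f_tic)
```

With increasing p the factor is dominated more and more by the largest value (12.0 in this example), which is the behavior described for the maximum norm.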
[0015] The TIC-normalization and the vector norm as well are based
on the assumption that a comparable number of signals is present
with more or less similar intensities in all mass spectra to be
normalized. This assumption is fulfilled for samples, like serum
samples or homogenized tissue samples, where only a few signal
intensities change against an otherwise constant background. In
mass spectrometric imaging data sets, one cannot trust that this
condition is met because different types of tissue (or cells) may
be present in the same tissue section. As a consequence, it is
possible to compare expression levels across samples for comparable
types of tissue after TIC normalization. However, the error can be
high when comparing expression levels between different types of
tissue expressing a heterogeneous set of compounds with quite
different spatial distributions. In certain cases, the TIC
normalization can produce misleading results and possibly lead to
wrong conclusions, e.g., regarding the spatial distribution of a
potential biomarker, drug or metabolite of a drug. This is typical
for tissues in which abundant signals are present in confined
areas, such as insulin in the pancreas or beta-amyloid peptides in
the brain. The question of whether or not MALDI imaging datasets
should be normalized, and the optimal model to do so, is still
the subject of intense debate at conferences or MALDI imaging
workshops.
[0016] In principle, every mass spectrometer analyzes ions
according to the ratio of their mass to the number of their
unbalanced elementary charges (m/z, also termed the "charge-related
mass"). Since MALDI is of particular importance for acquiring
spatially resolved mass spectra and provides only singly charged
ions, the term "mass" rather than "charge-related mass" will be
used below only for the sake of simplification. Spatially resolved
mass spectra of mass spectrometric imaging data sets can be
acquired with different kinds of mass spectrometers. At present,
time-of-flight mass spectrometers (TOF-MS) with axial ion injection
are mainly used for MALDI imaging, but time-of-flight mass
spectrometers with orthogonal ion injection, ion traps
(electrostatic or high frequency) or ion cyclotron resonance mass
spectrometers can also be used for this purpose.
SUMMARY
[0017] In accordance with the principles of the invention, mass
spectra of a mass spectrometric imaging data set are normalized in
a variety of methods and used to derive mass images which are
displayed or used for a further analysis. Each mass spectrum is
normalized by the p-norm of that mass spectrum. However, before the
p-norm is calculated, the spectrum is transformed in a
predetermined manner. The p-norm is most preferably the TIC (total
ion count) of the mass spectrum, but can be other normalization
functions.
[0018] In one embodiment, the mass spectrum is transformed by
applying an exclusion list before the p-norm is calculated.
[0019] In another embodiment, the mass spectrum is transformed by
square rooting the intensity values (square root intensity
transformation) before the p-norm is calculated.
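A minimal Python sketch of the square root intensity transformation followed by the TIC (p=1) is given below. The function name `sqrt_tic`, the example values and the clipping of negative intensities to zero are assumptions for illustration only; the sketch computes the normalization factor and leaves its application to the spectrum aside.

```python
import math


def sqrt_tic(spectrum):
    """TIC (p = 1) of the square-root-transformed intensity values."""
    # Assumption: negative intensities (e.g. after baseline subtraction)
    # are clipped to zero before the square root is taken.
    return sum(math.sqrt(max(s, 0.0)) for s in spectrum)


spectrum = [1.0, 4.0, 100.0]           # hypothetical: one dominant peak
f_tic = sum(spectrum)                  # plain TIC
f_sqrt = sqrt_tic(spectrum)            # TIC after square rooting

# Share of the dominant peak in each normalization factor.
share_plain = 100.0 / f_tic
share_sqrt = math.sqrt(100.0) / f_sqrt
```

The square root compresses large intensities, so the dominant peak contributes a smaller share to `f_sqrt` than to `f_tic`.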
[0020] In still another embodiment, the mass spectrum is
normalized by the median of the mass spectrum.
[0021] In yet another embodiment, the mass spectrum is normalized
by the median absolute deviation of the noise level of the mass
spectrum.
[0022] In this process, mass spectra of the mass spectrometric
imaging data set are preferably acquired by MALDI imaging. The
samples analyzed by MALDI imaging are preferably tissue sections,
but can also be plates of thin layer chromatography, gels of an
electrophoresis or blot membranes. The mass spectrometric data set
and thus mass images derived from the data set can cover the entire
sample or one or more regions of interest which can be
predetermined or selected by a user. The mass spectra to be
normalized can be any subset of the mass spectra of a mass
spectrometric imaging data set, e.g. every second mass spectrum in
one or both directions, or can be derived from the mass spectra of
a mass spectrometric imaging data set, e.g. by binning.
[0023] The artifacts introduced to mass images of a tissue section
by the TIC norm or the vector norm are usually a result of mass
signals with high intensity or large areas under the peak in
confined regions on the tissue section. These mass signals are
preferably incorporated into the exclusion list so that they do not
affect the subsequent p-normalization. The intensity values of the
mass spectrum S(x.sub.i, y.sub.j, m) at pixel (x.sub.i, y.sub.j)
are transformed by applying the exclusion list to the mass
spectrum; then the normalization factor f.sub.exclusion is
calculated from the transformed mass spectrum S̄(x.sub.i, y.sub.j,
m):
$$ \bar{S}(x_i, y_j, m) = \begin{cases} 0, & m_{lower} < m < m_{higher} \\ S(x_i, y_j, m), & \text{else} \end{cases} $$
[0024] wherein m.sub.lower and
m.sub.higher define the boundaries of a single mass range. The
exclusion list can in principle comprise two or more mass ranges
M.sub.n:
$$ \bar{S}(x_i, y_j, m) = \begin{cases} 0, & m \in M_1 \cup M_2 \cup \dots \cup M_n \\ S(x_i, y_j, m), & \text{else} \end{cases} $$
[0025] The normalization factor f.sub.exclusion is equal to:
$$ f_{exclusion} = \| \bar{S} \| $$
[0026] The mass spectra S are preferably normalized by the total
ion count of the transformed mass spectrum S̄. The exclusion list
can be defined by a user after an inspection of mass images
normalized by the TIC without an exclusion list in order to
identify one or more mass ranges of signals that lead to artifacts.
The user may start from an empty list or a predetermined exclusion
list and iteratively add (or remove) mass ranges to the exclusion
list. The mass ranges can be predetermined depending on the kind of
tissue.
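The normalization with an exclusion list can be sketched in Python as follows. The function names, the mass axis, the intensity values and the excluded mass range are hypothetical and serve only to illustrate the transformation; the TIC (p=1) is used as the p-norm.

```python
def apply_exclusion_list(masses, spectrum, exclusion_ranges):
    """Set intensities to zero where m_lower < m < m_higher for any range."""
    return [0.0 if any(lower < m < higher for lower, higher in exclusion_ranges)
            else s
            for m, s in zip(masses, spectrum)]


def tic_with_exclusion(masses, spectrum, exclusion_ranges):
    """TIC of the transformed spectrum, used as normalization factor."""
    return sum(apply_exclusion_list(masses, spectrum, exclusion_ranges))


masses = [5000.0, 5800.0, 5805.0, 7000.0]   # hypothetical mass axis
spectrum = [2.0, 120.0, 125.0, 1.0]         # aberrant peak near m = 5800
exclusion = [(5790.0, 5815.0)]              # hypothetical exclusion list

f_exclusion = tic_with_exclusion(masses, spectrum, exclusion)
# The factor is computed from the transformed spectrum but applied to
# the original spectrum, so the excluded peak itself remains visible.
normalized = [s / f_exclusion for s in spectrum]
```

Because the excluded range only affects the factor, a mass image of the aberrant signal can still be derived from the normalized spectra.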
[0027] Normalization does not have to be based on the peak areas or
maximum intensities of the mass signals, but can be also based on
the noise level n.sub.i of a mass spectrum. A normalization factor
f.sub.noise can for example be calculated by the median absolute
deviation of the noise level:
f.sub.noise=median(|n.sub.i-median(n.sub.i)|)
[0028] There are different ways to estimate the noise level n.sub.i
of a mass spectrum. Wavelet shrinkage, a signal de-noising
technique, is frequently used to smooth and denoise chromatograms
and mass spectra. It employs the universal thresholding method to
derive an estimate of the noise level in a spectrum. In this
method, the noise level n.sub.i is estimated from the detail
coefficients d.sub.i of the finest scale. The detail coefficients
d.sub.i of the finest scale can be determined without computing a
complete wavelet decomposition. In case of the Haar wavelet
decomposition, the detail coefficients d.sub.i are differences of
consecutive intensity values s.sub.i of the mass spectrum S:
$$ d_i = s_i - s_{i-1} $$
[0029] and the normalization factor f.sub.noise is:
f.sub.noise=median(|d.sub.i-median(d.sub.i)|)
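The noise-based factor can be sketched in Python directly from the two formulas above. The function name `noise_factor` and the example spectrum are assumptions; as in the formulas, the differences of consecutive intensity values are used without any additional wavelet scaling constant.

```python
from statistics import median


def noise_factor(spectrum):
    """Median absolute deviation of the differences d_i = s_i - s_(i-1)."""
    d = [spectrum[i] - spectrum[i - 1] for i in range(1, len(spectrum))]
    center = median(d)
    return median([abs(x - center) for x in d])


# Hypothetical spectrum: small fluctuations plus one intense peak (50.0).
spectrum = [1.0, 2.0, 1.0, 3.0, 1.0, 50.0, 2.0]
f_noise = noise_factor(spectrum)
```

In this example the differences are 1, -1, 2, -2, 49 and -48; their median is 0 and the median of the absolute deviations is 2, so the single intense peak has no influence on the factor.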
[0030] The calculation of the noise level n.sub.i can be affected
by operations like smoothing and especially binning, which are
often part of a MALDI imaging workflow. Normalization can also be
based on the median of the mass spectrum which shall be robust to
these preprocessing methods and is expected to be a measure for the
intensity of the baseline. Therefore, the normalization factor
f.sub.median is calculated by the median of the intensity values
s.sub.i of a measured mass spectrum S:
f.sub.median=median(s.sub.i)
[0031] Using both latter approaches it is possible to circumvent
the inherent dangers of the TIC normalization without the need of a
user intervention to provide an exclusion list.
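The robustness of the median against a single aberrant signal can be illustrated with a short Python sketch. The function name `median_factor` and both example spectra are hypothetical.

```python
from statistics import median


def median_factor(spectrum):
    """Normalization factor f_median: median of all intensity values."""
    return median(spectrum)


# Hypothetical spectra with the same baseline; one has an aberrant peak.
normal_spectrum = [2.0, 1.0, 3.0, 2.0, 2.0]
islet_spectrum = [2.0, 1.0, 3.0, 125.0, 2.0]

f_normal = median_factor(normal_spectrum)
f_islet = median_factor(islet_spectrum)

# For comparison: the TIC of the same two spectra.
tic_normal = sum(normal_spectrum)
tic_islet = sum(islet_spectrum)
```

Both medians are equal, whereas the TIC of the spectrum with the aberrant peak is more than ten times larger, so only the TIC normalization would scale the two spectra differently.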
[0032] In a second embodiment, the invention provides a method for
normalizing mass spectra of a mass spectrometric imaging data set,
wherein a first mass image is derived from the normalized mass
spectra according to the first aspect of the invention, each mass
spectrum is additionally normalized by a p-norm (preferably by the
total ion count) without applying the exclusion list, a second
normalized mass image is derived from the additionally normalized
mass spectra, and the additionally normalized mass spectra are used
if the first and second normalized images are substantially
similar.
[0033] The mass images can be compared by a user in order to
determine the similarity between them. A similarity comparison can
also be performed by known image comparison algorithms for the
entire images or only for one or more regions of interest, e.g. by
correlating the entire images or corresponding regions, by
comparing coefficients of a Fourier transform or wavelet transform
or by calculating and comparing statistical characteristics (mean,
median, variance). The regions of interest used for the comparison
can be overlapping or disjoint.
[0034] In a third embodiment, the invention provides a method for
normalizing mass spectra of a mass spectrometric imaging data set,
comprising the steps: [0035] (a) calculating first normalization
factors for each mass spectrum by one of: the p-norm of the mass
spectrum transformed by applying an exclusion list, the p-norm of
the mass spectrum transformed by square rooting the intensity
values (square root intensity transformation), the median of the
mass spectrum, and the median absolute deviation of the noise level
of the mass spectrum, [0036] (b) calculating second normalization
factors for each mass spectrum by a p-norm without an exclusion
list, and [0037] (c) normalizing the mass spectra by the
corresponding second normalization factors, if the first and second
normalization factors match in a statistical test, and otherwise by
the corresponding first normalization factors.
[0038] The p-norm in steps (a) and (b) is most preferably the total
ion count of the mass spectrum. In a preferred embodiment, the
statistical test is a correlation, e.g. a Pearson correlation. The
normalization factors match if the correlation coefficient is
preferably greater than 0.8, more preferably greater than 0.9 for
increased certainty. In another embodiment, the statistical test is
a chi-square analysis of the distributions of the calculated
normalization factors.
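The selection between the two sets of normalization factors by a Pearson correlation can be sketched in Python as follows. The function names, the example factors and the handling of the threshold as a parameter are assumptions for illustration; only the threshold value of 0.8 is taken from the text above. The sketch does not handle the degenerate case of constant factors, for which the correlation is undefined.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def choose_factors(alternative, plain_pnorm, threshold=0.8):
    """Use the plain p-norm factors if both sets match, else the alternative."""
    r = pearson(alternative, plain_pnorm)
    return (plain_pnorm if r > threshold else alternative), r


# Hypothetical per-pixel factors; pixel 3 contains an aberrant peak.
f_median = [2.0, 2.1, 1.9, 2.0, 2.2]
f_tic_aberrant = [10.0, 10.5, 9.5, 80.0, 11.0]
f_tic_regular = [10.0, 10.5, 9.5, 10.0, 11.0]

chosen_a, r_a = choose_factors(f_median, f_tic_aberrant)
chosen_b, r_b = choose_factors(f_median, f_tic_regular)
```

In the first case the aberrant pixel destroys the correlation and the median-based factors are kept; in the second case the factors are proportional, so the plain TIC factors are used.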
[0039] The methods according to the invention can be used to
determine and visualize the spatial distribution of compounds in a
tissue. First, a mass spectrometric imaging data set of a tissue
section is acquired. Second, the mass spectra of the mass
spectrometric imaging data set are normalized by a method according to
the invention. Third, a mass image is derived from the
normalized mass spectra and displayed.
BRIEF DESCRIPTION OF THE DRAWINGS
[0040] FIGS. 1A to 1E show different conventional images of a piece
of rat hippocampus at 20 .mu.m lateral resolution: FIG. 1A shows an
optical image of the unstained tissue section prior to the
measurement. FIG. 1B shows an optical image of the matrix
distribution after preparation. FIG. 1C shows a mass image of a
selected mass signal without normalization. FIG. 1D is an overlay
of FIGS. 1B and 1C. FIG. 1E shows a mass image of the selected mass
signal after normalization by the vector norm.
[0041] FIG. 2 shows averaged mass spectra acquired at one islet of
Langerhans (dashed line, with an aberrant insulin peak) and at a
"normal" area (solid line) of a mouse pancreas.
[0042] FIGS. 3A to 3F show mass images of a mouse pancreas for
insulin (FIGS. 3A to 3C) and a housekeeping protein in the pancreas
(FIGS. 3D to 3F) with no normalization, normalization on vector
norm and normalization on the total ion count (TIC).
[0043] FIG. 4 shows an optical image after hematoxylin and eosin
staining (H&E staining) of a tissue section of a rat
testis.
[0044] FIG. 5 shows averaged mass spectra acquired at a certain
tubule with an aberrant peak (dashed line) and at a "normal" tubule
(solid line) of the rat testis.
[0045] FIGS. 6A to 6F show mass images of a compound that is
homogeneously distributed in the rat testis applying different
normalizations.
[0046] FIGS. 7A to 7F show mass images of a compound that is only
present in certain tubuli of the rat testis after applying
different normalizations.
[0047] FIG. 8 shows a mass spectrum of low intensity wherein only
the very upper part of the baseline is recorded or even only
electronic spikes are present in the spectrum.
[0048] FIGS. 9A1 to 9D3 show mass images of three different
compounds of the rat testis (peak 1, peak 2 and peak 3) after
applying the TIC-norm (Figures Ax), the TIC-norm with an exclusion
list (Figures Bx), the TIC-norm after a logarithmic intensity
transformation (Figures Cx) and the TIC-norm after a square root
transformation (Figures Dx).
[0049] FIGS. 10A1 to 10C3 show histograms of three uniformly
distributed compounds of the rat testis after applying the TIC-norm
with an exclusion list (Figures Ax), the TIC-norm after a square
root intensity transformation (Figures Bx) and the TIC-norm after a
logarithmic intensity transformation (Figures Cx).
[0050] FIGS. 11A to 11F show distributions and correlation
coefficients of normalization factors calculated from the rat
testis dataset.
DETAILED DESCRIPTION
[0051] While the invention has been shown and described with
reference to a number of embodiments thereof, it will be recognized
by those skilled in the art that various changes in form and detail
may be made herein without departing from the spirit and scope of
the invention as defined by the appended claims.
[0052] The examples below show that normalization improves the
amount of information extracted from mass spectrometric imaging
data sets, especially for MALDI imaging when the lateral resolution
approaches the level of the inhomogeneities of the matrix layer.
The same may be true when other factors are present that influence
the overall intensities of the measured mass spectra, such as
different salt or lipid concentrations.
[0053] It is necessary to understand that certain assumptions are
made on the data for all normalization approaches, e.g. that the
integrated area of all peaks in the mass spectra should be
comparable (in case of normalization on the TIC), that the overall
intensities of the peaks should be rather similar (in case of the
vector norm), that the noise level or median baseline should be
similar for all peaks. In mass spectrometry-based serum profiling,
where normalization on the TIC is usually used, it is assumed that
only a few mass signals change throughout the dataset and that the
majority of mass signals are constant. In the case of MALDI imaging
of tissue sections, this assumption is often not justified because
different protein profiles may be present in different regions of
the tissue. If no normalization is applied, other assumptions are
made on the data, namely that there are no effects such as
inhomogeneous matrix layers or disturbing salt or lipid
concentrations. The question whether any normalization at all or
which normalization is warranted can be answered by determining
which of the assumptions is most true.
[0054] As shown in the examples below, it may be necessary to
perform normalization on mass spectrometric imaging data sets to
get access to the true histological distribution of compounds,
especially if the resolution of the MALDI imaging is comparable
with the size of the matrix structures (crystals). However, if the
known normalization on the TIC-norm or the vector norm is applied
to mass spectra of MALDI imaging data sets of tissue sections, the
mass images derived from normalized mass spectra can show strong
artifacts. These artifacts result from an inhomogeneous
distribution of compounds in the tissue section leading to aberrant
mass signals with unusually high intensities or integrated areas
and are particularly dangerous for the interpretation of the data,
because they can accidentally reflect real histological differences
in the tissue. It can be further observed that the normalization on
the TIC is less prone to artifacts compared to the normalization on
the vector norm.
[0055] The manual exclusion of the aberrant mass signals from
calculating normalization factors solves the problem and results in
mass images that reflect a true distribution of compounds. However,
the disadvantage of this most reliable approach is that it normally
requires manual interaction with the data. This requires that both
the presence of the problem and those signals causing the problems
have to be identified first. The presence of the problem can be
spotted by the appearance of "holes" in the distribution of the
noise or in the mass images of abundant (homogeneously distributed)
mass signals. The aberrant signals can easily be spotted by looking
into mass spectra at those regions.
[0056] The normalization on the median and the noise level are
robust against the presence of aberrant mass signals. The mass
images obtained with these normalizations look less smooth than those
obtained with normalization on the TIC with an exclusion list. However, they do
not require a manual interaction and are more robust. Therefore,
they can be considered as preferred for a primary normalization.
The normalization on the median and on the noise level gives
similar results. Since the normalization on the median is less
influenced by common processing steps in MALDI imaging such as
binning or spectra smoothing, the normalization on the median is
the most robust approach.
EXAMPLES
[0057] For the examples below, the work flow for acquiring a MALDI
imaging data set of a tissue sample comprises the following steps:
[0058] (a) A tissue sample is cut into cryosections with a
cryo-microtome. The tissue sections with a thickness of 10 .mu.m
are transferred onto conductive Indium-Tin-Oxide coated glass
slides, vacuum-dried in a desiccator for a few minutes, and washed
two times in 70% ethanol and once in 96% ethanol for one minute
each. Subsequently, the sections are dried and stored under vacuum
until the matrix is applied. [0059] (b) The tissue sections are
coated with a matrix by vaporizing a matrix solution with an
ultrasonic nebulizer, for instance, according to U.S. Pat. No.
7,667,196 B2 (Schurenberg) and US 2008/0142703 A1 (Schurenberg).
[0060] (c) Spatially resolved mass spectra of the coated tissue
sections are acquired by a time-of-flight mass spectrometer in the
linear mode. For each pixel, 200 laser shots are accumulated at
constant laser energy.
[0061] There are different ways to overlay an optical image of a
tissue section with a mass image of the same or adjacent tissue
section. Here, the MALDI imaging data set is acquired prior to the
optical image. The matrix layer applied to the tissue section in
step (b) is removed after the mass spectrometric image has been
acquired in step (c). Then the tissue section is subjected to
routine histologic staining, and the optical image is acquired.
Example 1
[0062] The dataset of example 1 covers a small region of a rat
brain, containing part of the hippocampus. The MALDI imaging
dataset was acquired at a lateral resolution of 20 .mu.m with a
CHCA matrix (alpha-Cyano-4-hydroxy-cinnamic acid). At this
resolution, the structure of the matrix crystals tends to be in the
same order of magnitude as the lateral resolution. A non-normalized
image will therefore be an overlay of the matrix structure with the
distribution of the selected compound.
[0063] FIGS. 1A to 1E show different images of a tissue section of
a rat hippocampus. FIG. 1A shows an optical image of an unstained
tissue section prior to the preparation of a matrix layer. FIG. 1B
shows an optical scan of the matrix layer after preparation. FIG.
1C shows a mass image of a selected compound without any
normalization. In FIG. 1D, the optical image of the matrix layer of
FIG. 1B is overlaid with the mass image of FIG. 1C, showing that the
spatial distribution of the selected mass signal is strongly affected
by the structure of the matrix layer. FIG. 1E shows the same mass
signal after normalization using the vector norm. It can be clearly
seen that the distribution of the mass signal now appears much
smoother. The mass image follows the histological structure of the
tissue section much better and shows a rather uniform distribution
outside the hippocampus.
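The normalization used for FIG. 1E can be illustrated by a minimal sketch. It assumes that the vector norm is the p-norm with p = 2 and that the TIC corresponds to p = 1; the function name and the two example pixels are hypothetical and serve only to show how a pixel-wise scale difference caused by the matrix layer cancels out.

```python
import numpy as np

def normalize_by_p_norm(spectra, p=2.0):
    """Divide each spectrum (one row per pixel) by its p-norm.

    Assumption: p=1 corresponds to the TIC, p=2 to the vector norm.
    """
    spectra = np.asarray(spectra, dtype=float)
    norms = np.sum(np.abs(spectra) ** p, axis=1) ** (1.0 / p)
    # Leave all-zero spectra unchanged instead of dividing by zero.
    safe = np.where(norms > 0, norms, 1.0)
    return spectra / safe[:, None]

# Two hypothetical pixels with the same composition; the second one
# yields twice the signal, e.g. because of a thicker matrix layer.
pixels = np.array([[3.0, 4.0, 0.0],
                   [6.0, 8.0, 0.0]])
normalized = normalize_by_p_norm(pixels, p=2.0)
```

After normalization both rows are identical, so the intensity difference that only reflects the matrix structure no longer appears in the mass image.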
Example 2
[0064] The dataset of example 2 is acquired from a tissue section
of a mouse pancreas. The islets of Langerhans in the mouse pancreas
are small glands in which insulin, glucagon and certain other
peptide hormones are produced and secreted. The tissue section of
the mouse pancreas is coated with sinapinic acid matrix.
[0065] FIG. 2 shows averaged mass spectra from an islet of
Langerhans (dashed line, with an aberrant insulin peak) and a
different region of similar size (solid line) of the mouse
pancreas. The intensities of insulin peaks are extremely high
compared to other protein signals, while the remaining non-insulin
signals show similar intensities in both regions. The insulin peaks
reach intensities of up to 125 counts per laser shot, while the
other signals are on the order of 1-2 counts per shot. This is an
example of a spectrum in which one highly abundant peak is present
only in confined regions, which is a particular problem for
prior-art normalization methods.
[0066] FIG. 3 shows mass images of insulin (FIGS. 3A to 3C) and of a
homogeneously distributed protein (FIGS. 3D to 3F), each without
normalization, with normalization on the vector norm and with
normalization on the TIC. It becomes apparent that the
normalization on the vector norm leads to obvious artifacts. Both
the spatial distribution and the intensity of the insulin signal
appear inflated in the islets, while the homogeneously distributed
protein appears to be absent. In contrast, the TIC-normalization is
in a better agreement with the raw data. Only in one islet of
Langerhans, an attenuation "hole" appears in the mass image of the
homogeneously distributed protein. When the TIC-normalization is
used with exclusion of the insulin signal (not shown), no holes are
present in the mass image of the homogeneously distributed
protein.
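The effect of excluding the insulin signal from the TIC can be sketched as follows. The m/z values, the excluded range and the function name are hypothetical; only the intensity ratio of about 125 to 1 is taken from the description of FIG. 2.

```python
import numpy as np

def tic_with_exclusion(mz, spectra, excluded_ranges):
    """Normalize each spectrum by its TIC computed outside the excluded m/z ranges."""
    mz = np.asarray(mz, dtype=float)
    spectra = np.asarray(spectra, dtype=float)
    keep = np.ones(mz.shape, dtype=bool)
    for low, high in excluded_ranges:
        keep &= ~((mz >= low) & (mz <= high))
    tic = spectra[:, keep].sum(axis=1)
    # Guard against spectra with no signal outside the exclusion list.
    safe = np.where(tic > 0, tic, 1.0)
    return spectra / safe[:, None]

# Hypothetical three-channel spectra: an islet pixel with a dominant
# peak in the middle channel and a pixel from another region.
mz = np.array([5000.0, 5800.0, 7000.0])
spectra = np.array([[1.0, 125.0, 1.0],
                    [1.0, 1.0, 1.0]])

plain = tic_with_exclusion(mz, spectra, [])                    # ordinary TIC
excluded = tic_with_exclusion(mz, spectra, [(5700.0, 5900.0)])  # peak excluded
```

With the ordinary TIC, the homogeneous signal in the first channel is strongly attenuated in the islet pixel (the "hole"); with the exclusion list, it has the same normalized intensity in both pixels, while the dominant peak itself is still reported.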
Example 3
[0067] The dataset of example 3 is acquired from a tissue section
of a rat testis. There are seminiferous tubuli present in rat
testis, in which the stem cells (spermatogonia) undergo maturation
to mature spermatids. In a rat, 14 different stages can be defined.
This process is highly structured and can appear at different
stages of maturation in the same cross section.
[0068] The MALDI imaging dataset was acquired at a lateral
resolution of 20 µm with a CHCA matrix
(alpha-Cyano-4-hydroxy-cinnamic acid). The high spatial resolution
is needed to resolve substructures in the tubuli. The drawback of
CHCA matrix in linear mode is that it leads to quite broad mass
signals.
[0069] FIG. 4 shows a microscopic image after H&E staining of a
tissue section of a rat testis. The optical image is obtained after
MALDI imaging and shows the same region of the rat testis as the
mass images of FIGS. 6A to 6F and FIGS. 7A to 7F. In the tissue
section analyzed here, the cross-section through a blood vessel
(41), cross-sections of seminiferous tubules (42) and the
interstitium (43) are visible. The maturation of the spermatids
takes place in the tubules. Different tubules can have different
maturation states with differing molecular signals. In this tissue
section, there is a group of tubules characterized by one aberrant
mass signal at about 6263 Dalton (FIG. 5). This mass signal is not
as intense as that of insulin in the previous example, but due to
its width it accounts for a comparatively large share of the total
area of the spectrum. Therefore it also affects the normalization
on the TIC-norm.
[0070] Importantly, the highly abundant mass signals of the mouse
pancreas and the rat testis are related to real histological
structures (islets of Langerhans and immature tubuli). In cases
like these, a normalization artifact can therefore easily be
mistaken for biologically meaningful information. A compound that
is present at the same abundance across the entire tissue may show
a tissue-specific distribution in a normalized mass image, which
might be misinterpreted as regulation during spermatid maturation
in the case of rat testis.
[0071] FIGS. 6A to 6F show mass images of a compound that is
homogeneously distributed in the tissue section of the rat testis
except for the blood vessel. The non-normalized mass image (FIG.
6A) shows mainly the spatial distribution of the matrix layer
overlaid with the real distribution. Both the normalization on the
vector norm and the TIC-norm (FIGS. 6B and 6C) produce the same
kind of artifact, namely a spurious down-regulation of the respective
mass signal in some of the tubuli. Again, this artifact is
dangerous, because it shows a spatial distribution that is in
agreement with histology.
[0072] In FIGS. 6D to 6F, mass images normalized according to the
invention are shown: a normalization on the TIC with exclusion of
the aberrant mass signal (FIG. 6D), the normalization on the median
(FIG. 6E) and on the noise level (FIG. 6F). The mass images of the
homogeneously distributed compound look almost identical for the
median and the noise level and do not produce the artificial down
regulation. The normalization on the TIC with manual exclusion of
the aberrant mass signal, however, shows the smoothest distribution.
This is consistently found for other masses as well.
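The normalizations on the median and on the noise level can be sketched as below. This is an illustration under stated assumptions: the noise level is approximated here by the median absolute deviation (MAD) of all intensity values of a spectrum, which may differ from the estimator actually used; the function names and the example spectra are hypothetical.

```python
import numpy as np

def median_factors(spectra):
    """Normalization factor per spectrum: the median of its intensities."""
    return np.median(np.asarray(spectra, dtype=float), axis=1)

def mad_factors(spectra):
    """Normalization factor per spectrum: the median absolute deviation.

    Assumption: the MAD of all intensities stands in for the noise level.
    """
    spectra = np.asarray(spectra, dtype=float)
    center = np.median(spectra, axis=1, keepdims=True)
    return np.median(np.abs(spectra - center), axis=1)

# Hypothetical spectra: the second pixel has twice the baseline of the
# first, and the first carries an aberrant peak in channel 5.
spectra = np.array([
    [1.0, 2.0, 3.0, 2.0, 1.0, 100.0, 1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0, 4.0, 2.0, 20.0, 2.0, 4.0, 6.0],
])
by_median = spectra / median_factors(spectra)[:, None]
by_noise = spectra / mad_factors(spectra)[:, None]
```

Because both factors are robust statistics, the single aberrant peak barely influences them, and the remaining channels of the two pixels coincide after normalization.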
[0073] FIGS. 7A to 7F show mass images of a compound that is mainly
present in certain tubuli. However, it is also present in some of
those tubuli showing the aberrant mass signal, and the respective
mass signals are therefore attenuated.
[0074] By applying TIC normalization with exclusion of the aberrant
signal (FIG. 7D) or normalization on median (FIG. 7E) and on the
noise level (FIG. 7F), respectively, the mass signal is most
abundant in the seminiferous tubules but still visible in the
interstitium. As described above, normalization using the TIC-norm
with manual exclusion is least affected by the distribution of the
matrix crystals and shows the least noisy image. Without any
normalization, it is not possible to detect the characteristic
presence of this signal in the interstitium.
[0075] Ideally, a mass spectrum contains a complete baseline with
symmetric noise. This is actually one of the implicit assumptions
of normalization on the noise level or the median. There are
different reasons why this is not always true. For example, there
may be very little matrix at a certain region, or part of the
tissue may not have adhered properly to the support, or the
detector settings of the instrument may cut off the lower part of
the baseline. In such a case it is possible to observe spectra such
as the one shown in FIG. 8, where only the very upper part of the
baseline is recorded or even only electronic spikes are present in
the spectrum. Such mass spectra have a negative effect on many
normalization approaches, because they have an artificially low
TIC, noise level and median. The latter two can actually be zero or
very close to zero, which artificially inflates the intensities of
such spectra after normalization. If the median or noise level is
zero, the normalization is undefined because of a division by zero.
Therefore these mass spectra have to be excluded from
normalization.
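The exclusion of such spectra can be sketched as follows for the normalization on the median. The threshold `min_median`, the function name and the choice to mark excluded pixels with NaN are assumptions made for illustration only.

```python
import numpy as np

def normalize_on_median_with_exclusion(spectra, min_median=1e-9):
    """Normalize on the median, skipping spectra whose median is (near) zero.

    Returns the normalized spectra (excluded rows are NaN) and a boolean
    mask of the spectra that were actually normalized.
    """
    spectra = np.asarray(spectra, dtype=float)
    medians = np.median(spectra, axis=1)
    usable = medians > min_median  # hypothetical cut-off
    normalized = np.full(spectra.shape, np.nan)
    normalized[usable] = spectra[usable] / medians[usable][:, None]
    return normalized, usable

# Hypothetical data: a regular spectrum and one that contains only an
# electronic spike on an otherwise cut-off baseline.
spectra = np.array([[1.0, 2.0, 3.0, 2.0, 1.0],
                    [0.0, 0.0, 0.0, 5.0, 0.0]])
normalized, usable = normalize_on_median_with_exclusion(spectra)
```

The second spectrum has a median of zero and is therefore left out rather than being divided by zero or inflated by a tiny factor.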
[0076] If a particular mass signal can be matched (according to
mass) in two or more mass spectra from different tissue areas, its
signal intensity is an estimate of the abundance of a compound.
These estimates might contain errors resulting from random noise,
different signal-to-noise ratios due to varying concentrations of
the compound or electronic noise. The error can depend on the
intensity. Any statistical model would either directly account for
variances or would transform the data so that the variances are
approximately equal for all peak intensity levels. Here, two
different intensity transformations are applied prior to a
normalization by the TIC-norm of the transformed mass spectra,
namely the square root and the logarithmic transformation of the
intensity values.
[0077] FIGS. 9A1 to 9D3 show mass images of three different
compounds (peak 1, peak 2 and peak 3) after normalization applying
TIC-normalization (Figures Ax), TIC-normalization with an exclusion
list (Figures Bx), TIC-normalization after logarithmic intensity
transformation (Figures Cx) and TIC-normalization after square root
intensity transformation (Figures Dx).
[0078] As can be seen in FIGS. 9C1 to 9C3, the logarithmic
transformation leads to a "flat" look of the normalized mass images
with little structure, which makes this normalization unsuitable
for MALDI imaging. The few "bright" pixels in the mass
images are a result of applying the logarithmic transformation to
mass spectra with an incomplete baseline as described above. The
square root transformation (shown in FIGS. 9D1 to 9D3) leads to
structured mass images, which show features similar to those of the
TIC based normalization. Moreover, the square root transformation shows
only very slight artifacts compared to the TIC based normalization.
The resulting mass images show less dynamic range, which may be a
problem in the assessment of relative intensity differences in a
dataset.
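The intensity transformations can be illustrated by the sketch below. It assumes non-negative intensities, assumes that the transformed intensities themselves are divided by the TIC of the transformed spectrum, and uses log(1 + x) as one possible logarithmic transformation that remains defined at zero; the function name and the example spectra are hypothetical.

```python
import numpy as np

def tic_after_transform(spectra, transform=np.sqrt):
    """Transform the intensities, then normalize by the TIC of the result."""
    spectra = np.asarray(spectra, dtype=float)
    transformed = transform(spectra)  # assumes non-negative intensities
    tic = transformed.sum(axis=1)
    safe = np.where(tic > 0, tic, 1.0)
    return transformed / safe[:, None]

# Hypothetical pixels: the first contains an aberrant peak of 100 counts,
# the second does not; channel 0 is a homogeneously distributed signal.
spectra = np.array([[1.0, 1.0, 100.0, 1.0],
                    [1.0, 1.0, 1.0, 1.0]])

plain = tic_after_transform(spectra, transform=lambda x: x)
rooted = tic_after_transform(spectra, transform=np.sqrt)
logged = tic_after_transform(spectra, transform=np.log1p)

# Apparent intensity ratio of the homogeneous signal between the pixels.
attenuation_plain = plain[1, 0] / plain[0, 0]
attenuation_rooted = rooted[1, 0] / rooted[0, 0]
```

In this example the homogeneous signal differs by a factor of 25.75 between the two pixels after plain TIC normalization, but only by a factor of 3.25 after the square root transformation, which reflects the smaller influence of the aberrant peak.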
[0079] FIGS. 10A1 to 10C3 show histograms of three uniformly
distributed mass signals after normalization applying the TIC-norm
with an exclusion list (Figures Ax), the TIC-norm after a square
root intensity transformation (Figures Bx) and the TIC-norm after a
logarithmic intensity transformation (Figures Cx). These mass
signals show a skewed distribution with a tail to the high
intensities after the TIC normalization (FIGS. 10A1 to 10A3). Only
a few pixels show the highest intensities. To see the true
structure of the data it is often necessary to set the maximum
intensity threshold to a value between 50% and 70% of the maximum
intensity. After the square root transformation (FIGS. 10B1 to
10B3), these signals show a much more symmetric distribution. The
logarithmic transformation (FIGS. 10C1 to 10C3) results in a very
narrow distribution with a very long tail which leads to the flat
appearance of the mass images shown in FIGS. 9C1 to 9C3.
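Setting the maximum intensity threshold for display can be sketched as a simple clipping step. The function name, the default fraction of 0.6 and the example image are assumptions; the source only states a range of 50% to 70% of the maximum intensity.

```python
import numpy as np

def scale_for_display(image, fraction=0.6):
    """Clip a mass image at a fraction of its maximum and rescale to 0..1."""
    image = np.asarray(image, dtype=float)
    upper = fraction * image.max()
    if upper <= 0:
        # Nothing to display in an empty image.
        return np.zeros_like(image)
    return np.clip(image, 0.0, upper) / upper

# Hypothetical 2 x 2 mass image with one very bright pixel.
image = np.array([[1.0, 2.0],
                  [3.0, 10.0]])
display = scale_for_display(image, fraction=0.5)
```

The few pixels in the high-intensity tail are saturated, so that the remaining pixels use a larger part of the display range.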
[0080] In many IMS datasets the described problems do not appear.
In such cases, the normalization with the TIC-norm can be applied
without restriction. Because TIC-normalization seems to be superior
if applicable, it is desirable to have an automatic algorithm to
detect if TIC normalization is applicable. The correlation of the
normalization factors calculated by the median or noise level with
the ones calculated by the TIC-norm can be one way to achieve an
automatic testing.
[0081] FIG. 11 shows several correlations for the data set of the
rat testis. Since the normalization on the TIC with exclusion of
aberrant mass signals is most preferred, it is used as standard
method for the comparison. With the exception of the square root
transformation (FIG. 11F), the best correlation was observed with
the median normalization factors (FIG. 11C). Therefore, it appears
to be possible to use the correlation of a non-parametric
normalization, like median or noise level, with the
TIC-normalization without exclusion to define a threshold for the
automatic detection of problems with the TIC-normalization.
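Such an automatic test can be sketched as a correlation of the normalization factors followed by a threshold. The Pearson correlation, the threshold value of 0.9, the function names and the example data are assumptions for illustration; the source does not specify them.

```python
import numpy as np

def factor_correlation(spectra):
    """Pearson correlation between TIC and median normalization factors."""
    spectra = np.asarray(spectra, dtype=float)
    tic = spectra.sum(axis=1)
    median = np.median(spectra, axis=1)
    return np.corrcoef(tic, median)[0, 1]

def tic_seems_applicable(spectra, threshold=0.9):
    """Hypothetical decision rule: accept TIC normalization if the factors agree.

    A NaN correlation (e.g. constant factors) is treated as not applicable.
    """
    return bool(factor_correlation(spectra) >= threshold)

# Hypothetical data set: four pixels that differ only in overall scale.
base = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
clean = np.array([1.0, 2.0, 3.0, 4.0])[:, None] * base

# The same pixels, with an aberrant peak added in the first two pixels.
aberrant = clean.copy()
aberrant[:2, 2] += 100.0
```

For the clean data the two sets of factors are perfectly correlated, whereas the aberrant peak changes the TIC but not the median, so the correlation drops and the check fails.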
[0082] Applied to MALDI imaging data sets of tissue sections,
common normalization based on the vector norm and the TIC-norm can
lead to artifacts. However, a normalization is necessary to deal
with spatial inhomogeneities of the matrix layer. Although the
normalization on the noise level, the median or the TIC after
square root transformation can be used to get normalized mass
images without artifacts, TIC normalization with a manual exclusion
of mass signals causing the artifacts gives the best results. This
approach often needs a manual intervention by the user.
[0083] In any case, care is needed when TIC normalization (without
an exclusion list) is applied. The median normalization can be used
as an additional tool to spot artifacts generated by TIC
normalization. The comparison of the images after TIC normalization
and median normalization is a good way to test the applicability of
TIC normalization. If this comparison shows substantial differences
in the resulting normalized mass images then TIC normalization
should not be applied.
* * * * *