Process for analyzing protein samples Parker, Kenneth C. ; et al. [Aebersold, Rudolf H.]

Patent Application Summary

U.S. patent application number 09/851058 was filed with the patent office on 2001-05-08 and published on 2002-12-19 for process for analyzing protein samples. Invention is credited to Aebersold, Rudolf H., Huang, Yulin, Nadler, Timothy K., Parker, Kenneth C., Smolka, Marcus B., Vella, George J.

Application Number	20020192720 09/851058
Family ID	25309855
Publication Date	2002-12-19

United States Patent Application	20020192720
Kind Code	A1
Parker, Kenneth C. ; et al.	December 19, 2002

Process for analyzing protein samples

Abstract

Methods using gel electrophoresis and mass spectrometry for the rapid, quantitative analysis of proteins or protein function in mixtures of proteins derived from two or more samples in one unit operation are disclosed. In one embodiment the method includes (a) preparing an extract of proteins from each of at least two different samples; (b) providing a set of substantially chemically identical and differentially isotopically labeled protein reagents, one for each sample; (c) reacting each protein sample of step (a) with a different reagent from the set of step (b) to provide isotopically labeled proteins; (d) mixing each of said isotopically labeled proteins to form a single mixture of different isotopically labeled proteins; (e) electrophoresing the mixture of step (d) by an electrophoresing method capable of separating proteins within said mixture; and (f) detecting the difference in the expression levels of the proteins in the two samples by mass spectrometry based on individual peptides derived from chemical or enzymatic digestion. The analytical method can be used for qualitative and particularly for quantitative analysis of global protein expression profiles in cells and tissues, i.e. the quantitative analysis of proteomes.

Inventors:	Parker, Kenneth C.; (Hopkinton, MA) ; Nadler, Timothy K.; (Framingham, MA) ; Vella, George J.; (Medway, MA) ; Huang, Yulin; (Westwood, MA) ; Aebersold, Rudolf H.; (Mercer Island, WA) ; Smolka, Marcus B.; (Sao Paulo, BR)
Correspondence Address:	Chief Patent Counsel PerSeptive Biosystems, Inc. 500 Old Connecticut Path Framingham MA 01701 US
Family ID:	25309855
Appl. No.:	09/851058
Filed:	May 8, 2001

Current U.S. Class:	435/7.9 ; 436/517
Current CPC Class:	G01N 33/6803 20130101; Y10T 436/24 20150115; Y10T 436/25125 20150115; Y10S 435/964 20130101
Class at Publication:	435/7.9 ; 436/517
International Class:	G01N 033/53; G01N 033/542; G01N 033/557

Claims

We claim:

1. A method of comparing protein compositions of interest between at least two different samples which comprises: (a) preparing an extract of proteins from each of said at least two different samples; (b) providing a set of substantially chemically identical and differentially isotopically labeled protein reagents, one for each sample, wherein said reagent has a formula selected from the group consisting of: A-L-PRG and L-PRG, wherein A is an affinity label that selectively binds to a capture reagent, L is a linker group in which one or more atoms are differentially labeled with one or more stable isotopes and PRG is a protein reactive group that selectively reacts with a given protein functional group or is a substrate for an enzyme; (c) reacting each protein sample of step (a) with a different reagent from said set of step (b) to provide isotopically labeled proteins; (d) mixing each of said isotopically labeled proteins to form a single mixture of different isotopically labeled proteins; (e) electrophoresing the mixture of step (d) by an electrophoresing method capable of separating proteins within said mixture; and (f) detecting the difference in the expression levels of the proteins in the two samples by mass spectrometry based on individual peptides derived from chemical or enzymatic digestion.

2. The method of claim 1 wherein said reagent has the formula: A-L-PRG and affinity tagged proteins in the samples are enzymatically or chemically processed to convert them into labeled peptides.

3. The method of claim 1 wherein said reagent has the formula: L-PRG and labeled proteins in the samples are enzymatically or chemically processed to convert them into labeled peptides.

4. The method of any one of claims 1, 2 or 3 wherein the protein or peptide portion of one or more of the labeled proteins is sequenced by tandem mass spectrometry to identify the labeled protein from which the peptide originated.

5. The method of any one of claims 1, 2 or 3 wherein the proteins are identified by peptide mass fingerprinting, and the isotopically labeled peptides are used for quantitation.

6. The method of any one of claims 1, 2 or 3 in which the amount of one or more proteins or peptides in the samples is also determined by mass spectrometry and which further comprises the step of introducing into a sample a known amount of one or more internal standards for each of the proteins to be quantified.

7. The method of any one of claims 1, 2 or 3 wherein the released isotopically labeled proteins or peptides are separated by chromatography prior to detection by mass spectrometry.

8. The method of claims 1, 2 or 3 where the samples consist of protein mixtures derived from tissues, cells, biological fluids including serum, cerebrospinal fluid, urine, ascites, or subcellular fractions including supernatants and various membrane-containing organelles or nuclear preparations, or protein preparations separated by chromatographic methods, capillary electrochromatography or capillary electrophoresis methods.

9. The method of claims 1, 2 or 3 where the proteins are identified by any protein staining technique, or where protein-containing regions are localized by mass spectrometry following systematic digestion and extraction or any combination of transblotting and digestion.

10. The method of any one of claims 1, 2 or 3 in which a plurality of proteins or peptides in one sample are detected and identified.

11. The method of any one of claims 1, 2 or 3 further comprising a step in which one or more of the proteins or peptides in a sample are chemically or enzymatically processed to expose a functional group that can react with a label.

12. The method of any one of claims 1, 2 or 3 wherein PRG is a protein reactive group that selectively reacts with certain protein functional groups and a plurality of proteins or peptides are detected and identified in a single sample.

13. The method of claim 12 wherein two or more substantially chemically identical and differentially isotopically labeled protein reactive reagents having different specificities for reaction with proteins or peptides are provided and reacted with each sample to be analyzed.

14. The method of claim 13 wherein all of the proteins or peptides in a sample are detected and identified.

15. The method of any one of claims 1, 2 or 3 wherein the relative amounts of one or more proteins or peptides in two or more different samples are determined and which further comprises the steps of combining the differentially labeled samples, capturing isotopically labeled components from the combined samples and measuring the relative abundances of the differentially labeled proteins or peptides.

16. The method of claim 1, 2 or 3 which determines the relative amounts of membrane proteins in one or more different samples.

17. The method of claim 15 in which different samples contain proteins originating from different organelles or different subcellular fractions.

18. The method of claim 15 in which different samples represent proteins or peptides expressed in response to different environmental or nutritional conditions, different chemical or physical stimuli or at different times.

19. The method of claim 1 wherein absolute protein concentration is deduced by comparison to a known amount of a deuterated or non-deuterated peptide standard, where this standard was derived by chemical synthesis or was isolated from biological samples.

20. The method of claim 1 whereby multiple samples are labeled with PRG containing different numbers of heavy atoms so that multiple samples can be separated on a single gel and analyzed at one time.

21. The method of claim 1 whereby proteins of special interest that are previously known to be particularly informative are analyzed based on their location on a 1D or 2D gel. These proteins can include disease markers as well as control proteins.

22. The method of claim 1 whereby the post-translational modification status of particular proteins is monitored by gel analysis.

Description

FIELD OF THE INVENTION

[0001] This invention relates to a process for detecting differences in protein composition between complex protein samples such as cell lysates, cell extracts, or tissue extracts. More particularly this invention relates to a process for analyzing protein compositions using gel electrophoresis utilizing at least two labeled reagents capable of detecting such differences.

BACKGROUND OF THE INVENTION

[0002] Two dimensional (2D) electrophoresis has long been a mainstay in the quantitative analysis of complex mixtures of proteins, as from cell lysates or organelles. The traditional approach for quantifying proteins is to perform image analysis of the gels. The proteins can be detected by staining the proteins, by autoradiography, or even by using antibodies specific for certain proteins (Western blotting). Although powerful software has been developed to quantify the amount of protein that migrates to a spot in a gel, there is a limit to how much information can be obtained by such analyses even if the gels are perfectly reproducible and even if the software for spot analysis is able to resolve ambiguities of overlapping spots and uneven backgrounds. Recently, mass spectrometric techniques were described in published PCT International Application WO 00/11208 in which stable isotopes are incorporated into peptides derived from each protein, bypassing the need for gels and for image analysis of any kind, because quantitation is performed by a mass spectrometer. However, when proteins are digested ahead of time, almost all information relating to protein chemical modification is lost, and the quantitative information for different proteins that share the detected peptide is combined together.

[0003] Proteins are essential for the control and execution of virtually every biological process. The rate of synthesis and the half-life of proteins and thus their expression level are also controlled post-transcriptionally. Furthermore, the activity of proteins is frequently modulated by post-translational modifications, in particular protein phosphorylation, and dependent on the association of the protein with other molecules including DNA and proteins. Neither the level of expression nor the state of activity of proteins is therefore directly apparent from the gene sequence or even the expression level of the corresponding mRNA transcript. It is therefore highly desirable that a complete description of a biological system include measurements that indicate the identity, quantity and the state of activity of the proteins which constitute the system. The large-scale (ultimately global) analysis of proteins expressed in a cell or tissue has been termed proteome analysis. Proteome analysis permits the detection and monitoring of differences in cell structure, function and development. The capability of determining differences in protein content between normal cells and abnormal cells such as cancerous cells is a valuable diagnostic tool.

[0004] At present no protein analytical technology approaches the throughput and level of automation of presently available genomic technology. The most common implementation of proteome analysis is based on the separation of complex protein samples most commonly by 2D gel electrophoresis (2DE) and the subsequent sequential identification of the separated protein species, typically by mass spectrometry. This approach has been revolutionized by the development of powerful mass spectrometric techniques and the development of computer algorithms which correlate protein and peptide mass spectral data with sequence databases and thus rapidly and conclusively identify proteins. This technology has reached a level of sensitivity which now permits the identification of essentially any protein which is detectable by conventional protein staining methods including silver staining. In the 2DE/MS$^n$ method, proteins are quantified by densitometry of stained spots in the 2DE gels, followed by mass spectrometry (MS), tandem mass spectrometry (MSMS or MS$^2$), or multiple rounds of mass spectrometry (MS$^n$). Alternatively, the staining step can be omitted, and the proteins can be detected by mass spectrometry, for example, by analyzing extracts of every slice from a 1D gel, or from every piece of a 2D gel, or by scanning membranes onto which digests from such gels have been deposited by transblotting (Bienvenut et al., Anal. Chem. 71:4800-4807, 1999).

[0005] In gel electrophoresis, proteins can be separated into individual components according to differences in mass by electrophoresing a protein mixture in a polyacrylamide gel under denaturing conditions. One dimensional and two dimensional gel electrophoresis have become standard tools for studying proteins. One dimensional SDS (sodium dodecyl sulfate) electrophoresis through a cylindrical or slab gel reveals only the major proteins present in a sample tested. Two dimensional polyacrylamide gel electrophoresis (2D PAGE), which separates proteins by isoelectric focusing, i.e., by charge, in one dimension and by size in the second dimension, provides higher resolving power, which is important when there are many proteins in the sample. The proteins migrate in one- or two-dimensional gels as bands or spots, respectively. The separated proteins are visualized by a variety of methods, such as by staining with a protein specific dye, by protein mediated silver precipitation, by autoradiographic detection of radioactively labeled protein, and by covalent or non-covalent attachment of fluorescent compounds. Immediately following the electrophoresis, the resulting gel patterns may be visualized by eye, photographically or by electronic image capture, for example, by using a cooled charge-coupled device (CCD). To compare samples of proteins from different cells or different stages of cell development by conventional methods, each different sample is presently run on separate lanes of a one dimensional gel or on separate two dimensional gels. Comparison is by visual examination or electronic imaging, for example, by computer-aided image analysis of digitized one or two dimensional gels. The goal of such research is often to determine which proteins out of the hundreds of proteins that can be detected have changed in expression level between a control sample and one or more experimental samples.

[0006] Two dimensional gel electrophoresis has been a powerful tool for resolving complex mixtures of proteins. The differences in migration between the proteins, however, can be subtle. Imperfections in the gel can interfere with accurate observations. In order to minimize the imperfections, the gels provided in commercially available electrophoresis systems are prepared with exacting precision. Even with meticulous controls, no two gels are identical. The gels may differ one from the other in pH gradients or uniformity. In addition, the electrophoresis conditions from one run to the next may be different. Computer software has been developed for automated alignment of different gels. However, all of the software packages are based on linear expansion or contraction of one or both of the dimensions on two dimensional gels. The software has difficulty adjusting for local distortions in the gels. The ideal way to overcome such limitations is to combine the two samples prior to gel electrophoresis, assuming the two samples can be distinguished from one another at the analysis stage.

[0007] It has been proposed in U.S. Pat. Nos. 6,043,025 and 6,127,134 to provide a process for analyzing protein compositions from at least two samples wherein one sample is stained with a first dye and a second sample is stained with a second dye. The samples then are separated either by a 1D or 2D gel electrophoresis process to effect protein separation into a plurality of spots. A spot of interest then is analyzed to determine the difference in luminescent intensity of the dyes thereby to determine protein concentration from each sample. The camera is able to distinguish between the two dyes by the wavelengths of the emitted light, although dynamic range can be compromised due to a small amount of spectral overlap between the dyes. For this quantitation to be precise, the two species of proteins must migrate to exactly the same spot, ideally the same position as the unmodified protein. In some instances, only a small proportion of the protein is initially stained with the dyes. If there is any separation of stained from unstained proteins, then some fluorescent proteins may co-migrate with unrelated unstained proteins, resulting in misleading identifications in cases in which the protein is identified post electrophoresis.

[0008] The development of methods and instrumentation for automated, data-dependent electrospray ionization (ESI) tandem mass spectrometry (MS$^n$) in conjunction with microcapillary liquid chromatography (µLC) and database searching has significantly increased the sensitivity and speed of the identification of gel-separated proteins. As an alternative to the 2DE/MS$^n$ approach to proteome analysis, the direct analysis by tandem mass spectrometry of peptide mixtures generated by the digestion of complex protein mixtures has been proposed (Ducret et al., Prot. Sci. 7:706-719, 1998). Tandem µLC/MSMS has also been used successfully for the large-scale identification of individual proteins directly from mixtures without gel electrophoretic separation (Yates et al., Methods Mol. Biol., 146: 17-26, 2000; Link et al., Nat. Biotechnol. 17:676-82, 1999; Opitek et al., Anal. Chem. 64: 1518-1524, 1997). While these approaches dramatically accelerate protein identification, the absolute or relative quantities of the analyzed proteins cannot be easily determined, and these methods have not been shown to substantially alleviate the dynamic range problem also encountered by the 2DE/MSMS approach (Gygi et al., Proc. Natl. Acad. Sci. USA 17:9390-5, 2000). Therefore, low abundance proteins in complex samples are also difficult to analyze by the µLC/MSMS method without their prior enrichment.

[0009] An alternative to quantifying proteins in complex mixtures after SDS PAGE or 2D PAGE on the basis of staining intensity using conventional protein stains or fluorescent stains is to use protein stains to localize the regions of interest. Following proteolytic digestion, the peptides may then be labeled with stable isotopes, for example with deuterated nicotinoyloxysuccinimide (Munchbach, Quadroni, Miotto and James, Anal. Chem. A, 2000), which allows mass spectrometry to be used for quantitation. This approach suffers from the drawback that the protein ratio obtained is dependent on how carefully the spots are excised from the gel. Also, the control and the experimental sample must be run on separate gels.

[0010] Alternatively, isotopically labeled amino acid precursors may be introduced specifically into one of the two samples prior to proteolytic digestion (Sechi and Chait, Anal. Chem., 24:5150-8, 1998, Chen, Smith and Bradbury, Anal. Chem. 72: 1134-1143, 2000). This approach suffers from the drawback that the proteins must be isolated from culture conditions that allow close to complete replacement of the unlabeled amino acid precursors by the labeled precursors, or the intensity of each peptide will be spread out over a larger isotope cluster than usual, compromising both sensitivity and quantitation.

[0011] Recently, an approach was developed involving isotope coded affinity tags (ICAT™) that combines the incorporation of stable isotopes into the cysteine-containing peptides of proteins with the ability to affinity purify these modified peptides and to subsequently detect the proteins by mass spectrometry (Gygi et al., Nat Biotechnol., 17:994-9, 1999). Reagents useful in carrying out this method are commercially available from Applied Biosystems (Foster City, Calif.) under the ICAT™ brand. Because proteins typically have a small number of cysteine residues, it becomes possible to identify large numbers of proteins by focusing on a small subset of the peptides that are generated upon proteolytic digestion, making it possible to penetrate further into the proteome without being overwhelmed by large numbers of peptides from the most abundant proteins. Because the quantitation is performed by mass spectrometry, two or more samples can be combined together prior to analysis, so that artifactual sample processing differences do not affect the results so long as they take place after cysteine modification.
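The pairing of light and heavy labeled peptides in a mass spectrum can be illustrated with a minimal Python sketch. The function name, the peak list, the 8.05 Da mass shift and the 0.02 m/z tolerance are all hypothetical values chosen for illustration; none of them is specified in this document.

```python
def find_label_pairs(peaks, mass_shift, charge=1, tolerance=0.02):
    """Find light/heavy peak pairs separated by the expected label spacing.

    peaks      : list of (m/z, intensity) tuples (hypothetical peak list)
    mass_shift : mass difference between heavy and light label, in Da (assumed)
    charge     : assumed charge state of the peptide ions
    tolerance  : allowed deviation from the expected m/z spacing (assumed)
    Returns a list of (light_mz, heavy_mz, heavy/light intensity ratio).
    """
    spacing = mass_shift / charge
    ordered = sorted(peaks)
    pairs = []
    for light_mz, light_intensity in ordered:
        if light_intensity <= 0:
            continue  # a ratio cannot be formed without a light signal
        for heavy_mz, heavy_intensity in ordered:
            if abs((heavy_mz - light_mz) - spacing) <= tolerance:
                pairs.append((light_mz, heavy_mz, heavy_intensity / light_intensity))
    return pairs


# Illustrative peak list: one labeled pair plus one unrelated peak.
example_peaks = [(1000.50, 400.0), (1008.55, 800.0), (1200.00, 100.0)]
example_pairs = find_label_pairs(example_peaks, mass_shift=8.05)
```

In this sketch the single detected pair carries a heavy-to-light ratio of 2, which would be read as a two-fold difference between the two samples for the protein that produced that peptide.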

[0012] There are, however, several limitations to the previously described ICAT reagent based technology that in certain cases limit the information that can be obtained from the experiment. The cysteine containing peptides should be sufficiently long to uniquely identify proteins (or classes of homologous proteins). Because each peptide is separately purified, MS$^n$ techniques are often used to identify the protein from which the peptide was derived, instead of the simpler peptide mass fingerprinting (PMF) technique. No information is retained about the intact molecular weight of the protein(s) from which the cysteine-containing peptide was derived, or whether the protein was chemically modified by phosphorylation. Finally, no information is obtained from proteins that do not contain cysteine.

[0013] The present invention combines mass spectrometric quantitation with the resolving power of 2D electrophoresis so that differences in protein compositions from two or more samples containing complex mixtures can be determined from a single 2D gel. This extension to the current state of ICAT reagent technology overcomes each of the foregoing limitations. Proteins are modified by using the same ICAT reagent technology as before. However, all the advantages of protein separation by 2D gels are preserved. Although analysis of the ICAT reagent labeled peptides themselves usually leads to no information about the chemical modification of the protein from which they derived, the position of the protein on the gel is indicative of whether the protein was modified. Also, the chemically modified peptides themselves are present in the same spot, thus the ICAT reagent labeled peptides can still be used for quantitation of the relative amounts of each of the modified species. In addition, ICAT reagent containing peptides of any length are now informative because any one spot contains very few proteins. This also makes it possible to use PMF to identify the proteins, including any non-cysteine containing proteins that may be present at the same spot on the gel. These techniques still allow simultaneous processing of two or more samples such as those obtained from an experimental and a control sample. This same combination of technologies is also applicable to less resolving gel systems like 1D SDS PAGE gel analysis, 1D isoelectric focusing gels and the like.

SUMMARY OF THE INVENTION

[0014] This invention provides methods based upon 1D and 2D gel electrophoresis and mass spectrometry for the rapid, quantitative analysis of proteins or protein function in mixtures of proteins derived from two or more samples in one unit operation. Thus, only one gel must be performed in order to deduce which proteins have changed in expression level between the experimental sample and the control sample because the quantitation is determined by mass spectrometry. The analytical method can be used for qualitative and particularly for quantitative analysis of global protein expression profiles in cells and tissues, i.e. the quantitative analysis of proteomes. The method can also be employed to screen for and identify proteins whose expression level in cells, tissue or biological fluids is affected by a stimulus (e.g., administration of a drug or contact with a potentially toxic material), by a change in environment (e.g., nutrient level, temperature, passage of time) or by a change in condition or cell state (e.g., disease state, malignancy, site-directed mutation, gene knockouts) of the cell, tissue or organism from which the sample originated. The proteins identified in such a screen can function as markers for the changed state. For example, comparisons of protein expression profiles of normal and malignant cells can result in the identification of proteins whose presence or absence is characteristic and diagnostic of the malignancy.

[0015] The methods herein can also be used to implement a variety of clinical and diagnostic analyses to detect the presence, absence, deficiency or excess of a given protein or protein function in a biological fluid (e.g., blood), or in cells or tissue. The method is particularly useful in the analysis of complex mixtures of proteins, i.e., those containing 5 or more distinct proteins or protein functions. This method can also be used to look for absolute, quantitative changes if specific calibrated standards are labeled.

[0016] As with the techniques described in the aforementioned published PCT patent application (WO 00/11208), the present invention employs an isotopically labeled protein reactive reagent, which can be either affinity labeled or non-affinity labeled, that allows for the selective isolation of peptide fragments from complex mixtures. First, the control and the experimental sample(s) are labeled separately with different isotopic variants of the ICAT reagent, and are then combined. Separation of the protein components of the two or more samples is effected by either 1D or 2D gel electrophoresis followed by protein digestion. The isolated peptide fragments or reaction products are characteristic of the presence of a protein in those mixtures. Isolated peptides are characterized by mass spectrometric (MS) techniques. The most abundant proteins may be identified by peptide mass fingerprinting. Alternatively, the sequence of isolated peptides can be determined using tandem MS (MS$^n$) techniques, and by application of presently available sequence database searching techniques, the protein from which the sequenced peptide originated can be identified. The reagents utilized in the process of this invention provide for differential isotopic labeling of the isolated peptides that facilitates quantitative determination by mass spectrometry of the relative amounts of proteins in different samples. Also, the use of differentially isotopically labeled reagents as internal standards of known concentration facilitates quantitative determination of the absolute amounts of one or more proteins or reaction products present in the sample.
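The relative and absolute quantitation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the use of a median to combine peptide ratios, and the example intensities are hypothetical and are not prescribed by this document.

```python
from statistics import median


def relative_abundance(peptide_pairs):
    """Combine heavy/light ratios of the labeled peptides from one spot or band.

    peptide_pairs: list of (light_intensity, heavy_intensity) tuples
    (hypothetical input). The median is an assumed way of combining the
    per-peptide ratios; it is not specified in the text.
    """
    ratios = [heavy / light for light, heavy in peptide_pairs if light > 0]
    if not ratios:
        raise ValueError("no usable peptide pairs")
    return median(ratios)


def absolute_amount(analyte_intensity, standard_intensity, standard_amount):
    """Scale the known amount of an internal standard by the intensity ratio."""
    if standard_intensity <= 0:
        raise ValueError("standard intensity must be positive")
    return standard_amount * analyte_intensity / standard_intensity


# Illustrative values only.
example_ratio = relative_abundance([(1000.0, 2000.0), (500.0, 1100.0), (800.0, 1500.0)])
example_amount = absolute_amount(analyte_intensity=300.0,
                                 standard_intensity=600.0,
                                 standard_amount=10.0)
```

With these example numbers the protein is about twice as abundant in the heavy-labeled sample, and an analyte signal half as intense as a 10-unit standard corresponds to 5 units of analyte.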

[0017] In general, the affinity labeled protein reactive reagents utilized in the process of this invention have three portions: an affinity label (A) covalently linked to a protein reactive group (PRG) through a linker group (L):

A-L-PRG

[0018] The linker may be differentially isotopically labeled, e.g., by substitution of one or more atoms in the linker with a stable isotope thereof. For example, hydrogen atoms can be substituted with deuterium atoms or $^{12}$C with $^{13}$C.

[0019] The non-affinity labeled protein reactive reagents utilized in the process of this invention have two portions: a protein reactive group (PRG) and a linker group (L):

L-PRG

[0020] which are as defined above.

[0021] The affinity label A functions as a molecular handle that selectively binds, covalently or non-covalently, to a capture reagent (CR). Binding to CR facilitates isolation of peptides labeled with A. In specific embodiments, A is a streptavidin or avidin. After affinity isolation of affinity tagged materials, some of which may be isotopically labeled, the interaction between A and the capture reagent is disrupted or broken to allow MS analysis of the isolated materials. The affinity label, when utilized, can be displaced from the capture reagent by addition of a displacing ligand, which may be free A or a derivative of A, or by changing solvent (e.g., solvent type or pH) or temperature conditions. Alternatively, the linker may be cleaved chemically, enzymatically, thermally or photochemically to release the isolated materials for MS analysis.

[0022] The types of PRG specifically provided herein include those groups that selectively react with a protein functional group to form a covalent or non-covalent bond, tagging the protein at specific sites. In specific embodiments, PRG is a group having specific reactivity for certain protein groups, such as specificity for sulfhydryl groups, and is useful in general for selectively tagging proteins in complex mixtures. A sulfhydryl specific reagent tags proteins containing cysteine.

[0023] Exemplary reagents useful in the process of this invention have the general formula

$$\text{A}-\text{B}^{1}-\text{X}^{1}-(\text{CH}_{2})_{n}-[\text{X}^{2}-(\text{CH}_{2})_{m}]_{x}-\text{X}^{3}-(\text{CH}_{2})_{p}-\text{X}^{4}-\text{B}^{2}-\text{PRG}$$

[0024] where:

[0025] A is optionally present and is the affinity label;

[0026] PRG is the protein reactive group;

[0027] X$^1$, X$^2$, X$^3$ and X$^4$, independently of one another, and X$^2$ independently of other X$^2$ in the linker group, can be selected from O, S, NH, NR, NRR'$^+$, CO, COO, COS, S-S, SO, SO$_2$, CO-NR', CS-NR', Si-O, aryl or diaryl groups or X$^1$-X$^4$ may be absent, but preferably at least one of X$^1$-X$^4$ is present;

[0028] B$^1$ and B$^2$, independently of one another, are optional moieties that can facilitate bonding of the A or PRG group to the linker or prevent undesired cleavage of those groups from the linker and can be selected, for example, from COO, CO, CO-NR', CS-NR' and may contain one or more CH$_2$ groups alone or in combination with other groups, e.g., (CH$_2$)$_q$-CONR', (CH$_2$)$_q$-CS-NR', or (CH$_2$)$_q$;

[0029] n, m, p and q are whole numbers that can have values from 0 to about 100, preferably one of n, m, p or q is not 0 and x is also a whole number that can range from 0 to about 100 where the sum of n+xm+p+q is preferably less than about 100 and more preferably less than about 20;

[0030] R is an alkyl, alkenyl, alkynyl, alkoxy or aryl group; and

[0031] R' is a hydrogen, an alkyl, alkenyl, alkynyl, alkoxy or aryl group.
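The numerical preferences stated for the indices n, m, p, q and x can be checked with a short Python sketch. The function name and the returned report format are hypothetical, and the phrase "about 100" is treated here as a hard upper limit of 100, which is a simplifying assumption.

```python
def check_linker(n, m, p, q, x, hard_max=100):
    """Report whether linker indices meet the stated preferences.

    Treats "0 to about 100" as the closed range 0..hard_max (an assumption).
    """
    indices = {"n": n, "m": m, "p": p, "q": q, "x": x}
    for name, value in indices.items():
        if not isinstance(value, int) or not 0 <= value <= hard_max:
            raise ValueError(
                f"{name} must be a whole number from 0 to {hard_max}, got {value!r}"
            )
    total = n + x * m + p + q  # the sum n + xm + p + q from the text
    return {
        "total": total,
        # "preferably one of n, m, p or q is not 0"
        "has_nonzero_index": any(v != 0 for v in (n, m, p, q)),
        # "preferably less than about 100"
        "preferred": total < 100,
        # "more preferably less than about 20"
        "more_preferred": total < 20,
    }


# Illustrative index values only.
example_report = check_linker(n=2, m=2, p=2, q=0, x=3)
```

For the example values the sum is 2 + 3·2 + 2 + 0 = 10, so the linker would fall inside both the preferred and the more preferred range.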

[0032] One or more of the CH$_2$ groups of the linker can be optionally substituted with small (C$_1$-C$_6$) alkyl, alkenyl, or alkoxy groups, an aryl group or can be substituted with functional groups that promote ionization, such as acidic or basic groups or groups carrying permanent positive or negative charge. One or more single bonds connecting CH$_2$ groups in the linker can be replaced with a double or a triple bond. Preferred R and R' alkyl, alkenyl, alkynyl or alkoxy groups are small, having 1 to about 6 carbon atoms.

[0033] One or more of the atoms in the linker can be substituted with a stable isotope to generate one or more substantially chemically identical, but isotopically distinguishable reagents. For example, one or more hydrogens in the linker can be substituted with deuterium to generate isotopically heavy reagents.
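The mass difference produced by such isotopic substitution can be estimated with a small Python sketch. The function names and the choice of eight deuterium atoms in the example are hypothetical; the isotope masses are standard monoisotopic values rounded to eight decimal places.

```python
# Monoisotopic mass differences in Da (heavy isotope minus light isotope).
ISOTOPE_MASS_DELTA = {
    "D": 2.01410178 - 1.00782503,   # deuterium minus protium
    "13C": 13.00335484 - 12.0,      # carbon-13 minus carbon-12
}


def label_mass_shift(substitutions):
    """Mass shift of a heavy reagent relative to its light counterpart.

    substitutions: dict mapping an isotope key ("D" or "13C") to the number
    of substituted atoms in the linker (hypothetical input format).
    """
    return sum(ISOTOPE_MASS_DELTA[isotope] * count
               for isotope, count in substitutions.items())


def mz_spacing(mass_shift, charge, labels_per_peptide=1):
    """Expected m/z spacing between light and heavy forms of one peptide."""
    if charge < 1:
        raise ValueError("charge must be at least 1")
    return mass_shift * labels_per_peptide / charge


# Illustrative case: a linker carrying eight deuterium atoms (assumed number).
example_shift = label_mass_shift({"D": 8})
example_spacing = mz_spacing(example_shift, charge=2)
```

Under these assumptions the heavy reagent is roughly 8.05 Da heavier than the light one, and a doubly charged peptide carrying a single label would appear as a pair of peaks about 4.03 m/z units apart.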

[0034] In an exemplary embodiment the linker contains groups that can be cleaved to remove the affinity tag. If a cleavable linker group is employed, it is typically cleaved after affinity tagged peptides have been isolated using the affinity label together with the CR. In this case, any isotopic labeling in the linker preferably remains bound to the protein or peptide.

[0035] Linker groups include among others: ethers, polyethers, ether diamines, polyether diamines, diamines, amides, polyamides, polythioethers, disulfides, silyl ethers, alkyl or alkenyl chains (straight chain or branched and portions of which may be cyclic), aryl, diaryl or alkyl-aryl groups. Aryl groups in linkers can contain one or more heteroatoms (e.g., N, O or S atoms).

[0036] In one aspect, the invention provides a gel electrophoresis mass spectrometric method for identification and quantitation of one or more proteins in a complex mixture which employs affinity labeled reagents in which the PRG is a group that selectively reacts with certain amino acids or derivatives of amino acids that are typically found in proteins (e.g., sulfhydryl, amino, carboxy, homoserine lactone groups). Labeled reagents that optionally can contain an affinity label and with different PRG groups are introduced into a mixture containing proteins and the reagents react with certain proteins to tag them. In each case, it is necessary either to obtain stoichiometric protein modification with the isotope labeled reagent, or to modify the isotope labeled reagent so that the protein migrates homogeneously on the gel system to be employed. It may be necessary to pretreat the protein mixture to reduce disulfide bonds or otherwise facilitate labeling. After reaction with the labeled reagents, the multiple samples are combined, preferably in equal amounts, and the proteins in the complex mixture separated by either 1D or 2D gel electrophoresis. The gel is then stained to reveal the location of the proteins. The area of the gel containing the protein mixture or mixtures of interest is then excised and cleaved, e.g., enzymatically, into a number of peptides, or the gel is sliced uniformly so that all pieces can be analyzed. Alternatively, the proteins may be electroblotted to a membrane, and digestion performed on the membrane. As a third alternative, the proteins may be continuously eluted from the bottom of the gel and collected as fractions, followed by digestion. This digestion step may not be necessary, if the proteins are relatively small. After the peptides are purified, the protein(s) may be identified by means of peptide mass fingerprinting (PMF). 
When utilizing a reagent labeled with an affinity label, peptides that remain tagged with the affinity label are then isolated by an affinity isolation step, e.g., affinity chromatography, via their selective binding to the CR. Isolated peptides are released from the CR by displacement of A or cleavage of the linker, and released materials are analyzed by liquid chromatography/mass spectrometry (LC/MS). When a non-affinity labeled reagent is utilized, this affinity isolation step is not effected. The sequence of one or more tagged peptides is then determined by MSMS techniques, if necessary. In some cases, at least one peptide sequence derived from a protein will be characteristic of that protein and be indicative of its presence in the mixture. In other cases, the isotopically labeled peptide may be too short to uniquely identify a protein, and the use of PMF data may be necessary to identify the protein of origin. In other cases, the isotopically labeled peptides may be identical within a family of closely related proteins, which can then be distinguished by PMF or by MSMS analysis of other peptides present in the mixture that are unique to specific proteins. Finally, the high resolving power of 2D gel electrophoresis makes it possible to distinguish between different chemically modified forms of the same protein coding sequence, even if these proteins overlap in space with other unrelated proteins. Thus, the sequences of the peptides and the peptide mass fingerprint information together typically provide sufficient information to identify one or more proteins present in a mixture, even if the sequence of the isotopically labeled peptide is not sufficiently informative by itself.
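As a rough illustration of the peptide mass fingerprinting (PMF) step described above, the following Python sketch digests candidate protein sequences in silico with trypsin-like specificity and counts how many observed peptide masses each candidate explains. The function names, the toy sequences and the 0.2 Da tolerance are assumptions made for illustration only and are not taken from this disclosure; the mass added by a labeling reagent on cysteine is ignored.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if is_last or (residue in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def pmf_score(observed_masses, sequence, tolerance=0.2):
    """Number of observed masses matched by a theoretical tryptic peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_masses
    )


def best_match(observed_masses, database, tolerance=0.2):
    """Name of the database entry that explains the most observed masses."""
    return max(
        database,
        key=lambda name: pmf_score(observed_masses, database[name], tolerance),
    )


# Hypothetical two-entry database and a simulated fingerprint of protein_B.
TOY_DATABASE = {
    "protein_A": "MKTAYIAKQRCLSEGK",
    "protein_B": "MSDNGPRWLVCEKAAHR",
}
observed = [peptide_mass(p) for p in tryptic_peptides(TOY_DATABASE["protein_B"])]
print(best_match(observed, TOY_DATABASE))
```

A practical search would additionally allow for missed cleavages, modified residues and a scoring scheme that accounts for database size.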

[0037] The relative amounts of proteins in one or more different samples containing protein mixtures (e.g., biological fluids, cell or tissue lysates, etc.) can be determined using chemically identical but differentially isotopically labeled reagents. These reagents may, but need not, contain an affinity tag. In this method, each sample to be compared is treated with a different isotopically labeled reagent to label certain proteins therein. Tagged peptides originating from different samples are distinguished from one another by their mass, even though they have the same chemical composition. Peptides characteristic of their protein origin are identified using MS or MS.sup.n techniques allowing identification of proteins in the samples. The relative amount of a given protein in each sample is determined by comparing the relative abundance of the ions generated from any differentially labeled peptides originating from that protein. The method can be used to assess simultaneously the relative amounts of known proteins that originated in different samples. Further, since the method does not require any prior knowledge of the type of proteins that may be present in the samples, it can be used to identify proteins which are present at different levels in the samples examined. More specifically, the method can be applied to screen for and identify proteins which exhibit differential expression in cells, tissue or biological fluids. It is also possible to determine the absolute amounts of specific proteins in a complex mixture. In this case, a known amount of internal standard, one for each specific protein in the mixture to be quantified, is added to the sample to be analyzed. The internal standard is a peptide that is identical in chemical structure to the labeled peptide to be quantified except that the internal standard carries a different isotopic label from the peptide to be quantified. 
The internal standard can be provided in the sample to be analyzed in other ways. For example, a specific protein or set of proteins can be chemically tagged with an isotopically labeled reagent. A known amount of this material can be added to the sample to be analyzed. Also, it is possible to quantify the levels of specific proteins in multiple samples in a single analysis (multiplexing). In this case, the affinity tagging reagents used to derivatize the proteins present in the different samples are differentially isotopically labeled, so that labeled peptides from different samples can be selectively quantified by mass spectrometry.
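The arithmetic behind the relative and absolute quantitation described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, peak intensities are taken as already background-corrected, and averaging the per-peptide ratios to obtain a protein-level ratio is one simple choice rather than a procedure prescribed by this disclosure.

```python
def relative_abundance(intensity_sample, intensity_reference):
    """Sample-to-reference ratio for one differentially labeled peptide pair."""
    if intensity_reference <= 0:
        raise ValueError("reference intensity must be positive")
    return intensity_sample / intensity_reference


def protein_ratio(peptide_pairs):
    """Mean ratio over all (sample, reference) pairs assigned to one protein.

    Averaging is an illustrative assumption, not a step stated in the text.
    """
    ratios = [relative_abundance(s, r) for s, r in peptide_pairs]
    return sum(ratios) / len(ratios)


def absolute_amount(intensity_analyte, intensity_standard, standard_amount):
    """Amount of analyte peptide, given a known spiked amount of an
    isotopically distinct internal standard of identical structure."""
    if intensity_standard <= 0:
        raise ValueError("standard intensity must be positive")
    return standard_amount * intensity_analyte / intensity_standard


# Hypothetical intensities (arbitrary units) and a 10 pmol internal standard.
print(relative_abundance(3000.0, 1500.0))
print(protein_ratio([(200.0, 100.0), (300.0, 100.0)]))
print(absolute_amount(500.0, 250.0, 10.0))
```

Because the analyte and its internal standard are chemically identical, the intensity ratio is taken here to equal the molar ratio, which is the premise on which the internal standard approach rests.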

[0038] The method of the present invention provides for quantitative measurement of specific proteins in biological fluids, cells or tissues and can be applied to determine global protein expression profiles in different cells and tissues. The same general strategy can be broadened to achieve the proteome-wide, qualitative and quantitative analysis of the state of modification of proteins, by employing labeled reagents with differing specificity for reaction with modified amino acid residues. The method of this invention can be used to identify low abundance proteins in complex mixtures and can be used to selectively analyze specific groups or classes of proteins such as membrane or cell surface proteins, or proteins contained within organelles, sub-cellular fractions, or biochemical fractions such as immunoprecipitates. Further, these methods can be applied to analyze differences in expressed proteins in different cell states. For example, the methods herein can be employed in diagnostic assays for the detection of the presence or the absence of one or more proteins indicative of a disease state, such as cancer.

BRIEF DESCRIPTION OF THE DRAWINGS

[0039] FIG. 1 is an image of a 2D gel onto which five different standard proteins had been loaded, with insets of mass spectra showing the regions that contained ICAT.TM. reagent pairs in accordance with the present invention. Also listed is the ratio at which the proteins were mixed prior to electrophoresis, and the ratio that was obtained upon measurement of the intensities of the ICAT reagent pairs.

[0040] FIG. 2 is an expanded view of the spot for lactalbumin, segmented into quadrants. Also shown are the regions of a mass spectrum containing one ICAT reagent pair, and the intensity ratio that was determined for each of them in accordance with the present invention.

[0041] FIG. 3 is a set of mass spectra obtained from one fraction of a mixture of two lysates of E. coli that had been labeled separately with ICAT reagent prior to electrophoresis through a flow-through gel apparatus in accordance with the present invention. The first panel shows the entire peptide mass fingerprint that was obtained for one particular fraction after digestion with trypsin, and the second panel shows the peptides that were retained and eluted from avidin beads for this fraction. Two ICAT reagent pairs are shown in the insets.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

[0042] One aspect of this invention employs affinity tagged protein reactive reagents in which the affinity tag is covalently attached to a protein reactive group by a linker, or a reagent free of an affinity tag which comprises a protein reactive group covalently attached to a linker. The linker is isotopically labeled to generate pairs or sets of reagents that are substantially chemically identical, but which are distinguishable by mass. For example, a pair of reagents, one of which is isotopically heavy and the other of which is isotopically light, can be employed for the comparison of two samples, one of which may be a reference sample containing one or more known proteins in known amounts. For example, any one or more of the hydrogen, carbon, nitrogen, oxygen or sulfur atoms in the linker may be replaced with their stable isotopes .sup.2H, .sup.13C, .sup.15N, .sup.17O, .sup.18O or .sup.34S.
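To make the mass difference between a light and a heavy reagent concrete, the following Python sketch sums the mass gained for each stable isotope substitution listed above. The function name and the example substitution counts are illustrative assumptions; the isotope masses are standard monoisotopic values rounded to six decimal places.

```python
# Mass gained (Da) when the most abundant isotope is replaced by the
# stable heavy isotope named in the key.
ISOTOPE_MASS_GAIN = {
    "2H": 2.014102 - 1.007825,
    "13C": 13.003355 - 12.000000,
    "15N": 15.000109 - 14.003074,
    "17O": 16.999132 - 15.994915,
    "18O": 17.999160 - 15.994915,
    "34S": 33.967867 - 31.972071,
}


def mass_shift(substitutions):
    """Total mass difference between heavy and light reagent forms.

    substitutions maps an isotope label to the number of atoms replaced,
    e.g. {"2H": 8} for a hypothetical linker carrying eight deuteriums.
    """
    return sum(
        ISOTOPE_MASS_GAIN[isotope] * count
        for isotope, count in substitutions.items()
    )


print(round(mass_shift({"2H": 8}), 4))
print(round(mass_shift({"13C": 6, "15N": 2}), 4))
```

The exact shift is slightly larger than the nominal integer value (about 8.05 Da rather than 8 Da for eight deuteriums), which matters when choosing a mass tolerance for pair detection.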

[0043] When utilized, suitable affinity tags bind selectively either covalently or non-covalently and with high affinity to a capture reagent (CR). The CR-A interaction or bond should remain intact after extensive and multiple washings with a variety of solutions to remove non-specifically bound components. The affinity tag binds minimally or preferably not at all to components in the assay system, except CR, and does not significantly bind to surfaces of reaction vessels. Any non-specific interaction of the affinity tag with other components or surfaces should be disrupted by multiple washes that leave CR-A intact. Further, it must be possible to disrupt the interaction of A and CR to release peptides, substrates or reaction products, for example, by addition of a displacing ligand or by changing the temperature or solvent conditions. Preferably, neither CR nor A react chemically with other components in the assay system and both groups should be chemically stable over the time period of an assay or experiment. The affinity tag preferably does not undergo peptide-like fragmentation during (MS).sup.n analysis. The affinity label is preferably soluble in the sample liquid to be analyzed and the CR should remain soluble in the sample liquid even though attached to an insoluble resin such as Agarose. In the case of CR, the term soluble means that CR is sufficiently hydrated or otherwise solvated such that it functions properly for binding to A. CR or CR-containing conjugates should not be present in the sample to be analyzed, except when added to capture A.

[0044] Examples of A and CR pairs include:

[0045] biotin or structurally modified biotin-based reagents, including iminobiotin, which bind to proteins of the avidin/streptavidin family, which may, for example, be used in the forms of streptavidin-Agarose, oligomeric-avidin-Agarose, or monomeric-avidin-Agarose;

[0046] any 1,2-diol, such as 1,2-dihydroxyethane (HO--CH.sub.2--CH.sub.2--OH), and other 1,2-dihydroxyalkanes including those of cyclic alkanes, e.g., 1,2-dihydroxycyclohexane, which bind to an alkyl or aryl boronic acid or boronic acid esters, such as phenyl-B(OH).sub.2 or hexyl-B(O Ethyl).sub.2, which may be attached via the alkyl or aryl group to a solid support material, such as Agarose;

[0047] maltose which binds to maltose binding protein (as well as any other sugar/sugar binding protein pair or, more generally, any ligand/ligand binding protein pair that has the properties discussed above);

[0048] a hapten, such as a dinitrophenyl group, which binds to an anti-hapten antibody that recognizes the hapten; for example, the dinitrophenyl group will bind to an anti-dinitrophenyl-IgG;

[0049] a ligand which binds to a transition metal, for example, an oligomeric histidine will bind to Ni(II); the transition metal CR may be used in the form of a resin bound chelated transition metal, such as nitrilotriacetic acid-chelated Ni(II) or iminodiacetic acid-chelated Ni(II);

[0050] glutathione which binds to glutathione-S-transferase.

[0051] In general, any A-CR pair commonly used for affinity enrichment which meets the suitability criteria discussed above can be used. Biotin and biotin-based affinity tags are preferred. Of particular interest are structurally modified biotins, such as iminobiotin, which will elute from avidin or streptavidin columns under solvent conditions compatible with ESI-MS analysis, such as dilute acids containing 10-20% organic solvent. It is expected that iminobiotin tagged compounds will elute in solvents below pH 4. Iminobiotin tagged protein reactive reagents can be synthesized by methods described herein for the corresponding biotin tagged reagents. In one preferred embodiment, the affinity enrichment medium consists of monomeric avidin, which has a lower affinity for biotin than tetrameric avidin, and therefore can be recycled and used for the purification of peptides from many fractions.

[0052] A displacement ligand, DL, is optionally used to displace A from CR. Suitable DLs are not typically present in samples unless added. DL should be chemically and enzymatically stable in the sample to be analyzed and should not react with or bind to components (other than CR) in samples or bind non-specifically to reaction vessel walls. DL preferably does not undergo peptide-like fragmentation during MS analysis, and its presence in the sample should not significantly suppress the ionization of tagged peptide, substrate or reaction product conjugates. DL itself preferably is minimally ionized during mass spectrometric analysis and the formation of ions composed of DL clusters is preferably minimal. The selection of DL depends upon the A and CR groups that are employed. In general, DL is selected to displace A from CR in a reasonable time scale, at most within a week of its addition, but more preferably within a few minutes or up to an hour. The affinity of DL for CR should be comparable to or stronger than the affinity of the tagged compounds containing A for CR. Furthermore, DL should be soluble in the solvent used during the elution of tagged compounds containing A from CR. DL preferably is free A or a derivative or structural modification of A. Examples of DL include biotin or biotin derivatives, particularly those containing groups that suppress cluster formation or suppress ionization in MS.

[0053] The linker group (L) should be soluble in the sample liquid to be analyzed and it should be stable with respect to chemical reaction, e.g., substantially chemically inert, with components of the sample as well as A and CR groups. The linker when bound to A should not interfere with the specific interaction of A with CR or interfere with the displacement of A from CR by a displacing ligand or by a change in temperature or solvent. The linker should bind minimally or preferably not at all to other components in the system, to reaction vessel surfaces or CR. Any non-specific interactions of the linker should be broken after multiple washes which leave the A-CR complex intact. Linkers preferably do not undergo peptide-like fragmentation during (MS).sup.n analysis. At least some of the atoms in the linker groups should be readily replaceable with stable heavy-atom isotopes. The linker preferably contains groups or moieties that facilitate ionization of the affinity tagged reagents, peptides, substrates or reaction products.

[0054] To promote ionization, the linker may contain acidic or basic groups, e.g., COOH, SO.sub.3H, primary, secondary or tertiary amino groups, nitrogen-heterocycles, ethers, or combinations of these groups. The linker may also contain groups having a permanent charge, e.g., phosphonium groups, quaternary ammonium groups, sulfonium groups, chelated metal ions, tetraalkyl or tetraaryl borate or stable carbanions.

[0055] The covalent bond of the linker to A or PRG should typically not be unintentionally cleaved by chemical or enzymatic reactions during the assay. In some cases it may be desirable to cleave the linker from the affinity tag A or from the PRG, for example to facilitate release from an affinity column. Thus, the linker can be cleavable, for example, by chemical, thermal, enzymatic or photochemical reaction. Photocleavable groups in the linker may include the 1-(2-nitrophenyl)-ethyl group. Thermally labile linkers may, for example, be a double-stranded duplex formed from two complementary strands of nucleic acid, a strand of a nucleic acid with a complementary strand of a peptide nucleic acid, or two complementary peptide nucleic acid strands which will dissociate upon heating. Cleavable linkers also include those having disulfide bonds, acid or base labile groups, including among others, diarylmethyl or trimethylarylmethyl groups, silyl ethers, carbamates, oxyesters, thioesters, thionoesters, and alpha-fluorinated amides and esters. Enzymatically cleavable linkers can contain, for example, protease-sensitive amides or esters, .beta.-lactamase-sensitive .beta.-lactam analogs and linkers that are nuclease-cleavable, or glycosidase-cleavable.

[0056] The protein reactive group (PRG) can be a group that selectively reacts with certain protein functional groups. Any selectively reactive protein reactive group should react with a functional group of interest that is present in at least a portion of the proteins in a sample. Reaction of PRG with functional groups on the protein should occur under conditions that do not lead to substantial degradation of the compounds in the sample to be analyzed. Examples of selectively reactive PRGs suitable for use in the affinity tagged reagents of this invention include those which react with sulfhydryl groups to tag proteins containing cysteine, those that react with amino groups, carboxylate groups, ester groups, phosphate reactive groups, and aldehyde and/or ketone reactive groups or, after fragmentation with CNBr, with homoserine lactone.

[0057] Thiol reactive groups include epoxides, alpha-haloacyl groups, nitriles, sulfonated alkyl or aryl thiols and maleimides. Amino reactive groups tag amino groups in proteins and include sulfonyl halides, isocyanates, isothiocyanates, active esters, including tetrafluorophenyl esters and N-hydroxysuccinimidyl esters, acid halides, and acid anhydrides. In addition, amino reactive groups include aldehydes or ketones in the presence or absence of NaBH.sub.4 or NaCNBH.sub.3.

[0058] Carboxylic acid reactive groups include amines or alcohols in the presence of a coupling agent such as dicyclohexylcarbodiimide, or 2,3,5,6-tetrafluorophenyl trifluoroacetate and in the presence or absence of a coupling catalyst such as 4-dimethylaminopyridine; and transition metal-diamine complexes including Cu(II) phenanthroline.

[0059] Ester reactive groups include amines which, for example, react with homoserine lactone.

[0060] Phosphate reactive groups include chelated metal where the metal is, for example Fe(III) or Ga(III), chelated to, for example, nitrilotriacetic acid or iminodiacetic acid.

[0061] Aldehyde or ketone reactive groups include amine plus NaBH.sub.4 or NaCNBH.sub.3, or these reagents after first treating a carbohydrate with periodate to generate an aldehyde or ketone.

[0062] The requirements discussed above for A, L and PRG extend to the corresponding segments of A-L-PRG and the reaction products generated with this reagent.

[0063] Internal standards, which are appropriately isotopically labeled, may be employed in the methods of this invention to measure absolute quantitative amounts of proteins in samples. These may be prepared by reaction of affinity labeled protein reactive reagents with a preparation known to contain the protein of interest to generate the affinity tagged peptides generated from digestion of the tagged protein. Alternatively, the desired peptides may be chemically synthesized. Affinity tagged peptide internal standards are substantially chemically identical to the corresponding affinity tagged peptides generated from digestion of the affinity tagged protein, except that they are differentially isotopically labeled to allow their independent detection by MS techniques.

[0064] The method of this invention can also be applied to determine the relative quantities of one or more proteins in two or more protein samples, while simultaneously determining their identity. The proteins in each sample are reacted with the labeled reagents which are substantially chemically identical but differentially isotopically labeled. The samples are combined and processed as one, and then run together by gel electrophoresis. The proteins contained in specific bands or spots are then digested. Alternatively, after mixing the protein samples, but prior to electrophoresis, the proteins may be subjected to avidin affinity chromatography to enrich for biotinylated proteins, which could be important, for example, if intact cells had been labeled. The relative quantity of each labeled peptide, which reflects the relative quantity of the protein from which the peptide originates, is determined by the measurement of the respective isotope peaks by mass spectrometry.

[0065] The methods of this invention can be applied to the analysis or comparison of multiple different samples. Samples that can be analyzed by methods of this invention include cell homogenates; cell fractions; biological fluids including urine, blood, and cerebrospinal fluid; tissue homogenates; tears; feces; saliva; lavage fluids such as lung or peritoneal lavages; mixtures of biological molecules including proteins, lipids, carbohydrates and nucleic acids generated by partial or complete fractionation of cell or tissue homogenates.

[0066] The methods of this invention employ MS and (MS).sup.n methods. While a variety of MS and (MS).sup.n methods are available and may be used, Matrix Assisted Laser Desorption Ionization MS (MALDI/MS) and Electrospray Ionization MS (ESI/MS) methods are preferred.

[0067] As set forth above, the proteins in each sample are labeled with either an affinity labeled (A) reagent or a non-affinity labeled reagent, both of which include a labeled linker moiety (L) and a protein reactive group (PRG).

[0068] The labeled samples are mixed and then preferably subjected to 2D PAGE. One dimensional SDS electrophoresis can be used instead of 2D PAGE, or one dimensional isoelectric focusing gels, or any other electrophoretic method for separating proteins, including native protein electrophoresis. The procedures for running one dimensional and two dimensional electrophoresis are well known to those skilled in the art.

[0069] Proteins that the two cell samples have in common form coincident spots upon protein staining, or upon direct MS analysis of a piece of the gel. The ratio of the detectable isotopes between identical proteins from either sample will be constant for the vast majority of proteins. Proteins that the two samples do not have in common will migrate independently. Thus, a protein that is unique or of different relative concentration to one sample will have a different ratio of detectable isotopes from the majority of protein spots. The protein spots of interest then are digested to form labeled peptides which then are analyzed by (MS).sup.n.
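The comparison described above, in which most proteins share a constant isotope ratio and proteins of differing relative concentration stand out, can be sketched in Python as follows. The function name, the use of the median as the baseline ratio and the two-fold threshold are assumptions chosen for illustration; all ratios are assumed to be positive.

```python
from statistics import median


def flag_differential_spots(spot_ratios, fold_threshold=2.0):
    """Return spots whose heavy/light ratio departs from the typical ratio.

    spot_ratios maps a spot identifier to its measured heavy/light ratio.
    The median over all spots serves as the baseline shared by the
    majority of proteins; a spot is flagged when its ratio differs from
    the baseline by at least fold_threshold in either direction.
    """
    baseline = median(spot_ratios.values())
    flagged = {}
    for spot, ratio in spot_ratios.items():
        fold = ratio / baseline
        if fold >= fold_threshold or fold <= 1.0 / fold_threshold:
            flagged[spot] = fold
    return flagged


# Hypothetical ratios for five gel spots; only spot_4 departs from the rest.
example_ratios = {
    "spot_1": 1.0,
    "spot_2": 1.1,
    "spot_3": 0.9,
    "spot_4": 3.2,
    "spot_5": 1.0,
}
flagged_spots = flag_differential_spots(example_ratios)
print(flagged_spots)
```

Normalizing to the median also compensates for samples that were not combined in exactly equal amounts, since such an imbalance shifts every ratio by the same factor.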

[0070] In conventional analysis, a control is run with known proteins for the cell type being studied. The known spots on the sample gel have to be identified and marked, then compared to the control and the second gel to determine differences between the two gels. In the present invention, there is only one gel so no marking is necessary. In addition, the software used on conventional processes for alignment of different gels prior to comparing and contrasting protein differences does not correct for local distortions and inconsistencies between two or more gels. The process of the present invention eliminates the need for such correction because the extracts for all samples to be tested are mixed and run on the same gel. Any gel distortions are experienced equally by each sample.

[0071] One of the advantages of performing gel electrophoresis is that proteins of particular interest migrate to a reproducible place on the gel, so that if desired, only these proteins need be analyzed. These proteins can include disease markers as well as control proteins. Many of the post-translationally modified forms of these proteins can be separated from one another by gel electrophoresis, so that the methods of the invention could be used to determine and quantify changes in the expression of each of these modified forms. If there was any difficulty in localizing such proteins, a small portion of the separated samples could be transblotted from the gel and these proteins could be located by immunoblotting techniques. Alternatively, a small amount of the protein of interest could be labeled with a fluorescent marker known not to affect migration position prior to electrophoresis to identify the regions of interest to be analyzed. Then the methods of this invention could be used to measure the quantitative changes in the majority of the proteins in the gel based upon the PRG as a function of their migration on the gel.

[0072] The method of this invention can be utilized to analyze the protein composition described in Published PCT application WO 00/11208 which is incorporated herein by reference.

Quantitative Proteome Analysis with Affinity Labeled Reagent

[0073] This method consists of using a biotin labeled sulfhydryl-reactive reagent for quantitative protein profile measurements in a sample protein mixture and a reference protein mixture. The method comprises the following steps:

[0074] A. Reduction. Disulfide bonds of proteins in the sample and reference mixtures are reduced to free SH groups. The preferred reducing agent is tri-n-butylphosphine which is used under standard conditions. Alternative reducing agents include tricarboxyethylphosphine, mercaptoethylamine and dithiothreitol. If required, this reaction can be performed in the presence of solubilizing agents including high concentrations of urea and detergents to maintain protein solubility. The reference and sample protein mixtures to be compared are processed separately, applying identical reaction conditions.

[0075] B. Derivatization of SH groups with an affinity tag. Free SH groups are derivatized with the biotinylating reagent biotinyl-iodoacetylamidyl-4,7-dioxadecanediamine. The reagent is prepared in different isotopically labeled forms by substitution of linker atoms with stable isotopes and each sample is derivatized with a different isotopically labeled form of the reagent. Derivatization of SH groups is preferably performed under slightly basic conditions (pH 8.5) for 90 minutes at room temperature. For the quantitative, comparative analysis of two samples, the two samples (termed reference sample and sample) are derivatized with the isotopically light and the isotopically heavy form of the reagent, respectively. For the comparative analysis of several samples, one sample is designated a reference to which the other samples are related. Typically, the reference sample is labeled with the isotopically heavy reagent and the experimental samples are labeled with the isotopically light form of the reagent, although this choice of reagents is arbitrary. These reactions are also compatible with the presence of high concentrations of solubilizing agents.

[0076] C. Combination of labeled samples. After completion of the affinity tagging reaction, defined aliquots of the samples labeled with the isotopically different reagents (e.g., heavy and light reagents) are combined and all the subsequent steps are performed on the pooled samples. Combination of the differentially labeled samples at this early stage of the procedure eliminates variability due to subsequent reactions and manipulations. Preferably, equal amounts of each sample are combined. The pooled sample is then fractionated by one of the following well known techniques:

[0077] 1.) Flow Through Gel electrophoresis. The labeled proteins are separated through a preparative flow-through SDS gel (5%) apparatus (Mini Prep Cell, Bio-Rad) and the eluted proteins are collected in fractions. The proteins may be concentrated, for example, by acetone precipitation before proteolytic digestion is effected by overnight incubation with an enzyme such as trypsin.

[0078] 2.) Standard gel electrophoresis. The gel may be stained for proteins to localize spots or bands, or the spots or slices may be processed without protein detection at this stage. Protein mixtures that are present in a spot (2D) or band (1D) by gel electrophoresis are excised from the gel, optionally dried and digested with an enzyme. The proteins in the sample mixture are digested, typically with trypsin. Alternative proteases are also compatible with the procedure as in fact are chemical fragmentation procedures. This step may be omitted in the analysis of small proteins.

[0079] 3.) Standard gel electrophoresis with digestion and transblotting for peptide extraction. The gel may be treated with enzymes and transblotted (with or without the aid of electric current) onto a membrane, or transblotted through an active protease membrane and captured on a second membrane (Bienvenut et al., Anal. Chem. 71:4800-4807, 1999). That membrane can then be directly analyzed by MS or MALDI MSMS.

[0080] D. Peptide Mass Fingerprinting. The protein digest may then be submitted to PMF to identify the major protein components. In favorable instances, the Cys-containing biotinylated peptides are detectable at this stage as isotope pairs that are 8 amu apart, and the relative amount of the proteins can be determined by comparing the intensities of these peptides in the mass spectrum without additional purification.

[0081] E. Affinity isolation of the affinity tagged peptides by interaction with a capture reagent. The biotinylated peptides may then be isolated on avidin-agarose. After digestion the pH of the peptide samples is lowered to 6.5 and the biotinylated peptides are immobilized on beads coated with monomeric avidin (Promega). The beads are extensively washed. The last washing solvent includes 10% acetonitrile to remove residual SDS. Biotinylated peptides are eluted from avidin-agarose, for example, with 0.4% trifluoroacetic acid in the presence of acetonitrile.

[0082] Analysis of the isolated, derivatized peptides may also be accomplished by .mu.LC-MS.sup.n or CE-MS.sup.n with data dependent fragmentation. Methods and instrument control protocols well known in the art are used, as described, for example, in Ducret et al., Prot. Sci. 7:706-719, 1998; Figeys and Aebersold, Electrophoresis 19:885-892, 1998; Figeys et al., Nature Biotech. 14:1579-1583, 1996; or Haynes et al., Electrophoresis 19:939-945, 1998, which are incorporated herein by reference. In this last step, both the quantity and sequence identity of the proteins from which the tagged peptides originated can be determined by automated multistage MS. This is achieved by the operation of the mass spectrometer in a dual mode in which the instrument alternates in successive scans between measuring the relative quantities of peptides eluting from the capillary column and recording the sequence information of selected peptides. Peptides are quantified by measuring in the MS mode the relative signal intensities for pairs of peptide ions of identical sequence that are tagged with the isotopically light or heavy forms of the reagent, respectively, and which, therefore, differ in mass by the mass differential encoded within the affinity tagged reagent. Peptide sequence information is automatically generated by selecting peptide ions of a particular mass-to-charge (m/z) ratio for collision-induced dissociation (CID) in the mass spectrometer operating in the MS.sup.n mode. See Link, A. J. et al. Electrophoresis 18:1314-1334, 1997; Gygi, S. P. et al. Mol. Cell. Biol. 19:1720-1730, 1999; and Gygi, S. P. et al. Electrophoresis 20:310-319, 1999, which are incorporated herein by reference. The resulting CID spectra are then automatically correlated with sequence databases to identify the protein from which the sequenced peptide originated. 
The combination of the results generated by MS and MSMS analyses of affinity tagged and differentially labeled peptide samples determines the relative quantities as well as the sequence identities of the components of protein mixtures in a single, automated operation.

[0083] This method can also be practiced using other affinity tags and other protein reactive groups, including amino reactive groups, carboxyl reactive groups, or groups that react with homoserine lactones.

[0084] The approach employed herein for quantitative proteome analysis is based on two principles. First, a short sequence of contiguous amino acids from a protein (5-25 residues) contains sufficient information to uniquely identify that protein. Protein identification by MSⁿ is accomplished by correlating the sequence information contained in the CID mass spectrum with sequence databases, using sophisticated computer searching algorithms (Eng, J. et al. J. Amer. Soc. Mass Spectrom. 5:976-989, 1994; Mann, M. et al. Anal. Chem. 66:4390-4399, 1994; Qin, J. et al. Anal. Chem. 69:3995-4001, 1997; Clauser, K. R. et al. Proc. Nat. Acad. Sci. USA 92:5072-5076, 1995, which are incorporated herein by reference). Second, pairs of identical peptides tagged with the light and heavy affinity tagged reagents, respectively (or, in analysis of more than two samples, sets of identical tagged peptides in which each set member is differentially isotopically labeled), are chemically identical and therefore serve as mutual internal standards for accurate quantitation. The MS measurement readily differentiates between peptides originating from different samples, representing for example different cell states, because of the mass difference between the isotopically distinct reagents attached to the peptides. The ratios between the intensities of the differing weight components of these pairs or sets of peaks provide an accurate measure of the relative abundance of the peptides (and hence the proteins) in the original cell pools because the MS intensity response to a given peptide is independent of the isotopic composition of the reagents (De Leenheer, A. P. et al, Mass. Spectrom. Rev. 11:249-702, 1992, which is incorporated herein by reference). The use of isotopically labeled internal standards is standard practice in quantitative mass spectrometry and has been exploited to great advantage in, for example, the precise quantitation of drugs and metabolites in bodily fluids.
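
The second principle can be illustrated with a minimal sketch that pairs peaks separated by the light/heavy mass differential and reports their intensity ratios. The 8.0502 Da spacing (eight deuterium atoms replacing eight hydrogen atoms, one label per peptide, singly charged ions), the 0.02 Da tolerance, the function name, and the peak list are illustrative assumptions and are not values taken from this disclosure.

```python
# Assumed mass shift for eight 2H replacing eight 1H, in Da (about 8.0502).
D8_SHIFT = 8 * (2.014102 - 1.007825)

def find_isotope_pairs(peaks, shift=D8_SHIFT, tol=0.02):
    """Return (light_mz, heavy_mz, light/heavy intensity ratio) tuples.

    peaks: list of (m/z, intensity); singly charged ions carrying one
    label each are assumed, so a pair is spaced by exactly `shift`.
    """
    peaks = sorted(peaks)
    pairs = []
    for i, (mz_light, int_light) in enumerate(peaks):
        for mz_heavy, int_heavy in peaks[i + 1:]:
            delta = mz_heavy - mz_light
            if delta > shift + tol:
                break  # peaks are sorted, so no later peak can match
            if abs(delta - shift) <= tol and int_heavy > 0:
                pairs.append((mz_light, mz_heavy, int_light / int_heavy))
    return pairs

# Hypothetical peak list: two labeled pairs and one unpaired peak.
example_peaks = [
    (1000.500, 200.0), (1008.550, 100.0),
    (1200.600, 50.0),
    (1500.700, 30.0), (1508.750, 60.0),
]

for light, heavy, ratio in find_isotope_pairs(example_peaks):
    print(f"{light:.3f} / {heavy:.3f}  light/heavy = {ratio:.2f}")
```

Because both members of a pair are chemically identical, the ratio is taken directly from the two intensities without any response-factor correction; peptides with more than one labeled residue, or multiply charged ions, would need a correspondingly scaled spacing.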

[0085] The methods of this invention, in particular 1D gels, can be applied to analysis of classes of proteins with particular physical-chemical properties including poor solubility, large or small size and extreme pI values. Low abundance proteins can be analyzed by performing protein affinity subtraction prior to electrophoresis to remove the most abundant proteins. Alternatively, the biotinylation reaction could be performed in such a way as to label a minor subset of proteins, for example, those proteins exposed on the outside of a cell, or proteins that remain exposed after organelle purification. Because a large amount of non-biotinylated protein would then be present that would otherwise interfere with electrophoresis, after mixing the proteins from the control and experimental samples together, the protein preparation could be subjected to avidin affinity chromatography to enrich for the biotinylated proteins, which would then be electrophoresed.

[0086] The prototypical application of the chemistry and method of the present invention is the establishment of quantitative profiles of complex protein samples and ultimately total lysates of cells and tissues following the preferred method described above. In addition, the reagents and methods of this invention have applications that go beyond the determination of protein expression profiles. Such applications include the following:

[0087] Amino-reactive or sulfhydryl-reactive, differentially isotopically labeled affinity tagged reagents can be used for the quantitative analysis of proteins in immunoprecipitated complexes. In the preferred version of this technique, protein complexes from cells representing different states (e.g., different states of activation, different disease states, different states of differentiation) are precipitated with a specific reagent, preferably an antibody. The proteins in the precipitated complex are then derivatized and analyzed as above.

[0088] Amino-reactive, differentially isotopically labeled affinity tagged reagents can be used to determine the sites of induced protein phosphorylation. In a preferred version of this method, purified proteins (e.g., immunoprecipitated from cells under different stimulatory conditions) are fragmented and derivatized as described above. Phosphopeptides are identified in the resulting peptide mixture by fragmentation in the ion source of the ESI-MS instrument, and their relative abundances are determined by comparing the ion signal intensities of the experimental sample with the intensity of an included, isotopically labeled standard.

[0089] Amino-reactive, differentially isotopically labeled affinity tagged reagents are used to identify the N-terminal ion series in MS/MS spectra. In a preferred version of this application, the peptides to be analyzed are derivatized with a 50:50 mixture of an isotopically light and heavy reagent which is specific for amino groups. Fragmentation of the peptides by CID therefore produces two N-terminal ion series which differ in mass precisely by the mass differential of the reagent species used. This application dramatically reduces the difficulty in determining the amino acid sequence of the derivatized peptide.
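
One way to picture this application is a short sketch that scans a fragment spectrum for doublets of roughly equal intensity spaced by the reagent mass differential. The function name, the 8.05 Da differential, the tolerances, and the fragment list are hypothetical; fragments are assumed to be singly charged, and fragments that carry a labeled lysine side chain would also appear as doublets and are not distinguished here.

```python
def n_terminal_doublets(peaks, delta, tol=0.02, max_ratio=1.5):
    """Return the light-member m/z of every doublet spaced by `delta`.

    peaks: list of (m/z, intensity) for singly charged fragment ions.
    max_ratio: largest allowed intensity ratio between the two members,
    since a 50:50 reagent mixture should give roughly equal signals.
    """
    hits = []
    for mz_a, int_a in peaks:
        for mz_b, int_b in peaks:
            if abs((mz_b - mz_a) - delta) > tol:
                continue
            if min(int_a, int_b) <= 0:
                continue
            if max(int_a, int_b) / min(int_a, int_b) <= max_ratio:
                hits.append(mz_a)
    return sorted(hits)

# Hypothetical fragment spectrum: two doublets and two singlets.
spectrum = [
    (300.10, 80.0), (308.15, 75.0),   # doublet
    (413.20, 60.0), (421.25, 66.0),   # doublet
    (175.12, 90.0),                   # singlet
    (288.20, 40.0),                   # singlet
]

print(n_terminal_doublets(spectrum, delta=8.05))
```

The m/z values returned mark candidate members of the N-terminal ion series; successive differences between them can then be compared with amino acid residue masses to read the sequence.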

[0090] The following examples illustrate four different experiments in which gel electrophoresis separations were performed and quantitative data were obtained using ICAT™ reagents that contained a biotinyl affinity tag, a linker with eight deuterium atoms, and an iodoacetamide protein reactive group. These examples are not exhaustive and are not intended to limit the scope of these experiments.

EXAMPLES

Example 1

[0091] Five different standard proteins were alkylated separately with the d0 ICAT reagent and the d8 ICAT reagent, and mixed together in different ratios prior to performing 2D gel electrophoresis. After staining, the spots corresponding to these proteins were cut out, digested with trypsin, and submitted to PMF. FIG. 1 shows an image of the gel and insets of each mass spectrum that contain one of the ICAT reagent pairs from each protein. In addition, the ratio at which the proteins were mixed together prior to gel electrophoresis is listed, as well as the ratio of d0 to d8 that was obtained by mass spectrometry. In all five cases, the discrepancy between the mixing ratio and the ratio observed by mass spectrometry was well below 20%.

[0092] One of the problematic aspects of separating ICAT reagent labeled peptides by HPLC is that the d8 labeled peptide typically elutes several seconds ahead of the corresponding d0 labeled peptide. To demonstrate the fact that upon gel electrophoresis there is no similar isotope separation effect, the 2D spot for lactalbumin, shown in FIG. 2, was split into quadrants, which were then separately digested, extracted, and submitted to MALDI MS analysis. The right hand side of FIG. 2 demonstrates that the same ratio of d0 to d8 was obtained for each of these quadrants, within 10%.

Example 2

[0093] E. coli bacterial lysates, labeled either with an ICAT reagent comprising the deuterated biotinyl iodoacetamide reagent for the minimal medium (glucose) growth condition or with the non-deuterated reagent for the rich medium (LB broth) growth condition, were mixed in equal amounts. The mixture was separated through a preparative flow-through SDS gel (5%) apparatus (Mini Prep Cell, Bio-Rad) and proteins were fractionated into solution. The fractionated proteins were then acetone precipitated before proteolytic digestion by overnight incubation with trypsin. Upon avidin chromatography, peptides from both the flow-through portion and the elution portion were collected into 96 fractions. The flow-through was captured on reversed phase medium (POROS® 50R1, Applied Biosystems), washed with distilled water, and eluted with 60% ACN. Samples were vacuum dried and re-suspended in 50% ACN/0.1% TFA. Spectra were acquired using an Applied Biosystems Voyager MALDI TOF mass spectrometer with α-cyano-4-hydroxycinnamic acid as matrix. The strategy was to identify proteins using PMF, while the d0/d8 ratio was used for quantitation.

[0094] FIG. 3 shows the spectrum acquired for the avidin flow-through and for the peptides eluted from the avidin for one fraction that contained proteins at about 40,000 in molecular weight. Ten (10) different ICAT reagent labeled pairs are marked. The major protein components were tentatively identified by PMF using the ChemApplex PMF software program (Applied Biosystems), and six components are listed in Table 1 below. OmpA was the main component, which comprised 25% of the total intensity. The confidence in the identification is roughly proportional to the score listed in column 5. Note that all six of these proteins have molecular weights that are between 30K and 52K daltons, as would be expected using crude SDS separation. A special peptide database was created containing cysteine peptides only, and the masses from the eluted peptides were searched against this database. The top six candidate proteins are listed. Two of these proteins are identical to those identified from the avidin flow-through. Notably, two of the proteins in the flow-through fraction, namely, ribose binding protein and outer membrane protein C, have no cysteines, and therefore would not contribute any peptides to the avidin eluate fraction.
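
A cysteine-only peptide database of the kind mentioned above could be built along the following lines. This is a minimal sketch: the cleavage rule (after K or R, but not before P), the function names, and the two sequences are assumptions chosen for illustration and do not describe how the ChemApplex database was actually generated.

```python
def tryptic_peptides(sequence):
    """Cleave after K or R, except when the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides

def cysteine_peptide_database(proteins):
    """Map each protein name to its cysteine-containing tryptic peptides."""
    return {
        name: [p for p in tryptic_peptides(seq) if "C" in p]
        for name, seq in proteins.items()
    }

# Hypothetical sequences, not real E. coli proteins.
proteins = {
    "with_cys": "MKACDEFRGHIKPLCNR",
    "no_cys": "MASTKLLGR",
}

database = cysteine_peptide_database(proteins)
for name, peptides in database.items():
    print(name, peptides)
```

A protein without cysteine yields an empty entry, which mirrors the observation that such proteins contribute no peptides to the avidin eluate.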

TABLE 1

Flow-Through
Acc. #   Protein Name     MW      # peptide   Score   % Intensity   ppm
P02990   EF-TU            43156   6           47828   25.4          3.7
P06996   ompC             40344   8           13194   12.7          7.3
P00477   SHT              45289   4           7778    5.1           4.2
P02925   ribose BP        30932   5           4488    7.4           10.7
P06711   glutamine syn.   51741   4           4196    3.4           7.7
P08200   ICDH             45728   3           1174    1.6           6.6

Avidin Elution
Acc. #   Protein Name     MW      # peptide   Score   % Intensity   ppm    ratio
P02990   EF-TU            43156   5           17822   48.2          3.2    0.65
P02934   ompA             37179   2           305     2.2           3.3    0.67
P39342   hypo. 54.3       54299   2           91      0.6           10.3   0.49
P76200   hypo. 43         41368   2           44      1.7           18.4   2.6
P07460   succ.coA syn.    41368   2           21      1.1           14.2   1.5
P00477   SHT              45289   3           1       0.2           28.3   0.13

[0095] The proteins listed in Table 1 were identified from the spectra in FIG. 3 using the ChemApplex PMF program. The top panel was obtained from the flow-through of the avidin beads, and the bottom panel was obtained from the avidin elution. The first column lists the SwissProt accession number of the protein that was identified. The second column lists an abbreviated form of the protein's name: EF-TU for elongation factor-TU, ompC for outer membrane protein C, SHT for serine hydroxymethyl transferase, ribose BP for the periplasmic ribose binding protein, glutamine syn. for glutamine synthetase, ICDH for isocitrate dehydrogenase, ompA for outer membrane protein A, hypo. for hypothetical protein, succ. CoA syn. for succinyl coenzyme A synthetase beta chain. The MW column lists the molecular weight of the protein; # peptide lists the number of peptides that were matched (including only the d0 masses for the avidin eluted peptides); the Score was calculated by the ChemApplex program taking into account only the d0 masses; % Intensity is the percentage of the intensity of all the masses in the spectrum that could be accounted for by the masses that were matched (again only the d0 masses); and ppm is the average intensity-weighted ppm error for those masses between the experimental measurements from the mass spectrum and the theoretical mass of the peptides. Ratio was calculated manually by dividing the intensity of the d0 peptide by the intensity for the corresponding d8 peptide, and averaging where possible. The low intensity of the d0 masses for SHT explains why the ChemApplex program had difficulty in distinguishing SHT from the noise; the program was not looking for the d8 masses, all three of which are detectable over the background. Note that ompC and ribose BP do not contain cysteines, and therefore are invisible in the avidin eluate fraction. The confidence in the identifications is highest for the proteins with the highest score, and also for the proteins that were independently identified in the flow-through fraction and the affinity elution sample. All of the proteins in both tables except the two hypothetical proteins in the second table have been identified repeatedly from these E. coli lysates.
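
The % Intensity, ppm, and ratio columns defined above can be sketched as three small functions. The function names and all numerical inputs are hypothetical, and the use of the absolute value of each ppm error is an assumption, since the text does not say whether signed or unsigned errors were averaged. The ChemApplex score itself is not reproduced here.

```python
def percent_intensity(matched_intensities, total_intensity):
    """Share of the total spectrum intensity explained by matched masses."""
    return 100.0 * sum(matched_intensities) / total_intensity

def weighted_ppm_error(matches):
    """Intensity-weighted mean ppm error.

    matches: list of (observed_mass, theoretical_mass, intensity).
    Absolute errors are averaged (an assumption, see lead-in).
    """
    total = sum(intensity for _, _, intensity in matches)
    weighted = sum(
        abs(obs - theo) / theo * 1e6 * intensity
        for obs, theo, intensity in matches
    )
    return weighted / total

def mean_d0_d8_ratio(pairs):
    """Average of d0/d8 intensity ratios over all usable pairs."""
    ratios = [d0 / d8 for d0, d8 in pairs if d8 > 0]
    return sum(ratios) / len(ratios)

# Hypothetical matched masses: 5 ppm and 10 ppm errors, weights 300 and 100.
matches = [(1000.005, 1000.000, 300.0), (2000.020, 2000.000, 100.0)]

print(percent_intensity([300.0, 100.0], total_intensity=1600.0))
print(weighted_ppm_error(matches))
print(mean_d0_d8_ratio([(65.0, 100.0), (67.0, 100.0)]))
```

Weighting by intensity lets strong, well-measured peaks dominate the reported mass error, which is one reason a weakly matched protein can show a large ppm value.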

Example 3

[0096] Two E. coli preparations similar to those described above were labeled with ICAT d0 reagent and ICAT d8 reagent, mixed together and submitted to 1D SDS gel analysis. Slices were cut from the gel, washed, digested with trypsin, and the peptides were eluted. No avidin affinity chromatography was performed, so that only the most intense ICAT reagent labeled peptides were detectable. Upon PMF analysis with ChemApplex, E. coli tryptophanase was detectable as the most prominent protein component, after trypsin itself. Under these conditions, the peptides that corresponded to ICAT reagent pairs were also detected in an oxidized form, due to oxidation at the original cysteine sulfur atom, analogous to the oxidation of methionine residues that is commonly observed post SDS gel analysis. Thus, each peptide provides two independent measurements of the ratio of d0 to d8, one for the reduced form of the peptide, and one for the oxidized form of the peptide. A prominent quartet of peaks about 8 amu apart was detected starting at 1581.85, which corresponds to the tryptophanase peptide QLPCPAELLR (SEQ ID NO: 1), and the d8, d0+O and d8+O peaks. The ICAT reagent pair with an unmodified methionine had a d8/d0 ratio of 2.1, whereas the oxidized pair had a d8/d0 ratio of 1.9. In these experiments, the ratios obtained for ICAT reagent pairs of peptides derived from the same protein were commonly within 20% of each other, except for the weakest signals and those signals that obviously overlapped other peptides (which is particularly apparent when they correspond to expected trypsin digestion products from the same proteins already identified). Other ICAT reagent pairs from tryptophanase were detectable, but not well resolved over the background.
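
The positions of the four peaks can be checked with a short calculation. The monoisotopic residue masses and the reagent mass additions used below (442.2250 Da for the d0 form, 450.2752 Da for the d8 form) are commonly tabulated values assumed for this sketch and are not stated in this disclosure; singly protonated ions with one labeled cysteine are assumed.

```python
# Monoisotopic residue masses in Da (standard tabulated values, assumed).
MONO = {
    "A": 71.03711, "C": 103.00919, "E": 129.04259, "L": 113.08406,
    "P": 97.05276, "Q": 128.05858, "R": 156.10111,
}
WATER, PROTON, OXYGEN = 18.01056, 1.00728, 15.99491
ICAT_D0 = 442.2250   # assumed mass added by the light reagent
ICAT_D8 = 450.2752   # assumed mass added by the heavy reagent

def mh_plus(peptide, label_mass, oxidized=False):
    """Singly protonated mass of a peptide carrying one reagent label."""
    mass = sum(MONO[aa] for aa in peptide) + WATER + PROTON + label_mass
    return mass + (OXYGEN if oxidized else 0.0)

peptide = "QLPCPAELLR"
quartet = {
    "d0": mh_plus(peptide, ICAT_D0),
    "d8": mh_plus(peptide, ICAT_D8),
    "d0+O": mh_plus(peptide, ICAT_D0, oxidized=True),
    "d8+O": mh_plus(peptide, ICAT_D8, oxidized=True),
}

for name, mass in quartet.items():
    print(f"{name:5s} {mass:.2f}")
```

Under these assumptions the d0 peak falls at about 1581.85, and each successive peak lies roughly 8 Da higher, because the oxidized d0 form (+15.99 Da) lands close to twice the d0/d8 spacing (+8.05 Da).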

Example 4

[0097] Proteins were isolated from rat cardiac cells from normal myocytes or from myocytes that had been subjected to ischemic conditions. Normal rat proteins were labeled with the d0 ICAT reagent, and the ischemic cell proteins were labeled with the d8 ICAT reagent. The two samples were mixed together, run on a 2D gel, and stained with Coomassie brilliant blue. Spots were cut out, digested with trypsin, and submitted to PMF. The data were then searched using the ChemApplex software program, using a database that consisted of all of the human, rat and mouse proteins in the SwissProt database. The top candidate for one spot was human citrate synthetase. The rat and mouse homologues of citrate synthetase were absent from the database. The peptide mass fingerprint spectrum contained a prominent ICAT reagent pair at 1098 that did not correspond to any of the citrate synthetase peptides. Because the rat citrate synthetase protein was not present in the SwissProt database, a rat EST database was searched in the Protein Prospector (University of California-San Francisco) software program using masses that corresponded exactly to the theoretical masses of the citrate synthetase peptides that had been identified. One of the EST sequences that was identified by this means contained the sequence YSQCR (SEQ ID NO: 2), which corresponded to the ICAT reagent pair at 1098. The homologous human sequence was YTQCR (SEQ ID NO: 3), explaining why the measured mass did not match the sequence in the database. This peptide sequence is too short to be a unique identifier of a protein, and would not be useful had it not been possible to assign the peptide to citrate synthetase on the basis of the PMF data.

* * * * *