U.S. patent application number 12/307114, filed on June 28, 2007, was published by the patent office on 2010-11-11 for genes associated with chemotherapy response and uses thereof.
Invention is credited to Hongyue Dai, Andrey Loboda, Chunsheng Zhang.
Application Number | 20100284915 12/307114 |
Family ID | 38895109 |
Publication Date | 2010-11-11 |
United States Patent
Application |
20100284915 |
Kind Code |
A1 |
Dai; Hongyue ; et
al. |
November 11, 2010 |
GENES ASSOCIATED WITH CHEMOTHERAPY RESPONSE AND USES THEREOF
Abstract
The invention provides molecular markers that are associated
with responsiveness of a cancer patient to a chemotherapy
treatment, and methods and computer systems for determining such
responsiveness based on measurements of these molecular markers.
The present invention also provides methods and compositions for
enhancing the efficacy of chemotherapies in patients by modulating
the expression or activity of genes encoding these molecular
markers and/or their encoded proteins.
Inventors: |
Dai; Hongyue; (Chestnut
Hill, MA) ; Zhang; Chunsheng; (West Roxbury, MA)
; Loboda; Andrey; (Philadelphia, PA) |
Correspondence
Address: |
CHRISTENSEN, O'CONNOR, JOHNSON, KINDNESS, PLLC
1420 FIFTH AVENUE, SUITE 2800
SEATTLE
WA
98101-2347
US
|
Family ID: |
38895109 |
Appl. No.: |
12/307114 |
Filed: |
June 28, 2007 |
PCT Filed: |
June 28, 2007 |
PCT NO: |
PCT/US07/15025 |
371 Date: |
March 15, 2010 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
60818262 | Jun 30, 2006 | |
Current U.S.
Class: |
424/9.1 ;
424/174.1; 435/366; 435/375; 435/6.16; 506/17; 506/7; 514/19.3;
514/27; 514/274; 514/44A; 514/44R; 514/449; 514/492; 514/90 |
Current CPC
Class: |
C12Q 2600/158 20130101;
G01N 33/57449 20130101; A61P 35/00 20180101; Y02A 90/26 20180101;
Y02A 90/10 20180101; G16B 25/00 20190201; G16B 20/00 20190201; C12Q
2600/106 20130101; C12Q 1/6886 20130101; C12Q 2600/136 20130101;
G01N 33/57484 20130101; C12Q 2600/118 20130101; G01N 33/57415
20130101 |
Class at
Publication: |
424/9.1 ; 435/6;
506/7; 514/44.A; 514/44.R; 514/19.3; 424/174.1; 514/274; 514/90;
514/449; 514/27; 514/492; 435/375; 435/366; 506/17 |
International
Class: |
A61K 49/00 20060101
A61K049/00; C12Q 1/68 20060101 C12Q001/68; C40B 30/00 20060101
C40B030/00; A61K 31/713 20060101 A61K031/713; A61K 31/7105 20060101
A61K031/7105; A61K 38/02 20060101 A61K038/02; A61K 39/395 20060101
A61K039/395; A61K 31/513 20060101 A61K031/513; A61K 31/675 20060101
A61K031/675; A61K 31/337 20060101 A61K031/337; A61K 31/7048
20060101 A61K031/7048; A61K 31/282 20060101 A61K031/282; C12N 5/07
20100101 C12N005/07; C12N 5/071 20100101 C12N005/071; C40B 40/08
20060101 C40B040/08; A61K 31/7088 20060101 A61K031/7088; A61P 35/00
20060101 A61P035/00 |
Claims
1. A method for predicting the responsiveness of a mammalian
patient having a cancer to a chemotherapy regimen, comprising:
predicting said mammalian patient (a) as responsive to said
chemotherapy regimen, if expression and/or activity of one or more
gene products in a cell sample taken from said mammalian patient is
not up-regulated relative to a reference population of individuals
of the same species as said mammalian patient; or (b) as
non-responsive to said chemotherapy regimen, if expression and/or
activity of said one or more gene products is up-regulated relative
to said reference population of individuals, wherein said one or
more gene products comprise respectively products of one or more
different genes selected from the group consisting of genes
corresponding to SEQ ID NOs:1-39 or respective functional
equivalents thereof.
2. The method of claim 1, further comprising: determining, prior to
said predicting step, whether expression and/or activity of said
one or more gene products is up-regulated relative to said
reference population of individuals.
3. The method of claim 2, wherein said determining step is carried
out by a method comprising: determining one or more chemotherapy
response scores (CR scores) based on measurements of said one or
more gene products in said cell sample, wherein said one or more CR
scores indicate whether expression and/or activity of said one or
more gene products is up-regulated as compared to individuals in
said reference population.
4. The method of claim 3, comprising: determining a CR score that
is an average of said measurements of said one or more gene
products, wherein said mammalian patient is predicted as responsive
if said average is equal to a predetermined threshold value or as
non-responsive if said average is greater than said predetermined
threshold value.
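As an illustration only, the averaging rule of claim 4 might be sketched as follows. The function names, the default threshold of 0.0, and the sample values are hypothetical. The claim text reads "equal to" for the responsive case; the sketch assumes "at or below the threshold", in line with claim 1.

```python
from statistics import mean


def average_cr_score(measurements):
    """Return the CR score as the mean of the per-gene measurements."""
    return mean(measurements)


def predict_response(measurements, threshold=0.0):
    """Assumption: an average at or below the threshold means 'responsive'."""
    if average_cr_score(measurements) <= threshold:
        return "responsive"
    return "non-responsive"


# Hypothetical log-ratio measurements of three marker genes in two patients.
patient_a = [-0.40, -0.10, 0.05]
patient_b = [0.30, 0.55, 0.20]

label_a = predict_response(patient_a)
label_b = predict_response(patient_b)
```

The threshold would in practice be fixed in advance from a reference population; the value used here is a placeholder.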
5. The method of claim 3, comprising: determining a first CR score
that is a first measurement of a gene product of a gene having the
greatest expressive range among a first subset of said one or more
different genes, wherein said first subset is selected from the
group consisting of genes having SEQ ID NOs:1-19 or determining a
second CR score that is a second measurement of a gene product of a
gene having the greatest expressive range among a second subset of
said one or more different genes, wherein said second subset is
selected from the group consisting of genes having SEQ ID
NOs:20-39, wherein said mammalian patient is predicted as
responsive if said first or second measurement is less than or equal to
a predetermined threshold value or as non-responsive if said first
or second measurement is greater than said predetermined threshold
value.
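A minimal sketch of the single-gene rule of claim 5, under stated assumptions: "expressive range" is read here as the spread (maximum minus minimum) of a gene's measurements across reference samples, and the gene names, values, and threshold are hypothetical.

```python
def widest_range_gene(reference_profiles):
    """Return the gene whose reference measurements span the largest range.

    reference_profiles maps a gene name to its measurements across
    reference samples (an assumed data layout).
    """
    return max(
        reference_profiles,
        key=lambda gene: max(reference_profiles[gene]) - min(reference_profiles[gene]),
    )


def predict_from_single_gene(patient_profile, reference_profiles, threshold=0.0):
    """Use the widest-range gene's measurement as the CR score."""
    gene = widest_range_gene(reference_profiles)
    score = patient_profile[gene]
    label = "responsive" if score <= threshold else "non-responsive"
    return gene, score, label


# Hypothetical reference measurements and one patient profile.
reference = {
    "gene_1": [-0.2, 0.1, 0.3],
    "gene_2": [-1.0, 0.0, 1.2],
    "gene_3": [0.0, 0.1, 0.2],
}
patient = {"gene_1": 0.1, "gene_2": 0.8, "gene_3": 0.0}

result = predict_from_single_gene(patient, reference)
```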
6. The method of claim 3, wherein said step of determining one or
more CR scores is carried out by a method comprising: (a1)
comparing a marker profile comprising said measurements of said one
or more gene products with a responsive template and/or a
non-responsive template, wherein said responsive template comprises
measurements of said one or more gene products representative of
measurements of said one or more gene products in a plurality of
mammalian patients being responsive to said chemotherapy regimen,
and said non-responsive template comprises measurements of said one
or more gene products representative of measurements of said
plurality of gene products in a plurality of mammalian patients
being non-responsive to said chemotherapy regimen; and (a2)
determining a first degree of similarity between said marker
profile and said responsive template and/or a second degree of
similarity between said marker profile and said non-responsive
template, wherein said first and second degrees of similarity are
said one or more CR scores, and wherein said mammalian patient is
(b1) predicted to be responsive if said first degree of similarity
is greater than said second degree of similarity or if said first
degree of similarity is greater than a predetermined threshold or
(b2) predicted to be non-responsive if said first degree of
similarity is no greater than said second degree of similarity or
if said second degree of similarity is no greater than said
predetermined threshold.
7. The method of claim 6, wherein said first or second degree of
similarity is represented by a correlation coefficient between said
marker profile and said respective template.
8. The method of claim 6, wherein the measurement of each gene
product in said responsive template is an average of the
measurements of said gene product in a plurality of responsive
mammalian patients, and wherein the measurement of each gene
product in said non-responsive template is an average of the
measurements of said gene product in a plurality of non-responsive
mammalian patients.
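The template comparison of claims 6 to 8 can be illustrated with a short sketch. It assumes that each template is the per-gene average over a patient group (claim 8) and that similarity is the Pearson correlation coefficient (one possible reading of claim 7). All function names and profile values are hypothetical.

```python
from math import sqrt


def mean_template(profiles):
    """Average each gene's measurement across equal-length patient profiles."""
    count = len(profiles)
    return [sum(values) / count for values in zip(*profiles)]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (non-zero variance assumed)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def classify_by_template(profile, responsive_profiles, non_responsive_profiles):
    """Return both similarities (the CR scores) and the predicted label."""
    sim_r = pearson(profile, mean_template(responsive_profiles))
    sim_n = pearson(profile, mean_template(non_responsive_profiles))
    label = "responsive" if sim_r > sim_n else "non-responsive"
    return sim_r, sim_n, label


# Hypothetical four-gene profiles for two small training groups.
responsive_group = [[-0.5, -0.3, 0.1, -0.4], [-0.3, -0.5, 0.3, -0.2]]
non_responsive_group = [[0.6, 0.2, -0.2, 0.5], [0.4, 0.4, 0.0, 0.3]]
new_patient = [-0.6, -0.2, 0.3, -0.5]

sim_r, sim_n, label = classify_by_template(
    new_patient, responsive_group, non_responsive_group
)
```

Only the "greater than the second degree of similarity" branch of claim 6 is shown; the alternative comparison against a predetermined threshold would replace the final comparison.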
9. The method of claim 3, wherein said step of determining one or
more CR scores is carried out by a method comprising using a
chemotherapy response classifier selected from the group consisting
of an artificial neural network (ANN) classifier and a support
vector machine (SVM) classifier, wherein said chemotherapy response
classifier receives an input comprising a marker profile comprising
said measurements of said one or more gene products and provides an
output comprising said one or more CR scores.
10. The method of claim 9, wherein said chemotherapy response
classifier is trained with training data from a plurality of
training cancer patients, wherein said training data comprise for
each patient of said plurality of training cancer patients (i) a
training marker profile comprising measurements of said plurality
of gene products in a cell sample taken from said training patient;
and (ii) data indicating whether said training patient is
responsive to said treatment regimen.
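As a rough illustration of the trained classifier of claims 9 and 10, the sketch below trains a single sigmoid neuron by batch gradient descent. This is a deliberately simplified stand-in for an ANN and is not an SVM; the training profiles, labels, learning rate, and epoch count are all hypothetical.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def train_single_neuron(profiles, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the logistic loss.

    labels: 1 = non-responsive, 0 = responsive (an assumed encoding).
    """
    n_genes = len(profiles[0])
    n_samples = len(profiles)
    weights = [0.0] * n_genes
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_genes
        grad_b = 0.0
        for x, y in zip(profiles, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def cr_score(profile, weights, bias):
    """CR score: the modelled probability that the patient is non-responsive."""
    return sigmoid(sum(w * v for w, v in zip(weights, profile)) + bias)


# Hypothetical two-gene training marker profiles and outcome labels.
training_profiles = [
    [0.6, 0.4], [0.5, 0.7], [0.8, 0.3],        # non-responsive patients
    [-0.5, -0.3], [-0.4, -0.6], [-0.7, -0.2],  # responsive patients
]
training_labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_single_neuron(training_profiles, training_labels)
score_high = cr_score([0.7, 0.5], weights, bias)
score_low = cr_score([-0.6, -0.4], weights, bias)
```

A score above 0.5 would be read as non-responsive under this encoding. A practical classifier would be trained on far more patients and validated on held-out data.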
11. The method of claim 3, comprising determining one or more CR
scores that indicates in which percentile said measurements of said
one or more gene products fall in the said reference population of
individuals, wherein said patient is predicted to be non-responsive
if said one or more CR scores indicate that said measurements of
said one or more gene products fall in the Y1 percentile in said
reference population, wherein Y1 percentile=60 percentile, 70
percentile, 80 percentile, or 90 percentile, or is predicted to be
responsive if said one or more CR scores indicate that said
measurements of said one or more gene products fall in the Y2
percentile in said reference population, wherein Y2 percentile=10
percentile, 20 percentile, 30 percentile, or 40 percentile.
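The percentile rule of claim 11 might be sketched as below. The sketch assumes that "fall in the Y1 percentile" means at or above Y1 and that "fall in the Y2 percentile" means at or below Y2; the "indeterminate" label for values in between, the function names, and the reference values are hypothetical additions.

```python
def percentile_rank(value, reference_values):
    """Percentage of reference values that are less than or equal to value."""
    at_or_below = sum(1 for r in reference_values if r <= value)
    return 100.0 * at_or_below / len(reference_values)


def predict_by_percentile(value, reference_values, y1=80.0, y2=20.0):
    """Assumed reading: >= Y1 is non-responsive, <= Y2 is responsive."""
    rank = percentile_rank(value, reference_values)
    if rank >= y1:
        return "non-responsive"
    if rank <= y2:
        return "responsive"
    return "indeterminate"


# Hypothetical measurements of one gene product in ten reference individuals.
reference_population = [-0.9, -0.6, -0.4, -0.2, -0.1, 0.0, 0.1, 0.3, 0.5, 0.8]

high = predict_by_percentile(0.6, reference_population)
low = predict_by_percentile(-0.7, reference_population)
middle = predict_by_percentile(0.0, reference_population)
```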
12. The method of claim 1, wherein said measurements of one or more
gene products are measurements of abundance levels of gene
transcripts.
13. The method of claim 1, wherein said measurements of one or more
gene products are measurements of abundance levels of proteins.
14. The method of claim 1, wherein said chemotherapy regimen
comprises administration of a chemotherapy drug selected from the
group consisting of 5-fluorouracil, CMF combination consisting of
cyclophosphamide, methotrexate, and 5-fluorouracil, paclitaxel,
etoposide, and carboplatin.
15. The method of claim 1, wherein said one or more gene products
are respectively products of said one or more different genes
selected from the group consisting of genes having SEQ ID
NOs:1-39.
16. The method of claim 1, wherein said one or more gene products
are of at least N or are all of said one or more different genes
selected from the group consisting of genes having SEQ ID NOs:1-39,
wherein N=2, 3, 4, 5, 10, 15, 20, 25, 30, or 35.
17. The method of claim 1, wherein said one or more gene products
are of at least N, or are all of said one or more different genes
selected from the group consisting of genes having SEQ ID NOs:1-19,
wherein N=2, 3, 4, 5, 10, or 15.
18. The method of claim 1, wherein said one or more gene products
are of at least N, or are all of said one or more different genes
selected from the group consisting of genes having SEQ ID
NOs:20-39, wherein N=2, 3, 4, 5, 10, or 15.
19. The method of claim 16, wherein said one or more gene products
comprises gene products of (i) at least N, or are all of said one
or more different genes selected from the group consisting of genes
having SEQ ID NOs:1-19, wherein N=2, 3, 4, 5, 10, or 15 and (ii) at
least M, or are all of said one or more different genes selected
from the group consisting of genes having SEQ ID NOs:20-39, wherein
M=2, 3, 4, 5, 10, or 15.
20. The method of claim 1, wherein said chemotherapy regimen is an
adjuvant chemotherapy regimen, and wherein a prediction of a
patient as responsive to said chemotherapy regimen indicates
non-occurrence of metastases or survival within a first
predetermined period of time after initial diagnosis in said
patient treated with said chemotherapy regimen, and wherein a
prediction of a patient as non-responsive to said chemotherapy
regimen indicates occurrence of metastases or non-survival within a
second predetermined period of time in said patient treated with
said chemotherapy regimen.
21. The method of claim 1, wherein said chemotherapy regimen is a
primary chemotherapy regimen, and a prediction of a patient as
responsive to said chemotherapy regimen indicates (i) a reduction
in tumor size or number of cancer cells and/or (ii) non-occurrence
of metastases or survival within a first predetermined period of
time after initial diagnosis in said patient treated with said
chemotherapy regimen, and wherein a prediction as non-responsive to
said chemotherapy regimen indicates (iii) a lack of reduction in
tumor size or number of cancer cells and/or (iv) occurrence of
metastases or non-survival within a second predetermined period of
time in said patient treated with said chemotherapy regimen.
22. The method of claim 20 or 21, wherein said first period of time
and said second period of time are the same, and are each 3, 5, 7,
10, or 12 years.
23. The method of claim 1, wherein said patient has been determined
to have a poor prognosis, wherein a poor prognosis indicates
occurrence of metastases or non-survival within a third
predetermined period of time in said patient untreated with any
chemotherapy for said cancer.
24. The method of claim 1, wherein said measurement of each said
gene product is a relative level of said gene product in said cell
sample versus level of said gene product in a reference sample,
represented as a log ratio.
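For illustration, the log-ratio measurement of claim 24 could be computed as follows. The base-10 logarithm, the function name, and the intensity values are assumptions; the claim does not specify a base.

```python
from math import log10


def log_ratio(sample_level, reference_level):
    """Log10 of the gene product level in the cell sample over the reference sample."""
    if sample_level <= 0 or reference_level <= 0:
        raise ValueError("levels must be positive to take a logarithm")
    return log10(sample_level / reference_level)


# Hypothetical intensities: twice, equal to, and half the reference level.
up = log_ratio(200.0, 100.0)
unchanged = log_ratio(100.0, 100.0)
down = log_ratio(50.0, 100.0)
```

With this convention a positive value indicates a higher level in the cell sample than in the reference sample, zero indicates no difference, and a negative value indicates a lower level.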
25. The method of claim 24, wherein said reference sample is
selected from the group consisting of a sample comprising a pool of
cancer cells obtained from a plurality of patients having said
cancer, a sample of cells of a non-cancerous cell line of cells of
the same type of tissue as said cancer, and a sample of cells of a
cell line of said cancer.
26. The method of claim 1, wherein said patient is a human
patient.
27. The method of claim 1, wherein said cancer is breast
cancer.
28. The method of claim 1, wherein said cancer is ovarian
cancer.
29. A method for assigning a treatment regimen for a patient having
a cancer, comprising (i) predicting whether said patient is
responsive or non-responsive to a chemotherapy regimen using the
method of claim 1; and (ii) if said patient is determined to be
responsive to said chemotherapy regimen, assigning said patient a
treatment regimen that comprises said chemotherapy regimen; or if
said patient is determined to be non-responsive to said
chemotherapy regimen, assigning said patient (ii1) a treatment
regimen that does not comprise said chemotherapy regimen or (ii2) a
treatment regimen comprising (A) said chemotherapy regimen and (B)
one or more agents that reduce the expression and/or activity level
of said one or more gene products.
30. A method for enrolling a plurality of cancer patients for a
clinical trial of a chemotherapy regimen, comprising (i) determining
whether each patient in said plurality is responsive or
non-responsive to said chemotherapy regimen using the method of
claim 1; and (ii) assigning each patient who is predicted to be
responsive to one patient group and each patient who is predicted
to be non-responsive to another patient group, at least one of said
patient groups being enrolled in said clinical trial.
31. The method of claim 1, wherein said method is a computer
implemented method.
32. A computer system comprising a processor, and a memory coupled
to said processor and encoding one or more programs, wherein said
one or more programs cause the processor to carry out the method of
claim 1.
33. A computer program product for use in conjunction with a
computer having a processor and a memory connected to the
processor, said computer program product comprising a computer
readable storage medium having a computer program mechanism encoded
thereon, wherein said computer program mechanism may be loaded into
the memory of said computer and cause said computer to carry out
the method of claim 1.
34. The method of claim 1, further comprising obtaining said
measurements of said one or more gene products by a method
comprising measuring said plurality of gene products of said cell
sample taken from said patient.
35. The method of claim 11, further comprising obtaining
measurement of abundance level of each said gene transcript by a
method comprising contacting a positionally-addressable microarray
with nucleic acids from said cell sample or nucleic acids derived
therefrom under hybridization conditions, and detecting the amount
of hybridization that occurs, said microarray comprising one or
more polynucleotide probes complementary to a hybridizable sequence
of each said gene transcript or a nucleic acid derived thereof.
36. The method of claim 11, further comprising obtaining
measurement of abundance level of each said gene transcript by a
method comprising measuring the transcript level of said gene using
quantitative reverse transcriptase PCR (qRT-PCR).
37. A method for treating a patient having a cancer, comprising
administering to said patient (a) one or more agents that is
capable of reducing the expression and/or activity of one or more
different genes selected from the group consisting of genes having
SEQ ID NOs:1-39 or respective functional equivalents thereof and/or
their encoded proteins, and (b) a chemotherapy regimen, wherein
said patient is predicted to be non-responsive to said chemotherapy
regimen as a result of overexpression of said one or more different
genes.
38. The method of claim 37, wherein said one or more different
genes consist of at least N or all of the different genes selected
from the group consisting of genes having SEQ ID NOs:1-39, wherein
N=2, 3, 4, 5, 10, 15, 20, 25, 30, or 35.
39. The method of claim 38, wherein said one or more different
genes consist of at least N or all of the different genes selected
from the group consisting of genes having SEQ ID NOs:1-19, wherein
N=2, 3, 4, 5, 10, or 15.
40. The method of claim 38, wherein said one or more different
genes consist of at least N or all of the different genes selected
from the group consisting of genes having SEQ ID NOs:20-39, wherein
N=2, 3, 4, 5, 10 or 15.
41. The method of claim 38, wherein said one or more different genes
are of (i) at least N or all of the different genes selected from
the group consisting of genes having SEQ ID NOs:1-19, wherein N=2,
3, 4, 5, 10, or 15; and (ii) at least K or all of the different
genes selected from the group consisting of genes having SEQ ID
NOs:20-39, wherein K=2, 3, 4, 5, 10, or 15.
42. The method of claim 37, wherein said one or more agents
comprise a substance selected from the group consisting of siRNA,
antisense nucleic acid, ribozyme, and triple helix forming nucleic acid,
each being capable of reducing the expression of one or more of
said one or more different genes.
43. The method of claim 37, wherein said one or more agents
comprise a substance selected from the group consisting of
antibody, peptide, and small molecule, each being capable of reducing
the activity of one or more of proteins encoded by said one or more
different genes.
44. The method of claim 42, wherein said one or more agents
comprise an siRNA targeting said one or more different genes.
45. The method of claim 44, wherein said one or more different
genes consist of at least L different genes, wherein L=2, 3, 4, 5,
10, or 15.
46. The method of claim 37, further comprising determining a
transcript level of each of said one or more different genes.
47. The method of claim 46, wherein said determining each said
transcript level is carried out by a method comprising measuring
the transcript level of said gene using one or more polynucleotide
probes, each of said one or more polynucleotide probes comprising a
nucleotide sequence complementary to a hybridizable sequence in
said transcript of said gene or a nucleic acid derived thereof.
48. The method of claim 47, wherein said one or more polynucleotide
probes are polynucleotide probes on a microarray.
49. The method of claim 48, wherein said determining each said
transcript level is carried out by a method comprising measuring
the transcript level of said gene using quantitative reverse
transcriptase PCR (qRT-PCR).
50. The method of claim 37, wherein said chemotherapy regimen
comprises administering a chemotherapy drug selected from the group
consisting of 5-fluorouracil, CMF combination consisting of
cyclophosphamide, methotrexate, and 5-fluorouracil, paclitaxel,
etoposide, and carboplatin.
51. The method of claim 37, wherein said patient is a human
patient.
52. The method of claim 37, wherein said cancer is breast
cancer.
53. The method of claim 37, wherein said cancer is ovarian
cancer.
54. A method for modulating sensitivity of a cell to a
chemotherapeutic drug, comprising contacting said cell with one or
more agents, said one or more agents being capable of reducing the
expression and/or activity of one or more different genes selected
from the group consisting of genes having SEQ ID NOs:1-39 or
respective functional equivalents thereof and/or their encoded
proteins.
55. A method for modulating growth of a cell, comprising contacting
said cell with (a) one or more agents, said one or more agents
being capable of reducing the expression and/or activity of one or
more different genes selected from the group consisting of genes
having SEQ ID NOs:1-39 or respective functional equivalents thereof
and/or their encoded proteins; and (b) a sufficient amount of a
chemotherapeutic drug.
56. The method of any one of claims 54 to 55, wherein said one or
more different genes consist of at least N or all of the different
genes selected from the group consisting of genes having SEQ ID
NOs:1-39, wherein N=2, 3, 4, 5, 10, 15, 20, 25, 30, or 35.
57. The method of claim 56, wherein said one or more different
genes consist of at least N or all of the different genes selected
from the group consisting of genes having SEQ ID NOs:1-19, wherein
N=2, 3, 4, 5, 10, or 15.
58. The method of claim 56, wherein said one or more different
genes consist of at least N or all of the different genes selected
from the group consisting of genes having SEQ ID NOs:20-39, wherein
N=2, 3, 4, 5, 10, or 15.
59. The method of claim 56, wherein said one or more different genes
are of (i) at least N or all of the different genes selected from
the group consisting of genes having SEQ ID NOs:1-19, wherein N=2,
3, 4, 5, 10, or 15; and (ii) at least K or all of the different
genes selected from the group consisting of genes having SEQ ID
NOs:20-39, wherein K=2, 3, 4, 5, 10, or 15.
60. The method of claim 54 or 55, wherein said one or more agents
comprise a substance selected from the group consisting of siRNA,
antisense nucleic acid, ribozyme, and triple helix forming nucleic
acid, each being capable of reducing the expression of one or more of
said one or more different genes.
61. The method of claim 54 or 55, wherein said one or more agents
comprise a substance selected from the group consisting of
antibody, peptide, and small molecule, each being capable of reducing
the activity of one or more of proteins encoded by said one or more
different genes.
62. The method of claim 60, wherein said one or more agents
comprise an siRNA targeting said one or more different genes.
63. The method of claim 62, wherein said one or more different
genes consist of at least L different genes, wherein L=2, 3, 4, 5,
10, or 15.
64. The method of claim 54 or 55, further comprising determining a
transcript level of each of said one or more different genes.
65. The method of claim 64, wherein said determining each said
transcript level is carried out by a method comprising measuring
the transcript level of said gene using one or more polynucleotide
probes, each of said one or more polynucleotide probes comprising a
nucleotide sequence complementary to a hybridizable sequence in
said transcript of said gene or a nucleic acid derived thereof.
66. The method of claim 65, wherein said one or more polynucleotide
probes are polynucleotide probes on a microarray.
67. The method of claim 65, wherein said determining each said
transcript level is carried out by a method comprising measuring
the transcript level of said gene using quantitative reverse
transcriptase PCR (qRT-PCR).
68. The method of claim 54 or 55, wherein said chemotherapeutic
drug is selected from the group consisting of 5-fluorouracil, CMF
combination consisting of cyclophosphamide, methotrexate, and
5-fluorouracil, paclitaxel, etoposide, and carboplatin.
69. The method of claim 54 or 55, wherein said cell is a human
cell.
70. The method of claim 54 or 55, wherein said cell is a breast
cancer cell.
71. The method of claim 54 or 55, wherein said cell is an ovarian
cancer cell.
72. A method of identifying an agent that is capable of modulating
sensitivity of a cell to the growth inhibitory effect of a
chemotherapeutic drug, said method comprising comparing a first
growth inhibitory effect of said chemotherapeutic drug on cells
expressing said gene in the presence of a candidate agent with a
second growth inhibitory effect of said chemotherapeutic drug on
cells expressing said gene in the absence of said agent, wherein
said agent is capable of reducing the expression and/or activity of
a gene selected from the group consisting of genes having SEQ ID
NOs:1-39 or respective functional equivalents thereof and/or its
encoded protein, wherein a difference in said first inhibitory
effect and said second growth inhibitory effect identifies said
agent as capable of modulating sensitivity of said cell to the
growth inhibitory effect of said chemotherapeutic drug.
73. The method of claim 72, further comprising: (a) contacting a
first cell expressing said gene with said chemotherapeutic drug in
the presence of said agent and measuring said first growth
inhibitory effect; (b) contacting a second cell expressing said
gene with said chemotherapeutic drug in the absence of said agent
and measuring said second growth inhibitory effect.
74. The method of claim 72, wherein said agent comprises a
substance selected from the group consisting of siRNA, antisense
nucleic acid, ribozyme, and triple helix forming nucleic acid, each
reducing the expression of said genes.
75. The method of claim 72, wherein said one or more agents
comprise a substance selected from the group consisting of
antibody, peptide, and small molecule, each reducing the activity
of one or more of proteins encoded by said one or more different
genes in said patient.
76. The method of claim 72, wherein said chemotherapy regimen
comprises administering a chemotherapy drug selected from the group
consisting of 5-fluorouracil, CMF combination consisting of
cyclophosphamide, methotrexate, and 5-fluorouracil, paclitaxel,
etoposide, and carboplatin.
77. The method of claim 72, wherein said cell is a breast cancer
cell.
78. The method of claim 72, wherein said cell is an ovarian cancer
cell.
79. A microarray comprising for each of one or more different genes
selected from the group consisting of genes having SEQ ID NOs:1-39
or respective functional equivalents thereof, one or more
polynucleotide probes complementary and hybridizable to a sequence
in said gene, wherein polynucleotide probes complementary and
hybridizable to said genes constitute at least 50%, 60%, 70%, 80%
or 90% of the probes on said microarray.
80. The method of claim 79, wherein said one or more different
genes consist of at least N or all of the different genes selected
from the group consisting of genes having SEQ ID NOs:1-39, wherein
N=2, 3, 4, 5, 10, 15, 20, 25, 30, or 35.
81. The method of claim 79, wherein said one or more different
genes consist of at least N or all of the genes selected from the
group consisting of genes having SEQ ID NOs:1-19, wherein N=2, 3,
4, 5, 10, or 15.
82. The method of claim 79, wherein said one or more different
genes consist of at least N or all of the different genes selected
from the group consisting of genes having SEQ ID NOs:20-39, wherein
N=2, 3, 4, 5, 10, or 15.
83. The method of claim 79, wherein said one or more different genes
are of (i) at least N or all of the different genes selected from
the group consisting of genes having SEQ ID NOs:1-19, wherein N=2,
3, 4, 5, 10, or 15; and (ii) at least N or all of the different
genes selected from the group consisting of genes having SEQ ID
NOs:20-39, wherein N=2, 3, 4, 5, 10, or 15.
84. The method of claim 34, wherein said method is carried out in
vivo.
85. The method of claim 34, wherein said method is carried out in
vitro.
Description
[0001] This application claims the benefit under 35 U.S.C.
§ 119(e) of U.S. Provisional Patent Application No. 60/818,262,
filed on Jun. 30, 2006, which is incorporated by reference herein
in its entirety.
1. FIELD OF THE INVENTION
[0002] The invention relates to molecular markers that are
associated with responses to chemotherapies in a patient, and
methods and computer systems for determining such responses based
on measurements of these molecular markers. The present invention
also relates to methods and compositions for enhancing the efficacy
of chemotherapies in patients by modulating the expression or
activity of genes encoding these molecular markers and/or their
encoded proteins.
2. BACKGROUND OF THE INVENTION
[0003] Chemotherapy is an important modality in treating many types
of cancers. A large and growing variety of potent chemotherapeutic
agents targeting cancer cells by various mechanisms have been
developed and can be used individually or in combination. With the
help of such a growing menu of chemotherapeutic agents, the
disease-free survival and overall survival of cancer patients have
been significantly improved for many types of cancers. However, not
all cancer patients are responsive to all available chemotherapy
treatments. Most chemotherapeutic agents cause severe side effects,
such as anemia, infections and sepsis (sometimes lethal) due to
immune suppression, hemorrhage, and hepatotoxicity. Chemotherapy
treatments generally are also physically exhausting for patients,
and are often associated with high costs. Therefore, determining
whether a cancer patient should receive chemotherapy and choosing
the appropriate chemotherapy are often important parts of medical
intervention. Traditionally, chemotherapy is prescribed to cancer
patients based on their disease prognosis and risk of side effects.
For example, in cases of breast cancer, such prognostic and
predictive factors as age, tumor size, axillary lymph node status,
histological tumor type, pathological grade and hormone receptor
status have been used to evaluate whether a patient may benefit
from chemotherapy treatments.
[0004] In the past several years, gene expression signatures of
cancer cells have been found to provide more accurate disease
prognosis and/or prediction of chemotherapy responsiveness than
traditional clinical factors. Gene markers that are informative for
predicting breast cancer outcome have been disclosed (see, e.g.,
United States Patent Publication 20030224374; United States Patent
Publication 20040058340; van't Veer et al., 2002, Nature 415:530;
van de Vijver et al., 2002, N. Engl. J. Med. 347:1999). Expression
profiles of such gene markers, e.g., a 70-gene set, were found to be
capable of predicting the likelihood of the occurrence of
metastases within five years of initial diagnosis in breast cancer
patients. It was found that a prognosis based on expression
profiles of the gene markers outperforms that based on traditional
clinical factors. The 70-gene set was also found to be capable of
predicting whether a patient should be treated with systemic
therapies such as chemotherapy and hormonal therapy (see United
States Patent Publication 20040058340).
[0005] The 70-gene marker set has also been found to be useful for
predicting the responsiveness of a breast cancer patient to
chemotherapy in certain patient subgroups (see, e.g., Dai et al.,
US 2004-0058340, published Mar. 25, 2004). Among patients whose
gene expression profile indicates poor prognosis, a patient's
responsiveness to chemotherapy depends not only on the patient's ER
level, but also on the change of the ER level with age. That publication
discloses that patients who show a high ER level at an earlier age
(thus a high ER/AGE) show little response to chemotherapy, whereas
patients who show high ER level at later age (thus a low ER/AGE)
show increased response to chemotherapy.
[0006] Pawitan et al. (Pawitan et al., 2005, Breast Cancer Res.
7:R953-964) reported identification of gene expression signatures
that are associated with prognosis and response to adjuvant
therapies. Gene expression profiles of tumor samples from 159
population-derived breast cancer patients were analyzed using
hierarchical clustering, and a set of 64 genes was identified. The
64-gene set was found to be able to distinguish three subclasses of
patients: patients who did well with therapy, patients who did well
without therapy, and patients who failed to benefit from given
therapy.
[0007] Wang et al. investigated the gene expression patterns of
chemoresistance to thymidylate synthase (TS) inhibitors Raltitrexed
(TDX) and 5-fluorouracil (5-FU) in a panel of 5 matched cancer cell
lines (Wang et al., 2001, Cancer Res. 61:5505-10). By comparing the
expression profiles of resistant cell lines and their respective
chemosensitive parent cell lines, Wang et al. have found 28 genes
whose expression levels were altered >1.5-fold among resistant
cells, with 2 genes (TS and YES1) consistently higher in the
panel.
[0008] Duan et al. disclosed identification of genes involved in a
paclitaxel resistance phenotype (Duan et al., 2005, Cancer
Chemotherapy and Pharmacology 55:277-285). Affymetrix HG-U95Av2
microarrays were used to quantify gene expression differences
between the resistant and sensitive cell lines. Three
paclitaxel-resistant human ovarian and breast cancer cell lines
were established from drug-sensitive parental cell lines. Eight
genes were identified to be significantly over-expressed in the
three drug-resistant cell lines, including multi-drug resistant
gene 1 (MDR1), and three genes were identified to be significantly
under-expressed in the three drug-resistant cell lines.
[0009] Chang et al. disclosed evaluating tumor response to
neoadjuvant docetaxel treatment in breast cancer patients based on
expression profiles (Chang et al., 2003, Lancet. 362:362-369).
Differential patterns of expression of 92 genes were found to
correlate with docetaxel response. Among these genes, a higher
expression of genes involved in cell cycle, cytoskeleton, adhesion,
protein transport, protein modification, transcription, and stress
or apoptosis was found to be associated with sensitive tumors,
whereas increased expression of some transcriptional and signal
transduction genes was found to be associated with resistant
tumors. Chang et al. disclosed that the molecular patterns of the
residual cancers after three months of docetaxel treatment were
found to be strikingly similar, independent of initial sensitivity
or resistance (Chang et al., 2005, J Clin Oncol 23:1169-1177). They
concluded that this may indicate selection of a residual and
resistant subpopulation of cells. The gene expression pattern was
populated by genes involved in cell cycle arrest at G2M (e.g.,
mitotic cyclins and cdc2) and survival pathways involving the
mammalian target of rapamycin. The authors state that these genes
may be therapeutic targets that could lead to improved
treatment.
[0010] Luker et al. (Luker et al., 2001, Cancer Res. 61:6540-6547)
reported identification of interferon regulatory factor 9 (IRF9) as
a positive regulator of resistance to anti-microtubule agents such
as paclitaxel in breast cancer cells. Luker et al. showed that
several proteins in the type I IFN regulated pathway were
over-expressed in paclitaxel-resistant breast tumor cell lines
derived from the MCF-7 cell line and in untreated breast tumor
samples and uterine tumor samples.
[0011] Einav et al. (Einav et al., 2005, Oncogene 24:6367-75)
reported an analysis of gene expression data of various cancers,
including gene expression data of childhood acute lymphoblastic
leukemia (ALL) samples (Yeoh et al., 2002, Cancer Cell 1:133-143),
gene expression data of breast cancer samples (van't Veer et al.,
2002, Nature 415:530-536), and gene expression data of ovarian
cancer samples (Welsh et al., 2001, Proc. Natl. Acad. Sci. USA
98:1176-81), among others. They discovered that a group of about 30
correlated genes, containing mainly genes in the interferon
response pathway, are over-expressed in certain subclasses of ALL
samples, breast cancer samples and ovarian cancer samples.
[0012] Spentzos et al. reported a 93-gene signature that can be
used to prognose chemotherapy responsiveness in epithelial ovarian
cancer patients (Spentzos et al., 2005, J. Clinic. Oncology
23:7911-7918).
[0013] WO 2005/100606 disclosed gene sets useful in predicting the
response of cancer, e.g. breast cancer patients to chemotherapy. WO
2005/100606 also disclosed a multi-gene RNA analysis based cancer
test which can be used for predicting patient response to
chemotherapy.
[0014] Discussion or citation of a reference herein shall not be
construed as an admission that such reference is prior art to the
present invention.
3. SUMMARY OF THE INVENTION
[0015] The invention provides a method for predicting the
responsiveness of a mammalian patient having a cancer to a
chemotherapy regimen, comprising predicting said patient as (a)
responsive to said chemotherapy regimen, if expression and/or
activity of one or more gene products in a cell sample taken from
said patient is not up-regulated relative to a reference population
of individuals of the same species as said patient; or (b)
non-responsive to said chemotherapy regimen, if expression and/or
activity of said one or more gene products is up-regulated relative
to said reference population, wherein said one or more gene
products comprise respectively products of one or more different
genes selected from the group consisting of genes corresponding to
SEQ ID NOs:1-39 or respective functional equivalents thereof. In
one embodiment, the method further comprises, prior to said step of
predicting, a step of determining whether expression and/or activity
of said one or more gene products is up-regulated relative to
said reference population of individuals.
[0016] In some embodiments, said step of determining is carried out
by a method comprising determining one or more chemotherapy
response scores (CR scores) based on measurements of at least said
one or more gene products in said cell sample, wherein said one or
more CR scores indicate whether expression and/or activity of said
one or more first gene products is up-regulated as compared to
individuals in said reference population.
[0017] In one embodiment, said step of determining one or more CR
scores is carried out by a method comprising determining a CR score
that is an average of said measurements of said one or more gene
products, wherein said patient is predicted as responsive if said
average is less than or equal to a predetermined threshold value or as
non-responsive if said average is greater than said predetermined
threshold value.
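By way of non-limiting illustration, the averaging embodiment described in the preceding paragraph can be sketched in Python. The function names, the example log-ratio values, and the threshold of 0.0 are assumptions made for this sketch and are not specified in the disclosure.

```python
from statistics import mean

def average_cr_score(measurements):
    """Return a CR score computed as the mean of the per-gene measurements."""
    if not measurements:
        raise ValueError("at least one gene-product measurement is required")
    return mean(measurements)

def predict_by_average(measurements, threshold=0.0):
    """Predict 'responsive' if the average is less than or equal to the
    threshold, and 'non-responsive' if it is greater than the threshold."""
    score = average_cr_score(measurements)
    return "responsive" if score <= threshold else "non-responsive"

# Hypothetical log-ratio measurements for three marker genes.
print(predict_by_average([-0.4, -0.1, 0.2]))  # responsive
print(predict_by_average([0.5, 0.3, 0.1]))    # non-responsive
```

In practice the threshold would be predetermined from a reference population rather than fixed at zero.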
[0018] In another embodiment, said step of determining one or more
CR scores is carried out by a method comprising determining a CR
score that is a measurement of a gene product of a gene having the
greatest expressive range among said different genes selected from
the group consisting of genes having SEQ ID NOs:1-19 or among
said different genes selected from the group consisting of genes
having SEQ ID NOs:20-39, wherein said patient is predicted as
responsive if said measurement is less than or equal to a predetermined
threshold value or as non-responsive if said measurement is greater
than said predetermined threshold value.
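As a minimal sketch of the single-gene embodiment above, the following Python code selects the gene whose measurements span the widest range and uses the patient's measurement of that gene as the CR score. It assumes that the range is computed as maximum minus minimum across a reference population; the gene identifiers, data values, and threshold are hypothetical.

```python
def widest_range_gene(reference):
    """Return the gene whose reference measurements have the largest
    (max - min) spread. `reference` maps a gene identifier to a list of
    measurements across the reference population."""
    return max(reference, key=lambda g: max(reference[g]) - min(reference[g]))

def predict_by_widest_range(patient, reference, threshold=0.0):
    """Use the patient's measurement of the widest-range gene as the CR score."""
    gene = widest_range_gene(reference)
    score = patient[gene]
    label = "responsive" if score <= threshold else "non-responsive"
    return gene, score, label

# Hypothetical reference data for two marker genes and one patient profile.
reference = {"GENE_A": [-0.2, 0.1, 0.3], "GENE_B": [-1.0, 0.0, 1.5]}
patient = {"GENE_A": 0.4, "GENE_B": -0.3}
print(predict_by_widest_range(patient, reference))
```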
[0019] In still another embodiment, said step of determining one or
more CR scores is carried out by a method comprising (a1) comparing
a marker profile comprising said measurements of said one or more
gene products with a responsive template and/or a non-responsive
template, said responsive template comprising measurements of said
one or more gene products representative of measurements of said
one or more gene products in a plurality of patients being
responsive to said chemotherapy regimen, and said non-responsive
template comprising measurements of said one or more gene products
representative of measurements of said plurality of gene products
in a plurality of patients being non-responsive to said
chemotherapy regimen; and (a2) determining a first degree of
similarity between said marker profile and said responsive template
and/or a second degree of similarity between said marker profile
and said non-responsive template, wherein said first and second
degrees of similarity are said one or more CR scores, and wherein
said patient is (b1) predicted to be responsive if said first
degree of similarity is greater than said second degree of
similarity or if said first degree of similarity is greater than a
predetermined threshold or (b2) predicted to be non-responsive if
said first degree of similarity is no greater than said second
degree of similarity or if said second degree of similarity is no
greater than said predetermined threshold. In one embodiment, each
said degree of similarity is represented by a correlation
coefficient between said marker profile and said respective
template. In one embodiment, the measurement of each gene product
in said responsive template is an average of the measurements of
said gene product in a plurality of responsive patients, and
wherein the measurement of each gene product in said non-responsive
template is an average of the measurements of said gene product in
a plurality of non-responsive patients.
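The template-comparison embodiment above can be illustrated with the following Python sketch, which builds each template as a per-gene average and uses a Pearson correlation coefficient as the degree of similarity. The choice of the Pearson coefficient, the function names, and the example profiles are assumptions of this sketch; the disclosure states only that a correlation coefficient can be used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)

def template(profiles):
    """Average each gene (column) across a group of patient profiles (rows)."""
    return [sum(col) / len(col) for col in zip(*profiles)]

def predict_by_template(profile, responsive_profiles, non_responsive_profiles):
    """Responsive if similarity to the responsive template is greater than
    similarity to the non-responsive template; otherwise non-responsive."""
    c1 = pearson(profile, template(responsive_profiles))
    c2 = pearson(profile, template(non_responsive_profiles))
    return "responsive" if c1 > c2 else "non-responsive"

# Hypothetical training profiles (rows are patients, columns are genes).
responsive = [[-0.5, -0.3, 0.1], [-0.7, -0.1, 0.3]]
non_responsive = [[0.6, 0.2, -0.2], [0.4, 0.4, 0.0]]
print(predict_by_template([-0.4, -0.2, 0.1], responsive, non_responsive))
```

The threshold-based variant of the embodiment would compare `c1` with a predetermined value instead of with `c2`.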
[0020] In another embodiment, said step of determining one or more
CR scores is carried out by a method comprising using a
chemotherapy response classifier selected from the group consisting
of an artificial neural network (ANN) classifier and a support
vector machine (SVM) classifier, wherein said chemotherapy response
classifier receives an input comprising a marker profile comprising
said measurements of said one or more gene products and provides an
output comprising said one or more CR scores. In one embodiment,
said chemotherapy response classifier is trained with training data
from a plurality of training cancer patients, wherein said training
data comprise for each patient of said plurality of training cancer
patients (i) a training marker profile comprising measurements of
said plurality of gene products in a cell sample taken from said
training patient; and (ii) data indicating whether said training
patient is responsive to said treatment regimen.
[0021] In still another embodiment, the method comprises
determining one or more CR scores that indicate in which
percentile said measurements of said one or more gene products fall
in said reference population, wherein said patient is predicted
to be non-responsive if said one or more CR scores indicate that
said measurements of said one or more gene products fall in the Y1
percentile in said reference population, wherein Y1 percentile=60
percentile, 70 percentile, 80 percentile, or 90 percentile, or is
predicted to be responsive if said one or more CR scores indicate
that said measurements of said one or more gene products fall in
the Y2 percentile in said reference population, wherein Y2
percentile=10 percentile, 20 percentile, 30 percentile, or 40
percentile.
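A minimal Python sketch of the percentile embodiment above follows. It assumes that falling "in the Y1 percentile" means a percentile rank at or above Y1, that falling "in the Y2 percentile" means a rank at or below Y2, and that scores between the two cut-offs are left unclassified; these interpretations, the "indeterminate" label, and the example values are assumptions of the sketch.

```python
def percentile_rank(value, reference):
    """Percentage of reference scores that are less than or equal to `value`."""
    if not reference:
        raise ValueError("reference population must not be empty")
    return 100.0 * sum(1 for r in reference if r <= value) / len(reference)

def predict_by_percentile(value, reference, y1=80.0, y2=20.0):
    """Non-responsive at or above the Y1 percentile, responsive at or below
    the Y2 percentile, and indeterminate in between (an assumed convention)."""
    rank = percentile_rank(value, reference)
    if rank >= y1:
        return "non-responsive"
    if rank <= y2:
        return "responsive"
    return "indeterminate"

# Hypothetical CR scores for a reference population of ten individuals.
reference = [-1.0, -0.8, -0.5, -0.2, 0.0, 0.1, 0.3, 0.6, 0.9, 1.2]
print(predict_by_percentile(1.0, reference))   # non-responsive (90th percentile)
print(predict_by_percentile(-0.9, reference))  # responsive (10th percentile)
```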
[0022] In one embodiment, said measurements of one or more gene
products are measurements of abundance levels of gene
transcripts.
[0023] In another embodiment, said measurements of one or more gene
products are measurements of abundance levels of proteins.
[0024] In a specific embodiment, said chemotherapy regimen
comprises administration of a chemotherapy drug selected from the
group consisting of 5-fluorouracil, CMF combination consisting of
cyclophosphamide, methotrexate, and 5-fluorouracil, paclitaxel,
etoposide, and carboplatin.
[0025] In one embodiment, said one or more gene products are
respectively products of the genes selected from the group
consisting of genes having SEQ ID NOs:1-39. In another
embodiment, said one or more gene products are of at least N or are
all of the different genes selected from the group consisting of
genes having SEQ ID NOs:1-39, wherein N=2, 3, 4, 5, 10, 15, 20,
25, 30, or 35. In still another embodiment, said one or more gene
products are of at least N, or are all of the different genes
selected from the group consisting of genes having SEQ ID
NOs:1-19, wherein N=2, 3, 4, 5, 10, or 15. In still another
embodiment, said one or more gene products are of at least N, or
are all of the different genes selected from the group consisting
of genes having SEQ ID NOs:20-39, wherein N=2, 3, 4, 5, 10, or 15.
In still another embodiment, said one or more gene products
comprise gene products of (i) at least N, or are all of the
different genes selected from the group consisting of genes having
SEQ ID NOs:1-19, wherein N=2, 3, 4, 5, 10, or 15 and (ii) at least
M, or are all of the different genes selected from the group
consisting of genes having SEQ ID NOs:20-39, wherein M=2, 3, 4, 5,
10, or 15.
[0026] In one embodiment, the chemotherapy regimen is an adjuvant
chemotherapy regimen, and wherein a prediction of a patient as
responsive to said chemotherapy regimen indicates non-occurrence of
metastases or survival within a first predetermined period of time
after initial diagnosis in said patient treated with said
chemotherapy regimen, and wherein a prediction of a patient as
non-responsive to said chemotherapy regimen indicates occurrence of
metastases or non-survival within a second predetermined period of
time in said patient treated with said chemotherapy regimen. In
another embodiment, said chemotherapy regimen is a primary
chemotherapy regimen, and a prediction of a patient as responsive
to said chemotherapy regimen indicates (i) a reduction in tumor
size or number of cancer cells and/or (ii) non-occurrence of
metastases or survival within a first predetermined period of time
after initial diagnosis in said patient treated with said
chemotherapy regimen, and wherein a prediction as non-responsive to
said chemotherapy regimen indicates (iii) a lack of reduction in
tumor size or number of cancer cells and/or (iv) occurrence of
metastases or non-survival within a second predetermined period of
time in said patient treated with said chemotherapy regimen. The
first period of time and said second period of time can be the
same, e.g., each 3, 5, 7, 10, or 12 years.
[0027] In another embodiment, said patient has been determined to
have a poor prognosis, wherein a poor prognosis indicates
occurrence of metastases or non-survival within a third
predetermined period of time (e.g., 3, 5, 7 or 10 years) in said
patient untreated with any chemotherapy for said cancer.
[0028] In one embodiment, said measurement of each said gene
product is a relative level of said gene product in said cell
sample versus level of said gene product in a reference sample,
represented as a log ratio.
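For illustration only, the log-ratio measurement described above can be computed as in the following Python sketch. The use of base-10 logarithms, the function name, and the intensity values are assumptions; the paragraph above does not fix the base of the logarithm.

```python
from math import log10

def log_ratio(sample_level, reference_level):
    """Relative level of a gene product in the cell sample versus the
    reference sample, expressed as a base-10 log ratio (assumed base)."""
    if sample_level <= 0 or reference_level <= 0:
        raise ValueError("levels must be positive to take a logarithm")
    return log10(sample_level / reference_level)

# A two-fold higher level in the sample gives a positive log ratio;
# equal levels give 0.0, and a lower level gives a negative log ratio.
print(round(log_ratio(200.0, 100.0), 5))  # 0.30103
print(log_ratio(100.0, 100.0))            # 0.0
```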
[0029] In one embodiment, said reference sample is selected from
the group consisting of a sample comprising a pool of cancer cells
obtained from a plurality of patients having said cancer, a sample
of cells of a non-cancerous cell line of cells of the same type of
tissue as said cancer, and a sample of cells of a cell line of said
cancer.
[0030] In a preferred embodiment, said patient is a human
patient.
[0031] In one embodiment, said cancer is breast cancer. In another
embodiment, said cancer is ovarian cancer.
[0032] The invention also provides a method for assigning a
treatment regimen for a patient having a cancer, comprising (i)
predicting whether said patient is responsive or non-responsive to
a chemotherapy regimen using the method described above; and (ii)
if said patient is determined to be responsive to said chemotherapy
regimen, assigning said patient a treatment regimen that comprises
said chemotherapy regimen; or if said patient is determined to be
non-responsive to said chemotherapy regimen, assigning said patient
(ii1) a treatment regimen that does not comprise said chemotherapy
regimen or (ii2) a treatment regimen comprising (A) said
chemotherapy regimen and (B) one or more agents that reduce the
expression and/or activity level of said one or more gene
products.
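The assignment logic of the preceding paragraph can be sketched in Python as follows. The function name, the dictionary keys, and the `combine_with_agent` flag that selects between options (ii1) and (ii2) are hypothetical and are introduced only for this illustration.

```python
def assign_treatment(prediction, combine_with_agent=False):
    """Map a responsiveness prediction to the components of a treatment regimen.

    For a non-responsive patient, `combine_with_agent=False` corresponds to
    option (ii1), a regimen without the chemotherapy, and
    `combine_with_agent=True` corresponds to option (ii2), the chemotherapy
    together with one or more agents that reduce the marker gene products."""
    if prediction == "responsive":
        return {"chemotherapy": True, "marker_reducing_agent": False}
    if prediction == "non-responsive":
        if combine_with_agent:
            return {"chemotherapy": True, "marker_reducing_agent": True}
        return {"chemotherapy": False, "marker_reducing_agent": False}
    raise ValueError("prediction must be 'responsive' or 'non-responsive'")

print(assign_treatment("responsive"))
print(assign_treatment("non-responsive", combine_with_agent=True))
```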
[0033] The invention also provides a method for enrolling a
plurality of cancer patients for a clinical trial of a chemotherapy
regimen, comprising (i) determining whether each patient in said
plurality is responsive or non-responsive to said chemotherapy
regimen using the method described above; and (ii) assigning each
patient who is predicted to be responsive to one patient group and
each patient who is predicted to be non-responsive to another
patient group, at least one of said patient groups being enrolled in
said clinical trial.
[0034] In a preferred embodiment, the above described methods are
computer-implemented methods.
[0035] The methods of the invention can further comprise obtaining
said measurements of said one or more gene products by a method
comprising measuring said plurality of gene products of said cell
sample taken from said patient.
[0036] In one embodiment, the method further comprises obtaining
measurement of abundance level of each said gene transcript by a
method comprising contacting a positionally-addressable microarray
with nucleic acids from said cell sample or nucleic acids derived
therefrom under hybridization conditions, and detecting the amount
of hybridization that occurs, said microarray comprising one or
more polynucleotide probes complementary to a hybridizable sequence
of each said gene transcript or a nucleic acid derived thereof. In
one embodiment, measurement of abundance level of each said gene
transcript is obtained by a method comprising measuring the transcript level of
said gene using quantitative reverse transcriptase PCR
(qRT-PCR).
[0037] The invention also provides a computer system comprising a
processor, and a memory coupled to said processor and encoding one
or more programs, wherein said one or more programs cause the
processor to carry out any one of the methods described above.
The invention also provides a computer program product for use in
conjunction with a computer having a processor and a memory
connected to the processor, said computer program product
comprising a computer readable storage medium having a computer
program mechanism encoded thereon, wherein said computer program
mechanism may be loaded into the memory of said computer and cause
said computer to carry out any one of the methods described
above.
[0038] The invention also provides a method for treating a patient
having a cancer, comprising administering to said patient (a) one
or more agents that are capable of reducing the expression and/or
activity of one or more different genes selected from the group
consisting of genes having SEQ ID NOs:1-39 or respective functional
equivalents thereof and/or their encoded proteins, and (b) a
chemotherapy regimen, wherein said patient is predicted to be
non-responsive to said chemotherapy regimen as a result of
over-expression of said one or more different genes. In one
embodiment, said one or more different genes consist of at least N
or all of the different genes selected from the group consisting of
genes having SEQ ID NOs:1-39, wherein N=2, 3, 4, 5, 10, 15, 20, 25,
30, or 35. In another embodiment, said one or more different genes
consist of at least N or all of the different genes selected from
the group consisting of genes having SEQ ID NOs:1-19, wherein N=2,
3, 4, 5, 10, or 15. In still another embodiment, said one or more
different genes consist of at least N or all of the different genes
selected from the group consisting of genes having SEQ ID
NOs:20-39, wherein N=2, 3, 4, 5, 10, or 15. In still another
embodiment, said one or more different genes are of (i) at least N
or all of the different genes selected from the group consisting of
genes having SEQ ID NOs:1-19, wherein N=2, 3, 4, 5, 10, or 15; and
(ii) at least K or all of the different genes selected from the
group consisting of genes having SEQ ID NOs:20-39, wherein K=2, 3,
4, 5, 10, or 15.
[0039] In some embodiments, said one or more agents comprise a
substance selected from the group consisting of siRNA, antisense
nucleic acid, ribozyme, and triple helix forming nucleic acid, each
being capable of reducing the expression of one or more of said one
or more different genes. In a preferred embodiment, said one or
more agents comprise an siRNA targeting said one or more different
genes. In one embodiment, said one or more different genes consist
of at least L different genes, wherein L=2, 3, 4, 5, 10, or 15.
[0040] In some other embodiments, said one or more agents comprise
a substance selected from the group consisting of antibody,
peptide, and small molecule, each being capable of reducing the
activity of one or more of proteins encoded by said one or more
different genes.
[0041] In some embodiments, the method further comprises
determining a transcript level of each of said one or more
different genes. In one embodiment, said determining each said
transcript level is carried out by a method comprising measuring
the transcript level of said gene using one or more polynucleotide
probes, each of said one or more polynucleotide probes comprising a
nucleotide sequence complementary to a hybridizable sequence in
said transcript of said gene or a nucleic acid derived thereof. In
one embodiment, said one or more polynucleotide probes are
polynucleotide probes on a microarray. In another embodiment, said
determining each said transcript level is carried out by a method
comprising measuring the transcript level of said gene using
quantitative reverse transcriptase PCR (qRT-PCR).
[0042] In a specific embodiment, said chemotherapy regimen
comprises administering a chemotherapy drug selected from the group
consisting of 5-fluorouracil, CMF combination consisting of
cyclophosphamide, methotrexate, and 5-fluorouracil, paclitaxel,
etoposide, and carboplatin.
[0043] In the methods of treating a cancer patient, the patient can
be a human patient. The cancer can be breast cancer or ovarian cancer.
[0044] The invention also provides a method for modulating
sensitivity of a cell to a chemotherapeutic drug, comprising
contacting said cell with one or more agents, said one or more
agents being capable of reducing the expression and/or activity of
one or more different genes selected from the group consisting of
genes having SEQ ID NOs:1-39 or respective functional equivalents
thereof and/or their encoded proteins. The invention also
provides a method for modulating growth of a cell, comprising
contacting said cell with (a) one or more agents, said one or more
agents being capable of reducing the expression and/or activity of
one or more different genes selected from the group consisting of
genes having SEQ ID NOs:1-39 or respective functional equivalents
thereof and/or their encoded proteins; and (b) a sufficient
amount of a chemotherapeutic drug. The method can be carried out in
vivo. The method can also be carried out in vitro.
[0045] In one embodiment, said one or more different genes consist
of at least N or all of the different genes selected from the group
consisting of genes having SEQ ID NOs:1-39, wherein N=2, 3, 4, 5,
10, 15, 20, 25, 30, or 35. In another embodiment, said one or more
different genes consist of at least N or all of the different genes
selected from the group consisting of genes having SEQ ID NOs:1-19,
wherein N=2, 3, 4, 5, 10, or 15. In still another embodiment, said
one or more different genes consist of at least N or all of the
different genes selected from the group consisting of genes having
SEQ ID NOs:20-39, wherein N=2, 3, 4, 5, 10, or 15. In still another
embodiment, said one or more different genes are of (i) at least N
or all of the different genes selected from the group consisting of
genes having SEQ ID NOs:1-19, wherein N=2, 3, 4, 5, 10, or 15; and
(ii) at least K or all of the different genes selected from the
group consisting of genes having SEQ ID NOs:20-39, wherein K=2, 3,
4, 5, 10, or 15.
[0046] In one embodiment, said one or more agents comprise a
substance selected from the group consisting of siRNA, antisense
nucleic acid, ribozyme, and triple helix forming nucleic acid, each
being capable of reducing the expression of one or more of said one or
more different genes. In one embodiment, said one or more agents
comprise an siRNA targeting said one or more different genes. In
one embodiment, said one or more different genes consist of at
least L different genes, wherein L=2, 3, 4, 5, 10, or 15.
[0047] In another embodiment, said one or more agents comprise a
substance selected from the group consisting of antibody, peptide,
and small molecule, each being capable of reducing the activity of one
or more of proteins encoded by said one or more different
genes.
[0048] In some embodiments, the method further comprises
determining a transcript level of each of said one or more
different genes. In one embodiment, said determining each said
transcript level is carried out by a method comprising measuring
the transcript level of said gene using one or more polynucleotide
probes, each of said one or more polynucleotide probes comprising a
nucleotide sequence complementary to a hybridizable sequence in
said transcript of said gene or a nucleic acid derived thereof. In
one embodiment, said one or more polynucleotide probes are
polynucleotide probes on a microarray. In another embodiment, said
determining each said transcript level is carried out by a method
comprising measuring the transcript level of said gene using
quantitative reverse transcriptase PCR (qRT-PCR).
[0049] In a specific embodiment, said chemotherapy regimen
comprises administering a chemotherapy drug selected from the group
consisting of 5-fluorouracil, CMF combination consisting of
cyclophosphamide, methotrexate, and 5-fluorouracil, paclitaxel,
etoposide, and carboplatin.
[0050] In the methods for modulating sensitivity of a cell to a
chemotherapeutic drug, the cell is a human cell. The cell can be a
breast cancer cell or an ovarian cancer cell.
[0051] The invention also provides a method of identifying an agent
that is capable of modulating sensitivity of a cell to the growth
inhibitory effect of a chemotherapeutic drug, said method
comprising comparing a first growth inhibitory effect of said
chemotherapeutic drug on cells expressing said gene in the presence
of a candidate agent with a second growth inhibitory effect of said
chemotherapeutic drug on cells expressing said gene in the absence
of said agent, wherein said agent is capable of reducing the
expression and/or activity of a gene selected from the group
consisting of genes having SEQ ID NOs:1-39 or respective functional
equivalents thereof and/or its encoded protein, wherein a
difference between said first growth inhibitory effect and said second growth
inhibitory effect identifies said agent as capable of modulating
sensitivity of said cell to the growth inhibitory effect of said
chemotherapeutic drug. In one embodiment, the method further
comprises (a) contacting a first cell expressing said gene with
said chemotherapeutic drug in the presence of said agent and
measuring said first growth inhibitory effect; and (b) contacting a
second cell expressing said gene with said chemotherapeutic drug in
the absence of said agent and measuring said second growth
inhibitory effect. The method can be carried out in vivo, e.g., on
human or non-human patients. The method can also be carried out in
vitro, e.g., on cells of a cell culture.
[0052] In one embodiment, said agent comprises a substance selected
from the group consisting of siRNA, antisense nucleic acid,
ribozyme, and triple helix forming nucleic acid, each reducing the
expression of said genes.
[0053] In another embodiment, said one or more agents comprise a
substance selected from the group consisting of antibody, peptide,
and small molecule, each reducing the activity of one or more of
proteins encoded by said one or more different genes in said
patient.
[0054] In a specific embodiment, said chemotherapy regimen
comprises administering a chemotherapy drug selected from the group
consisting of 5-fluorouracil, CMF combination consisting of
cyclophosphamide, methotrexate, and 5-fluorouracil, paclitaxel,
etoposide, and carboplatin.
[0055] In the methods, said cell can be a breast cancer cell or an
ovarian cancer cell.
[0056] The invention also provides a microarray comprising for each
of one or more different genes selected from the group consisting
of genes having SEQ ID NOs:1-39 or respective functional
equivalents thereof, one or more polynucleotide probes
complementary and hybridizable to a sequence in said gene, wherein
polynucleotide probes complementary and hybridizable to said genes
constitute at least 50%, 60%, 70%, 80% or 90% of the probes on said
microarray. In one embodiment, said one or more different genes
consist of at least N or all of the different genes selected from
the group consisting of genes having SEQ ID NOs:1-39, wherein N=2,
3, 4, 5, 10, 15, 20, 25, 30, or 35. In another embodiment, said one
or more different genes consist of at least N or all of the genes
selected from the group consisting of genes having SEQ ID NOs:1-19,
wherein N=2, 3, 4, 5, 10, or 15. In still another embodiment, said one
or more different genes consist of at least N or all of the
different genes selected from the group consisting of genes having
SEQ ID NOs:20-39, wherein N=2, 3, 4, 5, 10, or 15. In still another
embodiment, said one or more different genes are of (i) at least N or
all of the different genes selected from the group consisting of
genes having SEQ ID NOs:1-19, wherein N=2, 3, 4, 5, 10, or 15; and
(ii) at least N or all of the different genes selected from the
group consisting of genes having SEQ ID NOs:20-39, wherein N=2, 3,
4, 5, 10, or 15.
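As a non-limiting illustration of the probe-fraction requirement above, the following Python sketch computes the fraction of probes on an array that target marker genes and checks it against a minimum such as 50%. The representation of an array as a list of target gene identifiers (one entry per probe), the function names, and the gene identifiers are assumptions of this sketch.

```python
def marker_probe_fraction(probe_targets, marker_genes):
    """Fraction of probes whose target gene is one of the marker genes.
    `probe_targets` holds one gene identifier per probe on the array."""
    if not probe_targets:
        raise ValueError("the array must contain at least one probe")
    markers = set(marker_genes)
    return sum(1 for gene in probe_targets if gene in markers) / len(probe_targets)

def meets_minimum_fraction(probe_targets, marker_genes, minimum=0.5):
    """True if marker-gene probes constitute at least `minimum` of all probes."""
    return marker_probe_fraction(probe_targets, marker_genes) >= minimum

# Hypothetical array with four probes, three of which target marker genes.
probes = ["MARKER_1", "MARKER_1", "MARKER_2", "CONTROL_1"]
markers = ["MARKER_1", "MARKER_2"]
print(marker_probe_fraction(probes, markers))        # 0.75
print(meets_minimum_fraction(probes, markers, 0.5))  # True
```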
4. BRIEF DESCRIPTION OF THE DRAWINGS
[0057] FIG. 1. (a) A network (hub #34) enriched for interferon
stimulated genes (ISG). (b) The hub genes are highly co-regulated
in the breast cancer data from which the network is derived. Each row
represents a sample, each column represents one gene. A darker
shade, which was magenta in the original depiction of FIG. 1b,
represents up-regulation, and a lighter shade, which was cyan in
the original depiction of FIG. 1b, represents down regulation.
[0058] FIG. 2. The expression level of interferon stimulated genes
(ISGs) is related to chemotherapy (CMF) sensitivity in breast
cancer patients. (a) Patients with low expression of ISGs showed
great chemotherapy sensitivity as indicated by the Kaplan-Meier
plot of metastasis-free probability between patients who received
the treatment (lighter shade, which was red in the original
depiction of FIG. 2) vs. no treatment (darker shade, which was blue
in the original depiction of FIG. 2). At 10 years after diagnosis
of cancer, the treatment boosted the metastasis-free probability
from 60% to ~95% (log-rank-test P-value 0.3%). (b) Patients with
high expression of ISGs showed no chemotherapy sensitivity. There
was essentially no difference in metastasis-free probability
between patients with and without chemotherapy (P=75%).
[0059] FIG. 3. Exemplary bar chart of number of genes in each
P-value bin for 9 hubs. P-value is based on the correlation
coefficient between gene expression level and 5-FU drug resistance
category in ovarian ex-vivo experiment. Three hubs (#20, 34 and 88)
have a significant fraction of members whose base-line expression
level correlated with the drug resistance (with P-value of
correlation<5%). Two of the 3 hubs (#34 and 88) belong to an ISG
pathway.
[0060] FIG. 4. Expression of ISGs and their relation with drug
resistance in ex-vivo ovarian samples. Left panel: category of 5-FU
drug resistance measured by growth inhibition. EDR stands for
extreme drug resistance, LDR stands for low drug resistance. The
remaining category stands for intermediate. Heatmap: expression of
ISGs from hub 34 and hub 88. Each row represents a sample, each
column represents one gene. A darker shade, which was magenta in
the original heatmap of FIG. 4, represents up-regulation; and a
lighter shade, which was cyan in the original heatmap of FIG. 4,
represents down-regulation. For LDR samples, ISGs are mostly
under-expressed compared to the average, whereas for EDR samples,
the ISG levels are relatively higher. Top panel: correlation of
expression level to drug resistance.
[0061] FIG. 5. Fraction of interferon-stimulated-genes (ISGs)
correlated with drug resistance in ex-vivo ovarian cancer samples
treated with a panel of anti-cancer drugs. The ISGs are relatively
specific in reporting the 5-FU drug sensitivity. Results for the
following drugs or drug combinations are shown: Taxol; Taxotere;
cisplatin (CPLAT); carboplatin (CARBPLT); cisplatin+gemcitabine
(CPG); cyclophosphamide (FOURHC); Doxil (DOXILR); etoposide (ETOP);
gemcitabine (GMCB); Topotecan (TOPOR); carboplatin+taxol (CARTXn);
cisplatin+cyclosporin A (CPCSAn); cisplatin+verapamil (CPVERn);
Doxil (DOXILPCI); doxil+cyclosporin A (DXLCAn); 5-FU (FIVEFUn);
hexamethylmelamine (PMMn); taxol+cyclosporin (TAXCAn); TOPOTECAN
(TOPOPn).
[0062] FIG. 6 illustrates an exemplary embodiment of a computer
system for implementing the methods of this invention.
5. DETAILED DESCRIPTION OF THE INVENTION
[0063] The invention provides molecular markers, i.e., genes, the
expression levels of which can be used for evaluating the
responsiveness of a cancer patient to chemotherapy. The identities
of these markers and the measurements of their respective gene
products, e.g., measurements of levels (abundances) of their
encoded mRNAs or proteins, can be used to develop a chemotherapy
responsiveness classifier that discriminates sensitivity from
resistance to one or more chemotherapeutic agents based on
measurements of such gene products in a sample from a patient. As
used herein, the term "gene product" includes mRNA transcribed from
the gene and protein encoded by the gene.
[0064] As used herein, chemotherapy in the context of a cancer
patient refers to the treatment, preferably systemic, of the cancer
patient with one or more anticancer drugs. Depending on the type
and stage of the cancer, the chemotherapy can be adjuvant
chemotherapy or primary chemotherapy. Adjuvant chemotherapy of a
cancer patient refers to chemotherapy of a patient whose primary
tumor has been surgically removed and who exhibits no evidence that
cancer remains. Primary chemotherapy, also called neoadjuvant
chemotherapy or induction chemotherapy, refers to chemotherapy
prior to a definitive surgical and/or other local therapeutic (e.g.,
radiotherapeutic) procedure. Primary chemotherapy can be used
either prior to surgery or radiation to reduce the tumor size or as
the main treatment, e.g., for treating patients whose cancer is
inoperable and/or has become metastatic. Primary chemotherapy is
used in treating some patients with certain cancers, such as
specific types of lymphomas, some small cell lung cancers, and
locally advanced breast cancer. The appropriate dose and/or
schedule of chemotherapy treatment of a cancer patient can be
determined by a person skilled in the art. In preferred
embodiments, the chemotherapy treatment is carried out according to
standard medical practice for treating the particular cancer.
Chemotherapy treatment of a patient can begin at any time after the
initial diagnosis.
[0065] A patient is said to be responsive or sensitive to a
chemotherapy treatment ("responsive patient" or "responder") if the
chemotherapy treatment confers benefit to the patient, whereas a
patient is said to be non-responsive or resistant to a chemotherapy
treatment ("non-responsive patient" or "non-responder") if the
chemotherapy treatment fails to confer benefit to the patient.
Whether a patient is benefited can be determined clinically by a
person skilled in the art. For example, benefits to a cancer
patient include but are not limited to one or more of the
following: reduction of the size of the tumor and/or quantity of
tumor cells in the patient, metastasis-free survival within a
predetermined period of time after initial diagnosis, e.g., a
period of 1, 2, 3, 4, 5 or 10 years, or overall survival within a
predetermined period of time after initial diagnosis, e.g., a
period of 1, 2, 3, 4, 5 or 10 years. Thus, in cases of adjuvant
chemotherapy, in one embodiment, a patient treated by an adjuvant
chemotherapy regimen is said to be responsive if no metastasis
occurs within a predetermined period of time after initial
diagnosis, e.g., a period of 1, 2, 3, 4, 5 or 10 years, whereas the
patient is said to be non-responsive if metastasis occurs within a
predetermined period of time, e.g., a period of 1, 2, 3, 4, 5 or 10
years. In another embodiment, a patient treated by an adjuvant
chemotherapy regimen is said to be responsive if the patient
survives within a predetermined period of time after initial
diagnosis, e.g., a period of 1, 2, 3, 4, 5 or 10 years, whereas the
patient is said to be non-responsive if the patient does not
survive within a predetermined period of time, e.g., a period of 1,
2, 3, 4, 5 or 10 years. In cases of primary chemotherapy, in one
embodiment, a patient treated by a primary chemotherapy regimen is
said to be responsive if a reduction in tumor size or number of
cancer cells occurs and/or no metastasis occurs or the patient
survives within a predetermined period of time after initial
diagnosis, e.g., a period of 1, 2, 3, 4, 5 or 10 years, whereas the
patient is said to be non-responsive if no reduction in tumor size
or number of cancer cells occurs and/or metastasis occurs or the
patient does not survive within a predetermined period of time
after initial diagnosis, e.g., a period of 1, 2, 3, 4, 5 or 10
years. For primary chemotherapy, local surgical or radiation
treatment of the primary tumor may also be performed after the
chemotherapy treatment.
[0066] The invention provides a list of genes that discriminates
between responsive patients and non-responsive patients (Table 1,
infra). This set of genes is called the chemotherapy response
genes. Measurements of gene products of one or more of these genes,
as well as of their functional equivalents, can be used for
predicting whether a patient having a cancer will be responsive or
non-responsive to a treatment regimen of one or more
chemotherapeutic agents. A functional equivalent with respect to a
gene, designated as gene A, refers to a gene that encodes a protein
or mRNA whose physiological function in the cell at least partially
overlaps with that of the protein or mRNA encoded by gene A. In
particular, prediction of chemotherapy responsiveness in a patient
can be carried out by a method comprising determining whether
expression and/or activity of the gene product of one or more
different genes listed in Table 1, or functional equivalents of
such genes, in an appropriate cell sample from the patient, e.g., a
tumor sample obtained from the patient, is up-regulated, i.e.,
increased, relative to a reference population of individuals. The
reference population can be a plurality of individuals of the same
species as the patient. In a preferred embodiment, the patient is a
human patient. In another preferred embodiment, the reference
population comprises a plurality of patients having the same type
of cancer. Preferably, the reference population comprises both
responsive patients and non-responsive patients. The reference
population can comprise at least 10, 50, 100, 200, or 300 patients.
In one embodiment, the expression or activity of a gene product of
the patient is determined to be up-regulated if measurement of the
expression or activity of the gene product is above a first
threshold value. In another embodiment, the expression or activity
of a gene product of the patient is determined to be not
up-regulated if measurement of the expression or activity of the
gene product is not greater than a second threshold value. The
first and second threshold values can be the same threshold. In one
embodiment, the threshold value is an average value of measurements
of the expression or activity of the gene product in the reference
population. The first and second threshold values can also be
different. In another embodiment, the expression or activity of a
gene product of the patient is determined to be up-regulated if the
measurement of the expression or activity of the gene product falls
in the Y1 percentile in the reference population, i.e., the
measurement of the expression or activity of the gene product is
greater than Y1% of the individuals in the reference population,
where Y1 percentile=60 percentile, 70 percentile, 80 percentile, or
90 percentile. In another embodiment, the expression or activity of
a gene product of the patient is determined to be not up-regulated
if the measurement of the expression or activity of the gene
product falls in the Y2 percentile in the reference population,
i.e., the measurement of the expression or activity of the gene
product is greater than Y2% of the individuals in the reference
population, where Y2 percentile=10 percentile, 20 percentile, 30
percentile, or 40 percentile. In another embodiment, when the one
or more genes comprises more than one gene, the above described
methods can be adapted by using the sum or average of the
measurements of the expression or activity of the gene
products.
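The threshold and percentile rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of the invention as implemented: the function names, the toy reference values, and the choice to count only reference measurements strictly below the patient's measurement are all hypothetical.

```python
from statistics import mean


def percentile_rank(value, reference):
    """Percentage of reference measurements that `value` exceeds."""
    return 100.0 * sum(1 for r in reference if r < value) / len(reference)


def is_up_regulated(value, reference, y1=None):
    """Call up-regulation relative to a reference population.

    With y1=None the threshold is the reference average; otherwise the
    measurement must exceed at least y1 percent of the reference values.
    """
    if y1 is None:
        return value > mean(reference)
    return percentile_rank(value, reference) >= y1


def combined_measurement(profile, genes):
    """Average the measurements of several genes into one value."""
    return mean([profile[g] for g in genes])


# Hypothetical reference population of ten measurements for one gene.
reference = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(is_up_regulated(8.5, reference))         # average-based threshold
print(is_up_regulated(8.5, reference, y1=80))  # percentile-based threshold
```

When more than one gene is used, the value passed to is_up_regulated would be the output of combined_measurement, and the reference list would hold the same combined value for each reference individual.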
[0067] In some embodiments, a profile of one or more measurements
of the expression and/or activity of one or more genes, e.g., at
least N or all, where N=1, 2, 3, 4, 5, 10, 15, 20, 25, 30, or 35; or
at least X % of the different genes, where X %=3%, 5%, 10%, 20%,
30%, 40%, 50%, 60%, 70%, 80%, or 90%, in Table 1 is used. Such a
profile of measurements is also referred to herein as an
"expression profile" or a "marker profile." In one embodiment, one
or more chemotherapy responsiveness scores or indices ("CR scores"
or "CR indices") are determined for a patient based on such an
expression profile. The CR scores indicate whether the expression
and/or activity of the one or more genes in the marker profile of
the patient is increased relative to the reference population. The
responsiveness of the patient to the
chemotherapy regimen is then determined based on the score or
scores.
[0068] The invention also provides methods and computer systems for
evaluating chemotherapy responsiveness to a chemotherapy regimen in
a patient based on a measured marker profile comprising
measurements of one or more markers of the present invention, e.g.,
an expression profile comprising measurements of transcripts of one
or more of the genes listed in Table 1, e.g., 1 or at least N or all
different genes, where N=2, 3, 5, 10, 15, 20, 25, 30, or 35, listed
in Table 1 or functional equivalents of such genes. The methods and
systems of the invention can use a chemotherapy responsiveness
classifier for evaluating the responsiveness. The chemotherapy
responsiveness classifier can be based on an appropriate pattern
recognition method (such as those described in Section 5.2) that
receives an input comprising a marker profile and provides an
output comprising data, e.g., one or more CR scores, indicating
whether the patient is sensitive or resistant to chemotherapy. The
chemotherapy response classifier can be constructed with training
data from a plurality of cancer patients for whom marker profiles
and chemotherapy responsiveness are known. The plurality of
patients used for training the chemotherapy response classifier is
also referred to herein as the training population. The training
data comprise for each patient in the training population (a) a
marker profile comprising measurements of gene products of a
plurality of genes, respectively, in an appropriate cell sample,
e.g., a tumor sample, taken from the patient; and (b) information
regarding the patient's responsiveness to chemotherapy (e.g.,
metastasis free duration under the chemotherapy). Various
chemotherapy response classifiers that can be used in conjunction
with the present invention are described in Section 5.2., infra. In
some embodiments, additional patients having known marker profiles
and chemotherapy responsiveness can be used to test the accuracy of
the chemotherapy responsiveness classifier obtained using the
training population. Such additional patients are also called "the
testing population."
[0069] The markers in the marker sets are selected based on their
ability to discriminate patients who are responsive to a
chemotherapy regimen from patients who are non-responsive to the
chemotherapy regimen in a plurality of cancer patients whose
chemotherapy responsiveness is known, e.g., the training
population. Various methods can be used to evaluate the correlation
between marker levels and chemotherapy responsiveness. For example,
genes whose expression levels are significantly different across
responders and non-responders can be identified using an
appropriate method known in the art.
[0070] The measurements in the profiles of the gene products that
are used can be any suitable measured values representative of the
expression levels of the respective genes. The measurement of the
expression level of a gene can be direct or indirect, e.g.,
directly of abundance levels of RNAs or proteins or indirectly, by
measuring abundance levels of cDNAs, amplified RNAs or DNAs,
proteins, or activity levels of RNAs or proteins, or other
molecules (e.g., a metabolite) that are indicative of the
foregoing. In one embodiment, the profile comprises measurements of
abundances of the transcripts of the marker genes. The measurement
of abundance can be a measurement of the absolute abundance of a
gene product. The measurement of abundance can also be a value
representative of the absolute abundance, e.g., a normalized
abundance value (e.g., an abundance normalized against the
abundance of a reference gene product) or an averaged abundance
value (e.g., average of abundances obtained at different time
points or from different tumor cell samples from the patients, or
average of abundances obtained using different probes, etc.), or a
combination of both. As an example, the measurement of abundance of
a gene transcript can be a value obtained using an Affymetrix®
GeneChip® to measure hybridization to the transcript.
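As a hedged illustration of the normalized and averaged abundance values mentioned above, the following Python sketch computes each of them and their combination. The function names and the order of operations in the combined case (average first, then normalize) are assumptions made for the example only.

```python
from statistics import mean


def normalized_abundance(target, reference_gene):
    """Abundance of a gene product relative to a reference gene product."""
    if reference_gene <= 0:
        raise ValueError("reference abundance must be positive")
    return target / reference_gene


def averaged_abundance(values):
    """Average of abundances from several probes, time points, or samples."""
    return mean(values)


def normalized_average(target_values, reference_values):
    """Combination of both: average each set of readings, then normalize."""
    return normalized_abundance(mean(target_values), mean(reference_values))
```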
[0071] In another embodiment, the expression profile is a
differential expression profile comprising differential
measurements of a plurality of transcripts in a sample derived from
the patient versus measurements of the plurality of transcripts in
a reference sample, e.g., a cell sample of normal cells. Each
differential measurement in the profile can be but is not limited
to an arithmetic difference, a ratio, or a log(ratio). As an
example, the measurement of abundance of a gene transcript can be a
value for the transcript obtained using an ink-jet array or a cDNA
array in a two-color measurement. In a preferred embodiment, the
reference sample comprises target polynucleotide molecules from
normal cell samples, e.g., samples of non-cancerous cells. In one
embodiment, the non-cancerous cells are from the same kind of
biological tissue as the cancerous cells. A biological tissue
refers to a collection of interconnected cells that perform a
similar function within an organism. In another preferred
embodiment, the reference sample comprises target polynucleotide
molecules from cell samples from a population of cancer
patients.
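The three kinds of differential measurement named above (arithmetic difference, ratio, and log(ratio)) can be sketched as follows. The function name, the dictionary layout, and the use of a base-10 logarithm are assumptions for illustration; the text does not fix the base of the logarithm.

```python
import math


def differential_profile(sample, reference, kind="log_ratio"):
    """Differential measurement of each transcript, sample vs. reference.

    `sample` and `reference` map transcript identifiers to abundances.
    """
    profile = {}
    for transcript, s in sample.items():
        r = reference[transcript]
        if kind == "difference":
            profile[transcript] = s - r
        elif kind == "ratio":
            profile[transcript] = s / r
        elif kind == "log_ratio":
            profile[transcript] = math.log10(s / r)  # base 10 is an assumption
        else:
            raise ValueError(f"unknown kind: {kind}")
    return profile
```

With a log(ratio), a transcript that is up-regulated relative to the reference gets a positive value and a down-regulated transcript gets a negative value of comparable magnitude, which is one reason this form is common for two-color array data.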
[0072] The invention also provides methods and compositions for
enhancing the efficacy of a chemotherapy regimen by modulating the
expression and/or activity of one or more of the chemotherapy
response genes listed in Table 1 and/or their gene products, and/or
by modulating interactions of these genes and/or their gene
products with other proteins or molecules, e.g., substrates, in
combination with the chemotherapy regimen. In one embodiment,
the expression of one or more of the chemotherapy response genes is
reduced to treat a cancer patient in combination with the
chemotherapy regimen. Such modulation can be achieved by, e.g.,
using an siRNA, antisense nucleic acid, ribozyme, and/or triple
helix forming nucleic acid that targets the chemotherapy response
genes. In another embodiment, the activity of one or more
chemotherapy response proteins is reduced to enhance the effects of
the chemotherapy regimen. Such modulation can be achieved by, e.g.,
using antibodies, peptide molecules, and/or small molecules that
target chemotherapy response proteins. The inventors have
discovered that the chemotherapy response genes listed in Table 1
are highly expressed in non-responders as compared to responders,
and that reducing the expression levels of these genes enhances the
responsiveness of a patient.
[0073] The invention also provides methods and compositions for
utilizing the chemotherapy response genes and/or their products to
screen for agents that modulate their expression and/or activity
and/or modulate their interactions with other proteins or
molecules. Agents that modulate expression and/or activity of
the chemotherapy response genes can be used in combination with the
chemotherapy treatment for treating a non-responsive cancer
patient. Such agents include but are not limited to siRNA, antisense
nucleic acid, ribozyme, triple helix forming nucleic acid,
antibody, peptide or polypeptide molecules, and small organic or
inorganic molecules.
[0074] The present invention also provides methods and compositions
for identifying other extra- or intra-cellular molecules, e.g.,
genes and proteins, that interact with the chemotherapy response
genes and/or their gene products. The present invention also
provides methods and compositions for treating cancer by modulating
such extra- or intra-cellular molecules.
[0075] A "patient" as used herein is an animal. The patient can be
but is not limited to a human, or, in a veterinary context, a
non-human animal such as a ruminant, horse, swine, sheep, or a
domestic companion animal such as a feline or canine. In a
preferred embodiment, the patient is a human patient. Suitable
samples that can be used in conjunction with the methods of the
present invention include but are not limited to tumor samples,
e.g., tumor samples obtained from biopsies. In this application,
certain genes (for example, those corresponding to SEQ ID NOs:1-39)
for human patients are disclosed. A person skilled in the art will
be able to determine the corresponding homologs for a non-human
animal and use such corresponding homologs to practice the
invention in such a non-human animal.
[0076] The invention also provides a computer system comprising a
processor, and a memory coupled to said processor and encoding one
or more programs, wherein said one or more programs cause the
processor to carry out a method described herein.
[0077] The invention also provides a computer program product for
use in conjunction with a computer having a processor and a memory
connected to the processor, said computer program product
comprising a computer readable storage medium having a computer
program mechanism encoded thereon, wherein said computer program
mechanism may be loaded into the memory of said computer and cause
said computer to carry out a method described herein.
5.1. Genes Associated with Chemotherapy Response
[0078] The invention provides molecular marker sets (of genes) that
can be used for evaluating chemotherapy response in a cancer
patient. The marker sets comprise one or more markers listed in
Table 1. Table 1 lists genes whose gene products can be measured and
used to distinguish cancer patients who are sensitive to a
chemotherapeutic agent from cancer patients who are resistant to
the chemotherapeutic agent. The inventors have discovered that
up-regulation of the expression and/or activity of one or more of
these genes correlates with resistance to chemotherapy. The genes
listed in Table 1 include genes clustered into two different
clusters. Genes corresponding to SEQ ID NOs:1-19 belong to one
cluster, and genes corresponding to SEQ ID NOs:20-39 belong to
another cluster. The genes listed in Table 1 are called the
chemotherapy response genes ("CR genes"). The genes listed in Table
1 are particularly useful for evaluating responsiveness of breast
cancer or ovarian cancer patients to their respective standard
chemotherapy regimens, e.g., the CMF combination (consisting of
cyclophosphamide, methotrexate, and 5-fluorouracil). For those
genes listed in Table 1 that have a GenBank® accession number,
the GenBank® accession number is listed. For those genes in
Table 1 that do not have a GenBank® accession number, the Contig ID
numbers of the transcript sequences in the Phil Green assembly (Nat
Genet 2000 June; 25(2):232-4) are listed. Phil Green's group at the
University of Washington assembled ESTs from the Washington
University-Merck Human EST Project and CGAP archives (see "Analysis
of expressed sequence tags indicates 35,000 human genes," Nat Genet
2000 June; 25(2):232-4). This assembly, dated Mar. 17, 2000,
resulted in 62,064 contigs representing 795,000 ESTs (see web
address: www.phrap.org/est_assembly/human/gene_number_methods.html).
These contigs have the word "contig" included in their
identifiers.
TABLE 1 chemotherapy response genes
Transcript ID | Gene Symbol | Gene Name | SEQ ID No
NM_002346 | LY6E | lymphocyte antigen 6 complex, locus E | 1
NM_003113 | SP100 | nuclear antigen Sp100 | 2
Contig43645_RC | LOC129607 | hypothetical protein LOC129607 | 3
NM_002462 | MX1 | myxovirus (influenza virus) resistance 1, interferon-inducible protein p78 (mouse) | 4
NM_002759 | EIF2AK2 | eukaryotic translation initiation factor 2-alpha kinase 2 | 5
NM_004223 | UBE2L6 | ubiquitin-conjugating enzyme E2L 6 | 6
NM_004335 | BST2 | bone marrow stromal cell antigen 2 | 7
NM_005101 | G1P2 | interferon, alpha-inducible protein (clone IFI-15K) | 8
NM_004585 | RARRES3 | retinoic acid receptor responder (tazarotene induced) 3 | 9
Contig25595_RC | KIAA1618 | KIAA1618 | 10
NM_005567 | LGALS3BP | lectin, galactoside-binding, soluble, 3 binding protein | 11
NM_007267 | EVER1 | epidermodysplasia verruciformis 1 | 12
AB037825 | KIAA1404 | KIAA1404 protein | 13
NM_017414 | USP18 | ubiquitin specific protease 18 | 14
M30818 | MX2 | myxovirus (influenza virus) resistance 2 (mouse) | 15
NM_016817 | OAS2 | 2'-5'-oligoadenylate synthetase 2, 69/71 kDa | 16
NM_000308 | PPGB | protective protein for beta-galactosidase (galactosialidosis) | 17
NM_002038 | G1P3 | interferon, alpha-inducible protein (clone IFI-6-16) | 18
AB006746 | PLSCR1 | phospholipid scramblase 1 | 19
AB025254 | TDRD7 | tudor domain containing 7 | 20
Contig1063_RC | | | 21
U72882 | IFI35 | interferon-induced protein 35 | 22
NM_004509 | SP110 | SP110 nuclear body protein | 23
AF026941 | RSAD2 | radical S-adenosyl methionine domain containing 2 | 24
NM_005532 | IFI27 | interferon, alpha-inducible protein 27 | 25
NM_014314 | DDX58 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 58 | 26
NM_006417 | IFI44 | interferon-induced protein 44 | 27
AL137255 | ZC3HDC1 | zinc finger CCCH-type domain containing 1 | 28
NM_006820 | IFI44L | interferon-induced protein 44-like | 29
Contig51660_RC | IFRG28 | 28 kD interferon responsive protein | 30
Contig63102_RC | LGP2 | likely ortholog of mouse D11lgp2 | 31
NM_017523 | BIRC4BP | XIAP associated factor-1 | 32
NM_016816 | OAS1 | 2',5'-oligoadenylate synthetase 1, 40/46 kDa | 33
NM_017631 | FLJ20035 | hypothetical protein FLJ20035 | 34
Contig47563_RC | FLJ31033 | hypothetical protein FLJ31033 | 35
Contig41538_RC | IFIT3 | interferon-induced protein with tetratricopeptide repeats 3 | 36
NM_017912 | HERC6 | hect domain and RLD 6 | 37
NM_001548 | IFIT1 | interferon-induced protein with tetratricopeptide repeats 1 | 38
NM_001549 | IFIT3 | interferon-induced protein with tetratricopeptide repeats 3 | 39
[0079] Genes that are not listed in Table 1 but which are
functional equivalents of any gene listed in Table 1 can also be
used with or in place of the gene listed in the table. A functional
equivalent of a gene A refers to a gene that encodes a protein or
mRNA whose physiological function in the cell at least partially
overlaps with that of the protein or mRNA of gene A.
[0080] In various specific embodiments, different numbers and
subcombinations of the genes listed in Table 1 are selected as the
marker set, whose profile is used in the methods of the invention,
as described in Section 5.2., infra. In one embodiment, at least N
different genes listed in Table 1 are used, where N=1, 2, 3, 4, 5,
10, 15, 20, 25, 30, or 35. In another embodiment, at least N or all
of the different genes selected from the group consisting of genes
having SEQ ID NOs:1-19 are used, where N=1, 2, 3, 4, 5, 10, or 15.
In still another embodiment, at least N or all of the different
genes selected from the group consisting of genes having SEQ ID
NOs:20-39 are used, where N=1, 2, 3, 4, 5, 10, or 15. In still
another embodiment, at least N or all of the different genes
selected from the group consisting of genes having SEQ ID NOs:1-19,
where N=1, 2, 3, 4, 5, 10, or 15, and at least M or all of the
different genes selected from the group consisting of genes having
SEQ ID NOs:20-39, where M=1, 2, 3, 4, 5, 10, or 15, are used. In
still another embodiment, one or more of the interferon stimulated
genes (ISGs) listed in Table 1 are used. In one embodiment, at
least N or all different ISGs listed in Table 1 are used, where
N=1, 2, 3, 4, 5, or 10.
[0081] The invention also provides methods for identifying a set of
genes that can be used for evaluating chemotherapy responsiveness
in cancer patients. The methods make use of measured expression
profiles of a plurality of genes (e.g., measurements of abundance
levels of the corresponding gene products) in suitable tumor
samples, e.g., tumor cell line or tumor samples from a plurality of
patients whose responsiveness to chemotherapy is known.
Chemotherapy response markers can be obtained by identifying genes
whose expression levels are correlated with responsiveness to
chemotherapy. In preferred embodiments, sets of genes co-varying
among a population of cancer patients are evaluated to identify
those sets whose expression levels correlate with chemotherapy
responsiveness in the patients. In other preferred embodiments,
sets of genes co-varying among cells of a tumor cell line are
evaluated to identify those sets whose expression levels correlate
with responsiveness of the tumor cells to chemotherapy
treatment.
[0082] In one embodiment, co-varying gene sets (also identified as
gene networks or hubs in this application) are determined from
expression profiles of a plurality of genes (e.g., measurements of
abundance levels of the corresponding gene products) in tumor
samples from a plurality of patients whose responsiveness to a
chemotherapy regimen is known. The plurality of patients comprises
both responsive patients and non-responsive patients. Each
co-varying gene set is evaluated to determine its association with
responsiveness to the chemotherapy. In one embodiment, the
plurality of patients is divided into two populations according to
the expression level of one or more genes in the co-varying gene
set. Patients having a high expression level of the one or more
genes are assigned to one population (the "high expression
population"), and patients having a low expression level of the one
or more genes are assigned to the other population (the "low
expression population"). In one embodiment, the average expression
level of all genes in the set is used such that patients having an
average expression level above a predetermined threshold level are
assigned to the high expression population, and patients having an
average expression level below or equal to the predetermined
threshold level are assigned to the low expression population. The
predetermined threshold level can be a level that best separates
the patients according to treatment effect. The effect of the
chemotherapy treatment is examined for each patient population. In
one embodiment, the metastasis rate is examined to determine
whether it is affected by the chemotherapy treatment. In one
embodiment, a log-rank-test is performed on one or more suitable
clinical parameters that indicate responsiveness or
non-responsiveness, e.g., the metastasis free probability as a
function of time for patients, with treatment vs. no treatment. The
co-varying set is identified as a chemotherapy responsive set if
the set has a log-rank-test p-value below a predetermined threshold
value in one patient population but not in another patient
population, where the populations were stratified based on the
level, e.g., average expression level or a representative level, of
co-varying genes.
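The stratify-then-test procedure described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the inventors' implementation: the patient record layout (keys "expression", "treated", "time", "event"), the function names, and the default significance level of 0.05 are hypothetical, and the p-value uses the standard two-group log-rank statistic with a chi-square approximation on one degree of freedom.

```python
import math
from statistics import mean


def log_rank_p(group_a, group_b):
    """Two-group log-rank test p-value.

    Each group is a list of (time, event) pairs, where event is True if
    metastasis was observed at `time` and False if follow-up was censored.
    """
    event_times = sorted({t for t, e in group_a + group_b if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for time, _ in group_a if time >= t)  # at risk in A
        n_b = sum(1 for time, _ in group_b if time >= t)  # at risk in B
        d_a = sum(1 for time, e in group_a if e and time == t)
        d_b = sum(1 for time, e in group_b if e and time == t)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 1.0  # no information to separate the groups
    chi_square = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi_square / 2.0))


def stratify(patients, gene_set, threshold):
    """Split patients by average expression of the co-varying gene set."""
    high, low = [], []
    for p in patients:
        level = mean([p["expression"][g] for g in gene_set])
        (high if level > threshold else low).append(p)
    return high, low


def is_chemotherapy_responsive_set(patients, gene_set, threshold, alpha=0.05):
    """True if treatment vs. no treatment differs in exactly one stratum."""
    significant = []
    for stratum in stratify(patients, gene_set, threshold):
        treated = [(p["time"], p["event"]) for p in stratum if p["treated"]]
        untreated = [(p["time"], p["event"]) for p in stratum if not p["treated"]]
        significant.append(log_rank_p(treated, untreated) < alpha)
    return significant[0] != significant[1]
```

A threshold "that best separates the patients according to treatment effect" could be found by calling is_chemotherapy_responsive_set over a grid of candidate thresholds, although such a search would need correction for multiple testing.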
[0083] In another preferred embodiment, cell samples of a cancer
cell line or from tumor cells grown ex-vivo are used to identify
the markers. A plurality of cell samples treated with different
doses of a chemotherapeutic agent can be used. The growth
inhibitory effect of each drug on the tumor cells is measured.
Samples can be categorized into 3 classes for each drug: EDR
(extreme drug resistance), MDR (moderate drug resistance), and LDR
(low drug resistance). Pairwise correlation coefficients of different
genes are calculated. Genes having magnitudes of correlation
coefficients above a selected threshold value, e.g., 0.5, are
grouped in a co-varying set. Genesets that exhibit significant
difference in expression levels in the EDR samples and LDR samples
are identified as genesets that can be used to evaluate
chemotherapy responsiveness in patients. In one embodiment,
genesets containing genes whose expression levels in EDR samples
are higher, e.g., at least 1.5 fold higher, than those in LDR
samples are identified as genesets that can be used to evaluate
chemotherapy responsiveness in patients. In another embodiment,
genesets containing genes whose expression levels correlate with
low drug resistance, e.g., having a correlation above 0.3, 0.4, or
0.5, are identified as genesets that can be used to evaluate
chemotherapy responsiveness in patients.
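A minimal Python sketch of the ex-vivo procedure above follows. The text does not say how genes with pairwise correlations above the threshold are grouped, so the sketch assumes, purely for illustration, that a co-varying set is a connected group of genes linked by correlation magnitudes above the threshold. The function names are hypothetical, and expression values are assumed to be linear abundances (not logarithms) so that the EDR/LDR ratio is a fold change.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def co_varying_sets(expression, threshold=0.5):
    """Group genes linked by |r| > threshold (an assumed grouping rule).

    `expression` maps gene name -> list of measurements across samples.
    """
    genes = list(expression)
    unassigned = set(genes)
    sets = []
    for seed in genes:
        if seed not in unassigned:
            continue
        unassigned.remove(seed)
        group, frontier = [seed], [seed]
        while frontier:
            g = frontier.pop()
            linked = [h for h in genes if h in unassigned
                      and abs(pearson(expression[g], expression[h])) > threshold]
            for h in linked:
                unassigned.remove(h)
                group.append(h)
                frontier.append(h)
        sets.append(sorted(group))
    return sets


def edr_over_ldr(expression, gene_set, categories):
    """Fold change of mean gene-set expression, EDR samples over LDR samples."""
    edr = [expression[g][i] for g in gene_set
           for i, c in enumerate(categories) if c == "EDR"]
    ldr = [expression[g][i] for g in gene_set
           for i, c in enumerate(categories) if c == "LDR"]
    return mean(edr) / mean(ldr)
```

A gene set would then be retained if, for example, edr_over_ldr returns at least 1.5, matching the fold-change example given in the text.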
[0084] Methods for grouping genes into co-varying sets are known in
the art. See, e.g., U.S. Pat. No. 6,203,987 and U.S. Pat. No.
6,801,859, both of which are incorporated herein by reference in
their entireties. The co-varying sets of the present invention can
be identified by means of a clustering algorithm (i.e., by means of
"clustering analysis").
[0085] The clustering methods and algorithms that can be employed
in the present invention include both "hierarchical" and
"fixed-number-of-groups" algorithms (see, e.g., S-Plus Guide to
Statistical and Mathematical Analysis v.3.3, 1995, MathSoft, Inc.:
StatSci. Division, Seattle, Wash.). Such algorithms are well known
in the art (see, e.g., Fukunaga, 1990, Statistical Pattern
Recognition, 2nd Ed., San Diego: Academic Press; Everitt, 1974,
Cluster Analysis, London: Heinemann Educ. Books; Hartigan, 1975,
Clustering Algorithms, New York: Wiley; Sneath and Sokal, 1973,
Numerical Taxonomy, Freeman; Anderberg, 1973, Cluster Analysis for
Applications, New York: Academic Press), and include, e.g.,
hierarchical agglomerative clustering algorithms, the "k-means"
algorithm of Hartigan, and model-based clustering algorithms such
as mclust by MathSoft, Inc. Preferably, hierarchical clustering
methods and/or algorithms are employed in the methods of this
invention. In one embodiment, the clustering analysis of the
present invention is done using the hclust routine or algorithm
(see, e.g., `hclust` routine from the software package S-Plus,
MathSoft, Inc., Cambridge, Mass.).
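For orientation, hierarchical agglomerative clustering of the kind referred to above can be sketched in a few lines of Python. This is not the S-Plus hclust routine: the average-linkage rule, the Euclidean default distance, the stopping rule (a fixed number of clusters), and all names are assumptions chosen to keep the example short and self-contained.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agglomerate(profiles, n_clusters, distance=euclidean):
    """Average-linkage agglomerative clustering of gene profiles.

    `profiles` maps gene name -> list of measurements across samples.
    Starting from one cluster per gene, the two closest clusters are
    merged repeatedly until `n_clusters` clusters remain.
    """
    clusters = [[g] for g in profiles]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(distance(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]
```

The distance argument accepts any function of two profiles, so a correlation-based distance can be substituted for the Euclidean default without changing the merging loop.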
[0086] The clustering algorithms used in the present invention
operate on a table of data containing gene expression measurements.
Specifically, the data table analyzed by the clustering methods
comprises an m × k array or matrix, wherein m is the total
number of conditions or perturbations, i.e., total number of
different siRNAs, and k is the number of cellular constituents,
e.g., transcripts of genes, measured and/or analyzed.
[0087] The clustering algorithms analyze such arrays or matrices to
determine dissimilarities between cellular constituents.
Mathematically, dissimilarities between cellular constituents i and
j are expressed as "distances" I.sub.i,j. For example, in one
embodiment, the Euclidian distance is determined according to the
formula
$$I_{i,j} = \left( \sum_n \left| v_i^{(n)} - v_j^{(n)} \right|^2 \right)^{1/2} \qquad (4)$$
where v.sub.i.sup.(n) and v.sub.j.sup.(n) are the responses of
cellular constituents i and j, respectively, to the perturbation n.
In other embodiments, the Euclidian distance in Equation 4 above is
squared to place progressively greater weight on cellular
constituents that are further apart. In alternative embodiments,
the distance measure I.sub.i,j is the Manhattan distance provided
by
$$I_{i,j} = \sum_n \left| v_i^{(n)} - v_j^{(n)} \right| \qquad (5)$$
[0088] In another embodiment, the distance is defined as
I.sub.i,j=1-r.sub.i,j, where r.sub.i,j is the "correlation
coefficient" or normalized "dot product" between the response
vectors v.sub.i and v.sub.j. For example, r.sub.i,j is defined
by
$$r_{i,j} = \frac{v_i \cdot v_j}{\left| v_i \right| \left| v_j \right|} \qquad (6)$$
wherein the dot product and the vector norms are defined by
$$v_i \cdot v_j = \sum_n v_i^{(n)} v_j^{(n)}, \quad \left| v_i \right| = (v_i \cdot v_i)^{1/2}, \quad \text{and} \quad \left| v_j \right| = (v_j \cdot v_j)^{1/2} \qquad (7)$$
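By way of illustration only, the distance measures of Equations 4 through 7 can be sketched in Python as follows. The function names and the sample response vectors are hypothetical and are not part of the disclosure; the correlation is the uncentered, normalized dot product described above, so a response vector of all zeros would cause a division by zero and is assumed not to occur.

```python
import math

def euclidean(vi, vj):
    """Equation 4: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vi, vj)))

def manhattan(vi, vj):
    """Equation 5: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(vi, vj))

def correlation(vi, vj):
    """Equations 6 and 7: normalized dot product of two response vectors."""
    dot = sum(a * b for a, b in zip(vi, vj))
    norm_i = math.sqrt(sum(a * a for a in vi))
    norm_j = math.sqrt(sum(b * b for b in vj))
    return dot / (norm_i * norm_j)

def correlation_distance(vi, vj):
    """Distance I_ij = 1 - r_ij."""
    return 1.0 - correlation(vi, vj)

# Hypothetical responses of two transcripts to three perturbations.
v1 = [1.0, 2.0, 3.0]
v2 = [2.0, 4.0, 6.0]
print(euclidean(v1, v2), manhattan(v1, v2), correlation_distance(v1, v2))
```

Because v2 is a positive multiple of v1 in this example, the correlation distance is essentially zero even though the Euclidian and Manhattan distances are not, which is why a correlation-based distance may be preferred when the shape of the response matters more than its amplitude.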
[0089] In still other embodiments, the distance measure may be the
Chebychev distance, the power distance, and percent disagreement,
all of which are well known in the art. In another embodiment, the
distance measure is I.sub.i,j=1-r.sub.i,j, with a correlation
coefficient that comprises a weighted dot product of the response
vectors v.sub.i and v.sub.j. Specifically, in this embodiment,
r.sub.i,j is preferably defined by the equation
$$r_{i,j} = \frac{\sum_n \frac{v_i^{(n)} v_j^{(n)}}{\sigma_i^{(n)} \sigma_j^{(n)}}}{\left[ \sum_n \left( \frac{v_i^{(n)}}{\sigma_i^{(n)}} \right)^2 \sum_n \left( \frac{v_j^{(n)}}{\sigma_j^{(n)}} \right)^2 \right]^{1/2}} \qquad (8)$$
where .sigma..sub.i.sup.(n) and .sigma..sub.j.sup.(n) are the standard
errors associated with the measurement of the i'th and j'th
cellular constituents, respectively, in experiment n.
[0090] The correlation coefficients of Equations 6 and 8 are
bounded between values of +1, which indicates that the two response
vectors are perfectly correlated and essentially identical, and -1,
which indicates that the two response vectors are "anti-correlated"
or "anti-sense" (i.e., are opposites). These correlation
coefficients are particularly preferable in embodiments of the
invention that seek sets or clusters of cellular constituents
whose responses have the same sign.
[0091] In other embodiments, it is preferable to identify cellular
constituent sets or clusters which are co-regulated or involved in
the same biological responses or pathways, but which comprise
similar and anti-correlated responses. In such embodiments, it is
preferable to use the absolute value of Equation 6 or 8, i.e.,
|r.sub.i,j|, as the correlation coefficient.
[0092] In still other embodiments, the relationships between
co-regulated and/or co-varying cellular constituents may be even
more complex, such as in instances wherein multiple biological
pathways (e.g., signaling pathways) converge on the same cellular
constituent to produce different outcomes. In such embodiments, it
is preferable to use a correlation coefficient
r.sub.ij=r.sub.ij.sup.(change) which is capable of identifying
co-varying and/or co-regulated cellular constituents irrespective
of the sign. The correlation coefficient specified by Equation 9
below is particularly useful in such embodiments.
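$$r_{i,j}^{(\mathrm{change})} = \frac{\sum_n \left| \frac{v_i^{(n)}}{\sigma_i^{(n)}} \right| \left| \frac{v_j^{(n)}}{\sigma_j^{(n)}} \right|}{\left[ \sum_n \left( \frac{v_i^{(n)}}{\sigma_i^{(n)}} \right)^2 \sum_n \left( \frac{v_j^{(n)}}{\sigma_j^{(n)}} \right)^2 \right]^{1/2}} \qquad (9)$$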
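The error-weighted correlation coefficients can be illustrated with the following sketch. It assumes that Equation 9 takes the absolute value of each error-scaled response before the products are summed, which is how "irrespective of the sign" is read here; the function name, the `ignore_sign` flag, and the sample values are hypothetical.

```python
import math

def weighted_correlation(vi, vj, si, sj, ignore_sign=False):
    """Error-weighted correlation of Equation 8.

    vi, vj: responses of constituents i and j in each experiment n.
    si, sj: standard errors of those measurements.
    With ignore_sign=True, absolute values of the scaled responses are
    used (the assumed reading of Equation 9).
    """
    xi = [v / s for v, s in zip(vi, si)]
    xj = [v / s for v, s in zip(vj, sj)]
    if ignore_sign:
        xi = [abs(x) for x in xi]
        xj = [abs(x) for x in xj]
    num = sum(a * b for a, b in zip(xi, xj))
    den = math.sqrt(sum(a * a for a in xi) * sum(b * b for b in xj))
    return num / den

# Hypothetical anti-correlated responses with unit standard errors.
vi = [1.0, -2.0, 3.0]
vj = [-1.0, 2.0, -3.0]
ones = [1.0, 1.0, 1.0]
r = weighted_correlation(vi, vj, ones, ones)
r_change = weighted_correlation(vi, vj, ones, ones, ignore_sign=True)
print(r, r_change)
```

In this example the two constituents respond in opposite directions in every experiment, so the signed coefficient is at its lower bound while the sign-independent coefficient is at its upper bound.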
[0093] Generally, the clustering algorithms used in the methods of
the invention also use one or more linkage rules to group cellular
constituents into one or more sets or "clusters." For example,
single linkage or the nearest neighbor method determines the
distance between the two closest objects (i.e., between the two
closest cellular constituents) in a data table. By contrast,
complete linkage methods determine the greatest distance between
any two objects (i.e., cellular constituents) in different clusters
or sets. Alternatively, the unweighted pair-group average evaluates
the "distance" between two clusters or sets by determining the
average distance between all pairs of objects (i.e., cellular
constituents) in the two clusters. Alternatively, the weighted
pair-group average evaluates the distance between two clusters or
sets by determining the weighted average distance between all pairs
of objects in the two clusters, wherein the weighting factor is
proportional to the size of the respective clusters. Other linkage
rules, such as the unweighted and weighted pair-group centroid and
Ward's method, are also useful for certain embodiments of the
present invention (see, e.g., Ward, 1963, J. Am. Stat. Assn 58:236;
Hartigan, 1975, Clustering Algorithms, New York: Wiley).
[0094] Once a clustering algorithm has grouped the cellular
constituents from the data table into sets or clusters, e.g., by
application of linkage rules such as those described supra, a
clustering "tree" may be generated to illustrate the clusters of
cellular constituents so determined.
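The single, complete, and unweighted pair-group average linkage rules described above can be sketched as a small agglomerative procedure. This is a minimal illustration and not the hclust routine; it stops at a requested number of clusters instead of recording the full tree, and the function names and the distance table are hypothetical.

```python
def cluster_distance(a, b, dist, linkage):
    """Distance between clusters a and b (lists of item indices)."""
    pairs = [dist[i][j] for i in a for j in b]
    if linkage == "single":      # nearest neighbor
        return min(pairs)
    if linkage == "complete":    # greatest pairwise distance
        return max(pairs)
    if linkage == "average":     # unweighted pair-group average
        return sum(pairs) / len(pairs)
    raise ValueError("unknown linkage: " + linkage)

def agglomerate(dist, n_clusters, linkage="average"):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y], dist, linkage)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Hypothetical pairwise distances between four cellular constituents.
dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.7, 0.9],
    [0.9, 0.7, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
clusters = agglomerate(dist, n_clusters=2, linkage="complete")
print(clusters)
```

Recording the pair merged at each pass, together with the distance at which the merge occurred, would yield the clustering tree mentioned above.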
[0095] In a preferred embodiment, tumor samples from a population
of M cancer patients are used to identify the markers. Preferably,
M is at least 100, 200, or 300. An expression profile of each tumor
sample is obtained. Preferably, the population contains both
responsive and non-responsive patients. In another preferred
embodiment, cell samples of a cancer cell line are used to identify
the markers. A plurality of cell samples treated with different
doses of a chemotherapeutic agent can be used. The growth
inhibitory effect of each drug on the tumor cell is measured.
Samples can be categorized into 3 classes for each drug: EDR
(extreme drug resistance), MDR (moderate drug resistance), and LDR
(low drug resistance). Pairwise correlation coefficients of
different genes are calculated. Genes having magnitudes of
correlation coefficients above a selected threshold value, e.g.,
0.5, are grouped in a co-varying set.
[0096] In a specific embodiment, tumor samples from a population of
K cancer patients are used, among which N patients received
chemotherapy. Microarrays are used for expression profiling.
Pairwise correlation coefficients of different genes in the
expression profiles are calculated. Genes having magnitudes of
correlation coefficients above a predetermined threshold level,
e.g., 0.5, are grouped in a co-varying set. A total of S co-varying
sets (or hubs) are obtained. The hub expression level of each hub
in each cancer sample is then obtained by averaging over genes in
the hub. The population of K cancer patients is divided into two
subpopulations according to the hub expression level. A threshold
that best separates the patients according to treatment effect is
found. For example, the threshold can be the 20th, 30th, 50th, or
80th percentile, whichever best separates the patients according to
treatment effect. Within each
subpopulation, the treatment effect is examined by determining
whether the metastasis or survival rate is affected by the
chemotherapy. In one embodiment, a log-rank test is performed on
the metastasis-free probability or probability of survival as a
function of time for patients with treatment vs. no treatment. When
this search is performed over all K samples, one or more hubs with
a log-rank test p-value<0.01 are identified. Among the identified
hubs, one or more hubs can be selected.
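The grouping and averaging steps of this embodiment can be sketched as follows. The text does not state how genes that pass the correlation threshold are assembled into a set, so this sketch assumes that a hub is a connected component of the graph linking gene pairs whose correlation magnitude exceeds the threshold; the median split stands in for the percentile search, the log-rank test is not implemented, and all names and expression values are hypothetical.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def covarying_sets(expr, threshold=0.5):
    """Group genes linked by |r| > threshold (connected components; an assumption)."""
    genes = list(expr)
    seen, hubs = set(), []
    for g in genes:
        if g in seen:
            continue
        hub, stack = [], [g]
        seen.add(g)
        while stack:
            cur = stack.pop()
            hub.append(cur)
            for other in genes:
                if other not in seen and abs(pearson(expr[cur], expr[other])) > threshold:
                    seen.add(other)
                    stack.append(other)
        hubs.append(sorted(hub))
    return hubs

def hub_levels(expr, hub):
    """Hub expression level per sample: average over the genes in the hub."""
    n_samples = len(expr[hub[0]])
    return [sum(expr[g][s] for g in hub) / len(hub) for s in range(n_samples)]

def split_samples(levels, cutoff):
    """Indices of samples above the cutoff and at or below it."""
    high = [i for i, v in enumerate(levels) if v > cutoff]
    low = [i for i, v in enumerate(levels) if v <= cutoff]
    return high, low

# Hypothetical expression of four genes in five tumor samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.1, 5.9, 8.0, 10.0],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
    "geneD": [0.5, 1.0, 1.4, 2.1, 2.5],
}
hubs = covarying_sets(expr)
levels = hub_levels(expr, hubs[0])
high, low = split_samples(levels, statistics.median(levels))
print(hubs, high, low)
```

In a full analysis the cutoff would be varied over the candidate percentiles, and the treated and untreated patients within each subpopulation would be compared with a log-rank test from a survival analysis package.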
[0097] The selected hubs can also be examined in ex-vivo cancer
data sets. Cancer cell line samples are plated ex-vivo and treated
by a panel of anticancer drugs. The tumor cell growth inhibition
for each drug treatment is measured and samples are categorized
into 3 classes for each drug: EDR (extreme drug resistance), MDR
(moderate drug resistance), and LDR (low drug resistance). The
cancer cell line samples pre-dose of drugs are profiled against the
pool of all samples. The expression levels of hub genes are tested
for their correlation with the drug resistance categories. The hubs
that exhibit a significant fraction of members correlated (p-value of
correlation<5%) to the growth inhibition by each drug are
identified.
[0098] The specificity of identified hubs for reporting on the
responsiveness to a drug can also be checked. In one embodiment,
the correlation between expression level and drug resistance for
all tested drugs is calculated. The number (or percentage) of genes
in a hub correlated with resistance to a drug can be used as a
measure of the specificity of the hub for the drug. In preferred
embodiments, a hub for which such a number or percentage for a drug
is above a predetermined threshold, e.g., 0.3, 0.4 or 0.5, is
identified as specific for the drug.
5.2. Methods of Evaluating Responsiveness to Chemotherapy
[0099] The invention provides methods for determining the
responsiveness of a cancer patient to a chemotherapy regimen using
a measured marker profile comprising measurements of one or more of
the gene products of genes, e.g., the sets of genes described in
Section 5.1., supra. In particular, prediction of chemotherapy
responsiveness in a patient can be carried out by a method
comprising determining whether expression and/or activity of the
gene product of one or more different genes listed in Table 1, or
functional equivalents of such genes, in an appropriate cell sample
from the patient, e.g., a tumor sample obtained from the patient, is
up-regulated, i.e., increased, relative to a reference population
of individuals, e.g., a plurality of patients having the same type
of cancer.
[0100] In one embodiment, one or more CR scores or indices are
determined for a patient based on the expression levels of one or
more of such markers. The CR scores indicate whether expression of
the one or more genes in the marker profile of the patient is
increased relative to the reference population. The responsiveness of the
patient to the chemotherapy, e.g., nonoccurrence of metastases or
survival within a predetermined period of time when undergoing a
chemotherapy, is then determined based on the score or scores.
[0101] In preferred embodiments, the methods of the invention use a
chemotherapy response classifier, also called a classifier, for
predicting chemotherapy responsiveness in a patient. The
chemotherapy response classifier can be based on an appropriate
pattern recognition method that receives an input comprising a
marker profile and provides an output comprising data indicating
the class to which the patient belongs. The chemotherapy response
classifier can be trained with training data from a training
population of cancer patients. Typically, the training data
comprise for each of the cancer patients in the training population
a training marker profile comprising measurements of respective
gene products of a plurality of genes in a suitable sample taken
from the patient and chemotherapy responsiveness information. In a
preferred embodiment, the training population comprises both
responsive and non-responsive patients.
[0102] In preferred embodiments, the chemotherapy response
classifier can be based on a classification (pattern recognition)
method described below, e.g., profile similarity (Section 5.2.1.1.,
infra); artificial neural network (Section 5.2.1.2., infra);
support vector machine (SVM, Section 5.2.1.3., infra); logic
regression (Section 5.2.1.4., infra), linear or quadratic
discriminant analysis (Section 5.2.1.5., infra), decision trees
(Section 5.2.1.6., infra), clustering (Section 5.2.1.7., infra),
principal component analysis (Section 5.2.1.8., infra), or nearest
neighbor classifier analysis (Section 5.2.1.9., infra). Such
chemotherapy response classifiers can be trained with the training
population using methods described in the relevant sections,
infra.
[0103] Various known statistical pattern recognition methods can be
used in conjunction with the present invention. A chemotherapy
response classifier based on any of such methods can be constructed
using the marker profiles and responsiveness data of training
patients. Such a chemotherapy response classifier can then be used
to evaluate the responsiveness of a cancer patient based on the
patient's marker profile. The methods can also be used to identify
markers that discriminate between responders and non-responders.
In a preferred embodiment, the methods are used
to predict responsiveness of a breast cancer or ovarian cancer
patient to a chemotherapy regimen selected from the following: CMF
combination (combination of cyclophosphamide, methotrexate, and
5-fluorouracil), 5-FU, paclitaxel (Taxol), etoposide, and
carboplatin.
5.2.1. Profile Matching
[0104] The responsiveness of a cancer patient to a chemotherapy
regimen can be evaluated by comparing a marker profile obtained in
a suitable sample from the patient with a marker profile that is
representative of marker profiles in responsive patients and/or a
marker profile that is representative of marker profiles in
non-responsive patients. As used herein, a marker profile is said
to be representative of marker profiles in a given patient
population if the marker profile contains the level of expression
and/or activity of one or more genes or gene products that is
characteristic of the patients in the population. In preferred
embodiments, the marker profile is an average of marker profiles of
a plurality of patients in the given patient population. Such a
marker profile is also termed a "template profile" or a "template."
A marker profile that is representative of marker profiles in
responsive patients is also called a "responsive template", and a
marker profile that is representative of marker profiles in
non-responsive patients is also called a "non-responsive template."
The degree of similarity to such a template profile provides an
evaluation of the patient's responsiveness to chemotherapy. If the
degree of similarity of the patient marker profile and a template
profile is above a predetermined threshold, the marker profile of
the patient is classified as a marker profile of the class of
patients represented by the template, and the patient is predicted
to belong to the class of patients.
[0105] In one embodiment, the similarity is represented by a
correlation coefficient between the patient's profile and the
template. In one embodiment, a correlation coefficient above a
correlation threshold indicates a high similarity, whereas a
correlation coefficient below the threshold indicates a low
similarity. Thus, the correlation coefficient can be used as a CR
score.
[0106] In a specific embodiment, P.sub.i measures the similarity
between the patient's profile {right arrow over (y)} and a template
profile, e.g., the responsive template profile {right arrow over
(z)}.sub.R or the non-responsive template profile {right arrow over
(z)}.sub.NR. Such a coefficient, P.sub.i, can be calculated using
the following equation:
$$P_i = \frac{\vec{z}_i \cdot \vec{y}}{\lVert \vec{z}_i \rVert \, \lVert \vec{y} \rVert}$$
where i designates the ith template. For example, i is R for the
responsive template. Thus, in one embodiment, {right arrow over
(y)} is classified as a responsive profile, and thus the patient is
classified as a responsive patient, if P.sub.R is greater than a
selected correlation threshold. In another embodiment, {right arrow
over (y)} is classified as a non-responsive profile, and thus the
patient is classified as a non-responsive patient, if P.sub.NR is
greater than a selected correlation threshold. In preferred
embodiments, the correlation threshold is set as 0.3, 0.4, 0.5 or
0.6. In another embodiment, {right arrow over (y)} is classified as
a responsive profile if P.sub.R is greater than P.sub.NR, whereas
{right arrow over (y)} is classified as a non-responsive profile if
P.sub.R is less than P.sub.NR.
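The template comparison of this embodiment can be sketched as follows, with each template taken as the marker-wise average of hypothetical training profiles. The function names and the profile values are illustrative assumptions; the text does not say how a tie between P.sub.R and P.sub.NR is handled, so the sketch reports a tie as undetermined.

```python
import math

def similarity(template, profile):
    """P_i: normalized dot product of a template and the patient profile."""
    dot = sum(z * y for z, y in zip(template, profile))
    norm_z = math.sqrt(sum(z * z for z in template))
    norm_y = math.sqrt(sum(y * y for y in profile))
    return dot / (norm_z * norm_y)

def mean_profile(profiles):
    """Template as the marker-wise average of a set of marker profiles."""
    n = len(profiles)
    return [sum(p[k] for p in profiles) / n for k in range(len(profiles[0]))]

def classify(profile, z_r, z_nr):
    """Assign the class whose template is more similar to the profile."""
    p_r = similarity(z_r, profile)
    p_nr = similarity(z_nr, profile)
    if p_r > p_nr:
        return "responsive"
    if p_r < p_nr:
        return "non-responsive"
    return "undetermined"  # tie handling is an assumption

# Hypothetical training profiles over three marker genes.
responders = [[-1.0, -0.5, 0.2], [-0.8, -0.7, 0.0]]
non_responders = [[1.0, 0.6, -0.1], [0.9, 0.8, 0.1]]
z_r = mean_profile(responders)
z_nr = mean_profile(non_responders)

patient = [0.7, 0.9, 0.05]
label = classify(patient, z_r, z_nr)
print(similarity(z_r, patient), similarity(z_nr, patient), label)
```

The threshold-based embodiments differ only in the decision rule: P.sub.R or P.sub.NR is compared with a fixed correlation threshold such as 0.3 to 0.6 instead of with each other.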
[0107] In another embodiment, the correlation coefficient is a
weighted dot product of the patient's profile {right arrow over
(y)} and a template profile, in which the measurement of each
different marker is assigned a weight.
[0108] In another embodiment, similarity between a patient's
profile and a template is represented by a distance between the
patient's profile and the template. In one embodiment, a distance
below a given value indicates high similarity, whereas a distance
equal to or greater than the given value indicates low
similarity.
[0109] In one embodiment, the Euclidian distance according to the
formula
$$D_i = \lVert \vec{y} - \vec{z}_i \rVert$$
is used, where D.sub.i measures the distance between the patient's
profile {right arrow over (y)} and a template profile. In other
embodiments, the Euclidian distance is squared to place
progressively greater weight on cellular constituents that are
further apart. In alternative embodiments, the distance measure
D.sub.i is the Manhattan distance provided by
$$D_i = \sum_n \left| y(n) - z_i(n) \right|$$
where y(n) and z.sub.i(n) are respectively measurements of the nth
marker gene product in the patient's profile {right arrow over (y)}
and a template profile.
[0110] In another embodiment, the distance is defined as
D.sub.i=1-P.sub.i, where P.sub.i is the correlation coefficient or
normalized dot product as described above.
[0111] In still other embodiments, the distance measure may be the
Chebychev distance, the power distance, and percent disagreement,
all of which are well known in the art.
[0112] In one embodiment, the average expression level of the genes
in a marker set, e.g., the marker set containing genes having SEQ
ID NOs:1-39, or the marker set containing genes having SEQ ID
NOs:1-19 or the marker set containing genes having SEQ ID
NOs:20-39, is used as the CR score. If the value of the average in
a patient sample is above a predetermined threshold value, the
patient is classified as a non-responsive patient to chemotherapy
treatment using 5-FU, the CMF combination, Paclitaxel, etoposide,
or carboplatin, whereas if the value of the average in a patient
sample is not greater than the predetermined threshold value, the
patient is classified as a responsive patient to such chemotherapy
treatment. In another embodiment, the set value of a marker set
(see, e.g., U.S. Pat. No. 6,203,987), e.g., the marker set
containing genes having SEQ ID NOs:1-19 or the marker set
containing genes having SEQ ID NOs:20-39, is used as the CR score.
If the set value in a patient sample is above a predetermined
threshold value, the patient is classified as a non-responsive
patient to chemotherapy treatment using 5-FU, the CMF combination,
Paclitaxel, etoposide, or carboplatin, whereas if the set value in
a patient sample is not greater than the predetermined threshold
value, the patient is classified as a responsive patient to such
chemotherapy treatment. In still another embodiment, the expression
level of the gene having the greatest expressive value in a marker
set (see, e.g., WO99/58720), e.g., the marker set containing genes
having SEQ ID NOs:1-19 or the marker set containing genes having
SEQ ID NOs:20-39, is used as the CR score. If the expression level
of such gene or genes in a patient sample is above a predetermined
threshold value, the patient is classified as a non-responsive
patient to chemotherapy treatment using 5-FU, the CMF combination,
Paclitaxel, etoposide, or carboplatin, whereas if the expression
level of such a gene in a patient sample is not greater than the
predetermined threshold value, the patient is classified as a
responsive patient to such chemotherapy treatment. In still another
embodiment, the average expression level of a subset of genes in a
marker set, e.g., at least N or all markers in the marker set
containing genes having SEQ ID NOs:1-39, where N=5, 10, 20, or 30, or
at least M or all markers in the marker set containing genes having
SEQ ID NOs:1-19 or in the marker set containing genes having SEQ ID
NOs:20-39, where M=5, 10, or 15, is used as the CR score. If the
average in a patient sample is above a predetermined threshold
value, the patient is classified as a non-responsive patient to
chemotherapy treatment using 5-FU, the CMF combination, Paclitaxel,
etoposide, or carboplatin, whereas if the value of the average in a
patient sample is not greater than the predetermined threshold
value, the patient is classified as a responsive patient to such
chemotherapy treatment. In preferred embodiments, the predetermined
threshold value for any one of the above embodiments is an average
of the respective measurements in a plurality of training patients.
Preferably, the plurality of training patients comprises both
responders and non-responders. Thus, the predetermined threshold
can be the value of the relevant measurement in the general patient
population.
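The average-expression CR score and its threshold can be illustrated with the sketch below. The marker identifiers, expression values, and function names are hypothetical; the threshold is taken as the mean score over a mixed set of training patients, as in the preferred embodiments above.

```python
def cr_score(profile, marker_set):
    """CR score: average expression level over the genes of a marker set."""
    return sum(profile[g] for g in marker_set) / len(marker_set)

def training_threshold(training_profiles, marker_set):
    """Threshold: mean CR score over the training patients."""
    scores = [cr_score(p, marker_set) for p in training_profiles]
    return sum(scores) / len(scores)

def classify_by_score(profile, marker_set, threshold):
    """A score above the threshold predicts non-responsiveness."""
    if cr_score(profile, marker_set) > threshold:
        return "non-responsive"
    return "responsive"

# Hypothetical marker set and log-ratio expression values.
markers = ["m1", "m2", "m3"]
training = [
    {"m1": 0.2, "m2": 0.4, "m3": 0.0},     # e.g., a non-responder
    {"m1": -0.6, "m2": -0.2, "m3": -0.4},  # e.g., a responder
]
threshold = training_threshold(training, markers)
patient = {"m1": 0.5, "m2": 0.1, "m3": 0.3}
label = classify_by_score(patient, markers, threshold)
print(threshold, cr_score(patient, markers), label)
```

Restricting `markers` to a subset of a marker set gives the subset-average embodiment without any other change.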
5.2.2. Artificial Neural Network
[0113] In some embodiments, a neural network is used to classify a
patient marker profile. The neural network takes the patient marker
profile as an input and generates an output comprising data
indicating whether the patient is predicted to be a responsive or a
non-responsive patient, e.g., a CR score. A neural network can be
constructed for a selected set of molecular markers of the
invention. A neural network is a two-stage regression or
classification model. A neural network has a layered structure that
includes a layer of input units (and the bias) connected by a layer
of weights to a layer of output units. For regression, the layer of
output units typically includes just one output unit. However,
neural networks can handle multiple quantitative responses in a
seamless fashion.
[0114] In multilayer neural networks, there are input units (input
layer), hidden units (hidden layer), and output units (output
layer). There is, furthermore, a single bias unit that is connected
to each unit other than the input units. Neural networks are
described in Duda et al., 2001, Pattern Classification, Second
Edition, John Wiley & Sons, Inc., New York; and Hastie et al.,
2001, The Elements of Statistical Learning, Springer-Verlag, New
York.
[0115] The basic approach to the use of neural networks is to start
with an untrained network, present a training pattern, e.g., marker
profiles from training patients, to the input layer, and to pass
signals through the net and determine the output, e.g., one or more
CR scores indicating chemotherapy responsiveness in the training
patients, at the output layer. These outputs are then compared to
the target values; any difference corresponds to an error. This
error or criterion function is some scalar function of the weights
and is minimized when the network outputs match the desired
outputs. Thus, the weights are adjusted to reduce this measure of
error. For regression, this error can be sum-of-squared errors. For
classification, this error can be either squared error or
cross-entropy (deviation). See, e.g., Hastie et al., 2001, The
Elements of Statistical Learning, Springer-Verlag, New York.
[0116] Three commonly used training protocols are stochastic,
batch, and on-line. In stochastic training, patterns are chosen
randomly from the training set and the network weights are updated
for each pattern presentation. Multilayer nonlinear networks
trained by gradient descent methods such as stochastic
back-propagation perform a maximum-likelihood estimation of the
weight values in the model defined by the network topology. In
batch training, all patterns are presented to the network before
learning takes place. Typically, in batch training, several passes
are made through the training data. In on-line training, each
pattern is presented once and only once to the net.
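The difference between stochastic and batch training can be illustrated with a deliberately small model. The sketch below assumes a single linear output unit with a bias input and a squared-error criterion, which is far simpler than the multilayer networks discussed here; the function names, learning rate, and training patterns are hypothetical.

```python
import random

def predict(w, x):
    """Output of a single linear unit."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sse(w, patterns):
    """Sum-of-squared errors over all training patterns."""
    return sum((predict(w, x) - t) ** 2 for x, t in patterns)

def stochastic_epoch(w, patterns, rate, rng):
    """Stochastic training: update after each randomly chosen pattern."""
    w = list(w)
    for _ in range(len(patterns)):
        x, t = rng.choice(patterns)
        err = predict(w, x) - t
        w = [wi - rate * err * xi for wi, xi in zip(w, x)]
    return w

def batch_epoch(w, patterns, rate):
    """Batch training: accumulate the gradient over all patterns, then update once."""
    grad = [0.0] * len(w)
    for x, t in patterns:
        err = predict(w, x) - t
        grad = [g + err * xi for g, xi in zip(grad, x)]
    return [wi - rate * g for wi, g in zip(w, grad)]

# Hypothetical patterns: inputs are (bias, marker value), target equals the marker value.
patterns = [([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0), ([1.0, 2.0], 2.0)]
w0 = [0.1, -0.1]
rng = random.Random(0)
w_sgd, w_batch = list(w0), list(w0)
for _ in range(200):
    w_sgd = stochastic_epoch(w_sgd, patterns, 0.1, rng)
    w_batch = batch_epoch(w_batch, patterns, 0.1)
print(sse(w0, patterns), sse(w_sgd, patterns), sse(w_batch, patterns))
```

On-line training would correspond to a single pass in which each pattern is used for exactly one update.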
[0117] In some embodiments, consideration is given to starting
values for weights. If the weights are near zero, then the
operative part of the sigmoid commonly used in the hidden layer of
a neural network (see, e.g., Hastie et al., 2001, The Elements of
Statistical Learning, Springer-Verlag, New York) is roughly linear,
and hence the neural network collapses into an approximately linear
model. In some embodiments, starting values for weights are chosen
to be random values near zero. Hence the model starts out nearly
linear, and becomes nonlinear as the weights increase. Individual
units localize to directions and introduce nonlinearities where
needed. Use of exact zero weights leads to zero derivatives and
perfect symmetry, and the algorithm never moves. Alternatively,
starting with large weights often leads to poor solutions.
[0118] Since the scaling of inputs determines the effective scaling
of weights in the bottom layer, it can have a large effect on the
quality of the final solution. Thus, in some embodiments, at the
outset all expression values are standardized to have mean zero and
a standard deviation of one. This ensures all inputs are treated
equally in the regularization process, and allows one to choose a
meaningful range for the random starting weights. With
standardized inputs, it is typical to take random uniform
weights over the range [-0.7, +0.7].
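The standardization and weight initialization described above can be sketched as follows. The sketch assumes that the population standard deviation (division by the number of samples) is used and that no marker is constant across samples; the function names and the data layout, one list of values per marker, are hypothetical.

```python
import math
import random

def standardize(columns):
    """Scale each marker (one list per marker) to mean zero and standard deviation one."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        out.append([(x - mean) / sd for x in col])
    return out

def initial_weights(n_inputs, n_hidden, rng, limit=0.7):
    """Random uniform starting weights in [-limit, +limit], one row per hidden unit."""
    # n_inputs + 1 columns: the extra column holds the weight from the bias unit.
    return [[rng.uniform(-limit, limit) for _ in range(n_inputs + 1)]
            for _ in range(n_hidden)]

# Hypothetical expression values of two markers in three samples.
columns = [[2.0, 4.0, 6.0], [1.0, 1.5, 3.5]]
z = standardize(columns)
weights = initial_weights(n_inputs=2, n_hidden=3, rng=random.Random(0))
print(z)
print(weights)
```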
[0119] A recurrent problem in the use of networks having a hidden
layer is the optimal number of hidden units to use in the network.
The number of inputs and outputs of a network are determined by the
problem to be solved. In the present invention, the number of
inputs for a given neural network can be the number of molecular
markers in the selected set of molecular markers of the invention.
The number of outputs for the neural network will typically be just
one. However, in some embodiments more than one output is used so
that more than just two states can be defined by the network. If
too many hidden units are used in a neural network, the network
will have too many degrees of freedom and, if it is trained too long,
there is a danger that the network will over-fit the data. If there
are too few hidden units, the training set cannot be learned.
Generally speaking, however, it is better to have too many hidden
units than too few. With too few hidden units, the model might not
have enough flexibility to capture the nonlinearities in the data;
with too many hidden units, the extra weight can be shrunk towards
zero if appropriate regularization or pruning, as described below,
is used. In typical embodiments, the number of hidden units is
somewhere in the range of 5 to 100, with the number increasing with
the number of inputs and number of training cases.
[0120] One general approach to determining the number of hidden
units to use is to apply a regularization approach. In the
regularization approach, a new criterion function is constructed
that depends not only on the classical training error, but also on
classifier complexity. Specifically, the new criterion function
penalizes highly complex models; searching for the minimum in this
criterion balances error on the training set against a
regularization term, which expresses
constraints or desirable properties of solutions:
J=J.sub.pat+.lamda.J.sub.reg.
The parameter .lamda. is adjusted to impose the regularization more
or less strongly. In other words, larger values for .lamda. will
tend to shrink weights towards zero: typically cross-validation
with a validation set is used to estimate .lamda.. This validation
set can be obtained by setting aside a random subset of the
training population. Other forms of penalty can also be used, for
example the weight elimination penalty (see, e.g., Hastie et al.,
2001, The Elements of Statistical Learning, Springer-Verlag, New
York).
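A regularized criterion of this kind, read here as the training error plus .lamda. times a penalty, can be sketched as follows. The sketch assumes a sum-of-squared-errors training term and a weight-decay penalty (the sum of squared weights), which is one common choice and is not stated in the text; all names and values are hypothetical.

```python
def training_error(outputs, targets):
    """J_pat: sum-of-squared errors over the training patterns."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))

def weight_decay(weights):
    """J_reg: sum of squared weights (weight-decay penalty; an assumption)."""
    return sum(w * w for w in weights)

def criterion(outputs, targets, weights, lam):
    """J = J_pat + lambda * J_reg."""
    return training_error(outputs, targets) + lam * weight_decay(weights)

# Hypothetical network outputs, target values, and weights.
outputs = [0.9, 0.2]
targets = [1.0, 0.0]
weights = [0.5, -1.0, 2.0]
print(criterion(outputs, targets, weights, lam=0.0))
print(criterion(outputs, targets, weights, lam=0.01))
```

Evaluating the criterion on a held-out validation set for several candidate values of `lam` is one way to carry out the cross-validation mentioned above.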
[0121] Another approach to determine the number of hidden units to
use is to eliminate--prune--weights that are least needed. In one
approach, the weights with the smallest magnitude are eliminated
(set to zero). Such magnitude-based pruning can work, but is
non-optimal; sometimes weights with small magnitudes are important
for learning the training data. In some embodiments, rather than
using a magnitude-based pruning approach, Wald statistics are
computed. The fundamental idea in Wald statistics is that they can
be used to estimate the importance of a hidden unit (weight) in a
model. Then, hidden units having the least importance are
eliminated (by setting their input and output weights to zero). Two
algorithms in this regard are the Optimal Brain Damage (OBD) and
the Optimal Brain Surgeon (OBS) algorithms that use second-order
approximation to predict how the training error depends upon a
weight, and eliminate the weight that leads to the smallest
increase in training error.
[0122] Optimal Brain Damage and Optimal Brain Surgeon share the
same basic approach of training a network to local minimum error at
weight w, and then pruning a weight that leads to the smallest
increase in the training error. The predicted functional increase
in the error for a change in full weight vector .delta.w is:
$$\delta J = \left( \frac{\partial J}{\partial w} \right)^{t} \cdot \delta w + \frac{1}{2} \, \delta w^{t} \cdot \frac{\partial^2 J}{\partial w^2} \cdot \delta w + O\left( \lVert \delta w \rVert^3 \right)$$
where
$$H = \frac{\partial^2 J}{\partial w^2}$$
is the Hessian matrix. The first term vanishes because we are at a
local minimum in error; third and higher order terms are ignored.
The general solution for minimizing this function given the
constraint of deleting one weight is:
$$\delta w = -\frac{w_q}{[H^{-1}]_{qq}} \, H^{-1} \cdot u_q \quad \text{and} \quad L_q = \frac{1}{2} \, \frac{w_q^2}{[H^{-1}]_{qq}}$$
Here, u.sub.q is the unit vector along the qth direction in weight
space and L.sub.q is an approximation to the saliency of the weight
q--the increase in training error if weight q is pruned and the
other weights are updated by .delta.w. These equations require the inverse
of H. One method to calculate this inverse matrix is to start with
a small value, H.sub.0.sup.-1=.alpha..sup.-1I, where .alpha. is a
small parameter--effectively a weight constant. Next the matrix is
updated with each pattern according to
$$H_{m+1}^{-1} = H_m^{-1} - \frac{H_m^{-1} X_{m+1} X_{m+1}^{T} H_m^{-1}}{\frac{n}{\alpha_m} + X_{m+1}^{T} H_m^{-1} X_{m+1}}$$
where the subscripts correspond to the pattern being presented and
.alpha..sub.m decreases with m. After the full training set has
been presented, the inverse Hessian matrix is given by
H.sup.-1=H.sub.n.sup.-1. In algorithmic form, the Optimal Brain
Surgeon method is:
  begin initialize $n_H$, $w$, $\theta$
    train a reasonably large network to minimum error
    do compute $H^{-1}$ by Equation 1
      $q^* \leftarrow \arg\min_q \, w_q^2 / (2 [H^{-1}]_{qq})$ (saliency $L_q$)
      $w \leftarrow w - \frac{w_{q^*}}{[H^{-1}]_{q^* q^*}} H^{-1} e_{q^*}$ (saliency $L_q$)
    until $J(w) > \theta$
    return $w$
  end
[0123] The Optimal Brain Damage method is computationally simpler
because it assumes that the Hessian matrix is diagonal, in which
case the calculation of the inverse Hessian matrix in line 3 is
particularly simple. The above algorithm
terminates when the error is greater than a criterion initialized
to be .theta.. Another approach is to change line 6 to terminate
when the change in J(w) due to elimination of a weight is greater
than some criterion value.
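A single pruning pass of the Optimal Brain Surgeon method can be sketched as follows, taking the inverse Hessian matrix as given. The sketch covers only the saliency computation and the weight update; training the network, estimating the inverse Hessian, and the stopping test on J(w) are omitted, and the function name and numerical values are hypothetical.

```python
def obs_prune_step(w, h_inv):
    """One Optimal Brain Surgeon step: prune the weight of least saliency.

    w is the weight vector (list) and h_inv the inverse Hessian (list of rows).
    Returns (q, saliency of q, updated weight vector).
    """
    n = len(w)
    # Saliency L_q = w_q^2 / (2 [H^-1]_qq) for every weight.
    saliencies = [w[q] ** 2 / (2.0 * h_inv[q][q]) for q in range(n)]
    q = min(range(n), key=lambda k: saliencies[k])
    scale = w[q] / h_inv[q][q]
    # delta_w = -(w_q / [H^-1]_qq) * H^-1 u_q, where H^-1 u_q is column q of H^-1.
    new_w = [w[k] - scale * h_inv[k][q] for k in range(n)]
    return q, saliencies[q], new_w

# Hypothetical weights and a symmetric positive-definite inverse Hessian.
w = [0.8, -0.1, 1.5]
h_inv = [
    [2.0, 0.5, 0.0],
    [0.5, 1.0, 0.2],
    [0.0, 0.2, 4.0],
]
q, saliency, new_w = obs_prune_step(w, h_inv)
print(q, saliency, new_w)
```

The update sets the pruned weight to zero and adjusts the remaining weights through the off-diagonal entries of the inverse Hessian; with a diagonal matrix, as assumed in Optimal Brain Damage, the other weights would be left unchanged.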
[0124] In some embodiments, a back-propagation neural network (see,
for example Abdi, 1994, "A neural network primer", J. Biol System.
2, 247-283) containing a single hidden layer of ten neurons (ten
hidden units) found in EasyNN-Plus version 4.0 g software package
(Neural Planner Software Inc.) is used. In a specific example,
parameter values within the EasyNN-Plus program are set as follows:
a learning rate of 0.05, and a momentum of 0.2. In some embodiments
in which the EasyNN-Plus version 4.0 g software package is used,
"outlier" samples are identified by performing twenty
independently-seeded trials involving 20,000 learning cycles
each.
5.2.3. Support Vector Machine
[0125] In some embodiments of the present invention, support vector
machines (SVMs) are used to classify subjects using expression
profiles of marker genes described in the present invention. The
SVM takes the patient marker profile as an input and generates an
output comprising data indicating whether the patient is predicted
to be a responsive or a non-responsive patient, e.g., a CR score.
A general description of SVMs can be found in, for example,
Cristianini and Shawe-Taylor, 2000, An Introduction to Support
Vector Machines, Cambridge University Press, Cambridge; Boser et
al., 1992, "A training algorithm for optimal margin classifiers," in
Proceedings of the 5.sup.th Annual ACM Workshop on Computational
Learning Theory, ACM Press, Pittsburgh, Pa., pp. 142-152; Vapnik,
1998, Statistical Learning Theory, Wiley, New York; Duda, Pattern
Classification, Second Edition, 2001, John Wiley & Sons, Inc.;
Hastie, 2001, The Elements of Statistical Learning, Springer, New
York; and Furey et al., 2000, Bioinformatics 16, 906-914.
[0126] Applications of SVMs to biological problems are described in
Jaakkola et al., Proceedings of the 7.sup.th International
Conference on Intelligent Systems for Molecular Biology,
AAAI Press, Menlo Park, Calif. (1999); Brown et al., Proc. Natl.
Acad. Sci. 97(1):262-67 (2000); Zien et al., Bioinformatics,
16(9):799-807 (2000); and Furey et al., Bioinformatics,
16(10):906-914 (2000).
[0127] In one approach, when an SVM is used, the gene expression
data is standardized to have mean zero and unit variance and the
members of a training population are randomly divided into a
training set and a test set. For example, in one embodiment, two
thirds of the members of the training population are placed in the
training set and one third of the members of the training
population are placed in the test set. The expression values for a
selected set of genes of the present invention are used to train
the SVM. Then the ability of the trained SVM to correctly classify
members in the test set is determined. In some embodiments, this
computation is performed several times for a given selected set of
molecular markers. In each iteration of computation, the members of
the training population are randomly assigned to the training set
and the test set. Then, the quality of the combination of molecular
markers is taken as the average classification performance over the
iterations of the SVM computation.
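The repeated random-split evaluation described above can be sketched as follows. This is only an illustration under stated assumptions: `standardize`, `repeated_split_accuracy` and `fit_nearest_centroid` are hypothetical names, the data are made up, and a simple nearest-centroid rule stands in for the SVM so that the sketch needs nothing beyond the Python standard library.

```python
import random
import statistics


def standardize(rows):
    """Scale every gene (column) to mean zero and unit variance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def fit_nearest_centroid(x_train, y_train):
    """Stand-in classifier (an assumption, not an SVM): nearest class centroid."""
    centroids = {}
    for label in set(y_train):
        members = [x for x, t in zip(x_train, y_train) if t == label]
        centroids[label] = [statistics.fmean(c) for c in zip(*members)]

    def predict(x):
        return min(centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(x, centroids[lab])))
    return predict


def repeated_split_accuracy(x, y, fit, n_iter=10, seed=0):
    """Average test accuracy over random two-thirds / one-third splits."""
    rng = random.Random(seed)
    x = standardize(x)
    idx = list(range(len(x)))
    n_train = (2 * len(idx)) // 3
    scores = []
    for _ in range(n_iter):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        hits = sum(predict(x[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return statistics.fmean(scores)


# Hypothetical two-gene profiles: six non-responsive (-1), six responsive (+1).
profiles = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],
            [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5], [10.2, 10.8]]
labels = [-1] * 6 + [1] * 6
quality = repeated_split_accuracy(profiles, labels, fit_nearest_centroid, n_iter=5)
```

Any classifier that exposes the same fit-then-predict shape could be substituted for the stand-in without changing the evaluation loop.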
[0128] Support vector machines map a given set of binary labeled
training data to a high-dimensional feature space and separate the
two classes of data with a maximum margin hyperplane. In general,
this hyperplane corresponds to a nonlinear decision boundary in the
input space. Let X .di-elect cons. R.sub.0 .OR right. R.sup.n be the
input vectors, y .di-elect cons. {-1,+1} be the labels, and .phi.:
R.sub.0.fwdarw.F be the mapping from input space to feature space.
Then the SVM learning algorithm finds a hyperplane (w, b) such that
the quantity
$$\gamma = \min_i \, y_i \{ \langle w, \phi(X_i) \rangle - b \}$$
is maximized, where the vector w has the same dimensionality as F,
b is a real number, and .gamma. is called the margin. The
corresponding decision function is then
$$f(X) = \operatorname{sign}(\langle w, \phi(X) \rangle - b)$$
[0129] This maximum is attained when
$$w = \sum_i \alpha_i y_i \phi(X_i)$$
where {.alpha..sub.i} are non-negative real numbers that maximize
$$\sum_i \alpha_i - \sum_{ij} \alpha_i \alpha_j y_i y_j \langle \phi(X_i), \phi(X_j) \rangle$$
subject to
$$\sum_i \alpha_i y_i = 0, \quad \alpha_i \geq 0.$$
[0130] The decision function can equivalently be expressed as
$$f(X) = \operatorname{sign}\left( \sum_i \alpha_i y_i \langle \phi(X_i), \phi(X) \rangle - b \right)$$
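The expanded form of the decision function can be illustrated with a short sketch. The function name `svm_decision`, the one-dimensional training points, and the coefficient values are assumptions chosen for illustration (they satisfy the constraint that the label-weighted coefficients sum to zero); they are not fitted values from any data set, and a value of exactly zero is arbitrarily assigned to the +1 class.

```python
def dot(u, v):
    """Plain inner product, i.e. the mapping phi is taken to be the identity."""
    return sum(a * b for a, b in zip(u, v))


def svm_decision(x, support_vectors, labels, alphas, b, inner=dot):
    """Evaluate sign(sum_i alpha_i * y_i * <phi(X_i), phi(x)> - b)."""
    total = sum(a * y * inner(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if total - b >= 0 else -1


# Hypothetical support vectors at -1 and +1 on a single axis.
svs = [[-1.0], [1.0]]
ys = [-1, 1]
alphas = [0.5, 0.5]  # illustrative values; 0.5 * (-1) + 0.5 * (+1) = 0

print(svm_decision([2.0], svs, ys, alphas, b=0.0))
print(svm_decision([-3.0], svs, ys, alphas, b=0.0))
```

Only the training points with non-zero coefficients enter the sum, so the cost of classifying a new profile scales with the number of support vectors rather than with the size of the training set.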
[0131] From this equation it can be seen that the .alpha..sub.i
associated with the training point X.sub.i expresses the strength
with which that point is embedded in the final decision function. A
remarkable property of this alternative representation is that only
a subset of the points will be associated with a non-zero
.alpha..sub.i. These points are called support vectors and are the
points that lie closest to the separating hyperplane. The
sparseness of the .alpha. vector has several computational and
learning theoretic consequences. It is important to note that
neither the learning algorithm nor the decision function needs to
represent explicitly the image of points in the feature space,
.phi.(X.sub.i), since both use only the dot products between such
images, <.phi.(X.sub.i),.phi.(X.sub.j)>. Hence, if one were given a
function K(X,Y)=<.phi.(X),.phi.(Y)>, one could learn and use the
maximum margin hyperplane in the feature space without ever
explicitly performing the mapping. For each continuous positive
definite function K(X,Y) there exists a mapping .phi. such that
K(X,Y)=<.phi.(X),.phi.(Y)> for all X,Y .di-elect cons. R.sub.0
(Mercer's Theorem). The function K(X,Y) is called the kernel
function. The use of a kernel function allows the support vector
machine to operate efficiently in nonlinear high-dimensional
feature spaces without being adversely affected by the
dimensionality of that space. Indeed, it is possible to work with
feature spaces of infinite dimension. Moreover, Mercer's theorem
makes it possible to learn in the feature space without even
knowing .phi. and F. The matrix
K.sub.ij=<.phi.(X.sub.i),.phi.(X.sub.j)> is called the kernel matrix.
Finally, note that the learning algorithm is a quadratic
optimization problem that has only a global optimum. The absence of
local minima is a significant difference from standard pattern
recognition techniques such as neural networks. For moderate sample
sizes, the optimization problem can be solved with simple gradient
descent techniques. In the presence of noise, the standard maximum
margin algorithm described above can be subject to over-fitting,
and more sophisticated techniques should be used. This problem
arises because the maximum margin algorithm always finds a
perfectly consistent hypothesis and does not tolerate training
error. Sometimes, however, it is necessary to trade some training
accuracy for better predictive power. The need for tolerating
training error has led to the development of the soft-margin and the
margin-distribution classifiers. One of these techniques replaces
the kernel matrix in the training phase as follows:
K.rarw.K+.lamda.I
while still using the standard kernel function in the decision
phase. By tuning .lamda., one can control the training error, and
it is possible to prove that the risk of misclassifying unseen
points can be decreased with a suitable choice of .lamda..
[0132] If instead of controlling the overall training error one
wants to control the trade-off between false positives and false
negatives, it is possible to modify K as follows:
K.rarw.K+.lamda.D
where D is a diagonal matrix whose entries are either d.sup.+ or
d.sup.-, in locations corresponding to positive and negative
examples. It is possible to prove that this technique is equivalent
to controlling the size of the .alpha..sub.i in a way that depends
on the size of the class, introducing a bias for larger
.alpha..sub.i in the class with smaller d.
[0133] This in turn corresponds to an asymmetric margin; i.e., the
class with smaller d will be kept further away from the decision
boundary. In some cases, the extreme imbalance of
the two classes, along with the presence of noise, creates a
situation in which points from the minority class can be easily
mistaken for mislabeled points. Enforcing a strong bias against
training errors in the minority class provides protection against
such errors and forces the SVM to make the positive examples
support vectors. Thus, choosing
$$d^{+} = \frac{1}{n^{+}} \quad \text{and} \quad d^{-} = \frac{1}{n^{-}}$$
provides a heuristic way to automatically adjust the relative
importance of the two classes, based on their respective
cardinalities. This technique effectively controls the trade-off
between sensitivity and specificity.
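The class-dependent modification of the kernel matrix can be sketched as below, reading n.sup.+ and n.sup.- as the numbers of positive and negative training examples. The function name `add_class_weighted_diagonal`, the +1/-1 label coding, and the toy kernel matrix are assumptions made for this illustration.

```python
def add_class_weighted_diagonal(kernel_matrix, labels, lam):
    """Return K + lam * D, with D diagonal and d+ = 1/n+, d- = 1/n-."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    # Diagonal entry for each training example depends only on its class.
    d = [1.0 / n_pos if y == 1 else 1.0 / n_neg for y in labels]
    size = len(kernel_matrix)
    return [[kernel_matrix[i][j] + (lam * d[i] if i == j else 0.0)
             for j in range(size)] for i in range(size)]


# Hypothetical training set: one positive and two negative examples.
k = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
k_mod = add_class_weighted_diagonal(k, [1, -1, -1], lam=2.0)
```

The modified matrix would be used only during training; as stated above, the standard kernel function is still used in the decision phase.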
[0134] In the present invention, a linear kernel can be used. The
similarity between two marker profiles X and Y can be the dot
product $X \cdot Y$. In one embodiment, the kernel is
$$K(X,Y) = X \cdot Y + 1$$
[0135] In another embodiment, a kernel of degree d is used
$$K(X,Y) = (X \cdot Y + 1)^{d},$$
where d can be 2, 3, . . .
[0136] In still another embodiment, a Gaussian kernel is used
$$K(X,Y) = \exp\left( -\frac{\| X - Y \|^{2}}{2 \sigma^{2}} \right)$$
[0137] where .sigma. is the width of the Gaussian.
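The three kernels above can be written out directly. The sketch below is illustrative only; the function names, the default values of `d` and `sigma`, and the example profiles are assumptions.

```python
import math


def linear_kernel(x, y):
    """K(X, Y) = X . Y + 1"""
    return sum(a * b for a, b in zip(x, y)) + 1.0


def polynomial_kernel(x, y, d=2):
    """K(X, Y) = (X . Y + 1)^d"""
    return linear_kernel(x, y) ** d


def gaussian_kernel(x, y, sigma=1.0):
    """K(X, Y) = exp(-||X - Y||^2 / (2 sigma^2))"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(profiles, kernel):
    """Matrix of kernel values between all pairs of marker profiles."""
    return [[kernel(p, q) for q in profiles] for p in profiles]


# Hypothetical marker profiles with two genes each.
profiles = [[1.0, 2.0], [3.0, 4.0]]
k_lin = kernel_matrix(profiles, linear_kernel)
```

Because each kernel is an ordinary function of two profiles, the same `kernel_matrix` helper serves for all three choices.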
5.2.4. Logistic Regression
[0138] In some embodiments, the chemotherapy response classifier is
based on a regression model, preferably a logistic regression
model. Such a regression model includes a coefficient for each of
the molecular markers in a selected set of molecular markers of the
invention. In such embodiments, the coefficients for the regression
model are computed using, for example, a maximum likelihood
approach. In particular embodiments, molecular marker data from two
different clinical groups, e.g., responsive and non-responsive, are
used and the dependent variable is the clinical status of the
patient from whom the molecular marker data were
obtained.
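A maximum likelihood fit of such a logistic regression model can be sketched with plain gradient ascent on the log-likelihood. This is a minimal illustration, not the fitting procedure of any particular software package; the function names, the learning rate, the number of steps, and the single-marker data are assumptions, with 1 standing for responsive and 0 for non-responsive.

```python
import math


def predict_proba(beta, x):
    """Probability of class 1 given coefficients beta (beta[0] is the intercept)."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, n_steps=2000):
    """Maximize the log-likelihood by gradient ascent; one coefficient per marker."""
    beta = [0.0] * (len(xs[0]) + 1)
    for _ in range(n_steps):
        grad = [0.0] * len(beta)
        for x, t in zip(xs, ys):
            residual = t - predict_proba(beta, x)
            grad[0] += residual
            for j, v in enumerate(x, start=1):
                grad[j] += residual * v
        # Step along the averaged gradient of the log-likelihood.
        beta = [b + lr * g / len(xs) for b, g in zip(beta, grad)]
    return beta


# Hypothetical standardized expression values for a single marker.
xs = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
ys = [0, 0, 1, 0, 1, 1]
beta = fit_logistic(xs, ys)
```

The two groups overlap in this example, so the likelihood has a finite maximum and the fitted coefficient for the marker is positive.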
[0139] Some embodiments of the present invention provide
generalizations of the logistic regression model that handle
multicategory (polychotomous) responses. Such embodiments can be
used to classify a subject into one of three or more clinical
groups, e.g., chronic phase, accelerated phase, and blast phase.
Such regression models use multicategory logit models that
simultaneously refer to all pairs of categories, and describe the
odds of response in one category instead of another. Once the model
specifies logits for a certain (J-1) pairs of categories, the rest
are redundant. See, for example, Agresti, An Introduction to
Categorical Data Analysis, John Wiley & Sons, Inc., 1996, New
York, Chapter 8, which is hereby incorporated by reference.
5.2.5. Discriminant Analysis
[0140] Linear discriminant analysis (LDA) attempts to classify a
subject into one of two categories based on certain object
properties. In other words, LDA tests whether object attributes
measured in an experiment predict categorization of the objects.
LDA typically requires continuous independent variables and a
dichotomous categorical dependent variable. In the present
invention, the expression values for the selected set of molecular
markers of the invention across a subset of the training population
serve as the requisite continuous independent variables. The
clinical group classification of each of the members of the
training population serves as the dichotomous categorical dependent
variable.
[0141] LDA seeks the linear combination of variables that maximizes
the ratio of between-group variance and within-group variance by
using the grouping information. Implicitly, the linear weights used
by LDA depend on how the expression of a molecular marker across
the training set separates in the two groups (e.g., a responsive
group and a non-responsive group) and how this gene expression
correlates with the expression of other genes. In some embodiments,
LDA is applied to the data matrix of the N members in the training
sample by K genes in a combination of genes described in the
present invention. Then, the linear discriminant of each member of
the training population is plotted. Ideally, those members of the
training population representing a first subgroup (e.g. responsive
subjects) will cluster into one range of linear discriminant values
(e.g., negative) and those members of the training population
representing a second subgroup (e.g. non-responsive subjects) will
cluster into a second range of linear discriminant values (e.g.,
positive). The LDA is considered more successful when the
separation between the clusters of discriminant values is larger.
For more information on linear discriminant analysis, see Duda,
Pattern Classification, Second Edition, 2001, John Wiley &
Sons, Inc; Hastie, 2001, The Elements of Statistical Learning,
Springer, New York; and Venables & Ripley, 1997, Modern Applied
Statistics with S-PLUS, Springer, New York.
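For two groups, the linear combination sought by LDA can be sketched with Fisher's discriminant. To keep the example short it is restricted to two genes, so that the within-group scatter matrix can be inverted by hand; the function names, the midpoint threshold, and the expression values are assumptions made for illustration.

```python
def fisher_lda_2d(group_a, group_b):
    """Fisher discriminant for two genes: w = S_w^-1 (mean_b - mean_a)."""
    def mean(points):
        return [sum(p[k] for p in points) / len(points) for k in range(2)]

    mean_a, mean_b = mean(group_a), mean(group_b)
    # Pooled within-group scatter matrix S_w (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points, m in ((group_a, mean_a), (group_b, mean_b)):
        for p in points:
            dev = [p[0] - m[0], p[1] - m[1]]
            for r in range(2):
                for c in range(2):
                    s[r][c] += dev[r] * dev[c]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = [mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]]
    # Closed-form inverse of the 2 x 2 scatter matrix applied to the mean difference.
    w = [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det]
    # Threshold at the projection of the midpoint between the group means.
    threshold = sum(wk * (ma + mb) / 2.0 for wk, ma, mb in zip(w, mean_a, mean_b))
    return w, threshold


def discriminant(w, threshold, x):
    """Linear discriminant value; its sign indicates the predicted group."""
    return sum(wk * xk for wk, xk in zip(w, x)) - threshold


# Hypothetical expression values of two genes.
responsive = [[1.0, 2.0], [2.0, 1.0], [2.0, 3.0], [3.0, 2.0]]
non_responsive = [[6.0, 7.0], [7.0, 6.0], [7.0, 8.0], [8.0, 7.0]]
w, threshold = fisher_lda_2d(responsive, non_responsive)
```

With these values the responsive subjects receive negative discriminant values and the non-responsive subjects positive ones, matching the clustering into two ranges described above.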
[0142] Quadratic discriminant analysis (QDA) takes the same input
parameters and returns the same type of result as LDA. QDA uses
quadratic equations, rather than linear equations, to produce
results. LDA and QDA can be used interchangeably for this purpose,
and which to use is a matter of preference and/or availability of
software to support the analysis. Logistic regression takes the same
input parameters and returns the same type of result as LDA and QDA.
5.2.6. Decision Trees
[0143] In some embodiments of the present invention, decision trees
are used to classify patients using expression data for a selected
set of molecular markers of the invention. Decision tree algorithms
belong to the class of supervised learning algorithms. The aim of a
decision tree is to induce a classifier (a tree) from real-world
example data. This tree can be used to classify unseen examples
which have not been used to derive the decision tree.
[0144] A decision tree is derived from training data. An example
contains values for the different attributes and the class to which
the example belongs. In one embodiment, the training data is expression
data for a combination of genes described in the present invention
across the training population.
[0145] The following algorithm describes a decision tree
derivation:
TABLE-US-00003
Tree(Examples, Class, Attributes)
  Create a root node
  If all Examples have the same Class value, give the root this label
  Else if Attributes is empty, label the root according to the most
    common value
  Else begin
    Calculate the information gain for each attribute
    Select the attribute A with highest information gain and make
      this the root attribute
    For each possible value, v, of this attribute
      Add a new branch below the root, corresponding to A = v
      Let Examples(v) be those examples with A = v
      If Examples(v) is empty, make the new branch a leaf node
        labeled with the most common value among Examples
      Else let the new branch be the tree created by
        Tree(Examples(v), Class, Attributes - {A})
  end
[0146] A more detailed description of the calculation of
information gain is shown in the following. If the possible classes
v.sub.i of the examples have probabilities P(v.sub.i) then the
information content I of the actual answer is given by:
$$I(P(v_1), \ldots, P(v_n)) = \sum_{i=1}^{n} -P(v_i) \log_2 P(v_i)$$
The I-value shows how much information we need in order to be able
to describe the outcome of a classification for the specific
dataset used. Supposing that the dataset contains p positive (e.g.
responsive) and n negative (e.g. non-responsive) examples (e.g.
individuals), the information contained in a correct answer is:
$$I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) = -\frac{p}{p+n} \log_2 \frac{p}{p+n} - \frac{n}{p+n} \log_2 \frac{n}{p+n}$$
where log.sub.2 is the logarithm using base two. By testing single
attributes the amount of information needed to make a correct
classification can be reduced. The remainder for a specific
attribute A (e.g. a gene) shows how much information is still
needed after that attribute has been tested.
$$\mathrm{Remainder}(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p+n} \, I\left(\frac{p_i}{p_i+n_i}, \frac{n_i}{p_i+n_i}\right)$$
"v" is the number of unique attribute values for attribute A in a
certain dataset, "i" is a certain attribute value, "p.sub.i" is the
number of examples with value i for attribute A where the
classification is positive (e.g. cancer), "n.sub.i" is the number of
examples with value i for attribute A where the classification is
negative (e.g. healthy).
[0147] The information gain of a specific attribute A is calculated
as the difference between the information content for the classes
and the remainder of attribute A:
$$\mathrm{Gain}(A) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - \mathrm{Remainder}(A)$$
The information gain is used to evaluate how important the
different attributes are for the classification (how well they
split up the examples), and the attribute with the highest
information gain is selected.
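The information gain calculation and the recursive tree derivation can be sketched together as follows. The sketch assumes that expression values have already been discretized into categories such as "high" and "low", which is an assumption of this illustration; the function names and the four example subjects are likewise hypothetical. Branches are created only for attribute values that occur among the examples, so the empty-branch case of the algorithm does not arise here.

```python
import math
from collections import Counter


def information(labels):
    """Information content I of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def remainder(examples, labels, attribute):
    """Information still needed after splitting on the given attribute."""
    total = len(labels)
    rem = 0.0
    for value in set(e[attribute] for e in examples):
        subset = [lab for e, lab in zip(examples, labels) if e[attribute] == value]
        rem += len(subset) / total * information(subset)
    return rem


def gain(examples, labels, attribute):
    """Gain(A) = I(classes) - Remainder(A)."""
    return information(labels) - remainder(examples, labels, attribute)


def build_tree(examples, labels, attributes):
    """Recursive derivation: returns a class label or a nested dict."""
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, labels, a))
    tree = {"attribute": best, "branches": {}}
    for value in set(e[best] for e in examples):
        keep = [i for i, e in enumerate(examples) if e[best] == value]
        tree["branches"][value] = build_tree(
            [examples[i] for i in keep],
            [labels[i] for i in keep],
            [a for a in attributes if a != best])
    return tree


# Hypothetical discretized expression data for two genes.
examples = [{"geneA": "high", "geneB": "low"}, {"geneA": "high", "geneB": "high"},
            {"geneA": "low", "geneB": "low"}, {"geneA": "low", "geneB": "high"}]
labels = ["responsive", "responsive", "non-responsive", "non-responsive"]
tree = build_tree(examples, labels, ["geneA", "geneB"])
```

In this example geneA separates the two classes completely, so it carries one full bit of gain and becomes the root attribute, while geneB carries none.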
[0148] In general there are a number of different decision tree
algorithms, many of which are described in Duda, Pattern
Classification, Second Edition, 2001, John Wiley & Sons,
Inc.
[0149] Decision tree algorithms often require consideration of
feature processing, impurity measure, stopping criterion, and
pruning. Specific decision tree algorithms include, but are not
limited to, classification and regression trees (CART), multivariate
decision trees, ID3, and C4.5.
[0150] In one approach, when an exemplary embodiment of a decision
tree is used, the gene expression data for a selected set of
molecular markers of the invention across a training population is
standardized to have mean zero and unit variance. The members of
the training population are randomly divided into a training set
and a test set. For example, in one embodiment, two thirds of the
members of the training population are placed in the training set
and one third of the members of the training population are placed
in the test set. The expression values for a selected combination of
genes described in the present invention are used to construct the
decision tree. Then, the ability of the decision tree to correctly
classify members in the test set is determined. In some
embodiments, this computation is performed several times for a
given combination of molecular markers. In each iteration of the
computation, the members of the training population are randomly
assigned to the training set and the test set. Then, the quality of
the combination of molecular markers is taken as the average
classification performance over the iterations of the decision tree
computation.
5.2.7. Clustering
[0151] In some embodiments, the expression values for a selected
set of molecular markers of the invention are used to cluster a
training set. For example, consider the case in which ten genes
described in the present invention are used. Each member m of the
training population will have expression values for each of the ten
genes. Such values from a member m in the training population
define the vector:
TABLE-US-00004 X.sub.1m X.sub.2m X.sub.3m X.sub.4m X.sub.5m
X.sub.6m X.sub.7m X.sub.8m X.sub.9m X.sub.10m
where X.sub.im is the expression level of the i.sup.th gene in
organism m. If there are m organisms in the training set, selection
of i genes will define m vectors. Note that the methods of the
present invention do not require that the expression value of
every single gene used in the vectors be represented in every
single vector m. In other words, data from a subject in which the
expression value of one of the genes is not found can still be used
for clustering. In such instances, the missing expression value is
assigned either a "zero" or some other normalized value. In some
embodiments, prior to clustering, the gene expression values are
normalized to have a mean value of zero and unit variance.
[0152] Those members of the training population that exhibit
similar expression patterns across the training group will tend to
cluster together. A particular combination of genes of the present
invention is considered to be a good classifier in this aspect of
the invention when the vectors cluster into the trait groups found
in the training population. For instance, if the training
population includes responsive patients and non-responsive patients,
a clustering classifier will cluster the population into two
groups, with each group uniquely representing either the responsive
group or the non-responsive group.
[0153] Clustering is described on pages 211-256 of Duda and Hart,
Pattern Classification and Scene Analysis, 1973, John Wiley &
Sons, Inc., New York. As stated in Section 6.7 of Duda, the
clustering problem is one of finding natural groupings
in a dataset.
[0154] To identify natural groupings, two issues are addressed.
First, a way to measure similarity (or dissimilarity) between two
samples is determined. This metric (similarity measure) is used to
ensure that the samples in one cluster are more like one another
than they are to samples in other clusters. Second, a mechanism for
partitioning the data into clusters using the similarity measure is
determined.
[0155] Similarity measures are discussed in Section 6.7 of Duda,
where it is stated that one way to begin a clustering investigation
is to define a distance function and to compute the matrix of
distances between all pairs of samples in a dataset. If distance is
a good measure of similarity, then the distance between samples in
the same cluster will be significantly less than the distance
between samples in different clusters. However, as stated on page
215 of Duda, clustering does not require the use of a distance
metric. For example, a nonmetric similarity function s(x, x') can
be used to compare two vectors x and x'. Conventionally, s(x, x')
is a symmetric function whose value is large when x and x' are
somehow "similar". An example of a nonmetric similarity function
s(x, x') is provided on page 216 of Duda.
[0156] Once a method for measuring "similarity" or "dissimilarity"
between points in a dataset has been selected, clustering requires
a criterion function that measures the clustering quality of any
partition of the data. Partitions of the data set that extremize
the criterion function are used to cluster the data. See page 217
of Duda. Criterion functions are discussed in Section 6.8 of
Duda.
[0157] More recently, Duda et al., Pattern Classification, 2.sup.nd
edition, John Wiley & Sons, Inc. New York, has been published.
Pages 537-563 describe clustering in detail. More information on
clustering techniques can be found in Kaufman and Rousseeuw, 1990,
Finding Groups in Data: An Introduction to Cluster Analysis, Wiley,
New York, N.Y.; Everitt, 1993, Cluster analysis (3d ed.), Wiley,
New York, N.Y.; and Backer, 1995, Computer-Assisted Reasoning in
Cluster Analysis, Prentice Hall, Upper Saddle River, N.J.
Particular exemplary clustering techniques that can be used in the
present invention include, but are not limited to, hierarchical
clustering (agglomerative clustering using nearest-neighbor
algorithm, farthest-neighbor algorithm, the average linkage
algorithm, the centroid algorithm, or the sum-of-squares
algorithm), k-means clustering, fuzzy k-means clustering algorithm,
and Jarvis-Patrick clustering.
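Of the techniques listed, k-means clustering is simple enough to sketch in a few lines. The example below is an illustration only: the function names, the choice of starting centroids, the fixed number of iterations, and the expression vectors are assumptions, and squared Euclidean distance is used as the dissimilarity measure.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two expression vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to vector p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def k_means(points, centroids, n_iter=20):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # A centroid with no members keeps its previous position.
        centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else old
            for members, old in zip(clusters, centroids)
        ]
    return [nearest(p, centroids) for p in points], centroids


# Hypothetical two-gene expression vectors for six subjects.
vectors = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [9.0, 9.0], [9.0, 10.0], [10.0, 9.0]]
assignments, centers = k_means(vectors, centroids=[vectors[0], vectors[3]])
```

A combination of genes would be judged a good classifier in the sense described above when the resulting cluster assignments coincide with the known trait groups.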
5.2.8. Principal Component Analysis
[0158] Principal component analysis (PCA) has been proposed to
analyze gene expression data. Principal component analysis is a
classical technique to reduce the dimensionality of a data set by
transforming the data to a new set of variables (principal
components) that summarize the features of the data. See, for
example, Jolliffe, 1986, Principal Component Analysis, Springer,
N.Y. Principal components (PCs) are uncorrelated and are ordered
such that the k.sup.th PC has the k.sup.th largest variance among PCs.
The k.sup.th PC can be interpreted as the direction that maximizes
the variation of the projections of the data points such that it is
orthogonal to the first k-1 PCs. The first few PCs capture most of
the variation in the data set. In contrast, the last few PCs are
often assumed to capture only the residual `noise` in the data.
[0159] PCA can also be used to create a chemotherapy response
classifier in accordance with the present invention. In such an
approach, vectors for a selected set of molecular markers of the
invention can be constructed in the same manner described for
clustering above. In fact, the set of vectors, where each vector
represents the expression values for the select genes from a
particular member of the training population, can be considered a
matrix. In some embodiments, this matrix is represented in a
Free-Wilson method of qualitative binary description of monomers
(Kubinyi, 1990, 3D QSAR in drug design theory methods and
applications, Pergamon Press, Oxford, pp 589-638), and distributed
in a maximally compressed space using PCA so that the first
principal component (PC) captures the largest amount of variance
information possible, the second principal component (PC) captures
the second largest amount of all variance information, and so forth
until all variance information in the matrix has been accounted
for.
[0160] Then, each of the vectors (where each vector represents a
member of the training population) is plotted. Many different types
of plots are possible. In some embodiments, a one-dimensional plot
is made. In this one-dimensional plot, the value for the first
principal component from each of the members of the training
population is plotted. In this form of plot, the expectation is
that members of a first group (e.g. chronic phase patients) will
cluster in one range of first principal component values and
members of a second group (e.g., advanced phase patients) will
cluster in a second range of first principal component values.
[0161] In one example, the training population comprises two
groups: a responder group and a non-responder group. The first
principal component is computed using the molecular marker
expression values for the select genes of the present invention
across the entire training population data set. Then, each member
of the training set is plotted as a function of the value for the
first principal component. In this example, those members of the
training population in which the first principal component is
positive are the responders and those members of the training
population in which the first principal component is negative are
the non-responders.
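The one-dimensional use of the first principal component in this example can be sketched with power iteration on the sample covariance matrix. This is an illustration, not the R-based procedure mentioned elsewhere in this section; the function name, the iteration count, the starting vector, and the data are assumptions. The sign of a principal component is arbitrary, so which group falls on the positive side depends on the starting vector.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return the first PC direction and the score of every row along it."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    # Sample covariance matrix of the centered expression values.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(k)]
           for a in range(k)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * k
    for _ in range(n_iter):
        v = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    scores = [sum(c[j] * v[j] for j in range(k)) for c in centered]
    return v, scores


# Hypothetical two-gene profiles: four responders followed by four non-responders.
profiles = [[5.0, 6.0], [6.0, 5.0], [6.0, 7.0], [7.0, 6.0],
            [1.0, 2.0], [2.0, 1.0], [2.0, 3.0], [3.0, 2.0]]
direction, scores = first_principal_component(profiles)
```

Here the first four scores are positive and the last four negative, so thresholding the first principal component at zero separates the two groups.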
[0162] In some embodiments, the members of the training population
are plotted against more than one principal component. For example,
in some embodiments, the members of the training population are
plotted on a two-dimensional plot in which the first dimension is
the first principal component and the second dimension is the
second principal component. In such a two-dimensional plot, the
expectation is that members of each subgroup represented in the
training population will cluster into discrete groups. For example,
a first cluster of members in the two-dimensional plot will
represent responsive subjects, a second cluster of members in the
two-dimensional plot will represent non-responsive subjects, and so
forth.
[0163] In some embodiments, the members of the training population
are plotted against more than two principal components and a
determination is made as to whether the members of the training
population are clustering into groups that each uniquely represents
a subgroup found in the training population. In some embodiments,
principal component analysis is performed by using the R mva
package (Anderberg, 1973, Cluster Analysis for Applications,
Academic Press, New York; Gordon, Classification, Second
Edition, Chapman and Hall, CRC, 1999). Principal component
analysis is further described in Duda, Pattern Classification,
Second Edition, 2001, John Wiley & Sons, Inc.
5.2.9. Nearest Neighbor Classifier Analysis
[0164] Nearest neighbor classifiers are memory-based and require no
model to be fit. Given a query point x.sub.0, the k training points
x.sub.(r), r=1, . . . , k closest in distance to x.sub.0 are
identified and then the point x.sub.0 is classified by majority
vote among the k nearest neighbors. Ties can be broken at random.
In some embodiments, Euclidean distance in feature space is used to
determine distance as:
$$d_{(i)} = \| x_{(i)} - x_0 \|$$
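A k-nearest-neighbor rule with Euclidean distance can be sketched as follows. The function name `knn_classify`, the value of k, and the training profiles are assumptions for illustration. Unlike the random tie-breaking mentioned above, this sketch resolves a tied vote in favor of the label whose neighbor was encountered first, that is, the nearer one.

```python
import math
from collections import Counter


def knn_classify(x0, train_x, train_y, k=3):
    """Classify query point x0 by majority vote among its k nearest neighbors."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x0))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical standardized two-gene training profiles.
train_x = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
train_y = ["responsive"] * 3 + ["non-responsive"] * 3

print(knn_classify([0.5, 0.5], train_x, train_y))
print(knn_classify([5.5, 5.5], train_x, train_y))
```

No model is fitted; the training profiles themselves are kept in memory and consulted for every query.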
[0165] Typically, when the nearest neighbor algorithm is used, the
expression data used to compute the distances are
standardized to have mean zero and variance 1. In the present
invention, the members of the training population are randomly
divided into a training set and a test set. For example, in one
embodiment, two thirds of the members of the training population
are placed in the training set and one third of the members of the
training population are placed in the test set. Profiles of a
selected set of molecular markers of the invention represent the
feature space into which members of the test set are plotted. Next,
the ability of the training set to correctly characterize the
members of the test set is computed. In some embodiments, nearest
neighbor computation is performed several times for a given
combination of genes of the present invention. In each iteration of
computation, the members of the training population are randomly
assigned to the training set and the test set. Then, the quality of
the combination of genes is taken as the average of each such
iteration of the nearest neighbor computation.
[0166] The nearest neighbor rule can be refined to deal with issues
of unequal class priors, differential misclassification costs, and
feature selection. Many of these refinements involve some form of
weighted voting for the neighbors. For more information on nearest
neighbor analysis, see Duda, Pattern Classification, Second
Edition, 2001, John Wiley & Sons, Inc; and Hastie, 2001, The
Elements of Statistical Learning, Springer, N.Y.
5.2.10. Evolutionary Methods
[0167] Inspired by the process of biological evolution,
evolutionary methods of classifier design employ a stochastic
search for an optimal classifier. In broad overview, such methods
create several classifiers--a population--from measurements of gene
products of the present invention. Each classifier varies somewhat
from the others. Next, the classifiers are scored on expression data
across the training population. In keeping with the analogy with
biological evolution, the resulting (scalar) score is sometimes
called the fitness. The classifiers are ranked according to their
score and the best classifiers are retained (some portion of the
total population of classifiers). Again, in keeping with biological
terminology, this is called survival of the fittest. The
classifiers are stochastically altered in the next generation--the
children or offspring. Some offspring classifiers will have higher
scores than their parent in the previous generation, some will have
lower scores. The overall process is then repeated for the
subsequent generation: The classifiers are scored and the best ones
are retained, randomly altered to give yet another generation, and
so on. In part, because of the ranking, each generation has, on
average, a slightly higher score than the previous one. The process
is halted when the single best classifier in a generation has a
score that exceeds a desired criterion value. More information on
evolutionary methods is found in, for example, Duda, Pattern
Classification, Second Edition, 2001, John Wiley & Sons,
Inc.
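The score, retain, and alter cycle described above can be sketched for a population of linear classifiers. Everything specific in this sketch is an assumption made for illustration: the representation of a classifier as a list of weights plus an offset, the Gaussian mutation, the population sizes, and the toy data. Survivors are carried over unchanged, so the best score never decreases from one generation to the next.

```python
import random


def evolve(fitness, n_params, pop_size=20, keep=5, generations=30,
           target=1.0, seed=0):
    """Stochastic search: rank, retain the fittest, mutate to create offspring."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_params)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) >= target:
            break  # the best classifier meets the desired criterion
        survivors = population[:keep]
        # Offspring are randomly altered copies of randomly chosen survivors.
        children = [[g + rng.gauss(0.0, 0.3) for g in rng.choice(survivors)]
                    for _ in range(pop_size - keep)]
        population = survivors + children
    return max(population, key=fitness)


# Hypothetical two-gene training data with labels -1 and +1.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]
y = [-1, -1, -1, 1, 1, 1]


def accuracy(params):
    """Fitness: training accuracy of the classifier sign(w . x + b)."""
    *w, b = params
    hits = sum((1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1) == t
               for x, t in zip(X, y))
    return hits / len(X)


best = evolve(accuracy, n_params=3)
```

The fitness function is passed in as an argument, so any scalar score computed on the training population could be used in place of training accuracy.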
5.2.11. Bagging, Boosting and the Random Subspace Method
[0168] Bagging, boosting and the random subspace method are
combining techniques that can be used to improve weak classifiers.
These techniques are designed for, and usually applied to, decision
trees. In addition, Skurichina and Duin provide evidence to suggest
that such techniques can also be useful in linear discriminant
analysis.
[0169] In bagging, one samples the training set, generating random
independent bootstrap replicates, constructs the classifier on each
of these, and aggregates them by a simple majority vote in the
final decision rule. See, for example, Breiman, 1996, Machine
Learning 24, 123-140; and Efron & Tibshirani, An Introduction
to Bootstrap, Chapman & Hall, New York, 1993.
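Bagging as described above can be sketched with a deliberately simple base classifier. The function names, the nearest-class-mean base rule on a single marker, the number of replicates, and the data are assumptions for illustration; any base classifier with the same fit-then-predict shape could be substituted.

```python
import random
from collections import Counter


def fit_nearest_mean(xs, ys):
    """Weak base classifier (an assumption): assign to the nearest class mean."""
    groups = {}
    for x, t in zip(xs, ys):
        groups.setdefault(t, []).append(x)
    means = {t: sum(v) / len(v) for t, v in groups.items()}
    return lambda x: min(means, key=lambda t: abs(x - means[t]))


def bagging(xs, ys, fit, n_replicates=25, seed=0):
    """Train one classifier per bootstrap replicate; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_replicates):
        # Bootstrap replicate: sample the training set with replacement.
        idx = [rng.randrange(len(xs)) for _ in xs]
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]
    return predict


# Hypothetical expression values of one marker in two groups.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0, 14.0]
ys = ["A"] * 5 + ["B"] * 5
predict = bagging(xs, ys, fit_nearest_mean)
```

Each replicate sees a slightly different training set, and the majority vote averages out the variation between the individual classifiers.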
[0170] In boosting, classifiers are constructed on weighted
versions of the training set, which are dependent on previous
classification results. Initially, all objects have equal weights,
and the first classifier is constructed on this data set. Then,
weights are changed according to the performance of the classifier.
Erroneously classified objects (molecular markers in the data set)
get larger weights, and the next classifier is boosted on the
reweighted training set. In this way, a sequence of training sets
and classifiers is obtained, which is then combined by simple
majority voting or by weighted majority voting in the final
decision. See, for example, Freund & Schapire, "Experiments
with a new boosting algorithm," Proceedings 13.sup.th International
Conference on Machine Learning, 1996, 148-156.
[0171] To illustrate boosting, consider the case where there are
two phenotypic groups exhibited by the population under study,
phenotype 1 (e.g., advanced phase patients), and phenotype 2 (e.g.,
chronic phase patients). Given a vector of molecular markers X, a
classifier G(X) produces a prediction taking one of the values in
the two-value set {phenotype 1, phenotype 2}, which can be coded as
{-1, +1}. The error rate on
the training sample is
$$\overline{\mathrm{err}} = \frac{1}{N} \sum_{i=1}^{N} I\left(y_i \neq G(x_i)\right)$$
where N is the number of subjects in the training set (the sum
total of the subjects that have either phenotype 1 or phenotype
2).
[0172] A weak classifier is one whose error rate is only slightly
better than random guessing. In the boosting algorithm, the weak
classification algorithm is repeatedly applied to modified versions
of the data, thereby producing a sequence of weak classifiers
G.sub.m(x), m=1, 2, . . . , M. The predictions from all of the
classifiers in this sequence are then combined through a weighted
majority vote to produce the final prediction:
$$G(x) = \mathrm{sign}\left(\sum_{m=1}^{M} \alpha_m G_m(x)\right)$$
Here .alpha..sub.1, .alpha..sub.2, . . . , .alpha..sub.M are
computed by the boosting algorithm and their purpose is to weigh
the contribution of each respective G.sub.m(x). Their effect is to
give higher influence to the more accurate classifiers in the
sequence.
[0173] The data modifications at each boosting step consist of
applying weights w.sub.1, w.sub.2, . . . , w.sub.N to each of the
training observations (x.sub.i, y.sub.i), i=1, 2, . . . , N.
Initially all the weights are set to w.sub.i=1/N, so that the first
step simply trains the classifier on the data in the usual manner.
For each successive iteration m=2, 3, . . . , M the observation
weights are individually modified and the classification algorithm
is reapplied to the weighted observations. At step m, those
observations that were misclassified by the classifier G.sub.m-1(x)
induced at the previous step have their weights increased, whereas
the weights are decreased for those that were classified correctly.
Thus as iterations proceed, observations that are difficult to
correctly classify receive ever-increasing influence. Each
successive classifier is thereby forced to concentrate on those
training observations that are missed by previous ones in the
sequence.
[0174] The exemplary boosting algorithm is summarized as
follows:
[0175] 1. Initialize the observation weights w.sub.i=1/N, i=1, 2, .
. . , N.
[0176] 2. For m=1 to M:
[0177] (a) Fit a classifier G.sub.m(x) to the training set using
weights w.sub.i.
[0178] (b) Compute
$$\mathrm{err}_m = \frac{\sum_{i=1}^{N} w_i \, I\left(y_i \neq G_m(x_i)\right)}{\sum_{i=1}^{N} w_i}$$
[0179] (c) Compute .alpha..sub.m=log((1-err.sub.m)/err.sub.m).
[0180] (d) Set
$$w_i \leftarrow w_i \exp\left[\alpha_m \, I\left(y_i \neq G_m(x_i)\right)\right], \quad i = 1, 2, \ldots, N$$
3. Output
$$G(x) = \mathrm{sign}\left(\sum_{m=1}^{M} \alpha_m G_m(x)\right)$$
[0181] In the algorithm, the current classifier G.sub.m(x) is
induced on the weighted observations at line 2a. The resulting
weighted error rate is computed at line 2b. Line 2c calculates the
weight .alpha..sub.m given to G.sub.m(x) in producing the final
classifier G(x) (line 3). The individual weights of each of the
observations are updated for the next iteration at line 2d.
Observations misclassified by G.sub.m(x) have their weights scaled
by a factor exp(.alpha..sub.m), increasing their relative influence
for inducing the next classifier G.sub.m+1(x) in the sequence. In
some embodiments, modifications of the Freund and Schapire, 1997,
Journal of Computer and System Sciences 55, pp. 119-139, boosting
method are used. See, for example, Hastie et al., The Elements of
Statistical Learning, 2001, Springer, N.Y., Chapter 10. In some
embodiments, boosting or adaptive boosting methods are used.
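The boosting steps 1 through 3 can be followed in a short sketch. This is an illustration under stated assumptions, not the invention's implementation: the weak learner is a hypothetical one-feature threshold stump, the two phenotypes are coded as -1 and +1 so that the sign function applies, and the clipping of err.sub.m away from 0 and 1 is an added numerical safeguard that is not part of the algorithm as described.

```python
import math


def fit_weighted_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) for the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                err = sum(
                    wi
                    for xi, yi, wi in zip(X, y, w)
                    if (polarity if xi[j] > thr else -polarity) != yi
                )
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, x):
    _, j, thr, polarity = stump
    return polarity if x[j] > thr else -polarity


def adaboost_fit(X, y, M=10):
    """Boosting as summarized in the text; labels y must be -1 or +1."""
    N = len(X)
    w = [1.0 / N] * N                              # step 1: equal weights
    ensemble = []
    for _ in range(M):                             # step 2
        stump = fit_weighted_stump(X, y, w)        # 2(a): fit on weighted data
        err = stump[0] / sum(w)                    # 2(b): weighted error rate
        err = min(max(err, 1e-10), 1 - 1e-10)      # safeguard (assumption): avoid log(0)
        alpha = math.log((1 - err) / err)          # 2(c): classifier weight
        w = [                                      # 2(d): up-weight the mistakes
            wi * math.exp(alpha * (stump_predict(stump, xi) != yi))
            for xi, yi, wi in zip(X, y, w)
        ]
        ensemble.append((alpha, stump))
    return ensemble


def adaboost_predict(ensemble, x):
    """Step 3: sign of the alpha-weighted vote (a zero score is mapped to +1)."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

On a one-dimensional training set with labels +, +, -, -, +, + no single stump is error-free, but three boosted stumps combined by the weighted vote classify all six training points correctly.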
[0182] In some embodiments, modifications of Freund and Schapire,
1997, Journal of Computer and System Sciences 55, pp. 119-139, are
used. For example, in some embodiments, feature preselection is
performed using a technique such as the nonparametric scoring
methods of Park et al., 2002, Pac. Symp. Biocomput. 6, 52-63.
Feature preselection is a form of dimensionality reduction in which
the genes that discriminate between classifications the best are
selected for use in the classifier. Then, the LogitBoost procedure
introduced by Friedman et al., 2000, Ann Stat 28, 337-407 is used
rather than the boosting procedure of Freund and Schapire. In some
embodiments, the boosting and other classification methods of
Ben-Dor et al., 2000, Journal of Computational Biology 7, 559-583
are used in the present invention. In some embodiments, the
boosting and other classification methods of Freund and Schapire,
1997, Journal of Computer and System Sciences 55, 119-139, are
used.
[0183] In the random subspace method, classifiers are constructed
in random subspaces of the data feature space. These classifiers
are usually combined by simple majority voting in the final
decision rule. See, for example, Ho, "The Random subspace method
for constructing decision forests," IEEE Trans Pattern Analysis and
Machine Intelligence, 1998; 20(8): 832-844.
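The random subspace method can be sketched as follows. The base classifier (nearest centroid), the function names, and the default subspace size are assumptions chosen only to keep the example short; any classifier could be trained in each random subspace.

```python
import random
from collections import Counter


def fit_centroids(X, y, features):
    """Nearest-centroid classifier restricted to the given feature indices."""
    sums = {}
    counts = Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(features))
        for k, j in enumerate(features):
            acc[k] += xi[j]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(centroids, features, x):
    def squared_distance(center):
        return sum((x[j] - center[k]) ** 2 for k, j in enumerate(features))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def random_subspace_fit(X, y, n_classifiers=15, subspace_size=2, seed=0):
    """Train each ensemble member on a random subset of the features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    members = []
    for _ in range(n_classifiers):
        features = rng.sample(range(n_features), subspace_size)
        members.append((features, fit_centroids(X, y, features)))
    return members


def random_subspace_predict(members, x):
    """Combine the members by simple majority vote."""
    votes = Counter(predict_centroids(c, f, x) for f, c in members)
    return votes.most_common(1)[0][0]
```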
5.2.12. Other Algorithms
[0184] The pattern classification and statistical techniques
described above are merely examples of the types of models that can
be used to construct a model for classification. Moreover,
combinations of the techniques described above can be used. Some
combinations, such as the use of the combination of decision trees
and boosting, have been described. However, many other combinations
are possible. In addition, other techniques in the art, such as
Projection Pursuit and Weighted Voting, can be used to construct a
chemotherapy response classifier.
5.3. Methods of Determining Expression Levels of Chemotherapy
Response Genes
[0185] The invention also provides methods and compositions for
determining expression levels of CR genes, i.e., marker genes
listed in Table 1 and/or their encoded proteins. Such information
can be used to determine a treatment regimen for a patient. For
example, a patient whose level of expression of one or more CR
genes predicts that the patient is responsive to a chemotherapeutic
agent can be assigned a treatment regimen comprising the
chemotherapeutic agent. A patient whose level of expression of one
or more CR genes predicts that the patient is non-responsive to a
chemotherapeutic agent can either be assigned a treatment regimen
that does not comprise the chemotherapeutic agent, or assigned a
treatment regimen including a combination of the chemotherapeutic
agent and a therapy to regulate the expression levels of the gene
or genes. Thus, the invention provides methods and compositions for
assigning a treatment regimen for a cancer patient. The invention
also provides methods and compositions for monitoring treatment
progress for a cancer patient based on the expression levels of the
marker genes.
[0186] A variety of methods can be employed for the diagnostic and
prognostic evaluation of patients for their responsiveness to
chemotherapy. In one embodiment, measurements of the expression level
of one or more of the CR genes listed in Table 1, and/or the abundance
or activity level of the encoded proteins, are used.
[0187] In one embodiment, the method comprises determining an
expression level of a chemotherapy response gene listed in Table 1
in a sample of a patient, and determining whether the expression
level deviates (above or below) from a predetermined threshold
that separates responsive and non-responsive patients. In another
embodiment, the method comprises determining a level of abundance
of a protein encoded by a CR gene, in a sample from a patient, and
determining whether the level of abundance deviates from a
predetermined threshold that separates responsive and
non-responsive patients. In still another embodiment, the method
comprises determining a level of activity of a protein encoded by a
CR gene in a sample of a patient, and determining whether the level
of activity deviates from a predetermined threshold that
separates responsive and non-responsive patients. In the foregoing
embodiments, and the embodiments described below, the sample can be
an ex vivo cell sample, e.g., cells in a cell culture, or in vivo
cells.
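The threshold comparison in these embodiments amounts to a simple decision rule, sketched here for a single marker. The function names and the `responsive_side` parameter are hypothetical; which side of the threshold indicates responsiveness depends on the marker and is not specified by this sketch.

```python
def deviation_from_threshold(level, threshold):
    """Report whether a measured level is above, below, or at the threshold."""
    if level > threshold:
        return "above"
    if level < threshold:
        return "below"
    return "at"


def predict_responsive(level, threshold, responsive_side="above"):
    """Return True when the level falls on the side associated with response.

    responsive_side is an assumption of this sketch: "above" if high levels
    of the marker indicate responsiveness, "below" if low levels do.
    """
    if responsive_side not in ("above", "below"):
        raise ValueError("responsive_side must be 'above' or 'below'")
    return deviation_from_threshold(level, threshold) == responsive_side
```

The same rule applies whether the measured quantity is a transcript level, a protein abundance, or a protein activity.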
[0188] In a specific embodiment, the method comprises determining
an expression level of an interferon stimulated gene (ISG) listed
in Table 1 in the sample of a patient, and determining whether the
expression level is above a predetermined threshold. In another
embodiment, the method comprises determining a level of abundance
of a protein encoded by an ISG gene, and determining whether the
level is above a predetermined threshold.
[0189] Such methods may, for example, utilize reagents such as
nucleotide sequences and antibodies, e.g., the chemotherapy
response nucleotide sequences, and antibodies directed against
chemotherapy response proteins, including peptide fragments
thereof. Specifically, such reagents may be used, for example, for:
(1) the detection of the presence of mutations in a chemotherapy
response gene, or the detection of either over- or under-expression
of a chemotherapy response gene relative to the normal expression
level; and (2) the detection of either an over- or an
under-abundance of a chemotherapy response protein relative to the
threshold chemotherapy response protein level.
[0190] The methods described herein may be performed, for example,
by utilizing pre-packaged diagnostic kits comprising nucleic acid
of at least one specific chemotherapy response gene or an antibody
that binds a chemotherapy response protein, which may be
conveniently used, e.g., in clinical settings, to diagnose patients
exhibiting responsiveness or non-responsiveness to
chemotherapy.
[0191] Nucleic acid-based detection techniques and peptide
detection techniques are described in Section 5.3.2, infra. In
one embodiment, the expression levels of one or more marker genes
are measured using qRT-PCR.
5.3.1. Sample Collection
[0192] In the present invention, gene products, such as target
polynucleotide molecules or proteins, are extracted from a sample
taken from a cancer patient. The sample may be collected in any
clinically acceptable manner, but must be collected such that
marker-derived polynucleotides (i.e., RNA) are preserved (if gene
expression is to be measured) or proteins are preserved (if encoded
proteins are to be measured). In one embodiment, tumor samples are
used. In one embodiment, the pre-treatment tumor sample from a
patient is used. In another embodiment, the tumor sample from a
patient after and/or during treatment is used. In one embodiment,
the unsorted tumor sample from a patient is used. In another
embodiment, the sorted tumor sample from a patient is used. Other
suitable samples may comprise any clinically relevant tissue
sample, such as a tumor biopsy or fine needle aspirate, or a sample
of body fluid, such as blood, plasma, serum, lymph, ascitic fluid,
cystic fluid, or urine. The sample may be taken from a human, or,
in a veterinary context, from non-human animals such as ruminants,
horses, swine or sheep, or from domestic companion animals such as
felines and canines.
[0193] In a specific embodiment, mRNA or nucleic acids derived
therefrom (i.e., cDNA or amplified RNA or amplified DNA) are
preferably labeled distinguishably from polynucleotide molecules of
a reference sample, and both are simultaneously or independently
hybridized to a microarray comprising some or all of the markers or
marker sets or subsets described above. Alternatively, mRNA or
nucleic acids derived therefrom may be labeled with the same label
as the reference polynucleotide molecules, wherein the intensity of
hybridization of each at a particular probe is compared.
[0194] Methods for preparing total and poly(A)+ RNA are well known
and are described generally in Sambrook et al., MOLECULAR
CLONING--A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring
Harbor Laboratory, Cold Spring Harbor, N.Y. (1989) and Ausubel et
al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, vol. 2, Current
Protocols Publishing, New York (1994). Preferably, total RNA, or
total mRNA (poly(A)+ RNA) is measured in the methods of the
invention directly or indirectly (e.g., via measuring cDNA or
cRNA).
[0195] RNA may be isolated from eukaryotic cells by procedures that
involve lysis of the cells and denaturation of the proteins
contained therein. Cells of interest include wild-type cells (i.e.,
non-cancerous), drug-exposed wild-type cells, tumor- or
tumor-derived cells, modified cells, normal or tumor cell line
cells, and drug-exposed modified cells. Preferably, the cells are
breast cancer tumor cells.
[0196] Additional steps may be employed to remove DNA. Cell lysis
may be accomplished with a nonionic detergent, followed by
microcentrifugation to remove the nuclei and hence the bulk of the
cellular DNA. In one embodiment, RNA is extracted from cells of the
various types of interest using guanidinium thiocyanate lysis
followed by CsCl centrifugation to separate the RNA from DNA
(Chirgwin et al., Biochemistry 18:5294-5299 (1979)). Poly(A)+ RNA
is selected using oligo-dT cellulose (see Sambrook et
al., MOLECULAR CLONING--A LABORATORY MANUAL (2ND ED.), Vols. 1-3,
Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989)).
Alternatively, separation of RNA from DNA can be accomplished by
organic extraction, for example, with hot phenol or
phenol/chloroform/isoamyl alcohol.
[0197] If desired, RNase inhibitors may be added to the lysis
buffer. Likewise, for certain cell types, it may be desirable to
add a protein denaturation/digestion step to the protocol.
[0198] For many applications, it is desirable to preferentially
enrich mRNA with respect to other cellular RNAs, such as transfer
RNA (tRNA) and ribosomal RNA (rRNA). Most mRNAs contain a poly(A)
tail at their 3' end. This allows them to be enriched by affinity
chromatography, for example, using oligo(dT) or poly(U) coupled to
a solid support, such as cellulose or Sephadex.TM. (see Ausubel et
al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, vol. 2, Current
Protocols Publishing, New York (1994)). Once bound, poly(A)+ mRNA is
eluted from the affinity column using 2 mM EDTA/0.1% SDS.
[0199] In a specific embodiment, total RNA or total mRNA from cells
is used in the methods of the invention. The source of the RNA can
be cells of an animal, e.g., human, mammal, primate, non-human
animal, dog, cat, mouse, rat, bird, etc. In specific embodiments,
the method of the invention is used with a sample containing total
mRNA or total RNA from 1.times.10.sup.6 cells or less. In another
embodiment, proteins can be isolated from the foregoing sources, by
methods known in the art, for use in expression analysis at the
protein level.
[0200] Probes to the homologs of the marker sequences disclosed
herein can be employed preferably when non-human nucleic acid is
being assayed.
5.3.2. Determination of Abundance Levels of Gene Products
[0201] The abundance levels of the gene products of the genes in a
sample may be determined by any means known in the art. The levels
may be determined by isolating and determining the level (i.e.,
amount) of nucleic acid transcribed from each marker gene.
Alternatively, or additionally, the level of specific proteins
encoded by a marker gene may be determined.
[0202] The levels of transcripts of specific marker genes can be
determined by measuring the amount of mRNA, or polynucleotides
derived therefrom, present in a sample. Any method for determining
RNA levels can be used. For example, RNA is isolated from a sample
and separated on an agarose gel. The separated RNA is then
transferred to a solid support, such as a filter. Nucleic acid
probes representing one or more markers are then hybridized to the
filter by northern hybridization, and the amount of marker-derived
RNA is determined. Such determination can be visual, or
machine-aided, for example, by use of a densitometer. Another
method of determining RNA levels is by use of a dot-blot or a
slot-blot. In this method, RNA, or nucleic acid derived therefrom,
from a sample is labeled. The RNA or nucleic acid derived therefrom
is then hybridized to a filter containing oligonucleotides derived
from one or more marker genes, wherein the oligonucleotides are
placed upon the filter at discrete, easily-identifiable locations.
Hybridization, or lack thereof, of the labeled RNA to the
filter-bound oligonucleotides is determined visually or by
densitometer. Polynucleotides can be labeled using a radiolabel or
a fluorescent (i.e., visible) label.
[0203] The levels of transcripts of particular marker genes may
also be assessed by determining the level of the specific protein
expressed from the marker genes. This can be accomplished, for
example, by separation of proteins from a sample on a
polyacrylamide gel, followed by identification of specific
marker-derived proteins using antibodies in a western blot.
Alternatively, proteins can be separated by two-dimensional gel
electrophoresis systems. Two-dimensional gel electrophoresis is
well-known in the art and typically involves isoelectric focusing
along a first dimension followed by SDS-PAGE electrophoresis along
a second dimension. See, e.g., Hames et al., 1990, GEL
ELECTROPHORESIS OF PROTEINS: A PRACTICAL APPROACH, IRL Press, New
York; Shevchenko et al., Proc. Nat'l Acad. Sci. USA 93:1440-1445
(1996); Sagliocco et al., Yeast 12:1519-1533 (1996); Lander,
Science 274:536-539 (1996). The resulting electropherograms can be
analyzed by numerous techniques, including mass spectrometric
techniques, western blotting and immunoblot analysis using
polyclonal and monoclonal antibodies.
[0204] Alternatively, marker-derived protein levels can be
determined by constructing an antibody microarray in which binding
sites comprise immobilized, preferably monoclonal, antibodies
specific to a plurality of protein species encoded by the cell
genome. Preferably, antibodies are present for a substantial
fraction of the marker-derived proteins of interest. Methods for
making monoclonal antibodies are well known (see, e.g., Harlow and
Lane, 1988, ANTIBODIES: A LABORATORY MANUAL, Cold Spring Harbor,
N.Y., which is incorporated in its entirety for all purposes). In
one embodiment, monoclonal antibodies are raised against synthetic
peptide fragments designed based on genomic sequence of the cell.
With such an antibody array, proteins from the cell are contacted
to the array, and their binding is assayed with assays known in the
art. Generally, the expression, and the level of expression, of
proteins of diagnostic or prognostic interest can be detected
through immunohistochemical staining of tissue slices or
sections.
[0205] Finally, levels of transcripts of marker genes in a number
of tissue specimens may be characterized using a "tissue array"
(Kononen et al., Nat. Med 4(7):844-7 (1998)). In a tissue array,
multiple tissue samples are assessed on the same microarray. The
arrays allow in situ detection of RNA and protein levels;
consecutive sections allow the analysis of multiple samples
simultaneously.
5.3.2.1. Microarrays
[0206] In preferred embodiments, polynucleotide microarrays are
used to measure expression so that the expression status of each of
the markers above is assessed simultaneously.
Generally, microarrays according to the invention comprise a
plurality of markers informative for clinical category
determination, for a particular disease or condition.
[0207] The invention also provides a microarray comprising for each
of one or more genes listed in Table 1, one or more polynucleotide
probes complementary and hybridizable to a sequence in said gene,
wherein polynucleotide probes complementary and hybridizable to
said genes constitute at least X% of the probes on said microarray,
where X%=50%, 60%, 70%, 80%, 90%, 95%, or 98%. In a particular
embodiment, the invention provides such a microarray wherein the
one or more genes comprises all genes listed in Table 1. The
microarray can be in a sealed container.
[0208] The microarrays preferably comprise at least N, where N=2,
3, 4, 5, 7, 10, 15, 20, 25, 30, or 35, or all of the markers, or
any combination of markers listed in Table 1. The actual number of
informative markers the microarray comprises will vary depending
upon the particular condition of interest.
[0209] In other embodiments, the invention provides polynucleotide
arrays in which the chemotherapy response markers comprise at least
X% of the probes on the array, where X %=50%, 60%, 70%, 80%, 85%,
90%, 95% or 98%. In another specific embodiment, the microarray
comprises a plurality of probes, wherein said plurality of probes
comprise probes complementary and hybridizable to at least 75% of
the chemotherapy response markers.
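The two percentages used in these embodiments measure different things: the share of probes on the array that target marker genes, and the share of marker genes that have at least one probe on the array. A minimal sketch of both calculations follows; the function names and the gene identifiers in the usage note are placeholders, not entries from Table 1.

```python
def marker_probe_fraction(probe_targets, marker_genes):
    """Fraction of the probes on the array whose target is a marker gene.

    probe_targets: one target-gene identifier per probe on the array.
    """
    if not probe_targets:
        raise ValueError("the array has no probes")
    markers = set(marker_genes)
    hits = sum(1 for gene in probe_targets if gene in markers)
    return hits / len(probe_targets)


def marker_coverage(probe_targets, marker_genes):
    """Fraction of the marker genes represented by at least one probe."""
    markers = set(marker_genes)
    if not markers:
        raise ValueError("the marker list is empty")
    return len(markers & set(probe_targets)) / len(markers)
```

For example, an array with probes targeting GENE_A, GENE_A, GENE_B and CTRL_1, checked against a marker list of GENE_A through GENE_D, has three of its four probes directed to markers while covering two of the four markers.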
[0210] General methods pertaining to the construction of
microarrays comprising the marker sets and/or subsets above are
described in the following sections.
5.3.2.2. Construction of Microarrays
[0211] Microarrays are prepared by selecting probes which comprise
a polynucleotide sequence, and then immobilizing such probes to a
solid support or surface. For example, the probes may comprise DNA
sequences, RNA sequences, or copolymer sequences of DNA and RNA.
The polynucleotide sequences of the probes may also comprise DNA
and/or RNA analogues, or combinations thereof. For example, the
polynucleotide sequences of the probes may be full or partial
fragments of genomic DNA. The polynucleotide sequences of the
probes may also be synthesized nucleotide sequences, such as
synthetic oligonucleotide sequences. The probe sequences can be
synthesized either enzymatically in vivo, enzymatically in vitro
(e.g., by PCR), or non-enzymatically in vitro.
[0212] The probe or probes used in the methods of the invention are
preferably immobilized to a solid support which may be either
porous or non-porous. For example, the probes may be polynucleotide
sequences which are attached to a nitrocellulose or nylon membrane
or filter covalently at either the 3' or the 5' end of the
polynucleotide. Such hybridization probes are well known in the art
(see, e.g., Sambrook et al., MOLECULAR CLONING--A LABORATORY MANUAL
(2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring
Harbor, N.Y. (1989)). Alternatively, the solid support or surface
may be a glass or plastic surface. In a particularly preferred
embodiment, hybridization levels are measured to microarrays of
probes consisting of a solid phase on the surface of which are
immobilized a population of polynucleotides, such as a population
of DNA or DNA mimics, or, alternatively, a population of RNA or RNA
mimics. The solid phase may be a nonporous or, optionally, a porous
material such as a gel.
[0213] In preferred embodiments, a microarray comprises a support
or surface with an ordered array of binding (e.g., hybridization)
sites or "probes" each representing one of the markers described
herein. Preferably the microarrays are addressable arrays, and more
preferably positionally addressable arrays. More specifically, each
probe of the array is preferably located at a known, predetermined
position on the solid support such that the identity (i.e., the
sequence) of each probe can be determined from its position in the
array (i.e., on the support or surface). In preferred embodiments,
each probe is covalently attached to the solid support at a single
site.
[0214] Microarrays can be made in a number of ways, of which
several are described below. However produced, microarrays share
certain characteristics. The arrays are reproducible, allowing
multiple copies of a given array to be produced and easily compared
with each other. Preferably, microarrays are made from materials
that are stable under binding (e.g., nucleic acid hybridization)
conditions. The microarrays are preferably small, e.g., between 1
cm.sup.2 and 25 cm.sup.2, between 12 cm.sup.2 and 13 cm.sup.2, or 3
cm.sup.2. However, larger arrays are also contemplated and may be
preferable, e.g., for use in screening arrays. Preferably, a given
binding site or unique set of binding sites in the microarray will
specifically bind (e.g., hybridize) to the product of a single gene
in a cell (e.g., to a specific mRNA, or to a specific cDNA derived
therefrom). However, in general, other related or similar sequences
will cross hybridize to a given binding site.
[0215] The microarrays of the present invention include one or more
test probes, each of which has a polynucleotide sequence that is
complementary to a subsequence of RNA or DNA to be detected.
Preferably, the position of each probe on the solid surface is
known. Indeed, the microarrays are preferably positionally
addressable arrays. Specifically, each probe of the array is
preferably located at a known, predetermined position on the solid
support such that the identity (i.e., the sequence) of each probe
can be determined from its position on the array (i.e., on the
support or surface).
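Positional addressability means that a probe's identity can be looked up from its location alone. A minimal data-structure sketch, with a hypothetical class name and a row/column coordinate scheme assumed for illustration:

```python
class AddressableArray:
    """Grid of probes keyed by (row, column) position."""

    def __init__(self, n_rows, n_cols):
        self.n_rows = n_rows
        self.n_cols = n_cols
        self._probes = {}

    def place(self, row, col, marker, sequence):
        """Record which marker and probe sequence occupy a position."""
        if not (0 <= row < self.n_rows and 0 <= col < self.n_cols):
            raise IndexError("position is off the array")
        if (row, col) in self._probes:
            raise ValueError("position already holds a probe")
        self._probes[(row, col)] = (marker, sequence)

    def identity_at(self, row, col):
        """Return the (marker, sequence) pair found at a known position."""
        return self._probes[(row, col)]

    def positions_for(self, marker):
        """List every position whose probe represents the given marker."""
        return sorted(
            pos for pos, (name, _) in self._probes.items() if name == marker
        )
```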
[0216] According to the invention, the microarray is an array
(i.e., a matrix) in which each position represents one of the
markers described herein. For example, each position can contain a
DNA or DNA analogue based on genomic DNA to which a particular RNA
or cDNA transcribed from that genetic marker can specifically
hybridize. The DNA or DNA analogue can be, e.g., a synthetic
oligomer or a gene fragment. In one embodiment, probes representing
each of the markers are present on the array. In a preferred
embodiment, the array comprises probes for each of the markers
listed in Table 1.
5.3.2.3. Preparing Probes for Microarrays
[0217] As noted above, the "probe" to which a particular
polynucleotide molecule specifically hybridizes according to the
invention contains a complementary genomic polynucleotide sequence.
The probes of the microarray preferably consist of nucleotide
sequences of no more than 1,000 nucleotides. In some embodiments,
the probes of the array consist of nucleotide sequences of 10 to
1,000 nucleotides. In a preferred embodiment, the nucleotide
sequences of the probes are in the range of 10-200 nucleotides in
length and are genomic sequences of a species of organism, such
that a plurality of different probes is present, with sequences
complementary and thus capable of hybridizing to the genome of such
a species of organism, sequentially tiled across all or a portion
of such genome. In other specific embodiments, the probes are in
the range of 10-30 nucleotides in length, in the range of 10-40
nucleotides in length, in the range of 20-50 nucleotides in length,
in the range of 40-80 nucleotides in length, in the range of 50-150
nucleotides in length, in the range of 80-120 nucleotides in
length, and most preferably are 60 nucleotides in length.
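Sequential tiling of fixed-length probes across a sequence can be sketched as a sliding window. The function name and the `step` parameter are assumptions; the sketch returns windows of the input sequence and leaves the choice of strand (the sequence itself or its complement) to the caller.

```python
def tile_probes(sequence, probe_length=60, step=60):
    """Return (start, subsequence) pairs tiled across the sequence.

    step equal to probe_length gives end-to-end tiling; a smaller step
    gives overlapping probes. Windows that would run past the end of the
    sequence are not emitted.
    """
    if probe_length <= 0 or step <= 0:
        raise ValueError("probe_length and step must be positive")
    last_start = len(sequence) - probe_length
    return [
        (start, sequence[start:start + probe_length])
        for start in range(0, last_start + 1, step)
    ]
```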
[0218] The probes may comprise DNA or DNA "mimics" (e.g.,
derivatives and analogues) corresponding to a portion of an
organism's genome. In another embodiment, the probes of the
microarray are complementary RNA or RNA mimics. DNA mimics are
polymers composed of subunits capable of specific,
Watson-Crick-like hybridization with DNA, or of specific
hybridization with RNA. The nucleic acids can be modified at the
base moiety, at the sugar moiety, or at the phosphate backbone.
Exemplary DNA mimics include, e.g., phosphorothioates.
[0219] DNA can be obtained, e.g., by polymerase chain reaction
(PCR) amplification of genomic DNA or cloned sequences. PCR primers
are preferably chosen based on a known sequence of the genome that
will result in amplification of specific fragments of genomic DNA.
Computer programs that are well known in the art are useful in the
design of primers with the required specificity and optimal
amplification properties, such as Oligo version 5.0 (National
Biosciences). Typically each probe on the microarray will be
between 10 bases and 50,000 bases, usually between 300 bases and
1,000 bases in length. PCR methods are well known in the art, and
are described, for example, in Innis et al., eds., PCR PROTOCOLS: A
GUIDE TO METHODS AND APPLICATIONS, Academic Press Inc., San Diego,
Calif. (1990). It will be apparent to one skilled in the art that
controlled robotic systems are useful for isolating and amplifying
nucleic acids.
[0220] An alternative, preferred means for generating the
polynucleotide probes of the microarray is by synthesis of
synthetic polynucleotides or oligonucleotides, e.g., using
H-phosphonate or phosphoramidite chemistries (Froehler et al.,
Nucleic Acid Res. 14:5399-5407 (1986); McBride et al., Tetrahedron
Lett. 24:246-248 (1983)). Synthetic sequences are typically between
about 10 and about 500 bases in length, more typically between
about 20 and about 100 bases, and most preferably between about 40
and about 70 bases in length. In some embodiments, synthetic
nucleic acids include non-natural bases, such as, but by no means
limited to, inosine. As noted above, nucleic acid analogues may be
used as binding sites for hybridization. An example of a suitable
nucleic acid analogue is peptide nucleic acid (see, e.g., Egholm et
al., Nature 363:566-568 (1993); U.S. Pat. No. 5,539,083).
[0221] Probes are preferably selected using an algorithm that takes
into account binding energies, base composition, sequence
complexity, cross-hybridization binding energies, and secondary
structure. See Friend et al., International Patent Publication WO
01/05935, published Jan. 25, 2001; Hughes et al., Nat. Biotech.
19:342-7 (2001).
[0222] A skilled artisan will also appreciate that positive control
probes, e.g., probes known to be complementary and hybridizable to
sequences in the target polynucleotide molecules, and negative
control probes, e.g., probes known to not be complementary and
hybridizable to sequences in the target polynucleotide molecules,
should be included on the array. In one embodiment, positive
controls are synthesized along the perimeter of the array. In
another embodiment, positive controls are synthesized in diagonal
stripes across the array. In still another embodiment, the reverse
complement for each probe is synthesized next to the position of
the probe to serve as a negative control. In yet another
embodiment, sequences from other species of organism are used as
negative controls or as "spike-in" controls.
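The embodiment in which each probe is paired with its reverse complement as a negative control can be sketched as follows. The function names and the alternating layout are illustrative assumptions, and only unambiguous DNA bases (A, C, G, T) are handled.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def with_negative_controls(probes):
    """Place each probe next to its reverse complement in the layout."""
    layout = []
    for probe in probes:
        layout.append(("probe", probe))
        layout.append(("negative_control", reverse_complement(probe)))
    return layout
```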
5.3.2.4. Attaching Probes to the Solid Surface
[0223] The probes are attached to a solid support or surface, which
may be made, e.g., from glass, plastic (e.g., polypropylene,
nylon), polyacrylamide, nitrocellulose, gel, or other porous or
nonporous material. A preferred method for attaching the nucleic
acids to a surface is by printing on glass plates, as is described
generally by Schena et al., Science 270:467-470 (1995). This method
is especially useful for preparing microarrays of cDNA (see also
DeRisi et al., Nature Genetics 14:457-460 (1996); Shalon et al.,
Genome Res. 6:639-645 (1996); and Schena et al., Proc. Natl. Acad.
Sci. U.S.A. 93:10539-11286 (1995)).
[0224] A second preferred method for making microarrays is by
making high-density oligonucleotide arrays. Techniques are known
for producing arrays containing thousands of oligonucleotides
complementary to defined sequences, at defined locations on a
surface using photolithographic techniques for synthesis in situ
(see, Fodor et al., 1991, Science 251:767-773; Pease et al., 1994,
Proc. Natl. Acad. Sci. U.S.A. 91:5022-5026; Lockhart et al., 1996,
Nature Biotechnology 14:1675; U.S. Pat. Nos. 5,578,832; 5,556,752;
and 5,510,270) or other methods for rapid synthesis and deposition
of defined oligonucleotides (Blanchard et al., Biosensors &
Bioelectronics 11:687-690). When these methods are used,
oligonucleotides (e.g., 60-mers) of known sequence are synthesized
directly on a surface such as a derivatized glass slide. Usually,
the array produced is redundant, with several oligonucleotide
molecules per RNA.
[0225] Other methods for making microarrays, e.g., by masking
(Maskos and Southern, 1992, Nuc. Acids. Res. 20:1679-1684), may
also be used. In principle, and as noted supra, any type of array,
for example, dot blots on a nylon hybridization membrane (see
Sambrook et al., MOLECULAR CLONING--A LABORATORY MANUAL (2ND ED.),
Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
(1989)) could be used. However, as will be recognized by those
skilled in the art, very small arrays will frequently be preferred
because hybridization volumes will be smaller.
[0226] In one embodiment, the arrays of the present invention are
prepared by synthesizing polynucleotide probes on a support. In
such an embodiment, polynucleotide probes are attached to the
support covalently at either the 3' or the 5' end of the
polynucleotide.
[0227] In a particularly preferred embodiment, microarrays are
manufactured by means of an ink jet printing device for
oligonucleotide synthesis, e.g., using the methods and systems
described by Blanchard in U.S. Pat. No. 6,028,189; Blanchard et
al., 1996, Biosensors and Bioelectronics 11:687-690; Blanchard,
1998, in Synthetic DNA Arrays in Genetic Engineering, Vol. 20, J.K.
Setlow, Ed., Plenum Press, New York at pages 111-123. Specifically,
the oligonucleotide probes in such microarrays are preferably
synthesized in arrays, e.g., on a glass slide, by serially
depositing individual nucleotide bases in "microdroplets" of a high
surface tension solvent such as propylene carbonate. The
microdroplets have small volumes (e.g., 100 pL or less, more
preferably 50 pL or less) and are separated from each other on the
microarray (e.g., by hydrophobic domains) to form circular surface
tension wells which define the locations of the array elements
(i.e., the different probes). Microarrays manufactured by this
ink-jet method are typically of high density, preferably having a
density of at least about 2,500 different probes per 1 cm.sup.2.
The polynucleotide probes are attached to the support covalently at
either the 3' or the 5' end of the polynucleotide.
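As a rough arithmetic illustration of the density figure given above, the following sketch converts a probe density into a center-to-center spacing. It assumes, purely for illustration, that the probes sit on a regular square grid; the function name is hypothetical and is not part of the described method.

```python
import math

def grid_pitch_um(probes_per_cm2: float) -> float:
    """Center-to-center spacing in micrometers for probes on a square grid."""
    if probes_per_cm2 <= 0:
        raise ValueError("density must be positive")
    # 1 cm = 10,000 micrometers; a square grid holds sqrt(density) probes per cm.
    return 10_000.0 / math.sqrt(probes_per_cm2)

print(grid_pitch_um(2500))  # 200.0 micrometers between probe centers
```

Under this assumption, a density of 2,500 probes per 1 cm.sup.2 corresponds to probe centers about 200 micrometers apart; higher densities shrink the spacing with the square root of the density.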
5.3.2.5. Target Labeling and Hybridization to Microarrays
[0228] The polynucleotide molecules which may be analyzed by the
present invention (the "target polynucleotide molecules") may be
from any clinically relevant source, but are expressed RNA or a
nucleic acid derived therefrom (e.g., cDNA or amplified RNA derived
from cDNA that incorporates an RNA polymerase promoter), including
naturally occurring nucleic acid molecules, as well as synthetic
nucleic acid molecules. In one embodiment, the target
polynucleotide molecules comprise RNA, including, but by no means
limited to, total cellular RNA, poly(A).sup.+ messenger RNA (mRNA)
or fraction thereof, cytoplasmic mRNA, or RNA transcribed from cDNA
(i.e., cRNA; see, e.g., Linsley & Schelter, U.S. patent
application Ser. No. 09/411,074, filed Oct. 4, 1999, or U.S. Pat.
Nos. 5,545,522, 5,891,636, or 5,716,785). Methods for preparing
total and poly(A).sup.+ RNA are well known in the art, and are
described generally, e.g., in Sambrook et al., MOLECULAR CLONING--A
LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor
Laboratory, Cold Spring Harbor, N.Y. (1989). In one embodiment, RNA
is extracted from cells of the various types of interest in this
invention using guanidinium thiocyanate lysis followed by CsCl
centrifugation (Chirgwin et al., 1979, Biochemistry 18:5294-5299).
In another embodiment, total RNA is extracted using a silica
gel-based column, commercially available examples of which include
RNeasy (Qiagen, Valencia, Calif.) and StrataPrep (Stratagene, La
Jolla, Calif.). In an alternative embodiment, which is preferred
for S. cerevisiae, RNA is extracted from cells using phenol and
chloroform, as described in Ausubel et al., eds., 1989, CURRENT
PROTOCOLS IN MOLECULAR BIOLOGY, Vol. III, Green Publishing
Associates, Inc., John Wiley & Sons, Inc., New York, at pp.
13.12.1-13.12.5. Poly(A).sup.+ RNA can be selected, e.g., by
selection with oligo-dT cellulose or, alternatively, by oligo-dT
primed reverse transcription of total cellular RNA. In one
embodiment, RNA can be fragmented by methods known in the art,
e.g., by incubation with ZnCl.sub.2, to generate fragments of RNA.
In another embodiment, the polynucleotide molecules analyzed by the
invention comprise cDNA, or PCR products of amplified RNA or
cDNA.
[0229] In one embodiment, total RNA, mRNA, or nucleic acids derived
therefrom, is isolated from a sample taken from a cancer patient.
Target polynucleotide molecules that are poorly expressed in
particular cells may be enriched using normalization techniques
(Bonaldo et al., 1996, Genome Res. 6:791-806).
[0230] As described above, the target polynucleotides are
detectably labeled at one or more nucleotides. Any method known in
the art may be used to detectably label the target polynucleotides.
Preferably, this labeling incorporates the label uniformly along
the length of the RNA, and more preferably, the labeling is carried
out at a high degree of efficiency. One embodiment for this
labeling uses oligo-dT primed reverse transcription to incorporate
the label; however, conventional implementations of this method are
biased toward generating 3' end fragments. Thus, in a preferred
embodiment, random primers (e.g., 9-mers) are used in reverse
transcription to uniformly incorporate labeled nucleotides over the
full length of the target polynucleotides. Alternatively, random
primers may be used in conjunction with PCR methods or T7
promoter-based in vitro transcription methods in order to amplify
the target polynucleotides.
[0231] In a preferred embodiment, the detectable label is a
luminescent label. For example, fluorescent labels, bioluminescent
labels, chemiluminescent labels, and colorimetric labels may be
used in the present invention. In a highly preferred embodiment,
the label is a fluorescent label, such as a fluorescein, a
phosphor, a rhodamine, or a polymethine dye derivative. Examples of
commercially available fluorescent labels include, for example,
fluorescent phosphoramidites such as FluorePrime (Amersham
Pharmacia, Piscataway, N.J.), Fluoredite (Millipore, Bedford,
Mass.), FAM (ABI, Foster City, Calif.), and Cy3 or Cy5 (Amersham
Pharmacia, Piscataway, N.J.). In another embodiment, the detectable
label is a radiolabeled nucleotide.
[0232] In a further preferred embodiment, target polynucleotide
molecules from a patient sample are labeled differentially from
target polynucleotide molecules of a reference sample. The
reference can comprise target polynucleotide molecules from normal
cell samples (i.e., cell sample, e.g., of cells not afflicted with
cancer) or from cell samples, e.g., tumor cells from cancer
patients.
[0233] Nucleic acid hybridization and wash conditions are chosen so
that the target polynucleotide molecules specifically bind or
specifically hybridize to the complementary polynucleotide
sequences of the array, preferably to a specific array site,
wherein its complementary DNA is located.
[0234] Arrays containing double-stranded probe DNA situated thereon
are preferably subjected to denaturing conditions to render the DNA
single-stranded prior to contacting with the target polynucleotide
molecules. Arrays containing single-stranded probe DNA (e.g.,
synthetic oligodeoxyribonucleic acids) may need to be denatured
prior to contacting with the target polynucleotide molecules, e.g.,
to remove hairpins or dimers which form due to self complementary
sequences.
[0235] Optimal hybridization conditions will depend on the length
(e.g., oligomer versus polynucleotide greater than 200 bases) and
type (e.g., RNA, or DNA) of probe and target nucleic acids. One of
skill in the art will appreciate that as the oligonucleotides
become shorter, it may become necessary to adjust their length to
achieve a relatively uniform melting temperature for satisfactory
hybridization results. General parameters for specific (i.e.,
stringent) hybridization conditions for nucleic acids are described
in Sambrook et al., MOLECULAR CLONING--A LABORATORY MANUAL (2ND
ED.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor,
N.Y. (1989), and in Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR
BIOLOGY, vol. 2, Current Protocols Publishing, New York (1994).
Typical hybridization conditions for the cDNA microarrays of Schena
et al. are hybridization in 5.times.SSC plus 0.2% SDS at 65.degree.
C. for four hours, followed by washes at 25.degree. C. in low
stringency wash buffer (1.times.SSC plus 0.2% SDS), followed by 10
minutes at 25.degree. C. in higher stringency wash buffer
(0.1.times.SSC plus 0.2% SDS) (Schena et al., Proc. Natl. Acad.
Sci. U.S.A. 93:10614 (1993)). Useful hybridization conditions are
also provided in, e.g., Tijssen, 1993, HYBRIDIZATION WITH NUCLEIC
ACID PROBES, Elsevier Science Publishers B. V.; and Kricka, 1992,
NONISOTOPIC DNA PROBE TECHNIQUES, Academic Press, San Diego,
Calif.
[0236] Particularly preferred hybridization conditions include
hybridization at a temperature at or near the mean melting
temperature of the probes (e.g., within 5.degree. C., more
preferably within 2.degree. C.) in 1 M NaCl, 50 mM MES buffer (pH
6.5), 0.5% sodium sarcosine and 30% formamide.
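The notion of a mean melting temperature across a probe set can be sketched as follows. This is a minimal illustration only: it uses a widely quoted textbook approximation (Tm = 81.5 + 16.6 log10[Na+] + 0.41 (%GC) - 675/N), which is not taken from this specification and which ignores formamide, sarcosine, and nearest-neighbor effects. The function names, the probe names, and the 10 degree margin in the usage lines are hypothetical.

```python
import math

def estimate_tm(seq: str, na_molar: float = 1.0) -> float:
    """Approximate melting temperature (deg C) from GC content and length."""
    seq = seq.upper()
    n = len(seq)
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / n

def probes_outside_window(probes: dict, window_c: float = 5.0):
    """Return (mean Tm, names of probes whose Tm differs from the mean by more than window_c)."""
    tms = {name: estimate_tm(seq) for name, seq in probes.items()}
    mean_tm = sum(tms.values()) / len(tms)
    outliers = [name for name, tm in tms.items() if abs(tm - mean_tm) > window_c]
    return mean_tm, outliers

# Hypothetical 60-mer probes used only to exercise the functions.
example = {"gc_rich": "GC" * 30, "p1": "ATGC" * 15, "p2": "ATGC" * 15, "p3": "ATGC" * 15}
mean_tm, outliers = probes_outside_window(example, window_c=10.0)
print(round(mean_tm, 2), outliers)
```

Probes flagged in this way would be candidates for length adjustment so that the set approaches a uniform melting temperature, and the hybridization temperature could then be chosen near the mean.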
5.3.2.6. Signal Detection and Data Analysis
[0237] When fluorescently labeled gene products are used, the
fluorescence emissions at each site of a microarray may be,
preferably, detected by scanning confocal laser microscopy. In one
embodiment, a separate scan, using the appropriate excitation line,
is carried out for each of the two fluorophores used.
Alternatively, a laser may be used that allows simultaneous
specimen illumination at wavelengths specific to the two
fluorophores and emissions from the two fluorophores can be
analyzed simultaneously (see Shalon et al., 1996, "A DNA microarray
system for analyzing complex DNA samples using two-color
fluorescent probe hybridization," Genome Research 6:639-645, which
is incorporated by reference in its entirety for all purposes). In
a preferred embodiment, the arrays are scanned with a laser
fluorescent scanner with a computer controlled X-Y stage and a
microscope objective. Sequential excitation of the two
fluorophores is achieved with a multi-line, mixed gas laser and the
emitted light is split by wavelength and detected with two
photomultiplier tubes. Fluorescence laser scanning devices are
described in Schena et al., Genome Res. 6:639-645 (1996), and in
other references cited herein. Alternatively, the fiber-optic
bundle described by Ferguson et al., Nature Biotech. 14:1681-1684
(1996), may be used to monitor mRNA abundance levels at a large
number of sites simultaneously.
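Where a patient sample and a reference sample are labeled with two different fluorophores, the two scanned intensities at each array site are commonly summarized as a log ratio. The following is a minimal sketch under stated assumptions: background subtraction by a single local background value per channel and a small positive floor to avoid taking the logarithm of zero or a negative number are illustrative choices, not steps recited in this specification, and all names are hypothetical.

```python
import math

def log2_ratio(sample_signal: float, reference_signal: float,
               sample_bg: float = 0.0, reference_bg: float = 0.0,
               floor: float = 1.0) -> float:
    """Log2 of background-subtracted sample intensity over reference intensity."""
    # Clamp each channel at a small positive floor so the logarithm is defined.
    sample = max(sample_signal - sample_bg, floor)
    reference = max(reference_signal - reference_bg, floor)
    return math.log2(sample / reference)

# Hypothetical raw intensities for one array site.
print(log2_ratio(1100.0, 350.0, sample_bg=100.0, reference_bg=100.0))  # 2.0
```

A value of 2.0 here means the sample channel is four times brighter than the reference channel at that site after background subtraction.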
5.3.2.7. Other Assays for Detecting and Quantifying RNA
[0238] In addition to microarrays such as those described above, any
technique known to one of skill for detecting and measuring RNA can
be used in accordance with the methods of the invention.
Non-limiting examples of techniques include Northern blotting,
nuclease protection assays, RNA fingerprinting, polymerase chain
reaction, ligase chain reaction, Qbeta replicase, isothermal
amplification method, strand displacement amplification,
transcription based amplification systems, nuclease protection (S1
nuclease or RNAse protection assays), SAGE as well as methods
disclosed in International Publication Nos. WO 88/10315 and WO
89/06700, and International Applications Nos. PCT/US87/00880 and
PCT/US89/01025.
[0239] A standard Northern blot assay can be used to ascertain an
RNA transcript size, identify alternatively spliced RNA
transcripts, and determine the relative amounts of mRNA in a sample, in
accordance with conventional Northern hybridization techniques
known to those persons of ordinary skill in the art. In Northern
blots, RNA samples are first separated by size via electrophoresis
in an agarose gel under denaturing conditions. The RNA is then
transferred to a membrane, cross-linked and hybridized with a
labeled probe. Nonisotopic or high specific activity radio-labeled
probes can be used including random-primed, nick-translated, or
PCR-generated DNA probes, in vitro transcribed RNA probes, and
oligonucleotides. Additionally, sequences with only partial
homology (e.g., cDNA from a different species or genomic DNA
fragments that might contain an exon) may be used as probes. The
labeled probe, e.g., a radio-labeled cDNA, either containing the
full-length, single stranded DNA or a fragment of that DNA sequence
may be at least 20, at least 30, at least 50, or at least 100
consecutive nucleotides in length. The probe can be labeled by any
of the many different methods known to those skilled in this art.
The labels most commonly employed for these studies are radioactive
elements, enzymes, chemicals that fluoresce when exposed to
ultraviolet light, and others. A number of fluorescent materials
are known and can be utilized as labels. These include, but are not
limited to, fluorescein, rhodamine, auramine, Texas Red, AMCA blue
and Lucifer Yellow. A particular detecting material is anti-rabbit
antibody prepared in goats and conjugated with fluorescein through
an isothiocyanate. Proteins can also be labeled with a radioactive
element or with an enzyme. The radioactive label can be detected by
any of the currently available counting procedures. Non-limiting
examples of isotopes include .sup.3H, .sup.14C, .sup.32P, .sup.35S,
.sup.36Cl, .sup.51Cr, .sup.57Co, .sup.58Co, .sup.59Fe, .sup.90Y,
.sup.125I, .sup.131I, and .sup.186Re. Enzyme labels are likewise
useful, and can be detected by any of the presently utilized
colorimetric, spectrophotometric, fluorospectrophotometric,
amperometric or gasometric techniques. The enzyme is conjugated to
the selected particle by reaction with bridging molecules such as
carbodiimides, diisocyanates, glutaraldehyde and the like. Any
enzymes known to one of skill in the art can be utilized. Examples
of such enzymes include, but are not limited to, peroxidase,
beta-D-galactosidase, urease, glucose oxidase plus peroxidase and
alkaline phosphatase. U.S. Pat. Nos. 3,654,090, 3,850,752, and
4,016,043 are referred to by way of example for their disclosure of
alternate labeling material and methods.
[0240] Nuclease protection assays (including both ribonuclease
protection assays and S1 nuclease assays) can be used to detect and
quantify specific mRNAs. In nuclease protection assays, an
antisense probe (e.g., radio-labeled or nonisotopically labeled)
hybridizes in solution to an RNA sample. Following hybridization,
single-stranded, unhybridized probe and RNA are degraded by
nucleases. An acrylamide gel is used to separate the remaining
protected fragments. Typically, solution hybridization is more
efficient than membrane-based hybridization, and it can accommodate
up to 100 .mu.g of sample RNA, compared with the 20-30 .mu.g
maximum of blot hybridizations.
[0241] The ribonuclease protection assay, which is the most common
type of nuclease protection assay, requires the use of RNA probes.
Oligonucleotides and other single-stranded DNA probes can only be
used in assays containing S1 nuclease. The single-stranded,
antisense probe must typically be completely homologous to target
RNA to prevent cleavage of the probe:target hybrid by nuclease.
[0242] Serial Analysis Gene Expression (SAGE), which is described
in e.g., Velculescu et al., 1995, Science 270:484-7; Carulli, et
al., 1998, Journal of Cellular Biochemistry Supplements
30/31:286-96, can also be used to determine RNA abundances in a
cell sample.
[0243] Quantitative reverse transcriptase PCR (qRT-PCR) can also be
used to determine the expression profiles of marker genes (see,
e.g., U.S. Patent Application Publication No. 2005/0048542A1). The
first step in gene expression profiling by RT-PCR is the reverse
transcription of the RNA template into cDNA, followed by its
exponential amplification in a PCR reaction. The two most commonly
used reverse transcriptases are avian myeloblastosis virus reverse
transcriptase (AMV-RT) and Moloney murine leukemia virus reverse
transcriptase (MLV-RT). The reverse transcription step is typically
primed using specific primers, random hexamers, or oligo-dT
primers, depending on the circumstances and the goal of expression
profiling. For example, extracted RNA can be reverse-transcribed
using a GeneAmp RNA PCR kit (Perkin Elmer, Calif., USA), following
the manufacturer's instructions. The derived cDNA can then be used
as a template in the subsequent PCR reaction.
[0244] Although the PCR step can use a variety of thermostable
DNA-dependent DNA polymerases, it typically employs the Taq DNA
polymerase, which has a 5'-3' nuclease activity but lacks a 3'-5'
proofreading exonuclease activity. Thus, TaqMan.RTM. PCR typically
utilizes the 5'-nuclease activity of Taq or Tth polymerase to
hydrolyze a hybridization probe bound to its target amplicon, but
any enzyme with equivalent 5' nuclease activity can be used. Two
oligonucleotide primers are used to generate an amplicon typical of
a PCR reaction. A third oligonucleotide, or probe, is designed to
detect nucleotide sequence located between the two PCR primers. The
probe is non-extendible by Taq DNA polymerase enzyme, and is
labeled with a reporter fluorescent dye and a quencher fluorescent
dye. Any laser-induced emission from the reporter dye is quenched
by the quenching dye when the two dyes are located close together
as they are on the probe. During the amplification reaction, the
Taq DNA polymerase enzyme cleaves the probe in a template-dependent
manner. The resultant probe fragments disassociate in solution, and
signal from the released reporter dye is free from the quenching
effect of the second fluorophore. One molecule of reporter dye is
liberated for each new molecule synthesized, and detection of the
unquenched reporter dye provides the basis for quantitative
interpretation of the data.
[0245] TaqMan.RTM. RT-PCR can be performed using commercially
available equipment, such as, for example, ABI PRISM 7700.TM.
Sequence Detection System.TM. (Perkin-Elmer-Applied Biosystems,
Foster City, Calif., USA), or Lightcycler (Roche Molecular
Biochemicals, Mannheim, Germany). In a preferred embodiment, the 5'
nuclease procedure is run on a real-time quantitative PCR device
such as the ABI PRISM 7700.TM. Sequence Detection System.TM..
The system consists of a thermocycler, laser, charge-coupled device
(CCD) camera, and computer. The system includes software for
running the instrument and for analyzing the data.
[0246] 5'-Nuclease assay data are initially expressed as Ct, or the
threshold cycle. Fluorescence values are recorded during every
cycle and represent the amount of product amplified to that point
in the amplification reaction. The point when the fluorescent
signal is first recorded as statistically significant is the
threshold cycle (Ct).
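The determination of a threshold cycle from per-cycle fluorescence readings can be sketched as follows. This is an illustration under assumptions that are not recited in this specification: the threshold is taken as the mean of an early baseline window plus a multiple of its standard deviation, the baseline window (cycles 3 to 15) and the multiplier are arbitrary defaults, and the crossing point is linearly interpolated between cycles. All function names are hypothetical.

```python
import statistics

def crossing_cycle(fluorescence, threshold, first_index=1):
    """Fractional cycle (1-indexed) where the signal first rises above threshold, or None."""
    for i in range(first_index, len(fluorescence)):
        prev, cur = fluorescence[i - 1], fluorescence[i]  # cycles i and i + 1
        if cur > threshold >= prev:
            # Linear interpolation between cycle i and cycle i + 1.
            return i + (threshold - prev) / (cur - prev)
    return None

def threshold_cycle(fluorescence, baseline_cycles=(3, 15), n_sd=10.0):
    """Estimate Ct using a threshold of baseline mean + n_sd * baseline SD."""
    start, end = baseline_cycles
    baseline = fluorescence[start - 1:end]  # cycles are 1-indexed
    threshold = statistics.mean(baseline) + n_sd * statistics.stdev(baseline)
    # Only look for a crossing after the baseline window has ended.
    return crossing_cycle(fluorescence, threshold, first_index=end)

# Hypothetical readings: flat baseline for 16 cycles, then doubling.
readings = [1.0] * 16 + [2.0, 4.0, 8.0]
print(threshold_cycle(readings))  # 16.0
```

Instrument software typically applies its own baseline and threshold settings, so values from a sketch like this would not be expected to match a given instrument exactly.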
[0247] To minimize errors and the effect of sample-to-sample
variation, RT-PCR is usually performed using an internal standard.
The ideal internal standard is expressed at a constant level among
different tissues, and is unaffected by the experimental treatment.
RNAs most frequently used to normalize patterns of gene expression
are mRNAs for the housekeeping genes
glyceraldehyde-3-phosphate-dehydrogenase (GAPDH) and
.beta.-actin.
[0248] A more recent variation of the RT-PCR technique is the real
time quantitative PCR, which measures PCR product accumulation
through a dual-labeled fluorogenic probe (i.e., TaqMan.RTM. probe).
Real time PCR is compatible both with quantitative competitive PCR,
where an internal competitor for each target sequence is used for
normalization, and with quantitative comparative PCR using a
normalization gene contained within the sample, or a housekeeping
gene for RT-PCR. For further details see, e.g., Heid et al., Genome
Research 6:986-994 (1996).
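Normalization of a marker gene against an internal standard such as GAPDH or .beta.-actin is often carried out with the comparative Ct calculation sketched below. This is an illustration only, not a method recited in this specification: it assumes that the target and the internal standard amplify with the same efficiency (a factor of 2 per cycle by default), and the function and parameter names are hypothetical.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float,
                        efficiency: float = 2.0) -> float:
    """Fold change of a target in a sample relative to a control, normalized to a reference gene."""
    # Delta Ct: target Ct minus internal-standard Ct within each specimen.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Delta-delta Ct: a lower Ct means more starting template, hence the negative exponent.
    return efficiency ** -(delta_sample - delta_control)

# Hypothetical Ct values: the target crosses threshold two cycles earlier in the sample.
print(relative_expression(24.0, 18.0, 26.0, 18.0))  # 4.0
```

If the amplification efficiencies of the target and the internal standard differ appreciably, a standard-curve approach would be more appropriate than this simplified calculation.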
5.3.2.8. Detection and Quantification of Protein
[0249] Measurement of the translational state may be performed
according to several methods. For example, whole genome monitoring
of protein (e.g., the "proteome") can be carried out by
constructing a microarray in which binding sites comprise
immobilized, preferably monoclonal, antibodies specific to a
plurality of protein species encoded by the cell genome.
Preferably, antibodies are present for a substantial fraction of
the encoded proteins, or at least for those proteins relevant to
the action of a drug of interest. Methods for making monoclonal
antibodies are well known (see, e.g., Harlow and Lane, 1988,
Antibodies: A Laboratory Manual, Cold Spring Harbor, N.Y., which is
incorporated in its entirety for all purposes). In one embodiment,
monoclonal antibodies are raised against synthetic peptide
fragments designed based on genomic sequence of the cell. With such
an antibody array, proteins from the cell are contacted to the
array and their binding is assayed with assays known in the
art.
[0250] Immunoassays known to one of skill in the art can be used to
detect and quantify protein levels. For example, ELISAs can be used
to detect and quantify protein levels. ELISAs comprise preparing
antigen, coating the well of a 96 well microtiter plate with the
antigen, adding the antibody of interest conjugated to a detectable
compound such as an enzymatic substrate (e.g., horseradish
peroxidase or alkaline phosphatase) to the well and incubating for
a period of time, and detecting the presence of the antigen. In
ELISAs the antibody of interest does not have to be conjugated to a
detectable compound; instead, a second antibody (which recognizes
the antibody of interest) conjugated to a detectable compound may
be added to the well. Further, instead of coating the well with the
antigen, the antibody may be coated to the well. In this case, a
second antibody conjugated to a detectable compound may be added
following the addition of the antigen of interest to the coated
well. One of skill in the art would be knowledgeable as to the
parameters that can be modified to increase the signal detected as
well as other variations of ELISAs known in the art. In a preferred
embodiment, an ELISA may be performed by coating a high binding
96-well microtiter plate (Costar) with 2 .mu.g/ml of rhu-IL-9 in
PBS overnight. Following three washes with PBS, the plate is
incubated with three-fold serial dilutions of Fab at 25.degree. C.
for 1 hour. Following another three washes of PBS, 1 .mu.g/ml
anti-human kappa-alkaline phosphatase-conjugate is added and the
plate is incubated for 1 hour at 25.degree. C. Following three
washes with PBST, the alkaline phosphatase activity is determined
in 50 .mu.l AMP/PPMP substrate. The reactions are stopped and the
absorbance at 560 nm is determined with a VMAX microplate reader.
For further discussion regarding ELISAs see, e.g., Ausubel et al,
eds, 1994, Current Protocols in Molecular Biology, Vol. 1, John
Wiley & Sons, Inc., New York at 11.2.1.
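The three-fold serial dilutions mentioned above can be tabulated with a short calculation. The starting concentration and the number of dilution steps are not given in this specification, so the values used below are hypothetical placeholders, as is the function name.

```python
def serial_dilution(start_conc: float, factor: float = 3.0, steps: int = 8) -> list:
    """Concentrations for a serial dilution, beginning with the undiluted stock."""
    if factor <= 1.0:
        raise ValueError("dilution factor must be greater than 1")
    return [start_conc / factor ** i for i in range(steps)]

# Hypothetical starting concentration of 81 (arbitrary units), four wells.
print(serial_dilution(81.0, factor=3.0, steps=4))  # [81.0, 27.0, 9.0, 3.0]
```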
[0251] Protein levels may be determined by Western blot analysis.
Further, protein levels as well as the phosphorylation of proteins
can be determined by immunoprecipitation followed by Western blot
analysis. Immunoprecipitation protocols generally comprise lysing a
population of cells in a lysis buffer such as RIPA buffer (1% NP-40
or Triton X-100, 1% sodium deoxycholate, 0.1% SDS, 0.15 M NaCl,
0.01 M sodium phosphate at pH 7.2, 1% Trasylol) supplemented with
protein phosphatase and/or protease inhibitors (e.g., EDTA, PMSF,
aprotinin, sodium vanadate), adding the antibody of interest to the
cell lysate, incubating for a period of time (e.g., 1 to 4 hours)
at 4.degree. C., adding protein A and/or protein G sepharose beads
to the cell lysate, incubating for about an hour or more at
4.degree. C., washing the beads in lysis buffer and resuspending the
beads in SDS/sample buffer. The ability of the antibody of interest
to immunoprecipitate a particular antigen can be assessed by, e.g.,
western blot analysis. One of skill in the art would be
knowledgeable as to the parameters that can be modified to increase
the binding of the antibody to an antigen and decrease the
background (e.g., pre-clearing the cell lysate with sepharose
beads). For further discussion regarding immunoprecipitation
protocols see, e.g., Ausubel et al, eds, 1994, Current Protocols in
Molecular Biology, Vol. 1, John Wiley & Sons, Inc., New York at
10.16.1.
[0252] Western blot analysis generally comprises preparing protein
samples, electrophoresis of the protein samples in a polyacrylamide
gel (e.g., 8%-20% SDS-PAGE depending on the molecular weight of
the antigen), transferring the protein sample from the
polyacrylamide gel to a membrane such as nitrocellulose, PVDF or
nylon, incubating the membrane in blocking solution (e.g., PBS with
3% BSA or non-fat milk), washing the membrane in washing buffer
(e.g., PBS-Tween 20), incubating the membrane with primary antibody
(the antibody of interest) diluted in blocking buffer, washing the
membrane in washing buffer, incubating the membrane with a
secondary antibody (which recognizes the primary antibody, e.g., an
anti-human antibody) conjugated to an enzymatic substrate (e.g.,
horseradish peroxidase or alkaline phosphatase) or radioactive
molecule (e.g., .sup.32P or .sup.125I) diluted in blocking buffer,
washing the membrane in wash buffer, and detecting the presence of
the antigen. One of skill in the art would be knowledgeable as to
the parameters that can be modified to increase the signal detected
and to reduce the background noise. For further discussion
regarding western blot protocols see, e.g., Ausubel et al, eds,
1994, Current Protocols in Molecular Biology, Vol. 1, John Wiley
& Sons, Inc., New York at 10.8.1.
[0253] Proteins can also be separated, and their expression levels
determined, by two-dimensional gel electrophoresis systems.
Two-dimensional gel
electrophoresis is well-known in the art and typically involves
iso-electric focusing along a first dimension followed by SDS-PAGE
electrophoresis along a second dimension. See, e.g., Hames et al.,
1990, Gel Electrophoresis of Proteins: A Practical Approach, IRL
Press, New York; Shevchenko et al., 1996, Proc. Natl. Acad. Sci.
USA 93:1440-1445; Sagliocco et al., 1996, Yeast 12:1519-1533;
Lander, 1996, Science 274:536-539. The resulting electropherograms
can be analyzed by numerous techniques, including mass
spectrometric techniques, Western blotting and immunoblot analysis
using polyclonal and monoclonal antibodies, and internal and
N-terminal micro-sequencing.
5.4. Treating Cancer by Modulating Expression and/or Activity of
Chemotherapy Response Genes and/or their Products
[0254] The invention provides methods and compositions for
utilizing chemotherapy response genes listed in Table 1 in treating
cancer. The methods and compositions are used for treating
a non-responsive cancer patient by modulating the expression and/or
activity of such genes and/or the encoded proteins in combination
with a chemotherapy. The compositions (e.g., agents that modulate
expression and/or activity of the CR gene or gene product) of the
invention are preferably purified.
[0255] In one embodiment, the invention provides methods and
compositions for treating a non-responsive cancer patient by
reducing the expression and/or activity of one or more genes listed
in Table 1, and/or its encoded protein by at least 2 fold, 3 fold,
4 fold, 6 fold, 8 fold or 9 fold.
[0256] In a specific embodiment, the invention provides a method
for treating a non-responsive cancer patient by administering to a
patient (i) an agent that is capable of reducing the expression
and/or activity of one or more genes listed in Table 1, and/or its
encoded protein, and (ii) a therapeutically sufficient amount of a
chemotherapeutic agent. The invention also provides (i) an agent
that is capable of reducing the expression and/or activity of one
or more genes listed in Table 1, and/or its encoded protein, and
(ii) a therapeutically sufficient amount of a chemotherapeutic
agent for simultaneous or sequential use in treatment of a cancer
patient, e.g., a non-responsive cancer patient. The invention also
provides (i) an agent that is capable of reducing the expression
and/or activity of one or more genes listed in Table 1, and/or its
encoded protein, and (ii) a therapeutically sufficient amount of a
chemotherapeutic agent for use in the manufacture of a medicament
for simultaneous or sequential use in treatment of a cancer
patient, e.g., a non-responsive cancer patient.
[0257] The invention also provides methods and compositions for
utilizing chemotherapy response genes listed in Table 1 for
modulating sensitivity of a cell to a chemotherapeutic drug. In one
embodiment, the invention provides a method for modulating
sensitivity of a cell to a chemotherapeutic drug by contacting the
cell with one or more agents that are capable of reducing the
expression and/or activity of one or more different genes listed in
Table 1 or respective functional equivalents thereof and/or
their encoded proteins. In one embodiment, the cell is an in vivo
cell. In another embodiment, the cell is an in vitro cell, e.g., a
cell in a cell culture.
[0258] Thus, the invention also provides methods and compositions
for modulating growth of a cell, e.g., an in vivo cell or an in
vitro cell, e.g., a cell in a cell culture. In one embodiment, the
invention provides a method for modulating growth of a cell,
comprising contacting the cell with (a) one or more agents that are
capable of reducing the expression and/or activity of one or more
different genes listed in Table 1 or respective functional
equivalents thereof and/or their encoded proteins; and (b) a
sufficient amount of a chemotherapeutic drug.
[0259] A variety of approaches may be used in accordance with the
invention to modulate expression of a CR gene and/or its encoded
protein in vivo. For example, siRNA molecules may be engineered and
used to silence a CR gene in vivo. Antisense DNA molecules may also
be engineered and used to block translation of a CR mRNA in vivo.
Alternatively, ribozyme molecules may be designed to cleave and
destroy the mRNAs of a CR gene in vivo. In another alternative,
oligonucleotides designed to hybridize to the 5' region of the CR
gene (including the region upstream of the coding sequence) and
form triple helix structures may be used to block or reduce
transcription of the CR gene. The expression and/or activity of a
CR protein can be modulated using antibody, peptide or polypeptide
molecules, and small organic or inorganic molecules.
[0260] In a preferred embodiment, RNAi is used to knock down the
expression of a CR gene. In one embodiment, double-stranded RNA
molecules of 21-23 nucleotides which hybridize to a homologous
region of mRNAs transcribed from the CR gene are used to degrade
the mRNAs, thereby "silencing" the expression of the CR gene. The
method can be used to reduce expression levels of aberrantly
up-regulated CR genes. Preferably, the dsRNAs have a hybridizing
region, e.g., a 19-nucleotide double-stranded region, which is
complementary to a sequence of the coding sequence of the CR gene.
Any siRNA that targets an appropriate coding sequence of a CR gene
and exhibits a sufficient level of silencing can be used in the
invention. As exemplary embodiments, 21-nucleotide double-stranded
siRNAs targeting the coding regions of a CR gene are designed
according to selection rules known in the art (see, e.g., Elbashir
et al., 2002, Methods 26:199-213; International Application No.
PCT/US04/35636, filed Oct. 27, 2004, each of which is incorporated
herein by reference in its entirety). In a preferred embodiment,
the siRNA or siRNAs specifically inhibit the translation or
transcription of a CR protein without substantially affecting the
translation or transcription of genes encoding other protein
kinases in the same kinase family. In a specific embodiment, siRNAs
targeting an up-regulated gene listed in Table 4 are used to
silence the respective CR genes.
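The relationship between a coding sequence and candidate siRNA duplexes with a 19-nucleotide double-stranded region can be sketched as follows. This is a minimal illustration only: it enumerates every 19-nucleotide window and derives the two strands, but it does not implement the published selection rules, the 3' overhangs that bring each strand to 21 nucleotides, or any specificity screening. The function names and the example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def to_rna(dna: str) -> str:
    """Convert a DNA coding sequence to the corresponding RNA sequence."""
    return dna.upper().replace("T", "U")

def reverse_complement_rna(rna: str) -> str:
    """Reverse complement of an RNA sequence, read 5' to 3'."""
    return rna.translate(_COMPLEMENT)[::-1]

def candidate_duplexes(cds: str, core_len: int = 19):
    """Yield (0-indexed start, sense core, antisense core) for each window of the coding sequence."""
    rna = to_rna(cds)
    for start in range(len(rna) - core_len + 1):
        sense = rna[start:start + core_len]
        yield start, sense, reverse_complement_rna(sense)

# Hypothetical 24-nucleotide coding fragment used only to exercise the functions.
for start, sense, antisense in candidate_duplexes("ATGGCCAAGCTTGGATCCGAATTC"):
    print(start, sense, antisense)
```

In practice each candidate would still need to be filtered by the selection rules referenced above and tested for its level of silencing.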
[0261] The invention also provides methods and compositions for
treating a non-responsive cancer patient by reducing the expression
and/or activities of one or more CR genes, and/or their encoded
proteins. In one embodiment, a non-responsive cancer patient is
treated by administering to the patient one or more agents that
reduce the expression and/or activities of these CR genes, and/or
their encoded proteins. In a preferred embodiment, an siRNA is used
to silence the plurality of different CR genes. The sequence of the
siRNA is chosen such that the transcript of each of the genes
comprises a nucleotide sequence that is identical to a central
contiguous nucleotide sequence of at least 11 nucleotides of the
sense strand or the antisense strand of the siRNA, and/or comprises
a nucleotide sequence that is identical to a contiguous nucleotide
sequence of at least 8 nucleotides at the 3' end of the sense
strand or the antisense strand of the siRNA. Thus, when
administered to the patient, the siRNA silences all of the
plurality of genes in cells of the patient. In preferred
embodiments, the central contiguous nucleotide sequence of the
siRNA that is identical to one or more CR genes is 11-15, 14-15,
11, 12, or 13 nucleotides in length. In other preferred
embodiments, the 3' contiguous nucleotide sequence of the siRNA
that is identical to one or more CR genes is 9-15, 9-12, 11, 10, 9,
or 8 nucleotides in length. The length and nucleotide base sequence
of the target sequence of each different target gene, i.e., the
sequence of the gene that is identical to an appropriate sense or
antisense sequence of the siRNA, can be different from gene to
gene. For example, gene A may have a sequence of 11 nucleotides
identical to the nucleotide sequence 3-13 of the sense strand of
the siRNA, while gene B may have a sequence of 12 nucleotides
identical to the nucleotide sequence 4-15 of the sense strand of
the siRNA. Thus, a single siRNA may be designed to silence a large
number of, e.g., at least 2, 3, 4, 5, 10, 15, 20, 25, 30, 35 or 39,
CR genes in cells.
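
The sequence-identity criteria of this paragraph can be checked with a short sketch such as the one below. It assumes that all sequences are written in the same RNA alphabet, and it treats any contiguous run shared between an siRNA strand and a transcript as a possible "central" match, which is a simplification. The function names and the example sequences are hypothetical.

```python
# Illustrative sketch only. Sequences are hypothetical; "central" is treated
# loosely as any contiguous run within the strand (an assumption).


def longest_shared_run(strand, transcript):
    """Return (length, start) of the longest substring of `strand` that occurs
    in `transcript`; `start` is the 1-based position within `strand`."""
    best_len, best_start = 0, 0
    n = len(strand)
    for i in range(n):
        # Only test substrings longer than the best run found so far.
        for j in range(i + best_len + 1, n + 1):
            if strand[i:j] in transcript:
                best_len, best_start = j - i, i + 1
            else:
                break
    return best_len, best_start


def three_prime_match_len(strand, transcript):
    """Length of the longest 3'-terminal segment of `strand` found in `transcript`."""
    k = 0
    while k < len(strand) and strand[len(strand) - k - 1:] in transcript:
        k += 1
    return k


def is_targeted(strands, transcript, central_min=11, three_prime_min=8):
    """True if any strand meets the 11-nt contiguous or 8-nt 3'-end criterion."""
    for strand in strands:
        if longest_shared_run(strand, transcript)[0] >= central_min:
            return True
        if three_prime_match_len(strand, transcript) >= three_prime_min:
            return True
    return False


if __name__ == "__main__":
    sense = "AUGGCUAGCUAGGAUCCAUUG"          # hypothetical 21-nt sense strand
    gene_a = "CCCC" + sense[2:13] + "CCCC"   # shares positions 3-13 (11 nt)
    gene_b = "AAAA" + sense[3:15] + "AAAA"   # shares positions 4-15 (12 nt)
    print(longest_shared_run(sense, gene_a), is_targeted([sense], gene_a))
    print(longest_shared_run(sense, gene_b), is_targeted([sense], gene_b))
```

The two demonstration transcripts mirror the "gene A" and "gene B" example in the text: each shares a different stretch of the same sense strand, and both satisfy the criterion.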
[0262] RNAi can be carried out using any standard method for
introducing nucleic acids into cells. In one embodiment, gene
silencing is induced by presenting the cell with one or more siRNAs
targeting the CR gene (see, e.g., Elbashir et al., 2001, Nature
411, 494-498; Elbashir et al., 2001, Genes Dev. 15, 188-200, all of
which are incorporated by reference herein in their entirety). The
siRNAs can be chemically synthesized, or derived from cleavage of
double-stranded RNA by recombinant Dicer. Another method to
introduce a double-stranded RNA (dsRNA) for silencing of the CR
gene is shRNA, for short hairpin RNA (see, e.g., Paddison et al.,
2002, Genes Dev. 16, 948-958; Brummelkamp et al., 2002, Science
296, 550-553; Sui, G. et al. 2002, Proc. Natl. Acad. Sci. USA 99,
5515-5520, all of which are incorporated by reference herein in
their entirety). In this method, a siRNA targeting a CR gene is
expressed from a plasmid (or virus) as an inverted repeat with an
intervening loop sequence to form a hairpin structure. The
resulting RNA transcript containing the hairpin is subsequently
processed by Dicer to produce siRNAs for silencing. Plasmid- or
virus-based shRNAs can be expressed stably in cells, allowing
long-term gene silencing in cells both in vitro and in vivo (see,
McCaffrey et al. 2002, Nature 418, 38-39; Xia et al., 2002, Nat.
Biotech. 20, 1006-1010; Lewis et al., 2002, Nat. Genetics 32,
107-108; Rubinson et al., 2003, Nat. Genetics 33, 401-406;
Tiscornia et al., 2003, Proc. Natl. Acad. Sci. USA 100, 1844-1848,
all of which are incorporated by reference herein in their
entirety). Such plasmid- or virus-based shRNAs can be delivered
using a gene therapy approach. SiRNAs targeting the CR gene can
also be delivered to an organ or tissue in a mammal, such as a human,
in vivo (see, e.g., Song et al. 2003, Nat. Medicine 9, 347-351;
Sorensen et al., 2003, J. Mol. Biol. 327, 761-766; Lewis et al.,
2002, Nat. Genetics 32, 107-108, all of which are incorporated by
reference herein in their entirety). In this method, a solution of
siRNA is injected intravenously into the mammal. The siRNA can then
reach an organ or tissue of interest and effectively reduce the
expression of the target gene in the organ or tissue of the
mammal.
[0263] In preferred embodiments, an siRNA pool (mixture) containing
at least k (k=2, 3, 4, 5, 6 or 10) different siRNAs targeting a CR
gene at different sequence regions is used to silence the gene. In
a preferred embodiment, the total siRNA concentration of the pool
is about the same as the concentration of a single siRNA when used
individually. As used herein, the word "about" with reference to
concentration means within 20%. Preferably, the total concentration
of the pool of siRNAs is an optimal concentration for silencing the
intended target gene. An optimal concentration is a concentration
further increase of which does not increase the level of silencing
substantially. In one embodiment, the optimal concentration is a
concentration further increase of which does not increase the level
of silencing by more than 5%, 10% or 20%. In a preferred
embodiment, the composition of the pool, including the number of
different siRNAs in the pool and the concentration of each
different siRNA, is chosen such that the pool of siRNAs causes less
than 30%, 20%, 10%, 5%, 1%, 0.1% or 0.01% of silencing of any
off-target genes (e.g., as determined by standard nucleic acid
assay, e.g., PCR). In another preferred embodiment, the
concentration of each different siRNA in the pool of different
siRNAs is about the same. In still another preferred embodiment,
the respective concentrations of different siRNAs in the pool are
different from each other by less than 5%, 10%, 20% or 50% of the
concentration of any one siRNA or said total siRNA concentration of
said different siRNAs. In still another preferred embodiment, at
least one siRNA in the pool of different siRNAs constitutes more
than 90%, 80%, 70%, 50%, or 20% of the total siRNA concentration in
the pool. In still another preferred embodiment, none of the siRNAs
in the pool of different siRNAs constitutes more than 90%, 80%,
70%, 50%, or 20% of the total siRNA concentration in the pool. In
other embodiments, each siRNA in the pool has a concentration that
is lower than the optimal concentration when used individually. In
a preferred embodiment, each different siRNA in the pool has a
concentration that is lower than the concentration of the siRNA
that is effective to achieve at least 30%, 50%, 75%, 80%, 85%, 90%
or 95% silencing when used in the absence of other siRNAs or in the
absence of other siRNAs designed to silence the gene. In another
preferred embodiment, each different siRNA in the pool has a
concentration that causes less than 30%, 20%, 10% or 5% of
silencing of the gene when used in the absence of other siRNAs or
in the absence of other siRNAs designed to silence the gene. In a
preferred embodiment, each siRNA has a concentration that causes
less than 30%, 20%, 10% or 5% of silencing of the target gene when
used alone, while the plurality of siRNAs causes at least 80% or
90% of silencing of the target gene. In specific embodiments, a
pool containing 3 different siRNAs is used for targeting a CR gene.
More detailed descriptions of techniques for carrying out RNAi are
also presented in Section 5.6.
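
The concentration criteria of this paragraph can be illustrated with the sketch below. It assumes that "within 20%" is measured relative to the reference concentration, that the gain in silencing is counted in absolute fractional units (0.05 meaning five percentage points), and that the dose-response values are hypothetical numbers chosen only for the demonstration. The function names are likewise assumptions.

```python
# Illustrative sketch only. The dose-response numbers are hypothetical.


def about_equal(value, reference, tolerance=0.20):
    """True if `value` is within `tolerance` (default 20%) of `reference`."""
    return abs(value - reference) <= tolerance * reference


def equal_share_pool(total_conc, k):
    """Split a total pool concentration equally among k different siRNAs."""
    return [total_conc / k] * k


def optimal_concentration(dose_response, max_gain=0.05):
    """Return the lowest tested concentration beyond which no higher tested
    concentration raises silencing by more than `max_gain`.

    `dose_response` is a list of (concentration, silencing_fraction) pairs.
    The highest tested concentration always qualifies, so the result is only
    as good as the range tested. Returns None for an empty input.
    """
    points = sorted(dose_response)
    for i, (conc, silencing) in enumerate(points):
        later = [s for _, s in points[i + 1:]]
        if all(s - silencing <= max_gain for s in later):
            return conc
    return None


if __name__ == "__main__":
    # Hypothetical single-siRNA titration: (nM, fraction of target silenced).
    titration = [(1, 0.30), (10, 0.70), (50, 0.88), (100, 0.90), (200, 0.91)]
    single_optimum = optimal_concentration(titration)
    pool = equal_share_pool(single_optimum, 4)
    print(single_optimum, pool, about_equal(sum(pool), single_optimum))
```

In this sketch a pool of four siRNAs at equal shares keeps the total concentration equal to the optimum found for a single siRNA, so each member is present at a quarter of that concentration.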
[0264] In other embodiments, antisense, ribozyme, and triple helix
forming nucleic acids are designed to inhibit the translation or
transcription of a CR protein or gene with minimal effects on the
expression of other genes that may share one or more sequence motifs
with the CR gene. To accomplish this, the oligonucleotides used
should be designed on the basis of relevant sequences unique to a
CR gene. In one embodiment, the oligonucleotide used specifically
inhibits the translation or transcription of a CR protein or gene
without substantially affecting the translation or transcription of
other proteins in the same protein family.
[0265] For example, and not by way of limitation, the
oligonucleotides should not fall within those regions where the
nucleotide sequence of a CR gene is most homologous to that of
other genes. In the case of antisense molecules, it is preferred
that the sequence be at least 18 nucleotides in length in order to
achieve sufficiently strong annealing to the target mRNA sequence
to prevent translation of the sequence. Izant et al., 1984, Cell,
36:1007-1015; Rosenberg et al., 1985, Nature, 313:703-706.
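
One crude way to apply the two preferences above (a length of at least 18 nucleotides, and avoidance of regions shared with other genes) is sketched below. It uses exact sharing of a short "seed" subsequence as a stand-in for homology, which is an assumption made for simplicity; a real design would use a proper alignment tool. The function names, the seed length, and the test sequences are hypothetical.

```python
# Illustrative sketch only. Exact seed sharing is a simplistic proxy for
# homology; sequences are written as DNA (cDNA) letters.

_DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement_dna(seq):
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return "".join(_DNA_COMPLEMENT[base] for base in reversed(seq))


def unique_antisense_sites(target_mrna, related_mrnas, length=18, seed=8):
    """List (start, antisense_oligo) for target windows of `length` nt that
    share no `seed`-nt subsequence with any related sequence.

    `start` is the 0-based position of the window in `target_mrna`.
    """
    related_seeds = set()
    for seq in related_mrnas:
        for i in range(len(seq) - seed + 1):
            related_seeds.add(seq[i:i + seed])

    sites = []
    for start in range(len(target_mrna) - length + 1):
        window = target_mrna[start:start + length]
        shares_seed = any(
            window[i:i + seed] in related_seeds
            for i in range(length - seed + 1)
        )
        if not shares_seed:
            sites.append((start, reverse_complement_dna(window)))
    return sites


if __name__ == "__main__":
    # Hypothetical sequences: the first 20 nt of the target resemble a relative.
    target = "ACGT" * 5 + "T" * 20
    relatives = ["ACGT" * 3]
    for start, oligo in unique_antisense_sites(target, relatives):
        print(start, oligo)
```

Only windows lying largely outside the shared region are reported, which reflects the preference for oligonucleotides based on sequences unique to the CR gene.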
[0266] Ribozymes are RNA molecules which possess highly specific
endoribonuclease activity. Hammerhead ribozymes comprise a
hybridizing region which is complementary in nucleotide sequence
to at least part of the target RNA, and a catalytic region which is
adapted to cleave the target RNA. The hybridizing region contains
nine (9) or more nucleotides. Therefore, hammerhead ribozymes
useful for targeting a CR gene have a hybridizing region which is
complementary to a sequence of the target gene and is
at least nine nucleotides in length. The construction and
production of such ribozymes is well known in the art and is
described more fully in Haseloff et al., 1988, Nature,
334:585-591.
[0267] The ribozymes of the present invention also include RNA
endoribonucleases (hereinafter "Cech-type ribozymes") such as the
one which occurs naturally in Tetrahymena thermophila (known as the
IVS, or L-19 IVS RNA) and which has been extensively described by
Thomas Cech and collaborators (Zaug, et al., 1984, Science,
224:574-578; Zaug and Cech, 1986, Science, 231:470-475; Zaug, et
al., 1986, Nature, 324:429-433; published International patent
application No. WO 88/04300 by University Patents Inc.; Been et
al., 1986, Cell, 47:207-216). The Cech endoribonucleases have an
eight base pair active site which hybridizes to a target RNA
sequence whereafter cleavage of the target RNA takes place.
[0268] In the case of oligonucleotides that hybridize to and form
triple helix structures at the 5' terminus of a CR gene and can be
used to block transcription, it is preferred that they be
complementary to those sequences in the 5' terminus of a CR gene
which are not present in other related genes. It is also preferred
that the sequences not include those regions of the promoter of a
CR gene which are even slightly homologous to that of other related
genes.
[0269] The foregoing compounds can be administered by a variety of
methods which are known in the art including, but not limited to,
the use of liposomes as a delivery vehicle. Naked DNA or RNA
molecules may also be used where they are in a form which is
resistant to degradation such as by modification of the ends, by
the formation of circular molecules, or by the use of alternate
bonds including phosphorothioate and thiophosphoryl modified bonds.
In addition, the delivery of nucleic acid may be by facilitated
transport where the nucleic acid molecules are conjugated to
poly-lysine or transferrin. Nucleic acid may also be transported
into cells by any of the various viral carriers, including but not
limited to, retrovirus, vaccinia, AAV, and adenovirus.
[0270] Alternatively, a recombinant nucleic acid molecule which
encodes, or is, such antisense nucleic acid, ribozyme, triple helix
forming nucleic acid, or nucleic acid molecule of a CR gene can be
constructed. This nucleic acid molecule may be either RNA or DNA.
If the nucleic acid encodes an RNA, it is preferred that the
sequence be operatively attached to a regulatory element so that
sufficient copies of the desired RNA product are produced. The
regulatory element may permit either constitutive or regulated
transcription of the sequence. In vivo, that is, within the cell
or cells of an organism, a transfer vector such as a bacterial
plasmid or viral RNA or DNA, encoding one or more of the RNAs, may
be transfected into cells (see, e.g., Llewellyn et al., 1987, J. Mol.
Biol., 195:115-123; Hanahan et al. 1983, J. Mol. Biol.,
166:557-580). Once inside the cell, the transfer vector may
replicate, and be transcribed by cellular polymerases to produce
the RNA or it may be integrated into the genome of the host cell.
Alternatively, a transfer vector containing sequences encoding one
or more of the RNAs may be transfected into cells or introduced
into cells by way of micromanipulation techniques such as
microinjection, such that the transfer vector or a part thereof
becomes integrated into the genome of the host cell.
[0271] The activity of a CR protein can be modulated by modulating
the interaction of a CR protein with its binding partners. In one
embodiment, agents, e.g., antibodies, peptides, aptamers, small
organic or inorganic molecules, can be used to inhibit binding of a
CR protein binding partner to treat cancer. In another embodiment,
agents, e.g., antibodies, aptamers, small organic or inorganic
molecules, can be used to inhibit the activity of a CR protein to
treat cancer. In other embodiments, when the CR protein is a
kinase, the invention provides small molecule inhibitors of the CR
protein. A small molecule inhibitor is a low molecular weight
phosphorylation inhibitor. As used herein, a small molecule refers
to an organic or inorganic molecule having a molecular weight
under 1000 Daltons, preferably in the range of 300 to 700
Daltons, which is not a nucleic acid molecule or a peptide
molecule. The small molecule can be naturally occurring, e.g.,
extracted from plant or microorganisms, or non-naturally occurring,
e.g., generated de novo by synthesis. A small molecule that is an
inhibitor can be used to block a cellular process that is dependent on
a CR protein. In one embodiment, the inhibitors are substrate
mimics. In a preferred embodiment, the inhibitor of the CR proteins
is an ATP mimic. In one embodiment, such ATP mimics possess at
least two aromatic rings. In a preferred embodiment, the ATP mimic
comprises a moiety that forms extensive contacts with residues
lining the ATP binding cleft of the CR protein and/or peptide
segments just outside the cleft, thereby selectively blocking the
ATP binding site of the CR protein. Minor structural differences
from ATP can be introduced into the ATP mimic based on the peptide
segments just outside the cleft. Such differences can lead to
specific hydrogen bonding and hydrophobic interactions with the
peptide segments just outside the cleft.
[0272] In still other embodiments, antibodies that specifically
bind the CR protein are used. In a preferred embodiment, the
invention provides antibodies that specifically bind the
extracellular domain of a CR protein that is a receptor. Antibodies
that specifically bind a target can be obtained using standard
methods known in the art, e.g., a method described in Section
5.8.
[0273] In one embodiment, an antibody-drug conjugate comprising an
antibody that specifically binds a cell surface expressed CR
protein is used. The efficacy of antibodies that target a CR
protein can be increased by attaching toxins to them. Existing
immunotoxins are based on bacterial toxins like Pseudomonas
exotoxin, plant toxins like ricin, or radionuclides. The
toxins are chemically conjugated to a specific ligand such as the
variable domain of the heavy or light chain of the monoclonal
antibody. Normal cells lacking the cancer-specific antigens are not
targeted by the antibody.
[0274] In other embodiments, a peptide or peptidomimetic that
interferes with the interaction of a CR protein with its
interaction partner is used. A peptide preferably has a size of at
least 5, 10, 15, 20 or 30 amino acids. Such a peptide or
peptidomimetic can be designed by a person skilled in the art based
on the sequence and structure of a CR protein. In one embodiment, a
peptide or peptidomimetic that interferes with substrate binding of
a CR protein is used. In another embodiment, a peptide or
peptidomimetic that interferes with the binding of a signal
molecule to a CR protein is used. In some embodiments of the
invention, a fragment or polypeptide of a CR protein that is at
least 5, 10, 20, 50 or 100 amino acids in length is used.
[0275] In another embodiment, a dominant negative mutant of a CR
protein is used to reduce activity of a CR protein. Such a dominant
negative mutant can be designed by a person skilled in the art
based on the sequence and structure of a CR protein. In one
embodiment, a dominant negative mutant that interferes with
substrate binding of a CR protein is used. In another embodiment, a
dominant negative mutant that interferes with the binding of a
signal molecule to a CR protein is used. In a preferred embodiment,
the invention provides a dominant negative mutant that comprises
the C-terminal region of a CR protein. In another embodiment, the
invention provides a dominant negative mutant that comprises the
N-terminal region of the CR protein.
[0276] Gene therapy can be used for delivering any of the above
described nucleic acid and protein/peptide therapeutics into target
cells. Gene therapy is particularly useful for enhancing aberrantly
down-regulated genes. Exemplary methods for carrying out gene
therapy are described below. For general reviews of the methods of
gene therapy, see Goldspiel et al., 1993, Clinical Pharmacy
12:488-505; Wu and Wu, 1991, Biotherapy 3:87-95; Tolstoshev, 1993,
Ann. Rev. Pharmacol. Toxicol. 32:573-596; Mulligan, 1993, Science
260:926-932; Morgan and Anderson, 1993, Ann. Rev. Biochem.
62:191-217; and May, 1993, TIBTECH 11(5):155-215. Methods commonly
known in the art of recombinant DNA technology which can be used
are described in Ausubel et al. (eds.), 1993, Current Protocols in
Molecular Biology, John Wiley & Sons, New York; and Kriegler,
1990, Gene Transfer and Expression, A Laboratory Manual, Stockton
Press, New York.
[0277] In a preferred embodiment, the therapeutic comprises a
nucleic acid that is part of an expression vector that expresses
the therapeutic nucleic acid or peptide/polypeptide in a suitable
host. In particular, such a nucleic acid has a promoter operably
linked to the coding region, said promoter being inducible or
constitutive, and, optionally, tissue-specific. In another
particular embodiment, a nucleic acid molecule is used in which the
coding sequences and any other desired sequences are flanked by
regions that promote homologous recombination at a desired site in
the genome, thus providing for intrachromosomal expression of the
CR nucleic acid (see e.g., Koller and Smithies, 1989, Proc. Natl.
Acad. Sci. U.S.A. 86:8932-8935; Zijlstra et al., 1989, Nature
342:435-438).
[0278] Delivery of the nucleic acid into a patient may be either
direct, in which case the patient is directly exposed to the
nucleic acid or nucleic acid-carrying vector, or indirect, in which
case, cells are first transformed with the nucleic acid in vitro,
then transplanted into the patient. These two approaches are known,
respectively, as in vivo and ex vivo gene therapy.
[0279] In a specific embodiment, the nucleic acid is directly
administered in vivo, where it is expressed to produce the encoded
product. This can be accomplished by any of numerous methods known
in the art, e.g., by constructing it as part of an appropriate
nucleic acid expression vector and administering it so that it
becomes intracellular, e.g., by infection using a defective or
attenuated retroviral or other viral vector (see U.S. Pat. No.
4,980,286), or by direct injection of naked DNA, or by use of
microparticle bombardment (e.g., a gene gun; Biolistic, Dupont), or
coating with lipids or cell-surface receptors or transfecting
agents, encapsulation in liposomes, microparticles, or
microcapsules, or by administering it in linkage to a peptide which
is known to enter the nucleus, or by administering it in linkage to a
ligand subject to receptor-mediated endocytosis (see e.g., Wu and
Wu, 1987, J. Biol. Chem. 262:4429-4432) (which can be used to
target cell types specifically expressing the receptors), etc. In
another embodiment, a nucleic acid-ligand complex can be formed in
which the ligand comprises a fusogenic viral peptide to disrupt
endosomes, allowing the nucleic acid to avoid lysosomal
degradation. In yet another embodiment, the nucleic acid can be
targeted in vivo for cell specific uptake and expression, by
targeting a specific receptor (see, e.g., PCT Publications WO
92/06180 dated Apr. 16, 1992 (Wu et al.); WO 92/22635 dated Dec.
23, 1992 (Wilson et al.); WO92/20316 dated Nov. 26, 1992 (Findeis
et al.); WO93/14188 dated Jul. 22, 1993 (Clarke et al.), WO
93/20221 dated Oct. 14, 1993 (Young)). Alternatively, the nucleic
acid can be introduced intracellularly and incorporated within host
cell DNA for expression, by homologous recombination (Koller and
Smithies, 1989, Proc. Natl. Acad. Sci. U.S.A. 86:8932-8935;
Zijlstra et al., 1989, Nature 342:435-438).
[0280] In a specific embodiment, a viral vector that contains the
nucleic acid of a CR gene is used. For example, a retroviral vector
can be used (see Miller et al., 1993, Meth. Enzymol. 217:581-599).
These retroviral vectors have been modified to delete retroviral
sequences that are not necessary for packaging of the viral genome
and integration into host cell DNA. The CR nucleic acid to be used
in gene therapy is cloned into the vector, which facilitates
delivery of the gene into a patient. More detail about retroviral
vectors can be found in Boesen et al., 1994, Biotherapy 6:291-302,
which describes the use of a retroviral vector to deliver the mdr1
gene to hematopoietic stem cells in order to make the stem cells
more resistant to chemotherapy. Other references illustrating the
use of retroviral vectors in gene therapy are: Clowes et al., 1994,
J. Clin. Invest. 93:644-651; Kiem et al., 1994, Blood 83:1467-1473;
Salmons and Gunzberg, 1993, Human Gene Therapy 4:129-141; and
Grossman and Wilson, 1993, Curr. Opin. Genet. and Devel.
3:110-114.
[0281] Adenoviruses are other viral vectors that can be used in
gene therapy. Adenoviruses are especially attractive vehicles for
delivering genes to respiratory epithelia. Adenoviruses naturally
infect respiratory epithelia where they cause a mild disease. Other
targets for adenovirus-based delivery systems are liver, the
central nervous system, endothelial cells, and muscle. Adenoviruses
have the advantage of being capable of infecting non-dividing
cells. Kozarsky and Wilson (1993, Current Opinion in Genetics and
Development 3:499-503) present a review of adenovirus-based gene
therapy. Bout et al. (1994, Human Gene Therapy 5:3-10) demonstrated
the use of adenovirus vectors to transfer genes to the respiratory
epithelia of rhesus monkeys. Other instances of the use of
adenoviruses in gene therapy can be found in Rosenfeld et al.,
1991, Science 252:431-434; Rosenfeld et al., 1992, Cell 68:143-155;
and Mastrangeli et al., 1993, J. Clin. Invest. 91:225-234.
[0282] Adeno-associated virus (AAV) has also been proposed for use
in gene therapy (Walsh et al., 1993, Proc. Soc. Exp. Biol. Med.
204:289-300).
[0283] Another approach to gene therapy involves transferring a
gene to cells in tissue culture by such methods as electroporation,
lipofection, calcium phosphate mediated transfection, or viral
infection. Usually, the method of transfer includes the transfer of
a selectable marker to the cells. The cells are then placed under
selection to isolate those cells that have taken up and are
expressing the transferred gene. Those cells are then delivered to
a patient.
[0284] In this embodiment, the nucleic acid is introduced into a
cell prior to administration in vivo of the resulting recombinant
cell. Such introduction can be carried out by any method known in
the art, including but not limited to transfection,
electroporation, microinjection, infection with a viral or
bacteriophage vector containing the nucleic acid sequences, cell
fusion, chromosome-mediated gene transfer, microcell-mediated gene
transfer, spheroplast fusion, etc. Numerous techniques are known in
the art for the introduction of foreign genes into cells (see e.g.,
Loeffler and Behr, 1993, Meth. Enzymol. 217:599-618; Cohen et al.,
1993, Meth. Enzymol. 217:618-644; Cline, 1985, Pharmac. Ther.
29:69-92) and may be used in accordance with the present invention,
provided that the necessary developmental and physiological
functions of the recipient cells are not disrupted. The technique
should provide for the stable transfer of the nucleic acid to the
cell, so that the nucleic acid is expressible by the cell and
preferably heritable and expressible by its cell progeny.
[0285] The resulting recombinant cells can be delivered to a
patient by various methods known in the art. In a preferred
embodiment, epithelial cells are injected, e.g., subcutaneously. In
another embodiment, recombinant skin cells may be applied as a skin
graft onto the patient. Recombinant blood cells (e.g.,
hematopoietic stem or progenitor cells) are preferably administered
intravenously. The amount of cells envisioned for use depends on
the desired effect, patient state, etc., and can be determined by
one skilled in the art.
[0286] Cells into which a nucleic acid can be introduced for
purposes of gene therapy encompass any desired, available cell
type, and include but are not limited to epithelial cells,
endothelial cells, keratinocytes, fibroblasts, muscle cells,
hepatocytes; blood cells such as T lymphocytes, B lymphocytes,
monocytes, macrophages, neutrophils, eosinophils, megakaryocytes,
granulocytes; various stem or progenitor cells, in particular
hematopoietic stem or progenitor cells, e.g., as obtained from bone
marrow, umbilical cord blood, peripheral blood, fetal liver,
etc.
[0287] In a preferred embodiment, the cell used for gene therapy is
autologous to the patient.
[0288] In an embodiment in which recombinant cells are used in gene
therapy, a nucleic acid is introduced into the cells such that it
is expressible by the cells or their progeny, and the recombinant
cells are then administered in vivo for therapeutic effect. In a
specific embodiment, stem or progenitor cells are used. Such stem
cells can be hematopoietic stem cells (HSC).
[0289] Any technique which provides for the isolation, propagation,
and maintenance in vitro of HSC can be used in this embodiment of
the invention. Techniques by which this may be accomplished include
(a) the isolation and establishment of HSC cultures from bone
marrow cells isolated from the future host, or a donor, or (b) the
use of previously established long-term HSC cultures, which may be
allogeneic or xenogeneic. Non-autologous HSC are used preferably in
conjunction with a method of suppressing transplantation immune
reactions of the future host/patient. In a particular embodiment of
the present invention, human bone marrow cells can be obtained from
the posterior iliac crest by needle aspiration (see e.g., Kodo et
al., 1984, J. Clin. Invest. 73:1377-1384). The HSCs can be made
highly enriched or in substantially pure form. This enrichment can
be accomplished before, during, or after long-term culturing, and
can be done by any techniques known in the art. Long-term cultures
of bone marrow cells can be established and maintained by using,
for example, modified Dexter cell culture techniques (Dexter et
al., 1977, J. Cell Physiol. 91:335) or Whitlock-Witte culture
techniques (Whitlock and Witte, 1982, Proc. Natl. Acad. Sci. U.S.A.
79:3608-3612).
[0290] In a specific embodiment, the nucleic acid to be introduced
for purposes of gene therapy comprises an inducible promoter
operably linked to the coding region, such that expression of the
nucleic acid is controllable by controlling the presence or absence
of the appropriate inducer of transcription.
[0291] The methods and/or compositions described above for
modulating the expression and/or activity of a CR gene or CR
protein may be used to treat patients in conjunction with a
chemotherapeutic agent, e.g., Gleevec™.
[0292] The effects or benefits of administration of the
compositions of the invention alone or in conjunction with a
chemotherapeutic agent can be evaluated by any methods known in the
art, e.g., by methods that are based on measuring the survival
rate, side effects, dosage requirement of the chemotherapeutic
agent, or any combinations thereof. If the administration of the
compositions of the invention achieves any one or more benefits in
a patient, such as increasing the survival rate, decreasing side
effects, or lowering the dosage requirement for the chemotherapeutic
agent, the compositions of the invention are said to have augmented
a chemotherapy treatment, and the method is said to have
efficacy.
5.5. Methods for Screening Agents that Modulate CR Proteins
[0293] Agents that modulate the expression or activity of a
chemotherapy response gene or encoded protein, or modulate
interaction of a chemotherapy response protein with other proteins
or molecules can be identified using a method described in this
section. Such agents are useful in treating cancer patients who
exhibit non-responsiveness to chemotherapy. The methods described
in this section can be performed in vivo, e.g., using cells that
are in vivo. The methods described in this section can also be
performed in vitro, e.g., using cells in a cell culture.
5.5.1. Screening Assays
[0294] The following assays are designed to identify compounds that
bind to a chemotherapy response gene or its products, bind to other
cellular proteins that interact with a chemotherapy response
protein, bind to cellular constituents, e.g., proteins, that are
affected by a chemotherapy response protein, or bind to compounds
that interfere with the interaction of the chemotherapy response
gene or its product with other cellular proteins and to compounds
which modulate the expression or activity of a chemotherapy
response gene (i.e., modulate the expression level of the
chemotherapy response gene and/or modulate the activity level of
the chemotherapy response protein). Assays may additionally be
utilized which identify compounds which bind to chemotherapy
response protein regulatory sequences (e.g., promoter sequences),
see e.g., Platt, K.A., 1994, J. Biol. Chem. 269:28558-28562, which
is incorporated herein by reference in its entirety, which may
modulate the level of chemotherapy response gene expression.
Compounds may include, but are not limited to, small organic
molecules which are able to affect expression of the chemotherapy
response gene or some other gene involved in the chemotherapy
response protein pathways, or other cellular proteins. Further,
among these compounds are compounds which affect the level of
chemotherapy response gene expression and/or chemotherapy response
protein activity and which can be used in the regulation of
sensitivity to the effect of a chemotherapy agent.
[0295] Compounds may include, but are not limited to, peptides such
as, for example, soluble peptides, including but not limited to,
Ig-tailed fusion peptides, and members of random peptide libraries
(see, e.g., Lam, K. S. et al., 1991, Nature 354:82-84; Houghten, R.
et al., 1991, Nature 354:84-86), and combinatorial
chemistry-derived molecular library made of D- and/or
L-configuration amino acids, phosphopeptides (including, but not
limited to members of random or partially degenerate, directed
phosphopeptide libraries; see, e.g., Songyang, Z. et al., 1993,
Cell 72:767-778), antibodies (including, but not limited to,
polyclonal, monoclonal, humanized, anti-idiotypic, chimeric or
single chain antibodies, and Fab, F(ab')2 and Fab expression
library fragments, and epitope-binding fragments thereof), and
small organic or inorganic molecules.
[0296] Compounds identified via assays such as those described
herein may be useful, for example, in modulating the biological
function of the chemotherapy response protein.
[0297] In vitro systems may be designed to identify compounds
capable of binding a chemotherapy response protein. Compounds
identified may be useful, for example, in modulating the activity
of wild type and/or mutant chemotherapy response protein, may be
useful in elaborating the biological function of the chemotherapy
response protein, may be utilized in screens for identifying
compounds that disrupt normal chemotherapy response protein
interactions, or may in themselves disrupt such interactions.
[0298] The principle of the assays used to identify compounds that
bind to the chemotherapy response protein involves preparing a
reaction mixture of the chemotherapy response protein and the test
compound under conditions and for a time sufficient to allow the
two components to interact and bind, thus forming a complex which
can be removed and/or detected in the reaction mixture. These
assays can be conducted in a variety of ways. For example, one
method to conduct such an assay would involve anchoring
chemotherapy response protein or the test substance onto a solid
phase and detecting chemotherapy response protein/test compound
complexes anchored on the solid phase at the end of the reaction.
In one embodiment of such a method, the chemotherapy response
protein may be anchored onto a solid surface, and the test
compound, which is not anchored, may be labeled, either directly or
indirectly.
[0299] In practice, microtiter plates may conveniently be utilized
as the solid phase. The anchored component may be immobilized by
non-covalent or covalent attachments. Non-covalent attachment may
be accomplished by simply coating the solid surface with a solution
of the protein and drying. Alternatively, an immobilized antibody,
preferably a monoclonal antibody, specific for the protein to be
immobilized may be used to anchor the protein to the solid surface.
The surfaces may be prepared in advance and stored.
[0300] In order to conduct the assay, the nonimmobilized component
is added to the coated surface containing the anchored component.
After the reaction is complete, unreacted components are removed
(e.g., by washing) under conditions such that any complexes formed
will remain immobilized on the solid surface. The detection of
complexes anchored on the solid surface can be accomplished in a
number of ways. Where the previously nonimmobilized component is
pre-labeled, the detection of label immobilized on the surface
indicates that complexes were formed. Where the previously
nonimmobilized component is not pre-labeled, an indirect label can
be used to detect complexes anchored on the surface; e.g., using a
labeled antibody specific for the previously nonimmobilized
component (the antibody, in turn, may be directly labeled or
indirectly labeled with a labeled anti-Ig antibody).
[0301] Alternatively, a reaction can be conducted in a liquid
phase, the reaction products separated from unreacted components,
and complexes detected; e.g., using an immobilized antibody
specific for a chemotherapy response protein or the test compound
to anchor any complexes formed in solution, and a labeled antibody
specific for the other component of the possible complex to detect
anchored complexes.
[0302] The chemotherapy response gene or chemotherapy response
protein may interact in vivo with one or more intracellular or
extracellular molecules, such as proteins. For purposes of this
discussion, such molecules are referred to herein as "binding
partners". Compounds that disrupt chemotherapy response protein
binding may be useful in modulating the activity of the
chemotherapy response protein. Compounds that disrupt chemotherapy
response gene binding may be useful in modulating the expression of
the chemotherapy response gene, such as by modulating the binding
of a regulator of chemotherapy response gene. Such compounds may
include, but are not limited to, molecules such as peptides which
would be capable of gaining access to the chemotherapy response
protein.
[0303] The basic principle of the assay systems used to identify
compounds that interfere with the interaction between the
chemotherapy response protein and its intracellular or
extracellular binding partner or partners involves preparing a
reaction mixture containing the chemotherapy response protein, and
the binding partner under conditions and for a time sufficient to
allow the two to interact and bind, thus forming a complex. In
order to test a compound for inhibitory activity, the reaction
mixture is prepared in the presence and absence of the test
compound. The test compound may be initially included in the
reaction mixture, or may be added at a time subsequent to the
addition of a chemotherapy response protein and its binding
partner. Control reaction mixtures are incubated without the test
compound or with a placebo. The formation of any complexes between
the chemotherapy response protein and the binding partner is then
detected. The formation of a complex in the control reaction, but
not in the reaction mixture containing the test compound, indicates
that the compound interferes with the interaction of the
chemotherapy response protein and the interactive binding partner.
Additionally, complex formation within reaction mixtures containing
the test compound and a normal chemotherapy response protein may
also be compared to complex formation within reaction mixtures
containing the test compound and a mutant chemotherapy response
protein. This comparison may be important in those cases where it
is desirable to identify compounds that disrupt interactions of
mutant but not the normal chemotherapy response protein.
[0304] The assay for compounds that interfere with the interaction
of the chemotherapy response proteins and binding partners can be
conducted in a heterogeneous or homogeneous format. Heterogeneous
assays involve anchoring either the chemotherapy response protein
or the binding partner onto a solid phase and detecting complexes
anchored on the solid phase at the end of the reaction. In
homogeneous assays, the entire reaction is carried out in a liquid
phase. In either approach, the order of addition of reactants can
be varied to obtain different information about the compounds being
tested. For example, test compounds that interfere with the
interaction between the chemotherapy response proteins and the
binding partners, e.g., by competition, can be identified by
conducting the reaction in the presence of the test substance;
i.e., by adding the test substance to the reaction mixture prior to
or simultaneously with the chemotherapy response protein and
interactive binding partner. Alternatively, test compounds that
disrupt preformed complexes, e.g. compounds with higher binding
constants that displace one of the components from the complex, can
be tested by adding the test compound to the reaction mixture after
complexes have been formed. The various formats are described
briefly below.
[0305] In a heterogeneous assay system, either the chemotherapy
response protein or its interactive binding partner is anchored
onto a solid surface, while the non-anchored species is labeled,
either directly or indirectly. In practice, microtiter plates are
conveniently utilized. The anchored species may be immobilized by
non-covalent or covalent attachments. Non-covalent attachment may
be accomplished simply by coating the solid surface with a solution
of the chemotherapy response protein or binding partner and drying.
Alternatively, an immobilized antibody specific for the species to
be anchored may be used to anchor the species to the solid surface.
The surfaces may be prepared in advance and stored.
[0306] In order to conduct the assay, the partner of the
immobilized species is exposed to the coated surface with or
without the test compound. After the reaction is complete,
unreacted components are removed (e.g., by washing) and any
complexes formed will remain immobilized on the solid surface. The
detection of complexes anchored on the solid surface can be
accomplished in a number of ways. Where the non-immobilized species
is pre-labeled, the detection of label immobilized on the surface
indicates that complexes were formed. Where the non-immobilized
species is not pre-labeled, an indirect label can be used to detect
complexes anchored on the surface; e.g., using a labeled antibody
specific for the initially non-immobilized species (the antibody,
in turn, may be directly labeled or indirectly labeled with a
labeled anti-Ig antibody). Depending upon the order of addition of
reaction components, test compounds which inhibit complex formation
or which disrupt preformed complexes can be detected.
[0307] Alternatively, the reaction can be conducted in a liquid
phase in the presence or absence of the test compound, the reaction
products separated from unreacted components, and complexes
detected; e.g., using an immobilized antibody specific for one of
the binding components to anchor any complexes formed in solution,
and a labeled antibody specific for the other partner to detect
anchored complexes. Again, depending upon the order of addition of
reactants to the liquid phase, test compounds which inhibit complex
formation or which disrupt preformed complexes can be identified.
[0308] In an alternative embodiment, a homogeneous assay can be
used. In this approach, a preformed complex of the chemotherapy
response protein and the interactive binding partner is prepared in
which either the chemotherapy response protein or its binding
partner is labeled, but the signal generated by the label is
quenched due to complex formation (see, e.g., U.S. Pat. No.
4,109,496 which utilizes this approach for immunoassays). The
addition of a test substance that competes with and displaces one
of the species from the preformed complex will result in the
generation of a signal above background. In this way, test
substances which disrupt chemotherapy response protein/binding
partner interaction can be identified.
[0309] In a particular embodiment, the chemotherapy response
protein can be prepared for immobilization using recombinant DNA
techniques. For example, the coding region of chemotherapy response
gene can be fused to a glutathione-S-transferase (GST) gene using a
fusion vector, such as pGEX-5X-1, in such a manner that its binding
activity is maintained in the resulting fusion protein. The
interactive binding partner can be purified and used to raise a
monoclonal antibody, using methods routinely practiced in the art.
This antibody can be labeled with the radioactive isotope
.sup.125I, for example, by methods routinely practiced in the art.
In a heterogeneous assay, e.g., the GST-chemotherapy response
protein fusion protein can be anchored to glutathione-agarose
beads. The interactive binding partner can then be added in the
presence or absence of the test compound in a manner that allows
interaction and binding to occur. At the end of the reaction
period, unbound material can be washed away, and the labeled
monoclonal antibody can be added to the system and allowed to bind
to the complexed components. The interaction between the
chemotherapy response protein and the interactive binding partner
can be detected by measuring the amount of radioactivity that
remains associated with the glutathione-agarose beads. A successful
inhibition of the interaction by the test compound will result in a
decrease in measured radioactivity.
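The readout described in the preceding paragraph can be reduced to a single number, for example as a percent inhibition. The following Python sketch is illustrative only: the function name, the background-subtraction step, and the example counts are assumptions introduced here and are not taken from the specification.

```python
def percent_inhibition(cpm_test, cpm_control, cpm_background=0.0):
    """Percent inhibition of complex formation from bead-associated counts.

    cpm_test       -- counts measured with the test compound present
    cpm_control    -- counts measured without the test compound
    cpm_background -- counts from a no-binding-partner well (assumed control)
    """
    control_signal = cpm_control - cpm_background
    if control_signal <= 0:
        raise ValueError("control signal must exceed background")
    test_signal = cpm_test - cpm_background
    return 100.0 * (1.0 - test_signal / control_signal)


# Hypothetical counts per minute, for illustration only.
example = percent_inhibition(cpm_test=3000.0, cpm_control=10500.0,
                             cpm_background=500.0)
```

A value near 0 would indicate that the test compound leaves the interaction intact, while a value near 100 would correspond to the decrease in measured radioactivity described above.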
[0310] Alternatively, the GST-chemotherapy response protein fusion
protein and the interactive binding partner can be mixed together
in liquid in the absence of the solid glutathione-agarose beads.
The test compound can be added either during or after the species
are allowed to interact. This mixture can then be added to the
glutathione-agarose beads and unbound material is washed away.
Again the extent of inhibition of the chemotherapy response
protein/binding partner interaction can be detected by adding the
labeled antibody and measuring the radioactivity associated with
the beads.
[0311] In another embodiment of the invention, these same
techniques can be employed using peptide fragments that correspond
to the binding domains of the chemotherapy response protein and/or
the interactive binding partner (in cases where the binding partner
is a protein), in place of one or both of the full length proteins.
Any number of methods routinely practiced in the art can be used to
identify and isolate the binding sites. These methods include, but
are not limited to, mutagenesis of the gene encoding one of the
proteins and screening for disruption of binding in a
co-immunoprecipitation assay. Compensating mutations in the gene
encoding the second species in the complex can then be selected.
Sequence analysis of the genes encoding the respective proteins
will reveal the mutations that correspond to the region of the
protein involved in interactive binding. Alternatively, one protein
can be anchored to a solid surface using methods described in this
section above, and allowed to interact with and bind to its labeled
binding partner, which has been treated with a proteolytic enzyme,
such as trypsin. After washing, a short, labeled peptide comprising
the binding domain may remain associated with the solid material,
which can be isolated and identified by amino acid sequencing.
Also, once the gene coding for the binding partner is obtained,
short gene segments can be engineered to express peptide fragments
of the protein, which can then be tested for binding activity and
purified or synthesized.
[0312] For example, and not by way of limitation, a chemotherapy
response protein can be anchored to a solid material as described
in this section, above, by making a GST-chemotherapy response
protein fusion protein and allowing it to bind to glutathione
agarose beads. The interactive binding partner can be labeled with
a radioactive isotope, such as .sup.35S, and cleaved with a
proteolytic enzyme such as trypsin. Cleavage products can then be
added to the anchored GST-chemotherapy response protein fusion
protein and allowed to bind. After washing away unbound peptides,
labeled bound material, representing the binding partner binding
domain, can be eluted, purified, and analyzed for amino acid
sequence by well-known methods. Peptides so identified can be
produced synthetically or fused to appropriate facilitative
proteins using recombinant DNA technology.
[0313] Some chemotherapy response proteins are kinases. Kinase
activity of a chemotherapy response protein can be assayed in vitro
using a synthetic peptide substrate of a chemotherapy response
protein of interest, e.g., a GSK-derived biotinylated peptide
substrate. The phosphopeptide product is quantitated using a
Homogeneous Time-Resolved Fluorescence (HTRF) assay system (Park et
al., 1999, Anal. Biochem. 269:94-104). The reaction mixture
contains suitable amounts of ATP, peptide substrate, and the
chemotherapy response protein. The peptide substrate has a suitable
amino acid sequence and is biotinylated at the N-terminus. The
kinase reaction is incubated, and then terminated with
Stop/Detection Buffer and GSK3.alpha. anti-phosphoserine antibody
(e.g., Cell Signaling Technologies, Beverly, Mass.; Cat #9338)
labeled with europium-chelate (e.g., from Perkin Elmer, Boston,
Mass.). The reaction is allowed to equilibrate, and relative
fluorescent units are determined. Inhibitor compounds are assayed
in the reaction described above, to determine compound IC50s. A
particular compound is added in a half-log dilution series
covering a suitable range of concentrations, e.g., from 1 nM to 100
Relative phospho substrate formation, read as HTRF fluorescence
units, is measured over the range of compound concentrations and a
titration curve generated using a four parameter sigmoidal fit.
Specific compounds having IC.sub.50 below a predetermined threshold
value, e.g., .ltoreq.50 .mu.M against a substrate, can be
identified.
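The half-log dilution series and the four parameter sigmoidal model mentioned above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the number of dilution points, the model parameters, and the function names are hypothetical. The sketch evaluates the model and locates the midpoint by interpolation on a log scale; it does not perform the nonlinear least-squares fit that would be used on real data.

```python
import math


def half_log_series(start_nm, n_points):
    """Concentrations (nM) rising by a factor of 10**0.5 per step."""
    return [start_nm * 10 ** (0.5 * i) for i in range(n_points)]


def four_pl(conc, bottom, top, ic50, hill):
    """Four parameter sigmoidal response at concentration conc."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def interpolate_ic50(concs, signals):
    """Estimate IC50 as the concentration at which the signal crosses the
    midpoint between the largest and smallest observed signal."""
    mid = (max(signals) + min(signals)) / 2.0
    for i in range(len(concs) - 1):
        s0, s1 = signals[i], signals[i + 1]
        if s0 != s1 and (s0 - mid) * (s1 - mid) <= 0:
            frac = (s0 - mid) / (s0 - s1)
            log_c = math.log10(concs[i]) + frac * (
                math.log10(concs[i + 1]) - math.log10(concs[i]))
            return 10 ** log_c
    raise ValueError("signal never crosses the midpoint")


# Hypothetical series: nine points starting at 1 nM (upper limit assumed).
concs = half_log_series(1.0, 9)
# Simulated HTRF units from an assumed curve with a true IC50 of 300 nM.
signals = [four_pl(c, bottom=100.0, top=1000.0, ic50=300.0, hill=1.0)
           for c in concs]
estimate = interpolate_ic50(concs, signals)
```

Because the simulated curve does not fully plateau within the tested range, the interpolated estimate falls slightly below the true midpoint. This is one reason a full four parameter fit is preferred over interpolation.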
[0314] The extent of peptide phosphorylation can be determined by
Homogeneous Time Resolved Fluorescence (HTRF) using a lanthanide
chelate (Lance)-coupled monoclonal antibody specific for the
phosphopeptide in combination with a streptavidin-linked
allophycocyanin (SA-APC) fluorophore which binds to the biotin
moiety on the peptide. When the Lance and APC are in proximity
(i.e. bound to the same phosphopeptide molecule), a non-radiative
energy transfer takes place from the Lance to the APC, followed by
emission of light from APC at 665 nm. The assay can be run using
various assay formats, e.g., streptavidin flash plate assay,
streptavidin filter plate assay.
[0315] A standard PKA assay can be used to assay the activity of
protein kinase A (PKA). A standard PKC assay can be used to assay
the activity of protein kinase C (PKC). The most common methods for
assaying PKA or PKC activity involve measuring the transfer of
.sup.32P-labeled phosphate to a protein or peptide substrate that
can be captured on phosphocellulose filters via weak electrostatic
interactions.
[0316] Kinase inhibitors can be identified using fluorescence
polarization to monitor kinase activity. This assay utilizes
GST-chemotherapy response protein, peptide substrate, peptide
substrate tracer, an anti-phospho monoclonal IgG, and the inhibitor
compound. Reactions are incubated for a period of time and then
terminated. Stopped reactions are incubated and fluorescence
polarization values determined.
[0317] In a specific embodiment, a standard SPA Filtration Assay
and FlashPlate.RTM. Kinase Assay can be used to measure the
activity of a chemotherapy response protein. In these assays,
GST-chemotherapy response protein, biotinylated peptide substrate,
ATP, and .sup.33P-.gamma.-ATP are allowed to react. After a
suitable period of incubation, the reactions are terminated. In a
SPA Filtration Assay, peptide substrate is allowed to bind
scintillation proximity assay (SPA) beads (Amersham Biosciences),
followed by filtration on a Packard GF/B Unifilter plate and washing
with phosphate buffered saline. Dried plates are sealed and the
amount of .sup.33P incorporated into the peptide substrate is
determined. In a FlashPlate.RTM. Kinase Assay, a suitable amount of
the reaction is transferred to streptavidin-coated FlashPlates.RTM.
(NEN) and incubated. Plates are washed, dried, sealed and the
amount of .sup.33P incorporated into the peptide substrate is
determined.
[0318] A standard DELFIA.RTM. Kinase Assay can also be used. In a
DELFIA.RTM. Kinase Assay, GST-chemotherapy response protein,
peptide substrate, and ATP are allowed to react. After the
reactions are terminated, the biotin-peptide substrates are
captured in the stopped reactions. Wells are washed and reacted
with anti-phospho polyclonal antibody and europium labeled
anti-rabbit-IgG. Wells are washed and europium released from the
bound antibody is detected.
[0319] Other assays, such as those described in WO 04/080973, WO
02/070494, and WO 03/101444, may also be utilized to determine
biological activity of the instant compounds.
5.5.2. Screening Compounds that Modulate Expression or Activity of
a Gene and/or its Products
[0320] For chemotherapy response genes that are kinases, inhibitor
compounds can be assayed for their ability to inhibit a
chemotherapy response protein by monitoring the phosphorylation or
autophosphorylation in response to the compound. Cells are grown in
culture medium. Cells are pooled, counted, seeded into 6-well
dishes at 200,000 cells per well in 2 ml media, and incubated.
Serial dilution series of compounds or control are added to each
well and incubated. Following the incubation period, each well is
washed and Protease Inhibitor Cocktail Complete is added to each
well. Lysates are then transferred to microcentrifuge tubes and
frozen at -80.degree. C. Lysates are thawed on ice and cleared by
centrifugation and the supernatants are transferred to clean tubes.
Samples are electrophoresed and proteins are transferred onto PVDF.
Blots are then blocked and probed using an antibody against
phospho-serine or phospho-threonine. Bound antibody is visualized
using a horseradish peroxidase conjugated secondary antibody and
enhanced chemiluminescence. After stripping of the first antibody
set, blots are re-probed for total chemotherapy response protein,
using a monoclonal antibody specific for the chemotherapy response
protein. The chemotherapy response protein monoclonal is detected
using a sheep anti-mouse IgG coupled to horseradish peroxidase and
enhanced chemiluminescence. ECL exposed films are scanned and the
intensity of specific bands is quantitated. Titrations are
evaluated for level of phospho-Ser signal normalized to total
chemotherapy response protein and IC50 values are calculated.
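The normalization step described above, in which the phospho-specific band intensity is divided by the total protein band intensity for the same lane, can be illustrated as follows. The band intensities, the choice of the first lane as the vehicle control, and the function name are assumptions made for this sketch only.

```python
def normalized_phospho(phospho, total, control_index=0):
    """Phospho/total ratio per lane, expressed relative to the control lane.

    phospho       -- scanned band intensities from the phospho-specific probe
    total         -- band intensities from the total-protein re-probe
    control_index -- lane treated with vehicle only (assumed to be lane 0)
    """
    ratios = [p / t for p, t in zip(phospho, total)]
    reference = ratios[control_index]
    return [r / reference for r in ratios]


# Hypothetical densitometry values for a four-lane titration.
phospho_bands = [1000.0, 800.0, 400.0, 100.0]
total_bands = [500.0, 500.0, 400.0, 500.0]
relative_signal = normalized_phospho(phospho_bands, total_bands)
```

In this hypothetical titration the third lane retains half of the control phospho signal once loading differences are taken into account, so the IC50 would lie near the concentration applied to that lane.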
[0321] Detection of phosphonucleolin in cell lysates can be carried
out using biotinylated anti-nucleolin antibody and ruthenylated
goat anti-mouse antibody. To each well of a 96-well plate is added
biotinylated anti-nucleolin antibody and streptavidin coated
paramagnetic beads, along with a suitable cell lysate. The
antibodies and lysate are incubated. Next, an
anti-phosphonucleolin antibody is added to each well of the lysate
mix and incubated. Lastly, the ruthenylated goat anti-mouse
antibody in antibody buffer is added to each well and incubated.
The lysate antibody mixtures are read and EC50s for compound
dependent increases in phospho-nucleolin are determined.
[0322] The compounds identified in the screen include compounds
that demonstrate the ability to selectively modulate the expression
or activity of a chemotherapy response gene or its encoded protein.
These compounds include but are not limited to siRNA, antisense
nucleic acid, ribozyme, triple helix forming nucleic acid,
antibody, and polypeptide molecules, aptamers, and small organic or
inorganic molecules.
5.6. Methods of Performing RNA Interference
[0323] Any method known in the art for gene silencing can be used
in the present invention (see, e.g., Guo et al., 1995, Cell
81:611-620; Fire et al., 1998, Nature 391:806-811; Grant, 1999,
Cell 96:303-306; Tabara et al., 1999, Cell 99:123-132; Zamore et
al., 2000, Cell 101:25-33; Bass, 2000, Cell 101:235-238; Petcherski
et al., 2000, Nature 405:364-368; Elbashir et al., Nature
411:494-498; Paddison et al., Proc. Natl. Acad. Sci. USA
99:1443-1448). The siRNAs targeting a gene can be designed
according to methods known in the art (see, e.g., International
Application Publication No. WO 2005/018534, published on Mar. 3,
2005, and Elbashir et al., 2002, Methods 26:199-213, each of which
is incorporated herein by reference in its entirety).
[0324] An siRNA having only partial sequence homology to a target
gene can also be used (see, e.g., International Application
Publication No. WO 2005/018534, published on Mar. 3, 2005, which is
incorporated herein by reference in its entirety). In one
embodiment, an siRNA that comprises a sense strand contiguous
nucleotide sequence of 11-18 nucleotides identical to a sequence of
a transcript of a gene, but that does not have full length homology
to any sequence in the transcript, is used to silence the gene.
Preferably, the contiguous nucleotide sequence is
in the central region of the siRNA molecules. A contiguous
nucleotide sequence in the central region of an siRNA can be any
continuous stretch of nucleotide sequence in the siRNA which does
not begin at the 3' end. For example, a contiguous nucleotide
sequence of 11 nucleotides can be the nucleotide sequence 2-12,
3-13, 4-14, 5-15, 6-16, 7-17, 8-18, or 9-19. In preferred
embodiments, the contiguous nucleotide sequence is 11-16, 11-15,
14-15, 11, 12, or 13 nucleotides in length.
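The window positions listed above (2-12 through 9-19 for an 11-nucleotide stretch) can be enumerated and checked against a transcript with a short Python sketch. The 19-nucleotide strand length, both sequences, and the function names are hypothetical and chosen only to illustrate the partial-identity criterion.

```python
def central_windows(strand_length=19, window=11):
    """1-based inclusive (start, end) windows, omitting the window that
    starts at position 1, as in the enumeration 2-12 ... 9-19."""
    return [(s, s + window - 1)
            for s in range(2, strand_length - window + 2)]


def matching_windows(sense, transcript, window=11):
    """Central windows of the sense strand found verbatim in the transcript."""
    hits = []
    for start, end in central_windows(len(sense), window):
        if sense[start - 1:end] in transcript:
            hits.append((start, end))
    return hits


# Hypothetical 19-nt sense strand and a transcript fragment that shares
# only positions 5-15 with it.
sense_strand = "GCAUGCUAGCUAGGAUCCA"
transcript = "AAAAGCUAGCUAGGACCCC"

windows = central_windows()
hits = matching_windows(sense_strand, transcript)
full_length_match = sense_strand in transcript
```

Here the siRNA has an 11-nucleotide central stretch identical to the transcript without having full length identity to it, which is the situation the paragraph describes.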
[0325] In another embodiment, an siRNA that comprises a 3' sense
strand contiguous nucleotide sequence of 8-18 nucleotides which is
identical to a sequence of a transcript of a gene but which siRNA
does not have full length sequence identity to any contiguous
sequences in the transcript is used to silence the gene. In this
application, a 3' 8-18 nucleotide sequence is a continuous stretch
of nucleotides that begins at the first paired base, i.e., it does
not comprise the two base 3' overhang. Thus, when it is stated that
a particular nucleotide sequence is at the 3' end of the siRNA, the
2 base overhang is not considered. In preferred embodiments, the
contiguous nucleotide sequence is 8-16, 8-15, 8-12, 11, 10, 9, or 8
nucleotides in length.
[0326] An siRNA having only partial sequence homology to its target
genes is especially useful for silencing a plurality of different
genes in a cell. In one embodiment, an siRNA is used to silence a
plurality of different genes, wherein the transcript of each of the genes
comprises a nucleotide sequence that is identical to a central
contiguous nucleotide sequence of at least 11 nucleotides of the
sense strand or the antisense strand of the siRNA, and/or comprises
a nucleotide sequence that is identical to a contiguous nucleotide
sequence of at least 9 nucleotides at the 3' end of the sense
strand or the antisense strand of the siRNA. In preferred
embodiments, the central contiguous nucleotide sequence is 11-15,
14-15, 11, 12, or 13 nucleotides in length. In other preferred
embodiments, the 3' contiguous nucleotide sequence is 8-15, 8-12,
11, 10, 9, or 8 nucleotides in length.
[0327] In one embodiment, in vitro siRNA transfection is carried
out as follows: one day prior to transfection, 100 microliters of
chosen cells, e.g., cervical cancer HeLa cells (ATCC, Cat. No.
CCL-2), grown in DMEM/10% fetal bovine serum (Invitrogen, Carlsbad,
Calif.) to approximately 90% confluency are seeded in a 96-well
tissue culture plate (Corning, Corning, N.Y.) at 1500 cells/well.
For each transfection 85 microliters of OptiMEM (Invitrogen) is
mixed with 5 microliters of serially diluted siRNA (Dharmacon,
Denver) from a 20 micromolar stock. For each transfection 5
microliters of OptiMEM is mixed with 5 microliters of Oligofectamine
reagent (Invitrogen) and incubated 5 minutes at room temperature.
The 10 microliter OptiMEM/Oligofectamine mixture is dispensed into
each tube with the OptiMEM/siRNA mixture, mixed and incubated 15-20
minutes at room temperature. Ten microliters of the transfection
mixture is aliquoted into each well of the 96-well plate and
incubated for 4 hours at 37.degree. C. and 5% CO.sub.2.
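The volumes given in the preceding paragraph determine the final siRNA concentration in each well. The following Python sketch works through that arithmetic. It assumes that each well still holds the 100 microliters seeded the previous day and that volumes are additive; the function name and parameter names are hypothetical.

```python
def final_sirna_nm(dilution_nm, sirna_ul=5.0, optimem_ul=85.0,
                   lipid_mix_ul=10.0, aliquot_ul=10.0, well_ul=100.0):
    """Final siRNA concentration (nM) in the well.

    dilution_nm  -- concentration of the serially diluted siRNA added
    lipid_mix_ul -- OptiMEM/Oligofectamine mixture (5 + 5 microliters)
    well_ul      -- medium already in the well (assumed 100 microliters)
    """
    mix_volume = sirna_ul + optimem_ul + lipid_mix_ul      # 100 microliters
    mix_conc = dilution_nm * sirna_ul / mix_volume          # 20-fold dilution
    return mix_conc * aliquot_ul / (well_ul + aliquot_ul)   # 11-fold dilution


# Undiluted 20 micromolar stock, expressed in nM.
top_dose_nm = final_sirna_nm(20000.0)
```

Under these assumptions the undiluted 20 micromolar stock gives about 91 nM in the well, and each step of the serial dilution scales that value proportionally.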
[0328] In preferred embodiments, an siRNA pool containing at least
k (k=2, 3, 4, 5, 6 or 10) different siRNAs targeting the secondary
target gene at different sequence regions is used to transfect the
cells. In another preferred embodiment, an siRNA pool containing at
least k (k=2, 3, 4, 5, 6 or 10) different siRNAs targeting two or
more different target genes is used to transfect the cells.
[0329] In a preferred embodiment, the total siRNA concentration of
the pool is about the same as the concentration of a single siRNA
when used individually, e.g., 100 nM. Preferably, the total
concentration of the pool of siRNAs is an optimal concentration for
silencing the intended target gene. An optimal concentration is a
concentration beyond which a further increase does not increase the
level of silencing substantially. In one embodiment, the optimal
concentration is a concentration beyond which a further increase
does not increase the level of silencing by more than 5%, 10% or 20%. In a
preferred embodiment, the composition of the pool, including the
number of different siRNAs in the pool and the concentration of
each different siRNA, is chosen such that the pool of siRNAs causes
less than 30%, 20%, 10%, 5%, 1%, 0.1% or 0.01% of silencing of
any off-target genes (e.g., as determined by standard nucleic acid
assay, e.g., PCR). In another preferred embodiment, the
concentration of each different siRNA in the pool of different
siRNAs is about the same. In still another preferred embodiment,
the respective concentrations of different siRNAs in the pool are
different from each other by less than 5%, 10%, 20% or 50% of the
concentration of any one siRNA or said total siRNA concentration of
said different siRNAs. In still another preferred embodiment, at
least one siRNA in the pool of different siRNAs constitutes more
than 90%, 80%, 70%, 50%, or 20% of the total siRNA concentration in
the pool. In still another preferred embodiment, none of the siRNAs
in the pool of different siRNAs constitutes more than 90%, 80%,
70%, 50%, or 20% of the total siRNA concentration in the pool. In
other embodiments, each siRNA in the pool has a concentration that
is lower than the optimal concentration when used individually. In
a preferred embodiment, each different siRNA in the pool has a
concentration that is lower than the concentration of the siRNA
that is effective to achieve at least 30%, 50%, 75%, 80%, 85%, 90%
or 95% silencing when used in the absence of other siRNAs or in the
absence of other siRNAs designed to silence the gene. In another
preferred embodiment, each different siRNA in the pool has a
concentration that causes less than 30%, 20%, 10% or 5% of
silencing of the gene when used in the absence of other siRNAs or
in the absence of other siRNAs designed to silence the gene. In a
preferred embodiment, each siRNA has a concentration that causes
less than 30%, 20%, 10% or 5% of silencing of the target gene when
used alone, while the plurality of siRNAs causes at least 80% or
90% of silencing of the target gene.
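Several of the pool-composition criteria above are simple arithmetic checks on the individual siRNA concentrations. The following Python sketch illustrates three of them: an equal split of a fixed total, the largest share held by any one siRNA, and the spread between concentrations. The function names and the example pools are hypothetical.

```python
def equal_pool(total_nm, k):
    """Split a total siRNA concentration equally among k siRNAs."""
    return [total_nm / k for _ in range(k)]


def largest_share(concs_nm):
    """Fraction of the total pool concentration held by one siRNA."""
    return max(concs_nm) / sum(concs_nm)


def spread_within(concs_nm, percent):
    """True if the highest and lowest concentrations differ by less than
    the given percentage of the lowest concentration."""
    return (max(concs_nm) - min(concs_nm)) < (percent / 100.0) * min(concs_nm)


# Four siRNAs sharing a total of 100 nM, as in the example total above.
pool = equal_pool(100.0, 4)
share = largest_share(pool)
tight = spread_within([24.0, 25.0, 26.0, 25.0], 10)
loose = spread_within([20.0, 30.0, 25.0, 25.0], 10)
```

With four siRNAs at 25 nM each, no single siRNA exceeds 25% of the pool, which satisfies the embodiment in which none of the siRNAs constitutes more than 50% of the total concentration.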
[0330] Another method for gene silencing is to introduce an shRNA,
for short hairpin RNA (see, e.g., Paddison et al., 2002, Genes Dev.
16, 948-958; Brummelkamp et al., 2002, Science 296, 550-553; Sui,
G. et al. 2002, Proc. Natl. Acad. Sci. USA 99, 5515-5520, all of
which are incorporated by reference herein in their entirety),
which can be processed in the cells into siRNA. In this method, a
desired siRNA sequence is expressed from a plasmid (or virus) as an
inverted repeat with an intervening loop sequence to form a hairpin
structure. The resulting RNA transcript containing the hairpin is
subsequently processed by Dicer to produce siRNAs for silencing.
Plasmid-based shRNAs can be expressed stably in cells, allowing
long-term gene silencing in cells both in vitro and in vivo, e.g.,
in animals (see, McCaffrey et al. 2002, Nature 418, 38-39; Xia et
al., 2002, Nat. Biotech. 20, 1006-1010; Lewis et al., 2002, Nat.
Genetics 32, 107-108; Rubinson et al., 2003, Nat. Genetics 33,
401-406; Tiscornia et al., 2003, Proc. Natl. Acad. Sci. USA 100,
1844-1848, all of which are incorporated by reference herein in
their entirety). Thus, in one embodiment, a plasmid-based shRNA is
used.
[0331] In a preferred embodiment, shRNAs are expressed from
recombinant vectors introduced either transiently or stably
integrated into the genome (see, e.g., Paddison et al., 2002, Genes
Dev 16:948-958; Sui et al., 2002, Proc Natl Acad Sci USA
99:5515-5520; Yu et al., 2002, Proc Natl Acad Sci USA 99:6047-6052;
Miyagishi et al., 2002, Nat Biotechnol 20:497-500; Paul et al.,
2002, Nat Biotechnol 20:505-508; Kwak et al., 2003, J Pharmacol Sci
93:214-217; Brummelkamp et al., 2002, Science 296:550-553; Boden et
al., 2003, Nucleic Acids Res 31:5033-5038; Kawasaki et al., 2003,
Nucleic Acids Res 31:700-707). The siRNA that disrupts the target
gene can be expressed (via an shRNA) by any suitable vector which
encodes the shRNA. The vector can also encode a marker which can be
used for selecting clones in which the vector or a sufficient
portion thereof is integrated in the host genome such that the
shRNA is expressed. Any standard method known in the art can be
used to deliver the vector into the cells. In one embodiment, cells
expressing the shRNA are generated by transfecting suitable cells
with a plasmid containing the vector. Cells can then be selected by
the appropriate marker. Clones are then picked, and tested for
knockdown. In a preferred embodiment, the expression of the shRNA
is under the control of an inducible promoter such that the
silencing of its target gene can be turned on when desired.
Inducible expression of an siRNA is particularly useful for
targeting essential genes.
[0332] In one embodiment, the expression of the shRNA is under the
control of a regulated promoter that allows tuning of the silencing
level of the target gene. This allows screening against cells in
which the target gene is partially knocked out. As used herein, a
"regulated promoter" refers to a promoter that can be activated
when an appropriate inducing agent is present. An "inducing agent"
can be any molecule that can be used to activate transcription by
activating the regulated promoter. An inducing agent can be, but is
not limited to, a peptide or polypeptide, a hormone, or an organic
small molecule. An analogue of an inducing agent, i.e., a molecule
that activates the regulated promoter as the inducing agent does,
can also be used. The level of activity of the regulated promoter
induced by different analogues may be different, thus allowing more
flexibility in tuning the activity level of the regulated promoter.
The regulated promoter in the vector can be any mammalian
transcription regulation system known in the art (see, e.g., Gossen
et al, 1995, Science 268:1766-1769; Lucas et al, 1992, Annu. Rev.
Biochem. 61:1131; Li et al., 1996, Cell 85:319-329; Saez et al.,
2000, Proc. Natl. Acad. Sci. USA 97:14512-14517; and Pollock et
al., 2000, Proc. Natl. Acad. Sci. USA 97:13221-13226). In preferred
embodiments, the regulated promoter is regulated in a dosage and/or
analogue dependent manner. In one embodiment, the level of activity
of the regulated promoter is tuned to a desired level by a method
comprising adjusting the concentration of the inducing agent to
which the regulated promoter is responsive. The desired level of
activity of the regulated promoter, as obtained by applying a
particular concentration of the inducing agent, can be determined
based on the desired silencing level of the target gene.
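The dose-dependent tuning described in this paragraph can be illustrated with a minimal sketch. It assumes, purely for illustration, that promoter activity follows a Hill-type dose-response curve; the function names and the parameters ec50 and hill_n are hypothetical and are not taken from the cited references. In practice the curve would be measured for the particular promoter and inducing agent.

```python
def promoter_activity(conc, ec50, hill_n=1.0):
    """Fractional promoter activity (0..1) under an assumed Hill model.

    conc   -- inducing agent concentration (same units as ec50)
    ec50   -- concentration giving half-maximal activity (hypothetical)
    hill_n -- Hill coefficient (hypothetical)
    """
    if conc < 0 or ec50 <= 0 or hill_n <= 0:
        raise ValueError("conc must be >= 0; ec50 and hill_n must be > 0")
    if conc == 0:
        return 0.0
    return conc ** hill_n / (ec50 ** hill_n + conc ** hill_n)


def inducer_for_activity(target, ec50, hill_n=1.0):
    """Inducer concentration that yields the target fractional activity."""
    if not 0 < target < 1:
        raise ValueError("target must lie strictly between 0 and 1")
    # Algebraic inverse of the Hill equation used above.
    return ec50 * (target / (1 - target)) ** (1 / hill_n)


if __name__ == "__main__":
    # Concentration needed for 25% of maximal promoter activity.
    print(inducer_for_activity(0.25, ec50=2.0, hill_n=1.5))
```

Under these assumptions, a desired silencing level would first be mapped to a promoter activity level and then to an inducer concentration; both mappings would need to be calibrated experimentally.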
[0333] In one embodiment, a tetracycline regulated gene expression
system is used (see, e.g., Gossen et al, 1995, Science
268:1766-1769; U.S. Pat. No. 6,004,941). A tet regulated system
utilizes components of the tet repressor/operator/inducer system of
prokaryotes to regulate gene expression in eukaryotic cells. Thus,
the invention provides methods for using the tet regulatory system
for regulating the expression of an shRNA linked to one or more tet
operator sequences. The methods involve introducing into a cell a
vector encoding a fusion protein that activates transcription. The
fusion protein comprises a first polypeptide that binds to a tet
operator sequence in the presence of tetracycline or a tetracycline
analogue operatively linked to a second polypeptide that activates
transcription in cells. By modulating the concentration of a
tetracycline, or a tetracycline analogue, expression of the tet
operator-linked shRNA is regulated.
[0334] In other embodiments, an ecdysone regulated gene expression
system (see, e.g., Saez et al., 2000, Proc. Natl. Acad. Sci. USA
97:14512-14517), or an MMTV glucocorticoid response element
regulated gene expression system (see, e.g., Lucas et al, 1992,
Annu. Rev. Biochem. 61:1131) may be used to regulate the expression
of the shRNA.
[0335] In one embodiment, the pRETRO-SUPER (pRS) vector which
encodes a puromycin-resistance marker and drives shRNA expression
from an H1 (RNA Pol III) promoter is used. The pRS-shRNA plasmid
can be generated by any standard method known in the art. In one
embodiment, the pRS-shRNA is deconvoluted from the library plasmid
pool for a chosen gene by transforming bacteria with the pool and
looking for clones containing only the plasmid of interest.
Preferably, a 19 mer siRNA sequence is used along with suitable
forward and reverse primers for sequence specific PCR. Plasmids are
identified by sequence specific PCR, and confirmed by sequencing.
Cells expressing the shRNA are generated by transfecting suitable
cells with the pRS-shRNA plasmid. Cells are selected by the
appropriate marker, e.g., puromycin, and maintained until colonies
are evident. Clones are then picked, and tested for knockdown. In
another embodiment, an shRNA is expressed by a plasmid, e.g., a
pRS-shRNA. The knockdown by the pRS-shRNA plasmid can be achieved
by transfecting cells using Lipofectamine 2000 (Invitrogen).
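As an illustration of how a 19 mer target sequence relates to an shRNA-encoding insert, the sketch below builds a sense-loop-antisense hairpin template. It is a sketch under stated assumptions: the 9-nucleotide loop shown is the spacer commonly associated with pSUPER-type vectors and is assumed here rather than specified by this document, the example 19 mer is arbitrary, and cloning overhangs and the Pol III terminator are omitted.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    return seq.translate(_COMPLEMENT)[::-1]


def hairpin_insert(target_19mer, loop="TTCAAGAGA"):
    """Sense + loop + antisense template for a hypothetical shRNA insert.

    The loop default is an assumption for illustration only.
    """
    if len(target_19mer) != 19:
        raise ValueError("expected a 19-nucleotide target sequence")
    sense = target_19mer.upper()
    return sense + loop + reverse_complement(sense)


if __name__ == "__main__":
    example = "GATCCGTTAGCATCGGATA"  # arbitrary 19 mer, not a real target
    print(hairpin_insert(example))
```

The same 19 mer could serve as the basis for a sequence specific PCR primer when deconvoluting a plasmid pool, although primer design in practice would also account for melting temperature and flanking vector sequence.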
[0336] In yet another method, siRNAs can be delivered to an organ
or tissue in an animal, such as a human, in vivo (see, e.g., Song et
al. 2003, Nat. Medicine 9, 347-351; Sorensen et al., 2003, J. Mol.
Biol. 327, 761-766; Lewis et al., 2002, Nat. Genetics 32, 107-108,
all of which are incorporated by reference herein in their
entirety). In this method, a solution of siRNA is injected
intravenously into the animal. The siRNA can then reach an organ or
tissue of interest and effectively reduce the expression of the
target gene in the organ or tissue of the animal.
5.7. Production of CR Proteins and Peptides
[0337] Chemotherapy response proteins, or peptide fragments
thereof, can be prepared for uses according to the present
invention. For example, chemotherapy response proteins, or peptide
fragments thereof, can be used for the generation of antibodies, in
diagnostic assays, for screening of inhibitors, or for the
identification of other cellular gene products involved in the
regulation of expression and/or activity of a chemotherapy response
gene.
[0338] The chemotherapy response proteins or peptide fragments
thereof, may be produced by recombinant DNA technology using
techniques well known in the art. The amino acid sequences of the
chemotherapy response proteins are well-known and can be obtained
from, e.g., GenBank®. Methods which are well known to those
skilled in the art can be used to construct expression vectors
containing chemotherapy response protein coding sequences and
appropriate transcriptional and translational control signals.
These methods include, for example, in vitro recombinant DNA
techniques, synthetic techniques, and in vivo genetic
recombination. See, for example, the techniques described in
Sambrook et al., 1989, supra, and Ausubel et al., 1989, supra.
Alternatively, RNA capable of encoding chemotherapy response
protein sequences may be chemically synthesized using, for example,
synthesizers. See, for example, the techniques described in
"Oligonucleotide Synthesis", 1984, Gait, M. J. ed., IRL Press,
Oxford, which is incorporated herein by reference in its
entirety.
[0339] A variety of host-expression vector systems may be utilized
to express the chemotherapy response gene coding sequences. Such
host-expression systems represent vehicles by which the coding
sequences of interest may be produced and subsequently purified,
but also represent cells which may, when transformed or transfected
with the appropriate nucleotide coding sequences, exhibit the
chemotherapy response protein in situ. These include but are not
limited to microorganisms such as bacteria (e.g., E. coli, B.
subtilis) transformed with recombinant bacteriophage DNA, plasmid
DNA or cosmid DNA expression vectors containing chemotherapy
response protein coding sequences; yeast (e.g., Saccharomyces,
Pichia) transformed with recombinant yeast expression vectors
containing the chemotherapy response protein coding sequences;
insect cell systems infected with recombinant virus expression
vectors (e.g., baculovirus) containing the chemotherapy response
protein coding sequences; plant cell systems infected with
recombinant virus expression vectors (e.g., cauliflower mosaic
virus, CaMV; tobacco mosaic virus, TMV) or transformed with
recombinant plasmid expression vectors (e.g., Ti plasmid)
containing chemotherapy response protein coding sequences; or
mammalian cell systems (e.g., COS, CHO, BHK, 293, 3T3, N2a)
harboring recombinant expression constructs containing promoters
derived from the genome of mammalian cells (e.g., metallothionein
promoter) or from mammalian viruses (e.g., the adenovirus late
promoter; the vaccinia virus 7.5K promoter).
[0340] In bacterial systems, a number of expression vectors may be
advantageously selected depending upon the use intended for the
chemotherapy response protein being expressed. For example, when a
large quantity of such a protein is to be produced, for the
generation of pharmaceutical compositions of chemotherapy response
protein or for raising antibodies to chemotherapy response protein,
for example, vectors which direct the expression of high levels of
fusion protein products that are readily purified may be desirable.
Such vectors include, but are not limited to, the E. coli
expression vector pUR278 (Ruther et al., 1983, EMBO J. 2:1791), in
which the chemotherapy response protein coding sequence may be
ligated individually into the vector in frame with the lac Z coding
region so that a fusion protein is produced; pIN vectors (Inouye
& Inouye, 1985, Nucleic Acids Res. 13:3101-3109; Van Heeke
& Schuster, 1989, J. Biol. Chem. 264:5503-5509); and the like.
pGEX vectors may also be used to express foreign polypeptides as
fusion proteins with glutathione S-transferase (GST). In general,
such fusion proteins are soluble and can easily be purified from
lysed cells by adsorption to glutathione-agarose beads followed by
elution in the presence of free glutathione. The pGEX vectors are
designed to include thrombin or factor Xa protease cleavage sites
so that the cloned target gene product can be released from the GST
moiety.
[0341] In an insect system, Autographa californica nuclear
polyhedrosis virus (AcNPV) is used as a vector to express foreign
genes. The virus grows in Spodoptera frugiperda cells. The
chemotherapy response gene coding sequence may be cloned
individually into non-essential regions (for example the polyhedrin
gene) of the virus and placed under control of an AcNPV promoter
(for example the polyhedrin promoter). Successful insertion of
chemotherapy response gene coding sequence will result in
inactivation of the polyhedrin gene and production of non-occluded
recombinant virus (i.e., virus lacking the proteinaceous coat coded
for by the polyhedrin gene). These recombinant viruses are then
used to infect Spodoptera frugiperda cells in which the inserted
gene is expressed. (E.g., see Smith et al., 1983, J. Virol. 46:
584; Smith, U.S. Pat. No. 4,215,051).
[0342] In mammalian host cells, a number of viral-based expression
systems may be utilized. In cases where an adenovirus is used as an
expression vector, the chemotherapy response gene coding sequence
of interest may be ligated to an adenovirus
transcription/translation control complex, e.g., the late promoter
and tripartite leader sequence. This chimeric gene may then be
inserted in the adenovirus genome by in vitro or in vivo
recombination. Insertion in a non-essential region of the viral
genome (e.g., region E1 or E3) will result in a recombinant virus
that is viable and capable of expressing chemotherapy response
protein in infected hosts. (E.g., see Logan & Shenk, 1984,
Proc. Natl. Acad. Sci. USA 81:3655-3659). Specific initiation
signals may also be required for efficient translation of inserted
chemotherapy response protein coding sequences. These signals
include the ATG initiation codon and adjacent sequences. In cases
where an entire chemotherapy response gene, including its own
initiation codon and adjacent sequences, is inserted into the
appropriate expression vector, no additional translational control
signals may be needed. However, in cases where only a portion of
the chemotherapy response gene coding sequence is inserted,
exogenous translational control signals, including, perhaps, the
ATG initiation codon, must be provided. Furthermore, the initiation
codon must be in phase with the reading frame of the desired coding
sequence to ensure translation of the entire insert. These
exogenous translational control signals and initiation codons can
be of a variety of origins, both natural and synthetic. The
efficiency of expression may be enhanced by the inclusion of
appropriate transcription enhancer elements, transcription
terminators, etc. (see Bittner et al., 1987, Methods in Enzymol.
153:516-544).
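The requirement that the initiation codon be in phase with the reading frame of the coding sequence can be checked computationally before a construct is built. The following is a minimal sketch; the function names, the 0-based coordinate convention, and the example construct are hypothetical and serve only to illustrate the check.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame(atg_pos, cds_start):
    """True if an ATG at atg_pos is in phase with a coding sequence
    whose first codon begins at cds_start (0-based positions)."""
    return (cds_start - atg_pos) % 3 == 0


def reads_through(construct, atg_pos, insert_end):
    """True if translation starting at atg_pos reaches insert_end
    (exclusive) without encountering a stop codon."""
    construct = construct.upper()
    if construct[atg_pos:atg_pos + 3] != "ATG":
        return False
    # Walk codon by codon from the initiation codon.
    for i in range(atg_pos, insert_end - 2, 3):
        if construct[i:i + 3] in STOP_CODONS:
            return False
    return True


if __name__ == "__main__":
    construct = "CCATGGCAGCTTAAGG"  # arbitrary example sequence
    print(is_in_frame(2, 5), reads_through(construct, 2, 11))
```

A construct that fails either check would need the exogenous initiation codon repositioned by one or two nucleotides, or the insert boundaries adjusted.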
[0343] In addition, a host cell strain may be chosen which
modulates the expression of the inserted sequences, or modifies and
processes the gene product in the specific fashion desired. Such
modifications (e.g., glycosylation) and processing (e.g., cleavage)
of protein products may be important for the function of the
protein. Different host cells have characteristic and specific
mechanisms for the post-translational processing and modification
of proteins and gene products. Appropriate cell lines or host
systems can be chosen to ensure the correct modification and
processing of the foreign protein expressed. To this end,
eukaryotic host cells which possess the cellular machinery for
proper processing of the primary transcript, glycosylation, and
phosphorylation of the gene product may be used. Such mammalian
host cells include but are not limited to CHO, VERO, BHK, HeLa,
COS, MDCK, 293, 3T3, WI38.
[0344] For long-term, high-yield production of recombinant
proteins, stable expression is preferred. For example, cell lines
which stably express the chemotherapy response protein may be
engineered. Rather than using expression vectors which contain
viral origins of replication, host cells can be transformed with
DNA controlled by appropriate expression control elements (e.g.,
promoter, enhancer sequences, transcription terminators,
polyadenylation sites, etc.), and a selectable marker. Following
the introduction of the foreign DNA, engineered cells may be
allowed to grow for 1-2 days in an enriched medium, and then are
switched to a selective medium. The selectable marker in the
recombinant plasmid confers resistance to the selection and allows
cells to stably integrate the plasmid into their chromosomes and
grow to form foci which in turn can be cloned and expanded into
cell lines. This method may advantageously be used to engineer cell
lines which express the chemotherapy response protein. Such
engineered cell lines may be particularly useful in screening and
evaluation of compounds that affect the endogenous activity of the
chemotherapy response protein.
[0345] In another embodiment, the expression characteristics of an
endogenous gene (e.g., a chemotherapy response gene) within a cell,
cell line or microorganism may be modified by inserting a DNA
regulatory element heterologous to the endogenous gene of interest
into the genome of a cell, stable cell line or cloned microorganism
such that the inserted regulatory element is operatively linked
with the endogenous gene (e.g., a chemotherapy response gene) and
controls, modulates, activates, or inhibits the endogenous gene.
For example, endogenous chemotherapy response genes which are
normally "transcriptionally silent", i.e., a chemotherapy response
gene which is normally not expressed, or is expressed only at very
low levels in a cell line or microorganism, may be activated by
inserting a regulatory element which is capable of promoting the
expression of the gene product in that cell line or microorganism.
Alternatively, transcriptionally silent, endogenous chemotherapy
response genes may be activated by insertion of a promiscuous
regulatory element that works across cell types.
[0346] A heterologous regulatory element may be inserted into a
stable cell line or cloned microorganism, such that it is
operatively linked with and activates or inhibits expression of
endogenous chemotherapy response genes, using techniques, such as
targeted homologous recombination, which are well known to those of
skill in the art, and described, e.g., in Chappel, U.S. Pat. No.
5,272,071; PCT Publication No. WO 91/06667 published May 16, 1991;
Skoultchi, U.S. Pat. No. 5,981,214; and Treco et al., U.S. Pat. No.
5,968,502 and PCT Publication No. WO 94/12650 published Jun. 9,
1994. Alternatively, non-targeted, e.g., non-homologous
recombination techniques may be used which are well-known to those
of skill in the art and described, e.g., in PCT Publication No. WO
99/15650 published Apr. 1, 1999.
[0347] Chemotherapy response gene activation (or inactivation) may
also be accomplished with designer transcription factors using
techniques well known in the art. Briefly, a designer zinc finger
protein transcription factor (ZFP-TF) is made which is specific for
a regulatory region of the chemotherapy response gene to be
activated or inactivated. A construct encoding this designer ZFP-TF
is then provided to a host cell in which the chemotherapy response
gene is to be controlled. The construct directs the expression of
the designer ZFP-TF protein, which in turn specifically modulates
the expression of the endogenous chemotherapy response gene. The
following references relate to various aspects of this approach in
further detail: Wang & Pabo, 1999, Proc. Natl. Acad. Sci. USA
96, 9568; Berg, 1997, Nature Biotechnol. 15, 323; Greisman &
Pabo, 1997, Science 275, 657; Berg & Shi, 1996, Science 271,
1081; Rebar & Pabo, 1994, Science 263, 671; Rhodes & Klug,
1993, Scientific American 269, 56; Pavletich & Pabo, 1991,
Science 252, 809; Liu et al., 2001, J. Biol. Chem. 276, 11323;
Zhang et al., 2000, J. Biol. Chem. 275, 33850; Beerli et al., 2000,
Proc. Natl. Acad. Sci. USA 97, 1495; Kang et al., 2000, J. Biol.
Chem. 275, 8742; Beerli et al., 1998, Proc. Natl. Acad. Sci. USA
95, 14628; Kim & Pabo, 1998, Proc. Natl. Acad. Sci. USA 95,
2812; Choo et al., 1997, J. Mol. Biol. 273, 525; Kim & Pabo,
1997, J. Biol. Chem. 272, 29795; Liu et al, 1997, Proc. Natl. Acad.
Sci. USA 94, 5525; Kim et al, 1997, Proc. Natl. Acad. Sci. USA 94,
3616; Kikyo et al., 2000, Science 289, 2360; Robertson &
Wolffe, 2000, Nature Reviews 1, 11; and Gregory, 2001, Curr. Opin.
Genet. Dev. 11:142.
[0348] A number of selection systems may be used, including but not
limited to the herpes simplex virus thymidine kinase (Wigler, et
al., 1977, Cell 11:223), hypoxanthine-guanine phosphoribosyl
transferase (Szybalska & Szybalski, 1962, Proc. Natl. Acad.
Sci. USA 48:2026), and adenine phosphoribosyl transferase (Lowy, et
al., 1980, Cell 22:817) genes, which can be employed in tk⁻,
hgprt⁻ or aprt⁻ cells, respectively. Also, antimetabolite
resistance can be used as the basis of selection for the following
genes: dhfr, which confers resistance to methotrexate (Wigler, et
al., 1980, Proc. Natl. Acad. Sci. USA 77:3567; O'Hare, et al., 1981,
Proc. Natl. Acad. Sci. USA 78:1527); gpt, which confers resistance
to mycophenolic acid (Mulligan & Berg, 1981, Proc. Natl. Acad.
Sci. USA 78:2072); neo, which confers resistance to the
aminoglycoside G-418 (Colberre-Garapin, et al., 1981, J. Mol. Biol.
150:1); and hygro, which confers resistance to hygromycin
(Santerre, et al., 1984, Gene 30:147).
[0349] Alternatively, any fusion protein may be readily purified by
utilizing an antibody specific for the fusion protein being
expressed. For example, a system described by Janknecht et al.
allows for the ready purification of non-denatured fusion proteins
expressed in human cell lines (Janknecht, et al., 1991, Proc. Natl.
Acad. Sci. USA 88: 8972-8976). In this system, the gene of interest
is subcloned into a vaccinia recombination plasmid such that the
gene's open reading frame is translationally fused to an
amino-terminal tag consisting of six histidine residues. Extracts
from cells infected with recombinant vaccinia virus are loaded onto
Ni²⁺·nitrilotriacetic acid-agarose columns and
histidine-tagged proteins are selectively eluted with
imidazole-containing buffers.
[0350] In a specific embodiment, recombinant human chemotherapy
response proteins can be expressed as a fusion protein with
glutathione S-transferase at the amino-terminus (GST-chemotherapy
response protein) using standard baculovirus vectors and a
(Bac-to-Bac®) insect cell expression system purchased from
GIBCO™ Invitrogen. Recombinant protein expressed in insect cells
can be purified on glutathione sepharose (Amersham Biotech)
following standard procedures described by the manufacturer.
5.8. Production of Antibodies that Bind a CR Protein
[0351] Chemotherapy response protein or a fragment thereof can be
used to raise antibodies which bind chemotherapy response protein.
Such antibodies include but are not limited to polyclonal,
monoclonal, chimeric, and single chain antibodies, Fab fragments, and
fragments produced by an Fab expression library. In a preferred
embodiment, anti-chemotherapy response protein C-terminal antibodies
are raised using an appropriate C-terminal fragment of a chemotherapy
response protein, e.g., the kinase domain. Such antibodies bind the
kinase domain of the chemotherapy response protein. In another
preferred embodiment, anti-chemotherapy response protein N-terminal
antibodies are raised using an appropriate N-terminal fragment of a
chemotherapy response protein. The N-terminal domain of a
chemotherapy response protein is less homologous to other kinases,
and therefore offers a more specific target for a particular
chemotherapy response protein.
5.8.1. Production of Monoclonal Antibodies Specific for a CR
Protein
[0352] Antibodies can be prepared by immunizing a suitable subject
with a chemotherapy response protein or a fragment thereof as an
immunogen. The antibody titer in the immunized subject can be
monitored over time by standard techniques, such as with an enzyme
linked immunosorbent assay (ELISA) using immobilized polypeptide.
If desired, the antibody molecules can be isolated from the mammal
(e.g., from the blood) and further purified by well-known
techniques, such as protein A chromatography to obtain the IgG
fraction.
[0353] At an appropriate time after immunization, e.g., when the
specific antibody titers are highest, antibody-producing cells can
be obtained from the subject and used to prepare monoclonal
antibodies by standard techniques, such as the hybridoma technique
originally described by Kohler and Milstein (1975, Nature
256:495-497), the human B cell hybridoma technique by Kozbor et al.
(1983, Immunol. Today 4:72), the EBV-hybridoma technique by Cole et
al. (1985, Monoclonal Antibodies and Cancer Therapy, Alan R. Liss,
Inc., pp. 77-96) or trioma techniques. The technology for producing
hybridomas is well known (see Current Protocols in Immunology,
1994, John Wiley & Sons, Inc., New York, N.Y.). Hybridoma cells
producing a monoclonal antibody are detected by screening the
hybridoma culture supernatants for antibodies that bind the
polypeptide of interest, e.g., using a standard ELISA assay.
[0354] Monoclonal antibodies are obtained from a population of
substantially homogeneous antibodies, i.e., the individual
antibodies comprising the population are identical except for
possible naturally occurring mutations that may be present in minor
amounts. Thus, the modifier "monoclonal" indicates the character of
the antibody as not being a mixture of discrete antibodies. For
example, the monoclonal antibodies may be made using the hybridoma
method first described by Kohler et al., 1975, Nature, 256:495, or
may be made by recombinant DNA methods (U.S. Pat. No. 4,816,567).
The term "monoclonal antibody" as used herein also indicates that
the antibody is an immunoglobulin.
[0355] In the hybridoma method of generating monoclonal antibodies,
a mouse or other appropriate host animal, such as a hamster, is
immunized as hereinabove described to elicit lymphocytes that
produce or are capable of producing antibodies that will
specifically bind to the protein used for immunization (see, e.g.,
U.S. Pat. No. 5,914,112, which is incorporated herein by reference
in its entirety).
[0356] Alternatively, lymphocytes may be immunized in vitro.
Lymphocytes then are fused with myeloma cells using a suitable
fusing agent, such as polyethylene glycol, to form a hybridoma cell
(Goding, Monoclonal Antibodies: Principles and Practice, pp. 59-103
(Academic Press, 1986)). The hybridoma cells thus prepared are
seeded and grown in a suitable culture medium that preferably
contains one or more substances that inhibit the growth or survival
of the unfused, parental myeloma cells. For example, if the
parental myeloma cells lack the enzyme hypoxanthine guanine
phosphoribosyl transferase (HGPRT or HPRT), the culture medium for
the hybridomas typically will include hypoxanthine, aminopterin,
and thymidine (HAT medium), which substances prevent the growth of
HGPRT-deficient cells.
[0357] Preferred myeloma cells are those that fuse efficiently,
support stable high-level production of antibody by the selected
antibody-producing cells, and are sensitive to a medium such as HAT
medium. Among these, preferred myeloma cell lines are murine
myeloma lines, such as those derived from MOPC-21 and MPC-11 mouse
tumors available from the Salk Institute Cell Distribution Center,
San Diego, Calif. USA, and SP-2 cells available from the American
Type Culture Collection, Rockville, Md. USA.
[0358] Human myeloma and mouse-human heteromyeloma cell lines also
have been described for the production of human monoclonal
antibodies (Kozbor, 1984, J. Immunol., 133:3001; Brodeur et al.,
Monoclonal Antibody Production Techniques and Applications, pp.
51-63 (Marcel Dekker, Inc., New York, 1987)). Culture medium in
which hybridoma cells are growing is assayed for production of
monoclonal antibodies directed against the antigen. Preferably, the
binding specificity of monoclonal antibodies produced by hybridoma
cells is determined by immunoprecipitation or by an in vitro
binding assay, such as radioimmunoassay (RIA) or enzyme-linked
immunosorbent assay (ELISA). The binding affinity of the
monoclonal antibody can, for example, be determined by the
Scatchard analysis of Munson et al., 1980, Anal. Biochem.,
107:220.
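For orientation, the classic linear Scatchard transform can be sketched as follows. It relies on the relation B/F = (Bmax - B)/Kd for a single class of binding sites, so that a straight-line fit of B/F against B has slope -1/Kd and intercept Bmax/Kd. This is a simplified illustration, not a reimplementation of the analysis in the cited reference; the function name and the synthetic data are hypothetical.

```python
def scatchard_fit(bound, free):
    """Estimate (Kd, Bmax) from paired bound and free ligand concentrations
    by ordinary least squares on the Scatchard transform (B/F versus B)."""
    if len(bound) != len(free) or len(bound) < 2:
        raise ValueError("need at least two paired measurements")
    x = list(bound)
    y = [b / f for b, f in zip(bound, free)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                 # equals -1/Kd
    intercept = mean_y - slope * mean_x  # equals Bmax/Kd
    kd = -1.0 / slope
    bmax = intercept * kd
    return kd, bmax


if __name__ == "__main__":
    # Synthetic, noise-free data generated with Kd = 2.0 and Bmax = 10.0.
    free = [0.5, 1.0, 2.0, 4.0, 8.0]
    bound = [10.0 * f / (2.0 + f) for f in free]
    print(scatchard_fit(bound, free))
```

With real, noisy binding data the linear transform distorts the error structure, which is one reason weighted nonlinear fitting is often preferred for determining affinity.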
[0359] After hybridoma cells are identified that produce antibodies
of the desired specificity, affinity, and/or activity, the clones
may be subcloned by limiting dilution procedures and grown by
standard methods (Goding, Monoclonal Antibodies: Principles and
Practice, pp. 59-103, Academic Press, 1986). Suitable culture media
for this purpose include, for example, D-MEM or RPMI-1640 medium.
In addition, the hybridoma cells may be grown in vivo as ascites
tumors in an animal. The monoclonal antibodies secreted by the
subclones are suitably separated from the culture medium, ascites
fluid, or serum by conventional immunoglobulin purification
procedures such as, for example, protein A-Sepharose,
hydroxylapatite chromatography, gel electrophoresis, dialysis, or
affinity chromatography.
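When subcloning by limiting dilution, the seeding density determines how likely it is that a growing well arose from a single cell. A minimal sketch, assuming that cells are distributed among wells according to a Poisson distribution and that every seeded cell grows (both simplifying assumptions; the function name is hypothetical):

```python
import math


def prob_monoclonal(mean_cells_per_well):
    """Probability that a well showing growth was seeded with exactly one
    cell, given Poisson-distributed seeding at the stated mean density."""
    lam = mean_cells_per_well
    if lam <= 0:
        raise ValueError("mean_cells_per_well must be positive")
    p_exactly_one = lam * math.exp(-lam)
    p_at_least_one = 1.0 - math.exp(-lam)
    return p_exactly_one / p_at_least_one


if __name__ == "__main__":
    for density in (0.3, 0.5, 1.0):
        print(density, prob_monoclonal(density))
```

Lower seeding densities raise the probability of monoclonality at the cost of more empty wells, which is why subcloning is often repeated for more than one round.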
[0360] As an alternative to preparing monoclonal antibody-secreting
hybridomas, a monoclonal antibody directed against a chemotherapy
response protein or a fragment thereof can be identified and
isolated by screening a recombinant combinatorial immunoglobulin
library (e.g., an antibody phage display library) with the
chemotherapy response protein or the fragment. Kits for generating
and screening phage display libraries are commercially available
(e.g., Pharmacia Recombinant Phage Antibody System, Catalog No.
27-9400-01; and the Stratagene antigen SurfZAP.TM. Phage Display
Kit, Catalog No. 240612). Additionally, examples of methods and
reagents particularly amenable for use in generating and screening
an antibody display library can be found in, for example, U.S. Pat.
Nos. 5,223,409 and 5,514,548; PCT Publication No. WO 92/18619; PCT
Publication No. WO 91/17271; PCT Publication No. WO 92/20791; PCT
Publication No. WO 92/15679; PCT Publication No. WO 93/01288; PCT
Publication No. WO 92/01047; PCT Publication No. WO 92/09690; PCT
Publication No. WO 90/02809; Fuchs et al., 1991, Bio/Technology
9:1370-1372; Hay et al., 1992, Hum. Antibod. Hybridomas 3:81-85;
Huse et al., 1989, Science 246:1275-1281; Griffiths et al., 1993,
EMBO J. 12:725-734.
[0361] In addition, techniques developed for the production of
"chimeric antibodies" (Morrison, et al., 1984, Proc. Natl. Acad.
Sci., 81, 6851-6855; Neuberger, et al., 1984, Nature 312, 604-608;
Takeda, et al., 1985, Nature, 314, 452-454) by splicing the genes
from a mouse antibody molecule of appropriate antigen specificity
together with genes from a human antibody molecule of appropriate
biological activity can be used. A chimeric antibody is a molecule
in which different portions are derived from different animal
species, such as those having a variable region derived from a
murine mAb and a human immunoglobulin constant region. (See, e.g.,
Cabilly et al., U.S. Pat. No. 4,816,567; and Boss et al., U.S. Pat.
No. 4,816,397, which are incorporated herein by reference in their
entirety.)
[0362] Humanized antibodies are antibody molecules from non-human
species having one or more complementarity determining regions
(CDRs) from the non-human species and a framework region from a
human immunoglobulin molecule. (See e.g., U.S. Pat. No. 5,585,089,
which is incorporated herein by reference in its entirety.) Such
chimeric and humanized monoclonal antibodies can be produced by
recombinant DNA techniques known in the art, for example using
methods described in PCT Publication No. WO 87/02671; European
Patent Application 184,187; European Patent Application 171,496;
European Patent Application 173,494; PCT Publication No. WO
86/01533; U.S. Pat. Nos. 4,816,567 and 5,225,539; European Patent
Application 125,023; Better et al., 1988, Science 240:1041-1043;
Liu et al., 1987, Proc. Natl. Acad. Sci. USA 84:3439-3443; Liu et
al., 1987, J. Immunol. 139:3521-3526; Sun et al., 1987, Proc. Natl.
Acad. Sci. USA 84:214-218; Nishimura et al., 1987, Canc. Res.
47:999-1005; Wood et al., 1985, Nature 314:446-449; Shaw et al.,
1988, J. Natl. Cancer Inst. 80:1553-1559; Morrison 1985, Science
229:1202-1207; Oi et al., 1986, Bio/Techniques 4:214; Jones et al.,
1986, Nature 321:522-525; Verhoeyen et al., 1988, Science 239:1534;
and Beidler et al., 1988, J. Immunol. 141:4053-4060.
[0363] Complementarity determining region (CDR) grafting is another
method of humanizing antibodies. It involves reshaping murine
antibodies in order to transfer full antigen specificity and
binding affinity to a human framework (Winter et al. U.S. Pat. No.
5,225,539). CDR-grafted antibodies have been successfully
constructed against various antigens, for example, antibodies
against IL-2 receptor as described in Queen et al., 1989 (Proc.
Natl. Acad. Sci. USA 86:10029); antibodies against cell surface
receptors (CAMPATH) as described in Riechmann et al. (1988, Nature
332:323); antibodies against hepatitis B in Cole et al. (1991, Proc.
Natl. Acad. Sci. USA 88:2869); as well as against viral
antigens (respiratory syncytial virus) in Tempest et al. (1991,
Bio/Technology 9:267). CDR-grafted antibodies are generated in
which the CDRs of the murine monoclonal antibody are grafted into a
human antibody. Following grafting, most antibodies benefit from
additional amino acid changes in the framework region to maintain
affinity, presumably because framework residues are necessary to
maintain CDR conformation, and some framework residues have been
demonstrated to be part of the antigen binding site. However, in
order to preserve the framework region so as not to introduce any
antigenic site, the sequence is compared with established germline
sequences followed by computer modeling.
[0364] Completely human antibodies are particularly desirable for
therapeutic treatment of human patients. Such antibodies can be
produced using transgenic mice which are incapable of expressing
endogenous immunoglobulin heavy and light chain genes, but which
can express human heavy and light chain genes. The transgenic mice
are immunized in the normal fashion with a chemotherapy response
protein.
[0365] Monoclonal antibodies directed against a chemotherapy
response protein can be obtained using conventional hybridoma
technology. The human immunoglobulin transgenes harbored by the
transgenic mice rearrange during B cell differentiation, and
subsequently undergo class switching and somatic mutation. Thus,
using such a technique, it is possible to produce therapeutically
useful IgG, IgA and IgE antibodies. For an overview of this
technology for producing human antibodies, see Lonberg and Huszar
(1995, Int. Rev. Immunol. 13:65-93). For a detailed discussion of
this technology for producing human antibodies and human monoclonal
antibodies and protocols for producing such antibodies, see e.g.,
U.S. Pat. No. 5,625,126; U.S. Pat. No. 5,633,425; U.S. Pat. No.
5,569,825; U.S. Pat. No. 5,661,016; and U.S. Pat. No. 5,545,806. In
addition, companies such as Abgenix, Inc. (Fremont, Calif., see,
for example, U.S. Pat. No. 5,985,615) and Medarex, Inc. (Princeton,
N.J.), can be engaged to provide human antibodies directed against
a chemotherapy response protein or a fragment thereof using
technology similar to that described above.
[0366] Completely human antibodies which recognize and bind a
selected epitope can be generated using a technique referred to as
"guided selection." In this approach a selected non-human
monoclonal antibody, e.g., a mouse antibody, is used to guide the
selection of a completely human antibody recognizing the same
epitope (Jespers et al., 1994, Bio/Technology 12:899-903).
[0367] A pre-existing anti-chemotherapy response protein antibody
can be used to isolate additional antigens of the chemotherapy
response protein by standard techniques, such as affinity
chromatography or immunoprecipitation for use as immunogens.
Moreover, such an antibody can be used to detect the protein (e.g.,
in a cellular lysate or cell supernatant) in order to evaluate the
abundance and pattern of expression of chemotherapy response
protein. Detection can be facilitated by coupling the antibody to a
detectable substance. Examples of detectable substances include
various enzymes, prosthetic groups, fluorescent materials,
luminescent materials, bioluminescent materials, and radioactive
materials. Examples of suitable enzymes include horseradish
peroxidase, alkaline phosphatase, beta-galactosidase, or
acetylcholinesterase; examples of suitable prosthetic group
complexes include streptavidin/biotin and avidin/biotin; examples
of suitable fluorescent materials include umbelliferone,
fluorescein, fluorescein isothiocyanate, rhodamine,
dichlorotriazinylamine fluorescein, dansyl chloride or
phycoerythrin; an example of a luminescent material is
luminol; examples of bioluminescent materials include luciferase,
luciferin, and aequorin; and examples of suitable radioactive
materials include .sup.125I, .sup.131I, .sup.35S or .sup.3H.
5.8.2. Production of Polyclonal Anti-CR Protein Antibodies
[0368] The anti-chemotherapy response protein antibodies can be
produced by immunization of a suitable animal, such as, but not
limited to, a mouse, rabbit, or horse.
[0369] An immunogenic preparation comprising a chemotherapy
response protein or a fragment thereof can be used to prepare
antibodies by immunizing a suitable subject (e.g., rabbit, goat,
mouse or other mammal). An appropriate immunogenic preparation can
contain, for example, recombinantly expressed or chemically
synthesized chemotherapy response protein peptide or polypeptide.
The preparation can further include an adjuvant, such as Freund's
complete or incomplete adjuvant, or similar immunostimulatory
agent.
[0370] A fragment of a chemotherapy response protein suitable for
use as an immunogen comprises at least a portion of the
chemotherapy response protein that is 8 amino acids, more
preferably 10 amino acids and more preferably still, 15 amino acids
long.
[0371] The invention also provides chimeric or fusion chemotherapy
response protein polypeptides for use as immunogens. As used
herein, a "chimeric" or "fusion" chemotherapy response protein
polypeptide comprises all or part of a chemotherapy response
protein polypeptide operably linked to a heterologous polypeptide.
Within the fusion chemotherapy response protein polypeptide, the
term "operably linked" is intended to indicate that the
chemotherapy response protein polypeptide and the heterologous
polypeptide are fused in-frame to each other. The heterologous
polypeptide can be fused to the N-terminus or C-terminus of the
chemotherapy response protein polypeptide.
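As an illustrative sketch only, and not part of the disclosed methods, the
in-frame requirement can be checked computationally: the upstream coding
sequence (minus any terminal stop codon) plus any linker must span a whole
number of codons before the downstream coding sequence begins. The function
name fuse_in_frame and the short sequences used with it are hypothetical.

```python
# Illustrative sketch; the function name and any sequences passed in are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(upstream_cds: str, downstream_cds: str, linker: str = "") -> str:
    """Join two DNA coding sequences so the downstream one stays in frame."""
    upstream = upstream_cds.upper()
    # Remove a terminal stop codon so translation can continue into the fusion partner.
    if len(upstream) >= 3 and upstream[-3:] in STOP_CODONS:
        upstream = upstream[:-3]
    leader = upstream + linker.upper()
    # The downstream reading frame is preserved only if the leader
    # (upstream sequence plus linker) is a whole number of codons.
    if len(leader) % 3 != 0:
        raise ValueError("fusion junction is out of frame")
    return leader + downstream_cds.upper()
```

In this sketch the heterologous sequence may be passed as either argument,
corresponding to fusion at the N-terminus or the C-terminus.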
[0372] One useful fusion chemotherapy response protein polypeptide
is a GST fusion chemotherapy response protein polypeptide in which
the chemotherapy response protein polypeptide is fused to the
C-terminus of GST sequences. Such fusion chemotherapy response
protein polypeptides can facilitate the purification of a
recombinant chemotherapy response protein polypeptide.
[0373] In another embodiment, the fusion chemotherapy response
protein polypeptide contains a heterologous signal sequence at its
N-terminus so that the chemotherapy response protein polypeptide
can be secreted and purified to high homogeneity in order to
produce high affinity antibodies. For example, the native signal
sequence of an immunogen can be removed and replaced with a signal
sequence from another protein. For example, the gp67 secretory
sequence of the baculovirus envelope protein can be used as a
heterologous signal sequence (Current Protocols in Molecular
Biology, Ausubel et al., eds., John Wiley & Sons, 1992). Other
examples of eukaryotic heterologous signal sequences include the
secretory sequences of melittin and human placental alkaline
phosphatase (Stratagene; La Jolla, Calif.). In yet another example,
useful prokaryotic heterologous signal sequences include the phoA
secretory signal and the protein A secretory signal (Pharmacia
Biotech; Piscataway, N.J.).
[0374] In yet another embodiment, the fusion chemotherapy response
protein polypeptide is an immunoglobulin fusion protein in which
all or part of a chemotherapy response protein polypeptide is fused
to sequences derived from a member of the immunoglobulin protein
family. The immunoglobulin fusion proteins can be used as
immunogens to produce antibodies directed against the chemotherapy
response protein polypeptide in a subject.
[0375] Chimeric and fusion chemotherapy response protein
polypeptides can be produced by standard recombinant DNA techniques.
In one embodiment, the fusion gene can be synthesized by
conventional techniques including automated DNA synthesizers.
Alternatively, PCR amplification of gene fragments can be carried
out using anchor primers which give rise to complementary overhangs
between two consecutive gene fragments which can subsequently be
annealed and re-amplified to generate a chimeric gene sequence
(e.g., Ausubel et al., supra). Moreover, many expression vectors
are commercially available that already encode a fusion domain
(e.g., a GST polypeptide). A nucleic acid encoding an immunogen can
be cloned into such an expression vector such that the fusion
domain is linked in-frame to the polypeptide.
[0376] The chemotherapy response protein immunogenic preparation is
then used to immunize a suitable animal. Preferably, the animal is
a specialized transgenic animal that can secrete human antibodies.
Non-limiting examples include transgenic mouse strains which can be
used to produce a polyclonal population of antibodies directed to a
specific pathogen (Fishwild et al., 1996, Nature Biotechnology
14:845-851; Mendez et al., 1997, Nature Genetics 15:146-156). In
one embodiment of the invention, transgenic mice that harbor the
unrearranged human immunoglobulin genes are immunized with the
target immunogens. After a vigorous immune response against the
immunogenic preparation has been elicited in the mice, blood
samples of the mice are collected and a purified preparation of
human IgG molecules can be produced from the plasma or serum. Any
method known in the art can be used to obtain the purified
preparation of human IgG molecules, including but not limited to
affinity column chromatography using anti-human IgG antibodies
bound to a suitable column matrix. Anti-human IgG antibodies can be
obtained from any sources known in the art, e.g., from commercial
sources such as Dako Corporation and ICN. The preparation of IgG
molecules produced comprises a polyclonal population of IgG
molecules that bind to the immunogen or immunogens with different
degrees of affinity. Preferably, a substantial fraction of the
preparation contains IgG molecules specific to the immunogen or
immunogens. Although polyclonal preparations of IgG molecules are
described, it is understood that polyclonal preparations comprising
any one type or any combination of different types of
immunoglobulin molecules are also envisioned and are intended to be
within the scope of the present invention.
[0377] A population of antibodies directed to a chemotherapy
response protein can be produced from a phage display library.
Polyclonal antibodies can be obtained by affinity screening of a
phage display library having a sufficiently large and diverse
population of specificities with a chemotherapy response protein or
a fragment thereof. Examples of methods and reagents particularly
amenable for use in generating and screening antibody display
libraries can be found in, for example, U.S. Patent Nos. 5,223,409
and 5,514,548; PCT Publication No. WO 92/18619; PCT Publication No.
WO 91/17271; PCT Publication No. WO 92/20791; PCT Publication No.
WO 92/15679; PCT Publication No. WO 93/01288; PCT Publication No.
WO 92/01047; PCT Publication No. WO 92/09690; PCT Publication No.
WO 90/02809; Fuchs et al., 1991, Bio/Technology 9:1370-1372; Hay et
al., 1992, Hum. Antibod. Hybridomas 3:81-85; Huse et al., 1989,
Science 246:1275-1281; Griffiths et al., 1993, EMBO J. 12:725-734.
A phage display library permits selection of desired antibody or
antibodies from a very large population of specificities. An
additional advantage of a phage display library is that the nucleic
acids encoding the selected antibodies can be obtained
conveniently, thereby facilitating subsequent construction of
expression vectors.
[0378] In other preferred embodiments, the population of antibodies
directed to a chemotherapy response protein or a fragment thereof
is produced by a method using the whole collection of selected
displayed antibodies without clonal isolation of individual members
as described in U.S. Pat. No. 6,057,098, which is incorporated by
reference herein in its entirety. Polyclonal antibodies are
obtained by affinity screening of a phage display library having a
sufficiently large repertoire of specificities with, e.g., an
antigenic molecule having multiple epitopes, preferably after
enrichment of displayed library members that display multiple
antibodies. The nucleic acids encoding the selected display
antibodies are excised and amplified using suitable PCR primers.
The nucleic acids can be purified by gel electrophoresis such that
the full length nucleic acids are isolated. Each of the nucleic
acids is then inserted into a suitable expression vector such that
a population of expression vectors having different inserts is
obtained. The population of expression vectors is then expressed in
a suitable host.
5.8.3. Production of Peptides
[0379] A peptide or polypeptide that binds a chemotherapy response
protein, or a peptide or polypeptide of a chemotherapy response
protein, may be produced by recombinant DNA technology using
techniques well known in the art. Thus, the polypeptide or peptide
can be produced by expressing nucleic acid containing sequences
encoding the polypeptide or peptide. Methods which are well known
to those skilled in the art can be used to construct expression
vectors containing coding sequences and appropriate transcriptional
and translational control signals. These methods include, for
example, in vitro recombinant DNA techniques, synthetic techniques,
and in vivo genetic recombination. See, for example, the techniques
described in Sambrook et al., 1989, supra, and Ausubel et al.,
1989, supra. Alternatively, RNA capable of encoding chemotherapy
response protein polypeptide sequences may be chemically
synthesized using, for example, synthesizers. See, for example, the
techniques described in "Oligonucleotide Synthesis", 1984, Gait, M.
J. ed., IRL Press, Oxford, which is incorporated herein by
reference in its entirety.
5.9. Chemotherapeutic Drugs
[0380] The invention can be practiced with any known
chemotherapeutic drugs, including but not limited to DNA damaging
agents, anti-metabolites, anti-mitotic agents, or a combination of
two or more of such known anti-cancer agents.
[0381] DNA damaging agents cause chemical damage to DNA and/or RNA.
DNA damaging agents can disrupt DNA replication or cause the
generation of nonsense DNA or RNA. DNA damaging agents include but
are not limited to topoisomerase inhibitors, DNA binding agents, and
ionizing radiation. A topoisomerase inhibitor that can be used in
conjunction with the invention can be a topoisomerase I (Topo I)
inhibitor, a topoisomerase II (Topo II) inhibitor, or a dual
topoisomerase I and II inhibitor. A topo I inhibitor can be for
example from any of the following classes of compounds:
camptothecin analogue (e.g., karenitecin, aminocamptothecin,
lurtotecan, topotecan, irinotecan, BAY 56-3722, rubitecan, GI14721,
exatecan mesylate), rebeccamycin analogue, PNU 166148,
rebeccamycin, TAS-103, camptothecin (e.g., camptothecin
polyglutamate, camptothecin sodium), intoplicine, ecteinascidin
743, J-107088, pibenzimol. Examples of preferred topo I inhibitors
include but are not limited to camptothecin, topotecan
(hycaptamine), irinotecan (irinotecan hydrochloride), belotecan, or
an analogue or derivative of any of the foregoing.
[0382] A topo II inhibitor that can be used in conjunction with the
invention can be for example from any of the following classes of
compounds: anthracycline antibiotics (e.g., carubicin, pirarubicin,
daunorubicin citrate liposomal, daunomycin,
4-iodo-4-doxydoxorubicin, doxorubicin, n,n-dibenzyl daunomycin,
morpholinodoxorubicin, aclacinomycin antibiotics, duborimycin,
menogaril, nogalamycin, zorubicin, epirubicin, marcellomycin,
detorubicin, annamycin, 7-cyanoquinocarcinol, deoxydoxorubicin,
idarubicin, GPX-100, MEN-10755, valrubicin, KRN5500),
epipodophyllotoxin compound (e.g., podophyllin, teniposide,
etoposide, GL331, 2-ethylhydrazide), anthraquinone compound (e.g.,
ametantrone, bisantrene, mitoxantrone, anthraquinone),
ciprofloxacin, acridine carboxamide, amonafide, anthrapyrazole
antibiotics (e.g., teloxantrone, sedoxantrone trihydrochloride,
piroxantrone, anthrapyrazole, losoxantrone), TAS-103, fostriecin,
razoxane, XK469R, XK469, chloroquinoxaline sulfonamide, merbarone,
intoplicine, elsamitrucin, CI-921, pyrazoloacridine, elliptinium,
amsacrine. Examples of preferred topo II inhibitors include but are
not limited to doxorubicin (Adriamycin), etoposide phosphate
(etopofos), teniposide, sobuzoxane, or an analogue or derivative of
any of the foregoing.
[0383] DNA binding agents that can be used in conjunction with the
invention include but are not limited to a DNA groove binding
agent, e.g., DNA minor groove binding agent; DNA crosslinking
agent; intercalating agent; and DNA adduct forming agent. A DNA
minor groove binding agent can be an anthracycline antibiotic,
mitomycin antibiotic (e.g., porfiromycin, KW-2149, mitomycin B,
mitomycin A, mitomycin C), chromomycin A3, carzelesin, actinomycin
antibiotic (e.g., cactinomycin, dactinomycin, actinomycin F1),
brostallicin, echinomycin, bizelesin, duocarmycin antibiotic (e.g.,
KW 2189), adozelesin, olivomycin antibiotic, plicamycin,
zinostatin, distamycin, MS-247, ecteinascidin 743, amsacrine,
anthramycin, and pibenzimol, or an analogue or derivative of any of
the foregoing.
[0384] DNA crosslinking agents include but are not limited to
antineoplastic alkylating agent, methoxsalen, mitomycin antibiotic,
and psoralen. An antineoplastic alkylating agent can be a
nitrosourea compound (e.g., cystemustine, tauromustine, semustine,
PCNU, streptozocin, SarCNU, CGP-6809, carmustine, fotemustine,
methylnitrosourea, nimustine, ranimustine, ethylnitrosourea,
lomustine, chlorozotocin), mustard agent (e.g., nitrogen mustard
compound, such as spiromustine, trofosfamide, chlorambucil,
estramustine, 2,2,2-trichlorotriethylamine, prednimustine,
novembichin, phenamet, glufosfamide, peptichemio, ifosfamide,
defosfamide, nitrogen mustard, phenesterin, mannomustine,
cyclophosphamide, melphalan, perfosfamide, mechlorethamine oxide
hydrochloride, uracil mustard, bestrabucil, DHEA mustard,
tallimustine, mafosfamide, aniline mustard, chlornaphazine; sulfur
mustard compound, such as bischloroethylsulfide; mustard prodrug,
such as TLK286 and ZD2767), ethylenimine compound (e.g., mitomycin
antibiotic, ethylenimine, uredepa, thiotepa, diaziquone,
hexamethylene bisacetamide, pentamethylmelamine, altretamine,
carzinophilin, triaziquone, meturedepa, benzodepa, carboquone),
alkylsulfonate compound (e.g., dimethylbusulfan, Yoshi-864,
improsulfan, piposulfan, treosulfan, busulfan, hepsulfam), epoxide
compound (e.g., anaxirone, mitolactol, dianhydrogalactitol,
teroxirone), miscellaneous alkylating agent (e.g., ipomeanol,
carzelesin, methylene dimethane sulfonate, mitobronitol, bizelesin,
adozelesin, piperazinedione, VNP40101M, asaley,
6-hydroxymethylacylfulvene, E09, etoglucid, ecteinascidin 743,
pipobroman), platinum compound (e.g., ZD0473, liposomal-cisplatin
analogue, satraplatin, BBR 3464, spiroplatin, ormaplatin,
cisplatin, oxaliplatin, carboplatin, lobaplatin, zeniplatin,
iproplatin), triazene compound (e.g., imidazole mustard, CB10-277,
mitozolomide, temozolomide, procarbazine, dacarbazine), picoline
compound (e.g., penclomedine), or an analogue or derivative of any
of the foregoing. Examples of preferred alkylating agents include
but are not limited to cisplatin, dibromodulcitol, fotemustine,
ifosfamide (ifosfamid), ranimustine (ranomustine), nedaplatin
(latoplatin), bendamustine (bendamustine hydrochloride),
eptaplatin, temozolomide (methazolastone), carboplatin, altretamine
(hexamethylmelamine), prednimustine, oxaliplatin (oxalaplatinum),
carmustine, thiotepa, leusulfon (busulfan), lobaplatin,
cyclophosphamide, bisulfan, melphalan, and chlorambucil, or an
analogue or derivative of any of the foregoing.
[0385] Intercalating agents can be an anthraquinone compound,
bleomycin antibiotic, rebeccamycin analogue, acridine, acridine
carboxamide, amonafide, rebeccamycin, anthrapyrazole antibiotic,
echinomycin, psoralen, LU 79553, BW A773U, crisnatol mesylate,
benzo(a)pyrene-7,8-diol-9,10-epoxide, acodazole, elliptinium,
pixantrone, or an analogue or derivative of any of the
foregoing.
[0386] DNA adduct forming agents include but are not limited to
enediyne antitumor antibiotic (e.g., dynemicin A, esperamicin A1,
zinostatin, dynemicin, calicheamicin gamma 1I), platinum compound,
carmustine, tamoxifen (e.g., 4-hydroxy-tamoxifen), psoralen,
pyrazine diazohydroxide, benzo(a)pyrene-7,8-diol-9,10-epoxide, or
an analogue or derivative of any of the foregoing.
[0387] Anti-metabolites block the synthesis of nucleotides or
deoxyribonucleotides, which are necessary for making DNA, thereby
preventing cells from replicating. Anti-metabolites include but are
not limited to cytosine arabinoside, floxuridine, 5-fluorouracil
(5-FU), mercaptopurine, gemcitabine, hydroxyurea (HU), and
methotrexate (MTX).
[0388] Anti-mitotic agents disrupt the development of the mitotic
spindle thereby interfering with tumor cell proliferation.
Anti-mitotic agents include but are not limited to Vinblastine,
Vincristine, and Paclitaxel (Taxol). Anti-mitotic agents also
include agents that target the enzymes that regulate mitosis, e.g.,
agents that target kinesin spindle protein (KSP), e.g.,
L-001000962-000Y.
5.10. Pharmaceutical Formulations and Routes of Administration
[0389] The compounds that can be used to modulate the expression of
the chemotherapy response genes or the activity of their gene
products can be administered to a patient at effective doses. Such
an effective dose refers to that amount of the compound sufficient
to result in the desired change in the expression or activity level
of one or more CR genes and/or gene products thereof.
5.10.1. Effective Dose
[0390] Toxicity and therapeutic efficacy of such compounds can be
determined by standard pharmaceutical procedures in cell cultures
or experimental animals, e.g., for determining the LD.sub.50 (the
dose lethal to 50% of the population) and the ED.sub.50 (the dose
therapeutically effective in 50% of the population). The dose ratio
between toxic and therapeutic effects is the therapeutic index and
it can be expressed as the ratio LD.sub.50/ED.sub.50. Compounds
which exhibit large therapeutic indices are preferred. While
compounds that exhibit toxic side effects may be used, care should
be taken to design a delivery system that targets such compounds to
the site of affected tissue in order to minimize potential damage
to unaffected cells and, thereby, reduce side effects.
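As a minimal sketch of the ratio described above, the therapeutic index can
be computed directly from the two dose values. The function name
therapeutic_index is hypothetical, and both doses are assumed to be positive
and expressed in the same units.

```python
# Illustrative sketch; the function name is hypothetical.
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Return the therapeutic index LD50/ED50 for doses given in the same units."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    # A larger ratio means a wider margin between the effective and lethal doses.
    return ld50 / ed50
```

Under these assumptions, a compound with a higher returned value would be
preferred over one with a lower value, all else being equal.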
[0391] The data obtained from the cell culture assays and animal
studies can be used in formulating a range of dosage for use in
humans. The dosage of such compounds lies preferably within a range
of circulating concentrations that include the ED.sub.50 with
little or no toxicity. The dosage may vary within this range
depending upon the dosage form employed and the route of
administration utilized. For any compound used in the method of the
invention, the therapeutically effective dose can be estimated
initially from cell culture assays. A dose may be formulated in
animal models to achieve a circulating plasma concentration range
that includes the IC.sub.50 (i.e., the concentration of the test
compound which achieves a half-maximal inhibition of symptoms) as
determined in cell culture. Such information can be used to more
accurately determine useful doses in humans. Levels in plasma may
be measured, for example, by high performance liquid
chromatography.
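One simple way to estimate an IC.sub.50 from cell culture measurements is
sketched below, purely as an illustration. It assumes percent-inhibition
values measured at several positive concentrations and interpolates on a
log-concentration scale between the two points that bracket 50% inhibition.
The function name estimate_ic50 and the interpolation approach are
assumptions of this sketch; curve fitting to a dose-response model is a
common alternative.

```python
import math


# Illustrative sketch; the function name and interpolation method are assumptions.
def estimate_ic50(concentrations, inhibitions):
    """Estimate the concentration giving 50% inhibition by log-linear interpolation.

    concentrations: positive concentrations (any consistent unit).
    inhibitions: percent inhibition (0 to 100) measured at each concentration.
    """
    points = sorted(zip(concentrations, inhibitions))
    # Walk through adjacent pairs of points in order of increasing concentration.
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 50.0 <= i_hi and i_hi > i_lo:
            # Fraction of the way from the lower to the upper point at 50% inhibition.
            frac = (50.0 - i_lo) / (i_hi - i_lo)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    raise ValueError("50% inhibition is not bracketed by the data")
```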
5.10.2. Formulations and Use
[0392] Pharmaceutical compositions for use in accordance with the
present invention may be formulated in conventional manner using
one or more pharmaceutically acceptable carriers or excipients.
[0393] Thus, the compounds and their pharmaceutically acceptable
salts and solvates may be formulated for administration by
inhalation or insufflation (either through the mouth or the nose)
or oral, buccal, parenteral or rectal administration.
[0394] For oral administration, the pharmaceutical compositions may
take the form of, for example, tablets or capsules prepared by
conventional means with pharmaceutically acceptable excipients such
as binding agents (e.g., pregelatinised maize starch,
polyvinylpyrrolidone or hydroxypropyl methylcellulose); fillers
(e.g., lactose, microcrystalline cellulose or calcium hydrogen
phosphate); lubricants (e.g., magnesium stearate, talc or silica);
disintegrants (e.g., potato starch or sodium starch glycolate); or
wetting agents (e.g., sodium lauryl sulphate). The tablets may be
coated by methods well known in the art. Liquid preparations for
oral administration may take the form of, for example, solutions,
syrups or suspensions, or they may be presented as a dry product
for constitution with water or other suitable vehicle before use.
Such liquid preparations may be prepared by conventional means with
pharmaceutically acceptable additives such as suspending agents
(e.g., sorbitol syrup, cellulose derivatives or hydrogenated edible
fats); emulsifying agents (e.g., lecithin or acacia); non-aqueous
vehicles (e.g., almond oil, oily esters, ethyl alcohol or
fractionated vegetable oils); and preservatives (e.g., methyl or
propyl-p-hydroxybenzoates or sorbic acid). The preparations may
also contain buffer salts, flavoring, coloring and sweetening
agents as appropriate.
[0395] Preparations for oral administration may be suitably
formulated to give controlled release of the active compound.
[0396] For buccal administration the compositions may take the form
of tablets or lozenges formulated in conventional manner.
[0397] For administration by inhalation, the compounds for use
according to the present invention are conveniently delivered in
the form of an aerosol spray presentation from pressurized packs or
a nebuliser, with the use of a suitable propellant, e.g.,
dichlorodifluoromethane, trichlorofluoromethane,
dichlorotetrafluoroethane, carbon dioxide or other suitable gas. In
the case of a pressurized aerosol the dosage unit may be determined
by providing a valve to deliver a metered amount. Capsules and
cartridges of e.g. gelatin for use in an inhaler or insufflator may
be formulated containing a powder mix of the compound and a
suitable powder base such as lactose or starch.
[0398] The compounds may be formulated for parenteral
administration by injection, e.g., by bolus injection or continuous
infusion. Formulations for injection may be presented in unit
dosage form, e.g., in ampoules or in multi-dose containers, with an
added preservative. The compositions may take such forms as
suspensions, solutions or emulsions in oily or aqueous vehicles,
and may contain formulatory agents such as suspending, stabilizing
and/or dispersing agents. Alternatively, the active ingredient may
be in powder form for constitution with a suitable vehicle, e.g.,
sterile pyrogen-free water, before use.
[0399] The compounds may also be formulated in rectal compositions
such as suppositories or retention enemas, e.g., containing
conventional suppository bases such as cocoa butter or other
glycerides.
[0400] In addition to the formulations described previously, the
compounds may also be formulated as a depot preparation. Such long
acting formulations may be administered by implantation (for
example subcutaneously or intramuscularly) or by intramuscular
injection. Thus, for example, the compounds may be formulated with
suitable polymeric or hydrophobic materials (for example as an
emulsion in an acceptable oil) or ion exchange resins, or as
sparingly soluble derivatives, for example, as a sparingly soluble
salt.
[0401] The compositions may, if desired, be presented in a pack or
dispenser device which may contain one or more unit dosage forms
containing the active ingredient. The pack may for example comprise
metal or plastic foil, such as a blister pack. The pack or
dispenser device may be accompanied by instructions for
administration.
5.10.3. Routes of Administration
[0402] Suitable routes of administration may, for example, include
oral, rectal, transmucosal, transdermal, or intestinal
administration; parenteral delivery, including intramuscular,
subcutaneous, intramedullary injections, as well as intrathecal,
direct intraventricular, intravenous, intraperitoneal, intranasal,
or intraocular injections.
[0403] Alternatively, one may administer the compound in a local
rather than systemic manner, for example, via injection of the
compound directly into an affected area, often in a depot or
sustained release formulation.
[0404] Furthermore, one may administer the drug in a targeted drug
delivery system, for example, in a liposome coated with an antibody
specific for affected cells. The liposomes will be targeted to and
taken up selectively by the cells.
5.10.4. Packaging
[0405] The compositions may, if desired, be presented in a pack or
dispenser device which may contain one or more unit dosage forms
containing the active ingredient. The pack may for example comprise
metal or plastic foil, such as a blister pack. The pack or
dispenser device may be accompanied by instructions for
administration. Compositions comprising a compound formulated in a
compatible pharmaceutical carrier may also be prepared, placed in
an appropriate container, and labeled for treatment of an indicated
condition. Suitable conditions indicated on the label may include
treatment of a disease such as one characterized by aberrant or
excessive expression or activity of a chemotherapy response
protein.
5.10.5. Combination Therapy
[0406] In a combination therapy, one or more compositions of the
present invention, e.g., an agent that reduces the level of expression
and/or activity of one or more CR genes and/or gene products
thereof, can be administered before, at the same time as, or after
the administration of a chemotherapeutic agent. In one embodiment,
the compositions of the invention are administered before the
administration of a chemotherapeutic agent (i.e., the agent that
modulates expression or activity of a chemotherapy response gene
and/or encoded protein is for sequential or concurrent use with one
or more chemotherapeutic agents). In one embodiment, the
composition of the invention and a chemotherapeutic agent are
administered in a sequence and within a time interval such that the
composition of the invention and a chemotherapeutic agent can act
together to provide a greater benefit than if they were
administered alone. In another embodiment, the composition of the
invention and a chemotherapeutic agent are administered
sufficiently close in time so as to provide the desired therapeutic
outcome. The time intervals between the administration of the
compositions of the invention and a chemotherapeutic agent can be
determined by routine experiments that are familiar to one skilled
in the art. In one embodiment, a chemotherapeutic agent is
given to the patient after the level of the chemotherapy response
gene and/or encoded protein reaches a desirable threshold. The
level of a chemotherapy response gene and/or encoded protein can be
determined by using any techniques known in the art such as those
described in Section 5.3., supra.
[0407] The composition of the invention and a chemotherapeutic
agent can be administered simultaneously or separately, in any
appropriate form and by any suitable route. In one embodiment, the
composition of the invention and the chemotherapeutic agent are
administered by different routes of administration. In an alternate
embodiment, each is administered by the same route of
administration. The composition of the invention and the
chemotherapeutic agent can be administered at the same or different
sites, e.g. arm and leg.
[0408] In various embodiments, such as those described above, the
composition of the invention and a chemotherapeutic agent are
administered less than 1 hour apart, at about 1 hour apart, 1 hour
to 2 hours apart, 2 hours to 3 hours apart, 3 hours to 4 hours
apart, 4 hours to 5 hours apart, 5 hours to 6 hours apart, 6 hours
to 7 hours apart, 7 hours to 8 hours apart, 8 hours to 9 hours
apart, 9 hours to 10 hours apart, 10 hours to 11 hours apart, 11
hours to 12 hours apart, no more than 24 hours apart or no more
than 48 hours apart, or no more than 1 week or 2 weeks or 1 month
or 3 months apart. As used herein, the word "about" means within 10%.
In other embodiments, the composition of the invention and a
chemotherapeutic agent are administered 2 to 4 days apart, 4 to 6
days apart, 1 week apart, 1 to 2 weeks apart, 2 to 4 weeks apart,
one month apart, 1 to 2 months apart, or 2 or more months apart. In
preferred embodiments, the composition of the invention and a
chemotherapeutic agent are administered in a time frame where both
are still active. One skilled in the art would be able to determine
such a time frame by determining the half life of each administered
component. In separate or in the foregoing embodiments, the
composition of the invention and a chemotherapeutic agent are
administered less than 2 weeks, one month, six months, 1 year or 5
years apart.
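As a rough illustration of how half life can bound such a time frame, the
sketch below assumes simple first-order (exponential) elimination and an
arbitrary minimum fraction of the administered amount that is treated as
"still active". Both assumptions, and the function names, are hypothetical
and are not taken from this disclosure.

```python
import math


# Illustrative sketch; assumes first-order elimination, which may not hold for a given agent.
def fraction_remaining(hours_elapsed: float, half_life_hours: float) -> float:
    """Fraction of the administered amount remaining after hours_elapsed."""
    return 0.5 ** (hours_elapsed / half_life_hours)


def hours_until_fraction(half_life_hours: float, fraction: float) -> float:
    """Time in hours for the remaining amount to fall to the given fraction (0 < fraction <= 1)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    # Each elapsed half life halves the remaining amount.
    return half_life_hours * math.log2(1.0 / fraction)
```

For example, under these assumptions a component with a 6-hour half life
falls to one quarter of its initial amount after two half lives, so a second
component would be given within that window if one quarter were chosen as
the activity threshold.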
[0409] In another embodiment, the compositions of the invention are
administered at the same time, or at the same patient visit, as the
chemotherapeutic agent.
[0410] In still another embodiment, one or more of the compositions
of the invention are administered both before and after the
administration of a chemotherapeutic agent. Such administration can
be beneficial especially when the chemotherapeutic agent has a
longer half life than that of the one or more of the compositions
of the invention used in the treatment.
[0411] In one embodiment, the chemotherapeutic agent is
administered daily and the composition of the invention is
administered once a week for the first 4 weeks, and then once every
other week thereafter. In one embodiment, the chemotherapeutic
agent is administered daily and the composition of the invention is
administered once a week for the first 8 weeks, and then once every
other week thereafter.
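The schedules in this paragraph can be sketched as a list of administration
days for the composition of the invention. This sketch assumes that day 0 is
the first administration and that "every other week thereafter" means 14
days after the last weekly administration; the function name
composition_dose_days and both conventions are assumptions of the sketch.

```python
# Illustrative sketch; the function name and day-numbering conventions are assumptions.
def composition_dose_days(total_days: int, weekly_weeks: int = 4) -> list:
    """Days (0-based) on which the composition is given within total_days.

    Weekly for the first weekly_weeks weeks, then every other week thereafter.
    """
    if weekly_weeks < 1:
        raise ValueError("weekly_weeks must be at least 1")
    # Weekly phase: one administration at the start of each of the first weeks.
    days = [7 * week for week in range(weekly_weeks) if 7 * week < total_days]
    # Every-other-week phase: 14 days after the last weekly administration.
    next_day = 7 * (weekly_weeks - 1) + 14
    while next_day < total_days:
        days.append(next_day)
        next_day += 14
    return days
```

Calling the function with weekly_weeks set to 4 or 8 corresponds to the two
embodiments described; the chemotherapeutic agent, being given daily, would
simply be administered on every day in the range.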
[0412] In certain embodiments, the composition of the invention and
the chemotherapeutic agent are cyclically administered to a
subject. Cycling therapy involves the administration of the
composition of the invention for a period of time, followed by the
administration of a chemotherapeutic agent for a period of time and
repeating this sequential administration. Cycling therapy can
reduce the development of resistance to one or more of the
therapies, avoid or reduce the side effects of one of the
therapies, and/or improve the efficacy of the treatment. In such
embodiments, the invention contemplates the alternating
administration of the composition of the invention followed by the
administration of a chemotherapeutic agent 4 to 6 days later,
preferably 2 to 4 days later, more preferably 1 to 2 days later,
wherein such a cycle may be repeated as many times as desired.
[0413] In certain embodiments, the composition of the invention and
a chemotherapeutic agent are alternately administered in a cycle of
less than 3 weeks, once every two weeks, once every 10 days or once
every week. In a specific embodiment of the invention, one cycle
can comprise the administration of a chemotherapeutic agent by
infusion over 90 minutes every cycle, 1 hour every cycle, or 45
minutes every cycle. Each cycle can comprise at least 1 week of
rest, at least 2 weeks of rest, or at least 3 weeks of rest. In an
embodiment, the number of cycles administered is from 1 to 12
cycles, more typically from 2 to 10 cycles, and more typically from
2 to 8 cycles.
[0414] It will be apparent to one skilled in the art that
any combination of different timing of the administration of the
compositions of the invention and a chemotherapeutic agent can be
used. For example, when the chemotherapeutic agent has a longer
half life than that of the composition of the invention, it is
preferable to administer the compositions of the invention before
and after the administration of the chemotherapeutic agent.
[0415] The frequency or intervals of administration of the
compositions of the invention depend on the desired level of the
chemotherapy response gene and/or encoded protein, which can be
determined by any of the techniques known in the art, e.g., those
techniques described infra. The administration frequency of the
compositions of the invention can be increased or decreased when
the level of the chemotherapy response gene and/or encoded protein
rises above or falls below the desired level.
5.11. Implementation Systems and Methods
[0416] The analytical methods of the present invention can
preferably be implemented using a computer system, such as the
computer system described in this section, according to the
following programs and methods. Such a computer system can also
preferably store and manipulate measured signals obtained in
various experiments that can be used by a computer system
implemented with the analytical methods of this invention.
Accordingly, such computer systems are also considered part of the
present invention.
[0417] An exemplary computer system suitable for implementing the
analytic methods of this invention is illustrated in FIG. 6.
Computer system 601 is illustrated here as comprising internal
components and as being linked to external components. The internal
components of this computer system include one or more processor
elements 602 interconnected with a main memory 603. For example,
computer system 601 can be an Intel Pentium IV®-based processor
of 2 GHz or greater clock rate and with 256 MB or more main memory.
In a preferred embodiment, computer system 601 is a cluster of a
plurality of computers comprising a head "node" and eight sibling
"nodes," with each node having a central processing unit ("CPU").
In addition, the cluster also comprises at least 128 MB of random
access memory ("RAM") on the head node and at least 256 MB of RAM
on each of the eight sibling nodes. Therefore, the computer systems
of the present invention are not limited to those consisting of a
single memory unit or a single processor unit.
[0418] The external components can include a mass storage 604. This
mass storage can be one or more hard disks that are typically
packaged together with the processor and memory. Such hard disks
are typically of 10 GB or greater storage capacity and more
preferably have at least 40 GB of storage capacity. For example, in
a preferred embodiment, described above, wherein a computer system
of the invention comprises several nodes, each node can have its
own hard drive. The head node preferably has a hard drive with at
least 10 GB of storage capacity whereas each sibling node
preferably has a hard drive with at least 40 GB of storage
capacity. A computer system of the invention can further comprise
other mass storage units including, for example, one or more floppy
drives, one or more CD-ROM drives, one or more DVD drives or one or
more DAT drives.
[0419] Other external components typically include a user interface
device 605, which is most typically a monitor and a keyboard
together with a graphical input device 606 such as a "mouse." The
computer system is also typically linked to a network link 607
which can be, e.g., part of a local area network ("LAN") to other,
local computer systems and/or part of a wide area network ("WAN"),
such as the Internet, that is connected to other, remote computer
systems. For example, in the preferred embodiment, discussed above,
wherein the computer system comprises a plurality of nodes, each
node is preferably connected to a network, preferably an NFS
network, so that the nodes of the computer system communicate with
each other and, optionally, with other computer systems by means of
the network and can thereby share data and processing tasks with
one another.
[0420] Loaded into memory during operation of such a computer
system are several software components that are also shown
schematically in FIG. 6. The software components comprise both
software components that are standard in the art and components
that are special to the present invention. These software
components are typically stored on mass storage such as the hard
drive 604, but can be stored on other computer readable media as
well including, for example, one or more floppy disks, one or more
CD-ROMs, one or more DVDs or one or more DATs. Software component
610 represents an operating system which is responsible for
managing the computer system and its network interconnections. The
operating system can be, for example, of the Microsoft Windows™
family such as Windows 95, Windows 98, Windows NT, Windows 2000 or
Windows XP. Alternatively, the operating software can be a
Macintosh operating system, a UNIX operating system or a LINUX
operating system. Software component 611 comprises common languages
and functions that are preferably present in the system to assist
programs implementing methods specific to the present invention.
Languages that can be used to program the analytic methods of the
invention include, for example, C and C++, FORTRAN, PERL, HTML,
JAVA, and any of the UNIX or LINUX shell command languages such as
C shell script language. The methods of the invention can also be
programmed or modeled in mathematical software packages that allow
symbolic entry of equations and high-level specification of
processing, including specific algorithms to be used, thereby
freeing a user of the need to procedurally program individual
equations and algorithms. Such packages include, e.g., Matlab from
Mathworks (Natick, Mass.), Mathematica from Wolfram Research
(Champaign, Ill.) or S-Plus from MathSoft (Seattle, Wash.).
[0421] It will be clear to one skilled in the art that the computer
system may comprise an outputting or displaying system for
communicating a result from the analysis to an end user. In some
embodiments, the outputting or display system comprises external
component(s). It will be clear to one skilled in the art that
outputting the result is not limited to outputting to linked
external component(s), but may alternatively or additionally be
outputting to internal component(s). It will also be clear to one
skilled in the art that the claimed methods can, but need not be,
computer-implemented, and that the displaying or outputting step
can be done, for example, by communicating to a person orally or in
writing (e.g., in handwriting).
[0422] Software component 612 comprises any analytic methods of the
present invention described supra, preferably programmed in a
procedural language or symbolic package. For example, software
component 612 preferably includes programs that cause the processor
to implement steps of accepting a plurality of measured signals and
storing the measured signals in the memory. For example, the
computer system can accept measured signals that are manually
entered by a user (e.g., by means of the user interface). More
preferably, however, the programs cause the computer system to
retrieve measured signals from a database. Such a database can be
stored on a mass storage (e.g., a hard drive) or other computer
readable medium and loaded into the memory of the computer, or the
database can be accessed by the computer system by means of the
network 607.
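The retrieval step of paragraph [0422] can be sketched as follows. This is a minimal illustration under stated assumptions, not the program of the invention: the table name `measured_signals`, its columns, and the sample and gene identifiers are hypothetical, and an in-memory SQLite database stands in for whatever database the computer system actually uses.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database holding made-up signals (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measured_signals (sample_id TEXT, gene TEXT, signal REAL)"
    )
    conn.executemany(
        "INSERT INTO measured_signals VALUES (?, ?, ?)",
        [("S1", "GENE_A", 0.42), ("S1", "GENE_B", -0.17), ("S2", "GENE_A", 0.05)],
    )
    conn.commit()
    return conn


def load_measured_signals(conn, sample_id):
    """Retrieve the measured signals of one sample into memory as {gene: signal}."""
    rows = conn.execute(
        "SELECT gene, signal FROM measured_signals WHERE sample_id = ?",
        (sample_id,),
    ).fetchall()
    return {gene: signal for gene, signal in rows}
```

The returned dictionary plays the role of the measured signals stored in memory, on which the analytic methods can then operate.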
[0423] In addition to the exemplary program structures and computer
systems described herein, other, alternative program structures and
computer systems will be readily apparent to the skilled artisan.
Such alternative systems, which do not depart from the above
described computer system and program structures either in spirit
or in scope, are therefore intended to be comprehended within the
accompanying claims.
5.12. Kits
[0424] The invention provides kits that are useful in determining
chemotherapy responsiveness in a patient. The kits of the present
invention comprise one or more probes and/or primers for one or
more gene products or for each of at least 2, 5, 10, 20, or 30 gene
products that are encoded by the respective marker genes listed
in Table 1 or functional equivalents of such genes, wherein the
probes and/or primers are at least 50%, 75%, 80%, 85%, 90%, 95%,
97%, 98%, 99% or 100% of the total probes and/or primers in the
kit. The probes of marker genes may be part of an array, or the
biomarker(s) may be packaged separately and/or individually.
[0425] In one embodiment, the invention provides kits comprising
probes that are immobilized at an addressable position on a
substrate, e.g., in a microarray. In a particular embodiment, the
invention provides such a microarray.
[0426] The kits of the present invention may also contain probes
that can be used to detect protein products of the marker genes of
the invention. In a specific embodiment, the invention provides a
kit comprising a plurality of antibodies that specifically bind one
or more, or a plurality of at least 5, 10, 20, or 30 proteins that
are encoded by the respective marker genes listed in Table 1 or
functional equivalents of such genes, wherein the antibodies are at
least 50%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% of the
total antibodies in the kit. In accordance with this embodiment,
the kit may comprise a set of antibodies or functional fragments or
derivatives thereof (e.g., Fab, F(ab')2, Fv, or scFv
fragments). In accordance with this embodiment, the kit may include
antibodies, fragments or derivatives thereof (e.g., Fab,
F(ab')2, Fv, or scFv fragments) that are specific for these
proteins. In one embodiment, the antibodies may be detectably
labeled.
[0427] The kits of the present invention may also include reagents
such as buffers, or other reagents that can be used in obtaining
the marker profile. Prevention of the action of microorganisms can
be ensured by the inclusion of various antibacterial and antifungal
agents, for example, paraben, chlorobutanol, phenol, sorbic acid,
and the like. It may also be desirable to include isotonic agents
such as sugars, sodium chloride, and the like.
[0428] In some embodiments of the invention, the kits of the
present invention comprise a microarray. The microarray can be any
of the microarrays described above, e.g., in Section 5.3.2,
optionally in a sealed container. In one embodiment this microarray
comprises a plurality of probe spots, wherein at least 20%, 40%,
60%, 80%, or 90% of the probe spots in the plurality of probe spots
correspond to marker genes listed in Table 1.
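Whether a given microarray layout satisfies one of the percentages recited in paragraph [0428] can be checked with a short sketch such as the following. The function names and gene identifiers are hypothetical, and each probe spot is assumed to be annotated with a single target gene.

```python
def marker_fraction(probe_targets, marker_genes):
    """Fraction of probe spots whose target gene is one of the marker genes."""
    if not probe_targets:
        raise ValueError("microarray has no probe spots")
    markers = set(marker_genes)
    hits = sum(1 for gene in probe_targets if gene in markers)
    return hits / len(probe_targets)


def meets_threshold(probe_targets, marker_genes, minimum=0.20):
    """True if at least `minimum` of the probe spots correspond to marker genes."""
    return marker_fraction(probe_targets, marker_genes) >= minimum
```

The default `minimum` of 0.20 corresponds to the lowest recited value; 0.40, 0.60, 0.80 or 0.90 can be passed for the other values.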
[0429] In still other embodiments, the kits of the invention may
further comprise a computer program product for use in conjunction
with a computer system, wherein the computer program product
comprises a computer readable storage medium and a computer program
mechanism embedded therein. In such kits, the computer program
mechanism comprises instructions for prediction of prognosis using
a marker profile obtained with the reagents of the kits.
[0430] In still other embodiments, the kits of the present
invention comprise a computer having a central processing unit and
a memory coupled to the central processing unit. The memory stores
instructions for prediction of prognosis using a marker profile
obtained with the reagents of the kits.
6. EXAMPLES
[0431] The following examples are presented by way of illustration
of the present invention, and are not intended to limit the present
invention in any way.
[0432] A cohort of 311 samples was collected from breast cancer
patients. See van de Vijver et al., 2002, A gene-expression
signature as a predictor of survival in breast cancer. N Engl J
Med. 347(25):1999-2009. Microarrays containing approximately 25,000
human gene sequences (Hu25K microarrays) were used for this study.
Sequences for microarrays were selected from RefSeq (a collection
of non-redundant mRNA sequences, located on the Internet at
nlm.nih.gov/LocusLink/refseq.html) and Phil Green EST contigs,
which is a collection of EST contigs assembled by Dr. Phil Green et
al. at the University of Washington (Ewing and Green, Nat. Genet.
25(2):232-4 (2000)), available on the Internet at
phrap.org/est_assembly/index.html. Each mRNA or EST contig was
represented on the Hu25K microarray by a single 60-mer oligonucleotide
essentially as described in Hughes et al., Nature Biotech.
19(4):342-347 and in International Publication WO 01/06013,
published Jan. 25, 2001, and in International Publication WO
01/05935, published Jan. 25, 2001, except that the rules for oligo
screening were modified to remove oligonucleotides with more than
30% C or with 6 or more contiguous C residues.
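The modified screening rule stated at the end of paragraph [0432] can be expressed as a simple filter. This is a sketch of that one rule only; the function name is hypothetical, and the remaining oligonucleotide design rules of the cited references are not reproduced.

```python
def passes_c_screen(oligo, max_c_fraction=0.30, min_rejected_c_run=6):
    """Return True if the oligonucleotide survives the C-content screen.

    An oligonucleotide is removed when more than `max_c_fraction` of its
    bases are C, or when it contains `min_rejected_c_run` or more
    contiguous C residues.
    """
    seq = oligo.upper()
    if not seq:
        return False
    c_fraction = seq.count("C") / len(seq)
    has_long_c_run = "C" * min_rejected_c_run in seq
    return c_fraction <= max_c_fraction and not has_long_c_run
```

A run of exactly five contiguous C residues is retained under this rule, provided the overall C content does not exceed 30%.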
[0433] Using the 311 NKI breast cancer cohort sample data and a
"nearest neighbor" method, a total of 122 hubs/networks were
constructed (magnitude of correlation coefficient for connected
genes > 0.5). Among the 311 patients, 110 patients received
chemotherapy with either 5-fluorouracil or the CMF combination
(consisting of cyclophosphamide, methotrexate, and 5-fluorouracil).
FIG. 1 shows one example of such a hub.
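The correlation criterion used for connecting genes can be sketched as follows. The specification does not detail the "nearest neighbor" procedure, so this sketch simply assumes that a hub is grown around a seed gene by collecting every gene whose Pearson correlation with the seed has a magnitude above 0.5; the function names and gene identifiers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def build_hub(seed, expression, threshold=0.5):
    """Genes whose |correlation| with the seed gene exceeds the threshold.

    expression maps gene name -> list of expression values across samples.
    """
    seed_profile = expression[seed]
    return sorted(
        gene
        for gene, profile in expression.items()
        if gene != seed and abs(pearson(seed_profile, profile)) > threshold
    )
```

Because the magnitude of the coefficient is used, anti-correlated genes are connected to the hub as well as co-regulated ones.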
[0434] FIG. 1. (a) A network (hub #34) enriched for interferon
stimulated genes (ISG). (b) The hub genes are highly co-regulated
in the breast cancer data from which the network was derived. Each row
represents a sample, each column represents one gene. A darker
shade, which was magenta in the original depiction of FIG. 1b,
represents up-regulation; and a lighter shade, which was cyan in
the original depiction of FIG. 1b, represents down regulation.
[0435] As a second step, the hub expression level in each breast
cancer sample was computed by averaging over genes in each hub.
Hubs whose expression levels were related to chemotherapy
sensitivity were searched for by first dividing samples into two
populations according to the hub expression level. Within each
population, the treatment effect was examined by checking whether
the metastasis rate was affected by the chemotherapy. Specifically,
a log-rank-test was performed on the metastasis-free probability as
a function of time for patients with treatment vs. no treatment.
When this search was performed over all 311 samples, there were
only two hubs with log-rank-test P-value < 0.01. Of the two,
the most significant one was a hub (#34) enriched for interferon
stimulated genes (ISGs) (FIGS. 1 and 2), with a P-value of
0.3%.
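The hub-level search of paragraph [0435] can be sketched as follows. This is a minimal illustration and not the implementation used in the study: the median split is an assumption (the cut point is not stated), the log-rank P-value uses the standard chi-square approximation with one degree of freedom, and all function names are hypothetical.

```python
import math
from statistics import mean, median


def hub_level(sample_expression, hub_genes):
    """Hub expression level of one sample: the average over the hub genes."""
    return mean(sample_expression[g] for g in hub_genes)


def split_by_hub_level(samples, hub_genes):
    """Split samples ({sample_id: {gene: value}}) at the median hub level.

    Returns (low_ids, high_ids); the median cut point is an assumption.
    """
    levels = {sid: hub_level(expr, hub_genes) for sid, expr in samples.items()}
    cutoff = median(levels.values())
    low = [sid for sid, lvl in levels.items() if lvl <= cutoff]
    high = [sid for sid, lvl in levels.items() if lvl > cutoff]
    return low, high


def logrank_p(group_a, group_b):
    """Two-sided log-rank P-value for two groups of (time, event) pairs.

    event is True when metastasis was observed and False when the
    patient was censored at that time.
    """
    event_times = sorted({t for t, e in group_a + group_b if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for time, _ in group_a if time >= t)  # at risk in A
        n_b = sum(1 for time, _ in group_b if time >= t)  # at risk in B
        d_a = sum(1 for time, e in group_a if e and time == t)
        d_b = sum(1 for time, e in group_b if e and time == t)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))
```

Within each of the two populations returned by `split_by_hub_level`, `logrank_p` would be applied to the treated and untreated patients, and hubs with a P-value below 0.01 in either population would be retained.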
[0436] Since breast cancer expression patterns are very different
between estrogen receptor positive (ER+) patients and estrogen
receptor negative (ER-) patients, a search was also performed
separately over ER+ patients (239 samples) and ER- patients (72
samples). A total of 7 hubs with log-rank-test P-value < 0.01 were
identified. Again, the ISG hub was among the 7 and had the most
significant P-value (0.3%). Given that the ISG hub was the most
promising hub for "predicting" chemo-sensitivity, all 122
constructed hubs were re-examined and another hub (hub #88) that
was also enriched for ISGs was identified. The log-rank-test
P-value for this hub was 2%, barely missing the selection criterion.
[0437] FIG. 2. The expression level of interferon stimulated genes
(ISGs) is related to chemotherapy (CMF) sensitivity in breast
cancer patients. (a) Patients with low expression of ISGs show
great chemotherapy sensitivity, as indicated by the Kaplan-Meier
plot of metastasis-free probability for patients who received the
treatment (red) vs. no treatment (blue). At 10 years after
diagnosis of cancer, the treatment boosted the metastasis-free
probability from 60% to about 95% (log-rank-test P-value 0.3%). (b)
Patients with high expression of ISGs show no chemotherapy
sensitivity. There was essentially no difference in metastasis-free
probability between patients with and without chemotherapy
(P=75%).
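Metastasis-free probability curves of the kind plotted in FIG. 2 are conventionally obtained with the Kaplan-Meier product-limit estimator. The following is a minimal sketch of that standard estimator, not code from the study; the function name and the input layout are hypothetical.

```python
def kaplan_meier(observations):
    """Kaplan-Meier estimate as a list of (time, metastasis_free_probability).

    observations is a list of (time, event) pairs, where event is True when
    metastasis was observed at that time and False when the patient was
    censored (last seen metastasis-free) at that time.
    """
    curve = []
    survival = 1.0
    for t in sorted({time for time, event in observations if event}):
        at_risk = sum(1 for time, _ in observations if time >= t)
        events = sum(1 for time, event in observations if event and time == t)
        # Multiply by the conditional probability of remaining event-free at t.
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve
```

Computing one curve for treated patients and one for untreated patients within each ISG expression group yields the two curves compared in each panel.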
[0438] Thirdly, these 9 hubs (including hub #88 which just missed
the threshold) were tested in an ex-vivo ovarian cancer data set.
Fifty ovarian cancer samples were plated ex-vivo and treated with a
panel of 19 anticancer drugs, including Paclitaxel, carboplatin,
etoposide, and 5-FU. The tumor cell growth inhibition for each drug
treatment was measured and samples were categorized into 3 classes
for each drug: EDR (extreme drug resistance), LDR (low drug
resistance) and in between. The 50 ovarian cancer samples were
profiled, before drug dosing, against the pool of all samples. The
expression levels of hub genes were tested for correlation with
the drug resistance categories.
[0439] Among the 9 hubs tested, only 3 hubs (#20, #34 and #88, two
of which were enriched for ISGs) had a significant fraction of
members correlated (P-value of correlation < 5%) with the growth
inhibition by 5-FU (FIG. 3). The gene expression pattern of the two
ISG hubs (see Table 1 for gene list) in ovarian cancer is shown in
FIG. 4. As can be seen from this plot, low drug resistance mostly
corresponded to low expression of ISGs, and extreme drug
resistance mostly corresponded to high expression of ISGs, which
agrees well with the clinical breast cancer observation.
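The fraction of hub members correlated with drug resistance can be sketched as follows. Two assumptions are made that are not stated in the specification: the resistance categories are coded ordinally (LDR = 0, intermediate = 1, EDR = 2), and the P-value of each correlation is obtained with a permutation test, which is swapped in here because it needs only the standard library. All names are hypothetical.

```python
import math
import random

CATEGORY_SCORE = {"LDR": 0, "INT": 1, "EDR": 2}  # assumed ordinal coding


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation P-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def fraction_correlated(hub_expression, categories, alpha=0.05):
    """Fraction of hub genes whose expression correlates with resistance.

    hub_expression maps gene -> expression values across samples, and
    categories lists the resistance class of each sample in the same order.
    """
    scores = [CATEGORY_SCORE[c] for c in categories]
    significant = sum(
        1
        for profile in hub_expression.values()
        if permutation_p(profile, scores) < alpha
    )
    return significant / len(hub_expression)
```

Applying `fraction_correlated` to each hub in turn gives one value per hub, which can then be compared across hubs.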
[0440] Finally, the specificity of the ISG pathway in reporting
Paclitaxel, carboplatin, etoposide, and 5-FU sensitivity was
examined. The correlation between expression level and drug
resistance for all 19 anti-cancer drugs was calculated. As shown in
FIG. 5, the fraction of genes correlated with drug resistance was
above about 30% for Paclitaxel, carboplatin, etoposide, and 5-FU,
indicating that ISGs are relatively specific for reporting
resistance to these drugs.
[0441] FIG. 3. Bar chart of the number of genes in each P-value bin
for 9 hubs. The P-value was based on the correlation coefficient
between gene expression level and 5-FU drug resistance category in
the ovarian ex-vivo experiment. 3 hubs (#20, 34 & 88) had a
significant fraction of members whose base-line expression level
correlated with the drug resistance (with P-value of correlation
< 5%). Two of the 3 hubs (#34 & 88) belong to the ISG pathway.
[0442] FIG. 4. Expression of ISGs and their relation with drug
resistance in ex-vivo ovarian samples. Left panel: category of 5-FU
drug resistance measured by growth inhibition. EDR stands for
extreme drug resistance, LDR stands for low drug resistance. The
remaining category stands for intermediate. Heatmap: expression of
ISGs from hub 34 & hub 88. Each row represents a sample, each
column represents one gene. A darker shade, which was magenta in
the original heatmap of FIG. 4, represents up-regulation; and a
lighter shade, which was cyan in the original heatmap of FIG. 4,
represents down regulation. For LDR samples, ISGs are mostly
under-expressed compared to the average, whereas for EDR samples,
the ISG levels are relatively higher. Top panel: correlation of
expression level to drug resistance.
[0443] FIG. 5. Fraction of interferon-stimulated-genes (ISGs)
correlated with drug resistance in ex-vivo ovarian cancer samples
treated with a panel of anti-cancer drugs. The ISGs are relatively
specific in reporting the 5-FU drug sensitivity.
[0444] In summary, a set of markers including many
interferon-stimulated-genes was identified that correlates with
chemotherapy sensitivity to Paclitaxel, carboplatin, etoposide, and
5-FU in both clinical and ex-vivo model systems. This demonstrates
the utility of combining pathway analysis and model systems to help
predict response to chemotherapy.
7. REFERENCES CITED
[0445] All references cited herein are incorporated herein by
reference in their entirety and for all purposes to the same extent
as if each individual publication or patent or patent application
was specifically and individually indicated to be incorporated by
reference in its entirety for all purposes.
[0446] Many modifications and variations of the present invention
can be made without departing from its spirit and scope, as will be
apparent to those skilled in the art. The specific embodiments
described herein are offered by way of example only, and the
invention is to be limited only by the terms of the appended claims
along with the full scope of equivalents to which such claims are
entitled.
Sequence CWU 1
1
Number of SEQ ID NOS: 39
SEQ ID NO: 1; length: 1145; type: DNA; organism: Homo sapiens
gctccggcca gccgcggtcc agagcgcgcg aggttcgggg
agctccgcca ggctgctggt 60acctgcgtcc gcccggcgag caggacaggc tgctttggtt
tgtgacctcc aggcaggacg 120gccatcctct ccagaatgaa gatcttcttg
ccagtgctgc tggctgccct tctgggtgtg 180gagcgagcca gctcgctgat
gtgcttctcc tgcttgaacc agaagagcaa tctgtactgc 240ctgaagccga
ccatctgctc cgaccaggac aactactgcg tgactgtgtc tgctagtgcc
300ggcattggga atctcgtgac atttggccac agcctgagca agacctgttc
cccggcctgc 360cccatcccag aaggcgtcaa tgttggtgtg gcttccatgg
gcatcagctg ctgccagagc 420tttctgtgca atttcagtgc ggccgatggc
gggctgcggg caagcgtcac cctgctgggt 480gccgggctgc tgctgagcct
gctgccggcc ctgctgcggt ttggcccctg accgcccaga 540ccctgtcccc
cgatccccca gctcaggaag gaaagcccag ccctttctgg atcccacagt
600gtatgggagc ccctgactcc tcacgtgcct gatctgtgcc cttggtccca
ggtcaggccc 660accccctgca cctccacctg ccccagcccc tgcctctgcc
caagtgggcc agctgccctc 720acttctgggg tggatgatgt gaccttcctt
gggggactgc ggaagggacg agggttccct 780ggagtcttac ggtccaacat
cagaccaagt cccatggaca tgctgacagg gtccccaggg 840agaccgtgtc
agtagggatg tgtgcctggc tgtgtacgtg ggtgtgcagt gcacgtgaga
900gcacgtggcg gcttctgggg gccatgtttg gggagggagg tgtgccagca
gcctggagag 960cctcagtccc tgtagccccc tgccctggca cagctgcatg
cacttcaagg gcagcctttg 1020ggggttgggg tttctgccac ttccgggtct
aggccctgcc caaatccagc cagtcctgcc 1080ccagcccacc cccacattgg
agccctcctg ctgctttggt gcctcaaata aatacagatg 1140tcccc
1145
SEQ ID NO: 2; length: 3579; type: DNA; organism: Homo sapiens
ctgaggccca cgcagggcct agggtgggaa
gatggcaggt gggggcggcg acctgagcac 60caggaggctg aatgaatgta tttcaccagt
agcaaatgag atgaaccatc ttcctgcaca 120cagccacgat ttgcaaagga
tgttcacgga agaccagggt gtagatgaca ggctgctcta 180tgacattgta
ttcaagcact tcaaaagaaa taaggtggag atttcaaatg caataaaaaa
240gacatttcca ttcctcgagg gcctccgtga tcgtgatctc atcacaaata
aaatgtttga 300agattctcaa gattcttgta gaaacctggt ccctgtacag
agagtggtgt acaatgttct 360tagtgaactg gagaagacat ttaacctgcc
agttctggaa gcactgttca gcgatgtcaa 420catgcaggaa taccccgatt
taattcacat ttataaaggc tttgaaaatg taatccatga 480caaattgcct
ctccaagaaa gtgaagaaga agagagggag gagaggtctg gcctccaact
540aagtcttgaa caaggaactg gtgaaaactc ttttcgaagc ctgacttggc
caccttcggg 600ttccccatct catgctggta caaccccacc tgaaaatgga
ctctcagagc acccctgtga 660aacagaacag ataaatgcaa agagaaaaga
tacaaccagt gacaaagatg attcgctagg 720aagccaacaa acaaatgaac
aatgtgctca aaaggctgag ccaacagagt cctgcgaaca 780aattgctgtc
caagtgaata atggggatgc tggaagggag atgccctgcc cgttgccctg
840tgatgaagaa agcccagagg cagagctaca caaccatgga atccaaatta
attcctgttc 900tgtgcgactg gtggatataa aaaaggaaaa gccattttct
aattcaaaag ttgagtgcca 960agcccaagca agaactcatc ataaccaggc
atctgacata atagtcatca gcagtgagga 1020ctctgaagga tccactgacg
ttgatgagcc cttagaagtc ttcatctcag caccgagaag 1080tgagcctgtg
atcaataatg acaacccttt agaatcaaat gatgaaaagg agggccaaga
1140agccacttgc tcacgacccc agattgtacc agagcccatg gatttcagaa
aattatctac 1200attcagagaa agttttaaga aaagagtgat aggacaagac
cacgactttt cagaatccag 1260tgaggaggag gcgcccgcag aagcctcaag
cggggcactg agaagcaagc atggtgagaa 1320ggctcctatg acttctagaa
gtacatctac ttggagaata cccagcagga agagacgttt 1380cagcagtagt
gacttttcag acctgagtaa tggagaagag cttcaggaaa cctgcagctc
1440atccctaaga agagggtcag gatcacagcc acaagaacct gaaaataaga
agtgctcctg 1500tgtcatgtgt tttccaaaag gtgtgccaag aagccaagaa
gcaaggactg aaagtagtca 1560agcatctgac atgatggata ccatggatgt
tgaaaacaat tctactttgg aaaaacacag 1620tgggaaaaga agaaaaaaga
gaaggcatag atctaaagta aatggtctcc aaagagggag 1680aaagaaagac
agacctagaa aacatttaac tctgaataac aaagtccaaa agaaaagatg
1740gcaacaaaga ggaagaaaag ccaacactag acctttgaaa agaagaagaa
aaagaggtcc 1800aagaattccc aaagatgaaa atattaattt taaacaatct
gaacttcctg tgacctgtgg 1860tgaggtgaag ggcactctat ataaggagcg
attcaaacaa ggaacctcaa agaagtgtat 1920acagagtgag gataaaaagt
ggttcactcc cagggaattt gaaattgaag gagaccgcgg 1980agcatccaag
aactggaagc taagtatacg ctgcggtgga tataccctga aagtcctgat
2040ggagaacaaa tttctgccag aaccaccaag cacaagaaaa aagagaatac
tggaatctca 2100caacaatacc ttagttgacc cttgtgagga gcataagaag
aagaacccag atgcttcagt 2160caagttctca gagtttttaa agaagtgctc
agagacatgg aagaccattt ttgctaaaga 2220gaaaggaaaa tttgaagata
tggcaaaggc ggacaaggcc cattatgaaa gagaaatgaa 2280aacctatatc
cctcctaaag gggagaaaaa aaagaagttc aaggatccca atgcacccaa
2340gaggcctcct ttggcctttt tcctgttctg ctctgagtat cgcccaaaaa
tcaaaggaga 2400acatcctggc ctgtccattg atgatgttgt gaagaaactg
gcagggatgt ggaataacac 2460cgctgcagct gacaagcagt tttatgaaaa
gaaggctgca aagctgaagg aaaaatacaa 2520aaaggatatt gctgcatatc
gagctaaagg aaagcctaat tcagcaaaaa agagagttgt 2580caaggctgaa
aaaagcaaga aaaagaagga agaggaagaa gatgaagagg atgaacaaga
2640ggaggaaaat gaagaagatg atgataaata agttgcttct agtgcagttt
ttttcttgtc 2700tataaagcat ttaagctgcc tgtacacaac tcactccttt
taaagaaaaa aacttcaacg 2760taagactgtg taagatttgt ttttaaaccg
tacactgtgt ttttttgtat agttaaccac 2820taccgaatgt gtcttcagat
agccctgtcc tggtggtatt tagccactaa cctttgcctg 2880gtacagtatg
ggggttgtaa attggcatgg aaatttaaag caggttcttg ttagtgcaca
2940gcacaaatta gttgtatagg aggatggtag ttttttcacc ttcagttgtc
tctgatgtag 3000cttatacaaa acatttgttg ttctgttaac tgaatgccac
tctgtaattg caaaaaaaaa 3060aaacagttgc agctgttttg ttgacattct
gaatgcttct aagtaaatac aatttttaaa 3120aaaccgtatg agggaactgt
gtagacaagg taccaggtca gtcttcttcc atgttctatt 3180agctccacaa
agccaatctc aatccctcaa aacaatcttg tcatacttga aaatatgaca
3240ctctagtcaa agccttggta aaataatcag tgtttccaat ctgtcctgtt
acaaaagaaa 3300cagattatta ttgaacttat gcaaataacc attgtcataa
gaatgtttat gaatagtttc 3360caaattatgg caaattcatg tagagagaga
aaagtaactg ttttggtttt gctcacaaaa 3420gtctacttta cctaagggct
gtcagatata agtaacttaa aagaaagaga agttttcttg 3480acttttgaaa
acaaaatatg aaaagaatcg gcaatgtttc aaacaaaaag tcataaaagt
3540cactttattc ctccatcaaa aaaaaaaaaa aaaaaaaaa 3579
SEQ ID NO: 3; length: 398; type: DNA; organism: Homo sapiens
agttcaaagg cagataaatc tgtaaattat tttatcctat ctaccatttc
ttaagaagac 60attactccaa aataattaaa tttaaggctt tatcaggtct gcatatagaa
tcttaaattc 120taataaagtt tcatgttaat gtcataggat ttttaaaaga
gctataggta atttctgtat 180aatatgtgta tattaaaatg taattgattt
cagttgaaag tattttaaag ctgataaata 240gcattagggt tctttgcaat
gtggtatcta gctgtattat tggttttatt tactttaaac 300attttgaaaa
gcttatactg gcagcctaga aaaacaaaca attaatgtat ctttatgtcc
360ctggcacatg aataaacttt gctgtggttt actaatct 398
SEQ ID NO: 4; length: 2787; type: DNA; organism: Homo sapiens
agagcggagg ccgcactcca gcactgcgca gggaccgcct tggaccgcag
ttgccggcca 60ggaatcccag tgtcacggtg gacacgcctc cctcgcgccc ttgccgccca
cctgctcacc 120cagctcaggg gctttggaat tctgtggcca cactgcgagg
agatcggttc tgggtcggag 180gctacaggaa gactcccact ccctgaaatc
tggagtgaag aacgccgcca tccagccacc 240attccaagga ggtgcaggag
aacagctctg tgataccatt taacttgttg acattacttt 300tatttgaagg
aacgtatatt agagcttact ttgcaaagaa ggaagatggt tgtttccgaa
360gtggacatcg caaaagctga tccagctgct gcatcccacc ctctattact
gaatggagat 420gctactgtgg cccagaaaaa tccaggctcg gtggctgaga
acaacctgtg cagccagtat 480gaggagaagg tgcgcccctg catcgacctc
attgactccc tgcgggctct aggtgtggag 540caggacctgg ccctgccagc
catcgccgtc atcggggacc agagctcggg caagagctcc 600gtgttggagg
cactgtcagg agttgccctt cccagaggca gcgggatcgt gaccagatgc
660ccgctggtgc tgaaactgaa gaaacttgtg aacgaagata agtggagagg
caaggtcagt 720taccaggact acgagattga gatttcggat gcttcagagg
tagaaaagga aattaataaa 780gcccagaatg ccatcgccgg ggaaggaatg
ggaatcagtc atgagctaat caccctggag 840atcagctccc gagatgtccc
ggatctgact ctaatagacc ttcctggcat aaccagagtg 900gctgtgggca
atcagcctgc tgacattggg tataagatca agacactcat caagaagtac
960atccagaggc aggagacaat cagcctggtg gtggtcccca gtaatgtgga
catcgccacc 1020acagaggctc tcagcatggc ccaggaggtg gaccccgagg
gagacaggac catcggaatc 1080ttgacgaagc ctgatctggt ggacaaagga
actgaagaca aggttgtgga cgtggtgcgg 1140aacctcgtgt tccacctgaa
gaagggttac atgattgtca agtgccgggg ccagcaggag 1200atccaggacc
agctgagcct gtccgaagcc ctgcagagag agaagatctt ctttgagaac
1260cacccatatt tcagggatct gctggaggaa ggaaaggcca cggttccctg
cctggcagaa 1320aaacttacca gcgagctcat cacacatatc tgtaaatctc
tgcccctgtt agaaaatcaa 1380atcaaggaga ctcaccagag aataacagag
gagctacaaa agtatggtgt cgacataccg 1440gaagacgaaa atgaaaaaat
gttcttcctg atagataaaa ttaatgcctt taatcaggac 1500atcactgctc
tcatgcaagg agaggaaact gtaggggagg aagacattcg gctgtttacc
1560agactccgac acgagttcca caaatggagt acaataattg aaaacaattt
tcaagaaggc 1620cataaaattt tgagtagaaa aatccagaaa tttgaaaatc
agtatcgtgg tagagagctg 1680ccaggctttg tgaattacag gacatttgag
acaatcgtga aacagcaaat caaggcactg 1740gaagagccgg ctgtggatat
gctacacacc gtgacggata tggtccggct tgctttcaca 1800gatgtttcga
taaaaaattt tgaagagttt tttaacctcc acagaaccgc caagtccaaa
1860attgaagaca ttagagcaga acaagagaga gaaggtgaga agctgatccg
cctccacttc 1920cagatggaac agattgtcta ctgccaggac caggtataca
ggggtgcatt gcagaaggtc 1980agagagaagg agctggaaga agaaaagaag
aagaaatcct gggattttgg ggctttccag 2040tccagctcgg caacagactc
ttccatggag gagatctttc agcacctgat ggcctatcac 2100caggaggcca
gcaagcgcat ctccagccac atccctttga tcatccagtt cttcatgctc
2160cagacgtacg gccagcagct tcagaaggcc atgctgcagc tcctgcagga
caaggacacc 2220tacagctggc tcctgaagga gcggagcgac accagcgaca
agcggaagtt cctgaaggag 2280cggcttgcac ggctgacgca ggctcggcgc
cggcttgccc agttccccgg ttaaccacac 2340tctgtccagc cccgtagacg
tgcacgcaca ctgtctgccc ccgttcccgg gtagccactg 2400gactgacgac
ttgagtgctc agtagtcaga ctggatagtc cgtctctgct tatccgttag
2460ccgtggtgat ttagcaggaa gctgtgagag cagtttggtt tctagcatga
agacagagcc 2520ccaccctcag atgcacatga gctggcggga ttgaaggatg
ctgtcttcgt actgggaaag 2580ggattttcag ccctcagaat cgctccacct
tgcagctctc cccttctctg tattcctaga 2640aactgacaca tgctgaacat
cacagcttat ttcctcattt ttataatgtc ccttcacaaa 2700cccagtgttt
taggagcatg agtgccgtgt gtgtgcgtcc tgtcggagcc ctgtctcctc
2760tctctgtaat aaactcattt ctagcag 2787
SEQ ID NO: 5; length: 2808; type: DNA; organism: Homo sapiens
gcggcggcgg cggcgcagtt tgctcatact ttgtgacttg cggtcacagt ggcattcagc
60tccacacttg gtagaaccac aggcacgaca agcatagaaa catcctaaac aatcttcatc
120gaggcatcga ggtccatccc aataaaaatc aggagaccct ggctatcata
gaccttagtc 180ttcgctggta tactcgctgt ctgtcaacca gcggttgact
ttttttaagc cttctttttt 240ctcttttacc agtttctgga gcaaattcag
tttgccttcc tggatttgta aattgtaatg 300acctcaaaac tttagcagtt
cttccatctg actcaggttt gcttctctgg cggtcttcag 360aatcaacatc
cacacttccg tgattatctg cgtgcatttt ggacaaagct tccaaccagg
420atacgggaag aagaaatggc tggtgatctt tcagcaggtt tcttcatgga
ggaacttaat 480acataccgtc agaagcaggg agtagtactt aaatatcaag
aactgcctaa ttcaggacct 540ccacatgata ggaggtttac atttcaagtt
ataatagatg gaagagaatt tccagaaggt 600gaaggtagat caaagaagga
agcaaaaaat gccgcagcca aattagctgt tgagatactt 660aataaggaaa
agaaggcagt tagtccttta ttattgacaa caacgaattc ttcagaagga
720ttatccatgg ggaattacat aggccttatc aatagaattg cccagaagaa
aagactaact 780gtaaattatg aacagtgtgc atcgggggtg catgggccag
aaggatttca ttataaatgc 840aaaatgggac agaaagaata tagtattggt
acaggttcta ctaaacagga agcaaaacaa 900ttggccgcta aacttgcata
tcttcagata ttatcagaag aaacctcagt gaaatctgac 960tacctgtcct
ctggttcttt tgctactacg tgtgagtccc aaagcaactc tttagtgacc
1020agcacactcg cttctgaatc atcatctgaa ggtgacttct cagcagatac
atcagagata 1080aattctaaca gtgacagttt aaacagttct tcgttgctta
tgaatggtct cagaaataat 1140caaaggaagg caaaaagatc tttggcaccc
agatttgacc ttcctgacat gaaagaaaca 1200aagtatactg tggacaagag
gtttggcatg gattttaaag aaatagaatt aattggctca 1260ggtggatttg
gccaagtttt caaagcaaaa cacagaattg acggaaagac ttacgttatt
1320aaacgtgtta aatataataa cgagaaggcg gagcgtgaag taaaagcatt
ggcaaaactt 1380gatcatgtaa atattgttca ctacaatggc tgttgggatg
gatttgatta tgatcctgag 1440accagtgatg attctcttga gagcagtgat
tatgatcctg agaacagcaa aaatagttca 1500aggtcaaaga ctaagtgcct
tttcatccaa atggaattct gtgataaagg gaccttggaa 1560caatggattg
aaaaaagaag aggcgagaaa ctagacaaag ttttggcttt ggaactcttt
1620gaacaaataa caaaaggggt ggattatata cattcaaaaa aattaattca
tagagatctt 1680aagccaagta atatattctt agtagataca aaacaagtaa
agattggaga ctttggactt 1740gtaacatctc tgaaaaatga tggaaagcga
acaaggagta agggaacttt gcgatacatg 1800agcccagaac agatttcttc
gcaagactat ggaaaggaag tggacctcta cgctttgggg 1860ctaattcttg
ctgaacttct tcatgtatgt gacactgctt ttgaaacatc aaagtttttc
1920acagacctac gggatggcat catctcagat atatttgata aaaaagaaaa
aactcttcta 1980cagaaattac tctcaaagaa acctgaggat cgacctaaca
catctgaaat actaaggacc 2040ttgactgtgt ggaagaaaag cccagagaaa
aatgaacgac acacatgtta gagcccttct 2100gaaaaagtat cctgcttctg
atatgcagtt ttccttaaat tatctaaaat ctgctaggga 2160atatcaatag
atatttacct tttattttaa tgtttccttt aattttttac tatttttact
2220aatctttctg cagaaacaga aaggttttct tctttttgct tcaaaaacat
tcttacattt 2280tactttttcc tggctcatct ctttattctt tttttttttt
ttaaagacag agtctcgctc 2340tgttgcccag gctggagtgc aatgacacag
tcttggctca ctgcaacttc tgcctcttgg 2400gttcaagtga ttctcctgcc
tcagcctcct gagtagctgg attacaggca tgtgccaccc 2460acccaactaa
tttttgtgtt tttaataaag acagggtttc accatgttgg ccaggctggt
2520ctcaaactcc tgacctcaag taatccacct gcctcggcct cccaaagtgc
tgggattaca 2580gggatgagcc accgcgccca gcctcatctc tttgttctaa
agatggaaaa accaccccca 2640aattttcttt ttatactatt aatgaatcaa
tcaattcata tctatttatt aaatttctac 2700cgcttttagg ccaaaaaaat
gtaagatcgt tctctgcctc acatagctta caagccagct 2760ggagaaatat
ggtactcatt aaaaaaaaaa aaaaagtgat gtacaacc 2808
6 1260 DNA Homo sapiens
6 gggggtgggg tccccggggc ggggcggggc gcgctgtgtc gcgggtcgga gctcggtcct
60gctggaggcc acgggtgcca cacactcggt cccgacatga tggcgagcat gcgagtggtg
120aaggagctgg aggatcttca gaagaagcct cccccatacc tgcggaacct
gtccagcgat 180gatgccaatg tcctggtgtg gcacgctctc ctcctacccg
accaacctcc ctaccacctg 240aaagccttca acctgcgcat cagcttcccg
ccggagtatc cgttcaagcc tcccatgatc 300aaattcacaa ccaagatcta
ccaccccaac gtggacgaga acggacagat ttgcctgccc 360atcatcagca
gtgagaactg gaagccttgc accaagactt gccaagtcct ggaggccctc
420aatgtgctgg tgaatagacc gaatatcagg gagcccctgc ggatggacct
cgctgacctg 480ctgacacaga atccggagct gttcagaaag aatgccgaag
agttcaccct ccgattcgga 540gtggaccggc cctcctaact catgttctga
ccctctgtgc actggatcct cggcatagcg 600gacggacaca cctcatggac
tgaggccaga gccccctgtg gcccattccc cattcatttt 660tcccttctta
ggttgttagt cattagtttg tgtgtgtgtg tggtggaggg aagggagcta
720tgagtgtgtg tgttgtgtat ggactcactc ccaggttcac ctggccacag
gtgcaccctt 780cccacaccct ttacattccc cagagccaag ggagtttaag
tttgcagtta caggccagtt 840ctccagctct ccatcttaga gagacaggtc
accttgcagg cctgcttgca ggaaatgaat 900ccagcagcca actcgaatcc
ccctagggct caggcactga gggcctgggg acagtggagc 960atatgggtgg
gagacagatg gagggtaccc tatttacaac tgagtcagcc aagccactga
1020tgggaatata cagatttagg tgctaaaccg tttattttcc acggatgagt
cacaatctga 1080agaatcaaac ttccatcctg aaaatctata tgtttcaaaa
ccacttgcca tcctgttaga 1140ttgccagttc ctgggaccag gcctcagact
gtgaagtata tatcctccag cattcagtcc 1200agggggagcc acggaaacca
tgttcttgct taagccatta aagtcagaga tgaattctgg 1260
7 983 DNA Homo sapiens
7 gtggaattca tggcatctac ttcgtatgac tattgcagag tgcccatgga agacggggat
60aagcgctgta agcttctgct ggggatagga attctggtgc tcctgatcat cgtgattctg
120ggggtgccct tgattatctt caccatcaag gccaacagcg aggcctgccg
ggacggcctt 180cgggcagtga tggagtgtcg caatgtcacc catctcctgc
aacaagagct gaccgaggcc 240cagaagggct ttcaggatgt ggaggcccag
gccgccacct gcaaccacac tgtgatggcc 300ctaatggctt ccctggatgc
agagaaggcc caaggacaaa agaaagtgga ggagcttgag 360ggagagatca
ctacattaaa ccataagctt caggacgcgt ctgcagaggt ggagcgactg
420agaagagaaa accaggtctt aagcgtgaga atcgcggaca agaagtacta
ccccagctcc 480caggactcca gctccgctgc ggcgccccag ctgctgattg
tgctgctggg cctcagcgct 540ctgctgcagt gagatcccag gaagctggca
catcttggaa ggtccgtcct gctcggcttt 600tcgcttgaac attcccttga
tctcatcagt tctgagcggg tcatggggca acacggttag 660cggggagagc
acggggtagc cggagaaggg cctctggagc aggtctggag gggccatggg
720gcagtcctgg gtgtggggac acagtcgggt tgacccaggg ctgtctccct
ccagagcctc 780cctccggaca atgagtcccc cctcttgtct cccaccctga
gattgggcat ggggtgcggt 840gtggggggca tgtgctgcct gttgttatgg
gttttttttg cggggggggt tgcttttttc 900tggggtcttt gagctccaaa
aaataaacac ttcctttgag ggagagcaaa aaaaaaaaaa 960aaaaaaaaaa
aaaaaaaaaa aaa 983
8 634 DNA Homo sapiens
8 cggctgagag gcagcgaact
catctttgcc agtacaggag cttgtgccgt ggcccacagc 60ccacagccca cagccatggg
ctgggacctg acggtgaaga tgctggcggg caacgaattc 120caggtgtccc
tgagcagctc catgtcggtg tcagagctga aggcgcagat cacccagaag
180attggcgtgc acgccttcca gcagcgtctg gctgtccacc cgagcggtgt
ggcgctgcag 240gacagggtcc cccttgccag ccagggcctg ggccctggca
gcacggtcct gctggtggtg 300gacaaatgcg acgaacctct gagcatcctg
gtgaggaata acaagggccg cagcagcacc 360tacgaggtcc ggctgacgca
gaccgtggcc cacctgaagc agcaagtgag cgggctggag 420ggtgtgcagg
acgacctgtt ctggctgacc ttcgagggga agcccctgga ggaccagctc
480ccgctggggg agtacggcct caagcccctg agcaccgtgt tcatgaatct
gcgcctgcgg 540ggaggcggca cagagcctgg cgggcggagc taagggcctc
caccagcatc cgagcaggat 600caagggccgg aaataaaggc tgttgtaaga gaat 634
9 768 DNA Homo sapiens
9 ccttcagcat aaaagctgat ccacaaacaa gaggagcacc
agacctcctc ttggcttcga 60gatggcttcg ccacaccaag agcccaaacc tggagacctg
attgagattt tccgccttgg 120ctatgagcac tgggccctgt atataggaga
tggctacgtg atccatctgg ctcctccaag 180tgagtacccc ggggctggct
cctccagtgt cttctcagtc ctgagcaaca gtgcagaggt 240gaaacggggg
cgcctggaag atgtggtggg aggctgttgc tatcgggtca acaacagctt
300ggaccatgag taccaaccac ggcccgtgga ggtgatcatc agttctgcga
aggagatggt 360tggtcagaag atgaagtaca gtattgtgag caggaactgt
gagcactttg tcgcccagct 420gagatatggc aagtcccgct gtaaacaggt
ggaaaaggcc aaggttgaag tcggtgtggc 480cacggcgctt ggaatcctgg
ttgttgctgg atgctctttt gcgattagga gataccaaaa 540aaaagcaaca
gcctgaagca gccacaaaat cctgtgttag aagcagctgt gggggtccca
600gtggagatga gcctccccca tgcctccagc agcctgaccc tcgtgccctg
tctcaggcgt 660tctctagatc ctttcctctg tttccctctc tcgctggcaa
aagtatgatc taattgaaac 720aagactgaag gatcaataaa cagccatctg
ccccttcaaa aaaaaaaa 768
10 337 DNA Homo sapiens
10 gcctcctgca gcccccatag
cagattctga gaacaataac tccacaatgg cgtcggcctc 60ggagggtgaa atggagtgtg
ggcaggagct gaaggaggaa gggggcccgt gcttgttccc 120gggctcagac
agttggcaag aaaaccccga ggagccctgt tccaaagcct cctggaccgt
180ccaagaagga gctacatcag aggttttggt agatgctgct gtagacctca
tatccgatga 240atgggaagct gctaatgcca tacccagcaa gagaaggaag
caggatgcag ccccgcttga 300ggccgccagc gtgccttctg cagactgtga gcagagc
337112254DNAHomo sapiens 11aatcgaaagt agactctttt ctgaagcatt
tcctgggatc agcctgacca cgctccatac 60tgggagaggc ttctgggtca aaggaccagt
ctgcagaggg atcctgtggc tggaagcgag 120gaggctccac acggccgttg
cagctaccgc agccaggatc tgggcatcca ggcacggcca 180tgacccctcc
gaggctcttc tgggtgtggc tgctggttgc aggaacccaa ggcgtgaacg
240atggtgacat gcggctggcc gatgggggcg ccaccaacca gggccgcgtg
gagatcttct 300acagaggcca gtggggcact gtgtgtgaca acctgtggga
cctgactgat gccagcgtcg 360tctgccgggc cctgggcttc gagaacgcca
cccaggctct gggcagagct gccttcgggc 420aaggatcagg ccccatcatg
ctggacgagg tccagtgcac gggaaccgag gcctcactgg 480ccgactgcaa
gtccctgggc tggctgaaga gcaactgcag gcacgagaga gacgctggtg
540tggtctgcac caatgaaacc aggagcaccc acaccctgga cctctccagg
gagctctcgg 600aggcccttgg ccagatcttt gacagccagc ggggctgcga
cctgtccatc agcgtgaatg 660tgcagggcga ggacgccctg ggcttctgtg
gccacacggt catcctgact gccaacctgg 720aggcccaggc cctgtggaag
gagccgggca gcaatgtcac catgagtgtg gatgctgagt 780gtgtgcccat
ggtcagggac cttctcaggt acttctactc ccgaaggatt gacatcaccc
840tgtcgtcagt caagtgcttc cacaagctgg cctctgccta tggggccagg
cagctgcagg 900gctactgcgc aagcctcttt gccatcctcc tcccccagga
cccctcgttc cagatgcccc 960tggacctgta tgcctatgca gtggccacag
gggacgccct gctggagaag ctctgcctac 1020agttcctggc ctggaacttc
gaggccttga cgcaggccga ggcctggccc agtgtcccca 1080cagacctgct
ccaactgctg ctgcccagga gcgacctggc ggtgcccagc gagctggccc
1140tactgaaggc cgtggacacc tggagctggg gggagcgtgc ctcccatgag
gaggtggagg 1200gcttggtgga gaagatccgc ttccccatga tgctccctga
ggagctcttt gagctgcagt 1260tcaacctgtc cctgtactgg agccacgagg
ccctgttcca gaagaagact ctgcaggccc 1320tggaattcca cactgtgccc
ttccagttgc tggcccggta caaaggcctg aacctcaccg 1380aggataccta
caagccccgg atttacacct cgcccacctg gagtgccttt gtgacagaca
1440gttcctggag tgcacggaag tcacaactgg tctatcagtc cagacggggg
cctttggtca 1500aatattcttc tgattacttc caagccccct ctgactacag
atactacccc taccagtcct 1560tccagactcc acaacacccc agcttcctct
tccaggacaa gagggtgtcc tggtccctgg 1620tctacctccc caccatccag
agctgctgga actacggctt ctcctgctcc tcggacgagc 1680tccctgtcct
gggcctcacc aagtctggcg gctcagatcg caccattgcc tacgaaaaca
1740aagccctgat gctctgcgaa gggctcttcg tggcagacgt caccgatttc
gagggctgga 1800aggctgcgat tcccagtgcc ctggacacca acagctcgaa
gagcacctcc tccttcccct 1860gcccggcagg gcacttcaac ggcttccgca
cggtcatccg ccccttctac ctgaccaact 1920cctcaggtgt ggactagacg
cgtggccaag ggtggtgaga accggagaac cccaggacgc 1980cctcactgca
ggctcccctc ctcggcttcc ttcctctctg caatgacctt caacaaccgg
2040ccaccagatg tcgccctact cacctgaggc tcagcttcaa gaaattactg
gaaggcttcc 2100actagggtcc accaggagtt ctcccaccac ctcaccagtt
tccaggtggt aagcaccagg 2160aggccctcga ggttgctctg gatcccccca
cagcccctgg tcagtctgcc cttgtcactg 2220gtctgaggtc attaaaatta
cattgaggtt ccta 2254
12 2815 DNA Homo sapiens
12 aggaagcgga ggaaggtgaa
gtaggaccga attcctgtgc cgaagaggcc tgcagtggga 60gagcaggatg ggggctccgg
aggtggcgcc caggctctga gctaccctag gtctgcagac 120tagcgggcat
tggccagaga catggcccag ccactggcct tcatcctcga tgtccctgag
180accccagggg accagggcca gggccccagc ccctatgatg aaagcgaagt
gcacgactcc 240ttccagcagc tcatccagga gcagagccag tgcacggccc
aggaggggct ggagctgcag 300cagagagagc gggaggtgac aggaagtagc
cagcagacac tctggcggcc cgagggcacc 360cagagcacgg ccacactccg
catcctggcc agcatgccca gccgcaccat tggccgcagc 420cgaggtgcca
tcatctccca gtactacaac cgcacggtgc agcttcggtg caggagcagc
480cggcccctgc tcgggaactt tgtccgctcc gcctggccca gcctccgcct
gtacgacctg 540gagctggacc ccacggccct ggaggaggag gagaagcaga
gcctcctggt gaaggagttc 600cagagcctgg cagtggcaca gcgggaccac
atgcttcgcg ggatgccctt aagcctggct 660gagaaacgca gcctgcgaga
gaagagcagg accccgaggg ggaagtggag gggccagccg 720ggcagcggcg
gggtctgctc ctgctgtggc cggctcagat atgcctgcgt gctggccttg
780cacagcctgg gcctggcgct gctctccgcc ctgcaggccc tgatgccgtg
gcgctacgcc 840ctgaagcgca tcgggggcca gttcggctcc agcgtgctct
cctacttcct ctttctcaag 900accctgctgg ctttcaatgc cctcctgctg
ctgctgctgg tggccttcat catgggccct 960caggtcgcct tcccacccgc
cctgccgggc cctgcccccg tctgcacagg cctggagctc 1020ctcacaggcg
cgggttgctt cacccacacc gtcatgtact acggccacta cagtaacgcc
1080acgctgaacc agccgtgtgg cagccccctg gatggcagcc agtgcacacc
cagggtgggt 1140ggcctgccct acaacatgcc cctggcctac ctctccactg
tgggcgtgag cttctttatc 1200acctgcatca ccctggtgta cagcatggct
cactctttcg gggagagcta ccgggtgggc 1260agcacctctg gcatccacgc
catcaccgtc ttctgctcct gggactacaa ggtgacgcag 1320aagcgggcct
cccgcctcca gcaggacaat attcgcaccc ggctgaagga gctgctggcc
1380gagtggcagc tgcggcacag ccccaggagc gtgtgcggga ggctgcggca
ggcggctgtg 1440ctggggcttg tgtggctgct gtgtctgggg accgcgctgg
gctgcgccgt ggccgtccac 1500gtcttctcgg agttcatgat ccagagtcca
gaggctgctg gccaggaggc tgtgctgctg 1560gtcctgcccc tggtggttgg
cctcctcaac ctgggggccc cctacctgtg ccgtgtcctg 1620gccgccctgg
agccgcatga ctccccggta ctggaggtgt acgtggccat ctgcaggaac
1680ctcatcctca agctggccat cctggggaca ctgtgctacc actggctggg
ccgcagggtg 1740ggcgtcctgc agggccagtg ctgggaggat tttgtgggcc
aggagctgta ccggttcctg 1800gtgatggact tcgtcctcat gttgctggac
acgctttttg gggaactggt gtggaggatt 1860atctccgaga agaagctgaa
gaggaggcgg aagccggagt ttgacattgc ccggaatgtc 1920ctggagctga
tttatgggca gactctgacc tggctggggg tgctcttctc gcccctcctc
1980cccgccgtgc agatcatcaa gctgctgctc gtcttctatg tcaagaagac
cagccttctg 2040gccaactgcc aggcgccgcg ccggccctgg ctggcctcac
acatgagcac cgtcttcctc 2100acgctgctct gcttccccgc cttcctgggc
gccgctgtct tcctctgcta cgccgtctgg 2160caggtgaagc cctcgagcac
ctgcggcccc ttccggaccc tggacaccat gtacgaggcc 2220ggcagggtgt
gggtgcgcca cctggaggcg gcaggcccca gggtctcctg gctgccctgg
2280gtgcaccggt acctgatgga aaacaccttc tttgtcttcc tggtgtcagc
cctgctgctg 2340gccgtgatct acctcaacat ccaggtggtg cggggccagc
gcaaggtcat ctgcctgctc 2400aaggagcaga tcagcaatga gggtgaggac
aaaatcttct taatcaacaa gcttcactcc 2460atctacgaga ggaaggagag
ggaggagagg agcagggttg ggacaaccga ggaggctgcg 2520gcaccccctg
ccctgctcac agatgaacag gatgcctagg gggacggcga tgggcctcac
2580gggcccgccc agcaccctga gaccacactg ttgcctccca gtgaccctgc
tgggacacca 2640ggacaaggaa gacagtttcg cctctcgaaa gccgcagctg
cgcctaggct ggagctggaa 2700gggtgggtga atccggcttg ggcatcccca
atgaactctg ccctgcctgg gactctattt 2760attctgatta aaggggtttt
gcaaatggga aaaaaaaaaa aaaaaaaaaa aaaaa 2815
13 7204 DNA Homo sapiens
13 gccccagcac tcgccggcgg cagtgaaagg acgcgccgga gccggataac agaaagtaac
60gtgaaggaat tcaggtgact cagacatgga ggagagaaga cctcatctgg atgccaggcc
120caggaattcc cataccaacc acagaggccc tgtggatgga gagttaccac
caagagctag 180aaatcaggcc aataacccac cagccaatgc tctccgagga
ggagccagcc accctggaag 240gcatcctagg gccaacaacc atcctgctgc
ttactggcag agggaagaga gatttagggc 300catgggcagg aacccacatc
aaggaaggag gaaccaggag gggcatgcca gcgacgaagc 360tagagaccaa
agacatgacc aggagaatga caccaggtgg agaaatggca accaggactg
420taggaaccgc agaccaccat ggtccaatga caacttccag cagtggcgga
ctccccacca 480gaagcctaca gaacagccac agcaggcgaa gaaactgggc
tacaagttct tagaaagtct 540tctgcagaaa gacccttctg aggtggtcat
cacacttgcc acaagtttag ggctgaaaga 600gctcctttct cattcttcca
tgaaatctaa cttccttgag ctcatctgtc aggttcttcg 660gaaggcttgt
agctccaaaa tggatcgcca gagtgttctc catgtactgg gcatattgaa
720aaactccaaa tttctcaaag tctgcctgcc tgcttatgtg gtagggatga
tcactgaacc 780catccctgac atccgaaacc agtatccaga gcacataagc
aacatcatct ccctcctcca 840ggaccttgta agtgtcttcc ctgccagctc
tgtgcaggaa acttccatgc tggtttccct 900cctgccaacc tctcttaatg
ctctgagagc ctctggtgtt gacatagaag aggaaacgga 960gaagaacctg
gaaaaggtac agactatcat tgaacatctg caggaaaaga ggcgagaggg
1020cactttgaga gtggatacct acactctagt gcagcctgag gcagaagacc
atgttgagag 1080ctaccgaacc atgcccattt accctaccta caatgaagtg
cacttggatg agaggccctt 1140ccttcgcccc aatatcattt ctggaaaata
cgacagcact gctatctatc tggataccca 1200cttccggctc ctgcgagaag
atttcgtcag acctttacgg gaaggtattt tggaacttct 1260ccaaagcttt
gaagaccagg gcctgaggaa gagaaagttt gatgacatcc gaatctactt
1320tgacaccagg attatcaccc ccatgtgttc atcatcaggc atagtctaca
aggtgcagtt 1380tgacacaaaa ccactgaagt ttgttcgctg gcagaattcc
aaacgattgc tctatgggtc 1440tttggtatgc atgtccaagg acaacttcga
gacatttctt tttgccaccg tatctaacag 1500ggagcaggaa gatctctgcc
gaggaattgt ccagctctgc ttcaatgagc aaagccaaca 1560gctgctagca
gaggtccagc cctctgactc tttcctcatg gtagagacaa ctgcatactt
1620tgaggcctac aggcacgtcc tggaaggact ccaggaggtc caggaggaag
atgttccctt 1680ccagaggaat atcgtggagt gtaactctca tgtgaaggag
ccaaggtact tgctaatggg 1740gggcagatat gactttaccc ccttaataga
gaatccttca gccactgggg aatttctaag 1800aaatgtcgag ggtttgagac
atcccagaat taatgtctta gatcctggcc agtggccctc 1860aaaagaagcc
ctgaagctgg atgactccca gatggaagcc ttgcagtttg ctctcacaag
1920ggaactggct attattcaag gacctcctgg aacaggcaaa acctatgtgg
gtctaaaaat 1980tgttcaggcc ctcctaacca acgagtctgt ttggcaaatt
agcctccaga agttccccat 2040cttggttgtg tgttatacta atcatgcttt
ggaccagttt ctggaaggca tctacaattg 2100tcagaagacc agcattgtgc
gggtgggtgg aaggagcaac agtgaaatcc tgaagcagtt 2160caccctaagg
gagctgagga acaagcggga attccgccgc aacctcccca tgcacctccg
2220aagggcctac atgagtatca tgacacagat gaaggagtca gagcaagagc
ttcatgaagg 2280agccaagacc ctggagtgca ccatgcgtgg tgtcctacgg
gaacagtacc tgcagaagta 2340catctcaccc cagcactggg aaagtctcat
gaatggacca gtgcaggata gtgaatggat 2400ttgcttccag cactggaagc
attccatgat gctggagtgg ctaggtcttg gtgtcggttc 2460tttcacgcaa
agtgtttctc cagcaggacc tgagaataca gcccaggcag aaggggatga
2520ggaggaagaa ggggaggagg agagttcgct gatagagatc gcagaggaag
ctgacctgat 2580tcaagcagac cgggtgattg aggaggaaga ggtggtgagg
ccccagcggc ggaagaagga 2640agagagtgga gcagaccagg agttggctaa
aatgcttctg gccatgaggc tagaccattg 2700tggcactggg acagcagctg
gacaggagca agccacagga gagtggcaga cccagcgcaa 2760ccagaaaaag
aaaatgaaaa aaagagtgaa ggatgagctt cgcaaactga acaccatgac
2820tgcagccgag gccaacgaga tcgaggatgt ttggcacctg gacctcagtt
ctcgctggca 2880gctttatagg ctctggctac agttgtacca ggctgacacc
cgccggaaga tcctcagcta 2940tgaacgccag taccgcacat cagcagaaag
aatggccgag ctgagactcc aggaagacct 3000gcacattctt aaagatgccc
aggttgtagg aatgacaacc acaggtgctg ccaaataccg 3060ccagatccta
cagaaggtgg agccgaggat tgtcatagtg gaagaagctg cggaagtcct
3120tgaggcccat accattgcca cattgagcaa agcttgccag cacctcattt
tgattgggga 3180ccaccagcag ctgcgcccca gtgccaacgt gtatgatctg
gccaagaact tcaaccttga 3240ggtgtccctt tttgaacggc tagtgaaagt
aaacattccc tttgtccgtc tgaattacca 3300gcaccgtatg tgccctgaaa
ttgcccgcct tttgaccccc cacatttacc aggatctgga 3360gaatcatcca
tctgttctta agtatgagaa gattaagggg gtgtcttcca accttttctt
3420tgtagaacac aactttcctg aacaggaaat ccaagagggc aaaagccatc
agaaccagca 3480tgaggctcac tttgtggtag agctgtgcaa gtacttcctg
tgccaggaat acctgccttc 3540ccagatcacc atcctcacta cctataccgg
gcagctcttc tgcctgcgca aactgatgcc 3600tgccaagaca tttgctggcg
tcagggtcca tgttgtggac aaataccaag gggaagagaa 3660tgacatcatc
ctcctctcgc tagtgcggag caaccaagaa ggcaaggtgg gttttctgca
3720gatatccaac cgcatctgtg tggccttgtc ccgagccaag aagggaatgt
actgcatcgg 3780aaacatgcag atgctggcca aggtgcccct gtggagcaag
atcattcata cacttcgaga 3840gaacaatcaa ataggcccca tgctccggct
ctgctgccag aaccaccctg aaacccacac 3900cttagtatcc aaagcttctg
acttccaaaa agtacccgaa ggaggctgca gcctgccctg 3960cgagttccgc
ctgggctgtg ggcatgtctg cacccgtgcc tgccaccctt atgactcttc
4020acacaaggag ttccaatgca tgaagccatg ccagaaggtc atctgtcagg
aagggcaccg 4080gtgtcccctt gtttgcttcc aggagtgtca gccttgtcag
gtgaaggtgc ccaaaatcat 4140tcctcggtgc ggccatgaac aaatggtccc
ttgttccgtg cctgagtcag atttctgctg 4200ccaggagcct tgctccaagt
ctctgagatg tgggcacaga tgcagccacc catgtggtga 4260ggactgtgtg
cagttgtgtt cagaaatggt caccataaaa ctcaagtgtg ggcacagtca
4320accggtaaaa tgtggtcatg tggaaggcct cctgtatggt ggtctgctag
tcaagtgtac 4380cacaaagtgt ggcactatct tggactgcgg gcatccttgc
ccaggctcct gccacagctg 4440cttcgaaggg cgtttccatg aacgctgtca
gcagccctgc aagcgcctgc ttatctgctc 4500acacaagtgc caggaaccat
gcattggtga gtgcccaccc tgccagcgga cctgtcagaa 4560ccgctgtgtc
cacagccagt gcaagaagaa atgtggggag ctgtgtagtc cctgcgtgga
4620accctgtgtc tggcgctgcc agcactacca gtgcaccaaa ctctgctctg
agccctgcaa 4680ccgaccccca tgctatgtgc cttgtactaa gctgctagtt
tgtggccacc cctgcattgg 4740tctctgtggg gagccatgcc ccaagaaatg
ccggatctgc cacatggatg aggtcaccca 4800aatattcttt ggctttgagg
atgagcctga tgcccgcttt gtgcagctgg aagactgcag 4860ccacatcttt
gaggtgcaag ccctagaccg ctacatgaat gaacagaagg atgatgaagt
4920cgccatcaga ttgaaagtct gccctatctg ccaggtgccc atccgcaaaa
acctgaggta 4980tggaactagc ataaaacagc ggctagaaga gattgaaatc
atcaaggaaa agatccaggg 5040ctcagcaggg gaaatagcaa ccagccagga
acggcttaag gccctgctgg agaggaagag 5100cctcctccac cagctgcttc
ctgaagactt cctgatgtta aaggagaagc tggcccagaa 5160aaatctgtca
gtgaaggacc tgggtctggt tgagaattac atcagcttct atgaccacct
5220ggccagcctg tgggattccc tgaaaaagat gcatgtctta gaagagaaaa
gagtgaggac 5280tcgactagaa caggtccatg agtggctggc caagaagcgc
ttgagcttca ctagccagga 5340actaagtgac ctccgaagtg aaatccagag
gctcacatac ctggtgaacc ttctgacccg 5400ctacaagata gcagagaaga
aggtgaaaga tagcatagca gtagaggtct atagtgtcca 5460gaatatcctt
gagaaaacat gtaagttcac ccaagaggat gaacaacttg tgcaggaaaa
5520gatggaagct ctgaaagcca cccttccctg ctctggcctg ggcatctcag
aggaagagcg 5580agtgcagatt gtcagtgcca taggttatcc tcgtggtcac
tggttcaagt gccgcaatgg 5640ccatatctat gtgattggcg attgtggggg
agccatggag aggggcacgt gtcctgactg 5700taaggaagtg attggtggca
caaatcatac tctggaaaga agcaaccagc ttgcttctga 5760aatggatgga
gcccagcatg ctgcctggtc tgacacggcc aacaacctga tgaactttga
5820ggagatccag gggatgatgt aggaagatgg tacaccactg ccttttgccc
tcgccactga 5880atgactgggg ccagctccct aatgaaggaa ctgaagtttg
ttttttatta tcatcctttt 5940taggctgggc gcagtggctt acgcctgtaa
tcccagcact ttgggaggcc gaggcaggcg 6000gatcacgagg tcaggagttc
gagaccagcc tgaccaacat ggcgaaaccc cgtctctact 6060aaaaatacaa
aaattagctg ggcgttatgg cgggcgcctg taatcccagc tacttgggag
6120gctgaggcag aagaatcgct taaacccagg aggcggaggt tgcagtgagc
tgagatcatg 6180ccattgcact ccagtctggg cgacaggagc aagactctgt
ctcaaaaaaa aaaaatcatt 6240ctttttagtc ttagcaccta cttaaggatc
cacttttagg gctcacccac atttgtttct 6300agatttaccc ctgcgctaga
gtaagcactt tatctccaga actgagagca aagttaacaa 6360atctcacccc
ttctctcctg caaattagtg gacagactcc ctggaacatg tttggggctt
6420ccacctaggg ccacctagtg gtatctctgg gtctttactt ggtcagatgt
ttattctaca 6480ttgttcccca ggaacagagt atgagctcat tgatgcagac
cgattctaat tgccaggccc 6540taatttgcag actaactctc ataataaaca
gaggcccata gttgtttatg aactgcttat 6600cccttaaagg agcacaagaa
cccctccctg ccctccttgg gcaccctgcc tccaggagat 6660ggaggcacgt
gataagacaa aagactgcac caactcaccc tgacacagtt acatagtcac
6720tgagagtggg gaagatggga cagcccacat gctgcataag atgggcctta
tgcagcaggc 6780ccaggtcgtc attaaggagt gacccctttc ctgtaacctg
cactttggga tggtagaagt 6840ttctttacct gctgacaggt ttggtggcac
tgctggttac ccctgggccc tgaatggagc 6900taaaatcaca tttggtacca
gcagcaccta tcccaagtgt gatccttcat cccaacactc 6960cctcttggag
ctgttccctg ggtagagcta gcatgccagc agcttctgca ggctccaaac
7020ccaggccaga agccagaccc aggcctgctg cctgcatctg cattccctcc
ttccagtgtt 7080ccttagaaca gacatttagg tatctcaggt cctttctaag
tgtccctttc ctatgtatgc 7140atttcctttt tttgtcttta ctatgcactt
tagcttataa agccaattaa aaacgatgat 7200tgag 7204
14 2054 DNA Homo sapiens
14 gggaagctcg ggccggcagg gtttccccgc acgctggcgc ccagctcccg gcgcggaggc
60cgctgtaagt ttcgctttcc attcagtgga aaacgaaagc tgggcggggt gccacgagcg
120cggggccaga ccaaggcggg cccggagcgg aacttcggtc ccagctcggt
ccccggctca 180gtcccgacgt ggaactcagc agcggaggct ggacgcttgc
atggcgcttg agagattcca 240tcgtgcctgg ctcacataag cgcttcctgg
aagtgaagtc gtgctgtcct gaacgcgggc 300caggcagctg cggcctgggg
gttttggagt gatcacgaat gagcaaggcg tttgggctcc 360tgaggcaaat
ctgtcagtcc atcctggctg agtcctcgca gtccccggca gatcttgaag
420aaaagaagga agaagacagc aacatgaaga gagagcagcc cagagagcgt
cccagggcct 480gggactaccc tcatggcctg gttggtttac acaacattgg
acagacctgc tgccttaact 540ccttgattca ggtgttcgta atgaatgtgg
acttcaccag gatattgaag aggatcacgg 600tgcccagggg agctgacgag
cagaggagaa gcgtcccttt ccagatgctt ctgctgctgg 660agaagatgca
ggacagccgg cagaaagcag tgcggcccct ggagctggcc tactgcctgc
720agaagtgcaa cgtgcccttg tttgtccaac atgatgctgc ccaactgtac
ctcaaactct 780ggaacctgat taaggaccag atcactgatg tgcacttggt
ggagagactg caggccctgt 840atacgatccg ggtgaaggac tccttgattt
gcgttgactg tgccatggag agtagcagaa 900acagcagcat gctcaccctc
ccactttctc tttttgatgt ggactcaaag cccctgaaga 960cactggagga
cgccctgcac tgcttcttcc agcccaggga gttatcaagc aaaagcaagt
1020gcttctgtga gaactgtggg aagaagaccc gtgggaaaca ggtcttgaag
ctgacccatt 1080tgccccagac cctgacaatc cacctcatgc gattctccat
caggaattca cagacgagaa 1140agatctgcca ctccctgtac ttcccccaga
gcttggattt cagccagatc cttccaatga 1200agcgagagtc ttgtgatgct
gaggagcagt ctggagggca gtatgagctt tttgctgtga 1260ttgcgcacgt
gggaatggca gactccggtc attactgtgt ctacatccgg aatgctgtgg
1320atggaaaatg gttctgcttc aatgactcca atatttgctt ggtgtcctgg
gaagacatcc 1380agtgtaccta cggaaatcct aactaccact ggcaggaaac
tgcatatctt ctggtttaca 1440tgaagatgga gtgctaatgg aaatgcccaa
aaccttcaga gattgacacg ctgtcatttt 1500ccatttccgt tcctggatct
acggagtctt ctaagagatt ttgcaatgag gagaagcatt 1560gttttcaaac
tatataactg agccttattt ataattaggg atattatcaa aatatgtaac
1620catgaggccc ctcaggtcct gatcagtcag aatggatgct ttcaccagca
gacccggcca 1680tgtggctgct cggtcctggg tgctcgctgc tgtgcaagac
attagccctt tagttatgag 1740cctgtgggaa cttcaggggt tcccagtggg
gagagcagtg gcagtgggag gcatctgggg 1800gccaaaggtc agtggcaggg
ggtatttcag tattatacaa ctgctgtgac cagacttgta 1860tactggctga
atatcagtgc tgtttgtaat ttttcacttt gagaaccaac attaattcca
1920tatgaatcaa gtgttttgta actgctattc atttattcag caaatattta
ttgatcatct 1980cttctccata agatagtgtg ataaacacag tcatgaataa
agttattttc cacaaaaaaa 2040aaaaaaaaaa aaaa 2054
15 2961 DNA Homo sapiens
15 aagagatgat ttctccatcc tgaacgtgca gcgagcttgt caggaagatc ggaggtgcca
60agtagcagag aaagcatccc ccagctctga cagggagaca gcacatgtct aaggcccaca
120agccttggcc ctaccggagg agaagtcaat tttcttctcg aaaatacctg
aaaaaagaaa 180tgaattcctt ccagcaacag ccaccgccat tcggcacagt
gccaccacaa atgatgtttc 240ctccaaactg gcagggggca gagaaggacg
ctgctttcct cgccaaggac ttcaactttc 300tcactttgaa caatcagcca
ccaccaggaa acaggagcca accaagggca atggggcccg 360agaacaacct
gtacagccag tacgagcaga aggtgcgccc ctgcattgac ctcatcgact
420ccctgcgggc tctgggtgtg gagcaggacc tggccctgcc agccatcgcc
gtcatcgggg 480accagagctc gggcaagagc tctgtgctgg aggcactgtc
aggagtcgcg cttcccagag 540gcagcggaat cgtaaccagg tgtccgctgg
tgctgaaact gaaaaagcag ccctgtgagg 600catgggccgg aaggatcagc
taccggaaca ccgagctaga gcttcaggac cctggccagg 660tggagaaaga
gatacacaaa gcccagaacg tcatggccgg gaatggccgg ggcatcagcc
720atgagctcat cagcctggag atcacctccc ctgaggttcc agacctgacc
atcattgacc 780ttcccggcat caccagggtg gctgtggaca accagccccg
agacatcgga ctgcagatca 840aggctctcat caagaagtac atccagaggc
agcagacgat caacttggtg gtggttccct 900gtaacgtgga cattgccacc
acggaggcgc tgagcatggc ccatgaggtg gacccggaag 960gggacaggac
catcggtatc ctgaccaaac cagatctaat ggacaggggc actgagaaaa
1020gcgtcatgaa tgtggtgcgg aacctcacgt accccctcaa gaagggctac
atgattgtga 1080agtgccgggg ccagcaggag atcacaaaca ggctgagctt
ggcagaggca accaagaaag 1140aaattacatt ctttcaaaca catccatatt
tcagagttct cctggaggag gggtcagcca 1200cggttccccg actggcagaa
agacttacca ctgaactcat catgcatatc caaaaatcgc 1260tcccgttgtt
agaaggacaa ataagggaga gccaccagaa ggcgaccgag gagctgcggc
1320gttgcggggc tgacatcccc agccaggagg ccgacaagat gttctttcta
attgagaaaa 1380tcaagatgtt taatcaggac atcgaaaagt tagtagaagg
agaagaagtt gtaagggaga 1440atgagacccg tttatacaac aaaatcagag
aggattttaa aaactgggta ggcatacttg 1500caactaatac ccaaaaagtt
aaaaatatta tccacgaaga agttgaaaaa tatgaaaagc 1560agtatcgagg
caaggagctt ctgggatttg tcaactacaa gacatttgag atcatcgtgc
1620atcagtacat ccagcagctg gtggagcccg cccttagcat gctccagaaa
gccatggaaa 1680ttatccagca agctttcatt aacgtggcca aaaaacattt
tggcgaattt ttcaacctta 1740accaaactgt tcagagcacg attgaagaca
taaaagtgaa acacacagca aaggcagaaa 1800acatgatcca acttcagttc
agaatggagc agatggtttt ttgtcaagat cagatttaca 1860gtgttgttct
gaagaaagtc cgagaagaga tttttaaccc tctggggacg ccttcacaga
1920atatgaagtt gaactctcat tttcccagta atgagtcttc ggtttcctcc
tttactgaaa 1980taggcatcca cctgaatgcc tacttcttgg aaaccagcaa
acgtctcgcc aaccagatcc 2040catttataat tcagtatttt atgctccgag
agaatggtga ctccttgcag aaagccatga 2100tgcagatact acaggaaaaa
aatcgctatt cctggctgct tcaagagcag agtgagaccg 2160ctaccaagag
aagaatcctt aaggagagaa tttaccggct cactcaggcg cgacacgcac
2220tctgtcaatt ctccagcaaa gagatccact gaagggcggc gatgcctgtg
gttgttttct 2280tgtgcgtact cattcattct aaggggagtc ggtgcaggat
gccgcttctg ctttggggcc 2340aaactcttct gtcactatca gtgtccatct
ctactgtact ccctcagcat cagagcatgc 2400atcaggggtc cacacaggct
cagctctctc caccacccag ctcttccctg accttcacga 2460agggatggct
ctccagtcct tgggtcccgt agcacacagt tacagtgtcc taagatactg
2520ctatcattct tcgctaattt gtatttgtat tcccttcccc ctacaagatt
atgagacccc 2580agagggggaa ggtctgggtc aaattcttct tttgtatgtc
cagtctcctg cacagcacct 2640gcagcattgt aactgcttaa taaatgacat
ctcactgaac gaatgagtgc tgtgtaagtg 2700atggagatac ctgaggctat
tgctcaagcc caggccttgg acatttagtg actgttagcc 2760ggtccctttc
agatccagtg gccatgcccc ctgcttccca tggttcactg tcattgtgtt
2820tcccagcctc tccactcccc cgccagaaag gagcctgagt gattctcttt
tcttcttgtt 2880tccctgatta tgatgagctt ccattgttct gttaagtctt
gaagaggaat ttaataaagc 2940aaagaaactt tttaaaaacg t 2961163539DNAHomo
sapiens 16caagagttgg taagctcgct gcagtgggtg gagagaggcc tctagacttc
agtttcagtt 60tcctggctct gggcagcagc aagaattcct ctgcctccca tcctaccatt
cactgtcttg 120ccggcagcca gctgagagca atgggaaatg gggagtccca
gctgtcctcg gtgcctgctc 180agaagctggg ttggtttatc caggaatacc
tgaagcccta cgaagaatgt cagacactga 240tcgacgagat ggtgaacacc
atctgtgacg tcctgcagga acccgaacag ttccccctgg 300tgcagggagt
ggccataggt ggctcctatg gacggaaaac agtcttaaga ggcaactccg
360atggtaccct tgtcctcttc ttcagtgact taaaacaatt ccaggatcag
aagagaagcc 420aacgtgacat cctcgataaa actggggata agctgaagtt
ctgtctgttc acgaagtggt 480tgaaaaacaa tttcgagatc cagaagtccc
ttgatgggtt caccatccag gtgttcacaa 540aaaatcagag aatctctttc
gaggtgctgg ccgccttcaa cgctctgagc ttaaatgata 600atcccagccc
ctggatctat cgagagctca aaagatcctt ggataagaca aatgccagtc
660ctggtgagtt tgcagtctgc ttcactgaac tccagcagaa gttttttgac
aaccgtcctg 720gaaaactaaa ggatttgatc ctcttgataa agcactggca
tcaacagtgc cagaaaaaaa 780tcaaggattt accctcgctg tctccgtatg
ccctggagct gcttacggtg tatgcctggg 840aacaggggtg cagaaaagac
aactttgaca ttgctgaagg cgtcagaacc gtactggagc 900tgatcaaatg
ccaggagaag ctgtgtatct attggatggt caactacaac tttgaagatg
960agaccatcag gaacatcctg ctgcaccagc tccaatcagc gaggccagta
atcttggatc 1020cagttgaccc aaccaataat gtgagtggag ataaaatatg
ctggcaatgg ctgaaaaaag 1080aagctcaaac ctggttgact tctcccaacc
tggataatga gttacctgca ccatcttgga 1140atgttctgcc tgcaccactc
ttcacgaccc caggccacct tctggataag ttcatcaagg 1200agtttctcca
gcccaacaaa tgcttcctag agcagattga cagtgctgtt aacatcatcc
1260gtacattcct taaagaaaac tgcttccgac aatcaacagc caagatccag
attgtccggg 1320gaggatcaac cgccaaaggc acagctctga agactggctc
tgatgccgat ctcgtcgtgt 1380tccataactc acttaaaagc tacacctccc
aaaaaaacga gcggcacaaa atcgtcaagg 1440aaatccatga acagctgaaa
gccttttgga gggagaagga ggaggagctt gaagtcagct 1500ttgagcctcc
caagtggaag gctcccaggg tgctgagctt ctctctgaaa tccaaagtcc
1560tcaacgaaag tgtcagcttt gatgtgcttc ctgcctttaa tgcactgggt
cagctgagtt 1620ctggctccac acccagcccc gaggtttatg cagggctcat
tgatctgtat aaatcctcgg 1680acctcccggg aggagagttt tctacctgtt
tcacagtcct gcagcgaaac ttcattcgct 1740cccggcccac caaactaaag
gatttaattc gcctggtgaa gcactggtac aaagagtgtg 1800aaaggaaact
gaagccaaag gggtctttgc ccccaaagta tgccttggag ctgctcacca
1860tctatgcctg ggagcagggg agtggagtgc cggattttga cactgcagaa
ggtttccgga 1920cagtcctgga gctggtcaca caatatcagc agctctgcat
cttctggaag gtcaattaca 1980actttgaaga tgagaccgtg aggaagtttc
tactgagcca gttgcagaaa accaggcctg 2040tgatcttgga cccagccgaa
cccacaggtg acgtgggtgg aggggaccgt tggtgttggc 2100atcttctggc
aaaagaagca aaggaatggt tatcctctcc ctgcttcaag gatgggactg
2160gaaacccaat accaccttgg aaagtgccga caatgcagac accaggaagt
tgtggagcta 2220ggatccatcc tattgtcaat gagatgttct catccagaag
ccatagaatc ctgaataata 2280attctaaaag aaacttctag agatcatctg
gcaatcgctt ttaaagactc ggctcaccgt 2340gagaaagagt cactcacatc
cattcttccc ttgatggtcc ctattcctcc ttcccttgct 2400tcttggactt
cttgaaatca atcaagactg caaacccttt cataaagtct tgccttgctg
2460aactccctct ctgcaggcag cctgccttta aaaatagttg ctgtcatcca
ctttatgtgc 2520atcttatttc tgtcaacttg tatttttttt cttgtatttt
tccaattagc tcctcctttt 2580tccttccagt ctaaaaaagg aatcctctgt
gtcttcaaag caaagctctt tactttcccc 2640ttggttctca taactctgtg
atcttgctct cggtgcttcc aactcatcca cgtcctgtct 2700gtttcctctg
tatacaaaac cctttctgcc cctgctgaca cagacatcct ctatgccagc
2760agccagccaa ccctttcatt agaacttcaa gctctccaaa ggctcagatt
ataactgttg 2820tcatatttat atgaggctgt tgtcttttcc ttctgagcct
gcctttctcc cccccaccca 2880ggagtatcct cttgccaaat caaaagactt
tttccttggg ctttagcctt aaagatactt 2940gaaggtctag gtgctttaac
ctcacatacc ctcacttaaa cttttatcac tgttgcatat 3000accagttgtg
atacaataaa gaatgtatct ggattttgtg cctagttcct agcacacagc
3060ttcaaaaatt ctagagtttc ctgataggag tgtcttttgt attcataaca
agcccttttc 3120acccatgcct gggtttatgc taacaaggtt acccatggtg
ggcccttagt ttcaaggaag 3180gagttggcca agccagaaag accaagcatg
tggttaaagc attggaattt tcagccccat 3240cccaccccca atctccaagg
aggtgatggg gctggaaatt gagttcaatt ttaacatggc 3300cagtgattta
agcaatgctg cctatgtaaa gaaaccccaa taaaaactct ggacagtgag
3360gcttggggag cttcctgatt ggcagacatt ccaatgtact aggaaggtag
cgcatcttga 3420ttccacaggg acaaaggctc ctgagctctg ggcccttcca
gtgcttgcca ccctacatac 3480tctttgtctg gctcttcatt tgtattcttt
ataataaaat ggtgattgta agtagagca 3539
17 1815 DNA Homo sapiens
17 ggggagatga tccgagccgc gccgccgccg ctgttcctgc tgctgctgct gctgctgctg
60ctagtgtcct gggcgtcccg aggcgaggca gcccccgacc aggacgagat ccagcgcctc
120cccgggctgg ccaagcagcc gtctttccgc cagtactccg gctacctcaa
aagctccggc 180tccaagcacc tccactactg gtttgtggag tcccagaagg
atcccgagaa cagccctgtg 240gtgctttggc tcaatggggg tcccggctgc
agctcactag atgggctcct cacagagcat 300ggccccttcc tggtccagcc
agatggtgtc accctggagt acaaccccta ttcttggaat 360ctgattgcca
atgtgttata cctggagtcc ccagctgggg tgggcttctc ctactccgat
420gacaagtttt atgcaactaa tgacactgag gtcgcccaga gcaattttga
ggcccttcaa 480gatttcttcc gcctctttcc ggagtacaag aacaacaaac
ttttcctgac cggggagagc 540tatgctggca tctacatccc caccctggcc
gtgctggtca tgcaggatcc cagcatgaac 600cttcaggggc tggctgtggg
caatggactc tcctcctatg agcagaatga caactccctg 660gtctactttg
cctactacca tggccttctg gggaacaggc tttggtcttc tctccagacc
720cactgctgct ctcaaaacaa gtgtaacttc tatgacaaca aagacctgga
atgcgtgacc 780aatcttcagg aagtggcccg catcgtgggc aactctggcc
tcaacatcta caatctctat 840gccccgtgtg ctggaggggt gcccagccat
tttaggtatg agaaggacac tgttgtggtc 900caggatttgg gcaacatctt
cactcgcctg ccactcaagc ggatgtggca tcaggcactg 960ctgcgctcag
gggataaagt gcgcatggac cccccctgca ccaacacaac agctgcttcc
1020acctacctca acaacccgta cgtgcggaag gccctcaaca tcccggagca
gctgccacaa 1080tgggacatgt gcaactttct ggtaaactta cagtaccgcc
gtctctaccg aagcatgaac 1140tcccagtatc tgaagctgct tagctcacag
aaataccaga tcctattata taatggagat 1200gtagacatgg cctgcaattt
catgggggat gagtggtttg tggattccct caaccagaag 1260atggaggtgc
agcgccggcc ctggttagtg aagtacgggg acagcgggga gcagattgcc
1320ggcttcgtga aggagttctc ccacatcgcc tttctcacga tcaagggcgc
cggccacatg 1380gttcccaccg acaagcccct cgctgccttc accatgttct
cccgcttcct gaacaagcag 1440ccatactgat gaccacagca accagctcca
cggcctgatg cagcccctcc cagcctctcc 1500cgctaggaga gtcctcttct
aagcaaagtg cccctgcagg cgggttctgc cgccaggact 1560gcccccttcc
cagagccctg tacatcccag actgggccca gggtctccca tagacagcct
1620gggggcaagt tagcacttta ttcccgcagc agttcctgaa tggggtggcc
tggccccttc 1680tctgcttaaa gaatgccctt tatgatgcac tgattccatc
ccaggaaccc aacagagctc 1740aggacagccc acagggaggt ggtggacgga
ctgtaattga tagattgatt atggaattaa 1800attgggtaca gcttc
1815
18 836 DNA Homo sapiens
18 ccagccttca gccggagaac cgtttactcg
ctgctgtgcc catctatcag caggctccgg 60gctgaagatt gcttctcttc tctcctccaa
ggtctagtga cggagcccgc gcgcggcgcc 120accatgcggc agaaggcggt
atcgcttttc ttgtgctacc tgctgctctt cacttgcagt 180ggggtggagg
caggtaagaa aaagtgctcg gagagctcgg acagcggctc cgggttctgg
240aaggccctga ccttcatggc cgtcggagga ggactcgcag tcgccgggct
gcccgcgctg 300ggcttcaccg gcgccggcat cgcggccaac tcggtggctg
cctcgctgat gagctggtct 360gcgatcctga atgggggcgg cgtgcccgcc
ggggggctag tggccacgct gcagagcctc 420ggggctggtg gcagcagcgt
cgtcataggt aatattggtg ccctgatggg ctacgccacc 480cacaagtatc
tcgatagtga ggaggatgag gagtagccag cagctcccag aacctcttct
540tccttcttgg cctaactctt ccagttagga tctagaactt tgcctttttt
tttttttttt 600tttttttgag atgggttctc actatattgt ccaggctaga
gtgcagtggc tattcacaga 660tgcgaacata gtacactgca gcctccaact
cctagcctca agtgatcctc ctgtctcaac 720ctcccaagta ggattacaag
catgcgccga cgatgcccag aatccagaac tttgtctatc 780actctcccca
acaacctaga tgtgaaaaca gaataaactt cacccagaaa acactt 836
19 2077 DNA Homo sapiens
19 ccgagcgcca gcgcggggaa ccgggaaaag gaaaccgtgt tgtgtacgta
agattcagga 60aacgaaacca ggagccgcgg gtgttggcgc aaaggttact cccagaccct
tttccggctg 120acttctgaga aggttgcgca cagctgtgcc cggcagtcta
gaggcgcaga agaggaagcc 180atcgcctggc cccggctctc tggaccttgt
ctcgctcggg agcggaaaca gcggcagcca 240gagaactgtt ttaatcatgg
acaaacaaaa ctcacagatg aatgcttctc acccggaaac 300aaacttgcca
gttgggtatc ctcctcagta tccaccgaca gcattccaag gacctccagg
360atatagtggc taccctgggc cccaggtcag ctacccaccc ccaccagccg
gccattcagg 420tcctggccca gctggctttc ctgtcccaaa tcagccagtg
tataatcagc cagtatataa 480tcagccagtt ggagctgcag gggtaccatg
gatgccagcg ccacagcctc cattaaactg 540tccacctgga ttagaatatt
taagtcagat agatcagata ctgattcatc agcaaattga 600acttctggaa
gttttaacag gttttgaaac taataacaaa tatgaaatta agaacagctt
660tggacagagg gtttactttg cagcggaaga tactgattgc tgtacccgaa
attgctgtgg 720gccatctaga ccttttacct tgaggattat tgataatatg
ggtcaagaag tcataactct 780ggagagacca ctaagatgta gcagctgttg
ttgtccctgc tgccttcagg agatagaaat 840ccaagctcct cctggtgtac
caataggtta tgttattcag acttggcacc catgtctacc 900aaagtttaca
attcaaaatg agaaaagaga ggatgtacta aaaataagtg gtccatgtgt
960tgtgtgcagc tgttgtggag atgttgattt tgagattaaa tctcttgatg
aacagtgtgt 1020ggttggcaaa atttccaagc actggactgg aattttgaga
gaggcattta cagacgctga 1080taactttgga atccagttcc ctttagacct
tgatgttaaa atgaaagctg taatgattgg 1140tgcctgtttc ctcattgact
tcatgttttt tgaaagcact ggcagccagg aacaaaaatc 1200aggagtgtgg
tagtggatta gtgaaagtct cctcaggaaa tctgaagtct gtatattgat
1260tgagactatc taaactcata cctgtatgaa ttaagctgta aggcctgtag
ctctggttgt 1320atacttttgc ttttcaaatt atagtttatc ttctgtataa
ctgatttata aaggtttttg 1380tacatttttt aatactcatt gtcaatttga
gaaaaaggac atatgagttt ttgcatttat 1440taatgaaact tcctttgaaa
aactgctttg aattatgatc tctgattcat tgtccatttt 1500actaccaaat
attaactaag gccttattaa tttttatata aattatatct tgtcctatta
1560aatctagtta caatttattt catgcataag agctaatgtt attttgcaaa
tgccatatat 1620tcaaaaaagc tcaaagataa ttttctttac tattatgttc
aaataatatt caatatgcat 1680attatcttta aaaagttaaa tgttttttta
atcttcaaga aatcatgcta cacttaactt 1740ctcctagaag ctaatctata
ccataatatt ttcatattca caagatatta aattaccaat 1800tttcaaatta
ttgttagtaa agaacaaaat gattctctcc caaagaaaga cacattttaa
1860atactccttc actctaaaac tctggtatta taacttttga aagttaatat
ttctacatga 1920aatgtttagc tcttacactc tatccttcct agaaaatggt
aattgagatt actcagatat 1980taattaaata caatatcata tatatattca
cagagtataa acctaaataa tgatctatta 2040gattcaaata tttgaaataa
aaacttgatt tttttgt 2077
20 1640 DNA Homo sapiens
20 aatgccacct
gcttgaaggc tatatgtgac aagtcactag aggttcacct gcaggttgac 60gccatgtaca
caaatgtcaa agtaactaat atttgctctg atgggacact ctactgccag
120gtgccttgta agggtctgaa caagctcagt gaccttctac gtaagataga
ggactacttc 180cattgcaagc acatgacctc tgagtgcttt gtttcattac
ccttctgtgg gaaaatctgc 240ctcttccatt gcaaaggaaa atggttacga
gtagagatca caaatgttca cagcagccgg 300gctcttgatg ttcagttcct
ggactctggc actgtgacat ctgtaaaagt gtcagagctc 360agggaaattc
cacctcggtt tctacaagaa atgattgcaa taccacctca ggccattaag
420tgctgtttag cagatcttcc acaatctatt ggcatgtgga caccagatgc
agtgctgtgg 480ttaagagatt ctgttttgaa ttgctcggac tgtagcatta
aggttacaaa agtggatgaa 540accagaggga tcgcacatgt ttatttattt
acccctaaga acttccctga ccctcatcgc 600agtattaatc gccagattac
aaatgcagac ttgtggaagc atcagaagga tgtgtttttg 660agtgccatat
ccagtggagc tgactctccc aacagcaaaa atggcaacat gcccatgtcg
720ggcaacactg gagagaattt cagaaagaac ctcacagatg tcatcaaaaa
gtccatggtg 780gaccatacga gcgctttctc cacagaggaa ctgccacctc
ctgtccactt atcaaagcca 840ggggaacaca tggatgtgta tgtgcctgtg
gcctgtcacc caggctactt cgtcatccag 900ccttggcagg agatacataa
gttggaagtt ctgatggaag agatgattct atattacagc 960gtgtctgaag
agcgccacat agcagtggag aaagaccaag tgtatgctgc aaaagtggaa
1020aataagtggc acagggtgct tttaaaagga atcctgacca atggactggt
atctgtgtat 1080gagctggatt atggcaaaca cgaattagtc aacataagaa
aagtacagcc cctagtggac 1140atgttccgaa agctgccctt ccaagcagtc
acagctcaac ttgcaggagt gaagtgcaac 1200cagtggtctg aggaggcttc
tatggtgttt cgaaatcatg tggagaagaa acctctggtg 1260gcactggtgc
agacagtcat tgaaaatgct aacccttggg accggaaagt agtggtctac
1320ttagtggaca catcgttgcc agacaccgat acctggattc atgattttat
gtcagagtat 1380ctgatagagc tttcaaaagt taattaatga ctgcctctga
aaccttgaca actaattcag 1440attttttagc aataacaaaa tgtagtaggc
ttaaaaaaaa tcttaactct gctacatggc 1500tctgactgct gtgggggatt
gaaaagaata tgcttatgtt tgatgaaaga tatttaacaa 1560gttttgtttt
aacagagttg acttttcaaa gaaaattgta cttgaattat tactataata
1620ttagaataaa aatgtttatc 1640
21 591 DNA Homo sapiens
21 agaaaatgct
tctatttttc tttagcacct ccatggttct catataccca tgtctgtaaa 60aagtgacatg
agaattttgt tgggttacat tttattgtat ttattagatt cgcttatata
120gatgacttag gcagaaataa agtcatgtct ttagaaggtg aacaagccaa
cttgtgatgg 180cctgcctttt gcttttggca gttgggatga gaacaattga
ctctcccatt ggttgttaga 240tagttgaaat ggtgcgttgg tggtcatact
tagtgttcta ggctgtgaaa tcatggagtt 300cttccacttc caagaatgac
tcatttgctg ttggattcta gtacagaatt tagcagcctg 360atgtgtcccc
aaactgattt aatttctact gaagtgccct tgtgtacatt tgttttgtaa
420tttaccaaag tactacctga gtgtataatg actcctgcag tgagttaatg
taattgctgc 480tttgaccatt gttttaaatc tgtgtactag agtaactgtg
agcagaatga aatcacatta 540tctcagtgtt caaaatatca ttctaataaa
gtacatgcat taaacaattt t 591
22 1098 DNA Homo sapiens
22 atgtcagccc
cactggatgc cgccctccac gcccttcagg aggagcaggc cagaccgccc 60tccacgccct
tcaggaggag caggccagac tcaagatgag gctgtgggac ctgcagcagc
120tgagaaagga gctcggggac tcccccaaag acaaggtccc attttcagtg
cccaagatcc 180ccctggtatt ccgaggacac acccagcagg acccggaagt
gcctaagtct ttagtttcca 240atttgcggat ccactgccct ctgcttgcgg
gctctgctct gatcaccttt gatgacccca 300aagtggctga gcaggtgctg
caacaaaagg agcacacgat caacatggag gagtgccggc 360tgcgggtgca
ggtccagccc ttggagctgc ccatggtcac caccatccag gtgatggtgt
420ccagccagtt gagtggccgg agggtgttgg tcactggatt tcctgccagc
ctcaggctga 480gtgaggagga gctgctggac aagctagaga tcttctttgg
caagactagg aacggaggtg 540gcgatgtgga cgttcgggag ctactgccag
ggagtgtcat gctggggttt gctagggatg 600gagtggctca gcgtctgtgc
caaatcggcc agttcacagt gccactgggt gggcagcaag 660tccctctgag
agtctctccg tatgtgaatg gggagatcca gaaggctgag atcaggtcgc
720agccagttcc ccgctcggta ctggtgctca acattcctga tatcttggat
ggcccggagc 780tgcatgacgt cctggagatc cacttccaga agcccacccg
cgggggcggg gaggtagagg 840ccctgacagt cgtaccccaa ggacagcagg
gcctagcagt cttcacctct gagtcaggct 900aggggcctcc ccttctcatc
ctccccaccc ccccgccaag gttctcacac tggcctgggc 960ttgggtgccc
atataggagg tctgtatgtt caccaacagt gcggaggggt cacacattgc
1020aaaacactgc ccagaacagt aaaaagagcc tgcatgccaa aaaaaaaaaa
aaaaaaaaaa 1080aaaaaaaaaa aaaaaaaa 1098
23 2359 DNA Homo sapiens
23 gttttgcctg ctagcatctc cctgtaactc tcccaatctt gaggagtgat ccctgtccca
60gcccctggaa aggggcagga acgacaaact caaagtccag gatgttcacc atgacaagag
120ccatggaaga ggctcttttt cagcacttca tgcaccagaa gctggggatc
gcctatgcca 180tacacaagcc atttcccttc tttgaaggcc tcctagacaa
ctccatcatc actaagagaa 240tgtacatgga atctctggaa gcctgtagaa
atttgatccc tgtatccaga gtggtgcaca 300acattctcac ccaactggag
aggactttta acctgtctct tctggtgaca ttgttcagtc 360aaattaacct
gcgtgaatat cccaatctgg tgacgattta cagaagcttc aaacgtgttg
420gtgcttccta tgaacggcag agcagagaca caccaatcct acttgaagcc
ccaactggcc 480tagcagaagg aagctccctc cataccccac tggcgctgcc
cccaccacaa ccccctcaac 540caagctgttc accctgtgcg
ccaagagtca gtgagcctgg aacatcctcc cagcaaagcg 600atgagatcct
gagtgagtcg cccagcccat ctgaccctgt cctgcctctc cctgcactca
660tccaggaagg aagaagcact tcagtgacca atgacaagtt aacatccaaa
atgaatgcgg 720aagaagactc agaagagatg cccagcctcc tcactagcac
tgtgcaagtg gccagtgaca 780acctgatccc ccaaataaga gataaagaag
accctcaaga gatgccccac tctcccttgg 840gctctatgcc agagataaga
gataattctc cagaaccaaa tgacccagaa gagccccagg 900aggtgtccag
cacaccttca gacaagaaag gaaagaaaag aaaaagatgt atctggtcaa
960ctccaaaaag gagacataag aaaaaaagcc tcccaagagg gacagcctca
tctagacacg 1020gaatccaaaa gaagctcaaa agggtggatc aggttcctca
aaagaaagat gactcaactt 1080gtaactccac ggtagagaca agggcccaaa
aggcgagaac tgaatgtgcc cgaaagtcga 1140gatcagagga gatcattgat
ggcacttcag aaatgaatga aggaaagagg tcccagaaga 1200cgcctagtac
accacgaagg gtcacacaag gggcagcctc acctgggcat ggcatccaag
1260agaagctcca agtggtggat aaggtgactc aaaggaaaga cgactcaacc
tggaactcag 1320aggtcatgat gagggtccaa aaggcaagaa ctaaatgtgc
ccgaaagtcc agatcgaaag 1380aaaagaaaaa ggagaaagat atctgttcaa
gctcaaaaag gagatttcag aaaaatattc 1440accgaagagg aaaacccaaa
agtgacactg tggattttca ctgttctaag ctccccgtga 1500cctgtggtga
ggcgaaaggg attttatata agaagaaaat gaaacacgga tcctcagtga
1560agtgcattcg gaatgaggat ggaacttggt taacaccaaa tgaatttgaa
gtcgaaggaa 1620aaggaaggaa cgcaaagaac tggaaacgga atatacgttg
tgaaggaatg accctaggag 1680agctgctgaa gcggaaaaac tcggatgaat
gcgaggtgtg ctgtcaaggg ggacaacttc 1740tctgctgcgg tacttgtcca
cgagtcttcc atgaggactg tcacatcccc cctgtggaag 1800ccaagaggat
gctgtggagt tgcaccttct gcaggatgaa gaggtcttca ggaagccaac
1860agtgccatca tgtatctaag accctggaga ggcagatgca gcctcaggac
cagctgattc 1920gagattacgg tgagcccttt caggaagcaa tgtggttgga
cctggttaag gaaaggctga 1980ttacggaaat gtacacggtg gcatggtttg
tgcgagacat gcgcctgatg tttcgcaacc 2040ataaaacatt ttacaaggct
tctgactttg gccaggtagg acttgactta gaggcagaat 2100ttgaaaaaga
tctcaaagac gtgctcggtt ttcatgaagc caatgacggc ggtttctgga
2160ctcttccttg accctgttct gtaaagactg aagcatcccc acctcaggat
tcagctgatg 2220ggaccctggc ttggactgtt gattgccagt gagtctggga
tgtaattggc tgccctcagg 2280acccaaaccc agacacttca taggattatc
acaccctcca tctttattct ttctttttac 2340ctttaaaagt ctatatcta
2359
24 3200 DNA Homo sapiens
24 caggaagggc catgaagatt aataaagatt
tggactcagg gcaaatattt acttagtagc 60aataactcaa agaattactg ttgaataaat
aagccaatta agcagccaat cacgtactat 120gcggatgcac acaaatgaaa
ccctcacttc aacctgaaga cattcgcaca tgagttacgt 180agagggacct
gcaggaagcg gtagagaaaa cataaggctt atgcgtttaa tttccacacc
240aatttcagga tctttgtcac tgacagcagc actaagactt gttaacttta
tatagttaag 300aagaacaagg ctgagcgcga tgactcacgc ctgtaagcct
agaactttgg gaggccaaag 360caggcagact gcttgagccc aggagttcca
gaccagcctg ggcaacatgg caacacccca 420tctctacaaa aaaatacaag
aatcagctgg gcgtggtgat gtgttcctgt aatctcagct 480actcgggagg
cagaggcagg aggattgctt gaacccggga ggcagaggtt gtagttagcc
540gagatctcgc cactgcactc cagtctggac gacagagtga gactcagtct
caaataaata 600aataaataca taaatataag gaaaaaaata aagctgcttt
ctcctcttcc tcctctttgg 660tctcatctgg ctctgctcca ggcatctgcc
acaatgtggg tgcttacacc tgctgctttt 720gctgggaagt tcttgagtgt
gttcaggcaa cctctgagct ctctgtggag gagcctggtc 780ccgctgttct
gctggctgag ggcaaccttc tggctgctag ctaccaagag gagaaagcag
840cagctggtcc tgagagggcc agatgagacc aaagaggagg aagaggaccc
tcctctgccc 900accaccccaa ccagcgtcaa ctatcacttc actcgccagt
gcaactacaa atgcggcttc 960tgtttccaca cagccaaaac atcctttgtg
ctgccccttg aggaagcaaa gagaggattg 1020cttttgctta aggaagctgg
tatggagaag atcaactttt caggtggaga gccatttctt 1080caagaccggg
gagaatacct gggcaagttg gtgaggttct gcaaagtaga gttgcggctg
1140cccagcgtga gcatcgtgag caatggaagc ctgatccggg agaggtggtt
ccagaattat 1200ggtgagtatt tggacattct cgctatctcc tgtgacagct
ttgacgagga agtcaatgtc 1260cttattggcc gtggccaagg aaagaagaac
catgtggaaa accttcaaaa gctgaggagg 1320tggtgtaggg attatagaat
ccctttcaag ataaattctg tcattaatcg tttcaacgtg 1380gaagaggaca
tgacggaaca gatcaaagca ctaaaccctg tccgctggaa agtgttccag
1440tgcctcttaa ttgaaggtga gaattgtgga gaagatgctc taagagaagc
agaaagattt 1500gttattggtg atgaagaatt tgaaagattc ttggagcgcc
acaaagaagt gtcctgcttg 1560gtgcctgaat ctaaccagaa gatgaaagac
tcctacctta ttctggatga atatatgcgc 1620tttctgaact gtagaaaggg
acggaaggac ccttccaagt ccatcctgga tgttggtgta 1680gaagaagcta
taaaattcag tggatttgat gaaaagatgt ttctgaagcg aggaggaaaa
1740tacatatgga gtaaggctga tctgaagctg gattggtaga gcggaaagtg
gaacgagact 1800tcaacacacc agtgggaaaa ctcctagagt aactgccatt
gtctgcaata ctatcccgtt 1860ggtatttccc agtggctgaa aacctgattt
tctgctgcac gtggcatctg attacctgtg 1920gtcactgaac acacgaataa
cttggatagc aaatcctgag acaatggaaa accattaact 1980ttacttcatt
ggcttataac cttgttgtta ttgaaacagc acttctgttt ttgagtttgt
2040tttagctaaa aagaaggaat acacacagga ataatgaccc caaaaatgct
tagataaggc 2100ccctatacac aggacctgac atttagctca atgatgcgtt
tgtaagaaat aagctctagt 2160gatatctgtg ggggcaatat ttaatttgga
tttgattttt taaaacaatg tttactgcga 2220tttctatatt tccattttga
aactatttct tgttccaggt ttgttcattt gacagagtca 2280gtattttttg
ccaaatatcc agataaccag ttttcacatc tgagacatta caaagtatct
2340gcctcaatta tttctgctgg ttataatgct tttttttttt tttgctttta
tgccattgca 2400gtcttgtact ttttactgtg atgtacagaa atagtcaaca
gatgtttcca agaacatatg 2460atatgataat cctaccaatt ttcaagaagt
ctctagaaag agataacaca tggaaagacg 2520gcgtggtgca gcccagccca
cggtgcctgt tccatgaatg ctggctacct atgtgtgtgg 2580tacctgttgt
gtccctttct cttcaaagat ccctgagcaa aacaaagata cgctttccat
2640ttgatgatgg agttgacatg gaggcagtgc ttgcattgct ttgttcgcct
atcatctggc 2700cacatgaggc tgtcaagcaa aagaatagga gtgtagttga
gtagctggtt ggccctacat 2760ttctgagaag tgacgttaca ctgggttggc
ataagatatc ctaaaatcac gctggaacct 2820tgggcaagga agaatgtgag
caagagtaga gagagtgcct ggatttcatg tcagtgaagc 2880catgtcacca
tatcatattt ttgaatgaac tctgagtcag ttgaaatagg gtaccatcta
2940ggtcagttta agaagagtca gctcagagaa agcaagcata agggaaaatg
tcacgtaaac 3000tagatcaggg aacaaaatcc tctccttgtg gaaatatccc
atgcagtttg ttgatacaac 3060ttagtatctt attgcctaaa aaaaaatttc
ttatcattgt ttcaaaaaag caaaatcatg 3120gaaaattttt gttgtccagg
caaataaaag gtcattttaa tttaaaaaaa aaaaaaaaaa 3180aaaaaaaaaa
aaaaaggcca 3200
25 656 DNA Homo sapiens
25 gggaacacat ccaagcttaa
gacggtgagg tcagcttcac attctcagga actctccttc 60tttgggtctg gctgaagttg
aggatctctt actctctagg ccacggaatt aacccgagca 120ggcatggagg
cctctgctct cacctcatca gcagtgacca gtgtggccaa agtggtcagg
180gtggcctctg gctctgccgt agttttgccc ctggccagga ttgctacagt
tgtgattgga 240ggagttgtgg ctgtgcccat ggtgctcagt gccatgggct
tcactgcggc gggaatcgcc 300tcgtcctcca tagcagccaa gatgatgtcc
gcggcggcca ttgccaatgg gggtggagtt 360gcctcgggca gccttgtggc
tactctgcag tcactgggag caactggact ctccggattg 420accaagttca
tcctgggctc cattgggtct gccattgcgg ctgtcattgc gaggttctac
480tagctccctg cccctcgccc tgcagagaag agaaccatgc caggggagaa
ggcacccagc 540catcctgacc cagcgaggag ccaactatcc caaatatacc
tggggtgaaa tataccaaat 600tctgcatctc cagaggaaaa taagaaataa
agatgaattg ttgcaactct tcaaaa 656
26 4759 DNA Homo sapiens
26 tagttattaa
agttcctatg cagctccgcc tcgcgtccgg cctcatttcc tcggaaaatc 60cctgctttcc
ccgctcgcca cgccctcctc ctacccggct ttaaagctag tgaggcacag
120cctgcgggga acgtagctag ctgcaagcag aggccggcat gaccaccgag
cagcgacgca 180gcctgcaagc cttccaggat tatatccgga agaccctgga
ccctacctac atcctgagct 240acatggcccc ctggtttagg gaggaagagg
tgcagtatat tcaggctgag aaaaacaaca 300agggcccaat ggaggctgcc
acactttttc tcaagttcct gttggagctc caggaggaag 360gctggttccg
tggctttttg gatgccctag accatgcagg ttattctgga ctttatgaag
420ccattgaaag ttgggatttc aaaaaaattg aaaagttgga ggagtataga
ttacttttaa 480aacgtttaca accagaattt aaaaccagaa ttatcccaac
cgatatcatt tctgatctgt 540ctgaatgttt aattaatcag gaatgtgaag
aaattctaca gatttgctct actaagggga 600tgatggcagg tgcagagaaa
ttggtggaat gccttctcag atcagacaag gaaaactggc 660ccaaaacttt
gaaacttgct ttggagaaag aaaggaacaa gttcagtgaa ctgtggattg
720tagagaaagg tataaaagat gttgaaacag aagatcttga ggataagatg
gaaacttctg 780acatacagat tttctaccaa gaagatccag aatgccagaa
tcttagtgag aattcatgtc 840caccttcaga agtgtctgat acaaacttgt
acagcccatt taaaccaaga aattaccaat 900tagagcttgc tttgcctgct
atgaaaggaa aaaacacaat aatatgtgct cctacaggtt 960gtggaaaaac
ctttgtttca ctgcttatat gtgaacatca tcttaaaaaa ttcccacaag
1020gacaaaaggg gaaagttgtc ttttttgcga atcagatccc agtgtatgaa
cagcagaaat 1080ctgtattctc aaaatacttt gaaagacatg ggtatagagt
tacaggcatt tctggagcaa 1140cagctgagaa tgtcccagtg gaacagattg
ttgagaacaa tgacatcatc attttaactc 1200cacagattct tgtgaacaac
cttaaaaagg gaacgattcc atcactatcc atctttactt 1260tgatgatatt
tgatgaatgc cacaacacta gtaaacaaca cccgtacaat atgatcatgt
1320ttaattatct agatcagaaa cttggaggat cttcaggccc actgccccag
gtcattgggc 1380tgactgcctc ggttggtgtt ggggatgcca aaaacacaga
tgaagccttg gattatatct 1440gcaagctgtg tgcttctctt gatgcgtcag
tgatagcaac agtcaaacac aatctggagg 1500aactggagca agttgtttat
aagccccaga agtttttcag gaaagtggaa tcacggatta 1560gcgacaaatt
taaatacatc atagctcagc tgatgaggga cacagagagt ctggcaaaga
1620gaatctgcaa agacctcgaa aacttatctc aaattcaaaa tagggaattt
ggaacacaga 1680aatatgaaca atggattgtt acagttcaga aagcatgcat
ggtgttccag atgccagaca 1740aagatgaaga gagcaggatt tgtaaagccc
tgtttttata cacttcacat ttgcggaaat 1800ataatgatgc cctcattatc
agtgagcatg cacgaatgaa agatgctctg gattacttga 1860aagacttctt
cagcaatgtc cgagcagcag gattcgatga gattgagcaa gatcttactc
1920agagatttga agaaaagctg caggaactag aaagtgtttc cagggatccc
agcaatgaga 1980atcctaaact tgaagacctc tgcttcatct tacaagaaga
gtaccactta aacccagaga 2040caataacaat tctctttgtg aaaaccagag
cacttgtgga cgctttaaaa aattggattg 2100aaggaaatcc taaactcagt
tttctaaaac ctggcatatt gactggacgt ggcaaaacaa 2160atcagaacac
aggaatgacc ctcccggcac agaagtgtat attggatgca ttcaaagcca
2220gtggagatca caatattctg attgccacct cagttgctga tgaaggcatt
gacattgcac 2280agtgcaatct tgtcatcctt tatgagtatg tgggcaatgt
catcaaaatg atccaaacca 2340gaggcagagg aagagcaaga ggtagcaagt
gcttccttct gactagtaat gctggtgtaa 2400ttgaaaaaga acaaataaac
atgtacaaag aaaaaatgat gaatgactct attttacgcc 2460ttcagacatg
ggacgaagca gtatttaggg aaaagattct gcatatacag actcatgaaa
2520aattcatcag agatagtcaa gaaaaaccaa aacctgtacc tgataaggaa
aataaaaaac 2580tgctctgcag aaagtgcaaa gccttggcat gttacacagc
tgacgtaaga gtgatagagg 2640aatgccatta cactgtgctt ggagatgctt
ttaaggaatg ctttgtgagt agaccacatc 2700ccaagccaaa gcagttttca
agttttgaaa aaagagcaaa gatattctgt gcccgacaga 2760actgcagcca
tgactgggga atccatgtga agtacaagac atttgagatt ccagttataa
2820aaattgaaag ttttgtggtg gaggatattg caactggagt tcagacactg
tactcgaagt 2880ggaaggactt tcattttgag aagataccat ttgatccagc
agaaatgtcc aaatgatatc 2940aggtcctcaa tcttcagcta cagggaatga
gtaactttga gtggagaaga aacaaacata 3000gtgggtataa tcatggatcg
cttgtacccc tgtgaaaata tattttttaa aaatatcttt 3060agcagtttgt
actatattat atatgcaaag cacaaatgag tgaatcacag cactgagtat
3120tttgtaggcc aacagagctc atagtacttg ggaaaaatta aaaagcctca
tttctagcct 3180tctttttaga gtcaactgcc aacaaacaca cagtaatcac
tctgtacaca ctgggataga 3240tgaatgaatg gaatgttggg aatttttatc
tccctttgtc tccttaacct actgtaaact 3300ggcttttgcc cttaacaatc
tactgaaatt gttcttttga aggttaccag tgactctggt 3360tgccaaatcc
actgggcact tcttaacctt ctatttgacc tctgcgcatt tggccctgtt
3420gagcactctt cttgaagctc tccctgggct tctctctctt ctagttctat
tctagtcttt 3480ttttattgag tcctcctctt tgctgatccc ttccaagggt
tcaatatata tacatgtata 3540tactgtacat atgtatatgt aactaatata
catacataca ggtatgtata tgtaatggtt 3600atatgtactc atgttcctgg
tgtagcaacg tgtggtatgg ctacacagag aacatgagaa 3660cataaagcca
tttttatgct tactactaaa agctgtccac tgtagagttg ctgtatgtag
3720caatgtgtat ccactctaca gtggtcagct tttagtagag agcataaaaa
tgataaaata 3780cttcttgaaa acttagttta ctatacatct tgccctatta
atatgttctc ttaacgtgtg 3840ccattgttct ctttgaccat tttcctataa
tgatgttgat gttcaacacc tggactgaat 3900gtctgttctc agatcccttg
gatgttacag atgaggcagt ctgactgtcc tttctacttg 3960aaagattaga
atatgtatcc aaatggcatt cacgtgtcac ttagcaaggt ttgctgatgc
4020ttcaaagagc ttagtttgcg gtttcctgga cgtggaaaca agtatctgag
ttccctggag 4080atcaacggga tgaggtgtta cagctgcctc cctcttcatg
caatctggtg agcagtggtg 4140caggcgggga gccagagaaa cttgccagtt
atataacttc tctttggctt ttcttcatct 4200gtaaaacaag gataatactg
aactgtaagg gttagtggag agtttttaat taaaagaatg 4260tgtgaaaagt
acatgacaca gtagttgctt gataatagtt actagtagta gtattcttac
4320taagacccaa tacaaatgga ttatttaaac caagtttatg agttggtttt
ttttcatttt 4380ctatttgtat tttattaaga gtgtcttttc ttatgtgatt
ttttttaatt gctatttgat 4440atggtttggc tatatgtccc cacccaaatc
tcatcttgaa ttataatccc catgtgtcaa 4500gggagggacc tgacgggagg
tgattggatc acgggggcag ttgtccccat gctgttcttg 4560ggatagtgag
ttagttctca tgagatctga tggttttata agtgtttgac aattcctcct
4620ttacacacac tctctctctc atctgctgcc atgtaagact tgcctgcttc
cccttctgcc 4680atgattgtaa gtttcctgag gcctcctcag ccatgtggaa
ctgtgaatct attaagcctc 4740ttttctttat aaatgaaaa 4759271714DNAHomo
sapiens 27ggggcatttt gtgcctgcct agctatccag acagagcagc taccctcagc
tctagctgat 60actacagaca gtacaacaga tcaagaagta tggcagtgac aactcgtttg
acacggttgc 120acgaaaagat cctgcaaaat cattttggag ggaagcggct
tagccttctc tataagggta 180gtgtccatgg attccgtaat ggagttttgc
ttgacagatg ttgtaatcaa gggcctactc 240taacagtgat ttatagtgaa
gatcatatta ttggagcata tgcggaagag agttaccagg 300aaggaaagta
tgcttccatc atcctttttg cacttcaaga tactaaaatt tcagaatgga
360aactaggact atgtacacca gaaacactgt tttgttgtga tgttacaaaa
tataactccc 420caactaattt ccagatagat ggaagaaata gaaaagtgat
tatggactta aagacaatgg 480aaaatcttgg acttgctcaa aattgtacta
tctctattca ggattatgaa gtttttcgat 540gcgaagattc actggatgaa
agaaagataa aaggggtcat tgagctcagg aagagcttac 600tgtctgcctt
gagaacttat gaaccatatg gatccctggt tcaacaaata cgaattctgc
660tgctgggtcc aattggagct gggaagtcca gctttttcaa ctcagtgagg
tctgttttcc 720aagggcatgt aacgcatcag gctttggtgg gcactaatac
aactgggata tctgagaagt 780ataggacata ctctattaga gacgggaaag
atggcaaata cctgccgttt attctgtgtg 840actcactggg gctgagtgag
aaagaaggcg gcctgtgcag ggatgacata ttctatatct 900tgaacggtaa
cattcgtgat agataccagt ttaatcccat ggaatcaatc aaattaaatc
960atcatgacta cattgattcc ccatcgctga aggacagaat tcattgtgtg
gcatttgtat 1020ttgatgccag ctctattcaa tacttctcct ctcagatgat
agtaaagatc aaaagaattc 1080gaagggagtt ggtaaacgct ggtgtggtac
atgtggcttt gctcactcat gtggatagca 1140tggatttgat tacaaaaggt
gaccttatag aaatagagag atgtgagcct gtgaggtcca 1200agctagagga
agtccaaaga aaacttggat ttgctctttc tgacatctcg gtggttagca
1260attattcctc tgagtgggag ctggaccctg taaaggatgt tctaattctt
tctgctctga 1320gacgaatgct atgggctgca gatgacttct tagaggattt
gccttttgag caaataggga 1380atctaaggga ggaaattatc aactgtgcac
aaggaaaaaa atagatatgt gaaaggttca 1440cgtaaatttc ctcacatcac
agaagattaa aattcagaaa ggagaaaaca cagaccaaag 1500agaagtatct
aagaccaaag ggatgtgttt tattaatgtc taggatgaag aaatgcatag
1560aacattgtag tacttgtaaa taactagaaa taacatgatt tagtcataat
tgtgaaaaat 1620agtaataatt tttcttggat ttatgttctg tatctgtgaa
aaaataaatt tcttataaaa 1680ctcggaaaaa aaaaaaaaaa aaaaaaaaaa aaaa
1714
28 2645 DNA Homo sapiens
28 gcaagttcct gagagccggg aagaactgta
ggaatagtca cagcttgaca accgaacaca 60acctgagtgt gctgagaact catggcgttg
accacctgag ctataatgag ctatgccaac 120tcttgtttca gaacgacccc
tggcttttgc cagaaatttg ccaacattac aacaaaggag 180atggacccca
cggctcttgt gcctttcaaa agcagtgcat caagctccat atctgccagt
240attttttaca gggggaatgc aagtttggca ctagctgtaa gagatcccat
gatttctcta 300attctgagaa tctggaaaaa ttggagaagt tgggtatgag
ctcagacctg gtgagcaggc 360tgcctaccat ttatagaaat gcacatgaca
tcaagaataa gagctctgcc cccagcagag 420tgcctcctct ttttgtccca
caggggactt ctgaaagaaa agacagttca ggttctgtgt 480ccccaaacac
tcttagccag gaggagggtg atcagatctg tttgtaccat atccggaaaa
540gttgtagctt tcaagataag tgccatagag ttcatttcca tttgccgtat
cgatggcaat 600tcttggatag aggcaaatgg gaggatttgg acaacatgga
acttattgaa gaggcatatt 660gcaatcccaa aatagaaagg atcctgtgct
ctgagtcagc cagtaccttt cactctcatt 720gtctgaactt taacgccatg
acttacggtg ctacccaggc tcgccgcctc tccacggcct 780cctctgtcac
caaacctcca cacttcatcc tcaccactga ctggatttgg tactggagtg
840atgagtttgg ttcttggcag gaatatggaa gacagggcac ggtgcaccct
gtgaccactg 900tcagcagtag cgacgtggag aaggcctacc tggcctactg
tacaccgggg tctgacggcc 960aggcagccac cttgaagttc caggccggaa
agcacaacta cgagttagat ttcaaagcct 1020tcgttcagaa aaacctggtc
tatggcacaa ctaaaaaggt ttgccgcaga cccaaatacg 1080tgtctcccca
ggatgtgacg accatgcaaa cctgcaatac caagtttcca ggcccgaaga
1140gcatcccaga ctattgggac tcctctgccc tgccagaccc aggctttcag
aagatcaccc 1200ttagttcttc ctcggaagag tatcagaagg tctggaacct
ctttaaccgc acgctgcctt 1260tctactttgt tcagaagatt gagcgagtac
agaacctggc cctctgggaa gtctaccagt 1320ggcaaaaagg acagatgcag
aagcagaacg gagggaaggc cgtggacgag cggcagctgt 1380tccacggcac
cagcgccatt tttgtggacg ccatctgcca gcagaacttt gactggcggg
1440tctgtggtgt tcatggcact tcctacggca aggggagcta ctttgcccga
gatgctgcat 1500attcccacca ctacagcaaa tccgacacgc agacccacac
gatgttcctg gcccgggtgc 1560tggtgggcga gttcgtcagg ggcaatgcct
cctttgtccg tccgccggcc aaggagggct 1620ggagcaacgc cttctatgat
agctgcgtga acagtgtgtc cgacccctcc atctttgtga 1680tctttgagaa
acaccaggtc tacccagagt atgtcatcca gtacaccacc tcctccaagc
1740cctcggtcac accctccatc ctgctggcct tgggctccct gttcagcagc
cgacagtgag 1800cgcacaggag tgttccaggc ctttcacctg ctctgccttg
aaatggctat ttgggccttt 1860ccttttcttt ttaaacagaa acttttaatg
aactgttctc ttaacattga cctctcaatg 1920aagttatgtt cttaatctct
tgctaataat gatttttact tttaagtcac ttttgggttc 1980actagtggat
taaccagaag tgattgtagt tgagtccagt tttgcttttt aataatgtgt
2040tgaagtttta gtttttactc tttgttgact ttgctgctta ttggcaccag
ggacagagtt 2100tctagataca attttatgga ttggttttaa tttttatgag
tttgtctctg cagtgattcg 2160gtttctcaga gtctcatggc atcatagttt
ttccagaatg acacagtagc caccggtgga 2220tgacagccca cgggcggcac
agtcacttct gcctgttgct ctgacaccaa cccaggcagc 2280tctgctgtgg
cttctcctgg gctctggcat tagttggtct gtgtcacatt gtcagaacag
2340gtggctgctg tgtggtgcca tcgagtccct gctggttccc cttgtcctgg
gagggtcacc 2400cattgcccaa ggaagtgcat ccacctggca ggtgacctgg
aggagtagct tccccgagga 2460cccccaggct tggcctgtga ttgcgcaaac
ccacatttcc taagcacact ggacaccctt 2520cgagtgtggg ttttaacatc
cctgtgagat tgaatacttg tgccacacat gtcacaaaag 2580agtatggaaa
taaaagaaaa tttatccgaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa
2640aaaaa
2645
29 2058 DNA Homo sapiens
29 gcacgaggaa gccacagatc tcttaagaac
tttctgtctc caaaccgtgg ctgctcgata 60aatcagacag aacagttaat cctcaattta
agcctgatct aacccctaga aacagatata 120gaacaatgga agtgacaaca
agattgacat ggaatgatga aaatcatctg cgcaactgct 180tggaaatgtt
tctttgagtc ttctctataa gtctagtgtt catggaggta gcattgaaga
240tatggttgaa agatgcagcc gtcagggatg tactataaca atggcttaca
ttgattacaa 300tatgattgta gcctttatgc ttggaaatta tattaattta
cgtgaaagtt ctacagagcc 360aaatgattcc ctatggtttt cacttcaaaa
gaaaaatgac accactgaaa tagaaacttt 420actcttaaat acagcaccaa
aaattattga tgagcaactg gtgtgtcgtt tatcgaaaac 480ggatattttc
attatatgtc gagataataa aatttatcta gataaaatga taacaagaaa
540cttgaaacta aggttttatg gccaccgtca gtatttggaa tgtgaagttt
ttcgagttga 600aggaattaag gataacctag acgacataaa gaggataatt
aaagccagag agcacagaaa 660taggcttcta gcagacatca gagactatag
gccctatgca gacttggttt cagaaattcg 720tattcttttg gtgggtccag
ttgggtctgg aaagtccagt tttttcaatt cagtcaagtc 780tatttttcat
ggccatgtga ctggccaagc cgtagtgggg tctgatacca ccagcataac
840cgagcggtat aggatatatt ctgttaaaga tggaaaaaat ggaaaatctc
tgccatttat 900gttgtgtgac actatggggc tagatggggc agaaggagca
ggactgtgca tggatgacat 960tccccacatc ttaaaaggtt gtatgccaga
cagatatcag tttaattccc gtaaaccaat 1020tacacctgag cattctactt
ttatcacctc tccatctctg aaggacagga ttcactgtgt 1080ggcttatgtc
ttagacatca actctattga caatctctac tctaaaatgt tggcaaaagt
1140gaagcaagtt cacaaagaag tattaaactg tggtatagca tatgtggcct
tgcttactaa 1200agtggatgat tgcagtgagg ttcttcaaga caacttttta
aacatgagta gatctatgac 1260ttctcaaagc cgggtcatga atgtccataa
aatgctaggc attcctattt ccaatatttt 1320gatggttgga aattatgctt
cagatttgga actggacccc atgaaggata ttctcatcct 1380ctctgcactg
aggcagatgc tgcgggctgc agatgatttt ttagaagatt tgcctcttga
1440ggaaactggt gcaattgaga gagcgttaca gccctgcatt tgagataagt
tgccttgatt 1500ctgacatttg gcccagcctg tactggtgtg ccgcaatgag
agtcaatctc tattgacagc 1560ctgcttcaga ttttgctttt gttcgttttg
ccttctgtcc ttggaacagt catatctcaa 1620gttcaaaggc caaaacctga
gaagcggtgg gctaagatag gtcctactgc aaaccacccc 1680tccatatttc
cgtaccattt acaattcagt ttctgtgaca tctttttaaa ccactggagg
1740aaaaatgaga tattctctaa tttattcttc tataacactc tatatagagc
tatgtgagta 1800ctaatcacat tgaataatag ttataaaatt attgtataga
catctgcttc ttaaacagat 1860tgtgagttct ttgagaaaca gcgtggattt
tacttatctg tgtattcaca gagcttagca 1920cagtgcctgg taatgagcaa
gcatacttgc cattactttt ccttcccact ctctccaaca 1980tcacattcac
tttaaatttt tctgtatata gaaaggaaaa ctagcctggg caacatgatg
2040aaaccccatc tccactgc 2058
30 860 DNA Homo sapiens
30 ggatggcaac
cttcagctag actgcctggc tcaagggtgg aagcaatacc aacagagagc 60atttggctgg
ttccggtgtt cctcctgcca gcgaagttgg gcttccgcca agtgcagatt
120ctgtgccaca cgtactggga gcactggaca tcccagggtc aggtgcgtat
gaggctcttt 180ggccaaaggt gccagaagtg ctcctggtcc caatatgaga
tgcctgagtt ctcctcggat 240agcaccatga ggattctgag caacctggtg
cagcatatac tgaagaaata ctatggaaat 300ggcatgagga agtctccaga
aatgccagta atcctggaag tgtccctgga aggatcccat 360gacacagcca
attgtgaggc atgcactttg ggcatatgtg gacagggctt aaaaagctac
420atgacaaagc cgtccaaatc cctactcccc cacctaaaga ctgggaattc
ctcacctgga 480attggtgctg tgtacctcgc aaaccaagcc aagaaccagt
cagatgaggc aaaagaggct 540aaggggagtg ggtatgagaa attagggccc
agtcgagacc cagatccact gaacatctgt 600gtctttattt tgctgcttgt
atttattgta gtcaaatgct ttacatcaga atgatgaaaa 660taggcttgcc
actttctctt attttaattc catggtagtc aatgaactgg ctgccacttt
720aatataactg aaaattcatt ttgagaccaa gcaggatcaa gtttgtagaa
taaacactgg 780tttcctagcc atcctctgaa aacagtatga aacatgacca
agtacataat ggatttagta 840ataaatattg tcgaattgct 86031449DNAHomo
sapiens 31caggccaaaa agtggtcccg cgtgcccttc tccgtgcctg actttgactt
cctgcagcat 60tgtgccgaga acttgtcgga cctctccctg gactgaccac ctcattgctg
cagtgcccgg 120tttgggctgt agggggcggg agagtctgca gcagactcca
ggcccctcct tcctgaatca 180tcagctgtgg gcatcaggcc caccagccac
acaggagtcc tgggcaccct ggcttaggct 240cccgcaatgg gaaaacaacc
ggagggccag agcttagtcc agacctacct tgtacgcaca 300tagacatttt
catatgcact ggatggagtt agggaaactg aggcaaaaga atttgccata
360ctgtactcag aatcacgaca ttccttccct accaaggcca cttctatttt
ttgaggctcc 420tcataaaaat aaatgaaaaa atgggatag 449323638DNAHomo
sapiens 32ggtagatgcg gctgtgacag cagcaaagaa tgacggccaa gggcgacagc
aggggctggc 60catgctgtaa aggggcttct tgggagggtc cagcctcagg aatcaagggg
aactcctgag 120ccgagaattc tgaagatctc ctccctccct gaagctgtgg
gctgggccat cggaaaactt 180tcagttttgt ttccttgcct gcaagaaacg
aaactcaacc gaaagcctgc agagagcaga 240acatggaagg agacttctcg
gtgtgcagga actgtaaaag acatgtagtc tctgccaact 300tcaccctcca
tgaggcttac tgcctgcggt tcctggtcct gtgtccggag tgtgaggagc
360ctgtccccaa ggaaaccatg gaggagcact gcaagcttga gcaccagcag
gttgggtgta 420cgatgtgtca gcagagcatg cagaagtcct cgctggagtt
tcataaggcc aatgagtgcc 480aggagcgccc tgttgagtgt aagttctgca
aactggacat gcagctcagc aagctggagc 540tccacgagtc ctactgtggc
agccggacag agctctgcca aggctgtggc cagttcatca 600tgcaccgcat
gctcgcccag cacagagatg tctgtcgcag tgaacaggcc cagctcggga
660aaggggaaag aatttcagct cctgaaaggg aaatctactg tcattattgc
aaccaaatga 720ttccagaaaa taagtatttc caccatatgg gtaaatgttg
tccagactca gagtttaaga 780aacactttcc tgttggaaat ccagaaattc
ttccttcatc tcttccaagt caagctgctg 840aaaatcaaac ttccacgatg
gagaaagatg ttcgtccaaa gacaagaagt ataaacagat 900ttcctcttca
ttctgaaagt tcatcaaaga aagcaccaag aagcaaaaac aaaaccttgg
960atccactttt gatgtcagag cccaagccca ggaccagctc ccctagagga
gataaagcag 1020cctatgacat tctgaggaga tgttctcagt gtggcatcct
gcttcccctg ccgatcctaa 1080atcaacatca ggagaaatgc cggtggttag
cttcatcaaa aggaaaacaa gtgagaaatt 1140tcagctagat ttggaaaagg
aaaggtacta caaattcaaa agatttcact tttaacactg 1200gcattcctgc
ctacttgctg tggtggtctt gtgaaaggtg atgggtttta ttcgttgggc
1260tttaaaagaa aaggtttggc agaactaaaa acaaaactca cgtatcatct
caatagatac 1320agaaaaggct tttgataaaa ttcaacttga cttcatgtta
aaaaccctca acaaaccagg 1380cgtcgaagga acatacctca aaataataag
agccatctat gacaaaacca cagccaacat 1440catactgaat gagcaaaagc
tggagcatta ctcttgagaa gtagaacaag gcacttcagt 1500cctattcaac
atagtactgg aagtcctcgc cacagcaatc aggcaagaga aagaaataaa
1560aggcaaccaa aaagaaagga agtcgaagta tctctgtttg cagacgatat
gattctatat 1620ctagaaaacc ccatgatctt ggcccaaaag ctcctagatc
tgataaacaa cttcagctaa 1680ctttcaggag acaaaatcaa tatacaaaat
atggtagcat ttttatacac caacgacatc 1740caagctgaga gccaaatcaa
gaatgcaatc ctattcacaa ttgccacaaa aagaataaaa 1800tacctaggaa
tacagctaac cagggagatg aaagatctct acaacaaaaa ttacaaaaca
1860ctgctgaaag aaatcagaga tgacacaaat ggaaaaacat tccatactta
tggataggaa 1920gaatcaatat tgttaaaatg gccatactac ccaaagcaat
ttatagattc aatgctattc 1980ctatcaaact accaataaca ttcttcacag
aatcagaaaa aaaaagcatt aaaatttatt 2040tgaaaccaaa aaagagccca
aaaagccaaa gcaatcctaa gcaaaaagaa caaagctgga 2100ggcatcgcat
tacccaactt caaactatac tacagggcta cagtaaccaa aactgcatga
2160tactggtaca aaagcatggt gctggtacaa aagcagacac atagatcaat
ggaacagaat 2220agagggccca gaaataaagc tacacaccta caaccatcta
atctttgaca aagttgacaa 2280aaatacgcaa tggggaaaga attccccatt
cagtaagtgg tactgggata actagctagc 2340catatgcaga ggattgaaac
tgaaccactt ccttacacca tatgcaaaaa tcaactcaag 2400atggattaaa
gacttaaatg taaaacccca aactataaaa actctggaag ataacctagg
2460caataccatt ctggacatag gaacggaaaa agatttcatg acaaagatcc
caaaaataat 2520tgtaacgaaa gcaaaaattg acaaatggga catgattaaa
cagaattacc atttgactca 2580gcaatcccat tattggttat atacccaaag
gaatctaaat cattctgtca taaagacata 2640tatacacaaa tgttcacggc
agcactatac acaatcgcaa agtcagggaa tcaaactaaa 2700tgtccatcag
tggtagaaag gataaagaaa atgtggtggc agggagtggt ggctcatgtc
2760tgtaatccca gcactttggg aggctgaggc gggtggttca cctgaggtca
ggagtttgag 2820accagcctgg ccaacatggc gaaactccgt ctccgctaaa
aatacgaaaa ttagccaggc 2880gtggtggcga gcacctgtca tcccagctac
ttgggaggcc taggcgtgag aatcgcttga 2940acctggaagg tggtggttgc
agtgagccga gatcctgcca ctgcactcca gcctgggcaa 3000ccaagcgaga
ctctgcctta aaaaaaaaaa aaagaaaatg tggcacatat acaccatgga
3060atactatgca gccataaaaa agaatgggat catgtcctgt gcagcaacgt
ggatggagct 3120ggaagccatt atcctaaatg aactcactca gaaacagaaa
accaaatacc acatgttctc 3180acttataagt agaagctaaa cattgagtac
acatggatac aaagaaggga accgcagaca 3240ctggggccta cctgaggtcg
gagcatggaa ggagggtgag gatcaaaaaa ctacctatct 3300ggtactatgc
tttttatctg gatgatgaaa taatctgtac aacaaaccct ggtgacatgc
3360aatttaccta tatagcaagc ctacacatgt gcccctgaac ctaaaaaaaa
agttaaaaga 3420aaaacgtttg gattattttc cctctttcga acaaagacat
tggtttgccc aaggactaca 3480aataaaccaa cgggaaaaaa gaaaggttcc
agttttgtct gaaaattctg attaagcctc 3540tgggccctac agcctggaga
acctggagaa tcctacaccc acagaacccg gctttgtccc 3600caaagaataa
aaacacctct ctaaaaaaaa aaaaaaaa 3638
SEQ ID NO: 33 (length 1673, DNA, Homo sapiens)
tcccttctga ggaaacgaaa ccaacagcag tccaagctca gtcagcagaa gagataaaag
60caaacaggtc tgggaggcag ttctgttgcc actctctctc ctgtcaatga tggatctcag
120aaatacccca gccaaatctc tggacaagtt cattgaagac tatctcttgc
cagacacgtg 180tttccgcatg caaatcaacc atgccattga catcatctgt
gggttcctga aggaaaggtg 240cttccgaggt agctcctacc ctgtgtgtgt
gtccaaggtg gtaaagggtg gctcctcagg 300caagggcacc accctcagag
gccgatctga cgctgacctg gttgtcttcc tcagtcctct 360caccactttt
caggatcagt taaatcgccg gggagagttc atccaggaaa ttaggagaca
420gctggaagcc tgtcaaagag agagagcatt ttccgtgaag tttgaggtcc
aggctccacg 480ctggggcaac ccccgtgcgc tcagcttcgt actgagttcg
ctccagctcg gggagggggt 540ggagttcgat gtgctgcctg cctttgatgc
cctgggtcag ttgactggcg gctataaacc 600taacccccaa atctatgtca
agctcatcga ggagtgcacc gacctgcaga aagagggcga 660gttctccacc
tgcttcacag aactacagag agacttcctg aagcagcgcc ccaccaagct
720caagagcctc atccgcctag tcaagcactg gtaccaaaat tgtaagaaga
agcttgggaa 780gctgccacct cagtatgccc tggagctcct gacggtctat
gcttgggagc gagggagcat 840gaaaacacat ttcaacacag cccagggatt
tcggacggtc ttggaattag tcataaacta 900ccagcaactc tgcatctact
ggacaaagta ttatgacttt aaaaacccca ttattgaaaa 960gtacctgaga
aggcagctca cgaaacccag gcctgtgatc ctggacccgg cggaccctac
1020aggaaacttg ggtggtggag acccaaaggg ttggaggcag ctggcacaag
aggctgaggc 1080ctggctgaat tacccatgct ttaagaattg ggatgggtcc
ccagtgagct cctggattct 1140gctggctgaa agcaacagtg cagacgatga
gaccgacgat cccaggaggt atcagaaata 1200tggttacatt ggaacacatg
agtaccctca tttctctcat agacccagca cactccaggc 1260agcatccacc
ccacaggcag aagaggactg gacctgcacc atcctctgaa tgccagtgca
1320tcttggggga aagggctcca gtgttatctg gaccagttcc ttcattttca
ggtgggactc 1380ttgatccaga gaggacaaag ctcctcagtg agctggtgta
taatccagga cagaacccag 1440gtctcctgac tcctggcctt ctatgccctc
tatcctatca tagataacat tctccacagc 1500ctcacttcat tccacctatt
ctctgaaaat attccctgag agagaacaga gagatttaga 1560taagagaatg
aaattccagc cttgactttc ttctgtgcac ctgatgggag ggtaatgtct
1620aatgtattat caataacaat aaaaataaag caaataccat ttaaaaaaaa aaa 1673
SEQ ID NO: 34 (length 6031, DNA, Homo sapiens)
gctgggtcct aggccaggtc tggggtaacc
tggaacttcc acctgggctc tgcgctaggt 60ctctgtttca ctccctcccc gcggggcgcg
cagctcgcgg gtctttggac accaccggtc 120ctgagtccgc ggactgccat
tttcattaag aactgccact tagaggtacc aaaataaagg 180gtatttgcta
cctttaatac ttgccagttc aggttggagg cacaggcagc agcaagaatg
240gaaagaaatg ttcttacaac attttcacag gaaatgtccc agttaatttt
gaatgaaatg 300ccaaaagctg aatattccag tttattcaat gattttgttg
aatctgaatt ttttttgatt 360gatggggatt cattacttat cacatgtatc
tgtgagatat catttaagcc tgggcagaac 420ctccatttct tctatctggt
tgaacgctat cttgtggatc ttattagcaa aggaggacaa 480ttcaccatag
ttttcttcaa ggatgccgag tatgcgtatt tcaacttccc tgaacttctt
540tctttgagaa ctgctttaat tcttcatctt cagaagaata ccaccattga
tgttcgaaca 600acattttcga gatgcttatc aaaagagtgg ggaagtttct
tggaagagag ttacccatat 660ttcctgatag ttgcagacga aggcctgaac
gatctacaaa cacagctttt caacttttta 720atcattcatt cttgggcaag
gaaggtcaac gttgtacttt cctcagggca agaatctgat 780gttctttgcc
tttatgcata ccttcttcca agcatgtaca gacaccagat tttttcctgg
840aagaataagc agaacattaa agatgcttat acaaccctgc ttaaccagtt
ggaaagattt 900aagctttcag cattagcacc tctttttgga agtttaaaat
ggaataatat tacggaagag 960gcacacaaga ctgtatctct gcttacacaa
gtctggccag aaggatctga cattcggcgt 1020gtcttttgtg ttacttcatg
ctcattatct ttgagaatgt accatcgctt tttaggaaac 1080agagagccct
cctctggtca ggaaactgag atccaacagg tgaacagtaa ttgcttaacc
1140ctgcaggaga tggaagattt gtgtaaactg cattgtctca ctgtggtttt
tctactccat 1200ctgcctcttt ctcaaagagc ttgtgctaga gtcatcactt
cacattgggc tgaggacatg 1260aagcctttat tacaaatgaa aaagtggtgt
gaatatttca tcttaagaaa tatacatact 1320tttgaatttt ggaatctgaa
tttaattcac ctttctgact taaatgatga gcttttgttg 1380aagaatattg
ctttttacta tgaaaatgaa aatgtaaaag gcctacattt gaatttggga
1440gataccatta tgaaagatta tgaatatctc tggaataccg tatcaaagtt
ggtcagagac 1500tttgaggttg gacagccatt tcctctgaga acaacaaaag
tttgttttct tggaaagaaa 1560ccatcaccaa tcaaagacag ctccaatgaa
atggtgccca atttgggttt tattccaacg 1620tcatcttttg tggttgataa
atttgctgga gatattttga aagatttgcc ttttctaaag 1680agtgatgatc
ctattgttac ttcactggtt aaacaaaagg aatttgatga acttgtgcac
1740tggcattctc ataaacccct gagtgatgat tatgacaggt ccaggtgtca
gtttgatgaa 1800aaatctagag accctcgtgt tcttagatct gtgcaaaagt
atcatgtttt ccaacggttt 1860tatgggaatt cattagaaac agtctcttcg
aaaatcatcg tgactcaaac tattaagtca 1920aagaaggatt ttagtgggcc
caagagcaaa aaggcacacg agaccaaggc tgaaataatt 1980gctagagaga
ataagaaaag gttatttgcc agggaagaac aaaaggaaga gcaaaagtgg
2040aatgctttgt cattttctat tgaagagcaa ttgaaagaaa atttacactc
tggaataaag 2100agcctggaag attttttgaa atcctgtaaa agtagctgtg
tgaaacttca ggttgaaatg 2160gtggggttaa ctgcttgctt gaaagcctgg
aaagaacatt gccgaagtga agaaggtaaa 2220accacgaaag atttaagtat
agctgttcag gtgatgaaaa ggatccactc cttgatggaa 2280aaatactcag
aacttttaca agaagatgat cggcaactca tagccagatg ccttaagtat
2340ttaggatttg atgagttggc aagttcttta catccagccc aggatgcaga
aaatgatgta 2400aaagtgaaga aaaggaataa atattcaatt ggcattgggc
cagctcggtt ccaactgcaa 2460tacatgggcc attatttgat acgagatgag
agaaaagacc cagatcccag ggtccaggat 2520tttattcccg acacatggca
gcgagagctc cttgatgttg tggataagaa tgagtcagca 2580gtgattgttg
ccccaacgtc ctcaggcaaa acatatgcct cctactactg tatggagaaa
2640gtgctgaagg agagcgacga cggggtggtc gtgtacgttg cacccacaaa
ggcccttgtt 2700aatcaagtgg cagcaactgt tcagaatcgt tttacgaaaa
atctgccaag tggtgaagtt 2760ctctgtggtg ttttcaccag ggagtatcgt
catgatgcct taaactgtca ggtacttatt 2820acagtgcctg cctgctttga
aattctgctg cttgctcctc atcgccaaaa ctgggtgaaa 2880aagatcagat
atgttatatt tgatgaggtt cattgtcttg gtggagaaat tggagcagaa
2940atctgggaac atctccttgt catgatccga tgtccctttt tggctctttc
agctaccata 3000agtaatcctg aacatctcac cgagtggcta caatcggtaa
aatggtactg gaaacaagaa 3060gacaaaataa ttgaaaataa taccgcttct
aaaagacatg tgggtcgtca ggccggcttt 3120cccaaagact acttgcaagt
aaaacaatcg tataaagtta gacttgtgct ctatggagag 3180aggtataatg
atctagagaa gcatgtatgt tcaataaaac atggtgacat tcattttgat
3240cattttcacc catgtgctgc actaacaaca gatcatattg aaaggtatgg
attccctcct 3300gatcttaccc tttcacctcg agaaagcatc cagctgtatg
atgccatgtt tcaaatttgg 3360aaaagttggc ctcgggccca ggaactgtgc
ccagaaaact tcattcattt taacaataaa 3420ttagtcatta aaaagatgga
tgctaggaaa tatgaagaga gtctaaaggc agaattaaca 3480agttggatta
aaaatggcaa cgtagagcag gccagaatgg tacttcagaa tcttagtcct
3540gaagcagatt tgagtccaga aaacatgatc accatgtttc cacttctagt
tgaaaaacta 3600aggaaaatgg agaagttacc tgcactattt tttttattca
agttaggagc tgtagaaaac 3660gcagctgaaa gtgtgagcac tttcctaaag
aaaaagcagg agacaaaaag gcctcccaaa 3720gctgataaag aagcccatgt
catggctaac aaacttcgaa aagttaaaaa atccatagag 3780aaacaaaaga
tcatagatga aaagagccag aaaaaaacca gaaatgtgga tcaaagccta
3840atacatgaag ctgaacatga taatctagtg aagtgtctag agaagaacct
ggaaatccca 3900caggactgca catatgctga tcaaaaagca gtggacactg
agactttgca gaaggtattt 3960ggtcgagtaa aatttgaaag aaaaggtgaa
gaattgaaag ccttggcaga aaggggtatt 4020ggatatcatc acagtgctat
gagtttcaaa gaaaaacaat tagttgaaat cctctttaga 4080aaaggatatc
ttagggtggt gacagctact ggaacacttg ctttaggtgt caacatgcct
4140tgtaaatctg tggtttttgc tcaaaactca gtctatctgg atgcgttgaa
ttatagacag 4200atgtctggcc gtgctggaag aagaggtcaa gacctgatgg
gagatgtata tttctttgat 4260attccattcc ccaaaatagg aaaactcata
aaatccaatg ttcctgagct gagaggacac 4320ttccctctca gcataaccct
ggtcctgcga ctcatgctgc tggcttccaa gggagatgac 4380ccagaggata
ccaaggcaaa ggtgctatca gtgctaaagc attcattgct gtccttcaag
4440caacccagag tcatggacat gttaaaactt tacttcctgt tttctttgca
gttcctggtg 4500aaagagggct atttagatca agaaggtaat cctatggggt
ttgctggact tgtatcacat 4560ttgcattatc atgaaccttc taatcttgtt
tttgtcagtt ttcttgtaaa tggactcttc 4620catgatctct gtcagccaac
caggaaaggc tcaaaacatt tttctcaaga cgttatggaa 4680aagctagtat
tagtattggc acatctcttt ggaagaagat attttccacc aaagttccaa
4740gatgcacact tcgagtttta tcaatcaaag gtgttccttg atgatctccc
tgaggatttt 4800agtgatgctt tagatgaata taacatgaaa attatggagg
actttaccac tttcctacga 4860attgtttcca aactggctga tatgaatcag
gaatatcaac tcccattgtc aaaaatcaaa 4920ttcacaggta aagaatgtga
agactctcaa ctcgtatctc atttgatgag ctgcaaggaa 4980ggaagagtag
caatttcacc atttgtttgt ctgtctggga actttgatga tgatttgctt
5040cgactagaaa ctccaaacca tgttactcta ggcacaatcg gtgtcaatcg
ctctcaggct 5100ccagtgctgt tgtcacagaa atttgataac cgaggaagga
aaatgtcgct taatgcctat 5160gcactggatt tctacaaaca tggttccttg
ataggattag tccaggataa caggatgaat 5220gaaggagatg cttattattt
gttgaaggat tttgcactca ccattaaatc tatcagtgtt 5280tccttgcgtg
agctatgtga aaatgaagac gacaacgttg tcttagcctt tgaacaactg
5340agtacaactt tttgggaaaa gttaaacaaa gtctaaaaac aaagtctatg
caaaccactt 5400aaaaataatt ccatagtagt ttttcaggtc acgtttttga
ttcttatgct tcttgccaga 5460aatacattat gataaagtgg aaatacatta
cgatgaagtg gaaagagcaa acactttgga 5520atcaaacaga gttgcaatca
aacctgcaat gttctgtcat gaatactcac aaattattta 5580gtatacctga
atcttggttt ctttttataa ctgagtaata atggttacat ctcaggtagt
5640ttgaggattg actaaaaaaa tgcgagaatg ttgtatgtga ctgaataaca
atttttactc 5700tgcgaagcca aagtaaatat aatattatca gtaactttat
ccccagtgtc agtatttata 5760aaatgtttat taaggctaga aaaaatgaat
acaatatcct gaaggtgaaa tatattctct 5820tcaattagca taaatatgat
ttacataagt tagctataca gctattgaga tagtactttc 5880tagtaaactt
aaactacttt ttaaacatac attttgtgat gatttaacaa aaatatagag
5940aatgatttgc tttattgtaa ttgtatataa gtgactggaa aagcacaaag
aaataaagtg 6000ggttcgatct gttaaataaa aaaaaaaaaa a 6031
SEQ ID NO: 35 (length 632, DNA, Homo sapiens)
gccatctagt ctgtggtttt
ctgttgaagc agtctgaatt gactaaaaca gtcacttgga 60gtagttataa accactttcc
tgttgaaagc agaacatgct gattcaactg ttttgttcaa 120tagcaatgat
agattttgtt taagtcccct acactttctt atttctaaat gatcaagagt
180acacttcctg gcagtgatta aggagtgtgt atctaacaga aaaaatatat
ataccctgtg 240aacccgaata tggaattcag attgtttctg ccctcagtat
catacttaaa aaacaagcat 300acaaacaaac ataagggaac aaacagcaac
cataacaaaa acaaaccttt aaaggtgggt 360ttttgctgtg ataaatgaat
acggtactct gaaggagaaa aaagtttctc aaatgagctt 420aaactgcaag
tgatttaaaa attagagaat ataattctta aagctattga aagtttcaac
480cagaaaacct caagtgaatt ttgtatgtaa atgaaatctt gaatgtaagt
tctgtgattc 540tttaagcaaa caattagctg aaaacttggt attgttgtag
tttatgtagt aagtgacttg 600gcacccatca gaaaataaag ggcattaaat tg
63236409DNAHomo sapiens 36aaaaaaaaaa aaaaaaaaaa aaagagttgt
tttctcatgt tcattatagt tcattacagt 60tacatagtcc gaaggtctta caactaatca
ctggtagcaa taaatgcttc aggcccacat 120gatgctgatt agttctcagt
tttcattcag ttcacaatat aaccaccatt cctgccctcc 180ctgccaaggg
tcataaatgg tgactgccta acaacaaaat ttgcagtctc atctcatttt
240catccagact tctggaactc aaagattaac ttttgactaa ccctggaata
tctcttatct 300cacttatagc ttcaggcatg tatttatatg tattcttgat
agcaatacca taatcaatgt 360gtattcctga tagtaatgct acaataaatc
caaacatttc aactctgtt 409
SEQ ID NO: 37 (length 3903, DNA, Homo sapiens)
agcacttgaa
gttcaggcag cgagagttga catggggcca gggctgcgcc cctggggcgg 60gttgaagaca
gggtgagtct cttgatattc aggaaatcat cgcgcaccca gtcaccagcg
120ttcgggagcc tgtcgcagcg ggaccgacgg aatccggagc aggcgacagg
gcgcagaagc 180gggatgtact tctgttgggg cgccgactcc agggagctgc
agcgccggag gacggcgggc 240agccccgggg ctgagctact gcaggcggcc
agcggggagc gccactctct gctgctgctg 300accaaccaca gggtcctctc
gtgcggagac aacagcaggg gtcagctggg ccgcaggggc 360gcgcagcgcg
gggagctgcc agaaccaatt caggcattgg aaaccctaat tgttgatctc
420gtgagctgcg ggaaggagca ctccctggct gtgtgccaca aaggaagggt
cttcgcatgg 480ggagctggtt ctgaagggca gctggggatt ggagaattca
aggaaataag tttcacacct 540aagaaaataa tgactctgaa tgatataaaa
ataatacaag tttcctgtgg acactaccac 600tccctggcat tatcaaaaga
tagccaagtg ttttcgtggg gaaagaacag ccatgggcag 660ctgggcttgg
ggaaggagtt cccctcccaa gccagcccgc agagggtgag gtccctggag
720gggatcccac tggctcaggt ggctgccgga ggggctcaca gctttgccct
gtctctctgt 780gggacttcgt ttggctgggg aagtaacagt gccgggcagc
tggccctcag tgggcgtaat 840gtcccagtgc aaagcaacaa gcctctctca
gtcggtgcac tgaagaatct aggtgtggtt 900tatatcagct gtggtgatgc
acacactgcg gtgcttaccc aggacgggaa agtgttcaca 960tttggagaca
atcgctctgg acagctggga tacagcccca ctcctgagaa gagaggtcca
1020caacttgtgg aaagaattga tggcctagtt tcgcagatag attgtggaag
ttatcacacc 1080ctggcatatg tgcacaccac tggtcaggtg gtatcttttg
gtcatggacc aagtgacaca 1140agcaagccaa ctcatccgga ggccctgaca
gagaactttg acattagctg cctgatttct 1200gctgaagact tcgtggatgt
tcaagtcaaa cacatttttg ctggaacata tgccaacttt 1260gtgacaactc
atcaggatac tagttccaca cgtgctcccg ggaaaaccct gccagaaata
1320agccgaatta gccagtccat ggcagaaaaa tggatagcag tgaaaagaag
aagtactgaa 1380catgaaatgg ctaaaagtga aattagaatg atattttcat
ctcctgcttg tctgactgca 1440agttttttaa agaaaagagg aactggagaa
acgacttcca ttgatgtgga cttagaaatg 1500gcaagagata ccttcaagaa
gttaacaaaa aaggaatgga tttcttccat gataactacg 1560tgtctcgagg
atgatctgct cagagctctt ccatgccatt ctccacacca agaagcttta
1620tcagttttcc tcctgctccc agaatgtcct gtgatgcatg attctaagaa
ctggaagaac 1680ctggtggttc catttgcaaa ggctgtgtgt gaaatgagta
aacaatcttt gcaagtccta 1740aagaagtgtt gggcattttt gcaagaatct
tctctgaatc cgctgatcca gatgcttaaa 1800gcagccatca tctctcagct
gcttcatcag actaaaaccg aacaggatca ctgtaatgtt 1860aaagctcttt
taggaatgat gaaagaactg cataaggtaa acaaagctaa ctgtcgacta
1920ccagaaaata ctttcaacat aaatgaactc tccaacttat taaactttta
tatagataga 1980ggaagacagc tctttcggga taaccacctg atacctgcag
aaacccccag tcctgttatt 2040ttcagtgatt ttccatttat ctttaattcg
ctatccaaaa ttaaattatt gcaagctgat 2100tcacatataa agatgcagat
gtcagaaaag aaagcataca tgcttatgca tgaaacaatt 2160ctgcaaaaaa
aggatgaatt tcctccatca cccagattta tacttagagt cagacgaagt
2220cgcctggtta aagatgctct gcgtcaatta agtcaagctg aagctactga
cttctgcaaa 2280gtattagtgg ttgaatttat taatgaaatt tgtcctgagt
ctggaggggt tagttcagag 2340ttcttccact gtatgtttga agagatgacc
aagccagaat atggaatgtt catgtatcct 2400gaaatgggtt cctgcatgtg
gtttcctgcc aagcctaaac ctgagaagaa aagatatttc 2460ctctttggaa
tgctgtgtgg actctcctta ttcaatttaa atgttgctaa ccttcctttc
2520ccactggctc tgtataaaaa acttctggac caaaagccat cattggaaga
tttaaaagaa 2580ctcagtcctc ggttggggaa gagtttgcaa gaagttctag
atgatgctgc tgatgacatt 2640ggagatgcgc tctgcatacg cttttctata
cactgggacc aaaatgatgt tgacttaatt 2700ccaaatggga tctccatacc
tgtggaccaa accaacaaga gagactatgt ttctaagtat 2760attgattaca
ttttcaacgt ctctgtaaaa gcagtttatg aggaatttca gagaggattt
2820tatagagtct gtgagaagga gatacttaga catttctacc ctgaagaact
aatgacagca 2880atcattggaa atactgatta tgactggaaa cagtttgaac
agaattcaaa gtatgagcaa 2940ggataccaaa aatcacatcc tactatacag
ttgttttgga aggctttcca caaactaacc 3000ttggatgaaa agaaaaaatt
cctctttttc cttacaggac gtgataggct gcatgcaaga 3060ggcatacaga
aaatggaaat agtatttcgc tgtcctgaaa ctttcagtga aagagatcac
3120ccaacatcaa taacttgtca taatattctc tccctcccta agtattctac
aatggaaaga 3180atggaggaag cacttcaagt agccatcaac aacaacagag
gatttgtctc acccatgctc 3240acacagtcat aatcacctct gagagactca
gggtgggctt tctcacactt ggatccttct 3300gttcttcctt acacctaaat
aatacaagag attaatgaat agtggttaga agtagttgag 3360ggagagattg
ggggaatggg gagatgatga tgatggtcaa agggtgcaaa atctcacaca
3420agactgaggc aggagaatag ggtacagaga tagggatcta aggatgactt
ggacacactc 3480cctggcactg aagagtctga acactggcct gtgattggtc
cattccagga ccttcatttg 3540cataaggtat caaaccacat cagcctctga
ttggccatgg gccagacctg cactctggcc 3600aatgattggt tcattccagg
acattcattt gcataaggag tcaaaccaca ccagtcttgg 3660attggctgtg
agccaattca cctcagtctc taattggctg tgagtcagtc tttcatttac
3720atagggtgta accatcaaga aacctctaca gggtacttaa gccccagaag
attttgctac 3780cagggctctt gagccacttg ctctagccca ctcccaccct
gtggaatgta ctttcacttt 3840tgctgcttca ctgccttgtg ctccaataaa
tccactcctt caccacccaa aaaaaaaaaa 3900aaa 3903
SEQ ID NO: 38 (length 1775, DNA, Homo sapiens)
gtaactgaaa atccacaaga cagaatagcc agatctcaga ggagcctggc taagcaaaac
60cctgcagaac ggctgcctaa tttacagcaa ccatgaggcc acttaaggat gcagcaagaa
120ggagccatct gcaatccagg aagaaattcc ttgccaggaa ccaaattggt
tgtcaccttc 180atctaggact tctagcctcg agaacttaca aatggtgatg
atcatcaggt caaggatagt 240ctggagcaat tgagatgtca ctttacatgg
gagttatcca ttgatgacga tgaaatgcct 300gatttagaaa acagagtctt
ggatcagatt gaattcctag acaccaaata cagtgtggga 360atacacaacc
tactagccta tgtgaaacac ctgaaaggcc agaatgagga agccctgaag
420agcttaaaag aagctgaaaa cttaatgcag gaagaacatg acaaccaagc
aaatgtgagg 480agtctggtga cctggggcaa ctttgcctgg atgtattacc
acatgggcag actggcagaa 540gcccagactt acctggacaa ggtggagaac
atttgcaaga agctttcaaa tcccttccgc 600tatagaatgg agtgtccaga
aatagactgt gaggaaggat gggccttgct gaagtgtgga 660ggaaagaatt
atgaacgggc caaggcctgc tttgaaaagg tgcttgaagt ggaccctgaa
720aacccggaat ccagcgctgg gtatgcgatc tctgcctatc gcctggatgg
ctttaaatta 780gccacaaaaa atcacaagcc attttctttg cttcccctaa
ggcaggctgt ccgcttaaat 840ccagacaatg gatatattaa ggttctcctt
gccctgaagc ttcaggatga aggacaggaa 900gctgaaggag aaaagtacat
tgaagaagct ctagccaaca tgtcctcaca gacctatgtc 960tttcgatatg
cagccaagtt ttaccgaaga aaaggctctg tggataaagc tcttgagtta
1020ttaaaaaagg ccttgcagga aacacccact tctgtcttac tgcatcacca
gatagggctt 1080tgctacaagg cacaaatgat ccaaatcaag gaggctacaa
aagggcagcc tagagggcag 1140aacagagaaa agctagacaa aatgataaga
tcagccatat ttcattttga atctgcagtg 1200gaaaaaaagc ccacatttga
ggtggctcat ctagacctgg caagaatgta tatagaagca 1260ggcaatcaca
gaaaagctga agagaatttt caaaaattgt tatgcatgaa accagtggta
1320gaagaaacaa tgcaagacat acatttccac tatggtcggt ttcaggaatt
tcaaaagaaa 1380tctgacgtca atgcaattat ccattattta aaagctataa
aaatagaaca ggcatcatta 1440acaagggata aaagtatcaa ttctttgaag
aaattggttt taaggaaact tcggagaaag 1500gcattagatc tggaaagctt
gagcctcctt gggttcgtct acaaattgga aggaaatatg 1560aatgaagccc
tggagtacta tgagcgggcc ctgagactgg ctgctgactt tgagaactct
1620gtgagacaag gtccttaggc acccagatat cagccacttt cacatttcat
ttcattttat 1680gctaacattt actaatcatc ttttctgctt actgttttca
gaaacattat aattcactgt 1740aatgatgtaa ttcttgaata ataaatctga caaaa
1775391977DNAHomo sapiens 39ggcagacagg aagacttctg aagaacaaat
cagcctggtc accagctttt cggaacagca 60gagacacaga gggcagtcat gagtgaggtc
accaagaatt ccctggagaa aatccttcca 120cagctgaaat gccatttcac
ctggaactta ttcaaggaag acagtgtctc aagggatcta 180gaagatagag
tgtgtaacca gattgaattt ttaaacactg agttcaaagc tacaatgtac
240aacttgttgg cctacataaa acacctagat ggtaacaacg aggcagccct
ggaatgctta 300cggcaagctg aagagttaat ccagcaagaa catgctgacc
aagcagaaat cagaagtcta 360gtcacttggg gaaactacgc ctgggtctac
tatcacttgg gcagactctc agatgctcag 420atttatgtag ataaggtgaa
acaaacctgc aagaaatttt caaatccata cagtattgag 480tattctgaac
ttgactgtga ggaagggtgg acacaactga agtgtggaag aaatgaaagg
540gcgaaggtgt gttttgagaa ggctctggaa gaaaagccca acaacccaga
attctcctct 600ggactggcaa ttgcgatgta ccatctggat aatcacccag
agaaacagtt ctctactgat 660gttttgaagc aggccattga gctgagtcct
gataaccaat acgtcaaggt tctcttgggc 720ctgaaactgc agaagatgaa
taaagaagct gaaggagagc agtttgttga agaagccttg 780gaaaagtctc
cttgccaaac agatgtcctc cgcagtgcag ccaaatttta cagaagaaaa
840ggtgacctag acaaagctat tgaactgttt caacgggtgt tggaatccac
accaaacaat 900ggctacctct atcaccagat tgggtgctgc tacaaggcaa
aagtaagaca aatgcagaat 960acaggagaat ctgaagctag tggaaataaa
gagatgattg aagcactaaa gcaatatgct 1020atggactatt cgaataaagc
tcttgagaag ggactgaatc ctctgaatgc atactccgat 1080ctcgctgagt
tcctggagac ggaatgttat cagacaccat tcaataagga agtccctgat
1140gctgaaaagc aacaatccca tcagcgctac tgcaaccttc agaaatataa
tgggaagtct 1200gaagacactg ctgtgcaaca tggtttagag ggtttgtcca
taagcaaaaa atcaactgac 1260aaggaagaga tcaaagacca accacagaat
gtatccgaaa atctgcttcc acaaaatgca 1320ccaaattatt ggtatcttca
aggattaatt cataagcaga atggagatct gctgcaagca 1380gccaaatgtt
atgagaagga actgggccgc ctgctaaggg atgccccttc aggcataggc
1440agtattttcc tgtcagcatc tgagcttgag gatggtagtg aggaaatggg
ccagggcgca 1500gtcagctcca gtcccagaga gctcctctct aactcagagc
aactgaactg agacagagga 1560ggaaaacaga gcatcagaag cctgcagtgg
tggttgtgac gggtaggagg ataggaagac 1620agggggcccc aacctgggat
tgctgagcag ggaagctttg catgttgctc taaggtacat 1680ttttaaagag
ttgttttttg gccgggcgca gtggctcatg cctgtaatcc cagcactttg
1740ggaggccgag gtgggcggat cacgaggtct ggagtttgag accatcctgg
ctaacacagt 1800gaaatcccgt ctctactaaa aatacaaaaa attagccagg
cgtggtggct ggcacctgta 1860gtcccagcta cttgggaggc tgaggcagga
gaatggcgtg aacctggaag gaagaggttg 1920cagtgagcca agattgcgcc
ccctgcactc cagcctgggc ttcagagcaa gactcgg 1977
* * * * *
References