U.S. patent application number 14/798101 was filed with the patent office on 2015-07-13 and published on 2016-01-14 for prognostic method.
This patent application is currently assigned to PROGENIKA BIOPHARMA, S.A. The applicant listed for this patent is PROGENIKA BIOPHARMA, S.A. Invention is credited to Jokin del Amo, Antonio Martinez Martinez, Juan Morote Robles, Laureano Simon Buela, Diego Tejedor Hernandez.
Application Number | 20160010160 14/798101 |
Family ID | 36955513 |
Filed Date | 2015-07-13 |
United States Patent Application | 20160010160 |
Kind Code | A1 |
del Amo; Jokin; et al. | January 14, 2016 |
PROGNOSTIC METHOD
Abstract
A method for prognosing recurrence of prostate cancer (PCa) in a
subject following prostatectomy using the outcomes of selected
single nucleotide polymorphisms (SNPs) and clinical variables. A
method for genotyping PCa associated genetic variations comprising
use of a DNA microarray. A microarray for use in the described
methods.
Inventors: | del Amo; Jokin (Getxo, ES); Tejedor Hernandez; Diego (Vizcaya, ES); Martinez Martinez; Antonio (Vizcaya, ES); Simon Buela; Laureano (Guipuzcoa, ES); Robles; Juan Morote (Barcelona, ES) |
Applicant: |
Name | City | State | Country | Type |
PROGENIKA BIOPHARMA, S.A. | Derio | | ES | |
Assignee: | PROGENIKA BIOPHARMA, S.A. (Derio, ES) |
Family ID: | 36955513 |
Appl. No.: | 14/798101 |
Filed: | July 13, 2015 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
12309208 | Dec 14, 2009 | 9109257 |
PCT/IB2007/002364 | Jul 12, 2007 | |
14798101 | | |
Current U.S. Class: | 514/789; 435/6.11; 506/9 |
Current CPC Class: | C12Q 2600/16 20130101; C12Q 2600/156 20130101; C12Q 1/6886 20130101; C12Q 2600/158 20130101; C12Q 2600/106 20130101; C12Q 2600/172 20130101; C12Q 2600/118 20130101 |
International Class: | C12Q 1/68 20060101 C12Q001/68 |
Foreign Application Data
Date | Code | Application Number |
Jul 12, 2006 | GB | 0613840.8 |
Claims
1-15. (canceled)
16. A method of prognosing PCa recurrence in a subject comprising
determining the genotype of the subject at one or more positions of
single nucleotide polymorphism selected from SNPs 9, 24, 25, 28,
34, 46, 47, 51, 58 and 80 in Table 1B.
17-108. (canceled)
109. The method of claim 16, wherein said method comprises
determining the genotype of the subject at least the SNP 58
(rs1800247) or an SNP having R² > 0.8 with said SNP 58
(rs1800247).
110. The method of claim 109, wherein said method comprises
determining the genotype of the subject at least the SNP 58
(rs1800247).
111. The method of claim 16, wherein the presence of said SNP 58
(rs1800247) is indicative of a likelihood of prostate cancer
recurrence after prostatectomy in said subject.
112. The method of claim 16, wherein the method further comprises
obtaining or determining one or more clinical variables listed in
Table 18.
113. The method of claim 16, wherein said prognosing prostate
cancer recurrence comprises determining the likelihood of prostate
cancer recurrence within 5 years of said subject having had
prostatectomy surgery.
114. The method of claim 16, wherein determining the genotype of
the subject comprises genotyping a sample which has been obtained
from the subject, which sample has been obtained from blood,
saliva, liver, kidney, pancreas, heart, urine or from cells from
the buccal cavity.
115. The method of claim 114, wherein said blood comprises serum,
lymphocytes, lymphoblastoid cells, fibroblasts, platelets,
mononuclear cells or other blood cells.
116. A method for identifying a human subject having an increased
risk of prostate cancer recurrence, comprising: detecting in a DNA
sample obtained from the subject the presence of at least one T
allele of the rs1800247 single nucleotide polymorphism by
amplifying a portion of the DNA sample with nucleic acid primers
comprising SEQ ID NO: 417 and SEQ ID NO: 506; and correlating the
presence of the T allele of the rs1800247 single nucleotide
polymorphism with an increased risk of prostate cancer recurrence
in the subject.
117. The method of claim 116, wherein said method comprises
detecting two T alleles of the subject at the rs1800247 single
nucleotide polymorphism.
118. The method of claim 116, wherein the method further comprises
obtaining or determining one or more clinical variables selected
from the group consisting of prostate specific antigen level, onset
age, clinical stage, prostatectomy Gleason grade, surgical
oncologic margins, and surgical gland margins.
119. The method of claim 116, wherein the increased risk of
prostate cancer recurrence comprises increased risk of prostate
cancer recurrence within 5 years of said subject having had
prostatectomy surgery.
120. The method of claim 116, wherein the DNA sample from the
subject has been obtained from blood, saliva, liver, kidney,
pancreas, heart, urine or from cells from the buccal cavity of the
subject.
121. The method of claim 120, wherein said blood comprises serum,
lymphocytes, lymphoblastoid cells, fibroblasts, platelets,
mononuclear cells or other blood cells.
122. A method of treating a human post-operative prostate cancer
subject, comprising: (a) (i) detecting in a DNA sample obtained
from the subject the presence of at least one T allele of the
rs1800247 single nucleotide polymorphism, wherein said detecting
comprises sequencing the DNA sample or contacting the DNA sample
with a probe that binds specifically to the T allele and not the C
allele of the rs1800247 single nucleotide polymorphism; and (ii)
correlating the presence of the T allele of the rs1800247 single
nucleotide polymorphism with an increased risk of prostate cancer
recurrence in the subject; and (b) administering adjuvant therapy
to the subject with increased risk of prostate cancer recurrence,
wherein said adjuvant therapy comprises radiation, chemotherapy
and/or androgen-deprivation therapy.
123. A method of treating a human pre-operative prostate cancer
subject, comprising: (a) (i) detecting in a DNA sample obtained
from the subject the presence of at least one T allele of the
rs1800247 single nucleotide polymorphism, wherein said detecting
comprises sequencing the DNA sample or contacting the DNA sample
with a probe that binds specifically to the T allele and not the C
allele of the rs1800247 single nucleotide polymorphism; and (ii)
correlating the presence of the T allele of the rs1800247 single
nucleotide polymorphism with an increased risk of prostate cancer
recurrence in the subject; and (b) administering therapy without
surgery to the subject with increased risk of prostate cancer
recurrence, wherein said therapy without surgery comprises
radiation, chemotherapy and/or androgen-deprivation therapy.
Description
RELATED APPLICATIONS
[0001] This is a divisional of U.S. patent application Ser. No.
12/309,208, filed on Dec. 14, 2009, which is the § 371 U.S.
National Stage of International Application No. PCT/IB2007/002364,
filed Jul. 12, 2007, which in turn claims the benefit of GB Patent
Application No. 0613840.8, filed Jul. 12, 2006, the contents of
each of which are incorporated herein by reference in their
entirety.
FIELD OF THE INVENTION
[0002] The invention relates to methods and products, in particular
microarrays, for in vitro genotyping of prostate cancer (PCa)
associated genetic variations. The invention further relates to
methods for the prognosis and treatment of PCa, and to products for
use therein.
BACKGROUND TO THE INVENTION
[0003] Prostate cancer (PCa) remains the most commonly diagnosed
malignancy and second leading cause of cancer death in men older
than age 40 years. There are three stages of prostate cancer:
localised PCa; locally advanced PCa; and metastatic PCa.
[0004] For patients having localised PCa (confined to the prostate
gland), and treated with radical prostatectomy, there is a risk of
cancer progression or recurrence, usually indicated by an increased
level of prostate-specific antigen (PSA). Those who experience
early increases in PSA levels are more likely to develop metastatic
lesions, and have a poor prognosis. Several nomograms have been
developed to try to predict the probability of this biochemical
progression after surgery, usually based on classical clinical
parameters. However, these have failed to accurately predict PSA
recurrence.
[0005] Therefore, there remains in the art a need for reliable
means of predicting the course of PCa, and providing a basis for
more targeted treatments.
[0006] PCa is considered to be a complex genetic disease whose
inheritance does not follow a simple Mendelian pattern.
Association studies have recently identified several genes in which
one or more genetic variations result in a higher or lower risk of
contracting the disease, a better or worse response to drugs and/or
a better or worse prognosis. Single nucleotide polymorphisms (SNPs)
in germ-line DNA have been associated with highly aggressive or
drug-resistant types of PCa (Table 1A).
[0007] Development of a polygenic model for PCa, incorporating
multiple loci from the individual genes, requires a means for
discriminating alleles at multiple genetic loci that is
sufficiently sensitive, specific and reproducible for clinical
use.
[0008] DNA chips are often used to determine alleles at genetic
loci.
[0009] In 2001, the Consortium for the Human Genome Project and the
private company Celera presented the first complete example of the
human genome with 30,000 genes. From this moment on, the
possibility of studying the complete genome or large scale
(high-throughput) studies began. So-called "DNA-chips", also named
"micro-arrays", "DNA-arrays" or "DNA bio-chips" are apparatus that
functional genomics can use for large scale studies. Functional
genomics studies changes in the expression of genes due to
environmental factors and to genetic characteristics of an
individual. Gene sequences present small interindividual variations
at one unique nucleotide called an SNP ("single nucleotide
polymorphism"), which in a small percentage are involved in changes
in the expression and/or function of genes that cause certain
pathologies. The majority of studies which apply DNA-chips study
gene expression, although chips are also used in the detection of
SNPs.
[0010] The first DNA-chip was the "Southern blot" where labelled
nucleic acid molecules were used to examine nucleic acid molecules
attached to a solid support. The support was typically a nylon
membrane.
[0011] Two breakthroughs marked the definitive beginning of the
DNA-chip. The use of a solid non-porous support, such as glass,
enabled miniaturisation of arrays thereby allowing a large number
of individual probe features to be incorporated onto the surface of
the support at a density of >1,000 probes per cm². The
adaptation of semiconductor photolithographic techniques enabled
the production of DNA-chips containing more than 400,000 different
oligonucleotides in a region of approximately 20 µm²,
so-called high density DNA-chips.
[0012] In general, a DNA-chip comprises a solid support, which
contains hundreds of fragments of sequences of different genes
represented in the form of DNA, cDNA or fixed oligonucleotides,
attached to the solid surface in fixed positions. The supports are
generally glass slides for the microscope, nylon membranes or
silicon "chips". It is important that the nucleotide sequences or
probes are attached to the support in fixed positions as the
robotized localisation of each probe determines the gene whose
expression is being measured. DNA-chips can be classified as:
[0013] high density DNA-chips: the oligonucleotides found on the
surface of the support, e.g. glass slides, have been synthesized
"in situ", by a method called photolithography.
[0014] low density DNA-chips: the oligonucleotides, cDNA or PCR
amplification fragments are deposited in the form of nanodrops on
the surface of the support, e.g. glass, by means of a robot that
prints those DNA sequences on the support. There are very few
examples of low density DNA-chips which exist: a DNA-chip to detect
5 mutations in the tyrosinase gene; a DNA-chip to detect mutations
in p53 and k-ras; a DNA-chip to detect 12 mutations which cause
hypertrophic cardiomyopathy; a DNA-chip for genotyping of
Escherichia coli strains; or DNA-chips to detect pathogens such as
Cryptosporidium parvum or rotavirus.
[0015] For genetic expression studies, probes deposited on the
solid surface, e.g. glass, are hybridized to cDNAs synthesized from
mRNAs extracted from a given sample. In general the cDNA has been
labelled with a fluorophore. The larger the number of cDNA
molecules joined to their complementary sequence in the DNA-chip,
the greater the intensity of the fluorescent signal detected,
typically measured with a laser. This measure is therefore a
reflection of the number of mRNA molecules in the analyzed sample
and consequently, a reflection of the level of expression of each
gene represented in the DNA-chip.
[0016] Gene expression DNA-chips typically also contain probes for
detection of expression of control genes, often referred to as
"house-keeping genes", which allow experimental results to be
standardized and multiple experiments to be compared in a
quantitative manner. With the DNA-chip, the levels of expression of
hundreds or thousands of genes in one cell can be determined in one
single experiment. cDNA of a test sample and that of a control
sample can be labelled with two different fluorophores so that the
same DNA-chip can be used to study differences in gene
expression.
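The two-fluorophore comparison and the house-keeping standardisation described above can be sketched as follows. This is a minimal illustration only, assuming that expression differences are summarised as log2 ratios of test to control intensity and centred on the median ratio of the house-keeping genes; the function name, gene names and intensity values are hypothetical and are not taken from the source.

```python
import math
from statistics import median


def normalised_log_ratios(test, control, housekeeping):
    """Return log2(test/control) per gene, centred on the house-keeping genes.

    test, control: dicts mapping gene name -> fluorescence intensity
    housekeeping:  names of control genes assumed to be equally expressed
    """
    # Raw log2 ratio of the two fluorophore channels for every gene.
    raw = {gene: math.log2(test[gene] / control[gene]) for gene in test}
    # Offset estimated from the house-keeping genes (assumed unchanged).
    offset = median(raw[gene] for gene in housekeeping)
    return {gene: value - offset for gene, value in raw.items()}


# Hypothetical intensities, for illustration only.
test_channel = {"GAPDH": 2000.0, "ACTB": 4000.0, "GENE_X": 8000.0}
control_channel = {"GAPDH": 1000.0, "ACTB": 2000.0, "GENE_X": 1000.0}
ratios = normalised_log_ratios(test_channel, control_channel, ["GAPDH", "ACTB"])
```

After centring, the house-keeping genes sit at a log ratio of zero, so any remaining non-zero value reflects a difference between the two samples rather than a difference in labelling or scanning.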
[0017] DNA-chips for detection of genetic polymorphisms, changes or
mutations (in general, genetic variations) in the DNA sequence,
comprise a solid surface, typically glass, on which a high number
of genetic sequences are deposited (the probes), complementary to
the genetic variations to be studied. Using standard robotic
printers to apply probes to the array a high density of individual
probe features can be obtained, for example probe densities of 600
features per cm² or more can be typically achieved. The
positioning of probes on an array is precisely controlled by the
printing device (robot, inkjet printer, photolithographic mask
etc.) and probes are aligned in a grid. The organisation of probes
on the array facilitates the subsequent identification of specific
probe-target interactions. Additionally it is common, but not
necessary to divide the array features into smaller sectors, also
grid-shaped, that are subsequently referred to as sub-arrays.
Sub-arrays typically comprise 32 individual probe features although
lower (e.g. 16) or higher (e.g. 64 or more) features can comprise
each subarray.
[0018] One strategy used to detect genetic variations involves
hybridization to sequences which specifically recognize the normal
and the mutant allele in a fragment of DNA derived from a test
sample. Typically, the fragment has been amplified, e.g. by using
the polymerase chain reaction (PCR), and labelled e.g. with a
fluorescent molecule. A laser can be used to detect bound labelled
fragments on the chip and thus an individual who is homozygous for
the normal allele can be specifically distinguished from
heterozygous individuals (in the case of autosomal recessive
conditions these individuals are referred to as carriers) or
those who are homozygous for the mutant allele.
[0019] Another strategy to detect genetic variations comprises
carrying out an amplification reaction or extension reaction on the
DNA-chip itself.
[0020] For differential hybridisation based methods there are a
number of methods for analysing hybridization data for genotyping:
[0021] Increase in hybridization level: The hybridization level of
complementary probes to the normal and mutant alleles are compared.
[0022] Decrease in hybridization level: Differences in the sequence
between a control sample and a test sample can be identified by a
fall in the hybridization level of the totally complementary
oligonucleotides with a reference sequence. A complete loss is
produced in mutant homozygous individuals while there is only 50%
loss in heterozygotes. To examine every base of a sequence "n"
nucleotides in length on both strands, a minimum of "2n"
oligonucleotides is necessary, each overlapping the previous
oligonucleotide across the whole sequence except for one
nucleotide. Typically the size of the
oligonucleotides is about 25 nucleotides. The increased number of
oligonucleotides used to reconstruct the sequence reduces errors
derived from fluctuation of the hybridization level. However, the
exact change in sequence cannot be identified with this method;
sequencing is later necessary in order to identify the
mutation.
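The comparison of hybridization levels described above can be sketched as a simple genotype call. This is a minimal illustration, assuming one probe signal for the normal allele and one for the mutant allele and two fixed cut-offs on the normal-allele fraction; the function name and the threshold values are hypothetical and do not reproduce any algorithm from the source.

```python
def call_genotype(normal_signal, mutant_signal, low=0.25, high=0.75):
    """Call a genotype from the hybridization levels of two probes.

    normal_signal: intensity of the probe complementary to the normal allele
    mutant_signal: intensity of the probe complementary to the mutant allele
    low, high:     hypothetical cut-offs on the normal-allele fraction
    """
    total = normal_signal + mutant_signal
    if total <= 0:
        raise ValueError("no hybridization signal to compare")
    # Fraction of the total signal that comes from the normal-allele probe.
    fraction = normal_signal / total
    if fraction >= high:
        return "homozygous normal"
    if fraction <= low:
        return "homozygous mutant"
    # Roughly equal signal on both probes, i.e. about 50% loss on each.
    return "heterozygous"


# Hypothetical intensities, for illustration only.
calls = [call_genotype(1000, 50), call_genotype(500, 480), call_genotype(40, 900)]
```

Fixed thresholds of this kind are sensitive to fluctuation in the hybridization level, which is the limitation the surrounding text raises.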
[0023] Where amplification or extension is carried out on the
DNA-chip itself, three methods are presented by way of example:
[0024] In the Minisequencing strategy, a mutation specific primer
is fixed on the slide and after an extension reaction with
fluorescent dideoxynucleotides, the image of the DNA-chip is
captured with a scanner.
[0025] In the Primer extension strategy, two oligonucleotides are
designed for detection of the wild type and mutant sequences
respectively. The extension reaction is subsequently carried out
with one fluorescently labelled nucleotide and the remaining
nucleotides unlabelled. In either case the starting material can be
either an RNA sample or a DNA product amplified by PCR.
[0026] In the Tag arrays strategy, an extension reaction is carried
out in solution with specific primers, which carry a determined 5'
sequence or "tag". The use of DNA-chips with oligonucleotides
complementary to these sequences or "tags" allows the capture of
the resultant products of the extension. Examples of this include
the high density DNA-chip "Flex-flex" (Affymetrix).
[0027] For genetic diagnosis, simplicity must be taken into
account. The need for amplification and purification reactions
presents disadvantages for the on-chip extension/amplification
methods compared to the differential hybridization based
methods.
[0028] Typically, DNA-chip analysis is carried out using
differential hybridization techniques. However, differential
hybridization does not produce as high specificity or sensitivity
as methods associated with amplification on glass slides. For this
reason the development of mathematical algorithms, which increase
specificity and sensitivity of the hybridization methodology, is
needed (Cutler D J, Zwick M E, Carrasquillo M N, Yohn C T, Tobi K
P, Kashuk C, Mathews D J, Shah N, Eichler E E, Warrington J A,
Chakravarti A. Genome Research; 11:1913-1925 (2001)).
[0029] Thus, despite advances in technology, the difficulty
existing methods have in simultaneously analysing a large number of
genetic variations in a sensitive, specific and reproducible way
has prevented the application of DNA-chips for routine use in
clinical diagnosis.
SUMMARY OF THE INVENTION
[0030] The inventors have identified new means for prognosing
recurrence of prostate cancer using combinations of informative SNP
variables and clinical variables. Accordingly the invention
provides a method of prognosing prostate cancer (PCa) recurrence
following prostatectomy in a subject, which comprises:
(I) obtaining outcomes for one or more single nucleotide
polymorphism variables and one or more clinical variables listed in
Table 18 for the subject; and (II) using the outcomes obtained in
(I) to prognose PCa recurrence; wherein (i) an outcome for an SNP
variable is the identity of the nucleotide in the genomic DNA of
the subject at the position of the single nucleotide polymorphism;
(ii) an outcome for the clinical variable PSA is the
pre-prostatectomy level of prostate specific antigen (PSA) in the
blood of the subject; (iii) an outcome for the clinical variable
onset age is the age in years at which the subject was diagnosed
with PCa; (iv) an outcome for the clinical variable clinical stage
is a T value assigned to the PCa in the subject before
prostatectomy; (v) an outcome for the clinical variable
prostatectomy Gleason grade is a number from 2 to 10 assigned after
prostatectomy; (vi) an outcome for the clinical variable surgical
oncologic margins is a yes or no to indicate the presence (yes) or
absence (no) of tumour cells at the borders of a surgically
resected tumour; (vii) an outcome for the clinical variable
surgical gland margins is a yes or no to indicate the presence
(yes) or absence (no) of tumour cells at the borders of the
prostate gland; and wherein: (a) the variables for which outcomes
are obtained in step (I) comprise the model 1 SNP and clinical
variables in Table 18; and/or (b) the variables for which outcomes
are obtained in step (I) comprise the model 2 SNP and clinical
variables in Table 18; and/or (c) the variables for which outcomes
are obtained in step (I) comprise the model 3 SNP and clinical
variables in Table 18; and/or (d) the variables for which outcomes
are obtained in step (I) comprise the model 4 SNP and clinical
variables in Table 18.
[0031] The invention also provides a method of deriving a
probability function for use in prognosing PCa recurrence following
prostatectomy in a subject, a computational method of deriving a
probability function for use in prognosing PCa recurrence following
prostatectomy in a subject and a method for prognosing PCa
recurrence in a subject comprising use of a probability function
derived using the data in any one of Tables 4 to 7, as set out in
the claims.
[0032] The inventors have also identified SNPs which have
significant allelic association with prostate cancer recurrence.
Accordingly the invention also provides a method of prognosing PCa
recurrence in a subject comprising determining the genotype of the
subject at one or more positions of single nucleotide polymorphism
selected from SNPs 9, 24, 25, 28, 34, 46, 47, 51, 58 and 80 in
Table 1B.
[0033] The invention also provides an in vitro method for
genotyping PCa associated genetic variations in an individual as
set out in the claims.
[0034] Further aspects of the invention include a computational
method for obtaining a genotype from DNA-chip hybridisation
intensity data, a method of deriving linear functions for use in a
genotyping method of the invention, a computational method of
deriving linear functions for use in a genotyping method of the
invention, a method of diagnosing PCa or susceptibility to PCa in
an individual comprising genotyping an individual with respect to
one or more genetic variations, methods for selecting a treatment
for PCa in a subject and for treating PCa in a subject, a method of
identifying genetic variations predictive of a particular PCa
phenotype and a method of predicting the likely development of a
PCa phenotype in an individual using the identified
variation(s).
[0035] Still further aspects include a computer system comprising a
processor and means for controlling the processor to carry out a
computational method of the invention, a computer program
comprising computer program code which when run on a computer or
computer network causes the computer or computer network to carry
out a computational method of the invention.
[0036] The invention also provides a DNA chip or microarray
suitable for use in the methods of the invention, an
oligonucleotide probe, probe pair, or 4-probe set listed in Table 2
(FIG. 2), an oligonucleotide primer or primer pair listed in Table
3A and/or 3B (FIG. 3), a PCR amplification kit comprising at least
one pair of the listed primers, a diagnostic kit for detection of
PCa associated genetic variations and a kit for prognosing PCa
recurrence in a subject.
[0037] All of these aspects of the invention are as set out in the
claims.
BRIEF DESCRIPTION OF THE SEQUENCES
[0038] The Sequence Listing is submitted as an ASCII text file in
the form of the file named Sequence_Listing.txt, which was created
on Jul. 7, 2015, and is 82,230 bytes, which is incorporated by
reference herein.
[0039] SEQ ID NOS: 1-360 are probes suitable for detection of the
PCa associated genetic variations in Table 1A. These probes are
listed in Table 2.
[0040] SEQ ID NOS: 361-538 are PCR primers suitable for amplifying
target DNA regions comprising PCa associated genetic variations
listed in Table 1A. These primers are listed in Tables 3A and
3B.
[0041] SEQ ID NO: 539 is an external control nucleic acid.
[0042] SEQ ID NOS: 540 & 541 are probes suitable for detection
of the external control nucleic acid of SEQ ID NO: 539.
[0043] SEQ ID NO: 542 is a forward TAG sequence.
[0044] SEQ ID NO: 543 is a reverse TAG sequence.
BRIEF DESCRIPTION OF THE FIGURES
[0045] FIG. 1A is a table (Table 1A) that shows genetic variations
(SNPs) associated with PCa and which may be analysed as described
herein. RefSNP codes (rs#) for each SNP are taken from the Single
Nucleotide Polymorphism Database (dbSNP) curated by the National
Center for Biotechnology Information (NCBI) (as found at
ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=snp, as at 4
Jul. 2007). The sequences of all the genes mentioned in Table 1A
are known and recognized on the following websites: GenBank (NCBI),
GeneCard (Weizmann Institute of Sciences) and Snpper.chip.org
(Innate Immunity PGA).
[0046] FIG. 1B is a table (Table 1B) that shows the nucleotide
alleles for genotypes 0, 1 and 2 as used herein, for SNPs 9, 24,
25, 28, 31, 32, 34, 46, 47, 51, 56, 58, 80. The Table also shows,
for each SNP, the genotype which is associated with a poorer
prognosis (these genotypes are in bold print). These are: SNP9
(TT), SNP 24 (GG), SNP 25 (CC), SNP 28 (AA), SNP 31 (AG/GG), SNP 32
(CC), SNP 34 (AA), SNP 46 (GG), SNP 47 (GG), SNP51 (AA), SNP 56
(CC), SNP 58 (TT), SNP80 (TT). For SNP31, the G allele is
associated with poor prognosis but no GG patients were observed in
the studied population, so the AG genotype is marked as the one
with poor prognosis compared to AA.
[0047] FIG. 2 is a table (Table 2) that lists oligonucleotide
probes for discriminating between alleles at the SNPs listed in
Table 1A. The table lists two probe pairs for each SNP (a 4-probe
set).
[0048] FIGS. 3A/3B are Tables 3A (FIG. 3A) and 3B (FIG. 3B) that
list oligonucleotide primers for PCR amplification of nucleic acid
regions containing the SNPs listed in FIG. 1A (Table 1A). Forward
primers are listed in FIG. 3A (Table 3A), and reverse primers in
FIG. 3B (Table 3B).
[0049] FIG. 4A is a table (Table 4) that shows the SNP (46) and the
clinical variable (PSA) together with their significance (Sig.) and
their odds ratios (Exp (B)) used to compute the model 1 for the
prediction of the phenotype. This model provides the probability of
PSA recurrence (PSA >0.2 ng/ml) from 0 (no risk) to 1 (maximum
risk).
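How a table of odds ratios could yield a probability between 0 and 1 can be sketched as follows. This is a minimal illustration, assuming the model is a binary logistic regression (which the Exp (B) column suggests but this passage does not state); the function name, the intercept, the odds ratios and the patient outcomes are all hypothetical and are not the values of Table 4.

```python
import math


def recurrence_probability(intercept, odds_ratios, outcomes):
    """Probability from a logistic model given odds ratios (Exp(B)).

    intercept:   model constant on the log-odds scale (hypothetical)
    odds_ratios: dict mapping variable name -> Exp(B)
    outcomes:    dict mapping variable name -> outcome for one subject
    """
    # B is the natural logarithm of the odds ratio Exp(B).
    logit = intercept + sum(
        math.log(odds_ratios[name]) * outcomes[name] for name in odds_ratios
    )
    # Logistic function maps the log-odds onto the interval from 0 to 1.
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients and outcomes, for illustration only.
example = recurrence_probability(
    intercept=-2.0,
    odds_ratios={"SNP46": 2.0, "PSA": 1.1},
    outcomes={"SNP46": 1, "PSA": 10.0},
)
```

An odds ratio above 1 raises the computed probability as the corresponding outcome increases, and an odds ratio below 1 lowers it.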
[0050] FIG. 4B shows a ROC (receiver operating characteristic)
curve obtained for the model 1 that allows the estimation of its
discriminatory power. The ROC curve has been calculated in order to
maximize the specificity, thus reducing at the same time the
"false" positive rate. A specificity of 95% with a sensitivity of
15% is the cut-off for this model regarding the probability of PSA
recurrence. This model shows a likelihood ratio (LR) value of
2.7.
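The relationship between a cut-off's sensitivity, its specificity and a likelihood ratio can be sketched as follows. This is a minimal illustration, assuming the reported LR is the positive likelihood ratio, sensitivity / (1 - specificity); the source does not define which ratio is meant, and the function name is illustrative.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """Positive likelihood ratio: true positive rate / false positive rate.

    Both arguments are proportions between 0 and 1 (e.g. 0.95 for 95%).
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie between 0 and 1")
    if not 0.0 <= specificity < 1.0:
        raise ValueError("specificity must lie between 0 and 1 (exclusive of 1)")
    false_positive_rate = 1.0 - specificity
    return sensitivity / false_positive_rate


# Rounded figures quoted for model 1: sensitivity 15%, specificity 95%.
lr_model_1 = positive_likelihood_ratio(0.15, 0.95)
```

With the rounded percentages quoted for model 1 this gives a value of about 3, in the neighbourhood of the reported 2.7; the gap would be consistent with the quoted percentages being rounded, although the source does not say so.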
[0051] FIG. 5A is a table (Table 5) that shows the SNPs (46 and 24)
and the clinical variables (PSA, Onset Age, Clinical Stage)
together with their significance (Sig.) and their odds ratios (Exp
(B)) used to compute the model 2 for the prediction of the
phenotype. This model provides the probability of PSA recurrence
(PSA >0.4 ng/ml) from 0 (no risk) to 1 (maximum risk).
[0052] FIG. 5B shows a ROC (receiver operating characteristic)
curve obtained for the model 2 that allows the estimation of its
discriminatory power. The ROC curve has been calculated in order to
maximize the specificity, thus reducing at the same time the
"false" positive rate. A specificity of 95% with a sensitivity of
32% is the cut-off for this model regarding the probability of PSA
recurrence. This model shows a likelihood ratio (LR) value of
6.5.
[0053] FIG. 6A is a table (Table 6) that shows the SNPs (24, 31 and
56) and the clinical variables (PSA, Prostatectomy Gleason and
Surgical Margins) together with their significance (Sig.) and their
odds ratios (Exp (B)) used to compute the model 3 for the
prediction of the phenotype. This model provides the probability of
PSA recurrence (PSA >0.2 ng/ml) from 0 (no risk) to 1 (maximum
risk).
[0054] FIG. 6B is a ROC (receiver operating characteristic) curve
obtained for the model 3 that allows the estimation of its
discriminatory power. The ROC curve has been calculated in order to
maximize the specificity, thus reducing at the same time the
"false" positive rate. A specificity of 95% with a sensitivity of
30% is the cut-off for this model regarding the probability of PSA
recurrence. This model shows a likelihood ratio (LR) value of
5.7.
[0055] FIG. 7A is a table (Table 7) that shows the SNPs (24, 25, 31
and 32) and the clinical variables (PSA, Prostatectomy Gleason and
Surgical Margins) together with their significance (Sig.) and their
odds ratios (Exp (B)) used to compute the model 4 for the
prediction of the phenotype. This model provides the probability of
PSA recurrence (PSA >0.4 ng/ml) from 0 (no risk) to 1 (maximum
risk).
[0056] FIG. 7B is a ROC (receiver operating characteristic) curve
obtained for the model 4 that allows the estimation of its
discriminatory power. The ROC curve has been calculated in order to
maximize the specificity, thus reducing at the same time the
"false" positive rate. A specificity of 95% with a sensitivity of
42% is the cut-off for this model regarding the probability of PSA
recurrence. This model shows a likelihood ratio (LR) value of
8.5.
[0057] FIG. 8A is a table showing the pairwise P values (labelled
Sig.) for three common statistical tests between alleles 0, 1 and 2
of SNP9.
[0058] FIG. 8B is a plot showing Kaplan-Meier curves displaying
estimated PSA survival for patients carrying each of the three
genotypes 0, 1 and 2 for SNP 9.
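A Kaplan-Meier curve of the kind plotted in these figures can be sketched as follows. This is a minimal illustration of the standard product-limit estimator, assuming "PSA survival" means time free of PSA recurrence and that patients without recurrence at last follow-up are censored; the function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of recurrence-free survival.

    times:  follow-up duration for each patient
    events: True if PSA recurrence was observed, False if censored
    Returns a list of (time, survival) pairs at each recurrence time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        recurrences = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            recurrences += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if recurrences:
            survival *= 1.0 - recurrences / at_risk
            curve.append((t, survival))
        # Both recurrences and censored patients leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical follow-up data for one genotype group, illustration only.
curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
```

Computing one such curve per genotype group (0, 1 and 2) gives the three curves that are compared in each figure.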
[0059] FIG. 9A is a table showing the pairwise P values (labelled
Sig.) for three common statistical tests between genotypes 0, 1 and
2 of SNP24 (LPL, S447S, S447X and X447X).
[0060] FIG. 9B is a plot showing Kaplan-Meier curves displaying
estimated PSA survival for patients carrying each of the three
genotypes for SNP24.
[0061] FIG. 10A is a Table showing the pairwise P values (labelled
Sig.) for three common statistical tests between alleles 0, 1 and 2
of SNP25 (p53, R72R, R72P and P72P).
[0062] FIG. 10B is a plot showing Kaplan-Meier curves displaying
estimated PSA survival for patients carrying each of the three
genotypes for SNP25.
[0063] FIG. 11A is a Table showing the pairwise P values (labelled
Sig.) for three common statistical tests between alleles 0, 1 and 2
of SNP28.
[0064] FIG. 11B is a plot showing Kaplan-Meier curves displaying
estimated PSA survival for patients carrying each of the three
genotypes 0, 1 and 2 for SNP28.
[0065] FIG. 12A is a Table showing the pairwise P values (labelled
Sig.) for three common statistical tests between alleles 0, 1 and 2
of SNP34.
[0066] FIG. 12B is a plot showing Kaplan-Meier curves displaying
estimated PSA survival for patients carrying each of the three
genotypes 0, 1 and 2 of SNP34.
[0067] FIG. 13A is a Table showing the pairwise P values (labelled
Sig.) for three common statistical tests between alleles 0, 1 and 2
of SNP46 (Leptin, G-2548G, G-2548A and A-2548A).
[0068] FIG. 13B is a plot showing Kaplan-Meier curves displaying
estimated PSA survival for patients carrying each of the three
genotypes for SNP46.
[0069] FIG. 14A is a Table showing the pairwise P values (labelled
Sig.) for three common statistical tests between alleles 0, 1 and 2
for SNP47.
[0070] FIG. 14B is a plot showing Kaplan-Meier curves displaying
estimated PSA survival for patients carrying each of the three
genotypes 0, 1 and 2 for SNP47.
[0071] FIG. 15A is a Table showing the pairwise P values (labelled
Sig.) for three common statistical tests between alleles 0, 1 and 2
for SNP51.
[0072] FIG. 15B is a plot showing Kaplan-Meier curves displaying
estimated PSA survival for patients carrying each of the three
genotypes 0, 1 and 2 for SNP51.
[0073] FIG. 16A is a Table showing the pairwise P values (labelled
Sig.) for three common statistical tests between alleles 0, 1 and 2
for SNP58.
[0074] FIG. 16B is a plot showing Kaplan-Meier curves displaying
estimated PSA survival for patients carrying each of the three
genotypes 0, 1 and 2 for SNP58.
[0075] FIG. 17A is a Table showing the pairwise P values (labelled
Sig.) for three common statistical tests between alleles 0, 1 and 2
for SNP80.
[0076] FIG. 17B is a plot showing Kaplan-Meier curves displaying
estimated PSA survival for patients carrying each of the three
genotypes 0, 1 and 2 for SNP80.
[0077] FIG. 18 is a table (Table 18) that shows the SNP variables
and the clinical variables included in each of models 1 to 4
described herein. The Table indicates which SNP variables and
clinical variables (of those listed in the first column) are
informative for determining PCa recurrence using each model, and
shows which outcome for each variable is associated with poorer
prognosis.
[0078] FIG. 19A is a table (Table 19) showing the pairwise P values
(labelled Sig.) for three common statistical tests between alleles
0, 1 and 2 for SNP31.
[0079] FIG. 19B is a plot showing Kaplan-Meier curves displaying
estimated PSA survival for patients carrying each of the three
genotypes 0, 1 and 2 for SNP31.
[0080] FIG. 20A is a table (Table 20) showing the pairwise P values
(labelled Sig.) for three common statistical tests between alleles
0, 1 and 2 for SNP32.
[0081] FIG. 20B is a plot showing Kaplan-Meier curves displaying
estimated PSA survival for patients carrying each of the three
genotypes 0, 1 and 2 for SNP32.
[0082] FIG. 21A is a table (Table 21) showing the pairwise P values
(labelled Sig.) for three common statistical tests between alleles
0, 1 and 2 for SNP56.
[0083] FIG. 21B is a plot showing Kaplan-Meier curves displaying
estimated PSA survival for patients carrying each of the three
genotypes 0, 1 and 2 for SNP56.
DETAILED DESCRIPTION OF THE INVENTION
[0084] Prostate cancer (PCa) is a complex disorder. In general
there are three clinically recognised stages of prostate cancer
development: localised PCa (generally confined to the prostate
gland); locally advanced PCa (breaching the capsule of the prostate
gland, with or without involvement of local nodes and/or tissue
close to the prostate); and metastatic PCa (invasive cancer
involving more distant organs).
[0085] Localised PCa is often treated with surgery (radical
prostatectomy). However, there is a risk of cancer recurrence
following surgery. In clinical terms, an increased
prostate-specific antigen (PSA) level within five years of surgery
(biochemical progression) usually indicates cancer progression or
recurrence. Those who have experienced early PSA recurrence are
known to be more prone to develop metastatic lesions and have a
poor prognosis.
[0086] Using the Proscan DNA microarray of the present invention
and clinical investigation, the inventors have identified a number
of profiles (based on combinations of SNP and clinical variables)
which are informative for predicting such early recurrence. The
inventors have thus established models for predicting early
recurrence in PCa patients. Accordingly, in one aspect, the present
invention relates to methods for prognosis of PCa. In particular,
the invention provides methods for reliably determining the
likelihood of early prostate cancer recurrence in patients who have
undergone radical prostatectomy.
[0087] The inventors selected a study population of Spanish males as
described in Example 2. Each individual was clinically assessed to determine
the presence or absence of early (within 5 years of prostatectomy)
PCa recurrence. Controversy exists in the art regarding the
importance of setting PSA levels as >0.2 ng/ml or >0.4 ng/ml
as a threshold for defining "biochemical tumour recurrence". The
inventors therefore decided to use both thresholds in their
analysis.
[0088] Each of the individuals was also tested for various
preoperative and postoperative clinical variables and genotyped at
a number of genetic loci using the Proscan DNA microarray of the
invention (see Example 2).
[0089] The inventors then used genetic to select a subset of the
most informative SNPs for further modelling.
[0090] Statistical analysis was carried out to establish four
models (each based on a combination of informative SNPs and
informative clinical variables) that would allow reliable
discrimination between patients having and not having the early PCa
recurrence phenotype, with high specificity, sensitivity and
accuracy.
[0091] The variables which were selected for inclusion in models
1-4 are listed in Table 18 (FIG. 18).
[0092] Model 1 discriminates between patients having and not having
PSA progression (defined as PSA >0.2 ng/ml) within five years of
surgery using preoperative clinical variables and SNPs. Model 2
discriminates between patients having and not having PSA
progression (defined as PSA >0.4 ng/ml) within five years of
surgery using preoperative clinical variables and SNPs. Model 3
discriminates between patients having and not having PSA
progression (defined as PSA >0.2 ng/ml) within five years of
surgery using both preoperative and postoperative clinical
variables and SNPs. Model 4 discriminates between patients having
and not having PSA progression (defined as PSA >0.4 ng/ml)
within five years of surgery using preoperative and post-operative
clinical variables and SNPs.
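By way of illustration only, the four model definitions above can be summarised as a small lookup structure together with a helper that labels early recurrence under the relevant threshold. The following Python sketch is not part of the described methods; the names `MODELS` and `had_early_recurrence`, and the representation of follow-up data as (years since surgery, PSA in ng/ml) pairs, are assumptions made for the example.

```python
# Illustrative only: PSA thresholds and clinical variable sets as stated for models 1 to 4.
MODELS = {
    1: {"psa_threshold_ng_ml": 0.2, "clinical_variables": "preoperative"},
    2: {"psa_threshold_ng_ml": 0.4, "clinical_variables": "preoperative"},
    3: {"psa_threshold_ng_ml": 0.2, "clinical_variables": "preoperative and postoperative"},
    4: {"psa_threshold_ng_ml": 0.4, "clinical_variables": "preoperative and postoperative"},
}

# "Early" recurrence is recurrence within five years of surgery.
EARLY_WINDOW_YEARS = 5.0


def had_early_recurrence(psa_follow_up, model_number):
    """Return True if any PSA reading within five years exceeds the model's threshold.

    psa_follow_up: iterable of (years_since_surgery, psa_ng_ml) pairs
                   (a hypothetical input format chosen for this sketch).
    """
    threshold = MODELS[model_number]["psa_threshold_ng_ml"]
    return any(
        years <= EARLY_WINDOW_YEARS and psa > threshold
        for years, psa in psa_follow_up
    )
```

The same follow-up record can therefore be labelled differently depending on the model: a peak PSA of 0.3 ng/ml counts as progression for models 1 and 3 but not for models 2 and 4.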
[0093] FIGS. 4B-7B show the ROC curves, sensitivity, specificity
and positive likelihood ratios (LR+) of each of the models
developed by the inventors.
[0094] Tables 4-7 show the calculation of probability functions
using the discriminating SNPs and clinical variables for each of
models 1-4 respectively. Regression probability functions are
built using the Statistical Package for the Social Sciences (SPSS
Inc. Headquarters, Chicago, Ill., USA) version 14.0. In these
tables, B is the coefficient associated with each genotype in the
probability function; ET is the error in the calculation of B; Wald
is the statistical test; GL is the degrees of freedom; P is the P
value of the Wald test; and X (B) is the relative risk. The
alternative genotypes, 0, 1 and 2, for the various SNPs are listed
in Table 1B.
[0095] The inventors also investigated the association of
individual genetic markers to early PSA recurrence by performing a
survival analysis. In this way, the inventors identified 10 SNPs:
SNP 9, 24, 25, 28, 34, 46, 47, 51, 58 and 80 showing a significant
association between genotype and risk of recurrence (P<0.05).
The results are shown in FIGS. 8-17.
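As a minimal sketch of the kind of survival estimate plotted in the Kaplan-Meier figures, the following Python function computes the product-limit estimate for one genotype group. It is illustrative only and is not the inventors' implementation; the function name `kaplan_meier` and the input format are assumptions, and the pairwise significance tests reported in the accompanying tables are not reproduced here.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient (e.g. months after surgery).
    events:    1 if PSA recurrence was observed at that time, 0 if the
               patient was censored (no recurrence by the end of follow-up).
    """
    # Sort patients by follow-up time.
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the fraction surviving this time point.
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Both recurrences and censored patients leave the risk set.
        at_risk -= n_removed
    return curve
```

Running the function once per genotype (0, 1 and 2) of a given SNP yields the three curves that would be compared in a plot of estimated PSA survival.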
[0096] Thus, the inventors have identified SNPs which are
informative for the prognosis of PCa and in particular predicting
the likelihood of PCa recurrence following prostatectomy (Table
1B). The clinical and SNP variables identified, and the models
constructed using them, provide new means for predicting the
development of the early PCa recurrence phenotype in a subject.
Thus the invention provides methods for the prognosis of PCa and in
particular for predicting the risk of developing an early PCa
recurrence after prostatectomy.
[0097] In general PCa recurrence is clinically determined as the
subject showing biochemical progression. Biochemical progression
refers to an increase in levels of Prostate Specific Antigen (PSA)
(Stamey T, et al. N Engl J Med. 1987; 317:909-16. Pound C R, et al.
JAMA. 1999; 281:1591-1597).
[0098] There is some dispute in the art as to the PSA threshold
which indicates biochemical progression. Some sources indicate that
the threshold is a serum PSA level >0.2 ng/ml; others that the
threshold is PSA level >0.4 ng/ml. Recurrence herein may be
defined according to either threshold. Models 1 and 3 described
herein may be used to determine likelihood of recurrence defined
according to the >0.2 ng/ml threshold. Models 2 and 4 may be
used to determine likelihood of recurrence defined according to the
>0.2 ng/ml or the 0.4 ng/ml threshold.
[0099] In general early recurrence is a recurrence which occurs
within 5 years of prostatectomy.
[0100] In general the subject is a human male. The subject may be,
for example, Chinese, Japanese or Caucasian. Preferably the
subject is a Caucasian, such as a Spanish male.
[0101] Typically the subject has been diagnosed as having localised
PCa. Localised PCa is cancer that has not spread beyond the
prostate gland and accounts for about 90 percent of all PCa at
diagnosis. A diagnosis of localised PCa is typically made according
to four standard tests: Digital rectal examination (DRE), Prostate
Specific Antigen (PSA) serum levels, Transrectal ultrasound (TRUS)
and TRUS-guided biopsy. (AUA Guidelines 2007, American Urological
Association. AUA 2007 Annual Meeting: Media Advisory session. May,
2007).
[0102] The subject may be pre-operative or post-operative. Models 3
and 4 are for use post-operatively. Models 1 and 2 may be used pre-
or post-operatively. For use of individual SNPs the subject may be
pre-operative or post-operative. Pre- or post-operative refers to
the period before (pre) or after (post) radical prostatectomy in a
patient.
[0103] The present prognostic methods involve determining an
outcome for each of a number of single nucleotide polymorphism
(SNP) variables or predictors. The SNP variables are listed in
Table 1B. The SNPs included in models 1 to 4 are listed in FIGS.
4-7 and in FIG. 18. RefSNP codes (rs#) for each SNP are taken from
the Single Nucleotide Polymorphism Database (dbSNP) curated by the
National Center for Biotechnology Information (NCBI) (as found
online at ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=snp,
as at 4 Jul. 2007).
[0104] An outcome for a given SNP is the identity of the nucleotide
at that position in the genomic DNA sequence of a subject, or the
genotype of the subject at that SNP. Thus an outcome for a given
SNP may be A, T, C or G.
[0105] Table 1B lists the alternative genotypes (0, 1 and 2) at
each of the SNPs which are useful for prognosis. For SNPs 9, 24,
25, 28, 34, 46, 47, 31 and 80, allele 2 is associated with poor
prognosis. For SNPs 51, 58, 32 and 56 allele 0 is associated with
poor prognosis.
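The association stated above between genotype codes and poor prognosis can be expressed as a simple lookup. The following Python sketch is illustrative only; the names `POOR_PROGNOSIS_CODE` and `poor_prognosis_snps` are hypothetical, and the mapping of actual nucleotides to the codes 0, 1 and 2 is given in Table 1B and is not reproduced here.

```python
# Genotype code associated with poor prognosis for each SNP, as stated in the text:
# code 2 for SNPs 9, 24, 25, 28, 34, 46, 47, 31 and 80; code 0 for SNPs 51, 58, 32 and 56.
POOR_PROGNOSIS_CODE = {
    **{snp: 2 for snp in (9, 24, 25, 28, 34, 46, 47, 31, 80)},
    **{snp: 0 for snp in (51, 58, 32, 56)},
}


def poor_prognosis_snps(genotype_codes):
    """Return the sorted SNP numbers at which the subject carries the poor-prognosis code.

    genotype_codes: dict mapping SNP number -> genotype code (0, 1 or 2).
    SNPs that are not in the lookup are ignored.
    """
    return sorted(
        snp
        for snp, code in genotype_codes.items()
        if POOR_PROGNOSIS_CODE.get(snp) == code
    )
```

For example, a subject coded 2 at SNP 9, 0 at SNP 51 and 1 at SNP 46 would be flagged at SNPs 9 and 51 only.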
[0106] The inventors found that by determining or obtaining
outcomes for these informative SNPs (i.e. nucleotide identities at
the SNPs), or particular combinations thereof, it is possible to
assess the likelihood of recurrence in a subject.
[0107] The present prognostic methods may also comprise determining
an outcome for one or more clinical variables for a subject. These
clinical variables are also listed in Table 18. The Table also
shows which outcomes are linked to a poorer prognosis.
[0108] The pre-operative clinical variables included in one or more
of the models 1 to 4 are: PSA level; onset age; clinical stage. The
post-operative clinical variables included in one or more of the
models are: prostatectomy Gleason grade; surgical oncologic
margins; and surgical gland margins.
[0109] PSA level refers to the preoperative level of the PSA
antigen in the blood of the patient. Typically this is determined
at diagnosis (typically at routine screenings) and provides a
measure in ng/mL. Thus an outcome for this variable is a number from
0 to 1000, typically with one decimal place.
[0110] Onset age refers to the age in years at which the patient
was diagnosed with localised PCa according to the criteria
described herein. Thus, an outcome for this variable is a number
ranging between 0-150 years.
[0111] Clinical stage refers to the stage assigned by the doctor
based on the results of all diagnostic tests and biopsies before
prostatectomy. Typically this is clinically determined as follows
and is in general represented by a T number or value, assigned as
follows. T1 is non-palpable PCa, T2 involves a palpable tumor
apparently confined within the prostate. If the tumor has
penetrated through the prostate capsule it is called T3 and T4 if
local invasion of a structure adjacent to the prostate is present.
Those 4 major categories are subdivided (a, b or c) based on
details from diagnostic tests. Thus an outcome for this variable is
T1a, T1b, T1c (those are codified as 0), T2a, T2b (those are
codified as 1) and T2c, T3a, T3b, T3c, T4a and T4b (those are
codified as 2).
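The codification of clinical stage described above can be sketched as a lookup table. This Python fragment is illustrative only; the names `CLINICAL_STAGE_CODE` and `codify_clinical_stage` are assumptions made for the example.

```python
# Codification of clinical stage as described in the text.
CLINICAL_STAGE_CODE = {
    "T1a": 0, "T1b": 0, "T1c": 0,
    "T2a": 1, "T2b": 1,
    "T2c": 2, "T3a": 2, "T3b": 2, "T3c": 2, "T4a": 2, "T4b": 2,
}


def codify_clinical_stage(stage):
    """Map a clinical stage such as 'T2a' to its code (0, 1 or 2).

    Leading/trailing whitespace and letter case are normalised first;
    an unknown stage raises KeyError.
    """
    key = stage.strip().capitalize()  # e.g. ' t3A ' -> 'T3a'
    return CLINICAL_STAGE_CODE[key]
```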
[0112] Prostatectomy Gleason grade refers to the degree of
aggressiveness of a particular tumor based on the appearance of the
tumor cells. Typically this is determined by microscopic analysis
of the tissue from the tumor that has been extracted via
prostatectomy. In general the grade is a number between 2 and 10. A
higher Gleason grade indicates a cancer that is more poorly
differentiated, more aggressive or more likely to spread. Thus the
outcome for this variable is a number from 2 to 10. Grades 2, 3, 4, 5
and 6 are codified as 0, grade 7 as 1, and grades 8, 9 and 10 as 2.
[0113] Surgical oncologic margins refers to the presence or absence
of tumor cells at the borders of surgically resected tumors.
Typically this is determined by microscopic analysis of tumor
tissues after prostatectomy. Thus the outcome for this variable is
a No (Absence, codified as 0) or Yes (Presence, codified as 1).
[0114] Surgical gland margins refers to the presence or absence of
tumor cells at the borders of prostatic gland. Typically this is
determined by microscopic analysis of prostatic tissues after
prostatectomy. Thus the outcome for this variable is a No (Absence,
codified as 0) or Yes (Presence, codified as 1).
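The codification of the post-operative variables described above can be sketched in the same way. The following Python functions are illustrative only; the names `codify_gleason` and `codify_margin` are hypothetical.

```python
def codify_gleason(grade):
    """Map a prostatectomy Gleason grade (2 to 10) to its code.

    Grades 2-6 -> 0, grade 7 -> 1, grades 8-10 -> 2, as described in the text.
    """
    if not 2 <= grade <= 10:
        raise ValueError("Gleason grade must be between 2 and 10")
    if grade <= 6:
        return 0
    if grade == 7:
        return 1
    return 2


def codify_margin(tumor_cells_present):
    """Map presence of tumor cells at a margin to its code.

    Applies to both surgical oncologic margins and surgical gland margins:
    No (absence) -> 0, Yes (presence) -> 1.
    """
    return 1 if tumor_cells_present else 0
```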
[0115] Table 18 shows which SNP variables and clinical variables
are included in each of the four models for prognosing recurrence.
As used herein, the "(model no.)" variables for a particular model
are the SNP variables and clinical variables, selected from those
in the first column of Table 18, which are included in the model,
and which are informative for predicting the likelihood of
recurrence occurring. For example, the "model 1 variables" are the
SNP variables and clinical variables, selected from those in the
first column of Table 18, which are included in model 1 and which
are informative for prognosing the likelihood of recurrence (i.e.
SNP46 and PSA).
[0116] For each of the variables included in each model, Table 18 also
indicates which outcome (SNP allele or clinical outcome) is
associated with or suggestive of a poor prognosis.
[0117] Accordingly the invention in one aspect provides a method
for determining the likelihood of PCa recurrence as described
herein for a subject, comprising the step of determining or
obtaining, for that subject, outcomes for one or more SNP variables
and/or one or more clinical variables listed in Table 18 or Table
1B.
[0118] In one aspect the method is for predicting early PCa
recurrence in a subject and comprises determining outcomes for the
model 1 variables, and/or the model 2 variables and/or the model 3
variables, and/or the model 4 variables, listed in Table 18.
[0119] A method may comprise determining or obtaining outcomes for
the model 1 variables (Table 18). Use of these variables allows
discrimination of early PCa recurrence (defined as PSA>0.2
ng/ml) in a Spanish population with an LR+ of 2.7 (see Example 2
and FIG. 4). Details for the calculation of a probability function
using these variables are given in Table 4.
[0120] A method may comprise determining or obtaining outcomes for
the model 2 variables (Table 18). Use of these variables allows
discrimination of early PCa recurrence (defined as PSA>0.4
ng/ml) in a Spanish population with an LR+ of 6.5 (see Example 2
and FIG. 5). Details for the calculation of a probability function
using these variables are given in Table 5.
[0121] A method may comprise determining or obtaining outcomes for
the model 3 variables (Table 18). Use of these variables allows
discrimination of early PCa recurrence (defined as PSA>0.2
ng/ml) in a Spanish population with an LR+ of 5.7 (see Example 2
and FIG. 6). Details for the calculation of a probability function
using these variables are given in Table 6.
[0122] A method may comprise determining or obtaining outcomes for
the model 4 variables (Table 18). Use of these variables allows
discrimination of early PCa recurrence (defined as PSA>0.4
ng/ml) in a Spanish population with an LR+ of 8.5 (see Example 2
and FIG. 7). Details for the calculation of a probability function
using these variables are given in Table 7.
[0123] A method may comprise determining or obtaining outcomes for
the variables of one or more of the present models 1 to 4. The
method may comprise determining outcomes for variables of models
which use post-operative variables (model 3 and/or model 4). The
method may comprise determining outcomes for variables of models
which determine risk of recurrence based on a recurrence PSA
threshold of >0.4 ng/ml (model 2 and/or model 4).
[0124] In some aspects the present methods may include determining
other factors for a subject. For example, the subject may be
genotyped for one or more other genetic variations (such as other
SNPs not listed in Table 18). These may be mutations associated
with PCa or another condition. For example, a subject may be
genotyped at one or more of the remaining SNPs listed in Table 1B,
and/or at one or more of the remaining SNPs listed in Table 1A.
Other markers (e.g. SNPs) associated with other diseases may also
be determined.
[0125] The present methods may also be used in conjunction with or
in addition to standard clinical tests. For example, the methods
may be used in conjunction with or in addition to one or more
nomograms aimed at predicting the probability of biochemical
progression, e.g. based on clinical parameters such as PSA level,
Gleason grade or clinical stage.
[0126] The present methods may be used together with clinical tests
for PCa recurrence. As such the present methods may be used to
confirm a clinical diagnosis of recurrence.
[0127] The present methods allow accurate prediction of PCa
recurrence phenotypes based on a relatively small number of
informative SNPs and clinical variables. This can be advantageous
in that it allows use of genotyping techniques that would not
necessarily be suitable for large scale SNP screening, as well as
larger scale genotyping methods.
[0128] In general, even if a larger number of SNPs or genetic
variations or factors are tested in the present methods, prediction
of PCa recurrence can be made based only on outcomes of the
variables listed for any one of the models in Table 18. These
variables are sufficient for the prediction. Therefore in one
example, the present methods allow differential prognosis of early
recurrence of PCa or not, based on (at a maximum) the outcomes for
the variables for any one or more of the models as in Table 18. The
models may be used in combination as described herein.
[0129] In some instances though, it may be that some additional
variables such as SNPs or other factors are used in the prediction.
For example, in the present methods, prognosis may be made based on
the outcomes of a maximum of 100, 90, 80, 70, 60, 50, 45, 40, 35,
30, 25, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4,
3, or 2 variables, such as SNPs or PCa associated SNPs. The SNPs
may comprise (or consist of), or be selected from the Table 1B or
Table 1A SNP variables.
[0130] In one aspect the method may involve genotyping a maximum of
100, 90, 80, 70, 60, 50, 40, 30, 25, 20, 19, 18, 17, 16, 15, 14,
13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, or 2 SNPs or PCa-associated
SNPs. The method may involve genotyping a maximum of (no more than)
all the SNPs in Table 18, Table 1B or Table 1A. In some instances,
the method comprises genotyping at a maximum, SNP variables for one
or more of models 1 to 4, selected as described above.
[0131] Preferably the number and combination of variables such as
SNPs used to construct a model for predicting recurrence according
to the invention, is such that the model allows prediction to be
made with an LR+ value of at least 2, such as at least 3, 4, 5, 6,
7, 8, 9, or 10. Calculation of LR+ values is described herein.
[0132] Once outcomes are determined for a test subject for each of
the variables listed for a model, these outcomes are used in or
inserted in a suitable probability function (for prediction of that
phenotype) as described herein, and a probability function value is
calculated. Outcomes may be codified as described herein for use in
the probability function and calculation of the probability
function value. That probability function value can then be
compared to probability function values obtained from a population
of individuals of known, clinically determined phenotype (with
respect to PCa recurrence, as described herein). Typically this may
be done by comparison with a graph showing the distribution of
values in the population. It can thus be determined whether a test
individual is at high or low risk of recurrence based on the
phenotypic group to which the test probability function value
belongs.
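A minimal sketch of how a probability function value might be computed from codified outcomes is given below. It assumes the usual logistic form associated with logistic regression, with one coefficient B per non-reference category of each categorical variable and one coefficient per continuous variable. The function name and argument layout are hypothetical, and the coefficient values shown in the usage note are invented placeholders, not values from Tables 4 to 7.

```python
import math


def probability_function_value(intercept, categorical_b, continuous_b, outcomes):
    """Return a logistic probability function value between 0 and 1.

    intercept:     constant term of the regression.
    categorical_b: dict mapping (variable, codified outcome) -> coefficient B;
                   reference categories are simply omitted (contribute 0).
    continuous_b:  dict mapping variable -> coefficient B per unit (e.g. PSA in ng/ml).
    outcomes:      dict mapping variable -> codified outcome or measured value.
    """
    z = intercept
    for variable, outcome in outcomes.items():
        if variable in continuous_b:
            z += continuous_b[variable] * outcome
        else:
            z += categorical_b.get((variable, outcome), 0.0)
    return 1.0 / (1.0 + math.exp(-z))
```

With placeholder coefficients, a call such as `probability_function_value(-2.0, {("SNP46", 2): 1.0}, {"PSA": 0.1}, {"SNP46": 2, "PSA": 10.0})` evaluates the logistic function at z = 0 and returns 0.5.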
[0133] A suitable probability function for determining a given
phenotype may be derived by methods as set out in the present
Examples and described herein. Typically a study population of
individuals is provided. These individuals are of known (clinically
determined) phenotype with respect to the phenotype that the
probability function will be used to determine. In the present
case, the phenotype is PCa recurrence (yes or no) within 5 years of
prostatectomy. The PCa recurrence may be clinically determined as
described herein. Where the probability function is to be derived
from the model 1 or model 3 variables, PCa recurrence is generally
established using a PSA threshold of >0.2 ng/ml. Where the
probability function is to be derived from the model 2 or model 4
variables, PCa recurrence is generally established using a PSA
threshold of >0.4 ng/ml.
[0134] Typically the individuals in the study population will be
males who have been diagnosed with clinically localised PCa
according to the criteria described herein. In general the males
may be pre- or post-prostatectomy. Typically there will be at least
5 years follow up after surgery. In one example, the study
population does not include patients who are receiving adjuvant
therapy.
[0135] The population may be, for example, a Chinese, Japanese or
Caucasian male population, such as a Spanish male population.
Preferably the population used for deriving a probability function
comprises a representative sample of the population in which the
probability function will be applied.
[0136] In general at least n individuals are included in the study
population. Typically n is 200-1000, for example 300, 400, 500 or
600. Where a probability function is for discriminating between
alternative phenotypes, preferably there are approximately equal
numbers of individuals with each of the alternative phenotypes in
the population. Thus where there are two alternative phenotypes, A
and B, the population is preferably approximately 50% phenotype A
and 50% phenotype B. However, the ratios may be for example,
60%/40%, 70%/30%, 80%/20%, 90%/10% or any statistically acceptable
distribution.
[0137] Each individual in the study population is then tested to
determine outcomes for the particular variables on which the
probability function is to be based. For example, these variables
may be the variables listed for model 1, 2, 3, or 4 in Table 18.
This provides a number of outcomes for each individual.
[0138] Multiple genotype-phenotype associations may then be
analysed using stepwise multivariate logistic regression analysis,
using as the dependent variable the clinically determined phenotype
(PCa recurrence or not) and as independent variables the outcomes
of the informative variables, e.g. as recommended by Balding D J.
(2006.sup.35). The goodness of fit of the models obtained may be
evaluated using Hosmer-Lemeshow statistics and their accuracy
assessed by calculating the area under the curve (AUC) of the
Receiver Operating Characteristic curve (ROC) with 95% confidence
intervals (see, e.g. Janssens A C J W et al., 2006.sup.36). Suitable
methods are described in the Examples.
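As an illustration of the accuracy measure mentioned above, the area under the ROC curve can be computed directly from probability function values of individuals with and without recurrence, using the fact that the AUC equals the probability that a randomly chosen recurrence case scores higher than a randomly chosen non-recurrence case. The following Python sketch is an assumption-laden example, not the procedure used in the Examples; the function name `roc_auc` is hypothetical, and confidence intervals and Hosmer-Lemeshow statistics are not computed.

```python
def roc_auc(values_recurrence, values_no_recurrence):
    """Return the area under the ROC curve by pairwise comparison.

    Each (recurrence, no-recurrence) pair scores 1 if the recurrence case has
    the higher probability function value, 0.5 for a tie and 0 otherwise.
    """
    wins = 0.0
    for pos in values_recurrence:
        for neg in values_no_recurrence:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(values_recurrence) * len(values_no_recurrence))
```

An AUC of 1.0 indicates perfect separation of the two phenotypes, whereas 0.5 indicates no discrimination.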
[0139] The sensitivity, specificity, and positive likelihood ratio
(LR+=sensitivity/(1-specificity)) may be computed by means of ROC
curves. Preferably the model obtained has an LR+ value of at least
2, for example, at least 3, 4, 5, 6, 7, 8, 9 or 10.
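The formula LR+ = sensitivity/(1-specificity) can be worked through as follows. This Python sketch is illustrative only; the function name `diagnostic_ratios` and the use of raw classification counts at a single cut-off as input are assumptions made for the example.

```python
def diagnostic_ratios(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity, LR+) from classification counts.

    true_pos / false_neg:  recurrence cases classified as high / low risk.
    true_neg / false_pos:  non-recurrence cases classified as low / high risk.
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    if specificity == 1.0:
        # No false positives: the ratio is unbounded.
        lr_plus = float("inf")
    else:
        lr_plus = sensitivity / (1.0 - specificity)
    return sensitivity, specificity, lr_plus
```

For instance, with 40 true positives, 10 false negatives, 90 true negatives and 10 false positives, the sensitivity is 0.8, the specificity is 0.9 and the LR+ is approximately 8.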
[0140] Mean probability function values for each of the alternative
phenotypes in the population can be compared using a t test. In
general the probability functions are able to distinguish between
the different phenotypes in the study population in a statistically
significant way, for example, at p≤0.05 in a t-test. Thus
the probability functions produce a statistically significant
separation between individuals of different phenotype in the
population.
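The comparison of mean probability function values between the two phenotype groups can be sketched as below. The text does not specify which form of t test is used; this example assumes Welch's unequal-variance variant and returns only the test statistic and degrees of freedom, leaving the P value to a statistics package. The function name `welch_t` is hypothetical.

```python
import math
import statistics


def welch_t(values_a, values_b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t test.

    values_a, values_b: probability function values for the two phenotype
    groups (e.g. recurrence and no recurrence); each needs at least two
    values and the groups must not both have zero variance.
    """
    mean_a, mean_b = statistics.mean(values_a), statistics.mean(values_b)
    var_a, var_b = statistics.variance(values_a), statistics.variance(values_b)
    n_a, n_b = len(values_a), len(values_b)
    # Squared standard error contributed by each group.
    se_a, se_b = var_a / n_a, var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df
```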
[0141] Statistical analyses may be performed, for example, using
the Statistical Package for the Social Sciences (SPSS Inc.
Headquarters, Chicago, Ill., USA) version 14.0.
[0142] Probability function values can be calculated for each
individual of known phenotype in the study population and plotted
in a suitable graph.
[0143] In order to carry out the present methods of prognosis, a
probability function value is calculated for the test individual,
and this is compared with the probability function values for the
individuals of known phenotype in the study population in order to
determine the risk of a given phenotype in that individual. The
comparison may be done by comparison with a graph or by any other
suitable means known to those skilled in the art.
[0144] Thus in one aspect the invention further provides a method
of deriving a probability function for use in predicting or
determining early PCa recurrence as described herein,
comprising:
(i) providing a study population of individuals, wherein each
individual is of known clinically determined phenotype with respect
to PCa recurrence as described herein; (ii) determining for each
individual outcomes for each of a set of informative variables,
thereby obtaining a set of outcomes for each individual; (iii)
applying stepwise multiple logistic regression analysis to the
outcomes obtained in (ii) and the known phenotypes referred to in
(i); and (iv) thereby deriving a probability function which
produces a statistically significant separation between individuals
of different phenotype in the population; wherein: (a) the
probability function is for prognosing early PCa recurrence
according to the invention, and the set of variables for which
outcomes are determined or obtained in step (ii) is selected from
or consists of the model 1 variables listed in Table 18; (b) the
probability function is for prognosing early PCa recurrence
according to the invention, and the set of variables for which
outcomes are determined or obtained in step (ii) is selected from
or consists of the model 2 variables listed in Table 18; (c) the
probability function is for prognosing early PCa recurrence
according to the invention, and the set of variables for which
outcomes are determined or obtained in step (ii) is selected from
or consists of the model 3 variables listed in Table 18; and/or (d)
the probability function is for prognosing early PCa recurrence
according to the invention, and the set of variables for which
outcomes are determined or obtained in step (ii) is selected from
or consists of the model 4 variables listed in Table 18.
[0145] Derivation of the probability functions may be carried out
by a computer. Therefore in one aspect, the invention also relates
to a computational method of deriving a probability function for
use in prognosing PCa recurrence which method comprises applying
stepwise multiple logistic regression analysis to outcomes data and
phenotype data obtained from a suitable study population of
individuals, wherein each individual is of known clinically
determined phenotype with respect to PCa recurrence, thereby
deriving a probability function which produces a statistically
significant separation between individuals of different phenotype
in the population;
wherein: (i) the phenotype data comprises the known clinically
determined phenotype of each individual; (ii) the outcomes data for
each individual comprises outcomes for one or more single
nucleotide polymorphism variables and one or more clinical
variables listed in column 1 of Table 18; and wherein: (a) the
probability function is for prognosing early PCa recurrence
according to the invention, and the variables for which outcomes
data is obtained (and referred to in (ii)) comprise or consist of
the model 1 variables listed in Table 18; (b) the probability
function is for prognosing early PCa recurrence according to the
invention, and the variables for which outcomes data is obtained
(and referred to in (ii)) comprise or consist of the model 2
variables listed in Table 18; (c) the probability function is for
prognosing early PCa recurrence according to the invention, and the
variables for which outcomes data is obtained (and referred to in
(ii)) comprise or consist of the model 3 variables listed in Table
18; and/or (d) the probability function is for prognosing early PCa
recurrence according to the invention, and the variables for which
outcomes data is obtained (and referred to in (ii)) comprise or
consist of the model 4 variables listed in Table 18.
[0146] Suitable study populations and statistical analysis methods
are described above. Reference may also be made to the present
Examples.
[0147] Details for calculation of a probability function using the
variables listed for each of models 1 to 4 are given in Tables 4 to
7 respectively. Statistical analyses may be performed, for example,
using the Statistical Package for the Social Sciences (SPSS Inc.
Headquarters, Chicago, Ill., USA) version 14.0. These may be used
for calculation of probability function values for use in the
methods herein. The probability functions may be used to determine
a prognosis according to the invention.
[0148] In one aspect the invention relates to probability functions
constructed or derived using the data in any of Tables 4 to 7, and
to their use in a method, e.g. a computational method, for
prognosing PCa recurrence. The invention further relates to
associated computer programs and computer systems as described
herein.
[0149] The invention also relates to the probability functions
derived according to the present methods and to their use in the
methods described herein.
[0150] The process of calculating a probability function value for
a test subject and comparing the value to values obtained from a
study population of individuals of known phenotypes in order to
evaluate the risk of developing a phenotype in the test subject may
also be carried out using appropriate software.
[0151] Therefore in one aspect the invention relates to a
computational method for prognosing PCa recurrence using the
outcomes of discriminating variables ("outcomes data") obtained
according to the methods described herein (e.g. variables listed
for any of models 1 to 4). In the computational method, outcomes
data for the discriminating variables obtained from a test subject
(test outcomes data) is inputted in a suitable probability function
to produce a probability function value for the test subject. The
test probability function value is then compared with probability
function values for individuals of known phenotype (with respect to
PCa recurrence as described herein) in order to prognose the
likelihood of PCa recurrence in the test individual. The comparison
may be made using the methods described herein.
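One possible numerical stand-in for the graphical comparison described above is to report where the test value falls within the distribution of values of each known phenotype group. The following Python sketch is illustrative only and is not a definitive implementation of the computational method; the function name `compare_with_population` and its output format are assumptions.

```python
def compare_with_population(test_value, recurrence_values, no_recurrence_values):
    """Return, for each known-phenotype group, the fraction of individuals
    whose probability function value is at or below the test value.

    test_value:           probability function value of the test subject.
    recurrence_values:    values for study individuals with PCa recurrence.
    no_recurrence_values: values for study individuals without recurrence.
    """
    def fraction_at_or_below(values):
        return sum(v <= test_value for v in values) / len(values)

    return {
        "recurrence": fraction_at_or_below(recurrence_values),
        "no_recurrence": fraction_at_or_below(no_recurrence_values),
    }
```

A test value lying above most of the no-recurrence group and well within the recurrence group would suggest that the subject belongs to the higher-risk group; the choice of any specific cut-off is left to the skilled person.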
[0152] The invention further relates to a computer system
comprising a processor and means for controlling the processor to
carry out a computational method described herein, and to a
computer program comprising computer program code which when run on
a computer or computer network causes the computer or computer
network to carry out the computational method. In one aspect, the
computer program is stored on a computer readable medium.
[0153] As described above and in the Examples, the present
inventors have identified a number of single nucleotide
polymorphisms (SNPs) which show single locus allelic association
with poor PCa recurrence prognosis (likelihood of PCa recurrence,
defined according to the 0.4 ng/ml threshold). The SNPs are listed
in Table 1B and are SNPs 9, 24, 25, 28, 34, 46, 47, 51, 58 and 80.
The single allele studies continued throughout the follow up, and
therefore the association is with PCa recurrence at any time in the
subject's lifetime.
[0154] As shown in FIGS. 8-17, particular genotypes at these SNPs
are statistically significantly (P<0.05) associated with poor
prognosis. These are: SNP 9 (TT); SNP 24 (GG); SNP 25 (CC); SNP 28
(AA); SNP 34 (AA); SNP 46 (GG); SNP 47 (GG); SNP 51 (AA); SNP 58
(TT); and SNP 80 (TT).
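As an illustrative sketch only, the genotypes listed above may be held in a look-up table and compared with the genotypes determined for a subject. The SNP numbers and genotypes are those recited in the preceding paragraph; the function name and the input format are assumptions made for the example.

```python
# Genotypes recited in the text as associated with poor prognosis (P<0.05),
# keyed by the SNP numbering used in Table 1B.
RISK_GENOTYPES = {
    9: "TT", 24: "GG", 25: "CC", 28: "AA", 34: "AA",
    46: "GG", 47: "GG", 51: "AA", 58: "TT", 80: "TT",
}


def risk_snps(genotypes):
    """Return the sorted SNP numbers at which the subject carries a listed genotype.

    `genotypes` maps SNP number to a two-letter genotype string (assumed format).
    SNPs that are not in the look-up table are ignored.
    """
    return sorted(
        snp for snp, genotype in genotypes.items()
        if RISK_GENOTYPES.get(snp) == genotype
    )
```

For example, a subject genotyped as TT at SNP 9, AG at SNP 24 and TT at SNP 80 would be flagged at SNPs 9 and 80.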
[0155] By identifying the nucleotide in the genomic DNA of a
subject at one (or more) of these SNPs, it is possible to assess
the risk or susceptibility of that individual to PCa recurrence
during the individual's lifetime.
[0156] In one aspect the invention relates to the use of one or
more of the SNPs in Table 1B in a method for prognosing PCa, in
particular for determining the likelihood of PCa recurrence, e.g.
within 5 years of surgery as described herein. Thus the invention
in one aspect relates to a method for prognosing PCa recurrence (as
described herein) comprising determining the genotype of an
individual at one or more of the SNPs in Table 1B.
[0157] In general the present methods are carried out ex vivo or in
vitro, e.g. using a sample obtained from the individual. A method
may comprise use of the outcomes of clinical variables which have
been obtained by the methods described herein.
[0158] Various methods are known in the art for determining the
presence or absence in a test sample of a particular nucleic acid
sequence, for example a nucleic acid sequence which has a
particular nucleotide at a position of single nucleotide
polymorphism. For example, genotype may be determined by microarray
analysis, sequencing, primer extension, ligation of allele specific
oligonucleotides, mass determination of primer extension products,
restriction fragment length polymorphism analysis, single strand
conformational polymorphism analysis, pyrosequencing, dHPLC or
denaturing gradient gel electrophoresis (DGGE). Furthermore, having
sequenced nucleic acid of an individual or sample, the sequence
information can be retained and subsequently searched without
recourse to the original nucleic acid itself. Thus, for example, a
sequence alteration or mutation may be identified by scanning a
database of sequence information using a computer or other
electronic means.
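A minimal sketch of such an electronic search is given below, on the assumption that stored sequences are held as plain strings keyed by a sample identifier and that the polymorphic position is located by its flanking sequences. The record layout, the function names and the example flanks are hypothetical.

```python
def allele_at_snp(sequence, left_flank, right_flank):
    """Return the base lying between the two flanking sequences, or None."""
    start = sequence.find(left_flank)
    while start != -1:
        pos = start + len(left_flank)  # index of the polymorphic base
        tail = sequence[pos + 1 : pos + 1 + len(right_flank)]
        if pos < len(sequence) and tail == right_flank:
            return sequence[pos]
        # Try the next occurrence of the left flank, if any.
        start = sequence.find(left_flank, start + 1)
    return None


def scan_database(records, left_flank, right_flank):
    """Map each sample identifier to the allele found in its stored sequence."""
    return {
        sample_id: allele_at_snp(seq, left_flank, right_flank)
        for sample_id, seq in records.items()
    }
```

The sketch treats each stored record as a single read; a heterozygous individual would be represented by more than one record in such a scheme.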
[0159] In general, a sample is provided, containing nucleic acid
which comprises at least one of the genetic variations to be
tested. The nucleic acid comprises one or more target regions
comprising the genetic variation(s) (SNPs) which are to be
characterised.
[0160] The nucleic acid may be obtained from any appropriate
biological sample which contains nucleic acid. The sample may be
taken from a fluid or tissue, secretion, cell or cell line derived
from the human body.
[0161] For example, samples may be taken from blood, including
serum, lymphocytes, lymphoblastoid cells, fibroblasts, platelets,
mononuclear cells or other blood cells, from saliva, liver, kidney,
pancreas or heart, urine or from any other tissue, fluid, cell or
cell line derived from the human body. For example, a suitable
sample may be a sample of cells from the buccal cavity.
[0162] Preferably nucleic acid is obtained from a blood sample.
[0163] In general, nucleic acid is extracted from the biological
sample using conventional techniques. The nucleic acid to be
extracted from the biological sample may be DNA, or RNA, typically
total RNA.
[0164] Typically RNA is extracted if the genetic variation to be
studied is situated in the coding sequence of a gene. Where RNA is
extracted from the biological sample, the methods may further
comprise a step of obtaining cDNA from the RNA. This may be carried
out using conventional methods, such as reverse transcription using
suitable primers. Subsequent procedures are then typically carried
out on the extracted DNA or the cDNA obtained from extracted RNA.
The term DNA, as used herein, may include both DNA and cDNA.
[0165] In general the genetic variations to be tested are known and
characterised, e.g. in terms of sequence. Therefore nucleic acid
regions comprising the genetic variations may be obtained using
methods known in the art.
[0166] In one aspect, DNA regions which contain the genetic
variations (SNPs) to be identified (target regions) are subjected
to an amplification reaction in order to obtain amplification
products which contain the genetic variations to be identified. Any
suitable technique or method may be used for amplification.
[0167] For example, the polymerase chain reaction (PCR) (reviewed
for instance in "PCR protocols; A Guide to Methods and
Applications," Eds Innis et al, 1990, Academic Press, New York,
Mullis et al, Cold Spring Harbor Symp. Quant. Biol., 51:263,
(1987), Erlich (ed), PCR Technology, Stockton Press, NY, 1989,
and Erlich et al, Science, 252:1643-1650, (1991)) may be used. The
nucleic acid used as template in the amplification reaction may be
genomic DNA, cDNA or RNA.
[0168] Other specific nucleic acid amplification techniques include
strand displacement activation, the Qβ replicase system, the repair
chain reaction, the ligase chain reaction, rolling circle
amplification and ligation activated transcription.
[0169] Allele-specific oligonucleotides may be used in PCR to
specifically amplify particular sequences if present in a test
sample. Assessment of whether a PCR band contains a gene variant
may be carried out in a number of ways familiar to those skilled in
the art. The PCR product may for instance be treated in a way that
enables one to display the polymorphism on a denaturing
polyacrylamide DNA sequencing gel, with specific bands that are
linked to the gene variants being selected.
[0170] Those skilled in the art are well versed in the design of
primers for use in processes such as PCR. Various techniques for
synthesizing oligonucleotide primers are well known in the art,
including phosphotriester and phosphodiester synthesis methods.
[0171] A further aspect of the present invention provides a pair of
oligonucleotide amplification primers suitable for use in the
methods described herein.
[0172] PCR primers suitable for amplification of target DNA regions
comprising the SNPs in Table 1A are listed in Table 3A and Table
3B. The present methods may comprise the use of one or more of
these primers or one or more of the listed primer pairs, according
to the SNPs to be genotyped, wherein these SNPs are selected as
described herein. In one aspect the method comprises use of all of
the primers listed in Tables 3A and 3B. Suitable reaction
conditions may be determined using the knowledge in the art.
[0173] The amplified nucleic acid may then be sequenced and/or
tested in any other way to determine the presence or absence of a
particular feature. Nucleic acid for testing may be prepared from
nucleic acid removed from cells or in a library using a variety of
other techniques such as restriction enzyme digest and
electrophoresis.
[0174] For example, the allele of the at least one polymorphism
(i.e. the identity of the nucleotide at the position of single
nucleotide polymorphism) may be determined by determining the
binding of an oligonucleotide probe to the amplified region of the
genomic sample. A suitable oligonucleotide probe comprises a
nucleotide sequence which binds specifically to a particular allele
of the at least one polymorphism and does not bind specifically to
other alleles of the at least one polymorphism. Such a probe may
correspond in sequence to a region of genomic nucleic acid, or its
complement, which contains one or more of the SNPs described
herein. Under suitably stringent conditions, specific hybridisation
of such a probe to test nucleic acid is indicative of the presence
of the sequence alteration in the test nucleic acid. For efficient
screening purposes, more than one probe may be used on the same
test sample.
[0175] Those skilled in the art are well able to employ suitable
conditions of the desired stringency for selective hybridisation,
taking into account factors such as oligonucleotide length and base
composition, temperature and so on.
[0176] Suitable selective hybridisation conditions for
oligonucleotides of 17 to 30 bases include hybridization overnight
at 42 °C in 6× SSC and washing in 6× SSC at a
series of increasing temperatures from 42 °C to 65 °C.
[0177] Other suitable conditions and protocols are described in
Molecular Cloning: a Laboratory Manual: 2nd edition, Sambrook et
al., 1989, Cold Spring Harbor Laboratory Press and Current
Protocols in Molecular Biology, Ausubel et al. eds., John Wiley
& Sons, 1992.
[0178] A further aspect of the present invention provides an
oligonucleotide which hybridises specifically to a nucleic acid
sequence which comprises a particular allele of a polymorphism
selected from the group consisting of the single nucleotide
polymorphisms shown in Table 1A, 1B or Table 18, and does not bind
specifically to other alleles of the SNP. Hybridisation may be
determined under suitable selective hybridisation conditions as
described herein.
[0179] Such oligonucleotides may be used in a method of screening
nucleic acid.
[0180] In some preferred embodiments, oligonucleotides according to
the present invention are at least about 10 nucleotides in length,
more preferably at least about 15 nucleotides in length, more
preferably at least about 20 nucleotides in length.
Oligonucleotides may be up to about 100 nucleotides in length, more
preferably up to about 50 nucleotides in length, more preferably up
to about 30 nucleotides in length. The boundary value `about X
nucleotides` as used above includes the boundary value `X
nucleotides`. Oligonucleotides which specifically hybridise to
particular alleles of the SNPs listed in Table 1A are listed in
Table 2 and are described herein.
[0181] Where the nucleic acid is double-stranded DNA, hybridisation
will generally be preceded by denaturation to produce
single-stranded DNA. The hybridisation may be as part of an
amplification, e.g. PCR procedure, or as part of a probing
procedure not involving amplification. An example procedure would
be a combination of PCR and low stringency hybridisation. A
screening procedure, chosen from the many available to those
skilled in the art, is used to identify successful hybridisation
events and isolated hybridised nucleic acid.
[0182] Binding of a probe to target nucleic acid (e.g. DNA) may be
measured using any of a variety of techniques at the disposal of
those skilled in the art. For instance, probes may be
radioactively, fluorescently or enzymatically labelled. Other
methods not employing labelling of probe include examination of
restriction fragment length polymorphisms, amplification using PCR,
RNase cleavage and allele specific oligonucleotide probing. Probing
may employ the standard Southern blotting technique. For instance
DNA may be extracted from cells and digested with different
restriction enzymes. Restriction fragments may then be separated by
electrophoresis on an agarose gel, before denaturation and transfer
to a nitrocellulose filter. Labelled probe may be hybridised to the
DNA fragments on the filter and binding determined. DNA for probing
may be prepared from RNA preparations from cells.
[0183] Approaches which rely on hybridisation between a probe and
test nucleic acid and subsequent detection of a mismatch may be
employed. Under appropriate conditions (temperature, pH etc.), an
oligonucleotide probe will hybridise with a sequence which is not
entirely complementary. The degree of base-pairing between the two
molecules will be sufficient for them to anneal despite a
mis-match. Various approaches are well known in the art for
detecting the presence of a mis-match between two annealing nucleic
acid molecules.
[0184] For instance, RNase A cleaves at the site of a mis-match.
Cleavage can be detected by electrophoresing test nucleic acid to
which the relevant probe or probes have annealed and looking for
smaller molecules (i.e. molecules with higher electrophoretic
mobility) than the full length probe/test hybrid.
[0185] Nucleic acid in a test sample, which may be a genomic sample
or an amplified region thereof, may be sequenced to identify or
determine the identity of a polymorphic allele. The allele of the
SNP in the test nucleic acid can therefore be compared with the
susceptibility alleles of the SNP as described herein to determine
whether the test nucleic acid contains one or more alleles which
are associated with disease.
[0186] Typically in sequencing, primers complementary to the target
sequence are designed so that they are a suitable distance (e.g.
50-400 nucleotides) from the polymorphism. Sequencing is then
carried out using conventional techniques. For example, primers may
be designed using software that aims to select sequence(s) within
an appropriate window which have suitable Tm values, do not
possess secondary structure and will not hybridise to non-target
sequence.
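The primer selection described above might be sketched as follows, for illustration only. The sketch estimates Tm with the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C), which is an assumption rather than the method of any particular design software, and it does not screen for secondary structure or for hybridisation to non-target sequence. The primer length, the Tm range and the function names are hypothetical defaults; the distance window of 50 to 400 nucleotides is taken from the text.

```python
def wallace_tm(primer):
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def candidate_primers(template, snp_index, length=20,
                      min_dist=50, max_dist=400, tm_range=(55, 65)):
    """Yield (start, primer, Tm) for forward primers upstream of the SNP.

    Distance is counted from the primer's 3' terminal base to the SNP position.
    """
    for start in range(len(template) - length + 1):
        three_prime = start + length - 1  # index of the primer's 3' base
        dist = snp_index - three_prime
        if min_dist <= dist <= max_dist:
            primer = template[start:start + length]
            tm = wallace_tm(primer)
            if tm_range[0] <= tm <= tm_range[1]:
                yield start, primer, tm
```

A fuller implementation would add the secondary-structure and specificity checks mentioned in the text before accepting a candidate.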
[0187] Sequencing of an amplified product may involve precipitation
with isopropanol, resuspension and sequencing using a TaqFS+Dye
terminator sequencing kit. Extension products may be
electrophoresed on an ABI 377 DNA sequencer and data analysed using
Sequence Navigator software.
[0188] Genotype analysis may be carried out by microarray analysis.
Any suitable microarray technology may be used. The methodology
reported in Tejedor et al 2005 (Clinical Chemistry, 51: 1137-1144)
including the MG 1.0 software, and in International Patent
Application No. PCT/IB2006/00796 filed 12 Jan. 2006 (the contents
of which are hereby incorporated by reference) may be used. This
technology uses a low-density DNA array and hybridisation to
allele-specific oligonucleotide probes to screen for SNPs. Thus in
one aspect the ProScan microarray and technology of the present
invention may be used to determine the genotype of the informative
SNPs as described herein.
[0189] Once a subject has received a prognosis of aggressive PCa (a
significant risk of PCa recurrence according to the invention), the
most appropriate treatment for that subject can be selected. In
this way, the invention allows better targeting of therapies to
patients.
[0190] Thus in a further aspect, the invention provides a method of
selecting a suitable treatment for a subject diagnosed as having
localised PCa, the method comprising:
(a) determining the likelihood of PCa recurrence post-surgery in
the subject by a method described herein; and
(b) selecting a suitable treatment.
[0191] The selected treatment may then be administered to the
subject. Thus the invention also relates to a method of treating
localised PCa in a subject comprising:
(a) determining the likelihood of PCa recurrence post-surgery in
the subject by a method described herein; and
(b) treating the subject with a suitable treatment.
[0192] For example, where risk of recurrence is determined for a
pre-operative subject, this may be used to determine whether
surgery is required or desirable, or to select another local
therapy. Where recurrence is determined for a post-operative
subject, this will help to assess whether an adjuvant therapy, e.g.
radiation or chemotherapy, is required or advisable.
[0193] Means for carrying out the present prognostic methods may be
provided in kit form e.g. in a suitable container such as a vial in
which the contents are protected from the external environment.
Therefore in one aspect the invention further relates to prognostic
kits suitable for use in the methods described herein. Typically a
kit comprises:
(i) means for determining outcomes for the selected variable(s) or
SNP variables; and
(ii) instructions for determining prognosis based on the outcomes
of the variables.
[0194] The means (i) may comprise one or more oligonucleotide
probes suitable for detection of one or more SNP variables to be
determined. For example, the means (i) may comprise one or more
probe pairs or probe sets listed in Table 2. In one instance the
kit may comprise all of the probe sets in Table 2.
[0195] The means (i) may comprise a suitable microarray, as
described herein. The means (i) may comprise one or more pairs of
sequencing primers suitable for sequencing one or more of the SNP
variables to be determined.
[0196] The instructions (ii) typically comprise instructions to use
the outcomes determined using the means (i) for the prognosis. The
instructions may comprise a chart showing risks of PCa recurrence.
The kit may include details of probability functions which may be
used in prognosis, such as those described herein.
[0197] A kit may in some cases include a computer program as
described herein.
[0198] A kit may include other components suitable for use in the
present methods. For example, a kit may include primers suitable
for amplification of target DNA regions containing the SNPs to be
determined, such as those described herein. For example, a kit may
contain one or more primer pairs listed in Tables 3A & 3B. A
kit may also include suitable labelling and detection means,
controls and/or other reagents such as buffers, nucleotides or
enzymes e.g. polymerase, nuclease, transferase.
[0199] Nucleic acid according to the present invention, such as an
oligonucleotide probe and/or pair of amplification primers, may be
provided as part of a kit. The kit may include instructions for use
of the nucleic acid, e.g. in PCR and/or a method for determining
the presence of nucleic acid of interest in a test sample. A kit
wherein the nucleic acid is intended for use in PCR may include one
or more other reagents required for the reaction, such as
polymerase, nucleosides, buffer solution etc. The nucleic acid may
be labelled.
[0200] A kit for use in determining the presence or absence of
nucleic acid of interest may include one or more articles and/or
reagents for performance of the method, such as means for providing
the test sample itself, e.g. a swab for removing cells from the
buccal cavity or a syringe for removing a blood sample (such
components generally being sterile).
[0201] In a further aspect the present invention also relates to
DNA chips or microarrays and methods for their use, which allow
reliable genotyping of individuals with respect to multiple PCa
associated genetic variations simultaneously and for clinical
purposes.
[0202] Thus in one aspect, the invention further provides a method
of genotyping PCa associated genetic variations in an individual,
which is sufficiently sensitive, specific and reproducible for
clinical use. The inventors have developed low density
DNA-microarrays with specifically designed probes for use in the
method, and a computational method or algorithm for interpreting
and processing the data generated by the arrays.
[0203] In one aspect, the invention relates to an in vitro method
for genotyping PCa associated genetic variations in an individual.
The method allows simultaneous genotyping of multiple human genetic
variations present in one or more genes of a subject. The method of
the invention allows identification of nucleotide changes, such as
insertions, duplications and deletions, and the determination of the
genotype of a subject for a given genetic variation.
[0204] Genetic variation or genetic variant refers to mutations,
polymorphisms or allelic variants. A variation or genetic variant
is found amongst individuals within the population and amongst
populations within the species.
[0205] A PCa associated genetic variation may refer to a genetic
variation that is associated with PCa in a statistically
significant way and that can be used as an aid in the diagnosis,
prognosis or prediction of response to therapy in an
individual.
[0206] Polymorphism refers to a variation in the sequence of
nucleotides of nucleic acid where every possible sequence is
present in a proportion of equal to or greater than 1% of a
population; in a particular case, when the said variation occurs in
just one nucleotide (A, C, T or G) it is called a single nucleotide
polymorphism (SNP).
[0207] Genetic mutation refers to a variation in the sequence of
nucleotides in a nucleic acid where every possible sequence is
present in less than 1% of a population.
[0208] Allelic variant or allele refers to a polymorphism that
appears in the same locus in the same population.
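The definitions above can be illustrated by the following sketch, which applies the 1% criterion to the least frequent allele observed in a population. Reading the criterion in this way is an interpretation adopted for the example, and the function name and input format are hypothetical.

```python
def classify_variation(allele_frequencies, threshold=0.01):
    """Classify a variation from its population allele frequencies.

    `allele_frequencies` maps each allele sequence to its frequency (0 to 1).
    Returns "SNP", "polymorphism" or "mutation".
    """
    minor = min(allele_frequencies.values())
    if minor < threshold:
        return "mutation"
    # A polymorphism whose alleles are all single nucleotides is an SNP.
    if all(len(a) == 1 and a in "ACGT" for a in allele_frequencies):
        return "SNP"
    return "polymorphism"
```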
[0209] Thus a genetic variation may comprise a deletion,
substitution or insertion of one or more nucleotides. In one aspect
the genetic variations to be genotyped according to the present
methods comprise SNPs.
[0210] A given gene may comprise one or more genetic variations.
Thus the present methods may be used for genotyping of one or more
genetic variations in one or more genes.
[0211] Typically the individual is a human.
[0212] Typically, for a given genetic variation there are three
possible genotypes:
AA the individual is homozygous for genetic variation A (e.g.
homozygous for a wild type allele)
BB the individual is homozygous for genetic variation B (e.g.
homozygous for a mutant allele)
AB the individual is heterozygous for genetic variations A and B
(e.g. one wild type and one mutant allele)
[0213] The genetic variations, such as SNPs, to be analysed
according to the present methods, are associated with PCa. Examples
of genetic variations associated with PCa which may be assessed by
the present methods include those in Table 1A (FIG. 1A).
[0214] The sequences of all the genes mentioned in Table 1A are
known and recognized on the following websites: GenBank (NCBI),
GeneCard (Weizmann Institute of Sciences) and Snpper.chip.org
(Innate Immunity PGA). RefSNP codes (rs#) for each SNP are taken
from the Single Nucleotide Polymorphism Database (dbSNP) curated by
the National Center for Biotechnology Information (NCBI) (as found
online at ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=snp,
as of 4 Jul. 2007).
[0215] By permitting clinical genotyping of one or more of the
above genetic variations, the present method has use in, for
example, diagnosing susceptibility to or the presence of PCa in a
subject. The present genotyping methods are also useful in
prognosing PCa phenotypes, as described herein.
[0216] At least one PCa associated genetic variation, e.g. SNP, is
analysed in the present genotyping methods. The present methods
allow simultaneous genotyping of multiple variations in an
individual and typically multiple variations are analysed, in
general, at least 10, 12, 14, 16, 18 or 20 PCa associated genetic
variations. For example, at least 25, 30, 35, 40, 45, 50, 55, 60,
65, 70, 75, 80, 85, 90, 95, 100, 105 or 110 variations or up to
150, 200, 300, 400, 500, or 600 variations may be tested, such as
250, 350 or 450 variations.
[0217] Thus the genotyping methods may be used for genotyping an
individual with respect to all of or a selection of the variations
in Table 1A, as described herein. For example, at least 1, 2, 3, 4,
5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85
or all of the Table 1A variations may be genotyped. The variations
to be detected may additionally include other PCa associated
genetic variations.
[0218] The present invention also encompasses methods in which
other genetic variations are assessed in addition to the PCa
associated genetic variations.
[0219] According to the present methods, a sample is provided,
containing nucleic acid which comprises at least one of the genetic
variations to be tested (the target DNA). Suitable samples and
methods for obtaining the samples are described herein in relation
to the prognostic methods.
[0220] As described, DNA regions which contain the genetic
variations to be identified (target DNA regions) may be subjected
to an amplification reaction in order to obtain amplification
products which contain the genetic variations to be identified. Any
suitable technique or method may be used for amplification. In
general, the technique allows the (simultaneous) amplification of
all the DNA sequences containing the genetic variations to be
identified. In other words, where multiple genetic variations are
to be analysed, it is preferable to simultaneously amplify all of
the corresponding target DNA regions (comprising the variations).
Carrying out the amplification in a single step (or as few steps as
possible) simplifies the method.
[0221] For example, multiplex PCR may be carried out, using
appropriate pairs of oligonucleotide PCR primers which are capable
of amplifying the target regions containing the genetic variations
to be identified. Any suitable pair of primers which allow specific
amplification of a target DNA region may be used. In one aspect,
the primers allow amplification in the least possible number of PCR
reactions. Thus, by using appropriate pairs of oligonucleotide
primers and appropriate conditions, all of the target DNA regions
necessary for genotyping the genetic variations can be amplified
for genotyping (e.g. DNA-chip) analysis with the minimum number of
reactions. Suitable PCR primers for amplification of target DNA
regions comprising the PCa-associated genetic variations in Table
1A are listed in Tables 3A & 3B. The present method may
comprise the use of one or more of these primers or one or more of
the listed primer pairs. For example, the present methods may be
used for genotyping of Table 1A variations selected as described
above. The corresponding primers in Tables 3A & 3B may be
selected for use accordingly.
[0222] In one instance, the amplification products can be labelled
during the amplification reaction with a detectable label. The aim
is to be able to later detect hybridisation between the fragments
of target DNA containing the genetic variations being analysed and
probes fixed on a solid support. The greater the extent of
hybridisation of labelled target DNA to a probe, the greater the
intensity of detectable label at that probe position.
[0223] The amplification products may be labelled by conventional
methods. For example, a labelled nucleotide may be incorporated
during the amplification reaction or labelled primers may be used
for amplification.
[0224] Labelling may be direct, using for example fluorescent or
radioactive markers or any other marker known by persons skilled in
the art. Examples of fluorophores which can be used include
Cy3 and Cy5. Alternatively enzymes may be used for sample
labelling, for example alkaline phosphatase or peroxidase. Examples
of radioactive isotopes which can be used include
³³P and ¹²⁵I, or any other marker known by persons skilled
in the art. In one instance, labelling of amplification products is
carried out using a nucleotide which has been labelled directly or
indirectly with one or more fluorophores. In another example,
labelling of amplification products is carried out using primers
labelled directly or indirectly with one or more fluorophores.
[0225] Labelling may also be indirect, using, for example, chemical
or enzymatic methods. For example, an amplification product may
incorporate one member of a specific binding pair, for example
avidin or streptavidin, conjugated with a fluorescent marker and
the probe to which it will hybridise may be joined to the other
member of the specific binding pair, for example biotin
(indicator), allowing the probe/target binding signal to be
measured by fluorimetry. In another example, an amplification
product may incorporate one member of a specific binding pair, for
example, an anti-digoxigenin antibody combined with an enzyme
(marker) and the probe to which it will hybridise may be joined to
the other member of the specific binding pair, for example
digoxigenin (indicator). On hybridization of amplification product
to probe the enzyme substrate is converted into a luminous or
fluorescent product and the signal can be read by, for example,
chemi-luminescence or fluorometry.
[0226] The nucleic acid comprising the genetic variation(s) to be
tested, e.g. the (optionally labelled) amplification products, may
further undergo a fragmentation reaction, thereby obtaining some
fragmentation products which comprise or contain the genetic
variations to be identified or analysed. Typically fragmentation
increases the efficiency of the hybridisation reaction.
Fragmentation may be carried out by any suitable method known in
the art, for example, by contacting the nucleic acid, e.g. the
amplification products with a suitable enzyme such as a DNase.
[0227] If the nucleic acid has not been previously labelled, e.g.
during the amplification reaction, (and, typically, where no
posthybridisation amplification or ligation is carried out on the
solid support) then labelling with a detectable label may be
carried out prehybridisation by labelling the fragmentation
products. Suitable labelling techniques are known in the art and
may be direct or indirect as described herein. Direct labelling may
comprise the use of, for example, fluorophores, enzymes or
radioactive isotopes. Indirect labelling may comprise the use of,
for example, specific binding pairs that incorporate e.g.
fluorophores, enzymes, etc. For example, if amplification products
have not been labelled during the amplification reaction the
fragmentation products may undergo a direct or indirect labelling
with one or various markers, for example one or various
fluorophores, although other known markers can be used by those
skilled in the art.
[0228] According to the present methods the nucleic acid, e.g. the
amplification or fragmentation products, comprising the genetic
variation(s) to be detected (target DNA), is contacted with
oligonucleotide probes which are capable of detecting the
corresponding genetic variations by hybridisation under suitable
conditions.
[0229] Typically the hybridisation conditions allow specific
hybridisation between probes and corresponding target nucleic acids
to form specific probe/target hybridisation complexes while
minimising hybridisation of probes which carry one or more
mismatches to the DNA. Such conditions may be determined
empirically, for example by varying the time and/or temperature of
hybridisation and/or the number and stringency of the array washing
steps that are performed following hybridisation and are designed
to eliminate all non-specific probe-DNA interactions.
[0230] In the method, the probes are provided deposited on a solid
support or surface. The probes are deposited at positions on the
solid support according to a predetermined pattern, forming a
"DNA-chip". It has been found that the chips should comply with a
number of requirements in order to be used in the present methods,
for example in terms of the design of the probes, the number of
probes provided for each genetic variation to be detected and the
distribution of probes on the support. These are described in
detail herein. The inventors have developed suitable genotyping
chips for use in the present methods and accordingly in one aspect
the invention provides a DNA-chip or (micro)array comprising a
plurality of probes deposited or immobilised on a solid support as
described herein.
[0231] In general the solid support or phase comprises
oligonucleotide probes suitable for detection of each genetic
variation to be tested in the present method. The number and type
of genetic variations to be tested using a chip may be selected as
described herein.
[0232] Typically there will be at least one probe which is capable
of hybridising specifically to genetic variation A (e.g. a wildtype
or normal allele) (probe 1) and one probe which is capable of
hybridising specifically to genetic variation B (e.g. a mutant
allele) (probe 2) under the selected hybridisation conditions.
These probes form a probe pair. Probe 1 is for detection of genetic
variation A and probe 2 for detection of genetic variation B.
Typically the probes can be used to discriminate between A and B
(e.g. the wildtype and mutant alleles).
[0233] The probes may examine either the sense or the antisense
strand. Typically, probes 1 and 2 examine the same nucleic acid
strand (e.g. the sense strand or antisense strand) although in some
cases the probes may examine different strands. In one aspect
probes 1 and 2 have the same sequence except for the site of the
genetic variation.
[0234] In one instance, the probes in a probe pair have the same
length. In some aspects, where two or more pairs of probes are
provided for analysis of a genetic variation, the probes may all
have the same length.
[0235] Preferably more than one probe pair is provided for
detection of each genetic variation. Thus, at least 2, 3, 4, 5, 6,
7, 8, 9, 10 or more probe pairs may be provided per genetic
variation. In one aspect, (at least) 2 probe pairs are provided.
The aim is to reduce the rate of false positives and negatives in
the present methods.
[0236] For example, for a given genetic variation there may be:
Probe 1, which is capable of hybridising to genetic variation A (e.g. a normal allele);
Probe 2, which is capable of hybridising to genetic variation B (e.g. a mutant allele);
Probe 3, which is capable of hybridising to genetic variation A (e.g. a normal allele);
Probe 4, which is capable of hybridising to genetic variation B (e.g. a mutant allele).
[0237] The probes may examine the same or different strands. Thus
in one embodiment, probes 3 and 4 are the complementary probes of
probes 1 and 2 respectively and are designed to examine the
complementary strand. In one aspect it is preferred that the probes
provided for detection of each genetic variation examine both
strands.
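By way of illustration only, the four-probe arrangement described above can be sketched in Python. The function names, the flanking sequences and the alleles below are hypothetical and are not taken from Table 2; the sketch assumes equal-length flanks so that the variable base sits at the centre of each probe.

```python
# Illustrative sketch only: helper names and example sequences are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_probe_set(left_flank: str, right_flank: str,
                     allele_a: str, allele_b: str) -> dict:
    """Build two probe pairs with the variable base at the central position.

    Probes 1 and 2 examine one strand; probes 3 and 4 are their reverse
    complements and therefore examine the complementary strand.
    """
    if len(left_flank) != len(right_flank):
        raise ValueError("flanks must be equal in length to centre the variable base")
    probe1 = left_flank + allele_a + right_flank
    probe2 = left_flank + allele_b + right_flank
    return {
        "probe1": probe1,
        "probe2": probe2,
        "probe3": reverse_complement(probe1),
        "probe4": reverse_complement(probe2),
    }


# Hypothetical G/T variation with 12-nucleotide flanks, giving 25-mer probes.
probes = design_probe_set("GATTACAGGCTA", "CCGTTAGCATGG", "G", "T")
```

Because each probe has an odd length, the central position is preserved on taking the reverse complement, so all four probes in the set present the variable base at the centre.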
[0238] More than 2 pairs of probes may be provided for analysis of
a genetic variation as above. For example, where a genetic
variation exists as any one of 4 bases in the same strand (e.g.
there are three mutant possibilities), at least one pair of probes
may be provided to detect each possibility. Preferably, at least 2
pairs of probes are provided for each possibility.
[0239] Thus, for example, for an SNP G2677T/A/C, at least one pair
of probes may be provided for detection of G2677T, one pair for
detection of G2677A, and one pair for detection of G2677C.
Preferably at least two pairs of probes are provided for each of
these substitutions.
[0240] A number of methods are known in the art for designing
oligonucleotide probes suitable for use in DNA-chips.
[0241] A "standard tiling" method may be used. In this method, 4
oligonucleotides are designed that are totally complementary to the
reference sequence except in the central position, where typically
the 4 possible nucleotides A, C, G and T are examined. An
illustrative example of this strategy is the DNA-chip for
genotyping of HIV-1 (Affymetrix).
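A minimal sketch of the standard tiling idea follows, assuming a probe of odd length so that a single central position exists. The function name and the 25-nucleotide sequence are hypothetical, and the sequence is written in the orientation of the probe itself.

```python
def standard_tiling(reference: str) -> dict:
    """Return four probes identical to `reference` except at the central base."""
    if len(reference) % 2 == 0:
        raise ValueError("an odd length is needed for a single central position")
    centre = len(reference) // 2
    return {
        base: reference[:centre] + base + reference[centre + 1:]
        for base in "ACGT"
    }


# Hypothetical 25-mer; the central base (index 12) is G.
reference = "GATTACAGGCTAGCCGTTAGCATGG"
tiles = standard_tiling(reference)
```

One of the four probes reproduces the reference exactly, while the other three carry each of the alternative nucleotides at the interrogated position.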
[0242] In "alternative tiling" 5 oligonucleotides are designed, so
that the fifth examines a possible deletion in the sequence. An
example of this strategy is the DNA-chip to detect mutations in p53
(Affymetrix).
[0243] In "block tiling" 4 oligonucleotides are designed that are
totally complementary to the normal sequence and another 4 totally
complementary to the mutant sequence. The nucleotide which changes
is placed in the central position, but a mismatch of one of the 4
bases (A, C, T or G) is placed 2 nucleotides before or after the
nucleotide position that it is wished to interrogate. An example of
this strategy is the DNA-chip for the detection of mutations in
cytochrome p450 (Roche and Affymetrix).
[0244] A further example is "alternative block tiling" where the
"mismatch" is used to increase the specificity of the hybrid not
only in one position but also in the positions -4, -1, 0, +1 and +4
to identify the change produced in the central position or 0. An
example is the DNA-chip to detect 1,500 SNPs (Affymetrix).
[0245] Any one or more of these strategies may be used to design
probes for the present invention. Preferably standard tiling is
used, in particular with 2 pairs of probes e.g. 2 pairs of
complementary probes as above. Thus it is preferable that the
oligonucleotide sequence is complementary to the target DNA or
sequence in the regions flanking the variable nucleotide(s).
However, in some cases, one or more mismatches may be introduced,
as described above.
[0246] The oligonucleotide probes for use in the present invention
typically present the base to be examined (the site of the genetic
variation) at the centre of the oligonucleotide. This is
particularly the case where differential hybridisation methods are
used, as in general this allows the best discrimination between
matched and mismatched probes. In these methods, typically there is
formation of specific detectable hybridisation complexes without
post-hybridisation on-chip amplification. For example, for precise
(single base) mutations, the base which differs between the normal
and the mutant allele is typically placed in the central position
of the probe. In the case of insertions, deletions and
duplications, the first nucleotide which differs between the normal
and the mutant sequence is placed in the central position. It is
believed that placing the mutation at the centre of the probe
maximises specificity.
[0247] Where post-hybridisation on-chip amplification (e.g.
ligation or primer extension methods) is employed, oligonucleotide
probes typically present the variable base(s) at the 3' end of the
probe. Where OLA methodology is used, oligonucleotides (labelled
directly or indirectly) are also designed which hybridise to
probe-target complexes to allow ligation.
[0248] In general the probes for use in the present invention
comprise or in some embodiments consist (essentially) of 17 to 27
nucleotides, for example, 19, 21, 23, or 25 nucleotides or 18, 20,
22, 24 or 26 nucleotides.
[0249] Preferably the individual probes provided for detection of a
genetic variation are capable of hybridising specifically to the
normal and mutant alleles respectively under the selected
hybridisation conditions. For example, the melting temperature of
the probe/target complexes may be 75-85 degrees C. and
hybridisation may be for one hour, although higher and lower
temperatures and longer or shorter hybridisations may also
suffice.
[0250] The probes provided for (suitable for) detection of each
genetic variation (as described above) are typically capable of
discriminating between genetic variation A and B (e.g. the normal
and mutant alleles) under the given hybridisation conditions as
above. Preferably the discrimination capacity of the probes is
substantially 100%. If the discrimination capacity is not 100%, the
probes are preferably redesigned. Preferably the melting
temperature of the probe/target complexes occurs at 75-85 degrees
C. Methods for testing discrimination capacity are described
herein.
[0251] In one example, the probes provided for detection of a
genetic variation examine both strands and have lengths ranging
from 19-27 nucleotides. Preferably the probes have 100%
discrimination capacity and the melting temperature of probe/target
complexes is 75-85 degrees C.
[0252] Typically in order to obtain probes for use in the present
methods, a number of probes are designed and tested experimentally
for, e.g. hybridisation specificity and ability to discriminate
between genetic variants (e.g. a normal and a mutant allele).
Candidate oligonucleotide probe sequences may be designed as
described above. These may vary for example in length, strand
specificity, position of the genetic variation and degree of
complementarity to the sequence flanking the genetic variation in
the target DNA. Once probe pairs have been designed, these can be
tested for hybridisation specificity and discrimination capacity.
The capacity of specific probes to discriminate between the genetic
variations A and B (e.g. normal and mutant alleles) depends on
hybridisation conditions, the sequence flanking the mutation and
the secondary structure of the sequence in the region of the
mutation. By using stable hybridisation conditions, appropriate
parameters such as strand specificities and lengths can be
established in order to maximise discrimination. Preferably, the
genetic variation is maintained at the central position in the
tested probes.
[0253] Methods for testing discrimination capacity of probes are
described herein. Typically a number of candidate probe pairs are
provided and used in a training method as described below. In
general two pairs of probes (probes 1 and 2, and probes 3 and 4)
are tested in the method. For example, two pairs of probes
examining both strands (complementary to each other) may be tested.
If it is not possible to obtain 100% discrimination between the
three genotyping groups using the probes, the probes are typically
redesigned. Hybridisation conditions in the training method are
generally kept stable. Typically the melting temperature of
probe/target complexes is 75-85 degrees C.
[0254] For example, starting from probes of 25 nucleotides which
detect a genetic variation (e.g. the normal allele) and another
genetic variation (e.g. a mutant allele) in both strands (sense and
antisense), in general an average of 8 probes may be experimentally
tested to identify two definite pairs.
[0255] Probes are chosen to have maximum hybridisation specificity
and discrimination capacity between genetic variants (e.g. a normal
and a mutant allele) under suitable hybridisation conditions. For
example, the probes for detection of a given genetic variation,
e.g. two probe pairs, typically have substantially 100%
discrimination capacity. Typically the melting temperature of
probe/target complexes is 75-85 degrees C.
[0256] Using the methods herein the inventors have developed
oligonucleotide probes suitable for detection of the PCa-associated
genetic variations in Table 1A. These probes are presented as SEQ
ID NOS 1-360 (Table 2). The probes are listed in probe sets (90
sets in total), according to the genetic variation to be detected.
At least two pairs of probes are listed in each set.
[0257] In one aspect the invention relates to any one or more of
the oligonucleotide probes, pairs of probes or sets of probes set
out in SEQ ID NOS 1-360 (Table 2), and to their use in the
genotyping, diagnostic or therapeutic methods of the invention. The
invention further relates to any one or more of the oligonucleotide
probes, pairs of probes or sets of probes set out in SEQ ID NOS
1-360 for use in medicine, for example in a diagnostic or
therapeutic method described herein. A chip of the invention may
comprise one or more of the listed probe pairs or sets as described
herein.
[0258] In general probes are provided on the support in replicate.
Typically, at least 4, 6, 8, 10, 12, 14, 16, 18 or 20 replicates
are provided of each probe, in particular, 6, 8 or 10 replicates.
Thus for example, the support (or DNA-chip) may comprise or include
10 replicates for each of (at least) 4 probes used to detect each
genetic variation (i.e. 40 probes). Alternatively the support (or
DNA-chip) may comprise or include 8 replicates for each of (at
least) 4 probes used to detect each genetic variation (i.e. 32
probes). Still further the support (or DNA-chip) may comprise or
include 6 replicates for each of (at least) 4 probes used to detect
each genetic variation (i.e. 24 probes). Using probe replicates
helps to minimise distortions in data interpretation from the chip
and improves reliability of the methods.
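The replicate arithmetic above can be checked with a short sketch. The function names are hypothetical, and the chip-wide total assumes, purely for illustration, that each of the 90 probe sets contains exactly four probes (the minimum of two pairs).

```python
def features_per_variation(probes: int, replicates: int) -> int:
    """Number of deposited features needed to test one genetic variation."""
    return probes * replicates


def total_features(variations: int, probes: int, replicates: int) -> int:
    """Number of test features on a chip, excluding control probes."""
    return variations * features_per_variation(probes, replicates)


# The three examples given in the text: 4 probes at 10, 8 or 6 replicates.
examples = {r: features_per_variation(4, r) for r in (10, 8, 6)}

# Assumption for illustration: 90 sets of exactly 4 probes, 10 replicates each.
chip_total = total_features(90, 4, 10)
```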
[0259] In general the support also comprises one or more control
oligonucleotide probes. These are also provided in replicate as
above. Thus the support (or DNA-chip) may additionally comprise one
or more oligonucleotides deposited on the support which are useful
as positive and/or negative controls of the hybridisation
reactions. If post-hybridisation amplification or ligation
reactions are carried out on the chip, there may also be one or
more positive or negative controls of these reactions.
[0260] Typically the chip or array will include positive control
probes, e.g., probes known to be complementary and hybridisable to
sequences in the target polynucleotide molecules, probes known to
hybridise to an external control DNA, and negative control probes,
e.g., probes known to not be complementary and hybridizable to
sequences in the target polynucleotide molecules. The chip may have
one or more controls specific for each target, for example, 2, 3,
or more controls. There may also be at least one control for the
array.
[0261] Positive controls may for example be synthesized along the
perimeter of the array or in diagonal stripes across the array. The
reverse complement for each probe may be synthesized next to the
position of the probe to serve as a negative control. In yet
another example, sequences from other species of organism may be
used as negative controls in order to help determine background
(non-specific) hybridisation.
[0262] As above, the support (or DNA-chip) may include some (one or
more) oligonucleotides deposited on the support which are useful as
positive and negative controls of the hybridization reactions. In
general, each one of the sub-arrays, for example 16, which
typically constitute a DNA-chip, is flanked by some external
hybridization controls, which serve as reference points allowing
the points within the grid to be located more easily.
[0263] In one instance, the nucleotide sequence of an external
control DNA is the following (5'->3'):
CEH: GTCGTCAAGATGCTACCGTTCAGGAGTCGTCAAGATGCTACCGTTCAGGA (SEQ ID NO:
539)
[0264] and the sequences of the oligonucleotides for its detection
are the following:
ON1: CTTGACGACTCCTGAACGG (SEQ ID NO: 540)
ON2: CTTGACGACACCTGAACGG (SEQ ID NO: 541)
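The relationship between the three sequences listed above can be verified with a short sketch. The helper names are illustrative. The check shows that ON1 is perfectly complementary to a stretch of CEH, whereas ON2 differs from ON1 at a single base in the central position of the 19-mer; reading ON2 as a single-mismatch counterpart of ON1 is an interpretation of the sequences rather than a statement made in the text.

```python
CEH = "GTCGTCAAGATGCTACCGTTCAGGAGTCGTCAAGATGCTACCGTTCAGGA"
ON1 = "CTTGACGACTCCTGAACGG"
ON2 = "CTTGACGACACCTGAACGG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatch_positions(a: str, b: str) -> list:
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# A probe can pair perfectly with CEH only if its reverse complement occurs in CEH.
on1_perfect_match = reverse_complement(ON1) in CEH
on2_perfect_match = reverse_complement(ON2) in CEH
differences = mismatch_positions(ON1, ON2)
```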
[0265] Positive control probes are generally designed to hybridise
equally to all target DNA samples and provide a reference signal
intensity against which hybridisation of the target DNA (sample) to
the test probes can be compared. Negative controls comprise either
"blanks" where only solvent (DMSO) has been applied to the support
or control oligonucleotides that have been selected to show no, or
only minimal, hybridisation to the target, e.g. human, DNA (the
test DNA). The intensity of any signal detected at either blank or
negative control oligonucleotide features is an indication of
non-specific interactions between the sample DNA and the array and
is thus a measure of the background signal against which the signal
from real probe-sample interactions must be discriminated.
[0266] Desirably, the number of sequences in the array will be such
that where the number of nucleic acids suitable for detection of
genetic variations is n, the number of positive and negative
control nucleic acids is n', where n' is typically from 0.01 to
0.4n.
[0267] In general, the support or chip is suitable for genotyping
PCa associated genetic variations, in particular, genotyping
according to the present methods. The chip typically comprises
probes suitable for detection of at least one but preferably
multiple, PCa associated genetic variation(s), typically at least
10, 12, 14, 16, 18 or 20 variations. For example, at least 25, 30,
35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105 or 110
variations or up to 150, 200, 300, 400, 500, or 600 variations may
be tested, such as 250, 350 or 450 variations.
[0268] The PCa associated genetic variations may include any or all
of those in Table 1A. Thus an array or chip may comprise probes
suitable for genotyping an individual with respect to all of the
variations in Table 1A, or a selection of the variations in the
Table, as described herein.
[0269] A DNA-chip according to the invention (`Proscan`) allows
simultaneous, sensitive, specific and reproducible genotyping of
genetic variations associated with PCa. Non-limiting examples of
such variations are given in Table 1A. Nevertheless, the number of
genetic variations contained in the Table can be increased as other
genetic variations are subsequently identified and are associated
with PCa. Thus the genetic variations detectable by the chip may
comprise, or consist (essentially) of those listed in Table 1A or
Table 1B or FIG. 18 or a selection of these, as described in
relation to the present methods. The chip will comprise probes
suitable for detection of these genetic variations as described
herein. Preferably where a chip comprises probes for detection of a
genetic variation in Table 1A the chip comprises one or more of the
probes listed in SEQ ID NOS 1-360 (Table 2) as suitable for
detection of that genetic variation, e.g. the probes set listed in
SEQ ID NOs 1-360 for detection of that variation. In one aspect the
present chip comprises one or more probes selected from those in
SEQ ID NOS 1-360. The probes are listed in probe sets, according to
the genetic variation to be detected. At least two pairs of probes
are provided in each set. A chip may comprise at least one probe
pair or at least one probe set, or a selection of the probe sets,
for example a probe pair or a probe set from at least 5, 10, 15,
20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85 or all 90
sets, according to the genetic variations being tested. A chip may
comprise other probes for detection of variations in Table 1A or
other variations associated with PCa instead of or in addition to
those specifically listed.
[0270] Proscan may additionally comprise oligonucleotide probes for
detection of genetic variations not associated with PCa. For
example, the chips may comprise probes for detection of genetic
variations such as SNPs associated with another (related) condition
such as colon, rectal or bladder cancer. Typically, in Proscan, the
number of nucleic acids suitable for detection of genetic
variations associated with PCa (e.g. those in Table 1A or Table 1B
or FIG. 18) represent at least 50%, 60%, 70%, 75%, 80%, 85%, 90%,
95%, 97%, 98%, 99% or more of the nucleic acids in the array.
[0271] In general the support or chip has from 300 to 40000 nucleic
acids (probes), for example, from 400 to 30000 or 400 to 20000. The
chip may have from 1000 to 20000 probes, such as 1000 to 15000 or
1000 to 10000, or 1000 to 5000. A suitable chip may have from 2000
to 20000, 2000 to 10000 or 2000 to 5000 probes. For example, a chip
may have 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000,
10000, 12000, 14000, 16000, 18000 or 20000 probes. Smaller chips of
400 to 1000 probes, such as 400, 500, 600, 700, 800, 900 or 950
probes are also envisaged.
[0272] In general the array or chip of the invention comprises a
support or surface with an ordered array of binding (e.g.
hybridisation) sites or probes. Thus the arrangement of probes on
the support is predetermined. Each probe (i.e. each probe replicate)
is located at a known predetermined position on the solid support
such that the identity (i.e. the sequence) of each probe can be
determined from its position in the array. Typically the probes are
uniformly distributed in a predetermined pattern.
[0273] Preferably, the probes deposited on the support, although
they maintain a predetermined arrangement, are not grouped by
genetic variation but have a random distribution. Typically they
are also not grouped within the same genetic variation. If desired,
this random distribution can always be the same. Therefore,
typically the probes are deposited on the solid support (in an
array) following a predetermined pattern so that they are uniformly
distributed, for example, between the two areas that may constitute
a DNA-chip, but not grouped according to the genetic variation to
be characterised. Distributing probe replicates across the array in
this way helps to reduce or eliminate any distortion of signal and
data interpretation, e.g. arising from a non-uniform distribution
of background noise across the array.
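One way to obtain a distribution that is random with respect to genetic variation, yet predetermined and always the same, is a shuffle driven by a fixed seed. The following is a sketch under that assumption; the function name, grid size, seed and probe identifiers are all hypothetical, and the sketch does not attempt to balance replicates between sub-arrays.

```python
import random
from collections import Counter


def layout_replicates(probe_ids: list, replicates: int,
                      rows: int, cols: int, seed: int = 0) -> dict:
    """Assign every probe replicate to a grid position in a reproducible random order.

    Returns a mapping from (row, column) to probe identifier, so that the
    identity of each probe can be read back from its position.
    """
    features = [pid for pid in probe_ids for _ in range(replicates)]
    if len(features) > rows * cols:
        raise ValueError("grid is too small for the requested features")
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    rng = random.Random(seed)  # fixed seed: the same layout is produced every time
    rng.shuffle(positions)
    return dict(zip(positions, features))


# Hypothetical example: 4 probes for one variation, 10 replicates, 8 x 8 grid.
layout = layout_replicates(["p1", "p2", "p3", "p4"], 10, rows=8, cols=8, seed=42)
counts = Counter(layout.values())
```

Calling the function again with the same seed reproduces the identical layout, which corresponds to the case in which the random distribution is always the same.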
[0274] As explained above, probes may be arranged on the support in
subarrays.
[0275] The support, on which the plurality of probes is deposited,
can be any solid support to which oligonucleotides can be attached.
Practically any support, to which an oligonucleotide can be joined
or immobilized, and which may be used in the production of
DNA-chips, can be used in the invention. For example, the said
support can be of a non-porous material, for example, glass,
silicone, plastic, or a porous material such as a membrane or
filter (for example, nylon, nitrocellulose) or a gel. In one
embodiment, the said support is a glass support, such as a glass
slide.
[0276] Microarrays are in general prepared by selecting probes
which comprise a given polynucleotide sequence, and then
immobilizing such probes to a solid support or surface. Probes may
be designed, tested and selected as described herein. In general
the probes may comprise DNA sequences. In some embodiments the
probes may comprise RNA sequences, or copolymer sequences of DNA
and RNA. The polynucleotide sequences of the probes may also
comprise DNA and/or RNA analogues, or combinations thereof. For
example, the polynucleotide sequences of the probes may be full or
partial fragments of genomic DNA. The polynucleotide sequences of
the probes may also be synthesized nucleotide sequences, such as
synthetic oligonucleotide sequences. The probe sequences can be
synthesized either enzymatically in vivo, enzymatically in vitro
(e.g., by PCR), or non-enzymatically in vitro.
[0277] Microarrays or chips can be made in a number of ways.
However produced, microarrays typically share certain
characteristics. The arrays are reproducible, allowing multiple
copies of a given array to be produced and easily compared with
each other. Preferably, microarrays are made from materials that
are stable under binding (e.g., nucleic acid hybridization)
conditions. The microarrays are preferably small, e.g., from
0.25 to 25 or 0.5 to 20 cm², such as 0.5 to 20 cm² or 0.5 to
15 cm², for example, 1 to 15 cm² or 1 to 10 cm²,
such as 2, 4, 6 or 8 cm².
[0278] Probes may be attached to the present support using
conventional techniques for immobilization of oligonucleotides on
the surface of the supports. The techniques used depend, amongst
other factors, on the nature of the support used [porous
(membranes, micro-particles, etc.) or non-porous (glass, plastic,
silicone, etc.)]. In general, the probes can be immobilized on the
support either by using non-covalent immobilization techniques or
by using immobilization techniques based on the covalent binding of
the probes to the support by chemical processes.
[0279] Preparation of non-porous supports (e.g., glass, silicone,
plastic) requires, in general, either pre-treatment with reactive
groups (e.g., amino, aldehyde) or covering the surface of the
support with a member of a specific binding pair (e.g. avidin,
streptavidin). Likewise, in general, it is advisable to
pre-activate the probes to be immobilized by means of corresponding
groups such as thiol, amino or biotin, in order to achieve a
specific immobilization of the probes on the support.
[0280] The immobilization of the probes on the support can be
carried out by conventional methods, for example, by means of
techniques based on the synthesis in situ of probes on the support
(e.g., photolithography, direct chemical synthesis, etc.) or by
techniques based on, for example, robotic arms which deposit the
corresponding pre-synthesized probe (e.g. printing without contact,
printing by contact).
[0281] In one embodiment, the support is a glass slide and in this
case, the probes, in the number of established replicates (for
example, 6, 8 or 10) are printed on pre-treated glass slides, for
example coated with aminosilanes, using equipment for automated
production of DNA-chips by deposition of the oligonucleotides on
the glass slides ("micro-arrayer"). Deposition is carried out under
appropriate conditions, for example, by means of crosslinking with
ultraviolet radiation and heating (80 degrees C.), maintaining the
humidity and controlling the temperature during the process of
deposition, typically at a relative humidity of between 40-50% and
typically at a temperature of 20 degrees C.
[0282] The replicate probes are distributed uniformly amongst the
areas or sectors (sub-arrays), which typically constitute a
DNA-chip. The number of replicas and their uniform distribution
across the DNA-chip minimizes the variability arising from the
printing process that can affect experimental results. Likewise,
positive and negative hybridisation controls (as described herein)
may be printed.
[0283] To control the quality of the manufacturing process of the
DNA-chip, in terms of hybridization signal, background noise,
specificity, sensitivity and reproducibility of each replica as
well as differences caused by variations in the morphology of the
spotted probe features after printing, a commercial DNA can be
used. For example, as a quality control of the printing of the
DNA-chips, hybridization may be carried out with a commercial DNA
(e.g. K562 DNA High Molecular Weight, Promega).
[0284] In the first place, the morphology and size of the printed
spots are analyzed. In the hybridization with control DNA the
parameters described below for determining reliability of genotype
determination, are adhered to; specifically the relationship
between the signal intensity and background noise, average
specificity and sensitivity and reproducibility between replicated
copies of the same probe. This method allows the correct genotype
of the control DNA to be determined.
[0285] As above, in accordance with the present method, a nucleic
acid sample, e.g. amplification or fragmentation products,
comprising the genetic variation(s) to be detected (target DNA) is
contacted with a probe array as described herein, under conditions
which allow hybridisation to occur between target DNA and the
corresponding probes. Specific hybridisation complexes are thus
formed between target nucleic acid and corresponding probes.
[0286] The hybridization of, e.g., fragmentation products with
probes capable of detecting corresponding genetic variations
deposited on a support may be carried out using conventional
methods and devices. In one instance, hybridization is carried out
using an automated hybridisation station. For hybridization to
occur, the nucleic acid (e.g. fragmentation products) is placed in contact with
the probes under conditions which allow hybridization to take
place. Using stable hybridization conditions allows the length and
sequence of the probes to be optimised in order to maximize the
discrimination between genetic variations A and B, e.g. between
wild type and mutant sequences, as described herein.
[0287] In one instance, the method relies on differential
hybridisation, in particular an increase in hybridisation signal.
The method involves formation of specific hybridisation complexes
between target DNA and corresponding probes. Thus target DNA
bearing the wild type sequence will hybridise to the probes
designed to detect the wild type sequence, whereas target DNA
bearing a mutant sequence will hybridise to the probes designed to
detect that mutant sequence. The hybridisation complexes are
detectably labelled by means described herein (e.g. the target DNA
is directly labelled, or both target and probe are labelled in such
a way that the label is only detectable on hybridisation). By
detecting the intensity of detectable label (if any) at the
predetermined probe positions it is possible to determine the
nature of the target DNA in the sample. In this instance the probes
(also referred to as allele specific oligonucleotides, ASOs)
preferably have the variable nucleotide(s) at the central position,
as described herein.
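Purely as an illustration of differential hybridisation, the sketch below compares background-corrected signal at the probes for variation A with that at the probes for variation B. The function name, the background value and the two cut-off fractions are hypothetical and are not the analysis parameters of the present methods.

```python
from statistics import mean


def call_genotype(a_intensities: list, b_intensities: list,
                  background: float = 0.0,
                  low: float = 0.25, high: float = 0.75) -> str:
    """Call AA, AB or BB from replicate intensities at allele A and allele B probes.

    The cut-offs `low` and `high` are arbitrary illustrative values.
    """
    a = max(mean(a_intensities) - background, 0.0)
    b = max(mean(b_intensities) - background, 0.0)
    if a + b == 0:
        return "no call"  # no signal above background at either probe
    fraction_a = a / (a + b)
    if fraction_a >= high:
        return "AA"
    if fraction_a <= low:
        return "BB"
    return "AB"


# Hypothetical raw intensity values for three samples.
sample_aa = call_genotype([1000, 1100], [50, 60], background=40)
sample_ab = call_genotype([500, 520], [480, 510], background=40)
sample_bb = call_genotype([50], [900], background=40)
```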
[0288] In another instance, hybridisation of target DNA to probes
on the solid support (chip) may be followed by on-chip
amplification, for example, using primer extension or ligation,
e.g. oligonucleotide ligation assay (OLA) technologies (Eggerding F
A, Iovannisci D M, Brinson E., Grossman P., Winn-Deen E. S. 1995
Human Mutation, 5:153-65). In this case, the probes on the support
typically comprise the variable nucleotide(s) at the 3' end of the
probe.
[0289] Labelling can be carried out during post hybridisation
amplification. The labelling can be by direct labelling using, for
example, fluorophores, enzymes, radioactive isotopes, etc. or by
indirect labelling using, for example, specific binding pairs which
incorporate fluorophores, enzymes etc., by using conventional
methods, such as those previously mentioned in relation to
labelling amplification or fragmentation products.
[0290] Post-hybridization amplification may be carried out, for
example, using the "primer extension" methodology. Typically, after
hybridization, an extension reaction of the hybrid oligonucleotides
is carried out on the support (e.g. a glass slide). Extension may
be carried out with directly or indirectly labelled nucleotides and
will only happen if the extreme 3' end of the oligonucleotide
hybridizes perfectly with the amplification product.
[0291] Primer extension is a known method for genotype
discrimination (Pastinen T, Raitio M, Lindroos K, Tainola P,
Peltonen L, Syvanen A C. 2000 Genome Research 10:1031-42.) and can
be performed in a number of different ways. In a commonly used
approach a set of allele specific oligonucleotide probes are
designed to hybridise to the target sequences. The probes differ
from one another in their extreme 3' nucleotide, which for each
probe is designed to complement one of the possible polymorphic
nucleotides at a given position.
[0292] When the 3' nucleotide of the probe complements the sequence
under test then the ensuing base pairing allows a DNA polymerase to
extend the oligonucleotide primer by incorporation of additional
nucleotides that can be directly or indirectly labelled thereby
allowing the subsequent identification of those probes that have
been extended and those that have not. Probes that are successfully
extended carry the complementary nucleotide to the SNP at their 3'
end thus allowing the genotype of the test sample to be determined.
Similar approaches, for example the Amplification Refractory
Mutation System (ARMS) have also been developed.
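The logic of allele-specific primer extension can be sketched as follows. This is a deliberately simplified, all-or-nothing model: sample strands are written in the same sense as the probes, so that a probe occurring as a substring stands in for perfect annealing (including the 3' terminal base) to the complementary strand. All names and sequences are hypothetical.

```python
def build_allele_probes(upstream: str, alleles: str) -> dict:
    """One probe per allele: a shared upstream sequence plus an allele-specific 3' base."""
    return {allele: upstream + allele for allele in alleles}


def extended_alleles(probes: dict, sample_strands: list) -> set:
    """Return the alleles whose probe matches a sample strand up to its 3' end.

    Only such probes would be extended by the polymerase and become labelled.
    """
    return {
        allele
        for allele, probe in probes.items()
        if any(probe in strand for strand in sample_strands)
    }


upstream = "ACGTTGCAAGGCTTAC"  # hypothetical invariant sequence 5' of the SNP
probes = build_allele_probes(upstream, "AG")

strand_g = "TT" + upstream + "G" + "GATC"
strand_a = "TT" + upstream + "A" + "GATC"

homozygous = extended_alleles(probes, [strand_g, strand_g])
heterozygous = extended_alleles(probes, [strand_g, strand_a])
```

The set of extended probes identifies the genotype: a single extended probe indicates a homozygote, and two extended probes indicate a heterozygote.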
[0293] Alternatively, a post hybridization ligation reaction may be
carried out, for example using OLA methodology. After
hybridization, a ligation reaction of the hybridised
oligonucleotides is carried out on the support (e.g. glass slide)
with labelled oligonucleotides. A ligation will only take place if
the extreme 3' end of the probe deposited on the support hybridizes
perfectly with the target DNA (e.g. amplification product).
[0294] The oligonucleotide ligation assay (OLA) is another method
for interrogating SNPs (Eggerding F A, Iovannisci D M, Brinson E.,
Grossman P., Winn-Deen E. S. 1995 Human Mutation, 5:153-65). OLA
uses a pair of oligonucleotide probes that hybridize to adjacent
segments of target DNA including the variable base. The probe
designed to hybridise to the 5' side of the polymorphic nucleotide
is an allele-specific oligonucleotide (ASO) to one of the target
alleles. The last base at the 3' end of this ASO is positioned at
the site of the target DNA's polymorphism; the ASO typically also
has a biotin molecule at its 5' end that functions as a "hook" that
can subsequently be used to recover the oligonucleotide by virtue
of the highly specific interaction that biotin undergoes with
streptavidin.
[0295] The oligomer on the 3' or right-hand side of the pair is the
common oligomer (its sequence is the same for the two or more
different alleles to be tested). The common oligomer is
positioned at an invariable site next to the target DNA's
polymorphism and is fluorescently labelled at its 3' end.
[0296] If the ASO is perfectly complementary to the target sequence,
the ASO hybridizes completely when annealed and lies flat
against the target, allowing DNA ligase to covalently join the ASO
to the common oligomer. After the ligation reaction, the biotin hook
is used to remove the ASO, and the (e.g. fluorescently labelled)
common oligomer ligated to it is removed with it, producing
detectable fluorescence.
[0297] When the ASO is not a perfect match to the target sequence,
hybridization is incomplete and the 3' base of the oligomer will
not be base-paired to the target DNA, thus preventing ligation.
Under these circumstances, when the biotin hook is used to remove
the ASO, the common oligonucleotide will not be removed with it and
there is therefore no detectable label, e.g. fluorescence, in the
molecule removed.
[0298] To distinguish between two known alleles that differ by a
single base, three oligonucleotides are necessary: Two are
allele-specific oligonucleotides (ASOs) that differ from each other
only in the single 3' terminal base; the first is complementary to
one allele and the second is complementary to the second allele.
The third oligonucleotide is complementary to the invariable
sequence adjacent to the variant base.
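The allele discrimination described above can be illustrated with a minimal sketch. It assumes idealised behaviour (ligation occurs if, and only if, the 3' terminal base of the ASO pairs with the base at the polymorphic site), and the function name `expected_ola_signals` and the example C/T polymorphism are hypothetical choices made for illustration only.

```python
# Watson-Crick pairing used to decide whether an ASO's 3' base matches.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def expected_ola_signals(sample_alleles, aso_3prime_bases):
    """Predict, for each ASO, whether ligation (and hence label) is expected.

    sample_alleles: set of bases present at the polymorphic site of the
        target strand (one base if homozygous, two if heterozygous).
    aso_3prime_bases: 3' terminal base of each allele-specific
        oligonucleotide; the common oligomer is the same for all of them.
    """
    return {base: COMPLEMENT[base] in sample_alleles for base in aso_3prime_bases}


# Hypothetical C/T polymorphism interrogated with ASOs ending in G and A.
homozygous = expected_ola_signals({"C"}, ["G", "A"])
heterozygous = expected_ola_signals({"C", "T"}, ["G", "A"])
print(homozygous)
print(heterozygous)
```

In this idealised picture a homozygous sample gives signal with one ASO only, whereas a heterozygous sample gives signal with both.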
[0299] Once hybridisation (and optionally post-hybridisation
amplification) has taken place, the intensity of detectable label
at each probe position (including control probes) can be
determined. The intensity of the signal (the raw intensity value)
is a measure of hybridisation at each probe.
[0300] The intensity of detectable label at each probe position
(each probe replica) may be determined using any suitable means.
The means chosen will depend upon the nature of the label. In
general an appropriate device, for example, a scanner, collects the
image of the hybridized and developed DNA-chip. An image is
captured and quantified.
[0301] In one instance, e.g. where fluorescent labelling is used,
after hybridization, (optionally after post-hybridization
amplification or ligation) the hybridized and developed DNA-chip is
placed in a scanner in order to quantify the intensity of labelling
at the points where hybridization has taken place. Although
practically any scanner can be used, in one embodiment a
fluorescence confocal scanner is used. In this case, the DNA-chip
is placed in the said apparatus and the signal emitted by the
fluorophore due to excitation by a laser is scanned in order to
quantify the signal intensity at the points where hybridization has
taken place. Non-limiting examples of scanners which can be used
according to the present invention, include scanners marketed by
the following companies: Axon, Agilent, Perkin Elmer, etc.
[0302] Typically, in determining the intensity of detectable label
at each probe position (i.e. for each probe replica), account is
taken of background noise, which is eliminated. Background noise
arises because of non-specific binding to the probe array and may
be determined by means of controls included in the array. Once the
intensity of the background signal has been determined, this can be
subtracted from the raw intensity value for each probe replica in
order to obtain a clean intensity value. Typically the local
background, based on the signal intensity detected in the vicinity
of each individual feature is subtracted from the raw signal
intensity value. This background is determined from the signal
intensity in a predetermined area surrounding each feature (e.g. an
area of X, Y or Z μm² centred on the position of the probe).
[0303] The background signal is typically determined from the local
signal of "blank" controls (solvent only). In many instances the
device, e.g. scanner, which is used to determine signal intensities
will provide means for determining background signal.
[0304] Thus, for example, where the label is a fluorescent label,
absolute fluorescence values (raw intensity values) may be gathered
for each probe replica and the background noise associated with
each probe replica can also be assessed in order to produce "clean"
values for signal intensity at each probe position.
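By way of illustration only, the background correction could be sketched as follows. The function name `clean_intensities`, the example values and the clamping of negative results to zero are assumptions of this sketch and are not taken from the description.

```python
def clean_intensities(raw_values, local_backgrounds):
    """Subtract each replica's local background from its raw intensity."""
    if len(raw_values) != len(local_backgrounds):
        raise ValueError("one background value is needed per replica")
    # Clamping at zero is an assumption of this sketch: it prevents a noisy
    # background estimate from producing a negative "clean" signal.
    return [max(raw - bg, 0.0) for raw, bg in zip(raw_values, local_backgrounds)]


# Hypothetical raw values and local backgrounds for four replicas of one probe.
raw = [1520.0, 1498.0, 1610.0, 1475.0]
background = [120.0, 118.0, 131.0, 125.0]
clean = clean_intensities(raw, background)
print(clean)
```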
[0305] Once the target DNA has been hybridised to the chip and the
intensity of detectable label has been determined at the probe
replica positions on the chip (the raw intensity values), it is
necessary to provide a method (model) which can relate the
intensity data from the chip to the genotype of the individual.
[0306] The inventors have found that this can be done by applying a
suitable algorithm to the intensity data. The algorithm and
computer software developed by the inventors allows analysis of the
genetic variations with sufficient sensitivity and reproducibility
as to allow use in a clinical setting. The algorithm uses three
linear functions which characterise each of the three genotypes AA,
AB and BB for a given genetic variation. The method generally
involves collating the intensity values for all of the replicas of
each probe, to calculate an average intensity value for each probe.
Optionally, the raw intensity values for each replica may be
amended to take account of background noise (to obtain a clean
intensity value) before the intensity values for each of the
replicas are collated.
[0307] In general, for a given genetic variation, analysis and
interpretation of a chip comprises the following steps:
(a) providing the intensity of detectable label at each replica for
each of at least four probes (probes 1, 2, 3 and 4) provided for
detection of the genetic variation (the raw intensity value),
wherein:
[0308] probe 1 detects (is capable of specifically
hybridising to) genetic variation A (e.g. a normal allele), and
probe 2 detects (is capable of specifically hybridising to) genetic
variation B (e.g. a mutant allele);
[0309] probe 3 detects (is
capable of specifically hybridising to) genetic variation A (e.g. a
normal allele) and probe 4 detects (is capable of specifically
hybridising to) genetic variation B (e.g. a mutant allele); and
[0310] probes 1 and 2 form a first probe pair and probes 3 and 4
form a second probe pair;
(b) optionally amending the raw intensity
value for each replica to take account of background noise, thus
obtaining a clean intensity value;
(c) collating the (optionally
clean) intensity values for each of the replicas of each probe and
determining an average intensity value for each probe;
(d) calculating ratios 1 and 2 wherein:
$$\text{Ratio 1} = \frac{\text{average intensity value for probe 1}}{\text{average intensity value for probe 1} + \text{average intensity value for probe 2}}$$

and

$$\text{Ratio 2} = \frac{\text{average intensity value for probe 3}}{\text{average intensity value for probe 3} + \text{average intensity value for probe 4}}$$
(e) inputting ratios 1 and 2 into each of three linear functions
which characterise each of the three possible genotypes, AA, AB and
BB, wherein:
Function 1 is the linear function that characterises
individuals with the genotype AA and consists of a linear
combination of ratios 1 and 2;
Function 2 is the linear function
that characterises individuals with the genotype AB and consists of
a linear combination of ratios 1 and 2;
Function 3 is the linear
function that characterises individuals with the genotype BB and
consists of a linear combination of ratios 1 and 2;
the linear
functions are formed by coefficients which accompany the variables
ratio 1 and 2;
(f) determining which of the three linear functions
has the highest value; and
(g) thereby determining the genotype of
the individual for the genetic variation.
[0311] Thus the linear function corresponding to the genotype of
that individual will have the highest absolute value.
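Steps (d) to (g) can be sketched as below. This is a minimal illustration, not the inventors' software: the coefficients in `FUNCTIONS` are hypothetical placeholders, and the constant term in each function is an assumption of the sketch (the description only mentions coefficients accompanying ratios 1 and 2). Following step (f), the genotype whose function gives the highest value is selected.

```python
def allele_ratio(avg_a, avg_b):
    """Share of a probe pair's signal contributed by the allele-A probe."""
    total = avg_a + avg_b
    if total <= 0:
        raise ValueError("probe pair has no signal")
    return avg_a / total


# Hypothetical coefficients: (constant, weight of ratio 1, weight of ratio 2).
# Real values would be obtained by training on controls of known genotype.
FUNCTIONS = {
    "AA": (-0.81, 0.9, 0.9),
    "AB": (-0.25, 0.5, 0.5),
    "BB": (-0.01, 0.1, 0.1),
}


def call_genotype(ratio_1, ratio_2, functions=FUNCTIONS):
    """Evaluate the three linear functions and return the best genotype."""
    scores = {
        genotype: c0 + c1 * ratio_1 + c2 * ratio_2
        for genotype, (c0, c1, c2) in functions.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical average intensities for probes 1 to 4 of one genetic variation.
ratio_1 = allele_ratio(900.0, 100.0)  # probes 1 and 2
ratio_2 = allele_ratio(850.0, 150.0)  # probes 3 and 4
genotype, scores = call_genotype(ratio_1, ratio_2)
print(genotype, scores)
```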
[0312] The inventors have found that the use of replicas and
averages calculated from replicas is important for reliable working
of the invention. Use of the functions speeds up analysis and
allows better discrimination.
[0313] Preferably the discrimination capacity between the three
genotypes is (approximately) 100%. If the discrimination is less
than 100% the probes are preferably redesigned.
[0314] The raw intensity value for each probe replica may be
determined according to the methods described above. Thus probe
sequences and replicas can be selected as described herein. In one
example, 4 probes are used per genetic variation and 6, 8 or 10
replicas are used per probe.
[0315] Typically, amending the raw intensity value to obtain the
clean intensity value for each probe replica comprises subtracting
background noise from the raw value. Background noise is typically
determined using appropriate controls as described herein.
[0316] Typically calculating the average intensity value comprises
eliminating extreme values or outliers. Thus, when the (optionally
clean) intensity values from each of the probe replicas are
collated, outlying values can be identified and excluded from
further consideration. In one embodiment outliers make up between
10% and 50%, for example, 15, 20, 25, 30, 35, 40 or 45% of the
values obtained. In one embodiment, 40% of values are eliminated.
In one embodiment, 4 probes are used with 6, 8 or 10 replicas per
probe and extreme values or outliers make up between 10% and 50% of
the values obtained.
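One possible way of eliminating extreme values before averaging is a symmetrically trimmed mean, sketched below. The description does not specify how outliers are identified, so discarding equal numbers of the lowest and highest values is an assumption of this sketch, as are the function name `trimmed_mean` and the example intensities.

```python
from statistics import mean


def trimmed_mean(values, discard_fraction=0.4):
    """Average the values left after discarding the most extreme ones.

    Half of the discarded values are taken from each end of the sorted
    list, e.g. 2 lowest and 2 highest of 10 replicas when 40% is discarded.
    """
    if not 0 <= discard_fraction < 1:
        raise ValueError("discard_fraction must be in [0, 1)")
    ordered = sorted(values)
    per_side = int(len(ordered) * discard_fraction / 2)
    kept = ordered[per_side:len(ordered) - per_side]
    return mean(kept)


# Ten hypothetical replica intensities for one probe, with two obvious outliers.
replicas = [980.0, 1010.0, 1005.0, 995.0, 2500.0,
            1000.0, 990.0, 400.0, 1015.0, 1002.0]
print(trimmed_mean(replicas))
```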
[0317] A number of suitable linear functions are known in the art.
These functions may be used in a linear discriminant analysis for
the purposes of the present invention.
[0318] In one aspect the invention thus relates to a computational
method or model (algorithm) for determining genotype with respect
to a given genetic variation using ratios 1 and 2 in the three
linear functions as defined above (steps e and f). The method can
thus in one embodiment produce an output of genotype (AA, AB or BB)
from an input of ratios 1 and 2. The method may also include
calculating one or both of ratios 1 and 2 (step d). In some
embodiments the method additionally comprises calculating an
average intensity value for each probe (step c) and/or calculating
a clean intensity value for each probe replica (step b). Thus the
input to the model may comprise one or more of the average
intensity values, clean replica intensity values or raw replica
intensity values. The method may additionally comprise determining
the raw intensity value for each probe replica (step a). The method
may comprise one or more of the above steps.
[0319] In order to carry out the above methods, the coefficients
for the linear functions must first be determined in a training
process using data from control individuals whose genotype for the
genetic variation is already known. Methods for training are known
in the art. Typically in such methods, input data (in this case,
typically ratios 1 and 2) is used for which the output (in the
present case, genotype) is already known. Coefficients are
substituted in the three linear equations at random and the output
is calculated. Based on that output, one or more coefficients are
altered and the input data is entered again to produce another
output. The process is continued until coefficients are obtained
which optimise the desired output. These optimised coefficients are
then used in the linear functions when the method is applied to
test data (where the output is as yet unknown).
[0320] In order to train the present model, ratios 1 and 2 are
obtained for n control individuals having genotype AA (for example,
homozygous wild type), n control individuals having genotype AB
(heterozygous) and n control individuals having genotype BB (for
example, homozygous mutant). The ratios may be obtained using the
methods described above. The ratios are inputted as above and the
coefficients altered in a discriminatory analysis until three
linear functions are obtained which maximise discrimination between
the AA, AB and BB groups. These coefficients are then used in the
three functions when the model is used on unknown test samples
(where the genotype is not predetermined).
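As a hedged illustration of how such coefficients might be derived, the sketch below uses the closed-form linear discriminant classification functions (pooled within-group covariance, equal prior probabilities) rather than the iterative adjustment of coefficients described above. The training ratios are invented for the example, and the constant term in each function is an assumption of the sketch.

```python
from math import fsum


def train_linear_functions(training):
    """Derive one linear function per genotype from control ratios.

    training: dict mapping genotype -> list of (ratio 1, ratio 2) pairs.
    Returns dict mapping genotype -> (constant, coef ratio 1, coef ratio 2).
    """
    means = {}
    sxx = sxy = syy = 0.0
    dof = 0
    for genotype, points in training.items():
        n = len(points)
        mx = fsum(p[0] for p in points) / n
        my = fsum(p[1] for p in points) / n
        means[genotype] = (mx, my)
        sxx += fsum((p[0] - mx) ** 2 for p in points)
        sxy += fsum((p[0] - mx) * (p[1] - my) for p in points)
        syy += fsum((p[1] - my) ** 2 for p in points)
        dof += n - 1
    # Pooled within-group covariance matrix and its 2x2 inverse.
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("pooled covariance matrix is singular")
    ixx, ixy, iyy = syy / det, -sxy / det, sxx / det
    functions = {}
    for genotype, (mx, my) in means.items():
        c1 = ixx * mx + ixy * my
        c2 = ixy * mx + iyy * my
        functions[genotype] = (-0.5 * (c1 * mx + c2 * my), c1, c2)
    return functions


def call_genotype(functions, ratio_1, ratio_2):
    """Return the genotype whose linear function has the highest value."""
    return max(
        functions,
        key=lambda g: functions[g][0] + functions[g][1] * ratio_1
        + functions[g][2] * ratio_2,
    )


# Hypothetical ratios for n = 5 control individuals per genotype.
training = {
    "AA": [(0.92, 0.88), (0.90, 0.91), (0.95, 0.93), (0.89, 0.90), (0.93, 0.86)],
    "AB": [(0.52, 0.48), (0.49, 0.53), (0.55, 0.50), (0.47, 0.51), (0.50, 0.46)],
    "BB": [(0.08, 0.12), (0.10, 0.07), (0.05, 0.09), (0.11, 0.10), (0.06, 0.13)],
}
functions = train_linear_functions(training)
print(call_genotype(functions, 0.51, 0.49))
```

Discrimination on the training set can then be checked by calling each control individual and comparing the result with its known genotype.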
[0321] Thus in one aspect the invention provides a method of
deriving linear functions for use in the present genotyping
methods. The method typically comprises carrying out the steps of
the genotyping methods as described, for n control individuals
having genotype AA (for example, homozygous wild type), n control
individuals having genotype AB (heterozygous) and n control
individuals having genotype BB (for example, homozygous mutant)
with respect to a genetic variation. The intensity values obtained
for each of the probe replicas are gathered as described and an
algorithm is applied.
[0322] As described for the genotyping methods, application of the
algorithm comprises calculating an average intensity value for each
probe and the algorithm uses three linear functions intended to
characterise each of the three possible genotypes, AA, AB and BB
for the given genetic variation. Coefficients are inserted in the
functions in a repetitive way until functions are derived which
maximise discrimination between the genotypes in a discriminatory
analysis. This provides the coefficients for use in the linear
functions when the method or algorithm is in operational use (i.e.
to determine the genotype of test individuals).
[0323] The algorithm or method which uses the three linear
functions for analysing the intensity data may be as described
above.
[0324] In some cases, the training method allows feedback
optimisation. Thus, as intensity values and ratios are obtained for
test individuals and these are genotyped, the intensity data, e.g.
the ratios, and genotype are inputted and coefficients recalculated
for the linear functions.
[0325] In one aspect the invention relates to a computational
method for training. The method can be used to derive linear
functions for use in the present genotyping methods by using ratios
1 and 2 obtained for each of n individuals having genotype AA, n
individuals having genotype AB and n individuals having genotype BB
with respect to a genetic variation. The ratios can be obtained by
the methods described above. The method typically comprises
applying the algorithm which uses the three linear functions
(Functions 1, 2 and 3) intended to characterise each of the three
possible genotypes AA, AB or BB for the genetic variation such
that:
Function 1 is the linear function that characterises individuals
with the genotype AA and consists of a linear combination of ratios
1 and 2; Function 2 is the linear function that characterises
individuals with the genotype AB and consists of a linear
combination of ratios 1 and 2; Function 3 is the linear function
that characterises individuals with the genotype BB and consists of
a linear combination of ratios 1 and 2; and the linear functions
are formed by coefficients which accompany the variables ratio 1
and 2; and deriving linear functions which maximise discrimination
between the three genotype groups AA, AB and BB in a discriminatory
analysis, so as to obtain the coefficients which can be used in the
linear functions when the algorithm is used in a test method (i.e.
is in operational use for determining genotype).
[0326] The algorithm or method which uses the three linear
functions for analysing the intensity data may be as described
above.
[0327] The computational training method may additionally involve
calculating ratios 1 and 2 from average intensity value provided
for each of the probes, and/or collating intensity values from
probe replicas to determine an average intensity value for each
probe and/or amending a raw intensity value for a probe replica to
take account of background noise thereby obtaining clean intensity
values for the replica.
[0328] In some aspects the computational method also allows a
feedback optimisation step as described.
[0329] Typically in training n is ≥3, for example, 3, 4, 5,
6, 7, 8, 9 or 10. In one aspect, n is ≥5. In some cases n
may be from 10 to 50 or more, for example, 15 to 40, or 25 to 35,
such as 20 or 30.
[0330] Probes and probe replicas for the training method are
selected as described herein. In one embodiment 4 probes are used
for each genetic variation, with 6, 8 or 10 replicas of each probe.
Once selected, the probes used in training are also used when the
model is in operational use (to determine unknown genotype). If the
probes are altered, typically the model must be retrained to
optimise discrimination with the new probes.
[0331] Preferably the coefficients are such that the discrimination
between the three genotype groups (both in training and in
operational use) is substantially 100%. If the discrimination is
not 100%, the probes are preferably redesigned.
[0332] As above, the model may also undergo feedback optimisation
when it is in operational use. In that case, the model is first
used to determine the genotype of an individual (AA, AB or BB). The
ratios 1 and 2 for that individual are then inputted into the model
and the coefficients in the linear functions altered as necessary
in order to optimise discrimination between the three genotype
groups. In this way, the additional data gathered as the model is
in use can be used to optimise the discrimination capacity of the
linear functions.
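Feedback optimisation of this kind might be sketched as follows. For brevity the sketch assumes a simplified model in which each linear function is derived from the centroid of its genotype group alone (equivalent to a discriminant analysis with an identity covariance matrix); the class name `FeedbackModel` and the starting ratios are hypothetical.

```python
class FeedbackModel:
    """Centroid-based linear functions that are refitted after each call."""

    def __init__(self, training):
        # Running sums (sum of ratio 1, sum of ratio 2, count) per genotype,
        # so that newly genotyped samples can be folded in cheaply.
        self.sums = {
            g: [sum(p[0] for p in pts), sum(p[1] for p in pts), len(pts)]
            for g, pts in training.items()
        }
        self._refit()

    def _refit(self):
        # Linear function per genotype: (constant, coef ratio 1, coef ratio 2).
        self.functions = {}
        for g, (sx, sy, n) in self.sums.items():
            mx, my = sx / n, sy / n
            self.functions[g] = (-0.5 * (mx * mx + my * my), mx, my)

    def call(self, ratio_1, ratio_2):
        return max(
            self.functions,
            key=lambda g: self.functions[g][0]
            + self.functions[g][1] * ratio_1
            + self.functions[g][2] * ratio_2,
        )

    def call_and_update(self, ratio_1, ratio_2):
        """Genotype a test sample, then recalculate the coefficients."""
        genotype = self.call(ratio_1, ratio_2)
        entry = self.sums[genotype]
        entry[0] += ratio_1
        entry[1] += ratio_2
        entry[2] += 1
        self._refit()
        return genotype


model = FeedbackModel({"AA": [(0.9, 0.9)], "AB": [(0.5, 0.5)], "BB": [(0.1, 0.1)]})
called = model.call_and_update(0.54, 0.50)
print(called, model.functions[called])
```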
[0333] There are a number of parameters which can be determined and
optimised in order to optimise performance and reliability of the
analytical model or method.
[0334] (i) In one aspect ratios 1 and 2 determined for an
individual fall within the range of ratios 1 and 2 used to train
the model (i.e. to optimise the three linear functions). If desired
this can thus provide a double test for the genotype of an
individual.
[0335] (ii) In one aspect the ratio of the average fluorescence
intensity of the 4n replicas (where "n" is the number of replicas
for each probe, e.g. 6, 8 or 10; for example, 40 replicas) to the
background noise is greater than 5.
[0336] (iii) In one aspect the variation between intensity values
(raw or clean) for replicas of the same probe is a minimum. For
example, the coefficient of variation between the intensity values
for the replicas of a given probe is preferably less than 0.25.
[0337] (iv) In one aspect the ratio of the sum of the raw intensity
values for all probe replicas on a chip to the intensity of the
background noise is greater than 15 when a fluorescence scanner is
used.
[0338] (v) In one aspect the raw signal intensity value obtained
for the negative controls is ≤3 times the
intensity value of the background noise. For example, negative
controls may include the DMSO "blank" and the non-hybridising
oligonucleotides referred to above. The background noise is the
signal derived from the regions of the array where no probe has
been spotted and may be determined as above.
[0339] Preferably any one or more of (i) to (v) applies when
intensity is fluorescence intensity of a fluorescent label, in
particular where the intensity is determined by means of a confocal
fluorescent scanner.
[0340] Ensuring that the model meets one or more of the above helps
to provide reliability and reproducibility. Any one or more of (i)
to (v) may be true for the model. Preferably the model meets (i)
above. In one example, (i), (ii) and (iii) are true. In another
example, (iii), (iv), (v) are true. Preferably, all of the above
are true for the model. This applies both to training and to
operational use.
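Criteria (i) to (v) lend themselves to an automated check, sketched below under stated assumptions: the function name `quality_report`, the data layout and all example values are hypothetical, and criterion (iv) is evaluated only over the replicas passed to the function rather than over a whole chip.

```python
from statistics import mean, stdev


def quality_report(replicas, background, negative_controls, ratios, trained_ranges):
    """Evaluate criteria (i) to (v) for one genetic variation.

    replicas: dict mapping probe name -> list of replica intensities.
    background: intensity of the background noise.
    negative_controls: raw intensities of the negative controls.
    ratios: (ratio 1, ratio 2) obtained for the test sample.
    trained_ranges: (min, max) of ratio 1 and of ratio 2 seen in training.
    """
    all_values = [v for values in replicas.values() for v in values]
    return {
        "i_ratios_in_trained_range": all(
            low <= r <= high for r, (low, high) in zip(ratios, trained_ranges)
        ),
        "ii_mean_signal_to_background": mean(all_values) / background > 5,
        "iii_replica_cv": all(
            stdev(values) / mean(values) < 0.25 for values in replicas.values()
        ),
        "iv_total_signal_to_background": sum(all_values) / background > 15,
        "v_negative_controls": all(c <= 3 * background for c in negative_controls),
    }


# Hypothetical data: 4 probes with 6 replicas each.
replicas = {
    "probe1": [1000.0, 1040.0, 980.0, 1010.0, 990.0, 1020.0],
    "probe2": [110.0, 120.0, 100.0, 115.0, 105.0, 112.0],
    "probe3": [940.0, 960.0, 955.0, 930.0, 970.0, 945.0],
    "probe4": [150.0, 160.0, 140.0, 155.0, 145.0, 152.0],
}
report = quality_report(
    replicas,
    background=20.0,
    negative_controls=[25.0, 40.0, 55.0],
    ratios=(0.90, 0.86),
    trained_ranges=[(0.85, 0.96), (0.84, 0.95)],
)
print(report)
```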
[0341] As above, the experimentally derived ratios obtained for a
test sample may be compared to the ratios previously obtained for
the (n) control samples obtained from individuals of known
genotype, where n is as above, usually >5, or >10, or >20.
The reference ratios derived from analysis of the control samples
permit a genotype to be assigned to the test sample. This can
therefore be a double test.
[0342] In one instance the analytical method or algorithm of the
invention comprises a sequence of the following steps:
using 4 probes (2 pairs of probes) in replicate (6, 8 or 10
replicas), calculating the average intensity of each probe from the
collated intensities of the replicas; calculating ratios 1 and 2 as
above for the 2 pairs of probes (to detect the genetic variations A
and B); substituting ratios 1 and 2 obtained in three linear
equations which have been derived in a discriminatory analysis
using ratios 1 and 2 calculated for "n" control patients with
genotype AA, "n" control patients with genotype AB and "n" control
patients with genotype BB (with respect to the genetic variation)
(in one experiment "n" is 5); and determining the genotype of a
patient for the genetic variation (for each genetic variation
included in the DNA-chip) based on which linear function has the
greatest absolute value. The test ratios may also be compared to
the ratios of the "n" control patients to determine each
genotype.
[0343] The analysis and interpretation above has been described
with respect to one genetic variation. However, it is to be
understood that the present chip generally includes probes for
detection of multiple genetic variations which can be analysed at
the same time. Thus the present methods include analysis of
multiple genetic variations, as described herein, in parallel.
[0344] In a further aspect the invention relates to a computer
system comprising a processor and means for controlling the
processor to carry out a computational method of the invention.
[0345] The invention additionally relates to a computer program
comprising computer program code which when run on a computer or
computer network causes the computer or computer network to carry
out a computational method of the invention. The computer program
may be stored on a computer readable medium.
[0346] In addition to the probes and chips described herein, the
inventors have also designed and validated oligonucleotide primers
which are capable of amplifying, e.g. by means of multiplex PCR,
target DNA regions containing the human genetic variations
associated with PCa in Table 1A. These primers are useful in
preparing nucleic acid for use in the present genotyping,
prognostic and therapeutic methods.
[0347] Tables 3A & 3B list pairs of primers which amplify
target DNA regions containing the PCa associated genetic variations
in Table 1A (SEQ ID NOS 361-538) along with the corresponding
genetic variation.
[0348] The listed oligonucleotide primers have the advantage of
allowing specific amplification of the said target DNA regions in a
very low number of PCR reactions. The listed primers allow, in a
minimum number of multiplex PCR reactions, amplification of all the
fragments necessary for genotyping the genetic variations in Table
1A, and which may be analyzed on Proscan.
[0349] In a further aspect, the present invention relates to each
of the PCR primers listed in Tables 3A & 3B, and in particular
to each of the listed pairs of PCR primers and their use in PCR
amplification, e.g. in a multiplex PCR reaction, of a target DNA
region containing the corresponding genetic variation. The
invention in one aspect provides any one of these primers or pairs
of primers for use in medicine, in particular for use in the
present genotyping, prognostic or therapeutic methods.
[0350] The invention further relates to a PCR amplification kit
comprising at least one pair of listed PCR primers. The kit may
additionally include, for example, a (thermostable) polymerase,
dNTPs, a suitable buffer, additional primers, and/or instructions
for use, e.g. to amplify a target DNA region containing the
corresponding genetic variation. The kit may be used for
amplification of target DNA regions from nucleic acid samples, for
use in the present methods.
[0351] In another aspect the present invention relates to a
genotyping or diagnostic (preferably in vitro) kit for genotyping
PCa associated genetic variations and/or for diagnosing PCa or
susceptibility to PCa. The kit comprises a DNA-chip or array
according to the invention. The kit may additionally comprise
instructions for use of the chip in a genotyping method of the
invention, for example instructions for use in the present
analytical method or algorithm. Further components of a kit may
include: [0352] computer software, a computer program or a computer
system according to the invention; [0353] one or more PCR primers
or pairs of PCR primers according to the invention; and/or [0354] a
PCR amplification kit according to the invention.
[0355] The probes for the chip or PCR primers may be selected as
above depending on the genetic variations to be detected or the
diagnostic purpose of the kit.
[0356] The kit may contain one or more positive and/or negative
controls of the hybridisation reaction.
[0357] The invention further relates to the use of the kit in a
genotyping, prognostic or therapeutic method of the invention.
[0358] As described herein, the present genotyping methods are
useful for diagnosing PCa or susceptibility to PCa in a subject.
The genotyping results obtained in the methods may be used to
determine prognosis and may be useful in determining the
appropriate treatment for PCa (e.g. by predicting response to
therapy).
[0359] PCa presents a number of phenotypes, most notably benign vs
malignant and localised vs metastatic cancer. In some cases, PCa
may be biologically aggressive and likely to progress to metastatic
cancer. There may be a predisposition to suffer therapy related
osteoporosis, or androgen-deprivation therapy resistance. Patients
may also differ in their response to radiation therapy.
[0360] Little is currently known about what makes some PCa
biologically aggressive and more likely to progress to metastatic
and potentially lethal disease. Identifying genetic variations in
some key genes involved in PCa aggressiveness would be extremely
valuable for predicting PCa progression and for determining
specific treatment options in men diagnosed with the disease.
[0361] Some groups of men receiving androgen-deprivation therapy
for PCa, an increasingly common treatment, show an increased risk
of bone fracture while others never do. Identifying patients with
genetic predisposition to suffer from therapy-related osteoporosis
could be useful to select individuals for preventive
anti-osteoporotic treatment with bisphosphonates.
[0362] Genetic variations in hormone metabolism-related genes like
androgen receptor (AR) in men have been associated with
androgen-deprivation therapy resistance. Detecting germline
variations in those genes could be useful to identify patients who
would benefit from alternative therapeutic actions such as surgery,
radiation and chemotherapy.
[0363] External-beam radiotherapy appears to be as effective as
surgery in curing prostate cancer. However, a group of patients may
experience severe late sequelae, specifically proctitis or
cystitis, after high-dose external-beam conformal radiation
therapy. Detecting DNA variations associated with individual response
to radiation could help identify prospectively those patients and,
with dose de-escalation, spare them a great deal of discomfort and
suffering.
[0364] Particular genetic variations associated with PCa may be
predictive of particular phenotypes or development of particular
phenotypes and hence disease progression. In other words, it may be
that there is a statistically significant association between e.g.
the mutant allele B, of a given genetic variation and the
occurrence/development of a particular phenotype.
[0365] Since the present genotyping methods allow reliable
genotyping of multiple genetic variations in a clinical setting,
these can be used to genotype individuals of known PCa phenotype,
and to thus identify genetic variations predictive of particular
PCa phenotypes.
[0366] In one aspect the invention therefore relates to a method of
identifying genetic variations predictive of a particular PCa
phenotype, such as the phenotypes listed above. The method involves
genotyping a plurality of individuals with respect to one or more
genetic variations associated with PCa using a method of the
invention. In such a retrospective study typically 300-1000
individuals are genotyped, for example 400, 500 or 600 individuals
may be genotyped. The phenotype of each individual is already known
based on standard clinical procedures (e.g. biopsy).
[0367] Once the genotypes are obtained, this data is compared with
the phenotype data and statistically significant associations
between particular genotypes and particular phenotypes are
identified. Methods for determining statistical significance are
known in the art.
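One standard way of testing such an association is Pearson's chi-square test on a table of genotype against phenotype, sketched below. The choice of test, the function name `chi_square_3x2` and the counts are assumptions made for illustration; they are not results from the description.

```python
from math import exp


def chi_square_3x2(table):
    """Pearson chi-square test for three genotypes against a binary phenotype.

    table: dict mapping genotype -> (count with phenotype, count without).
    Returns (chi-square statistic, p-value).
    """
    rows = list(table.values())
    if len(rows) != 3:
        raise ValueError("expected exactly three genotype groups")
    col_totals = [sum(row[j] for row in rows) for j in range(2)]
    total = sum(col_totals)
    chi2 = 0.0
    for row in rows:
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / total
            if expected == 0:
                raise ValueError("a row or column of the table is empty")
            chi2 += (row[j] - expected) ** 2 / expected
    # A 3x2 table has 2 degrees of freedom, for which the chi-square
    # survival function has the closed form exp(-chi2 / 2).
    return chi2, exp(-chi2 / 2)


# Hypothetical counts for 500 genotyped individuals of known phenotype.
counts = {"AA": (30, 170), "AB": (60, 140), "BB": (50, 50)}
chi2, p_value = chi_square_3x2(counts)
print(round(chi2, 2), p_value)
```

In a study covering many genetic variations, the significance threshold would normally be adjusted for multiple testing.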
[0368] The genetic variations identified as predictive of
particular phenotypes/disease course can then be used to diagnose
these phenotypes/disease courses in test individuals, by genotyping
the individuals with respect to the predictive genetic
variation(s). Thus it is possible to determine the likely course of
disease progression in the individual. Genotyping can be done by
any appropriate method, depending on the number of variations to be
tested. For example, a genotyping method of the invention may be
used. Alternatively, sequence based or other chip-based methods may
be appropriate.
[0369] Thus in one aspect the invention further relates to a method
of diagnosing PCa phenotype or predicting the likely course of
disease progression in an individual by determining the genotype of
the individual with respect to one or more genetic variations which
have been identified as predictive (of the particular PCa phenotype
or disease course) by the methods described herein.
[0370] Once the prediction has been made, it will then be possible
to select the most suitable therapeutic approach, e.g. to determine
the need for surgical intervention.
[0371] The present arrays and methods thus provide a means for
clinicians to predict the likely course of disease progression in
PCa patients and also aid in the selection of the most suitable
treatment regime. They are therefore useful prognostic tools.
Genotype information obtained according to the present invention
may aid in clinical decision making or diagnosis in cases where
symptoms (disease phenotype) are ambiguous. Genetic information
provided by Proscan or other methods could also help in determining
the likelihood of disease development in asymptomatic individuals
(e.g. immediate family members of PCa sufferers) allowing, for
example, guidance on lifestyle and diet to be provided and
indicating the need for continued monitoring of individuals who
have a genetic constitution that indicates possible susceptibility
to disease development.
[0372] In one aspect the invention therefore relates to a method of
diagnosing PCa or susceptibility to PCa in an individual, or
determining the likely course of disease progression in an
individual as above. Preferably the method is in vitro. The
invention further relates to a method of selecting a treatment, for
an individual having PCa, in some cases where the individual has
been diagnosed or tested according to the methods of the invention.
Still further the invention in some aspects relates to methods of
treating an individual suffering from PCa, wherein, after the
treatment is selected, the treatment is administered to the
individual.
[0373] The diagnostic, predictive and therapeutic methods may
comprise carrying out a genotyping method of the invention as
described herein. Any of the methods may involve carrying out a
training method of the invention as described herein in order to
derive linear functions for use in determining genotype. Further
the methods may comprise the use of a chip, computer system,
computer program, oligonucleotide probes or pair or set of probes,
oligonucleotide primer or pair of primers, PCR amplification kit or
diagnostic kit of the invention as described herein.
[0374] Apart from the contribution to the diagnosis and treatment
of PCa and the development of new therapeutic strategies for this
disease, the present invention is useful for elucidating the
molecular origin of PCa and the biology of other cancers.
[0375] In one aspect the present invention relates to a microarray
adapted for use in the present methods as described herein.
[0376] The invention further relates to the use of one or more
oligonucleotide probe(s) and/or one or more primer(s) or primer
pair(s) of the invention in a method for prognosing PCa, such as a
method described herein.
[0377] Further aspects of the invention will now be illustrated
with reference to the accompanying Figures and experimental
exemplification, by way of example and not limitation. Further
aspects and embodiments will be apparent to those of ordinary skill
in the art. All documents mentioned in this specification are
hereby incorporated herein by reference.
EXAMPLES
[0378] Although in general, the techniques mentioned herein are
well known in the art, reference may be made in particular to
Sambrook et al, 1989, Molecular Cloning: a laboratory manual.
Example 1
Detection of PCa Associated Human Genetic Variations Using a
DNA-Chip According to the Invention (Proscan)
1.1 Design of the DNA-Chip for Genotyping PCa Associated Genetic
Variations
[0379] A DNA-chip to detect human genetic variations associated
with PCa which permits simultaneous, sensitive, specific and
reproducible detection was designed and manufactured. The said
genetic variations are related to a greater or lesser risk of
suffering from PCa, a better or worse response to treatment and
also a better or worse prognosis of the disease. Illustrative
examples of human genetic variations associated with antigens
connected to PCa which can be determined using this DNA-chip are
shown in Table 1A (FIG. 1).
[0380] The DNA-chip designed and manufactured consists of a support
(glass slide) bearing on its surface a plurality of probes that
permit the detection of the genetic variations previously mentioned.
These probes are capable of hybridizing with the amplified
sequences of the genes related to PCa. The DNA sequence of each
of the probes used is given in Table 2 (FIG. 2) [in general, the
name of the gene and the mutation is indicated (change of
nucleotide, "ins": insertion, "del": deletion, or change of amino
acid)]. The listed probes have all been technically validated.
1.2 Production of the DNA-Chip
Printing and Processing of the Glass Slides
[0381] The probes capable of detecting the genetic variations
previously identified are printed onto aminosilane coated supports
(glass slides) using DMSO as a solvent. The printing is carried out
using a spotter or printer of oligonucleotides (probes) while
controlling the temperature and relative humidity.
[0382] The joining of the probes to the support (glass slides) is
carried out by means of crosslinking with ultraviolet radiation and
heating as described in the documentation provided by the
manufacturer (for example, Corning Lifesciences, as found at
corning.com). The relative humidity during the deposition process
is maintained between 40% and 50% and the temperature at around
20 °C.
1.3 Validation of the Clinical Usefulness of the DNA-Chip
1.3.1 Preparation of the Sample to be Hybridized
[0383] The DNA of the individual is extracted from a blood sample
by a standard filtration protocol (for example, using commercial
kits from Macherey Nagel, Qiagen, etc.).
[0384] Target DNA regions containing the genetic variations of
interest are amplified by multiplex PCR using appropriate pairs of
oligonucleotide primers. Any suitable pair of oligonucleotides can
be used which allow specific amplification of genetic fragments
where a genetic variation to be detected might exist.
Advantageously, those pairs of oligonucleotide primers which permit
the said amplifications to be performed in the least possible
number of PCR reactions are used.
[0385] The oligonucleotide primers used to PCR amplify target
regions containing the genetic variations in Table 1A are listed in
Tables 3A and 3B (FIG. 3) (SEQ ID NOS: 361-538). These primers
represent an additional aspect of the invention.
[0386] The PCR multiplex reactions are carried out simultaneously
under the same conditions of time and temperature which permit
specific amplification of the gene fragments in which the genetic
variations to be detected might exist. Once the PCR multiplex has
finished, agarose gel analysis is used to check that the
amplification reaction has taken place.
[0387] Next, the sample to be hybridized (products of
amplification) is subjected to fragmentation with a DNase and the
resulting fragmentation products subjected to indirect labelling. A
terminal transferase adds a nucleotide, covalently joined to one
member of a pair of molecules that specifically bind to one another
(e.g. biotin allowing subsequent specific binding to streptavidin)
to the ends of these small DNA fragments.
[0388] Before applying the sample to the DNA-chip, the sample is
denatured by heating to 95 °C for 5 minutes and then the
"ChipMap Kit Hybridization Buffer" (Ventana Medical Systems) is
added.
1.3.2 Hybridization
[0389] Hybridization is carried out automatically in a
hybridisation station such as the Ventana Discovery (Ventana
Medical Systems) that has been specifically developed for such a
use. Alternatively hybridisation can be performed manually.
[0390] The prehybridization and blocking of the slides are carried
out with BSA. Next, the hybridization solution (ChipMap Kit
Hybridization Buffer, Ventana Medical Systems) is applied to the
surface of the DNA-chip, which is maintained at 45 °C for 1
hour following the protocol of Ventana 9.0 Europe (Ventana Medical
Systems). Finally the slides are washed with different cleaning
solutions (ChipMap hybridisation Kit Buffers, Ventana Medical
Systems). Once the process of hybridization has finished, the final
cleaning and drying of the slides begins.
[0391] When hybridization has taken place, the DNA chip is
developed by incubation with a fluorescently labelled molecule that
is able to specifically bind to the molecule incorporated into the
amplification product by terminal transferase (e.g. in the case of
biotin incorporation a fluorophore coupled to streptavidin such as
streptavidin-Cy3 can be used) to label the probe positions where
hybridization has occurred.
1.3.3. Scanning the Slides
[0392] The slides are placed in a fluorescent confocal scanner, for
example Axon 4100.sup.a, and the signal emitted by the fluorophore
is scanned when stimulated by the laser.
1.3.4 Quantification of the Image
[0393] The scanner's own software allows quantification of the
image obtained from the signal at the points where hybridization
has taken place.
1.3.5 Interpretation of the Results
[0394] From the signal obtained with the probes which detect the
different genetic variations, the genotype of the individual is
established. In the first instance the scanner software executes a
function to subtract the local background noise from the absolute
signal intensity value obtained for each probe. Next, the
replicates for each of the 4 probes that are used to characterize
each genetic variation are grouped. The average intensity value for
each of 4 probes is calculated using the average collated from the
replicates in order to identify abnormal values (outliers) that can
be excluded from further consideration. Once the average intensity
value for each of the probes is known then two ratios are
calculated (ratio 1 and ratio 2):
\[
\text{Ratio 1} = \frac{\text{Average intensity for probe 1}}{\text{Average intensity for probe 1} + \text{Average intensity for probe 2}}
\]
\[
\text{Ratio 2} = \frac{\text{Average intensity for probe 3}}{\text{Average intensity for probe 3} + \text{Average intensity for probe 4}}
\]
wherein probe 1 detects (is capable of specifically hybridising to)
genetic variation A (e.g. a normal allele), probe 2 detects (is
capable of specifically hybridising to) genetic variation B (e.g. a
mutant allele), probe 3 detects (is capable of specifically
hybridising to) genetic variation A (e.g. a normal allele) and
probe 4 detects (is capable of specifically hybridising to) genetic
variation B (e.g. a mutant allele).
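By way of illustration only, the ratio calculation described above can be sketched as follows. The function names, the assumption that the inputs are already background-subtracted replicate intensities, and the 2-standard-deviation outlier rule are hypothetical choices made for this sketch; the text does not state which outlier criterion is applied.

```python
from statistics import mean, stdev


def average_without_outliers(values, max_sd=2.0):
    """Average replicate intensities after dropping apparent outliers.

    Assumption for this sketch: a replicate is an outlier when it lies
    more than max_sd sample standard deviations from the replicate mean.
    """
    if len(values) < 3:
        return mean(values)
    m, s = mean(values), stdev(values)
    if s == 0:
        return m
    kept = [v for v in values if abs(v - m) <= max_sd * s]
    return mean(kept)


def genotype_ratios(probe1, probe2, probe3, probe4):
    """Return (ratio 1, ratio 2) from four lists of replicate intensities.

    Probes 1 and 3 detect variation A; probes 2 and 4 detect variation B.
    """
    a1, a2, a3, a4 = (
        average_without_outliers(p) for p in (probe1, probe2, probe3, probe4)
    )
    ratio1 = a1 / (a1 + a2)
    ratio2 = a3 / (a3 + a4)
    return ratio1, ratio2
```

With ten replicates per probe, each argument would be a list of ten intensities; ratios near 1 point towards variation A and ratios near 0 towards variation B.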
[0395] These ratios are substituted in three linear functions which
characterize each one of the three possible genotypes:
TABLE-US-00001
AA | Function 1
AB | Function 2
BB | Function 3
[0396] The function which presents the highest absolute value
determines the genotype of the patient. In this case, the linear
functions are obtained by analyzing 5 subjects for each of the
three possible genotypes of the genetic variation (AA, AB, BB).
With the results, ratios 1 and 2 are calculated for the 15
subjects. These ratios are classification variables for the three
groups to create the linear functions, with which the
discriminatory capacity of the two pairs of designed probes is
evaluated. If the discriminatory capacity is not 100%, the probes
are redesigned. New subjects characterized for each of the three
genotypes make up new ratios 1 and 2 to perfect the linear
functions and in short, to improve the discriminatory capacity of
the algorithm based on these three functions.
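The training and genotype-calling steps can be sketched as below. This is a minimal illustration that assumes the three linear functions are Fisher-type linear classification functions built from ratios 1 and 2 with a pooled within-group covariance and equal priors; the text does not specify how the functions are derived, and it is also an assumption that "highest absolute value" means the largest function value. All function names are hypothetical.

```python
def fit_linear_functions(training):
    """Fit one linear function per genotype.

    training maps a genotype label ("AA", "AB", "BB") to a list of
    (ratio1, ratio2) pairs from subjects of known genotype.
    Returns genotype -> (constant, coefficient1, coefficient2).
    """
    n_total = sum(len(points) for points in training.values())
    means = {
        g: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for g, pts in training.items()
    }
    # Pooled within-group covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for g, pts in training.items():
        mx, my = means[g]
        for x, y in pts:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    dof = n_total - len(training)
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("singular covariance; more varied training data needed")
    # Inverse of the 2x2 covariance matrix.
    ixx, ixy, iyy = syy / det, -sxy / det, sxx / det
    functions = {}
    for g, (mx, my) in means.items():
        c1 = ixx * mx + ixy * my
        c2 = ixy * mx + iyy * my
        constant = -0.5 * (c1 * mx + c2 * my)
        functions[g] = (constant, c1, c2)
    return functions


def call_genotype(functions, ratio1, ratio2):
    """Return the genotype whose linear function gives the largest value."""
    scores = {
        g: c0 + c1 * ratio1 + c2 * ratio2 for g, (c0, c1, c2) in functions.items()
    }
    return max(scores, key=scores.get)
```

In this sketch, adding newly characterized subjects to the training dictionary and refitting corresponds to the refinement of the functions described above.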
[0397] When using a confocal fluorescent scanner, to obtain
reliable results it is preferable that ratios 1 and 2 are within
the range of the ratios used to build the groups, the average
fluorescence intensity of the 4n (for example 40) replicates with
regard to background noise is greater than 5 and the coefficient of
variation of all of the DNA-chip replicates is below 0.25.
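The three reliability criteria in the preceding paragraph might be checked per genetic variation as in the following sketch. The interpretation is an assumption: the signal criterion is taken as the mean of all 4n replicate intensities divided by the background noise, and the coefficient of variation is evaluated separately for the replicates of each probe. The function and parameter names are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def variation_passes_qc(ratio1, ratio2, range1, range2,
                        probe_replicates, background):
    """Apply the three preferred criteria to one genetic variation.

    range1, range2: (min, max) of ratios 1 and 2 seen in the training groups.
    probe_replicates: four lists of replicate intensities (one per probe).
    background: background noise level for the same scan.
    """
    in_range = (range1[0] <= ratio1 <= range1[1]
                and range2[0] <= ratio2 <= range2[1])
    all_values = [v for reps in probe_replicates for v in reps]
    signal_ok = mean(all_values) / background > 5
    cv_ok = all(coefficient_of_variation(reps) < 0.25
                for reps in probe_replicates)
    return in_range and signal_ok and cv_ok
```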
[0398] Again when a fluorescent confocal scanner is used in the
experiment, for a complete hybridization to be considered reliable
preferably the ratio of probe fluorescence intensity to background
noise of all the DNA-chip probes is above 15. Likewise, the average
of all the ratios is preferably above 0.6 and the negative control
is preferably less than or equal to 3 times the background
noise.
[0399] To sum up, in this case 4 probes (repeated 10 times) are
presented on the slide for detection of each mutation. Two of the
probes detect one genetic variation (A) and the other two the other
genetic variation (B). The examined base is located in the central
position of the probes.
[0400] A subject homozygous for the genetic variation A will not
show genetic variation B. Consequently, in the image obtained from
the glass support the probes which detect genetic variation B will
show a hybridization signal significantly lower than that shown by
the probes which detect variation A, and vice versa. In this case
ratios 1 and 2 will be close to 1 and the subject will be assigned
as homozygous AA by the software analysis.
[0401] On the other hand, a heterozygous subject for the determined
genetic variation shows both the genetic variations. Therefore, the
probes which detect them show an equivalent hybridization signal.
The ratios 1 and 2 will show 0.5 and the subject will be assigned
as heterozygous AB by the software analysis.
[0402] Oligonucleotide primers used for PCR amplifications are
listed in Tables 3A and 3B (FIG. 3). These correspond to SEQ ID
NOS: 361-538, with consecutive numbering of the forward/reverse
primer pairs for each of the SNPs. Thus, for example, the forward
primer for SNP1 (in BCL2) is SEQ ID NO: 361 and the reverse primer
for SNP1 (in BCL2) is SEQ ID NO: 362. Similarly the forward primer
for SNP2 (in CDKI1B) is SEQ ID NO: 363 and the reverse primer for
SNP2 (in CDKI1B) is SEQ ID NO: 364. For another example, the
forward primer for SNP6 (in HSD3B1) is SEQ ID NO: 371 and the
reverse primer for SNP6 (in HSD3B1) is SEQ ID NO: 372.
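The consecutive numbering of primer pairs described above can be expressed as a simple lookup. The sketch below assumes that the pattern given for SNP1, SNP2 and SNP6 holds for every SNP covered by SEQ ID NOS: 361-538; the function name is hypothetical.

```python
FIRST_PRIMER_SEQ_ID = 361
LAST_PRIMER_SEQ_ID = 538


def primer_seq_ids(snp_number):
    """Return (forward, reverse) SEQ ID NOs for the given SNP number."""
    forward = FIRST_PRIMER_SEQ_ID + 2 * (snp_number - 1)
    reverse = forward + 1
    if snp_number < 1 or reverse > LAST_PRIMER_SEQ_ID:
        raise ValueError("SNP number outside the range of listed primer pairs")
    return forward, reverse
```

For example, `primer_seq_ids(6)` gives the pair stated in the text for SNP6 (in HSD3B1).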
Example 2
Establishing Models for Predicting PCa Phenotypes
Methods
Study Design
[0403] Records of 840 patients with Prostate Cancer (PCa)
undergoing radical prostatectomy from the Department of Urology of
Miguel Servet University Hospital, Zaragoza (Spain) were entered
into the PCa HUMS (HUMS: Hospital Universitario Miguel Servet)
database between 1986 and 2002. The database was screened for
patients with clinically localized PCa with at least 5-years of
follow-up after surgery. Individuals matching those criteria
(n=375) were invited to participate in the study. A total of 269
patients out of 375 (72%) agreed to donate blood and clinical
data for the study and signed informed consent. This group was
composed of 85 men who developed PSA recurrence (post-prostatectomy
PSA level of >0.2 ng/ml) during the first 5 years after surgery
and 182 who did not experience PSA recurrence during this period.
When recurrence was defined as a post-prostatectomy PSA level of
>0.4 ng/ml, group composition was 74 and 193 respectively.
Patients receiving adjuvant radiation (n=30) were excluded. The
remaining 239 patients were used for the study.
[0404] The study was in accordance with the Helsinki Declaration
(World Medical Association) and the EMEA (European Medicines
Agency) recommendations.
[0405] Baseline clinical and analytical variables were recorded
from every patient before and after surgery. Preoperative clinical
variables were: onset age, PSA levels, biopsy Gleason grade and
clinical stage. Postoperative clinical variables were:
prostatectomy Gleason grade, pathological stage, lymph node
involvement and surgical margin status.
Studied Phenotype
[0406] For localized PCa patients treated with radical
prostatectomy, an increased PSA level usually indicates cancer
progression or recurrence. Those who experience early PSA
recurrence are more prone to develop metastatic lesions and poor
prognosis. In our study the presence or absence of biochemical
progression (increase in PSA level) after 5 years of follow-up was
considered the main outcome. Controversy exists regarding whether
PSA levels >0.2 ng/ml or >0.4 ng/ml should be set as the threshold
for defining "biochemical tumor recurrence". The precision and
sensitivity of PSA assays seem to have improved in recent years.
However, most studies have concluded that a threshold for
biochemical recurrence of 0.4 ng/ml is more reliable. In order to
explore all possibilities four different analyses were performed:
1) prediction of PSA progression (defined as PSA>0.2 ng/ml) within
5 years of surgery using pre-operative clinical variables and
SNPs; 2) prediction of PSA progression (defined as PSA>0.4 ng/ml)
within 5 years of surgery using pre-operative clinical variables
and SNPs; 3) prediction of PSA progression (defined as PSA>0.2
ng/ml) within 5 years of surgery using pre-operative and
post-operative clinical variables and SNPs; 4) prediction of PSA
progression (defined as PSA>0.4 ng/ml) within 5 years of surgery
using pre-operative and post-operative clinical variables and
SNPs.
Genotyping and Single Nucleotide Polymorphism (SNP) Selection
[0407] Peripheral blood (2 ml) was obtained from each patient and
placed in an EDTA-treated tube. DNA was extracted with the QIAamp
DNA Blood MiniKit (Qiagen) following the manufacturer's
specifications. Genotyping was carried out using PROSCAN DNA
microarray. A total of 83 SNPs belonging to genes involved in
androgen metabolism and signalling, carcinogen metabolism, cell
proliferation, DNA-repair and growth factor signalling pathways,
among others, were genotyped for each patient.
[0408] The SNP selection was based on previously published data and
our own research expertise. The SNP selection in those genes was
based on a minor allele frequency of 0.1. Only "TagSNPs"
(R²>0.8) were taken into account as this gave more statistical
power by reducing the degrees of freedom (df) of our tests.
TABLE-US-00002
Forward TAG | TAG FW | SEQ ID NO 542 | TAATACGACTCACTATAGGGAGA
Reverse TAGs | TAG RW | SEQ ID NO 543 | AATTAACCCTCACTAAAGGGAGA
[0409] Of all the SNPs genotyped, only those showing the highest
association with the studied phenotype were included in the
stepwise logistic regression analysis to limit the overall
false-positive rate. First of all, chi-squared (χ²) tests were
performed in order to test the conformity with Hardy-Weinberg
expectations (HWE) of the genetic polymorphisms under analysis.
Tests of HWE
were carried out for all loci among all the different phenotypes
described. Only SNPs that conformed to HWE in both separate groups
under analysis were included in the study. SNPs with extremely high
deviations from the predictions of HWE (p values lower than 0.01)
were excluded from the analysis since such deviations could
indicate problems such as genotyping errors.
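The Hardy-Weinberg conformity test described above can be illustrated with the following sketch. It is a generic one-degree-of-freedom chi-squared test written for this illustration, not the HelixTree implementation; the function name is hypothetical, and the p-value uses the identity that the upper tail of a chi-squared distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-squared test of Hardy-Weinberg expectations for one SNP.

    Takes observed genotype counts and returns (chi2, p_value), df = 1.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def conforms_to_hwe(n_aa, n_ab, n_bb, threshold=0.01):
    """True unless the deviation from HWE gives a p-value below threshold."""
    return hwe_chi_square(n_aa, n_ab, n_bb)[1] >= threshold
```

In keeping with the text, a SNP would be retained only if `conforms_to_hwe` holds in both patient groups under analysis.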
[0410] In addition, single locus association tests between SNP
allele frequency (allelic associations) and patient status were
carried out using the standard contingency χ² test, and
p-values were determined, including Bonferroni correction for
multiple testing. The SNPs with the smallest p-values were included
in the regression analysis. All the genetic analyses were carried
out using HelixTree® software (Golden Helix, Inc., Bozeman,
Mont., USA).
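A single-locus allelic association test with Bonferroni correction, as described above, can be sketched as follows. This is a generic Pearson chi-squared test on a 2x2 table of allele counts (no continuity correction), written for illustration and not taken from HelixTree; the function names are hypothetical.

```python
import math


def allelic_chi_square(group1_genotypes, group2_genotypes):
    """Chi-squared test of allele frequency between two patient groups.

    Each argument is a tuple of genotype counts (n_AA, n_AB, n_BB).
    Returns (chi2, p_value) with one degree of freedom.
    """
    a = 2 * group1_genotypes[0] + group1_genotypes[1]  # A alleles, group 1
    b = group1_genotypes[1] + 2 * group1_genotypes[2]  # B alleles, group 1
    c = 2 * group2_genotypes[0] + group2_genotypes[1]  # A alleles, group 2
    d = group2_genotypes[1] + 2 * group2_genotypes[2]  # B alleles, group 2
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0, 1.0  # an empty row or column: no test possible
    chi2 = n * (a * d - b * c) ** 2 / denominator
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)
```

With 83 SNPs genotyped per patient, `n_tests` would be 83 if every SNP were tested.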
Statistical Modelling
[0411] For each model, the statistical analysis was carried out
between the two patient groups (with or without 5-year PSA
recurrence) to attempt discrimination between both groups in each
of the four different analyses. Four different models were
evaluated, one for each of the four analyses: one model to
distinguish between patients with or without PSA progression
(defined as PSA>0.2 ng/ml) within 5 years of surgery using
pre-operative clinical variables and SNPs (model 1); one model to
distinguish between patients with or without PSA progression
(defined as PSA>0.4 ng/ml) within 5 years of surgery using
pre-operative clinical variables and SNPs (model 2); one model to
distinguish between patients with or without PSA progression
(defined as PSA>0.2 ng/ml) within 5 years of surgery using
pre-operative and post-operative clinical variables and SNPs
(model 3); and one model to distinguish between patients with or
without PSA progression (defined as PSA>0.4 ng/ml) within 5
years of surgery using pre-operative and post-operative clinical
variables and SNPs (model 4).
[0412] Statistical analyses were performed using the Statistical
Package for the Social Sciences (SPSS Inc. Headquarters, Chicago,
Ill., USA) version 14.0. Multiple genotype-phenotype associations
were analysed by means of multivariate logistic regression (Forward
LR) with clinically determined disease phenotypes as dependent
variables and the individual loci and clinical and analytical data
as independent variables. The goodness of fit of the models was
evaluated using Hosmer-Lemeshow statistics and their accuracy was
assessed by calculating the area under the curve (AUC) of the
Receiver Operating Characteristic curve (ROC) with 95% confidence
intervals. The explained variability of the models on the basis of
the SNPs was evaluated by means of the Nagelkerke R². To
measure the impact of the variables included in the models of the
analysed phenotypes, the sensitivity, specificity, and positive
likelihood ratio (LR+=sensitivity/(1-specificity)) were computed by
means of ROC curves.
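The performance measures named above can be illustrated with a short sketch. The counts and scores passed to these functions would come from a fitted model; the function names are hypothetical, and the AUC is computed here by the rank (pairwise comparison) definition rather than by whatever routine SPSS uses.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, LR+) from confusion-matrix counts.

    tp/fn: patients with recurrence predicted correctly/incorrectly.
    tn/fp: patients without recurrence predicted correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    if specificity == 1:
        lr_plus = float("inf")
    else:
        lr_plus = sensitivity / (1 - specificity)
    return sensitivity, specificity, lr_plus


def roc_auc(scores_recurrence, scores_no_recurrence):
    """Area under the ROC curve.

    Equals the probability that a randomly chosen patient with recurrence
    has a higher predicted probability than a randomly chosen patient
    without recurrence; ties count as one half.
    """
    wins = 0.0
    for p in scores_recurrence:
        for n in scores_no_recurrence:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_recurrence) * len(scores_no_recurrence))
```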
[0413] In order to verify the association of individual genetic
markers with PSA recurrence throughout the follow-up, a survival
analysis was
performed. Estimates of PSA recurrence (event) were calculated
using the Kaplan-Meier method and graphically displayed. Comparison
of PSA recurrence probability for patients carrying AA, AB and BB
genotypes for each SNP that entered in the models was performed
using Log Rank, Breslow and Tarone-Ware statistics.
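The Kaplan-Meier estimate referred to above can be sketched as follows. This is the standard product-limit estimator written for illustration; the function name and the data layout (one follow-up time and one event flag per patient) are assumptions. One curve would be computed per genotype (AA, AB, BB) for each SNP.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of recurrence-free survival.

    times: follow-up time for each patient.
    events: True if PSA recurrence was observed at that time, False if the
            patient was censored (no recurrence by end of follow-up).
    Returns a list of (time, proportion without recurrence) at each time
    where at least one recurrence occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        recurrences = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                recurrences += 1
            leaving += 1
            i += 1
        if recurrences:
            survival *= 1 - recurrences / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```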
[0414] The Kaplan-Meier figures show "time of follow-up" on the
x-axis and "proportion of patients without recurrence" (cumulative
survival) on the y-axis for the 3 alternative genotypes. Throughout
the follow-up the proportion of patients without recurrence
decreases, but for some SNPs the decrease is significantly steeper
for one of the genotypes.
[0415] For example, in FIG. 9 for SNP24, 3 years after surgery, 80%
of patients carrying genotype 0 or 1 do not present recurrence,
whereas only 25% of those carrying genotype 2 do not present
recurrence. To see whether those differences are significant, a
pairwise comparison can be made of the differences in
recurrence-free time between the 3 genotypes (Table 9). The
comparison is made by three different statistical tests (Log Rank,
Breslow and Tarone-Ware). For example, the Log Rank test shows that
significant differences are found between genotypes 0 and 2
(P=0.020) and that differences close to significance exist between
genotypes 1 and 2 (P=0.059). The Chi-square value is the test
statistic from which the P-value is calculated.
Results
[0416] To differentiate between the two phenotypes (recurrence of
prostate cancer within five years of surgery, or not), four models
were obtained: 1) model 1: two predictors (SNP46 [rs7799039]+PSA)
entered into the forward LR model; 2) model 2: five predictors
(SNP46 [rs7799039]+SNP24 [rs328]+PSA+Onset Age+Clinical Stage)
entered into the model; 3) model 3: seven predictors (SNP24
[rs328]+SNP31 [rs1048943]+SNP56 [rs4646903]+PSA+Prostatectomy
Gleason Grade+Surgical Oncologic Margins+Surgical Gland Margins)
entered into the model; 4) model 4: eight predictors (SNP24
[rs328]+SNP25 [rs1042522]+SNP31 [rs1048943]+SNP32
[rs12329760]+PSA+Prostatectomy Gleason Grade+Surgical Oncologic
Margins+Surgical Gland Margins) entered into the model. Information
regarding the variables (clinical and SNPs) remaining in each
function is shown in Tables 4 to 7 (FIGS. 4A to 7A). Regression
probability functions were built using the Statistical Package for
the Social Sciences (SPSS Inc. Headquarters, Chicago, Ill., USA)
version 14.0. In these tables, B is the coefficient associated with
each genotype in the probability function; ET is the standard error
of B; Wald is the Wald test statistic; GL is the degrees of
freedom; Sig. is the P value of B for the Wald test; and Exp (B)
is the odds ratio, reported here as the relative risk.
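For orientation, a logistic regression probability function of this kind is conventionally written in the form below. This is the standard textbook form, not a function reproduced from Tables 4 to 7; the symbols B_0 (constant term) and x_i (coded genotype or clinical value for predictor i) are notation assumed here.

```latex
\[
P(\text{PSA recurrence}) = \frac{1}{1 + e^{-\left(B_0 + \sum_{i} B_i x_i\right)}},
\qquad
\operatorname{Exp}(B_i) = e^{B_i}
\]
```

Under this form, Exp (B) for a predictor is the factor by which the odds of recurrence change for a one-unit increase in that predictor, the other predictors being held fixed.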
[0417] The contribution of genetic and clinical factors to studied
PCa phenotypes can be further demonstrated by the substantial
proportion of variance (Nagelkerke R²) explained by the
functions (13% for model 1; 26% for model 2; 33% for model 3 and
39% for model 4). Probability functions and ROC curves were
obtained for the analysed phenotype. ROC curves, sensitivity,
specificity and positive likelihood ratios (LR+) of all the models
are given in FIGS. 4 to 7.
[0418] Nagelkerke R² is a way of measuring the proportion of
variance explained by the functions. The area under the ROC curve
(ROC AUC) is a measure of test performance or "diagnostic
accuracy". The positive likelihood ratio (LR+) is calculated as
sensitivity/(1-specificity).
[0419] Survival analysis confirmed an impact of patients' genotypes
on PCa progression, as the Kaplan-Meier curves show (FIGS. 8-17
and 19-21). Of the SNPs that are in the models, SNPs 24, 25 and 46
were found to be significant using the Log Rank, Breslow or
Tarone-Ware test (p<0.05), whilst for the others a positive trend
was observed even if the difference did not reach significance.
Discussion
[0420] To date several nomograms have been developed to predict the
probability of biochemical progression after radical prostatectomy
for localized prostate cancer. Those nomograms are usually based on
classical clinical parameters such as PSA level, Gleason grade or
clinical stage. Those nomograms have failed to accurately predict
PSA recurrence. Consequently, nomograms including novel markers
that are particularly associated with PCa aggressiveness are
urgently needed. In our study we describe four new nomograms or
models. The new models add genetic markers (germline SNPs) to the
standard clinical predictors. Those SNPs were already associated
with PCa, and our study demonstrates their usefulness as predictors
of biologically aggressive PCa and progression, as the Likelihood
Ratio (LR) values of the models indicate. LR is an accurate and
practical way of expressing the power of diagnostic tests. Three of
the four models described herein present a significantly high LR
value (LR>5), thus evidencing the capacity of the methods (based
on SNP combinations plus clinical data) to predict the aggressive
PCa phenotype. The high ROC-AUCs obtained for some of these models
(>0.8) provide further evidence for the high discriminatory power
of the predictor combinations used. The usefulness of the ROC-AUC
magnitude as a tool for evaluating the strength of the relationship
between predictors and disease has been described previously.
[0421] Using these SNPs to obtain a genetic profile of the patient
provides an extra tool for the physician to differentiate patients
with high probability of early PSA progression after surgery.
Moreover, individual impact of germline genetic markers on cancer
behaviour was verified by the survival analysis that considered PSA
recurrence (0.4 ng/ml) as the final event.
[0422] In the case of model 1 we did not obtain a high LR, probably
because of the low reliability of the 0.2 ng/ml PSA threshold and
the low predictive capacity of pre-operative clinical data. Among
the last three models, Model 4 showed the best predictive power
(LR=8.8, R²=0.38, ROC-AUC=0.83, specificity=95% and
sensitivity=42%). This could be due to the robustness of the 0.4
ng/ml PSA recurrence threshold, the high predictive capacity of
post-operative clinical data and, mainly, the inclusion of up to
four novel genetic markers (SNPs). The new model could help to
identify patients undergoing surgery with a high risk of
biochemical recurrence who may benefit from neoadjuvant treatment
protocols.
TABLE-US-00003
Gene | Symbol | Reference
Alpha 5-reductase type II gene | SRD5A2 | J Urol 2003, 169(4, Suppl.): Abst 313.
 | | J Urol. 2003 169(6): 2378-81
 | | Prostate 2004 59(1): 69-76
 | | Eur J Hum Genet. 2004 12(4): 321-32
 | | Prostate 2002 52(4): 269-78.
 | | Pharmacogenetics. 2002 12(4): 307-12.
Androgen receptor | AR | Prostate. 2004 60(4): 343-51.
 | | Int J Radiat Oncol Biol Phys 2002, 54 Abst 232
 | | Hum Genet. 2002; 110(2): 122-9
 | | Lab Invest. 2003 83(12): 1709-13
 | | Lab Invest. 2002 82(11): 1591-8
 | | Cancer Res. 2004; 64(2): 765-71
 | | Cancer Res. 2003 Jan 1; 63(1): 149-53
 | | Int J Cancer. 2002 Jul 20; 100(3): 309-17
 | | Cancer Genet Cytogenet. 2003 Mar; 141(2): 91-6
 | | Int J Urol. 2002 9(10): 545-53
 | | Urol Int. 2002; 68(1): 16-23.
 | | J Urol. 2002 168(5): 2245-8.
 | | Clin Cancer Res. 2001 7(10): 3092-6.
Angiotensin-converting enzyme | ACE | J Pathol. 2004 202(3): 330-5
Ataxia telangiectasia mutated | ATM | Br J Cancer. 2004 91(4): 783-7
B-cell CLL/lymphoma 2, transcript variant 1 | BCL2 | 53rd Annu Meet Am Soc Hum Genet (Nov 4-Nov 8, Los Angeles) 2003, Abst 457
Beta-17-hydroxysteroid dehydrogenase | HSD17B3 | Prostate. 2002 Sep 15; 53(1): 65-8
Beta-3-hydroxysteroid dehydrogenase B1 | HSD3B1 | Cancer Res. 2002 Mar 15; 62(6): 1784-9.
Beta-3-hydroxysteroid dehydrogenases B2 | HSD3B2 | Cancer Res. 2002 Mar 15; 62(6): 1784-9.
Breast Cancer 2 | BRCA2 | Am J Hum Genet. 2003. 72(1): 1-12.
Cadherin 1, type 1, E-cadherin | CDH1 | J Urol 2002, 167(4, Suppl.): Abst 284
 | | Int J Cancer. 2004 109(3): 348-52
 | | Int J Cancer. 2002 100(6): 683-5
 | | Clin Cancer Res. 2001 7(11): 3465-71.
Collagen alpha1 type XVIII (contains endostatin) | COL18A1 | Proc Am Assoc Cancer Res 2003, 44: Abst 4054
 | | Cancer Res. 2001; 61(20): 7375-8.
Cyclin D1 parathyroid adenomatosis 1 | CD1 | Jpn J Cancer Res 2002, 93(Suppl.): Abst 3245
 | | J Urol 2003, 169(4, Suppl.): Abst 209
 | | Anticancer Res. 2003 23(6D): 4947-5
 | | Int J Cancer. 2003 103(1): 116-20.
Cyclin dependant Kinase 1A | CDKN1A | Cancer Res. 2003 63(9): 2033-6.
Cyclin dependant Kinase 1B | CDKN1B | Cancer Res. 2004. 64(6): 1997-9
 | | Cancer Res. 2003 63(9): 2033-6
Cyclin-dependent kinase inhibitor 1B (p27, Kip1) gene | CDKNI1B | Cancer Res 2004, 64(6): 1997
Cytochrome P450, family 1, subfamily A, polypeptide 1 | CYP1A1 | Cancer Genet Cytogenet. 2004 154(1): 81-5.
Cytochrome P450, family 1, subfamily B, polypeptide 1 | CYP1B1 | Br J Cancer 2003, 89(8): 1524
 | | Biochem Biophys Res C. 2002 296(4): 820-6.
 | | Anticancer Res. 2004 24(4): 2431-7
Cytochrome P450, family 19, intron 4 | CYP19 | Anticancer Res. 2003 23(6D): 4941-6
 | | Cancer. 2003 98(7): 1411-6
 | | Clin Cancer Res. 2001 7(10): 3092-6.
Cytochrome P450, family 2, subfamily C, polypeptide 19 | CYP2C19 | Cancer Biol Ther. 2002 1(6): 669-73
Cytochrome P450, family 3, subfamily A, polypeptide 4 | CYP3A4 | Br J Cancer 2003, 88(6)
 | | Eur J Hum Genet. 2004 12(4): 321-32
 | | Oncol Rep. 2002; 9(3): 653-5.
 | | J Urol 2003, 169(4, Suppl.): Abst 1442
 | | Cancer. 2003 Nov 1; 98(9): 1855-62.
 | | Cancer Res. 2004 Oct 15; 64(20): 7426-31
Cytochrome P450, family 7, subfamily B, polypeptide 1 | CYP7B1 | Pharmacogenomics J. 2004; 4(4): 245-50.
Cytochrome P450, subfamily 17 | CYP17 | J Urol 2003, 169(4, Suppl.): Abst 313
 | | Eur J Hum Genet. 2004 12(4): 321-32
 | | Cancer Epidemiol Biomarkers Prev. 2003. 12(2): 120-6,
 | | J Urol 2002, 167(4, Suppl.): Abst 583
CHK2 checkpoint homolog (isoform 1) (S. pombe) | CHEK2 | Am J Hum Genet 2003, 72(2): 270,
 | | Cancer Res 2004, 64(8): 2677
Early growth response-1 gene | EGR1 | Am J Clin Oncol. 2001 Oct; 24(5): 500-5.
ElaC homolog 2 | ELAC2 | Prostate. 2004 61(3): 248-52
 | | Cancer Epidemiol Biomarkers Prev. 2003. 12(9): 876-81
 | | Int J Cancer. 2003 Nov 1; 107(2): 224-8.
Endothelial nitric oxide synthase | NOS | Cancer Lett. 2003 Jan 10; 189(1): 85-90.
 | | Clin Cancer Res. 2002 8(11): 3433-7
Epoxide hydrolase 1, microsomal | EPHX | Isr Med Assoc J. 2003 Oct; 5(10): 741-5.
Estrogen receptor | ESR1 | Cancer Genet Cytogenet. 2003 Mar; 141(2): 91-6.
Estrogen receptor 1 | ESR1 | J Urol 2003, 169(4, Suppl.): Abst 316
 | | Cancer. 2003 98(7): 1411-6.
 | | Eur Urol. 2003 Oct; 44(4): 487-90.
 | | Mol Carcinog. 2003 37(4): 202-8.
 | | Clin Cancer Res. 2001 7(10): 3092-6.
Fibroblast growth factor receptor-4 | FGFR4 | Clin Cancer Res. 2004. 10: 6169-78
Glutathione S-transferase M3 | GSTM3 | Prostate. 2004 Mar 1; 58(4): 414-20
Glutathione-S-transferases | GST | Indian J Cancer. 2004. 41(3): 115-9.
Human oxoguanine glycosylase 1 | OGG1 | J Urol. 2003 170(6 Pt 1): 2471-4
 | | Cancer Res. 2002 Apr 15; 62(8): 2253-7.
Human sulfotransferase 1A1 | SULT1A1 | Cancer Epidemiol Biomarkers Prev. 2004 13(2): 270-6
Hypoxia inducible factor-1alpha (HIF-1alpha). | HIF1 | J Cancer Res Clin Oncol. 2002 128(7): 358-62
Hypoxia-inducible factor 1, alpha subunit, transcript variant 1 | HIF1 | J Urol 2002, 167(4, Suppl.): Abst 201
Insulin gene | INS | Br J Cancer. 2003 88(2): 263-9
Insulin-like growth factor-binding protein3 | IGFBP3 | Cancer Res. 2003 63(15): 4407-11
Kallikrein 10 | KLK10 | Prostate 2002, 51(1): 35
Kallikrein-2 | KLK2 | J Clin Oncol. 2003 21(12): 2312-9.
Leptin | LEP | Prostate. 2004 59(3): 268-74.
Lipoprotein lipase | LPL | Int J Cancer. 2004 112(5): 872-6.
Macrophage scavenger receptor 1 | MSR1 | Prostate. 2004 59(2): 132-40.
Macrophage-inhibitory cytokine-1 | MIC1 | J Natl Cancer Inst. 2004 96(16): 1248-54.
Methylenetetrahydrofolate reductase | MTHFR | Int J Oncol. 2004 25(5): 1465-71
N-acetyltransferase-2 | NAT2 | Int J Urol. 2003 Mar; 10(3): 167-73.
 | | Environ Mol Mutagen. 2002; 40(3): 161-7.
Osteocalcin | BGLAP | Eur Urol. 2003 Feb; 43(2): 197-200
P53 tumor suppressor | P53 | Urol Int. 2004; 73(1): 41-6
Paraoxonase | PON1 | J Natl Cancer Inst. 2003 95(11): 812-8.
Prostate specific antigen PSA gene | PSA | J Urol. 2004 171(4): 1529-32
 | | Prostate. 2002; 53(1): 88-94
RNAse L | RNASEL | Br J Cancer. 2003 89(4): 691-6.
 | | J Med Genet. 2003 40(3): e21.
TGFbeta1 | TGFB1 | Cancer Epidemiol Biomarkers Prev. 2004.13(5): 759-64
 | | Carcinogenesis. 2004.25(2): 237-40.
Toll-like receptor 4 | TLR4 | Cancer Res. 2004 64(8): 2918-22.
Transmembrane serine protease 2 | TMPRSS2 | Prostate. 2004 59(4): 357-9
UDP-glucuronosyltransferase | UGT2B15 | J Urol. 2004 171(6): 2484-8.
 | | Prostate. 2004. 59(4): 436-9
Vascular endothelial growth factor | VEGF | Urology. 2003 62(2): 374-7.
 | | Cancer Res. 2002 Jun 15; 62(12): 3369-72
v-erb-b2 erythroblastic leukemia vir oncogene | HER2 | Prostate. 2004
Interleukin-6 | IL6 | J Urology. 2005 174: 753-756
Vitamin D Receptor | VDR | Br J Cancer 2003, 88(6): 928
 | | Cancer Epidemiol Biomarkers Prev. 2003.12(1): 23-7.
 | | J Hum Genet. 2002; 47(8): 413-8
 | | Urol Int. 2002; 68(4): 226-31
 | | Endocr J. 2001 48(5): 543-9
Sequence CWU 1
1
543123DNAHomo sapiens; 1gctgggagaa cagggtacga taa 23223DNAHomo
sapiens; 2gctgggagaa cggggtacga taa 23323DNAHomo sapiens;
3ttatcgtacc ctgttctccc agc 23423DNAHomo sapiens; 4ttatcgtacc
ccgttctccc agc 23521DNAHomo sapiens; 5gggttcgggc cgcgtagggg c
21621DNAHomo sapiens; 6gggttcgggc tgcgtagggg c 21721DNAHomo
sapiens; 7gcccctacgc ggcccgaacc c 21821DNAHomo sapiens; 8gcccctacgc
agcccgaacc c 21923DNAHomo sapiens; 9aaactcttac attgcataca tag
231023DNAHomo sapiens; 10aaactcttac actgcataca tag 231121DNAHomo
sapiens; 11aactcttaca ttgcatacat a 211221DNAHomo sapiens;
12aactcttaca ctgcatacat a 211323DNAHomo sapiens; 13atcttccaaa
tcatttttag tta 231423DNAHomo sapiens; 14atcttccaaa taatttttag tta
231521DNAHomo sapiens; 15tcttccaaat catttttagt t 211621DNAHomo
sapiens; 16tcttccaaat aatttttagt t 211723DNAHomo sapiens;
17ccttctacag cggtgccttc caa 231823DNAHomo sapiens; 18ccttctacag
cagtgccttc caa 231923DNAHomo sapiens; 19ttggaaggca ccgctgtaga agg
232023DNAHomo sapiens; 20ttggaaggca ctgctgtaga agg 232123DNAHomo
sapiens; 21gcacaaggag aacctgaagt cca 232223DNAHomo sapiens;
22gcacaaggag accctgaagt cca 232319DNAHomo sapiens; 23gacttcaggt
tctccttgt 192419DNAHomo sapiens; 24gacttcaggg tctccttgt
192523DNAHomo sapiens; 25ccatttcccc tcttaaatga gaa 232623DNAHomo
sapiens; 26ccatttcccc tgttaaatga gaa 232721DNAHomo sapiens;
27catttcccct cttaaatgag a 212821DNAHomo sapiens; 28catttcccct
gttaaatgag a 212923DNAHomo sapiens; 29aagcctatgg cgccccgtgc gcg
233023DNAHomo sapiens; 30aagcctatgg ctccccgtgc gcg 233121DNAHomo
sapiens; 31gcgcacgggg cgccataggc t 213221DNAHomo sapiens;
32gcgcacgggg agccataggc t 213323DNAHomo sapiens; 33gatccacttc
cggtaatgca cca 233423DNAHomo sapiens; 34gatccacttc cagtaatgca cca
233521DNAHomo sapiens; 35atccacttcc ggtaatgcac c 213621DNAHomo
sapiens; 36atccacttcc agtaatgcac c 213719DNAHomo sapiens;
37acgtccatca tctctgcgg 193819DNAHomo sapiens; 38acgtccatcg
tctctgcgg 193921DNAHomo sapiens; 39gacgtccatc atctctgcgg t
214021DNAHomo sapiens; 40gacgtccatc gtctctgcgg t 214123DNAHomo
sapiens; 41tgcctagtgg gttcacctgc cca 234223DNAHomo sapiens;
42tgcctagtgg ggtcacctgc cca 234325DNAHomo sapiens; 43ctgcctagtg
ggttcacctg cccac 254425DNAHomo sapiens; 44ctgcctagtg gggtcacctg
cccac 254525DNAHomo sapiens; 45cacacattct tggccttctg cagat
254625DNAHomo sapiens; 46cacacattct tgaccttctg cagat 254725DNAHomo
sapiens; 47atctgcagaa ggccaagaat gtgtg 254825DNAHomo sapiens;
48atctgcagaa ggtcaagaat gtgtg 254925DNAHomo sapiens; 49aatcatgacc
cactgaagtg gccta 255025DNAHomo sapiens; 50aatcatgacc cagtgaagtg
gccta 255125DNAHomo sapiens; 51taggccactt cagtgggtca tgatt
255225DNAHomo sapiens; 52taggccactt cactgggtca tgatt 255323DNAHomo
sapiens; 53gagcatggcg gcattggcgc agg 235423DNAHomo sapiens;
54gagcatggcg ggattggcgc agg 235523DNAHomo sapiens; 55cctgcgccaa
tgccgccatg ctc 235623DNAHomo sapiens; 56cctgcgccaa tcccgccatg ctc
235723DNAHomo sapiens; 57ggcggcgcgg gctggcaggc ggg 235823DNAHomo
sapiens; 58ggcggcgcgg gttggcaggc ggg 235921DNAHomo sapiens;
59ccgcctgcca gcccgcgccg c 216021DNAHomo sapiens; 60ccgcctgcca
acccgcgccg c 216121DNAHomo sapiens; 61cctcttctgc gtacattact t
216221DNAHomo sapiens; 62cctcttctgc ctacattact t 216323DNAHomo
sapiens; 63gaagtaatgt acgcagaaga ggc 236423DNAHomo sapiens;
64gaagtaatgt aggcagaaga ggc 236521DNAHomo sapiens; 65ccccagatga
gcccccagaa c 216621DNAHomo sapiens; 66ccccagatga tcccccagaa c
216723DNAHomo sapiens; 67agttctgggg gctcatctgg ggc 236823DNAHomo
sapiens; 68agttctgggg gatcatctgg ggc 236923DNAHomo sapiens;
69tgaggggcat ggggacgggg ttc 237023DNAHomo sapiens; 70tgaggggcat
gaggacgggg ttc 237123DNAHomo sapiens; 71gaaccccgtc cccatgcccc tca
237223DNAHomo sapiens; 72gaaccccgtc ctcatgcccc tca 237323DNAHomo
sapiens; 73accaaagcat ctgggatggc cct 237423DNAHomo sapiens;
74accaaagcat ccgggatggc cct 237523DNAHomo sapiens; 75agggccatcc
cagatgcttt ggt 237623DNAHomo sapiens; 76accaaagcat ccgggatggc cct
237723DNAHomo sapiens; 77gtaggctgat ccttattcaa aat 237823DNAHomo
sapiens; 78gtaggctgat cgttattcaa aat 237923DNAHomo sapiens;
79attttgaata aggatcagcc tac 238023DNAHomo sapiens; 80attttgaata
acgatcagcc tac 238123DNAHomo sapiens; 81tctttaaggg gtctgtcatg gaa
238223DNAHomo sapiens; 82tctttaaggg ggctgtcatg gaa 238323DNAHomo
sapiens; 83ttccatgaca gaccccttaa aga 238423DNAHomo sapiens;
84ttccatgaca gcccccttaa aga 238523DNAHomo sapiens; 85tgctgctggc
cgggctgtat cga 238623DNAHomo sapiens; 86tgctgctggc caggctgtat cga
238723DNAHomo sapiens; 87tcgatacagc ccggccagca gca 238823DNAHomo
sapiens; 88tcgatacagc ctggccagca gca 238923DNAHomo sapiens;
89tggaattggg gatcactgga agt 239023DNAHomo sapiens; 90tggaattggg
ggtcactgga agt 239123DNAHomo sapiens; 91acttccagtg atccccaatt cca
239223DNAHomo sapiens; 92acttccagtg acccccaatt cca 239323DNAHomo
sapiens; 93gaataagaag tcaggctggt gag 239423DNAHomo sapiens;
94gaataagaag tgaggctggt gag 239523DNAHomo sapiens; 95ctcaccagcc
tgacttctta ttc 239623DNAHomo sapiens; 96ctcaccagcc tcacttctta ttc
239723DNAHomo sapiens; 97ggctgctccc cgcgtggccc ctg 239823DNAHomo
sapiens; 98ggctgctccc cccgtggccc ctg 239923DNAHomo sapiens;
99caggggccac gcggggagca gcc 2310023DNAHomo sapiens; 100caggggccac
ggggggagca gcc 2310123DNAHomo sapiens; 101ttctcaacag ataccctcac ttc
2310223DNAHomo sapiens; 102ttctcaacag acaccctcac ttc 2310323DNAHomo
sapiens; 103gaagtgaggg tatctgttga gaa 2310423DNAHomo sapiens;
104gaagtgaggg tgtctgttga gaa 2310523DNAHomo sapiens; 105agactccgag
ttgaatgaaa atg 2310623DNAHomo sapiens; 106agactccgag tcgaatgaaa atg
2310723DNAHomo sapiens; 107cattttcatt caactcggag tct 2310823DNAHomo
sapiens; 108cattttcatt cgactcggag tct 2310923DNAHomo sapiens;
109gcaccctggc tgctgtgttt gtg 2311023DNAHomo sapiens; 110gcaccctggc
tactgtgttt gtg 2311123DNAHomo sapiens; 111cacaaacaca gcagccaggg tgc
2311223DNAHomo sapiens; 112cacaaacaca gtagccaggg tgc 2311323DNAHomo
sapiens; 113aaagtggtcc acgaggattt cca 2311423DNAHomo sapiens;
114aaagtggtcc atgaggattt cca 2311523DNAHomo sapiens; 115tggaaatcct
cgtggaccac ttt 2311623DNAHomo sapiens; 116tggaaatcct catggaccac ttt
2311723DNAHomo sapiens; 117gctgagcccc ccatactcta ttc 2311823DNAHomo
sapiens; 118gctgagcccc cgatactcta ttc 2311923DNAHomo sapiens;
119gaatagagta tggggggctc agc 2312023DNAHomo sapiens; 120gaatagagta
tcgggggctc agc 2312123DNAHomo sapiens; 121tcggtgagac cattgcccgc tgg
2312223DNAHomo sapiens; 122tcggtgagac cgttgcccgc tgg 2312323DNAHomo
sapiens; 123ccagcgggca atggtctcac cga 2312423DNAHomo sapiens;
124ccagcgggca acggtctcac cga 2312521DNAHomo sapiens; 125catccttcag
atgtactcat c 2112621DNAHomo sapiens; 126catccttcag gtgtactcat c
2112721DNAHomo sapiens; 127gatgagtaca tctgaaggat g 2112821DNAHomo
sapiens; 128gatgagtaca cctgaaggat g 2112923DNAHomo sapiens;
129agagcgaggg aagcctcggg ggc 2313023DNAHomo sapiens; 130agagcgaggg
aggcctcggg ggc 2313123DNAHomo sapiens; 131gcccccgagg cttccctcgc tct
2313223DNAHomo sapiens; 132gcccccgagg cctccctcgc tct 2313323DNAHomo
sapiens; 133tgaatttgcc caaaatgtcc tgt 2313423DNAHomo sapiens;
134tgaatttgcc cgaaatgtcc tgt 2313523DNAHomo sapiens; 135acaggacatt
ttgggcaaat tca 2313623DNAHomo sapiens; 136acaggacatt tcgggcaaat tca
2313723DNAHomo sapiens; 137tcatttgagg agctgaaagc tca 2313823DNAHomo
sapiens; 138tcatttgagg atctgaaagc tca 2313923DNAHomo sapiens;
139tgagctttca gctcctcaaa tga 2314023DNAHomo sapiens; 140tgagctttca
gatcctcaaa tga 2314123DNAHomo sapiens; 141aagagcacat agagattaat gac
2314223DNAHomo sapiens; 142aagagcacat atagattaat gac 2314323DNAHomo
sapiens; 143gtcattaatc tctatgtgct ctt 2314423DNAHomo sapiens;
144gtcattaatc tatatgtgct ctt 2314523DNAHomo sapiens; 145ttcatcgtcc
cctctcccct gtc 2314623DNAHomo sapiens; 146ttcatcgtcc catctcccct gtc
2314723DNAHomo sapiens; 147gacaggggag aggggacgat gaa 2314823DNAHomo
sapiens; 148gacaggggag atgggacgat gaa 2314925DNAHomo sapiens;
149gcttctttgg gaaggggaag taggg 2515025DNAHomo sapiens;
150ccctacttcc ccttcccaaa gaagc 2515125DNAHomo sapiens;
151gcttctttgg gagggggaag taggg 2515225DNAHomo sapiens;
152ccctacttcc ccctcccaaa gaagc 2515323DNAHomo sapiens;
153ctagagggtc accgcgtcta tgc 2315423DNAHomo sapiens; 154ctagagggtc
aacgcgtcta tgc 2315523DNAHomo sapiens; 155gcatagacgc ggtgaccctc tag
2315623DNAHomo sapiens; 156gcatagacgc gttgaccctc tag 2315723DNAHomo
sapiens; 157gaagggccaa ggaaggggtt aga 2315823DNAHomo sapiens;
158gaagggccaa gcaaggggtt aga 2315921DNAHomo sapiens; 159ctaacccctt
ccttggccct t 2116021DNAHomo sapiens; 160ctaacccctt gcttggccct t
2116123DNAHomo sapiens; 161tagacgcagc ccgcaggcag ccc 2316223DNAHomo
sapiens; 162tagacgcagc ctgcaggcag ccc 2316323DNAHomo sapiens;
163gggctgcctg cgggctgcgt cta 2316423DNAHomo sapiens; 164gggctgcctg
caggctgcgt cta 2316523DNAHomo sapiens; 165ccctcactta ccgggtcaca ctt
2316623DNAHomo sapiens; 166ccctcactta ctgggtcaca ctt 2316721DNAHomo
sapiens; 167cctcacttac cgggtcacac t 2116821DNAHomo sapiens;
168cctcacttac tgggtcacac t 2116923DNAHomo sapiens; 169tcttctcctt
taacggcaag gac 2317023DNAHomo sapiens; 170tcttctcctt tgacggcaag gac
2317123DNAHomo sapiens; 171gtccttgccg ttaaaggaga aga 2317223DNAHomo
sapiens; 172gtccttgccg tcaaaggaga aga 2317323DNAHomo sapiens;
173gcggctgctg ccgctgctgc tac 2317423DNAHomo sapiens; 174gcggctgctg
ctgctgctgc tac 2317523DNAHomo sapiens; 175gtagcagcag cggcagcagc cgc
2317623DNAHomo sapiens; 176gtagcagcag cagcagcagc cgc 2317723DNAHomo
sapiens; 177cccttccatc cctcaggtgt cct 2317823DNAHomo sapiens;
178cccttccatc cttcaggtgt cct 2317921DNAHomo sapiens; 179ccttccatcc
ctcaggtgtc c 2118021DNAHomo sapiens; 180ccttccatcc ttcaggtgtc c
2118123DNAHomo sapiens; 181gacagggttg cgctgatcct ccc 2318223DNAHomo
sapiens; 182gacagggttg cactgatcct ccc 2318323DNAHomo sapiens;
183gggaggatca gcgcaaccct gtc 2318423DNAHomo sapiens; 184gggaggatca
gtgcaaccct gtc 2318523DNAHomo sapiens; 185ggctcccgct gccatcctgg ctc
2318623DNAHomo sapiens; 186ggctcccgct gacatcctgg ctc 2318721DNAHomo
sapiens; 187gctcccgctg ccatcctggc t 2118821DNAHomo sapiens;
188gctcccgctg acatcctggc t 2118923DNAHomo sapiens; 189caggaagcct
gcagtcctgg aag
2319023DNAHomo sapiens; 190caggaagcct gtagtcctgg aag 2319123DNAHomo
sapiens; 191cttccaggac tgcaggcttc ctg 2319223DNAHomo sapiens;
192cttccaggac tacaggcttc ctg 2319323DNAHomo sapiens; 193ctccgggcgt
gagcacgagg agc 2319423DNAHomo sapiens; 194ctccgggcgt gcgcacgagg agc
2319523DNAHomo sapiens; 195gctcctcgtg ctcacgcccg gag 2319623DNAHomo
sapiens; 196gctcctcgtg cgcacgcccg gag 2319721DNAHomo sapiens;
197aacagcaagt actagctctc c 2119821DNAHomo sapiens; 198aacagcaagt
gctagctctc c 2119921DNAHomo sapiens; 199ggagagctag tacttgctgt t
2120021DNAHomo sapiens; 200ggagagctag cacttgctgt t 2120123DNAHomo
sapiens; 201cctccaccat gatactagga ccc 2320223DNAHomo sapiens;
202cctccaccat ggtactagga ccc 2320321DNAHomo sapiens; 203ctccaccatg
atactaggac c 2120421DNAHomo sapiens; 204ctccaccatg gtactaggac c
2120523DNAHomo sapiens; 205gcgtgagcca ccgcgcctgg ccg 2320623DNAHomo
sapiens; 206gcgtgagcca ctgcgcctgg ccg 2320721DNAHomo sapiens;
207cgtgagccac cgcgcctggc c 2120821DNAHomo sapiens; 208cgtgagccac
tgcgcctggc c 2120921DNAHomo sapiens; 209gtgctgggat tacaggcgtg a
2121021DNAHomo sapiens; 210gtgctgggat gacaggcgtg a 2121119DNAHomo
sapiens; 211cacgcctgta atcccagca 1921219DNAHomo sapiens;
212cacgcctgtc atcccagca 1921321DNAHomo sapiens; 213ccgctccaac
gccctcaacc c 2121421DNAHomo sapiens; 214ccgctccaac accctcaacc c
2121519DNAHomo sapiens; 215cgctccaacg ccctcaacc 1921619DNAHomo
sapiens; 216cgctccaaca ccctcaacc 1921723DNAHomo sapiens;
217ggattttcag ggtaggtaat gaa 2321823DNAHomo sapiens; 218ggattttcag
gataggtaat gaa 2321923DNAHomo sapiens; 219ttcattacct accctgaaaa tcc
2322023DNAHomo sapiens; 220ttcattacct atcctgaaaa tcc 2322121DNAHomo
sapiens; 221gtgtgagccc gggaggtgga g 2122221DNAHomo sapiens;
222gtgtgagccc aggaggtgga g 2122319DNAHomo sapiens; 223tgtgagcccg
ggaggtgga 1922419DNAHomo sapiens; 224tgtgagccca ggaggtgga
1922523DNAHomo sapiens; 225aaaagcatac aattgataat tca 2322623DNAHomo
sapiens; 226aaaagcatac atttgataat tca 2322723DNAHomo sapiens;
227tgaattatca attgtatgct ttt 2322823DNAHomo sapiens; 228tgaattatca
aatgtatgct ttt 2322923DNAHomo sapiens; 229cacaatatcc tctggggttt ggc
2323023DNAHomo sapiens; 230cacaatatcc tttggggttt ggc 2323123DNAHomo
sapiens; 231gccaaacccc agaggatatt gtg 2323223DNAHomo sapiens;
232gccaaacccc aaaggatatt gtg 2323323DNAHomo sapiens; 233tccaggcttc
cgcaacttac acg 2323423DNAHomo sapiens; 234tccaggcttc ctcaacttac acg
2323523DNAHomo sapiens; 235cgtgtaagtt gcggaagcct gga 2323623DNAHomo
sapiens; 236cgtgtaagtt gaggaagcct gga 2323721DNAHomo sapiens;
237tctactccac tgctgtctat c 2123821DNAHomo sapiens; 238tctactccac
cgctgtctat c 2123921DNAHomo sapiens; 239gatagacagc ggtggagtag a
2124021DNAHomo sapiens; 240gatagacagc agtggagtag a 2124121DNAHomo
sapiens; 241cgtgcggcct cgattggagg t 2124221DNAHomo sapiens;
242cgtgcggcct tgattggagg t 2124319DNAHomo sapiens; 243gtgcggcctc
gattggagg 1924419DNAHomo sapiens; 244gtgcggcctt gattggagg
1924523DNAHomo sapiens; 245cgttgtcccc aaattgcagg aac 2324623DNAHomo
sapiens; 246cgttgtcccc agattgcagg aac 2324723DNAHomo sapiens;
247gttcctgcaa tttggggaca acg 2324823DNAHomo sapiens; 248gttcctgcaa
tctggggaca acg 2324921DNAHomo sapiens; 249gccttctcct ctctgtcccc a
2125021DNAHomo sapiens; 250gccttctcct ttctgtcccc a 2125119DNAHomo
sapiens; 251ccttctcctc tctgtcccc 1925219DNAHomo sapiens;
252ccttctcctt tctgtcccc 1925323DNAHomo sapiens; 253ggcggcagct
ccggtccgcg ccc 2325423DNAHomo sapiens; 254ggcggcagct cgggtccgcg ccc
2325523DNAHomo sapiens; 255gggcgcggac cggagctgcc gcc 2325623DNAHomo
sapiens; 256gggcgcggac ccgagctgcc gcc 2325723DNAHomo sapiens;
257ccgaccggcc ggccttcgcc tcc 2325823DNAHomo sapiens; 258ccgaccggcc
gtccttcgcc tcc 2325923DNAHomo sapiens; 259ggaggcgaag gccggccggt cgg
2326023DNAHomo sapiens; 260ggaggcgaag gacggccggt cgg 2326121DNAHomo
sapiens; 261cgcctctctc ttgcccttgt c 2126221DNAHomo sapiens;
262cgcctctctc ctgcccttgt c 2126319DNAHomo sapiens; 263gcctctctct
tgcccttgt 1926419DNAHomo sapiens; 264gcctctctcc tgcccttgt
1926523DNAHomo sapiens; 265cagaaaaaag acgcaggatt tcc 2326623DNAHomo
sapiens; 266cagaaaaaag atgcaggatt tcc 2326723DNAHomo sapiens;
267ggaaatcctg cgtctttttt ctg 2326823DNAHomo sapiens; 268ggaaatcctg
catctttttt ctg 2326923DNAHomo sapiens; 269agggaaaaga agaggatact tct
2327023DNAHomo sapiens; 270agggaaaaga aaggatactt ctc 2327123DNAHomo
sapiens; 271agaagtatcc tcttcttttc cct 2327223DNAHomo sapiens;
272gagaagtatc ctttcttttc cct 2327323DNAHomo sapiens; 273gtttgtgggg
cactccctgc cag 2327423DNAHomo sapiens; 274gtttgtgggg cgctccctgc cag
2327521DNAHomo sapiens; 275tggcagggag tgccccacaa a 2127621DNAHomo
sapiens; 276tggcagggag cgccccacaa a 2127723DNAHomo sapiens;
277tcttacaggg atggaggcaa tgg 2327823DNAHomo sapiens; 278tcttacaggg
acggaggcaa tgg 2327923DNAHomo sapiens; 279ccattgcctc catccctgta aga
2328023DNAHomo sapiens; 280ccattgcctc cgtccctgta aga 2328123DNAHomo
sapiens; 281gccgcgctga ttgaggccat cca 2328223DNAHomo sapiens;
282gccgcgctga tcgaggccat cca 2328321DNAHomo sapiens; 283ccgcgctgat
tgaggccatc c 2128421DNAHomo sapiens; 284ccgcgctgat cgaggccatc c
2128521DNAHomo sapiens; 285tctgcgggag ccgatttcat c 2128621DNAHomo
sapiens; 286tctgcgggag tcgatttcat c 2128721DNAHomo sapiens;
287gatgaaatcg gctcccgcag a 2128821DNAHomo sapiens; 288gatgaaatcg
actcccgcag a 2128923DNAHomo sapiens; 289gaccagtgaa gaaagtgtct ttg
2329023DNAHomo sapiens; 290gaccagtgaa gcaagtgtct ttg 2329121DNAHomo
sapiens; 291aaagacactt tcttcactgg t 2129221DNAHomo sapiens;
292aaagacactt gcttcactgg t 2129323DNAHomo sapiens; 293cattttggga
acagtggatg tta 2329423DNAHomo sapiens; 294cattttggga agagtggatg tta
2329523DNAHomo sapiens; 295taacatccac tgttcccaaa atg 2329623DNAHomo
sapiens; 296taacatccac tcttcccaaa atg 2329723DNAHomo sapiens;
297gattatttcc caggaaccca taa 2329823DNAHomo sapiens; 298gattatttcc
cgggaaccca taa 2329921DNAHomo sapiens; 299attatttccc aggaacccat a
2130021DNAHomo sapiens; 300attatttccc gggaacccat a 2130123DNAHomo
sapiens; 301agcaccccct gaatccaggt aag 2330223DNAHomo sapiens;
302agcaccccct ggatccaggt aag 2330323DNAHomo sapiens; 303cttacctgga
ttcagggggt gct 2330423DNAHomo sapiens; 304cttacctgga tccagggggt gct
2330521DNAHomo sapiens; 305gaaggcttca atggatcctt t 2130621DNAHomo
sapiens; 306gaaggcttca gtggatcctt t 2130721DNAHomo sapiens;
307aaaggatcca ttgaagcctt c 2130821DNAHomo sapiens; 308aaaggatcca
ctgaagcctt c 2130923DNAHomo sapiens; 309gaatctggta cctggaccaa atc
2331023DNAHomo sapiens; 310gaatctggta cttggaccaa atc 2331121DNAHomo
sapiens; 311aatctggtac ctggaccaaa t 2131221DNAHomo sapiens;
312aatctggtac ttggaccaaa t 2131321DNAHomo sapiens; 313cctgccgtca
gtggtcacct g 2131421DNAHomo sapiens; 314cctgccgtca atggtcacct g
2131519DNAHomo sapiens; 315ctgccgtcag tggtcacct 1931619DNAHomo
sapiens; 316ctgccgtcaa tggtcacct 1931723DNAHomo sapiens;
317gggtattttt acatccctcc agt 2331823DNAHomo sapiens; 318gggtattttt
atatccctcc agt 2331921DNAHomo sapiens; 319ggtattttta catccctcca g
2132021DNAHomo sapiens; 320ggtattttta tatccctcca g 2132123DNAHomo
sapiens; 321gcttgaacct caaacaattg aag 2332223DNAHomo sapiens;
322gcttgaacct cgaacaattg aag 2332323DNAHomo sapiens; 323cttcaattgt
ttgaggttca agc 2332423DNAHomo sapiens; 324cttcaattgt tcgaggttca agc
2332523DNAHomo sapiens; 325acctggtgat gaatccctta cta 2332623DNAHomo
sapiens; 326acctggtgat ggatccctta cta 2332723DNAHomo sapiens;
327tagtaaggga ttcatcacca ggt 2332823DNAHomo sapiens; 328tagtaaggga
tccatcacca ggt 2332923DNAHomo sapiens; 329agcacagcaa gtggaaaatc tgt
2333023DNAHomo sapiens; 330agcacagcaa gggaaaatct gtc 2333123DNAHomo
sapiens; 331acagattttc cacttgctgt gct 2333223DNAHomo sapiens;
332gacagatttt cccttgctgt gct 2333323DNAHomo sapiens; 333attttagatt
actgattttg ggc 2333423DNAHomo sapiens; 334attttagatt atgattttgg gca
2333523DNAHomo sapiens; 335gcccaaaatc agtaatctaa aat 2333623DNAHomo
sapiens; 336tgcccaaaat cataatctaa aat 2333723DNAHomo sapiens;
337gctttctaat ggtgacaact gat 2333823DNAHomo sapiens; 338gctttctaat
gatgacaact gat 2333921DNAHomo sapiens; 339ctttctaatg gtgacaactg a
2134021DNAHomo sapiens; 340ctttctaatg atgacaactg a 2134121DNAHomo
sapiens; 341tagacgacag cgcaggcaag a 2134221DNAHomo sapiens;
342tagacgacag agcaggcaag a 2134321DNAHomo sapiens; 343tcttgcctgc
gctgtcgtct a 2134421DNAHomo sapiens; 344tcttgcctgc tctgtcgtct a
2134523DNAHomo sapiens; 345ttggatggct ccaaatcacc ccc 2334623DNAHomo
sapiens; 346ttggatggct cgaaatcacc ccc 2334723DNAHomo sapiens;
347gggggtgatt tggagccatc caa 2334823DNAHomo sapiens; 348gggggtgatt
tcgagccatc caa 2334923DNAHomo sapiens; 349tggtgagcgt ggactttccg gaa
2335023DNAHomo sapiens; 350tggtgagcgt gaactttccg gaa 2335123DNAHomo
sapiens; 351ttccggaaag tccacgctca cca 2335223DNAHomo sapiens;
352ttccggaaag ttcacgctca cca 2335323DNAHomo sapiens; 353gcccggagct
gccctttcct ctt 2335423DNAHomo sapiens; 354gcccggagct gacctttcct ctt
2335523DNAHomo sapiens; 355aagaggaaag ggcagctccg ggc 2335623DNAHomo
sapiens; 356aagaggaaag gtcagctccg ggc 2335723DNAHomo sapiens;
357ttgtgtcttg cgatgctaaa gga 2335823DNAHomo sapiens; 358ttgtgtcttg
ccatgctaaa gga 2335923DNAHomo sapiens; 359tcctttagca tcgcaagaca caa
2336023DNAHomo sapiens; 360tcctttagca tggcaagaca caa 2336143DNAHomo
sapiens; 361taatacgact cactataggg agagttgctt ttcctctggg aag
4336243DNAHomo sapiens; 362taatacgact cactataggg agaccatttg
atcagcggag act 4336345DNAHomo sapiens; 363taatacgact cactataggg
agaacccatg tatctaggag agctg 4536443DNAHomo sapiens; 364taatacgact
cactataggg agagaggggt catgaggtga ctg 4336544DNAHomo sapiens;
365taatacgact cactataggg agatgttatt ccttcctccc caac 4436643DNAHomo
sapiens; 366taatacgact cactataggg agacagcgag atctggcgta taa
4336745DNAHomo sapiens; 367taatacgact cactataggg agatgctgtt
accaaatctc agtgg 4536843DNAHomo sapiens; 368taatacgact cactataggg
agagattccg tcccctttct ttc 4336943DNAHomo sapiens; 369taatacgact
cactataggg agagggggtc cacttgtctg taa 4337043DNAHomo sapiens;
370taatacgact cactataggg agagagccaa ggcaggtttt aga 4337143DNAHomo
sapiens; 371taatacgact cactataggg agaggcagca taagcaggac ttc
4337243DNAHomo sapiens; 372taatacgact cactataggg agaatggttt
gcaggaaaca agg 4337343DNAHomo sapiens; 373taatacgact cactataggg
agaacctctg tcttgggcta cca 4337443DNAHomo sapiens; 374taatacgact
cactataggg agagactctt ccacctccca aca 4337541DNAHomo sapiens;
375taatacgact cactataggg agaagcacac ggagagcctg a 4137643DNAHomo
sapiens; 376taatacgact cactataggg agagaaggca ggagacagtg gat
4337743DNAHomo sapiens;
377taatacgact cactataggg agaacctggt ccccaaaaga aat 4337843DNAHomo
sapiens; 378taatacgact cactataggg agagctgtgc tctttttcca ggt
4337943DNAHomo sapiens; 379taatacgact cactataggg agatgcttga
ggtgagtttt tgc 4338043DNAHomo sapiens; 380taatacgact cactataggg
agagaatgac aacaagcccg aat 4338142DNAHomo sapiens; 381taatacgact
cactataggg agacggacat catcctgtac gc 4238243DNAHomo sapiens;
382taatacgact cactataggg agaaaagagc ttcaacccca aca 4338344DNAHomo
sapiens; 383taatacgact cactataggg agacttccac agggtgatct tctg
4438443DNAHomo sapiens; 384taatacgact cactataggg agagaagacc
caggtccaga tga 4338543DNAHomo sapiens; 385taatacgact cactataggg
agagctgctt ccactatggc ttc 4338642DNAHomo sapiens; 386taatacgact
cactataggg agaaacagag gaggggaaag ca 4238743DNAHomo sapiens;
387taatacgact cactataggg agaccgacac gtctctgcta ctg 4338843DNAHomo
sapiens; 388taatacgact cactataggg agaaaggaga tcgaggtccc act
4338942DNAHomo sapiens; 389taatacgact cactataggg agaaagaaca
gcctggcctt gt 4239043DNAHomo sapiens; 390taatacgact cactataggg
agaggtcaac ccatctgagt tcc 4339143DNAHomo sapiens; 391taatacgact
cactataggg agataatcct ccctctcgtg cag 4339243DNAHomo sapiens;
392taatacgact cactataggg agatgctccg ctgaccttaa aga 4339343DNAHomo
sapiens; 393taatacgact cactataggg agaggccact tgtttgtgtg tgt
4339443DNAHomo sapiens; 394taatacgact cactataggg agacaccact
ccttccaggg tta 4339543DNAHomo sapiens; 395taatacgact cactataggg
agagagggga gaaagaggga aga 4339643DNAHomo sapiens; 396taatacgact
cactataggg agagcatttg ctgttcggag ttt 4339745DNAHomo sapiens;
397taatacgact cactataggg agacacacac acacacaaat ccaag 4539843DNAHomo
sapiens; 398taatacgact cactataggg agaccctttc tgatcccagg tct
4339943DNAHomo sapiens; 399taatacgact cactataggg agagctggtt
cttgggaaat cct 4340043DNAHomo sapiens; 400taatacgact cactataggg
agacagcatc tgctccctct acc 4340142DNAHomo sapiens; 401taatacgact
cactataggg agactgagga gccccaacaa ct 4240243DNAHomo sapiens;
402taatacgact cactataggg agacacggtt tctcttccag gac 4340342DNAHomo
sapiens; 403taatacgact cactataggg agacgaggcc ctcctacctt tt
4240443DNAHomo sapiens; 404taatacgact cactataggg agacagggtg
ttgagtgaca gga 4340548DNAHomo sapiens; 405taatacgact cactataggg
agatttttgt tgacagaatt caaaactt 4840643DNAHomo sapiens;
406taatacgact cactataggg agacagcttg cccgagttct act 4340743DNAHomo
sapiens; 407taatacgact cactataggg agaccaagag gaagccctaa tcc
4340845DNAHomo sapiens; 408taatacgact cactataggg agacaccttg
gttcttgtag acgac 4540943DNAHomo sapiens; 409taatacgact cactataggg
agactgcctt tgtcccctag atg 4341043DNAHomo sapiens; 410taatacgact
cactataggg agagagacca tgaccactca cca 4341143DNAHomo sapiens;
411taatacgact cactataggg agagccagga tggtctcagt ctc 4341243DNAHomo
sapiens; 412taatacgact cactataggg agagccagga tggtctcagt ctc
4341343DNAHomo sapiens; 413taatacgact cactataggg agattcgaga
gtgaggacgt gtg 4341443DNAHomo sapiens; 414taatacgact cactataggg
agactactgg tttgggaggg aca 4341544DNAHomo sapiens; 415taatacgact
cactataggg agaaagcagt ctgtttgagg gaca 4441646DNAHomo sapiens;
416taatacgact cactataggg agatgccatt aaaagaaaat catcca
4641742DNAHomo sapiens; 417taatacgact cactataggg agacccctag
agctcagcca gt 4241844DNAHomo sapiens; 418taatacgact cactataggg
agacagactt agctcaaccc gtca 4441943DNAHomo sapiens; 419taatacgact
cactataggg agagctccag gagaatcttt cca 4342043DNAHomo sapiens;
420taatacgact cactataggg agacaaaccg ctgccactac act 4342143DNAHomo
sapiens; 421taatacgact cactataggg agagcgacct cttcagatgg att
4342243DNAHomo sapiens; 422taatacgact cactataggg agaagagtca
gctccgacct ctc 4342341DNAHomo sapiens; 423taatacgact cactataggg
agagaccctt ggccgctaaa c 4142442DNAHomo sapiens; 424taatacgact
cactataggg agaccatagt ggtgctgaat gg 4242543DNAHomo sapiens;
425taatacgact cactataggg agattggaat gaggacagcc ata 4342644DNAHomo
sapiens; 426taatacgact cactataggg agacagcaag gatttgaaag atgc
4442743DNAHomo sapiens; 427taatacgact cactataggg agactccatg
tttctgggga aat 4342843DNAHomo sapiens; 428taatacgact cactataggg
agagtaatcc gagcctccac tga 4342941DNAHomo sapiens; 429taatacgact
cactataggg agactgagct ccctggtggt g 4143043DNAHomo sapiens;
430taatacgact cactataggg agactgagag ctcctgtgcc ttc 4343142DNAHomo
sapiens; 431taatacgact cactataggg agactctctg cccagtccct gt
4243242DNAHomo sapiens; 432taatacgact cactataggg agaccctctg
tcaggagtgt gc 4243345DNAHomo sapiens; 433taatacgact cactataggg
agaaaggtat tcaaggcagg gagta 4543446DNAHomo sapiens; 434taatacgact
cactataggg agaaattaca accagagctt ggcata 4643543DNAHomo sapiens;
435taatacgact cactataggg agacctgtga tcccactttc atc 4343643DNAHomo
sapiens; 436taatacgact cactataggg agagcaagct cacggttgtc tta
4343743DNAHomo sapiens; 437taatacgact cactataggg agaggcatgg
ttcaccttct cct 4343843DNAHomo sapiens; 438taatacgact cactataggg
agatggtgtc tccaggtcaa tca 4343943DNAHomo sapiens; 439taatacgact
cactataggg agaggctgtt ccctttgaga acc 4344043DNAHomo sapiens;
440taatacgact cactataggg agaggaccaa atcaggagag agc 4344143DNAHomo
sapiens; 441taatacgact cactataggg agaaccccag aaggggttta ctg
4344246DNAHomo sapiens; 442taatacgact cactataggg agatcacctt
gtgatgttag tttgga 4644343DNAHomo sapiens; 443taatacgact cactataggg
agattaatgg caggtgtgaa ttg 4344444DNAHomo sapiens; 444taatacgact
cactataggg agaaaggtgt ggccattgta aaaa 4444543DNAHomo sapiens;
445taatacgact cactataggg agagagtcca ggggaacagc ttc 4344642DNAHomo
sapiens; 446taatacgact cactataggg agaggtaccg catgcacaag tc
4244745DNAHomo sapiens; 447taatacgact cactataggg agactgcatc
agttcacttt tgacc 4544841DNAHomo sapiens; 448taatacgact cactataggg
agagacagcc aacgcctctt g 4144943DNAHomo sapiens; 449taatacgact
cactataggg agacaagaca tgccaaagtg ctg 4345041DNAHomo sapiens;
450aattaaccct cactaaaggg agagggctgg gaggagaaga t 4145143DNAHomo
sapiens; 451aattaaccct cactaaaggg agacactcgc acgtttgaca tct
4345243DNAHomo sapiens; 452aattaaccct cactaaaggg agaacaaagg
ttccattgcc act 4345348DNAHomo sapiens; 453aattaaccct cactaaaggg
agacataata ttcccaacac aattcttg 4845443DNAHomo sapiens;
454aattaaccct cactaaaggg agaacctgac cttggtgttg agc 4345543DNAHomo
sapiens; 455aattaaccct cactaaaggg agacccacat gcacatctct gtc
4345647DNAHomo sapiens; 456aattaaccct cactaaaggg agatgctctt
tatgttgaac tgtgtga 4745743DNAHomo sapiens; 457aattaaccct cactaaaggg
agaaactctg gtccaccagg aca 4345843DNAHomo sapiens; 458aattaaccct
cactaaaggg agaccagaac gtgaggtgga ctt 4345943DNAHomo sapiens;
459aattaaccct cactaaaggg agaaagacca cgaccagcag aat 4346043DNAHomo
sapiens; 460aattaaccct cactaaaggg agagttgctc gaggacaagt tcc
4346143DNAHomo sapiens; 461aattaaccct cactaaaggg agaaaagcgg
gagatgaagt cct 4346243DNAHomo sapiens; 462aattaaccct cactaaaggg
agatcatcac tctgctggtc agg 4346343DNAHomo sapiens; 463aattaaccct
cactaaaggg agatggggaa tttctttgtc cag 4346443DNAHomo sapiens;
464aattaaccct cactaaaggg agaaggggaa aaacgctacc tgt 4346543DNAHomo
sapiens; 465aattaaccct cactaaaggg agacagtcaa tccctttggt gct
4346643DNAHomo sapiens; 466aattaaccct cactaaaggg agaaaagttg
gggacacaca agc 4346743DNAHomo sapiens; 467aattaaccct cactaaaggg
agatcgttcc cttggatctg atg 4346846DNAHomo sapiens; 468aattaaccct
cactaaaggg agacaggaaa gtcttttccc attaca 4646943DNAHomo sapiens;
469aattaaccct cactaaaggg agatgtccgt aggaaggatc agc 4347041DNAHomo
sapiens; 470aattaaccct cactaaaggg agagatgcgc ccagtacctg t
4147143DNAHomo sapiens; 471aattaaccct cactaaaggg agaaccagga
ggtacatggc att 4347243DNAHomo sapiens; 472aattaaccct cactaaaggg
agacatgaag ctgcctccct tag 4347343DNAHomo sapiens; 473aattaaccct
cactaaaggg agactgccct ggtaggtttt ctg 4347443DNAHomo sapiens;
474aattaaccct cactaaaggg agacttcacg tggatgaagt gga 4347543DNAHomo
sapiens; 475aattaaccct cactaaaggg agagaagagt ccctgacccc tct
4347642DNAHomo sapiens; 476aattaaccct cactaaaggg agactccagg
ctccagcttt gt 4247743DNAHomo sapiens; 477aattaaccct cactaaaggg
agaggccttt tggtccagaa ttt 4347843DNAHomo sapiens; 478aattaaccct
cactaaaggg agaatgtgaa ccagctccct gtc 4347943DNAHomo sapiens;
479aattaaccct cactaaaggg agagcaggat agccaggaag aga 4348043DNAHomo
sapiens; 480aattaaccct cactaaaggg agagtgctgc cccatactca ctt
4348143DNAHomo sapiens; 481aattaaccct cactaaaggg agacgttgtc
agaaatggtc gaa 4348243DNAHomo sapiens; 482aattaaccct cactaaaggg
agaggtgggt gtatccacag gac 4348345DNAHomo sapiens; 483aattaaccct
cactaaaggg agatggagaa agttgaacca cctct 4548443DNAHomo sapiens;
484aattaaccct cactaaaggg agagagttca acagcaagca gca 4348543DNAHomo
sapiens; 485aattaaccct cactaaaggg agacccttct cggcaattta cac
4348643DNAHomo sapiens; 486aattaaccct cactaaaggg agaaagcttc
tgtggctgga gtc 4348743DNAHomo sapiens; 487aattaaccct cactaaaggg
agactgattg gctgagggtt cac 4348843DNAHomo sapiens; 488aattaaccct
cactaaaggg agacccaggc tgaatgacaa aag 4348941DNAHomo sapiens;
489aattaaccct cactaaaggg agaaggggct cacaacagtg c 4149041DNAHomo
sapiens; 490aattaaccct cactaaaggg agacagcccc aaccttgtca c
4149143DNAHomo sapiens; 491aattaaccct cactaaaggg agactctcag
agctgctcac acg 4349242DNAHomo sapiens; 492aattaaccct cactaaaggg
agacaggcgt cagcaccagt ag 4249343DNAHomo sapiens; 493aattaaccct
cactaaaggg agacaggctg ggaaacaagg tag 4349443DNAHomo sapiens;
494aattaaccct cactaaaggg agactccagc cgatctctct gtt 4349543DNAHomo
sapiens; 495aattaaccct cactaaaggg agattgcagg tcgcttcctt att
4349643DNAHomo sapiens; 496aattaaccct cactaaaggg agaacctctc
attcaaccgc cta 4349741DNAHomo sapiens; 497aattaaccct cactaaaggg
agaggaagcg gctgatcctc a 4149843DNAHomo sapiens; 498aattaaccct
cactaaaggg agacccagga gccctataaa acc 4349943DNAHomo sapiens;
499aattaaccct cactaaaggg agaggccagc tgggaataga gat 4350043DNAHomo
sapiens; 500aattaaccct cactaaaggg agaacggtgt gatttgtgct gaa
4350143DNAHomo sapiens; 501aattaaccct cactaaaggg agaaacggtg
tgatttgtgc tga 4350243DNAHomo sapiens; 502aattaaccct cactaaaggg
agagggagca ggaaagtgag gtt 4350346DNAHomo sapiens; 503aattaaccct
cactaaaggg agaaagagtt tttaggaccc acttcc 4650444DNAHomo sapiens;
504aattaaccct cactaaaggg agagagctca ggagtttgag acca 4450543DNAHomo
sapiens; 505aattaaccct cactaaaggg agaagggcaa acctgagtca tca
4350643DNAHomo sapiens; 506aattaaccct cactaaaggg agagaatctg
ccagggctat ttg 4350743DNAHomo sapiens; 507aattaaccct cactaaaggg
agagtctggc caagctgctg tat 4350843DNAHomo sapiens; 508aattaaccct
cactaaaggg agattgggcc aaaacaaata agc 4350942DNAHomo sapiens;
509aattaaccct cactaaaggg agaacgtttc cattgtgcgg ta 4251042DNAHomo
sapiens; 510aattaaccct cactaaaggg agacaaccca ctctcccttg ga
4251143DNAHomo sapiens; 511aattaaccct cactaaaggg agaaggagta
gcaggagcgt ggt 4351241DNAHomo sapiens; 512aattaaccct cactaaaggg
agaagcgaac gagaggtgag c 4151341DNAHomo sapiens; 513aattaaccct
cactaaaggg agatggcgcg tgaagaagtt g 4151444DNAHomo sapiens;
514aattaaccct cactaaaggg agagcaaaga atcacacaca cacc 4451549DNAHomo
sapiens; 515aattaaccct cactaaaggg agacagaaat atgcaacagt tacaaaagg
4951643DNAHomo sapiens; 516aattaaccct cactaaaggg agaccttcag
gtttgggaac tca 4351742DNAHomo sapiens; 517aattaaccct cactaaaggg
agacctcatg aagggggaga tg 4251841DNAHomo sapiens; 518aattaaccct
cactaaaggg agagtggctc ggtctccaca c 4151943DNAHomo sapiens;
519aattaaccct cactaaaggg agatactgct tggagtgctc ctc 4352043DNAHomo
sapiens; 520aattaaccct cactaaaggg agatcacaaa gcggaagaat gtg
4352143DNAHomo sapiens; 521aattaaccct cactaaaggg agatggttct
cccgagaggt aaa 4352243DNAHomo sapiens; 522aattaaccct cactaaaggg
agaccctgat gacatcctga ttg 4352346DNAHomo sapiens; 523aattaaccct
cactaaaggg agatcacttt ccataaaagc aaggtt 4652443DNAHomo sapiens;
524aattaaccct cactaaaggg agatgtactt cagggcttgg tca 4352543DNAHomo
sapiens; 525aattaaccct cactaaaggg agacaagcca ctgaaggagc ata
4352646DNAHomo sapiens; 526aattaaccct cactaaaggg agaggcagga
gatgagaatt aagaaa 4652743DNAHomo sapiens; 527aattaaccct cactaaaggg
agaggctgat ccttcccaga aat 4352842DNAHomo sapiens; 528aattaaccct
cactaaaggg agatgcagga gaaggtgaac ca 4252943DNAHomo sapiens;
529aattaaccct cactaaaggg agagaggatg aagcccacca aac 4353047DNAHomo
sapiens; 530aattaaccct cactaaaggg agagagttgg gtgatacata cacaagg
4753143DNAHomo sapiens; 531aattaaccct cactaaaggg agatgagctg
gtctgaatgt tcg 4353243DNAHomo sapiens; 532aattaaccct cactaaaggg
agacagcagt cccaacagaa aca 4353343DNAHomo sapiens; 533aattaaccct
cactaaaggg agagcagtgg tagtggtggc att 4353443DNAHomo sapiens;
534aattaaccct cactaaaggg agatggtgta acctcccttg aaa 4353543DNAHomo
sapiens; 535aattaaccct cactaaaggg agatccctgc acttctaggc act
4353643DNAHomo sapiens; 536aattaaccct cactaaaggg agagcttcac
tgggtgtgga aat 4353743DNAHomo sapiens; 537aattaaccct cactaaaggg
agagtggaga ggaggaggac aaa 4353843DNAHomo sapiens; 538aattaaccct
cactaaaggg agagcctcag acatctccag tcc 4353950DNAHomo sapiens;
539gtcgtcaaga tgctaccgtt caggagtcgt caagatgcta ccgttcagga
5054019DNAHomo sapiens; 540cttgacgact cctgaacgg 1954119DNAHomo
sapiens; 541cttgacgaca cctgaacgg 1954223DNAHomo sapiens;
542taatacgact cactataggg aga 2354323DNAHomo sapiens; 543aattaaccct
cactaaaggg aga 23
* * * * *