U.S. patent application number 12/739559 was published by the patent office on 2011-06-30 for method of diagnosing neoplasms.
This patent application is currently assigned to CLINICAL GENOMICS PTY. LTD. Invention is credited to Glenn Southwell Brown, Robert Dunne, Lloyd Douglas Graham, Lawrence Charles Lapointe, Peter Molloy, Susanne Pedersen, Graeme P. Young.
Publication Number | 20110160072 |
Application Number | 12/739559 |
Family ID | 40578971 |
Publication Date | 2011-06-30 |
United States Patent Application | 20110160072 |
Kind Code | A1 |
Lapointe; Lawrence Charles; et al. | June 30, 2011 |
METHOD OF DIAGNOSING NEOPLASMS
Abstract
The present invention relates generally to a nucleic acid
molecule, the RNA and protein expression profiles of which are
indicative of the onset, predisposition to the onset and/or
progression of a large intestine neoplasm. More particularly, the
present invention is directed to a nucleic acid molecule, the
expression profiles of which are indicative of the onset and/or
progression of a colorectal neoplasm, such as an adenoma or an
adenocarcinoma. The expression profiles of the present invention
are useful in a range of applications including, but not limited
to, those relating to the diagnosis and/or monitoring of colorectal
neoplasms, such as colorectal adenomas and adenocarcinomas.
Accordingly, in a related aspect the present invention is directed
to a method of screening a subject for the onset, predisposition to
the onset and/or progression of a large intestine neoplasm by
screening for modulation in the expression profile of said nucleic
acid molecule markers.
Inventors: |
Lapointe; Lawrence Charles (New South Wales, AU)
Dunne; Robert (New South Wales, AU)
Young; Graeme P. (South Australia, AU)
Molloy; Peter (New South Wales, AU)
Pedersen; Susanne (New South Wales, AU)
Brown; Glenn Southwell (New South Wales, AU)
Graham; Lloyd Douglas (New South Wales, AU)
Assignee: |
CLINICAL GENOMICS PTY. LTD., North Ryde, NSW, AU
COMMONWEALTH SCIENTIFIC AND INDUSTRIAL RESEARCH ORGANISATION, Campbell, Australian Capital Territory, AU
Family ID: | 40578971 |
Appl. No.: | 12/739559 |
Filed: | October 23, 2008 |
PCT Filed: | October 23, 2008 |
PCT NO: | PCT/AU2008/001569 |
371 Date: | November 9, 2010 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
60982114 | Oct 23, 2007 | |
Current U.S. Class: | 506/9; 506/17; 536/23.1 |
Current CPC Class: | C12Q 1/6886 20130101; C12Q 2600/16 20130101; G01N 2800/06 20130101; C12Q 2600/158 20130101; G01N 33/57446 20130101; C12Q 2600/112 20130101 |
Class at Publication: | 506/9; 506/17; 536/23.1 |
International Class: | C40B 30/04 20060101 C40B030/04; C40B 40/08 20060101 C40B040/08; C07H 21/00 20060101 C07H021/00 |
Claims
1. A method of screening for the onset or predisposition to the
onset of a large intestine neoplasm, or monitoring the progress of
a neoplasm, in an individual, said method comprising: (i) measuring
the level of expression of hCG_1815491 in a biological sample
from said individual wherein a higher level of expression of
hCG_1815491 or variant thereof relative to control levels is
indicative of a neoplastic large intestine cell or a cell
predisposed to the onset of a neoplastic state; or (ii) measuring
the level of expression of a gene comprising a sequence of
nucleotides as set forth in SEQ ID NO:1 or a sequence having at
least 90% similarity to SEQ ID NO:1 across the length of the gene,
or variant of SEQ ID NO:1, in a biological sample from said
individual wherein a higher level of expression of said gene or
variant thereof relative to control levels is indicative of a
neoplastic large intestine cell or a cell predisposed to the onset
of a neoplastic state.
2. (canceled)
3. The method according to claim 1, said method comprising measuring
the level of expression of one or more mRNA transcripts, which
transcripts comprise an RNA sequence selected from: (i) an RNA
sequence characterized by the sequence of one of: SEQ ID NO:21, or
a sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:21; SEQ ID NO:22, or a sequence
having at least 90% similarity across the length of the sequence,
or variant of SEQ ID NO:22; SEQ ID NO:23, or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:23; SEQ ID NO:24, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:24; SEQ ID NO:25, or a sequence having at least 90% similarity
across the length of the sequence, or variant of SEQ ID NO:25; SEQ
ID NO:26, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:26; SEQ ID NO:27,
or a sequence having at least 90% similarity across the length of
the sequence, or variant of SEQ ID NO:27; SEQ ID NO:28, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:28; SEQ ID NO:29, or a sequence
having at least 90% similarity across the length of the sequence,
or variant of SEQ ID NO:29; SEQ ID NO:30, or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:30; or SEQ ID NO:31, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:31; (ii) an RNA sequence characterized by the sequence of one of:
SEQ ID NO:21, or a sequence having at least 90% similarity across
the length of the sequence, or variant of SEQ ID NO:21; SEQ ID
NO:24, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:24; SEQ ID NO:27,
or a sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:27; SEQ ID NO:22, or a sequence
having at least 90% similarity across the length of the sequence,
or variant of SEQ ID NO:22; SEQ ID NO:23, or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:23; SEQ ID NO:30, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:30; SEQ ID NO:31, or a sequence having at least 90% similarity
across the length of the sequence, or variant of SEQ ID NO:31; SEQ
ID NO:25, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:25; or (iii) an RNA
sequence characterized by the sequence of one of: SEQ ID NO:21, or
a sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:21; SEQ ID NO:24, or a sequence
having at least 90% similarity across the length of the sequence,
or variant of SEQ ID NO:24; SEQ ID NO:27, or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:27; SEQ ID NO:22, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:22.
4. (canceled)
5. (canceled)
6. A method of screening for the onset or predisposition to the
onset of a large intestine neoplasm in an individual, said method
comprising measuring the level of expression of an RNA transcript,
which transcript comprises one or more exon segments selected from:
(i) an exon segment defined by SEQ ID NO:2, or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:2; (ii) an exon segment defined by SEQ ID NO:3, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:3; (iii) an exon segment defined
by SEQ ID NO:4, or a sequence having at least 90% similarity across
the length of the sequence, or variant of SEQ ID NO:4; (iv) an exon
segment defined by SEQ ID NO:5, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:5; (v) an exon segment defined by SEQ ID NO:6, or a sequence
having at least 90% similarity across the length of the sequence,
or variant of SEQ ID NO:6; (vi) an exon segment defined by SEQ ID
NO:7, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:7; (vii) an exon
segment defined by SEQ ID NO:8, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:8; (viii) an exon segment defined by SEQ ID NO:9, or a sequence
having at least 90% similarity across the length of the sequence,
or variant of SEQ ID NO:9; (ix) an exon segment defined by SEQ ID
NO:10, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:10; (x) an exon
segment defined by SEQ ID NO:11, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:11; (xi) an exon segment defined by SEQ ID NO:12, or a sequence
having at least 90% similarity across the length of the sequence,
or variant of SEQ ID NO:12; (xii) an exon segment defined by SEQ ID
NO:13, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:13; (xiii) an exon
segment defined by SEQ ID NO:14, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:14; (xiv) an exon segment defined by SEQ ID NO:15, or a sequence
having at least 90% similarity across the length of the sequence,
or variant of SEQ ID NO:15; (xv) an exon segment defined by SEQ ID
NO:16, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:16; (xvi) an exon
segment defined by SEQ ID NO:17, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:17; (xvii) an exon segment defined by SEQ ID NO:18, or a sequence
having at least 90% similarity across the length of the sequence,
or variant of SEQ ID NO:18; (xviii) an exon segment defined by SEQ
ID NO:19, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:19; or (xix) an
exon segment defined by SEQ ID NO:20, or a sequence having at least
90% similarity across the length of the sequence, or variant of SEQ
ID NO:20; in a biological sample from said individual wherein a
higher level of said RNA transcript or variant thereof relative to
control levels is indicative of a neoplastic large intestine cell
or a cell predisposed to the onset of a neoplastic state.
7. The method according to claim 6 wherein said transcript
comprises one or more exon segments selected from: (i) an exon
segment defined by SEQ ID NO:3, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:3; (ii) an exon segment defined by SEQ ID NO:4, or a sequence
having at least 90% similarity across the length of the sequence,
or variant of SEQ ID NO:4; (iii) an exon segment defined by SEQ ID
NO:5, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:5; (iv) an exon
segment defined by SEQ ID NO:6, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:6; (v) an exon segment defined by SEQ ID NO:7, or a sequence
having at least 90% similarity across the length of the sequence,
or variant of SEQ ID NO:7; (vi) an exon segment defined by SEQ ID
NO:8, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:8; (vii) an exon
segment defined by SEQ ID NO:9, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:9; (viii) an exon segment defined by SEQ ID NO:10, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:10; (ix) an exon segment defined
by SEQ ID NO:11, or a sequence having at least 90% similarity
across the length of the sequence, or variant of SEQ ID NO:11; (x)
an exon segment defined by SEQ ID NO:12, or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:12; (xi) an exon segment defined by SEQ ID NO:13, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:13; (xii) an exon segment defined
by SEQ ID NO:14, or a sequence having at least 90% similarity
across the length of the sequence, or variant of SEQ ID NO:14;
(xiii) an exon segment defined by SEQ ID NO:15, or a sequence
having at least 90% similarity across the length of the sequence,
or variant of SEQ ID NO:15; (xiv) an exon segment defined by SEQ ID
NO:18, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:18; or (xv) an exon
segment defined by SEQ ID NO:19, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:19.
8. The method according to claim 7 wherein said transcript is
selected from: (i) an RNA transcript which comprises each of the
exon segments defined by SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:10 and
SEQ ID NO:12, or a sequence having at least 90% similarity across
the length of these sequences, or variants of SEQ ID NO:5, SEQ ID
NO:6, SEQ ID NO:10 and SEQ ID NO:12; (ii) an RNA transcript which
comprises each of the exon segments defined by SEQ ID NO:5, SEQ ID
NO:6, SEQ ID NO:10 and SEQ ID NO:14, or a sequence having at least
90% similarity across the length of these sequences, or variants of
SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:10 and SEQ ID NO:14; (iii) an
RNA transcript which comprises each of the exon segments defined by
SEQ ID NO:3 and SEQ ID NO:6, or a sequence having at least 90%
similarity across the length of these sequences, or variants of SEQ
ID NO:3 and SEQ ID NO:6; (iv) an RNA transcript which comprises
each of the exon segments defined by SEQ ID NO:11, SEQ ID NO:12 and
SEQ ID NO:18, or a sequence having at least 90% similarity across
the length of these sequences, or variants of SEQ ID NO:11, SEQ ID
NO:12 and SEQ ID NO:18; (v) an RNA transcript which comprises each
of the exon segments defined by SEQ ID NO:4 and SEQ ID NO:7, or a
sequence having at least 90% similarity across the length of these
sequences, or variants of SEQ ID NO:4 and SEQ ID NO:7; (vi) an RNA
transcript which comprises each of the exon segments defined by SEQ
ID NO:6, SEQ ID NO:10 and SEQ ID NO:13, or a sequence having at
least 90% similarity across the length of these sequences, or
variants of SEQ ID NO:6, SEQ ID NO:10 and SEQ ID NO:13; (vii) an
RNA transcript which comprises each of the exon segments defined by
SEQ ID NO:6 and SEQ ID NO:8, or a sequence having at least 90%
similarity across the length of these sequences, or variants of SEQ
ID NO:6 and SEQ ID NO:8; (viii) an RNA transcript which comprises
each of the exon segments defined by SEQ ID NO:19 and SEQ ID NO:18,
or a sequence having at least 90% similarity across the length of
these sequences, or variants of SEQ ID NO:19 and SEQ ID NO:18; (ix)
an RNA transcript which comprises each of the exon segments defined
by SEQ ID NO:15 and SEQ ID NO:18, or a sequence having at least 90%
similarity across the length of these sequences, or variants of SEQ
ID NO:15 and SEQ ID NO:18; (x) an RNA transcript which comprises
each of the exon segments defined by SEQ ID NO:6 and SEQ ID NO:9,
or a sequence having at least 90% similarity across the length of
these sequences, or variants of SEQ ID NO:6 and SEQ ID NO:9; or
(xi) an RNA transcript which comprises each of the exon segments
defined by SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:10 and SEQ ID NO:12,
or a sequence having at least 90% similarity across the length of
these sequences, or variants of SEQ ID NO:4, SEQ ID NO:6, SEQ ID
NO:10 and SEQ ID NO:12.
9. The method according to claim 6 wherein said transcript
comprises one or more exon segments selected from: (i) an exon
segment defined by SEQ ID NO:5, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:5; (ii) an exon segment defined by SEQ ID NO:6, or a sequence
having at least 90% similarity across the length of the sequence,
or variant of SEQ ID NO:6; (iii) an exon segment defined by SEQ ID
NO:8, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:8; (iv) an exon
segment defined by SEQ ID NO:10, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:10; (v) an exon segment defined by SEQ ID NO:11, or a sequence
having at least 90% similarity across the length of the sequence,
or variant of SEQ ID NO:11; (vi) an exon segment defined by SEQ ID
NO:12, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:12; (vii) an exon
segment defined by SEQ ID NO:14, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:14; or (viii) an exon segment defined by SEQ ID NO:18, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:18.
10. The method according to claim 7 wherein said transcript is
selected from: (i) an RNA transcript which comprises each of the
exon segments defined by SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:10 and
SEQ ID NO:12, or a sequence having at least 90% similarity across
the length of these sequences, or variants of SEQ ID NO:5, SEQ ID
NO:6, SEQ ID NO:10 and SEQ ID NO:12; (ii) an RNA transcript which
comprises each of the exon segments defined by SEQ ID NO:5, SEQ ID
NO:6, SEQ ID NO:10 and SEQ ID NO:14, or a sequence having at least
90% similarity across the length of these sequences, or variants of
SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:10 and SEQ ID NO:14; (iii) an
RNA transcript which comprises each of the exon segments defined by
SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:18 and SEQ ID NO:24, or a
sequence having at least 90% similarity across the length of these
sequences, or variants of SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:18
and SEQ ID NO:24; or (iv) an RNA transcript which comprises each of
the exon segments defined by SEQ ID NO:6 and SEQ ID NO:8, or a
sequence having at least 90% similarity across the length of these
sequences, or variants of SEQ ID NO:6 and SEQ ID NO:8.
11. The method according to claim 1 or 6 wherein the exon segments
of said transcripts are spliced such that they are joined.
12. A method of screening for the onset or predisposition to the
onset of a large intestine neoplasm in an individual, said method
comprising measuring the level of expression of one or more RNA
transcripts, which transcripts comprise an RNA sequence
characterised by the sequence of one of: (i) SEQ ID NO:21 or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:21; (ii) SEQ ID NO:24 or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:24; (iii) SEQ ID NO:25 or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:25; (iv) SEQ ID NO:26 or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:26; (v) SEQ ID NO:27 or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:27; (vi) SEQ ID NO:29 or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:29; (vii) SEQ ID NO:30 or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:30; or (viii) SEQ ID NO:31 or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:31; in a biological sample from
said individual wherein a higher level of expression of the genes
or transcripts of group (i) and/or group (ii) relative to
background levels is indicative of a neoplastic cell or a cell
predisposed to the onset of a neoplastic state.
13. The method according to claim 12 wherein said transcripts
comprise an RNA sequence characterised by the sequence of one of:
(i) SEQ ID NO:21 or a sequence having at least 90% similarity
across the length of the sequence, or variant of SEQ ID NO:21; (ii)
SEQ ID NO:22 or a sequence having at least 90% similarity across
the length of the sequence, or variant of SEQ ID NO:22; (iii) SEQ
ID NO:23 or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:23; (iv) SEQ ID
NO:24 or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:24; or (v) SEQ ID
NO:27 or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:27; in a biological
sample from said individual wherein a higher level of expression of
the genes or transcripts of group (i) and/or group (ii) relative to
background levels is indicative of a neoplastic cell or a cell
predisposed to the onset of a neoplastic state.
14. The method according to claim 13 wherein said RNA sequences are
characterised by the sequence of either SEQ ID NO:21 or SEQ ID
NO:22.
15. The method according to any one of claims 1, 6 or 12 wherein
said level of expression is mRNA expression.
16. The method according to any one of claims 1, 6 or 12 wherein
said at least 90% similarity is 91%, 92%, 93%, 94%, 95%, 96%, 97%,
98% or 99%.
17. The method according to any one of claims 1, 6 or 12 wherein
said level of expression is assessed by analysing RNA expression or
protein expression.
18. (canceled)
19. The method according to any one of claims 1, 6 or 12 wherein
said control level is a non-neoplastic level.
20. The method according to any one of claims 1, 6 or 12 wherein
said neoplastic cell is an adenoma or adenocarcinoma.
21. The method according to claim 20 wherein said cell is a
colorectal cell.
22. The method according to any one of claims 1, 6 or 12 wherein
said individual is a human.
23. A molecular array, which array comprises: (i) nucleic acid
molecules comprising a nucleotide sequence corresponding to any one
or more of the hCG_1815491 sequences described in claim 1 or
6 or a sequence exhibiting at least 90% identity thereto or a
functional derivative, fragment, variant or homologue of said
nucleic acid molecule; or (ii) nucleic acid molecules comprising a
nucleotide sequence capable of hybridising to any one or more of
the sequences of (i) under medium stringency conditions or a
functional derivative, fragment, variant or homologue of said
nucleic acid molecule; or (iii) nucleic acid probes or
oligonucleotides comprising a nucleotide sequence capable of
hybridising to any one or more of the sequences of (i) under medium
stringency conditions or a functional derivative, fragment, variant
or homologue of said nucleic acid molecule; or (iv) probes capable
of binding to any one or more of the proteins encoded by the
nucleic acid molecules of (i) or a derivative, fragment or
homologue thereof wherein the level of expression of said marker
genes of (i) or proteins of (iv) is indicative of the neoplastic
state of a cell or cellular subpopulation derived from the large
intestine.
24. A diagnostic kit for assaying biological samples comprising an
agent for detecting one or more neoplastic markers as defined in
claim 1 or 6 and reagents useful for facilitating the detection of
said agent.
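By way of non-limiting illustration only, the "at least 90% similarity across the length of the sequence" criterion recited in the claims above might be checked computationally along the following lines. The sketch assumes that the two sequences have already been aligned to equal length (with "-" marking a gap) and that similarity is taken to mean the percentage of aligned positions carrying an identical residue; the specification may define similarity differently. The function names `percent_identity` and `meets_threshold`, and the example sequences, are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned positions at which both sequences match.

    Assumptions (illustrative only): the inputs are pre-aligned strings of
    equal length, '-' denotes a gap, and a gap never counts as a match.
    The denominator is the full alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the pair reaches the given percentage (90% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Hypothetical 10-residue sequences differing at one position.
    print(meets_threshold("ACGUACGUAC", "ACGUACGUAA"))
```

The `threshold` parameter can be raised to reflect the narrower percentages (91% to 99%) recited in claim 16.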
Description
FIELD OF THE INVENTION
[0001] The present invention relates generally to a nucleic acid
molecule, the RNA and protein expression profiles of which are
indicative of the onset, predisposition to the onset and/or
progression of a large intestine neoplasm. More particularly, the
present invention is directed to a nucleic acid molecule, the
expression profiles of which are indicative of the onset and/or
progression of a colorectal neoplasm, such as an adenoma or an
adenocarcinoma. The expression profiles of the present invention
are useful in a range of applications including, but not limited
to, those relating to the diagnosis and/or monitoring of colorectal
neoplasms, such as colorectal adenomas and adenocarcinomas.
[0002] Accordingly, in a related aspect the present invention is
directed to a method of screening a subject for the onset,
predisposition to the onset and/or progression of a large intestine
neoplasm by screening for modulation in the expression profile of
said nucleic acid molecule markers.
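By way of non-limiting illustration only, screening for a higher level of expression relative to control levels might be sketched as follows. The sketch assumes a normalised, linear-scale expression measurement for the test sample and a set of measurements from non-neoplastic controls. The function name `is_expression_elevated`, the use of the arithmetic mean of the controls as the reference, and the fold-change cut-off of 2.0 are all assumptions made for the example and are not taken from this specification.

```python
from statistics import mean


def is_expression_elevated(sample_level, control_levels, min_fold_change=2.0):
    """Return True if the sample level is at least `min_fold_change` times
    the mean control level.

    Assumptions (illustrative only): levels are positive, normalised,
    linear-scale values; the control reference is the arithmetic mean;
    the default cut-off of 2.0 is a placeholder, not a validated value.
    """
    if not control_levels:
        raise ValueError("at least one control level is required")
    control_mean = mean(control_levels)
    if control_mean <= 0:
        raise ValueError("control levels must have a positive mean")
    return sample_level / control_mean >= min_fold_change


if __name__ == "__main__":
    # Hypothetical values: the sample is four-fold above the control mean.
    print(is_expression_elevated(8.0, [2.0, 2.0, 2.0]))
```

In practice the reference level and cut-off would need to be established from appropriately matched control samples.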
BACKGROUND OF THE INVENTION
[0003] Bibliographic details of the publications referred to by
author in this specification are collected alphabetically at the
end of the description.
[0004] The reference in this specification to any prior publication
(or information derived from it), or to any matter which is known,
is not, and should not be taken as an acknowledgment or admission
or any form of suggestion that that prior publication (or
information derived from it) or known matter forms part of the
common general knowledge in the field of endeavour to which this
specification relates.
[0005] Adenomas are benign tumours, or neoplasms, of epithelial
origin which are derived from glandular tissue or exhibit clearly
defined glandular structures. Some adenomas show recognisable
tissue elements, such as fibrous tissue (fibroadenomas) and
epithelial structure, while others, such as bronchial adenomas,
produce active compounds that might give rise to clinical
syndromes.
[0006] Adenomas may progress to become an invasive neoplasm and are
then termed adenocarcinomas. Accordingly, adenocarcinomas are
defined as malignant epithelial tumours arising from glandular
structures, which are constituent parts of many organs of the body.
The term adenocarcinoma is also applied to tumours showing a
glandular growth pattern. These tumours may be sub-classified
according to the substances that they produce, for example mucus
secreting and serous adenocarcinomas, or to the microscopic
arrangement of their cells into patterns, for example papillary and
follicular adenocarcinomas. These carcinomas may be solid or cystic
(cystadenocarcinomas). Each organ may produce tumours showing a
variety of histological types, for example the ovary may produce
both mucinous adenocarcinoma and cystadenocarcinoma.
[0007] Adenomas in different organs behave differently. In general,
the overall chance of carcinoma being present within an adenoma
(i.e. a focus of cancer having developed within a benign lesion) is
approximately 5%. However, this is related to the size of the
adenoma. For instance, in the large bowel (colon and rectum
specifically) occurrence of a cancer within an adenoma is rare in
adenomas of less than 1 centimetre. Such a development is estimated
at 40 to 50% in adenomas which are greater than 4 centimetres and
show certain histopathological changes, such as villous change or
high grade dysplasia. Adenomas with higher degrees of dysplasia
have a higher incidence of carcinoma. In any given colorectal
adenoma, the predictors of the presence of cancer now or the future
occurrence of cancer in the organ include size (especially greater
than 9 mm), degree of change from tubular to villous morphology,
presence of high grade dysplasia and the morphological change
described as "serrated adenoma". In any given individual, the
additional features of increasing age, familial occurrence of
colorectal adenoma or cancer, male gender or multiplicity of
adenomas predict an increased future risk for cancer in the organ;
these are the so-called risk factors for cancer. Except for the
presence of adenomas and their size, none of these is objectively
defined and all those other than number and size are subject to
observer error and to confusion as to the precise definition of the
feature in question. Because such factors can be difficult to
assess and define, their value as predictors of current or future
risk for cancer is imprecise.
[0008] Once a sporadic adenoma has developed, the chance of a new
adenoma occurring is approximately 30% within 26 months.
[0009] Colorectal adenomas represent a class of adenomas which are
exhibiting an increasing incidence, particularly in more affluent
countries. The causes of adenoma, and of progression to
adenocarcinoma, are still the subject of intensive research. To
date it has been speculated that in addition to genetic
predisposition, environmental factors (such as diet) play a role in
the development of this condition. Most studies indicate that the
relevant environmental factors relate to high dietary fat, low
fibre, low vegetable intake, smoking, obesity, physical inactivity
and high intake of refined carbohydrates.
[0010] Colonic adenomas are localised areas of dysplastic
epithelium which initially involve just one or several crypts and
may not protrude from the surface, but with increased growth in
size, usually resulting from an imbalance in proliferation and/or
apoptosis, they may protrude. Adenomas can be classified in several
ways. One is by their gross appearance and the major descriptors
include degrees of protrusion: flat, sessile (i.e. protruding but
without a distinct stalk) or pedunculated (i.e. having a stalk).
Other gross descriptors include actual size in the largest
dimension and actual number in the colon/rectum. While small
adenomas (less than say 5 or 10 millimetres) exhibit a smooth tan
surface, pedunculated and especially larger adenomas tend to have a
cobblestone or lobulated red-brown surface. Larger sessile adenomas
may exhibit a more delicate villous surface. Another set of
descriptors includes the histopathological classification; the prime
descriptors of clinical value include degree of dysplasia (low or
high), whether or not a focus of invasive cancer is present, degree
of change from tubular gland formation to villous gland formation
(hence classification is tubular, villous or tubulovillous),
presence of admixed hyperplastic change and of so-called "serrated"
adenomas and their subgroups. Adenomas can be situated at any site in
the colon and/or rectum although they tend to be more common in the
rectum and distal colon. All of these descriptors, with the
exception of number and size, are relatively subjective and subject
to interobserver disagreement.
[0011] The various descriptive features of adenomas are of value
not just to ascertain the neoplastic status of any given adenoma
when detected, but also to predict a person's future risk of
developing colorectal adenomas or cancer. Those features of an
adenoma or number of adenomas in an individual that point to an
increased future risk for cancer or recurrence of new adenomas
include: size of the largest adenoma (especially 10 mm or larger),
degree of villous change (especially at least 25% such change and
particularly 100% such change), high grade dysplasia, number (3 or
more of any size or histological status) or presence of serrated
adenoma features. Only size and number are objective; all the
others are relatively subjective and subject to interobserver
disagreement. These predictors of risk for future neoplasia (hence
"risk") are vital in practice because they are used to determine
the need for, and frequency of, future colonoscopic surveillance.
More accurate risk classification might thus reduce the workload of
colonoscopy, make it more cost-effective and reduce the risk of
complications from unnecessary procedures.
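By way of non-limiting illustration only, the risk-predicting features listed in paragraph [0011] might be tabulated for a given individual as sketched below. The thresholds (10 mm or larger, at least 25% villous change, 3 or more adenomas) are those stated in that paragraph; the class name `AdenomaFindings`, its field names and the function `risk_features` are hypothetical. No weighting or overall risk score is computed, since none is given in the text.

```python
from dataclasses import dataclass


@dataclass
class AdenomaFindings:
    """Hypothetical record of colonoscopy and histopathology findings."""
    largest_size_mm: float        # size of the largest adenoma
    villous_percent: float        # degree of villous change, 0 to 100
    high_grade_dysplasia: bool
    adenoma_count: int            # adenomas of any size or histology
    serrated_features: bool


def risk_features(f: AdenomaFindings) -> list:
    """List which of the features named in paragraph [0011] are present."""
    present = []
    if f.largest_size_mm >= 10:
        present.append("largest adenoma 10 mm or larger")
    if f.villous_percent >= 25:
        present.append("villous change of at least 25%")
    if f.high_grade_dysplasia:
        present.append("high grade dysplasia")
    if f.adenoma_count >= 3:
        present.append("3 or more adenomas")
    if f.serrated_features:
        present.append("serrated adenoma features")
    return present


if __name__ == "__main__":
    # Hypothetical individual: three adenomas, the largest 12 mm.
    findings = AdenomaFindings(12, 10, False, 3, False)
    print(risk_features(findings))
```

Only the size and count fields are objective measurements; the remaining inputs carry the interobserver variability noted above.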
[0012] Adenomas are generally asymptomatic, therefore rendering
difficult their diagnosis and treatment at a stage prior to when
they might develop invasive characteristics and so become cancer.
It is technically impossible to predict the presence or absence of
carcinoma based on the gross appearance of adenomas, although
larger adenomas are more likely to show a region of malignant
change than are smaller adenomas. Sessile adenomas exhibit a higher
incidence of malignancy than pedunculated adenomas of the same
size. Some adenomas result in blood loss which might be observed or
detectable in the stools; while sometimes visible by eye, it is
often, when it occurs, microscopic or "occult". Larger adenomas
tend to bleed more than smaller adenomas. However, since blood in
the stool, whether overt or occult, can also be indicative of
non-adenomatous conditions, the accurate diagnosis of adenoma is
rendered difficult without the application of highly invasive
procedures such as colonoscopy combined with tissue acquisition by
either removal (i.e. polypectomy) or biopsy and subsequent
histopathological analysis.
[0013] Accordingly, there is an on-going need to elucidate the
causes of adenoma and to develop more informative diagnostic
protocols or aids to diagnosis that enable one to direct
colonoscopy at people more likely to have adenomas. These adenomas
may be high risk, advanced or neither of these. Furthermore, it can
be difficult after colonoscopy to be certain that all adenomas have
been removed, especially in a person who has had multiple adenomas.
An accurate screening test may minimise the need to undertake an
early second colonoscopy to ensure that the colon has been cleared
of neoplasms. Accordingly, the identification of molecular markers
for adenomas would provide means for understanding the cause of
adenomas and cancer, improving diagnosis of adenomas including
development of useful screening tests, elucidating the histological
stage of an adenoma, characterising a patient's future risk for
colorectal neoplasia on the basis of the molecular state of an
adenoma and facilitating treatment of adenomas.
[0014] To date, research has focused on the identification of gene
mutations which lead to the development of colorectal neoplasms. In
work leading up to the present invention, however, it has been
determined that changes in expression profiles of genes which may
also be expressed in healthy individuals are indicative of the
development of neoplasms of the large intestine, such as adenomas
and adenocarcinomas. More specifically, there has been identified a
gene, an increase in the expression of which is indicative of the
onset of a large intestine adenoma or adenocarcinoma. Yet more
particularly, it has been determined that this gene, which
comprises SEQ ID NO:1 and is herein called hCG_1815491,
encodes 18 identified exon segments, several of which are expressed
in two or more splice variant forms. hCG_1815491 has now
been found to be transcribed into at least 11 variant RNA transcript
forms. It has still further been determined that although the
levels of multiple transcribed forms of hCG_1815491 show some
level of increase in expression in the context of neoplasia
development, hCG_1815491 is, in fact, alternatively spliced
in a neoplasia-specific manner, thereby enabling a level of
diagnostic and prognostic discrimination which is rarely available
in the context of a single gene and has been unavailable in terms
of the diagnosis of colorectal neoplasias. The findings of the
present invention have therefore facilitated the development of a
screening method to diagnose the onset, or predisposition thereto,
of adenocarcinoma, adenoma and/or the monitoring of conditions
characterised by the development of these types of neoplasms.
SUMMARY OF THE INVENTION
[0015] Throughout this specification and the claims which follow,
unless the context requires otherwise, the word "comprise", and
variations such as "comprises" and "comprising", will be understood
to imply the inclusion of a stated integer or step or group of
integers or steps but not the exclusion of any other integer or
step or group of integers or steps.
[0016] As used herein, the term "derived from" shall be taken to
indicate that a particular integer or group of integers has
originated from the species specified, but has not necessarily been
obtained directly from the specified source. Further, as used
herein the singular forms of "a", "an" and "the" include plural
referents unless the context clearly dictates otherwise.
[0017] Unless otherwise defined, all technical and scientific terms
used herein have the same meaning as commonly understood by one of
ordinary skill in the art to which this invention belongs.
[0018] The subject specification contains amino acid and nucleotide
sequence information prepared using the programme PatentIn Version
3.4, presented herein after the bibliography. Each amino acid and
nucleotide sequence is identified in the sequence listing by the
numeric indicator <210> followed by the sequence identifier
(eg. <210>1, <210>2, etc). The length, type of sequence
(amino acid, DNA, etc.) and source organism for each sequence are
indicated by information provided in the numeric indicator fields
<211>, <212> and <213>, respectively. Amino acid
and nucleotide sequences referred to in the specification are
identified by the indicator SEQ ID NO: followed by the sequence
identifier (eg. SEQ ID NO:1, SEQ ID NO: 2, etc). The sequence
identifier referred to in the specification correlates to the
information provided in numeric indicator field <400> in the
sequence listing, which is followed by the sequence identifier (eg.
<400>1, <400>2, etc). That is SEQ ID NO: 1 as detailed
in the specification correlates to the sequence indicated as
<400>1 in the sequence listing.
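The correspondence between the numeric indicator fields and SEQ ID NO: identifiers described above can be sketched as a small parser. This is an illustrative sketch only: the function name `parse_listing`, the dictionary layout and the two short sample records are assumptions made for the example and are not taken from the actual sequence listing of this specification.

```python
import re

# Matches a numeric indicator such as "<210> 1" or "<212> DNA".
FIELD = re.compile(r"<(\d{3})>\s*(.*)")

def parse_listing(text):
    """Return {sequence identifier: {field code: value, 'residues': str}}."""
    records = {}
    current = None
    in_residues = False
    for raw in text.splitlines():
        line = raw.strip()
        match = FIELD.match(line)
        if match:
            code, value = match.group(1), match.group(2).strip()
            if code == "210":
                # <210> opens a new record keyed by the sequence identifier.
                current = {"210": value, "residues": ""}
                records[int(value)] = current
            elif current is not None:
                current[code] = value
            # Residue lines follow the <400> indicator.
            in_residues = code == "400"
        elif in_residues and current is not None and line:
            # Drop spaces and position counters, keep residue letters only.
            current["residues"] += re.sub(r"[^A-Za-z]", "", line)
    return records

# Hypothetical two-record listing, invented purely for illustration.
SAMPLE_LISTING = """
<210> 1
<211> 20
<212> DNA
<213> Homo sapiens
<400> 1
acgtacgtac gtacgtacgt   20
<210> 2
<211> 4
<212> PRT
<213> Homo sapiens
<400> 2
Met Ala Gly Ser
"""

records = parse_listing(SAMPLE_LISTING)
print(records[1]["212"], records[1]["213"], len(records[1]["residues"]))
```

In this sketch, SEQ ID NO:1 is simply the record whose <210> and <400> values are both 1, which mirrors the correlation stated in the paragraph above.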
[0019] One aspect of the present invention is directed to a method
of screening for the onset or predisposition to the onset of a
large intestine neoplasm in an individual, said method comprising
measuring the level of expression of hCG_1815491 in a
biological sample from said individual wherein a higher level of
expression of hCG_1815491 or variant thereof relative to
control levels is indicative of a neoplastic large intestine cell
or a cell predisposed to the onset of a neoplastic state.
[0020] The present invention more particularly provides a method of
screening for the onset or predisposition to the onset of a large
intestine neoplasm in an individual, said method comprising
measuring the level of expression of a gene comprising a sequence
of nucleotides as set forth in SEQ ID NO:1 or a sequence having at
least 90% similarity to SEQ ID NO:1 across the length of the gene,
or variant of SEQ ID NO:1, in a biological sample from said
individual wherein a higher level of expression of said gene or
variant thereof relative to control levels is indicative of a
neoplastic large intestine cell or a cell predisposed to the onset
of a neoplastic state.
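The "at least 90% similarity across the length" criterion can be illustrated with a simple calculation. The sketch below makes two simplifying assumptions that are not drawn from this specification: the two sequences have already been aligned to equal length (gaps written as "-"), and "similarity" is approximated by percent identity over the full alignment. The function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"  # a gap paired with a gap is not a match
    )
    return 100.0 * matches / len(aligned_a)

def meets_threshold(aligned_a, aligned_b, threshold=90.0):
    """True when identity across the whole alignment reaches the threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold

# Invented ten-residue example: one mismatch in ten positions.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
print(meets_threshold("ACGTACGTAC", "ACGTACGAAA"))
```

A production comparison would normally rely on an established alignment tool and the similarity definition given elsewhere in the specification; the sketch only shows the arithmetic of a whole-length threshold.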
[0021] Another aspect of the present invention provides a method of
screening for the onset or predisposition to the onset of a large
intestine neoplasm in an individual, said method comprising
measuring the level of expression of one or more RNA transcripts,
which transcripts comprise an RNA sequence characterised by the
sequence of one of: [0022] (i) SEQ ID NO:21, or a sequence having
at least 90% similarity across the length of the sequence, or
variant of SEQ ID NO:21; [0023] (ii) SEQ ID NO:22, or a sequence
having at least 90% similarity across the length of the sequence,
or variant of SEQ ID NO:22; [0024] (iii) SEQ ID NO:23, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:23; [0025] (iv) SEQ ID NO:24, or
a sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:24; [0026] (v) SEQ ID NO:25, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:25; [0027] (vi) SEQ ID NO:26, or
a sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:26; [0028] (vii) SEQ ID NO:27, or
a sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:27; [0029] (viii) SEQ ID NO:28,
or a sequence having at least 90% similarity across the length of
the sequence, or variant of SEQ ID NO:28; [0030] (ix) SEQ ID NO:29,
or a sequence having at least 90% similarity across the length of
the sequence, or variant of SEQ ID NO:29; [0031] (x) SEQ ID NO:30,
or a sequence having at least 90% similarity across the length of
the sequence, or variant of SEQ ID NO:30; [0032] (xi) SEQ ID NO:31,
or a sequence having at least 90% similarity across the length of
the sequence, or variant of SEQ ID NO:31 in a biological sample
from said individual wherein a higher level of said RNA transcript
or variant thereof relative to control levels is indicative of a
neoplastic large intestine cell or a cell predisposed to the onset
of a neoplastic state.
[0033] In still another aspect the RNA transcript, the level of
expression of which is assessed in accordance with the method of
the present invention, is one or more of the transcripts
characterised by the sequence of one of: [0034] (i) SEQ ID NO:21,
or a sequence having at least 90% similarity across the length of
the sequence, or variant of SEQ ID NO:21; [0035] (ii) SEQ ID NO:24,
or a sequence having at least 90% similarity across the length of
the sequence, or variant of SEQ ID NO:24; [0036] (iii) SEQ ID
NO:27, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:27; [0037] (iv) SEQ
ID NO:22, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:22; [0038] (v) SEQ
ID NO:23, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:23; [0039] (vi) SEQ
ID NO:30, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:30; [0040] (vii)
SEQ ID NO:31, or a sequence having at least 90% similarity across
the length of the sequence, or variant of SEQ ID NO:31; [0041]
(viii) SEQ ID NO:25, or a sequence having at least 90% similarity
across the length of the sequence, or variant of SEQ ID NO:25.
[0042] In yet another aspect said RNA transcript is one or more of
the transcripts characterised by the sequence of one of: [0043] (i)
SEQ ID NO:21, or a sequence having at least 90% similarity across
the length of the sequence, or variant of SEQ ID NO:21; [0044] (ii)
SEQ ID NO:24, or a sequence having at least 90% similarity across
the length of the sequence, or variant of SEQ ID NO:24; [0045]
(iii) SEQ ID NO:27, or a sequence having at least 90% similarity
across the length of the sequence, or variant of SEQ ID NO:27;
[0046] (iv) SEQ ID NO:22, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:22.
[0047] In a further aspect there is provided a method of screening
for the onset or predisposition to the onset of a large intestine
neoplasm in an individual, said method comprising measuring the
level of expression of an RNA transcript, which transcript
comprises one or more exon segments selected from: [0048] (i) an
exon segment defined by SEQ ID NO:2, or a sequence having at least
90% similarity across the length of the sequence, or variant of SEQ
ID NO:2; [0049] (ii) an exon segment defined by SEQ ID NO:3, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:3; [0050] (iii) an exon segment
defined by SEQ ID NO:4, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:4; [0051] (iv) an exon segment defined by SEQ ID NO:5, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:5; [0052] (v) an exon segment
defined by SEQ ID NO:6, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:6; [0053] (vi) an exon segment defined by SEQ ID NO:7, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:7; [0054] (vii) an exon segment
defined by SEQ ID NO:8, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:8; [0055] (viii) an exon segment defined by SEQ ID NO:9, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:9; [0056] (ix) an exon segment
defined by SEQ ID NO:10, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:10; [0057] (x) an exon segment defined by SEQ ID NO:11, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:11; [0058] (xi) an exon segment
defined by SEQ ID NO:12, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:12; [0059] (xii) an exon segment defined by SEQ ID NO:13, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:13; [0060] (xiii) an exon segment
defined by SEQ ID NO:14, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:14; [0061] (xiv) an exon segment defined by SEQ ID NO:15, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:15; [0062] (xv) an exon segment
defined by SEQ ID NO:16, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:16; [0063] (xvi) an exon segment defined by SEQ ID NO:17, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:17; [0064] (xvii) an exon segment
defined by SEQ ID NO:18, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:18; [0065] (xviii) an exon segment defined by SEQ ID NO:19, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:19; or [0066] (xix) an exon
segment defined by SEQ ID NO:20, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:20 in a biological sample from said individual wherein a higher
level of said RNA transcript or variant thereof relative to control
levels is indicative of a neoplastic large intestine cell or a cell
predisposed to the onset of a neoplastic state.
[0067] More particularly there is provided a method of screening
for the onset or predisposition to the onset of a large intestine
neoplasm in an individual, said method comprising measuring the
level of expression of an RNA transcript, which transcript
comprises one or more exon segments selected from: [0068] (i) an
exon segment defined by SEQ ID NO:3, or a sequence having at least
90% similarity across the length of the sequence, or variant of SEQ
ID NO:3; [0069] (ii) an exon segment defined by SEQ ID NO:4, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:4; [0070] (iii) an exon segment
defined by SEQ ID NO:5, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:5; [0071] (iv) an exon segment defined by SEQ ID NO:6, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:6; [0072] (v) an exon segment
defined by SEQ ID NO:7, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:7; [0073] (vi) an exon segment defined by SEQ ID NO:8, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:8; [0074] (vii) an exon segment
defined by SEQ ID NO:9, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:9; [0075] (viii) an exon segment defined by SEQ ID NO:10, or
a sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:10; [0076] (ix) an exon segment
defined by SEQ ID NO:11, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:11; [0077] (x) an exon segment defined by SEQ ID NO:12, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:12; [0078] (xi) an exon segment
defined by SEQ ID NO:13, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:13; [0079] (xii) an exon segment defined by SEQ ID NO:14, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:14; [0080] (xiii) an exon segment
defined by SEQ ID NO:15, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:15; [0081] (xiv) an exon segment defined by SEQ ID NO:18, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:18; or [0082] (xv) an exon segment
defined by SEQ ID NO:19, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:19 in a biological sample from said individual wherein a higher
level of said RNA transcript or variant thereof relative to control
levels is indicative of a neoplastic large intestine cell or a cell
predisposed to the onset of a neoplastic state.
[0083] Yet more particularly there is provided a method of
screening for the onset or predisposition to the onset of a large
intestine neoplasm in an individual, said method comprising
measuring the level of expression of an RNA transcript selected
from: [0084] (i) an RNA transcript which comprises each of the exon
segments defined by SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:10 and SEQ
ID NO:12, or a sequence having at least 90% similarity across the
length of these sequences, or variants of SEQ ID NO:5, SEQ ID NO:6,
SEQ ID NO:10 and SEQ ID NO:12; [0085] (ii) an RNA transcript which
comprises each of the exon segments defined by SEQ ID NO:5, SEQ ID
NO:6, SEQ ID NO:10 and SEQ ID NO:14, or a sequence having at least
90% similarity across the length of these sequences, or variants of
SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:10 and SEQ ID NO:14; [0086]
(iii) an RNA transcript which comprises each of the exon segments
defined by SEQ ID NO:3 and SEQ ID NO:6, or a sequence having at
least 90% similarity across the length of these sequences, or
variants of SEQ ID NO:3 and SEQ ID NO:6; [0087] (iv) an RNA
transcript which comprises each of the exon segments defined by SEQ
ID NO:11, SEQ ID NO:12 and SEQ ID NO:18, or a sequence having at
least 90% similarity across the length of these sequences, or
variants of SEQ ID NO:11, SEQ ID NO:12 and SEQ ID NO:18; [0088] (v)
an RNA transcript which comprises each of the exon segments defined
by SEQ ID NO:4 and SEQ ID NO:7, or a sequence having at least 90%
similarity across the length of these sequences, or variants of SEQ
ID NO:4 and SEQ ID NO:7; [0089] (vi) an RNA transcript which
comprises each of the exon segments defined by SEQ ID NO:6, SEQ ID
NO:10 and SEQ ID NO:13, or a sequence having at least 90%
similarity across the length of these sequences, or variants of SEQ
ID NO:6, SEQ ID NO:10 and SEQ ID NO:13; [0090] (vii) an RNA
transcript which comprises each of the exon segments defined by SEQ
ID NO:6 and SEQ ID NO:8, or a sequence having at least 90%
similarity across the length of these sequences, or variants of SEQ
ID NO:6 and SEQ ID NO:8; [0091] (viii) an RNA transcript which
comprises each of the exon segments defined by SEQ ID NO:19 and SEQ
ID NO:18, or a sequence having at least 90% similarity across the
length of these sequences, or variants of SEQ ID NO:19 and SEQ ID
NO:18; [0092] (ix) an RNA transcript which comprises each of the
exon segments defined by SEQ ID NO:15 and SEQ ID NO:18, or a
sequence having at least 90% similarity across the length of these
sequences, or variants of SEQ ID NO:15 and SEQ ID NO:18; [0093] (x)
an RNA transcript which comprises each of the exon segments defined
by SEQ ID NO:6 and SEQ ID NO:9, or a sequence having at least 90%
similarity across the length of these sequences, or variants of SEQ
ID NO:6 and SEQ ID NO:9; or [0094] (xi) an RNA transcript which
comprises each of the exon segments defined by SEQ ID NO:4, SEQ ID
NO:6, SEQ ID NO:10 and SEQ ID NO:12, or a sequence having at least
90% similarity across the length of these sequences, or variants of
SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:10 and SEQ ID NO:12 in a
biological sample from said individual wherein a higher level of
said RNA transcript or variant thereof relative to control levels
is indicative of a neoplastic large intestine cell or a cell
predisposed to the onset of a neoplastic state.
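The eleven exon-segment combinations listed in paragraphs [0084] to [0094] can be written as a lookup of required SEQ ID NOs. The following is a minimal sketch: the labels i to xi and the SEQ ID NO sets are copied from the list above, while the function name and the idea of representing a transcript as a set of detected exon SEQ ID NOs are assumptions for illustration. The sketch ignores the "at least 90% similarity" and "variant" alternatives.

```python
# Required exon segments (by SEQ ID NO) for each listed RNA transcript.
EXON_SETS = {
    "i": {5, 6, 10, 12},
    "ii": {5, 6, 10, 14},
    "iii": {3, 6},
    "iv": {11, 12, 18},
    "v": {4, 7},
    "vi": {6, 10, 13},
    "vii": {6, 8},
    "viii": {19, 18},
    "ix": {15, 18},
    "x": {6, 9},
    "xi": {4, 6, 10, 12},
}

def matching_definitions(exons_present):
    """Labels whose required exon segments are all present in the transcript."""
    present = set(exons_present)
    return [label for label, required in EXON_SETS.items() if required <= present]

# A transcript found to contain exon segments SEQ ID NO:5, 6, 10 and 12.
print(matching_definitions([5, 6, 10, 12]))
```

Because each definition uses "comprises each of", a transcript carrying additional exon segments can satisfy more than one definition, which is why the function returns a list rather than a single label.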
[0095] Still more particularly there is provided a method of
screening for the onset or predisposition to the onset of a large
intestine neoplasm in an individual, said method comprising
measuring the level of expression of an RNA transcript, which
transcript is selected from: [0096] (i) an RNA transcript which
comprises each of the exon segments defined by SEQ ID NO:5, SEQ ID
NO:6, SEQ ID NO:10 and SEQ ID NO:12, or a sequence having at least
90% similarity across the length of these sequences, or variants of
SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:10 and SEQ ID NO:12; [0097]
(ii) an RNA transcript which comprises each of the exon segments
defined by SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:10 and SEQ ID NO:14,
or a sequence having at least 90% similarity across the length of
these sequences, or variants of SEQ ID NO:5, SEQ ID NO:6, SEQ ID
NO:10 and SEQ ID NO:14; [0098] (iii) an RNA transcript which
comprises each of the exon segments defined by SEQ ID NO:11, SEQ ID
NO:12, SEQ ID NO:18 and SEQ ID NO:24, or a sequence having at least
90% similarity across the length of these sequences, or variants of
SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:18 and SEQ ID NO:24; or
[0099] (iv) an RNA transcript which comprises each of the exon
segments defined by SEQ ID NO:6 and SEQ ID NO:8, or a sequence
having at least 90% similarity across the length of these
sequences, or variants of SEQ ID NO:6 and SEQ ID NO:8 in a
biological sample from said individual wherein a higher level of
said RNA transcript or variant thereof relative to control levels
is indicative of a neoplastic large intestine cell or a cell
predisposed to the onset of a neoplastic state.
[0100] In another further aspect, there is therefore provided a
method of screening for the onset or predisposition to the onset of
a large intestine neoplasm in an individual, said method comprising
measuring the level of expression of one or more RNA transcripts,
which transcripts comprise an RNA sequence characterised by the
sequence of one of: [0101] (i) SEQ ID NO:21 or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:21; [0102] (ii) SEQ ID NO:24 or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:24; [0103] (iii) SEQ ID NO:25 or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:25; [0104] (iv) SEQ ID NO:26 or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:26; [0105] (v) SEQ ID NO:27 or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:27; [0106] (vi) SEQ ID NO:29 or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:29; [0107] (vii) SEQ ID NO:30 or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:30; or [0108] (viii) SEQ ID NO:31 or a sequence having
at least 90% similarity across the length of the sequence, or
variant of SEQ ID NO:31; in a biological sample from said
individual wherein a higher level of expression of the genes or
transcripts of group (i) and/or group (ii) relative to background
levels is indicative of a neoplastic cell or a cell predisposed to
the onset of a neoplastic state.
[0109] In yet another aspect said transcripts comprise an RNA
sequence characterised by the sequence of one of: [0110] (i) SEQ ID
NO:21 or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:21; [0111] (ii) SEQ
ID NO:22 or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:22; [0112] (iii)
SEQ ID NO:23 or a sequence having at least 90% similarity across
the length of the sequence, or variant of SEQ ID NO:23; [0113] (iv)
SEQ ID NO:24 or a sequence having at least 90% similarity across
the length of the sequence, or variant of SEQ ID NO:24; or [0114]
(v) SEQ ID NO:27 or a sequence having at least 90% similarity
across the length of the sequence, or variant of SEQ ID NO:27; in a
biological sample from said individual wherein a higher level of
expression of the genes or transcripts of group (i) and/or group
(ii) relative to background levels is indicative of a neoplastic
cell or a cell predisposed to the onset of a neoplastic state.
[0115] Still another aspect of the present invention provides a
diagnostic kit for assaying biological samples comprising an agent
for detecting one or more neoplastic marker reagents useful for
facilitating the detection by the agent in the first compartment.
Further means may also be included, for example, to receive a
biological sample. The agent may be any suitable detecting
molecule.
BRIEF DESCRIPTION OF THE DRAWINGS
[0116] FIG. 1. Detection of hCG_1815491 gene expression. The
expression from hCG_1815491 in colon tissue specimens from
222 non-diseased controls (black, area designated with an "N"), 42
colitis tissues (red, area designated by an "I"), 29 adenoma (green,
area designated by an "A") and 161 adenocarcinoma (blue, area
designated by "Ca") was measured by hybridization to Affymetrix
probeset IDs 238021_s_at (A) and 238022_at (B). The two Affymetrix
probeset IDs were included on the commercially available Affymetrix
GeneChip HGU133A & HGU133B. Gene expression profiles from RNA
extracted from the total of 454 colon tissue specimens were
obtained from GeneLogic Inc (Gaithersburg, Md. USA). A quality
control analysis was performed to remove arrays not meeting
essential quality control measures as defined by the manufacturer.
Transcript expression levels were calculated by both Microarray
Suite (MAS) 5.0 (Affymetrix) and the Robust Multichip Average (RMA)
normalization techniques (Affymetrix. GeneChip expression data
analysis fundamentals. Affymetrix, Santa Clara, Calif. USA, 2001;
Hubbell et al. Bioinformatics, 18:1585-1592, 2002; Irizarry et al.
Nucleic Acids Research, 31, 2003). MAS-normalized data were used for
performing standard quality control routines and the final data set
was normalized with RMA for all subsequent analyses.
[0117] FIG. 2. Detection of SEQ ID NO:1 expression in 71 colorectal
tissue specimens. The expression of SEQ ID NO:1 in a total of 71
colorectal specimens from 30 non-diseased controls ("normals"), 21
adenoma and 21 adenocarcinoma subjects was measured by end-point
PCR using the forward and reverse oligonucleotide primers
5'-TAACTGGAATTCATGTTGGCTGAAATTCATCCCA (SEQ ID NO:89) and
5'-CACGATAAGCTTTTATTATAGTCTATAAACAGGAATACCCAAAACATATTTAAACC (SEQ
ID NO:90). The resulting PCR products were separated by
agarose-based gel electrophoresis.
[0118] FIG. 3. Measurements of SEQ ID NO:1 RNA concentration levels
in colorectal tissue specimens. Quantitative Real-Time PCR, using
forward and reverse oligonucleotide primers,
5'-TAACTGGAATTCATGTTGGCTGAAATTCATCCCA (SEQ ID NO:91) and
5'-CACGATAAGCTTTTATTATAGTCTATAAACAGGAATACCCAAAACATATTTAAACC (SEQ ID
NO:92), was performed
on RNA extracted from a total of 71 colorectal specimens from 30
non-diseased controls (white), 21 adenoma (striped) and 21
adenocarcinoma (black) subjects. Relative expression levels were
calculated as described in Example 1.
[0119] FIG. 4. Schematic representation of predicted RNA variants
derived from hCG_1815491. cDNA clones derived from map region
8579310 to 8562303 (SEQ ID NO:1) on human chromosome 16 were used
to locate exon sequences. Arrows: Oligonucleotide primer sets
(Table 5) were designed to allow measurement of individual RNA
variants by PCR. Oligonucleotide primers covering splice junctions
are shown as spanning intron sequences, which are not included in the
actual oligonucleotide primer sequence. Exon nucleotide sequence
and genomic locations are given in FIGS. 22 and 23. The
relationship of exon "E" numbering and SEQ ID NO. numbering is
further defined in Table 1.
[0120] FIG. 5. Example of differential expression of
hCG_1815491 RNA variants in colorectal tissue specimens. The
expression of the ten predicted RNA transcripts derived from the
map region 8579310 to 8562303 on the strand of chromosome 16 was
measured by end-point PCR using specific oligonucleotide primer
sets (Table 5). DNA sequencing of the resulting PCR amplicons
confirmed the products to be derivatives of SEQ ID NO:21, SEQ ID
NO:22, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ
ID NO:28, SEQ ID NO:29, SEQ ID NO:30 and SEQ ID NO:31 (Table
5).
[0121] FIG. 6. Measurement of SEQ ID NO:21 RNA concentration levels
in colorectal tissue specimens. Quantitative Real-Time PCR, using
forward oligonucleotide primer, 5'-ACACGGCTTTCCGGAGTAGA (SEQ ID
NO:93), and reverse oligonucleotide primer,
5'-AACAGGTTTTACCTCCTTATCTTCAGAA (SEQ ID NO:94), was performed on
RNA extracted from a total of 71 colorectal tissue specimens from
30 non-diseased controls (white), 21 adenoma (striped) and 20
adenocarcinoma (black) subjects. SEQ ID NO:21 RNA expression levels
are depicted relative to HPRT as explained in Example 1.
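Expression of a target transcript relative to a reference transcript in quantitative Real-Time PCR is commonly calculated from the difference in threshold cycle (Ct) values. The sketch below uses that common delta-Ct approach as an assumption; the calculation actually used is the one described in Example 1, which may differ. The function names, the default amplification efficiency of 2.0 and the sample values are hypothetical.

```python
from statistics import mean

def relative_expression(ct_target, ct_reference, efficiency=2.0):
    """Target level relative to the reference transcript: efficiency ** -(delta Ct)."""
    return efficiency ** -(ct_target - ct_reference)

def fold_change(sample_values, control_values):
    """Ratio of mean relative expression in samples to that in controls."""
    return mean(sample_values) / mean(control_values)

# Invented Ct values: the target crosses threshold 3 cycles after the reference.
print(relative_expression(25.0, 22.0))

# Invented relative expression values for neoplastic and control specimens.
print(fold_change([0.5, 0.5], [0.125, 0.125]))
```

Under these assumptions a fold change above 1 corresponds to the "higher level of expression relative to control levels" referred to in the method.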
[0122] FIG. 7. Identification of a novel RNA variant derived from
SEQ ID NO:1. End-point PCR, using a forward oligonucleotide primer,
5'-GGCGGAGGAGAGGTGAGC (SEQ ID NO:95), spanning the junction
between SEQ ID NO:4 and SEQ ID NO:5 and a reverse oligonucleotide
primer, 5'-GCTGACAGCATCCAAATGTATTATG (SEQ ID NO:96), hybridizing
to SEQ ID NO:6, was performed on RNA extracted from colorectal
tissue specimens from 2 non-diseased controls (Ctrl), 3 adenoma and
3 adenocarcinoma subjects. The resulting PCR products were
separated by agarose-based gel electrophoresis and the products
observed in the neoplastic tissue samples were sequenced, which
confirmed the novel splicing of SEQ ID NO:4, SEQ ID NO:5 and SEQ ID
NO: 6 (Table 5).
[0123] FIG. 8. Measurement of expression of individual target
regions in SEQ ID NO:1. The level of RNA hybridization to 13
Affymetrix probesets, Table 3, residing in the map region 8579310
to 8562303 was measured using the Affymetrix GeneChip HuGene Exon
1.0 as recommended by the manufacturer. RNA was extracted from colon
tissue specimens from 5 non-diseased controls (left bar in
boxplots), 5 adenoma (middle bar in boxplots) and 5 adenocarcinoma
subjects (right bar in boxplots). The individual boxplots are also
given in FIGS. 9-21. The relationship of exon "E" numbering and SEQ
ID NO. numbering is further defined in Table 1.
[0124] FIG. 9. Measurement of RNA expression on Affymetrix GeneChip
HuGene Exon 1.0 probeset ID 3692527 (referred to as Probeset A in
FIG. 8) targeting map region 8577230 to 8576913 of SEQ ID NO:1.
Expression profiles were obtained from hybridisation analysis of
RNA extracted from colon tissue specimens from 5 non-diseased
controls, 5 adenoma and 5 adenocarcinoma subjects as further
described in Example 1.
[0125] FIG. 10. Measurement of RNA expression on Affymetrix
GeneChip HuGene Exon 1.0 probeset ID 3692526 (referred to as
Probeset B in FIG. 8) targeting map region 8576785 to 8576609 of
SEQ ID NO:1. Expression profiles were obtained from hybridisation
analysis of RNA extracted from colon tissue specimens from 5
non-diseased controls, 5 adenoma and 5 adenocarcinoma subjects as
further described in Example 1.
[0126] FIG. 11. Measurement of RNA expression on Affymetrix
GeneChip HuGene Exon 1.0 probeset ID 3692525 (referred to as
Probeset C in FIG. 8) targeting map region 8573317 to 8573214 of
SEQ ID NO:1. Expression profiles were obtained from hybridisation
analysis of RNA extracted from colon tissue specimens from 5
non-diseased controls, 5 adenoma and 5 adenocarcinoma subjects as
further described in Example 1.
[0127] FIG. 12. Measurement of RNA expression on Affymetrix
GeneChip HuGene Exon 1.0 probeset ID 3692524 (referred to as
Probeset D in FIG. 8) targeting map region 8571756 to 8571721 of
SEQ ID NO:1. Expression profiles were obtained from hybridisation
analysis of RNA extracted from colon tissue specimens from 5
non-diseased controls, 5 adenoma and 5 adenocarcinoma subjects as
further described in Example 1.
[0128] FIG. 13. Measurement of RNA expression on Affymetrix
GeneChip HuGene Exon 1.0 probeset ID 3692522 (referred to as
Probeset E in FIG. 8) targeting map region 8568480 to 8568447 of
SEQ ID NO:1. Expression profiles were obtained from hybridisation
analysis of RNA extracted from colon tissue specimens from 5
non-diseased controls, 5 adenoma and 5 adenocarcinoma subjects as
further described in Example 1.
[0129] FIG. 14. Measurement of RNA expression on Affymetrix
GeneChip HuGene Exon 1.0 probeset ID 3692521 (referred to as
Probeset F in FIG. 8) targeting map region 8568438 to 8568409 of
SEQ ID NO:1. Expression profiles were obtained from hybridisation
analysis of RNA extracted from colon tissue specimens from 5
non-diseased controls, 5 adenoma and 5 adenocarcinoma subjects as
further described in Example 1.
[0130] FIG. 15. Measurement of RNA expression on Affymetrix
GeneChip HuGene Exon 1.0 probeset ID 3692504 (referred to as
Probeset G in FIG. 8) targeting map region 8566289 to 8566014 of
SEQ ID NO:1. Expression profiles were obtained from hybridisation
analysis of RNA extracted from colon tissue specimens from 5
non-diseased controls, 5 adenoma and 5 adenocarcinoma subjects as
further described in Example 1.
[0131] FIG. 16. Measurement of RNA expression on Affymetrix
GeneChip HuGene Exon 1.0 probeset ID 3692505 (referred to as
Probeset H in FIG. 8) targeting map region 8577467 to 8577374 of
SEQ ID NO:1. Expression profiles were obtained from hybridisation
analysis of RNA extracted from colon tissue specimens from 5
non-diseased controls, 5 adenoma and 5 adenocarcinoma subjects as
further described in Example 1.
[0132] FIG. 17. Measurement of RNA expression on Affymetrix
GeneChip HuGene Exon 1.0 probeset ID 3692523 (referred to as
Probeset I in FIG. 8) targeting map region 8569323 to 8568689 of
SEQ ID NO:1. Expression profiles were obtained from hybridisation
analysis of RNA extracted from colon tissue specimens from 5
non-diseased controls, 5 adenoma and 5 adenocarcinoma subjects as
further described in Example 1.
[0133] FIG. 18. Measurement of RNA expression on Affymetrix
GeneChip HuGene Exon 1.0 probeset ID 3692520 (referred to as
Probeset J in FIG. 8) targeting map region 8568331 to 8567516 of
SEQ ID NO:1. Expression profiles were obtained from hybridisation
analysis of RNA extracted from colon tissue specimens from 5
non-diseased controls, 5 adenoma and 5 adenocarcinoma subjects as
further described in Example 1.
[0134] FIG. 19. Measurement of RNA expression on Affymetrix
GeneChip HuGene Exon 1.0 probeset ID 3692519 (referred to as
Probeset K in FIG. 8) targeting map region 8567301 to 8567162 of
SEQ ID NO:1. Expression profiles were obtained from hybridisation
analysis of RNA extracted from colon tissue specimens from 5
non-diseased controls, 5 adenoma and 5 adenocarcinoma subjects as
further described in Example 1.
[0135] FIG. 20. Measurement of RNA expression on Affymetrix
GeneChip HuGene Exon 1.0 probeset ID 3692517 (referred to as
Probeset L in FIG. 8) targeting map region 8567033 to 8566994 of
SEQ ID NO:1. Expression profiles were obtained from hybridisation
analysis of RNA extracted from colon tissue specimens from 5
non-diseased controls, 5 adenoma and 5 adenocarcinoma subjects as
further described in Example 1.
[0136] FIG. 21. Measurement of RNA expression on Affymetrix
GeneChip HuGene Exon 1.0 probeset ID 3692518 (referred to as
Probeset M in FIG. 8) targeting map region 8567158 to 8567091 of
SEQ ID NO:1. Expression profiles were obtained from hybridisation
analysis of RNA extracted from colon tissue specimens from 5
non-diseased controls, 5 adenoma and 5 adenocarcinoma subjects as
further described in Example 1.
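The probeset identifiers and map regions quoted for FIGS. 9 to 21 can be collected in one place and checked against the map region quoted for SEQ ID NO:1. The following Python sketch is illustrative only and is not part of the original disclosure. It assumes that the quoted coordinates are inclusive plus-strand positions; the names PROBESETS, span_length and within_gene_region are hypothetical.

```python
# Probeset letter -> (probeset ID, map region start, map region end), as quoted
# in the descriptions of FIGS. 9 to 21. Coordinates are treated as inclusive
# plus-strand positions, which is an assumption of this sketch.
PROBESETS = {
    "A": (3692527, 8577230, 8576913),
    "B": (3692526, 8576785, 8576609),
    "C": (3692525, 8573317, 8573214),
    "D": (3692524, 8571756, 8571721),
    "E": (3692522, 8568480, 8568447),
    "F": (3692521, 8568438, 8568409),
    "G": (3692504, 8566289, 8566014),
    "H": (3692505, 8577467, 8577374),
    "I": (3692523, 8569323, 8568689),
    "J": (3692520, 8568331, 8567516),
    "K": (3692519, 8567301, 8567162),
    "L": (3692517, 8567033, 8566994),
    "M": (3692518, 8567158, 8567091),
}

GENE_REGION = (8579310, 8562303)  # map region quoted for SEQ ID NO:1


def span_length(start: int, end: int) -> int:
    """Number of positions in an inclusive interval given in either order."""
    return abs(start - end) + 1


def within_gene_region(start: int, end: int) -> bool:
    """True if both interval ends fall inside the SEQ ID NO:1 map region."""
    low, high = min(GENE_REGION), max(GENE_REGION)
    return low <= min(start, end) and max(start, end) <= high


if __name__ == "__main__":
    for letter, (probeset_id, start, end) in PROBESETS.items():
        print(letter, probeset_id, span_length(start, end),
              within_gene_region(start, end))
```

Under these assumptions, each of the 13 quoted target regions falls inside the map region 8579310 to 8562303.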
[0137] FIG. 22. SEQ ID NO:1 is specified by a 17,008 nucleotide
sequence located on the minus strand of human chromosome 16 in the
map region 8579310 to 8562303 (+ strand nomenclature) as specified
by the NCBI contig ref: NT_010498.15|Hs16_10655, NCBI
36 March 2006 genome. Grey shading indicates location of nucleotide
segments, i.e. exons, utilised in the RNA variants further
described in FIG. 23.
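The stated length of 17,008 nucleotides is consistent with the quoted map region when both end coordinates are counted. The following Python sketch illustrates that arithmetic and a possible conversion from plus-strand coordinates to positions within SEQ ID NO:1. It is not part of the original disclosure and assumes inclusive coordinates, with position 1 of SEQ ID NO:1 taken to correspond to plus-strand coordinate 8579310 because the sequence lies on the minus strand. The function names are hypothetical.

```python
# Illustrative sketch: plus-strand map coordinates -> positions in SEQ ID NO:1.
# Assumptions (not stated in the source): coordinates are inclusive, and
# position 1 of SEQ ID NO:1 is plus-strand coordinate 8579310.

REGION_HIGH = 8579310  # plus-strand coordinate taken as position 1
REGION_LOW = 8562303   # plus-strand coordinate taken as the last position


def region_length(high: int, low: int) -> int:
    """Length of an inclusive coordinate interval."""
    return high - low + 1


def to_seq_position(plus_coord: int) -> int:
    """Convert a plus-strand coordinate to a 1-based minus-strand position."""
    if not REGION_LOW <= plus_coord <= REGION_HIGH:
        raise ValueError("coordinate lies outside the SEQ ID NO:1 map region")
    return REGION_HIGH - plus_coord + 1


if __name__ == "__main__":
    print(region_length(REGION_HIGH, REGION_LOW))
    # Probeset A (FIG. 9) targets map region 8577230 to 8576913.
    print(to_seq_position(8577230), to_seq_position(8576913))
```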
[0138] FIG. 23. SEQ ID NO: 2 to SEQ ID NO: 20 identified to be
alternatively spliced to generate the 10 RNA variants depicted in
FIG. 4.
[0139] FIG. 24. Nucleotide sequences targeted by Affymetrix
probeset ID 238021_s_at and probeset ID 238022_at.
DETAILED DESCRIPTION OF THE INVENTION
[0140] The present invention is predicated, in part, on the
elucidation of a gene expression profile, specifically that of
hCG_1815491, which characterises large intestine cellular
populations in terms of their neoplastic state. This finding has
now facilitated the development of routine means of screening for
the onset or predisposition to the onset of a large intestine
neoplasm based on screening for upregulation of the expression of
this molecule, relative to control expression levels. To this end,
in addition to assessing expression levels of hCG_1815491
relative to normal or non-neoplastic levels, it has been determined
that hCG_1815491 is alternatively spliced in a neoplastic
specific manner, thereby enabling a high level of
discrimination.
[0141] In accordance with the present invention, it has been
determined that hCG_1815491 is modulated, in terms of
differential changes to its levels of expression, depending on
whether the cell expressing that gene is neoplastic or not. It
should be understood that reference to a gene "expression product"
or "expression of a gene" is a reference to either a transcription
product (such as primary RNA or mRNA) or a translation product such
as protein. This gene and its expression products, whether they be
RNA transcripts or encoded proteins, are collectively referred to
as the "neoplastic marker".
[0142] Accordingly, one aspect of the present invention is directed
to a method of screening for the onset or predisposition to the
onset of a large intestine neoplasm in an individual, said method
comprising measuring the level of expression of hCG_1815491
in a biological sample from said individual wherein a higher level
of expression of hCG_1815491 or variant thereof relative to
control levels is indicative of a neoplastic large intestine cell
or a cell predisposed to the onset of a neoplastic state.
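The comparison of a measured expression level against control levels can be sketched as follows. This Python example is purely illustrative and is not the method of the invention: the use of the mean of the control measurements as the baseline, the fold-change cut-off parameter min_fold, and the function name is_elevated are all assumptions introduced here, and the numeric values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def is_elevated(sample_level: float,
                control_levels: Sequence[float],
                min_fold: float = 1.0) -> bool:
    """Return True if the sample level exceeds the control baseline.

    The baseline is the mean of the control measurements (an assumption of
    this sketch). min_fold is a hypothetical cut-off: the sample must exceed
    min_fold times the baseline to count as a higher level of expression.
    """
    if not control_levels:
        raise ValueError("at least one control measurement is required")
    baseline = mean(control_levels)
    if baseline <= 0:
        raise ValueError("control baseline must be positive")
    return sample_level / baseline > min_fold


if __name__ == "__main__":
    controls = [2.0, 2.0, 2.0]          # hypothetical control expression levels
    print(is_elevated(8.0, controls))   # sample above the control baseline
    print(is_elevated(1.5, controls))   # sample below the control baseline
```

In practice the choice of baseline statistic and cut-off would depend on the assay and on the control population used.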
[0143] Reference to "large intestine" should be understood as a
reference to a cell derived from one of the six anatomical regions
of the large intestine, which regions commence after the terminal
region of the ileum, these being: [0144] (i) the cecum; [0145] (ii)
the ascending colon; [0146] (iii) the transverse colon; [0147] (iv)
the descending colon; [0148] (v) the sigmoid colon; and [0149] (vi)
the rectum.
[0150] Reference to "neoplasm" should be understood as a reference
to a lesion, tumour or other encapsulated or unencapsulated mass or
other form of growth which comprises neoplastic cells. A
"neoplastic cell" should be understood as a reference to a cell
exhibiting abnormal growth. The term "growth" should be understood
in its broadest sense and includes reference to proliferation. In
this regard, an example of abnormal cell growth is the uncontrolled
proliferation of a cell. Another example is failed apoptosis in a
cell, thus prolonging its usual life span. The neoplastic cell may
be a benign cell or a malignant cell. In a preferred embodiment,
the subject neoplasm is an adenoma or an adenocarcinoma. Without
limiting the present invention to any one theory or mode of action,
an adenoma is generally a benign tumour of epithelial origin which
is either derived from epithelial tissue or exhibits clearly
defined epithelial structures. These structures may take on a
glandular appearance. An adenoma can comprise a malignant cell
population, such as occurs with the progression of a benign
adenoma to a malignant adenocarcinoma.
[0151] Preferably, said neoplastic cell is an adenoma or
adenocarcinoma and even more preferably a colorectal adenoma or
adenocarcinoma.
[0152] Reference to "hCG_1815491" and its transcribed and
translated expression products should be understood as a reference
to all forms of this gene and to fragments thereof. As would be
appreciated by the person of skill in the art, genes are known to
exhibit allelic or polymorphic variation between individuals.
Accordingly, reference to "hCG_1815491" should be understood
to extend to such variants which, in terms of the present
diagnostic applications, achieve the same outcome despite the fact
that minor genetic variations between the actual nucleic acid
sequences may exist between individuals. Reference to "variants"
should also be understood to extend to alternative transcriptional
forms of hCG_1815491, such as splice variants or variants
which otherwise exhibit variation to exon expression and
arrangement, such as in terms of multiple exon combinations or
alternate 5'- or 3'-ends. The present invention should therefore be
understood to extend to all forms of RNA (e.g. mRNA, primary RNA
transcript, miRNA, etc), cDNA and peptide isoforms which arise from
alternative splicing or any other mutation, polymorphic or allelic
variation. It should also be understood to include reference to any
subunit polypeptides such as precursor forms which may be
generated, whether existing as a monomer, multimer, fusion protein
or other complex.
[0153] Without limiting the present invention to any one theory or
mode of action, the hCG_1815491 genomic sequence comprises
SEQ ID NO:1. The SEQ ID NO:1 nucleic acid molecule has been
determined to generate at least 18 alternatively spliced exon
segments, as follows:
(i) Exon segment E1 which is defined by SEQ ID NO:2;
(ii) Exon segment E2 which is defined by SEQ ID NO:3;
(iii) Exon segment E2a which is defined by SEQ ID NO:4;
(iv) Exon segment E2b which is defined by SEQ ID NO:5;
(v) Exon segment E3 which is defined by SEQ ID NO:6;
(vi) Exon segment E3a which is defined by SEQ ID NO:7;
(vii) Exon segment E4 which is defined by SEQ ID NO:8;
(viii) Exon segment E5 which is defined by SEQ ID NO:9;
(ix) Exon segment E5a which is defined by SEQ ID NO:10;
(x) Exon segment E5b which is defined by SEQ ID NO:11;
(xi) Exon segment E6 which is defined by SEQ ID NO:12;
(xii) Exon segment E6a which is defined by SEQ ID NO:13;
(xiii) Exon segment E6c which is defined by SEQ ID NO:14;
(xiv) Exon segment E6d which is defined by SEQ ID NO:15;
(xv) Exon segment E6e which is defined by SEQ ID NO:16;
(xvi) Exon segment E7 which is defined by SEQ ID NO:17;
(xvii) Exon segment E7a which is defined by SEQ ID NO:18;
(xviii) Exon segment UE6/7 which is defined by SEQ ID NO:19;
(xix) Exon segment E8 which is defined by SEQ ID NO:20.
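For bookkeeping purposes, the correspondence between exon segment names and SEQ ID NOs listed above can be held in a simple lookup table. The following Python sketch is illustrative only; the names EXON_SEGMENTS and seq_id_for are hypothetical and do not appear in the original disclosure.

```python
# Exon segment name -> SEQ ID NO, transcribed from items (i) to (xix) above.
EXON_SEGMENTS = {
    "E1": 2, "E2": 3, "E2a": 4, "E2b": 5,
    "E3": 6, "E3a": 7, "E4": 8,
    "E5": 9, "E5a": 10, "E5b": 11,
    "E6": 12, "E6a": 13, "E6c": 14, "E6d": 15, "E6e": 16,
    "E7": 17, "E7a": 18, "UE6/7": 19, "E8": 20,
}


def seq_id_for(exon_name: str) -> str:
    """Return the SEQ ID NO label for a named exon segment."""
    return f"SEQ ID NO:{EXON_SEGMENTS[exon_name]}"


if __name__ == "__main__":
    print(len(EXON_SEGMENTS))   # number of enumerated exon segments
    print(seq_id_for("E5a"))
```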
[0154] SEQ ID NO:1 has at least 8 putative exon segments (SEQ ID
NO:2, SEQ ID NO:3, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:9, SEQ ID
NO:12, SEQ ID NO:17, SEQ ID NO:20) of which several are
alternatively spliced. It has been still further determined that
from this genomic structure there are transcribed at least 11
different RNA transcripts which each comprise one of the sequences
depicted in SEQ ID NOs:21-31, Table 1 and are schematically
depicted in FIG. 4. It would be appreciated that the sequences
which are depicted in SEQ ID NOs:21-31 take the form of DNA since
they have been assembled using SEQ ID NO:1. However, the RNA
transcripts which are generated either in vivo or in vitro would be
characterised by comprising a corresponding sequence, albeit in RNA
form.
[0155] Accordingly, in terms of the method of the present
invention, screening for the "level of expression" of
hCG_1815491 may be achieved in a variety of ways including
screening for any of the forms of RNA transcribed from
hCG_1815491, cDNA generated therefrom or a protein expression
product. Changes to the levels of any of these products are
indicative of changes to the expression of the subject gene. Still
further, the molecule which is identified and measured may be a
whole molecule or a fragment thereof. For example, one is more
likely to identify only fragments of RNA or protein molecules in a
stool sample. However, provided that said fragment comprises
sufficient sequence to indicate that its origin with the
hCG_1815491 gene is more likely than not (such as one or more
of the exon segments or exons detailed above), fragmented
hCG_1815491 molecules are useful in the context of the method
of the present invention. For example, the identification of RNA
transcripts corresponding to one or more of the exon segments
herein defined, alone or in combination, is a useful means of
screening for changes to hCG_1815491 expression.
[0156] The present invention therefore more particularly provides a
method of screening for the onset or predisposition to the onset of
a large intestine neoplasm in an individual, said method comprising
measuring the level of expression of a gene comprising a sequence
of nucleotides as set forth in SEQ ID NO:1 or a sequence having at
least 90% similarity to SEQ ID NO:1 across the length of the gene,
or variant of SEQ ID NO:1, in a biological sample from said
individual wherein a higher level of expression of said gene or
variant thereof relative to control levels is indicative of a
neoplastic large intestine cell or a cell predisposed to the onset
of a neoplastic state.
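As a rough illustration of a 90% threshold applied across the length of a sequence, the following Python sketch computes percentage identity between two sequences. It is a deliberate simplification and not the comparison procedure of the invention: it assumes the two sequences are already aligned, are of equal length and contain no gaps. The function names and the example sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences match.

    Simplifying assumption: the sequences are pre-aligned and ungapped, so
    position i of seq_a is compared directly with position i of seq_b.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be of equal length in this sketch")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_threshold(seq_a: str, seq_b: str, threshold: float = 90.0) -> bool:
    """True if the percentage identity is at least the given threshold."""
    return percent_identity(seq_a, seq_b) >= threshold


if __name__ == "__main__":
    reference = "ACGTACGTAC"   # hypothetical sequence, not from the listing
    candidate = "ACGTACGTAA"   # differs from the reference at one position
    print(percent_identity(reference, candidate))
    print(meets_threshold(reference, candidate))
```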
[0157] Reference to "gene" herein should be understood as a
reference to any genomic locus or set of loci which give rise to
RNA transcripts from one or more promoters, including transcripts
formed by the splicing of two or more exons as hereinbefore
described. It would be appreciated that not all RNA transcripts are
necessarily translated to a protein expression product.
[0158] In one embodiment of the present invention, said
hCG_1815491 expression levels are assessed by screening for
the levels of expression of one or more of the RNA transcripts
which are generated from the SEQ ID NO:1 genomic sequence.
[0159] Accordingly, in accordance with this embodiment there is
provided a method of screening for the onset or predisposition to
the onset of a large intestine neoplasm in an individual, said
method comprising measuring the level of expression of one or more
RNA transcripts, which transcripts comprise an RNA sequence
characterised by the sequence of one of: [0160] (i) SEQ ID NO:21,
or a sequence having at least 90% similarity across the length of
the sequence, or variant of SEQ ID NO:21; [0161] (ii) SEQ ID NO:22,
or a sequence having at least 90% similarity across the length of
the sequence, or variant of SEQ ID NO:22; [0162] (iii) SEQ ID
NO:23, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:23; [0163] (iv) SEQ
ID NO:24, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:24; [0164] (v) SEQ
ID NO:25, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:25; [0165] (vi) SEQ
ID NO:26, or a sequence having at least 90% similarity across the
length of the sequence, or variant of SEQ ID NO:26; [0166] (vii)
SEQ ID NO:27, or a sequence having at least 90% similarity across
the length of the sequence, or variant of SEQ ID NO:27; [0167]
(viii) SEQ ID NO:28, or a sequence having at least 90% similarity
across the length of the sequence, or variant of SEQ ID NO:28;
[0168] (ix) SEQ ID NO:29, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:29; [0169] (x) SEQ ID NO:30, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:30; [0170] (xi) SEQ ID NO:31, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:31 in a biological sample from said individual wherein a higher
level of said RNA transcript or variant thereof relative to control
levels is indicative of a neoplastic large intestine cell or a cell
predisposed to the onset of a neoplastic state.
[0171] Reference to said RNA transcript being "characterised by"
the sequence of any one of SEQ ID NOs:21-31 should be understood to
mean that the subject RNA transcript comprises a corresponding RNA
form of the DNA sequence information which is depicted in SEQ ID
NOs:21-31. That is, each of the DNA nucleotides depicted in these
sequences should be replaced with the corresponding RNA version of
that nucleotide.
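The replacement of each DNA nucleotide with its RNA counterpart amounts to substituting uracil (U) for thymine (T) while leaving A, C and G unchanged. A minimal Python sketch follows; the function name dna_to_rna and the example sequence are hypothetical and are not taken from the sequence listing.

```python
def dna_to_rna(dna: str) -> str:
    """Return the RNA form of a DNA sequence by replacing T with U.

    Letter case is preserved; all other characters are left unchanged.
    """
    return dna.translate(str.maketrans("Tt", "Uu"))


if __name__ == "__main__":
    print(dna_to_rna("ATGCTT"))  # hypothetical fragment
```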
[0172] Preferably, the RNA transcript, the level of expression of
which is assessed in accordance with the method of the present
invention, is one or more of the transcripts characterised by the
sequence of one of: [0173] (i) SEQ ID NO:21, or a sequence having
at least 90% similarity across the length of the sequence, or
variant of SEQ ID NO:21; [0174] (ii) SEQ ID NO:24, or a sequence
having at least 90% similarity across the length of the sequence,
or variant of SEQ ID NO:24; [0175] (iii) SEQ ID NO:27, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:27; [0176] (iv) SEQ ID NO:22, or
a sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:22; [0177] (v) SEQ ID NO:23, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:23; [0178] (vi) SEQ ID NO:30, or
a sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:30; [0179] (vii) SEQ ID NO:31, or
a sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:31; [0180] (viii) SEQ ID NO:25,
or a sequence having at least 90% similarity across the length of
the sequence, or variant of SEQ ID NO:25.
[0181] Even more preferably, said RNA transcript is one or more of
the transcripts characterised by the sequence of one of: [0182] (i)
SEQ ID NO:21, or a sequence having at least 90% similarity across
the length of the sequence, or variant of SEQ ID NO:21; [0183] (ii)
SEQ ID NO:24, or a sequence having at least 90% similarity across
the length of the sequence, or variant of SEQ ID NO:24; [0184]
(iii) SEQ ID NO:27, or a sequence having at least 90% similarity
across the length of the sequence, or variant of SEQ ID NO:27;
[0185] (iv) SEQ ID NO:22, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:22.
[0186] Most preferably, said RNA transcript is characterised by SEQ
ID NO:2.
[0187] In accordance with these aspects of the present invention,
one may screen for the RNA transcript itself or for an expression
product translated from said RNA transcript.
[0188] It should be understood that one may choose to screen for
any one or more of said transcripts in a single sample of
interest.
[0189] As detailed hereinbefore, hCG_1815491 has been
determined to comprise 18 alternatively spliced exon segments which
give rise to at least 11 RNA transcripts. It has now been
determined that screening for the expression of one or more of the
exon segments themselves is indicative of the neoplastic state of
the individual in issue. It has still further been determined that
the identification of certain combinations of these exons is
particularly useful in this regard. To this end, it should be
appreciated that the specific exon combinations which are
hereinafter discussed may, in some RNA transcripts, have been
spliced such that they are joined. In other transcripts, the
subject exons may not be joined to one another but may be
positioned, relative to one another, either proximally or distally
along the transcript.
[0190] According to this embodiment there is therefore provided a
method of screening for the onset or predisposition to the onset of
a large intestine neoplasm in an individual, said method comprising
measuring the level of expression of an RNA transcript, which
transcript comprises one or more exon segments selected from:
[0191] (i) an exon segment defined by SEQ ID NO:2, or a sequence
having at least 90% similarity across the length of the sequence,
or variant of SEQ ID NO:2; [0192] (ii) an exon segment defined by
SEQ ID NO:3, or a sequence having at least 90% similarity across
the length of the sequence, or variant of SEQ ID NO:3; [0193] (iii)
an exon segment defined by SEQ ID NO:4, or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:4; [0194] (iv) an exon segment defined by SEQ ID NO:5,
or a sequence having at least 90% similarity across the length of
the sequence, or variant of SEQ ID NO:5; [0195] (v) an exon segment
defined by SEQ ID NO:6, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:6; [0196] (vi) an exon segment defined by SEQ ID NO:7, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:7; [0197] (vii) an exon segment
defined by SEQ ID NO:8, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:8; [0198] (viii) an exon segment defined by SEQ ID NO:9, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:9; [0199] (ix) an exon segment
defined by SEQ ID NO:10, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:10; [0200] (x) an exon segment defined by SEQ ID NO:11, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:11; [0201] (xi) an exon segment
defined by SEQ ID NO:12, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:12; [0202] (xii) an exon segment defined by SEQ ID NO:13, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:13; [0203] (xiii) an exon segment
defined by SEQ ID NO:14, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:14; [0204] (xiv) an exon segment defined by SEQ ID NO:15, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:15; [0205] (xv) an exon segment
defined by SEQ ID NO:16, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:16; [0206] (xvi) an exon segment defined by SEQ ID NO:17, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:17; [0207] (xvii) an exon segment
defined by SEQ ID NO:18, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:18; [0208] (xviii) an exon segment defined by SEQ ID NO:19, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:19; or [0209] (xix) an exon
segment defined by SEQ ID NO:20, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:20 in a biological sample from said individual wherein a higher
level of said RNA transcript or variant thereof relative to control
levels is indicative of a neoplastic large intestine cell or a cell
predisposed to the onset of a neoplastic state.
[0210] More particularly there is provided a method of screening
for the onset or predisposition to the onset of a large intestine
neoplasm in an individual, said method comprising measuring the
level of expression of an RNA transcript, which transcript
comprises one or more exon segments selected from: [0211] (i) an
exon segment defined by SEQ ID NO:3, or a sequence having at least
90% similarity across the length of the sequence, or variant of SEQ
ID NO:3; [0212] (ii) an exon segment defined by SEQ ID NO:4, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:4; [0213] (iii) an exon segment
defined by SEQ ID NO:5, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:5; [0214] (iv) an exon segment defined by SEQ ID NO:6, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:6; [0215] (v) an exon segment
defined by SEQ ID NO:7, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:7; [0216] (vi) an exon segment defined by SEQ ID NO:8, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:8; [0217] (vii) an exon segment
defined by SEQ ID NO:9, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:9; [0218] (viii) an exon segment defined by SEQ ID NO:10, or
a sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:10; [0219] (ix) an exon segment
defined by SEQ ID NO:11, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:11; [0220] (x) an exon segment defined by SEQ ID NO:12, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:12; [0221] (xi) an exon segment
defined by SEQ ID NO:13, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:13; [0222] (xii) an exon segment defined by SEQ ID NO:14, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:14; [0223] (xiii) an exon segment
defined by SEQ ID NO:15, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:15; [0224] (xiv) an exon segment defined by SEQ ID NO:18, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:18; or [0225] (xv) an exon segment
defined by SEQ ID NO:19, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:19 in a biological sample from said individual wherein a higher
level of said RNA transcript or variant thereof relative to control
levels is indicative of a neoplastic large intestine cell or a cell
predisposed to the onset of a neoplastic state.
[0226] Yet more particularly there is provided a method of
screening for the onset or predisposition to the onset of a large
intestine neoplasm in an individual, said method comprising
measuring the level of expression of an RNA transcript selected
from: [0227] (i) an RNA transcript which comprises each of the exon
segments defined by SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:10 and SEQ
ID NO:12, or a sequence having at least 90% similarity across the
length of these sequences, or variants of SEQ ID NO:5, SEQ ID NO:6,
SEQ ID NO:10 and SEQ ID NO:12; [0228] (ii) an RNA transcript which
comprises each of the exon segments defined by SEQ ID NO:5, SEQ ID
NO:6, SEQ ID NO:10 and SEQ ID NO:14, or a sequence having at least
90% similarity across the length of these sequences, or variants of
SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:10 and SEQ ID NO:14; [0229]
(iii) an RNA transcript which comprises each of the exon segments
defined by SEQ ID NO:3 and SEQ ID NO:6, or a sequence having at
least 90% similarity across the length of these sequences, or
variants of SEQ ID NO:3 and SEQ ID NO:6; [0230] (iv) an RNA
transcript which comprises each of the exon segments defined by SEQ
ID NO:11, SEQ ID NO:12 and SEQ ID NO:18, or a sequence having at
least 90% similarity across the length of these sequences, or
variants of SEQ ID NO:11, SEQ ID NO:12 and SEQ ID NO:18; [0231] (v)
an RNA transcript which comprises each of the exon segments defined
by SEQ ID NO:4 and SEQ ID NO:7, or a sequence having at least 90%
similarity across the length of these sequences, or variants of SEQ
ID NO:4 and SEQ ID NO:7; [0232] (vi) an RNA transcript which
comprises each of the exon segments defined by SEQ ID NO:6, SEQ ID
NO:10 and SEQ ID NO:13, or a sequence having at least 90%
similarity across the length of these sequences, or variants of SEQ
ID NO:6, SEQ ID NO:10 and SEQ ID NO:13; [0233] (vii) an RNA
transcript which comprises each of the exon segments defined by SEQ
ID NO:6 and SEQ ID NO:8, or a sequence having at least 90%
similarity across the length of these sequences, or variants of SEQ
ID NO:6 and SEQ ID NO:8; [0234] (viii) an RNA transcript which
comprises each of the exon segments defined by SEQ ID NO:19 and SEQ
ID NO:18, or a sequence having at least 90% similarity across the
length of these sequences, or variants of SEQ ID NO:19 and SEQ ID
NO:18; [0235] (ix) an RNA transcript which comprises each of the
exon segments defined by SEQ ID NO:15 and SEQ ID NO:18, or a
sequence having at least 90% similarity across the length of these
sequences, or variants of SEQ ID NO:15 and SEQ ID NO:18; [0236] (x)
an RNA transcript which comprises each of the exon segments defined
by SEQ ID NO:6 and SEQ ID NO:9, or a sequence having at least 90%
similarity across the length of these sequences, or variants of SEQ
ID NO:6 and SEQ ID NO:9; or [0237] (xi) an RNA transcript which
comprises each of the exon segments defined by SEQ ID NO:4, SEQ ID
NO:6, SEQ ID NO:10 and SEQ ID NO:12, or a sequence having at least
90% similarity across the length of these sequences, or variants of
SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:10 and SEQ ID NO:12 in a
biological sample from said individual wherein a higher level of
said RNA transcript or variant thereof relative to control levels
is indicative of a neoplastic large intestine cell or a cell
predisposed to the onset of a neoplastic state.
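The exon segment combinations enumerated in items (i) to (xi) above lend themselves to a simple membership check. The following Python sketch is illustrative only and is not part of the original disclosure. It assumes that the exon segments detected in a transcript have already been reduced to a set of SEQ ID NO integers, and it ignores the 90% similarity and variant provisions. The names COMBINATIONS and matching_combinations are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List

# Item label -> SEQ ID NOs of the exon segments the transcript must comprise,
# transcribed from items (i) to (xi) of the preceding paragraph.
COMBINATIONS: Dict[str, FrozenSet[int]] = {
    "i": frozenset({5, 6, 10, 12}),
    "ii": frozenset({5, 6, 10, 14}),
    "iii": frozenset({3, 6}),
    "iv": frozenset({11, 12, 18}),
    "v": frozenset({4, 7}),
    "vi": frozenset({6, 10, 13}),
    "vii": frozenset({6, 8}),
    "viii": frozenset({19, 18}),
    "ix": frozenset({15, 18}),
    "x": frozenset({6, 9}),
    "xi": frozenset({4, 6, 10, 12}),
}


def matching_combinations(detected: Iterable[int]) -> List[str]:
    """Return labels of combinations fully contained in the detected set."""
    detected_set = set(detected)
    return [label for label, combo in COMBINATIONS.items()
            if combo <= detected_set]


if __name__ == "__main__":
    # Hypothetical transcript in which SEQ ID NOs 5, 6, 10 and 12 were detected.
    print(matching_combinations({5, 6, 10, 12}))
```

A transcript may satisfy more than one item at once; for example, a detected set containing SEQ ID NOs 6, 8 and 9 satisfies both item (vii) and item (x).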
[0238] In a further aspect there is provided a method of screening
for the onset or predisposition to the onset of a large intestine
neoplasm in an individual, said method comprising measuring the
level of expression of an RNA transcript, which transcript
comprises one or more exon segments selected from: [0239] (i) an
exon segment defined by SEQ ID NO:5, or a sequence having at least
90% similarity across the length of the sequence, or variant of SEQ
ID NO:5; [0240] (ii) an exon segment defined by SEQ ID NO:6, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:6; [0241] (iii) an exon segment
defined by SEQ ID NO:8, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:8; [0242] (iv) an exon segment defined by SEQ ID NO:10, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:10; [0243] (v) an exon segment
defined by SEQ ID NO:11, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:11; [0244] (vi) an exon segment defined by SEQ ID NO:12, or a
sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:12; [0245] (vii) an exon segment
defined by SEQ ID NO:14, or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:14; or [0246] (viii) an exon segment defined by SEQ ID NO:18, or
a sequence having at least 90% similarity across the length of the
sequence, or variant of SEQ ID NO:18 in a biological sample from
said individual wherein a higher level of said RNA transcript or
variant thereof relative to control levels is indicative of a
neoplastic large intestine cell or a cell predisposed to the onset
of a neoplastic state.
[0247] Still more particularly there is provided a method of
screening for the onset or predisposition to the onset of a large
intestine neoplasm in an individual, said method comprising
measuring the level of expression of an RNA transcript, which
transcript is selected from: [0248] (i) an RNA transcript which
comprises each of the exon segments defined by SEQ ID NO:5, SEQ ID
NO:6, SEQ ID NO:10 and SEQ ID NO:12, or a sequence having at least
90% similarity across the length of these sequences, or variants of
SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:10 and SEQ ID NO:12; [0249]
(ii) an RNA transcript which comprises each of the exon segments
defined by SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:10 and SEQ ID NO:14,
or a sequence having at least 90% similarity across the length of
these sequences, or variants of SEQ ID NO:5, SEQ ID NO:6, SEQ ID
NO:10 and SEQ ID NO:14; [0250] (iii) an RNA transcript which
comprises each of the exon segments defined by SEQ ID NO:11, SEQ ID
NO:12, SEQ ID NO:18 and SEQ ID NO:24, or a sequence having at least
90% similarity across the length of these sequences, or variants of
SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:18 and SEQ ID NO:24; or
[0251] (iv) an RNA transcript which comprises each of the exon
segments defined by SEQ ID NO:6 and SEQ ID NO:8, or a sequence
having at least 90% similarity across the length of these
sequences, or variants of SEQ ID NO:6 and SEQ ID NO:8; in a
biological sample from said individual wherein a higher level of
said RNA transcript or variant thereof relative to control levels
is indicative of a neoplastic large intestine cell or a cell
predisposed to the onset of a neoplastic state.
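The four exon-segment combinations recited in [0248] to [0251] can be illustrated with a minimal Python sketch. The names `TRANSCRIPT_EXON_SETS` and `matching_groups` are hypothetical, and the sketch assumes that the set of exon segments detected in a transcript (identified by SEQ ID NO) has already been determined upstream. It models presence or absence only; neither the 90% similarity criterion nor the comparison of expression level against control levels is represented.

```python
# Exon segments (by SEQ ID NO) that a transcript must comprise for each
# group (i)-(iv) recited in the text. The dictionary name is hypothetical.
TRANSCRIPT_EXON_SETS = {
    "i": {5, 6, 10, 12},
    "ii": {5, 6, 10, 14},
    "iii": {11, 12, 18, 24},
    "iv": {6, 8},
}


def matching_groups(detected_exons):
    """Return the groups whose required exon segments are all present."""
    detected = set(detected_exons)
    return [
        group
        for group, required in TRANSCRIPT_EXON_SETS.items()
        if required <= detected  # subset test: every required exon detected
    ]


if __name__ == "__main__":
    print(matching_groups([5, 6, 10, 12, 14]))
```

A transcript in which SEQ ID NOs 5, 6, 10, 12 and 14 were all detected would satisfy both group (i) and group (ii) under this sketch.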
[0252] In yet still another aspect, the exon segments of said
transcripts are spliced such that they are joined.
[0253] With regard to the issue of sequence similarity (also
referred to as "identity"), terms used to describe sequence
relationships between two or more polynucleotides include
"reference sequence", "comparison window", "sequence similarity",
"sequence identity", "percentage of sequence similarity",
"percentage of sequence identity", "substantially similar" and
"substantial identity". A "reference sequence" is at least 12 but
frequently 15 to 18 and often at least 25 or above, such as 30
monomer units in length. Because two polynucleotides may each
comprise (1) a sequence (i.e. only a portion of the complete
polynucleotide sequence) that is similar between the two
polynucleotides, and (2) a sequence that is divergent between the
two polynucleotides, sequence comparisons between two (or more)
polynucleotides are typically performed by comparing sequences of
the two polynucleotides over a "comparison window" to identify and
compare local regions of sequence similarity. A "comparison window"
refers to a conceptual segment of typically 12 contiguous residues
that is compared to a reference sequence. The comparison window may
comprise additions or deletions (i.e. gaps) of about 20% or less as
compared to the reference sequence (which does not comprise
additions or deletions) for optimal alignment of the two sequences.
Optimal alignment of sequences for aligning a comparison window may
be conducted by computerized implementations of algorithms (GAP,
BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software
Package Release 7.0, Genetics Computer Group, 575 Science Drive
Madison, Wis., USA) or by inspection and the best alignment (i.e.
resulting in the highest percentage homology over the comparison
window) generated by any of the various methods selected. Reference
also may be made to the BLAST family of programs as for example
disclosed by Altschul et al. (Nucl. Acids Res. 25: 3389, 1997). A
detailed discussion of sequence analysis can be found in Unit 19.3
of Ausubel et al. ("Current Protocols in Molecular Biology" John
Wiley & Sons Inc, Chapter 15, 1994-1998). A range of other
algorithms may be used to compare the nucleotide and amino acid
sequences such as but not limited to PILEUP, CLUSTALW, SEQUENCHER
or VectorNTI.
[0254] The terms "sequence similarity" and "sequence identity" as
used herein refer to the extent that sequences are identical or
functionally or structurally similar on a nucleotide-by-nucleotide
basis over a window of comparison. Thus, a "percentage of sequence
identity", for example, is calculated by comparing two optimally
aligned sequences over the window of comparison, determining the
number of positions at which the identical nucleic acid base (e.g.
A, T, C, G, I) occurs in both sequences to yield the number of
matched positions, dividing the number of matched positions by the
total number of positions in the window of comparison (i.e., the
window size), and multiplying the result by 100 to yield the
percentage of sequence identity. For the purposes of the present
invention, "sequence identity" will be understood to mean the
"match percentage" calculated by the DNASIS computer program
(Version 2.5 for Windows; available from Hitachi Software
Engineering Co., Ltd., South San Francisco, Calif., USA) using
standard defaults as used in the reference manual accompanying the
software. Similar comments apply in relation to sequence
similarity.
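The percentage calculation described in [0254] can be sketched in Python as follows. This is an illustration only: it assumes the two sequences have already been optimally aligned over the window of comparison, with gaps written as "-", and the function names `percent_identity` and `meets_threshold` are hypothetical. It is not the DNASIS "match percentage" implementation.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Matched positions divided by window size, multiplied by 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # A position counts as matched only if both sequences carry the same
    # base; a gap character never counts as a match.
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the identity is at least `threshold` percent (90% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```

With nine matched positions in a window of ten, the two example sequences sit exactly at the 90% level recited for the subject sequences.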
[0255] As detailed above, and more specifically, nucleic acid
sequence identities (homologies) may be evaluated using any of the
variety of sequence comparison algorithms and programs known in the
art. The extent of sequence identity (homology) may be determined
using any computer program and associated parameters, including
those described herein, such as BLAST 2.2.2 or FASTA version
3.0t78, with the default parameters. For example, the sequence
comparison algorithm is a BLAST version algorithm. In one aspect,
for nucleic acid sequence identity analysis, the BLAST nucleotide
parameters comprise word size=11, expect=10, filter low complexity
with DUST, cost to open gap=5, cost to extend gap=2, penalty for
mismatch=-3, reward for match=1, Dropoff (X) for BLAST extensions
in bits=20, final X dropoff value for gapped alignment=50, and all
other options are set to default.
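By way of illustration, the scoring parameters recited in [0255] (reward for match=1, penalty for mismatch=-3, cost to open gap=5, cost to extend gap=2) can be applied to a given pairwise alignment as in the following Python sketch. The function name `raw_alignment_score` is hypothetical. The sketch assumes that a gap of length k costs the opening cost plus k times the extension cost, and it returns a raw score rather than a bit score or expectation value; it does not reproduce the BLAST program itself.

```python
MATCH_REWARD = 1       # reward for match
MISMATCH_PENALTY = -3  # penalty for mismatch
GAP_OPEN_COST = 5      # cost to open a gap
GAP_EXTEND_COST = 2    # cost to extend a gap (charged per gap position; assumption)


def raw_alignment_score(aligned_query: str, aligned_subject: str) -> int:
    """Raw score of an existing alignment in which gaps are written as '-'."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    open_gap_side = None  # which sequence currently has an open gap, if any
    for q, s in zip(aligned_query, aligned_subject):
        if q == "-" and s == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if q == "-" or s == "-":
            side = "query" if q == "-" else "subject"
            if side != open_gap_side:
                score -= GAP_OPEN_COST  # a new gap starts here
                open_gap_side = side
            score -= GAP_EXTEND_COST
        else:
            open_gap_side = None
            score += MATCH_REWARD if q == s else MISMATCH_PENALTY
    return score


if __name__ == "__main__":
    print(raw_alignment_score("ACG--CGT", "ACGTACGT"))
```

Under these parameters a single mismatch cancels three matches, and a two-base gap (cost 9 under the stated assumption) cancels nine.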
[0256] Exemplary algorithms and programs include, but are not
limited to, TBLASTN, BLASTP, FASTA, TFASTA, and CLUSTALW (Pearson
and Lipman, Proc. Natl. Acad. Sci. USA 85(8):2444-2448, 1988;
Altschul et al., J. Mol. Biol. 215(3):403-410, 1990; Thompson et
al., Nucleic Acids Res. 22(22):4673-4680, 1994; Higgins et al.,
Methods Enzymol. 266:383-402, 1996; Altschul et al., Nature
Genetics 3:266-272, 1993). Homology or identity can be measured
using sequence analysis software (e.g., Sequence Analysis Software
Package of the Genetics Computer Group, University of Wisconsin
Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705).
Such software matches similar sequences by assigning degrees of
homology to various deletions, substitutions and other
modifications.
[0257] BLAST, BLAST 2.0 and BLAST 2.2.2 algorithms are also used to
practice the invention. They are described, e.g., in Altschul et
al. (1990), supra. Software for performing BLAST analyses is
publicly available through the National Center for Biotechnology
Information. This algorithm involves first identifying high scoring
sequence pairs (HSPs) by identifying short words of length W in the
query sequence, which either match or satisfy some positive-valued
threshold score T when aligned with a word of the same length in a
database sequence. T is referred to as the neighbourhood word score
threshold (Altschul et al. (1990) supra). These initial
neighbourhood word hits act as seeds for initiating searches to
find longer HSPs containing them. The word hits are extended in
both directions along each sequence for as far as the cumulative
alignment score can be increased. Cumulative scores are calculated
using, for nucleotide sequences, the parameters M (reward score for
a pair of matching residues; always >0) and N (penalty score for
mismatching residues; always <0). Extension of the word
hits in each direction is halted when: the cumulative alignment
score falls off by the quantity X from its maximum achieved value;
the cumulative score goes to zero or below, due to the accumulation
of one or more negative-scoring residue alignments; or the end of
either sequence is reached. The BLAST algorithm parameters W, T,
and X determine the sensitivity and speed of the alignment. The
BLASTN program (for nucleotide sequences) uses as defaults a
wordlength (W) of 11, an expectation (E) of 10, M=5, N=-4 and a
comparison of both strands. The BLAST algorithm also performs a
statistical analysis of the similarity between two sequences (see,
e.g., Karlin & Altschul (1993) Proc. Natl. Acad. Sci. USA
90:5873). One measure of similarity provided by the BLAST algorithm is
the smallest sum probability (P(N)), which provides an indication
of the probability by which a match between two nucleotide
sequences would occur by chance.
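The seed-and-extend procedure described in [0257] can be sketched in Python as follows. This is a simplified illustration and not the NCBI implementation: it uses exact word matches only (the neighbourhood threshold T is not modelled), extends without gaps, and expresses the drop-off X in raw score units. The function names and the value x=20 are assumptions; W=11, M=5 and N=-4 are the BLASTN defaults recited above.

```python
def find_seeds(query, subject, w=11):
    """Yield (query_pos, subject_pos) for each exact word of length w in both."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            yield i, j


def _extend(query, subject, i, j, step, start_score, m, n, x):
    """Walk from (i, j) in direction step; return (best_score, positions_kept)."""
    score = best = start_score
    length = best_len = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        score += m if query[i] == subject[j] else n
        length += 1
        if score > best:
            best, best_len = score, length
        # Halt when the score falls X below its maximum or reaches zero.
        if best - score >= x or score <= 0:
            break
        i += step
        j += step
    return best, best_len


def extend_seed(query, subject, i, j, w=11, m=5, n=-4, x=20):
    """Extend a word hit at (i, j) in both directions; return (score, query_span)."""
    best, right = _extend(query, subject, i + w, j + w, +1, w * m, m, n, x)
    best, left = _extend(query, subject, i - 1, j - 1, -1, best, m, n, x)
    return best, (i - left, i + w + right)


if __name__ == "__main__":
    q, s = "CAGTCCATGA", "CAGTCCATTA"
    print(extend_seed(q, s, 3, 3, w=4))
```

The third halting condition recited above, reaching the end of either sequence, corresponds to the loop bound in `_extend`.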
[0258] The subject sequences are defined as exhibiting at least 90%
similarity. In one embodiment, said percentage similarity is 91%,
92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%.
[0259] It should be understood that the "individual" who is the
subject of testing may be any human or non-human mammal. Examples
of non-human mammals includes primates, livestock animals (e.g.
horses, cattle, sheep, pigs, donkeys), laboratory test animals
(e.g. mice, rats, rabbits, guinea pigs), companion animals (e.g.
dogs, cats) and captive wild animals (e.g. deer, foxes). Preferably
the mammal is a human.
[0260] The method of the present invention is predicated on the
comparison of the level of hCG_1815491 in a biological sample
with the control levels of this marker. The "control level" may be
either a "normal level", which is the level of marker expressed by
a corresponding large intestine cell or cellular population which
is not neoplastic, or the background level which is detectable in a
negative control sample.
[0261] The normal (or "non-neoplastic") level may be determined
using tissues derived from the same individual who is the subject
of testing. However, it would be appreciated that this may be quite
invasive for the individual concerned and it is therefore likely to
be more convenient to analyse the test results relative to a
standard result which reflects individual or collective results
obtained from individuals other than the patient in issue. This
latter form of analysis is in fact the preferred method of analysis
since it enables the design of kits which require the collection
and analysis of a single biological sample, being a test sample of
interest. The standard results which provide the normal level may
be calculated by any suitable means which would be well known to
the person of skill in the art. For example, a population of normal
tissues can be assessed in terms of the level of the neoplastic
marker of the present invention, thereby providing a standard value
or range of values against which all future test samples are
analysed. It should also be understood that the normal level may be
determined from the subjects of a specific cohort and for use with
respect to test samples derived from that cohort. Accordingly,
there may be determined a number of standard values or ranges which
correspond to cohorts which differ in respect of characteristics
such as age, gender, ethnicity or health status. Said "normal
level" may be a discrete level or a range of levels. An increase in
the expression level of the subject genes relative to normal levels
is indicative of the tissue being neoplastic.
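One way in which a standard range of normal values might be derived from a population of normal tissues, and a test sample compared against it, is sketched below in Python. The source leaves the calculation to "any suitable means"; the use of the mean plus or minus two standard deviations, and the names `reference_range` and `is_elevated`, are assumptions made purely for illustration.

```python
from statistics import mean, stdev


def reference_range(normal_levels, n_sd=2.0):
    """Return (lower, upper) bounds as mean -/+ n_sd sample standard deviations."""
    mu = mean(normal_levels)
    sd = stdev(normal_levels)  # requires at least two normal samples
    return mu - n_sd * sd, mu + n_sd * sd


def is_elevated(test_level, normal_levels, n_sd=2.0):
    """True if the test sample exceeds the upper bound of the normal range."""
    _, upper = reference_range(normal_levels, n_sd)
    return test_level > upper


if __name__ == "__main__":
    # Hypothetical marker levels for two cohorts of normal tissue.
    cohorts = {
        "cohort_a": [1.0, 2.0, 3.0],
        "cohort_b": [2.0, 4.0, 6.0],
    }
    for name, levels in cohorts.items():
        print(name, reference_range(levels), is_elevated(4.5, levels))
```

Keeping one range per cohort, as in the usage example, reflects the statement that standard values may differ by characteristics such as age, gender, ethnicity or health status.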
[0262] Preferably, said control level is a non-neoplastic
level.
[0263] According to these aspects of the present invention, said
large intestine tissue is preferably colorectal tissue.
[0264] Still more preferably, said neoplasm is a colorectal adenoma
or adenocarcinoma.
[0265] In a related aspect, it has been determined that a
subpopulation of the hCG_1815491 markers is not only
expressed at levels higher than normal levels, but its expression
pattern is also uniquely characterised by the fact that expression
levels above that of background control levels are not detectable
in non-neoplastic tissue. This determination has therefore enabled
the development of qualitative screening systems which are simply
designed to detect hCG_1815491 expression relative to a
control background level. In accordance with this aspect of the
present invention, said "control level" is therefore the
"background level". Preferably, said background level is of the
chosen testing methodology.
[0266] According to this aspect, there is therefore provided a
method of screening for the onset or predisposition to the onset of
a large intestine neoplasm in an individual, said method comprising
measuring the level of expression of one or more RNA transcripts,
which transcripts comprise an RNA sequence characterised by the
sequence of one of: [0267] (i) SEQ ID NO:21 or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:21; [0268] (ii) SEQ ID NO:24 or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:24; [0269] (iii) SEQ ID NO:25 or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:25; [0270] (iv) SEQ ID NO:26 or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:26; [0271] (v) SEQ ID NO:27 or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:27; [0272] (vi) SEQ ID NO:29 or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:29; [0273] (vii) SEQ ID NO:30 or a sequence having at
least 90% similarity across the length of the sequence, or variant
of SEQ ID NO:30; or [0274] (viii) SEQ ID NO:31 or a sequence having
at least 90% similarity across the length of the sequence, or
variant of SEQ ID NO:31; in a biological sample from said
individual wherein a higher level of expression of the genes or
transcripts of group (i) and/or group (ii) relative to background
levels is indicative of a neoplastic cell or a cell predisposed to
the onset of a neoplastic state.
[0275] In a most preferred embodiment, said transcripts comprise an
RNA sequence characterised by the sequence of one of: [0276] (i)
SEQ ID NO:21 or a sequence having at least 90% similarity across
the length of the sequence, or variant of SEQ ID NO:21; [0277] (ii)
SEQ ID NO:22 or a sequence having at least 90% similarity across
the length of the sequence, or variant of SEQ ID NO:22; [0278]
(iii) SEQ ID NO:23 or a sequence having at least 90% similarity
across the length of the sequence, or variant of SEQ ID NO:23;
[0279] (iv) SEQ ID NO:24 or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:24; or [0280] (v) SEQ ID NO:27 or a sequence having at least 90%
similarity across the length of the sequence, or variant of SEQ ID
NO:27; in a biological sample from said individual wherein a higher
level of expression of the genes or transcripts of group (i) and/or
group (ii) relative to background levels is indicative of a
neoplastic cell or a cell predisposed to the onset of a neoplastic
state.
[0281] Most preferably, said RNA sequences are characterised by the
sequence of either SEQ ID NO:21 or SEQ ID NO:22.
[0282] The detection method of the present invention can be
performed on any suitable biological sample. To this end, reference
to a "biological sample" should be understood as a reference to any
sample of biological material derived from an animal such as, but
not limited to, cellular material, biological fluids (e.g. blood),
faeces, tissue biopsy specimens, surgical specimens or fluid which
has been introduced into the body of an animal and subsequently
removed (such as, for example, the solution retrieved from an enema
wash). The biological sample which is tested according to the
method of the present invention may be tested directly or may
require some form of treatment prior to testing. For example, a
biopsy or surgical sample may require homogenisation prior to
testing or it may require sectioning for in situ testing of the
qualitative expression levels of individual genes. Alternatively, a
cell sample may require permeabilisation prior to testing. Further,
to the extent that the biological sample is not in liquid form (if
such form is required for testing), it may require the addition of a
reagent, such as a buffer, to mobilise the sample.
[0283] To the extent that the neoplastic marker gene expression
product is present in a biological sample, the biological sample
may be directly tested or else all or some of the nucleic acid or
protein material present in the biological sample may be isolated
prior to testing. To this end, and as hereinbefore described, it
would be appreciated that when screening for changes to the level
of expression of hCG_1815491 or the specifically recited
transcripts, one may screen for the RNA transcripts themselves,
cDNA which has been transcribed therefrom or a translated protein
product. In yet another example, the sample may be partially
purified or otherwise enriched prior to analysis. For example, to
the extent that a biological sample comprises a very diverse cell
population, it may be desirable to enrich for a sub-population of
particular interest. It is within the scope of the present
invention for the target cell population or molecules derived
therefrom to be pretreated prior to testing, for example by
inactivation of live virus or by being run on a gel. It should also be
understood that the biological sample may be freshly harvested or
it may have been stored (for example by freezing) prior to testing
or otherwise treated prior to testing (such as by undergoing
culturing).
[0284] The choice of what type of sample is most suitable for
testing in accordance with the method disclosed herein will be
dependent on the nature of the situation. Preferably, said sample
is a faecal (stool) sample, enema wash, surgical resection, tissue
biopsy or blood sample.
[0285] As detailed hereinbefore, the present invention is designed
to screen for a neoplastic cell or cellular population, which is
located in the large intestine. Accordingly, reference to "cell or
cellular population" should be understood as a reference to an
individual cell or a group of cells. Said group of cells may be a
diffuse population of cells, a cell suspension, an encapsulated
population of cells or a population of cells which take the form of
tissue.
[0286] As detailed hereinbefore, reference to "expression" should
be understood as a reference to the transcription and/or
translation of a nucleic acid molecule. In this regard, the present
invention is exemplified with respect to screening for
hCG_1815491 expression products taking the form of RNA
transcripts (e.g. primary RNA or mRNA). Reference to "RNA" should be
understood to encompass reference to any form of RNA, such as
primary RNA or mRNA. Without limiting the present invention in any
way, the modulation of gene transcription leading to increased or
decreased RNA synthesis will also correlate with the translation of
some of these RNA transcripts to produce a protein product.
Accordingly, the present invention also extends to detection
methodology which is directed to screening for modulated levels or
patterns of the neoplastic marker protein products as an indicator
of the neoplastic state of a cell or cellular population. Although
one method is to screen for RNA transcripts and/or the
corresponding protein product, it should be understood that the
present invention is not limited in this regard and extends to
screening for any other form of neoplastic marker expression
product such as, for example, a primary RNA transcript. It is well
within the skill of the person of skill in the art to determine the
most appropriate screening target for any given situation.
[0287] Reference to "nucleic acid molecule" should be understood as
a reference to both deoxyribonucleic acid molecules and ribonucleic
acid molecules and fragments thereof. The present invention
therefore extends to both directly screening for RNA levels in a
biological sample and screening for the complementary cDNA which has
been reverse-transcribed from an RNA population of interest. It is
well within the skill of the person of skill in the art to design
methodology directed to screening for either DNA or RNA. As
detailed above, the method of the present invention also extends to
screening for the protein product translated from the subject
RNA.
[0288] In terms of screening for the upregulation of
hCG_1815491, it would also be well known to the person of
skill in the art that changes which are detectable at the DNA level
are indicative of changes to gene expression activity and therefore
changes to expression product levels. Such changes include but are
not limited to, changes to DNA methylation and chromatin proteins
associated with the gene. Accordingly, reference herein to
"screening the level of expression" and comparison of these "levels
of expression" to control "levels of expression" should be
understood as a reference to assessing DNA factors which are
related to transcription, such as gene/DNA methylation patterns or
association with specific chromosomal proteins.
[0289] The term "protein" should be understood to encompass
peptides, polypeptides and proteins (including protein fragments).
The protein may be glycosylated or unglycosylated and/or may
contain a range of other molecules fused, linked, bound or
otherwise associated to the protein such as amino acids, lipids,
carbohydrates or other peptides, polypeptides or proteins.
Reference herein to a "protein" includes a protein comprising a
sequence of amino acids as well as a protein associated with other
molecules such as amino acids, lipids, carbohydrates or other
peptides, polypeptides or proteins.
[0290] The proteins encoded by hCG_1815491 may be in
multimeric form meaning that two or more molecules are associated
together. Where the same protein molecules are associated together,
the complex is a homomultimer. An example of a homomultimer is a
homodimer. Where at least one marker protein is associated with at
least one non-marker protein, then the complex is a heteromultimer
such as a heterodimer.
[0291] Reference to a "fragment" should be understood as a
reference to a portion of the subject nucleic acid molecule or
protein. As detailed hereinbefore, this is particularly relevant
with respect to screening for modulated RNA levels in stool samples
since the subject RNA is likely to have been degraded or otherwise
fragmented due to the environment of the gut. One may therefore
actually be detecting fragments of the subject RNA molecule, which
fragments are identified by virtue of the use of a suitably
specific probe.
[0292] Reference to the "onset" of a neoplasm, such as adenoma or
adenocarcinoma, should be understood as a reference to one or more
cells of that individual exhibiting dysplasia. In this regard, the
adenoma or adenocarcinoma may be well developed in that a mass of
dysplastic cells has developed. Alternatively, the adenoma or
adenocarcinoma may be at a very early stage in that only relatively
few abnormal cell divisions have occurred at the time of diagnosis.
The present invention also extends to the assessment of an
individual's predisposition to the development of a neoplasm, such
as an adenoma or adenocarcinoma. Without limiting the present
invention in any way, changed levels of the neoplastic marker may
be indicative of that individual's predisposition to developing a
neoplasia, such as the future development of an adenoma or
adenocarcinoma or another adenoma or adenocarcinoma.
[0293] Although the preferred method is to diagnose neoplasia
development or predisposition thereto, the detection of converse
changes in the levels of said marker may be desired under certain
circumstances, for example, to monitor the effectiveness of
therapeutic or prophylactic treatment directed to modulating a
neoplastic condition, such as adenoma or adenocarcinoma
development. For example, where elevated levels of
hCG_1815491 indicate that an individual has developed a
condition characterised by adenoma or adenocarcinoma development,
screening for a decrease in the levels of this marker
subsequent to the onset of a therapeutic regime may be utilised
to indicate reversal or other form of improvement of the subject
individual's condition.
[0294] The method of the present invention is therefore useful as a
one-off test or as an on-going monitor of those individuals thought
to be at risk of neoplasia development or as a monitor of the
effectiveness of therapeutic or prophylactic treatment regimes
directed to inhibiting or otherwise slowing neoplasia development.
In these situations, mapping the modulation of hCG_1815491
expression levels in any one or more classes of biological samples
is a valuable indicator of the status of an individual or the
effectiveness of a therapeutic or prophylactic regime which is
currently in use. Accordingly, the method of the present invention
should be understood to extend to monitoring for increases or
decreases in hCG_1815491 expression levels in an individual
relative to their normal level (as hereinbefore defined), or
relative to one or more earlier marker expression levels determined
from a biological sample of said individual.
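Monitoring relative to an earlier marker expression level, as described in [0294], can be illustrated with the following Python sketch. The function name `classify_trend` and the 10% tolerance used to distinguish a real change from measurement noise are assumptions; the source does not specify how large a change must be to be considered meaningful.

```python
def classify_trend(previous_level, current_level, tolerance=0.10):
    """Classify the change between two serial measurements of the marker.

    `tolerance` is the fractional change (relative to the earlier level)
    below which the marker is treated as unchanged; its value is an
    illustrative assumption.
    """
    if previous_level <= 0:
        raise ValueError("previous_level must be positive")
    change = (current_level - previous_level) / previous_level
    if change > tolerance:
        return "increase"
    if change < -tolerance:
        return "decrease"
    return "stable"


if __name__ == "__main__":
    # Hypothetical marker levels before and after a therapeutic regime.
    print(classify_trend(10.0, 6.0))
```

In the context of the preceding paragraphs, a "decrease" following the onset of treatment would be the result consistent with reversal or other improvement of the individual's condition.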
[0295] Means of testing for the subject expressed neoplasm marker
in a biological sample can be achieved by any suitable method,
which would be well known to the person of skill in the art, such
as but not limited to: [0296] (i) In vivo detection. [0297]
Molecular Imaging may be used following administration of imaging
probes or reagents capable of disclosing altered expression of the
marker in the intestinal tissues. [0298] Molecular imaging (Moore
et al., BBA, 1402:239-249, 1998; Weissleder et al., Nature Medicine
6:351-355, 2000) is the in vivo imaging of molecular expression
that correlates with the macro-features currently visualized using
"classical" diagnostic imaging techniques such as X-Ray, computed
tomography (CT), MRI, Positron Emission Tomography (PET) or
endoscopy. [0299] (ii) Detection of up-regulation of RNA expression
in the cells by Fluorescent In Situ Hybridization (FISH), or in
extracts from the cells by technologies such as Quantitative
Reverse Transcriptase Polymerase Chain Reaction (QRTPCR) or flow
cytometric quantification of competitive RT-PCR products (Wedemeyer
et al., Clinical Chemistry 48(9):1398-1405, 2002). [0300] (iii)
Assessment of expression profiles of RNA, for example by array
technologies (Alon et al., Proc. Natl. Acad. Sci. USA: 96,
6745-6750, June 1999). [0301] A "microarray" is a linear or
multi-dimensional array of preferably discrete regions, each having
a defined area, formed on the surface of a solid support. The
density of the discrete regions on a microarray is determined by
the total numbers of target polynucleotides to be detected on the
surface of a single solid phase support. As used herein, a DNA
microarray is an array of oligonucleotide probes placed onto a chip
or other surfaces used to detect complementary oligonucleotides
from a complex nucleic acid mixture. Since the position of each
particular group of probes in the array is known, the identities of
the target polynucleotides can be determined based on their binding
to a particular position in the microarray. [0302] Recent
developments in DNA microarray technology make it possible to
conduct a large scale assay of a plurality of target nucleic acid
molecules on a single solid phase support. U.S. Pat. No. 5,837,832
(Chee et al.) and related patent applications describe immobilizing
an array of oligonucleotide probes for hybridization and detection
of specific nucleic acid sequences in a sample. Target
polynucleotides of interest isolated from a tissue of interest are
hybridized to the DNA chip and the specific sequences detected
based on the target polynucleotides' preference and degree of
hybridization at discrete probe locations. One important use of
arrays is in the analysis of differential gene expression, where
the profile of expression of genes in different cells or tissues,
often a tissue of interest and a control tissue, is compared and
any differences in gene expression among the respective tissues are
identified. Such information is useful for the identification of
the types of genes expressed in a particular tissue type and
diagnosis of conditions based on the expression profile. [0303] In
one example, RNA from the sample of interest is subjected to
reverse transcription to obtain labelled cDNA. See U.S. Pat. No.
6,410,229 (Lockhart et al.) The cDNA is then hybridized to
oligonucleotides or cDNAs of known sequence arrayed on a chip or
other surface in a known order. In another example, the RNA is
isolated from a biological sample and hybridised to a chip on which
are anchored cDNA probes. The location of the oligonucleotide to
which the labelled cDNA hybridizes provides sequence information on
the cDNA, while the amount of labelled hybridized RNA or cDNA
provides an estimate of the relative representation of the RNA or
cDNA of interest. See Schena, et al. Science 270:467-470 (1995).
For example, use of a cDNA microarray to analyze gene expression
patterns in human cancer is described by DeRisi, et al. (Nature
Genetics 14:457-460 (1996)). [0304] In a preferred embodiment,
nucleic acid probes corresponding to the subject nucleic acids are
made. The nucleic acid probes attached to the microarray are
designed to be substantially complementary to the nucleic acids of
the biological sample such that specific hybridization of the
target sequence and the probes of the present invention occurs.
This complementarity need not be perfect, in that there may be any
number of base pair mismatches that will interfere with
hybridization between the target sequence and the single stranded
nucleic acids of the present invention. It is expected that the
overall homology of the genes at the nucleotide level probably will
be about 40% or greater, probably about 60% or greater, and even
more probably about 80% or greater; and in addition that there will
be corresponding contiguous sequences of about 8-12 nucleotides or
longer. However, if the number of mutations is so great that no
hybridization can occur under even the least stringent of
hybridization conditions, the sequence is not a complementary
target sequence. Thus, by "substantially complementary" herein is
meant that the probes are sufficiently complementary to the target
sequences to hybridize under normal reaction conditions,
particularly high stringency conditions. [0305] A nucleic acid
probe is generally single stranded but can be partly single and
partly double stranded. The strandedness of the probe is dictated
by the structure, composition, and properties of the target
sequence. In general, the oligonucleotide probes range from about
6, 8, 10, 12, 15, 20, 30 to about 100 bases long, with from about
10 to about 80 bases being preferred, and from about 15 to about 40
bases being particularly preferred. That is, entire genes
are rarely used as probes. In some embodiments, much longer nucleic
acids can be used, up to hundreds of bases. The probes are
sufficiently specific to hybridize to a complementary template
sequence under conditions known by those of skill in the art. The
number of mismatches between the probe sequences and the
complementary template (target) sequences to which they hybridize
generally does not exceed 15%, usually does not
exceed 10% and preferably does not exceed 5%, as determined by BLAST
(default settings). [0306] Oligonucleotide probes can include the
naturally-occurring heterocyclic bases normally found in nucleic
acids (uracil, cytosine, thymine, adenine and guanine), as well as
modified bases and base analogues. Any modified base or base
analogue compatible with hybridization of the probe to a target
sequence is useful in the practice of the invention. The sugar or
glycoside portion of the probe can comprise deoxyribose, ribose,
and/or modified forms of these sugars, such as, for example,
2'-O-alkyl ribose. In a preferred embodiment, the sugar moiety is
2'-deoxyribose; however, any sugar moiety that is compatible with
the ability of the probe to hybridize to a target sequence can be
used. [0307] In one embodiment, the nucleoside units of the probe
are linked by a phosphodiester backbone, as is well known in the
art. In additional embodiments, internucleotide linkages can
include any linkage known to one of skill in the art that is
compatible with specific hybridization of the probe including, but
not limited to phosphorothioate, methylphosphonate, sulfamate
(e.g., U.S. Pat. No. 5,470,967) and polyamide (i.e., peptide
nucleic acids). Peptide nucleic acids are described in Nielsen et
al. (1991) Science 254: 1497-1500, U.S. Pat. No. 5,714,331, and
Nielsen (1999) Curr. Opin. Biotechnol. 10:71-75. [0308] In certain
embodiments, the probe can be a chimeric molecule; i.e., can
comprise more than one type of base or sugar subunit, and/or the
linkages can be of more than one type within the same probe. The
probe can comprise a moiety to facilitate hybridization to its
target sequence, as are known in the art, for example,
intercalators and/or minor groove binders. Variations of the bases,
sugars, and internucleoside backbone, as well as the presence of
any pendant group on the probe, will be compatible with the ability
of the probe to bind, in a sequence-specific fashion, with its
target sequence. A large number of structural modifications are
possible within these bounds. Advantageously, the probes according
to the present invention may have structural characteristics that
allow signal amplification, for example branched DNA probes such as
those described by Urdea et al. (Nucleic Acids Symp. Ser., 24:197-200
(1991)) or in European Patent No. EP-0225,807. Moreover,
synthetic methods for preparing the various heterocyclic bases,
sugars, nucleosides and nucleotides that form the probe, and
preparation of oligonucleotides of specific predetermined sequence,
are well-developed and known in the art. A preferred method for
oligonucleotide synthesis incorporates the teaching of U.S. Pat.
No. 5,419,966. [0309] Multiple probes may be designed for a
particular target nucleic acid to account for polymorphism and/or
secondary structure in the target nucleic acid, redundancy of data
and the like. In some embodiments, where more than one probe per
sequence is used, either overlapping probes or probes to different
sections of a single target gene are used. That is, two, three,
four or more probes are used to build in a redundancy for a
particular target. The probes can be overlapping (i.e. have some
sequence in common), or can be specific for distinct sequences of a
gene. When multiple target polynucleotides are to be detected
according to the present invention, each probe or probe group
corresponding to a particular target polynucleotide is situated in
a discrete area of the microarray. [0310] Probes may be in
solution, such as in wells or on the surface of a micro-array, or
attached to a solid support. Examples of solid support materials
that can be used include a plastic, a ceramic, a metal, a resin, a
gel and a membrane. Useful types of solid supports include plates,
beads, magnetic material, microbeads, hybridization chips,
membranes, crystals, ceramics and self-assembling monolayers. One
example comprises a two-dimensional or three-dimensional matrix,
such as a gel or hybridization chip with multiple probe binding
sites (Pevzner et al., J. Biomol. Struc. & Dyn. 9:399-410,
1991; Maskos and Southern, Nuc. Acids Res. 20:1679-84, 1992).
Hybridization chips can be used to construct very large probe
arrays that are subsequently hybridized with a target nucleic acid.
Analysis of the hybridization pattern of the chip can assist in the
identification of the target nucleotide sequence. Patterns can be
manually or computer analyzed, but it is clear that positional
sequencing by hybridization lends itself to computer analysis and
automation. In another example, one may use an Affymetrix chip on a
solid phase structural support in combination with a fluorescent
bead based approach. In yet another example, one may utilise a cDNA
microarray. In this regard, the oligonucleotides described by
Lockhart et al. (i.e. Affymetrix probes synthesized in situ on the
solid phase by photolithography) are particularly preferred.
[0311] As will be appreciated by those in the art, nucleic acids
can be attached or immobilized to a solid support in a wide variety
of ways. By "immobilized" herein is meant the association or
binding between the nucleic acid probe and the solid support is
sufficient to be stable under the conditions of binding, washing,
analysis, and removal. The binding can be covalent or non-covalent.
By "non-covalent binding" and grammatical equivalents herein is
meant one or more of either electrostatic, hydrophilic, and
hydrophobic interactions. Included in non-covalent binding is the
covalent attachment of a molecule, such as streptavidin, to the
support and the non-covalent binding of the biotinylated probe to
the streptavidin. By "covalent binding" and grammatical equivalents
herein is meant that the two moieties, the solid support and the
probe, are attached by at least one bond, including sigma bonds, pi
bonds and coordination bonds. Covalent bonds can be formed directly
between the probe and the solid support or can be formed by a cross
linker or by inclusion of a specific reactive group on either the
solid support or the probe or both molecules. Immobilization may
also involve a combination of covalent and non-covalent
interactions. [0312] Nucleic acid probes may be attached to the
solid support by covalent binding such as by conjugation with a
coupling agent or by covalent or non-covalent binding such as
electrostatic interactions, hydrogen bonds or antibody-antigen
coupling, or by combinations thereof. Typical coupling agents
include biotin/avidin, biotin/streptavidin, Staphylococcus aureus
protein A/IgG antibody Fc fragment, and streptavidin/protein A
chimeras (T. Sano and C. R. Cantor, Bio/Technology 9:1378-81
(1991)), or derivatives or combinations of these agents. Nucleic
acids may be attached to the solid support by a photocleavable
bond, an electrostatic bond, a disulfide bond, a peptide bond, a
diester bond or a combination of these sorts of bonds. The array
may also be attached to the solid support by a selectively
releasable bond such as 4,4'-dimethoxytrityl or its derivative.
Derivatives which have been found to be useful include 3 or
4[bis-(4-methoxyphenyl)]-methyl-benzoic acid, N-succinimidyl-3 or
4[bis-(4-methoxyphenyl)]-methyl-benzoic acid, N-succinimidyl-3 or
4[bis-(4-methoxyphenyl)]-hydroxymethyl-benzoic acid,
N-succinimidyl-3 or 4[bis-(4-methoxyphenyl)]-chloromethyl-benzoic
acid, and salts of these acids. [0313] In general, the probes are
attached to the microarray in a wide variety of ways, as will be
appreciated by those in the art. As described herein, the nucleic
acids can either be synthesized first, with subsequent attachment
to the microarray, or can be directly synthesized on the
microarray. [0314] The microarray comprises a suitable solid
substrate. By "substrate" or "solid support" or other grammatical
equivalents herein is meant any material that can be modified to
contain discrete individual sites appropriate for the attachment or
association of the nucleic acid probes and is amenable to at least
one detection method. The solid phase support of the present
invention can be of any solid materials and structures suitable for
supporting nucleotide hybridization and synthesis. Preferably, the
solid phase support comprises at least one substantially rigid
surface on which the oligonucleotide primers can be immobilized and
the reverse transcriptase reaction performed. The substrates with
which the polynucleotide microarray elements are stably associated
may be fabricated from a variety of materials, including
plastics, ceramics, metals, acrylamide, cellulose, nitrocellulose,
glass, polystyrene, polyethylene vinyl acetate, polypropylene,
polymethacrylate, polyethylene, polyethylene oxide, polysilicates,
polycarbonates, Teflon, fluorocarbons, nylon, silicone rubber,
polyanhydrides, polyglycolic acid, polylactic acid,
polyorthoesters, polypropylfumarate, collagen, glycosaminoglycans,
and polyamino acids. Substrates may be two-dimensional or
three-dimensional in form, such as gels, membranes, thin films,
glasses, plates, cylinders, beads, magnetic beads, optical fibers,
woven fibers, etc. A preferred form of array is a three-dimensional
array. A preferred three-dimensional array is a collection of
tagged beads. Each tagged bead has different oligonucleotide
primers attached to it. Tags are detectable by signalling means
such as color (Luminex, Illumina) and electromagnetic field
(Pharmaseq) and signals on tagged beads can even be remotely
detected (e.g., using optical fibers). The size of the solid
support can be any of the standard microarray sizes, useful for DNA
microarray technology, and the size may be tailored to fit the
particular machine being used to conduct a reaction of the
invention. In general, the substrates allow optical detection and
do not appreciably fluoresce.
[0315] In one embodiment, the surface of the microarray and the
probe may be derivatized with chemical functional groups for
subsequent attachment of the two. Thus, for example, the microarray
is derivatized with a chemical functional group including, but not
limited to, amino groups, carboxy groups, oxo groups and thiol
groups, with amino groups being particularly preferred. Using these
functional groups, the probes can be attached using functional
groups on the probes. For example, nucleic acids containing amino
groups can be attached to surfaces comprising amino groups, for
example using linkers as are known in the art; for example, homo-
or hetero-bifunctional linkers as are well known. In addition, in
some cases, additional linkers, such as alkyl groups (including
substituted and heteroalkyl groups) may be used. [0316] In this
embodiment, the oligonucleotides are synthesized as is known in the
art, and then attached to the surface of the solid support. As will
be appreciated by those skilled in the art, either the 5' or 3'
terminus may be attached to the solid support, or attachment may be
via an internal nucleoside. In an additional embodiment, the
immobilization to the solid support may be very strong, yet
non-covalent. For example, biotinylated oligonucleotides can be
made, which bind to surfaces covalently coated with streptavidin,
resulting in attachment. [0317] The arrays may be produced
according to any convenient methodology, such as preforming the
polynucleotide microarray elements and then stably associating them
with the surface. Alternatively, the oligonucleotides may be
synthesized on the surface, as is known in the art. A number of
different array configurations and methods for their production are
known to those of skill in the art and disclosed in WO 95/25116 and
WO 95/35505 (photolithographic techniques), U.S. Pat. No. 5,445,934
(in situ synthesis by photolithography), U.S. Pat. No. 5,384,261
(in situ synthesis by mechanically directed flow paths); and U.S.
Pat. No. 5,700,637 (synthesis by spotting, printing or coupling);
the disclosures of which are herein incorporated in their entirety
by reference. Another method for coupling DNA to beads uses
specific ligands attached to the end of the DNA to link to
ligand-binding molecules attached to a bead. Possible
ligand-binding partner pairs include biotin-avidin/streptavidin, or
various antibody/antigen pairs such as digoxigenin-antidigoxigenin
antibody (Smith et al., Science 258:1122-1126 (1992)). Covalent
chemical attachment of DNA to the support can be accomplished by
using standard coupling agents to link the 5'-phosphate on the DNA
to coated microspheres through a phosphoramidate bond. Methods for
immobilization of oligonucleotides to solid-state substrates are
well established. See Pease et al., Proc. Natl. Acad. Sci. USA
91(11):5022-5026 (1994). A preferred method of attaching
oligonucleotides to solid-state substrates is described by Guo et
al., Nucleic Acids Res. 22:5456-5465 (1994). Immobilization can be
accomplished either by in situ DNA synthesis (Maskos and Southern,
supra) or by covalent attachment of chemically synthesized
oligonucleotides (Guo et al., supra) in combination with robotic
arraying technologies. [0318] In addition to the solid-phase
technology represented by microarrays, gene expression can
also be quantified using liquid-phase assays. One such system is
kinetic polymerase chain reaction (PCR). Kinetic PCR allows for the
simultaneous amplification and quantification of specific nucleic
acid sequences. The specificity is derived from synthetic
oligonucleotide primers designed to preferentially adhere to
single-stranded nucleic acid sequences bracketing the target site.
This pair of oligonucleotide primers forms specific, non-covalently
bound complexes on each strand of the target sequence. These
complexes facilitate in vitro synthesis of double-stranded DNA
in opposite orientations. Temperature cycling of the reaction
mixture creates a continuous cycle of primer binding,
extension, and re-melting of the nucleic acid to individual
strands. The result is an exponential increase of the target dsDNA
product. This product can be quantified in real time either through
the use of an intercalating dye or a sequence specific probe.
SYBR® Green I is an example of an intercalating dye that
preferentially binds to dsDNA resulting in a concomitant increase
in the fluorescent signal. Sequence specific probes, such as used
with TaqMan technology, consist of a fluorochrome and a quenching
molecule covalently bound to opposite ends of an oligonucleotide.
The probe is designed to selectively bind the target DNA sequence
between the two oligonucleotide primers. When the DNA strands are
synthesized during the PCR reaction, the fluorochrome is cleaved
from the probe by the exonuclease activity of the polymerase
resulting in signal dequenching. The probe signalling method can be
more specific than the intercalating dye method, but in each case,
signal strength is proportional to the dsDNA product produced. Each
type of quantification method can be used in multi-well liquid
phase arrays with each well representing oligonucleotide primers
and/or probes specific to nucleic acid sequences of interest. When
used with messenger RNA preparations of tissues or cell lines, an
array of probe/primer reactions can simultaneously quantify the
expression of multiple gene products of interest. See Germer et
al., Genome Res. 10:258-266 (2000); Heid et al., Genome Res.
6:986-994 (1996). [0319] (iv) Measurement of altered neoplastic
marker protein levels in cell extracts, for example by immunoassay.
[0320] Testing for proteinaceous neoplastic marker expression
product in a biological sample can be performed by any one of a
number of suitable methods which are well known to those skilled in
the art. Examples of suitable methods include, but are not limited
to, antibody based screening of tissue sections, biopsy specimens
or bodily fluid samples. [0321] To the extent that antibody based
methods of diagnosis are used, the presence of the marker protein
may be determined in a number of ways such as by Western blotting,
ELISA or flow cytometry procedures. These, of course, include both
single-site and two-site or "sandwich" assays of the
non-competitive types, as well as in the traditional competitive
binding assays. These assays also include direct binding of a
labelled antibody to a target. [0322] Sandwich assays are among the
most useful and commonly used assays. A number of variations of the
sandwich assay technique exist, and all are intended to be
encompassed by the present invention. Briefly, in a typical forward
assay, an unlabelled antibody is immobilized on a solid substrate
and the sample to be tested brought into contact with the bound
molecule. After a suitable period of incubation, for a period of
time sufficient to allow formation of an antibody-antigen complex,
a second antibody specific to the antigen, labelled with a reporter
molecule capable of producing a detectable signal is then added and
incubated, allowing time sufficient for the formation of another
complex of antibody-antigen-labelled antibody. Any unreacted
material is washed away, and the presence of the antigen is
determined by observation of a signal produced by the reporter
molecule. The results may either be qualitative, by simple
observation of the visible signal, or may be quantitated by
comparing with a control sample. Variations on the forward assay
include a simultaneous assay, in which both sample and labelled
antibody are added simultaneously to the bound antibody. These
techniques are well known to those skilled in the art, including
any minor variations as will be readily apparent. [0323] In the
typical forward sandwich assay, a first antibody having specificity
for the marker or antigenic parts thereof, is either covalently or
passively bound to a solid surface. The solid surface is typically
glass or a polymer, the most commonly used polymers being
cellulose, polyacrylamide, nylon, polystyrene, polyvinyl chloride
or polypropylene. The solid supports may be in the form of tubes,
beads, discs or microplates, or any other surface suitable for
conducting an immunoassay. The binding processes are well-known in
the art and generally consist of cross-linking, covalently binding
or physically adsorbing. The polymer-antibody complex is washed in
preparation for the test sample. An aliquot of the sample to be
tested is then added to the solid phase complex and incubated for a
period of time sufficient (e.g. 2-40 minutes) and under suitable
conditions (e.g. 25 °C) to allow binding of any antigen
present in the sample to the antibody. Following the incubation
period, the antibody-antigen solid phase is washed and dried and
incubated with
a second antibody specific for a portion of the antigen. The second
antibody is linked to a reporter molecule which is used to indicate
the binding of the second antibody to the antigen. [0324] An
alternative method involves immobilizing the target molecules in
the biological sample and then exposing the immobilized target to
specific antibody which may or may not be labelled with a reporter
molecule. Depending on the amount of target and the strength of the
reporter molecule signal, a bound target may be detectable by
direct labelling with the antibody. Alternatively, a second
labelled antibody, specific to the first antibody is exposed to the
target-first antibody complex to form a target-first
antibody-second antibody tertiary complex. The complex is detected
by the signal emitted by the reporter molecule. [0325] By "reporter
molecule" as used in the present specification, is meant a molecule
which, by its chemical nature, provides an analytically
identifiable signal which allows the detection of antigen-bound
antibody. Detection may be either qualitative or quantitative. The
most commonly used reporter molecules in this type of assay are
either enzymes, fluorophores or radionuclide containing molecules
(i.e. radioisotopes) and chemiluminescent molecules. [0326] In the
case of an enzyme immunoassay, an enzyme is conjugated to the
second antibody, generally by means of glutaraldehyde or periodate.
As will be readily recognized, however, a wide variety of different
conjugation techniques exist, which are readily available to the
skilled artisan. Commonly used enzymes include horseradish
peroxidase, glucose oxidase, beta-galactosidase and alkaline
phosphatase, amongst others. The substrates to be used with the
specific enzymes are generally chosen for the production, upon
hydrolysis by the corresponding enzyme, of a detectable color
change. Examples of suitable enzymes include alkaline phosphatase
and peroxidase. It is also possible to employ fluorogenic
substrates, which yield a fluorescent product rather than the
chromogenic substrates noted above. In all cases, the
enzyme-labelled antibody is added to the first antibody-hapten
complex, allowed to bind, and then the excess reagent is washed
away. A solution containing the appropriate substrate is then added
to the complex of antibody-antigen-antibody. The substrate will
react with the enzyme linked to the second antibody, giving a
qualitative visual signal, which may be further quantitated,
usually spectrophotometrically, to give an indication of the amount
of antigen which was present in the sample. "Reporter molecule"
also extends to use of cell agglutination or inhibition of
agglutination such as red blood cells or latex beads, and the like.
[0327] Alternatively, fluorescent compounds, such as fluorescein and
rhodamine, may be chemically coupled to antibodies without altering
their binding capacity. When activated by illumination with light
of a particular wavelength, the fluorochrome-labelled antibody
absorbs the light energy, inducing a state of excitability in the
molecule, followed by emission of the light at a characteristic
color visually detectable with a light microscope. As in the EIA,
the fluorescent labelled antibody is allowed to bind to the first
antibody-hapten complex. After washing off the unbound reagent, the
remaining tertiary complex is then exposed to light of the
appropriate wavelength; the fluorescence observed indicates the
presence of the hapten of interest. Immunofluorescence and EIA
techniques are both very well established in the art and are
particularly preferred for the present method. However, other
reporter molecules, such as radioisotope, chemiluminescent or
bioluminescent molecules, may also be employed. [0328] (v) Without
limiting the present invention to any one theory or mode of action,
during development gene expression is regulated by processes that
alter the availability of genes for expression in different cell
lineages without any alteration in gene sequence, and these states
can be inherited through a cell division--a process called
epigenetic inheritance. Epigenetic inheritance is determined by a
combination of DNA methylation (modification of cytosine to give
5-methyl cytosine, 5 meC) and by modifications of the histone
chromosomal proteins that package DNA. Thus methylation of DNA at
CpG sites and modifications such as deacetylation of histone H3 on
lysine 9, and methylation on lysine 9 or 27, are associated with
inactive chromatin, while the converse state of a lack of DNA
methylation and acetylation of lysine 9 of histone H3 is associated
with open chromatin and active gene expression. In cancer, this
epigenetic regulation of gene expression is frequently found to be
disrupted (Esteller & Herman, 2000; Jones & Baylin, 2002).
Genes such as tumour suppressor or metastasis suppressor genes are
often found to be silenced by DNA methylation, while other genes
may be hypomethylated and inappropriately expressed. Thus, among
genes that show elevated or inappropriate expression in cancer, this
in some instances is characterised by a loss of methylation of the
promoter or regulatory region of the gene. [0329] A variety of
methods are available for detection of aberrantly methylated DNA of
a specific gene, even in the presence of a large excess of normal
DNA (Clark 2007). Thus, elevated expression of certain genes may be
detected through detection of the presence of hypomethylated
sequences in tissue, bodily fluid or other patient samples. [0330]
Epigenetic alterations and chromatin changes in cancer are also
evident in the altered association of modified histones with
specific genes (Esteller, 2007); for example activated genes are
often found associated with histone H3 that is acetylated on lysine
9 and methylated on lysine 4. The use of antibodies targeted to
altered histones allows for the isolation of DNA associated with
particular chromatin states and has potential use in cancer
diagnosis. [0331] (vi) Determining altered expression of protein
neoplastic markers on the cell surface, for example by
immunohistochemistry. [0332] (vii) Determining altered protein
expression based on any suitable functional test, enzymatic test or
immunological test in addition to those detailed in points (iv) and
(v) above.
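The kinetic PCR readout described earlier in this list (exponential growth of dsDNA product, monitored through a signal proportional to that product) can be illustrated numerically. The following is a minimal sketch only, assuming idealised amplification; the function name, the `efficiency` parameter and the notion of a fixed detection threshold are illustrative assumptions and are not taken from the specification.

```python
from typing import Optional


def threshold_cycle(initial_copies: float,
                    threshold_copies: float,
                    efficiency: float = 1.0,
                    max_cycles: int = 40) -> Optional[int]:
    """Return the first cycle at which product reaches the threshold.

    Assumes each cycle multiplies the product by (1 + efficiency), so an
    efficiency of 1.0 is perfect doubling. Returns None if the threshold
    is not reached within max_cycles.
    """
    if initial_copies <= 0 or threshold_copies <= 0:
        raise ValueError("copy numbers must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    product = float(initial_copies)
    for cycle in range(1, max_cycles + 1):
        product *= 1.0 + efficiency  # exponential increase per cycle
        if product >= threshold_copies:
            return cycle
    return None


if __name__ == "__main__":
    # A sample with more starting template crosses the threshold earlier.
    print(threshold_cycle(1_000, 1e9))
    print(threshold_cycle(8_000, 1e9))
```

Under these assumptions, an eight-fold difference in starting template appears as a three-cycle difference in the crossing point, which is why the cycle number can serve as a measure of the initial amount of target.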
[0333] A person of ordinary skill in the art could determine, as a
matter of routine procedure, the appropriateness of applying a
given method to a particular type of biological sample.
[0334] Without limiting the present invention in any way, and as
detailed above, gene expression levels can be measured by a variety
of methods known in the art. For example, gene transcription or
translation products can be measured. Gene transcription products,
i.e., RNA, can be measured, for example, by hybridization assays,
run-off assays, Northern blots, or other methods known in the
art.
[0335] Hybridization assays generally involve the use of
oligonucleotide probes that hybridize to the single-stranded RNA
transcription products. Thus, the oligonucleotide probes are
complementary to the transcribed RNA expression product. Typically,
a sequence-specific probe can be directed to hybridize to RNA or
cDNA. A "nucleic acid probe", as used herein, can be a DNA probe or
an RNA probe that hybridizes to a complementary sequence. One of
skill in the art would know how to design such a probe such that
sequence specific hybridization will occur. One of skill in the art
will further know how to quantify the amount of sequence specific
hybridization as a measure of the amount of gene expression for the
gene that was transcribed to produce the specific RNA.
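As an illustration of the complementarity requirement described above, the following minimal sketch derives a DNA probe sequence that is the reverse complement of an RNA target region. The function name and the example sequence are hypothetical; real probe design would also weigh length, base composition and cross-hybridization, none of which is modelled here.

```python
# Watson-Crick pairing from RNA target bases to DNA probe bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def design_dna_probe(rna_target: str) -> str:
    """Return a DNA probe (5'->3') complementary to an RNA target (5'->3')."""
    rna = rna_target.upper()
    unknown = set(rna) - set(COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected bases in RNA target: {sorted(unknown)}")
    # Reverse the target so that the probe is also written 5'->3'.
    return "".join(COMPLEMENT[base] for base in reversed(rna))


if __name__ == "__main__":
    # Arbitrary example sequence, not a marker sequence from this document.
    print(design_dna_probe("AUGGCCAUUGUAAUG"))
```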
[0336] The hybridization sample is maintained under conditions that
are sufficient to allow specific hybridization of the nucleic acid
probe to a specific gene expression product. "Specific
hybridization", as used herein, indicates near exact hybridization
(e.g., with few if any mismatches). Specific hybridization can be
performed under high stringency conditions or moderate stringency
conditions. In one embodiment, the hybridization conditions for
specific hybridization are high stringency. For example, certain
high stringency conditions can be used to distinguish perfectly
complementary nucleic acids from those of less complementarity.
"High stringency conditions", "moderate stringency conditions" and
"low stringency conditions" for nucleic acid hybridizations are
explained on pages 2.10.1-2.10.16 and pages 6.3.1-6.3.6 in Current
Protocols in Molecular Biology (Ausubel et al., 1998 supra), the
entire teachings of which are incorporated by reference herein.
The exact conditions that determine the stringency of hybridization
depend not only on ionic strength (e.g., 0.2×SSC,
0.1×SSC), temperature (e.g., room temperature, 42 °C,
68 °C) and the concentration of destabilizing agents such
as formamide or denaturing agents such as SDS, but also on factors
such as the length of the nucleic acid sequence, base composition,
percent mismatch between hybridizing sequences and the frequency of
occurrence of subsets of that sequence within other non-identical
sequences. Thus, equivalent conditions can be determined by varying
one or more of these parameters while maintaining a similar degree
of identity or similarity between the two nucleic acid molecules.
Typically, conditions are used such that sequences at least about
60%, at least about 70%, at least about 80%, at least about 90% or
at least about 95% or more identical to each other remain
hybridized to one another. By varying hybridization conditions from
a level of stringency at which no hybridization occurs to a level
at which hybridization is first observed, conditions that will
allow a given sequence to hybridize (e.g., selectively) with the
most complementary sequences in the sample can be determined.
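The identity thresholds mentioned above (about 60% to about 95%) can be made concrete with a small calculation. This is a simplified sketch, assuming two ungapped sequences of equal length that are compared position by position; the function names and the default threshold are illustrative assumptions, and alignment tools handle gaps and unequal lengths in ways not shown here.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(seq_a: str, seq_b: str,
                             min_identity: float = 90.0) -> bool:
    """True if the pair is at least min_identity percent identical."""
    return percent_identity(seq_a, seq_b) >= min_identity


if __name__ == "__main__":
    a = "ACGTACGTAC"
    b = "ACGTACGTAA"  # differs from a at the final position only
    print(percent_identity(a, b))
    print(meets_identity_threshold(a, b, min_identity=95.0))
```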
[0337] Exemplary conditions that describe the determination of wash
conditions for moderate or low stringency conditions are described
in Kraus, M. and Aaronson, S., 1991. Methods Enzymol., 200:546-556;
and in Ausubel et al. 1998, supra. Washing is the step in which
conditions are usually set so as to determine a minimum level of
complementarity of the hybrids. Generally, starting from the lowest
temperature at which only homologous hybridization occurs, each
degree Celsius by which the final wash temperature is reduced (holding
SSC concentration constant) allows an increase by 1% in the maximum
mismatch percentage among the sequences that hybridize. Generally,
doubling the concentration of SSC results in an increase in Tm
of about 17 °C. Using these guidelines, the wash temperature
can be determined empirically for high, moderate or low stringency,
depending on the level of mismatch sought. For example, a low
stringency wash can comprise washing in a solution containing
0.2×SSC/0.1% SDS for 10 minutes at room temperature; a
moderate stringency wash can comprise washing in a pre-warmed
(42 °C) solution containing 0.2×SSC/0.1% SDS
for 15 minutes at 42 °C; and a high stringency wash can
comprise washing in pre-warmed (68 °C) solution containing
0.1×SSC/0.1% SDS for 15 minutes at 68 °C. Furthermore,
washes can be performed repeatedly or sequentially to obtain a
desired result as known in the art. Equivalent conditions can be
determined by varying one or more of the parameters given as an
example, as known in the art, while maintaining a similar degree of
complementarity between the target nucleic acid molecule and the
primer or probe used (e.g., the sequence to be hybridized).
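The example wash conditions and the rule of thumb given above (each degree by which the final wash temperature is reduced allows about 1% more mismatch, at constant SSC concentration) can be captured in a short sketch. The class and function names are hypothetical, and the record layout is simply one convenient way of holding the three example washes quoted in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Wash:
    ssc_x: float                    # SSC concentration, in multiples of 1x
    sds_percent: float              # SDS concentration, percent
    minutes: int                    # duration of the wash
    temperature_c: Optional[float]  # None means room temperature


# The three example washes quoted in the paragraph above.
EXAMPLE_WASHES = {
    "low": Wash(ssc_x=0.2, sds_percent=0.1, minutes=10, temperature_c=None),
    "moderate": Wash(ssc_x=0.2, sds_percent=0.1, minutes=15, temperature_c=42.0),
    "high": Wash(ssc_x=0.1, sds_percent=0.1, minutes=15, temperature_c=68.0),
}


def max_mismatch_percent(homologous_only_temp_c: float,
                         final_wash_temp_c: float) -> float:
    """Approximate maximum mismatch allowed by lowering the wash temperature.

    Applies the 1% per degree guideline, holding SSC concentration constant.
    Washing at or above the reference temperature allows no extra mismatch.
    """
    reduction = homologous_only_temp_c - final_wash_temp_c
    return max(0.0, reduction) * 1.0


if __name__ == "__main__":
    print(EXAMPLE_WASHES["high"])
    print(max_mismatch_percent(68.0, 58.0))
```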
[0338] A related aspect of the present invention provides a
molecular array, which array comprises a plurality of: [0339] (i)
nucleic acid molecules comprising a nucleotide sequence
corresponding to any one or more of the neoplastic marker sequences
hereinbefore described or a sequence exhibiting at least 80%
identity thereto or a functional derivative, fragment, variant or
homologue of said nucleic acid molecule; or [0340] (ii) nucleic
acid molecules comprising a nucleotide sequence capable of
hybridising to any one or more of the sequences of (i) under medium
stringency conditions or a functional derivative, fragment, variant
or homologue of said nucleic acid molecule; or [0341] (iii) nucleic
acid probes or oligonucleotides comprising a nucleotide sequence
capable of hybridising to any one or more of the sequences of (i)
under medium stringency conditions or a functional derivative,
fragment, variant or homologue of said nucleic acid molecule; or
[0342] (iv) probes capable of binding to any one or more of the
proteins encoded by the nucleic acid molecules of (i) or a
derivative, fragment or homologue thereof, wherein the level of
expression of said marker genes of (i) or proteins of (iv) is
indicative of the neoplastic state of a cell or cellular
subpopulation derived from the large intestine.
[0343] Preferably, said percent identity is at least 85%, 90%, 91%,
92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%.
[0344] Low stringency includes and encompasses from at least about
1% v/v to at least about 15% v/v formamide and from at least about
1M to at least about 2M salt for hybridisation, and at least about
1M to at least about 2M salt for washing conditions. Alternative
stringency conditions may be applied where necessary, such as
medium stringency, which includes and encompasses from at least
about 16% v/v to at least about 30% v/v formamide and from at least
about 0.5M to at least about 0.9M salt for hybridisation, and at
least about 0.5M to at least about 0.9M salt for washing
conditions, or high stringency, which includes and encompasses from
at least about 31% v/v to at least about 50% v/v formamide and from
at least about 0.01M to at least about 0.15M salt for
hybridisation, and at least about 0.01M to at least about 0.15M
salt for washing conditions. In general, washing is carried out at
T.sub.m-12.degree. C., where T.sub.m=69.3+0.41 (G+C) % [19].
However, the T.sub.m of a duplex DNA decreases by 1.degree. C. with
every increase of 1% in the number of mismatched base pairs (Bonner
et al (1973) J. Mol. Biol. 81:123).
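As a worked illustration of the melting temperature relationship in the preceding paragraph, the sketch below applies T.sub.m=69.3+0.41 (G+C) % and subtracts 1.degree. C. for each 1% of mismatched base pairs. The function names are illustrative assumptions and the example values are hypothetical.

```python
def gc_percent(sequence: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def melting_temperature(gc_pct: float, mismatch_pct: float = 0.0) -> float:
    """Estimated duplex Tm in degrees C.

    Uses Tm = 69.3 + 0.41 * (G+C)% and lowers the result by 1 degree C
    for every 1% of mismatched base pairs, as stated in the text.
    """
    if not 0.0 <= gc_pct <= 100.0:
        raise ValueError("gc_pct must be between 0 and 100")
    if mismatch_pct < 0.0:
        raise ValueError("mismatch_pct must not be negative")
    return 69.3 + 0.41 * gc_pct - 1.0 * mismatch_pct


if __name__ == "__main__":
    # Hypothetical duplex with 50% G+C content and 5% mismatched base pairs.
    print(melting_temperature(50.0))
    print(melting_temperature(50.0, mismatch_pct=5.0))
```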
[0345] Preferably, the subject probes are designed to bind to the
nucleic acid or protein to which they are directed with a level of
specificity which minimises the incidence of non-specific
reactivity. However, it would be appreciated that it may not be
possible to eliminate all potential cross-reactivity or
non-specific reactivity, this being an inherent limitation of any
probe based system.
[0346] In terms of the probes which are used to detect the subject
proteins, they may take any suitable form including antibodies and
aptamers.
[0347] A library or array of nucleic acid or protein probes
provides rich and highly valuable information. Further, two or more
arrays or profiles (information obtained from use of an array) of
such sequences are useful tools for comparing a test set of results
with a reference, such as another sample or stored calibrator. In
using an array, individual probes typically are immobilized at
separate locations and allowed to take part in binding reactions.
Oligonucleotide primers associated with assembled sets of markers
are useful for either preparing libraries of sequences or directly
detecting markers from other biological samples.
[0348] A library (or array, when referring to physically separated
nucleic acids corresponding to at least some sequences in a
library) of hCG.sub.--1815491 markers exhibits highly desirable
properties. These properties are associated with specific
conditions, and may be characterized as regulatory profiles. A
profile, as termed here, refers to a set of members that provides
diagnostic information of the tissue from which the markers were
originally derived. A profile in many instances comprises a series
of spots on an array made from deposited sequences.
[0349] A molecular array, which array comprises a plurality of:
[0350] (i) nucleic acid molecules comprising a nucleotide sequence
corresponding to any one or more of the hCG.sub.--1815491 markers
as hereinbefore defined or a sequence exhibiting at least 80%
identity thereto or a functional derivative, fragment, variant or
homologue of said nucleic acid molecule; or [0351] (ii) nucleic
acid molecules comprising a nucleotide sequence capable of
hybridising to any one or more of the sequences of (i) under medium
stringency conditions or a functional derivative, fragment, variant
or homologue of said nucleic acid molecule; or [0352] (iii) nucleic
acid probes or oligonucleotides comprising a nucleotide sequence
capable of hybridising to any one or more of the sequences of (i)
under medium stringency conditions or a functional derivative,
fragment, variant or homologue of said nucleic acid molecule; or
[0353] (iv) probes capable of binding to any one or more of the
proteins encoded by the nucleic acid molecules of (i) or a
derivative, fragment or homologue thereof [0354] wherein the level
of expression of said marker genes of (i) or proteins of (iv) is
indicative of the neoplastic state of a cell or cellular
subpopulation derived from the large intestine.
[0355] A characteristic patient profile is generally prepared by
use of an array. An array profile may be compared with one or more
other array profiles or other reference profiles. The comparative
results can provide rich information pertaining to disease states,
developmental state, receptiveness to therapy and other information
about the patient.
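One simple way to compare an array profile with a reference profile is to compute a per-probe log2 ratio. The sketch below is an illustration only: the probe names and intensity values are hypothetical, and the two-fold cut-off is an assumption rather than a threshold taken from this specification.

```python
import math


def log2_ratios(test_profile: dict, reference_profile: dict) -> dict:
    """Per-probe log2(test / reference) for probes present in both profiles."""
    shared = sorted(set(test_profile) & set(reference_profile))
    return {
        probe: math.log2(test_profile[probe] / reference_profile[probe])
        for probe in shared
    }


def flag_changed(ratios: dict, min_abs_log2: float = 1.0) -> list:
    """Probes whose absolute log2 ratio reaches the cut-off (1.0 = two-fold)."""
    return [probe for probe, ratio in ratios.items() if abs(ratio) >= min_abs_log2]


if __name__ == "__main__":
    # Hypothetical intensities; real values would come from a scanned array.
    test = {"probe_a": 800.0, "probe_b": 100.0}
    reference = {"probe_a": 100.0, "probe_b": 100.0}
    ratios = log2_ratios(test, reference)
    print(ratios)
    print(flag_changed(ratios))
```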
[0356] Another aspect of the present invention provides a
diagnostic kit for assaying biological samples, comprising, in a
first compartment, an agent for detecting one or more neoplastic
markers, and reagents useful for facilitating the detection by the
agent in the first compartment.
Further means may also be included, for example, to receive a
biological sample. The agent may be any suitable detecting
molecule.
[0357] The present invention is further described by the following
non-limiting examples:
Example 1
Materials and Methods
Extraction of RNA
[0358] RNA extractions were performed using Trizol.RTM. reagent
(Invitrogen, Carlsbad, Calif., USA) as per the manufacturer's
instructions. Each sample was homogenised in 300 .mu.L of Trizol
reagent using a modified Dremel drill and sterilised disposable
pestles. An additional 200 .mu.L of Trizol reagent was added to the
homogenate and samples were incubated at RT for 10 minutes. 100
.mu.L of chloroform was then added, and samples were vortexed for
15 seconds and incubated at RT for a further 3 minutes. The
aqueous phase containing target RNA was obtained by centrifugation
at 12,000 rpm for 15 min at 4.degree. C. RNA was then precipitated
by incubating samples at RT for 10 min with 250 .mu.L of
isopropanol. Purified RNA precipitate was collected by
centrifugation at 12,000 rpm for 10 minutes at 4.degree. C. and
supernatants were discarded. Pellets were then washed with 1 mL of
75% ethanol, followed by vortexing and centrifugation at 7,500 g
for 8 min at 4.degree. C. Finally, pellets were air-dried for 5 min
and resuspended in 80 .mu.L of RNase-free water. To improve
subsequent solubility, samples were incubated at 55.degree. C. for
10 min. RNA was quantified by measuring the absorbance at 260 nm
and 280 nm (A260/A280). RNA quality was assessed by electrophoresis
on a 1.2% agarose formaldehyde gel.
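The spectrophotometric quantification step can be sketched as follows. This is an illustration under stated assumptions: it uses the conventional factor of about 40 .mu.g/mL of RNA per A260 unit (1 cm path length), which is not stated in the specification, and the absorbance readings and dilution factor are hypothetical. Only the 80 .mu.L resuspension volume is taken from the text.

```python
# Conventional conversion factor for single-stranded RNA at a 1 cm path
# length (an assumption; not stated in the specification).
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimated RNA concentration of the undiluted sample in ug/mL."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio, commonly used as an indicator of protein contamination."""
    if a280 <= 0:
        raise ValueError("a280 must be positive")
    return a260 / a280


def total_yield_ug(concentration_ug_per_ml: float, volume_ul: float) -> float:
    """Total RNA in ug for a given resuspension volume in uL."""
    return concentration_ug_per_ml * volume_ul / 1000.0


if __name__ == "__main__":
    # Hypothetical readings for a sample diluted 1:10 before measurement.
    conc = rna_concentration_ug_per_ml(a260=0.5, dilution_factor=10.0)
    print(conc, purity_ratio(0.5, 0.25), total_yield_ug(conc, volume_ul=80.0))
```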
Gene Chip Processing
[0359] Gene Chips were processed using the standard Affymetrix
protocol developed for the HU Gene ST 1.0 array described in
[Affymetrix, 2007]. Briefly: First cycle dsDNA was synthesized from
100 ng of total RNA extract using random hexamer primers tagged
with T7 promoter sequence and SuperScript II (Invitrogen, Carlsbad
Calif.) and then DNA Polymerase I. Anti-sense cRNA was then
synthesized using T7 polymerase and combined with SuperScript II,
dUTP (+dNTP), and random hexamers to synthesize sense strand cDNA
incorporating uracil. A combination of uracil DNA glycosylase (UDG)
and apurinic/apyrimidinic endonuclease 1 (APE 1) were used to
fragment the DNA product.
[0360] Next, the DNA was biotin labelled by terminal
deoxynucleotidyl transferase (TdT) with the Affymetrix proprietary
DNA Labeling Reagent covalently linked to biotin. Hybridization to
the Custom Chip CG AGPa520460F was carried out at 45.degree. C. for
16-18 hours. The chips were then washed, stained and scanned as
follows. All GeneChips analyzed in our lab were stained with
streptavidin phycoerythrin and washed with a solution containing
biotinylated anti-streptavidin antibodies using the Affymetrix
Fluidics Station 450. Finally, the stained and washed microarrays
were scanned with the Affymetrix Scanner 3000.
qRT-PCR
[0361] Quantitative real-time polymerase chain reaction was used to
confirm particular gene expression discoveries using Applied
Biosystems pre-designed and optimized TaqMan gene expression
assays. The resulting expression levels were quantified relative to
three reference genes (HPRT, TBP and GAPDH) reported in the
literature to have low-variance expression levels. Final results
were reported using the .DELTA.-cycle threshold method. Prior to
real-time PCR analysis, 100 ng of total RNA was subjected to linear
amplification using the QIAGEN QuantiTect Whole Transcriptome
amplification kit (QIAGEN, Country) according to the manufacturer's
instructions. 2 .mu.l of the amplified, diluted (1:50) cDNA was
then analysed in a 25 .mu.l reaction volume by RT-PCR using TaqMan
universal master mix (Applied Biosystems, USA) in an ABI prism 7700
sequence detector (Manufacturer, Country) following the
manufacturer's protocols.
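The .DELTA.-cycle threshold calculation can be sketched as below. This is a minimal illustration, not the procedure actually used: the specification does not state how the three reference genes were combined, so averaging their Ct values is an assumption, as is the 2^(-.DELTA.Ct) conversion, which presumes roughly 100% amplification efficiency. All Ct values shown are hypothetical.

```python
from statistics import mean


def delta_ct(target_ct: float, reference_cts: list) -> float:
    """Target Ct minus the mean Ct of the reference genes (assumed combination)."""
    if not reference_cts:
        raise ValueError("at least one reference Ct is required")
    return target_ct - mean(reference_cts)


def relative_expression(target_ct: float, reference_cts: list) -> float:
    """Expression relative to the references, assuming doubling per cycle."""
    return 2.0 ** (-delta_ct(target_ct, reference_cts))


if __name__ == "__main__":
    # Hypothetical Ct values for the three reference genes named in the text.
    references = {"HPRT": 22.0, "TBP": 24.0, "GAPDH": 26.0}
    target = 26.0
    print(delta_ct(target, list(references.values())))
    print(relative_expression(target, list(references.values())))
```

A lower .DELTA.Ct corresponds to higher expression of the target relative to the reference genes.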
End-Point PCR
[0362] Prior to end-point PCR analysis, 2 .mu.g of total RNA was
subjected to linear amplification using a high capacity cDNA
reverse transcription kit available from Applied Biosystems. 5
.mu.l of the amplified, diluted (1:2) cDNA was then analysed in a
25 .mu.l reaction volume by PCR using a PCR Master Mix (Promega)
according to the manufacturer's recommendation. 2.5 .mu.l of the
amplified products were analysed on a 2% agarose E-gel (Invitrogen)
along with a 100-base pair DNA Ladder Marker.
Results
[0363] We have explored the nucleotide structure and expression
levels of transcripts related to hCG.sub.--1815491 based on the
identification of diagnostic utility of Affymetrix probesets
238021_s_at and 238022_at from our gene chip analysis.
[0364] The gene hCG.sub.--1815491 is currently represented in NCBI
as a single RefSeq sequence, XM.sub.--93911. The RefSeq sequence of
hCG.sub.--1815491 is based on 89 GenBank accessions from 83 cDNA
clones. Prior to March 2006, these clones were predicted to
represent two overlapping genes, LOC388279 and LOC650242 (the
latter also known as hCG.sub.--1815491). In March 2006, the human
genome database was filtered against clone rearrangements,
co-aligned with the genome and clustered in a minimal non-redundant
way. As a result, LOC388279 and LOC650242 were merged into one gene
named hCG.sub.--1815491 (earlier references to hCG.sub.--1815491
are: LOC388279, hCG.sub.--1815491, LOC650242, XM.sub.--944116,
AF275804, XM.sub.--373688).
[0365] We have determined that SEQ ID NO:1, which is defined by the
genomic coordinates 8579310 to 8562303 on human chromosome 16 (NCBI
contig reference NT.sub.--010498.15|Hs16.sub.--10655, NCBI 36 March
2006 genome), encompasses hCG.sub.--1815491. We have
aligned the 10 predicted RNA variants derived from this gene with
the genomic nucleotide sequence residing in the map region 8579310
to 8562303. This alignment analysis revealed the existence of at
least 6 exons, of which several are alternatively spliced. The
identified 6 exons are in contrast to the just 4 exons specified in
the NCBI hCG.sub.--1815491 RefSeq XM.sub.--93911. We have used the
identified and expanded exon-intron structure of hCG.sub.--1815491
to design specific oligonucleotide primers, which allowed us to
measure the expression of RNA variants generated from SEQ ID NO:1
by using PCR-based methodology.
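The relationship between genomic map regions and transcript length can be checked with simple arithmetic. The sketch below uses the map regions listed for SEQ ID NO:21 in Table 1; the function names are illustrative assumptions. Coordinates are inclusive and are listed in descending order in the table.

```python
def region_length(start: int, end: int) -> int:
    """Number of nucleotides in an inclusive coordinate range (either order)."""
    return abs(start - end) + 1


def spliced_length(regions: list) -> int:
    """Total length of a transcript assembled from the given map regions."""
    return sum(region_length(start, end) for start, end in regions)


# Map regions for SEQ ID NO:21 as listed in Table 1 (human chromosome 16).
SEQ_ID_NO_21_REGIONS = [
    (8576878, 8576605),
    (8573324, 8573212),
    (8571761, 8571696),
    (8568521, 8568409),
    (8567320, 8566974),
]

if __name__ == "__main__":
    for start, end in SEQ_ID_NO_21_REGIONS:
        print(start, end, region_length(start, end))
    print("spliced length:", spliced_length(SEQ_ID_NO_21_REGIONS))
    print("SEQ ID NO:1 span:", region_length(8579310, 8562303))
```

The summed length can be compared with the length of the nucleotide sequence listed for the same variant as a consistency check.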
[0366] We have conclusively demonstrated the utility of SEQ ID NO:1
to diagnose neoplasia. In particular, we have identified that SEQ
ID NO:1 can be used to diagnose adenomas, benign neoplastic lesions
that can lead to colorectal adenocarcinoma. We have also
demonstrated that SEQ ID NO:1 can be used to diagnose colorectal
cancer itself. We hence claim this molecule for broad clinical
utility.
[0367] In addition, we have conclusively demonstrated
neoplastic-specific expression of some of the RNA variants derived
from SEQ ID NO:1. Neoplastic-specific splicing of hCG.sub.--1815491
has not previously been reported. In particular, RNA variant SEQ ID
NO:21 is by far the most pronounced differentially expressed
variant of SEQ ID NO:1, and SEQ ID NO:21 appears to be sensitive
and specific for colorectal benign pre-cancerous adenomas as well
as colorectal carcinoma. Hence we claim diagnostic utility of SEQ
ID NO:21 for detection of colorectal neoplasia.
[0368] Lastly, we have identified a novel RNA variant, SEQ ID
NO:23, derived from alternative splicing of SEQ ID NO:1. This RNA
variant is the result of an unprecedented splicing of map regions
8577328-8576605 and 8573324-8573212. We use this example to claim
diagnostic utility of any combinations of nucleotide segments
derived from SEQ ID NO:1.
Diagnostic Utility of Oligonucleotide Probesets Directed against
hCG.sub.--1815491 Using Affymetrix Microarray Genechips
[0369] The gene expression of human hCG.sub.--1815491 was measured
by determining the hybridization of RNA extracted from clinical
specimens to Affymetrix oligonucleotide probesets, designated
238021_s_at and 238022_at (FIG. 1). The clinical specimens included
a total of 454 colorectal tissues derived from 161 adenocarcinoma,
29 adenoma, 42 colitis and 222 non-diseased subjects.
CONCLUSION
[0370] We conclude that transcripts derived from the human gene
hCG.sub.--1815491 have diagnostic utility for identification of
colorectal neoplasia.
Diagnostic Utility of SEQ ID NO:1
[0371] End-point PCR, using the oligonucleotide sequence primers
5'-TAACTGGAATTCATGTTGGCTGAAATTCATCCCA (located in SEQ ID NO:6) and
5'-CACGATAAGCTTTTATTATAGTCTATAAACAGGAATACCCAAAACATATTTAAACC
(located in SEQ ID NO:18), was performed to measure the RNA
expression level from map region 8573246 to 8567197 within SEQ ID
NO:1 in a total of 71 colorectal tissue specimens: 30 non-diseased
controls, 21 adenoma tissues and 20 adenocarcinoma tissues (FIG.
2). End-point PCR demonstrated the appearance of four major
products that were present in essentially all adenoma and
adenocarcinoma colon tissue specimens. Most colon tissue samples
from non-diseased control specimens produced none, or only a
limited subset, of the PCR products. The multiple PCR bands
included a product of approximately 284 base pairs, which is the
size predicted from the RefSeq NCBI hCG.sub.--1815491 entry, as
well as other bands presumed to arise from alternative splicing.
CONCLUSION
[0372] We conclude that SEQ ID NO:1, which contains map region
8573246 to 8567197, has diagnostic utility as a means for detection
of colorectal neoplasia.
Diagnostic Utility of SEQ ID NO:1 by Measuring Concentration
Levels
[0373] Quantitative real-time PCR, using the same oligonucleotide
sequence primers as described in Example 2,
5'-TAACTGGAATTCATGTTGGCTGAAATTCATCCCA and
5'-CACGATAAGCTTTTATTATAGTCTATAAACAGGAATACCCAAAACATATTTAAACC, was
performed to measure the RNA concentration level of SEQ ID NO:1
transcripts derived from map region 8573246 to 8567197 in a total
of 71 colorectal tissue specimens: 30 non-diseased controls, 21
adenoma tissues and 20 adenocarcinoma tissues (FIG. 3). The figure
shows that most normal tissues expressed low or non-detectable
levels of transcripts, whereas nearly all adenoma and
adenocarcinoma tissues expressed moderate to high levels of
transcripts from SEQ ID NO:1.
CONCLUSION
[0374] We conclude that SEQ ID NO:1, which contains map region
8573246 to 8567197, has diagnostic utility as a means for detection
of colorectal neoplasia.
Diagnostic Utility of RNA Transcript Variants from SEQ ID NO:1
[0375] cDNA clones from NCBI/Aceview (Table 4) were used to gather
information regarding predicted RNA transcripts derived from
hCG.sub.--1815491, FIG. 4 & TABLE 1. None of the reported
clones were derived from normal or neoplastic colon tissues.
[0376] Oligonucleotide sequence primer sets were generated to each
of the 10 predicted hCG.sub.--1815491 RNA variants (Table 5), and
end-point PCR using these primer sets was performed to test for the
presence of the 10 hCG.sub.--1815491 transcript variants in a total
of 72 colorectal tissue specimens from 30 non-disease, 21 adenoma
and 21 adenocarcinoma subjects.
[0377] The differential expression of the 10 predicted RNA
transcripts, as determined using transcript specific primers, is
exemplified in FIG. 5 and Table 2. Differential expression as
measured by end-point PCR was observed for several of the 10 RNA
variants (TABLE 2), e.g. SEQ ID NO:22, SEQ ID NO:24 and SEQ ID
NO:27, and was most pronounced for SEQ ID NO:21.
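The counts reported for SEQ ID NO:21 in Table 2 can be expressed as sensitivity and specificity. The sketch below is illustrative only: the function names are assumptions, and the resulting proportions are simple point estimates from this small sample set rather than figures stated in the specification.

```python
def sensitivity(true_positives: int, diseased_total: int) -> float:
    """Fraction of diseased specimens that tested positive."""
    return true_positives / diseased_total


def specificity(false_positives: int, control_total: int) -> float:
    """Fraction of control specimens that tested negative."""
    return (control_total - false_positives) / control_total


# (positive, total) counts for SEQ ID NO:21 as listed in Table 2.
CONTROLS = (3, 30)
ADENOMA = (19, 21)
ADENOCARCINOMA = (20, 21)

if __name__ == "__main__":
    print("adenoma sensitivity:", round(sensitivity(*ADENOMA), 3))
    print("adenocarcinoma sensitivity:", round(sensitivity(*ADENOCARCINOMA), 3))
    print("specificity:", round(specificity(*CONTROLS), 3))
```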
CONCLUSION
[0378] We conclude that predicted RNA variants derived from SEQ ID
NO:1 exist and they are generated through alternative usage of
nucleotide segments in SEQ ID NO:1. We conclude that the presence
of several of the RNA variants and specific splicing events, such
as represented in SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:27 but in
particular SEQ ID NO:21, have diagnostic utility for detection of
colorectal neoplasia.
Diagnostic Utility of RNA Transcript Variants from SEQ ID NO:1, by
Measuring Concentration Levels
[0379] Quantitative Real-Time PCR was performed to measure the
concentration level of RNA variants derived from map region 8579310
to 8562303 on the minus strand of human chromosome 16 in a total of
72 colorectal tissue specimens from 30 non-disease controls, 21
adenoma and 21 adenocarcinoma subjects. Quantitative differences
were observed for several of the transcripts, and an example of the
quantitative expression profile of SEQ ID NO:21 is given in FIG.
6.
CONCLUSION
[0380] We conclude that measurement of the RNA concentrations of
SEQ ID NO:25, SEQ ID NO:30, SEQ ID NO:24 but in particular SEQ ID
NO:21 has diagnostic utility for detection of colorectal
neoplasia.
Detection of a Novel RNA Variant, SEQ ID NO:23
[0381] We hypothesized that the gene contained within SEQ ID NO:1
contained 6 or more exons that were alternatively spliced in
multiple combinations in human colorectal tissue. Alignment of the
nucleotide sequences of the predicted mRNA variants derived from
hCG.sub.--1815491 illustrated that the first 448 nucleotides of RNA
SEQ ID NO:25, map region 8577328-8576881 in SEQ ID NO:1, and the
first 274 nucleotides of RNA SEQ ID NO:21, map region
8576878-8576605 in SEQ ID NO:1, were in fact flanking each other.
End-point PCR, using a forward primer spanning the splice junction
of SEQ ID NO:4 and SEQ ID NO:5, 5'-GGCGGAGGAGAGGTGAGC, with a
reverse primer, 5'-GCTGACAGCATCCAAATGTATTATG, hybridizing to SEQ ID
NO:6, was performed to demonstrate a novel RNA variant derived from
alternative splicing of map region 8576892-8576605 with
8573324-8573280 (FIG. 7). The novel RNA variant, named SEQ ID
NO:23, appeared up-regulated in colorectal tissue specimens from 3
adenoma and 3 adenocarcinoma subjects but not in 2 non-disease
controls.
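The placement of the two primers on the SEQ ID NO:23 sequence can be checked computationally. The sketch below is an illustration only: the two template excerpts are short stretches copied from the SEQ ID NO:23 listing in Table 1, and the helper names are assumptions. The forward primer is searched for directly, and the reverse primer is searched for as its reverse complement.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def find_primer(template: str, primer: str, reverse: bool = False) -> int:
    """0-based position of the primer binding site in the template, or -1."""
    primer = primer.replace(" ", "").upper()
    site = reverse_complement(primer) if reverse else primer
    return template.upper().find(site)


FORWARD_PRIMER = "GGCGGAGGAGAGGTGAGC"
REVERSE_PRIMER = "GCTGACAGCATCCAAATGTATTATG"

# Excerpts copied from the SEQ ID NO:23 listing in Table 1.
JUNCTION_EXCERPT = (
    "gcccgcgcgcccgcgcggcggaggagaggt"
    "gagcccccgcccgggccaggccctctggcc"
)
DOWNSTREAM_EXCERPT = (
    "ggaggtgttaagtgtgatgcttccataata"
    "catttggatgctgtcagctaagttcacttc"
)

if __name__ == "__main__":
    print(find_primer(JUNCTION_EXCERPT, FORWARD_PRIMER))
    print(find_primer(DOWNSTREAM_EXCERPT, REVERSE_PRIMER, reverse=True))
```

A result of -1 from either search would indicate that the primer does not match the excerpt exactly.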
CONCLUSION
[0382] Review of all publicly available data indicates that a
nucleotide sequence corresponding to SEQ ID NO:23 has never before
been identified. We conclude that SEQ ID NO:23 represents a novel
RNA variant derived from SEQ ID NO:1. While new sequence data is
common with respect to the human genome project, we have identified
that this transcript designated SEQ ID NO:23 is a splice variant
diagnostic of colorectal neoplasia.
Diagnostic Utility of Individual Exons of hCG.sub.--1815491
[0383] Gene expression across the chromosomal map region 8579310 to
8562303 on chromosome 16 was measured by determining the
hybridization of RNA extracted from clinical specimens to the
Affymetrix oligonucleotide probesets specified in TABLE 3. The
observed differential expression of the probesets specified in
Table 3 from 5 non-disease subjects, 5 adenoma and 5 adenocarcinoma
subjects is summarized in FIG. 8. Details of the differential
expression across the 13 probesets are provided in FIGS. 9-21. We
note that expression was not measured across all predicted exons
from SEQ ID NO:1, as the available probesets on the Affymetrix
GeneChip HuGene Exon 1.0 only targeted a subset of the predicted
exons in SEQ ID NO:1.
CONCLUSION
[0384] We conclude that the map region 8577414 to 8566289 has
diagnostic utility for identification of colorectal neoplasia. In
particular, Affymetrix probesets 3692525 (SEQ ID NO:6), 3692524
(SEQ ID NO:9), 3692519 (SEQ ID NO:18), 3692520 (SEQ ID NO:17),
3692523 and 3692522 (SEQ ID NO:15), and 3692521 (SEQ ID NO:13) can
be used to diagnose adenomas, benign neoplastic lesions that can
lead to colorectal adenocarcinoma. We also conclude that these
probesets can be used to diagnose colorectal cancer itself.
[0385] Those skilled in the art will appreciate that the invention
described herein is susceptible to variations and modifications
other than those specifically described. It is to be understood
that the invention includes all such variations and modifications.
The invention also includes all of the steps, features,
compositions and compounds referred to or indicated in this
specification, individually or collectively, and any and all
combinations of any two or more of said steps or features.
TABLE-US-00001 TABLE 1 LIST OF MOLECULE SEQUENCES Genomic Map
SEQUENCE Region-Human ID FIG. 23 Nucleotide sequence Chromosome 16
SEQ ID SEE FIG. 2 8579310- NO: 1 8562303 SEQ ID E2b-E3-
gagcccccgcccgggccaggccctctggccgcgccgtccgcccctctagt 8576878- NO: 21
E5a-E6- cgtgtcccctcgtgggccgaacggacgcggcggtgccccgcgcccgacca 8576605
E7a gacgtcccgtgggctagggcctgggcctcgggccgcgtcggcgccggtcg 8573324-
agcctctccgggtgtcggggttcggggcgggcgcgcgtgggcgtggctcc 8573212
tctgtccacgcctgttcccttcgtcgccgcggctctcgtccgggacacgg 8571761-
ctttccggagtagagcccttggaggtgttaagtgtgatgcttccataata 8571696
catttggatgctgtcagctaagttcacttctgaactaaggggttcctcca 8568521-
aatgttggctgaaattcatcccaaggctggtctgcaaagtctgcaattca 8568409
taatggagctactgtactggctattggaaggaggagattctgaagataag 8567320-
gaggtaaaacctgtttagaaattaaaaatgagttacgatttaaagaaaat 8566974
tcagatgactcattgtgagtgctagttctcttgtaggatgccactggaaa
tgttgaaatgaaaaatattcagccgttggtctttgaaatttcctgtgatg
tgtttcaatctagatgcaaagaacatggaaaaatcaaagtgctcgagtgg
tttaaatatgttttgggtattcctgtttatagactataatacttttccaa
ttaaaatcctcagttgtcacgcagaagaaggttaagctgtatttgattgc
cagttttactgaaaatgcttagtattttacagtatcaccaaatatatttt
gtttagccaaggtataggaaaaataaaataaattgtataggttgactttt
ttctaaaatgtctttattggattgaatgaatgtttatacctgaaaaaaaa aggttcaaaaaaa
SEQ ID E2b-E3- Gagcccccgcccgggccaggccctctggccgcgccgtccgcccctctagt
8576878- NO: 22 E5a-E6c-
cgtgtcccctcgtgggccgaacggacgcggcggtgccccgcgcccgacca 8576605 E7a
gacgtcccgtgggctagggcctgggcctcgggccgcgtcggcgccggtcg 8573324-
agcctctccgggtgtcggggttcggggcgggcgcgcgtgggcgtggctcc 8573212
tctgtccacgcctgttcccttcgtcgccgcggctctcgtccgggacacgg 8571761-
ctttccggagtagagcccttggaggtgttaagtgtgatgcttccataata 8571696
catttggatgctgtcagctaagttcacttctgaactaaggggttcctcca 8568449-
aatgttggctgaaattcatcccaaggctggtctgcaaagtctgcaattca 8568409
taatggagctactgtactggctattggaaggaggagattctgaagataag 8567320-
gagttctcttgtaggatgccactggaaatgttgaaatgaaaaatattcag 8566974
ccgttggtctttgaaatttcctgtgatgtgtttcaatctagatgcaaaga
acatggaaaaatcaaagtgctcgagtggtttaaatatgttttgggtattc
ctgtttatagactataatacttttccaattaaaatcctcagttgtcacgc
agaagaaggttaagctgtatttgattgccagttttactgaaaatgcttag
tattttacagtatcaccaaatatattttgtttagccaaggtataggaaaa
ataaaataaattgtataggttgacttttttctaaaatgtctttattggat
tgaatgaatgtttatacctgaaaaaaaaaggttcaaaaaaa SEQ ID E2a-E2b-
tctcggcgccagaggggcggggaggggcggggtctcgatcgcgctattgt 8577328- NO: 23
E3 catggagacgggaagctggctgcagcggcggcggggaccgtggggccgag 8576605
gtggctgccagccggccaatgtctaagcgaggcggagcggcccaggcggc 8573324-
ccgagcctgggggagcgcgcagccggccagtggcggcctcgccggcggcc 8573212
tcttcccgggctcgcagtaggcccgagtcgtcgccgggagctcctgggag
cagcgtccccgccctgctcccctcgctcccgcctcttgcggccccacggc
ccctcagcgcccgcccccggctccgcccgccgcagccgcagcccctggcg
ctaacggtcggtaacggcccgcgcgcgccgcccgccgggggctcgcgcca
gccacgagggagcgtccgcggcccgcgcgcccgcgcggcggaggagaggt
gagcccccgcccgggccaggccctctggccgcgccgtccgcccctctagt
cgtgtcccctcgtgggccgaacggacgcggcggtgccccgcgcccgacca
gacgtcccgtgggctagggcctgggcctcgggccgcgtcggcgccggtcg
agcctctccgggtgtcggggttcggggcgggcgcgcgtgggcgtggctcc
tctgtccacgcctgttcccttcgtcgccgcggctctcgtccgggacacgg
ctttccggagtagagcccttggaggtgttaagtgtgatgcttccataata
catttggatgctgtcagctaagttcacttctgaactaaggggttcctcca
aatgttggctgaaattcatcccaaggctggtctgcaa SEQ ID E5b-E6-
catgctttttgagaagtgtatcatctaggaagaaaatcaaatggagtatt 8571889- NO: 24
E7a ggtaattaaattgtaattccatgaaggaaggaagtggtgcaaaagatgaa 8571696
gctaactattcctgtttttctttttaagagtctgcaattcataatggagc 8568521-
tactgtactggctattggaaggaggagattctgaagataaggaggtaaaa 8568409
cctgtttagaaattaaaaatgagttacgatttaaagaaaattcagatgac 8567320-
tcattgtgagtgctagttctcttgtaggatgccactggaaatgttgaaat 8566974
gaaaaatattcagccgttggtctttgaaatttcctgtgatgtgtttcaat
ctagatgcaaagaacatggaaaaatcaaagtgctcgagtggtttaaatat
gttttgggtattcctgtttatagactataatacttttccaattaaaatcc
tcagttgtcacgcagaagaaggttaagctgtatttgattgccagttttac
tgaaaatgcttagtattttacagtatcaccaaatatattttgtttagcca
aggtataggaaaaataaaataaattgtataggttgacttttttctaaaat
gtctttattggattgaatgaatgtttatacctgaaaaaaaaaggttcaaa aaaa SEQ ID
E2a-E3a tctcggcgccagaggggcggggaggggcggggtctcgatcgcgctattgt 8577328-
NO: 25 catggagacgggaagctggctgcagcggcggcggggaccgtggggccgag 8576881
gtggctgccagccggccaatgtctaagcgaggcggagcggcccaggcggc 8573324-
ccgagcctgggggagcgcgcagccggccagtggcggcctcgccggcggcc 8573041
tcttcccgggctcgcagtaggcccgagtcgtcgccgggagctcctgggag
cagcgtccccgccctgctcccctcgctcccgcctcttgcggccccacggc
ccctcagcgcccgcccccggctccgcccgccgcagccgcagcccctggcg
ctaacggtcggtaacggcccgcgcgcgccgcccgccgggggctcgcgcca
gccacgagggagcgtccgcggcccgcgcgcccgcgcggcggaggagaggt
gttaagtgtgatgcttccataatacatttggatgctgtcagctaagttca
cttctgaactaaggggttcctccaaatgttggctgaaattcatcccaagg
ctggtctgcaagtgagtgtctgcacacagtttgcttgtatgtggagtcga
tccaaaatagcatcaatgttggttttaccaaagtatttattattgataat
agaggctaagtacaaaatgtagagaatgtcagctacttgaggcctttgat
tattaaaaattttattaatgcattaaacaaga SEQ ID E3-E5a-
gtgttaagtgtgatgcttccataatacatttggatgctgtcagctaagtt 8573324- NO: 26
E6a-E7a cacttctgaactaaggggttcctccaaatgttggctgaaattcatcccaa 8573212
ggctggtctgcaaagtctgcaattcataatggagctactgtactggctat 8571761-
tggaaggaggagattctgaagataaggaggatgccactggaaatgttgaa 8571696
atgaaaaatattcagccgttggtctttgaaatttcctgtgatgtgtttca 8568438-
atctagatgcaaagaacatggaaaaatcaaagtgctcgagtggtttaaat 8568409
atgttttgggtattcctgtttatagactataatacttttccaattaaaat 8567320-
cctcagttgtcacgcagaagaaggttaagctgtatttgattgccagtttt 8566974
actgaaaatgcttagtattttacagtatcaccaaatatattttgtttagc
caaggtataggaaaaataaaataaattgtataggttgacttttttctaaa
atgtctttattggattgaatgaatgtttatacctgaaaaaaaaaggttca aaaaaa SEQ ID
E2a-E3- tctcggcgccagaggggcggggaggggcggggtctcgatcgcgctattgt 8577328-
NO: 27 E4-E5a- catggagacgggaagctggctgcagcggcggcggggaccgtggggccgag
8576881 E6-E7a gtggctgccagccggccaatgtctaagcgaggcggagcggcccaggcggc
8573324- ccgagcctgggggagcgcgcagccggccagtggcggcctcgccggcggcc 8573212
tcttcccgggctcgcagtaggcccgagtcgtcgccgggagctcctgggag 8572798-
cagcgtccccgccctgctcccctcgctcccgcctcttgcggccccacggc 8572712
ccctcagcgcccgcccccggctccgcccgccgcagccgcagcccctggcg 8571761-
ctaacggtcggtaacggcccgcgcgcgccgcccgccgggggctcgcgcca 8571696
gccacgagggagcgtccgcggcccgcgcgcccgcgcggcggaggagaggt 8568521-
gttaagtgtgatgcttccataatacatttggatgctgtcagctaagttca 8568409
cttctgaactaaggggttcctccaaatgttggctgaaattcatcccaagg 8567320-
ctggtctgcattacctatttcttttaagaataaatttagtgggaatatca 8566974
gttccagtcatgggtaccaaacttttttagtgacagagtacacacagagt
ctgcaattcataatggagctactgtactggctattggaaggaggagattc
tgaagataaggaggtaaaacctgtttagaaattaaaaatgagttacgatt
taaagaaaattcagatgactcattgtgagtgctagttctcttgtaggatg
ccactggaaatgttgaaatgaaaaatattcagccgttggtctttgaaatt
tcctgtgatgtgtttcaatctagatgcaaagaacatggaaaaatcaaagt
gctcgagtggtttaaatatgttttgggtattcctgtttatagactataat
acttttccaattaaaatcctcagttgtcacgcagaagaaggttaagctgt
atttgattgccagttttactgaaaatgcttagtattttacagtatcacca
aatatattttgtttagccaaggtataggaaaaataaaataaattgtatag
gttgacttttttctaaaatgtctttattggattgaatgaatgtttatacc
tgaaaaaaaaaggttcaaaaaaa SEQ ID E6e
tgataagcaacatccaaatattttgaccctgcttttagtggtttttttca 8569201- NO: 28
aatcttattttgagtcttacttttagtcatagaatagctactgatttgat 8566974
unspliced gcggtctttaactgacttaatatttttacaatttcaatatattttgcatt
ggaatctccagtaatgaatattaaaatatatgtacaatcatttgtagatg
atatcaattatattaagacatttcagatgggctattgtagtatttaatgt
gccgtattttatggtagaataattctcagtctctggacatcaagattgct
ttcagtgggaatgaagattaatttacttcagtcctgattttttaggcatc
aatgcatgttttcatttttgtcagacttttaccctcttttaatgtaattc
tcaacttcttatggatttacttcccaatacataaaatccttcaaaacaag
aatgataataatttttatactttttataaaaataaatttatttttagtcc
atcaaggtgtctgaagattttatgcctaggtatctccatatctaacttga
taaggaaaataggataaacaatgctggtaatagcaggaaagtaagtattt
gaataagatgtcaaactgatatttcatgtgaacctaactcattttatggt
aactaataattatcttatttaaatcaataggtaaaacctgtttagaaatt
aaaaatgagttacgatttaaagaaaattcagatgactcattgtgagtgct
agttctcttgtaggatgccactggaaatgttgaaatgaaaaatgtaagta
tatcttttggtggaaaaaaggatagtctctaggacacaaaattactgttt
tatttttttctcaggagtttgcctaagggtgtgacagatgatctctgtca
cttgtcttagttgtgtcctgcaataaactggatgctttataaaatactag
acctgtgatttcgtatgctgtaatatttcatttctccatcacccctccaa
attatttcttagtttggagtaaaataataaatgtattatagtcaacatct
cttgacccctctttagtttcagctaaactaagcatgtgtgtttgtgtgtt
cattttatagttcatgtgtagaactatgtgaattaaatttaagaaacatg
taaagtagaggaaatagttttctggagaaatttttcctttttggatatta
tgcccttttccattgcttttctctgcttgaaagcaaaaaaaagtacccta
cccctgttctcctttagggaaaaactattcctataaagtatttttaaatc
gtgcaagtcattgcctagggttagctaaaacatttctttttaaaaaggag
aaaatgccctggctttaacattttcttgtatttgtatctattaagataaa
cagtttactttgatacagtacataccaatctacttaattttttttccagg
attccttttactatgtttggtctgaccttttatgataacttaatatggga
acaaattagcatataattctattttccatgtgacctcaaccagttgcaga
attgtaccactactttagggggggcaatttgacagtttatgtagactata
gcattaattgttcccaaatgttcagtgcatcctggctaatgtgttattga
aggtgttttcacgtaagcagttagaggaagcacttcacccctattactaa
gttattaaaatgcctcctaaaggtagcattttaaattagtatacataatt
gattagtaatttgtcttctcccaagcataaaacagcatagcagagttaag
tgtgaccagtgaagtataagatattagggattgatggtgacaatgatcat
agcaactaaatggattttttttttcttttagattcagccgttggtctttg
aaatttcctgtgatgtgtttcaatctagatgcaaagaacatggaaaaatc
aaagtgctcgagtggtttaaatatgttttgggtattcctgtttatagact
ataatacttttccaattaaaatcctcagttgtcacgcagaagaaggttaa
gctgtatttgattgccagttttactgaaaatgcttagtattttacagtat
caccaaatatattttgtttagccaaggtataggaaaaataaaataaattg
tataggttgacttttttctaaaatgtctttattggattgaatgaatgttt
atacctgaaaaaaaaaggttcaaaaaaat SEQ ID E6d-E7a
tttaatagaaggaaaatataaatttaatatctgggcaattgagaccttta 8570158- NO: 29
aacttactttaaaagtatgatcttgatgtatatgatactgttttgtcttt 8568409
gctatattaacagaattagaggggtgttctgcaattcaaataccttatat 8567320-
attccaaattttattctctataatggacttttaaaataaaaggtatatgt 8566974
gcttcaagagggcaaaatttgaatcatgagctaatttgctaagcatcaga
ttatagaaaagcatccttgattaatttggaactgtgaaagggggcgggta
aaactgttttctgcagaaatttactagtgcagcaaccatttaaattaaat
gtttgttaacataatagtgatggcattttctcctccccctccttgtggtt
ttgtccaactagatgttacagtggcagttgcactgactgttaagtgttta
aatgatgacaccattatgtgaagtgattttgaaatgagagattccagcca
agaattacatctgctcccatctccttcaaatcatactctctggcagtaca
gattatgattgatttgtttgtgacagattgcaggaaacagtcattgattt
ttcaatattttaccttaaaattatttacagttgtaaccatggggaggtat
tttcatgggctgtcagcccctgaaagactaggataatattccctgctctc
tgacaagacaaattacctgtaatgagtgcagtagctgaagggtatacttt
tattttaaaatatgtcaataaccccagtgactaaacgaatattgatttag
cataatgaagcctgagtaacgtgaaaatgagctttttcaaggggcatggt
aaagtctttctttttagctggttgtaagaagcttttgattcttttcagcc
agctggtaggaatatagaattttataagcaaaccatcaggaatgatagtg
ttgtttctgataagcaacatccaaatattttgaccctgcttttagtggtt
tttttcaaatcttattttgagtcttacttttagtcatagaatagctactg
atttgatgcggtctttaactgacttaatatttttacaatttcaatatatt
ttgcattggaatctccagtaatgaatattaaaatatatgtacaatcattt
gtagatgatatcaattatattaagacatttcagatgggctattgtagtat
ttaatgtgccgtattttatggtagaataattctcagtctctggacatcaa
gattgctttcagtgggaatgaagattaatttacttcagtcctgatttttt
aggcatcaatgcatgttttcatttttgtcagacttttaccctcttttaat
gtaattctcaacttcttatggatttacttcccaatacataaaatccttca
aaacaagaatgataataatttttatactttttataaaaataaatttattt
ttagtccatcaaggtgtctgaagattttatgcctaggtatctccatatct
aacttgataaggaaaataggataaacaatgctggtaatagcaggaaagta
agtatttgaataagatgtcaaactgatatttcatgtgaacctaactcatt
ttatggtaactaataattatcttatttaaatcaataggtaaaacctgttt
agaaattaaaaatgagttacgatttaaagaaaattcagatgactcattgt
gagtgctagttctcttgtaggatgccactggaaatgttgaaatgaaaaat
attcagccgttggtctttgaaatttcctgtgatgtgtttcaatctagatg
caaagaacatggaaaaatcaaagtgctcgagtggtttaaatatgttttgg
gtattcctgtttatagactataatacttttccaattaaaatcctcagttg
tcacgcagaagaaggttaagctgtatttgattgccagttttactgaaaat
gcttagtattttacagtatcaccaaatatattttgtttagccaaggtata
ggaaaaataaaataaattgtataggttgacttttttctaaaatgtcttta
ttggattgaatgaatgtttatacctgaaaaaaaaaggttcaaaaaaa SEQ ID E2a-E3-
tctcggcgccagaggggcggggaggggcggggtctcgatcgcgctattgt 8577328- NO: 30
E5-E7 catggagacgggaagctggctgcagcggcggcggggaccgtggggccgag 8576881
gtggctgccagccggccaatgtctaagcgaggcggagcggcccaggcggc 8573324-
ccgagcctgggggagcgcgcagccggccagtggcggcctcgccggcggcc 8573212
tcttcccgggctcgcagtaggcccgagtcgtcgccgggagctcctgggag 8571761-
cagcgtccccgccctgctcccctcgctcccgcctcttgcggccccacggc 8571392
ccctcagcgcccgcccccggctccgcccgccgcagccgcagcccctggcg 8567576-
ctaacggtcggtaacggcccgcgcgcgccgcccgccgggggctcgcgcca 8566974
gccacgagggagcgtccgcggcccgcgcgcccgcgcggcggaggagaggt
gttaagtgtgatgcttccataatacatttggatgctgtcagctaagttca
cttctgaactaaggggttcctccaaatgttggctgaaattcatcccaagg
ctggtctgcaaagtctgcaattcataatggagctactgtactggctattg
gaaggaggagattctgaagataaggaggtaatattatctcttttaaaaga
atactttcctctgtaatcctgaatctttattacatgtaagaactttgtgc
agtagacagcaatttctttgaatttggtatatggaaacaattttattttc
ctctgctaagtttttgagcctgcctcttctagtgccatggactgcattgg
tagagctgagaaatatcatttagccatactcagcacccttaaaatagctt
ctttctgagaattagatctgtgaaggtgtcctgcacagttcttgtagatg
tcattttagtttgtggttgacgtgcatgcattgcatcctggctaatgtgt
tattgaaggtgttttcacgtaagcagttagaggaagcacttcacccctat
tactaagttattaaaatgcctcctaaaggtagcattttaaattagtatac
ataattgattagtaatttgtcttctcccaagcataaaacagcatagcaga
gttaagtgtgaccagtgaagtataagatattagggattgatggtgacaat
gatcatagcaactaaatggattttttttttcttttagattcagccgttgg
tctttgaaatttcctgtgatgtgtttcaatctagatgcaaagaacatgga
aaaatcaaagtgctcgagtggtttaaatatgttttgggtattcctgttta
tagactataatacttttccaattaaaatcctcagttgtcacgcagaagaa
ggttaagctgtatttgattgccagttttactgaaaatgcttagtatttta
cagtatcaccaaatatattttgtttagccaaggtataggaaaaataaaat
aaattgtataggttgacttttttctaaaatgtctttattggattgaatga
atgtttatacctgaaaaaaaaaggttcaaaaaaa
SEQ ID NO: 31 E2a-E3-E5a-E6-E7a 8577328-8576881, 8573324-8573212, 8571761-8571696, 8568521-8568409, 8567320-8566974
tctcggcgccagaggggcggggaggggcggggtctcgatcgcgctattgt
catggagacgggaagctggctgcagcggcggcggggaccgtggggccgag
gtggctgccagccggccaatgtctaagcgaggcggagcggcccaggcggc
ccgagcctgggggagcgcgcagccggccagtggcggcctcgccggcggcc
tcttcccgggctcgcagtaggcccgagtcgtcgccgggagctcctgggag
cagcgtccccgccctgctcccctcgctcccgcctcttgcggccccacggc
ccctcagcgcccgcccccggctccgcccgccgcagccgcagcccctggcg
ctaacggtcggtaacggcccgcgcgcgccgcccgccgggggctcgcgcca
gccacgagggagcgtccgcggcccgcgcgcccgcgcggcggaggagaggt
gttaagtgtgatgcttccataatacatttggatgctgtcagctaagttca
cttctgaactaaggggttcctccaaatgttggctgaaattcatcccaagg
ctggtctgcaaagtctgcaattcataatggagctactgtactggctattg
gaaggaggagattctgaagataaggaggtaaaacctgtttagaaattaaa
aatgagttacgatttaaagaaaattcagatgactcattgtgagtgctagt
tctcttgtaggatgccactggaaatgttgaaatgaaaaatattcagccgt
tggtctttgaaatttcctgtgatgtgtttcaatctagatgcaaagaacat
ggaaaaatcaaagtgctcgagtggtttaaatatgttttgggtattcctgt
ttatagactataatacttttccaattaaaatcctcagttgtcacgcagaa
gaaggttaagctgtatttgattgccagttttactgaaaatgcttagtatt
ttacagtatcaccaaatatattttgtttagccaaggtataggaaaaataa
aataaattgtataggttgacttttttctaaaatgtctttattggattgaa
tgaatgtttatacctgaaaaaaaaaggttcaaaaaaa
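The genomic map regions given for SEQ ID NO: 30 and SEQ ID NO: 31 are listed as descending coordinate pairs, which suggests a transcript on the reverse strand. As a rough consistency check, the following sketch sums the segment lengths for each variant. It assumes that both coordinates of each pair are inclusive; the names `SEGMENTS` and `segment_length` are illustrative only and do not come from the specification.

```python
# Genomic map regions transcribed from the SEQ ID NO: 30 and SEQ ID NO: 31 rows.
# Each pair is (first coordinate, second coordinate) as printed.
SEGMENTS = {
    30: [(8577328, 8576881), (8573324, 8573212),
         (8571761, 8571392), (8567576, 8566974)],
    31: [(8577328, 8576881), (8573324, 8573212), (8571761, 8571696),
         (8568521, 8568409), (8567320, 8566974)],
}


def segment_length(first: int, second: int) -> int:
    """Length of a segment, assuming both coordinates are inclusive."""
    return abs(first - second) + 1


def transcript_length(seq_id: int) -> int:
    """Sum of all segment lengths listed for one variant."""
    return sum(segment_length(a, b) for a, b in SEGMENTS[seq_id])


for seq_id, pairs in SEGMENTS.items():
    lengths = [segment_length(a, b) for a, b in pairs]
    print(f"SEQ ID NO: {seq_id}: segments {lengths}, total {transcript_length(seq_id)}")
```

Under the inclusive-coordinate assumption, the first three segments of SEQ ID NO: 30 come out at 448, 113 and 370 nucleotides, the same lengths that the sequence listing records for SEQ ID NOs: 4, 6 and 9.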
TABLE-US-00002 TABLE 2 SUMMARY OF END-POINT PCR BASED MEASUREMENT OF PREDICTED RNA VARIANTS DERIVED FROM SEQ ID NO: 1

                Non-diseased Controls   Adenoma                 Adenocarcinoma
SEQ ID NO: 21   3 positive out of 30    19 positive out of 21   20 positive out of 21
SEQ ID NO: 23   0 positive out of 2     3 positive out of 3     3 positive out of 3
SEQ ID NO: 24   1 positive out of 30    15 positive out of 21   5 positive out of 21
SEQ ID NO: 27   1 positive out of 30    11 positive out of 21   11 positive out of 21
SEQ ID NO: 22   1 positive out of 30    6 positive out of 21    8 positive out of 21
SEQ ID NO: 29   8 positive out of 30    18 positive out of 21   20 positive out of 21
SEQ ID NO: 28   12 positive out of 30   18 positive out of 21   18 positive out of 21
SEQ ID NO: 30   16 positive out of 30   20 positive out of 21   21 positive out of 21
SEQ ID NO: 31   16 positive out of 30   21 positive out of 21   21 positive out of 21
SEQ ID NO: 25   19 positive out of 30   20 positive out of 21   21 positive out of 21
SEQ ID NO: 26   19 positive out of 30   20 positive out of 21   21 positive out of 21
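The counts in Table 2 can be turned into per-variant detection rates. The sketch below is one possible reading, not part of the specification: it treats the fraction of positive neoplastic samples as a sensitivity and one minus the fraction of positive controls as a specificity. The names `TABLE_2`, `positive_rate` and `summarize` are hypothetical.

```python
from fractions import Fraction

# (positive samples, total samples) per tissue class, transcribed from Table 2.
TABLE_2 = {
    21: {"control": (3, 30), "adenoma": (19, 21), "adenocarcinoma": (20, 21)},
    23: {"control": (0, 2), "adenoma": (3, 3), "adenocarcinoma": (3, 3)},
    24: {"control": (1, 30), "adenoma": (15, 21), "adenocarcinoma": (5, 21)},
    27: {"control": (1, 30), "adenoma": (11, 21), "adenocarcinoma": (11, 21)},
    22: {"control": (1, 30), "adenoma": (6, 21), "adenocarcinoma": (8, 21)},
    29: {"control": (8, 30), "adenoma": (18, 21), "adenocarcinoma": (20, 21)},
    28: {"control": (12, 30), "adenoma": (18, 21), "adenocarcinoma": (18, 21)},
    30: {"control": (16, 30), "adenoma": (20, 21), "adenocarcinoma": (21, 21)},
    31: {"control": (16, 30), "adenoma": (21, 21), "adenocarcinoma": (21, 21)},
    25: {"control": (19, 30), "adenoma": (20, 21), "adenocarcinoma": (21, 21)},
    26: {"control": (19, 30), "adenoma": (20, 21), "adenocarcinoma": (21, 21)},
}


def positive_rate(counts):
    """Exact fraction of positive samples for one (positive, total) pair."""
    positive, total = counts
    return Fraction(positive, total)


def summarize(row):
    """Derive illustrative sensitivity and specificity figures for one variant."""
    return {
        "specificity": 1 - positive_rate(row["control"]),
        "adenoma_sensitivity": positive_rate(row["adenoma"]),
        "adenocarcinoma_sensitivity": positive_rate(row["adenocarcinoma"]),
    }


for seq_id, row in TABLE_2.items():
    stats = summarize(row)
    print(
        f"SEQ ID NO: {seq_id}: "
        + ", ".join(f"{name}={float(value):.2f}" for name, value in stats.items())
    )
```

Read this way, variants such as SEQ ID NO: 21 combine a low positive rate in controls with a high positive rate in neoplastic tissue, whereas SEQ ID NOs: 25 and 26 are positive in most controls as well. The SEQ ID NO: 23 figures rest on very few samples and should be read with that in mind.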
TABLE-US-00003 TABLE 3 AFFYMETRIX HuGene Exon 1.0 PROBESETS
TARGETING NUCLEOTIDE SEQUENCES IN SEQ ID NO: 1 PROBESET SEQ ID NO:
79 ID TARGET SEQUENCE
SEQ ID NO: 76 3692517
taaaatgtctttattggattgaatgaatgtttatacctga
SEQ ID NO: 77 3692518
aggttaagctgtatttgattgccagttttactgaaaatgcttagtattttacagtatc
accaaatata
SEQ ID NO: 78 3692519
aaatttcctgtgatgtgtttcaatctagatgcaaagaacatggaaaaatcaaagtgct
cgagtggtttaaatatgttttgggtattcctgtttatagactataatacttttccaat
taaaatcctcagttgtcacgcaga
SEQ ID NO: 79 3692520
gcctaagggtgtgacagatgatctctgtcacttgtcttagttgtgtcctgcaataaac
tggatgctttataaaatactagacctgtgatttcgtatgctgtaatatttcatttctc
catcacccctccaaattatttcttagtttggagtaaaataataaatgtattatagtca
acatctcttgacccctctttagtttcagctaaactaagcatgtgtgtttgtgtgttca
ttttatagttcatgtgtagaactatgtgaattaaatttaagaaacatgtaaagtagag
gaaatagttttctggagaaatttttcctttttggatattatgcccttttccattgctt
ttctctgcttgaaagcaaaaaaaagtaccctacccctgttctcctttagggaaaaact
attcctataaagtatttttaaatcgtgcaagtcattgcctagggttagctaaaacatt
tctttttaaaaaggagaaaatgccctggctttaacattttcttgtatttgtatctatt
aagataaacagtttactttgatacagtacataccaatctacttaattttttttccagg
attccttttactatgtttggtctgaccttttatgataacttaatatgggaacaaatta
gcatataattctattttccatgtgacctcaaccagttgcagaattgtaccactacttt
agggggggcaatttgacagtttatgtagactatagcattaattgttcccaaatgttca
gtgcatcctggctaatgtgttattgaaggtgttttcacgtaagcagttagaggaagcacttc
SEQ ID NO: 80 3692521
gatgccactggaaatgttgaaatgaaaaat
SEQ ID NO: 81 3692522
gaaaattcagatgactcattgtgagtgctagttc
SEQ ID NO: 82 3692523
ttcaaggggcatggtaaagtctttctttttagctggttgtaagaagcttttgattctt
ttcagccagctggtaggaatatagaattttataagcaaaccatcaggaatgatagtgt
tgtttctgataagcaacatccaaatattttgaccctgcttttagtggtttttttcaaa
tcttattttgagtcttacttttagtcatagaatagctactgatttgatgcggtcttta
actgacttaatatttttacaatttcaatatattttgcattggaatctccagtaatgaa
tattaaaatatatgtacaatcatttgtagatgatatcaattatattaagacatttcag
atgggctattgtagtatttaatgtgccgtattttatggtagaataattctcagtctct
ggacatcaagattgctttcagtgggaatgaagattaatttacttcagtcctgattttt
taggcatcaatgcatgttttcatttttgtcagacttttaccctcttttaatgtaattc
tcaacttcttatggatttacttcccaatacataaaatccttcaaaacaagaatgataa
taatttttatactttttataaaaataaatttatttttagtccatcaaggtgtctg
SEQ ID NO: 83 3692524
gcaattcataatggagctactgtactggctattgga
SEQ ID NO: 84 3692525
gtgtgatgcttccataatacatttggatgctgtcagctaagttcacttctgaactaag
gggttcctccaaatgttggctgaaattcatcccaaggctggtctgc
SEQ ID NO: 85 3692526
ccgaccagacgtcccgtgggctagggcctgggcctcgggccgcgtcggcgccggtcga
gcctctccgggtgtcggggttcggggcgggcgcgcgtgggcgtggctcctctgtccac
gcctgttcccttcgtcgccgcggctctcgtccgggacacggctttccggagtagagccctt
SEQ ID NO: 86 3692527
aggtggctgccagccggccaatgtctaagcgaggcggagcggcccaggcggcccgagc
ctgggggagcgcgcagccggccagtggcggcctcgccggcggcctcttcccgggctcg
cagtaggcccgagtcgtcgccgggagctcctgggagcagcgtccccgccctgctcccc
tcgctcccgcctcttgcggccccacggcccctcagcgcccgcccccggctccgcccgc
cgcagccgcagcccctggcgctaacggtcggtaacggcccgcgcgcgccgcccgccgg
gggctcgcgccagccacgagggagcgtc
SEQ ID NO: 87 3692505
ggcctgagcggttcagactacattctccgagagcccctgggtccgcccagcccagtgc
ctgacacctccttcacctatgattgggcgctggcct
SEQ ID NO: 88 3692504
gtatagcacagcatcacaacctggatactgacattgatgcagtcaagacagagaacat
ttatatcatgaggaggatccctcattaccgccctttgatatccacccctacttccaga
ccatctcactcctcccttaaccctggcaaccactagcatgttctccatttctataaat
ttgcctttataggaatgttatataattgcaattaaagtgtgtaaccttttggggtttg
actcacccggcatcattttctggagattcagcttatatgtgtca
TABLE-US-00004 TABLE 4 hCG_1815491 cDNA clones DB455235 DB347418
BU590179 AI827680 BQ638202 AA581577 AI004404 BX096724 BM920423
AW173121 DB222387 W38547 CN278390 BF436749 BM151589 CN278219
LOC388279 BU737152 DB349477 AA928654 XM_373688 CA313804 BF692451
AI985612 AI245732 H89247 BI561324 BQ011371 AW023444 BE246152
DB452125 AI804090 BM696001 BI497216 BU165627 AI342725 BU729242
DB145524 BU165662 AW975944 LOC650242 DB143311 BU569024 AA746740
XM_644116 DA828150 BF672570 BU689926 DB446128 CN289138 BU160166
BG193316 DB175550 CN292893 AW117234 AA625672 BM698708 CV372409
DB517664 AI214681 BI768666 BF912258 CB854553 DW420944 CD356299
BE000458 AI923595 N90090 CN288533 CD000458 AA825162 CV575277
CN275915 AI903846 BU180741 BU625145 CA436924 BM150430 BG720116
DB520645 BF679396 DB372595 BE504515 AV725613 CB217500 AA829347
AF275804 BQ002970 AI242819 AI204177 AA844729 AA954994 BM974647
TABLE-US-00005 TABLE 5 OLIGONUCLEOTIDE PRIMERS

Primer nucleotide sequence | Genomic map regions (start of primer) | Amplicon sequence confirmation

5'-TAACTGGAATTCATGTTGGCTGAAATTCATCCCA | 8573248=> | MULTIPLE AMPLICONS GENERATED.
5'-CACGATAAGCTTTTATTATAGTCTATAAACAGGAATACCCAAAACATATTTAAACC | <=88567198 |

5'-ACACGGCTTTCCGGAGTAGA | 8576635=> | ##STR00001##
5'-AACAGGTTTTACCTCCTTATCTTCAGAA | <=8571695//8568521-8568509 |

5'-ACACGGCTTTCCGGAGTAGA | 8576635=> | ##STR00002##
5'-GGCATCCTACAAGAGAACTCCTTATC | <=8571695//8568449-8568433 |

5'-GGCGGAGGAGAGGTGAGC | 8576892=> | ##STR00003##
5'-GCTGACAGCATCCAAATGTATTATG | <=8573280 |

5'-TTTTTGAGAAGTGTATCATCTAGGAAGAA | 8571884-8571856 | ##STR00004##
5'-ACATATTTAAACCACTCGAGCACTTTG | <=8567253-8567226 |

5'-CAGCCACGAGGGAGCGT | 8576931=> | ##STR00005##
5'-GGATCGACTCCACATACAAGCA | <=8573192-8573170 |

5'-ATGTTGGCTGAAATTCATCCCA | 8573247=> | ##STR00006##
5'-TTCCAGTGGCATCCTCCTTATC | <=8571696//8568437-8568425 |

5'-ATGTTGGCTGAAATTCATCCCA | 8573247=> | ##STR00007##
5'-TCTGTGTGTACTCTGTCACTAAAAAAGTTT | <=8572712 |

5'-TAAGATATTAGGGATTGATGGTGACAA | 8567385=> | ##STR00008##
5'-ACATATTTAAACCACTCGAGCACTTTG | <=8567226 |

5'-TGCCTAGGTATCTCCATATCTAACTTGA | 8568679=> | ##STR00009##
5'-ACATATTTAAACCACTCGAGCACTTTG | <=8567226 |

5'-ATGTTGGCTGAAATTCATCCCA | 8573247=> | ##STR00010##
5'-TGCTGAGTATGGCTAAATGATATTTCTC | <=8571488 |

5'-CAGCCACGAGGGAGCGT | 8568679=> | ##STR00011##
5'-AACAGGTTTTACCTCCTTATCTTCAGAA | <=8571695//8568521-8568510 |
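A reverse primer anneals to the transcript through its reverse complement, so a quick way to check where a primer from Table 5 sits is to reverse-complement it and search the transcript sequence. The following is a minimal sketch under stated assumptions: `TEMPLATE` is a 50-nucleotide excerpt copied from the variant sequences listed before Table 2, and `reverse_complement` and `find_primer` are hypothetical helper names rather than anything defined in the specification.

```python
# Translation table for complementing DNA bases in either case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_primer(template: str, primer: str, reverse: bool = False) -> int:
    """Return the 0-based offset of the primer site in the template, or -1.

    For a reverse primer the search uses the primer's reverse complement,
    because that is the sequence that appears on the template strand.
    """
    site = reverse_complement(primer) if reverse else primer
    return template.upper().find(site.upper())


# 50-nt excerpt of the 3' region shared by several of the listed variants.
TEMPLATE = "caaagaacatggaaaaatcaaagtgctcgagtggtttaaatatgttttgg"

# Reverse primer that recurs in Table 5 (internal spaces removed).
REVERSE_PRIMER = "ACATATTTAAACCACTCGAGCACTTTG"

print(reverse_complement(REVERSE_PRIMER))
print(find_primer(TEMPLATE, REVERSE_PRIMER, reverse=True))
```

A return value of -1 means the primer site was not found in the excerpt searched; it says nothing about the rest of the transcript.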
BIBLIOGRAPHY
[0386] Alon et al., Proc. Natl. Acad. Sci. USA: 96, 6745-6750, June 1999
[0387] Ausubel, F. et al., "Current Protocols in Molecular Biology", John Wiley & Sons, (1998)
[0388] Bonner et al (1973) J. Mol. Biol. 81:123
[0389] DeRisi, et al., Nature Genetics 14:457-460 (1996)
[0390] Germer et al., Genome Res. 10:258-266 (2000)
[0391] Guo et al., Nucleic Acids Res. 22:5456-5465 (1994)
[0392] Heid et al., Genome Res. 6:986-994 (1996)
[0393] Kraus, M. and Aaronson, S., 1991. Methods Enzymol., 200:546-556
[0394] Maskos and Southern, Nuc. Acids Res. 20:1679-84, 1992
[0395] Moore et al., BBA, 1402:239-249, 1988
[0396] Nielsen (1999) Curr. Opin. Biotechnol. 10:71-75
[0397] Nielsen et al. (1991) Science 254:1497-1500
[0398] Pease et al., Proc. Natl. Acad. Sci. USA 91(11):5022-5026 (1994)
[0399] Pevzner et al., J. Biomol. Struc. & Dyn. 9:399-410, 1991
[0400] Schena, et al. Science 270:467-470 (1995)
[0401] Smith et al., Science 258:1122-1126 (1992)
[0402] T. Sano and C. R. Cantor, Bio/Technology 9:1378-81 (1991)
[0403] Urdea et al., Nucleic Acids Symp. Ser., 24:197-200 (1991)
[0404] Wedemeyer et al., Clinical Chemistry 48:9 1398-1405, 2002
[0405] Weissleder et al., Nature Medicine 6:351-355, 2000
Sequence CWU 1
1
98117009DNAHomo Sapiens 1tggccacaca cgggcatggg gcgcgccgcg
ccgcggccgc caacgagccg ggcggcgccc 60tgcgagggcg agcgggcggg cacctggcct
ctggctgccc tgggccgccg ctcctctggc 120cccggctccg gggctccggc
ccgcggcgcc tcctcgctgg cttcccgcgc gcctccggct 180gcgaccgccg
cgcccgctcc tctgcgcgcc tcgctcgccc cagctgggct ttttttcccc
240tcccctcccc tcccttgctg ctttctcttt tttcccctcg ctctttgcac
cggggggctc 300tgcttttgcc tttgcaaagg tcctgccaag atgctaagtt
ggaaattgag gattctgacg 360ccttgtgcgc gcccgaagct cctccttccc
ggggtgtagt cggtgggagg actggcagga 420gcttctgggc ggccgcagcc
aacccggccg ccaggcgcgc ctcgccctct ccctcttcct 480cctcggctct
ctctcccgct cgctggcgct cctctcgccc ctccctagcg cccgcctccc
540ctccgcggcc ccccctctca cccctcctct ttgcctcccc ttttcccctc
cccggtctct 600ccctctctct cgctttctct cagactctcg agcgccggcc
ccaggatgac aatcacatcc 660caggagcgcc gatctcttcc aactttcctc
ttctgctaac tcggggcgga gtggcagtcc 720cgccgccccg aagcacaaag
ggaacgaggc cgcgggctgt gcgccggcga acgctctgcg 780ctctcctagc
cacagtagat cgcggactta gcggatttct tgttctccgg caggctgggc
840tccgaggcca ttcgttgccc acccccctct ggcgtcttcc ccaagccagg
gggcccggag 900agccagctgg agatccggaa tgaaagtctc tgggagagcc
gatggatggc ccgcgcccag 960ggcgcaggaa gtccgggatg actgcccctc
tgcgccggca gcagcaggtg ggcgagagac 1020aggcctcaga cgtctgcacg
tctccgcctc gccttccttc taccgacccc ccgccggacg 1080gcgagggaga
agacactggt tcctggaact accggggtag cctttttcta gagtaggggt
1140ggtgggcaag aactgccaga cagagaatca gctacacccg aggagatctc
gggaccgtcc 1200ccagctccac tcccacgcct cgcagcctct ttctcccggt
tttcccaccg cacgcagctc 1260gcaccgccaa gtcggtggtg gtggagggtg
cgggtcccct tcccttttgt ttaactaagt 1320cgccttcccc ttcgcacgca
ctctcgcatc cgcccacgct ccactgcaaa cactgggcag 1380agcagtgccc
aacccaacca ctgtttgttc agcgaggcgc tggcgaagca aacacacaca
1440actgagtcag aggcagagcc cttgcccacg acggccacac agttagaata
caaaacagac 1500cctctcctct tctctccacc tcccgccacc acaagccacg
gacgcactcg aactctggga 1560cagacagggg cgctggtgaa ccaggacaga
agatggcaca ggtttgggca cgccgcggaa 1620gctcgggata tcgggaactt
cgcataatgg ggccacacgc aagccgaagc ataactatat 1680acaaagtttc
tcgcattgac tttgacgggc gagatgtact ttatttcgcg atctcacact
1740aacccagcgc cgcaccggca cccgctcggt gctgcgctct cgtgcacgcg
cgttggctcc 1800tcccctccgt ctgctcccct cccccagaca ccgcccacca
agaggcctga gcggttcaga 1860ctacattctc cgagagcccc tgggtccgcc
cagcccagtg cctgacacct ccttcaccta 1920tgattgggcg ctggcctccc
tgggctccgc cccctggtga cgtaaccccg ctttcctccg 1980agtctcggcg
ccagaggggc ggggaggggc ggggtctcga tcgcgctatt gtcatggaga
2040cgggaagctg gctgcagcgg cggcggggac cgtggggccg aggtggctgc
cagccggcca 2100atgtctaagc gaggcggagc ggcccaggcg gcccgagcct
gggggagcgc gcagccggcc 2160agtggcggcc tcgccggcgg cctcttcccg
ggctcgcagt aggcccgagt cgtcgccggg 2220agctcctggg agcagcgtcc
ccgccctgct cccctcgctc ccgcctcttg cggccccacg 2280gcccctcagc
gcccgccccc ggctccgccc gccgcagccg cagcccctgg cgctaacggt
2340cggtaacggc ccgcgcgcgc cgcccgccgg gggctcgcgc cagccacgag
ggagcgtccg 2400cggcccgcgc gcccgcgcgg cggaggagag gtgagccccc
gcccgggcca ggccctctgg 2460ccgcgccgtc cgcccctcta gtcgtgtccc
ctcgtgggcc gaacggacgc ggcggtgccc 2520cgcgcccgac cagacgtccc
gtgggctagg gcctgggcct cgggccgcgt cggcgccggt 2580cgagcctctc
cgggtgtcgg ggttcggggc gggcgcgcgt gggcgtggct cctctgtcca
2640cgcctgttcc cttcgtcgcc gcggctctcg tccgggacac ggctttccgg
agtagagccc 2700ttggaggttt gtgccaccga gaacccgagt ctgtcacaca
gacatcctca ttacatcatc 2760ctgccactcc ggcagtcccg cgctttctcc
ccccaccccc gcccccgccc ccgcccccgc 2820cccgtggctt gtttgttggt
tgttttttta atttttttaa ccccttttct tgtactgtct 2880tctttttggt
gtcaggggct ggagagactc ctgcaagata ttgaggcatt tagaatgtat
2940ggttctgttg ccgtggttgt cacctccgct ctacgccaaa tttaagactc
atttcttaag 3000gatgtctcta ttttctgctt tatttgcaca ttgatggctc
tgctgctcag ctggccacgg 3060ccgcccggcg tctgagatag aattgtgtta
atatgagaaa gggaacggcc gatcatgatt 3120agcaggcaga cggcatgcag
atagcatctg gcgggttttc tccgttttta tatgaaagcg 3180ttcacttgtt
tgctggtaaa cacgaggttt taacttttta tctgggtagt ggtgaccgtt
3240tgtttcactc tcccctcctt tttacttcct attcattagc ttttccgttt
ggaaagctgc 3300atacttgtta gcggtagttt ggtgtcttga ctctgaagct
ttctcttcca gcagaggtgg 3360gatctgtatc cctcccatcc taggttacca
agaagacttt ttctcccctt actttctatt 3420gaatttgtgt ctttcccaac
atttaaactt acactaggga taagtttgct cgggttgttt 3480ttgacgtagg
actattaaat attctcattt tgaaacttcc taaaggttta ttggtttgtc
3540atgtctgcac cgatccagtt tcgcataact cttaaaattt aaagctttaa
agtacggttc 3600gcaattgtat tgtcaagaca tggccctaga gcttgttctt
caagttttgg ctgtacagtt 3660gtcattgttt cagtcaagtg caaaagaggt
tattcctgtt tatttagtgt taatagctat 3720ttacagttaa aactttaaag
actcatagaa gtaggtctgc tgtattttaa agatctcatc 3780attcttgtgg
gagagacaca acaccaacac caggaaaggc aacagattgc ttctgtcata
3840tcgctttgct gttttggagt gtacttaaag gattatgatt gataaggctg
tcttgccaga 3900gaatgttgac acaggtgtgc atcgttatct taattaagat
cctgagaaaa tgttgtctat 3960aaatggtgac ataagaaggg cggcatattt
ttgtacagtt ttgcttttgg agaatttaat 4020tgcacttgaa acctgtcttt
gagcttccag attttgtgag cactttagca acagaagttt 4080tagagagaaa
tgatagttta atgttaagtg ttctggcatg tactaatact acttgtggct
4140ttttggcaat gcctgtgtcg gctttctttg tgctctacct ttgtggaata
atggtttcta 4200attttgaaag aaaagcataa aatagagcag tattttatta
agcaaacact tgttagtgtt 4260cagaataagt ctgtagagga gggcatgctt
aatgtgcaga aacaggaata taataagagg 4320taaatatatg agaaatttat
gctatttcgt ttttttggaa aagaaacagt ctcctggggt 4380gccttaatga
tctaaaacaa tttctttttc atatcattaa atacctagat acaatgttta
4440cggaatgcct tgtttttatt tctaaaaata gaagaaaact gttttatttc
ttccaaaata 4500ctgttctgtg tgtttgaatt gatgagtaag tgaagatagt
gcatttagtc tcttactagt 4560ttaaggatct gagtcataac ctgggtgacc
actaaaataa tattgctgtt ctttactaca 4620aatctcggaa tgcaaaaaca
ctggcaagat ggatgctgtt gatgatatta cccttggctt 4680acccagcacc
taataccttt taacataagt attgcattaa gattttctct tctctttgta
4740gcttaggatt tcctggaagc tggctgtcta tcagtgtgta tgtaagaagt
gtgtatgttt 4800ttctttaata ttaaaagtca ctgtaaaaag aggaaatgaa
agtggaggag accataggca 4860tatgatcgga aaagccaggt gtagagtcac
agctcattct tgactccaag gataattacc 4920tggttagaat tacatggtaa
actaaaagaa agacttacgt agatttgttt gctaaaacta 4980aaacaagacc
attaattttg agataattgc tttatttatc tcccatacca cccctaatca
5040aaattttttt cccctctggc agctatgatg agcattctta tgctgttatc
catgtgttat 5100tttcaaattt gatatattgc attaaaagag atgaaatagg
aatatgaact aattatccat 5160tttaacgaat aactcaataa ttattattat
gtgtatacat tctagaaatc ttaggaaaaa 5220atgttcttta gttgataaca
ggaaagtaaa ttaaatcatg taatcctgag tgaattaaat 5280atgctcttca
aaatagatga tgaacatagc atcatacttt ggaataatgt tttaactctg
5340ggatagtaca gcctgttttg agtggtagta tttcattatc tttatatgtt
tctcttcaat 5400atctcttttt tgttgactcc tatattttaa catctttttg
gtatatttgt tagctgttta 5460tagggaataa tttttcttaa tagaatagca
gctggtgcaa ctctaagaac atgtcaaatc 5520taggaattct tttttccttt
ttcaataatg agtggtttgc tttcatagca ctgctctgaa 5580gtgtgatgct
gaaggcttat ttctgcataa atctgcagtg tttttatagc tgtagtgtac
5640ctaagcacca atttagctta gcaaattaac ttagatgctt caaaattgaa
aaccagtata 5700ttctaatgta gtttgagttt aagtctttat tagtaaggat
gcacttaaca tttgttcagg 5760agaagagaat ataattttct cctgcagttg
tcatttacgt aatccatttt tctagtcttt 5820tctcaggcta atgatattcc
ataccttgat acagctttta tttgtttatt tgccataata 5880tggagtgata
ccttcgagaa aattagatat ttgaatctaa tatcccaaaa gtattactta
5940gttttctgtg tttccttaaa cagaatcatt tccattttcg gtgcaggtgt
taagtgtgat 6000gcttccataa tacatttgga tgctgtcagc taagttcact
tctgaactaa ggggttcctc 6060caaatgttgg ctgaaattca tcccaaggct
ggtctgcaag tgagtgtctg cacacagttt 6120gcttgtatgt ggagtcgatc
caaaatagca tcaatgttgg ttttaccaaa gtatttatta 6180ttgataatag
aggctaagta caaaatgtag agaatgtcag ctacttgagg cctttgatta
6240ttaaaaattt tattaatgca ttaaacaaga gtacagtaaa tagataaatt
ttaggttcat 6300gaaataaaac tgaataattt atttttactt actatttatc
atggaattac tttgaataat 6360ttatttttaa tggtataatt ggacagtaaa
atttataaac tcagtgcttt tcataaaaat 6420caaagtgaag tttgtaatat
tttatacaaa tagaattatt atttaagaga aataacctgt 6480ttatgcctaa
ttacagtttt taatcatttc agttacctat ttcttttaag aataaattta
6540gtgggaatat cagttccagt catgggtacc aaactttttt agtgacagag
tacacacagg 6600tatgtaaaac ttgtcatttc tcatcaaata gaggctgctg
aatataggca ggtaagaaaa 6660gctatgagaa agaattgttt tgcagaatat
ttgtctagtt gtcagagcaa ggaactgaat 6720ttattgacac agaacatttt
attaagtaaa aaaaatggtt cttataacaa aaaaaaaagc 6780taattttaca
gaaggcagta gttaatatag ccgccaataa aataaatgtt tctctgaata
6840ctttccgagg cattacagtg atttttaact aatatgaagt gatatataat
aattttaaag 6900taacatcctt gagttttcct attattaatt gcttatggaa
attgggtttc acgtatgact 6960gagagctaaa gcattacagt gagttagaaa
acacaacaca aatgtaaaga aaatgttagg 7020tggtgagtaa tctgattcct
ttttgtttgc ctttaagctt agttttttgt tttgttttac 7080tttgttttaa
accatgtata aaattgttga atttaaaaga taagaggatt aagtaatttc
7140ttttttcctc ctaaggataa atgtaggaaa aatctaaaca cataggcaga
ttggttgagt 7200tttatatctg ttatcggcca catttattaa gattcatatt
tcatgtatat tagaggtatt 7260cacatgtatt aaaattctta tattccttct
atataaaata gatgtagggg gttcccactc 7320ttgaaaatat gaagaaaaga
tgtcctttca gcaataatgg gttatggttg attaactgag 7380aaggttgtat
taaacgttct ctagtagaaa tggctaaaga gcatgctttt tgagaagtgt
7440atcatctagg aagaaaatca aatggagtat tggtaattaa attgtaattc
catgaaggaa 7500ggaagtggtg caaaagatga agctaactat tcctgttttt
ctttttaaga gtctgcaatt 7560cataatggag ctactgtact ggctattgga
aggaggagat tctgaagata aggaggtaat 7620attatctctt ttaaaagaat
actttcctct gtaatcctga atctttatta catgtaagaa 7680ctttgtgcag
tagacagcaa tttctttgaa tttggtatat ggaaacaatt ttattttcct
7740ctgctaagtt tttgagcctg cctcttctag tgccatggac tgcattggta
gagctgagaa 7800atatcattta gccatactca gcacccttaa aatagcttct
ttctgagaat tagatctgtg 7860aaggtgtcct gcacagttct tgtagatgtc
attttagttt gtggttgacg tgcatgcatt 7920tagcatgttg cttaaccgtc
ctcattcgcc tcccagttct ttgttgcctt catttggggg 7980gatgtgtttt
ctgctggatg atttactgct agacatgacc aaactctgag tataagactt
8040ggtgtttggc agctggttcc agtgctctgc tgagaagtag ttgggcccag
cctggggcat 8100tgtagtgtcc tggaggccgg gagctcctgc acaggggttc
ttttcctggt tcagagcctt 8160aggtgccttt ccacttttca cgtaatcttt
tcctaggttg gcttgctgct tactttgcag 8220ctgttgcagg gattcacatg
gaatggaggg ctccctttta ttgtggatta tttctttgat 8280aatttaccca
tgtgcttttt cattttcaaa aacccagtgg tgtttaagaa tgaaagattc
8340ttgaagaaca aaaattgggt taaatttgta tctatcagaa aagattctat
cctctgatgc 8400tatgtaggca cattgaatat gcaggctaaa ttaaaaacag
agtaaaacct ttttataata 8460tgccaattac tatatctaag aatgtttata
caggctgaaa ttttgtaatg gtatttatat 8520ctgttttcct ttttaaggaa
aataatatac ttttctaggc atcaaaaata tctgcccctt 8580tcactaatgc
tgatgtacca ccttccccca acccccaacg ctaaatttga ctggcttaaa
8640aacatctgcc ccctgaacta tatctgggga ggaatattaa taaaacaatg
aaggacttca 8700tggtgtctag ttatataatt aggaaattgg ttagaaacgt
ggctaaaacc agtttctttc 8760attaaaagaa ctagtgtaca tttaaaaaca
aaaaccttca ggaaaaagac tctttgatct 8820ttaagtcaaa tagtattttt
ggttaacaca gcaggaaggg gaaatatacc aatttcagat 8880tcttttattt
atgcctaagt agaggttgta agcagctaag gaagtgatta aaatgtagtc
8940tagtaaaaaa tggtgctgat tacttggaaa gcagtttaca tttgcaaaaa
aattcagtat 9000tatgaccttc accaactttt acacatcata tacttgggtt
attacttatg gtatagcttg 9060tgtatggaat gcaagggtat tttcaaattg
actaggtctt gctggtcatc ttaaaatatg 9120ttacaatatt acaaaattga
aatgtaatat tttttaatag aaggaaaata taaatttaat 9180atctgggcaa
ttgagacctt taaacttact ttaaaagtat gatcttgatg tatatgatac
9240tgttttgtct ttgctatatt aacagaatta gaggggtgtt ctgcaattca
aataccttat 9300atattccaaa ttttattctc tataatggac ttttaaaata
aaaggtatat gtgcttcaag 9360agggcaaaat ttgaatcatg agctaatttg
ctaagcatca gattatagaa aagcatcctt 9420gattaatttg gaactgtgaa
agggggcggg taaaactgtt ttctgcagaa atttactagt 9480gcagcaacca
tttaaattaa atgtttgtta acataatagt gatggcattt tctcctcccc
9540ctccttgtgg ttttgtccaa ctagatgtta cagtggcagt tgcactgact
gttaagtgtt 9600taaatgatga caccattatg tgaagtgatt ttgaaatgag
agattccagc caagaattac 9660atctgctccc atctccttca aatcatactc
tctggcagta cagattatga ttgatttgtt 9720tgtgacagat tgcaggaaac
agtcattgat ttttcaatat tttaccttaa aattatttac 9780agttgtaacc
atggggaggt attttcatgg gctgtcagcc cctgaaagac taggataata
9840ttccctgctc tctgacaaga caaattacct gtaatgagtg cagtagctga
agggtatact 9900tttattttaa aatatgtcaa taaccccagt gactaaacga
atattgattt agcataatga 9960agcctgagta acgtgaaaat gagctttttc
aaggggcatg gtaaagtctt tctttttagc 10020tggttgtaag aagcttttga
ttcttttcag ccagctggta ggaatataga attttataag 10080caaaccatca
ggaatgatag tgttgtttct gataagcaac atccaaatat tttgaccctg
10140cttttagtgg tttttttcaa atcttatttt gagtcttact tttagtcata
gaatagctac 10200tgatttgatg cggtctttaa ctgacttaat atttttacaa
tttcaatata ttttgcattg 10260gaatctccag taatgaatat taaaatatat
gtacaatcat ttgtagatga tatcaattat 10320attaagacat ttcagatggg
ctattgtagt atttaatgtg ccgtatttta tggtagaata 10380attctcagtc
tctggacatc aagattgctt tcagtgggaa tgaagattaa tttacttcag
10440tcctgatttt ttaggcatca atgcatgttt tcatttttgt cagactttta
ccctctttta 10500atgtaattct caacttctta tggatttact tcccaataca
taaaatcctt caaaacaaga 10560atgataataa tttttatact ttttataaaa
ataaatttat ttttagtcca tcaaggtgtc 10620tgaagatttt atgcctaggt
atctccatat ctaacttgat aaggaaaata ggataaacaa 10680tgctggtaat
agcaggaaag taagtatttg aataagatgt caaactgata tttcatgtga
10740acctaactca ttttatggta actaataatt atcttattta aatcaatagg
taaaacctgt 10800ttagaaatta aaaatgagtt acgatttaaa gaaaattcag
atgactcatt gtgagtgcta 10860gttctcttgt aggatgccac tggaaatgtt
gaaatgaaaa atgtaagtat atcttttggt 10920ggaaaaaagg atagtctcta
ggacacaaaa ttactgtttt atttttttct caggagtttg 10980cctaagggtg
tgacagatga tctctgtcac ttgtcttagt tgtgtcctgc aataaactgg
11040atgctttata aaatactaga cctgtgattt cgtatgctgt aatatttcat
ttctccatca 11100cccctccaaa ttatttctta gtttggagta aaataataaa
tgtattatag tcaacatctc 11160ttgacccctc tttagtttca gctaaactaa
gcatgtgtgt ttgtgtgttc attttatagt 11220tcatgtgtag aactatgtga
attaaattta agaaacatgt aaagtagagg aaatagtttt 11280ctggagaaat
ttttcctttt tggatattat gcccttttcc attgcttttc tctgcttgaa
11340agcaaaaaaa agtaccctac ccctgttctc ctttagggaa aaactattcc
tataaagtat 11400ttttaaatcg tgcaagtcat tgcctagggt tagctaaaac
atttcttttt aaaaaggaga 11460aaatgccctg gctttaacat tttcttgtat
ttgtatctat taagataaac agtttacttt 11520gatacagtac ataccaatct
acttaatttt ttttccagga ttccttttac tatgtttggt 11580ctgacctttt
atgataactt aatatgggaa caaattagca tataattcta ttttccatgt
11640gacctcaacc agttgcagaa ttgtaccact actttagggg gggcaatttg
acagtttatg 11700tagactatag cattaattgt tcccaaatgt tcagtgcatc
ctggctaatg tgttattgaa 11760ggtgttttca cgtaagcagt tagaggaagc
acttcacccc tattactaag ttattaaaat 11820gcctcctaaa ggtagcattt
taaattagta tacataattg attagtaatt tgtcttctcc 11880caagcataaa
acagcatagc agagttaagt gtgaccagtg aagtataaga tattagggat
11940tgatggtgac aatgatcata gcaactaaat ggattttttt tttcttttag
attcagccgt 12000tggtctttga aatttcctgt gatgtgtttc aatctagatg
caaagaacat ggaaaaatca 12060aagtgctcga gtggtttaaa tatgttttgg
gtattcctgt ttatagacta taatactttt 12120ccaattaaaa tcctcagttg
tcacgcagaa gaaggttaag ctgtatttga ttgccagttt 12180tactgaaaat
gcttagtatt ttacagtatc accaaatata ttttgtttag ccaaggtata
12240ggaaaaataa aataaattgt ataggttgac ttttttctaa aatgtcttta
ttggattgaa 12300tgaatgttta tacctgaaaa aaaaaggttc aaaaaaattc
ctttttctat cagagtcatt 12360cttttgacaa tcagatatta gactaggttt
aaaataactt tctaatatga ctcatattta 12420tgggaggaaa aagacttgaa
agatattcct atgtgtactt taattttctg taatagtccc 12480tctggaatag
aatatctctt cctcaaacaa atctattggc tcatttcata tacaagaaaa
12540agttgcccta aatggcagag tttcatctct gtaaagaagg ctgacatcca
cttccttcac 12600agaatccctg catttgggta attgagtaat gatagattac
cttgccattg ggaaatacct 12660tgttgtggcc tttgtagcag ctataggaag
atggaagaat tcttttgtat agacaggtct 12720ctagcctctt tggtgtggac
actgtgaagg ggtacatctg gtgagaagga gtgcttaggg 12780aaggaactgg
gaaggtccac aaaggctgat gatgacaaag aatccaaggg ctatgatgaa
12840ttcttaagct tagatctcag agtcataaag taaatttatt aggctagaga
gatttccttt 12900tttatttttg atcaaacttt attttttcag attgttaagt
tcacatgcag ttgtacaaaa 12960taatacagag atctttgaac actacccagt
ttctcctaat agtaacttct tgcaatatac 13020tgtatagcac agcatcacaa
cctggatact gacattgatg cagtcaagac agagaacatt 13080tatatcatga
ggaggatccc tcattaccgc cctttgatat ccacccctac ttccagacca
13140tctcactcct cccttaaccc tggcaaccac tagcatgttc tccatttcta
taaatttgcc 13200tttataggaa tgttatataa ttgcaattaa agtgtgtaac
cttttggggt ttgactcacc 13260cggcatcatt ttctggagat tcagcttata
tgtgtcaata gtttgttccc ttttttttag 13320ttcagtagta ttctgtggta
cgtgtgtacc acatcactca tcaattcgga cttctgggtt 13380gtatctggta
tttgattatt acaaatagaa gtactatata tatattcgtg tagaggtttc
13440tgtgtgaaca taaactttca ttttcctggg acagatgccc acgagtgaaa
attgtgagtc 13500atatggtaat tatgtagagt attttttttt tgagacgatt
tgtttatgat tttaaaacat 13560aaaaatataa gttgtagcca aatataagga
aggatttttc cacaatctag actttattct 13620tattttagag aaatactttt
gaaaaaattg ctttattggc aaacaattcc tatgtaaagg 13680aataaaagac
gcatatactc taggaaagat gttgcaaagc tgacaggcaa ctttgagaaa
13740gatgagggca gttttccagt gttactggag acttgacatt caaggatgta
ttatgatagt 13800ttctgaattg atagattctt tagcgtttgt cactaccctc
aggcctgcag tttttatttg 13860tagaattaac tcatctttac aatgctttgt
tgagtgctgt tgcagaaaat gctcatgtaa 13920ttcaattata tgcagttagg
aaaaagaagt atctttttct agtacatggc ccctaaaatg 13980atgactttcc
acaatgctga atggaggaaa cagctctgtt tccatcaaac tctggtggaa
14040attatacaaa gttatattat gctttatgag acatagtgag gtagtacaca
accatatatg 14100ctatattatt ggggagaccc actgaaagaa ttttcaagga
taatttttgc tctgattttc 14160tatcatgtat attgacttta ttgattgatt
gattgagatg gagtctcacc ccgtcaccca 14220ggctgaagtg cagtggtgca
atctctgctc actgcaacct ccgcctctca ggttcaagtg 14280atcctcgtgc
ctcagcctcc cgagtagctg agactatagg tgtgtgctac gatgcccagc
14340taatttttgt atttttagta gagacaggat ttggtagcct ggtctcgaac
tcctgacctc 14400aggtgatcta cccaccttgg cctcccaaaa tactgggatt
acaggtgtga gccaccgtgc 14460tggccaagta tattgacttt aaaacgttac
cggctgggcg tggtggttca cgcctgttat 14520cccagcactt tgggagcccg
aggcggatgg attacctgaa gtcaggagtt cgagaccagc 14580ccaacctggt
gaaaccccgt ctctactaaa aatacacaaa ttagccgggc gtggtggcag
14640gcgcctataa tcccaactac tgggaaggct gaggcaggag aattgcttga
acctggcagg 14700cggaggctgc agtgagccaa gactgcgcca ctgcactcca
gcctgggcaa caagagcaaa 14760actcagtctc aaaaaaaaca gaaaaaaaaa
gttaccacgg gtgaaacatt gggctggtca 14820cagtgatcct tctaacagca
gtcaatatta taaatctagc tttttttttt gagtgagata 14880ctaggcccag
ggaacccgag tctttcaatc ctcatgaata tagtttaatc tttagctcat
14940tatggtcatg accatagtta aataaatgac acgagataat aaaatagtct
ttgcccttaa 15000aactgacctg ggtaaaagga tgtacacttg aatctttgaa
taccagaata gtcattcatt 15060cattgagccg tgttgtagca tcccatcttc
tggcacttcc agcactggga acccatcata 15120aagaaggcac tacaggccag
gcgctgtggc tcacgcctgt aatcccagca ctttgggagg 15180ccaaggcagg
cggatcacct gaggtcagga gttcgagacc agcctggcca acatggtgaa
15240accccatctc tactaaaaat acaaaaagta actgggcatg gtggcaggcg
cctgtaatcc 15300cagctacttg ggagggtaag gccagagcat cgcttgaacc
cggaaggcag aggttgcagt 15360gagccaagat cctgccattg cactccagcc
tgggggataa gagcgagact tcgtctaaaa 15420aaaataaaaa taaaaaggca
ctacagcacc tgttctctta gtgtagctct attagctctt 15480agactagggg
gtagggatgg gaaaaagtaa aaagaaactg agaaaccaat ctaattacgc
15540ataatcagaa atccaaagct tccagaattc atgaggaaaa agaactgcag
ctgatggagt 15600aggcattaca gagaatgaga ggcttgcttt ttttttaagt
ttcattttat ttgtaattga 15660cacaatagca catatttacg ggatacagtg
tgacatttta atacatttac atattgtgta 15720atgatcaaat taggataatt
agcttatgta tcacttcaaa aacataattt ctttgtgatg 15780agaacattca
aaatcttcta gtattttgaa atacataatg caatattgtt aaccacaatc
15840accctactgt gcaatagaac accagagttt actcctcttt tccatctgta
gcttcagacc 15900cattgaccaa tctctcccct tcgcccccca ccaccccccc
ccccgccgtc accactctcc 15960tcccctcccc agcaagaggc ttgaatcttg
aaggagcaca tcagatagaa agagggaaat 16020caactgggac atggtttgtt
ggaagtgggg tggaatctgg tttgtttaca gtttcagagt 16080ttgaggcatc
aaaagtgtat gtcattgccc aaaagtattt aacagctttc tatgttttag
16140gaattcacta gaaataaggc atgcttttta aaaaagaaaa aaaatagtaa
tctttgaatt 16200taaatatgtg ggtttccatg aaaaacaaat gagtaccctt
ccttgaaaaa ctactacttt 16260gaagagtctc gtaggttctg aggctcgtgt
gtgtgtgtgg gtgtgtgtgt ataaaacatt 16320ttcttttata tttggtaaga
aggaataagg tattttatac ttttctggtc tgttaaatgc 16380ataatctata
agactataat actttaaaaa attttagttt atctgaggct taaaataaca
16440aaagaactta aagctcctgt ggaagagctc caaaattaaa actgtaagat
cacatataga 16500taaaacgtta tataaagcag gctttgggac cccaggcatg
atggctcatt cctgtaatcc 16560cagcactttg ggaggccaag gtgggtggat
cacctgaacc caaaagttcg agaccagcct 16620gagcaacatg gtgaaaccct
gtctctacaa aatacagaaa aaaaaaatat taaccaattt 16680ttttaacctc
ccagctactt gggaggctga gatgggagga tggcttgagc ctggggaggt
16740cgcggctgca atgagtcatg atcgcgccac tgcactccag cctgggtaat
agattgaaac 16800tctgtctcca cacacacaca cacaaatgac ttttggtaag
actgtacata gagtttatgg 16860ccttattaag aaactctttt gagcaaccta
agaaaacaca agtatttctg tacattactt 16920tgttaaagtg tacaggataa
taagagagag ctaagagcct tttctaaatg tgcatccaat 16980tttaagaggc
taaaatagga agctttcaa 170092184DNAHomo Sapiens 2atgtacttta
tttcgcgatc tcacactaac ccagcgccgc accggcaccc gctcggtgct 60gcgctctcgt
gcacgcgcgt tggctcctcc cctccgtctg ctcccctccc ccagacaccg
120cccaccaaga ggcctgagcg gttcagacta cattctccga gagcccctgg
gtccgcccag 180ccca 1843724DNAHomo Sapiens 3tctcggcgcc agaggggcgg
ggaggggcgg ggtctcgatc gcgctattgt catggagacg 60ggaagctggc tgcagcggcg
gcggggaccg tggggccgag gtggctgcca gccggccaat 120gtctaagcga
ggcggagcgg cccaggcggc ccgagcctgg gggagcgcgc agccggccag
180tggcggcctc gccggcggcc tcttcccggg ctcgcagtag gcccgagtcg
tcgccgggag 240ctcctgggag cagcgtcccc gccctgctcc cctcgctccc
gcctcttgcg gccccacggc 300ccctcagcgc ccgcccccgg ctccgcccgc
cgcagccgca gcccctggcg ctaacggtcg 360gtaacggccc gcgcgcgccg
cccgccgggg gctcgcgcca gccacgaggg agcgtccgcg 420gcccgcgcgc
ccgcgcggcg gaggagaggt gagcccccgc ccgggccagg ccctctggcc
480gcgccgtccg cccctctagt cgtgtcccct cgtgggccga acggacgcgg
cggtgccccg 540cgcccgacca gacgtcccgt gggctagggc ctgggcctcg
ggccgcgtcg gcgccggtcg 600agcctctccg ggtgtcgggg ttcggggcgg
gcgcgcgtgg gcgtggctcc tctgtccacg 660cctgttccct tcgtcgccgc
ggctctcgtc cgggacacgg ctttccggag tagagccctt 720ggag 7244448DNAHomo
Sapiens 4tctcggcgcc agaggggcgg ggaggggcgg ggtctcgatc gcgctattgt
catggagacg 60ggaagctggc tgcagcggcg gcggggaccg tggggccgag gtggctgcca
gccggccaat 120gtctaagcga ggcggagcgg cccaggcggc ccgagcctgg
gggagcgcgc agccggccag 180tggcggcctc gccggcggcc tcttcccggg
ctcgcagtag gcccgagtcg tcgccgggag 240ctcctgggag cagcgtcccc
gccctgctcc cctcgctccc gcctcttgcg gccccacggc 300ccctcagcgc
ccgcccccgg ctccgcccgc cgcagccgca gcccctggcg ctaacggtcg
360gtaacggccc gcgcgcgccg cccgccgggg gctcgcgcca gccacgaggg
agcgtccgcg 420gcccgcgcgc ccgcgcggcg gaggagag 4485274DNAHomo Sapiens
5gagcccccgc ccgggccagg ccctctggcc gcgccgtccg cccctctagt cgtgtcccct
60cgtgggccga acggacgcgg cggtgccccg cgcccgacca gacgtcccgt gggctagggc
120ctgggcctcg ggccgcgtcg gcgccggtcg agcctctccg ggtgtcgggg
ttcggggcgg 180gcgcgcgtgg gcgtggctcc tctgtccacg cctgttccct
tcgtcgccgc ggctctcgtc 240cgggacacgg ctttccggag tagagccctt ggag
2746113DNAHomo Sapiens 6gtgttaagtg tgatgcttcc ataatacatt tggatgctgt
cagctaagtt cacttctgaa 60ctaaggggtt cctccaaatg ttggctgaaa ttcatcccaa
ggctggtctg caa 1137284DNAHomo Sapiens 7gtgttaagtg tgatgcttcc
ataatacatt tggatgctgt cagctaagtt cacttctgaa 60ctaaggggtt cctccaaatg
ttggctgaaa ttcatcccaa ggctggtctg caagtgagtg 120tctgcacaca
gtttgcttgt atgtggagtc gatccaaaat agcatcaatg ttggttttac
180caaagtattt attattgata atagaggcta agtacaaaat gtagagaatg
tcagctactt 240gaggcctttg attattaaaa attttattaa tgcattaaac aaga
284887DNAHomo Sapiens 8ttacctattt cttttaagaa taaatttagt gggaatatca
gttccagtca tgggtaccaa 60acttttttag tgacagagta cacacag 879370DNAHomo
Sapiens 9agtctgcaat tcataatgga gctactgtac tggctattgg aaggaggaga
ttctgaagat 60aaggaggtaa tattatctct tttaaaagaa tactttcctc tgtaatcctg
aatctttatt 120acatgtaaga actttgtgca gtagacagca atttctttga
atttggtata tggaaacaat 180tttattttcc tctgctaagt ttttgagcct
gcctcttcta gtgccatgga ctgcattggt 240agagctgaga aatatcattt
agccatactc agcaccctta aaatagcttc tttctgagaa 300ttagatctgt
gaaggtgtcc tgcacagttc ttgtagatgt cattttagtt tgtggttgac
360gtgcatgcat 3701066DNAHomo Sapiens 10agtctgcaat tcataatgga
gctactgtac tggctattgg aaggaggaga ttctgaagat 60aaggag 6611194DNAHomo
Sapiens 11catgcttttt gagaagtgta tcatctagga agaaaatcaa atggagtatt
ggtaattaaa 60ttgtaattcc atgaaggaag gaagtggtgc aaaagatgaa gctaactatt
cctgtttttc 120tttttaagag tctgcaattc ataatggagc tactgtactg
gctattggaa ggaggagatt 180ctgaagataa ggag 19412113DNAHomo Sapiens
12gtaaaacctg tttagaaatt aaaaatgagt tacgatttaa agaaaattca gatgactcat
60tgtgagtgct agttctcttg taggatgcca ctggaaatgt tgaaatgaaa aat
1131330DNAHomo Sapiens 13gatgccactg gaaatgttga aatgaaaaat
301441DNAHomo Sapiens 14ttctcttgta ggatgccact ggaaatgttg aaatgaaaaa
t 41151750DNAHomo Sapiens 15tttaatagaa ggaaaatata aatttaatat
ctgggcaatt gagaccttta aacttacttt 60aaaagtatga tcttgatgta tatgatactg
ttttgtcttt gctatattaa cagaattaga 120ggggtgttct gcaattcaaa
taccttatat attccaaatt ttattctcta taatggactt 180ttaaaataaa
aggtatatgt gcttcaagag ggcaaaattt gaatcatgag ctaatttgct
240aagcatcaga ttatagaaaa gcatccttga ttaatttgga actgtgaaag
ggggcgggta 300aaactgtttt ctgcagaaat ttactagtgc agcaaccatt
taaattaaat gtttgttaac 360ataatagtga tggcattttc tcctccccct
ccttgtggtt ttgtccaact agatgttaca 420gtggcagttg cactgactgt
taagtgttta aatgatgaca ccattatgtg aagtgatttt 480gaaatgagag
attccagcca agaattacat ctgctcccat ctccttcaaa tcatactctc
540tggcagtaca gattatgatt gatttgtttg tgacagattg caggaaacag
tcattgattt 600ttcaatattt taccttaaaa ttatttacag ttgtaaccat
ggggaggtat tttcatgggc 660tgtcagcccc tgaaagacta ggataatatt
ccctgctctc tgacaagaca aattacctgt 720aatgagtgca gtagctgaag
ggtatacttt tattttaaaa tatgtcaata accccagtga 780ctaaacgaat
attgatttag cataatgaag cctgagtaac gtgaaaatga gctttttcaa
840ggggcatggt aaagtctttc tttttagctg gttgtaagaa gcttttgatt
cttttcagcc 900agctggtagg aatatagaat tttataagca aaccatcagg
aatgatagtg ttgtttctga 960taagcaacat ccaaatattt tgaccctgct
tttagtggtt tttttcaaat cttattttga 1020gtcttacttt tagtcataga
atagctactg atttgatgcg gtctttaact gacttaatat 1080ttttacaatt
tcaatatatt ttgcattgga atctccagta atgaatatta aaatatatgt
1140acaatcattt gtagatgata tcaattatat taagacattt cagatgggct
attgtagtat 1200ttaatgtgcc gtattttatg gtagaataat tctcagtctc
tggacatcaa gattgctttc 1260agtgggaatg aagattaatt tacttcagtc
ctgatttttt aggcatcaat gcatgttttc 1320atttttgtca gacttttacc
ctcttttaat gtaattctca acttcttatg gatttacttc 1380ccaatacata
aaatccttca aaacaagaat gataataatt tttatacttt ttataaaaat
1440aaatttattt ttagtccatc aaggtgtctg aagattttat gcctaggtat
ctccatatct 1500aacttgataa ggaaaatagg ataaacaatg ctggtaatag
caggaaagta agtatttgaa 1560taagatgtca aactgatatt tcatgtgaac
ctaactcatt ttatggtaac taataattat 1620cttatttaaa tcaataggta
aaacctgttt agaaattaaa aatgagttac gatttaaaga 1680aaattcagat
gactcattgt gagtgctagt tctcttgtag gatgccactg gaaatgttga
1740aatgaaaaat 1750162228DNAHomo Sapiens 16tgataagcaa catccaaata
ttttgaccct gcttttagtg gtttttttca aatcttattt 60tgagtcttac ttttagtcat
agaatagcta ctgatttgat gcggtcttta actgacttaa 120tatttttaca
atttcaatat attttgcatt ggaatctcca gtaatgaata ttaaaatata
180tgtacaatca tttgtagatg atatcaatta tattaagaca tttcagatgg
gctattgtag 240tatttaatgt gccgtatttt atggtagaat aattctcagt
ctctggacat caagattgct 300ttcagtggga atgaagatta atttacttca
gtcctgattt tttaggcatc aatgcatgtt 360ttcatttttg tcagactttt
accctctttt aatgtaattc tcaacttctt atggatttac 420ttcccaatac
ataaaatcct tcaaaacaag aatgataata atttttatac tttttataaa
480aataaattta tttttagtcc atcaaggtgt ctgaagattt tatgcctagg
tatctccata 540tctaacttga taaggaaaat aggataaaca atgctggtaa
tagcaggaaa gtaagtattt 600gaataagatg tcaaactgat atttcatgtg
aacctaactc attttatggt aactaataat 660tatcttattt aaatcaatag
gtaaaacctg tttagaaatt aaaaatgagt tacgatttaa 720agaaaattca
gatgactcat tgtgagtgct agttctcttg taggatgcca ctggaaatgt
780tgaaatgaaa aatgtaagta tatcttttgg tggaaaaaag gatagtctct
aggacacaaa 840attactgttt tatttttttc tcaggagttt gcctaagggt
gtgacagatg atctctgtca 900cttgtcttag ttgtgtcctg caataaactg
gatgctttat aaaatactag acctgtgatt 960tcgtatgctg taatatttca
tttctccatc acccctccaa attatttctt agtttggagt 1020aaaataataa
atgtattata gtcaacatct cttgacccct ctttagtttc agctaaacta
1080agcatgtgtg tttgtgtgtt cattttatag ttcatgtgta gaactatgtg
aattaaattt 1140aagaaacatg taaagtagag gaaatagttt tctggagaaa
tttttccttt ttggatatta 1200tgcccttttc cattgctttt ctctgcttga
aagcaaaaaa aagtacccta cccctgttct 1260cctttaggga aaaactattc
ctataaagta tttttaaatc gtgcaagtca ttgcctaggg 1320ttagctaaaa
catttctttt taaaaaggag aaaatgccct ggctttaaca ttttcttgta
1380tttgtatcta ttaagataaa cagtttactt tgatacagta cataccaatc
tacttaattt 1440tttttccagg attcctttta ctatgtttgg tctgaccttt
tatgataact taatatggga 1500acaaattagc atataattct attttccatg
tgacctcaac cagttgcaga attgtaccac 1560tactttaggg ggggcaattt
gacagtttat gtagactata gcattaattg ttcccaaatg 1620ttcagtgcat
cctggctaat gtgttattga aggtgttttc acgtaagcag ttagaggaag
1680cacttcaccc ctattactaa gttattaaaa tgcctcctaa aggtagcatt
ttaaattagt 1740atacataatt gattagtaat ttgtcttctc ccaagcataa
aacagcatag cagagttaag 1800tgtgaccagt gaagtataag atattaggga
ttgatggtga caatgatcat agcaactaaa 1860tggatttttt ttttctttta
gattcagccg ttggtctttg aaatttcctg tgatgtgttt 1920caatctagat
gcaaagaaca tggaaaaatc aaagtgctcg agtggtttaa atatgttttg
1980ggtattcctg tttatagact ataatacttt tccaattaaa atcctcagtt
gtcacgcaga 2040agaaggttaa gctgtatttg attgccagtt ttactgaaaa
tgcttagtat tttacagtat 2100caccaaatat attttgttta gccaaggtat
aggaaaaata aaataaattg tataggttga 2160cttttttcta aaatgtcttt
attggattga atgaatgttt atacctgaaa aaaaaaggtt 2220caaaaaaa
222817603DNAHomo Sapiens 17tgcatcctgg ctaatgtgtt attgaaggtg
ttttcacgta agcagttaga ggaagcactt 60cacccctatt actaagttat taaaatgcct
cctaaaggta gcattttaaa ttagtataca 120taattgatta gtaatttgtc
ttctcccaag cataaaacag catagcagag ttaagtgtga 180ccagtgaagt
ataagatatt agggattgat ggtgacaatg atcatagcaa ctaaatggat
240tttttttttc ttttagattc agccgttggt ctttgaaatt tcctgtgatg
tgtttcaatc 300tagatgcaaa gaacatggaa aaatcaaagt gctcgagtgg
tttaaatatg ttttgggtat 360tcctgtttat agactataat acttttccaa
ttaaaatcct cagttgtcac gcagaagaag 420gttaagctgt atttgattgc
cagttttact gaaaatgctt agtattttac agtatcacca 480aatatatttt
gtttagccaa ggtataggaa aaataaaata aattgtatag gttgactttt
540ttctaaaatg tctttattgg attgaatgaa tgtttatacc tgaaaaaaaa
aggttcaaaa 600aaa 60318347DNAHomo Sapiens 18attcagccgt tggtctttga
aatttcctgt gatgtgtttc aatctagatg caaagaacat 60ggaaaaatca aagtgctcga
gtggtttaaa tatgttttgg gtattcctgt ttatagacta 120taatactttt
ccaattaaaa tcctcagttg tcacgcagaa gaaggttaag ctgtatttga
180ttgccagttt tactgaaaat gcttagtatt ttacagtatc accaaatata
ttttgtttag 240ccaaggtata ggaaaaataa aataaattgt ataggttgac
ttttttctaa aatgtcttta 300ttggattgaa tgaatgttta tacctgaaaa
aaaaaggttc aaaaaaa 34719832DNAHomo Sapiens 19gtaagtatat cttttggtgg
aaaaaaggat agtctctagg acacaaaatt actgttttat 60ttttttctca ggagtttgcc
taagggtgtg acagatgatc tctgtcactt gtcttagttg 120tgtcctgcaa
taaactggat gctttataaa atactagacc tgtgatttcg tatgctgtaa
180tatttcattt ctccatcacc cctccaaatt atttcttagt ttggagtaaa
ataataaatg 240tattatagtc aacatctctt gacccctctt tagtttcagc
taaactaagc atgtgtgttt 300gtgtgttcat tttatagttc atgtgtagaa
ctatgtgaat taaatttaag aaacatgtaa 360agtagaggaa atagttttct
ggagaaattt ttcctttttg gatattatgc ccttttccat 420tgcttttctc
tgcttgaaag caaaaaaaag taccctaccc ctgttctcct ttagggaaaa
480actattccta taaagtattt ttaaatcgtg caagtcattg cctagggtta
gctaaaacat 540ttctttttaa aaaggagaaa atgccctggc tttaacattt
tcttgtattt gtatctatta 600agataaacag tttactttga tacagtacat
accaatctac ttaatttttt ttccaggatt 660ccttttacta tgtttggtct
gaccttttat gataacttaa tatgggaaca aattagcata 720taattctatt
ttccatgtga cctcaaccag ttgcagaatt gtaccactac tttagggggg
780gcaatttgac agtttatgta gactatagca ttaattgttc ccaaatgttc ag
83220276DNAHomo Sapiens 20gtatagcaca gcatcacaac ctggatactg
acattgatgc agtcaagaca gagaacattt 60atatcatgag gaggatccct cattaccgcc
ctttgatatc cacccctact tccagaccat 120ctcactcctc ccttaaccct
ggcaaccact agcatgttct ccatttctat aaatttgcct 180ttataggaat
gttatataat tgcaattaaa gtgtgtaacc ttttggggtt tgactcaccc
240ggcatcattt tctggagatt cagcttatat gtgtca 27621913DNAHomo Sapiens
21gagcccccgc ccgggccagg ccctctggcc gcgccgtccg cccctctagt cgtgtcccct
60cgtgggccga acggacgcgg cggtgccccg cgcccgacca gacgtcccgt gggctagggc
120ctgggcctcg ggccgcgtcg gcgccggtcg agcctctccg ggtgtcgggg
ttcggggcgg 180gcgcgcgtgg gcgtggctcc tctgtccacg cctgttccct
tcgtcgccgc ggctctcgtc 240cgggacacgg ctttccggag tagagccctt
ggaggtgtta agtgtgatgc ttccataata 300catttggatg ctgtcagcta
agttcacttc tgaactaagg ggttcctcca aatgttggct 360gaaattcatc
ccaaggctgg tctgcaaagt ctgcaattca taatggagct actgtactgg
420ctattggaag gaggagattc tgaagataag gaggtaaaac ctgtttagaa
attaaaaatg 480agttacgatt taaagaaaat tcagatgact cattgtgagt
gctagttctc ttgtaggatg 540ccactggaaa tgttgaaatg aaaaatattc
agccgttggt ctttgaaatt tcctgtgatg 600tgtttcaatc tagatgcaaa
gaacatggaa aaatcaaagt gctcgagtgg tttaaatatg 660ttttgggtat
tcctgtttat agactataat acttttccaa ttaaaatcct cagttgtcac
720gcagaagaag gttaagctgt atttgattgc cagttttact gaaaatgctt
agtattttac 780agtatcacca aatatatttt gtttagccaa ggtataggaa
aaataaaata aattgtatag 840gttgactttt ttctaaaatg tctttattgg
attgaatgaa tgtttatacc tgaaaaaaaa 900aggttcaaaa aaa 91322841DNAHomo
Sapiens 22gagcccccgc ccgggccagg ccctctggcc gcgccgtccg cccctctagt
cgtgtcccct 60cgtgggccga acggacgcgg cggtgccccg cgcccgacca gacgtcccgt
gggctagggc 120ctgggcctcg ggccgcgtcg gcgccggtcg agcctctccg
ggtgtcgggg ttcggggcgg 180gcgcgcgtgg gcgtggctcc tctgtccacg
cctgttccct tcgtcgccgc ggctctcgtc 240cgggacacgg ctttccggag
tagagccctt ggaggtgtta agtgtgatgc ttccataata 300catttggatg
ctgtcagcta agttcacttc tgaactaagg ggttcctcca aatgttggct
360gaaattcatc ccaaggctgg tctgcaaagt ctgcaattca taatggagct
actgtactgg 420ctattggaag gaggagattc tgaagataag gagttctctt
gtaggatgcc actggaaatg 480ttgaaatgaa aaatattcag ccgttggtct
ttgaaatttc ctgtgatgtg tttcaatcta 540gatgcaaaga acatggaaaa
atcaaagtgc tcgagtggtt taaatatgtt ttgggtattc 600ctgtttatag
actataatac ttttccaatt aaaatcctca gttgtcacgc agaagaaggt
660taagctgtat ttgattgcca gttttactga aaatgcttag tattttacag
tatcaccaaa 720tatattttgt ttagccaagg tataggaaaa ataaaataaa
ttgtataggt tgactttttt 780ctaaaatgtc tttattggat tgaatgaatg
tttatacctg aaaaaaaaag gttcaaaaaa 840a 84123837DNAHomo Sapiens
23tctcggcgcc agaggggcgg ggaggggcgg ggtctcgatc gcgctattgt catggagacg
60ggaagctggc tgcagcggcg gcggggaccg tggggccgag gtggctgcca gccggccaat
120gtctaagcga ggcggagcgg cccaggcggc ccgagcctgg gggagcgcgc
agccggccag 180tggcggcctc gccggcggcc tcttcccggg ctcgcagtag
gcccgagtcg tcgccgggag 240ctcctgggag cagcgtcccc gccctgctcc
cctcgctccc gcctcttgcg gccccacggc 300ccctcagcgc ccgcccccgg
ctccgcccgc cgcagccgca gcccctggcg ctaacggtcg 360gtaacggccc
gcgcgcgccg cccgccgggg gctcgcgcca gccacgaggg agcgtccgcg
420gcccgcgcgc ccgcgcggcg gaggagaggt gagcccccgc ccgggccagg
ccctctggcc 480gcgccgtccg cccctctagt cgtgtcccct cgtgggccga
acggacgcgg cggtgccccg 540cgcccgacca gacgtcccgt gggctagggc
ctgggcctcg ggccgcgtcg gcgccggtcg 600agcctctccg ggtgtcgggg
ttcggggcgg gcgcgcgtgg gcgtggctcc tctgtccacg 660cctgttccct
tcgtcgccgc ggctctcgtc cgggacacgg ctttccggag tagagccctt
720ggaggtgtta agtgtgatgc ttccataata catttggatg ctgtcagcta
agttcacttc 780tgaactaagg ggttcctcca aatgttggct gaaattcatc
ccaaggctgg tctgcaa 83724654DNAHomo Sapiens 24catgcttttt gagaagtgta
tcatctagga agaaaatcaa atggagtatt ggtaattaaa 60ttgtaattcc atgaaggaag
gaagtggtgc aaaagatgaa gctaactatt cctgtttttc 120tttttaagag
tctgcaattc ataatggagc tactgtactg gctattggaa ggaggagatt
180ctgaagataa ggaggtaaaa cctgtttaga aattaaaaat gagttacgat
ttaaagaaaa 240ttcagatgac tcattgtgag tgctagttct cttgtaggat
gccactggaa atgttgaaat
300gaaaaatatt cagccgttgg tctttgaaat ttcctgtgat gtgtttcaat
ctagatgcaa 360agaacatgga aaaatcaaag tgctcgagtg gtttaaatat
gttttgggta ttcctgttta 420tagactataa tacttttcca attaaaatcc
tcagttgtca cgcagaagaa ggttaagctg 480tatttgattg ccagttttac
tgaaaatgct tagtatttta cagtatcacc aaatatattt 540tgtttagcca
aggtatagga aaaataaaat aaattgtata ggttgacttt tttctaaaat
600gtctttattg gattgaatga atgtttatac ctgaaaaaaa aaggttcaaa aaaa
65425732DNAHomo Sapiens 25tctcggcgcc agaggggcgg ggaggggcgg
ggtctcgatc gcgctattgt catggagacg 60ggaagctggc tgcagcggcg gcggggaccg
tggggccgag gtggctgcca gccggccaat 120gtctaagcga ggcggagcgg
cccaggcggc ccgagcctgg gggagcgcgc agccggccag 180tggcggcctc
gccggcggcc tcttcccggg ctcgcagtag gcccgagtcg tcgccgggag
240ctcctgggag cagcgtcccc gccctgctcc cctcgctccc gcctcttgcg
gccccacggc 300ccctcagcgc ccgcccccgg ctccgcccgc cgcagccgca
gcccctggcg ctaacggtcg 360gtaacggccc gcgcgcgccg cccgccgggg
gctcgcgcca gccacgaggg agcgtccgcg 420gcccgcgcgc ccgcgcggcg
gaggagaggt gttaagtgtg atgcttccat aatacatttg 480gatgctgtca
gctaagttca cttctgaact aaggggttcc tccaaatgtt ggctgaaatt
540catcccaagg ctggtctgca agtgagtgtc tgcacacagt ttgcttgtat
gtggagtcga 600tccaaaatag catcaatgtt ggttttacca aagtatttat
tattgataat agaggctaag 660tacaaaatgt agagaatgtc agctacttga
ggcctttgat tattaaaaat tttattaatg 720cattaaacaa ga 73226556DNAHomo
Sapiens 26gtgttaagtg tgatgcttcc ataatacatt tggatgctgt cagctaagtt
cacttctgaa 60ctaaggggtt cctccaaatg ttggctgaaa ttcatcccaa ggctggtctg
caaagtctgc 120aattcataat ggagctactg tactggctat tggaaggagg
agattctgaa gataaggagg 180atgccactgg aaatgttgaa atgaaaaata
ttcagccgtt ggtctttgaa atttcctgtg 240atgtgtttca atctagatgc
aaagaacatg gaaaaatcaa agtgctcgag tggtttaaat 300atgttttggg
tattcctgtt tatagactat aatacttttc caattaaaat cctcagttgt
360cacgcagaag aaggttaagc tgtatttgat tgccagtttt actgaaaatg
cttagtattt 420tacagtatca ccaaatatat tttgtttagc caaggtatag
gaaaaataaa ataaattgta 480taggttgact tttttctaaa atgtctttat
tggattgaat gaatgtttat acctgaaaaa 540aaaaggttca aaaaaa
556271173DNAHomo Sapiens 27tctcggcgcc agaggggcgg ggaggggcgg
ggtctcgatc gcgctattgt catggagacg 60ggaagctggc tgcagcggcg gcggggaccg
tggggccgag gtggctgcca gccggccaat 120gtctaagcga ggcggagcgg
cccaggcggc ccgagcctgg gggagcgcgc agccggccag 180tggcggcctc
gccggcggcc tcttcccggg ctcgcagtag gcccgagtcg tcgccgggag
240ctcctgggag cagcgtcccc gccctgctcc cctcgctccc gcctcttgcg
gccccacggc 300ccctcagcgc ccgcccccgg ctccgcccgc cgcagccgca
gcccctggcg ctaacggtcg 360gtaacggccc gcgcgcgccg cccgccgggg
gctcgcgcca gccacgaggg agcgtccgcg 420gcccgcgcgc ccgcgcggcg
gaggagaggt gttaagtgtg atgcttccat aatacatttg 480gatgctgtca
gctaagttca cttctgaact aaggggttcc tccaaatgtt ggctgaaatt
540catcccaagg ctggtctgca ttacctattt cttttaagaa taaatttagt
gggaatatca 600gttccagtca tgggtaccaa acttttttag tgacagagta
cacacagagt ctgcaattca 660taatggagct actgtactgg ctattggaag
gaggagattc tgaagataag gaggtaaaac 720ctgtttagaa attaaaaatg
agttacgatt taaagaaaat tcagatgact cattgtgagt 780gctagttctc
ttgtaggatg ccactggaaa tgttgaaatg aaaaatattc agccgttggt
840ctttgaaatt tcctgtgatg tgtttcaatc tagatgcaaa gaacatggaa
aaatcaaagt 900gctcgagtgg tttaaatatg ttttgggtat tcctgtttat
agactataat acttttccaa 960ttaaaatcct cagttgtcac gcagaagaag
gttaagctgt atttgattgc cagttttact 1020gaaaatgctt agtattttac
agtatcacca aatatatttt gtttagccaa ggtataggaa 1080aaataaaata
aattgtatag gttgactttt ttctaaaatg tctttattgg attgaatgaa
1140tgtttatacc tgaaaaaaaa aggttcaaaa aaa 1173282229DNAHomo Sapiens
28tgataagcaa catccaaata ttttgaccct gcttttagtg gtttttttca aatcttattt
60tgagtcttac ttttagtcat agaatagcta ctgatttgat gcggtcttta actgacttaa
120tatttttaca atttcaatat attttgcatt ggaatctcca gtaatgaata
ttaaaatata 180tgtacaatca tttgtagatg atatcaatta tattaagaca
tttcagatgg gctattgtag 240tatttaatgt gccgtatttt atggtagaat
aattctcagt ctctggacat caagattgct 300ttcagtggga atgaagatta
atttacttca gtcctgattt tttaggcatc aatgcatgtt 360ttcatttttg
tcagactttt accctctttt aatgtaattc tcaacttctt atggatttac
420ttcccaatac ataaaatcct tcaaaacaag aatgataata atttttatac
tttttataaa 480aataaattta tttttagtcc atcaaggtgt ctgaagattt
tatgcctagg tatctccata 540tctaacttga taaggaaaat aggataaaca
atgctggtaa tagcaggaaa gtaagtattt 600gaataagatg tcaaactgat
atttcatgtg aacctaactc attttatggt aactaataat 660tatcttattt
aaatcaatag gtaaaacctg tttagaaatt aaaaatgagt tacgatttaa
720agaaaattca gatgactcat tgtgagtgct agttctcttg taggatgcca
ctggaaatgt 780tgaaatgaaa aatgtaagta tatcttttgg tggaaaaaag
gatagtctct aggacacaaa 840attactgttt tatttttttc tcaggagttt
gcctaagggt gtgacagatg atctctgtca 900cttgtcttag ttgtgtcctg
caataaactg gatgctttat aaaatactag acctgtgatt 960tcgtatgctg
taatatttca tttctccatc acccctccaa attatttctt agtttggagt
1020aaaataataa atgtattata gtcaacatct cttgacccct ctttagtttc
agctaaacta 1080agcatgtgtg tttgtgtgtt cattttatag ttcatgtgta
gaactatgtg aattaaattt 1140aagaaacatg taaagtagag gaaatagttt
tctggagaaa tttttccttt ttggatatta 1200tgcccttttc cattgctttt
ctctgcttga aagcaaaaaa aagtacccta cccctgttct 1260cctttaggga
aaaactattc ctataaagta tttttaaatc gtgcaagtca ttgcctaggg
1320ttagctaaaa catttctttt taaaaaggag aaaatgccct ggctttaaca
ttttcttgta 1380tttgtatcta ttaagataaa cagtttactt tgatacagta
cataccaatc tacttaattt 1440tttttccagg attcctttta ctatgtttgg
tctgaccttt tatgataact taatatggga 1500acaaattagc atataattct
attttccatg tgacctcaac cagttgcaga attgtaccac 1560tactttaggg
ggggcaattt gacagtttat gtagactata gcattaattg ttcccaaatg
1620ttcagtgcat cctggctaat gtgttattga aggtgttttc acgtaagcag
ttagaggaag 1680cacttcaccc ctattactaa gttattaaaa tgcctcctaa
aggtagcatt ttaaattagt 1740atacataatt gattagtaat ttgtcttctc
ccaagcataa aacagcatag cagagttaag 1800tgtgaccagt gaagtataag
atattaggga ttgatggtga caatgatcat agcaactaaa 1860tggatttttt
ttttctttta gattcagccg ttggtctttg aaatttcctg tgatgtgttt
1920caatctagat gcaaagaaca tggaaaaatc aaagtgctcg agtggtttaa
atatgttttg 1980ggtattcctg tttatagact ataatacttt tccaattaaa
atcctcagtt gtcacgcaga 2040agaaggttaa gctgtatttg attgccagtt
ttactgaaaa tgcttagtat tttacagtat 2100caccaaatat attttgttta
gccaaggtat aggaaaaata aaataaattg tataggttga 2160cttttttcta
aaatgtcttt attggattga atgaatgttt atacctgaaa aaaaaaggtt
2220caaaaaaat 2229292097DNAHomo Sapiens 29tttaatagaa ggaaaatata
aatttaatat ctgggcaatt gagaccttta aacttacttt 60aaaagtatga tcttgatgta
tatgatactg ttttgtcttt gctatattaa cagaattaga 120ggggtgttct
gcaattcaaa taccttatat attccaaatt ttattctcta taatggactt
180ttaaaataaa aggtatatgt gcttcaagag ggcaaaattt gaatcatgag
ctaatttgct 240aagcatcaga ttatagaaaa gcatccttga ttaatttgga
actgtgaaag ggggcgggta 300aaactgtttt ctgcagaaat ttactagtgc
agcaaccatt taaattaaat gtttgttaac 360ataatagtga tggcattttc
tcctccccct ccttgtggtt ttgtccaact agatgttaca 420gtggcagttg
cactgactgt taagtgttta aatgatgaca ccattatgtg aagtgatttt
480gaaatgagag attccagcca agaattacat ctgctcccat ctccttcaaa
tcatactctc 540tggcagtaca gattatgatt gatttgtttg tgacagattg
caggaaacag tcattgattt 600ttcaatattt taccttaaaa ttatttacag
ttgtaaccat ggggaggtat tttcatgggc 660tgtcagcccc tgaaagacta
ggataatatt ccctgctctc tgacaagaca aattacctgt 720aatgagtgca
gtagctgaag ggtatacttt tattttaaaa tatgtcaata accccagtga
780ctaaacgaat attgatttag cataatgaag cctgagtaac gtgaaaatga
gctttttcaa 840ggggcatggt aaagtctttc tttttagctg gttgtaagaa
gcttttgatt cttttcagcc 900agctggtagg aatatagaat tttataagca
aaccatcagg aatgatagtg ttgtttctga 960taagcaacat ccaaatattt
tgaccctgct tttagtggtt tttttcaaat cttattttga 1020gtcttacttt
tagtcataga atagctactg atttgatgcg gtctttaact gacttaatat
1080ttttacaatt tcaatatatt ttgcattgga atctccagta atgaatatta
aaatatatgt 1140acaatcattt gtagatgata tcaattatat taagacattt
cagatgggct attgtagtat 1200ttaatgtgcc gtattttatg gtagaataat
tctcagtctc tggacatcaa gattgctttc 1260agtgggaatg aagattaatt
tacttcagtc ctgatttttt aggcatcaat gcatgttttc 1320atttttgtca
gacttttacc ctcttttaat gtaattctca acttcttatg gatttacttc
1380ccaatacata aaatccttca aaacaagaat gataataatt tttatacttt
ttataaaaat 1440aaatttattt ttagtccatc aaggtgtctg aagattttat
gcctaggtat ctccatatct 1500aacttgataa ggaaaatagg ataaacaatg
ctggtaatag caggaaagta agtatttgaa 1560taagatgtca aactgatatt
tcatgtgaac ctaactcatt ttatggtaac taataattat 1620cttatttaaa
tcaataggta aaacctgttt agaaattaaa aatgagttac gatttaaaga
1680aaattcagat gactcattgt gagtgctagt tctcttgtag gatgccactg
gaaatgttga 1740aatgaaaaat attcagccgt tggtctttga aatttcctgt
gatgtgtttc aatctagatg 1800caaagaacat ggaaaaatca aagtgctcga
gtggtttaaa tatgttttgg gtattcctgt 1860ttatagacta taatactttt
ccaattaaaa tcctcagttg tcacgcagaa gaaggttaag 1920ctgtatttga
ttgccagttt tactgaaaat gcttagtatt ttacagtatc accaaatata
1980ttttgtttag ccaaggtata ggaaaaataa aataaattgt ataggttgac
ttttttctaa 2040aatgtcttta ttggattgaa tgaatgttta tacctgaaaa
aaaaaggttc aaaaaaa 2097301534DNAHomo Sapiens 30tctcggcgcc
agaggggcgg ggaggggcgg ggtctcgatc gcgctattgt catggagacg 60ggaagctggc
tgcagcggcg gcggggaccg tggggccgag gtggctgcca gccggccaat
120gtctaagcga ggcggagcgg cccaggcggc ccgagcctgg gggagcgcgc
agccggccag 180tggcggcctc gccggcggcc tcttcccggg ctcgcagtag
gcccgagtcg tcgccgggag 240ctcctgggag cagcgtcccc gccctgctcc
cctcgctccc gcctcttgcg gccccacggc 300ccctcagcgc ccgcccccgg
ctccgcccgc cgcagccgca gcccctggcg ctaacggtcg 360gtaacggccc
gcgcgcgccg cccgccgggg gctcgcgcca gccacgaggg agcgtccgcg
420gcccgcgcgc ccgcgcggcg gaggagaggt gttaagtgtg atgcttccat
aatacatttg 480gatgctgtca gctaagttca cttctgaact aaggggttcc
tccaaatgtt ggctgaaatt 540catcccaagg ctggtctgca aagtctgcaa
ttcataatgg agctactgta ctggctattg 600gaaggaggag attctgaaga
taaggaggta atattatctc ttttaaaaga atactttcct 660ctgtaatcct
gaatctttat tacatgtaag aactttgtgc agtagacagc aatttctttg
720aatttggtat atggaaacaa ttttattttc ctctgctaag tttttgagcc
tgcctcttct 780agtgccatgg actgcattgg tagagctgag aaatatcatt
tagccatact cagcaccctt 840aaaatagctt ctttctgaga attagatctg
tgaaggtgtc ctgcacagtt cttgtagatg 900tcattttagt ttgtggttga
cgtgcatgca ttgcatcctg gctaatgtgt tattgaaggt 960gttttcacgt
aagcagttag aggaagcact tcacccctat tactaagtta ttaaaatgcc
1020tcctaaaggt agcattttaa attagtatac ataattgatt agtaatttgt
cttctcccaa 1080gcataaaaca gcatagcaga gttaagtgtg accagtgaag
tataagatat tagggattga 1140tggtgacaat gatcatagca actaaatgga
tttttttttt cttttagatt cagccgttgg 1200tctttgaaat ttcctgtgat
gtgtttcaat ctagatgcaa agaacatgga aaaatcaaag 1260tgctcgagtg
gtttaaatat gttttgggta ttcctgttta tagactataa tacttttcca
1320attaaaatcc tcagttgtca cgcagaagaa ggttaagctg tatttgattg
ccagttttac 1380tgaaaatgct tagtatttta cagtatcacc aaatatattt
tgtttagcca aggtatagga 1440aaaataaaat aaattgtata ggttgacttt
tttctaaaat gtctttattg gattgaatga 1500atgtttatac ctgaaaaaaa
aaggttcaaa aaaa 1534311087DNAHomo Sapiens 31tctcggcgcc agaggggcgg
ggaggggcgg ggtctcgatc gcgctattgt catggagacg 60ggaagctggc tgcagcggcg
gcggggaccg tggggccgag gtggctgcca gccggccaat 120gtctaagcga
ggcggagcgg cccaggcggc ccgagcctgg gggagcgcgc agccggccag
180tggcggcctc gccggcggcc tcttcccggg ctcgcagtag gcccgagtcg
tcgccgggag 240ctcctgggag cagcgtcccc gccctgctcc cctcgctccc
gcctcttgcg gccccacggc 300ccctcagcgc ccgcccccgg ctccgcccgc
cgcagccgca gcccctggcg ctaacggtcg 360gtaacggccc gcgcgcgccg
cccgccgggg gctcgcgcca gccacgaggg agcgtccgcg 420gcccgcgcgc
ccgcgcggcg gaggagaggt gttaagtgtg atgcttccat aatacatttg
480gatgctgtca gctaagttca cttctgaact aaggggttcc tccaaatgtt
ggctgaaatt 540catcccaagg ctggtctgca aagtctgcaa ttcataatgg
agctactgta ctggctattg 600gaaggaggag attctgaaga taaggaggta
aaacctgttt agaaattaaa aatgagttac 660gatttaaaga aaattcagat
gactcattgt gagtgctagt tctcttgtag gatgccactg 720gaaatgttga
aatgaaaaat attcagccgt tggtctttga aatttcctgt gatgtgtttc
780aatctagatg caaagaacat ggaaaaatca aagtgctcga gtggtttaaa
tatgttttgg 840gtattcctgt ttatagacta taatactttt ccaattaaaa
tcctcagttg tcacgcagaa 900gaaggttaag ctgtatttga ttgccagttt
tactgaaaat gcttagtatt ttacagtatc 960accaaatata ttttgtttag
ccaaggtata ggaaaaataa aataaattgt ataggttgac 1020ttttttctaa
aatgtcttta ttggattgaa tgaatgttta tacctgaaaa aaaaaggttc 1080aaaaaaa
108732249DNAHomo Sapiens 32agccgttggt ctttgaaatt tcctgtgatg
tgtttcaatc tagatgcaaa gaacatggaa 60aaatcaaagt gctcgagtgg tttaaatatg
ttttgggtat tcctgtttat agactataat 120acttttccaa ttaaaatcct
cagttgtcac gcagaagaag gttaagctgt atttgattgc 180cagttttact
gaaaatgctt agtattttac agtatcacca aatatatttt gtttagccaa 240ggtatagga
24933273DNAHomo Sapiens 33tgctgtcagc taagttcact tctgaactaa
ggggttcctc caaatgttgg ctgaaattca 60tcccaaggct ggtctgcaaa gtctgcaatt
cataatggag ctactgtact ggctattgga 120aggaggagat tctgaagata
aggaggtaaa acctgtttag aaattaaaaa tgagttacga 180tttaaagaaa
attcagatga ctcattgtga gtgctagttc tcttgtagga tgccactgga
240aatgttgaaa tgaaaaatat tcagccgttg gtc 2733434DNAArtificial
sequenceSynthetic 34taactggaat tcatgttggc tgaaattcat ccca
343556DNAArtificial sequenceSynthetic 35cacgataagc ttttattata
gtctataaac aggaataccc aaaacatatt taaacc 563620DNAArtificial
sequenceSynthetic 36acacggcttt ccggagtaga 203728DNAArtificial
sequenceSynthetic 37aacaggtttt acctccttat cttcagaa 2838222DNAHomo
Sapiens 38gacacggctt tccggagtag agcccttgga ggtgttaagt gtgatgcttc
cataatacat 60ttggatgctg tcagctaagt tcacttctga actaaggggt tcctccaaat
gttggctgaa 120attcatccca aggctggtct gcaaagtctg caattcataa
tggagctact gtactggcta 180ttggaaggag gagattctga agataaggag
gtaaaacctg tt 22239222DNAHomo Sapiens 39gacacggctt tccggagtag
agcccttgga ggtgttaagt gtgatgcttc cataatacat 60ttggatgctg tcagctaagt
tcacttctga actaaggggt tcctccaaat gttggctgaa 120attcatccca
aggctggtct gcaaagtctg caattcataa tggagctact gtactggcta
180ttggaaggag gagattctga agataaggag gtaaaacctg tt
2224020DNAArtificial sequenceSynthetic 40acacggcttt ccggagtaga
204126DNAArtificial sequenceSynthetic 41ggcatcctac aagagaactc
cttatc 2642226DNAHomo Sapiens 42acacggcttt ccggagtaga gcccttggag
gtgttaagtg tgatgcttcc ataatacatt 60tggatgctgt cagctaagtt cacttctgaa
ctaaggggtt cctccaaatg ttggctgaaa 120ttcatcccaa ggctggtctg
caaagtctgc aattcataat ggagctactg tactggctat 180tggaaggagg
agattctgaa gataaggagt tctcttgtag gatgcc 22643226DNAHomo Sapiens
43acacggcttt ccggagtaga gcccttggag gtgttaagtg tgatgcttcc ataatacatt
60tggatgctgt cagctaagtt cacttctgaa ctaaggggtt cctccaaatg ttggctgaaa
120ttcatcccaa ggctggtctg caaagtctgc aattcataat ggagctactg
tactggctat 180tggaaggagg agattctgaa gataaggagt tctcttgtag gatgcc
2264425DNAArtificial sequenceSynthetic 44gctgacagca tccaaatgta
ttatg 254529DNAArtificial sequenceSynthetic 45tttttgagaa gtgtatcatc
taggaagaa 294627DNAArtificial sequenceSynthetic 46acatatttaa
accactcgag cactttg 2747398DNAHomo Sapiens 47ctttttgaga agtgtatcat
ctaggaagaa aatcaaatgg agtattggta attaaattgt 60aattccatga aggaaggaag
tggtgcaaaa gatgaagcta actattcctg tttttctttt 120taagagtctg
caattcataa tggagctact gtactggcta ttggaaggag gagattctga
180agataaggag gtaaaacctg tttagaaatt aaaaatgagt tacgatttaa
agaaaattca 240gatgactcat tgtgagtgct agttctcttg taggatgcca
ctggaaatgt tgaaatgaaa 300aatattcagc cgttggtctt tgaaatttcc
tgtgatgtgt ttcaatctag atgcaaagaa 360catggaaaaa tcaaagtgct
cgagtggttt aaatatgt 39848398DNAHomo Sapiens 48ctttttgaga agtgtatcat
ctaggaagaa aatcaaatgg agtattggta attaaattgt 60aattccatga aggaaggaag
tggtgcaaaa gatgaagcta actattcctg tttttctttt 120taagagtctg
caattcataa tggagctact gtactggcta ttggaaggag gagattctga
180agataaggag gtaaaacctg tttagaaatt aaaaatgagt tacgatttaa
agaaaattca 240gatgactcat tgtgagtgct agttctcttg taggatgcca
ctggaaatgt tgaaatgaaa 300aatattcagc cgttggtctt tgaaatttcc
tgtgatgtgt ttcaatctag atgcaaagaa 360catggaaaaa tcaaagtgct
cgagtggttt aaatatgt 3984922DNAArtificial sequenceSynthetic
49ggatcgactc cacatacaag ca 2250205DNAHomo Sapiens 50cagccacgag
ggagcgtccg cggcccgcgc gcccgcgcgg cggaggagag gtgttaagtg 60tgatgcttcc
ataatacatt tggatgctgt cagctaagtt cacttctgaa ctaaggggtt
120cctccaaatg ttggctgaaa ttcatcccaa ggctggtctg caagtgagtg
tctgcacaca 180gtttgcttgt atgtggagtc gatcc 20551205DNAHomo Sapiens
51cagccacgag ggagcgtccg cggcccgcgc gcccgcgcgg cggaggagag gtgttaagtg
60tgatgcttcc ataatacatt tggatgctgt cagctaagtt cacttctgaa ctaaggggtt
120cctccaaatg ttggctgaaa ttcatcccaa ggctggtctg caagtgagtg
tctgcacaca 180gtttgcttgt atgtggagtc gatcc 2055222DNAArtificial
sequenceSynthetic 52atgttggctg aaattcatcc ca 225322DNAArtificial
sequenceSynthetic 53ttccagtggc atcctcctta tc 2254115DNAHomo Sapiens
54atgttggctg aaattcatcc caaggctggt ctgcaaagtc tgcaattcat aatggagcta
60ctgtactggc tattggaagg aggagattct gaagataagg aggatgccac tggaa
11555115DNAHomo Sapiens 55atgttggctg aaattcatcc caaggctggt
ctgcaaagtc tgcaattcat aatggagcta 60ctgtactggc tattggaagg aggagattct
gaagataagg aggatgccac tggaa 1155622DNAArtificial sequenceSynthetic
56atgttggctg aaattcatcc ca 225730DNAArtificial sequenceSynthetic
57tctgtgtgta ctctgtcact aaaaaagttt 3058124DNAHomo Sapiens
58atgttggctg aaattcatcc caaggctggt ctgcaattac ctatttcttt taagaataaa
60tttagtggga atatcagttc cagtcatggg taccaaactt ttttagtgac agagtacaca
120caga
12459124DNAHomo Sapiens 59atgttggctg aaattcatcc caaggctggt
ctgcaattac ctatttcttt taagaataaa 60tttagtggga atatcagttc cagtcatggg
taccaaactt ttttagtgac agagtacaca 120caga 1246027DNAArtificial
sequenceSynthetic 60taagatatta gggattgatg gtgacaa
276127DNAArtificial sequenceSynthetic 61acatatttaa accactcgag
cactttg 2762160DNAHomo Sapiens 62taagatatta gggattgatg gtgacaatga
tcatagcaac taaatggatt ttttttttct 60tttagattca gccgttggtc tttgaaattt
cctgtgatgt gtttcaatct agatgcaaag 120aacatggaaa aatcaaagtg
ctcgagtggt ttaaatatgt 16063159DNAHomo Sapiens 63taagatatta
gggattgatg gtgacaatga tcatagcaac taaatggatt tttttttctt 60ttagattcag
ccgttggtct ttgaaatttc ctgtgatgtg tttcaatcta gatgcaaaga
120acatggaaaa atcaaagtgc tcgagtggtt taaatatgt 1596428DNAArtificial
sequenceSynthetic 64tgcctaggta tctccatatc taacttga
286527DNAArtificial sequenceSynthetic 65acatatttaa accactcgag
cactttg 2766306DNAHomo Sapiens 66tgcctaggta tctccatatc taacttgata
aggaaaatag gataaacaat gctggtaata 60tttatggtaa ctaataatta tcttatttaa
atcaataggt aaaacctgtt tagaaattaa 120aaatgagtta cgatttaaag
aaaattcaga tgactcattg tgagtgctag ttctcttgta 180ggatgccact
ggaaatgttg aaatgaaaaa tattcagccg ttggtctttg aaatttcctg
240tgatgtgttt caatctagat gcaaagaaca tggaaaaatc aaagtgctcg
agtggtttaa 300atatgt 30667366DNAHomo Sapiens 67tgcctaggta
tctccatatc taacttgata aggaaaatag gataaacaat gctggtaata 60gcaggaaagt
aagtatttga ataagatgtc aaactgatat ttcatgtgaa cctaactcat
120tttatggtaa ctaataatta tcttatttaa atcaataggt aaaacctgtt
tagaaattaa 180aaatgagtta cgatttaaag aaaattcaga tgactcattg
tgagtgctag ttctcttgta 240ggatgccact ggaaatgttg aaatgaaaaa
tattcagccg ttggtctttg aaatttcctg 300tgatgtgttt caatctagat
gcaaagaaca tggaaaaatc aaagtgctcg agtggtttaa 360atatgt
3666822DNAArtificial sequenceSynthetic 68atgttggctg aaattcatcc ca
226928DNAArtificial sequenceSynthetic 69tgctgagtat ggctaaatga
tatttctc 2870310DNAHomo Sapiens 70atgttggctg aaattcatcc caaggctggt
ctgcaaagtc tgcaattcat aatggagcta 60ctgtactggc tattggaagg aggagattct
gaagataagg aggtaatatt atctctttta 120aaagaatact ttcctctgta
atcctgaatc tttattacat gtaagaactt tgtgcagtag 180acagcaattt
ctttgaattt ggtatatgga aacaatttta ttttcctctg ctaagttttt
240gagcctgcct cttctagtgc catggactgc attggtagag ctgagaaata
tcatttagcc 300atactcagca 31071310DNAHomo Sapiens 71atgttggctg
aaattcatcc caaggctggt ctgcaaagtc tgcaattcat aatggagcta 60
ctgtactggc tattggaagg aggagattct gaagataagg aggtaatatt atctctttta 120
aaagaatact ttcctctgta atcctgaatc tttattacat gtaagaactt tgtgcagtag 180
acagcaattt ctttgaattt ggtatatgga aacaatttta ttttcctctg ctaagttttt 240
gagcctgcct cttctagtgc catggactgc attggtagag ctgagaaata tcatttagcc 300
atactcagca 310

<210> 72
<211> 17
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic
<400> 72
cagccacgag ggagcgt 17

<210> 73
<211> 28
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic
<400> 73
aacaggtttt acctccttat cttcagaa 28

<210> 74
<211> 240
<212> DNA
<213> Homo Sapiens
<400> 74
agccacgagg gagcgtccgc ggcccgcgcg cccgcgcggc ggaggagagg tgttaagtgt 60
gatgcttcca taatacattt ggatgctgtc agctaagttc acttctgaac taaggggttc 120
ctccaaatgt tggctgaaat tcatcccaag gctggtctgc aaagtctgca attcataatg 180
gagctactgt actggctatt ggaaggagga gattctgaag ataaggaggt aaaacctgtt 240

<210> 75
<211> 240
<212> DNA
<213> Homo Sapiens
<400> 75
agccacgagg gagcgtccgc ggcccgcgcg cccgcgcggc ggaggagagg tgttaagtgt 60
gatgcttcca taatacattt ggatgctgtc agctaagttc acttctgaac taaggggttc 120
ctccaaatgt tggctgaaat tcatcccaag gctggtctgc aaagtctgca attcataatg 180
gagctactgt actggctatt ggaaggagga gattctgaag ataaggaggt aaaacctgtt 240

<210> 76
<211> 40
<212> DNA
<213> Homo Sapiens
<400> 76
taaaatgtct ttattggatt gaatgaatgt ttatacctga 40

<210> 77
<211> 68
<212> DNA
<213> Homo Sapiens
<400> 77
aggttaagct gtatttgatt gccagtttta ctgaaaatgc ttagtatttt acagtatcac 60
caaatata 68

<210> 78
<211> 140
<212> DNA
<213> Homo Sapiens
<400> 78
aaatttcctg tgatgtgttt caatctagat gcaaagaaca tggaaaaatc aaagtgctcg 60
agtggtttaa atatgttttg ggtattcctg tttatagact ataatacttt tccaattaaa 120
atcctcagtt gtcacgcaga 140

<210> 79
<211> 816
<212> DNA
<213> Homo Sapiens
<400> 79
gcctaagggt gtgacagatg atctctgtca cttgtcttag ttgtgtcctg caataaactg 60
gatgctttat aaaatactag acctgtgatt tcgtatgctg taatatttca tttctccatc 120
acccctccaa attatttctt agtttggagt aaaataataa atgtattata gtcaacatct 180
cttgacccct ctttagtttc agctaaacta agcatgtgtg tttgtgtgtt cattttatag 240
ttcatgtgta gaactatgtg aattaaattt aagaaacatg taaagtagag gaaatagttt 300
tctggagaaa tttttccttt ttggatatta tgcccttttc cattgctttt ctctgcttga 360
aagcaaaaaa aagtacccta cccctgttct cctttaggga aaaactattc ctataaagta 420
tttttaaatc gtgcaagtca ttgcctaggg ttagctaaaa catttctttt taaaaaggag 480
aaaatgccct ggctttaaca ttttcttgta tttgtatcta ttaagataaa cagtttactt 540
tgatacagta cataccaatc tacttaattt tttttccagg attcctttta ctatgtttgg 600
tctgaccttt tatgataact taatatggga acaaattagc atataattct attttccatg 660
tgacctcaac cagttgcaga attgtaccac tactttaggg ggggcaattt gacagtttat 720
gtagactata gcattaattg ttcccaaatg ttcagtgcat cctggctaat gtgttattga 780
aggtgttttc acgtaagcag ttagaggaag cacttc 816

<210> 80
<211> 30
<212> DNA
<213> Homo Sapiens
<400> 80
gatgccactg gaaatgttga aatgaaaaat 30

<210> 81
<211> 34
<212> DNA
<213> Homo Sapiens
<400> 81
gaaaattcag atgactcatt gtgagtgcta gttc 34

<210> 82
<211> 635
<212> DNA
<213> Homo Sapiens
<400> 82
ttcaaggggc atggtaaagt ctttcttttt agctggttgt aagaagcttt tgattctttt 60
cagccagctg gtaggaatat agaattttat aagcaaacca tcaggaatga tagtgttgtt 120
tctgataagc aacatccaaa tattttgacc ctgcttttag tggttttttt caaatcttat 180
tttgagtctt acttttagtc atagaatagc tactgatttg atgcggtctt taactgactt 240
aatattttta caatttcaat atattttgca ttggaatctc cagtaatgaa tattaaaata 300
tatgtacaat catttgtaga tgatatcaat tatattaaga catttcagat gggctattgt 360
agtatttaat gtgccgtatt ttatggtaga ataattctca gtctctggac atcaagattg 420
ctttcagtgg gaatgaagat taatttactt cagtcctgat tttttaggca tcaatgcatg 480
ttttcatttt tgtcagactt ttaccctctt ttaatgtaat tctcaacttc ttatggattt 540
acttcccaat acataaaatc cttcaaaaca agaatgataa taatttttat actttttata 600
aaaataaatt tatttttagt ccatcaaggt gtctg 635

<210> 83
<211> 36
<212> DNA
<213> Homo Sapiens
<400> 83
gcaattcata atggagctac tgtactggct attgga 36

<210> 84
<211> 104
<212> DNA
<213> Homo Sapiens
<400> 84
gtgtgatgct tccataatac atttggatgc tgtcagctaa gttcacttct gaactaaggg 60
gttcctccaa atgttggctg aaattcatcc caaggctggt ctgc 104

<210> 85
<211> 177
<212> DNA
<213> Homo Sapiens
<400> 85
ccgaccagac gtcccgtggg ctagggcctg ggcctcgggc cgcgtcggcg ccggtcgagc 60
ctctccgggt gtcggggttc ggggcgggcg cgcgtgggcg tggctcctct gtccacgcct 120
gttcccttcg tcgccgcggc tctcgtccgg gacacggctt tccggagtag agccctt 177

<210> 86
<211> 318
<212> DNA
<213> Homo Sapiens
<400> 86
aggtggctgc cagccggcca atgtctaagc gaggcggagc ggcccaggcg gcccgagcct 60
gggggagcgc gcagccggcc agtggcggcc tcgccggcgg cctcttcccg ggctcgcagt 120
aggcccgagt cgtcgccggg agctcctggg agcagcgtcc ccgccctgct cccctcgctc 180
ccgcctcttg cggccccacg gcccctcagc gcccgccccc ggctccgccc gccgcagccg 240
cagcccctgg cgctaacggt cggtaacggc ccgcgcgcgc cgcccgccgg gggctcgcgc 300
cagccacgag ggagcgtc 318

<210> 87
<211> 94
<212> DNA
<213> Homo Sapiens
<400> 87
ggcctgagcg gttcagacta cattctccga gagcccctgg gtccgcccag cccagtgcct 60
gacacctcct tcacctatga ttgggcgctg gcct 94

<210> 88
<211> 276
<212> DNA
<213> Homo Sapiens
<400> 88
gtatagcaca gcatcacaac ctggatactg acattgatgc agtcaagaca gagaacattt 60
atatcatgag gaggatccct cattaccgcc ctttgatatc cacccctact tccagaccat 120
ctcactcctc ccttaaccct ggcaaccact agcatgttct ccatttctat aaatttgcct 180
ttataggaat gttatataat tgcaattaaa gtgtgtaacc ttttggggtt tgactcaccc 240
ggcatcattt tctggagatt cagcttatat gtgtca 276

<210> 89
<211> 34
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic
<400> 89
taactggaat tcatgttggc tgaaattcat ccca 34

<210> 90
<211> 56
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic
<400> 90
cacgataagc ttttattata gtctataaac aggaataccc aaaacatatt taaacc 56

<210> 91
<211> 34
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic
<400> 91
taactggaat tcatgttggc tgaaattcat ccca 34

<210> 92
<211> 56
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic
<400> 92
cacgataagc ttttattata gtctataaac aggaataccc aaaacatatt taaacc 56

<210> 93
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic
<400> 93
acacggcttt ccggagtaga 20

<210> 94
<211> 28
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic
<400> 94
aacaggtttt acctccttat cttcagaa 28

<210> 95
<211> 18
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic
<400> 95
ggcggaggag aggtgagc 18

<210> 96
<211> 25
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic
<400> 96
gctgacagca tccaaatgta ttatg 25

<210> 97
<211> 18
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic
<400> 97
ggcggaggag aggtgagc 18

<210> 98
<211> 17
<212> DNA
<213> Artificial sequence
<220>
<223> Synthetic
<400> 98
cagccacgag ggagcgt 17
* * * * *