U.S. patent application number 17/014939 was published on 2022-06-30 (filed September 8, 2020) for array-based translocation and rearrangement assays.
The applicant listed for this patent is AFFYMETRIX, INC.. Invention is credited to Glenn K. Fu, Keith W. Jones, Michael H. Shapero, Andrew Sparks.
Application Number | 20220205041 17/014939 |
Document ID | / |
Family ID | |
Filed Date | 2022-06-30 |
United States Patent Application | 20220205041 |
Kind Code | A1 |
Sparks; Andrew; et al. | June 30, 2022 |
ARRAY-BASED TRANSLOCATION AND REARRANGEMENT ASSAYS
Abstract
Methods for detecting genomic rearrangements are provided. In
one embodiment, methods are provided for the use of paired end tags
from restriction fragments to detect genomic rearrangements.
Sequences from the ends of the fragments are brought together to
form ditags and the ditags are detected. Combinations of ditags are
detected by an on-chip sequencing strategy that is described
herein, using inosine for de novo sequencing of short segments of
DNA. In another aspect, translocations are identified by using
target specific capture and analysis of the captured products on a
tiling array.
Inventors: |
Sparks; Andrew; (Los Gatos,
CA) ; Shapero; Michael H.; (Campbell, CA) ;
Fu; Glenn K.; (Dublin, CA) ; Jones; Keith W.;
(Sunnyvale, CA) |
|
Applicant: |
Name | City | State | Country | Type |
AFFYMETRIX, INC. | Carlsbad | CA | US | |
Appl. No.: |
17/014939 |
Filed: |
September 8, 2020 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
15901104 | Feb 21, 2018 | |
17014939 | | |
14751884 | Jun 26, 2015 | 9932636 |
15901104 | | |
12402486 | Mar 11, 2009 | 9074244 |
14751884 | | |
61035697 | Mar 11, 2008 | |
International
Class: |
C12Q 1/6883 20060101
C12Q001/6883; G16B 25/00 20060101 G16B025/00; G16B 25/20 20060101
G16B025/20; C12Q 1/6827 20060101 C12Q001/6827 |
Claims
1. A method for detecting a chromosomal translocation in a sample
from an individual comprising: (a) obtaining a pool of capture
probes consisting of a plurality of DNA fragments complementary to
a first chromosome of interest; (b) obtaining a tester sample
preparation comprising DNA fragments from the sample flanked by
common priming sequences; (c) combining the tester sample
preparation with the pool of capture probes to allow specific
hybridization of the capture probes to complementary fragments in
the tester sample preparation and thereby forming complexes; (d)
capturing complexes formed between capture probes and tester
fragments in (c); (e) recovering the tester fragments captured in
(d) and amplifying the recovered tester fragments; (f) amplifying
the recovered tester fragments using primers to the priming
sequences; (g) hybridizing the amplified fragments to a tiling
array comprising a plurality of probes to said first chromosome of
interest and a plurality of probes to one or more second
chromosomes to obtain a hybridization pattern; and (h) analyzing
the hybridization pattern wherein the presence of hybridization to
probes to one of said second chromosomes is indicative of a
chromosomal translocation between the first chromosome and one of
said second chromosomes.
2. The method of claim 1 wherein the individual is a human.
3. The method of claim 2 wherein the array comprises a plurality of
probes to each human chromosome.
4. The method of claim 3 where said probes to each human chromosome
are spaced over the entirety of each human chromosome, excluding
the centromere region, at an average spacing of about 5,000
basepairs.
5. The method of claim 1 wherein the pool of capture probes further
consists of a plurality of probes from at least one additional
chromosome.
6. The method of claim 1 wherein the tester sample preparation is
obtained by fragmenting the sample with one or more restriction
enzymes, ligating an adapter sequence to the ends generated by
fragmentation and amplification of the adapter-ligated fragments by
PCR using a primer complementary to the adapter sequence.
7. The method of claim 1 wherein the tester sample preparation is
obtained by fragmenting the sample by a method selected from
shearing, chemical fragmentation or sonication.
8. A method for analysis of genomic rearrangements in a sample
comprising: (a) digesting the sample with a selected restriction
enzyme to obtain restriction fragments that have a first end tag
and a second end tag flanking a central portion, wherein the
sequences of the end tags in the genome can be determined by using
a computer to identify the sequence surrounding each restriction
site for said selected restriction enzyme; (b) ligating the
fragments to a common backbone fragment to form first circular
molecules comprising restriction fragments and the backbone,
wherein the common backbone fragment has a first type IIS
restriction enzyme recognition site at its first end and a second
type IIS restriction enzyme recognition site at its second end; (c)
cleaving the first circular molecule using a type IIS restriction
enzyme to form a first fragment comprising the backbone fragment
flanked by the first end tag and the second end tag and a second
fragment containing the central portion of the restriction
fragment; (d) ligating the ends of the first fragment to form
second circular molecules, wherein the ends of said first end tag
and said second end tag are ligated together to form a ditag; (e)
amplifying at least a portion of the second circular molecule using
a pair of primers complementary to said backbone to obtain
amplification target comprising said ditag flanked by priming
sites; (f) hybridizing the amplification target to an array
comprising a plurality of ditag sequencing probes, wherein said
plurality comprises probes for each end tag sequence, wherein
ditags hybridize to probes that are complementary to a first end
tag in the ditag; and, (g) determining at least a partial sequence
of the second end tag in a plurality of the ditags, wherein the
presence in a ditag of two non-neighboring end tags indicates a
genomic rearrangement.
9. The method of claim 8 wherein the partial sequence is at least 5
bases.
10. The method of claim 8 wherein the partial sequence is 6
bases.
11. The method of claim 8 wherein the partial sequence is at least
6 bases.
12. The method of claim 8 wherein the first type IIS restriction
enzyme and the second type IIS restriction enzyme are the same type
IIS restriction enzyme.
13. The method of claim 12 wherein the type IIS restriction enzyme
is MmeI.
14. The method of claim 8 wherein the first end tag and the second
end tag are each between 10 and 20 bases in length.
15. The method of claim 8 wherein the first end tag and the second
end tag are each between 18 and 27 bases in length.
16. The method of claim 8 wherein for each base to be sequenced in
the second end tag the array comprises a probe comprising the
complement of the first end tag and between 0 and 5
inosines.
17. A ditag sequencing array, said array comprising: (a) a
plurality of ditag sequencing probe sets, wherein each probe in a
ditag sequencing probe set is complementary to the same end tag in
a collection of end tags, wherein said collection of end tags is
determined by selecting a restriction enzyme, using a computer to
identify the 15 to 30 bases immediately adjacent to each
recognition site for said restriction enzyme in a selected sequence
or collection of sequences, wherein the probe set comprises at
least 4 different probes, present at different features of the
array, wherein the probes have different numbers of inosines at the
ends of the probes.
18. The array of claim 17 wherein the array comprises at least
100,000 different ditag sequencing probe sets.
19. The array of claim 18 wherein each probe set comprises a first
probe with 0 inosines, a second probe with 1 inosine, a third probe
with 2 inosines, a fourth probe with 3 inosines, a fifth probe with
4 inosines and a sixth probe with 5 inosines.
Description
RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional
Application No. 61/035,697, filed Mar. 11, 2008, the entire
contents of which are hereby incorporated by reference.
FIELD OF THE INVENTION
[0002] The methods of the invention relate generally to detection
of chromosomal rearrangements and translocations using hybrid
selection and tiling arrays.
BACKGROUND OF THE INVENTION
[0003] A chromosome translocation is a chromosome abnormality
caused by rearrangement of parts between nonhomologous chromosomes.
A fusion gene may be created when the translocation joins two
otherwise separated genes, an event which is common in cancer.
Cytogenetics and karyotyping of affected cells may be used to
detect translocations. There are two main types, reciprocal (also
known as non-Robertsonian) and Robertsonian. Also, translocations
can be balanced and result in an even exchange of material with no
genetic information extra or missing, or unbalanced, having an
unequal exchange of chromosome material and sometimes resulting in
extra or missing genes or portions thereof. Chromosomal
rearrangements are known to contribute to a variety of diseases in
humans.
[0004] Translocations and inversions are structural abnormalities;
other types of chromosomal abnormalities include numerical or copy
number changes, for example, extra or missing chromosomes or
chromosomal regions and large-scale deletions or duplications.
Structural abnormalities can arise from errors during homologous
recombination. Both structural and numerical abnormalities can
occur in gametes and therefore will be present in all cells of an
affected person's body, or they can occur during mitosis and give
rise to a genetic mosaic individual who has some normal and some
abnormal cells.
SUMMARY OF THE INVENTION
[0005] In a first aspect, methods are provided for assaying a
diploid sample for the presence of a translocation, by assessing
whether the sample contains at least one DNA molecule consisting of
sequences normally affiliated with two different chromosomes. This
method entails specifically capturing and amplifying one chromosome
from a sample by hybrid selection, and assaying the captured
material for the presence of other chromosomes by hybridizing the
captured material to a whole-genome tiling array.
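As a rough illustration of the analysis step in this first aspect, the following Python sketch flags chromosomes other than the captured one whose tiling-array probes show signal above background. The data layout (a list of (chromosome, position, intensity) tuples), the background threshold, the minimum run of consecutive positive probes, and the function name flag_partner_chromosomes are all assumptions made for this illustration and are not taken from the disclosure.

```python
from collections import defaultdict


def flag_partner_chromosomes(probes, captured_chrom, threshold=3.0, min_run=3):
    """Return {chromosome: [(start, end), ...]} for runs of at least
    `min_run` consecutive probes above `threshold` on chromosomes other
    than the captured one. Threshold and run length are illustrative
    assumptions, not values from the disclosure."""
    by_chrom = defaultdict(list)
    for chrom, pos, intensity in probes:
        if chrom != captured_chrom:
            by_chrom[chrom].append((pos, intensity))

    hits = {}
    for chrom, values in by_chrom.items():
        values.sort()  # order probes by genomic position
        runs, current = [], []
        for pos, intensity in values:
            if intensity > threshold:
                current.append(pos)
            else:
                if len(current) >= min_run:
                    runs.append((current[0], current[-1]))
                current = []
        if len(current) >= min_run:
            runs.append((current[0], current[-1]))
        if runs:
            hits[chrom] = runs
    return hits


# Toy data: chromosome 9 was captured; a block of chromosome 22 probes lights up.
probes = [("chr9", p, 10.0) for p in range(0, 50000, 5000)]
probes += [("chr22", p, 0.5) for p in range(0, 20000, 5000)]
probes += [("chr22", p, 8.0) for p in range(20000, 40000, 5000)]
probes += [("chr1", p, 0.4) for p in range(0, 30000, 5000)]
print(flag_partner_chromosomes(probes, "chr9"))
```

Requiring several adjacent positive probes, rather than a single one, is one simple way to reduce the effect of isolated cross-hybridizing probes; a real analysis would likely normalize intensities against a reference sample first.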
[0006] In another aspect methods are provided for assaying a
diploid sample for the presence of large-scale rearrangements,
including insertions, deletions, translocations, and inversions, by
globally assessing whether the ends of restriction fragments from
a sample have been rearranged with respect to each other and their
position in the reference sequence of the human genome.
[0007] The methods accomplish this via the following steps, which
will be described in more detail below.
[0008] First, digest a genomic DNA of interest with a restriction
enzyme, and then generate a population of "paired-end ditags", each
of which is derived from a different restriction fragment, and each
of which contains an approximately 18 bp tag from the left terminus
of the restriction fragment coupled directly to an approximately 18
bp tag from the right terminus of the restriction fragment.
[0009] Second, hybridize the population of ditags to a "ditag
sequencing array", designed to capture every tag on the array, and
to generate enough sequence information regarding each of the
captured tags' ditag mates to determine the identity of all
approximately 500 K ditags in the sample. Perform the on-chip chemistry
necessary to determine the identities of all ditags in the sample.
Compare the sample's ditags with those predicted from the human
genome. Variant ditags indicate restriction fragments containing
rearrangements with respect to the reference sequence of the human
genome.
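The comparison of observed ditags with those predicted from a reference sequence can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the reference is a plain string, the restriction site GAATTC is chosen only as an example, tags are taken as the 18 bases at each end of the sequence between adjacent sites, and strand orientation and the exact cut position within the site are ignored. The function names predicted_ditags and variant_ditags are hypothetical.

```python
import re

TAG_LEN = 18  # approximate tag length given in the text


def predicted_ditags(sequence, site="GAATTC", tag_len=TAG_LEN):
    """Simplified in-silico digest: pair the first and last `tag_len`
    bases of each stretch of sequence lying between adjacent sites."""
    starts = [m.start() for m in re.finditer(site, sequence)]
    ditags = set()
    for left, right in zip(starts, starts[1:]):
        fragment = sequence[left + len(site):right]
        if len(fragment) >= 2 * tag_len:
            ditags.add((fragment[:tag_len], fragment[-tag_len:]))
    return ditags


def variant_ditags(observed, expected):
    """Ditags seen in the sample but not predicted from the reference."""
    return sorted(set(observed) - set(expected))


# Toy reference with three restriction fragments (hypothetical sequences).
site = "GAATTC"
frag_a = "ACGT" * 10
frag_b = "TTGCA" * 8
frag_c = "CCATG" * 8
reference = site + frag_a + site + frag_b + site + frag_c + site

# Toy sample: a deletion removes the site between fragments b and c,
# so the left tag of b becomes paired with the right tag of c.
sample = site + frag_a + site + frag_b[:20] + frag_c[20:] + site

expected = predicted_ditags(reference)
observed = predicted_ditags(sample)
for left_tag, right_tag in variant_ditags(observed, expected):
    print(left_tag, right_tag)
```

In this toy case the only variant ditag joins two end tags that are not neighbors in the reference, which is the signature of a rearrangement described above.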
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] The accompanying drawings, which are incorporated in and
form a part of this specification, illustrate embodiments of the
invention and, together with the description, serve to explain the
principles of the invention:
[0011] FIG. 1 shows an exemplary sample prep process.
[0012] FIG. 2 shows ditag sequencing array features to query tags
from one ditag.
[0013] FIG. 3 shows examples of a ditag hybridized to three query
tags with different numbers of inosines.
[0014] FIG. 4 provides an example of the resulting orientation of
the tag sequences resulting from the method of FIG. 1.
[0015] FIG. 5 illustrates the expected combinations of ditags from a
genomic region in the upper panel and the expected ditags from the
same region following a deletion of a region.
DETAILED DESCRIPTION OF THE INVENTION
a) General
[0016] The present invention has many preferred embodiments and
relies on many patents, applications and other references for
details known to those of skill in the art. Therefore, when a patent,
application, or other reference is cited or repeated below, it
should be understood that it is incorporated by reference in its
entirety for all purposes as well as for the proposition that is
recited.
[0017] As used in this application, the singular forms "a," "an,"
and "the" include plural references unless the context clearly
dictates otherwise. For example, the term "an agent" includes a
plurality of agents, including mixtures thereof.
[0018] An individual is not limited to a human being but may also
be other organisms including but not limited to mammals, plants,
bacteria, or cells derived from any of the above.
[0019] Throughout this disclosure, various aspects of this
invention can be presented in a range format. It should be
understood that the description in range format is merely for
convenience and brevity and should not be construed as an
inflexible limitation on the scope of the invention. Accordingly,
the description of a range should be considered to have
specifically disclosed all the possible subranges as well as
individual numerical values within that range. For example,
description of a range such as from 1 to 6 should be considered to
have specifically disclosed subranges such as from 1 to 3, from 1
to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as
well as individual numbers within that range, for example, 1, 2, 3,
4, 5, and 6. This applies regardless of the breadth of the
range.
[0020] The practice of the present invention may employ, unless
otherwise indicated, conventional techniques and descriptions of
organic chemistry, polymer technology, molecular biology (including
recombinant techniques), cell biology, biochemistry, and
immunology, which are within the skill of the art. Such
conventional techniques include polymer array synthesis,
hybridization, ligation, and detection of hybridization using a
label. Specific illustrations of suitable techniques can be had by
reference to the example herein below. However, other equivalent
conventional procedures can, of course, also be used. Such
conventional techniques and descriptions can be found in standard
laboratory manuals such as Genome Analysis: A Laboratory Manual
Series (Vols. I-IV), Using Antibodies: A Laboratory Manual, Cells:
A Laboratory Manual, PCR Primer: A Laboratory Manual, and Molecular
Cloning: A Laboratory Manual (all from Cold Spring Harbor
Laboratory Press), Stryer, L. (1995) Biochemistry (4th Ed.)
Freeman, New York, Gait, "Oligonucleotide Synthesis: A Practical
Approach" 1984, IRL Press, London, Nelson and Cox (2000),
Lehninger, Principles of Biochemistry 3rd Ed., W. H. Freeman
Pub., New York, N.Y. and Berg et al. (2002) Biochemistry, 5th
Ed., W. H. Freeman Pub., New York, N.Y., all of which are herein
incorporated in their entirety by reference for all purposes.
[0021] The present invention can employ solid substrates, including
arrays in some preferred embodiments. Methods and techniques
applicable to polymer (including protein) array synthesis have been
described in U.S. Ser. No. 09/536,841, now abandoned, WO 00/58516,
U.S. Pat. Nos. 5,143,854, 5,242,974, 5,252,743, 5,324,633,
5,384,261, 5,405,783, 5,424,186, 5,451,683, 5,482,867, 5,491,074,
5,527,681, 5,550,215, 5,571,639, 5,578,832, 5,593,839, 5,599,695,
5,624,711, 5,631,734, 5,795,716, 5,831,070, 5,837,832, 5,856,101,
5,858,659, 5,936,324, 5,968,740, 5,974,164, 5,981,185, 5,981,956,
6,025,601, 6,033,860, 6,040,193, 6,090,555, 6,136,269, 6,269,846
and 6,428,752, in PCT Applications Nos. PCT/US99/00730
(International Publication Number WO 99/36760) and PCT/US01/04285,
which are all incorporated herein by reference in their entirety
for all purposes. Additional methods for nucleic acid array
synthesis are disclosed in US 20070161778, Kuimelis et al, which
describes the use of acid scavengers in array synthesis and U.S.
Pat. No. 6,271,957 which describes methods for array synthesis
where areas are activated by spatial light modulation and without
the use of a photomask.
[0022] Patents that describe synthesis techniques in specific
embodiments include U.S. Pat. Nos. 5,412,087, 6,147,205, 6,262,216,
6,310,189, 5,889,165, and 5,959,098. Nucleic acid arrays are
described in many of the above patents, but the same techniques are
applied to polypeptide arrays.
[0023] Nucleic acid arrays that are useful in the present invention
include those that are commercially available from Affymetrix
(Santa Clara, Calif.) under the brand name GENECHIP®. Example
arrays are shown on the website at affymetrix.com. In preferred
aspects the arrays are arrays of oligonucleotide probes of from
15 to 100 bases in length, more preferably from 20 to 50 and often
from 20 to 30 bases in length. In preferred aspects the probes are
arranged in features so that probes of the same sequence are present
in the same feature. Many thousands, tens of thousands, hundreds of
thousands or millions of different copies of a given probe sequence
may be present in a feature. Depending on the method of synthesis
of the probes on the array, features will often contain non-full
length probes that may be a portion of the desired sequence.
[0024] The present invention also contemplates many uses for
polymers attached to solid substrates. These uses include gene
expression monitoring, profiling, library screening, genotyping and
diagnostics. Gene expression monitoring and profiling methods are
shown in U.S. Pat. Nos. 5,800,992, 6,013,449, 6,020,135,
6,033,860, 6,040,138, 6,177,248 and 6,309,822. Genotyping and uses
therefor are shown in U.S. Pub. No. 20070065816, now abandoned,
and U.S. Pat. Nos. 5,856,092, 6,300,063, 5,858,659, 6,284,460,
6,361,947, 6,368,799, 6,872,529 and 6,333,179. Other uses are
embodied in U.S. Pat. Nos. 5,871,928, 5,902,723, 6,045,996,
5,541,061, and 6,197,506.
[0025] The present invention also contemplates sample preparation
methods in certain preferred embodiments. Prior to or concurrent
with genotyping, the genomic sample may be amplified by a variety
of mechanisms, some of which may employ PCR. See, PCR Technology:
Principles and Applications for DNA Amplification (Ed. H. A.
Erlich, Freeman Press, NY, N.Y., 1992); PCR Protocols: A Guide to
Methods and Applications (Eds. Innis, et al., Academic Press, San
Diego, Calif., 1990); Mattila et al., Nucleic Acids Res. 19, 4967
(1991); Eckert et al., PCR Methods and Applications 1, 17 (1991);
PCR (Eds. McPherson et al., IRL Press, Oxford); and U.S. Pat. Nos.
4,683,202, 4,683,195, 4,800,159, 4,965,188, and 5,333,675, each
of which is incorporated herein by reference in its entirety
for all purposes. The sample may be amplified on the array. See,
for example, U.S. Pat. No. 6,300,070, which is incorporated herein
by reference.
[0026] Other suitable amplification methods include the ligase
chain reaction (LCR) (e.g., Wu and Wallace, Genomics 4, 560 (1989),
Landegren et al., Science 241, 1077 (1988) and Barringer et al.
Gene 89:117 (1990)), transcription amplification (Kwoh et al.,
Proc. Natl. Acad. Sci. USA 86, 1173 (1989) and WO88/10315),
self-sustained sequence replication (Guatelli et al., Proc. Nat.
Acad. Sci. USA, 87, 1874 (1990) and WO90/06995), selective
amplification of target polynucleotide sequences (U.S. Pat. No.
6,410,276), consensus sequence primed polymerase chain reaction
(CP-PCR) (U.S. Pat. No. 4,437,975), arbitrarily primed polymerase
chain reaction (AP-PCR) (U.S. Pat. Nos. 5,413,909, 5,861,245) and
nucleic acid based sequence amplification (NABSA). (See, U.S. Pat.
Nos. 5,409,818, 5,554,517, and 6,063,603 each of which is
incorporated herein by reference). Other amplification methods that
may be used are described in U.S. Pat. Nos. 5,242,794, 5,494,810,
4,988,617 and in U.S. Ser. No. 09/854,317, each of which is
incorporated herein by reference.
[0027] Methods related to the paired-end tag strategy disclosed
herein have been used to characterize fragments generated in
chromosomal immunoprecipitation (ChIP) experiments using
conventional sequencing (Wei et al., Cell. 2006 January
13;124(1):207-19), and to identify 5' and 3' termini of mRNA
molecules using conventional sequencing (Ng et al., Nucleic Acids
Res. 2006 Jul. 13;34(12):e84).
[0028] Paired-end diTagging for transcriptome and genome analysis
are disclosed in Ng et al Curr Protoc Mol Biol., Chapter 21:Unit
21.12 (2007). Software tools for managing paired-end diTag (PET)
sequence data are disclosed, for example, in Chiu et al. BMC
Bioinformatics, 2006, 25;7:390.
[0029] US Patent publication Nos. 20060063158, 20050100911 and
20060183132 describe methods related to the hybrid selection
methods disclosed herein and are incorporated herein by reference
in their entireties.
[0030] Additional methods of sample preparation and techniques for
reducing the complexity of a nucleic acid sample are described in Dong
et al., Genome Research 11, 1418 (2001), in U.S. Pat. Nos.
6,361,947, 6,391,592, 6,632,611, 6,872,529, 6,958,225, 7,202,039
and U.S. Ser. No. 09/916,135, now abandoned.
[0031] Methods for conducting polynucleotide hybridization assays
have been well developed in the art. Hybridization assay procedures
and conditions will vary depending on the application and are
selected in accordance with the general binding methods known
including those referred to in: Maniatis et al. Molecular Cloning:
A Laboratory Manual (2nd Ed. Cold Spring Harbor, N.Y., 1989);
Berger and Kimmel Methods in Enzymology, Vol. 152, Guide to
Molecular Cloning Techniques (Academic Press, Inc., San Diego,
Calif., 1987); Young and Davis, P.N.A.S., 80: 1194 (1983). Methods
and apparatus for carrying out repeated and controlled
hybridization reactions have been described in U.S. Pat. Nos.
5,871,928, 5,874,219, 6,045,996, 6,386,749 and 6,391,623, each of
which is incorporated herein by reference.
[0032] The present invention also contemplates signal detection of
hybridization between ligands in certain preferred embodiments. See
U.S. Pat. Nos. 5,143,854, 5,578,832, 5,631,734, 5,834,758,
5,936,324, 5,981,956, 6,025,601, 6,141,096, 6,185,030, 6,201,639,
6,218,803, and 6,225,625, in U.S. Ser. No. 60/364,731 and in PCT
Application PCT/US99/06097 (published as WO99/47964), each of which
also is hereby incorporated by reference in its entirety for all
purposes.
[0033] Methods and apparatus for signal detection and processing of
intensity data are disclosed in, for example, U.S. Pat. Nos.
5,143,854, 5,547,839, 5,578,832, 5,631,734, 5,800,992, 5,834,758;
5,856,092, 5,902,723, 5,936,324, 5,981,956, 6,025,601, 6,090,555,
6,141,096, 6,185,030, 6,201,639; 6,218,803; and 6,225,625, in U.S.
Ser. No. 60/364,731 and in PCT Application PCT/US99/06097
(published as WO99/47964), each of which also is hereby
incorporated by reference in its entirety for all purposes.
[0034] The practice of the present invention may also employ
conventional biology methods, software and systems. Computer
software products of the invention typically include a computer
readable medium having computer-executable instructions for
performing the logic steps of the method of the invention. Suitable
computer readable media include floppy disk, CD-ROM/DVD/DVD-ROM,
hard-disk drive, flash memory, ROM/RAM, magnetic tapes, etc. The
computer executable instructions may be written in a suitable
computer language or combination of several languages. Basic
computational biology methods are described in, e.g. Setubal and
Meidanis et al., Introduction to Computational Biology Methods (PWS
Publishing Company, Boston, 1997); Salzberg, Searles, Kasif, (Ed.),
Computational Methods in Molecular Biology, (Elsevier, Amsterdam,
1998); Rashidi and Buehler, Bioinformatics Basics: Application in
Biological Science and Medicine (CRC Press, London, 2000) and
Ouellette and Baxevanis Bioinformatics: A Practical Guide for
Analysis of Genes and Proteins (Wiley & Sons, Inc., 2nd
ed., 2001). See U.S. Pat. No. 6,420,108.
[0035] The present invention may also make use of various computer
program products and software for a variety of purposes, such as
probe design, management of data, analysis, and instrument
operation. See, U.S. Pat. Nos. 5,593,839, 5,795,716, 5,733,729,
5,974,164, 6,066,454, 6,090,555, 6,185,561, 6,188,783, 6,223,127,
6,229,911 and 6,308,170.
[0036] Additionally, the present invention may have preferred
embodiments that include methods for providing genetic information
over networks such as the Internet as shown in U.S. Pub. Nos.
US20020183936, 20070087368, 20040002818, 20030120432, 20040049354
and 20030100995.
b) Definitions
[0037] A "translocation" or "chromosomal translocation" is a
chromosome abnormality caused by rearrangement of parts between
nonhomologous chromosomes. It is detected by cytogenetics or by
karyotyping affected cells. There are two main types, reciprocal
(also known as non-Robertsonian) and Robertsonian. Also,
translocations can be balanced (an even exchange of material
with no genetic information extra or missing, and ideally full
functionality) or unbalanced (where the exchange of chromosome
material is unequal, resulting in extra or missing genes).
[0038] Reciprocal translocations are usually an exchange of
material between nonhomologous chromosomes. They are found in about
1 in 600 human newborns. Such translocations are usually harmless
and may be found through prenatal diagnosis. However, carriers of
balanced reciprocal translocations have increased risks of creating
gametes with unbalanced chromosome translocations leading to
miscarriages or children with abnormalities.
[0039] A Robertsonian translocation is a type of rearrangement that
involves two acrocentric chromosomes (chromosomes with very short p
arms, which in humans include chromosomes 13, 14, 15, 21 and 22) that
fuse near the centromere region with loss of the short arms. The
resulting karyotype in humans leaves only 45 chromosomes since two
chromosomes have fused together. A Robertsonian translocation
involving chromosomes 13 and 14 is the most common translocation in
humans and is seen in about 1 in 1300 persons. Carriers of
Robertsonian translocations are phenotypically normal, but there is
a risk of unbalanced gametes which lead to miscarriages or abnormal
offspring. For example, carriers of Robertsonian translocations
involving chromosome 21 have a higher chance to have a child with
Down syndrome.
[0040] There are a number of well characterized chromosomal
abnormalities that lead to disease in humans. For example, Turner
syndrome results from a single X chromosome (45, X or 45, X0).
Klinefelter syndrome, the most common male chromosomal disease,
otherwise known as 47,XXY, is caused by an extra X chromosome.
Edwards syndrome is caused by trisomy (three copies) of chromosome
18. Down syndrome, a common chromosomal disease, is caused by
trisomy of chromosome 21. Patau syndrome is caused by trisomy of
chromosome 13. Also documented are trisomy 8, trisomy 9 and trisomy
16, although affected fetuses generally do not survive to birth.
[0041] There are a number of disorders that are known to arise from
loss of just a piece of one chromosome. For example, Cri du chat
(cry of the cat) syndrome results from a truncated short arm of
chromosome 5, and 1p36 deletion syndrome results from the loss of
part of the short arm of chromosome 1. In about 50% of Angelman
syndrome cases, a segment of the long arm of chromosome 15 is missing.
Chromosomal abnormalities can also occur in cancerous cells of an
otherwise genetically normal individual. A well-documented example
is the Philadelphia chromosome, a translocation mutation commonly
associated with chronic myelogenous leukemia and less often with
acute lymphoblastic leukemia.
[0042] Translocations are typically named according to the
following convention: t(A;B)(p1;q2) denotes a translocation
between chromosome A and chromosome B. The information in the
second set of parentheses gives the precise location within the
chromosome for chromosomes A and B respectively, with p indicating
the short arm of the chromosome, q indicating the long arm, and the
numbers after p or q referring to the regions, bands and subbands seen
when staining the chromosome.
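The naming convention can be made concrete with a short Python sketch that splits a name of the form t(A;B)(p1;q2) into its parts. The function name parse_translocation and the returned dictionary layout are hypothetical, and the pattern handles only the simple two-chromosome form described above, not the full cytogenetic nomenclature. The example name t(9;22)(q34;q11), the designation commonly given for the Philadelphia chromosome translocation, is used here only as a familiar illustration.

```python
import re

# Matches only the simple form t(A;B)(arm+band;arm+band).
_PATTERN = re.compile(
    r"^t\((?P<chr_a>[0-9]{1,2}|X|Y);(?P<chr_b>[0-9]{1,2}|X|Y)\)"
    r"\((?P<arm_a>[pq])(?P<band_a>[0-9.]+);(?P<arm_b>[pq])(?P<band_b>[0-9.]+)\)$"
)

_ARMS = {"p": "short", "q": "long"}


def parse_translocation(name):
    """Split a t(A;B)(p1;q2)-style name into one record per chromosome."""
    match = _PATTERN.match(name.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a t(A;B)(p1;q2)-style name: {name!r}")
    return [
        {
            "chromosome": match["chr_a"],
            "arm": _ARMS[match["arm_a"]],
            "band": match["band_a"],
        },
        {
            "chromosome": match["chr_b"],
            "arm": _ARMS[match["arm_b"]],
            "band": match["band_b"],
        },
    ]


for breakpoint_info in parse_translocation("t(9;22)(q34;q11)"):
    print(breakpoint_info)
```

The first record describes the breakpoint on chromosome A and the second the breakpoint on chromosome B, matching the order of the two sets of parentheses.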
[0043] A karyotype is the observed characteristics (number, type,
shape etc) of the chromosomes of an individual or species.
[0044] In normal diploid organisms, autosomal chromosomes are
present in two identical copies, although polyploid cells have
multiple copies of chromosomes and haploid cells have single
copies. The chromosomes are arranged and displayed (often on a
photo) in a standard format known as an idiogram: in pairs, ordered
by size and position of centromere for chromosomes of the same
size. Karyotypes are used to study chromosomal aberrations, and may
be used to determine other macroscopically visible aspects of an
individual's genotype, such as sex. In order to be able to see the
chromosomes and determine their size and internal pattern, they are
chemically labeled with a dye ("stained"). The pattern of
individual chromosomes is called chromosome banding.
[0045] Normal human karyotypes contain 22 pairs of autosomal
chromosomes and one pair of sex chromosomes. Normal karyotypes for
women contain two X chromosomes and are typically denoted 46,XX;
men have both an X and a Y chromosome denoted 46,XY.
[0046] In some embodiments of the presently disclosed methods one
or more Type IIs restriction enzymes are used. Type IIs enzymes are
a class of enzymes that cleave outside of their recognition
sequence to one side. The specificity of cleavage is determined by
the presence of the recognition site, but the site of actual
cleavage can be variable. This provides an opportunity to "capture"
unknown sequence. For example, the recognition site for MmeI (see
U.S. Pat. No. 7,115,407) is:
TABLE-US-00001
SEQ ID NO: 10   5' . . . TCCRAC(N)₂₀∇ . . . 3'
SEQ ID NO: 11   3' . . . AGGYTG(N)₁₈Δ . . . 5'
Another restriction enzyme that may be used is EcoP15I, which has
the following recognition site:
TABLE-US-00002
SEQ ID NO: 12   5' . . . CAGCAG(N)₂₅∇ . . . 3'
SEQ ID NO: 13   3' . . . GTCGTC(N)₂₇Δ . . . 5'
Enzymes with relatively long N regions are preferable as the length
of the "tag" is determined by the length of the N region and longer
tags provide more information. Other enzymes that may be used
include, for example, NmeAIII, BsgI, BpuEI, BpmI, AcuI, Eco57MI,
Eco57I, GsuI, and CstMI. The length of the N region is preferably
between 15 and 30 bases, for example 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29 or 30 bases. Polishing the ends of
the resulting fragments may result in filling in the overhang with
complementary bases or removing the overhang, altering the length
of the resulting tag accordingly. In some aspects two enzymes may
be used and they may each result in a different tag length.
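How a Type IIs enzyme "captures" adjacent unknown sequence can be illustrated with a small Python sketch for MmeI, using the recognition site TCCRAC and the 20-base and 18-base cut offsets shown above. This is a simplified sketch under stated assumptions: only the strand supplied is scanned, the polished tag is modeled by trimming the two-base overhang on the top strand so that the tag length equals the bottom-strand offset, and the function name mmei_tags is hypothetical.

```python
import re

# IUPAC code R means A or G. Site and cut offsets are those shown for MmeI.
MMEI_SITE = re.compile(r"TCC[AG]AC")
TOP_CUT = 20     # bases between the site and the top-strand cut
BOTTOM_CUT = 18  # bases between the site and the bottom-strand cut


def mmei_tags(seq):
    """Return (site_start, n_region, polished_tag) for each MmeI site found
    on the supplied strand. `n_region` is the sequence up to the top-strand
    cut; `polished_tag` assumes the two-base overhang has been removed."""
    seq = seq.upper()
    results = []
    for m in MMEI_SITE.finditer(seq):
        n_start = m.end()
        if n_start + TOP_CUT > len(seq):
            continue  # not enough downstream sequence for the enzyme to cut
        n_region = seq[n_start:n_start + TOP_CUT]
        results.append((m.start(), n_region, n_region[:BOTTOM_CUT]))
    return results


# Toy sequence (hypothetical): one MmeI site followed by unknown sequence.
example = "GG" + "TCCGAC" + "ACGTACGTACGTACGTACGT" + "TTTT"
for site_start, n_region, tag in mmei_tags(example):
    print(site_start, n_region, tag)
```

Under these assumptions the polished MmeI tag is 18 bases long, consistent with the approximately 18 bp tags referred to elsewhere in this description. Filling in rather than trimming an overhang, or using a different enzyme such as one with a 25/27 offset, would change the tag length accordingly.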
c) Methods for Detecting Translocations and Rearrangements
[0047] In one aspect, methods for detecting translocation events at
the genome level are disclosed. In a first step (step 1), a pool of
capture probes is created. The capture probe pool preferably
consists of DNA fragments which are complementary to the chromosome
of interest, and which are labeled in such a way (e.g.,
biotinylated) that they can be captured on a solid surface (e.g.,
streptavidin-coated paramagnetic beads). Fragment sizes in the
range of 50-250 bases are preferred, but other sizes, for example
200 to 1000, 500 to 2000 or 200 to 2000 may be used as well. The
pool of DNA fragments may include, for example, whole genome
amplified flow-sorted chromosomes, pooled .about.10 kb LR-PCR
amplicons generated using locus-specific primers, pooled PCR
products generated using dU mediated amplification or pooled
synthetic oligonucleotides corresponding to sequences within the
chromosome of interest. Depending upon the preparation method, a
single capture probe preparation preferably generates sufficient
capture probe for 10 to 10,000 hybrid selection reactions. In
another aspect the capture probe preparation may be amplified using
a common set of primers.
[0048] In a next step, a tester sample is prepared (step 2). The
tester sample consists of DNA fragments prepared from the sample to
be analyzed such that the fragments collectively represent the
entire genome of the DNA sample, and such that the fragments can
all be amplified by PCR using a single set of PCR primers. The DNA
fragments are prepared by fragmentation of genomic DNA to generate
a desired range of double-stranded fragment sizes. Fragment sizes
in the 100-1000 bp range are preferred, but other size ranges may
also be used, for example, 200 to 2000 or 500 to 2000.
[0049] Methods that generate random double-stranded DNA fragments
include hydrodynamic shearing, sonication, and DNAse I digestion in
the presence of Mn.sup.2+ or Co.sup.2+ rather than Mg.sup.2+ (all
of these methods are preferably followed by treatment with T4
polymerase to create blunt ends). Alternatively, locus-specific
fragmentation by restriction digestion can be performed. The
fragmented dsDNA can then be ligated to linkers containing universal
primer binding sites, thereby enabling amplification of tester
fragments using a single set of PCR primers. The fragments that are
amplified are a representative subset of the genome of the starting
sample.
[0050] In another aspect, Sigma's GenomePlex kit may be used to
generate tester. The kit reliably converts genomic DNA into
fragments with an average size of .about.500 bp that are decorated
with universal primer binding sites. See US Pat Pub 20030143599,
20040209299 and 20070031857.
[0051] Next, isolate and amplify tester fragments which hybridize
to capture probes (step 3). The capture probe and tester fragments
are hybridized together under conditions that result in the
specific hybridization of tester fragments that are complementary
to the sequences in the capture probe. Because the tester fragments
are derived from the entire genome, the molar concentration of the
tester fragments can be relatively low (.about.10-100 fM). To
ensure capture of cognate tester, it is preferable to include high
molar concentrations (10-100 pM) of capture probe to drive the
hybridization reaction, as well as to allow sufficient time
(.about.48 h) for capture. Also, in a preferred aspect,
non-biotinylated Cot1 DNA may be included to block by competitive
hybridization the capture of tester fragments containing repetitive
elements.
[0052] The capture probes (as well as any hybridized tester
fragments) are captured onto solid phase using e.g.,
streptavidin-coated paramagnetic beads (step 4). After washing the
beads several times to remove unhybridized tester fragments, the
hybridized tester fragments are eluted from the beads by
denaturation (step 5). The eluted tester fragments can then be PCR
amplified using the tester-specific primers discussed in the
previous section (step 6) in preparation for hybridization
analysis.
[0053] In preferred aspects the amplified tester fragments from
above are analyzed by hybridization to a tiling array to assay for
chromosomal translocation. In the absence of a chromosomal
rearrangement, the tester-specific PCR product should contain only
amplified fragments complementary to the capture probes. However,
in the event of a chromosomal translocation, the PCR product will
contain fragments derived from two different chromosomes: the
chromosome assayed by the capture probe, and some other chromosome.
To detect such events one can hybridize target prepared from the
tester-specific PCR product onto a whole genome tiling array. The
tester-specific PCR product is fragmented to 50-100 base fragments
using DNAse I, and then end-labeled with biotin using TdT,
following standard protocols. The resulting target is hybridized to
a whole genome tiling array, and the array is stained and scanned,
again following standard protocols.
[0054] The resulting hybridization pattern is then analyzed for
evidence of translocation. Hybridization signal from array features
corresponding to the chromosome targeted by the capture probe is
indicative of successful positive selection during the hybrid
selection process. By contrast, absence of hybridization signal
from array features corresponding to chromosomes not targeted by
the capture probe is indicative of successful negative selection
during the hybrid selection process. Finally, the presence of
hybridization signal from array features corresponding to
chromosomes not targeted by the capture probe would be indicative
of a translocation.
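The interpretation described in the preceding paragraph can be sketched as a simple rule. The following Python example is a minimal sketch under stated assumptions: the per-chromosome summary signal, the `threshold` value, and the function name `interpret` are all hypothetical and would in practice depend on the array platform and its normalization.

```python
def interpret(signal, target, threshold):
    """Classify a hybrid selection result.

    signal    -- dict mapping chromosome name to a summarized array signal
    target    -- chromosome targeted by the capture probe
    threshold -- hypothetical cutoff separating signal from background
    """
    # Signal on the targeted chromosome indicates successful positive selection.
    positive_ok = signal.get(target, 0.0) >= threshold
    # Signal on any other chromosome is a candidate translocation partner.
    partners = sorted(c for c, s in signal.items()
                      if c != target and s >= threshold)
    return {"positive_selection_ok": positive_ok,
            "candidate_partners": partners}

result = interpret({"9": 120.0, "22": 85.0, "1": 3.0},
                   target="9", threshold=10.0)
print(result)
```

An empty `candidate_partners` list corresponds to successful negative selection with no evidence of translocation.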
[0055] In another aspect, depending upon the size of the tester
fragments subjected to the hybrid selection process, capture probes
would not need to cover the entire chromosomal sequence, yet could
still capture tester fragments covering the entire chromosome. For
example, if the average size of tester fragments was 10 kb, then
capture probes spaced every 5 kb could capture tester fragments
covering the entire chromosome of interest. Similarly, 10 kb tester
fragments would theoretically allow tiling probe densities of one
probe pair (PM, MM) every 5 kb to detect virtually any
translocation. This density would allow one to query the human
genome with about 600,000 probe pairs (3E9/5E3=6E5). Higher density
would ensure any translocation would be detected by multiple probe
pairs, thereby enabling higher sensitivity and specificity.
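The probe density arithmetic above can be checked with a few lines of Python. This is only a worked example of the figures given in the text; the helper names are hypothetical, and the rule that probe spacing is half the tester fragment size is an assumption read from the 10 kb and 5 kb example rather than a stated requirement.

```python
def spacing_for(fragment_bp):
    """Assumed rule from the example: space probes at half the fragment size."""
    return fragment_bp // 2

def probe_pairs_needed(genome_bp, spacing_bp):
    """Number of (PM, MM) probe pairs needed to tile a genome at a spacing."""
    return genome_bp // spacing_bp

spacing = spacing_for(10_000)                        # 5 kb for 10 kb fragments
pairs = probe_pairs_needed(3_000_000_000, spacing)   # 3E9 / 5E3
print(spacing, pairs)
```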
[0056] The process described above enables one to assay for
translocations involving a single chromosome with a single hybrid
selection reaction and a single whole genome tiling array. The same
methods may be applied to perform 24 separate hybrid selection
reactions, one per chromosome, and assaying each of these on its
own genome-wide tiling array, thus providing the ability to detect
translocations between all possible pair-wise combinations of
chromosomes. This approach has the added benefit of informational
redundancy, i.e., a translocation between chromosomes 9 and 22
would be detected twice, once by the chromosome 9 hybrid selection
reaction, and once by the chromosome 22 hybrid selection reaction.
Moreover, the exact translocation breakpoint could be mapped to
within the density of probes on the tiling array.
[0057] To reduce the number of reactions required to assay all
possible chromosome combinations, multiple chromosomes may be
assayed in a single hybrid selection reaction, and each chromosome
can be assayed in multiple reactions, such that a unique assignment
could be inferred from the data (e.g., see Table 1 below).
TABLE-US-00003 TABLE 1 Hybrid Selection Reaction 1 2 3 4 5 6 7 8 9
10 Chromosome 1 X X X Assayed in 2 X X X Reaction 3 X X X 4 X X X 5
X X X 6 X X X 7 X X X 8 X X X 9 X X X 10 X X X 11 X X X 12 X X X 13
X X X 14 X X X 15 X X X 16 X X X 17 X X X 18 X X X 19 X X X 20 X X
X 21 X X X 22 X X X 23 X X X 24 X X X
[0058] For example, if there is a translocation between chromosome
9 and chromosome 22, using the reactions in Table 1 it could be
assigned from reactions 2 and 3. A translocation between
chromosomes 1 and 2 could be assigned from reactions 7 and 8.
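The pooling logic can be illustrated with a short Python sketch. The assignment of chromosomes to reactions used below is hypothetical: it gives each of 24 chromosomes a distinct set of 3 out of 10 reactions, which has the same shape as Table 1 but does not reproduce its layout. The sketch also assumes a single translocation per sample and assumes that a reaction pooling chromosome A but not chromosome B shows off-target signal on B when A and B are translocated.

```python
from itertools import combinations

REACTIONS = range(1, 11)
CHROMOSOMES = range(1, 25)  # 23 and 24 stand in for X and Y

# Hypothetical design: every chromosome gets a distinct 3-reaction signature.
# This is NOT the layout of Table 1.
_signatures = list(combinations(REACTIONS, 3))[::5]
ASSIGNMENT = {c: frozenset(s) for c, s in zip(CHROMOSOMES, _signatures)}

def simulate(a, b):
    """Map each reaction to the chromosomes showing off-target signal."""
    observed = {}
    for r in REACTIONS:
        pooled = {c for c in CHROMOSOMES if r in ASSIGNMENT[c]}
        off = set()
        if a in pooled and b not in pooled:
            off.add(b)
        if b in pooled and a not in pooled:
            off.add(a)
        if off:
            observed[r] = off
    return observed

def decode(observed):
    """Return chromosome pairs whose predicted pattern matches the data."""
    seen = {}
    for r, chroms in observed.items():
        for c in chroms:
            seen.setdefault(c, set()).add(r)
    pairs = []
    for a, b in combinations(sorted(seen), 2):
        if (seen[b] == ASSIGNMENT[a] - ASSIGNMENT[b]
                and seen[a] == ASSIGNMENT[b] - ASSIGNMENT[a]):
            pairs.append((a, b))
    return pairs

print(decode(simulate(9, 22)))
```

Because every signature has the same size and no two are identical, each chromosome of a translocated pair is pooled in at least one reaction that lacks its partner, which is what allows a unique assignment in this sketch.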
[0059] In another aspect, the detection and mapping of particular
translocations can be targeted, rather than targeting the detection
of all possible translocations genome-wide. This would be
particularly valuable in contexts where patients may have a
translocation involving a specific pair of chromosomes, but where
the exact translocation breakpoint may vary from patient to
patient. The methods disclosed herein may be combined with those
disclosed in US 2006073511.
Detection of Rearrangements using DITAGs.
[0060] In another aspect methods for detecting and analyzing
genomic rearrangements using "ditag" methodology are disclosed.
Ditags are disclosed, for example, in Wei et al., Cell 2006 Jan
13;124(1):207-19, and Ng et al., Nucleic Acids Res. 2006 Jul.
13;34(12):e84, which are both incorporated herein by reference in
their entireties for all purposes. In a first step "ditags" are
generated from genomic DNA. In a preferred aspect, the sample prep
is illustrated in FIG. 1. Digest a genomic DNA of interest with a
restriction enzyme, e.g., a 6-cutter that produces a total of
approximately 500,000 restriction fragments 101. The "tags" 103
and 105 are the sequences at the ends of the restriction
fragments and can be predicted using genomic sequence databases and
in silico digestion methods. The central portion of the restriction
fragment is 104. Ligate the population of restriction fragments (RE
frag or RE fragment) en masse into a "ditag plasmid backbone" 107
forming circles 108. In one embodiment the resulting library of
circularized restriction fragments can be transformed into E. coli
(provided the backbone 107 contains the required elements needed
for reproduction in bacteria). The transformed bacteria may be used
to amplify the material for subsequent steps. Exonuclease cleavage
of non-circularized fragments may also be performed.
[0061] The ditag plasmid backbone 107 contains type IIs restriction
enzyme (e.g., MmeI) sites flanking both ends of the restriction
fragment cloning site so that cleavage occurs in the restriction
fragment (sites of cleavage indicated by arrows). The ditag
plasmid/restriction fragment DNA 108 is digested with the type IIs
restriction enzyme, thereby separating the central portion of the
restriction fragment 104 from the rest of 108. The tags 103 and 105
include the terminal 18 bp (when MmeI is used) from the ends of the
restriction fragments. The length of the tags will vary depending
on the type IIs enzyme used. The resulting fragment (includes 103,
107 and 105) is then circularized to form a circle 109 containing
the ditags. The ditag is the combination of tags 103 and 105 joined
together by ligation of the free ends. The ditags can be amplified,
for example, using PCR amplification with primers 111 and 113 which
are complementary to sequences in 107, to create ditag target 115,
which contains all ditags from all restriction fragments from the
genomic DNA. The amplified ditag target can be labeled during or
after amplification, for example, by incorporation of a
biotinylated, or otherwise labeled, nucleotide during synthesis or
by end labeling using a terminal transferase. The ditag target can
then be hybridized to a ditag sequencing array, described below,
for analysis.
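The prediction of tags by in silico digestion, and their pairing into ditags, can be sketched as follows. This Python example is a simplification under stated assumptions: the 6-cutter site GAATTC with a cut one base into the site is used only as a familiar stand-in because no particular 6-cutter is named above, only the top strand is modeled, and the ditag is approximated as the terminal 18 bases from each end of a fragment (the MmeI case). The names `digest` and `make_ditag` are hypothetical.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Split seq at each occurrence of site, cutting cut_offset bases in."""
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

def make_ditag(fragment, tag_len=18):
    """Join the terminal tags of a fragment; None if the fragment is too short."""
    if len(fragment) < 2 * tag_len:
        return None
    return fragment[:tag_len] + fragment[-tag_len:]

seq = "A" * 30 + "GAATTC" + "C" * 40 + "GAATTC" + "T" * 30
fragments = digest(seq)
ditags = [make_ditag(f) for f in fragments]
print([len(f) for f in fragments], ditags)
```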
[0062] In preferred aspects a ditag sequencing array is used for
sequencing analysis. Given a set of about 500,000 restriction
fragments containing 1 million tags (2 per fragment), having known
sequences that are adjacent to the selected restriction site or
sites, and given the possibility that a genomic rearrangement could
bring any tag into the same restriction fragment as any other tag,
to detect every possible combination of the about 1 million tags
coupled to all other about 1 million tags using direct
hybridization would require about 1 trillion probes. Reduction of
the number of probes required for analysis may be achieved by using
methods such as those shown in FIG. 2.
[0063] The ditag sequencing array (see FIG. 2) enables capture by
hybridization of each of the 1M tags, using probes that are
perfectly complementary to the tags, followed by determination of a
number of bases of sequence from the adjacent tag in the ditag. The
array shown in FIG. 2 determines 6 bases of the adjacent tag. This
is accomplished using probe sets 201 and 203 specific for each
strand of each tag 205 (forward tag) and 206 (reverse tag), where
each probe set consists of 6 probes that have a portion that is
complementary to the captured tag. Probe set 201 is complementary
to forward tag 205 while probe set 203 is complementary to the
reverse tag 206. The 6 probes differ from one another in that they
have from 0 to 5 inosine bases at their termini, shown in the
figure as increasing length of the open square. This enables
genotyping of 6 sequential bases in the hybridized tag 205 or 206,
using either single-base extension of 3'-up probes or base-specific
ligation to 5'-PO.sub.4 probes. Each of the probes in the probe set
can be used to determine one base in the unknown tag. For probe set
201 the unknown is the portion of the forward tag sequence
corresponding to 105. For probe set 203 the unknown is the portion
of the reverse tag sequence corresponding to 103. The lower portion
of the probe is constant within probe set 201 or probe set 203 and
is the complement of 103 and 105 respectively.
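The construction of one such probe set can be sketched in Python. The example assumes that probes are written 5' to 3', that the constant portion is the reverse complement of the captured tag, and that inosines (written here as "I") are appended at the free 3' terminus. The function names are hypothetical, and the 18-base sequence is the constant portion shared by the example probes of FIG. 3.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def probe_set(captured_tag, n_positions=6):
    """Return n_positions probes carrying 0..n_positions-1 terminal inosines."""
    anchor = reverse_complement(captured_tag)
    return [anchor + "I" * k for k in range(n_positions)]

# Captured tag chosen so that the anchor equals the 18-base example probe.
tag = reverse_complement("TGGACTAGAACATCAGAC")
probes = probe_set(tag)
for p in probes:
    print(p)
```

A second call with the complementary strand of the tag would give the probe set for the opposite strand.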
[0064] By determining 6 bases of information for each tag, one can
distinguish between a maximum of 4.sup.6=4096 possible states.
Thus, 6 bases of sequence should reduce the universe of possible
mates from .about.1M to .about.1M/4K=.about.250. In addition, by
comparing the 6 bases of sequence information with the sequence of
the wild-type tag, one can determine with very high confidence
whether the ditag is variant. Because the number of variant ditags
in any given sample is expected to be a small fraction (e.g.,
<500) of the total .about.500K tags, the total universe of
variant tags that need be considered in a given sample will be a
small subset (e.g., <1000) of all .about.1M tags. As such, 6
bases of information per tag is likely sufficient to match most
tags in variant ditags up with their mates. Moreover, if there are
ambiguities, comparing sets of candidate tag mates for each tag
across all variant tags, and identifying concordant mates between
pairs of tags, should result in the determination of virtually all
variant ditags with high confidence.
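The information content argument above can be reproduced numerically. The following Python lines are a worked example only; the helper names are hypothetical, and the calculation assumes that tag sequences are uniformly distributed over the possible k-mers, which real genomic tags only approximate.

```python
import math

def candidate_mates(n_tags, bases_read):
    """Expected number of tags sharing a given run of bases_read bases."""
    return n_tags / 4 ** bases_read

def bases_for_unique_mate(n_tags):
    """Smallest number of bases for which fewer than one mate is expected."""
    return math.ceil(math.log(n_tags, 4))

print(4 ** 6)                                  # possible 6-base states
print(round(candidate_mates(1_000_000, 6)))    # roughly 250 candidate mates
print(bases_for_unique_mate(1_000_000))
```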
[0065] Ditag sequencing is performed to determine the identity of
all ditags in the sample. The PCR product 115 is directly
hybridized to the array in some aspects and may be about 55 to 120
base pairs, more preferably 70 to 100 and even more preferably about 70
to 80 bp. In another aspect where shorter fragments are desired,
the ditags may be liberated from the primer sequences in the PCR
product 115 by digestion with a restriction enzyme. In some aspects
the type IIs restriction enzyme used to separate the ditag plasmid
from the rest of the restriction fragment 104 may be used. Thus, in
preferred aspects the PCR product does not need to be digested with
a non-sequence specific nuclease such as DNAseI and also does not
require labeling prior to hybridization since the probes will
preferably be labeled.
[0066] A 500,000 fragment ditag target would have a complexity of
about 50 Mbp. Typically we have observed .about.90% call rates and
99% accuracy from haploid genotyping (4 possible genotype states
per position) of single base extension data generated from targets
of this complexity. For diploid organisms, including humans,
variant tag base calling is typically performed in the presence of
wild-type tag sequence. However, this task is not nearly as
difficult as de novo diploid genotyping (which must consider 10
possible genotype states per position), because the sequence of the
wild-type allele is known, so only four genotype states are
possible per position. Therefore, a 90% call rate and 99% accuracy
should be approximately representative of the data quality we can
expect from single base extension.
[0067] The single base extension method is shown in greater detail
in FIG. 3. The ditag sequence (SEQ ID NO 1) is shown hybridized to
3 different probes (SEQ ID NO 2, 3 and 4) that are complementary to
one of the tags (the 3' 18 bases of SEQ ID NO 1) and are designed
to sequence individual positions in the second tag (the 5' 18 bases
of SEQ ID NO 1). The probes are attached via their 5' ends so the
3' end is available for extension (or ligation). The first probe
(SEQ ID NO 2) varies from the second probe (SEQ ID NO 3) by the
addition in the second of a single inosine base (I) at the 3' end.
The inosine can base pair with A, G, C or T, allowing interrogation
of the second position of the second tag, G in this tag. The first
probe interrogates the first position of the second tag, C in this
tag. Template directed addition is used to add a single blocked,
labeled nucleotide to the 3' end of the probes. The base that is
added is the complement of the base opposite in the second tag
sequences. Thus, a G is added to SEQ ID NO 2 resulting in SEQ ID NO
5 and indicating that the first base of the second tag is C. A C is
added to SEQ ID NO 3 resulting in SEQ ID NO 6 and indicating that
the second base of the second tag is G. An A is added to SEQ ID NO
4 resulting in SEQ ID NO 7 and indicating that the third base of
the second tag is T. The labels are indicated by a * and in
preferred aspects each label is specific for the base. The bases
are preferably blocked from extension, for example, by using bases
that are dideoxy or are otherwise blocked at the 3' position so
that only a single base is added. Each probe is present at a
different feature at a known or determinable location. Features
have many hundreds, thousands, or more, copies of the same probe
sequence.
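The base-calling logic of the single base extension example can be sketched in Python. The sketch assumes that the second tag is read starting from the tag junction, so that a probe with k terminal inosines interrogates position k + 1, and it uses the first three bases given in the example (C, G, T). The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def extended_base(second_tag, n_inosines):
    """Base added to a probe whose inosines span n_inosines tag positions."""
    return COMPLEMENT[second_tag[n_inosines]]

def read_tag(added_bases):
    """Recover the tag bases from the bases added to successive probes."""
    return "".join(COMPLEMENT[b] for b in added_bases)

second_tag = "CGT"  # first three positions of the second tag in the example
added = [extended_base(second_tag, k) for k in range(3)]
print(added, read_tag(added))
```

The three added bases correspond to the extensions that convert SEQ ID NO 2, 3 and 4 into SEQ ID NO 5, 6 and 7 respectively.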
[0068] In a preferred aspect the sequencing analysis uses 4-color
single base extension or base specific ligation. Each of the bases
(A, G, C and T) is labeled with a different distinguishable label so
that the identity of the base that is incorporated into the probe
can be determined and that can be used to determine the base
present in the ditag at the complementary position. For example, if
an A is incorporated into the probe then the ditag has a T at that
position. In another aspect the assay may be performed using a
single label and performing the extension or ligation reactions in
separate parallel reactions on separate arrays each having a
different base (A, G, C or T) present. A combination approach may
also be used, for example, two different labels and two different
arrays.
[0069] Data quality may degrade somewhat with single base extension
from probes containing multiple inosines, but it is still
sufficient for the definition of variant ditags as described above.
If additional information content is desired, the number of bases
sequenced can be increased beyond 6 bases of information.
[0070] FIG. 4 illustrates the same method shown in FIG. 1 but with
example tag sequences at the ends of the fragment. The fragment 401
has double stranded tags at the ends. After ligation to the
backbone sequence the construct shown in 403 is obtained. The
arrows connect the restriction enzyme recognition site (RE) with
the cleavage site (at the end of the arrow point). After cleavage
with the RE the construct shown in 405 is obtained. The sequence on
the left in 401 is still on the left but the orientation is
flipped. After the second ligation, the construct of 407 is
obtained. The left and right tags from 401 are now ligated
together. The orientation is the same as in the original fragment
but with the center portion removed. This fragment is then
amplified.
[0071] The above example contemplates using .about.500K restriction
fragments, at an average fragment size of .about.6K. This would
allow mapping the breakpoint(s) of most genomic rearrangements from
1 kbp to within 6 kbp. The number of fragments can be reduced,
reducing the number of tags that must be analyzed, by using a
restriction enzyme that cuts the genome less frequently. The
resolution of the method is reduced if larger fragment sizes are
used. In another aspect a subset of the tags may be analyzed on the
array. This also can result in a reduction in the sensitivity of
the technique in proportion to the reduction in the number of tags
being queried, for example, if one assays 100K of the 1M tags, one
should expect to detect only .about.10% of all possible genomic
rearrangements.
[0072] The sequence of the tags can be predicted using genomic
database information. In the absence of rearrangements the two ends
of a given restriction fragment can be predicted from the genomic
sequence. This is illustrated in FIG. 5. The arrows indicate
cleavage sites for a restriction enzyme and the numbered regions to
the left and right of the arrow head are the "tag" sequences
corresponding to that restriction enzyme. So, for example, the
first restriction site (a) is flanked by tag sequences 1 and 2.
Cleavage at (a) and (b) generates fragment (i) having first end
sequence (2) and second end sequence (3). The expected ditag for
this fragment would have sequence 2 and 3 in the same ditag. If
there was a rearrangement that deleted the restriction site (b) as
illustrated in the lower panel, then fragment (i) would result in a
different ditag that would have sequence 2 and 5 in the ditag.
Probes to sequence 2 would detect sequence 5 as the adjacent
sequence and probes to sequence 5 would detect sequence 2 as the
adjacent sequence.
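The prediction of expected and variant ditags described for FIG. 5 can be sketched in Python. The representation is hypothetical: each restriction site is modeled as a pair of tag numbers (the tag to its left and the tag to its right), and sites are listed in genomic order, so that sites (a), (b) and (c) carry tags (1, 2), (3, 4) and (5, 6).

```python
def expected_ditags(sites):
    """Pair the right tag of each site with the left tag of the next site."""
    return [(sites[i][1], sites[i + 1][0]) for i in range(len(sites) - 1)]

wild_type = [(1, 2), (3, 4), (5, 6)]   # sites (a), (b), (c)
rearranged = [(1, 2), (5, 6)]          # site (b) deleted

reference = expected_ditags(wild_type)
observed = expected_ditags(rearranged)
variant = [d for d in observed if d not in reference]
print(reference, observed, variant)
```

In this sketch a variant ditag is simply any observed tag pair that is absent from the reference prediction.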
[0073] Arrays may also be designed to detect particular types of
rearrangements directly by hybridization and without the need for
extension or ligation. For example, to evaluate only inversions
less than 100 kb in size, one need only consider .about.1M tags
times the .about.20 other tags that might be mated with each tag by
such a lesion. One could simply tile probes that are perfectly
complementary to the .about.20M possible ditags, label the PCR
product containing the ditag and dispense with the requirement for
single base extension.
[0074] Single base extension methods have been previously described
in Syvanen, Nat Rev Genet. 2:930-942 (2001), for example. Ligation
based sequencing methods have been previously described in, for
example, EP723598. Methods for use of paired-end genomic signature
tags for genome and epigenomic analysis are disclosed, for example,
in Dunn et al., Genet Eng (NY) 28:159-73 (2007).
Conclusion
[0075] It is to be understood that the above description is
intended to be illustrative and not restrictive. Many variations of
the invention will be apparent to those of skill in the art upon
reviewing the above description. The scope of the invention should
be determined with reference to the appended claims, along with the
full scope of equivalents to which such claims are entitled. All
cited references, including patent and non-patent literature, are
incorporated herewith by reference in their entireties for all
purposes.
Sequence Listing
Number of SEQ ID NOs: 13
SEQ ID NO 1: length 36, DNA, Homo sapiens
gtccaagttc gacaatgcgt ctcatgttct agtcca
SEQ ID NO 2: length 18, DNA, Homo sapiens
tggactagaa catcagac
SEQ ID NO 3: length 19, DNA, Homo sapiens; misc_feature (19)..(19), n is a, c, g, or t
tggactagaa catcagacn
SEQ ID NO 4: length 20, DNA, Homo sapiens; misc_feature (19)..(20), n is a, c, g, or t
tggactagaa catcagacnn
SEQ ID NO 5: length 19, DNA, Homo sapiens
tggactagaa catcagacg
SEQ ID NO 6: length 20, DNA, Homo sapiens; misc_feature (19)..(19), n is a, c, g, or t
tggactagaa catcagacnc
SEQ ID NO 7: length 21, DNA, Homo sapiens; misc_feature (19)..(20), n is a, c, g, or t
tggactagaa catcagacnn a
SEQ ID NO 8: length 20, DNA, Homo sapiens
actctggtgt cgattcctgg
SEQ ID NO 9: length 19, DNA, Homo sapiens
caggaatcga caccagagt
SEQ ID NO 10: length 26, DNA, Homo sapiens; misc_feature (7)..(26), n is a, c, g, or t
tccracnnnn nnnnnnnnnn nnnnnn
SEQ ID NO 11: length 24, DNA, Homo sapiens; misc_feature (1)..(18), n is a, c, g, or t
nnnnnnnnnn nnnnnnnngt ygga
SEQ ID NO 12: length 31, DNA, Homo sapiens; misc_feature (7)..(31), n is a, c, g, or t
cagcagnnnn nnnnnnnnnn nnnnnnnnnn n
SEQ ID NO 13: length 33, DNA, Homo sapiens; misc_feature (1)..(27), n is a, c, g, or t
nnnnnnnnnn nnnnnnnnnn nnnnnnnctg ctg
* * * * *