U.S. patent application number 09/741150 was filed with the patent office on 2000-12-21 and published on 2002-06-27 for isolated nucleic acid molecules encoding human protease proteins, and uses thereof.
Invention is credited to Beasley, Ellen M., Di Francesco, Valentina, Guegler, Karl, Ketchum, Karen A., Shao, Wei, Webster, Marion, Yan, Chunhua.
Application Number | 20020081704 09/741150 |
Family ID | 26942293 |
Filed Date | 2000-12-21 |
United States Patent
Application |
20020081704 |
Kind Code |
A1 |
Guegler, Karl; et al. |
June 27, 2002 |
ISOLATED NUCLEIC ACID MOLECULES ENCODING HUMAN PROTEASE PROTEINS,
AND USES THEREOF
Abstract
The present invention provides amino acid sequences of peptides
that are encoded by genes within the human genome, the protease
peptides of the present invention. The present invention
specifically provides isolated peptide and nucleic acid molecules,
methods of identifying orthologs and paralogs of the protease
peptides, and methods of identifying modulators of the protease
peptides.
Inventors: |
Guegler, Karl; (Menlo Park, CA);
Webster, Marion; (San Francisco, CA);
Yan, Chunhua; (Boyds, MD);
Shao, Wei; (Frederick, MD);
Ketchum, Karen A.; (Germantown, MD);
Di Francesco, Valentina; (Rockville, MD);
Beasley, Ellen M.; (Darnestown, MD) |
Correspondence
Address: |
CELERA GENOMICS CORP.
ATTN: WAYNE MONTGOMERY, VICE PRES, INTEL PROPERTY
45 WEST GUDE DRIVE
C2-4#20
ROCKVILLE
MD
20850
US
|
Family ID: |
26942293 |
Appl. No.: |
09/741150 |
Filed: |
December 21, 2000 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
60252410 | Nov 22, 2000 | |
Current U.S.
Class: |
435/226 ;
435/325; 435/69.1; 435/7.1; 506/14; 536/23.2 |
Current CPC
Class: |
C12N 9/64 20130101; G01N
2333/96433 20130101; Y10T 436/143333 20150115; G01N 33/573
20130101; G01N 2500/04 20130101 |
Class at
Publication: |
435/226 ;
435/69.1; 435/6; 435/7.1; 536/23.2; 435/325 |
International
Class: |
C12N 009/64; G01N
033/53; C12Q 001/68; C12P 021/02; C12N 005/06 |
Claims
That which is claimed is:
1. An isolated peptide consisting of an amino acid sequence
selected from the group consisting of: (a) an amino acid sequence
shown in SEQ ID NO:2; (b) an amino acid sequence of an allelic
variant of an amino acid sequence shown in SEQ ID NO:2, wherein
said allelic variant is encoded by a nucleic acid molecule that
hybridizes under stringent conditions to the opposite strand of a
nucleic acid molecule shown in SEQ ID NOS:1 or 3; (c) an amino acid
sequence of an ortholog of an amino acid sequence shown in SEQ ID
NO:2, wherein said ortholog is encoded by a nucleic acid molecule
that hybridizes under stringent conditions to the opposite strand
of a nucleic acid molecule shown in SEQ ID NOS:1 or 3; and (d) a
fragment of an amino acid sequence shown in SEQ ID NO:2, wherein
said fragment comprises at least 10 contiguous amino acids.
2. An isolated peptide comprising an amino acid sequence selected
from the group consisting of: (a) an amino acid sequence shown in
SEQ ID NO:2; (b) an amino acid sequence of an allelic variant of an
amino acid sequence shown in SEQ ID NO:2, wherein said allelic
variant is encoded by a nucleic acid molecule that hybridizes under
stringent conditions to the opposite strand of a nucleic acid
molecule shown in SEQ ID NOS:1 or 3; (c) an amino acid sequence of
an ortholog of an amino acid sequence shown in SEQ ID NO:2, wherein
said ortholog is encoded by a nucleic acid molecule that hybridizes
under stringent conditions to the opposite strand of a nucleic acid
molecule shown in SEQ ID NOS:1 or 3; and (d) a fragment of an amino
acid sequence shown in SEQ ID NO:2, wherein said fragment comprises
at least 10 contiguous amino acids.
3. An isolated antibody that selectively binds to a peptide of
claim 2.
4. An isolated nucleic acid molecule consisting of a nucleotide
sequence selected from the group consisting of: (a) a nucleotide
sequence that encodes an amino acid sequence shown in SEQ ID NO:2;
(b) a nucleotide sequence that encodes an allelic variant of an
amino acid sequence shown in SEQ ID NO:2, wherein said nucleotide
sequence hybridizes under stringent conditions to the opposite
strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3; (c) a
nucleotide sequence that encodes an ortholog of an amino acid
sequence shown in SEQ ID NO:2, wherein said nucleotide sequence
hybridizes under stringent conditions to the opposite strand of a
nucleic acid molecule shown in SEQ ID NOS:1 or 3; (d) a nucleotide
sequence that encodes a fragment of an amino acid sequence shown in
SEQ ID NO:2, wherein said fragment comprises at least 10 contiguous
amino acids; and (e) a nucleotide sequence that is the complement
of a nucleotide sequence of (a)-(d).
5. An isolated nucleic acid molecule comprising a nucleotide
sequence selected from the group consisting of: (a) a nucleotide
sequence that encodes an amino acid sequence shown in SEQ ID NO:2;
(b) a nucleotide sequence that encodes an allelic variant of an
amino acid sequence shown in SEQ ID NO:2, wherein said nucleotide
sequence hybridizes under stringent conditions to the opposite
strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3; (c) a
nucleotide sequence that encodes an ortholog of an amino acid
sequence shown in SEQ ID NO:2, wherein said nucleotide sequence
hybridizes under stringent conditions to the opposite strand of a
nucleic acid molecule shown in SEQ ID NOS:1 or 3; (d) a nucleotide
sequence that encodes a fragment of an amino acid sequence shown in
SEQ ID NO:2, wherein said fragment comprises at least 10 contiguous
amino acids; and (e) a nucleotide sequence that is the complement
of a nucleotide sequence of (a)-(d).
6. A gene chip comprising a nucleic acid molecule of claim 5.
7. A transgenic non-human animal comprising a nucleic acid molecule
of claim 5.
8. A nucleic acid vector comprising a nucleic acid molecule of
claim 5.
9. A host cell containing the vector of claim 8.
10. A method for producing any of the peptides of claim 1
comprising introducing a nucleotide sequence encoding any of the
amino acid sequences in (a)-(d) into a host cell, and culturing the
host cell under conditions in which the peptides are expressed from
the nucleotide sequence.
11. A method for producing any of the peptides of claim 2
comprising introducing a nucleotide sequence encoding any of the
amino acid sequences in (a)-(d) into a host cell, and culturing the
host cell under conditions in which the peptides are expressed from
the nucleotide sequence.
12. A method for detecting the presence of any of the peptides of
claim 2 in a sample, said method comprising contacting said sample
with a detection agent that specifically allows detection of the
presence of the peptide in the sample and then detecting the
presence of the peptide.
13. A method for detecting the presence of a nucleic acid molecule
of claim 5 in a sample, said method comprising contacting the
sample with an oligonucleotide that hybridizes to said nucleic acid
molecule under stringent conditions and determining whether the
oligonucleotide binds to said nucleic acid molecule in the
sample.
14. A method for identifying a modulator of a peptide of claim 2,
said method comprising contacting said peptide with an agent and
determining if said agent has modulated the function or activity of
said peptide.
15. The method of claim 14, wherein said agent is administered to a
host cell comprising an expression vector that expresses said
peptide.
16. A method for identifying an agent that binds to any of the
peptides of claim 2, said method comprising contacting the peptide
with an agent and assaying the contacted mixture to determine
whether a complex is formed with the agent bound to the
peptide.
17. A pharmaceutical composition comprising an agent identified by
the method of claim 16 and a pharmaceutically acceptable carrier
therefor.
18. A method for treating a disease or condition mediated by a
human protease protein, said method comprising administering to a
patient a pharmaceutically effective amount of an agent identified
by the method of claim 16.
19. A method for identifying a modulator of the expression of a
peptide of claim 2, said method comprising contacting a cell
expressing said peptide with an agent, and determining if said
agent has modulated the expression of said peptide.
20. An isolated human protease peptide having an amino acid
sequence that shares at least 70% homology with an amino acid
sequence shown in SEQ ID NO:2.
21. A peptide according to claim 20 that shares at least 90 percent
homology with an amino acid sequence shown in SEQ ID NO:2.
22. An isolated nucleic acid molecule encoding a human protease
peptide, said nucleic acid molecule sharing at least 80 percent
homology with a nucleic acid molecule shown in SEQ ID NOS:1 or
3.
23. A nucleic acid molecule according to claim 22 that shares at
least 90 percent homology with a nucleic acid molecule shown in SEQ
ID NOS:1 or 3.
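The percent-homology thresholds recited in claims 20-23 can be illustrated with a short calculation. The following is a minimal sketch only: it assumes that homology is scored as percent identity over a gapped pairwise alignment supplied by the caller, which the claims themselves do not specify. The function names `percent_identity` and `meets_threshold` are hypothetical and introduced purely for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of two pre-aligned sequences.

    Assumption: both strings come from the same pairwise alignment, have
    equal length, and use '-' for gaps. A column counts as a match only
    when both residues are identical and neither is a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must be non-empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold_percent: float) -> bool:
    """True when the pair reaches the given percent threshold (e.g. 70, 80, 90)."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


# Toy sequences, not taken from SEQ ID NO:2: nine of ten columns match.
print(percent_identity("MKTAYIAKQR", "MKTAYLAKQR"))  # 90.0
```

Other scoring conventions (for example, excluding gap columns from the denominator or counting conservative substitutions) would give different values for the same pair, so the convention must be fixed before a threshold is applied.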
Description
RELATED APPLICATIONS
[0001] The present application claims priority to provisional
application U.S. Serial No. 60/252,410, filed Nov. 22, 2000 (Atty.
Docket CL000968-PROV).
FIELD OF THE INVENTION
[0002] The present invention is in the field of protease proteins
that are related to the ATP-dependent protease subfamily (a type of
mitochondrial lon protease homolog 1 precursor), recombinant DNA
molecules, and protein production. The present invention
specifically provides novel peptides and proteins that effect
protein cleavage/processing/turnover and nucleic acid molecules
encoding such peptide and protein molecules, all of which are
useful in the development of human therapeutics and diagnostic
compositions and methods.
BACKGROUND OF THE INVENTION
[0003] The proteases may be categorized into families by the
different amino acid sequences (generally between 2 and 10
residues) located on either side of the cleavage site of the
protease.
[0004] The proper functioning of the cell requires careful control
of the levels of important structural proteins, enzymes, and
regulatory proteins. One of the ways that cells can reduce the
steady state level of a particular protein is by proteolytic
degradation. Further, one of the ways cells produce functioning
proteins is to produce pre- or proprotein precursors that are
processed by proteolytic degradation to produce an active moiety.
Thus, complex and highly regulated mechanisms have evolved to
accomplish this degradation.
[0005] Proteases regulate many different cell proliferation,
differentiation, and signaling processes by regulating protein
turnover and processing. Uncontrolled protease activity (either
increased or decreased) has been implicated in a variety of disease
conditions including inflammation, cancer, arteriosclerosis, and
degenerative disorders.
[0006] An additional role of intracellular proteolysis is in the
stress-response. Cells that are subject to stress such as
starvation, heat-shock, chemical insult or mutation respond by
increasing the rates of proteolysis. One function of this enhanced
proteolysis is to salvage amino acids from non-essential proteins.
These amino acids can then be re-utilized in the synthesis of
essential proteins or metabolized directly to provide energy.
Another function is in the repair of damage caused by the stress.
For example, oxidative stress has been shown to damage a variety of
proteins and cause them to be rapidly degraded.
[0007] The International Union of Biochemistry and Molecular
Biology (IUBMB) has recommended using the term peptidase for the
subset of peptide bond hydrolases (Subclass E.C. 3.4). The widely
used term protease is synonymous with peptidase. Peptidases
comprise two groups of enzymes: the endopeptidases, which cleave
peptide bonds at points within the protein, and the exopeptidases,
which remove amino acids sequentially from either the N- or
C-terminus. The term proteinase is also used as a
synonym for endopeptidase, and four mechanistic classes of
proteinases are recognized by the IUBMB: two of these are described
below (also see: Handbook of Proteolytic Enzymes by Barrett,
Rawlings, and Woessner AP Press, NY 1998). Also, for a review of
the various uses of proteases as drug targets, see: Weber M,
Emerging treatments for hypertension: potential role for
vasopeptidase inhibition; Am J Hypertens November 1999; 12(11 Pt
2):139S-147S; Kentsch M, Otter W, Novel neurohormonal modulators in
cardiovascular disorders. The therapeutic potential of
endopeptidase inhibitors, Drugs R D April 1999; 1(4):331-8;
Scarborough R M, Coagulation factor Xa: the prothrombinase complex
as an emerging therapeutic target for small molecule inhibitors, J
Enzym Inhib 1998;14(1):15-25; Skotnicki J S, et al., Design and
synthetic considerations of matrix metalloproteinase inhibitors,
Ann NY Acad Sci June 1999 30;878:61-72; McKerrow J H, Engel J C,
Caffrey C R, Cysteine protease inhibitors as chemotherapy for
parasitic infections, Bioorg Med Chem April 1999;7(4):639-44; Rice
K D, Tanaka R D, Katz B A, Numerof R P, Moore W R, Inhibitors of
tryptase for the treatment of mast cell-mediated diseases, Curr
Pharm Des October 1998; 4(5):381-96; Materson B J, Will angiotensin
converting enzyme genotype, receptor mutation identification, and
other miracles of molecular biology permit reduction of NNT? Am J
Hypertens August 1998; 11(8 Pt 2):138S-142S.
[0008] Serine Proteases
[0009] The serine proteases (SP) are a large family of proteolytic
enzymes that include the digestive enzymes, trypsin and
chymotrypsin, components of the complement cascade and of the
blood-clotting cascade, and enzymes that control the degradation
and turnover of macromolecules of the extracellular matrix. SP are
so named because of the presence of a serine residue in the active
catalytic site for protein cleavage. SP have a wide range of
substrate specificities and can be subdivided into subfamilies on
the basis of these specificities. The main sub-families are
trypases (cleavage after arginine or lysine), aspases (cleavage
after aspartate), chymases (cleavage after phenylalanine or
leucine), metases (cleavage after methionine), and serases
(cleavage after serine).
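The subfamily specificities listed above can be expressed as a simple lookup table. The following sketch is illustrative only: it assumes that cleavage occurs after every occurrence of a listed residue and ignores sequence context and structural accessibility, and the names `CLEAVES_AFTER` and `cleavage_fragments` are hypothetical.

```python
# Residues after which each subfamily cleaves, as listed in the text
# (one-letter amino acid codes).
CLEAVES_AFTER = {
    "trypases": {"R", "K"},  # arginine or lysine
    "aspases": {"D"},        # aspartate
    "chymases": {"F", "L"},  # phenylalanine or leucine
    "metases": {"M"},        # methionine
    "serases": {"S"},        # serine
}


def cleavage_fragments(peptide: str, subfamily: str) -> list:
    """Split a peptide after every residue the named subfamily cleaves after.

    Simplifying assumption: every matching residue is cleaved.
    """
    targets = CLEAVES_AFTER[subfamily]
    peptide = peptide.upper()
    fragments = []
    start = 0
    for i, residue in enumerate(peptide):
        if residue in targets:
            fragments.append(peptide[start:i + 1])
            start = i + 1
    if start < len(peptide):
        fragments.append(peptide[start:])  # trailing piece with no cleavage site
    return fragments


# Toy peptide, chosen only to exercise the rules.
print(cleavage_fragments("AKGDFRSM", "trypases"))  # ['AK', 'GDFR', 'SM']
print(cleavage_fragments("AKGDFRSM", "aspases"))   # ['AKGD', 'FRSM']
```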
[0010] A series of six SP have been identified in murine cytotoxic
T-lymphocytes (CTL) and natural killer (NK) cells. These SP are
involved with CTL and NK cells in the destruction of virally
transformed cells and tumor cells and in organ and tissue
transplant rejection (Zunino, S. J. et al. (1990) J. Immunol.
144:2001-9; Sayers, T. J. et al. (1994) J. Immunol. 152:2289-97).
Human homologs of most of these enzymes have been identified
(Trapani, J. A. et al. (1988) Proc. Natl. Acad. Sci. 85:6924-28;
Caputo, A. et al. (1990) J. Immunol. 145:737-44). Like all SP, the
CTL-SP share three distinguishing features: 1) the presence of a
catalytic triad of histidine, serine, and aspartate residues which
comprise the active site; 2) the sequence GDSGGP which contains the
active site serine; and 3) an N-terminal IIGG sequence which
characterizes the mature SP.
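Two of the three features named above are plain sequence motifs and can be checked directly on a mature protein sequence. The sketch below assumes exact matches to GDSGGP and IIGG; the catalytic triad is a structural feature and is not detectable by this kind of string search. The function name `serine_protease_features` is hypothetical.

```python
SERINE_MOTIF = "GDSGGP"    # motif containing the active-site serine
N_TERMINAL_MOTIF = "IIGG"  # N-terminal sequence of the mature SP


def serine_protease_features(mature_sequence: str) -> dict:
    """Report sequence-level SP features for a mature protein sequence.

    Positions are 1-based; None means the motif was not found.
    """
    seq = mature_sequence.upper()
    motif_index = seq.find(SERINE_MOTIF)  # 0-based index, -1 if absent
    found = motif_index >= 0
    return {
        "starts_with_IIGG": seq.startswith(N_TERMINAL_MOTIF),
        "GDSGGP_start": motif_index + 1 if found else None,
        # The serine is the third residue of G-D-S-G-G-P.
        "active_site_serine": motif_index + 3 if found else None,
    }


# Toy sequence built for illustration, not a real protease.
print(serine_protease_features("IIGGAAAGDSGGPLV"))
# {'starts_with_IIGG': True, 'GDSGGP_start': 8, 'active_site_serine': 10}
```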
[0011] The SP are secretory proteins which contain N-terminal
signal peptides that serve to export the immature protein across
the endoplasmic reticulum and are then cleaved (von Heijne (1986)
Nuc. Acid. Res. 14:5683-90). Differences in these signal sequences
provide one means of distinguishing individual SP. Some SP,
particularly the digestive enzymes, exist as inactive precursors or
preproenzymes, and contain a leader or activation peptide sequence
3' of the signal peptide. This activation peptide may be 2-12 amino
acids in length, and it extends from the cleavage site of the
signal peptide to the N-terminal IIGG sequence of the active,
mature protein. Cleavage of this sequence activates the enzyme.
This sequence varies in different SP according to the biochemical
pathway and/or its substrate (Zunino et al, supra; Sayers et al,
supra). Other features that distinguish various SP are the presence
or absence of N-linked glycosylation sites that provide membrane
anchors, the number and distribution of cysteine residues that
determine the secondary structure of the SP, and the sequence of
substrate binding sites such as S'. The S' substrate binding region
is defined by residues extending from approximately +17 to +29
relative to the N-terminal I (+1). Differences in this region of
the molecule are believed to determine SP substrate specificities
(Zunino et al, supra).
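The coordinates given for the S' region translate directly into a substring of the mature sequence. The following is a minimal sketch, assuming that +1 is the first residue of the mature protein (the N-terminal I) and that the approximate bounds +17 and +29 are taken as exact; the function name `s_prime_region` is hypothetical.

```python
def s_prime_region(mature_sequence: str, start: int = 17, end: int = 29) -> str:
    """Return residues +start..+end (1-based, inclusive) of a mature sequence.

    Assumption: position +1 is the first residue of the mature protein.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    if len(mature_sequence) < end:
        raise ValueError("sequence is shorter than the requested region")
    # Convert 1-based inclusive coordinates to a Python slice.
    return mature_sequence[start - 1:end]


# Placeholder string (letters stand in for residues) to show the indexing.
print(s_prime_region("ABCDEFGHIJKLMNOPQRSTUVWXYZABCD"))  # QRSTUVWXYZABC
```

Because the text gives the bounds only approximately, `start` and `end` are left as parameters.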
[0012] Trypsinogens
[0013] The trypsinogens are serine proteases secreted by exocrine
cells of the pancreas (Travis J and Roberts R. Biochemistry 1969;
8: 2884-9; Mallory P and Travis J, Biochemistry 1973; 12: 2847-51).
Two major types of trypsinogen isoenzymes have been characterized,
trypsinogen-1, also called cationic trypsinogen, and trypsinogen-2
or anionic trypsinogen. The trypsinogen proenzymes are activated to
trypsins in the intestine by enterokinase, which removes an
activation peptide from the N-terminus of the trypsinogens. The
trypsinogens show a high degree of sequence homology, but they can
be separated on the basis of charge differences by using
electrophoresis or ion exchange chromatography. The major form of
trypsinogen in the pancreas and pancreatic juice is trypsinogen-1
(Guy C O et al., Biochem Biophys Res Commun 1984; 125: 516-23). In
serum of healthy subjects, trypsinogen-1 is also the major form,
whereas in patients with pancreatitis, trypsinogen-2 is more
strongly elevated (Itkonen et al., J Lab Clin Med 1990; 115:712-8).
Trypsinogens also occur in certain ovarian tumors, in which
trypsinogen-2 is the major form (Koivunen et al., Cancer Res 1990;
50: 2375-8). Trypsin-1 in complex with alpha-1-antitrypsin, also
called alpha-1-antiprotease, has been found to occur in serum of
patients with pancreatitis (Borgstrom A and Ohlsson K, Scand J Clin
Lab Invest 1984; 44: 381-6) but determination of this complex has
not been found useful for differentiation between pancreatic and
other gastrointestinal diseases (Borgstrom et al., Scand J Clin Lab
Invest 1989; 49:757-62).
[0014] Trypsinogen-1 and -2 are closely related immunologically
(Kimland et al., Clin Chim Acta 1989; 184: 31-46; Itkonen et al.,
1990), but by using monoclonal antibodies (Itkonen et al., 1990) or
by absorbing polyclonal antisera (Kimland et al., 1989) it is
possible to obtain reagents enabling specific measurement of each
form of trypsinogen.
[0015] When active trypsin reaches the blood stream, it is
inactivated by the major trypsin inhibitors alpha-2-macroglobulin
and alpha-1-antitrypsin (AAT). AAT is a 58 kilodalton serine
protease inhibitor synthesized in the liver and is one of the main
protease inhibitors in blood. Whereas complexes between trypsin-1
and AAT are detectable in serum (Borgstrom and Ohlsson, 1984) the
complexes with alpha-2-macroglobulin are not measurable with
antibody-based assays (Ohlsson K, Acta Gastroenterol Belg 1988; 51:
3-12).
[0016] Inflammation of the pancreas or pancreatitis may be
classified as either acute or chronic by clinical criteria. With
treatment, acute pancreatitis can often be cured and normal
function restored. Chronic pancreatitis often results in permanent
damage. The precise mechanisms which trigger acute inflammation are
not understood. However, some causes in the order of their
importance are alcohol ingestion, biliary tract disease,
post-operative trauma, and hereditary pancreatitis. One theory
provides that autodigestion, the premature activation of
proteolytic enzymes in the pancreas rather than in the duodenum,
causes acute pancreatitis. Any number of other factors including
endotoxins, exotoxins, viral infections, ischemia, anoxia, and
direct trauma may activate the proenzymes. In addition, any
internal or external blockage of pancreatic ducts can also cause an
accumulation of pancreatic juices in the pancreas, resulting in
cellular damage.
[0017] Anatomy, physiology, and diseases of the pancreas are
reviewed, inter alia, in Guyton A C (1991) Textbook of Medical
Physiology, W B Saunders Co, Philadelphia Pa.; Isselbacher K J et
al (1994) Harrison's Principles of Internal Medicine, McGraw-Hill,
New York City; Johnson K E (1991) Histology and Cell Biology,
Harwal Publishing, Media, Pa.; and The Merck Manual of Diagnosis
and Therapy (1992) Merck Research Laboratories, Rahway, N.J.
[0018] Metalloprotease
[0019] The metalloproteases may be one of the older classes of
proteinases and are found in bacteria, fungi as well as in higher
organisms. They differ widely in their sequences and their
structures but the great majority of enzymes contain a zinc atom
which is catalytically active. In some cases, zinc may be replaced
by another metal such as cobalt or nickel without loss of the
activity. Bacterial thermolysin has been well characterized and its
crystallographic structure indicates that zinc is bound by two
histidines and one glutamic acid. Many enzymes contain the sequence
HEXXH, which provides two histidine ligands for the zinc whereas
the third ligand is either a glutamic acid (thermolysin,
neprilysin, alanyl aminopeptidase) or a histidine (astacin). Other
families exhibit a distinct mode of binding of the Zn atom. The
catalytic mechanism leads to the formation of a noncovalent
tetrahedral intermediate after the attack of a zinc-bound water
molecule on the carbonyl group of the scissile bond. This
intermediate is further decomposed by transfer of the glutamic acid
proton to the leaving group.
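The HEXXH sequence described above can be located in a protein sequence with a regular expression. The sketch below is illustrative only: it treats X as any residue and reports every occurrence, including overlapping ones. A match indicates only that the two histidine ligands may be present and does not identify the third zinc ligand. The names `HEXXH` and `find_hexxh` are hypothetical.

```python
import re

# H-E-X-X-H, where X is any residue. The lookahead lets overlapping
# occurrences be reported as separate hits.
HEXXH = re.compile(r"(?=(HE..H))")


def find_hexxh(sequence: str) -> list:
    """Return (1-based position, matched motif) for each HEXXH occurrence."""
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(sequence.upper())]


# Toy sequences for illustration.
print(find_hexxh("AAHELGHAA"))  # [(3, 'HELGH')]
print(find_hexxh("AAAA"))       # []
```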
[0020] Metalloproteases contain a catalytic zinc metal center which
participates in the hydrolysis of the peptide backbone (reviewed in
Power and Harper, in Protease Inhibitors, A. J. Barrett and G.
Salvesen (eds.) Elsevier, Amsterdam, 1986, p. 219). The active
zinc center differentiates some of these proteases from calpains
and trypsins whose activities are dependent upon the presence of
calcium. Examples of metalloproteases include carboxypeptidase A,
carboxypeptidase B, and thermolysin.
[0021] Metalloproteases have been isolated from a number of
procaryotic and eucaryotic sources, e.g. Bacillus subtilis (McConn
et al., 1964, J. Biol. Chem. 239:3706); Bacillus megaterium;
Serratia (Miyata et al., 1971, Agr. Biol. Chem. 35:460);
Clostridium bifermentans (MacFarlane et al., 1992, App. Environ.
Microbiol. 58:1195-1200), Legionella pneumophila (Moffat et al.,
1994, Infection and Immunity 62:751-3). In particular, acidic
metalloproteases have been isolated from broad-banded copperhead
venoms (Johnson and Ownby, 1993, Int. J. Biochem. 25:267-278),
rattlesnake venoms (Chlou et al., 1992, Biochem. Biophys. Res.
Commun. 187:389-396) and articular cartilage (Treadwell et al.,
1986, Arch. Biochem. Biophys. 251:715-723). Neutral
metalloproteases, specifically those having optimal activity at
neutral pH have, for example, been isolated from Aspergillus sojae
(Sekine, 1973, Agric. Biol. Chem. 37:1945-1952). Neutral
metalloproteases obtained from Aspergillus have been classified
into two groups, npI and npII (Sekine, 1972, Agric. Biol. Chem.
36:207-216). So far, success in obtaining amino acid sequence
information from these fungal neutral metalloproteases has been
limited. An npII metalloprotease isolated from Aspergillus oryzae
has been cloned based on amino acid sequence presented in the
literature (Tatsumi et al., 1991, Mol. Gen. Genet. 228:97-103).
However, to date, no npI fungal metalloprotease has been cloned or
sequenced. Alkaline metalloproteases, for example, have been
isolated from Pseudomonas aeruginosa (Baumann et al., 1993, EMBO J
12:3357-3364) and the insect pathogen Xenorhabdus luminescens
(Schmidt et al., 1988, Appl. Environ. Microbiol. 54:2793-2797).
[0022] Metalloproteases have been divided into several distinct
families based primarily on activity and structure: 1) water
nucleophile; water bound by single zinc ion ligated to two His
(within the motif HEXXH) and Glu, His or Asp; 2) water nucleophile;
water bound by single zinc ion ligated to His, Glu (within the
motif HXXE) and His; 3) water nucleophile; water bound by single
zinc ion ligated to His, Asp and His; 4) water nucleophile; water
bound by single zinc ion ligated to two His (within the motif IEBH)
and Glu; and 5) water nucleophile; water bound by two zinc ions
ligated by Lys, Asp, Asp, Asp, Glu.
[0023] Examples of members of the metalloproteinase family include,
but are not limited to, membrane alanyl aminopeptidase (Homo
sapiens), germinal peptidyl-dipeptidase A (Homo sapiens), thimet
oligopeptidase (Rattus norvegicus), oligopeptidase F (Lactococcus
lactis), mycolysin (Streptomyces cacaoi), immune inhibitor A
(Bacillus thuringiensis), snapalysin (Streptomyces lividans),
leishmanolysin (Leishmania major), microbial collagenase (Vibrio
alginolyticus), microbial collagenase, class I (Clostridium
perfringens), collagenase 1 (Homo sapiens), serralysin (Serratia
marcescens), fragilysin (Bacteroides fragilis), gametolysin
(Chlamydomonas reinhardtii), astacin (Astacus fluviatilis),
adamalysin (Crotalus adamanteus), ADAM 10 (Bos taurus), neprilysin
(Homo sapiens), carboxypeptidase A (Homo sapiens), carboxypeptidase
E (Bos taurus), gamma-D-glutamyl-(L)-meso-diaminopimelate peptidase
I (Bacillus sphaericus), vanY D-Ala-D-Ala carboxypeptidase
(Enterococcus faecium), endolysin (bacteriophage A118), pitrilysin
(Escherichia coli), mitochondrial processing peptidase
(Saccharomyces cerevisiae), leucyl aminopeptidase (Bos taurus),
aminopeptidase I (Saccharomyces cerevisiae), membrane dipeptidase
(Homo sapiens), glutamate carboxypeptidase (Pseudomonas sp.), Gly-X
carboxypeptidase (Saccharomyces cerevisiae), O-sialoglycoprotein
endopeptidase (Pasteurella haemolytica), beta-lytic
metalloendopeptidase (Achromobacter lyticus), methionyl
aminopeptidase I (Escherichia coli), X-Pro aminopeptidase
(Escherichia coli), X-His dipeptidase (Escherichia coli),
IgA1-specific metalloendopeptidase (Streptococcus sanguis),
tentoxilysin (Clostridium tetani), leucyl aminopeptidase (Vibrio
proteolyticus), aminopeptidase (Streptomyces griseus), IAP
aminopeptidase (Escherichia coli), aminopeptidase T (Thermus
aquaticus), hyicolysin (Staphylococcus hyicus), carboxypeptidase
Taq (Thermus aquaticus), anthrax lethal factor (Bacillus
anthracis), penicillolysin (Penicillium citrinum), fungalysin
(Aspergillus fumigatus), lysostaphin (Staphylococcus simulans),
beta-aspartyl dipeptidase (Escherichia coli), carboxypeptidase Ss1
(Sulfolobus solfataricus), FtsH endopeptidase (Escherichia coli),
glutamyl aminopeptidase (Lactococcus lactis), cytophagalysin
(Cytophaga sp.), metalloendopeptidase (vaccinia virus), VanX
D-Ala-D-Ala dipeptidase (Enterococcus faecium), Ste24p
endopeptidase (Saccharomyces cerevisiae), dipeptidyl-peptidase III
(Rattus norvegicus), S2P protease (Homo sapiens), sporulation
factor SpoIVFB (Bacillus subtilis), and HYBD endopeptidase
(Escherichia coli).
[0024] Metalloproteases have been found to have a number of uses.
For example, there is strong evidence that a metalloprotease is
involved in the in vivo proteolytic processing of the
vasoconstrictor, endothelin-1. Rat metalloprotease has been found
to be involved in peptide hormone processing. One important
subfamily of the metalloproteases is the matrix
metalloproteases.
[0025] A number of diseases are thought to be mediated by excess or
undesired metalloprotease activity or by an imbalance in the ratio
of the various members of the protease family of proteins. These
include: a) osteoarthritis (Woessner, et al., J. Biol. Chem. 259(6),
3633, 1984; Phadke, et al., J. Rheumatol. 10, 852, 1983), b)
rheumatoid arthritis (Mullins, et al., Biochim. Biophys. Acta 695,
177, 1983; Woolley, et al., Arthritis Rheum. 20, 1231, 1977;
Gravallese, et al., Arthritis Rheum. 34, 1076, 1991), c) septic
arthritis (Williams, et al., Arthritis Rheum. 33, 533, 1990), d)
tumor metastasis (Reich, et al., Cancer Res. 48, 3307, 1988, and
Matrisian, et al., Proc. Nat'l. Acad. Sci., USA 83, 9413, 1986), e)
periodontal diseases (Overall, et al., J. Periodontal Res. 22, 81,
1987), f) corneal ulceration (Burns, et al., Invest. Ophthalmol.
Vis. Sci. 30, 1569, 1989), g) proteinuria (Baricos, et al.,
Biochem. J. 254, 609, 1988), h) coronary thrombosis from
atherosclerotic plaque rupture (Henney, et al., Proc. Nat'l. Acad.
Sci., USA 88, 8154-8158, 1991), i) aneurysmal aortic disease (Vine,
et al., Clin. Sci. 81, 233, 1991), j) birth control (Woessner, et
al., Steroids 54, 491, 1989), k) dystrophic epidermolysis bullosa
(Kronberger, et al., J. Invest. Dermatol. 79, 208, 1982), l)
degenerative cartilage loss following traumatic joint injury, m)
conditions leading to inflammatory responses, osteopenias mediated
by MMP activity, n) temporomandibular joint disease, and o)
demyelinating diseases of the nervous system (Chantry, et al., J.
Neurochem. 50, 688, 1988).
[0026] Aspartic protease
[0027] Aspartic proteases have been divided into several distinct
families based primarily on activity and structure. These include
1) water nucleophile; water bound by two Asp from monomer or dimer;
all endopeptidases, from eukaryote organisms, viruses or virus-like
organisms; and 2) water nucleophile; water bound by Asp and Asn;
endopeptidases.
[0028] Most aspartic proteases belong to the pepsin family. The
pepsin family includes digestive enzymes such as pepsin and
chymosin as well as lysosomal cathepsin D and processing enzymes
such as renin, and certain fungal proteases (penicillopepsin,
rhizopuspepsin, endothiapepsin). A second family comprises viral
proteases such as the protease from the AIDS virus (HIV), also
called retropepsin. Crystallographic studies have shown that these
enzymes are bilobed molecules with the active site located between
two homologous lobes. Each lobe contributes one aspartate residue
of the catalytically active diad of aspartates. These two aspartyl
residues are in close geometric proximity in the active molecule
and one aspartate is ionized whereas the second one is un-ionized at
the optimum pH range of 2-3. Retropepsins are monomeric, i.e., carry
only one catalytic aspartate, so dimerization is required to
form an active enzyme.
[0029] In contrast to serine and cysteine proteases, catalysis by
aspartic proteases does not involve a covalent intermediate, though a
tetrahedral intermediate exists. The nucleophilic attack is
achieved by two simultaneous proton transfers: one from a water
molecule to the diad of the two carboxyl groups and a second one
from the diad to the carbonyl oxygen of the substrate with the
concurrent CO--NH bond cleavage. This general acid-base catalysis,
which may be called a "push-pull" mechanism, leads to the formation
of a noncovalent neutral tetrahedral intermediate.
[0030] Examples of the aspartic protease family of proteins
include, but are not limited to, pepsin A (Homo sapiens), HIV1
retropepsin (human immunodeficiency virus type 1), endopeptidase
(cauliflower mosaic virus), bacilliform virus putative protease
(rice tungro bacilliform virus), aspergillopepsin II (Aspergillus
niger), thermopsin (Sulfolobus acidocaldarius), nodavirus
endopeptidase (flock house virus), pseudomonapepsin (Pseudomonas
sp. 101), signal peptidase II (Escherichia coli), polyprotein
peptidase (human spumaretrovirus), copia transposon (Drosophila
melanogaster), SIRE-1 peptidase (Glycine max), retrotransposon bs1
endopeptidase (Zea mays), retrotransposon peptidase (Drosophila
buzzatii), Tas retrotransposon peptidase (Ascaris lumbricoides),
Pao retrotransposon peptidase (Bombyx mori), putative proteinase of
Skippy retrotransposon (Fusarium oxysporum), tetravirus
endopeptidase (Nudaurelia capensis omega virus), and presenilin 1
(Homo sapiens).
[0031] Proteases and Cancer
[0032] Proteases are critical elements at several stages in the
progression of metastatic cancer. In this process, the proteolytic
degradation of structural protein in the basal membrane allows for
expansion of a tumor in the primary site, evasion from this site as
well as homing and invasion in distant, secondary sites. Also,
tumor induced angiogenesis is required for tumor growth and is
dependent on proteolytic tissue remodeling. Transfection
experiments with various types of proteases have shown that the
matrix metalloproteases play a dominant role in these processes, in
particular gelatinases A and B (MMP-2 and MMP-9, respectively). For
an overview of this field see Mullins, et al., Biochim. Biophys.
Acta 695, 177, 1983; Ray, et al., Eur. Respir. J. 7, 2062, 1994;
Birkedal-Hansen, et al., Crit. Rev. Oral Biol. Med. 4, 197,
1993.
[0033] Furthermore, it was demonstrated that inhibition of
degradation of extracellular matrix by the native matrix
metalloprotease inhibitor TIMP-2 (a protein) arrests cancer growth
(DeClerck, et al., Cancer Res. 52, 701, 1992) and that TIMP-2
inhibits tumor-induced angiogenesis in experimental systems (Moses,
et al. Science 248, 1408, 1990). For a review, see DeClerck, et
al., Ann. N.Y. Acad. Sci. 732, 222, 1994. It was further
demonstrated that the synthetic matrix metalloprotease inhibitor
batimastat when given intraperitoneally inhibits human colon tumor
growth and spread in an orthotopic model in nude mice (Wang, et al.
Cancer Res. 54, 4726, 1994) and prolongs the survival of mice
bearing human ovarian carcinoma xenografts (Davies, et al., Cancer
Res. 53, 2087, 1993). The use of this and related compounds has
been described in Brown, et al., WO-9321942 A2.
[0034] There are several patents and patent applications claiming
the use of metalloproteinase inhibitors for the retardation of
metastatic cancer, promoting tumor regression, inhibiting cancer
cell proliferation, slowing or preventing cartilage loss associated
with osteoarthritis or for treatment of other diseases as noted
above (e.g. Levy, et al., WO-9519965 A1; Beckett, et al.,
WO-9519956 A1; Beckett, et al., WO-9519957 A1; Beckett, et al.,
WO-9519961 A1; Brown, et al., WO-9321942 A2; Crimmin, et al.,
WO-9421625 A1; Dickens, et al., U.S. Pat. No. 4,599,361; Hughes, et
al., U.S. Pat. No. 5,190,937; Broadhurst, et al., EP 574758 A1;
Broadhurst, et al., EP 276436; and Myers, et al., EP 520573 A1).
[0035] ATP-Dependent Proteases (Mitochondrial Lon protease homolog
1 precursor)
The present invention provides a novel human
ATP-dependent protease. ATP-dependent proteases, such as Lon
proteases, require ATP hydrolysis for function and play critical
roles in numerous important biological processes, such as organism
development, gene transcription, intracellular proteolysis and
protein biogenesis, prevention of nonspecific or excessive
proteolysis (Goldberg, Semin Cell Biol December 1990;1(6):423-32),
and intercellular signaling. Therefore, novel human ATP-dependent
proteases are useful for modulating/regulating any of these
important biological processes, particularly for diagnosing,
preventing and/or treating defects in proteolysis, gene
transcription, intercellular signaling, and numerous human
developmental disorders.
[0036] Many ATP-dependent proteases are involved in modulation of
proteolysis, insertion of proteins into membranes, and disassembly
or oligomerization of protein complexes (Suzuki et al., Trends
Biochem Sci April 1997;22(4):118-23). Proteolysis is critical for
maintaining the stability of important metabolic enzymes and for
effectively removing terminally damaged polypeptides (Porankiewicz
et al., Mol Microbiol May 1999; 32(3):449-58). ATP-dependent
proteases may be found in mitochondria and chloroplasts, as well as
in the cytoplasm.
[0037] In E. coli, Lon ATP-dependent proteases, together with Clp
ATP-dependent proteases, account for 70-80% of the energy-dependent
degradation of proteins. Lon and Clp both interact directly with
substrates to cause degradation (Maurizi et al., Experientia
February 1992 15;48(2):178-201). Proteolysis in Escherichia coli,
such as by Lon proteases, eliminates abnormal and misfolded
proteins from the cell and also reduces the time and amounts of
availability of key regulatory proteins (Gottesman et al., Annu Rev
Genet 1996;30:465-506).
[0038] Lon-type proteases catalyze the ATP-dependent degradation of
mitochondrial matrix proteins. In yeast, mitochondrial Lon-type
proteases have been found to be involved in a variety of critical
mitochondrial functions, including mitochondrial protein turnover,
assembly of mitochondrial enzyme complexes, and maintenance of
mitochondrial DNA integrity. Furthermore, Lon-type proteases are
essential for respiratory function (Barakat et al., Plant Mol Biol
May 1998;37(1):141-54).
[0039] The importance of Lon proteases in development is further
illustrated in Myxococcus xanthus, in which disruption of a lon
gene (specifically, the lonD gene), encoding a Lon protease, has
been shown to block development at an early stage. The
lonD-disrupted strains of Myxococcus xanthus could form neither
fruiting bodies nor myxospores (Tojo et al., J Bacteriol July
1993;175(14):4545-9).
[0040] The bsgA gene of Myxococcus xanthus encodes another
ATP-dependent protease that is critical for the regulation of early
gene expression during fruiting body formation and sporulation in
Myxococcus xanthus. Myxococcus xanthus strains with mutated bsgA
genes are unable to initiate a required cell-cell interaction,
thereby leading to an inability to transcribe normal levels of many
developmentally induced genes (Gill et al., J Bacteriol July
1993;175(14):4538-44).
[0041] Novel Lon proteases may also be useful as markers during
spermatogenesis, and during mitochondrial and germ cell development
(Meinhardt et al., Hum Reprod Update March-April
1999;5(2):108-19).
[0042] For a further review of ATP-dependent proteases, including
Lon proteases, see Schmidt et al., Curr Opin Chem Biol October
1999;3(5):584-91; Etlinger et al., Revis Biol Celular
1989;20:197-216; and Langer et al., Experientia December 1996
15;52(12):1069-76. See also Barakat et al., Plant Mol Biol May
1998;37(1):141-54, Suzuki et al., Science. April 1994
8;264(5156):273-6, Teichmann et al., J Biol Chem. April 1996
26;271(17):10137-42, van Dijl et al., Proc Natl Acad Sci USA.
September 1998 1;95(18):10584-9, Van Dyck et al., J Biol Chem.
January 1994 7;269(1):238-42, Rep et al., Science. October 1996
4;274(5284):103-6, Campbell et al., Mol Biol Cell. August
1994;5(8):899-905, Witte et al., EMBO J. May 1988; 7(5):1439-47,
Wang et al., Proc Natl Acad Sci USA. December 1993
1;90(23):11247-51, Leonhardt et al., Mol Cell Biol. October
1993;13(10):6304-13, Fu et al., Biochemistry. February 1998
17;37(7):1905-9.
[0043] Protease proteins, particularly members of the ATP-dependent
protease subfamily, are a major target for drug action and
development. Accordingly, it is valuable to the field of
pharmaceutical development to identify and characterize previously
unknown members of this subfamily of protease proteins. The present
invention advances the state of the art by providing previously
unidentified human protease proteins that have homology to members
of the ATP-dependent protease subfamily.
SUMMARY OF THE INVENTION
[0044] The present invention is based in part on the identification
of amino acid sequences of human protease peptides and proteins
that are related to the ATP-dependent protease subfamily, as well
as allelic variants and other mammalian orthologs thereof. These
unique peptide sequences, and nucleic acid sequences that encode
these peptides, can be used as models for the development of human
therapeutic targets, aid in the identification of therapeutic
proteins, and serve as targets for the development of human
therapeutic agents that modulate protease activity in cells and
tissues that express the protease. Experimental data as provided in
FIG. 1 indicates expression in humans in retinoblastomas (eye),
melanotic melanomas (skin), endometrium adenocarcinomas (uterus),
adenocarcinomas (ovary), schizophrenic brain, kidney and human
heart.
DESCRIPTION OF THE FIGURE SHEETS
[0045] FIG. 1 provides the nucleotide sequence of a cDNA molecule
that encodes the protease protein of the present
invention (SEQ ID NO:1). In addition, structural and functional
information is provided, such as ATG start, stop and tissue
distribution, where available, that allows one to readily determine
specific uses of inventions based on this molecular sequence.
Experimental data as provided in FIG. 1 indicates expression in
humans in retinoblastomas (eye), melanotic melanomas (skin),
endometrium adenocarcinomas (uterus), adenocarcinomas (ovary),
schizophrenic brain, kidney and human heart.
[0046] FIG. 2 provides the predicted amino acid sequence of the
protease of the present invention (SEQ ID NO:2). In addition,
structural and functional information, such as protein family,
function, and modification sites is provided where available,
allowing one to readily determine specific uses of inventions based
on this molecular sequence.
[0047] FIG. 3 provides genomic sequences that span the gene
encoding the protease protein of the present invention (SEQ ID
NO:3). In addition, structural and functional information, such as
intron/exon structure, promoter location, etc., is provided where
available, allowing one to readily determine specific uses of
inventions based on this molecular sequence. As illustrated in FIG.
3, an insertion/deletion SNP variant ("indel") was identified at
position 12469.
DETAILED DESCRIPTION OF THE INVENTION
[0048] General Description
[0049] The present invention is based on the sequencing of the
human genome. During the sequencing and assembly of the human
genome, analysis of the sequence information revealed previously
unidentified fragments of the human genome that encode peptides
that share structural and/or sequence homology to
protein/peptide/domains identified and characterized within the art
as being a protease protein or part of a protease protein and are
related to the ATP-dependent protease subfamily. Utilizing these
sequences, additional genomic sequences were assembled and
transcript and/or cDNA sequences were isolated and characterized.
Based on this analysis, the present invention provides amino acid
sequences of human protease peptides and proteins that are related
to the ATP-dependent protease subfamily, nucleic acid sequences in
the form of transcript sequences, cDNA sequences and/or genomic
sequences that encode these protease peptides and proteins, nucleic
acid variation (allelic information), tissue distribution of
expression, and information about the closest art-known
protein/peptide/domain that has structural or sequence homology to
the protease of the present invention.
[0050] In addition to being previously unknown, the peptides that
are provided in the present invention are selected based on their
ability to be used for the development of commercially important
products and services. Specifically, the present peptides are
selected based on homology and/or structural relatedness to known
protease proteins of the ATP-dependent protease subfamily and the
expression pattern observed. Experimental data as provided in FIG.
1 indicates expression in humans in retinoblastomas (eye),
melanotic melanomas (skin), endometrium adenocarcinomas (uterus),
adenocarcinomas (ovary), schizophrenic brain, kidney and human
heart. The art has clearly established the commercial importance of
members of this family of proteins and proteins that have
expression patterns similar to that of the present gene. Some of
the more specific features of the peptides of the present
invention, and the uses thereof, are described herein, particularly
in the Background of the Invention and in the annotation provided
in the Figures, and/or are known within the art for each of the
known ATP-dependent protease family or subfamily of protease
proteins.
[0051] Specific Embodiments
[0052] Peptide Molecules
[0053] The present invention provides nucleic acid sequences that
encode protein molecules that have been identified as being members
of the protease family of proteins and are related to the
ATP-dependent protease subfamily (protein sequences are provided in
FIG. 2, transcript/cDNA sequences are provided in FIG. 1 and
genomic sequences are provided in FIG. 3). The peptide sequences
provided in FIG. 2, as well as the obvious variants described
herein, particularly allelic variants as identified herein and
using the information in FIG. 3, will be referred to herein as the
protease peptides of the present invention, protease peptides, or
peptides/proteins of the present invention.
[0054] The present invention provides isolated peptide and protein
molecules that consist of, consist essentially of, or comprise the
amino acid sequences of the protease peptides disclosed in FIG. 2
(encoded by the nucleic acid molecule shown in FIG. 1,
transcript/cDNA, or FIG. 3, genomic sequence), as well as all
obvious variants of these peptides that are within the art to make
and use. Some of these variants are described in detail below.
[0055] As used herein, a peptide is said to be "isolated" or
"purified" when it is substantially free of cellular material or
free of chemical precursors or other chemicals. The peptides of the
present invention can be purified to homogeneity or other degrees
of purity. The level of purification will be based on the intended
use. The critical feature is that the preparation allows for the
desired function of the peptide, even if in the presence of
considerable amounts of other components (the features of an
isolated nucleic acid molecule are discussed below).
[0056] In some uses, "substantially free of cellular material"
includes preparations of the peptide having less than about 30% (by
dry weight) other proteins (i.e., contaminating protein), less than
about 20% other proteins, less than about 10% other proteins, or
less than about 5% other proteins. When the peptide is
recombinantly produced, it can also be substantially free of
culture medium, i.e., culture medium represents less than about 20%
of the volume of the protein preparation.
[0057] The language "substantially free of chemical precursors or
other chemicals" includes preparations of the peptide in which it
is separated from chemical precursors or other chemicals that are
involved in its synthesis. In one embodiment, the language
"substantially free of chemical precursors or other chemicals"
includes preparations of the protease peptide having less than
about 30% (by dry weight) chemical precursors or other chemicals,
less than about 20% chemical precursors or other chemicals, less
than about 10% chemical precursors or other chemicals, or less than
about 5% chemical precursors or other chemicals.
[0058] The isolated protease peptide can be purified from cells
that naturally express it, purified from cells that have been
altered to express it (recombinant), or synthesized using known
protein synthesis methods. Experimental data as provided in FIG. 1
indicates expression in humans in retinoblastomas (eye), melanotic
melanomas (skin), endometrium adenocarcinomas (uterus),
adenocarcinomas (ovary), schizophrenic brain, kidney and human
heart. For example, a nucleic acid molecule encoding the protease
peptide is cloned into an expression vector, the expression vector
introduced into a host cell and the protein expressed in the host
cell. The protein can then be isolated from the cells by an
appropriate purification scheme using standard protein purification
techniques. Many of these techniques are described in detail
below.
[0059] Accordingly, the present invention provides proteins that
consist of the amino acid sequences provided in FIG. 2 (SEQ ID
NO:2), for example, proteins encoded by the transcript/cDNA nucleic
acid sequences shown in FIG. 1 (SEQ ID NO: 1) and the genomic
sequences provided in FIG. 3 (SEQ ID NO:3). The amino acid sequence
of such a protein is provided in FIG. 2. A protein consists of an
amino acid sequence when the amino acid sequence is the final amino
acid sequence of the protein.
[0060] The present invention further provides proteins that consist
essentially of the amino acid sequences provided in FIG. 2 (SEQ ID
NO:2), for example, proteins encoded by the transcript/cDNA nucleic
acid sequences shown in FIG. 1 (SEQ ID NO:1) and the genomic
sequences provided in FIG. 3 (SEQ ID NO:3). A protein consists
essentially of an amino acid sequence when such an amino acid
sequence is present with only a few additional amino acid residues,
for example from about 1 to about 100 or so additional residues,
typically from 1 to about 20 additional residues in the final
protein.
[0061] The present invention further provides proteins that
comprise the amino acid sequences provided in FIG. 2 (SEQ ID NO:2),
for example, proteins encoded by the transcript/cDNA nucleic acid
sequences shown in FIG. 1 (SEQ ID NO:1) and the genomic sequences
provided in FIG. 3 (SEQ ID NO:3). A protein comprises an amino acid
sequence when the amino acid sequence is at least part of the final
amino acid sequence of the protein. In such a fashion, the protein
can be only the peptide or have additional amino acid molecules,
such as amino acid residues (contiguous encoded sequence) that are
naturally associated with it or heterologous amino acid
residues/peptide sequences. Such a protein can have a few
additional amino acid residues or can comprise several hundred or
more additional amino acids. The preferred classes of proteins that
are comprised of the protease peptides of the present invention are
the naturally occurring mature proteins. A brief description of how
various types of these proteins can be made/isolated is provided
below.
[0062] The protease peptides of the present invention can be
attached to heterologous sequences to form chimeric or fusion
proteins. Such chimeric and fusion proteins comprise a protease
peptide operatively linked to a heterologous protein having an
amino acid sequence not substantially homologous to the protease
peptide. "Operatively linked" indicates that the protease peptide
and the heterologous protein are fused in-frame. The heterologous
protein can be fused to the N-terminus or C-terminus of the
protease peptide.
[0063] In some uses, the fusion protein does not affect the
activity of the protease peptide per se. For example, the fusion
protein can include, but is not limited to, enzymatic fusion
proteins, for example beta-galactosidase fusions, yeast two-hybrid
GAL fusions, poly-His fusions, MYC-tagged, HI-tagged and Ig
fusions. Such fusion proteins, particularly poly-His fusions, can
facilitate the purification of recombinant protease peptide. In
certain host cells (e.g., mammalian host cells), expression and/or
secretion of a protein can be increased by using a heterologous
signal sequence.
[0064] A chimeric or fusion protein can be produced by standard
recombinant DNA techniques. For example, DNA fragments coding for
the different protein sequences are ligated together in-frame in
accordance with conventional techniques. In another embodiment, the
fusion gene can be synthesized by conventional techniques including
automated DNA synthesizers. Alternatively, PCR amplification of
gene fragments can be carried out using anchor primers which give
rise to complementary overhangs between two consecutive gene
fragments which can subsequently be annealed and re-amplified to
generate a chimeric gene sequence (see Ausubel et al., Current
Protocols in Molecular Biology, 1992). Moreover, many expression
vectors are commercially available that already encode a fusion
moiety (e.g., a GST protein). A protease peptide-encoding nucleic
acid can be cloned into such an expression vector such that the
fusion moiety is linked in-frame to the protease peptide.
[0065] As mentioned above, the present invention also provides and
enables obvious variants of the amino acid sequence of the proteins
of the present invention, such as naturally occurring mature forms
of the peptide, allelic/sequence variants of the peptides,
non-naturally occurring recombinantly derived variants of the
peptides, and orthologs and paralogs of the peptides. Such variants
can readily be generated using art-known techniques in the fields
of recombinant nucleic acid technology and protein biochemistry. It
is understood, however, that variants exclude any amino acid
sequences disclosed prior to the invention.
[0066] Such variants can readily be identified/made using molecular
techniques and the sequence information disclosed herein. Further,
such variants can readily be distinguished from other peptides
based on sequence and/or structural homology to the protease
peptides of the present invention. The degree of homology/identity
present will be based primarily on whether the peptide is a
functional variant or non-functional variant, the amount of
divergence present in the paralog family and the evolutionary
distance between the orthologs.
[0067] To determine the percent identity of two amino acid
sequences or two nucleic acid sequences, the sequences are aligned
for optimal comparison purposes (e.g., gaps can be introduced in
one or both of a first and a second amino acid or nucleic acid
sequence for optimal alignment and non-homologous sequences can be
disregarded for comparison purposes). In a preferred embodiment, at
least 30%, 40%, 50%, 60%, 70%, 80%, or 90% or more of the length of
a reference sequence is aligned for comparison purposes. The amino
acid residues or nucleotides at corresponding amino acid positions
or nucleotide positions are then compared. When a position in the
first sequence is occupied by the same amino acid residue or
nucleotide as the corresponding position in the second sequence,
then the molecules are identical at that position (as used herein
amino acid or nucleic acid "identity" is equivalent to amino acid
or nucleic acid "homology"). The percent identity between the two
sequences is a function of the number of identical positions shared
by the sequences, taking into account the number of gaps, and the
length of each gap, which need to be introduced for optimal
alignment of the two sequences.
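By way of illustration only, the position-by-position comparison described above can be sketched as follows. The function name percent_identity, the gap character "-", and the choice of dividing by the number of aligned columns (gap columns included) are assumptions made for this sketch; other conventions for the denominator are also used in the art.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two sequences that are already aligned.

    Both inputs must have the same length; gaps are written as `gap`.
    Assumption: the denominator is the number of aligned columns,
    so every introduced gap lowers the percent identity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    # Ignore columns in which both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        return 0.0

    # A column is identical when both residues match and neither is a gap.
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Five identical columns out of six aligned columns.
    print(round(percent_identity("ACDE-G", "ACDEFG"), 2))
```

Under this convention a single gap in a six-column alignment yields five identical positions out of six.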
[0068] The comparison of sequences and determination of percent
identity and similarity between two sequences can be accomplished
using a mathematical algorithm. (Computational Molecular Biology,
Lesk, A. M., ed., Oxford University Press, New York, 1988;
Biocomputing: Informatics and Genome Projects, Smith, D. W., ed.,
Academic Press, New York, 1993; Computer Analysis of Sequence Data,
Part 1, Griffin, A. M., and Griffin, H. G., eds., Humana Press, New
Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne,
G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov,
M. and Devereux, J., eds., M Stockton Press, New York, 1991). In a
preferred embodiment, the percent identity between two amino acid
sequences is determined using the Needleman and Wunsch (J. Mol.
Biol. 48:444-453 (1970)) algorithm which has been incorporated
into the GAP program in the GCG software package (available at
http://www.gcg.com), using either a BLOSUM62 matrix or a PAM250
matrix, and a gap weight of 16, 14, 12, 10, 8, 6, or 4 and a length
weight of 1, 2, 3, 4, 5, or 6. In yet another preferred embodiment,
the percent identity between two nucleotide sequences is determined
using the GAP program in the GCG software package (Devereux, J., et
al., Nucleic Acids Res. 12(1):387 (1984)) (available at
http://www.gcg.com), using a NWSgapdna.CMP matrix and a gap weight
of 40, 50, 60, 70, or 80 and a length weight of 1, 2, 3, 4, 5, or
6. In another embodiment, the percent identity between two amino
acid or nucleotide sequences is determined using the algorithm of
E. Myers and W. Miller (CABIOS, 4:11-17 (1989)) which has been
incorporated into the ALIGN program (version 2.0), using a PAM120
weight residue table, a gap length penalty of 12 and a gap penalty
of 4.
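As a non-limiting illustration of the global alignment approach referred to above, a minimal Needleman and Wunsch style dynamic program is sketched below. This is not the GAP or ALIGN program: the function name needleman_wunsch, the simple match/mismatch scores, and the single linear gap penalty are assumptions chosen to keep the sketch short, whereas the programs named above use substitution matrices together with separate gap and length weights.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2):
    """Global alignment of two sequences with a linear gap penalty.

    Returns (aligned_a, aligned_b, score). Scoring values are
    illustrative assumptions, not those of any named program.
    """
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    print(needleman_wunsch("ACDEFG", "ACDEG"))
```

The aligned strings returned by such a routine can then be compared column by column to obtain a percent identity.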
[0069] The nucleic acid and protein sequences of the present
invention can further be used as a "query sequence" to perform a
search against sequence databases to, for example, identify other
family members or related sequences. Such searches can be performed
using the NBLAST and XBLAST programs (version 2.0) of Altschul, et
al. (J. Mol. Biol. 215:403-10 (1990)). BLAST nucleotide searches
can be performed with the NBLAST program, score=100, wordlength=12
to obtain nucleotide sequences homologous to the nucleic acid
molecules of the invention. BLAST protein searches can be performed
with the XBLAST program, score=50, wordlength=3 to obtain amino
acid sequences homologous to the proteins of the invention. To
obtain gapped alignments for comparison purposes, Gapped BLAST can
be utilized as described in Altschul et al. (Nucleic Acids Res.
25(17):3389-3402 (1997)). When utilizing BLAST and gapped BLAST
programs, the default parameters of the respective programs (e.g.,
XBLAST and NBLAST) can be used.
[0070] Full-length pre-processed forms, as well as mature processed
forms, of proteins that comprise one of the peptides of the present
invention can readily be identified as having complete sequence
identity to one of the protease peptides of the present invention
as well as being encoded by the same genetic locus as the protease
peptide provided herein. As indicated by the data presented in FIG.
3, the map position was determined to be on chromosome 16 by ePCR,
and confirmed with radiation hybrid mapping.
[0071] Allelic variants of a protease peptide can readily be
identified as being a human protein having a high (significant)
degree of sequence homology/identity to at least a portion
of the protease peptide as well as being encoded by the same
genetic locus as the protease peptide provided herein. Genetic
locus can readily be determined based on the genomic information
provided in FIG. 3, such as the genomic sequence mapped to the
reference human. As indicated by the data presented in FIG. 3, the
map position was determined to be on chromosome 16 by ePCR, and
confirmed with radiation hybrid mapping. As used herein, two
proteins (or a region of the proteins) have significant homology
when the amino acid sequences are typically at least about 70-80%,
80-90%, and more typically at least about 90-95% or more
homologous. A significantly homologous amino acid sequence,
according to the present invention, will be encoded by a nucleic
acid sequence that will hybridize to a protease peptide encoding
nucleic acid molecule under stringent conditions as more fully
described below.
[0072] FIG. 3 provides information on a SNP that has been found in
the gene encoding the protease protein of the present invention.
Specifically, a thymine indel at position 12469 was identified.
[0073] Paralogs of a protease peptide can readily be identified as
having some degree of significant sequence homology/identity to at
least a portion of the protease peptide, as being encoded by a gene
from humans, and as having similar activity or function. Two
proteins will typically be considered paralogs when the amino acid
sequences typically have at least about 60% or greater, and more
typically at least about 70% or greater, homology through a given
region or domain. Such paralogs will be encoded by a nucleic acid
sequence that will hybridize to a protease peptide encoding nucleic
acid molecule under moderate to stringent conditions as more fully
described below.
[0074] Orthologs of a protease peptide can readily be identified as
having some degree of significant sequence homology/identity to at
least a portion of the protease peptide as well as being encoded by
a gene from another organism. Preferred orthologs will be isolated
from mammals, preferably primates, for the development of human
therapeutic targets and agents. Such orthologs will be encoded by a
nucleic acid sequence that will hybridize to a protease peptide
encoding nucleic acid molecule under moderate to stringent
conditions, as more fully described below, depending on the degree
of relatedness of the two organisms yielding the proteins. As
indicated by the data presented in FIG. 3, the map position was
determined to be on chromosome 16 by ePCR, and confirmed with
radiation hybrid mapping.
[0075] FIG. 3 provides information on a SNP that has been found in
the gene encoding the protease protein of the present invention.
Specifically, a thymine indel at position 12469 was identified.
Non-naturally occurring variants of the protease peptides of the
present invention can readily be generated using recombinant
techniques. Such variants include, but are not limited to,
deletions, additions and substitutions in the amino acid sequence
of the protease peptide. For example, one class of substitutions
is conservative amino acid substitutions. Such substitutions are those
that substitute a given amino acid in a protease peptide by another
amino acid of like characteristics. Typically seen as conservative
substitutions are the replacements, one for another, among the
aliphatic amino acids Ala, Val, Leu, and Ile; interchange of the
hydroxyl residues Ser and Thr; exchange of the acidic residues Asp
and Glu; substitution between the amide residues Asn and Gln;
exchange of the basic residues Lys and Arg; and replacements among
the aromatic residues Phe and Tyr. Guidance concerning which amino
acid changes are likely to be phenotypically silent is found in
Bowie et al., Science 247:1306-1310 (1990).
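For purposes of illustration only, the conservative substitution classes listed above can be encoded as a lookup. The group labels, the three-letter residue codes as dictionary contents, and the function name is_conservative are assumptions of this sketch; the groupings themselves are those recited in the preceding paragraph and are not exhaustive.

```python
# Conservative substitution classes as recited in the text above.
CONSERVATIVE_GROUPS = {
    "aliphatic": {"Ala", "Val", "Leu", "Ile"},
    "hydroxyl": {"Ser", "Thr"},
    "acidic": {"Asp", "Glu"},
    "amide": {"Asn", "Gln"},
    "basic": {"Lys", "Arg"},
    "aromatic": {"Phe", "Tyr"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """True when both residues fall in the same class listed above."""
    if original == replacement:
        # An unchanged residue is not a substitution at all.
        return False
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS.values()
    )


if __name__ == "__main__":
    print(is_conservative("Leu", "Ile"))  # both aliphatic
    print(is_conservative("Asp", "Lys"))  # acidic vs. basic
```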
[0076] Variant protease peptides can be fully functional or can
lack function in one or more activities, e.g. ability to bind
substrate, ability to cleave substrate, ability to participate in a
signaling pathway, etc. Fully functional variants typically contain
only conservative variation or variation in non-critical residues
or in non-critical regions. FIG. 2 provides the result of protein
analysis and can be used to identify critical domains/regions.
Functional variants can also contain substitution of similar amino
acids that result in no change or an insignificant change in
function. Alternatively, such substitutions may positively or
negatively affect function to some degree.
[0077] Non-functional variants typically contain one or more
non-conservative amino acid substitutions, deletions, insertions,
inversions, or truncations, or a substitution, insertion, inversion,
or deletion in a critical residue or critical region.
[0078] Amino acids that are essential for function can be
identified by methods known in the art, such as site-directed
mutagenesis or alanine-scanning mutagenesis (Cunningham et al.,
Science 244:1081-1085 (1989)), particularly using the results
provided in FIG. 2. The latter procedure introduces single alanine
mutations at every residue in the molecule. The resulting mutant
molecules are then tested for biological activity, such as protease
activity, or in assays such as an in vitro proliferative activity assay.
Sites that are critical for binding partner/substrate binding can
also be determined by structural analysis such as crystallization,
nuclear magnetic resonance or photoaffinity labeling (Smith et al.,
J. Mol. Biol. 224:899-904 (1992); de Vos et al. Science 255:306-312
(1992)).
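As a non-limiting sketch, the set of single-alanine mutants used in alanine-scanning mutagenesis can be enumerated in silico before any are made and tested. The function name alanine_scan, the one-letter residue code, the mutation labels of the form M1A, and the decision to skip positions that are already alanine are assumptions of this sketch.

```python
def alanine_scan(sequence: str) -> list[tuple[str, str]]:
    """List every single-residue alanine mutant of a peptide.

    Returns (label, mutant_sequence) pairs, where the label gives the
    original residue, its 1-based position, and "A" (e.g. "K3A").
    Positions that are already alanine are skipped (an assumption).
    """
    mutants = []
    for pos, residue in enumerate(sequence, start=1):
        if residue == "A":
            continue
        mutant = sequence[: pos - 1] + "A" + sequence[pos:]
        mutants.append((f"{residue}{pos}A", mutant))
    return mutants


if __name__ == "__main__":
    for label, mutant in alanine_scan("MAK"):
        print(label, mutant)
```

Each enumerated mutant would still need to be produced and assayed for activity as described above; the sketch only organizes the candidates.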
[0079] The present invention further provides fragments of the
protease peptides, in addition to proteins and peptides that
comprise and consist of such fragments, particularly those
comprising the residues identified in FIG. 2. The fragments to
which the invention pertains, however, are not to be construed as
encompassing fragments that may be disclosed publicly prior to the
present invention.
[0080] As used herein, a fragment comprises at least 8, 10, 12, 14,
16, or more contiguous amino acid residues from a protease peptide.
Such fragments can be chosen based on the ability to retain one or
more of the biological activities of the protease peptide or could
be chosen for the ability to perform a function, e.g. bind a
substrate or act as an immunogen. Particularly important fragments
are biologically active fragments, peptides that are, for example,
about 8 or more amino acids in length. Such fragments will
typically comprise a domain or motif of the protease peptide, e.g.,
an active site, a transmembrane domain or a substrate-binding domain.
Further, possible fragments include, but are not limited to, domain
or motif containing fragments, soluble peptide fragments, and
fragments containing immunogenic structures. Predicted domains and
functional sites are readily identifiable by computer programs well
known and readily available to those of skill in the art (e.g.,
PROSITE analysis). The results of one such analysis are provided in
FIG. 2.
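As a rough illustration of how a PROSITE-style pattern search locates a motif in a peptide sequence, the following Python sketch translates a pattern written in PROSITE syntax into a regular expression and reports the matches. The function names, the example pattern and the toy sequence are assumptions chosen for illustration; they are not taken from FIG. 2, and the translation covers only the basic syntax elements (residue classes in square brackets, exclusions in braces, `x` wildcards, repeat counts, and the `<` and `>` terminal anchors).

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{"):       # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_motifs(sequence, pattern):
    """Return (1-based start position, matched residues) for each match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Illustrative pattern and toy sequence only.
    print(find_motifs("MKWVTAAHCLLS", "[LIVM]-[ST]-A-[STAG]-H-C"))
```

A fragment intended to retain a predicted site could then be chosen so that it spans the reported start position and the full length of the match.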
[0081] Polypeptides often contain amino acids other than the 20
amino acids commonly referred to as the 20 naturally occurring
amino acids. Further, many amino acids, including the terminal
amino acids, may be modified by natural processes, such as
processing and other post-translational modifications, or by
chemical modification techniques well known in the art. Common
modifications that occur naturally in protease peptides are
described in basic texts, detailed monographs, and the research
literature, and they are well known to those of skill in the art
(some of these features are identified in FIG. 2).
[0082] Known modifications include, but are not limited to,
acetylation, acylation, ADP-ribosylation, amidation, covalent
attachment of flavin, covalent attachment of a heme moiety,
covalent attachment of a nucleotide or nucleotide derivative,
covalent attachment of a lipid or lipid derivative, covalent
attachment of phosphatidylinositol, cross-linking, cyclization,
disulfide bond formation, demethylation, formation of covalent
crosslinks, formation of cystine, formation of pyroglutamate,
formylation, gamma carboxylation, glycosylation, GPI anchor
formation, hydroxylation, iodination, methylation, myristoylation,
oxidation, proteolytic processing, phosphorylation, prenylation,
racemization, selenoylation, sulfation, transfer-RNA mediated
addition of amino acids to proteins such as arginylation, and
ubiquitination.
[0083] Such modifications are well known to those of skill in the
art and have been described in great detail in the scientific
literature. Several particularly common modifications,
glycosylation, lipid attachment, sulfation, gamma-carboxylation of
glutamic acid residues, hydroxylation and ADP-ribosylation, for
instance, are described in most basic texts, such as
Proteins--Structure and Molecular Properties, 2nd Ed., T. E.
Creighton, W. H. Freeman and Company, New York (1993). Many
detailed reviews are available on this subject, such as by Wold,
F., Posttranslational Covalent Modification of Proteins, B. C.
Johnson, Ed., Academic Press, New York 1-12 (1983); Seifter et al.
(Meth. Enzymol. 182:626-646 (1990)) and Rattan et al. (Ann. N.Y.
Acad. Sci. 663:48-62 (1992)).
[0084] Accordingly, the protease peptides of the present invention
also encompass derivatives or analogs in which a substituted amino
acid residue is not one encoded by the genetic code, in which a
substituent group is included, in which the mature protease peptide
is fused with another compound, such as a compound to increase the
half-life of the protease peptide (for example, polyethylene
glycol), or in which the additional amino acids are fused to the
mature protease peptide, such as a leader or secretory sequence or
a sequence for purification of the mature protease peptide or a
pro-protein sequence.
[0085] Protein/Peptide Uses
[0086] The proteins of the present invention can be used in
substantial and specific assays related to the functional
information provided in the Figures; to raise antibodies or to
elicit another immune response; as a reagent (including the labeled
reagent) in assays designed to quantitatively determine levels of
the protein (or its binding partner or ligand) in biological
fluids; and as markers for tissues in which the corresponding
protein is preferentially expressed (either constitutively or at a
particular stage of tissue differentiation or development or in a
disease state). Where the protein binds or potentially binds to
another protein or ligand (such as, for example, in a
protease-effector protein interaction or protease-ligand
interaction), the protein can be used to identify the binding
partner/ligand so as to develop a system to identify inhibitors of
the binding interaction. Any or all of these uses are capable of
being developed into reagent-grade or kit format for
commercialization.
[0087] Methods for performing the uses listed above are well known
to those skilled in the art. References disclosing such methods
include "Molecular Cloning: A Laboratory Manual", 2d ed., Cold
Spring Harbor Laboratory Press, Sambrook, J., E. F. Fritsch and T.
Maniatis eds., 1989, and "Methods in Enzymology: Guide to Molecular
Cloning Techniques", Academic Press, Berger, S. L. and A. R. Kimmel
eds., 1987.
[0088] The potential uses of the peptides of the present invention
are based primarily on the source of the protein as well as the
class/action of the protein. For example, proteases isolated from
humans and their human/mammalian orthologs serve as targets for
identifying agents for use in mammalian therapeutic applications,
e.g. a human drug, particularly in modulating a biological or
pathological response in a cell or tissue that expresses the
protease. Experimental data as provided in FIG. 1 indicates that
protease proteins of the present invention are expressed in humans
in numerous cancers, including retinoblastomas (eye), melanotic
melanomas (skin), endometrium adenocarcinomas (uterus), and
adenocarcinomas (ovary). Furthermore, expression has also been
observed in the schizophrenic brain and kidney. These expression
patterns have been determined by a virtual northern blot analysis.
In addition, a PCR-based tissue screening panel indicates expression
in human heart. A large percentage of pharmaceutical agents are
being developed that modulate the activity of protease proteins,
particularly members of the ATP-dependent protease subfamily (see
Background of the Invention). The structural and functional
information provided in the Background and Figures provide specific
and substantial uses for the molecules of the present invention,
particularly in combination with the expression information
provided in FIG. 1. Experimental data as provided in FIG. 1
indicates expression in humans in retinoblastomas (eye), melanotic
melanomas (skin), endometrium adenocarcinomas (uterus),
adenocarcinomas (ovary), schizophrenic brain, kidney and human
heart. Such uses can readily be determined using the information
provided herein, that which is known in the art, and routine
experimentation.
[0089] The proteins of the present invention (including variants
and fragments that may have been disclosed prior to the present
invention) are useful for biological assays related to proteases
that are related to members of the ATP-dependent protease
subfamily. Such assays involve any of the known protease functions
or activities or properties useful for diagnosis and treatment of
protease-related conditions that are specific for the subfamily of
proteases to which the protease of the present invention belongs,
particularly in cells and tissues that express the protease.
Experimental data as provided in FIG. 1 indicates that protease
proteins of the present invention are expressed in humans in
numerous cancers, including retinoblastomas (eye), melanotic
melanomas (skin), endometrium adenocarcinomas (uterus), and
adenocarcinomas (ovary). Furthermore, expression has also been
observed in the schizophrenic brain and kidney. These expression
patterns have been determined by a virtual northern blot analysis.
In addition, a PCR-based tissue screening panel indicates expression
in human heart.
[0090] The proteins of the present invention are also useful in
drug screening assays, in cell-based or cell-free systems.
Cell-based systems can be native, i.e., cells that normally express
the protease, as a biopsy or expanded in cell culture. Experimental
data as provided in FIG. 1 indicates expression in humans in
retinoblastomas (eye), melanotic melanomas (skin), endometrium
adenocarcinomas (uterus), adenocarcinomas (ovary), schizophrenic
brain, kidney and human heart. In an alternate embodiment,
cell-based assays involve recombinant host cells expressing the
protease protein.
[0091] The polypeptides can be used to identify compounds that
modulate protease activity of the protein in its natural state or
an altered form that causes a specific disease or pathology
associated with the protease. Both the proteases of the present
invention and appropriate variants and fragments can be used in
high-throughput screens to assay candidate compounds for the
ability to bind to the protease. These compounds can be further
screened against a functional protease to determine the effect of
the compound on the protease activity. Further, these compounds can
be tested in animal or invertebrate systems to determine
activity/effectiveness. Compounds can be identified that activate
(agonist) or inactivate (antagonist) the protease to a desired
degree.
[0092] Further, the proteins of the present invention can be used
to screen a compound for the ability to stimulate or inhibit
interaction between the protease protein and a molecule that
normally interacts with the protease protein, e.g. a substrate or a
component of the signal pathway with which the protease protein normally
interacts (for example, a protease). Such assays typically include
the steps of combining the protease protein with a candidate
compound under conditions that allow the protease protein, or
fragment, to interact with the target molecule, and detecting the
formation of a complex between the protein and the target or
detecting the biochemical consequence of the interaction of the
protease protein and the target, such as any of the associated
effects of signal transduction such as protein cleavage, cAMP
turnover, and adenylate cyclase activation, etc.
[0093] Candidate compounds include, for example, 1) peptides such
as soluble peptides, including Ig-tailed fusion peptides and
members of random peptide libraries (see, e.g., Lam et al., Nature
354:82-84 (1991); Houghten et al., Nature 354:84-86 (1991)) and
combinatorial chemistry-derived molecular libraries made of D-
and/or L-configuration amino acids; 2) phosphopeptides (e.g.,
members of random and partially degenerate, directed phosphopeptide
libraries, see, e.g., Songyang et al., Cell 72:767-778 (1993)); 3)
antibodies (e.g., polyclonal, monoclonal, humanized,
anti-idiotypic, chimeric, and single chain antibodies as well as
Fab, F(ab')₂, Fab expression library fragments, and
epitope-binding fragments of antibodies); and 4) small organic and
inorganic molecules (e.g., molecules obtained from combinatorial
and natural product libraries).
[0094] One candidate compound is a soluble fragment of the protease
that competes for substrate binding. Other candidate compounds
include mutant proteases or appropriate fragments containing
mutations that affect protease function and thus compete for
substrate. Accordingly, a fragment that competes for substrate, for
example with a higher affinity, or a fragment that binds substrate
but does not allow release, is encompassed by the invention.
[0095] The invention further includes other end point assays to
identify compounds that modulate (stimulate or inhibit) protease
activity. The assays typically involve an assay of events in the
signal transduction pathway that indicate protease activity. Thus,
the cleavage of a substrate, inactivation/activation of a protein,
a change in the expression of genes that are up- or down-regulated
in response to the protease protein dependent signal cascade can be
assayed.
[0096] Any of the biological or biochemical functions mediated by
the protease can be used as an endpoint assay. These include all of
the biochemical or biochemical/biological events described herein,
in the references cited herein, incorporated by reference for these
endpoint assay targets, and other functions known to those of
ordinary skill in the art or that can be readily identified using
the information provided in the Figures, particularly FIG. 2.
Specifically, a biological function of a cell or tissue that
expresses the protease can be assayed. Experimental data as
provided in FIG. 1 indicates that protease proteins of the present
invention are expressed in humans in numerous cancers, including
retinoblastomas (eye), melanotic melanomas (skin), endometrium
adenocarcinomas (uterus), and adenocarcinomas (ovary). Furthermore,
expression has also been observed in the schizophrenic brain and
kidney. These expression patterns have been determined by a virtual
northern blot analysis. In addition, a PCR-based tissue screening
panel indicates expression in human heart. Binding and/or
activating compounds can also be screened by using chimeric
protease proteins in which the amino terminal extracellular domain,
or parts thereof, the entire transmembrane domain or subregions,
such as any of the seven transmembrane segments or any of the
intracellular or extracellular loops and the carboxy terminal
intracellular domain, or parts thereof, can be replaced by
heterologous domains or subregions. For example, a
substrate-binding region can be used that interacts with a
different substrate than that which is recognized by the native
protease. Accordingly, a different set of signal transduction
components is available as an end-point assay for activation. This
allows for assays to be performed in other than the specific host
cell from which the protease is derived.
[0097] The proteins of the present invention are also useful in
competition binding assays in methods designed to discover
compounds that interact with the protease (e.g. binding partners
and/or ligands). Thus, a compound is exposed to a protease
polypeptide under conditions that allow the compound to bind or to
otherwise interact with the polypeptide. Soluble protease
polypeptide is also added to the mixture. If the test compound
interacts with the soluble protease polypeptide, it decreases the
amount of complex formed or activity from the protease target. This
type of assay is particularly useful in cases in which compounds
are sought that interact with specific regions of the protease.
Thus, the soluble polypeptide that competes with the target
protease region is designed to contain peptide sequences
corresponding to the region of interest.
[0098] To perform cell-free drug screening assays, it is sometimes
desirable to immobilize either the protease protein, or fragment,
or its target molecule to facilitate separation of complexes from
uncomplexed forms of one or both of the proteins, as well as to
accommodate automation of the assay.
[0099] Techniques for immobilizing proteins on matrices can be used
in the drug screening assays. In one embodiment, a fusion protein
can be provided which adds a domain that allows the protein to be
bound to a matrix. For example, glutathione-S-transferase fusion
proteins can be adsorbed onto glutathione sepharose beads (Sigma
Chemical, St. Louis, Mo.) or glutathione derivatized microtitre
plates, which are then combined with the cell lysates (e.g.,
³⁵S-labeled) and the candidate compound, and the mixture is
incubated under conditions conducive to complex formation (e.g., at
physiological conditions for salt and pH). Following incubation,
the beads are washed to remove any unbound label, the matrix is
immobilized, and the radiolabel is determined directly, or in the
supernatant after the complexes are dissociated. Alternatively, the
complexes can be dissociated from the matrix, separated by
SDS-PAGE, and the level of protease-binding protein found in the
bead fraction quantitated from the gel using standard
electrophoretic techniques. For example, either the polypeptide or
its target molecule can be immobilized utilizing conjugation of
biotin and streptavidin using techniques well known in the art.
Alternatively, antibodies reactive with the protein but which do
not interfere with binding of the protein to its target molecule
can be derivatized to the wells of the plate, and the protein
trapped in the wells by antibody conjugation. Preparations of a
protease-binding protein and a candidate compound are incubated in
the protease protein-presenting wells and the amount of complex
trapped in the well can be quantitated. Methods for detecting such
complexes, in addition to those described above for the
GST-immobilized complexes, include immunodetection of complexes
using antibodies reactive with the protease protein target
molecule, or which are reactive with protease protein and compete
with the target molecule, as well as enzyme-linked assays which
rely on detecting an enzymatic activity associated with the target
molecule.
[0100] Agents that modulate one of the proteases of the present
invention can be identified using one or more of the above assays,
alone or in combination. It is generally preferable to use a
cell-based or cell-free system first and then confirm activity in
an animal or other model system. Such model systems are well known
in the art and can readily be employed in this context.
[0101] Modulators of protease protein activity identified according
to these drug screening assays can be used to treat a subject with
a disorder mediated by the protease pathway, by treating cells or
tissues that express the protease. Experimental data as provided in
FIG. 1 indicates expression in humans in retinoblastomas (eye),
melanotic melanomas (skin), endometrium adenocarcinomas (uterus),
adenocarcinomas (ovary), schizophrenic brain, kidney and human
heart. These methods of treatment include the steps of
administering a modulator of protease activity in a pharmaceutical
composition to a subject in need of such treatment, the modulator
being identified as described herein.
[0102] In yet another aspect of the invention, the protease
proteins can be used as "bait proteins" in a two-hybrid assay or
three-hybrid assay (see, e.g., U.S. Pat. No. 5,283,317; Zervos et
al. (1993) Cell 72:223-232; Madura et al. (1993) J. Biol. Chem.
268:12046-12054; Bartel et al. (1993) Biotechniques 14:920-924;
Iwabuchi et al. (1993) Oncogene 8:1693-1696; and Brent WO94/10300),
to identify other proteins that bind to or interact with the
protease and are involved in protease activity. Such
protease-binding proteins are also likely to be involved in the
propagation of signals by the protease proteins or protease targets
as, for example, downstream elements of a protease-mediated
signaling pathway. Alternatively, such protease-binding proteins
are likely to be protease inhibitors.
[0103] The two-hybrid system is based on the modular nature of most
transcription factors, which consist of separable DNA-binding and
activation domains. Briefly, the assay utilizes two different DNA
constructs. In one construct, the gene that codes for a protease
protein is fused to a gene encoding the DNA binding domain of a
known transcription factor (e.g., GAL-4). In the other construct, a
DNA sequence, from a library of DNA sequences, that encodes an
unidentified protein ("prey" or "sample") is fused to a gene that
codes for the activation domain of the known transcription factor.
If the "bait" and the "prey" proteins are able to interact, in
vivo, forming a protease-dependent complex, the DNA-binding and
activation domains of the transcription factor are brought into
close proximity. This proximity allows transcription of a reporter
gene (e.g., LacZ) which is operably linked to a transcriptional
regulatory site responsive to the transcription factor. Expression
of the reporter gene can be detected and cell colonies containing
the functional transcription factor can be isolated and used to
obtain the cloned gene which encodes the protein which interacts
with the protease protein.
[0104] This invention further pertains to novel agents identified
by the above-described screening assays. Accordingly, it is within
the scope of this invention to further use an agent identified as
described herein in an appropriate animal model. For example, an
agent identified as described herein (e.g., a protease-modulating
agent, an antisense protease nucleic acid molecule, a
protease-specific antibody, or a protease-binding partner) can be
used in an animal or other model to determine the efficacy,
toxicity, or side effects of treatment with such an agent.
Alternatively, an agent identified as described herein can be used
in an animal or other model to determine the mechanism of action of
such an agent. Furthermore, this invention pertains to uses of
novel agents identified by the above-described screening assays for
treatments as described herein.
[0105] The protease proteins of the present invention are also
useful to provide a target for diagnosing a disease or
predisposition to disease mediated by the peptide. Accordingly, the
invention provides methods for detecting the presence, or levels
of, the protein (or encoding mRNA) in a cell, tissue, or organism.
Experimental data as provided in FIG. 1 indicates expression in
humans in retinoblastomas (eye), melanotic melanomas (skin),
endometrium adenocarcinomas (uterus), adenocarcinomas (ovary),
schizophrenic brain, kidney and human heart. The method involves
contacting a biological sample with a compound capable of
interacting with the protease protein such that the interaction can
be detected. Such an assay can be provided in a single detection
format or a multi-detection format such as an antibody chip
array.
[0106] One agent for detecting a protein in a sample is an antibody
capable of selectively binding to the protein. A biological sample
includes tissues, cells and biological fluids isolated from a
subject, as well as tissues, cells and fluids present within a
subject.
[0107] The peptides of the present invention also provide targets
for diagnosing protein activity, disease, or predisposition
to disease, in a patient having a variant peptide, particularly
activities and conditions that are known for other members of the
family of proteins to which the present one belongs. Thus, the
peptide can be isolated from a biological sample and assayed for
the presence of a genetic mutation that results in aberrant
peptide. This includes amino acid substitution, deletion,
insertion, rearrangement (as the result of aberrant splicing
events), and inappropriate post-translational modification.
Analytic methods include altered electrophoretic mobility, altered
tryptic peptide digest, altered protease activity in cell-based or
cell-free assay, alteration in substrate or antibody-binding
pattern, altered isoelectric point, direct amino acid sequencing,
and any other of the known assay techniques useful for detecting
mutations in a protein. Such an assay can be provided in a single
detection format or a multi-detection format such as an antibody
chip array.
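To illustrate how an amino acid substitution can produce an altered tryptic peptide digest, the following Python sketch performs an in silico digest of a reference and a variant sequence and lists the peptides that differ. It is a minimal sketch under stated assumptions: the function names and the toy sequences are hypothetical, and cleavage is modeled with the commonly used simplified rule that trypsin cuts on the carboxyl side of lysine (K) or arginine (R) unless the next residue is proline (P).

```python
import re


def tryptic_peptides(sequence):
    """Split a one-letter-code sequence after K or R, except before P.

    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def digest_difference(reference, variant):
    """Return (peptides lost, peptides gained) in the variant digest."""
    ref = set(tryptic_peptides(reference))
    var = set(tryptic_peptides(variant))
    return sorted(ref - var), sorted(var - ref)


if __name__ == "__main__":
    # Toy sequences: the variant carries a single P-to-A substitution,
    # which exposes a cleavage site that the proline had blocked.
    lost, gained = digest_difference("MKAPRGLKPSR", "MKAPRGLKASR")
    print("lost:", lost)
    print("gained:", gained)
```

In this toy case a single substitution replaces one reference peptide with two shorter ones, which is the kind of change that a peptide map of the digest would be expected to reveal.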
[0108] In vitro techniques for detection of peptide include enzyme
linked immunosorbent assays (ELISAs), Western blots,
immunoprecipitations and immunofluorescence using a detection
reagent, such as an antibody or protein binding agent.
Alternatively, the peptide can be detected in vivo in a subject by
introducing into the subject a labeled anti-peptide antibody or
other types of detection agent. For example, the antibody can be
labeled with a radioactive marker whose presence and location in a
subject can be detected by standard imaging techniques.
Particularly useful are methods that detect the allelic variant of
a peptide expressed in a subject and methods which detect fragments
of a peptide in a sample.
[0109] The peptides are also useful in pharmacogenomic analysis.
Pharmacogenomics deals with clinically significant hereditary
variations in the response to drugs due to altered drug disposition
and abnormal action in affected persons. See, e.g., Eichelbaum, M.
(Clin. Exp. Pharmacol. Physiol. 23(10-11):983-985 (1996)), and
Linder, M. W. (Clin. Chem. 43(2):254-266 (1997)). The clinical
outcomes of these variations result in severe toxicity of
therapeutic drugs in certain individuals or therapeutic failure of
drugs in certain individuals as a result of individual variation in
metabolism. Thus, the genotype of the individual can determine the
way a therapeutic compound acts on the body or the way the body
metabolizes the compound. Further, the activity of drug
metabolizing enzymes affects both the intensity and duration of
drug action. Thus, the pharmacogenomics of the individual permit
the selection of effective compounds and effective dosages of such
compounds for prophylactic or therapeutic treatment based on the
individual's genotype. The discovery of genetic polymorphisms in
some drug metabolizing enzymes has explained why some patients do
not obtain the expected drug effects, show an exaggerated drug
effect, or experience serious toxicity from standard drug dosages.
Polymorphisms can be expressed in the phenotype of the extensive
metabolizer and the phenotype of the poor metabolizer. Accordingly,
genetic polymorphism may lead to allelic protein variants of the
protease protein in which one or more of the protease functions in
one population is different from those in another population. The
peptides thus provide a target to ascertain a genetic predisposition
that can affect treatment modality. Thus, in a ligand-based
treatment, polymorphism may give rise to amino terminal
extracellular domains and/or other substrate-binding regions that
are more or less active in substrate binding, and protease
activation. Accordingly, substrate dosage would necessarily be
modified to maximize the therapeutic effect within a given
population containing a polymorphism. As an alternative to
genotyping, specific polymorphic peptides could be identified.
[0110] The peptides are also useful for treating a disorder
characterized by an absence of, inappropriate, or unwanted
expression of the protein. Experimental data as provided in FIG. 1
indicates expression in humans in retinoblastomas (eye), melanotic
melanomas (skin), endometrium adenocarcinomas (uterus),
adenocarcinomas (ovary), schizophrenic brain, kidney and human
heart. Accordingly, methods for treatment include the use of the
protease protein or fragments.
[0111] Antibodies
[0112] The invention also provides antibodies that selectively bind
to one of the peptides of the present invention, a protein
comprising such a peptide, as well as variants and fragments
thereof. As used herein, an antibody selectively binds a target
peptide when it binds the target peptide and does not significantly
bind to unrelated proteins. An antibody is still considered to
selectively bind a peptide even if it also binds to other proteins
that are not substantially homologous with the target peptide so
long as such proteins share homology with a fragment or domain of
the peptide target of the antibody. In this case, it would be
understood that antibody binding to the peptide is still selective
despite some degree of cross-reactivity.
[0113] As used herein, antibodies are defined in terms consistent
with those recognized within the art: they are multi-subunit
proteins produced by a mammalian organism in response to an antigen
challenge. The antibodies of the present invention include
polyclonal antibodies and monoclonal antibodies, as well as
fragments of such antibodies, including, but not limited to, Fab or
F(ab')₂, and Fv fragments.
[0114] Many methods are known for generating and/or identifying
antibodies to a given target peptide. Several such methods are
described by Harlow, Antibodies, Cold Spring Harbor Press,
(1989).
[0115] In general, to generate antibodies, an isolated peptide is
used as an immunogen and is administered to a mammalian organism,
such as a rat, rabbit or mouse. The full-length protein, an
antigenic peptide fragment or a fusion protein can be used.
Particularly important fragments are those covering functional
domains, such as the domains identified in FIG. 2, and domains of
sequence homology or divergence amongst the family, such as those
that can readily be identified using protein alignment methods and
as presented in the Figures.
[0116] Antibodies are preferably prepared from regions or discrete
fragments of the protease proteins. Antibodies can be prepared from
any region of the peptide as described herein. However, preferred
regions will include those involved in function/activity and/or
protease/binding partner interaction. FIG. 2 can be used to
identify particularly important regions while sequence alignment
can be used to identify conserved and unique sequence
fragments.
[0117] An antigenic fragment will typically comprise at least 8
contiguous amino acid residues. The antigenic peptide can comprise,
however, at least 10, 12, 14, 16 or more amino acid residues. Such
fragments can be selected based on a physical property, such as fragments
that correspond to regions located on the surface of the
protein, e.g., hydrophilic regions, or can be selected based on
sequence uniqueness (see FIG. 2).
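One simple way to look for a hydrophilic candidate fragment of at least 8 contiguous residues is a sliding-window hydropathy scan. The Python sketch below is an assumption-laden illustration rather than the method used to prepare FIG. 2: it uses the Kyte-Doolittle hydropathy scale (lower values are more hydrophilic), and the function name, window size default and toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values; more negative means more hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophilic_window(sequence, window=8):
    """Return (1-based start, fragment, mean hydropathy) of the most
    hydrophilic window of the given length in a one-letter-code sequence."""
    if len(sequence) < window:
        raise ValueError("sequence is shorter than the window")
    best_start, best_score = 0, None
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        score = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if best_score is None or score < best_score:
            best_start, best_score = start, score
    return best_start + 1, sequence[best_start:best_start + window], best_score


if __name__ == "__main__":
    # Toy sequence: a charged stretch flanked by leucine runs.
    print(most_hydrophilic_window("LLLLLLLLKDEKRDEKLLLLLLLL"))
```

A low mean hydropathy only suggests surface exposure; any fragment picked this way would still need to be checked for sequence uniqueness and tested experimentally for antigenicity.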
[0118] Detection of an antibody of the present invention can be
facilitated by coupling (i.e., physically linking) the antibody to
a detectable substance. Examples of detectable substances include
various enzymes, prosthetic groups, fluorescent materials,
luminescent materials, bioluminescent materials, and radioactive
materials. Examples of suitable enzymes include horseradish
peroxidase, alkaline phosphatase, β-galactosidase, or
acetylcholinesterase; examples of suitable prosthetic group
complexes include streptavidin/biotin and avidin/biotin; examples
of suitable fluorescent materials include umbelliferone,
fluorescein, fluorescein isothiocyanate, rhodamine,
dichlorotriazinylamine fluorescein, dansyl chloride or
phycoerythrin; an example of a luminescent material is
luminol; examples of bioluminescent materials include luciferase,
luciferin, and aequorin; and examples of suitable radioactive
materials include ¹²⁵I, ¹³¹I, ³⁵S or ³H.
[0119] Antibody Uses
[0120] The antibodies can be used to isolate one of the proteins of
the present invention by standard techniques, such as affinity
chromatography or immunoprecipitation. The antibodies can
facilitate the purification of the natural protein from cells and
recombinantly produced protein expressed in host cells. In
addition, such antibodies are useful to detect the presence of one
of the proteins of the present invention in cells or tissues to
determine the pattern of expression of the protein among various
tissues in an organism and over the course of normal development.
Experimental data as provided in FIG. 1 indicates that protease
proteins of the present invention are expressed in humans in
numerous cancers, including retinoblastomas (eye), melanotic
melanomas (skin), endometrium adenocarcinomas (uterus), and
adenocarcinomas (ovary). Furthermore, expression has also been
observed in the schizophrenic brain and kidney. These expression
patterns have been determined by a virtual northern blot analysis.
In addition, a PCR-based tissue screening panel indicates expression
in human heart. Further, such antibodies can be used to detect
protein in situ, in vitro, or in a cell lysate or supernatant in
order to evaluate the abundance and pattern of expression. Also,
such antibodies can be used to assess abnormal tissue distribution
or abnormal expression during development or progression of a
biological condition. Antibody detection of circulating fragments
of the full length protein can be used to identify turnover.
[0121] Further, the antibodies can be used to assess expression in
disease states such as in active stages of the disease or in an
individual with a predisposition toward disease related to the
protein's function. When a disorder is caused by an inappropriate
tissue distribution, developmental expression, level of expression
of the protein, or expressed/processed form, the antibody can be
prepared against the normal protein. Experimental data as provided
in FIG. 1 indicates expression in humans in retinoblastomas (eye),
melanotic melanomas (skin), endometrium adenocarcinomas (uterus),
adenocarcinomas (ovary), schizophrenic brain, kidney and human
heart. If a disorder is characterized by a specific mutation in the
protein, antibodies specific for this mutant protein can be used to
assay for the presence of the specific mutant protein.
[0122] The antibodies can also be used to assess normal and
aberrant subcellular localization of cells in the various tissues
in an organism. Experimental data as provided in FIG. 1 indicates
expression in humans in retinoblastomas (eye), melanotic melanomas
(skin), endometrium adenocarcinomas (uterus), adenocarcinomas
(ovary), schizophrenic brain, kidney and human heart. The
diagnostic uses can be applied, not only in genetic testing, but
also in monitoring a treatment modality. Accordingly, where
treatment is ultimately aimed at correcting expression level or the
presence of aberrant sequence and aberrant tissue distribution or
developmental expression, antibodies directed against the protein
or relevant fragments can be used to monitor therapeutic
efficacy.
[0123] Additionally, antibodies are useful in pharmacogenomic
analysis. Thus, antibodies prepared against polymorphic proteins
can be used to identify individuals that require modified treatment
modalities. The antibodies are also useful as diagnostic tools as
an immunological marker for aberrant protein analyzed by
electrophoretic mobility, isoelectric point, tryptic peptide
digest, and other physical assays known to those in the art.
[0124] The antibodies are also useful for tissue typing.
Experimental data as provided in FIG. 1 indicates expression in
humans in retinoblastomas (eye), melanotic melanomas (skin),
endometrium adenocarcinomas (uterus), adenocarcinomas (ovary),
schizophrenic brain, kidney and human heart. Thus, where a specific
protein has been correlated with expression in a specific tissue,
antibodies that are specific for this protein can be used to
identify a tissue type.
[0125] The antibodies are also useful for inhibiting protein
function, for example, blocking the binding of the protease peptide
to a binding partner such as a substrate. These uses can also be
applied in a therapeutic context in which treatment involves
inhibiting the protein's function. An antibody can be used, for
example, to block binding, thus modulating (agonizing or
antagonizing) the peptide's activity. Antibodies can be prepared
against specific fragments containing sites required for function
or against intact protein that is associated with a cell or cell
membrane. See FIG. 2 for structural information relating to the
proteins of the present invention.
[0126] The invention also encompasses kits for using antibodies to
detect the presence of a protein in a biological sample. The kit
can comprise antibodies such as a labeled or labelable antibody and
a compound or agent for detecting protein in a biological sample;
means for determining the amount of protein in the sample; means
for comparing the amount of protein in the sample with a standard;
and instructions for use. Such a kit can be supplied to detect a
single protein or epitope or can be configured to detect one of a
multitude of epitopes, such as in an antibody detection array.
Arrays are described in detail below for nucleic acid arrays and
similar methods have been developed for antibody arrays.
[0127] Nucleic Acid Molecules
[0128] The present invention further provides isolated nucleic acid
molecules that encode a protease peptide or protein of the present
invention (cDNA, transcript and genomic sequence). Such nucleic
acid molecules will consist of, consist essentially of, or comprise
a nucleotide sequence that encodes one of the protease peptides of
the present invention, an allelic variant thereof, or an ortholog
or paralog thereof.
[0129] As used herein, an "isolated" nucleic acid molecule is one
that is separated from other nucleic acid present in the natural
source of the nucleic acid. Preferably, an "isolated" nucleic acid
is free of sequences which naturally flank the nucleic acid (i.e.,
sequences located at the 5' and 3' ends of the nucleic acid) in the
genomic DNA of the organism from which the nucleic acid is derived.
However, there can be some flanking nucleotide sequences, for
example up to about 5 KB, 4 KB, 3 KB, 2 KB, or 1 KB or less,
particularly contiguous peptide encoding sequences and peptide
encoding sequences within the same gene but separated by introns in
the genomic sequence. The important point is that the nucleic acid
is isolated from remote and unimportant flanking sequences such
that it can be subjected to the specific manipulations described
herein such as recombinant expression, preparation of probes and
primers, and other uses specific to the nucleic acid sequences.
[0130] Moreover, an "isolated" nucleic acid molecule, such as a
transcript/cDNA molecule, can be substantially free of other
cellular material, or culture medium when produced by recombinant
techniques, or chemical precursors or other chemicals when
chemically synthesized. However, the nucleic acid molecule can be
fused to other coding or regulatory sequences and still be
considered isolated.
[0131] For example, recombinant DNA molecules contained in a vector
are considered isolated. Further examples of isolated DNA molecules
include recombinant DNA molecules maintained in heterologous host
cells or purified (partially or substantially) DNA molecules in
solution. Isolated RNA molecules include in vivo or in vitro RNA
transcripts of the isolated DNA molecules of the present invention.
Isolated nucleic acid molecules according to the present invention
further include such molecules produced synthetically.
[0132] Accordingly, the present invention provides nucleic acid
molecules that consist of the nucleotide sequence shown in FIG. 1
or 3 (SEQ ID NO:1, transcript sequence and SEQ ID NO:3, genomic
sequence), or any nucleic acid molecule that encodes the protein
provided in FIG. 2, SEQ ID NO:2. A nucleic acid molecule consists
of a nucleotide sequence when the nucleotide sequence is the
complete nucleotide sequence of the nucleic acid molecule.
[0133] The present invention further provides nucleic acid
molecules that consist essentially of the nucleotide sequence shown
in FIG. 1 or 3 (SEQ ID NO:1, transcript sequence and SEQ ID NO:3,
genomic sequence), or any nucleic acid molecule that encodes the
protein provided in FIG. 2, SEQ ID NO:2. A nucleic acid molecule
consists essentially of a nucleotide sequence when such a
nucleotide sequence is present with only a few additional nucleic
acid residues in the final nucleic acid molecule.
[0134] The present invention further provides nucleic acid
molecules that comprise the nucleotide sequences shown in FIG. 1 or
3 (SEQ ID NO:1, transcript sequence and SEQ ID NO:3, genomic
sequence), or any nucleic acid molecule that encodes the protein
provided in FIG. 2, SEQ ID NO:2. A nucleic acid molecule comprises
a nucleotide sequence when the nucleotide sequence is at least part
of the final nucleotide sequence of the nucleic acid molecule. In
such a fashion, the nucleic acid molecule can be only the
nucleotide sequence or have additional nucleic acid residues, such
as nucleic acid residues that are naturally associated with it or
heterologous nucleotide sequences. Such a nucleic acid molecule can
have a few additional nucleotides or can comprise several hundred
or more additional nucleotides. A brief description of how various
types of these nucleic acid molecules can be readily made/isolated
is provided below.
[0135] In FIGS. 1 and 3, both coding and non-coding sequences are
provided. Because of the source of the present invention, human
genomic sequence (FIG. 3) and cDNA/transcript sequences (FIG. 1),
the nucleic acid molecules in the Figures will contain genomic
intronic sequences, 5' and 3' non-coding sequences, gene regulatory
regions and non-coding intergenic sequences. In general such
sequence features are either noted in FIGS. 1 and 3 or can readily
be identified using computational tools known in the art. As
discussed below, some of the non-coding regions, particularly gene
regulatory elements such as promoters, are useful for a variety of
purposes, e.g. control of heterologous gene expression, target for
identifying gene activity modulating compounds, and are
particularly claimed as fragments of the genomic sequence provided
herein.
[0136] The isolated nucleic acid molecules can encode the mature
protein plus additional amino or carboxyl-terminal amino acids, or
amino acids interior to the mature peptide (when the mature form
has more than one peptide chain, for instance). Such sequences may
play a role in processing of a protein from precursor to a mature
form, facilitate protein trafficking, prolong or shorten protein
half-life or facilitate manipulation of a protein for assay or
production, among other things. As generally is the case in situ,
the additional amino acids may be processed away from the mature
protein by cellular enzymes.
[0137] As mentioned above, the isolated nucleic acid molecules
include, but are not limited to, the sequence encoding the protease
peptide alone, the sequence encoding the mature peptide and
additional coding sequences, such as a leader or secretory sequence
(e.g., a pre-pro or pro-protein sequence), the sequence encoding
the mature peptide, with or without the additional coding
sequences, plus additional non-coding sequences, for example
introns and non-coding 5' and 3' sequences such as transcribed but
non-translated sequences that play a role in transcription, mRNA
processing (including splicing and polyadenylation signals),
ribosome binding and stability of mRNA. In addition, the nucleic
acid molecule may be fused to a marker sequence encoding, for
example, a peptide that facilitates purification.
[0138] Isolated nucleic acid molecules can be in the form of RNA,
such as mRNA, or in the form of DNA, including cDNA and genomic DNA
obtained by cloning or produced by chemical synthetic techniques or
by a combination thereof. The nucleic acid, especially DNA, can be
double-stranded or single-stranded. Single-stranded nucleic acid
can be the coding strand (sense strand) or the non-coding strand
(anti-sense strand).
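By way of illustration only, the relationship between the coding (sense) strand and the non-coding (anti-sense) strand can be sketched in a few lines of Python. The function name `reverse_complement` and the short example sequence are assumptions made for this sketch and are not taken from the Figures.

```python
# Illustrative sketch; the sequence below is hypothetical, not SEQ ID NO:1.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary DNA strand, written 5' to 3'."""
    # Complement each base, then reverse so the result reads 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]


sense = "ATGGCCAAGT"
print(reverse_complement(sense))  # ACTTGGCCAT
```

Applying the function twice returns the original strand, which is a convenient sanity check.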
[0139] The invention further provides nucleic acid molecules that
encode fragments of the peptides of the present invention as well
as nucleic acid molecules that encode obvious variants of the
protease proteins of the present invention that are described
above. Such nucleic acid molecules may be naturally occurring, such
as allelic variants (same locus), paralogs (different locus), and
orthologs (different organism), or may be constructed by
recombinant DNA methods or by chemical synthesis. Such
non-naturally occurring variants may be made by mutagenesis
techniques, including those applied to nucleic acid molecules,
cells, or organisms. Accordingly, as discussed above, the variants
can contain nucleotide substitutions, deletions, inversions and
insertions. Variation can occur in either or both the coding and
non-coding regions. The variations can produce both conservative
and non-conservative amino acid substitutions.
[0140] The present invention further provides non-coding fragments
of the nucleic acid molecules provided in FIGS. 1 and 3. Preferred
non-coding fragments include, but are not limited to, promoter
sequences, enhancer sequences, gene modulating sequences and gene
termination sequences. Such fragments are useful in controlling
heterologous gene expression and in developing screens to identify
gene-modulating agents. A promoter can readily be identified as
being 5' to the ATG start site in the genomic sequence provided in
FIG. 3.
[0141] A fragment comprises a contiguous nucleotide sequence
greater than 12 or more nucleotides. Further, a fragment could be at
least 30, 40, 50, 100, 250 or 500 nucleotides in length. The length
of the fragment will be based on its intended use. For example, the
fragment can encode epitope bearing regions of the peptide, or can
be useful as DNA probes and primers. Such fragments can be isolated
using the known nucleotide sequence to synthesize an
oligonucleotide probe. A labeled probe can then be used to screen a
cDNA library, genomic DNA library, or mRNA to isolate nucleic acid
corresponding to the coding region. Further, primers can be used in
PCR reactions to clone specific regions of the gene.
[0142] A probe/primer typically comprises a substantially purified
oligonucleotide or oligonucleotide pair. The oligonucleotide
typically comprises a region of nucleotide sequence that hybridizes
under stringent conditions to at least about 12, 20, 25, 40, 50 or
more consecutive nucleotides.
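As a non-limiting sketch of how candidate probe or primer regions of a chosen length might be enumerated from a known nucleotide sequence, the following Python generator lists every contiguous window of a given size. The name `candidate_probes`, the default length of 12, and the example sequence are assumptions for illustration; an actual probe would additionally be selected for specificity and hybridization behavior.

```python
def candidate_probes(sequence: str, length: int = 12):
    """Yield (start, fragment) for each contiguous window of `length` nucleotides.

    `start` is a 0-based offset into `sequence`. A sequence shorter than
    `length` yields nothing.
    """
    if length < 1:
        raise ValueError("length must be a positive integer")
    for start in range(len(sequence) - length + 1):
        yield start, sequence[start:start + length]


# Hypothetical 15-nucleotide sequence, used only to show the windowing.
for start, fragment in candidate_probes("ACGTACGTACGTACG", 12):
    print(start, fragment)
```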
[0143] Orthologs, homologs, and allelic variants can be identified
using methods well known in the art. As described in the Peptide
Section, these variants comprise a nucleotide sequence encoding a
peptide that is typically 60-70%, 70-80%, 80-90%, and more
typically at least about 90-95% or more homologous to the
nucleotide sequence shown in the Figure sheets or a fragment of
this sequence. Such nucleic acid molecules can readily be
identified as being able to hybridize under moderate to stringent
conditions, to the nucleotide sequence shown in the Figure sheets
or a fragment of the sequence. Allelic variants can readily be
determined by genetic locus of the encoding gene.
[0144] FIG. 3 provides information on a SNP that has been found in
the gene encoding the protease protein of the present invention.
Specifically, a thymine indel at position 12469 was identified.
[0145] As used herein, the term "hybridizes under stringent
conditions" is intended to describe conditions for hybridization
and washing under which nucleotide sequences encoding a peptide at
least 60-70% homologous to each other typically remain hybridized
to each other. The conditions can be such that sequences at least
about 60%, at least about 70%, or at least about 80% or more
homologous to each other typically remain hybridized to each other.
Such stringent conditions are known to those skilled in the art and
can be found in Current Protocols in Molecular Biology, John Wiley
& Sons, N.Y. (1989), 6.3.1-6.3.6. One example of stringent
hybridization conditions is hybridization in 6× sodium
chloride/sodium citrate (SSC) at about 45°C, followed by one or more
washes in 0.2× SSC, 0.1% SDS at 50-65°C. Examples of moderate
to low stringency hybridization conditions are well known in the
art.
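The percentage thresholds recited above can be made concrete with a simplified calculation. The sketch below computes percent identity over two sequences that are assumed to be already aligned, ungapped, and of equal length; this is a simplification adopted for illustration and is not the sequence comparison method defined in the Peptide Section, nor does it predict actual hybridization behavior. The function name and the example sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions that match between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    # Case-insensitive, position-by-position comparison.
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# Two hypothetical 10-nucleotide sequences differing at two positions.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAA"))  # 80.0
```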
[0146] Nucleic Acid Molecule Uses
[0147] The nucleic acid molecules of the present invention are
useful for probes, primers, chemical intermediates, and in
biological assays. The nucleic acid molecules are useful as a
hybridization probe for messenger RNA, transcript/cDNA and genomic
DNA to isolate full-length cDNA and genomic clones encoding the
peptide described in FIG. 2 and to isolate cDNA and genomic clones
that correspond to variants (alleles, orthologs, etc.) producing
the same or related peptides shown in FIG. 2. As illustrated in
FIG. 3, an insertion/deletion SNP variant ("indel") was identified
at position 12469.
[0148] The probe can correspond to any sequence along the entire
length of the nucleic acid molecules provided in the Figures.
Accordingly, it could be derived from 5' noncoding regions, the
coding region, and 3' noncoding regions. However, as discussed,
fragments are not to be construed as encompassing fragments
disclosed prior to the present invention.
[0149] The nucleic acid molecules are also useful as primers for
PCR to amplify any given region of a nucleic acid molecule and are
useful to synthesize antisense molecules of desired length and
sequence.
[0150] The nucleic acid molecules are also useful for constructing
recombinant vectors. Such vectors include expression vectors that
express a portion of, or all of, the peptide sequences. Vectors
also include insertion vectors, used to integrate into another
nucleic acid molecule sequence, such as into the cellular genome,
to alter in situ expression of a gene and/or gene product. For
example, an endogenous coding sequence can be replaced via
homologous recombination with all or part of the coding region
containing one or more specifically introduced mutations.
[0151] The nucleic acid molecules are also useful for expressing
antigenic portions of the proteins.
[0152] The nucleic acid molecules are also useful as probes for
determining the chromosomal positions of the nucleic acid molecules
by means of in situ hybridization methods. As indicated by the data
presented in FIG. 3, the map position was determined to be on
chromosome 16 by ePCR, and confirmed with radiation hybrid
mapping.
[0153] The nucleic acid molecules are also useful in making vectors
containing the gene regulatory regions of the nucleic acid
molecules of the present invention.
[0154] The nucleic acid molecules are also useful for designing
ribozymes corresponding to all, or a part, of the mRNA produced
from the nucleic acid molecules described herein.
[0155] The nucleic acid molecules are also useful for making
vectors that express part, or all, of the peptides.
[0156] The nucleic acid molecules are also useful for constructing
host cells expressing a part, or all, of the nucleic acid molecules
and peptides.
[0157] The nucleic acid molecules are also useful for constructing
transgenic animals expressing all, or a part, of the nucleic acid
molecules and peptides.
[0158] The nucleic acid molecules are also useful as hybridization
probes for determining the presence, level, form and distribution
of nucleic acid expression. Experimental data as provided in FIG. 1
indicates that protease proteins of the present invention are
expressed in humans in numerous cancers, including retinoblastomas
(eye), melanotic melanomas (skin), endometrium adenocarcinomas
(uterus), and adenocarcinomas (ovary). Furthermore, expression has
also been observed in the schizophrenic brain and kidney. These
expression patterns have been determined by a virtual northern blot
analysis. In addition, a PCR-based tissue screening panel indicates
expression in human heart. Accordingly, the probes can be used to
detect the presence of, or to determine levels of, a specific
nucleic acid molecule in cells, tissues, and in organisms. The
nucleic acid whose level is determined can be DNA or RNA.
Accordingly, probes corresponding to the peptides described herein
can be used to assess expression and/or gene copy number in a given
cell, tissue, or organism. These uses are relevant for diagnosis of
disorders involving an increase or decrease in protease protein
expression relative to normal results.
[0159] In vitro techniques for detection of mRNA include Northern
hybridizations and in situ hybridizations. In vitro techniques for
detecting DNA include Southern hybridizations and in situ
hybridization.
[0160] Probes can be used as a part of a diagnostic test kit for
identifying cells or tissues that express a protease protein, such
as by measuring a level of a protease-encoding nucleic acid in a
sample of cells from a subject, e.g., mRNA or genomic DNA, or
determining if a protease gene has been mutated. Experimental data
as provided in FIG. 1 indicates that protease proteins of the
present invention are expressed in humans in numerous cancers,
including retinoblastomas (eye), melanotic melanomas (skin),
endometrium adenocarcinomas (uterus), and adenocarcinomas (ovary).
Furthermore, expression has also been observed in the schizophrenic
brain and kidney. These expression patterns have been determined by
a virtual northern blot analysis. In addition, a PCR-based tissue
screening panel indicates expression in human heart.
[0161] Nucleic acid expression assays are useful for drug screening
to identify compounds that modulate protease nucleic acid
expression.
[0162] The invention thus provides a method for identifying a
compound that can be used to treat a disorder associated with
nucleic acid expression of the protease gene, particularly
biological and pathological processes that are mediated by the
protease in cells and tissues that express it. Experimental data as
provided in FIG. 1 indicates expression in humans in
retinoblastomas (eye), melanotic melanomas (skin), endometrium
adenocarcinomas (uterus), adenocarcinomas (ovary), schizophrenic
brain, kidney and human heart. The method typically includes
assaying the ability of the compound to modulate the expression of
the protease nucleic acid and thus identifying a compound that can
be used to treat a disorder characterized by undesired protease
nucleic acid expression. The assays can be performed in cell-based
and cell-free systems. Cell-based assays include cells naturally
expressing the protease nucleic acid or recombinant cells
genetically engineered to express specific nucleic acid
sequences.
[0163] The assay for protease nucleic acid expression can involve
direct assay of nucleic acid levels, such as mRNA levels, or of
collateral compounds involved in the signal pathway. Further, the
expression of genes that are up- or down-regulated in response to
the protease protein signal pathway can also be assayed. In this
embodiment the regulatory regions of these genes can be operably
linked to a reporter gene such as luciferase.
[0164] Thus, modulators of protease gene expression can be
identified in a method wherein a cell is contacted with a candidate
compound and the expression of mRNA determined. The level of
expression of protease mRNA in the presence of the candidate
compound is compared to the level of expression of protease mRNA in
the absence of the candidate compound. The candidate compound can
then be identified as a modulator of nucleic acid expression based
on this comparison and be used, for example, to treat a disorder
characterized by aberrant nucleic acid expression. When expression
of mRNA is statistically significantly greater in the presence of
the candidate compound than in its absence, the candidate compound
is identified as a stimulator of nucleic acid expression. When
nucleic acid expression is statistically significantly less in the
presence of the candidate compound than in its absence, the
candidate compound is identified as an inhibitor of nucleic acid
expression.
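The comparison described above can be sketched as follows. This example assumes replicate mRNA measurements (in arbitrary units) with and without the candidate compound, and it uses an exact two-sided permutation test on the difference in means as one possible significance test; the choice of test, the 0.05 threshold, the function names, and the example values are all assumptions made for illustration, not a method prescribed herein.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treated, untreated):
    """Exact two-sided permutation p-value for the difference in group means."""
    pooled = list(treated) + list(untreated)
    observed = abs(mean(treated) - mean(untreated))
    extreme = total = 0
    # Enumerate every way of relabeling the pooled values into two groups.
    for idx in combinations(range(len(pooled)), len(treated)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so the observed split always counts itself.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def classify_compound(treated, untreated, alpha=0.05):
    """Label a candidate compound from expression with and without it."""
    if permutation_p_value(treated, untreated) >= alpha:
        return "no significant effect"
    return "stimulator" if mean(treated) > mean(untreated) else "inhibitor"


# Hypothetical replicate measurements.
print(classify_compound([2.1, 2.3, 2.0, 2.4], [1.0, 1.1, 0.9, 1.2]))
```

Exhaustive enumeration is practical only for small replicate counts; with larger groups a sampled permutation test or a parametric test would be a more usual choice.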
[0165] The invention further provides methods of treatment, with
the nucleic acid as a target, using a compound identified through
drug screening as a gene modulator to modulate protease nucleic
acid expression in cells and tissues that express the protease.
Experimental data as provided in FIG. 1 indicates that protease
proteins of the present invention are expressed in humans in
numerous cancers, including retinoblastomas (eye), melanotic
melanomas (skin), endometrium adenocarcinomas (uterus), and
adenocarcinomas (ovary). Furthermore, expression has also been
observed in the schizophrenic brain and kidney. These expression
patterns have been determined by a virtual northern blot analysis.
In addition, a PCR-based tissue screening panel indicates expression
in human heart. Modulation includes both up-regulation (i.e.,
activation or agonization) and down-regulation (suppression or
antagonization) of nucleic acid expression.
[0166] Alternatively, a modulator for protease nucleic acid
expression can be a small molecule or drug identified using the
screening assays described herein as long as the drug or small
molecule inhibits the protease nucleic acid expression in the cells
and tissues that express the protein. Experimental data as provided
in FIG. 1 indicates expression in humans in retinoblastomas (eye),
melanotic melanomas (skin), endometrium adenocarcinomas (uterus),
adenocarcinomas (ovary), schizophrenic brain, kidney and human
heart.
[0167] The nucleic acid molecules are also useful for monitoring
the effectiveness of modulating compounds on the expression or
activity of the protease gene in clinical trials or in a treatment
regimen. Thus, the gene expression pattern can serve as a barometer
for the continuing effectiveness of treatment with the compound,
particularly with compounds to which a patient can develop
resistance. The gene expression pattern can also serve as a marker
indicative of a physiological response of the affected cells to the
compound. Accordingly, such monitoring would allow either increased
administration of the compound or the administration of alternative
compounds to which the patient has not become resistant. Similarly,
if the level of nucleic acid expression falls below a desirable
level, administration of the compound could be commensurately
decreased.
[0168] The nucleic acid molecules are also useful in diagnostic
assays for qualitative changes in protease nucleic acid expression,
and particularly in qualitative changes that lead to pathology. The
nucleic acid molecules can be used to detect mutations in protease
genes and gene expression products such as mRNA. The nucleic acid
molecules can be used as hybridization probes to detect naturally
occurring genetic mutations in the protease gene and thereby to
determine whether a subject with the mutation is at risk for a
disorder caused by the mutation. Mutations include deletion,
addition, or substitution of one or more nucleotides in the gene,
chromosomal rearrangement, such as inversion or transposition,
modification of genomic DNA, such as aberrant methylation patterns
or changes in gene copy number, such as amplification. Detection of
a mutated form of the protease gene associated with a dysfunction
provides a diagnostic tool for an active disease or susceptibility
to disease when the disease results from overexpression,
underexpression, or altered expression of a protease protein.
[0169] Individuals carrying mutations in the protease gene can be
detected at the nucleic acid level by a variety of techniques. FIG.
3 provides information on a SNP that has been found in the gene
encoding the protease protein of the present invention.
Specifically, a thymine indel at position 12469 was identified. As
indicated by the data presented in FIG. 3, the map position was
determined to be on chromosome 16 by ePCR, and confirmed with
radiation hybrid mapping. Genomic DNA can be analyzed directly or
can be amplified by using PCR prior to analysis. RNA or cDNA can be
used in the same way. In some uses, detection of the mutation
involves the use of a probe/primer in a polymerase chain reaction
(PCR) (see, e.g. U.S. Pat. Nos. 4,683,195 and 4,683,202), such as
anchor PCR or RACE PCR, or, alternatively, in a ligation chain
reaction (LCR) (see, e.g., Landegren et al., Science 241:1077-1080
(1988); and Nakazawa et al., PNAS 91:360-364 (1994)), the latter of
which can be particularly useful for detecting point mutations in
the gene (see Abravaya et al., Nucleic Acids Res. 23:675-682
(1995)). This method can include the steps of collecting a sample
of cells from a patient, isolating nucleic acid (e.g., genomic,
mRNA or both) from the cells of the sample, contacting the nucleic
acid sample with one or more primers which specifically hybridize
to a gene under conditions such that hybridization and
amplification of the gene (if present) occurs, and detecting the
presence or absence of an amplification product, or detecting the
size of the amplification product and comparing the length to a
control sample. Deletions and insertions can be detected by a
change in size of the amplified product compared to the normal
genotype. Point mutations can be identified by hybridizing
amplified DNA to normal RNA or antisense DNA sequences.
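The size comparison described above can be illustrated in silico. The sketch below locates a forward primer and the binding site of a reverse primer on a template, computes the expected amplification product length, and compares it to a control. The primer sequences, templates, and function names are hypothetical, and exact-match primer binding on a single strand is a simplifying assumption.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def amplicon_length(template: str, forward_primer: str, reverse_primer: str):
    """Return the expected product length, or None if either primer fails to bind."""
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the given
    # strand its binding site appears as the primer's reverse complement.
    site = reverse_primer.translate(_COMPLEMENT)[::-1]
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return end + len(site) - start


def classify_size_change(sample_length, control_length):
    """Compare a sample product length against the control product length."""
    if sample_length is None:
        return "no product"
    if sample_length > control_length:
        return "insertion"
    if sample_length < control_length:
        return "deletion"
    return "no size change"


# Hypothetical primers and templates.
FWD, REV = "GGATCC", "GGTCAA"
control = "AAGGATCCACGTACGTTTGACCAA"
sample = "AAGGATCCACGTTACGTTTGACCAA"  # one extra T between the primers
control_len = amplicon_length(control, FWD, REV)
print(classify_size_change(amplicon_length(sample, FWD, REV), control_len))
```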
[0170] Alternatively, mutations in a protease gene can be directly
identified, for example, by alterations in restriction enzyme
digestion patterns determined by gel electrophoresis.
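As a hedged illustration of how a mutation can alter a restriction digestion pattern, the sketch below computes the fragment lengths expected from digesting a linear molecule at a single recognition site. EcoRI (recognition site GAATTC, cutting after the first G) is used as the default example enzyme; the sequences and function name are hypothetical, and only the strand given is considered.

```python
def digest_fragment_lengths(sequence: str, site: str = "GAATTC", cut_offset: int = 1):
    """Return fragment lengths (largest first) after cutting a linear sequence.

    `cut_offset` is the number of bases into the recognition site at which
    the strand is cut (1 for EcoRI, which cuts between G and AATTC).
    """
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


wild_type = "AAAAGAATTCCCCCCC"  # contains one GAATTC site
mutant = "AAAAGAGTTCCCCCCC"     # a single substitution destroys the site
print(digest_fragment_lengths(wild_type))  # [11, 5]
print(digest_fragment_lengths(mutant))     # [16]
```

On a gel, the wild-type digest would appear as two bands and the mutant as a single uncut band, which is the kind of pattern difference referred to above.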
[0171] Further, sequence-specific ribozymes (U.S. Pat. No.
5,498,531) can be used to score for the presence of specific
mutations by development or loss of a ribozyme cleavage site.
Perfectly matched sequences can be distinguished from mismatched
sequences by nuclease cleavage digestion assays or by differences
in melting temperature.
[0172] Sequence changes at specific locations can also be assessed
by nuclease protection assays such as RNase and S1 protection or
the chemical cleavage method. Furthermore, sequence differences
between a mutant protease gene and a wild-type gene can be
determined by direct DNA sequencing. A variety of automated
sequencing procedures can be utilized when performing the
diagnostic assays (Naeve, C. W., (1995) Biotechniques 19:448),
including sequencing by mass spectrometry (see, e.g., PCT
International Publication No. WO 94/16101; Cohen et al., Adv.
Chromatogr. 36:127-162 (1996); and Griffin et al., Appl. Biochem.
Biotechnol. 38:147-159 (1993)).
[0173] Other methods for detecting mutations in the gene include
methods in which protection from cleavage agents is used to detect
mismatched bases in RNA/RNA or RNA/DNA duplexes (Myers et al.,
Science 230:1242 (1985); Cotton et al., PNAS 85:4397 (1988);
Saleeba et al., Meth. Enzymol. 217:286-295 (1992)), electrophoretic
mobility of mutant and wild type nucleic acid is compared (Orita et
al., PNAS 86:2766 (1989); Cotton et al., Mutat. Res. 285:125-144
(1993); and Hayashi et al., Genet. Anal. Tech. Appl. 9:73-79
(1992)), and movement of mutant or wild-type fragments in
polyacrylamide gels containing a gradient of denaturant is assayed
using denaturing gradient gel electrophoresis (Myers et al., Nature
313:495 (1985)). Examples of other techniques for detecting point
mutations include selective oligonucleotide hybridization,
selective amplification, and selective primer extension.
[0174] The nucleic acid molecules are also useful for testing an
individual for a genotype that, while not necessarily causing the
disease, nevertheless affects the treatment modality. Thus, the
nucleic acid molecules can be used to study the relationship
between an individual's genotype and the individual's response to a
compound used for treatment (pharmacogenomic relationship).
Accordingly, the nucleic acid molecules described herein can be
used to assess the mutation content of the protease gene in an
individual in order to select an appropriate compound or dosage
regimen for treatment. FIG. 3 provides information on a SNP that
has been found in the gene encoding the protease protein of the
present invention. Specifically, a thymine indel at position 12469
was identified.
[0175] Thus nucleic acid molecules displaying genetic variations
that affect treatment provide a diagnostic target that can be used
to tailor treatment in an individual. Accordingly, the production
of recombinant cells and animals containing these polymorphisms
allows effective clinical design of treatment compounds and dosage
regimens.
[0176] The nucleic acid molecules are thus useful as antisense
constructs to control protease gene expression in cells, tissues,
and organisms. A DNA antisense nucleic acid molecule is designed to
be complementary to a region of the gene involved in transcription,
preventing transcription and hence production of protease protein.
An antisense RNA or DNA nucleic acid molecule would hybridize to
the mRNA and thus block translation of mRNA into protease protein.
FIG. 3 provides information on a SNP that has been found in the
gene encoding the protease protein of the present invention.
Specifically, a thymine indel at position 12469 was identified.
[0177] Alternatively, a class of antisense molecules can be used to
inactivate mRNA in order to decrease expression of protease nucleic
acid. Accordingly, these molecules can treat a disorder
characterized by abnormal or undesired protease nucleic acid
expression. This technique involves cleavage by means of ribozymes
containing nucleotide sequences complementary to one or more
regions in the mRNA that attenuate the ability of the mRNA to be
translated. Possible regions include coding regions and
particularly coding regions corresponding to the catalytic and
other functional activities of the protease protein, such as
substrate binding.
[0178] The nucleic acid molecules also provide vectors for gene
therapy in patients containing cells that are aberrant in protease
gene expression. Thus, recombinant cells, which include the
patient's cells that have been engineered ex vivo and returned to
the patient, are introduced into an individual where the cells
produce the desired protease protein to treat the individual.
[0179] The invention also encompasses kits for detecting the
presence of a protease nucleic acid in a biological sample.
Experimental data as provided in FIG. 1 indicate that protease
proteins of the present invention are expressed in humans in
numerous cancers, including retinoblastomas (eye), melanotic
melanomas (skin), endometrium adenocarcinomas (uterus), and
adenocarcinomas (ovary). Furthermore, expression has also been
observed in the schizophrenic brain and kidney. These expression
patterns have been determined by a virtual northern blot analysis.
In addition, a PCR-based tissue screening panel indicates expression
in human heart. For example, the kit can comprise reagents such as
a labeled or labelable nucleic acid or agent capable of detecting
protease nucleic acid in a biological sample; means for determining
the amount of protease nucleic acid in the sample; and means for
comparing the amount of protease nucleic acid in the sample with a
standard. The compound or agent can be packaged in a suitable
container. The kit can further comprise instructions for using the
kit to detect protease protein mRNA or DNA.
[0180] Nucleic Acid Arrays
[0181] The present invention further provides nucleic acid
detection kits, such as arrays or microarrays of nucleic acid
molecules that are based on the sequence information provided in
FIGS. 1 and 3 (SEQ ID NOS:1 and 3).
[0182] As used herein "Arrays" or "Microarrays" refers to an array
of distinct polynucleotides or oligonucleotides synthesized on a
substrate, such as paper, nylon or other type of membrane, filter,
chip, glass slide, or any other suitable solid support. In one
embodiment, the microarray is prepared and used according to the
methods described in U.S. Pat. No. 5,837,832, Chee et al., PCT
application WO95/11995 (Chee et al.), Lockhart, D. J. et al. (1996;
Nat. Biotech. 14: 1675-1680) and Schena, M. et al. (1996; Proc.
Natl. Acad. Sci. 93: 10614-10619), all of which are incorporated
herein in their entirety by reference. In other embodiments, such
arrays are produced by the methods described by Brown et al., U.S.
Pat. No. 5,807,522.
[0183] The microarray or detection kit is preferably composed of a
large number of unique, single-stranded nucleic acid sequences,
usually either synthetic antisense oligonucleotides or fragments of
cDNAs, fixed to a solid support. The oligonucleotides are
preferably about 6-60 nucleotides in length, more preferably 15-30
nucleotides in length, and most preferably about 20-25 nucleotides
in length. For a certain type of microarray or detection kit, it
may be preferable to use oligonucleotides that are only 7-20
nucleotides in length. The microarray or detection kit may contain
oligonucleotides that cover the known 5' or 3' sequence;
sequential oligonucleotides that cover the full-length sequence;
or unique oligonucleotides selected from particular areas along the
length of the sequence. Polynucleotides used in the microarray or
detection kit may be oligonucleotides that are specific to a gene
or genes of interest.
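As a non-limiting sketch, sequential oligonucleotides covering a full-length sequence can be generated by sliding a fixed-length window along the sequence. The function name, the default length of 25 nucleotides (within the most preferred 20-25 range stated above), and the rule of shifting the final window back so that the 3' end is covered are assumptions made for illustration.

```python
# Minimal sketch: tile a sequence into fixed-length oligos, 5' to 3'.
# The final window is shifted back so the 3' end is always covered.

def tile_oligos(sequence: str, length: int = 25, step: int = 25):
    """Return a list of (start, oligo) tuples covering the whole sequence."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if len(sequence) < length:
        return []
    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the 3' end is represented
    return [(s, sequence[s:s + length]) for s in starts]


# First 60 bases of SEQ ID NO:1 as printed in the sequence listing.
seq = "gtgcgaaaggctgccagcatgtcatcagtgagccccatccagatccccagtcgcctcccg"
for start, oligo in tile_oligos(seq):
    print(start, oligo)
```

Setting `step` smaller than `length` gives overlapping oligonucleotides; restricting the input to the first or last portion of the sequence gives oligonucleotides covering only the known 5' or 3' sequence.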
[0184] In order to produce oligonucleotides to a known sequence for
a microarray or detection kit, the gene(s) of interest (or an ORF
identified from the contigs of the present invention) is typically
examined using a computer algorithm which starts at the 5' or at
the 3' end of the nucleotide sequence. Typical algorithms will then
identify oligomers of defined length that are unique to the gene,
have a GC content within a range suitable for hybridization, and
lack predicted secondary structure that may interfere with
hybridization. In certain situations it may be appropriate to use
pairs of oligonucleotides on a microarray or detection kit. The
"pairs" will be identical, except for one nucleotide that
preferably is located in the center of the sequence. The second
oligonucleotide in the pair (mismatched by one) serves as a
control. The number of oligonucleotide pairs may range from two to
one million. The oligomers are synthesized at designated areas on a
substrate using a light-directed chemical process. The substrate
may be paper, nylon or other type of membrane, filter, chip, glass
slide or any other suitable solid support.
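The selection steps described above can be sketched, under stated assumptions, as follows. The sketch scans from the 5' end, keeps oligomers that occur once in the gene, are absent from a supplied set of background sequences, and fall within a GC range; it then builds the one-base mismatch control for each. The GC limits of 40-60%, the use of the complementary base at the central position, and all function names are assumptions. The secondary-structure filter mentioned in the text is omitted because it needs a folding model.

```python
# Minimal sketch of probe selection and mismatch-pair construction.
# Thresholds and names are illustrative assumptions, not from the text.
from collections import Counter


def gc_fraction(oligo: str) -> float:
    """Fraction of bases in the oligo that are G or C."""
    return sum(base in "gcGC" for base in oligo) / len(oligo)


def select_probes(gene, background, length=25, gc_min=0.4, gc_max=0.6):
    """Return (start, oligo) for unique, GC-balanced oligos, scanning 5'->3'."""
    gene = gene.lower()
    n_windows = len(gene) - length + 1
    counts = Counter(gene[i:i + length] for i in range(n_windows))
    others = [other.lower() for other in background]
    probes = []
    for i in range(n_windows):
        oligo = gene[i:i + length]
        if counts[oligo] != 1:
            continue  # repeated within the gene itself
        if any(oligo in other for other in others):
            continue  # also present in another sequence
        if not gc_min <= gc_fraction(oligo) <= gc_max:
            continue  # GC content outside the chosen range
        probes.append((i, oligo))
    return probes


def mismatch_control(oligo: str) -> str:
    """Return the oligo with its central base replaced by its complement."""
    swap = {"a": "t", "t": "a", "c": "g", "g": "c"}
    oligo = oligo.lower()
    mid = len(oligo) // 2
    return oligo[:mid] + swap[oligo[mid]] + oligo[mid + 1:]


def probe_pairs(gene, background, **kwargs):
    """Return (perfect_match, mismatch) pairs for each selected probe."""
    return [(o, mismatch_control(o))
            for _, o in select_probes(gene, background, **kwargs)]


print(mismatch_control("acgta"))  # accta
```

The background check above only tests the given strand of each background sequence; a fuller implementation would presumably also test reverse complements and tolerate near matches.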
[0185] In another aspect, an oligonucleotide may be synthesized on
the surface of the substrate by using a chemical coupling procedure
and an ink jet application apparatus, as described in PCT
application WO95/251116 (Baldeschweiler et al.) which is
incorporated herein in its entirety by reference. In another
aspect, a "gridded" array analogous to a dot (or slot) blot may be
used to arrange and link cDNA fragments or oligonucleotides to the
surface of a substrate using a vacuum system, thermal, UV,
mechanical or chemical bonding procedures. An array, such as those
described above, may be produced by hand or by using available
devices (slot blot or dot blot apparatus), materials (any suitable
solid support), and machines (including robotic instruments), and
may contain 8, 24, 96, 384, 1536, 6144 or more oligonucleotides, or
any other number between two and one million which lends itself to
the efficient use of commercially available instrumentation.
[0186] In order to conduct sample analysis using a microarray or
detection kit, the RNA or DNA from a biological sample is made into
hybridization probes. The mRNA is isolated, and cDNA is produced
and used as a template to make antisense RNA (aRNA). The aRNA is
amplified in the presence of fluorescent nucleotides, and labeled
probes are incubated with the microarray or detection kit so that
the probe sequences hybridize to complementary oligonucleotides of
the microarray or detection kit. Incubation conditions are adjusted
so that hybridization occurs with precise complementary matches or
with various degrees of less complementarity. After removal of
nonhybridized probes, a scanner is used to determine the levels and
patterns of fluorescence. The scanned images are examined to
determine degree of complementarity and the relative abundance of
each oligonucleotide sequence on the microarray or detection kit.
The biological samples may be obtained from any bodily fluids (such
as blood, urine, saliva, phlegm, gastric juices, etc.), cultured
cells, biopsies, or other tissue preparations. A detection system
may be used to measure the absence, presence, and amount of
hybridization for all of the distinct sequences simultaneously.
This data may be used for large-scale correlation studies on the
sequences, expression patterns, mutations, variants, or
polymorphisms among samples.
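One simple way to turn scanned intensities into presence/absence calls and relative abundances is sketched below. It assumes each probe position has a perfect-match and a mismatch intensity, subtracts the latter as background, and applies a fixed cutoff. The cutoff value, the data layout, and the function names are hypothetical; actual scanners and analysis packages use their own, more elaborate procedures.

```python
# Minimal sketch: call hybridization from paired fluorescence intensities.
# The cutoff of 100 intensity units is an arbitrary illustrative value.

def call_hybridization(intensities, min_signal=100.0):
    """Map probe id -> (present, net_signal).

    intensities maps probe id -> (perfect_match, mismatch) readings.
    """
    calls = {}
    for probe_id, (pm, mm) in intensities.items():
        net = max(pm - mm, 0.0)  # mismatch reading serves as the control
        calls[probe_id] = (net >= min_signal, net)
    return calls


def relative_abundance(calls):
    """Map probe id -> share of the total net signal (0.0 if no signal)."""
    total = sum(net for _, net in calls.values())
    if total == 0:
        return {probe_id: 0.0 for probe_id in calls}
    return {probe_id: net / total for probe_id, (_, net) in calls.items()}


# Hypothetical readings for two probes.
readings = {"probe_1": (900.0, 150.0), "probe_2": (120.0, 110.0)}
calls = call_hybridization(readings)
print(calls)
print(relative_abundance(calls))
```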
[0187] Using such arrays, the present invention provides methods to
identify the expression of the protease proteins/peptides of the
present invention. In detail, such methods comprise incubating a
test sample with one or more nucleic acid molecules and assaying
for binding of the nucleic acid molecule with components within the
test sample. Such assays will typically involve arrays comprising
many genes, at least one of which is a gene of the present
invention and/or alleles of the protease gene of the present
invention. FIG. 3 provides information on a SNP that has been found
in the gene encoding the protease protein of the present invention.
Specifically, a thymine indel at position 12469 was identified.
[0188] Conditions for incubating a nucleic acid molecule with a
test sample vary. Incubation conditions depend on the format
employed in the assay, the detection methods employed, and the type
and nature of the nucleic acid molecule used in the assay. One
skilled in the art will recognize that any one of the commonly
available hybridization, amplification or array assay formats can
readily be adapted to employ the novel fragments of the Human
genome disclosed herein. Examples of such assays can be found in
Chard, T., An Introduction to Radioimmunoassay and Related
Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands
(1986); Bullock, G. R. et al., Techniques in Immunocytochemistry,
Academic Press, Orlando, Fla. Vol. 1 (1982), Vol. 2 (1983), Vol. 3
(1985); Tijssen, P., Practice and Theory of Enzyme Immunoassays:
Laboratory Techniques in Biochemistry and Molecular Biology,
Elsevier Science Publishers, Amsterdam, The Netherlands (1985).
[0189] The test samples of the present invention include cells,
protein or membrane extracts of cells. The test sample used in the
above-described method will vary based on the assay format, nature
of the detection method and the tissues, cells or extracts used as
the sample to be assayed. Methods for preparing nucleic acid
extracts of cells are well known in the art and can readily be
adapted in order to obtain a sample that is compatible with the
system utilized.
[0190] In another embodiment of the present invention, kits are
provided which contain the necessary reagents to carry out the
assays of the present invention.
[0191] Specifically, the invention provides a compartmentalized kit
to receive, in close confinement, one or more containers which
comprises: (a) a first container comprising one of the nucleic acid
molecules that can bind to a fragment of the Human genome disclosed
herein; and (b) one or more other containers comprising one or more
of the following: wash reagents and reagents capable of detecting
the presence of a bound nucleic acid.
[0192] In detail, a compartmentalized kit includes any kit in which
reagents are contained in separate containers. Such containers
include small glass containers, plastic containers, strips of
plastic, glass or paper, or arraying material such as silica. Such
containers allow one to efficiently transfer reagents from one
compartment to another compartment such that the samples and
reagents are not cross-contaminated, and the agents or solutions of
each container can be added in a quantitative fashion from one
compartment to another. Such containers will include a container
which will accept the test sample, a container which contains the
nucleic acid probe, containers which contain wash reagents (such as
phosphate buffered saline, Tris-buffers, etc.), and containers
which contain the reagents used to detect the bound probe. One
skilled in the art will readily recognize that the previously
unidentified protease gene of the present invention can be
routinely identified using the sequence information disclosed
herein and can be readily incorporated into one of the established kit
formats which are well known in the art, particularly expression
arrays.
[0193] Vectors/host cells
[0194] The invention also provides vectors containing the nucleic
acid molecules described herein. The term "vector" refers to a
vehicle, preferably a nucleic acid molecule, which can transport
the nucleic acid molecules. When the vector is a nucleic acid
molecule, the nucleic acid molecules are covalently linked to the
vector nucleic acid. With this aspect of the invention, the vector
includes a plasmid, single or double stranded phage, a single or
double stranded RNA or DNA viral vector, or artificial chromosome,
such as a BAC, PAC, YAC, or MAC.
[0195] A vector can be maintained in the host cell as an
extrachromosomal element where it replicates and produces
additional copies of the nucleic acid molecules. Alternatively, the
vector may integrate into the host cell genome and produce
additional copies of the nucleic acid molecules when the host cell
replicates.
[0196] The invention provides vectors for the maintenance (cloning
vectors) or vectors for expression (expression vectors) of the
nucleic acid molecules. The vectors can function in prokaryotic or
eukaryotic cells or in both (shuttle vectors).
[0197] Expression vectors contain cis-acting regulatory regions
that are operably linked in the vector to the nucleic acid
molecules such that transcription of the nucleic acid molecules is
allowed in a host cell. The nucleic acid molecules can be
introduced into the host cell with a separate nucleic acid molecule
capable of affecting transcription. Thus, the second nucleic acid
molecule may provide a trans-acting factor interacting with the
cis-regulatory control region to allow transcription of the nucleic
acid molecules from the vector. Alternatively, a trans-acting
factor may be supplied by the host cell. Finally, a trans-acting
factor can be produced from the vector itself. It is understood,
however, that in some embodiments, transcription and/or translation
of the nucleic acid molecules can occur in a cell-free system.
[0198] The regulatory sequences to which the nucleic acid molecules
described herein can be operably linked include promoters for
directing mRNA transcription. These include, but are not limited
to, the left promoter from bacteriophage λ, the lac, trp,
and tac promoters from E. coli, the early and late promoters from
SV40, the CMV immediate early promoter, the adenovirus early and
late promoters, and retrovirus long-terminal repeats.
[0199] In addition to control regions that promote transcription,
expression vectors may also include regions that modulate
transcription, such as repressor binding sites and enhancers.
Examples include the SV40 enhancer, the cytomegalovirus immediate
early enhancer, polyoma enhancer, adenovirus enhancers, and
retrovirus LTR enhancers.
[0200] In addition to containing sites for transcription initiation
and control, expression vectors can also contain sequences
necessary for transcription termination and, in the transcribed
region, a ribosome binding site for translation. Other regulatory
control elements for expression include initiation and termination
codons as well as polyadenylation signals. The person of ordinary
skill in the art would be aware of the numerous regulatory
sequences that are useful in expression vectors. Such regulatory
sequences are described, for example, in Sambrook et al., Molecular
Cloning: A Laboratory Manual. 2nd. ed., Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, N.Y., (1989).
[0201] A variety of expression vectors can be used to express a
nucleic acid molecule. Such vectors include chromosomal, episomal,
and virus-derived vectors, for example vectors derived from
bacterial plasmids, from bacteriophage, from yeast episomes, from
yeast chromosomal elements, including yeast artificial chromosomes,
from viruses such as baculoviruses, papovaviruses such as SV40,
Vaccinia viruses, adenoviruses, poxviruses, pseudorabies viruses,
and retroviruses. Vectors may also be derived from combinations of
these sources such as those derived from plasmid and bacteriophage
genetic elements, e.g. cosmids and phagemids. Appropriate cloning
and expression vectors for prokaryotic and eukaryotic hosts are
described in Sambrook et al., Molecular Cloning: A Laboratory
Manual. 2nd. ed., Cold Spring Harbor Laboratory Press, Cold Spring
Harbor, N.Y., (1989).
[0202] The regulatory sequence may provide constitutive expression
in one or more host cells (i.e. tissue specific) or may provide for
inducible expression in one or more cell types such as by
temperature, nutrient additive, or exogenous factor such as a
hormone or other ligand. A variety of vectors providing for
constitutive and inducible expression in prokaryotic and eukaryotic
hosts are well known to those of ordinary skill in the art.
[0203] The nucleic acid molecules can be inserted into the vector
nucleic acid by well-known methodology. Generally, the DNA sequence
that will ultimately be expressed is joined to an expression vector
by cleaving the DNA sequence and the expression vector with one or
more restriction enzymes and then ligating the fragments together.
Procedures for restriction enzyme digestion and ligation are well
known to those of ordinary skill in the art.
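As an illustration of the cleavage step, locating restriction sites in a linear sequence and splitting it into fragments can be sketched as follows. The example uses the EcoRI recognition sequence GAATTC, which is cut after the first G on the top strand; the function name and the simplification of tracking only the top strand (ignoring overhangs and ligation compatibility) are assumptions made for brevity.

```python
# Minimal sketch: in silico digestion of a linear DNA with one enzyme.
# Only the top strand is modelled; sticky ends are not represented.

def digest(dna: str, site: str = "gaattc", cut_offset: int = 1):
    """Split dna at every occurrence of site.

    cut_offset is the number of bases of the site that remain on the
    upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    dna = dna.lower()
    site = site.lower()
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])  # remainder after the last cut
    return fragments


# Hypothetical insert flanked by EcoRI sites.
print(digest("ccgaattcatgaaagaattcgg"))
```

Cutting both the insert and the vector with the same enzyme in this way yields compatible ends, which is what allows the fragments to be ligated together.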
[0204] The vector containing the appropriate nucleic acid molecule
can be introduced into an appropriate host cell for propagation or
expression using well-known techniques. Bacterial cells include,
but are not limited to, E. coli, Streptomyces, and Salmonella
typhimurium. Eukaryotic cells include, but are not limited to,
yeast, insect cells such as Drosophila, animal cells such as COS
and CHO cells, and plant cells.
[0205] As described herein, it may be desirable to express the
peptide as a fusion protein. Accordingly, the invention provides
fusion vectors that allow for the production of the peptides.
Fusion vectors can increase the expression of a recombinant
protein, increase the solubility of the recombinant protein, and
aid in the purification of the protein by acting for example as a
ligand for affinity purification. A proteolytic cleavage site may
be introduced at the junction of the fusion moiety so that the
desired peptide can ultimately be separated from the fusion moiety.
Proteolytic enzymes include, but are not limited to, factor Xa,
thrombin, and enterokinase. Typical fusion expression vectors
include pGEX (Smith et al., Gene 67:31-40 (1988)), pMAL (New
England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway,
N.J.) which fuse glutathione S-transferase (GST), maltose E binding
protein, or protein A, respectively, to the target recombinant
protein. Examples of suitable inducible non-fusion E. coli
expression vectors include pTrc (Amann et al., Gene 69:301-315
(1988)) and pET 11d (Studier et al., Gene Expression Technology:
Methods in Enzymology 185:60-89 (1990)).
[0206] Recombinant protein expression can be maximized in host
bacteria by providing a genetic background wherein the host cell
has an impaired capacity to proteolytically cleave the recombinant
protein (Gottesman, S., Gene Expression Technology: Methods in
Enzymology 185, Academic Press, San Diego, Calif. (1990) 119-128).
Alternatively, the sequence of the nucleic acid molecule of
interest can be altered to provide preferential codon usage for a
specific host cell, for example E. coli. (Wada et al., Nucleic
Acids Res. 20:2111-2118 (1992)).
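Altering a coding sequence toward preferred codons without changing the encoded peptide can be sketched as below. The sketch uses the standard genetic code and a caller-supplied table of preferred codons; the two preferences shown in the example are placeholders for illustration, not measured usage frequencies, and the function names are hypothetical.

```python
# Minimal sketch: synonymous recoding of a coding sequence.
# The preferred-codon table in the example is an illustrative placeholder.
from itertools import product

_BASES = "tcag"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons in t/c/a/g order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    cds = cds.lower()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str, preferred: dict) -> str:
    """Replace codons with the preferred synonymous codon where one is given."""
    cds = cds.lower()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of three")
    for amino_acid, codon in preferred.items():
        if CODON_TABLE[codon.lower()] != amino_acid:
            raise ValueError(f"{codon} does not encode {amino_acid}")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        out.append(preferred.get(CODON_TABLE[codon], codon).lower())
    return "".join(out)


original = "atgttacgatta"                # encodes M-L-R-L
preferred = {"L": "ctg", "R": "cgt"}     # placeholder host preferences
recoded = recode(original, preferred)
print(recoded)                           # atgctgcgtctg
print(translate(original) == translate(recoded))  # True
```

In practice the preferred-codon table would be derived from codon usage data for the chosen host cell.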
[0207] The nucleic acid molecules can also be expressed by
expression vectors that are operative in yeast. Examples of vectors
for expression in yeast (e.g., S. cerevisiae) include pYepSec1
(Baldari et al., EMBO J. 6:229-234 (1987)), pMFa (Kurjan et al.,
Cell 30:933-943 (1982)), pJRY88 (Schultz et al., Gene 54:113-123
(1987)), and pYES2 (Invitrogen Corporation, San Diego, Calif.).
[0208] The nucleic acid molecules can also be expressed in insect
cells using, for example, baculovirus expression vectors.
Baculovirus vectors available for expression of proteins in
cultured insect cells (e.g., Sf9 cells) include the pAc series
(Smith et al., Mol. Cell Biol. 3:2156-2165 (1983)) and the pVL
series (Luckow et al., Virology 170:31-39 (1989)).
[0209] In certain embodiments of the invention, the nucleic acid
molecules described herein are expressed in mammalian cells using
mammalian expression vectors. Examples of mammalian expression
vectors include pCDM8 (Seed, B., Nature 329:840 (1987)) and pMT2PC
(Kaufman et al., EMBO J. 6:187-195 (1987)).
[0210] The expression vectors listed herein are provided by way of
example only of the well-known vectors available to those of
ordinary skill in the art that would be useful to express the
nucleic acid molecules. The person of ordinary skill in the art
would be aware of other vectors suitable for maintenance,
propagation, or expression of the nucleic acid molecules described
herein. These are found, for example, in Sambrook, J., Fritsch, E. F.,
and Maniatis, T., Molecular Cloning: A Laboratory Manual. 2nd ed.,
Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, N.Y., 1989.
[0211] The invention also encompasses vectors in which the nucleic
acid sequences described herein are cloned into the vector in
reverse orientation, but operably linked to a regulatory sequence
that permits transcription of antisense RNA. Thus, an antisense
transcript can be produced to all, or to a portion, of the nucleic
acid molecule sequences described herein, including both coding and
non-coding regions. Expression of this antisense RNA is subject to
each of the parameters described above in relation to expression of
the sense RNA (regulatory sequences, constitutive or inducible
expression, tissue-specific expression).
[0212] The invention also relates to recombinant host cells
containing the vectors described herein. Host cells therefore
include prokaryotic cells, lower eukaryotic cells such as yeast,
other eukaryotic cells such as insect cells, and higher eukaryotic
cells such as mammalian cells.
[0213] The recombinant host cells are prepared by introducing the
vector constructs described herein into the cells by techniques
readily available to the person of ordinary skill in the art. These
include, but are not limited to, calcium phosphate transfection,
DEAE-dextran-mediated transfection, cationic lipid-mediated
transfection, electroporation, transduction, infection,
lipofection, and other techniques such as those found in Sambrook,
et al. (Molecular Cloning: A Laboratory Manual. 2nd ed., Cold
Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, N.Y., 1989).
[0214] Host cells can contain more than one vector. Thus, different
nucleotide sequences can be introduced on different vectors in the
same cell. Similarly, the nucleic acid molecules can be introduced
either alone or with other nucleic acid molecules that are not
related to the nucleic acid molecules such as those providing
trans-acting factors for expression vectors. When more than one
vector is introduced into a cell, the vectors can be introduced
independently, co-introduced or joined to the nucleic acid molecule
vector.
[0215] In the case of bacteriophage and viral vectors, these can be
introduced into cells as packaged or encapsulated virus by standard
procedures for infection and transduction. Viral vectors can be
replication-competent or replication-defective. In the case in
which viral replication is defective, replication will occur in
host cells providing functions that complement the defects.
[0216] Vectors generally include selectable markers that enable the
selection of the subpopulation of cells that contain the
recombinant vector constructs. The marker can be contained in the
same vector that contains the nucleic acid molecules described
herein or may be on a separate vector. Markers include tetracycline
or ampicillin-resistance genes for prokaryotic host cells and
dihydrofolate reductase or neomycin resistance for eukaryotic host
cells. However, any marker that provides selection for a phenotypic
trait will be effective.
[0217] While the mature proteins can be produced in bacteria,
yeast, mammalian cells, and other cells under the control of the
appropriate regulatory sequences, cell-free transcription and
translation systems can also be used to produce these proteins
using RNA derived from the DNA constructs described herein.
[0218] Where secretion of the peptide is desired, which is
difficult to achieve with multi-transmembrane domain containing
proteins such as proteases, appropriate secretion signals are
incorporated into the vector. The signal sequence can be endogenous
to the peptides or heterologous to these peptides.
[0219] Where the peptide is not secreted into the medium, which is
typically the case with proteases, the protein can be isolated from
the host cell by standard disruption procedures, including freeze
thaw, sonication, mechanical disruption, use of lysing agents and
the like. The peptide can then be recovered and purified by
well-known purification methods including ammonium sulfate
precipitation, acid extraction, anion or cation exchange
chromatography, phosphocellulose chromatography,
hydrophobic-interaction chromatography, affinity chromatography,
hydroxylapatite chromatography, lectin chromatography, or high
performance liquid chromatography.
[0220] It is also understood that depending upon the host cell in
recombinant production of the peptides described herein, the
peptides can have various glycosylation patterns, depending upon
the cell, or may be non-glycosylated as when produced in bacteria.
In addition, the peptides may include an initial modified
methionine in some cases as a result of a host-mediated
process.
[0221] Uses of vectors and host cells
[0222] The recombinant host cells expressing the peptides described
herein have a variety of uses. First, the cells are useful for
producing a protease protein or peptide that can be further
purified to produce desired amounts of protease protein or
fragments. Thus, host cells containing expression vectors are
useful for peptide production.
[0223] Host cells are also useful for conducting cell-based assays
involving the protease protein or protease protein fragments, such
as those described above as well as other formats known in the art.
Thus, a recombinant host cell expressing a native protease protein
is useful for assaying compounds that stimulate or inhibit protease
protein function.
[0224] Host cells are also useful for identifying protease protein
mutants in which these functions are affected. If the mutants
naturally occur and give rise to a pathology, host cells containing
the mutations are useful to assay compounds that have a desired
effect on the mutant protease protein (for example, stimulating or
inhibiting function) which may not be indicated by their effect on
the native protease protein.
[0225] Genetically engineered host cells can be further used to
produce non-human transgenic animals. A transgenic animal is
preferably a mammal, for example a rodent, such as a rat or mouse,
in which one or more of the cells of the animal include a
transgene. A transgene is exogenous DNA which is integrated into
the genome of a cell from which a transgenic animal develops and
which remains in the genome of the mature animal in one or more
cell types or tissues of the transgenic animal. These animals are
useful for studying the function of a protease protein and
identifying and evaluating modulators of protease protein activity.
Other examples of transgenic animals include non-human primates,
sheep, dogs, cows, goats, chickens, and amphibians.
[0226] A transgenic animal can be produced by introducing nucleic
acid into the male pronuclei of a fertilized oocyte, e.g., by
microinjection or retroviral infection, and allowing the oocyte to
develop in a pseudopregnant female foster animal. Any of the
protease protein nucleotide sequences can be introduced as a
transgene into the genome of a non-human animal, such as a
mouse.
[0227] Any of the regulatory or other sequences useful in
expression vectors can form part of the transgenic sequence. This
includes intronic sequences and polyadenylation signals, if not
already included. A tissue-specific regulatory sequence(s) can be
operably linked to the transgene to direct expression of the
protease protein to particular cells.
[0228] Methods for generating transgenic animals via embryo
manipulation and microinjection, particularly animals such as mice,
have become conventional in the art and are described, for example,
in U.S. Pat. Nos. 4,736,866 and 4,870,009, both by Leder et al.,
U.S. Pat. No. 4,873,191 by Wagner et al. and in Hogan, B.,
Manipulating the Mouse Embryo, (Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, N.Y., 1986). Similar methods are used
for production of other transgenic animals. A transgenic founder
animal can be identified based upon the presence of the transgene
in its genome and/or expression of transgenic mRNA in tissues or
cells of the animals. A transgenic founder animal can then be used
to breed additional animals carrying the transgene. Moreover,
transgenic animals carrying a transgene can further be bred to
other transgenic animals carrying other transgenes. A transgenic
animal also includes animals in which the entire animal or tissues
in the animal have been produced using the homologously recombinant
host cells described herein.
[0229] In another embodiment, transgenic non-human animals can be
produced which contain selected systems that allow for regulated
expression of the transgene. One example of such a system is the
cre/loxP recombinase system of bacteriophage P1. For a description
of the cre/loxP recombinase system, see, e.g., Lakso et al., PNAS
89:6232-6236 (1992). Another example of a recombinase system is the
FLP recombinase system of S. cerevisiae (O'Gorman et al., Science
251:1351-1355 (1991)). If a cre/loxP recombinase system is used to
regulate expression of the transgene, animals containing transgenes
encoding both the Cre recombinase and a selected protein are
required. Such animals can be provided through the construction of
"double" transgenic animals, e.g., by mating two transgenic
animals, one containing a transgene encoding a selected protein and
the other containing a transgene encoding a recombinase.
[0230] Clones of the non-human transgenic animals described herein
can also be produced according to the methods described in Wilmut,
I. et al. Nature 385:810-813 (1997) and PCT International
Publication Nos. WO 97/07668 and WO 97/07669. In brief, a cell,
e.g., a somatic cell, from the transgenic animal can be isolated
and induced to exit the growth cycle and enter G0 phase. The
quiescent cell can then be fused, e.g., through the use of
electrical pulses, to an enucleated oocyte from an animal of the
same species from which the quiescent cell is isolated. The
reconstructed oocyte is then cultured such that it develops to
morula or blastocyst and is then transferred to a pseudopregnant female
foster animal. The offspring born of this female foster animal will
be a clone of the animal from which the cell, e.g., the somatic
cell, is isolated.
[0231] Transgenic animals containing recombinant cells that express
the peptides described herein are useful to conduct the assays
described herein in an in vivo context. Accordingly, the various
physiological factors that are present in vivo and that could
affect substrate binding, protease protein activity/activation, and
signal transduction, may not be evident from in vitro cell-free or
cell-based assays. Accordingly, it is useful to provide non-human
transgenic animals to assay in vivo protease protein function,
including substrate interaction, the effect of specific mutant
protease proteins on protease protein function and substrate
interaction, and the effect of chimeric protease proteins. It is
also possible to assess the effect of null mutations, that is
mutations that substantially or completely eliminate one or more
protease protein functions.
[0232] All publications and patents mentioned in the above
specification are herein incorporated by reference. Various
modifications and variations of the described method and system of
the invention will be apparent to those skilled in the art without
departing from the scope and spirit of the invention. Although the
invention has been described in connection with specific preferred
embodiments, it should be understood that the invention as claimed
should not be unduly limited to such specific embodiments. Indeed,
various modifications of the above-described modes for carrying out
the invention which are obvious to those skilled in the field of
molecular biology or related fields are intended to be within the
scope of the following claims.
Sequence CWU 1
1
4 1 2594 DNA Human 1 gtgcgaaagg ctgccagcat gtcatcagtg agccccatcc
agatccccag tcgcctcccg 60 ctgctgctca cccacgaggg cgtcctgctg
cccggctcca ccatgcgcac cagcgtggac 120 tcggcccaca acctgcagct
ggtgcggagc cgccttctga agggcacgtc gctgcaaagc 180 accatcctgg
gcgtcatccc caacacgcct gaccccgcca gcgacgcgca ggacctgccg 240
ccgctgcaca ggattggcac agctgcactg gccgttcagg ttgtgggcag taactggccc
300 aagccccact acactctgtt gattacaggc ctatgccgtt tccagattgt
acaggtctta 360 aaagagaagc catatcccat tgctgaagtg gagcagttgg
accgacttga ggagtttccc 420 aacacctgta aaatgaggga ggagctagga
gaactatcag agcagtttta caaatatgca 480 gtacaattgg ttgaaatgtt
ggatatgtct gtccctgcag ttgctaaatt gagacgtctt 540 ttagatagtc
ttccaaggga agctttacca gacatcttga catcaattat ccgaacaagc 600
aacaaagaga aactccagat tttagatgct gtgagcctag aggagcggtt caagatgact
660 ataccactgc ttgtcagaca aattgaaggc ctgaaattgc ttcaaaaaac
cagaaaaccc 720 aagcaagatg atgataagag ggttatagca atacgcccta
ttaggagaat tacacatatc 780 tcaggtactt tagaagatga agatgaagat
gaagataatg atgacattgt catgctagag 840 aaaaaaatac gaacatctag
tatgccagag caggcccata aagtctgtgt caaagagata 900 aagagactca
aaaaaatgcc tcagtcaatg ccagaatatg ctctgactag aaattatttg 960
gaacttatgg tagaacttcc ttggaacaaa agtacaactg accgcctgga cattagggca
1020 gcccggattc ttctggataa tgaccattac gccatggaaa aattgaagaa
aagagtactg 1080 gaatacttgg ctgtcagaca gctcaaaaat aacctgaagg
gcccaatcct atgctttgtt 1140 ggccctcctg gagttggtaa aacaagtgtg
ggaagatcag tggccaagac tctaggtcga 1200 gagttccaca ggattgcact
tggaggagta tgtgatcagt ctgacattcg aggacacagg 1260 cgcacctatg
ttggcagcat gcctggtcgc atcatcaacg gcttgaagac tgtgggagtg 1320
aacaacccag tgttcctatt agatgaggtt gacaaactgg gaaaaagtct acagggtgat
1380 ccagcagcag ctctgcttga ggtgttggat cctgaacaaa accataactt
cacagatcat 1440 tatctaaatg tggcctttga cctttctcaa gttcttttta
tagctactgc caacaccact 1500 gctaccattc cagctgcctt gttggacaga
atggagatca ttcaggttcc aggttataca 1560 caggaggaga agatagagat
tgcccatagg cacttgatcc ccaagcagct ggaacaacat 1620 gggctgactc
cacagcagat tcagataccc caggtcacca ctcttgacat catcaccagg 1680
tataccagag aggcaggggt tcgttctctg gatagaaaac ttggggccat ttgccgagct
1740 gtggccgtga aggtggcaga aggacagcat aaggaagcca agttggaccg
ttctgatgtg 1800 actgagagag aaggttgcag agaacacatc ttagaagatg
aaaaacctga atctatcagt 1860 gacactactg acttggctct accacctgaa
atgccgattt tgattgattt ccatgctctg 1920 aaagacatcc ttgggccccc
gatgtatgaa atggaggtat ctcagcgttt gagtcagcca 1980 ggagtagcaa
taggtttggc ttggactccc ttaggtggag aaatcatgtt cgtggaggcg 2040
agtcgaatgg atggcgaggg ccagttaact ctgaccggcc agctcgggga cgtgatgaag
2100 gagtccgccc acctcgctat cagctggctc cgcagcaacg caaagaagta
ccagctgacc 2160 aatgcttttg gaagttttga tcttcttgac aacacagaca
tccatctgca cttcccagct 2220 ggagctgtca caaaagatgg accatctgct
ggagttacca tagtaacctg tctcgcctca 2280 ctttttagtg ggcggctggt
acgttcagat gtagccatga ctggagaaat tacactgaga 2340 ggtcttgttc
ttccagtggg tggaattaaa gacaaagtgc tggcggcaca cagagcggga 2400
ctgaagcaag tcattattcc tcggagaaat gaaaaagacc ttgagggaat cccaggcaac
2460 gtacgacagg atttaagttt tgtcacagca agctgcctgg atgaggttct
taatgcagct 2520 tttgatggtg gctttactgt caagaccaga cctggtctgt
taaatagcaa actgtaggtc 2580 caaatctcaa tttt 2594 2 852 PRT Human 2
Met Ser Ser Val Ser Pro Ile Gln Ile Pro Ser Arg Leu Pro Leu Leu 1 5
10 15 Leu Thr His Glu Gly Val Leu Leu Pro Gly Ser Thr Met Arg Thr
Ser 20 25 30 Val Asp Ser Ala His Asn Leu Gln Leu Val Arg Ser Arg
Leu Leu Lys 35 40 45 Gly Thr Ser Leu Gln Ser Thr Ile Leu Gly Val
Ile Pro Asn Thr Pro 50 55 60 Asp Pro Ala Ser Asp Ala Gln Asp Leu
Pro Pro Leu His Arg Ile Gly 65 70 75 80 Thr Ala Ala Leu Ala Val Gln
Val Val Gly Ser Asn Trp Pro Lys Pro 85 90 95 His Tyr Thr Leu Leu
Ile Thr Gly Leu Cys Arg Phe Gln Ile Val Gln 100 105 110 Val Leu Lys
Glu Lys Pro Tyr Pro Ile Ala Glu Val Glu Gln Leu Asp 115 120 125 Arg
Leu Glu Glu Phe Pro Asn Thr Cys Lys Met Arg Glu Glu Leu Gly 130 135
140 Glu Leu Ser Glu Gln Phe Tyr Lys Tyr Ala Val Gln Leu Val Glu Met
145 150 155 160 Leu Asp Met Ser Val Pro Ala Val Ala Lys Leu Arg Arg
Leu Leu Asp 165 170 175 Ser Leu Pro Arg Glu Ala Leu Pro Asp Ile Leu
Thr Ser Ile Ile Arg 180 185 190 Thr Ser Asn Lys Glu Lys Leu Gln Ile
Leu Asp Ala Val Ser Leu Glu 195 200 205 Glu Arg Phe Lys Met Thr Ile
Pro Leu Leu Val Arg Gln Ile Glu Gly 210 215 220 Leu Lys Leu Leu Gln
Lys Thr Arg Lys Pro Lys Gln Asp Asp Asp Lys 225 230 235 240 Arg Val
Ile Ala Ile Arg Pro Ile Arg Arg Ile Thr His Ile Ser Gly 245 250 255
Thr Leu Glu Asp Glu Asp Glu Asp Glu Asp Asn Asp Asp Ile Val Met 260
265 270 Leu Glu Lys Lys Ile Arg Thr Ser Ser Met Pro Glu Gln Ala His
Lys 275 280 285 Val Cys Val Lys Glu Ile Lys Arg Leu Lys Lys Met Pro
Gln Ser Met 290 295 300 Pro Glu Tyr Ala Leu Thr Arg Asn Tyr Leu Glu
Leu Met Val Glu Leu 305 310 315 320 Pro Trp Asn Lys Ser Thr Thr Asp
Arg Leu Asp Ile Arg Ala Ala Arg 325 330 335 Ile Leu Leu Asp Asn Asp
His Tyr Ala Met Glu Lys Leu Lys Lys Arg 340 345 350 Val Leu Glu Tyr
Leu Ala Val Arg Gln Leu Lys Asn Asn Leu Lys Gly 355 360 365 Pro Ile
Leu Cys Phe Val Gly Pro Pro Gly Val Gly Lys Thr Ser Val 370 375 380
Gly Arg Ser Val Ala Lys Thr Leu Gly Arg Glu Phe His Arg Ile Ala 385
390 395 400 Leu Gly Gly Val Cys Asp Gln Ser Asp Ile Arg Gly His Arg
Arg Thr 405 410 415 Tyr Val Gly Ser Met Pro Gly Arg Ile Ile Asn Gly
Leu Lys Thr Val 420 425 430 Gly Val Asn Asn Pro Val Phe Leu Leu Asp
Glu Val Asp Lys Leu Gly 435 440 445 Lys Ser Leu Gln Gly Asp Pro Ala
Ala Ala Leu Leu Glu Val Leu Asp 450 455 460 Pro Glu Gln Asn His Asn
Phe Thr Asp His Tyr Leu Asn Val Ala Phe 465 470 475 480 Asp Leu Ser
Gln Val Leu Phe Ile Ala Thr Ala Asn Thr Thr Ala Thr 485 490 495 Ile
Pro Ala Ala Leu Leu Asp Arg Met Glu Ile Ile Gln Val Pro Gly 500 505
510 Tyr Thr Gln Glu Glu Lys Ile Glu Ile Ala His Arg His Leu Ile Pro
515 520 525 Lys Gln Leu Glu Gln His Gly Leu Thr Pro Gln Gln Ile Gln
Ile Pro 530 535 540 Gln Val Thr Thr Leu Asp Ile Ile Thr Arg Tyr Thr
Arg Glu Ala Gly 545 550 555 560 Val Arg Ser Leu Asp Arg Lys Leu Gly
Ala Ile Cys Arg Ala Val Ala 565 570 575 Val Lys Val Ala Glu Gly Gln
His Lys Glu Ala Lys Leu Asp Arg Ser 580 585 590 Asp Val Thr Glu Arg
Glu Gly Cys Arg Glu His Ile Leu Glu Asp Glu 595 600 605 Lys Pro Glu
Ser Ile Ser Asp Thr Thr Asp Leu Ala Leu Pro Pro Glu 610 615 620 Met
Pro Ile Leu Ile Asp Phe His Ala Leu Lys Asp Ile Leu Gly Pro 625 630
635 640 Pro Met Tyr Glu Met Glu Val Ser Gln Arg Leu Ser Gln Pro Gly
Val 645 650 655 Ala Ile Gly Leu Ala Trp Thr Pro Leu Gly Gly Glu Ile
Met Phe Val 660 665 670 Glu Ala Ser Arg Met Asp Gly Glu Gly Gln Leu
Thr Leu Thr Gly Gln 675 680 685 Leu Gly Asp Val Met Lys Glu Ser Ala
His Leu Ala Ile Ser Trp Leu 690 695 700 Arg Ser Asn Ala Lys Lys Tyr
Gln Leu Thr Asn Ala Phe Gly Ser Phe 705 710 715 720 Asp Leu Leu Asp
Asn Thr Asp Ile His Leu His Phe Pro Ala Gly Ala 725 730 735 Val Thr
Lys Asp Gly Pro Ser Ala Gly Val Thr Ile Val Thr Cys Leu 740 745 750
Ala Ser Leu Phe Ser Gly Arg Leu Val Arg Ser Asp Val Ala Met Thr 755
760 765 Gly Glu Ile Thr Leu Arg Gly Leu Val Leu Pro Val Gly Gly Ile
Lys 770 775 780 Asp Lys Val Leu Ala Ala His Arg Ala Gly Leu Lys Gln
Val Ile Ile 785 790 795 800 Pro Arg Arg Asn Glu Lys Asp Leu Glu Gly
Ile Pro Gly Asn Val Arg 805 810 815 Gln Asp Leu Ser Phe Val Thr Ala
Ser Cys Leu Asp Glu Val Leu Asn 820 825 830 Ala Ala Phe Asp Gly Gly
Phe Thr Val Lys Thr Arg Pro Gly Leu Leu 835 840 845 Asn Ser Lys Leu
850
SEQ ID NO: 3 (length: 112132; type: DNA; organism: Human; misc_feature (1)...(112132): n = A,T,C or G)
atcattaaaa agtcaggaaa caacaggtgc tggagaggat gtggagaaat aggaacactt
60 ttacactgtt ggtgggactg taaactagtt caaccattgt ggaagacagt
gtggcaattc 120 ctcaaggatc tggaactaga aataccattt gacccagcca
tcccattgct gggtatatac 180 ccaaaggatt ataaatcatg ctgctataaa
gacacacaca cacgtatgct tactgcggca 240 ctattcgcaa tagcaaagac
ttggaaccaa cccaaatgtc catcaatgat agactggatt 300 aagaaaatgt
ggcacatata caccatggaa tactatgcag ccataaaaaa ggatgagttc 360
atgtcctttg tagggacatg gatgatgctg gaaaccatca ttctgagcaa actatcgcaa
420 agaccgaaaa caaaacactg caagttctca ctcataggtg gcaactgaac
aatgagaaca 480 cttggacaca gggtggggaa catcacactc aggggcctgt
cgttgggtgg tggggagtgg 540 gggggaaggg ataccattag gagatatacc
taatgtaaat gacgagttag tgagtgcagc 600 aaaccaacat ggcacatgta
tacatatgta acaaacctgt acgttgtgca catgtaccct 660 agaacttaaa
ctataataaa aaataaaatt aaattaaaaa catgaaaaaa aataaaagta 720
tcaaggttgt aaaaaaaaaa aaaattggac gggcgcagtg gctcaggcct gtaatcccag
780 cacttttggg aggccaaggc gggcagatca ctgaggtcag gagattgaga
ccatcctggc 840 taacatggcg aaaccccgtc tctactaaaa atacaaaaaa
ttagccgggc agtggttgcg 900 ggtgcctgta gtccccagct actcgggagg
ctgaagcagg agaatggcat gaacccggga 960 ggcggagctt gcagtgagcc
gagatctcgc cactacactc cagcctgggt gacagagcga 1020 gactccgtct
caaaaaaaaa aaaaaaaaaa aaaaattgag gacttgccac agattagaga 1080
gcacctagga gatttcataa caaaacacct aggagatttc acaacaggat cctggatatt
1140 ggatcctgga ccagatccaa tgaaggacat tagtgggaaa actggcaaaa
tttgggtaag 1200 gcctataggt taaacgataa taatgttaat ttcctggttt
tgatcattga actatgatta 1260 tgtaagatga taacagacga aactgggtga
aaggtatata ggaactctgc tgtagttttg 1320 tacatctaaa atcaattcgg
gccgggcacg ttggctcacg cctgtaatcc cagcactttg 1380 ggaggccgag
gtggacggat cgcttgaggt caggagttaa agaccagcct ggccaacatg 1440
gtgaaatccc ctccctacta aaaatacaac aattagctgg gtgtggtggc gggcatctgt
1500 aatcccagct actcgggagg ctgaggcagg agaatcgctt gaacccggga
ggcagaggct 1560 gcaagccgtg ggtatcgcgc cattgcactc cagcctccgc
gacagagcga gaatctgtct 1620 cagaataaat aaataaataa ataaataaat
aattagttcg aatcaaaagt taaaaacact 1680 tcaagtatat gtaaaaaatc
gaagaaaacg ttaaaaacac ttcaagtata tacaattcaa 1740 ataagatcat
ccttccaaat atactctgta agtgaggcga aggtcgctgc acgcttgagt 1800
gcacgtcttt ccgcataggt aggacgctca agtcttaccg ggaggctctc ctagagagca
1860 gcgcgaagcc atggcttttg ggcccgggga cggaccgtag cgcgtagccg
gaagcggagg 1920 cgtggaggcg ggtctgaggt ttggtgactg cggggcaggc
cgggggcagc tgtctgtctg 1980 gctctttttg acagccccca gtgcgaaagg
ctgccagcat gtcatcagtg agccccatcc 2040 agatccccag tcgcctcccg
ctgctgctca cccacgaggg cgtcctgctg cccggctcca 2100 ccatgcgcac
cagcgtggac tcggcccgca acctgcagct ggtgcggagc cgccttctga 2160
agggcacgtc gctgcaaagc accatcctgg gcgtcatccc caacacgcct gaccccgcca
2220 gcgacgcgca ggacctgccg ccgctgcaca ggtaggcctg gctgcccccg
cggcggcggc 2280 gggcggcgcg gcctcctccg gggacctggg cccaggccac
ggcctgcctt gagcgcgagg 2340 ctcagttcgg ggcggccttc gcggctcggt
tccgcctctc tggtgctatc acttgcaaaa 2400 tggggatgtc agatacctgc
cccatgacca tgaatgagat cgttcatgaa gtagtgcctg 2460 acacctggtg
aaactacgca gttccctacc gttctggata atttaatttg aatcctcttc 2520
cccctctccg caattcctcg ccctcggtct tcagcctcct aggccagtgc ttttaacttt
2580 ccaggccctt tctttctccc cggtgatctc tgccttcact tgccttcgct
tttcaccttt 2640 ctccccactg ccctttactc ctatccgcct ccccttttct
gtcacccatc atttttgtcc 2700 gctgaggcat tctctgctcc gtgagtttta
acttttcctg tttcattcct aaactgcact 2760 atttgtgggt gcctttcttc
tatactccct gccacccttc tccttctccc cctaatcctt 2820 ctgtttccct
ttgtaaaggg cctttactgc tcacattttc gctggtcccc ctttctggaa 2880
ctttcctagc ttctcacctc tgctccttca ctcataacat ttcttaggcc ccaggcttac
2940 tactatattg cccagtaccc tcgccctatt ggtgtgactt tgggtgagag
ctttaacctc 3000 tatttctttt attctgcaat ttggaaactg acagcatcca
tctcttaggc aagttatgaa 3060 gaataaattg aataatgtgt atattccact
ttgcaccatg catgatggat gactttgctg 3120 tccagtactg tgtagtgcat
gtggctcgtc aaattgagat gatagaattg ccagttgtcc 3180 tggtttgctg
cgattgtctg ttttagcatt gaaagtccta tgttttagcc cctccgtccc 3240
agggaaacca ggaggttggt caccctaaat gtgctgtaag tgtacaatac acgccagatt
3300 ttgaaaaaac tttttgatta atacatttta tatggattaa atgttggaaa
ggtaatattt 3360 tgagtacttg gggttaataa aatgttaaga tttctgctgt
ttttacttta taatgtggcc 3420 actaaaattt tatatgtggt ccacattata
tttttattgg acaatgctgg tatatcgtat 3480 gctctcaaca agtatcttca
aactcacctg ccaagcaccc gcctcctatt cctaactcta 3540 ctggaggtgt
tgtgttttca gtttagagct tctcctttcc tggcagttat cccttatttt 3600
taaattaggg gttcctgact ctgaatggat ttccggaggg ttggacatgt cttatttttc
3660 ctcaaaatct tgtgactatg tacatttttt taggagaatc ctttgctttc
ttcagattct 3720 caaaggagac tggtacctcc ccccaccccc gttaaaagaa
agcaaaacaa agcaacaaag 3780 accaacaaac cttccacagc agcccagtat
tcatttatat tgtaaaagcc ttgattttct 3840 caagcatgga aaatattttg
gctcccatct gacctgcttt ggttattgcc tgagtggaat 3900 tggtcacatt
ccaagtttca gtactctttg ataaattgta ttggattcta gtttcccaac 3960
atacgactct gctccttctg cttacttttc ccaaattatt ttgccttctg tgcccaggca
4020 cacttagttc cctgtctagg caagagtggt cattattaga cttcattttc
tttctactgt 4080 gcatatgtat tgattagcca tgggcacatt gtgaacttga
aaagtcgatt tagtcacatt 4140 ttaagtttca ctatttgttg gtattattct
ggcaagattt tggaaggttt ttattattta 4200 ttcatttgtg tattttttga
gacagagtct cattctgtct cctccgctga agtgcagtgg 4260 cgtgatcgta
gcccaccgca accttgattg aactcctggg ctcaagtgat cgtcctgcct 4320
cagcttctgg agtggctggg actataggcg tgcaccacta cacccagcta atttttaaat
4380 tttttgtaga aatggggtct cactatgttg cttaggctgg tctcaaactc
ctggactcaa 4440 gctatcccct gctttggcct ctggagtagc tgggactata
ggcaagcgcc accataccct 4500 tcaggttttt aatttatttt atgaaaatcc
ctccaaagca acaatcctca attctcctgc 4560 ttgaaagtaa tcactaataa
tcaggtactg tgtgatctga tccttgatgt tcatattatt 4620 gcctttaact
gagtagcaat gttaaaattt aatcatttaa attagaaaac atatattgaa 4680
aagtcttcat agaagtccgg cattataaga actcatcaga ccatctagtt atcctagaag
4740 tattgtttgc tacttaaaaa gcctatgtgg aaagattgta ccatattcct
tggtaatagt 4800 ttccaatgtc tttttttctc taatagggcc tttaaaacac
tctacttaaa aaaaaaaaaa 4860 aaaaaaggct ttaacaatac caatactgag
taatccatag cattagcctg tttccacgca 4920 caagtctgtc cttccccagt
tacctgcttt tctgtatggt agcccagagg ccagaagagg 4980 ggctctgttc
ctttctcttg tttcctttgc gctatccagg tgacgctggc acagccttca 5040
aagagcagca gaagtaattt gctcccagcg ttctttgcca catagagtgg cagggttaaa
5100 tgatttaaaa tttaatcatt taaattagaa aacatagatt gaaaagtctt
catagaattc 5160 cagcattaaa agaactcatc agaccatcta gttatcctag
aagtattgtt tgctacttca 5220 aaagcctatg tggaaagatt gtacgatatt
ccttggtaat agtttcgaat gtcttttttt 5280 ctctaatatg gcttttaaag
cactctactt aaaaaaaaaa aaagctttaa taataccaat 5340 accgagttat
ccacagtatt agtctgtttc catgcacaaa tctgtccttc cccagttacc 5400
tgcttttctg tatggtagcc cagaggtgag atgaggggct ctgttcctgt ctcttgtttc
5460 ctttacacca tccaggtgac actggctgca gccttcaagg agcagcagaa
gtaatttgct 5520 cccagcgttc tttgccacac agagtggcag gattagatgt
tgacttacct ctgccacttc 5580 cttggtggtt ttgagtagta cagtcccttt
ctgcacgtta gtgtgcaggc atgttgcctg 5640 caggagcctt tttaaaggag
gagctttgga cttgtcctgc agtatagaac ttggctggca 5700 tgctgaccca
gggcaccctg catttttctg cttagtagaa ctgcattttt agtgcttcct 5760
gagtgaccca ttgttttctt agtgaaaagg ggtcataatt tagtactacc tgtacaatat
5820 cctttcaagc atttcaagat ggtcatccag ctttcttcca aatttacact
tttcagggta 5880 catggcttca tttcctcata gtgccgactt ctcagtctcc
ctcaccaggc tggtgtcaaa 5940 cttgtgagct caagtgatcc tcctgcctct
gtctcccaaa gtgttaggat tacaggcgtg 6000 agccaccatg cctggcctat
gtttataatt cttgtaggta gaagtggtac ctattgtcca 6060 ttgtaatgag
aaaaaagtaa aatttgtctt aaaatataat taaggaactc aatttattaa 6120
atttaaattt atcctttaaa ttttaaattt aaatttattt cttaaattta tttctattac
6180 attttcttgt aaccatgtac acctaagttg ttctacttta atttttttga
gacagggtct 6240 cactctgtca cccatgctgg tgcagtggtg ccatctcagc
tcactgcaac ctttgcctcc 6300 caggttcaag tgatcctctc acctcagcct
cctgagtgtc tgggattaca ggcatgtgcc 6360 acaatgccta gctatttttt
tttttttttt ttggtggaga cggggttttg ccatgttgcg 6420 caaactggtt
tcgaactcct gagcccaagt gatccacttg cctcggcctc ccaaagtgct 6480
gggattatag gtgtgagcca ccatgccatg ttctaccttt ttgaatctca tttactcact
6540 tgtaataagg aaataatact accttcttca tggggtgaag ggaggtataa
aatgaagtat 6600 acatatgaaa gccttttgaa actgcaaagc attctaaacc
tatatccaaa tgggtagttt 6660 taaatgtaga ttttcacaaa aggggattaa
agagaggagt ggggaggccc catattattc 6720 caacacgggc tgaactgaac
taacatcatt gcaggaaggt cttggaagat taaagattcc 6780 aagaaaaatt
aagggctttg agtaaaaaaa ttttttaaaa gtggctgggc ctggtggcac 6840
gtgcctgtac tcccatctac tcatgatgct gaggcggagg attacttgag cccaggtgat
6900 cgaagctgca gtgagctata atcgtaccac tgcactccag cctgggtgac
agagcaagat 6960 tctgtctata ggaaaaaaaa aaaaaaaaaa agcaagtgct
gggcatatag gctggaatta 7020 gatatttaca taatatcctc atcttggaaa
actttttcca gtagtgctgc ttttagattt 7080 tcccactact
gcagttgatg gttcttaaat atgtttggaa ctcttatatt atttaggtca 7140
gtttccaaat tacacaaatt gtaaccattg tagtcagacc tcacttgaat gaaaacaata
7200 ttttacaaac tctgagggta gattcgagtt aggatttgga ttaaaacatt
atcttaaaac 7260 ctctgagggt agattcgagt taggagtttc aaaacttctt
tgaacaatat cataattagg 7320 atgtagattt acagagctac tagctaaagg
gaaggacacc agtcattggg atgtataagt 7380 ttggatctgt tgcaaaatta
aaatgctgcc ttttgagcat gcctaataat gcacatacaa 7440 tagaagagcc
agaattttta gaaaaatgac tgacttgata tacaaccttt tgtatatcat 7500
agaaggaaaa tattagttga gtattttgtt tatttacctg tttgtatata taaaacctgg
7560 ggcccaatat acaatagatt ctttttcact atgcttttca cccacagtgt
ctcaccaggt 7620 actctgtttc tagccatcta taatttcata gatgtttttc
tttaaaaggg atgtattcta 7680 ggctgggcga ggtgggtctt gcctgtaatc
ctagcacttt gggaggccaa gatgggagga 7740 ttgcttgagg ccagtagttg
gagatcagcc tggtcaacat catgagatcc catctctgtt 7800 aaaaaaagaa
aaaaaaattt ttttaaaggg ataatttcta gtcaactata agtgatttta 7860
agtaaaaagc aattaaggca tgtatacatc tgtacctttt gtaggcatag tataaattca
7920 gcttaatctc ttcagtttgg aacatcttcc tttcacagca aaaatattgt
atttgcttta 7980 taagaaaacc ccttttggcc aggtgtggtg gctcacgccc
gtaatcctag cactttggga 8040 ggctgaggtg ggtggattac cttaggtcag
gagttcgaga ccagcctggc caacatagtg 8100 aaaccctgtc tctactaaaa
atacaaaaat tagctgggcg tggtggtgtg tgccctgtaa 8160 tcccagctag
ttgggaggct gaggcacgag aatcccttga acccaggagg cagagtgcaa 8220
taagccgaga tcacgccatt gtacgtcagg ctgggcgaca gggtgagact ccctctaaaa
8280 aacaaacaaa aaaaccacag tggctcacac ctgtaatccc agcactttgg
gaggccaagg 8340 tgggcgaatc atgaggtcaa gagatcgaga tcatcctggc
caacatggtg aaacctcatc 8400 tctacaaaaa atacaaaaaa ttagctgggc
gtggtggtgt gtgcctgtag tcccagctac 8460 ttgggaggct gaggcaggag
aatcacttga atctgggaga cggaggttgc agtgagccaa 8520 gattaggcta
ctgcgctcca gcctggtgac aaagtgagac tccgtctcaa aaaaaaaaaa 8580
caaaaaacaa aaaacaactc tttagcatca ccttttagca atgacatagc ccaaataatt
8640 aaatttgtct cctgatcgga gatttggatt tgtctcatct ctctttctgg
ttcctccttg 8700 gtttctactt tgtaaaccct ttaggccggg gatccagttt
cttgtctgtg gatgttttat 8760 atacaaacag gactgtgagc tctttcagca
ttgtacaaac agtgatgaat atcatctgca 8820 attaattatg tttaagttat
tctctaatca gtttagaggt ggctcacttc ctcaggcaat 8880 ctgagtgggc
tttcaggaag tgggaaatat tatctactat tgattgaaga aaagcagcca 8940
caacacaaat aagtcaaaat aatagctaat tgctaaataa tttcaagttt tttatgtatg
9000 tgattttttt ccctcaccaa tttatcttct cagttgtttg gcttattatt
taaatcagtt 9060 tttattgtaa acatggtaat gactgaaagg taagaaaagg
atagacgtag ttcagaataa 9120 actgagtggc agaaagaagc caaaggctat
gtgtaatcta cggaatgagt aatttataag 9180 gaagtaatca agaattcact
gtgtatagaa gtaagcaagt tcactcacat agtcacatac 9240 tgtattacat
gatttattat ctttgagatg ggcaggtgtg gtgttcttct attaccgctt 9300
tcctagggtg ttgagagttc tagtccttct attttctttt ctggaattac cacttttcct
9360 atggctgaag ggagaaaata ttatttattt tgggatctgg aattgtcttc
tcaatgttga 9420 tttttgtatt ttatataact gacttagttt ggatgaggct
tcctttctgt gaattaaatt 9480 tatatgtgac ttgatcagag ttgtatttgc
tgatgaggag ctgagacttg aagccttttc 9540 acctattgtt aggtaaaatg
attaccactt agaactaggt tgagaccttt tgagatgtgg 9600 gtctttcttt
agctctcctc agtctatggc agtgtgtgga ctgtaatatt tagccctcac 9660
acttagaaat tcagtgttaa gggcatatat ataagttccc agtatgtgat ggcagcttgt
9720 gataaggtgg gtatgtggaa gtttcataga ctgattatgt aagaaaactg
acttgatgtt 9780 agtagcacaa ctggtgttgg aacggagatt tcttagattg
gtttatgcta tttatattta 9840 aatgtattta aattgataat atttatcctg
gtataagatt gccttattct tagttgacaa 9900 tgttaattta agatatgtaa
ttctcagctg cttttctctt acatttttac gcttgaataa 9960 tccaagtgtt
tacaaattcc tacctaattt tttaaaagag gtgcagatta tagtgagatg 10020
gtctgctttg ccatatagct gagggtagtg gcagaagagg ccacatactg gatgctaagt
10080 taaatagaga aaaaatttat ttacacttca gatgtctttt gcttaatgaa
tgtatcagaa 10140 aagccaacac tttctgaagt gagtttctgt tctaccgtat
tgaatgtttg taataccgat 10200 gttttgtgtg tttttcagga ttggcacagc
tgcactggcc gttcaggttg tgggcagtaa 10260 ctggcccaag ccccactaca
ctctgttgat tacaggccta tgccgtttcc agattgtaca 10320 ggtcttaaaa
gagaagccat atcccattgc tgaagtggag cagttggacc gacttgagga 10380
gtttcccaac acctgtaaaa tgagggagga gctaggagaa ctatcagagc agttttacaa
10440 atatgcagta caagtaagtt gcttttattt tttcttaaaa cccatttttc
tttggttctt 10500 ttgctttcct aagatatggt gaatctgttg gatagtgaag
ttttaggaca gtatacattt 10560 aaatgagtta gtaacattat atattaattc
tgatttactc ttatctgggg ttgtacctaa 10620 atcattccag gacatattgg
cctacccttt ctaaagtttt ccaaatgtta tttctacagc 10680 tttccttcta
acttctactg tctctaaact agataattat taaacctaaa tatttaaagc 10740
taaaaaacga aatactgcac agaagctgtc tgtcactaaa atatctaggc accatttata
10800 taaattacaa tatattactt caaaagtcaa gatcacattg tctagcagta
actatggtag 10860 atcaagcctg tggtgggctg atttcaagta tggttaaaac
cttgattaac tagaatgctg 10920 ggaaggaagc acattttaga tatgcattaa
atatttgact ctttaattct agttcttttt 10980 ggttaactct agatagaaca
gaaagctcct attcccaccc cattttgttt caaaccttaa 11040 tgaaacataa
aattataaag tatagtcttc tacttttcta ttagtttaat ccagtgacta 11100
taactagatc tatgaggatc agataatgtt taaaagtcac aattataaat actactgatc
11160 attgaaatat gtgtggggca agtgttcata gccagtggta tttgtatctg
atgtggcatt 11220 tgaagagcca tacttacagt gtaatgaaca ataacagaaa
aatagtaaat ttgagggcca 11280 ggtgcgctgg tgcacacctg taatcccagc
actttgggag gctgaggtgg gtggattgct 11340 tgagcccagt agttcgagat
cagcctaggc agcatggtga gatcccgtct ctacaaaatg 11400 tacaaaaatt
agccgagtgt gatggtgcgt gcctgtagtc ccagctactg gggaggctga 11460
ggtgggagga ttacttgaac ctagtaggtg gaagttgcag tgagccaaga ttgcatcact
11520 gcattccagc ctgggcaaca gagcgagacc ctgactcaaa aaaaaaaaaa
gaaaaataga 11580 aaatttgaat ctgtaatttc tatatgggct gaaagaaagc
actttgagga aagaaatttc 11640 agtttgaaaa ctggaataag tgaatatact
gcttaggaat aaaggagatt gagagaaata 11700 gaatttcttt ttcttttcag
cagtgatgtt ccctgggtct ttgtgcctct attggacata 11760 gatagcttca
tagcctcttt tgctttgctt ttacttcttt gtactttgaa tctagaggaa 11820
ctttttaaac ttgtaaagat tttgcagtga cattaaagga atttttagaa ataaatagat
11880 caccacacat cttactgtca tcatgcatca aatttaattt ttgttcgtct
tctgggctca 11940 gttcatattc aattatatgt tttgtttttg tatccatgtc
tgatgttcat attaagtact 12000 tttgttaatt tcattgagtt aatgtatact
aattttataa tttctctttt tagacattaa 12060 agttatttcc aattattctc
tttcatcccc ttctgcatct acttctactt ctgcatctct 12120 tcaatgaact
tcttcaatag catcctgtct cctagttctt ctgtcttgaa ccttttctct 12180
tcactgagcc tttctaaaag aagtctgggg catcccattc ccttgagtaa aagactttaa
12240 tggctatagg atggacacca aatttcttag tataacatta agaccgtttg
caacttgtct 12300 tgggcctatc tgtcttgcgt caactctagt tatcacctca
ctgacaccct agttctagct 12360 ctactgaatg taaaacagct tcacattgag
ttattttatg tctctatgat tctgccttca 12420 gttctctgct gggagtgctc
ttccatctct gatttttttt tttttttttg aaatggagtc 12480 ttgccctgtt
gcccaggctg gagtgcagtg gtgcaatttc ggctcactgc agcctccgcc 12540
tcccgggttc aagcgattct cctgcttcag cctcccaagt agctggcatt acaggcatgc
12600 gccaccacgc ccggctaact ttttgtgtct ttagtagaga tgaggtttca
ccatgttggc 12660 caggctggtc tcgaactcct gacctcatga tccaaccgcc
accacgcccg gcctccatct 12720 ctgaatttta aaattgaatc tatgctttcc
caacagctgt aggctgttag cgctcatctc 12780 tgtgtgcctt cacagtctgt
catacatgtc atttaacata atgcttatca cattgtattg 12840 aaatgtatct
tataggtatt ttttctctac caaacttgaa ttcacttttc tcctttagcc 12900
atcctgtact gagcagtgtt ttgggtctgg caaatagttt gtactcagta aatgtttgga
12960 aaatgagttt taactgtttt attttcgtgg ggtgaattcc tagtagcaag
ggtattcaaa 13020 ttttattatc tacttcttcc acctgaacag cttcatcgta
attatacttt aattcccttc 13080 attctaggca ggtaatggat aagttccaaa
attacgatgt tgttggagag gtttgaatat 13140 tactagcaca tgaaatctga
tttgaactga ctaaatgaag gtttagtaca tcattatgaa 13200 ttagtgtgaa
ctaagttttg ctatgttaac ttctctgaaa tctcagtcgc ataatgtgag 13260
tgtctttctg gctcatgctt catgcctgag actagtgggg gttgtgtctg cctattaaag
13320 tcactcggac ccaggtggat tggagattca tctaaagaca tgcttccctt
atctctaagg 13380 caggaaaagg aaatggggcg catctctcat tggcttgtaa
tgcttctgcc cagaaggagc 13440 tgtcacttcc actacgtttc atggatcaat
ttaagactca tagacacacc tattagtata 13500 ttcgaaggaa gttagaaaga
gcagtgccca gaagaaaagg ggagtttgtc agtagcccta 13560 atgactatca
cagttactga aagtgtgcct tgggcataat ctatcttaac tcccagatat 13620
acgctgacag ttgtttttct aaaagtcatt cacagtgctc agattctagt tagtccaaat
13680 tgatatggtt tggctgtgtc cccacccaaa tatcaccttg agttgtaata
attcccatgt 13740 gtcagggggc ggtgccaggt gtagataatt gaattatggg
ggcggttccc ccatactgtt 13800 ctcttggtgg tgaataagtc tcacaagatc
agatggttat ataaatgata gttcccctgc 13860 acacgctgtc ttgcctgcta
ccatgtaaga caggcctttg cttctccttt gccttcctcc 13920 atgattgtga
ggcctcccca gccatgtgga actgtgagtc cattaaacct ctgtctttta 13980
taaattaccc agtctctggt atgtctttat tagcagtgtg agaacagact aatacaaaat
14040 gttatactaa atattaatat ttcatcctct gattggccgt gataatagca
tcaactatgc 14100 taaatttcta ataatacaca tatttctaat aatatgcatc
taatagggtt tatattgtga 14160 ttatgtaaga gaatattctt gttcttaaga
acaagggtcc ttaatctgtc acaggattag 14220 agatttaaag aataaggatc
tcgattctgc agcttatcct caaatgttca ttaattatgt 14280 gtgagtgtgg
agagagagaa agcaaacatg gcaaaatgcc acttttcagt tggtgaattc 14340
aattggtgaa tctggaagaa ggatgtacag gagttattgt atgattcttg caactttttt
14400 gtacatttga atttttttca atagaaagtt aaaaataatc atggcacagg
tttacaaaac 14460 ccttgtaaac attagtgtta actactttta agccattatt
gcttttcatt ctgattgatg 14520 ttttgaaagt acttttcttt tcctctgagg
cctgtaaaat acgtggacta tattaatcag 14580 tgatctttca aaaacaaaga
ctgaggccca aacattaaac ctagatggaa atctgatttt 14640 taaaaattca
caaataatgc cagatttcat ttaaaagact ttttttcccc cttctagttg 14700
gttgaaatgt tggatatgtc tgtccctgca gttgctaaat tgagacgtct tttagatagt
14760 cttccaaggg aagctttacc agacatcttg acatcaatta tccgaacaag
caacaaagag 14820 aaactccagg tacagtgttc ccttttgaac gccaggttgc
tttgtcactt tttattgaga 14880 actagatagt gagtagttaa gttttgacct
tcaagaaaaa gatattggag acccaaagta 14940 attgaaatgc ttttacattt
aaactgactt tcaaatgtga ttgttttata tttttgttga 15000 cacaagcagc
tcttttattt tatatttttg ttgacacaag cagctctttt atttgcataa 15060
tcagtaatgg tagtcaattt acagaaaaag ttaaagcaaa gaatcataaa aaggtaaata
15120 tttgactggg tgctcacgcc tgtagtccca gcactttggg aggctgagat
gggtggatcg 15180 cttgagatca ggagttcgag accagcctgg ccaacatggt
aaaaccccat ctctactaaa 15240 aatacaaaat tagctgggcg tggtggtgcg
cgcctataat cccagctact cgagaggctg 15300 aggcaggaga atcgcttgaa
cctgggaggc agaggctgca gtgagccaag attgcaccac 15360 tgcactccag
cctgggcaac agagactctg cctctaaata aataaataaa taaatattta 15420
atttaactta aatatgtaga cattctttga ttcactattt ttaaacgtgg agccatggcc
15480 cttcccttat gtgtggacct gctttcttag aatcttcatc atgtttctta
tataaatcac 15540 acctatgatg cattacttat aattttaaat ttatatttat
ttaaagtgaa atgaatttta 15600 aagacacttg aaaagtaatc caagtataga
atcctacatt tacatgactt aatccccaaa 15660 ctgtaatact ttaagttttc
ttgcacactt atttttaaga tatttttaaa gcagtatttt 15720 taatgaatca
tcctagaata tttgtttgtt ttcagtgaaa cagctctttc atatgttatc 15780
agtttattta atacttaaat ccaactgtta taatagcaaa tacaactaac acaaacaggt
15840 tggttataca caggaattca attaatccag tgggagtaga agagttacag
gactgccaga 15900 gagccccctg gctgtgggcg gcagcagtgt gttttactgc
gggaacagag agcggcctgt 15960 gctccgacaa atcactagtg agagttggtt
gagtgcttct gttctcttgt gtatgtaaac 16020 atttaatatt ttgaacctat
aatttgttta gatctaatat gaaaacacat tctgggcttc 16080 aagagagtaa
ttcccagaaa gagttgacgt caactgtgtg tctggttttt tcatcttaaa 16140
aacacacagc ttcggccggg cgcagtggcc cacgcctgta atcccaacac tttgggaggc
16200 cgaggtggga agatcacgag gtcaggagat cgagaccatc ctggctaaca
gagtgaaacc 16260 ctgtctctac taaaaataca aaaaattagc cgggcatggt
gtcgggtgcc tgtagtccca 16320 gttactctgg aggctgaggc aggagaatga
cgtgaaccca ggagggggag cttgcactga 16380 gccaagatct cgccactgca
ctccaacctg gggacagagc aagattccgt ctcaaaaaaa 16440 aaaagaaaaa
aaaaaaccac acagcttcat tttaaagtga aaaaccaaga tcctgttttt 16500
tctttctttt ttaaggattc tgatattcat ctcaaacaac cttgctgatt aatatagttc
16560 atttggttgt cttagccata gtgtagcttt gaatactgtt aataattttt
ttttaacttg 16620 gcaatttaaa ccatggctct gactgtctgt ttttggattg
tgtgtttctg agagagatcc 16680 tattgattga ctcacatttc cttagatttt
agatgctgtg agcctagagg agcggttcaa 16740 gatgactata ccactgcttg
tcagacaaat tgaaggcctg aaattgcttc aaaaaaccag 16800 aaaacccaag
caagatgatg ataagagggt aaatatttat tttaacccat ttcagttttg 16860
aaaaaaaaat aaggagaata aagagaggaa caaagaagaa aagtttattg tctcctacca
16920 ctcgcactac tgataaaatt taggtgtttc cctctcatcc ttttctttgc
ctggattttt 16980 ttttaaagca tgtaagcatt tttctcactt tgttttggtt
atcatccaaa aggataattt 17040 actgagccat ttcccctttt gtgttgtttc
caatgttttg tgtattgtaa acactaacaa 17100 ataactatga tgggtgtctt
tgagtataac atttttttac tgcatgtaat actaagaaac 17160 taatacaaaa
ctctttctta aaaggactat atgttgtgtc aaaatttggc tgttttcaac 17220
ttataataag tttccatttt tatttagtca aactcttgat ctttttttgt tttctaagct
17280 taagtcctct aaccttcagt ggcttgataa atattcactt tcctttcagt
ttaattttag 17340 ttgatttttt aaaaagtatt taattcttta acccatatat
tattttgaag acagcagttg 17400 tatttttccc tcaaatagct ttttgtttga
ctcaacacca ctaattaaat aatccttccc 17460 atccccatta tcatctatta
catttatatg tatgatggga tctgtttgaa gtctaccttg 17520 atctgctgat
tttactattt ttatgtctgg acagagttta tattaggaag atatatttga 17580
tgtggacagg atgtgaaaat ggcatttctc tgaaggtgtt gagatgcagc gctctgactt
17640 aagttgaggc gttgagaatt atgttagcaa tttgacgttc atcagcgcag
aagtcttgtc 17700 atcaaagaga atacattgta gagaaagcgg agcagaaggg
aagaactcct ccccggtggg 17760 actagagaag gggcagtcaa gtaggctgag
gagagagata ggaacagtga tgatcatgct 17820 ggcgattagt actccaggac
accatgctgt ttaaaacatg cagaaagctg gattatttct 17880 ggcttgagat
caggtcaggg actcaattac tcattttgta tagagagaca aatccactgg 17940
gagttgcaga aaactgcaac ttactctcag taaagtttgc catcacttaa aatgaaagtt
18000 tttcaaaagt gctccagaaa ataagcaaga gacagttatt taaaaagtag
gaattaggat 18060 aatatttgga gttaacctaa aactctctcc tttttgttcc
cctaagagtt gaaaagcact 18120 gttttagcag tcaggaagga aaaatgcatt
aaaaagtgct tttgtcttaa caatgaaatc 18180 actgatatgc ttataaaaat
ctcactttta aaaaatatat aatatgttca gttttttatt 18240 tataatattt
tatctgctga tgacttatgt aagaataaaa gcatatattt agtacttgtg 18300
tttttataaa attaaatttt tatttactgc tttatgtttt aaacattttt atatttgaat
18360 gtattaaata gataaatttt ccaggttaaa aaataagttc tgggctgaat
gcagtggctc 18420 atgcctgtaa tcccagcact ttgggaggcc aaggaaggag
aattgcttga ggccaggagt 18480 tcaagaccag gctgggcaac atagtgagac
ctcatcttta caaaaaaaat ttaaaaaatt 18540 agccagcatg ctggtgtgtg
tctgtagtcc cagctattta ggaagctgag gtggaaggat 18600 tacttgagcc
agggaggttg aggctgcagt aagcagtgtt catgccattg cacttcagcc 18660
tggattacaa agcttgacct tgtctcaaaa aataaaatgt tctgggggct tttaaattaa
18720 atgctagtat ataattttgc tccagtagtg gttgtttatt catgaatttc
aaggagcata 18780 taaggtagtt ttaacatatg atagagagat catagagaat
acaaaggcca tttgactttg 18840 cacagaatat gttttttaga tttgaaagaa
caattttggc aggatgggaa cagatgccga 18900 aggctcactg aagtaattga
tgaggtaggg gatctggtgg ttatagccac ttgctggaga 18960 agcagaactt
cacaagaaag gaagtaaata gtgcgatagt taactagaag aaactagagg 19020
taagaaaaaa atattttgaa agcaggaaag ctttgaagac aaaatagagc cagtggtgga
19080 aaggttgaag atgctaggaa gaaattttgt aatgtaggag ataaaatgga
atttttttca 19140 gtcaccaaat ggtaagaagt aatgtatttc aagaaaatag
tggctgcaat agtagctcaa 19200 agaaaggtaa ttcctagatg gtttaattat
ttctagtatc cagttccttg aaatttgttt 19260 tctcatgcaa gtattattgt
aagcatatac caaagaatca tgtctacctt acgttggtct 19320 acttctgcaa
ttctgctgcc tctctgtata caactgcctt ttgattatca ttctgaactt 19380
cacttcctaa agatagagac tgtagtcata aaaatattta ttcagcacca gtcataatct
19440 tatgtgtacc tgggtacttc gtttccaatt tattttgaca tacggtttta
cttttctgct 19500 ttctatgtta ggttatagca atacgcccta ttaggagaat
tacacatatc tcaggtactt 19560 tagaagatga agatgaagat gaagataatg
atgacattgt catgctagag aaaaaaatac 19620 gaacatctag tatgccagag
caggcccata aagtctgtgt caaagagata aagaggtaaa 19680 ttataaaagg
catttgttca ttattgtttt cattcttggt actcctgatt aacaccactt 19740
tcactactct tttctccaat actgaggata cataatacaa atcttccacc tgcagtgtgc
19800 tgtcaggcaa tataactctt gcagctgcct ttttgttgtc tgaaagaaca
gaccatgctt 19860 ctttgtttat acgtaatgtt tgttcagtta gcatcatatt
cttcacatgt gacttttctt 19920 ctctagatta taaactctca agggcaagga
ctgtccattt ctctttgtac aagacaaagt 19980 acagggaaac cttgataaca
gaataggata tatgggttga ttacattttc tggatatccc 20040 cagtgttaaa
ctgaaagcca tttttccttt gcatactttt aactttataa ctcttattac 20100
attttctttt attagtgaat tgtagtgagc ctgcttgaat gcttagtgac ttaatatttg
20160 actttctgag gcttacagtt aagaacatta gtaattgtag ttgatgggta
ttttatattg 20220 cctctgacat tagttaatat atgtagaaca tttattatgt
gcagaacact ttgctaagca 20280 ttgcatatat tatggaagta gcatttgtta
ttaaatatat gatattagct tgcttttatg 20340 agcagacctc actcatctct
gatacaaaaa aaaatgtatt gtattatgca tagttaggca 20400 cttacatctt
attgtgataa gtaaaccaat ggatatatgt cacttgacta tccctgtgag 20460
cttaaaaggg acacacacta gtaaggccat atttccaggt tagaattaga tataatgttt
20520 tctcctgcag tttgcaggta tctgccttat tttgttttgt aagtacctta
agtacttaga 20580 aaatatgaga atactttgta gagaaagcag agcagaaggg
aagaacccct ccctggtggg 20640 actccagaag gggcagttaa gtaggctggg
gagagagata ggagtggtga tcattacatt 20700 acaaaacaaa ataaacgttt
tattatctgg atactttaaa actttttcag atttgtttaa 20760 acatgcatga
tatatctaac caagaaagag agctgtgttt gatttttctg ttatggaatt 20820
tttctgtgtt cttgaacatg tttgctgtgt attctttctc cacagactca aaaaaatgcc
20880 tcagtcaatg ccagaatatg ctctgactag aaattatttg gaacttatgg
tagaacttcc 20940 ttggaacaaa agtacaactg gtaagccaaa aaataacacc
tgttttgcag tctaattgtc 21000 actcagaaag ctcatgcaat ttttcatttc
aaatttactc cactgattgt cgtactgtta 21060 aattattttt gttttcaatt
tttttgaaac cattttattg aagtgtgatt gtcgtacaaa 21120 aagctgtata
taattaatga atacatctca gtgagtttca gaataagtat acacccatga 21180
aaccatcaca atcttcatag ccataaacat atccgtcacc tccaaagttt cctcctacct
21240 cttttgtgat tattattatc atcattatta ttggcttttt tcttttggtg
ctggtggtaa 21300 gaacattgaa cataaggtct aatgttaaat taacaatatt
gttagcgata ggcacttttc 21360 tttatagtag atctctagaa cttatttatc
ttgcataagt gaaactttgt tccctttaac 21420 catcacctcc catttccttc
tcctctcatc ctgtggcaac tactagtcta ctctccattt 21480 ctatgagttt
cactatttta gattccacat gcattaaata ggtgaaatca tacagtactt 21540
gtctttctgt gtctggctta tttcacttag catgatgccc tctaacctag aggtccatcc
21600 atgttgtcac agatggcaag atttccttct tttttaaggt gcataatatt
ccattgtgtg 21660 tctataccac attttcttta ttcacttatg tgtcagtaga
catttcagtt atttccgtat 21720 cttggctatt gtaagtaata ctgcagtgaa
tacggaagtg cagataactc tttgagatcc 21780 tgatttcagt tcctttggct
gtttacccag aggtggcatt gctggatcat atgtaagttg 21840 tatttgaact
tttttagtaa cttccatact gttttcataa tggctgttat cgggggacct 21900
gccccaataa tcatgtaggt tcttttctat tttcctaagc attggctggc ttgagaaata
21960 aagagacaga gtacaaaaga gagaaatttt aaagctgggt gtctggggga
gacatcacac 22020 gttggtagga tccgtgatgc cccacaagcc acaaaaacca
gcaagttttt attagggatt 22080 ttcaaaaggg gagggagtgt gcgaataggt
gtgggtgaca gacatcaagt acttaacagg 22140 gtaatagaat atcacaaggc
aaatggaggc agggcgagat cacaggacca cagctccgag 22200
gcgaaattaa aattgctaat gaagtttcgg gcaccattgt cactgataac atcttatcag
22260 gagacggggt tttgagataa cggatctgac caaaatttat tagatgggaa
tttcctcttc 22320 ctaataagcc tgggagcgct atgggagact ggagtctatc
tcacctctgc aatctcgacc 22380 ataagagaca ggtacgcccc gggggggcca
gttcagagac ctacccctag gtgcgcattc 22440 tgtttctcag ggacattcca
tgctgagaaa aaagaattca gcgatatttc ttccatttgc 22500 ttttgaaaga
agagaaatat ggctctgttc tgcccggctc accagcggtc agagtttaag 22560
gttatctctc ttattccctg aacaattgct gttatcctgt tctttttcca cggtgctcag
22620 atttcatatt gcacaaacac acatgctgta caatttgtgc agttaacgca
attatcacat 22680 agtcctgagg ccacatacat cctccttggc tgacaggatt
aagagattaa agtaaagaca 22740 ggcataggaa atcacaagag tattgattga
ggaagtgata agtgtccatg aaatctttac 22800 gatttatgtt tagagattgc
agtaaagaca ggcataagaa attacaaaag tattaatttg 22860 gggaactaat
aaatgtccat aaaatcttca caatccacgt tcttctgcca tggcttcagc 22920
cggtccctcc gtttggggtc cctgacttcc cgcaacacgc tgtaccaatt tacattccga
22980 acaacagtgt acaagggtgc ccttttctcc atatcctcac cttcactgat
gatggttttt 23040 ttgtttgttt gtttgttttt ttaaataatg gccatcctaa
caggcataaa gtgctttctc 23100 attgtggttt tgatttgcat ttccctgatg
attagtcatg ataagcacct atttgatttt 23160 ttgccgttaa gtttcatgag
ttccttgtgt attttggata ttaacccctt atcagaaata 23220 tggtttgcac
atattttctg ctgttacata ggttgccttc tcattttgct gaactttttt 23280
tattctgtac agaagctttt cagtttgata taatttcact tgttcatttt tgcttttgtt
23340 gccttgactt tggtgtcaat atccaaaaat accatgccca gaccaatgtc
aaggagcttt 23400 taaaatatat tttgttctag gagttttaca gtttcaggcc
ttacatttaa gtctttaatc 23460 cattttgaat taatgtttgt acatggtgtc
atataagggt tcaagtgcat tcttctgcct 23520 gtgggtatct ggttttccca
caacattttc ttgaagagac tgccctttcc ctattgtata 23580 ttcttggtgc
ccttgttgaa aattggttga ccttctaggt aactttatag gtttatttct 23640
gggccctcta ttctattcca ttggtccgtg tgtctgtttt tgtgccagaa tcatactctc
23700 tgattactgt agcttcgtaa tataacttga agtcagaaag tctggtgcct
ccacgtttgt 23760 tcttgctcaa gattggtttg gctattcagg gtcttttgta
atttcttatt aattttagga 23820 tttttaaatc tatttttgtg aaaaatgtca
ttggaatttt aatagggatt acattgaact 23880 tgtaaattgc tttgagtggt
atagacattt taacaacatt cttctagtct acgaacatgt 23940 aatatctttc
catttatttg tgtctgactt atttcatcag tgttttataa tttttagtgt 24000
acagacattt tacctccttg gttaagtttg tacttaagta tttcattctt tctgaaacta
24060 ttgtaaatga gattgtttcc ttaatttcta tttatttatt tatttttttg
acaggagttt 24120 cactcttgtc gcccaggctg gagtgcagtg gcatgatctt
ggctcactgc aacctctgcc 24180 tcccaagttc aagcgattct cctgcctcag
cctcacgagt agccttaaat acaggcacct 24240 gccatgacac ccggctaatt
ttttgtattt ttagcagaga cggggtttca ccatgttgga 24300 caggctagtc
tcgaactctt gacctcaagt gatccacctg cctcggcctc ccaaagtgct 24360
gggattacaa acgtgagcca ctgcgtctgg cccttaattt ctctttggag aaaggttttt
24420 tttttttttg agctttattg aagtgtaatt gacgtacagt aaacttcaca
aatgtagtat 24480 gtacattttg atgagttttg acttacatat acatctgtaa
taccatcacc ataattaaga 24540 taatgagcat aaccctcacc tccaaaagtt
tcttcatgct ctttgataat cccttccttc 24600 ttccccgccc ctttcctcct
tgcctcctaa tccccaagca accactaaag attaatctgt 24660 attttctaaa
atttcatata aatggaatca tagagtatga gccctttttt ctggcttctt 24720
taattcagca tgattatttt gaggttcatc catgttgctg tatataacag taatttgttt
24780 ctttttattg ctggagttgt attctgttgt atggatatac catcatttgt
ttatcaattc 24840 atctgttgat agacatttgg gttgttttca gttttttggc
tattaaaaat aaagctgtct 24900 gggcacagtg gctcatacct gtaatcctag
cactttgaga gaccaaagtg gacagatcat 24960 ttgagcccag gagtttgaga
ccagcatgag taacacagga agaccccaac tctatttaaa 25020 aaaataaaat
aataaatgaa ataaaaatat ttaataaaat atcaaaaaat aaagctactg 25080
tgaactgtgg tagtaaattt atttttaaat ttatgtaatg tttgcatgtc gtgacaaaat
25140 actgcctttt agttgaaagg aaacatttct tggtactctg agatgccatg
tgtgtcagca 25200 ctagagatgt gtagcagcca tgtatccatc atgaaaataa
ttccattgtt tagcattgca 25260 catagcacaa agaactgaag atgaataaat
tatggtataa aaggagtcat gttaagctcc 25320 taaaccatta ctacacagga
ttatgtctag ataattgtga gtgtggttat aaaaccatga 25380 aaatgccatt
catatatata tttttgagat ggagtctcgc tctgtcaccc agtctggagt 25440
gcagtggtgt gatcttgact cactgcagcc tccgcctcct gggttcaagc aattctcctg
25500 cctcagcctc tcaagtagct gggattacag gcgcttgcaa ccacacccaa
ctcatttttg 25560 tatttttagt agagacaggg tttcactaca ttggccaggc
tggtctcgaa cttctggcct 25620 caagtgatct gcctgctttg tcctccaaaa
gtgctgggat tacagacctg agccactgtg 25680 tccagcctaa atatctttgt
ttgtttgttt gttcgttttt tgagatggtg tcttgccctg 25740 tcggccaggc
tgtagtgcag tggtgtgatc tcagctcact gcaacccctg cctcctgtgt 25800
tcaagtgact ctcctgccct agtctactga gtagcaggga ttacaggcgc ctgccaccat
25860 gcccagctaa tttttgtgtt tttagtagag atggggtttc accatgttgg
ccaggctggt 25920 ctcgaactcc tgacctcaag tgatcctccc acctcggcct
cccaaagtgt tgggattaca 25980 ggtgtgagcc accgagcctg gccccccatt
cataatttct gaaagagaag tttacctacc 26040 aagtagagat ctcagatagt
aaccgaaaac aaaaaggaaa gcagagagga aagagttgta 26100 ggaaatatgt
ttgcagattt tcccagctta gaggagtcag tagataccat ttcaatcttc 26160
taattataaa taaggaaatt tatattgaaa tttgaaaaat tttttacatg taatcacatg
26220 ttattcaaaa caggaagcat gctttctgaa tcattaaaga gaataattag
aaaaatatat 26280 cctgtataga aaagatagaa aataatttat acagcatgga
aatcaccttt acttaaaaga 26340 ttgaaagaac ttttaaaatt gtctttactt
ggcatatttc ttgcaagaaa tttcttcaca 26400 gtgttttcag tcttttctaa
attatcttga cttttattct taccttactg aatgtgttaa 26460 tcatgaatgg
ataacgcatt ataacaagta cctttttagg tacaagatga tattttgatg 26520
gaaacttact cttcttgaac atgatgacat tgatgaccta acactgaacc atgtttgcat
26580 aactaaaata aatcccactg ggacttagta tattattctt tatagatttg
atttactagc 26640 attttaatat ttacagctat ataaaaagat ttgtctgagg
ttttctttta tgtttactgt 26700 ggtaggtttt agtgtcaggg ctagcactgt
gaaacaattg agaaactctc tatctttcac 26760 ttcttcatat attcattggt
tgggttctgg agccaggaaa gggggaagaa attttagttg 26820 ttcttctcct
acttcactca cctaggactc tgactaaaat caatagtact ataattaaat 26880
tatatagttt actgcttagc taggtttttt gggggactag cttgggaacc aaattaccat
26940 ctcaggccat ttttttcctt tatgaaatat ccttagcaaa ttctaaataa
ttaattaaaa 27000 gatatgtatt aattaattaa aagatttctg tgtatttctc
tctcccatct tcttctttca 27060 ctgccagcat gatcaggtgg ctgtgtatta
taccctggca gccacccagc tagtgaattc 27120 attttggctt ctgttacctg
gtgtttaatc tgagtatttt aaatgctaaa tcttattagt 27180 aaacctgttg
aaagcttggc tctagaaaca aagcctaact catacacttc tggtgagact 27240
ttgatacaac tttctgtgtg gcaattaggc aattctttac atcatctgtt tttttttttt
27300 tttttgaccc agcacttctg ttcatagaag ataagctgaa agaaatcatt
gcagatatat 27360 gggaagattt agttccagtg atgcacagtt gaagcatctt
ttataaatgt aaagatgtgt 27420 aaacaacttg aatgctcagc agtagggaat
tagttaaatg aatatagata atttagtaat 27480 ggaacattaa gtaaccatag
aatgttactg ataaatatat gtgtgacagt gaaagttgtc 27540 tgtcatatat
taagtgaaaa aaacatttta caaaacttaa aggccccata aaatcccatt 27600
ttgaaaaata ggtttgtaaa tgcacgcaca cagcctggaa ttacacatac tgaagtaaag
27660 gtagtggtga tctcttgggg gcatgagatt atgggtaact gttttcttct
tttctgttag 27720 tgttatcagg ttttctggaa tgaacatatg ttactactga
aataaggaaa aaaatcaccc 27780 ttttttttaa aaaacaaatg ccagcacaca
tacaatatgt agaaattaag aagtaatgca 27840 taactagaaa atcattccaa
ataaaatgat atgaacattg agtttttaat tgtgtagtgc 27900 ctactatctc
tggggacact aagtcttaag cagagaaacc aaaccaaatg cagatctcct 27960
agaatcctca tctagaaaga tccaagtctg ttcttatcac atctattttc aaaaaaaata
28020 ttttgccctc gtcatgcttg aaaggagttc tttaacttaa aaattttatg
tgttctaatt 28080 atttctgttg ggttatttga cagaccgcct ggacattagg
gcagcccgga ttcttctgga 28140 taatgaccat tacgccatgg aaaaattgaa
gaaaagagta ctggaatact tggctgtcag 28200 acagctcaaa aataacctga
agggcccaat cctatgcttt gttggccctc ctggagttgg 28260 taaaacaagt
gtgggaagat cagtggccaa gactctaggt cgagagttcc acaggattgc 28320
acttggagga gtatgtgatc agtctgacat tcgaggacac aggtagaaca cttctctcag
28380 tttaatctct gattcctctt tctttttaat tgactagagc tccctaaaag
cttaggcata 28440 gcatacatct attttcctta aagggctatg tgtggtacct
tgaatgaaaa ggacatttac 28500 aagaagtatc agctagccta gagcctctaa
gcgtaatgat aaacccaaac taaccttgat 28560 ttgtatgaca gtggatacta
ctctgtgcct caactttcct ggaatctcat ttgaatgtaa 28620 ttataagtta
tttatgattg gatattatta tgtctttaca ctcttttcaa cccagtagca 28680
tgccataaat aatgatccct aactctcaga gttaaaaaaa gtaactgcaa tagggagggc
28740 caataggagg aggtgagaag tctttgataa caaacttgtt ctgattgcag
tctaaacttc 28800 ctcttatgaa ggttggtttg tattatgaat atgagtaata
aggataaatg ttagcataat 28860 tattaaggct tattcttgca ttttggactc
actttctata aaaaaacaat aaactgtaag 28920 aactgtccct ctaggctggg
cacagtggct catgcctgta atcctaacac tttgggaggc 28980 tgaggtgggt
ggattgtttg agcctaacag tttgagacca gccggggcaa catagggaaa 29040
cacttttgtc tctacaaaat ttatatttaa attttttaat tttaaatttt aatttttgtc
29100 tccacaaaaa ttaaaaaatt atgcaggcac agtggcatgc acctgtggtc
ccagctactc 29160 aggaggctga gatgggagaa tcatttaggc cnnnnnnnnn
nnnnnnnnnn nnnnnnnnnn 29220 nnnnnnnnnn nnnnnnnnnn naaccaagcg
gcaaataagg aaactttgtc ttacacaagt 29280 aaatttactt cttcatttac
attaaatttg gttccacaaa aatataaaat taagctaggc 29340 acaatggcag
gccttgtgtt cccagctctt agaaggtcta aatggagtat cattacgtct 29400
tgaaagttcc agtttgcagt aacccatatt gtcccctcgc acgccagcct ggagacagag
29460 acattatctc aaacaaacaa acaaacaaac aacaacaaaa ctgtttctga
ttaatctgac 29520 attattagaa tcagatttgc atgttgcatt cattgttctc
actggtctct ttgttgatct 29580 gatggaaatt gccttgggaa agcatgaatt
tacatttcgt ggtttaaggg attcatagca 29640 attgtaagtt gtgagaaaac
atacctatag tgtatgtgtt aaagaacatg tttaaatgta 29700 ggaaccatga
actgcttata aaagaatatg atgctttttt aatatcttgt tttctatttg 29760
ccttattcaa agggatccct atccatagac agggatggga aactgtttca gaaacttttc
29820 tataagaaat ggttattttt attctctttt atttgctcac ttaaaattct
tacgcattta 29880 aaaagtatca ttactggcct tgtgtagtag ctcatgcctg
taatcccagc actttgggag 29940 gccaaggcag gcagttgctt gagctcagga
gttcaagaac agcctgggca acttggtgac 30000 accccatctc taaaaaaata
ataataataa attttaaaaa agactcatca caagatttta 30060 gtaaataaac
aatgaggcgt gcagatcaga gtagagaatt gatttgggtg atttcttctg 30120
gcaatttcaa aagatatttt tgttgcctag acttcttatt cttgcatgta ccactagagg
30180 ctatagtttg ctttcgtaaa ggaattggca tttctcttgg accaaactca
aagaagctgc 30240 gtctagggcc taaatcttct aattttagct acagagtaag
tatttgatgg catttagaga 30300 gtgagttcgt ggaattaatg ctatgtgaaa
ttgacatcat aagcacgtga catgtaggta 30360 atttgttctt atttcttttc
acattggtat tgattatttg ataaggcttg gaaagcactt 30420 attcaatacc
tgacacacag tgagcattca ctaaaaatta gctttaacca ttatttaaat 30480
tctattaata aattctcagg aggacaaatt tagatttaca agcttcagta tgagttttta
30540 taaatttcaa tctgattttt taattgcctt ctaaaatatt tatcctattc
tcagcattat 30600 tacttaattt atacggcaga attatgggaa aatgcatttt
tctgttgcct actaatggac 30660 agtgtatagt gtcatggttc tcaccactta
caaacatcac tggattaaaa taaatctcta 30720 ttttaaatcc ttactgacat
ataaaatttg ttcttttttt caagtgaata tgcttttgtg 30780 tatgtgactg
tattaagaaa attgagtctg aagaaaataa gaattgactt tatgggtctt 30840
ttgtaaaagg aggttgtgtt acaatcacca ttgcctaaaa tatttgtaaa tataaccttt
30900 ttagaaacgt atatatggag gctgtgattg ttgccgagta aaaagtataa
ggatttgttt 30960 tgtgaatcat tctattcagc ctgattttag atacaccttg
ctggtaagtg ttacttagcc 31020 atcagtgtac cagatgtttg attaactact
atagcaacct gcccttgtgc tgttggggac 31080 atattaccca tctaccccgt
gaattattaa agcctggtga aaaattttat ttcaaaccct 31140 gtttggaagc
acgtggagag tagtggggtt cagttgttga ggaaagggtg agggcagagc 31200
atgcacttag gtcagttatg aattgaaggt gaataggagg aggagagaaa gaacaaccga
31260 caattccagc acaaccatgg gtgtgcctgg gggaacatgt ggttccatgt
gacagttgag 31320 gcatttggga gacaacccag gtcttgacgt ttgagtaccg
gtcacatgct cacagttaga 31380 gttcatgaaa agttttgttt ttcctcagcc
tttgagtagg caccactgtt ccgcagcctt 31440 agaatagcca aggaaaaaga
aagccaggga aaaagaaagc tgctttgtta ttgtccttgc 31500 ttatcctctc
gattttgcca ctcactctcc ctgttttccc atgtgtggaa cactttcctt 31560
ttgctaaaag tacctgcgta tgagaagaag gatgccgata agttggggat tgattttaaa
31620 aacaagcaaa gatatgtttt ttatggttaa atgataatga ggtgggagat
ggggaagcaa 31680 aagagaggct tgccttaata tttaatctta aacttggaaa
ataatagtga tctgactaaa 31740 cattgcctca tttttgtctg tattgttttg
agtagcttaa aggaagaata atgtttatgc 31800 tacgtattaa ctcattcagt
ttttcagtct tttcgatatt tctcatttgg atttatctcc 31860 attgtgattt
ttctgtccac tttgtaagcc acaaaatact cattcccttc tatcagtttt 31920
aacaacttaa atttttatat ttaagtatta catttaaata atttaagtca attcacacaa
31980 atataaggta actaacttct tttaagatga agttttatga aataatgttt
gcataattgt 32040 ttttcatttg ttctttggta aaaagaaata atatattatt
gttatgatat atcttaaatc 32100 actgtggata ttaactccta gaaatacttt
accagctgtt tacttagata ataaaattat 32160 attattgcaa gaaatccttg
tctcaacttt caaacaagat gagaagaaaa atgaacttgt 32220 gatttccaca
ttgatacatt ttcatatgca acctgaaatg gtaaagttat aaataaacta 32280
tttcattatt agtttctaca agggaaaaat aactgaagca gcaagcttct aatgtatttt
32340 tttagcatag tgtaccagat atattatggt ttgcccacta tcctttcaac
ttacatttgc 32400 atgtagctct tctttgcctc tccaaaactt aggtttattt
taaggcctca acccaaggct 32460 tcctccatta atgtaagtgc agtcagttat
gatttcactc ttctctaaac tgaccaccta 32520 ttgtgctcct ttatcgaata
cgggcctctg gcatttctac catacaactg tggagatgaa 32580 acataaatac
gtttataaaa agtacaagct ttctcaggca ggggatttat cgtctatctc 32640
ctttatgtac cccatgatgc ttatttaaca tggtgctaaa tgtggtgagc gctctctggg
32700 tgttttgtga attcatgtaa gattaaaaca taatattttg gaagttatgc
aaccctttag 32760 acgagtacac ccatacaaat tagtctataa aaagatttag
gaatgactac cagaagaata 32820 attgcatttg tttagacatg ctattataca
ttaaaatccc agtttcttaa agactgtttt 32880 tctttttgag atcattagga
tcttttttaa actgattcct ttttccagtt tgagatacac 32940 acacacaccc
acacacccac ccacacccac acccacacat ccacacaccc ttggtagaaa 33000
atgtgaaaaa taaggggaaa aaatcctcat gtttttctac cgtacaaaga taatcactgt
33060 taacatttgt tttgttctgc cagacttatc attggatttt aagtaacaga
attgtaatcc 33120 tgtcattttc acttaacatt gtaacactta aactcttttc
tattccaaat tctttgtaaa 33180 ttttatttta acagtttgca ttatagcctg
cgggagccga gccctttaat tgaataggta 33240 ggaagagtgg atggtgaaat
gcctatattt ttctctcttg tctgctataa aagacatttg 33300 caaaagttgc
ttccatgagg cagaaattga aatgggactc aaattcaggt gtactgaatt 33360
ctgctcttgt gctttttcca ggaaaccaga agtaaacttt aagtagctgt tgctaataat
33420 gatgagcatc actggaaagc tcactgtgtg ccagggaccg tgctgtgtgc
tttgcctgtg 33480 ttctctcatg atccttatat taatataacc caccaggttg
acactatttt ccccatctta 33540 taggtgagga aactgaggct taggtcaagt
aatttgccca aaatagtatt cagaggcttg 33600 tactgtgtta cctttagagt
gctgatggaa agatgctttg agtgctggca cggtggatct 33660 ggtggggaac
aatcttacag ctctatatct agcctctact ctgtggtaag accccgtctc 33720
tgtcataaaa gtgctcactg gctctataga ggaggttatt atacccatga ataaaaacta
33780 ggttgtaagt aaccatcaga tgagttatgg ggccagtaag tgctgtagac
attgcattat 33840 tagagcgatc cctttgtgag aggtagtcag aaaaagtttc
ttagaattgt tgggatttac 33900 gtagcaggaa gaggagtatt aagggcagga
aggcaccata tttttaagaa aggtaaaaat 33960 ttttaagggg cgtaatagta
tcttgattgt ggttgaagca agaaagtaat ggcagcaagt 34020 tgggaagatg
aatgggagct ggattgtgaa aagcctcgaa ctccagacaa aggaatttga 34080
accttattct gtaggctctg ggaagcaatg gaaagtgtaa gaggaattgc ttatatacag
34140 tgtgagtaga atctaggatt ccaatttttt tagaaagggt gcctacctag
aatattattt 34200 tctctctgtg acttcaggtg tagaattgtc agtacttgtt
tttgaagttt actcatcaaa 34260 aaaggaaagg caaataaata actgcagcaa
aaaatgaccc attagagcct ttgagattct 34320 ttaaaaaaat tcccttccct
accactctta aaaatcagag taatggcaaa tctgtaagtt 34380 ctctagaaaa
ataattggaa agaatttata aattctgagt ctcgtctttc ctgtatctga 34440
ttctgaaatc ttgaatgtgc taattcctta tattaacagg acaatgttta ttgcctttgc
34500 ttccctgtgc cttagtcacc tttcccggat gaaaggcatt cccatgatat
ttttaaggct 34560 tgcttgcctt ttcaaagttc actctgttta ttctgtccta
ctttatacca gtcatgtggc 34620 agaaatcagg cctgctctgt gaatcggctt
tgtgcagatc atgaggtaac tgtggctgtt 34680 ccacttgtca ttgatcattt
tcttctcggc agtcaggctt ttatgccttt tcagagacag 34740 catttgcttt
gcacaacata gacagcaggg ttataattaa aattagtaaa ttgctgcttt 34800
aagttttgct ggctttgtaa aaaagacacc ttttttggtt tgataaactt atgtgttttt
34860 atttcatgcc acactctaca tctgtcataa ttatgtgggt gattcttgtc
caaatacaat 34920 aaagcaggct ctcacatttt aacgttcaac aaaatacctg
gctggctgaa cgtggttatt 34980 gccaattagt gcatatggga tgaatacagt
tttgttcaaa aggacagaat aatggaattc 35040 tgatataaat actgttgacc
ccagatcctt atactataat taatagatta tttcctctga 35100 aaataaaaga
gattggagtt tttctttttt gttgttgttt ttggtctgca ttctgagtgg 35160
ctgtttgaac tgattttaat ttccttcatg aagatgatga tgttttagct ggcccagggg
35220 cagccatttc agtgtgcata aaggtggttg cgttgggtag ggggatgctc
agaaaaatca 35280 tggaaagcat gggaattcat agggtacttt ggacattttg
gaatcttgaa gagtaagaac 35340 cgtaactggt gacttaagtg tcgtgtttct
tcatttcacc aaatggcaaa atgtgataca 35400 gttcttccaa tatcatgggc
aacttgtagc cagaattaag tagaagataa gattagaatt 35460 gaatataata
acttttgatt tatcatagtg ccttttaaat acatagtacc tctttgctat 35520
attatagtga tagctaaatg atcttttcac attcctaagt tttgatttct gaatggcgtc
35580 gctcctgcct cctgacatct cacactgtga atgtgctact tgctttctct
aggcgcacct 35640 atgttggcag catgcctggt cgcatcatca acggcttgaa
gactgtggga gtgaacaacc 35700 cagtgttcct attagatgag gttgacaaac
tgggaaaaag tctacagggt gatccagcag 35760 cagctctgct tgaggtaaga
tttggaaaat tccctgtctg tcttcatact ggaagagtat 35820 ggaggagggt
tgataatcat attcaagtga tatacacagt ggtgtagctt tagttatggg 35880
aaaaacagtt tgataccggc tgaggtctga gcaatttggc acttaaatta aaatgttttt
35940 gagatttctt tcactaagtc cccttttttt ttattttcct tttgtatttt
aatcagatag 36000 tttaacaaag ttttgtgcac acttattatc tagaggccaa
caattctaca cagttatggc 36060 aaaaaaaaca gcaagcaagt ctccttctcc
ctggggtccc ccatgccttc ttctgcactt 36120 tgacctcttc agcttttagt
tgattaaccc tattttcaaa atagcatggc tatcttgcac 36180 ttcctgattt
tttttttttt tagtttttgt cattttctat agatgccccc caacaggagg 36240
tgaagatttt accttttttc ttccgttgtc cccactgtat catttttata ccttagatct
36300 cgcaataaga atttttttct tgtttttttg ttgttttttt cttgtgaata
ctaatacatc 36360 catattagta tttacattat tatgattatg taaatgcttt
tcacagcagg agccacatgg 36420 taaactgtga tcacttttcc tgttcctatt
tttgtttttc tctacttttt aagaatattt 36480 tcagagttag ctgtcttgtt
tcttttgttt actttttcac caatcgtcta attctgtcaa 36540 gaccttcaga
cactttaggt gttctatcca ttttatcttc ttaagcgtcc ggtctgaact 36600
ggttgttttt gacatccggt tttatggctt ccttcctagg ttctcccttc acctctcacc
36660 atgttggatt tcctgtctcc tgtattccat ttcttgctct ttcttggtcc
attccctcat 36720 ttttgtggtg ttaactccct gatagtttcc tgagaaagct
tgcatgagtg gtaaatgttt 36780 tagactttgc atatctgaaa atgtctttat
gtttccctca tacttgatta gtaatttgag 36840 taaagaattc tggttggaaa
taatttttct atagaattgt actttgcctc cattttactt 36900 cactttccca
tttccagtgt tgctgttggt aaaactgatt ccattcagtt cctatccttg 36960
cagacctgct ttaccctgaa aactttcagg ttcttccctt tatcctggga ttctgaaatt
37020 tcctaataat ctgccttggc atgggtttct tttcatgcat ttttgctcat
tctttctttg 37080 aattcttcct gttctttggt tctaaaattt ttcttaaatt
cttttattga tgacttttcc 37140 cctttatttt ttggaactcc catgacttgg
atattatgtt tcagacttat cttttctctc 37200 ctattagtct ccacttttat
gttttgctct actttctgtg cagactttct cagatttatc 37260
ttttaaaaac cctctgaatt tattatttca aaaactttct ctgcatgttc ttttatagta
37320 tcctgttctt gttacatagt tgtaatatat cttatctcca tgagaaagat
acttatagat 37380 atattttaaa attttacttc tctgaccact tggtatatta
aaaagaaaaa gaaaaaaatt 37440 acttctcttt aagctgcttt tatctgttta
ttatatattt cttttagtct cttttatatt 37500 agagtctttc attagatatc
tggacatttt tgtttgtgtg tttatattta atagtaaggg 37560 acaaaaaggc
tgattggagg ctatgagcat aggagtgggg cttatcaaca gtgagttcca 37620
caatagagtc agctggctgt gctgtttggt tgaggaatct tctactcaat agctttaagt
37680 cttccttctt aggatggtca gattcctcag agaagacttc ctgtctcttg
ccttgagaat 37740 gaaggcctgg ctgccatcat tctgggaacc aagcagggga
agaatgattg gggtcggggg 37800 tatcactgca ttcagcatcc gtgtatatgc
attcacctga gctcttgttt tcagcatagt 37860 atatgttctt atcagctgtg
cccagggtcc cctgtgcaga gaaccactgt tttatgttct 37920 taagaaaata
aacttccagt gttttgctgg ggtgggggag gggatctggg atctgactgc 37980
ttcctaaatt tatttcagcc agtcctcctt attttagcac atcagcccct cctccctttt
38040 acccttgctt aaaatattat taatgcaaat tgatttgtaa aattgaggaa
aacttacttt 38100 gtgaaagttt ttattttttt cttgtttatt tctgtgcttt
gagctgcctc gtgcttcctg 38160 ttttttttct gtttttgtga tcttagaaca
ggatggcctg ggacatgtgt cttattaagc 38220 aggagaccat acattctggt
ttgcttggca cattcccagt ttatgcctaa tattaattgc 38280 actctttttt
agtctcagaa gtgggttttg tttggacgat aaaaaagtac agttacctta 38340
cttaaaagcc ctggtatttg gaggtaaggg tttgatttgg ttcagttttg ctacttttta
38400 ttgtaagatc attaccttct ggctccataa ctggttcttt ttactatgaa
gagtaaaata 38460 gtgaacatta tttaagattt tagtagtttc ttatataata
tctttagact ttcagtttaa 38520 tttatattgg gacatttttt caggttatct
gacagattct cccattagac acttacagtt 38580 atcctgttga aaataatttt
agagtattcc cctgacactt aaattttttc aacaactgtt 38640 ttgaagcaag
ttcaccaaag acagctttac aagtagtagt agatgattaa gtcccctgtt 38700
tatttgttca gttgataaac aatatgtttt aggtcttcac ctatatatac tttgtaatga
38760 ttcaataata tttgttaaat tgatctttga taacaagcag ctagcataat
gatattttct 38820 tgtctgatgt agaccttggt actcactttt ttggcagtcg
atttattagc attcaaaaaa 38880 aaggtatgaa aacctcaaat gatatctcag
agtaaatgcc ccctgggccc acgtactaat 38940 cactgtagtt tagttatgaa
tagcattggt tccttacaga ctgtaaatgc tataaaatga 39000 agcaagacat
acatatggag gaactgagta tcttggtagc tgacagcctc ttcctccctg 39060
cttgcccaag tcctgggtaa aaacctcaga cctcacagat tgttgaaaca attaaataac
39120 agtacatatt aaagcactct ataaatggta aagtactgta cagatgttaa
tttaatatcc 39180 actgatattt cttctgtgtc cattttgaaa gccacttgct
gcttccattg ccagtaggtt 39240 cacttaaatt taaaaaaaga acaaactcaa
ttacacaaca cgttacattt aaagtgaata 39300 ttcctgagag tttggagacc
caagtatagt tttattatct ttctacatag aaaacctgct 39360 tttaaaaaat
gatatctaga tattatttgt aaaatgtata agattatttt atgtttaagc 39420
taattatatt attaaggtaa tatagcccag atgtgaagaa tgtaatagta gatgtaaata
39480 tacactagag tgcttactct gaataaagaa taaacttttt ctgctgtgta
ttcttctttt 39540 tatttatgta ggatatgccc gtttccttga cctaccatgt
aattgttgct tatgtaaaac 39600 agaatgtatt tcaagttatt acttaatatt
gtccaaaaaa ggagaattca aaatttagat 39660 gatctctttt gaaaatttat
tggaagacta taaaaatagg tccaactact taattaataa 39720 atggtggtag
gcagtagaat ttgggcaagt ctataactga gtagcactaa aatattagat 39780
ataaggaaag taagggcttg tatgtaatta atagacttga aagaaaatta cagaattatt
39840 ttcttaccag atatatgtta tatttataac tggcacatgt ccagacttta
ttgttaaata 39900 tgaatgcata tctcaaatac atttttgtgt gagtgggcaa
ataaaatgca tggatacaat 39960 aattaattgt ctttataggc aataatattt
acagttcgaa aaacatatat tccccaaaat 40020 agagaagtca ctagtctaga
tatagtaaac ttcctttaaa actgaagttc ttacttaatt 40080 cgaattagat
ccagttagta attagaccaa tagtatattt actacttaga tacagtagac 40140
atgatctttt gatttgagct atacaattat tgtcaaagaa tgtcagaaga gagggactta
40200 gacatcatct aatccagctt catgctctta aggataaaaa gcttaaggcc
taagatatta 40260 ttttaatttc ttatttcact acatgctata ttaatgatat
aatttccaaa tatcgaatgg 40320 agttaaaaaa tgccttaaat aaggcatacc
ttgttttatt gtgttgtgct tcattgtact 40380 tcacagactg tgttttttta
acaaattaaa tgtttatggn nnnnnnnnnn nnnnnnnnnn 40440 nnnnnnnnnn
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnngggcac ccgtgtatcc 40500
ccagcccctc ggaagcttga gccaataaca ataccttgac ccggggaggc agagtttgcg
40560 gtcaccggag gggggggggg ggcgtcgcaa cctgggttac aaaccaatac
tctttctccc 40620 gtccccgaca aaaaaaagaa agaaagtgtt tatggcaacc
ccgtgtcaag caagtctgtt 40680 gacaccattt ttccaacatc ttacttcatg
tctgtatgtc acattttggt agttattgca 40740 atatttttaa ctttttcatt
attatatcct attatgatga tctgttatca gtgatctttg 40800 gtattgctat
tgtgattgtt ttggggcacc acaaactgca cccatataag acagcaaact 40860
taatcaataa atgttgagta tgtactaact gctcaactgg ccaggcattc ccctttctct
40920 ctccctctcc tctggctcct attccctgag acacagcaat attgaaatta
ggccaagtaa 40980 taaccctgca gtggcttcta agtgttgaag tgaaaggaag
agtcacacat ctcattgtaa 41040 atcgaaagct aaaaataatt aagcttagtg
aggaaggcat gttgaaagct aggcctcttg 41100 tgccagatag ccaagttgtg
agttcagagg aaaaattctc aaaggaaatt agaaatgcta 41160 ttccagtgaa
cacaccaatg ataagaaagt gaaatggcct tattgctgat atgaagaaag 41220
ttttagtggt ctggataaaa gattaagcca actacaacat tcccttaagc cgaaacctag
41280 tccagagcaa ggccctaagg ctcttcagtt ctatgaaagc tgagagaggt
gagaaagctg 41340 cagaagaaaa atttgaagct aacagaagtt ggttcatgag
atttaaggca agaagccatt 41400 tctacaacat aaagtgcaaa gggaagcagc
aagtactgat gtattgtaga agctgcatca 41460 tgttatctat ccagaacatc
tagctaacat cattgataaa ggtggctaca ctaaaaaaca 41520 gattttctat
gtagatgaaa cagccttatt ttgtattgga agaagtgtca tttaggactt 41580
tcatggctag agaagtcagt acctggcttc aaagcttcaa agggcaggct aactcttgtt
41640 aggggctaat gcagctggtg actttaagat gaagccagtg ctcattgacc
attctgaaaa 41700 ccctaaggcc cttaagaatg atgcaaaatc tactctgcct
ttgttctgta aatggaacaa 41760 caaagcctag gtgacaatgc atctgtttat
agcatggttt tactaagtac tttaagccca 41820 ctgttgaaac ttaccgttca
aaaaaaatag attcttttga aaatattact gctcgttgtc 41880 aatgcttctg
gtcacccaag agctgtgatg gagatgtaca aggagattaa tactgttttc 41940
attccttata aaacaacatc cattctgcag cccatggatc aaggagttat tttaactttc
42000 aagtcttatt atttaagaaa cacatttttt aaggctattg ctcccataga
ttatgattcg 42060 tcccatgcat cagggcgaag tacattgaaa acccctagaa
aagattcacc attctagatg 42120 ccattaagaa cattcatgat tcacgggagg
aggtcaaaat atcaacatga acaggagttc 42180 aggaagagtt gattccagcc
ctcatggatg actttgaggg gttcagactt cagtggagga 42240 agttaccgca
gttgtggtag aaatagcaag agaactagaa ttagaaccca aagatgtgac 42300
tgaaatactg caatctcatg gtaaaacttg aacagatgag gagttgcttc ttacagatga
42360 gcaaagaaag cgggtttctt gaaatggaat ctagtcctgg tgaggatgct
atgaaccttg 42420 ttgaaatgac aaccttgatg ttgtgaacct tgttgaaatt
ctaaacaaga tttagaatat 42480 tacataaaca tagttgataa aggcagcaac
agggtttgaa aggattgact tcaattttga 42540 aagaaattct acggtgggca
aaatgctatc gaatagcaat gcaggctata agaaattgtt 42600 tcatgaaagg
aagagtcaat agatgaagca aattttactg ttgccttatt ttaagaaatc 42660
gccacagcca ccctaacttt cagcagccac cacctgatca gtcatcaacc attaatattg
42720 agacaagaca ctccaccagc aaaatgacaa caactaacac tgaagactca
ggtgattagc 42780 attttatagc aagaaagtat ttgttaatta aggcatgtac
attgtttttt agacataatg 42840 ctattgcaca cttaatagac tatagtatat
tgtgtaaaca taacttttat atgcactggg 42900 aaacaaaaaa aaacatacat
gtgactcact ctgttgcaaa atttgcttta ttgcagtggt 42960 ctggaactga
acccacagtg tctctgaggt atacctgtat tgaggagggg ttgcaaattt 43020
tagcacatag gcaaatttgc aaatatggaa taataaggat caactgtaat tactgcttta
43080 tgccattatc ttttaaatca gataagaaaa agttacgtca acaatatatt
tacactgcct 43140 tttatgtttg caatgtaatc acttctgcca gtgcgctcta
tttctttgtg tggatactgt 43200 ctagtgtcct taaacttcag tctttcatat
ttcttgtctc atctcctggt gacatattct 43260 cagtttttgt ttttctggga
atgtcttaat ttctccttca tttttgaagt aattttgttg 43320 gtatagaatt
tgggttgaca attgtttgct ttcagccctt tcgcatgtcc tctcaccact 43380
ttctggtctc tgtggtttct gctgtgaagc cagctgttaa gcttgtggcg gatctcttat
43440 gcctaatgag ggcagcattt ttctctcata gttttcagta ttctctcttt
gtctttcatt 43500 tctgacagat tgactgtgtt tatgtgtgat cctctgagtt
tacttagttc tttttgagct 43560 tcttggatgt gtaggtaaat gtttttcatc
aaatttgaga agtatgtggc cagtatttct 43620 tcaaatattc tttatgcccc
tttctttttc ctctccttct gaaactcgta ttatggtgtg 43680 ttggtaatct
ttgtggagtc ccgtaggtct ctaaagtgct gttcactttt tttaaagcct 43740
tttttctttc tattcttcag acaggatcat ctcagttgac ctgtcttcaa gttcattgat
43800 tctttcttct gccagctgaa attgtcattc agcccctcta gtgaattttt
cattcaaatt 43860 actgtagttt tcaactccaa aatttctatt ttaaaatttt
tattatttat ctttgtttat 43920 attctctatt tgtcaagaca tcattctcat
actttcctgt aattgtttag acatgatttc 43980 ctttagtttt tttaaatgtt
agtaaatata acagaaaaag tcccattttt accactttta 44040 tgtgtacagt
tcagtaatgt taagcacatt cgcattgttg tgcagccaat ctccagaact 44100
ttttcatctt gttaaagtga aggtgtatac tcattacaca gcaattccct gtttctttct
44160 ccctccctca gtccctggca gctaccattc tcttttctgt ttctatgagt
gactactcta 44220 tatacctcat ataagtgcat catacggtac ttatcttttt
ataattgact gacttcactt 44280 agtttcctca aagttcatca atgttggggc
attagttttt taagcatatt tatagtagct 44340 gatttgtaat cttttttttt
ttttttttga gacggagtct caccatgttg cccaggctgg 44400 agtgcagtgg
cgggatcttg gctcactgca agctccgcct cccaggttca caccattctc 44460
ccgcctcagc ctcccaagta gctgggacta caggtgcctg ccaccaggtc tggctaattt
44520 tttgtatttt tagtagagat ggggtttcac catgttagcc aggatggtct
cgatctcctg 44580 accttgtgat ctgcccgcct tggcctccca aagtgctgag
attacagtcg tgagccaccg 44640 tgcctggccg ctgatttgta atctttatct
aataaatcca acatgtcttc cttagggatg 44700 gtttccattg acttctcttt
ttcttttttg agacagggtc tcgctctgtc acccagactg 44760 gagtgcagtg
gcgcactcat ggctcatggc agccttgacc ttacccaggc tcaagtgacc 44820
cacccacctc agcctcccga gtagctggga ctacaggcac acaccagcat gcctggccaa
44880 ttttttgtag agacagggtt tcgccatgtt gcccaggctg gtctcgaact
cctgagctca 44940 agcaatttgc tcaccttggc ctcccagagt actgggatta
caggcatgag ccactgaacc 45000 cagctgactt ctcttttttt tttttactct
ttagggccgt acttttgtat ttctttgtgt 45060 gtgtctcata attttttttg
ttgaaactga atatttagag tgttatattt atattaaata 45120 cagtcagata
tataattgaa taatataacc ttaagggttt tttgtttgtg ctgttgttgt 45180
tgctgtttgt ttagtgactt tctggtttca ttctgtaaag tctgttttat tcattaatgt
45240 gtgaccactg aagttgctca gtttgtttag tggtcagcta gtgaccggac
agagatttcc 45300 ttaagtacct ggacagtagc tctcccactc cttgcccaag
gggctcttat gtgtgtattg 45360 aagtgggcct ttcacacttt ggcagatggt
ttacaactct gccttagcct tcacttcctg 45420 cttttgcaga gcctcagtgt
ctgccaaaga tgagcttata gggccttctc aggtctttcc 45480 tggatatact
tagagcctgc acattcacat gaaattttgg attctcaggc atatgtcaag 45540
gcttttcaaa gtccccatga atatctcatt tcccagtttt tccatttaag ttttttggtc
45600 agcctcttgt tagtcccaac tagtttcatt gcctcaggca gctgcagtgc
taaaacagtt 45660 gccactggtt gtttttggca aatgtcctaa ggataaaact
gttctcacag agtgttctct 45720 gagttaagtc aaataaggat atggagctct
tctaaggaac tgccagagtc aaacagggac 45780 agttctctgg ggatggggct
tttgaaggat tgtaatcctt ttctaccccc taacaggatt 45840 gctaggctac
tggttttcac agctactggg gttatgaggc tgttgatttt gctaccatga 45900
acttgagaga aagggatgag tgtaaagcaa gttaaaatat cacaaagctc gttctgttta
45960 ttgagattca gctgtttttc ttgaataagc actcctcaaa ttgttgcaag
ttagtatgta 46020 gcattctgaa aaagttgatt ttgacaattt ttgctagtgc
tctcattgct tttctggagg 46080 agcagatttt cagagtttct tactctacca
ttatataata gaagtgcttc ctcccccatt 46140 tcattttgat tctgtgcttg
aatgatttca ctgcatgctt ctgatacttg tattttggtt 46200 tatcacttgt
tcagatgaaa tatatcttca ggttacttca ttcaaagatt tgtgtgtgag 46260
ttgtattttg aatctcttct atatttgaga aggcttcttt gttgtctgca ccagtagtaa
46320 tatatatgta aataaaataa gaatgtatta gtcttcttct tttttttttt
tttttttttt 46380 gagacggagt cttgccctgt cacccaggct ggagtgcaat
agtgcaatct tggctcactg 46440 caacctctgc ctcccaggtt caagcgattc
tcctgcctca gtctcctgag tagctgagat 46500 tacaggcacg tgccaccacg
cctgactaat tttttgtatc tttagtagag atgggctttc 46560 accatgttgg
ttaggctggt ctcgaactcc tgacctcgtg atccatccgc ctcggcctcc 46620
caaagtgctg gtattacagg catgagccac cgcgcccagt cagaatgtat tagaatgtat
46680 ttcttaagac tgccataaca aaataccaca gactgggtag ctttgaagac
caaacagaaa 46740 tttatttcct tatggttttg gaggctagaa ttccaagacc
aaggtgttta taggtttgat 46800 ttctcctaag gcctctctcc ttggcttaca
gacaaccgac ttgtggctgt gtcctcggga 46860 gacctgtgtg catgcatccc
tggggtctcc tctttcctct tataagggta ccaattgtat 46920 tagactaggg
gcccactctt accttcattt aaccttaatt accttcttaa acaccctgtc 46980
tccaaataca gtcttcaccc tgactgccct tgagacagag cggagggggt tagggattct
47040 gtcaattttg agggggcaca attcagtcca taacaaagga catatataat
agatacataa 47100 tatatatgta ccagtgtgcc catatcatgt actttatgta
aaacgaaatc agttttaaaa 47160 ggtaattata ttttcaatga aagcactgtg
ttctaattag ataattgttt ttacttcata 47220 atatgtctat cctagcttat
tatataaata aaagtgtcaa ctctgttatt ttcttgtggt 47280 tcataccttt
gcctataccc tttttaatga tactttgcag gaatcttttt aaaccactca 47340
acccatttgt aatattaggc tctgtgaacc cggaaaattt gagacaggtc tcagttaatt
47400 taggaagtat atttggccaa ggttgaggac gcgcgcccat gacacagcct
caggaggtcc 47460 tgacgacacg tgcccaaggt ggtcagagca cagcttgatt
ttatacattt tagggaagca 47520 tgagacgtca atcagcatat gtaaggtgaa
cattggtttg gtctggaaag gcaggacagc 47580 tctctggaga gggcttccag
gtcacaggta gataagagac aaacccttgt gttcttttga 47640 gtttctgatt
agcctttcca aagggggcaa tcaggtttac ctcagtgagc agaggggtga 47700
ctttgaatag aatgggaggc aggtttgccc taagcgttcc cagcttgatt tttccctcta
47760 gtctggtgat tttgggggcc aaatatattt tcttttcaca gcacacatgg
acagcaatgt 47820 gctgtaatta tagttaaggc agataagtga ggacaccaca
ggcagccttc gaccttatgg 47880 aacttcttct aagtgaagac atcaattcca
ttttggatat taaatattta caagctattt 47940 ttttctggta tttataaata
aaaaagataa atacaaatac taatattttc tacttgcact 48000 ttggtgggtc
attttccact tttgtgacca ctggtctaaa tagataaaca aatgtcttca 48060
caaatgggta gtaggttcac aggtgttcat tttgttatta tgcatcatat cttatatata
48120 ttacatatat ttgatgtatt caagattgta aaatatttta aactagtgat
aattttgctt 48180 gaaaattctg taggtgttat tctaatgaca ttctcatttt
tattgcacag gaggaggaat 48240 ctaaatcttt tcaatctata gtgtcaaggt
cttctagaat attttcgttt ctttaatccc 48300 tattttaatt tactgagacc
tcttctttag ttatattaac cagttatgaa ttgtatctct 48360 taatttttcc
cgtatttatc ccctacatgt ctctaaagcc ctttttcttc tatgtcctga 48420
acacttttct caagtttgtc tttatcacag atttaatttc catagttgag gatatagagg
48480 aaaagtaaac tcagtttctc ctactgcact ctcacaacac agaacacctc
tgaccaaatg 48540 cacgggtttt ttctccatat gccaagcaag cagttcttca
gcaaccgacc acagctgggt 48600 gtcctctaat tcaattctga caaagtgtat
cagatcctac gggttgagca ctgagtccca 48660 caagactgcc tcccccttca
gatgccagtc gtgagttgac ttccagaacg tgtgaccaac 48720 cagttataaa
ttggagtacc cacaagcccc cctcctcagg tttgcttaat ttgctagagt 48780
agctcacaga actcagggaa acaatttact tgcatttact ggtttattaa aagaatattt
48840 taaagaatac aaacaaacag cacaggagct tccatcccag tgaagtcagg
gtccaccagt 48900 cttcttgcac ctgggtgtgc tcaaattcac cttcctggaa
gcttcctgac ctcagtcctt 48960 tcgggttttt aatggaggcc ttgtcacata
ggcctgattg attaaatcac tggccattgg 49020 tgatcaactc aactcttagc
tcttctcccc tcccaagaga ttgggctggg gaactgacaa 49080 gtcctcagcc
ctctaatcat gccttggtct ttcctgtgac cagcccacat cctgaagctg 49140
tggagggact gccagccacc agtcaatcac taacatacaa aatgatactt atcactttgg
49200 tgattccaag gattttagga gttgcatgtc aggaaacaaa gagatgaagg
ccaaatatat 49260 attttacagt atcataatag tattaattgt gtgtggcttt
cagagctgat tttagttatg 49320 ttattttatc tttattttct gttgtggaaa
atttcaacca tagcaaaagc agagaagata 49380 gtataatgaa ttctgtggac
tcatcaccca gctttaatat cttgtttcat ctattgcttc 49440 ccattctccc
ctacccaacc tctgattatt ttgaagcaga ttccagacat catcttttca 49500
taaatgtttc agtagctatc gacaaaagat atacactttt aaaaagcata atcatactat
49560 atcacaccta aagatgacag ttacctagtc ttgtgtaatg aactctatgt
aatctattcc 49620 tggattgcct acagacatct atagttcttc tcttgtcaga
aattattatt gaagaataat 49680 tctcagtgta cattcctccc acggttcatc
ccattgtgac ttcacattcc taggaataat 49740 gcgtcatatc acagctattt
ccattcccag tcatactttg taggtaggaa ttatagtcct 49800 aggattgata
cagaaaatct tttagttggg gagaataaag gagaaacagc cctaattatt 49860
tttgaaagtg gccctggatg tgggcagtag aatccctgct ctgaagttag ggtaagaaga
49920 tgaggtttga tagctacaaa gctcttaatt gtaattttcg tccttccatg
gactcaccag 49980 tttgcctcgg agcttcatct gagtagtgat taccagaaat
tattttctgc cagaatattg 50040 atcagtattt ctgatgctgt ttaaattcta
tatgtctttt tatgcttttg aaaaccagaa 50100 agtatctgag acaggtctca
accagtttag aagtttattt tggcaacgtt ctccagagat 50160 gattgtgagg
gcttcagtat ttaaagggga atgggcagat attggggaaa gaggaagaaa 50220
ttttaaaagg tatgagtaga caagagacaa acggttgcat tcttttgagt ctttgatcag
50280 ccattcacct gtgagagggg agcagaggaa tagtcactga cgcattcatc
tagcttagtg 50340 aatctgcatt tctacataag ataaaataaa tatagcgtac
aggaagccat cagatatgca 50400 tttgtctcag gtgagcagag ggatgacttt
gagttctgtc ctttgtcctg tatgtgtaaa 50460 gaataagcta tcaatttaca
tggttggggt gaaattcaac agaactgtta caggttaaag 50520 atcttggggc
ctacaaggaa tttctcagtg gggggattgt gagggagata tgtagctttt 50580
tttgtctttg tagctatctt atttggaaac aaaatgggag gcaggtttgt gtgacgcagt
50640 tcccagcttg tctcttccct tttgcttagt gatttggggg tcctgagatt
tactttcctt 50700 tcacactctt cctgagtaaa agaggaaggc aggcaaattg
ggcacaaatt tagcctaagt 50760 ctgcctcctt acatattaat attttaagtt
tggcctaaag gtttcccctt acaaagtaaa 50820 ctgcagccta actagctgtg
taaacacact attcttaaca ccaatcacag attttcagca 50880 agtcacagga
agtcagctgt taacaaactt taaataaagc aaacaccaag ctgtaagcaa 50940
tcccgctgtt tctgtacact ctttgttttc tgcatgtcgc tttccttttt ctgtccataa
51000 atattatcaa accatatgcc agagtttctc tgaacctatt ctgtttctgg
gagctgccca 51060 atttgagact tgttctttgc tcaattaaac tgttaattta
tctagagttt ttcttttaac 51120 aagcatcact aattttttct ccttataatc
taggtattct gtcacactgt tttaaaaacc 51180 tccttcataa ttcagaaaca
ttgctttatt aattttccta ctttttaaaa acgctagtgt 51240 cttaaaattt
taagagaaaa aaattacttg ttcaagtctg acagccattt ctaaaacata 51300
tccagcatat atgaattaca tatgcttaga gccattaaag aatagaattt tttccggcca
51360 ggcatggtgg ctcatgcctg taatcccagc actttgggag gccgaggtgg
gcagatcacg 51420 aggtcaggag atcgagacca tcctggctaa catggtgaaa
ccccatctct actaaaaata 51480 caaaaaagta gccgtgcatg gtggcgggcg
cctgtagtcc cagctactcg ggaggctgag 51540 gcaggagaat ggcgtgagcc
cgggaggcgg agcttgcagt gagccgagat cgcgccactg 51600 cactctagcc
tgggcgaaag aacgagactg tctcaaaaaa aaaaaagaat agattttttt 51660
ccttagctag tgttaaaaaa ttactcatga cgcttattaa aggtggtaag gattacttta
51720 ttcaaggtgg gagactacgt ataagaaaca ctgcaatggg gttttgcagt
gacaggagga 51780 gagtgaatgg ggaatcagta gagggaaaca ttctaagagg
aagaattggg gttacggggg 51840 attctcacta gaaggacaca acagaactct
tgctgaaggg aggccagggt gaaaagatac 51900 tgggttagaa gtgagaacag
atacgtatgg gtatgggtca tttttgctaa cctgacttag 51960 caggattctt
gctcaaattg gattttacaa agacagaggg aaggctgaca ttggcctagt 52020
tgagcagagg actcagagga gcctgactca agtttgcgtc aaaagaagag cgtttttgtc
52080 actagatgat agttttaact attttccata cataaacatt ttccgtacct
aaacagtttg 52140 tttgttcatt tgtttgttag tttgtgttgg attttcactc
tgtcgcccac gctggagtgc 52200 agtggcgtga tctcagccca cggcaacttc
tgcctccaaa gttcaagcaa ttctcatgcc 52260 tcagcctccc
gagtagctgg agctacaggc atgtgccacc ataccaggct aatttttgta 52320
ttttttttta gtagagacag agtttcacca tgttggctag gctggtctca aacacctgac
52380 ctcaactgat ctgcctgctt cggcctccca aagtacttgg attacaggtg
tgagccaccg 52440 tgcccggcct gtgaacagtt tttagatgat tagtagatag
taagaccact cttaaccaat 52500 tcaatactga acataattag ttttccttga
ttacttgaaa gtacttgttt tttaatgata 52560 ttaaacatta ttaagtcttg
tgaaaatgtg aaattagagc tttctgggaa ttctagatag 52620 agtttccagt
aataattaat gtttaacaaa attcagaatt atgtatgagg cctagaatta 52680
agactagctt ggggctgggc gtggtagcgc acgtctgtaa tccctgcact ttgggaggcc
52740 aaggcaggtg gattgcttga ggccaggagt ttgagaccaa tctggccaac
atggtgaaac 52800 cccatctcta ctaaaattgc aaaaattagc caggtggggg
tggtacgcac ctgtaatccc 52860 agctactcag gaggcaaaga ttgtagtgag
ctggagacca tgccactgca cctcaacctt 52920 ggtgacaaaa tgagactctg
tctcaaacaa aacaaaacaa aacaaaacaa aaaactaact 52980 ttggatagtt
ttgaaaataa gtaaaacttc agaaagaatc agaaggtagg aaaaactgct 53040
tatatagtta aattgtggtt ggtgagtata ttagtcattt tattgccttt ttgaatatgt
53100 atggcaaccc tatttatagt aattgggcgt aagtgagagt gttaatatgt
ttaaggtttg 53160 gaacatgtag aagctgttgg tgccttatga aagttctgca
ccagcccctt agcaacaagt 53220 gcctgtgact tgaagctctt taatgtacag
ttgcacattt taagaatcca agttgactga 53280 taaattatct aatgtatcta
attcaaatat ttttaagagc tattgtaatc ccagtacttt 53340 gggagactga
ggcaggcgga tcacttgagg tcaagaattt gagaccagcc tggccaacat 53400
ggtgaaaccc catctctact aaaaatacaa aagttagcca ggcatggtgg cgcacacctg
53460 tagtcccagc tactcaggag gctgaggcag gagaatcgct ggaacccggg
aggcggaggt 53520 tgcagtgagc tgagattgtg ccactgcact ccaacctggg
caacagagta agactctgtc 53580 tcaagaaaaa aagagttatt gatgttttgc
ttattataag cagcaatgtt ttgtagtaag 53640 ccatttttaa atagtgaatt
ttttgctgta tcagaatata gtagcatagt aatttttact 53700 cttatttaac
tcatagcaaa ggttactctt atttggaatt ctcctttcag ttaaataatt 53760
tataccagac tttctgaaaa tgtttgagga ggattatatg ggttcttatt tactggttct
53820 ttgagaattt caaaatactt tacacatttg ctttatattc ccatagcagt
ttagataggg 53880 tgtgttacca agatggaaac tggttctgca ggactggtaa
cttatgatgg ccaaacaatg 53940 agtcattaat aaatagattt ttgaacaaag
cttgaaactg taatttctgc tgctttgtgc 54000 tattacattt tcagaaattt
tgacactgaa cgtattttat tttttaaaaa gtatgtagaa 54060 tgtagagaat
gcaaataata atgctcagat gttagttttg tctgtttctt aaattcttct 54120
gagcagaaat accaaccttg ccagtacatc atgtgtgttt tcacttatat acagccttct
54180 gttggcacta ctaaagtttt taaaatgttt tttgttctcc cctaggtgtt
ggatcctgaa 54240 caaaaccata acttcacaga tcattatcta aatgtggcct
ttgacctttc tcaagttctt 54300 tttatagcta ctgccaacac cactgctacc
attccagctg ccttgttgga cagaatggag 54360 atcattcagg ttccaggtac
ctgactctta aatcattatg atacatcttg cctttctgac 54420 cataacttta
aaattagtta tgctatggag ttttgactaa aagaagttca tttgccaaca 54480
tacaatcttc agaagttctg aggaatgtat ataaatcagt ttctatgtag cttcaaagtc
54540 tggaagagca aaacagcaaa cgttgacaac aacaatttca gatttaatta
gcatgaaaga 54600 atgataattt tatgacaaat aagacattct tctttagtat
aatttctaaa atggcaggct 54660 gtgtgtggtg gctcacacct gtcatcccag
cacttttggg aggctgaggc aggtggatca 54720 cttgaggtca ggaattcgag
accagcctgg ccaacgtggt gaaacaccat ctcaataaaa 54780 atacaaaaat
tagcctggca tggtggcggg cgcctgtagt cccacctact cgggaggctg 54840
aggcgggaga attcccttga acctggggaa ggggaggttg cagtgagcct cacgccactg
54900 cactccagcc tgggtgacag agtgaaactc catttcaaaa aaaaaaaaaa
aaaaagagta 54960 actgaacttt ctcataaaat ctggcctcac ttttatatta
aagtgcatgc cgcttttaaa 55020 ttcctcttga atctgtcaaa tagttaaatt
ttttaaatgt cttccctgtc actggagcgt 55080 gcaaaatgta ttccttcagt
tactaacact agataagtta tagcattttc accttatttt 55140 aattgctcag
aattgttttt ccctggaaga gatcaaatat cactgagttt ttttttaatg 55200
tagagtagaa tctaaatgtc tttatttatt taattattta gagacagagt ctagcttgtt
55260 gcccaggctg gagtgcagtg gcacgatctc ggctcactgc agcctccgcc
tccgaagttc 55320 aagtgagtct cgtgtgtcag cctcccaagt agctgagatt
acaggcactc gtgaccacgc 55380 ccaggtaatt tttgtatttt tagtagagac
catgttggcc agtctggcct cgaactcctg 55440 gcctcaagtg atctgcctgc
cttggcctcc aaaagtataa ggattacaga cgtgagccac 55500 catgtccagc
ctaaatgtct tttacttatt ttttcttttt ttgagatgga gtctcactct 55560
gtcacccagg ctggaatgca gtggcacaat cttggctcac tgcaacctct gcctcctggt
55620 tcaagcgatt cttgtgcctc agcctcctga gtagctggga ctacaggtgt
gcaccatcac 55680 acctggctaa tttttgcatt gttagtaggg acagggtttc
gccatattgg ccaggctggt 55740 cttgaactcc tgaccttagg tgattcaccc
gcctcagcct ccaaagtgct gggattacag 55800 gcgtgaaccg ccacactcgg
ccctaaatgt ctttagattc taaatgtaat ctaaatgtat 55860 ttttcatatt
aatctgaaat atatttttac tactaagtga attataattg gatttctgtt 55920
tgtttttttt ttgagatgga gtctcactct gtcaccaggc tggagtgcag tggcacgatc
55980 tcagctcact gcaacctcta tgtcccaggt tcaaacaatt ctcttgcctc
agcctcacaa 56040 gtagctggga ctacaggcgt gcaccaccac gcccagctaa
tttttgtatt tttagtagag 56100 atgggatttc accatgttgg ccaggaaggt
ctcaatgtct tgacctcatg atccacccac 56160 cttggcctcc caatataact
ggatttctta attatctgtg agcattgcag gttcctgtat 56220 ttagttttaa
aatatggtag agtaaaaagt taattgtgtg tatttaaagt ctaaagtaaa 56280
taagtaatga attccctgga aactccaagt tatggcagaa aattcattag atacactaaa
56340 gtaaagtgaa agaatcagga cagctgctgc agaggggagc atatgatgcc
accttcttcc 56400 tttggcagat ttagctgtcc gatcttctag ctttcctggt
gtttactaac ctctttccat 56460 tcaaaaggtg ccttatcaat tcatattttt
aatttttgct tgttaaatgg aaagggacat 56520 tagttggaat tttgtcttac
gggatttaga gacaaaggaa atctatattt attcaggcta 56580 ttaaataaga
acattatgtg ttctaaatat actatatata gaaaaaatac atatatacat 56640
acataaatac atatgcacac atatataaat acatacacac acacacacac atatatatat
56700 ataccatcat gtggaggaaa aaacctttta tatggacatc ttaggttttc
ttttgctgct 56760 acaatttatt ttatagtcat agttctggaa acagtatctt
tagagccctt cccttggaac 56820 ccactgctta tttaattgag gtgtgtgtgt
gtgtgtgtgt gtgtgtgtgt gtgtgtgtgt 56880 gtgtgtgttt caagtataga
tcaaattagg ctaaaaagat gcatttattc ttctatttga 56940 aatttcagag
gatttgagga taaagagata attgtctcta agatttgagg tgttttcctc 57000
tttgggaaat atatcattta atcagaaaac tttcaagcac tgtgcttagt aaatgcttgt
57060 tttgtttgtg aaaacgttgg aaattttaac aattattgac ttagatcaaa
tttctttttc 57120 tttttttttt tggaggcagt ctctgttgcc caggctggag
tgcagtggtg caatctcaac 57180 tcattgcaac ctccacctcc ccagctgaag
caattctcgt gcctcagcct ccagagtaac 57240 caggactaca gacatgcgca
accatgctca gctaattttt tgtgttttta gtagagacag 57300 ggtttcgcca
tgttgcccag gctggtctca aactcttaag ttcaagtgat ccgcccgcct 57360
cagcctccca aagtgctagg attacaggtg tgagctaacg tgcctggcca gaataaattt
57420 cttcattgta attatagtct catttgaaat aatacttaaa tttgttctaa
atctaagatc 57480 catttaatgc tacatttgat tcattaaaaa agcatggcac
tggctgggag cagtgactca 57540 tgcctataat cctcagcact ttgggaggct
gaggnnnnnn nnnnnnnnnn nnnnnnnnnn 57600 nnnnnnnnnn nnnnnnnnnn
nnnnctataa tctcagcact ttgggaggct gaggctggtg 57660 gatcacttga
ggccaggagt ttgagaccag cctggccaac ttggcaaagc cctgtctact 57720
gaaaatacaa aaatcagcca gcgtggttgt gcatgcctgt aatcccagct gctcgggagg
57780 gtgaggcagg agaatcactt gaacctgaga ggtggaggtt gtagtgagcc
gagatcacgc 57840 cactgcactg cagcctgggc gacagagcaa gactctgtct
ctaaaaaaaa aaacaaaaaa 57900 caaagcatgg cattatggga gccatgtaaa
taattacaaa acaagatctc ttcttttcca 57960 ggttatacac aggaggagaa
gatagagatt gcccataggc acttgatccc caagcagctg 58020 gaacaacatg
ggctgactcc acagcagatt cagatacccc aggtcaccac tcttgacatc 58080
atcaccaggt tagttagcca tcctgaggct tcattaactc caggcaactt ttgagtattt
58140 actgagttac caaacaggac atagagtatc aatatttgag tttttcatct
tttgagataa 58200 gccacagtct cctgaaaagg agattagttt attggcatcc
catagcatcc atttctcttt 58260 cttcaacaac ttccagcaag tgttatcata
actattgatt tacaccgttc tctacactag 58320 gcagaagttt acagagaaac
catttggaat attgttatag ctaaagctga aatttatgct 58380 ttgccacaat
agcaatataa ggggttaatt tgatcattta aaaaccaaat acatggcaaa 58440
tatagagaca ctttttatgc ccaggatctt gaaagttgtt gaattctctt aagaggtgat
58500 atgctacttt cagataatct gatttaagtt actcactttt cttttcttct
ctttggctga 58560 gagattttta aaatccttag aattttgatc ttcagaatta
acactggaac aatagagaag 58620 gtgccttccc aagtttacta ccaaatgctt
aagcctgtag caagcagtgt gtaaattatc 58680 tgaatagagt attgcttagt
ctaatttaca gattccctgt ttgaatggaa aatatactct 58740 gttgagaatt
tatatccacc acagcctctt acagttttcc tagctcagta ttacagatcc 58800
attgcatcat ccagcaagtc atgtcaggct gccaagctct cctcttgcgg cccttttcta
58860 gtaactactg tttttaagag atttgaagta tctctctatt ttgaactttg
acttagagtt 58920 tggccagact gtcttttgat ctatgccttc ttatggatct
atttagattt atatacaaag 58980 cagtaagact aagtcttacc tgggggttcc
ttttcttaat ttgtcttgtg atttatggtg 59040 tagataatgc caggagaaat
aaattaagtg acttatatgt ctgagtcttc caacaatatc 59100 attattccag
ataacaccca tgatgccttt gggtaacttt caataagtca tttaacattt 59160
ttgatagctt ccccatctgt aaaatatgag ggatggagaa aaatccagag tttatctgaa
59220 taataatgat tctgaagagt gatcattatt tatatttccc agttgttacc
tagagaactg 59280 tttctttttt tatgtatact tgttaactca aaatatcaga
tcttaaaagc tgtggacata 59340 aggaaatatc tggagcagtt ttgttagttt
tgatattgtt tttaaaaaca gcacaagtat 59400 gtactattcc aggcacagtt
tttggatatt tagtgagtta ccaaacttag gacatagagt 59460 atcaatattt
gagtttttca tcttttgtga taagtcacag tcatagaccc taatgttcta 59520
gtctttctta tctccaagta taactcacct gcttgaatac ttccagtccc agtatgctta
59580 attctagcga ataactacct tttcatgggt aattctaact gtaacaaaga
tattcttttt 59640 atttatttat ttatttttta agacaggttt tcatgctgtt
actcaggctg gagtgcagtg 59700 gcatgatctt gggtcactgg agcctctgcc
tcctaggctc aagccatctt gccatctcag 59760 ctcccaagta gctgggacca
caggtgcatg ccgggcgtgg tggtgtgtgc ctgtaatccc 59820 agctactcgg
gaggctgagg caggggaatt gcttgaacca gggaggtgga ggttgcggtg 59880
agttgagatc gtgccactgc actccagcct gggcaacaga gtgagactcc gtctcaaaaa
59940 aaaaaaaata gagatggggt tctcaccatc ttggccaggc tggcctggaa
ctcctgagct 60000 caagtgataa ttgttacaaa gatactcttt ctattcactt
ttctataatt ttcttcttct 60060 gccttatagg agcacctgga atctaagtgt
aattcctcct tgtacagccc ttctgacatt 60120 aagataaaat actatcaggt
gctgcacact aagtgttctc ttcttcaagc taaccattcc 60180 tctcctctgt
accattcctc ttgatgtagt ttcaagactt ctcaccctcc tgattagtct 60240
tcttctgaaa gaatcctgta tatcaatgtg tcttttaaaa ttaaacaccc agaattgaac
60300 acagtgtttc agatagagtc taaacagttc atggtatagg aagcccatgc
ttttcttatt 60360 ctgactatat tattttatga ctgtatctct agattcttag
ctttttaaag attattctct 60420 tccctttttc agtgaatttc gctaagcttg
gcatatccca ttttgtattt ataaagctga 60480 attttttaaa gcccaaatgt
agaagttgtt aagatgcctc cctgttttct cccttattga 60540 aattatacgt
agttgcataa tataggcttt atatccttct atacctttga ctgaaatgag 60600
tattagagtg tttagctaag agctttttat ctgtcttttc tcagaacttt taaaatctgc
60660 tttcctaaag tctacagtgt atgtctgact taatcaaatg tatggctttg
tcaaatccaa 60720 ttcttcagat aaaactgcat tctccacctg atcctgtcca
ttcaggtcca tccaaagctg 60780 agtggccaaa agtggtttca ctatataatg
gtctgtggaa tgacttaacg gagtttgatt 60840 ctaatgtaca tgtgtttaaa
gcagctctgc ttaaaccaca catagcatct ttttcacaaa 60900 gtcctcaaag
tcagtgctgt catcacttag cataccttct tcctttagaa atcttcacaa 60960
tgaaaataca ctgaagaaag gtggttagca aagtgcctag tgaaaaccag atttctgtct
61020 cagatttgtt tttgttttag ttccacaaag agcacaattt ctcttattct
ttcagtagta 61080 tttcaaatac aatgaattta tctagaattt tcctaaattg
acaaattttg tttaagaaaa 61140 ctcttcaaca aattaccgag gagtaaatgg
ttttttatat gctgccaagt ttactttggc 61200 aatgtaaatt gaactagaac
tagggttcat ttttaagtgt aggattataa ttcaagataa 61260 tctgtataaa
ggaaattgtt gtagctgaaa atagatcaaa gtattgaaga aataacaata 61320
atgaggagtt ttaagtgtgg aaaagttagt actcaagaaa gggtaatgaa cttttaaatg
61380 tacactgttt taccaaaaat gttaatcaca ttacctctct atttttttaa
gtggtatata 61440 gtcaaaaata aaatattttt gtttgatgac aggtatacca
gagaggcagg ggttcgttct 61500 ctggatagaa aacttggggc catttgccga
gctgtggccg tgaaggtggc agaaggacag 61560 cataaggaag ccaagttgga
ccgttctgat gtgactgaga gagaaggttg gtgaccttgt 61620 tctggcattc
tcaggcctgg tggctaggag tgagtgacag aagaaggttg ggtatggagg 61680
ggaaggtgtt gggtagtcct tggagcagtg gcacacatga ctccactgtt aaatgcatcc
61740 agtaagtaat accttaatgt ttcaacatat ttcatccaga ggattgtctt
ttacaaatag 61800 cacagtttta actggaataa taatatgaat gctttgagga
tataggaact gtattagggt 61860 tcactagagg gacaagacta ataggataga
tgtgtatatg aagaagagtt taaggagtat 61920 taactcacac aatcacatgg
tgaagtccca caataggcca tctgcaggcc gaggagcaag 61980 gaagccagtc
caagttccaa aatctcaaaa gtagggaagc cgacagtaca gccttcagtc 62040
tgtggccgaa gccccaagag cccccagcaa accactggcg tacgttcaag agtccaaaag
62100 ttgaagaact tcgagtccaa tattcgaggg caagaagcat ccagcacggg
agaaagctga 62160 aggccagaag attcagcaag tctgatcctt ccagcttctt
ttctctgctt tattctagcc 62220 atgctggaag ctgattagat ggtgcccact
cagattgagg gtgggtctgc ctctcctagt 62280 ccgctgactc aaatgttaat
ctcctttgac tatatcctca cagacacact ggaacaatac 62340 tttgcatcct
tcaatccaaa gttgaaactc actattaacc atcacagtaa ctttctccag 62400
atgtataatg atggtgtacg ttatgtatgg gttctggtgt tatcttattt ctttctgacc
62460 cagacagtta agtctttaaa taatttataa cataaaaagt ttttacaaca
taagacaatc 62520 catgctgttc aggtactgca aggacagacc tttgtactct
ggaatagctc catgtgtaat 62580 aatttttcac acattttctt ttatggataa
acaactaaat gtaatttaaa ttattcttta 62640 aaaaattatt gtgaaggtgt
tctattactg gaattaatca aatgtggatg ttcctttggt 62700 atctacttaa
aatgttttaa ctggccaggc acagtggctc atgcctttga tcccagcact 62760
ttggaaggtt gaggcaggca gatgacttga ggtcaggagt ttgagaccag cctagccaac
62820 acggtgaaac cccgtctcta ctaaaaatac aaaaattagc caggcgtggt
gttgggcgcc 62880 tgtagtcccc gctactctgg aggttgaggc aggagaatcg
cttgagccca aaagtcagag 62940 gttgcagtga gcaaaggtca tgcccactgc
actccatctg ggcaacggag cgagactcca 63000 tctcaaaaaa ataaataagt
aaataaaata aaatgtttta atttcttgcc ccaaaactgt 63060 aaggggtctc
agttcatcat atcatgctgt tatgcagttt gccaaaactt gctttaacaa 63120
acatgagttg tagggaattg acaatttctt tcatagtaaa gagatttatt agatttttct
63180 atcatttcca tagctgtttc cagaaaggag ttggatgact gtgattaaag
aaccataatt 63240 tatggtggac ccagttgaac agacacagcc aaatgtcttt
cttgtttttc catcagtcgc 63300 tgaacacagt gcattttaca gcagtagcat
cagagtcagc tttcacagaa tccttctgtg 63360 gccagtacag tgcttcaccc
ctgcctcccc acgcctggaa cctcactggt tcattttctc 63420 cagagagcga
agctcctatc ttctgttgga ttggagggag gcagtgcctt cattatgtgg 63480
agtaggagta gaggtagtga gttctaattg tattttatcc agactttaaa acttgtgctt
63540 tatttttatt atttttattt tattttactt tttgagatgg agtctcgctc
tgtcgtccag 63600 gctggactgc ggtggcacaa tcttggctca ctgcaacctc
cgtctccgag gttcaagtga 63660 ttctcctgcc tcagcctccc cagtagctgg
tactgtagac ggatgccacc acgcccggct 63720 aatttttgta tttttagtag
agacagggtt tcaccatgtt ggccaggctg gtcttcaact 63780 gctaacctca
ggtgatctgc ccaccttagc ctgccaaagt gctgggatta caggtgtgag 63840
ccactgcgcc tggctttatt tttatttttt atttttactc tgccttggga gaatctagaa
63900 aacttttgcc ttttgtccca ctcttcatcc atgctttcag ggctaccttg
aattctttag 63960 cttttgtaga cttttaggac ccacatcaac ttgttgttct
ctatctctag ccccacaaat 64020 gttgaggttt ctgctttctc tagcctgtta
agtgttggtt actttttgtc catgtacttt 64080 ttgtttccca aaattttgtc
agcatctctt gtcagctgat gtcctctttg tcattatttt 64140 tgttcttgtg
ggtttatata ttttttattt cttaattgtc attttaatac tattcagaca 64200
ggaagtaaaa acgcatgctc agactaccat ttatagaaat ttgaatttaa aaaaaatgtc
64260 ctaggtgagg gagtacctat caagggtgga aatcacttgt gtagatgaca
gtgacagtgg 64320 agaactgaag tctataaaag ttaagaccta gatctagatg
ctcctgaatt tccccttttt 64380 attcttaaca acacttcctt tgtgctgtga
tctcaagcaa ctgagcctag gtctttttat 64440 tcttgtctga tataacagaa
ggtagaggat gaaataaatg agtttattag gtaacacatt 64500 ttgaaaattg
tgtttaagat ttagatgata tattttagaa cttctaataa attcagagga 64560
attcaatgtc aaaggaaact tttgtatagt tatacattgc ttaatgttta tacatacatc
64620 catgtagcat acttctaata atatctttaa ttatactagt tattttaaaa
taacccacaa 64680 atactcaagg aattgttcag tttgtgaact gtgtgagaac
tacagttttt catggtaaca 64740 tttatttgtg tggtttttaa aagtgatcac
aggacatctc ctaaaagata atatagttaa 64800 gcagatttgc ttagttaaga
tattaccaag agcatctaga tgaataatta gaataaatac 64860 ttgtctcttg
gagacgattt tgggtgtagt ctttactaga ggcataggta tggactccaa 64920
gttggctcta atattatgag atacccttga gtaaataaca gccattctct agaccttagt
64980 agaatgatta ttaggtgtcc tgaattgttt atgacctcaa ccaaaccaaa
agaataattt 65040 ctacaaaaga gtctatgtta ggttttcata gcaccaagtt
caaatggagc ttagtaatga 65100 aaattttctc attaagaaat gaattaatta
aaattaagag cataaaataa gacagttgtt 65160 ttagaaactt caagtaatac
agtgtgggag ttatttttaa tgttaaaaat aaagctttcc 65220 taattcaagc
acgagagaca gaaaaaaaat aataaggctg aacttggagt tactgccagg 65280
aagaaaagta attttaggcc acaagcttca aaacaggcag aaacctccag tgtatcaaac
65340 aaactttctg gaataggccc agaagcactg atctgtgaac agttgtcttt
gtatttgtgg 65400 ggtcttaact ggcagttaaa gagactaaat aatagcaggg
agtttaaaaa gcaggtgaga 65460 tttagaattg atcgatctgt gttagcggag
gaacatttat ggtttcagtc acttacctat 65520 aaagtatgag aattgtttct
ttaaaagaat gctgcctctg tttttctgca tgttgttagt 65580 attttctgaa
ttgccgtttt cctttctagg gtatttgttg ggttgagaga ttagttggat 65640
tacatgacta cagttttatt ctgctttttg cctgcctttt gccaagaaag acacaaatgt
65700 cccatgtatt taattttgca cacttcagtg tttctaaaca gggtaaatgt
tcatttgttt 65760 aagtacccat gtatcatata ttcaatttat atctagcaag
atttttcctc aaaaattatc 65820 ctaagcaaag aaggatttat attataatca
gtccttataa agtttctcat aatacactgc 65880 attctcaatt actttatttt
tgaagaacat agtatttgag gaagttacat taaacagaaa 65940 gaacctgggt
agatactagt ttctgattat tttcatagaa gtcacctgaa aaattggtta 66000
gaaaaaaaag acaaaattaa tacaaattta acagttattt gtgaaatatg taaatgttgt
66060 gttattccat tttgctgtgc tacaaaggaa tacttgaggc tgggtaattt
ataaagaaaa 66120 gagatttgtt tgggtcagag ttctgcaggc tctataacag
gcacagtgct agcttataag 66180 gtgagacctt aggtagctta taatcatgat
ggaggacaat gggagagcag gcatgtcaca 66240 tggtgagaga gggagcaagg
aaagagccag ggacctttta acaaccagct gtcatgtgaa 66300 ctcattacca
tggggaaggc accaagccat ttatcaggga tctgcccctg tgacccaaac 66360
atctcccagt aggtccctcc tccaacattg ggaaacaaag ctatagtaac caaaacagca
66420 tggtactggt ataaaaatag acacatagat caatggaaca gaatgcagaa
actagaaata 66480 aagccacaaa tctacagcca actgatcttt ggcaaagtag
acaaaaacgt acactgggaa 66540 aggacaacct attcagtaaa tggtgctgag
aaaattggat agccatctgc agaaagaatg 66600 aaactgaacc actctctctc
ttattttata taaaaatcaa ctcgaggtta ggctaggtgg 66660 ctcacacctg
taatctcagc actttgggag gctgaggtgg gtggatcact tgaggtcagg 66720
agtctgagac caacctggcc aaaatggtga aaccccgtct ctactaaaaa tacaaaaatt
66780 agctgggcgt gctggtgcat gcctatagtc ccagctactc gggaggctga
gacaggagaa 66840 tcacttgaac ccaggaggcg gatggtgcag tgagcccgag
atcgcgccat tgcactccag 66900 tgtaggggta tcgcagcgag actctgtctc
aaaaaaaaaa aaaaaaaagt caactcaaga 66960 tagattaaag actttaaatg
taaaatccaa aactaaaaca tactagaaga aaatctagaa 67020 aaaattcttc
tagacgttgc cataaacaaa gagttcatga ctaagacctc agaagcaaaa 67080
gcaacaaaac caaaagtaga cagatgagac ttaattaaac taaaaagctt tttatacagc
67140 aaaagaaaca acagagtaaa cagacagctt gcagaataag caaaaatatt
tgcaaaatac 67200 atatgcaaaa gaccaatacc cagaatctac aaggtaactc
aagcaactca acaacaacaa 67260 aagaacccca aataacccca ttaaaaagta
ggcaaaggag atgaaagaca tttttcaaaa 67320 gaagacatac
aagtggccag gaagcatttg aaaaaatgct caatatcact aatcatcaga 67380
gaaatgaaaa atctatgaga taccatctta taccagtcaa aatggctatt tttagaaagt
67440 caaaagtaac agatgttggt gaggatgtgg agaaaaggga gtgcttatat
agtgctggga 67500 gaaatgtaaa ttagtaccac ctctatggaa aacatatgga
gagttctcaa agaacaaaaa 67560 atagaaccgt catttgatcc agcaatccca
ctactgggta tatacccaga ggaaaagaat 67620 tcattatgtc aaaaagatac
ctgcacacat atgttcgttt tatctgatat aaaaagtctg 67680 ttttatctgg
tataaaaaga atggaatcat gccttttgca gcaatatgga tgaaactgaa 67740
ggctgtgaca ataactcaga aattcaaata ctgaatattc tcatttataa gtggaagcca
67800 aataatgtgg acatatgaac atagagtgtg gaataataga cacaagcatg
agctatcatg 67860 cccagcctca aaaaatttaa tttccctctt aattttgtca
ttgacccaaa ggttgtccag 67920 gagcatgttg tttaatttac atgtgtttgt
atatttttga gagtttctct tcagattgat 67980 ttttagtttt attccattgt
gtgaagatac ttgatatgat tttgattttt ttttaaattt 68040 attgagactt
gttttgtggc ctgacgtttg gtctgtcttg gagaatgtcc catgtgctaa 68100
tgagaaaaat gtatcttttg tggttgttgg gtagaatgtt ctgtaaatgt ctgttaggtc
68160 catttggttt taagttcagt gtttctttgt tgactttgtc tgtctcagtg
ttgaagtccc 68220 acattttgta ttgctatctg tctcttttct taggcctagt
agtatttgtt ttattaatct 68280 ggtactccag ttttgggagt atatacttag
gattgttata tcttcttgtt gaattgatcc 68340 ctatgtcatt atatactggc
ctttaaaaaa aaaaaaacta ttgttgattt aaagtctgtt 68400 ttatctaata
taagtatagt tactcttgct tgcttttggt ttccttttgc atggaacatt 68460
tttccacccc tttaccttca gtctgtgtgt ctttaacagt aaggcaaatt tcttgtaagc
68520 agcatgtagt tgttgttttt taatccattg caccaattta tatctttgaa
gtggtgcatt 68580 caaggttaat actgatgcat gaggttttgt tccagtcata
atgttaattg ctatctagtt 68640 gctttgtaga tttttttttt tcttttaagc
aagagtcttg agtcttgctc tgtcacccag 68700 tctggagtgc aatggcgcga
tcttggctca ctacaacctc cacctcccaa gttcaagcga 68760 ttcccttgct
tcagcctccc aagtagctgg aattacaggt gcatgccacc atgcctggct 68820
aatttttgta tttttagtac agacgggatt ttgtcacgtt ggccaggctg gtctcgaact
68880 cctgacctca ggtgatcctc ccgccttggc ctcccaaagt gctgggatta
caggcgtgaa 68940 ccaccgcaac cagccagctt tgtagattct ttgtttgttt
tttgttcccg ctttgtggtc 69000 ttctggagtt ctgtcatgtt gcccttttat
ttctttcttt tccttatttg tataattgtt 69060 tcataaaact tgtgagtttc
atgtgttttt atgatagagt atcacctttt gttcccatgt 69120 ttagaacttc
tttaaatatt tctcatagga ccaatcaagt ggtgatgaat tccctcattt 69180
gcttatctgg gaaacacttt atttctcctt catttgtgaa gcttacacta gcaggataca
69240 aaattcgagt ttgaccattt tctttaagca ctttgaaaat agaatccccg
tctcttctgg 69300 cttctgaagt ttctgctgag aagtccactg ttagtttgat
gaagtttcct gtataagtga 69360 ctagacactt ttactgtatt tagggatttt
cccttcacat tgaccttaga cagcctgatg 69420 actagatgcc atggtgagat
ccttctcgca atgtatttgg ctggagtttg ttgagcgtct 69480 tgtatctgga
tgtctagatc ctttgctaga ctagggaagg ttttctcaat tattttctca 69540
aataggtttt ctgaaatttt tgctttttct tctcctctag gaatacctat gattcatagg
69600 ttccaatgtc ttatgtaatc ccttactttt cagaggctct actcattttt
taaaattctt 69660 ttttcttttt tttttttgtc tgactggatt aattgaaaaa
acctatctta aagttctgag 69720 gttctttctt ctgcttggtc tagtctgttg
ttgaagcttt caaatgtatt ttataattcc 69780 ttcaatgaat tttttatttc
caggagttct gtttggtttt ctttttaaaa tacctatctc 69840 tttggtaaat
ttctcattca tttcctgaac tgattttctg acttctttgt attagttttc 69900
agatttctct tgtatcttgt tgagctactt tttttctttt aatttaattt tattttgaaa
69960 cagggtctcg ctctgttgcc ttgtctggag tgcagtgatg cagtcatagc
tcattgtaag 70020 cccaagcagt cctctgcctc actgtcctaa gtagctacaa
attcaggcac ataccaccac 70080 acctagctta tttttttatt ttttgtagag
atggagggtt atactgtgtt gcccaggcta 70140 gtcttgaact cctggcctta
agtgatcctc cttcctcttg ccttggcttc ctaaactatt 70200 gggattgcag
gcatgagtca ctgtgccctg cccctgacag cttcttcttt tttttttttt 70260
ctgagacaga gttttaccct gtcacccagg ctagagtgca gtggcacgat ctcggctcac
70320 tgcagcctcc acctcctggg ttcaagtgat tcttgtgcct cagcctcctg
agtagctggg 70380 attacaagcg tgcgttacca tgcctggcta atttttgtat
ttttttagta gagatgcggt 70440 ttcaccttgt tggccaggca ggtcttgaac
tcctggcctc aagtgatcca tccaccttgg 70500 cttcctaaag tgctaggatt
acaggtgtga gccactgtat ccagcccctg atagcttctc 70560 taaatcagtg
ttttgaattc tttatctggc attttgaaga tttgtttttt agttaggatc 70620
cattgctaga gaattactgt gtttctctgg gggtgtcata gcaccttttt tttttcatat
70680 ttccaatatt actgtgctga ttcatttgta tctgggataa cagttgcttc
ttattatttt 70740 ttagtttact tttgttgggg caggactttc tttcccttga
ggatgtatct attatgtatg 70800 ttgagtaggg tcatttggct ttgcttcagg
gtgcattcag tgacatagac actgtatgat 70860 agccttggtt ataaagtagt
cttagtatgg tggctttctc aaatgccagt gacagtagta 70920 atgtacgggg
tgggtgattg ggctcaaggc ctcctgccta gctggggtgg atgatggtgg 70980
cagcagaggt cgtgcaaaac ttgctttctt ccaaggcact atgcagttgt atcaatagat
71040 gttgtaatgg gtggtgcagg ttgacttccc agctaggagg tggtgcctgc
agatgagcgt 71100 cagctgcaat agtggcagta gggtgattaa cctttgtaat
tcaagaatta ttcaggtatc 71160 tcaggtaccg agctgggccg tgaaactctc
aggggtcctg gtcttgtgct gtgcttccag 71220 ggtagattgt ggggtgaagc
caggcaggct ggaccagcca agctcatgtt tgagccccct 71280 gaatgggtac
ttagggcctg ggataaaatt tccagaggct gcctcataca ttgtttcaag 71340
aattacttta tcttagataa tcttggtatc tggtagtgta agtcttccag ctttgttctt
71400 cttcagaatt gggttggcta ttgtaggtcc ttcaaatatc catgtaaatt
ttaaagtcag 71460 tttgtcattt tctaccaaca agtaaataaa taaaaactcc
tggggcattt ttattatgat 71520 tccgttgaat ctgtaaatct agttggggag
aattgacaat ttgtattatc aagtcttcta 71580 attcatgacc agcttcattt
atttaagtct tcttacataa gttttttttc ttcagctttt 71640 aagttccagg
gtacatgtgc aggatgtaca agtttattat gtaggtaaac atgtgccatg 71700
gtggtttgct gcacagataa tccatcaccc aggtattaag cccagcatcc attagctatt
71760 cttcctgatg ctctccctcc cctcactccc acccacaaca ggccccagtg
tgtatttttc 71820 cctgccatgt gtccatgtgt tgtcattgtt cagctcccac
ttataagtga gaacatgcag 71880 tgtttggttt tctgatcctg cattagtttg
ttgaggataa tggcttctag tttcatccat 71940 gtccctgcag aggacatgct
ctcgttcctt tttatggctg catagtattt catggtgtac 72000 atgtaccaca
ttttctttat ccagtctgtc attgatgcgc atttgggttg attccatgtc 72060
tttgctattg tgaatagtgc tgcaatgaat atatataaat cattctgttt ctttggctat
72120 atacccagta gtgggattgc tggatcaaat ggtatttctg cttctagatc
tttgaggaat 72180 caccacactg tcttccacaa tggttgaact aattaaactc
ccaccaacag tgtaaaagca 72240 ttccttattc ttcacaacct cgccagcatc
tgttgtttct tgacttttta ataattgtca 72300 ttctgactgg cgtgagatgg
tatctcattg tagtttttat ttgcatttct ctaatgatca 72360 gtgatgttga
gctctttgtc ctatgtttgt tggcaacata atgtcttctt ttgagaagtg 72420
tctgttcatg tcccttgccc actttttaat ggggttgttt ttttttttcc ttgtaaattt
72480 gtgttcctgg tagactctag atactagact tttgtcgggt ggatagattg
aaaaattctt 72540 ttcccattct gtaggttgtc tgttcactct gatgatactt
tcttttgctg tgcagaagct 72600 ctttagttta attagatccc atttgtcaat
ttttgctttt gttgctattg cttttgtcat 72660 tttcttcatg aaatctttgc
ccgtgcctat gtcctgaatg gtattgccta gatttttttc 72720 taaggttttt
atagttttgg gttttacatt taagtcttta attcatcttg agttattaaa 72780
taatttttgt ataaggtgta aggaaggggt ccagtttctg ttttctgcat atggctagcc
72840 agttttccca gcaccattta ttaaatagag aatcctttct tcattggtta
ctagtacaaa 72900 aacagacaca tagaccaata gaatagaatg gagaactcag
aaataagacc acacatctac 72960 aaccatctga tcttcttaaa taagtttttt
aagagttttg atcattttct gtggcacact 73020 tttacataat ttttctttag
atatcttcct aggtatttga tctttatgtg tatattattg 73080 taaataacgt
tcttaaaatt ttgttttcta attttttgtt ggtagtgtat gacaatgcaa 73140
tattggcctc ctgttcaaca aacttgccac attcacttat taatcataat tgtttgtgga
73200 atcttttgga ttttctgcat ctaccatcct gtaatcacaa atgcagatgt
cagtttttac 73260 ttcttccttt ccaacgttat accttttatt taatttcttc
cctaatatgt tggctaggac 73320 ctcctgggaa atgctgaata gaaataatga
taatagacaa agtaagcagg ataaaagcct 73380 atgaagaaat taccaactga
cataggcttt gctttgtagc tttaggtcac ccctcatcac 73440 ctaatattat
aaaatgacaa ttcggtagga ttctcagaaa ctgtccagtt tgaccctgat 73500
ttaattctca acattctcca gtaaacacta tgccttgcct gtttgacttt gttaacagac
73560 atgtcagaca atcatgtggt gaagtgtgat tttacttgtt tattcaacct
gagatttgct 73620 gacagttcgt tctgtgttgc tgtaacagaa taccacagac
tgggtaattt taaatgagca 73680 gaaatgtatt ggttcacagt tctggaggct
gaagagtcca atgtcaaggt gccagcttct 73740 gacaggaacc ttcttgctgc
atcttcacat ggcagaaggg caaagaaaga gaagggggcc 73800 tgaactcact
cttttataag gatatcagtc tcacccataa gggcagaatc ttcaggaacc 73860
taagagcaac ttgttacttc atggcctact gacctcttaa aagtctcact acttaatatt
73920 gttacaatgg cagttaaatt tcaacatgaa ttttgaaggg gacaaacatt
taaaccatag 73980 cactgacttt cttgaatttg tatactcttt tattggtttt
ggaaagattt tggccattat 74040 cttttcaaat attcttccca tttttttact
cttccttctg ggattctgag aagagagccc 74100 ttcactgtct cttatcctcc
tttctatttt ttttttgttt gttaattttt ctctctcatt 74160 cagtttagat
attttctgtt gccctgtatt ccagtttgtt attgctttct tctatttttt 74220
tgtggtctgc tattaagcct atgaagttct taattaccat attgtaattt tttttttttt
74280 ttttttttac ttttagaatg gccactggat attttttttt tctttcttta
agacagagtc 74340 tcactctgtc acccaggcta aagtgcagtg gcacgatttt
ggcttactgc aacctttgcc 74400 tcctggattc aagcgattct gatgtctcag
cctcctgagt agctgggatt acaggcgtgt 74460 accaccatac ccagctaatt
ttgtattttt agtagagacg gggtttcacc gtgttggcca 74520 ggctggtctc
gaactcctta ccttaggtga tctgccctcc tctgcctgcc aaagtgcaaa 74580
gtgctgggat tacaggcatg agccaccgcg cccagcccat tggattcttt tttttttttt
74640 ttttttttga gacggagtct cgccctgttg ctcaggctgg catgcagtgg
cgtgaccttg 74700 gctaactgca accttcacct cccaggttca agtgattctc
ttgcttcagc ctcccgagta 74760 gctgggatta caggcgcccg ccaccacacc
cgaccaattt ttgtattttt agtagagacg 74820 gggtttcacc atgttggcca
ggctggtctt gaactcctga cctcaagtga tccacccacc 74880 ttggcctccc
aaagtgctgg gattacaggc atgggccacc acacccggcc aggattcttt 74940
gtatatatat ggactccaat agattctcca ttgatatttt ctatcttttt atctatttaa
75000 tccctccttt tccctatttt cttggacatg ctagtcatta ttttgaaaat
ctctacctta 75060 acactccatt atctgattca gttatgtttg gtgtttgttt
tgtttgtatt accttttttt 75120 cccccttgat ttctagtttt ttgttctgtt
ttttagcatt tcttgtattt ttttactgga 75180 tgccagacat tggatgaaaa
atacaagggc tgtaactatt atcctctgaa aagtgttaca 75240 ttttcttctg
attggtaact acagtaccaa cctgtcactc tgtcctgtca aggctgagtt 75300
ttaggctttg tcaggactcg tcaatttcag tttgggtctt attactggga tacagtcttt
75360 atttttatta tgtggtactc ccaggatgta gttcttattc cttcgtgggt
gacccttact 75420 tctagagcat gatctttctg agttctcaca tgaaaatcca
atcaggtctt tagcatcctg 75480 gcttctcctt tctcctgggt ttctaaaaga
ctcaccctga atacattcaa cttaggagtt 75540 agtcaacagc ttgaggggga
tttaagtgca gatttttgag atccttcttt ttggtttctt 75600 cctttattgg
gattttgcca atgaagtccc agttgctttg acaacctcta attttcagaa 75660
ttacttttga ctaaatgttt tatgattcta aacataccat ctactctgtc aattctgaat
75720 tatggtgata ctcaattcta cctcaaatcc caaagaaaag agggggaaaa
aacaacaaaa 75780 ctaagaagaa acattgcttt tgttttgtag ctttaggctt
ctacctatat aattgactat 75840 tataaaatct catttgagta ggatctttag
tagccaccta ctttgactgt gatttgattt 75900 ataaatccct tcacaacatt
cctcagtaaa caccatgctt tgcctgtttg acttggttaa 75960 cagacatgtc
tttataaact tggctatcca ttttccagtc tgtaggaaaa gagaagctgt 76020
aagttggaga aaaggctagt ggttgggtgg tgagtcataa gcaataagat ttgatgtcag
76080 tgatgacagg cctgtcctct tatgatagat tccttgagcc ccctgctgac
cacaaagctt 76140 tggctggcta gaccacaagt ctgtctccct caatgacaat
ttttgtagct caatatggat 76200 cctattttgt gtgagttgca tttggagatt
tattgtttat ctgctgtatt tgccttaggt 76260 gggacagtga aatcaaccta
atgtagtgga aggaagtagg tattacatcc ttaattcctt 76320 gatatacatc
cttttattat gtggtactcc cgggatgtgg tttttcagat ttggagaaga 76380
atagttaaaa aaaaaaaatg cagaaaggat caaaagcact tgattctctc gcagggacag
76440 cttcctgttt tggttgagga aggagctgca cttaaaataa ctagcataaa
gcatgcttag 76500 ggcttgcttt ccagacaacc tcaatttaaa atgcatcaaa
agccaggtgt ggtggctaac 76560 atctgtaatc ccagcacttt gggaggctga
agagggcaga tcacttgagg tcaggagttt 76620 gagaccagcc tggccaacat
ggtgaaaccc catctcttct aaaaatacaa aaattagctg 76680 ggcgtggtgg
cacacacctg tagtcccagc tacttgggag gctgagatgg gaggatcatt 76740
tgaacctggg aggcggggat tgcagtgagc cgagatcaca ccacagcact ctagcctggg
76800 caacagagca agactctgcc tcaaaaaaag aaagaaaata aaattcatca
aaataaaata 76860 tttgaatttt acagcactag ttcttttcat tcattgactt
tcattctccc actttaccac 76920 acctttaact attggcaaga atgtggtgag
tgggagaaag cgtatcctgc cacgtaagca 76980 agtataccta gagccaaggg
gtcagagtgt cacagaggag agccacatgc tgatgggctt 77040 gtgttcgttc
ccactcactg actatgcaag cgcctcttct cttagccttt ctcaggatgc 77100
agttctccag ggaggaatca gccttctgtt gggctgcttt cagagctctt tgttgtggct
77160 tcctgccatt gactttgcaa gccctaagca tgctttatgc tagttatttt
aagtgcagct 77220 ccttcctcaa ccaaaacagg aagctggctc tgcaagagaa
tcaagtgctt ttgatccttt 77280 cagctttttt tttttttgac tattcttctc
caaatctgaa acatatccat tctcgtctac 77340 ggccatgagt gcatttatgt
taacagaaaa tgctaaattt aatgtttaga aagtaacctc 77400 tgtggccaga
catggtgact aatgcctgta atcctggcac tttgggaggc cgaggcaggc 77460
agatcacttg aggccaggag ttcgagacca gcctggccaa cacagtgaaa ccctgtctct
77520 actaaaaata gaaaaaatta gttgggcatg gtggtgggtg cctgtaatct
cagctacttg 77580 ggagggtgag gcaggagaat cacttgagcc caagaggtgg
aggtcgcagt gagccaaaaa 77640 tcaagccact gcactctagc ctggatgaca
gagcaagact ctctcaaaaa aaataaaaag 77700 taacctctgt gctttgtgta
actttttgct aaattcctgt ctttgtcttc ttggaacagt 77760 cttctacttg
ttacaggatc ttcctatctt ttggatttta tattagtttt aatataaaat 77820
taatatagtt ttatattata tagcccactg acatggctgt tagctgacct cagttccttg
77880 ctgacttggc cagagccttc agtttcttat ctctggtaag aggtaatgtg
tctctcccta 77940 gggcaaggct gtgacagctg gcttctccca gagggaatga
tgtgtgagag aagcagggag 78000 agtaagaatc aagacaaaac tgcagtcttt
tatacccatc actattgcca tattctcttg 78060 gtcacacagc ccaaccctgg
tatgatatgg gaggcactaa ctccatgggg atgggatatc 78120 tgggcaccat
cttgaaggct agctgacaca gattattttt tgtgcgtgtg cctgtaagaa 78180
ttttttggcc aggcgtggtg gctcacgcct ttaatcccag cactttggga gggcgaggtg
78240 ggtgggtcac gaggtcagga gttcaagacc agcctggcca agatggtgaa
accccatctc 78300 tactgaaata caaaaattag ccaggcatgg tggcaggggc
ctgtaatctc aactactcgg 78360 gaggctgagg caggagaatc gcttgaactt
ggggggcgga ggttgcagtg agccgagatc 78420 acgccactgc actctagcct
gggcagcaga gtaagactct gtctcaaaaa aaaaaaaaaa 78480 aaaaagaatt
tttctaagcc cgcattgaag tttatactgt agaatatcca tcaaacttga 78540
gctgatttct tatcaaagac ccaggttgca cagatagggg ttagaagttt ggattcggtt
78600 ttgcattttc agtatttaaa gtcttgtttc atcttgttca ttcttacctt
tcctttgatt 78660 gtattagtag ctcaggacaa ataagaattt ataattttcc
aaggaactaa ggttgctgtt 78720 gaggaatatg ggtttcagag acaagagttt
aggcactggc tcattggtac taagcttcag 78780 gggtttgtag tgttgttaga
gctaattgga ttttacaaat aagccaagat tattaaaaaa 78840 aaaaaataga
tctagagagt aacactttct gtgctaaatc cattgcattt gatgggatac 78900
taggcagtat gctatgtcca aacttctaaa atcaggcggt ggtctaacgt tgaggtgaaa
78960 atatcatgtt gggtatatac tgccaatatc atgaagatat actaaatatt
attttctgag 79020 tctgacattt acactgattt actgatttat ccctcatcaa
tattggcctg gtttaagaga 79080 gacttgtttg cctgtacaga ccgggaggaa
gcttcaatga aggcaaaaat ctaactataa 79140 taggagccaa acatttgtta
tttgaattcc aattggggac aggaaaataa aatattatca 79200 aataattata
aagtcatcat tctgttaaat gaatcatata ggaaaatgca ttgaccttaa 79260
aacagagtct ggctctgtta cccggactgg agtggagtgg cctggtttca acttgctgca
79320 acctccacct cacgggctta agctgtcctc ccacctcagt ccctagagta
gctgggacca 79380 caggttttgc catgttgctc aggctgttct caaactcctg
agctcaagaa atccacctgt 79440 ctcagcctcc tgaagtgctg ggattacagg
cgtgagccac cgcgcccggc ctgcagtgac 79500 ctttggttgt cattgttata
cattatcaaa acaaactcaa gttacaagag tattaaagca 79560 atacttaatg
gttttaaaaa aaatattaca aaaggtctct gcattttaac tactcatcta 79620
aataattgtc taggaatatt ttctgaatct ctaatacagg aaatgagatt tattaataca
79680 taaaacccac tgaaaacagg ggtgcaaact ttcttgtctg gtactaaaga
tggattccta 79740 tgttttgggc ccttgtttat accagtttat tcaatcagtg
agtcagctag catttactga 79800 atagtcatat gcgttgctta atgatgggga
taatgttctg agaagtgcat ccctgggaaa 79860 ttttgtcatt gtggaaacat
catagagtgt acttacacaa acctagatgg tatagctttc 79920 tacacaccta
ggctatatgg tatagcctgt taatcctagg ctataaactt ctacagcatg 79980
tgactatact gaatactgta ggcaattata acagagtggt atttgtatat ctaaacaaca
80040 gatgaacaat aaagaaaaaa taaacaacaa ataaaagctg gtacttctgt
ataaaggcac 80100 ttaccatgaa tggagttgca ggactggaag tagctctgcg
tgagtcagca agtgagtggg 80160 agtgaatgtg aaagcctagg acattactgt
gtatatacta ctatagactt attaacactg 80220 tacacttagc ctgtattttt
taattttttt cttttttttt ttttacttct ttttcttttt 80280 ttgagacagg
ctgtgttgct caggctggtc ttgaactctt gggctcaagt gatccttcta 80340
cctcatcctc ctaagtagct gggattacag gtgtgtgcca ccacacccag ctttttaaaa
80400 cttttcaaat cttttataat aacactcagc ttaaaacaca aatacactgt
atagctatac 80460 aaaaaatatt tttaccccat ttatgcctag tgctccatta
ttggaacact aagcttgtgg 80520 gagttattta tatcctactg ctcaaggtca
ttgccaaggt ctgatttttc acaaaaaaaa 80580 attcacaact tctggcataa
atgggttaat atccttactg tatataagct tttttaaaaa 80640 ttgttttact
ttttaaactt ctttgttaaa agcaaagaca cagacacaca ttagcccagt 80700
cctgaactag gtcaggatct tcagtttcac tgtcttccac ttccacatct tggcccactg
80760 gaaggtcttc agaggcagta acatgcatgg ataacagtgc cttctacctt
ctgaaggacc 80820 tgcctgaggc tgttttacag ttaacttctt ttttacagaa
gggagtacac tctaaaataa 80880 tgatgaaaag catagtatag tccaggcacg
atagtgtgtg cctgtagtcc cagctactca 80940 ggaggctgag gcaggaagat
tgcttgaacc catgagttca agaccagtct gggcaacata 81000 gcgagactcc
acctctaaaa atatatataa gaataaaaaa ttttttttaa atgaagcata 81060
gtaagtacat aaaccaataa catagtcact cactatgact atgaagtatt atgtactgta
81120 tgtaattgta cgtgctgtgc atttatacag ctggcagcac aataggtttg
tgtacaccaa 81180 gcatcaccac aaagatttgg gtaatgcatt ccattgccct
aacggggcta caacatcact 81240 aggcaatagg aatctttcag gtccgttgtt
gtcttctggg acttctgtca tatatgtggt 81300 ctgcctttga ccaaaatgtt
gttatgcagt gcgtgactat acccactata tgttcaagtt 81360 ctaaattgga
ttctgggaag ctgattaaag agaaaataat gtgtagtcta ttggaagagg 81420
tagataaaca atttttaagt gaaataattg ctaattttta acctctgtgg aggcactgaa
81480 ctgatcattg aaagctctat tttacttact aaagatatgg tagcttataa
aaattactta 81540 tagtaaatgg acatgaaaag gtcatttgct tacatctcta
aattcatttt gatggaaaaa 81600 tagtggaaaa atgtttgcag ataccctttt
gtttgtttgt ttttttcata atagataatt 81660 gccactaaaa ttgaagaatg
gccaggtccg ttggctcatg cctgtaatcc cagcactttg 81720 ggaggccaag
gcgggtggat tacttaagct caggagttca agattaacct ggccaacatg 81780
gcaaaacccc gtctctacta aaaatacaaa aaattagcca ggtgtggtgg tgcacacgcc
81840 tgttgtccca gctacttggg tgactgaggc atgagaatca catgagcctg
ggaggcggag 81900 gttgcagtga gctgagattg tgccactgca ctccagcctg
ggcaacaggt gagactctgt 81960 ctccaaaaaa aaaaaaaaac aactaaaatt
gaaaaatacc tcacagtcat aacttccatc 82020 tgtatctcag tggttattat
gtagaaatgt tcagtaggta aacttgaaag aaaatgtatt 82080 tggtaatcgt
aaggttgtgt tgccaccccc aaaataatga agaaaatacc aacagaaaga 82140
aaaaggattt attgctggcc tgaaggttct tctgggcatt tgatctacag atttctccat
82200 tatagctagt tcctttaaaa aaataaaaaa cattgaaaat atgcagaccc
aaatgccttg 82260 gcagccctgg tcagtaactt gaatctcagt tgcacttagc
acaattcctc tggctgggaa 82320 gatgttgttt tggaaaagat taacctgaaa
tgacagcacg aattatacag ttggaaatac 82380 tcaggttttt
ctgatttttt tcaaaagata ctttgctttt ccttttctgc cttaccatgg 82440
gaaggtcctt agatgcatca tatccttgtc agtttagcct tgtgacacat atttctgcaa
82500 ttttgtgcaa taagaaagcc actcgaaatc tcagcatttc atgtcacttt
taaagtaggc 82560 tcagttaaaa caaaaccact tgattgtttg tataaccaca
accatatgtg tctttctctc 82620 catgcttaaa caaggtctga aatcgtgtgt
caaacagttg agatgtaaac atctcctcct 82680 cacacataac ccctctgcca
tgttgttatt tatatcccca gtaacacact tcttgtccct 82740 gacacaagta
cagccgtctc cacattccat tttgctccta ctccatcagc ttgcaagaaa 82800
aattttaatc attcaaaaat aattgttaca taattacttt tcactgatta aaaatatttg
82860 tttacttgac aaaattagca ttaaaaacag taattctttg gcagattaat
aagtattttg 82920 atgatttgtc atttttcaca gatgttgata aaatttaaga
attacatagc cgaaatttgg 82980 tctaattcaa caaaccacaa ttgactcttt
tggtaaggcc ctatgacgaa tggtatggga 83040 gagtggagtt tatccaatct
gactttcatt ttattgatac ggaaactggg gccccatttg 83100 ttcttttttt
taattgctac ataatataca tatttatggg gtatagtgtg atgtttcagt 83160
acatgtatac attgtgtaaa aatcaaatca ggctgtttag catatctgtc acctcatata
83220 tttatcattt ctttgtggta agtatattta aaattctcta ttctagctat
tttgaaatat 83280 acaatactgt taaccatagt cactgtgcaa tagaacagtg
gtccccaacc tttttggcac 83340 cagggaccaa tttcatggga gacagttttt
ccacggacct gtggggtggt ggtttcagga 83400 taaaactctt ccacctcgga
tcatcagcat tagattctca taaggagcac ccaccctaca 83460 tccctcacat
gcacagttca taattcacaa tagagtttga gctcctatga gaatctaatg 83520
ccgctgctga tctgaccgga ggcggtgctc aggccgtaat gcttgcccac ccgctgctca
83580 cctcctcctg acaggccatg gactggtact gaccagtcca cagcctaggg
tttggggacc 83640 cctgcagtag aacaccagaa cttattcctc ctatttatct
gcaattttgt acccattgac 83700 caatctctcc ccatccccac tatctctccc
cttgccagtc tcttgtaacc actgttctac 83760 tctctgtttc tgtaagatca
acttctttag attccacata taagtgagat catgcagtat 83820 ttgtcttttg
gtgcctggct aatttcactt aatataatgt cctccaggtt caaccatgtt 83880
gccacatgtg acaggatttt attctttttg tggctgaata atattccatt gtttatatat
83940 gtcacatttt ctttatccat tcatccgttg atggatgctt acgttgattc
catatattag 84000 ctattgtgaa tagtgctgca acaaacatgg aagtgcagat
acccctttga catattcatt 84060 tcctttggat aaatgcccat ttgtgggatt
gctggatcat atgatagttc aacttttaga 84120 ttttgagaaa cctccatact
gttttccata atggctgtac taatttacat tccagccacc 84180 agtgtgtaag
agttctcctt tctccacatc cacaccaact acaggtggct tttctagact 84240
ggactttagg ttgggacaaa aagtgtcttt gagagtcagt agtcctaata ctgtactgtg
84300 aatgctgtgg acttaggcag tttgtttaag cttgtttaaa ctgggtctct
ctttccttag 84360 atataaatgg agggttagac tggatcttta agcttctgcc
cagcatttaa tgttctgttt 84420 attgtggttc tagcctgtgc ttcttgaatt
cctgattctt cctgaattct gctaagcatc 84480 agaatgcagt ctatacattc
tcaacagctt cccaaagaca tgatattagt ataacagaaa 84540 cagtagtagt
cctttcttgg aaaattatcc ccatttctgg accctatttt attgctggct 84600
gcaattaaca ggttcttgta tgtcccatcc ttccctcctc ctccctaacc cacaggcatt
84660 aaaaacctgc tgtttgtgaa aatgaacact tctttgataa tctggaagaa
ggggttcctg 84720 ttaccagaaa atttagctct tgaactcctg ggactgggct
tgaaagcata gtactattat 84780 gcttcagatt aagcagggta tagagaataa
ggagtgatca caaaaattct gtcttgaata 84840 aagatgatga tagatatccc
agggccctct gtggttagat agtctccatt tctaccacat 84900 tctgaggaat
tgtgggtgtt gcgcttttta tgtttctggc ctccctgcta cttgccattg 84960
gttggatcac tggccaagag ctaccgagaa ctaccatttt gcttcaagat tttttcaaac
85020 agcaaggaac ttttttattt tttaacagag agctactgaa gtttcctgag
ttattacaac 85080 ccccttatcc ttcctcctta cttccccttt caataattcc
ctttcctccc tcttcccaca 85140 gcagttcttt ggctattggg cctgttttca
ttgaaatcat cttcctgtgg cagagggaaa 85200 atgaatagag aagaacagtt
gactgtgtcc aagtgatagc tgcttgctta ggaaaagcct 85260 ggtccttccc
cagaggagtc tgtccctata ggacttccct ccataatagc tgtgcttcca 85320
tcagctctag aggatggctt agcccccttc gggggtacac cgcatttcac tctcacttgg
85380 ctcacagcca tcaccacagt ccatgctgtg agtgcattgc tggttctgcc
cccgtgctgt 85440 gtgcatctct gctgctttaa tgctgggaaa ctccgtggtt
atgccccaac tatcttggca 85500 atgttctgaa tcagacatag ataataccta
ttaaaggtat taataggcca ataataccta 85560 gtaaagaaga gctgggatat
acctctgcat agattaaatc aactagaaaa cactagcccc 85620 ctcccatttt
cagaccgatt ttatttcttt taagtgggaa aatagtcgaa gtgggatgaa 85680
gcagagctag cttattctac tcattttata tttctgtggc cttttcaacc tctgtttaac
85740 agcactttat tacttagttt ttttgttttg ttttgttttt ttgggatgga
atctcacgtt 85800 gtcgcccagg ttggagtgca gtggcatgat ctcggctcac
tgcaacctcc acttcccggg 85860 ttcaagcgat tctcatgtgt tagcctctca
agtagctggg attacaggca cctgccacca 85920 ggtccggcta atttttgtgt
tttcattaga gatggggttt caccatgttg gccaggctgg 85980 tctcgaactc
ctcacctcag gtgatctgcc cgcctcagcc tcccaaagtg ctgggattat 86040
aggtgtgaac caccacgccc agcctcactt tattactttt aagaatatgc ttcaaaatag
86100 tttgtaaaga agattttaat agggagcact tatatgaaat ataatagtga
tatatagtat 86160 agcatagagc agagtcttca gtctttgtat ctttttcttt
ttttcttatg catatttaat 86220 gtatgtgatt cccaaccgtt gtgtgattgt
ggtcagagcc ctgtctgtgg gatgctgggt 86280 agaatgagat tgtagagagc
actttgtttt cttgtaattg aagggtttgg ggtgagaata 86340 tgtgagtcat
agaaatctgt atagtaaata ttactctaaa aagggagcca tcaggatctg 86400
ggagaatttg ctaaaggaaa actaagaatg aaaaaaaggc caggtacagt ggctcactcc
86460 tgtaatccca acactttgag aggccaaggc aggaggacct gaggccagga
gttcaagacc 86520 aacctggcca acatagtgaa accccgtctc tactaaaaat
acaaaaattg ggccgggcgc 86580 ggtggttcac acctgtaatc ccagcacttt
gagaggctgt ggcgggtgaa tcacgatatc 86640 aggagttcga gactagcctg
accaacatgg tgaaaccccg tctctactaa aaatacaaaa 86700 attgggccgg
gcgcagtggc tcacacctgt aatcccagca ctttgagagg ccgtggcggg 86760
tggatcacga tatcaggagt tcgagactag cctgaccaac atggtgaaac cccgtctcta
86820 ctaaaaatac aaaaattagc caggcatggt gacgtgtgcc tgtaatctca
gcttctcagg 86880 aggctgaggc aggagaatca cttgaaccca ggaggtggaa
gttgcagtga gccgagatca 86940 caccattgcc ctctagcctg ggtgacacgg
ggactccgtc tcaaaaaaaa aaaaaaaaaa 87000 aattggccag gtgtggtggt
acacacctgt aatcccagct acttgggagg ctgaggcatg 87060 agaatcgcat
gaacacagac ggcagaggtt gcagtgagct gagatcacac cactacgctc 87120
cagcctctgt ctcaaaaaaa aaaggggggg aggggcggtg gggggagcgg gagccagtat
87180 ataattcagt atctctcatc tatacatatt aaggcttttg accattacca
aattctccca 87240 gcagctctct gagagtactg taattctggt tttgctgatt
agaaaaccag atacaaagag 87300 gtaaagtcac cttgttctag gccactaggt
ggtaatctga gtcaggactg gagacaatga 87360 tttattttta atatctcatg
taatgttaat ctcataactc agggcataac tcttttacca 87420 ttttggacta
tatcatttca ttcatatgat aaagacactg tagcttcccc ctcacctgca 87480
gcttcacttt ctgcagtttt agttacctgt ggtcaaccat cgtccaaaaa tattaactgg
87540 aaaattctag aaataatcca ctcgtaagtt ttaaattgtg cactattctg
ggcagtgtga 87600 tgaaatgtcg agccatcctg ctctgtgtga ccctggacag
gaagcctctc tttgtccagc 87660 atatccatgc tgtatgactc ccgccccttt
agccactcag cagccatctc acttaccaga 87720 tcaactgtct tggtttcagg
gtgtttgtgt tcaagtaacc cttcctttac ttaataatgg 87780 acccaaagcc
aagagcagtg atgctggcat tctgggttta ttttattagt attgttgtaa 87840
atctcttact ttgcttaatt tataaattaa acatgatcat aagtacatat ctatagggaa
87900 aaaatggtat atatagggtt ctgaaccatc ctgcatttca ggtatccacc
gtgggtctgg 87960 aaatgtatcg cctgtggaga aggggtgact actgtgtatg
taaaaatcac cctgtgtgaa 88020 atgttatatc ctcccctttc ctcagtttaa
cgttgttttg aaagaatttt ctcacattac 88080 ttgaaaacac ttaggaaacc
atttttagtg actgtagtat tttaccagtt agatatgcca 88140 tggtttactt
aaccatgttc ctaatgttgg gtacttatat tggatctaag ttttgctgtt 88200
atttgtagtg ctgcgatggg tgactgtgca caaacccttg cctgtacttt tgtgtatttc
88260 cctaaggata gattgctgca aaaaagaacc actgagtgtg agactgtaaa
tatttggaag 88320 gctttcagtc tatttccata ttgctttcct gaaagattga
accagtttat acttctgtaa 88380 gcaacagtgt ttgagaagat ctctttactt
tttttaacat tgacctttgt catttcttaa 88440 actttactag ttattttggt
aaccggcttg tttttataat ttgaatttct ttgcttctca 88500 gtgaaataat
agtttctttt ataggagtat taaccatttg ttaagaacca ctattttagt 88560
ccaaaagaaa ggtatataag aagaaaactg cacaattcca gtgggaagga cttggggtca
88620 gggtccctga tatgttggaa ggttgaactt tttgttgttg gtttttcccc
ttgccttaaa 88680 aagtccatat tgcttgaatg ttgcaatctt gggcaaggcc
agcaattaat ccaagggatg 88740 atgccactgt cttctcctgg tgctggtcct
ttctgacaga gaacatggta ctagggctga 88800 gtgcttgaat gcttgcacat
aggacccaga aggtgcacat ataaccgggg gttcgttcct 88860 tgagtgatat
ctttgtgaga tgacattttg cttgttggtt gtttgtttta taatgaggaa 88920
tcaaagtggg tattctagga agatccagtg tttccctact cacactttgc attacacaca
88980 gtccaggggg tgactcagaa tccagtgctg tcctgcctct cccagttggc
tgacaccatt 89040 ttcttgactg gagccttagt tttctaggca tatattctaa
tgatggaaca ttttgaaatg 89100 cagattattt ttgaggttac tgaatttttt
aataacacag ctgctgtccc taaattgcca 89160 tcttttataa ggtctagttg
cattagaaat agctctccca accccactcc cccagtgctc 89220 agaacgctga
accccgtact acacttggaa aaggattgga tgtcctaaag cattggttat 89280
gtaattgtgg gttggctttc acccactgag ctttacttcc tcctgtgatc gtgaaataca
89340 agctggcaac agtaattaga tctcagaaaa gcttgtcaca aagcaccaca
gactagagaa 89400 acttgtaagc tctttttgca ctggctgaag tttttgagta
ccactacctt ccatctatag 89460 tgtagtaacc ttagacaggt agtgcttttc
ttctgtgcat taattttaat taagcaatga 89520 cacctacttt cttttccact
ctgagatctg catgtagcta aacttatcag gtgagtgctt 89580 tcccatcttt
gatcattgat actgcttgga atataccgga aaaagagcag caagcagaaa 89640
atctcccatt tccacaagct gctgactaac tcagaattgc tagattttgt gaagcaaatg
89700 aatgctataa aagaagtcag aaagatcagg gaagctgtcc ctaggacttg
gtcaggccaa 89760 accttgaaat atcaagtgat gttacagagg tacaattatg
agaatatata taactcaaga 89820 cttacatatg tgataaatag tgcattgctc
tttgccgtct ccaaaggatt ttcttttttt 89880 tttttttttg agacggagtc
tcactgtgtc gcccaggctg gagtgcagtg gcgcgatctc 89940 cgctcactgc
aagctctgcc tcccgggttc acgccattct cctgcctcag cctcccgagt 90000
agctgggact acaggcaccc accaccacgc ccagctaatt ttttgtattt ttagtagaga
90060 cggggtttca ctgtgttagc caggatggtc tcgatctcct gacctcgtga
tccacgcgcc 90120 tcggcctccc aaagtgctgg gattacaggc gtgagccacc
acgcctggcc aggattttat 90180 ttttaattct cacagcaatt ctgcagagag
aggtagtgag aggtttaatg ctttgttcaa 90240 cataatttgc tgttaaatag
ccattcattg gcagaaaatc tgaactgttg tgttttcctt 90300 cctgtgtcat
tcatggtttc agtcctgaag aggagcccac tagagcccaa caggagagga 90360
gagtgggaga atccctcacc cagaagttca cagtggtatc atttagtgac actcaggatg
90420 tctccagtta ttgttagaat ttaaagttag gttcatccct gtgaggtcca
agaaaatata 90480 aaaataaaat aagggtctac tagtattaaa catactctgt
aatcactttt gaaaggaaag 90540 gagttagtgg aaaaaatgga agaaccatag
cgaaactaaa ataaatatat gtagatatat 90600 tgctggacgt ggtggctcac
acctgtaatc ccaacactat gggaagctga ggcagccaga 90660 tcacttgagg
tcaggagttc aagaccagcc tggtcaacat ggtgaaaccc cgtctctact 90720
aaaaatacaa acattaggcc aggctcagtg gctcacacct gtaatcccag cagtttggga
90780 ggctgaggtg ggcggatcac ctgaggtcag gagttcgaga ccagcctggc
caacatgctg 90840 aaaccccatc tctactaaaa atgcaaaatt tagctgggca
tggtggcaca tgcctgtagt 90900 cccagctaca gggaggttga gccaggagaa
tcgcttgaac ccaggaggtg gaggttgcag 90960 tgagccatga ttgtggcact
acacgcccgc ctgggtgaca cagcgagact ccatctcaaa 91020 aaaaaaaaaa
ttacatatat atacacatac acacacacac aaacattagc cgggcatggt 91080
gttgtgcacc agtaatccca gctactctgg aggctgaggc aggagaatcg cttgaaccca
91140 ggaggcagag gttgcagtga gccgagattg caccactgca ctgcagcttg
ggtgacagag 91200 cgagactctg tctcaaaaaa tatagataga tagacaatgt
tagataactg cataattatt 91260 atatgtgtgt attaatatac gaagcaatca
ctttcagaag gaatagtgtg ttaaaaaaag 91320 gtaatgaaag attttaaaac
aaaacacttc atgagacaag aagttagaac aattacggca 91380 aactaaaaga
aaaagctagg aatgagatcg aatacagcca agtatttcct gcagttttaa 91440
aacctctact ccccattttg ggtttctggc cacagattac gtaatatttt tcgttacttg
91500 aactggaatt acaaagattg atacagaaga tggtccgata agtcaattgg
gtcctgctcc 91560 ttgtatgtct aggtccaaac caaaatgagt caatatttgg
acaagatatc agccatccag 91620 ggcttatagg caggtaaagg agatggccca
ttattacagg gatttcaaac caggctttgt 91680 attctcttac cctggcactg
ccaattatat ttatttattg gaaaatgata accttagagt 91740 taagctatat
gcttataaaa gaggcactgc ttatatgggt tctatcatgt ccaggtttac 91800
attgcccgtt agaaaacagg acacctggct gggtgcagca actcatgcct gtaatcccag
91860 cactttggga ggccaagcga gtgaggatcg cttgagccca ggaggtcaag
gcagcagtga 91920 gctgtgttca caccagtgca ctagacacca tctcaaaaaa
aaaaaaaagt gttgggggga 91980 gagagagaaa gagagagaga gagagaagag
gaggggaggg gaggggatac ctgatcagac 92040 tcctctgaag agggaattga
aaagtttgtc acaagccctg agttatgctg atataacaga 92100 gaattgttag
atcagagaat ccaaagtaac ctactgcgct tagcccttca gtctttgtcc 92160
tagctatagg ccataaagtt gaatagtgcc gggaattgtt cttgacttaa gaatataatg
92220 gtcaaaaagg acaggcaaag ttgtttccct tctggaactt acactttaat
gggggagata 92280 gacaataagc aagtaaaagt aattgaacaa ggcaattgca
aataccaccc tcggtgagct 92340 cttgaaacac aaattatttc acctgcattc
cacagataca caggtgaatg tttgccttga 92400 taaatgcata aaagtgactg
aacttttgag gtccactggg cttttgtttg atatttactg 92460 ctagtgaatt
ttccagcctg caaatctctt agaacttcta aatacatttt tttttctttt 92520
aggttgcaga gaacacatct tagaagatga aaaacctgaa tctatcagtg acactactga
92580 cttggctcta ccacctgaaa tgccgatttt gattgatttc catgctctga
aagacatcct 92640 tgggcccccg atgtatgaaa tggaggtgat tcattctttt
tatttctttt tgctccagtc 92700 aatgaaagga acactttatt gaggccccag
ggccgtaggg cctgggcagg aggctgccct 92760 ttggggaagg aatagcctta
ttcgaccttc tttttgggac gcaggttgtt ggtgtggccg 92820 cacttcttgc
agcagttgac tgcatggggg cgcaggcgag cacagctctt gtggcacatc 92880
atcttcttgc agttgtattt ctgggcaagg tggcagaggg aaggctccgt aatgccacct
92940 cacaggcaca gcatcaggcg cagggtggac tctttctgga tgttgtagtc
taagagtgtg 93000 tggccatcct tcagctgttt gccctcaaat atcagacact
gctggtcagg taagatgccc 93060 tacctgtctt gaattttggc tttgacattc
tcagtggcat cactgggctc gacctcaagg 93120 gtgatggtct ggcctgtgag
ggtcttcaca aagatccaca tctcagcgtc tgcagcttgg 93180 ccagtctcac
tccattctca tttttttgtt ggtactcact ggtgtactca ggtggttgct 93240
taacagagaa gtaaaattgg atgtttccag aggctgaatt ttgccttaag atggaaactt
93300 tatttctata tggtattgtg ttttagtgct tattgtgata atatgacttg
ccaggagcca 93360 gagatcccag ccatatcctc ttttagaacc ccagtctcat
tttattctct accattcagt 93420 tccattttaa ggacaatgcc tctgactctt
cttcttagaa aaattacata ttcttatgtg 93480 tactttaagg agggatttct
ttgtgctatc aagggcttgg gggaagaggc ggggaatcaa 93540 cctgatacag
gtctgaaaac atgagcatag cttagcttca gactgtgcta gtgcagaccc 93600
agatgacatc tttcaggaac ctattgttcc attgttaata gttcctttag ggttaaaccc
93660 acatgcaggt ctagccctat tttcatcttt ctctcctaac tgtacctcac
agcagaaggc 93720 ctgggtgcca agaccgagtt gaagcagctg atggaaatag
atgttagact ataactgcta 93780 agggcattgt gaaataattt ataggtgctt
agatgagctt tcataggttg gttactataa 93840 aaatgtttgt attatactac
tgaatttagc tttatcatca cctccttatc agtttaagga 93900 aaaaatattt
tcagaaaata aatctgataa actatgtaga agataatctc tccatctaac 93960
atttgaaatc attaccagta gatatggttt tcctcaagtt cttacaactg agcagatgag
94020 aaatagcccc caagcctgtc ttgtttatcc atttaaactc taaactggtc
attaaagcta 94080 atgagcctct ctacagagct ctcagttaca agaatagaac
ttgtttactc ttgacagtaa 94140 atctggactt gaacaataga atcagaagca
ttgttttgat tatttgaatt cttaagatat 94200 catggatttg aattttgaag
tgttgaaaga acttgagcaa aacattgttg attgagaaag 94260 tgaacaaaac
ctgctttctc gttctgggag gatccagtga cattgtgagt gaagacgcaa 94320
acaggttttg actcctgcat ggccgatgac ctttttctgt aggcttacca gaaaagtaca
94380 ttccaacagt tctttgagga tttaaactag agcagcaaat aaagacaaaa
gattaatgca 94440 tgtctctgtt gcatataccc ctctctccca gccatttctg
ctgatgttaa gtttggaagc 94500 attgctgaca ttcctggagc attagcaaag
aaagagccaa gagaacagaa atgagaaatt 94560 ttataaacac tgcttaccag
ttatccttgt tagcatggga gaaccttatt ttccttgtag 94620 catgtgagct
ttaacatagt aacactttta ccaacatgag tctgcagaaa gactccagta 94680
gccattttgt cttttataga tagcatctta gaatggaaga tgtggtgtgt cacatgcgtg
94740 cgtgcggaga gaccaccaaa caggctttgt gtgagcaaca aggctgttat
ttcacctggg 94800 tacaggtgag ctgagtccga aaagagagtc agcaaaggga
gataggggtg gggccgtttc 94860 ataggatttg ggtgggtagt ggaaaattac
agtcaaaggg ggttgttctc ttgctggcag 94920 gggcgggggt cacaaggtgc
tcagttgggg agcttctgag ccaggagaag gaatttcact 94980 aggttaatcg
ctcagttaag gtgggacaga aacaaatcac aatggtggaa tgtcatcagt 95040
taaggcagga accaaccatt ttcacttctt ttgtgattct tcacttgctt caggccatct
95100 ggatgtatac atgcaggtca caggggatat gatggcttag cttgggctca
gaggcctgac 95160 atcgtgtttt gagtgttggg aacattgtgt tcattttttt
catacttgaa agtgagaact 95220 caccctgtag ccgggtgtct ctacctgtag
tggtctgatg accaccagcc ccaaattact 95280 taaccacaca gtctacctct
gcttttgcat ctataaaatt aagatttatg gaacatttct 95340 ttcttgtccg
tgagggctgt cactgtgcta ggagtgtaat tccattttac atacaaggga 95400
aaaagtttga agagattaaa tgaattgtac aaattcacgt aagtggcagt tggtagagtt
95460 aggattcaga ctcagatcag cttattccaa gtccattatt ctttctacct
ttctacagta 95520 ccctgtcagg ccaaaataat tcctgccctt gtctgctaga
agagagtggc agtgatgtat 95580 gagagttttt taaaaaggca tctgctctac
atcagattct cattcatatt cttaccaact 95640 ctgttgctct gttttggaat
gggagaggct gggctcaact tgttgaccac tcccattttt 95700 gtatctcttg
gctatcaggc actgtgtaag gccctccaca gtgatcattt aatcctcagt 95760
catggttgtc tttccaataa cagttgagga aacaggctta gagtatttaa ataacttgag
95820 agaagacaca acttatgcca gaaatgagat ttggttctag acctgaccaa
ctccaaacct 95880 agtgctgttt attactctag aaaaacatca caggcaacct
gagcagggcc tctgttcatt 95940 gcagagagct cacaggtgga cctgagcagg
gcgtctgttc tttgcacctc acaagtggcc 96000 agtcttattt ctctacttct
ttgtgctttc ctaggcaaag aatctgaaga gagaggttat 96060 actaggaata
ctggaataca tgttgaggtg ttcccaagat gttataagat acctttcatt 96120
tgtttgtttt tactttttga gatgaggtct cactctgtca cctaggctgg attgcagtgg
96180 catgatcata gctcactgca acctccacct cctgggctcc cacttcagcc
tcctgagtag 96240 ctgggaccac aggcgtgtgc taccataccc agctaatttt
ctctgtattt ttttgtagag 96300 atggggtttc accatgttgt cccagactgg
tctcaaactt cctgagctca agccatccac 96360 ctgcctcagc cttcccaaag
tgctggaatt ataggcatga gccaccaaac ccagccgata 96420 cctttttttt
gtctaaatgc ctgtattctc ccttagggta aattacagtc tagggtctgt 96480
ggtttcttct agaaagagtt tgattcattt aataaatacc tattaaggac ctaacatgtg
96540 cttctggcaa cacagtagta aacaagcaag gtatgatgtc tgccttcatg
gatcccactt 96600 taatgcagga aaacaataga caagtaaaca aataatcaca
aattgaagtt gatgctatag 96660 agaaaacaaa cagggtggta ctgagataga
cagtaactac tctagctata tctgaggtct 96720 gttttagagg tagaagtaga
catgctgatg ggaaacattt ggggaatgaa ggaaacagtt 96780 atcaaaaggg
acttacaggt ttctggccag agtgacaggg catgtgtagt agtgctgttt 96840
actgagatgg ggaagacttg gggagggaga tgaggagaga gtgttgcaaa gaaaactgag
96900 agctcttttg aacacattac agttgaaata tccaggctgg gcgcggtggc
tcatgcctgt 96960 aatcccagca ctttgggagg ctgaggcagg tggattgctt
gagtctggga gttcaagacc 97020 agcctgggcg acacggcaaa atcccttctc
tacaaaaaat acaaaaatta gctgggtgtg 97080 gtggcttatg cctgtagtca
caactacttg ggaggctgag gtgggaggat cacttgagcc 97140 tgggagacgg
aggttgcaat gagccaagat cacgccactg cattccagcc tgggtgacag 97200
aacaagaccc tgtctcaaaa aaataaaata aaagttagaa atatctgtga ggcatagaag
97260 tagagacatt tggacattca gatctattgc tcagaggaaa tacccaagat
ggagatttta 97320 gaattattag aaaatagagg atatttagag ccccagatat
tgaggctttc acatcaccta 97380 agaaaaaagg atacattttt aaaaagcagg
tagtctagaa gcaagccctg aagaacagca 97440 ttatttaggg
atcatataga gagaagagga gccaacaaag aagtcgggaa aaacagaaag 97500
ggactgggaa ggaacaagcc ttcagggaag aggaaaacca ggatgttgtg ctgccataga
97560 gacagaagag gagagtattt caagaaagag gggacatcaa aatgtgttta
ctgtttgaga 97620 gatcaaaaga agatcaaggt cagaacaaat gtgtattgga
tttgatggca tgaaggttgt 97680 tggtgacctt gaaagagatt tcacaaggaa
ggagtggtgg ggatggtaga aattggagta 97740 tgttgaagag agaatgggag
gcgaggaagt agaattagtg tgtaggcagc tctttagaag 97800 tttggctgta
aacaattgca gagaaatgag gcagctagaa gagaatatgg atgtcaaagg 97860
gagaatgttt tcaaaatagt agctgctgct gagagtaatc cagtagagag cacagactga
97920 tgttgcagga cagagcagtg gtacgataga aacaaagtct ccaggaaagt
gagagggggt 97980 gggacccaaa gcaccagtga ggaaatggct tttgttggga
gaagggatac cttttgcagg 98040 atattatgta gaaagggaca agaatattga
gttatttata aggaaaagat tataatgatg 98100 gggctaacgt gtgtgagctg
cacaagagag gagtgaagtt agggcagagc tgctgtatga 98160 tgggaatgtg
ctggagttca tggcttgagt acaggcgagc tagaaggata agaaatgatg 98220
gtcaggggtt tcagaggtag catggtttct gttggtgata agtacctgga agagggtggc
98280 tgagttcagg aggcatttaa agaactgaga agccaggttc tgggagagca
tcatgccttc 98340 actgaagaca cccagggtga tagcaggggc tggggcagaa
aggaaggagc agagtttaga 98400 atcttcctga atgtcagaga cagtgaagag
agagtcagga tggtaaagcc agctgccata 98460 agcaggggct cagaagggta
gaagaataag gcctgaaagt tgcaaggcag cctcttactg 98520 actaaatttt
aaacttagtc tctttgagct tgatgtcttc ctctgataaa tggtggtaag 98580
catgtgcacg ttatcacaga gttcaaattt ggtgagtcag tgtacccact gcattgccca
98640 gtaatactaa aaaagaaaaa acaaatacta atttctgcaa ctaccatact
ccctaaaaac 98700 agagacctac ccccaatcac caaaaaatcc ccattgtttt
tctaatccaa attttgtaca 98760 tatttaataa ccttatacca ccacttacta
tttttttact ttcatcgaag atgaatctac 98820 aaaaatatat taatgtcaaa
aaatattact gacctagcaa actggcagtt gggaagtaag 98880 gtaagaaggc
acacttttat taattaataa tatcttttgt attccctaaa cagattgaaa 98940
aatgatggat tagttcattc ttgcattcct ataaagaaat acctgaaacc aggcacagtg
99000 gctcacgcct gtaaatccca gcgctttggg aggccaaggt gggcggatcg
cttgagttcg 99060 agaccaacct gggcagcaaa gtgagacctg gtctctacaa
aaaatacaaa atattacccg 99120 gaaggctgag gtgggatcca cctgagccca
gaaggttgag gctgcagtga gctgtgatca 99180 caccattgca ctctagccta
agtgacagag tgaaaactct gtctcaaaaa aaacaaagaa 99240 ccacctgaga
ctgggtaatt tataaagaaa agaggtttaa ttggctcacg gttctgaagg 99300
ttctaaagga agcatagctc cagcattagg ccaggtgcat tggctcacac ctgtaatccc
99360 agcactttgg gaggccaagg gcaggcggat catgaggtca ggatttcgag
accagcctgg 99420 ccaatatggt gaaaccctgt ctctactaaa aatacaaaat
tagctgggcg tggtggcgca 99480 cacctgtagt ctcagctact cgagaggccg
aggcagaaga atcacttgaa cccaggaggc 99540 ggaggttgca atgagctgag
atcgtgccac tgcactccag cttgggacac agagtgagac 99600 tccatctcaa
aaataaataa ataaataaat aaataaatag ctccagcatc agcttctggg 99660
gaggcctcag gaaacttaca gccttggcag aaagtgaagg gggagccggc atgtcatgtg
99720 gccagagcag gagcaagagt gcaggagggg aggtggccac atgcttttaa
acaacccacc 99780 tcccacaaga actcactcac tattgcgagg acgacagtac
caaggggatg gggctaaacc 99840 attcatgaga aatttccctc cgtgatccag
tcacctccca ccaggcccca cctccagcac 99900 tgaggattat agttcaacat
gagatttggt ggagacacag atccaaacca tatcaaatgg 99960 gttctaggaa
cttagcctag atttcagatt taggaacagt atcataggtc accttttcaa 100020
aatacataaa gtttcctaca gaaacaatat caattaagtg catgttttaa aaataaaaat
100080 aaaggttact acaaaaaaag tggggaggag caggagtggg tgcaggtgtc
cccaggaagc 100140 ctaggcatag ctcacactgc atgtgctatc acggcgagac
tcagaactgc cccgaatccg 100200 aggaggggcc atgcgagtag gtgggcctag
gcacctcctc agtcactggc tgtgcccttt 100260 cactctgtca ctgggagaca
gaatcctgag ttttctgctt cagggagcct gcatggaaag 100320 agtaggtcac
tgccggaaat caggctagtt ttagcaaaag gaacggacat taggcacctc 100380
caaagggaca aaggaccaat atacctggtt ggggacagga ttctgtcatt tgattattcc
100440 tgactcatgt tttcatgagg tagtccccca cctcatataa aagcctcagt
gttggcttct 100500 gaccatggtg tatgaaaagc ccttgtctaa aggttactgc
cctgagaaaa taataaagga 100560 agaagaggat agacatgaag acactttaaa
gcctcctgaa tagaatgcat ccagaagcga 100620 attccaggag attctgtcat
catgcttgcc tttcaagcaa acaaaattag ctgctagaac 100680 tgagaaagag
tgtaaacacc aactaaatgc ctcaaagaat catggtagta aattacttct 100740
ccatgttgct ccatataaac ctgctgtgcc acctgttgaa ggcagcactg atgctgcatg
100800 ttcagtctgg tccaaggccc caacaggaat ccgttgtgcc aagaaaaggc
cctactggaa 100860 ggattggaga gcagctggtt ctcagcaatg caagcatcag
gccaggctgg ggctgcttaa 100920 tgctgcttaa gagatgacag tggtggaccc
caacacctct ccaagggatg tagaatctgc 100980 ttttcccatt tctgaatgct
actgaaacaa atctacaact agaaaaatca aatattcatg 101040 aattcaagac
ttgggatctc agtactaaga ctttaaagaa gttgccagat ggatcgcttc 101100
tgtggtgaca gccctggcag gagcattcaa gtgctctatg agctacaaaa gaaaccagtt
101160 gatggtgtga acaccactac agagcaacct gcacaccaca gcaatttgac
agctcaggtt 101220 ctgtgtctca tgtggcaccg tgcttgtcct tggaaagaag
gcctacaaaa ttcttcatat 101280 ctccattcct tgacatctgc tggcaaactc
ccactcatat tttaagactc agcctctcct 101340 gtgacacctg tgtcttctct
ccaaacaggg agggacgctt gcctcttcag agctccccac 101400 actggagtat
aactgctcct gtgtctgatg cccttagtct cagtgccagg aggtattcat 101460
gcttatgtcc ccatggcctg taacagagcc tgcatcagga tgcttggtaa aggactgttg
101520 aatgaatgtc aaatatgggt ccctctgatg ggtctatacg tgttgatcta
ggattggaag 101580 ggtcacaaag agttgtgcat gcttacaatt tcaatcaaat
atcactattt ttagttaaga 101640 gggaagagta gtgtgaaatt ggcaataatt
agatactcca aatgttcttt aaaaactaat 101700 agcattgatg tattaagaat
gcaatcagcc gggcacagca gctcacacct gtaatcccag 101760 cactttggga
ggctgaggca ggtggatcat gaggtcagga gttcgagacc agcctggcca 101820
agattgtgaa acccccgtct ctactaaaaa tacaaaaatt agccgggcat ggtgacgcac
101880 acctgtagtc ccagctactt gggaggctga ggcaggagaa ttgcttgaac
ccaggaggtg 101940 gaggttgcag tgagcccaga tcgtgccatt gcactccagc
ctgggtgacg agcgaaactc 102000 agtctaaaaa aaaaaagaat gcaatcatac
attagaagac acattctgtt ttagattttt 102060 acttaaatat tttaaatact
tccttaatct gcatatttac cttattgata gatttcagaa 102120 gaaattgatc
atttcatgga acaagattta ttagacacat aaggaaagtg aatcataaca 102180
actgtacagg tgggaaattg aacaacaaaa atgaccctga gatacccaca ttctactttg
102240 gcatatagtg ggaaaaacat tctagacttc aagtctaggc ctatcttggc
taatgtaacc 102300 gatgacttca caaaccattt atgggactag aagctgaaag
gaaagtactg gtggataaac 102360 atcatattga aattatgttg agtcacttat
ttgctataaa acacaaattg ttttgtgtaa 102420 aggggttaag atggctggaa
aactgtctcc actcaagagc aagaaagcag catgtgtctt 102480 accctgtacc
ttcattttta cttgtacttc ataatttctg agggagaaat acgtggaaac 102540
cagatgcttg atatagtttc agaacacgtc cttaaagaat atgactccaa gtctaagaat
102600 tgtaggtcct ttgcttctta gataactact gttagccttg atcacagaga
ttccaggttt 102660 aataacttca gttctcccca ctgtgtatat agatgttaag
ttacacagat ttggcattat 102720 tcccattttc aggttaatat cagaacactt
gttatcaagt caggatagta attgtgagcc 102780 tagatgctct aggtttggcc
atacgtggtt atctacacca ccaactgttc caattaacaa 102840 tttaccagtt
gcttctaccc aaagtaccaa gactccagca aatggggaat attggaaact 102900
ggcttggctt cttgaagcaa catggtaatc aataagaatc ttggctgggc atggtggctc
102960 atgcctgcag tcccagcact ttaggaggcc aagatggaaa gatgggaaga
tcgctcaagc 103020 ccaggagttc aagaccagcc tgggcgacat cgtgaaaccc
catctctaca aaaaaataca 103080 aaaattagct gggtatggtc gtgggtgcct
gtagtcccag ctgctgggga gctgaggtgg 103140 gagatcacct gagcccagga
ggcagttgca gtgagccaag attgcaccac tgcactccag 103200 cctgggtgac
agagtgagac tctgtctcaa aacaaacaaa acaacaatct ggctgggcgc 103260
ggtcgctaat gtctgtaatc ccaacacttt gggaggctga ggaggcagat cacttgaggt
103320 caggaattcg agaccagcct ggccaacatg gtgaaacccg tctctattaa
aaatacaaaa 103380 attagccggg catggtggca cacacctgta atcccagcta
cttgggaggc tgaggcaaga 103440 gaattgcttg aaccaggagg cagaggttgc
agtgagctga gatcatgcct ctgcactcca 103500 gcctgagcta cagagcgaga
ctctgtctca aaaaaacaaa aaacaaaaac aagaagaatc 103560 ttactactgc
ttcttcgggg atacttttgg tattattttg acaaatgaat tgtgaggatt 103620
caaatataag aaagggatta ttcttggtag agttaacaaa attgtaccaa atgacttttt
103680 gtgttaaaca cgattcattc acccaaccct agaaaggagc ctgaatgaag
tctaatttgg 103740 gtgacagatt cccacacaaa ttagatgtat gtcattcagg
tatagagaat tgattttata 103800 ttagaaaaaa caaaccttgt aaacagtttt
ataaataact gtttcatgat tttccttaag 103860 tagtactgat ctcttacata
tagatcgttt gtgtctttcg cctcaagtta gtatagaaca 103920 gggcaagtgg
caaagctcga ggaaagtgtg acctgaggta catgctgtca gcttgatgct 103980
ggagtttggc ctctcaaatc tctaacctgt taaatgaagt taattaggat taattttttt
104040 taatgtatgt ttactactga aaataagtgc tcggccagac gcagaggctc
acgcctgtaa 104100 tcccagcact ttgggaggcc gaggctggca gatcacctga
agtcagggag tttgagacca 104160 gcctggccaa catggcgaaa cactgtctct
attaaaaata caaaaattag ctgggtgtgg 104220 tgatacatgc ctgtaatccc
agctactcgg agcctgaggc aggagaactg cttgaaccca 104280 ggaggcggag
gttgcattga gccaagattg tgccattgca ctccagccca ggcgacagag 104340
tgagactcat gtctcaaaaa aaaaaaaaaa aaaaagagga aaagaagtgc ccaatagctt
104400 caatggatgc cacataattt tggaataatt tttacaatca ggaatttcat
tgtccaagcc 104460 ccttagaaaa agaagcaacc cagccccata cccagaaagt
caagctgtat agtgctgttc 104520 cttagtgagg acggtcaact ctcagtagaa
aaatctcctg tttggattag tgcttagttg 104580 acctattgtg ttcagttcct
ctaacatgag taacttctat tggataggaa attttgaagc 104640 tcaaagggtg
taatgagagt taacattact gattttccac tgttactttt tagtgttttc 104700
ataacttgga tgtgttaacc tatggcccat caactatgct cctagtctca ggtgacaaca
104760 tgttcaattt aagatggcag gcagtacagt ggacctctct catcccatgg
gaaggaaccc 104820 aggatgttta ttatgtagta ttgtatagtc tctgcagcag
taatagagaa agttaaaggt 104880 aagcggtgga gaagtaaaat ctagagtttc
taatataacc cttctcactt ttcttttcaa 104940 aaaaaataag agggtctcac
catgttgccc acactggtct ctatcgaact cctgggctca 105000 agcgatcctg
tcgtctcagc ctcccaaagt gctaggatta caggcatgag ccactctgca 105060
tggccaagct cactcttctt aaaggtctgc tagtaagagg gtttctactt tttgaaacaa
105120 attcatgatt acctaaaatg aagctaggtt atgaagtata tataaatatg
cagcccaata 105180 ggctgggtgt ggtggctcac acctgtaatc ccagcacttt
gggaggctga ggcaggcaga 105240 tcacttgagg tcaggagttt gagaccagtc
tggccaacat ggtgagacca catctctaca 105300 aaaaatacaa aaattagcgg
gtgtggtggc ctgtgtgcgc ccatagtacc agccacttgg 105360 gaggcagagg
caggagaatc acttgaagcc aggaggcaga gttttcagtg agctgaaatt 105420
gtgtcactgt acttcaagcc tgggcaatgg agtgagactg tctcaaaata tatatatatt
105480 tgcagcccaa taaagatact tagataaaac tattgggttt attccttgaa
aactagggca 105540 tgtgtagcta gatctggctc ataaaaagca aagttattta
catatatttt aaggtaaaat 105600 tgcctctgat aaatgtcaaa gaggaagttt
aggtctttct tctggcagaa agccagagag 105660 taagtgctga atgtgacgca
gaatcatgtt aggtaacaag gactttgagg taagtggctg 105720 aagtcttctg
tggagtcagc cgactcttgc aggattgtgt ggtatcagtc acctttagca 105780
tttgccaacc caactctgat cattcttctt ctttcaaggt atctcagcgt ttgagtcagc
105840 caggagtagc aataggtttg gcttggactc ccttaggtgg agaaatcatg
ttcgtggagg 105900 cgagtcgaat ggatggcgag ggccagttaa ctctgaccgg
ccagctcggg gacgtgatga 105960 aggagtccgc ccacctcgct atcagctggc
tccgcagcaa cgcaaagaag taccagctga 106020 ccaatggtag gagcctgcac
ccggccaggc aggcgtgacc caggaggcgg taccttccat 106080 ggcggagact
ggcatgagct cgagactgcc agttacacat ctagcaaagt acacaccgtt 106140
ttgaacccct gtggaaatcc tagttcccat ttcaggacta tttgactagt gcctgaacta
106200 gaaactaatt caaaaggttt attttgtttt aatacgactt agagtagaat
ggaactgttc 106260 ttccacaccc tcacccaaat tgtactgtcc accaatattt
tgaagaattc atttacccaa 106320 aacattcatt tttgtttgtg actttttttt
taggagaaaa agaaaacagg tttaattttt 106380 ctacattaaa gtcccttttt
cctttttaaa gcttttggaa gttttgatct tcttgacaac 106440 acagacatcc
atctgcactt cccagctgga gctgtcacaa aagatggacc atctgctgga 106500
gttaccatag taacctgtct cgcctcactt tttagtgggc ggctggtacg ttcagatgta
106560 gccatgactg gagaaattac actgagaggt cttgttcttc cagtaagtat
gaaaaaacaa 106620 tttatatggt tattttttat ttaatttttg aaaattaata
ttatttttaa atacgggttt 106680 gccttctttc tatgaaaacc ttggttttaa
gtatatatta tatttttatg cctgtaacta 106740 attcatattt taaaattttg
atcaaataaa agaaaaactg acaatttttc acattttcct 106800 tttttttttt
tttttttttt tgaaatagac aggtctcact ctgttgccca ggctggagtg 106860
cagtggtgtg actgtagctc actatagcca ccaagtcctg ggctcaagcg atcctcctgt
106920 ctgtctcccg aatagctggg actataggag cacgccacca tgctcagcta
atttatttta 106980 ttttgcgtag agacagggtc tctctgtgtt gtccaggctt
gtctcaaact ccaggtctca 107040 tgcagtcctc tcatctccac ctcccaaagt
gctgggatta caggcgtgag ccaccacatt 107100 cagcccacgt ttcccattct
aagatttgct aagggaaaaa aatattagtg tggtcatcag 107160 aaatattggc
agttacatga aaatttgagg ccttgttcta cttgacaaat tgttaaagat 107220
atagcacatg tgcaaaatgg gatagtagtt gtttttaagc tttaagccca tttcttaaat
107280 ttgaagtttc tttgagacct cctgtccccc tgcagaaaac tttgctagta
tagaatggaa 107340 actctaataa agattaacca tatctaatga ctacattttg
aaaaggttct atacatgtgg 107400 ggtcttgagg ctccagatcc taaactgctt
ataaaaatag tgtgataaaa tgtacagaac 107460 ttgagagtat ttaaagttgt
tagttgagta ttagtctaca acagactaga ctacaatttt 107520 agtccacaac
aagattttgg caggttcata gcaagatgag gaaaaaaaaa aagaaatagt 107580
ctttttttct tttttctatc gagatggagt ccggctctct tacccaggtt ggagtacagt
107640 ggcacaatct tggctcactg caacctctgc ctcccaagtt gaagtgattc
tcctgcctca 107700 gtctctcaac tagctgggat tacaagcatg cgccaccacg
cccggataat tttttctatt 107760 tttagaacct ccatagaaca aatgggtttt
ctacttggtc ccctctcaga gcaaatcgta 107820 gcccaagtaa aggcttctgc
agcctcagga gagacagcca cagcggcctg gggtacacct 107880 tcagctccag
accattacaa gaggcaggat ggaaagcagc agcacttgaa agaaaggcct 107940
gtgaaagctg gagaaaacct cctttgagaa cagaggacaa gacggggctt tgggatttga
108000 aagtggtcaa agaattattc aggaaaaaac tatagtgaaa aacaatttgt
tgttagaact 108060 ccaacatcta aaaggagttc taacaaacag gaaaatggaa
tggaacaaat tatccaagaa 108120 ataactgaac atttcctaga agttaaggca
tcttgagatc gaaaggacca ttactaacca 108180 ggaaaaacat ttcatcccct
tgacttttca gattactgag gataaagcgg cctcagcact 108240 gacactggat
gtgcagtacc ttcaaaacta tgagggaaaa tgggccaggc gtggcagctg 108300
acgtctgtaa tcccagcact ttgggaggct aaacaggagg atagctcaag tccaggagtt
108360 caagaccagc ctgggaaata tatctctaca aaaattgttt taaaaatagt
aaggaggctg 108420 ggtgtggtgg ctcacgcctg taactccaac actttgggag
gccaaggtgg gcgtatcact 108480 tgaggttagg agtttgagac cagcctggcc
aacatggtga aaccctgtct ctactaaaaa 108540 tacaaaaaaa ttatccggat
gtggtggcgc atgcctgtaa tcccagctac tcaggaggct 108600 gaggcaggag
aatcgcttga acctgggagg cagaaagttg cagtgagcca agattgtgcc 108660
actgcaactc tagcttgggt gacagagtaa gactgtctca aaaaaaaaaa aaatagtaat
108720 gaaagctgtg agggaaaatg ttttacatct agtcttgtat acatggcctt
agtatcaatc 108780 aagtgtgaaa gtaaaatatt ttcaaacatg caaggaatca
gttcatctta cactcttttg 108840 aagaaggtac tttgaaggag tacttcagca
gcatgaacaa aaccttgaaa gaagatgcca 108900 gtggggcggg aaggcctgga
gcagccagcc agtcttaatt ggagcagatg caacacatta 108960 ccccaaagca
agaatactcc atactcttca agttcctgtg ggccaggaat tcaggagagg 109020
ctgagctggg ttcttgtggc ccagggtctc tggccttaca gtctaggttc cagccaggct
109080 gcagtcacat gaaggctgac aggctggaga aactgcttcc atggtggttg
actcatgtga 109140 ctggcaaatt ggtcccatct agtggcagga ggccccagtt
cctcacctga tggacttgcc 109200 cataggctgc ttgagtgacc tcagacatta
tgactggcca cctccagggc aggtgatcaa 109260 gagagattca ggcagcagct
ctcgtttttt gtgactcagc cgtggagatc atacagcatc 109320 actcccacca
cactctgttt cttaccgagt cacaaagcct ggcccacatt caagcagggg 109380
gaccattgta gacatgtttg aaagccacca taggagccta gtttagggat acattttctt
109440 cattaaccag catggaggtt ctggctttaa acctgtagag agggaagtaa
ccccagcaca 109500 cagctaagct ctgcaggagc ggcgctcatg gtcagaatca
cgtgctgctt tttcagatca 109560 acctaaagac tagacggttg tgattacacc
tgaatgccaa tttactttga cagcatttat 109620 aaaaacaatc attgacagaa
gaggaactca tacctatcaa caatttagaa tccccctcat 109680 cagagtcttt
aatataacac caattgaaac attaaaaaaa ggttactact tatccttttt 109740
cctggctttc ctagctcatg ctataacaaa acggaagatg atttggatgt tttaaaatag
109800 tagtggttaa attcagtgaa agaaagctgg gtcagggttt ctttcagctt
gagggtgatc 109860 attaacccta aaaacttttt tctctcctta caggtgggtg
gaattaaaga caaagtgctg 109920 gcggcacaca gagcgggact gaagcaagtc
attattcctc ggagaaatga aaaagacctt 109980 gagggaatcc caggcaacgt
acgacaggat ttaagttttg tcacagcaag ctgcctggat 110040 gaggttctta
atgcagcttt tgatggtggc tttactgtca agaccagacc tggtctgtta 110100
aatagcaaac tgtaggtcca aatctcaatt ttttagaatt ttaagttatg aagtgctcaa
110160 aggtactgac acagttgatt ttattcacac cattaggggt atgcaagatg
tccctgtttt 110220 ataaacataa tcacaacagt aataaacctc aagtagtggc
tagtgtttag tatagaaata 110280 taagatgttg atttagtaaa ctgataaaaa
tcgaattctt gtctttttag tgggatcctt 110340 actgtccctg gaaagatata
gcatagtggt tctcagcaca gtctccagaa cagaagcatc 110400 tgtagtacct
ggtaacttgt tagaaatgta cattctcagg ctccacagca ggccgcctga 110460
atcaaatcct gggaggtggg gacagaaatc tgtgttttaa gaagccttcc aggtaattct
110520 gctgcacact caagttcagg aaccaccggt atagaccatt accttagtgg
atttacctgt 110580 agagtttatt ggatcctgaa accaatcaat tacttagaac
taggcaaaga tgaaagtata 110640 gccaactatt cttggctata tatatatatt
caagtgggcc gggcgtgatg gctcacacct 110700 gtaattccag cactttggga
ggtcgaggta ggcagatcac cgagcccaag agttcaagac 110760 aatcctggcc
aacggcgaaa ctctgtctct acaaaaaata tacaggcgtg ttagcatgtg 110820
cctgtaatcc cagcttcttg ggaagctgag gcacaagaat tgcctgaacc caggaggtgg
110880 aggttgcagt gagctgggat cgcgccattg cactccagcc tggctgacag
agcgagactg 110940 tctctaaaaa aaaaagactc aagtggaccc tacaatgaag
cctacacatc ccaatagaag 111000 ccccttctta tgctgaggga agcagccctc
agaacatgat agcttgtatc cagcagagtg 111060 gcacgtgctg gcacacctca
cagaagcacc ctggccctgg atgcctgcaa cctcagaaga 111120 gtgcagctcc
cagagggagg cagccatcca tctgggatgg tcctaagcat ggaatcctaa 111180
ctcctgattc cgtctcctat ttcttgcttg gctacgccag ttcccaaatc tggtagatgt
111240 ccatgcccat gtgctcctgc tgggactcaa ttcaggctat gtatgactat
gaagtcaggc 111300 tcatctgctt actggctgtg tgaacttttt gtatcttggt
tttcttcatc catgaaatcc 111360 aagtaatact acctaattgt tactgtggag
attaagttca aatgcaatgt atagtaatat 111420 taagcaattt ctagttatta
ttctagccag taatggactt cagaatcttt tattacacaa 111480 tataagaata
tgtatgtaaa gacattttgg aatttcctgg atgagaagga agtctgggct 111540
gggcatggtg gctcacgcct gtaaccctag cactttagga aatcgaggcg agtggatcac
111600 ttaagctcag gagttcaagg ccagcctggg caacatggca aaaccccatt
tctacaaaaa 111660 atacaaaaat tagctgggca tggtggcacc cgcctgtagt
ccagctactt gaggctgaga 111720 tgggaggatg agggaggtcg gggctgcagt
gagccaagat cacgccactg cactccagca 111780 ccctgggcga cagagtgaga
ccctgtctca aaaaaaaaaa aaaaaaaaag attgggccaa 111840 aatactgtga
taaaatagca ggcctgctga taaaagttta tctgaatgca ttgagaggaa 111900
aagtccagac ctaggactag ttatggcagt tggagagaaa gaacatcggg atgtttgaaa
111960 atatgccatt gactatctta actactgtaa ttttatcatt tccaacgtca
tctaactggg 112020 gactagaaca aactgtgaat tcactttcag caaccagagg
gcgctaatcc acacccacat 112080 cgctctgccc tgttccaccc agcaggggca
acaaggatat aacttggggt tc 112132
4 884 PRT Human 4
Ser Asp Ser Pro
Val Glu Leu Pro Ser Arg Leu Ala Val Leu Pro Phe 1 5 10 15 Arg Asn
Lys Val Leu Leu Pro Gly Ala Ile Val Arg Ile Arg Cys Thr 20 25 30
Asn Pro Ser Ser Val Lys Leu Val Glu Gln Glu Leu Trp Gln Lys Glu 35
40 45 Glu Lys Gly Leu Ile Gly Val Leu Pro Val Arg Asp Ser Glu Ala
Thr 50
55 60 Ala Val Gly Ser Leu Leu Ser Pro Gly Val Gly Ser Asp Ser Gly
Glu 65 70 75 80 Gly Gly Ser Lys Val Gly Gly Ser Ala Val Glu Ser Ser
Lys Gln Asp 85 90 95 Thr Lys Asn Gly Lys Glu Pro Ile His Trp His
Ser Lys Gly Val Ala 100 105 110 Ala Arg Ala Leu His Leu Ser Arg Gly
Val Glu Lys Pro Ser Gly Arg 115 120 125 Val Thr Tyr Ile Val Val Leu
Glu Gly Leu Cys Arg Phe Ser Val Gln 130 135 140 Glu Leu Ser Ala Arg
Gly Pro Tyr His Val Ala Arg Val Ser Arg Leu 145 150 155 160 Asp Met
Thr Lys Thr Glu Leu Glu Gln Ala Glu Gln Asp Pro Asp Leu 165 170 175
Ile Ala Leu Ser Arg Gln Phe Lys Ala Thr Ala Met Glu Leu Ile Ser 180
185 190 Val Leu Glu Gln Lys Gln Lys Thr Val Gly Arg Thr Lys Val Leu
Leu 195 200 205 Asp Thr Val Pro Val Tyr Arg Leu Ala Asp Ile Phe Val
Ala Ser Phe 210 215 220 Glu Ile Ser Phe Glu Glu Gln Leu Ser Met Leu
Asp Ser Val His Leu 225 230 235 240 Lys Val Arg Leu Ser Lys Ala Thr
Glu Leu Val Asp Arg His Leu Gln 245 250 255 Ser Ile Leu Val Ala Glu
Lys Ile Thr Gln Lys Val Glu Gly Gln Leu 260 265 270 Ser Lys Ser Gln
Lys Glu Phe Leu Leu Arg Gln Gln Met Arg Ala Ile 275 280 285 Lys Glu
Glu Leu Gly Asp Asn Asp Asp Asp Glu Asp Asp Val Ala Ala 290 295 300
Leu Glu Arg Lys Met Gln Asn Ala Gly Met Pro Ala Asn Ile Trp Lys 305
310 315 320 His Ala Gln Arg Glu Met Arg Arg Leu Arg Lys Met Gln Pro
Gln Gln 325 330 335 Pro Gly Tyr Ser Ser Ser Arg Ala Tyr Leu Glu Leu
Leu Ala Asp Leu 340 345 350 Pro Trp Gln Lys Val Ser Glu Glu Arg Glu
Leu Asp Leu Arg Val Ala 355 360 365 Lys Glu Ser Leu Asp Gln Asp His
Tyr Gly Leu Thr Lys Val Lys Gln 370 375 380 Arg Ile Ile Glu Tyr Leu
Ala Val Arg Lys Leu Lys Pro Asp Ala Arg 385 390 395 400 Gly Pro Val
Leu Cys Phe Val Gly Pro Pro Gly Val Gly Lys Thr Ser 405 410 415 Leu
Ala Ser Ser Ile Ala Lys Ala Leu Asn Arg Lys Phe Ile Arg Ile 420 425
430 Ser Leu Gly Gly Val Lys Asp Glu Ala Asp Ile Arg Gly His Arg Arg
435 440 445 Thr Tyr Ile Gly Ser Met Pro Gly Arg Leu Ile Asp Gly Leu
Lys Arg 450 455 460 Val Ser Val Ser Asn Pro Val Met Leu Leu Asp Glu
Ile Asp Lys Thr 465 470 475 480 Gly Ser Asp Val Arg Gly Asp Pro Ala
Ser Ala Leu Leu Glu Val Leu 485 490 495 Asp Pro Glu Gln Asn Lys Ala
Phe Asn Asp His Tyr Leu Asn Val Pro 500 505 510 Phe Asp Leu Ser Lys
Val Ile Phe Val Ala Thr Ala Asn Arg Met Gln 515 520 525 Pro Ile Pro
Pro Pro Leu Leu Asp Arg Met Glu Ile Ile Glu Leu Pro 530 535 540 Gly
Tyr Thr Pro Glu Glu Lys Leu Lys Ile Ala Met Lys His Leu Ile 545 550
555 560 Pro Arg Val Leu Glu Gln His Gly Leu Ser Thr Thr Asn Leu Gln
Ile 565 570 575 Pro Glu Ala Met Val Lys Leu Val Ile Glu Arg Tyr Thr
Arg Glu Ala 580 585 590 Gly Val Arg Asn Leu Glu Arg Asn Leu Ala Ala
Leu Ala Arg Ala Ala 595 600 605 Ala Val Lys Val Ala Glu Gln Val Lys
Thr Leu Arg Leu Gly Lys Glu 610 615 620 Ile Gln Pro Ile Thr Thr Thr
Leu Leu Asp Ser Arg Leu Ala Asp Gly 625 630 635 640 Gly Glu Val Glu
Met Glu Val Ile Pro Met Glu His Asp Ile Ser Asn 645 650 655 Thr Tyr
Glu Asn Pro Ser Pro Met Ile Val Asp Glu Ala Met Leu Glu 660 665 670
Lys Val Leu Gly Pro Pro Arg Phe Asp Asp Arg Glu Ala Ala Asp Arg 675
680 685 Val Ala Ser Pro Gly Val Ser Val Gly Leu Val Trp Thr Ser Val
Gly 690 695 700 Gly Glu Val Gln Phe Val Glu Ala Thr Ala Met Val Gly
Lys Gly Asp 705 710 715 720 Leu His Leu Thr Gly Gln Leu Gly Asp Val
Ile Lys Glu Ser Ala Gln 725 730 735 Leu Ala Leu Thr Trp Val Arg Ala
Arg Ala Ala Asp Leu Asn Leu Ser 740 745 750 Pro Thr Ser Asp Ile Asn
Leu Leu Glu Ser Arg Asp Ile His Ile His 755 760 765 Phe Pro Ala Gly
Ala Val Pro Lys Asp Gly Pro Ser Ala Gly Val Thr 770 775 780 Leu Val
Thr Ala Leu Val Ser Leu Phe Ser Asn Arg Lys Val Arg Ala 785 790 795
800 Asp Thr Ala Met Thr Gly Glu Met Thr Leu Arg Gly Leu Val Leu Pro
805 810 815 Val Gly Gly Val Lys Asp Lys Val Leu Ala Ala His Arg Tyr
Gly Ile 820 825 830 Lys Arg Val Ile Leu Pro Glu Arg Asn Leu Lys Asp
Leu Ser Glu Val 835 840 845 Pro Leu Pro Ile Leu Ser Asp Met Glu Ile
Leu Leu Val Lys Arg Ile 850 855 860 Glu Glu Val Leu Asp His Ala Phe
Glu Gly Arg Cys Pro Leu Arg Ser 865 870 875 880 Arg Ser Lys Leu
* * * * *
References