Direct Oligonucleotide Synthesis On Cells And Biomolecules Godron; Xavier ; et al. [DNA Script SAS]

Patent Application Summary

U.S. patent application number 17/687128 was filed with the patent office on 2022-06-16 for direct oligonucleotide synthesis on cells and biomolecules. This patent application is currently assigned to DNA Script SAS. The applicant listed for this patent is DNA Script SAS. Invention is credited to Sylvain Gariel, Xavier Godron, Adrian Horgan, Jeffrey Jeddeloh, Robert Nicol, Thomas Ybert.

Application Number	20220186213 17/687128
Filed Date	2022-06-16

United States Patent Application	20220186213
Kind Code	A1
Godron; Xavier ; et al.	June 16, 2022

DIRECT OLIGONUCLEOTIDE SYNTHESIS ON CELLS AND BIOMOLECULES

Abstract

The invention is directed to methods for synthesizing oligonucleotides directly on biomolecules or on living or fixed cells. In some embodiments, template-free enzymatic synthesis is implemented under biological conditions with successive cycles of (i) enzymatic addition of a 3'-O-blocked nucleoside triphosphate and (ii) enzymatic deblocking of the incorporated nucleotide to regenerate a free 3' hydroxyl. The invention has applications in single-cell cDNA library construction and analysis.

Inventors:

Godron; Xavier; (Paris, FR) ; Horgan; Adrian; (Paris, FR) ; Gariel; Sylvain; (Paris, FR) ; Jeddeloh; Jeffrey; (Verona, WI) ; Nicol; Robert; (Cambridge, MA) ; Ybert; Thomas; (Paris, FR)

Applicant:

Name	City	State	Country	Type
DNA Script SAS	Le Kremlin-Bicetre		FR

Assignee:

DNA Script SAS
Le Kremlin-Bicêtre
FR

Appl. No.:

17/687128

Filed:

March 4, 2022

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
16970590	Aug 17, 2020	11268091
PCT/EP2019/084347	Dec 10, 2019
17687128

International Class:

C12N 15/10 20060101 C12N015/10; C12N 15/11 20060101 C12N015/11; C12Q 1/44 20060101 C12Q001/44; C40B 50/06 20060101 C40B050/06; C40B 50/08 20060101 C40B050/08; C40B 50/14 20060101 C40B050/14; C40B 70/00 20060101 C40B070/00

Foreign Application Data

Date	Code	Application Number
Dec 13, 2018	EP	18306687.7
Feb 25, 2019	EP	19305219.8

Claims

1. A method of synthesizing on a viable cell an oligonucleotide with a predetermined sequence, the method comprising the steps of: a) providing an initiator with a free 3'-hydroxyl attached to a cell surface molecule of the cell or anchored in the cell surface membrane of the cell; and b) repeating under biological conditions for a plurality of cycles the steps of (i) contacting the initiator or elongated fragments having free 3'-O-hydroxyls with a 3'-O-blocked nucleoside triphosphate and a template-independent DNA polymerase so that the initiator or elongated fragments are elongated by incorporation of a 3'-O-blocked nucleoside triphosphate to form 3'-O-blocked elongated fragments, and (ii) deblocking the elongated fragments to form elongated fragments having free 3'-hydroxyls, thereby synthesizing the oligonucleotide of predetermined sequence.

2. The method of claim 1, wherein said 3'-O-blocked nucleoside triphosphate is a 3'-phosphate-nucleoside triphosphate and said step of deblocking is carried out by treating said 3'-O-blocked elongated fragments with a 3'-phosphatase activity.

3. The method of claim 2, wherein said 3'-phosphatase activity is provided by T4 polynucleotide kinase, recombinant shrimp alkaline phosphatase, or a calf intestinal alkaline phosphatase.

4. The method of claim 1, wherein said 3'-O-blocked nucleoside triphosphate is a 3'-ester-blocked nucleoside triphosphate and said step of deblocking is carried out by treating said 3'-O-ester-blocked elongated fragments with an esterase activity.

5. The method of claim 1, wherein said biological conditions comprise buffered physiological salts at a pH in the range of from 6.8 to 7.8 and a temperature in the range of from 15° C. to 41° C.

6. The method of claim 1, wherein said viable cells are mammalian cells.

7. The method of claim 1, wherein said initiator comprises an oligonucleotide having a lipophilic anchor covalently attached to a 5' end, wherein the lipophilic anchor inserts stably into a cell surface membrane of said viable cell.

8. A method of generating a cDNA library with cell-specific oligonucleotide barcodes, the method comprising the steps of: (a) synthesizing under biological conditions a unique oligonucleotide barcode on the cell surface membrane of each cell in a population of cells to form a population of barcoded cells; (b) isolating each barcoded cell in a reactor; (c) lysing barcoded cells in each reactor; and (d) performing reverse-transcriptase polymerase chain reaction (RT-PCR) in each reactor to produce a cDNA library with cell-specific oligonucleotide barcodes.

9. The method of claim 8, wherein said step of synthesizing comprises (a) attaching initiators to said cell surface membrane of each of said cells of said population, and (b) repeating cycles of (i) contacting under biological conditions the initiators or elongated fragments having free 3'-O-hydroxyls with a 3'-O-blocked nucleoside triphosphate and a template-independent DNA polymerase so that the initiators or elongated fragments are elongated by incorporation of a 3'-O-blocked nucleoside triphosphate to form 3'-O-blocked elongated fragments, and (ii) deblocking the elongated fragments to form elongated fragments having free 3'-hydroxyls.

10. The method of claim 9, wherein each of said cycles further comprises splitting said population of said cells among separate reaction mixtures in which said initiators or elongated fragments are elongated by a different kind of nucleoside triphosphate to form said elongated fragments after which said cells of the separate reaction mixtures are combined.

11. The method of claim 8, wherein said reactors are micelles of a water-in-oil emulsion.

12. The method of claim 11, wherein said micelles are generated by a microfluidics device.

13. The method of claim 9, wherein said cDNAs from said reactors are combined and analyzed by high throughput DNA sequencing.

14. A method of extending one or more native polynucleotides with a predetermined nucleotide sequence, comprising: providing the one or more native polynucleotides in a reaction mixture under TdT reaction conditions, the one or more native polynucleotides having free 3'-O-hydroxyls; and extending the one or more native polynucleotides with the predetermined nucleotide sequence by repeated cycles of the steps (i) contacting the one or more native polynucleotides or elongated native polynucleotides having the free 3'-O-hydroxyls with a 3'-O-blocked nucleoside triphosphate and a TdT variant so that the one or more native polynucleotides or elongated native polynucleotides are elongated by incorporation of the 3'-O-blocked nucleoside triphosphate to form 3'-O-blocked elongated native polynucleotides, and (ii) deblocking the elongated native polynucleotides to form elongated native polynucleotides having free 3'-O-hydroxyls, thereby synthesizing on the one or more native polynucleotides an oligonucleotide of the predetermined nucleotide sequence.

15. The method of claim 14, wherein said predetermined nucleotide sequence comprises at least a plurality of different kinds of nucleotides.

16. The method of claim 15, wherein said predetermined nucleotide sequence is unique for each native polynucleotide of the one or more native polynucleotides.

17. The method of claim 14, wherein each of said cycles further comprises: prior to step (i), splitting said one or more native polynucleotides or said elongated native polynucleotides among two or more separate reaction mixtures, wherein contacting with the 3'-O-blocked nucleoside triphosphate in step (i) comprises contacting said native polynucleotides or said elongated native polynucleotides with two or more different kinds of nucleoside triphosphate to form said elongated native polynucleotides; and after step (i), combining said elongated native polynucleotides of the separate reaction mixtures.

18. A method of generating cDNA libraries each with an oligonucleotide label, the method comprising the steps of: (a) capturing an mRNA by hybridizing the mRNA to capture oligonucleotides attached to one or more solid supports, wherein the capture oligonucleotides are complementary to segments of the mRNA, and the capture oligonucleotides are attached to the one or more solid supports by 5'-ends and have 3'-ends with free 3'-hydroxyls; (b) extending the 3'-ends of the capture oligonucleotides with a reverse transcriptase using the captured mRNAs as templates to form the cDNA libraries on the one or more solid supports; and (c) synthesizing oligonucleotide labels on cDNAs of the one or more solid supports by template-free enzymatic synthesis.

19. The method of claim 18, wherein: said step of capturing includes capturing the mRNA of a single cell on a bead to form said cDNA libraries that are cell-specific cDNA libraries; and said oligonucleotide labels are unique cell-specific oligonucleotide barcodes.

20. The method of claim 18, wherein said step of synthesizing comprises synthesizing said unique cell-specific oligonucleotide barcodes by a split and mix synthesis method.

21. The method of claim 18, wherein: said one or more solid supports is a solid surface with said capture oligonucleotides attached thereto; and said step of capturing includes capturing mRNA of a permeabilized tissue slice disposed on the solid surface to form a spatial cDNA library array that preserves a spatial distribution of the cDNAs of the permeabilized tissue slice.

22. The method of claim 21, wherein said step of synthesizing includes synthesizing at each of a plurality of different predetermined positions on said spatial cDNA library array a unique position tag to form position tag-cDNA conjugates.

23. The method of claim 22, further comprising: steps of releasing and sequencing said position tag-cDNA conjugates to determine the spatial distribution of said mRNAs in said permeabilized tissue slice.

24. The method of claim 21, wherein said solid surface includes binding compounds attached thereto for capturing predetermined non-nucleic acid ligands.

25. The method of claim 24, wherein said binding compounds comprise one or more kinds of antibodies each with a predetermined specificity for one of said predetermined non-nucleic acid ligands, each different kind of antibody having attached a releasable oligonucleotide barcode from which the antibody can be identified.

Description

[0001] The present application is a divisional of and claims priority to U.S. patent application Ser. No. 16/970,590, entitled "DIRECT OLIGONUCLEOTIDE SYNTHESIS ON CELLS AND BIOMOLECULES," filed on Aug. 17, 2020, which is a U.S. National Stage Entry of International Application No. PCT/EP2019/084347, entitled "DIRECT OLIGONUCLEOTIDE SYNTHESIS ON CELLS AND BIOMOLECULES," filed on Dec. 10, 2019, which claims priority to European Application No. 19305219.8 filed on Feb. 25, 2019 and European Application No. 18306687.7 filed on Dec. 13, 2018. All above-identified applications are hereby incorporated by reference in their entireties.

[0003] There are many instances in the biological and medical sciences where large-scale analysis of biomolecules or cells can be facilitated by the use of nucleic acid tags, or barcodes, e.g. Brenner et al, Proc. Natl. Acad. Sci., 97: 1665-1670 (2000); Brenner et al, U.S. Pat. No. 7,537,897; Brenner et al, U.S. Pat. No. 8,476,018; McCloskey et al, U.S. patent publication 2007/0020640; Kinde et al, Proc. Natl. Acad. Sci., 108: 9530-9535 (2011); Fu et al, Proc. Natl. Acad. Sci., 108: 9026-9031 (2011); Nolan, U.S. patent publication 2016/0251697; Zheng et al, Nature Comm., 8:14049 (2017).

[0004] Typically such oligonucleotide labels are attached either (1) by "labeling by sampling" (also referred to as, "stochastic labeling"), where a large set of oligonucleotide labels are pre-synthesized and used to form conjugates with a much smaller population of target organisms or biomolecules to give a conjugate population of organisms or biomolecules with unique labels e.g., Brenner et al, U.S. Pat. No. 7,537,897; or Fu et al, Proc. Natl. Acad. Sci. (cited above); or (2) by "split and mix" hybridizations of a plurality of pre-synthesized oligonucleotide subunits to give a population of organisms or biomolecules all of which have substantially unique labels, e.g. Nolan (cited above); or Seelig et al, U.S. patent publication 2016/0138086. Pre-synthesized tags or tag subunits have been used in these processes because oligonucleotide synthesis has been dominated by chemical methods, such as phosphoramidite chemistry, which requires harsh non-aqueous conditions that are incompatible with biological organisms and biomolecules.
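The requirement in "labeling by sampling" that the pre-synthesized label set be much larger than the target population can be made concrete with a short calculation. The sketch below is illustrative only and is not taken from the cited references. It assumes that each target independently receives one label drawn uniformly at random, with replacement, from the label set; the function name and the example population sizes are hypothetical.

```python
def unique_label_probability(num_labels: int, num_targets: int) -> float:
    """Probability that one given target carries a label shared by no other target.

    Assumption (not from the source): every target draws a single label
    uniformly at random, with replacement, from num_labels distinct labels.
    Each of the other (num_targets - 1) targets misses the given label with
    probability (1 - 1/num_labels).
    """
    if num_labels < 1 or num_targets < 1:
        raise ValueError("num_labels and num_targets must both be at least 1")
    return (1.0 - 1.0 / num_labels) ** (num_targets - 1)


if __name__ == "__main__":
    targets = 10_000  # hypothetical number of cells or biomolecules
    for labels in (10_000, 1_000_000, 100_000_000):  # hypothetical label set sizes
        p = unique_label_probability(labels, targets)
        print(f"{labels:>11} labels, {targets} targets: P(unique) = {p:.4f}")
```

Under this model the probability approaches 1 only when the label set greatly exceeds the number of targets, which is consistent with the "much smaller population" condition described above.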

[0005] It would be highly desirable if a capability were available for direct oligonucleotide synthesis on biological organisms or biomolecules to provide such organisms or biomolecules with oligonucleotide labels, such as unique and durable barcodes or tags, for tracking and sorting. Such labeling, particularly when coupled with next-generation sequencing techniques, would be a valuable tool for large-scale cell-based analysis of a host of biological processes.

SUMMARY OF THE INVENTION

[0006] The invention is directed to methods for synthesizing oligonucleotides directly on biomolecules and biological cells, including the application of such methods for single-cell analysis, such as single-cell transcriptome analysis.

[0007] In some embodiments the invention is directed to methods of synthesizing oligonucleotides on biological cells or biomolecules comprising the steps of: (a) providing biological cells or biomolecules having an initiator with a free 3'-hydroxyl; (b) repeating for a plurality of cycles the steps of (i) contacting under elongation conditions the initiator or elongated fragments having free 3'-O-hydroxyls with a 3'-O-blocked nucleoside triphosphate and a template-independent DNA polymerase so that the initiator or elongated fragments are elongated by incorporation of a 3'-O-blocked nucleoside triphosphate to form 3'-O-blocked elongated fragments, and (ii) deblocking the elongated fragments to form elongated fragments having free 3'-hydroxyls, thereby synthesizing on the biological cells or biomolecules an oligonucleotide of predetermined sequence. In some embodiments, such steps are carried out under conditions that maintain biological cells, especially mammalian cells, in a viable state. In some embodiments, the step of deblocking is carried out enzymatically under conditions that maintain biological cells, especially mammalian cells, in a viable state.

[0008] In some embodiments, the invention is directed to a method of generating a cell-specific cDNA library with cell-specific oligonucleotide barcodes comprising the steps of: (a) synthesizing a unique oligonucleotide barcode on a cell surface membrane of each cell in a population of cells to form a population of barcoded cells; (b) isolating each barcoded cell in a reactor; (c) lysing barcoded cells in each reactor; (d) performing reverse-transcriptase polymerase chain reaction (RT-PCR) in each reactor to produce a cDNA library with cell-specific oligonucleotide barcodes. In some embodiments, the step of synthesizing is carried out by a template-free enzymatic synthesis method of the invention. In some embodiments, the RT-PCR reaction includes attaching barcodes to cDNAs by a polymerase cycling amplification reaction.

[0009] In some embodiments, the invention is directed to methods of generating cell-specific cDNA libraries each with cell-specific oligonucleotide barcodes comprising the steps of: (a) capturing mRNA of a single cell by hybridizing the mRNA to capture oligonucleotides attached to a bead, wherein the capture oligonucleotides are complementary to segments of the mRNA and wherein the capture oligonucleotides are attached to the bead by 5'-ends and have free 3'-hydroxyls; (b) extending 3'-ends of the capture oligonucleotides with a reverse transcriptase using the captured mRNAs as templates to form cell-specific cDNA libraries; (c) synthesizing a unique cell-specific oligonucleotide barcode on each cDNA of a bead by template-free enzymatic synthesis. In some embodiments, the unique cell- or bead-specific barcode is a random sequence oligonucleotide and the step of synthesizing such barcode is carried out by a "split and mix" procedure with template-free enzymatic synthesis.
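The "split and mix" procedure mentioned above can be pictured with a small simulation. The following is a minimal sketch under stated assumptions, not the applicant's protocol: each carrier (a bead or a cell) is assigned at random to one of four reaction mixtures per cycle, every mixture adds exactly one kind of nucleotide, and all carriers are pooled again before the next cycle. The function name `split_and_mix`, the carrier counts, and the cycle count are hypothetical.

```python
import random

BASES = "ACGT"  # one separate reaction mixture per nucleotide kind


def split_and_mix(num_carriers: int, num_cycles: int, rng: random.Random) -> list[str]:
    """Return the barcode grown on each carrier after num_cycles rounds.

    Assumption: all strands on one carrier see the same mixture in a given
    cycle, so a carrier is modeled by a single barcode string.
    """
    barcodes = [""] * num_carriers
    for _ in range(num_cycles):
        # Split: distribute carriers at random among the four mixtures.
        pools = {base: [] for base in BASES}
        for carrier in range(num_carriers):
            pools[rng.choice(BASES)].append(carrier)
        # Extend: each mixture adds its own nucleotide to every carrier in it.
        for base, members in pools.items():
            for carrier in members:
                barcodes[carrier] += base
        # Mix: carriers are recombined; the shared list already holds them all.
    return barcodes


if __name__ == "__main__":
    codes = split_and_mix(num_carriers=1000, num_cycles=12, rng=random.Random(0))
    print("distinct barcodes:", len(set(codes)), "of", len(codes))
```

With four mixtures and n cycles there are 4^n possible barcodes, so the number of cycles would be chosen so that 4^n greatly exceeds the number of carriers if barcodes are to be substantially unique.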

[0010] In some embodiments, the invention is directed to methods of extending one or more native polynucleotides with a predetermined nucleotide sequence, comprising: providing one or more native polynucleotides in a reaction mixture under TdT reaction conditions, the native polynucleotides having free 3'-hydroxyls; and extending the one or more native polynucleotides with a predetermined sequence of nucleotides by repeated cycles of the steps (i) contacting the native polynucleotides or elongated native polynucleotides having free 3'-O-hydroxyls with a 3'-O-blocked nucleoside triphosphate and a TdT variant so that the native polynucleotides or elongated native polynucleotides are elongated by incorporation of a 3'-O-blocked nucleoside triphosphate to form 3'-O-blocked elongated native polynucleotides, and (ii) deblocking the elongated native polynucleotides to form elongated native polynucleotides having free 3'-hydroxyls, thereby synthesizing on the native polynucleotides an oligonucleotide of the predetermined sequence.

[0011] These above-characterized aspects, as well as other aspects, of the present invention are exemplified in a number of illustrated implementations and applications, some of which are shown in the figures and characterized in the claims section that follows. However, the above summary is not intended to describe each illustrated embodiment or every implementation of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] FIG. 1A illustrates diagrammatically the steps of a method of template-free enzymatic nucleic acid synthesis using a TdT.

[0013] FIGS. 1B-1C illustrate one embodiment of the invention for directly synthesizing an oligonucleotide on the 3' end of cDNA molecules.

[0014] FIGS. 1D-1F illustrate the generation of tagged cDNA libraries on solid supports, e.g. beads or planar arrays, by applying the method illustrated in FIGS. 1B-1C.

[0015] FIG. 1G illustrates an embodiment for identifying spatial distributions of proteins using oligonucleotide labeled antibodies.

[0016] FIG. 1H illustrates an embodiment for identifying both gene expression and protein distribution using immobilized oligonucleotides and antibodies with DNA labels.

[0017] FIG. 1I illustrates an embodiment for focusing the spatial sequencing analysis on a particular surface area of interest through multiple tagging steps.

[0018] FIGS. 2A-2D illustrate embodiments of the invention for directly synthesizing oligonucleotide tags on living cells or fixed and permeabilized cells.

[0019] FIG. 3A illustrates attachment of initiator oligonucleotides onto a cell surface membrane by a lipophilic anchor.

[0020] FIG. 3B illustrates the "split and mix" synthesis of unique molecular barcodes on initiators anchored in the cell surface membranes of cells.

[0021] FIG. 3C illustrates microfluidics processing of barcoded cells to generate single-cell specific cDNA libraries where each library includes a cell-specific barcode.

[0022] FIG. 4A illustrates a procedure for amplifying specific genes from single cells.

[0023] FIG. 4B illustrates a procedure (template switching) for amplifying a full cDNA library from single cells.

[0024] FIG. 4C illustrates a procedure for attaching cell-specific barcodes to cDNA sequences by polymerase cycling amplification (PCA).

[0025] FIG. 5 illustrates an alternative method for generating barcoded single-cell cDNA libraries.

[0026] FIG. 6 illustrates a chimeric enzymatically/chemically synthesized probe.

DETAILED DESCRIPTION OF THE INVENTION

[0027] The general principles of the invention are disclosed in more detail herein particularly by way of examples, such as those shown in the drawings and described in detail. It should be understood, however, that the intention is not to limit the invention to the particular embodiments described. The invention is amenable to various modifications and alternative forms, specifics of which are shown for several embodiments. The intention is to cover all modifications, equivalents, and alternatives falling within the principles and scope of the invention.

[0028] The practice of the present invention may employ, unless otherwise indicated, conventional techniques and descriptions of organic chemistry, molecular biology (including recombinant techniques), cell biology, and biochemistry, which are within the skill of the art. Such conventional techniques may include, but are not limited to, preparation and use of synthetic peptides, synthetic polynucleotides, monoclonal antibodies, nucleic acid cloning, amplification, sequencing and analysis, and related techniques. Protocols for such conventional techniques can be found in product literature from manufacturers and in standard laboratory manuals, such as Genome Analysis: A Laboratory Manual Series (Vols. I-IV); PCR Primer: A Laboratory Manual; and Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press); Lutz and Bornscheuer, Editors, Protein Engineering Handbook (Wiley-VCH, 2009); Hermanson, Bioconjugate Techniques, Second Edition (Academic Press, 2008); and like references.

[0029] The invention is directed to methods for directly synthesizing oligonucleotides on cells or biomolecules using template-free enzymatic oligonucleotide synthesis techniques. Such techniques can be applied directly to biomolecules extracted from their natural settings (e.g. DNAs or RNAs), to biomolecules (e.g. proteins) modified by the attachment of an initiator, or to biomolecules (e.g. nucleic acids, polysaccharides, polypeptides, etc.) produced synthetically. Such techniques may be applied in any circumstances where the mild conditions of an enzymatic reaction are necessary or useful. In particular, the template-free enzymatic synthesis techniques may be employed in hybrid chemical-enzymatic polynucleotide synthesis wherein a precursor polynucleotide is synthesized chemically, then the precursor is further modified by enzymatic additions of components, such as labeled nucleotides, that may be altered by, or not survive, the harsh conditions of chemical synthesis. Such hybrid synthesis methods may include several alternations between chemical additions and enzymatic additions of nucleotides or analogs thereof. Such hybrid synthesis techniques may pair enzymatic synthesis with a variety of different chemical synthesis approaches including, but not limited to, phosphoramidite, phosphodiester, phosphotriester, phosphite triester, and H-phosphonate chemistries, e.g. Narang, Editor, Synthesis and applications of DNA and RNA (Academic Press, Inc., 1987).

[0030] In some embodiments, template-free enzymatic synthesis techniques require the presence of an initiator oligonucleotide having a free 3'-hydroxyl which may be part of a biomolecule, in the case of polynucleotides or cDNAs, or it may be readily added by a variety of chemical techniques, e.g. using readily available click chemistry reactions, in the case of cellular membrane proteins, antibodies, or the like. With the availability of an initiator, enzymatic oligonucleotide synthesis may be implemented by repeated cycles of (i) extension of the initiator (or previously extended strand) having a free 3'-hydroxyl by a single nucleotide using a template-free polymerase, such as a terminal deoxynucleotidyl transferase (TdT), in the presence of a 3'-O-blocked nucleoside triphosphate, and (ii) de-blocking recently incorporated 3'-O-blocked nucleotides to regenerate new extendable 3'-hydroxyls. Cycles are continued until an oligonucleotide having a desired sequence is synthesized. In some embodiments, unique oligonucleotide barcodes may be synthesized directly on biological cells, such as mammalian cells, of a population by a "split and mix" synthesis strategy. In some embodiments, capping steps may be included in which non-extended free hydroxyls are reacted with compounds that prevent any further extension of the capped strand. In some embodiments, such a compound may be a dideoxynucleoside triphosphate. In other embodiments, non-extended strands with free 3'-hydroxyls may be degraded by treating them with a 3'-exonuclease activity, e.g. Exo I, as described by Jensen et al, Biochemistry, 57: 1821-1832 (2018).
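The extend, cap, and deblock cycle described in this paragraph can be summarized as a small state model. The sketch below is a conceptual illustration only; it is not a model of enzyme kinetics and does not come from the application. The `Strand` class, the function names, and the `failures` parameter (a set of cycle indices at which coupling is assumed to fail) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Strand:
    sequence: str          # initiator plus nucleotides added so far (5' to 3')
    blocked: bool = False  # True while the 3' end carries a blocking group
    capped: bool = False   # True once the strand is permanently terminated


def extend(strand: Strand, base: str, coupled: bool = True) -> None:
    """Step (i): add one 3'-O-blocked nucleotide to a free 3'-hydroxyl."""
    if strand.capped or strand.blocked:
        return  # no free 3'-hydroxyl, so nothing can be added
    if coupled:
        strand.sequence += base
        strand.blocked = True  # the blocking group limits the cycle to one addition


def cap(strand: Strand) -> None:
    """Optional capping: terminate a strand that failed to extend this cycle."""
    if not strand.blocked and not strand.capped:
        strand.capped = True


def deblock(strand: Strand) -> None:
    """Step (ii): remove the blocking group to regenerate a free 3'-hydroxyl."""
    strand.blocked = False


def synthesize(initiator: str, target: str, failures: frozenset = frozenset()) -> Strand:
    """Run one cycle per nucleotide of target; cycles listed in failures do not couple."""
    strand = Strand(initiator)
    for cycle, base in enumerate(target):
        extend(strand, base, coupled=cycle not in failures)
        cap(strand)
        deblock(strand)
    return strand


if __name__ == "__main__":
    print(synthesize("TTTT", "ACGT"))                           # full-length product
    print(synthesize("TTTT", "ACGT", failures=frozenset({1})))  # capped after one addition
```

In this model a strand that misses a coupling is capped and receives no further nucleotides, so it ends up shorter than the target but never carries an internal deletion, which is the purpose of the capping step described above.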

[0031] In some embodiments, the invention is directed to methods of synthesizing oligonucleotides on biological cells or biomolecules comprising the steps of (a) providing biological cells or biomolecules having an initiator with a free 3'-hydroxyl; (b) repeating for a plurality of cycles the steps of (i) contacting under elongation conditions the initiator or elongated fragments having free 3'-O-hydroxyls with a 3'-O-blocked nucleoside triphosphate and a template-independent DNA polymerase so that the initiator or elongated fragments are elongated by incorporation of a 3'-O-blocked nucleoside triphosphate to form 3'-O-blocked elongated fragments, and (ii) deblocking the elongated fragments to form elongated fragments having free 3'-hydroxyls, thereby synthesizing on the biological cells or biomolecules an oligonucleotide of predetermined sequence. In some embodiments, the biological cells are provided and deblocking is carried out enzymatically. In some embodiments, the 3'-O-blocked nucleoside triphosphate is a 3'-phosphate-nucleoside triphosphate and the step of deblocking is carried out by treating said 3'-O-blocked elongated fragments with a 3'-phosphatase activity. In some embodiments, the 3'-phosphatase activity is provided by T4 polynucleotide kinase, recombinant shrimp alkaline phosphatase, or a calf intestinal alkaline phosphatase. In some embodiments, the 3'-O-blocked nucleoside triphosphate is a 3'-ester-nucleoside triphosphate and the step of deblocking is carried out by treating said 3'-O-blocked elongated fragments with an esterase activity. In some embodiments, the esterase activity is a lipase activity, such as a proteinase K activity. In some embodiments, the 3'-O-blocked nucleoside triphosphate is a 3'-acetyl-nucleoside triphosphate and the step of deblocking is carried out by treating said 3'-O-blocked elongated fragments with an acetylesterase activity. In some embodiments, the template-independent DNA polymerase is a terminal deoxynucleotidyl transferase (TdT) variant having an amino acid sequence at least sixty percent identical to any one of the amino acid sequences of SEQ ID NOs 2-15 with substitutions at a first arginine at position 207 of SEQ ID NO: 2 or a functionally equivalent position in the amino acid sequences of SEQ ID NOs 3-15 and at a second arginine at position 325 of SEQ ID NO: 2 or a functionally equivalent position in the amino acid sequences of SEQ ID NOs 3-15, wherein the variant TdT (i) is capable of synthesizing a nucleic acid fragment without a template and (ii) is capable of incorporating a 3'-O-modified nucleotide onto a free 3'-hydroxyl of a nucleic acid fragment.

[0032] In some embodiments, methods of the invention for synthesizing oligonucleotides on a viable cell may be carried out with the following steps: (a) providing an initiator with a free 3'-hydroxyl attached to a cell surface molecule of the cell or anchored in the cell surface membrane of the cell; (b) repeating under biological conditions for a plurality of cycles the steps of (i) contacting the initiator or elongated fragments having free 3'-O-hydroxyls with a 3'-O-blocked nucleoside triphosphate and a template-independent DNA polymerase so that the initiator or elongated fragments are elongated by incorporation of a 3'-O-blocked nucleoside triphosphate to form 3'-O-blocked elongated fragments, and (ii) enzymatically deblocking the elongated fragments to form elongated fragments having free 3'-hydroxyls, thereby synthesizing the oligonucleotide of predetermined sequence.

[0033] The above process is illustrated in FIGS. 1B-1C which show an embodiment where cDNAs have an oligonucleotide synthesized onto their 3' ends. Such cDNAs may be obtained, for example, from single cells isolated in reaction chambers. Cell-specific (or chamber-specific) oligonucleotides may be synthesized on such cDNAs. Afterwards, the contents of such reaction chambers may be combined, and the oligonucleotide-cDNA conjugates analyzed by large-scale sequence analysis to provide, for example, a single-cell transcriptome analysis of a population of cells, or transcriptome analysis of a group of cells from the same reactor that have been exposed to the same conditions. Primer (102) having 3'-polyT portion (104) is annealed to the polyA region of messenger RNAs (mRNAs) (100) and extended (108) with a reverse transcriptase using conventional protocols to form cDNA (113)(SEQ ID NO: 16). Primer (102) also has portion (106) which provides a means for attaching the resulting extension products (i.e. the cDNAs) to a solid support. A wide variety of such attachment means are available including, but not limited to, a 5' oligonucleotide tail which may be annealed to a complementary strand attached to a solid support, a member of a click chemistry reaction pair which may be reacted with a complementary member attached to a solid support to form a covalent bond, and a member of a non-nucleic acid binding pair which may form a complex with a complementary member attached to a solid phase support to form a non-covalent bond, an example of the latter being biotin and streptavidin. Returning to FIG. 1B, two of the above attachment modes are illustrated with solid support (120a) which has complementary oligonucleotide (112) attached by its 3' end and which captures cDNA (113) by forming a hybrid with portion (106) of extended primer (102), and with alternative support (120b) which has a member of a binding pair (114a) (such as streptavidin) attached and which captures its complementary member (such as biotin). The member of a binding pair attached to the cDNA does not require the presence of oligonucleotide tail (106); however, in some embodiments, such oligonucleotide (106) may include nucleotides or nucleotide sequences which may be employed for cleaving finished product from the solid supports, e.g. presence of a uracil for cleavage by USER treatment, or the recognition sequence of a restriction endonuclease, such as a nickase.

[0034] After cDNAs are cleavably or releasably attached to a solid support leaving their 3'-hydroxyls free, enzymatic synthesis can proceed to generate an oligonucleotide of a predetermined sequence on the free 3' ends, which is illustrated diagrammatically in FIG. 1C. After strands (123) (SEQ ID NO: 16) are attached to a solid support (for example, 120b via a binding pair), they are exposed to a reaction mixture comprising a template-free polymerase, such as TdT, and a 3'-O-blocked nucleoside triphosphate under conditions that permit the TdT to catalyze the formation of phosphate linkages from the 3'-hydroxyls of the cDNAs and the triphosphates of the incoming 3'-O-blocked nucleotides, thereby incorporating the first nucleotide of the desired oligonucleotide. The 3'-O-blocked nucleoside triphosphates of the extension reaction are shown as "3'-O-blocked dYTPs" in the figure. The 3'-O-blocked hydroxyls of the resulting product are deblocked (122) with an appropriate de-blocking agent to form extended cDNAs (125) having free 3'-hydroxyls. As will be discussed more fully below, the selection of a blocking group and its method of removal may vary widely for different embodiments. Such a selection for a particular embodiment is within the skill of an ordinary practitioner by evaluation of factors such as desired speed of synthesis, desired yield of the synthesis, fragility of the target biomolecules or cells being labeled, in particular, whether biologically compatible enzymatic deblocking is more desirable or whether harsher chemical deblocking is acceptable, and so on. Cycles are repeated (126) using the successive nucleotides of the desired oligonucleotide until the synthesis of the oligonucleotide is complete. In some embodiments, additional steps, such as one or more washing steps, or a step of removing 3'-O-blocked-dYTPs, are included. After completion of the synthesis, the oligonucleotide-labeled cDNAs may be removed from the solid support for further analysis or use. In some embodiments, the cDNA may be retained on the solid support for further analysis or use.

[0035] In some embodiments, oligonucleotides may be synthesized on other biomolecules, such as antibodies, by a process similar to that of FIG. 1C provided that the biomolecules have initiator sequences attached.

[0036] FIGS. 1D-1E illustrate how the above methods may be used with commercially available polyT beads to construct solid phase cDNA libraries, e.g. Dynabeads' oligo(dT) magnetic beads, Bosnes et al, ThermoFisher Application Note (2017). PolyT beads (150) are combined with cell extract or lysate containing polyA RNA (152) under conditions that permit hybridization of the polyT segments of the beads to the polyA segments of the RNAs, after which the hybridized polyT segments are extended in a reverse transcription reaction. After removal of the RNA template (156), a solid phase cDNA library (158) results, which may then be processed in accordance with the method of FIGS. 1B and 1C (160) to synthesize barcodes, primer binding sites, or the like (162), to allow further analysis of the cDNAs. Such barcodes may uniquely designate a particular sample, such as a patient sample, or as described further below, such barcodes may designate uniquely a single cell.

[0037] In some embodiments, the invention is directed to methods of generating cDNA libraries each with an oligonucleotide label comprising the steps of: (a) capturing mRNA by hybridizing the mRNA to capture oligonucleotides attached to one or more solid supports, wherein the capture oligonucleotides are complementary to segments of the mRNA and wherein the capture oligonucleotides are attached to the one or more solid supports by 5'-ends and have 3'-ends with free 3'-hydroxyls; (b) extending 3'-ends of the capture oligonucleotides with a reverse transcriptase using the captured mRNAs as templates to form cDNA libraries on the one or more solid supports; and (c) synthesizing an oligonucleotide label on each cDNA on the one or more solid supports by template-free enzymatic synthesis. In some embodiments, the step of synthesizing comprises repeating cycles of (i) contacting under elongation conditions said cDNAs with a 3'-O-blocked nucleoside triphosphate and a template-independent DNA polymerase so that said cDNAs or elongated cDNAs with free 3'-hydroxyls are elongated by incorporation of a 3'-O-blocked nucleoside triphosphate to form 3'-O-blocked elongated cDNAs, and (ii) deblocking the elongated cDNA to form elongated cDNAs having free 3'-hydroxyls. In some embodiments, each of said cycles further comprises splitting said cDNAs or elongated cDNAs with free 3'-hydroxyls among separate reaction mixtures in which said cDNAs or elongated cDNAs with free 3'-hydroxyls are elongated by a different kind of nucleoside triphosphate to form said elongated cDNAs after which said elongated cDNAs of the separate reaction mixtures are combined. In some embodiments, the step of capturing includes capturing mRNA of a single cell on a bead to form the cDNA libraries that are cell-specific cDNA libraries and wherein the oligonucleotide labels are unique cell-specific oligonucleotide barcodes. 
In some embodiments, the step of synthesizing a unique cell-specific barcode is implemented by a split and mix synthesis method.
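The split and mix scheme described above can be illustrated with a small simulation. This is a minimal sketch and not the applicants' protocol: the function name, the bead count, the number of cycles, and the uniformly random assignment of beads to four reaction mixtures are all illustrative assumptions.

```python
import random
from collections import Counter

NUCLEOTIDES = "ACGT"

def split_and_mix(num_beads, num_cycles, seed=0):
    """Simulate split-and-mix barcode synthesis on beads (hypothetical model).

    In each cycle every bead is assigned at random to one of four
    reaction mixtures (one per nucleotide), extended by that
    nucleotide, and then all beads are pooled again.
    """
    rng = random.Random(seed)
    barcodes = [""] * num_beads
    for _ in range(num_cycles):
        # Split and extend: each bead lands in one mixture and gains
        # that mixture's nucleotide.
        barcodes = [bc + rng.choice(NUCLEOTIDES) for bc in barcodes]
        # Mix: pooling is implicit, since all beads stay in one list.
    return barcodes

barcodes = split_and_mix(num_beads=1000, num_cycles=12)
counts = Counter(barcodes)
# Fraction of beads whose barcode is carried by no other bead.
unique_fraction = sum(1 for c in counts.values() if c == 1) / len(barcodes)
print(f"possible barcodes: {4 ** 12}")
print(f"fraction of beads with a unique barcode: {unique_fraction:.3f}")
```

Under this simplified model the number of possible barcodes grows as 4 to the power of the number of cycles, so the chance that two beads share a barcode falls as cycles are added.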

[0038] Template-Free Enzymatic Synthesis of Oligonucleotides

[0039] Generally, methods of template-free (or equivalently, "template-independent") enzymatic DNA synthesis comprise repeated cycles of steps, such as are illustrated in FIG. 1A, in which a predetermined nucleotide is coupled to an initiator or growing chain in each cycle. The general elements of template-free enzymatic synthesis are described in the following references: Ybert et al, International patent publication WO/2015/159023; Ybert et al, International patent publication WO/2017/216472; Hyman, U.S. Pat. No. 5,436,143; Hiatt et al, U.S. Pat. No. 5,763,594; Jensen et al, Biochemistry, 57: 1821-1832 (2018); Mathews et al, Organic & Biomolecular Chemistry, DOI: 10.1039/c6ob01371f (2016); Schmitz et al, Organic Lett., 1(11): 1729-1731 (1999).

[0040] Initiator polynucleotides (1000) are provided, for example, attached to solid support (1020), which have free 3'-hydroxyl groups (1030). To the initiator polynucleotides (1000) (or elongated initiator polynucleotides in subsequent cycles) are added a 3'-O-protected-dNTP and a template-free polymerase, such as a TdT or variant thereof (e.g. Ybert et al, WO/2017/216472; Champion et al, WO2019/135007) under conditions (1040) effective for the enzymatic incorporation of the 3'-O-protected-dNTP onto the 3' end of the initiator polynucleotides (1000) (or elongated initiator polynucleotides). This reaction produces elongated initiator polynucleotides whose 3'-hydroxyls are protected (1060). If the elongated sequence is not complete, then another cycle of addition is implemented (1080). If the elongated initiator polynucleotide contains a completed sequence, then the 3'-O-protection group may be removed, or deprotected, and the desired sequence may be cleaved from the original initiator polynucleotide (1100). Such cleavage may be carried out using any of a variety of single strand cleavage techniques, for example, by inserting a cleavable nucleotide at a predetermined location within the original initiator polynucleotide. An exemplary cleavable nucleotide may be a uracil nucleotide which is cleaved by uracil DNA glycosylase.

[0041] If the elongated initiator polynucleotide does not contain a completed sequence, then the 3'-O-protection groups are removed to expose free 3'-hydroxyls (1030) and the elongated initiator polynucleotides are subjected to another cycle of nucleotide addition and deprotection.

[0042] As used herein, an "initiator" (or equivalent terms, such as, "initiating fragment," "initiator nucleic acid," "initiator oligonucleotide," or the like) usually refers to a short oligonucleotide sequence with a free 3'-end, which can be further elongated by a template-free polymerase, such as TdT. In one embodiment, the initiating fragment is a DNA initiating fragment. In an alternative embodiment, the initiating fragment is an RNA initiating fragment.

[0043] In some embodiments, an initiating fragment possesses between 3 and 100 nucleotides, in particular between 3 and 20 nucleotides. In some embodiments, the initiating fragment is single-stranded. In alternative embodiments, the initiating fragment is double-stranded. In some embodiments, an initiator may comprise a non-nucleic acid compound having a free hydroxyl to which a TdT may couple a 3'-O-protected dNTP, e.g. Baiga, U.S. patent publications US2019/0078065 and US2019/0078126.

[0044] Returning to FIG. 1A, in some embodiments, an ordered sequence of nucleotides is coupled to an initiator nucleic acid using a template-free polymerase, such as TdT, in the presence of 3'-O-protected dNTPs in each synthesis step. In some embodiments, the method of synthesizing an oligonucleotide comprises the steps of (a) providing an initiator having a free 3'-hydroxyl; (b) reacting under extension conditions the initiator or an extension intermediate having a free 3'-hydroxyl with a template-free polymerase in the presence of a 3'-O-protected nucleoside triphosphate to produce a 3'-O-protected extension intermediate; (c) deprotecting the extension intermediate to produce an extension intermediate with a free 3'-hydroxyl; and (d) repeating steps (b) and (c) until the polynucleotide is synthesized. (Sometimes the terms "extension intermediate" and "elongation fragment" are used interchangeably). In some embodiments, an initiator is provided as an oligonucleotide attached to a solid support, e.g. by its 5' end. The above method may also include washing steps after the reaction, or extension, step, as well as after the de-protecting step. For example, the step of reacting may include a sub-step of removing unincorporated nucleoside triphosphates, e.g. by washing, after a predetermined incubation period, or reaction time. Such predetermined incubation periods or reaction times may be a few seconds, e.g. 30 sec, to several minutes, e.g. 30 min.
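Steps (a) through (d) above can be traced with a toy model. This is a minimal sketch for illustration only: the strand is represented as a plain string, a trailing `*` is a hypothetical marker for a 3'-O-protected terminal nucleotide, and no chemistry, washing, or coupling efficiency is modeled.

```python
def synthesize(initiator, target, blocked="*"):
    """Walk through steps (a)-(d) for a target sequence (toy model).

    The strand is modeled as a string; a trailing `blocked` marker
    stands for a 3'-O-protected terminal nucleotide.
    """
    strand = initiator  # (a) initiator with a free 3'-hydroxyl
    log = []
    for nucleotide in target:
        # (b) extension: one 3'-O-protected nucleotide is added, which
        # is only possible on a free (unprotected) 3' end.
        assert not strand.endswith(blocked), "cannot extend a protected 3' end"
        strand = strand + nucleotide + blocked
        log.append(("extend", strand))
        # (c) deprotection: regenerate the free 3'-hydroxyl.
        strand = strand[: -len(blocked)]
        log.append(("deprotect", strand))
        # (d) repeat until the target sequence is complete.
    return strand, log

product, log = synthesize("TTTTT", "ACGT")
print(product)  # TTTTTACGT
for step, state in log:
    print(f"{step:9s} {state}")
```

Because the protected 3' end cannot be extended, exactly one nucleotide is added per cycle, which is what makes the sequence of the product predetermined.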

[0045] When the sequence of polynucleotides on a synthesis support includes reverse complementary subsequences, secondary intra-molecular or cross-molecular structures may be created by the formation of hydrogen bonds between the reverse complementary regions. In some embodiments, base protecting moieties for exocyclic amines are selected so that hydrogens of the protected nitrogen cannot participate in hydrogen bonding, thereby preventing the formation of such secondary structures. That is, base protecting moieties may be employed to prevent the formation of hydrogen bonds, such as are formed in normal base pairing, for example, between nucleosides A and T and between G and C. At the end of a synthesis, the base protecting moieties may be removed and the polynucleotide product may be cleaved from the solid support, for example, by cleaving it from its initiator.

[0046] 3'-O-blocked dNTPs without base protection may be purchased from commercial vendors or synthesized using published techniques, e.g. U.S. Pat. No. 7,057,026; Guo et al, Proc. Natl. Acad. Sci., 105(27): 9145-9150 (2008); Benner, U.S. Pat. Nos. 7,544,794 and 8,212,020; International patent publications WO2005/005667, WO91/06678; Canard et al, Gene (cited herein); Metzker et al, Nucleic Acids Research, 22: 4259-4267 (1994); Meng et al, J. Org. Chem., 71: 3248-3252 (2006); U.S. patent publication 2005/037991. 3'-O-blocked dNTPs with base protection may be synthesized as described below.

[0047] When base-protected dNTPs are employed the above method of FIG. 1A may further include a step (e) removing base protecting moieties, which in the case of acyl or amidine protection groups may (for example) include treating with concentrated ammonia.

[0048] The above method may also include capping step(s) as well as washing steps after the reacting, or extending, step, as well as after the deprotecting step. As mentioned above, in some embodiments, capping steps may be included in which non-extended free 3'-hydroxyls are reacted with compounds that prevent any further extension of the capped strand. In some embodiments, such a compound may be a dideoxynucleoside triphosphate. In other embodiments, non-extended strands with free 3'-hydroxyls may be degraded by treating them with a 3'-exonuclease activity, e.g. Exo I. For example, see Hyman, U.S. Pat. No. 5,436,143. Likewise, in some embodiments, strands that fail to be deblocked may be treated to either remove the strand or render it inert to further extensions.

[0049] In some embodiments, reaction conditions for an extension or elongation step may comprise the following: 2.0 .mu.M purified TdT; 125-600 .mu.M 3'-O-blocked dNTP (e.g. 3'-O--NH.sub.2-blocked dNTP); about 10 to about 500 mM potassium cacodylate buffer (pH between 6.5 and 7.5) and from about 0.01 to about 10 .mu.M of a divalent cation (e.g. CoCl.sub.2 or MnCl.sub.2), where the elongation reaction may be carried out in a 50 .mu.L reaction volume, at a temperature within the range RT to 45.degree. C., for 3 minutes. In embodiments in which the 3'-O-blocked dNTPs are 3'-O--NH.sub.2-blocked dNTPs, reaction conditions for a deblocking step may comprise the following: 700 mM NaNO.sub.2; 1 M sodium acetate (adjusted with acetic acid to pH in the range of 4.8-6.5), where the deblocking reaction may be carried out in a 50 .mu.L volume, at a temperature within the range of RT to 45.degree. C. for 30 seconds to several minutes.
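As a worked example of the quantities implied by the elongation conditions above, the following sketch converts the stated concentrations into amounts per reaction, assuming a 50 microliter reaction volume. The helper name `picomoles` is hypothetical; the only relation used is that 1 micromolar times 1 microliter equals 1 picomole.

```python
def picomoles(concentration_uM, volume_uL):
    """Amount in pmol, using 1 uM x 1 uL = 1 pmol."""
    return concentration_uM * volume_uL

VOLUME_UL = 50.0  # assumed elongation reaction volume

tdt_pmol = picomoles(2.0, VOLUME_UL)           # 2.0 uM purified TdT
dntp_low_pmol = picomoles(125.0, VOLUME_UL)    # lower bound of dNTP range
dntp_high_pmol = picomoles(600.0, VOLUME_UL)   # upper bound of dNTP range

print(tdt_pmol)                                             # 100.0
print(dntp_low_pmol / 1000, dntp_high_pmol / 1000)          # 6.25 30.0 (nmol)
print(dntp_low_pmol / tdt_pmol, dntp_high_pmol / tdt_pmol)  # 62.5 300.0
```

Under these assumptions each reaction contains 100 pmol of TdT and between 6.25 and 30 nmol of 3'-O-blocked dNTP, that is, a 62.5-fold to 300-fold molar excess of nucleotide over enzyme.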

[0050] Depending on particular applications, the steps of deblocking and/or cleaving may include a variety of chemical or physical conditions, e.g. light, heat, pH, presence of specific reagents, such as enzymes, which are able to cleave a specified chemical bond. Guidance in selecting 3'-O-blocking groups and corresponding de-blocking conditions may be found in the following references, which are incorporated by reference: Benner, U.S. Pat. Nos. 7,544,794 and 8,212,020; U.S. Pat. Nos. 5,808,045; 8,808,988; International patent publication WO91/06678; and references cited below. In some embodiments, the cleaving agent (also sometimes referred to as a de-blocking reagent or agent) is a chemical cleaving agent, such as, for example, dithiothreitol (DTT). In alternative embodiments, a cleaving agent may be an enzymatic cleaving agent, such as, for example, a phosphatase, which may cleave a 3'-phosphate blocking group. It will be understood by the person skilled in the art that the selection of deblocking agent depends on the type of 3'-nucleotide blocking group used, whether one or multiple blocking groups are being used, whether initiators are attached to solid supports or to living cells or organisms that necessitate mild treatment, and the like. For example, a phosphine, such as tris(2-carboxyethyl)phosphine (TCEP), can be used to cleave 3'-O-azidomethyl groups, palladium complexes can be used to cleave 3'-O-allyl groups, and sodium nitrite can be used to cleave 3'-O-amino groups. In particular embodiments, the cleaving reaction involves TCEP, a palladium complex or sodium nitrite.

[0051] As noted above, in some embodiments it is desirable to employ two or more blocking groups that may be removed using orthogonal de-blocking conditions. The following exemplary pairs of blocking groups may be used in parallel synthesis embodiments (Table 1). It is understood that other blocking group pairs, or groups containing more than two, may be available for use in these embodiments of the invention.

TABLE 1 Pairs of blocking groups
3'-O--NH2	3'-O-azidomethyl
3'-O--NH2	3'-O-allyl
3'-O--NH2	3'-O-phosphate
3'-O-azidomethyl	3'-O-allyl
3'-O-azidomethyl	3'-O-phosphate
3'-O-allyl	3'-O-phosphate
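The six pairs of Table 1 are the two-member combinations of four blocking groups. The sketch below enumerates them and attaches to each group a deblocking agent named elsewhere in this description; the dictionary name and the label strings are illustrative assumptions, and the mapping is a summary for orientation rather than a statement of required conditions.

```python
from itertools import combinations

# Deblocking agents named in the description for each 3'-O-blocking group.
DEBLOCKING_AGENT = {
    "3'-O-NH2": "sodium nitrite",
    "3'-O-azidomethyl": "phosphine (e.g. TCEP)",
    "3'-O-allyl": "palladium complex",
    "3'-O-phosphate": "3'-phosphatase (e.g. T4 polynucleotide kinase)",
}

# All unordered pairs, in dictionary insertion order.
pairs = list(combinations(DEBLOCKING_AGENT, 2))
for first, second in pairs:
    print(f"{first} ({DEBLOCKING_AGENT[first]})  |  "
          f"{second} ({DEBLOCKING_AGENT[second]})")
print(len(pairs))  # 6
```

Each pair combines groups removed by different agents, which is the sense in which the deblocking conditions are orthogonal.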

[0052] Synthesizing oligonucleotides on living cells requires mild deblocking, or deprotection, conditions, that is, conditions that do not disrupt cellular membranes, denature proteins, interfere with key cellular functions, or the like. In some embodiments, deprotection conditions are within a range of physiological conditions compatible with cell survival. In such embodiments, enzymatic deprotection is desirable because it may be carried out under physiological conditions. In some embodiments, specific enzymatically removable blocking groups are associated with specific enzymes for their removal. For example, ester- or acyl-based blocking groups may be removed with an esterase, such as acetylesterase, or like enzyme, and a phosphate blocking group may be removed with a 3' phosphatase, such as T4 polynucleotide kinase. By way of example, 3'-O-phosphates may be removed by treatment with a solution of 100 mM Tris-HCl (pH 6.5), 10 mM MgCl.sub.2, 5 mM 2-mercaptoethanol, and one Unit T4 polynucleotide kinase. The reaction proceeds for one minute at a temperature of 37.degree. C.

[0053] A "3'-phosphate-blocked" or "3'-phosphate-protected" nucleotide refers to a nucleotide in which the hydroxyl group at the 3'-position is blocked by the presence of a phosphate containing moiety. Examples of 3'-phosphate-blocked nucleotides in accordance with the invention are nucleotidyl-3'-phosphate monoester/nucleotidyl-2',3'-cyclic phosphate, nucleotidyl-2'-phosphate monoester, nucleotidyl-2' or 3'-alkylphosphate diester, and nucleotidyl-2' or 3'-pyrophosphate. Thiophosphate or other analogs of such compounds can also be used, provided that the substitution does not prevent dephosphorylation resulting in a free 3'-OH by a phosphatase.

[0054] Further examples of synthesis and enzymatic deprotection of 3'-O-ester-protected dNTPs or 3'-O-phosphate-protected dNTPs are described in the following references: Canard et al, Proc. Natl. Acad. Sci., 92:10859-10863 (1995); Canard et al, Gene, 148: 1-6 (1994); Cameron et al, Biochemistry, 16(23): 5120-5126 (1977); Rasolonjatovo et al, Nucleosides & Nucleotides, 18(4&5): 1021-1022 (1999); Ferrero et al, Monatshefte fur Chemie, 131: 585-616 (2000); Taunton-Rigby et al, J. Org. Chem., 38(5): 977-985 (1973); Uemura et al, Tetrahedron Lett., 30(29): 3819-3820 (1989); Becker et al, J. Biol. Chem., 242(5): 936-950 (1967); Tsien, International patent publication WO1991/006678.

[0055] In some embodiments, the modified nucleotides comprise a modified nucleotide or nucleoside molecule comprising a purine or pyrimidine base and a ribose or deoxyribose sugar moiety having a removable 3'-OH blocking group covalently attached thereto, such that the 3' carbon atom has attached a group of the structure:

--O--Z

wherein --Z is any of --C(R').sub.2--O--R'', --C(R').sub.2--N(R'').sub.2, --C(R').sub.2--N(H)R'', --C(R').sub.2--S--R'' and --C(R').sub.2--F, wherein each R'' is or is part of a removable protecting group; each R' is independently a hydrogen atom, an alkyl, substituted alkyl, arylalkyl, alkenyl, alkynyl, aryl, heteroaryl, heterocyclic, acyl, cyano, alkoxy, aryloxy, heteroaryloxy or amido group, or a detectable label attached through a linking group; with the proviso that in some embodiments such substituents have up to 10 carbon atoms and/or up to 5 oxygen or nitrogen heteroatoms; or (R').sub.2 represents a group of formula .dbd.C(R''').sub.2 wherein each R''' may be the same or different and is selected from the group comprising hydrogen and halogen atoms and alkyl groups, with the proviso that in some embodiments the alkyl of each R''' has from 1 to 3 carbon atoms; and wherein the molecule may be reacted to yield an intermediate in which each R'' is exchanged for H or, where Z is --C(R').sub.2--F, the F is exchanged for OH, SH or NH.sub.2, preferably OH, which intermediate dissociates under aqueous conditions to afford a molecule with a free 3'-OH; with the proviso that where Z is --C(R').sub.2--S--R'', both R' groups are not H. In certain embodiments, R' of the modified nucleotide or nucleoside is an alkyl or substituted alkyl, with the proviso that such alkyl or substituted alkyl has from 1 to 10 carbon atoms and from 0 to 4 oxygen or nitrogen heteroatoms. In certain embodiments, --Z of the modified nucleotide or nucleoside is of formula --C(R').sub.2--N.sub.3. In certain embodiments, Z is an azidomethyl group.

[0056] In some embodiments, Z is a cleavable organic moiety with or without heteroatoms having a molecular weight of 200 or less. In other embodiments, Z is a cleavable organic moiety with or without heteroatoms having a molecular weight of 100 or less. In other embodiments, Z is a cleavable organic moiety with or without heteroatoms having a molecular weight of 50 or less. In some embodiments, Z is an enzymatically cleavable organic moiety with or without heteroatoms having a molecular weight of 200 or less. In other embodiments, Z is an enzymatically cleavable organic moiety with or without heteroatoms having a molecular weight of 100 or less. In other embodiments, Z is an enzymatically cleavable organic moiety with or without heteroatoms having a molecular weight of 50 or less. In other embodiments, Z is an enzymatically cleavable ester group having a molecular weight of 200 or less. In other embodiments, Z is a phosphate group removable by a 3'-phosphatase. In some embodiments, one or more of the following 3'-phosphatases may be used with the manufacturer's recommended protocols: T4 polynucleotide kinase, calf intestinal alkaline phosphatase, recombinant shrimp alkaline phosphatase (e.g. available from New England Biolabs, Beverly, MA)

[0057] In a further embodiment, the 3'-blocked nucleotide triphosphate is blocked by either a 3'-O-azidomethyl, 3'-O--NH.sub.2 or 3'-O-allyl group.

[0058] In still other embodiments, 3'-O-blocking groups of the invention include 3'-O-methyl, 3'-O-(2-nitrobenzyl), 3'-O-allyl, 3'-O-amine, 3'-O-azidomethyl, 3'-O-tert-butoxy ethoxy, 3'-O-(2-cyanoethyl), and 3'-O-propargyl.

[0059] In some embodiments, 3'-O-protection groups are electrochemically labile groups. That is, deprotection or cleavage of the protection group is accomplished by changing the electrochemical conditions in the vicinity of the protection group, which results in cleavage. Such changes in electrochemical conditions may be brought about by changing or applying a physical quantity, such as a voltage difference or light, to activate auxiliary species which, in turn, cause changes in the electrochemical conditions at the site of the protection group, such as an increase or decrease in pH. In some embodiments, electrochemically labile groups include, for example, pH-sensitive protection groups that are cleaved whenever the pH is changed to a predetermined value. In other embodiments, electrochemically labile groups include protecting groups which are cleaved directly whenever reducing or oxidizing conditions are changed, for example, by increasing or decreasing a voltage difference at the site of the protection group.

[0060] In some embodiments, enzymatic synthesis methods employ TdT variants that display increased incorporation activity with respect to 3'-O-modified nucleoside triphosphates. For example, such TdT variants may be produced using techniques described in Champion et al, U.S. Pat. No. 10,435,676, which is incorporated herein by reference. In some embodiments, a TdT variant is employed having an amino acid sequence at least 60 percent identical to SEQ ID NO: 2 and a substitution at a first arginine at position 207 and a substitution at a second arginine at position 325, or functionally equivalent residues thereof. In some embodiments, a terminal deoxynucleotidyl transferase (TdT) variant is employed that has an amino acid sequence at least sixty percent identical to an amino acid sequence selected from SEQ ID NO: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 with a substitution of arginine ("first arginine") at position 207 with respect to SEQ ID NOs 2, 3, 4, 6, 7, 9, 12 and 13, at position 206 with respect to SEQ ID NO 5, at position 208 with respect to SEQ ID NOs 8 and 10, at position 205 with respect to SEQ ID NO 11, at position 216 with respect to SEQ ID NO 14 and at position 210 with respect to SEQ ID NO 15; and a substitution of arginine ("second arginine") at position 325 with respect to SEQ ID NOs 2, 9 and 13, at position 324 with respect to SEQ ID NOs 3 and 4, at position 320 with respect to SEQ ID NO 5, at position 331 with respect to SEQ ID NOs 6 and 8, at position 323 with respect to SEQ ID NO 11, at position 328 with respect to SEQ ID NOs 12 and 15, and at position 338 with respect to SEQ ID NO 14; or functionally equivalent residues thereof; wherein the TdT variant (i) is capable of synthesizing a nucleic acid fragment without a template and (ii) is capable of incorporating a 3'-O-modified nucleotide onto a free 3'-hydroxyl of a nucleic acid fragment. 
In some embodiments, the above percent identity value is at least 80 percent identity with the indicated SEQ ID NOs; in some embodiments, the above percent identity value is at least 90 percent identity with the indicated SEQ ID NOs; in some embodiments, the above percent identity value is at least 95 percent identity with the indicated SEQ ID NOs; in some embodiments, the above percent identity value is at least 97 percent identity; in some embodiments, the above percent identity value is at least 98 percent identity; in some embodiments, the above percent identity value is at least 99 percent identity. As used herein, the percent identity values used to compare a reference sequence to a variant sequence do not include the expressly specified amino acid positions containing substitutions of the variant sequence; that is, the percent identity relationship is between sequences of a reference protein and sequences of a variant protein outside of the expressly specified positions containing substitutions in the variant. Thus, for example, if the reference sequence and the variant sequence each comprised 100 amino acids and the variant sequence had mutations at positions 25 and 81, then the percent identity would be in regard to sequences 1-24, 26-80 and 82-100.
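The percent identity convention just described, in which the expressly specified substitution positions are left out of the comparison, can be sketched as follows. This is an illustrative simplification: the function name is hypothetical, and the two sequences are assumed to be of equal length and already aligned without gaps, which a real comparison would not generally guarantee.

```python
def percent_identity(reference, variant, excluded_positions):
    """Percent identity over aligned positions, skipping excluded ones.

    Positions are 1-based. Both sequences are assumed to have the same
    length and to be aligned without gaps (a simplifying assumption).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    excluded = set(excluded_positions)
    compared = [
        (r, v)
        for i, (r, v) in enumerate(zip(reference, variant), start=1)
        if i not in excluded
    ]
    matches = sum(1 for r, v in compared if r == v)
    return 100.0 * matches / len(compared)

# Toy version of the 100-residue example: positions 25 and 81 carry the
# specified substitutions; position 10 carries one ordinary difference.
reference = "A" * 100
residues = list(reference)
residues[24] = "G"  # position 25, excluded from the comparison
residues[80] = "G"  # position 81, excluded from the comparison
residues[9] = "G"   # position 10, counted as a mismatch
variant = "".join(residues)

print(round(percent_identity(reference, variant, [25, 81]), 2))  # 98.98
```

Here 98 positions are compared (1-24, 26-80 and 82-100) and 97 of them match, so the substitutions at positions 25 and 81 do not lower the reported identity.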

[0061] In regard to (ii), such 3'-O-modified nucleotide may comprise a 3'-O--NH2-nucleoside triphosphate, a 3'-O-azidomethyl-nucleoside triphosphate, a 3'-O-allyl-nucleoside triphosphate, a 3'O-(2-nitrobenzyl)-nucleoside triphosphate, or a 3'-O-propargyl-nucleoside triphosphate.

[0062] In some embodiments, the above TdT variants have substitutions at the first and second arginines as shown in Table 2.

TABLE 2 TdT variants
SEQ ID NO	Substitutions
1	M192R/Q	C302G/R	R336L/N	R454P/N/A/V	E457N/L/T/S/K
2	M63R/Q	C173G/R	R207L/N	R325P/N/A/V	E328N/L/T/S/K
3	M63R/Q	C173G/R	R207L/N	R324P/N/A/V	E327N/L/T/S/K
4	M63R/Q	C173G/R	R207L/N	R324P/N/A/V	E327N/L/T/S/K
5	--	C172G/R	R206L/N	R320P/N/A/V	--
6	M63R/Q	C173G/R	R207L/N	R331P/N/A/V	E334N/L/T/S/K
7	M63R/Q	C173G/R	R207L/N	--	E328N/L/T/S/K
8	--	C174G/R	R208L/N	R331P/N/A/V	E334N/L/T/S/K
9	M73R/Q	C173G/R	R207L/N	R325P/N/A/V	E328N/L/T/S/K
10	M64R/Q	C174G/R	R208L/N	--	E329N/L/T/S/K
11	M61R/Q	C171G/R	R205L/N	R323P/N/A/V	E326N/L/T/S/K
12	M63R/Q	C173G/R	R207L/N	R328P/N/A/V	E331N/L/T/S/K
13	--	C173G/R	R207L/N	R325P/N/A/V	E328N/L/T/S/K
14	M63R/Q	C182G/R	R216L/N	R338P/N/A/V	E341N/L/T/S/K
15	M66R/Q	C176G/R	R210L/N	R328P/N/A/V	E331N/L/T/S/K

[0063] In some embodiments, further TdT variants for use with methods of the invention include one or more of the further substitutions of methionine, cysteine or glutamic acid, as shown in Table 2.

[0064] Further specific TdT variants that may be used in methods of the invention are set forth in Table 3. Each of the TdT variants DS1001 through DS1018 of Table 3 comprises an amino acid sequence at least 60 percent identical to SEQ ID NO 2 and comprises the substitutions at the indicated positions. In some embodiments, TdT variants DS1001 through DS1018 comprise an amino acid sequence at least 80 percent identical to SEQ ID NO 2 and comprise the substitutions at the indicated positions; in some embodiments, TdT variants DS1001 through DS1018 comprise an amino acid sequence at least 90 percent identical to SEQ ID NO 2 and comprise the substitutions at the indicated positions; in some embodiments, TdT variants DS1001 through DS1018 comprise an amino acid sequence at least 95 percent identical to SEQ ID NO 2 and comprise the substitutions at the indicated positions; in some embodiments, TdT variants DS1001 through DS1018 comprise an amino acid sequence at least 97 percent identical to SEQ ID NO 2 and comprise the substitutions at the indicated positions; in some embodiments, TdT variants DS1001 through DS1018 comprise an amino acid sequence at least 98 percent identical to SEQ ID NO 2 and comprise the substitutions at the indicated positions; in some embodiments, TdT variants DS1001 through DS1018 comprise an amino acid sequence at least 99 percent identical to SEQ ID NO 2 and comprise the substitutions at the indicated positions.

TABLE 3 Specific TdT Variants for Use with Methods of the Invention
DS1001 (TH M27)	A17V + L52F + M63R + A108V + C173G + R207L + K265T + G284P + E289V + R325P + E328N + R351K
DS1002 (M44)	A17V + Q37E + D41R + L52F + G57E + M63R + S94R + G98E + A108V + S119A + L131R + S146E + Q149R + C173G + R207L + K265T + G284P + E289V + R325P + Q326F + E328N + H337D + R351K + W377R
DS1003	A17V + Q37E + D41R + L52F + G57E + M63R + S94R + G98E + A108V + S146E + Q149R + C173G + F193Y + V199M + M201V + R207L + K265T + G284P + E289V + Q326F + E328N + R351K
DS1004 (M45)	A17V + Q37E + D41R + L52F + G57E + M63R + S94R + G98E + A108V + S146E + Q149R + C173G + F193Y + V199M + M201V + R207L + K265T + G284P + E289V + R325A + Q326F + E328N + R351K
DS1005	A17V + Q37E + D41R + L52F + G57E + M63R + S94R + G98E + A108V + S146E + Q149R + C173G + F193Y + V199M + M201V + R207L + K265T + G284P + E289V + Q326F + E328N + R351K
DS1006 (M46)	L52F + A108V + R351K + A17V + Q37E + D41R + G57E + C59R + L60D + M63R + S94R + G98E + S119A + L131R + S146E + Q149R + C173G + R207L + K265T + G284P + E289V + R325A + Q326F + E328N
DS1007 (M47)	L52F + A108V + R351K + A17V + Q37E + D41R + G57E + C59R + L60D + M63R + S94R + G98E + K118Q + S119A + L131R + S146E + Q149R + C173G + R207L + K265T + G284P + E289V + R325A + Q326F + E328N + W377R
DS1008	A17V + Q37E + D41R + L52F + G57E + C59R + L60D + M63R + S94R + G98E + A108V + S119A + L131R + S146E + Q149R + C173G + R207L + F259S + Q261L + G284P + E289V + R325A + Q326F + E328N + R351K + W377R
DS1009 (MS 13-34)	A17V + D41R + L53F + G57E + C59R + L60D + M63R + S94R + G98E + K118Q + S119A + L131R + S146E + Q149R + C173G + R207L + K265T + G284P + E289V + R325A + Q326F + R351K + W377R
DS1010 (MS 34-1)	A17V + D41R + L52F + G57E + C59R + L60D + M63R + S94R + G98E + A108V + S119A + L131R + S146E + Q149R + R207L + K265T + G284P + E289V + R325A + Q326F + R351K
DS1011	A17V + D41R + L53F + G57E + C59R + L60D + M63R + S94R + G98E + K118Q + S119A + L131R + S146E + Q149R + C173G + R207L + K265T + G284P + E289V + Q326F + R351K + W377R
DS1012 (M48)	A17V + Q37E + D41R + L52F + G57E + C59R + L60D + M63R + S94R + G98E + A108V + S119A + L131R + S146E + Q149R + C173G + R207L + F259S + Q261L + G284P + E289V + R325A + Q326F + E328N + R351K + W377R
DS1013	A17V + Q37E + D41R + L52F + G57E + M63R + S94R + G98E + A108V + S146E + Q149R + C173G + R207L + K265T + G284P + E289V + R325A + Q326F + E328N + R351K
DS1014 (M49)	A17V + Q37E + D41R + L52F + G57E + C59R + L60D + M63R + S94R + G98E + A108V + S119A + L131R + S146E + Q149R + C173G + R207L + E257D + F259S + K260R + Q261L + G284P + E289V + R325A + Q326F + E328N + R351K + W377R
DS1015	A17V + Q37E + D41R + L52F + G57E + C59R + L60D + M63R + S94R + G98E + A108V + S119A + L131R + S146E + Q149R + C173G + F193Y + V199M + M201V + R207L + E257D + F259S + K260R + Q261L + G284P + E289V + R325A + Q326F + E328N + R351K + W377R
DS1016 (TH c2_5)	A17V + D41R + L52F + G57E + M63R + S94R + G98E + A108V + S146E + Q149R + C173G + M184T + R207L + K209H + G284L + E289A + R325V + E328K + R351K
DS1017 (M27)	A17V + L52F + G57E + M63R + A108V + C173G + R207L + K265T + G284P + E289V + R325P + E328N + R351K
DS1018 (M60)	A17V + L32T + Q37R + D41R + L52F + G57E + C59R + L60D + M63R + S67A + S94R + G98E + A108V + S119A + L131R + S146E + Q149R + V171A + S172E + C173R + V182I + S183E + R207L + K209H + M210K + T211I + E223G + A224P + E228D + Q261L + G284P + E289V + R325A + Q326F + E328N + R351K + D372E
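The substitution notation used for the variants of Table 3 (original residue, 1-based position, replacement residue, joined by "+") lends itself to mechanical parsing. The following is a minimal sketch; the function names and the four-residue toy sequence are hypothetical and are not taken from any SEQ ID NO of this application.

```python
import re

# One substitution: original residue, position, replacement residue.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def parse_variant(spec):
    """Parse 'A17V + L52F' into [(original, position, replacement), ...]."""
    substitutions = []
    for token in spec.split("+"):
        match = SUBSTITUTION.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized substitution: {token!r}")
        original, position, replacement = match.groups()
        substitutions.append((original, int(position), replacement))
    return substitutions

def apply_variant(sequence, substitutions):
    """Apply substitutions (1-based) after checking each original residue."""
    residues = list(sequence)
    for original, position, replacement in substitutions:
        if residues[position - 1] != original:
            raise ValueError(f"expected {original} at position {position}")
        residues[position - 1] = replacement
    return "".join(residues)

toy_reference = "MARK"  # hypothetical 4-residue sequence, not a real TdT
subs = parse_variant("A2V + K4T")
print(subs)                                # [('A', 2, 'V'), ('K', 4, 'T')]
print(apply_variant(toy_reference, subs))  # MVRT
```

Checking the original residue before substituting catches position numbering that does not match the reference sequence in use.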

[0065] TdT variants of the invention as described above each comprise an amino acid sequence having a percent sequence identity with a specified SEQ ID NO, subject to the presence of indicated substitutions. In some embodiments, the number and type of sequence differences between a TdT variant of the invention described in this manner and the specified SEQ ID NO may be due to substitutions, deletion and/or insertions, and the amino acids substituted, deleted and/or inserted may comprise any amino acid. In some embodiments, such deletions, substitutions and/or insertions comprise only naturally occurring amino acids. In some embodiments, substitutions comprise only conservative, or synonymous, amino acid changes, as described in Grantham, Science, 185: 862-864 (1974). That is, a substitution of an amino acid can occur only among members of its set of synonymous amino acids. In some embodiments, sets of synonymous amino acids that may be employed are set forth in Table 4A.

TABLE 4A Synonymous Sets of Amino Acids I
Amino Acid	Synonymous Set
Ser	Ser, Thr, Gly, Asn
Arg	Arg, Gln, Lys, Glu, His
Leu	Ile, Phe, Tyr, Met, Val, Leu
Pro	Gly, Ala, Thr, Pro
Thr	Pro, Ser, Ala, Gly, His, Gln, Thr
Ala	Gly, Thr, Pro, Ala
Val	Met, Tyr, Phe, Ile, Leu, Val
Gly	Gly, Ala, Thr, Pro, Ser
Ile	Met, Tyr, Phe, Val, Leu, Ile
Phe	Trp, Met, Tyr, Ile, Val, Leu, Phe
Tyr	Trp, Met, Phe, Ile, Val, Leu, Tyr
Cys	Cys, Ser, Thr
His	His, Glu, Lys, Gln, Thr, Arg
Gln	Gln, Glu, Lys, Asn, His, Thr, Arg
Asn	Asn, Gln, Asp, Ser
Lys	Lys, Glu, Gln, His, Arg
Asp	Asp, Glu, Asn
Glu	Glu, Asp, Lys, Asn, Gln, His, Arg
Met	Met, Phe, Ile, Val, Leu
Trp	Trp

[0066] In some embodiments, sets of synonymous amino acids that may be employed are set forth in Table 4B.

TABLE-US-00005 TABLE 4B Synonymous Sets of Amino Acids II

Amino Acid	Synonymous Set
Ser	Ser
Arg	Arg, Lys, His
Leu	Ile, Phe, Met, Leu
Pro	Ala, Pro
Thr	Thr
Ala	Pro, Ala
Val	Met, Ile, Val
Gly	Gly
Ile	Met, Phe, Val, Leu, Ile
Phe	Met, Tyr, Ile, Leu, Phe
Tyr	Trp, Met
Cys	Cys, Ser
His	His, Gln, Arg
Gln	Gln, Glu, His
Asn	Asn, Asp
Lys	Lys, Arg
Asp	Asp, Asn
Glu	Glu, Gln
Met	Met, Phe, Ile, Val, Leu
Trp	Trp
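
By way of illustration only, the rule that a substitution may occur only among members of an amino acid's synonymous set can be expressed as a simple lookup. The following is a minimal sketch in Python that transcribes the sets of Table 4B as listed; the names TABLE_4B and is_synonymous are hypothetical and are not part of the disclosure.

```python
# Synonymous sets transcribed from Table 4B as listed in the text.
# TABLE_4B and is_synonymous are illustrative names only.
TABLE_4B = {
    "Ser": {"Ser"},
    "Arg": {"Arg", "Lys", "His"},
    "Leu": {"Ile", "Phe", "Met", "Leu"},
    "Pro": {"Ala", "Pro"},
    "Thr": {"Thr"},
    "Ala": {"Pro", "Ala"},
    "Val": {"Met", "Ile", "Val"},
    "Gly": {"Gly"},
    "Ile": {"Met", "Phe", "Val", "Leu", "Ile"},
    "Phe": {"Met", "Tyr", "Ile", "Leu", "Phe"},
    "Tyr": {"Trp", "Met"},
    "Cys": {"Cys", "Ser"},
    "His": {"His", "Gln", "Arg"},
    "Gln": {"Gln", "Glu", "His"},
    "Asn": {"Asn", "Asp"},
    "Lys": {"Lys", "Arg"},
    "Asp": {"Asp", "Asn"},
    "Glu": {"Glu", "Gln"},
    "Met": {"Met", "Phe", "Ile", "Val", "Leu"},
    "Trp": {"Trp"},
}


def is_synonymous(original: str, replacement: str, table=TABLE_4B) -> bool:
    """Return True if `replacement` is in the synonymous set of `original`."""
    return replacement in table[original]


# Example: Lys -> Arg is within the Lys set; Lys -> Glu is not.
print(is_synonymous("Lys", "Arg"), is_synonymous("Lys", "Glu"))
```

The same function may be used with the broader sets of Table 4A by passing a different dictionary as the table argument.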

Synthesizing Oligonucleotides on Biomolecules

[0067] Biomolecules on which oligonucleotides may be synthesized in accordance with the invention include, but are not limited to, polynucleotides, peptides, proteins, glycans, polysaccharides, and the like. Virtually any biomolecule or other material to which an initiator can be attached can have an oligonucleotide synthesized on it by methods of the invention. As noted above, for polynucleotides, such as cDNAs, genomic fragments, or the like, a variety of different initiator attachment schemes are available, including schemes resulting in a covalent bond between an initiator and a biomolecule or surface and schemes resulting in a non-covalent bond between an initiator and biomolecule or surface, such as the formation of a duplex between the initiator and another complementary oligonucleotide attached to a surface or biomolecule, or the formation of a complex between a capture moiety and its complementary moiety, such as a biotin and streptavidin.

[0068] Polynucleotides to which an oligonucleotide has been synthesized may be detached from a solid support in a variety of ways. Initiators hybridized to a capture oligonucleotide may simply be melted or dehybridized from the capture oligonucleotide, or the duplex may be designed to include a restriction endonuclease or nickase recognition site. In embodiments in which an initiator is covalently attached to a surface, several techniques are available to cleave a single strand, e.g. inserting a uracil at a predetermined location in an initiator, Delort et al, Nucleic Acids Research, 13: 319-335 (1985).

[0069] Oligonucleotide initiators may be attached to proteins, such as antibodies, using well-known techniques, such as described in the following references: Hermanson (cited above); Baskin et al, Proc. Natl. Acad. Sci., 104(43): 16793-16797 (2007); Gong et al, Bioconjugate Chemistry, 27: 217-225 (2016); Horisawa, Frontiers in Physiology, 5: 1-6 (2014); Jewett et al, Chem. Soc. Rev., 39(4): 1272-1279 (2010); U.S. Pat. No. 5,665,539; and the like.

[0070] Once an initiator is attached, enzymatic synthesis may be performed to extend the initiator. In some embodiments, proteins are reversibly attached to a solid support prior to synthesis. As with polynucleotides, such attachment may be covalent or non-covalent. If the protein is a recombinant protein, attachment may be by way of a peptide tag, such as a poly-histidine tag, or like method. In some embodiments, proteins may be immobilized on a solid support by capture and binding to an antibody attached to the solid support.

Synthesizing Oligonucleotides on Biological Cells

[0071] The value of single cell measurements has been long appreciated for assessing rare subpopulations which otherwise would be undetectable from ensemble measurements, which provide only averages of cellular parameters from many cells, e.g. Di Carlo et al, Methods in Molecular Biology, 853: 1-9 (2012). As a result, a range of technologies has been developed for high-throughput single cell analysis, e.g. reviewed in Shapiro et al, Nature Reviews Genetics, 14: 618-630 (2013). A common approach in many of these technologies has included the formation of single cell-containing reactors by stochastically distributing cells of a population into small reaction volumes for analysis. Although such stochastic methods permit handling cells in "bulk" mixtures, the methods allow only limited control of the numbers of cells that end up in the small volumes, e.g. Koster et al, Lab Chip, 8: 1110-1115 (2008), so that typically the higher the concentration of cells in the starting population, the greater the number of small volumes that end up with two or more cells. Since successful single-cell analysis depends on having only one cell in each reaction volume, very low cellular concentrations of starting populations are selected to avoid the occurrence of cellular doublets. Unfortunately, this creates significant inefficiencies in analyses conducted downstream of such stochastic isolation steps. This problem is exacerbated when cell-specific barcodes are delivered to cells by coalescing droplets carrying cells with droplets carrying barcoded beads, which are also stochastically distributed in the droplets. Thus, the availability of a technique to directly synthesize a unique barcode on a cell would obviate the requirement of delivering a single bead to a single cell.
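
The trade-off between doublets and wasted reaction volumes may be illustrated numerically. The sketch below assumes that the number of cells per reaction volume follows a Poisson distribution, which is a common modeling assumption for stochastic loading and is not stated in the cited references; the function name poisson_occupancy is hypothetical.

```python
import math


def poisson_occupancy(mean_cells_per_volume: float) -> dict:
    """Occupancy fractions for reaction volumes, assuming Poisson loading.

    The Poisson model is an assumption made for illustration only.
    """
    lam = mean_cells_per_volume
    p_empty = math.exp(-lam)                # no cell in the volume
    p_single = lam * p_empty                # exactly one cell
    p_multiple = 1.0 - p_empty - p_single   # two or more cells
    occupied = 1.0 - p_empty
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": p_multiple,
        # Share of cell-containing volumes that hold more than one cell.
        "multiplet_fraction_of_occupied": p_multiple / occupied if occupied > 0 else 0.0,
    }


for lam in (0.05, 0.1, 0.5, 1.0):
    r = poisson_occupancy(lam)
    print(f"mean={lam}: empty={r['empty']:.3f}, single={r['single']:.3f}, "
          f"multiple={r['multiple']:.4f}")
```

Under this model, lowering the mean loading reduces the fraction of volumes with two or more cells but leaves most volumes empty, which is the inefficiency noted above.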

[0072] Methods of the invention may be applied to a wide range of biological cells, including but not limited to, mammalian cells, yeast cells, bacterial cells, protozoan cells, fungal cells, plant cells, and the like. In some embodiments, methods of the invention are applied to mammalian cells. Such mammalian cells may be free of tissues, e.g. white blood cells, or such cells may be tissue-bound cells which have been disaggregated. During synthesis of oligonucleotides on living cells, reaction conditions are selected to maintain the cells in a viable state. Such conditions (sometimes referred to herein as "biological conditions" or "viable conditions" or "cell-viable conditions") include disposing and maintaining cells in reaction mixtures that comprise a physiological salt solution that permits a balance of osmotic pressure across cell membranes, a pH in the range of from 6.8 to 7.8, and a temperature in a range of from 15° C. to 41° C. In some embodiments, a temperature in the range of from 25° C. to 38° C. is employed. Physiological salt solutions may include sodium, calcium and/or potassium ions in an aqueous solvent at a concentration in the range of 0.8-1.0 percent (w/v). For example, 0.9 percent (w/v) of sodium chloride in distilled water is a common physiological salt solution. It is understood that such physiological conditions are averages and that in particular implementations of the invention there may be brief deviations from such conditions without significant harm to the cells or biomolecules, for example, in deprotection steps. Likewise, it is understood that some biological cells may be viable in conditions outside those mentioned above, e.g. thermophilic organisms.
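
For illustration, the ranges recited above may be captured as a simple check applied to a planned reaction mixture. The following sketch is not part of the disclosed methods; the class ReactionConditions and the function violations are hypothetical names, and the check ignores the brief deviations that the text permits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionConditions:
    ph: float
    temperature_c: float
    salt_percent_wv: float


def violations(c: ReactionConditions, narrow_temperature: bool = False) -> list:
    """List which recited ranges a set of conditions falls outside of.

    Ranges are those recited in the text: pH 6.8-7.8, 15-41 deg C
    (25-38 deg C in some embodiments), salt 0.8-1.0 percent (w/v).
    """
    t_low, t_high = (25.0, 38.0) if narrow_temperature else (15.0, 41.0)
    problems = []
    if not 6.8 <= c.ph <= 7.8:
        problems.append("pH outside 6.8-7.8")
    if not t_low <= c.temperature_c <= t_high:
        problems.append(f"temperature outside {t_low}-{t_high} deg C")
    if not 0.8 <= c.salt_percent_wv <= 1.0:
        problems.append("salt concentration outside 0.8-1.0 percent (w/v)")
    return problems


print(violations(ReactionConditions(ph=7.4, temperature_c=37.0, salt_percent_wv=0.9)))
print(violations(ReactionConditions(ph=6.0, temperature_c=45.0, salt_percent_wv=0.9)))
```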

[0073] Attaching initiators to cells. A first step in generating a unique cellular label is attaching initiators to cells of a target population. This is accomplished using a variety of conventional techniques including, but not limited to, attaching an initiator to one or more antibodies specific for cell surface markers, integrating an initiator into an aptamer specific for cell surface markers, using click chemistry techniques to attach initiators directly to cell surface proteins, and generating initiators with 5'-lipophilic tails which insert into the membranes of the target cells. Examples of such labeling techniques are described in the following references: Weber et al, Biomacromolecules, 15: 4621-4626 (2014); Borisenko et al, Nucleic Acids Research, 37(4): e28 (2009); Sano et al, Science, 258: 120-122 (1992); Kazane et al, Proc. Natl. Acad. Sci., 109(10): 3731-3736 (2012); Nikic et al, Nature Protocols, 10: 780-791 (2015); Baskin et al, Proc. Natl. Acad. Sci., 104(43): 16793-16797 (2007); Jewett et al, Chem. Soc. Rev., 39(4): 1272-1279 (2010); Li et al, Chem. Sci., 8: 2107 (2017).

[0074] "Split and Mix" Barcoding of Cells. In some embodiments, the invention provides methods for uniquely barcoding cells, whether living cells, fixed cells, or fixed and permeabilized cells. For example, in testing or screening compounds for biological effects, such as changes in gene expression, populations of cells may be barcoded after the cells have been exposed to different agents or compounds. Samples of such cells may be tested while viable for changes in gene expression, for example, of cell surface molecules, or such cells may be fixed and permeabilized and tested for changes in the expression of both cellular proteins and mRNA. In some embodiments, protein expression may be monitored using one or more protein-specific antibodies each linked to a distinct initiator that may be extended using the enzymatic synthesis methods of the invention. In some embodiments, mRNA expression may be monitored using mRNA-specific primers to generate cDNAs that may be extended as described in FIGS. 1B-1C. After barcodes are synthesized (whether before or after testing), they may be harvested and tabulated, for example, by amplification, isolation, and sequencing, as illustrated in FIG. 2B for barcodes carried by antibodies. Such measurements are analogous to the more cumbersome barcoding scheme based on the hybridization of barcode subunits, e.g. described in Nolan, U.S. patent publication 2016/0251697. Thus, in some embodiments, the invention may be employed to measure the distribution of multiple epitopes on single cells of a large population of cells.

[0075] Similar to adding unique cellular tags in the case of beads described above, the invention also may be used to attach unique position tags in the case of spatial patterns of gene expression in a tissue slice disposed on a planar surface, as illustrated in FIG. 1F. Procedures for placing tissue slices on a planar array of oligonucleotides, identifying and imaging tissue features (such as cell boundaries), permeabilizing cells of tissues, and implementing reverse transcriptase reactions to produce a cDNA library attached to a planar array are disclosed in Stahl et al, Science, 353: 78-82 (2016); and Frisen et al, U.S. Pat. Nos. 9,593,365 and 10,030,261; and like references, which are incorporated herein by reference. Briefly, referring to FIG. 1F, planar array (164) is provided with a uniform coating of oligonucleotides (180), with a controlled density, attached by their 5' ends, wherein the oligonucleotides (shown in magnified view (165)) comprise segment (166), such as a primer binding site, for later amplification and manipulation of a cDNA, optional segment (167) comprising a molecular tag (sometimes referred to as a "unique molecular identifier" or UMI) which facilitates quantification of cDNA molecules even after amplification, and segment (168), such as a polyT segment, which permits capture of mRNA released from cells. The UMI (167) may comprise a random nucleotide segment. Oligonucleotides (180) may be made in bulk using conventional techniques and applied to the surface of planar array (164) in a single step. Different kinds of oligonucleotides, for example, oligonucleotides with different position tags, are not required. Segment (167) may also include a cleavable linker or cleavable nucleotide for releasing cDNAs for analysis, such as by sequencing. Onto array (164) is disposed a slice or thin layer (181) (e.g. 100-1000 µm thick) of tissue, which is then treated (169) (i) to identify features, such as cells or sub-tissues, of interest and to record and/or correlate such information to locations on planar array (164), and (ii) to permeabilize cells in the tissue so that mRNA is released and allowed to diffuse to and be captured by oligonucleotides (180). The image information is used to define regions on array (164) within which common position tags are synthesized on cDNAs. Treatments may include staining with tissue-specific or biomolecule-specific compounds or dyes. The position tags allow cDNAs to be harvested and sequenced in bulk, yet be related to specific regions by their position tags. After the above steps (i) and (ii), reagents for a reverse transcriptase reaction are applied in order to synthesize cDNAs (171) using captured mRNAs (170) as templates to produce a spatial cDNA library array. Tissue slice (181) is then removed leaving array (164) with a pattern of different cDNAs attached to its surface. The different cDNAs at the different positions may be identified and quantified by attaching position tags to samples of cDNAs from a plurality of locations by inkjet delivery of synthesis reagents for the tags, which is illustrated in FIG. 1E by the superposition of synthesis locations (182) on cDNA pattern (175). In some embodiments, such plurality may be at least 100 positions, or at least 1000 positions, or at least 10,000 positions; in other embodiments, such plurality may be in the range of from 10 to 50,000 positions; or from 10 to 10,000 positions; or from 10 to 1000 positions. Guidance for design and control of inkjet delivery systems is well known by those with skill in the art and may be found in U.S. patent publication US2003/0170698 and U.S. Pat. Nos. 6,306,599; 6,323,043; 7,276,336; 7,534,561; and like references. Alternatively, an electrode array may be employed wherein synthesis steps, such as deprotection of electrochemically sensitive protection groups, e.g. 3'-O-azidomethyl, may be effected by altering a potential at electrodes in the array, e.g. Montgomery, U.S. Pat. Nos. 6,093,302, 6,444,111 and 6,280,595; Gindilis, U.S. Pat. No. 9,339,782; Maurer et al, U.S. Pat. No. 9,267,213; Maurer et al, PLoS ONE, December 2006, issue 1, e34; Fomina et al, Lab Chip, 16: 2236-2244 (2016); Kavusi et al, U.S. Pat. No. 9,075,041; Johnson et al, U.S. Pat. Nos. 9,874,538 and 9,910,008; Gordon et al, U.S. Pat. No. 6,251,595; Levine et al, IEEE J. Solid State Circuits, 43: 1859-1871 (2008); and the like.

[0076] Position tags (173) are selected (e.g. are long enough) to uniquely identify each location or region of interest. Additional segment (174) may be added to facilitate manipulation and sequencing of cDNAs (171). In some embodiments, this application of the invention may be carried out with the following steps: (a) providing an array comprising a uniform coating of capture probes each comprising a capture segment; (b) contacting a tissue sample with the array and allowing the nucleic acid of the tissue sample to interact with the capture domain of the capture probe so that the nucleic acid is captured; (c) treating the tissue sample to identify different regions of the tissue sample; (d) generating a nucleic acid molecule from the nucleic acid that interacts with the capture domain; (e) enzymatically synthesizing position tags onto the nucleic acid molecules; (f) determining the region that is associated with the nucleic acid that interacts with the capture domain; and (g) correlating the determined regions to the cDNAs. In some embodiments, the nucleic acid molecules from the tissue sample are RNA. In other embodiments, the nucleic acid molecules from the tissue sample may be genomic DNA. In other embodiments, the nucleic acid molecules from the tissue sample may be mRNA.
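
The requirement that position tags be "long enough" can be made concrete with a short calculation: a tag of length L over four nucleotides distinguishes at most 4^L positions. The sketch below computes the minimum such length and enumerates tags in a fixed order. It is an illustration only; the function names are hypothetical, and a practical design might well use longer tags, for example to tolerate synthesis or sequencing errors, which is an assumption not addressed here.

```python
def min_tag_length(num_positions: int, alphabet_size: int = 4) -> int:
    """Smallest L such that alphabet_size**L >= num_positions."""
    if num_positions < 1:
        raise ValueError("num_positions must be at least 1")
    length, capacity = 0, 1
    while capacity < num_positions:
        capacity *= alphabet_size
        length += 1
    return length


def index_to_tag(index: int, length: int, alphabet: str = "ACGT") -> str:
    """Write a position index as a fixed-length tag (most significant base first)."""
    base = len(alphabet)
    if not 0 <= index < base ** length:
        raise ValueError("index does not fit in a tag of this length")
    letters = []
    for _ in range(length):
        index, digit = divmod(index, base)
        letters.append(alphabet[digit])
    return "".join(reversed(letters))


for n in (100, 1000, 10_000, 50_000):
    print(n, "positions need tags of at least", min_tag_length(n), "nucleotides")
```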

[0077] Similarly, as illustrated in FIG. 1G, spatial distributions of proteins in a tissue sample may be identified by using antibodies with releasable oligonucleotide barcodes that contain a polyA region and an antibody identification region, that is, an antibody barcode that identifies the protein that the antibody is specific for. In some embodiments, antibodies may carry two tags: an antibody barcode as described above and a fluorescent label which would assist in the optical analysis of the tissue and later correlation of antibody positions with tissue structures or protein distributions of interest. As above, planar array (1164) is provided with a uniform coating of oligonucleotides (1180) attached by their 5' ends, wherein the oligonucleotides (shown in magnified view (1165)) optionally comprise segment (1166), such as a primer binding site, for later amplification and manipulation of an antibody barcode, optional segment (1167) comprising a molecular tag (sometimes referred to as a "unique molecular identifier" or UMI) which facilitates quantification of antibody molecules even after amplification, and segment (1168), such as a polyT segment, which permits capture of antibody barcodes (1183) (SEQ ID NO: 18) released from bound antibodies. Release may be effected by a chemically labile bond in a linker between antibody barcode and the antibody, such as a disulfide moiety.

[0078] The UMI (1167) may comprise a random nucleotide segment. Different kinds of oligonucleotides, for example, oligonucleotides with different position tags, are not required because the position tags are synthesized later using methods of the invention. Segment (1166) may also include a cleavable linker or cleavable nucleotide for releasing antibody barcodes for analysis, such as by sequencing. Onto array (1164) is disposed a slice or thin layer (1181) (e.g. 100-1000 µm thick) of tissue, which is then treated (1169) (i) to identify features, such as cells or sub-tissues, of interest and to record and/or correlate such information to locations on planar array (1164), and (ii) to permeabilize cells in the tissue so that antibodies can access target proteins and so that released antibody barcodes can diffuse to and be captured by oligonucleotides (1180). The image information is used to define regions on array (1164) within which common position tags are synthesized on antibody barcodes. As above, the position tags allow antibody barcodes to be harvested and sequenced in bulk, yet be related to specific regions by their position tags. After the above steps (i) and (ii), reagents for a reverse transcriptase reaction are applied in order to synthesize complements of the antibody barcodes (1171), just as for mRNAs above, using captured antibody barcodes (1170) as templates. Tissue slice (1181) is then removed leaving array (1164) with a pattern of different cDNAs attached to its surface. The different cDNAs at the different positions may be identified and quantified by attaching position tags to samples of antibody barcodes from regular locations by inkjet delivery of synthesis reagents for the position tags. As with cDNAs, position tags (1173) on antibody barcodes are selected (e.g. are long enough) to uniquely identify each location or region of interest.

[0079] Similarly, as illustrated in FIG. 1H, spatial patterns of gene expression and distribution of proteins in a tissue slice may be identified by using a planar array comprising a combination of oligonucleotides and DNA-labeled antibodies with identifiers, that is, an antibody-specific DNA sequence that identifies the protein that the antibody is specific for. Briefly, referring to FIG. 1H, planar array (3164) is provided with a uniform coating of oligonucleotides (3180) attached by their 5' ends, and a uniform coating of antibodies (3191) comprising DNA label (3190) that may be attached to one or more amino acids of antibody (3191). The density of oligonucleotides (3180) and antibodies (3191) of each kind on the planar array (3164) is controlled so that the density of oligonucleotides is predetermined and the density of each kind of antibody (i.e. antibodies with different specificities) is predetermined. As above, oligonucleotides (3180) (shown in magnified view (3165)) comprise segment (3166), such as a primer binding site, for later amplification and manipulation of a cDNA, optional segment (3167) comprising a molecular tag (sometimes referred to as a "unique molecular identifier" or UMI) which facilitates quantification of cDNA molecules even after amplification, and segment (3168), such as a polyT segment, which permits capture of mRNA released from cells. The UMI (3167) may comprise a random nucleotide segment. Antibodies (3191) (shown in magnified view (3165)) comprise a DNA label (3190) (attached to one or more amino acids of the antibody) and segment (3192) comprising a sequence identifier that identifies the protein that the antibody is specific for. Onto array (3164) is disposed a slice or thin layer (3181) (e.g. 100-1000 µm thick) of tissue, which is then treated (3169) (i) to identify features, such as cells or sub-tissues, of interest and to record and/or correlate such information to locations on planar array (3164), and (ii) to permeabilize cells in the tissue so that mRNA and proteins are released and allowed to diffuse to and be captured by oligonucleotides and antibodies (3180 and 3191, respectively). The image information may be used to define regions on array (3164) within which common position tags are synthesized on cDNAs or antibody DNA. Treatments may include staining with tissue-specific or biomolecule-specific compounds or dyes. The position tags allow cDNAs and DNA attached to antibodies to be harvested and sequenced in bulk, yet be related to specific regions of the tissue by their position tags. After the above steps (i) and (ii), reagents for a reverse transcriptase reaction are applied in order to synthesize cDNAs (3171) using captured mRNAs (3170) as templates. Secondary antibodies (3197) that bind to the same molecules (3193) as immobilized antibodies (3191) are applied to the array in order to form a capture sandwich (as in a sandwich ELISA assay). Secondary antibodies (3197) comprise, attached to one or more amino acids, a DNA label (3194), and segment (3195) comprising a sequence identifier that identifies the protein that the antibody is specific for. Identifier segment (3192) of immobilized antibodies (3191) and identifier segment (3195) of secondary antibodies may be the same or different but are associated with and identify the antibody pair that recognizes the same protein (3193). In addition, 3' regions of the immobilized antibodies' DNA label (3190) and of the secondary antibodies' DNA label (3194) are complementary in order to synthesize antibody DNA strands (3196) during a polymerase elongation step. Tissue slice (3181) is then removed leaving array (3164) with a pattern of different cDNAs and antibody DNA attached to its surface. The different cDNAs and antibody DNA at the different positions may be identified and quantified by attaching position tags (3173) and manipulation segments (3174) to samples of cDNAs and antibody DNA from regular locations by inkjet delivery of synthesis reagents.

[0080] FIG. 1I illustrates an embodiment for focusing analysis on a particular surface area (i.e. subregion) of the array (164) that is of particular interest. After following the procedure described by FIG. 1F, a first pass of sequencing analysis (4169) may reveal that a particular surface area (4172) of interest would require better spatial sequencing resolution. A second pass of inkjet delivery of synthesis reagents using the same array but with an offset pitch (4170) is used to generate additional synthesis locations (4183) in the area left untagged by the initial synthesis locations (4182). During this additional tagging step, different position tags (4175) are synthesized compared to initial position tags (4173). Additional segments (4174 and 4176) may be added to facilitate manipulation and sequencing of DNA. Subsequent passes of inkjet delivery of synthesis reagents on the same array (4171) may be carried out to further refine the analysis by increasing the spatial resolution of the sequencing. Furthermore, this focused analysis method can be applied equally well to oligonucleotide arrays (FIG. 1F) or to oligonucleotide and antibody arrays (FIG. 1H).

[0081] Although FIGS. 1F-1I call for the use of arrays of capture oligonucleotides attached to solid surfaces, methods of the invention permit direct synthesis on tissues without necessarily requiring that analytes of interest, e.g. mRNAs or antibody barcodes, diffuse to and be captured by capture probes attached to an array. In some embodiments, synthesis of position tags may take place directly on a tissue section, with or without prior permeabilization. By way of example, such embodiments may be implemented in the following steps: (a) disposing on a tissue section under binding conditions a plurality of antibodies each capable of specifically binding to a different one of a plurality of proteins, each different antibody having releasably attached an antibody barcode, the antibody barcode comprising an initiator with a free 3'-hydroxyl; (b) repeating for a plurality of cycles at predetermined positions on the tissue section the steps of (i) contacting the initiator or elongated fragments having free 3'-hydroxyls with a 3'-O-blocked nucleoside triphosphate and a template-independent DNA polymerase so that the initiator or elongated fragments are elongated by incorporation of a 3'-O-blocked nucleoside triphosphate to form 3'-O-blocked elongated fragments, and (ii) enzymatically deblocking the elongated fragments to form elongated fragments having free 3'-hydroxyls, thereby synthesizing a different position tag onto the releasably attached antibody barcodes at each different position to form position tag-antibody barcode conjugates; (c) releasing the position tag-antibody barcode conjugates; and (d) sequencing the released position tag-antibody barcode conjugates to determine a spatial distribution of the plurality of proteins in the tissue section. In some embodiments, a step of permeabilizing cells of the tissue section may be included either to expose intracellular protein targets or to synthesize position tags directly on intracellular mRNAs.
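
The cycle of step (b) may be pictured as a small state machine in which each strand alternates between a free and a blocked 3' end, so that exactly one nucleotide is added per cycle. The following Python sketch models only this bookkeeping, not the chemistry or enzymology; the class Strand, the function names, the example initiator sequence and the mapping of positions to tags are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Strand:
    sequence: str          # initiator plus nucleotides added so far (5' to 3')
    blocked: bool = False  # True while the 3' end carries a blocking group


def add_blocked_nucleotide(strand: Strand, base: str) -> None:
    """Step (i): extend a free 3'-hydroxyl by one 3'-O-blocked nucleotide."""
    if strand.blocked:
        raise RuntimeError("3' end is blocked; deblock before the next addition")
    strand.sequence += base
    strand.blocked = True


def deblock(strand: Strand) -> None:
    """Step (ii): remove the blocking group to regenerate a free 3'-hydroxyl."""
    strand.blocked = False


def synthesize_position_tags(initiator: str, tags_by_position: dict) -> dict:
    """Run addition/deblocking cycles so each position receives its own tag."""
    strands = {pos: Strand(initiator) for pos in tags_by_position}
    n_cycles = max(len(tag) for tag in tags_by_position.values())
    for cycle in range(n_cycles):
        for pos, tag in tags_by_position.items():
            if cycle < len(tag):
                add_blocked_nucleotide(strands[pos], tag[cycle])
                deblock(strands[pos])
    return {pos: s.sequence for pos, s in strands.items()}


# Two positions on a section, each assigned a different two-nucleotide tag.
print(synthesize_position_tags("GATTACA", {(0, 0): "AC", (0, 1): "GT"}))
```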

[0082] Embodiments described in FIGS. 1F-1I may be implemented by the steps: (a) capturing biomolecules from a tissue slice disposed on a solid surface wherein each biomolecule comprises or can be modified to comprise an oligonucleotide identifying the biomolecule captured and having a free 3'-hydroxyl; (b) synthesizing a position tag on the free 3'-hydroxyls of the oligonucleotides at a plurality of different positions on the solid surface by template-free enzymatic synthesis; and (c) sequencing the oligonucleotides to determine a spatial distribution of biomolecules in the tissue slice. In some embodiments, a further step of releasing the oligonucleotides is implemented using conventional linking chemistries and protocols, e.g. described in the above-cited references. In some embodiments, biomolecules are polynucleotides (such as mRNA, RNA, or antibody barcodes), proteins, or the like. Biomolecules may be captured by complementary oligonucleotides attached to the solid surface or antibodies attached to the solid surface. In the latter case, antibody binding pairs (such as used in ELISAs) may be applied to a solid surface after capture of protein biomolecules. In some embodiments, either one or both antibodies of a binding pair may have oligonucleotide barcodes to which position tags may be synthesized.

[0083] FIGS. 2A-2D illustrate the above concepts for embodiments in which expression of selected proteins (or epitopes) is measured using specific antibody binding compounds and expression of either all or selected genes is measured using primers specific for all mRNAs or selected mRNAs. In both classes of probe, the attached oligonucleotide label may identify the compound it specifically binds to as well as serve as an initiator for enzymatic synthesis. FIG. 2A illustrates an embodiment for implementing a process of "split and mix" tag synthesis. A population of viable cells (200) is combined with a set of antibodies (204) each labeled with oligonucleotides (202) which both identify the antibody (and therefore its target protein) and serve as an initiator. Cells (200) are combined in a common vessel (206) and then distributed (usually in equal parts) to multi-well array (212) and into one of four wells 210a-210d in which one enzymatic elongation cycle is carried out, for example, one A extension in 210a, one G extension in 210b, one C extension in 210c, and one T extension in 210d. After deblocking the added nucleotide, the cells (200) are harvested and combined again in a common vessel (206). A random nucleotide tag of increasing length is generated with each nucleotide addition, so that with n additions 4^n tags are generated. To minimize the manipulation and possible damage to cells, if the size of population (200) is known, then the number of cycles may be limited to a number that ensures a high probability of cells having unique tags, but that minimizes cell damage or loss. For example, if population (200) consisted of 10^6 cells then 10-11 cycles generates 1-4 × 10^6 unique tags. In some embodiments, a number of cycles is implemented to ensure that each cell carries a unique oligonucleotide tag with a probability of 99 percent or higher. In some embodiments, once a number of cycles has been implemented to give substantially all cells a unique oligonucleotide tag (218), then the tags may be harvested and analyzed by large-scale sequencing (and, for example, the expression of each protein and each gene in each cell can be tabulated). As illustrated in FIG. 2B, the initial oligonucleotides attached to either antibodies or primers may include other segments for molecular manipulation. For example, oligonucleotide (232) on antibody (230) may comprise segment (234) which may include a code for identifying the specificity of antibody (230) as well as further sequences for later manipulation, such as for PCR amplification. Oligonucleotide (232) also includes segment (236) having a free 3'-hydroxyl which serves as an initiator for an initial cycle of nucleotide addition. After a number of cycles is carried out to attach tag nucleotides (238 and 239), further nucleotides may be added with no splitting or mixing in order to attach a common segment (235), e.g. a primer sequence, to permit manipulation and analysis of the tags and protein or gene identification sequences. In some embodiments, this may be accomplished by amplifying the attached oligonucleotides to form amplicon (233) which then may be analyzed (231) by high throughput DNA sequencing.
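
The relationship between the number of cycles, the number of possible tags, and the chance that a cell's tag is unique may be sketched as follows. The sketch assumes that each cell receives one of the 4^n tags uniformly and independently of the other cells, and it reads "unique with a probability of 99 percent" as the probability that no other cell shares a given cell's tag; both are assumptions made for illustration, and the function names are hypothetical.

```python
def num_tags(cycles: int) -> int:
    """Number of distinct tags after `cycles` single-nucleotide additions."""
    return 4 ** cycles


def prob_cell_tag_unique(num_cells: int, cycles: int) -> float:
    """Probability that no other cell shares a given cell's tag.

    Assumes tags are assigned uniformly and independently (an assumption).
    """
    return (1.0 - 1.0 / num_tags(cycles)) ** (num_cells - 1)


def min_cycles(num_cells: int, target: float = 0.99) -> int:
    """Fewest cycles for which a cell's tag is unique with probability >= target."""
    cycles = 1
    while prob_cell_tag_unique(num_cells, cycles) < target:
        cycles += 1
    return cycles


print(num_tags(10), num_tags(11))  # tag counts for 10 and 11 cycles
for n in range(10, 16):
    print(n, "cycles:", round(prob_cell_tag_unique(10**6, n), 4))
print("cycles needed for 10^6 cells at 99%:", min_cycles(10**6, 0.99))
```

Under these assumptions, having roughly as many tags as cells is not sufficient for a high per-cell probability of uniqueness; the number of tags needs to exceed the number of cells by a substantial factor, which adds a few cycles.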

[0084] As mentioned above, cells may be labeled with a similar random barcode that, instead of consisting of a random sequence of nucleotides, consists of a random sequence of homopolymer segments, wherein each homopolymer segment comprises a different kind of nucleotide than that of a nearest neighbor homopolymer segment. The advantage of such a barcoding scheme is that 3'-blocked dNTPs do not have to be used; therefore, no deblocking step is required, which makes the synthesis process simpler and potentially less damaging to the viability of the cells. The lengths of homopolymer segments used in such barcodes may vary widely. In some embodiments, conditions including the duration of reaction are selected so that the average length of a homopolymer segment is in the range of from 1 to 100 nucleotides; in other embodiments, the average length of a homopolymer segment is in the range of from 1 to 25 nucleotides; and in still other embodiments, the average length of a homopolymer segment is in the range of from 1 to 10 nucleotides.
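
A homopolymer-segment barcode of this kind may be illustrated with a short simulation. In the sketch below, segment lengths are drawn uniformly at random as a stand-in for the uncontrolled extension that occurs when unblocked dNTPs are used; that distribution, the function names and the parameters are assumptions for illustration. Because neighboring segments always differ in nucleotide kind, the barcode can be read by collapsing each run to a single letter.

```python
import random
from itertools import groupby


def random_homopolymer_barcode(num_segments: int, max_len: int,
                               rng: random.Random) -> str:
    """Build a barcode of homopolymer segments whose neighbors differ in base."""
    segments = []
    previous = None
    for _ in range(num_segments):
        # Choose any nucleotide other than the one used in the previous segment.
        base = rng.choice([b for b in "ACGT" if b != previous])
        # Segment length is random here; in practice it depends on reaction time.
        segments.append(base * rng.randint(1, max_len))
        previous = base
    return "".join(segments)


def collapse_homopolymers(sequence: str) -> str:
    """Reduce each run of identical nucleotides to one letter."""
    return "".join(base for base, _ in groupby(sequence))


rng = random.Random(0)
barcode = random_homopolymer_barcode(num_segments=8, max_len=10, rng=rng)
print(barcode)
print(collapse_homopolymers(barcode))
```

With k segments and the neighbor constraint, 4 × 3^(k-1) distinct collapsed barcodes are possible, since the first segment has four choices and each later segment has three.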

[0085] Binding compounds used with the invention may include a wide variety of compositions that specifically bind to predetermined cellular constituents and to which initiators may be attached for generating identifying oligonucleotides. FIGS. 2C and 2D illustrate the range of different types of binding compounds that may be used with viable cells (FIG. 2C) and with cells that have been fixed and permeabilized (FIG. 2D) to give access to intracellular constituents. Usually only cellular antigens and/or constituents exposed to the extracellular environment are accessible in viable cells (240). Thus, in some embodiments, binding compounds comprise antibody binding compounds labeled with initiator oligonucleotides (242) as described above, which antibody is specific for predetermined cell surface proteins (e.g. 243a, 243b, 243c), or membrane probes (244), which comprise a membrane-specific component (248) that inserts in cell surface membrane (245), such as a lipophilic moiety, and an initiator oligonucleotide (246). As illustrated in FIG. 2D, fixed and permeabilized cells (280) provide access through pores (281) created in a permeabilization step, to intracellular RNA (286) and intracellular proteins (284), to which binding compounds comprising hybridization probes (e.g. 285) and antibody binding compounds (287), respectively, may be targeted. In some embodiments, binding compounds may comprise hybridization probes of genomic DNA.

[0086] In some embodiments, initiator oligonucleotides with free 3'-hydroxyls are stably inserted into the cell surface membranes of target cells by derivatizing the 5' end of the initiator oligonucleotides with a lipophilic moiety using conventional techniques, e.g. as disclosed in the following references: Weber et al, Biomacromolecules, 15: 4621-4626 (2014); Bunge et al, Langmuir, 23(8): 4455-4464 (2007); Borjesson et al, J. Amer. Chem. Soc., 131(8): 2831-2839 (2009); Bunge et al, J. Phys. Chem. B, 113(51): 16425-16434 (2009); and like references. Of particular interest is the technique disclosed by Weber (cited above), which calls for the insertion of complementary pairs of oligonucleotides, each derivatized with a lipophilic moiety, one oligonucleotide of the pair on its 5' end (a longer initiator oligonucleotide) and the other on its 3' end (a shorter support oligonucleotide). The hybridized pairs are very stable in the cell membrane, which would minimize losses during synthesis.

[0087] As illustrated in FIG. 3A, in some embodiments, initiator oligonucleotides (300) comprise oligonucleotide (302) with a free 3' hydroxyl and lipophilic moiety (304) at a 5' end. Such initiator is capable of stably inserting into the lipid bilayer of a cell surface membrane with a free 3'-hydroxyl available for extension. Initiator oligonucleotides (300) are combined with target cells (306) under conditions (308) that permit initiator oligonucleotides (300) to insert (310) into cell surface membrane (312) by their lipophilic moieties so that free 3'-hydroxyls of the oligonucleotides are accessible for synthesis. Cells (314) may then be subjected to enzymatic extension of initiators (310) by methods of the invention.

[0088] In some embodiments, enzymatic extension of initiators (310) may be employed to generate unique cell-specific barcodes on cells (314) by a "split and mix" synthesis strategy, as illustrated in FIG. 3B. Cells with initiators are pooled in vessel (322), after which successive cycles of nucleotide addition are carried out. Cells (320) in vessel (322) are distributed (323) among four reaction chambers (324a-324d), in which a 3'-O-blocked dA, dG, dC or dT, respectively, is added to the free 3'-hydroxyl of an attached initiator, after which the added nucleotide is de-blocked to ready it for the next addition cycle. In some embodiments, cells of vessel (322) are distributed equally among the four reaction chambers; however, in alternative embodiments, cells of vessel (322) may be distributed non-equally among reaction chambers (324a-324d) to bias the occurrence of a nucleotide at a particular position. In other embodiments, more than one addition cycle may be carried out in the reaction chambers (324a-324d), thereby, for example, adding two or more nucleotides. Reaction chambers (324a-324d) are illustrated as wells in a solid structure (326), but they may comprise separate reaction vessels, such as separate reaction tubes. In some embodiments, reaction chambers (324a-324d) may comprise wells in conventional microwell plates of 24, 48, 96, 384 or 1536 wells. In higher capacity microwell plates, e.g. 96-well, multiple syntheses may be carried out in parallel, for example, for barcoding and analyzing multiple samples at the same time. In some embodiments, after a cycle of nucleotide addition and deprotection, cells in chambers (324a-324d) are mixed (328) so that in the next nucleotide addition step each cell of the mixture has an equal probability of having added an A, C, G or T. In such embodiments, by such "split and mix" steps, a unique random sequence oligonucleotide may be generated on the initiators anchored in the cell membranes.
Such "split and mix" steps may be continued (330) until the added random-sequence oligonucleotide is long enough that each cell of the population in vessel (322) is associated with a unique sequence. In some embodiments, after unique barcodes are formed (332), additional nucleotides of a common sequence may be synthesized without splitting and mixing (334). Such common sequences may include primer binding sites, or the like, for manipulating or amplifying the barcodes for later analysis. The resulting barcoded cells (336) may then be used in applications, such as single-cell transcriptome analysis, as illustrated in FIG. 3C. Guidance for large scale single cell transcriptome analysis with bead-based barcoding is disclosed in the following references: Kolodziejczyk et al, Molecular Cell, 58: 610-620 (2015); Saliba et al, Nucleic Acids Research, 42(14): 8845-8860 (2014); Church et al, U.S. patent publication 2013/0274117; Macosko et al, Cell, 161: 1202-1214 (2015); Klein et al, Cell, 161: 1187-1201 (2015); and the like. Generally, the techniques comprise steps of (i) capturing or isolating single cells, (ii) lysing single cells, (iii) reverse transcribing RNA to make cDNA, (iv) amplifying cDNAs, and (v) sequencing. Such techniques may further include a step of attaching cell-specific barcodes to cDNAs, in particular by generating droplets containing a single cell and a single barcode-carrying bead.
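The split and mix cycles described above can be illustrated with a small simulation. The following Python sketch is illustrative only and is not part of the disclosed method: the function names, the chamber weights, and the use of a birthday-problem approximation to estimate how many cycles are needed for uniqueness are all assumptions introduced here.

```python
import math
import random

# Nucleotides added in chambers 324a-324d, respectively (per the text above).
CHAMBER_NUCLEOTIDES = "AGCT"


def split_and_mix(num_cells, cycles, rng, weights=(1, 1, 1, 1)):
    """Simulate split-and-mix barcode synthesis on a pool of cells.

    In each cycle every cell is assigned to one of four chambers (split),
    the chamber's nucleotide is appended to that cell's barcode, and the
    cells are pooled again (mix). Unequal weights model the non-equal
    distribution of cells that biases a position.
    """
    barcodes = [""] * num_cells
    for _ in range(cycles):
        assigned = rng.choices(CHAMBER_NUCLEOTIDES, weights=weights, k=num_cells)
        barcodes = [bc + nt for bc, nt in zip(barcodes, assigned)]
    return barcodes


def min_barcode_length(num_cells, max_collision_prob=0.01, alphabet_size=4):
    """Smallest length n for which the approximate probability that any two
    cells share a barcode is at most max_collision_prob.

    Assumption: equal chamber weights, so barcodes are uniform over
    alphabet_size**n sequences (birthday-problem approximation).
    """
    n = 1
    while True:
        space = alphabet_size ** n
        p_any_collision = 1.0 - math.exp(-num_cells * (num_cells - 1) / (2.0 * space))
        if p_any_collision <= max_collision_prob:
            return n
        n += 1


if __name__ == "__main__":
    rng = random.Random(0)
    n = min_barcode_length(10_000)
    barcodes = split_and_mix(10_000, n, rng)
    print(n, len(set(barcodes)), "distinct barcodes for", len(barcodes), "cells")
```

Under these assumptions the required number of cycles grows only logarithmically with the number of cells, which is why a modest number of split and mix rounds can suffice for large populations.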

[0089] In some embodiments, unique oligonucleotide tags may be synthesized on viable cells by attaching tags comprising sequences of homopolymeric segments. In some embodiments, the invention is directed to methods of synthesizing on a viable cell an oligonucleotide barcode comprising the steps of: (a) providing an initiator with a free 3'-hydroxyl attached to a cell surface molecule of the cell or anchored in a cell surface membrane of the cell; and (b) repeating under biological conditions a plurality of cycles of the step of contacting under elongation conditions the initiator or elongated fragments having free 3'-hydroxyls with a nucleoside triphosphate and a template-independent DNA polymerase so that the initiator or elongated fragments are elongated by a homopolymer segment to form elongated fragments having free 3'-hydroxyls, wherein the kind of nucleoside triphosphate added in each step after a first step is different from the kind in the immediately preceding step.

[0090] In some embodiments, each of the cycles further includes a step of removing unincorporated nucleoside triphosphates. In some embodiments, the elongation conditions include a concentration of said nucleoside triphosphates, a temperature and a reaction time to produce homopolymer segments having an average length in the range of from 1 to 100 nucleotides. In some embodiments, unique oligonucleotide tags comprising homopolymeric segments are produced using a split-and-mix procedure.
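A barcode built from homopolymer segments, as described above, carries its information in the order of segment identities rather than in segment lengths. The following Python sketch is illustrative only: the uniform draw of segment lengths, the random choice of nucleotide per cycle, and the function names are assumptions introduced here, not features taken from the disclosure.

```python
import itertools
import random

NUCLEOTIDES = "ACGT"


def homopolymer_barcode(cycles, rng, max_segment_len=10):
    """Build a barcode from successive homopolymer segments.

    Each cycle adds a run of a single nucleotide that differs from the
    nucleotide of the preceding cycle. Segment lengths are drawn uniformly
    from 1..max_segment_len (an assumption standing in for the uncontrolled
    extension length without 3'-blocked dNTPs).
    """
    segments = []
    previous = None
    for _ in range(cycles):
        nucleotide = rng.choice([n for n in NUCLEOTIDES if n != previous])
        segments.append(nucleotide * rng.randint(1, max_segment_len))
        previous = nucleotide
    return "".join(segments)


def collapse_homopolymers(sequence):
    """Reduce each homopolymer run to a single base, recovering the order
    of segment identities regardless of segment length."""
    return "".join(base for base, _ in itertools.groupby(sequence))


if __name__ == "__main__":
    rng = random.Random(0)
    barcode = homopolymer_barcode(cycles=8, rng=rng)
    print(barcode)
    print(collapse_homopolymers(barcode))
```

Because neighboring segments always differ, collapsing runs yields exactly one symbol per synthesis cycle, so two cells with the same segment order but different segment lengths decode to the same identity string.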

Single Cell Analysis

[0091] In some embodiments of the invention, cells from a population are disposed in reactors each containing a single cell. This may be accomplished by a variety of large-scale single-cell reactor platforms known in the art, e.g. Clarke et al, U.S. patent publication 2010/0255471; Mathies et al, U.S. patent publication 2010/0285975; Edd et al, U.S. patent publication 2010/0021984; Colston et al, U.S. patent publication 2010/0173394; Love et al, International patent publication WO2009/145925; Muraguchi et al, U.S. patent publication 2009/0181859; Novak et al, Angew. Chem. Int. Ed., 50: 390-395 (2011); Chen et al, Biomed Microdevices, 11: 1223-1231 (2009); and the like, which are incorporated herein by reference. In one aspect, cells are disposed in wells of a microwell array where reactions, such as PCA reactions, take place; in another aspect, cells are disposed in micelles of a water-in-oil emulsion, where micelles serve as reactors. Micelle reactors generated by microfluidics devices, e.g. Mathies et al (cited above) or Edd et al (cited above), are of particular interest because uniform-sized micelles may be generated with lower shear and stress on cells than in bulk emulsification processes. Compositions and techniques for emulsification, including carrying out amplification reactions, such as PCRs, in micelles, are found in the following references, which are incorporated by reference: Becher, "Emulsions: Theory and Practice," (Oxford University Press, 2001); Griffiths and Tawfik, U.S. Pat. No. 6,489,103; Tawfik and Griffiths, Nature Biotechnology, 16: 652-656 (1998); Nakano et al, J. Biotechnology, 102: 117-124 (2003); Dressman et al, Proc. Natl. Acad. Sci., 100: 8817-8822 (2003); Dressman et al, U.S. Pat. No. 8,048,627; Berka et al, U.S. Pat. Nos. 7,842,457 and 8,012,690; Diehl et al, Nature Methods, 3: 551-559 (2006); Williams et al, Nature Methods, 3: 545-550 (2006); Zeng et al, Analytical Chemistry, 82(8): 3183-3190 (2010); Micellula DNA Emulsion & Purification Kit instructions (EURx, Gdansk, Poland, 2011); and the like. In one embodiment, the mixture of homogeneous sequence tags (e.g. beads) and reaction mixture is added dropwise into a spinning mixture of biocompatible oil (e.g., light mineral oil, Sigma) and allowed to emulsify. In another embodiment, the homogeneous sequence tags and reaction mixture are added dropwise into a cross-flow of biocompatible oil. The oil used may be supplemented with one or more biocompatible emulsion stabilizers. These emulsion stabilizers may include Atlox 4912, Span 80, and other recognized and commercially available suitable stabilizers. In some embodiments, the emulsion is heat stable to allow thermal cycling, e.g., to at least 94°C, at least 95°C, or at least 96°C. In some embodiments, the droplets formed range in size from about 5 microns to about 500 microns. In some embodiments, droplets are formed in a range of from about 10 microns to about 350 microns, or from about 50 microns to about 250 microns, or from about 100 microns to about 200 microns. Advantageously, cross-flow fluid mixing allows for control of the droplet formation and uniformity of droplet size.

[0092] In some embodiments, micelles are produced having a uniform distribution of volumes so that reagents available in such reactors result in similarly amplified target nucleic acids and sequence tags. That is, widely varying reactor volumes, e.g. micelle volumes, may lead to amplification failures and/or widely varying degrees of amplification. Such failures and variation would preclude or increase the difficulty of making quantitative comparisons of target nucleic acids in individual cells of a population, e.g. differences in gene expression. In one aspect, micelles are produced that have a distribution of volumes with a coefficient of variation (CV) of thirty percent or less. In some embodiments, micelles have a distribution of volumes with a CV of twenty percent or less.
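The CV criterion above can be checked from measured droplet diameters. The following Python sketch is illustrative only: it assumes spherical droplets, uses the sample standard deviation, and the function names and example diameters are hypothetical.

```python
import math
import statistics


def droplet_volume_pl(diameter_um):
    """Volume in picoliters of a spherical droplet of the given diameter
    in microns (1 pL = 1000 cubic microns). Sphericity is an assumption."""
    return math.pi * diameter_um ** 3 / 6.0 / 1000.0


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def meets_cv_limit(diameters_um, max_cv=0.30):
    """True if the droplet volume distribution has a CV at or below max_cv."""
    volumes = [droplet_volume_pl(d) for d in diameters_um]
    return coefficient_of_variation(volumes) <= max_cv


if __name__ == "__main__":
    diameters = [95.0, 100.0, 102.0, 98.0, 105.0]  # hypothetical measurements
    volumes = [droplet_volume_pl(d) for d in diameters]
    print(round(coefficient_of_variation(volumes), 3), meets_cv_limit(diameters))
```

Because volume scales with the cube of diameter, a given spread in diameter produces roughly three times that spread in volume, so the CV limit is best evaluated on volumes rather than on diameters.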

[0093] Cells of a sample and homogeneous sequence tags may be suspended in a reaction mixture prior to disposition into reactors. In one aspect, a reaction mixture is a PCA reaction mixture and is substantially the same as a PCR reaction mixture with at least one pair of inner (or linking) primers and at least one pair of outer primers. A reaction mixture may comprise one or more optional components, including but not limited to, thermostable restriction endonucleases; one or more proteinase inhibitors; lysing agents to facilitate release of target nucleic acids of isolated cells, e.g. Brown et al, Interface, 5: S131-S138 (2008); and the like. In some embodiments, a step of lysing cells may be accomplished by heating cells to a temperature of 95°C or above in the presence of a nonionic detergent, e.g. 0.1% Tween X-100, for a period prior to carrying out an amplification reaction. In one embodiment, such period of elevated temperature may be from 10 to 20 minutes. Alternatively, a step of lysing cells may be accomplished by one or more cycles of heating and cooling, e.g. 96°C for 15 min followed by 10°C for 10 min, in the presence of a nonionic detergent, e.g. 0.1% Tween X-100. In some embodiments, micelle reactors are generated and sorted in a microfluidics device as described more fully below.

Single Cell Transcriptome Analysis

[0094] As illustrated in FIG. 3C, in some embodiments, barcoded cells (340) may be prepared for transcriptome analysis using a droplet-based microfluidic device (345), which encapsulates barcoded single cells into aqueous micelles and coalesces the cell-containing micelles with a series of micelles containing reagents for constructing cDNA libraries. Alternatively, cell-containing micelles may be produced and reagents delivered to such micelles using non-microfluidic methods such as disclosed in Abate et al, International patent publication WO2019/139650. Cells (340) with initiator-barcode conjugates (344) embedded in their cell surface membranes are disposed in chamber (343) in aqueous solution (342), which may have a pH, salt concentrations and other necessary ingredients to maintain the integrity of the cells. From chamber (343), cells (340) and aqueous solution (342) are driven through passage (351) into junction (353), where confluent oil flows (350) cause the formation of aqueous micelles (346), some of which contain a single cell. Such droplet-based microfluidics devices may be constructed using well-known designs and techniques. For example, the following references provide guidance in the design and implementation of such microfluidic devices: Zare et al, Ann. Rev. Biomed. Eng., 12: 187-201 (2010); Link, U.S. patent publication 2012/0309002; Shapiro et al, Nature Reviews Genetics, 14: 618-630 (2013); Kim et al, Anal. Chem., 90: 1273-1279 (2018); Abate et al, U.S. patent publication 2017/0009274; Zagnoni et al, chapter 2, Methods in Cell Biology, 102: 25-48 (2011); Zheng et al, Nature Comm., 8:14049 (2016); Link et al, U.S. patent publication 2008/0014589; and the like.

[0095] Cell-containing micelle (346) is caused to coalesce with reagent micelle (348) in oil flow (354) at junction (352). Reagent micelle (348) contains lysis reagents for breaking down the cell surface membrane to expose mRNA for reverse transcription and amplification. The result of such coalescence is micelle (356), which incubates during flow through passage (360), whose length is designed to provide a transit time sufficient for the lysis reagents carried by micelle (348) to complete lysis of the cell and produce a cellular lysate (358) ready for reverse transcription and amplification. Lysis reagents are described in the following references: Tang et al, Nature Protocols, 5(3): doi:10.1038/nprot.2009.236; Thornhill et al, Prenatal Diagnosis, 21: 490-497 (2001); Kim et al, Fertility and Sterility, 92: 814-818 (2009); and the like. Exemplary lysis conditions for use with PCA reactions are as follows: 1) cells in H2O at 96°C for 15 min, followed by 15 min at 10°C; 2) 200 mM KOH, 50 mM dithiothreitol, heat to 65°C for 10 min; 3) for 4 µL of a protease-based lysis buffer: 1 µL of 17 µM SDS combined with 3 µL of 125 µg/mL proteinase K, followed by incubation at 37°C for 60 min, then 95°C for 15 min (to inactivate the proteinase K); 4) for 10 µL of a detergent-based lysis buffer: 2 µL H2O, 2 µL 250 ng/µL polyA, 2 µL 10 mM EDTA, 2 µL 250 mM dithiothreitol, 2 µL 0.5% N-laurylsarcosin salt solution. Single-cell analysis platforms, incubation times, lysis buffers and/or other PCA reaction components, their concentrations, reaction volumes and the like are design choices that are optimized for particular applications by one of ordinary skill in the art. In one embodiment, an alkaline lysis buffer disclosed by Kim et al, Anal. Chem., 90: 1273-1279 (2018) is employed.
Such buffer comprises 20 mM NaOH, 60% (v/v) PEG-200, and 2% (v/v) Triton X-100, and may be neutralized by the buffering capacity of RT-PCR reagents.
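The protease-based and detergent-based lysis buffers listed above are specified by stock concentrations and volumes. The following Python sketch is illustrative only and works out the resulting concentrations in the mixed buffer by simple dilution arithmetic; it assumes additive volumes and does not account for any further dilution on addition to cells. The function and variable names are hypothetical.

```python
def final_concentrations(components):
    """Return (total_volume_uL, {name: (final_concentration, unit)}).

    components: list of (name, volume_uL, stock_concentration, unit);
    a stock_concentration of None marks a diluent such as water.
    Assumes volumes are additive.
    """
    total = sum(volume for _, volume, _, _ in components)
    finals = {
        name: (stock * volume / total, unit)
        for name, volume, stock, unit in components
        if stock is not None
    }
    return total, finals


# Recipe 3: 4 uL protease-based lysis buffer (values from the text above).
PROTEASE_BUFFER = [
    ("SDS", 1.0, 17.0, "uM"),
    ("proteinase K", 3.0, 125.0, "ug/mL"),
]

# Recipe 4: 10 uL detergent-based lysis buffer (values from the text above).
DETERGENT_BUFFER = [
    ("H2O", 2.0, None, ""),
    ("polyA", 2.0, 250.0, "ng/uL"),
    ("EDTA", 2.0, 10.0, "mM"),
    ("dithiothreitol", 2.0, 250.0, "mM"),
    ("N-laurylsarcosin", 2.0, 0.5, "%"),
]

if __name__ == "__main__":
    for label, recipe in (("protease", PROTEASE_BUFFER), ("detergent", DETERGENT_BUFFER)):
        total, finals = final_concentrations(recipe)
        print(label, total, "uL")
        for name, (conc, unit) in finals.items():
            print("  ", name, conc, unit)
```

Under these assumptions the component volumes sum to the stated 4 µL and 10 µL totals, which serves as a quick consistency check on the recipes.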

[0096] After lysis, cell lysate in micelle (358) is coalesced at junction (368) with reagent micelle (362) from oil flow (364). Reagent micelle (362) contains reverse transcriptase and PCR reaction components. In some embodiments, such components may comprise ingredients from a commercial RT-PCR kit, for example, the ThermoFisher Invitrogen SuperScript IV One-Step RT-PCR system. In some embodiments, such components may comprise template-switching transcription components, e.g. Trombetta et al, Curr. Protocol Mol. Biol., 107: 4.22.1-4.22.17 (2014). After coalescence, droplets are collected in a temperature-control device, such as a thermocycler, which permits heat denaturation of reverse transcriptase and subsequent PCR of cDNAs and barcodes. Different embodiments of reverse transcription reactions are illustrated in FIGS. 4A and 4B. In FIG. 4A, polyT primer (402) is annealed to mRNA (400) and extended (406) to form a first DNA strand (405) (SEQ ID NO: 17). After removal of mRNA template (400), gene-specific primer (408) is annealed and extended to complete the cDNA. Primer (408) may comprise 5' tail (410), which includes common sequences, such as primer binding sites, for later manipulation and preparation for sequencing. In FIG. 4B, a template-switching scheme is illustrated which may be used for producing a single cell cDNA library, e.g. Zhu et al, Biotechniques, 30(4): 892-897 (2001). Template (422) is annealed to mRNA (420) and extended (424) with a reverse transcriptase, such as MMLV, that makes template-free additions of a selected nucleotide (426) to the 3' end of the first cDNA strand after the end of the RNA template is reached. This allows adaptor (428) to anneal to the template-free addition and be extended (432) to produce a second strand to complete cDNA (430). The 5' segment of adaptor (428) may be designed to include common sequences for later amplification and preparation for high throughput sequencing.

[0097] In some embodiments, after template-switching reverse transcription, polymerase cycling assembly reactions are carried out in each micelle. Polymerase cycling assembly (PCA) reactions permit a plurality of nucleic acid fragments to be fused together to form a single fusion product in one or more cycles of fragment annealing and polymerase extension, e.g. Xiong et al, FEBS Microbiol. Rev., 32: 522-540 (2008). PCA reactions come in many formats. In one format of interest, PCA follows a plurality of polymerase chain reactions (PCRs) taking place in a common reaction volume, wherein each component PCR includes at least one linking primer that permits strands from the resulting amplicon to anneal to strands from another amplicon in the reaction and to be extended to form a fusion product or a precursor of a fusion product. PCA in its various formats (and under various alternative names) is a well-known method for fragment assembly and gene synthesis, several forms of which are disclosed in the following references: Yon et al, Nucleic Acids Research, 17: 4895 (1989); Chen et al, J. Am. Chem. Soc., 116: 8799-8800 (1994); Stemmer et al, Gene, 164: 49-53 (1995); Hoover et al, Nucleic Acids Research, 30: e43 (2002); Xiong et al, Biotechnology Advances, 26: 121-134 (2008); Xiong et al, FEBS Microbiol. Rev., 32: 522-540 (2008); and the like.

[0098] FIG. 4C illustrates the use of PCA to attach the same cell-specific barcode to each cDNA. "X" DNAs (462) may be the enzymatically synthesized barcode sequences flanked by primer binding sites. Primers (470) and (471) anneal to common sequences on the barcodes and cDNAs, respectively, and they have complementary 5' tails.

[0099] Multiple different target nucleic acids, such as cDNAs (460), g₁, g₂, . . . gₙ, are linked to the same barcode nucleic acid, X (462), to form (464) multiple fusion products X-g₁, X-g₂, . . . X-gₙ (466). In some embodiments, the number of such target nucleic acids is between 2 and 10000; in another embodiment, it is between 2 and 1000; and in another embodiment, it is between 2 and 100. In PCA reactions of these embodiments, the concentration of inner primer (468) may be greater than those of the inner primers (e.g. 471) of the various gᵢ nucleic acids so that there are adequate quantities of the X amplicon to anneal with the many strands of the gᵢ amplicons. In accordance with a method of the invention, the fusion products (466) may be extracted from the reaction mixture of the coalesced micelles and sequenced.

[0100] In some embodiments, a method for generating a cDNA library with cell-specific barcodes may comprise the steps of (a) synthesizing a unique oligonucleotide barcode on each cell of a population to form a population of barcoded cells; (b) disposing barcoded cells into multiple reactors each containing a single barcoded cell in a polymerase cycling assembly (PCA) reaction mixture, wherein the PCA reaction mixture comprises a pair of outer primers and one or more pairs of linking primers specific for a plurality of target nucleic acids in the barcoded cells and the oligonucleotide barcodes; (c) performing a PCA reaction in the reactors so that fusion products of the target nucleic acids and the oligonucleotide barcodes are formed in the reactors; and (d) sequencing the fusion products from the reactors to identify the target nucleic acids of each cell in the population.
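Once fusion products have been sequenced in step (d), reads can be assigned to cells by their barcode portion. The following Python sketch is illustrative only: it assumes each read has the layout barcode, then a known linking sequence, then target fragment, and the linker sequence, read layout and function name are hypothetical rather than taken from the disclosure.

```python
from collections import defaultdict

# Hypothetical linking sequence joining barcode X to target g_i in a fusion product.
LINKER = "GATCGGAAGAGC"


def group_targets_by_barcode(reads, linker=LINKER):
    """Map each cell barcode to the list of target fragments fused to it.

    Reads lacking the linker, or with an empty barcode or target portion,
    are skipped.
    """
    groups = defaultdict(list)
    for read in reads:
        barcode, found, target = read.partition(linker)
        if not found or not barcode or not target:
            continue
        groups[barcode].append(target)
    return dict(groups)


if __name__ == "__main__":
    reads = [
        "ACGTAC" + LINKER + "TTGACCA",
        "ACGTAC" + LINKER + "GGCATTA",
        "CCATGG" + LINKER + "TTGACCA",
        "NNNNNN",  # no linker, discarded
    ]
    for barcode, targets in group_targets_by_barcode(reads).items():
        print(barcode, targets)
```

A practical pipeline would additionally need to tolerate sequencing errors in the barcode and linker; exact matching is used here only to keep the sketch short.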

[0101] An alternative application for single-cell transcriptome analysis is illustrated in FIG. 5. In this embodiment, cells (without barcodes) (502) are mixed with polyT beads (also without barcodes) (504) and disposed in an aqueous mixture in chamber (500) of microfluidics device (508). As mentioned above, approaches not depending on microfluidics devices may also be applied, e.g. Abate et al, International patent publication WO2019/139650. The aqueous mixture is forced through passage (506) into oil stream (512) at junction (510) so that aqueous droplets form, some of which (516) contain one cell (517a) and one bead (517b). Such droplets are then coalesced with droplets (518) containing cell lysis reagents at junction (520) to form droplet (522), in which cells are lysed, releasing polyA RNA, which anneals to the polyT primers attached to beads (517b). After appropriate incubation to release the desired cellular constituents, such as mRNA, the lysate-containing droplets are then coalesced at junction (530) with droplets (528) containing reverse transcriptase reagents, such as a reverse transcriptase, appropriate salts, and a buffer system that may counteract or alter conditions (e.g. high pH) imposed by the lysis reaction. Resulting droplets (531) are collected (532) and incubated so that polyA RNA anneals to polyT primers on beads (517b) and serves as a template for the reverse transcriptase extension of the polyT segments to form a single-cell cDNA library covalently attached to bead (517b). Beads (517b) may then be collected from the droplets, combined, and subjected to "split and mix" synthesis to add a unique barcode and further sequences, such as primer binding sites, for subsequent manipulation, such as copying and preparation for high throughput sequencing, as described above.
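The statement that only some droplets contain one cell and one bead can be made quantitative under a common modeling assumption. The following Python sketch is illustrative only: it assumes that cells and beads load into droplets independently and follow Poisson statistics, which the text above does not state, and the loading rates shown are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def fraction_one_cell_one_bead(mean_cells, mean_beads):
    """Fraction of droplets holding exactly one cell and exactly one bead,
    assuming independent Poisson loading of cells and beads."""
    return poisson_pmf(1, mean_cells) * poisson_pmf(1, mean_beads)


if __name__ == "__main__":
    # Hypothetical average occupancies per droplet.
    for mean_cells, mean_beads in ((0.1, 0.1), (0.1, 1.0), (1.0, 1.0)):
        fraction = fraction_one_cell_one_bead(mean_cells, mean_beads)
        print(mean_cells, mean_beads, round(fraction, 4))
```

Under this model, raising the average occupancy increases the share of usable droplets but also the share containing more than one cell, which is the usual trade-off when choosing input concentrations.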

[0102] Clearly, many other microfluidics device configurations may be employed to generate micelles containing a single cell and a predetermined number of homogeneous sequence tags, for example, one homogeneous sequence tag or two homogeneous sequence tags, or to selectively add reagents to a micelle by selectively coalescing micelles, by electroporation, or the like, e.g. Zagnoni et al, chapter 2, Methods in Cell Biology, 102: 25-48 (2011); Brouzes, chapter 10, Methods in Cell Biology, 102: 105-139 (2011); Wiklund et al, chapter 14, Methods in Cell Biology, 102: 177-196 (2011); Le Gac et al, chapter 7, Methods in Molecular Biology, 853: 65-82 (2012); and the like.

Fixing and Permeabilizing Cells or Tissues

[0103] In some embodiments, initiators coupled to binding compounds comprising nucleic acid hybridization probes and/or protein-specific binding compounds may be directed to intracellular targets, such as intracellular proteins, messenger RNAs, and/or genomic DNAs. In some embodiments, cells are fixed and permeabilized for application of binding compounds specific for such intracellular targets. Fixing and permeabilization of cells may be carried out by conventional protocols, such as those used in flow cytometry. Typically, such protocols include a step of treating cells with a fixing agent followed by a step of treating cells with a permeabilizing agent. A fixing step typically immobilizes intracellular targets, while retaining cellular and subcellular architecture and permitting unhindered access of antibodies and/or hybridization probes to all cells and subcellular compartments. A wide range of fixatives is commercially available, and the correct choice of method will depend on the nature of the targets being examined and on the properties of the antibody and/or hybridization probes used. Fixation methods fall generally into two classes: organic solvents and cross-linking reagents. Organic solvents such as alcohols and acetone remove lipids and dehydrate the cells, while precipitating the proteins on the cellular architecture. Cross-linking reagents (such as paraformaldehyde) form intermolecular bridges, normally through free amino groups, thus creating a network of linked antigens. Cross-linkers preserve cell structure better than organic solvents, but may reduce the antigenicity of some cell components, and require the addition of a permeabilization step to allow access of the antibodies and/or hybridization probes to the intracellular targets. Exemplary fixing and permeabilizing steps include, but are not limited to, methanol-acetone fixation (fix in cooled methanol, 10 minutes at -20°C; permeabilize with cooled acetone for 1 min at -20°C); paraformaldehyde-Triton fixation (fix in 3-4% paraformaldehyde for 10-20 min; rinse with phosphate buffered saline (PBS); permeabilize with 0.5% Triton X-100 for 2-10 min); paraformaldehyde-methanol fixation (fix in 3-4% paraformaldehyde for 10-20 min; rinse with PBS; permeabilize with cooled methanol for 5-10 min at -20°C). Permeabilizing agents include, but are not limited to, detergents such as saponin, Triton X-100, Tween-20, and NP40. Permeabilizing agents may also include proteinases, such as proteinase K, streptolysin O, and the like.

Chimeric Enzymatically and Chemically Synthesized Polynucleotides for User Specified Applications

[0104] Frequently, products used in medicine and biology comprise components that may be used in every circumstance and components that must be provided anew for particular applications, the latter components sometimes being referred to as "user specified" or "user determined" components. Many nucleic acid reagents are of this character. In particular, common components of labeled hybridization probes may be manufactured in bulk and provided as kits for a user who, to obtain an operable assay, must supply a specific component, for example, a target-specific component of a probe which hybridizes to a nucleic acid target of interest. Exemplary reagents that are provided in the above format include, but are not limited to, Taqman probes, CRISPR guide sequences, various kinds of PCR probes, and the like, which can be constructed as combinations of pre-existing chemically synthesized oligonucleotides and enzymatically synthesized user-specified oligonucleotides using methods of the invention. FIG. 6 is a simple example of such a chimeric product comprising a Taqman probe, or a precursor to a Taqman probe. Product (600), comprising solid support (602) with initiator oligonucleotide (604) attached by its 5' end, may be centrally mass produced using organic chemical techniques, as the components are employed in every specific probe design. In this embodiment, initiator oligonucleotide (604) includes cleavable nucleotide "X" (605) and a nucleotide distal to "X" that includes moiety "R₁" (603), which may be a label, such as a fluorescent donor or quencher, or a reactive group, such as a member of a click chemistry pair, which may be used to attach a donor or acceptor label. In some embodiments, R₁ is attached to a base, e.g. to an exocyclic amine. Product (600) may be a component of a kit for a user to produce a Taqman probe specific for a target of special interest to that user.
To the 3' end of the initiator oligonucleotide of product (600), a user may synthesize a sequence-specific extension (608) that may include a nucleotide with moiety "R₂" (607), which may be a complementary donor or quencher that operates with R₁; alternatively, R₂ may be a reactive group, such as a member of a click chemistry pair orthogonal to that of R₁, which permits facile attachment of such a label. After the synthesis is completed, the extended oligonucleotide may be cleaved (610) from support (602) to give Taqman probe (614) and used support (612), which may be discarded.

[0105] A similar kit may be prepared for providing single guide RNAs (sgRNAs): (i) the kit provides a bead with an attached initiator sequence that includes a T7 promoter; (ii) a user synthesizes, on the end of the initiator, a target-specific sequence of interest plus 20-25 nt of the 5' scaffold domain; and (iii) upon completion of synthesis, the user (a) anneals a complementary 3' scaffold domain onto the oligonucleotide still attached to the bead, (b) allows primer extension to generate a dsDNA, and (c) allows transcription from the T7 promoter to generate sgRNA molecules.

Kits

[0106] The invention includes kits for carrying out methods of the invention. In some embodiments, "kit" refers to any delivery system for delivering materials or reagents for carrying out a method of the invention. In the context of reaction assays, such delivery systems include systems that allow for the storage, transport, or delivery of reaction reagents (e.g., probes, enzymes, etc. in the appropriate containers) and/or supporting materials (e.g., buffers, written instructions for performing the assay etc.) from one location to another. For example, kits include one or more enclosures (e.g., boxes) containing the relevant reaction reagents and/or supporting materials. Such contents may be delivered to the intended recipient together or separately. For example, a first container may contain an enzyme for use in an assay, while a second container contains probes.

[0107] In some embodiments, a kit of the invention includes a template-free polymerase. In some embodiments, the template-free polymerase is a terminal deoxynucleotidyl transferase (TdT) or a variant thereof. Some such kits further include 3'-O-blocked nucleotides. In further embodiments, a kit for extending polynucleotides or cDNAs may include a solid support with an initiator.

[0108] In some embodiments, a kit of the invention for synthesizing a random oligonucleotide barcode includes a TdT or a variant thereof, 3'-O-blocked nucleoside triphosphates, and arrays of microwells for carrying out extension and de-blocking reactions for split and mix synthesis of a barcode.

[0109] In some embodiments, a kit may include a microfluidic device for processing single cells and for delivering reagents thereto.

[0110] In some embodiments, a kit may include one or more solid supports with oligonucleotides attached for carrying out methods of synthesizing unique oligonucleotide barcodes on cDNAs. In some embodiments, such one or more solid supports comprise beads; in other embodiments, such one or more solid supports comprise a planar support having a surface coated with capture oligonucleotides.

Definitions

[0111] Unless otherwise specifically defined herein, terms and symbols of nucleic acid chemistry, biochemistry, genetics, and molecular biology used herein follow those of standard treatises and texts in the field, e.g. Kornberg and Baker, DNA Replication, Second Edition (W.H. Freeman, New York, 1992); Lehninger, Biochemistry, Second Edition (Worth Publishers, New York, 1975); Strachan and Read, Human Molecular Genetics, Second Edition (Wiley-Liss, New York, 1999).

[0112] "Amplify," "amplifies," "amplified," "amplification," as used herein, generally refer to any process by which one or more copies are made of a target polynucleotide or a portion thereof. A variety of methods of amplifying polynucleotides (e.g. DNA and/or RNA) are available, some examples of which are described herein. Amplification may be linear, exponential, or involve both linear and exponential phases in a multi-phase amplification process. Amplification methods may involve changes in temperature, such as a heat denaturation step, or may be isothermal processes that do not require heat denaturation. "Amplicon" means the product of a polynucleotide amplification reaction; that is, a clonal population of polynucleotides, which may be single stranded or double stranded, which are replicated from one or more starting sequences. "Amplifying" means producing an amplicon by carrying out an amplification reaction. The one or more starting sequences may be one or more copies of the same sequence, or they may be a mixture of different sequences. Preferably, amplicons are formed by the amplification of a single starting sequence. Amplicons may be produced by a variety of amplification reactions whose products comprise replicates of the one or more starting, or target, nucleic acids. In one aspect, amplification reactions producing amplicons are "template-driven" in that base pairing of reactants, either nucleotides or oligonucleotides, have complements in a template polynucleotide that are required for the creation of reaction products. In one aspect, template-driven reactions are primer extensions with a nucleic acid polymerase or oligonucleotide ligations with a nucleic acid ligase. 
Such reactions include, but are not limited to, polymerase chain reactions (PCRs), linear polymerase reactions, nucleic acid sequence-based amplification (NASBAs), rolling circle amplifications, and the like, disclosed in the following references that are incorporated herein by reference: Mullis et al, U.S. Pat. Nos. 4,683,195; 4,965,188; 4,683,202; 4,800,159 (PCR); Gelfand et al, U.S. Pat. No. 5,210,015 (real-time PCR with "taqman" probes); Wittwer et al, U.S. Pat. No. 6,174,670; Kacian et al, U.S. Pat. No. 5,399,491 ("NASBA"); Lizardi, U.S. Pat. No. 5,854,033; Aono et al, Japanese patent publ. JP 4-262799 (rolling circle amplification); and the like. In one aspect, amplicons of the invention are produced by PCRs. An amplification reaction may be a "real-time" amplification if a detection chemistry is available that permits a reaction product to be measured as the amplification reaction progresses, e.g. "real-time PCR" described below, or "real-time NASBA" as described in Leone et al, Nucleic Acids Research, 26: 2150-2155 (1998), and like references. As used herein, the term "amplifying" means performing an amplification reaction. A "reaction mixture" means a solution containing all the necessary reactants for performing a reaction, which may include, but not be limited to, buffering agents to maintain pH at a selected level during a reaction, salts, co-factors, scavengers, and the like.

[0113] "Binding compound" means, in some embodiments, any molecule to which oligonucleotide tags can be attached for identification that is capable of specifically binding to a non-nucleic acid ligand. Binding compounds include, but are not limited to, antibodies or compounds derived from antibodies, e.g. Fab fragments. Non-nucleic acid ligands include, but are not limited to, proteins. In some embodiments, binding compounds are attached, e.g. covalently attached, to a surface of a solid support. In some embodiments, oligonucleotide tags are releasably attached to binding compounds; that is, they are attached by a linkage that includes a bond that may be selectively cleaved by predetermined conditions, e.g. light, high pH, low pH, specific redox conditions, specific electrical potential, or the like.

[0114] "Functionally equivalent" in reference to amino acid positions in two or more different TdTs means (i) the amino acids at the respective positions play the same functional role in an activity of the TdTs, and (ii) the amino acids occur at homologous amino acid positions in the amino acid sequences of the respective TdTs. It is possible to identify positionally equivalent or homologous amino acid residues in the amino acid sequences of two or more different TdTs on the basis of sequence alignment and/or molecular modelling. In some embodiments, functionally equivalent amino acid positions belong to sequence motifs that are conserved among the amino acid sequences of TdTs of evolutionarily related species, e.g. genus, families, or the like. Examples of such conserved sequence motifs are described in Motea et al, Biochim. Biophys. Acta. 1804(5): 1151-1166 (2010); Delarue et al, EMBO J., 21: 427-439 (2002); and like references.

[0115] "Microfluidics" device or "nanofluidics" device, used interchangeably herein, each means an integrated system for capturing, moving, mixing, dispensing or analyzing small volumes of fluid, including samples (which, in turn, may contain or comprise cellular or molecular analytes of interest), reagents, dilutants, buffers, or the like. Generally, reference to "microfluidics" and "nanofluidics" denotes different scales in the size of devices and volumes of fluids handled. In some embodiments, features of a microfluidic device have cross-sectional dimensions of less than a few hundred square micrometers and have passages, or channels, with capillary dimensions, e.g. having maximal cross-sectional dimensions of from about 500 µm to about 0.1 µm. In some embodiments, microfluidics devices have volume capacities in the range of from 1 µL to a few nL, e.g. 10-100 nL. Dimensions of corresponding features, or structures, in nanofluidics devices are typically from 1 to 3 orders of magnitude less than those for microfluidics devices. One skilled in the art would know from the circumstances of a particular application which dimensionality would be pertinent. In some embodiments, microfluidic or nanofluidic devices have one or more chambers, ports, and channels that are interconnected and in fluid communication and that are designed for carrying out one or more analytical reactions or processes, either alone or in cooperation with an appliance or instrument that provides support functions, such as sample introduction, fluid and/or reagent driving means, such as positive or negative pressure, acoustical energy, or the like, temperature control, detection systems, data collection and/or integration systems, and the like. In some embodiments, microfluidics and nanofluidics devices may further include valves, pumps, filters and specialized functional coatings on interior walls, e.g.
to prevent adsorption of sample components or reactants, facilitate reagent movement by electroosmosis, or the like. Such devices may be fabricated as an integrated device in a solid substrate, which may be glass, plastic, or other solid polymeric materials, and may have a planar format for ease of detecting and monitoring sample and reagent movement, especially via optical or electrochemical methods. In some embodiments, such devices are disposable after a single use. In some embodiments, microfluidic and nanofluidic devices include devices that form and control the movement, mixing, dispensing and analysis of droplets, such as, aqueous droplets immersed in an immiscible fluid, such as a light oil. The fabrication and operation of microfluidics and nanofluidics devices are well-known in the art as exemplified by the following references that are incorporated by reference: Ramsey, U.S. Pat. Nos. 6,001,229; 5,858,195; 6,010,607; and 6,033,546; Soane et al, U.S. Pat. Nos. 5,126,022 and 6,054,034; Nelson et al, U.S. Pat. No. 6,613,525; Maher et al, U.S. Pat. No. 6,399,952; Ricco et al, International patent publication WO 02/24322; Bjornson et al, International patent publication WO 99/19717; Wilding et al, U.S. Pat. Nos. 5,587,128; 5,498,392; Sia et al, Electrophoresis, 24: 3563-3576 (2003); Unger et al, Science, 288: 113-116 (2000); Enzelberger et al, U.S. Pat. No. 6,960,437; Cao, "Nanostructures & Nanomaterials: Synthesis, Properties & Applications," (Imperial College Press, London, 2004); Haeberle et al, LabChip, 7: 1094-1110 (2007); Cheng et al, Biochip Technology (CRC Press, 2001); and the like.

[0116] "Mutant" or "variant," which are used interchangeably, refer to polypeptides derived from a natural or reference TdT polypeptide described herein, and comprising a modification or an alteration, i.e., a substitution, insertion, and/or deletion, at one or more positions. Variants may be obtained by various techniques well known in the art. In particular, examples of techniques for altering the DNA sequence encoding the wild-type protein, include, but are not limited to, site-directed mutagenesis, random mutagenesis, sequence shuffling and synthetic oligonucleotide construction. Mutagenesis activities consist in deleting, inserting or substituting one or several amino-acids in the sequence of a protein or in the case of the invention of a polymerase. The following terminology is used to designate a substitution: L238A denotes that amino acid residue (Leucine, L) at position 238 of a reference, or wild type, sequence is changed to an Alanine (A). A132V/I/M denotes that amino acid residue (Alanine, A) at position 132 of the parent sequence is substituted by one of the following amino acids: Valine (V), Isoleucine (I), or Methionine (M). The substitution can be a conservative or non-conservative substitution. Examples of conservative substitutions are within the groups of basic amino acids (arginine, lysine and histidine), acidic amino acids (glutamic acid and aspartic acid), polar amino acids (glutamine, asparagine and threonine), hydrophobic amino acids (methionine, leucine, isoleucine, cysteine and valine), aromatic amino acids (phenylalanine, tryptophan and tyrosine), and small amino acids (glycine, alanine and serine).
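By way of illustration only, the substitution notation described above may be handled programmatically along the following lines. This is a minimal sketch, not part of the disclosed methods: the function names `parse_substitution` and `apply_substitution` are hypothetical, the four-residue reference sequence is a toy string rather than a TdT, and positions are assumed to be 1-based as in the notation.

```python
import re

# Pattern for codes such as "L238A" or "A132V/I/M" (notation from the text;
# the helper names below are hypothetical).
_SUB = re.compile(r"^([A-Z])(\d+)([A-Z](?:/[A-Z])*)$")

def parse_substitution(code):
    """Split a code into (original residue, 1-based position, allowed replacements)."""
    m = _SUB.match(code)
    if m is None:
        raise ValueError(f"not a substitution code: {code!r}")
    return m.group(1), int(m.group(2)), m.group(3).split("/")

def apply_substitution(seq, code, choice=None):
    """Return seq with the substitution applied; the first listed option is the default."""
    original, pos, options = parse_substitution(code)
    if pos < 1 or pos > len(seq) or seq[pos - 1] != original:
        raise ValueError(f"reference has no {original} at position {pos}")
    new = options[0] if choice is None else choice
    if new not in options:
        raise ValueError(f"{new} is not allowed by {code}")
    return seq[:pos - 1] + new + seq[pos:]

reference = "MALK"  # toy reference sequence, for illustration only
print(parse_substitution("A132V/I/M"))
print(apply_substitution(reference, "L3A"))
print(apply_substitution(reference, "A2V/I/M", choice="I"))
```

Checking the residue at the stated position before substituting guards against applying a code to the wrong reference sequence, which matters when variants are described relative to different parent TdTs.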

[0117] "Polymerase chain reaction" or "PCR" means a reaction for the in vitro amplification of specific DNA sequences by the simultaneous primer extension of complementary strands of DNA. In other words, PCR is a reaction for making multiple copies or replicates of a target nucleic acid flanked by primer binding sites, such reaction comprising one or more repetitions of the following steps: (i) denaturing the target nucleic acid, (ii) annealing primers to the primer binding sites, and (iii) extending the primers by a nucleic acid polymerase in the presence of nucleoside triphosphates. Usually, the reaction is cycled through different temperatures optimized for each step in a thermal cycler instrument. Particular temperatures, durations at each step, and rates of change between steps depend on many factors well-known to those of ordinary skill in the art, e.g. exemplified by the references: McPherson et al, editors, PCR: A Practical Approach and PCR2: A Practical Approach (IRL Press, Oxford, 1991 and 1995, respectively). For example, in a conventional PCR using Taq DNA polymerase, a double stranded target nucleic acid may be denatured at a temperature >90° C., primers annealed at a temperature in the range 50-75° C., and primers extended at a temperature in the range 72-78° C. Reaction volumes typically range from a few hundred nanoliters, e.g. 200 nL, to a few hundred µL, e.g. 200 µL.
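The three-step cycle described above may be sketched as follows, by way of illustration only. The temperature windows are those given in the paragraph for a conventional Taq PCR; the example profile values, the `efficiency` parameter, and the function names are assumptions introduced here, and the copy-number formula is the usual idealization in which each cycle multiplies the number of copies by (1 + efficiency).

```python
# Temperature windows in degrees C for a conventional Taq PCR, from the text.
# The text gives no upper bound for denaturation, so it is left open here.
TAQ_WINDOWS_C = {
    "denature": (90.0, None),  # text: ">90"
    "anneal": (50.0, 75.0),
    "extend": (72.0, 78.0),
}

def step_in_window(step, temp_c):
    """True if temp_c falls in the window given for the named step."""
    low, high = TAQ_WINDOWS_C[step]
    if step == "denature":
        return temp_c > low
    return low <= temp_c <= high

def ideal_copies(start_copies, cycles, efficiency=1.0):
    """Idealized exponential amplification; efficiency=1.0 means perfect doubling."""
    return start_copies * (1.0 + efficiency) ** cycles

# Illustrative profile only; actual values depend on primers and template.
profile = [("denature", 95.0), ("anneal", 60.0), ("extend", 72.0)]
print(all(step_in_window(step, temp) for step, temp in profile))
print(ideal_copies(1, 30))
```

Real reactions fall short of perfect doubling, particularly in late cycles, so the computed value is an upper bound rather than a prediction.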

[0118] "Primer" means an oligonucleotide, either natural or synthetic, that is capable, upon forming a duplex with a polynucleotide template, of acting as a point of initiation of nucleic acid synthesis and being extended from its 3' end along the template so that an extended duplex is formed. The sequence of nucleotides added during the extension process is determined by the sequence of the template polynucleotide. Usually primers are extended by a DNA polymerase. Primers usually have a length in the range of from 14 to 36 nucleotides.

[0119] "Polynucleotide" and "oligonucleotide" are used interchangeably and each means a linear polymer of nucleotide monomers. Monomers making up polynucleotides and oligonucleotides are capable of specifically binding to a natural polynucleotide by way of a regular pattern of monomer-to-monomer interactions, such as Watson-Crick type of base pairing. Such monomers and their internucleosidic linkages may be naturally occurring or may be analogs thereof, e.g. naturally occurring or non-naturally occurring analogs. Non-naturally occurring analogs may include phosphorothioate internucleosidic linkages, locked nucleic acids, bases containing linking groups permitting the attachment of labels, such as fluorophores, or haptens, or other oligonucleotides, and the like. Whenever the use of an oligonucleotide or polynucleotide requires enzymatic processing, such as extension by a polymerase, ligation by a ligase, or the like, one of ordinary skill in the art would understand that oligonucleotides or polynucleotides in those instances would not contain certain analogs of internucleosidic linkages, sugar moieties, or bases at any or some positions. Polynucleotides typically range in size from a few monomeric units, e.g. 5-40, when they are usually referred to as "oligonucleotides," to several thousand monomeric units. Whenever a polynucleotide or oligonucleotide is represented by a sequence of letters (upper or lower case), such as "ATGCCTG," it will be understood that the nucleotides are in 5'→3' order from left to right and that "A" denotes deoxyadenosine, "C" denotes deoxycytidine, "G" denotes deoxyguanosine, and "T" denotes thymidine, "I" denotes deoxyinosine, "U" denotes uridine, unless otherwise indicated or obvious from context. Unless otherwise noted the terminology and atom numbering conventions will follow those disclosed in Strachan and Read, Human Molecular Genetics 2 (Wiley-Liss, New York, 1999). 
Usually polynucleotides comprise the four natural nucleosides (e.g. deoxyadenosine, deoxycytidine, deoxyguanosine, deoxythymidine for DNA or their ribose counterparts for RNA) linked by phosphodiester linkages; however, they may also comprise non-natural nucleotide analogs, e.g. including modified bases, sugars, or internucleosidic linkages. Those skilled in the art would recognize when an enzyme has specific oligonucleotide or polynucleotide substrate requirements for activity, e.g. single stranded DNA, RNA/DNA duplex, or the like, and would be capable of selecting the appropriate compositions, especially with guidance from treatises, such as Sambrook et al, Molecular Cloning, Second Edition (Cold Spring Harbor Laboratory, New York, 1989), and like references. As used herein, "native polynucleotide" means a polymer of ribonucleotides or deoxyribonucleotides, without non-natural phosphate linkages, sugars, or bases. In some embodiments, native polynucleotides exclude polynucleotides having protection groups (such as exocyclic amine protection groups), linkers (including groups for attaching labels to bases), or labels, capture moieties, or the like. In some embodiments, a native polynucleotide may be a polynucleotide extracted from nature, a chemically or enzymatically synthesized polynucleotide without protection groups, or either of the foregoing attached to a support or with a label, linker or reactive moiety attached.
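The letter convention stated above (left to right is 5' to 3', upper or lower case) can be expressed as a simple lookup, shown here by way of illustration only. The name `spell_out` is hypothetical, and the sketch covers only the six letters defined in the text.

```python
# Letter-to-nucleoside mapping exactly as defined in the text.
NUCLEOSIDE = {
    "A": "deoxyadenosine",
    "C": "deoxycytidine",
    "G": "deoxyguanosine",
    "T": "thymidine",
    "I": "deoxyinosine",
    "U": "uridine",
}

def spell_out(sequence):
    """List nucleoside names in 5'->3' order; case is ignored as in the text."""
    try:
        return [NUCLEOSIDE[letter] for letter in sequence.upper()]
    except KeyError as err:
        raise ValueError(f"letter not defined in this convention: {err.args[0]}") from None

print(spell_out("ATGCCTG"))
```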

[0120] "Sequence identity" refers to the number (or fraction, usually expressed as a percentage) of matches (e.g., identical amino acid residues) between two sequences, such as two polypeptide sequences or two polynucleotide sequences. The sequence identity is determined by comparing the sequences when aligned so as to maximize overlap and identity while minimizing sequence gaps. In particular, sequence identity may be determined using any of a number of mathematical global or local alignment algorithms, depending on the length of the two sequences. Sequences of similar lengths are preferably aligned using a global alignment algorithm (e.g. Needleman and Wunsch algorithm; Needleman and Wunsch, 1970) which aligns the sequences optimally over the entire length, while sequences of substantially different lengths are preferably aligned using a local alignment algorithm (e.g. Smith and Waterman algorithm (Smith and Waterman, 1981) or Altschul algorithm (Altschul et al., 1997; Altschul et al., 2005)). Alignment for purposes of determining percent amino acid sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software on internet web sites such as http://blast.ncbi.nlm.nih.gov/ or http://www.ebi.ac.uk/Tools/emboss/. Those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithm needed to achieve maximal alignment over the full length of the sequences being compared. For purposes herein, % amino acid sequence identity values refer to values generated using the pairwise sequence alignment program EMBOSS Needle, that creates an optimal global alignment of two sequences using the Needleman-Wunsch algorithm, wherein all search parameters are set to default values, i.e. Scoring matrix=BLOSUM62, Gap open=10, Gap extend=0.5, End gap penalty=false, End gap open=10 and End gap extend=0.5.
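To illustrate how a percent identity follows from a global alignment, the following minimal sketch implements a Needleman-Wunsch alignment with a simplified scoring scheme. It is not EMBOSS Needle and will not reproduce its values: the match, mismatch, and linear gap scores used here are assumptions chosen for brevity, in place of the BLOSUM62 matrix and the affine gap penalties specified above. Identity is computed in this sketch as identical aligned positions divided by the number of alignment columns, gaps included.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch with a linear gap penalty; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def percent_identity(a, b):
    """Identical aligned positions as a percentage of alignment columns."""
    x, y = global_align(a, b)
    if not x:
        raise ValueError("cannot compare two empty sequences")
    matches = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * matches / len(x)

print(global_align("ACGT", "AGT"))
print(percent_identity("ACGT", "AGT"))
```

For identity values within the meaning of this definition, the sequences would instead be submitted to EMBOSS Needle with the default parameters listed above.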

[0121] "Sequence tag" (or "tag") or "barcode" means an oligonucleotide that is attached to a polynucleotide or template molecule and is used to identify and/or track the polynucleotide or template in a reaction or a series of reactions. A sequence tag may be attached to the 3'- or 5'-end of a polynucleotide or template or it may be inserted into the interior of such polynucleotide or template to form a linear conjugate, sometimes referred to herein as a "tagged polynucleotide," or "tagged template," or "tag-polynucleotide conjugate," "tag-molecule conjugate," or the like. Sequence tags may vary widely in size and compositions; the following references, which are incorporated herein by reference, provide guidance for selecting sets of sequence tags appropriate for particular embodiments: Brenner, U.S. Pat. No. 5,635,400; Brenner and Macevicz, U.S. Pat. No. 7,537,897; Brenner et al, Proc. Natl. Acad. Sci., 97: 1665-1670 (2000); Church et al, European patent publication 0 303 459; Shoemaker et al, Nature Genetics, 14: 450-456 (1996); Morris et al, European patent publication 0799897A1; Lorinez et al., U.S. Pat. No. 5,981,179; and the like. Lengths and compositions of sequence tags can vary widely, and the selection of particular lengths and/or compositions depends on several factors including, without limitation, how tags are used to generate a readout, e.g. via a hybridization reaction or via an enzymatic reaction, such as sequencing; whether they are labeled, e.g. with a fluorescent dye or the like; the number of distinguishable oligonucleotide tags required to unambiguously identify a set of polynucleotides, and the like, and how different tags of a set must be in order to ensure reliable identification, e.g. freedom from cross hybridization or misidentification from sequencing errors. In one aspect, sequence tags can each have a length within a range of from 2 to 36 nucleotides, or from 4 to 30 nucleotides, or from 8 to 20 nucleotides, or from 6 to 10 nucleotides, respectively. In one aspect, sets of sequence tags are used wherein each sequence tag of a set has a unique nucleotide sequence that differs from that of every other tag of the same set by at least two bases; in another aspect, sets of sequence tags are used wherein the sequence of each tag of a set differs from that of every other tag of the same set by at least three bases.
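The requirement that every tag of a set differ from every other tag by at least two (or three) bases can be checked as sketched below, by way of illustration only. The sketch assumes that all tags of a set have the same length, so that the number of differing bases is a Hamming distance; the function names and the three example tags are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes tags of equal length")
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(tags):
    """Smallest number of differing bases over all pairs of tags in the set."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))

def set_meets_spacing(tags, min_diff=2):
    """True if every tag differs from every other tag by at least min_diff bases."""
    return min_pairwise_distance(tags) >= min_diff

tags = ["ACGTAC", "ACCAAC", "TTGTAC"]  # illustrative tags only
print(min_pairwise_distance(tags))
print(set_meets_spacing(tags, min_diff=2), set_meets_spacing(tags, min_diff=3))
```

A spacing of two bases allows a single sequencing substitution to be detected, while a spacing of three allows it to be corrected to the nearest tag, which is one reason the larger spacing may be preferred when the readout is by sequencing.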

[0122] A "substitution" means that an amino acid residue is replaced by another amino acid residue. Preferably, the term "substitution" refers to the replacement of an amino acid residue by another selected from the naturally-occurring standard 20 amino acid residues, rare naturally occurring amino acid residues (e.g. hydroxyproline, hydroxylysine, allohydroxylysine, 6-N-methyllysine, N-ethylglycine, N-methylglycine, N-ethylasparagine, allo-isoleucine, N-methylisoleucine, N-methylvaline, pyroglutamine, aminobutyric acid, ornithine, norleucine, norvaline), and non-naturally occurring amino acid residues, often made synthetically, (e.g. cyclohexyl-alanine). Preferably, the term "substitution" refers to the replacement of an amino acid residue by another selected from the naturally-occurring standard 20 amino acid residues. The sign "+" indicates a combination of substitutions. The amino acids are herein represented by their one-letter or three-letter code according to the following nomenclature: A: alanine (Ala); C: cysteine (Cys); D: aspartic acid (Asp); E: glutamic acid (Glu); F: phenylalanine (Phe); G: glycine (Gly); H: histidine (His); I: isoleucine (Ile); K: lysine (Lys); L: leucine (Leu); M: methionine (Met); N: asparagine (Asn); P: proline (Pro); Q: glutamine (Gln); R: arginine (Arg); S: serine (Ser); T: threonine (Thr); V: valine (Val); W: tryptophan (Trp) and Y: tyrosine (Tyr). In the present document, the following terminology is used to designate a substitution: L238A denotes that amino acid residue (Leucine, L) at position 238 of the parent sequence is changed to an Alanine (A). A132V/I/M denotes that amino acid residue (Alanine, A) at position 132 of the parent sequence is substituted by one of the following amino acids: Valine (V), Isoleucine (I), or Methionine (M). The substitution can be a conservative or non-conservative substitution. 
Examples of conservative substitutions are within the groups of basic amino acids (arginine, lysine and histidine), acidic amino acids (glutamic acid and aspartic acid), polar amino acids (glutamine, asparagine and threonine), hydrophobic amino acids (methionine, leucine, isoleucine, cysteine and valine), aromatic amino acids (phenylalanine, tryptophan and tyrosine), and small amino acids (glycine, alanine and serine).
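The conservative-substitution groups listed above, together with the "+" sign for combinations, may be encoded as in the following sketch, provided by way of illustration only. The group membership is taken from the paragraph; the function names and the example combination "K10R+L238A" are hypothetical, and residues belonging to none of the listed groups (e.g. proline) are simply treated as having no conservative partner.

```python
import re

# Groups exactly as listed in the text, in one-letter code.
CONSERVATIVE_GROUPS = {
    "basic": set("RKH"),
    "acidic": set("ED"),
    "polar": set("QNT"),
    "hydrophobic": set("MLICV"),
    "aromatic": set("FWY"),
    "small": set("GAS"),
}

def is_conservative(original, replacement):
    """True if both residues fall within one of the listed groups."""
    return any(
        original in members and replacement in members
        for members in CONSERVATIVE_GROUPS.values()
    )

def classify_combination(combo):
    """Classify each single substitution in a '+'-joined combination."""
    result = {}
    for code in combo.split("+"):
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
        if m is None:
            raise ValueError(f"not a single substitution: {code!r}")
        kind = "conservative" if is_conservative(m.group(1), m.group(3)) else "non-conservative"
        result[code] = kind
    return result

print(classify_combination("K10R+L238A"))
```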

[0123] "Transcriptome" means a collection of all (or nearly all) gene transcripts produced in a particular cell, collection of cells, sample or tissue type. In some embodiments, a transcriptome comprises all or nearly all of the polyA messenger RNA (mRNA) of a cell, collection of cells, sample or tissue type.

[0124] "Viable" in reference to cells, tissues or organisms, in some embodiments, means that the cells, tissues or organisms are capable of being grown, cultured, or further propagated. In some embodiments, viable cells are alive and capable of mitotic or meiotic division and further growth after being subjected to at least one cycle of template-free enzymatic elongation of an attached initiator oligonucleotide. The term "viable cell" may include viable eukaryotic cells, prokaryotic cells, or viruses. In some embodiments, "viable cell" means viable eukaryotic cell; and in other embodiments, "viable cell" means viable mammalian cell. "Viable conditions" as the term is used herein are physicochemical reaction conditions (e.g.

[0125] temperature, salt concentration, solvent, and the like) that have no substantial deleterious effect on cell viability. In some embodiments, it is understood that additional reaction mixture components would be required for particular cell types, e.g. vitamins, amino acids, or the like, for viability; that is, as used herein, "viable conditions" refers to necessary conditions for cell viability but not sufficient conditions for viability of every cell type. In some embodiments, viable conditions comprise an aqueous reaction mixture with physiological salts, especially, sodium, calcium and/or potassium, at a concentration in the range of 0.8 to 1.0 percent (w/v), pH in the range of 6.8-7.8, and temperature in the range of 15°-41° C.
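The numerical ranges given above for viable conditions in some embodiments may be captured as a simple check, sketched below by way of illustration only. The class and function names are hypothetical, the treatment of the range endpoints as inclusive is an assumption, and, consistent with the definition, passing the check indicates only that these necessary conditions are met, not that any particular cell type will remain viable.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReactionConditions:
    salt_percent_wv: float  # physiological salts, percent (w/v)
    ph: float
    temperature_c: float    # degrees C

def within_viable_ranges(c):
    """True if all three parameters fall in the ranges stated in the text.

    Endpoints are treated as inclusive, which is an assumption of this sketch.
    """
    return (
        0.8 <= c.salt_percent_wv <= 1.0
        and 6.8 <= c.ph <= 7.8
        and 15.0 <= c.temperature_c <= 41.0
    )

print(within_viable_ranges(ReactionConditions(0.9, 7.4, 37.0)))
print(within_viable_ranges(ReactionConditions(0.9, 8.5, 37.0)))
```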

Sequence CWU 1

1

181510PRTARTIFICIAL SEQUENCETDT enzyme 1Met Asp Pro Leu Gln Ala Val His Leu Gly Pro Arg Lys Lys Arg Pro1 5 10 15Arg Gln Leu Gly Thr Pro Val Ala Ser Thr Pro Tyr Asp Ile Arg Phe 20 25 30Arg Asp Leu Val Leu Phe Ile Leu Glu Lys Lys Met Gly Thr Thr Arg 35 40 45Arg Ala Phe Leu Met Glu Leu Ala Arg Arg Lys Gly Phe Arg Val Glu 50 55 60Asn Glu Leu Ser Asp Ser Val Thr His Ile Val Ala Glu Asn Asn Ser65 70 75 80Gly Ser Asp Val Leu Glu Trp Leu Gln Leu Gln Asn Ile Lys Ala Ser 85 90 95Ser Glu Leu Glu Leu Leu Asp Ile Ser Trp Leu Ile Glu Cys Met Gly 100 105 110Ala Gly Lys Pro Val Glu Met Met Gly Arg His Gln Leu Val Val Asn 115 120 125Arg Asn Ser Ser Pro Ser Pro Val Pro Gly Ser Gln Asn Val Pro Ala 130 135 140Pro Ala Val Lys Lys Ile Ser Gln Tyr Ala Cys Gln Arg Arg Thr Thr145 150 155 160Leu Asn Asn Tyr Asn Gln Leu Phe Thr Asp Ala Leu Asp Ile Leu Ala 165 170 175Glu Asn Asp Glu Leu Arg Glu Asn Glu Gly Ser Cys Leu Ala Phe Met 180 185 190Arg Ala Ser Ser Val Leu Lys Ser Leu Pro Phe Pro Ile Thr Ser Met 195 200 205Lys Asp Thr Glu Gly Ile Pro Cys Leu Gly Asp Lys Val Lys Ser Ile 210 215 220Ile Glu Gly Ile Ile Glu Asp Gly Glu Ser Ser Glu Ala Lys Ala Val225 230 235 240Leu Asn Asp Glu Arg Tyr Lys Ser Phe Lys Leu Phe Thr Ser Val Phe 245 250 255Gly Val Gly Leu Lys Thr Ala Glu Lys Trp Phe Arg Met Gly Phe Arg 260 265 270Thr Leu Ser Lys Ile Gln Ser Asp Lys Ser Leu Arg Phe Thr Gln Met 275 280 285Gln Lys Ala Gly Phe Leu Tyr Tyr Glu Asp Leu Val Ser Cys Val Asn 290 295 300Arg Pro Glu Ala Glu Ala Val Ser Met Leu Val Lys Glu Ala Val Val305 310 315 320Thr Phe Leu Pro Asp Ala Leu Val Thr Met Thr Gly Gly Phe Arg Arg 325 330 335Gly Lys Met Thr Gly His Asp Val Asp Phe Leu Ile Thr Ser Pro Glu 340 345 350Ala Thr Glu Asp Glu Glu Gln Gln Leu Leu His Lys Val Thr Asp Phe 355 360 365Trp Lys Gln Gln Gly Leu Leu Leu Tyr Cys Asp Ile Leu Glu Ser Thr 370 375 380Phe Glu Lys Phe Lys Gln Pro Ser Arg Lys Val Asp Ala Leu Asp His385 390 395 400Phe Gln Lys Cys Phe Leu Ile Leu Lys Leu Asp His Gly Arg Val His 405 410 415Ser Glu Lys 
Ser Gly Gln Gln Glu Gly Lys Gly Trp Lys Ala Ile Arg 420 425 430Val Asp Leu Val Met Cys Pro Tyr Asp Arg Arg Ala Phe Ala Leu Leu 435 440 445Gly Trp Thr Gly Ser Arg Gln Phe Glu Arg Asp Leu Arg Arg Tyr Ala 450 455 460Thr His Glu Arg Lys Met Met Leu Asp Asn His Ala Leu Tyr Asp Arg465 470 475 480Thr Lys Arg Val Phe Leu Glu Ala Glu Ser Glu Glu Glu Ile Phe Ala 485 490 495His Leu Gly Leu Asp Tyr Ile Glu Pro Trp Glu Arg Asn Ala 500 505 5102381PRTARTIFICIAL SEQUENCEtruncated mouse sequence 2Asn Ser Ser Pro Ser Pro Val Pro Gly Ser Gln Asn Val Pro Ala Pro1 5 10 15Ala Val Lys Lys Ile Ser Gln Tyr Ala Cys Gln Arg Arg Thr Thr Leu 20 25 30Asn Asn Tyr Asn Gln Leu Phe Thr Asp Ala Leu Asp Ile Leu Ala Glu 35 40 45Asn Asp Glu Leu Arg Glu Asn Glu Gly Ser Cys Leu Ala Phe Met Arg 50 55 60Ala Ser Ser Val Leu Lys Ser Leu Pro Phe Pro Ile Thr Ser Met Lys65 70 75 80Asp Thr Glu Gly Ile Pro Cys Leu Gly Asp Lys Val Lys Ser Ile Ile 85 90 95Glu Gly Ile Ile Glu Asp Gly Glu Ser Ser Glu Ala Lys Ala Val Leu 100 105 110Asn Asp Glu Arg Tyr Lys Ser Phe Lys Leu Phe Thr Ser Val Phe Gly 115 120 125Val Gly Leu Lys Thr Ala Glu Lys Trp Phe Arg Met Gly Phe Arg Thr 130 135 140Leu Ser Lys Ile Gln Ser Asp Lys Ser Leu Arg Phe Thr Gln Met Gln145 150 155 160Lys Ala Gly Phe Leu Tyr Tyr Glu Asp Leu Val Ser Cys Val Asn Arg 165 170 175Pro Glu Ala Glu Ala Val Ser Met Leu Val Lys Glu Ala Val Val Thr 180 185 190Phe Leu Pro Asp Ala Leu Val Thr Met Thr Gly Gly Phe Arg Arg Gly 195 200 205Lys Met Thr Gly His Asp Val Asp Phe Leu Ile Thr Ser Pro Glu Ala 210 215 220Thr Glu Asp Glu Glu Gln Gln Leu Leu His Lys Val Thr Asp Phe Trp225 230 235 240Lys Gln Gln Gly Leu Leu Leu Tyr Cys Asp Ile Leu Glu Ser Thr Phe 245 250 255Glu Lys Phe Lys Gln Pro Ser Arg Lys Val Asp Ala Leu Asp His Phe 260 265 270Gln Lys Cys Phe Leu Ile Leu Lys Leu Asp His Gly Arg Val His Ser 275 280 285Glu Lys Ser Gly Gln Gln Glu Gly Lys Gly Trp Lys Ala Ile Arg Val 290 295 300Asp Leu Val Met Cys Pro Tyr Asp Arg Arg Ala Phe Ala Leu Leu Gly305 310 315 320Trp Thr Gly Ser Arg 
Gln Phe Glu Arg Asp Leu Arg Arg Tyr Ala Thr 325 330 335His Glu Arg Lys Met Met Leu Asp Asn His Ala Leu Tyr Asp Arg Thr 340 345 350Lys Arg Val Phe Leu Glu Ala Glu Ser Glu Glu Glu Ile Phe Ala His 355 360 365Leu Gly Leu Asp Tyr Ile Glu Pro Trp Glu Arg Asn Ala 370 375 3803380PRTARTIFICIAL SEQUENCEBovine truncated (catalytic domain) 3Asp Tyr Ser Ala Thr Pro Asn Pro Gly Phe Gln Lys Thr Pro Pro Leu1 5 10 15Ala Val Lys Lys Ile Ser Gln Tyr Ala Cys Gln Arg Lys Thr Thr Leu 20 25 30Asn Asn Tyr Asn His Ile Phe Thr Asp Ala Phe Glu Ile Leu Ala Glu 35 40 45Asn Ser Glu Phe Lys Glu Asn Glu Val Ser Tyr Val Thr Phe Met Arg 50 55 60Ala Ala Ser Val Leu Lys Ser Leu Pro Phe Thr Ile Ile Ser Met Lys65 70 75 80Asp Thr Glu Gly Ile Pro Cys Leu Gly Asp Lys Val Lys Cys Ile Ile 85 90 95Glu Glu Ile Ile Glu Asp Gly Glu Ser Ser Glu Val Lys Ala Val Leu 100 105 110Asn Asp Glu Arg Tyr Gln Ser Phe Lys Leu Phe Thr Ser Val Phe Gly 115 120 125Val Gly Leu Lys Thr Ser Glu Lys Trp Phe Arg Met Gly Phe Arg Ser 130 135 140Leu Ser Lys Ile Met Ser Asp Lys Thr Leu Lys Phe Thr Lys Met Gln145 150 155 160Lys Ala Gly Phe Leu Tyr Tyr Glu Asp Leu Val Ser Cys Val Thr Arg 165 170 175Ala Glu Ala Glu Ala Val Gly Val Leu Val Lys Glu Ala Val Trp Ala 180 185 190Phe Leu Pro Asp Ala Phe Val Thr Met Thr Gly Gly Phe Arg Arg Gly 195 200 205Lys Lys Ile Gly His Asp Val Asp Phe Leu Ile Thr Ser Pro Gly Ser 210 215 220Ala Glu Asp Glu Glu Gln Leu Leu Pro Lys Val Ile Asn Leu Trp Glu225 230 235 240Lys Lys Gly Leu Leu Leu Tyr Tyr Asp Leu Val Glu Ser Thr Phe Glu 245 250 255Lys Phe Lys Leu Pro Ser Arg Gln Val Asp Thr Leu Asp His Phe Gln 260 265 270Lys Cys Phe Leu Ile Leu Lys Leu His His Gln Arg Val Asp Ser Ser 275 280 285Lys Ser Asn Gln Gln Glu Gly Lys Thr Trp Lys Ala Ile Arg Val Asp 290 295 300Leu Val Met Cys Pro Tyr Glu Asn Arg Ala Phe Ala Leu Leu Gly Trp305 310 315 320Thr Gly Ser Arg Gln Phe Glu Arg Asp Ile Arg Arg Tyr Ala Thr His 325 330 335Glu Arg Lys Met Met Leu Asp Asn His Ala Leu Tyr Asp Lys Thr Lys 340 345 350Arg Val Phe Leu Lys Ala 
Glu Ser Glu Glu Glu Ile Phe Ala His Leu 355 360 365Gly Leu Asp Tyr Ile Glu Pro Trp Glu Arg Asn Ala 370 375 3804380PRTARTIFICIAL SEQUENCEHuman truncated 4Asp Tyr Ser Asp Ser Thr Asn Pro Gly Pro Pro Lys Thr Pro Pro Ile1 5 10 15Ala Val Gln Lys Ile Ser Gln Tyr Ala Cys Gln Arg Arg Thr Thr Leu 20 25 30Asn Asn Cys Asn Gln Ile Phe Thr Asp Ala Phe Asp Ile Leu Ala Glu 35 40 45Asn Cys Glu Phe Arg Glu Asn Glu Asp Ser Cys Val Thr Phe Met Arg 50 55 60Ala Ala Ser Val Leu Lys Ser Leu Pro Phe Thr Ile Ile Ser Met Lys65 70 75 80Asp Thr Glu Gly Ile Pro Cys Leu Gly Ser Lys Val Lys Gly Ile Ile 85 90 95Glu Glu Ile Ile Glu Asp Gly Glu Ser Ser Glu Val Lys Ala Val Leu 100 105 110Asn Asp Glu Arg Tyr Gln Ser Phe Lys Leu Phe Thr Ser Val Phe Gly 115 120 125Val Gly Leu Lys Thr Ser Glu Lys Trp Phe Arg Met Gly Phe Arg Thr 130 135 140Leu Ser Lys Val Arg Ser Asp Lys Ser Leu Lys Phe Thr Arg Met Gln145 150 155 160Lys Ala Gly Phe Leu Tyr Tyr Glu Asp Leu Val Ser Cys Val Thr Arg 165 170 175Ala Glu Ala Glu Ala Val Ser Val Leu Val Lys Glu Ala Val Trp Ala 180 185 190Phe Leu Pro Asp Ala Phe Val Thr Met Thr Gly Gly Phe Arg Arg Gly 195 200 205Lys Lys Met Gly His Asp Val Asp Phe Leu Ile Thr Ser Pro Gly Ser 210 215 220Thr Glu Asp Glu Glu Gln Leu Leu Gln Lys Val Met Asn Leu Trp Glu225 230 235 240Lys Lys Gly Leu Leu Leu Tyr Tyr Asp Leu Val Glu Ser Thr Phe Glu 245 250 255Lys Leu Arg Leu Pro Ser Arg Lys Val Asp Ala Leu Asp His Phe Gln 260 265 270Lys Cys Phe Leu Ile Phe Lys Leu Pro Arg Gln Arg Val Asp Ser Asp 275 280 285Gln Ser Ser Trp Gln Glu Gly Lys Thr Trp Lys Ala Ile Arg Val Asp 290 295 300Leu Val Leu Cys Pro Tyr Glu Arg Arg Ala Phe Ala Leu Leu Gly Trp305 310 315 320Thr Gly Ser Arg Gln Phe Glu Arg Asp Leu Arg Arg Tyr Ala Thr His 325 330 335Glu Arg Lys Met Ile Leu Asp Asn His Ala Leu Tyr Asp Lys Thr Lys 340 345 350Arg Ile Phe Leu Lys Ala Glu Ser Glu Glu Glu Ile Phe Ala His Leu 355 360 365Gly Leu Asp Tyr Ile Glu Pro Trp Glu Arg Asn Ala 370 375 3805376PRTARTIFICIAL SEQUENCEChicken 1 truncated 5Gln Tyr Pro Thr Leu 
Lys Thr Pro Glu Ser Glu Val Ser Ser Phe Thr1 5 10 15Ala Ser Lys Val Ser Gln Tyr Ser Cys Gln Arg Lys Thr Thr Leu Asn 20 25 30Asn Cys Asn Lys Lys Phe Thr Asp Ala Phe Glu Ile Met Ala Glu Asn 35 40 45Tyr Glu Phe Lys Glu Asn Glu Ile Phe Cys Leu Glu Phe Leu Arg Ala 50 55 60Ala Ser Val Leu Lys Ser Leu Pro Phe Pro Val Thr Arg Met Lys Asp65 70 75 80Ile Gln Gly Leu Pro Cys Met Gly Asp Arg Val Arg Asp Val Ile Glu 85 90 95Glu Ile Ile Glu Glu Gly Glu Ser Ser Arg Ala Lys Asp Val Leu Asn 100 105 110Asp Glu Arg Tyr Lys Ser Phe Lys Glu Phe Thr Ser Val Phe Gly Val 115 120 125Gly Val Lys Thr Ser Glu Lys Trp Phe Arg Met Gly Leu Arg Thr Val 130 135 140Glu Glu Val Lys Ala Asp Lys Thr Leu Lys Leu Ser Lys Met Gln Arg145 150 155 160Ala Gly Phe Leu Tyr Tyr Glu Asp Leu Val Ser Cys Val Ser Lys Ala 165 170 175Glu Ala Asp Ala Val Ser Ser Ile Val Lys Asn Thr Val Cys Thr Phe 180 185 190Leu Pro Asp Ala Leu Val Thr Ile Thr Gly Gly Phe Arg Arg Gly Lys 195 200 205Lys Ile Gly His Asp Ile Asp Phe Leu Ile Thr Ser Pro Gly Gln Arg 210 215 220Glu Asp Asp Glu Leu Leu His Lys Gly Leu Leu Leu Tyr Cys Asp Ile225 230 235 240Ile Glu Ser Thr Phe Val Lys Glu Gln Ile Pro Ser Arg His Val Asp 245 250 255Ala Met Asp His Phe Gln Lys Cys Phe Ala Ile Leu Lys Leu Tyr Gln 260 265 270Pro Arg Val Asp Asn Ser Ser Tyr Asn Met Ser Lys Lys Cys Asp Met 275 280 285Ala Glu Val Lys Asp Trp Lys Ala Ile Arg Val Asp Leu Val Ile Thr 290 295 300Pro Phe Glu Gln Tyr Ala Tyr Ala Leu Leu Gly Trp Thr Gly Ser Arg305 310 315 320Gln Phe Gly Arg Asp Leu Arg Arg Tyr Ala Thr His Glu Arg Lys Met 325 330 335Met Leu Asp Asn His Ala Leu Tyr Asp Lys Arg Lys Arg Val Phe Leu 340 345 350Lys Ala Gly Ser Glu Glu Glu Ile Phe Ala His Leu Gly Leu Asp Tyr 355 360 365Val Glu Pro Trp Glu Arg Asn Ala 370 3756387PRTARTIFICIAL SEQUENCEPossum truncated 6Ser Ala Asn Pro Asp Pro Thr Ala Gly Thr Leu Asn Ile Leu Pro Pro1 5 10 15Thr Thr Lys Thr Ile Ser Gln Tyr Ala Cys Gln Arg Arg Thr Thr Ile 20 25 30Asn Asn His Asn Gln Arg Phe Thr Asp Ala Phe Glu Ile Leu Ala Lys 35 40 
45Asn Tyr Glu Phe Lys Glu Asn Asp Asp Thr Cys Leu Thr Phe Met Arg 50 55 60Ala Ile Ser Val Leu Lys Cys Leu Pro Phe Glu Val Val Ser Leu Lys65 70 75 80Asp Thr Glu Gly Leu Pro Trp Ile Gly Asp Glu Val Lys Gly Ile Met 85 90 95Glu Glu Ile Ile Glu Asp Gly Glu Ser Leu Glu Val Gln Ala Val Leu 100 105 110Asn Asp Glu Arg Tyr Gln Ser Phe Lys Leu Phe Thr Ser Val Phe Gly 115 120 125Val Gly Leu Lys Thr Ala Asp Lys Trp Tyr Arg Met Gly Phe Arg Thr 130 135 140Leu Asn Lys Ile Arg Ser Asp Lys Thr Leu Lys Leu Thr Lys Met Gln145 150 155 160Lys Ala Gly Leu Cys Tyr Tyr Glu Asp Leu Ile Asp Cys Val Ser Lys 165 170 175Ala Glu Ala Asp Ala Val Ser Leu Leu Val Gln Asp Ala Val Trp Thr 180 185 190Phe Leu Pro Asp Ala Leu Val Thr Ile Thr Gly Gly Phe Arg Arg Gly 195 200 205Lys Glu Phe Gly His Asp Val Asp Phe Leu Ile Thr Ser Pro Gly Ala 210 215 220Glu Lys Glu Gln Glu Asp Gln Leu Leu Gln Lys Val Thr Asn Leu Trp225 230 235 240Lys Lys Gln Gly Leu Leu Leu Tyr Cys Asp Leu Ile Glu Ser Thr Phe 245 250 255Glu Asp Leu Lys Leu Pro Ser Arg Lys Ile Asp Ala Leu Asp His Phe 260 265 270Gln Lys Cys Phe Leu Ile Leu Lys Leu Tyr His His Lys Glu Asp Lys 275 280 285Arg Lys Trp Glu Met Pro Thr Gly Ser Asn Glu Ser Glu Ala Lys Ser 290 295 300Trp Lys Ala Ile Arg Val Asp Leu Val Val Cys Pro Tyr Asp Arg Tyr305 310 315 320Ala Phe Ala Leu Leu Gly Trp Ser Gly Ser Arg Gln Phe Glu Arg Asp 325 330 335Leu Arg Arg Tyr Ala Thr His Glu Lys Lys Met Met Leu Asp Asn His 340 345 350Ala Leu Tyr Asp Lys Thr Lys Lys Ile Phe Leu Lys Ala Lys Ser Glu 355 360 365Glu Glu Ile Phe Ala His Leu Gly Leu Glu Tyr Ile Gln Pro Ser Glu 370 375 380Arg Asn Ala3857381PRTARTIFICIAL SEQUENCENew truncated shrew 7Asp Cys Pro Ala Ser His Asp Ser Ser Pro Gln Lys Thr Glu Ser Ala1 5 10

15Ala Val Gln Lys Ile Ser Gln Tyr Ala Cys Gln Arg Arg Thr Thr Leu 20 25 30Asn Asn His Asn His Ile Phe Thr Asp Ala Phe Glu Ile Leu Ala Glu 35 40 45Asn Cys Glu Phe Arg Glu Asn Glu Gly Ser Tyr Val Thr Tyr Met Arg 50 55 60Ala Ala Ser Val Leu Lys Ser Leu Pro Phe Ser Ile Ile Ser Met Lys65 70 75 80Asp Thr Glu Gly Ile Pro Cys Leu Ala Asp Lys Val Lys Cys Val Ile 85 90 95Glu Glu Ile Ile Glu Asp Gly Glu Ser Ser Glu Val Lys Ala Val Leu 100 105 110Asn Asp Glu Arg Tyr Lys Ser Phe Lys Leu Phe Thr Ser Val Phe Gly 115 120 125Val Gly Leu Lys Thr Ala Glu Lys Trp Phe Arg Leu Gly Phe Arg Thr 130 135 140Leu Ser Gly Ile Met Asn Asp Lys Thr Leu Lys Leu Thr His Met Gln145 150 155 160Lys Ala Gly Phe Leu Tyr Tyr Glu Asp Leu Val Ser Cys Val Thr Arg 165 170 175Ala Glu Ala Glu Ala Val Gly Val Leu Val Lys Glu Ala Val Trp Ala 180 185 190Phe Leu Pro Asp Ala Ile Val Thr Met Thr Gly Gly Phe Arg Arg Gly 195 200 205Lys Lys Val Gly His Asp Val Asp Phe Leu Ile Thr Ser Pro Glu Ala 210 215 220Thr Glu Glu Gln Glu Gln Gln Leu Leu His Lys Val Ile Thr Phe Trp225 230 235 240Glu Lys Glu Gly Leu Leu Leu Tyr Cys Asp Leu Tyr Glu Ser Thr Phe 245 250 255Glu Lys Leu Lys Met Pro Ser Arg Lys Val Asp Ala Leu Asp His Phe 260 265 270Gln Lys Cys Phe Leu Ile Leu Lys Leu His Arg Glu Cys Val Asp Asp 275 280 285Gly Thr Ser Ser Gln Leu Gln Gly Lys Thr Trp Lys Ala Ile Arg Val 290 295 300Asp Leu Val Val Cys Pro Tyr Glu Cys Arg Ala Phe Ala Leu Leu Gly305 310 315 320Trp Thr Gly Ser Pro Gln Phe Glu Arg Asp Leu Arg Arg Tyr Ala Thr 325 330 335His Glu Arg Lys Met Met Leu Asp Asn His Ala Leu Tyr Asp Lys Thr 340 345 350Lys Arg Lys Phe Leu Ser Ala Asp Ser Glu Glu Asp Ile Phe Ala His 355 360 365Leu Gly Leu Asp Tyr Ile Glu Pro Trp Glu Arg Asn Ala 370 375 3808387PRTARTIFICIAL SEQUENCEPython truncated 8Glu Lys Tyr Gln Leu Pro Glu Asp Glu Asp Arg Ser Val Thr Ser Asp1 5 10 15Leu Asp Arg Asp Ser Ile Ser Glu Tyr Ala Cys Gln Arg Arg Thr Thr 20 25 30Leu Lys Asn Tyr Asn Gln Lys Phe Thr Asp Ala Phe Glu Ile Leu Ala 35 40 45Glu Asn Tyr Glu Phe Asn Glu 
Asn Lys Gly Phe Cys Thr Ala Phe Arg 50 55 60Arg Ala Ala Ser Val Leu Lys Cys Leu Pro Phe Thr Ile Val Gln Val65 70 75 80His Asp Ile Glu Gly Val Pro Trp Met Gly Lys Gln Val Lys Gly Ile 85 90 95Ile Glu Asp Ile Ile Glu Glu Gly Glu Ser Ser Lys Val Lys Ala Val 100 105 110Leu Asp Asn Glu Asn Tyr Arg Ser Val Lys Leu Phe Thr Ser Val Phe 115 120 125Gly Val Gly Leu Lys Thr Ser Asp Lys Trp Tyr Arg Met Gly Leu Arg 130 135 140Thr Leu Glu Glu Val Lys Arg Asp Lys Asn Leu Lys Leu Thr Arg Met145 150 155 160Gln Lys Ala Gly Phe Leu His Tyr Asp Asp Leu Thr Ser Cys Val Ser 165 170 175Lys Ala Glu Ala Asp Ala Ala Ser Leu Ile Val Gln Asp Val Val Trp 180 185 190Lys Ile Val Pro Asn Ala Ile Val Thr Ile Ala Gly Gly Phe Arg Arg 195 200 205Gly Lys Gln Thr Gly His Asp Val Asp Phe Leu Ile Thr Val Pro Gly 210 215 220Ser Lys Gln Glu Glu Glu Glu Leu Leu His Thr Val Ile Asp Ile Trp225 230 235 240Lys Lys Gln Glu Leu Leu Leu Tyr Tyr Asp Leu Ile Glu Ser Thr Phe 245 250 255Glu Asp Thr Lys Leu Pro Ser Arg Lys Val Asp Ala Leu Asp His Phe 260 265 270Gln Lys Cys Phe Ala Ile Leu Lys Val His Lys Glu Arg Glu Asp Lys 275 280 285Gly Asn Ser Ile Arg Ser Lys Ala Phe Ser Glu Glu Glu Ile Lys Asp 290 295 300Trp Lys Ala Ile Arg Val Asp Leu Val Val Val Pro Phe Glu Gln Tyr305 310 315 320Ala Phe Ala Leu Leu Gly Trp Thr Gly Ser Thr Gln Phe Glu Arg Asp 325 330 335Leu Arg Arg Tyr Ala Thr His Glu Lys Lys Met Met Leu Asp Asn His 340 345 350Ala Leu Tyr Asp Lys Thr Lys Lys Ile Phe Leu Asn Ala Ala Ser Glu 355 360 365Glu Glu Ile Phe Ala His Leu Gly Leu Asp Tyr Leu Glu Pro Trp Glu 370 375 380Arg Asn Ala3859381PRTARTIFICIAL SEQUENCEtruncated dog 9Asp Tyr Thr Ala Ser Pro Asn Pro Glu Leu Gln Lys Thr Leu Pro Val1 5 10 15Ala Val Lys Lys Ile Ser Gln Tyr Ala Cys Gln Arg Arg Thr Thr Leu 20 25 30Asn Asn Tyr Asn Asn Val Phe Thr Asp Ala Phe Glu Val Leu Ala Glu 35 40 45Asn Tyr Glu Phe Arg Glu Asn Glu Val Phe Ser Leu Thr Phe Met Arg 50 55 60Ala Ala Ser Val Leu Lys Ser Leu Pro Phe Thr Ile Ile Ser Met Lys65 70 75 80Asp Thr Glu Gly Ile Pro Cys Leu 
Gly Asp Gln Val Lys Cys Ile Ile 85 90 95Glu Glu Ile Ile Glu Asp Gly Glu Ser Ser Glu Val Lys Ala Val Leu 100 105 110Asn Asp Glu Arg Tyr Gln Ser Phe Lys Leu Phe Thr Ser Val Phe Gly 115 120 125Val Gly Leu Lys Thr Ser Glu Lys Trp Phe Arg Met Gly Phe Arg Thr 130 135 140Leu Ser Lys Ile Lys Ser Asp Lys Ser Leu Lys Phe Thr Pro Met Gln145 150 155 160Lys Ala Gly Phe Leu Tyr Tyr Glu Asp Leu Val Ser Cys Val Thr Arg 165 170 175Ala Glu Ala Glu Ala Val Gly Val Leu Val Lys Glu Ala Val Gly Ala 180 185 190Phe Leu Pro Asp Ala Phe Val Thr Met Thr Gly Gly Phe Arg Arg Gly 195 200 205Lys Lys Met Gly His Asp Val Asp Phe Leu Ile Thr Ser Pro Gly Ser 210 215 220Thr Asp Glu Asp Glu Glu Gln Leu Leu Pro Lys Val Ile Asn Leu Trp225 230 235 240Glu Arg Lys Gly Leu Leu Leu Tyr Cys Asp Leu Val Glu Ser Thr Phe 245 250 255Glu Lys Leu Lys Leu Pro Ser Arg Lys Val Asp Ala Leu Asp His Phe 260 265 270Gln Lys Cys Phe Leu Ile Leu Lys Leu His His Gln Arg Val Asp Gly 275 280 285Gly Lys Cys Ser Gln Gln Glu Gly Lys Thr Trp Lys Ala Ile Arg Val 290 295 300Asp Leu Val Met Cys Pro Tyr Glu Arg Arg Ala Phe Ala Leu Leu Gly305 310 315 320Trp Thr Gly Ser Arg Gln Phe Glu Arg Asp Leu Arg Arg Tyr Ala Ser 325 330 335His Glu Arg Lys Met Ile Leu Asp Asn His Ala Leu Tyr Asp Lys Thr 340 345 350Lys Lys Ile Phe Leu Lys Ala Glu Ser Glu Glu Glu Ile Phe Ala His 355 360 365Leu Gly Leu Asp Tyr Ile Glu Pro Trp Glu Arg Asn Ala 370 375 38010382PRTARTIFICIAL SEQUENCETRUNC MOLE 10Gly Asp Cys Pro Ala Ser His Asp Ser Ser Pro Gln Lys Thr Glu Ser1 5 10 15Ala Ala Val Gln Lys Ile Ser Gln Tyr Ala Cys Gln Arg Arg Thr Thr 20 25 30Leu Asn Asn His Asn His Ile Phe Thr Asp Ala Phe Glu Ile Leu Ala 35 40 45Glu Asn Cys Glu Phe Arg Glu Asn Glu Gly Ser Tyr Val Thr Tyr Met 50 55 60Arg Ala Ala Ser Val Leu Lys Ser Leu Pro Phe Ser Ile Ile Ser Met65 70 75 80Lys Asp Thr Glu Gly Ile Pro Cys Leu Ala Asp Lys Val Lys Cys Val 85 90 95Ile Glu Glu Ile Ile Glu Asp Gly Glu Ser Ser Glu Val Lys Ala Val 100 105 110Leu Asn Asp Glu Arg Tyr Lys Ser Phe Lys Leu Phe Thr Ser Val Phe 
115 120 125Gly Val Gly Leu Lys Thr Ala Glu Lys Trp Phe Arg Leu Gly Phe Arg 130 135 140Thr Leu Ser Gly Ile Met Asn Asp Lys Thr Leu Lys Leu Thr His Met145 150 155 160Gln Lys Ala Gly Phe Leu Tyr Tyr Glu Asp Leu Val Ser Cys Val Thr 165 170 175Arg Ala Glu Ala Glu Ala Val Gly Val Leu Val Lys Glu Ala Val Trp 180 185 190Ala Phe Leu Pro Asp Ala Ile Val Thr Met Thr Gly Gly Phe Arg Arg 195 200 205Gly Lys Lys Val Gly His Asp Val Asp Phe Leu Ile Thr Ser Pro Glu 210 215 220Ala Thr Glu Glu Gln Glu Gln Gln Leu Leu His Lys Val Ile Thr Phe225 230 235 240Trp Glu Lys Glu Gly Leu Leu Leu Tyr Cys Asp Leu Tyr Glu Ser Thr 245 250 255Phe Glu Lys Leu Lys Met Pro Ser Arg Lys Val Asp Ala Leu Asp His 260 265 270Phe Gln Lys Cys Phe Leu Ile Leu Lys Leu His Arg Glu Cys Val Asp 275 280 285Asp Gly Thr Ser Ser Gln Leu Gln Gly Lys Thr Trp Lys Ala Ile Arg 290 295 300Val Asp Leu Val Val Cys Pro Tyr Glu Cys Arg Ala Phe Ala Leu Leu305 310 315 320Gly Trp Thr Gly Ser Pro Gln Phe Glu Arg Asp Leu Arg Arg Tyr Ala 325 330 335Thr His Glu Arg Lys Met Met Leu Asp Asn His Ala Leu Tyr Asp Lys 340 345 350Thr Lys Arg Lys Phe Leu Ser Ala Asp Ser Glu Glu Asp Ile Phe Ala 355 360 365His Leu Gly Leu Asp Tyr Ile Glu Pro Trp Glu Arg Asn Ala 370 375 38011379PRTARTIFICIAL SEQUENCEPika trunk 11Glu Tyr Ser Ala Asn Pro Ser Pro Gly Pro Gln Ala Thr Pro Ala Val1 5 10 15Tyr Lys Ile Ser Gln Tyr Ala Cys Gln Arg Arg Thr Thr Leu Asn Asn 20 25 30His Asn His Ile Phe Thr Asp Ala Phe Glu Ile Leu Ala Glu Asn Tyr 35 40 45Glu Phe Lys Glu Asn Glu Gly Cys Tyr Val Thr Tyr Met Arg Ala Ala 50 55 60Ser Val Leu Lys Ser Leu Pro Phe Thr Ile Val Ser Met Lys Asp Thr65 70 75 80Glu Gly Ile Pro Cys Leu Glu Asp Lys Val Lys Ser Ile Met Glu Glu 85 90 95Ile Ile Glu Glu Gly Glu Ser Ser Glu Val Lys Ala Val Leu Ser Asp 100 105 110Glu Arg Tyr Gln Cys Phe Lys Leu Phe Thr Ser Val Phe Gly Val Gly 115 120 125Leu Lys Thr Ser Glu Lys Trp Phe Arg Met Gly Phe Arg Ser Leu Ser 130 135 140Asn Ile Arg Leu Asp Lys Ser Leu Lys Phe Thr Gln Met Gln Lys Ala145 150 155 160Gly Phe 
Arg Tyr Tyr Glu Asp Ile Val Ser Cys Val Thr Arg Ala Glu 165 170 175Ala Glu Ala Val Asp Val Leu Val Asn Glu Ala Val Arg Ala Phe Leu 180 185 190Pro Asp Ala Phe Ile Thr Met Thr Gly Gly Phe Arg Arg Gly Lys Lys 195 200 205Ile Gly His Asp Val Asp Phe Leu Ile Thr Ser Pro Glu Leu Thr Glu 210 215 220Glu Asp Glu Gln Gln Leu Leu His Lys Val Met Asn Leu Trp Glu Lys225 230 235 240Lys Gly Leu Leu Leu Tyr His Asp Leu Val Glu Ser Thr Phe Glu Lys 245 250 255Leu Lys Gln Pro Ser Arg Lys Val Asp Ala Leu Asp His Phe Gln Lys 260 265 270Cys Phe Leu Ile Phe Lys Leu Tyr His Glu Arg Val Gly Gly Asp Arg 275 280 285Cys Arg Gln Pro Glu Gly Lys Asp Trp Lys Ala Ile Arg Val Asp Leu 290 295 300Val Met Cys Pro Tyr Glu Cys His Ala Phe Ala Leu Leu Gly Trp Thr305 310 315 320Gly Ser Arg Gln Phe Glu Arg Asp Leu Arg Arg Tyr Ala Ser His Glu 325 330 335Arg Lys Met Ile Leu Asp Asn His Ala Leu Tyr Asp Lys Thr Lys Arg 340 345 350Val Phe Leu Gln Ala Glu Asn Glu Glu Glu Ile Phe Ala His Leu Gly 355 360 365Leu Asp Tyr Ile Glu Pro Trp Glu Arg Asn Ala 370 37512384PRTARTIFICIAL SEQUENCETRUNC HEDGEHOG 12Asp Ala Ser Phe Gly Ser Asn Pro Gly Ser Gln Asn Thr Pro Pro Leu1 5 10 15Ala Ile Lys Lys Ile Ser Gln Tyr Ala Cys Gln Arg Arg Thr Ser Leu 20 25 30Asn Asn Cys Asn His Ile Phe Thr Asp Ala Leu Asp Ile Leu Ala Glu 35 40 45Asn His Glu Phe Arg Glu Asn Glu Val Ser Cys Val Ala Phe Met Arg 50 55 60Ala Ala Ser Val Leu Lys Ser Leu Pro Phe Thr Ile Ile Ser Met Lys65 70 75 80Asp Thr Lys Gly Ile Pro Cys Leu Gly Asp Lys Ala Lys Cys Val Ile 85 90 95Glu Glu Ile Ile Glu Asp Gly Glu Ser Ser Glu Val Lys Ala Ile Leu 100 105 110Asn Asp Glu Arg Tyr Gln Ser Phe Lys Leu Phe Thr Ser Val Phe Gly 115 120 125Val Gly Leu Lys Thr Ser Glu Lys Trp Phe Arg Met Gly Phe Arg Thr 130 135 140Leu Asn Lys Ile Met Ser Asp Lys Thr Leu Lys Leu Thr Arg Met Gln145 150 155 160Lys Ala Gly Phe Leu Tyr Tyr Glu Asp Leu Val Ser Cys Val Ala Lys 165 170 175Ala Glu Ala Asp Ala Val Ser Val Leu Val Gln Glu Ala Val Trp Ala 180 185 190Phe Leu Pro Asp Ala Met Val Thr Met Thr Gly 
Gly Phe Arg Arg Gly 195 200 205Lys Lys Leu Gly His Asp Val Asp Phe Leu Ile Thr Ser Pro Gly Ala 210 215 220Thr Glu Glu Glu Glu Gln Gln Leu Leu Pro Lys Val Ile Asn Phe Trp225 230 235 240Glu Arg Lys Gly Leu Leu Leu Tyr His Asp Leu Val Glu Ser Thr Phe 245 250 255Glu Lys Leu Lys Leu Pro Ser Arg Lys Val Asp Ala Leu Asp His Phe 260 265 270Gln Lys Cys Phe Leu Ile Leu Lys Leu His Leu Gln His Val Asn Gly 275 280 285Val Gly Asn Ser Lys Thr Gly Gln Gln Glu Gly Lys Asn Trp Lys Ala 290 295 300Ile Arg Val Asp Leu Val Met Cys Pro Tyr Glu Arg Arg Ala Phe Ala305 310 315 320Leu Leu Gly Trp Thr Gly Ser Arg Gln Phe Glu Arg Asp Leu Arg Arg 325 330 335Phe Ala Thr His Glu Arg Lys Met Met Leu Asp Asn His Ala Leu Tyr 340 345 350Asp Lys Thr Lys Arg Ile Phe Leu Lys Ala Glu Ser Glu Glu Glu Ile 355 360 365Phe Ala His Leu Gly Leu Asp Tyr Ile Asp Pro Trp Glu Arg Asn Ala 370 375 38013381PRTARTIFICIAL SEQUENCEtruncated tree shrew 13Asp His Ser Thr Ser Pro Ser Pro Gly Pro Gln Lys Thr Pro Ala Leu1 5 10 15Ala Val Gln Lys Ile Ser Gln Tyr Ala Cys Gln Arg Arg Thr Thr Leu 20 25 30Asn Asn Cys Asn Arg Val Phe Thr Asp Ala Phe Glu Thr Leu Ala Glu 35 40 45Asn Tyr Glu Phe Arg Glu Asn Glu Asp Ser Ser Val Ile Phe Leu Arg 50 55 60Ala Ala Ser Val Leu Arg Ser Leu Pro Phe Thr Ile Thr Ser Met Arg65 70 75 80Asp Thr Glu Gly Leu Pro Cys Leu Gly Asp Lys Val Lys Cys Val Ile 85 90 95Glu Glu Ile Ile Glu Asp Gly Glu Ser Ser Glu Val Asn Ala Val Leu 100 105 110Asn Asp Glu Arg Tyr Lys Ser Phe Lys Leu Phe Thr Ser Val Phe Gly 115 120 125Val Gly Leu Lys Thr Ser Glu Lys Trp Phe Arg Met Gly Phe Arg Thr 130 135 140Leu Ser Arg Val Arg Ser Asp Lys Ser Leu His Leu Thr Arg Met Gln145 150 155 160Gln Ala Gly Phe Leu Tyr Tyr Glu Asp Leu Ala Ser

Cys Val Thr Arg 165 170 175Ala Glu Ala Glu Ala Val Gly Val Leu Val Lys Glu Ala Val Gly Ala 180 185 190Phe Leu Pro Asp Ala Leu Val Thr Ile Thr Gly Gly Phe Arg Arg Gly 195 200 205Lys Lys Thr Gly His Asp Val Asp Phe Leu Ile Thr Ser Pro Gly Ser 210 215 220Thr Glu Glu Lys Glu Glu Glu Leu Leu Gln Lys Val Leu Asn Leu Trp225 230 235 240Glu Lys Lys Gly Leu Leu Leu Tyr Tyr Asp Leu Val Glu Ser Thr Phe 245 250 255Glu Lys Leu Lys Thr Pro Ser Arg Lys Val Asp Ala Leu Asp His Phe 260 265 270Pro Lys Cys Phe Leu Ile Leu Lys Leu His His Gln Arg Val Asp Gly 275 280 285Asp Lys Pro Ser Gln Gln Glu Gly Lys Ser Trp Lys Ala Ile Arg Val 290 295 300Asp Leu Val Met Cys Pro Tyr Glu Arg His Ala Phe Ala Leu Leu Gly305 310 315 320Trp Thr Gly Ser Arg Gln Phe Glu Arg Asp Leu Arg Arg Tyr Ala Thr 325 330 335His Glu Arg Lys Met Met Leu Asp Asn His Ala Leu Tyr Asp Lys Thr 340 345 350Lys Arg Val Phe Leu Lys Ala Glu Ser Glu Glu Asp Ile Phe Ala His 355 360 365Leu Gly Leu Asp Tyr Ile Glu Pro Trp Glu Arg Asn Ala 370 375 38014394PRTARTIFICIAL SEQUENCETRUNCATED PLATYPUS 14Leu Thr Asn Ser Ala Pro Ile Asn Cys Met Thr Glu Thr Pro Ser Leu1 5 10 15Ala Thr Lys Gln Val Ser Gln Tyr Ala Cys Glu Arg Arg Thr Thr Leu 20 25 30Asn Asn Cys Asn Gln Lys Phe Thr Asp Ala Phe Glu Ile Leu Ala Lys 35 40 45Asp Phe Glu Phe Arg Glu Asn Glu Gly Ile Cys Leu Ala Phe Met Arg 50 55 60Ala Ile Ser Val Leu Lys Cys Leu Pro Phe Thr Ile Val Arg Met Lys65 70 75 80Asp Ile Glu Gly Val Pro Trp Leu Gly Asp Gln Val Lys Ser Ile Ile 85 90 95Glu Glu Ile Ile Glu Asp Gly Glu Ser Ser Ser Val Lys Ala Val Leu 100 105 110Asn Asp Glu Arg Tyr Arg Ser Phe Gln Leu Phe Asn Ser Val Phe Glu 115 120 125Val Gly Leu Thr Asp Asn Gly Glu Asn Gly Ile Ala Arg Gly Phe Gln 130 135 140Thr Leu Asn Glu Val Ile Thr Asp Glu Asn Ile Ser Leu Thr Lys Thr145 150 155 160Thr Leu Ser Thr Ser Leu Trp Asn Tyr Leu Pro Gly Phe Leu Tyr Tyr 165 170 175Glu Asp Leu Val Ser Cys Val Ala Lys Glu Glu Ala Asp Ala Val Tyr 180 185 190Leu Ile Val Lys Glu Ala Val Arg Ala Phe Leu Pro Glu Ala Leu Val 195 
200 205Thr Leu Thr Gly Gly Phe Arg Arg Gly Lys Lys Ile Gly His Asp Val 210 215 220Asp Phe Leu Ile Ser Asp Pro Glu Ser Gly Gln Asp Glu Gln Leu Leu225 230 235 240Pro Asn Ile Ile Lys Leu Trp Glu Lys Gln Glu Leu Leu Leu Tyr Tyr 245 250 255Asp Leu Val Glu Ser Thr Phe Glu Lys Thr Lys Ile Pro Ser Arg Lys 260 265 270Val Asp Ala Met Asp His Phe Gln Lys Cys Phe Leu Ile Leu Lys Leu 275 280 285His His Gln Lys Val Asp Ser Gly Arg Tyr Lys Pro Pro Pro Glu Ser 290 295 300Lys Asn His Glu Ala Lys Asn Trp Lys Ala Ile Arg Val Asp Leu Val305 310 315 320Met Cys Pro Phe Glu Gln Tyr Ala Tyr Ala Leu Leu Gly Trp Thr Gly 325 330 335Ser Arg Gln Phe Glu Arg Asp Leu Arg Arg Tyr Ala Thr His Glu Lys 340 345 350Lys Met Met Leu Asp Asn His Ala Leu Tyr Asp Lys Thr Lys Lys Ile 355 360 365Phe Leu Lys Ala Glu Ser Glu Glu Asp Ile Phe Thr His Leu Gly Leu 370 375 380Asp Tyr Ile Glu Pro Trp Glu Arg Asn Ala385 39015384PRTARTIFICIAL SEQUENCETRUNCATED JERBOA 15Ser Ser Glu Leu Glu Leu Leu Asp Val Ser Trp Leu Ile Glu Cys Met1 5 10 15Gly Ala Gly Lys Pro Val Glu Met Thr Gly Arg His Gln Leu Val Lys 20 25 30Gln Thr Phe Cys Leu Pro Gly Phe Ile Leu Gln Asp Ala Phe Asp Ile 35 40 45Leu Ala Glu Asn Cys Glu Phe Arg Glu Asn Glu Ala Ser Cys Val Glu 50 55 60Phe Met Arg Ala Ala Ser Val Leu Lys Ser Leu Pro Phe Pro Ile Ile65 70 75 80Ser Val Lys Asp Thr Glu Gly Ile Pro Trp Leu Gly Gly Lys Val Lys 85 90 95Cys Val Ile Glu Glu Ile Ile Glu Asp Gly Glu Ser Ser Glu Val Lys 100 105 110Ala Leu Leu Asn Asp Glu Arg Tyr Lys Ser Phe Lys Leu Phe Thr Ser 115 120 125Val Phe Gly Val Gly Leu Lys Thr Ala Glu Arg Trp Phe Arg Met Gly 130 135 140Phe Arg Thr Leu Ser Thr Val Lys Leu Asp Lys Ser Leu Thr Phe Thr145 150 155 160Arg Met Gln Lys Ala Gly Phe Leu His Tyr Glu Asp Leu Val Ser Cys 165 170 175Val Thr Arg Ala Glu Ala Glu Ala Val Ser Val Leu Val Gln Gln Ala 180 185 190Val Val Ala Phe Leu Pro Asp Ala Leu Val Ser Met Thr Gly Gly Phe 195 200 205Arg Arg Gly Lys Lys Ile Gly His Asp Val Asp Phe Leu Ile Thr Ser 210 215 220Pro Glu Ala Thr Glu Glu Glu Glu 
Gln Gln Leu Leu His Lys Val Thr225 230 235 240Asn Phe Trp Glu Gln Lys Gly Leu Leu Leu Tyr Cys Asp His Val Glu 245 250 255Ser Thr Phe Glu Lys Cys Lys Leu Pro Ser Arg Lys Val Asp Ala Leu 260 265 270Asp His Phe Gln Lys Cys Phe Leu Ile Leu Lys Leu Tyr Arg Glu Arg 275 280 285Val Asp Ser Val Lys Ser Ser Gln Gln Glu Gly Lys Gly Trp Lys Ala 290 295 300Ile Arg Val Asp Leu Val Met Cys Pro Tyr Glu Cys Arg Ala Phe Ala305 310 315 320Leu Leu Gly Trp Thr Gly Ser Arg Gln Phe Glu Arg Asp Leu Arg Arg 325 330 335Tyr Ala Thr His Glu Arg Lys Met Arg Leu Asp Asn His Ala Leu Tyr 340 345 350Asp Lys Thr Lys Arg Val Phe Leu Lys Ala Glu Ser Glu Glu Glu Ile 355 360 365Phe Ala His Leu Gly Leu Glu Tyr Ile Glu Pro Leu Glu Arg Asn Ala 370 375 380

SEQ ID NO: 16 (32 nt, DNA, ARTIFICIAL SEQUENCE): Exemplary cDNA produced in a method of the invention (Fig. 1B); misc_feature (8)..(32): n is a, c, g, t or u
tttttttnnn nnnnnnnnnn nnnnnnnnnn nn 32

SEQ ID NO: 17 (15 nt, DNA, ARTIFICIAL SEQUENCE): Exemplary cDNA produced in process of Fig. 4A; misc_feature (15)..(15): n is a, c, g, t or u
tttttttttt ttttn 15

SEQ ID NO: 18 (13 nt, DNA, ARTIFICIAL SEQUENCE): Antibody barcode; misc_feature (8)..(13): n is a, c, g, t or u
aaaaaaannn nnn 13

* * * * *
