Methods and compositions to enhance plant breeding Bull; Jason ; et al. [Monsanto Technology, L.L.C]

Patent Application Summary

U.S. patent application number 11/441915 was filed with the patent office on 2006-05-26 and published on 2006-12-14 for methods and compositions to enhance plant breeding. This patent application is currently assigned to Monsanto Technology, L.L.C. Invention is credited to Jason Bull, David Butruille, Sam Eathington, Marlin Edwards, Anju Gupta, Richard Johnson, Wayne Kennard, Jennifer Rinehart, Kunsheng Wu.

Application Number	20060282911 11/441915
Family ID	37137417
Filed Date	2006-05-26

United States Patent Application	20060282911
Kind Code	A1
Bull; Jason ; et al.	December 14, 2006

Methods and compositions to enhance plant breeding

Abstract

The present invention provides breeding methods and compositions to enhance the germplasm of a plant. The methods describe the identification and accumulation of transgenes and favorable haplotype genomic regions in the germplasm of a breeding population of crop plants.

Inventors:	Bull; Jason; (St. Louis, MO) ; Butruille; David; (Urbandale, IA) ; Eathington; Sam; (Ames, IA) ; Edwards; Marlin; (Davis, CA) ; Gupta; Anju; (Ankeny, IA) ; Johnson; Richard; (Urbana, IL) ; Kennard; Wayne; (Ankeny, IA) ; Rinehart; Jennifer; (Spring Green, WI) ; Wu; Kunsheng; (Ballwin, MO)
Correspondence Address:	FULBRIGHT & JAWORSKI, LLP 600 CONGRESS AVENUE, SUITE 2400 AUSTIN TX 78745 US
Assignee:	Monsanto Technology, L.L.C
Family ID:	37137417
Appl. No.:	11/441915
Filed:	May 26, 2006

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
60685584	May 27, 2005

Current U.S. Class:	800/266 ; 800/267; 800/278; 800/298; 800/312; 800/320.1
Current CPC Class:	A01N 57/20 20130101; A01H 5/10 20130101; C12N 15/8275 20130101; A01H 1/04 20130101; C12N 15/8286 20130101; C12N 15/821 20130101; Y02A 40/146 20180101; A01H 1/02 20130101
Class at Publication:	800/266 ; 800/278; 800/298; 800/312; 800/320.1; 800/267
International Class:	A01H 1/02 20060101 A01H001/02; A01H 5/00 20060101 A01H005/00

Claims

1. A method of breeding a transgenic plant comprising the steps of (a) providing a database identifying a value of at least one agronomic trait for at least two distinct haplotypes of the genome for a set of germplasm; (b) transforming a parent plant with a recombinant DNA to produce at least two transgenic events, wherein the recombinant DNA is inserted into linkage with the at least two distinct haplotypes of the genome of said parent plant; (c) referencing the database for the value of said agronomic trait for the events linked to the distinct haplotypes; and (d) selecting a plant for breeding, said plant comprising the transgenic event having a higher referenced value haplotype.

2. The method of claim 1, wherein the recombinant DNA is selected from the group consisting of DNA encoding a selectable marker, DNA encoding a scorable marker, a DNA recombination site, DNA encoding a protein providing an agronomic enhancement, and DNA for gene suppression.

3. The method of claim 1, wherein said at least one agronomic trait is yield or a multiple trait index.

4. The method of claim 1 wherein said transgenic event selected for breeding has the recombinant DNA linked to a haplotype wherein the haplotype is selected from the group consisting of not negative with respect to yield, not positive with respect to maturity, null with respect to maturity, amongst the favorable 50 percent with respect to an agronomic trait or a multiple trait index when compared to any other haplotype at the same chromosome segment, amongst the favorable 50 percent with respect to an agronomic trait or a multiple trait index when compared to any other haplotype across the entire genome.

5. The method of claim 1, wherein a progeny plant of the plant selected for breeding is selected by marker-assisted selection.

6. The method of claim 1, wherein a progeny plant of the plant selected for breeding is selected by detection of expression of the transgene or expression of the transgene agronomic trait.

7. The method of claim 1, wherein the plant is a crop plant selected from the group consisting of a forage crop, oilseed crop, grain crop, fruit crop, vegetable crop, fiber crop, spice crop, nut crop, turf crop, sugar crop, beverage crop, and forest crop.

8. The method of claim 7, wherein the oilseed crop is selected from the group consisting of soybean, canola, oil seed rape, oil palm, sunflower, olive, corn, cottonseed, peanut, flaxseed, safflower, and coconut.

9. The method of claim 8, wherein the soybean has in its genome at least one genetic marker that is genetically linked to a haplotype selected from the group consisting of C8W6H5, C18W3H8, C19W3H6, C16W8H43, C1W1H2, C1W2H1, C14W7H2, and C6W4H1; and said haplotype further comprises a linked transgene.

10. The method of claim 9, wherein the genetic marker is a DNA marker selected from the group consisting of SEQ ID NO: 1-32.

11. The method of claim 6, further comprising the step of crossing the progeny plant with a third soybean plant to produce additional progeny plants.

12. The method of claim 7, wherein said grain crop is corn and has in its genome at least one genetic marker that is genetically linked to a haplotype selected from the group consisting of C1W19H14, C1W30H4, C1W36H2 and C8W4H5; and said haplotype further comprises a linked transgene.

13. The method of claim 12, wherein said genetic marker is a DNA marker selected from the group consisting of SEQ ID NO: 33-54.

14. The method of claim 1, wherein the recombinant DNA and the haplotype are linked at a genetic distance of 0 to within about 10 cM.

15. The method of claim 1, wherein the recombinant DNA and the haplotype are linked at a distance of 0 to within about 5 cM.

16. The method of claim 4, wherein a haplotype allele is associated with agronomic fitness or occurs at a frequency of 50 percent or more in a breeding population or a set of germplasm.

17. The method of claim 2, wherein the agronomic enhancement is selected from the group consisting of herbicide tolerance, disease resistance, insect or pest resistance, altered fatty acid, protein or carbohydrate metabolism, increased grain yield, increased oil, altered plant maturity, enhanced stress tolerance, and altered morphological characteristics.

18. The method of claim 17, wherein the herbicide tolerance is selected from the group consisting of glyphosate, glufosinate, sulfonylureas, imidazolinones, bromoxynil, dalapon, dicamba, 2,4-D, cyclohexanedione, protoporphyrinogen oxidase inhibitors, and isoxaflutole tolerance.

19. The method of claim 6, wherein the progeny plant contains at least a portion of the haplotype of the plant selected for breeding wherein the portion is selected from the group consisting of at least 10 cM, at least 5 cM, and at least 1 cM.

20. The method of claim 19, wherein using the progeny plant in activities related to germplasm improvement the activities selected from the group consisting of using the plant for making breeding crosses, further testing of the plant, advancement of the plant through self fertilization, use of the plant or parts thereof for transformation, use of the plant or parts thereof for mutagenesis, and use of the plant or parts thereof for TILLING.

21. A method for inserting a transgene into a plant haplotype comprising: (a) incorporating into genetic linkage with a haplotype, a target site comprising at least a first recombination site; and (b) introducing into a plant cell a transgene expression cassette comprising at least a first recombination site, wherein the first recombination site of the expression cassette flanks a polynucleotide comprising a transgene of interest; and (c) providing a recombinase that recognizes and implements recombination of the expression cassette at the first recombination site thereby creating a preferred T-type genomic region, wherein the preferred T-type genomic region has an estimated T-type value, wherein preferred means selected for a haplotype that previously did not contain a transgene or is preferred over a haplotype that previously contained a transgene wherein the haplotype is selected from the group consisting of not negative with respect to yield, is not positive with respect to maturity, null with respect to maturity, amongst the best 50 percent with respect to an agronomic trait or a multiple trait index when compared to any other haplotype at the same chromosome segment, amongst the best 50 percent with respect to an agronomic trait or a multiple trait index when compared to any other haplotype across the entire genome.

22. The method of claim 21, wherein the recombination site is selected from the group consisting of FRT, mutant FRT, LOX, mutant LOX sites, and zinc finger nuclease modified site.

23. The method of claim 21, wherein a haplotype allele is associated with agronomic fitness or occurs at a frequency of 50 percent or more in a breeding population or a set of germplasm.

24. The method of claim 21, wherein the recombinase is cre or flp.

25. A method for mapping at least one T-Type transgene event comprising: (a) identifying from the flanking sequence surrounding at least a first transgenic event in a transformed plant or line at least a first polymorphism between the parent lines of a mapping population, wherein the transformed plant or line may be different from the parent lines of the mapping population; and (b) assaying the progeny plants of the mapping population for the polymorphism, (c) performing a linkage analysis to determine a map position of the polymorphism and thereby a map location of the transgenic event; and (d) correlating the map location to a haplotype of the transformed plant.

26. A method for enhancing accumulation of one or more T-type genomic regions in a germplasm comprising: (a) inserting a transgene into a genome of a first plant; and (b) determining a map location of the transgene in the genome; and (c) correlating the map location to a haplotype, wherein the transgene and the haplotype comprises a T-type genomic region; and (d) crossing the first plant with a second plant that contains at least one T-type genomic region or haplotype that is different from the first plant T-type genomic region; (e) selecting at least one progeny plant by detecting expression of the transgene of the first plant, wherein the progeny plant comprises in its genome at least a portion of the T-type genomic region of the first plant and at least one T-type genomic region or haplotype of the second plant; (f) using the progeny plant in activities related to germplasm improvement the activities selected from the group consisting of using the plant for making breeding crosses, further testing of the plant, advancement of the plant through self fertilization, use of the plant or parts thereof for transformation, use of the plant or parts thereof for mutagenesis, and use of the plant or parts thereof for TILLING.

27. A crop plant comprising a preferred T-type genomic region, wherein a transgene of the T-type genomic region is further defined as conferring a trait selected from the group consisting of herbicide tolerance, disease resistance, insect or pest resistance, altered fatty acid, protein or carbohydrate metabolism, increased grain yield, increased oil, altered plant maturity, enhanced stress tolerance, and altered morphological characteristics; and the haplotype of the T-type genomic region is selected from the group consisting of not negative with respect to yield, is not positive with respect to maturity, null with respect to maturity, amongst the best 50 percent with respect to an agronomic trait or a multiple trait index when compared to any other haplotype at the same chromosome segment, amongst the best 50 percent with respect to an agronomic trait or a multiple trait index when compared to any other haplotype across the entire genome.

28. The method of claim 27, wherein a haplotype has a high value if it is present with a frequency of 50 percent or more in a breeding population or a set of germplasm.

29. The crop plant of claim 27, wherein the preferred T-type genomic region comprises a transgene and a haplotype that are genetically linked within a distance of 0 to about 10 cM.

30. The crop plant of claim 27, wherein the preferred T-type genomic region comprises a transgene and a haplotype that are genetically linked within a distance of 0 to about 5 cM.

31. The crop plant of claim 27, wherein said crop plant is a transgenic herbicide tolerant soybean plant and wherein the transgene is genetically linked to a haplotype identified as C8W6H5.

32. The soybean plant of claim 31, wherein the genetic marker is selected from the group consisting of SEQ ID NO: 1, 2, 3 and 59.

33. The crop plant of claim 27, wherein said crop plant is a transgenic insect tolerant soybean plant and wherein the transgene is genetically linked to a haplotype identified as C6W4H1.

34. The soybean plant of claim 33, wherein the genetic marker is selected from the group consisting of: SEQ ID NO: 29-32.

35. The crop plant of claim 27, wherein said crop plant is a transgenic insect tolerant corn plant and wherein the transgene is genetically linked to a haplotype identified as C1W36H2.

36. The corn plant of claim 35, wherein the genetic marker is selected from the group consisting of: SEQ ID NO: 48-50.

37. A method for enhancing accumulation of one or more haplotypes in a germplasm comprising: (a) determining a map location of a transgene in the genome; and (b) correlating the map location to a haplotype, wherein the transgene and the haplotype comprises a T-type genomic region; and (c) crossing the first plant with a second plant that contains at least one T-type genomic region or haplotype that is different from the first plant T-type genomic region; (d) selecting at least one progeny plant by detecting expression of the transgene of the first plant, wherein the progeny plant comprises in its genome at least a portion of the T-type genomic region of the first plant and at least one T-type genomic region or haplotype of the second plant; (e) using the progeny plant in activities related to germplasm improvement the activities selected from the group consisting of using the plant for making breeding crosses, further testing of the plant, advancement of the plant through self fertilization, use of the plant or parts thereof for transformation, use of the plant or parts thereof for mutagenesis, and use of the plant or parts thereof for TILLING.

Description

[0001] This application claims the benefit of U.S. Provisional Application No. 60/685,584, filed May 27, 2005, the entire text of which is specifically incorporated by reference herein.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The invention relates to the field of plant breeding and plant biotechnology, in particular to a transgene inserted into genetic linkage with a genomic region of a plant, and to the use of the transgene/genomic region to enhance the germplasm and to accumulate other favorable genomic regions in breeding populations.

[0004] 2. Description of Related Art

[0005] Breeding has advanced from selection for economically important traits in plants and animals based on phenotypic records of the individual and its relatives to the use of molecular genetics to identify genomic regions that contain the valuable genetic traits. Information at the DNA level has led to faster genetic accumulation of valuable traits into a germplasm than that achieved based on the phenotypic data only. The development of transgenic crops has further revolutionized breeding and agricultural crop production. The outstanding success of genetically engineered crops is evident from the fact that the area of farmland devoted to transgenic crops has grown from a negligible acreage ten years ago to well over half the acreage for major crops in agriculturally important countries such as the USA, Canada, Brazil and Argentina. In addition to the development of input traits, plant biotechnology also holds great promise for the future development of output traits that will directly benefit consumers, like nutritionally superior foods, such as the vitamin A enriched rice, unsaturated oils, and agricultural products of medical value to name a few. The potential for commercial success of a transgene encoding a new or improved input or output trait is a great incentive for development of novel transgenes and their deployment through breeding these genes into elite germplasm.

[0006] During the development of transgenic crop plants much effort is concentrated on optimization of the insertion and expression of the transgene, and then introgressing the transgene throughout the breeding population by classical breeding methods. The site of insertion of a transgene into the host genome has been a concern for at least two reasons: (i) the region where it inserted may modulate the level of expression of the transgene, and (ii) the insertion of the transgene may disrupt the normal function or expression of a gene near or where it has been inserted. The selection of genomic locations that are beneficial for gene integration provides for suitable levels of stable expression of an introduced gene, or genes, and generally does not negatively affect other agronomic characteristics of the crop plant.

[0007] The genomic region in which the transgene has been inserted also provides agronomic phenotypes to the crop plant. These phenotypes have their own value in a breeding program and these regions should be considered when selecting among multiple transgene insertion events. Transgene insertion events into genomic regions that are associated with improved performance with respect to an agronomic trait or multiple trait index result in an improved phenotype in the crop plant and progeny derived from the crop plant that contain the transgene and the associated improved phenotype. Selecting for the transgenic event necessarily results in selecting a segment of the host genome that surrounds it, and the improved phenotypic effect. Further improvements involve the identification of molecular markers for the tracking and maintenance of the genomic segment with the associated transgene. This is an area that has not been adequately addressed in current plant breeding with transgene insertion events.

[0008] There is a need in the art of plant breeding to identify genomic regions associated with improved performance with respect to an agronomic trait or multiple trait index that are linked with a transgene insertion event and then select for these transgene-genomic regions for dispersion into the breeding population of the crop. The present invention provides consideration to estimating the value of the genomic region and the transgene event. This value can then be used as a criterion for selecting among multiple transgenic events. A further benefit is that linkage drag around a transgene is minimized and valuable genomic regions are selected that contain the transgene for breeding into the germplasm of a crop.

SUMMARY OF THE INVENTION

[0009] The present invention provides a method of breeding with transgenic plants. In one aspect, this method comprises providing a database identifying a value of an agronomic trait for at least two distinct haplotypes of the genome for a set of germplasm. The method further comprises transforming a parent plant with recombinant DNA to produce at least two transgenic events wherein the recombinant DNA is inserted into linkage with the at least two distinct haplotypes of the genome of the parent plant. The database may then be referenced to estimate the value of the agronomic trait for the events linked to the distinct haplotypes, and a transgenic event having a higher referenced breeding value may then be selected for breeding into a germplasm.
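The database lookup and event selection described in this paragraph can be illustrated with a minimal sketch. It is not the applicants' implementation: the class `TransgenicEvent`, the function `select_event`, the haplotype labels `hap_A` and `hap_B`, and the numeric trait values are all hypothetical, and a single agronomic trait where a higher value is better is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransgenicEvent:
    event_id: str          # hypothetical event label
    linked_haplotype: str  # haplotype to which the insertion is linked


def select_event(events, haplotype_values):
    """Return the event whose linked haplotype has the highest referenced value."""
    scored = [
        (haplotype_values[e.linked_haplotype], e)
        for e in events
        if e.linked_haplotype in haplotype_values
    ]
    if not scored:
        raise ValueError("no event is linked to a haplotype present in the database")
    return max(scored, key=lambda pair: pair[0])[1]


# Hypothetical database: haplotype -> estimated value of one agronomic trait.
haplotype_values = {"hap_A": 1.2, "hap_B": -0.4}
events = [
    TransgenicEvent("event_1", "hap_A"),
    TransgenicEvent("event_2", "hap_B"),
]
best = select_event(events, haplotype_values)
print(best.event_id)
```

In this sketch an event linked to a haplotype that is absent from the database is simply skipped; a real program might instead flag such events for further mapping.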

[0010] The present invention provides a method for improving plant germplasm by accumulation of one or more haplotypes in a germplasm. The method comprises inserting a transgene into a genome of a first plant, and then determining a map location of the transgene in the genome. The map location may be correlated to a linked haplotype, wherein the transgene and the haplotype comprise a T-type genomic region. The first plant may then be crossed with a second plant. The second plant may contain at least one T-type genomic region or haplotype that is different from the first plant T-type genomic region. At least one progeny plant may then be selected, the progeny plant having detectable expression of the transgene or its phenotype and comprising in its genome the T-type genomic region of the first plant and at least one T-type genomic region or haplotype of the second plant. The progeny plant may be used in activities related to germplasm improvement, which can be selected from use of the plant for making breeding crosses, further testing of the plant, advancement of the plant through self fertilization, use of the plant or parts thereof for transformation, use of the plant or parts thereof for mutagenesis, and use of the plant or parts thereof for TILLING, or any combination of these.

[0011] The present invention includes a method for breeding of a crop plant, in particular a soybean or corn plant with enhanced agronomic and transgenic traits comprising a preferred T-type genomic region. A transgene of the T-type genomic region is further defined as conferring a preferred property like herbicide tolerance, disease resistance, insect or pest resistance, altered fatty acid, protein or carbohydrate metabolism, increased grain yield, increased oil, increased nutritional content, increased growth rates, enhanced stress tolerance, or altered morphological characteristics, or any combination of these.

[0012] The present invention provides a novel method for mapping at least one genomic region of insertion of a transgene. This method involves indirect mapping and does not require the establishment of a de novo population segregating for a transgene. The method comprises first identifying at least a first polymorphism between the parent lines of a mapping population in the corresponding genomic region adjacent to a transgenic insertion event in a transformed plant or line, then assaying the progeny plants of the mapping population for the polymorphism. Linkage analysis may be performed to determine a map position of the polymorphism and thereby a map location of the transgenic insertion event. The map location in the mapping population may then be correlated to a haplotype of the transformed plant and its progeny.
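As an illustration of the linkage analysis step, the sketch below estimates a recombination fraction between the polymorphism and a reference marker from progeny counts and converts it to a map distance in centimorgans. The text does not specify a map function or a scoring scheme. The Haldane and Kosambi map functions used here are standard choices substituted for illustration, and the sketch assumes a population in which recombinant progeny can be counted directly (for example a backcross). The function names and the counts are hypothetical.

```python
import math


def recombination_fraction(recombinants, total):
    """Observed fraction of progeny that are recombinant between two loci."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("counts must satisfy 0 <= recombinants <= total, total > 0")
    return recombinants / total


def haldane_cm(r):
    """Haldane map distance in cM (assumes no crossover interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in cM (allows for crossover interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Hypothetical counts: 5 recombinant progeny out of 100 scored.
r = recombination_fraction(5, 100)
print(f"r = {r:.3f}, Haldane = {haldane_cm(r):.2f} cM, Kosambi = {kosambi_cm(r):.2f} cM")
```

At short distances the two map functions give nearly the same result, which is the range relevant to the 0 to 10 cM linkage recited in the claims.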

DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

[0013] The definitions and methods provided define the present invention and guide those of ordinary skill in the art in the practice of the present invention. Unless otherwise noted, terms are to be understood according to conventional usage by those of ordinary skill in the relevant art. Definitions of common terms in molecular biology may also be found in Rieger et al. (1991); and Lewin (1994). The nomenclature for DNA bases as set forth at 37 CFR § 1.822 is used.

[0014] As used herein, the term "corn" means Zea mays or maize and includes all plant varieties that can be bred with corn, including wild maize species.

[0015] As used herein, the term "soybean" means Glycine max and includes all plant varieties that can be bred with soybean, including wild soybean species.

[0016] As used herein, the term "comprising" means "including but not limited to".

[0017] A transgenic "event" is produced by transformation of a plant cell with heterologous DNA, i.e., a nucleic acid construct that includes a transgene of interest, regeneration of a population of plants resulting from the insertion of the transgene into the genome of the plant, and selection of a particular plant characterized by insertion into a particular genome location. The term "event" refers to the original transformant and progeny of the transformant that include the heterologous DNA. The term "event" also refers to progeny produced by a sexual outcross between the transformant and another variety that include the heterologous DNA.

[0018] The present invention overcomes the deficiencies of the current transgene breeding methods by describing a T-type genomic region, defined as a transgene and a linked haplotype genomic region, through which the genetically linked transgene and haplotype are selected and then introgressed into germplasm through breeding. The selection of the T-type genomic region is based on the estimation of a T-value that the T-type genomic region provides to the germplasm of the crop plant. The basis of the valuation distinguishes and selects improved T-type genomic regions for use in a breeding method, and selects and advances plants comprising the improved T-type genomic regions. The genomic locations for gene integration are favorable based on providing suitable levels of stable expression of an introduced gene, or genes, and for identifying transgene associations with favorable haplotype regions that also provide beneficial agronomic characteristics to the germplasm. By considering the beneficial aspects of both the transgene and the genomic region to which it is genetically linked, additional value can be built into a transgenic event and its use for developing superior germplasm. In an unexpected outcome from extensive experience in breeding with transgenic plants, the inventors have realized that additional consideration should be given to the genomic region that is linked to the transgene insertion. As a transgene is diffused by breeding methods into plant germplasm a portion of the genetic region linked to the transgene is also diffused. By giving consideration to the genetic region linked to a transgene it is possible to implement biotechnological and breeding strategies to increase the overall value of the transgene and the genetic region to which it is linked to enhance germplasm improvement and minimize the risk of advancement of less favorable genetic regions, often referred to as linkage drag.

[0019] For example, in one aspect of the present invention, T-type genomic regions of new glyphosate tolerant soybean events have been identified that comprise a glyphosate tolerance transgene with suitable levels of expression in linkage with a haplotype. The highest yielding T-type was identified as event 19788 (also referred to as MON89788) and provided for the replacement of the T-type genomic region of event 40-3-2 with a haplotype in the same genomic region with improved yield as determined in a side-by-side comparison. This finding will have significant impact on enhancing the germplasm of glyphosate tolerant soybean. A significant portion of recent soybean breeding has utilized lines containing the Roundup Ready® trait found in event 40-3-2 (Padgette et al., 1995), with possibly as much as 80-95% of the soybean germplasm offered for sale in the United States currently containing this transgenic event. In order to continue to enhance soybean germplasm, it is desirable to be able to identify glyphosate tolerant events that also have favorable haplotype genomic regions and replace the 40-3-2 T-type genomic region in the germplasm, therefore providing elite agronomic traits of the parental line to the progeny.

[0020] In another aspect of the present invention, T-type genomic regions of insect tolerant soybean events are identified that comprise an insect resistance transgene with suitable levels of expression in linkage with a haplotype. The event GM_19459 was selected from a population of transgenic soybean events. These events contain a transgene inserted into the soybean genome that expresses a protein toxic to Lepidopteran insect pests of soybean. The various haplotype genomic regions have been mapped to assist in the selection of an event with the most favorable T-type genomic region.

[0021] In another aspect of the present invention, T-type genomic regions of insect tolerant corn events are identified that comprise an insect resistance transgene with suitable levels of expression in linkage with a haplotype. The insect tolerant corn event is selected from a population of transgenic corn events. These events contain a transgene inserted into the corn genome that expresses a protein toxic to Lepidopteran insect pests of corn. The various haplotype genomic regions are mapped to assist in the selection of an event with the most favorable T-type genomic region.

[0022] Any transgene inserted into the genome of a crop plant that can be mapped to a genomic location can then be compared to a haplotype marker developed in that location to determine if the location comprises a haplotype with an enhanced breeding value.

[0023] In one embodiment, the current invention provides genetic markers and methods for the identification and breeding of T-type genomic regions in soybean. The invention therefore allows for the first time the creation of soybean plants that combine the value of a transgene and an agronomically elite, or favorable haplotype. Favorable haplotypes are at least identified as those that have been inherited more frequently than expected in a plant population. Using the methods of the present invention, loci comprising a T-type genomic region may be introduced into potentially any desired soybean plant. Molecular markers are provided that when used in a marker assisted breeding program provide a means to identify and maintain the association of the favorable haplotype and the transgene to provide the valuable T-type genomic region. The present invention provides examples of transgenes that provide herbicide and insect resistant phenotypes to the soybean plants; other transgenes that provide stress tolerance, disease tolerance, enhanced protein, oil, amino acid or other feed quality, nutrition or processing traits are also contemplated as aspects of the present invention, and germplasm comprising these T-types would be crossed to provide a stacked trait product with preferred T-type genomic regions.

[0024] In another embodiment, the current invention provides genetic markers and methods for the identification and breeding of T-type genomic regions in corn. The invention therefore allows for the first time the creation of corn plants that combine the value of a transgene and an agronomically elite, or favorable haplotype. Using the methods of the present invention, loci comprising a T-type genomic region may be introduced into potentially any desired corn plant. Molecular markers are provided that when used in a marker assisted breeding program provide a means to identify and maintain the association of the favorable haplotype and the transgene to provide the valuable T-type genomic region. The present invention provides examples of transgenes that provide an insect resistant phenotype to the corn plant; other transgenes that provide stress tolerance, herbicide tolerance, enhanced protein, oil, amino acid or other feed quality, nutrition or processing traits are also contemplated as aspects of the present invention, and germplasm comprising these T-types would be crossed to provide a stacked trait product with preferred T-type genomic regions.

T-type Genomic Region and the Concept of T-type Value

[0025] A T-type genomic region is a novel genetic composition comprising at least one transgene, with suitable levels of expression, in genetic linkage with a haplotype. In a preferred embodiment the linkage of a transgene with a haplotype should have no observable deleterious effect on the functional integrity of the haplotype due to the local insertion of the transgene. Additionally a haplotype of a T-type genomic region could be functionally enhanced as a result of the integration into genetic linkage of a transgene. The T-type genomic region composition has the benefit of the transgene and the haplotype with which it is linked. The T-type genomic region is the genetic composition through which a transgene is diffused into germplasm by breeding.

[0026] In a preferred embodiment of the present invention, a haplotype of a T-type genomic region comprises at least two biallelic markers approximately 10 cM apart, or at least one pluriallelic locus within 5 cM of the transgene and with high polymorphic information content. Changes in a haplotype, brought about by recombination for example, may result in the modification of a haplotype so that it only comprises a portion of the original (parental) haplotype physically linked to the transgene. Any such change in a haplotype would be included in our definition of what constitutes a T-type genomic region so long as the functional integrity of the T-type genomic region is unchanged or improved. The linkage of the transgene to the haplotype or functional portion thereof that provides the desirable phenotype is preferably within about 5 cM, or within about 2 cM, or within about 1 cM of the haplotype region. The functional integrity of a haplotype is considered to be unchanged if its value is not negative with respect to yield, or is not positive with respect to maturity, or is null with respect to maturity, or amongst the best 50 percent with respect to an agronomic trait or a multiple trait index when compared to any other haplotype at the same chromosome segment in a set of germplasm (breeding germplasm, breeding population, collection of elite inbred lines, population of random mating individuals, biparental cross), or amongst the best 50 percent with respect to an agronomic trait or a multiple trait index when compared to any other haplotype across the entire genome in a set of germplasm, or the haplotype being present with a frequency of 50 percent or more in a breeding population or a set of germplasm can be taken as evidence of its high value, or any combination of these.

[0027] The benefit or value of the plant comprising in its genome a T-type genomic region is estimated by a T-value, which depends on the value of the transgene trait and the value of the haplotype to which the transgene is linked. The value of a transgene of a T-type genomic region can be estimated from the value of the trait that the transgene encodes. This value depends on the transgene trait (for example, including but not limited to: herbicide tolerance, insect resistance, disease resistance, improved nutrition, enhanced yield, improved processing trait, or stress tolerance) and could be estimated from increased crop plant output, or decrease in inputs required for crop cultivation, or any combination of these. The transgene trait also has value as a selectable or scorable marker. This has value in breeding applications to one skilled in the art because the ability to select or score for the transgene trait results in the simultaneous selection of the linked haplotype. For example in the case of a cross made with a plant comprising a T-type, wherein the transgene encodes a herbicide tolerance, spraying the progeny of that cross with the herbicide would have a high probability of selecting for the transgene and the tightly linked parental or recombinant haplotype. DNA markers that are developed to define the haplotype can be used to confirm the integrity of the T-type in the progeny of the cross.

[0028] A transgene comprising a recombinant construct may further comprise a selectable marker or scorable marker. The nucleic acid sequence serving as the selectable or scorable marker functions to produce a phenotype in cells which facilitates their identification relative to cells not containing the marker.

[0029] Examples of selectable markers include, but are not limited to, a neo or nptII gene (Potrykus et al., 1991), which codes for kanamycin resistance and can be selected for using kanamycin, G418, etc.; a bar gene which codes for bialaphos resistance; glyphosate resistant EPSP synthase, glyphosate resistant mutant EPSP synthase (Hinchee et al., 1988) which encodes glyphosate resistance, glyphosate inactivating enzymes; a nitrilase gene which confers resistance to bromoxynil (Stalker et al., 1988); a mutant acetolactate synthase gene (ALS) which confers imidazolinone or sulphonylurea resistance (European Patent Application No. 0154204); and a methotrexate resistant DHFR gene (Thillet et al., 1988).

[0030] Other exemplary scorable markers include: a β-glucuronidase or uidA gene (GUS), which encodes an enzyme for which various chromogenic substrates are known (Jefferson, 1987; Jefferson et al., 1987); an R-locus gene, which encodes a product that regulates the production of anthocyanin pigments (red color) in plant tissues (Dellaporta et al., 1988); a β-lactamase gene (Sutcliffe et al., 1978), which encodes an enzyme for which various chromogenic substrates are known (e.g., PADAC, a chromogenic cephalosporin); a luciferase gene (Ow et al., 1986); a xylE gene (Zukowsky et al., 1983), which encodes a catechol dioxygenase that can convert chromogenic catechols; a β-amylase gene (Ikatu et al., 1990); a tyrosinase gene (Katz et al., 1983), which encodes an enzyme capable of oxidizing tyrosine to DOPA and dopaquinone (which in turn condenses to melanin); and a β-galactosidase, which will turn a chromogenic β-galactose substrate.

[0031] Included within the terms "selectable or scorable markers" are also genes that encode a secretable marker whose secretion can be detected as a means of identifying or selecting for transformed cells. Examples include markers that encode a secretable antigen that can be identified by antibody interaction, or even secretable enzymes which can be detected catalytically. Selectable secreted marker proteins fall into a number of classes, including small, diffusible proteins which are detectable (e.g., by ELISA), small active enzymes which are detectable in extracellular solution (e.g., β-amylase, β-lactamase, phosphinothricin transferase), or proteins which are inserted or trapped in the cell wall (such as proteins which include a leader sequence such as that found in the expression unit of extensin or tobacco PR-S). Other possible selectable marker genes will be apparent to those of skill in the art.

[0032] A marker is preferably GUS, green fluorescent protein (GFP), neomycin phosphotransferase II (nptII), luciferase (LUX), an antibiotic resistance gene coding sequence, or an herbicide resistance gene coding sequence. The selectable agent can be an antibiotic, for example including but not limited to, kanamycin, hygromycin, or a herbicide, for example including but not limited to, glyphosate, glufosinate, 2,4-D, and dicamba.

[0033] The T-type genomic region has a value in marker-assisted selection and marker-assisted breeding applications. Selection for a transgene and a favorable haplotype in the case where they comprise a T-type genomic region requires only one marker, whereas at least two markers would be required if the transgene and favorable haplotype are unlinked. This potential value would increase as more T-type genomic regions are accumulated or stacked together in a germplasm.

[0034] The T-value can be changed or modified by changing expression of the transgene, wherein a change is brought about at the level of transgene expression, or in the timing of transgene expression, or in the localization of transgene expression, or any combination of these. It is anticipated by this invention that the change in T-value brought by a change in any of the components of transgene expression could be effected through cis-acting (local) or trans-acting (can act at a distance not simply on the DNA molecule in which they occur) factors, or a combination of these.

[0035] Additionally, the T-value can be changed or modified by changing the haplotype with which the transgene is tightly linked. A preferred embodiment of the present invention is the improvement of the T-value by selecting or directing the transgene of an existing T-type genomic region into tight linkage with a different recipient haplotype, wherein the different haplotype is associated with additional value and improved with respect to an agronomic trait or a multiple trait index over the existing T-type haplotype as determined in a side-by-side or head-to-head comparison. A change in the haplotype could also be brought about by generating or selecting for at least one recombinant T-type haplotype that is improved with respect to an agronomic trait or a multiple trait index over the existing T-type haplotype as determined in a replicated side-by-side or head-to-head comparison.

[0036] Another preferred embodiment of the present invention is to build additional value into a new or novel transgene event by selecting or directing the transgene into linkage with a recipient haplotype that has a breeding value that is not negative with respect to yield, or is not positive with respect to maturity, or is null with respect to maturity, or amongst the best 50 percent with respect to an agronomic trait or a multiple trait index when compared to any other haplotype at the same chromosome segment in a set of germplasm, or amongst the best 50 percent with respect to an agronomic trait or a multiple trait index when compared to any other haplotype across the entire genome in a set of germplasm, or alleles conferring agronomic fitness to a crop plant or the haplotype being present with a frequency of 50 percent or more in a breeding population or a set of germplasm can be taken as evidence of its high value, or any combination of these.

[0037] Another embodiment of the present invention is a selection of a plant or line for transformation with at least a first transgene, wherein the selection of the plant or line is based on it comprising in its genome a high proportion of recipient haplotypes that have a breeding value that is not negative with respect to yield, or is not positive with respect to maturity, or is null with respect to maturity, or amongst the best 50 percent with respect to an agronomic trait or a multiple trait index when compared to any other haplotype at the same chromosome segment in a set of germplasm, or amongst the best 50 percent with respect to an agronomic trait or a multiple trait index when compared to any other haplotype across the entire genome in a set of germplasm, or alleles conferring agronomic fitness to a crop plant or the haplotype being present with a frequency of 50 percent or more in a breeding population or a set of germplasm can be taken as evidence of its high value, or any combination of these.

[0038] This invention anticipates an accumulating or stacking of T-type genomic regions into plants or lines by addition of transgenes by transformation, or by crossing parent plants or lines containing different T-type genomic regions, or any combination of these. The value of the accumulated or stacked T-type genomic regions can be estimated by a composite T-value, which depends on a combination of the value of the transgene traits and the value of the haplotype(s) to which the transgenes are linked. The present invention further anticipates that the composite T-value can be improved by modifying the components of expression of one or each of the stacked transgenes. Additionally, the present invention anticipates that additional value can be built into the composite T-value by selection of at least one recipient haplotype with a favorable breeding value to which one or any of the transgenes are linked, or by selection of plants or lines for stacking transgenes by transformation or by breeding or by any combination of these.

[0039] Transgenic crops for which a method of the present invention can be applied include, but are not limited to, herbicide tolerant crops, for example, Roundup Ready® Cotton 1445 and 88913; Roundup Ready® corn GA21, nk603, MON802, MON809; Roundup Ready® Sugar beet GTSB77 and H7-1; Roundup Ready® Canola RT73 and GT200; oilseed rape ZSR500, Roundup Ready® Soybean 40-3-2, MON89788-containing soybean, Roundup Ready® Bentgrass ASR368, HCN10, HCN28 and HCN92 canola, MS1 and RF1 canola, OXY-235 canola, PHY14, PHY35 and PHY36 canola, RM3-3, RM3-4 and RM3-6 chicory, A2704-12, A2704-21, A5547-35, A5547-127 soybean, GU262 soybean, W62 and W98 soybean, 19-51A cotton, 31807 and 31808 cotton, BXN cotton, FP967 flax, LLRICE06 and LLRICE62 rice, MON71800 wheat, 676 and 678 and 680 corn, B16 corn, Bt11 corn, CBH-351 corn, DAS-06275-8 corn, DBT418 corn, MS3 and MS6 corn, T14 and T25 corn, H177 corn, and TC1507 corn. Herbicides for which transgenic plant tolerance has been demonstrated, and to which the method of the present invention can be applied, include but are not limited to: glyphosate, glufosinate, sulfonylureas, imidazolinones, bromoxynil, dalapon, dicamba, 2,4-D, cyclohexanedione, protoporphyrinogen oxidase inhibitors, and isoxaflutole herbicides. Polynucleotide molecules encoding proteins involved in herbicide tolerance are known in the art, and include, but are not limited to, a polynucleotide molecule encoding 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) described in U.S. Pat. No. 5,627,061, U.S. Pat. No. 5,633,435, U.S. Pat. No. 6,040,497 and in U.S. Pat. No. 5,094,945 for glyphosate tolerance, all of which are hereby incorporated by reference; polynucleotides encoding a glyphosate oxidoreductase, glyphosate-N-acetyl transferase, or glyphosate decarboxylase (GOX, U.S. Pat. No. 5,463,175; GAT, US Patent publications 20030083480 and 20050246798; glyphosate decarboxylase, US Patent publications 20060021093; 20060021094; 20040177399, herein incorporated by reference in their entirety); a polynucleotide molecule encoding bromoxynil nitrilase (Bxn) described in U.S. Pat. No. 4,810,648 for bromoxynil tolerance, which is hereby incorporated by reference; a polynucleotide molecule encoding phytoene desaturase (crtI) described in Misawa et al. (1993) and Misawa et al. (1994) for norflurazon tolerance; a polynucleotide molecule encoding acetohydroxyacid synthase (AHAS, aka ALS) described in Sathasivan et al. (1990) for tolerance to sulfonylurea herbicides; the bar gene described in DeBlock et al. (1987) for glufosinate and bialaphos tolerance; and resistant hydroxyphenyl pyruvate dehydrogenase (HPPD, U.S. Pat. No. 6,768,044). A promoter of a transgene of the present invention can express genes that encode for phosphinothricin acetyltransferase, glyphosate resistant EPSPS, aminoglycoside phosphotransferase, hydroxyphenyl pyruvate dehydrogenase, hygromycin phosphotransferase, neomycin phosphotransferase, dalapon dehalogenase, bromoxynil resistant nitrilase, dicamba mono-oxygenase, anthranilate synthase, glyphosate oxidoreductase, glyphosate-N-acetyl transferase, or glyphosate decarboxylase.

[0040] Transgenic crops for which the method of the present invention can be applied include, but are not limited to, insect resistant crops, for example, cotton events, such as MON15985, 281-24-236, 3006-210-23, MON531, MON757, MON1076, and COT102; or corn events, such as MIR604, BT176, BT11, CBH-351, DAS-06275-8, DBT418, MON80100, MON810, MON863, TC1507, MIR152V, 3210M, and 3243M. Insect resistant transgenic crops can provide tolerance to insect pest feeding damage and have been shown to be effective against certain Lepidopteran and Coleopteran plant pests; other transgenic crops may also provide resistance to plant pests such as certain members of Hemiptera, Homoptera, Heteroptera, Orthoptera, Thysanoptera, and plant parasitic nematodes. Disease resistant transgenic crops include, for example, virus resistant papaya 55-1/63-1, and virus resistant squash CZW-3 and ZW20. Male sterility transgenic crops include, for example, PHY14, PHY35 and PHY36 canola and corn events 676, 678, 680, MS3 and MS6. Additional transgenic crop plants may also provide resistance to fungal and bacterial organisms that cause plant disease.

[0041] The present invention contemplates the above listed transgenic crops and germplasm comprising the T-type genomic regions for use in breeding and stacking of T-type genomic regions, or haplotypes identified by an indirect mapping method, or any combination of these to increase T-type value or to enhance overall germplasm quality as described in the methods of the present invention.

Haplotypes

[0042] A "haplotype" is a segment of DNA in the genome of an organism that is assumed to be identical by descent for different individuals when the knowledge of identity by state at one or more loci is the same in the different individuals, and that the regional amount of linkage disequilibrium in the vicinity of that segment on the physical or genetic map is high. A haplotype can be tracked through populations and its statistical association with a given trait can be analyzed. Thus, a haplotype association study allows one to define the frequency and the type of the ancestral carrier haplotype. An "association study" is a genetic experiment where one tests the level of departure from randomness between the segregation of alleles at one or more marker loci and the value of individual phenotype for one or more traits. Association studies can be done on quantitative or categorical traits, accounting or not for population structure and/or stratification.

[0043] A haplotype analysis is important in that it increases the statistical power of an analysis involving individual biallelic markers. In a first stage of a haplotype frequency analysis, the frequency of the possible haplotypes based on various combinations of the identified biallelic markers of the invention is determined. The haplotype frequency is then compared for distinct populations and mapping populations. Generally, as a result of prior germplasm improvement, the greater the haplotype frequency in a population or set of germplasm, the greater its value has been to the germplasm, described as the alleles associated with agronomic fitness of a crop plant (U.S. Pat. No. 5,437,697, herein incorporated by reference in its entirety). A favorable haplotype can be selected based on its frequency in a set of germplasm; generally a frequency of 50 percent or more would indicate that the haplotype has value in the germplasm. A haplotype that occurs at a high frequency would be favorable for targeting with a transgene; likewise, selection of a T-type wherein the haplotype has a high frequency in the germplasm would be considered favorable. A haplotype occurring at any frequency in the germplasm can be correlated to a trait and the haplotype can be given a value based on a single trait or a combination of traits. A favorable haplotype will provide one or more favorable traits to a germplasm. In general, any method known in the art to test whether a trait and a genotype show a statistically significant correlation may be used. The statistical significance of a correlation between a phenotype and a genotype, in this case a haplotype, may be determined by any statistical test known in the art and with any accepted threshold of statistical significance being required. The application of particular methods and thresholds of significance is well within the skill of the ordinary practitioner of the art.
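
By way of illustration only, the frequency-based selection described above can be sketched for a set of inbred lines, where haplotype frequency reduces to counting. The data layout (one tuple of marker alleles per line), the function names, and the example alleles are assumptions made for this sketch and are not part of the disclosure; the 50 percent threshold is the one stated above.

```python
from collections import Counter

def haplotype_frequencies(lines):
    """Return {haplotype: frequency} for one chromosome segment.

    `lines` maps an inbred line name to its haplotype, written as a tuple of
    alleles at the marker loci of the segment (hypothetical data layout).
    """
    counts = Counter(lines.values())
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}

def favorable_by_frequency(freqs, threshold=0.5):
    """Haplotypes whose frequency meets the threshold (50 percent in the text)."""
    return [hap for hap, f in freqs.items() if f >= threshold]

# Made-up example: four inbred lines scored at two marker loci.
segment = {
    "line1": ("A", "G"),
    "line2": ("A", "G"),
    "line3": ("A", "G"),
    "line4": ("T", "G"),
}
freqs = haplotype_frequencies(segment)
favorable = favorable_by_frequency(freqs)
```

Frequency is only one of the indicators of value named in this specification; a haplotype below the threshold may still be assigned a value through its correlation with a trait.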

[0044] In plant breeding populations, linkage disequilibrium (LD), which is the level of departure from random association between two or more loci in a population, often persists over large chromosomal segments. Although it is possible for one to be concerned with the individual effect of each gene in the segment, for a practical plant breeding purpose, what generally matters is the average impact the region has on the trait(s) of interest when present in a line, hybrid or variety. The amount of pair-wise LD (using the r² statistic) was plotted against the distance in centiMorgans (cM, one hundredth of a Morgan, on average one recombination per meiosis; recombination is the result of the reciprocal exchange of chromatid segments between homologous chromosomes paired at meiosis, and it is usually observed through the association of alleles at linked loci from different grandparents in the progeny) between the markers for a reference germplasm set, for example, a set of 791 soybean elite US lines and 1211 SNP loci with a rare allele frequency greater than 5 percent. A 200-data-point moving average curve was an indicator of the presence of LD even for loci 10 cM apart. Thus when predicting the average effect of chromosome segments, one should consider segments a few centiMorgans long, and this is the meaning given to a haplotype region, that is, a chromosome segment a few centiMorgans long that persists over multiple generations of breeding and that is carried by one or more breeding lines. This segment can be identified with the multiple linked marker loci it contains, and the common haplotype identity at these loci in two lines gives a high degree of confidence of the identity by descent of the entire subjacent chromosome segment carried by these lines.
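
As a minimal sketch of the pair-wise LD calculation referred to above, the squared correlation between two biallelic loci can be computed from the disequilibrium coefficient D, with r² = D² / (p1(1 − p1) p2(1 − p2)). The 0/1 allele coding, the function names, and the simple moving average helper are assumptions made for illustration; the sketch does not reproduce the analysis of the reference germplasm set.

```python
def r_squared(haplotypes):
    """Pair-wise LD (r^2) between two biallelic loci.

    `haplotypes` is a list of (allele_at_locus_1, allele_at_locus_2) pairs,
    one per inbred line or phased gamete, with alleles coded 0 or 1.
    """
    n = len(haplotypes)
    p1 = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at locus 1
    p2 = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at locus 2
    p11 = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p11 - p1 * p2                        # disequilibrium coefficient D
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom

def moving_average(values, window):
    """Simple moving average, e.g. over r^2 values sorted by marker distance."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

complete_ld = r_squared([(0, 0), (0, 0), (1, 1), (1, 1)])
no_ld = r_squared([(0, 0), (0, 1), (1, 0), (1, 1)])
```

In practice the r² values for all marker pairs would be sorted by their distance in cM before smoothing, so that the moving average traces the decay of LD with distance.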

[0045] One should specify what the favorable haplotypes are and what their frequency in the germplasm is. Thus, one would obtain or generate a molecular marker survey of the germplasm under consideration for breeding and/or propagation of a transformation event. This marker survey will generate a fingerprint of each line. These markers are assumed to have their approximate genomic map position known. To simplify downstream analyses, quality assurance and missing data estimation steps may need to be implemented at this stage to produce a complete and accurate data matrix (marker genotype by line). Error detection and missing data estimation may require the use of parent-offspring tests, LD between marker loci, interval mapping, re-genotyping, etc.

[0046] Markers are then grouped based on their proximity. This grouping may be arbitrary (e.g. "start from one end of the chromosome and include all markers that are within 10 cM of the first marker included in the segment, before starting the next segment") or based on some statistical analysis (e.g. "define segment breakpoints based on LD patterns between adjacent loci").
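
The arbitrary grouping rule quoted above can be sketched as follows. The function name, the (name, position) tuple layout, and the example map positions are assumptions for illustration; a marker exactly 10 cM from the first marker of a segment is treated here as "within" 10 cM.

```python
def group_markers(markers, max_span_cm=10.0):
    """Group markers of one chromosome into consecutive segments.

    `markers` is a list of (name, position_cM) tuples. Starting from one end
    of the chromosome, a segment collects every marker lying within
    `max_span_cm` of the first marker included in that segment.
    """
    segments = []
    current = []
    start = None
    for name, pos in sorted(markers, key=lambda m: m[1]):
        if current and pos - start > max_span_cm:
            segments.append(current)   # close the segment and start a new one
            current = []
        if not current:
            start = pos                # position of the first marker in the segment
        current.append(name)
    if current:
        segments.append(current)
    return segments

example = [("m1", 0.0), ("m2", 4.0), ("m3", 10.0), ("m4", 12.5), ("m5", 30.0)]
groups = group_markers(example)
```

A grouping based on LD patterns between adjacent loci would replace the fixed distance test with a statistical breakpoint criterion.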

[0047] When a large set of lines is considered, and multiple lines have the same allele at a marker locus, it is necessary to ascertain whether identity by state (IBS) at the marker locus is a good predictor of identity by descent (IBD) at the chromosomal region surrounding the marker locus. "Identity by descent" (IBD) characterizes two loci/segments of DNA that are carried by two or more individuals and are all derived from the same ancestor. "Identity by state" (IBS) characterizes two loci/segments of DNA that are carried by two or more individuals and have the same alleles at the observable loci. A good indication that a number of marker loci in a segment are enough to characterize IBD for the segment is that they can predict the allele present at other marker loci within the segment.
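
One possible way to examine the indication described above is to hold out one marker of a segment and ask how often lines that are identical by state at the remaining markers also share the held-out allele. The following is only a sketch under that assumption; the function name and data layout are hypothetical, and the score is computed on the same lines used to form the groups, so it is an optimistic estimate.

```python
from collections import Counter, defaultdict

def held_out_agreement(haplotypes, held_out):
    """Fraction of lines whose allele at the held-out marker matches the most
    common allele among lines sharing the same alleles at all other markers.

    `haplotypes` is a list of allele tuples (one per line) for one segment;
    `held_out` is the index of the marker to be predicted.
    """
    groups = defaultdict(list)
    for hap in haplotypes:
        key = hap[:held_out] + hap[held_out + 1:]   # alleles at the other markers
        groups[key].append(hap[held_out])
    agreeing = sum(Counter(alleles).most_common(1)[0][1]
                   for alleles in groups.values())
    return agreeing / len(haplotypes)

lines = [
    ("A", "C", "G"),
    ("A", "C", "G"),
    ("T", "C", "A"),
    ("T", "C", "A"),
    ("T", "C", "G"),
]
score = held_out_agreement(lines, held_out=2)
```

A score close to 1 for each held-out marker would suggest that the markers retained in the segment carry enough information to stand for the whole segment.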

[0048] To estimate the frequency of a haplotype, the base reference germplasm has to be defined (collection of elite inbred lines, population of random mating individuals, etc.) and a representative sample (or the entire population) has to be genotyped. The haplotype frequency can then be determined by simple counting if considering a set of inbred individuals. Estimation methods that employ computing techniques like the Expectation/Maximization algorithm will be needed if individuals genotyped are heterozygous at more than one locus in the segment and linkage phase is unknown (Excoffier and Slatkin, 1995). Preferably, a method based on an expectation-maximization (EM) algorithm (Dempster et al. 1977) leading to maximum-likelihood estimates of haplotype frequencies under the assumption of Hardy-Weinberg proportions (random mating) is used (Excoffier and Slatkin, 1995). With the haplotype estimates, and the identity of each chromosome segment for each candidate host line, it is further possible to rank lines according to their probability of giving rise to events located in high value haplotypes. Several probability distributions of an event to be located in a chromosome segment could be used, according to the degree of knowledge acquired on the physical size of each segment and the random or pattern-following mode of insertion of a transgene in the genome. Alternative approaches can be employed to perform association studies: genome-wide association studies, candidate region association studies and candidate gene association studies. The biallelic markers of the present invention may be incorporated in any map of genetic markers of a plant genome in order to perform genome-wide association studies.
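
For readers unfamiliar with the EM approach cited above, a minimal sketch for the simplest case of two biallelic loci is given here. Genotypes are coded as the count of allele "1" at each locus, so that only the double heterozygote (1, 1) has unknown phase. The coding, function name, iteration count, and example data are assumptions made for illustration; the sketch is not the implementation of Excoffier and Slatkin and assumes Hardy-Weinberg proportions.

```python
def em_haplotype_frequencies(genotypes, n_iter=50):
    """EM estimate of two-locus haplotype frequencies from unphased genotypes.

    `genotypes` is a list of (g1, g2), each the number of copies (0, 1 or 2)
    of allele 1 carried at locus 1 and locus 2.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freqs = {h: 0.25 for h in haps}            # uniform starting values
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: split the double heterozygote between its two phases.
                cis = freqs[(1, 1)] * freqs[(0, 0)]
                trans = freqs[(1, 0)] * freqs[(0, 1)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(1, 1)] += w
                counts[(0, 0)] += w
                counts[(1, 0)] += 1 - w
                counts[(0, 1)] += 1 - w
            else:
                # Phase is unambiguous: read the two haplotypes directly.
                first = (1 if g1 == 2 else 0, 1 if g2 == 2 else 0)
                second = (1 if g1 >= 1 else 0, 1 if g2 >= 1 else 0)
                counts[first] += 1
                counts[second] += 1
        # M-step: new frequencies are the expected haplotype counts.
        freqs = {h: c / n_chrom for h, c in counts.items()}
    return freqs

sample = [(2, 2), (2, 2), (0, 0), (0, 0), (1, 1)]
estimates = em_haplotype_frequencies(sample)
```

In this made-up sample the only haplotypes seen with certainty are (1, 1) and (0, 0), so the iterations assign the double heterozygote almost entirely to the phase made of those two haplotypes.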

[0049] The present invention comprises methods to detect an association between a haplotype and a favorable property or a multiple trait index. A multiple trait index (MTI) is a numerical entity that is calculated through the combination of single trait values in a formula. Most often calculated as a linear combination of traits or normalized derivations of traits, it can also be the result of more sophisticated calculations (for example, use of ratios between traits). This MTI can then be used in genetic analysis as if it were a trait. A favorable haplotype provides a favorable property to a parent plant and to the progeny of the parent when selected by a marker means or phenotypic means. The method of the present invention provides for selection of favorable haplotypes and the accumulation of favorable haplotypes in a breeding population, for example one or more of the haplotypes identified in the present invention. In a particular embodiment of the present invention, a transgene is associated with a favorable haplotype to create a T-type that is accumulated with other favorable haplotypes to enhance a germplasm.
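
A multiple trait index of the most common kind described above, a linear combination of normalized traits, might be computed as in the following sketch. The trait names, the weights, and the choice of z-score normalization are assumptions for illustration only; no particular formula is prescribed by this specification.

```python
from statistics import mean, pstdev

def multiple_trait_index(trait_values, weights):
    """Weighted sum of z-score normalized traits, one index value per line.

    `trait_values` maps a trait name to a list of values (one per line, same
    order for every trait); `weights` maps a trait name to its weight.
    """
    n = len(next(iter(trait_values.values())))
    index = [0.0] * n
    for trait, values in trait_values.items():
        mu, sd = mean(values), pstdev(values)
        for i, v in enumerate(values):
            z = (v - mu) / sd if sd > 0 else 0.0   # normalized trait value
            index[i] += weights[trait] * z
    return index

# Hypothetical data for three lines; a negative weight penalizes later maturity.
traits = {"yield": [50.0, 60.0, 70.0], "maturity": [3.0, 2.0, 1.0]}
weights = {"yield": 1.0, "maturity": -0.5}
mti = multiple_trait_index(traits, weights)
```

The resulting values can be handled like any single quantitative trait in an association study.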

Accumulation of T-type Genomic Regions and Favorable Haplotypes

[0050] Another embodiment of this invention is a method for enhancing accumulation of one or more haplotypes in a germplasm. The transformation of a plant cell with a transgene means that the transgene DNA has been inserted into a genomic DNA region of the plant. Genomic regions defined as haplotype regions include genetic information and provide phenotypic traits to the plant. Variations in the genetic information result in variation of the phenotypic trait and the value of the phenotype can be measured. The genetic mapping of the haplotype regions and genetic mapping of a transgene insertion event allows for a determination of linkage of a transgene insertion with a haplotype. Any transgene that has a DNA sequence that is novel in the genome of a transformed plant can in itself serve as a genetic marker of the transgene and the genomic region in which it has inserted. For example, in the present invention, a transgene that was inserted into the genome of a soybean plant provides for the expression of a glyphosate resistant 5-enolpyruvylshikimate-3-phosphate synthase that has a DNA coding sequence comprised within SEQ ID NO:28 disclosed in U.S. Pat. No. 6,660,911 and SEQ ID NO:9 disclosed in U.S. Pat. No. 5,633,435, both herein incorporated by reference, from which a DNA primer or probe molecule can be selected to function as a genetic marker for the transgene in the genome.

[0051] Additionally, a transgene may provide a means to select for plants that have the insert and the linked haplotype region. Selection may be due to tolerance to an applied phytotoxic chemical such as a herbicide or antibiotic. Selection may be due to detection of a product of a transgene, for example, an mRNA or protein product. Selection may be conducted by detection of the transgene DNA inserted into the genome of the plant. A transgene may also provide a phenotypic selection means, such as a morphological phenotype that is easy to observe; this could be a seed color, seed germination characteristic, seedling growth characteristic, leaf appearance, plant architecture, plant height, or flower and fruit morphology. Selection may also be based on an agronomic phenotype, such as yield, herbicide tolerance, disease tolerance, insect tolerance, enhanced feed quality, drought tolerance, cold tolerance, or any other agronomic trait provided by a transgene.

[0052] During the development of improved crop plants by insertion of transgenic genes, often hundreds of plants are produced with different transgene insertion locations. These insertion events occur throughout the genome of the plant and are incorporated into tight linkage with many different haplotype regions. The present invention provides for the screening of transgenic events that have a transgene insertion into tight linkage with favorable haplotype regions and selection of these events for use in a breeding program to enhance the accumulation of favorable haplotype regions. The method includes: a) inserting a transgene into a genome of a plant cell and regenerating the plant cell into an intact transformed plant using plant transformation and regeneration methods previously described and known in the art of plant biotechnology; and b) determining a map location of the transgene in the genome of the transformed plant using DNA markers of the transgene and linked genomic regions; and c) correlating the map location to a tightly linked haplotype, wherein the transgene and the haplotype comprise a T-type genomic region in the transformed plant; and d) crossing the transformed plant with a second plant that may also be transformed to contain at least one T-type genomic region that is different from the first transformed plant T-type genomic region or the second plant may contain a favorable haplotype region identified by genetic markers that is different from the first transformed plant; and e) selecting at least one progeny plant by detecting expression of the transgene of the first plant or selecting by the presence of a marker associated with the transgene, wherein the progeny plant comprises in its genome at least a portion of the T-type genomic region of the first plant and at least one T-type genomic region or favorable haplotype of the second plant; and f) using the progeny plant in activities related to germplasm improvement, the activities selected from the group consisting of using the plant for making breeding crosses, further testing of the plant, advancement of the plant through self fertilization, use of the plant or parts thereof for transformation, use of the plant or parts thereof for mutagenesis, and use of the plant or parts thereof for TILLING (e.g. McCallum et al., 2000).

[0053] Using this method, the present invention contemplates that preferred T-type genomic regions are selected from a large population of T-type genomic regions, and the preferred T-type genomic regions have an enhanced T-value in the germplasm of a crop plant. Additionally, the preferred T-type genomic region can be used in the described breeding method to accumulate other beneficial T-type genomic regions and favorable haplotype regions and maintain these in a breeding population to enhance the overall germplasm of the crop plant. Crop plants considered for use in the method include but are not limited to, corn, soybean, cotton, wheat, rice, canola, oilseed rape, sugar beet, sorghum, millet, alfalfa, vegetable crops, forest trees, and fruit crops.

Genome Mapping of a T-type Genomic Region

[0054] Another embodiment of this invention is a method for indirectly mapping at least one T-type genomic region. Mapping of the T-type genomic region in the genome of a plant provides for selection of favorable haplotype regions that comprise the T-type genomic region. The present invention provides a method for mapping the transgene insertion event and its association with a genomic region and location on a genome map of a plant. The method may include the following steps:
[0055] (a) Obtaining the DNA sequence of the genome flanking the transgene insertion event;
[0056] (b) Comparing the DNA sequence chromatograms to eliminate paralogous sequences when two or more sequences of high homology are obtained;
[0057] (c) Searching for the DNA sequence in a sequence database to verify whether the insertion event has interrupted an endogenous gene;
[0058] (d) Designing one or a plurality of pairs of DNA primer molecules on either or both of the 5' and 3' genomic regions flanking the transgene insertion. When multiple pairs of primers are designed, they can be designed so as to obtain overlapping PCR products from each genomic flanking region, ensuring substantial coverage of the associated genomic DNA;
[0059] (e) Using the parent lines of a mapping population(s) as template for PCR;
[0060] (f) Sequencing the PCR products obtained from these primer/line combinations;
[0061] (g) Identifying SNPs, or other polymorphic features such as indels or SSRs, between the parents of at least one of the mapping populations;
[0062] (h) Repeating steps (d) through (g) on additional flanking sequence, sliding away from the site of insertion in the 5' and 3' directions, until polymorphic sites are found, or to obtain additional ones;
[0063] (i) Designing an assay to score the progeny plants of the mapping population(s);
[0064] (j) Performing a linkage analysis to ascertain the map position of these polymorphisms and consequently the location of the event;
[0065] (k) Correlating the map position with the location of a haplotype region.
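Step (g) can be illustrated with a minimal sketch. It assumes the two parental PCR-product sequences have already been aligned to equal length, with '-' marking a gap. The function name and the example sequences are hypothetical and are not taken from the application.

```python
def find_polymorphisms(parent_a, parent_b):
    """Return (position, allele_a, allele_b, kind) for each differing column.

    Assumes both sequences are pre-aligned to the same length, with '-'
    marking a gap. Positions are 1-based alignment columns.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("aligned sequences must have equal length")
    sites = []
    for pos, (a, b) in enumerate(zip(parent_a.upper(), parent_b.upper()), start=1):
        if a == b:
            continue
        # A gap in either parent is reported as an indel, anything else as a SNP.
        kind = "indel" if "-" in (a, b) else "SNP"
        sites.append((pos, a, b, kind))
    return sites


# Hypothetical aligned flanking sequences from two mapping parents.
parent_1 = "ACGTTGCA-TGA"
parent_2 = "ACGATGCAGTGA"
for site in find_polymorphisms(parent_1, parent_2):
    print(site)
```

If no polymorphic site is returned for a given amplicon, step (h) applies: the same comparison is repeated on sequence further from the insertion site.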

[0066] The genome flanking the transgene insertion event can comprise a DNA segment of from a few hundred to tens of thousands of nucleotide base pairs, or of a length sufficient to identify a polymorphism. The genomic flanking region can be from the 5' or 3' end of the transgene insert location, extending into the genome from the insert site. The "polymerase chain reaction" (PCR) is a process of in vitro geometric amplification of a target DNA segment through the use of a heat-resistant DNA polymerase and cyclic variation of temperature to allow for repetitive denaturing, primer annealing, and amplification of template DNA. "Paralogous sequences" are two DNA sequences with a high degree of similarity that belong to different loci in the genome. A "mapping population" is a set of individuals in which alleles at marker loci, and possibly at one or a plurality of Quantitative Trait Loci (QTL), are segregating in such a way that the presence of linkage disequilibrium can be taken as evidence of proximity on the chromosome, and there is a positive correlation between proximity and disequilibrium. The mapping population is of the same plant species or of a plant species demonstrating synteny or colinearity. These populations can be used to estimate the relative positions of marker loci among themselves or between these and QTLs. Generally, mapping populations are segregating populations. The method can be applied to any crop species; particularly important crop species are, for example, corn, soybean, cotton, wheat, rice, canola, oilseed rape, sugar beet, sorghum, millet, alfalfa, vegetable crops, forest trees, and fruit crops. Maps are available to one skilled in the art for one or more of these crops; by way of example, genetic maps are referenced for maize (Lee et al., 2002), soybean (Ferreira et al., 2000), cotton (Lacape et al., 2003), and canola (Cheung et al., 1997). De novo mapping populations can also be generated for any crop of interest and a genetic map created that is useful in the present invention to map the haplotype regions in which a transgene has inserted.

[0067] Cloned genomic DNA regions, for example those contained in a BAC library, can be identified by probing with DNA markers developed to identify the haplotype linked with a transgenic insertion. Additional DNA markers can be developed by sequencing the BAC clones and inspecting the sequence for polymorphisms. Genes of interest can be isolated from the BAC clones and used as transgenes to improve the performance of the same crop species or of different crop species.

Recombinant Vectors and Transgenes

[0068] Means for preparing recombinant vectors are well known in the art. Methods for making recombinant vectors particularly suited to plant transformation include, without limitation, those described in U.S. Pat. Nos. 4,971,908, 4,940,835, 4,769,061 and 4,757,011. These types of vectors have also been reviewed (Rodriguez et al., 1988; Glick et al., 1993).

[0069] Typical vectors useful for expression of nucleic acids in higher plants are well known in the art and include vectors derived from the tumor-inducing (Ti) plasmid of Agrobacterium tumefaciens (Rogers et al., 1987). Other recombinant vectors useful for plant transformation, including the pCaMVCN transfer control vector, have also been described (Fromm et al., 1985). Many crop species have been transformed to contain one or more transgenes of agronomic importance that in themselves provide a favorable property to the plant. One example is a transgene that confers herbicide tolerance to the crop plant. Transgenes that encode herbicide tolerance proteins that have been transformed and expressed in plants include, for example, a 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) protein conferring glyphosate resistance and proteins conferring resistance to other herbicides, such as glufosinate or bromoxynil (Comai et al., 1985; Gordon-Kamm et al., 1990; Stalker et al., 1988; Eichholtz et al., 1987; Shah et al., 1986; Charest et al., 1990). Further examples include the expression of enzymes such as dihydrofolate reductase and acetolactate synthase; mutant ALS and AHAS enzymes that confer resistance to imidazolinone or sulfonylurea herbicides (Lee et al., 1988 and Miki et al., 1990); a phosphinothricin-acetyl-transferase conferring phosphinothricin resistance (European application No. 0 242 246); proteins conferring resistance to phenoxy propionic acids and cyclohexones, such as sethoxydim and haloxyfop (Marshall et al., 1992); and proteins conferring resistance to triazine (psbA and gs+ genes) and benzonitrile (nitrilase-encoding gene; Przibila et al., 1991).

[0070] A plant of the present invention may also comprise a transgene that confers resistance to insect, pest, viral, or bacterial attack. For example, a transgene conferring resistance to a pest, such as soybean cyst nematode, was described in PCT Application WO96/30517 and PCT Application WO93/19181. Jones et al. (1994) describe cloning of the tomato Cf-9 gene for resistance to Cladosporium fulvum; Martin et al. (1993) describe a tomato Pto gene for resistance to Pseudomonas syringae pv.; and Mindrinos et al. (1994) describe an Arabidopsis RPS2 gene for resistance to Pseudomonas syringae. Bacillus thuringiensis endotoxins may also be used for insect resistance (see, for example, Geiser et al., 1986).

[0071] The expression of viral coat proteins as transgenes in transformed plant cells is known to impart resistance to viral infection and/or disease development affected by the virus from which the coat protein gene is derived, as well as by related viruses (Beachy et al., 1990).

[0072] Transgenes conferring increased nutritional value or another value-added trait may also be used. One example is modified fatty acid metabolism, for example, by transforming a plant with an antisense gene of stearoyl-ACP desaturase to increase the stearic acid content of the plant (Knutzon et al., 1992). A sense desaturase gene may also be introduced to alter fatty acid content. Phytate content may be modified by introduction of a phytase-encoding gene to enhance breakdown of phytate, adding more free phosphate to the transformed plant. Modified carbohydrate composition may also be effected, for example, by transforming plants with a gene coding for an enzyme that alters the branching pattern of starch (Shiroza et al., 1988, nucleotide sequence of Streptococcus mutans fructosyltransferase gene; Steinmetz et al., 1985, nucleotide sequence of Bacillus subtilis levansucrase gene; Pen et al., 1992, production of transgenic plants that express Bacillus licheniformis α-amylase; Elliot et al., 1993, nucleotide sequences of tomato invertase genes; Sogaard et al., 1993, site-directed mutagenesis of barley α-amylase gene; and Fisher et al., 1993, maize endosperm starch branching enzyme II).

[0073] Transgenes may also be used to alter protein metabolism. For example, U.S. Pat. No. 5,545,545 describes lysine-insensitive maize dihydrodipicolinic acid synthase (DHPS), which is substantially resistant to concentrations of L-lysine which otherwise inhibit the activity of native DHPS. Similarly, EP 0640141 describes sequences encoding lysine-insensitive aspartokinase (AK) capable of causing a higher than normal production of threonine, as well as a subfragment encoding antisense lysine ketoglutarate reductase for increasing lysine.

[0074] A transgene may be employed that alters plant carbohydrate metabolism. For example, fructokinase genes are known for use in metabolic engineering of fructokinase gene expression in transgenic plants and their fruit (U.S. Pat. No. 6,031,154). Further examples of transgenes that may be used are genes that alter grain yield. For example, U.S. Pat. No. 6,486,383 describes modification of starch content in plants with subunit proteins of adenosine diphosphoglucose pyrophosphorylase ("ADPG PPase"). In EP0797673, transgenic plants are discussed in which the introduction and expression of particular DNA molecules results in the formation of easily mobilized phosphate pools outside the vacuole and an enhanced biomass production and/or altered flowering behavior. Still further known are genes for altering plant maturity. U.S. Pat. No. 6,774,284 describes DNA encoding a plant lipase and methods of use thereof for controlling senescence in plants. U.S. Pat. No. 6,140,085 provides FCA genes for altering flowering characteristics, particularly timing of flowering. U.S. Pat. No. 5,637,785 discusses genetically modified plants having modulated flower development such as having early floral meristem development and comprising a structural gene encoding the LEAFY protein in its genome.

[0075] Genes for altering plant morphological characteristics are also known and may be used in accordance with the invention. U.S. Pat. No. 6,184,440 discusses genetically engineered plants which display altered structure or morphology as a result of expressing a cell wall modulation transgene. Examples of cell wall modulation transgenes include a cellulose binding domain, a cellulose binding protein, or a cell wall modifying protein or enzyme such as endoxyloglucan transferase, xyloglucan endo-transglycosylase, an expansin, cellulose synthase, or a novel isolated endo-1,4-β-glucanase.

[0076] A transgene that provides a favorable property can be associated with plant morphology, physiology, growth and development, yield, nutritional enhancement, disease or pest resistance, or environmental or chemical tolerance. A transgene that provides a beneficial agronomic trait to crop plants may, for example, comprise, but is not limited to, genetic elements conferring herbicide resistance (U.S. Pat. No. 5,633,435 and U.S. Pat. No. 5,463,175), increased yield (U.S. Pat. No. 5,716,837), insect control (U.S. Pat. No. 6,063,597; U.S. Pat. No. 6,063,756; U.S. Pat. No. 6,093,695; U.S. Pat. No. 5,942,664; and U.S. Pat. No. 6,110,464), fungal disease resistance (U.S. Pat. No. 5,516,671; U.S. Pat. No. 5,773,696; U.S. Pat. No. 6,121,436; U.S. Pat. No. 6,316,407; and U.S. Pat. No. 6,506,962), virus resistance (U.S. Pat. No. 5,304,730 and U.S. Pat. No. 6,013,864), nematode resistance (U.S. Pat. No. 6,228,992), bacterial disease resistance (U.S. Pat. No. 5,516,671), starch production (U.S. Pat. No. 5,750,876 and U.S. Pat. No. 6,476,295), modified oils production (U.S. Pat. No. 6,444,876), high oil production (U.S. Pat. No. 5,608,149 and U.S. Pat. No. 6,476,295), modified fatty acid content (U.S. Pat. No. 6,537,750), high protein production (U.S. Pat. No. 6,380,466), fruit ripening (U.S. Pat. No. 5,512,466), enhanced animal and human nutrition (U.S. Pat. No. 5,985,605 and U.S. Pat. No. 6,171,640), biopolymers (U.S. Pat. No. 5,958,745 and U.S. Patent Publication US20030028917), environmental stress resistance (U.S. Pat. No. 6,072,103), pharmaceutical peptides (U.S. Pat. No. 6,080,560), improved processing traits (U.S. Pat. No. 6,476,295), improved digestibility (U.S. Pat. No. 6,531,648), low raffinose (U.S. Pat. No. 6,166,292), industrial enzyme production (U.S. Pat. No. 5,543,576), improved flavor (U.S. Pat. No. 6,011,199), nitrogen fixation (U.S. Pat. No. 5,229,114), hybrid seed production (U.S. Pat. No. 5,689,041), and biofuel production (U.S. Pat. No. 5,998,700). The genetic elements, methods, and transgenes described in the patents listed above are hereby incorporated by reference.

[0077] Alternatively, a transcribable polynucleotide molecule can effect the above-mentioned plant characteristic or phenotype by encoding an RNA molecule that causes the targeted inhibition of expression of an endogenous gene, for example via antisense, inhibitory RNA (RNAi), or cosuppression-mediated mechanisms. The RNA could also be a catalytic RNA molecule (i.e., a ribozyme) engineered to cleave a desired endogenous mRNA product. Certain RNA molecules can also be expressed in plant cells to inhibit targets in organisms other than plants, for example, insects or nematodes that feed on the plant cells and ingest the inhibitory RNA. Thus, any transcribable polynucleotide molecule that encodes a transcribed RNA molecule that affects a phenotype or morphology change of interest may be useful for the practice of the present invention.

Breeding and Markers

[0078] Breeding techniques take advantage of a plant's method of pollination. There are two general methods of pollination: self-pollination, which occurs if pollen from one flower is transferred to the same or another flower of the same plant, and cross-pollination, which occurs if pollen is transferred to a flower from a flower on a different plant. Plants that have been self-pollinated and selected for type over many generations become homozygous at almost all gene loci and produce a uniform population of true-breeding, homozygous progeny.

[0079] In development of suitable varieties, pedigree breeding may be used. The pedigree breeding method for specific traits involves crossing two genotypes. Each genotype can have one or more desirable characteristics lacking in the other; or, each genotype can complement the other. If the two original parental genotypes do not provide all of the desired characteristics, other genotypes can be included in the breeding population. Superior plants that are the products of these crosses are selfed and are again advanced in each successive generation. Each succeeding generation becomes more homogeneous as a result of self-pollination and selection. Typically, this method of breeding involves five or more generations of selfing and selection: S1 → S2; S2 → S3; S3 → S4; S4 → S5, etc. A selfed generation (S) may be considered to be a type of filial generation (F) and may be named F as such. After at least five generations, the inbred plant is considered genetically pure.
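The link between generations of selfing and genetic purity can be made concrete with a short calculation. The sketch below relies on the standard Mendelian expectation that each generation of selfing halves the heterozygosity remaining at a locus. It ignores selection and linkage, and it is an illustration rather than a calculation given in the application; the function name is hypothetical.

```python
def expected_homozygosity(generations_of_selfing):
    """Expected fraction of initially heterozygous loci that have become
    homozygous after the given number of selfing generations.

    Assumes simple Mendelian segregation with no selection or linkage.
    """
    if generations_of_selfing < 0:
        raise ValueError("number of generations must be non-negative")
    # Heterozygosity is halved by each round of self-fertilization.
    return 1.0 - 0.5 ** generations_of_selfing


for n in range(1, 7):
    print(f"after {n} generation(s) of selfing: {expected_homozygosity(n):.4f}")
```

Under these assumptions, five generations of selfing leave about 3 percent of the originally heterozygous loci still segregating, which is consistent with treating such a line as essentially pure.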

[0080] Each breeding program should include a periodic, objective evaluation of the efficiency of the breeding procedure. Evaluation criteria vary depending on the goal and objectives. Promising advanced breeding lines are thoroughly tested and compared to appropriate standards in environments representative of the commercial target area(s), generally for three or more years. Identification of individuals that are genetically superior is difficult because genotypic value can be masked by confounding plant traits or environmental factors. One method of identifying a superior plant is to observe its performance relative to other experimental plants and to one or more widely grown standard varieties. Single observations can be inconclusive, while replicated observations provide a better estimate of genetic worth.

[0081] Mass and recurrent selections can be used to improve populations of either self- or cross-pollinating crops. A genetically variable population of heterozygous individuals is either identified or created by intercrossing several different parents. The best plants are selected based on individual superiority, outstanding progeny, or excellent combining ability. The selected plants are intercrossed to produce a new population in which further cycles of selection are continued. Descriptions of other breeding methods that are commonly used for different traits and crops can be found in one of several reference books (Allard, 1960; Simmonds, 1979; Sneep and Hendriksen, 1979; Fehr, 1987a, b).

[0082] The effectiveness of selecting for genotypes with enhanced traits of interest (for example, a favorable property such as yield of a harvested plant product, for example yield of a grain, seed, fruit, fiber, or forage; or an agronomic trait, for example, pest resistance such as disease resistance, insect resistance, or nematode resistance, improved growth rate, or stress tolerance; or an improved processed product of the plant, for example, fatty acid profile, amino acid profile, nutritional content, or fiber quality) in a breeding program will depend upon: 1) the extent to which the variability in the traits of interest of individual plants in a population is the result of genetic factors and is thus transmitted to the progenies of the selected genotypes; and 2) how much the variability in the traits of interest among the plants is due to the environment in which the different genotypes are growing. The inheritance of traits ranges from control by one major gene whose expression is not influenced by the environment (i.e., qualitative characters) to control by many genes whose effects are greatly influenced by the environment (i.e., quantitative characters). Breeding for quantitative traits such as yield is further characterized by the fact that: 1) the differences resulting from the effect of each gene are small, making it difficult or impossible to identify them individually; 2) the number of genes contributing to a character is large, so that distinct segregation ratios are seldom, if ever, obtained; and 3) the effects of the genes may be expressed in different ways based on environmental variation. Therefore, the accurate identification of transgressive segregants or superior genotypes with the traits of interest is extremely difficult, and its success depends on the plant breeder's ability to minimize the environmental variation affecting the expression of the quantitative character in the population.

[0083] The likelihood of identifying a transgressive segregant is greatly reduced as the number of traits combined into one genotype is increased. Consequently, all the breeder can generally hope for is to obtain a favorable assortment of genes for the first complex character combined with a favorable assortment of genes for the second character into one genotype in addition to a selected gene.

[0084] Introgression of a particular genomic region in a set of genomic regions that contain a transgene, or transgenes into a plant germplasm is defined as the result of the process of backcross conversion. A plant germplasm into which a novel DNA sequence has been introgressed may be referred to as a backcross converted genotype, line, inbred, or hybrid. Additionally, an introgression of a particular genomic region or transgene may be conducted by a forward breeding process. Similarly a plant genotype lacking the desired DNA sequence may be referred to as an unconverted genotype, line, inbred, or hybrid. During breeding, the genetic markers linked to a T-type genomic region may be used to assist in breeding for the purpose of producing soybean plants with increased yield and a transgenic trait. Backcrossing and marker-assisted selection, or forward breeding and marker-assisted selection in particular can be used with the present invention to introduce the T-type genomic region into any variety by conversion of that variety.

[0085] In another embodiment of this invention, marker sequences are provided that are genetically linked and can be used to follow the selection of the soybean or corn haplotypes. Genomic libraries from multiple corn or soybean lines are made by isolating genomic DNA from different corn or soybean lines using Plant DNAzol Reagent from Life Technologies, now Invitrogen (Invitrogen Life Technologies, Carlsbad, Calif.). Genomic DNA is digested with PstI restriction endonuclease, size-fractionated over a 1 percent agarose gel, and ligated into a plasmid vector for sequencing by standard molecular biology techniques as described in Sambrook et al. These libraries are sequenced by standard procedures on an ABI Prism® 377 DNA Sequencer using commercially available reagents (Applied Biosystems, Foster City, Calif.). All sequences are assembled to identify non-redundant sequences using the Pangea Clustering and Alignment Tools available from DoubleTwist Inc., Oakland, Calif. Sequences from multiple corn or soybean lines are assembled into loci having one or more polymorphisms, such as SNPs and/or Indels. Candidate polymorphisms are qualified by the following parameters:
[0086] (a) The minimum length of a contig or singleton for a consensus alignment is 200 bases.
[0087] (b) The percentage identity of observed bases in a region of 15 bases on each side of a candidate SNP is at least 75 percent.
[0088] (c) The minimum Phred quality in each contig at a polymorphism site is 35.
[0089] (d) The minimum Phred quality in a region of 15 bases on each side of the polymorphism site is 20.
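The four qualification parameters (a) through (d) can be read as a simple filter applied to each candidate polymorphism. The sketch below is one possible encoding of those thresholds. The record layout, field names, and example values are hypothetical; the application does not describe how the assembly software reports these quantities.

```python
from dataclasses import dataclass

# Thresholds taken from parameters (a) through (d) above.
MIN_CONTIG_LENGTH = 200      # bases
MIN_FLANK_IDENTITY = 75.0    # percent, 15 bases on each side of the site
MIN_SITE_PHRED = 35          # at the polymorphic site, in each contig
MIN_FLANK_PHRED = 20         # 15 bases on each side of the site


@dataclass
class CandidateSite:
    """Hypothetical summary of one candidate polymorphism."""
    contig_length: int          # length of the contig or singleton
    flank_identity_pct: float   # identity of observed bases in the flanks
    site_phred: int             # lowest Phred score at the site across contigs
    flank_min_phred: int        # lowest Phred score within the flanks


def qualifies(site):
    """Return True only if the candidate meets all four thresholds."""
    return (
        site.contig_length >= MIN_CONTIG_LENGTH
        and site.flank_identity_pct >= MIN_FLANK_IDENTITY
        and site.site_phred >= MIN_SITE_PHRED
        and site.flank_min_phred >= MIN_FLANK_PHRED
    )


candidates = [
    CandidateSite(450, 93.3, 40, 25),   # passes every check
    CandidateSite(180, 96.7, 42, 30),   # contig too short
    CandidateSite(600, 90.0, 30, 28),   # site quality too low
]
print([qualifies(c) for c in candidates])
```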

[0090] Read data from automated sequencers varies significantly in quality due to the nature of nucleotides in a polynucleotide molecule and a number of other reasons (Ewing et al., 1998). Many algorithms have been developed to address the issue of accurate base calling (Giddings et al., 1993; Berno, 1996; Lawrence and Solovyev, 1994). The most widely used algorithm calculates the quality of the sequence as "q" in the equation q = -10 × log10(p), where p is the estimated error probability of that base call (Ewing and Green, 1998). Thus a base call having a probability of 1/1000 of being incorrect in a particular sequence is assigned a quality score of 30. Quality scores are also referred to as "Phred scores".
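The quality score relationship described above, in which q is minus ten times the base-10 logarithm of the error probability p, can be checked with a few lines of code. This is a minimal sketch; the function names are illustrative and do not come from any sequencing software.

```python
import math


def phred_quality(error_probability):
    """Quality score q = -10 * log10(p) for an error probability p."""
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


def error_probability(quality):
    """Inverse relationship: p = 10 ** (-q / 10)."""
    return 10.0 ** (-quality / 10.0)


for p in (0.1, 0.01, 0.001):
    print(f"p = {p}: q = {phred_quality(p):.1f}")
```

Read the other way, the thresholds of 35 and 20 used for qualifying polymorphisms correspond to estimated error probabilities of roughly 1 in 3,000 and 1 in 100, respectively.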

Selection of Plants using Marker-Assisted Selection

[0091] A primary motivation for development of molecular markers in crop species is the potential for increased efficiency in plant breeding through marker-assisted selection (MAS). Genetic marker alleles (an "allele" is an alternative sequence at a locus) are used to identify plants that contain a desired genotype at multiple loci, and that are expected to transfer the desired genotype, along with a desired phenotype, to their progeny. Genetic marker alleles can be used to identify plants that contain a desired genotype at one marker locus, several loci, or a haplotype, and that would be expected to transfer the desired genotype, along with a desired phenotype, to their progeny.

[0092] Marker-assisted selection comprises the mapping of phenotypic traits and relies on the ability to detect genetic differences between individuals. A "genetic map" is the representation of the relative position of characterized loci (DNA markers or any other locus for which alleles can be identified) along the chromosomes. The measure of distance is relative to the frequency of crossover events between non-sister chromatids at meiosis. The genetic differences, or "genetic markers", are then correlated with phenotypic variations using statistical methods. In a preferred case, a single gene encoding a protein responsible for a phenotypic trait is detectable directly by a mutation which results in the variation in phenotype. More commonly, multiple genetic loci each contribute to the observed phenotype.

[0093] The presence and/or absence of a particular genetic marker allele in the genome of a plant exhibiting a favorable phenotypic trait is determined by any method listed above using markers. DNA markers include, for example, Restriction Fragment Length Polymorphisms (RFLP), Amplified Fragment Length Polymorphisms (AFLP), Simple Sequence Repeats (SSR), Single Nucleotide Polymorphisms (SNP), Insertion/Deletion Polymorphisms (Indels), Variable Number Tandem Repeats (VNTR), Random Amplified Polymorphic DNA (RAPD), and others known to those skilled in the art. If the nucleic acids from the plant are positive for a desired genetic marker, the plant can be selfed to create a true breeding line with the same genotype, or it can be crossed with a plant with the same marker or with other desired characteristics to create a sexually crossed hybrid generation. Methods of marker-assisted selection (MAS) using a variety of genetic markers are provided. Plants selected by MAS using the methods are provided.

[0094] Marker-assisted introgression involves the transfer of a chromosome region defined by one or more markers from one germplasm to a second germplasm. The initial step in that process is the localization of the genomic region or transgene by gene mapping, which is the process of determining the position of a gene or genomic region relative to other genes and genetic markers through linkage analysis. The basic principle of linkage mapping is that the closer together two genes are on a chromosome, the more likely they are to be inherited together. Briefly, a cross is generally made between two genetically compatible parents that are divergent for the traits under study. Genetic markers can then be used to follow the segregation of the traits under study in the progeny from the cross, often a backcross (BC1), F2, or recombinant inbred population.
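The linkage principle can be illustrated with a two-locus calculation on backcross progeny. The sketch below estimates the recombination fraction from counts of parental and recombinant gametes and converts it to map distance with the Haldane mapping function, a standard choice that the application does not itself specify. The gamete counts and function names are hypothetical.

```python
import math


def recombination_fraction(gametes, parental_types):
    """Fraction of gametes whose two-locus allele combination is recombinant.

    gametes: list of (allele_at_locus_1, allele_at_locus_2) tuples contributed
             by the heterozygous parent of a backcross.
    parental_types: set holding the two non-recombinant combinations.
    """
    if not gametes:
        raise ValueError("at least one gamete is required")
    recombinants = sum(1 for g in gametes if g not in parental_types)
    return recombinants / len(gametes)


def haldane_cm(r):
    """Haldane map distance in centimorgans for recombination fraction r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical BC1 data: 88 parental and 12 recombinant gametes.
gametes = [("A", "B")] * 45 + [("a", "b")] * 43 + [("A", "b")] * 7 + [("a", "B")] * 5
r = recombination_fraction(gametes, {("A", "B"), ("a", "b")})
print(f"recombination fraction = {r:.2f}, Haldane distance = {haldane_cm(r):.1f} cM")
```

A small recombination fraction between a marker and the insertion event indicates close linkage, which is what makes the marker useful for following the event in later crosses.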

[0095] The selection of a suitable recurrent parent is an important step for a successful backcrossing procedure. The goal of a backcross protocol is to alter or substitute a trait or characteristic in the original inbred. To accomplish this, one or more loci of the recurrent inbred are modified or substituted with the desired gene from the nonrecurrent (donor) parent, while retaining essentially all of the rest of the desired genetic, and therefore the desired physiological and morphological, constitution of the original inbred. The choice of the particular donor parent will depend on the purpose of the backcross. The exact backcrossing protocol will depend on the characteristic or trait being altered, which determines an appropriate testing protocol. It may be necessary to introduce a test of the progeny to determine whether the desired characteristic has been successfully transferred. In the case of the present invention, one may test the progeny lines generated during the backcrossing program, and may also use the marker system described herein to select lines based upon markers rather than visual traits; the markers are indicative of the preferred T-type genomic region or of a genomic region comprising a favorable haplotype.
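How quickly backcrossing restores the constitution of the original inbred can be estimated from the standard expectation that each backcross halves the remaining donor contribution. The sketch below computes that expectation. It assumes no selection and unlinked loci, and it is offered as an illustration rather than as part of the described method; the function name is hypothetical.

```python
def expected_recurrent_fraction(backcrosses):
    """Expected genome fraction from the recurrent parent after the initial
    cross (0 backcrosses, the F1) and each subsequent backcross.

    Assumes no selection and unlinked loci.
    """
    if backcrosses < 0:
        raise ValueError("number of backcrosses must be non-negative")
    # The F1 is half donor; every backcross halves the donor share again.
    return 1.0 - 0.5 ** (backcrosses + 1)


for n in range(0, 6):
    label = "F1" if n == 0 else f"BC{n}"
    print(f"{label}: {expected_recurrent_fraction(n):.4f}")
```

Selecting on markers for the recurrent parent background, in addition to markers for the T-type genomic region, is generally expected to reach a given recovery level in fewer generations than this unselected average.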

Transformed Plants and Plant Cells

[0096] As used herein, the term "transformed" refers to a cell, tissue, organ, or organism into which has been introduced a foreign polynucleotide molecule, such as a construct. The introduced polynucleotide molecule may be integrated into the genomic DNA of the recipient cell, tissue, organ, or organism such that the introduced polynucleotide molecule is inherited by subsequent progeny. A "transgenic" or "transformed" cell or organism also includes progeny of the cell or organism and progeny produced from a breeding program employing such a transgenic plant as a parent in a cross and exhibiting an altered phenotype resulting from the presence of a foreign polynucleotide molecule. A plant transformation construct containing a polynucleotide molecule of the present invention may be introduced into plants by any plant transformation method. Methods and materials for transforming plants by introducing a plant expression construct into a plant genome in the practice of this invention can include any of the well-known and demonstrated methods including electroporation as illustrated in U.S. Pat. No. 5,384,253; microprojectile bombardment as illustrated in U.S. Pat. No. 5,015,580; U.S. Pat. No. 5,550,318; U.S. Pat. No. 5,538,880; U.S. Pat. No. 6,160,208; U.S. Pat. No. 6,399,861; and U.S. Pat. No. 6,403,865; Agrobacterium-mediated transformation as illustrated in U.S. Pat. No. 5,824,877; U.S. Pat. No. 5,591,616; U.S. Pat. No. 5,981,840; and U.S. Pat. No. 6,384,301; and protoplast transformation as illustrated in U.S. Pat. No. 5,508,184, all of which are hereby incorporated by reference.

[0097] Methods for specifically transforming dicots are well known to those skilled in the art. Transformation and plant regeneration using these methods have been described for a number of crops including, but not limited to, cotton (Gossypium hirsutum), soybean (Glycine max), peanut (Arachis hypogaea), alfalfa (Medicago sativa), and members of the genus Brassica.

[0098] Methods for transforming monocots are well known to those skilled in the art. Transformation and plant regeneration using these methods have been described for a number of crops including, but not limited to, barley (Hordeum vulgare); maize (Zea mays); oats (Avena sativa); orchard grass (Dactylis glomerata); rice (Oryza sativa, including indica and japonica varieties); sorghum (Sorghum bicolor); sugar cane (Saccharum sp.); tall fescue (Festuca arundinacea); turfgrass species (e.g., Agrostis stolonifera, Poa pratensis, Stenotaphrum secundatum); and wheat (Triticum aestivum). It is apparent to those of skill in the art that a number of transformation methodologies can be used and modified for production of stable transgenic plants from any number of target crops of interest. Methods for introducing a transgene are well known in the art and include biological and physical plant transformation protocols. See, for example, Miki et al. (1993). Once a transgene is introduced into a variety, it may readily be transferred by crossing. By using backcrossing, essentially all of the desired morphological and physiological characteristics of a variety are recovered in addition to the locus transferred into the variety via the backcrossing technique. Backcrossing and forward breeding methods can be used with the present invention to improve or introduce a characteristic into a plant (Poehlman and Sleper, 1995; Fehr, 1987a, b; Sprague and Dudley, 1988).

Site-Specific Integration of Transgenes

[0099] A number of site-specific recombination-mediated methods have been developed for incorporating transgenes into plant genomes, as well as for deleting unwanted genetic elements from plant and animal cells. For example, the cre-lox recombination system of bacteriophage P1, described by Abremski et al. (1983), Sternberg et al. (1981), and others, has been used to promote recombination in a variety of cell types. The cre-lox system utilizes the cre recombinase isolated from bacteriophage P1 in conjunction with the DNA sequences (termed lox sites) it recognizes. This recombination system has been effective for achieving recombination in plant cells (U.S. Pat. No. 5,658,772), animal cells (U.S. Pat. No. 4,959,317 and U.S. Pat. No. 5,801,030), and in viral vectors (Hardy et al., 1997). Targeting and control of insertion or removal of transgene sequences in a plant genome can be achieved by the use of a molecular recombination method (U.S. Pat. No. 6,573,425). An introduced polynucleotide molecule comprising a heterologous recombination site incorporated into a haplotype region is within the scope of the present invention.

[0100] Wahl et al. (U.S. Pat. No. 5,654,182) used the site-specific FLP recombinase system of Saccharomyces cerevisiae to delete DNA sequences in eukaryotic cells. The deletions were designed to accomplish either inactivation of a gene or activation of a gene by bringing desired DNA fragments into association with one another. Activity of the FLP recombinase in plants has been demonstrated (Lyznik et al., 1996; Luo et al., 2000).

[0101] Others have used transposons, or mobile genetic elements that transpose when a transposase gene is present in the same genome, to separate target genes from ancillary sequences. Yoder et al. (U.S. Pat. No. 5,482,852 and U.S. Pat. No. 5,792,924, both of which are incorporated herein by reference) used constructs containing the sequence of the transposase enzyme and the transposase recognition sequences to provide a method for genetically altering plants that contain a desired gene free of vector and/or marker sequences. Other methods that use DNA sequence directed bacteriophage recombinases or transposases to target specific regions are described in US 20020132350 and EP 1308516 (both of which are incorporated herein by reference). Zinc finger endonucleases can be specifically designed to recognize a DNA sequence and can target specific DNA sequences in a genome to create a recombination site useful for the insertion of a transgene (Wright et al., 2005; U.S. Pat. No. 7,030,215; US 20050208489; US 20050064474, herein incorporated by reference in their entirety). For example, such targeting to a haplotype comprising the DNA sequences listed in the sequence listing of the present invention and contained in the genome of a corn or soybean plant is contemplated by the inventors.

[0102] A transgene that contains additional recombination sites when it is a component of a preferred T-type genomic region provides an opportunity to add additional transgenes to the T-type genomic region, thereby increasing the value of the region in a germplasm. The present invention contemplates that the T-type genomic region is also a site for specific recombination activities to remove or add new genetic material to the genomic region.

[0103] The following examples are included to demonstrate preferred embodiments of the invention. It should be appreciated by those of skill in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventor to function well in the practice of the invention, and thus can be considered to constitute preferred modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit and scope of the invention.

EXAMPLE 1

Identification of Haplotypes

[0104] This example illustrates identifying soybean haplotypes useful in databases for practicing the methods of this invention. The chromosomes of soybean were divided into haplotypes by following the inheritance of a large set of markers. Allelic forms of the haplotypes were identified for a set of four haplotypes, which are listed in Table 1. With reference to Table 1, a haplotype mapped to a genomic location is identified by reference; for example, C8W6H5 refers to chromosome 8, window 6 in that chromosome, and haplotype 5 in that window (genomic region). SEQ_ID provides reference to the sequence listing, and the marker ID number is an arbitrary identifying name for a DNA amplicon associated with a marker locus. START_POS refers to the start position of the marker in the DNA amplicon. HAP ALLELE refers to the nucleotide of an SNP/Indel marker at the start position, where * indicates a deletion of an Indel. "Other marker states" identifies another nucleotide allele of markers in the window.

TABLE 1
Summary information of marker loci used to characterize four soybean haplotypes associated with the glyphosate tolerant soybean events, including the sequence identification (SEQ ID and marker ID number) and the position of the polymorphism (START POS) being used to characterize alleles (HAP ALLELE) in these sequences.

Haplotype   SEQ_ID   Marker ID   START POS   HAP ALLELE   Other marker states
C8W6H5      1        962360      277         *            G
C8W6H5      2        1324623     785         A            T
C8W6H5      3        1271382     239         A            G
C16W8H43    4        1271562     351         A            G
C16W8H43    5        894632      193         G            C
C16W8H43    6        928368      320         A            G
C16W8H43    7        1267271     563         C            A
C16W8H43    8        1271614     126         A            G
C16W8H43    9        1271496     359         T            G
C18W3H8     10       1271924     603         G            A
C18W3H8     11       1267375     741         T            C
C18W3H8     12       860401      372         G            C
C19W3H6     13       1271355     283         T            C
C19W3H6     14       1271476     546         A            C
C19W3H6     15       825651      294         T            C
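By way of non-limiting illustration, the identifier convention described above (chromosome, window, haplotype) can be decoded programmatically. The following Python sketch is offered only as one possible reading of that convention; the names `HaplotypeId` and `parse_haplotype_id` are hypothetical and do not appear in the source.

```python
import re
from typing import NamedTuple


class HaplotypeId(NamedTuple):
    """Numeric parts of an identifier such as 'C8W6H5' (hypothetical helper type)."""
    chromosome: int
    window: int
    haplotype: int


# C<chromosome>W<window>H<haplotype>, e.g. "C16W8H43".
_HAPLOTYPE_RE = re.compile(r"^C(\d+)W(\d+)H(\d+)$")


def parse_haplotype_id(identifier: str) -> HaplotypeId:
    """Split an identifier like 'C8W6H5' into chromosome, window and haplotype."""
    match = _HAPLOTYPE_RE.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a haplotype identifier: {identifier!r}")
    chromosome, window, haplotype = (int(group) for group in match.groups())
    return HaplotypeId(chromosome, window, haplotype)


if __name__ == "__main__":
    # The four haplotypes named in Table 1.
    for name in ("C8W6H5", "C16W8H43", "C18W3H8", "C19W3H6"):
        print(name, parse_haplotype_id(name))
```

A strict pattern is used so that a malformed identifier (for example, one containing an embedded space) is rejected rather than silently misread.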

EXAMPLE 2

Preparation of a Database with Agronomic Traits and Haplotypes

[0105] This example illustrates the preparation of a database useful in a method of this invention. With reference to Table 2, the database comprises computed values of agronomic traits, for example, yield, maturity, plant height, and lodging, for the specific allelic soybean haplotypes, together with the haplotype frequency in a set of breeding lines. Other traits can be measured, for example, yield of a grain, seed, fruit, fiber, forage, or oil; an agronomic trait, for example, pest resistance such as disease resistance, insect resistance, or nematode resistance, improved growth rate, or stress tolerance; or an improved processed product of the plant, for example, fatty acid profile, amino acid profile, nutritional content, or fiber quality. A database can be compiled of the values of each haplotype for these other traits. The agronomic trait values of these haplotypes represent the predicted population change in mean value for the trait listed if the haplotype were fixed in the germplasm, everything else staying the same. The values for "yield" are in bushels of soybeans per acre. The values for "maturity" are in days (maturity of a soybean line is the relative flowering time of that line compared to a set of standard checks of defined maturity). The values for "plant height" are in inches of height measured from the soil surface to the tip of the uppermost plant tissue at maturity. The values of "lodging" are a percent of plants compared to a set of standard checks (lodging is a phenomenon in which the main stem of crop plants has moved from the vertical by a large angle, sometimes to the point of the plants lying on the ground).

[0106] The breeding values for each of the haplotypes are used to select the haplotype that in combination with a transgene will be the most beneficial for the improvement of the germplasm of the crop. The breeding value is a combination of measured traits and the estimation of how these traits will affect germplasm improvement. The soybean haplotypes associated with the transgenic events for glyphosate tolerance were measured, and the results are shown in Table 2. Haplotype C8W6H5 would be a favorable haplotype for its effect on yield, and haplotype C18W3H8 would be a favorable haplotype for its very high frequency in the germplasm (94 percent), indicating that little variability is present in the target soy germplasm for this chromosome segment, making the diffusion process of a transgenic event in it neutral. Haplotype C19W3H6 is generally neutral with respect to yield.

TABLE 2
The calculated breeding values of four haplotypes described for yield, maturity, plant height, and lodging. The frequency of the haplotype in the soybean germplasm was estimated from a sample of 365 soybean lines.

Haplotype   Yield (bushels/acre)   Maturity (days)   Plant height (inches)   Lodging (%)   Frequency in a breeding population
C8W6H5      1.689                  0.989             -0.195                  -0.027        21%
C16W8H43    -0.447                 -0.211            -0.514                  -0.101        42%
C18W3H8     0.000                  0.000             0.000                   0.000         94%
C19W3H6     -0.071                 0.232             -0.495                  0.001         58%
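For illustration, the Table 2 values can be held in a small in-memory structure and queried by trait. The Python sketch below is only an assumed representation of such a database, not the database actually used; the class `BreedingValues`, the field names, and the function `rank_by` are hypothetical, and the numbers are transcribed from Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreedingValues:
    """One Table 2 row (field names are illustrative assumptions)."""
    yield_bu_per_acre: float
    maturity_days: float
    plant_height_in: float
    lodging_pct: float
    frequency: float  # fraction of the sampled lines carrying the haplotype


# Values transcribed from Table 2.
TABLE_2 = {
    "C8W6H5": BreedingValues(1.689, 0.989, -0.195, -0.027, 0.21),
    "C16W8H43": BreedingValues(-0.447, -0.211, -0.514, -0.101, 0.42),
    "C18W3H8": BreedingValues(0.000, 0.000, 0.000, 0.000, 0.94),
    "C19W3H6": BreedingValues(-0.071, 0.232, -0.495, 0.001, 0.58),
}


def rank_by(trait: str, descending: bool = True) -> list[str]:
    """Order haplotype names by one trait column of TABLE_2."""
    return sorted(
        TABLE_2,
        key=lambda name: getattr(TABLE_2[name], trait),
        reverse=descending,
    )


if __name__ == "__main__":
    print("by yield:    ", rank_by("yield_bu_per_acre"))
    print("by frequency:", rank_by("frequency"))
```

Ranking on a single column is a simplification; the source describes the breeding value as a combination of traits and does not specify how they are weighted, so no composite score is attempted here.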

[0107] The haplotype regions were determined for each of the four new glyphosate tolerant soybean events. 17194 is linked to haplotype C16W8H43, 17426 is linked to haplotype C18W3H8, 19703 is linked to haplotype C19W3H6, and 19788 is linked to haplotype C8W6H5. The relative effect of these haplotypes was measured as illustrated in Table 2. This represents the predicted population change in mean value for the trait listed if the haplotype was fixed in the germplasm, everything else staying the same. The T-type of 19788 and the associated C8W6H5 haplotype is the most favorable of the four T-types that were measured. This result demonstrates that it is important in a process to improve crop performance through transgenic methods that both transgenic events and the linked haplotype regions are evaluated to continue to enhance crop productivity.
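The comparison of T-types described above amounts to joining each event to its linked haplotype and reading off that haplotype's breeding value. A minimal Python sketch follows, using only the yield column of Table 2 as the ranking criterion (an assumption made for brevity); the names `EVENT_HAPLOTYPE`, `HAPLOTYPE_YIELD`, and `rank_events_by_linked_yield` are hypothetical.

```python
# Event-to-haplotype linkage as stated in paragraph [0107].
EVENT_HAPLOTYPE = {
    "17194": "C16W8H43",
    "17426": "C18W3H8",
    "19703": "C19W3H6",
    "19788": "C8W6H5",
}

# Yield breeding values (bushels/acre) transcribed from Table 2.
HAPLOTYPE_YIELD = {
    "C8W6H5": 1.689,
    "C16W8H43": -0.447,
    "C18W3H8": 0.000,
    "C19W3H6": -0.071,
}


def rank_events_by_linked_yield() -> list[tuple[str, str, float]]:
    """Return (event, haplotype, yield value) tuples, best yield value first."""
    joined = [
        (event, haplotype, HAPLOTYPE_YIELD[haplotype])
        for event, haplotype in EVENT_HAPLOTYPE.items()
    ]
    return sorted(joined, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for event, haplotype, value in rank_events_by_linked_yield():
        print(f"event {event}  haplotype {haplotype}  yield value {value:+.3f}")
```

Under this single-trait criterion, event 19788 ranks first, which agrees with the conclusion stated in the text.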

[0108] The new glyphosate tolerant events were compared to a backcross conversion of 40-3-2 into A3244 germplasm in replicated field trials, with yield data collected from seventeen locations in the United States. A3244 (U.S. Pat. No. 5,659,114, ATCC number 97549) is an elite soybean germplasm from Asgrow (Monsanto, St. Louis, Mo.) that was used as the parent line for transformation to generate the new glyphosate tolerant soybean events 17194, 17426, 19703, and 19788. The results of the yield study showed that the 40-3-2 A3244 backcross yielded an average of 60.7 bu/acre, 19788 an average of 65.6 bu/acre, 19703 an average of 65.7 bu/acre, 17426 an average of 65.3 bu/acre, and 17194 an average of 65.8 bu/acre. The four new lines thus have a yield advantage of approximately 5 bu/acre (4.6 to 5.1 bu/acre) over the same genotype with the introgressed 40-3-2 T-type genomic region. When the haplotype of each is also considered, the most favorable event is 19788.
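The yield advantage quoted above follows directly from the reported averages. The short Python sketch below reproduces that arithmetic; the variable and function names are hypothetical, and the figures are those given in paragraph [0108].

```python
# Average yields (bu/acre) reported in paragraph [0108].
BACKCROSS_YIELD = 60.7  # 40-3-2 backcross conversion in A3244
EVENT_YIELD = {"19788": 65.6, "19703": 65.7, "17426": 65.3, "17194": 65.8}


def yield_advantages(events: dict[str, float], baseline: float) -> dict[str, float]:
    """Difference between each event's average yield and the baseline, to 0.1 bu/acre."""
    return {event: round(value - baseline, 1) for event, value in events.items()}


if __name__ == "__main__":
    advantages = yield_advantages(EVENT_YIELD, BACKCROSS_YIELD)
    for event, delta in advantages.items():
        print(f"event {event}: +{delta} bu/acre")
    mean = round(sum(advantages.values()) / len(advantages), 1)
    print(f"mean advantage: +{mean} bu/acre")
```

The individual differences are 4.9, 5.0, 4.6, and 5.1 bu/acre, with a mean of 4.9 bu/acre. Because the four events yield within about 0.5 bu/acre of one another, the field data alone do not separate them clearly, which is why the linked haplotype is brought in as a further criterion.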

[0109] These analyses demonstrate the value of determining the T-type for each transgenic event that is being developed as a commercial product. Failure to consider the agronomic effects of the haplotype region in which the transgene has introgressed can result in the introduction of a low performing event into the germplasm of a crop.

EXAMPLE 3

Use of Breeding Values

[0110] The haplotype regions and breeding values of each were determined for four haplotype regions in which an insect tolerance gene was inserted into the genome of a soybean plant. The relative breeding value for each haplotype region is shown in Table 3; the definitions of the measurements are the same as described in Example 2. The table is a database for determining the haplotype and its breeding value in which an insect tolerance gene was inserted (a T-type). A transgenic event comprising the T-type is selected using the database information. A particular event, GM_19459, contains the T-type of the insect tolerance gene associated with the C6W4H1 haplotype, which is a favorable haplotype for maturity.

TABLE 3
The calculated breeding values for yield, maturity, plant height, and lodging of four haplotypes for the insect tolerant soybean events. The frequency of the haplotype in the germplasm was estimated from 2589 soybean lines.

Haplotype   Yield (bushels/acre)   Maturity (days)   Plant height (inches)   Lodging (%)   Haplotype frequency
C1W1H2      0.075                  0.244             0.057                   0.018         16%
C1W2H1      0.160                  0.314             0.069                   0.022         67%
C14W7H2     0.130                  0.648             -0.101                  -0.069        62%
C6W4H1      -0.156                 -0.111            --                      0.070         29%

[0111] Allelic forms of the haplotypes were identified for a set of four haplotypes associated with transgenic insect resistant soybeans, as listed in Table 4. With reference to Table 4, a haplotype mapped to a genomic location is identified by reference; for example, C1W1H2 refers to chromosome 1, window 1 in that chromosome, and haplotype 2 in that window (genomic region). SEQ_ID provides reference to the sequence listing, and the marker ID number is an arbitrary identifying name for a DNA amplicon associated with a marker locus. START_POS refers to the start position of the marker in the DNA amplicon. HAP ALLELE refers to the nucleotide of an SNP/Indel marker at the start position, where * indicates a deletion of an Indel. "Other marker states" identifies another nucleotide allele of markers in the window; "NA" indicates that another marker allele is not present.

TABLE 4
Summary information of marker loci used to characterize four soybean haplotypes associated with the insect tolerant soybean events, including the sequence identification (SEQ ID and marker ID number) and the position of the polymorphism (START POS) being used to characterize alleles (HAP ALLELE) in these sequences.

Haplotype   SEQ_ID   Marker ID   START POS   HAP ALLELE   Other marker states
C1W1H2      16       NS0092678   0           C            T
C1W1H2      17       NS0092617   0.4         A            G
C1W1H2      18       NS0101549   1.4         A            G
C1W1H2      19       NS0127917   1.4         C            A
C1W1H2      20       NS0120003   1.8         A            T
C1W1H2      21       NS0118494   3           C            T
C1W1H2      22       NS0124158   3           A            G
C1W2H1      23       NS0101025   11.3        C            T
C1W2H1      24       NS0101038   11.3        A            C
C1W2H1      25       NS0127234   11.3        T            G
C1W2H1      26       NS0129173   11.3        T            A
C1W2H1      27       NS0097228   16.2        C            NA
C14W7H2     28       NS0096079   68.5        T            C
C6W4H1      29       NS0125775   30.3        G            C
C6W4H1      30       NS0130788   30.3        T            C
C6W4H1      31       NS0093984   32.9        C            T
C6W4H1      32       NS0096925   32.9        A            *

EXAMPLE 4

Application to Corn Breeding

[0112] This example illustrates the haplotype regions and breeding values that were determined for four haplotype regions in which an insect tolerance gene was inserted into the genome of a corn plant (LH172). The relative breeding value for each haplotype region is shown in Table 5; the definitions of the measurements are the same as described in Example 2. The table is a database for determining the haplotype and its breeding value in which an insect tolerance gene was inserted (a T-type). A transgenic event comprising the T-type is selected using the database information. A particular event contains the T-type of the insect tolerance gene associated with the C1W36H2 haplotype.

TABLE 5
Calculated breeding value for yield of four haplotypes for insect tolerant corn events. The frequency of the haplotype in the germplasm was estimated from 6335 corn lines.

Haplotype   Yield (bushels/acre)   Haplotype frequency
C1W19H14    0.168                  9.2%
C1W30H4     -0.781                 3.3%
C1W36H2     0.008                  18%
C8W4H5      0.377                  15%

[0113] Allelic forms of the haplotypes were identified for a set of four haplotypes for the transgenic insect resistant corn, as listed in Table 6. With reference to Table 6, a haplotype mapped to a genomic location is identified by reference; for example, C1W19H14 refers to chromosome 1, window 19 in that chromosome, and haplotype 14 in that window (genomic region). SEQ_ID provides reference to the sequence listing, and the marker ID number is an arbitrary identifying name for a DNA amplicon associated with a marker locus. START_POS refers to the start position of the marker in the DNA amplicon. HAP ALLELE refers to the nucleotide of an SNP/Indel marker at the start position, where * indicates a deletion of an Indel. "Other marker states" identifies another nucleotide allele of markers in the window.

TABLE 6
Summary information of marker loci used to characterize four corn haplotypes associated with the insect tolerant corn events, including the sequence identification (SEQ ID and marker ID number) and the position of the polymorphism (START POS) being used to characterize alleles (HAP ALLELE) in these sequences.

Haplotype   SEQ_ID   Marker ID   START POS   HAP ALLELE   Other marker states
C1W19H14    33       NC0053983   109.4       T            C
C1W19H14    34       NC0113263   110.1       A            G
C1W19H14    35       NC0008901   110.8       T            C
C1W19H14    36       NC0143254   110.9       A            G
C1W19H14    37       NC0030198   111         A            G
C1W19H14    38       NC0080733   111         T            G
C1W19H14    39       NC0104474   111         C            T
C1W19H14    40       NC0033728   113.3       C            A
C1W19H14    41       NC0029506   113.6       C            G
C1W30H4     42       NC0039502   195.5       G            A
C1W30H4     43       NC0111626   196.4       T            C
C1W30H4     44       NC0008982   198.4       A            G
C1W30H4     45       NC0040427   199.4       G            T
C1W30H4     46       NC0033427   199.8       G            T
C1W30H4     47       NC0148362   200         G            A
C1W36H2     48       NC0146570   237         T            G
C1W36H2     49       NC0008996   238.1       A            T
C1W36H2     50       NC0013490   240.7       T            C
C8W4H5      51       NC0111628   57.3        A            G
C8W4H5      52       NC0026720   58.7        A            C
C8W4H5      53       NC0037392   60          C            T
C8W4H5      54       NC0027485   60.1        C            T

EXAMPLE 5

Indirect Mapping of a T-type Genomic Region

[0114] DNA markers are identified in the genomic region flanking a transgene insert to provide a means to identify the genomic location of the transgene by comparison of the DNA markers to a mapping population. DNA markers can be developed for any transgenic event by isolating the genomic region, sequencing the region, isolating the same region in a mapping population of the crop plant, and determining its location relative to markers known in the mapping population. The association of the transgene with mapped phenotypes or quantitative trait loci comprising a haplotype genomic region can then be determined.

[0115] For example, for MON89788 a DNA primer pair was selected from a DNA sequence that extends into the genome 5' to the transgene insertion site (SEQ ID NO:55 and 56) and into the 3' genomic region relative to the transgene insertion site (SEQ ID NO:57-58). A DNA amplification method was used to produce DNA products that comprise a portion of the soybean genome from the 5' and 3' regions of the transgene insertion site. These DNA products were sequenced. The same primer pairs were used to amplify DNA from seven soybean lines (507354, Minsoy, Noir, HS1, PIC, 88788, A3244) that are parents of four mapping populations. A single nucleotide polymorphism (SNP) was identified at position 119 (SNP119, SEQ ID NO:59) from the 3' flanking sequences when comparing sequences across different lines. Table 7 shows the allelic composition at this position on eight lines tested.

TABLE 7
Polymorphism at flanking sequences in different soybean lines comprising MON89788.

Columns: Line; 5' Flanking (Position 2809); 3' Flanking (Position 119)
507354 A T
Minsoy A T
Noir A T
HS1 A T
PIC T C
88788 T
A3244 T
507355 A T
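The comparison of flanking sequences across lines described above can be pictured as a column-by-column scan of aligned sequences. The Python sketch below is a simplified illustration that assumes the sequences are already aligned to equal length; the function name and the short toy sequences are hypothetical and are not the actual flanking sequences.

```python
def polymorphic_positions(aligned: dict[str, str]) -> dict[int, dict[str, str]]:
    """Return {1-based position: {line name: base}} for columns where lines differ."""
    lengths = {len(sequence) for sequence in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to one common length")
    length = lengths.pop()
    result: dict[int, dict[str, str]] = {}
    for index in range(length):
        column = {name: sequence[index] for name, sequence in aligned.items()}
        if len(set(column.values())) > 1:  # more than one allele in this column
            result[index + 1] = column
    return result


if __name__ == "__main__":
    # Toy sequences for illustration only; line names are borrowed from the text.
    toy = {
        "HS1": "ACGTTAGC",
        "PIC": "ACGTCAGC",
        "Minsoy": "ACGTTAGC",
    }
    for position, alleles in polymorphic_positions(toy).items():
        print(position, alleles)
```

In the toy data the lines differ only at position 5, where one line carries C and the others carry T. A polymorphism of this kind between the two parents of a mapping population is what makes a marker such as SNP119 usable for mapping.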

[0116] A Taqman® (PE Applied Biosystems, Foster City, Calif.) end point assay was developed from SNP119 in accordance with instructions provided by the manufacturer. Primer and probe sequences are given in Table 8. To map the SNP119 polymorphism, an F2 population consisting of 140 individuals, derived from a cross of HS1 × PI407305 (PIC), was used. The map position of SNP119 was determined by placing the allelic scores against the existing allelic data set using MapMaker (Lincoln and Lander, 1990). SNP119 was found on linkage group D1a+Q (Song, Q. J., et al., 2004). Thus, MON89788 was indirectly mapped to this same position.

TABLE 8
Primer and probe molecules for Taqman assay for mapping haplotype

Molecule         Name             Sequence                    SEQ ID
Forward Primer   19788_3E-119F    CGTTCTCGACTTCAACCATATGTGA   SEQ ID NO:60
Reverse Primer   19788_3E-119R    GCATGGAATAAAGCGGAAAGGAAAG   SEQ ID NO:61
VIC Probe        19788_3E-119V2   CCATGGTATCATAGGCA           SEQ ID NO:62
Fam Probe        19788_3E-119M2   CCATGGTATCGTAGGCA           SEQ ID NO:63
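The allele discrimination in an assay of this kind rests on the two probes differing at the polymorphic base. As an illustrative check, the Python sketch below compares the two probe sequences of Table 8 position by position; the function name is hypothetical, and the sequences are transcribed from Table 8.

```python
VIC_PROBE = "CCATGGTATCATAGGCA"  # SEQ ID NO:62 (Table 8)
FAM_PROBE = "CCATGGTATCGTAGGCA"  # SEQ ID NO:63 (Table 8)


def probe_differences(probe_a: str, probe_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in a, base in b) wherever the probes differ."""
    if len(probe_a) != len(probe_b):
        raise ValueError("probes must have equal length for a position-wise comparison")
    return [
        (index + 1, base_a, base_b)
        for index, (base_a, base_b) in enumerate(zip(probe_a, probe_b))
        if base_a != base_b
    ]


if __name__ == "__main__":
    print(probe_differences(VIC_PROBE, FAM_PROBE))
```

The two 17-base probes differ at a single position (position 11, A in the VIC probe versus G in the FAM probe). The A/G pair is the complement of the T/C pair reported for position 119 in Table 7, which would be consistent with probes designed against the opposite strand; that reading is an inference and is not stated in the source.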

[0117] A deposit of Monsanto Technology LLC soybean seed comprising event MON89788, disclosed above and recited in the claims, has been made under the Budapest Treaty with the American Type Culture Collection (ATCC), 10801 University Boulevard, Manassas, Va. 20110. The ATCC accession number is PTA-6708, deposited on May 11, 2005. The deposit will be maintained in the depository for a period of 30 years, or 5 years after the last request, or for the effective life of the patent, whichever is longer, and will be replaced as necessary during that period. DNA molecules of the present invention can be isolated from the genome of the deposited material and the sequence corrected if necessary; additional DNA molecules for use as probes or primers for the haplotype regions disclosed herein can also be isolated from the deposited material.

[0118] All publications, patents and patent applications are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.

[0119] All of the compositions and methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the compositions and methods of this invention have been described in terms of preferred embodiments, it will be apparent to those of skill in the art that variations may be applied to the methods and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims.

REFERENCES

[0120] The following references, to the extent that they provide exemplary procedural or other details supplementary to those set forth herein, are specifically incorporated herein by reference.
[0121] U.S. Pat. No. 4,757,011
[0122] U.S. Pat. No. 4,769,061
[0123] U.S. Pat. No. 4,810,648
[0124] U.S. Pat. No. 4,940,835
[0125] U.S. Pat. No. 4,959,317
[0126] U.S. Pat. No. 4,971,908
[0127] U.S. Pat. No. 5,015,580
[0128] U.S. Pat. No. 5,094,945
[0129] U.S. Pat. No. 5,229,114
[0130] U.S. Pat. No. 5,304,730
[0131] U.S. Pat. No. 5,384,253
[0132] U.S. Pat. No. 5,437,697
[0133] U.S. Pat. No. 5,463,175
[0134] U.S. Pat. No. 5,482,852
[0135] U.S. Pat. No. 5,508,184
[0136] U.S. Pat. No. 5,512,466
[0137] U.S. Pat. No. 5,516,671
[0138] U.S. Pat. No. 5,538,880
[0139] U.S. Pat. No. 5,543,576
[0140] U.S. Pat. No. 5,545,545
[0141] U.S. Pat. No. 5,550,318
[0142] U.S. Pat. No. 5,591,616
[0143] U.S. Pat. No. 5,608,149
[0144] U.S. Pat. No. 5,627,061
[0145] U.S. Pat. No. 5,633,435
[0146] U.S. Pat. No. 5,637,785
[0147] U.S. Pat. No. 5,654,182
[0148] U.S. Pat. No. 5,658,772
[0149] U.S. Pat. No. 5,659,114
[0150] U.S. Pat. No. 5,689,041
[0151] U.S. Pat. No. 5,716,837
[0152] U.S. Pat. No. 5,750,876
[0153] U.S. Pat. No. 5,773,696
[0154] U.S. Pat. No. 5,792,924
[0155] U.S. Pat. No. 5,801,030
[0156] U.S. Pat. No. 5,824,877
[0157] U.S. Pat. No. 5,942,664
[0158] U.S. Pat. No. 5,958,745
[0159] U.S. Pat. No. 5,981,840
[0160] U.S. Pat. No. 5,985,605
[0161] U.S. Pat. No. 5,998,700
[0162] U.S. Pat. No. 6,011,199
[0163] U.S. Pat. No. 6,013,864
[0164] U.S. Pat. No. 6,031,154
[0165] U.S. Pat. No. 6,040,497
[0166] U.S. Pat. No. 6,063,597
[0167] U.S. Pat. No. 6,063,756
[0168] U.S. Pat. No. 6,072,103
[0169] U.S. Pat. No. 6,080,560
[0170] U.S. Pat. No. 6,093,695
[0171] U.S. Pat. No. 6,110,464
[0172] U.S. Pat. No. 6,121,436
[0173] U.S. Pat. No. 6,140,085
[0174] U.S. Pat. No. 6,160,208
[0175] U.S. Pat. No. 6,166,292
[0176] U.S. Pat. No. 6,171,640
[0177] U.S. Pat. No. 6,184,440
[0178] U.S. Pat. No. 6,228,992
[0179] U.S. Pat. No. 6,316,407
[0180] U.S. Pat. No. 6,380,466
[0181] U.S. Pat. No. 6,384,301
[0182] U.S. Pat. No. 6,399,861
[0183] U.S. Pat. No. 6,403,865
[0184] U.S. Pat. No. 6,444,876
[0185] U.S. Pat. No. 6,476,295
[0186] U.S. Pat. No. 6,476,295
[0187] U.S. Pat. No. 6,476,295
[0188] U.S. Pat. No. 6,486,383
[0189] U.S. Pat. No. 6,506,962
[0190] U.S. Pat. No. 6,531,648
[0191] U.S. Pat. No. 6,537,750
[0192] U.S. Pat. No. 6,660,911
[0193] U.S. Pat. No. 6,768,044
[0194] U.S. Pat. No. 6,774,284
[0195] U.S. Pat. No. 7,030,215
[0196] U.S. Publn. 20020132350
[0197] U.S. Publn. 20030083480
[0198] U.S. Publn. 20040177399
[0199] U.S. Publn. 20050064474
[0200] U.S. Publn. 20050208489
[0201] U.S. Publn. 20050246798
[0202] U.S. Publn. 20060021093
[0203] U.S. Publn. 20060021094
[0204] U.S. Publn. 20030028917
[0205] Abremski et al., Cell, 32:1301-1311, 1983.
[0206] Allard, "Principles of Plant Breeding," John Wiley Sons, NY, U. of CA, Davis, Calif., 50-98, 1960.
[0207] Beachy et al., Ann. Rev. Phytopathol., 28:451, 1990.
[0208] Berno, Genome Research, 6:80-91, 1996.
[0209] Charest et al., Plant Cell Rep., 8:643, 1990.
[0210] Cheung et al., Theor. Appl. Genet., 94:569-582, 1997.
[0211] Comai et al., Nature, 317:741-744, 1985.
[0212] DeBlock et al., EMBO J., 6:2513-2519, 1987.
[0213] Dellaporta et al., Stadler Symposium, 11:263-282, 1988.
[0214] Dempster et al., J. R. Stat. Soc., 39B:1-38, 1977.
[0215] Eichholtz et al., Somatic Cell Mol. Genet., 13:67, 1987.
[0216] Elliot et al., Plant Molec. Biol., 21:515, 1993.
[0217] European Appln. 0 242 246
[0218] European Appln. 0640141
[0219] European Appln. 0797673
[0220] European Appln. 1308516
[0221] European Patent Appln. 0154204
[0222] Ewing et al., Genome Research, 8:175-185, 1998.
[0223] Excoffier and Slatkin, Mol. Biol. Evol., 12(5):921-927, 1995.
[0224] Fehr, In: Principles of variety development, Theory and Technique, (Vol 1) and In: Crop Species Soybean (Vol 2), Iowa State Univ., Macmillan Pub. Co., NY, 360-376, 1987b.
[0225] Fehr, In: Soybeans: Improvement, Production and Uses, 2nd Ed., Monograph., 16:249, 1987a.
[0226] Ferreira et al., J. Hered., 91:392-396, 2000.
[0227] Fisher et al., Plant Physiol., 102:1045, 1993.
[0228] Fromm et al., Proc. Natl. Acad. Sci. USA, 82(17):5824-5828, 1985.
[0229] Geiser et al., Gene, 48:109, 1986.
[0230] Giddings et al., Nucleic Acids Res., 21:4530-4540, 1993.
[0231] Glick et al., In: Methods in Plant Molecular Biology and Biotechnology, CRC Press, Boca Raton, Fla., 1993.
[0232] Gordon-Kamm et al., Plant Cell, 2:603-618, 1990.
[0233] Hardy et al., J. Virology, 71:1842, 1997.
[0234] Hinchee et al., Bio/Technology, 6:915-922, 1988.
[0235] Ikatu et al., Bio/Technol., 8:241-242, 1990.
[0236] Jefferson et al., EMBO J., 6:3901-3907, 1987.
[0237] Jefferson, Plant Mol. Biol. Rep., 5:387-405, 1987.
[0238] Jones et al., Science, 266:789, 1994.
[0239] Katz et al., J. Gen. Microbiol., 129:2703-2714, 1983.
[0240] Knutzon et al., Proc. Natl. Acad. Sci. USA, 89:2624, 1992.
[0241] Lacape et al., Genome, 46:612-626, 2003.
[0242] Lawrence and Solovyev, Nucleic Acids Res., 22:1272-1280, 1994.
[0243] Lee et al., EMBO J., 7:1241, 1988.
[0244] Lee et al., Plant Mol. Biol., 48: 53-461, 2002.
[0245] Lewin, In: Genes V, Oxford University Press, NY, 1994.
[0246] Lincoln and Lander, Mapping Genes Controlling Quantitative Traits Using MAPMAKER/QTL, Whitehead Institute for Biomedical Research, Massachusetts, 1990.
[0247] Luo et al., Plant J., 23:423-430, 2000.
[0248] Lyznik et al., Nucleic Acids Res., 24:3784-3789, 1996.
[0249] Marshall et al., Theor. Appl. Genet., 83:435, 1992.
[0250] Martin et al., Science, 262:1432, 1993.
[0251] McCallum et al., Plant Physiol., 123:439-442, 2000.
[0252] Miki et al., In: Methods in Plant Molecular Biology and Biotechnology, Glick and Thompson (Eds.), CRC Press, Inc., Boca Raton, 67-88, 1993.
[0253] Miki et al., Theor. Appl. Genet., 80:449, 1990.
[0254] Mindrinos et al., Cell, 78:1089, 1994.
[0255] Misawa et al., Plant J., 4:833-840, 1993.
[0256] Misawa et al., Plant J., 6:481-489, 1994.
[0257] Ow et al., Science, 234:856-859, 1986.
[0258] Padgette et al., Crop Sci., 35:1451-1461, 1995.
[0259] PCT Appln. WO93/19181
[0260] PCT Appln. WO96/30517
[0261] Pen et al., Bio/Technology, 10:292, 1992.
[0262] Poehlman and Sleper, In: Breeding Field Crops, Iowa State University Press, Ames, 1995.
[0263] Potrykus et al., Ann. Rev. Plant Physiol. Plant Mol. Biol., 42:205, 1991.
[0264] Przibila et al., Plant Cell, 3:169, 1991.
[0265] Rieger et al., In: Glossary of Genetics. Classical and Molecular, 5th Ed., Springer-Verlag, NY, 1991.
[0266] Rodriguez et al., In: Vectors. A Survey of Molecular Cloning Vectors and Their Uses, Butterworths, Boston, 1988.
[0267] Rogers et al., Methods In Enzymology, 153:253-277, 1987.
[0268] Sambrook et al.
[0269] Sathasivan et al., Nucl. Acids Res., 18:2188-2193, 1990.
[0270] Shah et al., Science, 233:478, 1986.
[0271] Shiroza et al., J. Bacteriol., 170:810, 1988.
[0272] Simmonds, In: Principles of crop improvement, Longman, Inc., NY, 369-399, 1979.
[0273] Sneep and Hendriksen, In: Plant breeding perspectives, Wageningen (Ed.), Center for Agricultural Publishing and Documentation, 1979.
[0274] Sogaard et al., J. Biol. Chem., 268:22480, 1993.
[0275] Song, Q. J., et al., Theor. Appl. Genet., 109:122-128, 2004.
[0276] Sprague and Dudley, In: Corn and Corn Improvement, 3rd Ed., Crop Science of America, Inc.; Soil Science of America, Inc., Wisconsin, 881-883; 901-918, 1988.
[0277] Stalker et al., J. Biol. Chem., 263:6310-6314, 1988.
[0278] Stalker et al., Science, 242:419-423, 1988.
[0279] Steinmetz et al., Mol. Gen. Genet., 20:220, 1985.
[0280] Sternberg et al., Cold Spring Harbor Symp. Quant. Biol., 45:297-309, 1981.
[0281] Sutcliffe et al., Proc. Natl. Acad. Sci. USA, 75:3737-3741, 1978.
[0282] Thillet et al., J. Biol. Chem., 263:12500-12508, 1988.
[0283] Wright et al., Plant Journal, 44:693-705, 2005.
[0284] Zukowsky et al., Proc. Natl. Acad. Sci. USA, 80:1101-1105, 1983.

SEQUENCE LISTING

63 1 664 DNA Glycine max 1 attcagaagg ctgttaaaac cccctcaggt caccaccact aaacaagaca acaagacaat 60 aacagataaa agcctagttt gtcttgcatg cacttacatg cacctttcat ttttttttct 120 tgcatgcatc atggtcccct actaatacca tttatcttca actattcccc cctctcccaa 180 aatcattcct tgcccttcaa cttttcataa ttgtcttaat taaatgtttg gattaaagtc 240 tataaaagta tcacaaggct tactttttca aaactggata tctaggtaaa attttactct 300 caaacatagt tttgggagta accaaacatt accctaaact gattttaatt tcaaacatat 360 acttttaaac cctcccactg gaaatccgaa cacatcctaa gatgacttat tatatttgca 420 tctatctaat aataataata aaagtaaatg atttttatat tgatacttaa ttatgagttg 480 ttttatgata tgtttattga cttttatagt gattagtatc tactttaaaa atcatatcta 540 ttttggacta gctgagagtg tttatattga caatacatag aaattaaatt ttaagaataa 600 gaaaatgata atcatatttt aggatattgg ttaagaataa aataaaaagt tttattgaaa 660 aaat 664 2 1156 DNA Glycine max 2 aaagccaatc agatgctact gagtaaaatc agaaaaaatg tgaactaaga gcaatgacaa 60 gagtcaagaa tgctatacct gctgctcagt catttgaaga aaagcaatag tatagtacac 120 caattctctg atattagcca caaccaccta tagaaatgca aaccatagat caaagtaatg 180 atacaattga acacaatcag cagaaaaata agatttcaag aatgttctga accaaacctt 240 tcccagtctt ggattaccca caatggtcaa catgagttca aacagctgta aataacaact 300 tagataaaaa tgataattca aaaattatta caagaaatga aggaaacaca aacaaaaaac 360 catgcaaacc tgatttacgc aagttttaga cagatctcag ccagtttatt atttaaaatc 420 ctataattat aattttttat catttccata caataatcca gcaaaactat acgacactaa 480 taaaaggtat agatcaaaca aagaggttcc tttcaatagt aagctgatga agccatcatt 540 gcagtgacta cagtatcccc tgtgaattcc agtttgacta atttgaaatt taggactgtc 600 ttccttgcca aataacaagt taaaggatat cattctagca gctgattcag tttgacactt 660 catagtgacc tttctctctc tggaatactg acaactatgt acagtttaaa ttaaacaatc 720 aagttaacag cagttttctt tggggcagta tgcagatgat acggtttcct ttgggacagc 780 aacattggca aatgtaagag caatcaaggt gatgttgagg agctttgagt tggtatcggg 840 attaaagata aactttgcca acatctgttt ttgggcaatt gggatgactg agcaatggat 900 gaacagtgct gtcagataag ttgcagaatg atgtcagtac tcttttctta tttgggtatc 960 tctattgggg caaactcaag acgtagtgag atttgggatc 
taatagtcaa gaaatgtgag 1020 agaaaattgt caaaatgaaa acaaaaatat ctttctttcg gggaagggtg actttgatta 1080 agtcagtcct aaactcgatc cctaaaattt ttttcttttt tcaaggctcc aaagaatgtg 1140 gtggataggt aggtga 1156 3 657 DNA Glycine max 3 atagttcact aggtttgtac tcatgccata atatgccaat ctttcacagc attcatttcc 60 tgacataatg tttttagcac cttagtgcga ttttaatact gaaatatgca taaaagatgg 120 tatcgaaatg aaagaagaaa aaaaatccaa taccaagtat gaagcggcat gctttccaat 180 ttccagtttt tttcttgatg gcaggctttt tgtaaatatc aactgtccca tcttctgtgt 240 ataaactatc ttctcccaca tcatgtgatt ttttgacatc tcccatggtt tcacagcaga 300 tttcctattt gctgcttctt gttctctttg atatctacta caaatgtctg tttgcagggt 360 gctggaaatg agtaaacaaa ataatcggca aaaggtacac gaaaattaaa cagattgcta 420 tcatgaattt catattataa atacttgatt tgagggtgtt tatgaagtag gaatagcaaa 480 gagaggttca gcaaagcaat gaatatgtta gctcatgaag ctccaatcac aatccttcgc 540 aaacacattt gatggcaaca ttgtgatttg ggattattag tggtacaaag tgatagttat 600 aatcacaaag aattaatgta gaactgcagg tgacttagga ggtccgggtt cgacccg 657 4 659 DNA Glycine max 4 atttgggcgg cctacacttt tttgaaaatt aaaatatact tttgtgtagc taatttccgt 60 tttgcatgtg tgtgtgtgtg taatggagag agagatagag agaaagagtt agtttggttc 120 ctagtggcac tgaaattact accaaatctc caaaagtagc tatggagtta tttaggatgt 180 ctaatgagtt gagtttatgg tcttatttat atgtggaaat aatgatttat taatccagag 240 cagatgagtt aaaagtttct ctagagaatg gtagtttcta aatgaataaa taggatagaa 300 tctctagcac tcaaagaatg aaagaatgtt tgcattttat tatcacctga gccaatttca 360 gatatctcga ttatttcctc ttaatatccc atggcaacat tcattgcgtt aagccaacat 420 tttaaatgaa agtatctgtg atctccaagt ctttgatatt catttgtcta ttccaaattt 480 tggttccaac tggcttcgaa agctttgatc ctcctccctg ctttcagcat gatttcctca 540 ctcttcttga actttccata ctgaggaagt ctgagacaca atggtaaaac tatcatggtt 600 atggaatcca tgaaacaaac atcattattt tctattaagc tctgaattgt agaaataca 659 5 372 DNA Glycine max 5 cattcatcag aagtgctcaa agcatttaat ttcagtgcca aagcactgca acttttcagc 60 tcagacaaaa caaacttgct agcttccttg gtaactgtgg aggatttgat agacttgaaa 120 accttctcaa cagattcttg agcacatctg cgtacctaaa aagcaaacaa 
accacaacac 180 aatttaaaca aacaaaaaat ccataagctc agagtaatga aatgattaat aggcatggac 240 caatgtctcg aaatcaaagg tcatatcata accaaatgct gattgcattg taaggtaaaa 300 ccatcaccaa atctatacat aattggatat aaacaaggtt taaaactaaa gttgcagaca 360 cattagcagc ac 372 6 1448 DNA Glycine max 6 aacagaactc tatggccgca agacaagata gaccaaagag gaaggacccg gcatattaaa 60 agcagtgaca aagtaggaaa attgcactca tttacgatca acccaggttc ttgtagatct 120 agagtactag taaggttctc taatcactta tggctcttaa tgttgaatag ccagaagtga 180 taaaatcaaa tcaaataacc ccctagggtc ggcctagtga cggggttttt ggtagcatgc 240 acaaagtctt agattgtaat cttgttgagt cattgtacac caaataaata aataaataaa 300 attaaaatta atgtaaaata tgataaatgc aagtggaatt tatttccaac taatttatgc 360 tcgttctcaa cataaaaaat caagagattt gttgtgcata actctttctt aagccatata 420 tcatgactct tacctgctta gctgtcgcaa aattcagtag cgcttcacat tcatcaagca 480 tggctttttc cctagctgat acaacagcaa cttcagacat taaactattc atgccctcca 540 cctattccag acacacacac aaaaaaaaaa aatggtgtca gccttaaggc tttagggatt 600 taccattgaa tcaaaaagga aaatcattag gaagaaaaaa catacagtta gtagaagaaa 660 aaaagtttga tactaaatgt gtaggcctag aaaatagcaa atgctagtgt gatattgtga 720 gtcaaaccag tagaactcct aaaaaagtaa cacaccccgt gacagcaaag ggcagatagc 780 agatcccatt gcttgcatgg catcaacagc tgaatagata gcatgcttca aatgttcaat 840 atctgcctga aaaagtgtaa acacagtaat gatgttagtg tctctttgca cgcatcgaac 900 caaaagaaca gaaaccacca tacatacctt tgcccctcca gttagaggaa gacgaagagt 960 acttgcctcc aagtcttcta cagccccgga taaggcatca atatgatcac tttcaagtac 1020 agcccagtca tcaaggtagg ccatctgtgt cattaattgt aaaaggacta caatttaaaa 1080 aagtatcaaa aaaaagttga atcatttaaa taagaaaatg gtttcatata tgacttgtaa 1140 tcatatccac cattaataat atgagttatg aataccatgt tatgacagac tagcataaac 1200 aattaaacat aacttttcaa tgtgcagggc caacatcttg ctgagtatat tttcctcatt 1260 tataaacttc acaataaata tctctagtta aattaccaaa aatgaaaatc gggaaaaaaa 1320 aaaaagaaag aaagaaaaag taattgtaat gtatcatcaa caataatatc gcacatagaa 1380 tgataaatat ttcaggcaag agagaagtat tacttgatca ttcaaaatag aattcagctt 1440 cagctcaa 1448 7 922 DNA Glycine max 7 
agtttgctag gaagtgggtg ccattctgca agaaattttc tatagaacct agagcaccag 60 agatgtactt cagtgagaaa attgattacc tcaaggacaa ggtgcagccc acctttgtta 120 aggatcgtcg agctatgaag gtctgtatca tatctatcag actagtactt gaacacatgg 180 gagtttaaaa gttagtttaa agcttattca gtgttaattg ggtgtttgac agagagaata 240 tgaagagttt aaggttagga tcaatgcact tgtggcaaaa gctcagaagg ttcctcaagg 300 aggatggatt atgcaggatg ggacaccatg gcctggaaat aatactaagg atcatcctgg 360 tatgattcaa gtctttcttg gtcacagtgg aggtcatgat actgaaggaa acgagcttcc 420 tcgtcttgtt tatgtttccc gagagaaaag gcctggattc caacaccaca agaaagctgg 480 tgctatgaat gctttggtag atttttttga gcagtttttg ttgttcctat gatgtccatt 540 cacctttata tgagacacaa ttccttgaca cttccaatta ttgctgtgat ttgcagattc 600 gggtttccgc tgtgctcaca aatgctcctt tcatgctgaa cttggattgt gatcattatg 660 tcaataacag caaggctgct cgagaggcca tgtgcttttt aatggacccc caaactggga 720 agaaggtctg ctatgtccaa tttcctcaaa gatttgatgg cattgatagg catgatcgtt 780 atgctaatag aaacacggtt ttctttgatg taagtcactg caagaaacac agcatcagca 840 tagcatggcc ttttctttga agcatttgac tatttttttt tggtagtgta agctaatact 900 aactatttct tcttctttgt ct 922 8 730 DNA Glycine max 8 tccctattat tcactgaagt aatgaataag tcgttgaaga aagttgggca tgtcattatg 60 tcaaaatgct tctgacttct gagggtcaaa agtttacacc tcttttctat tttcgtaaaa 120 ttcctgagga acatttttct tctgacatgt aaagtgaaat tttatagctc attgctgtac 180 tgccgtttaa tatctgacaa tcattgaagt taattaaact atctcataaa agttgttggt 240 gatgaatgtc tggaggtgta agcgcaaaat ttgcgaccag ttaatgaatg tcttatcaac 300 gaaaatacgt gtactactaa tcaaccaaca tatgtggctt aaacaatcct agtttgccag 360 tagtataaat gctggggtta cattatcagt agatgttttt attagagaac caggtcatga 420 tcttcagttg aatattgcca caagtatgac atgtgttatg cttgtttttt ccatcagaat 480 agagtagtgg aaaaaaatgc taatctgtga caaatttagg ttgtgtaagt tgaagtagtt 540 gcatggaatg cgcttcatca tgatccttgt gtcagtttct aattttcaat gttattttgg 600 cgtaaacagg ttgaggaata tagccttcgg acgcatctca tgcagatcaa gcaaataagt 660 aaccaatctc ggaaattcaa acacgcatga ccctgacctg cttctacgct aacaagaagt 720 cttttgcagc 730 9 717 DNA Glycine max 9 tcttatttga 
tttctccaat gatatgcata tagtggtgtt taaccatgtt tcttgacatt 60 tatttgtgtt ctatttgatt atgtgactgc tccagaatag agtatttatg caatatcttg 120 gtaatggaaa ctaacaaagt ggaattaatt aactcattgg agccattcat tatgattgtt 180 tctttcaaat tgtctattga gttcaatcat ttgttgcttc tattgatttt atttaatatt 240 ttagtaggca tgatcggtcc aggggctgta gaaatatagt gaaggatttg aaagtctttg 300 acacattcaa tacttggttg cattatgaat atgttgaaga caaattatag aagataaatc 360 taaggagcaa ttttatatat caacaagcag caagggaata tgttcgaaca gatggtgggc 420 acataggttt ggagctgtca tgagcttaat tcatttgagg atgacccata ttttaaccgt 480 caaaagcaaa acatagaata aaaggaatat tgattctgtt tgcattttgt ttggggtact 540 ggctagacta gatacgtttt cctggtccaa tggaaacctt tggatcgttg gttgatttga 600 agttagtaat atgctgaagg aggaagctac aagagaagtt tgcatttcac gatcattttt 660 ccttatgtat aggttgtttt ctattgttta ctcacatttt cagctgcagg catgcaa 717 10 807 DNA Glycine max 10 tgcataatca accaactgat atgacatttt ctgtggaatg gacagaaccg ttattatcat 60 gatgttaatc agtagatcat ttgccatctg gcttccagat gttaatctct tagtagaatt 120 tttcattgcc tagtattgag aataaaacag atttgagact taaggttctg tatgcaatac 180 aatgaattgt ttattagcat tgtctacttc ttgatactga tggttgtcat tacagtaata 240 tgtcgaaatc tataataact aataatcact taagcaacaa gttaaattct gtttttggta 300 ttttgtcatg ggtgtcttta atgcaagttt atatctttga tgcttttttg gttttatttt 360 tacaaaatag tagatgaagt tcattaagat gtttttcctt attgattgtg aaatggaatg 420 catgataata tttggtgttc tgtctacctc tcctgaatta gaccagtcag tttaattctg 480 ttgctctctc tgttttattt tactctcaat ctttgtgagt ttttcggttc acttgagttg 540 tactgtctct agaaggtcct attactttat tggtcaagaa aaatatagaa ggatattaat 600 ccaaccttgt gatttgtgtg cattacactc atacacattt tatgttcata tagtcatctc 660 aatctaatta gttgttctat gcaaactttg ttgtggaatt gaactgcttc ctgctgtgca 720 tttaacttgc cttgcttact gttctccttc tgtgtgtctc aggttagaaa ttctttgaat 780 gctcagcctt ggatccataa taatctg 807 11 839 DNA Glycine max 11 caaacttgca tgcctgcaga ctatatggca ccgcccctat agtgcccctt ccaacgatta 60 cctagaattt aatttgatct tctgaaggta gatgaataaa taaataaacg tgtgaaaata 120 aaacagtaag tacatgccag tacgtaataa 
tgtgaactag tttgtataca tgaatttagg 180 tccaatgctg caaaagacct agttagactt ggaacataaa aggatatatt taaatgactc 240 tcaagattaa ctaaataata cacagacaaa tcagataatt aaactgcacg gccactaagg 300 gatcagcata tgtgaaagtc tcagagagca gacatgtcgc tagttatata taaatcaagc 360 tgattttatt atctatatgg gaatcaaata caagcttaat tctcttttgc tagcttcaat 420 ttggatacac ataattccaa cctccaccaa ttgataacaa atactagtaa tgtacacatg 480 ctattgtgcc cccgggtagc ttaactcttg aaaaacacat tctcgtggca tctcttgacg 540 cacaccctcg taattcgaag caacaagagg aggataaatc agagaactgg tttcacccca 600 atcagtactt tgtccacaac ttgaagaagc acaagcacca cccattatct tatcggtaat 660 catttcctcc ccatagaaga actccaacga cccacctaat aacctttgag aataatcatt 720 agcagaaacc atttccccaa ctgtagtggt agtgaaactg ctttgtcctt gcaccaactc 780 agcacggtac tcattgctct gaccaaactg tacttggttg ttgccaatag agttcatgc 839 12 751 DNA Glycine max 12 aaacactggc ttcggattta tcacttgtaa gtagagtttg ctaactaaaa tgctttgtca 60 ctctttattt tcaggttttg ttctgatatc caaaatcctg gctatgttga ttgccattca 120 aattgtaata agtctacatc tcaagcgtct ttgtttttgt gttccgactc caacagtaga 180 agaaatggtg tttttggtag accactttgt gtgaacccct ctggcaggag aaacctagtt 240 ggtccagctt tttattctct ggagactagt gcttatgacg tggctgcttt agaatctcct 300 tcccgtgttg cagaagaaaa agttggtgtg ctgcttctca atctaggagg accagagaca 360 ttgagtgacg tgcaaccttt tctgtttaat ctttttgcag atcctgtatg ttagtttgta 420 tttgtgcttt ttctactgtt gatttttctt tttcctgttt atgtaaattc cattagcatt 480 agtacatgtt catatgattt gtatgctaat gtgtttcttg tattgacata ggatatcatt 540 cgtcttccaa ggttgtttcg gtttctccag cgaccattgg caaaattgat ttctgtactt 600 cgggctccta aatccaagga agggtatgct gctattggtg gtggctctcc tttacgaaaa 660 attacagatg accaggtgga gtttaaattt tttggttttc ccattatctg ctttgtggag 720 cttttatctt tctgcaacat gaatcttttt t 751 13 663 DNA Glycine max 13 aatatgaaga agtacctgct ttaccattca acttagtgac tgtaagtatt caatactaga 60 gccaagttca cttttctatt tagctacata ctaggggggg ttctcttggt aaaaagaaac 120 tatctatact tatatgttat ggaattacat gactttcatg atacaaatca catgaatatc 180 aatagttgca agtagttctt taatgattta tatttcttag gaacatgact 
tgtgcataac 240 ttctttgagg tcaatccacg gcttagagta attctgggaa cccgtttgca tcattgtaaa 300 caggcatttc acactttcga atgcatcaaa tgaagcaaca tttttttata attggcattc 360 aatgtccatt tggatggttt gaactgataa ccatttggat ggtttttaca attggcatct 420 gtgtcttcag gaaggggact tgagcaggac agcttccttt gcagatgatg gagaggtttt 480 agatggaata attactcgaa gccggggtga ggttagacgt gtttgcagtc caaaggtgat 540 gaaatccact ccaaacctat cccaagagtt aacaagtcca aggctcacag ataaagtata 600 cagccctcgg ataagccatc tcagaggaaa tcaaagccct cgaggtgttg ggagaggatc 660 att 663 14 713 DNA Glycine max 14 cttaagtctg aaaacaattt gtctacttgt acataatctt tatcatggac aaagtatcaa 60 gaacagaaaa tattttatat attgtctact cttgcctcat tcttacacat ctttatttta 120 tttattttgt ttcaagttat ttgttattga aaaagataaa agtgttaact gtttttttaa 180 taatattcta atttaaataa attcaaacct atatttgagc tcttttttta atgaggataa 240 tttaatattt tttatgttat aacttgtgta attaatattt ttttgagaga actcagcaaa 300 aaataaataa atttttgaga gaaaaataat atttttttta aagaagtgtg tattatttta 360 aaaaataaat aatatgagat ggaggcaaca tgtgatttta acaatgactt gtaacatcta 420 taagctcaaa atttttgaaa aatgaactgg cgtaggataa aattaaacta cctggataaa 480 gcaaaggttc ttcccaattg gttatttaaa gcaatcttct ttgtataatg gataccataa 540 cttcaatctc ttaactacca tgatttgatt gaagcgatcg atctcacaaa gatgttcctt 600 tcaatattct taaactcaag tacaattttc cctcaaggac ccactatgtc tatattccat 660 tggattacat agtaaaagca aaccaataat ttctctacct ttagctgcat ttt 713 15 534 DNA Glycine max 15 cttgcatgcc tgcaggagac tttgagaaag cacacttcag ttgtttaccc gatataggag 60 agatacaagt taaagggtta atggtaagta ctttactttc tgtttagtat ctatgcatcc 120 ttttatgaat ttctgcacca atgagttttt gctcaagtta ctgcacattc tcctaggtga 180 agcgaaaatc atgcttctac tacctgcctg attcgttccc tatacagtat gctttccctg 240 caatcagagg aacatggttt cttcatgtgg aagtgaagca tttaaagcgt ttgcgcattc 300 cgtgtccacc tggtgatgca gctgttcttt caaaacataa ggacttaaag acctgcaatg 360 gtgaggataa ggcaaaatgc aacagtgagg aaaataaaat ggaagggttc caaccccgtt 420 catgttttgc agaagagcat gaaactacta atcatgtttc aaagaagctg aacaaaaaga 480 gaatttctaa tgaaaaccac acgcagaatg aagccactgg 
aatgccagaa agat 534 16 433 DNA Glycine max 16 ttgccaatgc agctgctggc ttgagtgcag ccatggcagc tcagcttgtg tggacccctg 60 ttgatgtcgt gagccagagg ctgatggttc aaggtgtttg tgattcggga aatcctaagg 120 cttcagctct tcggtacatc aacgggattg atgccttcag gaagatcttg agcagtgatg 180 gtcttagggg cttgtatagg ggttttggga tatcaatttt gacctatgcc ccttcaaatg 240 cagtttggtg ggcttcatat tctgttgcac aaaggatggt ttggggtgga gttgggtact 300 acttgtgcaa gggaaatgat agtgcactga agcctgatac aaagactgtg atggcagttc 360 agggagtcag tgcagcagtg gctggtggca tgtctgcttt gatcaccatg ccactggata 420 ccatcaagac aag 433 17 554 DNA Glycine max 17 aaaaaaaagg acaatcatta aacacgtatc taaaatgcat ttcatcaaaa tgaaaaatta 60 tgcaatactg aaaatccatg cgtgttataa aggcaaacaa aatgaacttg gagagcaatg 120 caacaaagta ctttttacag tcaatgtgca ctttaaaaaa tagtatattt catacttaca 180 taaaagagct gaatgagtgc aagacgtacg aaagaataaa atttcaaagt gccacctaag 240 tcacagagtt tatgagaaac aaactgtgag ctttggtcag gtaatatcca ccacaatgca 300 gggatgacaa ccgagtttag gacgaatata ctgcacaaaa atttaaaaga tgttgaaatc 360 attaaacacg tagattttag attcatgatt tgttcaggac aatcaatcca tggatgacaa 420 aaatatgtac aatcagattc cttcgagtca ttatgtcaaa agtatacata atccaatttc 480 tttgccacaa aatttcattc actgtgttga aataaattga agctagtttc acttctcctt 540 ctgcaggtcg actt 554 18 810 DNA Glycine max 18 tgagtttaat catgtatctt ctttttcaat gcttttggtt ggacattaaa gccatatttg 60 tttggatttt gtgccatata ctatcaatcc aattttatta agaactaaga cctactagtt 120 tttcaaacaa ggtcagcaat actcaaaaat aaattgccaa gttggaccct gtagttttgt 180 aaatgtatcc caacaattat aattaaaaag tagttgtact gtataatatc tagcaaattc 240 aaaattctaa agtcaatttt ttactgtcta atccaaatgg acgctaagat actagactat 300 tgatactcac agaacattat ctgtagttaa catgaaaaat gtagtttgtg gttttgatgc 360 ttcctttttt attttattta agtgacttag tttgtagatt ttactttgca gggagactat 420 catgttgaca gtgaattttg tggtacggac agtgtacagc tgaaaggatc tgagattact 480 gctgaactta agtatctctt aaacttgttg acattgtgtt ggcacttttc gaagaagccc 540 tttcccttgt ttttagaaga aactggctac agtgaagaaa acgttctcct tcgagaagcc 600 aaagcaggag taagtcttgt ttgaatttta ggaaaaaatg 
ataataattc aatatctgta 660 ctgtgttgac taagtcattg atagttatta acacacattc tcttttgagg aaggatggaa 720 ggctgaaagc acaagagatg ttgttttatt tagactgata aaatatggga taaaaaattg 780 atgatagatg ccttcttttt gcttcacttt 810 19 1222 DNA Glycine max 19 aggtgcagct gcccttttgt actcacattg aatcggaaat tgcagactca cgatgcaatc 60 gacgaagatg gtgaagaaaa tgggagtgat acgcccactg atacgccatt aggtattggc 120 cgtgtttctc atcggttaat ccaagcccct gcaacatggt tggagacaat ttcaacattg 180 tcagagactc tcaggttcac gtattcggag acacttggga aatggccaat tggggatttg 240 gcgtttggca tcagctttct tctaaagcgg caggtaatga caacgagtag attttggttc 300 tttattgttg ccctgatttg aagcaactga aaatgccgga aagctgtgtc gtttttttat 360 ctatctgtaa ctttggacac atttaagtag taagtagaat atagaatcag tatttagtgt 420 ggaagccagg tgcatatttt tcaggtagaa ctataattga tctgaaatgg tagttgcaac 480 ctgcacttaa tgtgcaactc acataattca cctaaaggat tgcccgtcac tgacattgat 540 gaatgaaaga gagagaaata tagagaaagt aaaatggaca atggtcatgg aagttatgcg 600
agttaggtga actttttcat gtgtaaataa aattgttgat gattaaatgg ttggaagacc 660 aatttaatgc tttccatctc aataaaaaaa attatggtcg ttaaagaaat aatcccacat 720 ttggagagca gttataattt atattaactc ttaagtgttc cttacatgtg ataggatctt 780 tttttcatgg gtgttgtttt tgttttgctt tttaagcctt ctcaacatca cagcagtggt 840 ttgggatcat gacttgcaat atttgaactt cttttgttac ttgttaatca tgatcacttt 900 gaagttgaca tattagatta ttcttggatt ttatgattta catgatatga tttctgttta 960 tactttctag gagagtaata tggctaaggt agcttagaaa tcagactatt ctctacaaaa 1020 tgaatcctca caagattgtt cacatgacct gtgctacttt atatatttga ttttgattta 1080 aatcatatat taagcatttt ttaaggtaga tcagtttcat gacatcctgg actacttaat 1140 ttcttcatct cagctcaaca taaatagatg agaagttgct ctgtaatatt ggttttgtgc 1200 cagtttaatt tgttcatatt aa 1222 20 682 DNA Glycine max 20 gcaacaaatg gatgtacaca gtccgagccc tgcataattg gagcaaattg catattccaa 60 cccttaatga gaatacagaa atcaaaactt gataaataaa atacttcaaa tttgcccaca 120 ttggttgcta atagctattg cagagtatac acaattattc cagtaataca aacttgcata 180 ttaccacaga actcttaatc ataccaacat taaaatagtc tctgtagcca ggctttgagg 240 gcacaagaaa gtgcaacaaa tagttgaaaa aacactgtat gttgtggttc actaatctct 300 tatgaaacag gttatgtaga aaggcttcag ttgtcaccta ctaacatcag ttcaccttac 360 acttgtaatc gtagctacat cttgcttccg gaaacaaagc tatgtatcta atcactaata 420 atgacatcaa agtatgaagt aagatatacc ttttcttcag aggtactctt gacgaaacac 480 aggaatgcct ctgagaactg agagccactg gaaccggttc gcaagtcagc aacgcaaggg 540 cattcaaggg ctttctgagt catttcctcc acagactgaa aaatccaaaa tttccaattg 600 ttttcattcg ttcaacctgc caagttgaaa gcaaaagatc aaagcaattc aattcaaacc 660 tcagtgttct gatttccata tt 682 21 681 DNA Glycine max 21 actgaaacat tcgaaattcc ttacataatt tattcttatt aaaaataaca gtaatctttt 60 gacttgaatt ggtacagaag tacaattatt ggttgaggct tttatttcac gcataccaca 120 atgaaacaca tttcaatttt tcttacccct ggttaattta atgtaccgaa tttatacatc 180 aaagagaaga taactttcga agtaaaaatg attatcctaa acaccgtatg ataaagtgta 240 taagattgtt caccattact taggtttttg gaaatgtcaa accttagcac tatggtaagt 300 ttgttgcctt gtaacttgga ggtcatgggt tcaaatcctg caaacagcct ctccttaggt 
360 aggaacctca tgcattggac tgccattttt gttgctttgg tgtgagcata gatactgctg 420 aattctttga gatgccactt ctcacttatc ccatttttaa cttgcaatct taacttatgc 480 ttgtaactac gttcctgaag gcctagaaat ggtgggataa ggatttatgt gttgtttctt 540 gatggatgtt ttgcagacct tcatgctggg tggatcacat gtcactggac aatgaaacag 600 gattggatcc accaggcata agagttaggc ctgtctctgg acttgtagct gctgattact 660 ttgctgcagg tcgactctag a 681 22 1002 DNA Glycine max 22 tgagagcttc cattcagaac tactatcaag tactgacagt tagcttcaat acttcattta 60 taataaacag aataatcgct taaatgaaat tggtttagtt tcattcacat taatttcagg 120 cacagtgctt tagatgtaat caattcaggg agctgagaaa gaaattccac aaaccctcag 180 attttaaaag ttgaacatcc tcagcttgct gcatcaatta acagtaaaaa agaaaggaaa 240 tgagaaaaaa tgagatttaa gattttatag caatttcatg tgagatatta gagcagatat 300 gagagttgta actctgaaat ttcaactcac tatccaattt tcttccacag tattatcatt 360 gactccatgt aggttattaa ctgagttcag ctgaatcctg tcagttggat ttgagaatga 420 ataatgatgt tgatatttat gtttttatgt tccaaaaggc cacctttggg caaagggaat 480 acaaacaatt acaatgaaac aagtaattat atacagaaaa ctgagaaaag aaaaaaatca 540 acaaatacct gctttctcca tctgaaaatg agcaactcag ggcagggagt tcagatgata 600 actgcttagc atgatcttta gaaccagagc gacataaatc cacaccagaa cttgactgag 660 ataactctga gtgattctgt gcatgaaagt aaattaaata ttaagctgaa tttaggaaat 720 aacgtaatta ctttatcagg aaggaggaag gagaaagcaa aagcatggat aaaaaggaac 780 ttctgcttat gttgctttgg acacaattta taaattttgt aatatgttta gatgttaaag 840 ctgaactaac ttcaaaaaca gaatgagact caacatttag cacactttca agcaaaaatt 900 gttcatgaaa atatcatcag ttcatcactt ttgaagttaa tcaaatgttg cctgcttact 960 ggtaatattt taccaatact atcagcacaa gtagttttat cc 1002 23 785 DNA Glycine max 23 cgacttcgta gcctgcagca agcaaaggcc caactccgtc cagggacggt ccagtacggt 60 ccaagatgat caagctcacc actatgcaca tttctttcgc ctcccaaacc tgaacttgct 120 gcaacagttt taaaagaata ttaatattaa tattaatatt aaagtttcct acagtaagtt 180 attatttaga ataaacataa aaaaaattta tatgaatatt tatttttaat aaataaatag 240 tattcataaa aatgctaaaa tcaagcaagt aattttccta caattttaaa atttgcaaga 300 aaattacata caaatttaaa atccacaaga aagataaact 
gtgtatttta aattcctgaa 360 aaattaaata caaatttaaa ttccgaagga aaattactag caaatttaaa ttctaccaga 420 aaattatttg caattcaccg aaaaattact tgcaaataaa ttatctgtga aatttctagc 480 agattctttt agtaaaactt tatttataga cacaccactt tttatgtaaa acattttgcc 540 gcagaaattg ttgtatttgt tctagaaaaa ttagcaagaa attttctatg agtttcaaaa 600 ttttcaaaaa attaattatc tactaaggta ttatttagga acccaagtat tggaaattca 660 caggtaatta gtaataagaa aaattctata agatatcgta aaaatataga tcacaataaa 720 gcaagataaa cgtacgggga aaaaaaaatg taaaagggaa tctatcttcg tataaactaa 780 cgtat 785 24 805 DNA Glycine max 24 tattaggtca gccattatga caacatcgga tatattcgac aatactaagg aactcatcaa 60 ggacattgct gatgattaca aaccagcctc tcctttagcc ttgggatctg gtcatgtcaa 120 ccccaacaaa gcccttgacc ctggacttgt ttacgatgta ggagttcaag attatgtcaa 180 tcttctctgt gcaatgagct ccactcaaca gaacatctca atcatcacta gatcgtctac 240 taataattgc tccaatcctt ccttggatct caactaccct tctttcattg gtttcttcag 300 tagcaatggt tcttctaatg aatcaagggt agcttgggca tttcagagaa cagtgaccaa 360 tgttggggag aaacaaacaa tctattctgc taacgttaca cccatcaaag ggtttaatgt 420 tagtgttgtt ccaagcaagt tggtgttcaa ggagaagaac gagaagctaa gttataagtt 480 aaggatagaa ggtccaatgg tcgaaggctt tgggtatctg acttggacgg acatgaagca 540 tgcggtgagg agccctattg tggtcaccaa tcaggcaccc tcaaattcaa tttccatata 600 gatcaatttt gtgatggata aatgtttttc atatgtttga agttaaaaat atatattaat 660 agaggaaatg ttcgtacatg aatgattatc atttctgata ataataataa ttttttttgg 720 aaaagtttta acaccaattt taattttttt tttcttatca cgcacaccaa ttttaattgt 780 tacgtactga aataatacgt tagtt 805 25 1222 DNA Glycine max 25 tagatctgca ctcgtgaatg ataacattgt tgaattaagg attttgatgc ttgatgcttg 60 atgcttgact atgagagaga atactattga aaattgaagt gaatacttag aagaagttca 120 tggccttgga atggaatgat catgtgaacc tcattacctg ccgacttggc actgcatata 180 tggatctaat tcaagtcctt ttcatcctcc taaatgcctg tcccttcttc tttagttctg 240 atcctcaact tatccacatt agctttcttt ttctagtatt tacaaggatt gctaaaatta 300 attttatttg taataataaa aatgtttatt attgttgtct ataattatta ataaatacaa 360 ttactcgttt tagtgtacat atttcttatt tctatatacc ctttaatata 
ttaattattt 420 tcttcataaa ccttcaagat gtaactgttc taattttttt ctaaaaaaac tgttatcaat 480 actttcttta attgtttccc ttttttaaaa taaagataga agcatgaagt gtctcatttt 540 caattattta aataaacaat actttagtta gacacaagtt cgaactataa gtttcccata 600 attttgctcc attatatcct acaatttttg tgaaatatat atattcttac aagataatat 660 tacgcacaac ttttcatcaa aatgttacaa acaactcgag cattttagga catttttttt 720 caagtaaatc ccaggccgaa taatcatcaa cctatgttac attcaccccc aacataaaaa 780 ctaacggggg aagatatcta ttgttagtct gtacatttgt tagtgcctga tctctctcgc 840 ctacacagtc gcttgttctt ttaaaaaaaa ccagttagtc accgtttatt ggtcttctcc 900 ttgcctgcaa acaagtttgc cttgtgtcag aattaagcat tactatagag aagcataatt 960 ttcttaaata agattactca ccaaatatag ttgattttaa aggaaatcga attgatgaac 1020 ccttaaatct cagctcccga ttatgcttgt ttctattttg tttctcaata gcactggaac 1080 tattgctagt ttctccggtc agaaagtttg ccactttact taccttttca tggtacacag 1140 caggtggggc aagcttcaat ggaggcaagt tttctatttg cataaatctc tgattcttct 1200 gcaagctgct caagatctgg aa 1222 26 1177 DNA Glycine max 26 agtatttttt aaagtacagt gagaaaatgt aaaataaata aataaataaa taaattatct 60 tagctatcat attattgccg ataaaaaaaa atgtcttggc tatcaaagct cttaaagctt 120 accatttagt acggatcctt ccgtggcatc tttatacgcc catttacatg catctatggt 180 actttcagat gcgtatctaa aaaaaaaatt acccaagtta agtatgtata tatgctttga 240 ataataatca gagacaacta aagaagctgg tttctttcat aaaaaaaaaa gaagctggtt 300 tctgttgttt ttctagttat gggtttttgg gatttaaata aagaactcat ttttaagcat 360 gtgataggat ggatatgcca ctattttcaa catcagagaa ggatattata tttttatatt 420 ctaaaggatt attttaatac tattatatgt attgtattta aattatttat aataaaaatc 480 ttaccaagaa aattgaaaga tataaacgtg aaactcgcaa aagaaacatt atagaaataa 540 ttggatttgg gtaaatgata tattaattat attattaata ataatgggac atacgtagct 600 gggcatggaa ggtcattatc accgcagttc tcccattctt ctatctgatc ggcccatact 660 ttctgtttat ttagataaaa ataaataaaa aattgaagat atacaacctc aaacttcaca 720 acccaaatcc ttatttagat tgatgattaa aacaaaattg catacaatac cgtaatattc 780 tgttgaagtg catcgatgaa ttcgttcatg tcagagtcat aaaaattgtc gacttctgtc 840 tgaaggatgg tactatccca aacctgtatg 
tatatgaagg taaaattaaa catatcatat 900 tgtcgtatat atagtatgtg aagacaagaa atggcaagtt ttaatgcatt ccttcatcca 960 gtttctaatt taaggactca tatttttatc tcaatacatc agattttaaa atgcacacat 1020 tcgagattta aaccatctga tcttcaccta acggtgtcga tctatgaatc catgaagaaa 1080 aaattgactt acatggtgaa gattttgctt ccttttatac cagcgaacag taattgcatt 1140 gcccccatgt ctgagagaaa gccacaatgt agaggct 1177 27 685 DNA Glycine max 27 agtttgcatg cctgcagcca agctctcgtg gatttggtgg tgttgttcgg gtaaattttt 60 cataatttta tattaaattc ttatgtttct tgatgtgttt tgcaccaaat tcacctattt 120 tgggcataac agacacccac cattgcctgt tcctgctgtc tgtgaaagtc agctgcttta 180 cagctgatgc cgtgggctgt tgtagctctg tcaaactcat ggcctcttaa aaaaaacata 240 ccccagtgtc ataaggctct tcactatgcg aaagtatggg agagggtcat tgtatgtagc 300 cttgtccttg ctgatgcaag gaggttgctt ccgaattcaa acccatgacc aactggttag 360 gcacaacttt actgttattc caggactcgc cctctagcca aaatgacctt aacaaaaaat 420 agtctctagc taaatgaatt gtgtcaatgg tgttatttta aaggttaaac aaatgtgtat 480 agtccatcag agacaaaaga gtttacacac taaaactgat agcataaatt gtcacaggct 540 gctattatgg atatacaagt tgttccccat ggttttctta catgcggtgg ggatggaatt 600 gtaaagctgg tacggctgga aaataacttg cttggccatg gaattgagtt atgatacttc 660 tgagatcctt tgggttgatg acaaa 685 28 1343 DNA Glycine max 28 cttgcatgcc tgcagaaaat tataaaattc ataaaacgct tctacagaaa attcaaaaag 60 attgatttgc tatatatcct atacacttgg acattttagt cgaaacctct atggatagag 120 actttaagga cacaatatta tgaaaaatat tcagctcaaa tattataaaa tgttaaaaaa 180 caatgcatct caatattttt ttgaaaggtg tacttttaac atgtttatga attgtacttt 240 cttggctaat gagtgttttc cctggttaat gttatggatt gtactttgta ggtgtactga 300 tatatttttt ttcattaaat actaatgttg attattcaat tttagaacag tgtacgcata 360 gtatatgact gttattgata aggtatttgt tatcgataag gtgcagttaa tataagcaga 420 gaaagaaact aaaaggttaa ctacatgatt aagcttaagt gatgaaatgg ataatccctt 480 aaaagtcatc catgatttgt atatttggtg ctttgcaaaa acaatcattt aagttcgtct 540 tcaaaatcta gtaacagatt acttaaccat tcttttagat cactacaaat atagtatttg 600 tttttttaga aggaaaaaat tgggtctgtg gcatcttaat ttttgtgatc acatttttgg 660 ttctagtgat 
actaacttct tagttcttac aatgtatgta tattttttct ttttacaaat 720 gtcactcttc tcagctggat tcatggtttg aaaactcttt cctcataatg tcaacaggtg 780 gctccctata atacatttta ctcccaattg gaaaagcata tgaatgaagt tggaattgtg 840 cccacagtta accgatggga tgagcctcta gcattgggca tggttgatcc ccatgattca 900 ttatctcatc cagcaggtgt ctctgatgtt caagctgagt ctgctacacg ggtggaccct 960 gatcagttca ctgattttgt ggtatgaatg tttctttaac attgacttgt aaggaaagta 1020 aaatagtgga tatatgtgtg cacacatgtg tatgcgccag tagatggtat ctttaacatt 1080 catatatgct ttttctctgt ctgtattgtt gtcatgcaga ttccaaactg gtttggagga 1140 gagtccactg gggctacaaa aggcaaccca ttcacgttac cagatgccta tatggtatct 1200 cagcataaaa atgtatgtgt ggtattaatt gatttgtata ttaattaacg attggatttc 1260 caaatctttt ggtgatcaaa ttttcaaaaa actttatttt aagcgaaata tgttttaact 1320 acaatgaaat tgtatcttct tct 1343 29 1062 DNA Glycine max 29 aacttgtgat tcttaatagc cttctcacgc tttttgttgt caaaggttaa ttgatgcagc 60 tttccatata gcagataagc actatataat tacgtatttc aaactacaca ttagtaatta 120 tgtcaggact tttgattatt tctgttggtc aaatttagaa tatggtacta agttaatcta 180 tagtaaatta aactaaccct ttttgagatt agatataact ctctactttt ttttaatatt 240 acattgacat ccttatacag ttatatatat atatatatat atatttaaaa taattaagga 300 agatttatat gtataaaggt gtcaatgtaa actatatatt tttaaatata acaaaaaaga 360 atagtgtagg ggtatataat aagaaaaagg taacgggtaa atttgatgaa atttcaaggg 420 gttaatataa tttaatggca ttcaactggg aaagcaatga gggatgatat tgattggtgc 480 gttgttggct tctaatgtgt ccaaagatgt gttacggaaa attgagcacc aaaagtaccc 540 atagtggttt agagaagtta ctgaaaatga aagcatgtgg tccactctgt ttgatcgatc 600 tccatttctt taaagaattg aatcaaactc tattattaac atactttctg gttccagaat 660 gaatagatat aactagactt gttttatctg acaacaaata ttattccatt tgataaggac 720 gaaacttatg ttcaaattct actttgttag ttaatgtaag aattttattg atagacgatt 780 tgaatgtatt tgagtacgaa ttttttgaat tgagactcaa attaaaatat gtctaacctg 840 atcagtgaat tattgagatc taatttacct atatattttt ttataaaaaa agaattattt 900 tatctaaacc ttttgaaaaa ttaactactt atacatactt tttcaacaac tactcctacc 960 ttagtattct gacctagggc ggctcaattt tccttttttt atttatcagt 
atgacatatt 1020 aacaaactcg gctgcaggac atgcaaggct ggcggtaaag ga 1062 30 1095 DNA Glycine max 30 aaaagttagt agaatttcgc cttaggtggt ttgggaatgc atgaagaaga cctaaagaag 60 cctcgataag aagagcagat catatggagg gcaatctatt tccttgtatc tctgtattgt 120 atagagaggt gcagtattca actcactctc tcaagttagt agttggcata ctgtgtcagt 180 ttgtaaagtt agttattgac agttgtcata actaactaac tgtttcctaa ctatctaact 240 tcataactct ataaatagag tgttgtaact caggattcat taacctccat aatattttct 300 cattccattt atcttctttc ttttctcctt tttctatgat ctaaacagag ttctaatgtg 360 atctattagt tttctattat ggtatctaga gcttggtgag atcttcaatg gctgcgaaca 420 gcaccacatt cctttccgct tcttcttttt cccaattcca tatcacataa acttgatgat 480 tcaagctttc ttctatgtcg tcaacaattt gagcctgcta tcaaaccaca caaacttaac 540 gattcgttgc taatcctcag attccacttc gatttctctc tgaagaagat caagaagttg 600 gacgtgaaaa tccagcttac gaagcatggg aaaagcaaga tcaggtgtta ttagcctgac 660 acaaatcaag gcacttgctg atgctcttgc ttcagtagga agccctataa tgattcaaga 720 gcacattgat tcaattgttg aaggtctttc tccagattat cacccgataa tcgagataat 780 ttagagtaag tttgaaaccg ttccaatcac gcaagttaaa gcacttcttc tagctcatga 840 gtcttgtctg aataacttca acgattaatt acactcatgc acaacacaga gcgaattcgc 900 attcctaaaa ttacactttg ccaaaaaagt caagttctca atcggatcct gaaagttttt 960 tctggttttc gcggtggttc tgcgtgcggt agctataata ggggtggcag cagcggcggt 1020 agtggtggcc attgtcacac tggtgcaggt caatttgcct atttccaatg ccaagtctgc 1080 tttaaatttg gtcat 1095 31 508 DNA Glycine max 31 ttgctcaaca tacaaaacct accacgcatt gttaatttgc aacttaatct ttcgttattc 60 tatcttgtag gcccatccat gtttgtccaa gaatgcctcg accgcttcct taatggccaa 120 gtttggcact aactgtgatg gatcaagggg ttcccgtgtg attgggtcga atttacccac 180 ctgtttaaga agttgaatta ttgcacactg acttcattga agtggaaagc attcaacaga 240 gaaggtgaaa catgcaaatc aggttacctt ctgaagatgc tcaagaatca ctgctctctc 300 atatgtaagt ccacttggag tgattacagg atcatggaaa atgtcgagtg taattctaca 360 gcacaaataa tctggcacct gtagactcaa caagacaatt tgtgtaatta ataagatggt 420 aatttccagg gcaccactga aaaaagattt gatgaatata gaattaacaa aacaagccat 480 gtacacacct cagtaggtgt gtcagctt 508 
32 580 DNA Glycine max 32 gcgtgcctgc aggtaaaaat ttatgctatt aacgaagaag atgttaatgg caggtacgtt 60 tgttattcag acatatcatg caaaatataa cttgcttagc agcttcacag aacccaatag 120 ggcatgaata atatttccgt ttaactggtt accttaccag cagacgcaaa aaacctcatg 180 ttagaccaca accaacacat gtggccagaa tcaataagca tagtaatttt actaacagag 240 aaaattttta accatgttca gtgtggcaaa tatatgctta tggggtacag taaataatga 300 taactatgct accaaagttt catgcattgg gggaggataa atgaagggaa ttgtttgaga 360 ttattttaag gagatacaaa taagagatct ctagttaaca taacaaaatc cttcaattaa 420 tgagcattta ctttttttga gctctccact tcggagtatt ttgtaagcta aattacttta 480 tacactttct ggtgcttgtc aaaattgaat tttaacattt attagaagag cagaaattta 540 taaaaacatg tcatatttgt ttttttatta aatcctttat 580 33 815 DNA Zea mays 33 agccccctga aacattgagc tattaaaaat tagaaaacga cacaccttca gatactttct 60 gattctaaat ctaaatatga aacctcttgg tccttacttg caagtaaaag ctacatatta 120 tacaagtaat ttgatgacaa tgagctatgc agtcagcatg gaaatgtcaa aacctatctg 180 tgagaaaata tcaggagacc tagcattagg attcttttgt tttatttttc tactgattga 240 attcatgcat gacttaccta gtgcaatcta gatgaaaagg tctaacattt tccccatctt 300 aattaagcct gctacaattc acaactgggg agatagaaac tcaaactatg gcagaggcat 360 aagtgtgtat tagagtcaat ccatctacaa tgaaaagtat gctatattca taggtattgg 420 catttcaatt aatcagacaa acagtagtta ctgcacgata aaacaactga acatcaactt 480 tctagcattt tgctgagata ggacccccgc agggtacata ttacaagaca ggatatgaag 540 gatggagtag catagcatta ttctagcata attatattaa cataacatga ttaggtcagg 600 tcatactcat ccggaagttg tgtttcgtca ccatagtaac tttggcagca aaacaaacta 660 gacagaacta agtagtgaaa aagaagcaca gatcgcagag agcagttcac atattttact 720 actaccaagt agcaagaacc atgtttcatc accaccctac ctttggtagc aaaacaatct 780 acatagaact aattattagt ataaaagaag ttgaa 815 34 763 DNA Zea mays 34 tgtaagttat ctacttattt gttccctatt ttcatttatt tatttaaagt tgagatttat 60 ccaaagtatt gttagtgtaa tgttttttct tctgccacat taggtttttg tcaatgcacg 120 gcttcatgtc tcaggagggg ctcttggggg tggtcgcatg gtagaagact cctcaagcgt 180 tgcaggttcc atattttaaa ctttctttga tgatcacatt tttgtagtat tcttttttta 240 cgtaaaacaa 
ttctgtggta ttcaatagca aattatacac ttcttacaag tcaacatata 300 gatttcacta tctcagtttc tttggaggtt actagcagaa aagtaattta gaaatgatta 360 atatatttta ctgaggttct caacgttgtg tttttcggtt gctggtaaat tgctcatttg 420 ctgctatagt tgattagtaa atatagcagt atttatgtca ttactggtta cttgtaatgc 480 aaaccttttt cattgaagta catgttctgt aaaatactag aacatggtca gtactttcag 540 cattggtcac taacttttat gttttatgcg agtaataata tttctatttc catgtttatc 600 tacttagttt ccatgccatg ccgccttttc agtattggac actgctgctg gagtttggtg 660 tgataccaaa tcagtagtta caactccaag gacaggaagg tatagtgcag atgcagcagg 720 tggtgatgct tctgtagagc ttacacggcg gtgcaggcac gca 763 35 750 DNA Zea mays 35 tgaggcaatc agtgctactt tagctgctgt aaaggctagg caagttaacg gtgagatgga 60 gcattcacct gacagggaac aatctccaga tgctgcacca agtgccaagc aaaattcaag 120 ccttataaaa ccagatcctg ctcttatgaa caattcaaca ccaccacctg gggttcggtt 180 gcaccataga gcagtgagtt gaaaaaatag ttcattttgc tgcttgttgt ttaaatttag 240 ttattctatt cttatttaga cattcagtct gtttaactta gaagtcatca catttacatg 300 aaaaatgctc
ttatttgttt tgatgccagt tacatatttt ggccttgtag gttgtggtag 360 cagcagaaac tggaggtgcc ttaggtggca tggttagaca gctctcgatt gaccagtttg 420 agaatgaagg tagaagggtc atttatggca cccctgagaa tgcaactgcg gcaaggaaat 480 tgctggatcg acaaatgtct attaatagcg tgcccaaaaa ggtaatctac atttttctac 540 tattgtaaga ttactgacaa aagcaacaca tgctagaaaa ctgaaagagt tattatcata 600 atggcttctg ctaaaaaaac aagcacttca tatgatgaca ttttctctaa gatgtagatt 660 tctatattga tttgttataa ttataatctg tggcccaata attcaggtaa ttgcttctct 720 gctgaaacct cgtggttgga gccccctgtg 750 36 607 DNA Zea mays 36 cagtcactac tgtgcttttg actggaactt gtgtcgtctt atggacatca gagaagaatg 60 atggtagcag gccatctcgt gccatggcca tcaatattct tggctgaagg aagtatcaaa 120 gtggaagtat gaataccaat agcagcataa actgaagttg tgcaatgcat atgcttgttg 180 cgagaacaga taaagcagaa aactgatgat atatatccag attatatgcc agtattcttc 240 agatgttact cattttaaaa ccatgcccac ttggctgatg actcatattt tccatcaatt 300 tgaatcacag aagaaatttg atgatacatt ggttaagata tgcttatacc tgtggcaata 360 tagaccccat caaggttgag cagagagcaa gaacagcacc agttgttaca agatacctaa 420 aacaagataa gcacatgaaa acaatctcag tcagacgcac accagatcat ccatgaaaaa 480 aatgagaaga agtccttaca ttgcccagtg catcccatgt ttggcaaagg cagatgaaat 540 aggggtgtct gggtccatag caaagtatgg taccagacca acaataacaa ctgaaaccaa 600 catgtac 607 37 607 DNA Zea mays 37 cagtcactac tgtgcttttg actggaactt gtgtcgtctt atggacatca gagaagaatg 60 atggtagcag gccatctcgt gccatggcca tcaatattct tggctgaagg aagtatcaaa 120 gtggaagtat gaataccaat agcagcataa actgaagttg tgcaatgcat atgcttgttg 180 cgagaacaga taaagcagaa aactgatgat atatatccag attatatgcc agtattcttc 240 agatgttact cattttaaaa ccatgcccac ttggctgatg actcatattt tccatcaatt 300 tgaatcacag aagaaatttg atgatacatt ggttaagata tgcttatacc tgtggcaata 360 tagaccccat caaggttgag cagagagcaa gaacagcacc agttgttaca agatacctaa 420 aacaagataa gcacatgaaa acaatctcag tcagacgcac accagatcat ccatgaaaaa 480 aatgagaaga agtccttaca ttgcccagtg catcccatgt ttggcaaagg cagatgaaat 540 aggggtgtct gggtccatag caaagtatgg taccagacca acaataacaa ctgaaaccaa 600 catgtac 607 38 1025 DNA Zea 
mays 38 ctctagagga tccccctggt ggttgagagg tactaccagc aagtgacgat gtactgaggc 60 cagtagattg ggctggggag ctgaacccaa aaaggttgtt tccagagctc ggaaatgtaa 120 atgaaggggt tgaagcagac tgtaaacctg gtgcagtgga agtactagat ggtgcagcca 180 cagatattcc agcagcatca cttgaaggtg caactgcggt gaaagtaggc gtagatgtgg 240 cgggggaact tgaaaagctt gcaatacttg atgacacagg tgaaataaaa agttccgatg 300 atgccttgct gcttgcatct ggtgcaacag atttcacttc agcttttgta ccccctatac 360 caaatgataa ggtgctagtg ccatcagatt tagctgatgt taccaatgac aatccaaccg 420 gagcactgga agaaaaacta aatcccgtaa atgcaggaga cgatgaaata gctggactgc 480 tactggatac ggcaaatatg gaagtggaaa caggggcttc atggattcag tttgatgagc 540 ccactcttgt cctcgacctt gattctgaca aattggctgc attctctgct gcatacgcag 600 aacttgaatc tgtactttct ggattgaatg tgcttgttga gacttacttt gctgatgttc 660 ctgctgagtc ctacaagtat gttatttatg ttgctagctc acccttttct cagccattgc 720 tcctcctaga cccttttgtg atgacatcat acttgctgtt cttttgaatg caggacccta 780 acatctctga gcagtgtgac tgcttatggt tttgatcttg tccgtggaac ccaaactctt 840 gggcttgtca cgagtgctgg tttccctgct ggaaagtacc tctttgctgg tgttgtggat 900 ggacgcaaca tctgggctga tgatcttgct acatctctca gcactctcca gtctcttgag 960 gctgttgttg ggaagggtaa tcatgcttgc acttattgtc tgccataaaa ttggatttag 1020 ttaca 1025 39 450 DNA Zea mays 39 tgaaacgata gctttattta tatcactatt ggtacagtta gatagaaaag tttcaggcct 60 caatcctaag taaccgaccc cttacatatt tcgaacttct atttacaggc ctaggcaaca 120 acaagctacc tatagtgcac cggcagagcc catgctcgcc gttgcacggg ctgccgtccc 180 taacggcggc agatgtcttg cgtcgtgaca cctcccgcat ccggcgccgc ttcgcctcac 240 aatcgtccgt ggtggcttct tttgcgtcgg ccccggcccc ggcgccggca gccaccataa 300 tccccctaga cggctcaccg gacgcgggcg cgctggacta caccgtgaac gtcggctatg 360 gcacgccgga gcagcagttc ccgatgttcc tggacaccat cttcggcgtg tccctggtct 420 tgtgcaagcc gtgcgcccca ggttccagca 450 40 373 DNA Zea mays 40 tcacaaattc atctttaaat tggccctaca tatgatataa ctcacactga gtaactgttg 60 atatcatatt ctaaatgact aaaagatttc agttattagc atattatgat atcacacacc 120 tttccaacaa cctcaaacgt ccattgtttc aaccggaagg ccactgcgtt tagatgatta 180 
tttggcatgg aggcccagtg tgtatcacca tttaaactct gaaaaggtta gactttccct 240 gatgaaacct tctaattaag tgggtaagca aagcactatt cagtaattgt atcacctcct 300 gttctagcag gaatctccaa aatctcttca ccaggccaaa ctccgtgggg aacgtgttac 360 tcactttgtg ctt 373 41 1106 DNA Zea mays 41 gcggctccac ggccaggcac aacaagcccg gtgtctccga ccaccgagca agggtcgctc 60 ccgttctccc gcgtcccagc gcaccagccc ccgggagtcg gcacgtccgc gccgcggccc 120 aggatctcct cgtctatcct gtccagcacc gggtcgtcct cgtggaactt gcgggcgaac 180 ggcgcgtcgc tggcgaccat gcggtccagg tcctccgccg tcaagtagtg cgggtgctgc 240 ttcggagggt tgtcccacga gatgtagtgc aggtcgtggt tcaccgtcgt gttcttgaac 300 tcctccgcgt tgcacacaac ggtgtggaag tatccttccg gcgacgagat gaagtttgag 360 tagtacatga gcactgtgcg aggtaggttg tcccagcccc atatgcagta ttccacaaag 420 ggcctggaca gtgccatcca ggcagaacct gcgatgtcat aggctcatca ggtgcttcta 480 tatctgaatc agtaactgac atatatagat gcctggttat ctatatgagc tgtctggacc 540 tagagtagtg tttatatcta agctgtagtg tctgtttaag aaattggaat caattaattt 600 cctgcctaca cagagaagtg caaacctcct ttggggggag atggctaagc atggataatg 660 gatgtagaac ctgaacacca tatttatatt aatgaaaaaa cctgattttt gtgggaaaag 720 ttcattgaca cggtttcatt aatatataaa ttagtaactg acacataaaa caaggatagc 780 ctttcaaaaa ttaatggcag ctaacataaa tggatcaggg gagtaatcga gccaggaatg 840 ccttggttgg tttccagaac aggaggtcat ttttacatgc aaataaatgt gttgccatac 900 aaatgttctc tactacaggc aacaaacaaa tagaatcaca aatgtttgtt ccagcaataa 960 gatcaaacaa aataagatta tacagatgga aaggtaattt tttttaccaa ggagaatttc 1020 aggcagcgat atactacact gtacaaaaaa aaaagttaaa aaaataccac agctgtatgc 1080 actattttga agaagaccta aggtaa 1106 42 958 DNA Zea mays 42 catcttgcat gcctgcagtt tctttggtag acataggtgg gcatgaaggt ggttagttct 60 ttaagattta agatgctaat ctttatgttg agttatactg tgattaaaat gaagataaac 120 atatagtagt gtggtctgtg gtgtcaggat ttgttttcat aaaaattctt tctgcatgct 180 taggtctgtt tgttaaggta tatgctagtg taaaagatgg tgctctcaga catatccatg 240 attcaaaatt aaattagtgg ttgtgacgaa gaattttgag ctgggaaggc agtgtccaca 300 ggcaattgct tagagccctg ccctgctgtg tatccttaaa attgacatga atttaggacc 360 
cttgtgattt atacatctat gccaaattgc caagtgctac ttttctctat gtggaaggag 420 atacatgcat ggaaagtttt gcgcgagcct tgagtgtatc acagactatc acagctgaga 480 gggatatctg aaggatatat catgattcat gattaagtta atgtgaagat ctgactaaaa 540 ttgtgcttgc agtaaacaaa acagttctgg ttgggaacca cagatctctt aggaacattg 600 tattctgatc tgactaaaat tgtgcttgaa atgaacaaaa cagttattgg caacctcaga 660 tgcttagata tattaggaac aaatgtattc cgatctgact aaaaccgtgc ttgcagtgag 720 caaaacagtt cttgttggga acctcagatc tcttaggaac aaatgtgatc tgatctgact 780 aatattgtgc ttgtagtgat aaaaaagctc ttgttgggaa catcagatat cttaattttt 840 ttgaaaaacg caggagagct gcgcatattt atagataaga tagaagaaag ggtcttacaa 900 gagaggtaca ggttagggac acctgcaccc acacacacgc actatcaact gaaacaaa 958 43 683 DNA Zea mays 43 tggtatgctc tctgaacttt tgctctgtaa ctgtgaccct caataaaaaa aattcagtta 60 aaggaatagg tcccgttgac cgagctcttc gattctctct gaatgcagat agctacactg 120 gtagctgtat acgcggactg ggggttcact tcgatcgaag gcattggatg gggctgggct 180 ggtgtggtgt ggctctacaa cctcgtcttc tacttcccgc tcgacctcct caagttcctc 240 atccgatacg ctctgagtgg caaagcgtgg gatcttgtca ttgagcaaag ggtgatcaat 300 ataaactgct cgttttgtca tgcacagcaa agcacagcac agcacctgtt tgagtgaatt 360 ccatgcacgc gcggtcggtg tgtcgctaat cgccggggtt ttgcagattg cgtttacaag 420 gaagaaggac ttcgggaagg aggagagggc gctcaagtgg gcacacgcgc agaggacgct 480 ccacgggctg cagccaccgg atgccaagct gttccctgac agggtgaacg agctgaatca 540 gatggccgaa gaggccaaac ggagggccga gattgcaagg taagatgttg aagtccgtgg 600 agatggtatc gcttgagggg aaagaaaggg caccgatgtc agcgtttccc atgatctctc 660 catatgcttt gggatatcta taa 683 44 772 DNA Zea mays 44 cacccctacc aaatcgagca cagctcagaa gaggcagggg aggatcactc tcctaccaac 60 caaaagtcac ctcgacactg gacgtgatac aagagacaca aaacatgggc cagaagacac 120 catgcatcag acagttctga gaacttattc aacagcaaaa gtcatcttcg gatttgttgg 180 cttcctcgtt ccaccatcct ttttggcatc ttttcctgta gaacagacca cgaatccaac 240 caaaagaagc aagcataaat cacatcagta gggtcataaa agaccttgtt gttaaagcac 300 acatcattcc taaacttcca aatggcccag agaatggctc ctatgcctag aacaatcaaa 360 ttcctcgtgg ttttcttata aatgttcatc 
cagtcctcaa aaagagaatc caggttttga 420 ggacatctct tcacatccag ggcaacctga agaactctcc acacaaaagt ggccactggg 480 caattgagaa aaaggtgatt ggtggattcc aaaacaccac agaaacaaca atcagtcaac 540 ccaggccaat tccttttttc aggttttctt tggagataat ttttaatctt cagaacaagc 600 cacaggaaaa cttaaatttt tgaggcattc ttattttcca aagaaactta ggaactttag 660 tatgacgtca aatagatata cttggctaat aattttgaag aatttaatga ttattatttg 720 gtatcataaa tgctattatt tgattatata aaaattggtc aaacttatga tg 772 45 544 DNA Zea mays 45 atacaatgag atggttggca tgcaaagtat tgtgttgaaa ctgataaaac attatcttat 60 atctcaaagt tctcaattgt ttacaagaag gtaaagccct gtataatttt atggcagata 120 actgcaactc ccatagaaaa tccaatgccg gaggttccca catgagtagg gtctgggaag 180 agaaaaacca aggcaagcct tcccccgcag atgaggggag gctacaaatc tcatcaacag 240 acaaaactca tccatcggcg ttggtagcgc ccagatgcca catctgggtc tggcactgga 300 cgcagcatcc accatccatt tgtcctatcc tttgatccac agttcattta atttagtcca 360 caaggaataa catatctact tctaattaat ccaagtgaaa tgggactatc ttcgtgcatt 420 cctttacctc cacatctgcc tacctcttgc accatcgcta tttctctcgt gcggaattcc 480 aaggctgggt taaaaaaaac acggaacacc cctcacccct gtcggcctgt tcttccactg 540 acgc 544 46 452 DNA Zea mays 46 cttccatcca cgaactcccg ttctccattt ccagccatgg cggtgtcgat cgagctcacc 60 aaggagtacg gctacgtcgt gctggtgctg gtggcctatg tcttcctcaa cctttggatg 120 ggcttccagg tcggcaaggc ccgcagaaag taagctctcc gaaatctgaa tcgctcgtcg 180 ccattgttgt cttcgtttgt ctccgcccaa cttatttcat caacggacat aataatatga 240 tggtttccgc tgtgattctt ctttaggtac aaggtgttct accccaccat gtacgccatc 300 gagtcggaga acaaggacgc caagctcttc aactgcgtgc aggtgcgccc aagattctga 360 catcctctcc cctcccccgt gattaattaa ttgctcttgt gaggggttgg gactttggga 420 ggcatctaaa tttccgctgg ttcttgtggt tg 452 47 1064 DNA Zea mays 47 tagctttaga gcatgtggaa atttcagctt ctggacaggc tactaacttc cctacttgca 60 cgcaagcata aggtatggtt ttaatgaaca tgttacccaa gtttgtgttt ttttagtatt 120 tcttaactaa ctttagatca actgatatat gtttgtaggt tctaatattc tcacaatgga 180 caaaagtttt ggacattctt gagtattacc tagattcaaa aggccttggg gtttgcagaa 240 ttgatggtag tgttaatttg gaagagaggc 
ggcgacaggt aacatgctag cctgtgccaa 300 tatattactc ctttcattcc aaattataaa atattgactt ttctagatac attacttttg 360 ctatgtatct agatatacac tcatgtggga gcctccaaca ctggatctgc cctagataca 420 cactaagtct atatgcataa caaaaagtga tgtatctaga aaagccaaaa cgtcttgtaa 480 tttgggacgt gtactatttt tctaaagaac atctaaactc tgaggatttg cactgtagtt 540 agatatttgg gtacatgtat aaatttattt cacaaaaaaa atcttttgtt actgttttct 600 catgcagata gcagagttga atgatttgaa tagcagtctg aatgtcttta ttctgagcac 660 acgggctggc ggacttggta tcaaccttac ttctgctgat acatgtatcc tttatgacag 720 tgactgggta attctctgtc aagactacta tactcatcag aaaaatgttt acaagaaagc 780 cttttttttt gctggctact ttctttggct gctgattgac ttattatgct agttctaata 840 tggtgctcgt ttactcgaac ctccagaatc ctcagatgga tcagcaggcc atggatcgat 900 gccaccggat tggtcaaaca cgcccagtac atgtatatag gctggctacc tcatattctg 960 ttgaggtatg cttcagtgat ccgtgttttc agacttgtca ctttggctat tgtctcaggt 1020 ttactcagct tttcaccttt gcaggaacgg atcatcaaga aagc 1064 48 1588 DNA Zea mays 48 gggcgccggc gggtgagatc ggcaggggca tgccgccgtt ggccccggcg gcggtagaga 60 atcctgatga tgctgcagag tggtgcaacg gcacgtgagg ggagccctgc gcttgccatg 120 cccatgaggg gaatggctgg gcgcccagct caggccttcg ctccaccggc ctttcctgga 180 cctgcacgat atcgcagcag cacgagcatg agctttgttt gtcaggtttt gaccacagtg 240 aacgtttcca cctttaagtc gttcacagga aacagagcga acaaacaacc aagattggat 300 atcggcgagc atgattaatc tttcatgatt ctgttttaat taattgattg attttaacgt 360 atgtgcagca caatgaagac ttgatgctag ctctatttcc cgtaaaaatt cagagaatcc 420 tcaggactac agctgcagca cgtgatccct gatcagcgta ctaaaagata catcccatgt 480 tatgtcaaga aactgcgacc caacaaccgc gactgcggtg tttcaaaaga tggaaagtgg 540 cagaacggcc tggtcgaaac accaaccagt ccacacacca aacagtacct gccattatag 600 taaggagtac cattgccccc cacccccccc aaaaaaaaca gaaaaagaag tgctgtatag 660 tatttttcca aggagcaaga cgtttcataa gaaatttcta aacaagctaa cagtaacagg 720 ttggtaggga atcaatctac aagtaaatgt ctatctagct gctctcagtc aaaaaggtta 780 gacatgtcgc acatatatat tcggcttgtt tctttcccat tatttccttt ttggaacaca 840 ggtcatgttc agctaaccat cctggattct acgatagttt ggaataactg 
atggagtatg 900 tagtaacaga catttgcatg tgtaagtgta acaaagggat cttggcgata cctggtggtt 960 cggaaagaag ccatgacaag ctgacgagta caagcgctga tggtgagatg acactgctgg 1020 gttttggtga tacccgggcc aaggggatga tgagctcatt ggctacagca gcagcaggtg 1080 aaaggagcaa acactgtgag atcaataata gttgtaggac ctggaccacg aattaatatc 1140 gatctccccc aaccatatgc caatcaaata gtcctatgaa tctttgtagg ctaggagtta 1200 ctagtaggaa ctagtatcag aggtccaaga ttctacaaag acaattcaac cgacagttgc 1260 ttccatctta agattttaat tttttttcta gatacctctt taggcaccat aagtaaaaag 1320 ctatatactg gaaaattgaa cgggtggttt tacctgagaa gccatcgtat ttgaagattc 1380 aggggaatca catgttggct ggaggccagc taaggtattg tcccttttgg ggtcttgcac 1440 attaggctgt gaaatccgca aatccaaatc aatggcatca ccatcatcaa ttgctgacaa 1500 cgtagagaga accatgagat acacctcaga tactatatat attaaaaaaa agaacggatg 1560 acaaggccga gtgtaatacc ctcggttt 1588 49 784 DNA Zea mays 49 aacataattt catgtactgt tcgtacagag attatcttta gagagaatag taagtactac 60 cttcgttctt gaatatttat catctgctag tttaatttta aactaaaacg tgataaataa 120 aaaaaacgaa gagggtatct tctatcttgt aataccaacc acgtgaagac cttcactcca 180 cggttgtttt gcctttttag attatattaa aactttgccc atacaaaaca gtttcttagg 240 gaaatttcca gatttatctc actcgcaatt accaactggc catcgttatt ttttagaacg 300 atatcttgca atcttctaag ctgactctgt aaatcttcac gtagccatct cctaaaaatg 360 aggctctagt ttttatattt catggcttca actgaacaat cacggtcctc gtttttttta 420 aaaaaaatga gaaaagtgtg tctttaaagt aacctatagc cacaccatct agtatgccaa 480 aaaaatggtg gatttttcat tggccgacac cgtagacctg ctgcttaagt aaatatatct 540 ggtttggggt taattagaag tatgccactc caaatttaaa gagttataaa gtagtttatt 600 ttcatgtcgc tcaatgacat caaaatgaat tacctatcta attttataaa aacagggtga 660 caagtatgta attaatcttt tttttcttag ctaggtttag tgatcttagt ccatctaatc 720 aaatgatatt tccctcttcc aaaaaaaacg ctttctagca ctatccatct ttccaataga 780 tgtc 784 50 802 DNA Zea mays 50 tgctgcgtcg tcgtgttgtt acgatcgtcc tttttttttt gaaaatttat tgaaggcccg 60 tgcttcttct tctcttcttc ttttccttct ccactgtttt cctcgcatgc tacatgacag 120 tctctctctc tcgttaactc tgttggtgtt cttattattt gatccggatg caagtaatac 
180 tatacaaaca ggaatcggga atgcacgctc ccaatttttg acgcctccat gcatgtagtg 240 gcggagttcc gtaaaattaa cgtccgagat gtactattct atcttttgat tttttttttc 300 gtgttttatt taaaaataaa cagcggatga taaatattta ataacggaga gtaacattta 360 aaacaatcca gttaaactat gacaagtagg gaaagtggat tagatattcg ggaacaatta 420 cttacaatag aggagaatca cgagcacatg gcagcagaca gtcaacactc aggacacaac 480 gttcgccgct caggtggccg ttgttcattg gccaaggagg cactctgcac tcgcctatga 540 ataacaaaaa aagataatca agttgagaaa gttatatagc ttggaagaag atgtaatgac 600 aggggtggat ttgggcggtg gcatggcgcc agcccccgcg cgcgacactg ctatagggtc 660 gcagggaggc tggaggcgga tggtggaggc cgatgccaga ctggcggtgg tggatgcccc 720 gagtcaagcc cgccccgacc gcctcggagg gttgctggtt tgttcctcat tgccgaattc 780 cccggaagcc cttaaggctt tt 802 51 793 DNA Zea mays unsure (1)..(793) n=a,g,c,or t 51 aaagtaagca aaactaagct gaaatctgca agaagcagta ttgagattac caaaccaaca 60 agatcctgca gttaagttca acagaaccaa gatcagcatt cagcacaaaa tgaactgatt 120 cataatcact ctgtggacag taacagtggc agagaaagac ctgatgccca actggtttcg 180 aattgtgcaa gatatgagaa aatgggtaaa acaaatctct cctgtctttt agtgtcctta 240 agctattact attgagaact ccaccagtga tccttttgcg cattagagca tccttaatcc 300 tggactgact tagttgagta tcagcgagca tgttctgtgt gctgcatagt ataataataa 360 gccatcataa ataataatga tgataaacga aatcaataac ataaaacaca taataactca 420 cgtgttctgt gtgctgcgta gtataataat aagccatcat aaataataat gatgataaac 480 gaaatcaata acataaaaca cataataact cacggtgtgt gttgctctat ttttactcta 540 gggcttccat caggttcggt aagtgcagct tctccttcaa cagcataaac aagaagacca 600 agttggcagt accctaatgc tgtagcttgg aaccaggcaa gatcaacacg gataccattc 660 tcagctaaag agggatttgc atgagattca agccagttga tgaccggaag aaacgttact 720 ttcagatttc caccacggac caaacgcagg gggatnctnt agagtcgaac ctgcaggnnc 780 agcaagtcat agg 793 52 748 DNA Zea mays 52 atcatttttc aaccaagtaa tggagatcgc tttttatttt tagtatgtcg catcgttggc 60 tagagctttg atctaccttc atggcaagca tgttatccat cgggacatta aaccagagaa 120 tcttttagtt ggagttcagg tacttgcatt gtgttctttt ttggtatccc tctggcaccg 180 ggtctgctct gcctcatctg gaagactgga actatatgct 
tgctttacat cgtcgtctgt 240 ttccttggaa cagggcgaga tcaaaattgc cgactttggc tggtctgtgc acaccttcaa 300 cagaagacgg actatgtgcg gaactctgga ttacctgcca cctgaaatgg gtactttgct 360 agccatttac ctccagttat gaactagttc aatgggtttt gagaattccc acagcagtac 420 ctagctgctt tccttggctc tgaaacctgt tgtggtttat ccagtggaga aggcagaaca 480 tgattaccat gttgatatat ggagccttgg tgttctgtgc tatgagttcc tttacggggt 540 cccacctttc gaagctaagg agcactcaga aacctacaga aggtaactcc acaactctgg 600 gatcttagta tgtcgtccct aacttgccga tatcttgccc tagattattt cctgtggctt 660 ttgacttttg agctatgctc tgataactgt gaggaaactt ttgaagttgc atatagtgct 720 agtgtagcaa agagagagca ctatattg 748 53 610 DNA Zea mays 53 atattcaaaa gggaaacgaa gatggcttgt ttattagttc agttgcctct agctcaaatc 60 tgtgggcttt aattatggat gctggcactg gctttacatc tcaagtatac gaactttcta 120 attactttct tcacaaggta agcaatcaat tctgttgact tcaaagatct gtaagggtcc
180 ttcccctttt ttcttcttaa tataatgata ttcagctctc ctgcttattt gagagaaaaa 240 aaccttcaaa gatatgtaag ggttcttcct ctttgtaata gggtctttac ccttcctttt 300 cttcttctta atataatgat acacaatttc tcctgcttat tcgagaaaaa aattgagaga 360 aaaaaccttc aaaaatttac tgttctttgt ttatgattgt acgccaacac ttcactgatt 420 atacatcctc aatgatgtct cattgtcatt cgcatgcttg tactgtgtta ggaatggata 480 atggaacagt gggagagaaa tttctatatc actgcactgg ctggggcgaa taatggaagc 540 tctttggtga ttatgtcaag aggtaattaa tgtaaatgtt tgagcttgat gcttagactg 600 caggcatgca 610 54 883 DNA Zea mays 54 atttgaatag gatacatatg aaaaaagaga aattaaaggc ctgtttggtt cactacctca 60 gttgccacaa tttgcctaac ttttctgcct gaggttagtt attcaattcg aacgactaac 120 cttaggcaaa gtgtggcaca tttagccaca aaccaaacag gccctaagtg tttgctcagc 180 caaacatcgt gtatcagctt gaaatccaaa atatgtttgg caaaacatag cacatttatc 240 aagaaatcat agaaggcaaa atgcaatatg ctaatggaaa aggctcacag gtgactacga 300 tatctctcaa caggatatac aatgcttgag atggagttcg ccatactcag aaatttgttt 360 gacatgtgtc attttataat tttattttag aagctaggat tcagacttca actggagtag 420 atcaagtcaa tacataaaca gtattctttc atactaagaa atatcacctt gtaagattcc 480 tcaagcctgt ccttgtattt ttcaaccaac agctctttac tcaatgccag atctctttcc 540 actactgcct ctgtctcaaa ctattaagag acaagagaac acattactct atctattcaa 600 aacaattcct gtaatcaagt gataatataa ctataaccaa ctaaacatca tgaaaaaatt 660 gcagtgcaat acctgaataa caatgtgagg atacaatgtt tgtcctttac ggattggtgg 720 atcaagagtg ataacaacaa atgtatgagg attgtttgac tgcaaaggtc aaccatgcaa 780 tgagtttagt tacatgtcac aatgtaatac tatgtacaat aataaagcac aagcctcaac 840 catatgatac caaggaacag aaaccttttg gcaaaaagga aag 883 55 24 DNA Glycine max 55 tgcttgcttt ggacctacac aaaa 24 56 24 DNA Glycine max 56 aaaagcccaa aaggaagagt ggag 24 57 24 DNA Glycine max 57 gcgatgacct tgtatggggt agac 24 58 24 DNA Glycine max 58 ccatgccctg attcattcat cata 24 59 1249 DNA Glycine max 59 cagactctag tgactaccac cttcactctc ctcaagcatt tcagcctctt ccccgctcag 60 actccttagc tttgggagcc aaattatccc ttacgttctc gacttcaacc atatgtgata 120 gctgcctatg ataccatggc tacttcccgt tagttcttta 
tctttccttt ccgctttatt 180 ccatgcctta ccgatcctct gaagtgtctt tgcattagct tcattgaaac ctcacgcgat 240 gaaaggtgtg atggtctcct ccgatggcgc acttctcata gggtaaccta attgtcttac 300 gaccaacata ggattataat taatacaacc cctcgtccct ataaaaggga catttggaaa 360 tccttcacat aagcataaca ctcctacccc tctttctttc cactgtggga accaactaat 420 ggacgctcct atcatgcctg ccaagagttc ttcccaattt gcctcgtcct ttcctgagca 480 catgcgatga ccttgtatgg ggtagacaga tctactttca tgattgaaga cgtgggatac 540 caaccacaca taaagagcag gcgcacaaca gaaaatcctc gtagtgctct tcttgcatct 600 taagtcaaat gtatcataca cttatgctaa aacaacaatg atcgggcttt ccttgctatg 660 gtgataagca agaaaagcat cgattgctac tagatccacc aactcgtcta cattcgaaaa 720 tagtactatc ccaaacacta gcagtgctaa tacgtcgatg aatgatgccc actctccttg 780 gctggccaga gtttccgcct tctcctccaa tcacttcctt ggtattcccc ctaccctatt 840 cctactttgc ttcactcagt ctaattctca tttcgagatc ttgacaactc ctgctattct 900 cgccatagaa ggatagtacc cagaaaaaag gtatggcttc cttcctccta tcgggcatcc 960 taagatccct tcgaactcct ctatggttgg tgctaactga aagtccccaa aagtgaagca 1020 tctgagtgat tggtcatagt attgggtgag agatgcgatg gcttcaacga acacttctat 1080 catcaccaga tcccaaatct tcccatatac cttgttgaag gactgacgtt gagctcgatc 1140 catccgatgc cccagttttc gcaagatgac tacttctaga ttcttgagtt cgacacgata 1200 gaaccttttc ttaaaagaca gtgcttgtct gaccccatct catcagact 1249 60 25 DNA Glycine max 60 cgttctcgac ttcaaccata tgtga 25 61 25 DNA Glycine max 61 gcatggaata aagcggaaag gaaag 25 62 17 DNA Glycine max 62 ccatggtatc ataggca 17 63 17 DNA Glycine max 63 ccatggtatc gtaggca 17

* * * * *