D1-C-terminal processing protease: methods for three dimensional structural determination and rational inhibitor design

Diner, Bruce A.; et al.

Patent Application Summary

U.S. patent application number 09/999536 was filed with the patent office on 2001-11-15 and published on 2003-09-18 for D1-C-terminal processing protease: methods for three dimensional structural determination and rational inhibitor design. Invention is credited to Bruce A. Diner, Doug B. Jordan, Der-Ing Liao, and Mark J. Nelson.

Publication Number	20030175800
Application Number	09/999536
Family ID	22456777
Filed Date	2001-11-15
Publication Date	2003-09-18

United States Patent Application	20030175800
Kind Code	A1
Diner, Bruce A. ; et al.	September 18, 2003

D1-C-terminal processing protease: methods for three dimensional structural determination and rational inhibitor design

Abstract

The present invention provides atomic coordinate/x-ray diffraction data defining the three dimensional structure of D1 protease. The present invention further provides methods for identifying ligands that bind to D1 protease.

Inventors:	Diner, Bruce A.; (Chadds Ford, PA) ; Jordan, Doug B.; (Wilmington, DE) ; Liao, Der-Ing; (Newark, DE) ; Nelson, Mark J.; (Newark, DE)
Correspondence Address:	E I DU PONT DE NEMOURS AND COMPANY LEGAL PATENT RECORDS CENTER BARLEY MILL PLAZA 25/1128 4417 LANCASTER PIKE WILMINGTON DE 19805 US
Family ID:	22456777
Appl. No.:	09/999536
Filed:	November 15, 2001

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
09999536	Nov 15, 2001
09564335	May 2, 2000
60133047	May 7, 1999

Current U.S. Class:	435/7.1 ; 702/19
Current CPC Class:	C12N 9/6424 20130101
Class at Publication:	435/7.1 ; 702/19
International Class:	G01N 033/53; G06F 019/00; G01N 033/48; G01N 033/50

Claims

What is claimed is:

1. A computer readable medium having stored thereon atomic coordinate/x-ray diffraction data defining the three dimensional structure of Scenedesmus obliquus D1 protease or a fragment thereof.

2. A computer readable medium having stored thereon atomic coordinate data defining the three dimensional structure of wheat D1 protease or a fragment thereof.

3. The computer readable medium of claim 1 wherein the atomic coordinate/x-ray diffraction data are given in FIG. 1, FIG. 5 or FIG. 6.

4. The computer readable medium of claim 2 wherein the atomic coordinate data are given in FIG. 4.

5. A computer readable medium having stored thereon the computer model output data defining the three-dimensional structure of Scenedesmus obliquus D1 protease or a fragment thereof.

6. A computer readable medium having stored thereon the computer model output data defining the three-dimensional structure of a wheat D1 protease or a fragment thereof.

7. A computer readable medium having stored thereon atomic coordinate/x-ray diffraction data defining the three dimensional structure of a binary complex of D1 protease and a ligand that binds to D1 protease or a subunit thereof.

8. The computer readable medium of claim 7 wherein the ligand is an active site inhibitor of D1 protease.

9. The computer readable medium of claim 8 wherein the active site inhibitor is a tetrapeptide chloromethylketone.

10. The computer readable medium of claim 9 wherein the tetrapeptide chloromethylketone is Z-LDLA-CMK, wherein Z=carbobenzoxy, and CMK=chloromethylketone.

11. The computer readable medium of claim 10 wherein the atomic coordinate/x-ray diffraction data are given in FIGS. 7 or 8.

12. A computer readable medium having stored thereon the computer model output data defining the three dimensional structure of a ternary complex of D1 protease and a ligand that binds to D1 protease or a subunit thereof.

13. The computer readable medium of claim 12 wherein the ligand is an active site inhibitor of D1 protease.

14. The computer readable medium of claim 13 wherein the active site inhibitor is a tetrapeptide chloromethylketone.

15. A method for identifying a ligand of D1 protease or a fragment thereof, the method comprising: (a) providing a computer readable medium having stored thereon computer model output data defining the three dimensional structure of a D1 protease; (b) providing a computer readable medium having stored thereon computer model output data defining the three dimensional structure of a potential ligand that binds to D1 protease or a fragment thereof; (c) providing a computer system comprising a computer and a computer algorithm, the computer system capable of processing the computer model output data of step (a) and step (b); (d) processing the computer model output data of step (a) and step (b) using the computer system of step (c) wherein the processing calculates the ability of the potential ligand to bind to D1 protease or a fragment thereof; and (e) identifying a potential ligand of D1 protease or a fragment thereof.

16. The method of claim 15 wherein the potential ligand of (b) is a tetrapeptide chloromethylketone.

17. The method of claim 16 wherein the tetrapeptide chloromethylketone is Z-LDLA-CMK, wherein Z=carbobenzoxy, and CMK=chloromethylketone.

18. A crystal of a D1 protease wherein the crystal effectively diffracts x-rays for the determination of the atomic coordinates of a D1 protease or a fragment thereof to a resolution equal to or better than 3.5 Angstroms and wherein the atomic coordinates of the crystal are given in FIG. 1, FIG. 4, FIG. 5, or FIG. 6.

19. The crystal of claim 18 wherein the crystal effectively diffracts x-rays for the determination of the atomic coordinates of the D1 protease to a resolution of about 1.8 Angstroms.

20. A method of identifying a D1 protease ligand comprising: (a) selecting a potential ligand by performing rational compound design with the three-dimensional structure determined for the crystal of claim 19, wherein said selecting is performed in conjunction with computer modeling; (b) contacting the potential ligand with the ligand binding domain of D1 protease; and (c) detecting the binding of the potential ligand for the ligand binding domain; wherein a potential ligand is selected on the basis of its having a greater affinity for the ligand binding domain of D1 protease than that of the natural substrate for the ligand binding domain of D1 protease.

21. A method of identifying a D1 protease ligand comprising: (a) performing molecular modeling using: (i) the coordinate/x-ray diffraction data defining the three dimensional structure of Scenedesmus obliquus D1 protease or a fragment thereof; and (ii) the amino acid sequence of a D1 protease enzyme; wherein said modeling produces predicted coordinate data defining the three dimensional structure of the D1 protease enzyme; (b) generating computer model output data from the predicted coordinate data defining the three dimensional structure of the D1 protease enzyme; (c) providing a computer readable medium having stored thereon computer model output data of (b); (d) providing a computer readable medium having stored thereon computer model output data defining the three dimensional structure of a potential ligand that binds to D1 protease or a fragment thereof; (e) providing a computer system comprising a computer and a computer algorithm, the computer system capable of processing the computer model output data of step (c) and step (d); (f) processing the computer model output data of step (c) and step (d) using the computer system of step (e) wherein the processing calculates the ability of the potential ligand to bind to D1 protease or a fragment thereof; and (g) identifying a potential ligand of D1 protease or a fragment thereof.

22. The method of claim 21 wherein the molecular modeling is homology modeling.

23. The method of claim 21 wherein the molecular modeling is molecular replacement, and wherein at step (a) the molecular modeling further uses the x-ray diffraction data obtained from a crystal of said D1 protease enzyme.

24. The method of claim 21 wherein the potential ligand of (b) is a tetrapeptide chloromethylketone.

25. The method of claim 24 wherein the tetrapeptide chloromethylketone is Z-LDLA-CMK, wherein Z=carbobenzoxy, and CMK=chloromethylketone.

26. The method of claim 21 wherein the amino acid sequence of a D1 protease enzyme is isolated from organisms selected from the group consisting of higher plants, algae and cyanobacteria.

27. The method of claim 21 wherein the amino acid sequence of a D1 protease enzyme is isolated from the group consisting of wheat, corn, soybean, barley, and rice.

28. A method of obtaining coordinate data defining the three dimensional structure of a D1 protease enzyme comprising performing homology modeling using: (i) the coordinate/x-ray diffraction data defining the three dimensional structure of Scenedesmus obliquus D1 protease or a fragment thereof; and (ii) the amino acid sequence of a D1 protease enzyme; wherein said homology modeling produces predicted coordinate data defining the three dimensional structure of the D1 protease enzyme.

29. A method of obtaining coordinate data defining the three dimensional structure of a D1 protease enzyme comprising performing molecular replacement using: (i) the coordinate/x-ray diffraction data defining the three dimensional structure of Scenedesmus D1 protease or a fragment thereof; (ii) the amino acid sequence of said D1 protease enzyme; and (iii) the x-ray diffraction data obtained from a crystal of said D1 protease enzyme; wherein said molecular replacement produces the coordinate/x-ray diffraction data defining the three dimensional structure of the D1 protease enzyme.

30. The method of claims 28 or 29 wherein the amino acid sequence of a D1 protease enzyme is isolated from organisms selected from the group consisting of higher plants, algae and cyanobacteria.

31. The method of claim 30 wherein the amino acid sequence of a D1 protease enzyme is isolated from the group consisting of wheat, corn, soybean, barley, and rice.

Description

[0001] This application claims the benefit of U.S. Provisional Application No. 60/133,047, filed May 7, 1999.

FIELD OF THE INVENTION

[0002] The present invention is in the field of three-dimensional protein structure determination, the modeling of new structures, and inhibitor identification and design using three-dimensional protein structures.

BACKGROUND OF THE INVENTION

[0003] D1-C-terminal processing (D1) protease is responsible for C-terminal processing of the carboxy-terminal extension of the precursor form of the D1 polypeptide of the Photosystem II reaction center (Marder et al., J. Biol. Chem. 259:3900-3908 (1984); Metz et al., FEBS Lett. 205:269-274 (1986); Diner et al., J. Biol. Chem. 263:8972-8980 (1988); Taylor et al., FEBS Lett. 235:109-116 (1988); Takahashi et al., FEBS Lett. 240:6-8 (1988); Anbudurai et al., Proc. Natl. Acad. Sci. USA 91:8082-8086 (1994); Trost et al., J. Biol. Chem. 272:20348-20356 (1997)). This processing is essential for the assembly of the manganese cluster, which is responsible for photosynthetic water oxidation and is the source of electrons to the photosynthetic electron transport chain (Metz et al., Biochem. Biophys. Res. Commun. 94:560-566 (1980); Bowyer et al., J. Biol. Chem. 267:5424-5433 (1992); Nixon et al., Biochemistry 31:10859-10871 (1992)). Because the D1 protease is essential for photosynthesis, it is a potential target for inhibitors with utility as commercial herbicides. Until now, the three-dimensional structure of neither this enzyme nor any homologous protein has been determined. There are also no publicly known inhibitors of this enzyme. The instant invention reports the three-dimensional structure of D1 protease from Scenedesmus obliquus at 1.8 Å resolution.

SUMMARY OF THE INVENTION

[0004] The present invention provides a computer readable medium having stored thereon atomic coordinate/X-ray diffraction data defining the three dimensional structure of Scenedesmus obliquus D1 protease or a fragment thereof. Additionally the invention provides a computer readable medium having stored thereon atomic coordinate data defining the three dimensional structure of wheat D1 protease or a fragment thereof.

[0005] The invention further provides a computer readable medium having stored thereon the computer model output data defining the three dimensional structure of Scenedesmus obliquus D1 protease or a fragment thereof. Similarly, it is an object of the invention to provide a computer readable medium having stored thereon the computer model output data defining the three dimensional structure of a wheat D1 protease or a fragment thereof.

[0006] Additionally the present invention provides a method for identifying a ligand of D1 protease or a fragment thereof, the method comprising: (a) providing a computer readable medium having stored thereon computer model output data defining the three dimensional structure of a D1 protease; (b) providing a computer readable medium having stored thereon computer model output data defining the three dimensional structure of a potential ligand that binds to D1 protease or a fragment thereof; (c) providing a computer system comprising a computer and a computer algorithm, the computer system capable of processing the computer model output data of step (a) and step (b); (d) processing the computer model output data of step (a) and step (b) using the computer system of step (c) wherein the processing calculates the ability of the potential ligand to bind to D1 protease or a fragment thereof; and (e) identifying a potential ligand of D1 protease or a fragment thereof.

[0007] It is a further object of the present invention to provide a crystal of a D1 protease wherein the crystal effectively diffracts X-rays for the determination of the atomic coordinates of a D1 protease or a fragment thereof to a resolution equal to or better than 3.5 Angstroms.

[0008] The present invention further provides a method of identifying a D1 protease ligand comprising: (a) selecting a potential ligand by performing rational compound design with the three-dimensional structure determined for the crystal of the Scenedesmus obliquus D1 protease enzyme, wherein said selecting is performed in conjunction with computer modeling; (b) contacting the potential ligand with the ligand binding domain of D1 protease; and (c) detecting the binding of the potential ligand for the ligand binding domain; wherein a potential ligand is selected on the basis of its having a greater affinity for the ligand binding domain of D1 protease than that of the natural substrate for the ligand binding domain of D1 protease.

[0009] The invention additionally provides methods of obtaining coordinate data defining the three dimensional structure of a D1 protease enzyme comprising performing molecular modeling using: (i) the coordinate/X-ray diffraction data defining the three dimensional structure of Scenedesmus obliquus D1 protease or a fragment thereof; (ii) the amino acid sequence of a D1 protease enzyme; and optionally the X-ray diffraction data from a crystallized D1 protease enzyme, wherein said molecular modeling produces predicted coordinate data defining the three dimensional structure of the D1 protease enzyme. This method may optionally be accomplished using homology modeling or molecular replacement, and the D1 protease may be isolated from plants selected from the group consisting of wheat, corn, soybean, barley, and rice.

BRIEF DESCRIPTION OF THE FIGURES AND SEQUENCE LISTING

[0010] The invention can be more fully understood from the following detailed description and the accompanying figures and Sequence Listing which form a part of this application.

[0011] FIG. 1 presents the atomic coordinates derived from X-ray diffraction data defining the three-dimensional structure of D1 protease isolated from Scenedesmus obliquus.

[0012] FIG. 2 illustrates site-directed mutagenesis of D1 protease.

[0013] FIG. 3 presents an amino acid comparison of wheat and Scenedesmus obliquus D1 protease.

[0014] FIG. 4 presents the predicted atomic coordinates of the resulting three-dimensional model of D1 protease isolated from wheat.

[0015] FIG. 5 presents the atomic coordinates derived from X-ray diffraction data defining the three-dimensional structure of the C2I form of the native D1 protease isolated from Scenedesmus obliquus.

[0016] FIG. 6 presents the atomic coordinates derived from X-ray diffraction data defining the three-dimensional structure of the R32 form of the native D1 protease isolated from Scenedesmus obliquus.

[0017] FIG. 7 presents the atomic coordinates derived from X-ray diffraction data defining the three-dimensional structure of the D1 protease derivatized by peptide chloromethylketone inhibitor.

[0018] FIG. 8 presents the computer model of the active site lysine covalently modified by the peptide chloromethylketone inhibitor.

[0019] The following sequence descriptions and sequence listings attached hereto comply with the rules governing nucleotide and/or amino acid sequence disclosures in patent applications as set forth in 37 C.F.R. §1.821-1.825. The Sequence Descriptions contain the one letter code for nucleotide sequence characters and the three letter codes for amino acids as defined in conformity with the IUPAC-IUBMB standards described in Nucleic Acids Research 13:3021-3030 (1985) and in the Biochemical Journal 219(2):345-373 (1984) which are herein incorporated by reference. The symbols and format used for nucleotide and amino acid sequence data comply with the rules set forth in 37 C.F.R. §1.822.

[0020] SEQ ID NO: 1 is the amino acid sequence of D1 protease from Scenedesmus obliquus.

[0021] SEQ ID NO: 2 is the 5' primer sequence used for cloning Scenedesmus obliquus D1 protease gene.

[0022] SEQ ID NO: 3 is the 3' primer sequence used for cloning Scenedesmus obliquus D1 protease gene.

[0023] SEQ ID NO: 4 is the amino acid sequence of D1 protease from Scenedesmus obliquus which has undergone site-directed mutagenesis and which lacks the signal peptide.

[0024] SEQ ID NO: 5 is the L132-fwd primer.

[0025] SEQ ID NO: 6 is the L132-rev primer.

[0026] SEQ ID NO: 7 is the L210-fwd primer.

[0027] SEQ ID NO: 8 is the L210-rev primer.

[0028] SEQ ID NO: 9 is the amino acid sequence of D1 protease from wheat.

[0029] SEQ ID NO: 10 is the amino acid sequence of the wildtype D1 protease from Scenedesmus obliquus lacking the signal peptide.

[0030] SEQ ID NO: 11 is the tetrapeptide chloromethylketone D1 protease ligand.

DETAILED DESCRIPTION OF THE INVENTION

[0031] The present invention describes methods for expressing, mutating, refolding, purifying, crystallizing and solving to high resolution the X-ray crystal structure of the D1-C-terminal processing (D1) protease from Scenedesmus obliquus. The X-ray crystal structure describes the apoprotein. The three-dimensional structure (e.g., as provided on computer readable media of the present invention; FIG. 1) is useful for rational design of ligands of D1 protease. Such ligands can be synthesized and are useful as agronomic compounds for inhibiting the activity of D1 protease.

[0032] In this disclosure, a number of terms and abbreviations are used. The following definitions are provided.

[0033] "D1-C-terminal processing protease" is abbreviated D1 protease.

[0034] "Multiwavelength Anomalous Diffraction" is abbreviated MAD.

[0035] "Multiple isomorphous replacement" is abbreviated MIR.

[0036] "Polymerase chain reaction" is abbreviated PCR.

[0037] The term "D1 protease" refers to an enzyme responsible for the processing of the D1 pre-protein at the C-terminal end for the production of the mature D1 polypeptide.

[0038] The terms "D1 pre-protein", "D1 pre-polypeptide", and "pre-D1" refer to the D1 precursor protein that has been N-terminally processed but contains an additional 8 to 16 amino acid residues at the C-terminal portion of the protein which are cleaved off by D1 protease at the carboxy side of D1-Ala344 to yield the mature D1 protein.

[0039] The terms "D1 protein", "D1 polypeptide", and "mature D1 protein or polypeptide" refer to an electron transport polypeptide that is both N- and C-terminally processed and a subunit of the PSII reaction center. This polypeptide is implicated in coordinating a tetranuclear manganese (Mn) cluster which is found in the PSII reaction center of all photosynthetic organisms and is responsible for the coordination of the primary photoreactants.

[0040] The term "enzyme substrate" means any compound or material that is capable of interacting with or binding to the active enzymatic site of D1 protease where that substrate is catalytically cleaved by the interaction with the active site. As used herein a suitable substrate for the D1 protease enzyme may be the D1 pre-protein, or a portion of that pre-protein comprising the D1 processing site.

[0041] The term "D1 processing site" refers to the region on the D1 pre-protein that is cleaved by the D1 protease enzyme. As used herein "D1 processing" refers to the cleavage of the D1 pre-protein by D1 protease.

[0042] The term "D1 active site" or "active site" refers to the portion of the D1 protease enzyme responsible for D1 processing. For the purposes of the present invention an "active site" will comprise any region of 41 contiguous amino acid residues, located within a polypeptide having D1 processing activity, where there exists at least 60% amino acid identity between that region and the corresponding region beginning at residue 361 and ending at residue 402 of the D1 protease enzyme isolated from Scenedesmus obliquus as set forth in SEQ ID NO: 1.
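The window-based criterion in this definition can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the comparison is ungapped and position-by-position, which is simpler than the alignment-based identity methods described elsewhere in this disclosure; the function names are hypothetical; and the window width is taken from the length of whatever reference region the caller supplies rather than fixed in the code.

```python
from typing import List, Tuple


def window_identity(candidate: str, reference: str) -> float:
    """Fraction of identical positions between two equal-length, ungapped segments."""
    if len(candidate) != len(reference):
        raise ValueError("segments must have equal length")
    if not reference:
        raise ValueError("segments must not be empty")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return matches / len(reference)


def find_candidate_windows(
    sequence: str, reference_region: str, threshold: float = 0.60
) -> List[Tuple[int, float]]:
    """Slide a window the width of reference_region along sequence.

    Returns (zero-based start index, identity fraction) for every window
    whose identity to the reference region meets the threshold.
    """
    width = len(reference_region)
    hits = []
    for start in range(len(sequence) - width + 1):
        identity = window_identity(sequence[start:start + width], reference_region)
        if identity >= threshold:
            hits.append((start, identity))
    return hits
```

In use, reference_region would be the residues of the reference span of SEQ ID NO: 1 and sequence a polypeptide with D1 processing activity; neither sequence is reproduced here, so any strings passed to the sketch are the caller's own data.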

[0043] The term "ligand" means any compound capable of interacting with the active site of D1 protease or binding to any other domain or sub-domain of D1 protease. Ligands may include but are not limited to enzyme substrates.

[0044] The term "complex" as used herein refers to the association of a protein with other substances or molecules useful in determining the structure of the protein. Thus, a protein may be complexed with a ligand or substrate at the active site. A "binary complex" refers to the association of the protein with one other substance, such as for example the binding of the enzyme with a ligand or substrate.

[0045] The term "atomic coordinate/X-ray diffraction data" means that data generated from an X-ray diffraction procedure that will enable the determination of the structure of a protein.

[0046] The term "predicted atomic coordinate data" or "coordinate data" means that data generated from a computer modeling program that predicts atomic coordinate data that will enable the determination of the structure of a protein.

[0047] The term "computer model output data" refers to the data generated by modeling and compound docking software using atomic coordinate/X-ray diffraction coordinates.
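As an illustration of how atomic coordinate data stored on a computer readable medium might be read by software, the following Python sketch parses coordinate records and measures an interatomic distance. It assumes the coordinates are stored in the fixed-column ATOM/HETATM record layout of the Protein Data Bank (PDB) format; this disclosure does not state the file format of the figures, so that layout, and all function and class names below, are assumptions made for the example.

```python
import math
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str            # atom name, e.g. "NZ"
    residue: str         # three-letter residue name, e.g. "LYS"
    chain: str           # single-character chain identifier
    residue_number: int  # residue sequence number
    x: float             # orthogonal coordinates in Angstroms
    y: float
    z: float


def parse_atom_records(pdb_text: str) -> List[Atom]:
    """Extract atoms from ATOM/HETATM lines (PDB fixed-column layout assumed)."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip headers, remarks, and other record types
        if len(line) < 54:
            continue  # too short to hold x, y, z in columns 31-54
        atoms.append(Atom(
            name=line[12:16].strip(),
            residue=line[17:20].strip(),
            chain=line[21],
            residue_number=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in the units of the coordinates."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
```

Distances computed this way, for example between an active site side chain atom and a ligand atom, are the kind of geometric quantity that modeling and docking software derives from coordinate data.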

[0048] As used herein the general term "molecular modeling" will refer to the use of a computer algorithm to generate a predicted model of a protein. "Molecular modeling" may encompass specific types of modeling applications, for example homology modeling or molecular replacement modeling.

[0049] The term "molecular replacement" refers to a computer based method of determining the three dimensional structure of a protein of interest using the atomic coordinates for a reference protein and the X-ray diffraction data from the protein of interest.

[0050] The term "homology modeling" refers to a computer based method of determining the three dimensional structure of a protein of interest using a combination of the primary structure of the protein of interest and the crystal structure of at least one reference protein.

[0051] The term, "rational compound design" means the use of a set of atomic coordinate/X-ray diffraction data derived from a protein or protein complex, in conjunction with computer modeling software to determine compounds that will most likely bind to or interact with a specific site on the protein or protein complex.

[0052] As used herein where references to the positions of amino acids in D1 protease are mentioned (e.g., Lys397), they will always be relative to the amino acid sequence set forth in SEQ ID NO: 1, unless otherwise indicated.

[0053] The term "sequence analysis software" refers to any computer algorithm or software program that is useful for the analysis of nucleotide or amino acid sequences. "Sequence analysis software" may be commercially available or independently developed. Typical sequence analysis software will include but is not limited to the GCG suite of programs (Wisconsin Package Version 9.0, Genetics Computer Group (GCG), Madison, Wis.), BLASTP, BLASTN, BLASTX (Altschul et al., J. Mol. Biol. 215:403-410 (1990)), and DNASTAR (DNASTAR, Inc. 1228 S. Park St. Madison, Wis. 53715 USA). Within the context of this application it will be understood that where sequence analysis software is used for analysis, the results of the analysis will be based on the "default values" of the program referenced, unless otherwise specified. As used herein "default values" will mean any set of values or parameters which originally load with the software when first initialized.

[0054] As used herein the terms "percent identity" and "percent homology" will be used interchangeably. The term "percent identity" is a relationship between two or more polypeptide sequences or two or more polynucleotide sequences, as determined by comparing the sequences. In the art, "identity" also means the degree of sequence relatedness between polypeptide or polynucleotide sequences, as the case may be, as determined by the match between strings of such sequences. "Identity" and "similarity" can be readily calculated by known methods, including but not limited to those described in: Computational Molecular Biology (Lesk, A. M., ed.) Oxford University Press, New York (1988); Biocomputing: Informatics and Genome Projects (Smith, D. W., ed.) Academic Press, New York (1993); Computer Analysis of Sequence Data Part I (Griffin, A. M., and Griffin, H. G., eds.) Humana Press, New Jersey (1994); Sequence Analysis in Molecular Biology (von Heijne, G., ed.) Academic Press (1987); and Sequence Analysis Primer (Gribskov, M. and Devereux, J., eds.) Stockton Press, New York (1991). Preferred methods to determine identity are designed to give the largest match between the sequences tested. Methods to determine identity and similarity are codified in publicly available computer programs. Preferred computer program methods to determine identity and similarity between two sequences include, but are not limited to, the GCG Pileup program found in the GCG program package, using the Needleman and Wunsch algorithm with their standard default values of gap creation penalty=12 and gap extension penalty=4 (Devereux et al., Nucleic Acids Res. 12:387-395 (1984)), BLASTP, BLASTN, and FASTA (Pearson et al., Proc. Natl. Acad. Sci. USA 85:2444-2448 (1988)). The BLASTX program is publicly available from NCBI and other sources (BLAST Manual, Altschul et al., Natl. Cent. Biotechnol. Inf., Natl. Library Med. (NCBI NLM) NIH, Bethesda, Md. 20894; Altschul et al., J. Mol. Biol. 215:403-410 (1990); Altschul et al., (Gapped BLAST and PSI-BLAST: a new generation of protein database search programs), Nucleic Acids Res. 25:3389-3402 (1997)). The method to determine percent identity preferred in the present invention is by the method of DNASTAR protein alignment protocol using the Jotun-Hein algorithm (Hein et al., Methods Enzymol. 183:626-645 (1990)). Default parameters used for the Jotun-Hein method for alignments are: for multiple alignments, gap penalty=11, gap length penalty=3; for pairwise alignments ktuple=2. As an illustration, for a polynucleotide having a nucleotide sequence with at least 95% identity to a reference nucleotide sequence, it is intended that the nucleotide sequence of the polynucleotide is identical to the reference sequence except that the polynucleotide sequence may include up to five point mutations per each 100 nucleotides of the reference nucleotide sequence. In other words, to obtain a polynucleotide having a nucleotide sequence at least 95% identical to a reference nucleotide sequence, up to 5% of the nucleotides in the reference sequence may be deleted or substituted with another nucleotide, or a number of nucleotides up to 5% of the total nucleotides in the reference sequence may be inserted into the reference sequence. These mutations of the reference sequence may occur at the 5' or 3' terminal positions of the reference nucleotide sequence or anywhere between those terminal positions, interspersed either individually among nucleotides in the reference sequence or in one or more contiguous groups within the reference sequence. Analogously, for a polypeptide having an amino acid sequence having at least 95% "identity" to a reference amino acid sequence, it is intended that the amino acid sequence of the polypeptide is identical to the reference sequence except that the polypeptide sequence may include up to five amino acid alterations per each 100 amino acids of the reference amino acid sequence. In other words, to obtain a polypeptide having an amino acid sequence at least 95% identical to a reference amino acid sequence, up to 5% of the amino acid residues in the reference sequence may be deleted or substituted with another amino acid, or a number of amino acids up to 5% of the total amino acid residues in the reference sequence may be inserted into the reference sequence. These alterations of the reference sequence may occur at the amino or carboxy terminal positions of the reference amino acid sequence or anywhere between those terminal positions, interspersed either individually among residues in the reference sequence or in one or more contiguous groups within the reference sequence.
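The arithmetic of the 95% illustration can be sketched in Python. The sketch takes one reading of the illustration, in which every substitution, deletion, or insertion counts as one alteration and the total is measured against the length of the reference sequence. It operates on a pairwise alignment that has already been computed, with "-" marking gaps; it is not the DNASTAR or Jotun-Hein computation, and the function names are hypothetical.

```python
import math


def percent_identity_to_reference(
    aligned_reference: str, aligned_query: str, gap: str = "-"
) -> float:
    """Percent identity relative to the reference length for a pairwise alignment.

    Each mismatching column (substitution, deletion, or insertion) is one
    alteration; the result is clamped at zero for heavily altered queries.
    """
    if len(aligned_reference) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    reference_length = sum(1 for r in aligned_reference if r != gap)
    if reference_length == 0:
        raise ValueError("reference contains no residues")
    alterations = sum(
        1 for r, q in zip(aligned_reference, aligned_query) if r != q
    )
    return max(0.0, 100.0 * (reference_length - alterations) / reference_length)


def max_alterations(reference_length: int, min_identity_percent: float) -> int:
    """Largest number of alterations that still meets the identity floor."""
    return math.floor(reference_length * (100 - min_identity_percent) / 100)
```

Under this reading, a 100-residue reference at a 95% floor allows max_alterations(100, 95) alterations, which is five, matching the "up to five alterations per each 100" wording of the illustration.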

[0055] The determined structure is made using the D1 protease amino acid sequence (SEQ ID NO: 1) and/or atomic coordinate/x-ray diffraction data, which are analyzed to provide atomic model output data corresponding to the three-dimensional structure, e.g., as provided on computer readable media. The computer analysis of the atomic coordinate/x-ray diffraction data and/or the amino acid sequence allows the calculation of the secondary and/or tertiary structures, domains, and/or subdomains of the protein. These domains are combined and refined by additional calculations using suitable computer subroutines to determine the most probable or actual three-dimensional structure of the D1 protease, including potential or actual active sites, binding sites or other structural or functional domains or subdomains of the protein. The resulting three-dimensional structure is represented as atomic model output data on the computer readable media.

[0056] Structure determination methods are also provided by the present invention for rational design of D1 protease ligands. Such design uses computer modeling programs that calculate different molecules expected to interact with the determined active sites, binding sites, or other structural or functional domains or subdomains of a D1 protease. These ligands can then be produced and screened for activity in modulating or binding to a D1 protease, according to methods and compositions of the present invention.

[0057] The actual D1 protease-ligand complexes can optionally be crystallized and analyzed using x-ray diffraction techniques. The diffraction patterns obtained are similarly used to calculate the three-dimensional interaction of the ligand and the D1 protease, to confirm that the ligand binds to, or changes the conformation of, particular domain(s) or subdomain(s) of the D1 protease. Such screening methods are selected from assays for at least one biological activity of a D1 protease. The resulting ligands, provided by methods of the present invention, modulate or bind at least one D1 protease and are useful as inhibitors of the D1 protease enzyme. Ligands of a particular D1 protease can similarly modulate other D1 proteases from other sources such as other plants.

[0058] A D1 protease is also provided as a crystallized protein suitable for x-ray diffraction analysis. The x-ray diffraction patterns obtained by the x-ray analysis are of moderate, to moderately high, to high resolution, e.g., equal to or better than 3.5 Å, where about 1.8 Å to about 0.7 Å is preferred. It is well understood in the art of x-ray diffraction that the lower the resolution figure the more refined the resolution and the more useful the data obtained from such a pattern. These diffraction patterns are suitable and useful for three-dimensional structure determination of a D1 protease, domain or subdomain thereof.

[0059] The determination of the three-dimensional structure of a D1 protease has a broad-based utility. Significant sequence identity and conservation of important structural elements are expected to exist among different D1 proteases and other homologs, including Prc protease (Genbank D00674; Hara, et al., Journal of Bacteriology 173, 4799-4813(1991)). Therefore, the three-dimensional structure from one or a few D1 proteases can be used to identify ligands that have the ability to inhibit the D1 protease enzyme or D1 protease homologs having different amino acid sequences. More specifically, the three-dimensional structure from one or more D1 proteases can be used to identify ligands that are inhibitory in other D1 proteases with different amino acid sequences. Inhibitors to D1 protease are expected to have herbicidal activity.

[0060] Isolated D1 Protease Polypeptides

[0061] A D1 protease polypeptide can refer to any subset of a D1 protease as a domain, subdomain, fragment, consensus sequence or repeating unit thereof. A D1 protease polypeptide of the present invention can be prepared by any of the following methods:

[0062] (a) recombinant DNA methods;

[0063] (b) proteolytic digestion of the intact molecule or a domain, subdomain or fragment thereof;

[0064] (c) chemical peptide synthesis methods well-known in the art; and/or

[0065] (d) by any other method capable of producing a D1 protease polypeptide and having a conformation similar to a structural or functional subdomain of a D1 protease.

[0066] A biological activity of D1 protease can be screened according to known and patented screening assays (Trost et al., J. Biol. Chem. 272:20348-20356 (1997); U.S. Pat. No. 5,876,945). The minimum peptide sequence to have activity is based on the smallest unit containing or comprising a particular domain, subdomain, fragment, region, consensus sequence, or repeating unit thereof, having at least one biological activity of a D1 protease, such as enzyme activity.

[0067] A D1 protease polypeptide of the invention can have at least 60% homology or sequence identity, such as 60-100% overall homology or identity, with one or more corresponding D1 protease subdomains or fragments as described herein, such as the amino acids of SEQ ID NO: 1. As would be understood by one of ordinary skill in the art, the above configurations of subdomains are provided as part of a D1 protease polypeptide of the invention, when expressed in a suitable host cell, or otherwise synthesized, to provide at least one structural or functional feature of a native D1 protease, such as at least one D1 protease-related biological activity. The active site of the D1 protease is the region most likely to be the subject of such analysis. The active site, in most D1 protease enzymes, spans a distance of about 40 amino acid residues, as for example in the Scenedesmus enzyme where the active site region comprises amino acids 361 to 402. Comparisons of the active sites of D1 protease enzymes in this active site region to the Scenedesmus active site by BESTFIT (version 9.0-OpenVMS, Genetics Computer Group (GCG)), using default parameters, are shown below:

TABLE 1
D1 protease source	% identity with Scenedesmus D1 protease active site region
Tobacco	71%
Spinach	74%
Wheat	74%
Synechocystis CtpA	74%
Synechocystis CtpC	60%

[0068] Thus, relevant D1 protease fragments, domains or sub-domains of D1 protease would have at least 60% amino acid identity to the D1 protease active site.

[0069] Such activities can be assayed using a suitable assay, to establish at least one D1 protease biological activity of one or more D1 protease of the invention. A D1 protease polypeptide of the invention is not naturally occurring or is naturally occurring but is in a purified isolated form which does not occur in nature. Assay methods for D1 protease are known. For example, Trost et al., (J. Biol. Chem. 272:20348-20356 (1997)) and U.S. Pat. No. 5,876,945 disclose a method of determining D1 protease activity. Alternatively, a suitable assay for D1 protease may be designed by the skilled person.

[0070] As previously noted, percent homology or identity can be determined, for example, by comparing sequence information using the GAP or BESTFIT computer programs (version 9.0-OpenVMS, Genetics Computer Group (GCG)). The GAP program utilizes the alignment method of Needleman and Wunsch (J. Mol. Biol. 48:443 (1970)) and performs the comparison across the entire length of the sequences. The BESTFIT program uses the local homology algorithm of Smith and Waterman (Adv. Applied Mathematics 2:482-489 (1981)) to find the best segment of similarity between two sequences. The default parameters for the GAP and BESTFIT programs are preferred and are routinely used. Both programs define percent identity as the number of aligned symbols (i.e., nucleotides or amino acids) which are the same in the respective aligned sequences, divided by the total number of symbols in the shorter of the two sequences.
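The percent identity definition above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (for example by GAP or BESTFIT) and that gaps are written as "-"; the function name and the gap symbol are illustrative choices, and this is not the GCG implementation.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical aligned symbols divided by the length of the shorter sequence.

    Both inputs are rows of one pairwise alignment, so they must have the
    same length. Gap characters never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    # Length of the shorter sequence, counted without gap characters.
    shorter = min(
        len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, ""))
    )
    if shorter == 0:
        raise ValueError("sequences must not be empty")
    return 100.0 * matches / shorter


# Toy alignment (invented, not taken from SEQ ID NO: 1):
# three identical positions, shorter sequence has four residues.
print(percent_identity("ACDE-", "ACDQK"))
```

Dividing by the shorter sequence means that a short fragment fully contained in a longer sequence can still score 100%.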

[0071] Thus, one of ordinary skill in the art, given the teachings and guidance presented in the present specification, will know how to add, delete or substitute other amino acid residues in other positions of a D1 protease to obtain substituted, deletional or additional variants thereof.

[0072] Non-limiting examples of substitutions of D1 protease domains or polypeptides of the invention are those in which at least one amino acid residue in the protein molecule has been removed and a different residue added in its place. The types of substitutions which can be made in the protein or peptide molecule of the invention can be based on analysis of the frequencies of amino acid changes between homologous proteins of different species. Based on such an analysis, alternative substitutions are defined herein as exchanges within one of the following five groups:

[0073] 1. Small aliphatic, nonpolar or slightly polar residues: Ala, Ser, Thr (Pro, Gly);

[0074] 2. Polar, negatively charged residues and their amides: Asp, Asn, Glu, Gln;

[0075] 3. Polar, positively charged residues: His, Arg, Lys;

[0076] 4. Large aliphatic, nonpolar residues: Met, Leu, Ile, Val (Cys); and

[0077] 5. Large aromatic residues: Phe, Tyr, Trp.
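The five exchange groups listed above can be encoded as a small lookup. This is a minimal sketch: the names are illustrative, and treating the parenthesized residues (Pro, Gly, Cys) as full members of their groups is an assumption.

```python
# Exchange groups 1-5 as listed above. Pro, Gly and Cys appear in
# parentheses in the text; including them here is an assumption.
SUBSTITUTION_GROUPS = {
    1: {"Ala", "Ser", "Thr", "Pro", "Gly"},   # small aliphatic, nonpolar or slightly polar
    2: {"Asp", "Asn", "Glu", "Gln"},          # negatively charged residues and their amides
    3: {"His", "Arg", "Lys"},                 # positively charged
    4: {"Met", "Leu", "Ile", "Val", "Cys"},   # large aliphatic, nonpolar
    5: {"Phe", "Tyr", "Trp"},                 # large aromatic
}


def group_of(residue: str) -> int:
    """Return the group number (1-5) of a three-letter residue code."""
    for number, members in SUBSTITUTION_GROUPS.items():
        if residue in members:
            return number
    raise ValueError(f"unknown residue: {residue}")


def is_alternative_substitution(original: str, replacement: str) -> bool:
    """True when the exchange stays within a single group."""
    return original != replacement and group_of(original) == group_of(replacement)


print(is_alternative_substitution("Leu", "Ile"))  # same group (4)
print(is_alternative_substitution("Asp", "Lys"))  # different groups (2 and 3)
```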

[0078] Most deletions and additions and substitutions according to the invention are those which do not produce radical changes in the characteristics of the protein or peptide molecule. "Characteristics" is defined in a non-inclusive manner to define both changes in secondary structure, e.g., α-helix or β-sheet, as well as changes in physiological activity, e.g., in biological activity assays. However, when the exact effect of the substitution, deletion, or addition is to be confirmed, one skilled in the art will appreciate that the effect of at least one substitution, addition or deletion will be evaluated by at least one D1 protease screening assay, such as, but not limited to, immunoassays or bioassays, to confirm at least one D1 protease biological activity.

[0079] Computer Related Embodiments

[0080] An amino acid sequence of a D1 protease (SEQ ID NO: 1) and/or atomic coordinate/x-ray diffraction data, useful for computer structure determination of a D1 protease or a portion thereof, can be "provided" in a variety of mediums to facilitate use thereof. As used herein, provided refers to a manufacture, which contains a D1 protease amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present invention, e.g., the amino acid sequence provided in SEQ ID NO: 1, a representative fragment thereof, or an amino acid sequence having at least 60-100% overall identity of SEQ ID NO: 1, or at least 60% identity to the active site of the D1 protease enzyme. Such a medium provides the amino acid sequence and/or atomic coordinate/x-ray diffraction data in a form which allows a skilled artisan to analyze and determine the three-dimensional structure of a D1 protease or a subdomain thereof.

[0081] In one application of this embodiment, D1 protease, or at least one subdomain thereof, amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present invention is recorded on computer readable media. As used herein, "computer readable media" refers to any medium which can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as optical discs or CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. A skilled artisan can readily appreciate how any of the presently known computer readable media can be used to create a manufacture comprising computer readable medium having recorded thereon an amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present invention.

[0082] As used herein, "recorded" refers to a process for storing information on computer readable medium. A skilled artisan can readily adopt any of the presently known methods for recording information on computer readable medium to generate manufactures comprising an amino acid sequence and/or atomic coordinate/x-ray diffraction data information of the present invention.

[0083] A variety of data storage structures are available to a skilled artisan for creating a computer readable medium having recorded thereon an amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present invention. The choice of the data storage structure will generally be based on the means chosen to access the stored information. In addition, a variety of data processor programs and formats can be used to store the amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present invention on computer readable medium. The amino acid sequence information can be represented in a word processing text file, formatted in commercially-available, word processing software, or represented in the form of an ASCII file, or stored in a database application. A skilled artisan can readily adapt any number of data-processor structuring formats (e.g., text file or database) in order to obtain computer readable medium having recorded thereon the information of the present invention.

[0084] By providing computer readable media having stored therein a D1 protease sequence and/or atomic coordinates derived from x-ray diffraction data, a skilled artisan can routinely access the sequence and atomic coordinates or x-ray diffraction data to model a three dimensional structure of D1 protease, a subdomain thereof, or a ligand thereof. Computer algorithms are publicly and commercially available which allow a skilled artisan to access this data provided on a computer readable medium and analyze it for structure determination and/or rational inhibitor design. See, e.g., Biotechnology Software Directory, Mary Ann Liebert Publ., New York (1995).

[0085] The present invention further provides systems, particularly computer-based systems, which contain the amino acid sequence and/or atomic coordinate/x-ray diffraction data described herein. Such systems are designed to do structure determination and rational design for a D1 protease or at least one subdomain thereof. Non-limiting examples are microcomputer workstations available from Silicon Graphics Incorporated and Sun Microsystems running Unix based, Windows NT or IBM OS/2 operating systems.

[0086] As used herein, "a computer-based system" refers to the hardware means, software means, and data storage means used to analyze the amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present invention. The minimum hardware means of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate which of the currently available computer-based systems are suitable for use in the present invention. A monitor is optionally provided to visualize structure data.

[0087] As stated above, the computer-based systems of the present invention comprise a data storage means having stored therein a D1 protease or fragment amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present invention and the necessary hardware means and software means for supporting and implementing an analysis means. As used herein, "data storage means" refers to memory which can store amino acid sequence or atomic coordinate/x-ray diffraction data of the present invention, or a memory access means which can access manufactures having recorded thereon the amino acid sequence or atomic coordinate/x-ray diffraction data of the present invention.

[0088] As used herein, "search means" or "analysis means" refers to one or more programs which are implemented on the computer-based system to compare a target sequence or target structural motif with the amino acid sequence or atomic coordinate/x-ray diffraction data stored within the data storage means. Search means are used to identify fragments or regions of a D1 protease which match a particular target sequence or target motif. A variety of known algorithms are disclosed publicly, and a variety of commercially available software packages for conducting search means are available and can be used in the computer-based systems of the present invention. A skilled artisan can readily recognize that any one of the available algorithms or implementing software packages for conducting computer analyses can be adapted for use in the present computer-based systems.
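A search means of the simplest kind, one that locates fragments of a stored sequence matching a target sequence motif, might be sketched as follows. The regular-expression approach, the function name, and the toy sequence are all assumptions made for illustration; the sequence is invented and is not SEQ ID NO: 1.

```python
import re


def find_motif(sequence: str, pattern: str):
    """Return (start, end, fragment) for each non-overlapping match.

    Positions are 1-based and inclusive, the usual convention for
    residue numbering.
    """
    hits = []
    for match in re.finditer(pattern, sequence):
        hits.append((match.start() + 1, match.end(), match.group()))
    return hits


# Invented toy sequence and motif: Gly-Ser-any-Gly.
toy_sequence = "MAGSAGKLLGSTGE"
for start, end, fragment in find_motif(toy_sequence, "GS.G"):
    print(start, end, fragment)
```

Structural motif searches against atomic coordinates need geometric comparison rather than string matching, so this sketch covers only the sequence case.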

[0089] As used herein, "a target structural motif," or "target motif," refers to any rationally selected sequence or combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration or electron density map which is formed upon the folding of the target motif. There are a variety of target motifs known in the art. Protein target motifs include, but are not limited to, enzymatic active sites, structural subdomains, epitopes, functional domains and signal sequences. A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention.

[0090] A variety of comparing means can be used to compare a target sequence or target motif with the data storage means to identify structural motifs or interpret electron density maps derived in part from the atomic coordinate/x-ray diffraction data. A skilled artisan can readily recognize that any one of the publicly available computer modeling programs can be used as the search means for the computer-based systems of the present invention.

[0091] Structure Determination

[0092] Crystallization of the instant D1 protease enzyme may be accomplished by a variety of means. For example, crystals of the present D1 protease or D1 protease bound to a suitable ligand can be grown by vapor diffusion (either by sitting drop or hanging drop) and by microdialysis. Seeding of the crystals in some instances is required to obtain x-ray quality crystals. Standard micro and/or macro seeding of crystals may therefore be used.

[0093] Of course, the specific D1 protease of the present invention provided herein serves only as an example, since the crystallization process can tolerate a range of lengths of the flexible portion of the protein. Similarly, the crystallization process will also tolerate a limited removal of amino acids in the globular portion (e.g., less than ten amino acids). Therefore, any person with skill in the art of protein crystallization having the present teachings and without undue experimentation could construct a variety of alternative forms of the D1 protease which could be crystallized.

[0094] Once a crystal of the present invention is grown, x-ray diffraction data can be collected using a synchrotron source such as the Cornell High Energy Synchrotron Source (CHESS), under standard cryogenic conditions. A variety of methods are available. For example, the skilled person could characterize crystals by using x-rays produced in a conventional source (such as a sealed tube or a rotating anode) or using a synchrotron source. Methods of characterization include, but are not limited to, precision photography, oscillation photography and diffractometer data collection. Se-Met multiwavelength anomalous dispersion (MAD) data (Hendrickson, Science 254:51-58 (1991)) can be collected using reverse-beam geometry to record Friedel pairs at four x-ray wavelengths, corresponding to two remote points above and below the Se absorption edge and the K-absorption edge inflection point and peak. Data can be processed using readily available software such as DENZO and SCALEPACK (Szebenyi et al., AIP Conf. Proc. 417 (Synchrotron Radiation Instrumentation):187-191 (1997)), for example.

[0095] Alternatively, it is possible to define the three dimensional structure of a protein using computer based methods such as molecular replacement and homology modeling. The method of molecular replacement combines the atomic coordinates for a reference protein and the x-ray diffraction data from the protein of interest to determine the three dimensional protein structure. The object in molecular replacement is to use this combined set of data to determine the relative positions of atoms within the crystal. The method may be accomplished using commercially available software such as AMoRe, fully described by Navaza et al., Methods Enzymol. 276 (Macromolecular Crystallography, Part A):581-594 (1997). Within the context of the present invention, molecular replacement may be used to generate three dimensional structures for plant D1 protease enzymes by employing coordinates generated from the Scenedesmus obliquus enzyme and x-ray diffraction data from the plant enzyme.

[0096] The process of homology modeling uses a combination of the primary structure of the protein of interest and the crystal structure of at least one reference protein. The three-dimensional model is generated based on the protein's amino acid sequence. The model may be constructed by first aligning the amino acid sequence of the protein of interest with the sequence of the reference protein. In regions where the homology between the two proteins is low, information gleaned from secondary structure and site directed mutagenesis may be useful. Next, structurally conserved regions of the protein of interest are determined based on the alignment and then the coordinates for these regions are copied from the crystal structure data of the reference protein. The model is then refined using computerized methods. Homology modeling is a technique well known in the art and has been used to determine the three-dimensional structure of a variety of proteins (see for example Grazyna et al., Life Sciences, 61, 2507, (1997) describing the use of homology modeling for the determination of the three-dimensional structure of cytochrome P-450). The present invention provides a method for the determination of the three-dimensional structure of plant D1 protease enzymes using the crystal structure of the Scenedesmus enzyme and the amino acid sequence of the plant enzyme of interest.
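The coordinate-copying step of homology modeling described above can be sketched in a few lines. This is a deliberately simplified illustration: it assumes a precomputed pairwise alignment, one coordinate triple per reference residue, and treats only identical aligned residues as structurally conserved. All names and the toy data are hypothetical, and the subsequent refinement step is not shown.

```python
def copy_conserved_coordinates(target_aln, reference_aln, reference_coords, gap="-"):
    """Copy reference coordinates onto conserved target positions.

    target_aln, reference_aln: rows of one pairwise alignment (equal length).
    reference_coords: one (x, y, z) tuple per reference residue, in order.
    Returns {target_residue_index: (x, y, z)} with 0-based indices.
    Only identical aligned residues are treated as conserved (assumption).
    """
    if len(target_aln) != len(reference_aln):
        raise ValueError("aligned sequences must have equal length")
    if len(reference_coords) != len(reference_aln.replace(gap, "")):
        raise ValueError("need one coordinate per reference residue")
    model = {}
    t_index = r_index = 0
    for t, r in zip(target_aln, reference_aln):
        if t != gap and r != gap and t == r:
            model[t_index] = reference_coords[r_index]
        # Advance each residue counter only past real residues, not gaps.
        if t != gap:
            t_index += 1
        if r != gap:
            r_index += 1
    return model


# Toy alignment with placeholder coordinates (not from FIG. 1).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(copy_conserved_coordinates("AC-DE", "ACG-E", coords))
```

Target positions that receive no coordinates (here the residue aligned against a gap) are the regions that would need to be built by other means.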

[0097] The Fold of the Structure

[0098] D1 protease is an elongated monomeric molecule about 77.5 Å long, with the widest cross section, measuring 47.1 Å × 27.6 Å, located in the middle section of the molecule. It contains three folding domains: (i) the A domain (amino acid residues 78-147, 401-415) containing a three-helix bundle followed by a short β-strand and a two-turn helix; (ii) the B domain (residues 160-249) [which is a PDZ domain, as described in Ponting, Protein Science 6, 464 (1997)] containing a severely twisted five-stranded anti-parallel β-sheet with a two-turn helix sitting on top; and (iii) the C domain (residues 254-400, 416-463) containing two β-sheets. Within the C domain, one β-sheet is a six-stranded mixed β-sheet twisted about 100 degrees, with three helices packed against one side of the sheet and the C-terminal helix on the other side. The other β-sheet is a small three-stranded anti-parallel β-sheet which has some contact with the three helices on the other sheet. The fifth strand on the large sheet and the first strand on the small sheet extend to the A domain and together with the β-strand in that domain form a three-stranded anti-parallel sheet. This part of the two β-strands (residues 401-415) is an integral part of the A domain. The linkers between domain A and domain B, as well as between domain B and domain C, have weaker density, indicating that the structure in these regions is more flexible than the rest of the structure. The B domain has very few interactions with the other two domains and therefore it is possible that the conformation observed in this structure may be affected by crystal packing. This domain may have the ability to adjust its orientation upon the binding of different substrates or inhibitors, or maybe even during the course of reaction. Superposition of the C2 I form and R32 form structures shows small but detectable domain movement.
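The domain boundaries given above lend themselves to a simple lookup, for example when annotating residues during model inspection. The sketch below only encodes the residue ranges stated in this paragraph; the function and variable names are illustrative.

```python
# Residue ranges of the three folding domains, as stated above.
DOMAINS = {
    "A": [(78, 147), (401, 415)],
    "B": [(160, 249)],
    "C": [(254, 400), (416, 463)],
}


def domain_of(residue_number: int):
    """Return 'A', 'B' or 'C', or None for residues outside the listed ranges."""
    for name, segments in DOMAINS.items():
        for first, last in segments:
            if first <= residue_number <= last:
                return name
    return None


for residue in (100, 200, 300, 410, 155):
    print(residue, domain_of(residue))
```

Residues that fall between the listed ranges, such as 148-159 and 250-253, return None, which corresponds to the linker regions between domains.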

[0099] Analysis of the Active Site

[0100] Unlike the classical serine proteases, D1 protease does not have a steep active site cleft. Instead, its active site region is rather open, similar to the one in HCV protease (PDB ID code 1A1R; J. L. Kim et al., Cell 87:343 (1996)). The active site is formed by all three domains, with the C domain on one side and the A and B domains on the other. This shallow cleft runs across the entire 47.1 Å cross section of the molecule. The opening of the cleft is about 15 Å throughout. Both active site residues, Lys397 and Ser372, are located on the large C domain, in the middle of the cross section and at the bottom of the cleft. Lys397 is in the middle of the fifth strand of the large β-sheet, one of the two strands that extend to the A domain. Ser372 is at the N-terminus of the 3rd α-helix. The distance between the main-chain CA atoms of these two residues is 5.1 Å. The NE of Lys397 is hydrogen bonded to the OG of Thr168. In form C2 I, the OG of the serine side chain interacts with two water molecules. In form R32, the side chain of the serine shows two conformations, the first interacting with a water molecule and the second interacting with the main-chain carbonyl of Lys397. These observations show that, without bound substrate, the active site residues can have more than one conformation in solution. In both cases, the two side chains are not within hydrogen bonding distance. However, computer modeling shows that they can be brought to form a hydrogen bond for catalysis by adjusting their side chain torsion angles. No density for the inhibitor phenylboronic acid, which was co-crystallized with the enzyme, can be found in the immediate vicinity of these two residues. In the active site cleft, there is a large and open hydrophobic pocket formed by the A and C domains with residues 320, 324, 337, 339, 347, 349, 376, 399, 400, and 419 on one side of the active site. This pocket is large enough to accommodate three or four hydrophobic or neutral side chains. It is the likely binding site for the P side of the substrates bordering the scissile bond, in which the sequences of the first four residues are absolutely conserved. There is a smaller hydrophobic patch, formed by residues 140, 152, 212, 213, and 403, on the other side of the active site. The patch is located on the bottom of the cleft between domains A and B. This part of the cleft is slightly deeper, however. This is likely the binding pocket for the P' side of the substrate, in which only the P1 and the P2' residues of the substrate are also hydrophobic.
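Interatomic distances such as the CA-CA separation quoted above are computed directly from atomic coordinates. The sketch below assumes the coordinates are stored as fixed-column PDB-style ATOM records, which is an assumption about the file format; the helper names are hypothetical, and the coordinates are placeholders chosen for easy arithmetic, not values from FIG. 1.

```python
import math


def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a simplified fixed-column ATOM record.

    Columns 31-38, 39-46 and 47-54 hold x, y and z. Atom-name placement
    inside columns 13-16 is simplified here.
    """
    return (
        f"ATOM  {serial:5d} {name:^4} {res_name:>3} {chain}{res_seq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


def parse_atom(line):
    """Extract atom name, residue name, residue number and coordinates."""
    return {
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def distance(atom_a, atom_b):
    """Euclidean distance between two parsed atoms, in the coordinate units."""
    return math.dist(atom_a["xyz"], atom_b["xyz"])


# Placeholder coordinates (a 3-4-5 triangle), NOT taken from the structure.
lys_ca = parse_atom(format_atom(1, "CA", "LYS", "A", 397, 0.0, 0.0, 0.0))
ser_ca = parse_atom(format_atom(2, "CA", "SER", "A", 372, 3.0, 4.0, 0.0))
print(round(distance(lys_ca, ser_ca), 3))
```

The same distance function can be applied to side-chain atoms to check whether two groups lie within a chosen hydrogen bonding cutoff; the cutoff value itself would have to be supplied by the user.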

[0101] Analysis of the Surface Properties

[0102] The natural substrate of D1 protease is the C-terminal extension of the D1 polypeptide of the PS II reaction center, an integral membrane protein. It is likely that the D1 protease interacts with the membrane to facilitate the binding of substrate. However, electrostatic calculations, using the program MOLMOL (Koradi, R., Billeter, M., and Wuthrich, K., J. Mol. Graphics 14:51-55 (1996)), show no extensive positively charged areas on the protein surface that can be used for interaction with the membrane surface. It also has no large hydrophobic patch outside the active site cleft that can be used as a membrane binding site. This suggests that if the protease interacts with the membrane, the interacting area should be small and local. One possible candidate is a small cluster of four conserved Arg/Lys residues (residues 90, 94, 108 and 110) in the A domain near the putative hydrophobic binding pocket for the P side of the substrate.

[0103] Two conserved cysteine residues Cys260 and Cys451 are on the surface of the protein, and adjacent to each other. These two are the only cysteine residues in the Scenedesmus obliquus enzyme. They are also the only conserved cysteine residues among all known eukaryotic D1 proteases. They are remote from the active site cleft, and they form a disulfide bond in the native structure. In the Se-Met mutant structure, the disulfide bond is reduced, since the protein was prepared in the presence of 10 mM of reducing agent DTT. The breakage of this disulfide bond does not affect the enzymatic activity nor does it substantially change the structure of the Scenedesmus enzyme.

[0104] Predictive Methods for Ligand Design

[0105] The coordinates shown in FIG. 1 define the hydrogen bonding network for the D1 protease Scenedesmus enzyme. This model can be used for visualizing the orientations and interactions of amino acids within the active site for the purpose of designing novel ligands and substrates of the enzyme through the use of computer modeling using a docking program such as GRAM, DOCK, or AUTODOCK (Dunbrack et al., 1997, supra), to identify potential ligands and/or antagonists for D1 protease. This procedure can include computer fitting of potential ligands to the ligand binding site to ascertain how well the shape and the chemical structure of the potential ligand will complement the binding site (Bugg et al., Scientific American December: 92-98 (1993); West et al., TIPS 16:67-74 (1995)). Computer programs can also be employed to estimate the attraction, repulsion, and steric hindrance of the two binding partners (i.e., the ligand-binding site and the potential ligand). Generally the tighter the fit, the lower the steric hindrances, and the greater the attractive forces, the more potent the potential ligand or inhibitor since these properties are consistent with a tighter binding constant. Furthermore, the greater the specificity in the design of a potential ligand the more likely that the ligand will not interact as well with other proteins. This will minimize potential side-effects due to unwanted interactions with other proteins.
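The qualitative scoring idea above (attraction rewarded, steric hindrance penalized) can be illustrated with a toy pairwise energy. The 12-6 Lennard-Jones form and the parameter values below are assumptions chosen for illustration only; this is not the scoring function of GRAM, DOCK, or AUTODOCK, and the coordinates are placeholders.

```python
import math


def lj_energy(r, epsilon=0.2, sigma=3.5):
    """12-6 Lennard-Jones pair energy for a separation r > 0.

    Positive values indicate steric clash, negative values attraction.
    epsilon (well depth) and sigma (zero-energy distance) are
    illustrative defaults, not fitted parameters.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def score_pose(ligand_xyz, site_xyz, epsilon=0.2, sigma=3.5):
    """Sum pair energies over all ligand-atom / site-atom pairs (lower is better)."""
    return sum(
        lj_energy(math.dist(l, s), epsilon, sigma)
        for l in ligand_xyz
        for s in site_xyz
    )


site = [(0.0, 0.0, 0.0)]
good_pose = [(4.0, 0.0, 0.0)]      # near the energy minimum
clashing_pose = [(2.0, 0.0, 0.0)]  # too close, strong repulsion
print(score_pose(good_pose, site) < score_pose(clashing_pose, site))
```

Real docking programs add terms for electrostatics, hydrogen bonding and desolvation, so a score like this one only captures shape complementarity in the crudest sense.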

[0106] Initially potential ligands and/or agonists can be selected for their structural similarity to a known ligand, such as the tetrapeptide chloromethylketone (Z-LDLA-CMK) [SEQ ID NO: 11], where Z=carbobenzoxy, and CMK=chloromethylketone, and LDLA represent the tetrapeptide Leu-Asp-Leu-Ala. The structural analog can then be systematically modified by computer modeling programs until one or more promising potential ligands are identified. Alternatively a potential ligand could be obtained by initially screening a random peptide library produced by recombinant bacteriophage for example, (Scott and Smith, Science, 249:386-390 (1990); Cwirla et al., Proc. Natl. Acad. Sci., 87:6378-6382 (1990); Devlin et al., Science 249:404-406 (1990)). Preferred for use in the present invention is the program Sybyl® (TRIPOS).

[0107] Within the computer program Sybyl® (TRIPOS), ligand molecules may be visualized by using the Build/Edit algorithms to make and break bonds and to add or delete atoms to aid in the design of novel ligands and substrates. The models allow for the visualization of designed or other inhibitors in three dimensions within the active site (after removal of the ligand structures from the models) by using the docking routine within Sybyl® or other such programs to manually position such inhibitors within the active site. After manually docking the ligands, the D1 protease-ligand structures may be minimized by using the minimization procedures within Sybyl® in order to improve the models. After deleting the ligand, computer programs such as DOCK® (written by Paul McCloskey, University of California; a WWW site for the DOCK® program may be found at the URL http://www.cmpharm.ucsf.edu/kuntz/dock.html) or UNITY® (TRIPOS) may be used for computer automated dockings of three dimensional libraries of compounds, as described in Kuntz, I. D. et al., Acc. Chem. Res. 27:117-123 (1994) and Kuntz, I. D., Science 257:1078-1082 (1992), which aid in the discovery of novel ligands and substrates. Such programs apply constraints imposed by the enzyme active site and other constraints imposed by the user for computer generation of three dimensional sub-structures which are useful for searching through three dimensional data bases. The models lacking ligands using coordinates as displayed in FIG. 1 (for example) may be applied to computer programs such as Leapfrog® (TRIPOS) for building virtual molecules within the active site from small three dimensional molecular fragments for the purpose of discovering new ligands and substrates of the enzyme. Sybyl®, DOCK®, UNITY®, Leapfrog® and other such computer programs can calculate an approximate binding energy for each of the molecules docked, thus allowing the user to select favorable molecules for synthesis and substrate analysis against the activity of the enzyme. Useful ligands of D1 protease discovered by these enablements may be evaluated for their ability to inhibit the enzyme.

EXAMPLES

General Methods

[0108] Standard recombinant DNA and molecular cloning techniques used here are well known in the art and are described by Sambrook, J., Fritsch, E. F. and Maniatis, T., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989) (hereinafter "Maniatis"); by T. J. Silhavy, M. L. Berman, and L. W. Enquist, Experiments with Gene Fusions, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1984); and by Ausubel et al., Current Protocols in Molecular Biology, pub. by Greene Publishing Assoc. and Wiley-Interscience (1987).

[0109] Materials and methods suitable for the maintenance and growth of bacterial cultures are well known in the art. Techniques suitable for use in the following examples may be found as set out in Manual of Methods for General Bacteriology (Phillipp Gerhardt, R. G. E. Murray, Ralph N. Costilow, Eugene W. Nester, Willis A. Wood, Noel R. Krieg and G. Briggs Phillips, eds), American Society for Microbiology, Washington, D.C. (1994) or by Thomas D. Brock in Biotechnology: A Textbook of Industrial Microbiology, Second Edition, Sinauer Associates, Inc., Sunderland, Mass. (1989). All reagents, restriction enzymes and materials used for the growth and maintenance of bacterial cells were obtained from Aldrich Chemicals (Milwaukee, Wis.), DIFCO Laboratories (Detroit, Mich.), GIBCO/BRL (Gaithersburg, Md.), or Sigma Chemical Company (St. Louis, Mo.) unless otherwise specified.

[0110] Manipulations of genetic sequences were accomplished using the suite of programs available from the Genetics Computer Group Inc. (Wisconsin Package Version 9.0, Genetics Computer Group (GCG), Madison, Wis.). Where the GCG program "Pileup" was used, the default gap creation value of 12 and the default gap extension value of 4 were used. Where the GCG "Gap" or "Bestfit" programs were used, the default gap creation penalty of 50 and the default gap extension penalty of 3 were used. In any case where GCG program parameters were not prompted for, in these or any other GCG program, default values were used.

[0111] The meaning of abbreviations is as follows: "sec" means second(s), "min" means minute(s), "h" means hour(s), "d" means day(s), "µL" means microliter(s), "mL" means milliliter(s), "L" means liter(s), "mM" means millimolar, "M" means molar, "mmol" means millimole(s).

[0112] Plasmids and Bacterial Strains:

[0113] Plasmids: Scenedesmus obliquus D1P insert in pET-32a expression vector

[0114] Bacterial host strain: BL21(DE3)pLysS

[0115] Media and Buffers:

[0116] Media:

[0117] LB medium

[0118] M9 complete medium:

[0119] 2× M9 salts

[0120] 2 mM MgSO4

[0121] 25 µg/mL FeSO4·7H2O

[0122] 0.4% glucose

[0123] 40 µg/mL Amino Acid mix I

[0124] 40 µg/mL Amino Acid mix II

[0125] 1 µg/mL vitamin mix

[0126] 2-20 µg/mL uracil

[0127] 40 µg/mL L-methionine or L-seleno-methionine

[0128] pH ~7.0

[0129] Stock solutions for preparing M9 complete medium. 20× M9 salts:

[0130] 10 g NH4Cl

[0131] 30 g KH2PO4

[0132] 68 g Na2HPO4 or 128 g Na2HPO4·7H2O

[0133] add H2O to 500 mL

[0134] Amino Acid mix I:

[0135] 16 amino acids, each at 4 mg/mL

[0136] excluding Met, Tyr, Trp, Phe

[0137] Amino Acid mix II:

[0138] 3 amino acids, each at 4 mg/mL

[0139] Tyr, Trp, Phe

[0140] Tyr is hard to dissolve, add last

[0141] final solution may still be turbid, resuspend well before use

[0142] L-methionine or L-seleno-methionine: 10 mg/mL

[0143] Uracil: 2 mg/mL, dissolve in H2O at 65 °C

[0144] Glucose: 20%

[0145] MgSO4: 1 M

[0146] FeSO4·7H2O: 12.5 mg/mL

[0147] Vitamin mix: riboflavin, niacinamide, pyridoxine monohydrochloride and thiamine, each at 1 mg/mL; store at -20 °C. Riboflavin may not dissolve completely; filter the mix.
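As a worked illustration of how the stock solutions above relate to the final M9 complete medium, the following minimal sketch applies C(stock) x V(stock) = C(final) x V(final) to each component. The concentrations are those listed above; the dictionary layout and the function name are illustrative assumptions, and uracil is omitted because its final concentration is given only as a range.

```python
# Volume of each stock needed for a given volume of M9 complete medium,
# from C_stock * V_stock = C_final * V_final.
# Each entry is (stock concentration, final concentration) in the same unit.
RECIPE = {
    "M9 salts (x-fold)": (20.0, 2.0),
    "MgSO4 (mM)": (1000.0, 2.0),
    "FeSO4-7H2O (ug/mL)": (12500.0, 25.0),
    "glucose (%)": (20.0, 0.4),
    "amino acid mix I (ug/mL)": (4000.0, 40.0),
    "amino acid mix II (ug/mL)": (4000.0, 40.0),
    "vitamin mix (ug/mL)": (1000.0, 1.0),
    "Met or Se-Met (ug/mL)": (10000.0, 40.0),
}


def stock_volumes_ml(final_volume_ml):
    """Return the mL of each stock required for final_volume_ml of medium."""
    return {
        name: final_volume_ml * final / stock
        for name, (stock, final) in RECIPE.items()
    }


volumes = stock_volumes_ml(1000.0)  # one liter of medium
for name, ml in volumes.items():
    print(f"{name}: {ml:g} mL of stock per liter")
```

The remaining volume is made up with water; the sketch does not account for pH adjustment.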

[0148] Buffers:

[0149] Lysis buffer:

[0150] 20 mM HEPES pH 7.2

[0151] 1 mM EDTA

[0152] 5 mM MgCl.sub.2

[0153] 0.1% Triton X-100

[0154] 0.1 mg/mL lysozyme

[0155] 0.01 mg/mL RNase

[0156] 0.05 mg/mL DNase

[0157] Denaturing buffer:

[0158] 8 M guanidine hydrochloride

[0159] 20 mM HEPES pH 7.2

[0160] 1 mM EDTA

[0161] 5 mM DTT, freshly added

[0162] 100 µM PMSF, freshly added

[0163] Inclusion body wash buffer:

[0164] 20 mM HEPES pH 7.2

[0165] 1 mM EDTA

[0166] 0.1% Triton X-100

[0167] 0.3 M NaCl

[0168] Refolding Buffer:

[0169] 20 mM MES pH 6.0

[0170] 10% Glycerol

[0171] 10 mM CHAPS

[0172] 1 mM EDTA

[0173] 1 mM GSSG

[0174] 1 mM GSH

[0175] 100 µM PMSF

[0176] Wash Buffer:

[0177] 20 mM pH 6.5

[0178] 10% Glycerol

[0179] 10 mM CHAPS

[0180] 1 mM EDTA

[0181] 100 µM PMSF

[0182] MonoQ Buffer:

[0183] Buffer A:

[0184] 20 mM MES pH 6.5

[0185] 10% Glycerol

[0186] 10 mM CHAPS

[0187] 1 mM EDTA

[0188] 100 µM PMSF

[0189] Buffer B:

[0190] 20 mM MES pH 6.5

[0191] 10% Glycerol

[0192] 10 mM CHAPS

[0193] 1 mM EDTA

[0194] 100 µM PMSF

[0195] 1 M NaCl

[0196] TSK Buffer:

[0197] 20 mM HEPES pH 7.2

[0198] 10% Glycerol

[0199] 10 mM CHAPS

[0200] 1 mM EDTA

[0201] 100 µM PMSF

[0202] Buffer modifications for L-seleno-methionine labeled protein:

[0203] 10 mM DTT was added into all buffers

Example 1

[0204] Cloning Scenedesmus obliquus D1 Protease Gene for Expression

[0205] The polymerase chain reaction (PCR) was used to amplify the coding region for the mature D1 protease, simultaneously using as template the overlapping 5' RACE and 3' RACE PCR products described in Trost et al. (J. Biol. Chem. 272:20348-20356 (1997)). The 5' primer sequence was ATG ACC ATG GTG ACA AGC GAG CAG CTG CTG TT (SEQ ID NO: 2) and contained an NcoI site, while the 3' primer sequence was AGC TGA TGC GGA TCC TTA CCC AAA CAG CCG CGG CGC A (SEQ ID NO: 3) and contained a BamHI site. The resulting 1.2 kb product was initially ligated into the pGEM-T vector (Promega, Madison, Wis.) and transformed into Escherichia coli, which was plated on LB ampicillin. Plasmid DNA was recovered from selected colonies using the Promega Wizard miniprep kit, and then digested with NcoI and BamHI restriction enzymes to excise the D1 protease gene fragment. This fragment was ligated into the expression vector pET-32a (Novagen). Cloning into the pET-32a vector results in the expression of a fusion protein consisting of thioredoxin plus two affinity tags linked to mature D1 protease. Cleavage of the fusion by enterokinase yields a mature D1 protease (D1 protease (+AM); SEQ ID NO: 10) that is longer by two amino acids (alanine+methionine) than the native mature protein. Nucleotide sequencing was used to confirm the wild type sequence.

Example 2

[0206] Site-Directed Mutagenesis

[0207] MAD (Multiwavelength Anomalous Diffraction), using the selenium K-edge, was used for solving the crystallographic phase problem. Ideally, MAD phasing requires the presence of at least one seleno-methionine per 10 kDa of protein mass. As the wild type D1 protease (+AM) (SEQ ID NO: 10) contains only three methionines, it was decided to add two more to the protein. Site-directed mutagenesis was used to replace codons Leu57 (corresponding to Leu132 of SEQ ID NO: 1) and Leu135 (corresponding to Leu210 of SEQ ID NO: 1) with methionine codons, giving the polypeptide as set forth in SEQ ID NO: 4. These leucines were chosen because methionines are located at these positions in higher plant versions of the D1 protease (e.g. spinach, wheat and tobacco). The mutated protease thus contains five methionines per 40.8 kDa, suitable for MAD phasing using seleno-methionine. The mutations were introduced simultaneously using a procedure involving PCR, reannealing, and fill-in synthesis (FIG. 2). The primers GAT GCC ATC CGC AAG ATG CTG GCG GTG CTG GAC (L132M-fwd; SEQ ID NO: 5) and GTC CAG CAC CGC CAG CAT CTT GCG GAT GGC ATC (L132M-rev; SEQ ID NO: 6) were used to modify L132, while the primers ACG GCT GTG AAG GGG ATG TCG CTG TAT GAC GTG (L210M-fwd; SEQ ID NO: 7) and CAC GTC ATA CAG CGA CAT CCC CTT CAC AGC CGT (L210M-rev; SEQ ID NO: 8) were used to modify L210. The mutagenic PCR was done in two separate reactions, using as template the pET-32a-D1P(+AM) protease expression construct described above. Oligonucleotide primers L132M-fwd (SEQ ID NO: 5) and L210M-rev (SEQ ID NO: 8) produced a 270 bp fragment. Oligonucleotide primers L132M-rev (SEQ ID NO: 6) and L210M-fwd (SEQ ID NO: 7) produced a 6.76 kb fragment, which included the vector sequence. The two fragments were combined, melted, and annealed so as to prime each other for synthesis of a complete 7.03 kb construct. The synthesis reaction contained 7.5 units Pfu polymerase, 1× reaction buffer (Stratagene) and 5 µL of 10 mM nucleotide stock (Stratagene) in a volume of 50 µL. The reaction mix was held at 72 °C for 30 min to allow for polishing of 3' extensions, then cycled once at 94 °C for 1 min, 60 °C for 30 sec and 68 °C for 20 min. Ten µL of the synthesis reaction was used to transform XL1-Blue host cells, which were plated on LB ampicillin. Six colonies were picked for sequence verification. All contained the desired mutations.
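The primer design described above can be checked with a short script. The following is a minimal sketch, not part of the original procedure: it confirms that each forward/reverse pair is an exact reverse complement, translates the forward primers in the reading frame shown above to confirm that each encodes a methionine at the mutated position, and sums the two fragment sizes. The helper names are illustrative; the codon table is the standard genetic code.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Primer sequences as given in the text (SEQ ID NOS: 5-8), spaces removed.
PRIMERS = {
    "L132M-fwd": "GAT GCC ATC CGC AAG ATG CTG GCG GTG CTG GAC".replace(" ", ""),
    "L132M-rev": "GTC CAG CAC CGC CAG CAT CTT GCG GAT GGC ATC".replace(" ", ""),
    "L210M-fwd": "ACG GCT GTG AAG GGG ATG TCG CTG TAT GAC GTG".replace(" ", ""),
    "L210M-rev": "CAC GTC ATA CAG CGA CAT CCC CTT CAC AGC CGT".replace(" ", ""),
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate a DNA sequence in frame 0, ignoring any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


for site in ("L132M", "L210M"):
    fwd, rev = PRIMERS[site + "-fwd"], PRIMERS[site + "-rev"]
    print(site, "pair complementary:", reverse_complement(fwd) == rev)
    print(site, "encoded peptide:", translate(fwd))

# The two PCR products should add up to the full construct.
print("construct size (bp):", 270 + 6760)
```

In each translated peptide the methionine sits at the center, flanked by five wild type residues on either side.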

Example 3

[0208] Expression of Scenedesmus obliquus D1 Protease

[0209] The Escherichia coli host expression strain BL21(DE3)pLysS (Novagen) was transformed with plasmid pET-32a-D1P(+AM) according to standard protocols (Novagen). The transformed cells were plated on solid LB medium containing 150 µg/mL ampicillin and incubated overnight at 37 °C. A single colony containing the mature wild-type Scenedesmus obliquus D1 protease expression clone (+AM) was inoculated into 250 mL LB medium plus carbenicillin (100 µg/mL) and incubated at 37 °C overnight on a rotary shaker. The overnight culture was used to inoculate 9.75 L fresh LB medium plus carbenicillin in a 10-L fermentor. Once the optical density reached 0.4-0.5 at 600 nm, 1 mM IPTG (isopropyl-β-D-thiogalactopyranoside) was added to induce expression. After 2.5-3 h of induction at 37 °C, the cells were harvested by centrifugation at 8000 rpm using a GSA rotor (Sorvall), frozen in liquid nitrogen and stored at -75 °C. The 10-L culture yielded about 25 g of wet cell paste.

[0210] To obtain L-seleno-methionine labeled protein, a single colony of BL21(DE3)pLysS(met-), bearing the expression vector with the mutated (Leu132 and Leu210 replaced by Met) mature Scenedesmus obliquus D1 protease (+AM), was inoculated into 20 mL M9 complete medium containing L-methionine (40 µg/mL) plus 100 µg/mL carbenicillin. The culture was incubated at 37 °C overnight on a rotary shaker. The bacteria were then collected, washed and resuspended in 20 mL M9 complete medium without L-methionine. Two liters of M9 complete medium containing L-seleno-methionine (40 µg/mL) and 100 µg/mL carbenicillin were inoculated with the washed bacteria. The two liters were distributed equally among four 6-L flasks. The cells were grown at 37 °C until the OD600 reached 0.6. Protein expression was then induced with 1 mM IPTG at 37 °C and allowed to continue overnight. The cells were harvested by centrifugation at 8000 rpm using a GSA rotor (Sorvall). Approximately 5 g wet weight of bacterial paste was collected per 2 L of culture.

Example 4

[0211] Inclusion Body Isolation

[0212] Bacterial cell paste was resuspended in Lysis buffer (1 g wet weight cells/2 mL Lysis buffer) and incubated on ice for 15 min. The lysate was sonicated (Branson Sonifier cell disruptor 185) for 1 min on ice to ensure complete lysis. Following sonication, the lysate was incubated on ice for another 30 min with occasional mixing, and centrifuged at 20,000 × g for 20 min. The pellet containing inclusion bodies was collected and washed at least five times with Inclusion body wash buffer before being solubilized with Denaturing buffer.

Example 5

[0213] Refolding of Solubilized Fusion Protein

[0214] Fifty mL of fusion protein, solubilized in Denaturing buffer (OD280 = 1), was added while stirring to 1 L of Refolding buffer at a rate of 0.1 mL/min at 4 °C. The Refolding buffer+protein was then left to stir overnight at 4 °C.
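The refolding step above implies an addition time, a dilution factor and a residual denaturant concentration that are not stated explicitly. The following minimal sketch works these out from the quantities given, assuming that the volumes are simply additive and that the denaturant is the 8 M guanidine hydrochloride of the Denaturing buffer; the function name and return keys are illustrative.

```python
def refolding_dilution(sample_ml=50.0, buffer_ml=1000.0,
                       rate_ml_per_min=0.1, denaturant_molar=8.0):
    """Derived quantities for slow dilution of denatured protein into buffer.

    Assumes additive volumes (a simplification) and no denaturant in the
    refolding buffer itself.
    """
    total_ml = sample_ml + buffer_ml
    return {
        "addition_time_h": sample_ml / rate_ml_per_min / 60.0,
        "dilution_factor": total_ml / sample_ml,
        "residual_denaturant_molar": denaturant_molar * sample_ml / total_ml,
    }


result = refolding_dilution()
for key, value in result.items():
    print(f"{key}: {value:.3f}")
```

Under these assumptions the addition takes a little over eight hours and leaves a few hundred millimolar denaturant, which is why a subsequent wash is needed before ion exchange chromatography.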

Example 6

[0215] Sample Preparation and Chromatography Purifications

[0216] The Refolding buffer+protein was concentrated to 50 mL and washed with MonoQ buffer A to lower the guanidine hydrochloride concentration to less than 10 mM. The concentrated and washed fusion protein was loaded onto an HR10/10 MonoQ column (Pharmacia) preequilibrated with MonoQ buffer A. The protein was eluted using a linear 0-1 M NaCl gradient. The active fusion protein peak eluting at 90 mM NaCl was pooled, concentrated and digested with recombinant enterokinase (Novagen) at a concentration of 1 unit/300 µg fusion protein to release the mature Scenedesmus obliquus D1 protease (+AM). The recombinant protease (D1 protease (+AM)) contains two additional amino acids (Ala and Met) at its N-terminus as compared to the natural mature D1 protease. The extra residues have no effect on enzyme activity. The products of the overnight digestion were then desalted on a BioRad Econo-Pac 10DG column and loaded onto a MonoQ HR10/10 column preequilibrated with MonoQ buffer A. Gradient elution proceeded as with the fusion protein except that the mature polypeptide eluted at 78 mM NaCl. The active fractions were pooled and concentrated to less than 500 µL for size exclusion chromatography on a G-2000SW TSK-gel column (TosoHaas). The active mature Scenedesmus obliquus D1 protease (+AM) fractions were pooled, concentrated to 3.5 mg/mL in an Amicon concentrator cell (YM30 membrane), frozen in liquid nitrogen and stored at -75 °C.

Example 7

[0217] Preparation of D1 Protease for Crystallography

[0218] The concentrated Scenedesmus obliquus D1 protease (+AM) was diluted 40-fold into 20 mM HEPES-NaOH, pH 7.5, plus 1 mM phenylboronic acid and concentrated back to 50 µL using a Centricon 30 concentrator (Millipore). This enzyme was then used as is for crystallization trials.

Example 8

[0219] Crystallization of D1 Protease from Scenedesmus obliquus

[0220] Single crystals of D1 protease from Scenedesmus obliquus were obtained at room temperature (~20 °C) by vapor diffusion in hanging drops. The hanging drop experiments were set up on Q plate II multi-well trays from Hampton Research. The crystallization drops consisted of 1 µL of 3.5 mg/mL protein in 20 mM HEPES pH 7.5 and 1 mM phenylboronic acid, and 1.0 µL of reservoir solution. Each drop was mixed on a siliconized glass cover slip. The cover slip was inverted and placed over a reservoir containing 0.5 or 1.0 mL of reservoir solution. The crystallization tray was then sealed with clear tape. Crystals were obtained from two different conditions. The reservoir solution in condition number one contained 17-18% PEG 4K, 10% isopropanol and 0.1 M HEPES pH 7.5. The reservoir solution in condition number two contained a mixture of 30-40% saturated ammonium sulfate and 10-20% of 2 M lithium sulfate. Two crystal forms with the same space group C2 and slightly different cell dimensions were obtained from condition number one. Form C2 I has the cell dimensions a=110.9 Å, b=64.05 Å, c=63.4 Å and β=122.0°; form C2 II has the cell dimensions a=108.6 Å, b=63.12 Å, c=60.68 Å and β=119.8°. The diffraction limit for both forms is 1.8 Å. These crystals were transferred to a stabilizing solution containing 20% PEG 4000, 10% isopropanol, 0.1 M HEPES pH 7.5 and 20% glycerol prior to data collection at cryo-temperature. The crystals were flash frozen either in liquid propane or in a -170 °C cryo-stream. The crystals obtained from condition number two have the space group R32 and cell dimensions a=b=148.7 Å, c=100.31 Å using hexagonal indexing. These crystals were quickly washed in a solution containing 45% saturated ammonium sulfate, 10% of 2 M lithium sulfate and 20% glycerol immediately before being placed in the -160 or -170 °C cold nitrogen stream for data collection.
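As a plausibility check on the cell dimensions above, the unit cell volume and the Matthews coefficient (cell volume per dalton of protein) can be estimated. The following is a minimal sketch under stated assumptions that are not given in the text: one protein molecule per asymmetric unit, 4 asymmetric units per C2 cell and 18 per R32 cell in the hexagonal setting, and a protein mass of 40.8 kDa taken from Example 2.

```python
import math


def monoclinic_volume(a, b, c, beta_deg):
    """Unit cell volume (cubic angstroms) of a monoclinic cell."""
    return a * b * c * math.sin(math.radians(beta_deg))


def hexagonal_volume(a, c):
    """Unit cell volume (cubic angstroms) of a hexagonal cell (gamma = 120 deg)."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_coefficient(cell_volume, molecules_per_cell, mass_da):
    """Cell volume per dalton of protein (cubic angstroms per Da)."""
    return cell_volume / (molecules_per_cell * mass_da)


MASS_DA = 40800.0  # from Example 2; assumed to apply to all forms

cells = {
    # name: (volume, assumed molecules per cell)
    "C2 I": (monoclinic_volume(110.9, 64.05, 63.4, 122.0), 4),
    "C2 II": (monoclinic_volume(108.6, 63.12, 60.68, 119.8), 4),
    "R32": (hexagonal_volume(148.7, 100.31), 18),
}

for name, (volume, z) in cells.items():
    vm = matthews_coefficient(volume, z, MASS_DA)
    print(f"{name}: V = {volume:.0f} A^3, Vm = {vm:.2f} A^3/Da")
```

Values of roughly 2 to 3 cubic angstroms per dalton are typical of protein crystals, so a result in that range would be consistent with the assumed cell contents.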

Example 9

[0221] Data Collection and Structure Determination of L132M/L210M Mutant of Scenedesmus obliquus D1 Protease

[0222] The structure of the L132M/L210M mutant of Scenedesmus obliquus D1 protease has been solved to 2.2 Å resolution by the selenomethionine multiwavelength anomalous diffraction (MAD) method (Hendrickson, W. A., Horton J. R., LeMaster D. M., EMBO J. 9:1665-1672 (1990)). The native enzyme has only three methionines, including one at the N-terminus. The double mutant was designed and created to generate additional selenium sites in order to augment the MAD signal for structure determination. The Se-Met mutant was crystallized in conditions close to those of the native enzyme, in the presence of 0-0.5% BME or 0-5 mM DTT. MAD data sets were collected at the APS 5-ID beam line. The exact anomalous absorption edge of the Se-Met protein crystal used for data collection was determined by X-ray fluorescence measurement using an AMPTEK detector. A four-wavelength MAD data set at the wavelengths of the inflection point (0.97891 Å), the peak (0.97876 Å), high remote (0.96369 Å) and low remote (0.99462 Å) of the anomalous absorption spectrum was collected at a temperature of -160 °C, using a MAR CCD detector. The entire four-wavelength data set was collected from one C2 I form crystal. A data set of 100% completeness at a resolution of 1.8 Å was collected for each wavelength. These data were processed with the program DENZO/SCALEPACK (Otwinowski, Z., "Oscillation Data Reduction Program," in Data Collection and Processing, Sawyer, L., Isaacs, N. and Bailey, S., eds, pp. 56-62 (1993), SERC Daresbury Laboratory, Warrington, UK). The data set of each wavelength was processed twice, once with each Friedel pair merged and once with each Friedel pair treated as two independent reflections.
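For orientation, the four MAD wavelengths above can be converted to photon energies using E = hc/λ. The following minimal sketch uses hc ≈ 12.3984 keV·Å; the dictionary and function names are illustrative and not part of the original data processing.

```python
HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, keV * angstrom

# Wavelengths in angstroms, as listed in the text.
WAVELENGTHS = {
    "inflection": 0.97891,
    "peak": 0.97876,
    "high remote": 0.96369,
    "low remote": 0.99462,
}


def energy_kev(wavelength_angstrom):
    """Photon energy in keV for a wavelength in angstroms."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


energies = {name: energy_kev(w) for name, w in WAVELENGTHS.items()}
for name, e in sorted(energies.items(), key=lambda item: item[1]):
    print(f"{name}: {e:.4f} keV")
```

The inflection and peak energies differ by only about 2 eV, which illustrates why the edge position was measured by fluorescence on the crystal itself rather than taken from a tabulated value.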

[0223] The locations of four of the selenium sites were determined by direct methods with the program SHELX 97 (Sheldrick, G. M., "Location of Heavy Atoms by Automated Patterson Interpretation," in Direct Methods for Solving Macromolecular Structures, Fortier, S., ed., pp. 131-141 (1998), Dordrecht: Kluwer Academic Publishers). The phase problem was solved using the program PHASES (Furey, W. and Swaminathan, S., Am. Crystallogr. Assoc. Mtg. Abstr. PA33:18:73 (1990)) by treating the MAD data as a special case of the multiple isomorphous replacement (MIR) problem (Ramakrishnan, V. and Biou, V., Methods Enzymol. 276:538-557 (1997)). The dispersive component of the difference in anomalous scattering was isolated by calculating the difference in amplitude of the same reflection measured at different wavelengths. Data used for this calculation were processed with the Friedel pairs merged. The dispersive differences between the wavelengths were used as isomorphous differences in the phase refinement and calculation. The absorption component was isolated by measuring the difference between the two reflections of each Friedel pair in a data set with each Friedel pair treated as two independent reflections. These were used as the anomalous differences in the phase refinement and calculation. The low-remote wavelength data set showed no anomalous scattering signal, either dispersive or absorptive, and was used as the native data set. Local scaling as implemented in the program PHASES was used to scale the data sets of the other wavelengths to the native for isomorphous phase refinement. The positions, isomorphous occupancies, anomalous occupancies and B factors of the four selenium sites were refined using maximum likelihood refinement. A set of protein phases was derived from these refined parameters. The resulting Fourier map was then modified by solvent flattening, histogram matching and Sayre's equation, using the program DM (Cowtan, K., Joint CCP4 and ESF-EACBM Newsletter on Protein Crystallography 31:34-38 (1994)) in the CCP4 package (Collaborative Computational Project Number 4, "The CCP4 Suite: Programs for Protein Crystallography", Acta Crystallogr. D50:760-763 (1994)). The modified map was of superior quality and allowed the main chains and side chains to be built with great confidence. Density corresponding to a large number of water molecules can also be seen in this map. The map was displayed and the three dimensional model was constructed using the computer graphics program O (Jones et al., Acta Crystallogr. A47:110-119 (1991)) on a Silicon Graphics R10000 computer.
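The two kinds of differences used in the MAD-as-MIR treatment above can be sketched as follows. This is an illustrative outline only, not the PHASES implementation: it assumes amplitudes are already on a common scale (the text uses local scaling for this), stores reflections in plain dictionaries keyed by Miller indices, and uses made-up amplitude values.

```python
def dispersive_differences(merged_i, merged_native):
    """|F| at one wavelength minus |F| at the reference wavelength.

    Both inputs are Friedel-merged data sets: {(h, k, l): amplitude}.
    Only reflections present in both sets are compared.
    """
    common = merged_i.keys() & merged_native.keys()
    return {hkl: merged_i[hkl] - merged_native[hkl] for hkl in common}


def anomalous_differences(unmerged):
    """|F(+h)| minus |F(-h)| for each measured Friedel pair.

    Input is a data set with Friedel mates kept separate. Each pair is
    reported once, under the member that sorts higher as a tuple.
    """
    diffs = {}
    for (h, k, l), f_plus in unmerged.items():
        mate = (-h, -k, -l)
        if mate in unmerged and (h, k, l) > mate:
            diffs[(h, k, l)] = f_plus - unmerged[mate]
    return diffs


# Hypothetical amplitudes for illustration only.
peak_merged = {(1, 2, 3): 102.0, (2, 0, 0): 51.0}
low_remote_merged = {(1, 2, 3): 100.0, (0, 0, 1): 10.0}
peak_unmerged = {(1, 2, 3): 105.0, (-1, -2, -3): 100.0, (2, 0, 0): 50.0}

print(dispersive_differences(peak_merged, low_remote_merged))
print(anomalous_differences(peak_unmerged))
```

In this scheme the dispersive differences play the role of isomorphous differences and the Friedel differences play the role of anomalous differences, with the low-remote data set serving as the native.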

Example 10

[0224] Refinement of L132M/L210M Mutant of Scenedesmus obliquus D1 Protease

[0225] The initial structure was refined with X-PLOR (Brunger et al., Science 235:458-460 (1987)), using 90% of the data between 10.0 and 1.8 Å for which F > 2σ|F|. A free R factor was calculated for the remaining 10% of the data at each refinement cycle. A total of four cycles of refinement were carried out. Each cycle consisted of simulated annealing using the slow-cooling protocol of X-PLOR, restrained B-factor refinement and manual model adjustment using the program O (Jones et al., Acta Crystallogr. A47:110-119 (1991)). Water molecules were incorporated into the model at cycles 2-4 by inspecting the Fo-Fc map contoured at 3.5σ after each cycle. In the last cycle of refinement only the data between 6.0 and 1.8 Å were used. The final data set is shown in FIG. 1.

[0226] The current model contains 385 of the 389 residues and 325 water molecules. Only three residues at the N-terminus and one at the C-terminus are missing from the model. The working R factor for this model is 18.6% and the free R factor is 24.5% for the 34125 reflections used for the refinement. The rms deviations from ideal values for bond lengths and bond angles are 0.009 Å and 1.486 degrees, respectively.
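The working and free R factors reported above follow the usual definition R = Σ| |Fo| - |Fc| | / Σ|Fo|, evaluated over the working set and over the held-out 10% of reflections respectively. The following minimal sketch illustrates the definition and a random 90/10 split; it is not the X-PLOR procedure, the scale factor between Fo and Fc is assumed to be 1, and the amplitudes are invented for illustration.

```python
import random


def r_factor(pairs):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over (Fo, Fc) amplitude pairs."""
    numerator = sum(abs(fo - fc) for fo, fc in pairs)
    denominator = sum(fo for fo, _ in pairs)
    return numerator / denominator


def split_work_free(reflections, free_fraction=0.10, seed=0):
    """Randomly set aside a fraction of reflections as the free (test) set."""
    rng = random.Random(seed)
    shuffled = list(reflections)
    rng.shuffle(shuffled)
    n_free = round(len(shuffled) * free_fraction)
    return shuffled[n_free:], shuffled[:n_free]


# Hypothetical (Fo, Fc) pairs for illustration only.
data = [(100.0 + i, 95.0 + i) for i in range(100)]
work, free = split_work_free(data)
print(f"R(work) = {r_factor(work):.3f} over {len(work)} reflections")
print(f"R(free) = {r_factor(free):.3f} over {len(free)} reflections")
```

Because the free set is never used to adjust the model, a free R factor that tracks the working R factor is evidence that the refinement is not overfitting the data.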

Example 11

[0227] Structure of Native Scenedesmus obliquus D1 Protease (Crystal Form C2 I)

[0228] The refined Se-Met mutant model with water molecules removed was used to refine against the native C2 I form 1.9 Å data set. The data set was collected at -170 °C on an R-AXIS IV imaging plate using X-rays generated by a Rigaku rotating anode X-ray generator. X-PLOR was used for the refinement. The working R factor is 28.1% and the free R factor is 32.0% after one cycle of rigid body refinement, using the entire molecule as a group, one cycle of positional refinement and one cycle of restrained B-factor refinement. This indicates that the mutations and Se-Met substitution did not cause significant distortion in the structure. This data set is shown in FIG. 5.

Example 12

[0229] Structure Determination of Native Scenedesmus obliquus D1 Protease (Crystal Form R32)

[0230] The data set of the native R32 form was collected in the same manner as that of native C2 I. Molecular replacement, using the refined Se-Met mutant structure with water removed as the search model, was done using the program AMoRe (Navaza, J., Acta Crystallogr. A50:157-163 (1994)). The program X-PLOR was used for the refinement. Rigid-body refinement was done by breaking up the model into three folding domains and allowing each domain to move independently. After one cycle of positional refinement and one cycle of B-factor refinement, the working R factor is 27.0% and the free R factor is 37.0%. This data set is shown in FIG. 6.

Example 13

[0231] Building a Homology Model of Wheat D1 Protease Based on the Coordinates of Scenedesmus obliquus D1 Protease

[0232] A three-dimensional model of wheat D1 protease was constructed based on the three-dimensional atomic coordinates of Scenedesmus obliquus D1 protease listed in FIG. 1. The amino acid sequence of D1 protease from wheat is presented in SEQ ID NO: 9. The amino acid sequence of this protein was found to be approximately 53% identical to that of the Scenedesmus obliquus D1 protease when compared with the GAP program (GCG) using the default program values, as shown in FIG. 3. Atomic coordinates of the Scenedesmus obliquus D1 protease were loaded into the molecular modeling package Sybyl®. By using the Biopolymer package within Sybyl®, amino acids of the Scenedesmus obliquus D1 protease were mutated to reflect the amino acid sequence of wheat D1 protease. Insertions and deletions were conducted using the annealing routine of Biopolymer. Finally, the model of wheat D1 protease was minimized by using the energy minimization routine of Sybyl®, holding the protein backbone constant (in an aggregate), adding hydrogens fully to the structure, and adding charges. The predicted atomic coordinates of the resulting three-dimensional model are listed in FIG. 4. The model for wheat D1 protease may be used for inhibitor design by applying one of several methods for docking potential inhibitors within the constraints of the active site defined by the model.
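The percent identity figure quoted above comes from the GCG GAP program. As a rough illustration of what such a figure measures, the following minimal sketch counts identical residues over the aligned, non-gap columns of a pairwise alignment. The denominator convention is an assumption and may differ from the one GAP uses, and the example is a short, hand-picked ungapped stretch from near the N-termini of SEQ ID NO: 10 and SEQ ID NO: 9, not the full alignment of FIG. 3.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Excerpts in one-letter code, transcribed from the sequence listing.
scenedesmus = "EQLLFLEAWRAVDRAYVDKSFNGQSWF"
wheat = "ENLLFLEAWRAVDRAYYDKSFNGQSWF"
print(f"{percent_identity(scenedesmus, wheat):.1f}% identical over "
      f"{len(scenedesmus)} aligned residues")
```

This conserved stretch is far more similar than the overall figure of approximately 53%, which is averaged over the whole alignment including divergent regions.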

Example 14

[0233] Crystal Structure and Computer Modeling of D1P in Complex With an Irreversible Peptide Chloromethylketone Inhibitor

[0234] Crystals of D1 protease covalently modified by a peptide chloromethylketone with the sequence Leu-Asp-Leu-Ala, which mimics the P site of the substrate, were obtained by hanging drop experiments as described in Example 8. In this case, the well solution consisted of 20% (w/v) PEG 3000 and 0.1 M Tris buffer at pH 7.0. The crystal form is similar to the C2 I form, with cell dimensions a=111.8 Å, b=64.1 Å, c=63.2 Å and β=122.2°. The crystals diffract X-rays to 1.6 Å resolution. The structure was determined and refined by using the C2 I form inhibitor-free structure as the starting model and the same refinement protocol described in Example 10. The working crystallographic R-value is 20.7% and the free R-value is 27.3% for data between 10.0 and 1.6 Å. The refined coordinates are presented in FIG. 7.

[0235] The electron density in the active site region of this structure indicates that the inhibitor is covalently bound to the Lys 397 residue. Only the three atoms closest to the NZ atom of the lysine side chain can be seen in the electron density map. Nevertheless, based on the conformation of the lysine side chain and the residual density produced by the disordered inhibitor, a hypothetical model of the chloromethylketone inhibitor has been built to identify the potential binding site of that part of the substrate mimicked by the inhibitor (FIG. 8). This model suggests that the P side of the substrate is bound to the large hydrophobic patch described earlier in the analysis of the active site.

Sequence CWU 1

1

11 1 464 PRT Scenedesmus obliquus 1 Met His Ser Arg Thr Asn Cys Leu Gln Thr Ser Val Arg Ala Pro Gln 1 5 10 15 Pro His Phe Arg Pro Phe Thr Ala Val Lys Thr Cys Arg Gln Arg Cys 20 25 30 Ser Thr Thr Ala Ala Ala Ala Lys Arg Asp Gln Ala Gln Glu Gln Gln 35 40 45 Pro Trp Ile Gln Val Gly Leu Gly Leu Ala Ala Ala Ala Thr Ala Val 50 55 60 Ala Val Gly Leu Gly Ala Ala Ala Leu Pro Ala Gln Ala Val Thr Ser 65 70 75 80 Glu Gln Leu Leu Phe Leu Glu Ala Trp Arg Ala Val Asp Arg Ala Tyr 85 90 95 Val Asp Lys Ser Phe Asn Gly Gln Ser Trp Phe Lys Leu Arg Glu Thr 100 105 110 Tyr Leu Lys Lys Glu Pro Met Asp Arg Arg Ala Gln Thr Tyr Asp Ala 115 120 125 Ile Arg Lys Leu Leu Ala Val Leu Asp Asp Pro Phe Thr Arg Phe Leu 130 135 140 Glu Pro Ser Arg Leu Ala Ala Leu Arg Arg Gly Thr Ala Gly Ser Val 145 150 155 160 Thr Gly Val Gly Leu Glu Ile Thr Tyr Asp Gly Gly Ser Gly Lys Asp 165 170 175 Val Val Val Leu Thr Pro Ala Pro Gly Gly Pro Ala Glu Lys Ala Gly 180 185 190 Ala Arg Ala Gly Asp Val Ile Val Thr Val Asp Gly Thr Ala Val Lys 195 200 205 Gly Leu Ser Leu Tyr Asp Val Ser Asp Leu Leu Gln Gly Glu Ala Asp 210 215 220 Ser Gln Val Glu Val Val Leu His Ala Pro Gly Ala Pro Ser Asn Thr 225 230 235 240 Arg Thr Leu Gln Leu Thr Arg Gln Lys Val Thr Ile Asn Pro Val Thr 245 250 255 Phe Thr Thr Cys Ser Asn Val Ala Ala Ala Ala Leu Pro Pro Gly Ala 260 265 270 Ala Lys Gln Gln Leu Gly Tyr Val Arg Leu Ala Thr Phe Asn Ser Asn 275 280 285 Thr Thr Ala Ala Ala Gln Gln Ala Phe Thr Glu Leu Ser Lys Gln Gly 290 295 300 Val Ala Gly Leu Val Leu Asp Ile Arg Asn Asn Gly Gly Gly Leu Phe 305 310 315 320 Pro Ala Gly Val Asn Val Ala Arg Met Leu Val Asp Arg Gly Asp Leu 325 330 335 Val Leu Ile Ala Asp Ser Gln Gly Ile Arg Asp Ile Tyr Ser Ala Asp 340 345 350 Gly Asn Ser Ile Asp Ser Ala Thr Pro Leu Val Val Leu Val Asn Arg 355 360 365 Gly Thr Ala Ser Ala Ser Glu Val Leu Ala Gly Ala Leu Lys Asp Ser 370 375 380 Lys Arg Gly Leu Ile Ala Gly Glu Arg Thr Phe Gly Lys Gly Leu Ile 385 390 395 400 Gln Thr Val Val Asp Leu Ser Asp Gly Ser Gly Val Ala Val Thr 
Val 405 410 415 Ala Arg Tyr Gln Thr Pro Ala Gly Val Asp Ile Asn Lys Ile Gly Val 420 425 430 Ser Pro Asp Val Gln Leu Asp Pro Glu Val Leu Pro Thr Asp Leu Glu 435 440 445 Gly Val Cys Arg Val Leu Gly Ser Asp Ala Ala Pro Arg Leu Phe Gly 450 455 460 2 32 DNA Artificial Sequence Description of Artificial Sequence primer 2 atgaccatgg tgacaagcga gcagctgctg tt 32 3 37 DNA Artificial Sequence Description of Artificial Sequence primer 3 agctgatgcg gatccttacc caaacagccg cggcgca 37 4 389 PRT Scenedesmus obliquus 4 Ala Met Val Thr Ser Glu Gln Leu Leu Phe Leu Glu Ala Trp Arg Ala 1 5 10 15 Val Asp Arg Ala Tyr Val Asp Lys Ser Phe Asn Gly Gln Ser Trp Phe 20 25 30 Lys Leu Arg Glu Thr Tyr Leu Lys Lys Glu Pro Met Asp Arg Arg Ala 35 40 45 Gln Thr Tyr Asp Ala Ile Arg Lys Met Leu Ala Val Leu Asp Asp Pro 50 55 60 Phe Thr Arg Phe Leu Glu Pro Ser Arg Leu Ala Ala Leu Arg Arg Gly 65 70 75 80 Thr Ala Gly Ser Val Thr Gly Val Gly Leu Glu Ile Thr Tyr Asp Gly 85 90 95 Gly Ser Gly Lys Asp Val Val Val Leu Thr Pro Ala Pro Gly Gly Pro 100 105 110 Ala Glu Lys Ala Gly Ala Arg Ala Gly Asp Val Ile Val Thr Val Asp 115 120 125 Gly Thr Ala Val Lys Gly Met Ser Leu Tyr Asp Val Ser Asp Leu Leu 130 135 140 Gln Gly Glu Ala Asp Ser Gln Val Glu Val Val Leu His Ala Pro Gly 145 150 155 160 Ala Pro Ser Asn Thr Arg Thr Leu Gln Leu Thr Arg Gln Lys Val Thr 165 170 175 Ile Asn Pro Val Thr Phe Thr Thr Cys Ser Asn Val Ala Ala Ala Ala 180 185 190 Leu Pro Pro Gly Ala Ala Lys Gln Gln Leu Gly Tyr Val Arg Leu Ala 195 200 205 Thr Phe Asn Ser Asn Thr Thr Ala Ala Ala Gln Gln Ala Phe Thr Glu 210 215 220 Leu Ser Lys Gln Gly Val Ala Gly Leu Val Leu Asp Ile Arg Asn Asn 225 230 235 240 Gly Gly Gly Leu Phe Pro Ala Gly Val Asn Val Ala Arg Met Leu Val 245 250 255 Asp Arg Gly Asp Leu Val Leu Ile Ala Asp Ser Gln Gly Ile Arg Asp 260 265 270 Ile Tyr Ser Ala Asp Gly Asn Ser Ile Asp Ser Ala Thr Pro Leu Val 275 280 285 Val Leu Val Asn Arg Gly Thr Ala Ser Ala Ser Glu Val Leu Ala Gly 290 295 300 Ala Leu Lys Asp Ser Lys Arg Gly Leu Ile Ala Gly Glu Arg 
Thr Phe 305 310 315 320 Gly Lys Gly Leu Ile Gln Thr Val Val Asp Leu Ser Asp Gly Ser Gly 325 330 335 Val Ala Val Thr Val Ala Arg Tyr Gln Thr Pro Ala Gly Val Asp Ile 340 345 350 Asn Lys Ile Gly Val Ser Pro Asp Val Gln Leu Asp Pro Glu Val Leu 355 360 365 Pro Thr Asp Leu Glu Gly Val Cys Arg Val Leu Gly Ser Asp Ala Ala 370 375 380 Pro Arg Leu Phe Gly 385 5 33 DNA Artificial Sequence Description of Artificial Sequence primer 5 gatgccatcc gcaagatgct ggcggtgctg gac 33 6 33 DNA Artificial Sequence Description of Artificial Sequence primer 6 gtccagcacc gccagcatct tgcggatggc atc 33 7 33 DNA Artificial Sequence Description of Artificial Sequence primer 7 acggctgtga aggggatgtc gctgtatgac gtg 33 8 33 DNA Artificial Sequence Description of Artificial Sequence primer 8 cacgtcatac agcgacatcc ccttcacagc cgt 33 9 388 PRT Triticum sp. 9 Leu Thr Glu Glu Asn Leu Leu Phe Leu Glu Ala Trp Arg Ala Val Asp 1 5 10 15 Arg Ala Tyr Tyr Asp Lys Ser Phe Asn Gly Gln Ser Trp Phe Arg Tyr 20 25 30 Arg Glu Arg Ala Leu Arg Asp Asp Pro Met Asn Thr Arg Gln Glu Thr 35 40 45 Tyr Ala Ala Ile Lys Lys Met Leu Ala Thr Leu Asp Asp Pro Phe Thr 50 55 60 Arg Leu Leu Glu Pro Glu Lys Phe Lys Ser Leu Arg Ser Gly Thr Gln 65 70 75 80 Gly Ala Leu Thr Gly Val Gly Leu Ser Ile Gly Tyr Pro Leu Ala Leu 85 90 95 Lys Gly Ser Pro Ala Gly Leu Ser Val Met Ser Ala Ala Pro Gly Gly 100 105 110 Pro Ala Glu Lys Ala Gly Ile Val Ser Gly Asp Val Ile Leu Ala Ile 115 120 125 Asp Asp Thr Ser Ala Gln Asp Met Asp Ile Tyr Asp Ala Ala Asp Arg 130 135 140 Leu Gln Gly Pro Glu Gly Ser Ser Ile Asp Leu Thr Ile Leu Ser Gly 145 150 155 160 Ala Asp Thr Arg His Val Val Leu Lys Arg Glu Arg Tyr Thr Leu Asn 165 170 175 Pro Val Arg Ser Arg Met Cys Glu Ile Pro Gly Ser Glu Asp Ser Ser 180 185 190 Lys Ile Gly Tyr Ile Lys Leu Thr Thr Phe Asn Gln Asn Ala Ala Gly 195 200 205 Ser Val Lys Glu Ala Ile Lys Lys Leu Arg Glu Lys Asn Val Lys Ala 210 215 220 Phe Val Leu Asp Leu Arg Asn Asn Ser Gly Gly Leu Phe Pro Glu Gly 225 230 235 240 Ile Glu Ile Ala Lys Ile Trp Met Asp 
Lys Gly Val Ile Val Tyr Ile 245 250 255 Cys Asp Ser Arg Gly Val Arg Asp Ile Tyr Glu Ala Asp Gly Ala Ser 260 265 270 Thr Ile Ala Ala Ser Glu Pro Leu Val Val Leu Val Asn Lys Gly Thr 275 280 285 Ala Ser Ala Ser Glu Ile Leu Ala Gly Ala Leu Lys Asp Asn Lys Arg 290 295 300 Ala Val Val Tyr Gly Glu Pro Thr Tyr Gly Lys Gly Lys Ile Gln Ser 305 310 315 320 Val Phe Ala Leu Ser Asp Gly Ser Gly Leu Ala Val Thr Val Ala Arg 325 330 335 Tyr Glu Thr Pro Ala His Thr Asp Ile Asp Lys Val Gly Val Thr Pro 340 345 350 Asp Arg Pro Leu Pro Ala Ser Phe Pro Thr Asp Glu Asp Gly Phe Cys 355 360 365 Ser Cys Leu Arg Asp Pro Ala Ser Cys Asn Leu Asn Ala Ala Arg Leu 370 375 380 Phe Val Arg Ser 385 10 389 PRT Scenedesmus obliquus 10 Ala Met Val Thr Ser Glu Gln Leu Leu Phe Leu Glu Ala Trp Arg Ala 1 5 10 15 Val Asp Arg Ala Tyr Val Asp Lys Ser Phe Asn Gly Gln Ser Trp Phe 20 25 30 Lys Leu Arg Glu Thr Tyr Leu Lys Lys Glu Pro Met Asp Arg Arg Ala 35 40 45 Gln Thr Tyr Asp Ala Ile Arg Lys Leu Leu Ala Val Leu Asp Asp Pro 50 55 60 Phe Thr Arg Phe Leu Glu Pro Ser Arg Leu Ala Ala Leu Arg Arg Gly 65 70 75 80 Thr Ala Gly Ser Val Thr Gly Val Gly Leu Glu Ile Thr Tyr Asp Gly 85 90 95 Gly Ser Gly Lys Asp Val Val Val Leu Thr Pro Ala Pro Gly Gly Pro 100 105 110 Ala Glu Lys Ala Gly Ala Arg Ala Gly Asp Val Ile Val Thr Val Asp 115 120 125 Gly Thr Ala Val Lys Gly Leu Ser Leu Tyr Asp Val Ser Asp Leu Leu 130 135 140 Gln Gly Glu Ala Asp Ser Gln Val Glu Val Val Leu His Ala Pro Gly 145 150 155 160 Ala Pro Ser Asn Thr Arg Thr Leu Gln Leu Thr Arg Gln Lys Val Thr 165 170 175 Ile Asn Pro Val Thr Phe Thr Thr Cys Ser Asn Val Ala Ala Ala Ala 180 185 190 Leu Pro Pro Gly Ala Ala Lys Gln Gln Leu Gly Tyr Val Arg Leu Ala 195 200 205 Thr Phe Asn Ser Asn Thr Thr Ala Ala Ala Gln Gln Ala Phe Thr Glu 210 215 220 Leu Ser Lys Gln Gly Val Ala Gly Leu Val Leu Asp Ile Arg Asn Asn 225 230 235 240 Gly Gly Gly Leu Phe Pro Ala Gly Val Asn Val Ala Arg Met Leu Val 245 250 255 Asp Arg Gly Asp Leu Val Leu Ile Ala Asp Ser Gln Gly Ile Arg Asp 260 265 270 Ile 
Tyr Ser Ala Asp Gly Asn Ser Ile Asp Ser Ala Thr Pro Leu Val 275 280 285 Val Leu Val Asn Arg Gly Thr Ala Ser Ala Ser Glu Val Leu Ala Gly 290 295 300 Ala Leu Lys Asp Ser Lys Arg Gly Leu Ile Ala Gly Glu Arg Thr Phe 305 310 315 320 Gly Lys Gly Leu Ile Gln Thr Val Val Asp Leu Ser Asp Gly Ser Gly 325 330 335 Val Ala Val Thr Val Ala Arg Tyr Gln Thr Pro Ala Gly Val Asp Ile 340 345 350 Asn Lys Ile Gly Val Ser Pro Asp Val Gln Leu Asp Pro Glu Val Leu 355 360 365 Pro Thr Asp Leu Glu Gly Val Cys Arg Val Leu Gly Ser Asp Ala Ala 370 375 380 Pro Arg Leu Phe Gly 385 11 4 PRT Artificial Sequence Description of Artificial Sequence tetrapeptide 11 Leu Asp Leu Ala 1

* * * * *

References

cmpharm.ucsf.edu/kuntz/dock.html