System And Method For Identifying Disease-influencing Genes Brown; Stephen J. [Brown; Stephen J.]

System And Method For Identifying Disease-influencing Genes

Brown; Stephen J.

Patent Application Summary

U.S. patent application number 13/408334 was filed with the patent office on 2012-08-09 for system and method for identifying disease-influencing genes. Invention is credited to Stephen J. Brown.

Application Number	20120203466 13/408334
Document ID	/
Family ID	46087692
Filed Date	2012-08-09

United States Patent Application	20120203466
Kind Code	A1
Brown; Stephen J.	August 9, 2012

SYSTEM AND METHOD FOR IDENTIFYING DISEASE-INFLUENCING GENES

Abstract

A system and method for using individuals' behavioral and environmental information in conjunction with their gene sequences to find drug candidates and drug targets. Individuals designated as having a high risk for developing a particular disease are each given a remotely programmable apparatus. Queries related to the individuals' behavior and environment are sent from a server to the remotely programmable apparatuses. The individuals' responses to the queries and any physiological information are sent back to the server. The process of collecting individuals' information can take place over a period of time to ensure accurate data and to allow researchers to observe progression of the disease. A data mining program on the server analyzes the individuals' behavioral and environmental information, as well as their gene sequences.

Inventors:	Brown; Stephen J.; (San Mateo, CA)
Family ID:	46087692
Appl. No.:	13/408334
Filed:	February 29, 2012

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
13303622	Nov 23, 2011
13408334
09496893	Feb 2, 2000	8078407
13303622

Current U.S. Class:	702/19
Current CPC Class:	G16H 70/60 20180101; G16H 40/40 20180101; G16H 40/67 20180101; A61B 5/087 20130101; G16H 10/60 20180101; A61B 5/4839 20130101; A61B 5/743 20130101; A61B 2560/0443 20130101; A61B 5/14532 20130101; A61B 5/6896 20130101; A61B 5/0002 20130101; G16H 10/20 20180101; G16H 10/40 20180101; G16H 15/00 20180101; A61B 5/0205 20130101
Class at Publication:	702/19
International Class:	G06F 19/28 20110101 G06F019/28

Claims

1. A method for generating groups of individuals useful in researching influence of a disease on said individuals, comprising: selecting individuals having a risk factor for a disease; providing to each individual a communications apparatus; transmitting a computer program containing queries and predefined response choices to said communications apparatus, wherein said computer program when executed causes said communications apparatus to present said queries and predefined response choices to each individual via a touch sensitive display of said communications apparatus and collect responses to said queries, including at least one of the predefined response choices presented on the display of the communications apparatus, from each individual via said touch sensitive display of the communications apparatus; receiving said responses to the queries from the individuals through the apparatus, said responses communicating information about the individuals; storing the responses of each individual in a database; defining a plurality of groups by categorizing the individuals having similar profiles based on the responses, wherein categorizing the individuals into groups includes one or more phenotypic classifications; after defining said groups, receiving genotype information representative of individuals in each of said groups; comparing said genotype information between said groups; and generating a report for presentation on a display that represents a subset of said genotype information associated with each of said groups, wherein differences in said genotype information between said groups is expressed in terms of phenotypic classifications.

2. The method according to claim 1, wherein the queries are inserted into said computer program with a script generator and assigned to an individual using a script assignor.

3. The method according to claim 1, wherein the one or more phenotypic classifications comprise one or more of behavioral, environmental, and disease progression.

4. The method according to claim 1, wherein the communication apparatus is connectable with a monitoring device configured to acquire physiologic data.

5. The method according to claim 4, wherein the monitoring device includes one of the set consisting of a blood glucose meter, a respiratory flow meter, a blood pressure cuff, a weight scale, and a pulse rate monitor.

6. The method according claim 1, wherein the responses to the queries from the individuals communicate environmental information about the individuals.

7. The method according claim 6, wherein the environmental information comprises one or more of non-genetic information about an individual, information about disease progression, information about diet, information about lifestyle, and information about geographical location.

8. The method according claim 1, wherein the queries contained in said computer program are related to one or both of behavior and environment of each individual.

9. A system for generating groups of individuals useful in researching influence of a disease on said individuals, comprising: a communications apparatus operable by an individual; and a communication network in signal communication with the communications apparatus and a server, a workstation configured to send scripted queries and predefined response choices, a genotyping system configured to provide genotype information representative of the individual, and a patient profile system configured to receive responses from the individual and genotype information analyses via the communications network and the server, wherein the server transmits a computer program containing the scripted queries and predefined response choices to the communications apparatus, the computer program when executed causes the communications apparatus to present the scripted queries and predefined response choices to the individual via a touch sensitive display of said communications apparatus and collect responses to the queries containing information about the individual and at least one of the predefined response choices presented on the touch sensitive display of said communications apparatus, wherein the genotype information is compared based upon groups formed by categorizing individuals having a risk factor for a disease using the responses to the scripted queries in the patient profile system to identify one or more individuals having similar profiles, wherein categorizing the individuals into groups includes one or more phenotypic classifications, and differences in said genotype information between said groups is expressed in terms of phenotypic classifications.

10. The system according claim 9, wherein the responses from the individual are used to categorize the individual into one or more groups and the one or more groups are compared with the genotype information of the individual to categorize said genotype information according to disease progression.

11. The system according claim 10, wherein the disease progression includes non-insulin dependent diabetes.

12. The system according claim 9, wherein the one or more phenotypic classifications comprise one or more of behavioral, environmental, and disease progression.

13. A system for identifying groups of individuals useful in researching influence of disease on said individuals, comprising: at least one communications apparatus in signal communication with a monitoring device configured to measure physiologic and environmental conditions, the communications apparatus and monitoring device being operable by at least one individual; and a communication network in signal communications with each communications apparatus and a server, a workstation configured to send scripted queries and predefined response choices, a genotyping system configured to provide genotype information representative of the at least one individual, and a patient profile system configured to receive responses and measurements from the at least one individual and genotype information analyses via the communications network and the server, wherein the server transmits a computer program containing the scripted queries and predefined response choices to the communication apparatus, the computer program when executed causes the communication apparatus to present the scripted queries and predefined response choices to the individual via a touch sensitive display of the communications apparatus and collect responses to the queries containing information about the individual and at least one of the predefined response choices presented on the touch sensitive display of said communications apparatus, wherein the genotype information representative of the at least one individual is compared based upon groups formed by categorizing individuals having a risk factor for a disease using the responses and measurements to the scripted queries in the patient profile system to identify one or more individuals having similar profiles, wherein categorizing the individuals into groups includes one or more phenotypic classifications, and differences in said genotype information between said groups is expressed in terms of phenotypic classifications.

14. The system according claim 13, wherein the monitoring device includes one of the set consisting of a blood glucose meter, a respiratory flow meter, a blood pressure cuff, a weight scale, and a pulse rate monitor.

15. The system according claim 13, wherein the responses and measurements from each individual are used to categorized each individual with one or more groups and the groups are compared with the genotype information representative of each individual to categorize the genotype information according to disease progression of each individual in the one or more groups based on the responses and measurements sent by each individual.

16. The system according claim 15, wherein the disease progression includes non-insulin dependent diabetes.

17. The system according claim 13, wherein the one or more phenotypic classifications comprise one or more of behavioral, environmental, and disease progression.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This is a continuation of U.S. Ser. No. 13/303,622, filed Nov. 23, 2011, which is a continuation of U.S. Ser. No. 09/496,893, filed Feb. 2, 2000, which are each herein incorporated by reference.

RELATED APPLICATION INFORMATION

[0002] This application is related to copending patent application Ser. No. 08/946,341 filed Oct. 7, 1997 which is herein incorporated by reference.

FIELD OF INVENTION

[0003] This invention relates generally to the fields of genomics, bioinformatics, and drug development. More specifically, it relates to a database containing phenotypic and environmental data on groups of individuals for use in conjunction with gene sequences to identify disease-influencing genes and substances.

BACKGROUND OF THE INVENTION

[0004] The physical makeup of an individual is determined by his or her genes. Genes are comprised of DNA, which in turn consists of four nucleotides known as adenine(A), thymine(T), cytosine(C), and guanine(G). A particular series of nucleotides

[0005] is known as a gene sequence. Each gene sequence codes for a protein. A defective or mutant gene sequence will not produce a working protein. The protein may not perform its purpose, the protein may carry out a different purpose than intended, too much protein may be made, too little protein may be made, or the protein may not be made at all. If the protein is essential to one or more functions of the body, disease will result.

[0006] Mutant gene sequences are either inherited or acquired. An inherited gene sequence is received from an individual's parents, while an acquired gene sequence results from an event in the individuals lifetime which changes the original gene sequence.

[0007] A classic example of an inherited mutant gene sequence is the sickle cell anemia gene. Sickle cell anemia is caused by the substitution of a single nucleotide (A to T) in the gene sequence of an individual. This single substitution results in the substitution of a single amino acid (glutamic acid to valine) in the resulting hemoglobin protein. The mutant hemoglobin protein produces crescent-shaped or sickled red blood cells in affected individuals, causing a decrease in the amount of oxygen that can be transported throughout the body. The lack of oxygen often results in kidney and heart failure, paralysis, and rheumatism, which are common symptoms of anemic individuals.

[0008] An example of an acquired mutant gene sequence is malignant melanoma, or skin cancer. Cancer results when normal cells in an individual's body either lose or gain certain functions, resulting in the unchecked growth of non-normal cells. These non-normal cells often form tumors and spread throughout the body, disrupting normal cell functions. A cancer such as malignant melanoma is caused when the original gene sequence in epidermal cells is changed or mutated by an environmental factor, such as UV radiation. Our cells contain repair mechanisms to fix such problems, but over time the gene sequences in epidermal cells acquire more and more mutations. Mutant proteins are then produced and cellular functions are disrupted. The individual then has skin cancer.

[0009] Although an individual's environment generally precipitates the development of cancer, many individuals have been found to have a predisposition to cancer. These individuals have gene sequences which are more likely to become mutated over a shorter period of time. Examples of such gene sequences are the BRCA1 and BRCA2 genes. Women carrying these gene sequences have a higher probability of,developing breast and ovarian cancer than women who carry normal gene sequences. Thus, although the affected women's original gene sequences may not be mutated, they are more likely to become mutated due to their sequence or location on a chromosome.

[0010] Another factor that should be considered when discussing genetic diseases is whether they are monogenic or polygenic in nature. Sickle cell anemia and cystic fibrosis are examples of monogenic diseases, as they are caused by a single gene sequence. Most types of cancer, asthma, and diabetes are examples of polygenic diseases, as they are caused by a variety of genes. Polygenic diseases are also more likely to be influenced by an individual's environment. Not surprisingly, polygenic diseases are more difficult to diagnose and treat. Thus, the use of gene sequences in developing new drugs is dependent the monogenic or polygenic nature of genetic diseases.

[0011] Typically, individuals with diseases caused by inherited or acquired gene sequences have only their symptoms treated. Diabetes patients receive insulin shots to regulate their blood glucose levels, asthma patients use inhalers to allow normal respiratory functions, and cancer patients undergo chemotherapy and radiation therapy to remove cancerous tumors. Although these treatments are often able to alleviate or eliminate the symptoms, they are unable to remove the genetic bases of the diseases.

[0012] The genetic bases of many diseases were discovered in the 1940's by scientists such as Beadle and Tatum, who discovered that each gene codes for a protein. Researchers then rationalized that study of the relevant gene sequences could lead to effective drug treatments for genetic diseases. The technology was inadequate, however, until the 1970-80's, when Boyer and Cohen cloned DNA; Maxam, Gilbert, and Sanger figured out how to sequence DNA; and Mullis developed the polymerase chain reaction (PCR) technique to quickly amplify DNA sequences. Using genetics to find drug candidates soon became a practical option.

[0013] Before these techniques became available, the pharmaceutical industry's main method of finding new drugs was trial and error. Compounds that were found to mimic the body's natural compounds were tested in vitro, in animal models, and in clinical trials to see if they had a desirable effect in treating disease. This method is still used and has resulted in many well-known drugs, but it is expensive and time-consuming.

[0014] With the advent of improved genetic techniques, however, the pharmaceutical industry has begun concentrating on genetics as the most effective route to new drug discovery. Genomics companies can typically be classified into one of two groups.

[0015] The first group concentrates on gene sequencing in order to find both drug targets and drug candidates, usually in the form of proteins expressed by the gene sequences. Gene sequencing can either be in the form of random discovery, whereby genes are sequenced without regard to their functions, or in the form of targeted discovery, whereby a certain region of the genome which is tentatively associated with a disease is sequenced. In random discovery gene sequencing, potentially useful gene sequences are identified and assayed to determine if they can be used in drug development. One problem with random discovery gene sequencing is that the majority of the human genome contains introns, or gene sequences which do not code for proteins. One way to circumvent this problem is to sequence complementary DNA (cDNA) instead. cDNA is produced from messenger RNA (mRNA). mRNA, in turn, is transcribed from DNA and processed by certain enzymes which remove the introns. cDNA sequences thus code for un-interrupted proteins.

[0016] Targeted discovery gene sequencing is typically used with positional cloning, comparative gene expression, and functional cloning techniques, which are described in the next group.

[0017] The second group of genomics companies takes a more epidemiological approach by first researching families or groups of individuals having a similar disease, and then isolating the relevant genes. In this method, also known as positional cloning, blood samples are taken from the individuals and analyzed. The blood samples contain DNA, which is studied to identify certain regions of the genome which appear to be associated with the disease. Linking a region of the genome with a disease is known as linkage analysis or genetic linkage mapping. Once a region of the genome has been identified, it is sequenced via targeted discovery gene sequencing.

[0018] The second group of genomics companies also uses comparative gene expression to discover disease gene sequences. In comparative gene expression, mRNA from both healthy and diseased tissue is isolated. The mRNA is then used to produce cDNA, which is sequenced using targeted discovery gene Sequencing. The gene sequences from both the healthy and diseased tissue are then compared. In addition, the identification of genes associated with disease can be made by studying the level of expression of genes in both the healthy and diseased tissue.

[0019] Another similar technique is functional cloning. Mutant or non-functional proteins in metabolic pathways are studied and identified. The proteins are sequenced using targeted discovery gene sequencing and these sequences are used to figure out the corresponding DNA gene sequences. Once the disease gene sequences have been identified, they can be used in drug development.

[0020] Genomics companies in the first group include Incyte Pharmaceuticals (Palo Alto, Calif.). Incyte uses random discovery gene sequencing to produce its LifeSeq.TM. and LifeSeq FL.TM. databases. These databases contain the sequences of hundreds of human genes. These databases are licensed to drug development companies who use the sequences to produce new drugs. Databases covering animals (ZooSeq.TM.), plants (PhytoSeq.TM.), and bacteria and fungi (PathoSeq.TM.) are also available. Incyte has also developed bioinformatics software, which provides sequence analysis and data management for their databases. In addition, Incyte offers cDNA libraries of the gene sequences in their databases, which can be directly used in drug development.

[0021] Human Genome Sciences (Rockville, Md.) also concentrates on random discovery gene sequencing, and has sequenced an estimated 90% of the 100,000 genes in the human body. In addition to collaborating with drug development companies who use their gene sequences, HGS also has its own drug discovery and development division. A number of therapeutic proteins which appear effective in animal models are under study.

[0022] Hyseq, Inc. (Sunnyvale, Calif.) has its HvX Platform which is capable of processing and sequencing millions of blood and DNA samples. The HyX Platform includes DNA arrays of samples and probes, software-driven modules, industrial robots for screening DNA probes against DNA samples, and bioinformatic software to analyze the genetic information. Through the use of its HyX Platform, HyX believes it can carry out a variety of techniques, such as gene identification, gene expression level determination, gene interaction studies (for polygenic diseases), and genetic mapping.

[0023] Affymetrix, Inc. (Santa Clara, Calif.) has a GeneChip system consisting of disposable DNA probe arrays containing gene sequences on a chip, instruments to process the probe arrays, and software to analyze and manage the genetic information in the probe. The GeneChip system thus allows pharmaceutical and biotechnology companies to collect gene sequences and apply them to drug development.

[0024] On the other hand, the pharmaceutical industry has a number of genomics companies who first identify the genes which are likely to cause disease. After the genes are identified, they are sequenced and the gene sequences are used in drug development. Likewise, proteins implicated in disease can be identified and sequenced. The sequences can be used to discover the gene sequences, which are then used in drug development.

[0025] Myriad genetics, Inc. (Salt Lake City, Utah) targets families with a history of genetic disease and collects their genetic material in order to identify hereditary disease-causing genes, Myriad is able to identify these genes by using positional cloning and protein interaction studies in combination with targeted discovery gene sequencing. Using these techniques, Myriad has been able to locate and identify eight disease-related gene sequences, including BRCA1 and BRCA2. These gene sequences are used by Myriad's pharmaceutical partners to develop new therapeutics.

[0026] Another genomics company which uses disease inheritance patterns together with gene sequencing is Sequana (La Jolla, Calif.). Sequana uses DNA collection of individuals with inherited diseases, genotyping and linkage analysis, physical mapping, and gene sequencing to find disease gene sequences. Sequana also has a proprietary bioinformatics system which includes data mining tools to automatically sort and organize much of its data. Like Myriad, Sequana has a number of alliances with drug development companies which license Sequana's gene sequences.

[0027] Millennium Pharmaceuticals, Inc. (Cambridge, Mass.) employs a broader range of technologies than Myriad and Sequana. In addition to positional cloning and targeted discovery gene sequencing, Millennium uses a number of other non-genetic techniques. cDNA libraries are prepared from mouse tissues and expressed using rapid expression of differential gene expression (RARE) technology. Different patterns of cDNA gene expression allow researchers to identify possible disease targets. Millennium also uses functional cloning techniques in order to identify the gene sequences of interesting proteins. Once a potentially useful gene sequence has been identified, biological assays and bioinformatics are used as additional analyses.

[0028] Genome Therapeutics Corporation (Waltham, Mass.) uses a combination of positional cloning techniques and targeted discovery gene sequencing, as well as random discovery gene sequencing to isolate and identify disease gene sequences. In addition, Genome Therapeutics also has pathogen programs, which sequence pathogen genomes. As many non-genetic human diseases result from infection by pathogens, Genome Therapeutics hopes to eliminate pathogens by developing drugs and vaccines using the pathogens' genomes.

[0029] Gene Logic, Inc. (Columbia, Md.) has an accelerated drug discovery system which emphasizes its restriction enzyme analysis of differentially expressed sequences (READS) technology. READS is similar in nature to comparative gene expression technology. In READS, normal and diseased tissues are compared in order to identify gene expression differences between the two. Genes which appear to be important in the diseased tissue are then analyzed. Restriction enzymes, which cut gene sequences at specific sites, are used to produce gene fragments. The gene fragments from the normal and diseased tissues will differ and can be compared. Gene Logic also has a Flow-thru Chip and genomic databases, which it licenses to drug development companies.

[0030] Progenitor (Columbus, Ohio) focuses on developmental biology. Growing cells and tissues are analyzed for their level of expression of certain genes. Study of growing cells and tissues may help discover treatments for diseases characterized by abnormal cell growth, such as cancer and osteoporosis. Progenitor also uses bioinformatics, gene mapping, and gene sequencing to isolate, identify, and sequence relevant gene sequences.

[0031] OncorMed, Inc. (Gaithersburg, Md.) has focused on the development of medical services using genetic information. Oncormed offers a number of tests for hereditary diseases such as breast and colon cancers and malignant melanoma. The medical services include measurements of replication error rates in tumors, molecular profiling of tumor suppresser genes, and gene sequencing. In addition, OncorMed has a genomics repository containing known cancer gene sequences.

[0032] U.S. Pat. No. 5,642,936 issued to Evans and assigned to OncorMed describes a method for identifying human hereditary disease patterns. According to the method, data is collected on individuals having a history of disease within their families. Factors related to each disease are given weights, and the weights for each individual are summed. If the sum is above a certain predetermined threshold value, the individual is deemed to have a hereditary risk for the disease. Records from a number of individuals having a hereditary risk for a disease are collected to form a database.

[0033] The methods used by the above companies all focus on the genetic aspect of hereditary disease. Gene sequencing and positional cloning represent the two approaches generally taken. However, very little emphasis is put on the environmental aspect of hereditary disease. An individual's environment is defined as his or her physical surroundings, geographical location, diet, lifestyle, etc. For many diseases which are genetic in origin, such as most cancers, an individual's environment plays a large role in determining whether or not the individual eventually develops the disease. Some individuals who have disease gene sequences develop diseases, while others who carry the exact same disease gone sequences do not. One purpose of collecting environmental data about individuals whose gene sequences are studied is to effectively rule out any non-genetic causes of disease. Another purpose is to discover if any individuals who are carrying disease gene sequences but who do not develop the disease have other compensatory gene sequences or factors which enable them to live disease-free.

[0034] To a certain extent, the second group of genomics companies do take into account a small amount of environmental data when they select individuals whose DNA they use for positional cloning analyses. The environmental data is usually in the form of a questionnaire or survey. However, the data is typically limited in scope to lifestyle questions, and is used only to help narrow the search for the specific disease gene in question.

[0035] In addition, most genomics companies are reluctant to share their data on individuals' with others, even those genomics companies which are studying the same gene sequences. As a result, each genomics company must gather its own data on individuals having a certain disease. For example, Sequana sent its own researcher to the island of Tristan de Cunha to study hereditary asthma, while Myriad is located in Salt Lake City to take advantage of the detailed family trees of the Mormons. For genomics companies searching for gene sequences, gathering environmental data on individuals is often an expensive, time-consuming, but necessary step. Genomics companies could potentially spend more of their time and money on actual disease gene isolation if they were able to obtain necessary environmental data from another source.

[0036] Another problem lies in the fact that when genomics companies do gather environmental data on the individuals whose gene sequences are studied, the environmental data represents only a small time frame of an individual's life. Few genomics companies continually collect data over a long period of time, and as a result, are not able to definitively rule out certain environmental factors which may affect disease progression. In addition, such data collections are unlikely to provide leads for factors which may prohibit the formation of disease.

OBJECTS AND ADVANTAGES OF THE INVENTION

[0037] Accordingly, it is a primary object of the present invention to provide a system and method for creating a database of information about individuals' environments over a period of time. Another object of the present invention is to provide a database containing information about individuals' environments which can be used with existing genomics databases. A further object of the present invention is to provide a method of using environmental information about an individual in conjunction with the individual's genotype to find disease-influencing genes or substances. It is another object of the present invention to use the disease-influencing genes or substances to find drug candidates or drug targets.

SUMMARY OF THE INVENTION

[0038] These objects and advantages are attained by a system and method for identifying a disease-influencing gene or protein. The method includes the step of selecting individuals having a risk factor for a certain disease. Each of the individuals is provided with a remotely programmable apparatus having a user interface for communicating queries to the individuals and for receiving responses. Each apparatus also includes a communication device, such as a modem, for communicating with a server through a communication network.

[0039] Queries relating to the individuals' environment are entered into the server and transmitted from the server to each individual's remote apparatus. After the individuals' have responded to the queries, the responses are sent back to the server and organized into a database. Data mining software is then used to distinguish the individuals into groups based on their environmental profiles. After a period of time, each group is then further divided into categories based on their disease progression. The genomes of all the individuals are then sequenced. Data mining techniques are used to find gene differences between the categories.

[0040] According to a second method of the invention, the individuals are first separated into groups according to their disease progressions. Data mining techniques are then used to further distinguish each group into categories based on the individuals' environmental profiles. The genomes of all the individuals are then sequenced, and data mining techniques are used to find gene differences between the categories.

[0041] A third embodiment of the invention provides a method for identifying disease-influencing substances. The method includes the step of selecting individuals having a risk factor for a certain disease. Each of the individuals is provided with a remotely programmable apparatus having a user interface for communicating queries to the individuals and for receiving responses. Each apparatus also includes a communication device, such as a modem, for communicating with a server through a communication network.

[0042] Queries relating to the individuals' environment are entered into the server and transmitted from the server to each individual's remote apparatus. After the individuals' have responded to the queries, the responses are sent back to the server and organized into a database. The genomes of all the individuals are then sequenced. The individuals are placed into groups based on their gene sequences. Each group is then separated into categories based on the individuals' disease progression. Data mining techniques are then used to find a disease-influencing substance between the categories of individuals by using the individuals environmental profiles.

[0043] The disease-influencing gene or substance isolated using these methods is preferably used to develop drug candidates or drug targets Additionally, the isolation of the disease-influencing gene is preferably used to identify a corresponding disease-influencing protein, which can also be used to develop drug candidates or drug targets.

[0044] The present invention also provides a database and data processing system for storing and analyzing environmental information about individuals. The database and data processing system comprise a server for storing queries and the individuals' responses to the queries. The system also includes at least one remotely programmable apparatuses having a user interface for communicating queries to the individuals' and for receiving the responses. Each apparatus also includes a communication device, such as a modem, for communicating with the server through a communication network.

[0045] The system also includes genotyping means in communication with the server for determining the individuals' gene sequences and a data mining software program accessible to the server for analyzing the individuals' gene sequences and environmental profiles. In particular, the data mining program includes: means for analyzing the responses in order to group the individuals having a similar behavioral and environmental profile, a similar disease progression, and a similar genotype; means for analyzing the responses in order to group the individuals having a similar disease progression; means for analyzing the responses in order to group the individuals-having a similar genotype; and means for identifying a disease-influencing gene or substance. Alternatively, the database can be used with other genomics or bioinformatics databases and systems if the information is to be manipulated in different ways.

DESCRIPTION OF THE FIGURES

[0046] FIG. 1 is a block diagram of a networked system according to a preferred embodiment of the invention.

[0047] FIG. 2 is a block diagram illustrating the interaction of the components of the system of FIG. 1.

[0048] FIG. 3 is a perspective view of a remotely programmable apparatus of the system of FIG. 1.

[0049] FIG. 4 is a block diagram illustrating the components of the apparatus of FIG. 3.

[0050] FIG. 5 is a script entry screen according to the preferred embodiment of the invention.

[0051] FIG. 6A is a listing of a sample script program according to the preferred embodiment of the invention.

[0052] FIG. 6B is a continuation of the listing of FIG. 6A.

[0053] FIG. 7 is a script assignment screen according to the preferred embodiment of the invention.

[0054] FIG. 8 is a sample query appearing on a display of the apparatus of FIG. 3.

[0055] FIG. 9 is a sample prompt appearing on the display of the apparatus of FIG. 3.

[0056] FIG. 10 is a sample report displayed on a workstation of the system of FIG. 1.

[0057] FIG. 11 is a flow chart illustrating the steps included in a monitoring application executed by the server of FIG. 1 according to the preferred embodiment of the invention.

[0058] FIG. 12 is a flow chart illustrating the steps included in the script program of FIGS. 6A-6B.

[0059] FIG. 13 is a sample completed data table of the present invention.

[0060] FIG. 14 is a sample completed data table of the present invention.

[0061] FIG. 15 is a flow chart illustrating a first method for identifying a gene according to the present invention.

[0062] FIG. 16 is a block diagram illustrating the method of FIG. 15.

[0063] FIG. 17 is a flow chart illustrating a second method for identifying a gene according to the present invention.

[0064] FIG. 18 is a block diagram illustrating the method of FIG. 17.

[0065] FIG. 19 is a flow chart illustrating a third method according to the present invention.

[0066] FIG. 20 is a block diagram illustrating the method of FIG. 19.

DETAILED DESCRIPTION

[0067] The invention presents a system and method for creating a database containing environmental information about an individual to be used in conjunction with the individual's gene sequences to find new drug targets and drug candidates. In a preferred embodiment of the invention, remote monitors are used to collect the environmental information. It is to be understood that environmental information includes all non-genetic information about an individual, such as disease progression, diet, lifestyle, and geographical location.

[0068] A preferred embodiment of the invention is illustrated in FIGS. 1-16. Referring to FIG. 1, a networked system includes a server 50 and a workstation 52 connected to server 50 through a communication network 58. Server 50 is also connected to a patient profile database 54 which stores environmental information about the individuals. Server 50 is further connected to a genotyping system 56 which is capable of sequencing individuals' genomes. Patient profile database 54 and genotyping system 56 are connected to server 50 through communication network 58.

[0069] Server 50 and patient profile database 54 are preferably world wide web servers. Server 50 and database 64 may comprise single stand-alone computers or multiple computers distributed throughout a network. Workstation 52 is preferably a personal computer, remote terminal, or web TV unit. Workstation 52 functions as a remote interface for entering in server 50 messages and queries to be communicated to the individuals.

[0070] Genotyping system 55 can be a laboratory capable of sequencing individuals' genomes, a gene sequencing chip such as the GeneChip by Affymetrix, or any other suitable genotyping system. Genotyping system 56 should be capable of transmitting information about the individuals' genomes to server 50. Communication network 58 connects workstation 52, patient profile database 54, and genotyping system 56 to server 50. Communication network 58 can be any suitable communication network, such as a telephone cable, the Internet, or cellular or wireless communication. Such communication networks are well known in the art.

[0071] The system also includes remotely programmable apparatuses 60 for monitoring individuals. Preferably, each remote apparatus 60 is used to monitor a respective one of the individuals. Alternatively, a multi-user apparatus may be used to monitor a plurality of individuals. Each remote apparatus is designed to interact with an individual in accordance with script programs received from server 50.

[0072] Each remote apparatus is in communication with server 50 through communication network 58, which is preferably the Internet. Alternatively, each remote apparatus may be placed in communication with the server via telephone cable, cellular communication, wireless communication, etc. For clarity of illustration, only two remote apparatuses are shown in FIG. 1. It is to be understood that the system may include any number of remote apparatuses for monitoring any number of individuals.

[0073] In the preferred embodiment, each individual to be monitored is also provided with a monitoring device 64. Monitoring device 64 is designed to produce measurements of a physiological condition of the individual, record the measurements, and transmit the measurements to the individual's remote apparatus 60 through a standard connection cable 62. Examples of suitable monitoring devices include blood glucose meters, respiratory flow meters, blood pressure cuffs, electronic weight scales, and pulse rate monitors. Such monitoring devices are well known in the art.

[0074] The specific type of monitoring device provided to each individual is dependent upon the individual's disease. For example, diabetes patients are provided with blood glucose meters for measuring blood glucose concentrations, asthma patients are provided with respiratory flow meters for measuring peak flow rates, obesity patients are provided with weight scales, etc.

[0075] FIG. 2 shows server 50, workstation 52, and remote apparatus 60 in greater detail. Server 50 includes a database 66 for storing script programs 68. The script programs 68 are executed by each remote apparatus 60 to communicate queries and messages to an individual, receive responses 70 to the queries, collect monitoring device measurements 72, and transmit responses 70 and measurements 72 to server 50. Database 66 is designed to store the responses 70 and measurements 72. Database 66 further includes a look-up table 74. Table 74 contains a list of the individuals to be monitored, and for each individual, a unique individual identification code and a respective pointer to script program 68 assigned to the individual. Each remote apparatus 60 is designed to execute the assigned script program which it receives from server 50.

[0076] FIGS. 3-4 show the structure of remote apparatus 50 according to the preferred embodiment. Referring to FIG. 3, remote apparatus 60 includes a housing 90. Housing 90 is preferably sufficiently compact to enable the remote apparatus to be hand-held and carried by an individual. Remote apparatus 60 also includes a user interface for communicating queries to the individual and for receiving responses to the queries.

[0077] In the preferred embodiment, the user interface includes a display 92 and four user input buttons 98A, 98B, 98C, and 98D. Display 92 displays queries and prompts to the individual, and is preferably a liquid crystal display (LCD). The user input buttons 98A, 98B, 98C, and 98D are for entering responses to the queries and prompts. The user input buttons are preferably momentary contact push buttons. Although the user interface of the preferred embodiment includes a display and input buttons, it will be apparent to one skilled in the art of electronic devices that any suitable user interface may be used in remote apparatus 60. For example, the user input buttons may be replaced by switches, keys, a touch sensitive display screen, or any other data input device. Alternatively, the display and input buttons may be replaced by a speech synthesis/speech recognition interface.

[0078] Three monitoring device jacks 96A, 96B, and 96C are located on a surface of housing 90. Device jacks 96A, 96B, and 96C are for connecting remote apparatus 60 to a number of monitoring devices, such as blood glucose meters, respiratory flow meters, or blood pressure cuffs, through standard connection cables (not shown). Remote apparatus 60 also includes a modem jack 94 for connecting remote apparatus 60 to a telephone jack through a standard connection cord (not shown). Remote apparatus 60 further includes a visual indicator, such as a light emitting diode (LED) 100. LED 100 is for visually notifying the individual that he or she has unanswered queries stored in remote apparatus 60.

[0079] FIG. 4 is a schematic block diagram illustrating the components of remote apparatus 60 in greater detail. Remote apparatus 60 includes a microprocessor 102 and a memory 108 connected to microprocessor 102. Memory 108 is preferably a non-volatile memory, such as a serial EEPROM. Memory 108 stores script programs received from the server, measurements received from monitoring device 64, responses to queries, and the individual's unique identification code. Microprocessor 102 also includes built-in read only memory (ROM) which stores firmware for controlling the operation of remote apparatus 60. The firmware includes a script interpreter used by microprocessor 102 to execute the script programs. The script interpreter interprets script commands which are executed by microprocessor 102. Specific techniques for interpreting and executing script programs in this manner are well known in the art.

[0080] Microprocessor 102 is preferably connected to memory 108 using a standard two-wire I.sup.2C interface. Microprocessor 102 is also connected to user input buttons 98A, 98B, 98C, and 98D, LED 100, a clock 112, and a display driver 110. Clock 112 indicates the current date and time to microprocessor 102. For clarity of illustration, clock 112 is shown as a separate component, but is preferably built into microprocessor 102. Display driver 110 operates under the control of microprocessor 102 to display information on display 92. Microprocessor 102 is preferably a PIC 16C65 processor which includes a universal asynchronous receiver transmitter (DART) 104. DART 104 is for communicating with a modem 114 and a device interface 118. A CMOS switch 116 under the control of microprocessor 102 alternately connects modem 114 and interface 118 to DART 116.

[0081] Modem 114 is connected to a telephone jack 119 through modem jack 94. Modem 114 is for exchanging data with the server through the communication network. The data includes script programs which are received from the server as well as responses to queries, device measurements, script identification codes, and the individual's unique identification code which modem 114 transmits to server 50. Modem 114 is preferably a complete 28.8 K modem commercially available from Cermetek, although any suitable modem may be used.

[0082] Device interface 118 is connected to device jacks 96A, 96B, and 96C. Device interface 118 is for interfacing with a number of monitoring devices, such as blood glucose meters, respiratory flow meters, blood pressure cuffs, weight scales, or pulse rate monitors, through device jacks 96A, 96B, and 96C. Device interface 118 operates under the control of microprocessor 102 to collect measurements 72 from monitoring devices 64 and to output measurements 72 to microprocessor 102 for storage in memory 108. In the preferred embodiment, interface 118 is a standard RS232 interface. For simplicity of illustration, only one device interface 118 is shown in FIG. 4. However, in alternative embodiments, remote apparatus 60 may include multiple device interfaces to accommodate monitoring devices which have different connection standards.

[0083] Referring again to FIG. 2, server 50 includes a monitoring application 76. Monitoring application 76 is a controlling software application executed by server 50 to perform the various functions described below. Monitoring application 76 includes a script generator 78, a script assignor 80, and a report generator 82. Script generator 78 is designed to generate script programs 68 from script information entered through workstation 52. The script information is entered through a script entry screen 84. In the preferred embodiment, script entry screen 84 is implemented as a web page on the server 50. Workstation 52 includes a web browser for accessing the web page to enter the script information.

[0084] FIG. 5 illustrates script entry screen 84 as it appears on workstation 52. Script entry screen 84 includes a script name field 120 for specifying the name of script program to be generated. Screen 84 also includes entry fields 122 for entering a set of queries to be answered by an individual. Each entry field 122 has corresponding response choice fields 124 for entering response choices for the query. Screen 84 further includes check boxes 126 for selecting a desired monitoring device type from which to collect measurements, such as a blood glucose meter, respiratory flow meter, or blood pressure cuff.

[0085] Screen 84 additionally includes a connection time field 128 for specifying a prescribed connection time at which each remote apparatus executing the script program is to establish a subsequent communication link to the server. The connection time is preferably selected to be the time at which communication rates are the lowest, such as 3:00 AM. Screen 84 also includes a CREATE SCRIPT button 130 for instructing the script generator to generate a script program from the information entered in screen 84. Screen 84 further includes a CANCEL button 132 for canceling the information entered.

[0086] In the preferred embodiment, each script program created by the script generator 82 conforms to the standard file format used on UNIX systems. In the standard file format, each command is listed in the upper case and followed by a colon. Every line in the script program is terminated by a linefeed character {LF}, and only one command is placed on each line. The last character in the script program is a UNIX end of file character (EOF). TABLE 1 shows an exemplary listing of script commands used in the preferred embodiment of the invention.

TABLE-US-00001 TABLE 1 SCRIPT COMMANDS Command Description CLS: {LF} Clear the display. ZAP: {LF} Erase from memory the last set of query responses recorded. LED: b{LF} Turn the LED on or off, where b is a binary digit of 0 or 1. An argument of 1 turns on the LED, and an argument of 0 turns off the LED. DISPLAY: {chars}{LF} Display the text following the DISPLAY command. INPUT: mmmm{LF} Record a button press. The m's represent a button mask pattern for each of the four input buttons. Each m contains an "X" for disallowed buttons or an "O" for allowed buttons. For example, INPUT: OXOX{LF} allows the user to press either button #1 or #3. WAIT: {LF} Wait for any one button to be pressed, then continue executing the script program. COLLECT: device{LF} Collect measurements from the monitoring device specified in the COLLECT command. The user is preferably prompted to connect the specified monitoring device to the apparatus and press a button to continue. NUMBER: aaaa{LF} Assign a script identification code to the script program. The script identification code from the most recently executed NUMBER statement is subsequently transmitted to the server along with the query responses and device measurements. The script identification code identifies to the server which script program was most recently executed by the remote apparatus. DELAY: t {LF} Wait until time t specified in the DELAY command, usually the prescribed connection time. CONNECT: {LF} Perform a connection routine to establish a communication link to the server, transmit the patient identification code, query responses, device measurements, and script identification code to the server, and receive and store a new script program. When the server instructs the apparatus to disconnect, the script interpreter is restarted, allowing the new script program to execute.

[0087] The script commands illustrated in TABLE 1 are representative of the preferred embodiment and are not intended to limit the scope of the invention. After consideration of the ensuing description, it will be apparent to one skilled in the art many other suitable scripting languages and sets of script commands may be used to implement the system and method of the invention.

[0088] Script generator 78 preferably stores a script program template which it uses to create each script program. To generate a script program, script generator 78 inserts into the template the script information entered in script entry screen 84. For example, FIGS. 6A-6B illustrate a sample script program created by the script generator from the script information shown in FIG. 5.

[0089] The script program includes display commands to display the queries and response choices entered in fields 122 and 124, respectively. The script program also includes input commands to receive responses to the queries. The script program further includes a collect command to collect device measurements from the monitoring device specified in check boxes 126. The script program also includes commands to establish a subsequent communication link to the server at the connection time specified in field 128. The steps included in the sample Script program are also shown in the flow chart of FIGS. 12A-12B and will be discussed in the operation section below.

[0090] Referring again to FIG. 2, script assignor 80 is for assigning the script programs 68 to the individuals. The script programs are assigned in accordance with script assignment information entered through workstation 52. The script assignment information is entered through a script assignment screen 86, which is preferably implemented as a web page on server 50.

[0091] FIG. 7 shows a sample script assignment screen 86 as it appears on the workstation. Screen 86 includes check boxes 134 for selecting the script program to be assigned and check boxes 136 for selecting the individuals to whom the script program is to be assigned. Screen 86 also includes an ASSIGN SCRIPT button 140 for entering the assignments. When button 140 is pressed, s the script assignor creates and stores for each individual selected in check boxes 136 a respective pointer to the script program selected in check boxes 134. Each pointer is stored in the look-up table 74 of database 66. Screen 86 further includes an ADD SCRIPT button 138 for accessing the script entry screen and a DELETE SCRIPT button 142 for a deleting script program.

[0092] Referring again to FIG. 2, report generator 82 is designed to generate a report 88 from the responses 70 and device measurements 72 received in server 50. Report 88 is displayed on workstation 52. FIG. 10 shows a sample patient report 88 produced by report generator 82 for a selected individual. Report 88 includes a graph 146 of the device measurements received from the individual, as well as a listing of the query responses received from the individual. Specific techniques for writing a report generator program to display data in this manner are well known in the software art.

[0093] The operation of the preferred embodiment is illustrated in FIGS. 1-12. FIG. 11 is a flow chart illustrating steps included in the monitoring application executed by server 50. In step 202, the server determines if new script information has been entered through script entry screen 84. If new script information has not been entere, the server proceeds to step 206. If new script information has been entered, the server proceeds to step 204.

[0094] As shown in FIG. 5, the script information includes a set of queries, and for each of the queries, corresponding responses choices. The script information also includes a selected monitoring device type from which to collect measurements. The script information further includes a prescribed connection time for each remote apparatus to establish a subsequent communication link to the server. The script information is generally entered in the server by a healthcare provider, such as the individuals' physician or case manager. Of course, any person desiring to communicate with the individual may also be granted access to the server to create and assign script programs. Further, it is to be understood that the system may include any number of workstations for entering script generation and script assignment information into the server.

[0095] In step 204, script generator 78 generates a script program from the information entered in screen 84. The script program is stored in database 66. Steps 202 and 204 are preferably repeated to generate multiple script programs, e.g. a script program for diabetes patients, a script program for asthma patients, etc. Each script program corresponds to a respective one of the sets of queries entered through script entry screen 84. Following step 204, the server proceeds to step 206.

[0096] In step 206, the server determines if new script assignment information has been entered through script assignment screen 86. If new script assignment information has not been entered, the server proceeds to step 210. If new script assignment information has been entered, the server proceeds to step 208. As shown in FIG. 7, script programs are assigned to each individual by selecting a script program through check boxes 134, selecting the individuals to whom selected the script program is to be assigned through check boxes 136, and pressing the ASSIGN SCRIPT button 140. When button 140 is pressed, script assignor 86 creates for each individual selected in check boxes 136 a respective pointer to the script program selected in check boxes 134. In step 208, each pointer is stored in look-up table 74 of database 66. Following step 208, the server proceeds to step 210.

[0097] In step 210, the server determines if any of the remote apparatuses are remotely connected to the server. Each individual to be monitored is preferably provided with his or her own remote apparatus which has the individual's unique identification code stored therein. Each individual is thus uniquely associated with a respective one of the remote apparatuses. If none of remote apparatuses are connected, the server proceeds to step 220.

[0098] If a remote apparatus is connected, the server receives from the apparatus the individual's unique identification code in step 212. In step 214, the server receives from the apparatus the query responses, device measurements, and script identification code recorded during execution of a previously assigned script program. The script identification code identifies to the server which script program was executed by the remote apparatus to record the query responses and device measurements. The responses, device measurements, and script identification code are stored in database 66.

[0099] In step 216, the server uses the individual's unique identification code to retrieve from look-up table 74 the pointer to the script program assigned to the individual. The server then retrieves the assigned script program froth the database 66. In step 218, the server transmits the assigned script program to the individual's remote apparatus through the communication network 58. Following step 218, the server proceeds to step 220.

[0100] In step 220, the server determines if a report request has been received from workstation 52. If no report request has been received, the server returns to step 202. If a report request has been received for a selected individual, the server retrieves from database 66 the query responses and measurements last received from the individual, step 222. In step 224, the server generates and displays the report 88 on workstation 52.

[0101] As shown in FIG. 10, the report includes the query responses and device measurements last received from the individual. Following step 224, the server returns to step 202.

[0102] FIG. 12 illustrates the steps included in a sample script program executed by the remote apparatus. Before the script program is received, the remote apparatus is initially programmed with the individual's unique identification code and the script interpreter used by microprocessor 102 to execute script programs. The initial programming may be achieved during manufacture or during an initial connection to the server. Following initial programming, the remote apparatus receives from the server the script program assigned to the individual associated with the apparatus. The script program is received by modem 114 through a first communication link to the server and stored in memory 108.

[0103] In step 302, microprocessor 102 assigns a script identification code to the script program and stores the script identification code in memory 108. The script identification code is subsequently transmitted to the server along with query responses and device measurements to identify to the server which script program was most recently executed by the remote apparatus. In step 304, microprocessor 102 lights LED 100 to notify the individual that he or she has unanswered queries stored in the remote apparatus. LED 100 preferably remains lit until the queries are answered by the individual. In step 306, microprocessor 102 erases from memory 108 the last set of query responses recorded,

[0104] In step 308, microprocessor 102 prompts the individual by displaying on display 92 "ANSWER QUERIES NOW? PRESS ANY BUTTON TO START". In step 310, microprocessor 102 waits until a reply to the prompt is received from the individual. When a reply is received, microprocessor 102 proceeds to step 312. In step 312, microprocessor 102 executes successive display and input commands to display the queries and response choices on display 92 and to receive responses to the queries.

[0105] FIG. 8 illustrate a sample query and its corresponding response choices as they appear on display 92. The response choices are preferably positioned on display 92 such that each response choice is located proximate a respective one of the user input buttons 98A, 98B, 98C, and 98D. In the preferred embodiment, each response choice is displayed immediately above a respective user input button. The individual presses the button corresponding to his or her response, and microprocessor 102 stores the response in memory 108.

[0106] In steps 314 to 318, microprocessor 102 executes commands to collect device measurements from a selected monitoring device specified in the script program. In step 314, microprocessor 102 prompts the individual to connect the selected device to one of the device jacks 96A, 96B, or 96C. A sample prompt is shown in FIG. 9. In step 316, microprocessor 102 waits until a reply to the prompt is received from the individual. When a reply is received, microprocessor 102 proceeds to step 318. Microprocessor 102 also connects UART 104 to device interface 118 through CMOS switch 116. In step 318, microprocessor 102 collects device measurements from the selected device through device interface 118. The device measurements are stored in memory 108.

[0107] In step 320, microprocessor 102 prompts the individual to connect remote apparatus 60 to telephone jack 119 so that the apparatus may connect to the server at the prescribed connection time. In step 322, microprocessor 102 waits until a reply to the prompt is received from the individual. When a reply is received, microprocessor 102 turns off LED 100 in step 324. In step 326, microprocessor 102 waits until it is time to connect to the server. Microprocessor 102 compares the connection time specified in the script program to the current time output by clock 112. When it is time to connect, microprocessor 102 connects DART 104 to modem 114 through CMOS switch 116.

[0108] In step 328, microprocessor 102 establishes a subsequent communication link between remote apparatus 60 and server 50 through modem 114 and communication network 58. If the connection fails for any reason, microprocessor 102 repeats step 328 to get a successful connection. In step 330, microprocessor 102 transmits the query responses, device measurements, script identification code, and the individual's unique identification code stored in memory 108 to the server. In step 332, microprocessor 102 receives through modem 114 a newly assigned script program from the server. The new script program is stored in memory 108 for subsequent execution by microprocessor 102. Following step 332, the script program ends.

[0109] After the individual's information has been collected via remote apparatus 60 and the script programs, the data is mined to distinguish patterns. Data mining programs are well known in the art and can be easily adapted to this system. In the preferred embodiment, the data mining program includes a data table 150, as shown in FIG. 13. Data table 150 is stored on the server and has an individual identification number field 151, name fields 152, value fields 154 corresponding to the name fields, and explanation fields 156 corresponding to the name fields and value fields. The data type is entered into name fields 152, the possible numerical values corresponding to the data type are entered into value fields 154, and brief explanations of the data types and corresponding values are entered into explanation fields 156.

[0110] The individuals' device measurements and responses to the queries are entered into data table 150 in the form of numerical values in value fields 154. The individual's identification number is entered into individual identification number field 151. An example of data table 150 in which the individuals' information has been entered is shown in FIG. 14. Once data table 150 contains all the necessary information, the data mining program then compares the information.

[0111] FIG. 15 is a flowchart illustrating a first method of the present invention carried out by the server using the data mining techniques described above. In step 400, individuals having a risk factor for a disease are selected. In step 402, these individuals are queried about their behavior and environment using the script programs and remote apparatuses previously described. The responses to the queries and any device measurements are received and stored by the server. Collection of the responses and device measurements can occur over any period of time, thus allowing for more accurate data.

[0112] After the server receives the responses and measurements, a database comprising the individuals' behavioral and environmental profiles is created in step 404. In step 406, data mining techniques are used to group individuals having similar behavioral and environmental profiles. In step 408, the server determines if it is necessary to further group the individuals in order to produce smaller groups. Steps 406 and 408 can be repeated as often as necessary.

[0113] In step 410, each group of individuals is categorized using data mining techniques. The individuals are categorized according to their disease progressions. For example, a group of individuals can be categorized into those that have a severe disease phenotype, those that have a moderate disease phenotype, and those that have a mild disease phenotype: In step 412, the server determines if it is necessary to further categorize the individuals. Steps 410 and 412 can be repeated as often as necessary.

[0114] In step 414, the genomes of all the individuals are sequenced by genotyping system 56. The genotypes of all the individuals are transmitted to server 50. In step 416, data mining techniques are used to compare the genotypes of the individuals between the categories. For example, if those individuals who have a severe disease phenotype and are overweight have a certain gene sequence, while those individuals who have a mild disease phenotype and are overweight do not, it Is likely the gene sequence is responsible for the severe disease phenotype. If a gene sequence is found, it is further identified in step 418. Methods of isolating and identifying gene sequences are well known in the field.

[0115] FIG. 16 is a block diagram illustrating an example of the first method of the present invention as described in FIG. 15. First individuals having a risk factor for a certain disease, such as is non-insulin dependent diabetes mellitus (NIDDM), are selected, as indicated at block 422. Behavioral and environmental information from each individual is collected using the script programs and remote apparatuses. Using data mining techniques, the individuals are then grouped into overweight individuals 424 and non-overweight individuals 426. Using- data mining techniques, the individuals are then categorized into overweight individuals having severe NIDDM 428, overweight individuals having mild NIDDM 430, non-overweight individuals having mild NIDDM 432, and non-overweight individuals having severe NIDDM 434.

[0116] The individuals' genotype information is then taken, as indicated at block 436, to determine the individuals' gene sequences. For example, overweight individuals with severe NIDDM have gene sequence A, overweight individuals with mild NIDDM have gene sequence B, non-overweight individuals with mild NIDDM have gene sequence B, and non-overweight individuals with severe NIDDM have gene sequence A. Data mining techniques are then used to analyze the information and come to a conclusion. In this example, data mining would conclude that the severe NIDDM phenotype is likely related to gene sequence A, not the individual's weight.

[0117] FIG. 17 shows a flowchart illustrating a second method of the present invention carried out by the server using the data mining techniques described above. In step 500, individuals having a risk factor for a disease are selected. In step 502, these individuals are queried about their behavior and environment using the script programs and remote apparatuses previously described. The responses to the queries and any device measurements are received and stored by the server.

[0118] After the server receives the responses and measurements from the remote apparatuses, a database comprising the individuals' behavioral and environmental profiles is created in step 504. In step 506, data mining techniques are used to group together individuals having similar disease progressions. For example, a group of individuals can be grouped into those that have a severe disease phenotype, those that have a moderate disease phenotype, and those that have a mild disease phenotype. In step 508, the server determines if it is necessary to further group the individuals in order to produce smaller groups. Steps 506 and 508 can be repeated as often as necessary.

[0119] In step 510, each group of individuals created in steps 506 and 508 is categorized using data mining techniques according to the behavioral and environmental profiles of the individuals. In step 512, the server determines if it is necessary to further group the individuals in order to produce smaller groups. Steps 510 and 512 can be repeated as often as necessary.

[0120] In step 514, the genomes of all the individuals are sequenced by genotyping system 56. The genotypes of all the individuals are transmitted to the server. In step 516, data mining techniques are used to compare the genotypes of the individuals between the categories. For example, if those individuals who have a severe disease phenotype and are overweight have a certain gene sequence, while those individuals who have a mild disease and are also overweight phenotype do riot, it is likely the gene sequence, not weight; is responsible for the severe disease phenotype. If a gene sequence is found, it is further identified in step 518. Specific techniques of isolating and identifying gene sequences are well known in the field.

[0121] FIG. 18 is a block diagram illustrating an example of the second method of the present invention as described in FIG. 17. First individuals having a risk factor for a certain disease, such as NIDDM, are chosen, as indicated at block 522. Behavioral and environmental information from each individual is collected using the remote apparatuses and script programs. Using data mining techniques, the individuals are then grouped into those exhibiting severe NIDDM 524 and those exhibiting mild NIDDM 525. Using data mining techniques, the individuals are then categorized into overweight individuals having severe NIDDM 528, non--overweight individuals having severe NIDDM 530, non-overweight individuals having mild NIDDM 532, and overweight individuals having mild NIDDM 534.

[0122] The individuals' genotype information is then taken, as indicated at block 535, to determine the individuals gene sequences. For example, individuals with severe NIDDM who are overweight have gene sequence A, individuals with severe NIDDM who are non-overweight have gene sequence A, individuals with mild NIDDM who are non-overweight have gene sequence B, and individuals with severe NIDDM who are overweight have gene sequence B. Data mining techniques are then used to analyze the information and come to a conclusion. In this example, data mining would conclude that the severe NIDDM phenotype is likely related to gene sequence A, not the individual's weight.

[0123] FIG. 19 shows a flowchart illustrating a preferred method carried out by server 50 to identify a disease-identifying substance. In step 600, individuals having a risk factor for a disease are selected. In step 602, these individuals are queried about their behavior and environment using the script programs and remote apparatuses previously described. The responses to the queries and any device measurements are received and stored by the server.

[0124] After the server receives the responses and measurements from the remote apparatuses, a database comprising the individuals' behavioral and environmental profiles is created in step 604. In step 606, the genomes of all the individuals are sequenced, and the genotypes of all the individuals are transmitted to the server. In step 608, individuals having the sake or close genotypes are grouped together. In step 610, data mining techniques are used to categorize together individuals having similar disease progressions. In step 612, the server determines if it is necessary to further categorize the individuals in order to produce smaller groups. Steps 610 and 612 can be repeated as often as necessary.

[0125] In step 614, data mining techniques are used to find a disease-influencing substance between the categories of individuals by using the individuals behavioral and environmental profiles. For example, if those individuals who have a severe disease phenotype are overweight, while those individuals who have a mild disease phenotype are not, it is likely weight is responsible for the severe disease phenotype. If such a disease-influencing substance is found, it is identified in step 618. If no disease-influencing substance is found, the process is preferably repeated.

[0126] FIG. 20 is a block diagram illustrating an example of the method described in FIG. 19. First, individuals having a risk factor for a certain disease, such as NIDDM, are chosen, as indicated at block 620. Behavioral and environmental information from each individual is collected using the remote apparatuses and script programs. The individuals' genotype information is then taken, as indicated at block 622, to determine the individuals' gene sequences. The individuals are then grouped according to their gene sequences. For example, one group may have gene sequence A, as indicated at block 624, while another group may have gene sequence B, as indicated at block 626. Using data mining techniques, the individuals are then categorized into individuals with gene sequence A having severe NIDDM 628, individuals with gene sequence A having mild NIDDM 630, individuals with gene sequence B having mild NIDDM 632, and individuals with gene sequence B having severe NIDDM 634.

[0127] Data mining techniques are further used to analyze the categories of individuals and their behavioral and environmental profiles. For example, overweight individuals 638 with severe NIDDM have gene sequence A, non-overweight individuals 640 with mild NIDDM have gene sequence A, overweight individuals 642 with mild NIDDM have gene sequence B, and non-overweight individuals 644 with severe NIDDM have gene sequence B. Data mining techniques are then used to analyze the information and come to a conclusion. In this example, data mining would conclude that the severe NIDDM phenotype is likely related to gene sequence A, not the individual's weight.

SUMMARY, RAMIFICATIONS, AND SCOPE

[0128] Although the above description contains many specificities, these should not be construed as limitations on the scope of the invention but merely as illustrations of some of the presently preferred embodiments. Many other embodiments of the invention are possible. For example, the scripting language and script commands shown are representative of the preferred embodiment. It will be apparent to one skilled in the art that many other scripting languages and specific script commands may be used to implement the invention.

[0129] Moreover, the invention is not limited to the specific applications described. The system and method of the invention have many other applications. For example, pharmaceutical manufacturers may apply the system in clinical trials to analyze new drug data.

[0130] Therefore, the scope of the invention should be determined by the appended claims and their legal equivalents.

* * * * *