U.S. patent application number 11/489292 was filed with the patent office on 2006-07-18 and published on 2007-03-22 for computer-aided visualization of expression comparison.
This patent application is currently assigned to Affymetrix, Inc. Invention is credited to David Balaban, Josie Dai, Kurt Gish, Elina Khurgin, David H. Mack, Jim Snyder.
Application Number | 20070067111 11/489292 |
Family ID | 21800291 |
Filed Date | 2006-07-18 |
United States Patent Application | 20070067111 |
Kind Code | A1 |
Mack; David H. ; et al. | March 22, 2007 |
Computer-aided visualization of expression comparison
Abstract
Innovative systems and methods for visualizing information
collected from analyzing samples are provided. The samples may
include nucleic acids, proteins, or other polymers. Gene expression
level as determined from analysis of a nucleic acid sample is one
possible analysis result that may be visualized. In one embodiment,
a computer system may display the expression levels of multiple
genes simultaneously in a way that facilitates user identification
of genes whose expression is significant to a characteristic such
as disease or resistance to disease. Additionally, the computer
system may facilitate display of further information about relevant
genes once they are identified.
Inventors: | Mack; David H. (Menlo Park, CA); Gish; Kurt (Sunnyvale, CA); Balaban; David (San Jose, CA); Khurgin; Elina (Cupertino, CA); Dai; Josie (San Jose, CA); Snyder; Jim (Palo Alto, CA) |
Correspondence Address: | TOWNSEND AND TOWNSEND AND CREW LLP, TWO EMBARCADERO CENTER, 8TH FLOOR, SAN FRANCISCO, CA 94111-3834, US |
Assignee: | Affymetrix, Inc., Santa Clara, CA |
Family ID: | 21800291 |
Appl. No.: | 11/489292 |
Filed: | July 18, 2006 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10028748 | Dec 21, 2001 | |
11489292 | Jul 18, 2006 | |
09020743 | Feb 9, 1998 | 6420108 |
10028748 | Dec 21, 2001 | |
11080216 | Mar 14, 2005 | |
11489292 | Jul 18, 2006 | |
10374170 | Feb 25, 2003 | 6882742 |
11080216 | Mar 14, 2005 | |
09836867 | Apr 16, 2001 | 6567540 |
10374170 | Feb 25, 2003 | |
09122167 | Jul 24, 1998 | 6229911 |
09836867 | Apr 16, 2001 | |
60069436 | Dec 11, 1997 | |
60053842 | Jul 25, 1997 | |
60069198 | Dec 11, 1997 | |
60069436 | Dec 11, 1997 | |
Current U.S. Class: | 702/20 |
Current CPC Class: | G16B 25/00 20190201; G06T 11/206 20130101; C12Q 1/68 20130101; G16B 45/00 20190201; C12N 15/1089 20130101 |
Class at Publication: | 702/020 |
International Class: | G06F 19/00 20060101 G06F019/00 |
Claims
1-48. (canceled)
49. A method for analyzing expression level information, the method
comprising: displaying a first axis indicative of a value of a
first expression level for a first expressed sequence; displaying a
first mark at a first position, the first position associated with
a first coordinate related to the first axis in accordance with the
first expression level of the first expressed sequence; generating
a sound associated with the first mark, the sound indicative of a
second expression level for the first expressed sequence.
50. A method for analyzing expression level information, the method
comprising: displaying a first axis indicative of a value of a
first expression level for a first expressed sequence; displaying a
first mark at a first position, the first position associated with
a first coordinate related to the first axis in accordance with the
first expression level of the first expressed sequence; generating
a sound associated with the first mark, the sound indicative of a
second expression level for the first expressed sequence; receiving
an input of selection of the first mark; and in response to the
input, displaying information associated with the first expressed
sequence.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is a divisional application of U.S.
application Ser. No. 10/028,748, which is a continuation
application of U.S. application Ser. No. 09/020,743, which claims
priority to U.S. Provisional No. 60/069,436. U.S. application Ser.
Nos. 10/028,748 and 09/020,743 are incorporated by reference
herein, and U.S. application Ser. No. 09/020,743 has been issued
into U.S. Pat. No. 6,420,108.
BACKGROUND OF THE INVENTION
[0002] The present invention relates to the field of computer
systems. More specifically, the present invention relates to
computer systems for visualizing analysis results.
[0003] Devices and computer systems for forming and using arrays of
materials on a substrate are known. For example, PCT Publication
No. WO 92/10588, incorporated herein by reference for all purposes,
describes techniques for sequencing or sequence checking nucleic
acids and other materials. Arrays for performing these operations
may be formed according to the methods of, for example, the
pioneering techniques disclosed in U.S. Pat. No. 5,143,854 and U.S.
Pat. No. 5,593,839 both incorporated herein by reference for all
purposes.
[0004] According to one aspect of the techniques described therein,
an array of nucleic acid probes is fabricated at known locations on
a substrate or chip. A fluorescently labeled nucleic acid is then
brought into contact with the chip and a scanner generates an image
file (which is processed into a cell file) indicating the locations
where the labeled nucleic acids bound to the chip. Based upon the
cell file and identities of the probes at specific locations, it
becomes possible to extract information such as the monomer
sequence of DNA or RNA. Such systems have been used to form, for
example, arrays of DNA that may be used to study and detect
mutations relevant to cystic fibrosis, the P53 gene (relevant to
certain cancers), HIV, and other genetic characteristics.
[0005] Computer-aided techniques for monitoring gene expression
using such arrays of probes have also been developed as disclosed
in U.S. patent application Ser. No. 08/828,952 (Attorney Docket No.
16528X-028900US) and PCT Publication No. WO 97/10365 (Attorney
Docket No. 16528X-01711OPC), the contents of which are herein
incorporated by reference. Many disease states are characterized by
differences in the expression levels of various genes either
through changes in the copy number of the genetic DNA or through
changes in levels of transcription (e.g., through control of
initiation, provision of RNA precursors, RNA processing, etc.) of
particular genes. For example, losses and gains of genetic material
play an important role in malignant transformation and progression.
Furthermore, changes in the expression (transcription) levels of
particular genes (e.g., oncogenes or tumor suppressors), serve as
signposts for the presence and progression of various cancers.
[0006] It is desirable to identify genes having expression levels
relevant to diagnosis of a diseased state by analyzing the
expression levels of large numbers of genes in both diseased and
normal individuals. Methods for collecting the expression level
information have been developed. However, the user interfaces for
gene expression monitoring systems that have been developed until
now are designed to clearly present the expression of particular
pre-selected genes. A user seeking to identify, e.g., an oncogene
or a tumor suppressor gene, must individually review the expression
level of large numbers of genes and compare the expression levels
between diseased and normal individuals. What is needed is a user
interface that takes advantage of collected gene expression
information to help the user to identify particular genes of
interest.
BRIEF SUMMARY OF THE INVENTION
[0007] The present invention provides innovative systems and
methods for visualizing information collected from analyzing
samples. The samples may include nucleic acids, proteins, or other
polymers. Gene expression level as determined from analysis of a
nucleic acid sample is one possible analysis result that may be
visualized. In one embodiment, a computer system may display the
expression levels of multiple genes simultaneously in a way that
facilitates user identification of genes whose expression is
significant to a characteristic such as disease or resistance to
disease. Additionally, the computer system may facilitate display
of further information about relevant genes once they are
identified.
[0008] A first aspect of the invention provides a computer
implemented method for presenting expression level information as
collected from first and second samples. The method includes steps
of: displaying a first axis corresponding to expression level in
the first sample, and displaying a second axis substantially
perpendicular to the first axis, the second axis corresponding to
expression level in the second sample. The method further includes
a step of: for a selected expressed sequence, displaying a mark at
a position. The position is selected relative to the first axis in
accordance with an expression level of the selected expressed
sequence in the first sample and relative to the second axis in
accordance with an expression level of the selected expressed
sequence in the second sample. A particularly useful application is
displaying many marks simultaneously for many selected genes to
discover which ones of the selected genes may be relevant to the
characteristic.
[0009] A second aspect of the invention provides a
computer-implemented method of presenting sample analysis
information. The method includes steps of: displaying a first axis
corresponding to a concentration of a compound in a first sample as
determined by monitoring binding of the compound to a selected
polymer having binding affinity to the compound, and displaying a
second axis substantially perpendicular to the first axis. The
second axis corresponds to a concentration of the compound in the
second sample as determined by monitoring binding of the compound
to the selected polymer. The method further preferably includes a
step of displaying a mark at a position. The position is selected
relative to the first axis in accordance with the concentration in
the first sample and relative to the second axis in accordance with
the concentration in the second sample.
[0010] A further understanding of the nature and advantages of the
inventions herein may be realized by reference to the remaining
portions of the specification and the attached drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] FIG. 1 illustrates an example of a computer system that may
be used to execute software embodiments of the present
invention.
[0012] FIG. 2 shows a system block diagram of a typical computer
system.
[0013] FIG. 3 illustrates an overall system for forming and
analyzing arrays of polymers including biological materials such as
DNA or RNA.
[0014] FIG. 4 is an illustration of an embodiment of software for
the overall system.
[0015] FIG. 5 shows a flowchart of a process of monitoring the
expression of a gene by comparing hybridization intensities of
pairs of perfect match and mismatch probes.
[0016] FIG. 6 shows a screen display illustrating gene expression
levels for multiple genes as collected from both normal and
diseased tissue.
[0017] FIGS. 7A-7B show screen displays illustrating information
(SEQ ID NOS:1 and 2) about a particular gene selected from the
display of FIG. 6.
DETAILED DESCRIPTION OF THE INVENTION
[0018] The present invention provides innovative methods of
monitoring and visualizing gene expression. In the description that
follows, the invention will be described in reference to preferred
embodiments. However, the description is provided for purposes of
illustration and not for limiting the spirit and scope of the
invention.
[0019] FIG. 1 illustrates an example of a computer system that may
be used to execute software embodiments of the present invention.
FIG. 1 shows a computer system 1 which includes a monitor 3, screen
5, cabinet 7, keyboard 9, and mouse 11. Mouse 11 may have one or
more buttons such as mouse buttons 13. Cabinet 7 houses a CD-ROM
drive 15 and a hard drive (not shown) that may be utilized to store
and retrieve software programs including computer code
incorporating the present invention. Although a CD-ROM 17 is shown
as the computer readable medium, other computer readable media
including floppy disks, DRAM, hard drives, flash memory, tape, and
the like may be utilized. Cabinet 7 also houses familiar computer
components (not shown) such as a processor, memory, and the
like.
[0020] FIG. 2 shows a system block diagram of computer system 1
used to execute software embodiments of the present invention. As
in FIG. 1, computer system 1 includes monitor 3 and keyboard 9.
Computer system 1 further includes subsystems such as a central
processor 50, system memory 52, I/O controller 54, display adapter
56, removable disk 58, fixed disk 60, network interface 62, and
speaker 64. Removable disk 58 is representative of removable
computer readable media like floppies, tape, CD-ROM, removable hard
drive, flash memory, and the like. Fixed disk 60 is representative
of an internal hard drive or the like. Other computer systems
suitable for use with the present invention may include additional
or fewer subsystems. For example, another computer system could
include more than one processor 50 (i.e., a multi-processor system)
or memory cache.
[0021] Arrows such as 66 represent the system bus architecture of
computer system 1. However, these arrows are illustrative of any
interconnection scheme serving to link the subsystems. For example,
display adapter 56 may be connected to central processor 50 through
a local bus or the system may include a memory cache. Computer
system 1 shown in FIG. 2 is but an example of a computer system
suitable for use with the present invention. Other configurations
of subsystems suitable for use with the present invention will be
readily apparent to one of ordinary skill in the art. In one
embodiment, the computer system is an IBM compatible personal
computer.
[0022] The VLSIPS.TM. and GeneChip.TM. technologies provide methods
of making and using very large arrays of polymers, such as nucleic
acids, on very small chips. See U.S. Pat. No. 5,143,854 and PCT
Patent Publication Nos. WO 90/15070 and 92/10092, each of which is
hereby incorporated by reference for all purposes. Nucleic acid
probes on the chip are used to detect complementary nucleic acid
sequences in a sample nucleic acid of interest (the "target"
nucleic acid).
[0023] It should be understood that the probes need not be nucleic
acid probes but may also be other receptors, such as antibodies, or
polymers such as peptides. Peptide probes may be used to detect the
concentration of other peptides, proteins, or other compounds in a
sample. The probes must be carefully selected to have bonding
affinity to the compound whose concentration they are to be used to
measure.
[0024] In one embodiment, the present invention provides methods of
visualizing information relating to the concentration of compounds
in a sample as measured by monitoring affinity of the compounds to
probes. In a particular application, the concentration information
is generated by analysis of hybridization intensity files for a
chip containing hybridized nucleic acid probes. The hybridization
of a nucleic acid sample to certain probes may represent the
expression level of one or more genes or expressed sequence tags
(ESTs). The expression level of a gene or EST is herein understood
to be the concentration within a sample of mRNA or protein that
would result from the transcription of the gene or EST.
[0025] Expression level information visualized by virtue of the
present invention need not be obtained from probes but may
originate from any source. If the expression information is
collected from a probe array, the probe array need not meet any
particular criteria for size and density. Furthermore, the present
invention is not limited to visualizing fluorescent measurements of
bondings such as hybridizations but may be readily utilized to
visualize other measurements.
[0026] Concentration of compounds other than nucleic acids may be
visualized according to one embodiment of the present invention.
For example, a probe array may include peptide probes which may be
exposed to protein samples, polypeptide samples, or other compounds
which may or may not bond to the peptide probes. By appropriate
selection of the peptide probes, one may detect the presence or
absence of particular compounds which would bond to the peptide
probes.
[0027] For purposes of illustration, the present invention is
described as being part of a system that designs a chip mask,
synthesizes the probes on the chip, labels nucleic acids from a
target sample, and scans the hybridized probes. Such a system is
set forth in U.S. Pat. No. 5,571,639 which is hereby incorporated
by reference for all purposes. However, the present invention may
be used separately from the overall system for analyzing data
generated by such systems, such as at remote locations, or for
visualizing the results of other systems for generating expression
information, or for visualizing concentrations of polymers other
than nucleic acids.
[0028] FIG. 3 illustrates a computerized system for forming and
analyzing arrays of biological materials such as RNA or DNA. A
computer 100 is used to design arrays of biological polymers such
as RNA or DNA. The computer 100 may be, for example, an
appropriately programmed IBM-compatible personal computer running
Windows NT, including appropriate memory and a CPU as shown in FIGS.
1 and 2. The computer system 100 obtains inputs from a user
regarding characteristics of a gene of interest, and other inputs
regarding the desired features of the array. Optionally, the
computer system may obtain information regarding a specific genetic
sequence of interest from an external or internal database 102 such
as GenBank. The output of the computer system 100 is a set of chip
design computer files 104 in the form of, for example, a switch
matrix, as described in PCT application WO 92/10092, and other
associated computer files.
[0029] The chip design files are provided to a system 106 that
designs the lithographic masks used in the fabrication of arrays of
molecules such as DNA. The system or process 106 may include the
hardware necessary to manufacture masks 110 and also the
computer hardware and software 108 necessary to lay the mask
patterns out on the mask in an efficient manner. As with the other
features in FIG. 3, such equipment may or may not be located at the
same physical site, but is shown together for ease of illustration
in FIG. 3. The system 106 generates masks 110 or other synthesis
patterns such as chrome on glass masks for use in the fabrication
of polymer arrays.
[0030] The masks 110, as well as selected information relating to
the design of the chips from system 100, are used in a synthesis
system 112. Synthesis system 112 includes the necessary hardware
and software used to fabricate arrays of polymers on a substrate or
chip 114. For example, synthesizer 112 includes a light source 116
and a chemical flow cell 118 on which the substrate or chip 114 is
placed. Mask 110 is placed between the light source and the
substrate/chip, and the two are translated relative to each other
at appropriate times for deprotection of selected regions of the
chip. Selected chemical reagents are directed through flow cell 118
for coupling to deprotected regions, as well as for washing and
other operations. All operations are preferably directed by an
appropriately programmed computer 119, which may or may not be the
same computer as the computer(s) used in mask design and mask
making.
[0031] The substrates fabricated by synthesis system 112 are
optionally diced into smaller chips and exposed to marked targets.
The targets may or may not be complementary to one or more of the
molecules on the substrate. The targets are marked with a label
such as a fluorescein label (indicated by an asterisk in FIG. 3)
and placed in scanning system 120. Scanning system 120 again
operates under the direction of an appropriately programmed digital
computer 122, which also may or may not be the same computer as the
computers used in synthesis, mask making, and mask design. The
scanner 120 includes a detection device 124 such as a confocal
microscope or CCD (charge coupled device) that is used to detect
the location where labeled target has bound to the substrate. The
output of scanner 120 is an image file(s) 124 indicating, in the
case of fluorescein labeled target, the fluorescence intensity
(photon counts or other related measurements, such as voltage) as a
function of position on the substrate. Since higher photon counts
will be observed where the labeled target has bound more strongly
to the array of polymers, and since the monomer sequence of the
polymers on the substrate is known as a function of position, it
becomes possible to determine the sequence(s) of polymer(s) on the
substrate that are complementary to the target.
[0032] The image file 124 is provided as input to an analysis
system 126 that incorporates the visualization and analysis methods
of the present invention. Again, the analysis system may be any one
of a wide variety of computer systems. The present invention
provides various methods of analyzing and visualizing the chip
design files and the image files, providing appropriate output 128.
The chip design need not include any particular number of probes.
It should be understood that the present invention does not require
any particular source of expression level information.
[0033] FIG. 4 provides a simplified illustration of the overall
software system used in the operation of one embodiment of the
invention. As shown in FIG. 4, the system first identifies the
nucleotide sequence(s) or targets that would be of interest in a
particular expression level analysis at step 202. The sequences of
interest correspond to mRNA transcripts of one or more genes, ESTs
or nucleic acids derived from the mRNA transcripts. Sequence
selection may be provided via manual input of text files or may be
from external sources such as GenBank.
[0034] At step 204 the system evaluates the sequences of interest
to determine or assist the user in determining which probes would
be desirable on the chip, and provides an appropriate "layout" on
the chip for the probes. The process of selecting probes for an
expression level analysis is explained in PCT Publication No. WO
97/10365, the contents of which are herein incorporated by
reference. An alternative probe selection process that does not
require prior knowledge of sequences of interest is explained in
PCT Publication No. WO97/27317 (Attorney Docket No.
18547-019410PC), the contents of which are herein incorporated by
reference. Further general background on probe selection is found
in PCT Publication No. WO95/11995 (Attorney Docket No.
18547-004111PC) and PCT Publication No. WO97/29212 (Attorney Docket
No. 18547-018540PC), the contents of which are herein incorporated
by reference. The term "perfect match probe" refers to a probe that
has a sequence that is perfectly complementary to a particular
target sequence. The test probe is typically perfectly
complementary to a portion (subsequence) of the target sequence.
The terms "mismatch control" and "mismatch probe" refer to probes
whose sequence is deliberately selected not to be perfectly
complementary to a particular target sequence. For each mismatch
(MM) control in an array there typically exists a corresponding
perfect match (PM) probe that is perfectly complementary to the
same particular target sequence.
[0035] The process compares hybridization intensities of pairs of
perfect match and mismatch probes that are preferably covalently
attached to the surface of a substrate or chip. Most preferably,
the nucleic acid probes have a density greater than about 60
different nucleic acid probes per 1 cm² of the substrate.
[0036] Initially, nucleic acid probes are selected that are
complementary to the target sequence. These probes are the perfect
match probes. Another set of probes is specified that are intended
to be not perfectly complementary to the target sequence. These
probes are the mismatch probes and each mismatch probe includes at
least one nucleotide mismatch from a perfect match probe.
Accordingly, a mismatch probe and the perfect match probe to which
it is identical except for one base make up a pair. As mentioned
earlier, the nucleotide mismatch is preferably near the center of
the mismatch probe.
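By way of illustration only, the pairing of perfect match and mismatch probes described above may be sketched in Python. The function names, the example target sequence, and the rule of substituting the complementary base at the central position are assumptions made for this sketch; the description states only that the mismatch is preferably near the center of the probe.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def perfect_match_probe(target, start, length=20):
    """Return a probe perfectly complementary to target[start:start+length].

    The probe is the reverse complement of the chosen target subsequence.
    """
    window = target[start:start + length]
    if len(window) != length:
        raise ValueError("window extends past the end of the target")
    return "".join(COMPLEMENT[base] for base in reversed(window))

def mismatch_probe(pm_probe):
    """Return a copy of the PM probe with its central base replaced.

    Assumed rule (not from the source): the central base is swapped for
    its complement, which always yields exactly one mismatch.
    """
    center = len(pm_probe) // 2
    return pm_probe[:center] + COMPLEMENT[pm_probe[center]] + pm_probe[center + 1:]

# Hypothetical target sequence used only to exercise the functions.
target = "ATGGCGTACGTTAGCCTAGGATCCGATTACA"
pm = perfect_match_probe(target, start=3)
mm = mismatch_probe(pm)
differences = [i for i in range(len(pm)) if pm[i] != mm[i]]
```

In this sketch the pair (pm, mm) differs at a single central position, matching the description of a mismatch probe that is identical to its perfect match probe except for one base.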
[0037] The probe lengths of the perfect match probes are typically
chosen to exhibit detectably greater hybridization with the target
sequence relative to the mismatch probes. For example, the nucleic
acid probes may be all 20-mers. However, probes of varying lengths
may also be synthesized on the substrate for any number of reasons
including resolving ambiguities.
[0038] Again referring to FIG. 4, at step 206 the masks for the
synthesis are designed. At step 208 the software utilizes the mask
design and layout information to make the DNA or other polymer
chips. This step 208 will control, among other things, relative
translation of a substrate and the mask, the flow of desired
reagents through a flow cell, the synthesis temperature of the flow
cell, and other parameters. At step 210, another piece of software
is used in scanning a chip thus synthesized and exposed to a
labeled target. The software controls the scanning of the chip, and
stores the data thus obtained in a file that may later be utilized
to extract hybridization information.
[0039] At step 212 a computer system utilizes the layout
information and the fluorescence information to evaluate the
hybridized nucleic acid probes on the chip. Among the important
pieces of information obtained from DNA chips are the relative
fluorescent intensities obtained from the perfect match probes and
mismatch probes. These intensity levels are used to estimate an
expression level for a gene or EST. The computer system used for
analysis will preferably have available other details of the
experiment including possibly the gene name, gene sequence, probe
sequences, probe locations on the substrate, and the like.
[0040] According to the present invention, at step 214, the same
computer system used for analysis or another one displays the
expression level information in a format useful for identifying
genes of interest. The visualized expression level information may
include information collected from multiple applications of one or
more previous steps of FIG. 4.
[0041] FIG. 5 is a flowchart describing steps of estimating an
expression level for a particular gene and determining whether the
expression level is sufficiently high to be displayed. At step 952,
the computer system receives raw scan data of N pairs of perfect
match and mismatch probes. In a preferred embodiment, the
hybridization intensities are photon counts from a fluorescein
labeled target that has hybridized to the probes on the substrate.
For simplicity, the hybridization intensity of a perfect match
probe will be designated "I_pm" and the hybridization intensity
of a mismatch probe will be designated "I_mm."
[0042] Hybridization intensities for a pair of probes are retrieved
at step 954. The background signal intensity is subtracted from
each of the hybridization intensities of the pair at step 956.
Background subtraction can also be performed on all the raw scan
data at the same time.
[0043] At step 958, the hybridization intensities of the pair of
probes are compared to a difference threshold (D) and a ratio
threshold (R). It is determined if the difference between the
hybridization intensities of the pair (I_pm - I_mm) is
greater than or equal to the difference threshold AND the quotient
of the hybridization intensities of the pair (I_pm / I_mm) is
greater than or equal to the ratio threshold. The difference and
ratio thresholds are typically user-defined values that have been
determined to produce accurate expression monitoring of a gene or
genes. In one embodiment, the difference threshold is 20 and the
ratio threshold is 1.2.
[0044] If I_pm - I_mm >= D and I_pm / I_mm >= R, the
value NPOS is incremented at step 960. In general, NPOS is a value
that indicates the number of pairs of probes which have
hybridization intensities indicating that the gene is likely
expressed. NPOS is utilized in a determination of the expression of
the gene.
[0045] At step 962, it is determined if I_mm - I_pm >= D and
I_mm / I_pm >= R. If these expressions are true, the value
NNEG is incremented at step 964. In general, NNEG is a value that
indicates the number of pairs of probes which have hybridization
intensities indicating that the gene is likely not expressed. NNEG,
like NPOS, is utilized in a determination of the expression of the
gene.
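By way of illustration only, the counting of NPOS and NNEG in steps 958 through 964 may be sketched in Python as follows. The function name and the sample intensities are hypothetical. The sketch assumes background-subtracted intensities that are strictly positive, and it writes the ratio tests in multiplied form (for example, I_pm >= R * I_mm) so that no division is performed; for positive intensities this is equivalent to the quotient tests described above.

```python
def count_pairs(pairs, diff_threshold=20.0, ratio_threshold=1.2):
    """Return (NPOS, NNEG) for a list of (I_pm, I_mm) intensity pairs.

    A pair counts toward NPOS when the perfect match intensity exceeds the
    mismatch intensity by both the difference and the ratio threshold, and
    toward NNEG when the reverse holds. Other pairs count toward neither.
    """
    npos = 0
    nneg = 0
    for i_pm, i_mm in pairs:
        if i_pm - i_mm >= diff_threshold and i_pm >= ratio_threshold * i_mm:
            npos += 1
        elif i_mm - i_pm >= diff_threshold and i_mm >= ratio_threshold * i_pm:
            nneg += 1
    return npos, nneg

# Hypothetical background-subtracted intensities for five probe pairs.
pairs = [(130.0, 40.0), (70.0, 65.0), (20.0, 100.0), (60.0, 45.0), (200.0, 100.0)]
npos, nneg = count_pairs(pairs)
```

With the example thresholds of 20 and 1.2, the first and last pairs count toward NPOS, the third counts toward NNEG, and the remaining two pairs are too close to count either way.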
[0046] For each pair that exhibits hybridization intensities either
indicating the gene is expressed or not expressed, a log ratio
value (LR) and intensity difference value (IDIF) are calculated at
step 966. LR is calculated as the log of the quotient of the
hybridization intensities of the pair (I_pm / I_mm). The IDIF
is calculated as the difference between the hybridization
intensities of the pair (I_pm - I_mm). If there is a next
pair of hybridization intensities at step 968, they are retrieved
at step 954.
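By way of illustration only, the two per-pair quantities computed at step 966 may be sketched in Python. The function names are hypothetical, and the use of a base-10 logarithm is an assumption, since the description does not state the base of the logarithm. Intensities are assumed to be strictly positive.

```python
import math

def log_ratio(i_pm, i_mm):
    """LR: log of the quotient I_pm / I_mm (base 10 assumed for this sketch)."""
    return math.log10(i_pm / i_mm)

def intensity_difference(i_pm, i_mm):
    """IDIF: difference I_pm - I_mm."""
    return i_pm - i_mm

# A pair where the perfect match probe is twice as bright as the mismatch probe.
lr = log_ratio(200.0, 100.0)
idif = intensity_difference(200.0, 100.0)

# A pair where the mismatch probe is brighter gives a negative LR and IDIF.
lr_neg = log_ratio(20.0, 100.0)
idif_neg = intensity_difference(20.0, 100.0)
```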
[0047] At step 972, a decision matrix is utilized to indicate if
the gene is expressed. The decision matrix utilizes the values N,
NPOS, NNEG, LR (multiple LRs), and IDIF (multiple IDIFs). The
following four assignments are performed:
P1 = NPOS / NNEG
P2 = NPOS / N
P3 = SUM(LR) / N
P4 = SUM(IDIF) / N
These P values are then utilized to determine if the gene is
expressed and if the expression level should be displayed. In a
preferred embodiment, the expression level of a gene should be
displayed if:
P1 > 2.2
P2 > 0.3
P3 > 0.8
P4 > 30
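By way of illustration only, the decision matrix of step 972 may be sketched in Python. The function names and example values are hypothetical. Two points are assumptions of this sketch rather than statements of the description: all four conditions are required to hold together, and P1 is treated as infinite when NNEG is zero so that no division by zero occurs.

```python
import math

def decision_matrix(n, npos, nneg, log_ratios, intensity_diffs):
    """Compute P1..P4 from N, NPOS, NNEG and the per-pair LR and IDIF values."""
    p1 = npos / nneg if nneg else math.inf  # assumed handling of NNEG == 0
    p2 = npos / n
    p3 = sum(log_ratios) / n
    p4 = sum(intensity_diffs) / n
    return p1, p2, p3, p4

def should_display(p1, p2, p3, p4):
    """Apply the preferred-embodiment thresholds (all four assumed required)."""
    return p1 > 2.2 and p2 > 0.3 and p3 > 0.8 and p4 > 30

# Hypothetical gene with N = 20 probe pairs: 15 positive, 2 negative.
log_ratios = [1.2] * 15 + [-0.5] * 2
intensity_diffs = [80] * 15 + [-40] * 2
p1, p2, p3, p4 = decision_matrix(20, 15, 2, log_ratios, intensity_diffs)
display = should_display(p1, p2, p3, p4)
```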
[0048] Once all the pairs of probes have been processed and the
expression of the gene indicated, an average of the IDIF values for
the probes that incremented NPOS or NNEG is calculated at step 975,
which is utilized as an expression level. Of course, other values
including one of P1 through P4 could be used to indicate expression
level.
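By way of illustration only, the expression level of step 975 may be sketched in Python as the average IDIF over those probe pairs that incremented NPOS or NNEG. The function name and sample intensities are hypothetical. Returning 0.0 when no pair qualifies is an assumption of this sketch, as is the use of strictly positive, background-subtracted intensities.

```python
def expression_level(pairs, diff_threshold=20.0, ratio_threshold=1.2):
    """Average I_pm - I_mm over pairs that counted toward NPOS or NNEG."""
    scored = []
    for i_pm, i_mm in pairs:
        positive = (i_pm - i_mm >= diff_threshold
                    and i_pm >= ratio_threshold * i_mm)
        negative = (i_mm - i_pm >= diff_threshold
                    and i_mm >= ratio_threshold * i_pm)
        if positive or negative:
            scored.append(i_pm - i_mm)
    if not scored:
        return 0.0  # assumed fallback when no pair passes either test
    return sum(scored) / len(scored)

# Hypothetical intensities: the second pair passes neither test and is excluded.
pairs = [(130.0, 40.0), (70.0, 65.0), (20.0, 100.0), (200.0, 100.0)]
level = expression_level(pairs)
```

Here the qualifying differences are 90, -80, and 100, so the resulting level is their mean.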
[0049] For simplicity, FIG. 5 was described in reference to a
single gene or EST. However, the visualization system of the
present invention displays expression results for many genes to
facilitate discovery of genes or ESTs of interest. Furthermore, the
present invention contemplates display of expression levels of a
single gene or EST as collected from two or more different samples
such as tissue samples. The sample sources preferably differ in
some characteristic. It will be understood that when the term
"sample" is used herein, measurements made on a single "sample" can
be based on an aggregation of multiple sample collection events or
even multiple organisms.
[0050] FIG. 6 shows a screen display illustrating gene expression
levels for multiple genes as collected from two tissue samples. A
displayed horizontal axis 1002 represents expression level measured
in one or more nucleic acid samples taken from the first tissue
sample. A displayed vertical axis 1004 represents expression level
in one or more nucleic acid samples taken from the second tissue
sample. Each of marks 1006 represents a particular gene whose
expression level has been measured in both the first and second
tissue samples. Each mark 1006 is placed at a distance from
vertical axis 1004 corresponding to expression level in the first
tissue sample and at a distance from the horizontal axis 1002
corresponding to expression level in the second tissue sample.
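By way of illustration only, the placement of a mark relative to the two axes may be sketched in Python. The function name, the linear scale, the plot dimensions in pixels, and the convention that screen y coordinates increase downward are all assumptions of this sketch; the description does not specify a scale or a coordinate system.

```python
def mark_position(level_first, level_second, max_level, plot_size=400, margin=40):
    """Map two expression levels to screen coordinates (x, y) in pixels.

    The distance from the vertical axis is proportional to the level in the
    first sample, and the distance from the horizontal axis is proportional
    to the level in the second sample. The origin of the plot is assumed to
    sit at the lower left, margin pixels in from the screen edges.
    """
    x = margin + level_first * plot_size / max_level
    y = margin + plot_size - level_second * plot_size / max_level
    return x, y

# Hypothetical gene: level 500 in the first sample, 250 in the second.
x, y = mark_position(500, 250, max_level=1000)
```

A gene with equal levels in both samples lands on the diagonal of the plot under this mapping, which is the property the display relies on when comparing samples.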
[0051] The expression levels used for determining the position of
marks 1006 are preferably taken from the result of step 975. The
position of each of marks 1006 depends on two iterations of the
steps of FIG. 5, once for the sample taken from the first tissue
sample and once for the sample taken from the second tissue sample.
However, a mark is preferably displayed only if one of the samples
meets the threshold criteria at step 972.
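The placement rule for marks 1006 might be sketched as below. The sketch assumes the steps of FIG. 5 have already been run once per sample, giving for each gene an expression level (step 975) and a flag for the threshold criteria (step 972). The names Measurement and mark_positions are hypothetical, and "one of the samples" is read here as "at least one", which is an interpretation rather than something the text states.

```python
from typing import Dict, List, NamedTuple, Tuple


class Measurement(NamedTuple):
    level: float   # expression level from step 975
    passes: bool   # True if the sample met the criteria at step 972


def mark_positions(
    first: Dict[str, Measurement],
    second: Dict[str, Measurement],
) -> List[Tuple[str, float, float]]:
    """Return (gene, x, y) for each gene measured in both samples.

    x is the distance from the vertical axis (first tissue sample);
    y is the distance from the horizontal axis (second tissue sample).
    """
    marks = []
    for gene in sorted(first.keys() & second.keys()):
        a, b = first[gene], second[gene]
        # Assumption: display the mark if at least one sample passes.
        if a.passes or b.passes:
            marks.append((gene, a.level, b.level))
    return marks
```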
[0052] In the depicted representative screen display, the first
tissue sample is a cancerous tissue sample and the second tissue
sample is a normal tissue sample. The individual marks represent
the expression levels of selected genes in both cancerous and
normal tissue. A first group of marks 1008 represent genes that are
neither tumor suppressors nor oncogenes since their expression
levels are roughly similar for both normal and cancerous tissue.
These marks 1008 fall roughly along a line which is rotated 45
degrees from each of the axes. A second group of marks 1010
represent genes that are likely oncogenes since their expression
levels are found to be significantly higher in cancerous tissue
than in normal tissue. A third group of marks 1012 represent genes
that are likely tumor suppressors since their expression levels are
found to be significantly higher in normal tissue than in cancerous
tissue. It will be appreciated that expression levels for large
numbers of genes can be reviewed at once to discover the oncogenes
and tumor suppressors.
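The three groups of marks can be approximated in code by comparing the two coordinates of each mark. This is only a sketch: the specification gives no numeric meaning for "significantly higher", so the fold parameter and the floor used to guard against zero or negative levels are assumptions, and classify_mark is a hypothetical name.

```python
def classify_mark(cancer_level: float, normal_level: float,
                  fold: float = 2.0, floor: float = 1.0) -> str:
    """Assign a mark to one of the three groups described for FIG. 6.

    fold and floor are illustrative assumptions, not values from the text.
    """
    c = max(cancer_level, floor)   # clamp so the comparison stays meaningful
    n = max(normal_level, floor)
    if c > fold * n:
        return "candidate oncogene"          # like marks 1010
    if n > fold * c:
        return "candidate tumor suppressor"  # like marks 1012
    return "similar"                         # like marks 1008, near the 45-degree line
```

For example, classify_mark(500, 100) falls in the oncogene-like group, while classify_mark(120, 100) stays near the diagonal.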
[0053] Although in the depicted display, the two types of tissue
are normal tissue and cancerous tissue, the present invention would
aid in the discovery of genes whose expression is associated with
any characteristic that varies among tissue samples. For example,
one can compare expression results from tissue from individuals
who have been exposed to HIV but remain uninfected to tissue obtained
from infected individuals to identify genes conferring resistance
to HIV. One can compare expression results between tissue from
plants that survive drought and tissue from plants that do not. One can compare
expression levels among tissue samples at successive stages or
severity levels of the same disease, among tissue samples where
different ultimate outcomes of the disease (e.g., patient death or
remission) are known, among diseased tissue samples that have been
subject to different treatment regimes including, e.g.,
chemotherapy, antisense RNA, etc. For cancers, one can compare
expression levels between malignant cells and non-malignant cells.
Also expression levels can be compared among different organs,
between species, and among different stages of development of an
organ.
[0054] It will be appreciated that the present invention also
encompasses displays with more than two dimensions. A third visual
dimension can be used to illustrate expression level from a third
tissue sample. The time dimension can also be used to illustrate
successive groups of two or three tissue samples at successive time
periods.
[0055] The time dimension can also be used to correspond to tissue
samples obtained at, e.g., successive stages of a disease.
[0056] Other interface methods corresponding to human senses other
than sight can also be incorporated within the presentation system
of the present invention. The senses may correspond to additional
dimensions. For example, marks can be displayed in succession
accompanied by a sound having characteristics corresponding to
expression level in another tissue sample.
[0057] The user can employ a cursor 1014 to identify a particular
mark as being of interest. Cursor 1014 can be moved to a particular
mark by use of, e.g., mouse 11. Once cursor 1014 is over a mark of
interest, the mark can be selected by, e.g., depression of one of
mouse buttons 13. Selection of a particular mark can be facilitated
by use of a zoom display feature (not shown). Once a particular
mark is selected, further information is displayed about the gene
represented by the mark. A special mouse can transmit a tactile
sensation back to the user corresponding to expression level in a
tissue sample as the user passes the mouse over a corresponding
mark.
[0058] It will be appreciated that the display of FIG. 6 is not
limited to expression information. The two dimensions of FIG. 6 may
correspond to indicators of the presence of various polymers other
than nucleic acids in two different samples. For example, each mark
may correspond to a different polymer, polypeptide, or other
compound. The distance of the mark from each axis would correspond
to a measure of presence of the particular polymer in the sample
corresponding to the axis. One possible measure is produced by
fluorescently tagging polymer samples such as protein samples and
exposing a probe array such as a peptide probe array to the protein
samples. The fluorescent intensity of the probes will then
correspond to the binding affinity of the sample to the probes. The
intensity measurement or a measurement derived from the intensity
measurement may then be used to position the marks of FIG. 6.
[0059] FIG. 7A shows a screen display giving information about a
particular gene selected from the display of FIG. 6. A cluster
number 702, a GenBank accession number 704, and a verbal
description 706 for the selected gene are displayed. The user can
also select a number of marks 1006 by circling them with cursor
1014. Then a list of information as shown in FIG. 7A is displayed
for all the genes corresponding to the selected marks.
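Selecting several marks by circling them could be implemented with a point-in-polygon test. The sketch below assumes the path traced by cursor 1014 has been captured as a closed list of (x, y) vertices in the same coordinates as the marks. It uses the standard ray-casting test, which the specification does not prescribe, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: count edge crossings of a ray going in +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_marks(marks: Sequence[Tuple[str, float, float]],
                 polygon: Sequence[Point]) -> List[str]:
    """Return the genes whose marks fall inside the circled region."""
    return [gene for gene, x, y in marks if point_in_polygon(x, y, polygon)]
```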
[0060] By selecting GenBank accession number 704 with another
cursor (not shown), the user can direct retrieval of the GenBank
information for the selected gene. If the GenBank information is
not available locally, the retrieval process can include
formulating a query and transmitting the query to a GenBank web
site. Once the GenBank information is retrieved, it can also be
displayed. FIG. 7B depicts the GenBank information for the gene
identified in FIG. 7A.
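The local-then-remote retrieval described above might look like the following sketch. The remote step is passed in as a caller-supplied function, because the specification does not define the query format or the address of the GenBank web site. The name retrieve_record, the dictionary used as the local store, and the choice to keep a local copy after a remote fetch are all assumptions.

```python
from typing import Callable, Dict


def retrieve_record(accession: str,
                    local_store: Dict[str, str],
                    remote_fetch: Callable[[str], str]) -> str:
    """Return the GenBank information for an accession number.

    local_store stands in for locally available data; remote_fetch is a
    hypothetical callable that formulates and sends the remote query.
    """
    record = local_store.get(accession)
    if record is None:
        record = remote_fetch(accession)
        # Assumption: keep a local copy so later lookups stay local.
        local_store[accession] = record
    return record
```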
[0061] In the foregoing specification, the invention has been
described with reference to specific exemplary embodiments thereof.
It will, however, be evident that various modifications and changes
may be made thereunto without departing from the broader spirit and
scope of the invention as set forth in the appended claims and
their full scope of equivalents.
* * * * *