U.S. patent application number 13/690196, for integrative pathway modeling for drug efficacy prediction, was filed with the patent office on 2012-11-30 and published on 2013-06-06.
This patent application is currently assigned to Medeolinx, LLC. The applicant listed for this patent is Medeolinx, LLC. Invention is credited to Jake Yue Chen, Xiaogang Wu.
Application Number | 13/690196 |
Family ID | 48524616 |
Publication Date | 2013-06-06 |
United States Patent Application | 20130144887 |
Kind Code | A1 |
Chen; Jake Yue; et al. |
June 6, 2013 |
INTEGRATIVE PATHWAY MODELING FOR DRUG EFFICACY PREDICTION
Abstract
An integrative pathway modeling approach and ranking/evaluating
algorithms based on disease-specific pathway models can predict
drug efficacy for patients based on their gene expression profiles.
A disease-specific pathway model is first constructed with proteins
and drugs important to the disease by using computational
connectivity maps (C-Maps). Through the pathway model-based ranking
algorithm, ideal drugs or optimized drug combinations can be
discovered that modulate a patient's gene expression profile
toward those of healthy individuals at the pathway level.
Inventors: |
Chen; Jake Yue; (Indianapolis, IN); Wu; Xiaogang; (Indianapolis, IN) |
Applicant: |
Name | City | State | Country | Type |
Medeolinx, LLC | Indianapolis | IN | US | |
Assignee: |
Medeolinx, LLC, Indianapolis, IN |
Family ID: | 48524616 |
Appl. No.: | 13/690196 |
Filed: | November 30, 2012 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
61566641 | Dec 3, 2011 | |
61566642 | Dec 3, 2011 | |
61566644 | Dec 3, 2011 | |
Current U.S. Class: | 707/748 |
Current CPC Class: | G16H 70/40 20180101; G06F 16/24578 20190101;
G06N 20/10 20190101; G16B 40/00 20190201; G06F 16/285 20190101;
G06N 20/00 20190101; G06F 16/284 20190101; G16B 5/00 20190201;
G16C 20/70 20190201; G16H 20/10 20180101; G16C 20/30 20190201 |
Class at Publication: | 707/748 |
International Class: | G06F 17/30 20060101 G06F017/30 |
Claims
1. A method for determining compounds for the treatment of a
particular disease, said method comprising: generating a list of
proteins related to the particular disease; selecting a plurality
of drug pathways from a pathway database based on the list of
proteins; annotating each of the plurality of drug pathways;
mapping each drug-protein interaction on each of the plurality of
drug pathways including identifying effector proteins; translating
the mapped plurality of drug pathways into a weighted network; and
calculating a ranking of the drugs associated with the plurality of
drug pathways based on the effector proteins in each of the
plurality of drug pathways and providing a ranking of drugs
associated with the pathways for treatment of the particular
disease.
2. The method of claim 1 wherein the mapping step includes mapping
a patient expression profile onto identified effector proteins.
3. The method of claim 1 wherein the generating step includes
calculating a disease relevance score for each of the list of
proteins.
4. The method of claim 3 wherein the generating step includes
limiting the list of proteins to proteins having a predetermined
disease relevance score.
5. The method of claim 1 wherein the annotating step includes
associating directionality with each protein in the list of
proteins.
6. The method of claim 1 wherein the annotating step includes
identifying effector proteins in each of the pathways.
7. The method of claim 1 wherein the annotating step includes
filling holes in each of the pathways.
8. The method of claim 1 wherein the translating step includes
classifying effector protein interaction as one of therapeutic,
toxic, and ambiguous.
9. The method of claim 8 wherein the calculating step includes
assigning a high score to drugs including therapeutic protein
interactions and assigning a low score to drugs including toxic
protein interactions.
10. The method of claim 9 wherein the calculating step uses the
equation: $w(N_m) = \frac{N_m}{N} \log_2\left(2^{k} N\right)$, where
$N_m$ is the number of the pharmacology effect of type $m$, where
$m=1$ for therapeutic and $m=2$ for toxic, $N$ is the total number of
effects, and $2^k$ is a boosting factor based on the path length,
$k$, from the drug to the effector.
11. A system for determining the efficacy of potential drugs for
the treatment of a particular disease for a particular patient,
said system comprising: a disease profile module configured to
generate a list of proteins related to the particular disease,
select a plurality of drug pathways from a pathway database based
on the list of proteins, provide an interface for annotating each
of the plurality of drug pathways, and map each drug-protein
interaction on each of the plurality of drug pathways including
identifying effector proteins, and translate the mapped plurality
of drug pathways into a weighted network; a patient expression
profile module configured to obtain a mapping of the
gene-expression profile of the particular patient onto the
effectors; and an evaluation module configured to calculate a
ranking of the drugs associated with the plurality of drug pathways
based on the effector proteins in each of the plurality of drug
pathways and the mapping of the gene-expression profile of the
particular patient, said evaluation module configured to provide a
ranking of drugs associated with the pathways for treatment of the
particular disease.
12. The system of claim 11 wherein said disease profile module is
configured to calculate a disease relevance score for each of the
list of proteins.
13. The system of claim 12 wherein said disease profile module is
configured to limit the list of proteins to proteins having a
predetermined disease relevance score.
14. The system of claim 11 wherein said disease profile module is
configured to associate directionality with each protein in the
list of proteins.
15. The system of claim 11 wherein said disease profile module is
configured to identify effector proteins in each of the
pathways.
16. The system of claim 11 wherein said disease profile module is
configured to fill holes in each of the pathways.
17. The system of claim 11 wherein said disease profile module is
configured to classify effector protein interaction as one of
therapeutic, toxic, and ambiguous.
18. The system of claim 17 wherein said evaluation module is
configured to assign a high score to drugs including therapeutic
protein interactions and assign a low score to drugs including
toxic protein interactions.
19. The system of claim 18 wherein said evaluation module is
configured to use the equation: $w(N_m) = \frac{N_m}{N} \log_2\left(2^{k} N\right)$,
where $N_m$ is the number of the pharmacology effect of
type $m$, where $m=1$ for therapeutic and $m=2$ for toxic, $N$ is the total
number of effects, and $2^k$ is a boosting factor based on the path
length, $k$, from the drug to the effector.
20. The system of claim 19 wherein said evaluation module is
configured to scale the drug rankings by use of the equation:
$r_i = \frac{2}{1 + e^{-\left(w(N_1) - w(N_2)\right)}} - 1$, where $r_i$
increases as the number of therapeutic effects increases and
decreases as the number of toxic effects increases.
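As an illustration of how the weighting in claims 10 and 19 and the scaling in claim 20 fit together, the following Python sketch scores each drug and sorts drugs by that score. It is not the applicant's implementation. It assumes that the argument of the logarithm is the product 2^k·N, that a single path length k applies to all of a drug's effects, and that claim 20 uses the natural exponential; the drug names and counts are hypothetical. Because 2/(1 + e^(-x)) - 1 equals tanh(x/2), the sketch evaluates tanh, which gives the same value without overflowing for large |x|.

```python
import math


def effect_weight(n_m: int, n_total: int, k: int) -> float:
    """w(N_m) = (N_m / N) * log2(2**k * N); assumes the product reading of the log argument."""
    if n_total == 0:
        return 0.0  # no annotated effects: contribute nothing
    return (n_m / n_total) * math.log2((2 ** k) * n_total)


def pet_score(n_therapeutic: int, n_toxic: int, n_total: int, k: int) -> float:
    """r = 2 / (1 + e^{-(w(N_1) - w(N_2))}) - 1, computed as the equivalent tanh(x / 2)."""
    diff = effect_weight(n_therapeutic, n_total, k) - effect_weight(n_toxic, n_total, k)
    return math.tanh(diff / 2.0)  # result lies between -1 and 1


# Hypothetical drugs: (therapeutic effects, toxic effects, total effects, path length k).
drugs = {"drug_a": (3, 1, 4, 1), "drug_b": (1, 3, 4, 1), "drug_c": (2, 2, 4, 1)}
ranking = sorted(drugs, key=lambda d: pet_score(*drugs[d]), reverse=True)
print(ranking)  # ['drug_a', 'drug_c', 'drug_b']
```

A drug with equal therapeutic and toxic counts scores exactly 0, and swapping the two counts flips the sign of the score.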
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims priority under 35 U.S.C.
§119(e) of U.S. Provisional Patent Application Ser. Nos.
61/566,641, 61/566,642, and 61/566,644, respectively titled
Multidimensional Integrative Expression Profiling for Sample
Classification, Integrative Pathway Modeling for Drug Efficacy
Prediction, and Network Modeling for Drug Toxicity Prediction, all
filed Dec. 3, 2011, the disclosures of which are incorporated by
reference herein.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The invention relates to molecular profiling based on
network modeling and analysis. More specifically, the present
disclosure relates to computational methods, systems, devices
and/or apparatuses for molecular expression analysis and candidate
biomarker discovery.
[0004] 2. Description of the Related Art
[0005] Over 1500 Mendelian conditions whose molecular cause is
unknown are listed in the Online Mendelian Inheritance in Man
(OMIM) database. Additionally, almost all medical conditions are in
some way influenced by human genetic variation. The identification
of genes associated with these conditions is a goal of numerous
research groups, in order to both improve medical care and better
understand gene functions, interactions, and pathways. Sequencing
large numbers of candidate genes remains a time-consuming and
expensive task, and it is often not possible to identify the
correct disease gene by inspection of the list of genes within the
candidate chromosomal interval.
[0006] A number of computational approaches toward candidate-gene
prioritization have been developed that are based on functional
annotation, gene-expression data, or sequence-based features.
High-throughput technologies have produced vast amounts of
protein-protein interaction data, which represent a valuable
resource for candidate-gene prioritization, because genes related
to a specific or similar disease phenotype tend to be located in a
specific neighborhood in the protein-protein interaction network.
However, only relatively simple methods for exploring biological
networks have been applied to the problem of candidate-gene
prioritization, such as the search for direct neighbors of other
disease genes and the calculation of the shortest path between
candidates and known disease proteins.
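The two simple network methods named above, looking for direct neighbors of known disease genes and measuring the shortest path from a candidate to a known disease protein, can be sketched in a few lines of Python. The network, gene names, and tie-breaking rule (count of disease-gene neighbors first, then distance) are illustrative assumptions, not part of this application.

```python
from collections import deque


def shortest_distance(graph, source, targets):
    """Breadth-first search: hops from source to the nearest node in targets, or None."""
    if source in targets:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in targets:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # no path to any known disease gene


def prioritize(graph, candidates, disease_genes):
    """Order candidates: more direct disease-gene neighbors first, then shorter distance."""
    def key(gene):
        direct = sum(1 for n in graph.get(gene, ()) if n in disease_genes)
        dist = shortest_distance(graph, gene, disease_genes)
        return (-direct, dist if dist is not None else float("inf"))
    return sorted(candidates, key=key)


# Hypothetical undirected PPI network as an adjacency list.
ppi = {
    "D1": ["A"], "D2": ["A"],
    "A": ["D1", "D2", "B"], "B": ["A", "C"],
    "C": ["B", "E"], "E": ["C"], "F": [],
}
print(prioritize(ppi, ["C", "F", "A", "B", "E"], {"D1", "D2"}))  # ['A', 'B', 'C', 'E', 'F']
```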
SUMMARY OF THE INVENTION
[0007] The invention relates to drug efficacy prediction based on
pathway/network modeling and analysis. More specifically, the
present disclosure relates to computational methods, systems,
devices, and/or apparatuses for personalized drug or drug
combination discovery on a certain disease by integrating
pathway/network modeling, gene expression profiling, and
pathway/network model-based ranking/evaluating algorithms.
[0008] Screening millions of chemical compounds to identify "hit"
compounds for specific disease gene/protein targets has been a
mainstream paradigm for modern drug discovery. While the
conventional "One disease, One gene, and One drug" paradigm works
effectively for simple genetic disorders, it fails to produce
effective drugs for complex diseases such as cancer. In complex
diseases, many genes contribute to the disease's phenotype;
therefore, identifying a "magic bullet" drug compound can be quite
elusive.
[0009] In systems medicine or systems pharmacology, the primary
focus is to model a specific drug target's effect on metabolism,
toxicity, and pharmacokinetics by examining the drug target's
molecular interaction partners. However, existing methods focus on
modeling the structure of the drug target network qualitatively. To
examine a drug's effect on a molecular pathway representative of
the disease, more quantitative and accurate pathway/network
modeling and analysis techniques need to be developed.
[0010] Embodiments of the invention provide an integrative pathway
modeling approach and a ranking algorithm based on integrative
pathway models that can predict drug efficacy for patients. These
models are based on patients' gene expression profiles. First, a
disease-specific pathway model--Pharmacology Effect Network
(PEN)--is constructed with important proteins and drugs by
utilizing computational connectivity maps (C-Maps). In
this pathway model, a drug's effects on its target proteins (i.e.,
activation or inhibition) are annotated as edge attributes. Second, a
PEN-based ranking algorithm--Pharmacological Effect on Target
(PET)--is developed to evaluate drug efficacy by using the gene
expressions corresponding to the important proteins in the PEN
model. Ideal drugs or optimized drug combinations discovered by the
PEN-PET approach can modulate the gene expression profiles of
patients toward those of healthy individuals at the
pathway level.
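One way to hold a PEN in memory is as a directed graph whose edges carry the activation or inhibition annotation. In the Python sketch below, which is a hypothetical encoding rather than the application's data format, +1 marks activation and -1 inhibition. A breadth-first search from a drug to an effector returns the path length k and the net direction of action, taken here (an assumption) as the product of the edge signs along the first shortest path found.

```python
from collections import deque

ACTIVATE, INHIBIT = 1, -1


def net_effect(pen, drug, effector):
    """Return (k, sign) for the first shortest directed path drug -> effector, else None.

    k counts edges on the path; sign is the product of the edge annotations along it.
    """
    queue = deque([(drug, 0, 1)])  # path of length 0 has neutral sign +1
    seen = {drug}
    while queue:
        node, k, sign = queue.popleft()
        if node == effector:
            return k, sign
        for target, edge_sign in pen.get(node, {}).items():
            if target not in seen:
                seen.add(target)
                queue.append((target, k + 1, sign * edge_sign))
    return None


# Hypothetical PEN fragment: drug_x inhibits P1, P1 activates P2, P2 inhibits P3.
pen = {
    "drug_x": {"P1": INHIBIT},
    "P1": {"P2": ACTIVATE},
    "P2": {"P3": INHIBIT},
}
print(net_effect(pen, "drug_x", "P2"))  # (2, -1): drug_x indirectly inhibits P2
print(net_effect(pen, "drug_x", "P3"))  # (3, 1): two inhibitions compose to activation
```

If equally short paths disagree in sign, this sketch keeps whichever is found first; a fuller model would need an explicit rule for such conflicts.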
[0011] In one embodiment, the present invention relates to a method
for determining compounds for the treatment of a particular
disease. First, a list of proteins related to the particular
disease is generated. Next, a plurality of drug pathways are
selected from a pathway database based on the list of proteins.
Each of the drug pathways is annotated, and each drug-protein
interaction on each of the drug pathways is mapped, including
identifying effector proteins. The mapped drug pathways are
translated into a weighted network. The ranking of the drugs
associated with the drug pathways is then calculated based on the
effector proteins in each of the drug pathways to provide a ranking
of drugs associated with the pathways for treatment of the
particular disease. The mapping step includes mapping a patient
expression profile onto identified effector proteins. The
generating step includes calculating a disease relevance score for
each of the list of proteins. The generating step includes limiting
the list of proteins to proteins having a predetermined disease
relevance score. The annotating step includes associating
directionality with each protein in the list of proteins,
identifying effector proteins in each of the pathways, and filling
holes in each of the pathways. The translating step includes
classifying effector protein interaction as one of therapeutic,
toxic, and ambiguous. The calculating step includes assigning a
high score to drugs including therapeutic protein interactions and
assigning a low score to drugs including toxic protein
interactions.
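The filtering and classification steps can be illustrated as follows. This section does not spell out the classification rule, so the Python sketch assumes a plausible one for illustration: an effect is therapeutic when the drug pushes an effector opposite to its disease-associated change (for example, inhibiting an over-expressed protein), toxic when it pushes in the same direction, and ambiguous when the disease direction is unknown. The threshold, protein names, and scores are hypothetical.

```python
from enum import Enum


class Effect(Enum):
    THERAPEUTIC = "therapeutic"
    TOXIC = "toxic"
    AMBIGUOUS = "ambiguous"


def filter_by_relevance(scores, threshold):
    """Keep proteins whose disease relevance score meets a predetermined threshold."""
    return {protein for protein, score in scores.items() if score >= threshold}


def classify(drug_sign, disease_direction):
    """drug_sign: +1 activates, -1 inhibits; disease_direction: +1 up, -1 down, 0 unknown.

    Assumed rule: opposing the disease change is therapeutic, reinforcing it is toxic.
    """
    if drug_sign == 0 or disease_direction == 0:
        return Effect.AMBIGUOUS
    return Effect.THERAPEUTIC if drug_sign == -disease_direction else Effect.TOXIC


def count_effects(drug_effects, disease_directions, relevant):
    """Tally effect classes over the drug's effector proteins that passed the filter."""
    counts = {effect: 0 for effect in Effect}
    for protein, sign in drug_effects.items():
        if protein in relevant:
            counts[classify(sign, disease_directions.get(protein, 0))] += 1
    return counts


# Hypothetical data.
relevant = filter_by_relevance({"P1": 0.9, "P2": 0.7, "P3": 0.2}, threshold=0.5)
counts = count_effects({"P1": -1, "P2": -1, "P3": 1}, {"P1": 1, "P2": -1}, relevant)
print(counts[Effect.THERAPEUTIC], counts[Effect.TOXIC], counts[Effect.AMBIGUOUS])  # 1 1 0
```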
[0012] In another embodiment, the present invention relates to a
system for determining the efficacy of potential drugs for the
treatment of a particular disease for a particular patient. A
disease profile module is configured to generate a list of proteins
related to the particular disease, select a plurality of drug
pathways from a pathway database based on the list of proteins,
provide an interface for annotating each of the plurality of drug
pathways, map each drug-protein interaction on each of the
plurality of drug pathways including identifying effector proteins,
and translate the mapped plurality of drug pathways into a weighted
network. A patient expression profile module is configured to
obtain a mapping of the gene-expression profile of the particular
patient onto the effectors. An evaluation module is configured to
calculate a ranking of the drugs associated with the plurality of
drug pathways based on the effector proteins in each of the
plurality of drug pathways and the mapping of the gene-expression
profile of the particular patient. The evaluation module is further
configured to provide a ranking of drugs associated with the
pathways for treatment of the particular disease. The disease
profile module is configured to calculate a disease relevance score
for each of the list of proteins. The disease profile module is
configured to limit the list of proteins to proteins having a
predetermined disease relevance score. The disease profile module
is further configured to associate directionality with each protein
in the list of proteins. The disease profile module is configured
to identify effector proteins in each of the pathways. The disease
profile module is also configured to fill holes in each of the
pathways. The disease profile module is configured to classify
effector protein interaction as one of therapeutic, toxic, and
ambiguous. The evaluation module is configured to assign a high score to drugs including
therapeutic protein interactions and a low score to drugs
including toxic protein interactions.
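The three-module arrangement can be sketched as plain Python classes. Everything here is a hypothetical simplification: gene and protein identifiers are assumed to coincide, the patient module turns log2 fold changes into up/down/unchanged states with an assumed cutoff of 1.0, and the evaluation module ranks drugs by counting counteracted minus reinforced dysregulated effectors, standing in for the weighted ranking the application describes.

```python
from dataclasses import dataclass


@dataclass
class DiseaseProfileModule:
    """Disease-specific model: relevance scores plus drug -> {protein: +1 activates / -1 inhibits}."""
    relevance: dict[str, float]
    drug_effects: dict[str, dict[str, int]]
    threshold: float = 0.5  # hypothetical "predetermined" relevance cutoff

    def effectors(self) -> set[str]:
        """Proteins whose disease relevance score meets the threshold."""
        return {p for p, s in self.relevance.items() if s >= self.threshold}


@dataclass
class PatientExpressionProfileModule:
    """A patient's log2 fold changes versus healthy controls, keyed by protein/gene ID."""
    log_fold_change: dict[str, float]
    cutoff: float = 1.0  # hypothetical dysregulation cutoff

    def map_onto(self, effectors: set[str]) -> dict[str, int]:
        """Return +1 (up), -1 (down) or 0 (unchanged) for each effector."""
        states = {}
        for p in effectors:
            lfc = self.log_fold_change.get(p, 0.0)
            states[p] = 1 if lfc >= self.cutoff else -1 if lfc <= -self.cutoff else 0
        return states


class EvaluationModule:
    """Ranks drugs; a simple count stands in for the weighted score."""

    def score(self, effects: dict[str, int], states: dict[str, int]) -> int:
        total = 0
        for protein, sign in effects.items():
            direction = states.get(protein, 0)
            if direction:  # only dysregulated effectors count
                total += 1 if sign == -direction else -1  # counteract: +1, reinforce: -1
        return total

    def rank(self, disease: DiseaseProfileModule,
             patient: PatientExpressionProfileModule) -> list[str]:
        states = patient.map_onto(disease.effectors())
        return sorted(disease.drug_effects,
                      key=lambda d: self.score(disease.drug_effects[d], states),
                      reverse=True)


# Hypothetical inputs.
disease = DiseaseProfileModule(
    relevance={"P1": 0.9, "P2": 0.8, "P3": 0.1},
    drug_effects={"drug_a": {"P1": -1, "P2": 1}, "drug_b": {"P1": 1, "P3": -1}},
)
patient = PatientExpressionProfileModule({"P1": 2.0, "P2": -1.5, "P3": 3.0})
print(EvaluationModule().rank(disease, patient))  # ['drug_a', 'drug_b']
```

Keeping the modules separate means a different scoring rule can replace `EvaluationModule.score` without touching how the disease model or patient profile is built.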
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] The above-mentioned and other features and objects of this
invention, and the manner of attaining them, will become more
apparent and the invention itself will be better understood by
reference to the following description of an embodiment of the
invention taken in conjunction with the accompanying drawings,
wherein:
[0014] FIG. 1 is a schematic diagrammatic view of a network system
in which embodiments of the present invention may be utilized.
[0015] FIG. 2 is a block diagram of a computing system (either a
server or client, or both, as appropriate), with optional input
devices (e.g., keyboard, mouse, touch screen, etc.) and output
devices, hardware, network connections, one or more processors, and
memory/storage for data and modules, etc. which may be utilized in
conjunction with embodiments of the present invention.
[0016] FIG. 3 is a schematic diagram illustrating a framework for
drug efficacy prediction by using an integrative pathway modeling
approach and a ranking algorithm based on the integrative pathway
models.
[0017] FIG. 4 is a symbolic diagram illustrating the classification
of a drug's pharmacological effect on targeted proteins.
[0018] FIG. 5 is a multi-dimensional chart diagram illustrating the
heat map for a simple network shown on the left of the figure.
Columns represent the edges while rows represent nodes. `1`
indicates activation or over-expression while `-1` indicates
inhibition or under-expression.
[0019] FIG. 6 is a graph diagram illustrating a breast cancer drug
list with PET scores by applying the PET algorithm to two gene
expression profiles (GSE10866 and GSE3193) with the PEN model. For
each drug, there are two PET scores, displayed as two bars. The
upper one is from GSE10866, and the lower one is from GSE3193.
[0020] FIG. 7 is a graph diagram illustrating a breast cancer drug
list with PET scores by applying the PET algorithm to two gene
expression profiles (GSE10866 and GSE3193) without the PEN model.
For each drug, there are two PET scores, displayed as two bars. The
upper one is from GSE10866, and the lower one is from GSE3193.
[0021] Corresponding reference characters indicate corresponding
parts throughout the several views. Although the drawings represent
embodiments of the present invention, the drawings are not
necessarily to scale and certain features may be exaggerated in
order to better illustrate and explain the present invention. The
flow charts and screen shots are also representative in nature, and
actual embodiments of the invention may include further features or
steps not shown in the drawings. The exemplification set out herein
illustrates an embodiment of the invention, in one form, and such
exemplifications are not to be construed as limiting the scope of
the invention in any manner.
DESCRIPTION OF EMBODIMENTS OF THE PRESENT INVENTION
[0022] The embodiment disclosed below is not intended to be
exhaustive or limit the invention to the precise form disclosed in
the following detailed description. Rather, the embodiment is
chosen and described so that others skilled in the art may utilize
its teachings.
[0023] In the field of molecular biology, gene expression profiling
is the measurement of the activity (the expression) of thousands of
genes at once, to create a global picture of cellular function
including protein and other cellular building blocks. These
profiles may, for example, distinguish between cells that are
actively dividing or otherwise reacting to the current bodily
condition, or show how the cells react to a particular treatment
such as positive drug reactions or toxicity reactions. Many
experiments of this sort measure an entire genome simultaneously,
that is, every gene present in a particular cell, as well as other
important cellular building blocks.
[0024] DNA microarray technology measures the relative activity of
previously identified target genes. Sequence-based techniques, like
serial analysis of gene expression (SAGE, SuperSAGE), are also used
for gene expression profiling. SuperSAGE is especially accurate and
may measure any active gene, not just a predefined set. The advent
of next-generation sequencing has made sequence-based expression
analysis, called RNA-Seq, an increasingly popular "digital"
alternative to microarrays. Expression profiling provides a view of
what a patient's genetic material is actually doing at a point in
time. Genes contain the instructions for making messenger RNA
(mRNA), but at any moment each cell makes mRNA from only a fraction
of the genes it carries. If a gene is used to produce mRNA, it is
considered "on", otherwise "off". Many factors determine whether a
gene is on or off, such as the time of day, whether or not the cell
is actively dividing, its local environment, and chemical signals
from other cells. For instance, skin cells, liver cells, and nerve
cells turn on (express) somewhat different genes, and that is in
large part what makes them different. Therefore, an expression
profile allows one to deduce a cell's type, state, environment, and
so forth.
[0025] Expression profiling experiments often involve measuring the
relative amount of mRNA expressed in two or more experimental
conditions. For example, genetic databases have been created that
reflect a normative state of a healthy patient, which may be
contrasted with databases that have been created from a set of
patients with a particular disease or other condition. This
contrast is relevant because altered levels of a specific sequence
of mRNA suggest a changed need for the protein coded for by the
mRNA, perhaps indicating a homeostatic response or a pathological
condition. For example, higher levels of mRNA coding for a protein
associated with a particular disease may indicate that the cells or
tissues under study are responding to the effects of that disease.
Similarly, if certain cells, for example a type of cancer cells,
express higher levels of mRNA associated with a particular
transmembrane receptor than normal cells do, elevated expression of
that receptor may be indicative of cancer. A drug that interferes with
this receptor may prevent or treat that type of cancer. In developing
a drug, gene expression profiling may be used to assess a particular
drug's toxicity, for example by detecting changing levels in the
expression of certain genes that constitute a biomarker of drug
metabolism.
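Comparing mRNA levels between two conditions is commonly summarized as a per-gene log2 fold change. The generic Python sketch below averages replicate measurements in each condition, takes the log2 ratio, and labels each gene up, down, or unchanged; the pseudocount, the cutoff of 1.0 (a two-fold change), and the gene names are illustrative assumptions.

```python
import math
from statistics import mean


def log2_fold_change(disease, healthy, pseudocount=1.0):
    """Per-gene log2 ratio of mean expression, disease over healthy (pseudocount avoids log(0))."""
    shared = disease.keys() & healthy.keys()
    return {
        gene: math.log2((mean(disease[gene]) + pseudocount) / (mean(healthy[gene]) + pseudocount))
        for gene in shared
    }


def call_direction(lfc, cutoff=1.0):
    """Label each gene 'up', 'down' or 'unchanged' using a symmetric log2 cutoff."""
    return {
        gene: "up" if v >= cutoff else "down" if v <= -cutoff else "unchanged"
        for gene, v in lfc.items()
    }


# Hypothetical replicate measurements for two conditions.
disease = {"RECEPTOR_X": [14.0, 16.0], "GENE_Y": [2.0, 4.0], "GENE_Z": [9.0, 11.0]}
healthy = {"RECEPTOR_X": [2.0, 4.0], "GENE_Y": [14.0, 16.0], "GENE_Z": [10.0, 10.0]}
lfc = log2_fold_change(disease, healthy)
calls = call_direction(lfc)
print(lfc["RECEPTOR_X"], calls["RECEPTOR_X"])  # 2.0 up
```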
[0026] For a type of cell, the group of genes and other cellular
materials whose combined expression pattern is uniquely
characteristic to a given condition or disease constitutes the gene
signature of this condition or disease. Ideally, the gene signature
is used to detect a specific state of a condition or disease to
facilitate selection of treatments. Gene Set Enrichment Analysis
(GSEA) and similar methods take advantage of this kind of logic and
use more sophisticated statistics. Component genes in real
processes display more complex behavior than simply expressing as a
group, and the amount and variety of gene expression is meaningful.
In any case, these statistics measure how different the behavior of
some small set of genes is compared to genes not in that small
set.
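The running-sum idea behind GSEA can be illustrated with its unweighted form. Walking down a ranked gene list, the sum rises at genes in the set and falls at genes outside it, and the enrichment score is its largest deviation from zero. This Python sketch omits GSEA's weighted statistic, permutation-based significance testing, and normalization, so it is a simplified illustration rather than the full method.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score over a ranked gene list.

    The sum rises by 1/N_H at each gene in the set and falls by 1/(N - N_H) otherwise,
    where N is the list length and N_H the number of set genes in the list.
    """
    n = len(ranked_genes)
    hits = sum(1 for g in ranked_genes if g in gene_set)
    if hits == 0 or hits == n:
        raise ValueError("gene set must be a non-empty proper subset of the ranked list")
    up, down = 1.0 / hits, 1.0 / (n - hits)
    running, best = 0.0, 0.0
    for g in ranked_genes:
        running += up if g in gene_set else -down
        if abs(running) > abs(best):
            best = running  # keep the signed maximum deviation
    return best


ranked = ["a", "b", "c", "d"]  # hypothetical genes, most up-regulated first
print(enrichment_score(ranked, {"a", "b"}))  # 1.0: set concentrated at the top
print(enrichment_score(ranked, {"c", "d"}))  # -1.0: set concentrated at the bottom
```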
[0027] One way to analyze sets of genes and other cellular
materials apparent in gene expression measurement is through the
use of pathway models and network models. Many protein-protein
interactions (PPIs) in a cell form protein interaction networks
(PINs) where proteins are nodes and their interactions are edges.
There are dozens of PPI detection methods to identify such
interactions. In addition, gene regulatory networks (DNA-protein
interaction networks) model the activity of genes which is
regulated by transcription factors, proteins that typically bind to
DNA. Most transcription factors bind to multiple binding sites in a
genome. As a result, all cells have complex gene regulatory
networks which may be combined with PPIs to link together these
various connections. The chemical compounds of a living cell are
connected by biochemical reactions which convert one compound into
another. The reactions are catalyzed by enzymes. Thus, all
compounds in a cell are parts of an intricate biochemical network
of reactions which is called the metabolic network, which may
further enhance PPI and/or DNA-protein network models. Further,
signals are transduced within cells or in between cells and thus
form complex signaling networks that may further augment such
genetic interaction networks. For instance, in the MAPK/ERK pathway,
a signal is transduced from the cell surface to the cell nucleus by a series
of protein-protein interactions, phosphorylation reactions, and
other events. Signaling networks typically integrate
protein-protein interaction networks, gene regulatory networks, and
metabolic networks.
[0028] The detailed descriptions which follow are presented in part
in terms of algorithms and symbolic representations of operations
on data bits within a computer memory representing genetic
profiling information derived from patient sample data and
populated into network models. A computer generally includes a
processor for executing instructions and memory for storing
instructions and data. When a general purpose computer has a series
of machine encoded instructions stored in its memory, the computer
operating on such encoded instructions may become a specific type
of machine, namely a computer particularly configured to perform
the operations embodied by the series of instructions. Some of the
instructions may be adapted to produce signals that control
operation of other machines and thus may operate through those
control signals to transform materials far removed from the
computer itself. These descriptions and representations are the
means used by those skilled in the data processing arts to
most effectively convey the substance of their work to others
skilled in the art.
[0029] An algorithm is here, and generally, conceived to be a
self-consistent sequence of steps leading to a desired result.
These steps are those requiring physical manipulations of physical
quantities. Usually, though not necessarily, these quantities take
the form of electrical or magnetic pulses or signals capable of
being stored, transferred, transformed, combined, compared, and
otherwise manipulated. It proves convenient at times, principally
for reasons of common usage, to refer to these signals as bits,
values, symbols, characters, display data, terms, numbers, or the
like as a reference to the physical items or manifestations in
which such signals are embodied or expressed. It should be borne in
mind, however, that all of these and similar terms are to be
associated with the appropriate physical quantities and are merely
used here as convenient labels applied to these quantities.
[0030] Some algorithms may use data structures for both inputting
information and producing the desired result. Data structures
greatly facilitate data management by data processing systems, and
are not accessible except through sophisticated software systems.
Data structures are not the information content of a memory, rather
they represent specific electronic structural elements which impart
or manifest a physical organization on the information stored in
memory. More than mere abstraction, the data structures are
specific electrical or magnetic structural elements in memory which
simultaneously represent complex data accurately, often data
modeling physical characteristics of related items, and provide
increased efficiency in computer operation.
[0031] Further, the manipulations performed are often referred to
in terms, such as comparing or adding, commonly associated with
mental operations performed by a human operator. No such capability
of a human operator is necessary, or desirable in most cases, in
any of the operations described herein which form part of the
present invention; the operations are machine operations. Useful
machines for performing the operations of the present invention
include general purpose digital computers or other similar devices.
In all cases the distinction between the method operations in
operating a computer and the method of computation itself should be
recognized. The present invention relates to a method and apparatus
for operating a computer in processing electrical or other (e.g.,
mechanical, chemical) physical signals to generate other desired
physical manifestations or signals. The computer operates on
software modules, which are collections of signals stored on a
medium that represents a series of machine instructions that enable
the computer processor to perform the machine instructions that
implement the algorithmic steps. Such machine instructions may be
the actual computer code the processor interprets to implement the
instructions, or alternatively may be a higher level coding of the
instructions that is interpreted to obtain the actual computer
code. The software module may also include a hardware component,
wherein some aspects of the algorithm are performed by the
circuitry itself rather than as a result of an instruction.
[0032] The present invention also relates to an apparatus for
performing these operations. This apparatus may be specifically
constructed for the required purposes or it may comprise a general
purpose computer as selectively activated or reconfigured by a
computer program stored in the computer. The algorithms presented
herein are not inherently related to any particular computer or
other apparatus unless explicitly indicated as requiring particular
hardware. In some cases, the computer programs may communicate or
relate to other programs or equipment through signals configured
to particular protocols which may or may not require specific
hardware or programming to interact. In particular, various general
purpose machines may be used with programs written in accordance
with the teachings herein, or it may prove more convenient to
construct more specialized apparatus to perform the required method
steps. The required structure for a variety of these machines will
appear from the description below.
[0033] The present invention may deal with "object-oriented"
software, and particularly with an "object-oriented" operating
system. The "object-oriented" software is organized into "objects",
each comprising a block of computer instructions describing various
procedures ("methods") to be performed in response to "messages"
sent to the object or "events" which occur with the object. Such
operations include, for example, the manipulation of variables, the
activation of an object by an external event, and the transmission
of one or more messages to other objects.
[0034] Messages are sent and received between objects having
certain functions and knowledge to carry out processes. Messages
are generated in response to user instructions, for example, by a
user activating an icon with a "mouse" pointer generating an event.
Also, messages may be generated by an object in response to the
receipt of a message. When one of the objects receives a message,
the object carries out an operation (a message procedure)
corresponding to the message and, if necessary, returns a result of
the operation. Each object has a region where internal states
(instance variables) of the object itself are stored and where the
other objects are not allowed to access. One feature of the
object-oriented system is inheritance. For example, an object for
drawing a "circle" on a display may inherit functions and knowledge
from another object for drawing a "shape" on a display.
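The circle-and-shape example of inheritance can be written in Python as a minimal sketch. Python does not enforce private instance variables; the leading underscore only marks them as internal by convention, so this approximates the encapsulation described above. Method calls stand in for messages.

```python
import math


class Shape:
    """Base object. Its instance variables are internal state, private by convention only."""

    def __init__(self, name: str):
        self._name = name

    def area(self) -> float:
        raise NotImplementedError  # each concrete shape supplies its own

    def draw(self) -> str:
        """Respond to a 'draw' message; a real system would render to a display."""
        return f"{self._name} with area {self.area():.2f}"


class Circle(Shape):
    """Inherits draw() from Shape and adds circle-specific knowledge (its radius)."""

    def __init__(self, radius: float):
        super().__init__("circle")
        self._radius = radius

    def area(self) -> float:
        return math.pi * self._radius ** 2


print(Circle(1.0).draw())  # circle with area 3.14
```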
[0035] A programmer "programs" in an object-oriented programming
language by writing individual blocks of code each of which creates
an object by defining its methods. A collection of such objects
adapted to communicate with one another by means of messages
comprises an object-oriented program. Object-oriented computer
programming facilitates the modeling of interactive systems in that
each component of the system can be modeled with an object, the
behavior of each component being simulated by the methods of its
corresponding object, and the interactions between components being
simulated by messages transmitted between objects.
[0036] An operator may stimulate a collection of interrelated
objects comprising an object-oriented program by sending a message
to one of the objects. The receipt of the message may cause the
object to respond by carrying out predetermined functions which may
include sending additional messages to one or more other objects.
The other objects may in turn carry out additional functions in
response to the messages they receive, including sending still more
messages. In this manner, sequences of message and response may
continue indefinitely or may come to an end when all messages have
been responded to and no new messages are being sent. When modeling
systems utilizing an object-oriented language, a programmer need
only think in terms of how each component of a modeled system
responds to a stimulus and not in terms of the sequence of
operations to be performed in response to some stimulus. Such
sequence of operations naturally flows out of the interactions
between the objects in response to the stimulus and need not be
preordained by the programmer.
[0037] Although object-oriented programming makes simulation of
systems of interrelated components more intuitive, the operation of
an object-oriented program is often difficult to understand because
the sequence of operations carried out by an object-oriented
program is usually not immediately apparent from a software listing
as is the case for sequentially organized programs. Nor is it easy
to determine how an object-oriented program works through
observation of the readily apparent manifestations of its
operation. Most of the operations carried out by a computer in
response to a program are "invisible" to an observer since only a
relatively few steps in a program typically produce an observable
computer output.
[0038] In the following description, several terms which are used
frequently have specialized meanings in the present context. The
term "object" relates to a set of computer instructions and
associated data which can be activated directly or indirectly by
the user. The terms "windowing environment", "running in windows",
and "object oriented operating system" are used to denote a
computer user interface in which information is manipulated and
displayed on a video display such as within bounded regions on a
raster scanned video display. The terms "network", "local area
network", "LAN", "wide area network", or "WAN" mean two or more
computers which are connected in such a manner that messages may be
transmitted between the computers. In such computer networks,
typically one or more computers operate as a "server", a computer
with large storage devices such as hard disk drives and
communication hardware to operate peripheral devices such as
printers or modems. Other computers, termed "workstations", provide
a user interface so that users of computer networks can access the
network resources, such as shared data files, common peripheral
devices, and inter-workstation communication. Users activate
computer programs or network resources to create "processes" which
include both the general operation of the computer program along
with specific operating characteristics determined by input
variables and its environment. Similar to a process is an agent
(sometimes called an intelligent agent), which is a process that
gathers information or performs some other service without user
intervention and on some regular schedule. Typically, an agent,
using parameters usually provided by the user, searches locations
either on the host machine or at some other point on a network,
gathers the information relevant to the purpose of the agent, and
presents it to the user on a periodic basis. A "module" refers to a
portion of a computer system and/or software program that carries
out one or more specific functions and may be used alone or
combined with other modules of the same system or program.
[0039] The term "desktop" means a specific user interface which
presents a menu or display of objects with associated settings for
the user associated with the desktop. When the desktop accesses a
network resource, which typically requires an application program
to execute on the remote server, the desktop calls an Application
Program Interface, or "API", to allow the user to provide commands
to the network resource and observe any output. The term "Browser"
refers to a program which is not necessarily apparent to the user,
but which is responsible for transmitting messages between the
desktop and the network server and for displaying and interacting
with the network user. Browsers are designed to utilize a
communications protocol for transmission of text and graphic
information over a world wide network of computers, namely the
"World Wide Web" or simply the "Web". Examples of Browsers
compatible with the present invention include the Internet Explorer
program sold by Microsoft Corporation (Internet Explorer is a
trademark of Microsoft Corporation), the Opera Browser program
created by Opera Software ASA, or the Firefox browser program
distributed by the Mozilla Foundation (Firefox is a registered
trademark of the Mozilla Foundation). Although the following
description details such operations in terms of a graphic user
interface of a Browser, the present invention may be practiced with
text based interfaces, or even with voice or visually activated
interfaces, that have many of the functions of a graphic based
Browser.
[0040] Browsers display information which is formatted in a
Standard Generalized Markup Language ("SGML") or a HyperText Markup
Language ("HTML"), both being markup languages which embed
non-visual codes in a text document through the use of special
ASCII text codes. Files in these formats may be easily transmitted
across computer networks, including global information networks
like the Internet, and allow the Browsers to display text, images,
and play audio and video recordings. The Web utilizes these data
file formats in conjunction with its communication protocol to
transmit such information between servers and workstations.
Browsers may also be programmed to display information provided in
an eXtensible Markup Language ("XML") file, with XML files being
capable of use with several Document Type Definitions ("DTD") and
thus more general in nature than SGML or HTML. The XML file may be
analogized to an object, as the data and the stylesheet formatting
are separately contained (formatting may be thought of as methods
of displaying information, thus an XML file has data and an
associated method).
[0041] The terms "personal digital assistant" or "PDA", as defined
above, mean any handheld, mobile device that combines computing,
telephone, fax, e-mail and networking features. The terms "wireless
wide area network" or "WWAN" mean a wireless network that serves as
the medium for the transmission of data between a handheld device
and a computer. The term "synchronization" means the exchanging of
information between a first device, e.g. a handheld device, and a
second device, e.g. a desktop computer, either via wires or
wirelessly. Synchronization ensures that the data on both devices
are identical (at least at the time of synchronization).
[0042] In wireless wide area networks, communication primarily
occurs through the transmission of radio signals over analog,
digital cellular or personal communications service ("PCS")
networks. Signals may also be transmitted through microwaves and
other electromagnetic waves. At the present time, most wireless
data communication takes place across cellular systems using second
generation technology such as code-division multiple access
("CDMA"), time division multiple access ("TDMA"), the Global System
for Mobile Communications ("GSM"), or personal digital cellular
("PDC"); third generation (wideband or "3G") and fourth generation
(broadband or "4G") technology; or packet-data technology over
analog systems such as cellular digital packet data ("CDPD") used on
the Advanced Mobile Phone Service ("AMPS").
[0043] The terms "wireless application protocol" or "WAP" mean a
universal specification to facilitate the delivery and presentation
of web-based data on handheld and mobile devices with small user
interfaces. "Mobile Software" refers to the software operating
system which allows for application programs to be implemented on a
mobile device such as a mobile telephone or PDA. Examples of Mobile
Software are Java and Java ME (Java and JavaME are trademarks of
Sun Microsystems, Inc. of Santa Clara, Calif.), BREW (BREW is a
registered trademark of Qualcomm Incorporated of San Diego,
Calif.), Windows Mobile (Windows is a registered trademark of
Microsoft Corporation of Redmond, Wash.), Palm OS (Palm is a
registered trademark of Palm, Inc. of Sunnyvale, Calif.), Symbian
OS (Symbian is a registered trademark of Symbian Software Limited
Corporation of London, United Kingdom), ANDROID OS (ANDROID is a
registered trademark of Google, Inc. of Mountain View, Calif.),
iPhone OS (iPhone is a registered trademark of Apple, Inc. of
Cupertino, Calif.), and Windows Phone 7. "Mobile Apps" refers to
software programs written for execution with Mobile Software.
[0044] "PACS" refers to Picture Archiving and Communication System
(PACS) involving medical imaging technology for storage of, and
convenient access to, images from multiple source machine types.
Electronic images and reports are transmitted digitally via PACS;
this eliminates the need to manually file, retrieve, or transport
film jackets. The universal format for PACS image storage and
transfer is DICOM (Digital Imaging and Communications in Medicine).
Non-image data, such as scanned documents, may be incorporated
using consumer industry standard formats like PDF (Portable
Document Format), once encapsulated in DICOM. A PACS typically
consists of four major components: imaging modalities such as X-ray
computed tomography (CT) and magnetic resonance imaging (MRI)
(although other modalities such as ultrasound (US), positron
emission tomography (PET), endoscopy (ES), mammograms (MG), Digital
radiography (DR), computed radiography (CR), etc. may be included),
a secured network for the transmission of patient information,
workstations and mobile devices for interpreting and reviewing
images, and archives for the storage and retrieval of images and
reports. When used in a more generic sense, PACS may refer to any
image storage and retrieval system.
[0045] FIG. 1 is a high-level block diagram of a computing
environment 100 according to one embodiment. FIG. 1 illustrates
server 110 and three clients 112 connected by network 114. Only
three clients 112 are shown in FIG. 1 in order to simplify and
clarify the description. Embodiments of the computing environment
100 may have thousands or millions of clients 112 connected to
network 114, for example the Internet. Users (not shown) may
operate software 116 on one of clients 112 to both send and receive
messages over network 114 via server 110 and its associated
communications equipment and software (not shown).
[0046] FIG. 2 depicts a block diagram of computer system 210
suitable for implementing server 110 or client 112. Computer system
210 includes bus 212 which interconnects major subsystems of
computer system 210, such as central processor 214, system memory
217 (typically RAM, but which may also include ROM, flash RAM, or
the like), input/output controller 218, external audio device, such
as speaker system 220 via audio output interface 222, external
device, such as display screen 224 via display adapter 226, serial
ports 228 and 230, keyboard 232 (interfaced with keyboard
controller 233), storage interface 234, disk drive 237 operative to
receive floppy disk 238, host bus adapter (HBA) interface card 235A
operative to connect with Fibre Channel network 290, host bus
adapter (HBA) interface card 235B operative to connect to SCSI bus
239, and optical disk drive 240 operative to receive optical disk
242. Also included are mouse 246 (or other point-and-click device,
coupled to bus 212 via serial port 228), modem 247 (coupled to bus
212 via serial port 230), and network interface 248 (coupled
directly to bus 212).
[0047] Bus 212 allows data communication between central processor
214 and system memory 217, which may include read-only memory (ROM)
or flash memory (neither shown), and random access memory (RAM)
(not shown), as previously noted. RAM is generally the main memory
into which operating system and application programs are loaded.
ROM or flash memory may contain, among other software code, Basic
Input-Output system (BIOS) which controls basic hardware operation
such as interaction with peripheral components. Applications
resident with computer system 210 are generally stored on and
accessed via computer readable media, such as hard disk drives
(e.g., fixed disk 244), optical drives (e.g., optical drive 240),
floppy disk unit 237, or other storage medium. Additionally,
applications may be in the form of electronic signals modulated in
accordance with the application and data communication technology
when accessed via network modem 247 or interface 248 or other
telecommunications equipment (not shown).
[0048] Storage interface 234, as with other storage interfaces of
computer system 210, may connect to standard computer readable
media for storage and/or retrieval of information, such as fixed
disk drive 244. Fixed disk drive 244 may be part of computer system
210 or may be separate and accessed through other interface
systems. Modem 247 may provide direct connection to remote servers
via telephone link or the Internet via an internet service provider
(ISP) (not shown). Network interface 248 may provide direct
connection to remote servers via direct network link to the
Internet via a POP (point of presence). Network interface 248 may
provide such connection using wireless techniques, including
digital cellular telephone connection, Cellular Digital Packet Data
(CDPD) connection, digital satellite data connection or the
like.
[0049] Many other devices or subsystems (not shown) may be
connected in a similar manner (e.g., document scanners, digital
cameras and so on). Conversely, all of the devices shown in FIG. 2
need not be present to practice the present disclosure. Devices and
subsystems may be interconnected in different ways from that shown
in FIG. 2. Operation of a computer system such as that shown in
FIG. 2 is readily known in the art and is not discussed in detail
in this application. Software source and/or object codes to
implement the present disclosure may be stored in computer-readable
storage media such as one or more of system memory 217, fixed disk
244, optical disk 242, or floppy disk 238. The operating system
provided on computer system 210 may be a variety or version of
either MS-DOS® (MS-DOS is a registered trademark of Microsoft
Corporation of Redmond, Wash.), WINDOWS® (WINDOWS is a
registered trademark of Microsoft Corporation of Redmond, Wash.),
OS/2® (OS/2 is a registered trademark of International Business
Machines Corporation of Armonk, N.Y.), UNIX® (UNIX is a
registered trademark of X/Open Company Limited of Reading, United
Kingdom), Linux® (Linux is a registered trademark of Linus
Torvalds of Portland, Oreg.), or other known or developed operating
system. In some embodiments, computer system 210 may take the form
of a tablet computer, typically in the form of a large display
screen operated by touching the screen. In tablet computer
alternative embodiments, the operating system may be iOS® (iOS
is a registered trademark of Cisco Systems, Inc. of San Jose,
Calif., used under license by Apple Corporation of Cupertino,
Calif.), Android® (Android is a trademark of Google Inc. of
Mountain View, Calif.), Blackberry® Tablet OS (Blackberry is a
registered trademark of Research In Motion of Waterloo, Ontario,
Canada), webOS (webOS is a trademark of Hewlett-Packard Development
Company, L.P. of Texas), and/or other suitable tablet operating
systems.
[0050] Moreover, regarding the signals described herein, those
skilled in the art recognize that a signal may be directly
transmitted from a first block to a second block, or a signal may
be modified (e.g., amplified, attenuated, delayed, latched,
buffered, inverted, filtered, or otherwise modified) between
blocks. Although the signals of the above described embodiments are
characterized as transmitted from one block to the next, other
embodiments of the present disclosure may include modified signals
in place of such directly transmitted signals as long as the
informational and/or functional aspect of the signal is transmitted
between blocks. To some extent, a signal input at a second block
may be conceptualized as a second signal derived from a first
signal output from a first block due to physical limitations of the
circuitry involved (e.g., there will inevitably be some attenuation
and delay). Therefore, as used herein, a second signal derived from
a first signal includes the first signal or any modifications to
the first signal, whether due to circuit limitations or due to
passage through other circuit elements which do not change the
informational and/or final functional aspect of the first
signal.
[0051] One peripheral device particularly useful with embodiments
of the present invention is microarray 250. Generally, microarray
250 represents one or more devices capable of analyzing and
providing genetic expression and other molecular information from
patients. Microarrays may be manufactured in different ways,
depending on the number of probes under examination, costs,
customization requirements, and the type of analysis contemplated.
Such arrays may have as few as 10 probes or over a million
micrometre-scale probes, and are generally available from multiple
commercial vendors. Each probe in a particular array is responsive
to one or more genes, gene-expressions, proteins, enzymes,
metabolites and/or other molecular materials, collectively referred
to hereinafter as targets or target products.
[0052] In some embodiments, gene expression values from microarray
experiments may be represented as heat maps to visualize the result
of data analysis. In other embodiments, the gene expression values
are mapped into a network structure and compared to other network
structures, e.g. normalized samples and/or samples of patients with
a particular condition or disease. In either circumstance, a single
patient sample may be analyzed and compared multiple times to focus
or differentiate diagnoses or treatments. Thus, a patient having
signs of multiple conditions or diseases may have microarray sample
data analyzed several times to clarify possible diagnoses or
treatments.
[0053] It is also possible, in several embodiments, to have
multiple types of microarrays, each type having sensitivity to
particular expressions and/or other molecular materials, and thus
particularized for a predetermined set of targets. This allows for
an iterative process of patient sampling, analysis, and further
sampling and analysis to refine and personalize diagnoses and
treatments for individuals. While each commercial vendor may have
particular platforms and data formats, most if not all may be
reduced to standardized formats. Further, sample data may be
subject to statistical treatment for analysis and/or accuracy and
precision so that individual patient data is as relevant as
possible. Such individual data may be compared to large databases
having thousands or millions of sets of comparative data to assist
in the experiment, and several such databases are publicly available
in data warehouses. Due to the biological complexity of gene
expression, careful experimental design is necessary so that
statistically and biologically valid conclusions may be drawn from
the data.
[0054] Microarray data sets are commonly very large, and analytical
precision is influenced by a number of variables. Statistical
challenges include taking into account effects of background noise
and appropriate normalization of the data. Normalization methods
may be suited to specific platforms and, in the case of commercial
platforms, some analysis may be proprietary. The relation between a
probe and the mRNA that it is expected to detect is not trivial.
First, some mRNAs may cross-hybridize with probes in the array that
are supposed to detect another mRNA. Second, mRNAs may experience
amplification bias that is sequence- or molecule-specific. Third,
probes that are designed to detect the mRNA of a particular gene
may rely on genomic Expressed Sequence Tag (EST) information that
is incorrectly associated with that gene.
[0055] In post-genome biology, molecular connectivity maps have
been proposed to establish comprehensive knowledge links between
molecules of interest in a given biological context. Embodiments of
the present invention use the computational connectivity maps
(C-Maps) web server, which is an online bioinformatics resource
that provides biologists with potential relationships between drugs
and genes in specific disease contexts, to construct an integrative
disease drug-perturbation pathway/network model. Alternatively,
other connectivity maps may be used, based in whole or in part on
information such as that of drug perturbed gene expression
profiles, the Iconix database, and the Library of Integrated
Network-based Cellular Signatures (LINCS) resources.
[0056] A pathway/network model-based drug ranking algorithm is
developed to evaluate the pharmacological effects of drugs on the
target proteins as either "therapeutic" or "toxic". A quantitative
score for each drug can be calculated by determining the functional
importance of targeted proteins, summarized pharmacological
effects, and network topological features.
[0057] Our innovation opens a new way to evaluate the efficacy of
candidate drugs or drug combinations for patients with a specific
disease. The framework includes two major parts as shown in FIG. 3,
the integrative disease drug-perturbation pathway model development
process 302 and pathway-based drug ranking algorithm development
process 304.
[0058] 1. Develop Integrative Disease Drug-Perturbation Pathway
Models:
[0059] First, an integrated disease-specific pathway consisting of
important disease-related drugs and proteins is constructed through
the C-Maps webserver 310, simply by inputting the disease name.
C-Maps 310 measures the importance of disease-related proteins by
using an RP score (for example, as may be calculated by the
methodology disclosed in reference 2). After list 312 of important
proteins for a disease is generated, all the disease-related
proteins are collectively used as a query against the Human Pathway
Database (HPD) 314. This method yields comprehensive list 316 of
important pathways related to the specific disease. The importance
of each pathway is determined by how many of the disease-related
proteins are included in the pathway. Each important pathway may
then be annotated and, optionally, only sub-pathways with
disease-related proteins are included in list 318. Alternatively,
one could use one or more of these pathway or gene set data
resources as substitutes for HPD
314, e.g., the Kyoto Encyclopedia of Genes and Genomes (KEGG)
database by Kanehisa Laboratories, Kyoto University &
University of Tokyo; the Reactome database
(http://www.reactome.org) by the National Institutes of Health,
Enfin of the European Union, and Ontario, Canada, New York
University, and Cold Spring Harbor Laboratory; Curated Gene
Signatures database (GeneSigDB at
http://compbio.dfci.harvard.edu/genesigdb/), and the Pathway And
Gene Enhanced Database (PAGED) database by Indiana University
School of Informatics
(http://bio.informatics.iupui.edu/paged/).
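The pathway-ranking step above (importance measured by how many disease-related proteins a pathway contains) can be sketched as a set-overlap count. This is a minimal sketch assuming the pathway database has already been loaded into a dictionary of protein sets; the function name `rank_pathways`, the tie-breaking rule, and the toy protein and pathway names are hypothetical, and no real HPD, KEGG, or Reactome query is shown.

```python
from typing import Dict, List, Set, Tuple


def rank_pathways(
    disease_proteins: Set[str],
    pathways: Dict[str, Set[str]],
    top_n: int,
) -> List[Tuple[str, int]]:
    """Rank pathways by the number of disease-related proteins they contain."""
    scored = []
    for name, members in pathways.items():
        overlap = len(members & disease_proteins)
        if overlap > 0:  # skip pathways that share no protein with the query
            scored.append((name, overlap))
    # Most disease proteins first; ties broken by name for a stable ordering.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Toy data only: protein sets are illustrative, not real database contents.
disease = {"P1", "P2", "P3", "P4"}
pathways = {
    "pathway_A": {"P1", "P2", "P9"},
    "pathway_B": {"P3", "P7"},
    "pathway_C": {"P8", "P9"},
    "pathway_D": {"P1", "P3", "P4"},
}
print(rank_pathways(disease, pathways, top_n=2))  # [('pathway_D', 3), ('pathway_A', 2)]
```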
[0060] Second, directionality of protein-protein relations is
obtained from the pathways that were annotated and mapped in list
320. Up-regulated relations are represented as pointed arrows and
down-regulated relations are represented as flattened inhibitory
arrows in FIG. 4. Any top-ranked proteins from C-Maps 310 that are
not included in any existing pathway are considered "holes",
which may be filled by using publicly available information about
the proteins, such as pathway images retrieved by querying with each
protein's UniProt ID combined with the term "Pathway". By examining
the retrieved pathway images, the holes of the pathway may be filled,
thus generating an integrated pathway model with important
disease-related proteins and the interactions (i.e., activation,
inhibition, etc.) between them.
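Detecting the "holes" described above amounts to a set difference between the top-ranked proteins and the union of all pathway members. The sketch below assumes pathways are held as protein sets and that a protein-to-UniProt-ID mapping is supplied by the caller; the helper names `find_holes` and `hole_queries` and the placeholder ID values are hypothetical, and the image retrieval and manual examination steps are not modeled.

```python
from typing import Dict, List, Set


def find_holes(top_proteins: List[str], pathways: Dict[str, Set[str]]) -> List[str]:
    """Return top-ranked proteins absent from every pathway, in rank order."""
    covered: Set[str] = set().union(*pathways.values())
    return [p for p in top_proteins if p not in covered]


def hole_queries(holes: List[str], uniprot_ids: Dict[str, str]) -> List[str]:
    """Build '<UniProt ID> Pathway' search strings for holes with a known ID."""
    return [f"{uniprot_ids[p]} Pathway" for p in holes if p in uniprot_ids]


# Toy data; the ID values below are placeholders, not real UniProt accessions.
pathways = {"pathway_A": {"P1", "P2"}, "pathway_B": {"P3"}}
holes = find_holes(["P1", "P5", "P3", "P6"], pathways)
print(holes)                                 # ['P5', 'P6']
print(hole_queries(holes, {"P5": "ID_P5"}))  # ['ID_P5 Pathway']
```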
[0061] Third, drugs are mapped to the pathway based on either drug
target information from DrugBank or the drug protein pairs from
C-Maps 310. To determine how a specific drug affects a specific
protein, we curate all the supporting abstracts from PubMed 322 for
each drug-protein pair from C-Maps 310, either manually or by an
automated process. Each drug-protein relation is then classified as
"up-regulated", "down-regulated", or "other". In step 324 a
drug-protein relation is considered "up-regulated" whenever a drug
positively influences a protein; "down-regulated" whenever a drug
negatively influences a protein. These relations are then mapped as
the edge attributes of the drug-protein interactions in 326.
Alternatively, this curation of drug-protein relationships may be
done with the use of a generally available natural language
processing (NLP) software-based approach, or database lookup using
a resource such as the Search Tool for Interactions of Chemicals
(STITCH, at http://stitch.embl.de/) database.
[0062] Finally, we translate the integrated disease-specific
pathway model with mapped pharmacology effects, the Pharmacology
Effect Network 330 (PEN), into a standard weighted network 332 by
representing its components as an adjacency matrix and network
perturbation vectors to facilitate the use of standard mathematical
approaches.
[0063] 1) A standard weighted adjacency matrix, $A$ (FIG. 5), is
used to store all the protein-protein interaction information from
the PEN. Each protein or protein complex, which is represented as a
vertex in the network, is uniquely associated with one index, i.e.,
one row-column pair, of the adjacency matrix, and the matrix entries
hold the edge weights, which encode the protein-protein interaction
directionality. Thus, the element $a_{i,j}$ describes the
interaction between the protein associated with the i-th row and the
protein associated with the j-th column of the matrix: it is `1` if
the proteins have a stimulating relationship, `-1` if they have an
inhibitory relationship, and `0` if no direct interaction exists
between them.
[0064] 2) The drug-target interactions from the PEN are treated
separately from the protein-protein interactions and stored in
perturbation vectors, $v_{d,0}$, where d is the drug identifier.
Each drug is also a vertex, with drug-protein interactions
represented as edge weights. Hence, maintaining the same unique row
association for proteins and protein complexes as in $A$, the i-th
element of the vector contains the interaction directionality
between the drug and the protein associated with that row, using
the same weighting outlined in 1).
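A minimal sketch of how the curated relations could be encoded as the adjacency matrix $A$ and a drug perturbation vector $v_{d,0}$ follows. It assumes rows index the source protein and columns the target protein (the text does not fix an orientation) and maps the curated labels "up-regulated", "down-regulated", and "other" to `1`, `-1`, and `0`. The names `WEIGHT`, `build_adjacency`, and `perturbation_vector` and the toy proteins are hypothetical; plain lists are used so the sketch has no dependencies.

```python
from typing import Dict, List, Tuple

# Assumed mapping from curated relation labels to edge weights.
WEIGHT = {"up-regulated": 1, "down-regulated": -1, "other": 0}


def build_adjacency(proteins: List[str], edges: List[Tuple[str, str, int]]) -> List[List[int]]:
    """Row = source protein, column = target protein (assumed orientation)."""
    index = {name: i for i, name in enumerate(proteins)}
    A = [[0] * len(proteins) for _ in proteins]
    for src, dst, weight in edges:
        A[index[src]][index[dst]] = weight
    return A


def perturbation_vector(proteins: List[str], drug_targets: Dict[str, str]) -> List[int]:
    """Entry i holds the drug's effect on the protein associated with row i of A."""
    index = {name: i for i, name in enumerate(proteins)}
    v = [0] * len(proteins)
    for protein, label in drug_targets.items():
        v[index[protein]] = WEIGHT[label]
    return v


proteins = ["P1", "P2", "P3"]
A = build_adjacency(proteins, [("P1", "P2", 1), ("P2", "P3", -1)])
v_d = perturbation_vector(proteins, {"P1": "down-regulated", "P3": "other"})
print(A)    # [[0, 1, 0], [0, 0, -1], [0, 0, 0]]
print(v_d)  # [-1, 0, 0]
```

Keeping one shared `index` ordering for both structures is what lets the i-th entry of the vector line up with the i-th row and column of $A$.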
[0065] 2. Develop PEN Model-based Drug Ranking Algorithms:
[0066] An ideal drug for a patient diagnosed with a certain disease
modulates the gene expression profile of this patient (from GEO
340) toward the levels found in healthy individuals at the
pathway level. Accordingly, as shown in FIG. 4, we classify the
drug's pharmacological effect on a protein into three categories in
334:
[0067] 1) Therapeutic: if the drug activates the under-expressed
protein or inhibits the over-expressed protein.
[0068] 2) Toxic: if the drug activates the over-expressed protein
or inhibits the under-expressed protein.
[0069] 3) Ambiguous: if there is missing directionality information
for proteins or drugs.
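The three categories above reduce to a sign comparison between the drug's action and the protein's expression state. The sketch below assumes both are encoded as +1, -1, or 0 (0 meaning missing directionality); the encoding and the name `classify_effect` are illustrative choices, not taken from the source.

```python
def classify_effect(drug_direction: int, expression_state: int) -> str:
    """Classify a drug's effect on one protein.

    drug_direction:   +1 if the drug activates the protein, -1 if it inhibits it,
                      0 if the direction is unknown.
    expression_state: +1 if the protein is over-expressed, -1 if under-expressed,
                      0 if unknown.
    """
    if drug_direction == 0 or expression_state == 0:
        return "ambiguous"  # missing directionality information
    # Opposite signs push expression back toward the healthy level.
    if drug_direction * expression_state < 0:
        return "therapeutic"
    return "toxic"


for drug, expr in [(1, -1), (-1, 1), (1, 1), (-1, -1), (0, 1)]:
    print(drug, expr, classify_effect(drug, expr))
```

Activating an under-expressed protein (+1 times -1) or inhibiting an over-expressed one (-1 times +1) gives a negative product and is classed as therapeutic; matching signs are classed as toxic.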
[0070] The ranking algorithm assigns a high score for drugs that
have a therapeutic effect on the proteins in the pathway and a low
score for those that have a toxic effect on the proteins in the
pathway. Since the ranking of a drug is essentially based on the
Pharmacological Effect on its Targets, we call it a PET
algorithm.
[0071] Since there may be multiple paths from the drug to its
affected protein, or effector, and each one of these paths may have
either a therapeutic or a toxic effect, we developed equation (1) to
account for this fact. This equation is based on information theory
and scores the overall effect of the specific drug on its
effector.
$$w(N_m) = \frac{N_m}{N}\log_2\left(2^{k} N\right) \qquad (1)$$
[0072] Here, $N_m$ is the number of pharmacological effects of type
m, where m = 1 for therapeutic and m = 2 for toxic effects, N is the
total number of effects, and $2^k$ is a boosting factor based on the
path length, k, from the drug to the effector. The two weights,
$w(N_1)$ and $w(N_2)$, are combined into the overall pharmacological
effect of the drug on that effector, using a logistic function to
scale the result:
$$r_i = \frac{2}{1 + e^{-\left(w(N_1) - w(N_2)\right)}} - 1 \qquad (2)$$
[0073] The score $r_i$ thus increases as the number of therapeutic
effects increases and decreases as the number of toxic effects
increases.
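The per-effector score can be sketched numerically as follows. This sketch assumes equation (1) reads $w(N_m) = \frac{N_m}{N}\log_2(2^k N)$ and that the total N is $N_1 + N_2$ (the text does not say whether ambiguous effects are counted). The function names `effect_weight` and `effector_score` are hypothetical. Equation (2) maps the weight difference into the open interval (-1, 1).

```python
import math


def effect_weight(n_m: int, n_total: int, k: int) -> float:
    """Assumed form of equation (1): w(N_m) = (N_m / N) * log2(2**k * N)."""
    if n_total == 0:
        return 0.0  # no classified effects: contribute nothing
    return (n_m / n_total) * math.log2((2 ** k) * n_total)


def effector_score(n_therapeutic: int, n_toxic: int, k: int) -> float:
    """Equation (2): logistic scaling of w(N1) - w(N2) into (-1, 1)."""
    n_total = n_therapeutic + n_toxic  # assumption: N = N1 + N2
    diff = effect_weight(n_therapeutic, n_total, k) - effect_weight(n_toxic, n_total, k)
    return 2.0 / (1.0 + math.exp(-diff)) - 1.0


print(effector_score(3, 0, k=1))  # positive: only therapeutic paths
print(effector_score(2, 2, k=1))  # 0.0: balanced paths cancel
```

The scaling $2/(1+e^{-x}) - 1$ equals $\tanh(x/2)$, so the score is antisymmetric: swapping the therapeutic and toxic counts flips its sign.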
[0074] In order to obtain an accurate count for $N_1$ and $N_2$, a
novel way to track the spread of perturbations was developed. A
drug can have multiple paths towards an effector, and different
paths may have conflicting effects; therefore, an iterative approach
was employed to separately classify each of the interactions
involved as therapeutic or toxic. Each protein has an interaction
vector associated with it, $i_{p,k}$, where k is the iteration index
and p is the protein's identifying number (index) in the adjacency
matrix:
$$i_{p,k} = \begin{bmatrix} N_1 \\ N_2 \end{bmatrix} \qquad (3)$$
[0075] The initial perturbation is applied by updating the directly
targeted effector proteins. The effects for the next step in the
perturbation are found by storing the total number of outgoing and
incoming paths for each node at a given step. In this way, we can
systematically trace a drug's effects on the pathway across multiple
discrete steps. In order to determine the effects of the incoming
perturbation on protein p, the incoming perturbation vector
$\vec{m}$ is taken from the p-th column of $A$, i.e., the p-th row
of $A^T$:
$$\vec{m} = A^T(p,:) \qquad (4)$$
[0076] By examining each element $m(n)$ of $\vec{m}$, the
interaction vector of protein p, $i_{p,k+1}$, is updated with all
incoming path effects as follows:
$$i_{p,k+1} = i_{p,k} + \sum_{n:\, m(n) > 0} \left(i_{n,k} - i_{n,k-1}\right) + \sum_{n:\, m(n) < 0} \operatorname{inv}\left(i_{n,k} - i_{n,k-1}\right) \qquad (5)$$
[0077] Where the inv function inverts the vector so that
$$\operatorname{inv}(i_{p,k}) = \begin{bmatrix} N_2 \\ N_1 \end{bmatrix} \qquad (6)$$
[0078] Thus, across a down-regulating (inhibitory) edge, all
incoming paths from the previous protein receive the opposite
classification for the next protein in the path.
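The iterative update can be sketched in plain Python. This sketch makes several assumptions that are not stated in the text. Each interaction vector is stored as `[N1, N2]`. `A[q][p]` is the weight of the edge from protein q to protein p, so column p of `A` lists p's incoming edges as in equation (4). Each sum in equation (5) runs over incoming neighbors n and passes on only the increment $i_{n,k} - i_{n,k-1}$, with $i_{n,-1} = [0, 0]$. The caller sets the initial vectors $i_{p,0}$ at the directly targeted proteins. The name `propagate` is hypothetical; `inv` follows equation (6).

```python
from typing import List

Counts = List[int]  # [N1, N2] = [therapeutic path count, toxic path count]


def inv(vec: Counts) -> Counts:
    """Equation (6): swap the therapeutic and toxic counts."""
    return [vec[1], vec[0]]


def propagate(A: List[List[int]], initial: List[Counts], steps: int) -> List[Counts]:
    """Iterate the update of equation (5) for a fixed number of steps."""
    n = len(A)
    prev = [[0, 0] for _ in range(n)]   # i_{., k-1}, assumed zero before step 0
    cur = [list(v) for v in initial]    # i_{., k}
    for _ in range(steps):
        nxt = []
        for p in range(n):
            acc = list(cur[p])          # start from i_{p,k}
            for q in range(n):
                weight = A[q][p]        # m(q): edge from q into p
                if weight == 0:
                    continue
                # Pass on only the path counts that reached q in the last step.
                delta = [cur[q][0] - prev[q][0], cur[q][1] - prev[q][1]]
                if weight < 0:
                    delta = inv(delta)  # inhibitory edge flips the classification
                acc[0] += delta[0]
                acc[1] += delta[1]
            nxt.append(acc)
        prev, cur = cur, nxt
    return cur


# Toy chain P1 -> P2 -| P3; the drug's direct effect on P1 is one therapeutic path.
A = [[0, 1, 0], [0, 0, -1], [0, 0, 0]]
print(propagate(A, [[1, 0], [0, 0], [0, 0]], steps=2))  # [[1, 0], [1, 0], [0, 1]]
```

Propagating increments rather than full vectors avoids recounting the same path at every step. In networks with feedback loops the counts can keep growing, so the number of steps is fixed by the caller.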
[0079] The final normalized PET score, which contains the
information of the drug's overall effect on the whole pathway, was
found using the following equation:
$$r_{total} = \frac{\sum_{i=0}^{n} r_i \, Rp(i)}{n P} \qquad (7)$$
[0080] Where n is the total number of effector proteins in the
integrated pathway, m is the total number of effector proteins in
the i-th drug's sub-pathway, and P is the total number of proteins
in the sub network for the drug.
[0081] Equivalently, n is the total number of drug-affected proteins
in the integrated pathway. In order to compare the effects of
multiple drugs on the same disease, a normalized PET index is
employed to account for variance in the Rp scores of the proteins
in a drug's effector profile.
[0082] 3. Evaluate Breast Cancer Drug Efficacy by Using the PEN-PET
Approach:
[0083] The top 500 drug-protein relations for breast cancer from
C-Maps 310 are searched against HPD 314 to retrieve the top 15
pathways. All 15 pathways are annotated and integrated. The final
integrated pathway contains a total of 221 nodes. Out of the 221
nodes, 188 nodes are for proteins in the pathway, with those
proteins from C-Maps 310 labeled as ovals and the others encased in
rectangles; 23 nodes are for drugs; and 14 nodes are for biological
processes such as "apoptosis" and "angiogenesis".
[0084] C-Maps 310 provides a comprehensive list of PubMed 322
abstracts for each of the 500 drug-protein relations for breast
cancer. This amounts to over 5000 PubMed abstracts, which were
manually curated, classified, and categorized for breast cancer.
After manual curation, 79 drug-protein pairs contained only up
information, 57 only down, 11 primarily up, 8 primarily down, and
345 were unknown.
[0085] A breast cancer PEN may be created with 23 drugs attached, including 6 FDA-approved drugs for breast cancer. This network is then examined to determine whether it has the scale-free property. Since PEN is a directed network, degree distributions are generated for both in-degree and out-degree. Both show a scale-free property, with an R-squared of 0.95 for in-degree and 0.96 for out-degree.
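The text reports R-squared values but not how they were obtained. One common approach, assumed here, is a least-squares line fitted to the log-log degree distribution, log10 P(k) = a + b log10 k, computed separately for in-degree and out-degree. The sketch below uses only the Python standard library; `in_out_degrees` and `power_law_r2` are hypothetical helper names.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def in_out_degrees(edges: Iterable[Tuple[str, str]]) -> Tuple[List[int], List[int]]:
    """In-degrees and out-degrees of nodes that have at least one such edge."""
    indeg: Counter = Counter()
    outdeg: Counter = Counter()
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return list(indeg.values()), list(outdeg.values())


def power_law_r2(degrees: Iterable[int]) -> Tuple[float, float]:
    """Least-squares fit of log10 P(k) = a + b * log10 k; returns (b, R^2)."""
    counts = Counter(d for d in degrees if d > 0)  # log10(0) is undefined
    total = sum(counts.values())
    ks = sorted(counts)
    if len(ks) < 2:
        raise ValueError("need at least two distinct degrees")
    xs = [math.log10(k) for k in ks]
    ys = [math.log10(counts[k] / total) for k in ks]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return b, r2


if __name__ == "__main__":
    # Hypothetical degree sequence with P(k) proportional to k^-2.
    degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
    slope, r2 = power_law_r2(degrees)
    print(round(slope, 6), round(r2, 6))
```

On this synthetic sequence the fit recovers a slope of about -2 with an R-squared of about 1. A log-log regression is a quick screen rather than a rigorous power-law test; maximum-likelihood estimators are often preferred for that purpose.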
[0086] The PET algorithm was tested on simple networks, which are the functional building blocks of large, complex networks, such as one drug with one target protein, a simple loop, and so on. Such testing verifies that PET is logically correct for these simple networks. FIG. 5 shows one of the simple networks tested.
[0087] Note that the main diagonal has toxic effects because, in these cases, the drug's pharmacological effect on its protein and the protein's expression have the same sign (over-expressed and activating, under-expressed and inhibiting, etc.). Similarly, the opposite diagonal has opposite effect types and receives largely therapeutic scores.
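The sign rule for a drug acting directly on one protein can be written as a short function: matching signs of expression change and drug action are toxic, opposite signs are therapeutic. Encoding both quantities as +1/-1 and the name `classify_effect` are assumptions made for this sketch.

```python
def classify_effect(expression: int, drug_action: int) -> str:
    """expression: +1 over-expressed, -1 under-expressed (assumed encoding).
    drug_action: +1 activating, -1 inhibiting (assumed encoding).
    Same sign -> toxic; opposite sign -> therapeutic."""
    if expression not in (1, -1) or drug_action not in (1, -1):
        raise ValueError("expression and drug_action must be +1 or -1")
    return "toxic" if expression == drug_action else "therapeutic"


if __name__ == "__main__":
    # Rows: over-/under-expressed protein; columns: activating / inhibiting drug.
    for expr, label in ((1, "over-expressed"), (-1, "under-expressed")):
        print(label, [classify_effect(expr, act) for act in (1, -1)])
```

Printing the 2 x 2 grid gives `over-expressed ['toxic', 'therapeutic']` and `under-expressed ['therapeutic', 'toxic']`, so "toxic" falls on the main diagonal and "therapeutic" on the opposite diagonal, matching the pattern described above.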
[0088] We then applied the PET algorithm to two breast cancer gene expression microarray datasets (GSE3193 and GSE10886), in which all samples from breast cancer patients are estrogen receptor positive (ER+). Differentially expressed genes were mapped onto the PEN to create a disease-specific network.
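The text does not say how differentially expressed genes were selected. The sketch below is one hypothetical way to carry out the mapping step: keep genes that pass a fold-change and p-value cutoff, keep only those present as PEN nodes, and record their direction as +1 or -1. The thresholds, gene identifiers, and the function name `map_degs_to_pen` are all illustrative assumptions.

```python
from typing import Dict, Set


def map_degs_to_pen(
    log_fc: Dict[str, float],
    p_values: Dict[str, float],
    pen_nodes: Set[str],
    fc_cutoff: float = 1.0,   # hypothetical |log2 fold change| threshold
    p_cutoff: float = 0.05,   # hypothetical significance threshold
) -> Dict[str, int]:
    """Return {gene: +1 over-expressed / -1 under-expressed} for differentially
    expressed genes that are also nodes of the PEN."""
    mapped: Dict[str, int] = {}
    for gene, fc in log_fc.items():
        p = p_values.get(gene)
        if gene not in pen_nodes or p is None:
            continue  # gene is not in the network or has no test result
        if p < p_cutoff and abs(fc) >= fc_cutoff:
            mapped[gene] = 1 if fc > 0 else -1
    return mapped


if __name__ == "__main__":
    # Hypothetical expression summary; G4 is not a PEN node.
    print(map_degs_to_pen(
        {"G1": 2.0, "G2": -1.5, "G3": 0.2, "G4": 3.0},
        {"G1": 0.01, "G2": 0.001, "G3": 0.01, "G4": 0.001},
        pen_nodes={"G1", "G2", "G3"},
    ))
```

Here G3 is dropped for a small fold change and G4 because it is absent from the PEN, leaving `{'G1': 1, 'G2': -1}` as the disease-specific signs that downstream scoring would use.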
[0089] Of the 23 drugs tested, 3 received purely positive scores, including two popular breast cancer treatments specifically for ER+ patients, Tamoxifen and Raloxifene; 11 received purely negative PET scores, including one drug withdrawn from the market and nine drugs that either lack clinical trials or have few clinical trials. Mitomycin is the only breast cancer treatment to receive a negative score across both expression profiles. Because Mitomycin is an older chemotherapy drug that targets DNA rather than specific proteins, this result was anticipated. The breast cancer drugs Exemestane, Anastrozole, and Letrozole all target CYP19A1 and thus received the same scores. Because of a mixture of therapeutic and toxic effects on effector proteins in the ER+ gene expression profiles, their individual PET scores for the proteins are quite low.
[0090] In sum, FIG. 6 shows a trend in which withdrawn or abandoned drugs receive negative scores while FDA-approved drugs receive positive scores; among the FDA-approved drugs, scores favor those specifically for ER+ patients, since both microarray datasets are from ER+ patients.
[0091] The importance of manually curating a drug's effect on breast cancer proteins is highlighted by the case of Raloxifene. Raloxifene targets ESR1, a key breast cancer protein. It was predicted that Raloxifene would receive a highly positive score. However, contradictory information exists about the effects of this drug on ESR1. For example, in breast tissue Raloxifene inhibits ESR1, causing a therapeutic effect, but in bone Raloxifene can activate ESR1, which would be classified as toxic. Manual curation of the effects on ESR1 is therefore important so that the appropriate interaction is selected. In FIG. 6, Raloxifene* indicates the score that Raloxifene would receive if the wrong effect had been selected.
[0092] An alternative approach to ranking a drug's efficacy is to ignore the underlying network structure and examine only the ultimate effect of the drug on the effector proteins. This approach has been used previously. In it, the effect on each protein is classified as either entirely toxic or entirely therapeutic, which is equivalent to taking only the sign of the PET score. However, as FIG. 7 demonstrates, this algorithm has trouble differentiating between drugs with therapeutic outcomes and those without.
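To illustrate why keeping only the sign discards information, the hypothetical sketch below compares a sign-only aggregate with a plain sum of per-protein PET scores. The plain sum is a deliberate simplification for illustration, not the normalized score of Equation (7), and the function names and scores are invented for the example.

```python
from typing import Dict


def magnitude_score(pet: Dict[str, float]) -> float:
    """Simplified aggregate that keeps magnitudes: sum of per-protein PET scores."""
    return sum(pet.values())


def sign_only_score(pet: Dict[str, float]) -> int:
    """Network-agnostic alternative: each protein counts +1 (therapeutic) or -1 (toxic)."""
    return sum((s > 0) - (s < 0) for s in pet.values())


if __name__ == "__main__":
    # Hypothetical per-protein PET scores for two drugs.
    drug_x = {"P1": 0.9, "P2": -0.1}
    drug_y = {"P1": 0.1, "P2": -0.1}
    print(sign_only_score(drug_x), sign_only_score(drug_y))  # prints: 0 0
    print(magnitude_score(drug_x), magnitude_score(drug_y))
```

Both drugs tie at 0 under the sign-only rule, while the magnitude-aware sum separates a strongly therapeutic drug (about 0.8) from a neutral one (0.0).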
[0093] The following references were used in the development of the present invention, and their disclosures are explicitly incorporated by reference herein:
[0094] 1. Lamb, J., et al., The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science, 2006. 313(5795): p. 1929.
[0095] 2. Li, J., X. Zhu, and J. Y. Chen, Building disease-specific drug-protein connectivity maps from molecular interaction networks and PubMed abstracts. PLoS Comput Biol, 2009. 5(7): p. e1000450.
[0096] 3. Chowbina, S. R., et al., HPD: an online integrated human pathway database enabling systems biology studies. BMC Bioinformatics, 2009. 10 Suppl 11: p. S5.
[0097] 4. Knox, C., et al., DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic Acids Research, 2011. 39(Database issue): p. D1035-41.
[0098] 5. Huang, H., X. Wu, S. Li, S. Ibrahim, T. Ajumobi, and J. Y. Chen, Evaluate Drug Effects on Gene Expression Profiles with Connectivity Maps, in 2nd International Workshop on Data Mining for Biomarker Discovery. 2010.
[0099] While this invention has been described as having an
exemplary design, the present invention may be further modified
within the spirit and scope of this disclosure. This application is
therefore intended to cover any variations, uses, or adaptations of
the invention using its general principles. Further, this
application is intended to cover such departures from the present
disclosure as come within known or customary practice in the art to
which this invention pertains.
* * * * *