U.S. patent application number 14/231488 was filed with the patent office on 2014-03-31 for identification of outliers among chemical assays.
This patent application is currently assigned to AZURE VAULT LTD. The applicant listed for this patent is AZURE VAULT LTD. The invention is credited to Ze'ev RUSSAK.
Application Number | 14/231488 |
Family ID | 47090809 |
Filed Date | 2014-03-31 |
United States Patent Application | 20140214339 |
Kind Code | A1 |
Inventor | RUSSAK; Ze'ev |
Published | July 31, 2014 |
IDENTIFICATION OF OUTLIERS AMONG CHEMICAL ASSAYS
Abstract
An apparatus for identifying outliers among chemical reaction
assays, the apparatus comprising a transition point finder,
configured to find at least one transition point in a cumulative
function, the cumulative function giving a quantitative indication
based on a count of points in a calculated space as a function of
distance from a function dividing the calculated space into at
least two groups, each one of the points representing results of a
respective assay of a chemical reaction. The apparatus further
comprises an outlier identifier, in communication with the
transition point finder, configured to use a distance of the found
transition point from the function dividing the calculated space,
as a threshold, for identifying an outlier among the chemical
reaction assays.
Inventors: | RUSSAK; Ze'ev (Ramat-Gan, IL) |
Applicant: | AZURE VAULT LTD. (RAMAT-GAN, IL) |
Assignee: | AZURE VAULT LTD. (RAMAT-GAN, IL) |
Family ID: | 47090809 |
Appl. No.: | 14/231488 |
Filed: | March 31, 2014 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
13098519 | May 2, 2011 | 8738303 |
14231488 | | |
Current U.S. Class: | 702/32 |
Current CPC Class: | G01N 31/10 20130101; G16B 40/00 20190201 |
Class at Publication: | 702/32 |
International Class: | G01N 31/10 20060101 G01N031/10 |
Claims
1. An apparatus for identifying outliers among chemical reaction
assays, the apparatus comprising: a programmed computer; a
transition point finder, implemented on the computer and configured
to find at least one transition point in a cumulative function, the
cumulative function being based on a count of points in a space
calculated using dimensional reduction, as a function of distance
of the points from a function dividing the calculated space into at
least two groups, each one of the points representing results of a
respective assay of a chemical reaction; and an outlier identifier,
in communication with said transition point finder, configured to
use a distance of the found transition point from the function
dividing the calculated space, as a threshold, for identifying an
outlier among the chemical reaction assays.
2. The apparatus of claim 1, further comprising a cumulative
function calculator, in communication with said transition point
finder, configured to calculate the cumulative function.
3. The apparatus of claim 1, wherein the cumulative function gives
a number of points in the calculated space as a function of the
distance of the points from the function dividing the calculated
space.
4. The apparatus of claim 1, wherein the cumulative function gives
a density of points in the calculated space as a function of the
distance of the points from the function dividing the calculated
space.
5. The apparatus of claim 1, wherein the cumulative function is
specific to one of the groups.
6. The apparatus of claim 1, wherein the cumulative function is
common to at least two of the groups.
7. The apparatus of claim 1, wherein the function dividing the
calculated space is a polynomial function.
8. The apparatus of claim 1, wherein the calculated space is a
space enhancing proximity among points representing assays of
qualitatively identical chemical reactions.
9. The apparatus of claim 1, further comprising a result receiver,
configured to receive a plurality of sets of results, each one of
the received sets of results pertaining to a respective assay of a
chemical reaction, and a space calculator, configured to calculate
the space, using the received sets of results.
10. The apparatus of claim 1, further comprising a space
calculator, configured to calculate the space, using diffusion
mapping.
11. The apparatus of claim 1, further comprising a dividing
function calculator, configured to calculate the function dividing
the calculated space.
12. A computer implemented method for identifying outliers among
chemical reaction assays, the method comprising steps the computer
is programmed to perform, the steps comprising: by a programmed
computer, finding at least one transition point in a cumulative
function, the cumulative function being based on a count of points
in a space calculated using dimensional reduction, as a function of
distance of the points from a function dividing the calculated
space into at least two groups, each one of the points representing
results of a respective assay of a chemical reaction; and using a
distance of the found transition point from the function dividing
the calculated space, as a threshold, for identifying an outlier
among the chemical reaction assays.
13. The method of claim 12, wherein the cumulative function gives a
number of points in the calculated space as a function of the
distance of the points from the function dividing the calculated
space.
14. The method of claim 12, wherein the cumulative function gives a
density of points in the calculated space as a function of the
distance of the points from the function dividing the calculated
space.
15. The method of claim 12, wherein the cumulative function is
specific to one of the groups.
16. The method of claim 12, wherein the cumulative function is
common to at least two of the groups.
17. The method of claim 12, further comprising a step of receiving
a plurality of sets of results, each one of the received sets of
results pertaining to a respective assay of a chemical reaction,
and calculating the space, using the received sets of results.
18. The method of claim 12, further comprising a step of
calculating the space, using diffusion mapping.
19. A non-transitory computer readable medium storing computer
executable instructions for performing steps of identifying
outliers among chemical reaction assays, the steps comprising:
finding at least one transition point in a cumulative function, the
cumulative function being based on a count of points in a space
calculated using dimensional reduction, as a function of distance
of the points from a function dividing the calculated space into at
least two groups, each one of the points representing results of a
respective assay of a chemical reaction; and using a distance of
the found transition point from the function dividing the
calculated space, as a threshold, for identifying an outlier among
the chemical reaction assays.
20. The computer readable medium of claim 19, wherein the steps
further comprise a step of calculating the cumulative function.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of application Ser. No.
13/098,519, filed May 2, 2011, which is hereby incorporated by
reference in its entirety, including all tables, figures, and claims.
FIELD AND BACKGROUND OF THE INVENTION
[0002] The present invention relates to analyzing chemical
reactions and, more particularly, but not exclusively to systems
and methods for automatically identifying outliers among chemical
reaction assays.
[0003] An outlier may be indicative of a measurement error, a
contaminated sample, a human error, an experimental error, etc., as
known in the art.
[0004] Traditionally, the classification of chemical assays is
based on manual examination by an expert in the field.
[0005] The expert manually examines hundreds or thousands of
samples, say thousands of graphs derived from results of
Quantitative Fluorescent Polymerase Chain Reaction (QF-PCR) based
assays, or other chemical assays.
[0006] The expert detects certain features in the samples, and
classifies each sample into one of two or more groups.
[0007] Typically, the results of the chemical assays are obtained
through real-time photometric measurements of reactions such as
real-time Polymerase Chain Reaction (PCR) and Quantitative
Fluorescent Polymerase Chain Reaction (QF-PCR), thus producing a
time series of values.
[0008] The values produced through the measurements may be
represented in a two dimensional graph depicting spectral changes
over time, say of a real-time PCR based assay.
[0009] The values may also be represented in a three dimensional
graph depicting spectral changes vs. molecule length vs. time, say
of a Capillary PCR based assay, etc., as known in the art.
[0010] For example, the spectral changes may include Fluorescence
Intensity (FI) values measured over a PCR reaction apparatus, as
known in the art. The measured FI values are indicative of the presence
and quantity of specific molecules, as detected in the PCR
reaction.
[0011] The values measured may be used to classify the chemical
reaction assay into a certain type, to determine if the assay is
positive or negative (say with respect to occurrence of a certain
genetic mutation), etc.
[0012] For example, in QF-PCR, a graph representing the values
measured over time may have linear properties, which indicate that
no amplification takes place in a reaction apparatus.
[0013] Alternatively, the QF-PCR graph may include a sigmoid curve
interval, which indicates the occurrence of a DNA amplification
reaction in the reaction apparatus.
[0014] Parameters extracted from the graph are used to determine
the properties of the amplification.
[0015] The right combination of parameters, say slopes of the graph
at selected points on the graph, may indicate the existence of a
specific subject (say the existence of a specific bacterial DNA
sequence).
[0016] The traditional methods rely on a model built manually, by
the expert.
[0017] In order to build the model, the expert has to manually
examine hundreds or thousands of samples of a training set.
[0018] The expert may position points in a coordinate system, say
on a paper or on a computer screen. Each of the points represents
one of the samples. The position of each of the points depends on the
parameters extracted from the graph, and represents the results of
a respective assay (i.e. a single one of the samples).
[0019] Then, the expert classifies each of the samples (as
represented by points) into one of two or more groups.
[0020] Finally, the expert manually draws a line defining a
separation between the two or more groups. For example, the expert
may draw a line defining a separation between positive and negative
samples.
[0021] A new sample may thus be classified into one of the groups
based on the position of the point which represents the new sample,
on one side or the other of the line which defines the separation
between the groups.
[0022] Occasionally, while building the training set, the expert
may find some of the samples problematic and difficult to classify
into one of the groups, thus identifying the problematic samples as
outliers.
[0023] Some currently used methods are based on automatic
classification of samples. For example, some of the currently used
methods use SVM (Support Vector Machine), to identify patterns in
biological systems.
[0024] Support Vector Machines (SVMs) are a set of related
supervised learning methods that analyze data and recognize
patterns. Supervised learning methods are widely used for
classification and regression analysis.
[0025] A standard SVM takes a set of input data and predicts, for
each given input, which of two possible categories the input is a
member of.
[0026] Given a set of training samples, each marked as belonging to
one of the two categories, an SVM training algorithm builds a model
usable for assigning a new sample into one category or the
other.
[0027] Intuitively, the SVM built model is a representation of the
samples as points in space, mapped so that the examples of the
separate categories are divided by a clear gap that is as wide as possible.
[0028] Consequently, new samples may be mapped into that same space
and predicted to belong to one of the categories, based on which
side of the gap the new samples fall on.
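For illustration, the SVM classification described above can be sketched as a linear SVM trained by full-batch subgradient descent on the regularized hinge loss. This is a minimal sketch under stated assumptions, not the method of any particular library: `train_linear_svm`, `predict`, the learning rate `lr`, the regularization weight `lam` and the epoch count are hypothetical names and values chosen for the example, and labels are assumed to be +1 or -1.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
    lam: float = 0.01,
    epochs: int = 500,
    lr: float = 0.1,
) -> Tuple[List[float], float]:
    """Fit w, b minimizing lam/2 * ||w||^2 + mean hinge loss (labels are +1/-1).

    Hypothetical helper: plain full-batch subgradient descent, no library.
    """
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        # Gradient of the regularization term.
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:
                # Point lies inside the margin or on the wrong side:
                # the hinge loss is active and contributes -y*x (and -y for b).
                for k in range(dim):
                    grad_w[k] -= y * x[k] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Assign x to a category by the side of the hyperplane w.x + b = 0 it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1
```

The learned `w` and `b` define the separating hyperplane w·x + b = 0; a new sample is assigned to a category by the sign of w·x + b, that is, by which side of the gap it falls on.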
SUMMARY OF THE INVENTION
[0029] According to one aspect of the present invention there is
provided an apparatus for identifying outliers among chemical
reaction assays, the apparatus comprising: a transition point
finder, configured to find at least one transition point in a
cumulative function, the cumulative function giving a quantitative
indication based on a count of points in a calculated space as a
function of distance from a function dividing the calculated space
into at least two groups, each one of the points representing
results of a respective assay of a chemical reaction, and an
outlier identifier, in communication with the transition point
finder, configured to use a distance of the found transition point
from the function dividing the calculated space, as a threshold,
for identifying an outlier among the chemical reaction assays.
[0030] According to a second aspect of the present invention there
is provided a computer implemented method for identifying outliers
among chemical reaction assays, the method comprising steps the
computer is programmed to perform, the steps comprising: finding at
least one transition point in a cumulative function, the cumulative
function giving a quantitative indication based on a count of
points in a calculated space as a function of distance from a
function dividing the calculated space into at least two groups,
each one of the points representing results of a respective assay
of a chemical reaction, and using a distance of the found
transition point from the function dividing the calculated space,
as a threshold, for identifying an outlier among the chemical
reaction assays.
[0031] According to a third aspect of the present invention there
is provided a computer readable medium storing computer executable
instructions for performing steps of identifying outliers among
chemical reaction assays, the steps comprising: finding at least
one transition point in a
cumulative function, the cumulative function giving a quantitative
indication based on a count of points in a calculated space as a
function of distance from a function dividing the calculated space
into at least two groups, each one of the points representing
results of a respective assay of a chemical reaction, and using a
distance of the found transition point from the function dividing
the calculated space, as a threshold, for identifying an outlier
among the chemical reaction assays.
[0033] Unless otherwise defined, all technical and scientific terms
used herein have the same meaning as commonly understood by one of
ordinary skill in the art to which this invention belongs. The
materials, methods, and examples provided herein are illustrative
only and not intended to be limiting.
[0034] Implementation of the method and system of the present
invention involves performing or completing certain selected tasks
or steps manually, automatically, or a combination thereof.
[0035] Moreover, according to actual instrumentation and equipment
of preferred embodiments of the method and system of the present
invention, several selected steps could be implemented by hardware
or by software on any operating system of any firmware or a
combination thereof.
[0036] For example, as hardware, selected steps of the invention
could be implemented as a chip or a circuit. As software, selected
steps of the invention could be implemented as a plurality of
software instructions being executed by a computer using any
suitable operating system. In any case, selected steps of the
method and system of the invention could be described as being
performed by a data processor, such as a computing platform for
executing a plurality of instructions.
BRIEF DESCRIPTION OF THE DRAWINGS
[0037] The invention is herein described, by way of example only,
with reference to the accompanying drawings. With specific
reference now to the drawings in detail, it is stressed that the
particulars shown are by way of example and for purposes of
illustrative discussion of the preferred embodiments of the present
invention only, and are presented in order to provide what is
believed to be the most useful and readily understood description
of the principles and conceptual aspects of the invention. The
description, taken with the drawings, makes apparent to those
skilled in the art how the several forms of the invention may be
embodied in practice.
[0038] In the drawings:
[0039] FIG. 1 is a block diagram schematically illustrating a first
apparatus for identifying outliers among chemical reaction assays,
according to an exemplary embodiment of the present invention.
[0040] FIG. 2 is a block diagram schematically illustrating a
second apparatus for identifying outliers among chemical reaction
assays, according to an exemplary embodiment of the present
invention.
[0041] FIG. 3 is a simplified flowchart schematically illustrating
a first method for identifying outliers among chemical reaction
assays, according to an exemplary embodiment of the present
invention.
[0042] FIG. 4 is a simplified flowchart schematically illustrating
a second method for identifying outliers among chemical reaction
assays, according to an exemplary embodiment of the present
invention.
[0043] FIG. 5 is a simplified flowchart schematically illustrating
a third method for identifying outliers among chemical reaction
assays, according to an exemplary embodiment of the present
invention.
[0044] FIG. 6 is a block diagram schematically illustrating a
computer readable medium storing computer executable instructions
for performing steps of identifying outliers among chemical
reaction assays, according to an exemplary embodiment of the
present invention.
[0045] FIG. 7 is an exemplary graph depicting points representing
chemical assays, in a Euclidean space, according to an exemplary
embodiment of the present invention.
[0046] FIG. 8 is an exemplary graph depicting points representing
chemical assays in a space calculated using diffusion mapping,
according to an exemplary embodiment of the present invention.
[0047] FIG. 9A is an exemplary graph depicting a function which
defines a line and separates a calculated space into two groups,
according to an exemplary embodiment of the present invention.
[0048] FIG. 9B is an exemplary graph depicting a function which
defines two lines and separates a calculated space into three
groups, according to an exemplary embodiment of the present
invention.
[0049] FIG. 9C is an exemplary graph depicting a function which
defines a hyper-surface, and separates a calculated space into two
groups, according to an exemplary embodiment of the present
invention.
[0050] FIG. 10 is a second exemplary graph depicting the function
which defines a line and separates the calculated space into the
two groups, according to an exemplary embodiment of the present
invention.
[0051] FIG. 11 is an exemplary graph depicting cumulative
functions, according to an exemplary embodiment of the present
invention.
[0052] FIG. 12 is an exemplary graph depicting thresholds
based on transition points of a cumulative function, according to
an exemplary embodiment of the present invention.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0053] The present embodiments comprise an apparatus, a computer
readable medium, and a method, for automatically identifying
outliers among chemical reaction assays.
[0054] In a method according to exemplary embodiments of the
present invention, results of chemical reaction assays are
represented as points in a mathematical space, as described in
further detail hereinbelow.
[0055] The space is calculated using the results, such that each of
the points represents a respective one of the chemical reaction
assays.
[0056] The points are divided into two or more groups, using a
function which divides the calculated space into the groups, say a
function which defines a line or a hyper-surface which divides the
space into two groups, as described in further detail
hereinbelow.
[0057] Optionally, the function which divides the calculated
space results from SVM (Support Vector Machine)
classification of the points, as described in further detail
hereinbelow.
[0058] In the method, a cumulative function is calculated.
[0059] The cumulative function gives a quantitative indication
based on a count of points in the calculated space as a function of
distance from the function which divides the calculated space into
the groups.
[0060] In one example, the cumulative function gives the number of
points within a distance from the function which divides the
calculated space into the groups.
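A minimal sketch of such a count-based cumulative function, assuming the dividing function is a hyperplane w·x + b = 0 (for example, one produced by a linear SVM) and that distance means perpendicular Euclidean distance. The helper names `signed_distance` and `cumulative_function` and the sampling grid are hypothetical.

```python
import math
from typing import List, Sequence


def signed_distance(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Signed perpendicular distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def cumulative_function(
    points: Sequence[Sequence[float]],
    w: Sequence[float],
    b: float,
    grid: Sequence[float],
) -> List[int]:
    """For each distance d in grid, count the points lying within d of the hyperplane."""
    dists = [abs(signed_distance(x, w, b)) for x in points]
    return [sum(1 for dist in dists if dist <= d) for d in grid]
```

Sampling the count on a fine, uniform grid of distances yields the curve in which transition (elbow) points are then sought.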
[0061] When the calculated space includes points which concentrate
in groups, the cumulative function has a sigmoid form, and is thus
characterized by transition points, also referred to hereinbelow as
elbow points.
[0062] Next, the cumulative function is searched for a transition
point (i.e. an elbow point).
[0063] A transition point is a point at which the cumulative
function's slope changes significantly.
[0064] For example, the transition point may be a point at which a
change in the cumulative function's slope is maximal (typically, a
point at which the absolute value of the cumulative function's
second derivative is maximal), a point at which the cumulative
function shifts from a relatively linear slope to a significantly
exponential slope (or vice versa), etc., as described in further detail
hereinbelow.
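The maximal-curvature criterion can be sketched as follows, assuming the cumulative function has been sampled on a uniform grid of distances. This illustrates only the "maximal absolute second derivative" variant, approximated by central second differences; `find_transition_point` is a hypothetical name.

```python
from typing import Sequence


def find_transition_point(distances: Sequence[float], values: Sequence[float]) -> float:
    """Return the grid distance at which |second derivative| of the sampled
    cumulative function is largest (first such point on ties).

    Assumes `distances` is uniformly spaced and has at least three samples.
    """
    if len(distances) < 3 or len(distances) != len(values):
        raise ValueError("need at least three matching samples")
    h = distances[1] - distances[0]
    best_i, best_curvature = 1, -1.0
    for i in range(1, len(values) - 1):
        # Central second difference approximates the second derivative at i.
        curvature = abs(values[i + 1] - 2 * values[i] + values[i - 1]) / (h * h)
        if curvature > best_curvature:
            best_i, best_curvature = i, curvature
    return distances[best_i]
```

For example, a curve that stays nearly flat and then climbs steeply has its elbow where the slope changes abruptly; other criteria from the text (such as a linear-to-exponential shift) would need a different test.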
[0065] Finally, a distance of the found transition point from the
function which divides the calculated space is used as a threshold,
for identifying an outlier among the chemical reaction assays.
[0066] That is to say, points residing at a distance smaller
than the distance of the transition point from the function which
divides the calculated space are found to be outliers.
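A minimal sketch of the thresholding step, assuming a hyperplane dividing function and perpendicular Euclidean distance. Here `threshold` stands for the distance coordinate of the transition point found in the cumulative function, and `find_outliers` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def find_outliers(
    points: Sequence[Sequence[float]],
    w: Sequence[float],
    b: float,
    threshold: float,
) -> List[int]:
    """Return indices of points lying closer than `threshold` to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    outliers = []
    for i, x in enumerate(points):
        dist = abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        if dist < threshold:
            # Too close to the dividing function: weak linkage to its group.
            outliers.append(i)
    return outliers
```

The returned indices identify the assays that may then be treated as erroneous.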
[0067] Outliers are points that, though falling in one of the
groups, lack a strong enough linkage to the group, and are
characterized by excessive proximity to the function which divides
the calculated space into the groups.
[0068] Consequently, assays represented by the points found to be
outliers may be classified as erroneous.
[0069] Optionally, the outliers are indicative of a measurement
error, a contaminated sample, a human error, an experimental error,
etc., as known in the art.
[0070] In one example, an outlier is a point which represents
results of a Polymerase Chain Reaction (PCR) based assay,
mistakenly classified into a group of positive assays, although no
significant amplification occurs in the assay.
[0071] In another example, an outlier is a point which represents a
PCR based assay, mistakenly classified into a group of negative
assays though exponential amplification does occur in the PCR based
assay.
[0072] The principles and operation of an apparatus according to
the present invention may be better understood with reference to
the drawings and accompanying description.
[0073] Before explaining at least one embodiment of the invention
in detail, it is to be understood that the invention is not limited
in its application to the details of construction and the
arrangement of the components set forth in the following
description or illustrated in the drawings.
[0074] The invention is capable of other embodiments or of being
practiced or carried out in various ways. Also, it is to be
understood that the phraseology and terminology employed herein is
for the purpose of description and should not be regarded as
limiting.
[0075] Reference is now made to FIG. 1, which is a block diagram
schematically illustrating a first apparatus for identifying
outliers among chemical reaction assays, according to an exemplary
embodiment of the present invention.
[0076] Apparatus 1000 may be implemented as a computer program, as
hardware, as a combination of a computer program and hardware,
etc.
[0077] The apparatus 1000 may be implemented as a computer server
application in remote communication with one or more dedicated
client programs installed on remote user computers, say in a
Software-as-a-Service (SaaS) mode, as known in the art.
[0078] Apparatus 1000 may also be implemented as a computer program
installed on a user's computer (say a desktop computer, a laptop
computer, a tablet computer, a cellular phone, etc.).
[0079] Apparatus 1000 includes a transition point finder 110.
[0080] The transition point finder 110 finds one or more transition
points in a cumulative function, as described in further detail
hereinbelow.
[0081] The cumulative function (say a mathematical function) gives
a quantitative indication based on a count of points in a
calculated space, as a function of distance from a function which
divides the calculated space into two (or more) groups, as
described in further detail hereinbelow.
[0082] Each one of the points represents results of a respective
assay of a chemical reaction.
[0083] Optionally, the quantitative indication is the number of
points in the calculated space as a function of the distance from
the function which divides the calculated space into the
groups.
[0084] Optionally, the distance is a conventional analytic geometry
distance, which is the shortest distance between the point and the
function (i.e., measured along a normal), as known in the art.
[0085] Optionally, the distance is a rectilinear distance, as known
in the art.
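The two distance notions can be sketched for the case in which the dividing function is a hyperplane w·x + b = 0. This assumes that "rectilinear distance" means the L1 (Manhattan) distance to the nearest point of the hyperplane, which equals |w·x + b| divided by the largest absolute coefficient of w (the dual norm of L1). The function names are hypothetical.

```python
import math
from typing import Sequence


def normal_distance(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Euclidean (perpendicular) distance from x to the hyperplane w.x + b = 0."""
    value = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(value) / math.sqrt(sum(wi * wi for wi in w))


def rectilinear_distance(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """L1 distance from x to the nearest point of the hyperplane w.x + b = 0.

    By Hoelder's inequality this is |w.x + b| / max_i |w_i|.
    """
    value = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(value) / max(abs(wi) for wi in w)
```

For the line x + y = 0 and the point (1, 1), the normal distance is the square root of 2, while the rectilinear distance is 2 (for instance, moving to (-1, 1) along one axis).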
[0086] Optionally, the quantitative indication is a density of
points (say a number of points per one unit of space) in the
calculated space, as a function of the distance from the function
which divides the calculated space.
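One possible reading of the density variant is sketched below, under the assumption that density is measured as the number of points per unit of distance within a band [start, start + width) around the dividing function. The band-based definition and the name `band_density` are illustrative assumptions, not taken from the source.

```python
from typing import Sequence


def band_density(distances: Sequence[float], start: float, width: float) -> float:
    """Points per unit distance whose absolute distance from the dividing
    function lies in the band [start, start + width)."""
    if width <= 0:
        raise ValueError("width must be positive")
    count = sum(1 for d in distances if start <= abs(d) < start + width)
    return count / width
```

Evaluating this over consecutive bands gives a density profile as a function of distance from the dividing function.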
[0087] Optionally, the cumulative function is specific to one of
the groups, as described in further detail hereinbelow.
[0088] Optionally, the cumulative function is common to two or more
of the groups, as described in further detail hereinbelow.
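The difference between a group-specific and a common cumulative function can be sketched using signed distances from the dividing function, where the sign indicates the side (group) on which a point falls. This is an illustrative assumption about how the two variants may be computed; `cumulative_count` and its `side` parameter are hypothetical. In this sketch, points lying exactly on the dividing function are counted only by the common variant.

```python
from typing import Sequence


def cumulative_count(signed_dists: Sequence[float], d: float, side: int) -> int:
    """Count points within distance d of the dividing function.

    side = +1 or -1: group-specific count for the group on that side.
    side = 0: common count over all groups.
    """
    if side == 0:
        return sum(1 for s in signed_dists if abs(s) <= d)
    return sum(1 for s in signed_dists if s * side > 0 and abs(s) <= d)
```

A group-specific function can yield a separate transition point, and hence a separate threshold, for each group.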
[0089] Optionally, the function which divides the calculated space,
defines one or more lines, as described in further detail
hereinbelow.
[0090] Optionally, the function which divides the calculated space,
defines one or more hyper-surfaces, as described in further detail
hereinbelow.
[0091] Optionally, the function which divides the space is a
polynomial function.
[0092] Optionally, the calculated space is a space which enhances
proximity among points that represent assays of qualitatively
identical chemical reactions, such as a space calculated using
diffusion mapping or another dimensionality reduction technique, as
described in further detail hereinbelow.
[0093] The apparatus 1000 further includes an outlier identifier
120, in communication with the transition point finder 110.
[0094] The outlier identifier 120 uses a distance of the found
transition point from the function which divides the calculated
space, as a threshold, for identifying an outlier among the
chemical reaction assays, as described in further detail
hereinbelow.
[0095] Optionally, the apparatus 1000 further includes a cumulative
function calculator, which calculates the cumulative function, as
described in further detail hereinbelow.
[0096] Optionally, the apparatus 1000 further includes a result
receiver, which receives a plurality of sets of results, each one
of the received sets of results pertaining to a respective assay of
a chemical reaction.
[0097] Optionally, the apparatus 1000 further includes a space
calculator, which calculates the space, using the received sets of
results, as described in further detail hereinbelow.
[0098] Optionally, the space calculator calculates the space, using
dimensionality reduction.
[0099] For example, the space calculator may calculate the space
using diffusion mapping or another dimensionality reduction
technique, as described in further detail hereinbelow.
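As an illustration of the space calculator, the sketch below computes a single diffusion-map coordinate: a Gaussian kernel K is normalized into a Markov matrix P = D^-1 K, and the first non-trivial right eigenvector of P, scaled by its eigenvalue, is returned. This is a minimal, dependency-free sketch under stated assumptions (one diffusion step, one coordinate, a user-chosen kernel width `epsilon`, eigenvector found by power iteration); `diffusion_coordinate` is a hypothetical name, and a practical implementation would normally use a linear-algebra library and keep several coordinates.

```python
import math
from typing import List, Sequence


def diffusion_coordinate(
    points: Sequence[Sequence[float]], epsilon: float, iterations: int = 500
) -> List[float]:
    """Return the first non-trivial diffusion-map coordinate of each point (t = 1)."""
    n = len(points)
    # Gaussian affinity K_ij = exp(-||x_i - x_j||^2 / epsilon).
    K = [[math.exp(-sum((a - c) ** 2 for a, c in zip(p, q)) / epsilon)
          for q in points] for p in points]
    deg = [sum(row) for row in K]
    # A = D^-1/2 K D^-1/2 is symmetric and has the same eigenvalues as P = D^-1 K.
    A = [[K[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # A's top eigenvector (eigenvalue 1) is proportional to sqrt(deg).
    top = [math.sqrt(d) for d in deg]
    top_norm = math.sqrt(sum(t * t for t in top))
    top = [t / top_norm for t in top]

    # Power iteration on (A + I) / 2, whose eigenvalues lie in [0, 1] in the
    # same order as A's; projecting out `top` makes it converge to the
    # second eigenvector.
    v = [((i + 1) * 0.6180339887) % 1.0 for i in range(n)]  # arbitrary start
    for _ in range(iterations):
        proj = sum(vi * ti for vi, ti in zip(v, top))
        v = [vi - proj * ti for vi, ti in zip(v, top)]
        av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [0.5 * (a + vi) for a, vi in zip(av, v)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]

    av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * vi for a, vi in zip(av, v))  # Rayleigh quotient
    # Right eigenvector of P is psi = D^-1/2 phi; coordinate = eigenvalue * psi.
    return [eigenvalue * vi / math.sqrt(d) for vi, d in zip(v, deg)]
```

For two well-separated clusters of assays, the returned coordinate takes one sign on one cluster and the opposite sign on the other; the overall sign of an eigenvector is arbitrary.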
[0100] Optionally, the apparatus 1000 further includes a dividing
function calculator, which calculates the function which divides
the calculated space, say using a Support Vector Machine (SVM), as
described in further detail hereinbelow.
[0101] Optionally, the apparatus 1000 further includes an
out-of-sample classifier, which classifies results of a new assay
not represented in the space as originally calculated, using
out-of-sample extension, as described in further detail
hereinbelow.
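Out-of-sample extension for a diffusion map is commonly done with the Nyström formula, psi(x) = (1 / lambda) * sum_j p(x, x_j) * psi(x_j), where p(x, x_j) is the row-normalized kernel between the new point and each original point. The sketch below assumes a Gaussian kernel and an eigenvector `train_psi` with eigenvalue `eigenvalue` already computed on the original assays; `nystrom_extend` is a hypothetical name, and this section does not state which extension the apparatus actually uses.

```python
import math
from typing import Sequence


def nystrom_extend(
    x_new: Sequence[float],
    train_points: Sequence[Sequence[float]],
    train_psi: Sequence[float],
    eigenvalue: float,
    epsilon: float,
) -> float:
    """Extend an eigenvector psi of P = D^-1 K to a point not in the original space."""
    # Gaussian kernel between the new point and every original point.
    k = [math.exp(-sum((a - c) ** 2 for a, c in zip(x_new, p)) / epsilon)
         for p in train_points]
    total = sum(k)
    # Row-normalized kernel p(x, x_j) = k_j / total, averaged against psi.
    return sum(kj * pj for kj, pj in zip(k, train_psi)) / (total * eigenvalue)
```

Multiplying the result by `eigenvalue` gives the new point's diffusion coordinate on the same scale as the original points, after which it can be classified and checked against the outlier threshold.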
[0102] Reference is now made to FIG. 2, which is a block diagram
schematically illustrating a second apparatus for identifying
outliers among chemical reaction assays, according to an exemplary
embodiment of the present invention.
[0103] Apparatus 2000 may be implemented as a computer program, as
hardware, as a combination of a computer program and hardware,
etc.
[0104] The apparatus 2000 may be implemented as a computer server
application in remote communication with one or more dedicated
client programs installed on remote user computers, say in a
Software-as-a-Service (SaaS) mode, as known in the art.
[0105] Apparatus 2000 may also be implemented as a computer program
installed on a user's computer (say a desktop computer, a laptop
computer, a tablet computer, a cellular phone, etc.).
[0106] Apparatus 2000 includes a cumulative function calculator
207.
[0107] The cumulative function calculator 207 calculates a
cumulative function, as described in further detail
hereinbelow.
[0108] The cumulative function (say a mathematical function) gives
a quantitative indication based on a count of points in a
calculated space, as a function of distance from a function which
divides the calculated space into two (or more) groups, as
described in further detail hereinbelow.
[0109] Optionally, the distance is a conventional analytic geometry
distance, which is the shortest distance between the point and the
function (i.e., measured along a normal), as known in the art.
[0110] Optionally, the distance is a rectilinear distance, as known
in the art.
[0111] Each one of the points represents results of a respective
assay of a chemical reaction.
[0112] Optionally, the quantitative indication is the number of
points in the calculated space as a function of the distance from
the function which divides the calculated space.
[0113] Optionally, the quantitative indication is a density of
points in the calculated space as a function of the distance from
the function which divides the calculated space, say a number of
points per one unit of space, as described in further detail
hereinbelow.
[0114] Optionally, the cumulative function is specific to one of
the groups, as described in further detail hereinbelow.
[0115] Optionally, the cumulative function is common to two or more
of the groups, as described in further detail hereinbelow.
[0116] Apparatus 2000 further includes a transition point finder
210, in communication with the cumulative function calculator
207.
[0117] The transition point finder 210 finds one or more transition
points in the cumulative function calculated by the cumulative
function calculator 207, as described in further detail
hereinbelow.
[0118] Optionally, the function which divides the calculated space,
defines one or more lines, as described in further detail
hereinbelow.
[0119] Optionally, the function which divides the calculated space,
defines one or more hyper-surfaces, as described in further detail
hereinbelow.
[0120] Optionally, the function which divides the space is a
polynomial function.
[0121] Optionally, the calculated space is a space which enhances
proximity among points that represent assays of qualitatively
identical chemical reactions, such as a space calculated using
diffusion mapping, or another dimensionality reduction technique,
as described in further detail hereinbelow.
[0122] The apparatus 2000 further includes an outlier identifier
220, in communication with the transition point finder 210.
[0123] The outlier identifier 220 uses a distance of the found
transition point from the function which divides the calculated
space, as a threshold, for identifying one or more outliers among
the chemical reaction assays, as described in further detail
hereinbelow.
[0124] Optionally, the apparatus 2000 further includes a result
receiver, which receives a plurality of sets of results, each one
of the received sets of results pertaining to a respective assay of
a chemical reaction.
[0125] Optionally, the apparatus 2000 further includes a space
calculator, which calculates the space, using the received sets of
results, as described in further detail hereinbelow.
[0126] Optionally, the space calculator calculates the space, using
dimensionality reduction.
[0127] For example, the space calculator may calculate the space
using diffusion mapping or another dimensionality reduction
technique, as described in further detail hereinbelow.
[0128] Optionally, the apparatus 2000 further includes a dividing
function calculator, which calculates the function which divides
the calculated space, say using a Support Vector Machine (SVM), as
described in further detail hereinbelow.
[0129] Optionally, the apparatus 2000 further includes an
out-of-sample classifier, which classifies results of a new assay
not represented in the space as originally calculated, using
out-of-sample extension, as described in further detail
hereinbelow.
[0130] Reference is now made to FIG. 3 which is a simplified
flowchart schematically illustrating a first method for identifying
outliers among chemical reaction assays, according to an exemplary
embodiment of the present invention.
[0131] A first exemplary method, according to an exemplary
embodiment of the present invention, may be implemented on a
computer, say using a computer server application in remote
communication with one or more dedicated client programs installed
on remote user computers, say in a Software-as-a-Service (SaaS)
mode, as known in the art.
[0132] The first exemplary method may also be implemented using a
computer program installed on a user's computer (say a desktop
computer, a laptop computer, a tablet computer, a cellular phone,
etc.).
[0133] In the first exemplary method, there are found 310 one or
more transition points in a cumulative function, say using the
transition point finder 110 of the exemplary apparatus 1000, as
described in further detail hereinbelow.
[0134] The cumulative function (say a mathematical function) gives
a quantitative indication based on a count of points in a
calculated space as a function of distance from a function which
divides the calculated space into two (or more) groups.
[0135] Optionally, the distance is a conventional analytic geometry
distance, which is the shortest distance between the point and the
function (i.e. a normal), as known in the art.
[0136] Optionally, the distance is a rectilinear distance, as known
in the art.
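For the special case in which the dividing function is a hyperplane w·x + b = 0 (an assumption made only for illustration, since the document also allows curved, polynomial dividing functions), both distance measures have closed forms. The names `euclidean_distance_to_hyperplane` and `rectilinear_distance_to_hyperplane` are hypothetical; the rectilinear form relies on the standard fact that the L1 distance to a hyperplane is |w·x + b| divided by the largest absolute coefficient of w.

```python
import math


def euclidean_distance_to_hyperplane(point, w, b):
    """Shortest (normal) Euclidean distance from point to {y : w.y + b = 0}."""
    value = sum(wi * xi for wi, xi in zip(w, point)) + b
    return abs(value) / math.sqrt(sum(wi * wi for wi in w))


def rectilinear_distance_to_hyperplane(point, w, b):
    """Shortest rectilinear (L1, Manhattan) distance from point to {y : w.y + b = 0}.

    The L1 distance to a hyperplane is |w.x + b| divided by the dual
    (L-infinity) norm of w, i.e. its largest absolute coefficient.
    """
    value = sum(wi * xi for wi, xi in zip(w, point)) + b
    return abs(value) / max(abs(wi) for wi in w)
```

For the line x + y - 2 = 0 and the point (0, 0), the normal distance is about 1.414, while the rectilinear distance is 2, because the line has to be reached by moving along the coordinate axes.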
[0137] Each one of the points represents results of a respective
assay of a chemical reaction.
[0138] Optionally, the quantitative indication is a number of
points in the calculated space as a function of the distance from
the function which divides the calculated space.
[0139] Optionally, the quantitative indication is a density of
points in the calculated space as a function of the distance from
the function which divides the calculated space, say a number of
points per one unit of space, as described in further detail
hereinbelow.
[0140] Optionally, the cumulative function is specific to one of
the groups, as described in further detail hereinbelow.
[0141] Optionally, the cumulative function is common to two or more
of the groups, as described in further detail hereinbelow.
[0142] Optionally, the function which divides the calculated space
defines one or more lines, as described in further detail
hereinbelow.
[0143] Optionally, the function which divides the calculated space
defines one or more hyper-surfaces, as described in further detail
hereinbelow.
[0144] Optionally, the function which divides the space is a
polynomial function.
[0145] Optionally, the calculated space is a space which enhances
proximity among points that represent assays of qualitatively
identical chemical reactions, such as a space calculated using
diffusion mapping or another dimensionality reduction technique, as
described in further detail hereinbelow.
[0146] Next, there is used a distance of the found transition point
from the function which divides the calculated space, as a
threshold, for identifying 320 outliers among the chemical reaction
assays, as described in further detail hereinbelow.
[0147] Optionally, the first exemplary method further includes
calculating the cumulative function, as described in further detail
hereinbelow.
[0148] Optionally, the first exemplary method further includes
receiving a plurality of sets of results, each one of the received
sets of results pertaining to a respective assay of a chemical
reaction.
[0149] Optionally, the first exemplary method further includes
calculating the space, using the received sets of results, as
described in further detail hereinbelow.
[0150] Optionally, the space is calculated using dimensionality
reduction.
[0151] For example, the first exemplary method may include
calculating the space, using diffusion mapping or another
dimensionality reducing technique, as described in further detail
hereinbelow.
[0152] Optionally, the first exemplary method further includes
calculating the function which divides the calculated space, say
using a Support Vector Machine (SVM), as described in further
detail hereinbelow.
[0153] Optionally, the first exemplary method further includes
classifying results of a new assay not represented in the space as
originally calculated, using out-of-sample extension, as described
in further detail hereinbelow.
[0154] Reference is now made to FIG. 4 which is a simplified
flowchart schematically illustrating a second method for
identifying outliers among chemical reaction assays, according to
an exemplary embodiment of the present invention.
[0155] A second exemplary method, according to an exemplary
embodiment of the present invention, may be implemented on a
computer, say using a computer server application in remote
communication with one or more dedicated client programs installed
on remote user computers, say in a Software-as-a-Service (SaaS)
mode, as known in the art.
[0156] The second exemplary method may also be implemented using a
computer program installed on a user's computer (say a desktop
computer, a laptop computer, a tablet computer, a cellular phone,
etc.).
[0157] In the second exemplary method, there is calculated 407 a
cumulative function, as described in further detail
hereinbelow.
[0158] The cumulative function (say a mathematical function) gives
a quantitative indication based on a count of points in a
calculated space as a function of distance from a function which
divides the calculated space into two (or more) groups.
[0159] Optionally, the distance is a conventional analytic geometry
distance, which is the shortest distance between the point and the
function, say a normal, as known in the art.
[0160] Optionally, the distance is a rectilinear distance, as known
in the art.
[0161] Each one of the points represents results of a respective
assay of a chemical reaction.
[0162] Optionally, the quantitative indication is a number of
points in the calculated space as a function of the distance from
the function which divides the calculated space.
[0163] Optionally, the quantitative indication is a density of
points in the calculated space as a function of the distance from
the function which divides the calculated space, say a number of
points per one unit of space.
[0164] Optionally, the cumulative function is specific to one of
the groups, as described in further detail hereinbelow.
[0165] Optionally, the cumulative function is common to two or more
of the groups, as described in further detail hereinbelow.
[0166] Optionally, the function which divides the calculated space
defines one or more lines, as described in further detail
hereinbelow.
[0167] Optionally, the function which divides the calculated space
defines one or more hyper-surfaces, as described in further detail
hereinbelow.
[0168] Optionally, the function which divides the space is a
polynomial function.
[0169] Optionally, the calculated space is a space which enhances
proximity among points that represent assays of qualitatively
identical chemical reactions, such as a space calculated using
diffusion mapping or another dimensionality reduction technique, as
described in further detail hereinbelow.
[0170] Next, there are found 410 one or more transition points in the
calculated 407 cumulative function, say using the transition point
finder 210 of the exemplary apparatus 2000, as described in further
detail hereinbelow.
[0171] Finally, there is used a distance of the found transition
point from the function which divides the calculated space, as a
threshold, for identifying 420 outliers among the chemical reaction
assays, as described in further detail hereinbelow.
[0172] Optionally, the second exemplary method further includes
receiving a plurality of sets of results, each one of the received
sets of results pertaining to a respective assay of a chemical
reaction.
[0173] Optionally, the second exemplary method further includes
calculating the space, using the received sets of results, as
described in further detail hereinbelow.
[0174] Optionally, the space is calculated using dimensionality
reduction.
[0175] For example, the second exemplary method may include
calculating the space, using diffusion mapping or another
dimensionality reducing technique, as described in further detail
hereinbelow.
[0176] Optionally, the second exemplary method further includes
calculating the function which divides the calculated space, say
using a Support Vector Machine (SVM), as described in further
detail hereinbelow.
[0177] Optionally, the second exemplary method further includes
classifying results of an assay not represented in the space as
originally calculated, using out-of-sample extension, as described
in further detail hereinbelow.
[0178] Reference is now made to FIG. 5, which is a simplified
flowchart schematically illustrating a third method for identifying
outliers among chemical reaction assays, according to an exemplary
embodiment of the present invention.
[0179] A third exemplary method, according to an exemplary
embodiment of the present invention, may be implemented on a
computer, say using a computer server application in remote
communication with one or more dedicated client programs installed
on remote user computers, say in a Software-as-a-Service (SaaS)
mode, as known in the art.
[0180] The third exemplary method may also be implemented using a
computer program installed on a user's computer (say a desktop
computer, a laptop computer, a tablet computer, a cellular phone,
etc.).
[0181] In the third exemplary method, there are received 501 sets
of results. Each one of the received sets of results pertains to a
respective assay of a chemical reaction.
[0182] The sets of results may be received 501 as variables.
[0183] The variables may hold measurement values, such as:
Fluorescence Intensity (FI) values of elbow points of a sigmoid
graph depicting the assay of the chemical reaction, time of each of
the elbow points, FI values of certain points of the graph, pH
values measured during or after the reaction, amounts of products
yielded in the reaction, etc.
[0184] Then, there is calculated 503 a mathematical space in which
each one of the chemical reaction assays is represented by a
respective point. The point is positioned in the calculated 503
space according to the assay's set of results (as in the received
501 variables).
[0185] Optionally, the calculated 503 space is a Euclidean space
whose number of dimensions equals the number of variables in each
set of assay results. Each of the points is positioned in the
calculated 503 space such that each coordinate value of the point
corresponds to one of the variables received 501 for the assay
represented by the point.
[0186] Optionally, the calculated 503 space is a space which
enhances proximity among points that represent assays of
qualitatively identical chemical reactions, as described in further
detail hereinbelow.
[0187] Methods which may be used for calculating 503 the space may
further include, but are not limited to: Dimensionality Reduction
methods such as Diffusion Mapping, as well as Kernel Principal
Component Analysis (Kernel PCA) methods.
[0188] Dimensionality Reduction, also referred to as Manifold
Learning, is a process of reducing the number of variables under
consideration. Dimensionality Reduction may be divided into Feature
Selection and Feature Extraction.
[0189] Feature Selection, also known as Variable Selection, Feature
Reduction, Attribute Selection or Variable Subset Selection, is a
technique commonly used in machine learning.
[0190] With Feature Selection, there is selected a subset of
relevant features, for building robust learning models, as known in
the art.
[0191] In Feature Extraction, when data input to an algorithm is
too large to be processed and is suspected to be redundant, the
input data is transformed into a representation based on a reduced
number of features (also named a feature vector), as known in the
art.
[0192] If the extracted features are carefully chosen, they are
expected to include only the information which is essential for
performing a desired task (say, for dividing the assays into the
groups), so that the task may be performed using the reduced
representation instead of the whole input data.
[0193] Although Dimensionality Reduction usually involves reducing
the number of dimensions of a space, Dimensionality Reduction may
also involve modifying the metric (say strengthening affinity
between points of qualitatively identical assays), without reducing
the number of dimensions.
[0194] Examples of dimensionality reduction techniques include, but
are not limited to: Diffusion Mapping, Anisotropic Mapping,
Multidimensional Scaling (MDS), Locally Linear Embedding (LLE) and
Local Multidimensional Scaling (Local MDS), as known in the art.
[0195] Diffusion Mapping is a recently developed method of
dimensionality reduction, which belongs to the family of methods
known as Kernel Principal Component Analysis (Kernel PCA).
[0196] For example, Ronald R. Coifman and Stephane Lafon describe
Diffusion Mapping in an article entitled "Diffusion Maps",
published in Applied and Computational Harmonic Analysis, Special
Issue on Diffusion Maps and Wavelets, Vol. 21, July 2006, pp.
5-30.
[0197] Coifman and Lafon show in the article, among other things,
that the Diffusion Distance (as defined in the article) in a
Euclidean space equals the Euclidean distance (as defined in the
article) in a corresponding Diffusion Space.
[0198] Diffusion mapping automatically differentiates variable
groups in data (such as the sets of results described hereinabove),
according to how clustered they are.
[0199] Diffusion mapping further extracts minimal sets of
meaningful variables. The minimal sets describe the input data. All
results are taken into account for the extraction. The
differentiation is based on geometrical separation between the
results, as described in further detail hereinbelow.
[0200] Optionally, the sets of results are first represented in a
Euclidean space whose number of dimensions equals the number of
variables in each set of results, as illustrated using FIG. 7
hereinbelow.
[0201] Then, by applying diffusion mapping on the Euclidean space,
there is calculated 503 a space in which the distance between any
two of the points depends on the number of short paths between the
two points, as found in the Euclidean space.
[0202] A short path connects the two points through a number of
intermediate points.
[0203] Each point along a short path may be directly connected only
to points in its own vicinity.
[0204] The distance between the two points further depends on the
lengths of the short paths: the fewer points a short path passes
through, the higher the weight of that path, and the closer the two
points are.
[0205] That is to say that the space calculated 503 using diffusion
mapping enhances proximity among points that represent assays of
qualitatively identical chemical reactions.
[0206] The space calculated 503 using diffusion mapping further
provides for easier recognition of grouping among the points, based
on highest-density areas of the calculated 503 space, as
illustrated using FIG. 8 hereinbelow.
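The construction of [0201]-[0204] can be sketched with the standard diffusion-map recipe of Coifman and Lafon: a Gaussian kernel keeps strong connections only within each point's vicinity, the kernel is normalized into a Markov transition matrix, and the leading non-trivial eigenvectors, scaled by their eigenvalues raised to the diffusion time, become the new coordinates. This is a minimal NumPy sketch, not the implementation used by the invention; the function name `diffusion_map` and its parameters `epsilon` (kernel bandwidth), `t` (diffusion time) and `n_components` are assumptions chosen for illustration.

```python
import numpy as np


def diffusion_map(X, epsilon=1.0, t=1, n_components=2):
    """Embed the rows of X (one assay's set of results per row) in diffusion coordinates."""
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between the assay points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian kernel: only points in each other's vicinity get large weights.
    W = np.exp(-sq_dists / epsilon)
    d = W.sum(axis=1)
    # Symmetric conjugate of the Markov matrix P = D^-1 W (same eigenvalues).
    inv_sqrt_d = 1.0 / np.sqrt(d)
    A = inv_sqrt_d[:, None] * W * inv_sqrt_d[None, :]
    eigvals, eigvecs = np.linalg.eigh(A)
    # eigh returns eigenvalues in ascending order; reorder to descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Right eigenvectors of P, normalized w.r.t. the stationary distribution.
    psi = np.sqrt(d.sum()) * inv_sqrt_d[:, None] * eigvecs
    # Skip the trivial constant eigenvector (eigenvalue 1).
    k = slice(1, n_components + 1)
    return psi[:, k] * (eigvals[k] ** t)
```

Euclidean distances between rows of the returned array approximate diffusion distances, so points joined by many short paths end up close together. In practice `epsilon` would have to be tuned to the scale of the assay variables.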
[0207] Next, there is calculated 505 a function which divides the
calculated 503 space into two or more groups.
[0208] Optionally, the function which divides the space may be
calculated 505 using one or more currently known classification
methods, such as simple Nearest-Neighbor methods, K-Means, Fuzzy
K-Means, C-Means, Neural Networks, SVM (Support Vector Machine),
etc.
[0209] The calculation 505 of the function may also be based on a
manual or semi-manual process, in which the points in the space are
presented graphically to a user, on a computer screen, and the user
is allowed to draw one or more lines which divide the space into
the two or more groups.
[0210] In one example, the function which divides the space is
calculated 505 using an SVM (Support Vector Machine).
[0211] Support Vector Machines (SVMs) are a set of related
supervised learning methods that analyze data and recognize
patterns. SVM is widely used for classification and regression
analysis, as known in the art.
[0212] A standard SVM may take a set of input data and predict, for
each given input, which one of the possible categories the input
belongs to.
[0213] Given a set of training examples, each marked as belonging
to one of the categories, an SVM training algorithm builds a model
usable for assigning a new example into one of the categories.
[0214] Intuitively, a model built by an SVM is a representation of
the examples as points in a space, mapped so that the examples of
the separate categories are divided by a clear gap.
[0215] Consequently, a new example may be mapped into the same
space and be predicted to belong to one of the categories, based on
which side of the gap the new example falls on.
[0216] In one example, the SVM is used to divide the space into two
groups: one negative and one positive, using points that represent
known positive and known negative assays (i.e. control samples),
while ignoring points that represent other assays, in the
calculated 503 space.
[0217] In the example, there is used a standard SVM, to classify
the points, by calculating 505 the function which divides the
calculated 503 space into the two categories of points (negative
assays and positive assays), based on the assays known as negative
or positive (i.e. control samples).
[0218] The standard SVM used may involve a polynomial discriminant
function, with a soft margin, as known in the art. That is to say
that the standard SVM used may tolerate a small number of erroneous
control samples. An erroneous control sample is a sample marked as
positive which falls on the negative side of the calculated 505
function, or vice versa.
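A soft-margin SVM with a polynomial discriminant, trained only on the control samples, can be sketched without a machine-learning library by passing each point through an explicit degree-2 polynomial feature map and minimizing the primal hinge-loss objective with plain subgradient descent. The names `poly2_features`, `train_soft_margin_svm` and `discriminant`, the polynomial degree, the step size and the epoch count are all assumptions; a production system would more likely rely on a dedicated SVM solver. The parameter `C` controls how strongly margin violations, such as erroneous control samples, are penalized: smaller values give a softer margin.

```python
import numpy as np


def poly2_features(X):
    """Explicit degree-2 feature map: [x_1..x_p, x_i * x_j for i <= j]."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    quad = [X[:, i] * X[:, j] for i in range(p) for j in range(i, p)]
    return np.hstack([X, np.column_stack(quad)])


def train_soft_margin_svm(X, y, C=1.0, lr=0.001, epochs=5000):
    """Soft-margin SVM in the primal, trained by full-batch subgradient descent.

    Minimizes 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . phi(x_i) + b)).
    y holds +1 for known positive controls and -1 for known negative controls.
    """
    Phi = poly2_features(X)
    y = np.asarray(y, dtype=float)
    w = np.zeros(Phi.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (Phi @ w + b)
        active = margins < 1  # inside the margin or on the wrong side
        grad_w = w - C * (y[active][:, None] * Phi[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def discriminant(X, w, b):
    """Polynomial discriminant: its sign classifies, its magnitude gives confidence."""
    return poly2_features(X) @ w + b
```

With negative controls clustered near the origin and positive controls on a surrounding ring, no straight line separates the groups, but the degree-2 discriminant does, and it can then be evaluated on every point of the calculated space, not only on the controls.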
[0219] Optionally, the calculated 505 function which divides the
calculated 503 space is a polynomial function.
[0220] In one example, the calculated 503 space is bi-dimensional
and the calculated 505 function defines a line, which divides the
calculated 503 space into two groups, as illustrated using FIG. 9A,
hereinbelow.
[0221] In other examples, the calculated 505 function may divide
the calculated 503 space into three or more groups (say a function
which defines two separate lines, as illustrated using FIG. 9B
hereinbelow).
[0222] Optionally, the calculated 503 space is rather
three-dimensional, and the calculated 505 function defines one or
more bi-dimensional hyper-surfaces, as illustrated using FIG. 9C,
hereinbelow.
[0223] The discriminant function's sign at any of the points in the
calculated 503 space gives the classification of the point as
positive or negative.
[0224] That is to say that once the SVM calculates 505 the function
dividing the space (using the control samples), all points in the
calculated 503 space may be classified as negative or positive,
using the discriminant function.
[0225] In one example, the discriminant function's value at a point
gives the confidence level at that point. The discriminant
function value is a function of the distance of the point from the
calculated 505 function (i.e. from the line or hyper-surface which
divides the space into the positive and negative groups).
Optionally, the confidence value equals or roughly corresponds to
the distance from the calculated 505 function, or to a value
resultant upon normalization of the distance, as known in the
art.
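Once the discriminant function is known, labelling every point and attaching a confidence value follows [0223]-[0225] directly. In this sketch the hypothetical function `classify_points` takes precomputed discriminant values. Dividing by the largest absolute value is only one possible normalization, assumed here for illustration, and a value of exactly zero is reported as lying on the boundary.

```python
def classify_points(discriminant_values, scale=None):
    """Label points by the discriminant's sign and return normalized confidences."""
    labels = []
    for v in discriminant_values:
        if v > 0:
            labels.append("positive")
        elif v < 0:
            labels.append("negative")
        else:
            labels.append("boundary")  # exactly on the dividing function
    confidences = [abs(v) for v in discriminant_values]
    if scale is None:
        # Hypothetical normalization: scale so the most confident point gets 1.0.
        largest = max(confidences, default=0.0)
        scale = largest if largest > 0 else 1.0
    return labels, [c / scale for c in confidences]
```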
[0226] Next, there is calculated 507 a cumulative function, as
illustrated using FIG. 11, hereinbelow.
[0227] The cumulative function (say a mathematical function) gives
a quantitative indication based on a count of points in the
calculated 503 space, as a function of distance from the calculated
505 function which divides the calculated 503 space into the two
groups.
[0228] Optionally, the quantitative indication is a number of
points in the calculated 503 space as a function of the distance
from the function which divides the calculated 503 space.
[0229] Optionally, the distance is a conventional analytic geometry
distance, which is the shortest distance between the point and the
function, say a normal, as known in the art.
[0230] Optionally, the distance is a rectilinear distance, as known
in the art.
[0231] In one example, the cumulative function is denoted F(x).
[0232] Optionally, F(x) indicates how many of the points have a
confidence value between 0 and x, as given by the absolute value of
the SVM discriminant function at points on either side of the
function which divides the calculated 503 space. In the example, x
is equal to the distance from the calculated 505 function which
divides the space, or to a value resultant upon normalization of
the distance, as known in the art.
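The cumulative function F(x) of [0232] can be evaluated efficiently by sorting the confidence values once and then counting, for each x, how many of them do not exceed x. The closure-returning helper `cumulative_count` is hypothetical, and treating the range as the closed interval [0, x] is an assumption, since the text only says "between 0 and x".

```python
import bisect


def cumulative_count(confidences):
    """Build F, where F(x) counts points whose confidence lies in [0, x]."""
    # Confidence is the absolute discriminant value, so it is never negative.
    ordered = sorted(abs(c) for c in confidences)

    def F(x):
        # The insertion index just past the last value <= x equals the count.
        return bisect.bisect_right(ordered, x)

    return F
```

Calling `cumulative_count` separately on the values from each side of the dividing function gives the per-side cumulative functions of [0235]; calling it on all values gives a common one.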
[0233] Alternatively, the cumulative function of the example may be
denoted F(x-d) and indicate how many of the points have a
confidence value between x-d and x, as given by the absolute value
of the SVM discriminant function, where d is a number between 0 and
x, and x is equal to the distance from the function which divides
the calculated 503 space, or to a value resultant upon
normalization of the distance, as known in the art.
[0234] Optionally, the quantitative indication is a density of
points in the calculated 503 space (say number of points per one
unit of space), as a function of the distance from the calculated
505 function which divides the calculated 503 space.
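The band form F(x-d) of [0233] and the density form of [0234] differ only in what is counted and how it is scaled. In this sketch the band is taken as half-open, (x - d, x], and density is measured per unit of distance from the dividing function, by dividing the band count by the band width d. Both choices, and the helper names `band_count` and `band_density`, are assumptions, since the document does not fix what "one unit of space" means.

```python
def band_count(confidences, x, d):
    """Number of points whose confidence lies in the band (x - d, x]."""
    if not 0 < d <= x:
        raise ValueError("expected 0 < d <= x")
    return sum(1 for c in confidences if x - d < abs(c) <= x)


def band_density(confidences, x, d):
    """Points per unit of distance from the dividing function, within (x - d, x]."""
    return band_count(confidences, x, d) / d
```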
[0235] Optionally, the cumulative function is instead calculated 507
separately for points on one side of the calculated 505 function
which divides the calculated 503 space, and for points on the other
side of the calculated 505 function, as described in further detail
hereinbelow.
[0236] Next, there are found 510 transition points (i.e. elbow
points) in the cumulative function, as illustrated using FIG. 11,
hereinbelow.
[0237] When the calculated 503 space includes points which
concentrate in groups, the cumulative function has a sigmoid form,
with typical transition points, also referred to hereinbelow as
elbow points.
[0238] For example, the cumulative function denoted F(x) may
have a transition point with a specific x value. The x value gives
a distance of the transition point from the calculated 505 function
which divides the space.
[0239] Optionally, the transition points are found 510 using the
transition point finder 110 of the exemplary apparatus 1000, as
described in further detail hereinabove.
[0240] When the cumulative function is calculated 507 separately
for each side of the calculated 505 function which divides the
space, transition points are found 510 separately for each of the
two sides, as described in further detail hereinbelow.
[0241] The transition points may be found using a variety of
methods including, but not limited to: base-lining operations,
Levenberg-Marquardt (LM) regression processes, curvature analysis
by comparison with a first or a second degree polynomial curve
which fits a growth type of the cumulative function, rotational
transformation of a curve defined by the cumulative function,
linear regression methods, etc., as known in the art.
[0242] In a slope based example, one or more transition points are
found 510, by a comparison with a second function.
[0243] The second function may be a linear function which connects
the edges (i.e. first and last point) of a curve defined by the
cumulative function, a linear function with a slope parallel to a
slope of the linear function which connects the edges, or a
non-linear function with a line of best fit which parallels the
slope of the linear function which connects the edges.
[0244] In the slope based example, the transition point is found
510 by calculating a difference between the cumulative function and
the second function, and identifying one or more extremum points on
a graph which represents the calculated difference.
[0245] The extremum point indicates a distance of the found 510
transition point from the calculated 505 function which divides the
calculated 503 space.
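The slope-based approach of [0242]-[0245] can be sketched using the first option for the second function: the straight chord joining the first and last samples of the cumulative curve. For a sigmoid-shaped curve, the difference between the curve and the chord has two extrema, a minimum where the curve starts to rise steeply (the lower elbow) and a maximum where it levels off (the upper elbow), so the hypothetical helper `find_transition_points` returns both. Which one serves as the outlier threshold is left to the caller; under the assumption that outliers are sparse points lying near the dividing function, the lower elbow is the natural candidate.

```python
def find_transition_points(xs, ys):
    """Return (lower_elbow_x, upper_elbow_x) of a sampled cumulative curve.

    xs must be increasing distances; ys the cumulative function sampled at xs.
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three (x, F(x)) samples")
    x0, y0 = xs[0], ys[0]
    slope = (ys[-1] - y0) / (xs[-1] - x0)
    # Difference between the cumulative function and the chord (second function).
    diffs = [y - (y0 + slope * (x - x0)) for x, y in zip(xs, ys)]
    lower = min(range(len(xs)), key=lambda i: diffs[i])  # furthest below chord
    upper = max(range(len(xs)), key=lambda i: diffs[i])  # furthest above chord
    return xs[lower], xs[upper]
```

For samples (0, 0), (1, 0.5), (2, 1), (3, 6), (4, 9), (5, 9.5), (6, 10), the curve is furthest below the chord at x = 2 and furthest above it at x = 4.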
[0246] Next, there is used the distance of the transition point
found 510 in the cumulative function from the function which
divides the calculated space, as a threshold, for identifying 520
outliers among the chemical reaction assays, as described in
further detail hereinbelow.
[0247] In a first example, the cumulative function is specific to
one of the groups.
[0248] More specifically, in the first example, the cumulative
function is calculated 507 separately for each specific one of the
two sides of the space, as divided by the calculated 505 function,
say using values of the SVM's polynomial discriminant function at
each point known to belong to that specific side.
[0249] That is to say that a first cumulative function is
calculated 507 using the points known to represent negative assays,
and a second cumulative function is calculated 507 using the points
known to represent positive assays.
[0250] Consequently, two transition points are found 510 (one among
the known negative assays and one among the known positive
assays).
[0251] In the first example, any point positioned between the two
transition points is found 520 to be an outlier, as described in
further detail hereinabove.
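For the first example, if signed distances are used (negative on the negative side of the dividing function and positive on the positive side, an assumed convention), "between the two transition points" becomes a simple interval test. The helper `outliers_between` is hypothetical, and the strict inequalities, under which a point lying exactly at a transition point is not flagged, are an assumption.

```python
def outliers_between(signed_distances, t_neg, t_pos):
    """Flag points lying between the negative-side and positive-side transition points.

    signed_distances: distance from the dividing function, negative on the
    negative side and positive on the positive side (assumed convention).
    t_neg, t_pos: non-negative distances of the two transition points.
    """
    return [-t_neg < s < t_pos for s in signed_distances]
```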
[0252] In a second example, the cumulative function is common to
two or more of the groups.
[0253] More specifically, in the second example, the cumulative
function is calculated 507 jointly for both sides of the space as
divided by the calculated 505 function which divides the space into
two groups, say using absolute values of the SVM's polynomial
discriminant function, on each point of the control samples (i.e.
both the known negative and the known positive assays).
[0254] Consequently, only one transition point is found 510.
[0255] In the second example, any point whose distance from the
function which divides the calculated 503 space is smaller than
the distance of the found 510 transition point is found 520 to be
an outlier, as described in further detail hereinabove.
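The second example can be sketched end to end: build one cumulative function F(x) over the absolute distances of all points, sample it on a uniform grid, take the lower elbow of the chord-difference curve as the transition point, and flag every point closer to the dividing function than that distance. The function name `common_threshold_outliers`, the `grid_size` parameter and the choice of the lower elbow are assumptions for illustration; the resolution of the threshold is limited by the grid spacing.

```python
def common_threshold_outliers(distances, grid_size=50):
    """Flag outliers using one cumulative function shared by both groups."""
    abs_d = sorted(abs(d) for d in distances)
    max_d = abs_d[-1]
    if grid_size < 2 or max_d == 0:
        raise ValueError("need grid_size >= 2 and at least one non-zero distance")
    # Sample F(x) = number of points with |distance| <= x on a uniform grid.
    xs = [max_d * (i / (grid_size - 1)) for i in range(grid_size)]
    ys = [sum(1 for a in abs_d if a <= x) for x in xs]
    # Chord joining the first and last samples of the cumulative curve.
    slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
    diffs = [y - (ys[0] + slope * (x - xs[0])) for x, y in zip(xs, ys)]
    # Lower elbow: where the curve lies furthest below the chord.
    threshold = xs[min(range(grid_size), key=lambda i: diffs[i])]
    return threshold, [abs(d) < threshold for d in distances]
```

With two points at distances 0.1 and 0.2 and eight points between 3.0 and 3.2, the threshold lands just below 3.0, so only the two points near the dividing function are flagged.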
[0256] Optionally, the third exemplary method further includes
classifying results of a new assay not represented in the space as
originally calculated, using out-of-sample extension, as known in
the art.
[0257] For example, the results of the new assay may be positioned
in the calculated 503 space, using an out-of-sample method
compatible with the calculated 503 space, as known in the art.
Alternatively, the space is re-calculated using the new assay and
the results of all assays represented in the space as originally
calculated 503, thus positioning all points in the re-calculated
space.
[0258] Then, the original SVM discriminant function is applied on a
point which represents the results of the new assay.
[0259] The sign of the discriminant function on the point serves to
classify the new assay. The value of the discriminant function on
the point gives the confidence level, as described in further
detail hereinabove.
[0260] The discriminant function value is a function of the
distance of the point from the calculated 505 function (i.e. from
the line or hyper-surface which divides the space into the positive
and negative groups). Optionally, the confidence value equals or
roughly corresponds to the distance from the calculated 505
function, as described in further detail hereinabove.
[0261] The distances of the originally found 510 transition points
from the calculated 505 function are used as threshold values, to
determine whether the point is an outlier, using the point's
confidence level, just as for points represented in the space as
originally calculated 503.
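For a new assay, [0258]-[0261] reuse the original discriminant function and the originally found transition-point distances unchanged. Assuming, as the document allows, that the discriminant value equals the signed distance from the dividing function, the check reduces to comparing that value with the threshold for the side on which the point falls. The helper `classify_new_assay` and the per-side thresholds `t_neg` and `t_pos` (as in the first example) are hypothetical; for the second example, the same threshold would be passed for both sides.

```python
def classify_new_assay(discriminant_value, t_neg, t_pos):
    """Return (label, is_outlier) for the point representing a new assay.

    discriminant_value: original SVM discriminant evaluated at the new point,
    assumed here to equal its signed distance from the dividing function.
    t_neg, t_pos: originally found transition-point distances on each side.
    """
    if discriminant_value > 0:
        return "positive", discriminant_value < t_pos
    # A value of exactly 0 lies on the boundary; it is treated as negative here.
    return "negative", -discriminant_value < t_neg
```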
[0262] Reference is now made to FIG. 6, which is a block diagram
schematically illustrating a computer readable medium storing
computer executable instructions for performing steps of
identifying outliers among chemical reaction assays, according to
an exemplary embodiment of the present invention.
[0263] According to an exemplary embodiment of the present
invention, there is provided a computer readable medium 6000, such
as a CD-ROM, a USB-Memory, a Portable Hard Disk, a diskette,
etc.
[0264] The computer readable medium 6000 stores computer executable
instructions for performing steps of identifying outliers among
chemical reaction assays, according to an exemplary embodiment of
the present invention.
[0265] The computer executable instructions include a step of
finding 610 one or more transition points in a cumulative function,
as described in further detail hereinabove.
[0266] The cumulative function (say a mathematical function) gives
a quantitative indication based on a count of points in a
calculated space as a function of distance from a function which
divides the calculated space into two (or more) groups, as
described in further detail hereinabove.
[0267] Each one of the points represents results of a respective
assay of a chemical reaction.
[0268] Optionally, the distance is a conventional analytic geometry
distance, which is the shortest distance between the point and the
function, say a normal, as known in the art.
[0269] Optionally, the distance is a rectilinear distance, as known
in the art.
[0270] Optionally, the quantitative indication is a number of
points in the calculated space as a function of the distance from
the function which divides the calculated space.
[0271] Optionally, the quantitative indication is a density of
points in the calculated space as a function of the distance from
the function which divides the calculated space, say a number of
points per one unit of space.
[0272] Optionally, the function which divides the calculated space
defines one or more lines, as described in further detail
hereinabove.
[0273] Optionally, the function which divides the calculated space
defines one or more hyper-surfaces, as described in further detail
hereinabove.
[0274] Optionally, the function which divides the space is a
polynomial function.
[0275] Optionally, the calculated space is a space which enhances
proximity among points that represent assays of qualitatively
identical chemical reactions, such as a space calculated using
diffusion mapping or another dimensionality reduction technique, as
described in further detail hereinabove.
[0276] The computer executable instructions further include a step
in which a distance of the found transition point from the function
which divides the calculated space is used as a threshold for
identifying 620 outliers among the chemical reaction assays, as
described in further detail hereinabove.
[0277] Optionally, the computer executable instructions further
include a step of calculating the cumulative function, as described
in further detail hereinabove.
[0278] Optionally, the computer executable instructions further
include a step of receiving a plurality of sets of results. Each
one of the received sets of results pertains to a respective assay
of a chemical reaction.
[0279] Optionally, the computer executable instructions further
include a step of calculating the space, using the received sets of
results, as described in further detail hereinabove.
[0280] Optionally, the space is calculated using dimensionality
reduction.
[0281] For example, the computer executable instructions may
include a step of calculating the space, using diffusion mapping or
another dimensionality reducing technique, as described in further
detail hereinbelow.
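Diffusion mapping is not spelled out in this passage. The following is a minimal sketch of one common construction, assuming NumPy, a Gaussian kernel with an assumed scale parameter epsilon, and an assumed diffusion time t; the function name is hypothetical. Each row of X holds the results of one assay, and each output row gives that assay's coordinates in the dimensionality reduced space.

```python
import numpy as np


def diffusion_map(X: np.ndarray, epsilon: float, n_components: int = 2, t: int = 1) -> np.ndarray:
    """Hypothetical helper: embed the rows of X (one row per assay) in diffusion coordinates."""
    # Pairwise squared Euclidean distances between assays.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / epsilon)        # Gaussian affinity kernel (assumed)
    d = K.sum(axis=1)                      # degree of each point
    # Symmetric conjugate of the Markov matrix P = D^-1 K, so that eigh applies.
    S = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]      # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Right eigenvectors of P are psi = D^-1/2 * phi.
    psi = eigvecs / np.sqrt(d)[:, None]
    # Drop the trivial eigenvector (eigenvalue 1), scale the rest by eigenvalue**t.
    return psi[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** t
```

The trivial eigenvector is constant over all points and carries no information about how they are arranged, which is why it is skipped. In practice, epsilon is usually tuned to the typical spacing between neighboring points.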
[0282] Optionally, the computer executable instructions further
include a step of calculating the function which divides the
calculated space, say using a Support Vector Machine (SVM), as
described in further detail hereinbelow.
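The specification leaves the choice of SVM training method open. As one possibility, the sketch below trains a soft-margin linear SVM by plain batch sub-gradient descent on the regularized hinge loss, assuming NumPy; the function name, learning rate, regularization weight and iteration count are illustrative assumptions, and a dedicated SVM solver would normally be used instead. The returned w and b define a dividing function f(x) = w·x + b, whose sign assigns a point to the positive or the negative group.

```python
import numpy as np


def train_linear_svm(X: np.ndarray, y: np.ndarray, lam: float = 0.01,
                     lr: float = 0.1, iters: int = 1000):
    """Hypothetical helper: soft-margin linear SVM via batch sub-gradient descent.

    X: (n, d) points in the calculated space; y: labels in {-1, +1}.
    Returns (w, b) for the dividing function f(x) = w.x + b.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(iters):
        margins = y * (X @ w + b)
        active = margins < 1.0                 # points inside or beyond the margin
        # Sub-gradient of lam/2 * |w|^2 + mean hinge loss.
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```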
[0283] Optionally, the computer executable instructions further
include a step of classifying results of an assay not represented
in the space as originally calculated, using out-of-sample
extension, as described in further detail hereinabove.
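One standard way to extend a diffusion map to a new assay is the Nyström extension, assumed here for illustration: the new point's coordinate along eigenvector k is (1/λ_k) Σ_j p(x, x_j) ψ_k(x_j), where p(x, x_j) is the normalized kernel between the new point x and each original point x_j. The sketch below assumes NumPy, a Gaussian kernel and diffusion time t = 1, uses hypothetical function names, and recomputes a small diffusion map so that it is self-contained. The extended coordinates may then be classified using the function which divides the space, e.g. by the sign of an SVM discriminant.

```python
import numpy as np


def _kernel(A: np.ndarray, B: np.ndarray, epsilon: float) -> np.ndarray:
    """Gaussian kernel between the rows of A and the rows of B (assumed kernel)."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / epsilon)


def fit_diffusion(X: np.ndarray, epsilon: float, n_components: int = 2):
    """Hypothetical helper: leading non-trivial eigenvalues and right eigenvectors of P = D^-1 K."""
    K = _kernel(X, X, epsilon)
    d = K.sum(axis=1)
    vals, vecs = np.linalg.eigh(K / np.sqrt(np.outer(d, d)))
    order = np.argsort(vals)[::-1][1:n_components + 1]   # skip the trivial eigenvector
    return vals[order], vecs[:, order] / np.sqrt(d)[:, None]


def nystrom_extend(X_train, lambdas, psi, X_new, epsilon, t: int = 1):
    """Hypothetical helper: diffusion coordinates of new points via the Nystrom formula."""
    K_new = _kernel(X_new, X_train, epsilon)
    P_new = K_new / K_new.sum(axis=1, keepdims=True)     # transition probabilities from new points
    psi_new = (P_new @ psi) / lambdas                    # extended eigenvector values
    return psi_new * lambdas ** t
```

A useful property check: applying the extension to one of the original points reproduces that point's original diffusion coordinates, since P ψ_k = λ_k ψ_k.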
[0284] FIGS. 7-12 serve to graphically illustrate a fourth exemplary
method for outlier identification, according to an exemplary
embodiment of the present invention.
[0285] Reference is now made to FIG. 7, which is an exemplary graph
depicting points representing chemical assays, in a Euclidean
space, according to an exemplary embodiment of the present
invention.
[0286] In a fourth exemplary method, according to an exemplary
embodiment of the present invention, each assay of a chemical
reaction is represented by a respective point positioned in a
Euclidean space, as described in further detail hereinabove.
[0287] Although FIG. 7 illustrates a space with two dimensions, the
Euclidean space may have any number of dimensions.
[0288] Optionally, the number of dimensions of the space depends on
the number of variables that characterize each assay. For example,
assays characterized by three variables may be represented by points
in a three dimensional Euclidean space.
[0289] As illustrated using FIG. 7, the Euclidean space includes
points of known negative assays (with a minus sign), points of
known positive assays (with a plus sign) and remaining points that
need to be classified as positive or negative.
[0290] Reference is now made to FIG. 8, which is an exemplary
graph, depicting points representing chemical assays in a space
calculated using diffusion mapping, according to an exemplary
embodiment of the present invention.
[0291] In the fourth exemplary method according to an exemplary
embodiment of the present invention, diffusion mapping is applied
on the Euclidean space of FIG. 7, for calculating a dimensionality
reduced space, as described in further detail hereinabove.
[0292] As illustrated using FIG. 8, the space calculated using
diffusion mapping includes the same number of points of known
negative assays (with a minus sign), points of known positive
assays (with a plus sign) and remaining points that need to be
classified as positive or negative.
[0293] However, in the calculated space, the points tend to
concentrate in two clearly recognizable areas of higher density.
[0294] Reference is now made to FIG. 9A, which is an exemplary
graph, depicting a function which defines a line and separates a
calculated space into two groups, according to an exemplary
embodiment of the present invention.
[0295] Optionally, in the fourth exemplary method, there is
calculated a function 901 which defines a line, say using SVM
(Support Vector Machine), as described in further detail
hereinabove. The function 901 divides the calculated space of
FIG. 8 into two groups (namely, a group of negative assays and a
group of positive assays).
[0296] Reference is now made to FIG. 9B, which is an exemplary
graph, depicting a function which defines two lines and separates a
calculated space into three groups, according to an exemplary
embodiment of the present invention.
[0297] Optionally, in the fourth exemplary method, there is instead
calculated a function 902 which defines two lines and divides the
calculated space of FIG. 8 into three groups.
[0298] With function 902, the groups include: a group of negative
assays (which includes known negative assay points marked with a
minus sign), a group of positive assays (which includes points
marked with a plus sign), and a group of neutral assays (which
includes points shown in black).
[0299] Reference is now made to FIG. 9C, which is an exemplary
graph, depicting a function which defines a hyper-surface, and
separates a calculated space into two groups, according to an
exemplary embodiment of the present invention.
[0300] Optionally, in the fourth exemplary method, there is instead
calculated a function 903 which defines a hyper-surface and
divides the calculated space of FIG. 8 into two groups (one
negative and one positive), as described in further detail
hereinabove.
[0301] Reference is now made to FIG. 10, which is a second
exemplary graph, depicting the function which defines a line and
separates the calculated space into the two groups, according to an
exemplary embodiment of the present invention.
[0302] Each of the points in the space of FIG. 8 (divided into the
two groups) is positioned at a certain distance (as illustrated
using arrows 905, 906) from the function 901 of FIG. 9A, which
divides the calculated space into the two groups.
[0303] Optionally, the function 901 is calculated by SVM, and each
point's distance from function 901 is given by the absolute value
of the SVM discriminant function at the point, as described in
further detail hereinabove.
[0304] Reference is now made to FIG. 11, which is an exemplary
graph, depicting cumulative functions, according to an exemplary
embodiment of the present invention.
[0305] Further in the fourth exemplary method, according to an
exemplary embodiment of the present invention, there is calculated
a first cumulative function 950.
[0306] The first cumulative function 950 gives the number of points
on the positive side of the calculated function 901 (which divides
the space into the two groups), as a function of distance 905 from
the function 901.
[0307] Further in the fourth exemplary method, there is calculated
a second cumulative function 960, which gives the number of points
on the negative side of the calculated function 901, as a function
of distance 906 from the function 901.
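Given the signed value of the dividing function at each point (for example, the SVM discriminant mentioned in [0303], positive on the positive side), the two cumulative functions 950 and 960 can be tabulated over a grid of distances. The sketch below is an illustration under that assumption, with a hypothetical function name; points lying exactly on the function 901 are not counted on either side.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def side_cumulative_functions(signed_values: Sequence[float],
                              grid: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: cumulative point counts on each side of the dividing function.

    signed_values: f(x) for each point; > 0 on the positive side, < 0 on the negative side.
    Returns (positive_side, negative_side): for each distance d in grid, the number of
    points on that side whose distance |f(x)| is <= d.
    """
    pos = sorted(v for v in signed_values if v > 0)
    neg = sorted(-v for v in signed_values if v < 0)
    return ([bisect_right(pos, d) for d in grid],
            [bisect_right(neg, d) for d in grid])
```

For signed values [-2.0, -0.5, 0.3, 1.0, 2.5] and the grid [0, 1, 2, 3], the positive-side counts are [0, 2, 2, 3] and the negative-side counts are [0, 1, 2, 2].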
[0308] Optionally, the distance of each point from the function 901
which divides the space is determined using conventional analytic
geometry tools, by calculating the shortest distance between the
point and the function 901 (i.e., the length of a normal), as known
in the art.
[0309] In each of the cumulative functions 950, 960, there is found
an elbow point (i.e. a transition point), as described in further
detail hereinabove.
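How the elbow point is found is not restated in this passage. One common heuristic, assumed here for illustration only, scales both axes of the cumulative function to [0, 1] and picks the point lying farthest from the straight chord that joins the first and last points of the curve. The function name is hypothetical, and the sketch assumes neither axis is constant.

```python
import math
from typing import Sequence


def find_elbow(xs: Sequence[float], ys: Sequence[float]) -> int:
    """Hypothetical helper: index of the point farthest from the chord joining
    the first and last points of the (axis-normalized) cumulative curve."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Normalize both axes so that neither scale dominates the distance.
    nx = [(x - x_min) / (x_max - x_min) for x in xs]
    ny = [(y - y_min) / (y_max - y_min) for y in ys]
    x0, y0, x1, y1 = nx[0], ny[0], nx[-1], ny[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = math.hypot(dx, dy)
    best_i, best_dist = 0, -1.0
    for i, (x, y) in enumerate(zip(nx, ny)):
        # Perpendicular distance from (x, y) to the chord (2D cross product / length).
        dist = abs(dy * (x - x0) - dx * (y - y0)) / norm
        if dist > best_dist:
            best_i, best_dist = i, dist
    return best_i
```

For a cumulative curve with distances 0 to 6 and counts [0, 1, 2, 3, 13, 23, 33], the sketch returns index 3, where slow growth gives way to fast growth.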
[0310] A first elbow point 951 is found in the first cumulative
function 950.
[0311] As shown, the first elbow point 951 is located at a distance
952 to the right of the function 901 which divides the space into
the two groups.
[0312] A second elbow point 961 is found in the second cumulative
function 960.
[0313] As shown, the second elbow point 961 is located at a
distance 962 to the left of the function 901 which divides the
space into the two groups.
[0314] Reference is now made to FIG. 12, which is an exemplary
graph, depicting thresholds based on transition points of a
cumulative function, according to an exemplary embodiment of the
present invention.
[0315] In the fourth exemplary method, the distances 952, 962 of the found
transition points (i.e. elbow points 951 and 961, respectively)
from the function 901, are used as thresholds, for identifying
outliers among the points (and thus identifying erroneous assays),
as described in further detail hereinabove.
[0316] That is to say, points that are positioned between the
thresholds 952 and 962 are identified as outliers, as described in
further detail hereinabove.
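The two-sided rule of [0315] and [0316] can be sketched as follows, assuming each point's signed distance from the function 901 is available (positive on the positive side). The function name is hypothetical, and so is the choice of strict inequalities: whether a point lying exactly at a threshold counts as an outlier is not specified here.

```python
from typing import List, Sequence


def identify_outliers(signed_values: Sequence[float],
                      pos_threshold: float, neg_threshold: float) -> List[int]:
    """Hypothetical helper: indices of points lying between the two thresholds.

    pos_threshold plays the role of distance 952 (positive side) and
    neg_threshold that of distance 962 (negative side); both are non-negative.
    """
    return [i for i, v in enumerate(signed_values)
            if -neg_threshold < v < pos_threshold]
```

With thresholds of 1.0 on the positive side and 0.5 on the negative side, the signed distances [-3.0, -0.4, 0.1, 0.9, 2.0] yield outliers at indices 1, 2 and 3.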
[0317] It is expected that during the life of this patent many
relevant devices and systems will be developed and the scope of the
terms herein, particularly of the terms "Computer", "Server",
"Desktop computer", "Laptop computer", "Tablet computer", "Cellular
phone", and "SaaS", is intended to include all such new
technologies a priori.
[0318] It is appreciated that certain features of the invention,
which are, for clarity, described in the context of separate
embodiments, may also be provided in combination in a single
embodiment. Conversely, various features of the invention, which
are, for brevity, described in the context of a single embodiment,
may also be provided separately or in any suitable
subcombination.
[0319] Although the invention has been described in conjunction
with specific embodiments thereof, it is evident that many
alternatives, modifications and variations will be apparent to
those skilled in the art. Accordingly, it is intended to embrace
all such alternatives, modifications and variations that fall
within the spirit and broad scope of the appended claims.
[0320] All publications, patents and patent applications mentioned
in this specification are herein incorporated in their entirety by
reference into the specification, to the same extent as if each
individual publication, patent or patent application was
specifically and individually indicated to be incorporated herein
by reference. In addition, citation or identification of any
reference in this application shall not be construed as an
admission that such reference is available as prior art to the
present invention.