U.S. patent application number 16/870492 was filed with the patent office on 2020-05-08 and published on 2020-11-12 for computer vision systems and methods for machine learning using a set packing framework.
This patent application is currently assigned to Insurance Services Office, Inc. The applicant listed for this patent is Insurance Services Office, Inc. Invention is credited to Yossiri Adulyasak, Guy Desaulniers, Maneesh Kumar Singh, Julian Yarkony.
Application Number | 16/870492 |
Publication Number | 20200356811 |
Family ID | 1000004858010 |
Publication Date | 2020-11-12 |
United States Patent Application | 20200356811 |
Kind Code | A1 |
Yarkony; Julian; et al. | November 12, 2020 |
Computer Vision Systems and Methods for Machine Learning Using a Set Packing Framework
Abstract
Computer vision systems and methods for machine learning using a
set packing framework are provided. A minimum weight set packing
("MWSP") framework is parameterized by a set of possible
hypotheses, each of which is associated with a real valued cost
that describes the sensibility of the belief that the members of
the hypothesis correspond to a common cause. Using MWSP, the system
then selects the lowest total cost set of hypotheses, such that no
two selected hypotheses share a common observation. Observations
that are not included in any selected hypothesis define the set of
false observations, and can be thought of as noise.
The system can be utilized to support one or more trained computer
models in performing computer vision on input data in order to
generate output data.
Inventors: | Yarkony; Julian (Jersey City, NJ); Adulyasak; Yossiri (Montreal, CA); Singh; Maneesh Kumar (Lawrenceville, NJ); Desaulniers; Guy (Blainville, CA) |
Applicant: | Insurance Services Office, Inc. (Jersey City, NJ, US) |
Assignee: | Insurance Services Office, Inc. (Jersey City, NJ) |
Family ID: | 1000004858010 |
Appl. No.: | 16/870492 |
Filed: | May 8, 2020 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
62845526 | May 9, 2019 | |
Current U.S. Class: | 1/1 |
Current CPC Class: | G06T 7/62 20170101; G06K 9/0014 20130101; G06T 2207/20084 20130101; G06N 5/04 20130101; G06K 9/00362 20130101; G06T 2207/20081 20130101; G06K 9/6218 20130101; G06N 20/00 20190101; G06K 9/6261 20130101; G06K 9/00711 20130101; G06K 9/6256 20130101; G06T 7/0012 20130101 |
International Class: | G06K 9/62 20060101 G06K009/62; G06K 9/00 20060101 G06K009/00; G06T 7/62 20060101 G06T007/62; G06T 7/00 20060101 G06T007/00; G06N 20/00 20060101 G06N020/00; G06N 5/04 20060101 G06N005/04 |
Claims
1. A system for training a model for a computer system, comprising:
a computer system in communication with a database including raw
input data; and a set packing engine executed by the computer
system, the set packing engine: processing the raw input data to
formulate correlation clustering corresponding to the raw input
data as an integer linear program; processing the correlation
clustering to generate an expanded formulation of the correlation
clustering; solving the expanded formulation using a column
generation process; and transmitting training information
corresponding to the solved expanded formulation to a model system,
the training information assisting the model system in performing
computer vision processing on input data to identify output
data.
2. The system of claim 1, wherein the set packing engine formulates
sequences of subtracks for detecting people in one or more video
frames as an integer linear program.
3. The system of claim 2, wherein the set packing engine defines a
set of detections of people in the one or more video frames.
4. The system of claim 3, wherein the set packing engine decomposes
track costs in terms of subtrack costs.
5. The system of claim 4, wherein the set packing engine augments
sets of subtracks with subtracks padded with empty detections.
6. The system of claim 1, wherein the set packing engine identifies
a plurality of body parts in one or more video frames.
7. The system of claim 6, wherein the set packing engine associates
pairs of detections with costs associated with a common person.
8. The system of claim 7, wherein the set packing engine aggregates
the detections to form representations of people using an integer
linear program formulation.
9. The system of claim 1, wherein the set packing engine denotes a
set of human body part detections using the raw input data.
10. The system of claim 9, wherein the set packing engine defines a
set of people as a power set of the set of human body part
detections.
11. The system of claim 10, wherein the set packing engine defines
a cost for a person.
12. The system of claim 11, wherein the set packing engine models a
person according to a common tree structured model.
13. The system of claim 1, wherein the set packing engine performs
dimensionality reduction by partitioning sets of pixels into sets
of super-pixels.
14. The system of claim 13, wherein the set packing engine provides
a cost for each pair of adjacent super-pixels to be associated with
a common cell.
15. The system of claim 14, wherein the set packing engine computes
a maximum radius and an area of each cell.
16. The system of claim 1, wherein the set packing engine
generates: (i) a set of observations corresponding to a set of
superpixels, and (ii) a set of hypotheses corresponding to a set of
biological cells.
17. The system of claim 16, wherein the set packing engine defines
a radius constraint as a cost.
18. The system of claim 17, wherein the set packing engine denotes
an upper bound on an area of a cell and an area of a
superpixel.
19. The system of claim 18, wherein the set packing engine defines
a volume constraint as a cost.
20. The system of claim 19, wherein the set packing engine
describes image-level evidence corresponding to a quality of a
cell.
21. The system of claim 1, wherein the set packing engine tightens
linear programming relaxation of a minimum weight set packing
framework.
22. The system of claim 21, wherein the set packing engine
determines whether subset-row inequalities destroy a structure of a
pricing problem.
23. The system of claim 22, wherein the set packing engine solves
the pricing problem while modifying the structure of the pricing
problem.
24. The system of claim 22, wherein the set packing engine solves
the pricing problem without modifying the structure of the pricing
problem.
25. A method for training a model for a computer system, comprising
the steps of: processing at a processor raw input data to formulate
correlation clustering corresponding to the raw input data as an
integer linear program; processing the correlation clustering to
generate an expanded formulation of the correlation clustering;
solving the expanded formulation using a column generation process;
and transmitting training information corresponding to the solved
expanded formulation to a model system, the training information
assisting the model system in performing computer vision processing
on input data to identify output data.
26. The method of claim 25, further comprising formulating
sequences of subtracks for detecting people in one or more video
frames as an integer linear program.
27. The method of claim 26, further comprising defining a set of
detections of people in the one or more video frames.
28. The method of claim 27, further comprising decomposing track
costs in terms of subtrack costs.
29. The method of claim 28, further comprising augmenting sets of
subtracks with subtracks padded with empty detections.
30. The method of claim 25, further comprising identifying a
plurality of body parts in one or more video frames.
31. The method of claim 30, further comprising associating pairs of
detections with costs associated with a common person.
32. The method of claim 31, further comprising aggregating the
detections to form representations of people using an integer
linear program formulation.
33. The method of claim 25, further comprising denoting a set of
human body part detections using the raw input data.
34. The method of claim 33, further comprising defining a set of
people as a power set of the set of human body part detections.
35. The method of claim 34, further comprising defining a cost for
a person.
36. The method of claim 35, further comprising modeling a person
according to a common tree structured model.
37. The method of claim 25, further comprising performing
dimensionality reduction by partitioning sets of pixels into sets
of super-pixels.
38. The method of claim 37, further comprising providing a cost for
each pair of adjacent super-pixels to be associated with a common
cell.
39. The method of claim 38, further comprising computing a maximum
radius and an area of each cell.
40. The method of claim 25, further comprising generating: (i) a
set of observations corresponding to a set of superpixels, and (ii)
a set of hypotheses corresponding to a set of biological cells.
41. The method of claim 40, further comprising defining a radius
constraint as a cost.
42. The method of claim 41, further comprising denoting an upper
bound on an area of a cell and an area of a superpixel.
43. The method of claim 42, further comprising defining a volume
constraint as a cost.
44. The method of claim 43, further comprising describing
image-level evidence corresponding to a quality of a cell.
45. The method of claim 25, further comprising tightening linear
programming relaxation of a minimum weight set packing
framework.
46. The method of claim 45, further comprising determining whether
subset-row inequalities destroy a structure of a pricing problem.
47. The method of claim 46, further comprising solving the pricing
problem while modifying the structure of the pricing problem.
48. The method of claim 47, further comprising solving the pricing
problem without modifying the structure of the pricing problem.
Description
RELATED APPLICATIONS
[0001] The present application claims the benefit of U.S.
Provisional Patent Application Ser. No. 62/845,526 filed on May 9,
2019, the entire disclosure of which is expressly incorporated
herein by reference.
BACKGROUND
Technical Field
[0002] The present disclosure relates generally to the field of
computer vision technology. More specifically, the present
disclosure relates to computer vision systems and methods for
machine learning using a set packing framework.
RELATED ART
[0003] Artificial neural networks ("ANN") excel at learning
functions that map input data vectors (e.g., images of objects such
as a dog, a cat, a horse, etc.) to output labels (e.g., semantic
label: dog, cat, horse, etc.) by using large quantities of labeled
training data. An ANN learns a function that generalizes beyond a
training data set to produce the correct label as output on test
data not part of the training data set. A possible application of
ANNs is object recognition, in which an ANN learns to recognize the
presence of objects (e.g., cat, dog, horse, etc.) in images. Large
data sets facilitate learning such functions. An example of a large
data set includes the ImageNet data set, which provides fourteen
million training images, each associated with the labels of the
objects present in the image.
[0004] Localizing each unique instance of objects in crowded
images, which is called instance segmentation, is an important
related task to object recognition. The common approach to instance
segmentation iterates over all possible rectangles of pixels
(called bounding boxes) in the image, and predicts the presence of
each object in that rectangle. However, combining the hypotheses
generated in each rectangle to describe each unique instance of
objects is challenging as the hypotheses need not be mutually
consistent. For example, multiple predicted hypotheses can share a
common pixel, but multiple objects cannot be associated with the
same pixel in the ground truth. Heuristics, such as non-max
suppression, are often used to remove conflicts between predicted
hypotheses. Non-max suppression removes from consideration all but
one of each set of "similar" and/or overlapping predictions.
Combinatorial optimization provides a principled alternative to
non-max suppression heuristics, which is referred to as data
association.
[0005] Data association uses combinatorial optimization to
partition the observations in a data set (e.g., pixels in an image)
into a set of hypotheses (e.g., unique instances of objects or
background), each associated with a subset of the observations that
are consistent with the statistical properties of the known
structure of the hypothesis.
[0006] The use of combinatorial optimization in computer
vision/machine learning has developed largely without influence
from the operations research community, and has been focused on
network flows (called graph cuts), primal dual methods (the most
prominent of which is message passing), and compact linear
programming ("LP") relaxations augmented with cutting plane
methods. This often leads to less efficient/optimal solvers than
are desirable. Further, the capacity of the associated models is
limited by not taking advantage of the decades of research in
combinatorial optimization in the operations research
community.
[0007] Recently the core operations research techniques of column
generation ("CG") and (nested) Benders decomposition (called
"(N)BD") have been introduced to the machine learning and computer
vision communities. However, the application of these techniques,
and the construction of models to support the use of CG and (N)BD
is in its infancy.
[0008] Therefore, there is a need for computer vision systems and
methods which can overcome data association problems in computer
vision systems, thereby improving the speed and efficiency of the
computer vision systems. These and other needs are addressed by the
computer vision systems and methods of the present disclosure.
SUMMARY
[0009] The present disclosure relates to computer vision systems
and methods for machine learning using a set packing framework. The
systems and methods disclosed herein include a minimum weight set
packing ("MWSP") framework, which uses advanced methods of integer
programming that the system applies to data association problems
commonly studied in computer vision. In the present system, an MWSP
instance for data association is parameterized by a set of possible
hypotheses, each of which is associated with a real-valued cost
that describes the sensibility of the belief that the members of
the hypothesis correspond to a common cause. Using MWSP, the system
then selects the lowest total cost set of hypotheses, such that no
two selected hypotheses share a common observation. Observations
that are not included in any selected hypothesis define the set of
false observations, and can be thought of as noise.
Embodiments and examples of the present disclosure will be
discussed in regards to multi-person detection, which can be used
in, for example, self-driving car applications. The set of
observations is the set of all pixels, and the set of possible
hypotheses is the power set of pixels. The statistical support for
a hypothesis is defined in terms of how well a classifier (such as
an ANN) scores the quality of a single person dominating the
corresponding pixels.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] The foregoing features of the invention will be apparent
from the following Detailed Description of the Invention, taken in
connection with the accompanying drawings, in which:
[0011] FIG. 1 is a diagram illustrating the overall system of the
present disclosure;
[0012] FIG. 2 is a flowchart illustrating overall process steps
carried out by the system of the present disclosure;
[0013] FIG. 3 is an illustration of an algorithm for column
generation in connection with step 36, as described in connection
with FIG. 2;
[0014] FIG. 4 is a flowchart illustrating process steps being
carried out by the system of the present disclosure to generate a
minimum weight set packing ("MWSP") formulation of multi-person
tracking;
[0015] FIGS. 5A-5C are a set of images showing multi-object
tracking in connection with the system of the present
disclosure;
[0016] FIG. 6 is a flowchart illustrating process steps being
carried out by the system of the present disclosure to generate a
multi-person tracking MWSP formulation for data association;
[0017] FIG. 7 is an illustration showing subtracks in connection
with step 72 of FIG. 6;
[0018] FIG. 8 is a flowchart illustrating process steps being
carried out by the system of the present disclosure to generate a
MWSP formulation of multi-person pose estimation ("MPPE");
[0019] FIG. 9 is an image showing multi-person pose estimation in
connection with the system of the present disclosure;
[0020] FIGS. 10A-B are illustrations showing a tree model of the
present disclosure, augmented with additional connections, where
additional connections trade off optimization difficulty and
modeling power;
[0021] FIG. 11 is a flowchart illustrating process steps being
carried out by the system of the present disclosure to generate a
MPPE MWSP formulation for data association;
[0022] FIG. 12 is a flowchart illustrating process steps being
carried out by the system of the present disclosure to generate
multi-cell segmentation;
[0023] FIG. 13 is an image showing a multi-cell instance
segmentation in connection with FIG. 12;
[0024] FIG. 14 is a flowchart illustrating process steps being
carried out by the system of the present disclosure to generate a
MWSP formulation of multi-cell segmentation;
[0025] FIG. 15 is a flowchart illustrating process steps being
carried out by the system of the present disclosure to tighten the
linear program relaxation of the MWSP;
[0026] FIG. 16 is an algorithm describing the column/row generation
("CRG") in connection with the present disclosure;
[0027] FIG. 17 is a table showing splits enumerated for a triplet
of observations in connection with the present disclosure;
[0028] FIG. 18 is a set of images illustrating a qualitative
example of improvement as a result of increasing subtrack
length;
[0029] FIGS. 19A-B are graphs showing a comparison of timing/cost
performance of the present disclosure with a baseline dual
decomposition approach;
[0030] FIG. 20 shows a table comparing column generation against a
prior art heuristic optimization procedure in terms of the accuracy
(average precision) on standard computer vision benchmarks;
[0031] FIG. 21 is a set of images showing sample outputs of the
system of the present disclosure;
[0032] FIG. 22 is a table showing a comparison in total time in
seconds and comparative speed-up using dual optimal inequalities
("DOI") on different solvers;
[0033] FIGS. 23-24 are scatter plots showing time consumed using
DOI for each of the two solvers;
[0034] FIG. 25 is an illustration showing an output of the present
system;
[0035] FIG. 26 is a graph showing the results of the comparison
between the present system and prior art systems;
[0036] FIG. 27 is a graph showing optimization time for column
generation across problem instances in dataset one; and
[0037] FIG. 28 is a diagram illustrating sample hardware and
software components capable of being used to implement the system
of the present disclosure.
DETAILED DESCRIPTION
[0038] The present disclosure relates to computer vision systems
and methods for machine learning using a set packing framework, as
described in detail below in connection with FIGS. 1-28.
[0039] FIG. 1 is a diagram illustrating the system of the present
disclosure, indicated generally at 10. The system 10 includes a
model training system 14 which receives raw input data 12,
processes the data 12, and feeds the processed data to a trained
model 18. The raw input data 12 can be sets of training data, as
will be discussed in further detail below. The trained model system
18 receives input data 20 and generates output data 22. The input
data 20 can be data desired to be processed and classified by the
system 10, and the output data 22 can include classified data. The
model training system 14 includes a set packing engine 16.
[0040] The set packing engine 16 models data association as a
minimum weight set packing formulation ("MWSP"), which is framed
using sets of observations and hypotheses denoted $\mathcal{D}$ and
$\mathcal{G}$ respectively, indexed by $d$ and $g$ respectively. The
mapping of observations to hypotheses is described using a matrix
$G \in \{0,1\}^{|\mathcal{D}| \times |\mathcal{G}|}$ where
$G_{dg} = 1$ if hypothesis $g$ includes observation $d$. Real-valued
costs are associated with hypotheses using
$\Gamma \in \mathbb{R}^{|\mathcal{G}|}$, where $\Gamma_g$ is the cost
associated with hypothesis $g$. The MWSP is formulated as an integer
linear program ("ILP") using $\gamma_g \in \{0,1\}$, where
$\gamma_g = 1$ if hypothesis $g$ is included in the set packing, as
is expressed in Equation 1, below:

$$\min_{\gamma_g \in \{0,1\}\ \forall g \in \mathcal{G}} \ \sum_{g \in \mathcal{G}} \Gamma_g \gamma_g \qquad \text{s.t.} \quad \sum_{g \in \mathcal{G}} G_{dg} \gamma_g \le 1 \quad \forall d \in \mathcal{D} \tag{1}$$
[0041] In Equation 1, the objective of optimization is the total
cost of all hypotheses in the packing. For every observation
d.di-elect cons.D, there is one constraint in Equation 1 that
states that no more than one selected hypothesis contains
observation d. In an example, the cost of a hypothesis consisting
of zero observations is zero.
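To make Equation 1 concrete, the following Python sketch brute-forces a tiny MWSP instance. The hypotheses and costs here are invented purely for illustration and are not from the disclosure; real instances are far too large for enumeration and are solved with the column generation procedure described in connection with FIG. 2.

```python
from itertools import combinations

def min_weight_set_packing(hypotheses, costs):
    """Brute-force solver for Equation 1: pick the subset of hypotheses
    with minimum total cost such that no observation is covered twice.
    hypotheses: list of frozensets of observation ids; costs: parallel list."""
    n = len(hypotheses)
    best_cost, best_sel = 0.0, ()          # the empty packing costs zero
    for r in range(1, n + 1):
        for sel in combinations(range(n), r):
            # feasibility: selected hypotheses must be pairwise disjoint
            covered, ok = set(), True
            for i in sel:
                if covered & hypotheses[i]:
                    ok = False
                    break
                covered |= hypotheses[i]
            if ok:
                total = sum(costs[i] for i in sel)
                if total < best_cost:
                    best_cost, best_sel = total, sel
    return best_cost, best_sel

# Toy instance: observations {0..4}; negative costs reward plausible hypotheses.
H = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3}), frozenset({4})]
C = [-3.0, -5.0, -2.0, 1.0]
cost, chosen = min_weight_set_packing(H, C)
```

In this toy run the single hypothesis {1, 2} is selected; observations 0, 3, and 4 are covered by no selected hypothesis and are therefore treated as false observations/noise, exactly as the formulation prescribes.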
[0042] Prior art systems generally generate cost terms by training
a standard linear classifier to determine the probability a
variable/pair of variables takes on a given label/pair of labels.
The output probabilities are converted to cost terms by taking the
negative log of the probability. However, this is not a
mathematically principled approach since it does not consider the
ILP context, in which a complete solution to all variables is
produced. To correctly model the ILP used to produce a solution,
the system 10 uses structured support vector machines ("SVM"). A
structured SVM learns a mechanism to produce cost terms for ILPs
such that the optimal solution to that ILP is similar to the ground
truth (information provided by direct observation). The system 10
learns a structured SVM from large amounts of labeled data using a
cutting plane approach where the ground truth solution is separated
from other solutions generated in the course of training the
structured SVM. Learning for structured SVMs requires repeatedly
solving ILPs (or linear programs "LPs") across problem instances,
making learning on large data sets challenging. Other mechanisms
that can be used by the system 10 to learn cost terms include
herding, which is designed to decrease computational requirements
relative to the structured SVM, and provides multiple solutions for
a problem instance akin to samples from a probability distribution
over solutions.
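The negative-log conversion of classifier probabilities into cost terms mentioned above can be sketched as follows. This is a generic illustration of the standard transformation, not code from the patent; the clamping epsilon is an assumption added to avoid taking the log of zero.

```python
import math

def probability_to_cost(p, eps=1e-12):
    """Convert a classifier probability into an additive cost term by
    taking the negative log; low-probability hypotheses get high cost."""
    return -math.log(max(p, eps))

# More probable hypotheses map to lower (better) costs.
costs = [probability_to_cost(p) for p in (0.9, 0.5, 0.1)]
```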
[0043] FIG. 2 is a flowchart illustrating the overall process steps
being carried out by the system 10, indicated generally at method
30. The process steps of method 30 will be discussed in relation to
the framework of the set packing engine 16. Specifically, method 30
will discuss using expanded representation for MWSP problems, and
the solutions via column generation.
[0044] In step 32, the system 10 formulates correlation clustering
as an ILP (integer linear program). Specifically, a graph is
expressed with a node set $\mathcal{D}$ indexed by $d$, an edge set
$\mathcal{E}$ indexed by $(d_1, d_2)$, and weights
$\theta \in \mathbb{R}^{|\mathcal{D}| \times |\mathcal{D}|}$ indexed
by $(d_1, d_2)$. Correlation clustering partitions the nodes into
sets so as to minimize the sum of the within-cluster edges, and is
known to be NP-hard. The system 10 uses decision variables
$x \in \{0,1\}^{|\mathcal{D}| \times |\mathcal{D}|}$, indexed by
$d, j$, where $x_{dj} = 1$ if node $d$ is in cluster $j$. Clusters
are indexed by $j$, and lie in
$\mathcal{J} = \{0, 1, 2, \ldots, |\mathcal{D}|\}$. The expression
$y \in \{0,1\}^{|\mathcal{D}| \times |\mathcal{D}| \times |\mathcal{D}|}$
describes co-association; specifically, $y_{d_1 d_2 j} = 1$ if
$d_1, d_2$ are part of a common cluster $j$. Accordingly,
correlation clustering as an ILP is expressed by Equations 2-7,
below:

$$\min_{x \ge 0,\ y \ge 0} \ \sum_{(d_1, d_2) \in \mathcal{E}} \sum_{j \in \mathcal{J}} y_{d_1 d_2 j}\, \theta_{d_1 d_2} \tag{2}$$
$$\sum_{j \in \mathcal{J}} x_{dj} = 1 \quad \forall d \in \mathcal{D} \tag{3}$$
$$y_{d_1 d_2 j} \le x_{d_1 j} \quad \forall (d_1, d_2) \in \mathcal{E},\ j \in \mathcal{J} \tag{4}$$
$$y_{d_1 d_2 j} \le x_{d_2 j} \quad \forall (d_1, d_2) \in \mathcal{E},\ j \in \mathcal{J} \tag{5}$$
$$x_{d_1 j} + x_{d_2 j} - 1 \le y_{d_1 d_2 j} \quad \forall (d_1, d_2) \in \mathcal{E},\ j \in \mathcal{J} \tag{6}$$
$$x_{dj} \in \{0, 1\} \tag{7}$$
[0045] The objective of Equation 2 is to minimize the sum of the
within cluster edges. Equation 3 is a constraint that enforces that
every node is assigned to exactly one cluster. Equations 4, 5, and
6 are constraints that collectively enforce that
.gamma..sub.d1d2j=1 if x.sub.d1j=1 and x.sub.d2j=1. Equation 7 is a
constraint that enforces integrality of x. It is noted that the
integrality of x ensures that y is also integral. The optimization
in Equation 2 in which Equation 7 is ignored is referred to as the
compact formulation of correlation clustering.
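The compact objective of Equations 2-3 can be illustrated with a toy brute-force search (the graph and edge weights below are invented for illustration): each node receives exactly one cluster label, and the cost sums the weights of within-cluster edges.

```python
from itertools import product

def correlation_clustering_cost(labels, edges):
    """Objective of Equation 2: sum of edge weights whose endpoints
    share a cluster label."""
    return sum(w for (d1, d2), w in edges.items() if labels[d1] == labels[d2])

def brute_force_correlation_clustering(nodes, edges):
    """Try every assignment of nodes to clusters (Equation 3: each node
    gets exactly one cluster) and keep the minimum-cost labeling."""
    best = None
    for labels in product(range(len(nodes)), repeat=len(nodes)):
        assign = dict(zip(nodes, labels))
        c = correlation_clustering_cost(assign, edges)
        if best is None or c < best[0]:
            best = (c, assign)
    return best

# Toy graph: negative weights encourage merging, positive discourage it.
nodes = ["a", "b", "c"]
edges = {("a", "b"): -2.0, ("b", "c"): 1.0, ("a", "c"): -0.5}
cost, assign = brute_force_correlation_clustering(nodes, edges)
```

On this toy graph the optimum merges a and b (paying the -2.0 edge) while leaving c in its own cluster, since adding c would incur the +1.0 edge.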
[0046] In step 34, the system 10 expands the formulation of the
correlation clustering to correspond to a tighter relaxation. By
expanding the formulation, the system 10 increases optimization
speed. Specifically, the system 10 generates an expanded
formulation of correlation clustering that corresponds to a tighter
relaxation than the compact formulation. The power set of
$\mathcal{D}$, denoted $\mathcal{G}$, is indexed by $g$. The term
$\mathcal{G}$ is expressed using
$G \in \{0,1\}^{|\mathcal{D}| \times |\mathcal{G}|}$ where
$G_{dg} = 1$ if $d$ is in $g$. The cost associated with each member
of $g \in \mathcal{G}$ is defined as the sum of all edges within the
cluster $g$. The cost of clusters is expressed using
$\Gamma \in \mathbb{R}^{|\mathcal{G}|}$, indexed by $g$, where
$\Gamma_g$ is the cost associated with cluster $g$, and is defined
by Equation 8, below:

$$\Gamma_g = \sum_{(d_1, d_2) \in \mathcal{E}} G_{d_1 g}\, G_{d_2 g}\, \theta_{d_1 d_2} \tag{8}$$
[0047] Equations 9-11, below, frame optimization as selecting the
lowest cost non-overlapping subset of $\mathcal{G}$:

$$\min_{\gamma_g \ge 0\ \forall g \in \mathcal{G}} \ \sum_{g \in \mathcal{G}} \Gamma_g \gamma_g \tag{9}$$
$$\sum_{g \in \mathcal{G}} G_{dg} \gamma_g \le 1 \quad \forall d \in \mathcal{D} \tag{10}$$
$$\gamma_g \in \{0, 1\} \quad \forall g \in \mathcal{G} \tag{11}$$
[0048] The objective in Equation 9 is to minimize the sum of the
costs of the clusters selected. The constraint in Equation 10
enforces that every node is assigned to no more than one cluster.
If the solution .gamma. does not select a cluster that includes d,
then d is in a cluster by itself. The constraint in Equation 11
enforces that .gamma. is integral. The optimization expressed in
Equation 9, where Equation 11 is ignored, is referred to as
expanded LP relaxation.
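The cluster costs of Equation 8 and the selection problem of Equations 9-11 can be illustrated together on a toy graph (brute force over the power set; all weights invented for illustration). Nodes left uncovered by the selected clusters are implicitly zero-cost singletons.

```python
from itertools import chain, combinations

def cluster_cost(g, edges):
    """Equation 8: cost of cluster g is the sum of edge weights inside g."""
    return sum(w for (d1, d2), w in edges.items() if d1 in g and d2 in g)

def expanded_formulation(nodes, edges):
    """Brute force over Equations 9-11: choose disjoint clusters from the
    power set (size >= 2) minimizing total cost; uncovered nodes remain
    zero-cost singletons."""
    power = [frozenset(s) for s in chain.from_iterable(
        combinations(nodes, r) for r in range(2, len(nodes) + 1))]
    costs = [cluster_cost(g, edges) for g in power]
    best_cost, best = 0.0, []
    for r in range(1, len(power) + 1):
        for sel in combinations(range(len(power)), r):
            covered, ok = set(), True
            for i in sel:
                if covered & power[i]:      # Equation 10: no node twice
                    ok = False
                    break
                covered |= power[i]
            if ok:
                total = sum(costs[i] for i in sel)
                if total < best_cost:
                    best_cost, best = total, [power[i] for i in sel]
    return best_cost, best

# Toy weights: merging a and b is cheapest; c stays a singleton.
nodes = ["a", "b", "c"]
edges = {("a", "b"): -2.0, ("b", "c"): 1.0, ("a", "c"): -0.5}
cost, clusters = expanded_formulation(nodes, edges)
```

On integral toy instances like this one, the expanded formulation recovers the same optimal partition as the compact formulation, consistent with the relaxations agreeing at integral optima.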
[0049] In step 36, the system 10 solves the expanded formulation
using column generation. Specifically, column generation
circumvents the problem of the massive size of the set of
hypotheses by constructing a sufficient subset of $\mathcal{G}$,
denoted $\hat{\mathcal{G}}$, so that solving the LP relaxation of
Equation 9 over $\hat{\mathcal{G}}$ provides the same objective as
solving over $\mathcal{G}$. Construction of $\hat{\mathcal{G}}$ is
performed in a cutting plane manner using the Lagrangian dual of the
LP relaxation of Equation 9 defined using $\hat{\mathcal{G}}$, which
will be referred to as the restricted master problem ("RMP"). Primal
and dual LP relaxations of Equations 9-11 are expressed in Equation
12, below, where the dual LP relaxation is described using dual
variables $\lambda_d \ge 0$ for all $d \in \mathcal{D}$:

$$\min_{\substack{\gamma_g \ge 0 \\ \sum_{g \in \hat{\mathcal{G}}} G_{dg} \gamma_g \le 1\ \forall d \in \mathcal{D}}} \ \sum_{g \in \hat{\mathcal{G}}} \Gamma_g \gamma_g \;=\; \max_{\substack{\lambda_d \ge 0\ \forall d \in \mathcal{D} \\ \Gamma_g + \sum_{d \in \mathcal{D}} G_{dg} \lambda_d \ge 0\ \forall g \in \hat{\mathcal{G}}}} \ \sum_{d \in \mathcal{D}} -\lambda_d \tag{12}$$
[0050] The dual form of Equation 12 has a finite number of variables
and $|\mathcal{G}|$ constraints, which allows the system 10 to use a
cutting plane method to solve the dual form. After the system 10
uses the cutting plane approach to solve the dual form, the
corresponding primal solution is provably optimal. The use of the
cutting plane method in the dual form can require access to an
oracle that provides a violated dual constraint given a dual
solution $\lambda$. This violated dual constraint corresponds to a
negative reduced cost primal variable. The task of finding the
lowest reduced cost primal variable is referred to as pricing, whose
corresponding optimization is expressed in Equation 13, below:

$$\min_{g \in \mathcal{G}} \ \Gamma_g + \sum_{d \in \mathcal{D}} G_{dg} \lambda_d \tag{13}$$
[0051] The optimization in Equation 13 is often not solved by
search, but instead as an integer program or a dynamic program. The
system 10 can employ specialized solvers to solve the pricing
problems that exploit the special structure found in specific
problem domains.
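As an illustration of the pricing step only, the sketch below evaluates Equation 13 by enumeration over a small hypothetical candidate pool; as noted above, real systems solve pricing as a specialized integer or dynamic program rather than by search. The dual values here are invented for illustration.

```python
def lowest_reduced_cost(hypotheses, costs, duals):
    """Pricing (Equation 13): find the hypothesis g minimizing
    Gamma_g + sum_{d in g} lambda_d, given dual values lambda."""
    best_val, best_g = float("inf"), None
    for g, gamma in zip(hypotheses, costs):
        rc = gamma + sum(duals[d] for d in g)
        if rc < best_val:
            best_val, best_g = rc, g
    return best_val, best_g

H = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2})]
C = [-3.0, -1.0, 0.5]
lam = {0: 1.0, 1: 2.5, 2: 0.0}   # hypothetical dual solution, lambda_d >= 0
val, g = lowest_reduced_cost(H, C, lam)
# a negative minimum means g violates its dual constraint and is added
# to the restricted set; a nonnegative minimum terminates column generation
```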
[0052] FIG. 3 is an illustration of an algorithm for column
generation in connection with step 36. Specifically,
$\hat{\mathcal{G}}$ is set to equal an empty set, and the system
then iterates between solving the optimization in Equation 12 over
$\hat{\mathcal{G}}$ and adding elements to $\hat{\mathcal{G}}$ using
Equation 13. When no violated dual constraints exist, the system 10
terminates column generation. For practical problems, the primal
form of Equation 12 is generally integral at termination of the
column generation. However, if the LP relaxation in Equation 12 is
loose, an approximate solution is produced by exactly or
approximately solving Equation 12 over the set $\hat{\mathcal{G}}$
instead of $\mathcal{G}$. The relaxation in Equation 12 can be
tightened with subset-row inequalities, as will be discussed in
detail, below.
[0053] FIG. 4 is a flowchart illustrating process steps being
carried out by the system 10 to generate a MWSP formulation of
multi-person tracking, indicated generally in method 60.
Multi-person tracking is the task of identifying and tracking each
unique person in a video. For example, multi-person tracking can be
used for security applications, applications including autonomous
vehicles, etc. In multi-person tracking, the specific identity of
the people in an image is unknown. Combinatorial optimization can
be applied to multi-person tracking in the form of min-cost network
flow techniques and MWSP.
[0054] In step 62, the system 10 identifies all candidate
detections of people in each frame of a video. For example, the
system 10 can use a classifier, such as an ANN (artificial neural
network), to perform the identifications. It is noted that some of
these detections can be false detections.
[0055] In step 64, the system 10 associates each group of K
detections ordered in time, each on a separate frame, with a real
cost describing how plausible it is for the K detections to follow
each other directly in the track of a single person. In an example,
K can be any integer (e.g., 3, 4, 5, etc.).
These sets can be referred to as subtracks in the present
disclosure.
[0056] The parameter K trades off modeling power and computation
requirements. The set of subtracks is pruned by relying on the fact
that most sets of K detections are non-sensible, since the
detections are not sufficiently visually similar to correspond to a
common person. Similarly, subtracks that do not follow the known
statistics of human motion are removed; e.g., humans cannot teleport
across space within a few frames of video.
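The pruning described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes each detection is a (frame, x, y) tuple and uses a single hypothetical motion statistic, a maximum displacement per frame (`max_step`), standing in for the known statistics of human motion.

```python
import itertools

def plausible(subtrack, max_step):
    """Keep a subtrack only if consecutive detections advance in time and
    stay within max_step pixels per frame gap (people cannot teleport)."""
    for (f1, x1, y1), (f2, x2, y2) in zip(subtrack, subtrack[1:]):
        frames = f2 - f1
        dist = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
        if frames <= 0 or dist > max_step * frames:
            return False
    return True

def enumerate_subtracks(detections, K, max_step):
    """Enumerate all time-ordered K-tuples of detections and prune the
    implausible ones. detections: list of (frame, x, y) tuples."""
    dets = sorted(detections)
    return [s for s in itertools.combinations(dets, K)
            if plausible(s, max_step)]
```

In practice a visual-similarity test would be applied alongside the motion test; it is omitted here for brevity.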
[0057] In step 66, the system 10 formulates the packing of
detections into sequences of subtracks as an ILP, and solves the
ILP using column generation. Specifically, the system 10 employs a
MWSP formulation in which detections correspond to observations and
complete tracks correspond to sequences of subtracks. The cost of a
track is the sum of the costs of the subtracks that compose it, plus
a constant offset. The constant offset penalizes/rewards having
additional people in the video, which models a Bayesian prior belief
on the number of people in the image.
[0058] FIGS. 5A-5C are a set of images showing multi-object
tracking. Specifically, observations correspond to detections of
people, and hypotheses correspond to tracks of people moving across
time. The system 10 uses numbers to denote the bounding boxes 67,
68, 69 of a common person across frames.
[0059] FIG. 6 is a flowchart illustrating process steps being
carried out by the system 10 to generate a multi-person tracking
MWSP formulation for data association, indicated generally at
method 70. By way of example, the system 10 uses a Markov model for
scoring the quality of a track (hypothesis). Specifically, the
Markov model incorporates scores corresponding to the statistical
support for subsequences of detections within a track, called
subtracks, whose scores can depend arbitrarily on detections across
several frames. The Markov model is defined to be K-th order, where
K is a user-defined modeling parameter that trades off optimization
difficulty and modeling power. Those skilled in the art understand
that other models for scoring can be used.
[0060] In step 72, the system 10 defines V as the set of detections
(observations) of people in frames of video. S denotes the set of
subtracks, each of which contains K detections. For a given subtrack
s ∈ S, s_k indicates the k-th detection in the sequence
s = {s_1, . . . , s_K}, ordered by time from earliest to latest. It
is noted that the detections that compose a subtrack need not be
consecutive in time, thus permitting a person to disappear and
reappear in video. The mapping of subtracks to tracks is described
using T ∈ {0, 1}^{|S|×|G|}, where T_sg = 1 indicates that track g
contains subtrack s as a sub-sequence.
[0061] FIG. 7 is an illustration showing the subtracks.
Specifically, FIG. 7 illustrates possible tracks and subtracks
(boxes), where directed arrows indicate the valid successors of a
given subtrack. The subtracks are ordered by the time of their
final detection. It is noted that a subtrack can skip some time
steps, e.g., [d_2b, d_3d, d_5b], which describes occlusion at time
four. Lines 78 indicate a single track that consists of the
detections ordered in time d_1a, d_2a, d_3a, d_4a, d_6b.
[0062] The set of tracks is denoted G, where a track is a sequence
of subtracks ordered in time such that the latest K-1 elements in
time of any subtrack s^1 in the sequence are the earliest K-1
elements of the subtrack s^2 that immediately succeeds s^1. A track
can be equivalently described as a sequence of detections ordered in
time or as a sequence of subtracks ordered in time.
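The succession rule above (the latest K-1 detections of s^1 equal the earliest K-1 detections of s^2) can be sketched as a small check. Subtracks are modeled here as tuples of detection identifiers, an illustrative layout rather than the patent's data structures.

```python
def is_valid_successor(s1, s2, K):
    """s2 may immediately succeed s1 in a track iff the latest K-1
    detections of s1 equal the earliest K-1 detections of s2."""
    return list(s1[-(K - 1):]) == list(s2[:K - 1])

def track_detections(subtracks, K):
    """Unroll a sequence of overlapping subtracks into the equivalent
    sequence of detections ordered in time."""
    dets = list(subtracks[0])
    for s1, s2 in zip(subtracks, subtracks[1:]):
        assert is_valid_successor(s1, s2, K)
        dets.append(s2[-1])  # each successor contributes one new detection
    return dets
```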
[0063] Returning to FIG. 6, in step 74, the system 10 decomposes
track (hypothesis) costs Γ in terms of the subtrack costs
θ ∈ ℝ^{|S|}, where each subtrack s is associated with cost θ_s.
Positive/negative values of θ_s discourage/encourage the use of the
subtrack s. The system 10 models a prior on the number of tracks in
an image using θ^0, which is the cost for instancing a track.
Positive/negative values of θ^0 discourage/encourage the presence of
more tracks in the packing. Using θ, the system defines the cost of
a track g, denoted Γ_g, using Equation 14, below:
\Gamma_g = \theta^0 + \sum_{s \in S} T_{sg}\,\theta_s \qquad \text{(Equation 14)}
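Equation 14 is a direct sum. A one-line sketch, assuming subtrack costs are kept in a dictionary keyed by subtrack identifier (an illustrative layout, not the patent's):

```python
def track_cost(theta0, theta, track):
    """Equation 14: the cost of track g is the instancing offset theta0
    plus the costs of the subtracks that compose it."""
    return theta0 + sum(theta[s] for s in track)
```

For example, with an instancing penalty of 10 and two subtracks of cost -4 and -7, the track cost is -1, so the track is worth instancing.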
[0064] To permit the construction of tracks that have fewer than K
detections, in step 76, the system 10 augments the set of
subtracks with subtracks padded with empty detections. Such
subtracks have no possible predecessors or successors.
[0065] FIG. 8 is a flowchart illustrating process steps being
carried out by the system 10 to generate a MWSP formulation of
multi-person pose estimation ("MPPE"), indicated generally in
method 80. MPPE is the task of identifying each unique person in an
image and annotating their body parts. As in tracking, the specific
identities of the people are not known in advance. MPPE is relevant
in multiple domains including, but not limited to, autonomous
driving, rehabilitation, and defense applications.
[0066] FIG. 9 is an image showing multi-person pose estimation.
Specifically, observations correspond to detections of body parts,
and hypotheses correspond to people. Lines of a common color
associate a person with the average position of each of their body
parts. There is a surjection of body parts (head, neck, etc.) to
colors for the dots that indicate the body parts.
[0067] Returning to FIG. 8, in step 82, the system 10 identifies a
plurality of body parts. For example, the system 10 identifies all
instances of each of fourteen human body parts (head, neck, and
left/right of the following: shoulder, elbow, wrist, hip, knee,
ankle). The system 10 can use an ANN to perform the identification.
It is noted that some of the detections can be false detections.
Some sets of detections correspond to the same ground truth
detection, but are separated in pixel space.
[0068] In step 84, a classifier (such as an artificial neural
network) associates each pair of detections with a cost for the pair
to be associated with a common person. The cost is the negative log
odds ratio of the probability that the two detections are/are not
associated with a common person. Similarly, a cost is made to
associate each detection with a person. The classifiers take as
input local statistics of pixel values around the detections, and/or
spatial and angular statistics concerning the relative locations of
the pair of detections. The cost terms over pairs of detections are
referred to as pairwise, and those over a single detection are
referred to as unary.
[0069] It is noted that person detection in computer vision
traditionally relies on tree (pictorial) structured models, which
describe the feasibility of poses of the human body according to a
cost function defined on a graph, where nodes correspond to body
parts and edges indicate adjacency. Thus, pairwise cost terms are
non-zero only between detections corresponding to the same, or
adjacent, body parts in the tree model.
[0070] FIGS. 10A-B are illustrations showing a tree model augmented
with additional connections, where the additional connections trade
off optimization difficulty and modeling power. Specifically,
FIGS. 10A-B show the system 10 modeling a person as an augmented
tree, in which each node represents a body part, the light grey
edges are connections of the traditional pictorial structure, and
the dark grey edges are augmented connections from the neck to all
parts not adjacent to the neck. In FIG. 10A, the augmented tree
model is displayed as a stick figure. In FIG. 10B, the
augmented tree model is superimposed on an image of a
person.
[0071] Returning to FIG. 8, in step 86, the system 10 aggregates
the detections to form people using an ILP formulation that admits
efficient inference using column generation. Specifically, the
system 10 employs an MWSP formulation where elements correspond to
detections of body parts, and sets correspond to people. The cost
of a person is the sum of the unary and pairwise terms associated
with the included detections, plus an offset. As in tracking, the
constant offset penalizes/rewards having additional people in the
image, which models a Bayesian prior belief on the number of people
in the image.
[0072] FIG. 11 is a flowchart illustrating process steps being
carried out by the system 10 to generate a MPPE MWSP formulation
for data association, as generally indicated in method 90. In step
92, the system 10 uses the term V to denote the set of human body
part detections (observations). A surjection of detections to human
body parts (head, neck, and left/right of the following: shoulder,
elbow, wrist, hip, knee, ankle) is given by R_d, which denotes the
human body part associated with detection d.
[0073] In step 94, the system 10 defines the set of people
(hypotheses) G as the power set of V. It is noted that a person can
contain more than one detection of any given body part. This is a
modeling decision, and is a consequence of the body part detector
firing in multiple places in close proximity corresponding to the
same ground truth body part. Similarly, since human body parts are
occluded in real images, it is possible for a hypothesis to contain
zero detections of some body parts.
[0074] In step 96, the system 10 defines the cost of a person using
terms θ^1 ∈ ℝ^{|D|} and θ^2 ∈ ℝ^{|D|×|D|}, which are indexed by d
and by d_1, d_2, respectively. The terms θ^1, θ^2 are referred to
as unary and pairwise, respectively. The term θ^1_d denotes the
cost of including detection d in a person. Similarly, the term
θ^2_{d1 d2} denotes the cost of including detections d_1, d_2 in a
common person. Here, positive/negative values of θ^1_d
discourage/encourage the use of the detection d in a person.
Similarly, positive/negative values of θ^2_{d1 d2}
discourage/encourage the presence of d_1, d_2 jointly in a single
person. The system 10 models a prior on the number of people in an
image using θ^0 to denote a constant cost associated with instancing
a person. Here, positive/negative values of θ^0
discourage/encourage the presence of more people in the packing.
[0075] In step 98, the system 10 models a person according to a
common tree structured model. The system 10 can augment the tree
structure by connecting the neck to every other body part, the left
shoulder to the right shoulder, and the right hip to the right
shoulder. These augmentations improve performance, as will be
discussed in greater detail below. The augmented tree structure is
respected with regard to the costs; thus, θ^2_{d1 d2} can only be
non-zero if R_{d1} = R_{d2}, or if R_{d2} is a child of R_{d1} in
the augmented tree. The mapping of people to costs is defined by
Equation 15, below:
\Gamma_g = \theta^0 + \sum_{d \in D} \theta^1_d\, G_{dg} + \sum_{d_1 \in D} \sum_{d_2 \in D} \theta^2_{d_1 d_2}\, G_{d_1 g} G_{d_2 g} \qquad \text{(Equation 15)}
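Equation 15 can be sketched directly, assuming unary costs in a dictionary keyed by detection and pairwise costs keyed by ordered pairs, with absent pairs treated as zero (an illustrative layout, not the patent's):

```python
def person_cost(theta0, theta1, theta2, dets):
    """Equation 15: instancing offset plus unary terms for each included
    body-part detection plus pairwise terms over ordered pairs of
    included detections (missing pairs contribute zero)."""
    cost = theta0 + sum(theta1[d] for d in dets)
    for d1 in dets:
        for d2 in dets:
            cost += theta2.get((d1, d2), 0.0)
    return cost
```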
[0076] FIG. 12 is a flowchart illustrating process steps being
carried out by the system 10 to generate multi-cell segmentation,
indicated generally in method 100. Multi-cell segmentation is the
task of identifying each unique biological cell in an image and
identifying the pixels associated with that cell. This is useful in
domains such as image microscopy, where characterizing the
movements and activities of cells is important, but the capacity of
human annotators is limited.
[0077] FIG. 13 is an image showing a multi-cell instance
segmentation. Specifically, observations correspond to superpixels,
and hypotheses are complete biological cells. The system 10 can
color code cells arbitrarily, with each cell being provided a
single color.
[0078] Returning to FIG. 12, in step 102, given a biological image,
the system 10 applies dimensionality reduction by partitioning the
set of pixels into sets called superpixels. The system 10 achieves
this by aggregating pixels that a classifier is extremely confident
correspond to the same cell or are both background. The classifier
uses local spatial and color statistics. This conversion reduces
the space of millions of pixels to thousands of superpixels, and
rarely meaningfully compromises the boundaries of any cells in the
ground truth.
[0079] In step 104, for each pair of adjacent superpixels, the
system 10 uses a classifier that provides a cost for the pair to be
associated with a common cell. Similarly, a classifier is used to
generate a cost for each superpixel to be part of a cell. These
costs are referred to as pairwise and unary, respectively.
[0080] In step 106, the system 10 computes a maximum radius and
area (volume in 3D images) of cells on annotated data. In step 108,
the system 10 formulates identifying each cell in the image as a
MWSP problem where elements are superpixels and sets are cells. The
cost of a cell is the sum of the pairwise terms associated with
pairs of superpixels in the cell, plus the unary terms associated
with superpixels in the cell. As in the other applications, the
system 10 adds a constant offset to the cost of a cell that
penalizes/rewards having additional cells in the image. This offset
models a Bayesian prior belief on the number of cells in the image.
The system 10 sets the cost of the cell to ∞ if the radius or the
volume of the cell significantly exceeds the known maximum radius
and volume of cells on the annotated data.
[0081] FIG. 14 is a flowchart illustrating process steps being
carried out by the system 10 to generate a MWSP formulation of
multi-cell segmentation, indicated generally in method 110. In step
112, the system 10 generates a set of observations d ∈ D
corresponding to a set of superpixels, and a set of hypotheses G
corresponding to a set of biological cells. The quality of a cell is
defined in terms of obeying the known structural properties of a
cell, which describe the radius, area (volume in 3D for
supervoxels), and agreement with the local image statistics. A
constraint on the radius of a cell is set so that for any cell
g ∈ G there exists a superpixel d*, which is referred to as an
anchor, such that all superpixels in the cell g are within a
user-defined distance R_max of d*. The term S_{d1 d2} denotes the
distance between the centers of superpixels d_1 and d_2. Spatial
compactness is satisfied for a given cell g ∈ G if Equation 16,
below, holds:
\exists\, d^* \in D \;\; \text{s.t.} \;\; [G_{dg} = 1] \Rightarrow [S_{d^* d} \le R_{\max}] \quad \forall d \in D \qquad \text{(Equation 16)}
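The compactness test of Equation 16 can be sketched as follows, assuming 2-D superpixel centers in a dictionary (a hypothetical layout) and Euclidean distance for S; the candidate anchor ranges over all superpixels, as in Equation 16:

```python
def is_compact(cell, centers, r_max):
    """Equation 16: there exists an anchor d* in D such that every
    superpixel in the cell lies within r_max of d*. The anchor need not
    itself be a member of the cell."""
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    return any(all(dist(centers[a], centers[d]) <= r_max for d in cell)
               for a in centers)
```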
[0082] In step 114, the system 10 expresses the radius constraint as
a cost. Specifically, for any g ∈ G, a penalty of ∞ is added to
Γ_g if g does not satisfy the radius constraint. The radius
constraint, expressed as an optimization, is given in Equation 17,
below:

\min_{d^* \in D} \Big( \sum_{d \in D} [S_{d^* d} > R_{\max}]\, G_{dg} \Big) \cdot \infty \qquad \text{(Equation 17)}
[0083] Optionally, the system 10 can require that the anchor be
present in the cell. This changes Equations 16 and 17 to the forms
expressed in Equation 18, below:

\exists\, d^* \in D \;\; \text{s.t.} \;\; G_{d^* g} = 1 \;\text{and}\; [G_{dg} = 1] \Rightarrow [S_{d^* d} \le R_{\max}] \quad \forall d \in D

\min_{d^* \in D} \Big[ (1 - G_{d^* g}) \cdot \infty + \Big( \sum_{d \in D} [S_{d^* d} > R_{\max}]\, G_{dg} \Big) \cdot \infty \Big] \qquad \text{(Equation 18)}
[0084] Next, the constraint on the area of a cell is considered. In
step 116, the system 10 uses V_max to denote the upper bound on
the area of a cell, and V_d to denote the area of a superpixel
d. A cell g ∈ G satisfies the constraint on the area of a cell if
Equation 19, below, holds:

\sum_{d \in D} G_{dg}\, V_d \le V_{\max} \qquad \text{(Equation 19)}
[0085] In step 118, the system 10 expresses the volume constraint as
a cost. For any g ∈ G, a penalty of ∞ is added to Γ_g if g does not
satisfy the volume constraint. The volume constraint is expressed as
a cost using Equation 20, below:

\Big[ V_{\max} < \sum_{d \in D} G_{dg}\, V_d \Big] \cdot \infty \qquad \text{(Equation 20)}
[0086] In step 120, the system 10 describes the image-level
evidence for the quality of a cell using θ^1_d and θ^2_{d1 d2}.
Specifically, the system 10 uses θ^1_d to denote the cost for
superpixel d to be part of any cell. Similarly, the system 10 uses
θ^2_{d1 d2} to denote the cost for d_1 and d_2 to belong to a common
cell. Positive/negative values of θ^1_d discourage/encourage the use
of the superpixel d in a cell. Similarly, positive/negative values
of θ^2_{d1 d2} discourage/encourage the presence of d_1, d_2 jointly
in a single cell. The system 10 models a prior on the number of
cells in an image using θ^0 to denote a cost associated with
instancing a cell. Positive/negative values of θ^0
discourage/encourage the presence of more cells in the packing. The
cost Γ_g of a hypothesis g is expressed in Equation 21, below:

\Gamma_g = \theta^0 + \sum_{d \in D} \theta^1_d\, G_{dg} + \sum_{d_1, d_2 \in D} \theta^2_{d_1 d_2}\, G_{d_1 g} G_{d_2 g} + \Big[ V_{\max} < \sum_{d \in D} G_{dg}\, V_d \Big] \cdot \infty + \min_{d^* \in D} \Big( \sum_{d \in D} [S_{d d^*} > R_{\max}]\, G_{dg} \Big) \cdot \infty \qquad \text{(Equation 21)}
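A sketch combining the pieces of Equation 21, assuming dictionaries of unary and pairwise costs, 2-D centers, and per-superpixel areas (an illustrative layout, not the patent's implementation):

```python
INF = float('inf')

def cell_cost(theta0, theta1, theta2, cell, centers, volume, r_max, v_max):
    """Equation 21: unary and pairwise image evidence plus the instancing
    offset, with infinite penalties for violating the volume and radius
    constraints (Equations 20 and 17)."""
    # Equation 20: total area must not exceed v_max
    if sum(volume[d] for d in cell) > v_max:
        return INF
    # Equation 17: some anchor must be within r_max of every member
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    if not any(all(dist(centers[a], centers[d]) <= r_max for d in cell)
               for a in centers):
        return INF
    cost = theta0 + sum(theta1[d] for d in cell)
    cost += sum(theta2.get((d1, d2), 0.0) for d1 in cell for d2 in cell)
    return cost
```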
[0087] The following discusses the system 10 solving the pricing
problem of Equation 13 in the context of the MWSP formulations. In
pricing for multi-object tracking, the system 10 formulates the
task of identifying the lowest reduced cost track (hypothesis) as a
dynamic program. The system 10 considers the structure of that
dynamic program and specifies that a subtrack s may be preceded by
another subtrack ŝ if the least recent K-1 detections in s
correspond to the most recent K-1 detections in ŝ. The system 10
denotes the set of valid subtracks that may precede a subtrack s
as {→s}. The system 10 uses ℓ_s to denote the reduced cost of the
lowest reduced cost track that terminates at subtrack s. Ordering
the subtracks by the time of their last detection allows efficient
computation of ℓ, using the dynamic program expressed in Equation
22, below:

\ell_s \leftarrow \theta_s + \lambda_{s_K} + \min\Big\{ \min_{\hat{s} \in \{\rightarrow s\}} \ell_{\hat{s}},\;\; \theta^0 + \sum_{k=1}^{K-1} \lambda_{s_k} \Big\} \qquad \text{(Equation 22)}
[0088] The system 10 can choose to add to Ĝ not only the lowest
reduced cost track, but also other distinct negative reduced cost
tracks. Such strategies can be implemented by the system 10 since
the dynamic program produces the lowest reduced cost track
terminating at each subtrack. One such strategy adds to Ĝ the lowest
reduced cost track terminating at each detection (excluding those
with non-negative reduced cost).
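The dynamic program of Equation 22 can be sketched as follows, assuming subtracks are tuples of detection identifiers already sorted by the time of their last detection, and that a predecessor map {→s} is precomputed (both illustrative assumptions):

```python
def lowest_reduced_cost_track(subtracks, preds, theta, theta0, lam, K):
    """Dynamic program of Equation 22. subtracks: K-tuples of detections,
    ordered by time of last detection. preds[s]: valid predecessors of s.
    Returns the best reduced cost and the terminating subtrack."""
    ell, back = {}, {}
    for s in subtracks:
        # start a new track at s: pay the instancing cost theta0 plus the
        # duals of the first K-1 detections (Equation 22, second branch)
        start = theta0 + sum(lam[s[k]] for k in range(K - 1))
        best_prev, prev_cost = None, start
        for p in preds.get(s, []):
            if ell[p] < prev_cost:
                best_prev, prev_cost = p, ell[p]
        # extend: each step adds the subtrack cost and the newest dual
        ell[s] = theta[s] + lam[s[K - 1]] + prev_cost
        back[s] = best_prev
    s = min(ell, key=ell.get)
    return ell[s], s
```

The `back` pointers allow the full track to be recovered by walking predecessors, which supports the multi-track strategies described above.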
[0089] In pricing for multi-person pose estimation, the system 10
identifies the lowest reduced cost person (hypothesis), which can
be formulated as a set of dynamic programs. A graph is used where
nodes correspond to human body parts and edges indicate adjacency.
A subgraph in which the neck is removed corresponds to a tree
structure, which motivates the use of dynamic programming to solve
the pricing problem. During the pricing step, the system 10 iterates
through the power set of neck detections and computes the lowest
reduced cost person containing those neck detections. The power set
of neck detections is indexed with p, and [g → p] = 1 is used to
indicate that the neck detections in g are exactly those in p.
Pricing for an arbitrary subset p of the neck detections is
expressed in Equation 23, below:

\min_{\substack{g \in G \\ [g \rightarrow p] = 1}} \Gamma_g + \sum_{d \in D} \lambda_d\, G_{dg} \qquad \text{(Equation 23)}
[0090] To solve Equation 23 as a dynamic program, the system 10
enumerates the power set of pairs of adjacent detections in the
tree in the problem domain. Specifically, the system 10 provides
notation to assist in formulating Equation 23 as a dynamic program.
The system 10 uses R to denote the set of human body parts, which
is indexed by r. The system 10 uses S^r to denote the power set of
detections of part r, and indexes it with s. The system 10 uses D^r
to denote the set of detections of part r. S^r is described using
S^r ∈ {0, 1}^{|D|×|S^r|}, where S^r_{ds} = 1 indicates that
detection d is in set s. For convenience, the system can define the
neck as part 0, and thus the power set of neck detections is
denoted S^0.
[0091] It is noted that when conditioned on a specific set of neck
detections (denoted s^0), the pairwise costs from the neck
detections to all other detections can be added to the unary costs
of the other detections. Thus, the augmented-tree structure becomes
a typical tree structure, and exact inference can be done via
dynamic programming. The system 10 makes the tree directed by
arbitrarily choosing a single node to be the root, and orienting
edges in the graph going away from the root.
[0092] The system 10 defines the set of children of any human body
part r in the tree graph as {r→}. The system 10 defines
μ^r_ŝ as the reduced cost of the lowest reduced cost
sub-tree rooted at r, given that its parent r̂ takes on state ŝ.
The term μ^r_ŝ includes the cost of the pairwise terms between
detections of part r̂ and detections of part r, as expressed in
Equation 24, below:

\mu^r_{\hat{s}} = \min_{s \in S^r} \sum_{\hat{d} \in D^{\hat{r}}} \sum_{d \in D^r} S^{\hat{r}}_{\hat{d}\hat{s}}\, S^r_{ds}\, \theta^2_{\hat{d} d} + \nu^r_s \qquad \text{(Equation 24)}
[0093] Specifically, in Equation 24, the term

\sum_{\hat{d} \in D^{\hat{r}}} \sum_{d \in D^r} S^{\hat{r}}_{\hat{d}\hat{s}}\, S^r_{ds}\, \theta^2_{\hat{d} d}

computes the pairwise costs between part r and its parent r̂, while
ν^r_s accounts for the cost of the sub-tree rooted at part r with
state s, and is defined by Equations 25 and 26, below:

\nu^r_s = \rho^r_s + \sum_{\bar{r} \in \{r \rightarrow\}} \mu^{\bar{r}}_s \qquad \text{(Equation 25)}

\rho^r_s = \sum_{d \in D^r} (\theta^1_d + \lambda_d)\, S^r_{ds} + \sum_{d_1 \in D^r} \sum_{d_2 \in D^r} \theta^2_{d_1 d_2}\, S^r_{d_1 s} S^r_{d_2 s} + \sum_{d_1 \in D^0} \sum_{d_2 \in D^r} \theta^2_{d_1 d_2}\, S^0_{d_1 s^0} S^r_{d_2 s} \qquad \text{(Equation 26)}
[0094] To compute μ^r_ŝ for each ŝ ∈ S^{r̂}, the system 10 needs to
iterate over all s ∈ S^r. For most problems, this is feasible.
However, when |D^r| = |D^{r̂}| = 15, the system 10 would have to
enumerate a joint space of over one billion configurations, which
can be expensive. Accordingly, the system 10 can use nested Benders
decomposition, which is able to solve the dynamic program exactly,
with computation that in practice scales on the order of 2^{|D^r|}
rather than 2^{|D^r|} × 2^{|D^{r̂}|}.
[0095] In pricing for multi-cell segmentation, the system 10 finds
negative reduced cost cells (hypotheses) by exploiting the fact that
cells are small and compact. In Equation 21, above, every cell with
non-infinite cost is associated with an anchor d* in close
proximity to all other superpixels (observations) that compose the
cell. The system 10 solves pricing by conditioning on the choice of
the anchor d*, and finds the lowest reduced cost cell, denoted
g_{d*}, as expressed by Equation 27, below:

g_{d^*} \leftarrow \arg\min_{\substack{g \in G \\ G_{dg} = 0 \;\forall d : S_{d d^*} > R_{\max}}} \; \theta^0 + \sum_{d \in D} (\theta^1_d + \lambda_d)\, G_{dg} + \sum_{d_1, d_2 \in D} \theta^2_{d_1 d_2}\, G_{d_1 g} G_{d_2 g} \qquad \text{(Equation 27)}
[0096] The system 10 reconfigures the optimization in Equation 27
as an ILP (Equations 30-33, below) using decision variables
x ∈ {0, 1}^{|D|} and y ∈ {0, 1}^{|D|×|D|}, which are indexed by d
and by d_1, d_2, respectively, and where x and y are defined in
Equations 28 and 29, below:

x_d = G_{d g_{d^*}} \qquad \text{(Equation 28)}

y_{d_1 d_2} = G_{d_1 g_{d^*}}\, G_{d_2 g_{d^*}} \qquad \text{(Equation 29)}

\min_{\substack{x_d \in \{0,1\} \\ x_d = 0 \;\forall d : S_{d d^*} > R_{\max} \\ y_{d_1 d_2} \ge 0}} \theta^0 + \sum_{d \in D} (\theta^1_d + \lambda_d)\, x_d + \sum_{d_1, d_2 \in D} \theta^2_{d_1 d_2}\, y_{d_1 d_2} \qquad \text{(Equation 30)}

y_{d_1 d_2} \le x_{d_1} \quad \forall d_1, d_2 \in D \qquad \text{(Equation 31)}

y_{d_1 d_2} \le x_{d_2} \quad \forall d_1, d_2 \in D \qquad \text{(Equation 32)}

-y_{d_1 d_2} + x_{d_1} + x_{d_2} \le 1 \quad \forall d_1, d_2 \in D \qquad \text{(Equation 33)}
[0097] The system enforces Equations 28 and 29 with Equations 31,
32, and 33. Equations 31 and 32 state that y_{d1 d2} cannot
be set to one unless both d_1, d_2 are included in the cell
g_{d*}. Similarly, Equation 33 states that if both d_1,
d_2 are included in g_{d*}, then y_{d1 d2} is set to
one. It is noted that y is entirely governed by x, and does
not need to be explicitly required to be integral in order for the
ILP solver to produce an integer solution.
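For a tiny instance, the anchored pricing problem of Equations 27 and 30-33 can be checked by brute force, substituting y_{d1 d2} = x_{d1} x_{d2} directly, which is exactly what constraints 31-33 enforce at the optimum. The enumeration below is illustrative only and does not scale to realistic problem sizes.

```python
from itertools import product

def price_cell(anchor, D, dist, r_max, theta0, theta1, theta2, lam):
    """Brute-force counterpart of Equations 27 and 30-33: enumerate
    x in {0,1}^|D| with x_d = 0 whenever dist(d, d*) > r_max, and set
    y_{d1d2} = x_{d1} x_{d2}. Returns the best cost and best cell."""
    allowed = [d for d in D if dist(d, anchor) <= r_max]
    best_cost, best_cell = float('inf'), None
    for bits in product([0, 1], repeat=len(allowed)):
        cell = [d for d, b in zip(allowed, bits) if b]
        cost = theta0 + sum(theta1[d] + lam[d] for d in cell)
        cost += sum(theta2.get((d1, d2), 0.0)
                    for d1 in cell for d2 in cell)
        if cost < best_cost:
            best_cost, best_cell = cost, frozenset(cell)
    return best_cost, best_cell
```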
[0098] The system 10 can generate many distinct hypotheses with
negative reduced cost when solving Equation 27, as a consequence of
solving using different choices of d*. Thus, the system 10 can add
to the nascent set Ĝ each hypothesis with negative reduced cost
generated by solving Equation 27. The system 10 can resolve the
master problem after any negative reduced cost hypothesis is
generated. If the anchor must be included in a cell for the cell to
be feasible, then x_{d*} is required to be set to one in Equation
30.
[0099] FIG. 15 is a flowchart illustrating process steps being
carried out by the system 10 to tighten the LP relaxation of the
MWSP (e.g., tightening the restricted master problem using
subset-row inequalities), indicated generally in method 130.
Consider four hypotheses G = {g_1, g_2, g_3, g_4} over three
observations D = {d_1, d_2, d_3}, where the first three hypotheses
contain the pairs {d_1, d_2}, {d_1, d_3}, {d_2, d_3}, respectively,
and the fourth hypothesis contains all three {d_1, d_2, d_3}. The
hypothesis costs are Γ_{g1} = Γ_{g2} = Γ_{g3} = -4 and
Γ_{g4} = -5. An optimal integer solution sets γ_{g4} = 1, and has a
cost of -5. A lower cost fractional solution sets
γ_{g1} = γ_{g2} = γ_{g3} = 0.5 and γ_{g4} = 0, which has cost -6.
Hence, the LP relaxation is loose.
[0100] The LP relaxation of MWSP can be tightened by the system 10
employing subset-row inequalities in such a way as to preserve the
structure of the pricing problem. The system 10 can add them to the
pricing problem, parameterized by two integers m_1, m_2 and a subset
D̂ ⊆ D of cardinality m_1 m_2 - 1. Subset-row inequalities require
that the number of selected hypotheses containing m_1 or more
members of D̂ be no greater than m_2 - 1. The most general form of
subset-row inequalities is written in Equation 34, below:

\sum_{g \in G} \gamma_g \Big[ \sum_{d \in D} G_{dg}\, [d \in \hat{D}] \ge m_1 \Big] \le m_2 - 1 \qquad \text{(Equation 34)}
[0101] Subset-row inequalities where m_1 = m_2 = 2 will be referred
to as triplets. However, all content in this section is fully
applicable to the other subset-row inequalities modeled in the
present disclosure.
[0102] In step 132, the system 10 generates an MWSP formulation
tightened using triplets. In step 134, the system 10 determines
whether the subset-row inequalities destroy the structure of the
pricing problem. When the subset-row inequalities do not destroy
the structure of the pricing problem, the system 10 proceeds to
step 136, where the system 10 solves the pricing problem while
modifying the structure of the pricing problem. This allows the
system to use subset-row inequalities to tighten the LP relaxation
for multi-cell segmentation. When the subset-row inequalities
destroy the structure of the pricing problem, the system 10
proceeds to step 138, where the system 10 solves the pricing
problem without modifying the structure of the pricing problem.
This permits the use of subset-row inequalities to tighten the LP
relaxations for multi-person tracking and multi-person pose
estimation. Each step is discussed in further detail below.
[0103] In step 132, the system 10 tightens the LP relaxation of
MWSP by enforcing that, for any set of three unique observations,
the number of selected hypotheses that include two or more of its
members can be no larger than one. The system 10 denotes the set of
sets of three unique observations by C, and indexes it with c. The
membership of c is described using [d ∈ c], where [d ∈ c] = 1 if
observation d is in c, and [d ∈ c] = 0 otherwise. The mapping of
triplets to hypotheses is described using the matrix
C ∈ {0, 1}^{|C|×|G|}, which is indexed by c, g. Here, C_{cg} = 1 if
at least two of the observations in c are present in g. The LP
relaxation for MWSP tightened using triplets is expressed using
Equation 35, below:

\min_{\gamma_g \ge 0 \;\forall g \in G} \sum_{g \in G} \Gamma_g\, \gamma_g \quad \text{s.t.} \quad \sum_{g \in G} G_{dg}\, \gamma_g \le 1 \;\; \forall d \in D, \qquad \sum_{g \in G} C_{cg}\, \gamma_g \le 1 \;\; \forall c \in C \qquad \text{(Equation 35)}
[0104] A dual form of Equation 35 is expressed in Equation 36,
below, which uses dual variables ψ ∈ ℝ^{|C|}, indexed by c, where
ψ_c is the dual variable associated with the constraint in Equation
35 over c.

\text{Eq. 35} = \max_{\substack{\lambda_d \ge 0 \;\forall d \in D \\ \psi_c \ge 0 \;\forall c \in C}} -\sum_{d \in D} \lambda_d - \sum_{c \in C} \psi_c \quad \text{s.t.} \quad \Gamma_g + \sum_{d \in D} G_{dg}\, \lambda_d + \sum_{c \in C} C_{cg}\, \psi_c \ge 0 \;\; \forall g \in G \qquad \text{(Equation 36)}
[0105] The system can solve Equation 35 using a generalization of
column generation called column/row generation ("CRG"). CRG
exploits the fact that the dual LP relaxation has a finite number
of variables, thus making it amenable to optimization via the
cutting plane method.
[0106] As in column generation, the system 10 uses CRG to construct
a sufficient set Ĝ by adding negative reduced cost hypotheses
(violated dual constraints), given fixed dual variables. CRG
augments this procedure by identifying a sufficient set Ĉ via
violated constraints given a fixed primal solution. CRG
begins with the sets Ĝ, Ĉ equal to the empty set, then iterates
between solving the optimization in Equation 35 over the sets Ĝ, Ĉ,
and adding elements to Ĝ, Ĉ. Each iteration produces primal/dual
solutions, which facilitate the identification of violated
primal/dual constraints. When no violated primal/dual constraints
exist, the system 10 terminates CRG. Identifying violated primal
constraints is done by iterating over c ∈ C to identify the
c ∈ C that maximizes Σ_{g ∈ Ĝ} γ_g C_{cg}, given fixed γ. While C
is too large to include each element as a constraint in the LP
relaxation, it is not too large to search over. This is because
only triplets where each detection is associated with a
fractional-valued hypothesis in γ need be considered when iterating
over c ∈ C. Finding the most violated dual constraint (which is
called pricing) corresponds to the optimization expressed in
Equation 37, below:
\min_{g \in G} \Gamma_g + \sum_{d \in D} G_{dg}\, \lambda_d + \sum_{c \in \hat{C}} C_{cg}\, \psi_c \qquad \text{(Equation 37)}
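The violated-triplet search (finding the c that maximizes Σ_g γ_g C_cg for a fixed fractional γ) can be sketched as follows, assuming hypotheses are stored as sets of observation identifiers (an illustrative layout):

```python
from itertools import combinations

def most_violated_triplet(gamma, hyp_members, observations):
    """Search over all triplets c of observations for the one maximizing
    sum_g gamma_g * C_cg, where C_cg = 1 if hypothesis g contains at
    least two members of c. A triplet is violated if the sum exceeds 1."""
    best_c, best_val = None, 0.0
    for c in combinations(observations, 3):
        val = sum(g_val for g, g_val in gamma.items()
                  if len(hyp_members[g] & set(c)) >= 2)
        if val > best_val:
            best_c, best_val = c, val
    return best_c, best_val
```

On the loose fractional solution discussed earlier (three pair hypotheses at weight 0.5 each), the single triplet of all three observations is violated with value 1.5 > 1, which is exactly the cut that removes that fractional solution.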
[0107] FIG. 16 is an algorithm describing the CRG. Specifically, in
lines 0-1, the system 10 initializes the nascent sets of hypotheses
G and triplets C to the empty set. In lines 2-15, the system 10
construct nascent sets G and C. The system 10 iterates until the
flag "did_augment" is set to false, meaning that the solution
.gamma. satisfies all triplets, and no negative reduced cost
hypotheses exist. Specifically, in line 3, the system 10 sets
"did_augment" to false, which indicates that the system 10 has not
edited G or C this iteration. In line 4, the system 10 solves the
restricted master problem producing primal and dual solutions. In
lines 5-6, the system 10 identifies the lowest reduced cost
hypothesis g*, and the triplet constraint that is most violated c*.
In lines 7-10, the system 10 adds g* to G if g* has negative
reduced cost, and sets "did_augment" to true, meaning that
optimization should continue after this iteration of the loop over
lines 2-15. In lines 11-14, if c* corresponds to a violated primal
constraint, the system 10 adds c* to C, and sets "did_augment" to
true, meaning that optimization should continue after this
iteration of the loop over lines 2-15. In line 16, the system 10
solves set packing using only G. If the LP relaxation is tight in
the last iteration of lines 2-15, then the system 10 uses the
.gamma. provided during that iteration. In line 17, the system 10
returns the solution .gamma..
[0108] Intelligent schedules can be employed over the operations
(e.g., solving the restricted master problem, augmenting G, and
augmenting C). For example, multiple elements can be added to G
and/or C each time the restricted master problem is solved.
Alternatively, the system can augment C only when no negative
reduced cost elements exist to be added to G.
[0109] Returning to FIG. 15, in step 136, the system 10 solves the
pricing problem while modifying the structure of the pricing
problem. For many problem domains, the system 10 can solve the
pricing problem by adding the triplets to the optimization. One such
example is the case of multi-cell instance segmentation. The
corresponding pricing problem conditioned on the anchor d* is
expressed in Equations 38 and 39, below. The system 10 uses the
term z_c ∈ {0, 1} to denote the decision associated with triplet c
for all c ∈ Ĉ. Here, z_c = 1 if two or more members of triplet c
are included in the cell.
$$\min_{\substack{x_d \in \{0,1\},\; x_d = 0\;\forall d < d^*\\ y_{d_1 d_2} \ge 0,\; z_c \ge 0}} \theta^0 + \sum_{d \in D} \big(\theta^1_d + \lambda_d\big)\,x_d + \sum_{d_1, d_2 \in D} \theta^2_{d_1 d_2}\, y_{d_1 d_2} + \sum_{c \in \hat{C}} \psi_c\, z_c \qquad \text{Equation 38}$$

$$\begin{aligned}
y_{d_1 d_2} &\le x_{d_1} \quad \forall d_1, d_2 \in D\\
y_{d_1 d_2} &\le x_{d_2} \quad \forall d_1, d_2 \in D\\
-z_c + x_{d_3} + x_{d_4} &\le 1 \quad \forall c \in \hat{C},\; d_3 \in c,\; d_4 \in c,\; d_3 \ne d_4
\end{aligned} \qquad \text{Equation 39}$$
[0110] It is noted that z_c is described entirely by x and is set
to the smallest possible value at optimality, since ψ_c is
non-negative. Thus, the system 10 does not require z_c to be
integer, since the integrality of z is assured given that x is
integral.
[0111] In step 138, the system solves the pricing problem without
modifying the structure of the pricing problem. Specifically, the
system 10 finds negative reduced cost primal variables given the
dual solution λ, ψ, where ψ cannot be directly considered when
using a specialized solver for pricing. First, the system 10
denotes the reduced cost of a hypothesis g as V(Γ, λ, ψ, g). The
reduced cost of the lowest reduced cost hypothesis is denoted as
V*(Γ, λ, ψ). V(Γ, λ, ψ, g) and V*(Γ, λ, ψ) are expressed in
Equation 40, below:
$$\begin{aligned}
V(\Gamma, \lambda, \psi, g) &= \Gamma_g + \sum_{d \in D} \lambda_d\,G_{dg} + \sum_{c \in \hat{C}} \psi_c\,C_{cg}\\
V^*(\Gamma, \lambda, \psi) &= \min_{g \in \hat{G}} V(\Gamma, \lambda, \psi, g)
\end{aligned} \qquad \text{Equation 40}$$
[0112] The system 10 applies a specialized solver and ignores the
triplet term Σ_{c∈Ĉ} ψ_c C_{cg}, providing a lower bound.
Specifically, the system 10 can use a branch and bound ("B&B")
approach. The set of branches in a B&B tree is denoted B. Each
branch b ∈ B is defined by two sets, D_{b+} and D_{b-}. These
correspond to observations that must be included in the hypothesis
and those that must not be included in the hypothesis,
respectively. The set of all hypotheses that are consistent with
both D_{b+} and D_{b-} is expressed as G_{b±}. The bounding and
branching operators will be discussed in further detail below. The
initial branch b is defined by D_{b+} = D_{b-} = { }.
[0113] Regarding the bounding operator, pricing ignoring the ψ
terms is referred to as the independent pricing problem. The term
V^b(Γ, λ, ψ) denotes the value of the lowest reduced cost over
columns in G_{b±}. The system 10 computes a lower bound for this
value, denoted V^b_lb, by independently optimizing the independent
pricing problem and the triplet penalty, as expressed below in
Equation 41:
$$\begin{aligned}
V^b(\Gamma, \lambda, \psi) &= \min_{g \in G_{b\pm}} V(\Gamma, \lambda, \psi, g)
= \min_{g \in G_{b\pm}} \Gamma_g + \sum_{d \in D} \lambda_d\,G_{dg} + \sum_{c \in \hat{C}} \psi_c\,C_{cg}\\
&\ge \min_{g \in G_{b\pm}} \Big( \Gamma_g + \sum_{d \in D} \lambda_d\,G_{dg} \Big) + \min_{g \in G_{b\pm}} \sum_{c \in \hat{C}} \psi_c\,C_{cg}\\
&\ge \min_{g \in G_{b\pm}} \Big( \Gamma_g + \sum_{d \in D} \lambda_d\,G_{dg} \Big) + \sum_{c \in \hat{C}} \psi_c \Big[ \sum_{d \in D} [d \in c][d \in D_{b+}] \ge 2 \Big]\\
&= V^b_{lb}(\Gamma, \lambda, \psi)
\end{aligned} \qquad \text{Equation 41}$$
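The bound of Equation 41 can be illustrated with a brute-force sketch. All data here (the costs, duals, and triplet penalties) are hypothetical toy values, and a real implementation would use the application's dynamic program instead of enumeration.

```python
from itertools import combinations

# Hypothetical toy data: observations, duals lambda_d, triplet penalties psi_c.
D = [0, 1, 2, 3]
lam = {0: 0.1, 1: 0.2, 2: 0.0, 3: 0.3}
triplets = {frozenset({0, 1, 2}): 0.5}

def gamma_cost(g):                 # Gamma_g (assumed form)
    return 1.0 - 0.6 * len(g)

def v_lb(D_plus, D_minus):
    """Lower bound of Equation 41: optimize the independent pricing term over
    hypotheses consistent with the branch, then add only the triplet penalties
    already forced active by D_plus (two or more members forced included)."""
    consistent = [frozenset(s)
                  for k in range(1, len(D) + 1)
                  for s in combinations(D, k)
                  if D_plus <= set(s) and not (D_minus & set(s))]
    best = min(gamma_cost(g) + sum(lam[d] for d in g) for g in consistent)
    forced = sum(psi for c, psi in triplets.items() if len(c & D_plus) >= 2)
    return best + forced
```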
[0114] The system 10 can compute min_{g∈G_{b±}} Γ_g + Σ_{d∈D} λ_d G_{dg}
for applications in multi-object tracking and multi-person pose
estimation. In multi-person tracking, when performing dynamic
programming, the system 10 enforces that g ∈ G_{b±} as follows: 1)
Enforcing D_{b-}: for each subtrack s that includes a d ∈ D_{b-},
the system sets the corresponding θ_s value to ∞; and 2) Enforcing
D_{b+}: for each subtrack s that includes a detection co-occurring
in time with any d ∈ D_{b+} (other than d itself), the system sets
θ_s to ∞. Similarly, the system 10 does not consider starting a
track after the occurrence of the first member of D_{b+} in time.
After completing the dynamic program generating tracks, the system
10 sets the reduced cost to ∞ for any track terminating prior to
the point in time of the last member of D_{b+}. In multi-person
pose estimation, the system 10 forces the detections in D_{b+} and
D_{b-} to be active and inactive, respectively, when generating a
person.
[0115] The branching operation will now be discussed. The system 10
expresses an upper bound on V^b(Γ, λ, ψ) as V^b_ub(Γ, λ, ψ). The
system 10 constructs this bound by adding in the active ψ terms
ignored when constructing V^b_lb(Γ, λ, ψ). Setting
g_b = arg min_{g∈G_{b±}} Γ_g + Σ_{d∈D} λ_d G_{dg} yields Equation
42, below:
$$V^b_{ub}(\Gamma, \lambda, \psi) = \Gamma_{g_b} + \sum_{d \in D} \lambda_d\,G_{d g_b} + \sum_{c \in \hat{C}} \psi_c\,C_{c g_b} \qquad \text{Equation 42}$$
[0116] The largest triplet term ψ_c that is included in
V^b_ub(Γ, λ, ψ) but not V^b_lb(Γ, λ, ψ) is expressed in Equation
43, below:

$$c^* \leftarrow \arg\max_{c \in \hat{C}} \; \psi_c\,C_{c g_b} \Big[ \sum_{d \in D} [d \in c][d \in D_{b+}] < 2 \Big] \qquad \text{Equation 43}$$
[0117] The system 10 generates eight new branches, one for each of
the eight different ways of splitting the observations in the
triplet corresponding to c* between the include (+) and exclude (-)
sets. FIG. 17 is a table showing the splits enumerated for a
triplet of observations c* = {d1, d2, d3}. Specifically, the system
10 enumerates the eight sets, each describing one way of
partitioning the three observations d1, d2, d3 between the include
(+) and exclude (-) sets for the children of branch b. For example,
branch D_{b8} excludes d_1 and d_2 but includes d_3, so
D_{b8-} = D_{b-} ∪ {d_1, d_2} and D_{b8+} = D_{b+} ∪ {d_3}.
[0118] It is noted that not all child nodes need be created, as
some are guaranteed to be infeasible if some observations in c*
already belong to D_{b-} or D_{b+}. For example, assume that
c* = {d_1, d_2, d_3}. If d_1 ∈ D_{b+}, then the child nodes D_{b2},
D_{b4}, D_{b6} and D_{b8} will all be infeasible, because d_1 would
belong to both the include and exclude decisions. Furthermore, if
d_3 ∈ D_{b-}, then the nodes D_{b5}, D_{b6}, D_{b7} and D_{b8} are
infeasible. Thus, only the nodes D_{b1} and D_{b3} are feasible,
and g_b remains an optimal solution for D_{b1}. Note that the
branch operator is not applied if ψ_{c*} = 0.
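The enumeration of FIG. 17, together with the infeasibility pruning described above, can be sketched as follows (the function and variable names are illustrative only):

```python
from itertools import product

def child_branches(c_star, D_plus, D_minus):
    """Enumerate the eight include/exclude splits of triplet c* (FIG. 17) and
    drop children that contradict decisions already fixed in D_plus/D_minus."""
    children = []
    for signs in product((False, True), repeat=3):  # False: exclude, True: include
        plus = set(D_plus) | {d for d, inc in zip(c_star, signs) if inc}
        minus = set(D_minus) | {d for d, inc in zip(c_star, signs) if not inc}
        if plus & minus:     # an observation forced both ways -> infeasible child
            continue
        children.append((plus, minus))
    return children
```

With no prior decisions all eight children survive; fixing one observation as included prunes four, and additionally fixing another as excluded leaves the two feasible children of the paragraph's example.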
[0119] The following section discusses upper bounds on the Lagrange
multipliers λ, called dual optimal inequalities ("DOI"), which do
not remove all dual optimal solutions. The system 10's use of DOI
decreases the search space that column generation needs to explore,
thus decreasing the number of iterations of pricing required. For
various applications, including cutting stock and image
segmentation, DOI are used to dramatically decrease optimization
time without sacrificing optimality.
[0120] Regarding basic dual optimal inequalities, it is noted that
at any given iteration of column generation, the optimal solution
to the primal LP relaxation need not lie in the polyhedron of G. If
limited to producing a primal solution over G, it is useful to
allow Σ_{g∈G} G_{dg} γ_g to exceed one for some d ∈ D.
[0121] The system 10 uses a slack term ξ_d ≥ 0 that tracks the
presence of any observation included more than once and prevents it
from contributing to the objective when the corresponding
contribution is negative. Specifically, the system 10 offsets the
cost for "over-including" an observation with a cost that at least
compensates, and likely overcompensates. It is noted that the
removal of a detection d from a hypothesis increases the cost of
the hypothesis by no more than Ξ_d for each d, where Ξ_d is
expressed by Equation 44, and the expanded MWSP objective and its
dual LP relaxation are expressed by Equation 45, both below:
$$\Xi_{\bar{d}} \ge \max_{\substack{g \in \hat{G},\; \bar{g} \in \hat{G}\\ G_{dg} = G_{d\bar{g}}[d \ne \bar{d}]\;\forall d \in D}} \max\big(0,\; \Gamma_g - \Gamma_{\bar{g}}\big) \qquad \text{Equation 44}$$

$$\min_{\substack{\gamma_g \ge 0,\; \xi_d \ge 0\\ \sum_{g \in \hat{G}} G_{dg}\gamma_g - \xi_d \le 1\;\forall d \in D}} \sum_{g \in \hat{G}} \Gamma_g\,\gamma_g + \sum_{d \in D} \Xi_d\,\xi_d \;=\; \max_{\substack{\Xi_d \ge \lambda_d \ge 0\\ \Gamma_g + \sum_{d \in D} G_{dg}\lambda_d \ge 0\;\forall g \in \hat{G}}} -\sum_{d \in D} \lambda_d \qquad \text{Equation 45}$$
[0122] It is noted that the dual relaxation bounds λ by Ξ from
above. These bounds are called dual optimal inequalities (DOI). To
ensure that the DOI are not active at termination of column
generation, the system 10 offsets Ξ with a tiny positive constant.
[0123] It should be understood that the use of the DOI does not cut
off all dual optimal solutions when G = Ĝ. Specifically, the system
10 can map any solution γ, ξ, where ξ is optimal given γ, to a
feasible solution γ̄, ξ̄, where ξ̄ is a zero vector, such that the
cost of γ̄, ξ̄ is less than or equal to that of γ, ξ. To achieve
this, the system 10 iterates over d, then converts hypotheses
including d to those not including d. The system 10 defines
g^{-d̂} in Equation 46, below, for all pairs d̂ ∈ D, g ∈ Ĝ:

$$G_{d g^{-\hat{d}}} = G_{dg}\,[\hat{d} \ne d] \qquad \text{Equation 46}$$
[0124] The system 10 converts γ, ξ to γ̄, ξ̄ by iterating over d,
then over g ∈ Ĝ such that G_{dg} = 1 and γ_g > 0, and then applying
the update expressed in Equation 47, below:

$$\begin{aligned}
\alpha &\leftarrow \min(\gamma_g, \xi_d)\\
\gamma_g &\leftarrow \gamma_g - \alpha\\
\gamma_{g^{-d}} &\leftarrow \gamma_{g^{-d}} + \alpha\\
\xi_d &\leftarrow \xi_d - \alpha
\end{aligned} \qquad \text{Equation 47}$$
[0125] In Equation 47, α is the magnitude of the update to the
terms γ_g, γ_{g^{-d}}, ξ_d. The change in the objective under the
update in Equation 47 is expressed in Equation 48, below:

$$\alpha\,\big(-\Xi_d + \Gamma_{g^{-d}} - \Gamma_g\big) \qquad \text{Equation 48}$$
[0126] Since Ξ_d ≥ Γ_{g^{-d}} - Γ_g by definition, and α is
positive, the total change in Equation 48 is non-positive, so the
mapped solution costs no more than the original. Thus, there exists
an optimal primal solution in which ξ is the zero vector.
Therefore, the use of DOI does not remove all dual optimal
solutions.
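A minimal sketch of the Equation 46/47 conversion follows. The dictionary representation (γ keyed by hypotheses stored as frozensets, ξ keyed by observations) and all values are hypothetical, chosen only to illustrate the mass-shifting update.

```python
def remove_slack(gamma, xi):
    """Map (gamma, xi) to an equally good solution with zero slack by applying
    the Equation 47 update: move mass from each hypothesis g containing d to
    g^{-d} (g with d removed, per Equation 46) until xi_d is exhausted."""
    gamma, xi = dict(gamma), dict(xi)
    for d in list(xi):
        for g in [h for h in list(gamma) if d in h and gamma[h] > 0]:
            if xi[d] <= 0:
                break
            alpha = min(gamma[g], xi[d])       # magnitude of the update
            gamma[g] -= alpha
            g_minus = g - {d}                  # g^{-d}
            gamma[g_minus] = gamma.get(g_minus, 0.0) + alpha
            xi[d] -= alpha
    return gamma, xi
```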
[0127] This section discusses dual optimal inequalities that are no
looser than those discussed above. The system 10 uses Ĝ* to denote
the set of hypotheses that are subsets of hypotheses in G. Thus, at
any given point in column generation, the system 10 bounds λ_d as
in Equation 44, above, except replacing optimization over Ĝ with
Ĝ*, which is expressed in Equation 49, below:

$$\Xi_{\bar{d}} \ge \max_{\substack{g \in \hat{G}^*,\; \bar{g} \in \hat{G}^*\\ G_{dg} = G_{d\bar{g}}[d \ne \bar{d}]\;\forall d \in D}} \max\big(0,\; \Gamma_g - \Gamma_{\bar{g}}\big) \qquad \text{Equation 49}$$
[0128] It is noted that the bounds in Equation 49 are no greater
than those in Equation 44, and may increase when elements are added
to G. The DOI in Equations 44 and 49 are referred to as invariant
and varying DOI, respectively.
[0129] The following section discusses generating valid DOI for
multi-person pose estimation, multi-cell segmentation, and
multi-person tracking. Regarding multi-person pose estimation and
the invariant DOI, the removal of a detection d from a pose removes
from the cost the associated unary term θ^1_d and any active
pairwise terms θ^2_{dd1}, θ^2_{d1d}. Similarly, if d is the only
detection in a pose, then the θ^0 term is also removed. The system
10 upper bounds the resulting cost increase by considering θ^1_d
together with only the negative valued pairwise (and θ^0) terms. If
the resulting bound would be negative, the system 10 sets the bound
to zero, since λ_d is non-negative by definition. The system
expresses Ξ_d using Equation 50, below:

$$\Xi_d = -\min\Big(0,\; \min(0, \theta^0) + \theta^1_d + \sum_{d_1 \in D} \min\big(0,\; \theta^2_{d d_1} + \theta^2_{d_1 d}\big)\Big) \qquad \text{Equation 50}$$
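Equation 50 can be computed directly from the cost terms. The sketch below assumes a toy dictionary representation of θ^0, θ^1, and θ^2 (not the patent's data structures):

```python
def invariant_doi(d, theta0, theta1, theta2, detections):
    """Invariant DOI of Equation 50: bound the cost increase of removing
    detection d using theta^1_d, the negative pairwise terms touching d, and
    the (possibly removed) theta^0 term; clamp the bound at zero."""
    pair_sum = sum(min(0.0, theta2.get((d, d1), 0.0) + theta2.get((d1, d), 0.0))
                   for d1 in detections if d1 != d)
    return -min(0.0, min(0.0, theta0) + theta1[d] + pair_sum)
```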
[0130] Regarding multi-person pose estimation and the varying DOI,
the system 10 produces Ξ_d using the same approach as in Equation
50, except that the system 10 only considers pairwise terms that
could be removed when replacing members of Ĝ* with other members of
Ĝ*, as expressed below in Equation 51:

$$\Xi_d = -\min\Big(0,\; \min(0, \theta^0) + \theta^1_d + \min_{\substack{g \in \hat{G}^*\\ G_{dg} = 1}} \sum_{d_1:\, G_{d_1 g} = 1} \min\big(0,\; \theta^2_{d d_1} + \theta^2_{d_1 d}\big)\Big) \qquad \text{Equation 51}$$
[0131] The DOI for multi-cell segmentation are identical to the DOI
for multi-person pose estimation. Regarding multi-person tracking,
the system 10 considers the production of Ξ_d for tracking. Rather
than producing a single track when removing an element d, the
system 10 splits the track into two separate tracks, where d
defines the boundary and is itself removed. The removal of d causes
the removal of the costs of all subtracks including d. This
procedure will produce two tracks from one if d is a middle element
in the track. Similarly, if d is in every subtrack, then this
procedure removes a track.
[0132] For the invariant DOI, the system denotes δ_{s,d,k} to be
the lowest total cost of a sequence of subtracks each including d
(e.g., δ_{s,d,K} = θ_s), where the last subtrack in the sequence is
s and d is in position k, as expressed in Equation 52, and the use
of δ to express Ξ_d is shown in Equation 53, both below:

$$\delta_{s,d,k} = \theta_s + \min\Big(0,\; \min_{\hat{s}:\, \hat{s} \text{ precedes } s} \delta_{\hat{s},d,k+1}\Big) \quad \text{for } 1 \le k < K \qquad \text{Equation 52}$$

$$\Xi_d = -\min\Big(0,\; -\theta^0 + \min_{s,k:\; s_k = d} \delta_{s,d,k}\Big) \qquad \text{Equation 53}$$
[0133] In Equation 53, the system adds the absolute value of θ^0,
since the removal of all subtracks including d may create two
tracks from one, or remove a track without replacing it. Further,
in Equation 53, all possible sequences of subtracks that contain d
are considered. For the varying DOI, however, the system 10 need
only consider the sequences of subtracks in tracks in G. As such,
the system denotes δ^g_{s,d,k} to be the lowest total cost of a
sequence of subtracks of g each including d, where the last
subtrack in the sequence is s and d is in position k, as expressed
in Equations 54-56, below:
$$\delta^g_{s,d,K} = \theta_s \quad \forall s \text{ s.t. } T_{sg} = 1 \qquad \text{Equation 54}$$

$$\delta^g_{s,d,k} = \theta_s + \min\Big(0,\; \min_{\substack{\hat{s}:\, \hat{s} \text{ precedes } s\\ T_{\hat{s}g} = 1}} \delta^g_{\hat{s},d,k+1}\Big) \quad \text{for } 1 \le k < K \qquad \text{Equation 55}$$

$$\Xi_d = -\min\Big(0,\; -\theta^0 + \min_{\substack{g \in \hat{G}^*\\ G_{dg} = 1}} \; \min_{s,k:\; s_k = d} \delta^g_{s,d,k}\Big) \qquad \text{Equation 56}$$
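The recursions of Equations 52-53 can be sketched as a small dynamic program. The sliding-window predecessor model used here (a subtrack ŝ precedes s iff `s_hat[1:] == s[:-1]`, so d's position within the window increases by one in the predecessor) and the toy subtrack costs are assumptions made for illustration:

```python
def tracking_doi(subtracks, theta, theta0, d, K):
    """Invariant DOI for tracking (Equations 52-53). delta[(s, k)] is the
    lowest total cost of a sequence of subtracks each containing d, whose last
    subtrack is s with d at (1-indexed) position k; positions are filled from
    k = K (base case delta = theta_s) down to k = 1."""
    delta = {}
    for k in range(K, 0, -1):
        for s in subtracks:
            if s[k - 1] != d:                  # d must sit at position k of s
                continue
            best_prev = min((delta[(sh, k + 1)]
                             for sh in subtracks
                             if k < K and sh[1:] == s[:-1] and sh[k] == d),
                            default=0.0)
            delta[(s, k)] = theta[s] + min(0.0, best_prev)
    best = min(delta.values(), default=0.0)
    return -min(0.0, -theta0 + best)           # Equation 53, clamped at zero
```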
[0134] The following section discusses the system 10 generating a
lower bound on the LP relaxation at termination of column
generation. Given any fixed set G, solving the restricted master
problem (RMP) does not necessarily provide a lower bound on the ILP
over Ĝ. The system 10 can generate anytime lower bounds by adding
to the LP objective the lowest reduced costs of terms generated
during pricing.

[0135] As discussed above, each observation can be assigned to at
most one hypothesis. The system generates a lower bound using
Equation 57, below, given any non-negative λ, ψ provided by the
RMP:

$$-\sum_{d \in D} \lambda_d - \sum_{c \in \hat{C}} \psi_c + |D|\,\min\Big(0,\; \min_{g \in \hat{G}} \Gamma_g + \sum_{d \in D} G_{dg}\lambda_d + \sum_{c \in \hat{C}} C_{cg}\psi_c\Big) \qquad \text{Equation 57}$$
[0136] It is noted that the minimization in Equation 57 is the
pricing problem called at each iteration of column generation. The
bound in Equation 57 can be tightened using application specific
analysis. For example, the corresponding lower bound for
multi-person tracking adds to the RMP objective the sum of the
negative valued reduced costs of the lowest reduced cost track
terminating at each detection, as expressed below in Equation 58,
where there are no triplets:

$$-\sum_{d \in D} \lambda_d + \sum_{d \in D} \min_{\substack{g \in \hat{G}\\ d = [\text{last detection in } g]}} \min\Big(0,\; \Gamma_g + \sum_{\hat{d} \in D} \lambda_{\hat{d}}\,G_{\hat{d}g}\Big) \qquad \text{Equation 58}$$
[0137] It is further noted that Equation 57 provides a lower bound
on the optimal packing. Specifically, rewriting the optimization to
incorporate the fact that the number of hypotheses selected by any
packing is bounded by the number of observations (since every
selected hypothesis must contain at least one observation) yields
Equation 59, below:

$$\min_{\substack{\gamma_g \ge 0,\;\; \sum_{g \in \hat{G}} \gamma_g \le |D|\\ \sum_{g \in \hat{G}} G_{dg}\gamma_g \le 1\;\forall d \in D\\ \sum_{g \in \hat{G}} C_{cg}\gamma_g \le 1\;\forall c \in \hat{C}}} \;\sum_{g \in \hat{G}} \Gamma_g\,\gamma_g \qquad \text{Equation 59}$$
[0138] Dualizing the packing constraint and the subset-row
inequalities, while retaining in the minimization the constraint
that no more than |D| hypotheses are selected, yields Equation 60,
below, which is equal to Equation 59:

$$\min_{\substack{\gamma_g \ge 0\\ \sum_{g \in \hat{G}} \gamma_g \le |D|}} \; \max_{\substack{\lambda_d \ge 0\\ \psi_c \ge 0}} \; \sum_{g \in \hat{G}} \Gamma_g\,\gamma_g + \sum_{d \in D} \lambda_d\Big(-1 + \sum_{g \in \hat{G}} G_{dg}\gamma_g\Big) + \sum_{c \in \hat{C}} \psi_c\Big(-1 + \sum_{g \in \hat{G}} C_{cg}\gamma_g\Big) \qquad \text{Equation 60}$$
[0139] The system 10 then relaxes the constraint that λ, ψ are
optimal, and reorders terms by γ, which yields Equation 60 being
greater than or equal to Equation 61, below:

$$-\sum_{d \in D} \lambda_d - \sum_{c \in \hat{C}} \psi_c + \min_{\substack{\gamma_g \ge 0\\ \sum_{g \in \hat{G}} \gamma_g \le |D|}} \sum_{g \in \hat{G}} \gamma_g\Big(\Gamma_g + \sum_{d \in D} G_{dg}\lambda_d + \sum_{c \in \hat{C}} C_{cg}\psi_c\Big) \qquad \text{Equation 61}$$
[0140] It is noted that the inner minimization selects the lowest
reduced cost solution |D| times if a negative reduced cost
hypothesis exists, and otherwise has zero value. Thus, Equation 61
is equal to Equation 62, below:

$$-\sum_{d \in D} \lambda_d - \sum_{c \in \hat{C}} \psi_c + |D|\,\min\Big(0,\; \min_{g \in \hat{G}} \Gamma_g + \sum_{d \in D} G_{dg}\lambda_d + \sum_{c \in \hat{C}} C_{cg}\psi_c\Big) \qquad \text{Equation 62}$$
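The bound of Equation 62 is cheap to evaluate once reduced costs are available. A brute-force sketch over an explicit hypothesis list follows (all names and values are hypothetical; a real system would obtain the minimum reduced cost from its pricing routine):

```python
def anytime_lb(lam, psi_rows, all_hyps, cost):
    """Anytime lower bound of Equation 62: the dual objective plus |D| copies
    of the most negative reduced cost (zero if no hypothesis prices negative).
    lam: dict observation -> lambda_d; psi_rows: dict triplet -> psi_c."""
    def reduced_cost(g):
        return (cost(g) + sum(lam[d] for d in g)
                + sum(p for c, p in psi_rows.items() if len(c & g) >= 2))
    best = min(map(reduced_cost, all_hyps))
    return (-sum(lam.values()) - sum(psi_rows.values())
            + len(lam) * min(0.0, best))
```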
[0141] Testing and analysis of the above systems and methods will
now be discussed in greater detail. Specifically, computational
results will be discussed for the three applications discussed
above: multi-person tracking, multi-person pose estimation, and
multi-cell segmentation. The system of the present disclosure used
a part of the MOT 2015 training dataset to train and evaluate
multi-person tracking in video. The system 10 further used a
structured support vector machine ("SVM") based learning approach
as the mechanism to produce cost terms. To generate the set of
detections D, the system 10 used the raw detector output provided
by the MOT dataset. The system 10 trained models with varying
subtrack lengths (K=2, 3, 4), and allowed for occlusions of up to
three frames.
[0142] In the problem instance used for testing, there are 71
frames and 322 detections in the video. The numbers of subtracks
present are 1,068, 3,633 and 13,090 for K=2, 3, 4, respectively.
For K=2, 48.5% Multiple Object Tracking Accuracy, 11 identity
switches, and 9 track fragments were observed, which can be
expressed as (48.5, 11, 9). When setting K=3, 4, the performance is
(49, 10, 7) and (49.9, 9, 7), respectively. Thus, increasing
subtrack length provides noticeable improvements over all metrics.
[0143] FIG. 18 is a set of images showing these results.
Specifically, FIG. 18 illustrates a qualitative example of the
improvement that results from increasing subtrack length. The first
and second rows show the tracks outputted when K=2 and K=4,
respectively. It is noted that for K=2, track one changes identity
to track five, while with K=4 the identity of track one does not
change.
[0144] Each time the present system solves the pricing problem, it
adds to G the lowest reduced cost track terminating at each
detection, excluding those with non-negative reduced cost. As
discussed above, the dynamic programming structure of the pricing
problem facilitates this computation.
[0145] FIGS. 19A-B are graphs comparing the timing/cost performance
of the present disclosure with a baseline dual decomposition
approach. Specifically, the problem instances are associated with a
loose lower bound, which is tightened using triplets. When triplets
are added to the restricted master problem, the lower bound becomes
tight on these problem instances. FIGS. 19A-B show the convergence
of the upper/lower bounds as a function of time. The present system
plots the gap (absolute value of the difference) between the
bounds, and the final lower bound, as a function of time. The
present system then normalizes all plotted values by dividing each
by the value of the maximum lower bound times -1. Each time a
triplet is added, a blue dot is placed on the lower bound plot. The
present system compares column generation (denoted as "CG") against
the dual decomposition approach (denoted as "DD"). CG achieves
tight upper and lower bounds at termination. Pricing in CG uses the
earlier version of pricing when triplets are present; in this
version, tracks are not required to pass through detections in
D_{b+} when computing V^b_lb.
[0146] The testing and analysis in this section used the following
enhancements to column generation: anytime lower bounds and
subset-row inequalities. However, dual optimal inequalities were
not employed. The present system evaluated the above discussed
methods on the MPII-multi-person validation set, which consists of
418 images. The present system used the cost terms θ^1, θ^2 with
the following modifications. First, θ^2_{d1d2} = ∞ for each pair of
unique neck detections d_1, d_2. This accelerates optimization,
since the present system need not explore an entire power set of
neck detections during pricing. Second, the present system
constructs V^r as follows. The system provides a probability that
each detection d is associated with each body part r, denoted
p_{dr}. For each detection d, the system assigns it to the set V^r
that maximizes this probability. This assignment corresponds to the
optimization arg max_r p_{dr} for a given d ∈ V. Third, the system
sets θ^0 to a single value for the entire data set. Lastly, the
system limits the size of S^r to 50,000 for each r ∈ R. The system
constructs S^r as follows: the system iterates over the integers
k = 0, 1, 2, . . . , |V^r|, then adds to S^r the group of
configurations containing exactly k detections in V^r. If adding a
group would cause S^r to exceed 50,000 elements, then the system
does not add the group and terminates construction of S^r.
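The construction of S^r described above can be sketched as follows (the configuration representation and the helper name are illustrative only):

```python
from itertools import combinations

def build_configurations(Vr, limit=50000):
    """Construct S^r: add, for k = 0, 1, ..., |V^r|, the group of
    configurations with exactly k active detections, stopping before a group
    would push |S^r| past the limit."""
    S = []
    for k in range(len(Vr) + 1):
        group = [frozenset(c) for c in combinations(Vr, k)]
        if len(S) + len(group) > limit:
            break               # adding this group would exceed the cap
        S.extend(group)
    return S
```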
[0147] The set packing relaxation is tight in over 99% of problem
instances, and in the remaining cases the gap between the lower and
upper bounds is less than 1.5% of the LP objective. When the set
packing LP is loose, the present system produces an integral
solution by solving the set packing ILP over G. FIG. 20 shows a
table comparing column generation against a prior art heuristic
optimization procedure in terms of accuracy (average precision) on
standard computer vision benchmarks. Running times are measured on
an Intel i7-6700k quad-core CPU. The present system outperforms the
prior art procedure at localizing body parts, such as wrists and
ankles. Optimization is accelerated by using nested Benders
decomposition to accelerate dynamic programming, which provides up
to a 500 times speedup.
[0148] FIG. 21 is a set of images showing sample outputs 152, 154,
156, 158, 160, 162, 164 of the system of the present disclosure.
For each person, the present system averages the locations of the
detections corresponding to each of the body parts to produce a
corresponding colored dot denoting the position of the body
part.
[0149] The following section discusses the performance improvements
provided by the system of the present disclosure using DOI (dual
optimal inequalities). To establish the value of DOI, the present
system decouples the value provided by DOI from that provided by
varying the solver. The solver is defined by an LP toolbox (e.g.,
linprog, CPLEX, Gurobi), the options for the toolbox, such as the
algorithm used (interior point, simplex, etc.), and the computer
used. Decoupling the value added by DOI from that added by the
solver is important, since some solvers work dramatically better
than others, and DOI provides different speedups depending on the
solver.
[0150] This difference in performance is accounted for by the
number of iterations of column generation. Different solvers
provide different dual optimal solutions. In column generation, the
space of dual optimal solutions rarely consists of a single point,
but is instead a space of such points. Using dual optimal solutions
that are well centered allows column generation to achieve faster
convergence. Well centered solutions are solutions that have a low
L2 norm, meaning that the mass of the dual variables is not
concentrated in a small number of variables.
[0151] To illustrate the value of a well centered solution,
consider a step of pricing using a poorly centered dual solution,
in which only a small number of observations D^- have non-zero dual
values. The hypotheses produced in pricing will not include D^-,
but will otherwise be inclined toward columns similar to those
produced in the first iteration of column generation, where the
dual variables all have value zero. Thus, the use of poorly
centered solutions tends to lead to little progress in column
generation.
[0152] In the present system, the time spent performing pricing
vastly exceeds that spent solving the RMP (restricted master
problem). Thus, using a faster toolbox, such as CPLEX or Gurobi, to
solve the RMP adds little value if the resultant dual solution is
not well centered. The solvers used for testing are as follows.
Solver one: MATLAB 2016 linprog solver with default settings.
Solver two: MATLAB 2017 with the interior point solver on a
workstation. FIG. 22 is a table showing the results for solver one
and solver two. Specifically, FIG. 22 is a table showing a
comparison of the total time in seconds, and the comparative
speedup (over no DOI) using DOI, on the two different solvers. The
first three columns describe the total time needed to solve the LP
relaxation for column generation using no DOI, the invariant DOI,
and the varying DOI, respectively. In the final two columns, FIG.
22 shows the factor speedup achieved by using the invariant and
varying DOI over not using DOI. This is computed by dividing the
contents of columns two and three by the corresponding values in
column one.
[0153] FIGS. 23-24 are scatter plots showing the time consumed
using DOI for each solver. Specifically, FIGS. 23-24 show the
change in total run time when using dual optimal inequalities on
two different computers. Each data point corresponds to the time
needed to fully optimize the LP relaxation of set packing, relative
to the time needed when not using dual optimal inequalities. FIG.
23 used the MATLAB linprog solver with default settings on a 2014
Apple® MacBook 13-inch computer. FIG. 24 shows an up to date
mainframe using the MATLAB linprog solver with the interior point
algorithm.
[0154] Testing showed that the DOI that vary with G outperform
those that are invariant. The use of DOI provides a large speedup
to solver two (nearly a 20 times speedup) but a limited speedup to
solver one (only a 1.4-1.6 times speedup). Further, solver one is
an older computer running an older version of MATLAB than solver
two, but the timing results of solver one are better than those of
solver two for each selection of DOI. This is a consequence of
solver one producing well centered solutions, and solver two not.
The use of DOI makes solver two perform almost as well as solver
one, demonstrating the value of DOI when the solver is poorly
selected.
[0155] The following experiments use the column generation
enhancement of anytime lower bounds, but not subset-row
inequalities. DOI are not used in these experiments. The present
system applies column generation for multi-cell segmentation on
three different data sets. The problem instances include
challenging properties, such as densely packed and touching cells,
out-of-focus artifacts, and variations in the shape/size of
cells.
[0156] To generate cost terms, the present system uses an open
source toolbox to train a random forest classifier to discriminate:
(1) boundaries of in-focus cells; (2) in-focus cells; (3)
out-of-focus cells; and (4) background. For training, the present
system used <1% of the pixels per dataset with generic features,
e.g., Gaussian, Laplacian, and structured tensor. The outputs of
this random forest classifier are also used to generate
superpixels.
[0157] FIG. 25 is an illustration showing an output of the present
system. Specifically, FIG. 25 shows example cell segmentation
results on datasets one through three (left to right), where the
rows (from top to bottom) show the original image, the
cell-of-interest boundary classifier prediction image, the
superpixels, a color map of the segmentation, and enlarged views of
the inset (black square). For dataset two, it is observed that the
present system successfully segments the cells in a problem
instance where there are large variations of cell shape/size, even
within the same image.
[0158] The performance of the system of the present disclosure was
compared with prior art systems in terms of detection (precision,
recall, and F-score) and segmentation (Dice coefficient and Jaccard
index), which are common measures in bio-image analysis. FIG. 26 is
a graph showing the results of the comparison between the present
system 170 and prior art systems. Evaluation comparisons on
datasets one through three for precision (P), recall (R), F-score
(F), Dice coefficient (D), and Jaccard index (J) are reported for
the present system 170 and the prior art systems. System [88] uses
the planar correlation clustering (PCC) and non-planar correlation
clustering (NPCC) algorithms. The present system 170 achieves or
exceeds the performance of the prior art systems. Additionally, the
present system requires little training data relative to some prior
art systems.
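The five measures named above can be computed as follows for a pair of binary masks. This is a minimal sketch: in bio-image analysis the detection measures are typically computed over matched objects rather than raw pixels, but pixel masks keep the example small. Note that for binary masks the Dice coefficient coincides with the F-score:

```python
import numpy as np

def detection_and_segmentation_metrics(pred, gt):
    """Precision, recall, F-score, Dice coefficient, and Jaccard index
    for a predicted binary mask against a ground-truth binary mask."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.logical_and(pred, gt).sum()    # true positives
    fp = np.logical_and(pred, ~gt).sum()   # false positives
    fn = np.logical_and(~pred, gt).sum()   # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    dice = 2 * tp / (2 * tp + fp + fn)
    jaccard = tp / (tp + fp + fn)
    return precision, recall, f_score, dice, jaccard

# Hypothetical example: two 4x4 squares overlapping in a 3x3 region.
pred = np.zeros((8, 8), dtype=bool); pred[2:6, 2:6] = True
gt = np.zeros((8, 8), dtype=bool); gt[3:7, 3:7] = True
p, r, f, d, j = detection_and_segmentation_metrics(pred, gt)
```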
[0159] Next, the performance of the present system with regard to the
gap between the upper and lower bounds is considered. The gaps are
normalized by dividing by the absolute value of the lower bound. The
proportion of problem instances that achieve normalized gaps under
0.1 is 99.28%, 80%, and 100% on datasets one, two, and three,
respectively.
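The normalization described above can be written as a one-liner; the bound values below are hypothetical, chosen only to illustrate the computation:

```python
def normalized_gap(upper, lower):
    """Normalized gap as described in the text: (UB - LB) / |LB|."""
    return (upper - lower) / abs(lower)

# Hypothetical (upper, lower) bound pairs for three problem instances.
bounds = [(-9.5, -10.0), (-10.0, -10.0), (-7.0, -10.0)]
gaps = [normalized_gap(u, l) for u, l in bounds]

# Proportion of instances with normalized gap under 0.1.
frac_under = sum(g < 0.1 for g in gaps) / len(gaps)
```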
[0160] FIG. 27 is a graph showing optimization time for column
generation across problem instances in dataset one. Specifically,
regarding timing for dataset one, FIG. 27 plots, as a function of
time, the proportion of problem instances that take longer than a
given amount of time. For over 99% of problem instances, the gap
between the upper and lower bounds at termination is zero. Thus, the
present system is approximately an order of magnitude faster than
the combinatorial optimization approaches of prior art systems.
Optimization time for column generation is dominated by pricing, so
parallelization may dramatically accelerate optimization.
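The quantity plotted in FIG. 27 is a survival function of the per-instance optimization times. A minimal sketch, with hypothetical timings:

```python
import numpy as np

def proportion_slower_than(times, t):
    """Proportion of problem instances whose optimization time
    exceeds t -- the quantity FIG. 27 plots as a function of t."""
    times = np.asarray(times, dtype=float)
    return float((times > t).mean())

# Hypothetical per-instance optimization times, in seconds.
times = [0.5, 1.0, 2.0, 8.0]
```

Evaluating `proportion_slower_than` over a grid of `t` values reproduces the curve plotted in the figure.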
[0161] FIG. 28 is a diagram showing hardware and software
components of a computer system 202 on which the system of the
present disclosure can be implemented. The computer system 202 can
include a storage device 204, computer vision software code 206, a
network interface 208, a communications bus 210, a central
processing unit (CPU) (microprocessor) 212, a random access memory
(RAM) 214, and one or more input devices 216, such as a keyboard,
mouse, etc. The computer system 202 could also include a display (e.g.,
a liquid crystal display (LCD), a cathode ray tube (CRT), etc.). The
storage device 204 could comprise any suitable, computer-readable
storage medium, such as a disk, non-volatile memory (e.g., read-only
memory (ROM), erasable programmable ROM (EPROM),
electrically-erasable programmable ROM (EEPROM), flash memory,
field-programmable gate array (FPGA), etc.). The computer system
202 could be a networked computer system, a personal computer, a
server, a smart phone, a tablet computer, etc. It is noted that the
computer system 202 need not be a networked server and, indeed, could
be a stand-alone computer system.
[0162] The functionality provided by the present disclosure could
be provided by the computer vision software code 206, which could be
embodied as computer-readable program code stored on the storage
device 204 and executed by the CPU 212 using any suitable, high or
low level computing language, such as Python, Java, C, C++, C#,
.NET, MATLAB, etc. The network interface 208 could include an
Ethernet network interface device, a wireless network interface
device, or any other suitable device which permits the computer
system 202 to communicate via the network. The CPU 212 could include
any suitable single-core or multiple-core microprocessor of any
suitable architecture that is capable of implementing and running
the computer vision software code 206 (e.g., an Intel processor). The
random access memory 214 could include any suitable, high-speed,
random access memory typical of most modern computers, such as
dynamic RAM (DRAM), etc.
[0163] Having thus described the system and method in detail, it is
to be understood that the foregoing description is not intended to
limit the spirit or scope thereof. It will be understood that the
embodiments of the present disclosure described herein are merely
exemplary and that a person skilled in the art can make any
variations and modifications without departing from the spirit and
scope of the disclosure. All such variations and modifications,
including those discussed above, are intended to be included within
the scope of the disclosure. What is desired to be protected by
letters patent is set forth in the following claims.
* * * * *