U.S. patent application number 13/448,207 was filed with the patent office on 2012-04-16 and published on 2012-08-16 as publication number 20120207394 for a multi-stage image pattern recognizer.
This patent application is currently assigned to Five Apes, Inc. Invention is credited to Williams J. F. Paquier.
Application Number: 13/448,207
Publication Number: 20120207394
Family ID: 42285071
Published: 2012-08-16
United States Patent Application 20120207394
Kind Code: A1
Inventor: Paquier; Williams J. F.
Publication Date: August 16, 2012
MULTI-STAGE IMAGE PATTERN RECOGNIZER
Abstract
An image-based pattern recognizer and a method and apparatus for
making such a pattern recognizer are disclosed. By employing
positional coding, the meaning of any feature present in an image
can be defined implicitly in space. The pattern recognizer can be a
neural network including a plurality of stages of observers. The
observers are configured to cooperate to identify the presence of
features in the input image and to recognize a pattern in the input
image based on the features. Each of the observers includes a
plurality of neurons. The input image includes a plurality of
units, and each of the observers is configured to generate a
separate output set that includes zero or more coordinates of such
units.
Inventors: Paquier; Williams J. F. (Toulouse, FR)
Assignee: Five Apes, Inc. (San Mateo, CA)
Family ID: 42285071
Appl. No.: 13/448207
Filed: April 16, 2012
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
12/344,346 | Dec 26, 2008 | 8,160,354
13/448,207 (present application) | April 16, 2012 |
Current U.S. Class: 382/190
Current CPC Class: G06K 9/4623 20130101
Class at Publication: 382/190
International Class: G06K 9/46 20060101 G06K009/46
Claims
1. An apparatus comprising: a plurality of converters, each to
input an input image and to compute a potential as a measure of
contrast in the input image for each of a plurality of units of the
input image, each converter further to generate an output set
including a ranked set of coordinates, the ranked set of
coordinates containing a coordinate of each unit in the input image
whose potential exceeds a first threshold, the set of coordinates
being ranked based on potential; and a first plurality of
observers, each observer to process independently the output set of
each of the converters, each of the observers configured to
recognize a different type of feature in the input image when a
coordinate of a feature of the corresponding type is present in the
output set of one or more of the converters.
2. An apparatus as recited in claim 1, further comprising a first
plurality of pattern filters, each including a plurality of weight
matrices, wherein each of the first plurality of observers is
configured to use a different one of the pattern filters to
recognize the corresponding type of feature in the input image.
3. An apparatus as recited in claim 2, wherein the first plurality
of pattern filters are individually configured so that each
observer of the first plurality of observers can recognize features
at a different angular orientation in the input image.
4. (canceled)
5. An apparatus as recited in claim 1, further comprising: a second
plurality of observers, each to process independently the output
set of each of the first plurality of observers, each observer of
the second plurality of observers configured to generate an output
representing a relaxation of locality of a feature recognized in
the input image.
6. An apparatus as recited in claim 5, each observer of the second
plurality of observers further being configured to generate the
output to represent a prioritizing of endpoints of a feature
recognized in the input image more highly than a mid-section of the
feature.
7. An apparatus comprising: a processor; and a memory storing code
which, when executed by the processor, instantiates a plurality of
converters, each to input an input image and to compute a potential
as a measure of contrast in the input image for each of a plurality
of units of the input image, each converter further to generate as
output a ranked set of coordinates containing a coordinate of each
unit in the input image whose potential exceeds a first threshold,
the set of coordinates being ranked based on potential; a first
plurality of observers to process the outputs of the converters to
recognize features in the input image that correspond to
coordinates in the outputs of the converters; and a first plurality
of pattern filters, each operatively coupled between a different
pair of a converter of the plurality of converters and an observer
of the first plurality of observers, the first plurality of pattern
filters being individually configured so that each observer of the
first plurality of observers can recognize features at a different
angular orientation in the input image.
8. (canceled)
9. An apparatus as recited in claim 7, wherein the memory further
stores code which, when executed by the processor, instantiates: a
second plurality of observers to process outputs of the first
plurality of observers; and a second plurality of pattern filters,
each operatively coupled between a different pair of an observer of
the first plurality of observers and an observer of the second
plurality of observers, the second plurality of pattern filters
being individually configured so that each observer of the second
plurality of observers generates an output representing a
relaxation of locality of a feature recognized in the input
image.
10. An apparatus as recited in claim 9, wherein the second
plurality of pattern filters further are individually configured so
that each observer of the second plurality of observers generates
an output representing prioritizing endpoints of a feature recognized
in the input image more highly than a mid-section of the
feature.
11. An apparatus as recited in claim 9, wherein each observer of
the first plurality of observers is configured to, for each
coordinate in the output of each of the first plurality of
observers, integrate a corresponding potential over a range of time
slices, and for each said unit whose potential exceeds a second
threshold after integration by the observer, include the
coordinate of the unit in an output set of the observer; and
wherein each observer of the second plurality of observers is
configured to, for each coordinate in the output of each of the
first plurality of observers, integrate a corresponding potential
over a range of time slices, and for each said unit whose potential
exceeds a third threshold after integration by the observer,
include the coordinate of the unit in an output set of the
observer.
12. A method comprising: using a plurality of contrast converters
to identify a plurality of units of an input image as potentially
representing a feature in the input image; using the plurality of
contrast converters to generate a first output set that contains a
ranking of coordinates of the identified units of the input image;
using a plurality of observers to attempt to recognize, in the
input image, a feature from each of a first plurality of feature
categories, based on the first output set, by independently using
each of a first plurality of weight patterns to integrate a
potential for each of the identified units, based on the ranking of
coordinates in the first output set; and using the plurality of
observers to generate a plurality of second output sets as results
of attempting to recognize a feature from each of the first
plurality of feature categories, each said second output set
corresponding to a different one of the first plurality of feature
categories.
13. A method as recited in claim 12, further comprising:
automatically triggering a specified action in response to
recognizing a pattern in the input image, based on the plurality of
second output sets.
14. A method as recited in claim 12, wherein identifying a
plurality of units of an input image as potentially representing a
feature comprises, for each of the plurality of units: computing a
measure of contrast for the unit; and identifying the unit as
potentially representing a feature based on the measure of
contrast.
15. A method as recited in claim 14, wherein computing a measure of
contrast for a unit comprises: computing a measure of positive
contrast; and computing a measure of negative contrast.
16. A method as recited in claim 14, wherein the ranking of
coordinates in the first output set is based on the measures of
contrast of the corresponding units of the input image.
17. A method as recited in claim 12, wherein each of the first
plurality of weight patterns corresponds to a different angular
orientation in the input image.
18. A method as recited in claim 12, wherein using each of the
first plurality of weight patterns to integrate a potential for
each of the identified units comprises: applying a modulation
factor to a weight, for each of a plurality of iterations of
integration, to decode said ranking.
19. A method as recited in claim 12, further comprising: attempting
to recognize, in the input image, a feature from a second feature
category, based on one of the second output sets, by using a second
weight pattern to integrate a potential for each unit whose
coordinate is represented in said second output set, based on a
ranking of coordinates in said second output set, and generating a
third output set based thereon.
20. A method as recited in claim 19, further comprising:
automatically triggering a specified action in response to
recognizing a pattern represented by the third output set.
21. A method as recited in claim 20, further comprising: attempting
to recognize, in the input image, a feature from each of a second
plurality of feature categories, based on the second output set, by
independently using a second plurality of weight patterns to
integrate a potential for each unit whose coordinate is represented
in the second output set, based on a ranking of coordinates in the
second output set; and generating a plurality of third output sets,
each said third output set corresponding to a different one of the
second plurality of feature categories.
22. An apparatus comprising: means for inputting an input image;
means for computing a potential as a measure of contrast in the
input image for each of a plurality of units of the input image;
means for producing a first output set including a coordinate of
each unit of the input image whose potential exceeds a first
threshold, such that coordinates are located in the first output
set according to a ranking based on potential; means for
integrating, for each coordinate in the first output set, a
corresponding potential, based on the ranking; means for
identifying one or more coordinates in the first output set whose
potential exceeds a second threshold after said integrating; and
means for including, for each said unit whose potential exceeds the
second threshold, the coordinate of the unit in a second output
set.
23. An apparatus as recited in claim 22, further comprising: means
for automatically triggering a specified action in response to
recognizing a pattern in the input image, based on the second
output set.
24. An apparatus as recited in claim 22, further comprising: means
for computing the measure of contrast by concurrently identifying
positive contrast in the input image and identifying negative
contrast in the input image.
25. An apparatus as recited in claim 22, wherein producing the
first output set comprises: for each said unit whose potential
exceeds the first threshold, linearly rescaling the potential of
said unit to a range which corresponds to said number of time
slices, and adding the coordinate of the unit to the first output
set in a position which is based on the potential of the unit.
26. An apparatus as recited in claim 22, wherein integrating the
corresponding potential over a number of time slices comprises:
adding to a previously computed potential the product of a
modulation factor and a corresponding weight value, for each of a
plurality of iterations.
27. An apparatus as recited in claim 26, wherein the weight value
is from a matrix of weight values, the matrix corresponding to a
particular angular orientation, such that a result of said
integrating and said identifying is to recognize a feature in the
input image which has the particular angular orientation.
28. An apparatus as recited in claim 27, further comprising: means
for independently and concurrently performing said integrating by
using a plurality of different matrices of weight values, wherein
said identifying and said including produce a plurality of separate
second output sets based on the plurality of different matrices of
weight values.
29. An apparatus as recited in claim 22, further comprising: means
for integrating, for each coordinate in the second output set, the
corresponding potential over a range of time slices; means for
identifying one or more coordinates in the second output set whose
potential exceeds the second threshold after integration; and means
for including the coordinate of the unit in a third output set, for
each said unit whose potential exceeds the second threshold.
30. An apparatus as recited in claim 29, further comprising: means
for automatically triggering a specified action based on
recognizing a pattern in the third output set.
31. An apparatus comprising: a processor; and a memory storing code
which, when executed by the processor, causes instantiation of a
plurality of converters, each to input an input image and to
compute a potential as a measure of contrast in the input image for
each of a plurality of units of the input image, each converter
further to generate as output a set of coordinates of each unit in
the input image whose potential exceeds a first threshold, said
output ranked based on potential; and a plurality of observers
coupled downstream from the plurality of converters, each observer
to process output of an upstream converter or observer, by, for
each coordinate in the output of the upstream converter or
observer, integrating a corresponding potential over a range of
time slices, and for each said unit whose potential exceeds a
second threshold after said integrating, including the coordinate
of the unit in an output set.
32. An apparatus as recited in claim 31, wherein the plurality of
converters include a first converter to identify positive contrast
in the input image and a second converter to identify negative
contrast in the input image.
33. An apparatus as recited in claim 31, wherein each converter
generates its corresponding output by computing the potential for
each of the plurality of units of the input image based on the
measure of contrast; applying each of the potentials to the
specified threshold; and for each said unit whose potential exceeds
the specified threshold, linearly rescaling the potential to a
range which corresponds to said number of time slices, and adding
the coordinate of the unit in the input image to the output in a
position which is based on the potential of the unit.
34. An apparatus as recited in claim 31, wherein the memory further
stores code which, when executed by the processor, causes execution
of an engine to form a detection list for each of the plurality of
observers by keeping only a coordinate representing a local maximum
from the output set of each said observer.
35. An apparatus as recited in claim 31, wherein each of the
observers performs the integrating by adding to a previously
computed potential, the product of a modulation factor and a
corresponding weight value, for each of a plurality of
iterations.
36. An apparatus as recited in claim 35, wherein the weight value
is from a matrix of weight values.
37. An apparatus as recited in claim 36, wherein the region is a
user-selected region.
38. An apparatus as recited in claim 36, wherein each of the
observers uses a different matrix of weight values to perform the
integrating, and wherein each said matrix corresponds to a
different particular angular orientation.
39. An apparatus comprising: a plurality of converters, each to
input an input image and to generate an output list indicative of a
measure of contrast in the input image, by computing a potential
for each of a plurality of units of the input image based on the
measure of contrast, each unit having a coordinate in the input
image, applying each of the potentials to a first threshold, and
for each said unit whose potential exceeds the threshold, linearly
rescaling the potential to a range corresponding to a number of
time slices, and adding the coordinate of the unit in the input
image to the output list in a position which is based on the
potential of the unit, such that coordinates in the output list are
ranked according to potential; a plurality of first stage observers
coupled downstream from the plurality of converters, each first
stage observer to process an output list of an upstream converter
of the plurality of converters, by, for each coordinate in the
output list of the upstream converter, integrating the
corresponding potential over said range of time slices, including
applying a modulation factor to a corresponding weight value,
wherein the weight value is from a first matrix of weight values,
and for each said unit whose potential exceeds a second threshold
after said integrating, adding the coordinate of the unit to an
output list; a plurality of second stage observers coupled
downstream from the plurality of first stage observers, each second
stage observer to process an output list of an upstream observer of
the plurality of first stage observers, by, for each coordinate in
the output list of the upstream observer, integrating the
corresponding potential over a range of values, including applying
a modulation factor to a corresponding weight value, wherein the
weight value is from a second matrix of weight values, and for each
said unit whose potential exceeds the threshold after said
integrating, adding the coordinate of the unit to an output list;
and an engine to form a detection list for each of the plurality of
first stage and second stage observers by keeping only a coordinate
representing a local maximum from the output list of each said
observer.
40. An apparatus as recited in claim 39, wherein the plurality of
converters include a first converter to identify positive contrast
in the input image and a second converter to identify negative
contrast in the input image.
41. An apparatus as recited in claim 39, wherein each observer
of the plurality of first and second stage observers uses a pattern
filter to determine the corresponding weight value, the pattern
filter including: a projective weight matrix; and a receptive
weight matrix.
42. An apparatus as recited in claim 39, wherein each of the
plurality of first stage observers is configured to recognize a
feature at a different particular angle of rotation.
43. An apparatus as recited in claim 42, wherein each of the
plurality of second stage observers is configured to generate an
output representing a relaxation of locality of a feature
recognized in the input image and further representing a
prioritizing of endpoints of a feature recognized in the input image
more highly than a mid-section of the feature.
Description
[0001] This is a continuation of U.S. patent application Ser. No.
12/344,346, filed on Dec. 26, 2008, which is incorporated herein by
reference.
FIELD OF THE INVENTION
[0002] At least one embodiment of the present invention pertains to
image-based pattern recognition, and more particularly, to an
image-based pattern recognizer and a method and apparatus for
creating such a pattern recognizer.
BACKGROUND
[0003] Pattern recognition is used in many different aspects of
modern technology. For example, modern cameras can detect faces,
and optical character recognition (OCR) and automatic speech
recognition (ASR) are now relatively common. While the capabilities
and sophistication of pattern recognizers are steadily improving,
they still have significant limitations.
[0004] Images, or more generally, any natural patterns, contain an
enormous amount of information. Unfortunately, that data is not
easily exploitable. In particular, there is no known pattern
recognizer that would allow the processing of images in the same
way that we can now process words. For example, text indexing is
done by searching occurrences of words (typically dictionary
entries). Anytime an occurrence is found, its location is stored in
a hash-table or other similar mechanism. The relation between raw
data (text) and search patterns (character strings) is direct. The
same is not true, however, of image-based pattern recognizers.
[0005] The classic approach in pattern recognition is based on a
priori knowledge of the information to extract for each specific
recognition task. Thus, the structure of the algorithm of the
pattern recognizer normally contains explicit routines and
variables to implement this recognition function. This requires
long periods of research and development to develop a pattern
recognizer specific for each individual recognition category, be
it OCR, ASR, faces, facial expressions, gestures, etc. This
process is labor intensive and increases the cost of resulting
applications.
SUMMARY
[0006] Introduced here are an image-based pattern recognizer and a
method and apparatus for making such a pattern recognizer. The
techniques introduced here eliminate the need for a long research
and development period associated with making a pattern recognizer.
By employing positional coding, the meaning of any feature present
in an image can be defined implicitly in space. The pattern
recognizer algorithms contain no explicit references to the problem
to be solved or the pattern(s) to be extracted, thus providing a
generic pattern recognizer that can be customized by a user (e.g.,
an application developer) to recognize any of various different
types of patterns for any of various different types of
applications. In effect, a pattern recognizer such as introduced
here forms a building block from which many different types of
application-specific pattern recognizers can be built.
[0007] In certain embodiments, a pattern recognizer according to
the techniques introduced here is in the form of a neural network.
The neural network includes a plurality of processing elements
called observers coupled in a multi-stage neural network, through
which an input image is processed, where each stage includes at
least one (though typically two or more) observer. The observers in
the network are configured to cooperate to identify the presence of
features in the input image and to recognize a pattern in the input
image based on the features. Each of the observers includes a
plurality of neurons. The input image includes a plurality of
units, and each of the observers is configured to generate a
separate output set that includes zero or more coordinates of such
units.
[0008] At least two of the observers in a pattern recognizer are
each configured to generate its own output set by: 1) integrating a
corresponding potential intensity over a range of time slices, for
each coordinate in the output set of an upstream (afferent)
observer, and 2) for each unit of the input image whose potential
exceeds a threshold after integration by the observer, including
the coordinate of the unit in the output set. The presence of a
coordinate in the output set of an observer represents recognition
of a particular type of pattern in the input image, at a position
corresponding to that coordinate. Coordinates listed in the output
set of an observer are ranked based on potential.
[0009] In certain embodiments, an output list of an observer is
organized into a plurality of "time slices", with each set of
coordinates in the list being binned into a time slice according to
the potential of the corresponding output at that location (highest
intensity first). This produces a ranked list of coordinates, where
the coordinates are ranked by potential, for use in the
integration process by downstream (efferent) observers.
[0010] In certain embodiments, the neural network further includes
a plurality of pattern filters, each of which includes a weight
matrix (or multiple weight matrices) including a plurality of
weight values. The observers are configured to use the weight
values in integrating corresponding potentials of units of the
input image. Observers are further configured to apply a modulation
factor to the weight values during integration, to decode the
rankings in the output sets of upstream observers.
[0011] Positional coding is employed in the techniques introduced
here, in at least two ways: First, the type of pattern that any
particular observer recognizes is based on the position of that
observer within the neural network. Second, the positional coding
also is employed within the observers themselves. Specifically, if
two neurons belong to the same observer, these two neurons will
code for the same type of pattern, but at different positions in
space. The firing of any particular neuron indicates the presence
of a category (type) of patterns in the input image. Conversely,
the non-firing of a neuron can also be important, in that it
represents a placeholder in the positional coding paradigm, like a
zero in a positional coding system for numbers. Non-firing events
allow the creation of patterns for downstream observers which
observe this pattern. The output of each neuron is observed by a
downstream observer only if that output represents a detected
event. This is in contrast with, and much more efficient than,
conventional integrate-and-fire type neural networks, in which
downstream elements scan all upstream neurons to identify those
that have fired.
[0012] Also introduced here is a technique of creating a pattern
recognizer. In certain embodiments, the observers are identical
generic elements, i.e., they are not specific to any particular
pattern or pattern type. In certain embodiments, observers are
implemented as software. Any of various conventional programming
languages can be used to implement observers and the supporting
framework, including C, Objective-C, C++, or a combination thereof.
In other embodiments, observers and/or other elements of the system
may be implemented in hardware, such as in one or more application
specific integrated circuits (ASICs), programmable logic devices
(PLDs), microcontrollers, etc.
[0013] Implementing observers as identical generic elements enables
the interactive creation of a special-purpose pattern recognizer by
a user (e.g., an application developer) without the user having to
write any code, i.e., by appropriately adding these observers into
a neural network of observers and "teaching" the observers through
appropriate user input. A user can add an observer to a network or
teach an observer by using standard image editing selection tools,
as described below.
[0014] In certain embodiments, a method of creating a pattern
recognizer includes using a first plurality of observers in a
network of observers, to identify a plurality of units of an input
image that represent a feature, and using a second plurality of
observers in the network to attempt to recognize, in the input
image, a pattern from each of a plurality of pattern types, based
on outputs of the first plurality of observers, where each observer
of the second plurality of observers is configured to recognize a
different type of pattern. The method further includes adding a new
observer to the network of observers, to recognize a new pattern
based on output of at least one observer of the second plurality of
observers.
[0015] In another aspect, the method includes adding a new observer
to a network of observers, including a plurality of observers
operatively coupled to each other in successive stages, where each
stage includes at least two observers, and each observer is
configured to produce an output in response to recognizing a
different type of pattern in an input image. The method further
includes configuring the new observer to recognize a pattern in the
input image based on output of at least one observer of the
plurality of observers.
[0016] With the techniques introduced here, a finite set of
algorithms and a method to store implicitly the meaning of data in
position (e.g., addresses in memory) allow the creation of any
pattern representing concrete objects or signals in images, video
and sound. This enables dealing with images, video and audio and
doing the kinds of things we easily do with text today, such as
indexing, comparing, sorting, selecting, searching, replacing,
correcting, changing style, triggering actions, etc.
[0017] Other aspects of the techniques introduced here will be
apparent from the accompanying figures and from the detailed
description which follows.
BRIEF DESCRIPTION OF THE DRAWINGS
[0018] One or more embodiments of the present invention are
illustrated by way of example and not limitation in the figures of
the accompanying drawings, in which like references indicate
similar elements and in which:
[0019] FIG. 1 is a conceptual illustration of principles of pattern
recognition in accordance with the techniques
introduced here;
[0020] FIG. 2 illustrates a simple example of a network of
observers;
[0021] FIG. 3 illustrates examples of various input and output
patterns for multiple stages of observers in a network;
[0022] FIG. 4 is a block diagram of an architecture of a system for
operating a pattern processor according to the techniques
introduced here;
[0023] FIG. 5 illustrates the relationship between time steps and
time slices;
[0024] FIG. 6 illustrates the relationship between a neuron in a
given observer and neurons in upstream observers;
[0025] FIG. 7 illustrates the use of positive and negative contrast
converters in a network of observers;
[0026] FIG. 8 schematically illustrates the connectivity between
successive stages of observers in a network, by use of a pattern
filter containing synaptic weight matrices;
[0027] FIG. 9 shows an example of the relationship between the
outputs of two observers with a given pattern filter between
them;
[0028] FIG. 10 shows an example of rank order coding of
potentials;
[0029] FIGS. 11A and 11B illustrate an example of the use of
receptive and projective synaptic weight matrices in integration of
neurons in an observer;
[0030] FIG. 12 illustrates a practical example of a pattern
recognizer formed of a network of observers;
[0031] FIG. 13 illustrates the characteristics of a synaptic weight
matrix for the complex cells observers shown in FIG. 12;
[0032] FIG. 14 shows an example of an algorithm executed by the
observer engine, for streamed input;
[0033] FIG. 15 shows an example of a process for initializing an
observer;
[0034] FIG. 16 shows an example of an algorithm for executing a
converter;
[0035] FIG. 17 shows an example of an algorithm for executing an
observer which is not a converter;
[0036] FIG. 18 conceptually illustrates execution of an observer
which is not a converter;
[0037] FIG. 19 illustrates an example of a process of creating a
new observer;
[0038] FIG. 20 conceptually illustrates the process of creating a
new observer;
[0039] FIGS. 21A and 21B together show an example of a process for
observer learning (also called teaching an observer); and
[0040] FIG. 22 conceptually shows an example of a process for
observer learning.
DETAILED DESCRIPTION
[0041] References in this specification to "an embodiment", "one
embodiment", or the like, mean that the particular feature,
structure or characteristic being described is included in at least
one embodiment of the present invention. Occurrences of such
phrases in this specification do not necessarily all refer to the
same embodiment.
[0042] FIG. 1 illustrates the general principle behind the method
of finding a pattern in an image according to the techniques
introduced here. On the left side of the figure is the real world
space, and on the right side is the image space (acquired image of
the real world) and a representation space which contains
categories representing features in images.
[0043] An objective of the technique introduced here is to use
positional coding to completely disambiguate the relationship
between an object 11 in the real world and a category 12
representing that object. Before discussing the technique further,
it is useful to discuss certain concepts, such as the concepts of
object, image, feature, category and meaning.
[0044] Object: Objects in the real world can be fully observable or
partially observable, depending on the nature of the space. If the
input space is the real world, visual objects are lighted and
deformable 3D objects not fully observable in a single image. If
the input is a 2D space with drawings, letters or numbers, the
objects are 2D drawings fully observable in a single image. If the
input space is a spectrogram of real world sound, objects are words
and sounds and are fully observable but potentially superimposable
(example: several people speaking at the same time).
[0045] Image: Images have an important property: They are matrixes
of pixels, where (in the case of color images) each pixel can take
any of several colors. Images therefore have a positional coding
property. As in the well-known LEGO game, pixels in an image are
like bricks: Depending on their color and relative position to
other pixels, they will create images containing an interpretable
meaning. As in positional coding, the notion of `zero` as a place
holder is used for creating spatial structures.
[0046] Feature: A feature, as the term is used herein, is a set of
one or more spatially organized pixels. An example feature 13 is
shown in FIG. 1. Features are defined in the image space. Pixels
and features can potentially have billions of meanings (just as
LEGO bricks can potentially be used in billions of
constructions).
[0047] Category: Categories are organized into a network of
observers. The elementary categories are sets of features (which
themselves are sets of pixels), while higher level categories are
sets of more elementary categories. Referring again to FIG. 1, one
could have, for example, categories of local orientations, then
categories of curves based on categories of local orientations,
then round shapes based on the categories of curves, resulting in a
category of this particular fruit. Reference is made to
"categories" because the intent is to encode all the potential
images of this kind of fruit in that category, and of course fruit
images which have never been seen before (generalization
capacity).
[0048] The higher one goes in the network of categories (i.e., the
farther from the input of the network), the less ambiguous the
meaning of categories in the real world becomes, because at each
stage of the network we add constraints which narrow the potential
reflect a totally disambiguated representation of objects in the
real world.
[0049] Categories, as the term is used herein, include the notion
of positional coding, as discussed further below. Specifically, the
activation (firing) of a neuron (processing unit) reflects the
presence of a category of patterns in the input image at the
coordinate of the activated neuron. Note that the terms "processing
unit" and "neuron" are synonymous and are used interchangeably in
this description. A neuron can fire anytime its potential passes
its threshold. This means that several features similar to each
other are able to make the neuron fire. Consequently, it means that
the neuron represents multiple similar patterns, i.e., category of
patterns, in the input image.
[0050] In other words, an observer in the network literally
observes proximally its upstream input patterns and distally is
able to recognize a category of patterns in the input image. For
example, if an observer has been trained to recognize faces, it
will code for a category of patterns representing faces. Ideally,
any time a face is presented in an input image, some neurons in the
observer will fire.
[0051] Meaning: Meaning is the unique relationship between a real
object and a category representing the object in the real world. An
objective of the method is to use the position coding algorithm
detailed below to totally disambiguate the relationship between an
object in the real world and a category representing this
object.
Overall Approach
[0052] A pattern recognizer, according to certain embodiments of
the invention, is a runtime neural network of observers. Observers
instantiate categories such as defined above. The neural network
has multiple stages, or layers, through which an input image is
processed, where each stage includes at least one observer
(typically a stage contains two or more observers). FIG. 2
illustrates a very simple example of such a network, which contains
five observers 21 and three pattern filters 23A, 23B and 23C.
[0053] Certain embodiments use a spiking network model, rank order
coding and an event-driven approach, as described further below. Note,
however, that at no point does the technique introduce any
algorithmic element that breaks the positional coding property.
[0054] Once instantiated by the observer engine, observers are
connected through pattern filters to other observers. A pattern
filter is a set of weight values which make a generic observer
recognize a particular pattern. Each observer, combined with its
immediately upstream pattern filter, is executed on a set of input
patterns and produces a new pattern as an output. Thus, an observer
is seen as a pattern by other, downstream observers.
[0055] As the outputs of observers are patterns, they constitute
ideal input for higher level (i.e., downstream) observers. As such,
a face detector can be created on top of a left eye, right eye and
nose detector, by chaining together observers with their respective
pattern filters.
[0056] Each pattern filter extracts a particular aspect of the
input pattern. Referring now to the example of FIG. 3, where input
patterns comprise positive and negative contrasts of the same input
image, an example of the outputs of various stages of observers in
a network is shown.
[0057] Observers and pattern filters can be chained in a network.
For example, processing images with four different types of filters
corresponding to upper left, upper right, lower right, lower left
corners, will produce four pattern images, each extracting the
presence of a different kind of corner, then on top of these four
patterns, another pattern filter can then easily extract any kind
of quadrangle.
[0058] The more processing power is available, the more patterns in
images can be filtered. Note that the creation of filters relies on
learning rules and does not require writing code by the user. The
user can create a new observer in the network in order to recognize
a new pattern. The user can also teach an existing observer
(through examples or counter-examples) to improve its recognition
accuracy.
[0059] The user does not have to create pattern filters manually. A
pattern filter is automatically and almost instantaneously created
by the system (described below), when the user so desires, by using
a user selection in the input image as a starting pattern. This
user selection sets the weight values in the pattern filter, as
described below. One engine cycle later, the immediately downstream
observer from the pattern filter starts to provide an output list
according to that pattern. The selected pattern gets the best
matching level in the output list, but patterns which resemble the
selection are also detected.
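The excerpt does not spell out the exact rule by which a user selection becomes weight values, so the following is only a minimal Python sketch under one plausible reading: the selected region of upstream activity is centered and normalized into a receptive weight matrix. The function name, input representation, and normalization scheme are all assumptions, not part of the application.

```python
import numpy as np

def pattern_filter_from_selection(selected_activity: np.ndarray) -> np.ndarray:
    """Build a receptive weight matrix from a user-selected region (sketch).

    `selected_activity` is the upstream observer's output over the
    selected rectangle (hypothetical representation). Centering makes
    inactive units inhibitory; normalizing bounds the weights.
    """
    w = selected_activity.astype(float)
    w -= w.mean()              # inactive units get negative (inhibitory) weights
    peak = np.abs(w).max()
    return w / peak if peak > 0 else w
```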
[0060] At this point the user has defined a category which contains
a single example. However, this may not be sufficient to clearly
define a frontier between what the user wants in the category and
what the user wants to exclude. Consequently, any subsequent
selections of other positive or negative samples (through the
observer teaching/learning process described below) will refine the
observer category definition.
[0061] User choice defines junctions in category definition. For
example, if an observer is configured (by its upstream pattern
filter) to recognize a face, adding other faces to the category
will eventually build a generic face observer. On the other hand,
excluding other faces while adding other photos of the same person
will build a recognizer for that particular face.
[0062] In a practical application, the output detection list of the
neural network can be used to trigger or perform any of various
actions, such as indexing, comparing, sorting, selecting,
searching, replacing, correcting, changing style, triggering an
action, etc. For example, the output detection list of the network
can be used to allow one to:

[0063] associate an image with a textual tag and to retrieve the
image with a textual search later.

[0064] compare several aspects of the same object (e.g., mouth with
smile vs. mouth with no smile, red eyes vs. non-red eyes).

[0065] sort images by textual tag or by any attribute of the table
of content.

[0066] select a portion of an image for a copy-and-paste, or treat
that portion as a URL.

[0067] search for any image which looks like the current detection,
or search for images which have the same tag as the current tag.

[0068] perform image compositing to replace a part of an image with
another image (for example, a closed eye in a photo with an open
eye of the same person).

[0069] apply corrections, such as removing a pimple in real time in
a video conference.

[0070] change a style, such as allowing application of a 3D
deformation on a face.

[0071] connect detections with predefined actions, such as play and
pause music, switch applications, turn the page of a multipage
document, scroll in a document, etc.

[0072] apply a special effect to an image at the position of the
detection.

[0073] control mouse movement or movements of other user input
devices.

[0074] determine which functions are assigned to various
user-interface devices.
[0075] The following is a practical example of how a pattern
recognizer of the type introduced here can be used to develop a
primitive for a video game. Consider an end user in front of a
camera, and a network which includes two observers. The first
observer is configured (by its pattern filter) to recognize an
opened left hand and the second observer is configured to
recognize a closed left hand. The video game primitive in this
example is that the user closes the left hand in front of the
camera for a minimum amount of time and is then able to create a
"fireball" when he subsequently opens his left hand. Accordingly,
the process can be modeled as follows: If the closed left hand is
recognized, then a timer is launched and a deformation effect
reduces the closed left hand in the output video. If the opened
left hand is recognized and the timer has reached a threshold of
time, then a fireball effect is added on top of the opened left
hand in the output video.
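A minimal sketch of this primitive, assuming each observer exposes a per-frame boolean saying whether its detection list is non-empty; the hold time and effect names are hypothetical:

```python
import time

FIREBALL_HOLD_SECONDS = 1.0  # assumed minimum hold time

class FireballPrimitive:
    """Models the closed-hand-then-open-hand primitive described above."""

    def __init__(self):
        self.closed_since = None  # timer start, or None if not running

    def on_frame(self, closed_hand_detected: bool, open_hand_detected: bool):
        if closed_hand_detected:
            if self.closed_since is None:
                self.closed_since = time.monotonic()  # launch the timer
            return "shrink_hand_effect"               # deformation effect
        if open_hand_detected and self.closed_since is not None:
            held = time.monotonic() - self.closed_since
            self.closed_since = None
            if held >= FIREBALL_HOLD_SECONDS:
                return "fireball_effect"              # add fireball overlay
        return None
```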
[0076] Both observers (the one for the opened left hand and the one
for the closed left hand) are created and taught by an application
developer during the game's development.
System Architecture
[0077] Refer now to FIG. 4, which shows the general architecture of
a system for operating a pattern processor such as introduced here.
In the illustrated embodiment, the system 40 includes a set of
observers 41, a tools module 42 containing a set of tools, and a
graphical user interface (GUI) 43. Note that the typical pattern
recognizer includes multiple observers 41; however, only one is
shown in FIG. 4, to simplify explanation. In certain embodiments,
these illustrated elements of the system are implemented at least
partially as software and/or firmware that executes on one or more
programmable processors. In other embodiments, the system may be
implemented entirely in special-purpose hardware (e.g., ASICs,
PLDs, etc).
[0078] Each observer 41 receives some form of input media 44, such
as an image or a streaming media file, processes it, and then
generates a list of detections. An observer 41 may be from a
predefined neural network template, although that is not
necessarily the case. Each observer 41 provides two outputs, which
are the same information in different representations: an output
list and a visual pattern of indicators centered on the elements in
the output list.
[0079] In certain embodiments an observer 41 is a software-generated
object instantiated at run-time by an observer engine 46,
where the observer 41 is defined by an entity called an observlet
45. An observlet 45 is a document for use by the observer engine 46
and may be, for example, an extensible markup language (XML) flat
file. An observlet 45 contains a pattern description in a simpler
pattern base, i.e., it contains the pattern filter for an observer.
This pattern description defines a category of equivalent patterns.
Observlets 45 can be created somewhere and used somewhere else.
They can be uploaded on servers, shared by users, etc.
[0080] The observer engine 46 includes a finite set of routines
which are common to any pattern description. An observlet 45
running on top of the observer engine 46 forms an observer 41,
which can recognize a pattern. When an observlet 45 is loaded by
the observer engine 46, it becomes a computational function which
produces a detection list 47 of the pattern description it
represents. Anytime a part of the input media 44 matches the
observlet pattern description, a new element is added to the
detection list 47. Of course, if the input media 44 does not
contain the pattern defined by the observlet 45, the output list 47
for that image is void.
[0081] The observer engine 46 has a synchronous mode and an
asynchronous mode. The synchronous mode is used for input which
produces a stream of images (e.g., webcam, microphone) and provides
an output list of detections for all running observlets at the same
rate as that at which the input device acquires images. The
asynchronous mode is used for batch processes.
[0082] The category for an observlet 45 is defined by defining the
frontiers of what is in and what is not in the category. To do that
the user provides a set of positive samples (part of input media)
for what is in and a set of negative samples for what is out. These samples
can be created by using classical GUI selection tools.
Consequently, anyone who knows how to use simple image processing
software is a potential observer developer.
[0083] In general, the GUI 43 and tools 42 provide various
input/output (I/O) services, such as loading, creating and saving
observlets; loading input media (e.g., image files, video files);
connecting to a media stream (e.g., from webcam or microphone);
image creation with brush tools (e.g., palette, mouse, touch
screen); and creation of an observer network. In one embodiment,
the GUI 43 provides a main window divided into a top toolbar, a
network view of the currently loaded neural network on the side, a
main view and a toolbar view at the bottom.
[0084] The input media 44 is also displayed to the user via the GUI
43. Through the GUI 43, the user (e.g., an application developer)
can perform various functions, such as controlling the input source
(e.g., turning on and off acquisition of stream sources), editing
the neural network (e.g., adding a new observer, deleting an
observer or a group of selected observers, modifying observer
parameters), and teaching/learning of observers.
[0085] The list of detections 47 is also displayed to the user via
the GUI 43. For each selected observer, detections are extracted
so that only coordinates representing maxima of local
clusters of detections are kept. The detections can be superimposed
on the input image as displayed to the user. For example, features
in the input image which are represented in the detection list may
be highlighted, outlined, or otherwise identified, in the image as
displayed to the user on the GUI 43. The detection list may also be
displayed in the form of a table or other structured representation
and/or may be used to trigger any of various actions, as described
above.
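The clustering rule is not detailed in this excerpt; the sketch below implements one plausible reading as a simple non-maximum suppression over (x, y, matching level) detections, with an assumed cluster radius.

```python
def local_maxima(detections, radius=5.0):
    """Keep only the strongest detection of each local cluster (sketch).

    `detections` is an iterable of (x, y, matching_level) tuples;
    `radius` is an assumed clustering distance.
    """
    kept = []
    for x, y, level in sorted(detections, key=lambda d: -d[2]):
        if all((x - kx) ** 2 + (y - ky) ** 2 > radius ** 2
               for kx, ky, _ in kept):
            kept.append((x, y, level))
    return kept
```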
[0086] Note that a pattern can be covered by several distinct
observlets. This property constitutes a significant advantage over
other pattern recognition techniques. For example, if one has a
face observlet, a smile observlet, a glasses observlet, and a beard
observlet, each of these recognizers can be made active by the same
pattern if the pattern contains all of these properties. This
capacity is extremely useful for search, sort and browse
functions.
Overall Operation
[0087] The overall operation of a pattern recognizer will now be
described.
[0088] Time: The pattern recognizer uses discrete time to acquire
new images and processes each observer during every new cycle in
the case of streaming input (e.g., video or audio), or propagates
information across the network of observers once in the case of a
static input image. A time cycle (also called time step) is divided
into n time slices as shown in FIG. 5. Each time step is a real
time increment which corresponds to a new input image acquisition.
Time slices are used to implement rank order coding and decoding,
as discussed further below. Time slices reflect the rank order of
events, and thus the relative time between events. Integration in
each time step is done time slice by time slice.
[0089] Network Input: The input of the pattern processor is one or
more images. Videos or spectrograms for sounds are converted into
images at each runtime cycle. Images are inherently positional
coding ready, as discussed above.
[0090] Network Output: The output of the network (pattern
recognizer) is a detection list, which is generated from the output
lists of the final stage observer(s). Each element of a detection
list contains a reference to the observer which generated that
element, the spatial (x,y) coordinate, matching level, size and
other information about a matching feature in the input
image.
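The exact record layout is not given in this excerpt; the following is a minimal sketch of one detection-list element, with field names assumed from the description above:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One element of the network's detection list (field names assumed)."""
    observer_id: str       # reference to the observer that generated it
    x: int                 # spatial coordinate of the matching feature
    y: int
    matching_level: float  # how strongly the feature matched
    size: int              # size of the matching feature
```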
[0091] Observers: Each observer comprises a 2D array of processing
units (e.g., categories), called neurons, as shown in FIG. 6. These
neurons are described below. Each 2D array maps to the geometry of
the input image, but can have a smaller scale (i.e., where a neuron
covers several neighboring pixels) or can be translated relative to
the input image (i.e., where a processing unit covers shifted
pixels) or both (translation and different scale at the same time).
Observer output is computed during each time slice; each output
state is both a pattern (and has the same position coding property
as the input image) and a list of the neurons that fired during the
previous time slice.
[0092] There are two types of observers: ordinary observers and
converters. Converters and ordinary observers have the same type of
output but not the same input. Converters process an input image
(2D array of pixels) and convert it into ordered lists (output
lists) of coordinates. Ordinary observers process output lists of
converters or other (upstream) observers and generate ordered
output lists of coordinates.
[0093] Neurons (Processing Units): The neural network described
herein uses a custom model of integrate-and-fire neurons, also
known as spiking neurons. Each neuron is modeled with a potential,
a threshold and a modulation factor used for the rank order
decoding. The output of a neuron is a pulse (or spike), i.e., a
coordinate associated with the time slice when the potential passed
the threshold or nothing if the potential does not pass the
threshold. Each observer's global output (the 2D array of
processing units) can be seen as a dynamic pattern. In the
positional coding scheme, no output or zero output is itself an
important piece of information. A processing unit is a placeholder
that provides an
observable pattern to downstream observers, so that positional
coding remains intact.
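A minimal sketch of the neuron model just described; the class interface and the fire-at-most-once assumption are illustrative, not taken from the application:

```python
class Neuron:
    """Integrate-and-fire processing unit (interface assumed)."""

    def __init__(self, coord, threshold):
        self.coord = coord        # (x, y) position in the observer's 2D array
        self.potential = 0.0
        self.threshold = threshold
        self.fired = False

    def integrate(self, contribution):
        if not self.fired:
            self.potential += contribution

    def maybe_fire(self, time_slice):
        """Return a spike (coordinate + time slice) or None.

        Returning None is itself meaningful: it is the placeholder
        ("zero") in the positional coding scheme.
        """
        if not self.fired and self.potential > self.threshold:
            self.fired = True
            return (self.coord, time_slice)
        return None
```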
[0094] Converters: The first stage in the pattern recognizer is a
special class of observers called contrast converters, or simply
"converters". The converters extract positive and negative contrast
(e.g., quantified variation of luminance) in the input image and
apply these values (called "potentials") to a threshold to
determine which pixels in the input image show a significant amount
of contrast. The reason for extracting contrasts first is due to
the need to localize variations in the image. Uniform (same
luminance) areas are less informative than local variations of
luminance. The convolution kernel is built to detect such
variations in any direction.
[0095] In essence, therefore, the function of the converters is to
identify the locations of features in the input image. By contrast,
downstream observers in the pattern recognizer actually recognize
the identified features, i.e., associate them with one or more
categories.
[0096] As shown in FIG. 7, one converter 71A extracts positive
contrast while a separate converter 71B extracts negative contrast. In
certain embodiments, each converter does this by convolving the
input with a Mexican Hat shaped convolution kernel matrix, such as
a Laplacian of Gaussian (LoG) or difference of Gaussians. These
convolutions are computed and stored in a data structure which has
the same structure and output as observers, as discussed further
below. Results are thresholded to only keep positive results, due
to the rank order coding, as discussed further below. Positive and
negative contrasts are processed in this way, because in spiking
neurons, the exchanged information is positive pulses or spikes;
consequently, negative contrasts have to be encoded separately
(e.g., in a different space in the position coding scheme) to
convey a different meaning. A single converter does not allow
coding for both positive and negative values. Consequently, the
extraction of positive and negative contrasts is separated into two
different converters. Each converter 71A or 71B convolves the input
image 73 with a specific convolution kernel matrix, 72A or 72B,
reflecting the extraction of a positive or a negative contrast. The
nature of the contrast is therefore implicitly encoded in the
position of the output list (in the network of observers).
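A minimal sketch of one such converter, assuming a difference-of-Gaussians kernel as the "Mexican hat" and an arbitrary threshold; the kernel size, sigmas, and threshold value are illustrative only:

```python
import numpy as np
from scipy.signal import convolve2d

def dog_kernel(size=7, sigma1=1.0, sigma2=2.0):
    """Mexican-hat-like kernel built as a difference of Gaussians."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)

    def gauss(s):
        return np.exp(-(xx**2 + yy**2) / (2 * s**2)) / (2 * np.pi * s**2)

    return gauss(sigma1) - gauss(sigma2)

def contrast_potentials(image, positive=True, threshold=0.05):
    """One converter: convolve, keep one sign of contrast, threshold.

    Negating the kernel gives the negative-contrast converter (71B);
    only positive, supra-threshold results are kept, as required by
    the rank order coding.
    """
    kernel = dog_kernel() if positive else -dog_kernel()
    potentials = convolve2d(image, kernel, mode="same")
    potentials[potentials < threshold] = 0.0
    return potentials
```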
[0097] As a consequence of this positional coding, all elements of
the output list coming from the converter filtering the image with
the convolution kernel corresponding to the positive convolution
will code for positive contrasts. All elements of the output list
coming from the converter filtering the image with the convolution
kernel corresponding to the negative convolution will code for
negative contrasts.
[0098] Connectivity: Each neuron (processing unit) in each observer
can be functionally connected to a contiguous and rectangular array
of other neurons in an upstream observer, as shown in FIG. 8. In
this scheme each neuron link is represented by a synaptic weight.
Therefore, each contiguous rectangle of neurons defines a synaptic
weight matrix, which is part of a pattern filter 82. A separate
pattern filter is defined between every pair of connected observers
that are in different but adjacent stages of the neural network. In
FIG. 8, reference numeral 82 denotes the array of neurons in
observer A that are observed by neuron X in observer B, through its
corresponding synaptic weight in the receptive synaptic weight
matrix 81. Reference numeral 83 denotes the array of neurons in
observer A that are observed by neuron Y in observer B, through its
corresponding synaptic weight in the receptive synaptic weight
matrix 81.
[0099] Pattern filters and Synaptic Weights: A pattern filter 82 is
a pair of synaptic weight matrices, i.e., one projective weight
matrix 80 and one receptive weight matrix 81, which forms a link
between two observers in the neural network, as illustrated in
FIGS. 8 and 9. Each synaptic weight matrix is an N×P matrix
of real numbers, i.e., weight values. The "owner" of a pattern
filter is the immediately downstream observer to which that pattern
filter is connected. The main purpose of a pattern filter is to
allow integration of a signal coming from the ordered output list
of the upstream observer connected to the pattern filter, by the
downstream observer connected to the pattern filter, using the
projective synaptic weights in the pattern filter. A secondary
purpose is to update these synaptic weights and to convert
receptive synaptic weight matrixes into projective synaptic weight
matrixes.
[0100] A single pair of synaptic weights is used between any two
connected observers to filter the same feature at any position in
the image. In other words, the synaptic weight values are the same
for each neuron. During the integration phase of an observer, a
pattern filter converts the presence of a specific localized
feature into an active neuron in the next time slice in the same
time cycle. Any time a feature is present in the input, image a
single feature similarly located in the observer array is
consequently activated.
[0101] Referring again to FIG. 8, the receptive synaptic weight
matrix 81 is the exact same size as each small contiguous
rectangular array 82 or 83 of neurons in observer A. A first small
contiguous and rectangular array 82 of neurons in the observer A is
observed by neuron X in observer B, where each neuron in the
rectangle is observed through its corresponding synaptic weight in
the receptive synaptic weight matrix 81. A second small contiguous
and rectangular array 83 of neurons in the observer A is observed
by neuron Y in observer B, where each neuron in the rectangle is
observed through its corresponding synaptic weight in the receptive
synaptic weight matrix 81.
[0102] FIG. 9 shows an example of the output pattern 91 that would
be produced by an observer for a given pattern filter 90 and a
given input pattern 92 from the immediately upstream observer.
[0103] Synaptic weights can be positive (in which case integration
will increase processing unit potential), zero (in which case
integration has no effect on the potential), or negative (in which
case integration decreases the potential). This three-logic-state
approach allows the extraction of several features in a single
pattern. For example, it is possible to extract both a vertical bar
and a horizontal bar in the pattern of a cross.
[0104] Rank order coding: In at least one embodiment, all observers
in the network use the rank order coding scheme. Rank order coding
is a method used for making processed data invariant to contrast
and luminosity by converting a list of numeric data into an ordered
list which removes the exact value of elements and only keeps the
corresponding spatial coordinates in a relative rank order. As an
example of rank order coding, if the input list is {e1=3.2, e2=3.4,
e3=2.7, e4=1.9, e5=4.1}, the corresponding output list would be
{e5, e2, e1, e3, e4}, where each of e5, e2, e1, e3, e4 represents a
different spatial (x,y) coordinate in the input.
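A minimal sketch of this conversion (the function name is an assumption) reproduces the example above:

```python
def rank_order_code(potentials):
    """Sketch: convert {coordinate: potential} into an ordered list
    of coordinates, discarding exact values and keeping only the
    relative rank (highest potential first)."""
    return sorted(potentials, key=potentials.get, reverse=True)

# The example from the text:
inputs = {"e1": 3.2, "e2": 3.4, "e3": 2.7, "e4": 1.9, "e5": 4.1}
print(rank_order_code(inputs))  # ['e5', 'e2', 'e1', 'e3', 'e4']
```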
[0105] The conversion from images organized in pixels into the rank
order coding is initiated at the output of the first
layer of observers, i.e., the converters. The rank order coding
process is based on the notion of dividing each time step of the
algorithm into some number, n, of time slices (e.g., n=100), as
illustrated in FIG. 5. In general, the number (n) of time slices
per time step is predetermined and does not change.
[0106] Briefly, the rank order coding process is as follows: First,
convert the (non-zero) convolved and thresholded potential values
into ordered lists of coordinates, then assign the corresponding
spatial coordinates of those values to the appropriate time slice
according to their potential values, with higher values being
placed in earlier time slices and lower values being placed in
later time slices. Whatever the luminance or contrast is, the
ranking of the convolved data remains constant.
[0107] Therefore, the ranked output list of an observer can be
thought of as a histogram of spatial coordinates, where each time
slice is a different bin of the histogram.
[0108] The integration process is done time slice by time slice. If
the rank of data in an output list is different, then the
integration process will give different results.
[0109] The following is a more detailed description of the rank
order coding process done by an observer, for each neuron. First,
the potential P of the neuron is linearly rescaled to be within the
range from 0 to the number of time slices per time step. The
potential P is then truncated to remove any fractional portion. The
coordinate (i,j) of the neuron to which P relates is then added to
the output list of the observer, indexed by P (which now reflects a
time slice).
[0110] Initially there is one void bin in the output list for each
time slice. The goal of the rank order coding is to fill the output
list, indexed by time slice, with coordinates of neurons so that
each time slice's bin will contain the coordinates corresponding to
the linearly rescaled potential value. In other words, potential is
converted to an integer index corresponding to the number of time
slices.
[0111] Consider the simple example illustrated in FIG. 10. The far
left matrix 101 shows the potentials for a set of neurons in a
given observer. The x and y coordinates are shown with shading at
the top and left of the matrix. The middle matrix 102 shows the
rescaled potentials, and the array 103 at the right is the output
list, indexed by time slice (i.e., by potential). This rank order
coding allows the potential array to be sorted in O(2n) rather than
O(n log n) time, where n is the number of elements of the array.
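The binning can be sketched as a counting-sort-style pass (the names, the 10-slice default, and the convention that higher potentials land in earlier bins are assumptions for illustration):

```python
def bin_by_time_slice(potentials, n_slices=10):
    """Sketch: rescale each potential to [0, n_slices), truncate to
    an integer, and drop the coordinate into the corresponding bin;
    two linear passes replace a full comparison sort."""
    p_max = max(potentials.values())
    bins = [[] for _ in range(n_slices)]             # one void bin per slice
    for coord, p in potentials.items():
        rescaled = int(p / p_max * (n_slices - 1))   # rescale + truncate
        bins[n_slices - 1 - rescaled].append(coord)  # high P -> early slice
    return bins
```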
[0112] Runtime Process:
[0113] Operation of each observer is event driven, where the
starting point of propagated information is the rank ordered list
of data extracted from an input image and divided into sublists
(output lists) of equivalent elements. For each time step (see FIG.
5), the input image is converted into an ordered (ranked) list
(output list) of coordinates, where the placement of each
coordinate in the list corresponds to the relative intensity
(potential) inside the observed image.
[0114] Integration and fire: The integrate-and-fire model used by
neurons adheres to the positional coding scheme and has no
reference to meanings of objects or features except through the use
of relative position in patterns and observers. Classical neural
network computation is based on the following algorithm: For each
neuron, its potential is integrated by adding the product of each
of its synaptic weights with the output state of the associated
upstream neuron. In contrast, propagation in the neural network
introduced here is event driven. It is more efficient, because most
neurons remain inactive (generating no pulse) during each time
slice. Using event driven propagation avoids adding zero to the sum
the vast majority of the time during the integration process.
[0115] The basic integration operation in the technique introduced
here is:
[0116] For each time slice and each neuron, the potential P of a
unit increases as
[0117] P(t+1)=P(t)+Sum of all (synaptic weight (W) corresponding to
an active input multiplied by a modulation factor (M))
[0118] At the beginning of each time cycle, the potential P of each
neuron is reset to zero.
[0119] The modulation factor, M, is a mechanism used to decode the
rank order coding. A consequence of using the modulation factor in
the formula above is that the preferred order will always produce a
higher potential, P. The modulation factor M is a real number
between 0 and 1. At the beginning of each time cycle, it is reset
to 1, then any time a synaptic weight is computed, the modulation
factor M is multiplied by a shunting factor, beta, which is a
positive real number close to but less than 1. That is, we have
M(t+1)=M(t)*beta.
[0120] For each time slice and for each integrated processing unit,
after the integration is computed, the firing phase begins. In the
firing phase, for each time slice and for each integrated
processing unit, if the potential P passes the neuron's threshold,
T, then the neuron generates a new pulse in the output list of the
observer for the next time slice. More precisely, the neuron
includes the coordinate of that neuron (which is spatially mapped
to a corresponding coordinate in the input image) in the output
list of the observer.
[0121] Example of the integration process and rank order decoding:
Suppose that we have four weights W1=4, W2=3, W3=2, W4=1. We can
compute the final potential P of a processing unit receiving its
inputs in the following order: I1, I2, I3, I4 and in the opposite
order. Suppose that the shunting factor=0.75.
[0122] Let us start with the input temporal order I1, I2, I3, I4.
Therefore, at the first time slice, P=W1*M=4*1=4 and M=M*beta; at
the second time slice, P=4+W2*M=4+3*0.75 and M=M*beta; at the third
time slice P=6.25+W3*M=6.25+2*0.5625 and M=M*beta; and during the
last time slice, P=7.375+W4*M=7.796875.
[0123] By contrast, in the opposite temporal order I4, I3, I2, I1,
we have P=1, P=2.5, P=4.1875 and P=5.875. In the preferred order
(first case) the final potential is 7.796875, and in the worst
order, the potential is only 5.875. A threshold can easily separate
these two patterns, which are statically equivalent but dynamically
different.
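This arithmetic can be checked with a short transcription of the example (nothing is assumed beyond the values given above):

```python
def decode_potential(weights, beta=0.75):
    """Integrate the given weights in temporal order, shrinking the
    modulation factor by the shunting factor after each input."""
    potential, modulation = 0.0, 1.0
    for w in weights:
        potential += w * modulation
        modulation *= beta
    return potential

print(decode_potential([4, 3, 2, 1]))  # preferred order: 7.796875
print(decode_potential([1, 2, 3, 4]))  # opposite order:  5.875
```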
[0124] Projective pattern filters: The integration process uses the
projective equivalent of pattern filters, as illustrated in FIGS. 8
and 11. The technique in essence reverses the synaptic weight
matrix from a receptive point of view to a projective point of
view. More
precisely, each pattern filter includes both a projective synaptic
weight matrix and a receptive synaptic weight matrix, each being
the transpose of the other, as shown in FIG. 11B. The result in the
projective scheme is numerically exactly the same as in receptive
scheme, but the performance is better because it requires fewer
integration operations per time step than the receptive scheme.
[0125] If a neuron PU in the receptive scheme observes the inverted
`L` pattern in the input observer pattern at this exact position
(marked as A, B, C, D in the pattern space of FIG. 11A) and the only
positive synaptic weights in the receptive synaptic weight matrix of
the pattern filter are at positions {7, 8, 12, 17}, it means that
the input pattern is the best matching input for processing unit PU.
Therefore, in the projective scheme, the output list will have four
elements reflecting the activation of units A, B, C, D, and the
integration of the projective synaptic weight matrix of the
projective pattern filter will add the weights {8, 7, 12, 17} as in
the receptive scheme (the projective scheme of FIG. 11B illustrates
the projective integration induced by coordinate B). However, in
the projective scheme, there will be 25*4 basic integration
operations (25 is the number of values in the matrix, 4 is the
number of active elements in the output list), as opposed to 25*64
integration operations in the receptive scheme (64 is the size of
the overall input image).
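The saving can be sketched as follows (a hypothetical illustration; all names are assumptions and modulation is omitted for brevity): only active coordinates trigger weight additions, so inactive neurons cost nothing.

```python
import numpy as np

def projective_integrate(active_coords, proj, shape):
    """Sketch of projective integration: only the listed active
    input coordinates trigger additions, costing
    len(active_coords) * proj.size operations (e.g., 4 * 25)
    instead of one receptive dot product per downstream neuron
    (e.g., 64 * 25)."""
    potentials = np.zeros(shape)
    n, p = proj.shape
    for (i, j) in active_coords:
        for r in range(n):
            for s in range(p):
                k, l = i + r - n // 2, j + s - p // 2
                if 0 <= k < shape[0] and 0 <= l < shape[1]:
                    potentials[k, l] += proj[r, s]
    return potentials
```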
Practical Example of an Observer Network
[0126] In a typical usage scenario, only an advanced developer will
create new and more efficient pattern recognizers using the
elements described above. Less advanced users can use a predefined
network of observers. FIG. 12 shows an example of such a predefined
network. It is general enough to be used in many different
application domains. Note that all of the observers in the
network execute using the exact same algorithm (described in detail
below); the difference between them is only in their positions in
the network and the weights of the matrices of the pattern filters
which link them to their upstream observers in the network. Note
that the pattern filters are not shown in FIG. 12, to avoid making
the figure unnecessarily complex, although they are described
below.
[0127] In the network of FIG. 12, the input image 121 is initially
applied to the first layer (stage) 120 which includes two
converters 122-1 and 122-2. The converters 122-1 and 122-2
determine a positive and negative contrast value (potential),
respectively, for each pixel in the input image 121. Each converter
122 outputs a ranked output list of coordinates whose potentials
exceed the specified threshold, as described above.
[0128] Each of the converters 122 provides its output to a second
layer 123 of observers, which includes eight observers 124-1
through 124-8, which are referred to herein as "orientation
detectors" to facilitate explanation. Each of the orientation
detectors 124 receives its input from each of the converters 122,
via a separate pattern filter. Each of the orientation detectors
124 is configured (by its immediately upstream pattern filter) to
detect features at a different angular orientation in the input.
The reason for detecting local orientations is to determine the
nature of the variation previously detected by the contrast
converters.
[0129] The settings of the weights in the synaptic weight matrices
in each pattern filter of an orientation detector 124 determine the
orientation of features that will be detected by the orientation
detector. For example, the first orientation detector 124-1 is
configured to detect features oriented at 0 degrees, the second
orientation detector 124-2 is configured to detect features
oriented at 45 degrees, the third orientation detector 124-3 is
configured to detect features oriented at 90 degrees, and so on in
45 degree increments up to 315 degrees. Thus, the output list of
each orientation detector is a ranked list of coordinates at which
a feature having the specified orientation was detected in the
input image.
[0130] In one embodiment, the synaptic weight matrices of the
pattern filters associated with the orientation detectors are Sobel
matrices, in which the weight values are from the well-known Gabor
filter function:
g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left( -\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2} \right) \cos\left( 2\pi \frac{x'}{\lambda} + \psi \right)

where

x' = x \cos\theta + y \sin\theta

and

y' = -x \sin\theta + y \cos\theta
[0131] The parameter \lambda represents the wavelength of the
cosine factor, \theta represents the orientation of the normal to
the parallel stripes of the Gabor function, \psi is the phase
offset, \sigma is the sigma of the Gaussian envelope, and \gamma
is the spatial aspect ratio, which specifies the ellipticity of the
support of the Gabor function.
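A sketch of building such kernels from this function (the parameter values and function name are illustrative assumptions, not taken from the application):

```python
import numpy as np

def gabor_kernel(size=9, lam=4.0, theta=0.0, psi=0.0, sigma=2.0, gamma=0.5):
    """Evaluate the Gabor function above on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_p = x * np.cos(theta) + y * np.sin(theta)      # x'
    y_p = -x * np.sin(theta) + y * np.cos(theta)     # y'
    return (np.exp(-(x_p ** 2 + (gamma * y_p) ** 2) / (2 * sigma ** 2))
            * np.cos(2 * np.pi * x_p / lam + psi))

# One kernel per orientation detector, in 45-degree increments:
kernels = [gabor_kernel(theta=np.deg2rad(a)) for a in range(0, 360, 45)]
```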
[0132] Each orientation detector 124 has two pattern filters, one
to process the output of the positive contrast converter 122-1 and
one to process the output of the negative contrast converter
122-2.
[0133] As an example, for the orientation detector for 0 degrees, a
3×3 receptive synaptic weight matrix of the pattern filter
connected to the output of the positive contrast converter 122-1
can be defined as shown in Table 1A, while a corresponding
receptive synaptic weight matrix of the pattern filter connected to
the output of the negative contrast converter 122-2 can be defined
as shown in Table 1B.
TABLE 1A

 1   2   1
 0   0   0
-1  -2  -1

TABLE 1B

-1  -2  -1
 0   0   0
 1   2   1
For the orientation detector for 45 degrees (124-2), examples of
the 3×3 receptive synaptic weight matrices connected to the
outputs of the positive and negative contrast converters are shown
in Tables 2A and 2B, respectively:
TABLE 2A

 0   2   0
 2   0  -2
 0  -2   0

TABLE 2B

 0  -2   0
-2   0   2
 0   2   0
[0134] For the orientation detector for 90 degrees (124-3),
examples of the 3×3 receptive synaptic weight matrices
connected to the outputs of the positive and negative contrast
converters are shown in Tables 3A and 3B, respectively:
TABLE 3A

 1   0  -1
 2   0  -2
 1   0  -1

TABLE 3B

-1   0   1
-2   0   2
-1   0   1
[0135] And so on for other orientations.
[0136] Referring back to FIG. 12, each of the orientation detectors
124 provides its output to a third layer 125 of observers, which
includes eight observers 126-1 through 126-8, which are referred to
herein as "complex cells" to facilitate explanation. Each of the
complex cells 126 receives its input from only one of the
orientation detectors 124, via a separate pattern filter.
[0137] Complex cells 126 have two main properties, by virtue of the
weights in their pattern filters. The first property is to relax
the geometric constraint of localization (in the image geometry
space, not in the positional coding). This relaxation is given by
the width of the positive weights in the matrix. The reason to
relax this geometric constraint is to increase the level of
matching based on local orientation features.
[0138] The second property is to give temporal priority (in a time
slice sense) to neurons which observe the end of a local feature
relative to those which observe the center of a local feature.
[0139] The nature of the synaptic weight matrices for any of the
complex cells is illustrated in FIG. 13. It can be seen that values
in the matrix which are close to the horizontal center axis of the
matrix are positive, except near the outermost points, where they
are negative. All other values in the matrix are zero. Essentially
any matrix which roughly has the illustrated shape will produce
these two properties.
[0140] Any new observer(s) 128 would be created in the next layer
(i.e., a fourth layer in the network of FIG. 12), and each new
observer would have the same upstream observers, i.e., the eight
oriented complex cells. The size of the pattern filter is defined
by the size of the user selection. When the user adds a new
observer, it creates a new recognition ability for the system.
[0141] Creating an observer network (a pattern recognizer) can be
compared to creating a structure with the LEGO game. In the LEGO
game, not all the created structures look like a real world object
(car, house, plane, etc.), but the nature of the LEGO brick allows
one to build essentially whatever structure one wants; it is just a
question of imagination, observation, the ability to reproduce, etc. In
the techniques introduced here, we allow the user to build
recognition functions instead of physical structures. Not all
conceivable networks will provide interesting recognition
properties, but some of them will.
Detailed Description of Algorithms
[0142] Observer Engine
[0143] FIG. 14 shows an example of the algorithm executed by the
observer engine, for streamed input (e.g., video or audio). The
difference between how streamed input is handled versus
single-image input is discussed below.
[0144] The algorithm is described now for a single time step (which
includes multiple time slices). Initially, at 1401 the current time
cycle is initialized to the first time cycle. Next, at 1402 the
observer engine acquires a set of input images from a user-selected
input source (e.g., a web camera or audio subsystem microphone).
All observers in the selected network, including converters, are
then initialized by the observer engine at 1403, according to a
process described below.
[0145] After all observers have been initialized, each converter in
the network is executed by the observer engine at 1404. The process
of executing a converter is described further below. Next, each
observer which is not a converter is executed by the observer
engine at 1405, also as described further below.
[0146] At 1406, if all time cycles in the current time step have
been completed, the engine proceeds to 1407; otherwise, the engine
increments the time cycle at 1413 and then loops back to 1402.
[0147] Beginning at 1407, the engine performs operations 1408 and
1409 for each observer (1410) other than converters in the network.
Each observer (other than converters) receives one output list as
input, which is designated its "read output list", and produces
another output list as output, which is designated its "write
output list". At 1408 the observer engine generates a detection
list for a particular observer by filtering its write output list
to keep only local maxima of any clustered elements in the list
("clustered" refers to physical proximity in the input image). At
1409, optionally, the observer engine renders the detection list on
top of the corresponding input image on a display device.
[0148] After all observers have been processed per 1408 and 1409,
the observer engine generates a table of results at 1411 by
concatenating all of the detection lists for all of the observers,
sorting the entries and filtering them to keep only local maxima of
any clustered elements. The observer engine then performs or
triggers a desired action at 1412, based on the table of results.
For example, the engine might cause the table to be displayed to a
user. As other examples, the engine might trigger an operation such
as indexing, sorting, comparing, initiating a visual effect or
sound effect, or essentially any other desired action.
[0149] For single image input, the algorithm is similar but not
identical. The basic difference is that for single-image input,
operation 1404, the execution of each observer, is iterated for
each of the layers of the observer network. Two observers are
considered to be in the same layer (stage) if their depth, i.e.,
their distance from the input, is the same. The entire network of
observers can be thought of as a pipelined processor. If an input
image is presented at the input of the network, several time cycles
(time steps) will be necessary for an input to reach the output of
the network. Information crosses one stage of the network per time
cycle. The number of necessary time cycles is equal to the depth of
the network according to classic graph theory.
[0150] In the algorithm for streamed input, the pipeline is fed at
each time step, so the engine's algorithm does not need to
incorporate the notion of layers. Regardless of the network
structure, each observer needs to execute an integrate-and-fire
process at each time cycle. However, in the single input image
algorithm, it is desired only to compute each observer once. To
accomplish that, the observer network is ordered in layers
reflecting their depth from the input layer. Consequently, the
information propagation is computed from the input image to the
deepest observer in the network efficiently.
[0151] Initializing an Observer
[0152] FIG. 15 shows an example of the process for initializing an
observer, as done in operation 1403 described above. Operations
1501 and 1508 cause the following set of operations to be executed
for each neuron (having coordinate (i,j)) in the observer. First,
the potential P(i,j) of the neuron is initialized to zero. If the
observer is a converter (1503), the process proceeds to 1505;
otherwise, the modulation factor M(i,j) is initialized to 1, and
the process then proceeds to 1505.
[0153] At 1505 all elements are removed from the read output list
of the observer. The read output list is then set equal to the
write output list of the immediately upstream observer at 1506. At
1507 the process empties the write output list.
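In code form, this per-observer reset and list rotation might be sketched as follows (attribute names are assumptions):

```python
def initialize_observer(obs, upstream):
    """Sketch of FIG. 15: reset potentials, reset modulation factors
    for non-converters, then rotate the output lists."""
    obs.P[:] = 0.0                       # 1502: potentials to zero
    if not obs.is_converter:
        obs.M[:] = 1.0                   # 1504: modulation factors to 1
    obs.read_output_list = upstream.write_output_list           # 1505, 1506
    obs.write_output_list = [[] for _ in obs.write_output_list]  # 1507
```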
[0154] Executing a Converter
[0155] FIG. 16 shows an example of the algorithm for executing a
converter. Initially at 1601 the converter convolves the input
image with its convolution kernel (matrix). As noted above, this
kernel can be any Mexican hat shaped function, such as LoG. Next,
1602 and 1607 cause the following set of operations to be executed
for each neuron (having coordinate (i,j)) in the converter.
[0156] First, the converter determines at 1603 whether the
potential P of the neuron exceeds a predetermined threshold, T1.
The threshold T1 may be user-specified or a default parameter. If P
does not exceed T1, the process proceeds to evaluate the next
neuron. If P exceeds T1, then at 1604 the converter rescales P
linearly in a range between 0 and the number of time slices in each
time step. The converter then truncates P at 1605 to remove any
fractional portion, and adds the coordinate (i,j) of the neuron to
its write output list indexed by P (i.e., by time slice).
[0157] Executing Observers Other than Converters
[0158] FIG. 17 shows an example of the algorithm for executing an
observer which is not a converter. The process is also illustrated
conceptually in FIG. 18. The observer is designated "observer A" in
FIG. 17 to facilitate explanation. Operations 1701 and 1715 cause
the following set of nested loops to be executed for each time
slice in the current time step.
[0159] For each neuron whose coordinate (i,j) is listed in observer
A's read output list (1702), and for each immediately downstream
observer ("observer B") that is connected (via a pattern filter) to
observer A (1703), and for each neuron (having coordinate (k,l))
subject to integration (1704) in observer B, observer A computes
the potential P(t) for the current time slice as
P(t)=P(t-1)+W(r,s)*M(t)
as illustrated in FIG. 18,
[0160] where:
[0161] W(r,s) is the weight value at coordinate (r,s) of the
projective synaptic weight matrix in the pattern filter of observer
B, used for integration of the neuron at coordinate (k,l) in
observer B while neuron (i,j) in observer A is integrating;
[0162] M(t) = M(t-1)*beta;
[0163] beta is the shunting factor discussed above;
[0164] r = k-i+N/2 and belongs to the interval [0,N], where N is the
width of the projective synaptic weight matrix; and
[0165] s = l-j+P/2 and belongs to the interval [0,P], where P is the
height of the projective synaptic weight matrix.
[0166] Reference numeral 181 indicates the area covering all
neurons subject to the integration process induced by the neuron of
observer A at coordinate (i,j). Coordinate (i,j) in Observer B is
the center of the integration area in Observer B.
[0167] Next, if all neurons (k,l) subject to integration have been
processed (1707), then the process proceeds to 1708. Beginning in
1708, for each neuron (having coordinate (i,j)) in observer A, if
that neuron's potential P exceeds a predetermined threshold T2 at
1709, then at 1710 observer A adds the coordinate (i,j) of the
neuron to its write output list for the next time
slice. The threshold T2 may be user-specified or a default
parameter. The observer then sets the potential P equal to a
negative value, -K, to prevent the neuron from being reactivated in
the current time step. If that neuron's potential P does not exceed
threshold T2 at 1709, then the process proceeds from 1709 to
1712.
[0168] Per 1712, operations 1709-1711 are carried out as described
above for each neuron in observer A. Per 1713, operations 1704-1712
are carried out as described above for each downstream observer of
observer A. Per 1714, operations 1703-1713 are carried out as
described above for each neuron whose coordinate is listed in the
read output list of observer A. Per 1715, operations 1702-1714 are
carried out as described above for each time slice of the current
time step.
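A sketch of this nested loop (the data layout, e.g., per-slice coordinate lists with one extra slot for the final firing, and per-neuron P and M arrays, is an assumption; applying the shunting factor once per computed weight follows paragraph [0119]):

```python
import numpy as np

def execute_observer(observer_a, n_slices, beta=0.75, T2=1.0, K=1e6):
    """Sketch of FIG. 17: observer A's active inputs are integrated
    into each downstream observer B; observer A then fires its own
    supra-threshold neurons into the next time slice."""
    for t in range(n_slices):                              # 1701, 1715
        for (i, j) in observer_a.read_output_list[t]:      # 1702, 1714
            for b in observer_a.downstream:                # 1703, 1713
                proj = b.pattern_filter.projective
                n, p = proj.shape
                for r in range(n):                         # 1704-1707
                    for s in range(p):
                        k, l = i + r - n // 2, j + s - p // 2
                        if 0 <= k < b.P.shape[0] and 0 <= l < b.P.shape[1]:
                            b.P[k, l] += proj[r, s] * b.M[k, l]
                            b.M[k, l] *= beta              # shunting
        for (i, j) in np.argwhere(observer_a.P > T2):      # 1708-1712
            observer_a.write_output_list[t + 1].append((int(i), int(j)))
            observer_a.P[i, j] = -K     # 1711: block re-firing this step
```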
[0169] Creating a New Observer
[0170] The user can create a new observer in the network to
recognize a new pattern. Creation of a new observer can be done
between two time steps of runtime. In the case of streaming input,
creation of a new observer does not stop the runtime. Creation of a
new observer involves the creation of a new array of neurons and a
set of pattern filters connected to upstream observers previously
created. Each new pattern filter is stored in both the receptive
scheme for the learning process and in the projective scheme for
the integration process.
[0171] Now consider a practical example of creating a new observer.
Assume a user wants to create a mouth-with-smile recognizer and a
mouth-with-no-smile recognizer. The user can create a new observer
for a mouth-with-smile and a new observer for a
mouth-with-no-smile. The reason he will create two new observers in
this example is that he clearly wants to recognize two different
shapes of the same object.
[0172] Note, however, particularly for the mouth-with-smile, that
a single example, e.g., the example used for the creation of the
new observer, may not be enough to create a sufficiently robust
mouth-with-smile recognizer. Sometimes the observer may consider
something in the background as a mouth-with-smile, and sometimes it
may fail to recognize a mouth-with-smile. Therefore, the
capabilities of an observer to learn by example or by
counter-example, as described below, allow the user to teach the
observer to remove false detections in the background and to make
the mouth-with-smile detection more reliable. The user will have
the same problem with the mouth-with-no-smile, and may have to
teach this observer for the same reasons.
[0173] When creating a new observer, the size of the new pattern
filters (i.e., the size of both the receptive and projective
synaptic weight matrices) is based on the size of a user-selected
region in the image and also takes into account any scale reduction
of the upstream observers. The initial synaptic weight matrices of
a pattern filter are created by using the ordered output list of
the upstream observer at the position defined by the user's
selection.
[0174] Elements of the output list of an observer are, in essence,
spikes ordered in time. Values are given to elements of the list
depending on their relative order, the first elements being given
the highest value, the last being the lowest value. Therefore, the
initial synaptic weights for an observer's pattern filter are all
positive values.
[0175] At this point, the preferred pattern for the new observer
will be the pattern used to create the synaptic weights. Therefore,
the content of the user's selection in the input image will produce
a maximum potential P for the neuron which is located at the center
of the user's selection in the new observer, as shown in FIG.
20.
[0176] An example of the process of creating a new observer is
illustrated in FIG. 19. The process can be performed by the tools
module and the GUI shown in FIG. 4. Refer also to FIG. 20, which
illustrates the process conceptually.
[0177] Initially, at 1901 a user input specifying the name for the
new observer is received via the GUI. Next the process allocates
memory for the new observer at 1902. At 1903 the process inputs a
user-specified list of the observers that are immediately upstream
from the new observer. These are the observers whose output lists
the new observer will process, via its pattern filters. At 1904 the
process inputs a user-specified rectangle 201 as a selection in the
input image, where the rectangle 201 has dimensions N×P
pixels. The pixel closest to the center of the selected rectangle
201 has coordinate (i,j). The rectangle 201 may be defined by the
user using any conventional image selection tool, such as a mouse
pointer or the like. The size of the rectangle 201 defines the size
of the synaptic weight matrices in the pattern filters of the new
observer.
[0178] Next, per 1905 and 1912 the following set of operations is
performed for each immediately upstream observer of the new
observer. First, at 1905 the process creates a new pattern filter
between the new observer and the currently selected upstream
observer, with matrix sizes defined from the user-specified
rectangle. Next, at 1906 the process resets each receptive synaptic
weight matrix coefficient to zero. Then, for each element E in the
write output list (1907), the process determines at 1908 whether
the element's coordinate (k,l) belongs to the converted selected
neuron's set, which is the region of interest (ROI) defined by: 1)
a center which is the transformed coordinate (i,j) in the upstream
observer coordinate system after applying the chain of coordinate
transform (scale and translation) imposed by the network of
observers, and 2) a size which is the size of the pattern filter
linking the newly created observer and the upstream observer,
Observer I. If the element's coordinate does belong to that ROI,
then at 1909 the process sets the weight value at coordinate (x,y)
in the receptive synaptic weight matrix (weight(x,y)_receptive)
equal to the number of time slices minus the time slice of element
E in the output list.
[0179] By design, the receptive synaptic weight matrix has the same
dimensions as the ROI mentioned above. In that case, coordinates
(x,y) are the coordinates of the weights inside the receptive
synaptic weight matrix (starting with (0,0) at the upper left,
which is the classical convention for matrices). The conversion of
a coordinate (k,l) into the ROI, i.e., the receptive synaptic
weight matrix's coordinate system, is defined by x = k-i+N/2 and
y = l-j+P/2, where N and P are respectively the width and height of
the receptive synaptic weight matrix and (i,j) is the center of the
ROI.
[0180] After each element E of the write output list has been
processed (1910), the process sets the projective synaptic weight
matrix equal to the transpose of the receptive synaptic weight
matrix at 1911. Per 1912, the process then loops back to 1905,
unless all immediately upstream observers have been processed.
[0181] In FIG. 20, rectangles 202-1 and 202-n indicate the
converted selected neurons based on the user's selection in the
input image. If the observers have the same size as the input image and
are not translated, the selection is centered around the neuron
(i,j) and has the same size, N by P. Translation changes the center
position (i,j) but not the size. Rescaling an observer changes both
the center coordinate and the selection size. Classic geometry can
be applied.
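A sketch of this initialization rule (the data layout and names are assumptions):

```python
import numpy as np

def initial_receptive_weights(upstream_output_list, center, n_slices, n, p):
    """Sketch of operations 1906-1911: build a receptive matrix from
    the upstream observer's ranked output list. Elements that fired
    in earlier time slices receive larger (always positive) weights."""
    i, j = center
    w = np.zeros((n, p))
    for t, coords in enumerate(upstream_output_list):  # t = time slice
        for (k, l) in coords:
            x, y = k - i + n // 2, l - j + p // 2      # into ROI coords
            if 0 <= x < n and 0 <= y < p:              # inside the ROI
                w[x, y] = n_slices - t
    return w   # the projective matrix is then w.T
```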
[0182] Observer Learning (Teaching an Observer)
[0183] As noted above, the user can teach an existing observer in
the network to improve the accuracy of its recognition. Observers
use a supervised method for learning new categories of features.
This method is based on a selection in the input image itself
(e.g., by a user), and propagation of the image in the network of
observers. This approach allows the creation of a specific pattern
recognizer without the user having to write code to create the
pattern recognizer.
[0184] The result of observer learning is to update the weights in
the synaptic weight matrices of the observer. The technique
introduced here allows for refinement of the weights by giving
(positive) examples or counter-(negative) examples. In one
embodiment, the user clicks roughly on the center of the example
(or counter-example) in the input image, as displayed on a display
device. The pattern recognizer then searches for the coordinates in
the image that have a maximum of potential in the targeted observer
(corresponding to the best matches) and corrects the user's click
coordinate. When the best matching coordinates are found, the
pattern recognizer reconstructs the equivalent selections in the
upstream observers as in the creation of a new observer process (as
described above). This selection is created by using the best
matching coordinates and pattern filter size (i.e., the size of the
synaptic weight matrices). The contents of this selection in each
upstream observer are used to update the synaptic weights.
[0185] In the case of learning by example, for each synaptic weight
and its associated input value, an average is computed to make the
new synaptic weight better adapted to the new example, while
keeping an adaptation to all the previous examples at the same
time.
[0186] In the case of learning by counter-example, for each element
of the input value, if the input element is not zero and the
associated synaptic weight is not zero, the input value remains
unchanged. If the input element is not zero and the associated
synaptic weight is zero, the input value has its sign changed
(i.e., becomes negative). The same average method is used in the
negative example case as in the positive example case, with the
modified input values.
[0187] FIGS. 21A and 21B together show an example of a process for
observer learning (also called teaching an observer). Reference is
also made to FIG. 22, which illustrates the process conceptually.
Initially, at 2101 the process inputs a user's selection of an
observer to teach. The selection may be made via the GUI. At 2102
the process inputs a user-specified point (u, v) in the input
image. Next, the process inputs a user's selection of whether the
process will teach by example or counter-example at 2103, and then
inputs a user selected rectangle in the source image at 2104 as the
example or counter-example.
[0188] At 2105 the process converts the coordinate (u,v) into
another coordinate (u',v') in the newly created observer coordinate
system. Coordinate (u',v') is the transformed coordinate of the
user selected coordinate (u,v) in the upstream observer coordinate
system after applying the chain of coordinate transform (scale and
translation) imposed by the network of observers. Coordinate
(u',v') is then converted at 2106 into a new coordinate (i,j).
[0189] Regarding operation 2106, the user selected transformed
coordinate (u',v') is not necessarily the optimal coordinate as an
input for the learning process. Therefore, operation 2106 determines
the optimal coordinate around (u',v'), to which to apply the
learning process. More specifically, if the detection list of the
observer to teach is not void and contains a coordinate (x,y) close
to (u',v'), i.e. where u'-x<N/2 and v'-y<P/2, where N and P
are respectively the common width and height of the receptive
synaptic weight matrices (by design, all the receptive synaptic
weight matrices of a newly created observer have the same
dimensions), then coordinate (i,j) is considered to be coordinate
(x,y). Otherwise (i.e., if no detection has been found around u',v'
according to the constraint above), coordinate (i,j) is considered
to be the coordinate of the local maximum of potential around
(u',v'), where "around" in this context is defined by the ROI
centered at (u',v') and the size of which is N by P, where N and P
are respectively the width and height of the receptive synaptic
weight matrix.
[0190] After 2106, per 2107 and 2121 the process performs the
following set of operations for each upstream observer I of the
selected observer. First, at 2108 the process creates a new input
matrix having the same size as the receptive and projective
synaptic weight matrices of the selected observer. At 2109 the
process sets all of the coefficients of the input matrix to zero.
The process then creates a region of interest (ROI) 221 (see FIG.
22) in the upstream observer
I, where the ROI 221 is defined by: 1) a center which is
transformed coordinate (i,j) in the upstream observer coordinate
system after applying the chain of coordinate transform (scale and
translation) imposed by the network of observers, and 2) a size
which is the size of the pattern filter linking the Observer to be
taught and the upstream Observer I.
[0191] Then, for each element E in the write output list (2111,
2120), the process performs a sequence denoted by operations
2112-2119. At 2112, the process determines whether element E's
coordinate (k,l) belongs to the ROI of observer I. If it does, the
process proceeds to 2113; otherwise, the process jumps to 2114. At
2113, the process converts the element's coordinate into the
coordinate (r,s) in the selected neuron set, and sets the weight
(r,s) of the input matrix equal
to the number of time slices per time step minus the time slice of
element E. In that case coordinate (r,s) is the coordinate of the
weights inside the input matrix (starting with (0,0) at the upper
left, which is the classical convention for matrices). The
conversion of a coordinate (k,l) into the ROI, i.e. the receptive
synaptic weight matrix's coordinate system, is defined by r = k-i+N/2
and s = l-j+P/2, where N and P are respectively the width and height
of the input matrix (which is the same as the size of the ROI and
the receptive synaptic weight matrix) and (i,j) is the center of
the ROI.
[0192] At 2114, if the user had chosen to teach by example, then
the process jumps to 2118, described below. Otherwise, for teaching
by counter-example, the
process continues with 2115. At 2115, the process initiates a set
of operations for each weight value W(i,j) of the receptive
synaptic weight matrix and I(i,j) of the input matrix of the
observer to be taught. Specifically, at 2116 the process determines
whether the weight W(i,j) is less than or equal to zero. If it is,
the process then sets I(i,j) equal to -I(i,j) at 2117 and then
proceeds to 2118. If it is not less than or equal to zero, the
process jumps directly to 2118.
[0193] At 2118, for each weight W(i,j) of the receptive synaptic
weight matrix and I(i,j) of the input matrix, the process sets
weight W(i,j)(t+1) equal to the quantity
((1-alpha)*W(i,j)(t)+alpha*I(i,j)), where alpha is a real number
close to zero in the interval [0,1]. The process then sets the
projective synaptic weight matrix equal to the transpose of the
receptive synaptic weight matrix at 2119.
[0194] Operation 2120 then checks whether each element E of the
output list has been processed, and if not, the process loops back
to 2112. After all elements in the output list have been processed,
operation 2121 checks whether all upstream observers of the
selected observer have been processed, and if not, the process
loops back to 2107. After all upstream observers have been
processed, the process ends.
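A sketch of the weight update of operations 2114-2119 (names are assumptions; the pattern filter is as sketched earlier):

```python
import numpy as np

def teach(pattern_filter, input_matrix, alpha=0.05, counter_example=False):
    """Sketch of operations 2114-2119: blend the input matrix into
    the receptive weights with a small rate alpha, after negating
    input values sitting over non-positive weights when teaching by
    counter-example."""
    w = pattern_filter.receptive
    i_mat = input_matrix.copy()
    if counter_example:
        i_mat[w <= 0] *= -1                       # 2115-2117: sign change
    new_w = (1 - alpha) * w + alpha * i_mat       # 2118: running average
    pattern_filter.receptive = new_w
    pattern_filter.projective = new_w.T.copy()    # 2119: keep transpose
```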
[0195] An observer can be "recycled", i.e., have its synaptic weight
matrices reset, by using the above described process for creating a
new observer, but without changing the size of the synaptic weight
matrices. An example of when recycling an observer may be desirable
is when the source is a video stream and the user, having selected
a moving object in the video, is not satisfied with the selected
pattern, yet is satisfied with the size of the selection. By
recycling the observer, the synaptic weights can be reset easily,
with the user simply clicking at the center of the region of
interest.
[0196] The techniques introduced above can be implemented in
software and/or firmware in conjunction with programmable
circuitry, or entirely in special-purpose hardwired circuitry, or
in a combination of such embodiments. Special-purpose hardwired
circuitry may be in the form of, for example, one or more
application-specific integrated circuits (ASICs), programmable
logic devices (PLDs), field-programmable gate arrays (FPGAs),
etc.
[0197] Software or firmware to implement the techniques introduced
here may be stored on a machine-readable medium and may be executed
by one or more general-purpose or special-purpose programmable
microprocessors. A "machine-readable medium", as the term is used
herein, includes any mechanism that can store information in a form
accessible by a machine (a machine may be, for example, a
conventional computer, game console, network device, cellular
phone, personal digital assistant (PDA), manufacturing tool, any
device with one or more processors, etc.). For example, a
machine-accessible medium includes recordable/non-recordable media
(e.g., read-only memory (ROM); random access memory (RAM); magnetic
disk storage media; optical storage media; flash memory devices;
etc.), etc.
[0198] The term "logic", as used herein, can include, for example,
special-purpose hardwired circuitry, software and/or firmware in
conjunction with programmable circuitry, or a combination
thereof.
[0199] Although the present invention has been described with
reference to specific exemplary embodiments, it will be recognized
that the invention is not limited to the embodiments described, but
can be practiced with modification and alteration within the spirit
and scope of the appended claims. Accordingly, the specification
and drawings are to be regarded in an illustrative sense rather
than a restrictive sense.
* * * * *