U.S. patent application number 14/586863 was filed with the patent office on 2014-12-30 for systems and methods for obtaining structural information from a digital image, and was published as application 20150371360 on 2015-12-24. The applicant listed for this patent is QUALCOMM Incorporated. Invention is credited to Ning Bi, Magdi Abuelgasim Mohamed, Yingyong Qi, Michel Adib Sarkis, Xin Zhong.

Publication Number: 20150371360
Application Number: 14/586863
Family ID: 54869952
Filed: 2014-12-30
Published: 2015-12-24
United States Patent Application 20150371360
Kind Code: A1
Mohamed; Magdi Abuelgasim; et al.
December 24, 2015

SYSTEMS AND METHODS FOR OBTAINING STRUCTURAL INFORMATION FROM A DIGITAL IMAGE
Abstract
A method for obtaining structural information from a digital
image by an electronic device is described. The method includes
determining an iris position in a region of interest based on a
gradient direction transform. Determining the iris position may
include determining a first dimension position and a second
dimension position corresponding to a maximum value in a
transform space.
Inventors: Mohamed; Magdi Abuelgasim (San Diego, CA); Sarkis; Michel Adib (San Diego, CA); Qi; Yingyong (San Diego, CA); Zhong; Xin (San Diego, CA); Bi; Ning (San Diego, CA)

Applicant: QUALCOMM Incorporated, San Diego, CA, US

Family ID: 54869952
Appl. No.: 14/586863
Filed: December 30, 2014
Related U.S. Patent Documents

Application Number   Filing Date    Patent Number
62015043             Jun 20, 2014   --
62015060             Jun 20, 2014   --
Current U.S. Class: 382/282
Current CPC Class: G06K 9/6212 20130101; G06K 9/4642 20130101; G06K 9/0061 20130101; G06T 3/0012 20130101; G06T 2207/20021 20130101; G06K 9/4633 20130101
International Class: G06T 3/00 20060101 G06T003/00; G06T 7/00 20060101 G06T007/00
Claims
1. A method for obtaining structural information from a digital
image by an electronic device, comprising: determining an iris
position in a region of interest based on a gradient direction
transform.
2. The method of claim 1, wherein determining the iris position
comprises determining a first dimension position and a second
dimension position corresponding to a maximum value in a transform
space.
3. The method of claim 1, further comprising performing a second
transform based on a digital image, and wherein determining the
iris position is based on a confidence measure that combines
information from a transform space of the gradient direction
transform and the second transform.
4. The method of claim 1, further comprising: performing blur
convolution based on a digital image to produce weights; and
weighting a transform space of the gradient direction transform
based on the weights to produce a weighted transform space.
5. The method of claim 1, further comprising determining a first
dimension component value and a second dimension component value,
wherein a gradient vector comprises the first dimension component
value and the second dimension component value.
6. The method of claim 1, wherein arithmetic operations of the
gradient direction transform include only one or more of a group of
integer multiplication, integer addition and integer
subtraction.
7. The method of claim 1, wherein determining a first set of pixel
values comprises multiplying an error value by 2.
8. The method of claim 1, wherein numbers utilized by the gradient
direction transform only include integer values, wherein the
integer values are not represented as floating point numbers.
9. The method of claim 1, wherein each element of a transform space
of the gradient direction transform is represented as a first
dimension position, a second dimension position and a value.
10. The method of claim 1, wherein determining a first set of pixel
values comprises comparing a multiplied error value with one or
more of a second dimension component value and a first dimension
component value.
11. An electronic device for obtaining structural information from
a digital image, comprising: a processor; memory in electronic
communication with the processor; and instructions stored in the
memory, the instructions being executable by the processor to:
determine an iris position in a region of interest based on a
gradient direction transform.
12. The electronic device of claim 11, wherein determining the iris
position comprises determining a first dimension position and a
second dimension position corresponding to a maximum value in a
transform space.
13. The electronic device of claim 11, wherein the instructions are
further executable to perform a second transform based on a digital
image, and wherein determining the iris position is based on a
confidence measure that combines information from a transform space
of the gradient direction transform and the second transform.
14. The electronic device of claim 11, wherein the instructions are
further executable to: perform blur convolution based on a digital
image to produce weights; and weight a transform space of the
gradient direction transform based on the weights to produce a
weighted transform space.
15. The electronic device of claim 11, wherein the instructions are
further executable to determine a first dimension component value
and a second dimension component value, wherein a gradient vector
comprises the first dimension component value and the second
dimension component value.
16. The electronic device of claim 11, wherein arithmetic
operations of the gradient direction transform include only one or
more of a group of integer multiplication, integer addition and
integer subtraction.
17. The electronic device of claim 11, wherein determining a first
set of pixel values comprises multiplying an error value by 2.
18. The electronic device of claim 11, wherein numbers utilized by
the gradient direction transform only include integer values,
wherein the integer values are not represented as floating point
numbers.
19. The electronic device of claim 11, wherein each element of a
transform space of the gradient direction transform is represented
as a first dimension position, a second dimension position and a
value.
20. The electronic device of claim 11, wherein determining a first
set of pixel values comprises comparing a multiplied error value
with one or more of a second dimension component value and a first
dimension component value.
21. A computer-program product for obtaining structural information
from a digital image, comprising a non-transitory tangible
computer-readable medium having instructions thereon, the
instructions comprising: code for causing an electronic device to
determine an iris position in a region of interest based on a
gradient direction transform.
22. The computer-program product of claim 21, wherein determining
the iris position comprises determining a first dimension position
and a second dimension position corresponding to a maximum value in
a transform space.
23. The computer-program product of claim 21, further comprising
code for causing the electronic device to perform a second
transform based on a digital image, and wherein determining the
iris position is based on a confidence measure that combines
information from a transform space of the gradient direction
transform and the second transform.
24. The computer-program product of claim 21, further comprising:
code for causing the electronic device to perform blur convolution
based on the digital image to produce weights; and code for causing
the electronic device to weight a transform space of the gradient
direction transform based on the weights to produce a weighted
transform space.
25. The computer-program product of claim 21, further comprising
code for causing the electronic device to determine a first
dimension component value and a second dimension component value,
wherein a gradient vector comprises the first dimension component
value and the second dimension component value.
26. An apparatus for obtaining structural information from a
digital image, comprising: means for determining an iris position
in a region of interest based on a gradient direction
transform.
27. The apparatus of claim 26, wherein determining the iris
position comprises determining a first dimension position and a
second dimension position corresponding to a maximum value in a
transform space.
28. The apparatus of claim 26, further comprising means for
performing a second transform based on a digital image, and wherein
determining the iris position is based on a confidence measure that
combines information from a transform space of the gradient
direction transform and the second transform.
29. The apparatus of claim 26, further comprising: means for
performing blur convolution based on a digital image to produce
weights; and means for weighting a transform space of the gradient
direction transform based on the weights to produce a weighted
transform space.
30. The apparatus of claim 26, further comprising means for
determining a first dimension component value and a second
dimension component value, wherein a gradient vector comprises the
first dimension component value and the second dimension component
value.
Description
RELATED APPLICATIONS
[0001] This application is related to and claims priority to U.S.
Provisional Patent Application Ser. No. 62/015,043, filed Jun. 20,
2014, for "SYSTEMS AND METHODS FOR OBTAINING STRUCTURAL INFORMATION
FROM A DIGITAL IMAGE," and to U.S. Provisional Patent Application
Ser. No. 62/015,060, filed Jun. 20, 2014, for "SYSTEMS AND METHODS
FOR OBTAINING STRUCTURAL INFORMATION FROM A DIGITAL IMAGE."
TECHNICAL FIELD
[0002] The present disclosure relates generally to electronic
devices. More specifically, the present disclosure relates to
systems and methods for obtaining structural information from a
digital image.
BACKGROUND
[0003] In the last several decades, the use of electronic devices
has become common. In particular, advances in electronic technology
have reduced the cost of increasingly complex and useful electronic
devices. Cost reduction and consumer demand have proliferated the
use of electronic devices such that they are practically ubiquitous
in modern society. As the use of electronic devices has expanded,
so has the demand for new and improved features of electronic
devices. More specifically, electronic devices that perform new
functions and/or that perform functions faster, more efficiently or
more reliably are often sought after.
[0004] Some electronic devices utilize digital images. For example,
a smartphone may capture and process a digital image. However,
processing digital images may involve complex operations that
require significant resources (e.g., time and power). As can be
observed from this discussion, systems and methods that improve
digital image processing may be beneficial.
SUMMARY
[0005] A method for obtaining structural information from a digital
image by an electronic device is described. The method includes
determining an iris position in a region of interest based on a
gradient direction transform. Determining the iris position may
include determining a first dimension position and a second
dimension position corresponding to a maximum value in a transform
space. Each element of a transform space of the gradient direction
transform may be represented as a first dimension position, a
second dimension position and a value. Determining a first set of
pixel values may include comparing a multiplied error value with
one or more of a second dimension component value and a first
dimension component value.
[0006] The method may include performing a second transform based
on a digital image. Determining the iris position may be based on a
confidence measure that combines information from a transform space
of the gradient direction transform and the second transform.
[0007] The method may include performing blur convolution based on
a digital image to produce weights. The method may also include
weighting a transform space of the gradient direction transform
based on the weights to produce a weighted transform space.
[0008] The method may include determining a first dimension
component value and a second dimension component value. A gradient
vector may include the first dimension component value and the
second dimension component value.
[0009] Arithmetic operations of the gradient direction transform
may include only one or more of a group of integer multiplication,
integer addition and integer subtraction. Determining a first set
of pixel values comprises multiplying an error value by 2. Numbers
utilized by the gradient direction transform may only include
integer values. The integer values may not be represented as
floating point numbers.
[0010] An electronic device for obtaining structural information
from a digital image is also described. The electronic device
includes a processor. The electronic device also includes memory in
electronic communication with the processor. The electronic device
further includes instructions stored in the memory. The
instructions are executable by the processor to determine an iris
position in a region of interest based on a gradient direction
transform.
[0011] A computer-program product for obtaining structural
information from a digital image is also described. The
computer-program product includes a non-transitory tangible
computer-readable medium with instructions. The instructions
include code for causing an electronic device to determine an iris
position in a region of interest based on a gradient direction
transform.
[0012] An apparatus for obtaining structural information from a
digital image is also described. The apparatus includes means for
determining an iris position in a region of interest based on a
gradient direction transform.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] FIG. 1 is a block diagram illustrating one configuration of
an electronic device in which systems and methods for obtaining
structural information from a digital image may be implemented;
[0014] FIG. 2 is a flow diagram illustrating one configuration of a
method for obtaining structural information from a digital
image;
[0015] FIG. 3A is a flow diagram illustrating a configuration of a
method for transforming each pixel in the region of interest in
accordance with a transform;
[0016] FIG. 3B is a flow diagram illustrating a configuration of a
method for determining one or more transform matrix values;
[0017] FIG. 3C is a flow diagram illustrating a configuration of a
method for processing a direction;
[0018] FIG. 4 is a block diagram illustrating another configuration
of an electronic device in which systems and methods for obtaining
structural information from a digital image may be implemented;
[0019] FIG. 5 is a flow diagram illustrating another configuration
of a method for obtaining structural information from a digital
image;
[0020] FIG. 6 is a block diagram illustrating examples of some
modules that may be implemented in conjunction with some
configurations of the systems and methods disclosed herein;
[0021] FIG. 7 illustrates an example of a Hough transform utilized
for iris detection;
[0022] FIG. 8 illustrates the Timm & Barth approach for
detecting the center of an iris;
[0023] FIG. 9 is a diagram illustrating one example of the
transform disclosed herein;
[0024] FIG. 10 is a diagram illustrating one example of Bresenham's
algorithm for drawing straight lines;
[0025] FIG. 11 is a diagram illustrating one example of Bresenham's
algorithm for drawing circles;
[0026] FIG. 12 is a diagram illustrating one example of Bresenham's
algorithm for drawing ellipses;
[0027] FIG. 13 illustrates examples of applications of the
transform disclosed herein;
[0028] FIG. 14 illustrates one example of an image of an eye;
[0029] FIG. 15 illustrates one example of a gray image of an
eye;
[0030] FIG. 16 illustrates one example of a gradient horizontal
component (Gx) gray image of an eye;
[0031] FIG. 17 illustrates another example of a gradient vertical
component (Gy) gray image of an eye;
[0032] FIG. 18 illustrates one example of a transform space in
accordance with the systems and methods disclosed herein;
[0033] FIG. 19 illustrates another representation of the transform
space in accordance with the systems and methods disclosed
herein;
[0034] FIG. 20 illustrates one example of a transform space, in
lower spatial resolution, in accordance with the systems and
methods disclosed herein;
[0035] FIG. 21 illustrates another representation of the transform
space, in lower spatial resolution, in accordance with the systems
and methods disclosed herein;
[0036] FIG. 22 illustrates one example of a Timm & Barth
transform;
[0037] FIG. 23 illustrates another representation of the Timm &
Barth transform;
[0038] FIG. 24 illustrates one example of a Timm & Barth
transform, in lower spatial resolution;
[0039] FIG. 25 illustrates another representation of the Timm &
Barth transform, in lower spatial resolution;
[0040] FIG. 26 illustrates a comparison between the transform
disclosed herein and the Timm & Barth transform;
[0041] FIG. 27 illustrates another comparison between the transform
disclosed herein and the Timm & Barth transform;
[0042] FIG. 28 is a block diagram illustrating a more specific
configuration of modules for obtaining structural information from
a digital image;
[0043] FIG. 29 illustrates an example of the BAR transform;
[0044] FIG. 30 is a diagram illustrating one example of a deep
neural network;
[0045] FIG. 31 is a flow diagram illustrating one configuration of
a method for determining a character from a digital image;
[0046] FIG. 32 illustrates an example of a transform space using a
gradient normal direction for handwriting character
recognition;
[0047] FIG. 33 illustrates an example of a transform space using a
gradient tangent direction for handwriting character
recognition;
[0048] FIG. 34 illustrates an example of gradient direction
descriptor (GDD) computations in accordance with the systems and
methods disclosed herein;
[0049] FIG. 35 is a diagram illustrating construction of a feature
descriptor based on the transform described herein;
[0050] FIG. 36 summarizes some results for a handwriting
recognition application for different feature descriptors with
neural networks of 0 hidden layers;
[0051] FIG. 37 summarizes some results for a handwriting
recognition application for different feature descriptors with
neural networks of 2 hidden layers;
[0052] FIG. 38 is a block diagram illustrating one configuration of
a wireless communication device in which systems and methods for
obtaining structural information from a digital image may be
implemented; and
[0053] FIG. 39 illustrates certain components that may be included
within an electronic device.
DETAILED DESCRIPTION
[0054] Systems and methods for obtaining structural information from
a digital image are described herein. For example, some
configurations of the systems and methods disclosed herein may
utilize a gradient direction transform for fast detection of curved
items in digital images.
[0055] Detecting naturally curved items in digital images by
identifying their constituent components usually requires
sophisticated processing to deal with significant variations in
object size and imaging conditions. Conventional Hough
Transform-based techniques work on the gradient magnitude of the
images after binarization. The conventional Timm and Barth (T&B)
technique considers only the gradient orientations (ignoring the
direction signs) and overlooks the size of the object. The
requirements of these conventional techniques limit their use on
real-time platforms with small memory and processing footprints for
high-resolution inputs and data rates.
[0056] The systems and methods disclosed herein describe a
transform for fast detection of naturally curved items in digital
images. This general purpose image transform may be defined to suit
platforms with limited memory and processing footprints by
utilizing simple operations (e.g., only additions and simple shift
and bitwise operations in some configurations). This unique
algorithmic approach may be applied to real world problems of iris
detection and handwriting recognition systems as applications in
electronic devices. The new approach has been tested on several
data sets and the experiments show promising and superior
performance compared to known techniques.
[0057] In particular, the systems and methods disclosed herein may
provide a general purpose digital image transform that
characterizes the content of the input image by emphasizing the
locations where the gradient normal vectors intersect and/or
diverge. Accordingly, this transform may provide concavity and
convexity descriptors of the content. By virtue of its design, this
novel integer-computation-based Gradient Direction Transform (GDT)
can differentiate between the positive and negative direction of
the gradient normal vectors, rather than only considering the
orientation of the gradient normal vectors as done in the expensive
floating-point-computation-based T&B technique. The GDT can be
extended to consider the gradient tangent vector direction together
with or instead of the gradient normal vector direction. These GDT
mappings can be used as standalone techniques or in combination
with the classical techniques depending upon the target
application. One application of the GDT is real-time iris
detection. This application utilizes an efficient rasterizing
algorithm for determining curves, developed by Bresenham,
performing the computations using simple integer additions and bit
shifts.
[0058] Various configurations are now described with reference to
the Figures, where like reference numbers may indicate functionally
similar elements. The systems and methods as generally described
and illustrated in the Figures herein could be arranged and
designed in a wide variety of different configurations. Thus, the
following more detailed description of several configurations, as
represented in the Figures, is not intended to limit scope, as
claimed, but is merely representative of the systems and
methods.
[0059] FIG. 1 is a block diagram illustrating one configuration of
an electronic device 102 in which systems and methods for obtaining
structural information from a digital image 106 may be implemented.
Examples of the electronic device 102 include smartphones, cellular
phones, digital cameras, tablet devices, laptop computers, desktop
computers, video cameras, etc.
[0060] The electronic device 102 may include a digital image
obtaining module 104, a gradient vector determination module 108
and/or a transformation module 112. As used herein, a "module" may
be implemented in hardware (e.g., circuitry) or a combination of
hardware and software. It should be noted that one or more of the
modules described in connection with FIG. 1 may be optional.
Furthermore, one or more of the modules may be combined or divided
in some configurations. More specific examples of one or more of
the functions, procedures and/or structures described in connection
with FIG. 1 may be given in connection with one or more of FIGS.
2-6, 9-21 and 26-37.
[0061] The digital image obtaining module 104 may obtain a
digital image 106. For example, the electronic device 102 may
capture a digital image 106 using one or more image sensors and/or
cameras. Additionally or alternatively, the electronic device 102
may receive the digital image 106 from another device (e.g., a
memory card, an external storage device, a web camera, a digital
camera, a smartphone, a computer, a video camera, etc.).
[0062] The digital image 106 or a region of interest of the digital
image 106 may be provided to the gradient vector determination
module 108. The region of interest may include the entire digital
image 106 or a portion of the digital image 106. For example, the
region of interest may include a subset of the pixels of the
digital image 106. In some configurations, the region of interest
may be a cropped portion of the digital image 106. Additionally or
alternatively, the region of interest may be a down-sampled or
decimated version of all or part of the digital image 106. For
instance, the region of interest may be a lower resolution version
of all or part of the digital image 106.
[0063] In some configurations, the electronic device 102 (e.g.,
digital image obtaining module 104) may detect the region of
interest as a particular structure shown in the digital image 106.
For example, the electronic device 102 may detect a face, eye,
character (e.g., number, letter, etc.) and/or other structure shown
in the digital image 106. The electronic device 102 (e.g., digital
image obtaining module 104) may extract the region of interest
and/or crop out other portions of the digital image 106 to obtain
the region of interest.
[0064] The gradient vector determination module 108 may determine a
gradient vector for each pixel in a region of interest of the
digital image 106. For example, the gradient vector determination
module 108 may utilize one or more Sobel operators or Sobel filters
to determine a gradient vector for each pixel in the region of
interest. Other approaches may be utilized. In some configurations,
each of the gradient vectors 110 may be represented as a first
dimension component value (e.g., "dy") and a second dimension
component value (e.g., "dx"). The gradient vectors 110 may be
provided to the transformation module 112.
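
For illustration only (not part of the original disclosure), a minimal Python sketch of this step might compute integer Sobel gradients for a grayscale region of interest. The function name sobel_gradients and the NumPy-based layout are assumptions for this example:

    import numpy as np

    # 3x3 Sobel kernels for the horizontal (x) and vertical (y) gradient components.
    SOBEL_X = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=np.int32)
    SOBEL_Y = np.array([[-1, -2, -1],
                        [ 0,  0,  0],
                        [ 1,  2,  1]], dtype=np.int32)

    def sobel_gradients(G):
        """Return integer gradient components (Gx, Gy) for a 2-D grayscale array G."""
        G = G.astype(np.int32)
        rows, cols = G.shape
        Gx = np.zeros_like(G)
        Gy = np.zeros_like(G)
        for y in range(1, rows - 1):
            for x in range(1, cols - 1):
                window = G[y - 1:y + 2, x - 1:x + 2]
                Gx[y, x] = int((window * SOBEL_X).sum())
                Gy[y, x] = int((window * SOBEL_Y).sum())
        return Gx, Gy

Note that all values remain integers, consistent with the integer-only arithmetic emphasized elsewhere in the disclosure.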
[0065] The transformation module 112 may transform each pixel in
the region of interest (e.g., each gradient vector corresponding to
each pixel in the region of interest) in accordance with a
transform (e.g., the GDT). For example, the transformation module
112 may determine a first set of pixels for each pixel in the
region of interest. The first set of pixels includes any pixel
along a line that is collinear with or perpendicular to the
gradient vector and that passes through the pixel location (and/or
intersects the origin of the gradient vector for the current pixel,
for example). For instance, the transformation module 112 may
determine a line that is collinear with the gradient vector. The
line may extend to one or more edges of the region of interest. For
example, the line may extend from the current pixel (e.g., from the
origin of the gradient vector) in one or both directions to one or
more edges of the region of interest. Any pixel that is along the
line may be included in the first set of pixels. It should be noted
that a line may not be formed in some cases. For example, if the
first dimension component value and the second dimension component
value of the gradient vector are both 0, then no line may be formed
(and hence no pixels may be along a line). In this case, a set of
pixels (e.g., the first set of pixels) may be an empty set or may
include only one pixel (e.g., the current pixel).
[0066] The transformation module 112 may increment, for each pixel,
a first set of values in a transform space corresponding to any of
the first set of pixels that are in a first direction of the line.
For example, each value in the transform space corresponds to a
pixel in the region of interest of the digital image 106 (in the
image space). In some configurations, the transform space may
include a set of elements, where each element of the transform
space is represented as a first dimension position, a second
dimension position and a value. The first dimension position and
the second dimension position in the transform space may
respectively correspond to a first dimension position and a second
dimension position of the corresponding pixel in the image space.
For example, the first dimension position may be represented as an
index value along a vertical axis (e.g., y axis) and the second
dimension position may be represented as an index value along a
horizontal axis (e.g., x axis). Each value (e.g., score) in the
transform space may indicate a number of lines intersecting a pixel
corresponding to an element in the transform space.
[0067] The transformation module 112 may increment a first set of
values in the transform space corresponding to any of the pixels in
the first set of pixels in a first direction of the line. For
example, the transformation module 112 may increment each value in
the transform space that is along one direction of the line. The
direction may be in the same direction as the gradient vector, in
the opposite direction from the gradient vector, in one direction
along a line that is perpendicular to the gradient vector or in the
other direction along a line that is perpendicular to the gradient
vector. One or more of the values may accumulate (e.g., increase)
as values along lines corresponding to each pixel in the region of
interest are incremented. Each value may accordingly represent a
cumulative score.
[0068] In some configurations, the transformation module 112 may
increment or decrement a second set of values corresponding to any
of the pixels in the first set of pixels in a second direction of
the line. For example, assume a configuration in which the first
set of pixels includes pixels in line with a gradient vector that
extends to two edges of the region of interest. In this example,
the transformation module 112 may increment all values in the
transform space corresponding to pixels in the first set of pixels
that are in the same direction as the gradient vector. The
transformation module 112 may also decrement all values in the
transform space corresponding to pixels in the first set of pixels
that are in the opposite direction from the gradient vector.
[0069] It should be noted that the term "increment" and variations
thereof may mean incrementing in a positive direction (e.g., +1) or
incrementing in a negative direction (e.g., -1). The term
"decrement" and variations thereof may mean decrementing from a
positive direction (e.g., -1) or decrementing from a negative
direction (e.g., +1). When "increment" and "decrement" and
variations thereof are used in a single context (e.g., in a
configuration, in a claim, in an example, etc.), incrementing and
decrementing are opposite operations. It should be noted that an
increment size or decrement size may be an arbitrary value (e.g.,
+1, -1, +2, -2, etc.). In some configurations, increment and/or
decrement size may be limited to integer values.
[0070] It should also be noted that the term "addition" and
variations thereof may include adding positive numbers, negative
numbers or a combination thereof. For example, -1 may be added to
-1 to yield -2. In some configurations, "decrementing" may be
implemented as an addition of a negative number. For example, an
electronic device 102 may decrement a number by adding a negative
number.
[0071] The transformation module 112 may provide values 114
corresponding to the transform space. For example, the values 114
may be the values (e.g., scores) from each element of the transform
space. The values 114 may provide a measure of concavity and/or
convexity of one or more structures (e.g., lines, curves, shapes,
etc.) in the image. The values 114 may be utilized to determine one
or more parameters of the one or more structures. In one example,
high values 114 may indicate a focus (e.g., center) of structures.
For instance, a maximum score may indicate the center of a circle
(e.g., the iris of an eye) or an ellipse. Groups of approximately
uniform values 114 may be utilized to determine the size of a
shape. For example, rectangular shapes may exhibit approximately
uniform values 114 along width and/or height of the shape. Patterns
of scores may also be used to detect (e.g., recognize and/or
identify) certain structures.
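
As a hedged illustration (the helper name iris_position is an assumption), the maximum-score criterion described above can be read directly out of the transform space:

    import numpy as np

    def iris_position(R):
        """Return the (first dimension, second dimension) position of the
        maximum value in transform space R, per the maximum-score criterion."""
        y0, x0 = np.unravel_index(np.argmax(R), R.shape)
        return int(y0), int(x0)

This mirrors claim 2: a first dimension position and a second dimension position corresponding to a maximum value in the transform space.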
[0072] For clarity, some distinguishing characteristics of the
transform (e.g., GDT) disclosed in accordance with the systems and
methods disclosed herein are described as follows. It should be
noted that some of these distinguishing characteristics may only
apply in certain configurations of the transform (e.g., GDT).
[0073] The transform disclosed herein may specify one or more
lines, where the one or more lines are collinear with the gradient
vector and/or perpendicular to the gradient vector of each pixel.
Some known transforms (e.g., the Hough transform) may utilize
circles or cones. Utilizing circles or cones in a transform is more
computationally complex than utilizing lines. Accordingly, the
transform disclosed herein is advantageous because it is less
computationally complex than transforms that utilize circles, cones
or ellipses. In comparison, the transform disclosed herein may be
computed more quickly and may utilize fewer resources (e.g.,
processing resources, memory, power, etc.).
[0074] The one or more lines specified in accordance with the
transform disclosed herein may pass through the current pixel or
origin of the gradient vector. As described above, the transform
space values 114 along these one or more lines may be incremented
and/or decremented. Some known approaches for detecting shapes
(e.g., U.S. Patent Application Publication No. 2006/0098877) "vote"
for pixels along a line that does not pass through the origin. The
transform disclosed herein is superior to these known approaches
for several reasons. For example, these known approaches require
the specification of a number of sides of a polygon as well as a
size (e.g., radius). In contrast, the transform disclosed herein
may not require the specification of a number of sides or a size.
Furthermore, the transform disclosed herein may operate on
arbitrary shapes and/or curves of arbitrary size. These known
approaches also require determining an endpoint in order to specify
a line for "voting." Determining an endpoint may require floating
point numbers, which the transform disclosed herein does not
require. In general, operations with floating point numbers may be
more computationally expensive than operations on integer values.
Accordingly, the transform disclosed herein may operate more
efficiently, more flexibly (e.g., with fewer
constraints/assumptions) and/or more quickly than these known
approaches.
[0075] In some configurations of the transform disclosed herein,
arithmetic operations of the transform (e.g., GDT) may only include
multiplication (e.g., integer multiplication), addition (e.g.,
integer addition) and/or subtraction (e.g., integer subtraction).
For example, each iteration of a loop for determining the first set
of pixels as described above may only utilize a multiplication by
two (which may be implemented as a simple bit shift) and a limited
number of addition and/or subtraction operations. Additionally or
alternatively, some configurations of the transform disclosed
herein may only utilize integer numbers (which may not be
represented as floating point numbers, for example). Known
approaches and/or transforms utilize computationally expensive
operations and/or data representations (e.g., floating point
numbers). For example, the T&B transform may utilize floating
point numbers and operations such as a dot product, squaring the
dot product and normalization. The Histogram of Gradients (HOG) may
use floating point numbers and a normalization operation. Other
approaches compute an angle using sine or cosine functions.
However, the transform disclosed herein may not utilize floating
point numbers. Furthermore, some configurations of the transform
disclosed herein may not use computationally expensive operations
such as the dot product, division, normalization, computing a norm,
raising by a power (in general), sine, cosine, affine
transformations, etc. Accordingly, the transform disclosed herein
may be advantageous because it requires fewer resources (e.g.,
processing resources, power, time, etc.) than known approaches.
[0076] The transform disclosed herein may offer a computational
complexity limited by A²BK₂, where A is the longest dimension of a
region of interest (e.g., a number of columns, the width of a
region of interest, etc.), B is the other dimension (e.g., a number
of rows, the height of a region of interest, etc.) and K₂ is a
constant that includes the computational cost of a multiplication
by two (which may be implemented as a bitwise shift) and a limited
number of addition and/or subtraction operations. Other known
approaches/transforms require a higher computational complexity.
For example, the T&B transform has a complexity of A²B²K₁, where K₁
is a constant that includes the computational cost of a normalized
floating point vector dot product. Accordingly, the transform
disclosed herein offers lower
computational complexity than other approaches. This is
advantageous because the transform disclosed herein may be
performed with fewer resources (e.g., processing resources, power,
time, etc.) than known approaches.
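
As an illustrative calculation (the numbers are chosen for this example and do not appear in the disclosure): for a 60×40-pixel region of interest (A=60, B=40), the GDT bound is A²BK₂ = 3600·40·K₂ = 144,000·K₂ elementary steps, whereas the T&B bound is A²B²K₁ = 3600·1600·K₁ = 5,760,000·K₁ steps. The GDT thus performs roughly B = 40 times fewer iterations, and each GDT iteration (K₂, a bit shift and a few integer additions) is also cheaper than a T&B iteration (K₁, a normalized floating point vector dot product).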
[0077] Some configurations of the transform disclosed herein may
only update values corresponding to pixels along a line for each
pixel. Other approaches (e.g., T&B) may update values
corresponding to all pixels for each pixel. Accordingly, the values
that are affected based on a gradient vector corresponding to a
single pixel may be fewer in some configurations of the transform
disclosed herein. This may be beneficial by reducing computational
complexity and improving the robustness in comparison with known
approaches.
[0078] In some configurations of the transform disclosed herein,
one or more lines may extend across the entire region of interest.
However, some other approaches may utilize gradient vectors 110 to
only operate on a very localized window (e.g., 3×3 pixels,
5×5 pixels, etc.). In contrast, the gradient corresponding to
a single pixel may selectively affect values across the region of
interest in accordance with some configurations of the transform
disclosed herein.
[0079] The transform disclosed herein may only utilize certain
aspects of each gradient vector. For example, the transform
disclosed herein may operate with only the origin and direction
(e.g., orientation) of the gradient vectors 110. In particular,
each line specified by the transform disclosed herein may be
generated only with the origin and direction of the gradient
vector. In contrast, the Hough transform utilizes the magnitude of
a gradient vector. The direction of the gradient vector in
accordance with the transform disclosed herein may be represented
in terms of a first dimension component value (with sign) and a
second dimension component value (with sign). The transform
disclosed herein may not explicitly calculate an angle of the
gradient vector. In contrast, other approaches use trigonometric
functions (e.g., sine, cosine, etc.) to obtain vector angles. Also,
the T&B transform squares a dot product, which eliminates the
direction of the vector. In other words, vector sign is irrelevant
to the T&B transform. By utilizing only certain aspects of each
gradient vector (e.g., origin and direction) in a simple
representation (e.g., origin with two values and direction in terms
of two values), the transform disclosed herein requires fewer
resources (e.g., processing resources, power, time, etc.) in
comparison with known approaches/transforms.
[0080] The transform disclosed herein may produce a transform space
with a particular meaning. The transform space may be represented
as a set of elements, where each element includes a first dimension
position (e.g., y0), a second dimension position (e.g., x0) and a
value. In some configurations, the transform space may be
represented with only the first dimension position, the second
dimension position and the value. As described above, each value
may represent a number of lines intersecting a pixel that
corresponds to the element in the transform space (e.g., an
accumulated score based on lines corresponding to pixels in the
region of interest). Other transform spaces in known approaches may
represent different quantities. For example, a transform space in
accordance with the Generalized Hough Transform may represent votes
for the size, orientation and location of an ellipse. In another
example, the Hough transform produces a transform space that
provides line parameters. The transform disclosed herein produces a
transform space that provides a measure of concavity and/or
convexity of one or more structures in a digital image 106.
[0081] FIG. 2 is a flow diagram illustrating one configuration of a
method 200 for obtaining structural information from a digital
image 106. The method may be performed by the electronic device 102
described in connection with FIG. 1. The electronic device 102 may
obtain 202 a digital image 106. This may be accomplished as
described above in connection with FIG. 1, for example.
[0082] The electronic device 102 may determine 204 a gradient
vector for each pixel in a region of interest of the digital image
106. This may be accomplished as described above in connection with
FIG. 1, for example.
[0083] For each pixel, the electronic device 102 may determine 206
a first set of pixels including any pixel along a line that is
collinear with or perpendicular to the gradient vector and that
passes through the pixel location (and/or intersects an origin of
the gradient vector, for example). This may be accomplished as
described above in connection with FIG. 1, for example.
[0084] For each pixel, the electronic device 102 may increment 208
a first set of values in a transform space corresponding to any of
the first set of pixels that are in a first direction of the line.
This may be accomplished as described above in connection with FIG.
1, for example.
[0085] Transforming each pixel in the region of interest may
comprise determining 206 the first set of pixels and incrementing
208 the first set of values. Transforming each pixel in the region
of interest in accordance with the systems and methods disclosed
herein solves technological problems. In particular, transforming
the pixels solves a problem of efficiently representing object
features in digital images. As discussed above, many approaches
utilize transforms that are computationally more complex (e.g.,
with floating point number representation and/or more
computationally expensive functions such as dot products and sine
or cosine functions). Accordingly, those approaches may waste power
and/or hinder implementation on platforms with limited processing
capability and/or limited power resources (e.g., mobile devices
powered with batteries). However, the systems and methods disclosed
herein may enable faster processing and/or greater energy
efficiency. This enables efficient object feature representation,
which may allow efficient processing on a variety of platforms and
in particular, on mobile platforms where processing and power are
limited resources.
[0086] The values 114 in the transform space (e.g., the transformed
pixels) may be utilized in a variety of applications. For example,
the values 114 may be applied to object detection, object tracking
and/or object recognition in digital images. For instance, the
values 114 may provide an indication of object parameters, such as
a center point of an iris. Additionally or alternatively, the
values 114 may provide recognition patterns for handwriting
characters.
[0087] FIG. 3A is a flow diagram illustrating a configuration of a
method 300a for transforming each pixel in the region of interest
in accordance with a transform. The method may be performed by the
electronic device 102 described in connection with FIG. 1, for
example. Specifically, FIG. 3A illustrates an example of an
algorithm of the gradient direction transform (e.g., a function
call, R=gdt(G, Gx, Gy)). The algorithm may be implemented, for
example, as a gdt(G, Gx, Gy) function that takes three inputs: G,
Gx and Gy. G is a matrix (e.g., image) that represents the pixels
in the region of interest. Gx is a matrix representing a gradient
horizontal component of the region of interest and Gy is a matrix
representing a gradient vertical component of the region of
interest. It should be noted that the variable names described
herein and/or depicted in the Figures are merely examples. Other
variable names may be utilized.
[0088] The electronic device 102 may initialize 302 parameters. For
example, a "rows" parameter may be set to the number of rows in G
(e.g., rows=rows(G)). A "cols" parameter may be set to the number
of columns in G (e.g., cols=columns(G)). All element values in a
transform matrix R may be initially set to 0, for example. The
transform matrix R may include values 114 in the transform space. A
vertical position parameter y0 may be initially set to 1 (e.g.,
y0=1) and a horizontal position parameter x0 may be initially set
to 1 (e.g., x0=1).
[0089] The electronic device 102 may determine 304 whether the
vertical position parameter is less than or equal to the rows
parameter (e.g., y0 ≤ rows, "y0<=rows"). If the vertical
position parameter is not less than or equal to the rows parameter,
the electronic device 102 may output 316 the transform matrix
R.
[0090] If the vertical position parameter is less than or equal to
the rows parameter, the electronic device 102 may determine 306
whether the horizontal position parameter is less than or equal to
the columns parameter (e.g., x0 ≤ cols, "x0<=cols"). If the
horizontal position parameter is not less than or equal to the
columns parameter, the electronic device 102 may increment 314 the
vertical position parameter (e.g., y0=y0+1). The electronic device
102 may return to determining 304 whether the vertical position
parameter is less than or equal to the number of rows.
[0091] If the horizontal position parameter is less than or equal
to the columns parameter, the electronic device 102 may set 308
dimension component values (e.g., dx=Gx(y0, x0); dy=Gy(y0, x0)).
For example, the electronic device 102 may set a horizontal
dimension component value (e.g., dx) to the value of the gradient
horizontal component matrix (e.g., Gx) at a row and column
indicated by the vertical position parameter (e.g., y0) and the
horizontal position parameter (e.g., x0). The electronic device may
additionally set a vertical dimension component value (e.g., dy) to
the value of the gradient vertical component matrix (e.g., Gy) at a
row and column indicated by the vertical position parameter (e.g.,
y0) and the horizontal position parameter (e.g., x0).
[0092] The electronic device 102 may determine 310 one or more
transform matrix values (e.g., R=rasterDir(R, x0, y0, dx, dy)). For
example, the electronic device 102 may increment one or more values
of the transform matrix (e.g., R) along a line that is collinear
with or perpendicular to the gradient vector (e.g., dx, dy) and
passes through a pixel location (e.g., x0, y0). An example of an
algorithm for determining 310 one or more transform matrix values
is given in connection with FIG. 3B.
[0093] The electronic device 102 may increment 312 the horizontal
component value (e.g., x0=x0+1). The electronic device 102 may
return to determining 306 whether the horizontal position parameter
is less than or equal to the number of columns. As illustrated in
FIG. 3A, the algorithm may iterate over the rows and columns of G,
obtaining the transform matrix (or one or more values of the
transform matrix) R at each pixel location.
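
For illustration only, a minimal Python rendering of the FIG. 3A loop might look as follows. It assumes 0-based NumPy indexing (the flow diagram uses 1-based indexing) and the raster_dir helper sketched after the FIG. 3B discussion:

    import numpy as np

    def gdt(G, Gx, Gy):
        """Gradient direction transform: visit every pixel of the region of
        interest G and accumulate line scores into the transform matrix R."""
        rows, cols = G.shape
        R = np.zeros((rows, cols), dtype=np.int32)
        for y0 in range(rows):
            for x0 in range(cols):
                R = raster_dir(R, x0, y0, int(Gx[y0, x0]), int(Gy[y0, x0]))
        return R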
[0094] FIG. 3B is a flow diagram illustrating a configuration of a
method 300b for determining one or more transform matrix values.
The method 300b may be performed by the electronic device 102
described in connection with FIG. 1, for example. More
specifically, FIG. 3B illustrates an example of an algorithm for
determining one or more transform matrix values that takes inputs
A, x0, y0, dx and dy and returns a matrix R (e.g., a function call,
R=rasterDir(A, x0, y0, dx, dy)), where A is a matrix variable name
and x0, y0, dx and dy are scalar variable names.
[0095] The electronic device 102 may initialize 318 a temporary
matrix (e.g., R) as equal to the input matrix A (e.g., R=A). The
electronic device 102 may determine 320 whether the horizontal
dimension component (e.g., dx) is equal to 0 and the vertical
dimension component (e.g., dy) is equal to 0 (e.g., dx==0 AND
dy==0). If the horizontal dimension component (e.g., dx) is equal
to 0 and the vertical dimension component (e.g., dy) is equal to 0
(e.g., dx==0 AND dy==0), the electronic device 102 may output 328
the transform matrix (e.g., R). For example, in the case that the
gradients components in both directions are zero, the transform
matrix (e.g., R) may not be modified (for that pixel location, for
instance).
[0096] If the horizontal dimension component (e.g., dx) is not
equal to 0 or the vertical dimension component (e.g., dy) is not
equal to 0, the electronic device 102 may optionally process 322
the outward direction (of the gradient vector, for example).
Processing 322 the outward direction may include incrementing one
or more values in a direction relative to the gradient vector
(e.g., collinear with or perpendicular to the gradient vector).
Processing 322 the outward direction may be based on the input
matrix (e.g., A), the horizontal position parameter (e.g., x0), the
vertical position parameter (e.g., y0), the horizontal dimension
component value (e.g., dx), the vertical dimension component value
(e.g., dy) and/or a direction parameter (e.g., dir). In some
configurations, processing 322 the outward direction may be
performed in accordance with a function rasterDirection(A, x0, y0,
x0+dx, y0+dy, dir), where dir is -1 (or optionally 1 or 0). For
instance, x1=x0+dx and y1=y0+dy. The direction parameter (e.g.,
dir) may represent an amount by which values in the transform space
are incremented and/or decremented. In some configurations,
optionally processing 322 the outward direction may be performed in
accordance with the pseudo code in Listing (1).
Listing (1)
    /* process outward direction */
    dir = -1;  /* optionally, dir=1; or dir=0; */
    R = rasterDirection(A, x0, y0, x0+dx, y0+dy, dir);
[0097] The electronic device 102 may optionally adjust 324 the
transform matrix (e.g., R). For example, the electronic device may
subtract the direction parameter from the value of the transform
matrix at position indicated by the horizontal position parameter
and the vertical position parameter (e.g., R(y0, x0)=R(y0,
x0)-dir). The adjustment 324 may be performed in some
configurations and/or cases because the pixel location (y0, x0) may
be processed twice, being part of both the outward and inward
lines. In some configurations, optionally adjusting 324 the
transform matrix may be performed in accordance with the pseudo
code in Listing (2).
Listing (2)
    /* adjust for inward direction */
    R(y0, x0) = R(y0, x0) - dir;
In some configurations, the adjustment may be optionally utilized
in the application of the transform to raster only along one
direction (inward or outward) or both. For example, configuring the
transform along only one direction may reduce the total computation
and avoid processing the same pixel twice. It should be noted that
such configurations are optional and/or may only be used in certain
use cases.
[0098] The electronic device 102 may process 326 the inward
direction (of the gradient vector, for example). Processing 326 the
inward direction may include incrementing one or more values in a
direction relative to the gradient vector (e.g., collinear with or
perpendicular to the gradient vector, opposite from the outward
direction). Processing 326 the inward direction may be based on the
transform matrix (e.g., R), the horizontal position parameter
(e.g., x0), the vertical position parameter (e.g., y0), the
horizontal dimension component value (e.g., dx), the vertical
dimension component value (e.g., dy) and a direction parameter
(e.g., dir). In some configurations, processing 326 the inward
direction may be performed in accordance with a function
rasterDirection(R, x0, y0, x0-dx, y0-dy, dir), where dir is 1. For
instance, x1=x0-dx and y1=y0-dy. In some configurations, processing
326 the inward direction may be performed in accordance with the
pseudo code in Listing (3).
Listing (3)
    /* process inward direction */
    dir = 1;
    R = rasterDirection(R, x0, y0, x0-dx, y0-dy, dir);
[0099] Depending on the configuration, the electronic device 102
may process 326 just the inward direction or may process 322 the
outward direction and process 326 the inward direction. More detail
regarding processing 322 the outward direction and/or processing
326 the inward direction is described in connection with FIG. 3C.
The electronic device 102 (e.g., the rasterDir function) may output
328 the transform matrix (e.g., R).
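
A corresponding 0-based Python sketch of the FIG. 3B logic (again an illustration, with raster_direction sketched after the FIG. 3C discussion) might be:

    def raster_dir(A, x0, y0, dx, dy):
        """Process the outward and inward directions for one pixel."""
        R = A
        if dx == 0 and dy == 0:
            return R  # zero gradient: no line is formed for this pixel
        direction = -1  # process outward direction (optionally +1 or 0)
        R = raster_direction(R, x0, y0, x0 + dx, y0 + dy, direction)
        R[y0, x0] -= direction  # (y0, x0) is visited by both passes; undo one
        direction = 1   # process inward direction
        R = raster_direction(R, x0, y0, x0 - dx, y0 - dy, direction)
        return R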
[0100] FIG. 3C is a flow diagram illustrating a configuration of a
method 300c for processing a direction. The method may be performed
by the electronic device 102 described in connection with FIG. 1,
for example. More specifically, FIG. 3C illustrates an example of
an algorithm for processing a direction that takes inputs A, x0,
y0, x1, y1 and dir (e.g., function call R=rasterDirection(A, x0,
y0, x1, y1, dir)).
[0101] The electronic device 102 may initialize 330 parameters. For
example, the electronic device 102 may determine a horizontal
dimension component value (e.g., dx) as the absolute value of the
difference of a first horizontal dimension component value (e.g.,
x0) and a second horizontal dimension component value (e.g., x1).
For example, dx=abs(x1-x0). The electronic device 102 may
initialize a horizontal sign value (e.g., sx). For example, the
electronic device 102 may initialize the horizontal sign value
based on the first horizontal dimension component value (e.g., x0)
and the second horizontal dimension component value (e.g., x1). If
the first horizontal dimension component value is less than the
second horizontal dimension component value, the horizontal sign
value may be set to 1. Otherwise, the horizontal sign value may be
set to -1. For instance, if (x0<x1), sx=1; else sx=-1.
[0102] The electronic device 102 may determine a vertical dimension
component value (e.g., dy) as the negative absolute value of the
difference of a first vertical dimension component value (e.g., y0)
and a second vertical dimension component value (e.g., y1). For
example, dy=-abs(y1-y0). The electronic device 102 may initialize a
vertical sign value (e.g., sy). For example, the electronic device
102 may initialize the vertical sign value based on the first
vertical dimension component value (e.g., y0) and the second
vertical dimension component value (e.g., y1). If the first
vertical dimension component value is less than the second vertical
dimension component value, the vertical sign value may be set to 1.
Otherwise, the vertical sign value may be set to -1. For instance,
if (y0<y1), sy=1; else sy=-1.
[0103] The electronic device 102 may initialize an error value
(e.g., err). For example, the error value may be the sum of the
horizontal dimension component value (e.g., dx) and the vertical
dimension component value (e.g., dy). For instance, err=dx+dy.
[0104] The electronic device 102 may initialize a row parameter
(e.g., "rows") to the number of rows in the input matrix (e.g., A)
and may initialize a column parameter (e.g., "cols") to the number
of columns in the input matrix (e.g., A). For example, rows=rows
(A) and cols=columns(A). In some configurations, the transform
matrix (e.g., R) may be initialized to the input matrix (e.g., A).
For example, R=A. It should be noted that A is the matrix input
variable name. A continue indicator (e.g., "cont") may be
initialized to 1 (e.g., cont=1). In some configurations,
initializing 330 parameters may be accomplished in accordance with
the pseudo code in Listing (4).
Listing (4)
    /* set initial values */
    dx = abs(x1-x0);
    if (x0<x1), sx=1; else sx=-1; end;
    dy = -abs(y1-y0);
    if (y0<y1), sy=1; else sy=-1; end;
    err = dx + dy;
    rows = number of rows in A;
    cols = number of columns in A;
    R = A;
    cont = 1;
[0105] The electronic device 102 may determine 332 whether to
continue processing (e.g., whether cont==1). This determination may
be based on the continue indicator. If the electronic device 102
determines not to continue processing, the electronic device 102
(e.g., the rasterDirection function) may output 336 the transform
matrix (e.g., R). For example, if cont==0, then the electronic
device 102 may output the transform matrix.
[0106] If the electronic device 102 determines to continue
processing (e.g., if cont==1), the electronic device 102 may
process 334 a line. For example, the electronic device 102 may
increment a value in the transform matrix by the direction
parameter (e.g., dir). For instance, R(y0, x0)=R(y0, x0)+dir. The
electronic device 102 may multiply the error value by two to obtain a
multiplied error value (e.g., e2). For example, e2=2*err. The
electronic device 102 may adjust the error value (e.g., err) and
one or more of the position parameters (e.g., x0 and/or y0) based
on the multiplied error value, the horizontal dimension component
value, the vertical dimension component value, the horizontal sign
and/or the vertical sign. For example, if (e2>=dy),
err=err+dy; x0=x0+sx; end; if (e2<=dx), err=err+dx; y0=y0+sy;
end. The electronic device 102 may update the continue indicator
when one or more conditions are met. For example, if the horizontal
position parameter is less than 1 or is greater than the number of
columns or if the vertical position parameter is less than 1 or is
greater than the number of rows, the electronic device 102 may
update the continue indicator to indicate that processing should
not continue for the line. For instance, if ((x0<1) OR
(x0>cols) OR (y0<1) OR (y0>rows)) cont=0; end. As
illustrated in FIG. 3C, the electronic device 102 may continue to
process the line until the continue indicator indicates that
processing should not continue. In some configurations, processing
334 the line may be accomplished in accordance with the pseudo code
given in Listing (5).
Listing (5)
    R(y0,x0) = R(y0,x0) + dir;
    e2 = 2 * err;
    if (e2>=dy), err=err+dy; x0=x0+sx; end;
    if (e2<=dx), err=err+dx; y0=y0+sy; end;
    if ((x0<1) OR (x0>cols) OR (y0<1) OR (y0>rows)) cont=0; end;
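Putting Listings (4) and (5) together, the line rasterization may be expressed as a single C routine. The following is a minimal sketch only: the function signature, the row-major layout of R and the mapping of the 1-based coordinates of the pseudo code onto 0-based C array indices are assumptions, not part of the listings.

    #include <stdlib.h>

    /* Minimal sketch combining Listings (4)-(5): walk the line defined by
     * (x0, y0) and (x1, y1), adding dir to each visited cell of R, and
     * stop when the line leaves the region of interest. */
    void rasterDirection(int *R, int rows, int cols,
                         int x0, int y0, int x1, int y1, int dir)
    {
        int dx = abs(x1 - x0), sx = (x0 < x1) ? 1 : -1;   /* Listing (4) */
        int dy = -abs(y1 - y0), sy = (y0 < y1) ? 1 : -1;
        int err = dx + dy, e2;
        int cont = 1;

        while (cont) {                                    /* Listing (5) */
            R[(y0 - 1) * cols + (x0 - 1)] += dir;         /* R(y0,x0) += dir */
            e2 = 2 * err;
            if (e2 >= dy) { err += dy; x0 += sx; }        /* horizontal step */
            if (e2 <= dx) { err += dx; y0 += sy; }        /* vertical step */
            if (x0 < 1 || x0 > cols || y0 < 1 || y0 > rows)
                cont = 0;                                 /* left the ROI */
        }
    }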
[0107] FIG. 4 is a block diagram illustrating another configuration
of an electronic device 402 in which systems and methods for
obtaining structural information from a digital image 406 may be
implemented. The electronic device 402 described in connection with
FIG. 4 may be an example of the electronic device 102 described in
connection with one or more of FIGS. 1-2.
[0108] The electronic device 402 may include a digital image
obtaining module 404, a gradient vector determination module 408, a
transformation module 412 and/or a structure determination module
416. It should be noted that one or more of the modules described
in connection with FIG. 4 may be optional. Furthermore, one or more
of the modules may be combined or divided in some configurations.
More specific examples of one or more of the functions, procedures
and/or structures described in connection with FIG. 4 may be given
in connection with one or more of FIGS. 1-3, 5-6, 9-21 and
26-37.
[0109] The digital image obtaining module 404 may obtain a digital
image 406. For example, the electronic device 402 may obtain a
digital image as described above in connection with FIG. 1. The
digital image 406 or a region of interest of the digital image 406
may be provided to the gradient vector determination module 408 as
described above in connection with FIG. 1. For example, the
electronic device 402 may detect the region of interest as a
particular structure shown in the digital image 406. For instance,
the electronic device 402 may detect a face, eye, text, character
(e.g., number, letter, etc.) and/or other structure shown in the
digital image 406.
[0110] The gradient vector determination module 408 may determine a
gradient vector for each pixel in a region of interest of the
digital image 406. For example, the gradient vector determination
module 408 may determine gradient vectors 410 as described above in
connection with one or more of FIGS. 1-2. The gradient vectors 410
may be provided to the transformation module 412.
[0111] The transformation module 412 may transform each pixel in
the region of interest (e.g., each gradient vector corresponding to
each pixel in the region of interest) in accordance with a
transform (e.g., the GDT). For example, the transformation module
412 may determine a first set of pixels for each pixel in the
region of interest as described in connection with one or more of
FIGS. 1-3C. The transformation module 412 may provide values 414
corresponding to the transform space to the structure determination
module 416. For example, the values 414 may be the values (e.g.,
scores) from each element of the transform space (e.g., transform
matrix R).
[0112] The structure determination module 416 may determine one or
more structure parameters 418 based on the values 414. As described
above, the values 414 may provide a measure of concavity and/or
convexity of one or more structures (e.g., lines, curves, shapes,
etc.) in the digital image 406 (e.g., in the region of interest).
The values 414 may be utilized to determine one or more parameters
of the one or more structures. In one example, high values 414 may
indicate a focus (e.g., center) of structures.
[0113] In some configurations, the structure determination module
416 may determine an iris position in the region of interest based
on the values 414 (e.g., transform space, transform matrix R,
etc.). For example, the structure determination module 416 may
determine a first dimension position and a second dimension
position corresponding to a maximum value in the transform space.
In particular, the structure determination module 416 may determine
an element of the transform space with a maximum value. The first
dimension position and the second dimension position of the element
may correspond to a pixel in the region of interest where the
center of a circle or ellipse (e.g., an iris center) is located. In
some configurations, the location of the iris may be utilized to
perform one or more operations, such as eye tracking (e.g., for
3-dimensional (3D) image processing), user interface (UI) control,
camera steering, zoom, autofocus, etc.
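As a minimal illustration of this step (the function name and the 0-based, row-major indexing are assumptions, not part of the disclosure), the maximum search may be sketched in C as follows:

    /* Minimal sketch: return the (x, y) position of the maximum value in
     * a row-major transform matrix R of size rows x cols. */
    void findMaxPosition(const int *R, int rows, int cols,
                         int *bestX, int *bestY)
    {
        int maxVal = R[0];
        *bestX = 0;
        *bestY = 0;
        for (int y = 0; y < rows; y++) {
            for (int x = 0; x < cols; x++) {
                if (R[y * cols + x] > maxVal) {
                    maxVal = R[y * cols + x];
                    *bestX = x;   /* first dimension position */
                    *bestY = y;   /* second dimension position */
                }
            }
        }
    }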
[0114] In some configurations, the electronic device 402 may
perform additional operations. For example, the electronic device
402 may perform a second transform (e.g., a Hough transform) on the
digital image 406 (e.g., region of interest). Determining the iris
position may then be based on a confidence measure that combines
information from the transform space (of the GDT, for example) and
the second transform (e.g., Hough).
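The text does not fix a particular combination rule. As one hypothetical sketch only, the two transform spaces could be normalized to a common range and blended per location, with the blending weight alpha chosen empirically:

    /* Hypothetical sketch of a confidence measure combining GDT and
     * Hough evidence at one location. The normalization by the
     * respective maxima and the blending weight alpha are assumptions,
     * not the method prescribed by the text. */
    float combinedConfidence(int gdtVal, int gdtMax,
                             int houghVal, int houghMax, float alpha)
    {
        float g = (gdtMax > 0) ? (float)gdtVal / (float)gdtMax : 0.0f;
        float h = (houghMax > 0) ? (float)houghVal / (float)houghMax : 0.0f;
        return alpha * g + (1.0f - alpha) * h;
    }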
[0115] The electronic device 402 may additionally or alternatively
perform blur convolution based on the digital image 406 to produce
weights and gradient vectors 410. The electronic device 402 may
weight the GDT space based on the weights to produce a weighted GDT
space. The structure determination module 416 may utilize the
weighted transform space to determine the one or more structure
parameters 418 (e.g., the location of an iris).
[0116] In some configurations, the structure determination module
416 may perform character recognition (e.g., handwriting
recognition). For example, the structure determination module 416
may use the transforms for the normal gradient direction and the
tangent gradient direction to compute corresponding feature maps
and construct a compact feature vector, called a Gradient Direction
Descriptor (GDD), containing unique discriminant information. For
instance, the GDD may be used as an input (instead of the raw input
image, for instance) to handwriting classifiers such as Deep Neural
Networks (DNN) to achieve higher recognition accuracies with fewer
computations and lower memory requirements.
[0117] FIG. 5 is a flow diagram illustrating another configuration
of a method 500 for obtaining structural information from a digital
image 406. The method 500 may be performed by the electronic device
402 described in connection with FIG. 4. The electronic device 402
may obtain 502 a digital image 406. This may be accomplished as
described above in connection with one or more of FIGS. 1-2 and 4,
for example.
[0118] The electronic device 402 may determine 504 a gradient
vector for each pixel in a region of interest of the digital image
406. This may be accomplished as described above in connection with
one or more of FIGS. 1-2 and 4, for example.
[0119] For each pixel, the electronic device 402 may determine 506
a first set of pixels including any pixel along a line that is
collinear with or perpendicular to the gradient vector and that
passes through the pixel location (and/or intersects an origin of
the gradient vector). This may be accomplished as described above
in connection with one or more of FIGS. 1-4, for example.
[0120] For each pixel, the electronic device 402 may increment 508
a first set of values 414 in a transform space corresponding to any
of the first set of pixels that are in a first direction of the
line. This may be accomplished as described above in connection
with one or more of FIGS. 1-4, for example.
[0121] The electronic device 402 may determine 510 an iris position
in the region of interest based on the transform space. This may be
accomplished as described above in connection with one or more of
FIGS. 1 and 4, for example. For instance, the highest value of the
values 414 in the transform space (e.g., transform matrix R) may be
determined 510 as the iris position (e.g., the center of the
iris).
[0122] FIG. 6 is a block diagram illustrating examples of some
modules that may be implemented in conjunction with some
configurations of the systems and methods disclosed herein. In
particular, FIG. 6 illustrates a frame normalization module 620, an
image preprocessing module 622, an image segmentation module 624, a
feature extraction module 626, an analysis and classification
module 628 and a post-processing module 630. One or more of these
modules may be implemented in an electronic device (e.g., the
electronic device 102 described in connection with FIG. 1 and/or
the electronic device 402 described in connection with FIG. 4). In
some configurations, one or more of the modules described in
connection with FIG. 6 may be utilized for optical eye
tracking.
[0123] The frame normalization module 620 may normalize a digital
image 106 frame. For example, the frame normalization module 620
may normalize the size, light and/or pose of a frame. The image
preprocessing module 622 may perform binarization, noise filtering
and/or smoothing on the digital image 106.
[0124] The image segmentation module 624 may perform morphological
operations, clustering and/or relaxation techniques on the digital
image 106. The feature extraction module 626 may perform one or
more edge operations (e.g., detection, linking and/or
thinning).
[0125] The analysis and classification module 628 may perform one
or more search and/or image interpretation tasks. For example, the
analysis and classification module 628 may perform the transform
disclosed herein (as described in connection with one or more of
FIGS. 1-5) and/or one or more other transforms (e.g., Hough
transform) and/or other operations. For instance, the analysis and
classification module 628 may perform the GDT in order to detect an
iris (e.g., determine a location of an iris) in some
configurations. Iris detection may be performed for one or more
applications. For example, iris detection may be performed for
optical eye tracking. In some configurations, the analysis and
classification module 628 may perform character recognition (e.g.,
handwriting recognition).
[0126] The post-processing module 630 may determine spatial
relationships, perform one or more sanity checks and/or accept or
reject the image interpretation provided by the analysis and
classification module 628. For example, the post-processing module
630 may check the spatial relationships of one or more structures
(e.g., eyes, nose, mouth, etc.) in an image. If the spatial
relationships of the structures are beyond a threshold range in
distance (e.g., a nose is indicated above the eyes), then the
post-processing module 630 may reject the image interpretation.
However, if the spatial relationships of the structures are within
a threshold range in distance, the post-processing module 630 may
accept the image interpretation.
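As a minimal hypothetical sketch of such a sanity check (the structure coordinates, the specific rule and the function name are illustrative assumptions):

    /* Hypothetical sanity check: in image coordinates (y grows
     * downward), reject an interpretation in which the nose lies above
     * both eyes. */
    int acceptInterpretation(int leftEyeY, int rightEyeY, int noseY)
    {
        if (noseY < leftEyeY && noseY < rightEyeY)
            return 0;   /* reject: nose indicated above the eyes */
        return 1;       /* accept */
    }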
[0127] FIG. 7 illustrates an example of a Hough transform utilized
for iris detection. Previous studies in the literature of digital
image transformation for shape description involve Hough transform
engines (HTE), histogram approaches such as histogram of gradient
(HOG), edge histogram descriptor (EHD), histogram of sign of
gradient (HSG) and Timm & Barth transform (T&B), among
others. (See e.g., Paul V. C. Hough, "Method and Means for
Recognizing Complex Patterns," U.S. Pat. No. 3,069,654, issued on
Dec. 18, 1962; D. H. Ballard, "Generalizing the Hough Transform to
Detect Arbitrary Shapes," Pattern Recognition Vol. 13, No. 2 pp.
111-122, 1981; Magdi Mohamed and Irfan Nasir, "Method and System
for Parallel Processing of Hough Transform Computations," U.S. Pat.
No. 7,406,212, issued on Jul. 29, 2008; N. Dalal and B. Triggs,
"Histogram of Oriented Gradients for Human Detection," in IEEE
Conference Computer Vision and Pattern Recognition, June 2005;
ISO/IEC JTC1/SC29/WG11, "Core Experiment Results for Edge
Histogram Descriptor (CT4)," MPEG document M6174, Beijing, July
2000; Fabian Timm and Erhardt Barth, "Accurate Eye Center
Localisation by Means of Gradients," in Proceedings of the Int.
Conference of Computer Theory and Application (VISAPP), Volume 1,
pp. 125-130, Algarve, Portugal, 2011.) Daugman also proposed a
contour analyzing algorithm for iris recognition applications. (See
John Daugman, "How Iris Recognition Works", IEEE Transactions on
Circuits and Systems for Video Technology, Vol. 14, No. 1, pp.
21-30, 2004.) For handwriting recognition applications, one
technique based on the conventional binary distance transform
called the bar transform descriptor (BAR) was described and shown
to provide good accuracy. (See Paul Gader, Magdi Mohamed, and
Jung-Hsien Jiang, "Comparison of Crisp and Fuzzy Character Neural
Networks in Handwritten Word Recognition," IEEE Transactions on
Fuzzy Systems, Vol. 3, No. 3, pp. 357-363, 1995.) In connection
with FIGS. 7-8, details of selected techniques applied in the
literature to common applications of interest are described.
[0128] As described above, the Hough transform is one approach that
may be utilized for detecting an iris. Iris detection in accordance
with the Hough transform may have one or more aspects and/or
requirements. In particular, the Hough transform may deal with
variations in light and reflections. It may deal with partially
occluded, missing and noisy features. No special markers or makeup
may be required. The Hough transform may feature real-time
processing. It may also quantify action codes. Accordingly, the
Hough approach may detect multiple curves and is resilient to noisy
inputs.
[0129] More specifically, early activities for detecting
parameterized shapes such as straight lines, circles and ellipses
in binary digital images used the Hough transform. (See Paul V. C.
Hough, "Method and Means for Recognizing Complex Patterns," U.S.
Pat. No. 3,069,654, issued on Dec. 18, 1962.) Gray input images are
usually binarized based on an estimation of the gradient amplitude
and an optimal threshold before computing the Hough transform.
While the Hough approach was found to be very robust and also
capable of detecting multiple curves using a single transform, it
is computationally expensive and requires large memory for
characterizing shapes with a large number of parameters. Several
extensions have been proposed to generalize the Hough method
including the Ballard approach. (See D. H. Ballard, "Generalizing
the Hough Transform to Detect Arbitrary Shapes," Pattern
Recognition Vol. 13, No. 2 pp. 111-122, 1981.)
[0130] Formally, the conventional Hough transform uses a primitive
curve form satisfying the equation
$$s(x, p) = 0 \quad (1)$$
where p is a parameter vector and x is a position vector in the
input image. This can be viewed as an equation defining points x in
the image space for a fixed parameter vector p, or as defining
points in a parameter space for fixed values of the position vector
x (i.e., for a particular pixel location). In computation of a
Hough transform, the parameter space is quantized to discrete
values of the parameter vector to form a Hough parameter space P.
For a fixed parameter vector $p_k \in P$, the coordinates of
x in the image space that satisfy equation (1) are denoted as
$x_n(p_k)$. The value of the corresponding point in the
parameter space is defined as
$$H(p_k) = \sum_{n=1}^{N} A(x_n(p_k)) \quad (2)$$
where A(x) is the gray level value of the pixel at position x, and
N is the total number of pixels in the input image data. Usually,
A(x) is set to the value 1 for foreground pixels and 0 for
background pixels. The value corresponding to a point in the Hough
transform space can then be calculated recursively as
$$H_0(p_k) = 0$$
$$H_n(p_k) = H_{n-1}(p_k) + A(x_n(p_k)), \quad n = 1{:}N \quad (3)$$
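For concreteness, the recursion in Equation (3) may be sketched for circles of a fixed radius r: each foreground pixel votes for every quantized center that would place it on the circle. This is only an illustrative C sketch; the angular sampling and array layout are assumptions, and it deliberately uses floating point trigonometry for clarity, whereas the Bresenham-based approach described in connection with FIGS. 10-12 avoids floating point computations.

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* Hypothetical sketch of Hough accumulation for circles of fixed
     * radius r, following Equation (3): each foreground pixel of the
     * binary image A votes for the centers (a, b) that would place it
     * on the circle. */
    void houghCircleVotes(const unsigned char *A, int rows, int cols,
                          int r, int *H)
    {
        for (int y = 0; y < rows; y++) {
            for (int x = 0; x < cols; x++) {
                if (!A[y * cols + x]) continue;      /* background pixel */
                for (int t = 0; t < 360; t++) {      /* sample the circle */
                    double th = t * (M_PI / 180.0);
                    int a = x - (int)lround(r * cos(th));
                    int b = y - (int)lround(r * sin(th));
                    if (a >= 0 && a < cols && b >= 0 && b < rows)
                        H[b * cols + a] += 1;        /* H_n = H_(n-1) + A */
                }
            }
        }
    }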
[0131] FIG. 7 shows a sample Hough transform for eye detection
(e.g., image space and Hough space). For each point (x.sub.1,
y.sub.1) in the image space of example A 732a, there is a
corresponding cone in the Hough space whose cross-section at radius
of size r is shown in the Hough space for circles. The image space
in example A 732a is illustrated in height (y) versus width (x) (in
pixels, for example). The Hough space is shown in dimensions of
$\beta$ versus $\alpha$. In example B 732b (e.g., eye detection), an edge
image after thresholding is shown on the right hand side, and the
resultant best ellipse and circle fits representing the detected
eyelid and iris boundaries are shown on the left hand side. This
method may be robust: moving a point in the image space moves only
its corresponding cone in the Hough space, and since the remaining
cones are unaffected, the solution stays the same, implying
resilience to noise.
[0132] Since Hough transform computations are naturally
parallelizable, dedicated hardware designs have already been
considered for real time application domains that require higher
levels of accuracy. (See Magdi Mohamed and Irfan Nasir, "Method and
System for Parallel Processing of Hough Transform Computations,"
U.S. Pat. No. 7,406,212, issued on Jul. 29, 2008.) The Hough
approach remains one of the most successful techniques for many
image analysis applications. It triggered a unique paradigm for
transforming a 0-dimension point in the image space into a
1-dimension curve, or an n-dimension structure, in the transform
space for robust shape detection.
[0133] FIG. 8 illustrates the Timm & Barth approach for
detecting the center of an iris. In particular, FIG. 8 illustrates
the Timm & Barth (T&B) image transform. Timm & Barth
defined an approach for iris detection by analyzing the vector
field of the image gradients. (See Fabian Timm and Erhardt Barth,
"Accurate Eye Center Localisation by Means of Gradients," in
Proceedings of the Int. Conference of Computer Theory and
Application (VISAPP), Volume 1, pp. 125-130, Algarve, Portugal,
2011.) The approach is motivated by the availability of graphics
processing units (GPUs), since it involves intensive computations
of dot products of normalized vectors constructed from the input
image.
[0134] As illustrated in example A 834a, let c be a possible object
center and $g_i$ be a normalized gradient vector at position $x_i$.
The normalized displacement vector $d_i$ is defined as shown in the
two cases of example A 834a. The estimated center $c^*$ of a
circular object in an image with pixel positions $x_i$, where i = 1
to N, is given by
$$c^* = \arg\max_c \left\{ \frac{1}{N} \sum_{i=1}^{N} \left( d_i^T g_i \right)^2 \right\}, \quad d_i = \frac{x_i - c}{\left\| x_i - c \right\|_2}, \quad \forall i : \left\| g_i \right\|_2 = 1 \quad (4)$$
Prior knowledge about the object center can be incorporated by
applying a weight $w_c$ for each possible center c and the
modified objective becomes
$$\arg\max_c \frac{1}{N} \sum_{i=1}^{N} w_c \left( d_i^T g_i \right)^2 \quad (5)$$
Example B 834b accordingly illustrates one example of the T&B
approach for detecting an iris center.
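A minimal C sketch of the objective in Equations (4)-(5) follows (the array names, layout and the omission of the weight $w_c$ are assumptions). The estimated center $c^*$ is the candidate maximizing this score over all pixel positions, which is what makes the overall cost quadratic in the number of pixels:

    #include <math.h>

    /* Minimal sketch of the Timm & Barth objective: score a candidate
     * center (cx, cy) by the mean squared dot product of the normalized
     * displacement d_i with the normalized gradient g_i; gx and gy hold
     * unit gradient vectors in row-major order. */
    double tbScore(const double *gx, const double *gy,
                   int rows, int cols, int cx, int cy)
    {
        double sum = 0.0;
        int n = 0;
        for (int y = 0; y < rows; y++) {
            for (int x = 0; x < cols; x++) {
                double dxc = x - cx, dyc = y - cy;
                double norm = sqrt(dxc * dxc + dyc * dyc);
                if (norm == 0.0) continue;   /* skip the candidate itself */
                double dot = (dxc / norm) * gx[y * cols + x]
                           + (dyc / norm) * gy[y * cols + x];
                sum += dot * dot;            /* (d_i^T g_i)^2 */
                n++;
            }
        }
        return (n > 0) ? sum / n : 0.0;
    }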
[0135] FIG. 9 is a diagram illustrating one example of the
transform disclosed herein. Advances in image sensor technologies
have made high data rate digital inputs available to mobile device
applications. Several approaches for transforming digital images
into different domains and feature maps that achieve analysis
results are discussed above. While they may succeed in performing
such tasks to a certain extent, a major concern remains the
increased complexity of both memory management and floating point
processing in small footprint, low-power, battery-operated devices.
Although it is possible to quantize the gradient directions for
some pragmatic uses, this may result in reduced performance levels.
Some configurations of the transform (e.g., an integer computation
based transform) disclosed herein address these concerns by
reducing the complexities without quantizing the gradient direction
values or sacrificing overall performance.
[0136] In particular, FIG. 9 illustrates a concept of the Gradient
Direction Transform (GDT). For example, a non-parametric approach
for the analysis of closed and/or open curves 938 in digital images
may be utilized as a mechanism of emphasizing concavities and
convexities in its constituent components using gradient
information. Some configurations of this new transform may rely
only on the estimated gradient direction (ignoring the gradient
amplitude) to characterize the shapes of naturally curved items
938, particularly in ambiguous and noisy imaging situations.
[0137] In some configurations, the GDT may be constructed as
follows. After initializing a transform matrix (e.g., R) to zeroes,
for each gradient vector 940a-c in the input image region of
interest 936, increment or decrement the value of the cells in the
transform matrix that are in line 942a-c with gradient vectors
940a-c according to their location. For example, depending on the
application, just the locations identified by the straight lines
942a-c determined by the gradient vectors may be incremented, or
only the inward locations (opposite the vector 940a-c direction,
for example) may be incremented and the outward locations (in the
vector 940a-c direction, for example) may be decremented. It is
also possible to leave the outward locations with no adjustments to
further reduce the computations.
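A minimal C sketch of this construction is given below, reusing the rasterDirection sketch given after Listing (5); the inward/outward sign convention and the use of the raw (un-normalized) gradient components to define the line endpoints are assumptions:

    /* rasterDirection is the sketch given after Listing (5). */
    void rasterDirection(int *R, int rows, int cols,
                         int x0, int y0, int x1, int y1, int dir);

    /* Minimal sketch of the GDT construction: zero the transform
     * matrix, then for each pixel let its gradient (gx, gy) define a
     * line through the pixel and update the cells along that line. */
    void gradientDirectionTransform(const int *Gx, const int *Gy,
                                    int rows, int cols, int *R)
    {
        for (int i = 0; i < rows * cols; i++)
            R[i] = 0;                              /* initialize to zeroes */
        for (int y = 1; y <= rows; y++) {
            for (int x = 1; x <= cols; x++) {
                int gx = Gx[(y - 1) * cols + (x - 1)];
                int gy = Gy[(y - 1) * cols + (x - 1)];
                if (gx == 0 && gy == 0) continue;  /* no direction estimate */
                /* increment inward locations (opposite the gradient) ... */
                rasterDirection(R, rows, cols, x, y, x - gx, y - gy, +1);
                /* ... and optionally decrement outward locations */
                rasterDirection(R, rows, cols, x, y, x + gx, y + gy, -1);
            }
        }
    }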
[0138] By doing so, it is clear that the computation is greatly
reduced to estimating the un-normalized gradient vectors and
identifying the straight line 942a-c associated with each of them.
Another useful characteristic of the GDT is that, in addition to
the gradient vector direction, a second mapping may be constructed
by considering the tangent direction that is orthogonal to the
gradient vector direction, to characterize other features depending
on the application of the transform.
[0139] As described above in connection with FIGS. 1-2 and 4-5,
electronic devices may determine (e.g., estimate) gradient vectors
(e.g., gradient vectors 940a-c). In some configurations,
computation of the gradient of an image f(x, y) may be based on
obtaining the partial derivatives Gx=df/dx and Gy=df/dy at every
pixel location. Several linear convolution operators can be used to
numerically estimate (Gx, Gy), including Sobel operators,
Prewitt operators and/or Scharr operators. Other nonlinear
approaches (e.g., mathematical morphology) may be used to estimate
the gradients when the image is extremely noisy. Sobel operators of
size (3×3) for each image axis may be utilized in conjunction
with both the new transform and the conventional ones for proper
performance evaluation.
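For illustration, a 3×3 Sobel estimate at interior pixels may be sketched in C as follows (border handling is omitted, which is an assumption of this sketch):

    /* Minimal sketch of 3x3 Sobel gradient estimation at interior
     * pixels of a row-major gray image. */
    void sobelGradients(const unsigned char *img, int rows, int cols,
                        int *Gx, int *Gy)
    {
        for (int y = 1; y < rows - 1; y++) {
            for (int x = 1; x < cols - 1; x++) {
                const unsigned char *p = img + y * cols + x;
                /* horizontal derivative Gx ~ df/dx */
                Gx[y * cols + x] = -p[-cols - 1] + p[-cols + 1]
                                   - 2 * p[-1]   + 2 * p[1]
                                   - p[cols - 1] + p[cols + 1];
                /* vertical derivative Gy ~ df/dy */
                Gy[y * cols + x] = -p[-cols - 1] - 2 * p[-cols] - p[-cols + 1]
                                   + p[cols - 1] + 2 * p[cols]  + p[cols + 1];
            }
        }
    }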
[0140] As can be observed in the example of FIG. 9, incrementing
values along lines 942a-c corresponding to the gradient vectors
940a-c of curved items 938 may tend to accumulate at a location
(e.g., center 944) relative to the curved item. For circles and
ellipses, for example, the value at the center 944 may tend to be
higher than other values, since the lines 942a-c that are collinear
with the gradient vectors 940a-c intersect at the center 944.
[0141] FIG. 10 is a diagram illustrating one example of Bresenham's
algorithm for drawing straight lines. A class of efficient
techniques (e.g., rasterizing algorithms) for drawing curves in
digital images based on Bresenham's algorithm is described. (See
Alois Zingl, "A Rasterizing Algorithm for Drawing Curves," Technical
Report, Multimedia and Software, Technikum-Wien, Wien, 2012.) A
modified version of Bresenham's algorithm may be utilized for
drawing straight lines to identify the locations of cells (x.sub.i,
y.sub.i) in the GDT to be updated according to each gradient
direction estimate. The stopping criterion may be modified to exit
the loop when the cell location is outside of the region of
interest, to avoid solving for intersections with boundary lines
and finding end points, which would require floating point
computations. Hence, for each pixel, only the gradient estimates
(Gx, Gy) may be required to complete the transform computations. It
is clear, from the C-code in Listing (6), which is a simple example
of Bresenham's line algorithm, that this algorithm only requires
integer additions, multiplication by two (bit shifts) and bitwise
logical operations. These implementation details significantly
reduce the complexity of the GDT algorithm. The line rasterizing
algorithm may be utilized for implementing the GDT.
Listing (6)
    void plotLine(int x0, int y0, int x1, int y1)
    {
        int dx = abs(x1-x0), sx = x0<x1 ? 1 : -1;
        int dy = -abs(y1-y0), sy = y0<y1 ? 1 : -1;
        int err = dx+dy, e2;                        /* error value e_xy */
        for (;;) {                                  /* loop */
            setPixel(x0, y0);
            if (x0==x1 && y0==y1) break;
            e2 = 2*err;
            if (e2 >= dy) { err += dy; x0 += sx; }  /* e_xy+e_x > 0 */
            if (e2 <= dx) { err += dx; y0 += sy; }  /* e_xy+e_y < 0 */
        }
    }
[0142] Specifically, FIG. 10 illustrates the operation of
Bresenham's algorithm for drawing a line between a first point
1046a (e.g., (x0, y0)) and a second point 1046b (e.g., (x1, y1)).
In this example, the line is plotted in dimension B 1048b (in
pixels) over dimension A 1048a (in pixels). For example, dimension
A 1048a could be the width of an image and dimension B 1048b could
be the height of an image or vice versa.
[0143] FIG. 11 is a diagram illustrating one example of Bresenham's
algorithm for drawing circles. Bresenham's algorithm can be
extended efficiently to draw other curves such as circles among
others to be utilized in the Hough transform, for example. The
C-code in Listing (7), illustrates an implementation of the circle
algorithm.
Listing (7)
    void plotCircle(int xm, int ym, int r)
    {
        int x = -r, y = 0, err = 2-2*r;  /* II. Quadrant */
        do {
            setPixel(xm-x, ym+y);        /* I. Quadrant */
            setPixel(xm-y, ym-x);        /* II. Quadrant */
            setPixel(xm+x, ym-y);        /* III. Quadrant */
            setPixel(xm+y, ym+x);        /* IV. Quadrant */
            r = err;
            if (r <= y) err += ++y*2+1;  /* e_xy+e_y < 0 */
            if (r > x || err > y) err += ++x*2+1;  /* e_xy+e_x > 0 or no 2nd y-step */
        } while (x < 0);
    }
[0144] Specifically, FIG. 11 illustrates the operation of
Bresenham's algorithm for drawing a circle with radius r. In this
example, the circle is plotted in dimension B 1148b (in pixels)
over dimension A 1148a (in pixels). For example, dimension A 1148a
could be the width of an image and dimension B 1148b could be the
height of an image or vice versa.
[0145] FIG. 12 is a diagram illustrating one example of Bresenham's
algorithm for drawing ellipses. Bresenham's algorithm can be
extended efficiently to draw other curves such as ellipses among
others to be utilized in the Hough transform, for example. The
C-code in Listing (8) illustrates an implementation of the ellipse
algorithm. In particular, Listing (8) plots an ellipse inside a
specified rectangle.
Listing (8)
    void plotEllipseRect(int x0, int y0, int x1, int y1)
    {
        int a = abs(x1-x0), b = abs(y1-y0), b1 = b&1;  /* values of diameter */
        long dx = 4*(1-a)*b*b, dy = 4*(b1+1)*a*a;      /* error increment */
        long err = dx+dy+b1*a*a, e2;                   /* error of 1. step */
        if (x0 > x1) { x0 = x1; x1 += a; }  /* if called with swapped points */
        if (y0 > y1) y0 = y1;               /* .. exchange them */
        y0 += (b+1)/2; y1 = y0-b1;          /* starting pixel */
        a *= 8*a; b1 = 8*b*b;
        do {
            setPixel(x1, y0);               /* I. Quadrant */
            setPixel(x0, y0);               /* II. Quadrant */
            setPixel(x0, y1);               /* III. Quadrant */
            setPixel(x1, y1);               /* IV. Quadrant */
            e2 = 2*err;
            if (e2 <= dy) { y0++; y1--; err += dy += a; }                /* y step */
            if (e2 >= dx || 2*err > dy) { x0++; x1--; err += dx += b1; } /* x step */
        } while (x0 <= x1);
        while (y0-y1 < b) {                 /* too early stop of flat ellipses a=1 */
            setPixel(x0-1, y0);             /* -> finish tip of ellipse */
            setPixel(x1+1, y0++);
            setPixel(x0-1, y1);
            setPixel(x1+1, y1--);
        }
    }
[0146] In this example, the ellipse is plotted in dimension B 1248b
(in pixels) over dimension A 1248a (in pixels). For example,
dimension A 1248a could be the width of an image and dimension B
1248b could be the height of an image or vice versa.
[0147] FIG. 13 illustrates examples of applications of the
transform disclosed herein. In particular, FIG. 13 illustrates a
diamond shape 1350, a rounded rectangle 1352, a handwritten
character ("3") 1354, a fingerprint 1356 and an image of an eye
1358. The transform disclosed herein may be applied to detection of
any of these items and others. For example, an electronic device
102 may take a gradient direction transform of an image of any of
these items.
[0148] As follows, a description is given of examples and
experiments conducted to evaluate the transform disclosed herein.
In particular, two different and specific applications are
described, one for iris detection and one for handwriting
recognition. These examples illustrate the generality and potential
uses of the GDT for image analysis and computer vision tasks.
[0149] In general, intensive application of preprocessing to a
given image may introduce unexpected distortion to the data which
may cause irrecoverable errors in the analysis. Even through simple
binarization of the gray scale image, useful information can be
lost. To avoid the risk of suppressing important shape information
in an implementation of the GDT, some configurations of the systems
and methods disclosed herein may utilize limited preprocessing. For
example, preprocessing applied to the input images may include
scaling to a fixed size region of interest, in addition to
smoothing, to ensure a reliable estimation of gradient vectors, in
both iris detection and handwriting recognition applications.
[0150] FIG. 14 illustrates one example of an image 1460 of an eye.
Images may be captured in color in some configurations. In FIG. 14,
the image 1460 of the eye is illustrated in grayscale for
convenience. In this example, the image 1460 is a sample image of
dimensions 320×240 pixels. FIGS. 14-27 provide examples and
comparisons between the GDT and T&B for iris detection on a
sample eye image 1460. Iris detection is the task of finding the
center of a partial (e.g., circular/elliptical) structure
containing the iris image. It should be noted that while a gray
image representation (one-band) is utilized for purposes of
illustration, the transform can be applied to each image band
(3-bands of color) in a similar manner.
[0151] FIG. 15 illustrates one example of a gray image 1562 of an
eye. In some configurations, for example, only one band of a color
image (e.g., image or region of interest) may be utilized for the
transform. For instance, color information of an image or region of
interest may be discarded or an image may be converted to
grayscale. FIG. 15 shows a gray image 1562 (e.g., an input image of
320×240 pixels) that is based on an original color image of
the eye. A range of grayscale values is illustrated next to the
gray image 1562. In some configurations, the digital image
obtaining module 104 may capture a color image and convert it to a
gray image.
[0152] FIG. 16 illustrates one example of a gradient horizontal
component (Gx) gray image 1664 of an eye. In particular, FIG. 16
illustrates a gradient image 1664 in the x direction (Gx)
corresponding to the gray image illustrated in FIG. 15. A range of
gradient (e.g., gray) values is illustrated next to the gray image
1664. In some configurations, the gradient vector determination
module 108 may determine a gradient horizontal component (Gx) image
as described above.
[0153] FIG. 17 illustrates an example of a gradient vertical
component (Gy) gray image 1766 of an eye. In particular, FIG. 17
illustrates a gradient image 1766 in the y direction (Gy)
corresponding to the gray image illustrated in FIG. 15. A range of
gradient (e.g., gray) values is illustrated next to the gray image
1766. In some configurations, the gradient vector determination
module 108 may determine a gradient vertical component (Gy) image
as described above.
[0154] FIG. 18 illustrates one example of a transform space 1868 in
accordance with the systems and methods disclosed herein. In
particular, FIG. 18 illustrates one example of the GDT (in
dimensions of 320×240) corresponding to the gradient images
illustrated in FIGS. 16-17. In this example, the transform space
1868 is illustrated in Width 1872 versus Height 1870. Each of the
points of the transform space 1868 may correspond to a pixel in the
image 1460 of FIG. 14. Accordingly, the width and height of the
transform space may correspond to pixel dimensions of the image
1460. As illustrated in FIG. 18, the darker portion near the center
of the transform space may correspond to higher values.
[0155] FIG. 19 illustrates another representation (3-Dimensional)
of the transform space 1968 in accordance with the systems and
methods disclosed herein. In particular, FIG. 19 illustrates one
example of the GDT (in dimensions of 320×240) in three
dimensions (3D) corresponding to the gradient images illustrated in
FIGS. 16-17. In this example, the transform space 1968 is
illustrated in Value 1974 over Width 1972 (e.g., x position) and
Height 1970 (e.g., y position). The Value 1974 axis represents a
measure of the numerical value (e.g., score) at each point or
position in the transform space 1968. As can be observed, higher
values occur at locations corresponding to the iris of the eye.
[0156] FIG. 20 illustrates one example of a transform space 2068,
in lower spatial resolution, in accordance with the systems and
methods disclosed herein. In particular, FIG. 20 illustrates one
example of the GDT (in dimensions of 160×120) corresponding
to gradient images. In this example, the transform space 2068 is
illustrated in Width 2072 versus Height 2070. In some
configurations, the transform may be performed on images with a
lower resolution or may be performed on a subset of pixels (e.g.,
on a decimated image). As illustrated in FIG. 20, the darker
portion near the center of the transform space may correspond to
higher values.
[0157] FIG. 21 illustrates another representation (3-Dimensional)
of the transform space 2168, in lower spatial resolution, in
accordance with the systems and methods disclosed herein. In
particular, FIG. 21 illustrates one example of the GDT (in
dimensions of 160×120) in 3D corresponding to gradient
images. In this example, the transform space 2168 is illustrated in
Value 2174 over Width 2172 (e.g., x position) and Height 2170
(e.g., y position). The Value 2174 axis represents a measure of the
numerical value (e.g., score) at each point or position in the
transform space 2168. As can be observed, higher scores occur at
locations corresponding to the iris of the eye. As illustrated in
FIG. 21, the transform may be performed on images with a lower
resolution or may be performed on a subset of pixels (e.g., on a
decimated image) in some configurations.
[0158] FIG. 22 illustrates one example of a Timm & Barth
transform 2280. In particular, FIG. 22 illustrates one example of
the T&B transform 2280 (in dimensions of 320×240)
corresponding to the gray image illustrated in FIG. 15. This
example is illustrated in Height 2276 and Width 2278.
[0159] FIG. 23 illustrates another representation (3-Dimensional)
of the Timm & Barth transform 2380. In particular, FIG. 23
illustrates one example of the T&B transform 2380 (in
dimensions of 320×240) in 3D corresponding to the gray image
illustrated in FIG. 15. In comparing FIG. 19 to FIG. 23, the
corresponding GDT and T&B representations are plotted in 3D to
highlight the locations of the iris position. The example in FIG.
23 is illustrated in T&B Transform Output 2382 over Height 2376
and Width 2378.
[0160] FIG. 24 illustrates one example of a Timm & Barth
transform 2480, in lower spatial resolution. In particular, FIG. 24
illustrates one example of the T&B transform 2480 (in
dimensions of 160×120) corresponding to the gray image
illustrated in FIG. 15. This example is illustrated in Height 2476
and Width 2478.
[0161] FIG. 25 illustrates another representation (3-Dimensional)
of the Timm & Barth transform 2580, in lower spatial
resolution. In particular, FIG. 25 illustrates one example of the
T&B transform 2580 (in dimensions of 160×120) in 3D
corresponding to the gray image illustrated in FIG. 15. The example
in FIG. 25 is illustrated in T&B Transform Output 2582 over
Height 2576 and Width 2578.
[0162] FIG. 26 illustrates a comparison between the transform
disclosed herein and the Timm & Barth transform. For example,
FIG. 26 illustrates aspects of the performance of the Timm &
Barth transform versus the Gradient Direction Transform. When
analyzing an image, face detection may take approximately 30
milliseconds (ms) (for 24×24 dimensions, for example).
Additionally, eye corner detection may take approximately 12 ms
(for 256×256 dimensions, for example). Iris detection time
complexity for a region of interest may be given as follows.
Analytically, when processing any image with a region of interest
of size C columns by R rows, the time complexity of the Timm &
Barth approach is equal to $K_1(C \cdot R)^2$, where $K_1$ is the
cost for each normalized floating point dot product. The worst case
time complexity for the GDT approach is equal to
$K_2(C \cdot R) \cdot C$, assuming $C \geq R$, where $K_2$ is the
cost for the integer additions and bit-wise operations used to
identify the cells in line with each gradient vector.
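As a rough worked example, ignoring the constants: for the 320×240 image of FIG. 14 (C = 320 columns, R = 240 rows), $(C \cdot R)^2 = 76800^2 \approx 5.9 \times 10^9$ operations for the Timm & Barth approach, while $(C \cdot R) \cdot C = 76800 \cdot 320 \approx 2.5 \times 10^7$ in the worst case for the GDT, a ratio of $(C \cdot R)^2 / ((C \cdot R) \cdot C) = R = 240$ before the constants $K_1$ and $K_2$ are taken into account.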
[0163] It should be noted that the T&B approach ignores the
sign of the vector in squaring dot products to avoid square root
computations. However, the GDT is capable of efficient
consideration of sign of vectors at no extra cost (e.g.,
inward/outward directions). Extending the gradient normal-vector
(in some configurations) to other directions, such as the gradient
tangent-vector, may suit other (binary/gray/color) image analysis
tasks.
[0164] Compared to T&B approach as illustrated in FIG. 23, the
transform disclosed herein may be less smooth. In some
configurations, an inexpensive 3×3 linear averaging filter
may be utilized to smooth the transform. The T&B algorithm has
a high positive constant value due to summing the squared values of
non-collinear vectors as expressed in Equation (4) and Equation
(5), for example.
[0165] Specifically, FIG. 26 illustrates measured speedup ratios
(e.g., $T_1/T_2$) between the GDT and the T&B transform.
An experiment to quantify the speedup on iris detection was
conducted, using a Matlab time profiler tool, by resizing an eye
image to different spatial resolutions, as shown in Table A 2684.
Average time and corresponding speedup values are
computed for each case as shown in Table A 2684 with ten
repetitions for each approach as illustrated in Table B 2686. The
times illustrated in Table A 2684 and Table B 2686 are given in
units of seconds according to a time profiler. It should be noted
that the same time unit is used for both algorithms.
[0166] FIG. 27 illustrates another comparison between the transform
disclosed herein and the Timm & Barth transform. In particular,
FIG. 27 illustrates a comparison of the constants in computational
complexity between the GDT ($K_2$) and the T&B transform
($K_1$). Specifically, FIG. 27 includes Table C 2788 that
illustrates a speedup ratio between the performance of the Timm
& Barth transform versus the Gradient Direction Transform. In
FIG. 27, $K_1$ and $K_2$ are given in units of seconds
according to a time profiler.
[0167] FIG. 28 is a block diagram illustrating a more specific
configuration of modules for obtaining structural information from
a digital image. In particular, FIG. 28 illustrates an optional
combination of the GDT and HTE for iris detection. Several modules
2890, 2892, 2894, 2896, 2898, 2801, 2803, 2805, 2807, 2809, 2811
are illustrated in FIG. 28, one or more of which may be implemented
in an electronic device 102. An original image (e.g., i-image) may
be provided to a pre-processing module. In some configurations of
the GDT approach for the iris detection, the pre-processing module
2890 may scale the eye region of interest to a fixed width of 40
pixels, preserving the aspect ratio of the input image. Scaling may
ensure that the detection task is completed in a fixed budget time.
The scaled image may be referred to as an n-image, which may be
provided to a blur convolution module 2892 and/or to an edge
convolution module 2805.
[0168] The scaled image (e.g., n-image) may also be smoothed and/or
blurred by the blur convolution module 2892 in order to obtain
better gradient direction estimation. For example, the blurred
and/or smoothed image may be referred to as a b-image and may be
provided to a vertical edge module 2894 and/or a horizontal edge
module 2896.
[0169] The vertical edge module 2894 and the horizontal edge module
2896 may be used as estimators for the vertical (e.g., Gy) and
horizontal (e.g., Gx) gradient images, respectively. For example,
the vertical edge module 2894 and/or the horizontal edge module
2896 may apply Sobel operators, Prewitt operators, Scharr operators
and/or mathematical morphology to determine the gradient images.
The vertical gradient image may be referred to as a v-image and the
horizontal gradient image may be referred to as an h-image. The
vertical gradient image and the horizontal gradient image may be
provided to a gradient direction transform module 2898.
[0170] The gradient direction transform module 2898 may determine
(e.g., compute) the gradient direction transform based on the
vertical gradient image and the horizontal gradient image as
described above in connection with one or more of FIGS. 1-5 and 9.
The transform space (e.g., values, transform matrix, etc.) may be
referred to as a g-image. The transform space may be provided to a
weighted gradient direction module 2801.
[0171] The weighted gradient direction module 2801 may optionally
apply a prior weight to the raw transform (after computing the GDT
for each scaled and blurred region, for example), utilizing the
observation that the iris center is usually dark, so that the
weight is inversely proportional to the gray level value.
Specifically, the blur convolution module 2892 may provide the
b-image to the weighted gradient direction module 2801. The
weighted gradient direction module may apply weights to the g-image
based on the b-image. For example, the weighted gradient direction
module 2801 may generate weights based on the b-image, where darker
pixels are assigned higher weights and lighter pixels are assigned
lower weights. This may emphasize (e.g., scale up) the values in
the g-image corresponding to the darker areas of the b-image (e.g.,
the iris). The resulting weighted gradient direction image may be
referred to as a w-image, which may be provided to an iris position
determination module 2803.
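As a hypothetical sketch of this weighting (only the inverse relationship between weight and gray level is from the text; the linear form below is an assumption):

    /* Hypothetical sketch of the prior weighting: scale each GDT value
     * g by a weight inversely related to the blurred gray level b, so
     * that darker pixels (e.g., the iris) are emphasized. */
    void weightGdt(const int *g, const unsigned char *b,
                   int rows, int cols, int *w)
    {
        for (int i = 0; i < rows * cols; i++)
            w[i] = g[i] * (256 - (int)b[i]);  /* darker -> larger weight */
    }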
[0172] It should be noted that the edge convolution module 2805,
the Otsu dynamic threshold module 2807, the thinning module 2809
and/or the Hough transform for elliptical shapes module 2811 may be
optional. One or more of these modules 2805, 2807, 2809, 2811 may
be helpful in certain situations. For example, one difficulty may
occur when there is strong corneal reflection in the images from
light sources covering the iris center location. This is expected,
since the transform may be weighted as mentioned above. Also, some
scenarios for iris detection using sensors mounted on the inside
direction of head mounted displays, or other eye glasses for
example, may utilize a different processing chain to find the eye
corners and iris locations simultaneously, since the full image of
the face may not be available in such cases. One approach to do so,
using both the gradient amplitude (via HTE) and the gradient
direction (via GDT) is outlined in FIG. 28.
[0173] The edge convolution module 2805 may perform edge
convolution on the n-image. The resulting edge-convolved image may
be referred to as an e-image, which may be provided to the Otsu
dynamic threshold module 2807. The Otsu dynamic threshold module
2807 may produce an image referred to as an o-image, which may be
provided to the thinning module 2809. The thinning module 2809 may
thin the o-image. The thinned image may be referred to as a
t-image, which may be provided to the Hough transform for
elliptical shapes module 2811. The purpose of the edge convolution
module 2805, the Otsu dynamic threshold module 2807 and the
thinning module 2809 may be to construct a binary image (t-image)
with minimal reliable foreground pixels to be used for computing
the Hough transform in the Hough transform for elliptical shapes
module 2811 efficiently, since the time complexity of Hough
transform computations increases with the number of foreground
pixels.
[0174] The Hough transform for circles or ellipses may be
implemented using Bresenham's algorithms to avoid floating point
computations as described above in connection with FIGS. 11-12. In
this case, the range of values for the parameters of the circle and
the ellipse can be greatly constrained to further reduce the memory
and processing requirements for Hough transform computations. The
Hough transform for elliptical shapes module 2811 may produce a
Hough transform space, which may be referred to as an h-space. The
h-space may be provided to the iris position determination module
2803.
[0175] The GDT role here may be to uniquely utilize the gradient
orientation information to further improve the accuracy of
detection. Gradient-based image analysis approaches of the last
couple of decades do not fully utilize the gradient information due
to computational and memory constraints. The systems and methods
disclosed herein may contribute a complete and efficient image
transformation scheme that improves (e.g., is faster and more
robust) and extends known approaches to enable real-time
applications.
[0176] More detail is given hereafter regarding the application of
the GDT to handwriting recognition. In particular, other approaches
are described in connection with FIGS. 29-30. Application of the
GDT to handwriting recognition is then described in greater detail
in connection with FIGS. 31-35.
[0177] Recognizing the content of digital images that contain
hand-written characters and/or hand-drawn shapes often requires
sophisticated processing to deal with significant variations in
size, style and data acquisition conditions. Conventional
distance-transform based techniques such as bar-transform (BAR)
descriptors work on a down-scaled binary representation and
quantize the direction information into four major orientations to
reduce the input dimensionality. Deep Neural Networks (DNN) are
also applied directly to the raw input image pixel values, using
multiple hidden layers, to perform automatic feature extraction and
classification tasks. The requirements of these techniques,
particularly when considering a large number (greater than four) of
directions in BAR descriptors, or a large number of hidden layers
in a DNN, limit their uses in platforms with small memory and processing
footprints, for high resolution inputs and data rates.
[0178] FIG. 29 illustrates an example of the BAR transform. The bar
transform was originally defined on binary character images. It is
used to compute a feature descriptor (e.g., BAR features
(handcrafted)) for character (e.g., handwriting) recognition
applications. (See Paul Gader, Magdi Mohamed, and Jung-Hsien Jiang,
"Comparison of Crisp and Fuzzy Character Neural Networks in
Handwritten Word Recognition," IEEE Transactions on Fuzzy Systems,
Vol. 3, No. 3, pp. 357-363, 1995.) Initially, eight feature images
are generated. Each feature image map corresponds to one of the
directions: east (e), northeast (ne), north (n) and northwest (nw),
in either the foreground or the background. Each feature image has
an integer value at each location that represents the length of the
longest bar that fits at that point in that direction. An example
of an original binary image or character image 2913 and the bar
feature image 2915 e(i,j) (e.g., for foreground horizontal
direction) for the foreground representation is shown in FIG. 29.
In particular, the BAR approach considers 4 directions (e.g.,
horizontal, vertical, diagonal and anti-diagonal) to construct a
feature vector (of size=120, for example) by summing values in
overlapping zones of 4 maps for foreground and 4 maps for
background.
[0179] A two pass algorithm is used to generate the feature images.
In the forward pass, the image is scanned left-to-right and
top-to-bottom. Listing (9) illustrates a pseudo-code for computing
the BAR transform on the foreground. More specifically, at each
point, either the foreground or the background feature images are
updated as shown in Listing (9). On the backward pass, the maximum
is propagated back up from bottom-to-top, right-to-left as shown in
the second part of Listing (9).
Listing (9)
    /* FORWARD PASS */
    FOR i = 1, 2, ..., nrows DO
        FOR j = 1, 2, ..., ncols DO
            e(i, j) = e(i, j - 1) + 1
            ne(i, j) = ne(i - 1, j + 1) + 1
            n(i, j) = n(i - 1, j) + 1
            nw(i, j) = nw(i - 1, j - 1) + 1
    /* BACKWARD PASS */
    FOR i = nrows, nrows - 1, ..., 1 DO
        FOR j = ncols, ncols - 1, ..., 1 DO
            e(i, j) = max(e(i, j), e(i, j + 1))
            ne(i, j) = max(ne(i, j), ne(i + 1, j - 1))
            n(i, j) = max(n(i, j), n(i + 1, j))
            nw(i, j) = max(nw(i, j), nw(i + 1, j + 1))
[0180] FIG. 30 is a diagram illustrating one example of a deep
neural network. The deep neural network may be applied for
handwriting recognition. In particular, deep neural network
features may be learned to describe handwriting. A deep neural
network may learn weights 3021a-c by minimizing reconstruction
error and prediction error. As illustrated in FIG. 30, first layer
weights 3021a may be learned to map input values 3017 to layer 1
units 3019a. Second layer weights 3021b may be learned to map layer
1 units 3019a to layer 2 units 3019b. Classifier weights 3021c may
be learned to map layer 2 units 3019b to the label layer or layer
units 3019c.
[0181] The systems and methods disclosed herein provide a compact
and fast feature extractor, which may be referred to as a Gradient
Direction Descriptor (GDD). The GDD is based on the Gradient
Direction Transform (GDT) scheme described above. The GDD
characterizes the content of an input image (e.g., digital image
406) by emphasizing the locations of concavity and convexity
regions and intersections of strokes as pieces of information for
describing the content. This information may be utilized in order
to perform shape classification. This feature descriptor can
discriminate among several shape classes, using small size
classification models. The GDD values can be used as a standalone
input feature vector, or in combination with other descriptors to
reduce complexity and improve performance of pattern recognition
systems.
[0182] The GDD is described for handwriting recognition
applications. The GDD utilizes an efficient implementation of the
GDT algorithm to compute the discriminant features. Further
description of the pattern recognition model construction and a
neural network architecture is provided as well.
[0183] FIG. 31 is a flow diagram illustrating one configuration of
a method 3100 for determining a character from a digital image 406.
The method 3100 may be performed by the electronic device 402
described in connection with FIG. 4. The electronic device 402 may
obtain 3102 a digital image 406. This may be accomplished as
described above in connection with one or more of FIGS. 1-2 and 4,
for example.
[0184] The electronic device 402 may determine 3104 a gradient
vector for each pixel in a region of interest of the digital image
406. This may be accomplished as described above in connection with
one or more of FIGS. 1-2 and 4, for example.
[0185] For each pixel, the electronic device 402 may determine 3106
a first set of pixels including any pixel along a line that is
collinear with or perpendicular to the gradient vector and that
passes through the pixel location (and/or intersects an origin of
the gradient vector). This may be accomplished as described above
in connection with one or more of FIGS. 1-4, for example.
[0186] For each pixel, the electronic device 402 may increment 3108
a first set of values 414 in a transform space corresponding to any
of the first set of pixels that are in a first direction of the
line. This may be accomplished as described above in connection
with one or more of FIGS. 1-4, for example. In some configurations,
after initializing a transform matrix to zeroes, for each gradient
vector in the input image region of interest, the electronic device
402 may increment and/or decrement the value of the cells in the
transform matrix in line with and/or tangent to the gradient
vector. FIG. 9 provides an example of the gradient direction
transform (GDT). The lines may be determined as described above in
connection with FIGS. 3A-C, for example.
[0187] The electronic device 402 may determine 3110 a character
(e.g., a handwriting character) in the region of interest based on
the transform (e.g., the GDT). For example, the electronic device
402 may use the transforms for the normal gradient direction and
the tangent gradient direction to compute corresponding feature
maps and construct a compact feature vector, called a Gradient
Direction Descriptor (GDD), containing unique discriminant
information. For example, the GDD may be used as an input (instead
of the raw input image, for instance) to handwriting classifiers
such as Deep Neural Networks (DNN) to achieve higher recognition
accuracies with fewer computations and lower memory requirements.
[0188] FIG. 32 illustrates an example (3-Dimensional) of a
transform space 3223 using a gradient normal direction for
handwriting character recognition. In particular, FIG. 32
illustrates one example of the GDT in the normal direction for
handwriting corresponding to the image 3231 of the handwritten
character "8." In this example, the transform space 3223 is
illustrated in Value 3229 over Width 3227 (e.g., x position) and
Height 3225 (e.g., y position). The Value 3229 axis represents a
measure of the numerical value (e.g., score) at each point or
position in the transform space 3223. In this example, the
transform space 3223 corresponds to the transform where values are
incremented along a line that is collinear with the gradient vector
(e.g., the normal direction).
[0189] FIG. 33 illustrates an example (3-Dimensional) of a
transform space 3323 using a gradient tangent direction, for
handwriting character recognition. In particular, FIG. 33
illustrates one example of the GDT in the tangent direction for
handwriting corresponding to the image 3331 of the handwritten
character "8." In this example, the transform space 3323 is
illustrated in Value 3329 over Width 3327 (e.g., x position) and
Height 3325 (e.g., y position). The Value 3329 axis represents a
measure of the numerical value (e.g., score) at each point or
position in the transform space 3323. In this example, the
transform space 3323 corresponds to the transform where values are
incremented along a line that is tangent to the gradient vector. As
illustrated in FIGS. 32-33, two separate transform spaces may be
generated: one for lines that are collinear with the gradient
vectors, and one for lines that are tangent to the gradient
vectors.
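A minimal C sketch producing both maps follows, again reusing the rasterDirection sketch given after Listing (5); rotating the gradient (gx, gy) by 90 degrees to (-gy, gx) yields the tangent direction, and the sign conventions are assumptions:

    /* rasterDirection is the sketch given after Listing (5). */
    void rasterDirection(int *R, int rows, int cols,
                         int x0, int y0, int x1, int y1, int dir);

    /* Minimal sketch of generating the normal map Rn and the tangent
     * map Rt from the gradient images Gx and Gy. */
    void gdtNormalAndTangent(const int *Gx, const int *Gy,
                             int rows, int cols, int *Rn, int *Rt)
    {
        for (int y = 1; y <= rows; y++) {
            for (int x = 1; x <= cols; x++) {
                int gx = Gx[(y - 1) * cols + (x - 1)];
                int gy = Gy[(y - 1) * cols + (x - 1)];
                if (gx == 0 && gy == 0) continue;
                /* normal map: line collinear with the gradient vector */
                rasterDirection(Rn, rows, cols, x, y, x + gx, y + gy, +1);
                /* tangent map: line perpendicular to the gradient vector */
                rasterDirection(Rt, rows, cols, x, y, x - gy, y + gx, +1);
            }
        }
    }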
[0190] FIG. 34 illustrates an example of Gradient Direction
Descriptor (GDD) computations in accordance with the systems and
methods disclosed herein. Automated handwriting recognition is
known to be one of the most challenging problems in computer
vision, due to the great variability in writing styles, and the
large number of classes. It is significantly difficult and may be
used to evaluate new recognition paradigms. A combination of
techniques may be used to improve the recognition accuracy required
for the target use case. The problem is further complicated by the
fact that, for small footprint platforms, a recognition engine
containing multiple classifiers, dictionaries and language models
has to be small enough to suit the platform's real time
requirements.
[0191] Although high recognition rates have been reported in the
literature using Deep Neural Networks (DNN) trained on raw images
of isolated alphanumeric characters, model size and the risk of
overfitting remain key concerns. These concerns may
be addressed by designing a feature descriptor based on the
transform disclosed herein (e.g., the GDT). Training and testing
processes may be conducted to properly validate the performance and
compare against existing approaches.
[0192] In some configurations, before generating features, the
input image 3433 may be cropped and then normalized (to a fixed
height h=24 pixels and width w=16 pixels, for example) to produce a
pre-processed image 3435. For example, a pre-processing module
included in an electronic device 402 may perform one or more of
these operations. The gradient vector determination module 408 may
determine gradient vectors 410 (e.g., a horizontal gradient vector
image Gx 3437 and a vertical gradient vector image Gy 3439). For
example, Sobel gradient operators of size (3×3) may be
applied to the pre-processed image 3435 (e.g., a fixed size image)
to produce the corresponding horizontal and vertical gradient
images (Gx 3437, Gy 3439), respectively.
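A hedged sketch of this pre-processing and gradient stage is given
below; the OpenCV calls are one plausible realization rather than
the implementation described herein, and the assumption that ink
pixels are nonzero is illustrative:

import cv2
import numpy as np

def preprocess_and_gradients(image, h=24, w=16):
    # Crop to the character's bounding box (assuming nonzero ink
    # pixels) and normalize to a fixed h-by-w size.
    ys, xs = np.nonzero(image)
    cropped = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    pre = cv2.resize(cropped, (w, h), interpolation=cv2.INTER_AREA)
    # Apply 3x3 Sobel operators to obtain the horizontal and
    # vertical gradient images Gx and Gy.
    gx = cv2.Sobel(pre, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(pre, cv2.CV_32F, 0, 1, ksize=3)
    return cropped, pre, gx, gy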
[0193] The transformation module 412 may determine (e.g., compute)
the GDT normal and tangent components. In this example, the
transformation module 412 may determine two transform spaces (e.g.,
two sets of values 414, two transform matrices, two transform maps,
etc.). For example, the transformation module 412 may determine the
GDT normal map 3441 based on Gx 3437 and Gy 3439 by incrementing
one or more values in the transform space along a line (in one
direction, for example) that is collinear with the gradient
vectors. Additionally, the transformation module 412 may determine
the GDT tangent map 3443 based on Gx 3437 and Gy 3439 by
incrementing one or more values in the transform space along a line
(in one direction, for example) that is tangent to the gradient
vectors. The corresponding GDT normal map 3441 and tangent map 3443
are shown in the example of FIG. 34, where only the GDT tangent map
3443 is formed by weighting the GDT tangent component with the
preprocessed image 3435 to highlight the existence of strokes and
their intersections. The GDT normal map 3441, however, may not be
weighted in some configurations, since its purpose may be to
highlight the holes and curvatures. One example of the GDT normal
map 3441 is given in FIG. 32. One example of the GDT tangent map
3443 is given in FIG. 33.
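Continuing the sketches above, forming the two feature maps may
look as follows; the 3×3 Gaussian smoothing kernel is an assumed
choice, and, as described, only the tangent map is weighted by the
per-unit gray values of the pre-processed image:

import cv2

# `image` is assumed to be the input character image, loaded elsewhere.
cropped, pre, gx, gy = preprocess_and_gradients(image)
normal_map, tangent_map = gradient_direction_transform(gx, gy)
# Smooth both transform spaces (kernel size is an assumption).
gdt_normal = cv2.GaussianBlur(normal_map.astype('float32'), (3, 3), 0)
gdt_tangent = cv2.GaussianBlur(tangent_map.astype('float32'), (3, 3), 0)
# Weight only the tangent map with the per-unit gray values of the
# pre-processed image to highlight strokes and their intersections.
gdt_tangent *= pre.astype('float32') / 255.0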
[0194] FIG. 35 is a diagram illustrating construction of a feature
descriptor based on the transform described herein. An electronic
device (e.g., electronic device 402) may construct the GDT-based
feature descriptor (e.g., GDD). An example image grid is given in
FIG. 35 in Width 3549 (in pixels, for example) and Height 3547 (in
pixels, for example). Feature vectors may be computed from the
resultant GDT image feature maps using overlapping zones. In other
words, the Gradient Direction Descriptor (GDD) may be determined
with overlapping zones. For example, seventy-seven rectangular
zones 3545a-z arranged in 11 rows and 7 columns may be used, where
each zone is of size (h/6 by w/4), e.g., (4×4) in this case, and h
and w are the height and width of the image, respectively. The
upper left-hand corners of the zones may be at positions {(r,
c)|r=0, h/12, 2h/12, . . . , 10h/12, and c=0, w/8, 2w/8, . . . ,
6w/8}, where r denotes a row and c denotes a
column, for example. The values in each zone 3545a-z in each
feature image may be summed. The sums may be normalized between
zero and one by dividing by the maximum possible sum in a zone. For
example, the value of each feature is the sum of the GDT values in
the zone, normalized by the maximum possible value. One set of
features may be computed for each GDT feature map (e.g., normal and
tangent).
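A sketch of this zone computation is given below, assuming the
h=24 by w=16 normalization above (so that each of the 77 zones is
4×4 with 50 percent overlap); the normalization constant used for
the maximum possible zone sum is an assumption:

import numpy as np

def zone_features(feature_map, rows=11, cols=7):
    h, w = feature_map.shape        # e.g., 24 x 16
    zh, zw = h // 6, w // 4         # zone size, e.g., 4 x 4
    feats = []
    for r in range(rows):
        for c in range(cols):
            y0 = r * h // 12        # upper left-hand corner row
            x0 = c * w // 8         # upper left-hand corner column
            feats.append(feature_map[y0:y0 + zh, x0:x0 + zw].sum())
    feats = np.asarray(feats, dtype=np.float32)
    # Normalize between zero and one by the maximum possible sum in
    # a zone (assumed to be the zone area times the map's maximum).
    feats /= zh * zw * max(float(feature_map.max()), 1.0)
    return feats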
[0195] Two additional values, the aspect ratio and the number of
connected components, may be appended, resulting in a feature
vector of size 2*77+2=156. In other words, two more features may be added,
namely the aspect ratio and number of connected components. In this
example, the GDT-based feature descriptor (e.g., GDD) may be of
dimension 77*2+2=156, while the BAR-based feature descriptor may be
of dimension 15*8=120. A BAR-based feature descriptor may be
constructed as detailed in Paul Gader, Magdi Mohamed, and
Jung-Hsien Jiang, "Comparison of Crisp and Fuzzy Character Neural
Networks in Handwritten Word Recognition," IEEE Transactions on
Fuzzy Systems, Vol. 3, No. 3, pp. 357-363, 1995.
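Assembling the full 156-dimensional GDD might then look like the
sketch below; scipy's connected-component labeling is one plausible
way to obtain the component count, and binarizing the cropped image
with a zero threshold is an assumption:

import numpy as np
from scipy import ndimage

def build_gdd(gdt_normal, gdt_tangent, cropped):
    f_normal = zone_features(gdt_normal)     # 77 zone features
    f_tangent = zone_features(gdt_tangent)   # 77 zone features
    aspect_ratio = cropped.shape[1] / cropped.shape[0]
    # Count connected components of the (assumed binarized) character.
    _, n_components = ndimage.label(cropped > 0)
    return np.concatenate(
        [f_normal, f_tangent, [aspect_ratio, float(n_components)]])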
[0196] A dataset containing digits for training and digits for
testing was used to test the systems and methods disclosed herein.
Some character sets contained a variable number of samples. Using a
conventional K-means clustering algorithm, 1000 images were
prepared for training and 1000 images were prepared for testing,
for each upper case (UC) and lower case (LC) alphabet class, for
conducting the experiments. To properly evaluate the feature vector
representation, linear (no hidden layer) classifiers may be trained
first, and nonlinear classifiers with two hidden layers may then be
trained as feed-forward neural networks using a DNN learning
technique. Several
experiments were conducted with a RAW image (Size=28*28=784)
feature descriptor, a BAR (Size=15*8=120) feature descriptor, and a
GDD (Size=77*2+2=156) feature descriptor. The entries in Table D
3651 (illustrated in FIG. 36) and Table E 3753 (illustrated in FIG.
37) summarize the performance levels for each of the individual and
combined BAR+GDT (Size=120+156=276) linear and nonlinear
classifiers, respectively. Specifically, Table D 3651 in FIG. 36
summarizes some results for a handwriting recognition application
for different feature descriptors (RAW, BAR, GDT and a combination
or fusion of BAR and GDT (BAR+GDT)) with neural networks of 0
hidden layers. Furthermore, Table E 3753 in FIG. 37 summarizes some
results for a handwriting recognition application for different
feature descriptors (RAW, BAR, GDT and a combination or fusion of
BAR and GDT (BAR+GDT)) with neural networks of 2 hidden layers. In
Table D 3651 and Table E 3753, performance is illustrated
corresponding to digits, upper case and lower case characters. It
should be noted that MNIST stands for Mixed National Institute of
Standards and Technology in FIGS. 36-37.
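As a hedged illustration of the two classifier families compared in
Tables D and E, a linear (no hidden layer) model and a
two-hidden-layer feed-forward network could be set up as follows;
scikit-learn is used for brevity, and the layer widths and
iteration counts are assumptions rather than values from these
experiments:

from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Linear classifier (no hidden layers), e.g., softmax regression.
linear_clf = LogisticRegression(max_iter=1000)
# Nonlinear feed-forward classifier with two hidden layers.
nonlinear_clf = MLPClassifier(hidden_layer_sizes=(256, 128),
                              max_iter=500)
# X_train holds stacked GDD (or BAR+GDD) feature vectors; y_train
# holds the corresponding class labels:
#   linear_clf.fit(X_train, y_train)
#   nonlinear_clf.fit(X_train, y_train)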
[0197] The GDT may be evaluated by applying it to two different
problems, iris detection and handwriting recognition,
as described above. The GDT image transform is efficient, reliable,
and generic enough to handle other applications as well.
[0198] Applying the GDT for handwriting recognition, for example,
may utilize both the gradient normal and tangent vector directions
as described above. A dataset may contain binary images of isolated
alphanumeric characters. After cropping and scaling to a fixed
height of 24 pixels and width of 16 pixels, gray images may be
obtained. The transforms may be generated and the feature vectors
may be computed to construct their corresponding classifiers. For
the GDT, the transform may be smoothed. In some configurations,
only the tangent component may be weighted with the per-unit gray
values of the input to produce the tangent feature map. The
reduction in input size from 28×28=784 (for the RAW descriptor)
to 120+156=276 (for BAR+GDT) descriptors resulted in significant
improvement of the recognition accuracies as demonstrated in Table
D 3651 and Table E 3753 (in FIGS. 36-37), using linear and
nonlinear classifiers, respectively. This shows that the new
GDT-based features provide improved performance, particularly for
digits and lower case alphabets. It is worth noting that the number of
nodes in the hidden layers may be optimized for the RAW feature
set. Other configurations of the network architecture may further
improve the resultant fused performance of the isolated
alphanumeric character classifiers to be used for connected
handwritten word recognition systems.
[0199] FIG. 38 is a block diagram illustrating one configuration of
a wireless communication device 3802 in which systems and methods
for obtaining structural information from a digital image may be
implemented. The wireless communication device 3802 illustrated in
FIG. 38 may be an example of one or more of the electronic devices
described herein. The wireless communication device 3802 may
include an application processor 3865. The application processor
3865 generally processes instructions (e.g., runs programs) to
perform functions on the wireless communication device 3802. In
some configurations, one or more of the functions (e.g., the
transform) disclosed herein may be performed by the application
processor 3865. For example, the application processor 3865 may
determine gradient vectors, transform pixels and/or perform one or
more operations based on the values of the transform space (e.g.,
iris detection, handwriting recognition, etc.). The application
processor 3865 may be coupled to an audio coder/decoder (codec)
3863.
[0200] The audio codec 3863 may be used for coding and/or decoding
audio signals. The audio codec 3863 may be coupled to at least one
speaker 3855, an earpiece 3857, an output jack 3859 and/or at least
one microphone 3861. The speakers 3855 may include one or more
electro-acoustic transducers that convert electrical or electronic
signals into acoustic signals. For example, the speakers 3855 may
be used to play music or output a speakerphone conversation, etc.
The earpiece 3857 may be another speaker or electro-acoustic
transducer that can be used to output acoustic signals (e.g.,
speech signals) to a user. For example, the earpiece 3857 may be
used such that only a user may reliably hear the acoustic signal.
The output jack 3859 may be used for coupling other devices to the
wireless communication device 3802 for outputting audio, such as
headphones. The speakers 3855, earpiece 3857 and/or output jack
3859 may generally be used for outputting an audio signal from the
audio codec 3863. The at least one microphone 3861 may be an
acousto-electric transducer that converts an acoustic signal (such
as a user's voice) into electrical or electronic signals that are
provided to the audio codec 3863.
[0201] The application processor 3865 may also be coupled to a
power management circuit 3875. One example of a power management
circuit 3875 is a power management integrated circuit (PMIC), which
may be used to manage the electrical power consumption of the
wireless communication device 3802. The power management circuit
3875 may be coupled to a battery 3877. The battery 3877 may
generally provide electrical power to the wireless communication
device 3802. For example, the battery 3877 and/or the power
management circuit 3875 may be coupled to at least one of the
elements included in the wireless communication device 3802.
[0202] The application processor 3865 may be coupled to at least
one input device 3879 for receiving input. Examples of input
devices 3879 include infrared sensors, image sensors,
accelerometers, touch sensors, keypads, etc. The input devices 3879
may allow user interaction with the wireless communication device
3802. The application processor 3865 may also be coupled to one or
more output devices 3881. Examples of output devices 3881 include
printers, projectors, screens, haptic devices, etc. The output
devices 3881 may allow the wireless communication device 3802 to
produce output that may be experienced by a user.
[0203] The application processor 3865 may be coupled to application
memory 3883. The application memory 3883 may be any electronic
device that is capable of storing electronic information. Examples
of application memory 3883 include double data rate synchronous
dynamic random access memory (DDRAM), synchronous dynamic random
access memory (SDRAM), flash memory, etc. The application memory
3883 may provide storage for the application processor 3865. For
instance, the application memory 3883 may store data and/or
instructions for the functioning of programs that are run on the
application processor 3865.
[0204] The application processor 3865 may be coupled to a display
controller 3885, which in turn may be coupled to a display 3887.
The display controller 3885 may be a hardware block that is used to
generate images on the display 3887. For example, the display
controller 3885 may translate instructions and/or data from the
application processor 3865 into images that can be presented on the
display 3887. Examples of the display 3887 include liquid crystal
display (LCD) panels, light emitting diode (LED) panels, cathode
ray tube (CRT) displays, plasma displays, etc.
[0205] The application processor 3865 may be coupled to a baseband
processor 3867. The baseband processor 3867 generally processes
communication signals. For example, the baseband processor 3867 may
demodulate and/or decode received signals. Additionally or
alternatively, the baseband processor 3867 may encode and/or
modulate signals in preparation for transmission.
[0206] The baseband processor 3867 may be coupled to baseband
memory 3889. The baseband memory 3889 may be any electronic device
capable of storing electronic information, such as SDRAM, DDRAM,
flash memory, etc. The baseband processor 3867 may read information
(e.g., instructions and/or data) from and/or write information to
the baseband memory 3889. Additionally or alternatively, the
baseband processor 3867 may use instructions and/or data stored in
the baseband memory 3889 to perform communication operations.
[0207] The baseband processor 3867 may be coupled to a radio
frequency (RF) transceiver 3869. The RF transceiver 3869 may be
coupled to a power amplifier 3871 and one or more antennas 3873.
The RF transceiver 3869 may transmit and/or receive radio frequency
signals. For example, the RF transceiver 3869 may transmit an RF
signal using a power amplifier 3871 and at least one antenna 3873.
The RF transceiver 3869 may also receive RF signals using the one
or more antennas 3873.
[0208] FIG. 39 illustrates certain components that may be included
within an electronic device 3902. The electronic device 3902
described in connection with FIG. 39 may be an example of and/or
may be implemented in accordance with one or more of the electronic
devices described herein.
[0209] The electronic device 3902 includes a processor 3907. The
processor 3907 may be a general purpose single- or multi-chip
microprocessor (e.g., an ARM), a special purpose microprocessor
(e.g., a digital signal processor (DSP)), a microcontroller, a
programmable gate array, etc. The processor 3907 may be referred to
as a central processing unit (CPU). Although just a single
processor 3907 is shown in the electronic device 3902 of FIG. 39,
in an alternative configuration, a combination of processors (e.g.,
an ARM and DSP) could be used.
[0210] The electronic device 3902 also includes memory 3991 in
electronic communication with the processor 3907 (i.e., the
processor 3907 can read information from and/or write information
to the memory 3991). The memory 3991 may be any electronic
component capable of storing electronic information. The memory
3991 may be random access memory (RAM), read-only memory (ROM),
magnetic disk storage media, optical storage media, flash memory
devices in RAM, on-board memory included with the processor,
programmable read-only memory (PROM), erasable programmable
read-only memory (EPROM), electrically erasable PROM (EEPROM),
registers, and so forth, including combinations thereof.
[0211] Data 3993 and instructions 3995 may be stored in the memory
3991. The instructions 3995 may include one or more programs,
routines, sub-routines, functions, procedures, code, etc. The
instructions 3995 may include a single computer-readable statement
or many computer-readable statements. The instructions 3995 may be
executable by the processor 3907 to implement one or more of the
methods described above. Executing the instructions 3995 may
involve the use of the data 3993 that is stored in the memory 3991.
FIG. 39 shows some instructions 3995a and data 3993a being loaded
into the processor 3907.
[0212] The electronic device 3902 may also include a transmitter
3903 and a receiver 3905 to allow transmission and reception of
signals between the electronic device 3902 and a remote location
(e.g., a base station). The transmitter 3903 and receiver 3905 may
be collectively referred to as a transceiver 3901. An antenna 3999
may be electrically coupled to the transceiver 3901. The electronic
device 3902 may also include (not shown) multiple transmitters,
multiple receivers, multiple transceivers and/or multiple
antennas.
[0213] The various components of the electronic device 3902 may be
coupled together by one or more buses, which may include a power
bus, a control signal bus, a status signal bus, a data bus, etc.
For simplicity, the various buses are illustrated in FIG. 39 as a
bus system 3997.
[0214] In the above description, reference numbers have sometimes
been used in connection with various terms. Where a term is used in
connection with a reference number, this may be meant to refer to a
specific element that is shown in one or more of the Figures. Where
a term is used without a reference number, this may be meant to
refer generally to the term without limitation to any particular
Figure.
[0215] The term "determining" encompasses a wide variety of actions
and, therefore, "determining" can include calculating, computing,
processing, deriving, investigating, looking up (e.g., looking up
in a table, a database or another data structure), ascertaining and
the like. Also, "determining" can include receiving (e.g.,
receiving information), accessing (e.g., accessing data in a
memory) and the like. Also, "determining" can include resolving,
selecting, choosing, establishing and the like.
[0216] The phrase "based on" does not mean "based only on," unless
expressly specified otherwise. In other words, the phrase "based
on" describes both "based only on" and "based at least on."
[0217] It should be noted that one or more of the features,
functions, procedures, components, elements, structures, etc.,
described in connection with any one of the configurations
described herein may be combined with one or more of the functions,
procedures, components, elements, structures, etc., described in
connection with any of the other configurations described herein,
where compatible. In other words, any compatible combination of the
functions, procedures, components, elements, etc., described herein
may be implemented in accordance with the systems and methods
disclosed herein.
[0218] The functions described herein may be stored as one or more
instructions on a processor-readable or computer-readable medium.
The term "computer-readable medium" refers to any available medium
that can be accessed by a computer or processor. By way of example,
and not limitation, such a medium may comprise Random-Access Memory
(RAM), Read-Only Memory (ROM), Electrically Erasable Programmable
Read-Only Memory (EEPROM), flash memory, Compact Disc Read-Only
Memory (CD-ROM) or other optical disk storage, magnetic disk
storage or other magnetic storage devices, or any other medium that
can be used to store desired program code in the form of
instructions or data structures and that can be accessed by a
computer. Disk and disc, as used herein, include compact disc
(CD), laser disc, optical disc, digital versatile disc (DVD),
floppy disk and Blu-Ray® disc, where disks usually reproduce
data magnetically, while discs reproduce data optically with
lasers. It should be noted that a computer-readable medium may be
tangible and non-transitory. The term "computer-program product"
refers to a computing device or processor in combination with code
or instructions (e.g., a "program") that may be executed, processed
or computed by the computing device or processor. As used herein,
the term "code" may refer to software, instructions, code or data
that is/are executable by a computing device or processor.
[0219] Software or instructions may also be transmitted over a
transmission medium. For example, if the software is transmitted
from a website, server, or other remote source using a coaxial
cable, fiber optic cable, twisted pair, digital subscriber line
(DSL) or wireless technologies such as infrared, radio, and
microwave, then the coaxial cable, fiber optic cable, twisted pair,
DSL or wireless technologies such as infrared, radio and microwave
are included in the definition of transmission medium.
[0220] The methods disclosed herein comprise one or more steps or
actions for achieving the described method. The method steps and/or
actions may be interchanged with one another without departing from
the scope of the claims. In other words, unless a specific order of
steps or actions is required for proper operation of the method
that is being described, the order and/or use of specific steps
and/or actions may be modified without departing from the scope of
the claims.
[0221] It is to be understood that the claims are not limited to
the precise configuration and components illustrated above. Various
modifications, changes and variations may be made in the
arrangement, operation and details of the systems, methods, and
apparatus described herein without departing from the scope of the
claims.
* * * * *