U.S. patent application number 17/244251 was filed with the patent office on 2021-04-29 and published on 2021-10-07 for mapping optical-code images to an overview image.
This patent application is currently assigned to SCANDIT AG. The applicant listed for this patent is SCANDIT AG. Invention is credited to Matthias Bloch, Christian Floerkemeier, Fabian Nater, Kimmo Roimela, and Bernd Schoner.
United States Patent Application 20210312217
Kind Code: A1
Nater; Fabian; et al.
October 7, 2021

MAPPING OPTICAL-CODE IMAGES TO AN OVERVIEW IMAGE
Abstract
Images of optical codes are mapped to an overview image to
localize optical codes within a space. By localizing optical codes,
information about locations of various products can be ascertained.
One or more techniques can be used to map the images of optical
codes to the overview image. The overview image can be a composite
image formed by stitching together several images.
Inventors: Nater; Fabian (Zurich, CH); Roimela; Kimmo (Tampere, FI);
Schoner; Bernd (New York, NY); Bloch; Matthias (Zurich, CH);
Floerkemeier; Christian (Zurich, CH)
Applicant: SCANDIT AG (Zurich, CH)
Assignee: SCANDIT AG (Zurich, CH)
Family ID: 1000005550288
Appl. No.: 17/244251
Filed: April 29, 2021
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
17139529 (parent of the present application 17244251) | Dec 31, 2020 |
16920061 (parent of 17139529) | Jul 2, 2020 | 10963658
63017493 | Apr 29, 2020 |
63003675 | Apr 1, 2020 |
63019818 | May 4, 2020 |
Current U.S. Class: 1/1
Current CPC Class: G06K 7/1417 20130101; G06K 7/1443 20130101; G06K 9/52 20130101; G06T 11/00 20130101; G06K 9/6202 20130101; G06T 7/73 20170101; G06K 7/1447 20130101; G06K 9/685 20130101
International Class: G06K 9/62 20060101 G06K009/62; G06T 7/73 20170101 G06T007/73; G06K 9/68 20060101 G06K009/68; G06K 7/14 20060101 G06K007/14; G06T 11/00 20060101 G06T011/00; G06K 9/52 20060101 G06K009/52
Claims
1. A system for mapping optical-code images to an overview image,
the system comprising: an image sensor configured to acquire a
first image, a second image, and a third image of a scene; and one
or more processors configured to: receive the first image, wherein
the first image includes a first optical code but not a second
optical code; decode the first optical code using the first image;
receive the second image, wherein the second image is acquired
after the first image and includes the second optical code but not
the first optical code; decode the second optical code using the
second image; receive the third image, wherein the third image
includes both the first optical code and the second optical code,
without decoding the first optical code or the second optical code
using the third image; generate a first map of the probability of a
location of the first optical code in the third image; generate a
second map of the probability of a location of the second optical
code in the third image; correlate the first optical code with a
first location in the third image, based on the first map; and
correlate the second optical code with a second location in the
third image, based on the second map.
2. The system of claim 1, wherein the first map is based on: a
convolution of one or more features in the first image with the
third image; an assumed scan order of the first optical code and
the second optical code; and identifying label positions in the
third image.
3. The system of claim 1, wherein the first map is further based
on: an estimated device position at a time the first image is
acquired; identifying characters in the first image and in the
third image; image recognition of a product in the third image
identified by the first optical code; data from the second map;
data of phone translation; and matching interest points and texture
features in the first image to the third image.
4. A method for mapping optical-code images to an overview image,
the method comprising: receiving a first image, wherein the first
image includes a first optical code but not a second optical code;
decoding the first optical code using the first image; receiving a
second image, wherein the second image includes the second optical
code but not the first optical code; decoding the second optical
code using the second image; receiving a third image, wherein the
third image includes both the first optical code and the second
optical code, without decoding the first optical code or the second
optical code using the third image; generating a first map of the
probability of a location of the first optical code in the third
image; generating a second map of the probability of a location of
the second optical code in the third image; identifying a first
location in the third image of the first optical code, based on the
first map; and identifying a second location in the third image of
the second optical code, based on the second map.
5. The method of claim 4, wherein the first map is based on a
convolution of one or more features in the first image with the
third image.
6. The method of claim 5, wherein the one or more features includes
the first optical code.
7. The method of claim 4, wherein the first map and the second map
are based on a known scan order.
8. The method of claim 4, wherein the first map and the second map
are based on an estimated device position at a time the first image
is acquired.
9. The method of claim 4, wherein the first map and the second map
are based on identifying label positions in the third image.
10. The method of claim 4, wherein the first map is based on
identifying characters in the first image and in the third
image.
11. The method of claim 4, wherein the first map is based on image
recognition of a product in the third image identified by the first
optical code.
12. The method of claim 4, wherein the first map is based on data
from the second map.
13. The method of claim 4, further comprising searching for a scale
of the first image compared to the third image.
14. The method of claim 4, wherein the first map is based on data
of phone translation between the first image and the second
image.
15. The method of claim 4, wherein the first map is based on
matching texture features in the first image to the third
image.
16. The method of claim 4, wherein there is less than 25 degrees
rotation between the first image and the third image.
17. The method of claim 4, further comprising stitching images
together to form the third image.
18. The method of claim 4, wherein the first map is based on: a
convolution of one or more features in the first image with the
third image; a known scan order of the first optical code and the
second optical code; identifying label positions in the third
image; an estimated device position at a time the first image is
acquired; identifying characters in the first image and in the
third image; image recognition of a product in the third image
identified by the first optical code; data from the second map;
data of phone translation; and/or matching texture features in the
first image to the third image.
19. A memory device containing instructions that, when executed,
cause one or more processors to perform the following steps for
mapping optical-code images to an overview image: receiving a first
image, wherein the first image includes a first optical code but
not a second optical code; decoding the first optical code using
the first image; receiving a second image, wherein the second image
includes the second optical code but not the first optical code;
decoding the second optical code using the second image; receiving
a third image, wherein the third image includes both the first
optical code and the second optical code, without decoding the
first optical code or the second optical code using the third
image; generating a first map of the probability of a location of
the first optical code in the third image; generating a second map
of the probability of a location of the second optical code in the
third image; identifying a first location in the third image of the
first optical code, based on the first map; and identifying a
second location in the third image of the second optical code,
based on the second map.
20. The memory device of claim 19, wherein the first map is based
on: a convolution of one or more features in the first image with
the third image; and identifying label positions in the third
image.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of priority of U.S.
Provisional Application No. 63/017,493, filed on Apr. 29, 2020,
which is incorporated by reference in its entirety for all
purposes. This application is a continuation-in-part of U.S.
Non-Provisional application Ser. No. 17/139,529, filed on Dec. 31,
2020, which is a continuation of U.S. Non-Provisional application
Ser. No. 16/920,061, filed on Jul. 2, 2020, now U.S. Pat. No.
10,963,658, issued on Mar. 30, 2021, which claims the benefit of
priority of U.S. Provisional Application No. 63/017,493, filed on
Apr. 29, 2020, 63/003,675, filed on Apr. 1, 2020, and 63/019,818,
filed on May 4, 2020, which are incorporated by reference in their
entirety for all purposes. This application is related to U.S.
patent application Ser. No. 16/905,722, filed on Jun. 18, 2020, now
U.S. Pat. No. 10,846,561, issued Nov. 24, 2020, which is
incorporated by reference in its entirety for all purposes.
BACKGROUND
[0002] This disclosure generally relates to decoding codes, and
more specifically, and without limitation, to decoding barcodes in a
retail environment. Barcodes have traditionally been scanned using
a specialized scanner. For example, a barcode scanner comprising a
laser is used to shine light on a barcode, and reflected light from
the barcode is detected and used to decode the barcode. As mobile
devices (e.g., smartphones and tablets) with cameras have become
more common, mobile devices are being used to decode codes by
acquiring an image of a code and using image analysis to decode the
code. An example of a method for using a smartphone to decode a
barcode is provided in U.S. Pat. No. 8,596,540, granted on Dec. 3,
2013.
BRIEF SUMMARY
[0003] This disclosure generally relates to tracking and decoding
computer-readable codes (e.g., barcodes; QR codes). For example, a
barcode can be a Stock Keeping Unit (SKU) in a retail setting. More
specifically, and without limitation, this disclosure relates to
mapping images of barcodes to an overview image. The overview image
can be a composite of a plurality of images.
[0004] In some configurations, a system for mapping optical-code
images to an overview image comprises an image sensor and one or
more processors. The image sensor can be part of a mobile device.
The image sensor is configured to acquire a first image, a second
image, and a third image of a scene. The one or more processors are
configured to receive the first image, wherein the first image
includes a first optical code but not a second optical code; decode
the first optical code using the first image; receive the second
image, wherein the second image is acquired after the first image
and includes the second optical code but not the first optical
code; decode the second optical code using the second image;
receive the third image, wherein the third image includes both the
first optical code and the second optical code, without decoding
the first optical code or the second optical code using the third
image; generate a first map of the probability of a location of the
first optical code in the third image; generate a second map of the
probability of a location of the second optical code in the third
image; correlate the first optical code with a first location in
the third image, based on the first map; and/or correlate the
second optical code with a second location in the third image,
based on the second map. In some embodiments, the first map is
based on a convolution of one or more features in the first image
with the third image; a known scan order of the first barcode and
the second barcode; identifying label positions in the third image;
an estimated device position at the time the first image is
acquired; identifying characters in the first image and in the
third image; image recognition of a product in the third image
identified by the first barcode; data from the second map; data of
phone translation; and/or matching texture features in the first
image to the third image.
[0005] In some configurations, a method for mapping optical-code
images to an overview image comprises: receiving a first image,
wherein the first image includes a first optical code but not a
second optical code; decoding the first optical code using the
first image; receiving a second image, wherein the second image
includes the second optical code but not the first optical code;
decoding the second optical code using the second image; receiving
a third image, wherein the third image includes both the first
optical code and the second optical code, without decoding the
first optical code or the second optical code using the third
image; generating a first map of the probability of a location of
the first optical code in the third image; generating a second map
of the probability of a location of the second optical code in the
third image; identifying a first location in the third image of the
first optical code, based on the first map; and/or identifying a
second location in the third image of the second optical code,
based on the second map. In some embodiments, the first map is
based on a convolution of one or more features in the first image
with the third image; the one or more features includes the first
barcode; the first map and the second map are based on a known scan
order; the first map and the second map are based on an estimated
device position at the time the first image is acquired; the first
map and the second map are based on identifying label positions in
the third image; the first map is based on identifying characters
in the first image and in the third image; the first map is based
on image recognition of a product in the third image identified by
the first barcode; the first map is based on data from the second
map; the first map is based on data of phone translation between
the first image and the second image; the first map is based on
matching texture features in the first image to the third image;
there is less than ten degrees rotation between the first image and
the third image; the method further comprises searching for a scale
of the first image compared to the third image; and/or the method
further comprises stitching images together to form the third
image, before mapping optical-code images to the overview
image.
[0006] In some configurations, a method for generating an overview
image from multiple sub-images comprises receiving a plurality of
images of a scene, wherein: the plurality of images include a first
image, a second image, and a third image, the third image is
acquired by an image sensor after the second image is acquired by
the image sensor; the first image is acquired by the image sensor
before the second image is acquired by the image sensor, or the
first image is acquired by the image senor after the third image is
acquired; the second image and the third image are acquired by an
image sensor in sequential order, such that the third image is
acquired by the image sensor after the second image; receiving data
about relative positions of the plurality of images; arranging the
second image to overlap a first side of the first image, based on
the data about relative positions of the plurality of images; and/or
arranging the third image to overlap a second side of the first
image, based on the data about relative positions of the plurality
of images, wherein the second side is opposite the first side of
the first image. In some embodiments, a user interface includes an
instruction for a lateral movement; the user interface includes an
instruction for a vertical movement; the user interface includes an
instruction for acquiring a central image; the user interface includes
instructions for acquiring an upper left image, an upper right
image, a lower left image, and a lower right image in relation to
the central image; there is rotational movement between the first
image and the second image but not translational; homography
candidates are calculated and/or verified; images are scaled;
images are seamed together to form the overview image; visual
similarities between images are used to match the second image to
the first image (rough alignment); after rough alignment, fine
tuning is used to further align the second image with the first
image; fine tuning includes pixel level (shift by a few pixels),
extracting visual features, detecting edges in the image (e.g.,
shelves), and/or detecting labels in images (parallax is shifted
from the label to the product); optical flow is found and corrected;
and/or a seam between images is detected and moved to avoid
barcodes or labels (e.g., by assigning higher costs for stitching
through a label).
[0007] Further areas of applicability of the present disclosure
will become apparent from the detailed description provided
hereinafter. It should be understood that the detailed description
and specific examples, while indicating various embodiments, are
intended for purposes of illustration only and are not intended to
necessarily limit the scope of the disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The present disclosure is described in conjunction with the
appended figures.
[0009] FIG. 1 depicts an embodiment of a tracking methodology.
[0010] FIG. 2 depicts an embodiment of tracking codes and not
decoding the codes in each frame.
[0011] FIG. 3 depicts a diagram of an embodiment for creating a
correlation filter.
[0012] FIG. 4 depicts a diagram of an embodiment for using the
correlation filter for generating a response map for tracking a
code.
[0013] FIG. 5 depicts an embodiment of decoding codes over multiple
frames.
[0014] FIG. 6 illustrates a flowchart of an embodiment of a process
for tracking a code.
[0015] FIG. 7 is an example of segmenting products on a shelf from
an image.
[0016] FIG. 8 is an embodiment of a shelf diagram.
[0017] FIG. 9 illustrates a flowchart of an embodiment of a process
for mapping objects on a shelving unit.
[0018] FIG. 10 is an example of blending two images of a shelf.
[0019] FIG. 11 is an embodiment of a shelf diagram with a visual
image of the shelf.
[0020] FIG. 12 illustrates a flowchart of an embodiment of a
process for creating a visual image of a shelf.
[0021] FIG. 13 depicts an embodiment of a first barcode image
having a first barcode.
[0022] FIG. 14 depicts an embodiment of a second barcode image
having a second barcode.
[0023] FIG. 15 depicts an embodiment of an overview image to which
the first barcode and the second barcode are matched.
[0024] FIG. 16 depicts an embodiment of a first map of a
probability of the first barcode in the overview image.
[0025] FIG. 17 depicts an embodiment of a second map of a
probability of the second barcode in the overview image.
[0026] FIG. 18 depicts an embodiment of identifying label positions
in the overview image.
[0027] FIG. 19 depicts an embodiment of correlating the first
barcode to a first location in the overview image and correlating
the second barcode to a second location in the overview image.
[0028] FIG. 20 illustrates a flowchart of an embodiment of a
process for mapping optical-code images to an overview image.
[0029] FIG. 21 depicts an embodiment of tile positions for a
composite overview image.
[0030] FIG. 22 illustrates a flowchart of an embodiment of a
process for generating a composite image.
[0031] FIG. 23 depicts a block diagram of an embodiment of a
computer system.
[0032] In the appended figures, similar components and/or features
may have the same reference label. Further, various components of
the same type may be distinguished by following the reference label
by a dash and a second label that distinguishes among the similar
components. If only the first reference label is used in the
specification, the description is applicable to any one of the
similar components having the same first reference label
irrespective of the second reference label.
DETAILED DESCRIPTION
[0033] The ensuing description provides preferred exemplary
embodiment(s) only, and is not intended to limit the scope,
applicability, or configuration of the disclosure. Rather, the
ensuing description of the preferred exemplary embodiment(s) will
provide those skilled in the art with an enabling description for
implementing a preferred exemplary embodiment. It is understood
that various changes may be made in the function and arrangement of
elements without departing from the spirit and scope as set forth
in the appended claims.
[0034] Matrix Scan
[0035] Many applications are becoming web-based. However, web-based
applications can have fewer computational resources than a native
application. For example, a native application can be used to track
barcodes based on decoding barcodes from a plurality of images.
However, decoding barcodes can be computationally intense and cause
lag when moved to a web-based application. Thus, in some
embodiments, barcodes are tracked in several frames but decoded
only periodically for a web-based application used to decode
barcodes. In some embodiments, a frame is one of a series of
separate photographs that make up a film or video.
[0036] Referring first to FIG. 1, an embodiment of a tracking
scheme is depicted. This tracking scheme can be used on a native
application. In FIG. 1, a first code 104-1 and a second code 104-2
are decoded in a first frame 108-1 (e.g., initialization) at time
T=1, and positions of the first code 104-1 and the second code
104-2 are ascertained. The first frame 108-1 corresponds to a first
image acquired by an image sensor (e.g., from a camera in a mobile
device). The code 104 is an optical pattern. The code 104 can be a
machine-readable code, such as a one-dimensional bar code having a
plurality of horizontal lines or a two-dimensional barcode (e.g., a
QR code), symbol(s) (e.g., dollar sign, triangle, etc.), number(s),
and/or letter(s). For example, a code 104 can be a price, a VIN
number to identify a vehicle, a credit card number, a license plate
number, a serial number, a tire code (TIN), or a date (e.g., an
expiry date). The code 104 can be in various environments and/or uses.
For example, the code 104 can be part of or on a shipping label, a
product label, a passport, a shipping invoice, a driver's license,
an ID card, a credit card, a check, a license plate, a digital
display (e.g., an electronic price label), a utility meter, ID docs
with a Machine Readable Zone (MRZ), or a retail receipt.
[0037] At time T=2, a prediction 112 is calculated (e.g., assuming
smooth motion of the mobile device); codes 104 are decoded in a
second frame 108-2 (wherein the second frame 108-2 corresponds to a
second image acquired by the image sensor); the codes 104 are
matched with the prediction 112; and an updated position 116 of
codes 104 is ascertained based on codes 104 decoded from the second
frame 108-2.
[0038] Some concerns with the scheme above can include inaccurate
and/or missing detections because bounding boxes (e.g., in updated
position 116) can be inaccurate based on decoding one-dimensional
codes. If decoding is not performed each frame, then detections can
be missed and/or detection tracks can be mismatched. An example of
a detection track is shown in updated position 116, where a dotted
outline was a previous position. Also, decoding can get more
challenging with faster movement and lower resolution.
[0039] One possible solution is to use more data, such as
predicting location of codes 104 based on a history of images
(e.g., to establish a trajectory of codes 104 in a field of view),
use known arrangements of codes 104 and/or background structures
(e.g., the shelf unit); and/or leverage additional sensor data
(e.g., inertial measurement unit (IMU) data from gyroscopes to
predict a pose change of a mobile device). However, using more data
can cause lag when using a web-based application. Further, not all
devices have the computing power and/or equipment (e.g., an IMU) to
use more data. Accordingly, another possible solution, explained in
more detail below, is to decode codes in a first frame and a third
frame, and track positions of codes 104 in a second frame between
the first frame and the third frame.
[0040] FIG. 2 depicts an embodiment of tracking and not decoding
codes in each frame. In a first frame 108-1, at time T=1, an
algorithm is used to search for locations that look like a code
104. Locations that look like the code 104 are decoded (or an
attempt is made to decode the code). Initialization of tracking the
code 104 (e.g., as described in conjunction with FIG. 3 by creating
a correlation filter) is also performed.
[0041] In the second frame 108-2, at T=2, codes 104 are tracked and
not decoded. A tracking algorithm tries to find the new position of
the tracked codes in the second frame 108-2. In the second frame
108-2, a first bounding box 204-1 is calculated in relation to the
second frame 108-2 where the first code 104-1 is calculated to be
(e.g., the bounding box 204 is simply calculated and/or overlaid on
the second frame 108-2 in a user interface of the mobile device);
and a second bounding box 204-2 is calculated in relation to the
second frame 108-2 where the second code 104-2 is calculated to be.
For example, a correlation filter is used to create a response map, as
described in conjunction with FIG. 4, to determine a position of
the bounding box 204. Tracking in the second frame 108-2 does not
rely on restrictions that are present when decoding codes. For
example, the code 104 can be blurry in the second frame 108-2. In
one embodiment, the correlation filter is learned upon
initialization of tracking and is continuously updated during the
tracking process to adapt, e.g., to a change in perspective. In
another embodiment, the correlation filter operates on a selected
subset of image features, which can be very distinct for barcodes
and might get extracted using a pre-trained neural net.
[0042] In the third frame 108-3, at T=3, codes 104 are scanned and
tracked. For example, an algorithm is used to search for locations
that look like a code 104. Locations that look like the code 104
are decoded, and/or the tracking algorithm ascertains new positions
of the codes 104 in the third frame 108-3 for codes 104 that could
not, or were not, decoded.
[0043] Though only one frame, the second frame 108-2, is shown
between the first frame 108-1 and the third frame 108-3, it is to
be understood that the second frame 108-2 can be one of many frames
between the first frame 108-1 and the third frame 108-3. It is also
to be understood that actions in the third frame 108-3 can be
repeated (e.g., periodically or according to an event, such as the
position of a bounding box not moving more than a set distance that
can indicate there might be less motion blur in an image) with
actions of the second frame 108-2 occurring in between in one or
more frames. Thus the tracking algorithm can determine the position
of a code between scanning for codes and/or decoding codes. For
example, a camera in a smartphone acquires images at 30 frames per
second. Actions of the third frame 108-3 in FIG. 2 are set to occur
at an interval (sometimes referred to as scan rate) of 2 Hz, and
actions of the second frame 108-2 in FIG. 2 are set to occur on
frames acquired by the camera between the interval. In some
embodiments, the interval is equal to or more frequent than 0.3,
0.5, 1, 2, or 5 Hz and equal to or less frequent than 5, 10, or 20
Hz. In some configurations, the camera has a frame rate equal to or
greater than 1, 5, 10, or 20 Hz and equal to or less than 10, 30,
60, or 120 Hz. There is a tradeoff in a frame rate used between
wanting a high frame rate so an object does not move too much
between frames and wanting to discover new barcodes appearing as
soon as possible. Applicant has found that 60 to 100 ms, such as 80
ms, between consecutive scans is a good tradeoff, because it
conveys an experience of "snappiness" to the user.
[0044] In some embodiments, only new codes detected are decoded in
subsequent frames. Thus in some embodiments, scanning for barcodes
and/or decoding barcodes is performed in only the first frame
108-1. For example, a store employee is scanning several barcodes
with her smartphone. In a first frame, scanning identifies two
barcodes, and the two barcodes are decoded. The two barcodes are
then tracked, and a green dot is overlaid on images displayed to
the user of the smartphone using the screen of the smartphone
providing an augmented-reality indication to the user of which
barcodes have been decoded. The user can see a third barcode on the
screen that is not covered by a green dot, so the user continues to
move the smartphone to scan the third barcode. In a frame where an
image is searched for locations that look like barcodes, three
locations are identified: two locations corresponding to the two
barcodes previously identified and a new location. The two
locations corresponding to the two barcodes previously identified
are not decoded, but the new location is decoded, revealing the third
barcode. The third barcode is then tracked and a green dot overlaid
on the screen where the third barcode is tracked to be. Thus the
user can see which codes have been decoded without the application
decoding each barcode each frame. Since decoding a code can take
more computation resources than tracking, tracking can improve the
function of the mobile device, especially when the application is a
web-based application and not running as a native application.
[0045] In some embodiments, scanning for codes, which can include
both searching for locations that look like codes and decoding
codes, can split to occur in different frames. For example,
searching for locations that look like codes occurs in a first
frame, and five locations are identified. In frames two through
ten, the five locations are tracked. In frame 11, a code at a first
location is decoded. In frames 12-20, the five locations are
tracked. In frame 21, a code at the second location is decoded. In
frames 22-30, the five locations are tracked. In frame 31,
searching for locations that look like codes is performed, and a
sixth location is identified. The process continues searching for
new locations and decoding codes.
[0046] In some configurations, tracking described in conjunction
with FIGS. 2-6 can allow for more stable tracking, because tracking
is not dependent on a barcode being visually decodable (e.g., as
described in FIG. 1). Tracking can provide smoother visualization
to a user due to increased frame rate, and/or reduces energy
consumption when doing "regular" barcode scanning (e.g., because
codes that have already been scanned are not rescanned). Not
decoding codes each frame can be helpful in various situations.
one example where the user is scanning multiple codes for inventory
management, barcodes can be at a far distance. As the user moves
the mobile phone quickly to scan a barcode, the barcodes can be
blurry from motion blur. Thus tracking that relies on decoding
codes can lose track of the barcode.
[0047] In some embodiments, a method comprises extracting a code
visually, establishing correspondence between frames of decoding,
predicting a location of the code, scanning again, wherein
predicting is performed by predicting locations of multiple codes
independently, and/or using only image data (e.g., not IMU
data).
[0048] In some embodiments, the tracking algorithm described in
FIGS. 2-6 can be implemented with other algorithms, such as
prediction algorithms. For example, the tracking algorithm can be
used in combination with a homographic prediction algorithm or when
the homographic prediction algorithm fails. In some embodiments,
the barcodes can be tracked as a single rigid structure. Leveraging
the fact that barcodes don't move relative to each other can help
to reduce computational complexity and improve fault
tolerance/failure detection.
[0049] FIG. 3 depicts a diagram of an embodiment for creating a
correlation filter 304. FIG. 3 is an example of initialization.
Extracted features 308 are obtained from the code 104. The
extracted features can be areas of high contrast (e.g., image
gradients), lines, corners, etc. Since in many scenarios barcodes
are placed on a rigid object or surface, extracted features 308 can
include geometric cues in addition to the code 104 itself (e.g., an
edge of a shelf or a dollar sign can be part of the extracted
features 308). In some embodiments, the extracted features 308 are
converted from a visual domain into a frequency domain using a
Fourier transform (FT) and combined with a target output 312 to
form the correlation filter 304. The Fourier transform is efficient
and reduces computation time. In another embodiment, the operation
is performed in the spatial domain. In some embodiments, to "train"
the correlation filter, the features are transformed into the
frequency domain and a mask/filter is computed such that, when
convolved with the features, it produces a target output. The target
output is a desired response map in the spatial domain (e.g., a
probability map of the object position; essentially a matrix the size
of the analyzed patch with a single peak in the center of the
patch, indicating the object center position) that is itself
transformed into the frequency domain.
[0050] FIG. 4 depicts a diagram of an embodiment for using the
correlation filter 304 for generating a response map 404 for
tracking the code 104. In FIG. 4, extracted features 408 are
obtained from an image 412. The image 412 is part of a plurality of
image frames acquired by a camera. The image 412 comprises the code
104. The extracted features 408 are converted into frequency space
using a Fourier transform and combined with the correlation filter
304 (e.g., a convolution) to generate the response map 404. The
extracted features 408 can be combined with a window 416 before
being combined with the correlation filter 304. The window 416 can
be used to restrict searching for the code 104 to an area that is
smaller than an area of the entire image 412. The response map 404
provides a magnitude of correlation. Thus the whiter an area of the
response map 404, the more likely the code 104 will be at that
area.
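Applying the filter from the previous sketch to a new frame can then be sketched like this, assuming the same feature representation as at initialization and a Hanning window to restrict the search area:

    # Sketch of generating the response map: window the new features,
    # correlate in the frequency domain, and take the peak as the most
    # likely code position.
    import numpy as np

    def locate_code(features, H):
        h, w = features.shape
        window = np.outer(np.hanning(h), np.hanning(w))  # search window
        F = np.fft.fft2(features * window)
        response = np.real(np.fft.ifft2(F * H))          # response map
        peak = np.unravel_index(np.argmax(response), response.shape)
        return peak  # (row, col) of the most likely code position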
[0051] A frame rate for tracking can be adjusted (e.g.,
dynamically) based on several criteria including one or more of
motion speed, size of window 416, and how far away the code 104 is
from the camera (e.g., if farther away, a larger window 416 is
chosen and/or frame rate of images is increased for tracking).
[0052] FIG. 5 depicts an embodiment of scanning for codes 104 over
multiple frames 508. In some configurations, scanning for codes
(e.g., identifying and/or decoding codes) can be computationally
intense. Identifying a code comprises ascertaining that a code is
present in the image and/or a position of the code in the image.
Decoding a code comprises ascertaining what the code represents.
For example, identifying a barcode includes ascertaining that a
barcode is present in a frame, and decoding the barcode comprises
interpreting a series of white and black lines of the barcode to
represent a numerical string. By distributing scanning across
multiple frames, computation can be distributed in time. FIG. 5
depicts a first frame 508-1, a second frame 508-2, a third frame
508-3, and a fourth frame 508-4. In FIG. 5, scanning is performed
over multiple frames. In the first frame 508-1, a first code 104-1
and a second code 104-2 are tracked. In the second frame 508-2, a
scan for codes is performed in just the top half of the second
frame 508-2. Accordingly, the first code 104-1 is identified and
decoded, while the second code 104-2 is tracked without being
decoded. In the third frame 508-3, codes 104 are tracked. In the
fourth frame 508-4, a scan for codes is performed in just the
bottom half of the fourth frame 508-4. Accordingly, the second code
104-2 is identified and decoded while the first code 104-1 is
tracked without being decoded. In some embodiments, a frame 508 is
divided into quadrants and discovery of new codes (e.g., a scan) is
performed on each quadrant every fourth frame. In some embodiments,
"attention" or "scene semantic" based decoding are used. In some
configurations, this can mean that decoding can be limited to
frames or parts of the observed scene that undergo a drastic visual
change, and/or decoding can be limited to regions that are of
special interest for a specific task (e.g., barcode labels, parts
of a shelf unit, etc.).
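One way to split scanning across frames is sketched below under the quadrant assumption above; the rotation order is illustrative.

    # Sketch of distributing the scan across frames: each scheduled
    # scan frame searches only one quadrant, so discovery cost is
    # spread over four frames.
    def scan_region_for_frame(frame_index, height, width):
        half_h, half_w = height // 2, width // 2
        quadrants = [
            (0, half_h, 0, half_w),           # top-left
            (0, half_h, half_w, width),       # top-right
            (half_h, height, 0, half_w),      # bottom-left
            (half_h, height, half_w, width),  # bottom-right
        ]
        return quadrants[frame_index % 4]     # (y0, y1, x0, x1) to scan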
[0053] In FIG. 6, a flowchart of an embodiment of a process 600 for
tracking a code is shown. Process 600 begins in step 604 with
acquiring a plurality of frames from an image sensor, wherein the
plurality of frames each comprise an image of a code; the plurality
of frames includes a first frame and a second frame; and the second
frame is acquired after the first frame. In step 608, the code is
identified in the first frame. Features from the code are
extracted, step 612. For example, extracted features 308 are
identified from code 104 in FIG. 3. In step 616, a filter is
created. For example, the correlation filter 304 is created based
on extracted features 308 of FIG. 3.
[0054] In some embodiments, detecting and tracking a code can mean
tracking a plurality of codes, such as barcodes and lines of text.
For example, in a retail store a price label may be tracked and in
a second step one or more codes on that price label could be
decoded, including one or more barcodes identifying the product, a
sequence of numbers indicating the price, and/or a sequence of
characters describing the product name.
[0055] In step 620, features are extracted from a second frame. For
example, extracted features 408 are generated from image 412 in
FIG. 4. A response map is generated based on the extracted features
from the second frame, step 624. For example, the response map 404
is generated based on convolving the extracted features 408 with
the correlation filter 304 in FIG. 4. In step 628 a position of the
code is ascertained based on the response map. For example, areas
of higher magnitude of the response map indicate a likely position
in the response map of the code. Tracking the code as described in
process 600 does not decode the code (e.g., to save computational
resources).
[0056] A graphic indicating the position of the code, such as a box
or an outline of a box, can be overlaid on images comprising the
code to provide an augmented-reality output to a user showing a
location of the code. The graphic can change, such as changing
colors to indicate if a code has been decoded. For example, if a
code was identified as being present, but the camera was too far
away from the code to decode the code, a red outline around the
code could be displayed to the user. The user could then move the
camera closer to the code. As the code is decoded by an application
(e.g., running on the device or web-based), the graphic changes to
a green box, to indicate to the user that the code has been
successfully decoded.
[0057] By tracking the code, a subarea of a subsequent frame can be
predicted based on the position of the code in the second frame.
Thus if scanning is performed in the subsequent frame, scanning of
the code can be matched with the position of the code.
[0058] In some embodiments, a system to decode multiple optical
codes comprises: a mobile data collector enabled with a web browser
software and a display; a camera module; more than one optical
codes collocated with at least one object; and/or one or more
processors that receive the identity of more than one optical codes
in one or more images captured by the camera module, decode the one
or more optical codes, and/or visualize the decoded codes on the
display of the mobile data collector. A system can comprise: a
mobile data collector enabled with a web browser software and a
display; a camera module; more than one optical codes collocated
with at least one object; and/or one or more processors that detect
the presence of more than one optical code in an image captured by
the camera module, decode the more than one detected optical codes
in the image, visualize the location of the detected codes on the
display of the mobile data collector, and/or visualize the decoded
codes on the display of the mobile data collector. A system can
comprise: a mobile data collector enabled with a display and a
camera module; a plurality of optical codes collocated with at least
one object; and/or one or more processors that detect the presence
and locations of the plurality of optical codes in a first image
captured by the camera module, decode a subset of the detected
optical codes from the first image, detect the presence of the
plurality of optical codes in a second image captured by the camera
module, track the position of the plurality of optical codes
from the first image to the second image; and/or decode a subset of
the detected optical codes from the second image, the subset
comprising codes that have not been decoded from the first
image.
[0059] Code Mapping
[0060] Code mapping can be used to enable a user (e.g., a store
employee) to capture locations of codes (e.g., SKUs/products) in an
environment (e.g., on shelves in a store). Identifying and tracking
codes can help enable creating a virtual diagram of a physical
structure. For example, a virtual diagram of a shelving unit could
be used to help answer one or more of the following questions:
Where is a product located on a shelf, on which shelf, and/or at
what height? Is a product at a correct location (e.g., does product
placement comply with a planogram)? What does a particular shelf
visually look like? What was the state of the shelf two weeks ago?
Is the price of the product correct? Does the product correspond
with the label? Is the number of products on a shelf low (e.g.,
should the shelf be restocked)?
[0061] In some embodiments, a cloud service, such as for storage,
APIs, dashboard, etc., is used to allow multiple users to provide
and/or retrieve data. For example, in a retail setting multiple
employees with multiple mobile devices can contribute to data
capturing; and/or different stakeholders can view data for parts of
data in raw and/or aggregated versions. A cloud-based application
can also allow for faster software development cycles.
[0062] In some situations, mapping is restricted to certain
structures. For example, in a retail environment, shelves are
imaged as two-dimensional structures, wherein a plurality of
barcodes are mapped to be on one plane oriented vertically (a
normal of a vertical plane points horizontally). Products on a
shelving unit can be identified by barcodes on the shelving unit.
In some embodiments, products on the shelving unit are identified
by visual product recognition.
[0063] Data can be stored in a remote location (e.g., the Cloud).
Data can be agnostic from a capture and/or display device. For
instance, different mobile capture devices (e.g., different tablets
and/or smartphones can be used) and captured data display can be in
a mobile app or a web-based dashboard. Capture runs can be per SKU,
per shelving unit, and/or per store, and can be flexibly combined
from different capture devices.
[0064] In some configurations, code mapping can be used for
planogram compliance (e.g., comparing SKU location with a planned
location); planogram analytics (e.g., for sales numbers); in-store
navigation to a product; and/or AR marketing (e.g., instant
promotion when a customer is within a predefined distance of a
product).
[0065] In certain embodiments, creating a code map includes:
identifying a plurality of SKUs (e.g., by reading a barcode on
a shelf and/or by product recognition); segmenting a SKU area
(e.g., an area of one type of product; done through image
segmentation); ascertaining a two-dimensional layout (e.g., which
SKU is next to another SKU on a shelving unit and/or aisle);
calculating distances between barcodes (e.g., absolute distance in
centimeters or relative distance); capturing a three-dimensional
layout of barcodes (e.g., relation of multiple aisles); and/or
understanding the concept of shelves (e.g., vertically divided
units).
[0066] A SKU can be identified by scanning a barcode on a shelf.
For identifying a SKU by product recognition, a classifier is run
on an image that can label the product and/or segment an outline
concurrently. The outline can be a rectangle or a pixel-wise image
segmentation (e.g., using a convolutional neural network
(CNN)-based classifier). The classifier can be pre-trained with a
generic data corpus (e.g., ImageNet), and fine-tuned/adapted to the
product recognition use-case. Preferably, new products can be added
with few examples (e.g., one-shot-learning). Examples of CNN-based
object detection frameworks are YOLO, R-CNN, SSD, etc., but some
can also be custom-built.
[0067] FIG. 7 is an example of segmenting products on a shelf from
an image. FIG. 7 depicts a frame 704, which is an image of products
708 located on shelves 712. Segmenting a SKU area can be performed
through image segmentation. If a product 708 can be recognized
directly from an image, a classifier can be used to segment a SKU
area. If the product 708 is recognized through a barcode 716 on the
shelf 712, this could be done based on image content, given the
barcode location. An example approach includes ascertaining if the
product 708 corresponding to the barcode 716 is above or below the
barcode 716. Determining if the barcode 716 is above or below a
product 708 (or to a side), can be solved with a trained binary
classifier, an ad-hoc rule, and/or a human input. Image analysis is
then used to expand the region (e.g., above or below the code) to
the left and right, based on similar image content. In the frame
704 in FIG. 7, the product 708 corresponding to the barcode 716 is
above the barcode 716. Similar image content can be measured for
example with interest point matching, (cross-)correlation, color
analysis, etc. Segmenting can be based on the assumptions that the
image is undistorted and/or that similar products are horizontally
aligned with shelf rows. A segment 720 (a dashed line) of the product 708
shows the product 708 corresponding with the barcode 716.
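As a sketch of the region-expansion step, similar image content could be measured with color-histogram correlation; the patch size and similarity threshold below are illustrative assumptions.

    # Sketch of growing the SKU area sideways from the patch above (or
    # below) the barcode by comparing color histograms of neighboring
    # patches.
    import cv2

    def expand_sku_area(img, x0, x1, y0, y1, step=20, thresh=0.8):
        def hist(patch):
            h = cv2.calcHist([patch], [0, 1, 2], None, [8, 8, 8],
                             [0, 256, 0, 256, 0, 256])
            return cv2.normalize(h, h).flatten()
        seed = hist(img[y0:y1, x0:x1])
        # Grow to the right while neighboring patches look similar.
        while x1 + step <= img.shape[1]:
            nxt = hist(img[y0:y1, x1:x1 + step])
            if cv2.compareHist(seed, nxt, cv2.HISTCMP_CORREL) < thresh:
                break
            x1 += step
        return x0, x1  # expanded horizontal extent of the SKU area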
[0068] FIG. 8 is an embodiment of a shelf diagram 800. The shelf
diagram 800 provides information about a two-dimensional layout of
SKUs (e.g., which SKU is next to another SKU). In this embodiment,
SKUs are recognized by barcodes. If only one barcode is visible at
a time, a user scans barcodes in a defined order (e.g., a sequence
left to right for each shelf; starting on a top of the shelf,
moving down row by row). However, this provides only layout and not
distances (relative or actual). If more than one barcode is
visible, such as discussed above with Matrix Scan, then barcodes
are mapped relative to each other. For example, starting on a top
left of a shelf, and maintaining inside a current field of view a
barcode that was previously referenced, a newly recognized barcode
can be referenced (e.g., left, right, top, bottom) with respect one
or more known ones. Some assumptions can simplify mapping. For
example, in some configurations, it can be assumed that the barcode
capture view is aligned with shelf rows and there are no gaps.
Approximate relative distances can be calculated using spacing
between barcodes in images.
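A minimal sketch of accumulating such relative positions follows, assuming each observation pairs a newly recognized barcode with an already-placed one; the data layout is an assumption for illustration.

    # Sketch of placing barcodes relative to each other: whenever two
    # barcodes are visible together, their offset anchors the new code
    # relative to a known one. The first code serves as the origin.
    def place_codes(observations):
        """observations: list of (known_id, new_id, dx, dy) offsets."""
        positions = {}
        for known_id, new_id, dx, dy in observations:
            positions.setdefault(known_id, (0.0, 0.0))
            kx, ky = positions[known_id]
            positions[new_id] = (kx + dx, ky + dy)
        return positions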
[0069] To provide relative distances between barcodes, when only
one barcode is visible in a capture view, some embodiments can
start with capturing one or more overview images of a shelving
unit. In these overview image(s), barcodes are not (or cannot be)
decoded (e.g., because they are too small, but barcode location
regions can be identified). The overview images are rectified,
distortion corrected, and aligned with shelf rows. If there are
multiple fields of view with different barcodes, they are stitched
together, or at least referenced with respect to each other.
Individual barcodes are then captured, including the image content
surrounding the individual barcodes. With help of the image
content, with or without relying on identified barcode regions,
individual barcode images can be registered into the overview
image(s). Matching can for example be done with image interest
points, using RANSAC (random sample consensus) for homography or
other image analysis for estimation. In some embodiments,
registering individual barcode images with the overview image(s)
comprises identifying horizontal lines corresponding to shelves
and using the horizontal lines for horizontal alignment of images
and/or combining images. This approach has an advantage of
capturing (relative) distances between barcodes with the help of
the overview image(s).
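The registration step could be sketched with interest points and RANSAC as follows; ORB features are an illustrative choice, since the text does not prescribe a particular descriptor.

    # Sketch of registering a close-up barcode image into the overview
    # image with interest points and a RANSAC-estimated homography.
    import cv2
    import numpy as np

    def register_into_overview(closeup, overview, barcode_center_xy):
        orb = cv2.ORB_create(2000)
        k1, d1 = orb.detectAndCompute(closeup, None)
        k2, d2 = orb.detectAndCompute(overview, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(d1, d2),
                         key=lambda m: m.distance)[:200]
        src = np.float32([k1[m.queryIdx].pt for m in matches]
                         ).reshape(-1, 1, 2)
        dst = np.float32([k2[m.trainIdx].pt for m in matches]
                         ).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        # Map the barcode center into overview coordinates.
        pt = np.float32([[barcode_center_xy]])
        return cv2.perspectiveTransform(pt, H)[0, 0]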
[0070] In some embodiments, absolute distances between barcodes are
calculated (e.g., in cm). For example, Matrix scanning can be used
to track two or more barcodes at the same time, either with or
without decoding the barcodes. An algorithm can include the
assumption that shelves are planar (i.e., a two-dimensional
problem, where codes are arranged in the same plane), which can
simplify some calculations. In a first option, scale can be
determined by a known width of a reference in an image. For
example, if a height and/or width of a barcode is known, such as
1.2 cm, then that scan can be used as a reference to transform
relative measurements to absolute measurements. Barcodes of the
same type can appear in multiple images providing redundancy.
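The first option amounts to a simple scale conversion; here is a sketch using the 1.2 cm barcode width mentioned above (the pixel values in the example are illustrative).

    # Sketch of the known-width scale reference: the physical width of
    # a barcode converts pixel measurements to centimeters.
    def pixel_to_cm_scale(barcode_width_px, barcode_width_cm=1.2):
        return barcode_width_cm / barcode_width_px  # cm per pixel

    def absolute_distance_cm(dist_px, barcode_width_px):
        return dist_px * pixel_to_cm_scale(barcode_width_px)

    # Example: barcodes 480 px apart, reference barcode 36 px wide
    # -> 480 * (1.2 / 36) = 16 cm between the barcodes.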
[0071] In a second option, a known, calibrated camera and a moving
device (e.g., using SLAM (simultaneous localization and mapping)),
can be used to determine scale. In a third option, a stereoscopic
camera or a depth camera can be used. The three options above are
not mutually exclusive.
[0072] A three-dimensional model of a store can be generated. The
three-dimensional model can be made by combining a plurality of
two-dimensional models (e.g., of shelving units) and/or using a
single device by implementing a SLAM algorithm or SfM (structure
from motion) pipeline (Apple's ARKit or Google's ARCore offer such
reconstruction capabilities on their respective mobile platforms).
Barcodes can be decoded in close-up scans and localized within a
three-dimensional store model. Another option would be to use
multiple cameras (stereo) and/or a depth camera. In a further
option, two-dimensional models of shelves are created and a user
identifies locations of shelves within a store (e.g., using a store
layout blueprint). In another embodiment, store layout models are
derived from existing CAD models, for example coming from initially
planning the retail surface.
[0073] Consider the concept of shelves (e.g., vertically divided
units). From an application perspective, a shelf (or module or shelving
unit) is the organizing unit between store and product. A product
is located on a shelf, and all shelves in the store have unique
identifiers. Thus it can be helpful to record physical boundaries
of the shelf in a digital representation. If shelves are mapped one
after the other, in the 2D case, then capturing shelf identifiers
can be done through manual entry of the identifier, or manual
capture of a dedicated barcode (or different code type) located at
the shelf. Though automated entry can be preferred, if shelves do
not carry identifiers, manual entry can be used. In some
embodiments, the identity of the shelf can be inferred from the
position of the camera/device. For example, in some embodiments,
the location of the phone can be determined using WiFi
finger-printing, Bluetooth beacons, GPS, SLAM (simultaneous
localization and mapping), or IMU tracking. In the workflow, this
is, for example, done before capturing the SKU map of that shelf.
If the store is directly reconstructed in 3D, shelf segmentation
(vertical division) can be automatically performed by image
analysis, searching for vertical divider lines within a certain
(parametrizable) distance from each other.
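The automatic vertical division could be sketched with a line detector; the edge and spacing parameters below are illustrative assumptions.

    # Sketch of shelf segmentation by searching for near-vertical
    # divider lines at least a parametrizable distance apart.
    import cv2
    import numpy as np

    def find_vertical_dividers(img, min_gap_px=200):
        edges = cv2.Canny(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 50, 150)
        lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                                minLineLength=img.shape[0] // 2,
                                maxLineGap=10)
        xs = []
        for x1, y1, x2, y2 in (lines.reshape(-1, 4)
                               if lines is not None else []):
            if abs(x1 - x2) < 5:              # near-vertical line
                xs.append((x1 + x2) // 2)
        # Keep dividers at least min_gap_px apart.
        dividers = []
        for x in sorted(xs):
            if not dividers or x - dividers[-1] >= min_gap_px:
                dividers.append(x)
        return dividers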
[0074] An application used for scanning a shelf can be structured
in a certain way, e.g., the individual SKUs are scanned in order.
For example, a store associate can collect the SKU data in an
S-shaped pattern going (by way of example) from the upper left to
the lower right, switching direction for each individual shelf so
as to minimize travel time. Data (e.g., for each shelf) is
collected and sent to the backend. Data for each shelf can include:
Shop ID: identifier of the store; Shelf ID: identifier of the
shelf; Time: timestamp of acquisition start & end; for each
SKU: EAN/barcode, position (shelf row, item position), acquisition
timestamp, price (read price, correct price), and/or product image;
and/or smartphone identifier: to differentiate uploads from
different mobile phones.
[0075] Data for a particular product is arranged in vector form
(e.g., an n×1 matrix). Table I below provides a sample vector
for data mapping. The vector comprises the code and a relative
position of the code to a shelving diagram (e.g., the vector is the
"Value" column).

TABLE I. Sample product data

Descriptor                  | Value
SKU ID                      | 2036391100003
Shelf ID                    | 21A7
Barcode location (x)        | 40
Barcode location (y)        | -30
Product class (e.g., dairy) | 005
Product area (x)            | 40, 45
Product area (y)            | -30, -40
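The record in Table I might be represented as follows; the field names mirror the table's descriptors, and the structure itself is an assumption about the backend schema.

    # Sketch of the per-SKU record from Table I as a typed structure.
    from dataclasses import dataclass

    @dataclass
    class SkuRecord:
        sku_id: str         # e.g., "2036391100003" (EAN/barcode)
        shelf_id: str       # e.g., "21A7"
        barcode_x: float    # barcode location (x) on the shelf diagram
        barcode_y: float    # barcode location (y)
        product_class: str  # e.g., "005" (dairy)
        area_x: tuple       # product area extent in x, e.g., (40, 45)
        area_y: tuple       # product area extent in y, e.g., (-30, -40)

    record = SkuRecord("2036391100003", "21A7", 40, -30, "005",
                       (40, 45), (-30, -40))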
[0076] In FIG. 9, a flowchart of an embodiment of a process 900 for
mapping objects on a shelving unit is shown (e.g., creating a
digital map). Process 900 begins in step 904 with receiving a
plurality of images. For example, images are acquired by a camera
from a mobile device. In step 908, a first item code is identified in
the plurality of images. For example, the first item code is a
barcode identifying a first SKU and identifying the first item code
comprises decoding the first item code. In step 912, a second item
code is identified in the plurality of images. For example, the
second item code is a barcode identifying a second SKU and
identifying the second item code comprises decoding the second item
code.
[0077] In step 916, a relative distance and orientation between the
first code and the second code is computed. The relative distance
and orientation could be based on positioning the first item code
and the second item code on a relative coordinate system (e.g., a
coordinate system shown in FIG. 8; such as a second barcode is four
units to the right and three units below a first barcode). In some
embodiments, an absolute distance is measured (e.g., the second
barcode x position is equal to the x position of the first barcode
plus 24 centimeters and the second barcode y position is equal to
the y position of the first barcode).
[0078] In step 920, relative distance and orientation between the
first item code and the second item code are calibrated to a
shelving diagram. For example, multiple codes are combined to the
shelving diagram shown in FIG. 8. The relative distance and
orientation between the first item code and the second item code
are used to designate a position of the first item code and the
second item code in the shelving diagram (e.g., the first item code
has coordinates of 0,0 and the second item code has coordinates of
4,-3, if the first item code was chosen as an origin of the
shelving unit).
[0079] In step 924 a first vector is generated comprising the first
item code and a relative position of the first item code to the
shelving diagram. For example, Table I provides a vector with a
barcode ID and x/y coordinates of the barcode with respect to shelf
21A7. Similarly, the second vector is generated comprising the
second item code and a relative position of the second item code to
the shelving diagram, step 928.
[0080] In some embodiments, a system to map the location of objects
on display comprises: a mobile data collector reading optical codes;
labels with optical codes collocated with the objects on display;
and/or one or more processors that receive the identity of more
than one label, compute the relative distance and orientation
between the more than one labels, and/or arrange the locations of
the more than one labels on a map.
[0081] Shelf Visualization
[0082] Shelf Visualization can be used to enable users (store
employees, managers, etc.) to visualize a current and/or previous
state of a shelf in a retail store, and/or display (e.g., overlay
or through an additional window in a graphical user interface
(GUI)) additional relevant information. Some embodiments are
restricted to planar shelves, but other embodiments include a
product on almost any type of point of sale (PoS) in a store. Shelf
visualization can enable remote visual inspection of a shelf; a
street-view-style stroll through a retail store from a couch (e.g.,
toward a remote shopping experience); and/or augmented or virtual
reality applications.
[0083] FIG. 10 is an example of blending two images of a shelf.
FIG. 10 shows a combined image 1000 of product 1008 on shelves
1012. The combined image 1000 can appear somewhat blurry because
the combined image 1000 can be formed by two or more images. Shelf
visualization can be an image stitching problem. Though panoramic
pictures are made using image stitching, constraints for shelf
visualization are different than for traditional panoramic image
stitching. For example, a shelf (e.g., a shelving unit) can be
close but does not fit into one camera view. The shelf is captured
from multiple different viewpoints, and there can be significant
(with respect to the object) camera shift between the captured
images (e.g., parallax). However, it can be assumed that there is no
motion in the scene between the captured views. Updates of the
visualization can come from multiple capture devices, and from
multiple instants in time, even from single SKU capture instances.
Image blending often includes exposure compensation to make the
visualization look appealing. As the visualization is updated,
sifting through images of a shelf in time is possible.
[0084] In some configurations, a process analogous to image stitching
includes the following: detecting key points and extracting invariant
descriptors in images; matching the descriptors between two images;
using RANSAC (as an example of a number of similar techniques) to
estimate a homography matrix using the matched image feature
locations in both images, wherein the two images are combined only if
there are many matched features (e.g., using a threshold); applying
a warping transformation using the homography matrix obtained; and
blending the images (e.g., resize, compensate exposure, find seam).
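The steps above could be sketched with a feature library such as
OpenCV. The function below is a non-authoritative illustration (ORB
key points, a brute-force matcher, and an assumed minimum-match
threshold), not the specific implementation of the embodiments:

    # Non-authoritative sketch of the stitching steps above, using OpenCV.
    import cv2
    import numpy as np

    def pairwise_homography(img_a, img_b, min_matches=30):
        orb = cv2.ORB_create()
        kp_a, des_a = orb.detectAndCompute(img_a, None)   # detect key points, extract descriptors
        kp_b, des_b = orb.detectAndCompute(img_b, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(des_a, des_b)             # match descriptors between the images
        if len(matches) < min_matches:                    # combine only if many matched features
            return None
        src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # RANSAC homography estimate
        return H  # apply with cv2.warpPerspective(img_a, H, ...), then blend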
[0085] In some configurations, the problem is formulated as a
joint-optimization problem (e.g., rather than a pair-wise image
matching problem). This can be a structure-from-motion (SfM) problem.
This can also be done based on other feature types, such as lines
(e.g., part of the shelf unit), not only points. In some embodiments,
the homography can be constrained to improve fault tolerance, e.g.,
by assuming a near fronto-parallel view. In some embodiments, this
problem can be addressed by formulating a global optimization problem. Considering
the relationship between multiple images while jointly trying to
optimize/derive rough estimates for the positions from which the
images were taken (derived visually and by correlating time stamps)
could reduce errors. In some embodiments, high-quality image
recordings can be supported by low-level image descriptors or even
a planar estimate of the performed motion taken at a much higher
frequency (e.g., estimated from the gyro or through the image;
related to tracking), making it possible to correlate images better and
optimize the "stitched" panorama image. In some embodiments,
sequences of images (or a video) are recorded in order to reduce
parallax between captured frames, and be able to render with
minimal visual artifacts. Sensors of the capturing mobile device
(IMU/acceleration sensors) can be used (and/or sensor fusion with
images based on cues) to improve the stability.
[0086] While capturing SKUs, the user can also capture close-up
images of SKUs, where the barcode is recognizable. These close-up
images can be integrated into the overview images, in order to be
able to provide a zoom in and/or show high-resolution images of the
product and/or barcode. The stitching algorithm takes into account
large parallax and scale change. Image blending is done carefully,
in order to minimize visual artifacts. In some embodiments,
close-up images are available for all SKUs, and the overview
image(s) are used to reference the close-ups with respect to each
other, and if needed, fill in content in between the close-ups. In
some embodiments, for minimal visual artifacts, the style of the
close-up images is matched to the overview image. This may include
adaptations of contrast, brightness, color saturation values, or
synthetic exposure correction (or similar techniques). In some
embodiments, blending is applied for a smooth transition between
the overview and the new close-up image, where content from both
images are superposed. In some embodiments, "seam carving"
algorithms are used for optimal seam location detection.
[0087] FIG. 11 is an embodiment of a shelf diagram 1104 with a
visual image 1108 of the shelf. Product information can be
integrated in the visualization. For example, visualization can be
searchable by product, and the searched product can be highlighted
within the image visualization; the visualization can include an
overlay with product information, such as price, stock level,
allergens, etc.; specific products can be highlighted in the images
(e.g., products that are on promotion, or should be changed to
promotion soon; or products that have the wrong price or the wrong
location in the planogram).
[0088] In FIG. 12, a flowchart of an embodiment of a process 1200
for creating a visual representation of objects on a display from
multiple images is shown. Process 1200 begins in step 1204 with
obtaining a plurality of images, wherein the plurality of images
comprise a first image and a second image. A code in the first
image is detected, step 1208. The code in the second image is
detected, step 1212. The code is decoded to receive an identity of
an object in the first image, step 1216. A set of features are
identified in the first image, step 1220. The set of features are
identified in the second image, step 1224. A position of the first
image is computed relative to a position of the second image, based
on comparing locations of the set of features in the first image to
locations of the set of features in the second image, step 1228.
The first image is blended with the second image based on computing
the first image relative to the second image to create a blended
image, step 1232. The blended image is stored with associated data
identifying the object, step 1236. In some embodiments, metadata of
the combined image comprises a link or a reference to a database
linking the combined image and the information identifying the
object.
[0089] In some embodiments, a system comprises: a mobile data
collector reading optical codes and taking images of the objects on
display, labels with optical codes collocated with the objects on
display, and/or one or more processors that receive the identity of
at least one object in one or more images, compute the positions of
the images relative to each other, and/or compute a panoramic image
representation from the one or more images.
[0090] Mapping Optical-Code Images to an Overview Image
[0091] In some configurations, mapping is a process to localize
products inside an overview image of a retail shelf. The input
includes an overview image, which can be a single image acquired by
a camera or composed of multiple images stitched into one overview
image, and a set of images of products. The product images each
contain a price label and/or a barcode that relates to the product.
The barcode is used to identify the product and/or information
about the product. For example, the barcode provides a SKU
(stock-keeping unit). The SKU is used to refer to a product that
has a unique SKU number (e.g., for a particular store or
retailer).
[0092] Mapping can be used to localize barcodes on a shelf within
an overview image. The barcodes are generally located close to the
products, that is, either above, below, or to a side of the
product. Therefore, by localizing barcodes within an overview
image, we can infer locations of products.
[0093] To localize barcodes, code images (e.g., from the set of
images of products) contain an image of a barcode as part of the
image. The problem is to register the code images inside the
overview image. One technique is to use cross-correlation (e.g., a
correlation filter that filters a larger overview image with a
smaller code image (e.g., much smaller, such as less than 1/4, 1/7,
or 1/10 the size of the overview image)). A maximum likelihood score
would indicate a matching location. In practice, it can be
challenging to use just cross-correlation because the likelihood
score can be very noisy and can be misleading at times. Reasons for
this are the perspective difference between two images (e.g., the
scene is a 3D structure and the product images are typically taken
from a different distance and angle with respect to the overview
image), similar products (similar packaging of different products,
or the same product lined up multiple times side by side), empty
spots on the shelf (e.g., they all look the same), etc.
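As a minimal sketch of the cross-correlation technique (assuming
grayscale inputs and OpenCV's normalized correlation; names are
illustrative only):

    # Minimal sketch: cross-correlate a (much smaller) code image with the
    # overview image to obtain a likelihood map; grayscale inputs assumed.
    import cv2

    def correlation_map(overview_gray, code_gray):
        response = cv2.matchTemplate(overview_gray, code_gray, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(response)
        return response, max_loc, max_val  # the peak score can be noisy/misleading, as noted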
[0094] To increase the likelihood of accurate mapping, one or more
optimizations can be used, including:
[0095] Detecting price labels (e.g., machine-learning based) on the
overview images, and (trying to) ensure the code locations are within
those detected price-label regions.
[0096] Instructing the user to scan in a predefined scanning order,
such as from top to bottom, one row at a time (left to right or
S-shape). If this scan order is seen in the mapped result, it gets a
higher score.
[0097] Using different complementary interest-point features or
texture features to verify initial high-likelihood matches, and
computing a second likelihood score.
[0098] Applying physical constraints, such as no two scan images can
match to the same location on the overview image.
[0099] Adjusting scale. Another unknown is the scale between overview
and scan images, which is approximately constant across all scan
images, so scale can be refined before mapping each scan image.
[0100] Optimizations can be translated into scores, such that
different output configurations of a correlation map can be
compared and the one with the highest score can be selected. In
some configurations, one or more optimizations can be disabled
(e.g., to save computing resources when matching is being performed
in certain environments), and the output can be used for
matching.
[0101] To identify product regions, a machine-learning-based
detector can be run for product facings, which detects instances of
products on the shelf. If the shelf is well stocked (e.g., no or
few empty spots), extracting product regions translates to
assigning product facings to price-labels (SKUs) and using an outer
bound. An assumption is that (at least) the facing above the price
label belongs to a SKU. By looking at visual similarities, other
product facings are grouped to the identified SKU, and the product
region can be determined. This likely holds true in cases where the
products are arranged above the price-label. This can be the case
for regular supermarket shelves, and for products such as cans,
cereals, drinks, etc. However, for other products it is not
necessarily true. For example, products can also be hanging below
price-labels, and this could be known from a user interface (UI) or
user experience (UX) input provided by a store employee when
scanning SKU codes.
[0102] The optimizations listed above are examples. The examples
are not meant to be limiting. Other optimizations can be used in
addition to, or in lieu of, optimizations listed above. For
example, measuring displacement of the scanning device during
scanning of the codes on the shelf could be used. This would help
to understand in which direction the device is moved, how far, and
therefore where approximately the next scanned code would be with
respect to the previous one. This information could be included in
the mapping process. Device displacement can be measured from an
IMU (inertial measurement unit) and/or using visual scene tracking.
However, IMU data can be noisy for certain makes and/or models
of mobile devices, and integrating it into an absolute displacement
may not be possible. Accordingly, directional data (e.g., linear
acceleration) can be acquired before, during, and/or after a code
image is taken. The directional data can be used to determine in
which direction the mobile device was moving (e.g., estimating the
arrival and departure directions with respect to the code image,
rather than tracking the mobile device between code images).
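One hedged way to turn such directional data into an arrival/departure
estimate is sketched below; the sampling window, dead-band value, and
function name are assumptions for illustration:

    # Hedged sketch: classify departure direction from horizontal linear-
    # acceleration samples recorded just after a code image is taken,
    # without integrating to an absolute displacement.
    import numpy as np

    def departure_direction(accel_x, dt):
        velocity = np.cumsum(np.asarray(accel_x)) * dt  # short-horizon velocity estimate
        mean_v = velocity.mean()
        if abs(mean_v) < 0.02:                          # illustrative dead-band (m/s)
            return "stationary"
        return "right" if mean_v > 0 else "left"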
[0103] In some configurations, a mapping pipeline is run on a
server (e.g., to use increased processing speed and/or resources of
a server). A video image feed is not acquired and sent to the
server, but only the code images and/or the overview image(s) are
saved and sent to the server (e.g., to conserve upload/download
bandwidth). In some configurations, the mapping pipeline is run on
the mobile device.
[0104] In some configurations, objects are mapped to a shelving
unit (e.g., using image analysis). For example, images of
individual optical codes are mapped to an overview image because
optical codes cannot be decoded (e.g., reliably) in the overview
image. In some embodiments, an overview image is received (e.g.,
acquired by a camera and/or sent to a processor). The overview
image contains a plurality of optical codes (e.g., barcodes on
shelves at a store). A plurality of code images are received (e.g.,
acquired by the camera and/or sent to the processor or another
processor). Each code image contains an optical code of the
plurality of optical codes. The plurality of optical codes are
decoded using the code images. The code images are mapped to the
overview image. In some embodiments, two or more methods of
prediction are used to map each optical code to the overview
image.
[0105] As an example, a store clerk uses a mobile device to take an
image of a shelf unit. The shelf unit comprises a plurality of
barcodes. The store clerk then moves the mobile device closer to a
shelf of the shelf unit and begins to scan each barcode of the
shelf unit. As the mobile device is brought closer to a barcode,
the barcode is recognized, an image (e.g., a "code image") of the
barcode (e.g., and surrounding features) is acquired by the camera,
the barcode is decoded using the code image, and a green icon is
overlaid on the display of the mobile device of a location of the
barcode (e.g., using matrix scanning to track the barcode in the
field of view of the mobile device to indicate the barcode is
decoded). In some embodiments, the code image comprises two or more
barcodes of the plurality of barcodes but does not contain all the
barcodes of the plurality of barcodes of the shelf unit. For
example, two, three, or four barcodes of the shelf unit could be
decoded in a single code image. As barcodes are decoded, the store
clerk moves the mobile device across the shelf unit to acquire more
code images and decode more barcodes. As a barcode is decoded,
and/or an image of the barcode acquired, an indication of
successful decoding is displayed on the mobile device (e.g., a
green box overlay), so that the store clerk knows to proceed to
acquire additional barcodes. In some embodiments, superfast
scanning, e.g., as described in U.S. patent application Ser. No.
17/186,909, filed on Feb. 26, 2021, which is incorporated by
reference for all purposes, is used to acquire the code images. As
each code image is taken, sensor data of the mobile device (e.g.,
IMU data) can be recorded. In some configurations, a video feature
of the mobile device is used to acquire code images, but only
images having a decoded barcode are saved (e.g., to save memory
and/or computational resources).
[0106] FIG. 13 depicts an embodiment of a first barcode image 1300
having a first barcode 1304. The first barcode 1304 is part of a
first label 1308. The first label 1308 further comprises a first
description 1312 and a first price 1316. The first price 1316 is
larger than the first barcode 1304.
[0107] FIG. 14 depicts an embodiment of a second barcode image 1400
having a second barcode 1404. The second barcode 1404 is part of a
second label 1408. The second label 1408 further comprises a second
description 1412 and a second price 1416. The second price 1416 is
larger than the second barcode 1404.
[0108] FIG. 15 depicts an embodiment of an overview image 1500 to
which the first barcode 1304 and the second barcode 1404 are matched.
The overview image 1500 comprises a plurality of barcodes 1504.
However, one, some, or all of the plurality of barcodes 1504 cannot
be decoded in the overview image 1500 (e.g., insufficient
resolution, glare, etc.). If a barcode can be decoded in the
overview image, then it is an easy match.
[0109] In some configurations, a system for mapping optical-code
images (e.g., first barcode image 1300 in FIG. 13 and second
barcode image 1400 in FIG. 14) to an overview image (e.g., overview
image 1500) comprises an image sensor and one or more processors.
The image sensor can be part of a mobile device. The one or more
processors can be part of the mobile device and/or remote
processors (e.g., in the Cloud). The image sensor is configured to
acquire a first image (e.g., the first barcode image 1300 from FIG.
13), a second image (e.g., the second barcode image 1400 from FIG.
14), and a third image (e.g., the overview image 1500 of FIG.
15).
[0110] The one or more processors are configured to receive the first
image, wherein the first image includes a first optical code (e.g.,
first barcode 1304 in FIG. 13) but not a second optical code (e.g.,
second barcode 1404 in FIG. 14); decode the first optical code
using the first image; receive a second image, wherein the second
image is acquired after the first image and includes the second
optical code but not the first optical code; decode the second
optical code using the second image; receive the third image,
wherein the third image includes both the first optical code and
the second optical code, without decoding the first optical code or
the second optical code using the third image; generate a first map
of the probability of a location of the first optical code in the
third image; generate a second map of the probability of a location
of the second optical code in the third image; correlate the first
optical code with a first location in the third image, based on the
first map; and correlate the second optical code with a second
location in the third image, based on the second map. For example,
the first barcode 1304 from FIG. 13 is correlated to a location of
a first barcode 1504-1 in FIG. 15; and the second barcode 1404 from
FIG. 14 is correlated to a location of a second barcode 1504-2 in
FIG. 15.
[0111] Optical codes from code images are correlated to locations
in an overview image using maps of probability. FIG. 16 is an
embodiment of a first map 1600 of a probability of the first
barcode image 1300 from FIG. 13 in the overview image 1500 in FIG.
15. The first map 1600 is based on a correlation filter (e.g., a
convolution) of one or more features (e.g., interest points, key
points, color histograms, barcodes, text, numbers, horizontal and/or
vertical lines of shelves, etc.) in the first barcode image 1300
from FIG. 13 with the overview image 1500 from FIG. 15. For
example, a convolution of features of the first barcode image 1300
from FIG. 13 (e.g., extracted features from part or all of the
first barcode image 1300, such as extracted features from the first
barcode 1304) with the overview image 1500 of FIG. 15 generates the
first map 1600. Techniques described in conjunction with FIGS. 3
and 4 can be used to generate the first map 1600 (e.g., a response
map). Lighter areas in the first map 1600 indicate a higher
likelihood where the first barcode 1304 from FIG. 13 is in the
overview image 1500 from FIG. 15.
[0112] FIG. 17 depicts an embodiment of a second map 1700 of a
probability of the second barcode image 1400 from FIG. 14 in the
overview image 1500 from FIG. 15. The second map 1700 is based on a
correlation filter (e.g., a convolution) of one or more features in
the second barcode image 1400 from FIG. 14 with the overview image
1500 from FIG. 15. For example, a convolution of features of the
second barcode image 1400 from FIG. 14 (e.g., extracted features
from part or all of the second barcode image 1400, such as
extracted features from the second barcode 1404) with the overview
image 1500 of FIG. 15 generates the second map 1700. Lighter areas
in the second map 1700 indicate a higher likelihood where the second
barcode 1404 from FIG. 14 is in the overview image 1500 from FIG.
15.
[0113] Though FIGS. 16 and 17 depict examples of maps of
probability, additional information can supplement the first map
1600 and/or the second map 1700. Different types of maps (of
probability) can be used, instead of, or in addition to, a response
map. For example, FIG. 18 depicts an embodiment of identifying
label positions 1804 in the overview image 1500. Even though some
or all optical codes on labels (e.g., barcodes, text, numbers, and
symbols) might not be able to be decoded from the overview image,
algorithms can detect the presence of certain optical codes and/or
labels. For example, even though a barcode can't be decoded in the
overview image, the label and/or barcode could be detected in the
overview image. In some configurations, a label (e.g., price label)
is used as an anchor when assigning probabilities (e.g., assigned a
much higher probability).
[0114] In FIG. 18, six label positions 1804 are detected (e.g.,
using a machine learning algorithm to detect labels). The label
positions 1804 provide another map of probability of barcodes being
at a location in the overview image 1500 (e.g., there is a higher
percentage probability that an optical code will be at a label
position 1804). For example, the first map 1600 from FIG. 16 could
be combined with the map of probability of label positions 1804
from FIG. 18 to revise the first map 1600 in FIG. 16 (e.g.,
likelihood of locations not corresponding to label positions 1804
are decreased; in FIGS. 16 and 17 there are two probability dots
that do not correspond to a label position and would be
removed).
[0115] FIG. 19 depicts an embodiment of correlating optical codes
to an overview image 1500. In FIG. 19, a first optical code (e.g.,
the first barcode 1304 in FIG. 13) is correlated with a first
location 1904-1 in the overview image 1500, based on a first map of
probability (e.g., the first map 1600 from FIG. 16 and/or using
other probability methods). A second optical code (e.g., the second
barcode 1404 in FIG. 14) is correlated with a second location 1904-2
in the overview image 1500, based on a second map of probability
(e.g., the second map 1700 from FIG. 17 and/or using other
probability methods).
[0116] Combining detected label positions 1804 with a response map
(e.g., the first map 1600 from FIG. 16) is an example of using two
probability methods to locate an optical code in an overview image.
Additional probability methods can be used. Using two or more
probability methods can increase the probability of identifying the
correct location of a product. For example, a label position 1804
in FIG. 18 might not be detected because of glare on the label.
Using multiple probability methods can provide redundancy. In a
simple example, if there are two barcodes, and only two label
positions are detected in an overview image, then locations for
barcodes can be constrained to the label positions. However, if
there are more optical codes than detected positions, then label
positions are not as important.
[0117] Probability methods to generate a map of probability can
include one or more of:
[0118] A correlation filter (e.g., as described in conjunction with
FIGS. 3, 4, 16, and 17)
[0119] Detected label positions (e.g., as described in conjunction
with FIGS. 18 and 19)
[0120] Estimated device position
[0121] A known scan order
[0122] Estimated device scanning direction
[0123] Identification of characters
[0124] Product recognition
[0125] Combining probability maps for multiple optical codes
[0126] Matching texture features
[0127] Matching feature points
[0128] In some embodiments, probability methods are run in
parallel, and then probabilities are fused (e.g., in a weighted
sum). If a probability exceeds a threshold (e.g., 65%, 75%, 80%,
90%, or 95%), then a match is made. In some embodiments, a number
of local maxima are selected, and probabilities within an area
(e.g., a size of a label and/or up to a width of spacing between
labels) around each local maximum are suppressed (e.g., lower
probabilities are discarded). A purpose for matching code images to
an overview image can be to understand where scan images, and thus
products, are in the overview image.
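A minimal sketch of this fusion, assuming the probability maps are
same-sized arrays and using illustrative weights, threshold, and
suppression radius:

    # Minimal sketch: fuse probability maps in a weighted sum, keep local
    # maxima above a threshold, and suppress lower probabilities around
    # each maximum. Weights, threshold, and radius are assumptions.
    import numpy as np

    def fuse_and_pick(maps, weights, threshold=0.75, radius=20):
        fused = sum(w * m for w, m in zip(weights, maps)) / sum(weights)
        work = fused.copy()
        picks = []
        while True:
            y, x = np.unravel_index(np.argmax(work), work.shape)
            if work[y, x] < threshold:
                break
            picks.append((x, y, fused[y, x]))
            work[max(0, y - radius):y + radius + 1, max(0, x - radius):x + radius + 1] = 0.0
        return fused, picks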
[0129] Estimated device position can be based on IMU (Inertial
Measurement Unit) data, such as gyroscope and/or accelerometer
data.
[0130] A known scan order can be used to generate data for a map of
probability. In some configurations, the user is instructed to scan
in a certain order, for example top to bottom. In some
configurations, one of two (e.g., orthogonal) directions is
constrained. For example, in some embodiments, a user must scan
optical codes from top to bottom of a shelving unit. If the user is
moving from left to right and misses decoding an optical code, the
user can go back and scan the missed optical code. By constraining
one direction, matching barcodes to locations in an overview image
can become more accurate. By constraining movement in two
directions, the method becomes even more accurate, but less user
friendly. Accordingly, some configurations constrain movement in
one direction but not in a second (e.g., orthogonal) direction. In
some embodiments, orientation of the mobile device is also used.
For example, it can be assumed that each image is acquired while
being held in the same portrait or landscape orientation with a
common "up" direction.
[0131] Estimated device translation can be used to generate data
for a map of probability. For example, a direction of the mobile
device is recorded before, during, and/or after a code image is
acquired. Since code images are acquired in sequential order, the
direction of the mobile device can be used to infer locations or
scanning order of codes in the overview image.
[0132] Identification of characters can be used to generate data
for a map of probability. For example, prices can be larger than
barcodes (e.g., first price 1316 is larger than the first barcode
1304 on first label 1308 in FIG. 13). The price, or other
characters, can be decoded (e.g., by optical character recognition
(OCR)) and matched.
[0133] Product recognition can be used to generate data for a map
of probability. For example, product recognition can be used to
identify a product on the shelf to match with a barcode. Product
recognition does not necessarily imply complete recognition. For
example, a product could be identified as milk (but unknown as to
brand and/or percentage of milkfat, such as 1%, 2%, whole) and
another product could be identified as yogurt (but not necessarily
the flavor or brand); if a first barcode related to yogurt and a
second barcode related to milk are decoded, then the second barcode
would have a higher probability being mapped to the product
identified as milk and the first barcode would have a higher
probability being mapped to the product identified as yogurt.
Certain products have more distinct shapes (e.g., milk, shampoo,
eggs, etc.), which can be used in matching product to barcodes
(e.g., by assigning probabilities).
[0134] Combining probability maps for multiple optical codes can be
used to generate data for a map of probability. A joint probability
of multiple maps can be generated to optimize an overall
probability of codes. In a simple example, if a first barcode has a
50/50 chance to be in one of two locations, and a second barcode
has an 80% chance to be in the second location and a 20% chance to
be in the first location, then the second barcode will be mapped to
the second location and the first barcode mapped to the first
location.
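This joint assignment can be sketched with the Hungarian algorithm
(scipy.optimize.linear_sum_assignment); the probability matrix below
encodes the simple example above and is illustrative only:

    # Illustrative sketch: maximize the overall probability of codes over
    # locations with the Hungarian algorithm.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    P = np.array([[0.5, 0.5],    # first barcode: 50/50 between two locations
                  [0.2, 0.8]])   # second barcode: 20% / 80%
    rows, cols = linear_sum_assignment(-P)  # negate to maximize total probability
    # Result: the first barcode maps to the first location, the second to the second.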
[0135] Matching texture features can be used to generate data for a
map of probability. Even though a product might not be able to be
recognized, certain features could be. For example, the probability
that the first barcode 1304 in FIG. 13 would be at the second
location 1904-2 in FIG. 19 could be low based on textures of the
product above the second location 1904-2 being much different than
the product above the first location 1904-1.
[0136] Matching feature points can be used to generate data for a
map of probability. Feature points such as ORB, SIFT, SURF, etc.
can be extracted from the overview and product images, and matched
between each product image and the overview to produce another
probability map for likely locations of the product image.
[0137] In matching products to locations on a shelf, it can be
useful to scale the code images. In some embodiments, a first image
is scaled to the overview image, and then other code images are
scaled similarly (e.g., assuming the mobile device is about the
same distance away while decoding codes). For example, an
exhaustive search could be made by scaling from 10% to 100% in 3%
increments. In some configurations, it is assumed that there is
little rotation between the code images and the overview image
(e.g., the user holds the mobile device roughly in a plane parallel
with the fronts of the shelves). In some embodiments, there is less
than 25, 20, 10, or 5 degrees of tilt between the overview image and
the code images.
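The exhaustive scale search mentioned above could be sketched as
follows (OpenCV template matching at each scale; the 10% to 100% range
and 3% step come from the example above, everything else is an
assumption):

    # Sketch: resize the code image from 10% to 100% in 3% steps and keep
    # the best correlation peak across scales.
    import cv2

    def best_scale(overview_gray, code_gray):
        best = (None, -1.0, None)  # (scale, score, location)
        for pct in range(10, 101, 3):
            s = pct / 100.0
            resized = cv2.resize(code_gray, None, fx=s, fy=s)
            if (resized.shape[0] > overview_gray.shape[0]
                    or resized.shape[1] > overview_gray.shape[1]):
                continue
            response = cv2.matchTemplate(overview_gray, resized, cv2.TM_CCOEFF_NORMED)
            _, max_val, _, max_loc = cv2.minMaxLoc(response)
            if max_val > best[1]:
                best = (s, max_val, max_loc)
        return best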
[0138] FIG. 20 illustrates a flowchart of an embodiment of a
process 2000 for mapping optical-code images to an overview image.
The process 2000 begins in step 2004 with decoding a first optical
code in a first image. For example, the first image (e.g., first
barcode image 1300 of FIG. 13), a second image (e.g., second
barcode image 1400 of FIG. 14), and a third image (e.g., overview
image 1500 of FIG. 15) are received. The first image includes a
first optical code (e.g., the first barcode 1304 in FIG. 13) but
not a second optical code (e.g., the second barcode 1404 in FIG.
14). The second image includes the second optical code but not the
first optical code. The third image includes both the first optical
code and the second optical code, but the first optical code and
the second optical code are not decoded using the third image. In
step 2008, the second optical code is decoded using the second
image.
[0139] In step 2012, a first map of probability is generated. The
first map is of a location probability of the first optical code in
the third image. For example, the first map 1600 in FIG. 16 is
generated (e.g., using one or more probability methods). In step
2016, a second map of probability is generated. The second map is
of a location probability of the second optical code in the third
image. For example, the second map 1700 in FIG. 17 is generated
(e.g., using one or more probability methods).
[0140] In step 2020, the first optical code is correlated with a
first location in the third image, based on the first map. For
example, the first barcode 1304 in FIG. 13 is correlated (e.g.,
mapped) to the first location 1904-1 in FIG. 19. In step 2024, the
second optical code is correlated with a second location in the
third image, based on the second map. For example, the second
barcode 1404 in FIG. 14 is correlated to the second location 1904-2
in FIG. 19.
[0141] Multi-Tile Image Capture and Stitching for a Composite
Overview Image
[0142] In certain environments, it may be challenging to acquire an
overview image with just one camera frame. For example, in many
retail environments, space is constrained. A person taking a
picture of a store shelf might not be able to back up enough
(because there is another shelf on an opposite side of an aisle) to
take one picture as an overview image of an entire area of
interest. Retail stores, such as supermarkets, are organized into
modules. It is sometimes desired to have an overview image for each
module so that products can be mapped within the module (e.g., by
mapping optical-code images to the overview image).
[0143] One option to acquire an overview image in a constrained
space is to use a panoramic feature/mode of a camera. However, the
panoramic mode of many mobile devices does not work well for taking
an overview image in a constrained space. For example, the
panoramic mode is meant for distant images. When features are
close, the panoramic mode has a difficult time stitching the images
together. Additionally, many panoramic modes do not allow the user
to stitch together images in two dimensions. In some embodiments,
overlapping code images are not acquired because only images that
have a decoded barcode are kept (e.g., to save processing/data).
Though there are some applications that can use a video feature of a
camera to reproduce a "3D" image of a scene, they do so using
hundreds or thousands of images, which is resource
intensive. In some embodiments, a discrete number of images (e.g.,
frames; equal to or less than 4, 5, 10, or 20 frames) are acquired
and then combined into an overview image to save computing
resources (e.g., by overlapping 2 or more corner or edge images on
a central image).
[0144] FIG. 21 depicts an embodiment of tile positions for a
composite overview image 2100. In the embodiment shown, there are
five tiles, a first tile 2104-1, a second tile 2104-2, a third tile
2104-3, a fourth tile 2104-4, and a fifth tile 2104-5. The tiles
2104 are stitched together to form the composite overview image
2100. Though five tiles 2104 are used in this example, other
numbers of tiles 2104 could be used for various applications.
[0145] In some configurations, a user interface guides the user to
take images in a predefined manner. For example, the user is
instructed to acquire five images of a scene: a first image of the
first tile 2104-1, a second image of the second tile 2104-2, a third
image of the third tile 2104-3, a fourth image of the fourth tile
2104-4, and a fifth image of the fifth tile 2104-5, in that order.
For example,
graphics presented on a display of the mobile device can show a
user what image to take next, or the user can be instructed to take
a series of images. For example, "Take a center picture; take an
upper-left picture, take an upper-right picture, take a lower-right
picture, take a lower-left picture," in relation to the center
image. Thus the user interface can include an instruction for a
lateral movement and an instruction for a vertical movement.
[0146] The first image (e.g., of the first tile 2104-1) is used as
an anchor for other images and is sometimes referred to as a center
image. Images of the tiles 2104 that are not the center image are
sometimes referred to as corner images or edge images. The center
image is captured without vertical distortion and/or horizontal
distortion (e.g., an IMU sensor can be used to verify this during
capture). By using the center image as the anchor, matching
visual features to an overlap region can be constrained (e.g.,
reduced). In some configurations, the user is instructed to only
rotate the phone (e.g., and not to translate the phone) while
taking images of tiles 2104. Motion sensors (e.g., an IMU) can be
used to verify the user is only rotating the phone while acquiring
images of tiles 2104. Accordingly, several images of a scene are
acquired (e.g., images of tiles 2104). Data about relative
positions of those images is received (e.g., a predetermined
order). A plurality of homography candidates between a corner or
edge image and the center image are calculated (e.g., by using
feature point matching at different scales between the images).
Some homography candidates are rejected early based on the known
image capture constraints (e.g., capture order of corner images)
presented in the user interface. A primary homography candidate is
selected (e.g., validated to match the center image via pixel-level
correlation after reprojection). The primary homography candidate
position is then refined in relation to the center image.
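Validation of a homography candidate by pixel-level correlation after
reprojection could look roughly like the sketch below (grayscale
images assumed; the acceptance threshold is illustrative):

    # Hedged sketch: validate a homography candidate by reprojecting the
    # corner/edge image into the center image's frame and checking
    # pixel-level correlation over the overlap region.
    import cv2
    import numpy as np

    def validate_candidate(corner_gray, center_gray, H, min_corr=0.6):
        h, w = center_gray.shape[:2]
        warped = cv2.warpPerspective(corner_gray, H, (w, h))
        overlap = warped > 0                      # rough mask of reprojected pixels
        if not overlap.any():
            return False
        corr = np.corrcoef(warped[overlap].astype(np.float32),
                           center_gray[overlap].astype(np.float32))[0, 1]
        return corr >= min_corr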
[0147] Matching a corner or edge image with the center image can be
performed individually or by global correlation of all corner
and/or edge images with the center image. The images may be
downscaled by a common scale factor before matching. This may
improve both the speed (e.g., the full images can be very high
resolution) as well as robustness (e.g., to image noise or glare)
of the matching. In some embodiments, corner and/or edge images can
be a different size (e.g., smaller) than the center image.
Accordingly, the corner and/or edge images can be scaled before
matching to the center image. Visual similarities can be used to
match the second image to the first image (e.g., rough alignment).
Images can also be rectified.
[0148] After a rough alignment, fine-tuning alignment can be
performed (e.g., to further align a corner or edge image with the
center image). Fine-tuning can be performed by:
[0149] pixel-level shifting (e.g., shifting by a few pixels in
various directions to find a better match);
[0150] extracting visual features;
[0151] detecting edges in the images (e.g., shelves); and/or
[0152] detecting price labels in the images (e.g., and moving
parallax away from the price labels, such as to the product).
[0153] Optical flow between images may be detected and corrected
for. In some embodiments, a seam between images is detected and
moved to not pass through labels (e.g., price labels, barcodes,
etc.). For example, higher costs for stitching through a label can
be assigned to a stitching algorithm.
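A minimal sketch of assigning higher stitching costs to labels is
shown below: detected label boxes are added to the energy map that a
seam-finding algorithm minimizes (the cost value and box format are
assumptions):

    # Minimal sketch: add a high cost over detected label boxes to the
    # energy map a seam-finding algorithm minimizes, so seams avoid
    # price labels and barcodes.
    import numpy as np

    def seam_energy(gradient_energy, label_boxes, label_cost=1e6):
        energy = np.asarray(gradient_energy, dtype=np.float64).copy()
        for (x0, y0, x1, y1) in label_boxes:
            energy[y0:y1, x0:x1] += label_cost  # stitching through a label becomes expensive
        return energy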
[0154] The center tile (e.g., the first tile 2104-1) has a first
side 2108-1 opposite (e.g., laterally opposite) a second side
2108-2 and a third side 2108-3 opposite (e.g., vertically opposite)
a fourth side 2108-4. Though corner tiles 2104-2, 2104-3, 2104-4,
and 2104-5 are shown overlapping each other, in some embodiments,
corner tiles and/or edge tiles do not overlap each other.
[0155] FIG. 22 illustrates a flowchart of an embodiment of a
process 2200 for generating a composite image (e.g., a composite
overview image). Process 2200 begins in step 2204 with receiving a
plurality of images of a scene, wherein the plurality of images
includes a first image (e.g., of the first tile 2104-1 in FIG. 21),
a second image (e.g., of the second tile 2104-2), and a third image
(e.g., of the third tile 2104-3). The second image and the third
image are acquired by an image sensor in sequential order, such that
the third image is acquired by the image sensor after the second
image is acquired. The first image is acquired by the image sensor
before the second image is acquired, or the first image is acquired
by the image sensor after the third image is acquired.
[0156] In step 2208, data about relative positions of the plurality
of images is received. For example, the second image goes to the
upper left and the third image goes to the upper right of the first
image.
[0157] In step 2212, the second image is arranged to overlap a
first side of the first image, based on the data about relative
positions of the plurality of images. For example, an image of the
second tile 2104-2 is arranged to overlap the first side 2108-1
and/or the third side 2108-3 of an image of the first tile 2104-1
in FIG. 21.
[0158] In step 2216, the third image is arranged to overlap a
second side of the first image, based on the data about relative
positions of the plurality of images. For example, an image of the
third tile 2104-3 is arranged to overlap the second side 2108-2; or
an image of the fifth tile 2104-5 is arranged to overlap the fourth
side 2108-4 of an image of the first tile 2104-1 in FIG. 21.
[0159] Simultaneous use of Barcode Decoding, OCR, and/or Visual
Shelf
[0160] In some embodiments, a combined tool can allow a store
associate to do several tasks concurrently. For example, a barcode
can be scanned and a price label decoded using optical character
recognition (OCR); a price associated with the product ascertained
from scanning the barcode (e.g., by referencing a database), and
the price associated with the product compared to the price label
to verify a match. While price verification is being performed,
images can be used to look for an out-of-stock situation. Thus
operational tasks can be done in one walkthrough by an associate or
robot.
[0161] Some decoding tasks can be computationally intense, such as
using OCR. Decoding a code once, and then tracking the code without
decoding it again, can save computational resources.
[0162] A. Example Users in a Retail Environment
[0163] i. Susan--Retail Associate
[0164] Susan is a store associate about 30 years old. The store is
a mid-sized grocery chain. Susan has no college degree. Although
Susan knows her way around mobile devices, she is not considered
tech-savvy. Susan has changed her retail employer three times
in the last two years. She works different shifts depending on the
week. As part of her job, her boss asks her regularly to walk
through the store and perform certain data collection tasks, such
as verifying price labels, recording shelf gaps, or verifying
planogram compliance. Susan has access to a Zebra TC52 device.
Susan's biggest concern is that she performs the store walk-throughs
as quickly and as accurately as possible.
[0165] Susan desires to run a session on her phone to quickly scan
1000s of SKUs to collect and/or verify price label information
(PLI). This is a daily task for her, and speed and accuracy are
important. As she scans SKUs for price label verification, she
would like to also be creating a digital map of the SKUs on the
shelf. It is preferable for Susan to run a session on her phone to
collect data and have the data uploaded to a server in the
Cloud.
[0166] In this example, Susan can scan a SKU, look at an individual
date code, and enter the code that is to expire the soonest into
the app. In other configurations, the individual date code is
decoded using an image of the individual date code. In another
variation, Susan enters every date code found into the app and also
specifies a quantity of product associated with each date code.
Susan can also identify a gap on a shelf while working on a
different task. She can walk up to the gap and scan the SKU
associated with the gap. In a user interface, she can optionally
specify how many products are left on the shelf.
[0167] ii. Paul--Store Manager
[0168] Paul is about 45 years old, married, and has a couple of kids.
He typically works from 8 am to 6 pm as a store manager in a large
grocery chain. Paul has some college and is more of a desktop user
than a mobile-device user. He spends most of his time at his desk
in the management office behind his large computer screen. Paul has
been working for the chain for almost two decades. He likes to do
things in the way they have always been done. Paul maintains the
schedules for routine store walk throughs and directly supervises
store associates. Paul uses his iPhone 8 at work.
[0169] With a web-based application, Paul can: create a task list
listing compliance and operational issues, e.g., labels to be
reprinted and replaced; gaps to be filled, etc.; confirm on the
portal that an issue has been resolved; review reports relative to
operational tasks/data in the store, time frame, store associate
performing a task, type of event (PL non-compliance, gap scan,
etc.); check out summary statistics regarding the above and show
company trends over specified time frames/timeline; and/or zoom
into ShelfView and review most recent visual data of the shelf.
[0170] iii. Camilla--Nationwide Store Operations
[0171] Camilla is about 50 years old, married, and has 2 grown
kids. She typically works from 9 am to 7 pm and is responsible for
store operations for 200 stores. She has an MBA degree. Camilla
fluently moves between her mobile, iPad, and Windows laptop. She is
used to monitoring operations from anywhere and at any time of day.
Camilla has been in her current role for only two months, but has
already run into resistance from the IT department when she tried
to push new software tools. Her biggest problem is that she has no
real-time visibility into the tasks being performed in stores.
Therefore, she does not know if the company is compliant with
relevant regulations.
[0172] With a web-based application, Camilla can review reports
relative to operational tasks/data in the store; filter reports to
a subset of stores, time frame, store associate performing a task,
type of event (PL non-compliance, gap scan, etc.); check out
summary statistics regarding the above and show company trends over
specified time frames/timeline; and/or zoom into ShelfView and
review recent visual data of a shelf.
[0173] B. Example System Components
[0174] The web-based application can be configured for
multi-tenancy, where multiple retailers are hosted on the same
system; multiple user roles with different levels of access, such
as store associate, store management, corporate management, etc.;
multiple data elements that can include one or more of the
following fields: time and date, session identifier (user, date,
store), a decoded optical code (Universal Product Code (UPC),
International Article Number (EAN)), shelf/location identifier
placeholder, high-resolution, raw image of detected barcodes, or
other areas to be decoded (OCR), low-resolution image of full field
of view, and/or other fields (e.g., as described by the
applications below); individual data fields can be eligible for
individual deletion schedules (storage needs and privacy concerns);
customer can have the ability to schedule data deletions on a
multitude of schedules: a) daily, b) after 48 hours, c) weekly,
etc.; and/or deletion can be scheduled for dates and for specified
time periods after data collection.
[0175] The web-based application can be configured to provide data
reports, including data downloads filtered in certain formats such
as CSV, XLS, or PDF; filtered data reports can be available via a
web API.
[0176] C. Example Applications and Workflows
[0177] An application can have a single SKU mode. The single SKU
mode can be configured for walk-up gap scan/PLV SKU Scan. In this
mode, the store associate walks up to a gap on the shelf and scans
the SKU code. The user interface guides the operator in such a way
that the display location of the product is within a field of view
of the camera. The system uploads a high-resolution image of the
label and the JPG images of the surrounding area. The user has an
option to enter the following data in a manual interface: number of
inventory left, earliest date code, and/or general text entry (free
comment). On a back end, data is processed as follows: PLV
performed and added to PL event list if incorrect; added to gap
event list; log other information (inventory level, date code) if
available; and/or visual information of the shelf location is
included in a shelf panorama image.
[0178] The single SKU mode can be configured for "Walk-up data
entry" SKU Scan. In this mode, the store associate can walk up to a
price label or SKU and scan the code. The user interface can guide
the store associate in such a way that the display location of the
product is within a field of view of the camera. The system uploads
a high res image of the label and JPG images of the surrounding
area. The user has an option to enter the following data in a
manual interface: out-of-stock, number of inventory-left (including
out-of-stock and "low inventory"), earliest date code, and/or
general text entry (free comment). On a back end, the data is
processed as follows: PLV performed and added to PL event list if
incorrect; added to gap event list, if inventory is low/zero;
and/or log other information (inventory level, date code) if
available.
[0179] The single SKU mode can be configured for a batch mode, such
as a systematic shelf scan. In this mode, a store associate
systematically scans all the products on a designated shelf, moving
from left to right, top to bottom. The user interface can be
configured to guide the store associate in such a way that the
display location of the product is within the field of view of the
camera. The system uploads a high-resolution image of the label and
JPG images of the surrounding area for each scan. The user (e.g.,
the store associate) has the option to enter the following data in
a manual interface: number of inventory-left including
"out-of-stock", earliest date code, and/or general text entry (free
comment). Furthermore, the user has the option to do the following:
scan or enter the shelf/module identifier (upper left corner);
erase scans to correct for mistakes; and/or restart data collection
for the same shelf/module by starting a scan from an arbitrary SKU
(data following that SKU will be overwritten). Although
left-to-right scanning is given as an example, other scan patterns
(e.g., right to left; down, then up, then right; etc.) can be used.
In some embodiments, a scanning pattern is determined by orientation
of the device. For example, a phone held in portrait orientation
could be used to scan down so that more SKUs are likely in a field
of view of the camera, whereas a phone held in landscape orientation
could scan right to left so that more than one SKU is likely in the
field of view. On a back end, data is processed as
follows: PLV performed and added to PL event list if incorrect;
added to gap event list if inventory is low or out-of-stock; or
other information (inventory level, date code) if available; and/or
visual information of the shelf location is used to rebuild the
visual state/shelf panorama image.
[0180] D. Example Performance Metrics
[0181] In some configurations, an application has the following
performance metrics and/or tolerances. Speed: a reasonably skilled
operator spends no more than 500 ms to complete scanning a
single code. SKU mapping: distances between SKUs are within +/-10%
of the true distance for 99% of measured SKUs. SKU mapping and
visualization: there are no obvious visual artifacts from the
stitching process.
[0182] E. Example Image Upload and Processing
[0183] In some configurations, high-resolution raw images are
uploaded of detected codes (e.g., barcodes, price labels, etc.)
and/or JPEG quality images are uploaded of the entire image frame
showing the label, display area with inventory, and/or adjacent
labels. A barcode view finder is placed in such a way as to
reasonably assure that the inventory location is in a field of view
of the camera.
[0184] For on-device PLV, a single SKU Data Capture PLV can be
performed directly on the device and the result shown to the user
in an augmented-reality (AR) overlay. In some embodiments, it does
not take more than 500 ms for the result to be presented to the
user.
[0185] For some configurations of backend-PLV processing, there are
no more than 10% false positives (10% of detected mistakes).
Incorrect price labels from a session/day can be accessible along
with the image data for quality review by a human operator. The
data can be presented to an online operator in such a way that the
operator can check from image to image quickly and flag/correct
labels that have been mis-decoded. After quality control, data can
be added to a PLV task list.
[0186] For some configurations of SKU mapping and visualization, as
the user is scanning one SKU after the other, a map is being built
by preserving an order of SKUs. Data collection can include
distances between SKUs while building the map. The map can include
vertical distances between shelves.
[0187] In some configurations of using manual status entry, a user
is able to count the inventory level and enter the information
through the user interface (e.g., during batch scan or individual
SKU scan). An interface on a mobile device can be configurable by
an enterprise user (e.g., to show only data entry options to be
performed by the store associate). For example, an inventory
interface can have a number entry field for the store associate
to enter a number of items. An out-of-stock interface can include
an out-of-stock button, a low stock button, a replenish button,
and/or a specific replenish/reorder (number entry field). A
date-codes interface can include oldest date code (date entry
field), multiple date codes, and/or inventory number (e.g., list of
numbers and/or dates).
[0188] F. Example Portal Application(s)
[0189] In some configurations of SKU, Task, Event, and/or date code
reports, a user can request a report on a specific SKU showing the
following: all scan events within a specific time range, including
scans that did not result in a violation; shelf images within a
specific time range; and/or violations within a specific time
range. The user can request a report on SKU(s) and/or an event(s)
filtered by one or more of the following criteria: tasks collected
in a certain calendar range; tasks collected in a specific data
collection session; tasks collected by a specific user; tasks
within a certain product category; tasks of a specific kind (e.g.,
date codes, out-of-stock, price label correction); resolved
violations confirmed by user input (e.g., date-code violation, out
of stock, price label correction); and/or resolved violations
confirmed by shelf scan (e.g., date-code violation, out of stock,
price label correction). A user can request a report on SKUs with
date codes (e.g., a date code report) filtered by one or more of
the following criteria: date codes in a certain calendar range;
date codes within a certain product category; date codes collected
in a certain time window; date-code violations (date codes have
passed); and/or resolved date code violations.
[0190] In some configurations of shelf visualization, a backend
system can stitch together a representation of the shelf (Shelf
Panorama) from individual SKU images (batch mode/SKU Mapping). The
visualization can be "clickable." For example, a user can click on
a section of the image or a SKU, and a metadata pop up includes the
date and time of when the image was taken, any user input from the
time the image was taken, and/or status information. In some
embodiments, the visualization is "zoomable," where a user can zoom
into (e.g., enlarge a view of) a specific shelf location or SKU.
The visualization can be searchable, where a user can type in a SKU
number or product description and "fly" to the location/imagery of
that product. When new scans/images of SKUs become available, an
old image in the panorama is replaced with the new
scans/images.
[0191] In some configurations, a dashboard or portal can include
summary statistics and/or a timeline, which can include: average
time between scans (averaged over all SKUs); PLV violations by
day/week; out-of-stock events by day/week; and/or total scans by
day/week.
[0192] In some configurations of a backend, a database can store
data from a mobile device (e.g., data listed above, including
historical data); be future-proof for new structured data (e.g.,
quantity in stock); and/or allow querying of the database by SKU,
shelf, aisle, etc. In some embodiments, the backend can be used to
access a product database, for example public databases, web
databases, a retailer's database, or a manufacturer's database.
Metadata can be added to every shelf, starting with shelf category
(e.g., beverages, cereals, etc.).
[0193] Filtering/querying can also be allowed by parameter (e.g.,
shelf ID, store ID, user ID, etc.). In some embodiments, a backend
process can be used to determine an out-of-stock situation from the
collected images. For security and authentication, an application
(e.g., API and/or database) can be multi-tenant capable (3-10 pilot
customers), where data uploaded by one customer is
accessible/query-able only by the one customer. A unique
configuration ID can be used to differentiate uploads. Also, in
some embodiments, no business sensitive information (e.g., prices,
item positions, etc.) is stored in cloud databases (e.g., to assure
maximum privacy and data security).
[0194] Some configurations of an API or other interfaces include an
internal API for sending data to and from a mobile device and/or
the Cloud. For example, in-store data collected by the mobile
device can be uploaded to the cloud; and/or a product database can
be downloaded from the Cloud by the mobile device, with the
possibility to select only certain columns (e.g., price and SKU
columns). An external API for customers to query a backend database
can include an API endpoint to query scans of one shelf inside a
specific store (e.g., this endpoint returns only the latest scan
and no historical data; /{store}/{shelf}); an API endpoint to
update the product database; a possibility for the backend database
to export in-store data to a CSV (comma-separated values) format;
secure APIs; an option for a customer to export in CSV format;
and/or support for queries by historical data (e.g., by a timespan).
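For illustration only, a sketch of such an external API using Flask; the in-memory SCANS dictionary stands in for the backend database, and the data fields are hypothetical. Only the /{store}/{shelf} route is described above; the export route is an assumed shape for the CSV export.

```python
import csv
import io

from flask import Flask, Response, jsonify

app = Flask(__name__)

# Hypothetical in-memory stand-in for the backend database.
SCANS = {("store1", "shelfA"): [
    {"sku": "0123456789012", "scanned_at": "2021-04-29T10:00:00",
     "price": "3.49"},
]}

@app.route("/<store>/<shelf>")
def latest_scan(store, shelf):
    # Returns only the latest scan for the shelf, no historical data.
    rows = SCANS.get((store, shelf), [])
    return jsonify(rows[-1] if rows else {})

@app.route("/<store>/<shelf>/export.csv")
def export_csv(store, shelf):
    # Export of in-store data in comma-separated-values format.
    rows = SCANS.get((store, shelf), [])
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["sku", "scanned_at", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return Response(buf.getvalue(), mimetype="text/csv")
```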
[0195] Using systems and/or methods disclosed herein, the following
can be accomplished: establishment and/or maintenance of a visual
catalogue; SKU-specific and SKU-independent out-of-stock detection;
price-label verification (PLV); simultaneous mapping and PLV;
and/or OCR and/or barcode decoding in a web browser. Images can be
captured in several ways, including: mobile cameras at the shelf,
drones, robots, fixed shelf cameras, and simultaneous image capture
with mobile barcode scans.
[0196] Some embodiments disclosed relate to methods and/or systems
for operating an information system that aggregates pricing
information from retail establishments. More particularly, and
without limitation, some embodiments relate to acquiring imaging
data of an object using an imaging tool, acquiring pricing
information from signage using an imaging tool, acquiring inventory
information of a product using an imaging tool, acquiring
information from a receipt using an imaging tool, using predictive
algorithms to reconstruct pricing information, incorporating
available pricing information from publicly available sources,
issuing recommendations as
to where to purchase a product, presenting information to one or
more third parties, issuing recommendations as to which product to
purchase based on available alternatives, operating a retail store,
and adjusting prices based on information about competitive prices.
Systems or methods can include: scanning products to image pricing
labels at a retail display or shelf; using a drone to scan products
at the shelf; using a robot to scan products at a shelf; using a
consumer device to scan products and/or pricing at a display;
scanning and/or parsing a shopping receipt to record the pricing of
the products purchased; parsing a digital receipt to record the
pricing of products purchased; and/or scanning and interpreting
signage to infer pricing and special offers.
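As a non-limiting example, parsing a plain-text (digital or OCR'd) receipt to record product pricing could look like the following; the line format and regular expression are assumptions made for this sketch.

```python
import re

# Assumed line format: item name, a run of spaces, then a decimal price.
LINE = re.compile(r"^(?P<name>.+?)\s{2,}(?P<price>\d+\.\d{2})$")

def parse_receipt(text):
    """Record (product, price) pairs from a plain-text receipt."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        if line.upper().startswith("TOTAL"):  # summary line, not a product
            continue
        m = LINE.match(line)
        if m:
            items.append((m.group("name"), float(m.group("price"))))
    return items

receipt = "MILK 1L        2.49\nCEREAL 500G    4.19\nTOTAL          6.68"
print(parse_receipt(receipt))  # [('MILK 1L', 2.49), ('CEREAL 500G', 4.19)]
```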
[0197] In some embodiments, a system to detect the state of a
retail display comprises: a mobile data collector enabled with web
browser software; a camera module; labels with optical codes
collocated with the objects on display; and/or one or more
processors that receive the identity of at least one object in one
or more images captured by the camera module, receive the price on
display of at least one object in one or more images captured by
the camera module, receive the intended price of the at least one
object from a database, and/or compare the price on display with
the intended price and report the result. A system can comprise: a
mobile data collector enabled with web browser software; a camera
module; labels with optical codes collocated with the objects on
display; and/or one or more processors that receive the identity of
at least one object in one or more images captured by the camera
module, decode information about the object on display from the one
or more images captured by the camera module, and/or compute a map
of the at least one object on display from the one or more images
captured by the camera module. A system can comprise: a mobile data
collector; a camera module; labels with optical codes collocated
with the objects on display; and/or one or more processors that
receive the identity of at least one object in one or more images
captured by the camera module, receive image data about the display
area from the one or more images captured by the camera module,
and/or detect the presence of the at least one object in the
display area from the one or more images captured by the camera
module. A system can comprise: a mobile data collector; a camera
module; labels with optical codes collocated with the objects on
display; and/or one or more processors that receive the identity of
at least one object in one or more images captured by the camera
module, receive image data of the display area from the one or more
images captured by the camera module, detect the presence of the at
least one object in the display area from the one or more images
captured by the camera module, and/or save the image data of the
one or more objects along with the identity of the one or more
objects to a database.
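For illustration only, the price comparison performed by such a system could be sketched as follows; the function and field names are hypothetical and not asserted to be those of any actual implementation.

```python
def verify_price_label(sku, displayed_price, intended_prices):
    """Compare the price on display (decoded from the label image) with
    the intended price from the product database and report the result.
    Hypothetical names chosen for this sketch."""
    intended = intended_prices.get(sku)
    if intended is None:
        return {"sku": sku, "result": "unknown_sku"}
    if abs(displayed_price - intended) < 0.005:  # tolerate float rounding
        return {"sku": sku, "result": "ok"}
    return {"sku": sku, "result": "violation",
            "displayed": displayed_price, "intended": intended}

# verify_price_label("0123456789012", 2.99, {"0123456789012": 3.49})
# -> {'sku': '0123456789012', 'result': 'violation',
#     'displayed': 2.99, 'intended': 3.49}
```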
[0198] Sample Computing Device
[0199] FIG. 23 is a simplified block diagram of a computing device
2300. Computing device 2300 can implement some or all functions,
behaviors, and/or capabilities described above that would use
electronic storage or processing, as well as other functions,
behaviors, or capabilities not expressly described. Computing
device 2300 includes a processing subsystem 2302, a storage
subsystem 2304, a user interface 2306, and/or a communication
interface 2308. Computing device 2300 can also include other
components (not explicitly shown) such as a battery, power
controllers, and other components operable to provide various
enhanced capabilities. In various embodiments, computing device
2300 can be implemented in a desktop or laptop computer, mobile
device (e.g., tablet computer, smart phone, mobile phone), wearable
device, media device, application specific integrated circuits
(ASICs), digital signal processors (DSPs), digital signal
processing devices (DSPDs), programmable logic devices (PLDs),
field programmable gate arrays (FPGAs), processors, controllers,
micro-controllers, microprocessors, or electronic units designed to
perform a function or combination of functions described above.
[0200] Storage subsystem 2304 can be implemented using a local
storage and/or removable storage medium, e.g., using disk, flash
memory (e.g., secure digital card, universal serial bus flash
drive), or any other non-transitory storage medium, or a
combination of media, and can include volatile and/or non-volatile
storage media. Local storage can include random access memory
(RAM), including dynamic RAM (DRAM), static RAM (SRAM), or battery
backed up RAM. In some embodiments, storage subsystem 2304 can
store one or more applications and/or operating system programs to
be executed by processing subsystem 2302, including programs to
implement some or all operations described above that would be
performed using a computer. For example, storage subsystem 2304 can
store one or more code modules 2310 for implementing one or more
method steps described above.
[0201] A firmware and/or software implementation may be realized
with modules (e.g., procedures, functions, and so on). A
machine-readable medium tangibly embodying instructions may be used
in implementing methodologies described herein. Code modules 2310
(e.g., instructions stored in memory) may be implemented within a
processor or external to the processor. As used herein, the term
"memory" refers to a type of long term, short term, volatile,
nonvolatile, or other storage medium and is not to be limited to
any particular type of memory or number of memories or type of
media upon which memory is stored.
[0202] Moreover, the term "storage medium" or "storage device" may
represent one or more memories for storing data, including read
only memory (ROM), RAM, magnetic RAM, core memory, magnetic disk
storage mediums, optical storage mediums, flash memory devices
and/or other machine-readable mediums for storing information. The
term "machine-readable medium" includes, but is not limited to,
portable or fixed storage devices, optical storage devices,
wireless channels, and/or various other storage mediums capable of
storing instruction(s) and/or data.
[0203] Furthermore, embodiments may be implemented by hardware,
software, scripting languages, firmware, middleware, microcode,
hardware description languages, and/or any combination thereof.
When implemented in software, firmware, middleware, scripting
language, and/or microcode, program code or code segments to
perform tasks may be stored in a machine readable medium such as a
storage medium. A code segment (e.g., code module 2310) or
machine-executable instruction may represent a procedure, a
function, a subprogram, a program, a routine, a subroutine, a
module, a software package, a script, a class, or a combination of
instructions, data structures, and/or program statements. A code
segment may be coupled to another code segment or a hardware
circuit by passing and/or receiving information, data, arguments,
parameters, and/or memory contents. Information, arguments,
parameters, data, etc. may be passed, forwarded, or transmitted by
suitable means including memory sharing, message passing, token
passing, network transmission, etc.
[0204] Implementation of the techniques, blocks, steps and means
described above may be done in various ways. For example, these
techniques, blocks, steps and means may be implemented in hardware,
software, or a combination thereof. For a hardware implementation,
the processing units may be implemented within one or more ASICs,
DSPs, DSPDs, PLDs, FPGAs, processors, controllers,
micro-controllers, microprocessors, other electronic units designed
to perform the functions described above, and/or a combination
thereof.
[0205] Each code module 2310 may comprise sets of instructions
(codes) embodied on a computer-readable medium that direct a
processor of a computing device 2300 to perform corresponding
actions. The instructions may be configured to run in sequential
order, in parallel (such as under different processing threads), or
in a combination thereof. After loading a code module 2310 on a
general purpose computer system, the general purpose computer is
transformed into a special purpose computer system.
[0206] Computer programs incorporating various features described
herein (e.g., in one or more code modules 2310) may be encoded and
stored on various computer readable storage media. Computer
readable media encoded with the program code may be packaged with a
compatible electronic device, or the program code may be provided
separately from electronic devices (e.g., via Internet download or
as a separately packaged computer-readable storage medium). Storage
subsystem 2304 can also store information useful for establishing
network connections using the communication interface 2308.
[0207] User interface 2306 can include input devices (e.g., touch
pad, touch screen, scroll wheel, click wheel, dial, button, switch,
keypad, microphone, etc.), as well as output devices (e.g., video
screen, indicator lights, speakers, headphone jacks, virtual- or
augmented-reality display, etc.), together with supporting
electronics (e.g., digital-to-analog or analog-to-digital
converters, signal processors, etc.). A user can operate input
devices of user interface 2306 to invoke the functionality of
computing device 2300 and can view and/or hear output from
computing device 2300 via output devices of user interface 2306.
For some embodiments, the user interface 2306 might not be present
(e.g., for a process using an ASIC).
[0208] Processing subsystem 2302 can be implemented as one or more
processors (e.g., integrated circuits, one or more single-core or
multi-core microprocessors, microcontrollers, central processing
unit, graphics processing unit, etc.). In operation, processing
subsystem 2302 can control the operation of computing device 2300.
In some embodiments, processing subsystem 2302 can execute a
variety of programs in response to program code and can maintain
multiple concurrently executing programs or processes. At a given
time, some or all of a program code to be executed can reside in
processing subsystem 2302 and/or in storage media, such as storage
subsystem 2304. Through programming, processing subsystem 2302 can
provide various functionality for computing device 2300. Processing
subsystem 2302 can also execute other programs to control other
functions of computing device 2300, including programs that may be
stored in storage subsystem 2304.
[0209] Communication interface 2308 can provide voice and/or data
communication capability for computing device 2300. In some
embodiments, communication interface 2308 can include radio
frequency (RF) transceiver components for accessing wireless data
networks (e.g., WiFi network; 3G, 4G/LTE; etc.), mobile
communication technologies, components for short-range wireless
communication (e.g., using Bluetooth communication standards, NFC,
etc.), other components, or combinations of technologies. In some
embodiments, communication interface 2308 can provide wired
connectivity (e.g., universal serial bus, Ethernet, universal
asynchronous receiver/transmitter, etc.) in addition to, or in lieu
of, a wireless interface. Communication interface 2308 can be
implemented using a combination of hardware (e.g., driver circuits,
antennas, modulators/demodulators, encoders/decoders, and other
analog and/or digital signal processing circuits) and software
components. In some embodiments, communication interface 2308 can
support multiple communication channels concurrently. In some
embodiments, the communication interface 2308 is not used.
[0210] It will be appreciated that computing device 2300 is
illustrative and that variations and modifications are possible. A
computing device can have various functionality not specifically
described (e.g., voice communication via cellular telephone
networks) and can include components appropriate to such
functionality.
[0211] Further, while the computing device 2300 is described with
reference to particular blocks, it is to be understood that these
blocks are defined for convenience of description and are not
intended to imply a particular physical arrangement of component
parts. For example, the processing subsystem 2302, the storage
subsystem 2304, the user interface 2306, and/or the communication
interface 2308 can be in one device or distributed among multiple
devices.
[0212] Further, the blocks need not correspond to physically
distinct components. Blocks can be configured to perform various
operations, e.g., by programming a processor or providing
appropriate control circuitry, and various blocks might or might
not be reconfigurable depending on how an initial configuration is
obtained. Embodiments can be realized in a variety of apparatus
including electronic devices implemented using a combination of
circuitry and software. Electronic devices described herein can be
implemented using computing device 2300.
[0213] Various features described herein, e.g., methods, apparatus,
computer-readable media and the like, can be realized using a
combination of dedicated components, programmable processors,
and/or other programmable devices. Processes described herein can
be implemented on the same processor or different processors. Where
components are described as being configured to perform certain
operations, such configuration can be accomplished, e.g., by
designing electronic circuits to perform the operation, by
programming programmable electronic circuits (such as
microprocessors) to perform the operation, or a combination
thereof. Further, while the embodiments described above may make
reference to specific hardware and software components, those
skilled in the art will appreciate that different combinations of
hardware and/or software components may also be used and that
particular operations described as being implemented in hardware
might be implemented in software or vice versa.
[0214] Specific details are given in the above description to
provide an understanding of the embodiments. However, it is
understood that the embodiments may be practiced without these
specific details. In some instances, well-known circuits,
processes, algorithms, structures, and techniques may be shown
without unnecessary detail in order to avoid obscuring the
embodiments.
[0215] While the principles of the disclosure have been described
above in connection with specific apparatus and methods, it is to
be understood that this description is made only by way of example
and not as limitation on the scope of the disclosure. Embodiments
were chosen and described in order to explain the principles of the
invention and practical applications to enable others skilled in
the art to utilize the invention in various embodiments and with
various modifications, as are suited to a particular use
contemplated. It will be appreciated that the description is
intended to cover modifications and equivalents.
[0216] Also, it is noted that the embodiments may be described as a
process which is depicted as a flowchart, a flow diagram, a data
flow diagram, a structure diagram, or a block diagram. Although a
flowchart may describe the operations as a sequential process, many
of the operations can be performed in parallel or concurrently. In
addition, the order of the operations may be re-arranged. A process
is terminated when its operations are completed, but could have
additional steps not included in the figure. A process may
correspond to a method, a function, a procedure, a subroutine, a
subprogram, etc.
[0217] A number of variations and modifications of the disclosed
embodiment(s) can also be used. For example, though several
embodiments are for shelves in a store, other environments could be
coded and/or visualized. For example, a warehouse, a logistics
facility, a storage facility, a postal or parcel facility, supplies
at an auto repair shop, or art supplies at a university can be
tracked and/or visualized.
[0218] A recitation of "a", "an", or "the" is intended to mean "one
or more" unless specifically indicated to the contrary. Patents,
patent applications, publications, and descriptions mentioned here
are incorporated by reference in their entirety for all purposes.
None is admitted to be prior art.
* * * * *