U.S. patent application number 14/497,117 was published by the patent office on 2015-04-02 for off-target tracking using feature aiding in the context of inertial navigation.
The applicant listed for this patent is QUALCOMM Incorporated. The invention is credited to Christopher Brunner, Murali Ramaswamy Chari, Mahesh Ramachandran, Arvind Ramanandan, and Abhishek Tyagi.
Application Number: 14/497,117
Publication Number: 2015/0092048
Family ID: 52739773
Publication Date: 2015-04-02

United States Patent Application 20150092048
Kind Code: A1
Brunner, Christopher; et al.
April 2, 2015
Off-Target Tracking Using Feature Aiding in the Context of Inertial
Navigation
Abstract
A Visual Inertial Tracker (VIT), such as a Simultaneous
Localization And Mapping (SLAM) system based on an Extended Kalman
Filter (EKF) framework (EKF-SLAM) can provide drift correction in
calculations of a pose (translation and orientation) of a mobile
device by obtaining location information regarding a target,
obtaining an image of the target, estimating, from the image of the
target, measurements relating to a pose of the mobile device based
on the image and location information, and correcting a pose
determination of the mobile device using an EKF, based, at least in
part, on the measurements relating to the pose of the mobile
device.
Inventors: Brunner, Christopher (San Diego, CA); Ramanandan, Arvind (San Diego, CA); Ramachandran, Mahesh (San Jose, CA); Tyagi, Abhishek (San Diego, CA); Chari, Murali Ramaswamy (San Diego, CA)

Applicant: QUALCOMM Incorporated, San Diego, CA, US

Family ID: 52739773
Appl. No.: 14/497,117
Filed: September 25, 2014
Related U.S. Patent Documents

Application Number: 61/883,921
Filing Date: Sep 27, 2013
Current U.S. Class: 348/135
Current CPC Class: H04W 4/33 20180201; G01S 5/0263 20130101; G01S 5/163 20130101; G06T 7/80 20170101; H04M 2250/52 20130101; G06T 2207/10016 20130101; G01S 5/16 20130101; G06T 7/579 20170101; H04M 2250/12 20130101; G01S 5/0294 20130101; G06K 9/00671 20130101; H04W 4/021 20130101; H04W 4/029 20180201; G01C 21/165 20130101; G06T 7/248 20170101; G01S 19/14 20130101; G06T 2207/30244 20130101; G01C 25/005 20130101
Class at Publication: 348/135
International Class: G06K 9/00 20060101 G06K009/00; G06T 7/60 20060101 G06T007/60
Claims
1. A method of correcting drift in a tracking system of a mobile
device, the method comprising: obtaining location information
regarding a target; obtaining an image of the target, the image
captured by the mobile device; estimating, from the image of the
target, measurements relating to a pose of the mobile device based
on the image and location information, wherein the pose comprises
information indicative of a translation and orientation of the
mobile device; and correcting a pose determination of the mobile
device using an Extended Kalman Filter (EKF), based, at least in
part, on the measurements relating to the pose of the mobile
device.
2. The method of claim 1, further comprising obtaining absolute
coordinates of the target, wherein correcting the pose is further
based, at least in part, on the absolute coordinates of the
target.
3. The method of claim 1, further comprising processing the image
of the target to determine that the target was captured in the
image, wherein the processing includes comparing one or more
keypoints of the image with one or more keypoints of each target in
a plurality of known targets.
4. The method of claim 3, further comprising: receiving one or more
wireless signals from one or more access points; determining a
proximity of the one or more access points based on wireless
signals; and determining the plurality of known targets, based on
the determined proximity of the one or more access points.
5. The method of claim 1, wherein the tracking system incorporates
a Simultaneous Localization And Mapping (SLAM) system with the
EKF.
6. The method of claim 1, wherein the pose determination is based,
at least in part, on measurements from one or more of an
accelerometer or a gyroscope of the mobile device.
7. The method of claim 6, further comprising determining a bias of
one or more of the accelerometer or the gyroscope of the mobile
device, based at least in part on the pose determination and the
corrected pose.
8. A mobile device comprising: a camera; a memory; and a processing
unit operatively coupled with the camera and the memory and
configured to: obtain location information regarding a target;
obtain an image of the target, the image captured by the camera of
the mobile device; estimate, from the image of the target,
measurements relating to a pose of the mobile device based on the
image and location information, wherein the pose comprises
information indicative of a translation and orientation of the
mobile device; and correct a pose determination of the mobile
device using an Extended Kalman Filter (EKF), based, at least in
part, on the measurements relating to the pose of the mobile
device.
9. The mobile device of claim 8, wherein the processing unit is
configured to obtain absolute coordinates of the target, wherein
the processing unit is further configured to correct the pose
based, at least in part, on the absolute coordinates of the
target.
10. The mobile device of claim 8, wherein the processing unit is
configured to process the image of the target to determine that the
target was captured in the image, and wherein processing the image
includes comparing one or more features of the image with one or
more features of each target in a plurality of known targets.
11. The mobile device of claim 10, further comprising a wireless
communication interface configured to receive one or more wireless
signals from one or more access points, wherein the processing unit
is further configured to: determine a proximity of the one or more
access points based on wireless signals; and determine the
plurality of known targets, based on the determined proximity of
the one or more access points.
12. The mobile device of claim 8, wherein the processing unit is
configured to incorporate a Simultaneous Localization And Mapping
(SLAM) system with the EKF.
13. The mobile device of claim 8, further comprising one or more
motion sensors, wherein the processing unit is further configured
to determine the pose determination based, at least in part, on one
or more measurements received from the one or more motion
sensors.
14. The mobile device of claim 13, wherein the one or more motion
sensors include one or more of an accelerometer or a gyroscope.
15. The mobile device of claim 13, wherein the processing unit is
configured to determine a bias of the one or more motion sensors,
based at least in part on the pose determination and the corrected
pose.
16. An apparatus comprising: means for obtaining location
information regarding a target; means for obtaining an image of the
target, the image captured by a mobile device; means for
estimating, from the image of the target, measurements relating to
a pose of the mobile device based on the image and location
information, wherein the pose comprises information indicative of a
translation and orientation of the mobile device; and means for
correcting a pose determination of the mobile device using an
Extended Kalman Filter (EKF), based, at least in part, on the
measurements relating to the pose of the mobile device.
17. The apparatus of claim 16, further comprising means for
obtaining absolute coordinates of the target, wherein the means for
correcting the pose is configured to base the corrected pose, at
least in part, on the absolute coordinates of the target.
18. The apparatus of claim 16, further comprising means for
processing the image of the target to determine that the target was
captured in the image, wherein the means for processing the image
include means for comparing one or more features of the image with
one or more features of each target in a plurality of known
targets.
19. The apparatus of claim 18, further comprising: means for
receiving one or more wireless signals from one or more access
points; means for determining a proximity of the one or more access
points based on wireless signals; and means for determining the
plurality of known targets, based on the determined proximity of
the one or more access points.
20. The apparatus of claim 16, further comprising means for
incorporating a Simultaneous Localization And Mapping (SLAM) system
with the EKF.
21. The apparatus of claim 16, further comprising means for basing
the pose determination, at least in part, on measurements of one or
more of an accelerometer or a gyroscope of the mobile device.
22. The apparatus of claim 21, further comprising means for
determining a bias of one or more of the accelerometer or the
gyroscope of the mobile device, based at least in part on the pose
determination and the corrected pose.
23. A non-transitory machine-readable medium having instructions
embedded thereon for correcting drift in a tracking system of a
mobile device, the instructions including computer code for:
obtaining location information regarding a target; obtaining an
image of the target, the image captured by the mobile device;
estimating, from the image of the target, measurements relating to
a pose of the mobile device based on the image and location
information, wherein the pose comprises information indicative of a
translation and orientation of the mobile device; and correcting a
pose determination of the mobile device using an Extended Kalman
Filter (EKF), based, at least in part, on the measurements relating
to the pose of the mobile device.
24. The non-transitory machine-readable medium of claim 23, the
instructions further including computer code for obtaining absolute
coordinates of the target, wherein the computer code is further
configured to base correcting the pose, at least in part, on the
absolute coordinates of the target.
25. The non-transitory machine-readable medium of claim 23, the
instructions further including computer code for processing the
image of the target to determine that the target was captured in
the image, wherein the computer code for processing includes
computer code for comparing one or more features of the image with
one or more features of each target in a plurality of known
targets.
26. The non-transitory machine-readable medium of claim 25, the
instructions further including computer code for: receiving one or
more wireless signals from one or more access points; determining a
proximity of the one or more access points based on wireless
signals; and determining the plurality of known targets, based on
the determined proximity of the one or more access points.
27. The non-transitory machine-readable medium of claim 23, the
instructions further including computer code for incorporating a
Simultaneous Localization And Mapping (SLAM) system with the
EKF.
28. The non-transitory machine-readable medium of claim 23, wherein
the computer code is further configured to base the pose determination,
at least in part, on measurements from one or more of an
accelerometer or a gyroscope of the mobile device.
29. The non-transitory machine-readable medium of claim 28, the
instructions further including computer code for determining a bias
of one or more of the accelerometer or the gyroscope of the mobile
device, based at least in part on the pose determination and the
corrected pose.
Description
[0001] This application claims the benefit of U.S. Provisional
Patent Application Ser. No. 61/883,921, entitled "OFF-TARGET
TRACKING USING FEATURE AIDING IN THE CONTEXT OF INERTIAL
NAVIGATION," filed on Sep. 27, 2013, which is hereby incorporated
by reference for all purposes as if fully set forth herein.
BACKGROUND
[0002] Camera-based tracking of a mobile device's rotation and
translation can be executed by the device (e.g., mobile phone,
tablet, heads-up display, and the like) to enable the mobile device
to provide a wide variety of features, such as augmented reality
and location tracking. In order to more accurately track the mobile
device's translation and orientation--also known as the six degrees
of freedom, or "pose"--a mobile device may additionally incorporate
information from sensors such as gyroscopes, accelerometers, GPS
receivers, and the like. However, sensor noise and modeling errors
can cause a tracking system to "drift," resulting in inaccurate
pose determinations. These inaccuracies will build on each other,
increasing over time, unless measures are taken to correct these
pose determinations. Furthermore, GPS reception is poor to
non-existent indoors and inertial sensors alone cannot provide
absolute pose.
SUMMARY
[0003] A Visual Inertial Tracker (VIT), such as a Simultaneous
Localization And Mapping (SLAM) system based on an Extended Kalman
Filter (EKF) framework (EKF-SLAM) can provide drift correction in
calculations of a pose (translation and orientation) of a mobile
device.
[0004] An example method of correcting drift in a tracking system
of a mobile device, according to the disclosure, includes obtaining
location information regarding a target, obtaining an image of the
target, the image captured by the mobile device, and estimating,
from the image of the target, measurements relating to a pose of
the mobile device based on the image and location information. The
pose comprises information indicative of a translation and
orientation of the mobile device. The method further comprises
correcting a pose determination of the mobile device using an EKF,
based, at least in part, on the measurements relating to the pose
of the mobile device.
[0005] The example method of correcting drift in a tracking system
of a mobile device can include one or more of the following
features. The method can include obtaining absolute coordinates of
the target, where correcting the pose is further based, at least in
part, on the absolute coordinates of the target. The method can
include processing the image of the target to determine that the
target was captured in the image, where the processing includes
comparing one or more keypoints of the image with one or more
keypoints of each target in a plurality of known targets. The
method can include receiving one or more wireless signals from one
or more access points, determining a proximity of the one or more
access points based on wireless signals, and determining the
plurality of known targets, based on the determined proximity of
the one or more access points. The tracking system can incorporate
a Simultaneous Localization And Mapping (SLAM) system with the EKF.
The pose determination can be based, at least in part, on
measurements from one or more of an accelerometer or a gyroscope of
the mobile device. The method can include determining a bias of one
or more of the accelerometer or the gyroscope of the mobile device,
based at least in part on the pose determination and the corrected
pose.
[0006] An example mobile device, according to the disclosure, can
include a camera, a memory, and a processing unit. The processing
unit is operatively coupled with the camera and the memory and
configured to obtain location information regarding a target,
obtain an image of the target, the image captured by the camera of
the mobile device, and estimate, from the image of the target,
measurements relating to a pose of the mobile device based on the
image and location information, where the pose comprises
information indicative of a translation and orientation of the
mobile device. The processing unit is further configured to correct
a pose determination of the mobile device using an Extended Kalman
Filter (EKF), based, at least in part, on the measurements relating
to the pose of the mobile device.
[0007] The example mobile device can include one or more of the
following features. The processing unit can be configured to obtain
absolute coordinates of the target and further configured to
correct the pose based, at least in part, on the absolute
coordinates of the target. The processing unit can be configured to
process the image of the target to determine that the target was
captured in the image, where processing the image includes
comparing one or more features of the image with one or more
features of each target in a plurality of known targets. The mobile
device may include a wireless communication interface configured to
receive one or more wireless signals from one or more access
points, and the processing unit can be further configured to
determine a proximity of the one or more access points based on
wireless signals, and determine the plurality of known targets,
based on the determined proximity of the one or more access points.
The processing unit can be configured to incorporate a Simultaneous
Localization And Mapping (SLAM) system with the EKF. The mobile
device can include one or more motion sensors, and the processing
unit can be further configured to determine the pose determination
based, at least in part, on one or more measurements received from
the one or more motion sensors. The one or more motion sensors can
include one or more of an accelerometer or a gyroscope. The
processing unit can be configured to determine a bias of the one or
more motion sensors, based at least in part on the pose
determination and the corrected pose.
[0008] An example apparatus, according to the disclosure, can
include means for obtaining location information regarding a
target, means for obtaining an image of the target, the image
captured by a mobile device, and means for estimating, from the
image of the target, measurements relating to a pose of the mobile
device based on the image and location information, where the pose
comprises information indicative of a translation and orientation
of the mobile device. The example apparatus further includes means
for correcting a pose determination of the mobile device using an
Extended Kalman Filter (EKF), based, at least in part, on the
measurements relating to the pose of the mobile device.
[0009] The example apparatus can further include one or more of the
following features. The apparatus can include means for obtaining
absolute coordinates of the target, where the means for correcting
the pose is configured to base the corrected pose, at least in
part, on the absolute coordinates of the target. The apparatus can
include means for processing the image of the target to determine
that the target was captured in the image, where the means for
processing the image include means for comparing one or more
features of the image with one or more features of each target in a
plurality of known targets. The apparatus can include means for
receiving one or more wireless signals from one or more access
points, means for determining a proximity of the one or more access
points based on wireless signals, and means for determining the
plurality of known targets, based on the determined proximity of
the one or more access points. The apparatus can include means for
incorporating a Simultaneous Localization And Mapping (SLAM) system
with the EKF. The apparatus can include means for basing the pose
determination, at least in part, on measurements of one or more of
an accelerometer or a gyroscope of the mobile device. The apparatus
can include means for determining a bias of one or more of the
accelerometer or the gyroscope of the mobile device, based at least
in part on the pose determination and the corrected pose.
[0010] An example non-transitory machine-readable medium, according
to the disclosure, can have instructions embedded thereon for
correcting drift in a tracking system of a mobile device. The
instructions include computer code for obtaining location
information regarding a target, obtaining an image of the target,
the image captured by the mobile device, and estimating, from the
image of the target, measurements relating to a pose of the mobile
device based on the image and location information, where the pose
comprises information indicative of a translation and orientation
of the mobile device. The instructions also include computer code
for correcting a pose determination of the mobile device using an
Extended Kalman Filter (EKF), based, at least in part, on the
measurements relating to the pose of the mobile device.
[0011] The example non-transitory machine-readable medium can
further include instructions including computer code for one or
more of the following features. Instructions can include computer
code for obtaining absolute coordinates of the target, wherein the
computer code is further configured to base correcting the pose, at
least in part, on the absolute coordinates of the target.
Instructions can include computer code for processing the image of
the target to determine that the target was captured in the image,
wherein the computer code for processing includes computer code for
comparing one or more features of the image with one or more
features of each target in a plurality of known targets.
Instructions can include computer code for receiving one or more
wireless signals from one or more access points, determining a
proximity of the one or more access points based on wireless
signals, and determining the plurality of known targets, based on
the determined proximity of the one or more access points.
Instructions can include computer code for incorporating a
Simultaneous Localization And Mapping (SLAM) system with the EKF.
The computer code can be configured to base the pose determination,
at least in part, on measurements from one or more of an
accelerometer or a gyroscope of the mobile device. Instructions can
include computer code for determining a bias of one or more of the
accelerometer or the gyroscope of the mobile device, based at least
in part on the pose determination and the corrected pose.
[0012] Items and/or techniques described herein may provide one or
more of the following capabilities, as well as other capabilities
not mentioned. Techniques can provide for the mitigation of drift
in an indoor location tracking system, such as a Visual Inertial
Tracker (VIT), providing for increased accuracy. This, in turn, can
lead to a better user experience of applications and/or other
features of a mobile device that are dependent on the indoor
location tracking system. These and other advantages and features
are described in more detail in conjunction with the text below and
attached figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] An understanding of the nature and advantages of various
embodiments may be realized by reference to the following figures.
In the appended figures, similar components or features may have
the same reference label. Further, various components of the same
type may be distinguished by following the reference label by a
dash and a second label that distinguishes among the similar
components. If only the first reference label is used in the
specification, the description is applicable to any one of the
similar components having the same first reference label
irrespective of the second reference label.
[0014] FIG. 1 is a simplified image that can help illustrate how a
Visual Inertial Tracker (VIT) can utilize targets for pose
estimation and/or correction, according to an embodiment.
[0015] FIG. 2 is a block diagram of an example VIT.
[0016] FIG. 3 is a flow chart of a high-level process of drift
correction, according to an embodiment, which can be executed by a
VIT or other tracking system.
[0017] FIG. 4 is a flow diagram of a method of correcting drift in
a VIT or other tracking system, according to an embodiment.
[0018] FIG. 5 is a block diagram of an embodiment of a mobile
device.
DETAILED DESCRIPTION
[0019] The detailed description set forth below in connection with
the appended drawings is intended as a description of various
aspects of the present disclosure and is not intended to represent
the only aspects in which the present disclosure may be practiced.
Each aspect described in this disclosure is provided merely as an
example or illustration of the present disclosure, and should not
necessarily be construed as preferred or advantageous over other
aspects. The detailed description includes specific details for the
purpose of providing a thorough understanding of the present
disclosure. However, it will be apparent to those skilled in the
art that the present disclosure may be practiced without these
specific details. In some instances, well-known structures and
devices are shown in block diagram form in order to avoid obscuring
the concepts of the present disclosure. Acronyms and other
descriptive terminology may be used merely for convenience and
clarity and are not intended to limit the scope of the
disclosure.
[0020] Mobile devices, such as mobile phones, media players,
tablets, head-mounted displays (HMDs) and other wearable electronic
devices, and the like, can often execute applications and/or
provide features that utilize the mobile device's translation and
orientation, or "pose." Tracking the mobile device's pose in a
spatial coordinate system such as the Earth-Centered, Earth-Fixed
(ECEF) coordinate system can be accomplished in any of a variety of
ways. Often times, this is done utilizing built-in sensors of the
mobile device, such as accelerometers, gyroscopes, a Global
Positioning System (GPS) receiver, and the like. A determination of
the mobile device's pose can be used to enable or enhance
navigation, games, and/or other applications.
[0021] Where GPS positioning is unavailable or unreliable, such as
in indoor environments, the tracking of a mobile device's pose can
be done by a Visual Inertial Tracker (VIT), which can combine
measurements from visual tracking systems (which utilize visual
sensors such as cameras) with measurements from inertial tracking
systems (which utilize inertial sensors such as accelerometers and
gyroscopes). Other sensors can be utilized as well, such as a
barometer or altimeter to determine and/or correct altitude
measurements. That said, where GPS coordinates are available, they
may also be used to provide absolute location information to a VIT.
One such embodiment of a VIT incorporates a Simultaneous
Localization And Mapping (SLAM) system based on an Extended Kalman
Filter (EKF) framework: the EKF receives various measurements from
different sensors as listed above to track the pose of a phone.
This type of system is referred to herein as an EKF-SLAM system. A
keyframe based parallel tracking and mapping system ("PTAM") is
another example of a SLAM system.
[0022] Despite using both visual and inertial measurements, drift
can still be a problem for such VITs. In other words, inaccuracies
in these inputs can accumulate over time. The drift of current
state-of-the-art VIT systems is about 1% over distance traveled.
So, for example, a VIT executed on a mobile device held by a
walking user will drift about 1 meter for every 100 meters the user
walks.
[0023] Embodiments are described herein that can provide pose and
drift correction in VITs (such as the EKF-SLAM system
previously described) by providing a pose measurement derived from
an image of a target with a known position. This pose measurement
can be used to correct drift. Moreover, such correction can take
place each time a VIT captures an image of a known target.
[0024] FIG. 1 is a simplified image that can help illustrate how a
VIT can utilize such targets for pose correction, according to one
embodiment. In this embodiment, a user can use an application
executed on a mobile device 120 to track the user's position in a
store. Depending on the functionality of the application, it may
display information on a display 130 of the mobile device 120, such
as the user's position on a map of the store and/or where certain
items may be located.
[0025] The application may track the user's position from
information obtained by a VIT, which is also executed by the mobile
device 120. As part of the tracking process, the camera may capture
an image 100 (e.g., a video frame) of a certain portion of the
store that may have one or more targets 110. Targets 110 can be
images or objects with known locations, shown in FIG. 1 as aisle
signs (targets 110-1 and 110-2 at the end of aisles 1 and 2,
respectively). The targets 110 include one or more keypoints
recognizable by the detection algorithm that uses the keypoints to
provide the pose measurement to the VIT. When one or more targets
enter the camera view, an accurate pose can be determined as a
result of the known location(s) of the keypoints on the target(s)
110. This can happen in four steps: keypoint detection, keypoint
matching, outlier rejection, and pose estimation based on minimization
of reprojection error. Example implementations are described in
"Real-Time Detection and Tracking for Augmented Reality on Mobile
Phones" by Daniel Wagner et al. in IEEE Transactions on
Visualization and Computer Graphics, 2009, found at
http://dl.acm.org/citation.cfm?id=1605359, which is incorporated by
reference herein. The pose in the VIT can then be replaced or
updated by the pose obtained from the keypoints on the target to
reduce drift. (Alternatively, the pose may be used to initialize
the VIT if this is the first absolute pose measurement.)
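The four-step target-based pose recovery above can be illustrated with a simplified two-dimensional analogue. The sketch below assumes matched keypoints are already available and stands in for the last two steps (outlier rejection and pose estimation) using a Kabsch least-squares alignment with one residual-based rejection pass; it is not the full perspective reprojection minimization a VIT would use, and all function and variable names are illustrative.

```python
import numpy as np

def kabsch_2d(W, D):
    """Least-squares rotation R and translation t with D ≈ W @ R.T + t."""
    cw, cd = W.mean(axis=0), D.mean(axis=0)
    H = (W - cw).T @ (D - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:      # enforce a proper rotation, not a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cw

def estimate_pose_2d(world_pts, device_pts):
    """Simplified steps 3-4 of the pipeline: fit a pose, reject matches
    with above-average residual (crude outlier rejection), then refit
    on the surviving inliers."""
    R, t = kabsch_2d(world_pts, device_pts)
    resid = np.linalg.norm(world_pts @ R.T + t - device_pts, axis=1)
    inliers = resid <= resid.mean() + 1e-9
    return kabsch_2d(world_pts[inliers], device_pts[inliers])
```

Given noise-free matches plus one gross mismatch, the second fit recovers the true rotation and translation because the mismatch's residual dominates the first pass and is discarded.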
[0026] As an example, if the image 100 reveals that the pose
obtained from the keypoints on the target 110-1 corresponds to the
pose determined by the VIT when the image 100 was taken, then no
drift correction is needed. However, if there is a mismatch between
the pose obtained from the keypoints on the target 110-1 and the
pose determined by the VIT, then drift correction is needed.
[0027] In some embodiments, drift correction can include resetting
the VIT's pose by replacing the pose previously determined by the
VIT with the pose obtained using the keypoints on the target 110-1.
Moreover, some embodiments may avoid the additional processing
requirements it would take to frequently run keypoint detection by
analyzing the VIT pose and VIT pose uncertainty and executing
keypoint detection only if keypoints on a target are predicted to
be in view.
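The visibility gating described above might be sketched as follows. The bounding-box test, the 3-sigma inflation, and all names are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def should_run_detection(pred_px, px_cov, frame_w, frame_h, n_sigma=3.0):
    """Gate the costly keypoint-detection step: return True only if the
    target's predicted image location, inflated by n_sigma standard
    deviations of the propagated pose uncertainty, overlaps the frame."""
    sx, sy = np.sqrt(np.diag(px_cov))
    x_lo, x_hi = pred_px[0] - n_sigma * sx, pred_px[0] + n_sigma * sx
    y_lo, y_hi = pred_px[1] - n_sigma * sy, pred_px[1] + n_sigma * sy
    return x_hi >= 0 and x_lo <= frame_w and y_hi >= 0 and y_lo <= frame_h
```

A confident prediction far outside the frame skips detection, while a highly uncertain prediction near the frame edge still triggers it.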
[0028] Of course, different embodiments may choose to perform such
drift correction differently, depending on desired functionality,
processing considerations, and other factors. For example, a pose
can be calculated for each image in which a target
appears, once for a given period of time and/or set of video
frames, once for each series of video frames in which a target
appears, and the like. A person of ordinary skill in the art will
recognize many additional variations.
[0029] FIG. 2 is a block diagram of an example VIT 200. This VIT
200 employs an EKF-SLAM topology that utilizes a computer vision
(CV) component 210 and an EKF component 220. These components can
be executed in hardware and/or software of a mobile device 120, an
example of which is described in further detail below with regard
to FIG. 5.
[0030] As illustrated in FIG. 2, the CV component 210 can receive
images, or camera frames, from a mobile device, where the camera
frames have accurate time stamps. This can enable the CV
component 210 to determine when an image is captured, which can be
combined or fused in the EKF with time stamped inertial sensor
measurements from the same mobile device. Depending on desired
functionality, the camera frames can be captured as a series of
still images and/or as part of video. Embodiments utilizing video
capture can, for example, receive images at 30 frames per second.
Other embodiments may utilize other frame rates. In some
embodiments, only a portion of frames captured by a camera may be
provided to and/or utilized by the CV component. Some embodiments
may have CV components that utilize all frames.
[0031] The CV component can also receive camera calibration
information. Intrinsic camera calibration includes, for example,
principal point, focal length at infinity, and radial distortion.
Extrinsic camera calibration parameters include rotation and
translation with respect to the inertial sensor chip. The rotation
can be estimated in the VIT, or the camera can be assumed to be
aligned with the inertial sensor chip. The remaining camera
calibration parameters are very similar across mobile devices of
the same type. Hence, obtaining them from one unit of a certain
model of mobile phone allows the calibration parameters to be
applied to all mobile phones of that model.
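As a non-limiting illustration of how intrinsic calibration parameters such as those above are commonly applied, the following Python sketch maps a pixel to normalized camera coordinates using the principal point, focal length, and a single radial distortion coefficient. The function name, the one-term distortion model, and the approximate undistortion are assumptions for illustration, not part of the application.

```python
def pixel_to_normalized(u, v, fx, fy, cx, cy, k1=0.0):
    """Map pixel (u, v) to normalized camera coordinates using
    intrinsics: focal lengths (fx, fy), principal point (cx, cy),
    and one radial distortion coefficient k1 (illustrative model)."""
    x = (u - cx) / fx
    y = (v - cy) / fy
    r2 = x * x + y * y
    # first-order approximate undistortion (exact when k1 == 0)
    return x / (1.0 + k1 * r2), y / (1.0 + k1 * r2)
```

With k1 = 0 the mapping reduces to the plain pinhole model.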
[0032] Using the camera frames and camera calibration, the CV
component 210 can employ any of a variety of algorithms to
implement keypoint detection and keypoint tracking on the received
camera frames. Keypoint detection can be based on Harris corners.
Keypoint tracking can be based on Normalized Cross-Correlation
(NCC). The keypoint detection/tracking provides 2-D keypoint
measurements in the camera frame that are relayed to the EKF
component 220. The EKF component 220, utilizing sensor measurements
(as detailed below) can calculate and share the predicted 2D camera
coordinates of keypoints with the CV component 210 to limit the
search space of the image point finder, increasing efficiency of
the process. The 2D camera coordinates of keypoints measured by the
image point finder and provided to the EKF component 220 are
ultimately used to estimate the pose of the mobile device 120.
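The keypoint tracking just described can be sketched, in greatly simplified form, as an NCC search restricted to a small window around the EKF-predicted 2D location. This NumPy-only sketch uses illustrative patch and window sizes; the function names are not from the application.

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Normalized cross-correlation between two equal-size patches."""
    a = patch_a.astype(float) - patch_a.mean()
    b = patch_b.astype(float) - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def track_keypoint(prev_frame, cur_frame, kp, predicted, radius=4, patch=3):
    """Find keypoint kp = (x, y) in cur_frame by an NCC search in a
    small window around the EKF-predicted 2D location."""
    x0, y0 = kp
    template = prev_frame[y0 - patch:y0 + patch + 1, x0 - patch:x0 + patch + 1]
    best_score, best_pos = -1.0, predicted
    px, py = predicted
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = px + dx, py + dy
            cand = cur_frame[y - patch:y + patch + 1, x - patch:x + patch + 1]
            if cand.shape != template.shape:  # skip out-of-frame windows
                continue
            s = ncc(template, cand)
            if s > best_score:
                best_score, best_pos = s, (x, y)
    return best_pos, best_score
```

Restricting the search to the predicted window is what limits the search space of the image point finder.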
[0033] In addition to the VIT pose, the EKF component 220 utilizes
the 2-D keypoint measurements from the CV component 210, together
with sensor measurements from a gyroscope ("gyro meas" in FIG. 2),
accelerometer ("accel meas"), and the like, to jointly estimate the
three-dimensional (3D) positions of the keypoints, the biases on
the accelerometers and gyroscopes, and the gravity vector. For more
information on an EKF-SLAM implementation, see Jones, Eagle S., and
Stefano Soatto, "Visual-inertial navigation, mapping and
localization: A scalable real-time causal approach." The
International Journal of Robotics Research 30.4 (2011): 407-430,
which is incorporated by reference herein in its entirety.
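As a heavily simplified, non-limiting analogue of this joint estimation, the following one-dimensional Kalman filter propagates position, velocity, and an accelerometer bias from accelerometer measurements and corrects the state with an absolute position measurement (standing in for a pose derived from a target). The time step and noise parameters are illustrative assumptions.

```python
import numpy as np

dt = 0.01
# state x = [position, velocity, accel bias]
F = np.array([[1.0, dt, -0.5 * dt * dt],
              [0.0, 1.0, -dt],
              [0.0, 0.0, 1.0]])            # bias modeled as random walk
B = np.array([0.5 * dt * dt, dt, 0.0])     # control input: measured accel
H = np.array([[1.0, 0.0, 0.0]])            # we measure position only
Q = np.diag([1e-6, 1e-4, 1e-8])            # process noise (assumed)
R = np.array([[1e-3]])                     # measurement noise (assumed)

def predict(x, P, accel_meas):
    """Propagate the state with a bias-corrected accelerometer reading."""
    x = F @ x + B * accel_meas
    P = F @ P @ F.T + Q
    return x, P

def update(x, P, pos_meas):
    """Correct the state with an absolute position measurement."""
    y = pos_meas - H @ x                   # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + (K @ y).ravel()
    P = (np.eye(3) - K @ H) @ P
    return x, P
```

Because the bias enters the position prediction, repeated position corrections make the bias observable and pull the estimate toward its true value, which is the mechanism behind the drift correction described above.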
[0034] GPS measurements ("GPS meas") can also be provided to the
EKF component to provide an absolute coordinate framework in which
the mobile device 120 may be tracked. For example, a mobile device
120 initially may be in a location that can receive GPS
measurements, and may therefore determine absolute location
coordinates for the mobile device 120. As the mobile device moves
to a location in which GPS measurements are not received (e.g.,
indoors), the VIT 200 may determine absolute coordinates of the
mobile device 120 based on the mobile device's movement relative to
a position in which absolute coordinates were determined based on
GPS information.
[0035] Embodiments can further provide, as an input to the EKF
component 220, a pose measurement derived from an image of a
target. As described above, the pose measurement can come from a
keypoint detector, which can not only detect keypoints of a target
in an image but also determine the pose of the mobile device 120
based on the image. In some embodiments, the pose measurement may
be provided from the CV component 210 and/or may be derived from 2D
camera coordinates. Other embodiments may provide additional and/or
alternative components to provide target detection and/or pose
measurement.
[0036] Details regarding the targets can be locally stored and/or
accessible by the VIT 200. Details can include location information
such as absolute location of the target and/or its keypoints for
pose calculation based on an image of the target. Additionally or
alternatively, the details can include information regarding how
the target may be identified.
[0037] By being able to determine pose in relation to the target
and by knowing the absolute coordinates of the target (and/or one
or more keypoints of the target), the absolute pose can be
determined and used by the EKF component 220 to correct for any
drift that might have taken place. Correction can include, for
example, overriding a pose calculated by the EKF component 220 with
the newly-determined pose measurement.
[0038] Depending on desired functionality, the EKF component 220
can output various types of data. As indicated in FIG. 2, for
example, the EKF component 220 can output bias of an accelerometer,
gyroscope, and/or other sensor ("accel bias gyro bias"), the
determined pose of the mobile device ("pose of the phone"), 3D
locations of keypoints ("3-D locations of keypoints"), and/or an
estimation of the gravity vector ("gravity"). Any or all of these
outputs may be influenced by the pose measurement determined from a
detected target in a camera frame. The EKF component 220 seeks to
minimize innovations between predicted and measured 2-D camera
keypoints and can adjust inertial sensor biases, pose, gravity
vector, and location of keypoints in 3-D to that end.
[0039] The creation of keypoints on targets for pose correction in
a VIT can be done in any manner of different ways, for any manner
of different applications. For instance, if a picture of the target
is taken from the fronto-parallel view, keypoints can be determined
using the Fast Corner algorithm, scale can be provided by measuring
the distance between two of the keypoints, and descriptors can be
obtained from the pixel measurements in the vicinity of the
respective keypoints. The placement and designation of targets for
a venue can vary, depending on desired functionality. Broadly
speaking, the more targets that are located in and distributed
throughout a venue, the more drift correction they can provide to a
VIT, resulting in more accurate pose determination. The designation of
targets and the creation of a venue map can be facilitated through
an application on a mobile device. Additionally or alternatively,
the data associated with the map--such as location information
regarding the targets utilized to obtain pose measurements using
the techniques described herein--can be collected with the
designation of targets and incorporated into the map.
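In greatly simplified form, creating such a map entry for one target might look like the following sketch: normalized patch descriptors are taken around supplied keypoints (which, per the above, could come from a corner detector such as FAST), and a metres-per-pixel scale is derived from the known real-world distance between two keypoints. The names, patch size, and normalization are illustrative assumptions.

```python
import numpy as np

def build_target_entry(image, keypoints, known_dist_m, kp_pair=(0, 1), patch=4):
    """Build a map entry for a target from a fronto-parallel photo:
    a patch descriptor around each (x, y) keypoint, plus a
    metres-per-pixel scale from the known distance between two of
    the keypoints."""
    descriptors = []
    for (x, y) in keypoints:
        d = image[y - patch:y + patch + 1, x - patch:x + patch + 1].astype(float)
        d = (d - d.mean()) / (d.std() + 1e-9)   # normalize for lighting
        descriptors.append(d.ravel())
    (x0, y0), (x1, y1) = keypoints[kp_pair[0]], keypoints[kp_pair[1]]
    pixel_dist = np.hypot(x1 - x0, y1 - y0)
    scale = known_dist_m / pixel_dist           # metres per pixel
    return {"keypoints": keypoints, "descriptors": descriptors, "scale": scale}
```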
[0040] FIG. 3 is a flow chart of a high-level process of drift
correction, according to one embodiment, which can be executed by a
VIT or other tracking system. More specifically, means for
performing one or more of the illustrated components can include
hardware and/or software means described in further detail in
relation to FIG. 5, which may be logically separated into separate
components, such as the components described in FIG. 2. Some or all
of the components can be executed by hardware and/or software at an
operating system or device level. Other embodiments may include
alterations to the embodiments shown. Components shown in FIG. 3
may be performed in a different order and/or simultaneously,
according to different embodiments. Moreover, a person of ordinary
skill in the art will recognize many additions, omissions, and/or
other variations.
[0041] The process can start by receiving a camera image at block
310. The type of camera image can vary in resolution, color, and/or
other characteristics, depending on desired functionality, camera
hardware, and/or other factors. Moreover, the camera image may be a
discrete still image or may be one of several frames of video
captured by the camera. In some embodiments, the image may be
processed to a degree before it is received by a VIT, to facilitate
further image processing by the VIT.
[0042] At block 320, the VIT optionally receives WiFi signals,
which can facilitate the determination of which targets may be
included in the received image. For example, wireless signals may
be utilized together with a map of a venue that includes the
identity and locations of WiFi access points. If the locations of
certain access points can be determined from the map, the VIT can
get a rough estimate of where in the venue the VIT (and any mobile
device associated therewith) is. The VIT can do this by measuring
WiFi signals received from the WiFi access points by the mobile
device (e.g., measuring received signal strength (RSSI), round-trip
time (RTT), and/or other measurements) to determine a proximity of
the access points--including which access points may be closest.
This can then be compared with the map to determine a region in the
venue in which the mobile device is located.
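A minimal sketch of this rough-location step, assuming the venue map simply associates each access point with a named region (the data layout and function name are assumptions for illustration):

```python
def rough_region(rssi_by_ap, ap_regions, top_n=2):
    """Return the region(s) of the top-N strongest access points.
    rssi_by_ap maps AP identifier -> RSSI in dBm (higher = closer);
    ap_regions maps AP identifier -> region name from the venue map."""
    ranked = sorted(rssi_by_ap, key=rssi_by_ap.get, reverse=True)
    return {ap_regions[ap] for ap in ranked[:top_n] if ap in ap_regions}
```

RTT-based ranging or RSSI fingerprinting could refine this, but even a coarse region suffices to limit which targets need to be checked.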
[0043] The VIT of the mobile device can then reduce processing
loads related to target detection by determining nearby targets
based on the WiFi signals, at block 330, and limiting the set of
targets to detect to those nearby targets. Such optional
functionality can be
beneficial, for instance, when a customer starts an application
that uses the map only after having entered the venue. With no
location initially, the VIT can benefit from detecting rough
location from WiFi signals. Furthermore, tight integration of GPS
with VIT would allow for an initial position estimate using GPS
measurements (code, Doppler, carrier phase) in addition to the
regular VIT measurements listed above.
[0044] It will be understood that, although WiFi signals are
described in the embodiment shown in FIG. 3, additional or
alternative wireless signals may be utilized.
[0045] At block 340, the image is processed to detect targets. As
previously described, targets can be images or objects with known
locations. Thus, the VIT can utilize a detection algorithm in which
the image is processed to determine whether certain keypoints of
the image match with keypoints of known targets by, for example,
comparing the keypoints of the image with keypoints of one or more
known targets (e.g., targets having keypoints and location
information stored for comparison). Depending on the algorithm(s)
used, embodiments can implement a variety of detection and matching
techniques, from simple edge detection to the recognition of more
complex patterns, symbols, and more.
[0046] At block 350, a determination is made of whether a target is
in the image. Techniques for making the determination can vary
based on the detection algorithm(s) involved. In some embodiments,
detection algorithms can determine whether a target is in the image
by determining whether one or more keypoints in the image match
with one or more corresponding keypoints of a known target to a
degree above a threshold level of certainty.
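The threshold test just described might, under simplifying assumptions, be sketched as counting frame keypoints whose descriptors correlate with some target descriptor above a similarity threshold; the thresholds and the cosine-similarity matching are illustrative, not from the application.

```python
import numpy as np

def detect_target(frame_descs, target_descs, match_thresh=0.8, min_matches=8):
    """Count frame keypoints whose descriptor matches some target
    keypoint above match_thresh (cosine similarity); declare a
    detection when the count clears min_matches."""
    matches = 0
    for fd in frame_descs:
        scores = [float(np.dot(fd, td) /
                        (np.linalg.norm(fd) * np.linalg.norm(td) + 1e-12))
                  for td in target_descs]
        if scores and max(scores) > match_thresh:
            matches += 1
    return matches >= min_matches, matches
```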
[0047] If a target is not determined to be in an image, the process
ends (potentially restarting with the receipt of a new image).
However, if a target is determined to be in the image, a pose is
determined based on the camera image, at block 360. As explained
above with regard to FIG. 2, pose determination can utilize any of
a variety of techniques to determine pose based on the known
location of the target in the image, as well as information
obtained from and/or associated with the image itself. For example,
VIT can determine a distance and orientation of the target in
relation to the mobile device (e.g., by analyzing characteristics
of detected keypoints in the image, such as location, spacing,
etc.), and use this information, together with the known location
(and orientation) of the target, to determine a pose of the mobile
device.
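One ingredient of such an analysis can be sketched under a pinhole-camera assumption: with the target's real-world size known from its location information, its apparent size in pixels yields its distance from the camera. The names and the single-axis model are illustrative.

```python
def distance_to_target(focal_px, real_size_m, apparent_size_px):
    """Pinhole model: distance (metres) = focal length (pixels)
    * real size (metres) / apparent size (pixels)."""
    return focal_px * real_size_m / apparent_size_px
```

For example, a 0.5 m wide target spanning 100 pixels under a 1000-pixel focal length is 5 m away; a full pose would additionally recover orientation, e.g., from the layout of multiple keypoints.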
[0048] At block 370, the pose from the target can be provided to an
EKF of the VIT in the manner described previously, allowing the VIT
to correct (e.g., adjust or replace) a pose determination (which
may have been previously and/or separately determined from visual
and/or inertial sensor input), based on the pose provided in the
process of FIG. 3.
[0049] FIG. 4 is a flow diagram of another, more generalized method
400 of correcting drift in a VIT or other tracking system, according
to one embodiment. Means for performing one or more of the
components of the method 400 can include hardware and/or software
means described in further detail in relation to FIG. 5, which may
be logically separated into different components, such as the
components described in FIG. 2. The method 400, and other
techniques described herein, can be executed by hardware and/or
software at an operating system or device level. Alternative
embodiments may include alterations to the embodiments shown.
Components of the method 400, although illustrated in a particular
order, may be performed in a different order and/or simultaneously,
according to different embodiments. Moreover, a person of ordinary
skill in the art will recognize many additions, omissions, and/or
other variations.
[0050] At block 410, location information regarding a target is
obtained. As indicated previously, location information regarding a
target can include keypoints associated with coordinates in a
spatial coordinate system. Corresponding descriptors of one or more
targets can be associated with a map, such as a map of a location
in which a VIT system is used to track a mobile device's pose. In
some embodiments, this information (as well as information for
other targets of a venue) can be stored on a server of the venue,
and transferred to a mobile device (e.g., wirelessly via WiFi using
an application executed by the mobile device) when the mobile
device enters or approaches the venue.
[0051] At block 420, an image of the target is captured by the mobile
device. The image can be captured as part of a VIT tracking
process, and may be one of a series of video frames. As previously
described, the image may be processed to extract keypoints from the
image and use one or more detection algorithms to determine whether
the target is in the image. For example, algorithms may include
comparing one or more keypoints extracted from the image with one
or more keypoints of each target in a plurality of known targets of
a venue.
[0052] At block 430, measurements relating to a pose of the mobile
device are estimated using the keypoints positioned on the target.
As previously indicated, keypoints with known positions can be used
to reveal the pose of the mobile device in a spatial coordinate
system.
[0053] At block 440, a pose determination of the tracking system is
corrected using an EKF, based on the measurements relating to the
pose of the mobile device. As described above, the tracking system
can use visual and inertial information to make the pose
determination of the mobile device, which can be used in various
applications, such as indoor navigation, augmented reality, and
more. Because the pose determination is subject to drift, it can be
corrected (e.g., modified or replaced) by providing measurements
estimated from the image to an EKF. For example, a pose measurement
can be obtained from a target using the process of keypoint
detection, keypoint matching, outlier rejection, and pose
estimation as described above. The pose measurement can then be
provided to an EKF component to correct the pose of the mobile
device. In alternative embodiments, a correction to a pose may not
involve an EKF.
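Of the pipeline steps named above, outlier rejection can be illustrated with a minimal residual-threshold sketch (a practical implementation might instead use RANSAC; the threshold here is an assumption):

```python
import numpy as np

def reject_outliers(residuals_px, thresh_px=3.0):
    """Keep only matches whose reprojection residual is below a
    pixel threshold; returns the indices of the inlier matches."""
    r = np.asarray(residuals_px, dtype=float)
    return np.flatnonzero(r < thresh_px)
```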
[0054] Depending on desired functionality, different embodiments
may implement variations on the method 400 of correcting drift in
a VIT, as illustrated in FIG. 4. For example, in one implementation, a
customer may be able to simply point the phone at a target to
obtain a pose, a map, and his or her position on the map without
any prior and/or subsequent tracking. That is, the pose obtained
from a target may provide an initial pose to a VIT in addition to,
or as an alternative to, replacing a pose previously determined by the
VIT. In some embodiments, a VIT may further obtain absolute
coordinates of the target, enabling the VIT to correct a determined
pose of the mobile device using the keypoints of the target and
absolute coordinates associated therewith. A person of ordinary
skill in the art will recognize many additional variations.
[0055] FIG. 5 is a block diagram of an embodiment of a mobile
device 120, which can implement the techniques for correcting a
pose determination of the tracking system, such as the method 400
shown in FIG. 4. It should be noted that FIG. 5 is meant only to
provide a generalized illustration of various components, any or
all of which may be utilized as appropriate. Moreover, system
elements may be implemented in a relatively separated or relatively
more integrated manner. Additionally or alternatively, some or all
of the components shown in FIG. 5 can be utilized in another
computing device, which can be used in conjunction with a mobile
device 120 as previously described.
[0056] The mobile device 120 is shown comprising hardware elements
that can be electrically coupled via a bus 505 (or may otherwise be
in communication, as appropriate). The hardware elements may
include a processing unit 510 which can include without limitation
one or more general-purpose processors, one or more special-purpose
processors (such as digital signal processors (DSPs), graphics
acceleration processors, application specific integrated circuits
(ASICs), and/or the like), and/or other processing structure or
means, which can be configured to perform one or more of the
methods described herein, including methods illustrated in FIGS.
3-4. As shown in FIG. 5, some embodiments may have a separate DSP
520, depending on desired functionality. The mobile device 120 also
can include one or more input devices 570, which can include
without limitation one or more camera(s), a touch screen, a touch
pad, microphone, button(s), dial(s), switch(es), and/or the like;
and one or more output devices 515, which can include without
limitation a display, light emitting diode (LED), speakers, and/or
the like.
[0057] The mobile device 120 might also include a wireless
communication interface 530, which can include without limitation a
modem, a network card, an infrared communication device, a wireless
communication device, and/or a chipset (such as a Bluetooth.TM.
device, an IEEE 802.11 device, an IEEE 802.15.4 device, a WiFi
device, a WiMax device, cellular communication facilities, etc.),
and/or the like. The wireless communication interface 530 may
permit data to be exchanged with a network, wireless access points,
other computer systems, and/or any other electronic devices
described herein. The communication can be carried out via one or
more wireless communication antenna(s) 532 that send and/or receive
wireless signals 534.
[0058] Depending on desired functionality, the wireless
communication interface 530 can include separate transceivers to
communicate with base transceiver stations (e.g., base transceiver
stations of a cellular network) and access points. These different
data networks can include an OFDMA network and/or other types of
networks.
[0059] The mobile device 120 can further include sensor(s) 540, as
previously described. Such sensors can include, without limitation,
one or more accelerometer(s), gyroscope(s), camera(s),
magnetometer(s), altimeter(s), microphone(s), proximity sensor(s),
light sensor(s), and the like. At least a subset of the sensor(s)
540 can provide camera frames and/or inertial information used by a
VIT for tracking.
[0060] Embodiments of the mobile device may also include a
Satellite Positioning System (SPS) receiver 580 capable of
receiving signals 584 from one or more SPS satellites using an SPS
antenna 582. Such positioning can be utilized to complement and/or
be incorporated in the techniques described herein. It can be noted
that, as used herein, an SPS may include any combination of one or
more global and/or regional navigation satellite systems and/or
augmentation systems, and SPS signals may include SPS, SPS-like,
and/or other signals associated with such one or more SPS. GPS is
an example of an SPS.
[0061] The mobile device 120 may further include and/or be in
communication with a memory 560. The memory 560 can include,
without limitation, local and/or network accessible storage, a disk
drive, a drive array, an optical storage device, a solid-state
storage device, such as a random access memory ("RAM"), and/or a
read-only memory ("ROM"), which can be programmable,
flash-updateable, and/or the like. Such storage devices may be
configured to implement any appropriate data structures, such as
the FIFO and/or other memory utilized by the techniques described
herein, and may be allocated by hardware and/or software elements
of an OFDM receiver. Additionally or alternatively, data structures
described herein can be implemented by a cache or other local
memory of a DSP 520 or processing unit 510. Memory can further be
used to store an image stack, inertial sensor data, and/or other
information described herein.
[0062] The memory 560 of the mobile device 120 also can comprise
software elements (not shown), including an operating system,
device drivers, executable libraries, and/or other code, such as
one or more application programs, which may comprise computer
programs provided by various embodiments, and/or may be designed to
implement methods, and/or configure systems, provided by other
embodiments, as described herein. Merely by way of example, one or
more procedures described with respect to the method(s) discussed
above, such as the methods illustrated in FIGS. 3-4, might be
implemented as code and/or instructions executable by the mobile
device 120 (and/or processing unit 510 within a mobile device 120)
and/or stored on a non-transitory and/or machine-readable storage
medium (e.g., a "computer-readable storage medium," a
"machine-readable storage medium," etc.). In an aspect, then, such
code and/or instructions can be used to configure and/or adapt a
general purpose processor (or other device) to perform one or more
operations in accordance with the described methods.
[0063] It will be apparent to those skilled in the art that
substantial variations may be made in accordance with specific
requirements. For example, customized hardware might also be used,
and/or particular elements might be implemented in hardware,
software (including portable software, such as applets, etc.), or
both. Further, connection to other computing devices such as
network input/output devices may be employed.
[0064] The methods, systems, and devices discussed above are
examples. Various configurations may omit, substitute, or add
various procedures or components as appropriate. For instance, in
alternative configurations, the methods may be performed in an
order different from that described, and/or various stages may be
added, omitted, and/or combined. Also, features described with
respect to certain configurations may be combined in various other
configurations. Different aspects and elements of the
configurations may be combined in a similar manner. Also,
technology evolves and, thus, many of the elements are examples and
do not limit the scope of the disclosure or claims.
[0065] The term Computer Vision (CV) application as used herein
refers to a class of applications related to the acquisition,
processing, analyzing, and understanding of images. CV applications
include, without limitation, mapping, modeling--including 3D
modeling, navigation, augmented reality applications, and various
other applications where images acquired from an image sensor are
processed to build maps, models, and/or to derive/represent
structural information about the environment from the captured
images. In many CV applications, geometric information related to
captured images may be used to build a map, model, and/or other
representation of objects and/or other features in a physical
environment.
[0066] It can be further noted that, although examples described
herein are implemented by a mobile device, embodiments are not so
limited. Embodiments can include, for example, personal computers
and/or other electronics not generally considered "mobile." A
person of ordinary skill in the art will recognize many alterations
to the described embodiments.
[0067] The terms "and" and "or," as used herein, may include a
variety of meanings that are expected to depend at least in part
upon the context in which such terms are used. Typically, "or," if
used to associate a list such as A, B, or C, is intended to mean A,
B, and C (here used in the inclusive sense), as well as A, B, or C
(here used in the exclusive sense). In addition, the term "one or
more" as used herein may be used to describe any feature,
structure, or characteristic in the singular or may be used to
describe some combination of features, structures, or
characteristics. However, it should be noted that this is merely an
illustrative example and claimed subject matter is not limited to
this example. Furthermore, the term "at least one of" if used to
associate a list, such as A, B, or C, can be interpreted to mean
any combination of A, B, and/or C, such as A, AB, AA, AAB, AABBCCC,
etc.
[0068] Having described several example configurations, various
modifications, alternative constructions, and equivalents may be
used without departing from the spirit of the disclosure. For
example, the above elements may be components of a larger system,
wherein other rules may take precedence over or otherwise modify
the application of embodiments. Also, a number of steps may be
undertaken before, during, or after the above elements are
considered. Accordingly, the above description does not bound the
scope of the claims.
* * * * *