U.S. patent application number 14/051282 was filed with the patent office on 2013-10-10 and published on 2015-04-16 for a method and apparatus for device orientation tracking using a visual gyroscope.
This patent application is currently assigned to Nvidia Corporation. The applicant listed for this patent is Nvidia Corporation. The invention is credited to Josh Abbott, Jared Heinly, and Jim van Welzen.
Application Number: 14/051282
Publication Number: 20150103183
Family ID: 52809333
Filed: October 10, 2013
Published: 2015-04-16

United States Patent Application 20150103183
Kind Code: A1
Abbott, Josh; et al.
April 16, 2015

METHOD AND APPARATUS FOR DEVICE ORIENTATION TRACKING USING A VISUAL GYROSCOPE
Abstract
A method for tracking device orientation on a portable device is
disclosed. The method comprises initializing a device orientation
to a sensor orientation, wherein the sensor orientation is based on
information from an inertial measurement unit (IMU) sensor. It also
comprises initiating visual tracking using a camera on the portable
device and capturing a frame. Next, it comprises determining a
plurality of visual features in the frame and matching the frame to
a keyframe, wherein capture of the keyframe precedes capture of the
frame. Subsequently, it comprises computing a rotation amount
between the frame and the keyframe. Responsive to a determination
that a rotational distance between the frame and the keyframe
exceeds a predetermined threshold, promoting the frame to a
keyframe status and adding it to a first orientation map and
adjusting the frame with all prior captured keyframes.
Inventors: Abbott, Josh (Raleigh, NC); Heinly, Jared (Chapel Hill, NC); van Welzen, Jim (Durham, NC)

Applicant: Nvidia Corporation, Santa Clara, CA, US

Assignee: Nvidia Corporation, Santa Clara, CA

Family ID: 52809333

Appl. No.: 14/051282

Filed: October 10, 2013

Current U.S. Class: 348/169

Current CPC Class: G01C 21/165 20130101; G06T 7/74 20170101; H04N 5/2621 20130101; H04N 5/272 20130101; G06T 2207/30244 20130101; H04N 5/23264 20130101; G01C 11/00 20130101; G06T 7/248 20170101; H04N 5/23293 20130101; G06T 2207/10016 20130101

Class at Publication: 348/169

International Class: G06T 7/00 20060101 G06T007/00; G06T 19/00 20060101 G06T019/00
Claims
1. A method for tracking device orientation on a portable device,
said method comprising: initializing a device orientation to a
sensor orientation, wherein said sensor orientation is based on
information from an inertial measurement unit (IMU) sensor;
initiating visual tracking using a camera on said portable device;
capturing a frame using said camera; determining a plurality of
visual features in said frame; matching said frame to a keyframe,
wherein capture of said keyframe precedes capture of said frame,
and wherein said keyframe was captured using visual tracking;
computing a rotation amount between said frame and said keyframe;
responsive to a determination that a rotational distance between
said frame and said keyframe exceeds a predetermined threshold,
promoting said frame to a keyframe status and adding said frame to
a first orientation map; and performing an adjustment of said frame
with all prior captured keyframes.
2. The method of claim 1, wherein said determining comprises:
computing repeatable and distinct features in said frame using a
feature detecting procedure; and detecting and describing said
features using a feature description procedure.
3. The method of claim 2, wherein said feature detecting procedure
is selected from a group consisting of: FAST, Harris &
Stephens, Plessey, Shi-Tomasi, Moravec corner detection, Wang and
Brady corner detection, and SUSAN corner detector.
4. The method of claim 2, wherein said feature description
procedure is selected from a group consisting of: SURF, SIFT and
BRIEF.
5. The method of claim 1, further comprising: responsive to a
determination that a rotational distance between said frame and
said keyframe is within said predetermined threshold, continuing to
search for a next keyframe.
6. The method of claim 1, wherein said matching is performed using
RANSAC.
7. The method of claim 1, wherein said computing is performed using
Horn's procedure.
8. The method of claim 1 further comprising: responsive to a loss
of said visual tracking, determining device orientation by
combining delta values from said IMU sensor with a last known
orientation measurement from said visual tracking; and building a
second orientation map, wherein said second orientation map is
created using data from said IMU sensor.
9. The method of claim 8, further comprising: determining if said
second orientation map overlaps with keyframes from said first
orientation map; and responsive to a determination of overlap,
deleting said second orientation map and continuing to build said
first orientation map.
10. The method of claim 8, further comprising: determining if said
second orientation map overlaps with keyframes from said first
orientation map; and responsive to a determination of overlap,
merging said second orientation map with said first orientation
map.
11. The method of claim 8, further comprising: determining if said
second orientation map overlaps with keyframes from said first
orientation map; and responsive to a determination of no overlap,
continuing to build said second orientation map.
12. A computer-readable storage medium having stored thereon
instructions that, if executed by a computer system, cause the
computer system to perform a method for tracking device orientation
on a portable device, said method comprising: initializing a device
orientation to a sensor orientation, wherein said sensor
orientation is based on information from an inertial measurement
unit (IMU) sensor; initiating visual tracking using a camera on
said portable device; capturing a frame using said camera;
determining a plurality of visual features in said frame; matching
said frame to a keyframe, wherein capture of said keyframe precedes
capture of said frame, and wherein said keyframe was captured using
visual tracking; computing a rotation amount between said frame and
said keyframe; responsive to a determination that a rotational
distance between said frame and said keyframe exceeds a
predetermined threshold, promoting said frame to a keyframe status
and adding said frame to a first orientation map; and performing an
adjustment of said frame with all prior captured keyframes.
13. The computer-readable medium as described in claim 12, wherein
said determining comprises: computing repeatable and distinct
features in said frame using a feature detecting procedure; and
detecting and describing said features using a feature description
procedure.
14. The computer-readable medium as described in claim 13, wherein
said feature detecting procedure is selected from a group
consisting of: FAST, Harris & Stephens, Plessey, Shi-Tomasi,
Moravec corner detection, Wang and Brady corner detection, and
SUSAN corner detector.
15. The computer-readable medium as described in claim 13, wherein
said feature description procedure is selected from a group
consisting of: SURF, SIFT and BRIEF.
16. The computer-readable medium as described in claim 12, further
comprising: responsive to a determination that a rotational
distance between said frame and said keyframe is within said
predetermined threshold, continuing to search for a next
keyframe.
17. The computer-readable medium as described in claim 12, wherein
said matching is performed using RANSAC.
18. The computer-readable medium as described in claim 12, wherein
said computing is performed using Horn's procedure.
19. The computer-readable medium as described in claim 12, wherein
said method further comprises: responsive to a loss of said visual
tracking, determining device orientation by combining delta values
from said IMU sensor with a last known orientation measurement from
said visual tracking; and building a second orientation map,
wherein said second orientation map is created using data from said
IMU sensor.
20. The computer-readable medium as described in claim 19, wherein
said method further comprises: determining if said second
orientation map overlaps with keyframes from said first orientation
map; and responsive to a determination of overlap, deleting said
second orientation map and continuing to build said first
orientation map.
21. The computer-readable medium as described in claim 19, further
comprising: determining if said second orientation map overlaps
with keyframes from said first orientation map; and responsive to a
determination of overlap, merging said second orientation map with
said first orientation map.
22. The computer-readable medium as described in claim 19, further
comprising: determining if said second orientation map overlaps
with keyframes from said first orientation map; and responsive to a
determination of no overlap, continuing to build said second
orientation map.
23. A system for tracking device orientation on a portable device,
said system comprising: a display screen; a memory; a camera; and a
processor configured to implement a visual gyroscope, wherein said
visual gyroscope performs a method for tracking device orientation
on said portable device, wherein said method comprises:
initializing a device orientation to a sensor orientation, wherein
said sensor orientation is based on information from an inertial
measurement unit (IMU) sensor; initiating visual tracking using a
camera on said portable device; capturing a frame using said
camera; determining a plurality of visual features in said frame;
matching said frame to a keyframe, wherein capture of said keyframe
precedes capture of said frame, and wherein said keyframe was
captured using visual tracking; computing a rotation amount between
said frame and said keyframe; responsive to a determination that a
rotational distance between said frame and said keyframe exceeds a
predetermined threshold, promoting said frame to a keyframe status
and adding said frame to a first orientation map; and performing an
adjustment of said frame with all prior captured keyframes.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
Related Applications
[0001] The present application is related to U.S. patent
application Ser. No. ______, filed ______, entitled "A METHOD AND
APPARATUS FOR LONG TERM IMAGE EXPOSURE WITH IMAGE STABILIZATION ON
A MOBILE DEVICE," naming Syed Zahir Bokari, Josh Abbott, and Jim
van Welzen as inventors, and having attorney docket number
NVID-PDU-13-0254-US1. That application is incorporated herein by
reference in its entirety and for all purposes.
FIELD OF THE INVENTION
[0002] Embodiments according to the present invention generally
relate to augmented reality systems and more specifically to device
orientation tracking for augmented reality systems.
BACKGROUND OF THE INVENTION
[0003] While augmented reality technologies have improved
considerably in recent years, the improvements typically come at
the cost of computationally intensive procedures implemented on
expensive hardware. The high cost of implementing effective
augmented reality systems is a barrier to entry that prevents
casual users from having access to such systems on everyday devices
that have relatively low processing capability, e.g., tablet
computers, phones and other hand-held devices.
[0004] A critical component in successfully implementing augmented
reality systems is device orientation tracking. In other words,
tracking the orientation of users and objects in the scene is
critical for developing augmented reality applications. Procedures
implemented in conventional augmented reality applications for
tracking device orientation with expected levels of robustness,
speed and precision are computationally expensive. Accordingly,
they are not ideal for handheld mobile devices.
[0005] For example, prior approaches for tracking device
orientation use full Simultaneous Localization and Mapping (SLAM)
algorithms, which attempt to determine a device's position as well
as orientation. SLAM is a technique used by augmented reality
applications to build a map of the environment of the device while at
the same time keeping track of the device's current location, where
the current location includes both the device's position and also
its orientation. SLAM generally works by creating geometrically
consistent maps of the environment using inputs from different
types of sensors, e.g., 2D cameras, 3D sonar sensors, single
dimensional beams or 2D sweeping laser rangefinders etc. Building a
3D map of the environment can be significantly intensive from a
computation standpoint, in part, because it involves tracking the
device's position as well as orientation. Because of the
considerable computational requirements for implementing SLAM, it
is not a suitable procedure for implementation on a mobile device,
such as a smart phone.
[0006] Other conventional augmented reality systems are unsuitable
for determining device orientation on mobile devices because they
suffer from problems such as drift error. For example, one approach
used in conventional devices for tracking device orientation uses
only Inertial Measurement Unit (IMU) sensors. An IMU is an
electronic device that can measure and report on a device's
velocity, orientation or gravitational forces, using a combination
of inputs from other devices such as an accelerometer, gyroscope, and
magnetometer. A major disadvantage of using IMUs is that they
typically suffer from accumulated error. Because the guidance
system is continually adding detected changes to its
previously-calculated positions, any errors in measurement, however
small, are accumulated from point to point. This leads to drift
error, or an ever-increasing difference between where the system
thinks it is located and the actual location, where, as stated
above, location includes both the device's position and also its
orientation. Stated differently, drift is a problem because
integration of the orientation tracking's relative measurements
accumulates the small errors in each measurement. This can,
consequently, create significant differences between the estimated
and actual orientation. Further, measurements from IMU sensors tend
to be error prone and noisy. Thus, IMU sensors are typically not
suitable for an immersive augmented reality environment.
[0007] Conventional methods of tracking device orientation for
augmented reality systems, therefore, are unsuitable for use on
mobile devices because they are either too expensive and
computationally intensive or simply unsuitable as a result of
problems such as noise and drift error.
BRIEF SUMMARY OF THE INVENTION
[0008] Accordingly, a need exists for a system and a method for
tracking device orientation on a mobile device that has a smaller
compute footprint. For example, embodiments of the present
invention track device orientation without constructing an entire
map of the environment or even trying to calculate the device's
position. Further, embodiments of the present invention
advantageously track device orientation without requiring a user to
change the position of the camera on the mobile device. As a
result, device orientation can be tracked more efficiently and
quickly using smaller and more affordable electronic
components.
[0009] Further, a need exists for systems and methods for a
vision-based orientation tracking procedure on a mobile device that
uses the camera on the device to determine orientation by
identifying and tracking landmarks (or image features) in a natural
environment. As a result, embodiments of the present invention
advantageously provide a robust, fast and precise orientation
tracking solution while avoiding the pitfalls of noise and drift
error that is prevalent in conventional IMU sensor-based systems.
Moreover, embodiments of the present invention use markerless
tracking to provide more accurate device orientation than naive
sensor-based approaches. In one embodiment of the present
invention, IMU sensors are utilized as a fallback option if
vision-based tracking fails.
[0010] In one embodiment, a method for tracking device orientation
on a portable device is disclosed. The method comprises
initializing a device orientation to a sensor orientation, wherein
the sensor orientation is based on information from an inertial
measurement unit (IMU) sensor. It also comprises initiating visual
tracking using a camera on the portable device and capturing a
frame. Next, it comprises determining a plurality of visual
features in the frame and matching the frame to a keyframe, wherein
capture of the keyframe precedes capture of the frame.
Subsequently, it comprises computing a rotation amount between the
frame and the keyframe. Responsive to a determination that a
rotational distance between the frame and the keyframe exceeds a
predetermined threshold, promoting the frame to a keyframe status
and adding it to a first orientation map and adjusting the frame
with all prior captured keyframes.
[0011] In another embodiment, a computer-readable storage medium
having stored thereon instructions that, if executed by a computer
system, cause the computer system to perform a method for tracking
device orientation on a portable device is disclosed. The method
comprises initializing a device orientation to a sensor
orientation, wherein the sensor orientation is based on information
from an inertial measurement unit (IMU) sensor. It also comprises
initiating visual tracking using a camera on the portable device
and capturing a frame. Next, it comprises determining a plurality
of visual features in the frame and matching the frame to a
keyframe, wherein capture of the keyframe precedes capture of the
frame. Subsequently, it comprises computing a rotation amount
between the frame and the keyframe. Responsive to a determination
that a rotational distance between the frame and the keyframe
exceeds a predetermined threshold, promoting the frame to a
keyframe status and adding it to a first orientation map and
adjusting the frame with all prior captured keyframes.
[0012] In a different embodiment, a system for tracking device
orientation on a portable device is presented. The system comprises
a display screen; a memory; a camera; and a processor configured to
implement a visual gyroscope, wherein the visual gyroscope performs
a method for tracking device orientation on the portable device.
The method comprises initializing a device orientation to a sensor
orientation, wherein the sensor orientation is based on information
from an inertial measurement unit (IMU) sensor. It also comprises
initiating visual tracking using a camera on the portable device
and capturing a frame. Next, it comprises determining a plurality
of visual features in the frame and matching the frame to a
keyframe, wherein capture of the keyframe precedes capture of the
frame. Subsequently, it comprises computing a rotation amount
between the frame and the keyframe. Responsive to a determination
that a rotational distance between the frame and the keyframe
exceeds a predetermined threshold, promoting the frame to a
keyframe status and adding it to a first orientation map and
adjusting the frame with all prior captured keyframes.
[0013] The following detailed description together with the
accompanying drawings will provide a better understanding of the
nature and advantages of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] Embodiments of the present invention are illustrated by way
of example, and not by way of limitation, in the figures of the
accompanying drawings and in which like reference numerals refer to
similar elements.
[0015] FIG. 1 shows an exemplary computer system with a camera used
to implement a visual gyroscope for tracking device orientation in
accordance with one embodiment of the present invention.
[0016] FIG. 2 shows an exemplary operating environment of a mobile
device capable of tracking device orientation in accordance with
one embodiment of the present invention.
[0017] FIG. 3 illustrates a use case for the visual gyroscope in
accordance with one embodiment of the present invention.
[0018] FIG. 4 is a high level block diagram illustrating the
elements of the orientation tracking system proposed in accordance
with an embodiment of the present invention.
[0019] FIG. 5 is a diagram of a visual gyroscope capturing an
initial keyframe in accordance with an embodiment of the present
invention.
[0020] FIG. 6 is an illustration of the visual gyroscope capturing
a second keyframe following a user rotation from an initial
direction in accordance with an embodiment of the present
invention.
[0021] FIG. 7 is an illustration of the visual gyroscope starting a
new map based on IMU sensor data in accordance with an embodiment
of the present invention.
[0022] FIG. 8 is an illustration of the visual gyroscope creating a
new map based on IMU sensor data until an overlap is found with the
map created from vision-based data in accordance with an embodiment
of the present invention.
[0023] FIG. 9 depicts a flowchart of an exemplary computer
implemented process of tracking device orientation in accordance
with one embodiment of the present invention.
[0024] FIG. 10 depicts a flowchart of an exemplary computer
implemented process of using sensors for tracking device
orientation when visual tracking is lost in accordance with one
embodiment of the present invention.
[0025] In the figures, elements having the same designation have
the same or similar function.
DETAILED DESCRIPTION OF THE INVENTION
[0026] Reference will now be made in detail to the various
embodiments of the present disclosure, examples of which are
illustrated in the accompanying drawings. While described in
conjunction with these embodiments, it will be understood that they
are not intended to limit the disclosure to these embodiments. On
the contrary, the disclosure is intended to cover alternatives,
modifications and equivalents, which may be included within the
spirit and scope of the disclosure as defined by the appended
claims. Furthermore, in the following detailed description of the
present disclosure, numerous specific details are set forth in
order to provide a thorough understanding of the present
disclosure. However, it will be understood that the present
disclosure may be practiced without these specific details. In
other instances, well-known methods, procedures, components, and
circuits have not been described in detail so as not to
unnecessarily obscure aspects of the present disclosure.
[0027] Notation and Nomenclature
[0028] Some portions of the detailed descriptions that follow are
presented in terms of procedures, steps, logic blocks, processing,
and other symbolic representations of operations on data bits
within a computer memory. These descriptions and representations
are the means used by those skilled in the data processing arts to
most effectively convey the substance of their work to others
skilled in the art. In the present application, a procedure, logic
block, process, or the like, is conceived to be a self-consistent
sequence of steps or instructions leading to a desired result. The
steps are those utilizing physical manipulations of physical
quantities. Usually, although not necessarily, these quantities
take the form of electrical or magnetic signals capable of being
stored, transferred, combined, compared, and otherwise manipulated
in a computer system. It has proven convenient at times,
principally for reasons of common usage, to refer to these signals
as transactions, bits, values, elements, symbols, characters,
samples, pixels, or the like.
[0029] It should be borne in mind, however, that all of these and
similar terms are to be associated with the appropriate physical
quantities and are merely convenient labels applied to these
quantities. Unless specifically stated otherwise as apparent from
the following discussions, it is appreciated that throughout the
present disclosure, discussions utilizing terms such as
"capturing," "determining," "matching," "promoting," "bundling,"
"storing," or the like, refer to actions and processes (e.g.,
flowchart 900 of FIG. 9) of a computer system or similar electronic
computing device or processor (e.g., system 100 of FIG. 1). The
computer system or similar electronic computing device manipulates
and transforms data represented as physical (electronic) quantities
within the computer system memories, registers or other such
information storage, transmission or display devices.
[0030] Embodiments described herein may be discussed in the general
context of computer-executable instructions residing on some form
of computer-readable storage medium, such as program modules,
executed by one or more computers or other devices. By way of
example, and not limitation, computer-readable storage media may
comprise non-transitory computer-readable storage media and
communication media; non-transitory computer-readable media include
all computer-readable media except for a transitory, propagating
signal. Generally, program modules include routines, programs,
objects, components, data structures, etc., that perform particular
tasks or implement particular abstract data types. The
functionality of the program modules may be combined or distributed
as desired in various embodiments.
[0031] Computer storage media includes volatile and nonvolatile,
removable and non-removable media implemented in any method or
technology for storage of information such as computer-readable
instructions, data structures, program modules or other data.
Computer storage media includes, but is not limited to, random
access memory (RAM), read only memory (ROM), electrically erasable
programmable ROM (EEPROM), flash memory or other memory technology,
compact disk ROM (CD-ROM), digital versatile disks (DVDs) or other
optical storage, magnetic cassettes, magnetic tape, magnetic disk
storage or other magnetic storage devices, or any other medium that
can be used to store the desired information and that can be accessed
to retrieve that information.
[0032] Communication media can embody computer-executable
instructions, data structures, and program modules, and includes
any information delivery media. By way of example, and not
limitation, communication media includes wired media such as a
wired network or direct-wired connection, and wireless media such
as acoustic, radio frequency (RF), infrared, and other wireless
media. Combinations of any of the above can also be included within
the scope of computer-readable media.
[0033] FIG. 1 shows an exemplary computer system with a camera used
to implement a visual gyroscope for tracking device orientation in
accordance with one embodiment of the present invention. Computer
system 100 depicts the components of a generic computer system in
accordance with embodiments of the present invention providing the
execution platform for certain hardware-based and software-based
functionality. In general, computer system 100 comprises at least
one CPU 101, a system memory 115, and at least one graphics
processor unit (GPU) 110. The CPU 101 can be coupled to the system
memory 115 via a bridge component/memory controller (not shown) or
can be directly coupled to the system memory 115 via a memory
controller (not shown) internal to the CPU 101. The GPU 110 may be
coupled to a display 112. One or more additional GPUs can
optionally be coupled to system 100 to further increase its
computational power. The GPU(s) 110 is coupled to the CPU 101 and
the system memory 115. The GPU 110 can be implemented as a discrete
component, a discrete graphics card designed to couple to the
computer system 100 via a connector (e.g., AGP slot, PCI-Express
slot, etc.), a discrete integrated circuit die (e.g., mounted
directly on a motherboard), or as an integrated GPU included within
the integrated circuit die of a computer system chipset component
(not shown). Additionally, a local graphics memory 114 can be
included for the GPU 110 for high bandwidth graphics data
storage.
[0034] The CPU 101 and the GPU 110 can also be integrated into a
single integrated circuit die and the CPU and GPU may share various
resources, such as instruction logic, buffers, functional units and
so on, or separate resources may be provided for graphics and
general-purpose operations. The GPU may further be integrated into
a core logic component. Accordingly, any or all the circuits and/or
functionality described herein as being associated with the GPU 110
can also be implemented in, and performed by, a suitably equipped
CPU 101. Additionally, while embodiments herein may make reference
to a GPU, it should be noted that the described circuits and/or
functionality can also be implemented in other types of processors
(e.g., general purpose or other special-purpose coprocessors) or
within a CPU.
[0035] System 100 can be implemented as, for example, a desktop
computer system or server computer system having a powerful
general-purpose CPU 101 coupled to a dedicated graphics rendering
GPU 110. In such an embodiment, components can be included that add
peripheral buses, specialized audio/video components, IO devices,
and the like. Similarly, system 100 can be implemented as a
handheld device (e.g., cell-phone, tablet computer, MP3 player,
etc.), direct broadcast satellite (DBS)/terrestrial set-top box or
a set-top video game console device such as, for example, the
Xbox® or the PlayStation® 3. System 100 can also be
implemented as a "system on a chip", where the electronics (e.g.,
the components 101, 115, 110, 114, and the like) of a computing
device are wholly contained within a single integrated circuit die.
Examples include a hand-held instrument with a display, a car
navigation system, a portable entertainment system, and the
like.
[0036] A Method and Apparatus for Device Orientation Tracking Using
a Visual Gyroscope
[0037] Embodiments of the present invention provide a system and a
method for tracking device orientation on a mobile device that has
a smaller compute footprint. For example, embodiments of the
present invention track device orientation without constructing an
entire map of the environment or even trying to calculate the
device's position. Further, embodiments of the present invention
advantageously track device orientation without requiring a user to
change the position of the camera on the mobile device. As a
result, device orientation can be tracked more efficiently and
quickly using smaller and more affordable electronic
components.
[0038] Embodiments of the present invention provide a system and a
method for a vision-based orientation tracking procedure on a
mobile device that uses the camera on the device to determine
orientation by identifying and tracking landmarks (or image
features) in a natural environment. In one embodiment, the present
invention is a visual gyroscope that is operable to precisely
determine a handheld device's orientation using vision-based
techniques. As a result, embodiments of the present invention
advantageously provide a robust, fast and precise orientation
tracking solution while avoiding the pitfalls of noise and drift
error prevalent in conventional IMU sensor-based systems.
Embodiments of the present invention use markerless tracking to
provide more accurate device orientation than naive
sensor-based approaches. In one embodiment of the present
invention, IMU sensors are utilized simply as a fallback option if
vision-based tracking fails.
[0039] FIG. 2 shows an exemplary operating environment of a mobile
device capable of tracking device orientation in accordance with
one embodiment of the present invention. System 200 includes camera
202, image signal processor (ISP) 204, memory 206, IMU sensor 240,
input module 208, central processing unit (CPU) 210, display 212,
communications bus 214, and power source 220. Power source 220
provides power to system 200 and may be a DC or AC power source.
System 200 depicts the components of an exemplary system in
accordance with embodiments of the present invention providing the
execution platform for certain hardware-based and software-based
functionality. Although specific components are disclosed in system
200, it should be appreciated that such components are examples.
That is, embodiments of the present invention are well suited to
having various other components or variations of the components
recited in system 200. It is appreciated that the components in
system 200 may operate with other components other than those
presented, and that not all of the components of system 200 may be
required to achieve the goals of system 200.
[0040] CPU 210 and the ISP 204 can also be integrated into a single
integrated circuit die and CPU 210 and ISP 204 may share various
resources, such as instruction logic, buffers, functional units and
so on, or separate resources may be provided for image processing
and general-purpose operations. System 200 can be implemented as,
for example, a digital camera, cell phone camera, portable device
(e.g., audio device, entertainment device, handheld device),
webcam, video device (e.g., camcorder) or any other device with a
front or back facing camera.
[0041] In one embodiment, camera 202 captures light via a
front-facing or back-facing lens (depending on how the user
typically holds the device), and converts the light received into a
signal (e.g., digital or analog). Camera 202 may comprise any of a
variety of optical sensors including, but not limited to,
complementary metal-oxide-semiconductor (CMOS) or charge-coupled
device (CCD) sensors. Camera 202 is coupled to communications bus
214 and may provide image data received over communications bus
214. Camera 202 may comprise functionality to determine and
configure optical properties and settings including, but not
limited to, focus, exposure, color or white balance, and areas of
interest (e.g., via a focus motor, aperture control, etc.). In one
embodiment, camera 202 may also represent a front facing and a back
facing camera both of which are operable to capture images
contemporaneously.
[0042] Image signal processor (ISP) 204 is coupled to
communications bus 214 and processes the signal generated by camera
202, as described herein. More specifically, image signal processor
204 may process data from camera 202 for storing in memory 206. For
example, image signal processor 204 may process frames of visual
data captured using camera 202 to be stored within memory 206.
[0043] Input module 208 allows entry of commands into system 200
which may then, among other things, control the sampling of data by
camera 202 and subsequent processing by ISP 204. Input module 208
may include, but is not limited to, navigation pads, keyboards
(e.g., QWERTY), up/down buttons, touch screen controls (e.g., via
display 212) and the like.
[0044] Central processing unit (CPU) 210 receives commands via
input module 208 and may control a variety of operations including,
but not limited to, sampling and configuration of camera 202,
processing by ISP 204, and management (e.g., addition, transfer,
and removal) of images and/or video from memory 206.
[0045] Inertial Measurement Unit (IMU) module 240 can detect the
current rate of acceleration of the device 200 using one or more
accelerometers in device 200 (not shown). Accelerometers detect
acceleration forces along a single axis; three are often combined
to provide acceleration detection along the x, y, and z axes. When
the accelerometer is at rest, the axis pointing down will read one
(1 g) due to the force of gravity, and the two horizontal axes will
read zero.
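For illustration only, the following sketch shows how a single at-rest accelerometer sample can be converted into pitch and roll. The axis convention and the function name are assumptions made for this example and are not specified by the present description.

```python
import math

def pitch_roll_from_accel(ax, ay, az):
    """Estimate pitch and roll (radians) from one at-rest accelerometer
    sample, where the only sensed force is gravity.

    Assumed axis convention: x points right, y points up the screen,
    z points out of the screen toward the user (device lying face up).
    """
    # Roll: rotation about the y axis, from the x and z components.
    roll = math.atan2(ax, az)
    # Pitch: rotation about the x axis, from y against the x/z magnitude.
    pitch = math.atan2(-ay, math.hypot(ax, az))
    return pitch, roll

# Device lying flat and face up: gravity is all on the z axis (~1 g).
print(pitch_roll_from_accel(0.0, 0.0, 1.0))  # -> (0.0, 0.0)
```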
[0046] The IMU module 240 can also detect changes in rotational
attributes like pitch, roll, and yaw using one or more gyroscopes
in device 200 (not shown). A gyroscope detects the rotational
change of a device. Finally, IMU module 240 may also receive data
from a magnetometer (not shown), which is a sensor for measuring
the strength and direction of magnetic fields, and can be used for
tracking magnetic north, thereby acting like a compass.
Alternatively, the IMU module 240 may receive direction data from a
digital compass.
[0047] FIG. 3 illustrates a use case for the visual gyroscope in
accordance with one embodiment of the present invention. As the
stationary user 310 pans the camera on his handheld device around,
the visual gyroscope of the present invention creates a spherical
map of the local environment using vision data from the camera by
dynamically identifying and tracking features in the environment.
In order to create the spherical map, the visual gyroscope needs to
track the orientation of the device. Stated differently, the visual
gyroscope of the present invention can track features or landmarks
in the environment and use information from the tracking to
determine the handheld device's orientation, which is used in order
to create a spherical map of the environment for identification
thereof. The map can then be used in, for example, augmented
reality applications, where the user could obtain more information
regarding the landmarks in the scene identified through the visual
gyroscope. For example, user 310 in FIG. 3 could identify the
building that the camera pans across in section 320. Or the user
may, for example, be able to determine whether a particular
restaurant in the building is open or closed by receiving real time
information concerning the identified restaurant. By way of further
example, the user may be able to overlay certain graphics on
landmarks within the field of view with information regarding the
identified businesses, e.g., restaurant menus on restaurants, menu
of services rendered by a spa, etc.
[0048] Unlike conventional SLAM based and other approaches, the
user is not required to change the position of the camera to build
a model of the scene and construct an entire map of the
environment. A stationary user can capture the feature map with
natural panning motion.
[0049] FIG. 4 is a high level block diagram illustrating the
elements of the orientation tracking system 405 proposed in
accordance with an embodiment of the present invention. The basic
approach employed in the present invention is to use vision-based
methods with a visual gyroscope for image feature recognition, while
also combining vision data with IMU sensor data for robustness. As
explained above, relying on IMU sensor data alone leads to drift,
which causes marker misplacement. Thus, embodiments of the present
invention use vision data to correct for drift, but also use IMU
sensor data as a back-up when acquisition of visual data is lost or
breaks down.
[0050] As shown in FIG. 4, data from the gyroscope 414,
accelerometer 416 and magnetometer 418 feeds into the IMU sensor
420. The output of the camera 412 and the IMU sensor 420 is
transmitted to the user application 422. The time-stamped camera
data and IMU sensor data 410 from user application 422 is directed
to the visual gyroscope module 480. The visual gyroscope module 480
uses the time-stamped camera and sensor data to create time-stamped
device orientation data 490. The orientation data 490 is fed back
into the user application 422. In one embodiment, user application
422 can be an augmented reality application used, for example, to
locate nearby restaurants.
[0051] FIG. 5 is a diagram of a visual gyroscope capturing an
initial keyframe in accordance with an embodiment of the present
invention. The visual gyroscope module creates an orientation map
524 by capturing keyframes. A keyframe is a snapshot taken by the
camera in the direction it is currently pointing, when that
direction is rotated sufficiently far from a prior keyframe. This keyframe
is saved as part of an orientation map and can be found if the
device gets lost while the user is panning around. As shown in FIG.
5, in order to create orientation map 524, the device captures an
initial keyframe 522 at user direction 526.
[0052] When the system 200 is first started, the visual gyroscope
module is initialized to the absolute orientation of the device
read from the IMU sensor 420. This initialization plays an
important role when visual tracking fails and will be explained
further below.
[0053] After every captured frame, including the initial keyframe,
the visual gyroscope procedure computes visual features that are
repeatable and distinct within the frame. Any number of feature
detecting procedures can be employed for this purpose, e.g., Harris
& Stephens, Plessey, Shi-Tomasi, Moravec corner detection, Wang
and Brady corner detection, SUSAN (smallest univalue segment
assimilating nucleus) corner detector etc. In one embodiment, the
visual gyroscope uses FAST (Features from Accelerated Segment Test)
for feature detection. FAST is a well-known corner detection
algorithm. Corner detection is an approach used within computer
vision systems to extract certain kinds of features and infer the
contents of an image.
[0054] The visual gyroscope procedure then performs feature
description for the features found from feature detection using a
well-known procedure, e.g., SURF (Speeded Up Robust Feature), Scale
Invariant Feature Transform (SIFT), or BRIEF (Binary Robust
Independent Elementary Features). In one embodiment, the visual
gyroscope module uses BRIEF for feature description. Feature
description comprises detecting and describing local features in
images. For example, for any object in an image, interesting points
on the object can be extracted to provide a feature description of
the object. This description, extracted from a training image, can
then be used to identify the object when attempting to locate the
object in a test image containing many other objects. Accordingly,
the visual gyroscope procedure, in one embodiment, can use BRIEF
for feature description of objects within the captured frame.
[0055] The feature detection procedure, e.g., FAST and the feature
description procedure, e.g., BRIEF are run on every captured frame
including the initial keyframe. Feature detection procedures find
features in the image while feature description procedures describe
the feature in the sequence of bits so as to compare it with
similar features in other frames. In one embodiment, the smoothing
operation in BRIEF can be removed to speed up the procedure so that
it runs in real time.
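As a concrete illustration only, the following sketch runs FAST detection and BRIEF description on a single frame using OpenCV. The present description does not mandate any particular implementation; the BRIEF extractor shown lives in the opencv-contrib-python package (cv2.xfeatures2d), and the threshold value is an arbitrary example.

```python
import cv2

def detect_and_describe(frame_bgr):
    """Detect FAST corners and compute BRIEF descriptors for one frame.

    Returns (keypoints, descriptors); descriptors is an N x 32 uint8
    array (256-bit binary BRIEF strings), or None if nothing was found.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)

    # FAST corner detector; the threshold is an illustrative value.
    fast = cv2.FastFeatureDetector_create(threshold=25, nonmaxSuppression=True)
    keypoints = fast.detect(gray, None)

    # BRIEF descriptor extractor (requires opencv-contrib-python).
    brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()
    keypoints, descriptors = brief.compute(gray, keypoints)
    return keypoints, descriptors
```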
[0056] In one embodiment, the feature points are projected onto the
image plane as seen by the "ideal" camera by applying the inverse
camera matrix K^-1 to the (x, y, 1) column vector. In one embodiment, the
newly transformed points are normalized to make a spherical
representation of the points. All features in every frame will be
warped in this fashion.
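A minimal numpy sketch of this warping step follows, under the assumption of a standard 3x3 pinhole intrinsics matrix K; the function name and the example intrinsics are illustrative only.

```python
import numpy as np

def project_to_unit_sphere(points_xy, K):
    """Back-project pixel coordinates through the inverse intrinsics
    K^-1 and normalize each ray, giving points on the unit sphere.

    points_xy: (N, 2) array of pixel coordinates (x, y).
    K:         (3, 3) camera intrinsics matrix.
    """
    n = points_xy.shape[0]
    # Homogeneous pixel coordinates (x, y, 1) as column vectors.
    homog = np.hstack([points_xy, np.ones((n, 1))]).T       # (3, N)
    rays = np.linalg.inv(K) @ homog                          # (3, N)
    rays /= np.linalg.norm(rays, axis=0, keepdims=True)      # unit length
    return rays.T                                            # (N, 3)

# Example intrinsics (illustrative values only).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
print(project_to_unit_sphere(np.array([[320.0, 240.0]]), K))  # principal ray
```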
[0057] FIG. 6 is an illustration of the visual gyroscope capturing
a second frame following a user orientation rotation from an
initial direction in accordance with an embodiment of the present
invention. If the procedure determines that the user has panned
around sufficiently and that there are enough matching visual
features between the second frame and the initial keyframe, the
visual gyroscope module will promote frame 523 at user direction
527 to keyframe status.
[0058] In order to determine if the user has panned around
sufficiently, the visual gyroscope procedure matches features
between the current frame and the prior keyframe and then computes
a rotation between the two if the features match. Stated
differently, the procedure computes a relative rotation between the
two consecutive frames from the differences in position of a
matching set of feature points in the two images. In one
embodiment, the procedure may build a grid for faster matching.
This way, features are matched only within corresponding grid cells
rather than brute-force matching the entire set of features. Matching
features from the prior keyframe to the new frame allows the
procedure to determine which locations on the sphere map to new
locations. When matches are found, the visual gyroscope procedure
can use Horn's procedure with RANSAC, as will be explained below,
to estimate a pure rotation from the matched points.
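One way such a grid could work is sketched below; this is an illustrative example rather than the implementation described above, and the cell size, distance threshold, and function names are assumptions. Features of the prior keyframe are bucketed by image cell, and each new feature is compared only against candidates in its own and neighboring cells using the Hamming distance between binary descriptors.

```python
import numpy as np
from collections import defaultdict

def build_grid(points_xy, cell=64):
    """Bucket feature indices by the grid cell containing each point."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points_xy):
        grid[(int(x) // cell, int(y) // cell)].append(i)
    return grid

def match_with_grid(pts_a, desc_a, pts_b, desc_b, cell=64, max_dist=40):
    """Match each feature in frame A against frame B features that fall
    in the same or an adjacent grid cell, using Hamming distance on
    packed binary descriptors. Returns a list of (index_a, index_b)."""
    grid_b = build_grid(pts_b, cell)
    matches = []
    for i, (x, y) in enumerate(pts_a):
        cx, cy = int(x) // cell, int(y) // cell
        best, best_dist = None, max_dist
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid_b.get((cx + dx, cy + dy), []):
                    # Hamming distance between packed uint8 descriptors.
                    d = np.unpackbits(desc_a[i] ^ desc_b[j]).sum()
                    if d < best_dist:
                        best, best_dist = j, d
        if best is not None:
            matches.append((i, best))
    return matches
```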
[0059] The vision data from the camera enables the visual gyroscope
to determine an approximation of the user rotation. In one
embodiment, the well-known Random Sample Consensus (RANSAC)
procedure can be used to match a frame to a prior keyframe. Horn's
procedure as described in the following: "B. Horn. Closed-form
solution of absolute orientation using unit quaternions. Journal of
the Optical Society of America, 1987", all of which is incorporated
herein by reference, is used to compute a rotation between two sets
of three points, which are then used for RANSAC sampling. Horn's
method also demonstrates how to compute a rotation between two
sets of all matched points. This is then used to compute the final
rotation between frames once the RANSAC procedure has determined
whether the rotation computed from the two sets of three points
yields enough inliers across all points. While RANSAC
and Horn's procedure can be used to determine the rotation between
keyframes, the embodiments of the present invention are not limited
to solely these procedures. For example, in one embodiment, changes
or deltas in absolute sensor orientation received from the IMU
sensor can also be used to approximate user rotation.
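For illustration, the following sketch estimates a pure rotation from matched unit rays inside a small RANSAC loop. It assumes the matched features have already been back-projected onto the unit sphere, and it uses the SVD-based closed-form solution, which for a pure rotation gives the same result as Horn's quaternion method; the iteration count and inlier threshold are illustrative.

```python
import numpy as np

def best_fit_rotation(a, b):
    """Closed-form rotation R such that R @ a_i best aligns with b_i,
    for unit rays a, b of shape (N, 3). SVD-based; equivalent in result
    to Horn's quaternion method for a pure rotation."""
    H = a.T @ b
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])            # guard against reflections
    return Vt.T @ D @ U.T

def ransac_rotation(a, b, iters=200, thresh=0.01, rng=None):
    """Estimate a rotation from matched unit rays with RANSAC.
    Samples 3 correspondences per iteration, keeps the hypothesis with
    the most inliers, then refits on all inliers. Assumes at least 3
    mostly correct matches are available."""
    rng = rng or np.random.default_rng(0)
    best_inliers = None
    for _ in range(iters):
        idx = rng.choice(len(a), size=3, replace=False)
        R = best_fit_rotation(a[idx], b[idx])
        err = np.linalg.norm((a @ R.T) - b, axis=1)
        inliers = err < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_fit_rotation(a[best_inliers], b[best_inliers]), best_inliers
```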
[0060] In one embodiment, keyframes are captured at approximately
every 20 degrees of user rotation. If the procedure determines that
the user has panned above a certain threshold, e.g., 20 degrees,
and that there are enough matching visual features in captured
frame 523, it will promote the captured frame to a keyframe status.
Conversely, if it is determined that the user has not panned a
distance sufficiently far from initial keyframe 522, the procedure
will not promote the captured frame to a keyframe. The new keyframe
will match to the nearest prior keyframe based, in one embodiment,
on a dot product lookup. If, however, for example, no keyframe is
near, then it will match to the last frame and save that as a
keyframe, if possible.
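The promotion test can be illustrated as follows; the helper names, the minimum match count, and the exact form of the dot-product lookup are assumptions for this sketch, while the roughly 20-degree threshold comes from the embodiment described above.

```python
import numpy as np

KEYFRAME_THRESHOLD_DEG = 20.0   # example threshold from the embodiment above

def rotation_angle_deg(R):
    """Angle of rotation (degrees) encoded by a 3x3 rotation matrix."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))

def nearest_keyframe(view_dir, keyframes):
    """Pick the keyframe whose stored viewing direction has the largest
    dot product with (i.e., smallest angle to) the current direction.
    keyframes: list of objects with a .view_dir unit vector attribute."""
    return max(keyframes, key=lambda kf: float(np.dot(kf.view_dir, view_dir)))

def should_promote(R_frame_to_keyframe, num_matches, min_matches=30):
    """Promote the frame when it has rotated far enough from the matched
    keyframe and still shares enough visual features with it."""
    return (rotation_angle_deg(R_frame_to_keyframe) > KEYFRAME_THRESHOLD_DEG
            and num_matches >= min_matches)
```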
[0061] In one embodiment, a "bundle adjustment" is performed when a
new keyframe is added to the map. A bundle adjustment, which is a
well-known method, comprises globally adjusting every keyframe to
minimize orientation error of each keyframe every time a new
keyframe is added to the map. Global alignment (bundle adjustment)
is based on the difference between the angle of neighboring
keyframes and what a brute force match provides as an angle.
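A highly simplified stand-in for this global alignment is sketched below. It assumes keyframe orientations are stored as rotation matrices and that relative rotations have been measured between neighboring keyframes; the relaxation scheme and names are illustrative only and are not the bundle adjustment procedure of the embodiment.

```python
import numpy as np

def project_to_so3(M):
    """Project an arbitrary 3x3 matrix onto the nearest rotation matrix."""
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt

def relax_orientations(R, constraints, sweeps=10):
    """Toy orientation-only relaxation.

    R:           dict keyframe_id -> 3x3 absolute rotation.
    constraints: list of (i, j, R_ij) with measured relative rotations
                 satisfying R[i] ~= R_ij @ R[j].
    Each sweep re-estimates every keyframe from its neighbors and
    projects the averaged estimate back onto SO(3). A real bundle
    adjustment would also hold one keyframe fixed as the gauge.
    """
    for i in R:
        for _ in range(sweeps):
            preds = [R_ij @ R[j] for (a, j, R_ij) in constraints if a == i]
            preds += [R_ij.T @ R[a] for (a, b, R_ij) in constraints if b == i]
            if preds:
                R[i] = project_to_so3(sum(preds) / len(preds))
    return R
```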
[0062] FIG. 7 is an illustration of the visual gyroscope starting a
new map based on IMU sensor data in accordance with an embodiment
of the present invention. Continuing with visual tracking, the
visual gyroscope is able to save additional keyframes beyond
keyframe 523. As mentioned before, the visual gyroscope is
initialized to the absolute orientation of the device as read from
the IMU sensor 420 on start-up. Accordingly, the visual gyroscope
is able to continually update the orientation as the user moves if
the environment permits.
[0063] However, sometimes vision-based tracking may fail for any of
several reasons, e.g., insufficient texture in the environment, not
enough landmarks available, or highly dynamic content as a result
of a user panning too quickly. In FIG. 7, for instance, the
user has lost visual tracking because panning too fast results in
motion blur created in the visual data captured by the camera. As a
result of motion blur, the camera is no longer able to match
features to prior frames and therefore loses its orientation.
[0064] When visual tracking is lost, the procedure starts a new map
with keyframe 703 using data from IMU sensor 420. At this point,
the absolute sensor orientation and the visual gyroscope
orientation may show different readings and, occasionally, even
vastly different readings even though they both were initialized
with the same orientation value because of the drift in the IMU
sensor 420. Thus, instead of using the absolute sensor orientation,
the visual gyroscope uses the deltas (or differences) of absolute
sensor orientation to calculate a relative orientation traveled,
which is then combined with the visual gyroscope orientation
reading from the point where visual tracking failed. For example, in FIG.
7, assuming visual tracking failed at user direction 527 right
after keyframe 523 was captured and the user pans around to user
direction 528, then the relative orientation difference between
user direction 527 and user direction 528 is calculated using delta
values from the absolute sensor orientation. The relative
orientation is then combined with the visual gyroscope orientation
reading obtained at user direction 527 to determine the orientation
at user direction 528. This creates a smooth experience and the
user would not see the transition from visual tracking to sensor
tracking.
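A sketch of this fallback computation follows, under the assumption that orientations are represented as unit quaternions in (w, x, y, z) order; the composition order shown depends on that convention, and the function names are illustrative.

```python
import numpy as np

def quat_mul(q, r):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_conj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def fallback_orientation(q_visual_last, q_imu_at_loss, q_imu_now):
    """Orientation while visual tracking is lost.

    The relative rotation travelled since tracking failed is the delta
    between the current and at-loss absolute IMU readings; composing it
    onto the last known visual orientation avoids using the (drifted)
    absolute IMU value directly.
    """
    delta = quat_mul(q_imu_now, quat_conj(q_imu_at_loss))
    q = quat_mul(delta, q_visual_last)
    return q / np.linalg.norm(q)
```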
[0065] After determining orientation by combining the last known
visual gyroscope reading with the delta values from the absolute
sensor orientation, the visual gyroscope starts a new map at user
direction 528 by capturing keyframe 703.
[0066] FIG. 8 is an illustration of the visual gyroscope creating a
new map based on IMU sensor data until an overlap is found with the
map created from vision-based data in accordance with an embodiment
of the present invention. As shown in FIG. 8, a new map is created
comprising keyframes 808 using IMU sensor 420 data when visual
tracking fails. Because of feature tracking in each keyframe, the
visual gyroscope can determine if the user has panned back to a
location for which keyframes based on visual tracking already
exist. In other words, the visual gyroscope can recognize when
there is an overlap between the initial map created using visual
tracking (comprising keyframes 522 and 523) and the new map
comprising keyframes 808.
[0067] In one embodiment, when the overlap is found, the visual
gyroscope deletes the secondary map created using sensor data,
e.g., map comprising frames 808. In other words, the sensor data is
only used when visual tracking has failed. When the procedure
recognizes a keyframe from before that was created using visual
tracking, it immediately reverts back to visual tracking and
discontinues use of sensor tracking at that time. Moreover, it
deletes the map obtained through sensor tracking.
[0068] In a different embodiment, however, when an overlap is
found, the visual gyroscope will merge the new map created through
sensor tracking with the prior map created using visual tracking.
The combined map will then comprise the prior map comprising
keyframes 522 and 523 and the new map comprising keyframes 808.
[0069] In one embodiment of the present invention, the visual
gyroscope is able to turn on certain rejection zones in the
camera's field of view. Rejection zones are areas in the spherical
map that are not allowed to save keyframes. This is important
because the map can experience significant drift if keyframes are
saved based on features that are too close to the viewer. Thus, the
visual gyroscope turns on dead-zones for angles that are pointed
down, e.g., the ground. Dead-zones may also be turned on in areas that are
featureless, e.g., the sky. Accordingly, for the rejection zones,
precision is not important, and, therefore, the visual gyroscope
relies on the tracking from the IMU sensor 420.
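An illustrative rejection-zone check is shown below; the elevation thresholds, the up-axis convention, and the function name are assumptions for this example. When it returns true, the frame would not be saved as a keyframe and the IMU tracking would be trusted instead.

```python
import numpy as np

def in_rejection_zone(view_dir, down_limit_deg=30.0, up_limit_deg=60.0):
    """Return True when the camera points too far down (ground, nearby
    objects) or too far up (featureless sky).

    view_dir: unit vector of the camera's viewing direction in world
    coordinates, with +z taken as "up" (an assumed convention).
    """
    elevation = np.degrees(np.arcsin(np.clip(view_dir[2], -1.0, 1.0)))
    return elevation < -down_limit_deg or elevation > up_limit_deg

# Pointing about 45 degrees below the horizon: keyframes are rejected.
print(in_rejection_zone(np.array([0.707, 0.0, -0.707])))  # True
```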
[0070] In one embodiment, the present invention takes advantage of
the fact that many scenes can be approximated as objects at
infinity, e.g., in a panoramic model. This simplification is
leveraged by the visual gyroscope in order to simplify the
procedures it is implementing.
[0071] FIG. 9 depicts a flowchart of an exemplary computer
implemented process of tracking device orientation using visual
tracking in accordance with one embodiment of the present
invention. While the various steps in this flowchart are presented
and described sequentially, one of ordinary skill will appreciate
that some or all of the steps can be executed in different orders
and some or all of the steps can be executed in parallel. Further,
in one or more embodiments of the invention, one or more of the
steps described below can be omitted, repeated, and/or performed in
a different order. Accordingly, the specific arrangement of steps
shown in FIG. 9 should not be construed as limiting the scope of
the invention. Rather, it will be apparent to persons skilled in
the relevant art(s) from the teachings provided herein that other
functional flows are within the scope and spirit of the present
invention. Flowchart 900 may be described with continued reference
to exemplary embodiments described above, though the method is not
limited to those embodiments.
[0072] At step 910, the device orientation is initialized using
absolute sensor orientation. At step 912, a frame is captured from
the camera on the handheld device or any other visual capture
device. At step 914, the features in the frame are
determined using procedures such as FAST and BRIEF as explained
above. At step 916, the frame is matched to a prior keyframe and a
rotation is computed between the current frame and prior keyframe.
In one embodiment, RANSAC and Horn's procedure are used to perform
the matching and rotation computation. If it is found that the user
has rotated orientation of the handheld device over a certain
threshold, e.g., 20 degrees, then the current frame is promoted to
a keyframe. It should be noted, however, that there are other
considerations that, in one embodiment, may also be taken into
account before promoting a frame to a keyframe status, e.g.,
ascertaining that the frame has enough visual features, that the
frame is not in a restricted zone, and also that it is at an
appropriate distance away from another keyframe. Finally, at step
920, bundle adjustment is performed on the newly added keyframe
with all the other keyframes.
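The steps of flowchart 900 can be tied together as a single loop, as in the sketch below. Every object and method name here (camera, imu, omap, and their methods) is an assumed placeholder for this illustration rather than an interface defined by the present description.

```python
def visual_tracking_loop(camera, imu, omap):
    """Illustrative main loop for flowchart 900.

    camera, imu and omap (the orientation map) are assumed objects;
    every method used below is a placeholder, not a defined API.
    """
    orientation = imu.absolute_orientation()                  # step 910
    while True:
        frame = camera.capture()                              # step 912
        features = frame.detect_and_describe()                # step 914 (e.g., FAST + BRIEF)
        keyframe = omap.nearest_keyframe(orientation)
        rotation, matches = keyframe.match_and_estimate(features)  # step 916
        orientation = keyframe.orientation.compose(rotation)
        # Promote when rotated past the threshold with enough shared
        # features and the view is not inside a rejection zone.
        if omap.should_promote(rotation, matches, orientation):
            omap.add_keyframe(frame, features, orientation)
            omap.bundle_adjust()                              # step 920
```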
[0073] FIG. 10 depicts a flowchart of an exemplary computer
implemented process of using sensors for tracking device
orientation when visual tracking is temporarily lost in accordance
with one embodiment of the present invention. While the various
steps in this flowchart are presented and described sequentially,
one of ordinary skill will appreciate that some or all of the steps
can be executed in different orders and some or all of the steps
can be executed in parallel. Further, in one or more embodiments of
the invention, one or more of the steps described below can be
omitted, repeated, and/or performed in a different order.
Accordingly, the specific arrangement of steps shown in FIG. 10
should not be construed as limiting the scope of the invention.
Rather, it will be apparent to persons skilled in the relevant
art(s) from the teachings provided herein that other functional
flows are within the scope and spirit of the present invention.
Flowchart 1000 may be described with continued reference to
exemplary embodiments described above, though the method is not
limited to those embodiments.
[0074] At step 1012, after visual tracking is lost, embodiments of
the present invention determine device orientation by combining
delta values obtained from absolute sensor orientation to a last
computed orientation reading from the visual gyroscope, wherein the
last computed orientation reading is based on visual tracking.
[0075] At step 1013, a new keyframe is saved based on the delta
values from the IMU sensor. This acts as the initial keyframe in a
new orientation map that is generated based on IMU sensor data. In
other words, relative orientation between the first map based on
visual tracking and the second map is calculated using IMU delta
values. Accordingly, at step 1014 a new map is built based on the
sensor data. It should be noted that this new orientation map based
on IMU sensor data is only built if visual tracking is lost and the
visual gyroscope module determines that the incoming new frames are
not close enough to match to the first map. In other words, the second
orientation map is only created if the prior map is "lost."
[0076] At step 1015, as new keyframes are added to the new
orientation map, the visual gyroscope maintains matching features
to determine if there is an overlap between the prior map based on
visual tracking and the new map based on sensor data.
[0077] At step 1016, when an overlap is found, in one embodiment,
the new map based on sensor data is deleted and the visual
gyroscope continues to build the prior map based on visual tracking
data. In a different embodiment, however, when an overlap is found,
the prior map and the new map based on sensor data are merged and
the visual gyroscope continues to build the map based on visual
tracking data.
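The two overlap-handling policies can be illustrated as follows; the map and keyframe interfaces used here are assumptions made for this sketch.

```python
def handle_overlap(primary_map, sensor_map, frame_features, delete_on_overlap=True):
    """When a keyframe of the vision-based primary map is recognized
    again, either discard the IMU-based secondary map (one embodiment)
    or merge it into the primary map (another embodiment).
    All interfaces are assumed placeholders."""
    match = primary_map.recognize(frame_features)
    if match is None:
        return sensor_map          # no overlap yet: keep building it
    if delete_on_overlap:
        sensor_map.clear()         # revert entirely to visual tracking
    else:
        primary_map.merge(sensor_map)
    return None                    # secondary map no longer needed
```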
[0078] While the foregoing disclosure sets forth various
embodiments using specific block diagrams, flowcharts, and
examples, each block diagram component, flowchart step, operation,
and/or component described and/or illustrated herein may be
implemented, individually and/or collectively, using a wide range
of hardware, software, or firmware (or any combination thereof)
configurations. In addition, any disclosure of components contained
within other components should be considered as examples because
many other architectures can be implemented to achieve the same
functionality.
[0079] The process parameters and sequence of steps described
and/or illustrated herein are given by way of example only. For
example, while the steps illustrated and/or described herein may be
shown or discussed in a particular order, these steps do not
necessarily need to be performed in the order illustrated or
discussed. The various example methods described and/or illustrated
herein may also omit one or more of the steps described or
illustrated herein or include additional steps in addition to those
disclosed.
[0080] While various embodiments have been described and/or
illustrated herein in the context of fully functional computing
systems, one or more of these example embodiments may be
distributed as a program product in a variety of forms, regardless
of the particular type of computer-readable media used to actually
carry out the distribution. The embodiments disclosed herein may
also be implemented using software modules that perform certain
tasks. These software modules may include script, batch, or other
executable files that may be stored on a computer-readable storage
medium or in a computing system. These software modules may
configure a computing system to perform one or more of the example
embodiments disclosed herein. One or more of the software modules
disclosed herein may be implemented in a cloud computing
environment. Cloud computing environments may provide various
services and applications via the Internet. These cloud-based
services (e.g., software as a service, platform as a service,
infrastructure as a service, etc.) may be accessible through a Web
browser or other remote interface. Various functions described
herein may be provided through a remote desktop environment or any
other cloud-based computing environment.
[0081] The foregoing description, for purpose of explanation, has
been described with reference to specific embodiments. However, the
illustrative discussions above are not intended to be exhaustive or
to limit the invention to the precise forms disclosed. Many
modifications and variations are possible in view of the above
teachings. The embodiments were chosen and described in order to
best explain the principles of the invention and its practical
applications, to thereby enable others skilled in the art to best
utilize the invention and various embodiments with various
modifications as may be suited to the particular use
contemplated.
[0082] Embodiments according to the invention are thus described.
While the present disclosure has been described in particular
embodiments, it should be appreciated that the invention should not
be construed as limited by such embodiments, but rather construed
according to the below claims.
* * * * *