U.S. patent application number 14/051282 was filed with the patent office on 2013-10-10 and published on 2015-04-16 for a method and apparatus for device orientation tracking using a visual gyroscope.
This patent application is currently assigned to Nvidia Corporation. The applicant listed for this patent is Nvidia Corporation. The invention is credited to Josh Abbott, Jared Heinly, and Jim van Welzen.
Application Number: 14/051282
Publication Number: 20150103183
Family ID: 52809333
Filed: October 10, 2013
Published: 2015-04-16

United States Patent Application 20150103183
Kind Code: A1
Abbott, Josh; et al.
April 16, 2015

METHOD AND APPARATUS FOR DEVICE ORIENTATION TRACKING USING A VISUAL GYROSCOPE
Abstract
A method for tracking device orientation on a portable device is
disclosed. The method comprises initializing a device orientation
to a sensor orientation, wherein the sensor orientation is based on
information from an inertial measurement unit (IMU) sensor. It also
comprises initiating visual tracking using a camera on the portable
device and capturing a frame. Next, it comprises determining a
plurality of visual features in the frame and matching the frame to
a keyframe, wherein capture of the keyframe precedes capture of the
frame. Subsequently, it comprises computing a rotation amount
between the frame and the keyframe. Responsive to a determination
that a rotational distance between the frame and the keyframe
exceeds a predetermined threshold, promoting the frame to a
keyframe status and adding it to a first orientation map and
adjusting the frame with all prior captured keyframes.
Inventors: Abbott, Josh (Raleigh, NC); Heinly, Jared (Chapel Hill, NC); van Welzen, Jim (Durham, NC)

Applicant: Nvidia Corporation, Santa Clara, CA, US

Assignee: Nvidia Corporation, Santa Clara, CA

Family ID: 52809333

Appl. No.: 14/051282

Filed: October 10, 2013

Current U.S. Class: 348/169

Current CPC Class: G01C 21/165 20130101; G06T 7/74 20170101; H04N 5/2621 20130101; H04N 5/272 20130101; G06T 2207/30244 20130101; H04N 5/23264 20130101; G01C 11/00 20130101; G06T 7/248 20170101; H04N 5/23293 20130101; G06T 2207/10016 20130101

Class at Publication: 348/169

International Class: G06T 7/00 20060101 G06T007/00; G06T 19/00 20060101 G06T019/00
Claims
1. A method for tracking device orientation on a portable device,
said method comprising: initializing a device orientation to a
sensor orientation, wherein said sensor orientation is based on
information from an inertial measurement unit (IMU) sensor;
initiating visual tracking using a camera on said portable device;
capturing a frame using said camera; determining a plurality of
visual features in said frame; matching said frame to a keyframe,
wherein capture of said keyframe precedes capture of said frame,
and wherein said keyframe was captured using visual tracking;
computing a rotation amount between said frame and said keyframe;
responsive to a determination that a rotational distance between
said frame and said keyframe exceeds a predetermined threshold,
promoting said frame to a keyframe status and adding said frame to
a first orientation map; and performing an adjustment of said frame
with all prior captured keyframes.
2. The method of claim 1, wherein said determining comprises:
computing repeatable and distinct features in said frame using a
feature detecting procedure; and detecting and describing said
features using a feature description procedure.
3. The method of claim 2, wherein said feature detecting procedure
is selected from a group consisting of: FAST, Harris &
Stephens, Plessey, Shi-Tomasi, Moravec corner detection, Wang and
Brady corner detection, and SUSAN corner detector.
4. The method of claim 2, wherein said feature description
procedure is selected from a group consisting of: SURF, SIFT and
BRIEF.
5. The method of claim 1, further comprising: responsive to a
determination that a rotational distance between said frame and
said keyframe is within said predetermined threshold, continuing to
search for a next keyframe.
6. The method of claim 1, wherein said matching is performed using
RANSAC.
7. The method of claim 1, wherein said computing is performed using
Horn's procedure.
8. The method of claim 1 further comprising: responsive to a loss
of said visual tracking, determining device orientation by
combining delta values from said IMU sensor with a last known
orientation measurement from said visual tracking; and building a
second orientation map, wherein said second orientation map is
created using data from said IMU sensor.
9. The method of claim 8, further comprising: determining if said
second orientation map overlaps with keyframes from said first
orientation map; and responsive to a determination of overlap,
deleting said second orientation map and continuing to build said
first orientation map.
10. The method of claim 8, further comprising: determining if said
second orientation map overlaps with keyframes from said first
orientation map; and responsive to a determination of overlap,
merging said second orientation map with said first orientation
map.
11. The method of claim 8, further comprising: determining if said
second orientation map overlaps with keyframes from said first
orientation map; and responsive to a determination of no overlap,
continuing to build said second orientation map.
12. A computer-readable storage medium having stored thereon
instructions that, if executed by a computer system, cause the
computer system to perform a method for tracking device orientation
on a portable device, said method comprising: initializing a device
orientation to a sensor orientation, wherein said sensor
orientation is based on information from an inertial measurement
unit (IMU) sensor; initiating visual tracking using a camera on
said portable device; capturing a frame using said camera;
determining a plurality of visual features in said frame; matching
said frame to a keyframe, wherein capture of said keyframe precedes
capture of said frame, and wherein said keyframe was captured using
visual tracking; computing a rotation amount between said frame and
said keyframe; responsive to a determination that a rotational
distance between said frame and said keyframe exceeds a
predetermined threshold, promoting said frame to a keyframe status
and adding said frame to a first orientation map; and performing an
adjustment of said frame with all prior captured keyframes.
13. The computer-readable medium as described in claim 12, wherein
said determining comprises: computing repeatable and distinct
features in said frame using a feature detecting procedure; and
detecting and describing said features using a feature description
procedure.
14. The computer-readable medium as described in claim 13, wherein
said feature detecting procedure is selected from a group
consisting of: FAST, Harris & Stephens, Plessey, Shi-Tomasi,
Moravec corner detection, Wang and Brady corner detection, and
SUSAN corner detector.
15. The computer-readable medium as described in claim 13, wherein
said feature description procedure is selected from a group
consisting of: SURF, SIFT and BRIEF.
16. The computer-readable medium as described in claim 12, further
comprising: responsive to a determination that a rotational
distance between said frame and said keyframe is within said
predetermined threshold, continuing to search for a next
keyframe.
17. The computer-readable medium as described in claim 12, wherein
said matching is performed using RANSAC.
18. The computer-readable medium as described in claim 12, wherein
said computing is performed using Horn's procedure.
19. The computer-readable medium as described in claim 12, wherein
said method further comprises: responsive to a loss of said visual
tracking, determining device orientation by combining delta values
from said IMU sensor with a last known orientation measurement from
said visual tracking; and building a second orientation map,
wherein said second orientation map is created using data from said
IMU sensor.
20. The computer-readable medium as described in claim 19, wherein
said method further comprises: determining if said second
orientation map overlaps with keyframes from said first orientation
map; and responsive to a determination of overlap, deleting said
second orientation map and continuing to build said first
orientation map.
21. The computer-readable medium as described in claim 19, further
comprising: determining if said second orientation map overlaps
with keyframes from said first orientation map; and responsive to a
determination of overlap, merging said second orientation map with
said first orientation map.
22. The computer-readable medium as described in claim 19, further
comprising: determining if said second orientation map overlaps
with keyframes from said first orientation map; and responsive to a
determination of no overlap, continuing to build said second
orientation map.
23. A system for tracking device orientation on a portable device,
said system comprising: a display screen; a memory; a camera; and a
processor configured to implement a visual gyroscope, wherein said
visual gyroscope performs a method for tracking device orientation
on said portable device, wherein said method comprises:
initializing a device orientation to a sensor orientation, wherein
said sensor orientation is based on information from an inertial
measurement unit (IMU) sensor; initiating visual tracking using a
camera on said portable device; capturing a frame using said
camera; determining a plurality of visual features in said frame;
matching said frame to a keyframe, wherein capture of said keyframe
precedes capture of said frame, and wherein said keyframe was
captured using visual tracking; computing a rotation amount between
said frame and said keyframe; responsive to a determination that a
rotational distance between said frame and said keyframe exceeds a
predetermined threshold, promoting said frame to a keyframe status
and adding said frame to a first orientation map; and performing an
adjustment of said frame with all prior captured keyframes.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
Related Applications
[0001] The present application is related to U.S. patent
application Ser. No. ______, filed ______, entitled "A METHOD AND
APPARATUS FOR LONG TERM IMAGE EXPOSURE WITH IMAGE STABILIZATION ON
A MOBILE DEVICE," naming Syed Zahir Bokari, Josh Abbott, and Jim
van Welzen as inventors, and having attorney docket number
NVID-PDU-13-0254-US1. That application is incorporated herein by
reference in its entirety and for all purposes.
FIELD OF THE INVENTION
[0002] Embodiments according to the present invention generally
relate to augmented reality systems and more specifically to device
orientation tracking for augmented reality systems.
BACKGROUND OF THE INVENTION
[0003] While augmented reality technologies have improved
considerably in recent years, the improvements typically come at
the cost of computationally intensive procedures implemented on
expensive hardware. The high cost of implementing effective
augmented reality systems is a barrier to entry that prevents
casual users from having access to such systems on everyday devices
that have relatively low processing capability, e.g., tablet
computers, phones and other hand-held devices.
[0004] A critical component in successfully implementing augmented
reality systems is device orientation tracking. In other words,
tracking the orientation of users and objects in the scene is
critical for developing augmented reality applications. Procedures
implemented in conventional augmented reality applications for
tracking device orientation with expected levels of robustness,
speed and precision are computationally expensive. Accordingly,
they are not ideal for handheld mobile devices.
[0005] For example, prior approaches for tracking device
orientation use full Simultaneous Localization and Mapping (SLAM)
algorithms, which attempt to determine a device's position as well
as orientation. SLAM is a technique used by augmented reality
applications to build a map of the environment of the device while at
the same time keeping track of the device's current location, where
the current location includes both the device's position and also
its orientation. SLAM generally works by creating geometrically
consistent maps of the environment using inputs from different
types of sensors, e.g., 2D cameras, 3D sonar sensors, single
dimensional beams or 2D sweeping laser rangefinders etc. Building a
3D map of the environment can be significantly intensive from a
computation standpoint, in part, because it involves tracking the
device's position as well as orientation. Because of the
considerable computational requirements for implementing SLAM, it
is not a suitable procedure for implementation on a mobile device,
such as a smart phone.
[0006] Other conventional augmented reality systems are unsuitable
for determining device orientation on mobile devices because they
suffer from problems such as drift error. For example, one approach
used in conventional devices for tracking device orientation uses
only Inertial Measurement Unit (IMU) sensors. An IMU is an
electronic device that can measure and report on a device's
velocity, orientation or gravitational forces, using a combination
of inputs from other devices such as an accelerometer, gyroscope, and
magnetometer. A major disadvantage of using IMUs is that they
typically suffer from accumulated error. Because the guidance
system is continually adding detected changes to its
previously-calculated positions, any errors in measurement, however
small, are accumulated from point to point. This leads to drift
error, or an ever-increasing difference between where the system
thinks it is located and the actual location, where, as stated
above, location includes both the device's position and also its
orientation. Stated differently, drift is a problem because
integration of the orientation tracking's relative measurements
accumulates the small errors in each measurement. This can,
consequently, create significant differences between the estimated
and actual orientation. Further, measurements from IMU sensors tend
to be error prone and noisy. Thus, IMU sensors are typically not
suitable for an immersive augmented reality environment.
[0007] Conventional methods of tracking device orientation for
augmented reality systems, therefore, are unsuitable for use on
mobile devices because they are either too expensive and
computationally intensive or simply unsuitable as a result of
problems such as noise and drift error.
BRIEF SUMMARY OF THE INVENTION
[0008] Accordingly, a need exists for a system and a method for
tracking device orientation on a mobile device that has a smaller
compute footprint. For example, embodiments of the present
invention track device orientation without constructing an entire
map of the environment or even trying to calculate the device's
position. Further, embodiments of the present invention
advantageously track device orientation without requiring a user to
change the position of the camera on the mobile device. As a
result, device orientation can be tracked more efficiently and
quickly using smaller and more affordable electronic
components.
[0009] Further, a need exists for systems and methods for a
vision-based orientation tracking procedure on a mobile device that
uses the camera on the device to determine orientation by
identifying and tracking landmarks (or image features) in a natural
environment. As a result, embodiments of the present invention
advantageously provide a robust, fast and precise orientation
tracking solution while avoiding the pitfalls of noise and drift
error that is prevalent in conventional IMU sensor-based systems.
Moreover, embodiments of the present invention use markerless
tracking to provide more accurate device orientation than naive
sensor-based approaches. In one embodiment of the present
invention, IMU sensors are utilized as a fallback option if
vision-based tracking fails.
[0010] In one embodiment, a method for tracking device orientation
on a portable device is disclosed. The method comprises
initializing a device orientation to a sensor orientation, wherein
the sensor orientation is based on information from an inertial
measurement unit (IMU) sensor. It also comprises initiating visual
tracking using a camera on the portable device and capturing a
frame. Next, it comprises determining a plurality of visual
features in the frame and matching the frame to a keyframe, wherein
capture of the keyframe precedes capture of the frame.
Subsequently, it comprises computing a rotation amount between the
frame and the keyframe. Responsive to a determination that a
rotational distance between the frame and the keyframe exceeds a
predetermined threshold, promoting the frame to a keyframe status
and adding it to a first orientation map and adjusting the frame
with all prior captured keyframes.
[0011] In another embodiment, a computer-readable storage medium
having stored thereon instructions that, if executed by a computer
system, cause the computer system to perform a method for tracking
device orientation on a portable device is disclosed. The method
comprises initializing a device orientation to a sensor
orientation, wherein the sensor orientation is based on information
from an inertial measurement unit (IMU) sensor. It also comprises
initiating visual tracking using a camera on the portable device
and capturing a frame. Next, it comprises determining a plurality
of visual features in the frame and matching the frame to a
keyframe, wherein capture of the keyframe precedes capture of the
frame. Subsequently, it comprises computing a rotation amount
between the frame and the keyframe. Responsive to a determination
that a rotational distance between the frame and the keyframe
exceeds a predetermined threshold, promoting the frame to a
keyframe status and adding it to a first orientation map and
adjusting the frame with all prior captured keyframes.
[0012] In a different embodiment, a system for tracking device
orientation on a portable device is presented. The system comprises
a display screen; a memory; a camera; and a processor configured to
implement a visual gyroscope, wherein the visual gyroscope performs
a method for tracking device orientation on the portable device.
The method comprises initializing a device orientation to a sensor
orientation, wherein the sensor orientation is based on information
from an inertial measurement unit (IMU) sensor. It also comprises
initiating visual tracking using a camera on the portable device
and capturing a frame. Next, it comprises determining a plurality
of visual features in the frame and matching the frame to a
keyframe, wherein capture of the keyframe precedes capture of the
frame. Subsequently, it comprises computing a rotation amount
between the frame and the keyframe. Responsive to a determination
that a rotational distance between the frame and the keyframe
exceeds a predetermined threshold, promoting the frame to a
keyframe status and adding it to a first orientation map and
adjusting the frame with all prior captured keyframes.
[0013] The following detailed description together with the
accompanying drawings will provide a better understanding of the
nature and advantages of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] Embodiments of the present invention are illustrated by way
of example, and not by way of limitation, in the figures of the
accompanying drawings and in which like reference numerals refer to
similar elements.
[0015] FIG. 1 shows an exemplary computer system with a camera used
to implement a visual gyroscope for tracking device orientation in
accordance with one embodiment of the present invention.
[0016] FIG. 2 shows an exemplary operating environment of a mobile
device capable of tracking device orientation in accordance with
one embodiment of the present invention.
[0017] FIG. 3 illustrates a use case for the visual gyroscope in
accordance with one embodiment of the present invention.
[0018] FIG. 4 is a high level block diagram illustrating the
elements of the orientation tracking system proposed in accordance
with an embodiment of the present invention.
[0019] FIG. 5 is a diagram of a visual gyroscope capturing an
initial keyframe in accordance with an embodiment of the present
invention.
[0020] FIG. 6 is an illustration of the visual gyroscope capturing
a second keyframe following a user rotation from an initial
direction in accordance with an embodiment of the present
invention.
[0021] FIG. 7 is an illustration of the visual gyroscope starting a
new map based on IMU sensor data in accordance with an embodiment
of the present invention.
[0022] FIG. 8 is an illustration of the visual gyroscope creating a
new map based on IMU sensor data until an overlap is found with the
map created from vision-based data in accordance with an embodiment
of the present invention.
[0023] FIG. 9 depicts a flowchart of an exemplary computer
implemented process of tracking device orientation in accordance
with one embodiment of the present invention.
[0024] FIG. 10 depicts a flowchart of an exemplary computer
implemented process of using sensors for tracking device
orientation when visual tracking is lost in accordance with one
embodiment of the present invention.
[0025] In the figures, elements having the same designation have
the same or similar function.
DETAILED DESCRIPTION OF THE INVENTION
[0026] Reference will now be made in detail to the various
embodiments of the present disclosure, examples of which are
illustrated in the accompanying drawings. While described in
conjunction with these embodiments, it will be understood that they
are not intended to limit the disclosure to these embodiments. On
the contrary, the disclosure is intended to cover alternatives,
modifications and equivalents, which may be included within the
spirit and scope of the disclosure as defined by the appended
claims. Furthermore, in the following detailed description of the
present disclosure, numerous specific details are set forth in
order to provide a thorough understanding of the present
disclosure. However, it will be understood that the present
disclosure may be practiced without these specific details. In
other instances, well-known methods, procedures, components, and
circuits have not been described in detail so as not to
unnecessarily obscure aspects of the present disclosure.
[0027] Notation and Nomenclature
[0028] Some portions of the detailed descriptions that follow are
presented in terms of procedures, steps, logic blocks, processing,
and other symbolic representations of operations on data bits
within a computer memory. These descriptions and representations
are the means used by those skilled in the data processing arts to
most effectively convey the substance of their work to others
skilled in the art. In the present application, a procedure, logic
block, process, or the like, is conceived to be a self-consistent
sequence of steps or instructions leading to a desired result. The
steps are those utilizing physical manipulations of physical
quantities. Usually, although not necessarily, these quantities
take the form of electrical or magnetic signals capable of being
stored, transferred, combined, compared, and otherwise manipulated
in a computer system. It has proven convenient at times,
principally for reasons of common usage, to refer to these signals
as transactions, bits, values, elements, symbols, characters,
samples, pixels, or the like.
[0029] It should be borne in mind, however, that all of these and
similar terms are to be associated with the appropriate physical
quantities and are merely convenient labels applied to these
quantities. Unless specifically stated otherwise as apparent from
the following discussions, it is appreciated that throughout the
present disclosure, discussions utilizing terms such as
"capturing," "determining," "matching," "promoting," "bundling,"
"storing," or the like, refer to actions and processes (e.g.,
flowchart 900 of FIG. 9) of a computer system or similar electronic
computing device or processor (e.g., system 100 of FIG. 1). The
computer system or similar electronic computing device manipulates
and transforms data represented as physical (electronic) quantities
within the computer system memories, registers or other such
information storage, transmission or display devices.
[0030] Embodiments described herein may be discussed in the general
context of computer-executable instructions residing on some form
of computer-readable storage medium, such as program modules,
executed by one or more computers or other devices. By way of
example, and not limitation, computer-readable storage media may
comprise non-transitory computer-readable storage media and
communication media; non-transitory computer-readable media include
all computer-readable media except for a transitory, propagating
signal. Generally, program modules include routines, programs,
objects, components, data structures, etc., that perform particular
tasks or implement particular abstract data types. The
functionality of the program modules may be combined or distributed
as desired in various embodiments.
[0031] Computer storage media includes volatile and nonvolatile,
removable and non-removable media implemented in any method or
technology for storage of information such as computer-readable
instructions, data structures, program modules or other data.
Computer storage media includes, but is not limited to, random
access memory (RAM), read only memory (ROM), electrically erasable
programmable ROM (EEPROM), flash memory or other memory technology,
compact disk ROM (CD-ROM), digital versatile disks (DVDs) or other
optical storage, magnetic cassettes, magnetic tape, magnetic disk
storage or other magnetic storage devices, or any other medium that
can be used to store the desired information and that can be accessed
to retrieve that information.
[0032] Communication media can embody computer-executable
instructions, data structures, and program modules, and includes
any information delivery media. By way of example, and not
limitation, communication media includes wired media such as a
wired network or direct-wired connection, and wireless media such
as acoustic, radio frequency (RF), infrared, and other wireless
media. Combinations of any of the above can also be included within
the scope of computer-readable media.
[0033] FIG. 1 shows an exemplary computer system with a camera used
to implement a visual gyroscope for tracking device orientation in
accordance with one embodiment of the present invention. Computer
system 100 depicts the components of a generic computer system in
accordance with embodiments of the present invention providing the
execution platform for certain hardware-based and software-based
functionality. In general, computer system 100 comprises at least
one CPU 101, a system memory 115, and at least one graphics
processor unit (GPU) 110. The CPU 101 can be coupled to the system
memory 115 via a bridge component/memory controller (not shown) or
can be directly coupled to the system memory 115 via a memory
controller (not shown) internal to the CPU 101. The GPU 110 may be
coupled to a display 112. One or more additional GPUs can
optionally be coupled to system 100 to further increase its
computational power. The GPU(s) 110 is coupled to the CPU 101 and
the system memory 115. The GPU 110 can be implemented as a discrete
component, a discrete graphics card designed to couple to the
computer system 100 via a connector (e.g., AGP slot, PCI-Express
slot, etc.), a discrete integrated circuit die (e.g., mounted
directly on a motherboard), or as an integrated GPU included within
the integrated circuit die of a computer system chipset component
(not shown). Additionally, a local graphics memory 114 can be
included for the GPU 110 for high bandwidth graphics data
storage.
[0034] The CPU 101 and the GPU 110 can also be integrated into a
single integrated circuit die and the CPU and GPU may share various
resources, such as instruction logic, buffers, functional units and
so on, or separate resources may be provided for graphics and
general-purpose operations. The GPU may further be integrated into
a core logic component. Accordingly, any or all the circuits and/or
functionality described herein as being associated with the GPU 110
can also be implemented in, and performed by, a suitably equipped
CPU 101. Additionally, while embodiments herein may make reference
to a GPU, it should be noted that the described circuits and/or
functionality can also be implemented in other types of processors
(e.g., general purpose or other special-purpose coprocessors) or
within a CPU.
[0035] System 100 can be implemented as, for example, a desktop
computer system or server computer system having a powerful
general-purpose CPU 101 coupled to a dedicated graphics rendering
GPU 110. In such an embodiment, components can be included that add
peripheral buses, specialized audio/video components, IO devices,
and the like. Similarly, system 100 can be implemented as a
handheld device (e.g., cell-phone, tablet computer, MP3 player,
etc.), direct broadcast satellite (DBS)/terrestrial set-top box or
a set-top video game console device such as, for example, the
Xbox® or the PlayStation® 3. System 100 can also be
implemented as a "system on a chip", where the electronics (e.g.,
the components 101, 115, 110, 114, and the like) of a computing
device are wholly contained within a single integrated circuit die.
Examples include a hand-held instrument with a display, a car
navigation system, a portable entertainment system, and the
like.
[0036] A Method and Apparatus for Device Orientation Tracking Using
a Visual Gyroscope
[0037] Embodiments of the present invention provide a system and a
method for tracking device orientation on a mobile device that has
a smaller compute footprint. For example, embodiments of the
present invention track device orientation without constructing an
entire map of the environment or even trying to calculate the
device's position. Further, embodiments of the present invention
advantageously track device orientation without requiring a user to
change the position of the camera on the mobile device. As a
result, device orientation can be tracked more efficiently and
quickly using smaller and more affordable electronic
components.
[0038] Embodiments of the present invention provide a system and a
method for a vision-based orientation tracking procedure on a
mobile device that uses the camera on the device to determine
orientation by identifying and tracking landmarks (or image
features) in a natural environment. In one embodiment, the present
invention is a visual gyroscope that is operable to precisely
determine a handheld device's orientation using vision-based
techniques. As a result, embodiments of the present invention
advantageously provide a robust, fast and precise orientation
tracking solution while avoiding the pitfalls of noise and drift
error prevalent in conventional IMU sensor-based systems.
Embodiments of the present invention use markerless tracking to
provide more accurate device orientation than naive
sensor-based approaches. In one embodiment of the present
invention, IMU sensors are utilized simply as a fallback option if
vision-based tracking fails.
[0039] FIG. 2 shows an exemplary operating environment of a mobile
device capable of tracking device orientation in accordance with
one embodiment of the present invention. System 200 includes camera
202, image signal processor (ISP) 204, memory 206, IMU sensor 240,
input module 208, central processing unit (CPU) 210, display 212,
communications bus 214, and power source 220. Power source 220
provides power to system 200 and may be a DC or AC power source.
System 200 depicts the components of an exemplary system in
accordance with embodiments of the present invention providing the
execution platform for certain hardware-based and software-based
functionality. Although specific components are disclosed in system
200, it should be appreciated that such components are examples.
That is, embodiments of the present invention are well suited to
having various other components or variations of the components
recited in system 200. It is appreciated that the components in
system 200 may operate with other components other than those
presented, and that not all of the components of system 200 may be
required to achieve the goals of system 200.
[0040] CPU 210 and the ISP 204 can also be integrated into a single
integrated circuit die and CPU 210 and ISP 204 may share various
resources, such as instruction logic, buffers, functional units and
so on, or separate resources may be provided for image processing
and general-purpose operations. System 200 can be implemented as,
for example, a digital camera, cell phone camera, portable device
(e.g., audio device, entertainment device, handheld device),
webcam, video device (e.g., camcorder) or any other device with a
front or back facing camera.
[0041] In one embodiment, camera 202 captures light via a
front-facing or back-facing lens (depending on how the user
typically holds the device), and converts the light received into a
signal (e.g., digital or analog). Camera 202 may comprise any of a
variety of optical sensors including, but not limited to,
complementary metal-oxide-semiconductor (CMOS) or charge-coupled
device (CCD) sensors. Camera 202 is coupled to communications bus
214 and may provide image data received over communications bus
214. Camera 202 may comprise functionality to determine and
configure optical properties and settings including, but not
limited to, focus, exposure, color or white balance, and areas of
interest (e.g., via a focus motor, aperture control, etc.). In one
embodiment, camera 202 may also represent a front facing and a back
facing camera both of which are operable to capture images
contemporaneously.
[0042] Image signal processor (ISP) 204 is coupled to
communications bus 214 and processes the signal generated by camera
202, as described herein. More specifically, image signal processor
204 may process data from camera 202 for storing in memory 206. For
example, image signal processor 204 may process frames of visual
data captured using camera 202 to be stored within memory 206.
[0043] Input module 208 allows entry of commands into system 200
which may then, among other things, control the sampling of data by
camera 202 and subsequent processing by ISP 204. Input module 208
may include, but is not limited to, navigation pads, keyboards
(e.g., QWERTY), up/down buttons, touch screen controls (e.g., via
display 212) and the like.
[0044] Central processing unit (CPU) 210 receives commands via
input module 208 and may control a variety of operations including,
but not limited to, sampling and configuration of camera 202,
processing by ISP 204, and management (e.g., addition, transfer,
and removal) of images and/or video from memory 206.
[0045] Inertial Measurement Unit (IMU) module 240 can detect the
current rate of acceleration of the device 200 using one or more
accelerometers in device 200 (not shown). Accelerometers detect
acceleration forces along a single axis; three are often combined
to provide acceleration detection along the x, y, and z axes. When
the accelerometer is at rest, the axis pointing down will read one
(1 g) due to the force of gravity, and the two horizontal axes will
read zero.
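For illustration only, the following sketch shows how a single at-rest accelerometer sample can be converted into pitch and roll. The axis convention and the function name are assumptions made for this example and are not specified by the present description.

```python
import math

def pitch_roll_from_accel(ax, ay, az):
    """Estimate pitch and roll (radians) from one at-rest accelerometer
    sample, where the only sensed force is gravity.

    Assumed axis convention: x points right, y points up the screen,
    z points out of the screen toward the user (device lying face up).
    """
    # Roll: rotation about the y axis, from the x and z components.
    roll = math.atan2(ax, az)
    # Pitch: rotation about the x axis, from y against the x/z magnitude.
    pitch = math.atan2(-ay, math.hypot(ax, az))
    return pitch, roll

# Device lying flat and face up: gravity is all on the z axis (~1 g).
print(pitch_roll_from_accel(0.0, 0.0, 1.0))  # -> (0.0, 0.0)
```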
[0046] The IMU module 240 can also detect changes in rotational
attributes like pitch, roll, and yaw using one or more gyroscopes
in device 200 (not shown). A gyroscope detects the rotational
change of a device. Finally, IMU module 240 may also receive data
from a magnetometer (not shown), which is a sensor for measuring
the strength and direction of magnetic fields, and can be used for
tracking magnetic north, thereby acting like a compass.
Alternatively, the IMU module 240 may receive direction data from a
digital compass.
[0047] FIG. 3 illustrates a use case for the visual gyroscope in
accordance with one embodiment of the present invention. As the
stationary user 310 pans the camera on his handheld device around,
the visual gyroscope of the present invention creates a spherical
map of the local environment using vision data from the camera by
dynamically identifying and tracking features in the environment.
In order to create the spherical map, the visual gyroscope needs to
track the orientation of the device. Stated differently, the visual
gyroscope of the present invention can track features or landmarks
in the environment and use information from the tracking to
determine the handheld device's orientation, which is used in order
to create a spherical map of the environment for identification
thereof. The map can then be used in, for example, augmented
reality applications, where the user could obtain more information
regarding the landmarks in the scene identified through the visual
gyroscope. For example, user 310 in FIG. 3 could identify the
building that the camera pans across in section 320. Or the user
may, for example, be able to determine whether a particular
restaurant in the building is open or closed by receiving real time
information concerning the identified restaurant. By way of further
example, the user may be able to overlay certain graphics on
landmarks within the field of view with information regarding the
identified businesses, e.g., restaurant menus on restaurants, menu
of services rendered by a spa, etc.
[0048] Unlike conventional SLAM based and other approaches, the
user is not required to change the position of the camera to build
a model of the scene and construct an entire map of the
environment. A stationary user can capture the feature map with
natural panning motion.
[0049] FIG. 4 is a high level block diagram illustrating the
elements of the orientation tracking system 405 proposed in
accordance with an embodiment of the present invention. The basic
approach employed in the present invention is to use vision-based
methods with a visual gyroscope for image feature recognition, while
also combining vision data with IMU sensor data for robustness. As
explained above, relying on IMU sensor data alone leads to drift,
which causes marker misplacement. Thus, embodiments of the present
invention use vision data to correct for drift, but also use IMU
sensor data as a back-up when acquisition of visual data is lost or
breaks down.
[0050] As shown in FIG. 4, data from the gyroscope 414,
accelerometer 416 and magnetometer 418 feeds into the IMU sensor
420. The output of the camera 412 and the IMU sensor 420 is
transmitted to the user application 422. The time-stamped camera
data and IMU sensor data 410 from user application 422 is directed
to the visual gyroscope module 480. The visual gyroscope module 480
uses the time-stamped camera and sensor data to create time-stamped
device orientation data 490. The orientation data 490 is fed back
into the user application 422. In one embodiment, user application
422 can be an augmented reality application used, for example, to
locate nearby restaurants.
[0051] FIG. 5 is a diagram of a visual gyroscope capturing an
initial keyframe in accordance with an embodiment of the present
invention. The visual gyroscope module creates an orientation map
524 by capturing keyframes. A keyframe is a snapshot taken by the
camera in the direction it is currently pointing, when that
direction is rotated sufficiently far from a prior keyframe. This keyframe
is saved as part of an orientation map and can be found if the
device gets lost while the user is panning around. As shown in FIG.
5, in order to create orientation map 524, the device captures an
initial keyframe 522 at user direction 526.
[0052] When the system 200 is first started, the visual gyroscope
module is initialized to the absolute orientation of the device
read from the IMU sensor 420. This initialization plays an
important role when visual tracking fails and will be explained
further below.
[0053] After every captured frame, including the initial keyframe,
the visual gyroscope procedure computes visual features that are
repeatable and distinct within the frame. Any number of feature
detecting procedures can be employed for this purpose, e.g., Harris
& Stephens, Plessey, Shi-Tomasi, Moravec corner detection, Wang
and Brady corner detection, SUSAN (smallest univalue segment
assimilating nucleus) corner detector etc. In one embodiment, the
visual gyroscope uses FAST (Features from Accelerated Segment Test)
for feature detection. FAST is a well-known corner detection
algorithm. Corner detection is an approach used within computer
vision systems to extract certain kinds of features and infer the
contents of an image.
[0054] The visual gyroscope procedure then performs feature
description for the features found from feature detection using a
well-known procedure, e.g., SURF (Speeded Up Robust Feature), Scale
Invariant Feature Transform (SIFT), or BRIEF (Binary Robust
Independent Elementary Features). In one embodiment, the visual
gyroscope module uses BRIEF for feature description. Feature
description comprises detecting and describing local features in
images. For example, for any object in an image, interesting points
on the object can be extracted to provide a feature description of
the object. This description, extracted from a training image, can
then be used to identify the object when attempting to locate the
object in a test image containing many other objects. Accordingly,
the visual gyroscope procedure, in one embodiment, can use BRIEF
for feature description of objects within the captured frame.
[0055] The feature detection procedure, e.g., FAST and the feature
description procedure, e.g., BRIEF are run on every captured frame
including the initial keyframe. Feature detection procedures find
features in the image while feature description procedures describe
the feature in the sequence of bits so as to compare it with
similar features in other frames. In one embodiment, the smoothing
operation in BRIEF can be removed to speed up the procedure so that
it runs in real time.
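As a concrete illustration only, the following sketch runs FAST detection and BRIEF description on a single frame using OpenCV. The present description does not mandate any particular implementation; the BRIEF extractor shown lives in the opencv-contrib-python package (cv2.xfeatures2d), and the threshold value is an arbitrary example.

```python
import cv2

def detect_and_describe(frame_bgr):
    """Detect FAST corners and compute BRIEF descriptors for one frame.

    Returns (keypoints, descriptors); descriptors is an N x 32 uint8
    array (256-bit binary BRIEF strings), or None if nothing was found.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)

    # FAST corner detector; the threshold is an illustrative value.
    fast = cv2.FastFeatureDetector_create(threshold=25, nonmaxSuppression=True)
    keypoints = fast.detect(gray, None)

    # BRIEF descriptor extractor (requires opencv-contrib-python).
    brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()
    keypoints, descriptors = brief.compute(gray, keypoints)
    return keypoints, descriptors
```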
[0056] In one embodiment, the feature points are projected onto the
image plane as seen by the "ideal" camera by applying the inverse
camera matrix K^-1 to the (x, y, 1) column vector. In one embodiment, the
newly transformed points are normalized to make a spherical
representation of the points. All features in every frame will be
warped in this fashion.
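A minimal numpy sketch of this warping step follows, under the assumption of a standard 3x3 pinhole intrinsics matrix K; the function name and the example intrinsics are illustrative only.

```python
import numpy as np

def project_to_unit_sphere(points_xy, K):
    """Back-project pixel coordinates through the inverse intrinsics
    K^-1 and normalize each ray, giving points on the unit sphere.

    points_xy: (N, 2) array of pixel coordinates (x, y).
    K:         (3, 3) camera intrinsics matrix.
    """
    n = points_xy.shape[0]
    # Homogeneous pixel coordinates (x, y, 1) as column vectors.
    homog = np.hstack([points_xy, np.ones((n, 1))]).T       # (3, N)
    rays = np.linalg.inv(K) @ homog                          # (3, N)
    rays /= np.linalg.norm(rays, axis=0, keepdims=True)      # unit length
    return rays.T                                            # (N, 3)

# Example intrinsics (illustrative values only).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
print(project_to_unit_sphere(np.array([[320.0, 240.0]]), K))  # principal ray
```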
[0057] FIG. 6 is an illustration of the visual gyroscope capturing
a second frame following a user orientation rotation from an
initial direction in accordance with an embodiment of the present
invention. If the procedure determines that the user has panned
around sufficiently and that there are enough matching visual
features between the second frame and the initial keyframe, the
visual gyroscope module will promote frame 523 at user direction
527 to keyframe status.
[0058] In order to determine if the user has panned around
sufficiently, the visual gyroscope procedure matches features
between the current frame and the prior keyframe and then computes
a rotation between the two if the features match. Stated
differently, the procedure computes a relative rotation between the
two consecutive frames from the differences in position of a
matching set of feature points in the two images. In one
embodiment, the procedure may build a grid for faster matching.
This way, features are matched only within corresponding grid cells
rather than brute-force matching the entire set of features. Matching
features from the prior keyframe to the new frame allows the
procedure to determine which locations on the sphere map to new
locations. When matches are found, the visual gyroscope procedure
can use Horn's procedure with RANSAC, as will be explained below,
to estimate a pure rotation from the matched points.
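One way such a grid could work is sketched below; this is an illustrative example rather than the implementation described above, and the cell size, distance threshold, and function names are assumptions. Features of the prior keyframe are bucketed by image cell, and each new feature is compared only against candidates in its own and neighboring cells using the Hamming distance between binary descriptors.

```python
import numpy as np
from collections import defaultdict

def build_grid(points_xy, cell=64):
    """Bucket feature indices by the grid cell containing each point."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points_xy):
        grid[(int(x) // cell, int(y) // cell)].append(i)
    return grid

def match_with_grid(pts_a, desc_a, pts_b, desc_b, cell=64, max_dist=40):
    """Match each feature in frame A against frame B features that fall
    in the same or an adjacent grid cell, using Hamming distance on
    packed binary descriptors. Returns a list of (index_a, index_b)."""
    grid_b = build_grid(pts_b, cell)
    matches = []
    for i, (x, y) in enumerate(pts_a):
        cx, cy = int(x) // cell, int(y) // cell
        best, best_dist = None, max_dist
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid_b.get((cx + dx, cy + dy), []):
                    # Hamming distance between packed uint8 descriptors.
                    d = np.unpackbits(desc_a[i] ^ desc_b[j]).sum()
                    if d < best_dist:
                        best, best_dist = j, d
        if best is not None:
            matches.append((i, best))
    return matches
```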
[0059] The vision data from the camera enables the visual gyroscope
to determine an approximation of the user rotation. In one
embodiment, the well-known Random Sample Consensus (RANSAC)
procedure can be used to match a frame to a prior keyframe. Horn's
procedure as described in the following: "B. Horn. Closed-form
solution of absolute orientation using unit quaternions. Journal of
the Optical Society of America, 1987", all of which is incorporated
herein by reference, is used to compute a rotation between two sets
of three points, which are then used for RANSAC sampling. Horn's
method also demonstrates how to compute a rotation between two
sets of all matched points. This is then used to compute the final
rotation between frames once the RANSAC procedure has determined
whether the rotation computed from the two sets of three points
yields enough inliers across all points. While RANSAC
and Horn's procedure can be used to determine the rotation between
keyframes, the embodiments of the present invention are not limited
to solely these procedures. For example, in one embodiment, changes
or deltas in absolute sensor orientation received from the IMU
sensor can also be used to approximate user rotation.
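For illustration, the following sketch estimates a pure rotation from matched unit rays inside a small RANSAC loop. It assumes the matched features have already been back-projected onto the unit sphere, and it uses the SVD-based closed-form solution, which for a pure rotation gives the same result as Horn's quaternion method; the iteration count and inlier threshold are illustrative.

```python
import numpy as np

def best_fit_rotation(a, b):
    """Closed-form rotation R such that R @ a_i best aligns with b_i,
    for unit rays a, b of shape (N, 3). SVD-based; equivalent in result
    to Horn's quaternion method for a pure rotation."""
    H = a.T @ b
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])            # guard against reflections
    return Vt.T @ D @ U.T

def ransac_rotation(a, b, iters=200, thresh=0.01, rng=None):
    """Estimate a rotation from matched unit rays with RANSAC.
    Samples 3 correspondences per iteration, keeps the hypothesis with
    the most inliers, then refits on all inliers. Assumes at least 3
    mostly correct matches are available."""
    rng = rng or np.random.default_rng(0)
    best_inliers = None
    for _ in range(iters):
        idx = rng.choice(len(a), size=3, replace=False)
        R = best_fit_rotation(a[idx], b[idx])
        err = np.linalg.norm((a @ R.T) - b, axis=1)
        inliers = err < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_fit_rotation(a[best_inliers], b[best_inliers]), best_inliers
```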
[0060] In one embodiment, keyframes are captured at approximately
every 20 degrees of user rotation. If the procedure determines that
the user has panned above a certain threshold, e.g., 20 degrees,
and that there are enough matching visual features in captured
frame 523, it will promote the captured frame to a keyframe status.
Conversely, if it is determined that the user has not panned a
distance sufficiently far from initial keyframe 522, the procedure
will not promote the captured frame to a keyframe. The new keyframe
will match to the nearest prior keyframe based, in one embodiment,
on a dot product lookup. If, however, for example, no keyframe is
near, then it will match to the last frame and save that as a
keyframe, if possible.
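The promotion test can be illustrated as follows; the helper names, the minimum match count, and the exact form of the dot-product lookup are assumptions for this sketch, while the roughly 20-degree threshold comes from the embodiment described above.

```python
import numpy as np

KEYFRAME_THRESHOLD_DEG = 20.0   # example threshold from the embodiment above

def rotation_angle_deg(R):
    """Angle of rotation (degrees) encoded by a 3x3 rotation matrix."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))

def nearest_keyframe(view_dir, keyframes):
    """Pick the keyframe whose stored viewing direction has the largest
    dot product with (i.e., smallest angle to) the current direction.
    keyframes: list of objects with a .view_dir unit vector attribute."""
    return max(keyframes, key=lambda kf: float(np.dot(kf.view_dir, view_dir)))

def should_promote(R_frame_to_keyframe, num_matches, min_matches=30):
    """Promote the frame when it has rotated far enough from the matched
    keyframe and still shares enough visual features with it."""
    return (rotation_angle_deg(R_frame_to_keyframe) > KEYFRAME_THRESHOLD_DEG
            and num_matches >= min_matches)
```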
[0061] In one embodiment, a "bundle adjustment" is performed when a
new keyframe is added to the map. A bundle adjustment, which is a
well-known method, comprises globally adjusting every keyframe to
minimize orientation error of each keyframe every time a new
keyframe is added to the map. Global alignment (bundle adjustment)
is based on the difference between the angle of neighboring
keyframes and what a brute force match provides as an angle.
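A highly simplified stand-in for this global alignment is sketched below. It assumes keyframe orientations are stored as rotation matrices and that relative rotations have been measured between neighboring keyframes; the relaxation scheme and names are illustrative only and are not the bundle adjustment procedure of the embodiment.

```python
import numpy as np

def project_to_so3(M):
    """Project an arbitrary 3x3 matrix onto the nearest rotation matrix."""
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt

def relax_orientations(R, constraints, sweeps=10):
    """Toy orientation-only relaxation.

    R:           dict keyframe_id -> 3x3 absolute rotation.
    constraints: list of (i, j, R_ij) with measured relative rotations
                 satisfying R[i] ~= R_ij @ R[j].
    Each sweep re-estimates every keyframe from its neighbors and
    projects the averaged estimate back onto SO(3). A real bundle
    adjustment would also hold one keyframe fixed as the gauge.
    """
    for i in R:
        for _ in range(sweeps):
            preds = [R_ij @ R[j] for (a, j, R_ij) in constraints if a == i]
            preds += [R_ij.T @ R[a] for (a, b, R_ij) in constraints if b == i]
            if preds:
                R[i] = project_to_so3(sum(preds) / len(preds))
    return R
```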
[0062] FIG. 7 is an illustration of the visual gyroscope starting a
new map based on IMU sensor data in accordance with an embodiment
of the present invention. Continuing with visual tracking, the
visual gyroscope is able to save additional keyframes beyond
keyframe 523. As mentioned before, the visual gyroscope is
initialized to the absolute orientation of the device as read from
the IMU sensor 420 on start-up. Accordingly, the visual gyroscope
is able to continually update the orientation as the user moves if
the environment permits.
[0063] However, sometimes vision-based tracking may fail for any of
several reasons, e.g., insufficient texture in the environment, not
enough landmarks available, or highly dynamic content as a result
of a user panning too quickly. In FIG. 7, for instance, the
user has lost visual tracking because panning too fast results in
motion blur created in the visual data captured by the camera. As a
result of motion blur, the camera is no longer able to match
features to prior frames and therefore loses its orientation.
[0064] When visual tracking is lost, the procedure starts a new map
with keyframe 703 using data from IMU sensor 420. At this point,
the absolute sensor orientation and the visual gyroscope
orientation may show different readings and, occasionally, even
vastly different readings even though they both were initialized
with the same orientation value because of the drift in the IMU
sensor 420. Thus, instead of using the absolute sensor orientation,
the visual gyroscope uses the deltas (or differences) of absolute
sensor orientation to calculate a relative orientation traveled,
which is then combined with the visual gyroscope orientation
reading from the point where visual tracking failed. For example, in FIG.
7, assuming visual tracking failed at user direction 527 right
after keyframe 523 was captured and the user pans around to user
direction 528, then the relative orientation difference between
user direction 527 and user direction 528 is calculated using delta
values from the absolute sensor orientation. The relative
orientation is then combined with the visual gyroscope orientation
reading obtained at user direction 527 to determine the orientation
at user direction 528. This creates a smooth experience and the
user would not see the transition from visual tracking to sensor
tracking.
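A sketch of this fallback computation follows, under the assumption that orientations are represented as unit quaternions in (w, x, y, z) order; the composition order shown depends on that convention, and the function names are illustrative.

```python
import numpy as np

def quat_mul(q, r):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_conj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def fallback_orientation(q_visual_last, q_imu_at_loss, q_imu_now):
    """Orientation while visual tracking is lost.

    The relative rotation travelled since tracking failed is the delta
    between the current and at-loss absolute IMU readings; composing it
    onto the last known visual orientation avoids using the (drifted)
    absolute IMU value directly.
    """
    delta = quat_mul(q_imu_now, quat_conj(q_imu_at_loss))
    q = quat_mul(delta, q_visual_last)
    return q / np.linalg.norm(q)
```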
[0065] After determining orientation by combining the last known
visual gyroscope reading with the delta values from the absolute
sensor orientation, the visual gyroscope starts a new map at user
direction 528 by capturing keyframe 703.
[0066] FIG. 8 is an illustration of the visual gyroscope creating a
new map based on IMU sensor data until an overlap is found with the
map created from vision-based data in accordance with an embodiment
of the present invention. As shown in FIG. 8, a new map is created
comprising keyframes 808 using IMU sensor 420 data when visual
tracking fails. Because of feature tracking in each keyframe, the
visual gyroscope can determine if the user has panned back to a
location for which keyframes based on visual tracking already
exist. In other words, the visual gyroscope can recognize when
there is an overlap between the initial map created using visual
tracking (comprising keyframes 522 and 523) and the new map
comprising keyframes 808.
[0067] In one embodiment, when the overlap is found, the visual
gyroscope deletes the secondary map created using sensor data,
e.g., map comprising frames 808. In other words, the sensor data is
only used when visual tracking has failed. When the procedure
recognizes a keyframe from before that was created using visual
tracking, it immediately reverts back to visual tracking and
discontinues use of sensor tracking at that time. Moreover, it
deletes the map obtained through sensor tracking.
[0068] In a different embodiment, however, when an overlap is
found, the visual gyroscope will merge the new map created through
sensor tracking with the prior map created using visual tracking.
The combined map will then comprise the prior map comprising
keyframes 522 and 523 and the new map comprising keyframes 808.
[0069] In one embodiment of the present invention, the visual
gyroscope is able to turn on certain rejection zones in the
camera's field of view. Rejection zones are areas in the spherical
map that are not allowed to save keyframes. This is important
because the map can experience significant drift if keyframes are
saved based on features that are too close to the viewer. Thus, the
visual gyroscope turns on dead-zones for angles that are pointed
down, e.g., the ground. Dead-zones may also be turned on in areas that are
featureless, e.g., the sky. Accordingly, for the rejection zones,
precision is not important, and, therefore, the visual gyroscope
relies on the tracking from the IMU sensor 420.
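An illustrative rejection-zone check is shown below; the elevation thresholds, the up-axis convention, and the function name are assumptions for this example. When it returns true, the frame would not be saved as a keyframe and the IMU tracking would be trusted instead.

```python
import numpy as np

def in_rejection_zone(view_dir, down_limit_deg=30.0, up_limit_deg=60.0):
    """Return True when the camera points too far down (ground, nearby
    objects) or too far up (featureless sky).

    view_dir: unit vector of the camera's viewing direction in world
    coordinates, with +z taken as "up" (an assumed convention).
    """
    elevation = np.degrees(np.arcsin(np.clip(view_dir[2], -1.0, 1.0)))
    return elevation < -down_limit_deg or elevation > up_limit_deg

# Pointing about 45 degrees below the horizon: keyframes are rejected.
print(in_rejection_zone(np.array([0.707, 0.0, -0.707])))  # True
```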
[0070] In one embodiment, the present invention takes advantage of
the fact that many scenes can be approximated as objects at
infinity, e.g., in a panoramic model. This simplification is
leveraged by the visual gyroscope in order to simplify the
procedures it is implementing.
[0071] FIG. 9 depicts a flowchart of an exemplary computer
implemented process of tracking device orientation using visual
tracking in accordance with one embodiment of the present
invention. While the various steps in this flowchart are presented
and described sequentially, one of ordinary skill will appreciate
that some or all of the steps can be executed in different orders
and some or all of the steps can be executed in parallel. Further,
in one or more embodiments of the invention, one or more of the
steps described below can be omitted, repeated, and/or performed in
a different order. Accordingly, the specific arrangement of steps
shown in FIG. 9 should not be construed as limiting the scope of
the invention. Rather, it will be apparent to persons skilled in
the relevant art(s) from the teachings provided herein that other
functional flows are within the scope and spirit of the present
invention. Flowchart 900 may be described with continued reference
to exemplary embodiments described above, though the method is not
limited to those embodiments.
[0072] At step 910, the device orientation is initialized using
absolute sensor orientation. At step 912, a frame is captured from
the camera on the handheld device or any other visual capture
device. At step 914, the features in the frame are
determined using procedures such as FAST and BRIEF as explained
above. At step 916, the frame is matched to a prior keyframe and a
rotation is computed between the current frame and prior keyframe.
In one embodiment, RANSAC and Horn's procedure are used to perform
the matching and rotation computation. If it is found that the user
has rotated orientation of the handheld device over a certain
threshold, e.g., 20 degrees, then the current frame is promoted to
a keyframe. It should be noted, however, that there are other
considerations that, in one embodiment, may also be taken into
account before promoting a frame to a keyframe status, e.g.,
ascertaining that the frame has enough visual features, that the
frame is not in a restricted zone, and also that it is at an
appropriate distance away from another keyframe. Finally, at step
920, bundle adjustment is performed on the newly added keyframe
with all the other keyframes.
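The steps of flowchart 900 can be tied together as a single loop, as in the sketch below. Every object and method name here (camera, imu, omap, and their methods) is an assumed placeholder for this illustration rather than an interface defined by the present description.

```python
def visual_tracking_loop(camera, imu, omap):
    """Illustrative main loop for flowchart 900.

    camera, imu and omap (the orientation map) are assumed objects;
    every method used below is a placeholder, not a defined API.
    """
    orientation = imu.absolute_orientation()                  # step 910
    while True:
        frame = camera.capture()                              # step 912
        features = frame.detect_and_describe()                # step 914 (e.g., FAST + BRIEF)
        keyframe = omap.nearest_keyframe(orientation)
        rotation, matches = keyframe.match_and_estimate(features)  # step 916
        orientation = keyframe.orientation.compose(rotation)
        # Promote when rotated past the threshold with enough shared
        # features and the view is not inside a rejection zone.
        if omap.should_promote(rotation, matches, orientation):
            omap.add_keyframe(frame, features, orientation)
            omap.bundle_adjust()                              # step 920
```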
[0073] FIG. 10 depicts a flowchart of an exemplary computer
implemented process of using sensors for tracking device
orientation when visual tracking is temporarily lost in accordance
with one embodiment of the present invention. While the various
steps in this flowchart are presented and described sequentially,
one of ordinary skill will appreciate that some or all of the steps
can be executed in different orders and some or all of the steps
can be executed in parallel. Further, in one or more embodiments of
the invention, one or more of the steps described below can be
omitted, repeated, and/or performed in a different order.
Accordingly, the specific arrangement of steps shown in FIG. 10
should not be construed as limiting the scope of the invention.
Rather, it will be apparent to persons skilled in the relevant
art(s) from the teachings provided herein that other functional
flows are within the scope and spirit of the present invention.
Flowchart 1000 may be described with continued reference to
exemplary embodiments described above, though the method is not
limited to those embodiments.
[0074] At step 1012, after visual tracking is lost, embodiments of
the present invention determine device orientation by combining
delta values obtained from absolute sensor orientation to a last
computed orientation reading from the visual gyroscope, wherein the
last computed orientation reading is based on visual tracking.
[0075] At step 1013, a new keyframe is saved based on the delta
values from the IMU sensor. This acts as the initial keyframe in a
new orientation map that is generated based on IMU sensor data. In
other words, relative orientation between the first map based on
visual tracking and the second map is calculated using IMU delta
values. Accordingly, at step 1014 a new map is built based on the
sensor data. It should be noted that this new orientation map based
on IMU sensor data is only built if visual tracking is lost and the
visual gyroscope module determines that the incoming new frames are
not close enough to match to the first map. In other words, the second
orientation map is only created if the prior map is "lost."
[0076] At step 1015, as new keyframes are added to the new
orientation map, the visual gyroscope maintains matching features
to determine if there is an overlap between the prior map based on
visual tracking and the new map based on sensor data.
[0077] At step 1016, when an overlap is found, in one embodiment,
the new map based on sensor data is deleted and the visual
gyroscope continues to build the prior map based on visual tracking
data. In a different embodiment, however, when an overlap is found,
the prior map and the new map based on sensor data are merged and
the visual gyroscope continues to build the map based on visual
tracking data.
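The two overlap-handling policies can be illustrated as follows; the map and keyframe interfaces used here are assumptions made for this sketch.

```python
def handle_overlap(primary_map, sensor_map, frame_features, delete_on_overlap=True):
    """When a keyframe of the vision-based primary map is recognized
    again, either discard the IMU-based secondary map (one embodiment)
    or merge it into the primary map (another embodiment).
    All interfaces are assumed placeholders."""
    match = primary_map.recognize(frame_features)
    if match is None:
        return sensor_map          # no overlap yet: keep building it
    if delete_on_overlap:
        sensor_map.clear()         # revert entirely to visual tracking
    else:
        primary_map.merge(sensor_map)
    return None                    # secondary map no longer needed
```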
[0078] While the foregoing disclosure sets forth various
embodiments using specific block diagrams, flowcharts, and
examples, each block diagram component, flowchart step, operation,
and/or component described and/or illustrated herein may be
implemented, individually and/or collectively, using a wide range
of hardware, software, or firmware (or any combination thereof)
configurations. In addition, any disclosure of components contained
within other components should be considered as examples because
many other architectures can be implemented to achieve the same
functionality.
[0079] The process parameters and sequence of steps described
and/or illustrated herein are given by way of example only. For
example, while the steps illustrated and/or described herein may be
shown or discussed in a particular order, these steps do not
necessarily need to be performed in the order illustrated or
discussed. The various example methods described and/or illustrated
herein may also omit one or more of the steps described or
illustrated herein or include additional steps in addition to those
disclosed.
[0080] While various embodiments have been described and/or
illustrated herein in the context of fully functional computing
systems, one or more of these example embodiments may be
distributed as a program product in a variety of forms, regardless
of the particular type of computer-readable media used to actually
carry out the distribution. The embodiments disclosed herein may
also be implemented using software modules that perform certain
tasks. These software modules may include script, batch, or other
executable files that may be stored on a computer-readable storage
medium or in a computing system. These software modules may
configure a computing system to perform one or more of the example
embodiments disclosed herein. One or more of the software modules
disclosed herein may be implemented in a cloud computing
environment. Cloud computing environments may provide various
services and applications via the Internet. These cloud-based
services (e.g., software as a service, platform as a service,
infrastructure as a service, etc.) may be accessible through a Web
browser or other remote interface. Various functions described
herein may be provided through a remote desktop environment or any
other cloud-based computing environment.
[0081] The foregoing description, for purpose of explanation, has
been described with reference to specific embodiments. However, the
illustrative discussions above are not intended to be exhaustive or
to limit the invention to the precise forms disclosed. Many
modifications and variations are possible in view of the above
teachings. The embodiments were chosen and described in order to
best explain the principles of the invention and its practical
applications, to thereby enable others skilled in the art to best
utilize the invention and various embodiments with various
modifications as may be suited to the particular use
contemplated.
[0082] Embodiments according to the invention are thus described.
While the present disclosure has been described in particular
embodiments, it should be appreciated that the invention should not
be construed as limited by such embodiments, but rather construed
according to the below claims.
* * * * *