U.S. patent application number 15/691439 was filed with the patent office on 2018-03-01 for systems and methods for simultaneous localization and mapping.
The applicant listed for this patent is C3D AUGMENTED REALITY SOLUTIONS LTD. Invention is credited to David Benezra, Ozi Egri, Eyal Zak.
Application Number: 20180061072 (15/691439)
Family ID: 59887381
Filed Date: 2018-03-01
United States Patent Application 20180061072
Kind Code: A1
Benezra; David; et al.
March 1, 2018
SYSTEMS AND METHODS FOR SIMULTANEOUS LOCALIZATION AND MAPPING
Abstract
Various embodiments provide systems, methods, devices, and
instructions for performing simultaneous localization and mapping
(SLAM) that involve initializing a SLAM process using images from
as few as two different poses of a camera within a physical
environment. Some embodiments may achieve this by disregarding
errors in matching corresponding features depicted in image frames
captured by an image sensor of a mobile computing device, and by
updating the SLAM process in a way that causes the minimization
process to converge to a global minimum rather than fall into a
local minimum.
Inventors: Benezra; David (Beit Shearim, IL); Zak; Eyal (Megiddo, IL); Egri; Ozi (Yassur, IL)
Applicant: C3D AUGMENTED REALITY SOLUTIONS LTD, Kfar Yehoshua, IL
Family ID: 59887381
Appl. No.: 15/691439
Filed: August 30, 2017
Related U.S. Patent Documents
Application Number: 62381036
Filing Date: Aug 30, 2016
Current U.S. Class: 1/1
Current CPC Class: G06T 7/73 20170101; G06T 7/70 20170101; G06T 7/20 20130101; G06T 7/579 20170101; G06T 2207/10016 20130101; G06T 2207/30244 20130101; G06T 7/246 20170101; G06T 2207/30241 20130101
International Class: G06T 7/579 20060101 G06T007/579; G06T 7/20 20060101 G06T007/20; G06T 7/70 20060101 G06T007/70
Claims
1. A method comprising: continuously capturing, by an image sensor,
new image frames of a physical environment and adding the new image
frames to a set of captured image frames; continuously capturing,
from an inertial measurement unit (IMU), IMU data in correspondence
with the image frames captured, the captured IMU data comprising
degrees of freedom (DOF) parameters of the image sensor;
identifying, by one or more hardware processors, a first key image
frame from the set of captured image frames; identifying, by the
one or more hardware processors, first IMU data, from the captured
IMU data, associated with the first key image frame; detecting, by
the IMU, a movement of the image sensor from a first pose, in the
physical environment, to a second pose in the physical environment;
in response to detecting the movement: identifying, by the one or
more hardware processors, a second key image frame from the set of
captured image frames; identifying, by the one or more hardware
processors, second IMU data, from the captured IMU data, associated
with the second key image frame; performing, by one or more
hardware processors, feature matching on at least the first and
second key image frames to identify a set of matching
three-dimensional (3D) features in the physical environment;
generating, by the one or more hardware processors, a filtered set
of matching 3D features by filtering out at least one erroneous
feature, from the set of matching 3D features, based on a set of
error criteria; and determining, by the one or more hardware
processors, a first set of six degrees of freedom (6DOF) parameters
of the image sensor for the second key image frame and a set of 3D
positions for the set of matching 3D features, the determining
comprising performing a simultaneous localization and mapping
(SLAM) process based on the identified first IMU data, the
identified second IMU data, and the filtered set of matching 3D
features.
2. The method of claim 1, wherein the first IMU data comprises a
set of four degrees of freedom (4DOF) parameters of the image
sensor, and the second IMU data comprises a set of 4DOF parameters
of the image sensor.
3. The method of claim 1, wherein the image sensor and the IMU are
included in a device.
4. The method of claim 3, wherein the movement of the image sensor
is caused by a human individual that is holding the device
performing a sidestep.
5. The method of claim 4, wherein the identifying the first key
image frame comprises detecting a start impulse of the sidestep,
and the first key image frame is a particular image frame, in the
set of captured image frames, corresponding with the detected start
impulse.
6. The method of claim 4, wherein the identifying the second key
image frame comprises detecting a finish impulse of the sidestep,
and the second key image frame is a particular image frame, in the
set of captured image frames, corresponding with the detected
finish impulse.
7. The method of claim 1, further comprising: for each particular
new image frame added to the set of captured image frames:
determining, by the one or more hardware processors, whether a set
of key image frame conditions is satisfied for the particular new
image frame; in response to the set of key image frame conditions
being satisfied for the particular new image frame, identifying, by
the one or more hardware processors, the particular new image frame
as a new key image frame and performing, by the one or more
hardware processors, a full SLAM process cycle on the new key image
frame; and in response to the set of key image frame conditions not
being satisfied for the particular new image frame, performing, by
the one or more hardware processors, a partial SLAM process cycle
on the particular new image frame as a non-key image frame, the
partial SLAM process cycle comprising only a localization portion
of a SLAM process.
8. The method of claim 7, wherein the set of key image frame
conditions comprises at least one of the new image frame meeting or
exceeding a certain image quality, a minimum time has elapsed since
the full SLAM process cycle was last performed, and a translation
between a prior image frame and the new image frame meets or
exceeds a minimum translation threshold.
9. The method of claim 7, wherein performing the full SLAM process
cycle on the new key image frame comprises: identifying, by the one
or more hardware processors, third IMU data, from the captured IMU
data, associated with the new key image frame; performing, by the
one or more hardware processors, feature matching on the new key
image frame and at least one prior image frame to identify a second
set of matching 3D features in the physical environment;
determining, by the one or more hardware processors, a second set
of 6DOF parameters of the image sensor for the new key image frame
by performing the SLAM process on the new key image frame based on
the second set of matching 3D features and the third IMU data;
generating, by the one or more hardware processors, a second
filtered set of matching 3D features by filtering out at least one
erroneous feature, from the second set of matching 3D features,
based on a second set of error criteria and the second set of 6DOF
parameters; and determining, by the one or more hardware
processors, a third set of 6DOF parameters of the image sensor for
the new key image frame and a set of 3D positions of new 3D
features in the physical environment by performing the SLAM process
on all key image frames based on the second filtered set of
matching 3D features and the third IMU data.
10. The method of claim 7, wherein performing the partial SLAM
process cycle on the non-key image frame comprises: performing, by
the one or more hardware processors, two-dimensional (2D) feature
tracking on the non-key image frame, based on a set of 3D positions
of new 3D features from the performing the full SLAM process cycle
and a new key image frame most recently identified, to identify a
set of 2D features; determining, by the one or more hardware
processors, a fourth set of 6DOF parameters of the image sensor for
the non-key image frame by performing only the localization portion
of the SLAM process based on the set of 2D features; generating, by
the one or more hardware processors, a filtered set of 2D features
by filtering out at least one erroneous feature, from the set of 2D
features, based on a third set of error criteria and the fourth set
of 6DOF parameters; and projecting, by the one or more hardware
processors, a set of tracking points on the non-key image frame
based on the filtered set of 2D features and the fourth set of 6DOF
parameters.
11. A method comprising: continuously capturing, by an image
sensor, new image frames of a physical environment and adding the
new image frames to a set of captured image frames; continuously
capturing, from an inertial measurement unit (IMU), IMU data in
correspondence with the capturing of the image frames, the captured
IMU data comprising degrees of freedom (DOF) parameters of the
image sensor; for each particular new image frame added to the set
of captured image frames: determining, by the one or more hardware
processors, whether a set of key image frame conditions is
satisfied for the particular new image frame; in response to the
set of key image frame conditions being satisfied for the
particular new image frame, identifying, by the one or more
hardware processors, the particular new image frame as a new key
image frame and performing, by the one or more hardware processors,
a full simultaneous localization and mapping (SLAM) process cycle
on the new key image frame; and in response to the set of key image
frame conditions not being satisfied for the particular new image
frame, performing, by the one or more hardware processors, a
partial SLAM process cycle on the particular new image frame as a
non-key image frame.
12. The method of claim 11, further comprising: detecting, by the
one or more hardware processors, a movement of the image sensor
from a first pose, in the physical environment, to a second pose,
in the physical environment, based on the captured IMU data; and
identifying, by the one or more hardware processors, a first key
image frame and a second key image frame based on the movement, the
first key image frame corresponding to a start impulse of the
movement, and the second key image frame corresponding to a finish
impulse of the movement.
13. The method of claim 11, wherein the set of key image frame
conditions comprises at least one of the new image frame meeting or
exceeding a certain image quality, a minimum time has elapsed since
the full SLAM process cycle was last performed, and a translation
between a prior image frame and the new image frame meets or
exceeds a minimum translation threshold.
14. The method of claim 11, wherein performing the full SLAM
process cycle on the new key image frame comprises: identifying, by
the one or more hardware processors, third IMU data, from the
captured IMU data, associated with the new key image frame;
performing, by the one or more hardware processors, feature
matching on the new key image frame and at least one prior image
frame to identify a second set of matching 3D features in the
physical environment; determining, by the one or more hardware
processors, a second set of 6DOF parameters of the image sensor for
the new key image frame by performing a SLAM process on the new key
image frame based on the second set of matching 3D features and the
third IMU data; generating, by the one or more hardware processors,
a second filtered set of matching 3D features by filtering out at
least one erroneous feature, from the second set of matching 3D
features, based on a second set of error criteria and the second
set of 6DOF parameters; and determining, by the one or more
hardware processors, a third set of 6DOF parameters of the image
sensor for the new key image frame and a set of 3D positions of new
3D features in the physical environment by performing the SLAM
process on all key image frames based on the second filtered set of
matching 3D features and the third IMU data.
15. The method of claim 11, wherein performing the partial SLAM
process cycle on the non-key image frame comprises: performing, by
the one or more hardware processors, two-dimensional (2D) feature
tracking on the non-key image frame, based on a set of 3D positions
of new 3D features from the performing a full SLAM process cycle
and a new key image frame most recently identified, to identify a
set of 2D features; determining, by the one or more hardware
processors, a fourth set of 6DOF parameters of the image sensor for
the non-key image frame by performing only a localization portion
of a SLAM process based on the set of 2D features; generating, by
the one or more hardware processors, a filtered set of 2D features
by filtering out at least one erroneous feature, from the set of 2D
features, based on a third set of error criteria and the fourth set
of 6DOF parameters; and projecting, by the one or more hardware
processors, a set of tracking points on the non-key image frame
based on the filtered set of 2D features and the fourth set of 6DOF
parameters.
16. A system comprising: an image frame capture module to
continuously capture, by an image sensor of a device, new image
frames of a physical environment and to add the new image frames to
a set of captured image frames; an inertial measurement unit (IMU)
data capture module to continuously capture, from an IMU of the
device, IMU data in correspondence with the image frames captured
by the image frame capture module, the captured IMU data comprising
degrees of freedom (DOF) parameters of the image sensor; a key
image frame module to: determine, for each particular new image
frame added to the set of captured image frames, whether a set of
key image frame conditions is satisfied for the particular new
image frame; and identify the particular new image frame as a new
key image frame in response to the set of key image frame
conditions being satisfied for the particular new image frame; a
full simultaneous localization and mapping (SLAM) cycle module to
perform a full SLAM process cycle on each particular new key image
frame identified by the key image frame module; and a partial SLAM
cycle module to perform a partial SLAM process cycle on each
particular new image frame not identified by the key image frame
module, the partial SLAM process cycle comprising performing only a
localization portion of a SLAM process.
17. The system of claim 16, further comprising: a SLAM
initialization module to: detect a movement of the image sensor
from a first pose, in the physical environment, to a second pose,
in the physical environment, based on the captured IMU data from
the IMU capture module; and identify a first key image frame and a
second key image frame based on the movement, the first key image
frame corresponding to a start impulse of the movement, and the
second key image frame corresponding to a finish impulse of the
movement.
18. The system of claim 16, wherein the set of key image frame
conditions comprises at least one of the new image frame meeting or
exceeding a certain image quality, a minimum time has elapsed since
the full SLAM process cycle was last performed, and a translation
between a prior image frame and the new image frame meets or
exceeds a minimum translation threshold.
19. The system of claim 16, wherein to perform the full SLAM
process cycle on each particular new key image frame identified by
the key image frame module, the full SLAM cycle module is to:
identify second IMU data, from the captured IMU data, associated
with the particular new key image frame; perform feature matching
on the new key image frame and at least one prior image frame to
identify a second set of matching 3D features in the physical
environment; determine a first set of 6DOF parameters of the image
sensor for the new key image frame by performing the SLAM process
on the new key image frame based on the second set of matching 3D
features and the second IMU data; generate a filtered set of
matching 3D features by filtering out at least one erroneous
feature, from the second set of matching 3D features, based on a
set of error criteria and the first set of 6DOF parameters; and
determine a second set of 6DOF parameters of the image sensor for
the new key image frame and a set of 3D positions of new 3D
features in the physical environment by performing the SLAM process
on all key image frames based on the filtered set of matching 3D
features and the second IMU data.
20. The system of claim 16, wherein to perform the partial SLAM
process cycle on each particular new image frame not identified by
the key image frame module, the partial SLAM cycle module is to:
perform two-dimensional (2D) feature tracking on the particular new
image frame, based on a set of 3D positions of new 3D features from
the full SLAM cycle module and a new key image frame most recently
identified by the key image frame module, to identify a set of 2D
features; determine a set of 6DOF parameters of the image sensor
for the non-key image frame by performing only the localization
portion of the SLAM process based on the set of 2D features;
generate a filtered set of 2D features by filtering out at least
one erroneous feature, from the set of 2D features, based on a set
of error criteria and the set of 6DOF parameters; and project a set
of tracking points on the non-key image frame based on the filtered
set of 2D features and the set of 6DOF parameters.
Description
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims priority to and the benefit of U.S.
Provisional Patent Application No. 62/381,036, filed on Aug. 30,
2016, which is incorporated herein by reference.
TECHNICAL FIELD
[0002] Embodiments described herein relate to localization and
mapping of a sensor within a physical environment and, more
particularly, to systems, methods, devices, and instructions for
performing simultaneous localization and mapping (SLAM).
BACKGROUND
[0003] A SLAM (Simultaneous Localization and Mapping) process
(e.g., algorithm) can be used by a mobile computing device (e.g.,
mobile phone, tablet, wearable augmented reality (AR) device,
wearable, autonomous aerial or ground vehicle, or a robot) to map
the structure of a physical environment surrounding the mobile
computing device and to localize the mobile computing device's
relative position within that mapped environment. A SLAM process
can usually map and localize, in real-time, as the mobile computing
device moves about within its physical environment.
[0004] Although not exclusively image-based, some SLAM processes
achieve mapping and localization by using images of the physical
environment provided by an image sensor associated with the mobile
computing device, such as a mobile phone's built-in camera. From
the captured images, such SLAM processes can recover the mobile
computing device position and construct a map of the physical
environment surrounding the mobile computing device, by recovering
both the image sensor's pose and the structure of the map without
initially knowing either.
[0005] SLAM processes that use captured images usually require
several images of corresponding physical features (hereafter,
features), in the physical environment, that are captured by an
image sensor (e.g., of a mobile computing device) at different
poses. The images captured from the different camera locations
permit such SLAM processes to converge and start their localization
and mapping processes. Unfortunately, the localization problem in
an image-based SLAM process is usually difficult to solve due to
errors in matching corresponding features between captured
images--these errors tend to move the result of the minimization
problem of the SLAM process to a local minimum rather than the
global minimum, which provides a specific location.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] Various ones of the appended drawings merely illustrate some
embodiments of the present disclosure and should not be considered
as limiting its scope. The drawings are not necessarily drawn to
scale. To easily identify the discussion of any particular element
or act, the most significant digit or digits in a reference number
refer to the figure number in which that element is first
introduced, and like numerals may describe similar components in
different views.
[0007] FIG. 1 is a block diagram illustrating an example high-level
client-server-based network architecture that includes a
simultaneous localization and mapping (SLAM) system, according to
some embodiments.
[0008] FIG. 2 is a block diagram illustrating an example computing
device including a SLAM system, according to some embodiments.
[0009] FIGS. 3-7 are flowcharts illustrating example methods for
SLAM processing, according to various embodiments.
[0010] FIG. 8 is a block diagram illustrating a representative
software architecture, which may be used in conjunction with
various hardware architectures herein described to implement an
embodiment.
[0011] FIG. 9 is a block diagram illustrating components of a
machine, according to some example embodiments, able to read
instructions from a machine-readable medium (e.g., a
machine-readable storage medium) and perform any one or more of the
methodologies discussed herein.
DETAILED DESCRIPTION
[0012] Various embodiments provide systems, methods, devices, and
instructions for performing simultaneous localization and mapping
(SLAM) that involve initializing a SLAM process using images
(hereafter, image frames) from as few as two different poses (e.g.,
physical locations) of the camera within a physical environment.
Some embodiments may achieve this by disregarding errors in
matching corresponding features (hereafter, feature errors)
depicted in image frames (of a physical environment) captured by an
image sensor of a mobile computing device, and by updating the SLAM
process in a way that causes the minimization process to converge
to a global minimum rather than fall into a local minimum. The global
minimum can provide a physical location of the image sensor.
[0013] According to some embodiments, a SLAM process initializes by
detecting movement of a mobile computing device, within a physical
environment, between two physical locations, where the movement is
bounded by two image frames (hereafter, images) distinguishable and
captured by an image sensor of the mobile computing device. The
mobile computing device may identify the two distinguishable images
by the mobile computing device correlating image blur, detected via
the image sensor, with a shock or impulse of movement detected via
a movement sensor of the mobile computing device (e.g., an inertial
measurement unit (IMU) or an accelerometer). The movement detected
by the mobile computing device may comprise the shock or impulse
detected when the mobile computing device initially starts
movement, and may also comprise the shock or impulse detected when
the mobile computing device finally stops movement. In this way,
various embodiments can tie data from one or more sensors of the
mobile computing device to specific images captured by the image
sensor of the mobile computing device, which in turn can initialize
operation of the SLAM process. Additionally, various embodiments
can permit a SLAM process to initialize each key image frame based
on a previous image frame and to use an IMU to determine an initial
distance.
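A minimal sketch of one way such impulse-based key image frame selection could be realized, assuming timestamped accelerometer magnitudes from the IMU and timestamped image frames; the threshold value and all function names are illustrative placeholders rather than details from this disclosure:

    import numpy as np

    def detect_impulses(imu_times, accel_magnitudes, threshold=2.5):
        # Indices where the acceleration magnitude spikes above the threshold;
        # the first spike bounds the start of the movement (e.g., a sidestep)
        # and the last spike bounds its finish.
        spikes = np.flatnonzero(np.asarray(accel_magnitudes) > threshold)
        if spikes.size == 0:
            return None, None
        return imu_times[spikes[0]], imu_times[spikes[-1]]

    def pick_key_frames(frame_times, frames, start_time, finish_time):
        # Choose the captured image frames closest in time to the start and
        # finish impulses as the first and second key image frames.
        frame_times = np.asarray(frame_times)
        first = int(np.argmin(np.abs(frame_times - start_time)))
        second = int(np.argmin(np.abs(frame_times - finish_time)))
        return frames[first], frames[second]

In practice the frames nearest the impulses may also be screened for blur, consistent with the blur-correlation described above.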
[0014] For some embodiments, the movement comprises a sidestep
performed by a human individual holding the mobile computing
device, which can provide a sufficient parallax for good SLAM
initialization. In particular, an embodiment may analyze the
movement of the human individual and extract relevant portions of
the movement using the impulse created at the start and finish of
the sidestep (e.g., based on a typical human sidestep).
Subsequently, the embodiment may use those relevant portions to
identify first and second key image frames and initialize the SLAM
process based on the first and second key image frames.
[0015] Some embodiments enable a SLAM process to determine
localization of a mobile computing device and map the physical
environment of the mobile computing device (e.g., with a usable or
acceptable level of accuracy), while using a movement sensor that
provides poor accuracy (e.g., a noisy IMU). Some embodiments enable
the SLAM process to determine localization and map the physical
environment while using a limited amount of image data. Some
embodiments enable the SLAM process to determine localization and
map the physical environment in real-time, while using limited
computing resources (e.g., a low-power processor). Additionally, some
embodiments enable the SLAM process to determine localization and
map the physical environment without using depth data.
[0016] A SLAM technique of some embodiments may be used to: track
key points (tracking points) in two-dimensional (2D) image frames
(e.g., of a video stream); and identify three-dimensional (3D)
features (e.g., physical objects in a physical environment) in the
2D image frames, as well as the physical pose (e.g., location) of
the camera relative to those 3D features.
[0017] For instance, a SLAM technique of an embodiment may be used
with augmented reality (AR) image processing and image frame
tracking. In particular, an image frame captured for an AR system
may be tracked using the SLAM technique and then a virtual object
may be placed within, or relative to, the captured image frame as
part of an AR display of a device, such as smart glasses, smart
phone, tablet, or another mobile computing device. As used herein,
augmented reality (AR) refers to systems, methods, devices, and
instructions that can capture image frames, enhance those image
frames with additional information, and then present the enhanced
information on a display. This may, for example, enable a user to
hold up a mobile computing device (e.g., smart phone or tablet) to
capture a video stream of a scene, and an output display of the
mobile computing device to present the scene as visible to the user
along with additional information. This additional information may
include placing virtual objects in the scene so that the virtual
objects are presented as if they exist in the scene. Aspects of
such virtual objects are processed to occlude the virtual object if
another real or virtual object passes in front of the virtual
object as shown from the perspective of the image sensor capturing
the physical environment. Such virtual objects are also processed
to maintain their relationship with real objects as both real and
virtual objects move over time, and as the perspective of the image
sensor capturing the environment changes.
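To illustrate how a recovered image sensor pose can anchor a virtual object in a captured image frame, the following is a minimal pinhole-projection sketch; the intrinsic matrix values and the example world point are made-up, and the pose (R, t) is assumed to come from a SLAM process such as the one described herein:

    import numpy as np

    def project_point(point_world, R, t, K):
        # Transform a 3D world point into camera coordinates using the image
        # sensor pose (rotation R, translation t), then project with intrinsics K.
        p_cam = R @ point_world + t
        if p_cam[2] <= 0:
            return None  # behind the camera, so not visible in this frame
        uv = K @ (p_cam / p_cam[2])
        return uv[:2]

    # Example: a virtual object placed one meter in front of the initial pose.
    K = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
    pixel = project_point(np.array([0.0, 0.0, 1.0]), np.eye(3), np.zeros(3), K)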
[0018] Some embodiments provide for a method comprising performing
cycles of a full SLAM process, performing cycles of a partial SLAM
process, and performing the partial SLAM process cycles and the
full SLAM process cycles such that the partial SLAM process cycles
are performed more frequently than the full SLAM process cycles.
According to various embodiments, a full SLAM process cycle
performs a full SLAM process, while a partial SLAM process cycle
performs a partial SLAM process that requires less of a computing
resource (e.g., processing, memory resources, or both) to perform
than does performing the full SLAM process. Additionally, a partial
SLAM process cycle may perform faster than a full SLAM process cycle.
[0019] For some embodiments, the partial SLAM process only performs
a localization portion of a SLAM process. In alternative
embodiments, the partial SLAM process only performs a mapping
portion of a SLAM process. By performing only a portion of a SLAM
process, a partial SLAM process may perform using less computing
resources than a full SLAM process, and may perform faster than a
full SLAM process. Further, by performing the full SLAM process
cycles less frequently than the partial SLAM process cycles,
various embodiments achieve SLAM results (e.g., useful and accurate
SLAM results) while limiting the computer resources needed to
achieve those results. Accordingly, various embodiments are
suitable for performance of a SLAM process on a device that
otherwise has limited computing resources for performing
traditional SLAM techniques, such as a smart phone or smart glasses
having limited processing power.
[0020] For some embodiments, image frames are captured (e.g.,
continuously captured at a particular sample rate) by an image
sensor of a device, such as a camera of a mobile phone. Some
embodiments perform the full SLAM process cycles on those captured
image frames identified (e.g., generated) as new key image frames,
while performing the partial SLAM process cycles on those captured
image frames not identified as key image frames. In some
embodiments, a captured image frame is identified (e.g., generated)
as a new key image frame when one or more key image frame
conditions are satisfied. Various embodiments use the key image
frame conditions to ensure that new key image frames identified
from the captured image frames are sufficiently unique to ensure
that each full SLAM process cycle is performed as intended or as
expected.
[0021] For instance, a new key image frame may be generated when
the captured image frame is at least of, if not better than, a
predetermined quality (e.g., fair image quality). In this way, an
embodiment may avoid designating, as new key image frames, those image
frames captured during movement of the image sensor, which likely
captured image blur caused by the movement of the image sensor. The
image quality of a captured image frame may be determined by a
gradient histogram method, which can determine the quality of a
current image frame based on the quality of a predetermined number
of prior captured image frames. In another instance, a new key
image frame may be generated only after a certain amount of time or
a certain number of cycles (e.g., partial SLAM process cycles) has
passed since identification (e.g., generation) of the last new
key image frame. In this way, an embodiment can prevent every
captured image frame from being treated as a key image frame and
processed by a full SLAM process cycle, which, as described herein, can be
processor-intensive or memory-intensive to perform and not suitable
for continuous performance on a device having limited computing
resources. In another instance, a new key image frame may be
generated only after a certain amount of translation (e.g., caused
by image sensor position change in the physical environment with
respect to X, Y, or Z coordinates) is detected between the current
captured image frame and a previous image frame. In this way, an
embodiment can avoid too many image frames, capturing the same
point in the physical environment, being designated as new key
image frames, which would not be helpful for three-dimensional (3D)
mapping purposes.
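A minimal sketch of the three example key image frame conditions above, using a mean gradient magnitude as a simple stand-in for the gradient histogram quality measure; every threshold and name here is illustrative rather than taken from the disclosure:

    import numpy as np

    def image_quality(gray):
        # Mean gradient magnitude as a rough sharpness score; image frames
        # blurred by image sensor movement score low.
        gy, gx = np.gradient(gray.astype(np.float32))
        return float(np.mean(np.hypot(gx, gy)))

    def is_new_key_frame(gray, now, last_key_time, translation,
                         min_quality=8.0, min_interval=0.5, min_translation=0.05):
        # A captured image frame becomes a new key image frame only when it is
        # sharp enough, enough time has elapsed since the last full SLAM process
        # cycle, and the image sensor has translated far enough in X, Y, or Z.
        if image_quality(gray) < min_quality:
            return False
        if now - last_key_time < min_interval:
            return False
        if np.linalg.norm(translation) < min_translation:
            return False
        return True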
[0022] For some embodiments, the full SLAM process cycles and the
partial SLAM process cycles may be performed in parallel, whereby
the full SLAM process cycles are only performed on those captured
image frames identified as new key image frames and the partial
SLAM process cycles are performed on all other captured image frames
(i.e., non-key image frames). Additionally, for some
embodiments, the full SLAM process cycles and the partial SLAM
process cycles begin to perform after SLAM initialization is
performed as described herein. For instance, a SLAM initialization
process of an embodiment may produce the first two key image frames
(e.g., based on a sidestep by a human individual), provide initial
localization data (e.g., comprising the six degrees of freedom
(6DOF) for the second key image frame), and provide initial mapping
data (e.g., comprising 3D positions of features matched between the
first two key image frames). Subsequently, the full SLAM process
cycles and the partial SLAM process cycles can begin to perform
based on the initial localization and mapping data provided by the
SLAM initialization process.
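Tying these pieces together, one plausible scheduling loop is sketched below (reusing the is_new_key_frame check sketched earlier); initialize_slam, full_slam_cycle, and partial_slam_cycle are hypothetical placeholders standing in for the initialization, full-cycle, and partial-cycle processes described herein:

    def run_slam(frame_stream, imu_stream):
        # SLAM initialization produces the first two key image frames, initial
        # localization (6DOF for the second key frame), and initial mapping data.
        key_frames, localization, mapping = initialize_slam(frame_stream, imu_stream)
        last_key_time = key_frames[-1].time
        for frame in frame_stream:
            imu = imu_stream.sample_for(frame.time)  # IMU data for this frame
            if is_new_key_frame(frame.gray, frame.time, last_key_time, imu.translation):
                # Full SLAM process cycle: localization and mapping.
                localization, mapping = full_slam_cycle(frame, imu, key_frames, mapping)
                key_frames.append(frame)
                last_key_time = frame.time
            else:
                # Partial SLAM process cycle: localization portion only.
                localization = partial_slam_cycle(frame, localization, mapping)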
[0023] Although various embodiments are described herein with
respect to use of an IMU, it will be understood that some
embodiments may utilize one or more other sensors in addition to,
or in place of, the IMU, such as an accelerometer or a gyroscope.
As used herein, degrees of freedom (DOF) (e.g., as measured by an
IMU, accelerometer, or gyroscope) may comprise displacement (e.g.,
measured according to X, Y, and Z coordinates) and orientation
(e.g., measured according to psi, theta, and phi). Accordingly, six
degrees of freedom (6DOF) parameters may comprise values
representing distance along an x-axis, y-axis, and z-axis, and
values representing rotations according to Euler angles psi, theta,
and phi. Four degrees of freedom (4DOF) parameters may comprise
values representing distance along an x-axis, y-axis, and z-axis,
and a value representing rotation according to an Euler angle (e.g.,
phi).
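The 6DOF and 4DOF parameter sets described in this paragraph can be held in simple containers such as the following sketch; only the field meanings come from the paragraph, and the class names are illustrative:

    from dataclasses import dataclass

    @dataclass
    class SixDOF:
        # Displacement along the x-, y-, and z-axes.
        x: float
        y: float
        z: float
        # Orientation as Euler angles psi, theta, and phi.
        psi: float
        theta: float
        phi: float

    @dataclass
    class FourDOF:
        # Displacement along the x-, y-, and z-axes plus a single Euler angle.
        x: float
        y: float
        z: float
        phi: float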
[0024] The description that follows includes systems, methods,
techniques, instruction sequences, and computing machine program
products that embody illustrative embodiments of the disclosure. In
the following description, for the purposes of explanation,
numerous specific details are set forth in order to provide an
understanding of various embodiments of the inventive subject
matter. It will be evident, however, to those skilled in the art,
that embodiments of the inventive subject matter may be practiced
without these specific details. In general, well-known instruction
instances, protocols, structures, and techniques are not
necessarily shown in detail.
[0025] Reference will now be made in detail to embodiments of the
present disclosure, examples of which are illustrated in the
appended drawings. The present disclosure may, however, be embodied
in many different forms and should not be construed as being
limited to the embodiments set forth herein.
[0026] FIG. 1 is a block diagram illustrating an example high-level
client-server-based network architecture 100 that includes a
simultaneous localization and mapping (SLAM) system 126, according
to some embodiments. As shown, the network architecture 100
includes a client device 102A and a client device 102B (hereafter,
collectively referred to as client devices 102), the SLAM system
126 included in the client device 102B; a messaging server system
108; and a network 106 (e.g., the Internet or wide area network
(WAN)) that facilitates data communication between the client
devices 102 and the messaging server system 108. In the network
architecture 100, the messaging server system 108 can provide
server-side functionality via the network 106 to the client devices
102. In some embodiments, a user (not shown) interacts with one of
the client devices 102 or the messaging server system 108 using one
of the client devices 102.
[0027] The client devices 102 may comprise a computing device that
includes at least a display and communication capabilities that
provide communications with the messaging server system 108 via the
network 106. Each of the client devices 102 may comprise, without
limitation, a remote device, work station, computer, general
purpose computer, Internet appliance, hand-held device, wireless
device, portable device, wearable computer, cellular or mobile
phone, personal digital assistant (PDA), smart phone, tablet,
ultrabook, netbook, laptop, desktop, multi-processor system,
microprocessor-based or programmable consumer electronics, game
console, set-top box, network personal computer (PC),
mini-computer, and so forth. Depending on the embodiment, at least
one of the client devices 102 may comprise one or more of a touch
screen, inertial measurement unit (IMU), accelerometer, gyroscope,
biometric sensor, camera, microphone, Global Positioning System
(GPS) device, and the like.
[0028] For some embodiments, the client device 102B represents a
mobile computing device, such as a mobile phone, tablet, or
wearable device (e.g., smart glasses, a smart visor, or a smart
watch), that includes an image sensor. As shown, the client device
102B includes sensors 128, which can include the image sensor
(e.g., camera) of the client device 102B and other sensors, such as
an inertial measurement unit (IMU), accelerometer, or gyroscope.
For various embodiments, the sensors 128 facilitate operation of
the SLAM system 126 on the client device 102B.
[0029] The SLAM system 126 performs a SLAM technique of an
embodiment on the client device 102B, which can permit the client
device 102B to map its physical environment while simultaneously
determining its location within that physical environment.
Additionally, for some embodiments, the SLAM system 126 permits the
performance of a SLAM technique on the client device 102B despite
the client device 102B having limited computing resources (e.g.,
processing or memory resources), which may prevent traditional SLAM
techniques from operating on the client device 102B as intended.
The SLAM technique performed by the SLAM system 126 may support
image frame tracking for an augmented reality system 124 of the
client device 102B.
[0030] As shown, the client device 102B includes an augmented
reality system 124, which may represent an augmented reality
application operating on the client device 102B. The augmented
reality system 124 may provide functionality to generate augmented
reality images for display on a display (e.g., AR display) of the
client device 102B. The network architecture 100 may be used to
communicate information for virtual objects to be displayed on
client device 102B by the augmented reality system 124 included by
client device 102B, or provide data (e.g., street view data) used
to create models used by the augmented reality system 124. An image
frame captured for the augmented reality system 124 may be tracked
using the SLAM system 126 and then a virtual object may be placed
within, or relative to, the captured image frame as part of an AR
display of the client device 102B.
[0031] Each of the client devices 102 can host a number of
applications, including a messaging client application 104, such as
an ephemeral message application. Each messaging client application
104 may be communicatively coupled to other instances of the
messaging client application 104 and a messaging server system 108
via a network 106 (e.g., the Internet). Accordingly, each messaging
client application 104 may be able to communicate and exchange data
with another messaging client application 104 and with the
messaging server system 108 via the network 106. The data exchanged
between messaging client applications 104, and between a messaging
client application 104 and the messaging server system 108, may
include functions (e.g., commands to invoke functions) as well as
payload data (e.g., text, audio, video or other multimedia
data).
[0032] The messaging server system 108 provides server-side
functionality via the network 106 to a particular messaging client
application 104. While certain functions of the network
architecture 100 are described herein as being performed by either
a messaging client application 104 or by the messaging server
system 108, it will be appreciated that the location of certain
functionality either within the messaging client application 104 or
the messaging server system 108 is a design choice. For example, it
may be technically preferable to initially deploy certain
technology and functionality within the messaging server system
108, but to later migrate this technology and functionality to the
messaging client application 104 where a client device 102 has a
sufficient processing capacity.
[0033] The messaging server system 108 supports various services
and operations that are provided to the messaging client
application 104 or the augmented reality system 124. Such
operations include transmitting data to, receiving data from, and
processing data generated by the messaging client application 104
or the augmented reality system 124. This data may include message
content, client device information, geolocation information, media
annotation and overlays, message content persistence conditions,
social network information, augmented reality (AR) content, and
live event information, as examples. Data exchanges within the
network architecture 100 are invoked and controlled through
functions available via user interfaces (UIs) of the messaging
client application 104 or the augmented reality system 124.
[0034] Turning now specifically to the messaging server system 108,
an Application Program Interface (API) server 110 is coupled to,
and provides a programmatic interface to, an application server
112. The application server 112 is communicatively coupled to a
database server 118, which facilitates access to a database 120 in
which is stored data associated with messages or augmented
reality-related data processed by the application server 112.
[0035] Dealing specifically with the Application Program Interface
(API) server 110, this server 110 receives and transmits message
data (e.g., commands and message payloads) between the client
devices 102 and the application server 112. Specifically, the API
server 110 provides a set of interfaces (e.g., routines and
protocols) that can be called or queried by the messaging client
application 104 in order to invoke functionality of the application
server 112. The API server 110 exposes various functions supported
by the application server 112, including account registration;
login functionality; the sending of messages, via the application
server 112, from a particular messaging client application 104 to
another messaging client application 104; the sending of media
files (e.g., images or video) from a messaging client application
104 to the messaging server application 114, and for possible
access by another messaging client application 104, the setting of
a collection of media data (e.g., story); the retrieval of a list
of friends of a user of a client device 102; the retrieval of such
collections; the retrieval of messages and content; the adding and
deletion of friends to a social graph; the location of friends
within a social graph; opening an application event (e.g., relating
to the messaging client application 104).
[0036] The application server 112 hosts a number of applications
and subsystems, including a messaging server application 114, an
image processing system 116 and a social network system 122. The
messaging server application 114 implements a number of message
processing technologies and functions, particularly related to the
aggregation and other processing of content (e.g., textual and
multimedia content) included in messages received from multiple
instances of the messaging client application 104. The text and
media content from multiple sources may be aggregated into
collections of content (e.g., called stories or galleries). These
collections are then made available, by the messaging server
application 114, to the messaging client application 104. Other
processor- and memory-intensive processing of data may also be
performed server-side by the messaging server application 114, in
view of the hardware requirements for such processing.
[0037] The application server 112 also includes an image processing
system 116 that is dedicated to performing various image processing
operations, typically with respect to images or video received
within the payload of a message at the messaging server application
114.
[0038] The social network system 122 supports various social
networking functions and services, and makes these functions and
services available to the messaging server application 114. To this
end, the social network system 122 maintains and accesses an entity
graph within the database 120. Examples of functions and services
supported by the social network system 122 include the
identification of other users of the messaging server system 108 with
which a particular user has relationships or is "following", and
also the identification of other entities and interests of a
particular user.
[0039] The application server 112 is communicatively coupled to a
database server 118, which facilitates access to a database 120 in
which is stored data associated with messages or augmented reality
content processed by the application server 112.
[0040] FIG. 2 is a block diagram illustrating an example computing
device 200 including a SLAM system 240, according to some
embodiments. The computing device 200 may represent a mobile
computing device, such as a mobile phone, tablet, laptop, wearable
device, or the like, which a human individual can readily move
around in a physical environment. As shown, the computing device
200 includes a processor 210, an image sensor 220, an inertial
measurement unit (IMU) 230, and the SLAM system 240. The SLAM
system 240 includes an image frame capture module 241, an IMU data
capture module 242, a key image frame module 243, a full SLAM cycle
module 244, and a partial SLAM cycle module 245. Depending on the
embodiment, the SLAM system 240 may or may not include a SLAM
initialization module 246.
[0041] Any one or more of the functional components (e.g., modules)
of the SLAM system 240 may be implemented using hardware (e.g., the
processor 210 of the computing device 200) or a combination of
hardware and software. For instance, any one of the components
described herein may configure the processor 210 to perform the
operations described herein for that component. Moreover, any two
or more of these components may be combined into a single
component, and the functions described herein for a single
component may be subdivided among multiple components. Furthermore,
according to various example embodiments, any of the functional
components illustrated in FIG. 2 may be implemented together or
separately within a single machine, database, or device or may be
distributed across multiple machines, databases, or devices.
[0042] The processor 210 may comprise a central processing unit
(CPU), the image sensor 220 may comprise a camera built into the
computing device 200 or one that is externally coupled to the
computing device 200, and the IMU 230 may comprise a sensor capable
of measuring degrees of freedom (e.g., 6DOF) with respect to at
least the image sensor 220, if not the computing device 200 as
well. Though not shown, the computing device 200 may include other
sensors to facilitate the operation of the SLAM system 240, such as
an accelerometer or a gyroscope.
[0043] The image frame capture module 241 may invoke, cause, or
perform continuous capture of new image frames of a physical
environment of the computing device 200 by the image sensor 220.
The continuous capture may be performed according to a
predetermined sampling rate, such as 25 or 30 frames per second.
The image frame capture module 241 may add new image frames
continuously captured by the image sensor 220 to a set of captured
image frames, which can be further processed by the SLAM system
240.
[0044] The IMU data capture module 242 may invoke, cause, or
perform continuous capture of IMU data from the IMU 230 in
correspondence to the image frames captured by the image frame
capture module 241. For instance, the IMU data capture module 242
may capture IMU data for each image frame captured. For a given
captured image frame, the captured IMU data may comprise degrees of
freedom (DOF) parameters of the image sensor at the time the image
frame was captured by the image sensor. The DOF parameters may
comprise, for example, four degrees of freedom (4DOF) or six
degrees of freedom (6DOF) measured with respect to the image sensor
220. Where the IMU 230, the image sensor 220, and the computing
device 200 are physically integrated as a single unit, the IMU data
may reflect DOF parameters of the image sensor 220 and the
computing device 200.
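One simple way the IMU data capture module could associate a captured IMU reading with each captured image frame is by timestamp, as in the sketch below; the buffer layout and method names are assumptions for illustration:

    import bisect

    class IMUDataCapture:
        # Buffers timestamped DOF readings and returns the reading captured
        # closest in time to a given image frame.
        def __init__(self):
            self.times = []     # monotonically increasing capture timestamps
            self.readings = []  # DOF parameters captured at each timestamp

        def add(self, timestamp, dof_params):
            self.times.append(timestamp)
            self.readings.append(dof_params)

        def for_frame(self, frame_timestamp):
            i = bisect.bisect_left(self.times, frame_timestamp)
            if i == 0:
                return self.readings[0]
            if i == len(self.times):
                return self.readings[-1]
            # Pick whichever neighboring reading is nearer to the frame time.
            if self.times[i] - frame_timestamp < frame_timestamp - self.times[i - 1]:
                return self.readings[i]
            return self.readings[i - 1]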
[0045] For each particular new image frame added to the set of
captured image frames (e.g., by the image frame capture module
241), the key image frame module 243 may determine whether a set of
key image frame conditions is satisfied for the particular new
image frame. The key image frame module 243 may identify the
particular new image frame as a new key image frame in response to
the set of key image frame conditions being satisfied for the
particular new image frame. In this way, the key image frame module
243 can generate new key image frames according to a set of key
image frame conditions. As described herein, a set of key image
frame conditions can ensure that new key image frames are
sufficiently unique for processing by a full SLAM process cycle. An
example key image frame condition may relate to whether a new image
frame meets or exceeds a certain image quality, whether a minimum
time has lapsed since a full SLAM process cycle was last performed,
or whether a translation between a prior image frame and the new
image frame meets or exceeds a minimum translation threshold.
[0046] The full SLAM cycle module 244 may perform a full SLAM
process cycle on each particular new key image frame identified by
the key image frame module 243. Performing the full SLAM process
cycle on the particular new key image frame may comprise
determining 6DOF of the image sensor of the computing device in
association with the particular new key image frame. Additionally,
performing the full SLAM process cycle on the particular new key
image frame may comprise determining a set of 3D positions of new
3D features matched in the particular new key image frame. More
regarding full SLAM process cycles is described herein with
respect to FIG. 6.
[0047] The partial SLAM cycle module 245 may perform a partial SLAM
process cycle on each particular new image frame not identified by
the key image frame module 243. For some embodiments, the partial
SLAM process cycle only performs a localization portion of a SLAM
process. Performing the partial SLAM process cycle on the
particular new image frame may comprise determining 6DOF of the
image sensor 220 of the computing device 200 in association with
the particular new image frame. Additionally, performing the
partial SLAM process cycle on the particular new image frame may
comprise projecting a set of tracking points on the particular new
image frame based on the 6DOF of the image sensor 220.
Alternatively, for some embodiments, the partial SLAM process cycle
only performs a mapping portion of a SLAM process. More regarding
partial SLAM process cycles is described herein with respect to
FIG. 7.
[0048] The SLAM initialization module 246 may detect a movement of
the image sensor 220 from a first pose (e.g., orientation or
location of the image sensor 220), in the physical environment, to
a second pose, in the physical environment, based on the captured
IMU data from the IMU data capture module 242. The SLAM
initialization module 246 may identify a first key image frame and
a second key image frame based on the movement. In particular, the
first and second key image frames may be identified such that the
first key image frame corresponds to a start impulse of the
movement, and the second key image corresponds to a finish impulse
of the movement. For instance, where a human individual is holding
the computing device 200, the start impulse of the movement may be
the beginning of a sidestep performed by the human individual, and
the finish impulse of the movement may be the end of the sidestep.
The start or finish impulse may be detected using a shock function
of the computing device 200.
[0049] FIGS. 3-7 are flowcharts illustrating example methods for
SLAM processing, according to various embodiments. It will be
understood that example methods described herein may be performed
by a device, such as a computing device (e.g., the computing device
200), in accordance with some embodiments. Additionally, example
methods described herein may be implemented in the form of
executable instructions stored on a computer-readable medium or in
the form of electronic circuitry. For instance, one or more
operations of a method 300 of FIG. 3 may be represented by
executable instructions that, when executed by a processor of a
computing device, cause the computing device to perform the method
300. Depending on the embodiment, an operation of an example method
described herein may be repeated in different ways or involve
intervening operations not shown. Although the operations of
example methods may be depicted and described in a certain order,
the order in which the operations are performed may vary among
embodiments, including performing certain operations in
parallel.
[0050] FIG. 3 is a flowchart illustrating an example method 300 for
SLAM processing, according to some embodiments. In particular, the
method 300 illustrates how an embodiment may perform full SLAM
process cycles and partial SLAM process cycles. As shown, the
method 300 begins with operation 302 invoking, causing, or
performing continuous capture of new image frames of a physical
environment of a computing device, by an image sensor of the
computing device. Operation 302 adds the new image frames to a set
of captured image frames, which can be further processed by the
method 300. The method 300 continues with operation 304 invoking,
causing, or performing continuous capture of IMU data, from an
inertial measurement unit (IMU) of the computing device, in
correspondence to the image frames captured by operation 302. As
described herein, the IMU data of a particular image frame may
comprise degrees of freedom (DOF) of the image sensor, as measured
by the IMU, at the time the image frame was captured by operation
302.
[0051] The method 300 continues with operation 306, which performs
operations 320 through 326 for each particular new image frame
captured and added to the set of captured image frames by operation
302. Operation 306 begins with operation 320 determining whether a
set of key image frame conditions is satisfied for the particular
new image frame. Operation 306 continues with operation 322
identifying the particular new image frame as a new key image frame
in response to operation 320 determining that the set of key image
frame conditions is satisfied for the particular new image
frame.
[0052] Operation 306 continues with operation 324 performing a full
SLAM process cycle on the new key image frame. For future
processing purposes, some embodiments keep track of those image
frames identified as key image frames. Performing the full SLAM
process cycle on the particular new key image frame may comprise
determining 6DOF of the image sensor of the computing device in
association with the particular new key image frame. Additionally,
performing the full SLAM process cycle on the particular new key
image frame may comprise determining a set of 3D positions of new
3D features matched in the particular new key image frame. More
regarding full SLAM process cycles is described herein with
respect to FIG. 6.
[0053] Operation 306 continues with operation 326 performing a
partial SLAM process cycle on the particular new image frame in
response to operation 320 determining that the set of key image
frame conditions is not satisfied for the particular new image
frame (i.e., non-key new image frames). Performing the partial SLAM
process cycle on the particular new image frame may comprise
determining 6DOF of the image sensor of the computing device in
association with the particular new image frame. Additionally,
performing the partial SLAM process cycle on the particular new
image frame may comprise projecting a set of tracking points on the
particular new image frame based on the 6DOF of the image sensor.
More regarding partial SLAM process cycles is described herein with
respect to FIG. 7.
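As a hedged illustration of one way the localization-only partial cycle could be realized, the sketch below uses OpenCV optical flow for 2D feature tracking and a RANSAC PnP solve for the 6DOF parameters; this is a plausible realization of the steps above, not the disclosed implementation, and it assumes float32 point arrays and a known intrinsic matrix K:

    import cv2
    import numpy as np

    def partial_slam_cycle(prev_gray, gray, prev_pts_2d, pts_3d, K):
        # 2D feature tracking from the most recent key image frame into the
        # current non-key image frame.
        next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray,
                                                       prev_pts_2d, None)
        ok = status.ravel() == 1
        pts_2d, pts_3d = next_pts[ok], pts_3d[ok]
        # Localization portion only: 6DOF pose from 2D-3D correspondences,
        # with RANSAC filtering out erroneous features.
        found, rvec, tvec, inliers = cv2.solvePnPRansac(pts_3d, pts_2d, K, None)
        if not found:
            return None
        # Project tracking points onto the non-key image frame using the pose.
        tracking_pts, _ = cv2.projectPoints(pts_3d[inliers.ravel()],
                                            rvec, tvec, K, None)
        return rvec, tvec, tracking_pts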
[0054] FIG. 4 is a flowchart illustrating an example method 400 for
SLAM processing, according to some embodiments. In particular, the
method 400 illustrates how an embodiment may initialize a SLAM
process, perform full SLAM process cycles, and perform partial SLAM
process cycles. As shown, the method 400 begins with operations 402
and 404, which, according to some embodiments, are respectively
similar to operations 302 and 304 of the method 300 described above
with respect to FIG. 3.
[0055] The method 400 continues with operation 406 detecting, based
on the captured IMU data, a movement of the image sensor from a
first pose (e.g., orientation or location of the image sensor), in
the physical environment, to a second pose in the physical
environment. The method 400 continues with operation 408
identifying a first key image frame and a second key image frame
based on the movement detected by operation 406. For some
embodiments, the first key image frame corresponds to a start
impulse of the movement, and the second key image frame corresponds
to a finish impulse of the movement. By operations 406 and 408,
some embodiments can initialize the method 400 for performance of
full and partial SLAM process cycles by operation 410. As described
herein, the movement may be caused by a human individual performing
a sidestep while holding a computing device performing the method
400 and including the image sensor.
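
As a non-limiting sketch of how such a movement might be detected from the captured IMU data, the following Python function locates start and finish impulses in gravity-compensated accelerometer samples; the impulse and rest thresholds, and the assumption that the data is already gravity-compensated, are illustrative only.

import numpy as np

def detect_sidestep(accel_xyz, timestamps, impulse=1.5, rest=0.3):
    # Sketch: locate the start and finish impulses of a sidestep in
    # gravity-compensated accelerometer data. accel_xyz is an Nx3 array
    # (m/s^2) and timestamps the matching capture times in seconds.
    magnitude = np.linalg.norm(accel_xyz, axis=1)
    moving = magnitude > impulse
    if not moving.any() or magnitude[-1] > rest:
        return None                      # no movement, or not yet finished
    start = timestamps[np.argmax(moving)]                        # first impulse
    end = timestamps[len(moving) - 1 - np.argmax(moving[::-1])]  # last impulse
    return start, end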
[0056] The method 400 continues with operation 410, which performs
operations 420 through 426 for each particular new image frame
captured and added to the set of captured image frames by operation
402. According to some embodiments, operations 420 through 426 are
respectively similar to operations 320 through 326 of the method
300 described above with respect to FIG. 3.
[0057] FIG. 5 is a flowchart illustrating an example method 500 for
SLAM processing, according to some embodiments. In particular, the
method 500 illustrates how an embodiment may initialize a SLAM
process. As shown, the method 500 begins with operations 502 and
504, which, according to some embodiments, are respectively similar
to operations 302 and 304 of the method 300 described above with
respect to FIG. 3.
[0058] The method 500 continues with operation 506 identifying a
first key image frame from the set of captured image frames. The
first key image frame identified may comprise a certain image
quality (e.g., fair quality) and may be one captured by the image
sensor (e.g., image sensor 220) when the IMU (e.g., IMU 230)
indicates that the image sensor is stable. Accordingly, operation
506 may not identify a first key image frame until an image frame
is captured while the image sensor is stable and the captured image
frame meets a certain image quality.
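
One possible, non-limiting realization of this stability and image-quality test is sketched below in Python; the use of gyroscope magnitude for stability, the variance of the Laplacian as a sharpness proxy, and the specific thresholds are all assumptions made for illustration.

import cv2
import numpy as np

def acceptable_first_key_frame(gray_frame, gyro_window,
                               sharpness_min=100.0, gyro_max=0.02):
    # Operation 506 (sketch): accept an image frame as the first key
    # image frame only if the IMU indicates the image sensor is stable
    # (low angular velocity over a recent window of gyroscope samples)
    # and the frame meets a minimum image quality, approximated here by
    # the variance of the Laplacian as a sharpness measure.
    stable = np.linalg.norm(np.asarray(gyro_window), axis=1).max() < gyro_max
    sharp_enough = cv2.Laplacian(gray_frame, cv2.CV_64F).var() > sharpness_min
    return stable and sharp_enough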
[0059] The method 500 continues with operation 508 identifying
first IMU data, from the IMU data captured by operation 504,
associated with the first key image frame. For some embodiments,
the first IMU data comprises 4DOF parameters (e.g., x, y, z, and
phi). The first IMU data may represent IMU data captured at the
time the first key image frame was captured by the image
sensor.
[0060] The method 500 continues with operation 510 detecting, by
the IMU, a movement of the image sensor from a first pose (e.g.,
orientation or location), in the physical environment, to a second
pose in the physical environment. As described herein, the movement
may be caused by a human individual performing a sidestep while
holding a computing device performing the method 500 and including
the image sensor and the IMU.
[0061] The method 500 continues with operation 512, which performs
operations 520 through 528 in response to detecting the movement by
operation 510. Operation 512 begins with operation 520 identifying
a second key image frame from the set of captured image frames.
While the first key image frame may be identified by operation 506
such that the first key image frame corresponds to the start of the
movement detected by operation 510, the second key image frame may
be identified by operation 520 such that the second key image frame
corresponds to the end of the movement detected by operation
510.
[0062] Operation 512 continues with operation 522 identifying
second IMU data, from the IMU data captured by operation 504,
associated with the second key image frame. For some embodiments,
the second IMU data comprises 4DOF parameters (e.g., x, y, z, and
phi). The second IMU data may represent IMU data captured at the
time the second key image frame was captured by the image
sensor.
[0063] Operation 512 continues with operation 524 performing
feature matching on at least the first and second key image frames
to identify a set of matching 3D features in the physical
environment. For some embodiments, operation 524 uses a feature
matcher based on KAZE or A-KAZE, which extracts 3D features from a
set of image frames by matching features across the image frames.
Operation 512 continues with operation 526 generating a filtered
set of matching 3D features by filtering out at least one erroneous
feature, from the set of matching 3D features produced by operation
524, based on a set of error criteria. For instance, the set of
error criteria may include an error criterion relating to an
epipolar axis, projected error, or spatial error. If feature
errors are found, the method 500 may return to operation 524 to
perform feature matching again. Operation 512 continues with
operation 528 determining a set of 6DOF parameters of the image
sensor for the second key image frame and a set of 3D positions for
the set of matching 3D features. To facilitate this determination,
operation 512 performs a (full) SLAM process on the second key
image frame based on the first IMU data identified by operation
508, the second IMU data identified by operation 522, and the
filtered set of matching 3D features generated by operation
526.
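
By way of a non-limiting sketch, operations 524 and 526 could be implemented with OpenCV as shown below, using OpenCV's AKAZE detector as one possible A-KAZE based matcher and RANSAC over the essential matrix as the epipolar error filter; the camera matrix K, the brute-force Hamming matcher, and the RANSAC threshold are assumptions introduced for illustration.

import cv2
import numpy as np

def match_and_filter(key_frame_1, key_frame_2, K, ransac_thresh=1.0):
    # Operation 524 (sketch): extract and match features between the
    # first and second key image frames using OpenCV's AKAZE.
    akaze = cv2.AKAZE_create()
    kp1, des1 = akaze.detectAndCompute(key_frame_1, None)
    kp2, des2 = akaze.detectAndCompute(key_frame_2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Operation 526 (sketch): filter out erroneous matches with an
    # epipolar criterion; RANSAC over the essential matrix marks as
    # outliers the matches whose epipolar error exceeds the threshold.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   threshold=ransac_thresh)
    inliers = mask.ravel().astype(bool)
    return pts1[inliers], pts2[inliers], E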
[0064] FIG. 6 is a flowchart illustrating an example method 600 for
SLAM processing, according to some embodiments. In particular, the
method 600 illustrates how an embodiment may perform a full SLAM
process cycle. For some embodiments, the method 600 is not
performed until at least two key image frames are produced by way
of a SLAM initialization process (e.g., the method 500). As shown,
the method 600 begins with operation 602 identifying particular IMU
data, from the captured IMU data, associated with the new key image
frame. The IMU data may represent IMU data captured at the time the
new key image frame was captured by the image sensor.
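
A minimal, non-limiting sketch of such a timestamp-based association is shown below; the function names and the assumption that IMU samples are indexed by sorted capture times are illustrative only.

import bisect

def imu_sample_for_frame(frame_time, imu_times, imu_samples):
    # Operation 602 (sketch): pick the IMU sample whose capture time is
    # closest to the capture time of the new key image frame. imu_times
    # must be sorted in ascending order.
    i = bisect.bisect_left(imu_times, frame_time)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(imu_times)]
    best = min(candidates, key=lambda j: abs(imu_times[j] - frame_time))
    return imu_samples[best]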
[0065] The method 600 continues with operation 604 performing
feature matching on the new key image frame and at least one prior
image frame (e.g., last two captured image frames) to identify a
set of matching 3D features in the physical environment. For some
embodiments, operation 604 uses a feature matcher based on KAZE or
A-KAZE, which extracts 3D features from a set of image frames by
matching features across the image frames. The method 600 continues
with operation 606 determining a first set of 6DOF parameters of
the image sensor for the new key image frame by performing a (full)
SLAM process on the new key image frame based on the set of
matching 3D features extracted by operation 604 and the particular
IMU data identified by operation 602.
[0066] The method 600 continues with operation 608 generating a
filtered set of matching 3D features by filtering out at least one
erroneous feature, from the set of matching 3D features extracted
by operation 604, based on a set of error criteria and the first
set of 6DOF parameters determined by operation 606. As described
herein, the set of error criteria may include an error criterion
relating to an epipolar axis, projected error, or spatial error.
For instance, an error criterion may specify filtering out those
features representing the top 3% of the worst projected errors.
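
As a non-limiting illustration of such a percentile-based criterion, the following Python sketch keeps only those features outside the worst 3% of projected errors; the function name and the assumption that per-feature errors are already available are illustrative.

import numpy as np

def drop_worst_projected_errors(features, projected_errors, fraction=0.03):
    # Operation 608 (sketch): given the projected (reprojection) error of
    # each matched 3D feature under the first set of 6DOF parameters,
    # keep only the features outside the worst `fraction` (e.g., top 3%).
    errors = np.asarray(projected_errors, dtype=float)
    cutoff = np.quantile(errors, 1.0 - fraction)
    return [f for f, e in zip(features, errors) if e <= cutoff]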
[0067] The method 600 continues with operation 610 determining a
second set of 6DOF parameters of the image sensor for the new key
image frame and a set of 3D positions of new 3D features in the
physical environment by performing the (full) SLAM process on all
key image frames based on the filtered set of matching 3D
features generated by operation 608 and the particular IMU data
identified by operation 602.
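
One way such a joint refinement over all key image frames could be sketched is a small, dense bundle adjustment that minimizes reprojection error, as shown below; the parameterization, the use of SciPy's least_squares solver, and the observation format are assumptions for illustration, and practical systems would typically use a sparse solver.

import cv2
import numpy as np
from scipy.optimize import least_squares

def refine_all_key_frames(poses, points_3d, observations, K):
    # Operation 610 (sketch): jointly refine the 6DOF parameters of all
    # key image frames and the 3D positions of matched features by
    # minimizing reprojection error over every key frame.
    # poses: list of (rvec, tvec) per key frame, each a length-3 array;
    # points_3d: Mx3 array; observations: (frame_idx, point_idx, xy) tuples.
    n_frames, n_points = len(poses), len(points_3d)
    x0 = np.hstack([np.hstack([np.ravel(r), np.ravel(t)]) for r, t in poses]
                   + [np.ravel(points_3d)])

    def residuals(x):
        cams = x[:n_frames * 6].reshape(n_frames, 6)
        pts = x[n_frames * 6:].reshape(n_points, 3)
        res = []
        for fi, pi, xy in observations:
            proj, _ = cv2.projectPoints(pts[pi:pi + 1],
                                        cams[fi, :3], cams[fi, 3:], K, None)
            res.append(proj.ravel() - np.asarray(xy, dtype=float))
        return np.concatenate(res)

    sol = least_squares(residuals, x0, method="trf")
    refined_poses = sol.x[:n_frames * 6].reshape(n_frames, 6)
    refined_points = sol.x[n_frames * 6:].reshape(n_points, 3)
    return refined_poses, refined_points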
[0068] FIG. 7 is a flowchart illustrating an example method 700 for
SLAM processing, according to some embodiments. In particular, the
method 700 illustrates how an embodiment may perform a partial SLAM
process cycle. As shown, the method 700 begins with operation 702
performing feature tracking on the non-key image frame, based on a
set of 3D positions of new 3D features provided (e.g., extracted)
by the last performance of a full SLAM process cycle (e.g., the
method 600) and the new key image frame most recently processed by the
full SLAM process cycle (e.g., by the method 600). For some
embodiments, operation 702 uses a 2D tracker based on the
Kanade-Lucas-Tomasi (KLT) method, which extracts 2D features from
the new key image frame processed by the full SLAM process cycle
last performed. The method 700 continues with operation 704
determining a set of 6DOF parameters of the image sensor for the
non-key image frame by performing only the localization portion of
the SLAM process based on the set of 2D features from operation
702. The method 700 continues with operation 706 generating a
filtered set of 2D features by filtering out at least one erroneous
feature, from the set of 2D features identified by operation 702,
based on a set of error criteria and the set of 6DOF parameters
determined by operation 704. The set of error criteria may include,
for example, an error criterion relating to an epipolar axis,
projected error, or spatial error. The method 700 continues with
operation 708 projecting a set of tracking points on the non-key
image frame based on the filtered set of 2D features generated by
operation 706 and the set of 6DOF parameters determined by
operation 704. For some embodiments, the set of tracking points
permits 2D virtual tracking on the non-key image frame, which may
be useful in such applications as augmented reality.
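
A non-limiting Python sketch of such a partial cycle appears below, using OpenCV's pyramidal Lucas-Kanade tracker for the KLT-based feature tracking, PnP with RANSAC for the localization-only pose estimate, and reprojection for the tracking points; the camera matrix K, the array layouts, and the use of the RANSAC inlier mask as the error filter are illustrative assumptions.

import cv2
import numpy as np

def partial_slam_cycle(prev_key_gray, nonkey_gray, prev_pts_2d, pts_3d, K):
    # Operation 702 (sketch): KLT feature tracking from the key image
    # frame most recently processed by the full SLAM process cycle into
    # the new non-key image frame. prev_pts_2d is an Nx1x2 float32 array
    # of 2D features; pts_3d is the Nx3 array of their 3D positions.
    tracked, status, _ = cv2.calcOpticalFlowPyrLK(prev_key_gray, nonkey_gray,
                                                  prev_pts_2d, None)
    ok = status.ravel() == 1
    tracked, pts_3d = tracked[ok], pts_3d[ok]

    # Operation 704 (sketch): localization only - recover the 6DOF pose
    # of the image sensor from 2D-3D correspondences; the RANSAC inlier
    # mask also serves as the error filter of operation 706.
    _, rvec, tvec, inliers = cv2.solvePnPRansac(pts_3d, tracked, K, None)
    kept_3d = pts_3d[inliers.ravel()]

    # Operation 708 (sketch): project the filtered 3D points onto the
    # non-key image frame to obtain tracking points (e.g., for AR).
    tracking_pts, _ = cv2.projectPoints(kept_3d, rvec, tvec, K, None)
    return rvec, tvec, tracking_pts.reshape(-1, 2)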
Software Architecture
[0069] FIG. 8 is a block diagram illustrating an example software
architecture 806, which may be used in conjunction with various
hardware architectures herein described to implement an embodiment.
FIG. 8 is a non-limiting example of a software architecture and it
will be appreciated that many other architectures may be
implemented to facilitate the functionality described herein. The
software architecture 806 may execute on hardware such as machine
900 of FIG. 9 that includes, among other things, processors 904,
memory 914, and I/O components 918. A representative hardware layer
852 is illustrated and can represent, for example, the machine 900
of FIG. 9. The representative hardware layer 852 includes a
processor unit 854 having associated executable instructions 804.
Executable instructions 804 represent the executable instructions
of the software architecture 806, including implementation of the
methods, components and so forth described herein. The hardware
layer 852 also includes memory/storage modules 856, which also
have executable instructions 804.
The hardware layer 852 may also comprise other hardware 858.
[0070] In the example architecture of FIG. 8, the software
architecture 806 may be conceptualized as a stack of layers where
each layer provides particular functionality. For example, the
software architecture 806 may include layers such as an operating
system 802, libraries 820, frameworks/middleware 818, applications
816, and a presentation layer 814. Operationally, the applications
816 and/or other components within the layers may invoke
application programming interface (API) calls 808 through the
software stack and receive a response as messages 812. The layers
illustrated are representative in nature and not all software
architectures have all layers. For example, some mobile or special
purpose operating systems may not provide a frameworks/middleware
818, while others may provide such a layer. Other software
architectures may include additional or different layers.
[0071] The operating system 802 may manage hardware resources and
provide common services. The operating system 802 may include, for
example, a kernel 822, services 824 and drivers 826. The kernel 822
may act as an abstraction layer between the hardware and the other
software layers. For example, the kernel 822 may be responsible for
memory management, processor management (e.g., scheduling),
component management, networking, security settings, and so on. The
services 824 may provide other common services for the other
software layers. The drivers 826 are responsible for controlling or
interfacing with the underlying hardware. For instance, the drivers
826 include display drivers, camera drivers, Bluetooth.RTM.
drivers, flash memory drivers, serial communication drivers (e.g.,
Universal Serial Bus (USB) drivers), Wi-Fi.RTM. drivers, audio
drivers, power management drivers, and so forth depending on the
hardware configuration.
[0072] The libraries 820 provide a common infrastructure that is
used by the applications 816 and/or other components and/or layers.
The libraries 820 provide functionality that allows other software
components to perform tasks more easily than by interfacing
directly with the underlying operating system 802 functionality
(e.g., kernel 822, services 824 and/or drivers 826). The libraries
820 may include system libraries 844 (e.g., C standard library)
that may provide functions such as memory allocation functions,
string manipulation functions, mathematical functions, and the
like. In addition, the libraries 820 may include API libraries 846
such as media libraries (e.g., libraries to support presentation
and manipulation of various media format such as MPEG4, H.264, MP3,
AAC, AMR, JPG, PNG), graphics libraries (e.g., an OpenGL framework
that may be used to render 2D and 3D graphic content on a
display), database libraries (e.g., SQLite that may provide various
relational database functions), web libraries (e.g., WebKit that
may provide web browsing functionality), and the like. The
libraries 820 may also include a wide variety of other libraries
848 to provide many other APIs to the applications 816 and other
software components/modules.
[0073] The frameworks/middleware 818 (also sometimes referred to as
middleware) provide a higher-level common infrastructure that may
be used by the applications 816 and/or other software
components/modules. For example, the frameworks/middleware 818 may
provide various graphic user interface (GUI) functions, high-level
resource management, high-level location services, and so forth.
The frameworks/middleware 818 may provide a broad spectrum of other
APIs that may be utilized by the applications 816 and/or other
software components/modules, some of which may be specific to a
particular operating system 802 or platform.
[0074] The applications 816 include built-in applications 838
and/or third-party applications 840. Examples of representative
built-in applications 838 may include, but are not limited to, a
contacts application, a browser application, a book reader
application, a location application, a media application, a
messaging application, and/or a game application. Third-party
applications 840 may include an application developed using the
ANDROID.TM. or IOS.TM. software development kit (SDK) by an entity
other than the vendor of the particular platform, and may be mobile
software running on a mobile operating system such as IOS.TM.,
ANDROID.TM., WINDOWS.RTM. Phone, or other mobile operating systems.
The third-party applications 840 may invoke the API calls 808
provided by the mobile operating system (such as operating system
802) to facilitate functionality described herein.
[0075] The applications 816 may use built-in operating system
functions (e.g., kernel 822, services 824 and/or drivers 826),
libraries 820, and frameworks/middleware 818 to create user
interfaces to interact with users of the system. Alternatively, or
additionally, in some systems, interactions with a user may occur
through a presentation layer, such as presentation layer 814. In
these systems, the application/component "logic" can be separated
from the aspects of the application/component that interact with a
user.
[0076] FIG. 9 is a block diagram illustrating components of a
machine 900, according to some example embodiments, able to read
instructions 804 from a machine-readable medium (e.g., a
machine-readable storage medium) and perform any one or more of the
methodologies discussed herein. Specifically, FIG. 9 shows a
diagrammatic representation of the machine 900 in the example form
of a computer system, within which instructions 910 (e.g.,
software, a program, an application, an applet, an app, or other
executable code) for causing the machine 900 to perform any one or
more of the methodologies discussed herein may be executed. As
such, the instructions 910 may be used to implement modules or
components described herein. The instructions 910 transform the
general, non-programmed machine 900 into a particular machine 900
programmed to carry out the described and illustrated functions in
the manner described. In alternative embodiments, the machine 900
operates as a standalone device or may be coupled (e.g., networked)
to other machines. In a networked deployment, the machine 900 may
operate in the capacity of a server machine or a client machine in
a server-client network environment, or as a peer machine in a
peer-to-peer (or distributed) network environment. The machine 900
may comprise, but not be limited to, a server computer, a client
computer, a personal computer (PC), a tablet computer, a laptop
computer, a netbook, a set-top box (STB), a personal digital
assistant (PDA), an entertainment media system, a cellular
telephone, a smart phone, a mobile computing device, a wearable
device (e.g., a smart watch), a smart home device (e.g., a smart
appliance), other smart devices, a web appliance, a network router,
a network switch, a network bridge, or any machine capable of
executing the instructions 910, sequentially or otherwise, that
specify actions to be taken by machine 900. Further, while only a
single machine 900 is illustrated, the term "machine" shall also be
taken to include a collection of machines that individually or
jointly execute the instructions 910 to perform any one or more of
the methodologies discussed herein.
[0077] The machine 900 may include processors 904, memory/storage
906, and I/O components 918, which may be configured to communicate
with each other such as via a bus 902. The memory/storage 906 may
include a memory 914, such as a main memory, or other memory
storage, and a storage unit 916, both accessible to the processors
904 such as via the bus 902. The storage unit 916 and memory 914
store the instructions 910 embodying any one or more of the
methodologies or functions described herein. The instructions 910
may also reside, completely or partially, within the memory 914,
within the storage unit 916, within at least one of the processors
904 (e.g., within the processor's cache memory), or any suitable
combination thereof, during execution thereof by the machine 900.
Accordingly, the memory 914, the storage unit 916, and the memory
of processors 904 are examples of machine-readable media.
[0078] The I/O components 918 may include a wide variety of
components to receive input, provide output, produce output,
transmit information, exchange information, capture measurements,
and so on. The specific I/O components 918 that are included in a
particular machine 900 will depend on the type of machine. For
example, portable machines such as mobile phones will likely
include a touch input device or other such input mechanisms, while
a headless server machine will likely not include such a touch
input device. It will be appreciated that the I/O components 918
may include many other components that are not shown in FIG. 9. The
I/O components 918 are grouped according to functionality merely
for simplifying the following discussion and the grouping is in no
way limiting. In various example embodiments, the I/O components
918 may include output components 926 and input components 928. The
output components 926 may include visual components (e.g., a
display such as a plasma display panel (PDP), a light emitting
diode (LED) display, a liquid crystal display (LCD), a projector,
or a cathode ray tube (CRT)), acoustic components (e.g., speakers),
haptic components (e.g., a vibratory motor, resistance mechanisms),
other signal generators, and so forth. The input components 928 may
include alphanumeric input components (e.g., a keyboard, a touch
screen configured to receive alphanumeric input, a photo-optical
keyboard, or other alphanumeric input components), point based
input components (e.g., a mouse, a touchpad, a trackball, a
joystick, a motion sensor, or other pointing instrument), tactile
input components (e.g., a physical button, a touch screen that
provides location and/or force of touches or touch gestures, or
other tactile input components), audio input components (e.g., a
microphone), and the like.
[0079] In further example embodiments, the I/O components 918 may
include biometric components 930, motion components 934,
environment components 936, or position components 938 among a wide
array of other components. For example, the biometric components
930 may include components to detect expressions (e.g., hand
expressions, facial expressions, vocal expressions, body gestures,
or eye tracking), measure biosignals (e.g., blood pressure, heart
rate, body temperature, perspiration, or brain waves), identify a
person (e.g., voice identification, retinal identification, facial
identification, fingerprint identification, or electroencephalogram
based identification), and the like. The motion components 934 may
include acceleration sensor components (e.g., accelerometer),
gravitation sensor components, rotation sensor components (e.g.,
gyroscope), and so forth. The environment components 936 may
include, for example, illumination sensor components (e.g.,
photometer), temperature sensor components (e.g., one or more
thermometers that detect ambient temperature), humidity sensor
components, pressure sensor components (e.g., barometer), acoustic
sensor components (e.g., one or more microphones that detect
background noise), proximity sensor components (e.g., infrared
sensors that detect nearby objects), gas sensors (e.g., gas
detection sensors to detect concentrations of hazardous gases
for safety or to measure pollutants in the atmosphere), or other
components that may provide indications, measurements, or signals
corresponding to a surrounding physical environment. The position
components 938 may include location sensor components (e.g., a
Global Positioning System (GPS) receiver component), altitude sensor
components (e.g., altimeters or barometers that detect air pressure
from which altitude may be derived), orientation sensor components
(e.g., magnetometers), and the like.
[0080] Communication may be implemented using a wide variety of
technologies. The I/O components 918 may include communication
components 940 operable to couple the machine 900 to a network 932
or devices 920 via coupling 924 and coupling 922 respectively. For
example, the communication components 940 may include a network
interface component or other suitable device to interface with the
network 932. In further examples, communication components 940 may
include wired communication components, wireless communication
components, cellular communication components, Near Field
Communication (NFC) components, Bluetooth.RTM. components (e.g.,
Bluetooth.RTM. Low Energy), Wi-Fi.RTM. components, and other
communication components to provide communication via other
modalities. The devices 920 may be another machine or any of a wide
variety of peripheral devices (e.g., a peripheral device coupled
via a Universal Serial Bus (USB)).
[0081] Moreover, the communication components 940 may detect
identifiers or include components operable to detect identifiers.
For example, the communication components 940 may include Radio
Frequency Identification (RFID) tag reader components, NFC smart
tag detection components, optical reader components (e.g., an
optical sensor to detect one-dimensional bar codes such as
Universal Product Code (UPC) bar code, multi-dimensional bar codes
such as Quick Response (QR) code, Aztec code, Data Matrix,
Dataglyph, MaxiCode, PDF417, Ultra Code, UCC RSS-2D bar code, and
other optical codes), or acoustic detection components (e.g.,
microphones to identify tagged audio signals). In addition, a
variety of information may be derived via the communication
components 940, such as location via Internet Protocol (IP)
geo-location, location via Wi-Fi.RTM. signal triangulation,
location via detecting an NFC beacon signal that may indicate a
particular location, and so forth.
[0082] As used herein, "ephemeral message" can refer to a message
that is accessible for a time-limited duration (e.g., maximum of 10
seconds). An ephemeral message may comprise text content, image
content, audio content, video content, and the like. The access time
for the ephemeral message may be set by the message sender or,
alternatively, the access time may be a default setting or a
setting specified by the recipient. Regardless of the setting
technique, an ephemeral message is transitory. A message duration
parameter associated with an ephemeral message may provide a value
that determines the amount of time that the ephemeral message can
be displayed or accessed by a receiving user of the ephemeral
message. An ephemeral message may be accessed or displayed using a
messaging client software application capable of receiving and
displaying content of the ephemeral message, such as an ephemeral
messaging application.
[0083] As also used herein, "ephemeral message story" can refer to
a collection of ephemeral message content that is accessible for a
time-limited duration, similar to an ephemeral message. An
ephemeral message story may be sent from one user to another, and
may be accessed or displayed using a messaging client software
application capable of receiving and displaying the collection of
ephemeral content, such as an ephemeral messaging application.
[0084] Throughout this specification, plural instances may
implement components, operations, or structures described as a
single instance. Although individual operations of one or more
methods are illustrated and described as separate operations, one
or more of the individual operations may be performed concurrently,
and nothing requires that the operations be performed in the order
illustrated. Structures and functionality presented as separate
components in example configurations may be implemented as a
combined structure or component. Similarly, structures and
functionality presented as a single component may be implemented as
separate components. These and other variations, modifications,
additions, and improvements fall within the scope of the subject
matter herein.
[0085] Although an overview of the inventive subject matter has
been described with reference to specific example embodiments,
various modifications and changes may be made to these embodiments
without departing from the broader scope of embodiments of the
present disclosure.
[0086] The embodiments illustrated herein are described in
sufficient detail to enable those skilled in the art to practice
the teachings disclosed. Other embodiments may be used and derived
therefrom, such that structural and logical substitutions and
changes may be made without departing from the scope of this
disclosure. The detailed description, therefore, is not to be taken
in a limiting sense, and the scope of various embodiments is
defined only by the appended claims, along with the full range of
equivalents to which such claims are entitled.
[0087] As used herein, modules may constitute software modules
(e.g., code stored or otherwise embodied in a machine-readable
medium or in a transmission medium), hardware modules, or any
suitable combination thereof. A "hardware module" is a tangible
(e.g., non-transitory) physical component (e.g., a set of one or
more processors) capable of performing certain operations and may
be configured or arranged in a certain physical manner. In various
embodiments, one or more computer systems or one or more hardware
modules thereof may be configured by software (e.g., an application
or portion thereof) as a hardware module that operates to perform
operations described herein for that module.
[0088] In some embodiments, a hardware module may be implemented
electronically. For example, a hardware module may include
dedicated circuitry or logic that is permanently configured to
perform certain operations. A hardware module may be or include a
special-purpose processor, such as a field programmable gate array
(FPGA) or an ASIC. A hardware module may also include programmable
logic or circuitry that is temporarily configured by software to
perform certain operations. As an example, a hardware module may
include software encompassed within a CPU or other programmable
processor.
[0089] Considering embodiments in which hardware modules are
temporarily configured (e.g., programmed), each of the hardware
modules need not be configured or instantiated at any one instance
in time. For example, where a hardware module includes a CPU
configured by software to become a special-purpose processor, the
CPU may be configured as respectively different special-purpose
processors (e.g., each included in a different hardware module) at
different times. Software (e.g., a software module) may accordingly
configure one or more processors, for example, to become or
otherwise constitute a particular hardware module at one instance
of time and to become or otherwise constitute a different hardware
module at a different instance of time.
[0090] Hardware modules can provide information to, and receive
information from, other hardware modules. Accordingly, described
hardware modules may be regarded as being communicatively coupled.
Where multiple hardware modules exist contemporaneously,
communications may be achieved through signal transmission (e.g.,
over suitable circuits and buses) between or among two or more of
the hardware modules. In embodiments in which multiple hardware
modules are configured or instantiated at different times,
communications between such hardware modules may be achieved, for
example, through the storage and retrieval of information in memory
structures to which the multiple hardware modules have access. For
example, one hardware module may perform an operation and store the
output of that operation in a memory (e.g., a memory device) to
which it is communicatively coupled. A further hardware module may
then, at a later time, access the memory to retrieve and process
the stored output. Hardware modules may also initiate
communications with input or output devices, and can operate on a
resource (e.g., a collection of information from a computing
resource).
[0091] The various operations of example methods described herein
may be performed, at least partially, by one or more processors
that are temporarily configured (e.g., by software) or permanently
configured to perform the relevant operations. Whether temporarily
or permanently configured, such processors may constitute
processor-implemented modules that operate to perform one or more
operations or functions described herein. As used herein,
"processor-implemented module" refers to a hardware module in which
the hardware includes one or more processors. Accordingly, the
operations described herein may be at least partially
processor-implemented, hardware-implemented, or both, since a
processor is an example of hardware, and at least some operations
within any one or more of the methods discussed herein may be
performed by one or more processor-implemented modules,
hardware-implemented modules, or any suitable combination
thereof.
[0092] As used herein, the term "or" may be construed in either an
inclusive or exclusive sense. The terms "a" or "an" should be read
as meaning "at least one," "one or more," or the like. The use of
words and phrases such as "one or more," "at least," "but not
limited to," or other like phrases shall not be read to mean that
the narrower case is intended or required in instances where such
broadening phrases may be absent.
[0093] Boundaries between various resources, operations, modules,
engines, and data stores are somewhat arbitrary, and particular
operations are illustrated in a context of specific illustrative
configurations. Other allocations of functionality are envisioned
and may fall within a scope of various embodiments of the present
disclosure. In general, structures and functionality presented as
separate resources in the example configurations may be implemented
as a combined structure or resource. Similarly, structures and
functionality presented as a single resource may be implemented as
separate resources. These and other variations, modifications,
additions, and improvements fall within a scope of embodiments of
the present disclosure as represented by the appended claims. The
specification and drawings are, accordingly, to be regarded in an
illustrative rather than a restrictive sense.
[0094] The description above includes systems, methods, devices,
instructions, and computer media (e.g., computing machine program
products) that embody illustrative embodiments of the disclosure.
In the description, for the purposes of explanation, numerous
specific details are set forth in order to provide an understanding
of various embodiments of the inventive subject matter. It will be
evident, however, to those skilled in the art, that embodiments of
the inventive subject matter may be practiced without these
specific details. In general, well-known instruction instances,
protocols, structures, and techniques are not necessarily shown in
detail.
* * * * *