U.S. patent application number 15/286161 was published by the patent office on 2017-01-26 for augmented reality vision system for tracking and geolocating objects of interest.
This patent application is currently assigned to SRI INTERNATIONAL. The applicant listed for this patent is SRI INTERNATIONAL. Invention is credited to Vlad BRANZOI, Rakesh KUMAR, Taragay OSKIPER, Supun SAMARASEKERA, Mikhail SIZINTSEV.
Publication Number: 20170024904
Application Number: 15/286161
Family ID: 57235007
United States Patent Application 20170024904, Kind Code A1
SAMARASEKERA; Supun; et al.
January 26, 2017
AUGMENTED REALITY VISION SYSTEM FOR TRACKING AND GEOLOCATING
OBJECTS OF INTEREST
Abstract
Methods and apparatuses for tracking objects comprise one or
more optical sensors for capturing one or more images of a scene,
wherein the one or more optical sensors capture a wide field of
view and a corresponding narrow field of view for the one or more
images of the scene; a localization module, coupled to the one or
more optical sensors, for determining the location of the apparatus
and determining the location of one or more objects in the one or
more images based on the location of the apparatus; and an augmented
reality module, coupled to the localization module, for enhancing a
view of the scene on a display based on the determined location of
the one or more objects.
Inventors:
SAMARASEKERA; Supun; (Princeton, NJ)
OSKIPER; Taragay; (East Windsor, NJ)
KUMAR; Rakesh; (West Windsor, NJ)
SIZINTSEV; Mikhail; (Plainsboro, NJ)
BRANZOI; Vlad; (Lawrenceville, NJ)
Applicant: SRI INTERNATIONAL (Menlo Park, CA, US)
Assignee: SRI INTERNATIONAL
Family ID: 57235007
Appl. No.: 15/286161
Filed: October 5, 2016
Related U.S. Patent Documents
Application Number | Filing Date  | Patent Number
13916702           | Jun 13, 2013 | 9495783
15286161           |              |
61675734           | Jul 25, 2012 |
61790715           | Mar 15, 2013 |
Current U.S. Class: 1/1
Current CPC Class: G06T 11/00 20130101; H04N 5/2258 20130101;
G06T 2207/10021 20130101; G06T 2207/30184 20130101; G06T 7/246 20170101;
G06T 19/006 20130101; G06K 9/00671 20130101; G06K 9/209 20130101;
G06K 9/2054 20130101; G06T 11/60 20130101; G06T 2207/20081 20130101
International Class: G06T 11/00 20060101 G06T011/00; H04N 5/225 20060101
H04N005/225; G06T 11/60 20060101 G06T011/60
Government Interests
GOVERNMENT RIGHTS IN THIS INVENTION
[0002] This invention was made with U.S. government support under
contract number N00014-11-C-0433. The U.S. government has certain
rights in this invention.
Claims
1.-22. (canceled)
23. A user-carried or user-worn apparatus for tracking objects, the
apparatus comprising: one or more optical sensors for capturing
images of a scene, wherein the images include a wide field of view
image of the scene and a narrow field of view image of the scene; a
display for displaying a displayed view of the scene; and a
processor coupled to a storage medium, the storage medium storing
processor-executable instructions, which when executed by the
processor, perform a method comprising: geolocating the one or
more optical sensors and a location of one or more objects depicted
in the captured images based on a location of the one or more
optical sensors; recognizing an object of interest in the wide
field of view image; recognizing the object of interest in the
narrow field of view image; correlating a location of the
recognized object of interest in the wide field of view image and
the narrow field of view image; and enhancing the displayed view of
the scene on the display based on the geolocating of the one or
more objects, wherein the enhanced, displayed view comprises
overlay content for the recognized object of interest.
24. The apparatus of claim 23, wherein the method performed by the
processor based on the processor-executable instructions further
comprises: broadcasting the tracked location of the one or more
objects and receiving tracked location information of objects
outside a field of view of the one or more optical sensors, the
received, tracked location information being received from other
apparatuses.
25. The apparatus of claim 23, wherein the user-carried or
user-worn apparatus is configured to exchange tracked location
information of the one or more objects with at least one other
user-carried or user-worn apparatus and enhance the displayed view
of the scene on the display based on the geolocating of the one or
more objects by the apparatus and also based on tracked location
information received from the at least one other user-carried or
user-worn apparatus.
26. The apparatus of claim 23, wherein the apparatus is in
communication with one or more similar apparatuses such that users
of the apparatuses utilize the apparatuses in a gaming activity in
a physical area.
27. The apparatus of claim 23, wherein when there are multiple
instances of the apparatus in remote locations, geolocation is
performed with respect to all of the sensors included in the
multiple instances of the apparatus.
28. The apparatus of claim 23, wherein the enhancing of the
displayed view further comprises augmenting the displayed view to
indicate location of the one or more objects outside the field of
view of the one or more optical sensors.
29. The apparatus of claim 28, wherein the augmentation of the
displayed view includes insertion of one or more markers of objects
of interest.
30. The apparatus of claim 28, wherein the augmentation of the
displayed view includes insertion of one or more indicators
associated with one or more objects of interest.
31. The apparatus of claim 30, wherein the one or more indicators
provide direction to the one or more objects of interest.
32. The apparatus of claim 31, wherein the one or more indicators
provide direction to the one or more objects of interest outside a
field of view of the one or more optical sensors.
33. The apparatus of claim 23, wherein the correlating is performed
by tracking the location of the recognized object of interest
during transition from the wide field of view image to the narrow
field of view image.
34. The apparatus of claim 33, wherein the overlay content for the
recognized object of interest is scaled in accordance with the
tracking of the location of the recognized object of interest from
the wide field of view to the narrow field of view.
35. The apparatus of claim 23, wherein the method performed by the
processor based on the processor-executable instructions further
comprises: maintaining the enhanced, displayed view consistent in
real-time with the determined location of the recognized object of
interest as a user of the apparatus relocates the apparatus.
36. The apparatus of claim 23, wherein the method performed by the
processor based on the processor-executable instructions further
comprises: inserting objects into the enhanced, displayed view
based on geographic data that indicates that the inserted object is
to be occluded by another object in the enhanced, displayed
view.
37. The apparatus of claim 23, wherein enhancing the displayed view
comprises overlaying geographically located information from
external sources onto the displayed view.
38. A method for tracking objects by a user-carried or user-worn
apparatus, the method comprising: capturing, using one or more
optical sensors, images of a scene, wherein the images include a
wide field of view image of the scene and a narrow field of view
image of the scene; geolocating the one or more optical sensors,
and a location of one or more objects depicted in the captured
images based on a location of the one or more optical sensors;
recognizing an object of interest in the wide field of view image;
recognizing the object of interest in the narrow field of view
image; correlating a location of the recognized object of interest
in the wide field of view image and the narrow field of view image;
and enhancing a displayed view of the scene on a display of the
user-carried or user-worn apparatus based on the geolocating of the
one or more objects, wherein the enhanced, displayed view comprises
overlay content for the recognized object of interest.
39. The method of claim 38, further comprising broadcasting the
tracked location of the one or more objects and receiving tracked
location information of objects outside a field of view of the one
or more optical sensors, the received, tracked location information
being received from other apparatuses.
40. The method of claim 38, wherein the user-carried or user-worn
apparatus is configured to exchange tracked location information of
the one or more objects with at least one other user-carried or
user-worn apparatus and enhance the displayed view of the scene on
the display based on the geolocating of the one or more objects by
the apparatus and also based on tracked location information
received from the at least one other user-carried or user-worn
apparatus.
41. The method of claim 38, wherein when there are multiple
instances of the apparatus in remote locations, geolocation is
performed with respect to all of the sensors included in the
multiple instances of the apparatus.
42. The method of claim 38, wherein the enhancing of the displayed
view further comprises augmenting the displayed view to indicate
location of the one or more objects outside the field of view of
the one or more optical sensors.
43. The method of claim 42, wherein the augmentation of the
displayed view includes insertion of one or more markers of objects
of interest.
44. The method of claim 42, wherein the augmentation of the
displayed view includes insertion of one or more indicators
associated with one or more objects of interest.
45. The method of claim 44, wherein the one or more indicators
provide direction to the one or more objects of interest.
46. The method of claim 44, wherein the one or more indicators
provide direction to the one or more objects of interest outside a
field of view of the one or more optical sensors.
47. The method of claim 38, wherein the correlating is performed by
tracking the location of the recognized object of interest during
transition from the wide field of view image to the narrow field of
view image.
48. The method of claim 47, wherein the overlay content for the
recognized object of interest is scaled in accordance with the
tracking of the location of the recognized object of interest from
the wide field of view to the narrow field of view.
49. The method of claim 38, further comprising maintaining the
enhanced, displayed view consistent in real-time with the determined
location of the recognized object of interest as a user of the
apparatus relocates the apparatus.
50. The method of claim 38, further comprising inserting objects
into the enhanced, displayed view based on geographic data that
indicates that the inserted object is to be occluded by another
object in the enhanced, displayed view.
51. The method of claim 38, wherein enhancing the displayed view
comprises overlaying geographically located information from
external sources onto the displayed view.
52. The method of claim 38, further comprising: capturing, using a
first lens of the one or more optical sensors, the wide field of
view image; and capturing, using a second lens of the one or more
optical sensors, the narrow field of view image.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of and claims priority to
and the benefit of U.S. patent application Ser. No. 13/916,702,
filed Jun. 13, 2013, which claims priority to and the
benefit of U.S. provisional patent application No. 61/675,734 filed
Jul. 25, 2012 and U.S. provisional patent application No.
61/790,715 filed Mar. 15, 2013, the disclosures of which are
herein incorporated by reference in their entirety.
BACKGROUND OF THE INVENTION
[0003] Field of the Invention
[0004] Embodiments of the present invention generally relate to
augmented reality and, more particularly, to a method and apparatus
for tracking and geolocating objects of interest.
[0005] Description of the Related Art
[0006] Currently, binoculars are entirely optical devices allowing
users to zoom in on a particular real world area from a long
distance. If a user is attempting to view precise movements of an
object at a distance, such as a car, truck or person, the user is
able to use a binocular lens switching mechanism to change to a
different magnification. In other binoculars, a "zooming" function
is provided which can vary magnification ranges using a switch or
lever. However, once the user increases the magnification level, the
user may experience difficulty in finding the object of interest
within the "zoomed" scene.
[0007] Further, in conventional binocular systems, if there are
several binocular users in communication and one user has
identified one or more objects of interest, such as interesting
wildlife, people, or the like, it is difficult to signal the
location of the object of interest to the other binocular users. The
user who sighted the object of interest may use landmarks, but this
method is imprecise, and the landmarks may not be in view of the other
users' binocular systems. In addition, there may be several similar
landmarks, making it more difficult to identify the precise
location of objects of interest.
[0008] Therefore, there is a need in the art for a method and
apparatus for precisely determining the geolocation of distant
objects, in addition to tracking and sharing of the location for
those objects.
SUMMARY OF THE INVENTION
[0009] An apparatus and/or method for tracking and geolocating
objects of interest comprises one or more optical sensors for
capturing one or more images of a scene, wherein the one or more
optical sensors capture a wide field of view and a corresponding
narrow field of view for the one or more images of the scene; a
localization module, coupled to the one or more optical sensors, for
determining the location of the apparatus and determining the
location of one or more objects in the one or more images based on the
location of the apparatus; and an augmented reality module, coupled
to the localization module, for enhancing a view of the scene on a
display based on the determined location of the one or more
objects.
[0010] Various advantages, aspects and features of the present
disclosure, as well as details of an illustrated embodiment
thereof, are more fully understood from the following description
and drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] So that the manner in which the above recited features of
the present invention can be understood in detail, a more
particular description of the invention, briefly summarized above,
may be had by reference to embodiments, some of which are
illustrated in the appended drawings. It is to be noted, however,
that the appended drawings illustrate only typical embodiments of
this invention and are therefore not to be considered limiting of
its scope, for the invention may admit to other equally effective
embodiments.
[0012] FIG. 1 depicts an apparatus for tracking objects in
accordance with exemplary embodiments of the present invention;
[0013] FIG. 2 depicts an embodiment of the apparatus as a binocular
unit in accordance with exemplary embodiments of the present
invention;
[0014] FIG. 3 depicts an image captured by narrow field of view
optics in accordance with exemplary embodiments of the present
invention;
[0015] FIG. 4 depicts an image captured by wide field of view
optics in accordance with exemplary embodiments of the present
invention;
[0016] FIG. 5 depicts a wide field of view image, where a
helicopter is placed at a particular location in accordance with
exemplary embodiments of the present invention;
[0017] FIG. 6 depicts a wide field of view image, where a tank is
placed at a particular location in accordance with exemplary
embodiments of the present invention;
[0018] FIG. 7A depicts a plurality of binocular units viewing
portions of a scene in accordance with exemplary embodiments of the
present invention;
[0019] FIG. 7B depicts an exemplary use of the apparatus 100 in
accordance with exemplary embodiments of the present invention;
and
[0020] FIG. 8 depicts a method for object detection in accordance
with exemplary embodiments of the present invention.
DETAILED DESCRIPTION
[0021] Embodiments of the present invention generally relate to a
method and apparatus for tracking and geolocating objects of
interest. According to one embodiment, an apparatus for tracking
objects of interest comprises an optical sensor used in conjunction
with positional sensors, such as inertial measurement units (IMUs)
and global navigation satellite system (GNSS) receivers (e.g., for
the Global Positioning System (GPS), GLONASS or Galileo), to locate
the apparatus, and a laser rangefinder to geographically localize
objects in reference to the location of the optical sensor. Objects
of interest are tracked as a user zooms the optical sensors or
switches from a wide field of view to a narrow field of view.
Embodiments of the present invention preserve a location of an
object of interest within the user's view throughout a change in
zoom or field of view. Additionally, if the object of interest
escapes the user's view during a zooming function, an augmented
reality system may provide the user with indicia of the location of
the object within the user's view, as well as other pertinent
information. An observed scene is augmented with information,
labels of real world objects and guidance information on moving a
camera unit or binocular unit to view objects of interest in a
user's field of view. The augmentation appears stable with respect
to the observed scene.
[0022] FIG. 1 depicts an apparatus 100 for tracking objects in
accordance with exemplary embodiments of the present invention. The
apparatus 100 comprises an object recognition module 102, sensors
103.sub.1 to 103.sub.n and 105.sub.1 to 105.sub.n, a tracking
module 104, an augmented reality module 106, a localization module
108, a reasoning module 110, a database 112, and output devices
116. According to exemplary embodiments, the database 112 further
comprises a knowledge base 133 and scene and language data 135.
[0023] According to one embodiment, the sensors 103.sub.1 to
103.sub.n are optical sensors and the sensors 105.sub.1 to
105.sub.n are positional sensors. In some embodiments, sensors
103.sub.1 to 103.sub.n may comprise infrared (IR) sensors, visible
sensors, night-vision sensors, radiation signature sensors,
radio-wave sensors or other types of optical sensors. Sensors
103.sub.1 to 103.sub.n simultaneously capture one or more images of
a scene 153, while the sensors 105.sub.1 to 105.sub.n capture data
about the geographic location and orientation of the apparatus 100.
According to an exemplary embodiment, all of the sensors are
physically coupled to the apparatus 100. In some embodiments, the
one or more sensors 103.sub.1 to 103.sub.n may be housed in a
telescope, a binocular unit, a headset, bifocals, or the like. In
some instances, the sensors may be remotely located from the
apparatus.
[0024] One or more of the sensors 103.sub.1 to 103.sub.n comprise a
sensor for capturing wide field of view images, e.g., 50 degrees to
80 degrees horizontal field of view, and one or more of the sensors
103.sub.1 to 103.sub.n comprise a sensor for capturing narrow
field of view images, e.g., 1 degree to 10 degrees. A wide angle
image of the scene 153 provides context for the narrower field of view
images. The images captured by sensors 103.sub.1 to 103.sub.n are
coupled to the object recognition module 102, which performs object
recognition on the wide field of view images and the narrow field of
view images to recognize objects in all of the images. According to
exemplary embodiments, invariant image features are used to recognize
objects in the images as described in commonly assigned and issued
U.S. Pat. No. 8,330,819 filed on Apr. 12, 2012, commonly assigned
U.S. Pat. No. 8,345,988 B2 filed on Jun. 22, 2005 and U.S.
Pat. No. 8,243,991 B2, filed on Aug. 14, 2012, herein incorporated
by reference in their entirety.
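To give a feel for the two example ranges above, the width of scene covered by a horizontal field of view θ at range R is w = 2R tan(θ/2) for an ideal pinhole camera. The short Python sketch below evaluates this for a 60 degree and a 5 degree view at an arbitrary range of 1 km; the function name `footprint_width` and the flat, perpendicular target plane are illustrative assumptions, not part of the described apparatus.

```python
import math

def footprint_width(range_m: float, fov_deg: float) -> float:
    """Width (m) spanned by a horizontal field of view at a given range.

    Assumes an ideal pinhole camera looking perpendicularly at a flat target plane.
    """
    return 2.0 * range_m * math.tan(math.radians(fov_deg) / 2.0)

wide = footprint_width(1000.0, 60.0)   # within the 50-80 degree example range
narrow = footprint_width(1000.0, 5.0)  # within the 1-10 degree example range
print(f"wide: {wide:.0f} m, narrow: {narrow:.1f} m, ratio: {wide / narrow:.1f}x")
```

At 1 km the 60 degree view spans roughly 1155 m of scene while the 5 degree view spans roughly 87 m, a ratio of about 13, which illustrates why the wide image helps keep the narrow view in context.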
[0025] Simultaneously capturing narrow field of view images with
wide field of view images of the scene 153 allows the vision
algorithms to have additional context in recognizing objects of
interest in a scene. The narrow and wide field of view (FOV) images,
in conjunction with various positional sensors, aid in high-fidelity
localization, i.e., highly accurate geolocalization of objects of
interest can be achieved. Further, the object recognition module 102
is coupled to the database 112 to receive invariant feature records
and the like to assist in object recognition. The object recognition
module 102 may also receive scene and language data 135 from the
database 112 to localize wide angle images, as described in commonly
assigned, co-pending U.S. patent application Ser. No. 13/493,654
filed Jun. 11, 2012, herein incorporated by reference in its entirety.
[0026] For example, a GNSS receiver provides an initial position fix
of the apparatus 100, and an IMU can provide relative movements of
the apparatus 100, such as rotation and acceleration. Together, the
various sensors provide six degrees of freedom (6DOF): three of
position and three of orientation. In some instances, a GNSS signal
may be available only sporadically. According to some embodiments,
the apparatus 100 can still track objects of interest during periods
of sporadic GNSS reception. The signals previously received from the
GNSS receiver may be stored in a memory of the apparatus 100. The
localization module 108 calculates a projected geographical
location of the apparatus 100 based on the previously stored GNSS
location and the trajectory of the apparatus 100, in conjunction with
the trajectory of the objects of interest.
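The paragraph does not specify how the projected location is computed. A minimal sketch, assuming positions are kept in a local east/north frame in metres and that the apparatus keeps a roughly constant velocity estimated from its last two GNSS fixes, could look like this in Python. The `Fix` type and `project_position` function are hypothetical names, and a practical system would more likely propagate the position with IMU measurements than extrapolate linearly.

```python
from dataclasses import dataclass

@dataclass
class Fix:
    t: float      # timestamp in seconds
    east: float   # metres east of a local origin
    north: float  # metres north of a local origin

def project_position(prev: Fix, last: Fix, t_now: float) -> tuple[float, float]:
    """Extrapolate the apparatus position to t_now from the last two GNSS fixes.

    Assumes constant velocity since the last fix (simple dead reckoning).
    """
    dt = last.t - prev.t
    if dt <= 0:
        raise ValueError("fixes must be in increasing time order")
    v_east = (last.east - prev.east) / dt
    v_north = (last.north - prev.north) / dt
    elapsed = t_now - last.t
    return last.east + v_east * elapsed, last.north + v_north * elapsed

# Two fixes one second apart, followed by a 3-second GNSS outage.
print(project_position(Fix(0.0, 0.0, 0.0), Fix(1.0, 1.5, 0.5), 4.0))  # (6.0, 2.0)
```

Linear extrapolation drifts quickly when the user turns or stops, which is why the source couples the stored GNSS location with the trajectories of both the apparatus and the objects.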
[0027] The localization module 108 couples the localized object and
apparatus information to the tracking module 104. The tracking
module 104 correlates the objects in the wide field of view images
and the narrow field of view images. For example, in wide field of
view images, a dog and a cat are recognized at a distance. In narrow
field of view images, the dog and cat are recognized and correlated
with the dog and cat in the wide field of view images because the
locations of the dog and cat are known from the localization module
108. The tracking module 104 tracks the location of the dog and cat
when a user switches from viewing a wide field of view of the scene
to a narrow field of view of the scene 153. This process is
referred to as visual odometry, as described in commonly assigned,
co-pending U.S. patent application Ser. No. 13/217,014 filed on
Aug. 24, 2011, herein incorporated by reference in its
entirety. Accordingly, the visual odometry is performed on both the
wide field of view and the narrow field of view simultaneously or
in parallel. The wide field of view provides robustness, while the
narrow field of view provides accuracy. Users may be provided with
preset field of view angles; for example, a lens with ten steps of
fields of view may be provided to the user. According to some
embodiments, to support multiple fields of view, prior geometric
calibration may be performed between images taken with different
fields of view.
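As a simplified illustration of correlating detections across the two views, the Python sketch below maps a pixel from the narrow field of view image into the wide field of view image and then pairs detections by label and pixel distance. It assumes (hypothetically) that both cameras share an optical centre and boresight, have no lens distortion and have principal points at their image centres, so the mapping reduces to a scale factor of f_wide / f_narrow; the prior geometric calibration mentioned above would replace these assumptions in practice. The described tracking module correlates objects using locations from the localization module, whereas this sketch swaps in a purely image-space gate. All function names and the detection dictionary format are illustrative.

```python
import math

def focal_px(width_px: int, hfov_deg: float) -> float:
    """Pinhole focal length in pixels for an image width and horizontal FOV."""
    return (width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)

def narrow_to_wide(u, v, narrow, wide):
    """Map pixel (u, v) of the narrow-FOV image into the wide-FOV image.

    narrow and wide are (width_px, height_px, hfov_deg) tuples. Assumes a shared
    optical centre and boresight, square pixels, no distortion and principal
    points at the image centres, so only a scale factor separates the views.
    """
    wn, hn, fov_n = narrow
    ww, hw, fov_w = wide
    scale = focal_px(ww, fov_w) / focal_px(wn, fov_n)
    return ww / 2.0 + (u - wn / 2.0) * scale, hw / 2.0 + (v - hn / 2.0) * scale

def associate(narrow_dets, wide_dets, narrow, wide, gate_px=10.0):
    """Match each narrow-FOV detection to the nearest same-label wide-FOV detection.

    Detections are dicts {"label": str, "uv": (u, v)}. Returns a dict from
    narrow index to wide index, or None when nothing lies within gate_px.
    """
    matches = {}
    for i, nd in enumerate(narrow_dets):
        mu, mv = narrow_to_wide(*nd["uv"], narrow, wide)
        best, best_d = None, gate_px
        for j, wd in enumerate(wide_dets):
            d = math.hypot(wd["uv"][0] - mu, wd["uv"][1] - mv)
            if wd["label"] == nd["label"] and d < best_d:
                best, best_d = j, d
        matches[i] = best
    return matches

narrow_cam = (1920, 1080, 5.0)   # width, height, horizontal FOV in degrees
wide_cam = (1920, 1080, 60.0)
narrow_dets = [{"label": "dog", "uv": (960, 540)}, {"label": "cat", "uv": (0, 540)}]
wide_dets = [{"label": "cat", "uv": (887, 540)}, {"label": "dog", "uv": (961, 541)}]
print(associate(narrow_dets, wide_dets, narrow_cam, wide_cam))  # {0: 1, 1: 0}
```

The left edge of the 5 degree image lands only about 73 pixels left of centre in the 60 degree image, which shows how small the narrow view is within the wide one.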
[0028] According to other embodiments, a high resolution camera,
e.g., a camera that can capture more than 50 megapixels (MP), may
be used instead of two views of a scene with differing fields of
view. According to this embodiment, the camera enables a user to
capture the wide field of view with a very high resolution, for
example, 50 MP or more, and as the user zooms into a particular
area of the wide field of view of the scene, there is enough pixel
data to represent the narrow field of view of the scene also. The
object recognition module 102 uses the wide field of view image to
detect objects of interest, and the localization module 108 may use
a laser rangefinder to calculate precise geographic coordinates of
the objects of interest. The tracking module 104 enables a user of
the apparatus 100 to track the objects of interest as the zoom on
the camera is changed.
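With a single very-high-resolution sensor, zooming amounts to cropping. The Python sketch below, assuming an undistorted pinhole image, computes the crop rectangle that emulates a narrower horizontal field of view and keeps it centred on an object of interest where possible. The function name `zoom_crop` and the 60 degree image field of view are assumptions for illustration.

```python
import math

def zoom_crop(img_w: int, img_h: int, wide_hfov_deg: float, target_hfov_deg: float,
              center: tuple[float, float]) -> tuple[int, int, int, int]:
    """Return (x0, y0, w, h) of a crop that emulates a narrower field of view.

    Assumes an undistorted pinhole image. The crop keeps the image aspect ratio
    and is shifted, if needed, so that it stays inside the frame.
    """
    if not 0 < target_hfov_deg <= wide_hfov_deg:
        raise ValueError("target FOV must be positive and no wider than the image FOV")
    ratio = (math.tan(math.radians(target_hfov_deg) / 2)
             / math.tan(math.radians(wide_hfov_deg) / 2))
    w = max(1, round(img_w * ratio))
    h = max(1, round(img_h * ratio))
    x0 = min(max(round(center[0] - w / 2), 0), img_w - w)
    y0 = min(max(round(center[1] - h / 2), 0), img_h - h)
    return x0, y0, w, h

# A roughly 50 MP frame (8660 x 5774 pixels) with a 60 degree horizontal FOV.
x0, y0, w, h = zoom_crop(8660, 5774, 60.0, 5.0, center=(4330, 2887))
print(w, h)  # 655 437
```

Emulating a 5 degree view from such a frame leaves a crop of about 655 x 437 pixels, which shows how quickly the available pixel count shrinks as the emulated field of view narrows.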
[0029] According to exemplary embodiments, the sensors 105.sub.1 to
105.sub.n may comprise navigation sensors such as a GNSS receiver,
an IMU unit, a magnetometer, pressure sensors, a laser range-finder
and the like. The localization module 108 localizes the apparatus
100 using the sensors 105.sub.1 to 105.sub.n and the narrow/wide
field of view images captured by sensors 103.sub.1 to 103.sub.n. A
three-dimensional location and orientation is established for the
apparatus 100 with more accuracy than with the positional sensors
105.sub.1 to 105.sub.n alone. Refer to commonly assigned,
co-pending U.S. patent application Ser. No. 13/217,014 filed on
Aug. 24.sup.th 2011, and commonly assigned U.S. Pat. No. 8,174,568
filed on Dec. 3.sup.rd, 2007, herein incorporated by reference in
their entireties, for more detail regarding calculating 30
coordinates based on an image and navigational sensors.
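The source defers the details of computing 3D coordinates to the incorporated references. As a rough, self-contained illustration of how a laser range can be combined with the position and pointing direction of the apparatus to geolocate a target, the Python sketch below converts azimuth, elevation and range into a local east/north/up offset and then into latitude and longitude using a spherical, locally flat Earth. It assumes the azimuth is already referenced to true north (a magnetometer heading would first need a declination correction); the function name `geolocate_target` and the Earth-radius constant are illustrative choices, and the approximation is only reasonable for short ranges.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)

def geolocate_target(lat_deg, lon_deg, alt_m, azimuth_deg, elevation_deg, range_m):
    """Estimate a target's latitude, longitude and altitude from one range shot.

    lat_deg, lon_deg, alt_m: apparatus position (e.g. from the GNSS receiver).
    azimuth_deg: pointing direction, clockwise from true north.
    elevation_deg: pointing angle above the horizontal.
    range_m: laser-rangefinder distance to the target.
    Uses a locally flat Earth, so it is only reasonable for short ranges.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    horizontal = range_m * math.cos(el)
    east = horizontal * math.sin(az)
    north = horizontal * math.cos(az)
    up = range_m * math.sin(el)
    lat = lat_deg + math.degrees(north / EARTH_RADIUS_M)
    lon = lon_deg + math.degrees(east / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg))))
    return lat, lon, alt_m + up

# Hypothetical example: target 1 km due north of the apparatus, level line of sight.
print(geolocate_target(40.35, -74.66, 30.0, 0.0, 0.0, 1000.0))
```

In this example the latitude increases by about 0.009 degrees (1 km is roughly 1/111 of a degree of latitude) while longitude and altitude stay unchanged.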
[0030] Once the location of the apparatus 100 is calculated, the
object recognition module 102 may recognize objects within the wide
field of view and narrow field of view of the scene 153. A user of
the apparatus 100 may select objects of interest by manually
inputting selections in the field of view using a laser rangefinder
or the like. In other instances, the object recognition module 102
scans both fields of view and detects objects automatically, based
on training data stored in the database 112. While a user pans the
optical sensors 103.sub.1 to 103.sub.n, the object recognition module
102 detects objects of interest using, for example, facial
recognition, movement recognition, bird recognition, or the like.
According to exemplary embodiments, the localization module 108
further receives object recognition information from the object
recognition module 102 to aid in localization. For example, if the
object recognition module 102 recognizes an oak tree at a distance d1
from a park bench at a known location in scene 153 in a wide angle
image, and at a distance d2 from the apparatus 100, the localization
module 108 uses the location of the park bench to determine the
accuracy of the estimated distance d2 of the oak tree from the
apparatus 100.
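The oak-tree example can be read as a simple consistency test. In the Python sketch below (an interpretation, not the method of the source), the tree position implied by the apparatus's own estimate, obtained from a bearing and the estimated range d2, is compared with the known tree-to-bench distance d1; a large residual flags an inaccurate range or location estimate. Positions are assumed to be in a hypothetical local east/north frame in metres, and the function name is illustrative.

```python
import math

def range_consistency(apparatus_en, bench_en, tree_bearing_deg, d2_est, d1_known):
    """Residual (m) between the known tree-to-bench distance d1 and the distance
    implied by the apparatus's own estimate of the tree (bearing plus range d2).

    Positions are (east, north) in metres in a local frame; bearing is clockwise
    from north. A small residual suggests the estimate of d2 is consistent.
    """
    b = math.radians(tree_bearing_deg)
    tree = (apparatus_en[0] + d2_est * math.sin(b),
            apparatus_en[1] + d2_est * math.cos(b))
    implied_d1 = math.hypot(tree[0] - bench_en[0], tree[1] - bench_en[1])
    return abs(implied_d1 - d1_known)

# Bench 100 m north of the apparatus; the map says the oak is 30 m beyond it.
print(range_consistency((0.0, 0.0), (0.0, 100.0), 0.0, 140.0, 30.0))  # 10.0
```

A single residual cannot reveal every error (for example, an error that moves the estimated tree along a circle around the bench), so it is best treated as one cue among several.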
[0031] The localization module 108 also locates objects of interest
in the user's fields of view and generates guides to locate objects
of interest. For example, the apparatus 100 may automatically
detect different types of birds in a scene 153 and automatically
guide a user towards the birds, as well as augment a user display
with information regarding the birds. The user may mark a target
bird, stationary or moving, while in a "zoomed out" field of view,
and be guided, through registration of the images, on how to move
the camera while zooming in without losing the target object, so
that the target remains in view.
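Guidance of this kind, and the direction indicators recited in claims 31 and 32, can be sketched as a comparison between the current pointing direction and the bearing to the marked target. The Python sketch below assumes both are available as azimuth and elevation angles in degrees (for example from the IMU, the magnetometer and the target's geolocation); the function names and the wording of the hints are hypothetical. It treats azimuth and elevation offsets independently, which is only approximate at steep elevations.

```python
def wrap_deg(a: float) -> float:
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (a + 180.0) % 360.0 - 180.0

def guidance(view_az, view_el, target_az, target_el, hfov_deg, vfov_deg):
    """Return a short hint telling the user how to move the view toward a target.

    Angles are in degrees, azimuth clockwise from north. Returns "in view" when
    the target already lies inside the current field of view.
    """
    d_az = wrap_deg(target_az - view_az)
    d_el = target_el - view_el
    if abs(d_az) <= hfov_deg / 2 and abs(d_el) <= vfov_deg / 2:
        return "in view"
    parts = []
    if abs(d_az) > hfov_deg / 2:
        parts.append(f"pan {'right' if d_az > 0 else 'left'} {abs(d_az):.1f} deg")
    if abs(d_el) > vfov_deg / 2:
        parts.append(f"tilt {'up' if d_el > 0 else 'down'} {abs(d_el):.1f} deg")
    return ", ".join(parts)

# Narrow 5 x 3 degree view pointing at azimuth 350; target at azimuth 10.
print(guidance(350.0, 0.0, 10.0, 1.0, hfov_deg=5.0, vfov_deg=3.0))  # pan right 20.0 deg
```

Wrapping the azimuth difference is what makes the hint take the short way round across north (20 degrees right rather than 340 degrees left).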
[0032] The reasoning module 110 uses the localization knowledge and
the object recognition knowledge from the database 112 to generate
relevant content to be presented to the user. A world model
consisting of geo-spatial information such as digital terrain,
geographical tags and 3D models can be used in the reasoning module
110. Similarly, dynamic geographic data can be transmitted to the
reasoning module 110 from an external source to augment the
reasoning generated by the reasoning module 110. Based on
geo-location and context, overlay content can be customized in the
reasoning module 110. The overlay can also aid in the navigation of
a user of the apparatus 100.
[0033] The overlay content is then projected using the augmented
reality module 106 in the accurate coordinate system of the narrow
field of view optical sensors as derived by the localization module
108. Additionally, the world model may be used by the reasoning
module 110 to determine occlusion information during the rendering
process. According to one embodiment, the apparatus 100 is a video
see-through system, i.e., a system with an optical sensor as well
as an optical viewfinder, where content is rendered on the optical
viewfinder as the user of apparatus 100 views a scene through the
optical viewfinder. In this embodiment, the content is fused with
the captured images from the sensors 103.sub.1 to 103.sub.n for
display on the viewfinder, allowing the user to view the augmented
content overlaid on the scene 153.
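One simple way a world model can supply occlusion information is a line-of-sight test. The Python sketch below assumes a terrain height profile has already been sampled at a fixed spacing along the ray from the viewer to a geolocated label anchor (for example from a digital terrain model); it ignores Earth curvature and atmospheric refraction, and the function name and inputs are hypothetical. A renderer could hide or dim the overlay when the test reports occlusion.

```python
def is_occluded(profile, spacing_m, viewer_h, target_h):
    """Line-of-sight test against a 1-D terrain profile.

    profile[i] is the terrain height (m) at distance i * spacing_m from the
    viewer along the viewing ray; the target sits at the last sample. viewer_h
    and target_h are the absolute heights of the eye and of the labelled point.
    Returns True if any intermediate terrain sample rises above the sight line.
    """
    total = (len(profile) - 1) * spacing_m
    if total <= 0:
        return False
    for i in range(1, len(profile) - 1):
        x = i * spacing_m
        sight_h = viewer_h + (target_h - viewer_h) * x / total
        if profile[i] > sight_h:
            return True
    return False

hill = [100.0, 102.0, 130.0, 101.0, 100.0]  # terrain heights every 50 m
print(is_occluded(hill, 50.0, viewer_h=102.0, target_h=105.0))  # True
```

The same test could support inserting objects that the geographic data indicates should be occluded, as recited in claims 36 and 49.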
[0034] Overlaid content may include dynamic information such as
weather information, social networking information, flight
information, traffic information, star-map information, and other
external geographically located information fed into the system.
Another example of geographically located information is
residential and commercial real-estate sales information,
tourist attractions, and the like, overlaid on the view provided by
the apparatus 100.
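Selecting which external, geographically located items to overlay can be sketched as a range and field-of-view filter. The Python sketch below assumes each item is a (name, latitude, longitude) tuple and uses an equirectangular approximation around the apparatus, adequate only over short distances; the function name, the item format and the Earth-radius constant are illustrative assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)

def items_in_view(items, lat, lon, view_az_deg, hfov_deg, max_range_m):
    """Select geolocated overlay items (name, lat, lon) inside the current view.

    Uses an equirectangular approximation around the apparatus position, which
    is adequate for short ranges. Returns (name, distance_m, bearing_deg)
    tuples sorted nearest first.
    """
    selected = []
    for name, item_lat, item_lon in items:
        north = math.radians(item_lat - lat) * EARTH_RADIUS_M
        east = math.radians(item_lon - lon) * EARTH_RADIUS_M * math.cos(math.radians(lat))
        dist = math.hypot(east, north)
        bearing = math.degrees(math.atan2(east, north)) % 360.0
        off_axis = (bearing - view_az_deg + 180.0) % 360.0 - 180.0
        if dist <= max_range_m and abs(off_axis) <= hfov_deg / 2:
            selected.append((name, dist, bearing))
    return sorted(selected, key=lambda s: s[1])

listings = [("A", 40.01, -74.0), ("B", 40.0, -73.99), ("C", 40.1, -74.0), ("D", 40.005, -74.0)]
print([s[0] for s in items_in_view(listings, 40.0, -74.0, 0.0, 60.0, 5000.0)])  # ['D', 'A']
```

Item B is rejected because it lies due east, outside the 60 degree view, and item C because it is about 11 km away, beyond the 5 km limit.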
[0035] According to one embodiment, a visually impaired person
wearing a single head- or body-mounted camera may be aided by the
apparatus 100. As the visually impaired individual pans his or her
head, an implicit image mosaic or landmark images are generated by
the AR tracking module 101 and stored in the database 112. An object
of interest can be designated in a certain view by image processing
in the object recognition module 102 or manually by the user. The
visually impaired user can then move around and be guided
back to the designated view-point or location based on the
geolocation of the apparatus 100, as well as through alignment of
images from the camera with the landmark and mosaic images stored in
the database 112.
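The geolocation part of this guidance can be sketched with the standard haversine distance and initial-bearing formulas on a spherical Earth. The Python sketch below computes how far, and in which direction relative to the user's current heading, the designated view-point lies; it does not cover the image alignment against stored landmark and mosaic images. The function names and the mean-Earth-radius constant are illustrative assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)

def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance (m) and initial bearing (deg from north), point 1 to 2."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    dist = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    return dist, math.degrees(math.atan2(y, x)) % 360.0

def return_cue(current, designated, heading_deg):
    """Distance (m) and signed turn (deg, positive = turn right) to a stored viewpoint.

    current and designated are (lat, lon) tuples; heading_deg is the user's
    facing direction, clockwise from north.
    """
    dist, bearing = distance_and_bearing(*current, *designated)
    turn = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    return dist, turn

dist, turn = return_cue((0.0, 0.0), (0.0, 1.0), heading_deg=45.0)
print(f"{dist:.0f} m, turn {turn:+.0f} deg")  # 111195 m, turn +45 deg
```

For a user walking around a park the distances would be tens of metres rather than a degree of longitude; the large example is used only because its result is easy to check by hand.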
[0036] According to other embodiments, the database 112 may store
images or videos of people of interest, such as celebrities,
athletes, newsworthy figures, or the like. The object recognition
module 102 may then perform object recognition while a user is
scanning a scene and match against the people of interest stored in
database 112 to identify people of interest in the scene.
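The paragraph leaves the matching method open. As one possible sketch, assuming some face or appearance descriptor has already been extracted for each detected person and for each stored person of interest (the extractor is not shown and is an assumption), identification can be done by nearest-neighbour search with cosine similarity and a rejection threshold. The 0.8 threshold, the gallery format and the function names are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def identify(query, gallery, threshold=0.8):
    """Return the best-matching gallery name for a query descriptor, or None.

    gallery maps name -> descriptor vector. Matches scoring below threshold
    are rejected so that unknown people are not forced onto a stored identity.
    """
    best_name, best_score = None, threshold
    for name, desc in gallery.items():
        score = cosine(query, desc)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

gallery = {"person_a": [1.0, 0.0, 0.0], "person_b": [0.0, 1.0, 0.0]}
print(identify([0.9, 0.1, 0.0], gallery))  # person_a
print(identify([0.5, 0.5, 0.7], gallery))  # None
```

The rejection threshold matters because most people in a scanned scene will not be in the stored set.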
[0037] FIG. 2 depicts an embodiment of the apparatus 100 as a
binocular unit 200 in accordance with exemplary embodiments of the
present invention. The binocular unit 200 comprises a wide field of
view optic 202 and a narrow field of view optic 204, i.e., a wide
field of view lens and a narrow field of view lens, an eyepiece
201, a GNSS 210, a magnetometer 214, an IMU 212 and a laser
rangefinder 213. According to some embodiments, the laser
rangefinder 213 may be replaced with an ultrasonic rangefinder, or
range may be estimated by ray tracing to a 3D model of the scene
given the user's location and view-point.
[0038] According to some embodiments, the computer system 250 may
be housed internally in the binocular unit 200. In other
embodiments, the computer system 250 is remote from the binocular
unit 200 and the binocular unit 200 transmits and receives data
from the computer system 250 through wired transmission lines, or
wirelessly.
[0039] The wide FOV optic 202 is coupled to the camera 206. The
narrow FOV optic 204 is coupled to the camera 208. The camera 206 and
camera 208 are coupled to a computer system 250. The wide FOV optic
202 and the narrow FOV optic 204 are both coupled to the eyepiece
201. The eyepiece 201 comprises, according to one embodiment, a
first lens 201A and a second lens 201B. In other embodiments, the
eyepiece 201 may comprise only a single viewing lens, or more than
two viewing lenses.
[0040] In some embodiments, a user of the binocular unit 200 is
able to switch between viewing the wide FOV optic view and the
narrow FOV optic view in the eyepiece 201. The laser rangefinder
213 emits a laser beam in a direction aligned with the
orientation of the binocular unit 200. The laser rangefinder 213
establishes distances to objects within the field of view of the
binocular unit 200. In addition, the laser rangefinder may be used
to map the terrain into three-dimensional space and store the 3D
terrain map in the database 272.
[0041] The computer system 250 includes a processor 252, various
support circuits 256, and memory 254. The processor 252 may include
one or more microprocessors known in the art. The support circuits
256 for the processor 252 include conventional cache, power
supplies, clock circuits, data registers, I/O interface 257, and
the like. The I/O interface 257 may be directly coupled to the
memory 254 or coupled through the supporting circuits 256. The I/O
interface 257 may also be configured for communication with input
devices and/or output devices such as network devices, various
storage devices, mouse, keyboard, display, video and audio sensors
and the like.
[0042] The memory 254, or computer readable medium, stores
non-transient processor-executable instructions and/or data that
may be executed by and/or used by the processor 252. These
processor-executable instructions may comprise firmware, software,
and the like, or some combination thereof. Modules having
processor-executable instructions that are stored in the memory 254
comprise an augmented reality (AR) tracking module 258. The AR
tracking module 258 further comprises an object recognition module
260, a localization module 262, a tracking module 264, an augmented
reality module 266, a reasoning module 268 and a communication
module 270.
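One way to picture the composition of the AR tracking module 258 is as an ordered pipeline of submodule callables that share per-frame state. The sketch below is hypothetical: the class name, the dictionary-based frame state and the stage order are assumptions made for illustration, not a structure disclosed in this application.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, ClassVar, Dict, List, Tuple

Frame = Dict[str, Any]
Stage = Callable[[Frame], Frame]


@dataclass
class ARTrackingModule:
    """Hypothetical pipeline view of AR tracking module 258.

    Each submodule is modelled as a callable that receives the per-frame
    state dict and returns an updated one.
    """
    object_recognition: Stage   # module 260
    localization: Stage         # module 262
    tracking: Stage             # module 264
    augmented_reality: Stage    # module 266
    reasoning: Stage            # module 268
    communication: Stage        # module 270
    trace: List[str] = field(default_factory=list)

    # Assumed execution order; the application does not fix one.
    STAGE_ORDER: ClassVar[Tuple[str, ...]] = (
        "object_recognition", "localization", "tracking",
        "reasoning", "augmented_reality", "communication",
    )

    def process(self, frame: Frame) -> Frame:
        """Run every stage in STAGE_ORDER and record which ran."""
        for name in self.STAGE_ORDER:
            frame = getattr(self, name)(frame)
            self.trace.append(name)
        return frame
```

Modelling each stage as a plain callable keeps the submodules independently replaceable, which loosely matches the plug-in idea described for the apparatus 100.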
[0043] The computer system 250 may be programmed with one or more
operating systems (generally referred to as operating system (OS)),
which may include OS/2, Java Virtual Machine, Linux, SOLARIS, UNIX,
HPUX, AIX, WINDOWS, WINDOWS95, WINDOWS98, WINDOWS NT,
WINDOWS2000, WINDOWS ME, WINDOWS XP, WINDOWS SERVER, WINDOWS 8, Mac
OS X, IOS, and ANDROID, among other known platforms. At least a portion
of the operating system may be disposed in the memory 254.
[0044] The memory 254 may include one or more of the following:
random access memory, read only memory, magneto-resistive
read/write memory, optical read/write memory, cache memory,
magnetic read/write memory, and the like, as well as signal-bearing
media as described below.
[0045] The camera 206 and the camera 208 are coupled to the support
circuits 256 and the I/O interface 257 of the computer system 250.
The I/O interface 257 is further coupled to an overlay 214 of the
eyepiece 201. The object recognition module 260 recognizes objects
within the wide FOV optic 202 and the narrow FOV optic 204 as the
images are received by the cameras 206 and 208, i.e., in real-time
as the user of the binocular unit 200 scans a viewing area.
According to one embodiment, the user can scan a larger area than
that visible in the wide FOV optic 202 and have the object
recognition module 260 scan for objects in the larger area.
Additionally, the user can identify, using the laser rangefinder
213, or other visual means, an object of interest to track or
recognize, and the object recognition module 260 will perform
object recognition on that particular object, or the user
designated area.
[0046] The objects recognized by the object recognition module 260
are localized using the localization module 262 and tracked by the
tracking module 264. The tracking module 264 may also notify the
augmented reality module 266 that a visual indication should be
rendered to indicate the location of a particular recognized
object, even if the object moves out of view of the binocular unit
200. The augmented reality module 266 then creates an overlay for
the narrow field of view based on the recognized object, content
requested by the user, markers indicating the location of recognized
objects, and other augmented reality content known to
those of ordinary skill in the art. The content is then coupled
from the augmented reality module 266 to the binocular unit 200
and, in particular, to the eyepiece 201 as overlay 214. According
to this embodiment, the overlay 214 overlays the real-time view
through the lenses 201A and 201B, i.e., as the user surveys a scene
through the lenses 201A and 201B, the overlay 214 is also visible
through the lenses 201A and 201B. The augmented reality module 266
generates the overlay 214 in geometrical and geographical
correspondence with the terrain visible to the user through the
binocular unit 200.
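Generating the overlay in geometric correspondence with the terrain amounts to projecting each geolocated point into the current view. The following is a sketch that assumes an ideal pinhole camera, positions in a local east/north/up (ENU) frame in metres, a heading measured clockwise from north, a pitch taken from the IMU, and zero roll. The function name and parameters are illustrative only.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def project_enu_point(point: Vec3, cam_pos: Vec3, heading_deg: float,
                      pitch_deg: float, width: int, height: int,
                      hfov_deg: float) -> Optional[Tuple[float, float]]:
    """Pixel (u, v) of an ENU point in a pinhole view, or None if behind."""
    psi, th = math.radians(heading_deg), math.radians(pitch_deg)
    # Camera axes expressed in ENU (roll assumed to be zero).
    fwd = (math.sin(psi) * math.cos(th), math.cos(psi) * math.cos(th), math.sin(th))
    right = (math.cos(psi), -math.sin(psi), 0.0)
    up = (-math.sin(psi) * math.sin(th), -math.cos(psi) * math.sin(th), math.cos(th))

    def dot(a: Vec3, b: Vec3) -> float:
        return sum(x * y for x, y in zip(a, b))

    d = (point[0] - cam_pos[0], point[1] - cam_pos[1], point[2] - cam_pos[2])
    z_c = dot(d, fwd)
    if z_c <= 0.0:
        return None  # point is behind the viewer
    f_px = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    u = width / 2.0 + f_px * dot(d, right) / z_c
    v = height / 2.0 - f_px * dot(d, up) / z_c  # image v grows downward
    return u, v
```

Re-running this projection every frame with the latest GNSS and IMU pose keeps a marker attached to its geographic location as the user pans.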
[0047] For example, a binocular unit 200 may observe, with the wide
FOV optic 202, a scene depicted by the image 300 shown in FIG. 3
and with the narrow FOV optic 204 observe a scene depicted by the
image 400 shown in FIG. 4. In image 300, a user has marked a
landmark as a landmark of interest after the object recognition
module 260 has recognized the objects in the image 300 and the
localization module 262 localized the landmark of interest. When
the user zooms or alternates views to the narrow field of view from
the eyepiece 201, the augmented reality module 266 places the
marker 302 appropriately to indicate the landmark of interest
according to the modified view, because the localization module 262
has precisely located the landmark in geographic space. According
to other embodiments, the object may be tracked so that when a user
of the unit 200 moves, the tracked objects of interest remain
centered in the narrow field of view.
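When the wide and narrow FOV cameras are boresighted (sharing an optical axis) and produce images of the same width, a marker placed in the wide view can be carried into the narrow view by rescaling its offset from the image centre by the ratio of the focal lengths. This is a simplified sketch under exactly those assumptions; it ignores lens distortion and any offset between the two optics, and the function name is hypothetical.

```python
import math
from typing import Tuple


def wide_to_narrow(px: float, py: float, wide_hfov_deg: float,
                   narrow_hfov_deg: float, width: int,
                   height: int) -> Tuple[float, float, bool]:
    """Map a wide-FOV pixel to the boresighted narrow-FOV image.

    Returns (x, y, visible), where visible is False if the point
    falls outside the narrow frame.
    """
    cx, cy = width / 2.0, height / 2.0
    # Focal lengths in pixels from the horizontal fields of view.
    f_wide = cx / math.tan(math.radians(wide_hfov_deg) / 2.0)
    f_narrow = cx / math.tan(math.radians(narrow_hfov_deg) / 2.0)
    scale = f_narrow / f_wide
    nx = cx + (px - cx) * scale
    ny = cy + (py - cy) * scale
    visible = 0.0 <= nx < width and 0.0 <= ny < height
    return nx, ny, visible
```

With a 40 degree wide view and a 10 degree narrow view, offsets grow by roughly a factor of 4.16, so a marker 80 pixels right of centre in a 640-pixel wide image leaves the narrow frame. In that case the AR module could show a direction indicator instead.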
[0048] According to another embodiment, the augmented reality
module 266 can overlay the view from a binocular unit 200 to create
a simulation. For example, FIG. 5 depicts a wide FOV image 500,
where a user of a binocular unit 200 indicates that a helicopter
502 should be placed at a particular location. The augmented
reality module 266 places the helicopter at the designated location
and overlays it in real time on overlay 214, where the
position of the helicopter in the user's view changes as the user
adjusts the binocular unit 200. The AR module 266 scales, rotates
and translates the helicopter simulation based on the geographic
localization data of the binocular unit 200 and the rotational
movement of the IMU 212. FIG. 6 depicts another example of
inserting simulated objects into a scene for aiding in simulated
operations. Image 600 depicts a narrow field of view scene with a
tank object 602 inserted into the scene. According to exemplary
embodiments, the tank 602 may be animated by the augmented reality
module 266, with a size, location and orientation corresponding to
the other objects recognized by the object recognition module 260
and the orientation of the device according to the localization
module 262 to create a mixed reality and simulated environment to
aid in user training, such as procedural training, strategic
training or the like.
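Scaling, rotating and translating a simulated object such as the helicopter 502 or the tank 602 can be sketched as a rigid placement of model vertices in a local ENU frame, plus a pinhole estimate of on-screen size. The model-frame convention (x right, y forward, z up), the clockwise-from-north heading and both function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def place_model(vertices: Sequence[Vec3], anchor_enu: Vec3,
                heading_deg: float, scale: float = 1.0) -> List[Vec3]:
    """Rotate model vertices about the vertical axis, scale, then translate.

    Model frame: x right, y forward, z up. A heading of 0 points the model's
    forward axis north; 90 points it east (clockwise from north).
    """
    psi = math.radians(heading_deg)
    c, s = math.cos(psi), math.sin(psi)
    placed = []
    for x, y, z in vertices:
        east = scale * (x * c + y * s)
        north = scale * (-x * s + y * c)
        placed.append((anchor_enu[0] + east, anchor_enu[1] + north,
                       anchor_enu[2] + scale * z))
    return placed


def apparent_size_px(real_size_m: float, distance_m: float,
                     focal_px: float) -> float:
    """Pinhole approximation of on-screen size for a viewer-facing object."""
    return focal_px * real_size_m / distance_m
```

The placed vertices would then be projected into the view with the current unit pose, so the simulated object shrinks, grows and turns as the binocular unit moves.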
[0049] Additionally, the reasoning module 268 may occlude portions of
the tank as it is animated, so that the tank conforms to the terrain
depicted in image 600.
According to another embodiment, a user of the binocular unit 200
may scan a scene by spatially moving the binocular unit 200 to
geographically locate the entire scanned contents of the scene. The
reasoning module 268 can then generate a three-dimensional terrain
based on the geographically located objects, terrain, and the like.
The wide field of view and narrow field of view images can be used
to construct a 3D map of an observed area by imaging the area in
multiple images and then accurately estimating the 3D structure by a
motion stereo process using the geographic location provided by the
various sensors 210-214, as described in U.S. patent application
Ser. No. 13/493,654 filed on Jun. 11, 2012, herein
incorporated by reference in its entirety.
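For intuition about the motion stereo step, the standard rectified two-view relation gives depth from disparity once the baseline between two capture positions is known, for example from the GNSS 210. This sketch assumes rectified image pairs and does not reproduce the process of the incorporated application. It also shows how a disparity error propagates to depth error, which is why a longer baseline between captures improves the 3D map.

```python
def stereo_depth(focal_px: float, baseline_m: float,
                 disparity_px: float) -> float:
    """Depth Z = f * B / d for a rectified image pair."""
    if disparity_px <= 0.0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_std(focal_px: float, baseline_m: float, depth_m: float,
              disparity_std_px: float) -> float:
    """First-order depth uncertainty: sigma_Z = Z^2 / (f * B) * sigma_d."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_std_px
```

For example, with an 800-pixel focal length, a 2 m baseline and a 16-pixel disparity the depth is 100 m, and a half-pixel disparity error corresponds to about 3.1 m of depth uncertainty.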
[0050] FIG. 7A depicts a plurality of binocular units viewing
portions of a scene 700 in accordance with exemplary embodiments of
the present invention. In accordance with exemplary embodiments,
FIG. 7A depicts a guidance system based on image processing and
alignment of wide and narrow imagery. In this embodiment, the
binocular units (BUs) 702, 704 and 706 each comprise an augmented
reality tracking module as depicted in FIG. 2 and communicate with
each other, by wire or wirelessly, through the network 701. In other
embodiments, each binocular unit is coupled, by wire or wirelessly,
to a remote computer system such as computer system 250 through the
network 701.
[0051] The scene 700 comprises one or more objects, the one or more
objects comprising a first person 708, a vehicle 710 and a second
person 712. The scene 700 is being observed by three users: a first
user operating BU 702, a second user operating BU 704 and a third
user operating BU 706, each oriented in a different direction.
Though not shown in FIG. 7A, each of the BUs 702, 704 and 706 may be
oriented in three-dimensional space, as opposed to the 2D plane
depicted. For example, BU 704 may be positioned at a height above
BU 706, and BU 704 may be looking at an area in scene 700 lower in
height than the area in scene 700 being viewed by BU 706.
[0052] The BU 706 has person 708 and vehicle 710 in view. The
operator of BU 706 identifies person 708 and vehicle 710 as objects
of interest using a laser rangefinder, or in some other manner well
known in the art, such as delimiting a particular area wherein the
person 708 and the vehicle 710 are located and directing the unit
706 to perform object recognition.
[0053] The BUs are all coupled to the network 701. According to some
embodiments, the BUs 702, 704 and 706 create an ad-hoc network and
communicate directly with each other.
[0054] The scene 700 is an example view of a field containing
several objects: a person 708, a vehicle 710, a second person 712
and several trees and bushes 732-736. According to one embodiment
of the present invention, the user of binocular unit 706 may mark
the locations of the person 708 and the vehicle 710 using a laser
rangefinder or other means. A marker 709 identifies the
location of the first person 708 and the marker 711 identifies the
location of the vehicle 710. The markers and locations are shared
across the network 701 with the other BUs 702 and 704. Similarly,
the BU 704 marks the location of the second person 712 with a
marker 713.
[0055] The first person 708 and the vehicle 710 are in the view 707
of BU 706; however, they are out of the view of BU 704. If the
operator of BU 706 directs the attention of the user of the BU 704
towards the vehicle 710 and the first person 708, the user will
have indications in their view 705 showing where to turn their BU
704 to view the person 708 and the vehicle 710. For example,
indicator 714 is shown as pointing in the direction that the BU
unit 704 must turn in order to see person 708. Indicator 716
indicates the direction the BU 704 must turn to view vehicle 710.
According to this embodiment, since the localization module of BU
706 has localized vehicle 710 and first person 708, their
locations are available to all units on the network 701. The AR
module of each of the BUs generates the indicators 716 and
714.
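Indicators such as 714 and 716 can be derived from a shared object location and the viewing unit's own pose. The following is a minimal sketch, assuming flat local east/north coordinates in metres, a heading measured clockwise from north (for example, from the magnetometer 214 and IMU 212), and a simple left/right decision based on a horizontal field of view. The function name and return values are hypothetical.

```python
import math
from typing import Tuple


def turn_indicator(unit_en: Tuple[float, float], unit_heading_deg: float,
                   target_en: Tuple[float, float],
                   hfov_deg: float) -> Tuple[str, float]:
    """Return ("in view" | "left" | "right", signed relative bearing in degrees)."""
    d_e = target_en[0] - unit_en[0]
    d_n = target_en[1] - unit_en[1]
    # Bearing clockwise from north, normalised to [0, 360).
    bearing = math.degrees(math.atan2(d_e, d_n)) % 360.0
    # Signed angle from the unit's heading to the target, in [-180, 180).
    relative = (bearing - unit_heading_deg + 180.0) % 360.0 - 180.0
    if abs(relative) <= hfov_deg / 2.0:
        return "in view", relative
    return ("right" if relative > 0.0 else "left"), relative
```

The signed relative bearing could also set the arrow's angle or length, not just its side.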
[0056] Similarly, the AR module of the BU 706 generates an
indicator 718 pointing in the direction of the person 712 and the
AR module of the BU 702 generates a marker 720 pointing in the
direction of the location of person 712.
[0057] The BU 702 also has first person 708 in its view 703. The
user of BU 702 may also mark the location of first person 708 using
the laser rangefinder. Since both BU 702 and BU 706 have
marked the location of person 708, the localization module may
localize person 708 more accurately given two perspectives and two
distances from the BUs. In addition, when a user of any of the BUs
pans their own view, they do not lose sight of the objects in the
scene 700 because of the indicators 714, 716, 718 and 720, as well
as the markers 709, 711 and 713.
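Combining the two range-based fixes on person 708 from BU 702 and BU 706 can be illustrated with inverse-variance weighting, a standard fusion technique swapped in here for illustration. The sketch assumes each unit reports a position in a shared local frame with an isotropic standard deviation. A full system might instead triangulate the two sightlines or run a tracking filter.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def fuse_estimates(estimates: List[Tuple[Vec3, float]]) -> Tuple[Vec3, float]:
    """Inverse-variance weighted mean of position fixes.

    estimates: list of ((east, north, up), sigma_m) pairs.
    Returns (fused position, fused variance in m^2).
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    w_sum = 0.0
    acc = [0.0, 0.0, 0.0]
    for pos, sigma in estimates:
        w = 1.0 / (sigma * sigma)  # more precise fixes get more weight
        w_sum += w
        for i in range(3):
            acc[i] += w * pos[i]
    fused = (acc[0] / w_sum, acc[1] / w_sum, acc[2] / w_sum)
    return fused, 1.0 / w_sum
```

Two equally precise fixes average to their midpoint with half the variance of either one, which is the sense in which a second perspective "adds accuracy".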
[0058] According to other embodiments, the AR modules of each BU
may augment the BU view of the scene based on the determined
location of the objects with many other types of data such as
distance, motion estimation, threat level, or the like. In other
instances, facial and object recognition may be applied in an urban
setting for recognizing faces and well-known structures to aid in
guidance towards a particular landmark. A user of the BU 702, or
even of a mobile phone with a camera, may designate and identify real
world objects in wide and narrow fields of view, while panning the
phone or keeping the device still. The object is then located, as
discussed above, both geographically in world space and in the
local view. The AR module 106 then provides audio and/or
visual guidance to another user on how to move their device to
locate the target object. In addition, users performing the object
tagging may attach messages to their tagged objects, or other tags
such as images, video, audio or the like.
[0059] Other users of mobile devices may then have objects of
interest highlighted on their displays, or have directions towards
the highlighted objects on their displays. In some embodiments, one
of the viewers of a scene may be viewing the scene aerially, while
other viewers of a scene are on the ground and either party may
tag/highlight objects of interest for the users to locate or
observe.
[0060] Those of ordinary skill in the art would recognize that the
binocular units 702, 704 and 706 may be replaced with, for example,
a head-mounted display/vision unit, a mobile camera or the like. A
head-mounted display or vision unit may be wearable by a user, who
can view the surrounding scene through the visor or glasses in the
head-mounted unit, and also view the AR enhancements in the visor
or glasses overlaid on top of the user's view of the scene.
[0061] For example, several users of head-mounted units 702, 704
and 706 may be engaged in a session of gaming (i.e., playing a
video game) where objects are rendered onto each user's field of
view with regard to each unit's frame of reference, view and angle.
In this embodiment, the scene can become a virtual gaming arena
where users can view and use real-world objects with simulated
objects to achieve particular goals, such as capturing a flag or
scoring a point in a sport. In these embodiments, as well as other
embodiments, virtual advertisements may be displayed on each user's
view in a geographically sensible position as in-game
advertisements.
[0062] FIG. 7B depicts an exemplary use of the apparatus 100 in
accordance with exemplary embodiments of the present invention. A
user 740 uses the apparatus 100, here shown as binocular unit 742,
to observe a scene 750 of a cityscape containing several landmarks
and objects of interest. For example, the view of unit 742 may have
an augmented display showing the building 752 tagged with a tag 756
indicating that building 752 is the "Trans America Building". The
building 754 may also be tagged as the "Ferry Building Market
Place". In addition, the user 740 may attach a geographical tag
(geo-tag) to the building 754 stating "Meet you at the Ferry
Building Place" for members of their party as a coordination point.
Accordingly, embodiments of the present invention may apply to
tourism. Users may augment the view from their viewing units with
landmarks and points of interest. Geo-tags may be left for friends.
In addition, markups can be left on GOOGLE Earth or other mapping
services. In exemplary embodiments, different detection algorithms
can be plugged in to the object detection of the binocular units,
or may be available in the cloud. The images captured by the
binocular units may be processed by a local plug-in module or may
be sent to a remote cloud for analysis. Alternatively, tracking data
from another system, such as an unmanned aerial vehicle (UAV), may be
sent to the binocular units for display. For moving objects, the user
740 may mark a moving object, while a second user using another
viewing unit may perform an object detection search to detect the
moving object and recognize the moving object.
[0063] According to some instances, when users create Geo-tags,
they may be stored in a local ad-hoc network created by similar
viewing units throughout a particular area, or units manually
synced together. Geo-tags and other messages may then be shared
directly from one binocular unit to another binocular unit. In other
embodiments, the geo-tags and other landmark/objects of interest
tags and markings are stored on external servers ("cloud servers,"
for example), and each unit may access the cloud to retrieve geo-tags
associated with it (i.e., tags within the user's group), or
global geo-tags, depending on the user's preference. In this
embodiment, when a second user's location is determined to be in the
vicinity of one or more geo-tags, the cloud server transmits the tags
within that user's area. Geo-tags, objects of interest and guided
directions towards objects of interest may then be downloaded from
the external cloud servers and used to overlay an AR display on an
AR viewing unit. In other embodiments, users can share geo-tags,
objects of interest, landmarks, and the like, with other users by
giving an input command, such as a speech command or at the press
of a menu item.
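Serving only the geo-tags near a user, and only those belonging to the user's group or marked as global, can be sketched with a great-circle distance filter. The `GeoTag` record, the group semantics (`None` meaning a global tag), the spherical Earth radius and the function names are assumptions. A production server would likely use a spatial index rather than the linear scan shown here.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

EARTH_RADIUS_M = 6371000.0  # mean spherical radius (assumption)


@dataclass
class GeoTag:
    lat: float
    lon: float
    text: str
    group: Optional[str] = None  # None marks a global tag


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def tags_near(tags: Iterable[GeoTag], lat: float, lon: float, radius_m: float,
              user_groups: Set[str], include_global: bool = True) -> List[GeoTag]:
    """Tags within radius_m of the user that the user is allowed to see."""
    out = []
    for t in tags:
        allowed = t.group in user_groups or (include_global and t.group is None)
        if allowed and haversine_m(lat, lon, t.lat, t.lon) <= radius_m:
            out.append(t)
    return out
```

The returned tags could then be handed to the AR module as markers or direction indicators.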
[0064] According to exemplary embodiments, the apparatus 100 may
have an application programming interface (API) which allows
developers to plug-in particular applications of object detection
such as in recreational sports, bird-watching, tourism, hunting,
hiking, law enforcement and the like.
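A plug-in API of this kind could be as simple as a registry that maps application names to detector callables. The decorator, registry and detection dictionary format below are hypothetical illustrations, not the apparatus's actual interface.

```python
from typing import Any, Callable, Dict, List

Detection = Dict[str, Any]
Detector = Callable[[Any], List[Detection]]

# Hypothetical registry mapping an application name to its detector.
_DETECTORS: Dict[str, Detector] = {}


def register_detector(name: str) -> Callable[[Detector], Detector]:
    """Decorator that registers a detector plug-in under `name`."""
    def decorator(fn: Detector) -> Detector:
        if name in _DETECTORS:
            raise ValueError(f"detector {name!r} already registered")
        _DETECTORS[name] = fn
        return fn
    return decorator


def run_detectors(image: Any, names: List[str]) -> List[Detection]:
    """Run the selected plug-ins and tag each detection with its source."""
    results: List[Detection] = []
    for name in names:
        for det in _DETECTORS[name](image):
            results.append({**det, "detector": name})
    return results


@register_detector("bird_watching")
def detect_birds(image: Any) -> List[Detection]:
    # Placeholder: a real plug-in would run a trained bird detector here.
    return [{"label": "bird", "bbox": (10, 20, 30, 40)}]
```

The same registry could hold entries that forward images to a cloud service, matching the option of running detection remotely.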
[0065] FIG. 8 depicts a method 800 for object detection in
accordance with exemplary embodiments of the present invention. The
method 800 depicts the functionality of the augmented reality
tracking module 258 of FIG. 2, as executed by the
processor 252. The method begins at step 802 and proceeds to step
804.
[0066] At step 804, one or more optical sensors capture one or more
images of a scene containing wide field of view and narrow field of
view images. In some embodiments, the images are captured and
processed in real-time, i.e., as a user views a scene in the narrow
field of view, a larger field of view image containing the scene is
also being captured by the optical sensors.
[0067] At step 806, the localization module 262 determines the
location of the optical sensors. In this embodiment, the user is
directly holding, or is otherwise co-located with, the
optical sensors, so that the location of the optical sensors is
essentially equivalent to the location of the user of BU 200, for
example. The object recognition module recognizes objects in the
images and, using the laser rangefinder 213 of the BU 200, the
distance between the optical sensor and the objects can be
determined. Since the location of the BU 200 has been determined,
and the distance to the objects has been determined, the location
of the objects can also be accurately determined. In addition, if
several binocular units observe the same objects, another degree of
accuracy is added to the determined location of the objects.
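The location computation in step 806 can be sketched as offsetting the unit's GNSS position by the laser range along the measured azimuth and elevation. This minimal sketch assumes a spherical Earth and a flat local tangent plane, which is reasonable only over rangefinder distances of a few kilometres, and assumes the azimuth is already referenced to true north. The function name is hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6371000.0  # mean spherical radius (assumption)


def locate_target(lat_deg: float, lon_deg: float, alt_m: float,
                  azimuth_deg: float, elevation_deg: float,
                  range_m: float) -> Tuple[float, float, float]:
    """Target (lat, lon, alt) from unit position plus range and pointing angles."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    horiz = range_m * math.cos(el)       # ground-plane component of the range
    d_north = horiz * math.cos(az)
    d_east = horiz * math.sin(az)
    d_up = range_m * math.sin(el)
    lat = lat_deg + math.degrees(d_north / EARTH_RADIUS_M)
    lon = lon_deg + math.degrees(
        d_east / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg))))
    return lat, lon, alt_m + d_up
```

When several units report fixes on the same object, each result could be weighted by its expected error before being combined.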
[0068] At step 808, the AR module 266 enhances the view through the
BU 200 based upon the determined locations of the objects. As
described above, the AR module 266 overlays content on the user's
view. The content may include markers of objects of interest,
indicators directing the user of BU 200 to objects of interest
outside of the current field of view, weather information, flight
map information, and other information.
[0069] At step 810, the tracking module 264 tracks the one or more
objects from the wide field of view to the narrow field of view as
a user switches between the two fields of view. For example, the
user of BU 200 may view the wide FOV scene and sight a bird of
interest. However, in order to see the bird in more detail, the
user of BU 200 switches to a narrow field of view.
[0070] Under ordinary circumstances, the user would have difficulty
finding the bird again in the narrow field of view. However, the
tracking module 264 tracks the objects from the wide FOV to the
narrow FOV, so the user will be presented with a narrow FOV already
containing the bird. According to another embodiment, when a user
switches from a wide FOV to a narrow FOV, the AR module 266
overlays an indication of the direction in which to rotate the BU
200 to locate the bird of interest.
[0071] At step 812, the BU 200 may use the communication module 270
to broadcast the determined location of the BU 200, in addition to
the locations of the objects of interest found, to other people in
the area with mobile devices, binocular units, mobile computers or the like. In
addition, the other people in the area may send the location of
objects of interest outside of the user's field of view to the
communication module 270. At step 814, accordingly, the AR module
266 updates the overlay on the enhanced view indicating the
direction in which to look to find those objects of interest
outside the user's field of view.
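The broadcast in step 812 needs some message format that carries the unit's location together with its objects of interest. One possibility, sketched here with JSON and Python dataclasses, is shown below. Every field name is a hypothetical choice, not a format defined by this application.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ObjectReport:
    object_id: str
    label: str
    lat: float
    lon: float
    alt: float


@dataclass
class LocationBroadcast:
    unit_id: str
    lat: float
    lon: float
    alt: float
    objects: List[ObjectReport] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the unit location and its object reports."""
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(text: str) -> "LocationBroadcast":
        """Rebuild a broadcast, including nested object reports."""
        data = json.loads(text)
        objects = [ObjectReport(**o) for o in data.pop("objects")]
        return LocationBroadcast(objects=objects, **data)
```

A receiving unit would decode the message and feed each reported object to its own AR module, for example to draw a direction indicator.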
[0072] Subsequently, the method 800 moves to step 816. At step 816,
the enhanced view is updated in real-time based on the determined
location of the one or more detected objects as the user relocates
the BU 200. At step 818, the view is further enhanced to
incorporate external sources of geographically located
information. For example, traffic, weather, or social media
information, or the like, may be overlaid on the BU 200 view.
According to some embodiments, based on the determined location of
the BU 200 and selected objects of interest, the AR module 266 may
overlay virtual advertisements relevant to the user location. The
method terminates at step 820.
[0073] Various elements, devices, modules and circuits are
described above in association with their respective functions.
These elements, devices, modules and circuits are considered means
for performing their respective functions as described herein.
While the foregoing is directed to embodiments of the present
invention, other and further embodiments of the invention may be
devised without departing from the basic scope thereof, and the
scope thereof is determined by the claims that follow.
* * * * *