U.S. patent application number 13/159010 was filed with the patent office on 2011-06-13 and published as 20120314899 on 2012-12-13 for natural user interfaces for mobile image viewing.
This patent application is currently assigned to MICROSOFT CORPORATION. Invention is credited to Michael F. Cohen and Neel Suresh Joshi.
Application Number: 13/159010
Publication Number: 20120314899
Family ID: 47293234
Filed: 2011-06-13
Published: 2012-12-13

United States Patent Application 20120314899
Kind Code: A1
Cohen; Michael F.; et al.
December 13, 2012
NATURAL USER INTERFACES FOR MOBILE IMAGE VIEWING
Abstract
The mobile image viewing technique described herein provides a
hands-free interface for viewing large imagery (e.g., 360°
panoramas, parallax image sequences, and long multi-perspective
panoramas) on mobile devices. The technique controls the imagery
displayed on a display of a mobile device by movement of the mobile
device. The technique uses sensors to track the mobile device's
orientation and position, and a front-facing camera to track the
user's viewing distance and viewing angle. The technique adjusts
the view of the imagery rendered on the mobile device's display
according to the tracked data. In one embodiment the technique can
employ a sensor fusion methodology that combines viewer tracking
using a front-facing camera with gyroscope data from the mobile
device to produce a robust signal that defines the viewer's 3D
position relative to the display.
Inventors: Cohen; Michael F.; (Seattle, WA); Joshi; Neel Suresh; (Seattle, WA)
Assignee: MICROSOFT CORPORATION, Redmond, WA
Family ID: 47293234
Appl. No.: 13/159010
Filed: June 13, 2011
Current U.S. Class: 382/103
Current CPC Class: G06F 3/012 20130101; G06F 3/04815 20130101; G06F 2200/1637 20130101
Class at Publication: 382/103
International Class: G06K 9/00 20060101 G06K009/00
Claims
1. A computer-implemented process for viewing large scale imagery
on a mobile device, comprising: tracking a mobile device's
orientation and position; using a camera and viewer tracker on the
mobile device to track a user's face looking at a screen on the
mobile device; computing a viewing angle and a viewing distance
between the user and the screen on the mobile device by using the
tracked orientation and position of the mobile device, and the
tracked position of the user's face relative to the screen of the
mobile device; and computing image transformations of the imagery
rendered on the screen of the mobile device using the computed
viewing angle and viewing distance to allow the user to control
viewing of the rendered imagery.
2. The computer-implemented process of claim 1, further comprising
the user changing the viewpoint of imagery rendered on the screen
by moving the mobile device relative to the user's face.
3. The computer-implemented process of claim 1, further comprising
zooming in or out of the imagery rendered on the screen by changing
the distance of the mobile device relative to the user's face.
4. The computer-implemented process of claim 3 wherein the distance
of the mobile device relative to the user's face is approximated
using face width.
5. The computer-implemented process of claim 1, further comprising
panning around the imagery rendered on the screen by changing the
position of the mobile device laterally in relation to the user's
face.
6. The computer-implemented process of claim 1, further comprising
mapping the angular offset of the user's face from the normal to
the screen of the mobile device and the change in rotation about
the vertical axis tangent to the screen to a position in the
imagery rendered in computing the image transformations.
7. The computer-implemented process of claim 6, further comprising
fusing the tracked mobile device's orientation and position and
tracked user's face to map the angular offset of the user's face
from the normal to the display of the mobile device and the change
in rotation about the vertical axis tangent to the display of the
mobile device to a position in the rendered imagery.
8. The computer-implemented process of claim 1 wherein the mobile
device's orientation and position is determined by a gyroscope on
the mobile device.
9. The computer-implemented process of claim 8 wherein the viewer
tracker is used to correct for drift of the gyroscope.
10. A computer-implemented process for viewing large scale imagery
on a mobile device, comprising: tracking a mobile device's
orientation and position with a gyroscope on the mobile device;
using a front-facing camera and viewer tracker on the mobile device
to track a user's face looking at a screen on the mobile device;
using the mobile device's orientation and position from the
gyroscope and the position of the user's face obtained by the
viewer tracker to determine a combined position and rate control
for viewing imagery on the screen of the mobile device; and using the
combined position and rate control to compute image transformations
of the imagery rendered on the screen of the mobile device to allow
the user to display different viewpoints of the rendered
imagery.
11. The computer-implemented process of claim 10, wherein the
imagery is a 360 degree panorama and wherein the user can pan to
the left and to the right in the rendered imagery by changing the
viewing angle between the user and the screen of the mobile device,
and can zoom into the imagery by changing the distance between the
user and the screen of the mobile device.
12. The computer-implemented process of claim 10, wherein the
imagery is a set of parallax images and wherein the combined
position and rate control is used to determine a relative offset of
a virtual camera.
13. The computer-implemented process of claim 10, wherein the
imagery comprises a series of 360 degree panoramas of the same
scene taken at fixed intervals, and a set of long perspective strip
panoramas created by clipping out and stitching parts of the series
of 360 degree panoramas.
14. The computer-implemented process of claim 13, wherein the user
can view left and right in a 360 degree panorama of the series by
changing the viewing angle between the user's face and the screen
of the mobile device and can zoom into a different 360 degree
panorama of the series by changing the viewing distance between the
user's face and the screen of the mobile device.
15. A system for viewing large scale imagery, comprising: a general
purpose computing device; a computer program comprising program
modules executable by the general purpose computing device, wherein
the computing device is directed by the program modules of the
computer program to, track a mobile device's orientation and
position; use a camera and viewer tracker on the mobile device to
track a user's face looking at a screen on the mobile device; use
the mobile device's tracked orientation and position, and the
position of the user's face obtained by the viewer tracker, to
determine a combined position and rate control for viewing imagery
on the screen of the mobile device, using the combined position and
rate control to compute image transformations of the imagery
rendered on the screen of the mobile device to allow the user to
display different viewpoints of the rendered imagery.
16. The system of claim 15, wherein the module to determine the
combined position and rate control for viewing imagery on the
screen of the mobile device, further comprises a sub-module to:
compute a viewing angle and a viewing distance between the user and
the screen on the mobile device by using the tracked orientation
and position of the mobile device, and the tracked position of the
user's face relative to the screen of the mobile device.
17. The system of claim 16 wherein the user can change the
viewpoint of the imagery rendered on the screen of the mobile
device by changing the viewing angle of the mobile device relative
to the user's face.
18. The system of claim 16, wherein the user's face can be outside
of the field of view of the camera and wherein a gyroscope on the
mobile device can be used to estimate the location of the face.
19. The system of claim 15, wherein the viewer tracker tracks the
viewer's face by: locating the viewer's face relative to the camera
using a face finder which returns a rectangle for the size and
location of the face; recording a face template from the rectangle
along with position and size; matching the face template at varying
positions and scales around the current position and scale at each
subsequent frame recorded by the camera to find the face in
subsequent frames; and, if the face is lost, reacquiring the face with
the face finder.
20. The system of claim 15, wherein the large scale imagery is one
of a group comprising: high resolution imagery; wide field of view
imagery; and a multi-perspective panorama.
Description
BACKGROUND
[0001] Most viewing of photographs now takes place on an electronic
display rather than in print form. Yet, almost all interfaces for
viewing photos on an electronic display still try to mimic a static
piece of paper by "pasting the photo on the back of the glass", in
other words, simply scaling the image to fit the display. This
approach ignores the inherent flexibility of displays while also
living with the constraints of limited pixel resolution.
[0002] In addition, the resolution and types of imagery available
continue to expand beyond traditional flat images, e.g., high
resolution, multi-perspective, and panoramic imagery.
Paradoxically, as the size and dimensionality of available imagery
have increased, the typical viewing size has decreased, as an
increasingly significant fraction of photo viewing takes place on a
mobile device with limited screen size and resolution. As a result,
the mismatch between imagery and display has become even more
obvious. While there are obvious limitations due to screen size on
mobile devices, one significant benefit is that they are outfitted
with numerous sensors, including accelerometers, gyros, and cameras.
These sensors are currently ignored in the image viewing
process.
SUMMARY
[0003] This Summary is provided to introduce a selection of
concepts in a simplified form that are further described below in
the Detailed Description. This Summary is not intended to identify
key features or essential features of the claimed subject matter,
nor is it intended to be used to limit the scope of the claimed
subject matter.
[0004] The mobile image viewing technique described herein provides
a hands-free interface for viewing large imagery (e.g., 360°
panoramas, parallax image sequences, and long multi-perspective
panoramas) on mobile devices. The technique controls a display on a
mobile device, such as, for example, a mobile phone, by movement of
the mobile device. The technique uses sensors to track the mobile
device's orientation and position, and a front-facing camera to track
the user's viewing distance and viewing angle. The technique
adjusts the view of a rendered image on the mobile device's display
according to the tracked data.
[0005] More particularly, in one embodiment, the technique employs
a sensor fusion methodology that combines viewer tracking using a
front-facing camera with gyroscope data from the mobile device to
produce a robust signal that defines the viewer's 3D position
relative to the display. For example, viewer tracking can be
achieved by face tracking, color-blob/skin tracking, tracking
feature points of the face, or other types of ego-motion and
optical flow tracking. The gyroscopic data provides both low
latency feedback and allows extrapolation of the face position
beyond the field of view of the front-facing camera. The technique
employs a hybrid position and rate control that uses the viewer's
3D position to drive viewing and exploration of very large image
spaces on the mobile device.
DESCRIPTION OF THE DRAWINGS
[0006] The specific features, aspects, and advantages of the
disclosure will become better understood with regard to the
following description, appended claims, and accompanying drawings
where:
[0007] FIG. 1 depicts a flow diagram of an exemplary process for
practicing one embodiment of the mobile image viewing technique
described herein.
[0008] FIG. 2 depicts a flow diagram of another exemplary
process for practicing the mobile image viewing technique described
herein.
[0009] FIG. 3 is an exemplary architecture for practicing one
exemplary embodiment of the mobile image viewing technique
described herein.
[0010] FIG. 4 shows that a gyroscope alone cannot distinguish
between the situations in Case B and Case C. The drift signal,
θ^D, disambiguates these and brings the control in line
with θ^F.
[0011] FIG. 5 depicts the face offset angle and distance that are
computed from a face tracked in a camera situated to the side of
the display of a mobile device.
[0012] FIG. 6 is a schematic of an exemplary computing environment
which can be used to practice the mobile image viewing
technique.
DETAILED DESCRIPTION
[0013] In the following description of the mobile image viewing
technique, reference is made to the accompanying drawings, which
form a part thereof, and which show by way of illustration examples
by which the mobile image viewing technique described herein may be
practiced. It is to be understood that other embodiments may be
utilized and structural changes may be made without departing from
the scope of the claimed subject matter.
[0014] 1.0 Mobile Image Viewing Technique
[0015] The following sections provide an overview of the mobile
image viewing technique, exemplary processes and an exemplary
architecture for practicing the technique, as well as details of
the mathematical computations employed in some embodiments of the
technique.
[0016] 1.1 Overview of the Technique
[0017] The mobile image viewing technique described herein allows a
user to perform image viewing on mobile devices, leveraging the
many sensors on typical mobile devices, such as, for example, cell
phones or smart phones. In particular, in one embodiment, the
technique uses low latency gyros on a mobile device to sense
changes in direction of the device, as well as the front-facing
camera to detect and track the position of a user/viewer relative
to a display on the mobile device, albeit with higher noise and
latency. Fusion of these two sensor streams provides the
functionality to create compelling interfaces to view a range of
imagery. The technique provides natural user interfaces for
viewing many forms of complex imagery, ranging from multiple images
stitched to create a single viewpoint 360° panorama, to
multi-viewpoint image sets depicting parallax in a scene, to
street side interfaces integrating both multi-perspective panoramas
and single viewpoint 360° panoramas.
[0018] One aspect of large format and/or very wide angle imagery is
that there is a natural tension between a desire for direct
positional control, i.e., a direct mapping of sensor output to
position in the image, versus rate control, mapping sensor position
to velocity of motion across an image. In one embodiment, the
technique employs a hybrid rate/position control through a single
relationship between sensors and output. Some technical
contributions of the technique thus include the sensor fusion
between the gyro and viewer tracking from the front-facing camera,
as well as novel functional relationships between this sensing and
control of image viewing across numerous modalities.
[0019] The following sections provide exemplary processes for
practicing the technique, an exemplary architecture for practicing
the technique, and details of various embodiments of the technique.
Details for those processes and the exemplary architecture are
described in Section 2.
[0020] 1.2 Exemplary Processes for Practicing the Technique
[0021] FIG. 1 provides an exemplary process for practicing one
embodiment of the mobile image viewing technique. As shown in FIG.
1, block 102, a mobile device's (for example, a mobile phone's)
orientation and position are tracked using instrumentation on the
device. For example, this mobile device could be a smart phone,
Personal Data Assistant (PDA), or other cellular phone with a
screen for viewing imagery. Tracking could use, for example, a
gyroscope on the mobile device, a digital compass, an
accelerometer, or some other type of instrumentation that can
determine the orientation and position of the mobile device. A camera
and viewer tracker on the mobile device are also simultaneously used
to track a user's face looking at a screen on the mobile device, as
shown in block 104. For example, the camera could be a front-facing
camera facing the user/viewer, disposed on the same side of the
mobile device as the screen of the mobile device. The viewer
tracker could be a face tracker, a color-blob/skin tracker, a tracker
for tracking feature points of the face, or another type of
ego-motion or optical flow tracker.
[0022] A viewing angle and a viewing distance between the user and
the screen on the mobile device are computed by using the tracked
orientation and position of the mobile device, and the tracked
position of the user's face relative to the screen of the mobile
device, as shown in block 106. The details of computing this
viewing angle and viewing distance are provided in Section 2.
[0023] Image transformations of imagery to be rendered on the
screen of the mobile device are then computed using the computed
viewing angle and viewing distance to allow the user to control
viewing of the rendered imagery, as shown in block 108. For
example, the imagery can include any type of images including
single viewpoint panoramas, multi-viewpoint image sets depicting
parallax in a scene, multi-perspective panoramas or a combination
of these. The user can change the view of the imagery by merely
moving the mobile device relative to his or her face.
[0024] FIG. 2 provides another exemplary process for practicing
another embodiment of the mobile image viewing technique. As shown
in FIG. 2, block 202, a mobile device's (for example, a mobile
phone's) orientation and position are tracked using a gyroscope
(although other similar instrumentation could be used). A camera
and viewer tracker on the mobile device are also used to track a
user's face looking at a screen on the mobile device, as shown in
block 204.
[0025] The mobile device's orientation and position from the
gyroscope and the position of the user's face obtained by the
viewer tracker are used to determine a combined position and rate
control for viewing imagery on the screen of the mobile device, as
shown in block 206. The details of the computation for determining
this combined position and rate control are provided in Section
2.
[0026] Image transformations of imagery to be rendered on the
screen of the mobile device are then computed using the computed
combined position and rate control to allow the user to display
different viewpoints of the rendered imagery, as shown in block 208. In
general, the combined position and rate control values are mapped
to coordinates in the imagery in order to determine which portion
of the imagery to render. When the user moves the mobile device
relative to his or her face, the imagery on the device will change
based on the distance and the angle at which the user holds the device.
[0027] 1.3 Exemplary Architecture
[0028] FIG. 3 shows an exemplary architecture 300 for practicing
one embodiment of the mobile image viewing technique. As shown in
FIG. 3, a mobile imagery computing module 302 is located on a
computing device 600, which will be described in greater detail
with respect to FIG. 6. This computing device 600 is preferably
mobile, such as, for example, a mobile phone or smart phone. The
mobile computing device 600 includes a camera 304 that can be used
to capture the face of a user 306 of the mobile computing device
600. The mobile computing device 600 includes instrumentation, such
as, for example, a gyroscope 308, that is used to track the mobile
computing device's orientation and position. It should be noted,
however, that other instrumentation capable of determining the
mobile device's orientation and position could equally well be
used.
[0029] The mobile computing device 600 includes a viewer tracker
310 (e.g., a face tracker, optical flow on the camera, point
tracker) that is used to track a user's face looking at a screen
312 on the mobile device, which is captured by the camera 304. The
mobile device's tracked orientation and position, and the position
of the user's face obtained by the viewer tracker, are used to
determine a viewing angle from the mobile computing device 600 to
the user 306 in a viewing angle computation module 312. In addition,
the distance between the mobile computing device and the user is
determined in a distance computation module 314. A combined
position and rate control for viewing imagery 318 on the screen 312
of the mobile device is determined in a combined position and rate
control computation module 316. The output of the combined position and
rate control module 316 is used to compute image transformations of
imagery to be rendered in an image transformation module 320. The
computed image transformations are used to create transformed
imagery 322 to be rendered on the screen 312 of the mobile device
600. Using the transformed imagery 322, the user can display
different views of the rendered imagery on the screen simply by
moving the camera relative to his or her face.
[0030] 2.0 Exemplary Computations for Embodiments of the
Technique
[0031] Exemplary processes and an exemplary architecture having
been described, the following sections provide details and
exemplary calculations for implementing various embodiments of the
technique.
[0032] 2.1 Mapping Sensors to Image Transformations
[0033] Despite the lack of many traditional affordances found in a
desktop setting (large display, keyboard, mouse, etc.), mobile
devices offer a wide variety of sensors (touch, gyroscopes,
accelerometers, compass, and cameras) that can help overcome the
lack of traditional navigation controls and provide a richer and
more natural interface to image viewing. The mobile image viewing
technique described herein has been used with various applications
that cover a variety of image (scene) viewing scenarios in which
the imagery covers either a large field of view, a wide strip
multi-perspective panorama, multiple views, or a combination of these.
In particular, interfaces were built for 360° panoramas, multi-view
strips exhibiting parallax, and Microsoft® Corporation's
Bing™ for iOS StreetSide™ interface, which combines very long
multi-perspective strip panoramas with single viewpoint 360°
views. A common aspect of all of these is that the imagery requires
exploration to view the full breadth of the data. Details of these
exemplary applications are described in Section 3.
[0034] The most obvious way to explore imagery that cannot fit in
the display is to use touch sensing to mimic a traditional
interface. Users have become accustomed to sliding a finger to pan
and performing a two fingered pinch for zooming. These affordances
have four main drawbacks, however. First, a user's fingers and hand
obscure a significant portion of the display. Second, it becomes
difficult to disambiguate touches designed for purposes other than
navigation, for example, a touch designed to select a link embedded
with the imagery. Third, using the touch screen generally requires
two hands. Finally, combined motions require sequential gestures,
e.g., a "pan and zoom" action requires first a swipe and then a
pinch. The mobile image viewing technique described herein instead
uses more natural interfaces involving one-handed motion of the
device itself for image navigation.
[0035] 2.2 Hybrid Gyro Plus Viewer Tracking
[0036] In the real world, a person moves his or her gaze relative
to a scene, or moves an object relative to the gaze, to fully
explore a scene (or object). In both cases, the head is moving
relative to the scene. If one considers an image as a
representation of a scene on a device, tracking the head relative
to the device as an affordance for navigation seems like a natural
fit.
[0037] Viewer tracking, such as, for example, face tracking alone
can, in theory, provide a complete 3D input affordance: (x,y)
position based on face location, and (z) depth based on face size.
However, viewer tracking alone exhibits a few robustness problems.
Viewer tracking, such as face tracking, is costly and thus incurs
some latency. In addition, the vision algorithms for tracking face
position and size are inherently noisy, as small changes in face
shape and illumination can produce unexpected signals. This can be
overcome somewhat through filtering, albeit at the price of more
latency. Finally, viewer tracking is lost beyond an offset angle
outside the field of view of the front-facing camera (it has been
experimentally found that this limit is about ±15°).
Nonetheless, viewer tracking is unique in its ability to deliver a
3D signal that is directly relevant to image viewing
applications.
[0038] Gyroscopes provide a more robust and lower latency
alternative for the 2D (x,y) angular position. For relative
orientation, the gyros provide a superior signal; however, they do
drift considerably. It is common to see 5° of drift during a
360° rotation over 15 seconds. In addition, gyros alone
cannot disambiguate between the cases shown in FIG. 4 Case B and
FIG. 4 Case C. In the first case, the user 402 has rotated the
device 404. In the second case, the user 402 has rotated himself or
herself, carrying that same rotation to the device 404. To achieve
robustness, liveliness, and reduced ambiguity, the technique
creates a sensor fusion that is a hybrid of the gyro plus viewer
tracking using the front-facing camera.
[0039] In one embodiment of the technique, it was decided not to
use accelerometers for position tracking, based on empirical
experience showing that, aside from the direction of gravity
and fairly sudden moves, the noise from the accelerometers
overwhelms subtle motions. However, it should be noted that
accelerometers, compasses, and other tracking devices could feasibly
be used to track the mobile device.
[0040] 2.2.1 Viewer Tracker
[0041] In one embodiment of the technique, a face is first located
in the front-facing camera via a face finder. Various conventional
face finders can be used for this purpose. In one embodiment, the
technique finds the user's face using a conventional face finder
that returns a rectangle for the size and location of the face. A
face template is recorded from this rectangle along with the
position and size. This template is then matched at varying (x,y)
positions and scales around the current (position, scale) at each
subsequent frame. The (position, scale) with the highest
correlation to the original template in the new frame is considered
the current location of the face. In one embodiment, the technique
searches over a rectangle 3× the size of the previous face in
x and y, and over 3 scales within ±5% of the previous scale. If
the face is lost, the slower full-frame face finder is re-run until
the face is found. Given the field of view of the front-facing
camera, position is trivially transformed to horizontal and
vertical angular offsets, θ_x^F′ and
θ_y^F′. From here on, only the more important
horizontal offset, θ_x^F′, will be referred to, and
the x subscript will be dropped. As previously mentioned, however,
other methods of tracking a viewer can be used.
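[0041a] The tracker just described is compact enough to sketch in code. The following Python sketch assumes OpenCV's Haar-cascade face finder and normalized cross-correlation template matching; the class name, the 0.5 loss threshold, and the linear pixel-to-angle conversion are illustrative assumptions, as the text does not name a specific face finder or matcher.

```python
import cv2

SCALE_STEPS = (0.95, 1.0, 1.05)   # 3 scales within +/-5% of the previous scale
LOSS_THRESHOLD = 0.5              # assumed; the text gives no number

class FaceTracker:
    """Template tracker of [0041]: find the face once, then match locally."""

    def __init__(self, cascade_path="haarcascade_frontalface_default.xml"):
        self.finder = cv2.CascadeClassifier(cascade_path)
        self.rect = None        # (x, y, w, h), current face location and size
        self.template = None    # grayscale template recorded at acquisition
        self.scale = 1.0        # current scale relative to the template

    def reacquire(self, gray):
        # Slower full-frame face finder, re-run whenever the face is lost.
        faces = self.finder.detectMultiScale(gray, 1.1, 5)
        if len(faces) == 0:
            return False
        x, y, w, h = faces[0]
        self.rect = (int(x), int(y), int(w), int(h))
        self.template = gray[y:y + h, x:x + w].copy()
        self.scale = 1.0
        return True

    def step(self, gray):
        if self.template is None:
            return self.rect if self.reacquire(gray) else None
        x, y, w, h = self.rect
        # Search window: a rectangle 3x the previous face size in x and y.
        x0, y0 = max(0, x - w), max(0, y - h)
        x1 = min(gray.shape[1], x + 2 * w)
        y1 = min(gray.shape[0], y + 2 * h)
        window = gray[y0:y1, x0:x1]
        best_score, best_rect, best_scale = -1.0, None, self.scale
        th0, tw0 = self.template.shape[:2]
        for s in SCALE_STEPS:
            scale = self.scale * s
            tw, th = int(tw0 * scale), int(th0 * scale)
            if tw < 8 or th < 8 or window.shape[1] <= tw or window.shape[0] <= th:
                continue
            templ = cv2.resize(self.template, (tw, th))
            res = cv2.matchTemplate(window, templ, cv2.TM_CCOEFF_NORMED)
            _, score, _, loc = cv2.minMaxLoc(res)
            if score > best_score:
                best_score = score
                best_rect = (x0 + loc[0], y0 + loc[1], tw, th)
                best_scale = scale
        if best_rect is None or best_score < LOSS_THRESHOLD:
            # Face lost: fall back to the slower full-frame face finder.
            return self.rect if self.reacquire(gray) else None
        self.rect, self.scale = best_rect, best_scale
        return self.rect

def angular_offsets(rect, frame_w, frame_h, fov_x_deg, fov_y_deg):
    # Convert the rectangle center to the offsets theta_x^F' and theta_y^F',
    # assuming a simple linear pixel-to-angle map over the camera's field of view.
    x, y, w, h = rect
    cx, cy = x + w / 2.0, y + h / 2.0
    return ((cx / frame_w - 0.5) * fov_x_deg,
            (cy / frame_h - 0.5) * fov_y_deg)
```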
[0042] 2.2.2 Horizontal Angle
[0043] Referring to FIG. 5, there are two direct signals the
technique tracks: θ^F′ 502, the angular offset of the
face from the normal to the display (from the front-facing camera),
and Δθ^G 504, the change in rotation about the
vertical axis tangent to the display (from the gyros). The
technique estimates the distance d 506 from the camera 508 using the
face width. Given the fixed offset of the camera 508 from the
center of the display 512 and Δθ^G 504, the
technique derives θ^F 510, the face's angular offset
from the display center. It is thus possible to compute the value
Θ, which is mapped to the position and rate control for the
user interface.
Θ_t = αΘ_{t-1} + (1 − α)(θ_t^G + θ_t^D)    (1)

Θ_t represents the value at time t that the technique
will map to its control functions. The variable α serves to provide
a small amount of hysteresis to smooth this signal. It was found
that a value of 0.1 provides a small smoothing without adding
noticeable lag. θ_t^G is the time-integrated gyro
signal, i.e., the total rotation of the device including any
potential drift:

θ_t^G = θ_{t-1}^G + Δθ_t^G    (2)
where Δθ_t^G represents the direct readings
from the gyro. θ_t^D represents a smoothed signal of
the difference between the face position, θ^F, and the
integrated gyro angle, θ^G. This quantity encompasses any
drift incurred by the gyro as well as any rotation of the user
himself (see FIG. 4 Case C). Since the viewer tracker runs more
slowly than the gyro readings (in one embodiment, 1 to 10 Hz for
the viewer tracker and 50 Hz for the gyro), the technique records
both the face position and gyro values each time a face position is
received. θ^D is thus defined by

θ_t^D = βθ_{t-1}^D + (1 − β)(θ*^F − θ*^G)    (3)

where "*" represents the time of the most recent face track, and
β serves to smooth the face signal and add hysteresis. In one
embodiment, the technique uses a much higher value of β = 0.9 in
this case. This produces some lag, which actually adds a side
benefit discussed in the context of the control mapping.
[0044] To summarize, Θ_t represents a best guess of the
face position relative to the device, even when the face is beyond
the field of view of the device. Although viewer tracking, such as,
for example, face tracking, is inherently slow and noisy, the gyro
signal serves as a lively proxy with good accuracy over short time
intervals. The viewer tracker is used to continuously correct the
gyro input to bring it back in line with where the face is seen
from the front-facing camera.
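[0044a] The complementary update of equations (1)-(3) is compact enough to state directly in code. The following minimal Python sketch assumes angles in degrees, a gyro callback at roughly 50 Hz, and a face callback at 1-10 Hz; the class and method names are illustrative. Sampling θ*^F and θ*^G at the same instant is handled by reading the integrated gyro angle inside the face callback, as the text describes.

```python
class GyroFaceFusion:
    """Fuses gyro and viewer-tracker signals per Eqs. (1)-(3).

    alpha = 0.1 and beta = 0.9 are the values given in the text.
    """

    def __init__(self, alpha=0.1, beta=0.9):
        self.alpha, self.beta = alpha, beta
        self.theta_G = 0.0   # integrated gyro angle, Eq. (2)
        self.theta_D = 0.0   # smoothed drift/face-offset signal, Eq. (3)
        self.Theta = 0.0     # fused control value, Eq. (1)

    def on_gyro(self, delta_theta_G):
        # Eq. (2): integrate the raw gyro reading (runs at ~50 Hz).
        self.theta_G += delta_theta_G
        # Eq. (1): hysteresis-smoothed estimate of the face angle.
        self.Theta = (self.alpha * self.Theta
                      + (1.0 - self.alpha) * (self.theta_G + self.theta_D))
        return self.Theta

    def on_face(self, theta_F):
        # Eq. (3): runs only when the slower viewer tracker reports a face
        # angle; theta_F and theta_G are recorded at the same instant.
        self.theta_D = (self.beta * self.theta_D
                        + (1.0 - self.beta) * (theta_F - self.theta_G))
```

Feeding each gyro sample to on_gyro and each tracker result to on_face yields the fused Θ_t consumed by the control mappings below.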
[0045] 2.2.3 Distance
[0046] In one embodiment, the technique uses the face width in the
camera's view as a proxy for the face's distance from the device.
The technique uses a time-smoothed face size for this signal:

Z_t = γZ_{t-1} + (1 − γ)(1/FaceSize)    (4)

where γ = 0.9 to smooth over noisy readings, albeit at some cost
in latency.
[0047] 2.3 Hybrid Position and Rate Control
[0048] Given the angular offset Θ_t, one is now left
with the mapping between this value and the controls for viewing
the imagery. The simplest and most intuitive mapping is a position
control, in which Θ_t is mapped through some linear
function to the position on the imagery (i.e., angle in a panorama,
position on a large flat image, or viewing position in a multi-view
parallax image set). Position mapping can provide fine control over
short distances and is almost always the control of choice for
displaying imagery when applicable.
[0049] Unfortunately, such a simple mapping has severe limitations
for viewing large imagery. The useful domain of Θ_t is
between ±40°, since beyond this angle the display of a typical
mobile device/phone becomes severely foreshortened and unviewable.
For 360° panoramas or very long multi-perspective images
this range is very limited. The alternatives are to provide
clutching, or to create a rate control in which Θ_t is
mapped to a velocity across the imagery. Although rate controls
provide an infinite range as the integrated position continues to
increase over time, they have been shown to lack fine precision
positioning, as well as suffering from a tendency to overshoot.
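[0049a] This excerpt names the ingredients of the hybrid but not its exact functional form, so the following Python sketch is one plausible interpretation under stated assumptions: inside a small deflection the fused angle maps directly to position (fine control), and beyond an onset angle an additional rate term scrolls the view, giving unbounded range while staying within the ±40° usable band. All constants are assumed.

```python
POSITION_GAIN = 20.0    # image units per degree (assumed)
RATE_GAIN = 80.0        # image units per second at full deflection (assumed)
RATE_ONSET_DEG = 15.0   # deflection where the rate term begins (assumed)
MAX_ANGLE_DEG = 40.0    # beyond this the display is unviewable (from the text)

def hybrid_control(theta_deg, base, dt):
    """Map the fused angle Theta_t to a position in the imagery.

    Returns (view_position, new_base): the position term stays directly
    coupled to the angle for fine control, while the base integrates a
    velocity once the deflection exceeds the onset angle.
    """
    position = base + POSITION_GAIN * theta_deg
    excess = abs(theta_deg) - RATE_ONSET_DEG
    if excess > 0.0:
        sign = 1.0 if theta_deg > 0.0 else -1.0
        frac = min(excess / (MAX_ANGLE_DEG - RATE_ONSET_DEG), 1.0)
        base += sign * RATE_GAIN * frac * dt   # rate control: angle -> velocity
    return position, base
```

Calling hybrid_control once per frame leaves small head/device motions acting as pure position control, while holding the device at a larger angle pans continuously across a 360° panorama or a long strip.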
[0050] 2.4 Zoom Control
[0051] In panorama and street side applications, Z_t is
linearly mapped to zoom level. The technique caps the minimum zoom
level at a bit less than arm's length. The street side application
has a fixed zoom level at which a mode change takes place between
the multi-perspective panoramas and cylindrical panoramas. To avoid
rapid mode changes near this transition point, the technique eases
in a small offset to the zoom level after the mode switch, and then
eases out the offset after the mode switches back.
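[0051a] A minimal Python sketch of the distance and zoom path follows: equation (4) smooths the inverse face size, a linear map produces the zoom level with the minimum capped, and a small offset is eased in after the mode switch and back out after it switches back. γ = 0.9 is from the text; the class name, the map constants, the switch level, and the offset and easing sizes are all assumptions.

```python
GAMMA = 0.9                 # smoothing constant from Eq. (4)
ZOOM_BIAS = 6.0             # assumed intercept of the linear Z_t -> zoom map
ZOOM_SLOPE = 200.0          # assumed slope (zoom falls as the face recedes)
MIN_ZOOM = 1.0              # assumed cap, hit a bit inside arm's length
MODE_SWITCH_LEVEL = 3.0     # assumed level of the strip/panorama mode change
HYSTERESIS_OFFSET = 0.2     # assumed size of the eased-in offset
EASE = 0.1                  # assumed per-update easing fraction

class ZoomControl:
    def __init__(self):
        self.Z = None                # smoothed 1/FaceSize, Eq. (4)
        self.offset = 0.0            # current mode-switch hysteresis offset
        self.panorama_mode = False   # False: strip panorama, True: 360 panorama

    def update(self, face_size_px):
        # Eq. (4): time-smoothed inverse face size as a distance proxy.
        z = 1.0 / face_size_px
        self.Z = z if self.Z is None else GAMMA * self.Z + (1.0 - GAMMA) * z
        # Linear map to zoom level, capped at the minimum zoom.
        level = max(MIN_ZOOM, ZOOM_BIAS - ZOOM_SLOPE * self.Z) + self.offset
        # Mode switch at a fixed level; the eased offset means the raw level
        # must retreat past the switch point before the mode flips back.
        if not self.panorama_mode and level >= MODE_SWITCH_LEVEL:
            self.panorama_mode = True
        elif self.panorama_mode and level < MODE_SWITCH_LEVEL:
            self.panorama_mode = False
        target = HYSTERESIS_OFFSET if self.panorama_mode else 0.0
        self.offset += EASE * (target - self.offset)
        return level, self.panorama_mode
```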
[0052] 2.5 Mapping Controls to Imagery
[0053] Once the values of the controls are obtained, they are mapped
to the imagery to be rendered on the screen. For example, the
output of the position and velocity control can be mapped to the
viewing angle in a 360° panorama or to viewpoint selection in a
multi-viewpoint panorama. The zoom control can be used to scale the
field of view, i.e., literally zoom in/out on an image, or to switch
between modes as described in the previous paragraph.
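[0053a] As a concrete instance of these mappings, a 360° panorama view might take the position control's output as a heading and let the zoom level scale the field of view. A minimal sketch follows; the 60° base field of view and the renderer hand-off are assumptions.

```python
def panorama_view(control_position_deg, zoom_level):
    # Position/velocity control output -> viewing angle in the panorama.
    heading = control_position_deg % 360.0   # wrap around the panorama
    # Zoom control -> field of view: zooming in narrows the visible angle.
    fov_deg = 60.0 / max(zoom_level, 1e-3)   # 60 deg base FOV is assumed
    return heading, fov_deg                  # inputs to a panorama renderer
```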
[0054] 3.0 Exemplary Applications
[0055] The interaction paradigm of the technique described above
has been applied to a number of image viewing applications. These
include wide angle imagery such as 360° panoramas, and
parallax photos consisting of a series of side-by-side images.
The technique has also been applied to very long multi-perspective
images and 360° panoramas.
[0056] 3.1 Panoramas
[0057] Wide angle and 360° panoramas have become a popular
form of imagery, especially as new technologies arrive making their
construction easier. Sites that host high resolution panoramas,
and the bubbles of street side imagery, are two examples.
[0058] By interpreting ΔX_t at each frame time as a
change in orientation, and Z_t as the zoom factor, the
technique provides an interface to such imagery that does not
require two-handed input or standing and physically turning in
place.
[0059] 3.2 Parallax Images
[0060] By sliding a camera sideways and capturing a series of
images one can create a virtual environment by simply flipping
between the images. Automated and less constrained versions for
capture and display of parallax photos also exist.
[0061] In one embodiment, ΔX_t at each frame time
represents a relative offset of the virtual camera. One embodiment
of the technique provides an interface to such imagery that creates
a feeling of peering into a virtual environment. In this case, the
position control, and thus the gyro input, dominates. The viewer
tracker's role is primarily to counteract gyro drift.
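[0061a] For instance, with a set of N side-by-side images, the offset might select the nearest captured viewpoint. A small sketch follows; the linear index map and images-per-degree density are assumptions, not from the text.

```python
IMAGES_PER_DEGREE = 0.5   # assumed density of the captured image set

def parallax_index(theta_deg, num_images):
    # Position control dominates here: the fused offset angle picks the
    # virtual camera offset directly, clamped to the captured range.
    center = (num_images - 1) / 2.0
    idx = int(round(center + IMAGES_PER_DEGREE * theta_deg))
    return max(0, min(num_images - 1, idx))
```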
[0062] 3.3 Street Imagery
[0063] A new interface for viewing street side imagery was
demonstrated in Microsoft® Corporation's StreetSlide™
application. The original imagery consists of a series of
360° panoramas set at approximately 2 meter intervals along
a street. The StreetSlide™ paradigm was subsequently adapted to
create long multi-perspective strip panoramas constructed by
clipping out and stitching parts of the series of panoramas. The
StreetSlide™ application automatically flips between the long
strip panoramas and the 360° panoramas depending on zoom
level. Other similar applications use traditional finger swipes and
pinch operations.
[0064] The present mobile image viewing technique was applied as a
new user interface on top of the StreetSlide™ application. It
could equally well be applied to similar applications. Since there
are two modes, the meaning of ΔX_t switches. In slide
mode, ΔX_t moves the view left and right along the street
side. Z_t zooms the strip panorama in and out. At a given zoom
level, the mode switches automatically to the corresponding
360° panorama at that location on the street. At this point,
the technique reverts to the panorama control described above.
Zooming out once more returns to the slide mode. Navigation now
requires only one hand, leaving the other hand free for unambiguous
access to other navigation aids and information overlaid on the
location imagery.
[0065] 3.4 Alternate Embodiments
[0066] Many other types of media could be viewed using the mobile
image viewing technique. For example, the technique can be applied
to an interface to mapping applications. Being able to zoom out
from a street in San Francisco, pan across the country, and zoom back
in to a New York street, for example, would be achievable by simply
moving the device away, tilting it "east", and pulling the device
back towards the viewer.
[0067] 4.0 Exemplary Operating Environments
[0068] The mobile image viewing technique described herein is
operational within numerous types of general purpose or special
purpose computing system environments or configurations. FIG. 6
illustrates a simplified example of a general-purpose computer
system on which various embodiments and elements of the mobile
image viewing technique, as described herein, may be implemented.
It should be noted that any boxes that are represented by broken or
dashed lines in FIG. 6 represent alternate embodiments of the
simplified computing device, and that any or all of these alternate
embodiments, as described below, may be used in combination with
other alternate embodiments that are described throughout this
document.
[0069] For example, FIG. 6 shows a general system diagram showing a
simplified computing device 600. Such computing devices can
typically be found in devices having at least some minimum
computational capability, including, but not limited to, personal
computers, server computers, hand-held computing devices, laptop or
mobile computers, communications devices such as cell phones and
PDAs, multiprocessor systems, microprocessor-based systems, set
top boxes, programmable consumer electronics, network PCs,
minicomputers, mainframe computers, audio or video media players,
etc.
[0070] To allow a device to implement the mobile image viewing
technique, the device should have sufficient computational
capability and system memory to enable basic computational
operations. In particular, as illustrated by FIG. 6, the
computational capability is generally illustrated by one or more
processing unit(s) 610, and may also include one or more GPUs 615,
either or both in communication with system memory 620. Note that
the processing unit(s) 610 of the general computing device
may be specialized microprocessors, such as a DSP, a VLIW, or other
micro-controller, or can be conventional CPUs having one or more
processing cores, including specialized GPU-based cores in a
multi-core CPU.
[0071] In addition, the simplified computing device of FIG. 6 may
also include other components, such as, for example, a
communications interface 630. The simplified computing device of
FIG. 6 may also include one or more conventional computer input
devices 640 (e.g., pointing devices, keyboards, audio input
devices, video input devices, haptic input devices, devices for
receiving wired or wireless data transmissions, etc.). The
simplified computing device of FIG. 6 may also include other
optional components, such as, for example, one or more conventional
computer output devices 650 (e.g., display device(s) 655, audio
output devices, video output devices, devices for transmitting
wired or wireless data transmissions, etc.). Note that typical
communications interfaces 630, input devices 640, output devices
650, and storage devices 660 for general-purpose computers are well
known to those skilled in the art, and will not be described in
detail herein.
[0072] The simplified computing device of FIG. 6 may also include a
variety of computer readable media. Computer readable media can be
any available media that can be accessed by computer 600 via
storage devices 660, and includes both volatile and nonvolatile
media that is either removable 670 and/or non-removable 680, for
storage of information such as computer-readable or
computer-executable instructions, data structures, program modules,
or other data. By way of example, and not limitation, computer
readable media may comprise computer storage media and
communication media. Computer storage media includes, but is not
limited to, computer or machine readable media or storage devices
such as DVDs, CDs, floppy disks, tape drives, hard drives,
optical drives, solid state memory devices, RAM, ROM, EEPROM, flash
memory or other memory technology, magnetic cassettes, magnetic
tapes, magnetic disk storage, or other magnetic storage devices, or
any other device which can be used to store the desired information
and which can be accessed by one or more computing devices.
[0073] Storage of information such as computer-readable or
computer-executable instructions, data structures, program modules,
etc., can also be accomplished by using any of a variety of the
aforementioned communication media to encode one or more modulated
data signals or carrier waves, or other transport mechanisms or
communications protocols, and includes any wired or wireless
information delivery mechanism. Note that the terms "modulated data
signal" or "carrier wave" generally refer to a signal that has one or
more of its characteristics set or changed in such a manner as to
encode information in the signal. For example, communication media
includes wired media such as a wired network or direct-wired
connection carrying one or more modulated data signals, and
wireless media such as acoustic, RF, infrared, laser, and other
wireless media for transmitting and/or receiving one or more
modulated data signals or carrier waves. Combinations of any of
the above should also be included within the scope of communication
media.
[0074] Further, software, programs, and/or computer program
products embodying some or all of the various embodiments of
the mobile image viewing technique described herein, or portions
thereof, may be stored, received, transmitted, or read from any
desired combination of computer or machine readable media or
storage devices and communication media in the form of computer
executable instructions or other data structures.
[0075] Finally, the mobile image viewing technique described herein
may be further described in the general context of
computer-executable instructions, such as program modules, being
executed by a computing device. Generally, program modules include
routines, programs, objects, components, data structures, etc.,
that perform particular tasks or implement particular abstract data
types. The embodiments described herein may also be practiced in
distributed computing environments where tasks are performed by one
or more remote processing devices, or within a cloud of one or more
devices, that are linked through one or more communications
networks. In a distributed computing environment, program modules
may be located in both local and remote computer storage media
including media storage devices. Still further, the aforementioned
instructions may be implemented, in part or in whole, as hardware
logic circuits, which may or may not include a processor.
[0076] It should also be noted that any or all of the
aforementioned alternate embodiments described herein may be used
in any combination desired to form additional hybrid embodiments.
Although the subject matter has been described in language specific
to structural features and/or methodological acts, it is to be
understood that the subject matter defined in the appended claims
is not necessarily limited to the specific features or acts
described above. The specific features and acts described above are
disclosed as example forms of implementing the claims.
* * * * *