U.S. patent application number 12/421,363 was filed with the patent office on April 9, 2009, and published on March 4, 2010, as publication number 20100053151, for in-line mediation for manipulating three-dimensional content on a display device. This patent application is currently assigned to SAMSUNG ELECTRONICS CO., LTD. Invention is credited to Francisco Imai, Seung Wook Kim, and Stefan Marti.

Application Number: 12/421,363
Publication Number: 20100053151
Family ID: 41724668
Publication Date: 2010-03-04
United States Patent Application: 20100053151
Kind Code: A1
Inventors: Marti; Stefan; et al.
Publication Date: March 4, 2010

IN-LINE MEDIATION FOR MANIPULATING THREE-DIMENSIONAL CONTENT ON A DISPLAY DEVICE
Abstract
A user holds the mobile device upright or sits in front of a
nomadic or stationary device, views the monitor from a suitable
distance, and physically reaches behind the device with her hand to
manipulate a 3D object displayed on the monitor. The device
functions as a 3D in-line mediator that provides visual coherency
to the user when she reaches behind the device to use hand gestures
and movements to manipulate a perceived object behind the device
and sees that the 3D object on the display is being manipulated.
The perceived object that the user manipulates behind the device
with bare hands corresponds to the 3D object displayed on the
device. The visual coherency arises from the alignment of the
user's head or eyes, the device, and the 3D object. The user's hand
may be represented as an image of the actual hand or as a
virtualized representation of the hand, such as part of an
avatar.
Inventors: Marti; Stefan (San Francisco, CA); Kim; Seung Wook (Santa Clara, CA); Imai; Francisco (Mountain View, CA)
Correspondence Address: Beyer Law Group LLP, P.O. BOX 1687, Cupertino, CA 95015-1687, US
Assignee: SAMSUNG ELECTRONICS CO., LTD (Suwon City, KR)
Family ID: 41724668
Appl. No.: 12/421363
Filed: April 9, 2009
Related U.S. Patent Documents

Application Number: 61/093,651
Filing Date: Sep 2, 2008
Current U.S. Class: 345/419; 340/407.1; 345/629; 348/77; 348/E7.085
Current CPC Class: G09G 2356/00 20130101; G06F 3/017 20130101; H04M 2203/359 20130101; G06F 3/1446 20130101; G06F 3/04815 20130101; G06T 2219/2021 20130101; G09G 3/002 20130101; G06T 19/006 20130101; G06F 3/147 20130101; G06T 2219/2016 20130101; G06F 3/011 20130101; G06F 3/016 20130101; H04N 13/279 20180501
Class at Publication: 345/419; 348/77; 340/407.1; 345/629; 348/E07.085
International Class: G06T 15/00 20060101 G06T015/00; H04N 7/18 20060101 H04N007/18; H04B 3/36 20060101 H04B003/36
Claims
1. A method of detecting manipulation of a digital 3D object
displayed on a device having a front side with a display and a back
side, the method comprising: detecting a hand within a specific
area of the back side of the device, the back side having a
sensor; displaying the hand on the display; tracking movement of
the hand within the specific area of the back side, wherein said
movement is caused by a user intending to manipulate the displayed
3D object; detecting a collision between the displayed hand and the
displayed 3D object; and modifying an image of the 3D object
displayed on the device, wherein the device is a 3D in-line
mediator between the user and the 3D object.
2. A method as recited in claim 1 further comprising: detecting a
hand gesture within the specific area.
3. A method as recited in claim 1 wherein modifying an image of the
3D object further comprises: deforming the image of the 3D
object.
4. A method as recited in claim 1 wherein modifying an image of the
3D object further comprises: moving the image of the 3D object.
5. A method as recited in claim 1 further comprising: displaying
the modified image on the device.
6. A method as recited in claim 1 wherein the user reaches behind
the device to manipulate a perceived object corresponding to the 3D
object, such that the hand is within the specific area of the back
side of the device.
7. A method as recited in claim 6 further comprising: providing the
user with visual coherency when the user reaches behind the
device.
8. A method as recited in claim 1 wherein tracking movement of the
hand further comprises: processing depth data of said hand
movement.
9. A method as recited in claim 1 further comprising: executing
tracking software.
10. A method as recited in claim 1 wherein the sensor is a tracking
component that faces outward from the back side of the device and
wherein the sensor is a camera.
11. A method as recited in claim 1 wherein displaying the hand
further comprises: displaying a composited image of the hand on
the display.
12. A method as recited in claim 1 wherein displaying the hand
further comprises: displaying a virtualized image of the hand on
the display.
13. A method as recited in claim 1 further comprising: providing
haptic feedback to the hand when a collision is detected between
the displayed hand and the 3D object.
14. A method as recited in claim 1 wherein there is no contact
between the hand and the back side of the device or with the
display.
15. A device having a display, the device comprising: a processor;
a memory storing digital 3D content data; a tracking sensor
component for tracking movement of an object in proximity of the
device, wherein the tracking sensor component is on a back side of
the device facing away from a user; a hand tracking module for
processing movement data related to a user hand; and a hand-3D
object collision module for detecting a collision between the user
hand and a 3D object.
16. A device as recited in claim 15 further comprising: a face
tracking sensor component for tracking face movement in proximity
of a front side of a device; and a face tracking module for
processing face movement data related to user face movement in
front of the device.
17. A device as recited in claim 15 further comprising: a hand
gesture detection module for detecting user hand gestures made
within range of the tracking sensor component.
18. A device as recited in claim 15 further comprising: a tactile
feedback controller for providing tactile feedback to the user
hand.
19. A device as recited in claim 15 wherein a tracking sensor
component is a camera-based component.
20. A device as recited in claim 15 wherein a tracking sensor
component is one of an image differentiator, infrared detector,
optic flow component, and spectral processor.
21. A device as recited in claim 15 wherein the tracking sensor
component tracks movements of the hand when the user moves the hand
behind the device within the range of the tracking sensor
component.
22. A device as recited in claim 15 wherein the device is one of a
mobile device, a nomadic device, and a stationary device.
23. A device as recited in claim 15 further comprising a network
interface for connecting to a network to receive digital 3D content
data.
24. An apparatus for manipulating digital 3D content, the apparatus
having a front side with a display and a back side, the apparatus
comprising: means for detecting a hand within a specific area of
the back side of the apparatus, the back side having a sensor;
means for displaying the hand on the apparatus; means for tracking
movement of the hand within the specific area of the back side,
wherein said movement is caused by a user intending to manipulate a
displayed 3D object; means for detecting a collision between the
displayed hand and the displayed 3D object; and means for modifying
an image of the 3D object displayed on the apparatus, wherein the
apparatus is a 3D in-line mediator between the user and the 3D
object.
25. An apparatus as recited in claim 24 further comprising: means
for detecting a hand gesture within the specific area.
26. An apparatus as recited in claim 24 wherein means for modifying
an image of the 3D object further comprises: means for moving the
image of the 3D object.
27. An apparatus as recited in claim 24 further comprising: means
for displaying the modified image on the apparatus.
28. An apparatus as recited in claim 24 wherein the user reaches
behind the apparatus to manipulate a perceived object corresponding
to the 3D object, such that the hand is within the specific area of
the back side of the apparatus.
29. An apparatus as recited in claim 24 wherein means for tracking
movement of the hand further comprises: means for processing depth
data of said hand movement.
30. An apparatus as recited in claim 24 wherein the sensor is a
tracking component that faces outward from the back side of the
apparatus and wherein the sensor is a camera.
31. An apparatus as recited in claim 24 wherein means for
displaying the hand further comprises: means for displaying a
composited image of the hand on the apparatus.
32. An apparatus as recited in claim 24 further comprising: means
for providing haptic feedback to the hand when a collision is
detected between the displayed hand and the 3D object.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority of U.S. provisional patent
application No. 61/093,651, filed Sep. 2, 2008, entitled "GESTURE AND
MOTION-BASED NAVIGATION AND INTERACTION WITH THREE-DIMENSIONAL
VIRTUAL CONTENT ON A MOBILE DEVICE," which is hereby
incorporated by reference.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The present invention relates generally to hardware and
software for user interaction with digital three-dimensional data.
More specifically, it relates to devices having displays and to
human interaction with data displayed on the devices.
[0004] 2. Description of the Related Art
[0005] The amount of three-dimensional content available on the
Internet and in other contexts is increasing at a rapid pace.
Consumers are getting more accustomed to hearing about "3D" in
various contexts, such as movies, video games, and online virtual
cities. Three-dimensional content may be found in medical imaging
(e.g., examining MRIs), modeling and prototyping, information
visualization, architecture, tele-immersion and collaboration,
geographic information systems (e.g., Google Earth), and in other
fields. Current systems, including computers and cell phones and, more
generally, content display systems (e.g., TVs), fall short of taking
advantage of 3D content because they do not provide an immersive user
experience. For example, they do not provide intuitive, natural,
and unobtrusive interaction with 3-D objects.
[0006] With respect to mobile devices, such devices presently do not
provide users seeking interaction with digital 3D content with a
natural, intuitive, and immersive experience. Mobile device users are not able to make
gestures or manipulate 3D objects using bare hands in a natural and
intuitive way.
[0007] Although some displays allow users to manipulate 3D content
with bare hands in front of the display (monitor), current display
systems that are able to provide some interaction with 3D content
require inconvenient or intrusive peripherals that make the
experience unnatural to the user. For example, some current methods
of providing tactile or haptic feedback require vibro-tactile
gloves. In other examples, current methods of rendering 3-D content
include stereoscopic displays (requiring the user to wear a pair of
special glasses), auto-stereoscopic displays (based on lenticular
lenses or parallax barriers that cause eye strain and headaches as
usual side effects), head-mounted displays (requiring heavy head
gear or goggles), and volumetric displays, such as those based on
oscillating mirrors or screens (which do not allow bare hand direct
manipulation of 3-D content).
[0008] In addition mobile device displays, such as displays on cell
phones, only allow for a limited field of view (FOV). This is due
to the fact that the mobile device display size is generally
limited by the size of the device. For example, the size of a
non-projection display cannot be larger than the mobile device that
contains the display. Therefore, existing solutions for mobile
displays (which are generally light-emitting displays) limit the
immersive experience for the user. Furthermore, it is presently
difficult to navigate through virtual worlds and 3-D content via a
first-person view on mobile devices, which is one aspect of
creating an immersive experience. Mobile devices do not provide
satisfactory user awareness of virtual surroundings, another
important aspect of creating an immersive experience.
[0009] Some display systems require a user to reach behind the
monitor. However, in these systems the user's hands must physically
touch the back of the monitor, and the interaction is only intended to
manipulate 2-D images, such as moving images on the screen.
SUMMARY OF THE INVENTION
[0010] A user is able to use a mobile device having a display, such
as a cell phone or a media player, to view and manipulate 3D
content displayed on the device by reaching behind the device and
manipulating a perceived 3D object. The user's eyes, device, and a
perceived 3D object are aligned or "in-line," such that the device
performs as a type of in-line mediator between the user and the
perceived 3D object. This alignment results in a visual coherency
to the user when reaching behind the device to make hand gestures
and movements to manipulate the 3D content. That is, the user's
hand movements behind the device are at a natural and intuitive
distance and are aligned with the 3D object displayed on the device
monitor so that the user has a natural visual impression that she
is actually handling the 3D object shown on the monitor.
[0011] One embodiment of the present invention is a method of
detecting manipulation of a digital 3D object displayed on a device
having a front side with a display monitor facing the user and a
back side having a sensor facing away from the user. A hand or
other object may be detected within a specific area of the back
side of the device having the sensor, such as a camera. The hand is
displayed on the monitor and its movements within a specific area
of the back side of the device are tracked. The movements are the
result of the user intending to manipulate the displayed 3D object
and are made by the user in manipulating a perceived 3D object behind
the device, but without having to physically touch the backside of
the device. A collision between the displayed hand and the
displayed 3D object may be detected by the device resulting in a
modification of the image of the 3D object displayed on the device.
In this manner the device functions as a 3D in-line mediator
between the user and the 3D object.
[0012] In another embodiment, a display device includes a processor
and a memory component storing digital 3D content data. The device
also includes a tracking sensor component for tracking movement of
an object that is in proximity of the device. In one embodiment,
the tracking sensor component faces the back of the device (away
from the user) and is able to detect movements and gestures of a
hand of a user who reaches behind the device. A hand tracking
module processes movement data from the tracking sensor and a
collision detection module detects collisions between a user's hand
and a 3D object.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] References are made to the accompanying drawings, which form
a part of the description and in which are shown, by way of
illustration, particular embodiments:
[0014] FIG. 1A is an illustration of a user using a mobile device
as a 3D in-line mediator to manipulate digital 3D content displayed
on the device in accordance with one embodiment;
[0015] FIG. 1B is an illustration of a user using a laptop computer
as a 3D in-line mediator to manipulate digital 3D content displayed
on device in accordance with one embodiment;
[0016] FIG. 1C is an illustration of a user using a desktop
computer as a 3D in-line mediator to manipulate digital 3D content
displayed on a desktop monitor in accordance with one
embodiment;
[0017] FIGS. 2A and 2B are top views illustrating 3D in-line
mediation;
[0018] FIG. 2C is an illustration of a side perspective of user
shown in FIG. 2A;
[0019] FIG. 3A is a more detailed top view of a user utilizing a
mobile device as a 3D in-line mediator for manipulating digital 3D
content in accordance with one embodiment;
[0020] FIG. 3B shows a scene that a user sees when facing a device
and when reaching behind the device;
[0021] FIG. 4 is a flow diagram of a process of enabling in-line
mediation in accordance with one embodiment;
[0022] FIG. 5 is a block diagram showing relevant components of a
device capable of functioning as a 3D in-line mediator in
accordance with one embodiment; and
[0023] FIGS. 6A and 6B illustrate a computer system suitable for
implementing embodiments of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0024] Methods and systems for using a display device as a
three-dimensional (3D) in-line mediator for interacting with
digital 3D content displayed on the device are described in the
various figures. The use of a display device as an in-line mediator
enables intuitive bare hand manipulation of digital 3D content by
allowing a user to see the direct effect of the user's handling of
the 3D content on the display by reaching behind the display
device. In this manner, the device display functions as an in-line
mediator between the user and the 3D content, enabling a type of
visual coherency for the user. That is, the 3D content is visually
coherent or in-line from the user's perspective. The user's hand,
or a representation of it, is shown on the display, maintaining the
visual coherency. Furthermore, by reaching behind the device, the
user's view of the 3D content on the display is not obstructed by
the user's arm or hand.
[0025] FIG. 1A is an illustration of a user using a mobile device
as a 3D in-line mediator to manipulate digital 3D content displayed
on the device in accordance with one embodiment. The term "3D
in-line mediation" refers to the user's eyes, the 3D content on the
display, and user's hands behind the display (but not touching the
back of the display) being aligned in real 3D space. A user 102
holds a mobile device 104 with one hand 105 and reaches behind
device 104 with another hand 106. One or more sensors (not shown),
collectively referred to as a sensor component, on device 104 face
away from user 102 in the direction of user's hand behind device
104. User's hand 106 is detected and a representation 108 of the
hand is displayed on a device monitor 109 displaying a 3D scene
107. As discussed in greater detail below, representation 108 may
be an unaltered (actual) image of the user's hand that is
composited onto scene 107 or may be a one-to-one mapping of real
hand 106 to a virtual representation (not shown), such as an avatar
hand which becomes part of 3D or virtual scene 107. The term "hand"
as used herein may include, in addition to the user's hand,
fingers, and thumb, the user's wrist and forearm, all of which may
be detected by the sensor component.
[0026] Mobile device 104 may be a cell phone, a media player (e.g.,
MP3 player), portable gaming device, or any type of smart handset
device having a display. It is assumed that the device is
IP-enabled or capable of connecting to a suitable network to access
3D content over the Internet. However, the various embodiments
described below do not necessarily require that the device be able
to access a network. For example, the 3D content displayed on the
device may be resident on a local storage of the device, such as on
a hard disk or other mass storage component or on a local cache.
The sensor component on mobile device 104 and the accompanying
sensor software may be one or more of various types of sensors, a
typical one being a conventional camera. Implementations of various
sensor components are described below. Although the methods and
systems of the various embodiments are described using a mobile
device, they may equally apply to nomadic devices, such as laptops
and netbook computers (i.e., devices that are portable), and to
stationary devices, such as desktop computers, workstations, and
the like, as shown in FIGS. 1B and 1C.
[0027] FIG. 1B is an illustration of a user 110 using a laptop
computer 112 or similar nomadic computing device (e.g., netbook
computer, mini laptop, etc.) as a 3D in-line mediator to manipulate
digital 3D content displayed on device 112 in accordance with one
embodiment. A laptop computer has a sensor component (not shown)
facing away from user 110. The sensor component may be an internal
component of laptop 112 or a peripheral 113 attached to it with
associated software installed on laptop 112. Both of user 110's hands
114 and 115 are composited onto a 3D scene 116 on display 117.
[0028] Similarly, FIG. 1C is an illustration of a user 118 using a
desktop computer monitor 119 as a 3D in-line mediator to manipulate
digital 3D content displayed on desktop monitor 119 in accordance
with one embodiment. As a practical matter, it is preferable that
monitor 119 be some type of flat panel monitor, such as an LCD or
plasma monitor, so that the user is physically able to reach behind
the monitor. A tracking sensor component 120 detects a user's hand
122 behind desktop monitor 119. In this example, hand 122 is mapped
to a digital representation, such as an avatar hand 124. User 118
may also move his other hand behind monitor 119 as in FIG. 1B.
[0029] FIGS. 2A and 2B are top views illustrating 3D in-line
mediation. They show a user 200 holding a mobile device 202 with
her left hand 204. User 200 extends her right hand 206 behind
device 202. An area 208 behind mobile device 202, indicated by the
solid angled lines, is a so-called virtual 3D space and the area
210 surrounding device 202 is the physical or real environment or
world ("RW") that user 200 is in. A segment of the user's hand 206
in virtual space 208 is shaded. A sensor component (not shown)
detects the presence and movement in area 208. User 200 can move
hand 206 around in area 208 to manipulate 3D objects shown on
device 202. FIG. 2B shows a user 212 reaching behind a laptop 214
with both hands 216 and 218.
[0030] The sensor component, also referred to as a tracking
component, may be implemented using various types of sensors. These
sensors may be used to detect the presence of a user's hand (or any
object) behind the mobile device's monitor. In a preferred
embodiment, a standard or conventional mobile device or cell phone
camera is used to sense the presence of a hand and its movements or
gestures. Image differentiation or optic flow may also be used to
detect and track hand movement. In other embodiments, a
conventional camera may be replaced with infrared detection
components to perform hand detection and tracking. For example, a
mobile device camera facing away from the user and that is IR
sensitive (or has its IR filter removed), possibly in combination
with additional IR illumination (e.g., LED), may look for the
brightest object within the range of the camera, which will likely
be the user's hand. Dedicated infrared sensors with IR illumination
may also be used. In another embodiment, redshift thermal imaging
may be used. This option provides passive optical components that
redshift a standard CMOS imager to be able to detect long
wavelength and thermal infrared radiation. Another type of sensor
may be ultrasonic gesture detection sensors. Sensor software
options include off-the-shelf gesture recognition tools, such as
software for detecting hands using object segmentation and/or optic
flow. Other options include spectral imaging software for detecting
skin tones, pseudo-thermal imaging, and 3D depth cameras using
time-of-flight.
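By way of a minimal illustrative sketch only, the brightest-object heuristic mentioned above could be realized with a conventional image-processing library such as OpenCV; the blur kernel size and brightness threshold below are assumed values rather than parameters given in this description.

    import cv2

    def brightest_point(ir_frame_gray, min_brightness=200):
        # Smooth first so a single hot pixel cannot win; the kernel size is an assumption.
        blurred = cv2.GaussianBlur(ir_frame_gray, (41, 41), 0)
        _, max_val, _, max_loc = cv2.minMaxLoc(blurred)
        # Return the (x, y) location of the brightest region, taken here as a rough
        # proxy for an IR-illuminated hand, or None if nothing bright enough is seen.
        return max_loc if max_val >= min_brightness else None
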
[0031] FIG. 2C is an illustration of a side perspective of user 200
shown initially in FIG. 2A. User hand 206 is in front of a sensor
component of device 202, the sensor facing away from user 200. User
200 holds device 202 with hand 204. Hand 206 is behind device 202
in virtual 3D space 208. Space 208 is the space in proximity of the
backside of device 202 that is tracked by the backward facing
sensor. Real or physical world 210 is outside the proximity or
tracked area of device 202. Gestures and movements of hand 206 in
3D space 208 are made by user 200 in order to manipulate a
perceived 3D object (not shown) on the display of device 202. In
another scenario, the gestures may not be for manipulating an
object but simply for the purpose of making a gesture (e.g.,
waving, pointing, etc.) in a virtual world environment.
[0032] FIG. 3A is a more detailed top view of a user utilizing a
mobile device as a 3D in-line mediator for manipulating digital 3D
content in accordance with one embodiment. It shows a user's head
303, a "perceived" digital 3D object 304 and a display device 306
as being in-line or aligned, with display device 306 functioning as
the in-line mediator. It also shows a user's hand 308 reaching
behind device 306. FIG. 3B shows a scene 310 that user head 303
sees when facing device 306 and when reaching behind device 306.
The user sees digital 3D object 312 on the screen and user hand 308
or a representation of it touching or otherwise manipulating object
312. It is helpful to note that although the figures only show a
user touching an object or making gestures, the user is also able
to manipulate a digital 3D object, such as by touching, lifting,
holding, moving, pushing, pulling, dropping, throwing, rotating,
deforming, bending, stretching, compressing, squeezing, or pinching it.
When the user reaches behind the monitor or device, she can move
her hand(s) to manipulate 3D objects she sees on the display. As
explained below, there is a depth component. If the 3D object is in
front of the 3D scene she is viewing, she will not have to reach
far behind the device/monitor. If the object is further back in the
3D scene, the user may have to physically reach further behind the
device/monitor in order to touch (or collide with) the 3D
object.
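As a sketch of the depth relationship just described, assuming for illustration a simple linear mapping between physical reach behind the device and scene depth; the centimeter and scene-unit ranges are assumptions, not values from this description.

    def physical_to_virtual_depth(hand_depth_cm,
                                  reach_range_cm=(5.0, 40.0),
                                  scene_depth_range=(0.0, 10.0)):
        # Map how far the hand is behind the device (in centimeters) to a depth
        # coordinate inside the displayed 3D scene, clamping to the tracked region.
        near_cm, far_cm = reach_range_cm
        near_z, far_z = scene_depth_range
        clamped = max(near_cm, min(far_cm, hand_depth_cm))
        t = (clamped - near_cm) / (far_cm - near_cm)
        return near_z + t * (far_z - near_z)

    # For example, a hand 12 cm behind the device maps to a point 2.0 units into the scene.
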
[0033] FIG. 4 is a flow diagram of a process of enabling in-line
mediation in accordance with one embodiment. The process described
in FIG. 4 begins after the user has powered on the device, which
may be mobile, nomadic, or stationary. A tracking component has
also been activated. The device displays 3D content, such as an
online virtual world or any other form of 3D content, examples of
which are provided above. The user directly faces the display; that
is, sits squarely in front of the laptop or desktop monitor or
holds the cell phone directly in front of her. There may be a 3D
object displayed on the screen that the user wants to manipulate
(e.g., pick up a ball, move a chair, etc.) or there may be a 3D
world scene in which the user wants to perform a hand gesture or
movement (e.g., wave to 3D person or an avatar). Other examples
that do not involve online 3D content may include moving or
changing the orientation of 3D medical imaging data, playing a 3D
video game, interacting with 3D content, such as a movie or show,
and so on.
[0034] The user begins by moving a hand behind the device
(hereafter, for ease of illustration, the term "device" may refer to
mobile device screens and laptop/desktop monitors). At step 402 a
tracking component detects the presence of the user's hand. There
are various ways this can be done. One conventional way is by
detecting the skin tone of the user's hand. As described above,
there are numerous types of tracking components or sensors that may
be used. Which one is most suitable will likely depend on the
features and capabilities of the device (i.e., mobile, nomadic,
stationary, etc.). A typical cell phone camera is capable of
detecting the presence of a human hand. An image of the hand (or
hands) is transmitted to a compositing component.
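Purely as an illustrative sketch, skin-tone detection of this kind is often done by thresholding in HSV color space; the color bounds and minimum area below are assumptions rather than values disclosed here, and the OpenCV calls are only one possible realization.

    import cv2
    import numpy as np

    # Rough HSV bounds for skin tones; a real system would calibrate per user and lighting.
    SKIN_LOWER = np.array([0, 40, 60], dtype=np.uint8)
    SKIN_UPPER = np.array([25, 180, 255], dtype=np.uint8)

    def detect_hand_by_skin_tone(frame_bgr, min_area=2000):
        # Return (mask, bounding_box) for the largest skin-colored region,
        # or None if no sufficiently large region is present in the frame.
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, SKIN_LOWER, SKIN_UPPER)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        candidates = [c for c in contours if cv2.contourArea(c) >= min_area]
        if not candidates:
            return None
        hand = max(candidates, key=cv2.contourArea)
        return mask, cv2.boundingRect(hand)
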
[0035] At step 404 the hand is displayed on the screen. The user
sees either an unaltered view of her hand (not including the
background behind and around the hand) or an altered representation
of the hand. If an image of the user's hand is displayed, known
compositing techniques may be used. For example, some techniques
may involve combining two video sources: one for the 3D content and
another representing video images of the user's hand. Other
techniques for overlaying or compositing the images of the hand
over the 3D content data may be used and which technique is most
suitable will likely depend on the type of device. If the user's
hand is mapped to an avatar hand or other digital representation,
software from the 3D content provider or other conventional
software may be used to perform a mapping of the user hand images
to an avatar image, such as a robotic hand. Thus, after step 404, a
representation of the user's stationary hand can be seen on the
device. That is, the hand's presence has been detected and it is being
represented on the device.
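A minimal sketch of such compositing, assuming the hand mask from the detection stage and two same-sized RGB images (the rendered 3D scene and the camera frame); this is one simple way to overlay the hand without its background, not necessarily the technique a given device would use.

    import numpy as np

    def composite_hand_onto_scene(scene_rgb, camera_rgb, hand_mask):
        # Wherever the mask marks hand pixels, show the camera image; everywhere else,
        # keep the rendered 3D scene, so the hand appears in front of the content.
        out = scene_rgb.copy()
        hand_pixels = hand_mask.astype(bool)
        out[hand_pixels] = camera_rgb[hand_pixels]
        return out
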
[0036] At step 406 the user starts moving the hand, either by
moving it up, down, left, right, or inward or outward (relative to
the device) or by gesturing (or both). The initial position of the
hand and its subsequent movement can be described in terms of x, y,
and z coordinates. The tracking component begins tracking hand
movement and gesturing, which has horizontal, vertical, and depth
components. For example, a user may be viewing a 3D virtual world
room on the device and wants to move an object that is in the far
left corner of the room (which has a certain depth) to the near
right corner of the room. In one embodiment of the invention, the
user may have to move her hand to a position that is, for example,
about 12 inches behind and slightly left of the device. This may
require that the user extend her arm out a little further than what
would be considered a normal or natural distance. After grabbing
the object, as discussed in step 408 below, the user moves her hand
to a position that is maybe 2-3 inches behind and to the right of
the device. This example illustrates that there is a depth
component in the hand tracking that is implemented to maintain the
in-line mediation performed by the device.
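As an illustrative sketch of tracking with horizontal, vertical, and depth components, and assuming only a single 2D camera, one crude approach estimates depth from the apparent size of the detected hand; the reference area below is an assumed calibration constant, not a disclosed value.

    class HandTracker:
        # Track the hand frame to frame as (x, y, z): x and y from the image centroid,
        # z estimated from apparent hand size (a larger area means the hand is closer).

        def __init__(self, reference_area=8000.0):
            self.reference_area = reference_area  # assumed hand area at a reference depth
            self.last_position = None

        def update(self, bounding_box):
            x, y, w, h = bounding_box
            cx, cy = x + w / 2.0, y + h / 2.0
            z = self.reference_area / max(float(w * h), 1.0)
            position = (cx, cy, z)
            delta = None
            if self.last_position is not None:
                delta = tuple(p - q for p, q in zip(position, self.last_position))
            self.last_position = position
            return position, delta
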
[0037] At step 408 the digital representation of the user's hand on
the device collides with or touches an object. This collision is
detected by comparing sensor data from the tracking sensor and
geometrical data from the 3D data repository. The user moves her
hand behind the device in a way that causes the digital
representation of her hand on the screen to collide with the
object, at which point she can grab, pick up, or otherwise
manipulate the object. The user's hand may be characterized as
colliding with the perceived object that is "floating" behind the
device, as described in FIG. 3A. In the described embodiment, in
order to maintain the 3D in-line mediation or visual coherency, the
user's eyes are looking straight at the middle of the screen. That
is, there is a vertical and horizontal alignment of the user's head
with the device and the 3D content. In another embodiment, the
user's face may also be tracked which may enable changes in the 3D
content images to reflect movement in the user's head (i.e.,
perspective).
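The collision test itself can be sketched, for illustration only, as a bounding-sphere comparison between the tracked hand position and an object's geometry in scene coordinates; the radii below are placeholders rather than disclosed values.

    import math

    def hand_collides_with_object(hand_pos, object_center, object_radius, hand_radius=0.5):
        # Sphere-versus-sphere test: a collision is reported when the distance between
        # the hand point and the object's center is within the sum of the two radii.
        dx = hand_pos[0] - object_center[0]
        dy = hand_pos[1] - object_center[1]
        dz = hand_pos[2] - object_center[2]
        distance = math.sqrt(dx * dx + dy * dy + dz * dz)
        return distance <= (object_radius + hand_radius)
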
[0038] In one embodiment, an "input-output coincidence" model is
used to close what is referred to in human-computer interaction as the
perception-action loop, where perception is what the user sees and
action is what the user does. This enables a user to see the
consequences of an interaction, such as touching a 3-D object,
immediately. As described above, a user's hand is aligned with or
in the same position as the 3-D object that is being manipulated.
That is, from the user's perspective, the hand is aligned with the
3-D object so that it looks like the user is lifting or moving a
3-D object as if it were a physical object. What the user sees
makes sense based on the action being taken by the user. In one
embodiment, the system provides tactile feedback to the user upon
detecting a collision between the user's hand and the 3-D
object.
[0039] At step 410 the image of the 3D scene is modified to reflect
the user's manipulation of the 3D object. If there is no
manipulation of a 3D object (and thus no object collision), the
image on the screen changes as the user moves her hand, as it does
when the user manipulates a 3D object. The changes in the 3D image
on the screen may be done using known methods for processing 3D
content data. These methods or techniques may vary depending on the
type of device, the source of the data, and other factors. The
process then repeats by returning to step 402 where the presence of
the user's hand is again detected. The process described in FIG. 4
is continuous in that the user's hand movement is tracked as long
as it is within the range of the tracking component. In the
described embodiment, the device is able to perform as a 3D in-line
mediator as long as the user's head or perspective is kept in line
with the device which, in turn, allows the user's hand movements
behind the device to be visually coherent with the hand movements
shown on the screen and vice versa. That is, the user moves her
hand in the physical world based on actions she wants to perform in
the digital 3D environment shown on the screen.
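For a grabbed object, one of the many possible modifications can be sketched as translating the object by the hand's frame-to-frame displacement; deformation, rotation, and the other manipulations mentioned above would be handled analogously, and this helper is an illustration rather than the disclosed processing method.

    def apply_grab_translation(object_position, hand_delta):
        # Move the grabbed object by the same displacement the tracked hand just made,
        # so the rendered scene follows the user's physical motion behind the device.
        if hand_delta is None:
            return object_position
        return tuple(p + d for p, d in zip(object_position, hand_delta))
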
[0040] FIG. 5 is a block diagram showing relevant components of a
device capable of functioning as a 3D in-line mediator in
accordance with one embodiment. Many of the components shown here
have been described above. A device 500 has a display component
(not shown) for displaying digital 3D content data 501 which may be
stored in mass storage or in a local cache (not shown) on device
500 or may be downloaded from the Internet or from another source.
A tracking sensor component 502 may include one or more
conventional (2D) cameras and 3D (depth) cameras and non-camera
peripherals. A 3-D camera may provide depth data which simplifies
gesture recognition by use of depth keying. In another embodiment,
a wide angle lens may be used in a camera which may require less
processing by an imaging system, but may produce more distortion.
Component 502 may also have other capabilities as described above,
such as infrared detection, optic flow, image differentiation,
redshift thermal imaging, spectral processing, and other
techniques. Tracking sensor component
502 is responsible for tracking the position of body parts within
the range of detection. This position data is transmitted to hand
tracking module 504 and to face tracking module 506, and each module
identifies the features relevant to it.
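Assuming placeholder objects for each of the modules named above (their method names here are illustrative, not part of this description), the data flow of FIG. 5 might be wired together roughly as follows.

    class InLineMediatorDevice:
        # Illustrative wiring of the FIG. 5 components: sensor 502 feeds the hand and
        # face tracking modules 504/506, whose output drives gesture detection 508,
        # collision detection 510, and (optionally) the tactile feedback controller 512.

        def __init__(self, tracking_sensor, hand_tracker, face_tracker,
                     gesture_detector, collision_detector, feedback_controller=None):
            self.tracking_sensor = tracking_sensor
            self.hand_tracker = hand_tracker
            self.face_tracker = face_tracker
            self.gesture_detector = gesture_detector
            self.collision_detector = collision_detector
            self.feedback_controller = feedback_controller

        def process_frame(self, scene):
            sample = self.tracking_sensor.read()
            hand_state = self.hand_tracker.update(sample)
            self.face_tracker.update(sample)
            gesture = self.gesture_detector.detect(hand_state)
            hit = self.collision_detector.check(hand_state, scene)
            if hit and self.feedback_controller is not None:
                self.feedback_controller.pulse()
            if hit and gesture is not None:
                scene.apply(gesture, hit)
            return scene
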
[0041] Hand tracking module 504 identifies features of the user's
hand positions, including the positions of the fingers, wrist, and
arm. It determines the location of these body parts in the 3D
environment. Data from module 504 goes to two components related to
hand and arm position: gesture detection module 508 and hand
collision detection module 510. In one embodiment, a user "gesture"
results in a modification of 3D content 501. A gesture may include
lifting, holding, squeezing, pinching, or rotating a 3D object.
These actions typically result in some type of modification of the
object in the 3D environment. A modification of an object may
include a change in its location (lifting or turning) without there
being an actual deformation or change in shape of the object. The
gesture detection data may be applied directly to the graphics data
representing 3D content 501.
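One way to picture the path from a detected gesture to a modification of 3D content 501 is a simple dispatch table; the gesture names follow the examples above, while the object methods invoked are hypothetical placeholders.

    GESTURE_HANDLERS = {
        "lift":    lambda obj: obj.translate(0.0, 1.0, 0.0),  # change location, no deformation
        "rotate":  lambda obj: obj.rotate_y(15.0),            # change orientation
        "squeeze": lambda obj: obj.scale(0.9),                # deform the object
        "pinch":   lambda obj: obj.scale(0.95),
    }

    def apply_gesture(gesture_name, scene_object):
        handler = GESTURE_HANDLERS.get(gesture_name)
        if handler is not None:
            handler(scene_object)  # modifies the 3D content in place
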
[0042] In another embodiment, tracking sensor component 502 may
also track the user's face. In this case, face tracking data is
transmitted to face tracking module 506. Face tracking may be
utilized in cases where the user is not vertically aligned with the
device and the perceived object (i.e., the user's head is not looking
directly at the middle of the screen).
[0043] In another embodiment, data from hand collision detection
module 510 may be transmitted to a tactile feedback controller 512,
which is connected to one or more actuators 514 which are external
to device 500. In this embodiment, the user may receive haptic
feedback when the user's hand collides with a 3D object. Generally,
it is preferred that actuators 514 be as unobtrusive as possible.
In one embodiment, they are vibrating wristbands, which may be
wired or wireless. Using wristbands allows for bare hand
manipulation of 3D content as described above. Tactile feedback
controller 512 receives a signal that there is a collision or
contact and causes tactile actuators 514 to provide a physical
sensation to the user. For example, with vibrating wristbands, the
user's wrist will sense a vibration or similar physical sensation
indicating contact with the 3-D object.
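As a sketch only, the tactile feedback path could look like the following, where the actuator object standing in for a vibrating wristband and its set_vibration() method are assumptions rather than parts of this description.

    import time

    class WristbandFeedbackController:
        # Pulse a (wired or wireless) vibrating wristband briefly when the
        # hand/object collision signal arrives from the collision detection module.

        def __init__(self, actuator, pulse_seconds=0.15):
            self.actuator = actuator
            self.pulse_seconds = pulse_seconds

        def on_collision(self):
            self.actuator.set_vibration(True)
            time.sleep(self.pulse_seconds)
            self.actuator.set_vibration(False)
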
[0044] As is evident from the figures and the various embodiments,
the present invention enables a user to interact with digital 3D
content in a natural and immersive way by enabling visual
coherency, thereby creating an immersive volumetric interaction
with the 3D content. In one embodiment, a user uploads or executes
3D content onto a mobile computing device, such as a cell phone.
This 3D content may be a virtual world that the user has visited
using a browser on the mobile device (e.g., Second Life or any
other site that provides virtual world content). Other examples
include movies, video games, online virtual cities, medical imaging
(e.g., examining MRIs), modeling and prototyping, information
visualization, architecture, tele-immersion and collaboration, and
geographic information systems (e.g., Google Earth). The user holds
the display of the device upright at a comfortable distance in
front of the user's eyes, for example at 20-30 centimeters. The
display of the mobile device is used as a window into the virtual
world. Using the mobile device as an in-line mediator between the
user and the user's hand, the user is able to manipulate 3D objects
shown on the display by reaching behind the display of the device
and making hand gestures and movements around a perceived object
behind the display. The user sees the gestures and movements on the
display and the 3D object that they are affecting.
[0045] As discussed above, one aspect of creating an immersive and
natural user interaction with 3D content using a mobile device is
enabling the user to have bare-hand interaction with objects in the
virtual world. That is, allowing the user to manipulate and "touch"
digital 3D objects using the mobile device and not requiring the
user to use any peripheral devices, such as gloves, finger sensors,
motion detectors, and the like.
[0046] FIGS. 6A and 6B illustrate a computing system 600 suitable
for implementing embodiments of the present invention. FIG. 6A
shows one possible physical form of the computing system. Of
course, the computing system may have many physical forms including
an integrated circuit, a printed circuit board, a small handheld
device (such as a mobile telephone, handset or PDA), a personal
computer or a super computer. Computing system 600 includes a
monitor 602, a display 604, a housing 606, a disk drive 608, a
keyboard 610 and a mouse 612. Disk 614 is a computer-readable
medium used to transfer data to and from computer system 600.
[0047] FIG. 6B is an example of a block diagram for computing
system 600. Attached to system bus 620 are a wide variety of
subsystems. Processor(s) 622 (also referred to as central
processing units, or CPUs) are coupled to storage devices including
memory 624. Memory 624 includes random access memory (RAM) and
read-only memory (ROM). As is well known in the art, ROM acts to
transfer data and instructions uni-directionally to the CPU and RAM
is used typically to transfer data and instructions in a
bi-directional manner. Both of these types of memories may include
any suitable type of the computer-readable media described below. A
fixed disk 626 is also coupled bi-directionally to CPU 622; it
provides additional data storage capacity and may also include any
of the computer-readable media described below. Fixed disk 626 may
be used to store programs, data and the like and is typically a
secondary storage medium (such as a hard disk) that is slower than
primary storage. It will be appreciated that the information
retained within fixed disk 626, may, in appropriate cases, be
incorporated in standard fashion as virtual memory in memory 624.
Removable disk 614 may take the form of any of the
computer-readable media described below.
[0048] CPU 622 is also coupled to a variety of input/output devices
such as display 604, keyboard 610, mouse 612 and speakers 630. In
general, an input/output device may be any of: video displays,
track balls, mice, keyboards, microphones, touch-sensitive
displays, transducer card readers, magnetic or paper tape readers,
tablets, styluses, voice or handwriting recognizers, biometrics
readers, or other computers. CPU 622 optionally may be coupled to
another computer or telecommunications network using network
interface 640. With such a network interface, it is contemplated
that the CPU might receive information from the network, or might
output information to the network in the course of performing the
above-described method steps. Furthermore, method embodiments of
the present invention may execute solely upon CPU 622 or may
execute over a network such as the Internet in conjunction with a
remote CPU that shares a portion of the processing.
[0049] Although illustrative embodiments and applications of this
invention are shown and described herein, many variations and
modifications are possible which remain within the concept, scope,
and spirit of the invention, and these variations would become
clear to those of ordinary skill in the art after perusal of this
application. Accordingly, the embodiments described are
illustrative and not restrictive, and the invention is not to be
limited to the details given herein, but may be modified within the
scope and equivalents of the appended claims.
* * * * *