U.S. patent application number 12/421,363 was filed with the patent office on April 9, 2009, and published on March 4, 2010, as publication number 20100053151, for in-line mediation for manipulating three-dimensional content on a display device. This patent application is currently assigned to SAMSUNG ELECTRONICS CO., LTD. Invention is credited to Francisco Imai, Seung Wook Kim, and Stefan Marti.

Application Number: 12/421,363
Publication Number: 20100053151
Family ID: 41724668
Publication Date: 2010-03-04
United States Patent Application: 20100053151
Kind Code: A1
Inventors: Marti; Stefan; et al.
Publication Date: March 4, 2010

IN-LINE MEDIATION FOR MANIPULATING THREE-DIMENSIONAL CONTENT ON A DISPLAY DEVICE
Abstract
A user holds the mobile device upright or sits in front of a
nomadic or stationary device, views the monitor from a suitable
distance, and physically reaches behind the device with her hand to
manipulate a 3D object displayed on the monitor. The device
functions as a 3D in-line mediator that provides visual coherency
to the user when she reaches behind the device to use hand gestures
and movements to manipulate a perceived object behind the device
and sees that the 3D object on the display is being manipulated.
The perceived object that the user manipulates behind the device
with bare hands corresponds to the 3D object displayed on the
device. The visual coherency arises from the alignment of the
user's head or eyes, the device, and the 3D object. The user's hand
may be represented as an image of the actual hand or as a
virtualized representation of the hand, such as part of an
avatar.
Inventors: Marti; Stefan (San Francisco, CA); Kim; Seung Wook (Santa Clara, CA); Imai; Francisco (Mountain View, CA)
Correspondence Address: Beyer Law Group LLP, P.O. BOX 1687, Cupertino, CA 95015-1687, US
Assignee: SAMSUNG ELECTRONICS CO., LTD (Suwon City, KR)
Family ID: 41724668
Appl. No.: 12/421363
Filed: April 9, 2009
Related U.S. Patent Documents

Application Number: 61/093,651
Filing Date: Sep 2, 2008
Current U.S. Class: 345/419; 340/407.1; 345/629; 348/77; 348/E7.085
Current CPC Class: G09G 2356/00 20130101; G06F 3/017 20130101; H04M 2203/359 20130101; G06F 3/1446 20130101; G06F 3/04815 20130101; G06T 2219/2021 20130101; G09G 3/002 20130101; G06T 19/006 20130101; G06F 3/147 20130101; G06T 2219/2016 20130101; G06F 3/011 20130101; G06F 3/016 20130101; H04N 13/279 20180501
Class at Publication: 345/419; 348/77; 340/407.1; 345/629; 348/E07.085
International Class: G06T 15/00 20060101 G06T015/00; H04N 7/18 20060101 H04N007/18; H04B 3/36 20060101 H04B003/36
Claims
1. A method of detecting manipulation of a digital 3D object
displayed on a device having a front side with a display and a back
side, the method comprising: detecting a hand within a specific
area of the back side of the device, the back side having a
sensor; displaying the hand on the display; tracking movement of
the hand within the specific area of the back side, wherein said
movement is caused by a user intending to manipulate the displayed
3D object; detecting a collision between the displayed hand and the
displayed 3D object; and modifying an image of the 3D object
displayed on the device, wherein the device is a 3D in-line
mediator between the user and the 3D object.
2. A method as recited in claim 1 further comprising: detecting a
hand gesture within the specific area.
3. A method as recited in claim 1 wherein modifying an image of the
3D object further comprises: deforming the image of the 3D
object.
4. A method as recited in claim 1 wherein modifying an image of the
3D object further comprises: moving the image of the 3D object.
5. A method as recited in claim 1 further comprising: displaying
the modified image on the device.
6. A method as recited in claim 1 wherein the user reaches behind
the device to manipulate a perceived object corresponding to the 3D
object, such that the hand is within the specific area of the back
side of the device.
7. A method as recited in claim 6 further comprising: providing the
user with visual coherency when the user reaches behind the
device.
8. A method as recited in claim 1 wherein tracking movement of the
hand further comprises: processing depth data of said hand
movement.
9. A method as recited in claim 1 further comprising: executing
tracking software.
10. A method as recited in claim 1 wherein the sensor is a tracking
component that faces outward from the back side of the device and
wherein the sensor is a camera.
11. A method as recited in claim 1 wherein displaying the hand
further comprises: displaying a composited image of the hand on
the display.
12. A method as recited in claim 1 wherein displaying the hand
further comprises: displaying a virtualized image of the hand on
the display.
13. A method as recited in claim 1 further comprising: providing
haptic feedback to the hand when a collision is detected between
the displayed hand and the 3D object.
14. A method as recited in claim 1 wherein there is no contact
between the hand and the back side of the device or with the
display.
15. A device having a display, the device comprising: a processor;
a memory storing digital 3D content data; a tracking sensor
component for tracking movement of an object in proximity of the
device, wherein the tracking sensor component is on a back side of
the device facing away from a user; a hand tracking module for
processing movement data related to a user hand; and a hand-3D
object collision module for detecting a collision between the user
hand and a 3D object.
16. A device as recited in claim 15 further comprising: a face
tracking sensor component for tracking face movement in proximity
of a front side of a device; and a face tracking module for
processing face movement data related to user face movement in
front of the device.
17. A device as recited in claim 15 further comprising: a hand
gesture detection module for detecting user hand gestures made
within range of the tracking sensor component.
18. A device as recited in claim 15 further comprising: a tactile
feedback controller for providing tactile feedback to the user
hand.
19. A device as recited in claim 15 wherein a tracking sensor
component is a camera-based component.
20. A device as recited in claim 15 wherein a tracking sensor
component is one of an image differentiator, infrared detector,
optic flow component, and spectral processor.
21. A device as recited in claim 15 wherein the tracking sensor
component tracks movements of the hand when the user moves the hand
behind the device within the range of the tracking sensor
component.
22. A device as recited in claim 15 wherein the device is one of a
mobile device, a nomadic device, and a stationary device.
23. A device as recited in claim 15 further comprising a network
interface for connecting to a network to receive digital 3D content
data.
24. An apparatus for manipulating digital 3D content, the apparatus
having a front side with a display and a back side, the apparatus
comprising: means for detecting a hand within a specific area of
the back side of the apparatus, the back side having a sensor;
means for displaying the hand on the apparatus; means for tracking
movement of the hand within the specific area of the back side,
wherein said movement is caused by a user intending to manipulate a
displayed 3D object; means for detecting a collision between the
displayed hand and the displayed 3D object; and means for modifying
an image of the 3D object displayed on the apparatus, wherein the
apparatus is a 3D in-line mediator between the user and the 3D
object.
25. An apparatus as recited in claim 24 further comprising: means
for detecting a hand gesture within the specific area.
26. An apparatus as recited in claim 24 wherein means for modifying
an image of the 3D object further comprises: means for moving the
image of the 3D object.
27. An apparatus as recited in claim 24 further comprising: means
for displaying the modified image on the apparatus.
28. An apparatus as recited in claim 24 wherein the user reaches
behind the apparatus to manipulate a perceived object corresponding
to the 3D object, such that the hand is within the specific area of
the back side of the apparatus.
29. An apparatus as recited in claim 24 wherein means for tracking
movement of the hand further comprises: means for processing depth
data of said hand movement.
30. An apparatus as recited in claim 24 wherein the sensor is a
tracking component that faces outward from the back side of the
apparatus and wherein the sensor is a camera.
31. An apparatus as recited in claim 24 wherein means for
displaying the hand further comprises: means for displaying a
composited image of the hand on the apparatus.
32. An apparatus as recited in claim 24 further comprising: means
for providing haptic feedback to the hand when a collision is
detected between the displayed hand and the 3D object.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority of U.S. provisional patent
application No. 61/093,651, filed Sep. 2, 2008, entitled "GESTURE AND
MOTION-BASED NAVIGATION AND INTERACTION WITH THREE-DIMENSIONAL
VIRTUAL CONTENT ON A MOBILE DEVICE," which is hereby
incorporated by reference.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The present invention relates generally to hardware and
software for user interaction with digital three-dimensional data.
More specifically, it relates to devices having displays and to
human interaction with data displayed on the devices.
[0004] 2. Description of the Related Art
[0005] The amount of three-dimensional content available on the
Internet and in other contexts is increasing at a rapid pace.
Consumers are getting more accustomed to hearing about "3D" in
various contexts, such as movies, video games, and online virtual
cities. Three-dimensional content may be found in medical imaging
(e.g., examining MRIs), modeling and prototyping, information
visualization, architecture, tele-immersion and collaboration,
geographic information systems (e.g., Google Earth), and in other
fields. Current systems, including computers and cell phones and, more
generally, content display systems (e.g., TVs), fall short of taking
advantage of 3D content because they do not provide an immersive user
experience. For example, they do not provide intuitive, natural,
and unobtrusive interaction with 3-D objects.
[0006] With respect to mobile devices, such devices presently do not
provide users seeking interaction with digital 3D content with a
natural, intuitive, and immersive experience. Mobile device users are not able to make
gestures or manipulate 3D objects using bare hands in a natural and
intuitive way.
[0007] Although some displays allow users to manipulate 3D content
with bare hands in front of the display (monitor), current display
systems that are able to provide some interaction with 3D content
require inconvenient or intrusive peripherals that make the
experience unnatural to the user. For example, some current methods
of providing tactile or haptic feedback require vibro-tactile
gloves. In other examples, current methods of rendering 3-D content
include stereoscopic displays (requiring the user to wear a pair of
special glasses), auto-stereoscopic displays (based on lenticular
lenses or parallax barriers that cause eye strain and headaches as
usual side effects), head-mounted displays (requiring heavy head
gear or goggles), and volumetric displays, such as those based on
oscillating mirrors or screens (which do not allow bare hand direct
manipulation of 3-D content).
[0008] In addition mobile device displays, such as displays on cell
phones, only allow for a limited field of view (FOV). This is due
to the fact that the mobile device display size is generally
limited by the size of the device. For example, the size of a
non-projection display cannot be larger than the mobile device that
contains the display. Therefore, existing solutions for mobile
displays (which are generally light-emitting displays) limit the
immersive experience for the user. Furthermore, it is presently
difficult to navigate through virtual worlds and 3-D content via a
first-person view on mobile devices, which is one aspect of
creating an immersive experience. Mobile devices do not provide
satisfactory user awareness of virtual surroundings, another
important aspect of creating an immersive experience.
[0009] Some display systems require a user to reach behind the
monitor. However, in these systems the user's hands must physically
touch the back of the monitor, and the interaction is only intended to
manipulate 2-D images, such as moving images on the screen.
SUMMARY OF THE INVENTION
[0010] A user is able to use a mobile device having a display, such
as a cell phone or a media player, to view and manipulate 3D
content displayed on the device by reaching behind the device and
manipulating a perceived 3D object. The user's eyes, device, and a
perceived 3D object are aligned or "in-line," such that the device
performs as a type of in-line mediator between the user and the
perceived 3D object. This alignment results in a visual coherency
to the user when reaching behind the device to make hand gestures
and movements to manipulate the 3D content. That is, the user's
hand movements behind the device are at a natural and intuitive
distance and are aligned with the 3D object displayed on the device
monitor so that the user has a natural visual impression that she
is actually handling the 3D object shown on the monitor.
[0011] One embodiment of the present invention is a method of
detecting manipulation of a digital 3D object displayed on a device
having a front side with a display monitor facing the user and a
back side having a sensor facing away from the user. A hand or
other object may be detected within a specific area of the back
side of the device having the sensor, such as a camera. The hand is
displayed on the monitor and its movements within a specific area
of the back side of the device are tracked. The movements are the
result of the user intending to manipulate the displayed 3D object
and are made by the user in manipulating a perceived 3D object behind
the device, but without having to physically touch the backside of
the device. A collision between the displayed hand and the
displayed 3D object may be detected by the device resulting in a
modification of the image of the 3D object displayed on the device.
In this manner the device functions as a 3D in-line mediator
between the user and the 3D object.
[0012] In another embodiment, a display device includes a processor
and a memory component storing digital 3D content data. The device
also includes a tracking sensor component for tracking movement of
an object that is in proximity of the device. In one embodiment,
the tracking sensor component faces the back of the device (away
from the user) and is able to detect movements and gestures of a
hand of a user who reaches behind the device. A hand tracking
module processes movement data from the tracking sensor and a
collision detection module detects collisions between a user's hand
and a 3D object.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] References are made to the accompanying drawings, which form
a part of the description and in which are shown, by way of
illustration, particular embodiments:
[0014] FIG. 1A is an illustration of a user using a mobile device
as a 3D in-line mediator to manipulate digital 3D content displayed
on the device in accordance with one embodiment;
[0015] FIG. 1B is an illustration of a user using a laptop computer
as a 3D in-line mediator to manipulate digital 3D content displayed
on device in accordance with one embodiment;
[0016] FIG. 1C is an illustration of a user using a desktop
computer as a 3D in-line mediator to manipulate digital 3D content
displayed on a desktop monitor in accordance with one
embodiment;
[0017] FIGS. 2A and 2B are top views illustrating 3D in-line
mediation;
[0018] FIG. 2C is an illustration of a side perspective of user
shown in FIG. 2A;
[0019] FIG. 3A is a more detailed top view of a user utilizing a
mobile device as a 3D in-line mediator for manipulating digital 3D
content in accordance with one embodiment;
[0020] FIG. 3B shows a scene that a user sees when facing a device
and when reaching behind the device;
[0021] FIG. 4 is a flow diagram of a process of enabling in-line
mediation in accordance with one embodiment;
[0022] FIG. 5 is a block diagram showing relevant components of a
device capable of functioning as a 3D in-line mediator in
accordance with one embodiment; and
[0023] FIGS. 6A and 6B illustrate a computer system suitable for
implementing embodiments of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0024] Methods and systems for using a display device as a
three-dimensional (3D) in-line mediator for interacting with
digital 3D content displayed on the device are described in the
various figures. The use of a display device as an in-line mediator
enables intuitive bare hand manipulation of digital 3D content by
allowing a user to see the direct effect of the user's handling of
the 3D content on the display by reaching behind the display
device. In this manner, the device display functions as an in-line
mediator between the user and the 3D content, enabling a type of
visual coherency for the user. That is, the 3D content is visually
coherent or in-line from the user's perspective. The user's hand,
or a representation of it, is shown on the display, maintaining the
visual coherency. Furthermore, by reaching behind the device, the
user's view of the 3D content on the display is not obstructed by
the user's arm or hand.
[0025] FIG. 1A is an illustration of a user using a mobile device
as a 3D in-line mediator to manipulate digital 3D content displayed
on the device in accordance with one embodiment. The term "3D
in-line mediation" refers to the user's eyes, the 3D content on the
display, and user's hands behind the display (but not touching the
back of the display) being aligned in real 3D space. A user 102
holds a mobile device 104 with one hand 105 and reaches behind
device 104 with another hand 106. One or more sensors (not shown),
collectively referred to as a sensor component, on device 104 face
away from user 102 in the direction of user's hand behind device
104. User's hand 106 is detected and a representation 108 of the
hand is displayed on a device monitor 109 displaying a 3D scene
107. As discussed in greater detail below, representation 108 may
be an unaltered (actual) image of the user's hand that is
composited onto scene 107 or may be a one-to-one mapping of real
hand 106 to a virtual representation (not shown), such as an avatar
hand which becomes part of 3D or virtual scene 107. The term "hand"
as used herein may include, in addition to the user's hand,
fingers, and thumb, the user's wrist and forearm, all of which may
be detected by the sensor component.
[0026] Mobile device 104 may be a cell phone, a media player (e.g.,
MP3 player), portable gaming device, or any type of smart handset
device having a display. It is assumed that the device is
IP-enabled or capable of connecting to a suitable network to access
3D content over the Internet. However, the various embodiments
described below do not necessarily require that the device be able
to access a network. For example, the 3D content displayed on the
device may be resident on a local storage of the device, such as on
a hard disk or other mass storage component or on a local cache.
The sensor component on mobile device 104 and the accompanying
sensor software may be one or more of various types of sensors, a
typical one being a conventional camera. Implementations of various
sensor components are described below. Although the methods and
systems of the various embodiments are described using a mobile
device, they may equally apply to nomadic devices, such as laptops
and netbook computers (i.e., devices that are portable), and to
stationary devices, such as desktop computers, workstations, and
the like, as shown in FIGS. 1B and 1C.
[0027] FIG. 1B is an illustration of a user 110 using a laptop
computer 112 or similar nomadic computing device (e.g., netbook
computer, mini laptop, etc.) as a 3D in-line mediator to manipulate
digital 3D content displayed on device 112 in accordance with one
embodiment. A laptop computer has a sensor component (not shown)
facing away from user 110. The sensor component may be an internal
component of laptop 112 or a peripheral 113 attached to it with
associated software installed on laptop 112. Both of user 110's hands
114 and 115 are composited onto a 3D scene 116 on display 117.
[0028] Similarly, FIG. 1C is an illustration of a user 118 using a
desktop computer monitor 119 as a 3D in-line mediator to manipulate
digital 3D content displayed on desktop monitor 119 in accordance
with one embodiment. As a practical matter, it is preferable that
monitor 119 be some type of flat panel monitor, such as an LCD or
plasma monitor, so that the user is physically able to reach behind
the monitor. A tracking sensor component 120 detects a user's hand
122 behind desktop monitor 119. In this example, hand 122 is mapped
to a digital representation, such as an avatar hand 124. User 118
may also move his other hand behind monitor 119 as in FIG. 1B.
[0029] FIGS. 2A and 2B are top views illustrating 3D in-line
mediation. They show a user 200 holding a mobile device 202 with
her left hand 204. User 200 extends her right hand 206 behind
device 202. An area 208 behind mobile device 202, indicated by the
solid angled lines, is a so-called virtual 3D space and the area
210 surrounding device 202 is the physical or real environment or
world ("RW") that user 200 is in. A segment of the user's hand 206
in virtual space 208 is shaded. A sensor component (not shown)
detects the presence and movement in area 208. User 200 can move
hand 206 around in area 208 to manipulate 3D objects shown on
device 202. FIG. 2B shows a user 212 reaching behind a laptop 214
with both hands 216 and 218.
[0030] The sensor component, also referred to as a tracking
component, may be implemented using various types of sensors. These
sensors may be used to detect the presence of a user's hand (or any
object) behind the mobile device's monitor. In a preferred
embodiment, a standard or conventional mobile device or cell phone
camera is used to sense the presence of a hand and its movements or
gestures. Image differentiation or optic flow may also be used to
detect and track hand movement. In other embodiments, a
conventional camera may be replaced with infrared detection
components to perform hand detection and tracking. For example, a
mobile device camera facing away from the user and that is IR
sensitive (or has its IR filter removed), possibly in combination
with additional IR illumination (e.g., LED), may look for the
brightest object within the range of the camera, which will likely
be the user's hand. Dedicated infrared sensors with IR illumination
may also be used. In another embodiment, redshift thermal imaging
may be used. This option provides passive optical components that
redshift a standard CMOS imager to be able to detect long
wavelength and thermal infrared radiation. Another type of sensor
may be ultrasonic gesture detection sensors. Sensor software
options include off-the-shelf gesture recognition tools, such as
software for detecting hands using object segmentation and/or optic
flow. Other options include spectral imaging software for detecting
skin tones, pseudo-thermal imaging, and 3D depth cameras using
time-of-flight.
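By way of a minimal illustrative sketch only, the brightest-object heuristic mentioned above could be realized with a conventional image-processing library such as OpenCV; the blur kernel size and brightness threshold below are assumed values rather than parameters given in this description.

    import cv2

    def brightest_point(ir_frame_gray, min_brightness=200):
        # Smooth first so a single hot pixel cannot win; the kernel size is an assumption.
        blurred = cv2.GaussianBlur(ir_frame_gray, (41, 41), 0)
        _, max_val, _, max_loc = cv2.minMaxLoc(blurred)
        # Return the (x, y) location of the brightest region, taken here as a rough
        # proxy for an IR-illuminated hand, or None if nothing bright enough is seen.
        return max_loc if max_val >= min_brightness else None
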
[0031] FIG. 2C is an illustration of a side perspective of user 200
shown initially in FIG. 2A. User hand 206 is in front of a sensor
component of device 202, the sensor facing away from user 200. User
200 holds device 202 with hand 204. Hand 206 is behind device 202
in virtual 3D space 208. Space 208 is the space in proximity of the
backside of device 202 that is tracked by the backward facing
sensor. Real or physical world 210 is outside the proximity or
tracked area of device 202. Gestures and movements of hand 206 in
3D space 208 are made by user 200 in order to manipulate a
perceived 3D object (not shown) on the display of device 202. In
another scenario, the gestures may not be for manipulating an
object but simply for the purpose of making a gesture (e.g.,
waving, pointing, etc.) in a virtual world environment.
[0032] FIG. 3A is a more detailed top view of a user utilizing a
mobile device as a 3D in-line mediator for manipulating digital 3D
content in accordance with one embodiment. It shows a user's head
303, a "perceived" digital 3D object 304 and a display device 306
as being in-line or aligned, with display device 306 functioning as
the in-line mediator. It also shows a user's hand 308 reaching
behind device 306. FIG. 3B shows a scene 310 that user head 303
sees when facing device 306 and when reaching behind device 306.
The user sees digital 3D object 312 on the screen and user hand 308
or a representation of it touching or otherwise manipulating object
312. It is helpful to note that although the figures only show a
user touching an object or making gestures, the user is also able
to manipulate a digital 3D object, such as by touching, lifting,
holding, moving, pushing, pulling, dropping, throwing, rotating,
deforming, bending, stretching, compressing, squeezing, or pinching it.
When the user reaches behind the monitor or device, she can move
her hand(s) to manipulate 3D objects she sees on the display. As
explained below, there is a depth component. If the 3D object is in
front of the 3D scene she is viewing, she will not have to reach
far behind the device/monitor. If the object is further back in the
3D scene, the user may have to physically reach further behind the
device/monitor in order to touch (or collide with) the 3D
object.
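As a sketch of the depth relationship just described, assuming for illustration a simple linear mapping between physical reach behind the device and scene depth; the centimeter and scene-unit ranges are assumptions, not values from this description.

    def physical_to_virtual_depth(hand_depth_cm,
                                  reach_range_cm=(5.0, 40.0),
                                  scene_depth_range=(0.0, 10.0)):
        # Map how far the hand is behind the device (in centimeters) to a depth
        # coordinate inside the displayed 3D scene, clamping to the tracked region.
        near_cm, far_cm = reach_range_cm
        near_z, far_z = scene_depth_range
        clamped = max(near_cm, min(far_cm, hand_depth_cm))
        t = (clamped - near_cm) / (far_cm - near_cm)
        return near_z + t * (far_z - near_z)

    # For example, a hand 12 cm behind the device maps to a point 2.0 units into the scene.
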
[0033] FIG. 4 is a flow diagram of a process of enabling in-line
mediation in accordance with one embodiment. The process described
in FIG. 4 begins after the user has powered on the device, which
may be mobile, nomadic, or stationary. A tracking component has
also been activated. The device displays 3D content, such as an
online virtual world or any other form of 3D content, examples of
which are provided above. The user directly faces the display; that
is, sits squarely in front of the laptop or desktop monitor or
holds the cell phone directly in front of her. There may be a 3D
object displayed on the screen that the user wants to manipulate
(e.g., pick up a ball, move a chair, etc.) or there may be a 3D
world scene in which the user wants to perform a hand gesture or
movement (e.g., wave to 3D person or an avatar). Other examples
that do not involve online 3D content may include moving or
changing the orientation of 3D medical imaging data, playing a 3D
video game, interacting with 3D content, such as a movie or show,
and so on.
[0034] The user begins by moving a hand behind the device
(hereafter, for ease of illustration, the term "device" may refer to
mobile device screens and laptop/desktop monitors). At step 402 a
tracking component detects the presence of the user's hand. There
are various ways this can be done. One conventional way is by
detecting the skin tone of the user's hand. As described above,
there are numerous types of tracking components or sensors that may
be used. Which one is most suitable will likely depend on the
features and capabilities of the device (i.e., mobile, nomadic,
stationary, etc.). A typical cell phone camera is capable of
detecting the presence of a human hand. An image of the hand (or
hands) is transmitted to a compositing component.
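Purely as an illustrative sketch, skin-tone detection of this kind is often done by thresholding in HSV color space; the color bounds and minimum area below are assumptions rather than values disclosed here, and the OpenCV calls are only one possible realization.

    import cv2
    import numpy as np

    # Rough HSV bounds for skin tones; a real system would calibrate per user and lighting.
    SKIN_LOWER = np.array([0, 40, 60], dtype=np.uint8)
    SKIN_UPPER = np.array([25, 180, 255], dtype=np.uint8)

    def detect_hand_by_skin_tone(frame_bgr, min_area=2000):
        # Return (mask, bounding_box) for the largest skin-colored region,
        # or None if no sufficiently large region is present in the frame.
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, SKIN_LOWER, SKIN_UPPER)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        candidates = [c for c in contours if cv2.contourArea(c) >= min_area]
        if not candidates:
            return None
        hand = max(candidates, key=cv2.contourArea)
        return mask, cv2.boundingRect(hand)
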
[0035] At step 404 the hand is displayed on the screen. The user
sees either an unaltered view of her hand (not including the
background behind and around the hand) or an altered representation
of the hand. If an image of the user's hand is displayed, known
compositing techniques may be used. For example, some techniques
may involve combining two video sources: one for the 3D content and
another representing video images of the user's hand. Other
techniques for overlaying or compositing the images of the hand
over the 3D content data may be used and which technique is most
suitable will likely depend on the type of device. If the user's
hand is mapped to an avatar hand or other digital representation,
software from the 3D content provider or other conventional
software may be used to perform a mapping of the user hand images
to an avatar image, such as a robotic hand. Thus, after step 404, a
representation of the user's stationary hand can be seen on the
device. That is, the hand's presence has been detected and it is being
represented on the device.
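A minimal sketch of such compositing, assuming the hand mask from the detection stage and two same-sized RGB images (the rendered 3D scene and the camera frame); this is one simple way to overlay the hand without its background, not necessarily the technique a given device would use.

    import numpy as np

    def composite_hand_onto_scene(scene_rgb, camera_rgb, hand_mask):
        # Wherever the mask marks hand pixels, show the camera image; everywhere else,
        # keep the rendered 3D scene, so the hand appears in front of the content.
        out = scene_rgb.copy()
        hand_pixels = hand_mask.astype(bool)
        out[hand_pixels] = camera_rgb[hand_pixels]
        return out
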
[0036] At step 406 the user starts moving the hand, either by
moving it up, down, left, right, or inward or outward (relative to
the device) or by gesturing (or both). The initial position of the
hand and its subsequent movement can be described in terms of x, y,
and z coordinates. The tracking component begins tracking hand
movement and gesturing, which has horizontal, vertical, and depth
components. For example, a user may be viewing a 3D virtual world
room on the device and wants to move an object that is in the far
left corner of the room (which has a certain depth) to the near
right corner of the room. In one embodiment of the invention, the
user may have to move her hand to a position that is, for example,
about 12 inches behind and slightly left of the device. This may
require that the user extend her arm out a little further than what
would be considered a normal or natural distance. After grabbing
the object, as discussed in step 408 below, the user moves her hand
to a position that is maybe 2-3 inches behind and to the right of
the device. This example illustrates that there is a depth
component in the hand tracking that is implemented to maintain the
in-line mediation performed by the device.
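As an illustrative sketch of tracking with horizontal, vertical, and depth components, and assuming only a single 2D camera, one crude approach estimates depth from the apparent size of the detected hand; the reference area below is an assumed calibration constant, not a disclosed value.

    class HandTracker:
        # Track the hand frame to frame as (x, y, z): x and y from the image centroid,
        # z estimated from apparent hand size (a larger area means the hand is closer).

        def __init__(self, reference_area=8000.0):
            self.reference_area = reference_area  # assumed hand area at a reference depth
            self.last_position = None

        def update(self, bounding_box):
            x, y, w, h = bounding_box
            cx, cy = x + w / 2.0, y + h / 2.0
            z = self.reference_area / max(float(w * h), 1.0)
            position = (cx, cy, z)
            delta = None
            if self.last_position is not None:
                delta = tuple(p - q for p, q in zip(position, self.last_position))
            self.last_position = position
            return position, delta
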
[0037] At step 408 the digital representation of the user's hand on
the device collides with or touches an object. This collision is
detected by comparing sensor data from the tracking sensor and
geometrical data from the 3D data repository. The user moves her
hand behind the device in a way that causes the digital
representation of her hand on the screen to collide with the
object, at which point she can grab, pick up, or otherwise
manipulate the object. The user's hand may be characterized as
colliding with the perceived object that is "floating" behind the
device, as described in FIG. 3A. In the described embodiment, in
order to maintain the 3D in-line mediation or visual coherency, the
user's eyes are looking straight at the middle of the screen. That
is, there is a vertical and horizontal alignment of the user's head
with the device and the 3D content. In another embodiment, the
user's face may also be tracked which may enable changes in the 3D
content images to reflect movement in the user's head (i.e.,
perspective).
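The collision test itself can be sketched, for illustration only, as a bounding-sphere comparison between the tracked hand position and an object's geometry in scene coordinates; the radii below are placeholders rather than disclosed values.

    import math

    def hand_collides_with_object(hand_pos, object_center, object_radius, hand_radius=0.5):
        # Sphere-versus-sphere test: a collision is reported when the distance between
        # the hand point and the object's center is within the sum of the two radii.
        dx = hand_pos[0] - object_center[0]
        dy = hand_pos[1] - object_center[1]
        dz = hand_pos[2] - object_center[2]
        distance = math.sqrt(dx * dx + dy * dy + dz * dz)
        return distance <= (object_radius + hand_radius)
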
[0038] In one embodiment, an "input-output coincidence" model is
used to close what is referred to in human-computer interaction as the
perception-action loop, where perception is what the user sees and
action is what the user does. This enables a user to see the
consequences of an interaction, such as touching a 3-D object,
immediately. As described above, a user's hand is aligned with or
in the same position as the 3-D object that is being manipulated.
That is, from the user's perspective, the hand is aligned with the
3-D object so that it looks like the user is lifting or moving a
3-D object as if it were a physical object. What the user sees
makes sense based on the action being taken by the user. In one
embodiment, the system provides tactile feedback to the user upon
detecting a collision between the user's hand and the 3-D
object.
[0039] At step 410 the image of the 3D scene is modified to reflect
the user's manipulation of the 3D object. If there is no
manipulation of a 3D object (and thus no object collision), the
image on the screen changes as the user moves her hand, as it does
when the user manipulates a 3D object. The changes in the 3D image
on the screen may be done using known methods for processing 3D
content data. These methods or techniques may vary depending on the
type of device, the source of the data, and other factors. The
process then repeats by returning to step 402 where the presence of
the user's hand is again detected. The process described in FIG. 4
is continuous in that the user's hand movement is tracked as long
as it is within the range of the tracking component. In the
described embodiment, the device is able to perform as a 3D in-line
mediator as long as the user's head or perspective is kept in line
with the device which, in turn, allows the user's hand movements
behind the device to be visually coherent with the hand movements
shown on the screen and vice versa. That is, the user moves her
hand in the physical world based on actions she wants to perform in
the digital 3D environment shown on the screen.
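For a grabbed object, one of the many possible modifications can be sketched as translating the object by the hand's frame-to-frame displacement; deformation, rotation, and the other manipulations mentioned above would be handled analogously, and this helper is an illustration rather than the disclosed processing method.

    def apply_grab_translation(object_position, hand_delta):
        # Move the grabbed object by the same displacement the tracked hand just made,
        # so the rendered scene follows the user's physical motion behind the device.
        if hand_delta is None:
            return object_position
        return tuple(p + d for p, d in zip(object_position, hand_delta))
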
[0040] FIG. 5 is a block diagram showing relevant components of a
device capable of functioning as a 3D in-line mediator in
accordance with one embodiment. Many of the components shown here
have been described above. A device 500 has a display component
(not shown) for displaying digital 3D content data 501 which may be
stored in mass storage or in a local cache (not shown) on device
500 or may be downloaded from the Internet or from another source.
A tracking sensor component 502 may include one or more
conventional (2D) cameras and 3D (depth) cameras and non-camera
peripherals. A 3-D camera may provide depth data which simplifies
gesture recognition by use of depth keying. In another embodiment,
a wide angle lens may be used in a camera which may require less
processing by an imaging system, but may produce more distortion.
Component 502 may also have other capabilities as described above,
such as infrared detection, optic flow, image differentiation,
redshift thermal imaging, spectral processing, and other
techniques. Tracking sensor component
502 is responsible for tracking the position of body parts within
the range of detection. This position data is transmitted to hand
tracking module 504 and to face tracking module 506, and each module
identifies the features relevant to it.
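Assuming placeholder objects for each of the modules named above (their method names here are illustrative, not part of this description), the data flow of FIG. 5 might be wired together roughly as follows.

    class InLineMediatorDevice:
        # Illustrative wiring of the FIG. 5 components: sensor 502 feeds the hand and
        # face tracking modules 504/506, whose output drives gesture detection 508,
        # collision detection 510, and (optionally) the tactile feedback controller 512.

        def __init__(self, tracking_sensor, hand_tracker, face_tracker,
                     gesture_detector, collision_detector, feedback_controller=None):
            self.tracking_sensor = tracking_sensor
            self.hand_tracker = hand_tracker
            self.face_tracker = face_tracker
            self.gesture_detector = gesture_detector
            self.collision_detector = collision_detector
            self.feedback_controller = feedback_controller

        def process_frame(self, scene):
            sample = self.tracking_sensor.read()
            hand_state = self.hand_tracker.update(sample)
            self.face_tracker.update(sample)
            gesture = self.gesture_detector.detect(hand_state)
            hit = self.collision_detector.check(hand_state, scene)
            if hit and self.feedback_controller is not None:
                self.feedback_controller.pulse()
            if hit and gesture is not None:
                scene.apply(gesture, hit)
            return scene
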
[0041] Hand tracking module 504 identifies features of the user's
hand positions, including the positions of the fingers, wrist, and
arm. It determines the location of these body parts in the 3D
environment. Data from module 504 goes to two components related to
hand and arm position: gesture detection module 508 and hand
collision detection module 510. In one embodiment, a user "gesture"
results in a modification of 3D content 501. A gesture may include
lifting, holding, squeezing, pinching, or rotating a 3D object.
These actions typically result in some type of modification of the
object in the 3D environment. A modification of an object may
include a change in its location (lifting or turning) without there
being an actual deformation or change in shape of the object. The
gesture detection data may be applied directly to the graphics data
representing 3D content 501.
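One way to picture the path from a detected gesture to a modification of 3D content 501 is a simple dispatch table; the gesture names follow the examples above, while the object methods invoked are hypothetical placeholders.

    GESTURE_HANDLERS = {
        "lift":    lambda obj: obj.translate(0.0, 1.0, 0.0),  # change location, no deformation
        "rotate":  lambda obj: obj.rotate_y(15.0),            # change orientation
        "squeeze": lambda obj: obj.scale(0.9),                # deform the object
        "pinch":   lambda obj: obj.scale(0.95),
    }

    def apply_gesture(gesture_name, scene_object):
        handler = GESTURE_HANDLERS.get(gesture_name)
        if handler is not None:
            handler(scene_object)  # modifies the 3D content in place
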
[0042] In another embodiment, tracking sensor component 502 may
also track the user's face. In this case, face tracking data is
transmitted to face tracking module 506. Face tracking may be
utilized in cases where the user is not vertically aligned with the
device and the perceived object (i.e., the user's head is not looking
directly at the middle of the screen).
[0043] In another embodiment, data from hand collision detection
module 510 may be transmitted to a tactile feedback controller 512,
which is connected to one or more actuators 514 which are external
to device 500. In this embodiment, the user may receive haptic
feedback when the user's hand collides with a 3D object. Generally,
it is preferred that actuators 514 be as unobtrusive as possible.
In one embodiment, they are vibrating wristbands, which may be
wired or wireless. Using wristbands allows for bare hand
manipulation of 3D content as described above. Tactile feedback
controller 512 receives a signal that there is a collision or
contact and causes tactile actuators 514 to provide a physical
sensation to the user. For example, with vibrating wristbands, the
user's wrist will sense a vibration or similar physical sensation
indicating contact with the 3-D object.
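As a sketch only, the tactile feedback path could look like the following, where the actuator object standing in for a vibrating wristband and its set_vibration() method are assumptions rather than parts of this description.

    import time

    class WristbandFeedbackController:
        # Pulse a (wired or wireless) vibrating wristband briefly when the
        # hand/object collision signal arrives from the collision detection module.

        def __init__(self, actuator, pulse_seconds=0.15):
            self.actuator = actuator
            self.pulse_seconds = pulse_seconds

        def on_collision(self):
            self.actuator.set_vibration(True)
            time.sleep(self.pulse_seconds)
            self.actuator.set_vibration(False)
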
[0044] As is evident from the figures and the various embodiments,
the present invention enables a user to interact with digital 3D
content in a natural and immersive way by enabling visual
coherency, thereby creating an immersive volumetric interaction
with the 3D content. In one embodiment, a user uploads or executes
3D content onto a mobile computing device, such as a cell phone.
This 3D content may be a virtual world that the user has visited
using a browser on the mobile device (e.g., Second Life or any
other site that provides virtual world content). Other examples
include movies, video games, online virtual cities, medical imaging
(e.g., examining MRIs), modeling and prototyping, information
visualization, architecture, tele-immersion and collaboration, and
geographic information systems (e.g., Google Earth). The user holds
the display of the device upright at a comfortable distance in
front of the user's eyes, for example at 20-30 centimeters. The
display of the mobile device is used as a window into the virtual
world. Using the mobile device as an in-line mediator between the
user and the user's hand, the user is able to manipulate 3D objects
shown on the display by reaching behind the display of the device
and making hand gestures and movements around a perceived object
behind the display. The user sees the gestures and movements on the
display and the 3D object that they are affecting.
[0045] As discussed above, one aspect of creating an immersive and
natural user interaction with 3D content using a mobile device is
enabling the user to have bare-hand interaction with objects in the
virtual world. That is, allowing the user to manipulate and "touch"
digital 3D objects using the mobile device and not requiring the
user to use any peripheral devices, such as gloves, finger sensors,
motion detectors, and the like.
[0046] FIGS. 6A and 6B illustrate a computing system 600 suitable
for implementing embodiments of the present invention. FIG. 6A
shows one possible physical form of the computing system. Of
course, the computing system may have many physical forms including
an integrated circuit, a printed circuit board, a small handheld
device (such as a mobile telephone, handset or PDA), a personal
computer or a super computer. Computing system 600 includes a
monitor 602, a display 604, a housing 606, a disk drive 608, a
keyboard 610 and a mouse 612. Disk 614 is a computer-readable
medium used to transfer data to and from computer system 600.
[0047] FIG. 6B is an example of a block diagram for computing
system 600. Attached to system bus 620 are a wide variety of
subsystems. Processor(s) 622 (also referred to as central
processing units, or CPUs) are coupled to storage devices including
memory 624. Memory 624 includes random access memory (RAM) and
read-only memory (ROM). As is well known in the art, ROM acts to
transfer data and instructions uni-directionally to the CPU and RAM
is used typically to transfer data and instructions in a
bi-directional manner. Both of these types of memories may include
any suitable type of the computer-readable media described below. A
fixed disk 626 is also coupled bi-directionally to CPU 622; it
provides additional data storage capacity and may also include any
of the computer-readable media described below. Fixed disk 626 may
be used to store programs, data and the like and is typically a
secondary storage medium (such as a hard disk) that is slower than
primary storage. It will be appreciated that the information
retained within fixed disk 626, may, in appropriate cases, be
incorporated in standard fashion as virtual memory in memory 624.
Removable disk 614 may take the form of any of the
computer-readable media described below.
[0048] CPU 622 is also coupled to a variety of input/output devices
such as display 604, keyboard 610, mouse 612 and speakers 630. In
general, an input/output device may be any of: video displays,
track balls, mice, keyboards, microphones, touch-sensitive
displays, transducer card readers, magnetic or paper tape readers,
tablets, styluses, voice or handwriting recognizers, biometrics
readers, or other computers. CPU 622 optionally may be coupled to
another computer or telecommunications network using network
interface 640. With such a network interface, it is contemplated
that the CPU might receive information from the network, or might
output information to the network in the course of performing the
above-described method steps. Furthermore, method embodiments of
the present invention may execute solely upon CPU 622 or may
execute over a network such as the Internet in conjunction with a
remote CPU that shares a portion of the processing.
[0049] Although illustrative embodiments and applications of this
invention are shown and described herein, many variations and
modifications are possible which remain within the concept, scope,
and spirit of the invention, and these variations would become
clear to those of ordinary skill in the art after perusal of this
application. Accordingly, the embodiments described are
illustrative and not restrictive, and the invention is not to be
limited to the details given herein, but may be modified within the
scope and equivalents of the appended claims.
* * * * *