U.S. patent application number 13/996230 was published by the patent office on 2014-02-27 as publication number 20140055554 for a system and method for communication using an interactive avatar.
The applicants listed for this patent are Yangzhou Du, Wei Hu, Wenlong Li, Xiaofeng Tong, and Yimin Zhang, who are also credited as the inventors.
United States Patent Application 20140055554
Kind Code: A1
Du; Yangzhou; et al.
February 27, 2014

Application Number: 13/996230
Publication Number: 20140055554
Document ID: /
Family ID: 48696221
Publication Date: 2014-02-27
SYSTEM AND METHOD FOR COMMUNICATION USING INTERACTIVE AVATAR
Abstract
A video communication system replaces actual live images of
the participating users with animated avatars. A method may include
selecting an avatar, initiating communication, capturing an image,
detecting a face in the image, determining facial characteristics
from the face, including eye movement and eyelid movement of a user
indicative of direction of user gaze and blinking, respectively,
converting the facial characteristics to avatar parameters, and
transmitting at least one of the avatar selection or avatar
parameters.
Inventors: Du; Yangzhou (Beijing, CN); Li; Wenlong (Beijing, CN);
Tong; Xiaofeng (Beijing, CN); Hu; Wei (Beijing, CN); Zhang; Yimin
(Beijing, CN)

Applicant:
Name            City     State  Country  Type
Du; Yangzhou    Beijing         CN
Li; Wenlong     Beijing         CN
Tong; Xiaofeng  Beijing         CN
Hu; Wei         Beijing         CN
Zhang; Yimin    Beijing         CN
Family ID: 48696221
Appl. No.: 13/996230
Filed: April 9, 2012
PCT Filed: April 9, 2012
PCT No.: PCT/CN2012/000461
371 Date: June 20, 2013
Current U.S. Class: 348/14.07
Current CPC Class: G06K 9/00268 (20130101); H04N 21/4788 (20130101);
H04N 21/44008 (20130101); H04N 21/8146 (20130101); G06K 9/00255
(20130101); H04N 7/157 (20130101); H04N 21/4223 (20130101); G06K
9/00308 (20130101); H04N 7/147 (20130101); G06K 9/00248 (20130101);
G06T 13/40 (20130101); G06K 9/00281 (20130101)
Class at Publication: 348/14.07
International Class: H04N 7/15 (20060101) H04N007/15; G06K 9/00
(20060101) G06K009/00

Foreign Application Data
Date          Code  Application Number
Dec 29, 2011  CN    PCT/CN2011/084902
Claims
1-22. (canceled)
23. A system for interactive avatar communication between a first
user device and a remote user device, said system comprising: a
camera configured to capture images; a communication module
configured to initiate and establish communication between said
first and said remote user devices and to transmit and receive
information between said first and said remote user devices; and
one or more storage mediums having stored thereon, individually or
in combination, instructions that when executed by one or more
processors result in the following operations comprising: selecting
an avatar; initiating communication; capturing an image; detecting
a face in said image; determining facial characteristics from said
face, said facial characteristics comprising at least one of eye
movement and eyelid movement; converting said facial
characteristics to avatar parameters; and transmitting at least one
of said avatar selection and avatar parameters.
24. The system of claim 23, wherein determining facial
characteristics from said face comprises determining a facial
expression in said face.
25. The system of claim 23, wherein determining facial
characteristics from said face comprises determining at least one
of gaze direction and blinking of said eyes based on
statistical-based analysis selected from the group consisting of
linear discriminant analysis (LDA), artificial neural network (ANN)
and support vector machine (SVM).
26. The system of claim 23, wherein said avatar selection and
avatar parameters are used to generate an avatar on a remote
device, said avatar being based on said facial characteristics.
27. The system of claim 23, wherein said avatar selection and
avatar parameters are used to generate an avatar in a virtual
space, said avatar being based on said facial characteristics.
28. The system of claim 23, wherein the instructions, when executed
by one or more processors, result in the following additional
operations: receiving at least one of a remote avatar selection or
remote avatar parameters.
29. The system of claim 28, further comprising a display, wherein
the instructions, when executed by one or more processors, result
in the following additional operations: displaying an avatar based
on said remote avatar selection.
30. The system of claim 29, wherein the instructions, when executed
by one or more processors, result in the following additional
operations: animating said displayed avatar based on said remote
avatar parameters.
31. An apparatus for interactive avatar communication between a
first user device and a remote user device, said apparatus
comprising: a communication module configured to initiate and
establish communication between said first and said remote user
devices; an avatar selection module configured to allow a user to
select an avatar for use during said communication; a face
detection module configured to detect a facial region in an image
of said user and to detect and identify one or more facial
characteristics of said face, said facial characteristics
comprising at least one of eye movement and eyelid movement of said
user; and an avatar control module configured to convert said
facial characteristics to avatar parameters; wherein said
communication module is configured to transmit at least one of said
avatar selection and avatar parameters.
32. The apparatus of claim 31, further comprising an eye
detection/tracking module configured to detect and identify at
least one of eye movement of said user with respect to a display
and eyelid movement of said user.
33. The apparatus of claim 32, wherein said eye detection/tracking
module comprises an eye classification module configured to
determine at least one of gaze direction of said user's eyes
and blinking of said user's eyes.
34. The apparatus of claim 33, wherein said determination of said
gaze direction and blinking of said user's eyes by said eye
detection/tracking module is based on statistical-based analysis
selected from the group consisting of linear discriminant analysis
(LDA), artificial neural network (ANN) and support vector machine
(SVM).
35. The apparatus of claim 31, wherein said avatar selection and
avatar parameters are used to generate an avatar on said remote
device, said avatar being based on said facial characteristics.
36. The apparatus of claim 31, wherein said communication module is
configured to receive at least one of a remote avatar selection or
remote avatar parameters.
37. The apparatus of claim 36, further comprising a display
configured to display an avatar based on said remote avatar
selection.
38. The apparatus of claim 37, wherein said avatar control module
is configured to animate said displayed avatar based on said remote
avatar parameters.
39. A method for interactive avatar communication, said method
comprising: selecting an avatar; initiating communication;
capturing an image; detecting a face in said image; determining
facial characteristics from said face, said facial characteristics
comprising at least one of eye movement and eyelid movement;
converting said facial characteristics to avatar parameters; and
transmitting at least one of said avatar selection or avatar
parameters.
40. The method of claim 39, wherein the avatar selection and avatar
parameters are used to generate an avatar on a remote device, said
avatar being based on said facial characteristics.
41. The method of claim 39, further comprising receiving at least
one of a remote avatar selection or remote avatar parameters.
42. The method of claim 41, further comprising displaying an avatar
based on the remote avatar selection.
43. The method of claim 42, further comprising animating said
displayed avatar based on said remote avatar parameters.
44. At least one computer accessible medium storing instructions
which, when executed by a machine, cause the machine to perform
operations comprising: selecting an avatar; initiating
communication; capturing an image; detecting a face in said image;
determining facial characteristics from said face, said facial
characteristics comprising at least one of eye movement and eyelid
movement; converting said facial characteristics to avatar
parameters; and transmitting at least one of said avatar selection
or avatar parameters.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims the benefit of PCT Patent
Application Serial No. PCT/CN2011/084902, filed Dec. 29, 2011, the
entire disclosure of which is incorporated herein by reference.
FIELD
[0002] The present disclosure relates to video communication and
interaction, and, more particularly, to a system and method for
communication using interactive avatars.
BACKGROUND
[0003] The increasing variety of functionality available in mobile
devices has spawned a desire for users to communicate via video in
addition to simple calls. For example, users may initiate "video
calls," "videoconferencing," etc., wherein a camera and microphone
in a device transmits audio and real-time video of a user to one or
more other recipients such as other mobile devices, desktop
computers, videoconferencing systems, etc. The communication of
real-time video may involve the transmission of substantial amounts
of data (e.g., depending on the technology of the camera, the
particular video codec employed to process the real time image
information, etc.). Given the bandwidth limitations of existing
2G/3G wireless technology, and the still limited availability of
emerging 4G wireless technology, the proposition of many device
users conducting concurrent video calls places a large burden on
bandwidth in the existing wireless communication infrastructure,
which may impact negatively on the quality of the video call.
BRIEF DESCRIPTION OF DRAWINGS
[0004] Features and advantages of various embodiments of the
claimed subject matter will become apparent as the following
Detailed Description proceeds, and upon reference to the Drawings,
wherein like numerals designate like parts, and in which:
[0005] FIG. 1A illustrates an example device-to-device system
consistent with various embodiments of the present disclosure;
[0006] FIG. 1B illustrates an example virtual space system
consistent with various embodiments of the present disclosure;
[0007] FIG. 2 illustrates an example device consistent with
various embodiments of the present disclosure;
[0008] FIG. 3 illustrates an example face detection module
consistent with various embodiments of the present disclosure;
[0009] FIG. 4 illustrates an example system implementation in
accordance with at least one embodiment of the present disclosure;
and
[0010] FIG. 5 is a flowchart of example operations in accordance
with at least one embodiment of the present disclosure.
[0011] Although the following Detailed Description will proceed
with reference being made to illustrative embodiments, many
alternatives, modifications, and variations thereof will be
apparent to those skilled in the art.
DETAILED DESCRIPTION
[0012] By way of overview, the present disclosure is generally
directed to a system and method for video communication and
interaction using interactive avatars. A system and method
consistent with the present disclosure generally provides detection
and/or tracking of a user's eyes during active communication,
including the detection of characteristics of a user's eyes,
including, but not limited to, eyeball movement, gaze direction
and/or point of focus of the user's eyes, eye blinking, etc. The
system and method is further configured to provide avatar animation
based at least in part on the detected characteristics of the
user's eyes in real-time or near real-time during active
communication.
[0013] In one embodiment an application is activated in a device
coupled to a camera. The application may be configured to allow a
user to select an avatar for display on a remote device, in a
virtual space, etc. The device may then be configured to initiate
communication with at least one other device, a virtual space, etc.
For example, the communication may be established over a 2G, 3G, 4G
cellular connection. Alternatively, the communication may be
established over the Internet via a WiFi connection. After the
communication is established, the camera may be configured to start
capturing images. Facial detection is then performed on the
captured images, and facial characteristics are determined. The
detected face/head movements, including movement of the user's eyes
and/or eyelids, and/or changes in facial features are then
converted into parameters usable for animating the avatar on the at
least one other device, within the virtual space, etc. At least one
of the avatar selection or avatar parameters are then transmitted.
In one embodiment at least one of a remote avatar selection or
remote avatar parameters are received. The remote avatar selection
may cause the device to display an avatar, while the remote avatar
parameters may cause the device to animate the displayed avatar.
Audio communication may accompany the avatar animation via known
methods.
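The overall send-side flow described above can be pictured with a short sketch. The following is a minimal illustration only, assuming OpenCV ("cv2") for capture and face detection; the avatar identifier, the parameter extraction, and the transport callback (send_fn) are hypothetical placeholders rather than elements of the disclosure.

    # Minimal sketch: select avatar, capture, detect face, derive
    # parameters, transmit. Parameter extraction is a placeholder.
    import cv2

    def run_avatar_call(avatar_id, send_fn, camera_index=0):
        """Capture frames, detect a face, derive avatar parameters, transmit."""
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        capture = cv2.VideoCapture(camera_index)
        send_fn({"avatar_selection": avatar_id})  # transmit the selection once
        while capture.isOpened():
            ok, frame = capture.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            for (x, y, w, h) in faces[:1]:  # animate from the first detected face
                # Placeholder: a real system would extract landmarks, eye and
                # eyelid movement here and convert them to animation parameters.
                params = {"face_box": (int(x), int(y), int(w), int(h))}
                send_fn({"avatar_parameters": params})
        capture.release()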
[0014] FIG. 1A illustrates device-to-device system 100 consistent
with various embodiments of the present disclosure. The system 100
may generally include devices 102 and 112 communicating via network
122. Device 102 includes at least camera 104, microphone 106 and
display 108. Device 112 includes at least camera 114, microphone
116 and display 118. Network 122 includes at least one server
124.
[0015] Devices 102 and 112 may include various hardware platforms
that are capable of wired and/or wireless communication. For
example, devices 102 and 112 may include, but are not limited to,
videoconferencing systems, desktop computers, laptop computers,
tablet computers, smart phones (e.g., iPhones.RTM.,
Android.RTM.-based phones, Blackberries.RTM., Symbian.RTM.-based
phones, Palm.RTM.-based phones, etc.), cellular handsets, etc.
[0016] Cameras 104 and 114 include any device for capturing digital
images representative of an environment that includes one or more
persons, and may have adequate resolution for face analysis of the
one or more persons in the environment as described herein. For
example, cameras 104 and 114 may include still cameras (e.g.,
cameras configured to capture still photographs) or video cameras
(e.g., cameras configured to capture moving images comprised of a
plurality of frames). Cameras 104 and 114 may be configured to
operate using light in the visible spectrum or with other portions
of the electromagnetic spectrum, such as, but not limited to, the
infrared spectrum, ultraviolet spectrum, etc. Cameras 104 and 114 may be
incorporated within devices 102 and 112, respectively, or may be
separate devices configured to communicate with devices 102 and 112
via wired or wireless communication. Specific examples of cameras
104 and 114 may include wired (e.g., Universal Serial Bus (USB),
Ethernet, Firewire, etc.) or wireless (e.g.,
WiFi, Bluetooth, etc.) web cameras as may be associated with
computers, video monitors, etc., mobile device cameras (e.g., cell
phone or smart phone cameras integrated in, for example, the
previously discussed example devices), integrated laptop computer
cameras, integrated tablet computer cameras (e.g., iPad.RTM.,
Galaxy Tab.RTM., and the like), etc.
[0018] Devices 102 and 112 may further include microphones 106 and
116. Microphones 106 and 116 include any devices configured to
sense sound. Microphones 106 and 116 may be integrated within
devices 102 and 112, respectively, or may interact with the devices
102, 112 via wired or wireless communication such as described in
the above examples regarding cameras 104 and 114. Displays 108 and
118 include any devices configured to display text, still images,
moving images (e.g., video), user interfaces, graphics, etc.
Displays 108 and 118 may be integrated within devices 102 and 112,
respectively, or may interact with the devices via wired or
wireless communication such as described in the above examples
regarding cameras 104 and 114.
[0019] In one embodiment, displays 108 and 118 are configured to
display avatars 110 and 120, respectively. As referenced herein, an
avatar is defined as a graphical representation of a user in either
two dimensions (2D) or three dimensions (3D). Avatars do not have
to resemble the looks of the user, and thus, while avatars can be
lifelike representations they can also take the form of drawings,
cartoons, sketches, etc. As shown, device 102 may display avatar
110 representing the user of device 112 (e.g., a remote user), and
likewise, device 112 may display avatar 120 representing the user
of device 102. As such, users may view a representation of other
users without having to exchange large amounts of information that
are generally involved with device-to-device communication
employing live images.
[0020] Network 122 may include various second generation (2G),
third generation (3G), fourth generation (4G) cellular-based data
communication technologies, Wi-Fi wireless data communication
technology, etc. Network 122 includes at least one server 124
configured to establish and maintain communication connections when
using these technologies. For example, server 124 may be configured
to support Internet-related communication protocols like Session
Initiation Protocol (SIP) for creating, modifying and terminating
two-party (unicast) and multi-party (multicast) sessions,
Interactive Connectivity Establishment (ICE) for providing a
framework that allows connectivity to be established through
Network Address Translators (NATs), Session Traversal Utilities for
NAT (STUN) for allowing applications operating through a NAT to
discover the presence of other NATs and the IP addresses and ports
allocated for an application's User Datagram Protocol (UDP)
connections to remote hosts, Traversal Using Relays around NAT
(TURN) for allowing elements behind a NAT or firewall to receive
data over Transmission Control Protocol (TCP) or UDP connections,
etc.
[0021] FIG. 1B illustrates a virtual space system 126 consistent
with various embodiments of the present disclosure. The system 126
may include device 102, device 112 and server 124. Device 102,
device 112 and server 124 may continue to communicate in the manner
similar to that illustrated in FIG. 1A, but user interaction may
take place in virtual space 128 instead of in a device-to-device
format. As referenced herein, a virtual space may be defined as a
digital simulation of a physical location. For example, virtual
space 128 may resemble an outdoor location like a city, road,
sidewalk, field, forest, island, etc., or an inside location like
an office, house, school, mall, store, etc.
[0022] Users, represented by avatars, may appear to interact in
virtual space 128 as in the real world. Virtual space 128 may exist
on one or more servers coupled to the Internet, and may be
maintained by a third party. Examples of virtual spaces include
virtual offices, virtual meeting rooms, virtual worlds like Second
Life.RTM., massively multiplayer online role-playing games
(MMORPGs) like World of Warcraft.RTM., massively multiplayer online
real-life games (MMORLGs), like The Sims Online.RTM., etc. In
system 126, virtual space 128 may contain a plurality of avatars
corresponding to different users. Instead of displaying avatars,
displays 108 and 118 may display encapsulated (e.g., smaller)
versions of virtual space (VS) 128. For example, display 108 may
display a perspective view of what the avatar corresponding to the
user of device 102 "sees" in virtual space 128. Similarly, display
118 may display a perspective view of what the avatar corresponding
to the user of device 112 "sees" in virtual space 128. Examples of
what avatars might see in virtual space 128 may include, but are
not limited to, virtual structures (e.g., buildings), virtual
vehicles, virtual objects, virtual animals, other avatars, etc.
[0023] FIG. 2 illustrates an example device 102 in accordance with
various embodiments of the present disclosure. While only device
102 is described, device 112 (e.g., remote device) may include
resources configured to provide the same or similar functions. As
previously discussed, device 102 is shown including camera 104,
microphone 106 and display 108. The camera 104 and microphone 106
may provide input to a camera and audio framework module 200. The
camera and audio framework module 200 may include custom,
proprietary, known and/or after-developed audio and video
processing code (or instruction sets) that are generally
well-defined and operable to control at least camera 104 and
microphone 106. For example, the camera and audio framework module
200 may cause camera 104 and microphone 106 to record images and/or
sounds, may process images and/or sounds, may cause images and/or
sounds to be reproduced, etc. The camera and audio framework module
200 may vary depending on device 102, and more particularly, the
operating system (OS) running in device 102. Example operating
systems include iOS.RTM., Android.RTM., Blackberry.RTM. OS,
Symbian.RTM., Palm.RTM. OS, etc. A speaker 202 may receive audio
information from camera and audio framework module 200 and may be
configured to reproduce local sounds (e.g., to provide audio
feedback of the user's voice) and remote sounds (e.g., the sound of
the other parties engaged in a telephone, video call or interaction
in a virtual place).
[0024] The device 102 may further include a face detection module
204 configured to identify and track a head, face and/or facial
region within image(s) provided by camera 104 and to determine one
or more facial characteristics of the user (i.e., facial
characteristics 206). For example, the face detection module 204
may include custom, proprietary, known and/or after-developed face
detection code (or instruction sets), hardware, and/or firmware
that are generally well-defined and operable to receive a standard
format image (e.g., but not limited to, an RGB color image) and
identify, at least to a certain extent, a face in the image.
[0025] The face detection module 204 may also be configured to
track the detected face through a series of images (e.g., video
frames at 24 frames per second) and to determine a head position
based on the detected face. Known tracking systems that may be
employed by face detection module 204 may include particle
filtering, mean shift, Kalman filtering, etc., each of which may
utilize edge analysis, sum-of-square-difference analysis, feature
point analysis, histogram analysis, skin tone analysis, etc.
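As one illustration of the tracking options named above, the following sketch applies a constant-velocity Kalman filter (via OpenCV) to smooth the detected face center from frame to frame. The state layout and noise covariances are illustrative assumptions, not values from the disclosure.

    # Constant-velocity Kalman filter over the face center.
    import numpy as np
    import cv2

    kalman = cv2.KalmanFilter(4, 2)  # state: (x, y, vx, vy); measurement: (x, y)
    kalman.transitionMatrix = np.array(
        [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
    kalman.measurementMatrix = np.array(
        [[1, 0, 0, 0], [0, 1, 0, 0]], np.float32)
    kalman.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3
    kalman.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

    def track_face_center(measured_xy):
        """Predict, then correct with the detector's face-center measurement."""
        prediction = kalman.predict()
        kalman.correct(np.array(measured_xy, np.float32).reshape(2, 1))
        return float(prediction[0, 0]), float(prediction[1, 0])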
[0026] The face detection module 204 may also include custom,
proprietary, known and/or after-developed facial characteristics
code (or instruction sets) that are generally well-defined and
operable to receive a standard format image (e.g., but not limited
to, a RGB color image) and identify, at least to a certain extent,
one or more facial characteristics in the image. Such known facial
characteristics systems include, but are not limited to, the CSU
Face Identification Evaluation System by Colorado State University
and the standard Viola-Jones boosting cascade framework, which may be found
in the public Open Source Computer Vision (OpenCV.TM.) package.
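For example, the Viola-Jones boosting cascade shipped with the public OpenCV package may be applied roughly as in the following sketch, which runs a face cascade and then an eye cascade inside the detected face region; the cascade file names are those distributed with OpenCV, and the function itself is illustrative.

    # Viola-Jones cascades: face first, then eyes within the face region.
    import cv2

    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    eye_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_eye.xml")

    def detect_face_and_eyes(bgr_image):
        """Return the first face box and any eye boxes found inside it."""
        gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.1, 5)
        if len(faces) == 0:
            return None, []
        x, y, w, h = faces[0]
        eyes = eye_cascade.detectMultiScale(gray[y:y + h, x:x + w], 1.1, 5)
        return (x, y, w, h), [tuple(e) for e in eyes]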
[0027] As discussed in greater detail herein, facial
characteristics 206 may include features of the face, including,
but not limited to, the location and/or shape of facial landmarks
such as eyes, eyebrows, nose, mouth, etc., as well as movement of
the eyes and/or eyelids. In one embodiment, avatar animation may be
based on sensed facial actions (e.g., changes in facial
characteristics 206). The corresponding feature points on an
avatar's face may follow or mimic the movements of the real
person's face, which is known as "expression clone" or
"performance-driven facial animation."
[0028] The face detection module 204 may also be configured to
recognize an expression associated with the detected features
(e.g., identifying whether a previously detected face is happy,
sad, smiling, frowning, surprised, excited, etc.). Thus, the face
detection module 204 may further include custom, proprietary, known
and/or after-developed facial expression detection and/or
identification code (or instruction sets) that is generally
well-defined and operable to detect and/or identify expressions in
a face. For example, the face detection module 204 may determine
size and/or position of facial features (e.g., eyes, mouth, cheeks,
teeth, etc.) and may compare these facial features to a facial
feature database which includes a plurality of sample facial
features with corresponding facial feature classifications (e.g.,
smiling, frowning, excited, sad, etc.).
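Such a database comparison might be sketched as a nearest-neighbor lookup, as below. The three-value feature layout and the sample entries are invented purely for illustration and are not taken from the disclosure.

    # Nearest-neighbor match of a feature vector against a labeled database.
    import numpy as np

    # Hypothetical database: feature vectors (e.g., mouth width, mouth
    # height, eye openness) paired with expression classifications.
    SAMPLES = np.array([[0.60, 0.10, 0.30],   # smiling
                        [0.40, 0.05, 0.28],   # frowning
                        [0.45, 0.30, 0.45]])  # surprised
    LABELS = ["smiling", "frowning", "surprised"]

    def classify_expression(feature_vector):
        """Return the label of the nearest sample in the feature database."""
        distances = np.linalg.norm(SAMPLES - np.asarray(feature_vector), axis=1)
        return LABELS[int(np.argmin(distances))]

    print(classify_expression([0.58, 0.12, 0.31]))  # -> "smiling"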
[0029] The device 102 may further include an avatar selection
module 208 configured to allow a user of device 102 to select an
avatar for display on a remote device. The avatar selection module
208 may include custom, proprietary, known and/or after-developed
user interface construction code (or instruction sets) that are
generally well-defined and operable to present different avatars to
a user so that the user may select one of the avatars.
[0030] In one embodiment one or more avatars may be predefined in
device 102. Predefined avatars allow all devices to have the same
avatars, and during interaction only the selection of an avatar
(e.g., the identification of a predefined avatar) needs to be
communicated to a remote device or virtual space, which reduces the
amount of information that needs to be exchanged. Avatars are
selected prior to establishing communication, but may also be
changed during the course of an active communication. Thus, it may
be possible to send or receive an avatar selection at any point
during the communication, and for the receiving device to change
the displayed avatar in accordance with the received avatar
selection.
[0031] The device 102 may further include an avatar control module
210 configured to generate parameters for animating an avatar.
Animation, as referred to herein, may be defined as altering the
appearance of an image/model. A single animation may alter the
appearance of a 2-D still image, or multiple animations may occur
in sequence to simulate motion in the image (e.g., head turn,
nodding, talking, frowning, smiling, laughing, blinking, winking,
etc.). An example of animation for 3-D models includes deforming a
3-D wireframe model, applying a texture mapping, and re-computing
the model vertex normal for rendering. A change in position of the
detected face and/or facial characteristics 206, including facial
features, may be converted into parameters that cause the
avatar's features to resemble the features of the user's face.
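One plausible conversion, sketched below, expresses each tracked landmark as an offset from its neutral position, normalized by the inter-ocular distance so the resulting avatar parameters are independent of face size and camera distance. The landmark names and the chosen normalization are illustrative assumptions.

    # Landmark offsets from a neutral face, normalized for scale.
    import numpy as np

    def to_avatar_parameters(landmarks, neutral_landmarks):
        """Map current landmark positions to normalized animation parameters."""
        iod = np.linalg.norm(np.asarray(landmarks["right_eye"]) -
                             np.asarray(landmarks["left_eye"]))
        params = {}
        for name in ("mouth_left", "mouth_right", "left_eyelid", "right_eyelid"):
            offset = (np.asarray(landmarks[name]) -
                      np.asarray(neutral_landmarks[name]))
            params[name] = (offset / iod).tolist()  # scale-invariant displacement
        return params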
[0032] In one embodiment the general expression of the detected
face may be converted into one or more parameters that cause the
avatar to exhibit the same expression. The expression of the avatar
may also be exaggerated to emphasize the expression. Knowledge of
the selected avatar may not be necessary when avatar parameters may
be applied generally to all of the predefined avatars. However, in
one embodiment avatar parameters may be specific to the selected
avatar, and thus, may be altered if another avatar is selected. For
example, human avatars may require different parameter settings
(e.g., different avatar features may be altered) to demonstrate
emotions like happy, sad, angry, surprised, etc. than animal
avatars, cartoon avatars, etc.
[0033] The avatar control module 210 may include custom,
proprietary, known and/or after-developed graphics processing code
(or instruction sets) that are generally well-defined and operable
to generate parameters for animating the avatar selected by avatar
selection module 208 based on the face/head position and/or facial
characteristics 206 detected by face detection module 204. For
facial feature-based animation methods, 2-D avatar animation may be
done with, for example, image warping or image morphing, whereas
3-D avatar animation may be done with free form deformation (FFD)
or by utilizing the animation structure defined in a 3-D model of a
head. Oddcast is an example of a software resource usable for 2-D
avatar animation, while FaceGen is an example of a software
resource usable for 3-D avatar animation.
[0034] In addition, in system 100, the avatar control module 210
may receive a remote avatar selection and remote avatar parameters
usable for displaying and animating an avatar corresponding to a
user at a remote device. The avatar control module 210 may cause a
display module 212 to display an avatar 110 on the display 108. The
display module 212 may include custom, proprietary, known and/or
after-developed graphics processing code (or instruction sets) that
are generally well-defined and operable to display and animate an
avatar on display 108 in accordance with the example
device-to-device embodiment.
[0035] For example, the avatar control module 210 may receive a
remote avatar selection and may interpret the remote avatar
selection to correspond to a predetermined avatar. The display
module 212 may then display avatar 110 on display 108. Moreover,
remote avatar parameters received in avatar control module 210 may
be interpreted, and commands may be provided to display module 212
to animate avatar 110.
[0036] In one embodiment more than two users may engage in the
video call. When more than two users are interacting in a video
call, the display 108 may be divided or segmented to allow more
than one avatar corresponding to remote users to be displayed
simultaneously. Alternatively, in system 126, the avatar control
module 210 may receive information causing the display module 212
to display what the avatar corresponding to the user of device 102
is "seeing" in virtual space 128 (e.g., from the visual perspective
of the avatar). For example, the display 108 may display buildings,
objects, animals represented in virtual space 128, other avatars,
etc. In one embodiment, the avatar control module 210 may be
configured to cause the display module 212 to display a "feedback"
avatar 214. The feedback avatar 214 represents how the selected
avatar appears on the remote device, in a virtual place, etc. In
particular, the feedback avatar 214 appears as the avatar selected
by the user and may be animated using the same parameters generated
by avatar control module 210. In this way the user may confirm what
the remote user is seeing during their interaction.
[0037] The device 102 may further include a communication module
216 configured to transmit and receive information for selecting
avatars, displaying avatars, animating avatars, displaying virtual
place perspective, etc. The communication module 216 may include
custom, proprietary, known and/or after-developed communication
processing code (or instruction sets) that are generally
well-defined and operable to transmit avatar selections, avatar
parameters and receive remote avatar selections and remote avatar
parameters. The communication module 216 may also transmit and
receive audio information corresponding to avatar-based
interactions. The communication module 216 may transmit and
receive the above information via network 122 as previously
described.
[0038] The device 102 may further include one or more processor(s)
218 configured to perform operations associated with device 102 and
one or more of the modules included therein.
[0039] FIG. 3 illustrates an example face detection module 204a
consistent with various embodiments of the present disclosure. The
face detection module 204a may be configured to receive one or more
images from the camera 104 via the camera and audio framework
module 200 and identify, at least to a certain extent, a face (or
optionally multiple faces) in the image. The face detection module
204a may also be configured to identify and determine, at least to
a certain extent, one or more facial characteristics 206 in the
image. The facial characteristics 206 may be generated based on one
or more of the facial parameters identified by the face detection
module 204a as described herein. The facial characteristics 206 may
include features of the face, including, but not
limited to, the location and/or shape of facial landmarks such as
eyes, eyebrows, nose, mouth, etc., as well as movement of the
mouth, eyes and/or eyelids.
[0040] In the illustrated embodiment, the face detection module
204a may include a face detection/tracking module 300, a face
normalization module 302, a landmark detection module 304, a facial
pattern module 306, a face posture module 308, a facial expression
detection module 310, an eye detection/tracking module 312 and an
eye classification module 314. The face detection/tracking module
300 may include custom, proprietary, known and/or after-developed
face tracking code (or instruction sets) that is generally
well-defined and operable to detect and identify, at least to a
certain extent, the size and location of human faces in a still
image or video stream received from the camera 104. Such known face
detection/tracking systems include, for example, the techniques of
Viola and Jones, published as Paul Viola and Michael Jones, "Rapid
Object Detection using a Boosted Cascade of Simple Features," IEEE
Conference on Computer Vision and Pattern Recognition (CVPR),
2001. These techniques use a cascade of Adaptive Boosting
(AdaBoost) classifiers to detect a face by scanning a window
exhaustively over an image. The face detection/tracking module 300
may also track a face or facial region across multiple images.
[0041] The face normalization module 302 may include custom,
proprietary, known and/or after-developed face normalization code
(or instruction sets) that is generally well-defined and operable
to normalize the identified face in the image. For example, the
face normalization module 302 may be configured to rotate the image
to align the eyes (if the coordinates of the eyes are known), crop
the image to a smaller size generally corresponding to the size of the
face, scale the image to make the distance between the eyes
constant, apply a mask that zeros out pixels not in an oval that
contains a typical face, histogram equalize the image to smooth the
distribution of gray values for the non-masked pixels, and/or
normalize the image so the non-masked pixels have mean zero and
standard deviation one.
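A condensed sketch of these normalization steps, assuming the eye coordinates are already known, might look as follows; the output size and crop geometry are illustrative choices rather than values taken from the disclosure.

    # Align eyes, scale, crop, equalize, standardize a face image.
    import numpy as np
    import cv2

    def normalize_face(gray, left_eye, right_eye, out_size=64, eye_dist=24):
        """gray: 8-bit single-channel image. Rotate to align the eyes, scale
        to a fixed inter-eye distance, crop around the eye midpoint,
        equalize, and standardize to zero mean and unit deviation."""
        (lx, ly), (rx, ry) = left_eye, right_eye
        angle = np.degrees(np.arctan2(ry - ly, rx - lx))
        scale = eye_dist / max(np.hypot(rx - lx, ry - ly), 1e-6)
        center = ((lx + rx) / 2.0, (ly + ry) / 2.0)
        rot = cv2.getRotationMatrix2D(center, angle, scale)
        # Shift so the eye midpoint lands at a fixed location in the output.
        rot[0, 2] += out_size / 2.0 - center[0]
        rot[1, 2] += out_size * 0.35 - center[1]
        face = cv2.warpAffine(gray, rot, (out_size, out_size))
        face = cv2.equalizeHist(face)  # smooth the gray-value distribution
        face = face.astype(np.float32)
        return (face - face.mean()) / max(face.std(), 1e-6)  # zero mean, unit std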
[0042] The landmark detection module 304 may include custom,
proprietary, known and/or after-developed landmark detection code
(or instruction sets) that is generally well-defined and operable
to detect and identify, at least to a certain extent, the various
facial features of the face in the image. Implicit in landmark
detection is that the face has already been detected, at least to
some extent. Optionally, some degree of localization may have been
performed (for example, by the face normalization module 302) to
identify/focus on the zones/areas of the image where landmarks can
potentially be found. For example, the landmark detection module
304 may be based on heuristic analysis and may be configured to
identify and/or analyze the relative position, size, and/or shape
of the eyes (and/or the corner of the eyes), nose (e.g., the tip of
the nose), chin (e.g. tip of the chin), cheekbones, and jaw. The
eye-corners and mouth corners may also be detected using a
Viola-Jones-based classifier.
[0043] The facial pattern module 306 may include custom,
proprietary, known and/or after-developed facial pattern code (or
instruction sets) that is generally well-defined and operable to
identify and/or generate a facial pattern based on the identified
facial landmarks in the image. As may be appreciated, the facial
pattern module 306 may be considered a portion of the face
detection/tracking module 300.
[0044] The face posture module 308 may include custom, proprietary,
known and/or after-developed facial orientation detection code (or
instruction sets) that is generally well-defined and operable to
detect and identify, at least to a certain extent, the posture of
the face in the image. For example, the face posture module 308 may
be configured to establish the posture of the face in the image
with respect to the display 108 of the device 102. More
specifically, the face posture module 308 may be configured to
determine whether the user's face is directed toward the display
108 of the device 102, thereby indicating whether the user is
observing the content being displayed on the display 108.
[0045] The facial expression detection module 310 may include
custom, proprietary, known and/or after-developed facial expression
detection and/or identification code (or instruction sets) that is
generally well-defined and operable to detect and/or identify
facial expressions of the user in the image. For example, the
facial expression detection module 310 may determine size and/or
position of the facial features (e.g., eyes, mouth, cheeks, teeth,
etc.) and compare the facial features to a facial feature database
which includes a plurality of sample facial features with
corresponding facial feature classifications.
[0046] The eye detection/tracking module 312 may include custom,
proprietary, known and/or after-developed eye tracking code (or
instruction sets) that is generally well-defined and operable to
detect and identify, at least to a certain extent, eye movement
and/or eye gaze or focus of the user in the image. Similar to the
face posture module 308, the eye detection/tracking module 312 may
be configured to establish the direction in which the user's eyes
are directed with respect to the display 108 of the device 102. The
eye detection/tracking module 312 may be further configured to
establish eye blinking of a user.
[0047] As shown, the eye detection/tracking module 312 may include
an eye classification module 314 configured to determine whether
the user's eyes (individually and/or both) are open or closed and
movement of the user's eyes with respect to the display 108. In
particular, the eye classification module 314 is configured to
receive one or more normalized images (i.e., images normalized by
the face normalization module 302). A normalized image may include, but is
not limited to, rotation to align the eyes (if the coordinates of
the eyes are known), cropping of the image, particularly cropping
of the eyes with reference to the eye-corner position, scaling the
image to make the distance between the eyes constant, histogram
equalizing the image to smooth the distribution of gray values for
the non-masked pixels, and/or normalizing the image so the
non-masked pixels have mean zero and a unit standard deviation.
[0048] Upon receipt of one or more normalized images, the eye
classification module 314 may be configured to separately identify
eye opening/closing and/or eye movement (e.g. looking left/right,
up/down, diagonally, etc.) with respect to the display 108 and, as
such, determine a status of the user's eyes in real-time or near
real-time during active video communication and/or interaction. The
eye classification module 314 may include custom, proprietary,
known and/or after-developed eye tracking code (or instruction
sets) that is generally well-defined and operable to detect and
identify, at least to a certain extent, movement of the eyelids and
eyes of the user in the image. In one embodiment, the eye
classification module 314 may use statistical-based analysis in
order to identify the status of the user's eyes (open/close,
movement, etc.), including, but not limited to, linear discriminant
analysis (LDA), artificial neural network (ANN) and/or support
vector machine (SVM). During analysis, the eye classification
module 314 may further utilize an eye status database, which may
include a plurality of sample eye features with corresponding eye
feature classifications.
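As an illustration of this statistical approach, the following sketch trains a support vector machine (one of the named options, here via scikit-learn) on an assumed eye status database of normalized eye patches flattened into feature vectors; the patch format and label scheme are assumptions.

    # SVM-based eye-status classification over flattened eye patches.
    import numpy as np
    from sklearn.svm import SVC

    def train_eye_classifier(eye_patches, labels):
        """eye_patches: (n, h, w) normalized eye images; labels: e.g.
        0=closed, 1=open. Returns a trained SVM classifier."""
        features = np.asarray(eye_patches, np.float32).reshape(len(eye_patches), -1)
        classifier = SVC(kernel="linear")
        return classifier.fit(features, labels)

    def eye_status(classifier, eye_patch):
        """Classify a single normalized eye patch as open or closed."""
        return classifier.predict(
            eye_patch.reshape(1, -1).astype(np.float32))[0]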
[0049] As previously described, avatar animation may be based on
sensed facial actions (e.g., changes in facial characteristics 206
of a user, including eye and/or eyelid movement). The corresponding
feature points on an avatar's face may follow or mimic the
movements of the real person's face, which is known as "expression
clone" or "performance-driven facial animation." Accordingly, eye
opening/closing and eye movement may be animated in the avatar
model during active video communication and/or interaction by any
known methods.
[0050] For example, upon receipt of the avatar selection and avatar
parameters from the device 102, an avatar control module of the
remote device 112 may be configured to control (e.g. animate) the
avatar based on the facial characteristics 206, including the eye
and/or eyelid movement of the user. This may include normalizing
and remapping the user's face to the avatar face, copying any
changes to the facial characteristics 206 and driving the avatar to
perform the same facial characteristics and/or expression changes.
For facial feature-based animation methods, 2-D avatar animation
may be done with, for example, image warping or image morphing,
whereas 3-D avatar animation may be done with free form deformation
(FFD) or by utilizing the animation structure defined in a 3-D
model of a head. Oddcast is an example of a software resource
usable for 2-D avatar animation, while FaceGen is an example of a
software resource usable for 3-D avatar generation and
animation.
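One common way a receiving device can apply transmitted avatar parameters, shown below as a sketch, is a blendshape (morph-target) model in which each parameter weights a vertex-offset shape added to the neutral avatar mesh. This stands in for, and is not necessarily identical to, the warping and FFD methods named above; the shapes here are invented for illustration.

    # Blendshape application: posed mesh = neutral + sum(weight * offsets).
    import numpy as np

    def animate_avatar(neutral_vertices, blendshapes, params):
        """neutral_vertices: (n, 3) base mesh; blendshapes: name -> (n, 3)
        vertex offsets; params: name -> weight in [0, 1]. Returns posed mesh."""
        vertices = np.asarray(neutral_vertices, np.float32).copy()
        for name, weight in params.items():
            if name in blendshapes:
                vertices += float(weight) * np.asarray(blendshapes[name], np.float32)
        return vertices

    # Example: closing the left eyelid by 80% on a toy two-vertex "mesh".
    base = np.zeros((2, 3))
    shapes = {"left_eyelid_close": np.array([[0.0, -0.1, 0.0], [0.0, 0.0, 0.0]])}
    print(animate_avatar(base, shapes, {"left_eyelid_close": 0.8}))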
[0051] FIG. 4 illustrates an example system implementation in
accordance with at least one embodiment. Device 102' is configured
to communicate wirelessly via WiFi connection 400 (e.g., at work),
server 124' is configured to negotiate a connection between devices
102' and 112' via Internet 402, and device 112' is configured to
communicate wirelessly via another WiFi connection 404 (e.g., at
home). In one embodiment, a device-to-device avatar-based video
call application is activated in device 102'. Following avatar
selection, the application may allow at least one remote device
(e.g., device 112') to be selected. The application may then cause
device 102' to initiate communication with device 112'.
Communication may be initiated with device 102' transmitting a
connection establishment request to device 112' via enterprise
access point (AP) 406. The enterprise AP 406 may be an AP usable in
a business setting, and thus, may support higher data throughput
and more concurrent wireless clients than home AP 414. The
enterprise AP 406 may receive the wireless signal from device 102'
and may proceed to transmit the connection establishment request
through various business networks via gateway 408. The connection
establishment request may then pass through firewall 410, which may
be configured to control information flowing into and out of the
WiFi network 400.
[0052] The connection establishment request of device 102' may then
be processed by server 124'. The server 124' may be configured for
registration of IP addresses, authentication of destination
addresses and NAT traversals so that the connection establishment
request may be directed to the correct destination on Internet 402.
For example, server 124' may resolve the intended destination
(e.g., remote device 112') from information in the connection
establishment request received from device 102', and may route the
signal through the correct NATs and ports to the destination IP
address accordingly. These operations may only have to be performed
during connection establishment, depending on the network
configuration.
[0053] In some instances these operations may be repeated during the
video call in order to provide notification to the NAT to keep the
connection alive. Media and Signal Path 412 may carry the video
(e.g., avatar selection and/or avatar parameters) and audio
information directly to home AP 414 after the connection has been
established. Device 112' may then receive the connection
establishment request and may be configured to determine whether to
accept the request. Determining whether to accept the request may
include, for example, presenting a visual narrative to a user of
device 112' inquiring as to whether to accept the connection
request from device 102'. Should the user of device 112' accept the
connection (e.g., accept the video call) the connection may be
established. Cameras 104' and 114' may then be configured to start
capturing images of the users of devices 102' and 112',
respectively, for use in animating the avatars selected by each
user. Microphones 106' and 116' may be configured to then start
recording audio from each user. As information exchange commences
between devices 102' and 112', displays 108' and 118' may display
and animate avatars corresponding to the users of devices 102' and
112'.
[0054] FIG. 5 is a flowchart of example operations in accordance
with at least one embodiment. In operation 502 an application
(e.g., an avatar-based video call application) may be activated in
a device. Activation of the application may be followed by
selection of an avatar. Selection of an avatar may include an
interface being presented by the application, the interface
allowing the user to select a predefined avatar. After avatar
selection, communications may be configured in operation 504.
Communication configuration includes the identification of at least
one remote device or a virtual space for participation in the video
call. For example, a user may select from a list of remote
users/devices stored within the application, stored in association
with another system in the device (e.g., a contacts list in a smart
phone, cell phone, etc.), or stored remotely, such as on the Internet
(e.g., in a social media website like Facebook, LinkedIn, Yahoo,
Google+, MSN, etc.). Alternatively, the user may select to go
online in a virtual space like Second Life.
[0055] In operation 506, communication may be initiated between the
device and the at least one remote device or virtual space. For
example, a connection establishment request may be transmitted to
the remote device or virtual space. For the sake of explanation
herein, it is assumed that the connection establishment request is
accepted by the remote device or virtual space. A camera in the
device may then begin capturing images in operation 508. The images
may be still images or live video (e.g., multiple images captured
in sequence). In operation 510 image analysis may occur starting
with detection/tracking of a face/head in the image. The detected
face may then be analyzed in order to detect facial characteristics
(e.g., facial landmarks, facial expression, etc.). In operation 512
the detected face/head position and/or facial characteristics are
converted into avatar parameters. The avatar parameters are used to
animate the selected avatar on the remote device or in the virtual
space. In operation 514 at least one of the avatar selection or the
avatar parameters may be transmitted.
[0056] Avatars may be displayed and animated in operation 516. In
the instance of device-to-device communication (e.g., system 100),
at least one of remote avatar selection or remote avatar parameters
may be received from the remote device. An avatar corresponding to
the remote user may then be displayed based on the received remote
avatar selection, and may be animated based on the received remote
avatar parameters. In the instance of virtual place interaction
(e.g., system 126), information may be received allowing the device
to display what the avatar corresponding to the device user is
seeing. A determination may then be made in operation 518 as to
whether the current communication is complete. If it is determined
in operation 518 that the communication is not complete, operations
508-516 may repeat in order to continue to display and animate an
avatar on the remote apparatus based on the analysis of the user's
face. Otherwise, in operation 520 the communication may be
terminated. The video call application may also be terminated if,
for example, no further video calls are to be made.
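The receive-and-display side of operations 516-518 can be pictured with the skeletal sketch below; the message format and the display/animate callbacks are hypothetical placeholders.

    # Receive-side loop: display on selection, animate on parameters.
    def run_display_loop(receive_fn, display_avatar_fn, animate_avatar_fn):
        """Consume remote messages until the communication is complete."""
        current_avatar = None
        for message in receive_fn():  # yields dicts until the call ends
            if "avatar_selection" in message:
                current_avatar = message["avatar_selection"]
                display_avatar_fn(current_avatar)  # operation 516: display
            if "avatar_parameters" in message and current_avatar is not None:
                animate_avatar_fn(current_avatar, message["avatar_parameters"])
        # Falling out of the loop corresponds to operation 520: terminate.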
[0057] While FIG. 5 illustrates various operations according to an
embodiment, it is to be understood that not all of the operations
depicted in FIG. 5 are necessary for other embodiments. Indeed, it
is fully contemplated herein that in other embodiments of the
present disclosure, the operations depicted in FIG. 5 and/or other
operations described herein may be combined in a manner not
specifically shown in any of the drawings, but still fully
consistent with the present disclosure. Thus, claims directed to
features and/or operations that are not exactly shown in one
drawing are deemed within the scope and content of the present
disclosure.
[0058] A system consistent with the present disclosure provides
detection and/or tracking of a user's eyes during active
communication, including the detection of characteristics of a
user's eyes, including, but not limited to, eyeball movement, gaze
direction and/or point of focus of the user's eyes, eye blinking,
etc. The system uses a statistical-based approach for the
determination of the status (e.g. open/closed eye and/or direction
of eye gaze) of a user's eyes. The system further provides avatar
animation based at least in part on the detected characteristics of
the user's eyes in real-time or near real-time during active
communication and interaction. Animation of a user's eyes may
enhance interaction between users, as the human eyes and the
characteristics associated with them, including movement and
expression, may convey rich information during active
communication, such as, for example, a user's interest, emotions,
etc.
[0059] A system consistent with the present disclosure provides
advantages. For example, the use of statistical-based methods
allows the performance of eye analysis and classification to be
improved by increasing sample collection and classifier
re-training. Additionally, in contrast to other known methods of
eye analysis, such as, for example, template-matching methods
and/or geometry-based methods, a system consistent with the present
disclosure generally does not require calibration before use nor
does the system require special hardware, such as, for example,
infrared lighting or a close-view camera. Additionally, a system
consistent with the present disclosure does not require a learning
process for new users.
[0060] Various features, aspects, and embodiments have been
described herein. The features, aspects, and embodiments are
susceptible to combination with one another as well as to variation
and modification, as will be understood by those having skill in
the art. The present disclosure should, therefore, be considered to
encompass such combinations, variations, and modifications. Thus,
the breadth and scope of the present invention should not be
limited by any of the above-described exemplary embodiments, but
should be defined only in accordance with the following claims and
their equivalents.
[0061] As used in any embodiment herein, the term "module" may
refer to software, firmware and/or circuitry configured to perform
any of the aforementioned operations. Software may be embodied as a
software package, code, instructions, instruction sets and/or data
recorded on non-transitory computer readable storage medium.
Firmware may be embodied as code, instructions or instruction sets
and/or data that are hard-coded (e.g., nonvolatile) in memory
devices. "Circuitry", as used in any embodiment herein, may
comprise, for example, singly or in any combination, hardwired
circuitry, programmable circuitry such as computer processors
comprising one or more individual instruction processing cores,
state machine circuitry, and/or firmware that stores instructions
executed by programmable circuitry. The modules may, collectively
or individually, be embodied as circuitry that forms part of a
larger system, for example, an integrated circuit (IC), system
on-chip (SoC), desktop computers, laptop computers, tablet
computers, servers, smart phones, etc.
[0062] Any of the operations described herein may be implemented in
a system that includes one or more storage mediums having stored
thereon, individually or in combination, instructions that when
executed by one or more processors perform the methods. Here, the
processor may include, for example, a server CPU, a mobile device
CPU, and/or other programmable circuitry. Also, it is intended that
operations described herein may be distributed across a plurality
of physical devices, such as processing structures at more than one
different physical location. The storage medium may include any
type of tangible medium, for example, any type of disk including
hard disks, floppy disks, optical disks, compact disk read-only
memories (CD-ROMs), compact disk rewritables (CD-RWs), and
magneto-optical disks, semiconductor devices such as read-only
memories (ROMs), random access memories (RAMs) such as dynamic and
static RAMs, erasable programmable read-only memories (EPROMs),
electrically erasable programmable read-only memories (EEPROMs),
flash memories, Solid State Disks (SSDs), magnetic or optical
cards, or any type of media suitable for storing electronic
instructions. Other embodiments may be implemented as software
modules executed by a programmable control device. The storage
medium may be non-transitory.
[0063] The terms and expressions which have been employed herein
are used as terms of description and not of limitation, and there
is no intention, in the use of such terms and expressions, of
excluding any equivalents of the features shown and described (or
portions thereof), and it is recognized that various modifications
are possible within the scope of the claims. Accordingly, the
claims are intended to cover all such equivalents.
[0064] As described herein, various embodiments may be implemented
using hardware elements, software elements, or any combination
thereof. Examples of hardware elements may include processors,
microprocessors, circuits, circuit elements (e.g., transistors,
resistors, capacitors, inductors, and so forth), integrated
circuits, application specific integrated circuits (ASIC),
programmable logic devices (PLD), digital signal processors (DSP),
field programmable gate array (FPGA), logic gates, registers,
semiconductor device, chips, microchips, chip sets, and so
forth.
[0065] Reference throughout this specification to "one embodiment"
or "an embodiment" means that a particular feature, structure, or
characteristic described in connection with the embodiment is
included in at least one embodiment. Thus, appearances of the
phrases "in one embodiment" or "in an embodiment" in various places
throughout this specification are not necessarily all referring to
the same embodiment. Furthermore, the particular features,
structures, or characteristics may be combined in any suitable
manner in one or more embodiments.
[0066] According to one aspect, there is provided a system for
interactive avatar communication between a first user device and a
remote user device. The system
includes a camera configured to capture images, a communication
module configured to initiate and establish communication, and to
transmit and receive information, between said first and said
remote user devices. The system further includes one or more
storage mediums having stored thereon, individually or in
combination, instructions that when executed by one or more
processors result in one or more operations. The operations include
selecting an avatar, initiating communication, capturing an image,
detecting a face in the image and determining facial
characteristics from the face. The facial characteristics include
at least one of eye movement and eyelid movement. The operations
further include converting the facial characteristics to avatar
parameters and transmitting at least one of the avatar selection
and avatar parameters.
[0067] Another example system includes the foregoing components and
determining facial characteristics from the face includes
determining a facial expression in the face.
[0068] Another example system includes the foregoing components and
the avatar selection and avatar parameters are used to generate an
avatar on a remote device, the avatar being based on the facial
characteristics.
[0069] Another example system includes the foregoing components and
the avatar selection and avatar parameters are used to generate an
avatar in a virtual space, the avatar being based on the facial
characteristics.
[0070] Another example system includes the foregoing components,
and the instructions, when executed by one or more processors,
result in the following additional operation: receiving at least
one of a remote avatar selection or remote avatar parameters.
[0071] Another example system includes the foregoing components and
further includes a display, and the instructions, when executed by
one or more processors, result in the following additional
operation: displaying an avatar based on the remote avatar
selection.
[0072] Another example system includes the foregoing components,
and the instructions, when executed by one or more processors,
result in the following additional operation: animating the
displayed avatar based on the remote avatar parameters.
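As a receiver-side counterpart to the sketch above, the additional operations of paragraphs [0070] through [0072] (receiving a remote avatar selection and remote avatar parameters, then displaying and animating an avatar) might be handled as follows. The message format reuses the hypothetical JSON of the sender sketch, and rendering is reduced to a print statement standing in for a display module.

```python
# Illustrative receiver-side handling of a remote avatar selection and
# remote avatar parameters. The JSON shape is an assumption carried over
# from the sender sketch; rendering is stubbed out as a print.
import json

class AvatarDisplay:
    def __init__(self):
        self.avatar = None   # set from the remote avatar selection
        self.params = {}     # updated from remote avatar parameters

    def on_message(self, raw):
        msg = json.loads(raw)
        if "avatar" in msg:  # display an avatar based on the selection
            self.avatar = msg["avatar"]
        if "params" in msg:  # animate the displayed avatar
            self.params.update(msg["params"])
        self.render()

    def render(self):
        # A real display module would draw the avatar here.
        print(f"drawing {self.avatar!r} with {self.params}")

display = AvatarDisplay()
display.on_message('{"avatar": "fox_01", "params": {"eye_open": 0.2}}')
```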
[0073] According to one aspect, there is provided an apparatus for
interactive avatar communication between a first user device and a
remote user device. The apparatus includes a communication module
configured to initiate and establish communication between the
first and the remote user devices and to transmit information
between the first and the remote user devices. The apparatus
further includes an avatar selection module configured to allow a
user to select an avatar for use during the communication. The
apparatus further includes a face detection module configured to
detect a facial region in an image of the user and to detect and
identify one or more facial characteristics of the face. The facial
characteristics include eye movement and eyelid movement of the
user. The apparatus further includes an avatar control module
configured to convert the facial characteristics to avatar
parameters. The communication module is configured to transmit at
least one of the avatar selection and avatar parameters.
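The module decomposition of this apparatus can be made concrete with a short structural sketch. The class names below mirror the modules recited in paragraph [0073], but the method names and data shapes are assumptions of this sketch, not interfaces taken from the disclosure.

```python
# Structural sketch of the apparatus of paragraph [0073]; interfaces and
# data shapes are assumptions, not taken from the disclosure.
class AvatarSelectionModule:
    def select(self, avatar_id):
        return avatar_id  # the avatar the user chose for the call

class FaceDetectionModule:
    def detect(self, image):
        # Would detect the facial region and identify characteristics,
        # including eye movement and eyelid movement of the user.
        return {"eye_movement": (0.0, 0.0), "eyelid_movement": 1.0}

class AvatarControlModule:
    def to_parameters(self, characteristics):
        # Convert facial characteristics to avatar parameters.
        return {"gaze": characteristics["eye_movement"],
                "eye_open": characteristics["eyelid_movement"]}

class CommunicationModule:
    def transmit(self, selection=None, parameters=None):
        # Transmit at least one of the avatar selection and parameters.
        print("sending:", selection, parameters)

# Wiring the modules together for a single frame (image source omitted):
selection = AvatarSelectionModule().select("fox_01")
characteristics = FaceDetectionModule().detect(image=None)
parameters = AvatarControlModule().to_parameters(characteristics)
CommunicationModule().transmit(selection, parameters)
```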
[0074] Another example apparatus includes the foregoing components
and further includes an eye detection/tracking module configured to
detect and identify at least one of eye movement of the user with
respect to a display and eyelid movement of the user.
[0075] Another example apparatus includes the foregoing components,
and the eye detection/tracking module includes an eye
classification module configured to determine at least one of gaze
direction of the user's eyes and blinking of the user's eyes.
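The disclosure leaves the classification method open. For the blinking half of this determination, one widely used heuristic is the eye aspect ratio (EAR), sketched below as a stand-in; the six-landmark eye layout and the 0.2 threshold are conventional assumptions and do not come from the source.

```python
# Blink detection via the eye aspect ratio (EAR), a common stand-in for
# the eye classification module; the patent does not prescribe this method.
from math import dist

def eye_aspect_ratio(eye):
    # eye: six (x, y) landmarks around the eye contour, ordered p1..p6,
    # with p1/p4 at the corners and p2, p3 / p6, p5 on the eyelids.
    p1, p2, p3, p4, p5, p6 = eye
    vertical = dist(p2, p6) + dist(p3, p5)
    horizontal = 2.0 * dist(p1, p4)
    return vertical / horizontal

def is_blinking(eye, threshold=0.2):
    # A small EAR means the eyelids are nearly closed.
    return eye_aspect_ratio(eye) < threshold

open_eye = [(0, 0), (1, 2), (3, 2), (4, 0), (3, -2), (1, -2)]
closed_eye = [(0, 0), (1, 0.2), (3, 0.2), (4, 0), (3, -0.2), (1, -0.2)]
print(is_blinking(open_eye), is_blinking(closed_eye))  # False True
```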
[0076] Another example apparatus includes the foregoing components
and the avatar selection and avatar parameters are used to generate
an avatar on the remote device, the avatar being based on the
facial characteristics.
[0077] Another example apparatus includes the foregoing components
and the communication module is configured to receive at least one
of a remote avatar selection and remote avatar parameters.
[0078] Another example apparatus includes the foregoing components
and further includes a display configured to display an avatar
based on the remote avatar selection.
[0079] Another example apparatus includes the foregoing components
and the avatar control module is configured to animate the
displayed avatar based on the remote avatar parameters.
[0080] According to another aspect there is provided a method for
interactive avatar communication. The method includes selecting an
avatar, initiating communication, capturing an image, detecting a
face in the image, determining facial characteristics from the
face, the facial characteristics comprising at least one of eye
movement and eyelid movement, converting the facial characteristics
to avatar parameters, and transmitting at least one of the avatar
selection and avatar parameters.
[0081] Another example method includes the foregoing operations and
determining facial characteristics from the face includes
determining a facial expression in the face.
[0082] Another example method includes the foregoing operations and
the avatar selection and avatar parameters are used to generate an
avatar on a remote device, the avatar being based on the facial
characteristics.
[0083] Another example method includes the foregoing operations and
the avatar selection and avatar parameters are used to generate an
avatar in a virtual space, the avatar being based on the facial
characteristics.
[0084] Another example method includes the foregoing operations and
further includes receiving at least one of a remote avatar
selection or remote avatar parameters.
[0085] Another example method includes the foregoing operations and
further includes displaying an avatar based on the remote avatar
selection on a display.
[0086] Another example method includes the foregoing operations and
further includes animating the displayed avatar based on the remote
avatar parameters.
[0087] According to another aspect there is provided at least one
computer accessible medium including instructions stored thereon.
When executed by one or more processors, the instructions may cause
a computer system to perform operations for interactive avatar
communication. The operations include selecting an avatar,
initiating communication, capturing an image, detecting a face in
the image, determining facial characteristics from the face, the
facial characteristics comprising at least one of eye movement and
eyelid movement, converting the facial characteristics to avatar
parameters, and transmitting at least one of the avatar selection
and avatar parameters.
[0088] Another example computer accessible medium includes the
foregoing operations and determining facial characteristics from
the face includes determining a facial expression in the face.
[0089] Another example computer accessible medium includes the
foregoing operations and the avatar selection and avatar parameters
are used to generate an avatar on a remote device, the avatar being
based on the facial characteristics.
[0090] Another example computer accessible medium includes the
foregoing operations and the avatar selection and avatar parameters
are used to generate an avatar in a virtual space, the avatar being
based on the facial characteristics.
[0091] Another example computer accessible medium includes the
foregoing operations and further includes receiving at least one of
a remote avatar selection or remote avatar parameters.
[0092] Another example computer accessible medium includes the
foregoing operations and further includes displaying an avatar
based on the remote avatar selection on a display.
[0093] Another example computer accessible medium includes the
foregoing operations and further includes animating the displayed
avatar based on the remote avatar parameters.
* * * * *