U.S. patent application number 13/994761 was published by the patent office on 2013-10-17 for facilitating television based interaction with social networking tools.
The applicants and inventors listed for this patent are Wenlong Li, Yangzhou Du, Jianguo Li, Xiaofeng Tong, and Peng Wang.
Application Number | 13/994761 |
Publication Number | 20130276007 |
Document ID | / |
Family ID | 47882502 |
Publication Date | 2013-10-17 |
United States Patent Application | 20130276007 |
Kind Code | A1 |
Li; Wenlong; et al. | October 17, 2013 |
Facilitating Television Based Interaction with Social Networking Tools
Abstract
Video analysis may be used to determine who is watching
television and their level of interest in the current programming.
Lists of favorite programs may be derived for each of a plurality
of viewers of programming on the same television receiver.
Inventors: | Li; Wenlong; (Beijing, CN); Du; Yangzhou; (Beijing, CN); Li; Jianguo; (Beijing, CN); Tong; Xiaofeng; (Beijing, CN); Wang; Peng; (Beijing, CN) |

Applicant: |
Name | City | State | Country | Type
Li; Wenlong | Beijing | | CN |
Du; Yangzhou | Beijing | | CN |
Li; Jianguo | Beijing | | CN |
Tong; Xiaofeng | Beijing | | CN |
Wang; Peng | Beijing | | CN |
Family ID: | 47882502 |
Appl. No.: | 13/994761 |
Filed: | September 12, 2011 |
PCT Filed: | September 12, 2011 |
PCT No.: | PCT/CN2011/001544 |
371 Date: | June 17, 2013 |
Current U.S. Class: | 725/12; 725/10 |
Current CPC Class: | H04N 21/44218 20130101; G06Q 50/01 20130101 |
Class at Publication: | 725/12; 725/10 |
International Class: | H04N 21/442 20060101 H04N021/442 |
Claims
1. A method comprising: electronically determining whether a person
is watching television content.
2. The method of claim 1 further including electronically
determining if the person is also online.
3. The method of claim 1 including using electronic video analysis
to determine if the person is watching television.
4. The method of claim 1 including automatically indicating, using
a social networking tool, when the person is watching
television.
5. The method of claim 1 including automatically electronically
assessing whether the person likes the television content.
6. The method of claim 5 including using video facial analysis to
assess whether the person likes the television content.
7. The method of claim 5 including automatically reporting the
analysis of whether the user likes the content using a social
networking tool.
8. The method of claim 1 including electronically identifying a
plurality of persons who are watching television.
9. The method of claim 8 including using electronic video analysis
to identify each of a plurality of persons who are watching
television.
10. The method of claim 8 including electronically developing lists
of favorite programs for each of a plurality of said persons.
11. A non-transitory computer readable medium storing instructions
executed by a computer to: determine if a person is actually
watching television; and transmit information about whether the
person is watching television, using a social networking tool.
12. The medium of claim 11 further storing instructions to
determine if the person is also online and to transmit this
information to a social networking tool.
13. The medium of claim 12 further storing instructions to
determine if the person likes what is on the television and to
report that information to a social networking tool.
14. The medium of claim 13 further storing instructions to use
video facial analysis to determine whether the person likes what is
on the television.
15. The medium of claim 11 further storing instructions to use
video analysis to identify each of a plurality of persons watching
the television.
16. The medium of claim 15 further storing instructions to
determine which television programs each of said persons likes.
17. The medium of claim 16 further storing instructions to compile
lists of television programs each of a plurality of viewers
likes.
18. The medium of claim 14 further storing instructions to link with a friend on a social networking site who likes the same television program.
19. A system comprising: a processor to identify a person watching television and to report, through a social networking tool, that the person is watching television; and a storage coupled to said processor.
20. The system of claim 19, said processor to determine if the person is also online.
21. The system of claim 19 including a video camera coupled to said
processor, said processor to use video facial analysis to determine
if the person likes a program on television.
22. The system of claim 19, said processor to identify a plurality
of people watching television.
23. The system of claim 22, said processor to determine which programs each of said people likes.
24. The system of claim 23, said processor to compile lists of programs each of said people likes.
25. The system of claim 24 including a video camera coupled to said
processor, said processor to use video analysis to determine who is
watching television and whether they like a program on
television.
26. The system of claim 19, said processor to communicate, using a
social network tool, whether the person likes a television
program.
27. The system of claim 26, said processor coupled to a video
camera, said processor to analyze video to determine whether the
user likes the television program.
Description
BACKGROUND
[0001] This relates generally to television and to interaction with
social networking tools.
[0002] Social networking tools have become essential to the lives
of many people. Social networking tools allow their users to keep
track of their friends and to find sources of additional contacts
with existing and new friends.
[0003] One advantage of social networking is that friends with similar interests can be identified. However, determining what those interests are usually requires a lot of user input. For example, a user may maintain a Facebook page that indicates areas of interest. The amount of information provided may be limited because of the time it takes, and the imagination it may involve, to provide a full exposition of all the user's interests, likes, and dislikes.
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] FIG. 1 is a schematic depiction of one embodiment of the
present invention;
[0005] FIG. 2 is a flow chart for one embodiment of the present
invention; and
[0006] FIG. 3 is a flow chart for another embodiment of the present
invention.
DETAILED DESCRIPTION
[0007] In accordance with some embodiments, information about a
user's television experience may be automatically conveyed to
social networking tools as a modality for increasing social
interaction. Moreover, some embodiments may determine not only whether the user is online, but also whether the user is actually proximate to the user's television display. In some embodiments, it can be determined from the user's facial expressions whether the user likes or dislikes the currently displayed programming. Also, in some embodiments, favorite program
lists for various television viewers may be compiled in an
automated fashion. This information may then be uploaded to social
networking tools or to other avenues for social interaction.
[0008] Referring to FIG. 1, a television display 18 may be
equipped, in one embodiment, with a television camera 16. While, in
some embodiments, the television camera may be mounted on or
integrated with the television display 18, the camera, of course,
can be completely separate from the television display. However, it
is advantageous that the camera 16 be mounted in a way that it can
capture images of those people watching the television and can also
capture their facial expressions. The television 18 may receive a video source, which may be an airwave broadcast, streamed Internet content, a digital movie from a storage device such as a DVD player, an interactive game played over the Internet, or content from a digital media player.
[0009] The output from the camera 16 may be connected to a
processor-based system 10. The processor-based system 10 may be any
type of computer, including a laptop computer, a desktop computer,
an entertainment device, or a cell phone, to mention a few
examples. The processor-based system 10 may include a video
interface 22 that receives the video from the camera 16 and
converts it to the proper format for use by a processor 12. The
video interface may provide the video for a user status module
24.
[0010] In accordance with one embodiment, the user status module
determines whether the user is actually online and, in some
embodiments, whether the user is actually watching television. The
online status can be determined from detecting inputs and outputs
through a network interface controller, for example. Whether the
user is actually viewing the program can be determined, for
example, from video analysis of the camera 16 video feed to detect
whether the user is present in front of the television screen.
[0011] In some embodiments, the user status module may detect numerous television viewers. Each of those viewers may be identified by automated facial analysis. For example, in a setup mode, each viewer may be prompted to cooperate in the capture of a reference picture of his or her face. Then the system can compare the faces of current viewers of the television program with those prerecorded video clips or still shots taken during the setup mode, to identify the currently active viewers.
[0012] Thus, user status module 24, in some embodiments, not only
indicates whether or not any viewer is viewing the television but
actually identifies which of a plurality of viewers are actually
viewing the television display 18.
[0013] The user status module may be coupled to a user interest
detection module 26 that also receives a video feed from the video
interface 22. The user interest detection module 26 may analyze the
user's facial expressions using video facial expression analysis
tools to determine whether the user is interested or uninterested in the program. Likewise, facial expression analysis can be used to
determine whether the user likes the program or dislikes the
program. Information from the user interest detection module may be
combined with the results from the user status module for provision
of information to a social networking interface 28. In some
embodiments, instantaneous video facial analysis of the user's
likes and dislikes may be conveyed to social networking tools. A
"social networking tool," as used herein, is an electronic
communication technology, such as a website, that helps people
interact with existing friends or colleagues and/or helps people
discover new friends or colleagues by illuminating shared
interests. Also, emails, tweets, text messages or other
communications may be provided, as part of a social networking
tool, to indicate the user's current activity and level of
satisfaction.
[0014] In some embodiments, video clips from the television program may be captured and conveyed to the processor 12 for distribution
over the social networking interface 28 together with an indication
of the user's viewing status and the user's current level of
interest.
[0015] The storage 14 may store the captured video and may also
store programs 30 and 50 for implementing embodiments of the
present invention.
[0016] In particular in some embodiments of the present invention,
the sequences depicted in FIGS. 2 and 3 may be implemented in
hardware, software and/or firmware. In software or firmware
implemented embodiments, the sequences may be implemented by
computer executed instructions stored on a non-transitory storage
medium such as a semiconductor, magnetic or optical storage
device.
[0017] Referring to FIG. 2, one sequence may begin in one
embodiment by receiving a feed (as indicated in block 32) from the
camera 16, shown in FIG. 1. An initial phase (block 34) may involve
a password login through a user interface or a face login wherein
the user submits to video facial analysis using the camera 16 and
the user status module 24. Once the user has identified himself or
herself through a login and/or facial recognition, the user may
select a video program for viewing, as indicated in block 36. This program may be identified using a variety of tools, including capturing information from an electronic programming guide; capturing video, audio, or metadata clips and analyzing them; using inputs from the user's friends over social networking tools; using Internet or database image or text searching; or using any other tool.
[0018] Then the user's online status may be determined through
video face detection as indicated in block 38. Namely the user can
be identified through analysis of the camera feed 16 to determine
that the user not only is active on his or her processor based
system 10 but is actually in front of and viewing an active
television program.
[0019] Next, the level of user interest may be determined using facial expression analysis, as indicated in block 40. Well-known video facial analysis techniques may be used to determine whether the user is interested or uninterested, and whether the user likes or dislikes a particular sequence in the video. Thus, real-time information may be provided to indicate whether the user's level of interest, likes, or dislikes have changed. This may be correlated in time to the content currently being viewed, for example, while providing captured video clips from that content together with the indication of the user's level of interest.
[0020] The video facial analysis can be done locally or remotely.
Remote video analysis may be accomplished by sending video to a
remote server over a network connection, for example.
[0021] The information deduced from the facial expression analysis
may then be conveyed to friends using social networking tools as
indicated in block 42. In some embodiments, the social networking message distribution may be screened or filtered so that the message reaches only those users who are friends, friends who like the same television program, friends who are actually online, friends who are actually watching television, or some combination of these categories, as indicated in block 42. Friends can then be linked if they like the same television program, for example.
[0022] This social networking tool interaction provides a way to share information about the user, which may facilitate engagement with new friends and create resources for interaction
with existing friends. In addition, the information may be used for
demographics collection by content providers and advertisers.
Particularly, content providers or advertisers may get very
detailed information about what users liked at particular times
during a given program or advertisement.
[0023] In one exemplary embodiment, six major steps may be used for facial attribute detection. First, face detection may be run to locate a face rectangle region in a given digital image or video frame. Then, a facial landmark detector may be run to find six landmark points, such as eye corners and mouth corners, in each detected face rectangle. Next, the face rectangle image may be aligned and normalized, according to the facial landmark points, to a predefined standard size, such as 64×64 (i.e., 64 pixels wide by 64 pixels tall). Then local features may be extracted, such as local binary pattern histograms or histograms of oriented gradients, from preselected local regions of the normalized face images. Each local region is then fed to a multi-layer perceptron-based weak classifier for prediction. Finally, the outputs from the weak classifiers of all local regions are aggregated into the final detection score. The score may be in the range of 0 to 1; the larger the score, the higher the facial attribute detection confidence. Face detection may follow the standard Viola-Jones boosting cascade framework. The Viola-Jones detector can be found in the public OpenCV software package. The facial landmarks include six facial points: the eye corners of the left and right eyes and the mouth corners. The eye corners and mouth corners may also be detected using Viola-Jones based classifiers. In addition, geometry constraints may be incorporated into the six facial points to reflect their geometric relationship.
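As a rough illustration, the six steps above can be sketched as a pipeline of interchangeable stages. All stage names here are hypothetical stand-ins, not from the application; in practice the face and landmark detectors could be Viola-Jones cascades as noted above.

```python
def detect_facial_attribute(frame, stages):
    """Run the six-step attribute pipeline: detect a face, find landmarks,
    align/normalize, extract per-region local features, classify each region
    with a weak classifier, and aggregate into a score in the range 0 to 1.

    `stages` maps stage names to caller-supplied callables (hypothetical API).
    """
    face_rect = stages["detect_face"](frame)            # 1. face rectangle
    landmarks = stages["find_landmarks"](face_rect)     # 2. six landmark points
    face = stages["align"](face_rect, landmarks)        # 3. normalize, e.g. to 64x64
    features = stages["extract_features"](face)         # 4. local features per region
    scores = [stages["classify_region"](f) for f in features]  # 5. weak classifiers
    return sum(scores) / len(scores)                    # 6. aggregated detection score
```

Each stage is a callable the caller supplies, so the same skeleton accommodates different detectors, feature types, and weak classifiers.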
[0024] All detected faces may be converted to gray scale, then aligned and normalized to the predefined standard size, such as 64×64. The alignment may be done by first computing the rotation angle between the eye-corner line and the horizontal, then rotating the image by that angle to make the eye-corner line parallel to the horizontal. Next, the eye-center distance w and the eye-to-mouth distance h are computed. Then a 2w×2h rectangle is cropped from the face region so that the left eye center lands at (0.5w, 0.5h), the right eye center at (1.5w, 0.5h), and the mouth center at (w, 1.5h). The cropped rectangle is finally scaled to the standard size. To alleviate lighting differences between images, the scaled image can be histogram equalized.
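A minimal sketch of this alignment geometry in Python follows. The function name is mine, and it assumes the eye-to-mouth distance h is measured vertically after de-rotation; points are (x, y) tuples with y growing downward.

```python
import math

def alignment_geometry(left_eye, right_eye, mouth):
    """Rotation angle and crop rectangle from the eye centers and mouth center."""
    # Rotation angle between the eye-center line and the horizontal
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = math.degrees(math.atan2(dy, dx))

    # Eye-center distance w and eye-to-mouth distance h
    w = math.hypot(dx, dy)
    eye_mid = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    h = abs(mouth[1] - eye_mid[1])  # vertical distance, assuming de-rotated points

    # Crop a 2w x 2h rectangle so the left eye lands at (0.5w, 0.5h),
    # the right eye at (1.5w, 0.5h), and the mouth near (w, 1.5h)
    crop = (eye_mid[0] - w, eye_mid[1] - 0.5 * h, 2 * w, 2 * h)  # (x, y, width, height)
    return angle, w, h, crop
```

The cropped rectangle would then be rotated by `-angle`, scaled to 64×64, and histogram equalized, per the description above.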
[0025] Local features may be extracted from local regions of the aligned and normalized faces. The local features can be local binary pattern histograms or histograms of oriented gradients. The best-performing local features may differ between facial attributes: in smile detection, local binary patterns are a little better than other techniques, while in gender/age detection, histograms of oriented gradients work slightly better.
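For illustration, the basic 8-bit local binary pattern code for a single pixel can be computed as follows. This is a generic LBP sketch rather than code from the application; a real feature extractor would histogram these codes over each local region.

```python
def lbp_code(patch):
    """8-bit local binary pattern code for the center pixel of a 3x3
    grayscale patch: each of the 8 neighbors contributes a 1 bit if its
    value is greater than or equal to the center value."""
    center = patch[1][1]
    # Neighbors visited clockwise, starting from the top-left corner
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << bit
    return code
```

Restricting the histogram to the so-called uniform patterns is what yields the compact 59-bin LBP histograms commonly used as MLP inputs.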
[0026] A local region is defined as a quadruple (x, y, w, h), where (x, y) is the top-left corner point of the local region and (w, h) are the width and height of its rectangle. A boosting algorithm may be used to select discriminating regions for facial attribute detection from a training dataset.
[0027] For each selected local region, a classifier may be trained to do the weak classification. The base classifier may be a multi-layer perceptron rather than a support vector machine. Multi-layer perceptrons (MLPs) may be advantageous in some embodiments because they can provide performance similar to state-of-the-art support vector machine-based algorithms. Also, the model size of an MLP is much smaller than that of a support vector machine (SVM), since an MLP stores only network weights as its model while an SVM stores sparse training samples. Prediction with an MLP is relatively fast, as it involves only vector product operations, and the MLP directly outputs a probability score that serves as the prediction confidence.
[0028] The MLP may include an input layer, an output layer, and one hidden layer. There may be d nodes at the input layer, where d is the dimension of the local features (59 for local binary pattern histograms), and 2 nodes at the output layer; for smile detection, the 2 output nodes indicate the prediction probabilities for smiling and non-smiling. The number of nodes in the hidden layer is a tuned parameter determined by the training procedure.
[0029] All nodes, known as neurons, in the MLP may be similar. Each node takes the output values from several nodes in the previous layer as input and passes its response to the neurons in the next layer. The values retrieved from the previous layer are summed with trained weights for each node, plus a bias term, and the sum is transformed using an activation function f.
[0030] The activation function f is usually a sigmoid function, such as f(x)=e.sup.-xa/(1+e.sup.-xa). The output of this function is in the range of 0 to 1. At each node, the computation is a vector product between a weight vector and the input vector from the previous layer: y=f(w·x), where w is the weight vector and x is the input vector. Thus, the computations can easily be accelerated by single instruction, multiple data (SIMD) instructions or other accelerators.
[0031] An MLP is used as a weak classifier for each local region; each selected region is associated with one MLP classifier. The final classification is based on a simple aggregating rule: for a given test sample x, for each selected local region k, extract the local features x.sub.k at that region and use the weak MLP classifier C.sub.k(x.sub.k) to do the prediction. The final output is the aggregated result C(x)=(1/K).SIGMA..sub.k=1.sup.K C.sub.k(x.sub.k).
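The aggregation rule amounts to a plain average of the K weak classifiers' scores, roughly as follows (the classifier objects here are stand-ins for trained MLPs):

```python
def aggregate(classifiers, local_features):
    """Final detection score: the mean of the K weak MLP classifier outputs,
    one per selected local region, yielding a score in the range 0 to 1."""
    k = len(classifiers)
    assert k == len(local_features) and k > 0
    return sum(c(x) for c, x in zip(classifiers, local_features)) / k
```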
[0032] Referring next to FIG. 3, a camera feed is received at block 32. At block 52, a list of viewers may be assembled using facial detection and recognition. In other words, all the people viewing the content (such as a television program) may be recorded using the camera 16. Then, video content analysis may be used to identify the viewers who are watching and are depicted in that video stream. Again, faces may be recorded with identifiers in a setup phase, in one embodiment.
[0033] Video expression analysis may then be used to determine which of the users viewing the program actually like the given program at a given instant in time, as indicated in block 54. Over time, favorite program lists for each video-identified viewer may be developed, as indicated in block 56. Then, in block 58, program recommendations based on the users' computer-detected facial expressions may be pushed to friends over social networking tools, including websites, tweets, text messages, or emails, for example.
[0034] References throughout this specification to "one embodiment"
or "an embodiment" mean that a particular feature, structure, or
characteristic described in connection with the embodiment is
included in at least one implementation encompassed within the
present invention. Thus, appearances of the phrase "one embodiment"
or "in an embodiment" are not necessarily referring to the same
embodiment. Furthermore, the particular features, structures, or
characteristics may be instituted in other suitable forms other
than the particular embodiment illustrated and all such forms may
be encompassed within the claims of the present application.
[0035] While the present invention has been described with respect
to a limited number of embodiments, those skilled in the art will
appreciate numerous modifications and variations therefrom. It is
intended that the appended claims cover all such modifications and
variations as fall within the true spirit and scope of this present
invention.
* * * * *