U.S. patent application number 14/551985 was filed with the patent office on 2015-03-19 for system for say-feel gap analysis in video.
The applicant listed for this patent is Sensory Logic, Inc. Invention is credited to Daniel A. Hill.
Application Number | 20150081304 14/551985 |
Document ID | / |
Family ID | 48280715 |
Filed Date | 2015-03-19 |
United States Patent Application | 20150081304
Kind Code | A1
Hill; Daniel A. | March 19, 2015
SYSTEM FOR SAY-FEEL GAP ANALYSIS IN VIDEO
Abstract
Systems and techniques using observed emotional data are
described herein. An audio stream of a subject corresponding in
time to a sequence of visual observations of the subject can be
received. A transcript of speech uttered in the audio stream can be
produced. A meaning of a string in the transcript can be
determined. The sequence of visual observations that correspond to
speech that produced the string can be received. An emotional state
of the subject can be determined based on the sequence of visual
observations. A correlation value can be calculated for the string
by comparing the meaning and the emotional state.
Inventors: | Hill; Daniel A.; (St. Paul, MN)
Applicant:
Name | City | State | Country | Type
Sensory Logic, Inc. | Minneapolis | MN | US |
Family ID: | 48280715
Appl. No.: | 14/551985
Filed: | November 24, 2014
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
13676296 | Nov 14, 2012 | 8903176
14551985 | |
61559582 | Nov 14, 2011 |
Current U.S. Class: | 704/254
Current CPC Class: | G06K 9/00302 20130101; G06K 9/46 20130101; G10L 15/1815 20130101; G06K 9/00845 20130101
Class at Publication: | 704/254
International Class: | G10L 15/18 20060101 G10L015/18
Claims
1. A system comprising: an audio processing module to: receive an
audio stream of a subject corresponding in time to a sequence of
visual observations of the subject; and produce a transcript of
speech uttered in the audio stream; a semantic processing module to
determine a meaning of a string in the transcript; an image
processing module to receive the sequence of visual observations
that correspond to speech that produced the string; an emotion
determination module to determine an emotional state of the subject
based on the sequence of visual observations; and a difference
module to calculate a correlation value for the string by comparing
the meaning and the emotional state.
2. The system of claim 1 comprising a presentation module
configured to present the correlation to a user.
3. The system of claim 2, wherein to present the correlation
includes the presentation module to play the sequence of visual
observations, and to produce an audio representation with the
sequence of visual observations.
4. The system of claim 3, wherein the audio representation includes
a modified aspect of the audio stream.
5. The system of claim 2, wherein to present the correlation
includes the presentation module to present a visual indication of
the emotional state in a representation of the transcript
corresponding to the string.
6. The system of claim 2, wherein to present the correlation
includes the presentation module to vary an intensity of the
presentation based on the magnitude of the correlation.
7. The system of claim 1, wherein the correlation includes an
engagement component.
8. The system of claim 1, wherein the correlation includes an
emotional response component, the emotional response component
including at least one of impact or appeal.
9. A method comprising: receiving an audio stream of a subject
corresponding in time to a sequence of visual observations of the
subject; producing a transcript of speech uttered in the audio
stream; determining a meaning of a string in the transcript;
receiving the sequence of visual observations that correspond to
speech that produced the string; determining an emotional state of
the subject based on the sequence of visual observations; and
calculating a correlation value for the string by comparing the
meaning and the emotional state.
10. The method of claim 9, comprising presenting the correlation to
a user.
11. The method of claim 10, wherein presenting the correlation
includes playing the sequence of visual observations, and producing
an audio representation with the sequence of visual
observations.
12. The method of claim 11, wherein the audio representation
includes a modified aspect of the audio stream.
13. The method of claim 10, wherein presenting the correlation
includes presenting a visual indication of the emotional state in a
representation of the transcript corresponding to the string.
14. The method of claim 10, wherein presenting the correlation
includes creating a modified sequence of images by changing a
portion of an image in the sequence of images, and playing the
modified sequence of images.
15. The method of claim 10, wherein presenting the correlation
includes varying an intensity of the presentation based on the
magnitude of the correlation.
16. The method of claim 9, wherein the correlation includes an
emotional response component, the emotional response component
including at least one of impact or appeal.
17. A machine readable medium that is not a transitory propagating
signal, the machine readable medium including instructions that,
when executed by a machine, cause the machine to perform operations
comprising: receiving an audio stream of a subject corresponding in
time to a sequence of visual observations of the subject; producing
a transcript of speech uttered in the audio stream; determining a
meaning of a string in the transcript; receiving the sequence of
visual observations that correspond to speech that produced the
string; determining an emotional state of the subject based on the
sequence of visual observations; and calculating a correlation
value for the string by comparing the meaning and the emotional
state.
18. The machine readable medium of claim 17, wherein the operations
include presenting the correlation to a user.
19. The machine readable medium of claim 18, wherein presenting the
correlation includes playing the sequence of visual observations,
and producing an audio representation with the sequence of visual
observations.
20. The machine readable medium of claim 19, wherein the audio
representation includes a modified aspect of the audio stream.
21. The machine readable medium of claim 18, wherein presenting the
correlation includes presenting a visual indication of the
emotional state in a representation of the transcript corresponding
to the string.
22. The machine readable medium of claim 18, wherein presenting the
correlation includes creating a modified sequence of images by
changing a portion of an image in the sequence of images, and
playing the modified sequence of images.
23. The machine readable medium of claim 18, wherein presenting the
correlation includes varying an intensity of the presentation based
on the magnitude of the correlation.
24. The machine readable medium of claim 17, wherein the
correlation includes an emotional response component, the emotional
response component including at least one of impact or appeal.
Description
CLAIM OF PRIORITY
[0001] This application is a continuation of and claims the benefit
of priority to U.S. patent application Ser. No. 13/676,296, filed
14 Nov. 2012, which claims the benefit of priority, under 35 U.S.C.
§ 119(e), to U.S. Provisional Application Ser. No. 61/559,582,
filed Nov. 14, 2011, which applications are hereby incorporated by
reference in their entirety.
BACKGROUND
[0002] Applications (e.g., computer games, interactive forms, word
processors and other productivity programs, mobile applications,
etc.) often include a user interface through which a user interacts
with the application. User interfaces can vary widely, but usually
include one or more input devices (e.g., mouse, keyboard, touch
screen, etc.) manipulated by the user to interact with the
application. The user interface generally includes observable
elements (e.g., visual elements such as graphics, icons, fields;
audio elements; haptic feedback such as vibrations; among others)
that present an environment with which the user interacts. It is
often a goal of user interface designers to balance the
capabilities of the interface (e.g., how much a user can accomplish
with the user interface) with increasing the ease with which the
user can use the interface.
BRIEF DESCRIPTION OF THE DRAWINGS
[0003] In the drawings, which are not necessarily drawn to scale,
like numerals may describe similar components in different views.
Like numerals having different letter suffixes may represent
different instances of similar components. The drawings illustrate
generally, by way of example, but not by way of limitation, various
embodiments discussed in the present document.
[0004] FIG. 1 illustrates an example of a system for emotionally
sensitive application operation, according to an embodiment.
[0005] FIG. 2 illustrates an example of a method for emotionally
sensitive application operation, according to an embodiment.
[0006] FIG. 3 illustrates an example of an emotional state chart,
according to an embodiment.
[0007] FIG. 4 illustrates an example of a system for say-feel gap
identification and use, according to an embodiment.
[0008] FIG. 5 illustrates an example of a method for say-feel gap
identification and use, according to an embodiment.
[0009] FIG. 6 illustrates an example of a system for facilitating
human facial coders, according to an embodiment.
[0010] FIG. 7 illustrates an example of a media source of facial
images, according to an embodiment.
[0011] FIG. 8 illustrates an example of a facial image with
identified media aspects, according to an embodiment.
[0012] FIG. 9 illustrates an example of an enhanced media source,
according to an embodiment.
[0013] FIG. 10 illustrates an example of a user interface to
present an enhanced media source to a user, according to an
embodiment.
[0014] FIG. 11 illustrates an example of a method for facilitating
human facial coders, according to an embodiment.
[0015] FIG. 12 is a block diagram illustrating an example of a
machine upon which one or more embodiments may be implemented.
DETAILED DESCRIPTION
[0016] Application user interfaces can be improved by dynamically
modifying the application's behavior based on observed emotional
states of a subject. For example, where the subject is the user of
the application, a game may use a camera as an additional input
device to observe the face of the subject during play. The
observations may be used to determine that the user is, for
example, growing bored with the game. In response to this
determination, the game may change to present additional challenges
to the user in an effort to increase user interest. In an example,
the application can be an interactive form, such as a tax form. The
camera may be used to gather both emotionally relevant observations
and corresponding application targets (e.g., a particular
field to be completed). This data can be used to, for example,
provide the user with additional help for the field (e.g., a pop-up
tutorial, an offer to chat with a live representative, a presentation
of frequently asked questions (FAQs), etc.) when the user is
frustrated while dealing with the field.
[0017] In an example, the user and subject can be different, such as
an application that monitors patients in a hospital. In this
example, the user can be a nurse while the subject is a patient.
Visual (or non-visual) emotional cues can be observed by a patient
monitor and an emotional state for the patient can be determined.
If, for example, the emotional state indicates surprise, alarm, or
other emotional state that can indicate a problem, the monitor may
change to alert (e.g., via sound, changed text, changed colors,
paging the user, etc.) the nurse. In an example, the user can be an
automated system, such as a car control system. In this example,
the subject can be the car's driver. The system can determine,
through visual observations, that the driver is experiencing one or
more emotions that can lead to dangerous behavior, such as
frustration, anger, etc. In response, the system can implement
mitigation procedures, such as controlling the car's maximum speed,
sounding an alarm, contacting a family member, or taking control of
the vehicle.
[0018] Using observed emotional data can permit a user interface
designer to increase the capabilities of the interface while
providing a method to identify and adapt to problem areas of the
interface. This approach can increase user satisfaction and thus
the value of the application. Additional details are given below
including with regard to FIGS. 1-3.
[0019] An example application for observed emotional data is in
measuring the veracity of a person's oral representations. For
example, observed emotional data can be used to determine if a
politician delivering a promise at a campaign event believes in the
promise, or in her ability to deliver on the promise. Such observed
emotional data can, for example, be illustrated on a video stream
of the campaign event, such as a televised debate. However, an
observer (e.g., constituent) may be taxed to connect which emotions
are directed to which words in the speech. Emotional observations
of the subject (politician) can be used in conjunction with
semantic understanding of the politician's words or phrases to
automatically augment the presentation to observers. A say-feel gap
(e.g., the difference between the meaning of the word and the
emotional state of the speaker) can be determined. This say-feel
gap can be used to, for example, modify the pitch, tone, or
magnitude of the speech's audio to indicate the confidence the
speaker has in her words. In this example, the speech can be
presented with baseline operation (e.g., unmodified audio) until the
promise, which the politician does not intend to keep. At this
juncture, the audio can be changed to be lower in tone and quieter,
to indicate the politician's lack of confidence. Conversely, the
audio can be made louder to illustrate a portion of the speech in
which the politician is particularly confident. In an example, the
transcript of the speech can be presented to the observer. Strings
(e.g., words or phrases) in the transcript can be marked (e.g.,
highlighted, enlarged, changed in color, etc.) to represent the
emotional state of the speaker correlated to the string. Additional
details are given below including with regard to FIGS. 3-5.
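The say-feel gap described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: it
assumes the meaning of a string and the observed emotional state have
each already been reduced to a single valence score between -1.0 and
1.0, which this document does not specify. The names ScoredString,
say_feel_gap, and audio_gain, and the linear gain mapping, are
hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScoredString:
    """A transcript string with hypothetical valence scores in [-1.0, 1.0]."""
    text: str
    said_valence: float  # assumed output of semantic processing
    felt_valence: float  # assumed output of emotion determination


def say_feel_gap(item: ScoredString) -> float:
    """Return the gap between what was said and what was felt (0.0 to 2.0)."""
    return abs(item.said_valence - item.felt_valence)


def audio_gain(item: ScoredString, min_gain: float = 0.5) -> float:
    """Map the gap to a playback gain.

    No gap keeps the baseline volume (1.0); the largest possible gap
    lowers the volume to min_gain. The linear mapping is an assumption.
    """
    gap_fraction = say_feel_gap(item) / 2.0
    return 1.0 - gap_fraction * (1.0 - min_gain)


# Example strings with made-up scores, for illustration only.
promise = ScoredString("I will cut taxes", said_valence=0.8, felt_valence=-0.4)
aside = ScoredString("Thank you all for coming", said_valence=0.6, felt_valence=0.6)

for item in (promise, aside):
    print(item.text, round(say_feel_gap(item), 2), round(audio_gain(item), 2))
```

In this sketch only the quieter case is modeled; making confident
passages louder would need a separate confidence measure that is not
assumed here.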
[0020] The emotional state determination discussed above can be
automatically accomplished from the observed emotional indicators
and applied in real-time, or near real-time to modify the described
applications. However, trained human facial coders, for example,
can still provide valuable emotional state determinations in many
situations. For example, in product marketing campaigns, video of a
target group can be gathered while members of the group interact
with a product. This video can be used by human facial coders to
identify emotional indicia that can be used to determine the
subject's emotional state. This information can be compiled and
presented to a customer in a report. In an example, human facial
coders can be used to increase the accuracy of emotional state
determination over that of some automated systems. However, the
rapidity with which an emotional cue may pass (e.g., sub-second
expression in video) and the possibility for long media sources
(e.g., observing a long video) can lead to the problem whereby the
facial coder must manually speed, slow, and rewind the media source
to perform their function. This problem can be addressed by
enhancing the media source based on an emotional determination
system. For example, emotional indicia can be determined for a
media source (e.g., a video). These indicia can be enlarged, color
coded, or the video slowed, to help the human coder identify the
appropriate emotional indicia. In an example, the individual
emotional elements can be separated from the media source to, for
example, anonymize the data or to allow these discrete elements to
be processed by one or more different facial coders. This data can
then be sent to multiple human facial coders (e.g., group or
crowd-sourcing) to perform the analysis. In an example, the
enhanced media source can include a "first pass" analysis that is
presented and confirmed (or corrected) by the human coder.
Additional details are given below including with regard to FIGS. 3
and 6-11.
[0021] Enhancing applications with observed emotional data for a
subject can increase the usability and utility of a variety of
applications as described above. This can lead to increased
productivity, user satisfaction, and other benefits for application
developers, purveyors, and consumers.
[0022] FIG. 1 illustrates an example of a system 100 for
emotionally sensitive application operation. The system 100 can
include a device 120. The device 120 can include an image
processing module 105, an emotion determination module 110, and a
modification module 115. In an example, any one or more of the
image processing module 105, emotion determination module 110, or
the modification module 115 can be remote, such as in a network or
cloud server of a service provider.
[0023] The image processing module 105 can be configured to receive
a sequence of visual observations of a subject 125 during execution
of an application. As illustrated in FIG. 1, the device 120
includes a camera to observe the subject 125. In an example, the
camera can be a peripheral to the device 120. In an example, the
camera can be a stand-alone camera (e.g., camcorder) or other
remote device positioned in such a way as to observe the subject
125.
[0024] The emotion determination module 110 can be configured to
determine an emotional state of the subject 125 based on the
sequence of visual observations. The emotion determination module
110 can be configured to automatically determine the emotional
state based on an emotional determination system. Some examples of
such systems are discussed below with regard to FIG. 3. However,
any system by which an emotional state for a subject 125 can be
determined by observing the subject 125 can be used. In an example,
the emotion determination module 110 can be configured to relay
application state information to an external source where the
emotional state determination can be performed. The emotion
determination module 110 can then communicate the emotional state
determination to other modules of the device 120. In an example,
this remote emotional state determination can be manual (e.g.,
performed, at least in part, by a human).
[0025] In an example, the emotion determination module 110 can be
configured to identify a stimulus corresponding to the determined
emotional state. For example, eye-tracking software and an
additional camera positioned so as to capture what the subject 125
is looking at can be used to produce an image, or otherwise
identify, what object the subject 125 is observing for the
determined emotional state. In an example, the single camera,
alone, can identify the position of the subject's 125 eyes. That
position can be correlated to an object rendered to the screen; the
object being the identified stimulus.
[0026] The modification module 115 can be configured to modify
execution of the application from a baseline execution using the
emotional state. In an example, the modification module 115 can be
configured to store a correlation between the emotional state and
the identified stimulus (discussed above). Memorializing the
pairing between these two data points can be used to provide
further application enhancements, such as reporting this data at a
later date or sharing it with others.
[0027] Using the above modules an application can better interact
with its users. Such use of observed emotional data can provide
better outcomes in user interface interactions. Below are several
example scenarios for application interaction that can be enhanced
in this way.
[0028] In an example, the application can be a social media
application. Such an application can include an interface in which
the subject 125 (user in this scenario) can post information (e.g.,
biographical information, pictures, group affiliations, etc.) about
themselves, identify various circles of trust (e.g., friends) with
other members, play games, etc. An example activity that the
subject 125 can engage in is to view articles (e.g., posts by
other members), groups, products, etc., and rate them (e.g.,
numerically, "liking" or "disliking" them, etc.). In this example,
the identified stimulus discussed above can be such an article.
Also in this example, the stored correlation (between the emotional
state of the subject 125 and the stimulus) can be the rating. That
is, whether the subject 125 liked, disliked, or was neutral (in an
example also including the magnitude of this feeling) on the
article can be used to later inform others (e.g., friends,
marketers, advertisers, etc.) of the subject's feelings towards the
article.
[0029] In an example, the application can be a consumer marketing
application. An example of such an application can include a
smart-phone application that the subject 125 can use to aid in
shopping. Other such applications can include online shopping, or
even general browsing applications in which products, groups (e.g.,
charities, political candidates, etc.), or services can be viewed.
In an example, the stimulus can be presented in a consumer context
(e.g., while shopping). In an example, the stimulus can be
presented in a research context, such as a product being presented
to a market research group. In these examples, the correlation can
be a representation of the emotional state in a modified sequence
of images correlated to the stimulus. For example, the video of the
stimulus (e.g., of a product on the shelves at the grocery store)
can be modified to place an emotional rating proximate to the
product. In an example, the stimulus can be captured (e.g., in a
picture or label) and placed on video of the subject 125 during
observance of the emotional state. This placing can include color
coding, or other manipulations, to visually identify a specific
emotion, or an emotional summarization (e.g., like, dislike,
etc.).
[0030] In an example, the application can be an interactive
application. An interactive application is one in which the subject
125 continually inputs information and the user interface
is responsive to such inputs. Examples of such interactive
applications can include games (e.g., video or computer games),
forms (e.g., data entry into one or more fields, etc.),
productivity applications (e.g., word processors, spreadsheets,
presentation tools, graphical drafting tools, etc.), among others.
FIG. 1 illustrates an interactive application on the device 120
using fields. In this example, the identified stimulus is the field
130. In this example, the application includes additional help
services for the field 130. Examples of such additional help can
include an expanded set of text, link to frequently asked questions
(FAQ), or interactive support (e.g., connection to a machine or
human that can ask questions and provide answers). In this example,
a baseline emotional state can be established. In an example,
establishing the baseline state can be accomplished beforehand,
such as through user testing of the application. In an example, the
baseline state can be determined dynamically by observing the
subject 125 for a period of time. In an example, a threshold can be
determined. This threshold can be used to demarcate an acceptable
user experience from one that has become too frustrating,
difficult, or otherwise undesirable. The modification module 115 can
be configured to present the additional help to the subject 125 in
response to the emotional state crossing the threshold. In this
way, the user interface can generally remain uncluttered, sparing
those without difficulty from being burdened with the additional
help while enhancing the user experience for those who experience
difficulty.
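The baseline and threshold logic described above can be sketched as
follows. This is a minimal illustration, assuming the emotional state
has been reduced to a single frustration score between 0.0 and 1.0
and that the threshold is a fixed margin above a dynamically observed
baseline. The function names and the margin value are hypothetical.

```python
from statistics import mean
from typing import Sequence


def establish_baseline(samples: Sequence[float]) -> float:
    """Average the frustration scores observed during an initial period."""
    if not samples:
        raise ValueError("at least one baseline observation is required")
    return mean(samples)


def should_offer_help(current: float, baseline: float, margin: float = 0.3) -> bool:
    """Return True when the current score exceeds the baseline by more than margin."""
    return current > baseline + margin


# Made-up scores observed while the subject settles into the form.
baseline = establish_baseline([0.10, 0.20, 0.15, 0.15])

print(should_offer_help(0.30, baseline))  # still within the acceptable range
print(should_offer_help(0.60, baseline))  # crossed the threshold
```

A baseline established beforehand through user testing could be
passed to should_offer_help directly in place of the observed
average.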
[0031] In an example, the application is interactive and can
comprise one or more elements. For example, in a game, the elements
can include such things as music, sound effects, sound volume,
lighting effects, difficulty level, etc. For any given element, a
plurality of alternative items can be used in its place. For
example, for an element such as timing between surprise encounters
with characters in a game (e.g., in the action or horror genres)
two or more alternative timings exist and can be used
interchangeably. In an example, the modification module 115 can be
configured to select a next element from the plurality of
alternative items to replace the current element. For example, in a
horror game, the emotional state may indicate that the subject 125
has become bored. Elements such as lighting, music, or surprise
encounters can be replaced with alternatives (e.g., that are more
spooky) to keep the subject 125 interested. Examples can include
decreasing the difficulty level when the subject 125 becomes
frustrated, increasing the frequency of encounters similar to those
in which the subject 125 expressed interest, etc.
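Selecting a next element from a plurality of alternative items can be
sketched as below. The sketch assumes the alternatives for one element
(here, difficulty level) are kept in an ordered sequence and that the
emotional state arrives as a simple label. The level names, the state
labels, and the one-step adjustment rule are illustrative assumptions.

```python
from typing import Sequence

# Hypothetical alternatives for one game element, ordered easiest to hardest.
DIFFICULTY_LEVELS: Sequence[str] = ("easy", "normal", "hard", "expert")


def select_next_element(current: str, emotional_state: str,
                        alternatives: Sequence[str] = DIFFICULTY_LEVELS) -> str:
    """Pick the alternative item that replaces the current element.

    A frustrated subject gets the next easier item, a bored subject gets
    the next harder item, and any other state leaves the element unchanged.
    """
    index = alternatives.index(current)
    if emotional_state == "frustrated":
        index = max(index - 1, 0)
    elif emotional_state == "bored":
        index = min(index + 1, len(alternatives) - 1)
    return alternatives[index]


print(select_next_element("normal", "frustrated"))
print(select_next_element("normal", "bored"))
```

The same function could be applied to other elements, such as lighting
or music, by passing a different ordered sequence of alternatives.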
[0032] In an example, the application can be a monitoring
application. Examples of such applications can include patient
observation systems, worker observation systems (e.g., power
plant operator observation systems), or vehicle operator
observation systems. With monitoring applications, the user may be
a different entity than the subject 125. In an example, the user
may be a machine, such as a vehicle control system, or other
control system. In these examples, the modification module 115 can
be configured to intervene on behalf of the subject 125. In an
example, to intervene, the modification module 115 can be
configured to present an alarm to at least one of the subject or
the user. For example, the alarm can take the form of a sound,
page, call, etc. to a nurse observing a patient. In an example, the
alarm can include a sound, vibration, or other notification to a
driver who has become too angry, thus providing a way for the driver
to recognize her own anger and work to mitigate the problem. In an
example, the modification module 115 can be configured to
manipulate a physical aspect of the subject's 125 environment. For
example, with an angry driver, the modification module 115 can be
configured to limit the maximum speed of the vehicle, slow the
vehicle to a stop, etc. Other example environmental manipulations
can include changing the temperature, lighting, or sound (e.g.,
music) of the environment.
[0033] FIG. 2 illustrates an example of a method 200 for
emotionally sensitive application operation. In an example,
components of the system 100 can be used to implement one or more
operations described below.
[0034] At operation 205, a sequence of visual observations of a
subject during execution of an application can be received. In an
example, the subject can be continuously monitored during the
application's execution to obtain the sequence of visual
observations.
[0035] At operation 210, an emotional state of the subject can be
determined based on the sequence of visual observations. In an
example, this determination can operate as discussed above with
respect to FIG. 1.
[0036] At operation 215, the execution of the application can be
modified from a baseline execution using the emotional state. In an
example, the baseline execution of the application is the default
execution of the application. For example, in a game with a given
difficulty level, the baseline execution is the game at the defined
difficulty level as defined by the game's designers. Accordingly,
the modification from the baseline execution is an execution that
deviates from this baseline execution. In an example, the
application can be a monitoring application, the emotional state
can cross a threshold from a baseline emotional state, and an
intervention can be performed on behalf of the subject. For
example, a monitored patient can have an emotional baseline
established during an observation period. This can account for the
peculiar emotional states of the monitored subject. The monitoring
can continue, for example, throughout the night of the patient's
stay in a hospital. At some point in the night, the patient may
suddenly demonstrate alarm, surprise, or other emotion indicative
of a problem or need for attention. The application can alert an
observer (e.g., user, nurse, doctor, etc.) of the changed emotional
condition. In this way, a patient can quickly receive attention to
a potential problem. In an example, the intervention can include
presenting an alarm to at least one user (e.g., the subject or a
different party). In an example, the intervention can include
manipulating a physical aspect of an environment of the subject.
For example, a vehicle operator can be monitored. The environment
can be the vehicle. The physical aspect can be a speed control,
braking control, or environment control (e.g., heat, cooling, music,
lighting, etc.). Thus, for example, the intervention can include
playing soothing music while dimming the lights to soothe a driver
who is observed to express fear or anxiety.
[0037] At operation 220, a stimulus corresponding to the determined
emotional state can be identified. In an example, a correlation
between the emotional state and the stimulus can be stored to
modify the execution of the application in operation 215. In an
example, the stimulus can include a visual object (e.g., icon,
field, etc.) of a user interface presented to the subject. In an
example, the stimulus can be identified using eye tracking
techniques and a representation of what the subject is seeing. For
example, eye tracking techniques can be used to identify a screen
region that the subject is looking at. The user interface rendering
engine can be used to provide the object present in that screen
region, which is the stimulus. In an example, the stimulus can be
identified non-visually, such as by touch. In this example, a touch
sensor (e.g., capacitive, light-based, resistive, etc.) can be used
to identify the touch. In an example, visual processing of the
subject's fingers (e.g., observing the fingers and performing image
analysis to determine their location) can be used to identify the
stimulus.
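Identifying the stimulus from a gaze position can be sketched as a
simple hit test against the objects the user interface has rendered.
The sketch assumes an eye tracker that reports a gaze point in screen
pixels and a rendering engine that can list objects with rectangular
bounds. UiObject, identify_stimulus, and the field names are
hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class UiObject:
    """A rendered user interface object and its screen rectangle in pixels."""
    name: str
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        """Return True when the point lies inside this object's rectangle."""
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def identify_stimulus(gaze_x: int, gaze_y: int,
                      rendered: Sequence[UiObject]) -> Optional[UiObject]:
    """Return the first rendered object under the gaze point, if any."""
    for obj in rendered:
        if obj.contains(gaze_x, gaze_y):
            return obj
    return None


# Two made-up form fields standing in for objects on the screen.
screen = [
    UiObject("name_field", left=20, top=40, width=200, height=30),
    UiObject("income_field", left=20, top=90, width=200, height=30),
]

print(identify_stimulus(50, 100, screen))
print(identify_stimulus(500, 500, screen))
```

A touch position from a touch sensor could be passed in place of the
gaze point without changing the lookup.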
[0038] In an example, the application can be a social media
application, the stimulus can be an article (e.g., post, message,
etc.) presented to the subject by the social media application, and
the correlation can be an indication of one or more of "like",
"dislike", or "neutral" based on the emotional state. In an
example, the application can be a consumer marketing application,
the stimulus can be presented in a consumer context (e.g., on a
shelf in a market, or on a vendor's website), and the correlation
can be a representation of the emotional state in a modified
sequence of images correlated to the stimulus, the modified
sequence of images including the sequence of images. In this
example, the modified sequence of images can include highlighting,
circling, color coding, enlarging, or any other combination of
visual or audio cues to illustrate the emotional state of the
subject with specificity to the product. For example, the product
can be outlined in red when the emotional state is anger.
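The two correlations described above can be sketched as small lookup
functions. The sketch assumes the emotional state is available both as
an appeal score between -1.0 and 1.0 and as an emotion label. Only the
pairing of anger with red comes from this document; the neutral band,
the other color, and the function names are illustrative assumptions.

```python
def rating_from_emotion(appeal: float, neutral_band: float = 0.2) -> str:
    """Map an appeal score in [-1.0, 1.0] to "like", "dislike", or "neutral"."""
    if appeal > neutral_band:
        return "like"
    if appeal < -neutral_band:
        return "dislike"
    return "neutral"


# Outline colors keyed by emotion label. Only anger/red is taken from the
# text; the remaining entry is a hypothetical placeholder.
OUTLINE_COLORS = {"anger": "red", "joy": "green"}


def outline_color(emotion: str, default: str = "gray") -> str:
    """Return the outline color used to mark a product for an emotion."""
    return OUTLINE_COLORS.get(emotion, default)


print(rating_from_emotion(0.7), rating_from_emotion(-0.5), rating_from_emotion(0.1))
print(outline_color("anger"))
```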
[0039] In an example, the application can be an interactive
application. In this example, the stimulus can be a portion of the
application for which help is available, such as a field, menu
selection, etc. In this example, the emotional state can cross a
threshold from a baseline emotional state (e.g., from neutral to
frustrated). For example, a calm person may experience a degree of
anger sufficient to cross the threshold. In this example, after the
threshold is crossed, the help can be presented to the
subject (user in this example).
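The threshold behavior described above can be sketched as follows. This is a minimal illustration only: the numeric frustration score, the `HelpTrigger` class, and its `baseline` and `threshold` parameters are assumptions introduced here, not elements of the described system.

```python
from dataclasses import dataclass


@dataclass
class HelpTrigger:
    """Offers help once frustration rises far enough above a baseline."""
    baseline: float   # hypothetical frustration score when the subject is calm
    threshold: float  # hypothetical rise above baseline that triggers help
    shown: bool = False

    def update(self, frustration: float) -> bool:
        """Return True only on the observation that first crosses the threshold."""
        crossed = (frustration - self.baseline) >= self.threshold
        if crossed and not self.shown:
            self.shown = True
            return True
        return False


trigger = HelpTrigger(baseline=0.1, threshold=0.5)
events = [trigger.update(score) for score in (0.2, 0.4, 0.7, 0.9)]
```

In this sketch the help is offered once, on the third observation, and is not offered again while the subject remains above the threshold.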
[0040] In an example, the application can be an interactive
application, the stimulus can be an element of the interactive
application (e.g., graphic, character, sound track, lighting
element, map, difficulty, etc.), the element is one of a plurality
of alternative items (e.g., in a game with a plurality of
difficulty levels, the alternative items can each be a specific
difficulty level), and a next element can be selected from the
plurality of alternative items based on the emotional state. In
this example, the next element replaces the element in the
interactive application. For example, given a horror computer
game, the emotional state can indicate that the subject has become
overly anxious. In this example, the element can be the level of
illumination given the environment presented by the game. The
alternative items can include a variety of illumination levels. In
this way a higher illumination level than the current level can be
selected to reduce subject anxiety.
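As a rough sketch of the illumination example, the selection of a next element from the alternative items might look like the following. The function name, the numeric anxiety score, and the `too_anxious` cutoff are hypothetical and chosen only for illustration.

```python
def next_illumination(levels, current, anxiety, too_anxious=0.8):
    """Pick the next illumination level from the alternative items.

    levels: the alternative illumination levels (any order).
    current: the level now in use; must be one of levels.
    anxiety: hypothetical anxiety score for the subject, assumed in [0, 1].
    """
    ordered = sorted(levels)
    index = ordered.index(current)
    # Step up one level when the subject is overly anxious and a
    # brighter alternative exists; otherwise keep the current element.
    if anxiety > too_anxious and index < len(ordered) - 1:
        return ordered[index + 1]
    return current
```

Stepping one level at a time is a design choice of this sketch; an application could equally jump directly to the brightest alternative.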
[0041] FIG. 3 illustrates an example of an emotional state chart
300. The chart 300 illustrates several useful emotional concepts
for use in the systems and methods described herein. Engagement
(e.g., impact or intensity) represents the degree of emotional
response by a subject. For example, without regard to the type of
emotion observed, high engagement represents strong emotion while a
low engagement represents little or no emotional response. Appeal
(e.g., valence) represents whether the emotion was positive or
negative. Thus, an emotional reaction that has high engagement with
low appeal can be mapped into the illustrated "Danger Zone". The
pertinent zone for modifying an application can vary depending on
the circumstance. For example, in a horror game it may be desirable
that a subject does not become comfortable, while in a productivity
application comfort may be the desired goal. In an example, the
zone can be a summarized emotional result (e.g., part of a
summarization component in emotional analysis of the subject).
[0042] Appeal can be determined by identifying particular emotions
expressed by the subject. For example, anger can push an appeal
rating lower while happiness can push an appeal rating higher.
Engagement can be determined by observing the degree (e.g., in
duration or magnitude of motion) of the subject movement
corresponding to an emotion.
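One way to picture how appeal and engagement combine into a zone is the sketch below. The weights, the scoring formulas, and the `high_engagement` cutoff are assumptions made for illustration; the description above states only that anger lowers appeal, happiness raises it, and engagement reflects the duration or magnitude of movement.

```python
# Hypothetical per-emotion appeal weights (assumed values).
APPEAL_WEIGHTS = {"happiness": 1.0, "anger": -1.0}


def appeal(emotions):
    """Average the appeal weights of the observed emotions (0.0 if none)."""
    weights = [APPEAL_WEIGHTS.get(e, 0.0) for e in emotions]
    return sum(weights) / len(weights) if weights else 0.0


def engagement(durations_s, magnitudes):
    """Combine movement durations and magnitudes into one score (assumed formula)."""
    return sum(d * m for d, m in zip(durations_s, magnitudes))


def zone(engagement_score, appeal_score, high_engagement=1.0):
    """Map high engagement with negative appeal to the "Danger Zone"."""
    if engagement_score >= high_engagement and appeal_score < 0:
        return "Danger Zone"
    return "other"
```

For instance, two angry expressions with a combined engagement score of 1.5 would fall in the "Danger Zone" under these assumed cutoffs.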
[0043] Specific raw emotions, such as happiness, surprise, sadness,
fear, anger, disgust, or contempt can also be used directly.
Determining these raw emotions can be accomplished in a variety of
ways. In an example, an artificial intelligence system can be
trained on subject body or face. In an example, a facial muscle
movement system can be used. These systems utilize observations by
Darwin and others to associate particular physical variations in a
person's face to underlying emotions. One such system is the Facial
Action Coding System (FACS) developed by Dr. Paul Ekman and Wally
Friesen. An emotionally-focused variation of FACS is EMFACS. EMFACS
includes approximately twenty action units (AU). An action unit is
a discrete variation in the subject's face. The following list
includes example AUs and corresponding facial locations:
[0044] 1. Left eyebrow and area adjacent to it on the forehead: can
relate to AU1 and
[0045] 2. Center of forehead and area between the eyebrows: can
relate to AU4
[0046] 3. Right eyebrow and area adjacent to it on the forehead: can
relate to AU1 and AU2
[0047] 4. Left eye, and adjacent area above and below: can relate to
AU5, AU6, and AU7
[0048] 5. Right eye, and adjacent area above and below: can relate
to AU5, AU6, and AU7
[0049] 6. Nose, from tip to base: can relate to AU9
[0050] 7. Left cheek, from alongside nose and outwards: can relate
to AU6, AU10, AU11, AU12, AU14 and AU20
[0051] 8. Right cheek, from alongside nose and outwards: can relate
to AU6, AU10, AU11, AU12, AU14 and AU20
[0052] 9. Left corner of the mouth, upper and lower: can relate to
AU11, AU12, AU14, AU15, AU16, AU20, AU22, AU23, and AU24
[0053] 10. Middle of mouth, including area adjacent above and below:
can relate to AU9, AU10, AU11, AU12, AU16, AU17, AU22, AU23, AU24,
AU25, AU26, and AU27
[0054] 11. Right corner of the mouth, upper and lower: can relate to
AU11, AU12, AU14, AU15, AU16, AU20, AU22, AU23, and AU24
[0055] 12. Lower left part of the face, from chin to mouth: can
relate to AU15, AU16, AU20, AU22, and AU27
[0056] 13. Lower center part of the face, from chin to mouth: can
relate to AU14, AU16, AU17, AU20, AU22, AU23, AU26, and AU27
[0057] 14. Lower right part of the face, from chin to mouth: can
relate to AU15, AU16, AU20, AU22, and AU27.
Emotions are represented by a single AU or combinations of AUs.
Other example systems can be derived from FACS or EMFACS. For
example, motion in an area of the face, however measured, can be
matched with emotions for AUs in the same or similar region of the
face.
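The region-to-AU association listed above lends itself to a simple lookup table. The sketch below copies only four of the listed regions (center of forehead, left eye, right eye, and nose); the dictionary keys and the `candidate_aus` helper are illustrative names, not identifiers from FACS or EMFACS.

```python
# A few region-to-AU entries taken from the list above. The region
# keys are illustrative names chosen for this sketch.
REGION_AUS = {
    "center_forehead": ["AU4"],
    "left_eye": ["AU5", "AU6", "AU7"],
    "right_eye": ["AU5", "AU6", "AU7"],
    "nose": ["AU9"],
}


def candidate_aus(regions_with_motion):
    """Return the AUs that can relate to any region showing motion."""
    found = set()
    for region in regions_with_motion:
        # Regions missing from the table contribute no candidates.
        found.update(REGION_AUS.get(region, []))
    # Sort numerically so that AU9 precedes AU10 in larger tables.
    return sorted(found, key=lambda au: int(au[2:]))
```

A table of this kind could also be used to narrow the AU choices offered to a human coder once the facial region showing motion is known.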
[0058] FIG. 4 illustrates an example of a system 400 for say-feel
gap identification and use. The system 400 can include an audio
processing module 405, a semantic processing module 410, an image
processing module 415, an emotion determination module 420, and a
difference module 425. In an example, the system 400 can also
include a presentation module 430. Any one or more of these modules
can be on a single device, or spread among several devices (such as
in a cloud computing environment).
[0059] The audio processing module 405 can be configured to receive
an audio stream of the subject 125. This audio stream can
correspond in time to a sequence of visual observations of the
subject 125. The audio processing module 405 can produce a
transcript of speech uttered in the audio stream. For example, the
audio processing module 405 can receive an audio track for video of
the subject and produce a transcript of that speech. In an example,
the transcript can be time-coded for later matching to the video
images or for other purposes. In an example, processing of the
audio, including producing the transcript, can be facilitated by an
external entity, such as a human.
[0060] The semantic processing module 410 can be configured to
determine the meaning of a string in the transcript. The string can
include a single word, a portion of a word, or a multi-word phrase.
The meaning is a feeling, definition, etc., as understood by
others. In an example, the "others" can be a subset of the general
population.
[0061] The image processing module 415 can be configured to receive
the sequence of visual observations that correspond to the speech
that produced the string. For example, while video of a complete
criminal suspect interview may have been captured, the sequence of
visual observations may pertain only to that speech in which the
subject 125 denies culpability for a crime. In an example, the
image processing module 415 can be configured to capture, receive,
or store sequences of subject 125 observations. In an example, the
image processing module 415 can be configured to operate as
discussed above with regard to FIG. 1 or 2.
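If the transcript is time-coded as described above, selecting the visual observations that correspond to a string can be a simple interval match. The sketch below assumes each string carries start and end times in seconds and each frame has a capture timestamp; `TimedString` and `frames_for_string` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TimedString:
    """A transcript string with assumed start and end time codes."""
    text: str
    start_s: float
    end_s: float


def frames_for_string(timed, frame_times_s):
    """Return indices of frames captured while the string was uttered."""
    return [
        i
        for i, t in enumerate(frame_times_s)
        if timed.start_s <= t <= timed.end_s
    ]


denial = TimedString("I did not do it", start_s=0.5, end_s=1.5)
selected = frames_for_string(denial, [0.0, 0.5, 1.0, 1.5, 2.0])
```

Here `selected` holds the indices of the three frames whose timestamps fall within the utterance.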
[0062] The emotion determination module 420 can be configured to
determine an emotional state of the subject 125 based on the
sequence of visual observations. In an example, the emotion
determination module can be configured to operate as discussed
above with regard to FIGS. 1-3.
[0063] The difference module 425 can be configured to calculate a
correlation value for the string. In an example, the correlation
value can be calculated by comparing the meaning of the string to
the determined emotional state of the subject 125 for the sequence
of visual images corresponding to the utterance of the string. In
an example, the correlation is a binary value. For example, the
correlation can simply indicate that the spoken meaning of the
words does, or does not, match the observed emotional state of the
subject 125. In an example, the correlation can include a magnitude
component. In an example, the correlation can include one or more
of an engagement component, summarization component, or emotional
response component (e.g., as discussed above with regard to FIG.
3).
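A minimal sketch of a correlation with both a binary part and a magnitude component follows. It assumes, purely for illustration, that the string's meaning and the observed emotional state have each already been reduced to a valence score in the range -1 to 1; the function name and formula are not taken from the description above.

```python
def correlation(meaning_valence: float, emotion_valence: float):
    """Compare a string's meaning with the observed emotional state.

    Both inputs are assumed valence scores in [-1, 1]. Returns a
    (matches, magnitude) pair: matches is the binary correlation and
    magnitude runs from 0.0 (opposite) to 1.0 (identical).
    """
    matches = (meaning_valence >= 0) == (emotion_valence >= 0)
    magnitude = 1.0 - abs(meaning_valence - emotion_valence) / 2.0
    return matches, magnitude
```

Under these assumptions, a strongly positive statement paired with a strongly negative expression yields `(False, 0.0)`, which is the say-feel gap at its widest.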
[0064] The presentation module 430 can be configured to present the
correlation calculated by the difference module 425 to a user. In
an example, the presentation module 430 can be configured to vary
the intensity of the presentation based on the magnitude of the
correlation. For example, if the image of flaming pants is applied
to video of a speaker when the correlation indicates a say-feel
gap, the size of the flames can be increased as the correlation
decreases. In an example, the presentation module can be configured
to play the sequence of visual observations and produce an audio
representation of the emotional state along with the sequence of
visual observations. For example, a bell can "ding" when there is
strong correlation between the string's meaning and the emotional
state of the subject 125 while a "buzzer" can sound when the correlation is
weak. In an example, the audio representation can include modifying
an aspect of the audio stream. For example, the speaker's voice can be
produced with more authority (e.g., louder, with more resonance,
etc.) when there is strong correlation between the string's meaning
and the emotional state of the subject 125.
[0065] In an example, the presentation module 430 can be configured
to present the correlation (e.g., visually). For example, a
transcript including the string can be modified (e.g., marked up)
to illustrate the correlation. In an example, the font size used
for the string can be increased or decreased from a default size.
In an example, the string can be highlighted. In an example, the
highlighting can vary in color, fill style (e.g., solid, striped,
patterned, etc.), transparency, etc., to represent a particular
emotion, or other category based on the observed emotional
state.
[0066] In an example, the presentation module 430 can be configured
to create a modified sequence of images and play the modified
sequence of images to the user. In an example, the modified
sequence of images can include changing a portion of an image in
the sequence of images. For example, the nose on the face of a
speaker in which there is a low correlation can be lengthened.
Other modifications can include changing the color of the speaker,
presenting an accompanying meter of the correlation, or a graphic
to indicate the correlation, among others.
[0067] The system 400 can be useful in a variety of situations. For
example, it can be employed by a broadcaster to provide real-time
analysis for viewers or presenters. In an example, the system 400
can be used by physicians interviewing patients. In an example, the
system 400 can be used by marketers or researchers when presenting
a product or concept to the subject 125. In an example, the
system 400 can be used between employers and employees (or
potential employees), between co-workers, between customers and
vendors, or in personal relationship interactions. Further, beyond
just indicating the veracity of the subject's words, the system 400
allows the user to better understand the subject 125. For example,
a patient who indicates willingness to talk about sexual activity
but expresses anxiety may indicate that it is discomfort, rather
than malicious intent, that prompts false statements to the
physician. By understanding this, the physician can attempt to
rephrase, or adopt a new approach, to gathering the requested
information.
[0068] FIG. 5 illustrates an example of a method 500 for say-feel
gap identification and use. In an example, components of the system
400 can be used to implement one or more operations described
below.
[0069] At operation 505, an audio stream of a subject corresponding
in time to a sequence of visual observations, both of the same
subject, can be received.
[0070] At operation 510, a transcript of speech uttered in the
audio stream can be produced.
[0071] At operation 515, a meaning for a string in the transcript
can be determined.
[0072] At operation 520, the sequence of visual observations that
correspond to the speech that produced the string can be received.
That is, this sequence of visual observations is directly related
to the string, and is a subset of the sequence of visual
observations corresponding to the entire audio stream.
[0073] At operation 525, an emotional state of the subject can be
determined based on the sequence of visual observations.
[0074] At operation 530, a correlation value can be calculated for
the string by comparing the meaning and the emotional state.
[0075] At operation 535, the correlation can be presented to a
user. In an example, presenting the correlation can include playing
the sequence of visual observations (e.g., a video) and producing
an audio representation with the sequence of visual observations.
In an example, the audio representation can include a modified
aspect of the audio stream. For example, the pitch, tone, or
magnitude of the speaker's voice can be manipulated to express the
correlation. In an example, background music can be selectively
added to represent the correlation (e.g., ominous music to indicate
deception or seriousness, and light-hearted music to indicate
happiness, joy, etc.).
[0076] In an example, presenting the correlation can include
presenting a visual indication of the emotional state in a
representation of the transcript corresponding to the string. In
this example, one or more strings of text can be produced from the
audio stream. These strings can be presented to the user. For
example, the presentation can include closed-captioning in video
streams or written transcripts (e.g., as on a webpage), among
others. In an example, the string can be color coded (e.g.,
highlighted in a particular color) to represent the emotional
state. In an example, the string can be enlarged or shrunk to
represent the emotional state. Other modifications to the string,
such as changing the font, replacing the string with a graphic,
adding an accompanying graphic, can also be used to represent the
emotional state.
[0077] In an example, presenting the correlation between the
meaning of the string and the emotional state can include creating
a modified sequence of images by changing a portion of an image in
the sequence of images, and playing the modified sequence of
images. In an example, a speaker's features in a video can be
modified to represent the correlation. For example, a speaker who
lacks confidence in what they are saying may have their nose
lengthened. In an example, a color can be superimposed upon the
speaker to represent the correlation. For example, a
semi-transparent green can overlay the speaker's face on a video
stream.
[0078] In an example, presenting the correlation can include
varying an intensity of the presentation based on the magnitude of
the correlation. For example, if the correlation is strong (e.g.,
little difference between the meaning of the string and the
observed emotional state) a small variation can be presented.
Conversely, where the correlation is weak, a large variation can be
presented.
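The inverse relationship between correlation strength and presentation intensity can be sketched as a linear scale. The function name, the assumed 0-to-1 correlation range, and the `max_scale` parameter are illustrative choices only.

```python
def variation_scale(correlation, max_scale=3.0):
    """Scale a visual variation inversely with the correlation.

    correlation: assumed to lie in [0, 1], with 1.0 meaning the string's
    meaning and the observed emotional state fully agree.
    Returns 1.0 (no enlargement) for a perfect correlation and
    max_scale for no correlation at all.
    """
    clamped = min(max(correlation, 0.0), 1.0)
    return 1.0 + (1.0 - clamped) * (max_scale - 1.0)
```

A presentation layer could multiply the size of an overlay, such as a graphic applied to the speaker, by the returned factor.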
[0079] In an example, the correlation can include an engagement
component. In an example, the correlation can include a
summarization component. In an example, the correlation can include
an emotional response component. The emotional response component
can include at least one measure for impact or appeal. In an
example, these components are defined above with regard to FIG.
3.
[0080] FIG. 6 illustrates an example of a system 600 for
facilitating human facial coders. The system 600 can include an
identification module 605, an enhancement module 610, and a
presentation module 615. The system 600 can be used to facilitate
manual facial coding by identifying or enhancing movements of a
subject's face as well as by changing the operation of the coding system
(e.g., selecting candidate emotions).
[0081] The identification module 605 can be configured to identify
a media aspect of a media source based on an emotional
determination system. A media aspect is observable when the media
source is presented to a user. Examples of media aspects can
include: a window in time of, for example, a sequence of visual
observations of the subject; a region of one or more frames of
video; a region of the subject's face or body that is visible in
the sequence of visual observations; etc. The emotional
determination system can be any system designed to determine a
subject's emotions via physical observation (e.g., visually,
electrically, thermally, tactilely, etc.). In an example, non-visual
observations (e.g., tactile observations) can be converted into
visual observations. In an example, the emotional determination
system is based on the FACS or EMFACS systems discussed above with
regard to FIG. 3. In an example, the media aspect is a portion of a
subject's face corresponding to an action unit of the FACS or
EMFACS systems. In an example, the media aspect includes the
portion of the subject's face spanning a subset of the sequence of
visual observations corresponding to a change in the facial portion
from a baseline. For example, the media aspect can include the
frames of video in which (or slightly before and after) a subject's
mouth corner indicates a true smile. In an example, the media
aspect can include multiple portions of the subject's face
corresponding to a single AU. In an example, the media aspect can
include multiple portions of the face corresponding to multiple
AUs. In an example, the multiple AUs can represent a single
emotion.
[0082] In an example, the baseline can be adjusted based on the
subject's speech to correct for interference from particular vocal
articulations in a facial movement. For example, for facial coding
when a subject is talking, it may be difficult to separate emotive
facial muscle movements from those attributable to articulating sounds
for speech. For example, when a person says "slippery" the mouth
contorts differently than when a person says a word like
"explosive." "Explosive" is likely to cause the upper lip to flare
in a way that a human facial coder may mistakenly identify as
AU10. The baseline determination can take into account such a
problem to avoid false media aspect identifications. In an example,
an audio system can be configured to detect speech to identify when
a possible AU occurs just prior to a person speaking, during a
pause, or immediately after a given utterance. Thus, passages of
time can be identified when the simpler, non-verbal form of coding
can occur, and media aspect identification can be limited to these
passages.
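Limiting media aspect identification to non-speaking passages amounts to finding the gaps between detected speech intervals. The sketch below assumes the audio system reports speech as (start, end) pairs in seconds; `silent_passages` is a hypothetical helper.

```python
def silent_passages(speech_intervals, total_s):
    """Return (start, end) gaps between speech intervals, in seconds.

    speech_intervals: (start, end) pairs where speech was detected;
    they may be unsorted or overlapping.
    total_s: total duration of the recording.
    """
    gaps = []
    cursor = 0.0
    for start, end in sorted(speech_intervals):
        if start > cursor:
            gaps.append((cursor, start))
        # Advance past this interval, tolerating overlaps.
        cursor = max(cursor, end)
    if cursor < total_s:
        gaps.append((cursor, total_s))
    return gaps
```

For a six-second clip with speech from 1.0 to 2.0 and from 3.0 to 4.5 seconds, the returned passages cover the time before, between, and after the two utterances.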
[0083] In an example, linguistics and prosody research can be used
as a model such that plosives (e.g., as in "explosive" with its
"p's"), or "s" sounds, etc., can be used to indicate whether an
emotional component is expressed (e.g., in AU terms) or whether in
fact the observed facial muscle movements are merely a part of how
the word is being articulated. In an example, the prosody model can
vary depending upon the language being spoken by the subject.
[0084] The enhancement module 610 can be configured to produce an
enhanced media source. The enhanced media source can include an
indicator of the media aspect identified by the identification
module 605. In an example, the enhancement module 610 can be
configured to create a user observable indicator of the media
aspect. For example, the user observable indicator is a measurement
of the change to the portion of the subject's face described above.
In an example, the measurement can include the duration of the
change. In an example, the measurement can include a magnitude of
the change. For example, the magnitude can represent the distance
the subject's eyebrow was raised. In an example, the magnitude can
be a ratio of the change and the subject's face.
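The duration and magnitude measurements can be illustrated with a short sketch. It assumes per-frame displacement of a facial feature (for example, an eyebrow) in pixels relative to the baseline, a known frame rate, and the face height in pixels; all names and the `min_px` noise floor are hypothetical.

```python
def change_metrics(displacements_px, frame_rate_hz, face_height_px, min_px=1.0):
    """Summarize a facial change as (duration_s, magnitude_ratio).

    displacements_px: per-frame displacement from the baseline, in pixels.
    min_px: assumed noise floor below which a frame counts as unchanged.
    """
    changed = [d for d in displacements_px if d >= min_px]
    duration_s = len(changed) / frame_rate_hz
    # Express the peak displacement relative to the size of the face.
    magnitude_ratio = max(changed, default=0.0) / face_height_px
    return duration_s, magnitude_ratio
```

Expressing the magnitude as a ratio of the face size keeps the measurement comparable across subjects filmed at different distances.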
[0085] The presentation module 615 can be configured to present the
enhanced media source to the user. In an example, the presentation
can include displaying video, presenting a user interface, etc.
Additional examples are discussed below with regard to FIGS.
7-10.
[0086] FIG. 7 illustrates an example of a media source 700 of
facial images, according to an embodiment. In this example, the
media source is a video including a sequence of facial images. As
illustrated, the sequence of visual observations (e.g., facial
images) includes a change in the subject's right eyebrow and mouth.
The baseline visual image 705 illustrates the subject's face
immediately prior to the change. Visual image 710 is the first to
illustrate the change. The subset of visual images 715 represents
those images in which the change is present. Visual image 720
illustrates the subject's face returning to the baseline following
the change. This illustration is used below to describe examples of
media aspect identification, user observable indicator creation,
and presentation of the enhanced media source.
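The relationship between the baseline image, the subset of changed images, and the return to baseline can be sketched as a scan over a per-frame deviation score. The deviation score, the threshold, and the `change_window` helper are assumptions made for illustration.

```python
def change_window(deviation, threshold):
    """Locate the first facial change in a sequence of frames.

    deviation: per-frame deviation from the baseline (assumed score).
    Returns (baseline_idx, changed_indices, return_idx), or None when no
    frame exceeds the threshold. baseline_idx is the frame just before
    the change and return_idx the frame just after it, when they exist.
    """
    changed = []
    for i, d in enumerate(deviation):
        if d > threshold:
            changed.append(i)
        elif changed:
            break  # the face has returned to the baseline
    if not changed:
        return None
    first, last = changed[0], changed[-1]
    baseline_idx = first - 1 if first > 0 else None
    return_idx = last + 1 if last + 1 < len(deviation) else None
    return baseline_idx, changed, return_idx
```

In terms of FIG. 7, `baseline_idx` plays the role of visual image 705, the first entry of `changed_indices` that of visual image 710, the full list that of subset 715, and `return_idx` that of visual image 720.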
[0087] FIG. 8 illustrates an example of a facial image 800 with
identified media aspects. The illustrated media aspects correspond
to the facial changes introduced above in the media source 700. In
an example, portions of the subject's face can be predetermined to
be relevant. For example, the subject's face can be divided into a
virtual grid of areas of interest. In an example, this grid can
correspond to the AUs described above with regard to FIG. 3. As
illustrated, the portions 805, 810, and 815 are identified media
aspects of this scenario. Each portion corresponds to an
emotionally relevant facial motion.
[0088] FIG. 9 illustrates an example of an enhanced media source
900. Taking the identified media aspects of the facial image 800,
the enhanced media source 900 can be created. In this example, the
user observable indicator is an enlargement of the portion of the
subject's face corresponding to each media aspect. For example,
user observable indicator 905 is an enlargement of the facial
portion 805 and user observable indicator 910 is an enlargement of
both facial portions 810 and 815. In an example, other
manipulations of the underlying facial image can be used to create
the user observable indicator. Examples can include: circling,
shading, or highlighting. In an example, the color of these
manipulations can be used to indicate a possible emotion or
underlying emotional component (e.g., AU). In an example, the user
observable indicator can include removing any part of the image not
directly associated with a particular emotional component. For
example, the user observable indicator 905 can be left while the
rest of the visual image 800 is removed. Accordingly, distribution
of the media aspect 805 to different manual coders can be
facilitated. This distribution can be used to, for example, enable
a crowd or group sourcing model to perform the facial coding.
[0089] FIG. 10 illustrates an example of a user interface 1000 to
present an enhanced media source to a user. The user interface 1000
can include the enhanced media source 1005, an emotional component
interface 1010, and a media aspect metrics interface 1015. Additional
controls can also be included in the user interface 1000, such as
playback controls, project interface controls, etc. In an example,
the previously introduced user observable indicator can include
slowing playback of the sequence of visual images. Thus, a manual
coder can have the playback automatically slowed during interesting
periods and, possibly, sped up at less interesting periods.
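A simple way to picture the automatic playback adjustment is a per-frame rate schedule. The rates and the function below are illustrative assumptions, with 1.0 taken to mean normal speed.

```python
def playback_rates(frame_count, interesting_frames, slow=0.5, fast=2.0):
    """Return a playback rate for each frame of the sequence.

    Frames listed in interesting_frames (e.g., those spanning a media
    aspect) play at the slow rate; all others play at the fast rate.
    """
    interesting = set(interesting_frames)
    return [slow if i in interesting else fast for i in range(frame_count)]
```

For a five-frame clip in which frames 2 and 3 contain a media aspect, only those two frames are scheduled at the slower rate.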
[0090] The emotional component interface 1010 can be configured to
display available emotional component choices to the user. In an
example, the list of available emotional components can be reduced
based on the media aspect. This reduction of listed emotional
components can help reduce errors by removing, for example, AUs
that are not possible with a given media aspect.
[0091] The media aspect metrics interface 1015 can be configured to
present metrics of one or more media aspects. For example, the
duration or magnitude of a given media aspect can be displayed.
These metrics can facilitate decisions by the manual coder about such
things as engagement of the subject. In an example, the media
aspect metrics interface 1015 can be configured to indicate whether
a media aspect is positive or negative. Such an indication can
help the manual coder determine appeal in the subject.
[0092] FIG. 11 illustrates an example of a method 1100 for
facilitating human facial coders. In an example, components of the
system 600 or the user interface 1000 can be used to implement one
or more operations described below.
[0093] At operation 1105, a media aspect of a media source can be
identified based on an emotional determination system. The media
aspect can be observable (e.g., visually, audibly, tactilely, etc.)
when the media source is presented to a user. The media source can
include a sequence of visual observations of a subject. In an
example, the media aspect can span a subset of the sequence of
visual observations. For example, the media source can be a video
stream and the media aspect can be a part of a scene, such as a
region of the screen over a number of frames of the video stream.
Other media aspects can include particular frames, features of the
subject's face or body, etc.
[0094] In an example, the emotional determination system is based
on the FACS or EMFACS systems described above with regard to FIG.
3. In an example, the media aspect can be a portion of a face of
the subject corresponding to an action unit of the FACS system. In
an example, the media aspect includes an additional portion of the
face corresponding to an additional action unit. For example, the
media aspect can include multiple facial regions each corresponding
to different action units.
[0095] In an example, the emotional determination system can
include an artificial intelligence (e.g., neural network) system
trained against a number of faces. These systems can define
parameters, areas of the face, facial baseline procedures, duration
of facial muscle movements, and relevant combinations of muscle
movements, among other things, to inform what portions of the media
source are pertinent to the human facial coder.
[0096] At operation 1110, an enhanced media source can be produced
by creating a user observable indicator of the media aspect. In an
example, the user observable indicator can include a listing of the
action unit and the additional action unit synchronized with the
media aspect. For example, instead of having to select from the
complete list of action units, the user can be presented with a
subset of the list based on the media aspect (e.g., only valid
action units for the identified facial regions).
[0097] The subset of the sequence of visual observations can
correspond to a change in the portion of the face from a baseline.
In an example, the user observable indicator can be a measurement
of the change. For example, the indicator can include a duration
for the change, a magnitude of the change, etc. In an example, the
user observable indicator can be an enlargement of the portion in
the subset. In an example, the user observable indicator can
include slowing playback of the sequence of images. In an example,
the user observable indicator can include a color change for the
portion of the face. For example, a semi-transparent yellow can
overlay the portion of the face that changed. In an example, the
specific color can represent a specific action unit or group of
action units. In an example, a circling or other attention-drawing
marker (e.g., an arrow) can be used to alert the human coder to
motion in a relevant portion of the subject's face.
[0098] At operation 1115, the enhanced media source can be
presented to the user.
[0099] FIG. 12 illustrates a block diagram of an example of a
machine 1200 upon which any one or more of the techniques (e.g.,
methodologies) discussed herein may be performed. In alternative
embodiments, the machine 1200 may operate as a standalone device or
may be connected (e.g., networked) to other machines. In a
networked deployment, the machine 1200 may operate in the capacity
of a server machine, a client machine, or both in server-client
network environments. In an example, the machine 1200 may act as a
peer machine in peer-to-peer (P2P) (or other distributed) network
environment. The machine 1200 may be a personal computer (PC), a
tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA),
a mobile telephone, a web appliance, a network router, switch or
bridge, or any machine capable of executing instructions
(sequential or otherwise) that specify actions to be taken by that
machine. Further, while only a single machine is illustrated, the
term "machine" shall also be taken to include any collection of
machines that individually or jointly execute a set (or multiple
sets) of instructions to perform any one or more of the
methodologies discussed herein, such as cloud computing, software
as a service (SaaS), or other computer cluster configurations.
[0100] Examples, as described herein, may include, or may operate
on, logic or a number of components, modules, or mechanisms.
Modules are tangible entities (e.g., hardware) capable of
performing specified operations and may be configured or arranged
in a certain manner. In an example, circuits may be arranged (e.g.,
internally or with respect to external entities such as other
circuits) in a specified manner as a module. In an example, the
whole or part of one or more computer systems (e.g., a standalone,
client or server computer system) or one or more hardware
processors may be configured by firmware or software (e.g.,
instructions, an application portion, or an application) as a
module that operates to perform specified operations. In an
example, the software may reside on a machine readable medium. In
an example, the software, when executed by the underlying hardware
of the module, causes the hardware to perform the specified
operations.
[0101] Accordingly, the term "module" is understood to encompass a
tangible entity, be that an entity that is physically constructed,
specifically configured (e.g., hardwired), or temporarily (e.g.,
transitorily) configured (e.g., programmed) to operate in a
specified manner or to perform part or all of any operation
described herein. Considering examples in which modules are
temporarily configured, each of the modules need not be
instantiated at any one moment in time. For example, where the
modules comprise a general-purpose hardware processor configured
using software, the general-purpose hardware processor may be
configured as respective different modules at different times.
Software may accordingly configure a hardware processor, for
example, to constitute a particular module at one instance of time
and to constitute a different module at a different instance of
time.
[0102] Machine (e.g., computer system) 1200 may include a hardware
processor 1202 (e.g., a central processing unit (CPU), a graphics
processing unit (GPU), a hardware processor core, or any
combination thereof), a main memory 1204 and a static memory 1206,
some or all of which may communicate with each other via an
interlink (e.g., bus) 1208. The machine 1200 may further include a
display unit 1210, an alphanumeric input device 1212 (e.g., a
keyboard), and a user interface (UI) navigation device 1214 (e.g.,
a mouse). In an example, the display unit 1210, input device 1212
and UI navigation device 1214 may be a touch screen display. The
machine 1200 may additionally include a storage device (e.g., drive
unit) 1216, a signal generation device 1218 (e.g., a speaker), a
network interface device 1220, and one or more sensors 1221, such
as a global positioning system (GPS) sensor, compass,
accelerometer, or other sensor. The machine 1200 may include an
output controller 1228, such as a serial (e.g., universal serial
bus (USB)), parallel, or other wired or wireless (e.g.,
infrared (IR), near field communication (NFC), etc.) connection to
communicate or control one or more peripheral devices (e.g., a
printer, card reader, etc.).
[0103] The storage device 1216 may include a machine readable
medium 1222 on which is stored one or more sets of data structures
or instructions 1224 (e.g., software) embodying or utilized by any
one or more of the techniques or functions described herein. The
instructions 1224 may also reside, completely or at least
partially, within the main memory 1204, within static memory 1206,
or within the hardware processor 1202 during execution thereof by
the machine 1200. In an example, one or any combination of the
hardware processor 1202, the main memory 1204, the static memory
1206, or the storage device 1216 may constitute machine readable
media.
[0104] While the machine readable medium 1222 is illustrated as a
single medium, the term "machine readable medium" may include a
single medium or multiple media (e.g., a centralized or distributed
database, and/or associated caches and servers) that are configured to
store the one or more instructions 1224. The term "machine readable
medium" may include any medium that is capable of storing,
encoding, or carrying instructions for execution by the machine
1200 and that cause the machine 1200 to perform any one or more of
the techniques of the present disclosure, or that is capable of
storing, encoding or carrying data structures used by or associated
with such instructions. Non-limiting machine readable medium
examples may include solid-state memories, and optical and magnetic
media. In an example, a massed machine readable medium comprises a
machine readable medium with a plurality of particles having
resting mass. Specific examples of massed machine readable media
may include: non-volatile memory, such as semiconductor memory
devices (e.g., Electrically Programmable Read-Only Memory (EPROM),
Electrically Erasable Programmable Read-Only Memory (EEPROM)) and
flash memory devices; magnetic disks, such as internal hard disks
and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM
disks.
[0105] The instructions 1224 may further be transmitted or received
over a communications network 1226 using a transmission medium via
the network interface device 1220 utilizing any one of a number of
transfer protocols (e.g., frame relay, internet protocol (IP),
transmission control protocol (TCP), user datagram protocol (UDP),
hypertext transfer protocol (HTTP), etc.). Example communication
networks may include a local area network (LAN), a wide area
network (WAN), a packet data network (e.g., the Internet), mobile
telephone networks (e.g., cellular networks), Plain Old Telephone
Service (POTS) networks, and wireless data networks (e.g., Institute of
Electrical and Electronics Engineers (IEEE) 802.11 family of
standards known as Wi-Fi®, IEEE 802.16 family of standards
known as WiMax®), peer-to-peer (P2P) networks, among others. In
an example, the network interface device 1220 may include one or
more physical jacks (e.g., Ethernet, coaxial, or phone jacks) or
one or more antennas to connect to the communications network 1226.
In an example, the network interface device 1220 may include a
plurality of antennas to wirelessly communicate using at least one
of single-input multiple-output (SIMO), multiple-input
multiple-output (MIMO), or multiple-input single-output (MISO)
techniques. The term "transmission medium" shall be taken to
include any intangible medium that is capable of storing, encoding
or carrying instructions for execution by the machine 1200, and
includes digital or analog communications signals or other
intangible medium to facilitate communication of such software.
Additional Notes & Examples
[0106] Example 1 can include subject matter (such as a device,
apparatus, or system) comprising an image
processing module configured to receive a sequence of visual
observations of a subject during the execution of an application,
an emotion determination module configured to determine an
emotional state of the subject based on the sequence of visual
observations, and a modification module configured to modify the
execution of the application from a baseline execution using the
emotional state.
[0107] In Example 2, the subject matter of Example 1 can optionally
include, wherein the emotion determination module is configured to
identify a stimulus corresponding to the determined emotional
state.
[0108] In Example 3, the subject matter of Example 2 can optionally
include, wherein to modify the execution of the application
includes the modification module configured to store a correlation
between the emotional state and the stimulus.
[0109] In Example 4, the subject matter of Example 3 can optionally
include, wherein the application is a social media application,
wherein the stimulus is an article presented to the subject by the
social media application, and wherein the correlation is an
indication of at least one of like, dislike, or neutral indication
of the article, the indication selected based on the emotional
state.
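As a minimal sketch of Example 4, the following Python code selects a like, dislike, or neutral indication from an emotional state and stores it against an article. It assumes, purely for illustration, that the emotional state has been reduced to a single valence score in the range -1 to 1; the function names, the margin value, and the article identifier are hypothetical.

```python
from typing import Dict

# Assumption: correlations are kept in a simple in-memory mapping.
correlations: Dict[str, str] = {}


def indication_from_valence(valence: float, margin: float = 0.2) -> str:
    """Map a valence score in [-1, 1] to 'like', 'dislike', or 'neutral'."""
    if valence > margin:
        return "like"
    if valence < -margin:
        return "dislike"
    return "neutral"


def store_correlation(article_id: str, valence: float) -> None:
    # Store the indication selected from the emotional state for this stimulus.
    correlations[article_id] = indication_from_valence(valence)


store_correlation("article-1", 0.7)
```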
[0110] In Example 5, the subject matter of any one or more of
Examples 3-4 can optionally include, wherein the application is a
consumer marketing application, wherein the stimulus is presented
in a consumer context, and wherein the correlation is a
representation of the emotional state in a modified sequence of
images correlated to the stimulus, the modified sequence of images
including the sequence of images.
[0111] In Example 6, the subject matter of any one or more of
Examples 2-5 can optionally include, wherein the application is an
interactive application, wherein the stimulus is a portion of the
application for which help is available, wherein the emotional
state crosses a threshold from a baseline emotional state, and
wherein the modification module is configured to present the help
to the subject.
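The threshold test of Example 6 might be sketched as follows. The sketch assumes the emotional state is represented by one numeric score (for instance, a frustration estimate) and that crossing the threshold means departing from the baseline by more than a fixed amount; both choices, and the function names, are illustrative assumptions rather than requirements.

```python
from typing import Optional


def crosses_threshold(baseline: float, current: float, threshold: float) -> bool:
    """True when the current score departs from the baseline by more than threshold."""
    return abs(current - baseline) > threshold


def help_to_present(baseline: float, current: float,
                    threshold: float, help_text: str) -> Optional[str]:
    # Return the help for this portion of the application only when the
    # emotional state has crossed the threshold; otherwise return None.
    if crosses_threshold(baseline, current, threshold):
        return help_text
    return None
```

The same check could serve as the trigger for the intervention of a monitoring application.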
[0112] In Example 7, the subject matter of any one or more of
Examples 2-6 can optionally include, wherein the application is an
interactive application, wherein the stimulus is an element of the
interactive application, wherein the element is one of a plurality
of alternative items, and wherein the modification module is
configured to select a next element from the plurality of
alternative items based on the emotional state, the next element
replacing the element in the interactive application.
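One possible selection policy for Example 7 is sketched below. It assumes the alternative items are ordered (here, by difficulty) and that the emotional state is a single frustration score; the policy of stepping down when frustrated and up otherwise is an illustrative assumption, as are all names and the threshold value.

```python
from typing import List


def select_next_element(alternatives: List[str], current_index: int,
                        frustration: float, threshold: float = 0.5) -> str:
    """Pick the element that replaces the current one.

    Steps to an easier alternative when frustration exceeds the threshold,
    otherwise to a harder one, staying within the list bounds.
    """
    step = -1 if frustration > threshold else 1
    next_index = min(max(current_index + step, 0), len(alternatives) - 1)
    return alternatives[next_index]


levels = ["easy", "medium", "hard"]
```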
[0113] In Example 8, the subject matter of any one or more of
Examples 1-7 can optionally include, wherein the application is a
monitoring application, wherein the emotional state crosses a
threshold from a baseline emotional state, and wherein the
modification module is configured to intervene on behalf of the
subject.
[0114] In Example 9, the subject matter of Example 8 can optionally
include, wherein to intervene, the modification module is configured
to present an alarm to at least one of the subject or a user.
[0115] In Example 10, the subject matter of any one or more of
Examples 8-9 can optionally include, wherein to intervene, the
modification module is configured to manipulate a physical aspect
of an environment of the subject.
[0116] Example 11 can include, or can optionally be combined with
the subject matter of any one or more of Examples 1-15 to include,
subject matter (such as a method, means for performing acts, or
machine readable medium including instructions that, when performed
by a machine cause the machine to perform acts) comprising
receiving a sequence of visual observations of a subject during the
execution of an application, determining an emotional state of the
subject based on the sequence of visual observations, and modifying
the execution of the application from a baseline execution using
the emotional state.
[0117] In Example 12, the subject matter of Example 11 can
optionally include, wherein the operations comprise identifying a
stimulus corresponding to the determined emotional state.
[0118] In Example 13, the subject matter of Example 12 can
optionally include, wherein modifying the execution of the
application includes storing a correlation between the emotional
state and the stimulus.
[0119] In Example 14, the subject matter of Example 13 can
optionally include, wherein the application is a social media
application, wherein the stimulus is an article presented to the
subject by the social media application, and wherein the
correlation is an indication of at least one of like, dislike, or
neutral indication of the article, the indication selected based on
the emotional state.
[0120] In Example 15, the subject matter of any one or more of
Examples 13-14 can optionally include, wherein the application is a
consumer marketing application, wherein the stimulus is presented
in a consumer context, and wherein the correlation is a
representation of the emotional state in a modified sequence of
images correlated to the stimulus, the modified sequence of images
including the sequence of images.
[0121] In Example 16, the subject matter of any one or more of
Examples 12-15 can optionally include, wherein the application is
an interactive application, wherein the stimulus is a portion of
the application for which help is available, wherein the emotional
state crosses a threshold from a baseline emotional state, and
wherein the operations comprise presenting the help to the
subject.
[0122] In Example 17, the subject matter of any one or more of
Examples 12-16 can optionally include, wherein the application is
an interactive application, wherein the stimulus is an element of
the interactive application, wherein the element is one of a
plurality of alternative items, and wherein the operations comprise
selecting a next element from the plurality of alternative items
based on the emotional state, the next element replacing the
element in the interactive application.
[0123] In Example 18, the subject matter of any one or more of
Examples 11-17 can optionally include, wherein the application is a
monitoring application, wherein the emotional state crosses a
threshold from a baseline emotional state, and wherein the
operations comprise intervening on behalf of the subject.
[0124] In Example 19, the subject matter of Example 18 can
optionally include, wherein the intervening includes presenting an
alarm to at least one of the subject or a user.
[0125] In Example 20, the subject matter of any one or more of
Examples 18-19 can optionally include, wherein the intervening
includes manipulating a physical aspect of an environment of the
subject.
[0126] Example 21 can include, or can optionally be combined with
the subject matter of any one or more of Examples 1-50 to include,
subject matter (such as a device, apparatus, or system) comprising
an audio processing module, a semantic processing module, an image
processing module, an emotion determination module, and a difference
module. The audio processing module can be configured to receive an
audio stream of a subject corresponding in time to a sequence of
visual observations of the subject, and produce a transcript of
speech uttered in the audio stream. The semantic processing module
can be configured to determine a meaning of a string in the
transcript. The image processing module can be configured to
receive the sequence of visual observations that correspond to speech
that produced the string. The emotion determination module can be
configured to determine an emotional state of the subject based on
the sequence of visual observations. The difference module can be
configured to calculate a correlation value for the string by
comparing the meaning and the emotional state.
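The comparison performed by the difference module of Example 21 might be sketched as follows. The sketch assumes that both the meaning of the string and the emotional state are reduced to valence scores in the range -1 to 1, that the meaning comes from a toy word lexicon, and that the correlation value is one minus the absolute gap between the two scores. None of these choices is prescribed here; the lexicon, names, and formula are hypothetical.

```python
from statistics import mean
from typing import List

# Hypothetical toy lexicon; a real semantic processing module would be richer.
LEXICON = {"love": 1.0, "great": 0.8, "fine": 0.3, "bad": -0.7, "hate": -1.0}


def meaning_valence(string: str) -> float:
    """Average lexicon valence of the words in a transcript string."""
    words = [w.strip(".,!?") for w in string.lower().split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return mean(scores) if scores else 0.0


def emotional_valence(frame_valences: List[float]) -> float:
    """Average per-observation valence over the matching visual observations."""
    return mean(frame_valences) if frame_valences else 0.0


def correlation_value(string: str, frame_valences: List[float]) -> float:
    """1.0 when say and feel agree fully, -1.0 when they are opposite extremes."""
    gap = abs(meaning_valence(string) - emotional_valence(frame_valences))
    return 1.0 - gap
```

Under these assumptions, a subject who says "I love it" while showing strongly negative expressions yields a value near -1, which flags a large say-feel gap.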
[0127] In Example 22, the subject matter of Example 21 can
optionally include a presentation module configured to present the
correlation to a user.
[0128] In Example 23, the subject matter of Example 22 can
optionally include, wherein to present the correlation includes the
presentation module configured to play the sequence of visual
observations, and produce an audio representation with the sequence
of visual observations.
[0129] In Example 24, the subject matter of Example 23 can
optionally include, wherein the audio representation includes a
modified aspect of the audio stream.
[0130] In Example 25, the subject matter of any one or more of
Examples 22-24 can optionally include, wherein to present the
correlation includes the presentation module configured to present
a visual indication of the emotional state in a representation of
the transcript corresponding to the string.
[0131] In Example 26, the subject matter of any one or more of
Examples 22-25 can optionally include, wherein to present the
correlation includes the presentation module configured to create a
modified sequence of images by changing a portion of an image in
the sequence of images, and play the modified sequence of
images.
[0132] In Example 27, the subject matter of any one or more of
Examples 22-26 can optionally include, wherein to present the
correlation, the presentation module is configured to vary an
intensity of the presentation based on the magnitude of the
correlation.
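As a simple sketch of Example 27, the intensity of a presentation element (for instance, the opacity of a highlight) could be scaled by the magnitude of the correlation. The linear scaling, the clamp to 1, and the function name below are illustrative assumptions.

```python
def presentation_intensity(correlation: float, max_intensity: float = 1.0) -> float:
    """Scale an intensity by the magnitude of the correlation, clamped to [0, 1]."""
    magnitude = min(abs(correlation), 1.0)
    return magnitude * max_intensity
```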
[0133] In Example 28, the subject matter of any one or more of
Examples 21-27 can optionally include, wherein the correlation
includes an engagement component.
[0134] In Example 29, the subject matter of any one or more of
Examples 21-28 can optionally include, wherein the correlation
includes a summarization component.
[0135] In Example 30, the subject matter of any one or more of
Examples 21-29 can optionally include, wherein the correlation
includes an emotional response component, the emotional response
component including at least one of impact or appeal.
[0136] Example 31 can include, or can optionally be combined with
the subject matter of any one or more of Examples 1-15 to include,
subject matter (such as a method, means for performing acts, or
machine readable medium including instructions that, when performed
by a machine cause the machine to perform acts) comprising
receiving an audio stream of a subject corresponding in time to a
sequence of visual observations of the subject, producing a
transcript of speech uttered in the audio stream, determining a
meaning of a string in the transcript, receiving the sequence of
visual observations that correspond to speech that produced the
string, determining an emotional state of the subject based on the
sequence of visual observations, and calculating a correlation
value for the string by comparing the meaning and the emotional
state.
[0137] In Example 32, the subject matter of Example 31 can
optionally include, wherein the operations comprise presenting the
correlation to a user.
[0138] In Example 33, the subject matter of Example 32 can
optionally include, wherein presenting the correlation includes
playing the sequence of visual observations, and producing an audio
representation with the sequence of visual observations.
[0139] In Example 34, the subject matter of Example 33 can
optionally include, wherein the audio representation includes a
modified aspect of the audio stream.
[0140] In Example 35, the subject matter of any one or more of
Examples 32-33 can optionally include, wherein presenting the
correlation includes presenting a visual indication of the
emotional state in a representation of the transcript corresponding
to the string.
[0141] In Example 36, the subject matter of any one or more of
Examples 32-35 can optionally include, wherein presenting the
correlation includes creating a modified sequence of images by
changing a portion of an image in the sequence of images, and
playing the modified sequence of images.
[0142] In Example 37, the subject matter of any one or more of
Examples 32-36 can optionally include, wherein presenting the
correlation includes varying an intensity of the presentation based
on the magnitude of the correlation.
[0143] In Example 38, the subject matter of any one or more of
Examples 31-37 can optionally include, wherein the correlation
includes an engagement component.
[0144] In Example 39, the subject matter of any one or more of
Examples 31-38 can optionally include, wherein the correlation
includes a summarization component.
[0145] In Example 40, the subject matter of any one or more of
Examples 31-39 can optionally include, wherein the correlation
includes an emotional response component, the emotional response
component including at least one of impact or appeal.
[0146] Example 41 can include, or can optionally be combined with
the subject matter of any one or more of Examples 1-40 to include,
subject matter (such as a device, apparatus, or system) comprising
an identification module configured to identify a media aspect of a
media source based on an emotional determination system--the media
aspect being observable when the media source is presented to a
user--and the media source including a sequence of visual
observations of a subject, an enhancement module configured to
produce an enhanced media source by creating a user observable
indicator of the media aspect, and a presentation module configured
to present the enhanced media source to the user.
[0147] In Example 42, the subject matter of Example 41 can
optionally include, wherein the emotional determination system is
based on the Facial Action Coding System (FACS).
[0148] In Example 43, the subject matter of Example 42 can
optionally include, wherein the media aspect is a portion of a face
of the subject corresponding to an action unit of the FACS
system.
[0149] In Example 44, the subject matter of Example 43 can
optionally include, wherein the media aspect spans a subset of the
sequence of visual observations, the subset corresponding to a
change in the portion of the face from a baseline.
[0150] In Example 45, the subject matter of Example 44 can
optionally include, wherein the user observable indicator is a
measurement of the change.
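Examples 44 and 45 can be pictured with the following sketch, which finds the subset of observations in which a portion of the face departs from a baseline and reports a measurement of the change. It assumes that each visual observation has already been reduced to one number for that portion of the face (for instance, an action unit intensity); the tolerance, the choice of the first contiguous run, and the function name are hypothetical.

```python
from typing import List, Optional, Tuple


def changed_span(values: List[float], baseline: float,
                 tolerance: float = 0.5) -> Optional[Tuple[int, int, float]]:
    """Return (start, end, peak_change) for the first contiguous run of
    observations deviating from the baseline by more than tolerance,
    or None when no observation deviates."""
    start: Optional[int] = None
    peak = 0.0
    for i, value in enumerate(values):
        change = abs(value - baseline)
        if change > tolerance:
            if start is None:
                start = i          # the subset begins here
            peak = max(peak, change)
        elif start is not None:
            return (start, i - 1, peak)  # the subset ended at the previous frame
    if start is not None:
        return (start, len(values) - 1, peak)
    return None
```

The returned indices identify the subset of the sequence spanned by the media aspect, and the peak value is one candidate for the user observable measurement.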
[0151] In Example 46, the subject matter of any one or more of
Examples 44-45 can optionally include, wherein the user observable
indicator is an enlargement of the portion in the subset.
[0152] In Example 47, the subject matter of any one or more of
Examples 44-46 can optionally include, wherein the user observable
indicator includes slowing playback of the sequence of images.
[0153] In Example 48, the subject matter of any one or more of
Examples 43-47 can optionally include, wherein the media aspect
includes an additional portion of the face corresponding to an
additional action unit.
[0154] In Example 49, the subject matter of Example 48 can
optionally include, wherein the user observable indicator includes
a listing of the action unit and the additional action unit
synchronized with the media aspect.
[0155] In Example 50, the subject matter of any one or more of
Examples 43-49 can optionally include, wherein the user observable
indicator includes a color change for the portion of the face.
[0156] Example 51 can include, or can optionally be combined with
the subject matter of any one or more of Examples 1-50 to include,
subject matter (such as a method, means for performing acts, or
machine readable medium including instructions that, when performed
by a machine cause the machine to perform acts) comprising
identifying a media aspect of a media source based on an emotional
determination system--the media aspect being observable when the
media source is presented to a user--and the media source including
a sequence of visual observations of a subject, producing an
enhanced media source by creating a user observable indicator of
the media aspect, and presenting the enhanced media source to the
user.
[0157] In Example 52, the subject matter of Example 51 can
optionally include, wherein the emotional determination system is
based on the Facial Action Coding System (FACS).
[0158] In Example 53, the subject matter of Example 52 can
optionally include, wherein the media aspect is a portion of a face
of the subject corresponding to an action unit of the FACS
system.
[0159] In Example 54, the subject matter of Example 53 can
optionally include, wherein the media aspect spans a subset of the
sequence of visual observations, the subset corresponding to a
change in the portion of the face from a baseline.
[0160] In Example 55, the subject matter of Example 54 can
optionally include, wherein the user observable indicator is a
measurement of the change.
[0161] In Example 56, the subject matter of any one or more of
Examples 54-55 can optionally include, wherein the user observable
indicator is an enlargement of the portion in the subset.
[0162] In Example 57, the subject matter of any one or more of
Examples 54-56 can optionally include, wherein the user observable
indicator includes slowing playback of the sequence of images.
[0163] In Example 58, the subject matter of any one or more of
Examples 53-57 can optionally include, wherein the media aspect
includes an additional portion of the face corresponding to an
additional action unit.
[0164] In Example 59, the subject matter of Example 58 can
optionally include, wherein the user observable indicator includes
a listing of the action unit and the additional action unit
synchronized with the media aspect.
[0165] In Example 60, the subject matter of any one or more of
Examples 53-59 can optionally include, wherein the user observable
indicator includes a color change for the portion of the face.
[0166] The above detailed description includes references to the
accompanying drawings, which form a part of the detailed
description. The drawings show, by way of illustration, specific
embodiments that may be practiced. These embodiments are also
referred to herein as "examples." Such examples can include
elements in addition to those shown or described. However, the
present inventors also contemplate examples in which only those
elements shown or described are provided. Moreover, the present
inventors also contemplate examples using any combination or
permutation of those elements shown or described (or one or more
aspects thereof), either with respect to a particular example (or
one or more aspects thereof), or with respect to other examples (or
one or more aspects thereof) shown or described herein.
[0167] All publications, patents, and patent documents referred to
in this document are incorporated by reference herein in their
entirety, as though individually incorporated by reference. In the
event of inconsistent usages between this document and those
documents so incorporated by reference, the usage in the
incorporated reference(s) should be considered supplementary to
that of this document; for irreconcilable inconsistencies, the
usage in this document controls.
[0168] In this document, the terms "a" or "an" are used, as is
common in patent documents, to include one or more than one,
independent of any other instances or usages of "at least one" or
"one or more." In this document, the term "or" is used to refer to
a nonexclusive or, such that "A or B" includes "A but not B," "B
but not A," and "A and B," unless otherwise indicated. In the
appended claims, the terms "including" and "in which" are used as
the plain-English equivalents of the respective terms "comprising"
and "wherein." Also, in the following claims, the terms "including"
and "comprising" are open-ended, that is, a system, device,
article, or process that includes elements in addition to those
listed after such a term in a claim is still deemed to fall within
the scope of that claim. Moreover, in the following claims, the
terms "first," "second," and "third," etc. are used merely as
labels, and are not intended to impose numerical requirements on
their objects.
[0169] The above description is intended to be illustrative, and
not restrictive. For example, the above-described examples (or one
or more aspects thereof) may be used in combination with each
other. Other embodiments can be used, such as by one of ordinary
skill in the art upon reviewing the above description. The Abstract
is to allow the reader to quickly ascertain the nature of the
technical disclosure, for example, to comply with 37 C.F.R.
§ 1.72(b) in the United States of America. It is submitted with
the understanding that it will not be used to interpret or limit
the scope or meaning of the claims. Also, in the above Detailed
Description, various features may be grouped together to streamline
the disclosure. This should not be interpreted as intending that an
unclaimed disclosed feature is essential to any claim. Rather,
inventive subject matter may lie in less than all features of a
particular disclosed embodiment. Thus, the following claims are
hereby incorporated into the Detailed Description, with each claim
standing on its own as a separate embodiment. The scope of the
embodiments should be determined with reference to the appended
claims, along with the full scope of equivalents to which such
claims are entitled.
* * * * *