U.S. patent application number 15/087657 was filed with the patent office on 2016-03-31 and published on 2016-07-28 for experience or "sentio" codecs, and methods and systems for improving QoE and encoding based on QoE experiences.
The applicant listed for this patent is Net Power and Light, Inc. Invention is credited to Tara Lemmey, Nikolay Surin, Stanislav Vonog.
Application Number | 15/087657 |
Publication Number | 20160219279 |
Family ID | 45564811 |
Filed Date | 2016-03-31 |
United States Patent Application | 20160219279 |
Kind Code | A1 |
Vonog; Stanislav; et al. | July 28, 2016 |
EXPERIENCE OR "SENTIO" CODECS, AND METHODS AND SYSTEMS FOR
IMPROVING QoE AND ENCODING BASED ON QoE EXPERIENCES
Abstract
Certain embodiments teach a variety of experience or "sentio"
codecs, and methods and systems for enabling an experience
platform, as well as a Quality of Experience (QoE) engine which
allows the sentio codec to select a suitable encoding engine or
device. The sentio codec is capable of encoding and transmitting
data streams that correspond to participant experiences with a
variety of different dimensions and features. As will be
appreciated, the following description provides one paradigm for
understanding the multi-dimensional experience available to the
participants, and as implemented utilizing a sentio codec. There
are many suitable ways of describing, characterizing and
implementing the sentio codec and experience platform contemplated
herein.
Inventors: | Vonog; Stanislav (San Francisco, CA); Surin; Nikolay (San Francisco, CA); Lemmey; Tara (San Francisco, CA) |
Applicant: |
Name | City | State | Country | Type |
Net Power and Light, Inc. | San Francisco | CA | US | |
Family ID: | 45564811 |
Appl. No.: | 15/087657 |
Filed: | March 31, 2016 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
13363187 | Jan 31, 2012 | |
15087657 | | |
13136870 | Aug 12, 2011 | 9172979 |
13363187 | | |
61373236 | Aug 12, 2010 | |
Current U.S. Class: | 1/1 |
Current CPC Class: | H04N 21/631 20130101; H04N 21/2662 20130101; H04N 21/2343 20130101; H04N 21/234327 20130101; H04N 19/12 20141101; H04N 19/164 20141101; H04N 19/156 20141101; H04N 21/234345 20130101; H04N 21/8146 20130101; H04N 21/658 20130101; H04N 21/6377 20130101; H04N 21/6379 20130101 |
International Class: | H04N 19/156 20060101 H04N019/156; H04N 21/63 20060101 H04N021/63; H04N 21/6379 20060101 H04N021/6379; H04N 19/164 20060101 H04N019/164; H04N 21/2343 20060101 H04N021/2343 |
Claims
1. A hybrid codec for encoding and decoding a plurality of
multi-dimensional data streams for a multi-dimensional experience,
the hybrid codec comprising: a plurality of codecs suitable for
encoding and decoding multi-dimensional experience data streams
related to a multi-dimensional experience shared over a network
between one or more transmitting devices and a receiving device; a
Quality of Experience (QoE) decision engine configured to: receive
an output associated with the multi-dimensional experience, the
encoded output including the plurality of multi-dimensional data
streams; wherein the output is divided into a plurality of regions;
analyze the output in each of the plurality of regions; and for
each of the plurality of regions, select one codec from the
plurality of codecs to decode the encoded output in that region;
wherein, the selection of the codec is made to improve a human
perception of the multi-dimensional experience and is based on data
associated with the capabilities of a transmitting device, the
capabilities of the receiving device, and the characteristics of
the multi-dimensional experience; and a network engine configured
to implement a low-latency transfer protocol for transmitting and
receiving of encoded multi-dimensional data streams; wherein the
low latency transfer protocol takes into account the current
conditions of the network, the capabilities of the one or more
transmitting devices, and the capabilities of the receiving
device.
2. The hybrid codec of claim 1, wherein the plurality of codecs
includes an audio codec and a video codec.
3. The hybrid codec of claim 2, wherein the plurality of codecs
further includes a gesture command codec.
4. The hybrid codec of claim 2, wherein the plurality of codecs
further includes a sensor data codec.
5. The hybrid codec of claim 2, wherein the plurality of codecs
further includes an emotion data codec.
6. The hybrid codec of claim 1, wherein improving human perception
of the multi-dimensional experience includes prioritizing the
encoding and decoding of audio streams over the encoding and
decoding of video streams.
7. The hybrid codec of claim 1, wherein the capabilities of the
transmitting and receiving device include the availability of
hardware codecs.
8. The hybrid codec of claim 1, wherein the capabilities of the
receiving device include graphical processing capabilities.
9. The hybrid codec of claim 1, wherein the received encoded output
associated with the multi-dimensional experience includes a
plurality of layers, each of the plurality of layers associated
with a subset of the plurality of multi-dimensional data
streams.
10. A computer implemented method for providing an experience using
a hybrid codec for encoding, decoding, and transmitting
experiences, the hybrid codec including a quality of experience
(QoE) decision engine, a network engine, and a plurality of codecs
suitable for encoding and decoding multi-dimensional data streams
associated with the experience, the computer implemented method
comprising: receiving, by the QoE decision engine, an output
including a plurality of data streams, the plurality of data
streams including video, audio, graphics, text, gestures and at
least one emotion; wherein the output is divided into a plurality
of regions; analyzing, by the QoE decision engine, the output in
each of the plurality of regions; for each of the plurality of
regions, selecting, by the QoE decision engine, one codec from the
plurality of codecs to encode the output within that region;
wherein, the selection of the codec is made to improve a human
perception of the multi-dimensional experience and is based on data
associated with the capabilities of a transmitting device, the
capabilities of the receiving device, and the characteristics of
the multi-dimensional experience; wherein the selection is informed
by the network engine, the network engine including a hybrid
network stack with network intelligence configured to implement a
low-latency transfer protocol; and for each of the plurality of
regions, encoding, by the selected codec, the plurality of data
streams associated with the output in that region.
11. The computer implemented method of claim 10, wherein the
experience includes a plurality of layers, and the encoding
generates the plurality of layers.
12. The computer implemented method of claim 10, wherein the
quality of experience engine affects the encoding by taking into
consideration the nature and type of devices involved in providing
the experience.
13. The computer implemented method of claim 10 further comprising
encoding and transmitting virtual goods as part of the
experiences.
14. The computer implemented method of claim 10, wherein the
received output includes a plurality of layers, each of the
plurality of layers associated with a subset of the plurality of
multi-dimensional data streams.
15. The computer implemented method of claim 10, wherein improving
human perception of the multi-dimensional experience includes
prioritizing the encoding and decoding of audio streams over the
encoding and decoding of video streams.
16. The computer implemented method of claim 10, further
characterized in that a network engine provides instructions on how
to encode, the network engine utilizing network information
including bandwidth, latency, and jitter.
17. A system comprising: a plurality of codecs suitable for
encoding and decoding multi-dimensional data streams; one or more
processors; and a memory unit having instructions stored thereon,
which when executed by the one or more processors, cause the system
to: receive an output including a plurality of data streams, the
plurality of data streams including video, audio, graphics, text,
gestures and at least one emotion, wherein the output is divided
into a plurality of regions; analyze the output in each of the
plurality of regions; for each of the plurality of regions, select
one codec from the plurality of codecs to encode the output within
that region; wherein, the selection of the codec is made to improve
a human perception of the multi-dimensional experience and is based
on data associated with the capabilities of a transmitting device,
the capabilities of the receiving device, and the characteristics
of the multi-dimensional experience; wherein the selection is
informed by a network engine, the network engine including a hybrid
network stack with network intelligence configured to implement a
low-latency transfer protocol; and for each of the plurality of
regions, cause the selected codec to encode the plurality of data
streams associated with the output in that region.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application is a continuation of U.S. patent
application Ser. No. 13/363,187, entitled "Experience or 'Sentio'
Codecs, and Methods and Systems for Improving QoE and Encoding
Based on QoE Experiences," filed Jan. 31, 2012, which is a
continuation of U.S. patent application Ser. No. 13/136,870,
entitled "Experience or 'Sentio' Codecs, and Methods and Systems
for Improving QoE and Encoding Based on QoE Experiences," filed
Aug. 12, 2011 (now U.S. Pat. No. 9,172,979), which claims the
benefit of and priority to U.S. Provisional Patent Application No.
61/373,236, entitled "Experience or 'Sentio' Codecs, and Methods and
Systems for Improving QoE and Encoding Based on QoE Experiences,"
filed on Aug. 12, 2010. The contents of the above-identified
applications are incorporated herein by reference in their
entirety. This application is therefore entitled to a priority date
of Aug. 12, 2010.
FIELD OF INVENTION
[0002] The present teaching relates to experience or "sentio"
codecs enabling encoding and transmission for data streams
involving a variety of dimensions and data types including video,
group participation, gesture recognition, heterogeneous device use,
emotions, etc.
SUMMARY OF THE INVENTION
[0003] The present invention contemplates a variety of experience
or "sentio" codecs, and methods and systems for enabling an
experience platform, as well as a Quality of Experience (QoE)
engine which allows the sentio codec to select a suitable encoding
engine or device. As will be described in more detail below, the
sentio codec is capable of encoding and transmitting data streams
that correspond to participant experiences with a variety of
different dimensions and features. As will be appreciated, the
following description provides one paradigm for understanding the
multi-dimensional experience available to the participants, and as
implemented utilizing a sentio codec. There are many suitable ways
of describing, characterizing and implementing the sentio codec and
experience platform contemplated herein.
BRIEF DESCRIPTION OF DRAWINGS
[0004] These and other objects, features and characteristics of the
present invention will become more apparent to those skilled in the
art from a study of the following detailed description in
conjunction with the appended claims and drawings, all of which
form a part of this specification. In the drawings:
[0005] FIG. 1 illustrates a system architecture for composing and
directing user experiences;
[0006] FIG. 2 is a block diagram of an experience agent;
[0007] FIG. 3 is a block diagram of a sentio codec; and
[0008] FIG. 4 provides a screen shot useful for illustrating how a
hybrid encoding scheme can be used to accomplish low-latency
transmission.
DETAILED DESCRIPTION OF THE INVENTION
[0009] The present invention contemplates a variety of experience
or "sentio" codecs, and methods and systems for enabling an
experience platform, as well as a Quality of Experience (QoE)
engine which allows the sentio codec to select a suitable encoding
engine or device. As will be described in more detail below, the
sentio codec is capable of encoding and transmitting data streams
that correspond to participant experiences with a variety of
different dimensions and features. (The term "sentio" is Latin,
roughly corresponding to perception or to perceive with one's
senses, hence the original nomenclature "sentio codec.")
[0010] The primary goal of a video codec is to achieve the maximum
compression rate for digital video while maintaining high picture
quality; audio codecs are similar. But video and audio codecs
alone are insufficient to generate and capture a full experience,
such as a real-time experience enabled by hybrid encoding, and
encoding of other experience aspects such as gestures, emotions,
etc.
[0011] FIG. 4 will now be described to provide an example
experience showing 4 layers where video encoding alone is
inadequate. (The "layer" concept will be described below in more
detail with reference to FIGS. 1-3.) A first layer is generated by
Autodesk 3ds Max instantiated on a suitable layer source, such as
on an experience server or a content server. A second layer is an
interactive frame around the 3ds Max layer, and in this example is
generated on a client device by an experience agent. A third layer
is the black box in the bottom-left corner with the text "FPS" and
"bandwidth", and is generated on the client device but pulls data
by accessing a service engine available on the service platform. A
fourth layer is a red-green-yellow grid which demonstrates an
aspect of a low-latency transfer protocol (e.g., different regions
being selectively encoded) and is generated and computed on the
service platform, and then merged with the 3ds Max layer on the
experience server.
[0012] FIG. 4 illustrates how a hybrid encoding approach can be
used to accomplish low-latency transmission. The first layer
provides an Autodesk 3ds Max image including a rotating teapot, the
first layer including moving images, static or nearly static images,
and graphic and/or text portions. Rather than encoding all the
information with a video encoder alone, a hybrid approach that
encodes some regions with a video encoder, other regions with a
picture encoder, and other portions as commands yields better
transmission results, and can be optimized based on factors such as
the state of the network and the capabilities of end devices. These
different encoding regions are illustrated by the different
coloring of the red-green-yellow grid of layer 4. One example of
this low-latency protocol is described in more detail in Vonog et
al.'s U.S. patent application Ser. No. 12/569,876, filed Sep. 29,
2009, and incorporated herein by reference for all purposes
including the low-latency protocol and related features such as the
network engine and network stack arrangement.
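By way of illustration only, the region-by-region choice among a video encoder, a picture encoder, and commands might be sketched as follows. This is not the protocol of Ser. No. 12/569,876; the `Region` fields, the `motion` score, the 0.3 threshold, and the encoder names are assumptions introduced here for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Encoder(Enum):
    VIDEO = "video"      # moving imagery
    PICTURE = "picture"  # static or nearly static imagery
    COMMAND = "command"  # graphics/text sent as drawing commands


@dataclass
class Region:
    name: str
    motion: float              # hypothetical 0..1 score of frame-to-frame change
    is_text_or_graphics: bool  # hypothetical flag set by the layer source


def choose_encoder(region: Region, motion_threshold: float = 0.3) -> Encoder:
    """Pick an encoder for one region (illustrative heuristic only)."""
    if region.is_text_or_graphics:
        return Encoder.COMMAND
    if region.motion >= motion_threshold:
        return Encoder.VIDEO
    return Encoder.PICTURE


def plan_frame(regions, motion_threshold: float = 0.3):
    """Map each region name to the encoder chosen for it."""
    return {r.name: choose_encoder(r, motion_threshold) for r in regions}
```

In such a sketch, the threshold is the natural place to reflect the state of the network or the capabilities of the end devices, for example by raising it so that fewer regions are sent as video.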
[0013] As is seen from the example of FIG. 4, a video codec alone
is inadequate to accomplish the hybrid encoding scheme covering
video, pictures and commands. While it is theoretically possible to
encode the entire first layer using only a video codec, latency and
other issues can prohibit real-time and/or quality experiences. A
low-latency protocol can solve this problem by efficiently encoding
the data.
[0014] In another example, a multiplicity of video codecs can be
used to improve encoding and transmission. For example, H.264 can
be used if a hardware decoder is available, thus saving battery
life and improving performance, or a better-suited video codec
(e.g., a low-latency codec) can be used if the device fails to
support H.264.
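By way of illustration only, selecting among a multiplicity of video codecs based on hardware decoder availability might be sketched as follows. The function name, the codec identifiers, and the way device capabilities are reported are assumptions made for this example.

```python
def pick_video_codec(hardware_decoders, software_codecs, preferred="h264"):
    """Return (codec_name, uses_hardware) for a receiving device.

    hardware_decoders: set of codec names the device decodes in hardware
    software_codecs:   ordered list of fallback codecs, best first
    """
    if preferred in hardware_decoders:
        # Hardware decoding saves battery life and improves performance.
        return preferred, True
    if software_codecs:
        # Fall back to the first (best) software codec, e.g. a low-latency one.
        return software_codecs[0], False
    raise ValueError("no usable video codec on this device")
```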
[0015] As yet another example, consider the case of multiple
media where an ability to take into account the nature of human
perception would be beneficial. For example, assume we have video
and audio information. If network quality degrades, it could be
better to prioritize audio and allow the video to degrade. Doing so
would rely on psychoacoustics to improve the QoE.
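By way of illustration only, prioritizing audio over video as network quality degrades might be sketched as a simple bitrate split. The target bitrates (64 kbps audio, 1500 kbps video) and the function name are assumptions chosen for the example and are not taken from the present teaching.

```python
def allocate_bitrate(available_kbps, audio_target_kbps=64.0, video_target_kbps=1500.0):
    """Give audio its full target first; video gets whatever bandwidth remains."""
    audio = min(audio_target_kbps, available_kbps)
    video = min(video_target_kbps, max(0.0, available_kbps - audio))
    return {"audio": audio, "video": video}
```

With ample bandwidth both streams reach their targets; as bandwidth falls, video degrades first, and audio is reduced only once video has dropped to zero.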
[0016] Accordingly, the present teaching contemplates an experience
or sentio codec capable of encoding and transmitting data streams
that correspond to experiences with a variety of different
dimensions and features. These dimensions include known audio and
video, but further may include any conceivable element of a
participant experience, such as gestures, gestures+voice commands,
"game mechanics" (which can be used to boost QoE when current
conditions, such as the network, would otherwise degrade it, e.g., by
applying a sound distortion effect specific to a given experience
when data is lost), emotions (perhaps as detected via voice or facial
expressions), various sensor data, microphone input, etc.
[0017] It is also contemplated that virtual experiences can be
encoded via the sentio codec. According to one embodiment, virtual
goods are evolved into virtual experiences. Virtual experiences
expand upon limitations imposed by virtual goods by adding
additional dimensions to the virtual goods. By way of example, User
A transmits flowers as a virtual good to User B. The transmission
of the virtual flowers is enhanced by adding emotion by way of
sound, for example. The virtual flowers are also changed to a
virtual experience when User B can do something with the flowers,
for example User B can affect the flowers through any sort of
motion or gesture. User A can also transmit the virtual goods to
User B by making a "throwing" gesture using a mobile device, so as
to "toss" the virtual goods to User B.
[0018] The sentio codec improves the QoE to a consumer or
experience participant on the device of their choice. This is
accomplished through a variety of mechanisms, selected and
implemented, possibly dynamically, based on the specific
application and available resources. In certain embodiments, the
sentio codec encodes multi-dimensional data streams in real-time,
adapting to network capability. A QoE engine operating within the
sentio codec makes decisions on how to use different available
codecs. The network stack can be implemented as hybrid, as
described above, and in further detail with reference to Vonog et
al.'s U.S. patent application Ser. No. 12/569,876.
[0019] The sentio codec can include 1) a variety of codecs for each
segment of experience described above, 2) a hybrid network stack
with network intelligence, 3) data about available devices, and 4)
a QoE engine that makes decisions on how to encode. It will be
appreciated that QoE is achieved through various strategies that
work differently for each given experience (say a zombie karaoke
game vs. a live stadium rock concert experience), adapt in
real-time to the network and other available resources, account for
the devices involved, and take advantage of various psychological
tricks to conceal imperfections which inevitably arise,
particularly when the provided experience is scaled for many
participants and devices.
[0020] FIG. 1 illustrates a block diagram of a system 10. The
system 10 can be viewed as an "experience platform" or system
architecture for composing and directing a participant experience.
As will be appreciated, the experience platform described herein
provides, by way of example only, one platform suitable for
incorporating and taking advantage of the sentio codec described
herein. In one embodiment, the experience platform 10 is provided
by a service provider to enable an experience provider to compose
and direct a participant experience. The participant experience can
involve one or more experience participants. The experience
provider can create an experience with a variety of dimensions, as
will be explained further now. The sentio codec enables the
encoding and transmission of data streams representing this variety
of dimensions. As will be appreciated, the following description
provides one paradigm for understanding the multi-dimensional
experience available to the participants. There are many suitable
ways of describing, characterizing and implementing the experience
platform contemplated herein.
[0021] In general, services are defined at an API layer of the
experience platform. The services are categorized into
"dimensions." The dimension(s) can be recombined into "layers." The
layers form to make features in the experience. The sentio codec
enables encoding and transmission of the data streams representing
the various dimensions and features.
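By way of illustration only, the relationship among services, dimensions, and layers might be modeled as follows. The class names, field names, and the `stream_types` helper are hypothetical and serve only to show which dimensions would need encoding by a sentio codec.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dimension:
    name: str            # e.g. "video", "audio", "game_mechanics"
    services: List[str]  # services defined at the API layer


@dataclass
class Layer:
    name: str
    dimensions: List[Dimension]  # dimensions recombined into this layer


@dataclass
class Experience:
    layers: List[Layer] = field(default_factory=list)

    def stream_types(self) -> List[str]:
        """Dimension names used by any layer, de-duplicated, in first-seen order."""
        seen: List[str] = []
        for layer in self.layers:
            for dimension in layer.dimensions:
                if dimension.name not in seen:
                    seen.append(dimension.name)
        return seen
```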
[0022] By way of example, the following are some of the dimensions
that can be supported on a suitable experience platform, and the
related data streams encoded by a suitable sentio codec. It will be
appreciated that not all dimensions are necessarily available or
needed on specific experience platforms or specific devices, and
that the sentio codec can be implemented with general, all
encompassing capabilities, or with only those capabilities needed
for the specific implementation, or with a suitable subset.
[0023] Video--is the near or substantially real-time streaming of
the video portion of a video or film with near real-time display
and interaction.
[0024] Audio--is the near or substantially real-time streaming of
the audio portion of a video, film, karaoke track, song, with near
real-time sound and interaction.
[0025] Live--is the live display and/or access to a live video,
film, or audio stream in near real-time that can be controlled by
another experience dimension. A live display is not limited to
single data stream.
[0026] Encore--is the replaying of a live video, film or audio
content. This replaying can be the raw version as it was originally
experienced, or some type of augmented version that has been
edited, remixed, etc.
[0027] Graphics--is a display that contains graphic elements such
as text, illustration, photos, freehand geometry and the attributes
(size, color, location) associated with these elements. Graphics
can be created and controlled using the experience input/output
command dimension(s) (see below).
[0028] Input/Output Command(s)--are the ability to control the
video, audio, picture, display, sound or interactions with human or
device-based controls. Some examples of input/output commands
include physical gestures or movements, voice/sound recognition,
and keyboard or smart-phone device input(s).
[0029] Interaction--is how devices and participants interchange and
respond with each other and with the content (user experience,
video, graphics, audio, images, etc.) displayed in an experience.
Interaction can include the defined behavior of an artifact or
system and the responses provided to the user and/or player.
[0030] Game Mechanics--are rule-based system(s) that facilitate and
encourage players to explore the properties of an experience space
and other participants through the use of feedback mechanisms. Some
services on the experience Platform that could support the game
mechanics dimensions include leader boards, polling, like/dislike,
featured players, star-ratings, bidding, rewarding, role-playing,
problem-solving, etc.
[0031] Ensemble--is the interaction of several separate but often
related parts of video, song, picture, story line, players, etc.
that when woven together create a more engaging and immersive
experience than if experienced in isolation.
[0032] Auto Tune--is the near real-time correction of pitch in
vocal and/or instrumental performances. Auto Tune is used to
disguise off-key inaccuracies and mistakes, and allows
singer/players to hear back perfectly tuned vocal tracks without
the need of singing in tune.
[0033] Auto Filter--is the near real-time augmentation of vocal
and/or instrumental performances. Types of augmentation could
include speeding up or slowing down the playback,
increasing/decreasing the volume or pitch, or applying a
celebrity-style filter to an audio track (like a Lady Gaga or
Heavy-Metal filter).
[0034] Remix--is the near real-time creation of an alternative
version of a song, track, video, image, etc. made from an original
version or multiple original versions of songs, tracks, videos,
images, etc.
[0035] Viewing 360.degree./Panning--is the near real-time viewing
of the 360.degree. horizontal movement of a streaming video feed on
a fixed axis. Also the ability for the player(s) to control
and/or display alternative video or camera feeds from any point
designated on this fixed axis.
[0036] Turning back to FIG. 1, the experience platform 10 includes
a plurality of devices 12 and a data center 40. The devices 12 may
include devices such as an iPhone 22, an Android device 24, a set top box
26, a desktop computer 28, and a netbook 30. At least some of the
devices 12 may be located in proximity with each other and coupled
via a wireless network. In certain embodiments, a participant
utilizes multiple devices 12 to enjoy a heterogeneous experience,
such as using the iPhone 22 to control operation of the other
devices. Multiple participants may also share devices at one
location, or the devices may be distributed across various
locations for different participants.
[0037] Each device 12 has an experience agent 32. The experience
agent 32 includes a sentio codec and an API. The sentio codec can
include 1) a variety of codecs for each segment of experience
described above, 2) a hybrid network stack with network
intelligence, 3) data about available devices, and 4) a QoE
decision engine that makes decisions on how to encode. It will be
appreciated that QoE is achieved through various strategies that
work differently for each given experience (say a zombie karaoke
game vs. a live stadium rock concert experience), adapt in
real-time to the network and other available resources, account for
the devices involved, and take advantage of various psychological
tricks to conceal imperfections which inevitably arise,
particularly when the provided experience is scaled for many
participants and devices.
[0038] The sentio codec and the API enable the experience agent 32
to communicate with and request services of the components of the
data center 40. The experience agent 32 facilitates direct
interaction between other local devices. Because of the
multi-dimensional aspect of the experience, the sentio codec and
API should fully enable the desired experience. However, the
functionality of the experience agent 32, including the sentio
codec, is typically tailored to the needs and capabilities of the
specific device 12 on which the experience agent 32 is
instantiated. In some embodiments, services implementing experience
dimensions are implemented in a distributed manner across the
devices 12 and the data center 40. In other embodiments, the
devices 12 have a very thin experience agent 32 with little
functionality beyond a minimum API and sentio codec, and the bulk
of the services and thus composition and direction of the
experience are implemented within the data center 40.
[0039] Data center 40 includes an experience server 42, a plurality
of content servers 44, and a service platform 46. As will be
appreciated, data center 40 can be hosted in a distributed manner
in the "cloud," and typically the elements of the data center 40
are coupled via a low latency network. The experience server 42,
servers 44, and service platform 46 can be implemented on a single
computer system, or more likely distributed across a variety of
computer systems, and at various locations.
[0040] The experience server 42 includes at least one experience
agent 32, an experience composition engine 48, and an operating
system 50. The experience agent 32 again includes a sentio codec
with the various capabilities as described herein. In one
embodiment, the experience composition engine 48 is defined and
controlled by the experience provider to compose and direct the
experience for one or more participants utilizing devices 12.
Direction and composition is accomplished, in part, by merging
various content layers and other elements into dimensions generated
from a variety of sources such as the experience server 42, the
devices 12, the content servers 44, and/or the service platform
46.
[0041] The content servers 44 may include a video server 52, an ad
server 54, and a generic content server 56. Any content suitable
for encoding by the sentio codec of an experience agent can be
included as an experience layer. These include well-known forms such
as video, audio, graphics, and text. As described in more detail
earlier and below, other forms of content such as gestures,
emotions, temperature, proximity, etc., are contemplated for
encoding and inclusion in the experience via a sentio codec, and
are suitable for creating dimensions and features of the
experience.
[0042] The service platform 46 includes at least one experience
agent 32, a plurality of service engines 60, third party service
engines 62, and a monetization engine 64. In some embodiments, each
service engine 60 or 62 has a unique, corresponding experience
agent with a corresponding sentio codec. The sentio codecs may have
separate code and utilize different local hardware and/or
combinations of the same local hardware. As will be appreciated, the
implementation may be
distinct to each application. In other embodiments, a single
experience agent 32 can support multiple service engines 60 or 62.
The service engines and the monetization engine 64 can be
instantiated on one server, or can be distributed across multiple
servers. The service engines 60 correspond to engines generated by
the service provider and can provide services such as audio
remixing, gesture recognition, and other services referred to in
the context of dimensions above, etc. Third party service engines
62 are services included in the service platform 46 by other
parties. The service platform 46 may have the third-party service
engines instantiated directly therein, or within the service
platform 46 these may correspond to proxies which in turn make
calls to servers under control of the third-parties.
[0043] Monetization of the service platform 46 can be accomplished
in a variety of manners. For example, the monetization engine 64
may determine how and when to charge the experience provider for
use of the services, as well as tracking for payment to
third-parties for use of services from the third-party service
engines 62.
[0044] FIG. 2 illustrates a block diagram of an experience agent
100. The experience agent 100 includes an application programming
interface (API) 102 and a sentio codec 104. The API 102 is an
interface which defines available services, and enables the
different agents to communicate with one another and request
services.
[0045] The sentio codec 104 is a combination of hardware and/or
software which enables encoding of many types of data streams for
operations such as transmission and storage, and decoding for
operations such as playback and editing. These data streams can
include standard data such as video and audio. Additionally, the
data can include graphics, sensor data, gesture data, and emotion
data.
[0046] FIG. 3 illustrates a block diagram of one embodiment of a
sentio codec 200. The sentio codec 200 includes a plurality of
codecs such as video codecs 202, audio codecs 204, graphic language
codecs 206, sensor data codecs 208, and emotion codecs 210. The
sentio codec 200 further includes a quality of experience (QoE)
decision engine 212 and a network engine 214. The codecs, the QoE
decision engine 212, and the network engine 214 work together to
encode one or more data streams and transmit the encoded data
according to a low-latency transfer protocol supporting the various
encoded data types. One suitable low-latency protocol and more
details related to the network engine 214 can be found in Vonog et
al.'s U.S. patent application Ser. No. 12/569,876.
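By way of illustration only, the cooperation of a plurality of codecs, a QoE decision engine, and a network engine might be sketched as follows. The stand-in codecs (JSON, and JSON with zlib compression), the 500 kbps threshold, and all class and method names are assumptions made for this example; they do not represent the codecs 202-210 or the low-latency transfer protocol.

```python
import json
import zlib


class JsonCodec:
    """Stand-in codec: serializes a payload as UTF-8 JSON."""
    name = "json"

    def encode(self, payload):
        return json.dumps(payload, sort_keys=True).encode("utf-8")

    def decode(self, data):
        return json.loads(data.decode("utf-8"))


class CompressedJsonCodec(JsonCodec):
    """Stand-in codec: JSON followed by zlib compression."""
    name = "json+zlib"

    def encode(self, payload):
        return zlib.compress(super().encode(payload))

    def decode(self, data):
        return super().decode(zlib.decompress(data))


class NetworkEngine:
    """Holds the latest network measurement (supplied by the caller)."""
    def __init__(self, bandwidth_kbps):
        self.bandwidth_kbps = bandwidth_kbps


class QoEDecisionEngine:
    """Chooses among the codecs registered for a stream type."""
    def __init__(self, low_bandwidth_kbps=500):
        self.low_bandwidth_kbps = low_bandwidth_kbps

    def select(self, candidates, network):
        # Hypothetical rule: compress only when bandwidth is scarce.
        scarce = network.bandwidth_kbps < self.low_bandwidth_kbps
        wanted = "json+zlib" if scarce else "json"
        return next(c for c in candidates if c.name == wanted)


class SentioCodec:
    def __init__(self, codecs, qoe, network):
        self.codecs = codecs    # stream type -> list of candidate codecs
        self.qoe = qoe
        self.network = network

    def encode(self, stream_type, payload):
        codec = self.qoe.select(self.codecs[stream_type], self.network)
        return codec.name, codec.encode(payload)

    def decode(self, stream_type, codec_name, data):
        codec = next(c for c in self.codecs[stream_type] if c.name == codec_name)
        return codec.decode(data)
```

In this sketch the chosen codec name travels with the encoded data so that the receiving side can decode with the matching codec.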
[0047] The sentio codec 200 can be designed to take all aspects of
the experience platform into consideration when executing the
transfer protocol. The parameters and aspects include available
network bandwidth, transmission device characteristics and
receiving device characteristics. Additionally, the sentio codec
200 can be implemented to be responsive to commands from an
experience composition engine or other outside entity to determine
how to prioritize data for transmission. In many applications,
because of human response, audio is the most important component of
an experience data stream. However, a specific application may
desire to emphasize video or gesture commands.
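By way of illustration only, prioritizing data for transmission with audio first by default, while allowing an outside entity such as an experience composition engine to override the ordering, might be sketched as follows. The default priority table, the packet representation, and the function name are assumptions made for this example.

```python
# Lower number = sent earlier. The default ordering is an assumption.
DEFAULT_PRIORITY = {"audio": 0, "gesture": 1, "video": 2, "graphics": 3}


def transmission_order(packets, overrides=None):
    """Sort packets by stream-type priority; overrides replace default entries."""
    priority = {**DEFAULT_PRIORITY, **(overrides or {})}
    lowest = len(priority)  # unknown stream types are sent last
    # sorted() is stable, so packets of equal priority keep their arrival order.
    return sorted(packets, key=lambda p: priority.get(p["type"], lowest))
```

For example, an application that emphasizes video could pass `overrides={"video": -1}` to move video ahead of audio.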
[0048] The sentio codec provides the capability of encoding data
streams corresponding to many different senses or dimensions of an
experience. For example, a device 12 may include a video camera
capturing video images and audio from a participant. The user image
and audio data may be encoded and transmitted directly or, perhaps
after some intermediate processing, via the experience composition
engine 48, to the service platform 46 where one or a combination of
the service engines can analyze the data stream to make a
determination about an emotion of the participant. This emotion can
then be encoded by the sentio codec and transmitted to the
experience composition engine 48, which in turn can incorporate
this into a dimension of the experience. Similarly a participant
gesture can be captured as a data stream, e.g. by a motion sensor
or a camera on device 12, and then transmitted to the service
platform 46, where the gesture can be interpreted, and transmitted
to the experience composition engine 48 or directly back to one or
more devices 12 for incorporation into a dimension of the
experience.
[0049] The sentio codec delivers the best QoE to a consumer on the
device of their choice through the current network. This is
accomplished through a variety of mechanisms, selected and
implemented based on the specific application and available
resources. In certain embodiments, the sentio codec encodes
multi-dimensional data streams in real-time, adapting to network
capability. A QoE engine operating within the sentio codec makes
decisions on how to use different available codecs. The network
stack can be implemented as hybrid, as described above, and in
further detail with reference to Vonog et al.'s U.S. patent
application Ser. No. 12/569,876.
[0050] In addition to the above mentioned examples, various other
modifications and alterations of the invention may be made without
departing from the invention. Accordingly, the above disclosure is
not to be considered as limiting and the appended claims are to be
interpreted as encompassing the true spirit and the entire scope of
the invention.
* * * * *