U.S. patent application number 14/463695 was filed with the patent office on 2015-02-12 for "System & Method for Real-Time Video Communications". The applicant listed for this patent is SORYN TECHNOLOGIES LLC. The invention is credited to Royn D. Coultas, Andrew K. Gooding, Albert Jordan, John D. Ralston, and Steven E. Saunders.
Application Number | 20150042744 14/463695
Document ID | /
Family ID | 47752833
Filed Date | 2015-02-12

United States Patent Application | 20150042744
Kind Code | A1
Ralston; John D.; et al. | February 12, 2015
System & Method for Real-Time Video Communications
Abstract
Systems and methods for video communication services are
presented herein. In particular, systems and methods in which
multiple participants can simultaneously create and share video in
real-time are presented herein. Other systems and methods are also
presented herein.
Inventors: | Ralston; John D.; (Portola Valley, CA); Jordan; Albert; (Menlo Park, CA); Coultas; Royn D.; (Atherton, CA); Gooding; Andrew K.; (Los Altos, CA); Saunders; Steven E.; (Cupertino, CA)

Applicant:
Name | City | State | Country | Type
SORYN TECHNOLOGIES LLC | JERSEY CITY | NJ | US |

Family ID: | 47752833
Appl. No.: | 14/463695
Filed: | August 20, 2014
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
13407732 | Feb 28, 2012 | 8896652
14463695 | |
61447664 | Feb 28, 2011 |
Current U.S. Class: | 348/14.02; 348/14.08
Current CPC Class: | H04L 12/1827 20130101; H04N 21/2405 20130101; H04N 21/4728 20130101; H04N 21/64707 20130101; H04L 65/604 20130101; H04L 65/103 20130101; H04N 21/2404 20130101; H04N 21/2402 20130101; H04L 65/80 20130101; H04N 7/152 20130101
Class at Publication: | 348/14.02; 348/14.08
International Class: | H04N 7/15 20060101 H04N007/15; H04L 29/06 20060101 H04L029/06
Claims
1. A real time communication platform, comprising: an application
layer module interoperable on a processor associated with a mobile
device having a memory and at least one camera, wherein the
application layer module comprises at least one session control
module; a digital technology media engine in communication with the
application layer module and at least one media source accessible
to the processor of the mobile device, wherein the digital
technology media engine includes at least one codec; and a real
time adaptation sub-system in communication with the application
layer module and the processor, wherein the real time adaptation
sub-system is capable of detecting and adapting to variations in
one or more conditions to which at least one of the processor,
memory, or the at least one camera is subjected.
2. The system of claim 1, wherein the one or more conditions that
the real time adaptation sub-system is capable of detecting relate
to at least one variety of network impairment.
3. The system of claim 2, wherein the network impairment relates to
packet delay.
4. The system of claim 2, wherein the network impairment relates to
network congestion.
5. The system of claim 1, wherein the one or more conditions that
the real time adaptation sub-system is capable of detecting relate
to at least one device impairment.
6. The system of claim 5, wherein the device impairment relates to
a frame rate associated with one of the at least one cameras.
7. The system of claim 5, wherein the device impairment relates to
processor loading time of the processor.
8. The system of claim 5, wherein the device impairment relates to
limitations associated with a forward-facing one of the at least
one cameras.
9. The system of claim 1, wherein the application layer module is
embedded within a browser installed on the mobile device.
10. The system of claim 1, wherein the application layer module is
capable of communicating information related to a codec to a
different mobile device.
11. The system of claim 1, wherein the session control module is
capable of performing device registration operations.
12. A system for real time communication, comprising: a client
application installed within a mobile device having a memory and at
least one camera, wherein the client application comprises: an
application layer module overlaying the mobile device and
comprising at least one session control module; a digital
technology media engine in communication with the application layer
module and at least one media source accessible to the processor of
the mobile device, wherein the digital technology media engine
includes at least one codec; a real time adaptation sub-system in
communication with the application layer module and the processor,
wherein the real time adaptation sub-system is capable of detecting
and adapting to variations in one or more conditions to which at
least one of the processor, memory, or the at least one camera is
subjected; and a plurality of server applications installed on a
server.
13. The system of claim 12, wherein the client application is
embedded within a browser installed on the mobile device.
14. The system of claim 12, wherein the server comprises a video
gateway.
15. The system of claim 12, wherein the server comprises a
multi-point control unit.
16. The system of claim 12, wherein a first server application
provides a transcoding functionality.
17. The system of claim 16, wherein a second server application
enables real time video editing.
18. The system of claim 12, wherein the server is a cloud-based
server.
19. The system of claim 12, wherein the session control module is
capable of performing device registration operations.
20. A method of providing real time communication, comprising:
deploying a client application to a mobile device having a memory
and at least one camera, wherein the client application comprises:
an application layer module capable of being interoperable on a
processor associated with the mobile device and comprising at least
one session control module; a digital technology media engine
capable of communicating with the application layer module and at
least one media source accessible to the processor of the mobile
device, and including at least one codec; and a real time
adaptation sub-system capable of communicating with the application
layer module and the processor, and further capable of detecting
and adapting to variations in one or more conditions to which at
least one of the processor, memory, or the at least one camera is
subjected.
21. The method of claim 20, wherein the client application is
embedded within a web browser application.
22. The method of claim 20, wherein the client application is
deployed in association with a web browser application.
23. The method of claim 20, wherein subsequent to being deployed in
the mobile device, the client application communicates with a
different mobile device.
24. The method of claim 23, wherein the communication is with a web
browser application installed on the different mobile device.
25. The method of claim 23, wherein the communication includes
information related to a codec.
26. The method of claim 20, wherein the one or more conditions that
the real time adaptation sub-system is capable of detecting relate
to at least one variety of network impairment.
27. The method of claim 26, wherein the network impairment relates
to packet delay.
28. The method of claim 26, wherein the network impairment relates
to network congestion.
29. The method of claim 20, wherein the one or more conditions that
the real time adaptation sub-system is capable of detecting relate
to at least one variety of device impairment.
30. The method of claim 29, wherein the device impairment relates
to frame rate associated with one of the at least one cameras.
31. The method of claim 29, wherein the device impairment relates
to processor loading time of the processor.
32. The method of claim 29, wherein the device impairment relates
to limitations associated with a forward-facing one of the at least
one cameras.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. patent
application Ser. No. 13/407,732, filed Feb. 28, 2012, entitled
"System and Method for Real-Time Video Communications," which claims
the benefit of U.S. Provisional Application No. 61/447,664, filed
on Feb. 28, 2011, and entitled "System and Method for Real-Time
Video Communications", which is incorporated herein by
reference.
FIELD OF INVENTION
[0002] The present invention relates generally to video
communication services. More particularly, the present invention
relates to electronic devices, computer program products, and
methods with which multiple participants can simultaneously create
and share video in real-time.
BACKGROUND OF THE INVENTION
[0003] Demand for Real-Time Video
[0004] Explosive growth in consumer and business demand for
real-time video on mobile and Internet devices has created exciting
new commercial opportunities and major new technical challenges. As
they pursue the integration of new real-time video capabilities
(FIG. 1) for mobile/Internet communication, business collaboration,
entertainment, and social networking, device manufacturers, network
infrastructure providers, and service providers are struggling to
meet customer expectations for higher quality real-time video
across a wider range of devices and networks.
[0005] Limitations of Broadcast Video Solutions
[0006] Today's standard video processing and distribution
technologies have been developed to efficiently support one-way
video broadcast, not the two-way and multi-party video sharing
required for real-time mobile and Internet user interaction.
Traditional broadcast industry solutions have proven too
computationally complex and bandwidth-hungry to meet the device,
infrastructure, and bandwidth requirements of commercially scalable
real-time mobile/Internet video services.
[0007] Device, Network, and Video Fluctuations
[0008] Furthermore, the available computational resources on many
devices, as well as the delay, jitter, packet loss, and bandwidth
congestion over user networks cannot be guaranteed to remain
constant during a real-time video/audio communication session. In
the absence of any adaptation strategy, both device and network
loading can lead to significant degradation in the user experience.
An adaptation strategy designed to address network fluctuations but
not device loading fluctuations is ineffective, since it is often
difficult to distinguish between these two contributors to apparent
"lost packets" and other performance degradations. Adaptation to
frame-to-frame fluctuations in inherent video characteristics can
provide additional performance benefits.
[0009] Embodiments of the present invention comprise an
all-software Real-time Video Service Platform (RVSP). The RVSP is
an end-to-end system solution that enables high-quality real-time
two-way and multi-party video communications within the real-world
constraints of mobile networks and public Internet connections. The
RVSP includes both Client and Server software applications, both of
which leverage low-complexity, low-bandwidth, and network-adaptive
video processing and communications methods.
[0010] The RVSP Client (FIG. 2) integrates all: video and audio
encode, decode, and synchronization functions; real-time device and
network adaptation; and network signaling, transport, and control
protocols, into a single all-software application compatible with
smartphone and PC operating systems. The RVSP client application
has been designed to accommodate fluctuations in: the internal
loading of client devices; external impairments on a variety of
different user networks; and inherent video characteristics such as
frame-to-frame compressibility and degree of motion.
[0011] The RVSP Server (FIG. 3) integrates multiparty connectivity,
transcoding, and automated video editing into a single all-software
application. The all-software architecture of the RVSP supports
flexible deployment across a wide range of network infrastructure,
including existing mobile application/media server infrastructure,
standard utility server hardware, or in a cloud computing
infrastructure. For both peer-to-peer and server-based real-time
2-way video share services and multi-party video conferencing, the
RVSP platform reduces both the up-front capital expenditures
(CapEx) and on-going operational expenditures (OpEx) compared to
existing video platforms that utilize significantly higher
bandwidths and require additional specialized video hardware in
both the user devices and the network infrastructure.
[0012] In order to meet customer expectations for higher quality
video across a wider range of devices and networks, mobile
operators and other communication service providers worldwide have
made significant new investments in IP Multimedia Subsystem (IMS)
network infrastructure. By reducing bandwidth consumption and
supporting higher concurrent user loading capabilities for a given
infrastructure investment and bandwidth allotment in an IMS
deployment (FIG. 4), the RVSP provides significant CapEx and OpEx
reductions over competing real-time video platforms that require
additional specialized video hardware in both the user devices and
the network infrastructure.
[0013] The RVSP also delivers similar CapEx and OpEx benefits for
"over the top" (OTT) and direct-to-subscriber deployments of
real-time video services (FIG. 5) using standard utility server
hardware or in a cloud computing infrastructure. In these cases,
mobile devices communicating via public Internet or corporate
networking infrastructure typically do not have access to video
quality-of-service (QoS) enhancements in the mobile operator's IMS
core. The real-time network adaptation features of the RVSP
disclosed here then become critical to delivering a compelling user
experience within the real-world constraints of mobile networks and
consumer Internet connections.
[0014] Video conferencing systems are evolving to enable a more
life-like "Telepresence" user experience, in which the quality of
the real-time video and audio communications and the physical
layout of the meeting rooms are enhanced so that multiple remote
parties can experience the look, sound, and feel of all meeting
around the same table. As shown in FIG. 6, multi-user video
conferencing systems typically require specially designed meeting
rooms with dedicated video cameras, large size video displays,
arrays of audio microphones and speakers, and specialized
processing equipment for digitizing, compressing, and distributing
the multiple video and audio streams over dedicated high-speed data
network connections.
[0015] For many consumer and business applications, there is a need
to extend higher quality multi-party video communications to
participants using a wider variety of less-specialized
video-enabled electronic devices, including mobile communications
devices, laptop computers, PCs, and standard TVs. There is also a
need to extend immersive business communications to support a wider
range of consumer and professional collaboration and social
networking activities.
[0016] When it comes to multi-party video communications, users of
these less-specialized electronic devices encounter a number of
drawbacks in the devices and in the user experience. For example,
these devices may have a wide range of video processing
capabilities, video display sizes, available connection bandwidths,
and available connection quality-of-service (QoS). Furthermore,
without the benefit of specially designed meeting rooms, creating a
"perceptually pleasant" meeting experience is challenging. Many
video conferencing systems rely on a static screen layout in which
all participants are presented within an array of equal-sized video
"tiles", even though several participants may be passive listeners
throughout much of the meeting and hence contribute very little.
These "static" multi-party video default display layouts have many
drawbacks, including: [0017] 1. All participants are displayed at
the same image size, same image quality, and same video frame rate,
regardless of their level of participation. [0018] 2. Individual
participants have no control over the display layout on their own
device. [0019] 3. A participant with the role of "moderator" cannot
"give the floor" to individual participants, as they can in a
face-to-face conference setting. [0020] 4. Participants cannot
choose to focus on one other participant, as they can in a
face-to-face conference setting.
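The contrast between static and activity-aware layouts can be made concrete with a small sketch. This is purely illustrative and not taken from the patent: the function name, the 0.25 activity threshold, and the tile sizes are all hypothetical, but the idea — larger tiles for active speakers, smaller tiles for passive listeners — is the alternative to the equal-sized-tile layouts criticized above.

```python
# Hypothetical sketch (not from the patent text): size participant tiles
# by recent speaking activity instead of using a static equal-tile grid.

def layout_tiles(activity_scores, active_size=2, passive_size=1):
    """Map each participant to a relative tile size.

    activity_scores: dict of participant -> fraction of recent time spent
    speaking (0.0 to 1.0). The 0.25 threshold is an arbitrary illustration.
    """
    layout = {}
    for participant, score in activity_scores.items():
        layout[participant] = active_size if score >= 0.25 else passive_size
    return layout

scores = {"alice": 0.6, "bob": 0.05, "carol": 0.3}
print(layout_tiles(scores))  # active speakers receive the larger tile size
```

A real conferencing client would recompute such scores continuously (e.g., from voice activity detection, as in FIG. 19) and animate tile-size changes rather than switching abruptly.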
[0021] When deployed together, the RVSP Client and Server
applications enable multiple participants to simultaneously create
and share high-quality video with each other in real-time, with
many key aspects of a face-to-face user experience.
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] To facilitate further description of the embodiments, the
following drawings are provided, in which like numbers in different
figures indicate the same elements.
[0023] FIG. 1 illustrates examples of real-time video services.
[0024] FIG. 2 illustrates an example of real-time video service
platform client according to an embodiment of the present
invention.
[0025] FIG. 3 illustrates an example of a real-time video service
platform server application according to an embodiment of the
present invention.
[0026] FIG. 4 illustrates an example of a system according to an
embodiment of the present invention.
[0027] FIG. 5 illustrates an example of a system according to an
embodiment of the present invention.
[0028] FIG. 6 illustrates an example of a multi-party communication
system.
[0029] FIG. 7 illustrates examples of network impairments.
[0030] FIG. 8 illustrates an example of measured variations in
compressed video frame size generated for a constant level of
perceived image quality.
[0031] FIG. 9 illustrates examples of video quality and user
experience degradations.
[0032] FIG. 10 illustrates an example of a system according to an
embodiment of the present invention.
[0033] FIG. 11 illustrates an example of differences in network
congestion.
[0034] FIG. 12 illustrates an example of a system according to an
embodiment of the present invention.
[0035] FIG. 13 illustrates an example of a rate function.
[0036] FIG. 14 illustrates an example of a video encoder according
to an embodiment of the present invention.
[0037] FIG. 15 illustrates an example of a network
configuration.
[0038] FIG. 16 illustrates an example of a network
configuration.
[0039] FIG. 17 illustrates an example of measured output video bit
rate vs. target bit rate.
[0040] FIG. 18 illustrates an example of measured output video bit
rate vs. target bit rate.
[0041] FIG. 19 illustrates an example of a system with voice
activity detection according to an embodiment of the present
invention.
[0042] FIG. 20 illustrates an example of a system with moderator
selection according to an embodiment of the present invention.
[0043] FIG. 21 illustrates an example of a system with participant
selection according to an embodiment of the present invention.
DETAILED DESCRIPTION
[0044] RVSP Client Application
[0045] As illustrated in FIG. 2, the RVSP Client integrates all:
video and audio encode, decode, and synchronization functions;
real-time device and network adaptation; and network signaling,
transport, and control protocols, into a single all-software
application compatible with leading smartphone and PC operating
systems. The highly modular and open API architecture of the RVSP
Client supports rapid and flexible device and service
customization. Key components of the RVSP Client application
include the Digital Technology Media Engine (DTME), Application
Layer, and Device Abstraction, OS Abstraction, and Network
Abstraction modules. The RVSP Client can include more or fewer
components than those specifically mentioned herein.
[0046] Application Layer
[0047] The Application Layer provides the primary user interface
(UI), and can be rapidly customized to support a wide range of
real-time video applications and services with customer-specified
User Experience Design (UxD) requirements. The Application Layer is
implemented in Java to leverage the many additional capabilities
included in today's mobile device and PC platforms. An example
Application Layer for a mobile Video Chat service would include the
following modules:
TABLE-US-00001
SIP, NAT (Session Control) Module | Ensures compatibility with real-time communications infrastructure deployed by mobile operators and Internet video service providers. Implements SIP-based call session provisioning, device registration, device and service capabilities exchange, call session management, and media routing. The RVSP Client has been successfully integrated with multiple SIP servers and other proprietary signaling protocol servers.
Call View Activities Module | Implements the User Interface (UI) for each application, allowing for customer-specific branding at both the device and service level.
Settings Module | Governs the user-editable settings for each application or service. Settings are preserved in the device database and are thus persistent.
Address Book Module | Interacts with both the native handset address book and any additional Network Address Book and Presence functions.
[0048] DTME
[0049] The DTME implements all media (video and audio) processing
and delivery functions. The DTME collects media streams from their
designated sources, encodes or decodes them, and delivers the
encoded/decoded media streams to their designated destinations.
Each media source may be a hardware device (camera, microphone), a
network socket, or a file. Similarly, each media destination may be
a hardware device (display, speaker), a network socket, or a
file.
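The source/codec/destination abstraction described for the DTME can be sketched in a few lines. This is an illustrative sketch only; the class and function names here are assumptions, not the DTME's actual API, and a real engine would handle audio/video synchronization, threading, and hardware I/O.

```python
# Illustrative sketch of the DTME's abstraction: frames are collected from
# a source (camera, socket, or file), passed through a codec, and delivered
# to a destination (display, socket, or file). Names are hypothetical.

class PassThroughCodec:
    """Stand-in codec: tags each frame instead of really compressing it."""
    def encode(self, frame):
        return ("encoded", frame)

def run_pipeline(source_frames, codec, deliver):
    """Collect frames from a source, encode each, deliver to a destination.

    source_frames: any iterable of frames (models camera/socket/file input).
    deliver: any callable accepting one frame (models display/socket/file).
    """
    for frame in source_frames:
        deliver(codec.encode(frame))

out = []
run_pipeline(["frame0", "frame1"], PassThroughCodec(), out.append)
print(out)  # [('encoded', 'frame0'), ('encoded', 'frame1')]
```

The point of the abstraction is that sources and destinations are interchangeable callables/iterables, which is what lets the same engine serve capture, playback, and file-based workflows.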
TABLE-US-00002
RTP/RTCP Stack | Enables efficient network operations, and interfaces directly with device input/output devices (camera, display, microphone, and speaker) via a hardware abstraction layer. The RTP/RTCP stack also includes an Adjustable Jitter Buffer, which automatically sets the jitter buffer depth depending on network conditions determined by the RTA module.
Real-Time Adaptation (RTA) Module | In order to provide an industry-leading real-time mobile video user experience, the RVSP Client application includes a Real-Time Adaptation (RTA) Module designed to accommodate fluctuations in the internal loading of a variety of different client devices, external impairments on a variety of different user networks, and inherent video characteristics such as frame-to-frame compressibility and degree of motion. In the absence of real-time adaptation, device and network loading significantly degrade the user experience in real-time mobile/Internet video services.
DTV-X Video Codec | The DTV-X Video Codec at the heart of the DTME dramatically reduces the computational complexity of high-quality, real-time video capture and playback, enabling all-software implementations on mobile handsets. The DTV-X codec dramatically reduces compressed image data size while retaining high picture quality, extends device and networked video storage capacity, realizes higher image-per-second live monitoring/playback, enables faster download speeds, and supports advanced video manipulation in the device and/or in the network.
Other Video Codecs | Since the video codec functions are fully abstracted in the DTME, the RVSP Client can be configured to utilize any other video codecs, such as H.263 and H.264, which are already integrated into handset or PC hardware. This feature enables support for the widest possible range of devices and legacy video service infrastructure.
Audio Codecs | In a similar manner, the audio codec functions are also fully abstracted in the DTME, so that the RVSP can be configured to utilize a wide range of embedded audio codecs and acoustic echo cancelation solutions.
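The Adjustable Jitter Buffer mentioned above sets its depth from measured network conditions. The patent does not give a formula, so the following is a hedged sketch of one common policy: depth proportional to measured inter-arrival jitter, clamped to sane bounds. The multiplier and bounds are illustrative assumptions, not values from the RVSP.

```python
# Hypothetical adjustable-jitter-buffer policy: buffer depth grows with
# measured jitter (reported, e.g., by the RTA module) and is clamped so it
# neither starves the decoder nor adds excessive end-to-end latency.

def jitter_buffer_depth_ms(measured_jitter_ms, multiplier=3, lo=20, hi=400):
    """Return a playout buffer depth in milliseconds.

    measured_jitter_ms: smoothed inter-arrival jitter estimate.
    multiplier, lo, hi: arbitrary illustrative tuning constants.
    """
    return max(lo, min(hi, multiplier * measured_jitter_ms))

# A quiet WiFi link with ~10 ms jitter gets a shallow buffer; a congested
# 3G link with ~150 ms jitter is clamped at the 400 ms ceiling.
print(jitter_buffer_depth_ms(10), jitter_buffer_depth_ms(150))
```

The trade-off the clamp encodes is the central one for real-time video: a deeper buffer absorbs more jitter but directly increases conversational delay.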
[0050] The DTME communicates with the Application Layer through a
well-defined Application Layer Interface (DTME API) for rapid and
flexible customization across a wide range of real-time video
applications. The DTME API also enables a "headless" client,
allowing third parties such as handset OEMs and video service
providers to develop their own custom applications.
[0051] Device Abstraction, OS Abstraction, and Network Abstraction
Modules
[0052] These modules allow installation and interoperability of the
RVSP Client on devices running all of today's leading smartphone
and PC operating systems. They also allow the RVSP Client to
accommodate the wide range of cameras, displays, and audio hardware
found in smartphones and PCs, and allow real-time video services to
leverage the widest possible range of 3G, 4G, WiFi, DSL, and
broadband network connectivity modes.
[0053] RVSP Server Application
[0054] As shown in FIG. 3, the RVSP Server integrates multiparty
connectivity, transcoding, and automated video editing into a
single all-software application that can be deployed on existing
mobile operator server infrastructure or on standard utility
servers in a cloud computing infrastructure.
[0055] Many real-time video services require support for additional
network based video processing, including [0056] multiparty
connectivity [0057] transcoding [0058] automated video editing
[0059] multimedia mashups [0060] connectivity to legacy video
conferencing systems.
[0061] The RVSP Server provides these functions in an
industrial-strength solution built using standards-based
software--without the use or added expense of a hardware-based MCU
or MRF appliance. An all-software RVSP Server solution enables
customers to purchase only the number of ports they need, then grow
into additional ports as the user base expands. The RVSP Server
solution's flexible capacity management also enables video service
providers to roll out services spanning mobile video chat through
broadband HD videoconferencing on a single platform.
TABLE-US-00003
RVSP Server Specifications | System Requirements
SIP compatible, multipoint video and voice conferencing, transcoding, and automated media mixing and editing | Operating System: Linux, Windows Server 2003 R2/2008
On-demand, personal meeting rooms or one-click, ad-hoc conferences | Processor: Dual Core processor or higher required for operation; 2.5 GHz Xeon processor or higher required for HD video support
Personal layout selection with continuous presence, automatic layout adaptation based on number of conference participants | Concurrent user capacity varies based on available processor speed and number of available cores; resource usage varies by selected resolution
Large conference support up to the capacity of the MCU | Memory: 4 GB
Up to 720p30 transmit and receive resolutions, call rates up to 4 Mbps | Disk space: 2 GB
Selectable 4:3 and 16:9 aspect ratio for transmitted video | Network: Single, 100 Mbps network adapter with full duplex connectivity and a static IP address
DTV-4, H.264, H.263+, and H.263++ video codecs | Virtual Servers: Supported; dedicated resources required
AMR, AAC-LC, G.711, G.722, G.722.1c, MP3 audio codecs |
SIP Registration and proxy support |
Web-based, remote configuration and management |
Multi-level administrative access control using Windows domain and local host authentication authorities |
Usage and system logging to Microsoft SQL Server 2008 |
Configurable DiffServ settings for audio and video |
Endpoint API via SIP CSTA for advanced conference management |
REST API for management integration |
[0062] Additional RVSP Server benefits include: [0063] Natural
Interactions--High quality media experience across a wide range of
devices and networks [0064] Standards Based--Supports existing
conferencing standards and interfaces to legacy conferencing
systems [0065] Right-sized Buying--Flexible deployment model
empowers customers to license only the ports they need [0066]
Scalability--Easily add host server processing power to increase
RVSP Server capacity [0067] Flexible Capacity Management--Ensures
optimal resource usage [0068] Transcoding/Transrating--For each
port, ensures that endpoints receive the best possible experience
based on their capabilities.
[0069] Real-Time Adaptation Sub-System
[0070] A Real-Time Adaptation (RTA) sub-system has been integrated
into the RVSP client application to enable prediction and/or
measurement of, and adaptation to, fluctuations in the following
device/network impairments and video characteristics:
[0071] Device Impairments
[0072] Existing real-time video client applications running on
commercially available smart phones, tablets, and other
video-enabled devices suffer from many device impairments,
including: [0073] Differences between front camera versus rear
camera. Some devices have front cameras limited to 15 fps and VGA
(640.times.480 pixels) image sizes, while rear cameras on the same
devices can support up to 30 fps and larger image sizes. [0074]
Limited control of camera frame rate. Some camera modules, once
activated in video mode, deliver a constant frame rate (e.g., 30
fps) regardless of what frame rate is requested by the calling
application. [0075] Poor tracking of camera frame rate. Some camera
modules, once activated in video mode, do not accurately track and
maintain the requested frame rate. Deviations between requested and
delivered video frame rates may also be influenced by processor
loading due to other applications running on the device. [0076] CPU
loading during camera operation. Some camera modules, once
activated in video mode, automatically activate additional video
processing functions in the device that can lead to significant
processor loading. This loading in turn can limit overall real-time
video applications to lower frame rates than targeted.
[0077] Real-time video application degradations resulting from
failure to adapt to device impairments include: [0078]
discrepancies between uncompressed video frame rates requested by
the real-time video client application and the actual frame rates
delivered by device camera modules [0079] uncompressed video frames
that are delivered to the real-time video client by the device
camera module, but cannot be passed to the video encoder due to
timing limitations [0080] compressed video frames that arrive in
the real-time video client, but cannot be passed to the video
decoder due to timing limitations.
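The first degradation listed — a discrepancy between the requested and delivered camera frame rate — can be detected from capture timestamps alone. The following is a minimal sketch under that assumption; the function names are hypothetical and the patent does not prescribe this computation.

```python
# Hedged sketch: estimate the frame rate a camera module actually delivers
# from its capture timestamps, and compare against the requested rate, so
# the client can detect the "poor tracking" impairment described above.

def delivered_fps(timestamps_s):
    """Estimate delivered frame rate from capture timestamps (in seconds)."""
    if len(timestamps_s) < 2:
        return 0.0
    span = timestamps_s[-1] - timestamps_s[0]
    return (len(timestamps_s) - 1) / span

def frame_rate_deficit(requested_fps, timestamps_s):
    """Positive when the camera under-delivers relative to the request."""
    return requested_fps - delivered_fps(timestamps_s)
```

In practice the estimate would be computed over a sliding window, since (as the text notes) the deviation fluctuates with processor loading from other applications.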
[0081] Network Impairments
[0082] Existing real-time video services running on commercial
wireless (3G, 4G, WiFi) and wireline (DSL, broadband) networks
suffer from many network impairments (FIG. 7), including: [0083] Packet delay
& jitter in the network. [0084] Outright packet loss in the
network. [0085] Other "network congestion". Traffic contention due
to the presence of other data traffic can manifest itself as a
decrease in available network bit rate and/or decrease in data
stream signal-to-noise ratio (SNR). Traffic contention may also
manifest itself as increased packet delay/jitter and packet loss in
the network. [0086] Asymmetry between uplink & downlink
characteristics for each party on a real-time video call
session.
[0087] Real-time video application degradations resulting from
failure to adapt to network impairments include: [0088] media
packets/audio & video frames that arrive in the receiver's
client device but are sufficiently delayed/out of order that the
client application is forced to ignore them and not pass them to
the decoder [0089] media packets/audio & video frames, and
control/signaling information that never arrive in the receiver's
client device [0090] wide variations in the quality of individual
participants' video streams on a multi-party video conference.
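The first degradation above describes a receiver-side policy: frames that arrive too late or too far out of order are ignored rather than passed to the decoder. A minimal sketch of such a filter follows; the function name and the representation of frames as (sequence number, delay) pairs are illustrative assumptions, not the RVSP's actual data structures.

```python
# Hypothetical sketch: discard media frames that are too delayed or out of
# order, keeping only frames the decoder can still usefully consume.

def filter_decodable(frames, max_delay_ms):
    """frames: list of (seq, delay_ms) in arrival order.

    Returns the sequence numbers of frames that are both in order
    (monotonically increasing seq) and within the playout deadline.
    """
    decodable, last_seq = [], -1
    for seq, delay_ms in frames:
        if seq > last_seq and delay_ms <= max_delay_ms:
            decodable.append(seq)
            last_seq = seq
    return decodable

# Frame 2 arrives after frame 3 (out of order) and frame 4 arrives 500 ms
# late; both are dropped rather than decoded.
print(filter_decodable([(1, 10), (3, 50), (2, 10), (4, 500)], 100))
```

A production client would typically hold slightly-late frames in the jitter buffer first and drop only once the playout deadline truly passes.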
[0091] Variations in Inherent Video Characteristics
[0092] Testing on commercially available smart phones, tablets, and
other video-enabled devices has revealed that, depending on typical
frame-to-frame variations in inherent video data characteristics
such as the relative degree of luma and chroma detail and
frame-to-frame motion, the bits/frame required to maintain a
constant level of user-perceived image quality can vary
significantly (FIG. 8).
[0093] Real-time video application degradations that can result
from failure to adapt to variations in such characteristics as
frame-to-frame compressibility and degree of motion include: [0094]
the real-time video client attempting to drive target bits/frame or
frames/second to unnecessarily high or unattainably low levels
during a video call session.
[0095] The many real-time video quality and user experience
degradations that result from failure to adapt to device and
network impairments, and fluctuations in inherent video
characteristics, include: stalling, dropped frames, dropped
macro-blocks, image blockiness, and image blurriness (FIG. 9).
[0096] RTA Sub-System Design Strategy
[0097] Successful real-time adaptation to the above impairments and
fluctuations requires that the RTA sub-system in the RVSP Client
application simultaneously analyze fluctuations of multiple video,
device and network parameters, via measurement and/or prediction,
in order to continuously update and implement an overall real-time
video adaptation strategy. FIG. 10 illustrates the RTA Subsystem
inputs and control outputs.
[0098] Device Impairments: The RTA sub-system analyzes the behavior
of uncompressed and compressed audio and video frames being
generated, transmitted, stored, and processed within the device to
detect and adapt to fluctuations in device loading. The RTA
sub-system adapts to the measured fluctuations in device loading
via corresponding [0099] (i) internal modifications to the target
compressed frame rate to be generated and sent from the device to
another RVSP-enabled user. [0100] (ii) requested modifications to
the target compressed frame rate to be generated and sent to the
device from another RVSP-enabled user.
[0101] Network Impairments: The RTA sub-system analyzes the audio
and video RTP media packets and corresponding RTCP control packets
being generated within, and transmitted between, RVSP-enabled
client devices, in order to measure/predict and adapt to
fluctuations in the network. The RTA sub-system adapts to the
measured fluctuations in network performance via corresponding
modifications to [0102] (i) Targeted uncompressed video frame rate
(fps) to be delivered by the camera to the DTV-X video encoder.
[0103] (ii) Targeted compressed video bits/frame to be delivered by
the DTV-X video encoder. Several encoding parameters determine the
compressed video bits/frame, including: [0104] Quantization
parameter Q [0105] progressive refresh parameters [0106] saliency
parameters [0107] PN frame ratios [0108] I frame insertion [0109]
(iii) Targeted video data packet size to be generated by the RVSP
client application's RTP/RTCP module for network transmission to
another user. [0110] (iv) Video frame/stream format requested from
other user: [0111] Send/resend I frame [0112] (v) Frame buffers in
RVSP media framework [0113] (vi) Packet buffers in RVSP RTP stack
[0114] (vii) RTCP messages in RVSP RTP stack
[0115] Inherent Video Characteristics: The DTV-X video encoder
analyzes frame-to-frame variations in the inherent compressibility
of uncompressed video frame sequences being delivered from the
camera module, and communicates this information to the RTA
sub-system. The RTA sub-system utilizes this information to prevent
the RVSP client from attempting to drive target bits/frame or
frames/second to unnecessarily high or unattainably low levels
during a call session. The inherent compressibility will vary with
the relative degree of luma and chroma detail and/or the relative
degree of motion in a sequence of video frames.
[0116] Successful real-time adaptation within the RVSP Client
application requires that the above analysis and feedback be
implemented as a set of collaborating processes within and between
the RTA sub-system, the DTV-X video codec, the RTP/RTCP module, and
the Session Control module. During a real-time video session, the
RTA sub-system first determines device and network limitations
during call setup and capabilities exchange between the
participating devices. Once a real-time video session has been established,
the RTA sub-system continues to analyze and adapt to fluctuations
in device and network impairments and video parameters.
[0117] Determining Device/Network Limitations during Call Setup:
During call setup and capabilities exchange, the RVSP client
application determines the media bandwidth appropriate to the
targeted user experience that can be supported by the device(s) and
network(s) participating in the video call session. For each video
call session, this bits/second target is then utilized by the RVSP
client application(s) to establish [0118] the initial video frame
resolution and frame rate targets (for camera interfacing) [0119]
the initial bits/frame target (for DTV-X codec interfacing) [0120]
the initial bytes/packet target (for RTP/RTCP module
interfacing).
[0121] The initial bits/second, frames/second, bits/frame, and
bytes/packet targets should not be chosen to correspond to the
maximum rates that are expected to be supported by the device(s) and
network(s) participating in the video call session. Instead, the
initial targets should be chosen to guarantee a high probability
that they will actually be met, in order to avoid prolonged periods
at call startup during which the RTA sub-system is "out of target"
and delivering a degraded user experience.
[0122] Analyzing Device/Network Impairments and Video Parameters
during Call:
[0123] The following RTA-related parameters are measured during
each video call:
[0124] Device Impairments
[0125] i. Camera Speed Degradation
[0126] Measured input is the difference between the uncompressed
video frame rate requested from the camera and the actual
uncompressed video frame rate that is delivered to the RVSP
application for processing by the DTV-X video encoder.
[0127] Used to determine the maximum video frame rate that can be
requested.
[0128] ii. Device Loading on Send Channel
[0129] Measured input is the fraction of the uncompressed video
frames delivered by the camera that arrive within a time window
suitable to be further processed by the DTV-X encoder.
[0130] Used to determine the maximum video frame rate that can
actually be encoded and sent.
[0131] iii. Device Loading on Receive Channel
[0132] Measured input is the fraction of the compressed video
packets successfully received and re-assembled into complete video
frames by the RVSP application within a time window suitable to be
further processed by the DTV-X decoder.
[0133] Used to determine the maximum video frame rate that can
actually be decoded and displayed.
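The device-impairment inputs (i) and (ii) above can be combined into simple derived limits. A minimal sketch in Python; all function and variable names are illustrative, not taken from the RVSP client itself:

```python
def device_loading_metrics(requested_fps, delivered_fps,
                           frames_delivered, frames_encodable):
    """Sketch of device-impairment inputs (i) and (ii).

    Camera speed degradation (i): requested minus delivered camera
    frame rate, bounding the frame rate worth requesting.
    Send-channel loading (ii): fraction of delivered frames arriving
    in time for the encoder, bounding the rate that can be sent.
    """
    camera_degradation = requested_fps - delivered_fps
    send_loading = (frames_encodable / frames_delivered
                    if frames_delivered else 0.0)
    max_requestable_fps = delivered_fps          # (i) cap on camera requests
    max_encodable_fps = delivered_fps * send_loading  # (ii) cap on send rate
    return camera_degradation, max_requestable_fps, max_encodable_fps
```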
[0134] Network Impairments
[0135] iv. Network Congestion
[0136] Measured inputs are the RTCP reports and internal video and
audio packet output status utilized by each device to determine the
number of packets that the device itself has already sent but are
still in transit to the target receiving device.
[0137] Used to estimate available network bandwidth in order to
update target bit rate in bits/sec and packet size in bytes/packet.
Our own measurements have revealed that the correlation between the
transmission of audio and video packets on mobile networks is poor
(FIG. 11). Packet tracing added to the DTME to report the number of
Audio and Video packets in transit at any given time under multiple
controlled and uncontrolled network conditions has shown that the
fractional in-transit Audio and Video packet counts are not well
correlated, and that both show a significant dependence on packet
size for any given level of network congestion. At higher levels of
network congestion, smaller video packet sizes result in improved
overall video throughput. At lower levels of network congestion,
efficient video throughput can be maintained with larger packet
sizes.
[0139] Poor correlation between Audio and Video packets in transit
can be used as an indication that network congestion is high and
that the Video packet size is not small enough to ensure efficient
video throughput at the current level of network congestion.
[0140] v. Uplink and Downlink Network Packet Loss
[0141] Measured inputs are the RTCP reports indicating the fraction
of packets lost.
[0142] Used as an additional input to gauge the network congestion
and the corresponding effective real-time network bandwidth. Also
used to modify "aggressiveness" of progressive refresh and PN frame
ratio.
[0143] vi. Uplink and Downlink Network Jitter
[0144] Measured input is the difference between arrival-time
intervals (between successive packets, observed as they arrive on
the receiver device) and capture-time intervals (between successive
packets, as indicated by timestamps written by the sender device).
These difference measurements are processed using a rolling average
filter to calculate a "recently observed jitter".
[0145] Used to adapt the depth of the RVSP jitter buffer in order
to support packet re-ordering.
[0146] vii. Roundtrip, Uplink, and Downlink Network Delays
[0147] Measured via NTP-based time values provided in RTCP sender
report--RFC 1889 section 6.3.1 (also see FIG. 2: Example for
round-trip time computation). Let SSRC_r denote the receiver
issuing this report. Source SSRC_n can compute the round-trip
propagation delay to SSRC_r by recording the time A when this
reception report block is received. It calculates the total
round-trip time A-LSR using the last SR timestamp (LSR in the RTCP
Sender report) field, and then subtracting the delay since the last
SR was sent (DLSR in the RTCP Sender report field). The round-trip
propagation delay is then given as (A-LSR-DLSR).
[0148] May be used to estimate signaling delay that must be
accounted for if/when an I-Frame resend request is made from one
device to another.
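The (A-LSR-DLSR) computation above can be sketched as follows. Per RFC 1889, all three values are the "middle 32 bits" of an NTP timestamp (16.16 fixed point, units of 1/65536 second); the wraparound mask is an implementation detail assumed here:

```python
def rtcp_round_trip_seconds(arrival_ntp, lsr, dlsr):
    """Round-trip propagation delay A - LSR - DLSR (RFC 1889 sec. 6.3.1).

    `arrival_ntp` is the local arrival time A of the reception report
    block, `lsr` the echoed Last SR timestamp, `dlsr` the Delay since
    Last SR, all as 32-bit middle NTP timestamps. The 32-bit mask
    handles timestamp wraparound.
    """
    rtt_units = (arrival_ntp - lsr - dlsr) & 0xFFFFFFFF
    return rtt_units / 65536.0
```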
[0149] Inherent Video Characteristics
[0150] viii. Video Frame Compressibility
[0151] Measured via the internal compression parameters
(quantization levels, prediction mode decisions, and others) and
the resulting actual compressed frame size generated by the DTV-X
encoder. The inherent compressibility will vary with the relative
degree of luma detail, chroma detail, brightness, contrast, and/or
the relative degree of motion in a sequence of video frames.
[0152] Used to alter the trade-off between bits/frame and
frames/second targets during extended sequences of "highly
compressible" frames. When frames are highly compressible, it may
be advantageous to allocate fewer bits to each compressed image and
choose a higher frame rate for smoother motion, within the given
overall bit rate target. Conversely, when frames are difficult to
compress well, it may be advantageous to reduce the frame rate and
allocate more bits to each compressed frame.
[0153] ix. Relative Degree of Luma and Chroma Detail
[0154] Measured from the quantization levels and resulting
compressed size in different wavelet transform subbands for current
video frame input to DTV-X encoder.
[0155] Used to determine minimum bits/frame required to provide
good image fidelity.
[0156] x. Relative Degree of Motion
[0157] Measured from motion channel in saliency map for current
video frame input to DTV-X encoder.
[0158] Used to determine the minimum frame rate required to provide
good motion fidelity, and to support lower frame rates and higher
bits/frame targets during extended sequences of "low motion"
frames. The motion channel of the saliency map generation compares
a filtered version of the current frame's luma against the same
version of the previous frame or frames, estimating motion based on
the magnitude of differences in the comparison.
[0159] Adaptation to Device/Network/Video Fluctuations during Call:
The RVSP RTA sub-system processes the above inputs from the camera
module, DTV-X codec, RTP/RTCP module, and RVSP packet/frame buffers
in order to update its estimates of parameters (i)-(x) above.
[0161] Adapting the Jitter Buffer Depth: Based on the updated
estimate of (vi), the RTA sub-system then either maintains or
updates the RVSP packet/frame jitter buffer depth(s). If excessive
jitter bursts are detected, and these cannot be accommodated by
packet re-ordering in the jitter buffer set to its maximum depth,
then the corresponding packets must be treated by the RVSP client
as if they were lost. The RTA sub-system may send a request to the
other user to send a new I-frame in order to reset the decode
process at the receiver impacted by the burst. The roundtrip
network delay estimate (vii) provides the device with a lower limit
on how long it must expect to wait for the requested I-frame to be
delivered, and thus how long it must rely on alternative mechanisms
(saliency/progressive refresh/V-frames) to deal with the high
packet loss.
[0162] Adapting to Video Frame Compressibility and Degree of
Motion: Based on the updated estimates of (viii)-(x) above, the RTA
sub-system then either maintains or updates the bits/frame and
frames/sec targets. In order to deliver the best user experience
using the least device and network resources, the RTA sub-system
can maintain lower bits/frame targets during extended sequences of
"highly compressible" frames (low relative degree of Luma and
Chroma detail), and/or lower frames/sec targets during extended
sequences of "low motion" frames.
[0163] RTA Sub-System Implementation
[0164] Key Modules
[0165] As shown in FIGS. 10 and 12, the RTA subsystem includes the
following modules: [0166] Automatic Bit Rate Adjustment (ABA)
[0167] Rate Function [0168] Frame Rate Regulator [0169] Compression
Regulator [0170] Jitter Buffer Control [0171] Packet Size Control
[0172] Codec Control
[0173] Automatic Bit Rate Adjustment (ABA): The ABA evaluates two
measurements of the network performance to determine the target bit
rate for video transmission: [0174] (i) Packet Loss Analysis--The
receiver maintains a count of the received packets and a count of
the gaps in the packet sequence numbering--which are lost packets.
Periodically, the receiver sends a report to the sender with the
ratio of lost packets. [0175] (ii) Network Buffer Fill Level
Analysis--The receiver periodically sends a report to the sender with
the sequence number of the last received packet. The sender
compares this number to the last sent packet sequence number to
approximate the number of packets remaining on the network en route
to the receiver.
[0176] The ABA compares this bit rate target against the peer's
computation to ensure this unit does not consume a disproportionate
amount of the available network bandwidth. The ABA unit
periodically notifies its peer with its determined target bit rate
for video transmission. The peer compares its own target to this
value and when its value is significantly larger, the peer lowers
its own target correspondingly.
[0177] Rate Function: The Rate Function converts the target bit
rate into a corresponding combination of frame rate and bytes/frame
for the encoder output. As shown in FIG. 13, the rate function
incorporates the following parameters: [0178] Minimum bit rate
(bits/sec) [0179] Maximum bit rate (bits/sec) [0180] Minimum frame
rate (fps) [0181] Maximum frame rate (fps) [0182] Minimum
compression level (bytes/frame) [0183] Maximum compression level
(bytes/frame)
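The patent lists the six min/max parameters of the Rate Function but not the mapping itself. A hypothetical sketch, assuming the frame rate is interpolated linearly across the allowed bit-rate range and the bytes/frame target is whatever the remaining bit budget allows, clamped to its bounds:

```python
def rate_function(target_bps, min_bps, max_bps,
                  min_fps, max_fps, min_bpf, max_bpf):
    """Convert a target bit rate into (frames/sec, bytes/frame) targets.

    Illustrative mapping only: clamp the bit rate, interpolate the
    frame rate across [min_fps, max_fps], then derive bytes/frame
    from the per-frame byte budget, clamped to [min_bpf, max_bpf].
    """
    bps = max(min_bps, min(max_bps, target_bps))
    frac = ((bps - min_bps) / (max_bps - min_bps)
            if max_bps > min_bps else 1.0)
    fps = min_fps + frac * (max_fps - min_fps)
    bpf = max(min_bpf, min(max_bpf, (bps / 8.0) / fps))
    return fps, bpf
```

The Compression Regulator described below can then shift this balance toward higher fps (compressible content) or higher bytes/frame (low-motion content).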
[0184] Frame Rate Regulator: Because the output frame rate from the
camera modules on many smartphones and tablets is often irregular,
the Frame Rate Regulator provides intermediate frame
buffering/processing in order to ensure that the DTV-X video
encoder receives video frames at the fps rate targeted by the Rate
Function.
[0185] Compression Regulator: The Compression Regulator monitors
the encoder output and modulates the frames/sec and bytes/frame
targets based on the recent frame compressibility history provided
by the video encoder. The goal is to deliver the best user
experience using the least device and network resources. For
example, the RTA sub-system can maintain lower bits/frame and
higher frames/second targets during extended sequences of "highly
compressible" frames (low relative degree of Luma and Chroma
detail), and/or lower frames/sec and higher bits/frame targets
during extended sequences of "low motion" frames. Additionally, the
Compression Regulator monitors and compares the actual uncompressed
video frame rate delivered by the Camera and the actual compressed
video frame rate delivered by the Encoder, and adjusts the
bytes/frame target to achieve the target bit rate. The Compression
Regulator can thus modify the Rate Function described above.
[0186] Jitter Buffer Control: The Jitter Buffer Control measures
the difference between arrival-time intervals (between successive
packets, observed as they arrive on the receiver device) and
capture-time intervals (between successive packets, as indicated by
timestamps written by the sender device). These difference
measurements are processed using a rolling average filter to
calculate a "recently observed jitter". If the recently observed
jitter increases, the temporal depth of the RVSP jitter buffer in
the RTP/RTCP module is increased in order to support packet
re-ordering over a larger number of packets. If the recently
observed jitter decreases, the temporal depth of the RVSP jitter
buffer in the RTP/RTCP module is decreased correspondingly.
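The jitter tracking and depth adaptation above can be sketched as follows. The exponential filter gain, the depth-sizing multiple, and the clamping constants are assumptions; the patent specifies only a "rolling average filter" and a depth that grows and shrinks with observed jitter:

```python
class JitterBufferControl:
    """Track 'recently observed jitter' and adapt the buffer depth."""

    def __init__(self, alpha=0.9, min_depth=2, max_depth=32,
                 packet_interval=0.020):
        self.alpha = alpha                      # filter gain (assumed)
        self.min_depth = min_depth              # depth bounds in packets
        self.max_depth = max_depth
        self.packet_interval = packet_interval  # nominal packet spacing (s)
        self.jitter = 0.0
        self.prev_arrival = None
        self.prev_capture = None

    def on_packet(self, arrival_time, capture_timestamp):
        if self.prev_arrival is not None:
            # Difference between arrival-time and capture-time intervals.
            d = abs((arrival_time - self.prev_arrival)
                    - (capture_timestamp - self.prev_capture))
            self.jitter = self.alpha * self.jitter + (1 - self.alpha) * d
        self.prev_arrival = arrival_time
        self.prev_capture = capture_timestamp

    def buffer_depth(self):
        # Packets needed to re-order across the observed jitter (assumed 2x).
        depth = round(2 * self.jitter / self.packet_interval)
        return max(self.min_depth, min(self.max_depth, depth))
```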
[0187] Packet Size Control: The maximum transmission unit (MTU) is
the largest packet size that can be transmitted over a network.
Occasionally, the size of the video frame exceeds this maximum and
the frame is split across several packets. The number of packets is
first determined and then the frame is split evenly across that
number of packets. Packet size can also be reduced/increased to
enable more efficient video transmission as network impairments
increase/decrease.
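The even split described above (minimum packet count, then as-equal-as-possible sizes) can be sketched as follows; header overhead is ignored for simplicity:

```python
import math

def split_frame(frame: bytes, mtu: int) -> list:
    """Split a compressed video frame evenly across the minimum
    number of packets whose payloads each fit within the MTU."""
    n = max(1, math.ceil(len(frame) / mtu))  # packets required
    base, extra = divmod(len(frame), n)      # even split: sizes differ by <= 1
    packets, offset = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        packets.append(frame[offset:offset + size])
        offset += size
    return packets
```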
[0188] Codec Control: The DTV-X video codec encoder accepts video
frames (images) in sequence and produces compressed representations
of them for transmission to the DTV-X video codec decoder. It has
various control inputs and information outputs, in addition to the
input and output of video frames, as can be seen in FIG. 14.
[0189] With each frame to be compressed, the encoder accepts a
frame type request that can be "I-Frame", "P-Frame", or (in some
embodiments) "V-Frame". These designate options in the encoding
process and the format of the resulting compressed frame. The
encoder will produce an I-frame when requested. It may produce an
I-frame when a P-frame was requested, if the compression process
produces a better result as an I-frame; for example, in the case of
a sudden scene change or cut. V-frames are reduced representations
that should be used only in between I-frames or P-frames; in some
embodiments, the encoder may produce a V-frame when a P-frame was
requested.
[0190] With each frame to be compressed, the encoder accepts a
target for the compressed size of the frame. This target may not be
met exactly; if the video is changing in compressibility character,
the actual compressed size may differ from the target by a
significant factor.
[0191] The encoder accepts an input indication of the desired
strength of Progressive Refresh, which is the fraction of each
P-frame that should be compressed without reference to prior frames
or encoding state. The reason for this is that if a frame is lost
or cannot be decoded for any reason, the decoder will not have
available the necessary state reference for correctly decoding
frames that follow the missing frame. Progressive refresh allows
the reference state to be refreshed partially in every P-frame, so
that it is not necessary to send I-frames periodically. This makes
the frame size and transmission bit rate more uniform, and adds
robustness against lost packets.
[0192] With each compressed frame that it produces as output, the
encoder delivers an indication of the frame type actually used for
this frame, whether I-Frame, P-Frame, or V-Frame.
[0193] With each compressed frame that it produces as output, the
encoder delivers an indication of the actual size to which the
frame was compressed. This may be compared with the size target
that was given as input, and used in a rate control tracking loop
to keep actual rate within tighter long-term bounds than the
codec's single frame size targeting ability.
[0194] With each compressed frame that it produces as output, the
encoder delivers an estimate of the compressibility of the frame.
This can be used in deciding how to balance frames-per-second
against bits-per-frame in the rate control process.
[0195] With each compressed frame that it produces as output, the
encoder delivers an estimate of the motion activity of the frame,
and of the detail levels in the frame. These can be used in
deciding how to balance frames-per-second against bits-per-frame in
the rate control process.
[0196] RTA Bit Rate Adjustment Algorithm Description
DEFINITIONS
[0198] Packet Loss rate is the fraction of the total transmitted
packets that do not arrive at the intended receiver.
[0199] Network Buffer Fill Level is the number of bytes (or in the
case of uniform packet sizes--the number of packets) currently in
transmission through the network.
[0200] Packet Loss Analysis: The packet loss ratio is taken
directly from the `fraction lost` field in the RTCP Sender or
Receiver Report packet (SR: Sender report RTCP packet--Paragraph
6.3.1 of RFC 1889; RR: Receiver report RTCP packet--Paragraph 6.3.2
of RFC 1889). This value is average filtered:

LR_new = alpha_L * LR_old + (1 - alpha_L) * LR_net (1)

[0201] where: [0202] LR_new is the newly filtered Packet Loss
ratio value; [0203] LR_old is the previous Packet Loss ratio value;
[0204] LR_net is the Packet Loss ratio value from the RTCP receiver
report; [0205] alpha_L is a parameter specifying how aggressively
the algorithm reacts to the latest reported value, and
0 <= alpha_L <= 1.
[0206] Network Buffer Fill Level Analysis:
[0207] The sender keeps track of the latest transmitted packet
sequence number. The receiver reports the latest received packet
sequence number in its RR report. The sender subtracts its number
from the receiver's number to calculate the amount of data
currently in transmission through the network. Since the report's
value is inherently offset by the network delay between receiver
and sender, the difference defines an upper estimate of the network
buffer fill level. This value is average filtered:
N_new = alpha_N * N_old + (1 - alpha_N) * N_net (2)

[0208] where: [0209] N_new is the newly filtered Network Buffer
fill level value; [0210] N_old is the previous Network Buffer
fill level value; [0211] N_net is the freshly calculated
Network Buffer fill level value; [0212] alpha_N is a
parameter specifying how aggressively the algorithm reacts to the
newly calculated value, and 0 <= alpha_N <= 1.
[0213] Because the maximum Network Buffer fill level is not known,
the latest network value is compared to the previous averaged value
and this difference becomes the final result:

N_fill = (N_net - N_old) / N_new (3)
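Equations (2) and (3) can be sketched together; the guard against division by zero and the default alpha_N are added assumptions:

```python
def update_fill_level(n_old, n_net, alpha_n=0.5):
    """Filter the network buffer fill level and derive the relative
    fill metric per equations (2) and (3):

        N_new  = alpha_N * N_old + (1 - alpha_N) * N_net
        N_fill = (N_net - N_old) / N_new

    Returns (N_new, N_fill).
    """
    n_new = alpha_n * n_old + (1.0 - alpha_n) * n_net
    n_fill = (n_net - n_old) / n_new if n_new else 0.0
    return n_new, n_fill
```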
[0214] Packet Loss Adjustment Strategy:
[0215] Network packet loss is defined to be in one of the following
three states: [0216] Congested: The Packet Loss ratio is high and
the transmission quality is low. [0217] Fully Loaded: The Packet
Loss ratio is affordable and the transmission quality is good.
[0218] Under Loaded: The Packet Loss ratio is very small or
zero.
[0219] The estimate of the network packet loss conditions is based
on the relative values of the filtered Packet Loss ratio, LR_new,
and two threshold values, LR_c (congested Packet Loss ratio) and
LR_u (under-loaded Packet Loss ratio):

if (LR_new >= LR_c) -> network congestion
if (LR_c > LR_new >= LR_u) -> network fully loaded
if (LR_u > LR_new) -> network under loaded (4)
[0220] According to one exemplary embodiment, the above parameters
can be:
[0221] alpha_L = 0.5, LR_c = 0.05, LR_u = 0.02.
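The three-state classification in (4), using the exemplary thresholds above, can be sketched as:

```python
def packet_loss_state(lr_new, lr_c=0.05, lr_u=0.02):
    """Classify network packet loss per condition (4); the default
    thresholds are the exemplary values LR_c = 0.05, LR_u = 0.02."""
    if lr_new >= lr_c:
        return "congested"
    if lr_new >= lr_u:
        return "fully loaded"
    return "under loaded"
```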
[0222] Network Buffer Fill Level Adjustment Strategy:
[0223] Network congestion is defined to be in one of the following
four states: [0224] Congested: The network fill level is high and
the transmission quality is low. [0225] Fully Loaded: The network
fill level is affordable and the transmission usage is good. [0226]
Under Loaded: The network fill level is underutilized and
transmission should increase slightly. [0227] Very Under Loaded:
The network fill level is underutilized and the transmission should
increase significantly.
if (N_fill >= N_c) -> network congestion
if (N_c > N_fill >= N_u) -> network fully loaded
if (N_u > N_fill >= N_vu) -> network under loaded
if (N_vu > N_fill) -> network very under loaded (5)
[0228] In one exemplary embodiment, the above parameters are set
to: [0229] N_c = 0.75; N_u = 0.45; N_vu = 0.15
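The four-state classification in (5), with the exemplary thresholds above, can be sketched as:

```python
def fill_level_state(n_fill, n_c=0.75, n_u=0.45, n_vu=0.15):
    """Classify network congestion from the relative buffer fill level
    per condition (5); defaults are the exemplary thresholds."""
    if n_fill >= n_c:
        return "congested"
    if n_fill >= n_u:
        return "fully loaded"
    if n_fill >= n_vu:
        return "under loaded"
    return "very under loaded"
```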
[0230] Combined Adjustment:
[0231] The combined algorithm includes both of the above two
algorithms. Because "Network Buffer Fill Level" provides a more
sensitive prediction of network congestion than "Packet Loss
Ratio", the RTA uses "Network Buffer Fill Level" as a primary
adjustment, and "Packet Loss Ratio Adjustment" as a secondary
adjustment, according to the following specific conditions.
if (N_fill >= N_c) -> use higher of two adjustments
if (N_c > N_fill >= N_u) -> use Network Buffer Fill Level
adjustment (6)
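Condition (6) leaves the behavior below N_u unstated; the sketch below assumes the primary (fill level) adjustment applies there as well. Here `fill_adj` and `loss_adj` are illustrative names for the bit-rate adjustments proposed by the two analyses, where a larger value means a stronger reduction:

```python
def combined_adjustment(fill_adj, loss_adj, n_fill, n_c=0.75, n_u=0.45):
    """Combine the two bit-rate adjustments per condition (6)."""
    if n_fill >= n_c:
        # Congested: take the higher (stronger) of the two adjustments.
        return max(fill_adj, loss_adj)
    # Fully loaded or below: the Network Buffer Fill Level adjustment
    # is primary; below N_u it is assumed to apply as well.
    return fill_adj
```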
[0232] RTA Testing
[0233] Network Test Configuration
[0234] Network configurations used for RTA performance testing and
evaluation are shown in FIGS. 15 and 16. A standard off-the-shelf
Linux-based computer is used to implement the Network Impairment
Router. Network impairment is realized using Traffic Control, a
standard command-line utility on Linux. MasterShaper is a network
traffic shaper that leverages Traffic Control and other Linux
utilities to provide a Web Interface for Quality of Service (QoS)
functions.
[0235] Two devices are used to conduct real-time Peer-to-Peer Video
Call tests. Each device connects to a separate access point,
forcing the video call path to go through the Network Impairment
Router. Shell scripts leveraging Traffic Control/MasterShaper
commands are used to control the Peer-to-Peer Video Call path.
MasterShaper allows predetermined values and fluctuations of the
bandwidth, delay, jitter, and packet loss to be set for the Video
Call path.
[0236] iPerf is installed on both clients and used to validate the
IFPW bandwidth. One iPerf instance is set up as the server and the
other as the client. iPerf performs a test to measure the effective
bandwidth over the network connection between the client and the
server.
[0237] Bandwidth Adaptation Testing
[0238] Bandwidth adaptation test cases demonstrate the RVSP
Client's capability to maintain high quality video under varying
network bandwidth availability. The network bandwidth is first set
to a target (constant) level, and a real-time video call session is
initiated to demonstrate how the RTA sub-system allows the Client
to adapt to the initial network capacity (FIG. 17). Next, the
network bandwidth is varied during the video call session to
demonstrate how the RTA sub-system allows the Client to track and
adapt to variations in the network capacity (FIG. 18).
[0239] Jitter Adaptation Testing
[0240] Network jitter presents a particular challenge for video. In
this test case we induce a jitter of up to 50 ms, and show how the
Client continues to deliver high quality video. Qualitative results
are produced by the Client, and can be observed using eclipse's
logging facility (by connecting the device under test to an eclipse
enabled PC). The Client reports the number of total audio and video
packets that were found out of order, and the degree to which it
was successful sorting the out-of-order packets. For qualitative
results, the video can be observed during the call while jitter is
introduced. The video never freezes. Further, the Client's saliency
capability is used to refresh only the salient parts of the video
when packets are lost. Additionally, by turning jitter on and off,
the relative delay on Handset B is adjusted automatically. Hence the
Client does not rely on a fixed buffer that would introduce needless
delay.
[0241] Packet Loss Adaptation Testing
[0242] All networks are prone to packet loss. This is a particular
problem for wireless networks, or use cases where the packets must
traverse multiple network boundaries to reach the target
destination. In this test case, we implement packet loss rates up
to 5% on the video communication path, and observe the resulting
video quality. Since reducing the bandwidth can also cause the
packet loss rate to vary, the Client bandwidth adaptation
capability (ABA) is turned off for these tests. We turn off
adaptation by selecting the menu button during a video call, and
clicking on "ABA Off" button. The result of this test is
qualitative only. Similar to the jitter test case, the video never
freezes, and in the event of a packet loss, only the salient parts
of the video are refreshed, resulting in a more acceptable user
experience.
[0243] Video Conferencing User Features
[0244] When deployed together, the RVSP Client and Server
applications enable multiple participants to simultaneously create
and share high-quality video with each other in real-time, with
many key aspects of a face-to-face user experience.
[0245] FIG. 19 is an overview diagram of an all-software multi-user
video conferencing system according to one embodiment of the
present invention, with user-enabled voice activity detection.
[0246] FIG. 20 is an overview diagram of an all-software multi-user
video conferencing system according to one embodiment of the
present invention, with the moderator able to select a participant
to be given "the floor" via display at maximum video frame
size/frame rate on all participants' device displays.
[0247] FIG. 21 is an overview diagram of an all-software multi-user
video conferencing system according to one embodiment of the
present invention, with each participant able to select which other
participant will be displayed at maximum video frame size/frame
rate.
[0248] While several embodiments have been shown and described
herein, it should be understood that changes and modifications can
be made to the invention without departing from the invention in
its broader aspects. For example, but without limitation, the
present invention could be incorporated into a wide variety of
electronic devices, such as feature phones, smart phones, tablets,
laptops, PCs, video phones, personal telepresence endpoints, and
televisions or video displays with external or integrated
set-top-boxes (STBs). These devices may utilize a wide variety of
network connectivity, such as 3G, 4G, WiFi, DSL, and broadband.
* * * * *