U.S. patent application number 10/212831 was filed with the patent office on August 5, 2002, and published on February 5, 2004, as publication number 20040022202, for a method and apparatus for continuously receiving images from a plurality of video channels and for alternately continuously transmitting to each of a plurality of participants in a video conference individual images containing information concerning each of said video channels.
Invention is credited to Kuo, Chen-Yuan and Yang, Chih-Lung.
United States Patent Application 20040022202
Kind Code: A1
Yang, Chih-Lung; et al.
February 5, 2004
Method and apparatus for continuously receiving images from a
plurality of video channels and for alternately continuously
transmitting to each of a plurality of participants in a video
conference individual images containing information concerning each
of said video channels
Abstract
A method and apparatus is provided for video conferencing. The
method and apparatus continuously receive frames from a plurality
of video channels and alternately continuously transmit to each of
a plurality of participants in a video conference individual frames
containing information concerning each of the video channels. The
method and apparatus transmit, at any given instant, new picture data for only one of the participants in the video conference.
Inventors: Yang, Chih-Lung (Gilbert, AZ); Kuo, Chen-Yuan (Gilbert, AZ)
Correspondence Address:
TOD R. NISSLE, P.C.
P.O. Box 55630
Phoenix, AZ 85078
US
Family ID: 31187825
Appl. No.: 10/212831
Filed: August 5, 2002
Current U.S. Class: 370/261
Current CPC Class: H04L 65/1101 20220501; H04L 65/765 20220501; H04L 12/1813 20130101; H04L 65/1106 20220501; H04L 9/40 20220501; H04L 65/4038 20130101; H04N 7/15 20130101
Class at Publication: 370/261
International Class: H04L 012/16
Claims
Having described our invention in such terms as to be understood by
those of skill in the art to make and use the invention, and
having described the presently preferred embodiments and best mode
thereof, we claim:
1. A method for receiving and transmitting video data across a
network comprising the steps of: (a) Receiving a call
initialization signal further comprising a codec identifying signal
that corresponds to a codec format, in a network interface from a
first video source; (b) Storing said codec identifying signal in a
memory; (c) Receiving a component video packet stream from said
first video source; (d) Disassembling said component video packet
stream into a component video signal; (e) Forming a composite video
signal from said component video signal said composite video signal
further comprising said codec format; (f) Assembling said composite
video signal into a composite video packet stream further
comprising said codec format; and, (g) Transmitting said composite
video packet stream to said first video source.
2. The method as recited in claim 1 further comprising the steps
of: (a) Receiving a second call initialization signal further
comprising a second codec identifying signal that corresponds to a
second codec format, in said network interface from a second video
source; (b) Storing said second codec identifying signal in said
memory; (c) Receiving a second component video packet stream from
said second video source; (d) Disassembling said second component
video packet stream into a second component video signal; (e)
Forming a second composite video signal from said second component
video signal said second composite video signal further comprising
said codec format; (f) Assembling said second composite video
signal into a second composite video packet stream further
comprising said codec format; (g) Forming a third composite video
signal from said second component video signal said third composite
video signal further comprising said second codec format; (h)
Assembling said third composite video signal into a third composite
video packet stream further comprising said second codec format;
(i) Transmitting said second composite video packet stream to said first video source; and, (j) Transmitting said third composite video packet stream to said second video source.
3. An apparatus for receiving and transmitting video data across a
network comprising: A. a video processing unit, said video
processing unit further comprising: 1. a network interface for
receiving a call initialization signal from a video source, said
call initialization signal further comprising a codec format,
receiving a component video packet stream from said video source
and transmitting a composite video packet stream to said video
source, 2. a memory further comprising a call set-up algorithm for
identifying said codec format of said call initialization signal
and storing said codec format in said memory, 3. a packet driver
for disassembling said component video packet stream into a
component video signal and for assembling a composite video signal
into a composite video packet stream, and 4. a multi-point control
unit for revising said component video signal into said composite
video signal; B. said packet driver coupled to said multi-point
control unit, said memory and said network interface; C. said
multi-point control unit coupled to said memory.
4. The apparatus as recited in claim 3 such that said network
interface receives a second call initialization signal from a
second video source, said second call initialization signal further
comprising a second codec format such that said call setup
algorithm identifies said second codec format and stores said
second codec format in said memory.
5. The apparatus as recited in claim 4 such that said multi-point
control unit sequentially senses whether a second component video
packet stream has been received at said network interface from said
second video source whereby upon receipt of said second component
packet stream said packet driver disassembles said second component
video packet stream into a second component video signal, said
multi-point control unit revises said second component video signal
into a second composite video signal in said codec format and said
second component video signal into a third composite video signal
in said second codec format such that said packet driver assembles
said second composite video signal into a second composite video
packet stream in said codec format and said third composite video
signal into a third composite video packet stream in said second
codec format such that said network interface transmits said second
composite video packet stream in said codec format to said video
source and said network interface transmits said third composite
video packet stream in said second codec format to said second
video source.
6. The apparatus as recited in claim 3 wherein said network
interface further comprises the capability to connect to a
plurality of video sources such that each of said plurality of
video sources transmits a component video packet stream further
comprising a corresponding codec format, such that said network
interface receives said plurality of component video packet streams
from said plurality of video sources and transmits a composite
video packet stream to each of said plurality of video sources in
the same corresponding codec format comprised in the component
video packet streams transmitted from said plurality of video
sources.
7. The method as recited in claim 1 wherein if, within a certain
time frame, said component video packet stream is received as a
video packet stream for an image, then said component video packet
stream from said first video source is received.
8. The method as recited in claim 2 wherein if, within a certain
time frame, said second component video packet stream is received
as a video packet stream for an image, then said second component
video packet stream from said second video source is received.
9. The apparatus as recited in claim 3 wherein said component video
packet stream is a video packet stream for an image.
10. A method for receiving and transmitting video data across a
network comprising the steps of (a) Receiving a video signal from a
first video source, said signal further comprising a codec
identifying signal that corresponds to a first codec format; (b)
Receiving a second video signal from a second video source, said
second signal further comprising a codec identifying signal that
corresponds to a second codec format different from said first
codec format; (c) Forming a composite video signal from said first
and second signals, said composite video signal further comprising
said first codec format; (d) Assembling said composite video signal
into a composite video packet stream further comprising said first
codec format; and, (e) Transmitting said composite video packet
stream to said first video source.
11. A method for receiving and transmitting video data across a
network comprising the steps of (a) Receiving a call initialization
signal in a network interface from a first video source; (b)
Receiving a call initialization signal in said network interface
from a second video source; (c) Receiving a component packet stream
from said first video source; (d) Disassembling said component
video packet stream into a component video signal; (e) Forming a
composite video signal from said component video signal, said
composite video signal further comprising steady state data for
said second video source; (f) Assembling said composite video
signal into a composite video packet; and, (g) Transmitting said
composite video packet stream to said second video source.
12. An apparatus for receiving video data across a network
comprising (a) A display further comprising a screen displaying at
least first and second images; (b) A network interface for
receiving a composite video packet stream further comprising coded
domain data to revise said first image and steady state data
indicating that said second image is unchanged; (c) A packet driver
for disassembling said composite video packet stream into a
composite video signal including said steady state data and said
coded domain data; and, (d) A control unit to revise said first
image in said display with said coded domain data and, based on
said steady state data, to permit said second image to remain
unchanged in said display.
13. An apparatus for receiving video data across a network
comprising (a) A display further comprising a screen displaying at
least first and second images; (b) A network interface for
receiving a composite video packet stream further comprising coded
domain data to revise said first image; (c) A packet driver for
disassembling said composite video packet stream into a composite
video signal including said coded domain data; and, (d) A control
unit to receive said composite video signal and revise only said
first image in said display with said coded domain data.
14. An apparatus for receiving and transmitting video data across a
network comprising a video processing unit, said video processing
unit further comprising: A. a network interface for (1) receiving a
call initialization signal from a video source, said call
initialization signal further comprising a codec format, (2)
receiving a component video packet stream from said video source,
(3) receiving a call initialization signal from a second video
source, said call initialization signal further comprising a second
codec format, and (4) transmitting a composite video packet stream;
B. a packet driver coupled to said network interface for
disassembling said component video packet stream into a component
video signal and for assembling said composite video signal into a
composite video packet stream; C. a memory coupled to said packet
driver and further comprising a call set-up algorithm for
identifying the codec formats of said call initialization signals
and storing said codec formats in said memory; and, D. a
multi-point control unit coupled to said packet driver and said
memory for revising said component video signal into said composite
video signal further comprising steady state data for said second
video source.
15. The apparatus of claim 14 wherein said first and second codec
formats are the same.
16. The apparatus of claim 14 wherein said first and second codec
formats are different.
Description
[0001] This invention relates to video conferencing.
[0002] More particularly, the invention relates to a method and
apparatus for video conferencing which significantly simplifies and
reduces the expense of video conferencing equipment which
continuously receives a video signal from each of two or more
participants, combines the video signals into a single composite
video signal, and retransmits to each of the participants the
composite video signal so that each participant can simultaneously view himself or herself on a video screen along with the other participants.
[0003] In a further respect, the invention relates to a method and
apparatus for video conferencing which receives a video signal from
a participant and alters the headers and coded domain data, if
necessary, in the signal without altering, in whole or in part, the
pixel domain data which defines the picture transmitted with the
signal.
[0004] In another respect, the invention relates to a method and
apparatus for video conferencing which transmits to participants
only the new information in one video channel at a time.
[0005] Video conferencing permits two or more participants to
communicate both verbally and visually. The use of equipment which
permits video conferencing has experienced only moderate growth in
recent years because of cost, bandwidth limits, compatibility
problems, and the limited advantages inherent in face-to-face
meetings as compared to the traditional audio conference
accomplished via telephone.
[0006] Many commercially available video conferencing systems, including those video units which use the H.320, H.323 and H.324 envelope protocols for call set-up, call control, and audio and video coding-decoding or codec formats (H.320 is the protocol for ISDN networks, H.323 for LAN networks, and H.324 for standard phone or POTS connections), only provide point-to-point video conferencing. Involving these point-to-point video conferencing systems in a multi-point video conference requires the use of an MCU (multi-point control or conference unit). An MCU can operate
either in a switched presence mode or continuous presence mode. In
switched presence mode, only one video stream is selected and
transmitted to all the participants based either on the audio
signal or "chairman" switch control. In continuous presence mode,
the MCU receives component video signals from each participant in a
video conference and combines the signals to produce a single
composite signal, and sends the composite signal back to each
participant (see FIG. 1). The composite signal enables each participant to view on one screen the pictures of the other participants along with his or her own picture on a real-time basis using a split-screen. The sophisticated structure and large computation power of an MCU presently require, as a rule, that it reside on a central server. Some providers of MCU systems claim
that their MCU software can be operated on a desktop personal
computer (PC). However, such MCU systems apparently support only
the switched presence multi-point operation or they produce video
streams in proprietary formats which require each participant to
install special video conferencing software or apparatus.
[0007] Some of the factors that have made conventional MCU systems
complicated follow:
[0008] 1. The H.263 codec format permits the continuous presence
mode. In the continuous presence mode, an MCU receives 4 video
streams from the participants, makes some header changes, and sends them back without combining them. The computer or other apparatus of each participant needs to decode and display all four video streams to see the pictures of all the participants. The H.261 codec format does not, however, permit the continuous presence mode. H.261 is the required codec format for an H.323 video unit; H.263 is an optional codec format. In addition, some existing
systems that run H.263 do not support the continuous presence mode
which is optional in H.263.
[0009] 2. Most existing video conferencing systems provide only
point-to-point video conferencing.
[0010] 3. An MCU system can provide continuous presence multi-point
video conferencing only if it can combine several incoming video
streams into a single composite outgoing video stream that can be
decoded by the equipment which receives the outgoing video
stream.
[0011] 4. When an MCU system combines several incoming video
streams, difficulties arise:
[0012] a. Incoming streams may use different codec formats, e.g., H.261 or H.263.
[0013] b. Even if incoming streams have the same codec format, they
may have different picture types, e.g., I picture or P picture.
[0014] c. Even if incoming streams have the same codec format and
the same picture type, they each may have or utilize different
quantizers. This makes the adjustment of the DCT coefficients
necessary and at the same time introduces errors.
[0015] d. Video frames in each of the video channels ordinarily
arrive at different times. When the MCU awaits the arrival of a
frame or frames from each video channel, a time delay results.
[0016] e. If the MCU waits for the arrival of a frame or frames
from each video channel, operation of the MCU is, in substance,
controlled by the channel with the slowest frame rate.
[0017] f. An existing technique for solving the non-synchronized
frame rate problem mentioned above is to substitute the slower
channels with the previous images, so that the faster channels are updated while the slower ones remain the same. But this practice takes a significant amount of memory for buffering the images, and it may mean each image has to be fully decoded and re-encoded.
[0018] Accordingly, it would be highly desirable to provide an
improved video conferencing system which could, in essence, provide
continuous presence multi-point video conferencing while avoiding
some or all of the various problems in prior art MCU systems.
[0019] Therefore, it is a principal object of the instant invention
to provide an improved video conferencing system.
[0020] A further object of the invention is to provide an improved
method and apparatus for providing a continuous presence
multi-point video conferencing system.
[0021] Another object of the invention is to provide an improved
continuous presence multi-point video conferencing system which
significantly simplifies and reduces the expense of existing
multi-point video conferencing systems.
[0022] These, and other and further and more specific objects of
the invention will be apparent to those skilled in the art based on
the following description, taken in conjunction with the drawings,
in which:
[0023] FIG. 1 is a diagram illustrating the relationship between an
MCU and video sources in a multi-point, continuous presence video
conferencing system;
[0024] FIG. 2 is a diagram illustrating the screen of a participant
in a video conferencing system constructed in accordance with the
invention;
[0025] FIG. 3 is a diagram illustrating the information contained
in the outgoing composite H.263 video stream when the upper left
quadrant of the video image is being changed, when the upper right
quadrant of the video image is being changed, when the lower left
quadrant of the video image is being changed, and when the lower
right quadrant of the video image is being changed;
[0026] FIG. 4 is a diagram illustrating an incoming QCIF frame for
the upper left quadrant of an outgoing composite H.261 CIF video
stream and indicating information contained in the outgoing
composite H.261 CIF video stream, illustrating an incoming QCIF
frame for the upper right quadrant of an outgoing composite H.261
CIF video stream and indicating information contained in the
outgoing composite H.261 CIF video stream, illustrating an incoming
QCIF frame for the lower left quadrant of an outgoing composite
H.261 CIF video stream and indicating information contained in the
outgoing composite H.261 CIF video stream, and, illustrating an
incoming QCIF frame for the lower right quadrant of an outgoing
composite H.261 CIF video stream and indicating information
contained in the outgoing composite H.261 CIF video stream;
[0027] FIG. 5 illustrates the information contained in a composite
CIF video stream produced from an incoming QCIF I picture and an
incoming QCIF P picture using the H.263 codec format;
[0028] FIG. 6 is a diagram illustrating how the group number (GN)
in an incoming QCIF frame may be changed when the QCIF frame coded
in H.261 is incorporated in an outgoing composite CIF video stream
coded in H.261;
[0029] FIG. 7 is a diagram illustrating an exemplary implementation
of the current invention in a network that involves multiple video
terminals and that includes an MCU remote from the video
terminals;
[0030] FIG. 8 is a diagram illustrating an exemplary implementation
of the current invention in a network that involves multiple video
terminals and that includes an MCU associated with one of the video
terminals; and,
[0031] FIG. 9 is a diagram illustrating a video terminal
constructed in accordance with the invention.
[0032] The current invention can be implemented in either line
switching networks or packet switching networks. In a packet
switching network, video data are segmented into packets before
they are shipped through the network. A packet is an individual
object that travels through the network and contains one or a
fraction of a picture frame. The header of each packet provides
information about that packet, such as whether it contains the end of a frame. With this end-of-frame packet and, if applicable, the previous packets, the MCU has all the data for a new picture frame. Therefore, an MCU can tell whether a new frame has been received on a video channel just by reading the packet header. Also, at the very beginning of a video conference, before any video packet can be sent, there is a call setup process which checks each participant's capabilities, such as what kind of video codec and what kind of audio codec are used. Once the call setup is done, each video channel carries a video stream in only a certain standard codec format, i.e., H.261 or H.263.
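As a rough illustration of the two mechanisms just described (reading packet headers to detect a complete frame, and fixing each channel's codec at call set-up), the following Python sketch uses hypothetical Packet and ChannelState structures; the field names and helper functions are assumptions made for illustration and do not reproduce any particular packet or signalling format.

```python
from dataclasses import dataclass, field

@dataclass
class Packet:
    channel: int          # which participant's video channel the packet belongs to
    end_of_frame: bool    # header flag: this packet completes a picture frame
    payload: bytes

@dataclass
class ChannelState:
    codec: str                                     # codec fixed at call set-up, e.g. "H.261" or "H.263"
    pending: list = field(default_factory=list)    # packets of the frame currently being assembled

def on_call_setup(states: dict, channel: int, codec: str) -> None:
    """Call set-up: record the codec each channel will use for the whole conference."""
    states[channel] = ChannelState(codec=codec)

def on_packet(states: dict, pkt: Packet):
    """Reading only the packet header tells the MCU when a complete new frame has arrived."""
    st = states[pkt.channel]
    st.pending.append(pkt.payload)
    if pkt.end_of_frame:
        frame = b"".join(st.pending)   # all the data for the new picture frame
        st.pending = []
        return frame                   # hand the coded frame to the MCU
    return None
```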
[0033] Briefly, in accordance with the invention, we provide an
improved method for receiving frames from at least first and second
incoming video channels and for alternately continuously
transmitting individual frames in at least a first outgoing video
stream to a first equipment apparatus for receiving the first video
stream and generating a video image including pictures from both of
the incoming video channels, and a second outgoing video stream to
a second equipment apparatus for receiving the second video stream
and generating a video image including pictures from both of the
incoming video channels. The method includes the steps of matching
the codec format of the new frame, when there is a new frame
available, to that of at least the first equipment apparatus;
generating, after matching the codec format of the new frame to
that of the first equipment apparatus, a revised frame by altering
at least one header and coded domain data, if necessary, in the
available frame according to a selected picture format; generating
steady state data which indicates that there is no change in the
picture for the video channel which does not provide any new frame; and combining the revised frame and the steady state data to generate a composite video signal in the first outgoing
video stream. The first equipment apparatus receives the composite
video signal and produces a video image including a picture, from
one of the channels, generated from the revised frame and including
a picture, from the remaining channel, which exists prior to
receipt of the composite video signal by the first equipment
apparatus and which, based on the steady state data in the video
signal, remains unchanged.
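A minimal Python sketch of the combining step described above. The revise_headers and steady_state callables are hypothetical stand-ins for the header rewriting and steady state data generation detailed elsewhere in this description; the sketch only shows that one quadrant is revised and the others are marked unchanged, with no pixel domain decoding.

```python
def compose_for_recipient(new_frame: bytes, source_quadrant: int,
                          recipient_codec: str, revise_headers, steady_state) -> bytes:
    """Build one outgoing composite frame: the revised new frame for the quadrant that
    changed, plus steady state data telling the decoder the other quadrants are unchanged."""
    parts = []
    for quadrant in range(4):   # upper-left, upper-right, lower-left, lower-right
        if quadrant == source_quadrant:
            # match the incoming frame's headers to the recipient's codec and this quadrant
            parts.append(revise_headers(new_frame, quadrant, recipient_codec))
        else:
            # coded-domain marker meaning "this quadrant did not change"
            parts.append(steady_state(quadrant, recipient_codec))
    return b"".join(parts)
```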
[0034] In another embodiment of the invention, we provide an
improved apparatus for receiving frames from at least first and
second incoming video channels and for alternately continuously
transmitting individual frames in at least a first outgoing video
stream to a first equipment apparatus for receiving the first video
stream and generating a video image including pictures from both of
the incoming video channels, and a second outgoing video stream to
a second equipment apparatus for receiving the second video stream
and generating a video image including pictures from both of the
incoming video channels. The improved apparatus includes apparatus,
when there is a new frame available, to match the codec format of
the new frame to that of at least the first equipment apparatus;
apparatus to generate, after the codec format of the new frame is
matched to that of the first equipment apparatus, a revised frame
by altering at least one header and coded domain data, if
necessary, in the new frame according to a selected picture format;
apparatus to generate steady state data which indicates that there
is no change in the picture for the video channel which does not
provide any new frame; apparatus to combine the revised frame and
the steady state data to generate a composite video signal in the
first outgoing video stream. The first equipment apparatus receives
the composite video signal and produces a video image including a
picture, from one of the channels, generated from the revised frame
in the video signal, and including a picture from the other channel
which exists prior to receipt of the composite video signal by the
first equipment apparatus and which, based on the steady state data
in the video signal, remains unchanged.
[0035] In a further embodiment of the invention, we provide an
improved method for receiving and transmitting video data across a
network. The method comprises the steps of receiving a call
initialization signal further comprising a codec identifying signal
that corresponds to a codec format, in a network interface from a
first video source; storing the codec identifying signal in a
memory; receiving a component video packet stream from the first
video source; disassembling the component video packet stream into
a component video signal; forming a composite video signal from the
component video signal, the composite video signal further
comprising the codec format; assembling the composite video signal
into a composite video packet stream further comprising the codec
format; and, transmitting the composite video packet stream to the
first video source. If, within a certain time frame, the component
video packet stream is received as a video packet stream for an
image, then the component video packet stream from the first video
source is received. The method can comprise the additional steps of
receiving a second call initialization signal further comprising a
second codec identifying signal that corresponds to a second codec
format, in the network interface from a second video source;
storing the second codec identifying signal in the memory;
receiving a second component video packet stream from the second
video source; disassembling the second component video packet
stream into a second component video signal; forming a second
composite video signal from the second component video signal, the
second composite video signal further comprising the codec format;
assembling the second composite video signal into a second
composite video packet stream further comprising the codec format;
forming a third composite video signal from the second component
video signal, the third composite video signal further comprising
the second codec format; assembling the third composite video
signal into a third composite video packet stream further
comprising the second codec format; transmitting the second
composite video packet stream to the first video source; and,
transmitting the third composite video packet stream to the second
video source.
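The per-participant fan-out described in this embodiment, in which each participant receives a composite stream in the codec format recorded for that participant at call set-up, might be sketched as follows. The packet_driver, mcu, and network objects, their method names, and the per-channel records holding a codec and an assigned quadrant are hypothetical.

```python
def handle_new_component_frame(states, packet_driver, mcu, network,
                               src_channel, packet_stream):
    """On receipt of a component video packet stream from one participant, send every
    participant a composite stream in that participant's own codec format (the format
    stored in memory at call set-up). Only the quadrant assigned to the source changes."""
    component_signal = packet_driver.disassemble(packet_stream)   # packets -> coded-domain signal
    quadrant = states[src_channel].quadrant                       # quadrant reserved for this source
    for dst_channel, st in states.items():
        composite_signal = mcu.revise(component_signal, quadrant, codec=st.codec)
        composite_stream = packet_driver.assemble(composite_signal, codec=st.codec)
        network.transmit(dst_channel, composite_stream)
```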
[0036] In still another embodiment of the invention, we provide an
improved apparatus for receiving and transmitting video data across
a network. The apparatus includes a video processing unit. The
video processing unit further comprises a network interface for
receiving a call initialization signal from a video source, the
call initialization signal further comprising a codec format, for
receiving a component video packet stream from the video source,
and for transmitting a composite video packet stream to the video
source; a memory further comprising a call set-up algorithm for
identifying the codec format of the call initialization signal and
storing the codec format in the memory; a packet driver for
disassembling the component video packet stream into a component
video signal and for assembling a composite video signal into a
composite video packet stream; and, a multi-point control unit for
revising the component video signal into the composite video
signal. The packet driver is coupled to the multi-point control
unit, the memory and the network interface. The multi-point control
unit is coupled to the memory. The component video packet stream is
a video packet stream for an image. If desired, the network
interface can receive a second call initialization signal from a
second video source. The second call initialization signal further
comprises a second codec format such that the call setup algorithm
identifies the second codec format and stores the second codec
format in the memory. Further, if desired, the multi-point control
unit can sequentially sense whether a second component video packet
stream has been received at the network interface from the second
video source whereby upon receipt of the second component packet
stream the packet driver disassembles the second component video
packet stream into a second component video signal, the multi-point
control unit revises the second component video signal into a
second composite video signal in the codec format and the second
component video signal into a third composite video signal in the
second codec format such that the packet driver assembles the
second composite video signal into a second composite video packet
stream in the codec format and the third composite video signal
into a third composite video packet stream in the second codec
format such that the network interface transmits the second
composite video packet stream in the codec format to the video
source and the network interface transmits the third composite
video packet stream in the second codec format to the second video
source. If desired, the network interface can have the capability
to connect to a plurality of video sources such that each of the
plurality of video sources transmits a component video packet
stream further comprising a corresponding codec format, such that
the network interface receives the plurality of component video
packet streams from the plurality of video sources and transmits a
composite video packet stream to each of the plurality of video
sources in the same corresponding codec format comprised in the
component video packet streams transmitted from the plurality of
video sources.
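A structural sketch of the couplings recited in this embodiment, using hypothetical component classes; it shows only how the packet driver, memory, multi-point control unit, and network interface are wired together.

```python
class VideoProcessingUnit:
    """Hypothetical wiring of the apparatus described above: the packet driver is coupled
    to the multi-point control unit, the memory and the network interface, and the
    multi-point control unit is coupled to the memory."""
    def __init__(self, network_interface, packet_driver, multi_point_control_unit, memory):
        self.network_interface = network_interface   # receives call set-up signals and packet streams
        self.memory = memory                         # holds the call set-up algorithm and stored codec formats
        self.packet_driver = packet_driver           # disassembles and assembles packet streams
        self.mcu = multi_point_control_unit          # revises component signals into composite signals
        # couplings recited in the description
        self.packet_driver.network_interface = network_interface
        self.packet_driver.memory = memory
        self.packet_driver.mcu = self.mcu
        self.mcu.memory = memory
```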
[0037] In still a further embodiment of the invention, we provide a
method for receiving and transmitting video data across a network.
The method comprises the steps of receiving a video signal from a
first video source, the signal further comprising a codec
identifying signal that corresponds to a first codec format;
receiving a second video signal from a second video source, the
second signal further comprising a codec identifying signal that
corresponds to a second codec format different from the first codec
format; forming a composite video signal from the first and second
signals, the composite video signal further comprising the first
codec format; assembling the composite video signal into a
composite video packet stream further comprising the first codec
format; and, transmitting the composite video packet stream to the
first video source.
[0038] In yet another embodiment of the invention, we provide an
improved method for receiving and transmitting video data across a
network comprising the steps of receiving a call initialization
signal in a network interface from a first video source; receiving
a call initialization signal in the network interface from a second
video source; receiving a component packet stream from the first
video source; disassembling the component video packet stream into
a component video signal; forming a composite video signal from the
component video signal, the composite video signal further
comprising steady state data for the second video source;
assembling the composite video signal into a composite video packet
stream; and, transmitting the composite video packet stream to the
second video source.
[0039] In yet a further embodiment of the invention, we provide an
improved apparatus for receiving video data across a network. The
apparatus comprises a display further comprising a screen
displaying at least first and second images; a network interface
for receiving a composite video packet stream further comprising
coded domain data to revise the first image and steady state data
indicating that the second image is unchanged; a packet driver for
disassembling the composite video packet stream into a composite
video signal including the steady state data and the coded domain
data; and, a control unit to revise the first image in the display
with the coded domain data and, based on the steady state data, to
permit the second image to remain unchanged in the display.
[0040] In still yet another embodiment of the invention, we provide
an improved apparatus for receiving video data across a network.
The apparatus includes a display further comprising a screen
displaying at least first and second images; a network interface
for receiving a composite video packet stream further comprising
coded domain data to revise the first image; a packet driver for
disassembling the composite video packet stream into a composite
video signal including the coded domain data; and, a control unit
to receive the composite video signal and revise only the first
image in the display with the coded domain data.
[0041] In still yet a further embodiment of the invention, we
provide an improved apparatus for receiving and transmitting video
data across a network. The apparatus comprises a video processing
unit. The video processing unit further comprises a network
interface. The interface receives a call initialization signal from
a video source, the call initialization signal further comprising a
codec format; receives a component video packet stream from the
video source; receives a call initialization signal from a second
video source, the call initialization signal further comprising a
second codec format; and, transmits a composite video packet
stream. The video processing unit also includes a packet driver
coupled to the network interface for disassembling the component
video packet stream into a component video signal and for
assembling a composite video signal into a composite video packet
stream; a memory coupled to the packet driver and further
comprising a call set-up algorithm for identifying the codec
formats of the call initialization signals and storing the codec
formats in the memory; and, a multi-point control unit coupled to
the packet driver and the memory for revising the component video
signal into the composite video signal, the composite video signal
further comprising steady state data for the second video source.
The first and second codec formats can be identical or different
from one another.
[0042] Turning now to the drawings, which describe the presently
preferred embodiments of the invention for the purpose of
describing the operation and use thereof and not by way of
limitation of the scope of the invention, and in which like
reference characters refer to corresponding elements throughout the
several views, the following terms and definitions are utilized herein.
[0043] Assemble.
[0044] To take digital data from a video signal and organize the data into packets for transmission across a network.
[0045] Block.
[0046] A block is the fourth hierarchical layer in video syntax.
Data for a block consists of code words for transform coefficients.
The size of a block is 8 by 8. This term is used in both H.261 and
H.263 codec formats.
[0047] Call Set-Up.
[0048] A process executed at the very beginning of a video
conference, before any packet containing video pixel data is sent,
to determine the capabilities of each participant's video
equipment, such as the codec used by the video equipment.
[0049] Chrominance.
[0050] The difference determined by quantitative measurements
between a color and a chosen reference color of the same luminous
intensity, the reference color having a specified color quality.
This term is used in connection with H.261, H.263, and other codec
formats.
[0051] CIF.
[0052] CIF stands for common intermediate format. CIF is a picture
format which has, for luminance, 352 pixels per horizontal line and
288 lines, and has, for chrominance, 176 pixels per horizontal line
and 144 lines. CIF indicates the size of a digital picture that
appears on the display of a participant's video equipment. CIF
presently is used in connection with most codec formats.
[0053] COD.
[0054] COD stands for coded macroblock indication and is used in
connection with the H.263 codec format. A COD is one data bit in
the header of a macroblock (MB) in an INTER picture. If the data
bit is set to "1", no further information is transmitted. In
another words, the picture associated with and defined by this
macroblock does not change on the participant's screen and remains
the same.
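As a rough illustration, steady state data for an unchanged QCIF quadrant of an H.263 INTER picture can consist of nothing more than a COD bit set to "1" for every macroblock. The Python sketch below assumes the 99 macroblocks of a QCIF picture (11 by 9 macroblocks of 16 by 16 luminance pixels) and omits the actual bit packing into the bitstream.

```python
def steady_state_quadrant_h263(mb_count: int = 99) -> list:
    """Steady state data for one unchanged QCIF quadrant of an H.263 INTER picture:
    every macroblock carries only a COD bit set to 1 ("NOT COD"), meaning no further
    data follows and the picture in that quadrant stays the same on the screen."""
    return [1] * mb_count   # one COD bit per macroblock, all marked "not coded"
```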
[0055] Codec Format.
[0056] A format for coding and decoding digitized video data.
[0057] Coded Domain Data.
[0058] This is coded compressed picture data. In the presently
preferred embodiment of the invention the MCU receives coded domain
QCIF data and sends coded domain CIF data. This term is used in
connection with H.261, H.263, and other codec formats. Coded domain
data includes pixel domain data and headers for the layers of a
video frame. Coded domain data, as used herein, does not include
either data defining codec format or picture size format (i.e.,
CIF, QCIF, or comparable formats).
[0059] Component Video Signal.
[0060] A video signal sent from the video equipment of one of the
participants in a video conference to the MCU of the invention.
This signal includes digital data defining a sequential series of
frames. The digital data for each frame comprises video syntax.
Video syntax includes headers and pixel domain data, and, can, if
desired, include other data.
[0061] Component Video Packet Stream.
[0062] A packet stream containing the data in the component video
signal.
[0063] Composite Video Signal.
[0064] A video signal sent from the MCU of the invention to the
video equipment of one or more of the participants in a video
conference. The composite video signal is generated from a
component video signal(s) by altering the headers in the component
video signal that define codec format and picture size format
(i.e., QCIF, etc.) and, if necessary, altering the headers in the
coded domain data and, if necessary, all or a portion of the pixel
domain data in the coded domain data in the incoming component
video signal. The pixel domain data in the component video signal
ordinarily is only altered by rearranging the pixel domain data.
While it is possible that a portion of the pixel domain data can be
decoded, it is a principal object and advantage of the invention
that decoding of the pixel domain data is avoided during formation
of the composite video signal. The pixel domain data used to form a
composite video signal defines an image. If the image only
corresponds to a portion of a display picture, then when the video
equipment of a participant receives the composite video signal, it
uses the image to update only a portion of the display picture
shown on the participant's CRT or other display.
[0065] Composite Video Packet Stream.
[0066] A packet stream containing the data in a composite video
signal.
[0067] Computer.
[0068] A functional unit that can perform substantial computations,
including numerous arithmetic operations and logic operations
without human intervention during a run. In information processing,
the term computer usually describes a digital computer. A computer
may consist of a stand-alone unit or may consist of several
interconnected units.
[0069] Computer System.
[0070] A functional unit, consisting of one or more computers and
associated software, that uses common storage for all or part of a
program and also for all or part of the data necessary for the
execution of the program; executes user-written or user-designated
programs; performs user-designated data manipulation, including
arithmetic operations and logic operations; and that can execute
programs that modify themselves during their execution. A computer
system may be a stand-alone unit or may consist of several
interconnected units.
[0071] CPBY.
[0072] CPBY stands for coded block pattern for luminance and is
used in connection with the H.263 codec format. A CPBY is a
variable length code word in the header of a macroblock (MB) which
describes data in the macroblock.
[0073] DCT.
[0074] DCT stands for discrete cosine transformation. This
transformation is used to compress data and to eliminate
unnecessary information. DCT is used by the coding device of the
participant. This term is used in connection with H.261, H.263 and
other codec formats.
[0075] Digital Picture.
[0076] A digital picture is a frame. The digital data defining a
frame is called video syntax. Video syntax includes headers and
pixel domain data. The pixel domain data defines the display
picture that appears on the CRT or other display of a participant's
video conference equipment.
[0077] Digital Video Signal.
[0078] Digital data defining an image and including a codec format
and other pertinent information.
[0079] Disassemble.
[0080] To take the data in packets and produce a video signal.
[0081] Display Picture.
[0082] The picture that appears on the CRT or other screen or
display in a participant's video conference equipment as defined by
pixel domain data.
[0083] Frame.
[0084] A frame is one digital picture in a sequential series of
pictures in a video channel or other video stream. This term is
used in connection with H.261, H.263, and other codec formats.
[0085] Frame Rate.
[0086] The frame rate is the rate in frames per second that an MCU
receives a sequential series of frames. The frame rate currently
typically is about thirty frames per second. This term is used in
connection with H.261, H.263, and other codec formats.
[0087] GOB.
[0088] GOB stands for group of blocks. A GOB is the second
hierarchical layer in video syntax. This term is used in connection
with the H.261 and H.263 codec formats.
[0089] GN.
[0090] GN stands for group of block number. A GN consists of 4 bits
in H.261's header and 5 bits in H.263's header for a group of
blocks. Only the GN in the H.261 header is used in the practice of
the invention. The data bits indicate the position of the group of
blocks in a picture, i.e., upper left, upper right, lower left,
lower right.
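Under the H.261 GOB layout, a QCIF picture carries GOBs numbered 1, 3 and 5, while a CIF picture carries GOBs numbered 1 through 12 arranged in two columns of six; the GN rewrite for each quadrant might then look like the Python sketch below. The specific assignments intended by the application are those shown in FIG. 6; this table is an assumption based on the standard layout, and only the GOB header is changed while the block data is copied untouched.

```python
# GN remapping when an H.261 QCIF frame (GOBs 1, 3, 5) is placed in one quadrant
# of an H.261 CIF frame (GOBs 1..12); assumed from the standard H.261 GOB layout.
QUADRANT_GN_MAP = {
    "upper_left":  {1: 1, 3: 3, 5: 5},
    "upper_right": {1: 2, 3: 4, 5: 6},
    "lower_left":  {1: 7, 3: 9, 5: 11},
    "lower_right": {1: 8, 3: 10, 5: 12},
}

def remap_gn(qcif_gn: int, quadrant: str) -> int:
    """Return the GN to write into the outgoing CIF GOB header."""
    return QUADRANT_GN_MAP[quadrant][qcif_gn]
```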
[0091] Header (or Header File).
[0092] A header is information included at the beginning of a
picture, group of blocks, macroblock or block of information. The
header describes the information which follows the header. This term
is used in connection with the H.261 and H.263 codec formats and
other codec formats.
[0093] Horizontal Component.
[0094] The horizontal component consists of the pixels along a horizontal
line. This term is used in connection with the H.261 and H.263
codec formats and other codec formats.
[0095] H.261 Codec Format.
[0096] A standard format for coding and decoding digitized video
data. The format is provided by ITU-T.
[0097] H.263 Codec Format.
[0098] A standard format for coding and decoding digitized video
data. The format is provided by ITU-T.
[0099] IDCT.
[0100] IDCT stands for inverse discrete cosine transformation. The
IDCT is used to reverse or decode the DCT. This term is used in
connection with the H.261 and H.263 codec formats.
[0101] Image.
[0102] An image is one digital picture, more than one digital
picture, or a portion of a digital picture in a composite video
packet stream.
[0103] Ordinarily, a composite video packet stream received by a
participant's video conference equipment will contain the video
syntax defining a frame and will contain pixel domain data
pertaining to the entire display picture. It is, however, possible
for a composite video packet stream to contain digital data
defining or altering only a portion of the display picture. It is
also possible for an MCU to receive from two separate sources
digital data (i.e., to receive two or more frames) pertaining to
one display picture, and to combine such digital data during the
process of preparing the composite video signal. Accordingly, the
composite video signal and composite video packet stream contain
digital data defining an image, which image can comprise one
digital picture, more than one digital picture (combined to define
digital data defining a particular display picture), or a portion
of a digital picture.
[0104] Interface.
[0105] Hardware, software, or both that links systems, programs, or
devices.
[0106] INTRA.
[0107] This is an I-picture. An INTRA is a picture or a macroblock
type that has no reference picture(s) for prediction purposes. This
term is used in connection with the H.263 codec format.
[0108] INTER.
[0109] This is a P-picture. An INTER is a picture or a macroblock
type that has temporally previous reference video data. This term
is used in connection with the H.263 codec format.
[0110] Layer.
[0111] A layer is one level of hierarchy in video syntax,
comprising a quantity of digitized data or information.
[0112] Lower Layer.
[0113] A lower layer is a layer in video syntax which is a part of
an upper layer and is lower than the picture layer. This term is
used in connection with the H.261, H.263, and other codec
formats.
[0114] Luminance.
[0115] Luminance is the luminous intensity of a surface in a given
direction per unit of projected area. This term is used in
connection with the H.261, H.263, and other codec formats.
[0116] Macroblock.
[0117] A macroblock (MB) is digital data or information. An MB
includes blocks and a header. This term is used in connection with
the H.261 and H.263 codec formats.
[0118] Mapping.
[0119] Mapping is modifying headers and coded domain data, if
necessary, in the video syntax for a H.261 or H.263 QCIF frame so
that the QCIF frame looks like a H.261 or H.263 CIF frame with the
QCIF data in one quarter (or some other portion) of the CIF frame
area. Although mapping changes or alters headers and other
information such as the coded domain data, it ordinarily does not
change the portions of the signal which define the pixels
comprising the picture of a participant that is produced on the
participant's screen of video equipment. This term is used in
connection with the H.261 and H.263 codec formats.
[0120] MB.
[0121] MB stands for macroblock, which is defined above.
[0122] MBA.
[0123] MBA stands for macroblock address. The MBA is a variable
length code word in the header of an MB that indicates the position
of the MB within a group of blocks. This term is used in connection
with the H.261 codec format.
[0124] MCBPC.
[0125] MCBPC indicates the macroblock type and the coded block pattern for chrominance, and consists of a variable length code word in the header of an MB. This term is used in connection with the
H.263 codec format.
[0126] MCU.
[0127] MCU stands for multi-point control (or conference) unit. A
conventional MCU can operate either in a switched presence format
or in a continuous presence format. In the switched presence
format, the MCU receives video bit-streams from more than one
participant, selects only one of the video bit-streams and
transmits it simultaneously to each participant in a video
conference. In the continuous presence format, the MCU receives video bit-streams from more than one participant, and simultaneously transmits each stream in a split-screen format to
each participant in the video conference. The MCU utilized in the
practice of the invention at any instant in time only transmits one
image of an incoming video signal plus some steady state data added
to the image to facilitate retransmission of the image data to
participants in a video conference. This term is used in connection
with the H.261, H.263, and other codec formats.
[0128] Memory.
[0129] The addressable storage space in a processing unit and other internal storage that is used to execute instructions.
[0130] MVD.
[0131] MVD stands for motion vector data. An MVD is a variable
length code word in the header of a macroblock for the horizontal
component followed by a variable length code word for the vertical
component. This term is used in connection with the H.263 codec
format.
[0132] Network.
[0133] Techniques, physical connections, and computer programs used
to link two or more computers. Network users are able to share
files, printers, and other resources; send electronic messages;
and run programs on other computers.
[0134] Network Interface.
[0135] A component in a computer system for exchanging digital data
between that system and other computer systems in a network.
[0136] NOT COD.
[0137] NOT COD means the COD is set to "1". This term is used in
connection with the H.263 codec format.
[0138] Non-Reference Picture.
[0139] A non-reference picture is a picture frame that is received from a participant by the MCU but is skipped and not
retransmitted by the MCU. A non-reference picture frame ordinarily
is not retransmitted because it is identical, or nearly identical,
to the frame which was just previously transmitted by the MCU. This
term is used in connection with the H.261 and H.263 codec
formats.
[0140] Packet.
[0141] A basic unit of data transferred over a network such as the
Internet. A message to be transferred over the network is broken up
into small units, or packets, by the sending computer system. The
packets, which travel independently of one another, are marked with
the sender's address, destination address, and other pertinent
information, including data about any errors introduced during
transfer. When the packets arrive at the receiving computer, they
are reassembled.
[0142] Packet Driver.
[0143] A device or software program for disassembling a packet
video stream into a digital video signal and for assembling a
digital video signal into a packet video stream.
[0144] Packet Video Stream.
[0145] Digital data from a video signal assembled into packets for
transmission across a network.
[0146] Picture.
[0147] A picture is the first hierarchical layer in video syntax.
The information included in a picture is a header file plus the
GOB. The information includes the picture size format (QCIF, CIF,
etc.) information. This term is used in connection with the H.261
and H.263 codec formats and other codec formats.
[0148] Point-to-Point Function.
[0149] In a point-to-point function video conferencing system, only
two participants are involved. Such a system allows the first
person's picture to be sent to the second person or vice-versa. The
video of the first person is not combined with the video of another
person before it is sent to the second person.
[0150] QCIF.
[0151] QCIF stands for quarter-common intermediate format. QCIF is
a picture format which has, for luminance, 176 pixels per
horizontal line and 144 lines, and has, for chrominance, 88 pixels
per horizontal line and 72 lines. QCIF presently is used in
connection with most codec formats.
[0152] Quantizer.
[0153] A quantizer is data that indicates the accuracy of the
picture data.
[0154] Steady State Data.
[0155] Data in a composite video signal that indicates that there
is no change in a display picture for a specified area of a
recipient's CRT or other display.
[0156] TR.
[0157] Temporal reference. As used in connection with the H.263
codec format, the TR comprises eight bits of data in the header of
a picture layer. This data is produced by incrementing its value in
the temporally previous reference picture header by one plus the
number of skipped or non-reference pictures at the picture clock
frequency since the previously transmitted picture. As used in
connection with the H.261 codec format, this TR comprises five bits
of data in the header of a picture layer and is data that is
produced by incrementing its value in the temporally previous
reference picture header by one plus the number of skipped or
non-reference pictures at the picture clock frequency since the
previously transmitted picture.
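The TR update described above reduces to modular arithmetic; a minimal Python sketch:

```python
def next_tr(prev_tr: int, skipped_pictures: int, codec: str) -> int:
    """Temporal reference for the next transmitted picture: the previous reference
    picture's TR incremented by one plus the number of skipped (non-reference)
    pictures since the previously transmitted picture. TR occupies 8 bits in the
    H.263 picture header and 5 bits in the H.261 picture header, so it wraps."""
    modulus = 256 if codec == "H.263" else 32
    return (prev_tr + 1 + skipped_pictures) % modulus
```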
[0158] Video Channel.
[0159] A path along which video signals can be sent.
[0160] Video Signal.
[0161] Data from a video source. The data can comprise a call
initialization signal, a component video signal, a composite video
signal, codec format, picture size information, a component video
packet stream, a composite video packet stream, and/or other
pertinent information.
[0162] Video Syntax.
[0163] Video syntax is digitized data that describes and defines a
video frame. Video syntax is a defined arrangement of information
contained in a video frame. The information is arranged in a
hierarchical structure which has four layers:
[0164] Picture
[0165] Group of blocks (GOB)
[0166] Macroblocks (MB)
[0167] Blocks
[0168] Each layer includes a header file.
[0169] In the following description of the presently preferred
embodiments of the invention, it is assumed that there are four
participants in a video conference and that each participant has
video conferencing equipment which generates video signals
comprising a video channel which is received by an MCU constructed
in accordance with the invention. As would be appreciated by those
of skill in the art, the apparatus and method of the invention can
be utilized when there are two or more participants in a video
conference. The method and apparatus of the invention ordinarily
are utilized when there are three or more participants in a video
conference.
[0170] In the video conferencing system of the invention, the MCU
generates an outgoing composite CIF signal. The MCU divides the
outgoing composite CIF signal into orthogonal quarters, namely, an
upper left quarter, an upper right quarter, a lower left quarter,
and a lower right quarter. Each incoming channel from a participant
comprises a component QCIF signal. Each channel is assigned to one
of the orthogonal quarters of the outgoing composite CIF signal.
When a frame arrives at the MCU in one of the channels, the frame
is assigned by the MCU to the orthogonal quarter of the outgoing
composite CIF signal that is reserved or selected for that
channel.
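A Python sketch of this quadrant assignment, assuming channels are simply given quadrants in the order they join the conference; the assignment policy itself is a design choice not fixed by the description.

```python
# CIF is 352x288 luminance pixels; QCIF is 176x144, i.e. exactly one quadrant of CIF.
QUADRANTS = ["upper_left", "upper_right", "lower_left", "lower_right"]

def assign_quadrants(channel_ids):
    """Reserve one quadrant of the outgoing composite CIF signal per incoming
    QCIF channel (up to four participants)."""
    if len(channel_ids) > 4:
        raise ValueError("a composite CIF picture holds at most four QCIF pictures")
    return {ch: QUADRANTS[i] for i, ch in enumerate(channel_ids)}
```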
[0171] It is understood that a variety of codec formats exist or
will be developed and can be incorporated into the invention.
However, for the following discussion it is assumed that the video
equipment utilized by each participant in a video conference
utilizes either the H.261 or H.263 codec format. And, more
specifically, it is assumed that the incoming component signal 10
(from the first participant) is in the H.261 codec format and that
incoming component signals 11 (from the second participant), 12
(from the third participant), 13 (from the fourth participant) in
FIG. 2 are in the H.263 codec format. This means that the outgoing
composite signal 14 (FIG. 2) produced by the MCU for the first
participant will be in the H.261 codec format and that the outgoing
composite signal 14 produced by the MCU for the second, third, and
fourth participants will be in the H.263 codec format.
[0172] Since the MCU is, at any instant, basically updating only
one-quarter of the outgoing composite CIF signal, the structure of
the MCU of the invention is, in comparison to conventional MCUs,
simplified, and, the computation power required to operate the MCU
of the invention is, in comparison to conventional MCUs,
significantly reduced. Also, since the MCU of the invention works,
in contrast to conventional MCUs, only on coded domain data, the
MCU of the invention requires only a small amount of memory. This
reduction in complexity, computation power, and memory size enables
the practice of the invention to be employed in any existing
point-to-point video conferencing equipment, such as personal
computers, 2.5G/3G video mobile phones, notebook computers,
personal digital assistants (PDA), game consoles, etc., without any
additional support from a central server.
[0173] As noted above, for sake of this example, it is assumed that
there are four participants in a video conference. The video
equipment of the first participant produces a channel comprising an
incoming component QCIF signal 10. The video equipment of the
second participant produces a channel comprising an incoming
component QCIF signal 11. The video equipment of the third
participant produces a channel comprising an incoming component
QCIF signal 12. The video equipment of the fourth participant
produces a channel comprising an incoming component QCIF signal 13.
The camera, computer, CRT or other video screen, and other video
equipment used by each participant to produce a channel comprising
a QCIF signal is well known in the art and will not be described in
detail herein.
[0174] The MCU receives the incoming component QCIF signals 10, 11,
12, 13 and combines them into an outgoing composite CIF signal 14.
Please see FIG. 2. Each component QCIF signal comprises a stream of
digital frames or pictures. Digital frames in component QCIF signal
10 are utilized to update the upper left quadrant of an outgoing
composite CIF signal 14. Digital frames in component QCIF signal 11
are utilized to update the upper right quadrant of an outgoing
composite CIF signal 14. Digital frames in component QCIF signal 12
are utilized to update the lower left quadrant of an outgoing
composite CIF signal 14. Digital frames in component QCIF signal 13
are utilized to update the lower right quadrant of an outgoing
composite CIF signal 14. Each time a new composite CIF signal 14 is
generated by the MCU, in the presently preferred embodiment of the
invention, the new signal contains information which basically only
changes the picture in one quadrant of the CIF signal. As would be
appreciated by those of skill in the art, it is possible in
accordance with the invention to configure the MCU such that each
time a new composite CIF signal 14 is generated, the pictures in
two or more quadrants of the CIF signal are changed. Also, it is
possible in accordance with the invention to configure the MCU such
that each time a new composite CIF signal 14 is generated, only a
portion of the picture in a quadrant of the CIF signal is changed.
But in the presently preferred embodiment of the invention, only
the picture in one quadrant of the CIF signal is changed each time
a new composite CIF signal is generated by the MCU.
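For purposes of illustration only, the following Python sketch
restates the fixed channel-to-quadrant assignment and the
single-quadrant update described above. The names
QUADRANT_FOR_CHANNEL and update_composite, and the dict-based frame
model, are hypothetical and are not elements of the apparatus.

    # Hypothetical sketch: each incoming channel owns one quadrant of the
    # outgoing composite CIF signal, and a new frame changes only that quadrant.
    QUADRANT_FOR_CHANNEL = {
        10: "upper_left",
        11: "upper_right",
        12: "lower_left",
        13: "lower_right",
    }

    def update_composite(composite, channel, qcif_frame):
        """Place the new QCIF frame in the quadrant reserved for its channel.

        'composite' is modeled as a dict mapping quadrant name -> frame data;
        the other three quadrants are left untouched (NOT COD / Skip MBA in
        the text).
        """
        composite[QUADRANT_FOR_CHANNEL[channel]] = qcif_frame
        return composite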
[0175] As would be appreciated by those of skill in the art, each
incoming channel can comprise a component sub-QCIF signal, and the
outgoing composite signal can be a composite CIF signal which
contains 6 sub-QCIF pictures plus some empty space. Or, each
incoming channel can comprise a component CIF signal and the
outgoing signal can be a composite 4CIF signal. Or, each incoming
channel can comprise a component 4CIF signal and the outgoing
signal can be a composite 16CIF signal, etc. Or, other standardized
or non-standardized picture formats can be adopted. In the
continuous presence mode of H.263, at most 4 video signals can be
transmitted. This is in direct contrast to the capability of some
possible embodiments of the invention in which, for example, four
CIF pictures, each containing four QCIF pictures, can add up to one
composite 4CIF picture containing the pictures of sixteen
participants. Also, pictures of different sizes can be accommodated
in the invention. For example, an outgoing composite 4CIF signal
can contain two incoming CIF signals in its upper left and upper
right quadrants while its lower left and lower right quadrants can
contain eight QCIF signals.
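The size relationships mentioned above follow directly from the
standard picture dimensions. The sketch below (the helper name
frames_that_fit is hypothetical) lists those dimensions and computes
how many smaller pictures tile into a larger one.

    # Standard picture sizes (luminance samples), width x height.
    PICTURE_SIZES = {
        "sub-QCIF": (128, 96),
        "QCIF":     (176, 144),
        "CIF":      (352, 288),
        "4CIF":     (704, 576),
        "16CIF":    (1408, 1152),
    }

    def frames_that_fit(outer, inner):
        """Whole 'inner' pictures that tile into an 'outer' picture."""
        ow, oh = PICTURE_SIZES[outer]
        iw, ih = PICTURE_SIZES[inner]
        return (ow // iw) * (oh // ih)

    # Examples matching the text:
    #   frames_that_fit("CIF", "QCIF")     -> 4
    #   frames_that_fit("CIF", "sub-QCIF") -> 6  (2 across, 3 down, plus empty space)
    #   frames_that_fit("4CIF", "QCIF")    -> 16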
[0176] It is understood that one component signal 10 may transmit
new frames or pictures to the MCU at a higher or lower rate than
component signals 11, 12, 13. This does not alter the operation of
the MCU, because the MCU basically operates on a first-come,
first-served basis. That is, as soon as the MCU receives an image from
a component signal 10 to 13, it processes that particular image and
generates and transmits a composite CIF signal 14 to the video
equipment of each of the participants. As would be appreciated by
those of skill in the art, the MCU can, if desired, process every
other frame, every third frame, or other designated intermittent
frames. The MCU then processes the next frame it receives and
generates and transmits a composite CIF signal 14 to the video
equipment of each of the participants, and so on. Since the
equipment of one of the participants utilizes the H.261 codec
format and the equipment of the remaining participants utilizes the
H.263 codec format, each time the MCU receives and processes a
frame via one of component signals 10 to 13, the MCU generates both
a composite CIF signal 14 in the H.261 codec format and a composite
CIF signal 14 in the H.263 codec format.
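The first-come, first-served behaviour described above can be
summarized by the following Python sketch. It is illustrative only:
the incoming channels are modeled as a single queue of (channel,
frame) tuples in arrival order, and the "composite" emitted for each
codec format is only a descriptive tuple rather than real coded
data.

    import queue

    def mcu_loop(incoming, emit):
        """Serve frames strictly in arrival order; one composite per arrival,
        produced once for each codec format in use (H.261 and H.263 here)."""
        while True:
            try:
                channel, frame = incoming.get_nowait()
            except queue.Empty:
                break
            for codec in ("H.261", "H.263"):
                emit((codec, channel, frame))

    # Usage: frames from channels 10 and 12 arrive first, in that order.
    arrivals = queue.Queue()
    for item in [(10, "frame-10-1"), (12, "frame-12-1")]:
        arrivals.put(item)
    mcu_loop(arrivals, emit=print)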
[0177] If an incoming component QCIF signal 10 is in the H.261
codec format and the outgoing composite CIF signal is in the H.263
format, a frame from the signal 10 is converted from the H.261
codec format to the H.263 codec format when the MCU is generating
an outgoing composite signal 14 in the H.263 codec format.
Similarly, if an incoming component QCIF signal 11 to 13 is in the
H.263 codec format and the outgoing composite CIF signal is in the
H.261 codec format, a frame from the component signal 11 to 13 is
converted from the H.263 codec format to the H.261 codec format
when the MCU is generating an outgoing composite signal 14 in the
H.261 codec format.
[0178] Part I of Example: Composite CIF Signal Transmitted in H.263
Codec Format
[0179] In this part of the example, it is assumed that the MCU is
processing incoming component signals 10 to 13 to produce an
outgoing composite signal 14 which is in the H.263 codec format,
which outgoing composite signal 14 will be sent to the second,
third, and fourth participants identified above.
[0180] The MCU monitors the incoming component signals 10 to 13 and
waits to receive a new frame from one of component signals 10 to
13. Component signal 10 is the first signal to transmit a new QCIF
frame to the MCU. The MCU alters the headers and coded domain data
of the QCIF frame to change the frame from an H.261 codec format to
the H.263 codec format. The altered headers indicate that the frame
is an INTER picture (i.e., is a P picture). The MCU retains the
digital data (i.e., the pixel domain data) in the frame which
defines the video picture of the first participant. Although the
digital data which defines the video picture of the first
participant may be rearranged by the MCU, the video picture which
results is unchanged, or is substantially unchanged, by the
MCU.
[0181] The MCU prepares outgoing composite CIF signal 14A depicted
in FIG. 3. First, a CIF picture header which has a picture type of
CIF and a picture coding type of INTER (P picture) is generated.
Then, a proper temporal reference is assigned to the picture. The
temporal reference indicates the number of non-transmitted
pictures. Therefore, the temporal reference is incremented by 1 for
each picture. The H.263 codec format includes a frame skipping
feature which presently is not utilized in the practice of the
invention.
[0182] Since frames received from component QCIF signal 10 have
been assigned to the upper left quadrant of an outgoing composite
CIF signal 14, the MCU inserts in the upper left quadrant of the
outgoing composite CIF signal 14A the QCIF frame produced by the
MCU by converting the QCIF picture it receives via component signal
10 from the H.261 codec format to the H.263 codec format. Since the
new QCIF frame is in the upper left quadrant, the data in each GOB
of the QCIF frame, from top to bottom, goes through the necessary
MVD modifications, since it may refer to a different MVD in the CIF
picture. After each GOB goes through the necessary MVD
modifications, it links up with eleven MB headers for the upper
right quadrant (each of which is assigned the bit "1" to designate
NOT COD) and becomes a new CIF GOB. Each of the MB headers for the
lower left and lower right quadrants is filled with the bit "1" to
designate NOT COD.
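The assembly of one CIF GOB just described can be sketched as
follows. The sketch is illustrative only: coded data is modeled as a
bit string, the MVD adjustment is a placeholder, and the helper
names are hypothetical. A QCIF GOB row carries eleven macroblocks,
so the remaining half of the CIF GOB row is filled with eleven
one-bit "not coded" macroblock headers (COD = 1).

    NOT_CODED_MB = "1"   # a single COD bit set to 1 designates a not-coded macroblock

    def adjust_mvds(qcif_gob_bits):
        """Placeholder for the MVD fix-up described above."""
        return qcif_gob_bits

    def cif_gob_from_qcif_gob(qcif_gob_bits, qcif_on_left=True):
        """Join the QCIF GOB data with eleven not-coded MB headers for the
        other half of the CIF GOB row."""
        qcif_half = adjust_mvds(qcif_gob_bits)
        empty_half = NOT_CODED_MB * 11
        return qcif_half + empty_half if qcif_on_left else empty_half + qcif_half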
[0183] The resulting outgoing composite CIF signal 14A is
illustrated in FIG. 3. When this composite signal is transmitted
and is received by participants two, three, and four, the video
equipment of these participants inserts the picture illustrated in
the upper left quadrant in 14A in the upper left quadrant of the
video picture shown on each of the participant's CRTs or other
screens. The pictures shown on the CRTs or other screens in the
remaining quadrants remain unchanged.
[0184] The MCU transmits composite CIF signal 14A to participants
two, three, and four.
[0185] After transmitting composite CIF signal 14A, the MCU again
monitors the incoming component signals 10 to 13 in a round-robin
fashion. Component signal 11 is checked to see if it contains a new
frame. If component signal 11 does not contain a new frame, the MCU
moves on and checks whether component signal 12 contains a new
frame, and so on. If component signal 11 contains a new frame, the
following procedure is followed.
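A minimal sketch of this round-robin check follows; the Channel
class and its poll method are hypothetical stand-ins for the
mechanism that detects a newly arrived frame on a component signal.

    class Channel:
        """Minimal stand-in for an incoming component signal."""
        def __init__(self):
            self.pending = []
        def poll(self):
            """Return the next new frame, or None if no new frame has arrived."""
            return self.pending.pop(0) if self.pending else None

    def next_new_frame(channels, last_served_index):
        """Starting after the channel just served, return the first channel
        that holds a new frame; channels without one are passed over."""
        n = len(channels)
        for step in range(1, n + 1):
            i = (last_served_index + step) % n
            frame = channels[i].poll()
            if frame is not None:
                return i, frame
        return None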
[0186] Since the frame is already in the H.263 codec format, it is
not necessary to change the frame from the H.261 codec format to
the H.263 codec format.
[0187] This frame is found to be an INTRA picture (I picture). The
MCU converts it into an INTER picture (P picture); see FIG. 5. At
the macro block level, MB Type is set to INTRA, or to INTRA+Q if a
quantizer is modified, and a COD bit is added. MCBPC is transferred
from the table for the I picture to the table for the P picture.
CBPY takes the complement of its original value. This procedure of
changing the headers and, if necessary, rearranging the coded domain
data to indicate an INTER picture is well known to those of ordinary
skill in the art.
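A rough sketch of that I-to-P rewrite, restated in Python, is given
below. It is a logical sketch only: real H.263 syntax uses
variable-length codes, whereas here the fields are held as plain
values so that only the changes named above (COD added, MB Type,
MCBPC table, CBPY complement) are visible. The field names are
hypothetical.

    def intra_picture_to_inter(macroblocks, picture_header):
        """Rewrite an I picture's headers so that it is signalled as a P picture."""
        picture_header["picture_coding_type"] = "INTER"   # now a P picture
        for mb in macroblocks:
            mb["COD"] = 0                                 # COD bit added; 0 = coded
            mb["MB_type"] = "INTRA+Q" if mb.get("quantizer_changed") else "INTRA"
            mb["MCBPC_table"] = "P"                       # re-emit MCBPC with the P-picture table
            mb["CBPY"] = (~mb["CBPY"]) & 0b1111           # complement of the 4-bit CBPY value
        return macroblocks, picture_header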
[0188] The MCU prepares outgoing composite CIF signal 14B depicted
in FIG. 3. First, a CIF picture header which has a picture type of
CIF and a picture coding type of INTER (P picture) is generated.
Then, a proper temporal reference is assigned to the picture. The
temporal reference indicates the number of non-transmitted
pictures. The temporal reference is incremented by 1 for each
picture in the method of the invention. The H.263 codec format
includes a frame skipping feature which presently is not utilized
in the practice of the invention.
[0189] Since frames received from component QCIF signal 11 have
been assigned to the upper right quadrant of an outgoing composite
CIF signal 14, the MCU inserts in the upper right quadrant of the
outgoing composite CIF signal 14B the QCIF frame produced by the
MCU by converting the QCIF I picture it receives via component
signal 11 into a QCIF P picture, both in H.263 codec format. Since
the QCIF frame is in the upper right quadrant, the data in every
GOB of the QCIF frame, from top to bottom, goes through the
necessary MVD modifications, since it refers to a different MVD in
the CIF picture.
After each GOB goes through the necessary MVD modifications, it
links up with eleven MB headers for the upper left quadrant (each
of which is assigned the bit "1" to designate NOT COD) and becomes
a new CIF GOB. Each of the MB headers for the lower left and lower
right quadrants is filled with the bit "1" to designate NOT
COD.
[0190] The resulting outgoing composite CIF signal 14B is
illustrated in FIG. 3. When this signal is transmitted and is
received by participants two, three, and four, the video equipment
of these participants inserts the picture illustrated in the upper
right quadrant of composite CIF signal 14B in the upper right
quadrant of the video picture shown on each of the participant's
CRTs or other screens. The pictures shown on the CRTs or other
screens in the remaining quadrants remain unchanged.
[0191] The MCU transmits composite CIF signal 14B to participants
two, three, and four.
[0192] The MCU again monitors the incoming component signals 10 to
13 for a new incoming frame in a round-robin fashion. The MCU
receives a new frame from component signal 12.
[0193] Since the frame received from component signal 12 is already
in the H.263 codec format, it is not necessary to change the frame
from the H.261 codec format to the H.263 codec format.
[0194] This frame is found to be an INTER picture (P picture).
Therefore, the MCU does not need to convert it into P picture
format.
[0195] The MCU prepares outgoing composite CIF signal 14C depicted
in FIG. 3. First, a CIF picture header which has a picture type of
CIF and a picture coding type of INTER (P picture) is generated,
see FIG. 5. Then, a proper temporal reference is assigned to the
picture. The temporal reference indicates the number of
non-transmitted pictures. The temporal reference is incremented by
1 for each picture in the method of the invention. The H.263 codec
format includes a frame skipping feature which presently is not
utilized in the practice of the invention.
[0196] Each of the eleven MB headers for the upper left and upper
right quadrants of the outgoing composite CIF signal is filled with
the bit "1" to designate NOT COD. Then, since frames received from
component QCIF signal 12 have been assigned to the lower left
quadrant of an outgoing composite CIF signal 14, the MCU inserts in
the lower left quadrant of the outgoing composite CIF signal 14C
the QCIF frame received by the MCU via component signal 12. Since
the QCIF frame is in the lower left quadrant, the data in every GOB
of the QCIF frame, from top to bottom, goes through the necessary
MVD modifications, since it refers to a different MVD in the CIF
picture.
After each GOB goes through the necessary MVD modifications, it
links up with eleven MB headers for the lower right quadrant (each
of which is assigned the bit "1" to designate NOT COD) and becomes
a new CIF GOB.
[0197] The resulting outgoing composite CIF signal 14C is
illustrated in FIG. 3. When this signal is transmitted and is
received by participants two, three, and four, the video equipment
of these participants inserts the picture illustrated in the lower
left quadrant of composite CIF signal 14C in the lower left
quadrant of the video picture shown on each of the participant's
CRTs or other screens. The pictures shown on the CRTs or other
screens in the remaining quadrants remain unchanged.
[0198] The MCU transmits composite CIF signal 14C to participants
two, three, and four.
[0199] The MCU again monitors the incoming component signals 10 to
13 for a new incoming frame in a round-robin fashion. The MCU
receives a new frame from component signal 13.
[0200] Since the frame received from component signal 13 is already
in the H.263 codec format, it is not necessary to change the frame
from the H.261 codec format to the H.263 codec format.
[0201] This frame is found to be an INTER picture (P picture).
Therefore, the MCU does not need to convert it into P picture
format.
[0202] The MCU prepares outgoing composite CIF signal 14D depicted
in FIG. 3. First, a CIF picture header which has a picture type of
CIF and a picture coding type of INTER (P picture) is generated,
see FIG. 5. Then, a proper temporal reference is assigned to the
picture. The temporal reference indicates the number of
non-transmitted pictures. The temporal reference is incremented by
1 for each picture in the method of the invention. The H.263 codec
format includes a frame skipping feature which presently is not
utilized in the practice of the invention.
[0203] Each of the eleven MB headers for the upper left and upper
right quadrants of the outgoing composite CIF signal is filled with
the bit "1" to designate NOT COD. Then, since frames received from
component QCIF signal 13 have been assigned to the lower right
quadrant of an outgoing composite CIF signal 14, the MCU inserts in
the lower right quadrant of the outgoing composite CIF signal 14D
the QCIF frame received by the MCU via component signal 13. Since
the QCIF frame is in the lower right quadrant, the data in every
GOB of the QCIF frame, from top to bottom, goes through the
necessary MVD modifications, since it refers to a different MVD in
the CIF picture.
After each GOB goes through the necessary MVD modifications, it
links up with eleven MB headers for the lower left quadrant (each
of which is assigned the bit "1" to designate NOT COD) and becomes
a new CIF GOB.
[0204] The resulting outgoing composite CIF signal 14D is
illustrated in FIG. 3. When this signal is transmitted and is
received by participants two, three, and four, the video equipment
of these participants inserts the picture illustrated in the lower
right quadrant of composite CIF signal 14D in the lower right
quadrant of the video picture shown on each of the participant's
CRTs or other screens. The pictures shown on the CRTs or other
screens in the remaining quadrants remain unchanged.
[0205] The MCU transmits composite CIF signal 14D to participants
two, three, and four.
[0206] Part II of Example: Composite CIF Signal Transmitted in
H.261 Codec Format
[0207] In this part of the example, it is assumed that the MCU is
processing incoming component signals 10 to 13 to produce an
outgoing composite signal 14 which is in the H.261 codec format,
which outgoing composite signal 14 will be sent only to the first
participant identified above.
[0208] The MCU again monitors the incoming component signals 10 to
13 for a new incoming frame in a round-robin fashion. Let component
signal 10 be the first signal to transmit a new frame 10A to the
MCU. Since the frame is already in the H.261 codec format, it is not
necessary for the MCU to modify the frame from the H.263 codec
format to the H.261 codec format.
[0209] The MCU prepares outgoing composite CIF signal 14E depicted
in FIG. 4A. First, a CIF picture header which has a picture type of
CIF is generated. Then, a proper temporal reference is assigned to
the picture.
[0210] Since frames received from component QCIF signal 10 have
been assigned to the upper left quadrant of an outgoing composite
CIF signal 14E, the MCU inserts in the upper left quadrant of the
outgoing composite CIF signal 14E the QCIF frame received by the
MCU via component signal 10. If necessary, the GNs for the QCIF
frame should be altered to correspond to the GNs illustrated in
FIG. 6. Since a QCIF frame in the H.261 codec format has GNs 1, 3,
and 5, which match those of the upper left quadrant of a CIF frame
in the H.261 codec format, the GNs need not be altered.
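The GN renumbering described here and in the following paragraphs
can be summarized in one table. The sketch below is illustrative
only; the dictionary name and the dict-based GOB model are
hypothetical, and the GN triplets follow FIG. 6.

    # GNs of the three GOBs of each quadrant in an H.261 composite CIF picture.
    CIF_GNS_BY_QUADRANT = {
        "upper_left":  (1, 3, 5),    # matches the QCIF default, so no change is needed
        "upper_right": (2, 4, 6),
        "lower_left":  (7, 9, 11),
        "lower_right": (8, 10, 12),
    }

    def renumber_gns(qcif_gobs, quadrant):
        """Relabel the three QCIF GOBs (default GNs 1, 3, 5) for the target quadrant."""
        return [dict(gob, GN=gn)
                for gob, gn in zip(qcif_gobs, CIF_GNS_BY_QUADRANT[quadrant])]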
[0211] The MCU fills the upper right quadrant of composite signal
14E with GOB headers each containing the correct GN 2, 4, or 6, as
the case may be. The headers in each GOB are not followed by any
macro block data. Similarly, the MCU fills the lower left quadrant
of composite CIF signal 14E with GOB headers each containing the
correct GN 7, 9, or 11, as the case may be. The headers in each GOB
in the lower left quadrant are not followed by any macro block
data. Finally, the MCU fills the lower right quadrant of composite
CIF signal 14E with GOB headers each containing the correct GN 8,
10 or 12, as the case may be. The headers in each GOB for the lower
right quadrant are not followed by any macro block data. When a GOB
header, with a proper GN, is not followed by any additional macro
block data, Skip MBA is indicated, which means that the picture in
that quadrant is not updated by a participant's video equipment
when the equipment receives that particular composite CIF signal
14E.
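Continuing the sketch above, the three unchanged quadrants can be
represented by header-only GOBs carrying their proper GNs and no
macroblock data, which is what causes the receiving equipment to
leave those areas of the picture alone (Skip MBA). The helper names
are again hypothetical.

    # Same GN mapping as in the previous sketch.
    CIF_GNS_BY_QUADRANT = {
        "upper_left":  (1, 3, 5),
        "upper_right": (2, 4, 6),
        "lower_left":  (7, 9, 11),
        "lower_right": (8, 10, 12),
    }

    def empty_gob(gn):
        """A GOB header with its proper GN and no macroblock data (Skip MBA)."""
        return {"GN": gn, "macroblocks": []}

    def fill_unchanged_quadrants(composite, updated_quadrant):
        """Fill every quadrant except the one being updated with header-only GOBs."""
        for quadrant, gns in CIF_GNS_BY_QUADRANT.items():
            if quadrant != updated_quadrant:
                composite[quadrant] = [empty_gob(gn) for gn in gns]
        return composite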
[0212] The resulting outgoing composite CIF signal 14E is
illustrated in FIG. 4A. When this signal is transmitted and is
received by participant one, the video equipment of this
participant inserts the picture contained in the QCIF frame in the
upper left quadrant of the video picture shown on the participant's
CRT or other screen. The pictures shown on the participant's CRT in
the remaining quadrants remain unchanged.
[0213] After transmitting composite CIF signal 14E to the first
participant, the MCU again monitors the incoming component signals
10 to 13 and waits to receive a new frame. The MCU receives a new
frame 11A from component signal 11.
[0214] Since the frame is in the H.263 codec format, the MCU
changes the codec format to H.261 codec format. When the H.263
codec format is changed to the H.261 codec format, it makes no
difference whether the incoming picture is an I picture or a P
picture. The MCU retains the digital data (i.e., the pixel domain
data) in the frame which defines the video picture of the second
participant. Although the digital data (pixel domain data) which
defines the video picture of the second participant may be
rearranged by the MCU, the video picture which results is
unchanged, or is substantially unchanged, by the MCU.
[0215] The MCU prepares outgoing composite CIF signal 14F depicted
in FIG. 4A. First, a CIF picture header which has a picture type of
CIF is generated. Then, a proper temporal reference is assigned to
the picture.
[0216] Since frames received from component QCIF signal 11 have
been assigned to the upper right quadrant of an outgoing composite
CIF signal 14F, the MCU inserts in the upper right quadrant of the
outgoing composite CIF signal 14F the QCIF frame produced by the
MCU by converting the QCIF frame 11A it receives via component
signal 11 from the H.263 codec format to the H.261 codec format. The GNs
for the QCIF frame are altered to correspond to the GNs illustrated
in FIG. 6. Since the QCIF frame has GNs of 1, 3 and 5, these
numbers are changed to 2, 4, and 6 because the QCIF frame is
inserted in the upper right quadrant of the outgoing composite CIF
signal. The GNs for the upper right quadrant of the composite CIF
signal 14F must, as shown in FIG. 6, be 2, 4, 6.
[0217] The MCU fills the upper left quadrant of composite signal
14F with GOB headers each containing the correct GN 1, 3, or 5, as
the case may be. The headers in each GOB are not followed by any
macro block data. Similarly, the MCU fills the lower left quadrant
of composite CIF signal 14F with GOB headers each containing the
correct GN 7, 9, or 11, as the case may be. The headers in each GOB
in the lower left quadrant are not followed by any macro block
data. Finally, the MCU fills the lower right quadrant of composite
CIF signal 14F with GOB headers each containing the correct GN 8,
10 or 12, as the case may be. The headers in each GOB for the lower
right quadrant are not followed by any macro block data. When a GOB
header, with a proper GN, is not followed by any additional macro
block data, Skip MBA is indicated, which means that the picture in
that quadrant is not updated by a participant's video equipment
when the equipment receives that particular composite CIF signal
14F.
[0218] The resulting outgoing H.261 codec format composite CIF
signal 14F is illustrated in FIG. 4A. When this signal is
transmitted and is received by participant one, the video equipment
of this participant inserts the picture contained in the QCIF frame
in the upper right quadrant of the video picture shown on the CRT
or other screen of participant one. The pictures shown on the
participant's CRT in the remaining quadrants remain unchanged.
[0219] After transmitting composite CIF signal 14F to the first
participant, the MCU again monitors the incoming component signals
10 to 13 and waits to receive a new frame. The MCU receives a new
frame 12A from component signal 12.
[0220] Since the frame is in the H.263 codec format, the MCU
changes the codec format to H.261. When the H.263 codec format is
changed to the H.261 codec format, it makes no difference whether
the incoming picture is an I picture or a P picture. The MCU
retains the digital data (i.e., the pixel domain data) in the frame
which defines the video picture of the third participant. Although
the digital data (pixel domain data) which defines the video
picture of the third participant may be rearranged by the MCU, the
video picture which results is unchanged, or is substantially
unchanged, by the MCU.
[0221] The MCU prepares outgoing composite CIF signal 14G depicted
in FIG. 4B. First, a CIF picture header which has a picture type of
CIF is generated. Then, a proper temporal reference is assigned to
the picture.
[0222] Since frames received from component QCIF signal 12 have
been assigned to the lower left quadrant of an outgoing composite
CIF signal 14G, the MCU inserts in the lower left quadrant of the
outgoing composite CIF signal 14G the QCIF frame produced by the
MCU by converting the QCIF frame 12A it receives via component
signal 12 from the H.263 codec format to the H.261 codec format. The GNs
for the QCIF frame are altered to correspond to the GNs illustrated
in FIG. 6. Since the QCIF frame has default GNs of 1, 3 and 5,
these numbers are changed to 7, 9, and 11 because the QCIF frame is
inserted in the lower left quadrant of the outgoing composite CIF
signal. The GNs for the lower left quadrant of the composite CIF
signal 14G must, as shown in FIG. 6, be 7, 9, 11.
[0223] The MCU fills the upper left quadrant of composite signal
14G with GOB headers each containing the correct GN 1, 3, or 5, as
the case may be. The headers in each GOB are not followed by any
macro block data. Similarly, the MCU fills the upper right quadrant
of composite CIF signal 14G with GOB headers each containing the
correct GN 2, 4, 6, as the case may be. The headers in each GOB in
the upper right quadrant are not followed by any macro block data.
Finally, the MCU fills the lower right quadrant of composite CIF
signal 14G with GOB headers each containing the correct GN 8, 10 or
12, as the case may be. The headers in each GOB for the lower right
quadrant are not followed by any macro block data. When a GOB
header, with a proper GN, is not followed by any additional macro
block data, Skip MBA is indicated, which means that the picture in
that quadrant is not updated by a participant's video equipment
when the equipment receives that particular composite CIF signal
14G.
[0224] The resulting outgoing H.261 codec format composite CIF
signal 14G is illustrated in FIG. 4B. When this signal is
transmitted and is received by participant one, the video equipment
of this participant inserts the picture contained in the QCIF frame
in the lower left quadrant of the video picture shown on the CRT or
other screen of participant one. The pictures shown on the
participant's CRT in the remaining quadrants remain unchanged.
[0225] After transmitting composite CIF signal 14G to the first
participant, the MCU again monitors the incoming component signals
10 to 13 and waits to receive a new frame. The MCU receives a new
frame 13A from component signal 13.
[0226] Since the frame is in the H.263 codec format, the MCU
changes the codec format to H.261. When the H.263 codec format is
changed to the H.261 codec format, it makes no difference whether
the incoming picture is an I picture or a P picture. The MCU
retains the digital data (i.e., the pixel domain data) in the frame
which defines the video picture of the fourth participant. Although
the digital data (pixel domain data) which defines the video
picture of the fourth participant may be rearranged by the MCU, the
video picture which results is unchanged, or is substantially
unchanged, by the MCU.
[0227] The MCU prepares outgoing composite CIF signal 14H depicted
in FIG. 4B. First, a CIF picture header which has a picture type of
CIF is generated. Then, a proper temporal reference is assigned to
the picture.
[0228] Since frames received from component QCIF signal 13 have
been assigned to the lower right quadrant of an outgoing composite
CIF signal 14H, the MCU inserts in the lower right quadrant of the
outgoing composite CIF signal 14H the QCIF frame produced by the
MCU by converting the QCIF frame 13A it receives via component
signal 13 from the H.263 codec format to the H.261 codec format. The GNs
for the QCIF frame are altered to correspond to the GNs illustrated
in FIG. 6. Since the QCIF frame has GNs of 1, 3 and 5, these
numbers are changed to 8, 10, and 12 because the QCIF frame is
inserted in the lower right quadrant of the outgoing composite CIF
signal. The GNs for the lower right quadrant of the composite CIF
signal 14H must, as shown in FIG. 6, be 8, 10, 12.
[0229] The MCU fills the upper left quadrant of composite CIF
signal 14H with GOB headers each containing the correct GN 1, 3, or
5, as the case may be. The headers in each GOB are not followed by
any macro block data. Similarly, the MCU fills the upper right
quadrant of composite CIF signal 14H with GOB headers each
containing the correct GN 2, 4, 6, as the case may be. The headers
in each GOB in the upper right quadrant are not followed by any
macro block data. Finally, the MCU fills the lower left quadrant of
composite CIF signal 14H with GOB headers each containing the
correct GN 7, 9, 11, as the case may be. The headers in each GOB
for the lower left quadrant are not followed by any macro block
data. When a GOB header, with a proper GN, is not followed by any
additional macro block data, Skip MBA is indicated, which means
that the picture in that quadrant is not updated by a participant's
video equipment when the equipment receives that particular
composite CIF signal 14H.
[0230] The resulting outgoing H.261 codec format composite CIF
signal 14H is illustrated in FIG. 4B. When this signal is
transmitted and is received by participant one, the video equipment
of this participant inserts the picture contained in the QCIF frame
in the lower right quadrant of the video picture shown on the CRT
or other screen of participant one. The pictures shown on the
participant's CRT in the remaining quadrants remain unchanged.
[0231] As would be appreciated by those of skill in the art, a
variety of codec formats other than H.263 and H.261 exist and can
be utilized in accordance with the invention to receive and
transmit only one image at a time from a plurality of incoming
channels during a video conference between a plurality of
participants.
[0232] The equipment needed to transmit to the MCU component QCIF
(or other) signals from each participant in a video conference and
to transmit composite CIF (or other) signals from the MCU to
selected ones of the participants in a video conference is well
known and is not described in detail herein.
[0233] Instead of transmitting in a quadrant of a CIF signal 14 a
picture of a video conference participant, other information can be
transmitted. For example, video clips, documents, spreadsheets, and
presentations can be integrated into CIF signal 14 and appear, for
example, in the lower right quadrant instead of a picture of one of
the participants in the video conference.
[0234] FIG. 1 illustrates video sources 1, 2, 3 . . . N
communicating with one another via an MCU during a video
conference. As is illustrated in FIGS. 7 and 8, the MCU can be
located on a server or other computer system separate from the
video equipment of the participants in the video conference, or can
be incorporated in the video equipment of one of the participants
in the video conference.
[0235] FIG. 7 presents an exemplary video conferencing system in
accordance with the present invention. In this system, video
streams are transmitted across a network among N units of video
equipment via an MCU 50. The functions of the call set-up program,
the video data source (camera, codec, etc.), the packet driver, and
the network interface hardware are well known in the art and will
not be described in detail herein.
[0236] Unlike other MCU designs, which need to reside on a separate
server computer, under the current invention an MCU 60 can, due to
its low hardware and software demands, be easily incorporated,
through software modifications, in the video equipment K (FIG. 8) of
one of the participants. As
earlier noted, one of the key reasons the MCU of the invention has
low hardware and software demands is that the MCU works with the
coded domain data and does not need to decode the coded domain
data, in particular the pixel domain data contained in the coded
domain data. The MCU of the invention may change headers in the
coded domain data, and may rearrange the pixel domain data. The MCU
of the invention ordinarily does not, however, decode the pixel
domain data.
[0237] FIG. 9 illustrates a video data source and call set-up
including a controller 27, memory 26, camera 21, and display 20.
Also illustrated in FIG. 9, and associated with said data source,
are the packet driver 28 and network interface 29 of the video
equipment.
The video data source transmits and receives data to and from MCU
30. Other equipment 31 is also in communication with MCU 30.
[0238] A packet driver, network interface, memory and call set-up
are associated with MCU 30 in the manner illustrated in FIGS. 7 and
8.
[0239] Video data from the camera is stored in memory as video data 24.
The video data typically comprises a picture of one of the
participants in the video conference. Controller 27 can direct the
participant's CRT or other display 20 to show a picture of the
participant.
[0240] During a video conference, the call set-up sub-routine 23
transmits call-initialization signals, including codec format data
and other call-initialization data 32, to packet driver 28. Driver
28 assembles such data into packets and transmits the packets 38 to
interface 29. Interface 29 transmits 42 the packets to MCU 30. The
MCU 30 in turn transmits packets 44, containing its
call-initialization data, to network interface 29. Interface 29
transmits the call-initialization packets 40 to driver 28. Driver
28 disassembles the packets into a call-initialization signal 34
that is routed to call set-up data 25 in memory 26 and to call
set-up sub-routine 23. Once this "handshake" protocol exchange is
successfully completed,
controller 27 sends a video component signal 33 to packet driver
28. The video component signal 33 is produced by video signal
sub-routine 22 using video data 24. Driver 28 assembles the video
component signal 33 into a video component packet stream 39.
Interface 29 transmits 43 the packet stream 39 to MCU 30. MCU 30
receives stream 43. The component packet stream 43 is disassembled
by the packet driver associated with the MCU. The MCU 30 prepares a
composite video signal in the manner earlier described. The packet
driver associated with MCU 30 receives the composite video signal
and prepares a composite video packet stream that is transmitted to
the network interface associated with MCU 30. The network interface
associated with MCU 30 transmits the composite video packet stream
45 to interface 29. Interface 29 transmits 41 the composite video
packet stream 45 to packet driver 28. Driver 28 disassembles the
packets in stream 41 to produce a composite video signal 37, which
is stored as video data 24, and a composite video signal 35, which
is provided to sub-routine 22. Signals 35 and 37 contain, for the
sake of this example, the same video data. Controller 27 causes the
display picture on display 20 to be
altered in accordance with the data received 41 in the composite
video packet stream.
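The exchange just described, as seen from one participant's video
equipment, can be summarized by the following sketch. The packet
driver and network interface are collapsed into simple send/receive
helpers, and every name is illustrative; none of them corresponds to
a reference numeral in the figures.

    def participant_session(network, codec_format, video_frames, display):
        """Call set-up handshake followed by component-out / composite-in streaming."""
        # Exchange call-initialization data, including the codec format.
        network.send({"type": "call-init", "codec": codec_format})
        mcu_init = network.receive()                 # the MCU's own call-initialization data
        assert mcu_init["type"] == "call-init"

        # After the handshake, stream component frames and apply the composites.
        for frame in video_frames:
            network.send({"type": "component", "codec": codec_format, "frame": frame})
            composite = network.receive()            # composite video packet stream
            display.update(composite["frame"])       # alter the displayed picture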
[0241] It should be noted that the current invention can be
implemented using a variety of communication media, such as local
area networks (LAN), mobile wireless communication networks, ISDN,
cable/DSL, ATM networks, and wired telephone networks. Also, as
discussed earlier in this application, the video equipment used in
such a system can be a mixture of personal computers, 2.5G/3G video
mobile phones, notebook computers, PDAs, game consoles, etc.
* * * * *