U.S. patent application number 11/549,927 was filed with the patent office on 2006-10-16 and published on 2007-05-24 as publication number 20070116113 for a system and method for decreasing end-to-end delay during a video conferencing session.
This patent application is currently assigned to POLYCOM, INC. Invention is credited to QUNSHAN GU and JUAN ROJAS.
United States Patent Application 20070116113
Kind Code: A1
Application Number: 11/549,927
Family ID: 32073064
Inventors: GU, QUNSHAN; et al.
Publication Date: May 24, 2007
SYSTEM AND METHOD FOR DECREASING END-TO-END DELAY DURING VIDEO
CONFERENCING SESSION
Abstract
A method for decreasing end-to-end delay in a video conferencing
context is disclosed. At video conferencing system startup, a
processor is initialized to receive either a top field or a bottom
field of video frame data. If the first line of a new field
arriving after initialization does not match a field state that the
processor is initialized to, the present invention senses the state
mismatch and adjusts a display buffer by one display line, and the
field is stored in the display buffer. The display buffer is
adjusted in order to preserve a vertical spatial relationship
between the top and bottom fields.
Inventors: GU, QUNSHAN (Londonderry, NH); ROJAS, JUAN (San Jose, CA)
Correspondence Address: WONG, CABELLO, LUTSCH, RUTHERFORD &
BRUCCULERI, L.L.P., 20333 SH 249, Suite 600, Houston, TX 77070, US
Assignee: POLYCOM, INC., 4750 Willow Road, Pleasanton, CA 94588-2708
Family ID: 32073064
Appl. No.: 11/549,927
Filed: October 16, 2006
Related U.S. Patent Documents

Application Number    Filing Date     Patent Number
10/448,810            May 30, 2003
11/549,927            Oct 16, 2006
60/384,606            May 31, 2002
Current U.S. Class: 375/240.01; 348/E17.003; 348/E7.081; 348/E7.082
Current CPC Class: H04N 7/148 (2013.01); H04N 17/004 (2013.01); H04N 7/147 (2013.01)
Class at Publication: 375/240.01
International Class: H04N 11/04 (2006.01)
Claims
1. A method of storing received video signals containing top fields
and bottom fields, wherein the vertical spatial relationship between
top fields and bottom fields is preserved even though the first field
received was not the expected field, comprising: initializing a codec,
wherein the codec is set to store first field data in the first line
of a frame buffer memory; receiving second field data at the codec;
adjusting the codec to receive the second field data, rather than the
expected first field data; and storing the second field data in the
frame buffer memory.
2. The method of claim 1 wherein, the first field data is top field
data and the second field data is bottom field data.
3. The method of claim 1 wherein, the first field data is bottom
field data and the second field data is top field data.
4. The method of claim 1 wherein the step of adjusting the codec
comprises redirecting a pointer to the first line of the frame buffer
memory to point one line down in the frame buffer memory.
5. The method of claim 1 wherein the step of adjusting the codec
comprises remapping the frame buffer memory to add a new line
before the first line set to store first field data, and storing
the second field data in the new line.
6. The method of claim 1 wherein the step of receiving second field
data at the codec comprises receiving second field data from a
camera.
7. The method of claim 1 wherein the step of receiving second field
data at the codec comprises receiving second field data from a
remote video conferencing device.
8. A video conferencing device comprising a codec that, once
initialized to store first field data upon receipt, is adjustable to
store second field data instead of first field data.
9. The device of claim 8 wherein the first field data is top field
data and the second field data is bottom field data.
10. The device of claim 8 wherein the first field data is bottom
field data and the second field data is top field data.
Description
RELATED APPLICATIONS
[0001] This application claims the benefit under 35 U.S.C. § 119(e)
of U.S. Provisional Patent Application No. 60/384,606, filed
May 31, 2002, which is incorporated herein by reference.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The present invention relates generally to video conference
systems, and more particularly to decreasing end-to-end delay
during video conferencing sessions.
[0004] 2. Description of the Related Art
[0005] The well-known National Television Standards Committee
(NTSC) and Phase Alternating Line (PAL) television standards are
employed by video cameras and monitors to capture and display video
information for consumer applications. Both NTSC and PAL cameras
and monitors capture and display video information in an interlaced
format. Interlacing refers to a method of capturing two fields of
video information per frame. One half of the vertical resolution of a
frame (i.e., every other horizontal line) is captured in a first or
"top" field. The remaining half of the vertical resolution of the
frame is captured in a second or "bottom" field. Each frame of a video
picture produced by an NTSC camera or displayed on an NTSC monitor is
displayed in a 480-line format with each line having 720 pixels, while
the PAL format is displayed in 576 lines. NTSC video is transmitted at
60 fields per second, and PAL video at 50 fields per second.
Adaptations of these formats have been adopted for emerging
high-definition television as well.
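
For illustration only (this sketch is not part of the application),
the line interleaving just described can be made concrete in C. The
dimensions follow the NTSC numbers given above; weave_field() is a
hypothetical helper name.

    #include <string.h>

    #define FRAME_LINES 480               /* NTSC vertical resolution   */
    #define LINE_PIXELS 720               /* pixels per horizontal line */
    #define FIELD_LINES (FRAME_LINES / 2)

    /* Weave one captured field into a full frame buffer: the top field
     * supplies the even frame lines (0, 2, 4, ...), and the bottom
     * field supplies the odd frame lines (1, 3, 5, ...). */
    static void weave_field(unsigned char *frame,
                            const unsigned char *field,
                            int is_top_field)
    {
        int line_offset = is_top_field ? 0 : 1;
        for (int i = 0; i < FIELD_LINES; i++) {
            memcpy(frame + (size_t)(2 * i + line_offset) * LINE_PIXELS,
                   field + (size_t)i * LINE_PIXELS,
                   LINE_PIXELS);
        }
    }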
[0006] Typically, the NTSC or PAL cameras and monitors are used in
conjunction with video conferencing systems that implement the
International Telecommunications Union (ITU) Telecommunications
(ITU-T) H.263 standard (incorporated herein by reference in its
entirety, including all annexes, appendices, and subparts thereof),
since such devices are much less expensive than equipment that
captures video information using progressive (non-interlaced) scan
technology. Until recently, however, the H.263 standard did not
directly support interlaced video transmission, but supported
Common Intermediate Format (CIF), which is a non-interlaced frame
consisting of 288 lines of 352 pixels each. Transmission rate for
CIF video can be as high as 30 frames per second. Thus, video
conference systems had to convert from NTSC (or PAL) into CIF
before coding each input video frame. Such a conversion discards
some spatial and temporal information, and thus degrades the
picture quality. In this context the "spatial information" is the
pixels in both vertical and horizontal directions that are not
included in the CIF frame. Likewise, the discarded "temporal
information" represents the fact that a 50 or 60 frame per second
(fps) transmission of the NTSC or PAL standard is down-sampled to
30 fps in the CIF format.
[0007] In recent years, cost of hardware and transmission bandwidth
required for coding and transmitting interlaced video pictures has
decreased. It is now considered economically practical for a video
conferencing system to code interlaced pictures with a full spatial
dimension of NTSC or PAL input sources. The ITU has addressed this
change in technology by adding Annex W to the H.263 standard.
[0008] Annex W describes how interlaced video signals can be
encoded and decoded when transmitted in a single stream (or
channel) of video information. The Annex W video encoding (or
simply "coding") scheme utilizes a reference frame from one field
to predict a picture of another field. However, a top field in an
interlaced video transmission scheme is a poor predictor of a
bottom field and vice versa. Thus, using the top field to predict
the bottom field can lead to poor picture quality during times of
low motion.
[0009] This particular form of picture quality degradation is due
to the fact that the camera creates a complete picture frame by
first scanning for top field information and then scanning for
bottom field information. Each field is thus separated spatially
(by one line) and temporally (by the refresh period between the end
of the top field and the end of the bottom field). This temporal
and spatial separation can result in display jitter, which is more
noticeable during times of low motion. With this problem in mind,
Section W.6.3.11 of Annex W suggests that Annexes N or U of H.263
can be used to predict from more than one previous field. For
example, two or three previous fields can be used to form a
prediction of the next field. In particular, the field (or fields)
to be used for prediction can be chosen (according to Annexes N or
U) such that each top field is always predicted from the previous
top field (or fields) and each bottom field is always predicted
from the previous bottom field (or fields). In this way, the top
field can be coded and transmitted in a stream completely separate
from the stream containing the bottom field. Using the video
information from the same field for prediction thus mitigates the
picture quality problem described above.
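
As a hedged sketch of this parity-preserving prediction rule (the
indexing scheme is an assumption made for illustration, not notation
from the standard), fields captured in order 0 = top, 1 = bottom,
2 = top, ... can select a same-parity reference by stepping back in
increments of two:

    /* Return the index of a same-parity reference field, or -1 if no
     * such field has been captured yet. Stepping back by multiples of
     * two guarantees a top field is predicted from a previous top
     * field and a bottom field from a previous bottom field. */
    static int reference_field_index(int current, int fields_back)
    {
        int ref = current - 2 * fields_back;
        return (ref >= 0) ? ref : -1;
    }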
[0010] This field prediction scheme is also more resilient to
errors. If one stream of video information is temporarily dropped,
the other stream can continue. Since one field remains, there is
always some video information to decode and display, albeit at a
slower update rate.
[0011] Further, more than one processor may be used to more
efficiently encode a video stream in a multiple-processor
architecture. For example, one processor can code the stream of top
fields, and a second processor can code the stream of bottom
fields, where each processor is programmed to capture and encode
either the top or bottom field of video information. Each processor
may receive both field streams but decode only one. Alternatively, the
video conferencing system may be configured so that each processor
receives only one of the field streams.
[0012] Several shortcomings exist in the above-described systems.
Firstly, dropped fields, caused by large amounts of motion or by
transmission errors occurring in any one of the video signal
transmission streams, can affect the quality of the displayed
picture for an extended period of time. In such cases, the picture
quality remains poor until the coding process recovers. For
instance, if a field of information is lost during transmission for
any reason, and a decoder signals an encoder to encode an "Intra"
field (the use of Intra fields is described within the H.263
standard), the quality of that half of the picture (i.e., the lost
field) will suffer for the period of time it takes the encoder to
recover from the error and/or encode the Intra field.
[0013] Another shortcoming of prior art systems is that the field
that the encoder begins encoding with (at startup) is
indeterminate. The receiving video conference system does not know
a priori whether the first frame to be received will begin with a
top field or a bottom field. This is so because, at the
transmitting video conference terminal, the video camera starts
generating and sending fields of video information before the
encoder is ready to receive the information. After the encoder is
itself initialized, the encoder begins processing at the beginning
of the next field it sees.
[0014] This situation can cause additional and unacceptable
transmission delay. If the received video stream begins with the
same field that the encoder was initialized to expect, there are no
problems and no added delay in subsequent encoding. If, however, the
encoder receives the opposite field from the one expected, the encoder
will wait (i.e., delay) for as much as an entire field capture time
(e.g., 16.7 milliseconds) in order to receive and store the expected
field. This image delay will prevail
for the entire video conferencing session. Such a systematic delay
can lead to unacceptable meeting dynamics and misunderstood
conversations.
[0015] In a dual processor implementation, each processor is
programmed to capture and encode either the top or the bottom field
of video information (i.e., each processor receives both fields of
video but captures and encodes only one).
Generally, at system start time, the encoder randomly sends either
the top or bottom field of video information first. Specifically,
at the time that the video conferencing system is started, either
the top or the bottom field of video can be available to either of
the two processors. This is because the video camera starts
generating and sending fields of video information prior to the
processors being ready to receive video information, and the
processors will capture the first field that is available after
initialization.
[0016] The first field that the decoder receives can be
indeterminate for other reasons as well. For instance, bit errors
contained in a field can also cause the field to be dropped at the
decoder or lost in the network. At startup, an interrupt is generated
by the decoder which has the effect of preparing the decoder to
receive either the top or bottom field of video (the routine that
services this interrupt determines which field the pointer will be
initialized to). In some systems, one interrupt is generated every
16.7 msec (NTSC) or 20 msec (PAL), which is the period of time it
takes to display one field of
information. As a result of this interrupt, a display buffer
pointer is set to a particular memory location. This location
could, for instance, correspond to a first line (i.e., line 0) of
the top field of video information. During normal operation, the
display buffer pointer is changed by the processor whenever the
processor services the interrupt. This interrupt is generated
during a vertical blanking period (i.e., the period during which
the monitor scanning moves back to the top of the display screen).
The receipt and servicing of this interrupt results in the pointer
being moved from a starting position (i.e., either top or bottom
field location) to a second position (i.e., either bottom or top
field location, respectively). Disadvantageously, if the first
field that the encoder captures, encodes, and transmits is not the
field that the decoder buffer pointer was initialized to, then the
decoder must wait one full encoder capture period (e.g., 16.7 msec)
for the next field to arrive. This wait adds 16.7 msec of
end-to-end video delay to the system. When the total end-to-end
video delay ranges from 150 to 200 msec due to bandwidth
availability and network delay, removing 16.7 msec is
significant.
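
A minimal sketch of the interrupt behavior described above, assuming
an interleaved display buffer in which top-field lines begin at line 0
and bottom-field lines at line 1 (the buffer layout and routine names
are illustrative assumptions, not details from the application):

    #define LINE_PIXELS 720

    enum field_state { TOP_FIELD, BOTTOM_FIELD };

    static unsigned char display_buffer[480 * LINE_PIXELS];
    static unsigned char *display_ptr;            /* next field lands here */
    static enum field_state expected = TOP_FIELD; /* set at initialization */

    /* Serviced once per field period (about 16.7 msec for NTSC, 20 msec
     * for PAL), during the vertical blanking period. Normal operation
     * simply alternates the pointer between the top-field start (line 0)
     * and the bottom-field start (line 1). If the first field actually
     * transmitted does not match `expected`, the decoder waits one full
     * field period, which is the delay the invention removes. */
    static void vertical_blanking_isr(void)
    {
        expected = (expected == TOP_FIELD) ? BOTTOM_FIELD : TOP_FIELD;
        display_ptr = display_buffer
                      + ((expected == TOP_FIELD) ? 0 : LINE_PIXELS);
    }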
[0017] Since the first field that the decoder receives at the start
of a video conference session is indeterminate, the decoder may
have to wait one field capture time (e.g., 16.7 msec) to store the
next field in the display buffer, therefore delaying display of the
image. This video image delay prevails for the entire video
conferencing session.
[0018] One of the main problems with end-to-end video delay is that
the delay affects video meeting dynamics. One example of a meeting
dynamics problem arises when a local person makes a statement and
watches a remote participant for a response, and the response is
delayed to the point that the local person is not sure whether the
remote participant understood the statement. Another example arises
when the local person is listening to a participant while waiting for
an opportunity to break in and ask a question. If, at the same time, a
second remote person is also waiting to break in, in all probability
the second remote person will do so before the local person is aware
that the first remote participant has stopped talking. In effect,
people interrupt one another during a meeting in an "uncontrolled"
manner. It is therefore very desirable to keep the end-to-end delay as
short as possible, giving the meeting as "natural" a feeling as
possible.
[0019] Therefore, there is a need for a method that avoids introducing
additional delay into a video conferencing session.
SUMMARY OF THE INVENTION
[0020] The present invention provides in various embodiments a
method for decreasing end-to-end delay in a video conferencing
context. According to one embodiment of the present invention, a
processor is initialized to receive an initial field of video frame
data having a first state. The processor receives an initial field
of video frame data having either a first state or a second state.
If the state of the initial field of video frame data is not the
same as a state that the processor is initialized to, then a
display buffer is adjusted by one display line, and the initial
field of video frame data having a second state is stored in the
display buffer.
[0021] According to another embodiment of the present invention, a
method is provided for decreasing end-to-end delay in a video
conferencing context, where at least one buffer pointer is
initialized to either a first state or a second state to form a
first initialized buffer pointer. The first state is associated
with a top field of the video frame data, and the second state is
associated with a bottom field of the video frame data. An initial
field of video frame data is received having either the first state
or the second state. If the state of the initial field of video
frame data is not the same as the state of the first initialized
buffer pointer, the state of the first initialized buffer pointer
is toggled, and the first received field is stored into a buffer
using the first initialized buffer pointer.
[0022] A further understanding of the nature and advantages of the
inventions herein may be realized by reference to the remaining
portions of the specification and the attached drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] The foregoing and other advantages of the invention will be
appreciated more fully from the following further description
thereof and with reference to the accompanying drawings,
wherein:
[0024] FIG. 1 shows an exemplary video conferencing system that may
be used with the present invention.
[0025] FIG. 2 shows an exemplary high-level schematic view of
elements of a video conference terminal.
[0026] FIG. 3 shows a display format for PAL standards.
[0027] FIG. 4 shows a schematic representation of the organization
of a video frame buffer where the display buffer pointer is
initialized to the top field and the bottom field is received
first.
[0028] FIG. 5 shows a schematic representation of the organization
of a video frame buffer where the display buffer pointer is
initialized to the bottom field and the top field is received
first.
[0029] The use of the same reference symbols in different drawings
indicates similar or identical items.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Introduction
[0030] To provide an overall understanding of the present
invention, certain illustrative embodiments will now be described
in the context of an ITU Standard H.263 video conferencing
system.
[0031] It will be understood by those of ordinary skill in the art
that the methods and systems described herein may be suitably
adapted to other video coding techniques, such as Moving Picture
Experts Group (MPEG) standards, Audio Video Interleave (AVI), or
Multiple Image Network Graphics (MNG). All such adaptations and
modifications that would be clear to one of ordinary skill in the
art are intended to fall within the scope of the invention
described herein.
[0032] Furthermore, although the term "coding" is used herein,
those of ordinary skill in the art will appreciate that the
reciprocal decoding function is also implicated in the use of the
present invention. Accordingly, all references to coding techniques
are to be understood to include decoding techniques unless
specifically identified otherwise.
[0033] As used herein, terms such as "image", "image data",
"picture", "picture data", "video", "video data", and "video
stream" are intended to refer generally to any form of video data,
unless specifically stated otherwise. This includes reference
images (which may, for example, be represented or described in
terms of luminance and chrominance data), differential data, motion
vectors, sequential identifiers, and any other coding and control
information, whether relating to blocks, macro-blocks, frames, or
any other partial or complete image representation, however
encoded.
[0034] Referring to FIG. 1, an exemplary video conferencing system
that may be used with the present invention is shown. In a video
conferencing network 100, a rack 110 may include a multi-point
conference unit ("MCU") 120, a gateway 130, and hardware/software
for other services 150. The gateway 130 may provide one or more
connections to a Public Switched Telephone Network ("PSTN") 160, for
example, through high speed connections such as Integrated Services
Digital Network ("ISDN") lines, T1 lines, or Digital Subscriber
Lines ("DSL"). Multiple PSTN video conferencing terminals 170 may
also be connected in a communicating relationship with the PSTN
160, and may be accessible using known telecommunications dialing
and signaling services.
[0035] The MCU 120 may also be connected in a communicating
relationship with a network 180. Multiple Internet Protocol ("IP")
video conferencing terminals 190 may also be connected in a
communicating relationship with the network 180, and may be
accessible using known data networking techniques, such as IP
addressing.
[0036] It will be appreciated that, although the following
description refers to the network 180 (e.g., an IP network such as
the Internet) and the PSTN 160, any network for connecting
terminals may be usefully employed according to the principles of
the present invention. The network 180, for example, may be any
packet-switched network, a circuit-switched network (such as an
Asynchronous Transfer Mode ("ATM") network), or any other network
for carrying data, including the well-known Internet, an extranet, a
local area network, or other networks of networks known in the
art. Further, the PSTN 160
may likewise be any circuit-switched network, or any other network
for carrying circuit-switched signals or other data. It is
additionally appreciated that the PSTN 160 and/or the network 180
may likewise include wireless portions, or may be completely
wireless networks. Finally, the principles of the present invention
may be usefully employed in any multimedia system.
[0037] It will also be appreciated that the components of the rack
110, such as the MCU 120, the gateway 130, and the other services
150, may be realized as separate physical machines, as separate
logical machines on a single physical device, as separate processes
on a single logical machine, or some combination of these. Further,
a single, physical rack device is not required. Additionally, each
component of the rack 110, such as the gateway 130, may comprise a
number of separate physical machines grouped as a single logical
machine, for example, where traffic through the gateway 130
exceeds the data handling and processing power of a single machine. A
distributed video conferencing network may include a number of
racks 110, as indicated by ellipsis 192.
[0038] Each PSTN video conferencing terminal 170 may use an
established telecommunications video conferencing standard such as
H.320. Further, each IP video conferencing terminal 190 may use an
established data networking video standard such as H.323. H.320 is
an ITU-T standard for sending video and audio over the PSTN 160,
and provides common formats for compatible audio/video inputs and
outputs, and protocols that allow a multimedia terminal to utilize
the communications links and synchronize audio and video signals.
The T.120 standard may also be used to enable data sharing and
collaboration. The ITU-T, H.320, and T.120 standards are
incorporated herein by reference in their entireties.
[0039] The gateway 130 may communicate with the PSTN 160, and may
translate data and other media between a form that is compatible
with the PSTN 160 and a form that is compatible with the network
180, including any protocol and media translations required to
transport media between the networks.
[0040] Referring now to FIG. 2, some major components of a video
conferencing terminal 200 suitable for use with either a PSTN or an
IP network are shown in a high-level schematic form. The terminal
200 may include input devices such as a video camera 210, a
microphone 215, a keyboard (not shown), and a pointing device (not
shown). The terminal 200 may also include output devices such as a
speaker 220 and a display system 225. Those of ordinary skill in
the conferencing arts will also recognize that additional input and
output devices, including but not limited to overhead projectors,
projection video systems, and whiteboards, may also be used as
components within the video conferencing terminal 200.
[0041] The video conferencing terminal 200 also may contain analog
to digital converters ("A/D") 230 for converting analog input
signals from one or more sources into a digital form for encoding.
An audio coder/decoder ("codec") 240, which may include A/D
converter 230 functionality, encodes audio signals for transmission
via a transmitter 260. Similarly, a video codec 250 performs
analogous functions for video signals.
[0042] In an exemplary embodiment, the video codec 250 comprises
separate encoders 252 and 254 for top and bottom video fields that
make up each video frame, respectively. The video codec 250 may
also include a field splitter 257, combiner/multiplexer 259
functions, and A/D converter function 230, depending on the type
and output signal characteristics of the camera 210. Typically,
functional blocks 230, 257, 252, 254, and 259 are present in all
video encoding systems, and the present description is intended
only to convey a functional overview of video signal processing
rather than a working schematic.
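
That dual-encoder structure can be summarized with a structural sketch
in C; the function-pointer signatures here are assumptions made purely
for illustration and do not come from the application.

    /* Structural sketch of the transmit video path of FIG. 2.
     * Reference numerals in the comments match the description above. */
    typedef struct {
        void (*ad_convert)(const void *analog_in, void *frame);     /* 230 */
        void (*split_fields)(const void *frame,
                             void *top_field, void *bottom_field);  /* 257 */
        void (*encode_top)(const void *top_field, void *stream_a);  /* 252 */
        void (*encode_bottom)(const void *bottom_field,
                              void *stream_b);                      /* 254 */
        void (*combine)(const void *stream_a, const void *stream_b,
                        void *muxed_out);                           /* 259 */
    } video_codec_250;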
[0043] While those of ordinary skill in the art will readily
recognize the function of a codec, as used herein the term "codec"
is not limited to a device or subsystem that performs coding and
decoding simultaneously. Instead, the term "codec" is herein used
to refer to aggregated functions of coding (or encoding) and
decoding, which may be performed exclusively or in combination in
one or more physical devices. Thus, in certain instances the term
"encoder" (or its equivalent, "coder") is used to connote the
encoding function only. In other instances, the term "decoder" is
used to connote the decoding function. In still other contexts, the
term "codec" may be used as a generalization of either or both
functions.
[0044] The video codec 250 and the audio codec 240 (and their
counterpart codecs 251 and 241 in the receiving path of the
terminal 200, respectively) provide standards-based conferencing
according to the H.320 and T.120 standards for PSTN terminals or
H.323 standard for IP terminals. These standards may be implemented
entirely in software on a computer (not shown), on dedicated
hardware, or in some combination of both.
[0045] The terminal 200 also includes a receive path, comprised of
a network receiver 270, the audio codec 241 and the video codec
251. The video codec 251 may include a display driver function, or
that function may be implemented separately in a display driver
255, as illustrated. Likewise, the audio codec 240 may include a
digital to analog ("D/A") converter, or the D/A converter function
may be provided externally, as in a D/A converter 245.
[0046] Referring to FIG. 1, the MCU 120 may communicate with the IP
video conferencing terminals 190 over the network 180 or with PSTN
video conferencing terminals 170 over the PSTN 160. The MCU 120 may
also include hardware and/or software implementing the H.323
standard (or the H.320 standard, where the MCU 120 is connected to
the PSTN 160) and the T.120 standard, and may also include
multipoint control for switching and multiplexing video, audio, and
data streams in a multimedia conference. The MCU 120 may
additionally include hardware and/or software to receive from, and
transmit to, the PSTN video conferencing terminals 170 connected to
gateway 130.
[0047] The MCU 120 may reside on one of the racks 110 (as shown in
FIG. 1) or may be located elsewhere in the network, as are the
MCU's 120a and 120b. It will be appreciated that the MCU 120 may
also reside in one of the PSTN video conferencing terminals 170, or
one of the IP video conferencing terminals 190, and may be
implemented in hardware, software, or some combination thereof.
[0048] The rack 110 may provide additional services for use in a
video conference. These may include, for example, audio/video
codecs that are not within the H.323 or H.320 standards, such as
the G2 codec and streamer for use with a proprietary streaming
system sold by RealNetworks, Inc., or a Windows Media codec for use
with proprietary media systems sold by Microsoft Corporation. Other
services may include, for example, a directory server, a conference
scheduler, a database server, an authentication server, and a
billing/metering system.
[0049] Video codecs may include codecs for standards such as H.261
FCIF, H.263 QCIF, H.263 FCIF, H.261 QCIF, and H.263 SQCIF. These video
teleconferencing standards define different image size and quality
parameters. Further, audio codecs may include codecs for standards
such as G.711, G.722, G.722.1, and G.723.1. These audio
teleconferencing standards define audio data parameters for audio
transmission. Any other proprietary or non-proprietary standards
currently known or that may be developed in the future for audio,
video, and data may likewise be used with the present invention,
and are intended to be encompassed by this description. For
example, current H.320 devices typically employ monaural sound;
however, the principles of the invention may be readily adapted to
a conferencing system employing stereo coding and reproduction, or
any other spatial sound representation. Each and every standard
recited herein is hereby incorporated by reference in its entirety,
including any and all appendices, annexes, and subparts thereof, as
if it were set forth herein.
Delay Avoidance
[0050] Referring to FIG. 2, video conferencing delay is avoided by
ensuring that each received field is stored in a local video buffer
memory (i.e., at the transmit video conferencing terminal or the
receive video conferencing terminal, as appropriate) without loss
of any fields due to a mismatch between the initialized state of
the video buffer pointer and the state of the first received video
field. "State" in the context of this application refers to the
association of both the video buffer pointer and contents of a
received video field with one of two types of fields (i.e., a top
or a bottom field). A particular instance of a buffer pointer
identifying a buffer location at which to begin storing the top
field of a received video frame has a "top" state; the instance of
a pointer identifying a buffer for the bottom field has a "bottom"
state. Likewise, the video data first received after initialization
of the camera 210 and the encoder 250 (or the receiver 270 and the
decoder 251, at the receiving video conferencing terminal) is
always the first line of either the top field or the bottom field,
by common definition of the interlaced video standards. The first
received datum in a given field (or the beginning of a field,
generally) is thus referred to herein as having either a "top" or
"bottom" state, respectively.
[0051] At video conferencing system startup, both the video encoder
250 and the video decoder 251 are initialized to receive either a
top field or a bottom field of video frame data. As part of this
initialization, a display buffer pointer is set to a particular
memory location at each video conferencing terminal (or "end" of
the conference), corresponding, for example, to the first line of
the top field of video information. A second display buffer and its
associated pointer are maintained by a local processor for the
bottom field. Alternatively, a second, separate processor can be
employed to buffer alternating fields.
[0052] As field information is received by the video conferencing
system (either from the local camera 210 or from a transmitting
terminal), the data is temporarily stored (i.e., buffered) in the
local display buffer. During normal operations, the display buffer
pointer is changed by the processor during a vertical blanking
period of each frame to reset the pointer to a beginning of the
buffer in preparation for the next field. For example, if the first
field received is a top field, the display buffer pointer must be
reset to the beginning of the bottom field buffer after the top
field has been displayed.
[0053] Regardless of the initial state of the display buffer
pointer, if the first line of a new field arriving after
initialization is not what was expected (i.e., does not match the
field state of the buffer pointer), the present invention senses
the state mismatch, and dynamically resets the buffer pointer to
point to the correct buffer. Since the buffer pointer has only two
possible states (i.e., pointing to the top field or the bottom
field), a dynamic reset can take the form of a state toggle.
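
A minimal sketch of this dynamic reset, assuming a two-valued state
type (the names are illustrative, not from the application):

    enum field_state { TOP_FIELD, BOTTOM_FIELD };

    /* If the first field to arrive does not match the state the buffer
     * pointer was initialized to, toggle the pointer's state once so
     * the field can be stored immediately, instead of discarding it
     * and waiting up to a full field capture time for the expected
     * field. */
    static enum field_state
    resolve_initial_state(enum field_state initialized,
                          enum field_state received)
    {
        if (received != initialized)
            initialized = (initialized == TOP_FIELD) ? BOTTOM_FIELD
                                                     : TOP_FIELD;
        return initialized;  /* now matches the field actually received */
    }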
[0054] Referring to FIG. 3, an example of a display format for PAL
standards 300 is shown. At video conferencing system startup, both
the video encoder 250 (FIG. 2) and the video decoder 251 (FIG. 2)
are initialized to receive either a top field 310 or a bottom field
320 of video frame data. As part of this initialization, a display
buffer pointer is set to a particular memory location at each video
conferencing terminal corresponding to the first line of a field of
video information. A second display buffer and its associated
pointer can be maintained by the local processor for the bottom
field. Alternatively, a second, separate processor may be employed
to buffer alternating fields.
[0055] The video processor senses the received field state when the
video processor decodes the video and picture layer information. In
particular, a PSUPP field in the picture layer of an
H.263-compliant video signal contains, within the Picture Message
(function type [FTYPE] 14), an indication of whether the field is
the top field 310 or the bottom field 320. The PSUPP field is,
itself, fully described in section W.6.3 of Annex W to the H.263
standard, and is thus well-known to persons of ordinary skill in
the art.
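
As a hedged illustration only: the bitstream parsing of PSUPP is
defined by Annex W and is omitted here; the helper below assumes the
FTYPE value and a single pre-parsed top/bottom indication bit have
already been extracted (both the signature and the flag are
hypothetical).

    enum field_state { TOP_FIELD, BOTTOM_FIELD };

    /* Map an already-decoded Picture Message to a field state. FTYPE
     * 14 is the Picture Message carrying the top/bottom indication per
     * the description above; `top_flag` is a hypothetical pre-parsed
     * bit. Returns 0 on success, -1 if no field indication is present. */
    static int sense_field_state(int ftype, int top_flag,
                                 enum field_state *out)
    {
        if (ftype != 14)
            return -1;
        *out = top_flag ? TOP_FIELD : BOTTOM_FIELD;
        return 0;
    }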
[0056] As field information is received by the video conferencing
terminal 170 (FIG. 1) from a local camera or from a transmitting
terminal, the data may be temporarily stored (buffered) in the
local display buffer by the terminal's video processor. During
normal operations, the display buffer pointer is changed by the
processor during the vertical blanking period of each frame to
reset the pointer to the beginning of the buffer in preparation for
the next field. For example, if the first field received is the top
field 310, the display buffer should be reset to the beginning of
the bottom field buffer after the top field 310 has been
displayed.
[0057] Regardless of the initial state of the display buffer
pointer, if the first line of a new field arriving after
initialization (the "initial field") does not match the field state
of the buffer pointer, the video processor senses the state
mismatch, and dynamically resets the buffer pointer to point to the
correct buffer, examples of which are shown in FIGS. 4 and 5.
Where, as in PAL or NTSC video, the buffer pointer has only two
possible states (i.e., pointing to the top field or the bottom
field), this "dynamic reset" can take the form of a state
toggle.
[0058] The buffer pointer can be initialized to either a first
state or a second state. The first state is associated with the top
field 310 and the second state with the bottom field 320 of video
frame data. Referring to FIG. 4, a schematic
representation of an organization of an exemplary embodiment of a
video frame buffer 411, where the display buffer pointer is
initialized to the top field 310 (FIG. 3) and the bottom field 320
(FIG. 3) is received first, is shown. In this scenario, because the
state of the initial field of video frame data is not the same as
the state of the first initialized buffer pointer, the state of the
first initialized buffer pointer is toggled 421, and the first
received field is stored into a buffer using the first initialized
buffer pointer. In other words, the processor will immediately
cause the display buffer pointer to reposition the display lines
such that a vertical spatial relationship between top and bottom
lines is preserved. In this embodiment, the toggling 421 can be a
change of state of the first initialized buffer pointer or the
replacement of the first initialized buffer pointer with a second
initialized buffer pointer having a state different from the state
of the first initialized buffer pointer.
[0059] Other embodiments exist that do not depend heavily on buffer
pointers and their adjustment. In such a case, the processor is
initialized to receive an initial field of video frame data having
a first state, but the processor receives an initial field of video
frame data having a second state. The display buffer is then
adjusted by one display line, and the initial field of video frame
data having a second state is stored into the display buffer. As
shown in FIG. 4, the first state is the top field 310, and the
second state is the bottom field 320. Therefore, the display buffer
is adjusted down one display line. At this moment, the display
buffer 411 is remapped for an additional position 422. The toggling
421, in this embodiment, is the adjustment of the display position
of the fields by one line downward such that, although bottom field
320 lines go into top field 310 lines in the display buffer, the
vertical spatial relationship between top field 310 and bottom
field 320 lines is preserved.
[0060] Referring to FIG. 5, a schematic representation of an
organization of an exemplary embodiment of a video frame buffer
511, where the display buffer pointer is initialized to the bottom
field 320 (FIG. 3) and the top field 310 (FIG. 3) is received
first, is shown. In this embodiment, because the state of the
initial field of video frame data is not the same as the state of
the first initialized buffer pointer, the state of the first
initialized buffer pointer is toggled 521, and the first received
field is stored into a buffer using the first initialized buffer
pointer. In other words, the processor will immediately cause the
display buffer pointer to reposition the display lines such that
the vertical spatial relationship between top and bottom lines is
preserved. In this embodiment, the toggling 521 can be a change of
state of the first initialized buffer pointer, or the replacement
of the first initialized buffer pointer with a second initialized
buffer pointer having a state different from the state of the first
initialized buffer pointer.
[0061] Other embodiments exist that do not depend heavily on buffer
pointers and their adjustment. In such a case, the processor is
initialized to receive an initial field of video frame data having
a first state, but the processor receives an initial field of video
frame data having a second state. As shown in FIG. 5, the first
state is the bottom field 320 and the second state is the top field
310. Therefore, the display buffer is adjusted up one display line.
At this moment, the buffer is remapped to add an additional
position 522. The toggling 521 in this embodiment is the adjustment
of the display position of the fields by one line upward such that,
although top field 310 lines go into bottom field 320 lines in the
display buffer, the vertical spatial relationship between top field
310 and bottom field 320 lines is preserved.
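
Both adjustments (FIG. 4, one line down; FIG. 5, one line up) reduce
to a single signed line offset applied to the display mapping. A
minimal sketch, with all names illustrative:

    enum field_state { TOP_FIELD, BOTTOM_FIELD };

    /* Return the display-line offset to apply when the initial field's
     * state differs from the initialized state: +1 remaps the display
     * one line down (FIG. 4, bottom field arrives first), -1 remaps it
     * one line up (FIG. 5, top field arrives first). Either way the
     * vertical spatial relationship between top and bottom lines is
     * preserved. */
    static int display_line_offset(enum field_state initialized,
                                   enum field_state received)
    {
        if (received == initialized)
            return 0;                         /* no mismatch, no shift */
        return (initialized == TOP_FIELD) ? +1 : -1;
    }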
[0062] The method of the present invention may be performed in
hardware, software, or any combination thereof, as those terms are
currently known in the art. In particular, the present method may
be carried out by software, firmware, or microcode operating on a
computer or computers of any type. Additionally, software embodying
the present invention may comprise computer instructions in any
form (e.g., source code, object code, interpreted code, etc.)
stored in any computer-readable medium (e.g., ROM, RAM, magnetic
media, punched tape or card, compact disc (CD) in any form, DVD,
etc.). Furthermore, such software may also be in the form of a
computer data signal embodied in a carrier wave, such as that found
within Web pages transferred among devices connected to the
Internet. Accordingly, the present invention is not limited to any
particular platform, unless specifically stated otherwise
herein.
[0063] The above description is illustrative and not restrictive.
Many variations of the invention will become apparent to those
skilled in the art upon review of this disclosure. The scope of the
invention should therefore be determined not with reference to the
above description, but instead should be determined with reference
to the appended claims along with their full scope of
equivalents.
* * * * *