U.S. patent application number 15/094808 was published by the patent office on 2017-03-09 as publication number 20170070742, for a system and method for decoding 3D stereoscopic digital video.
The applicant listed for this patent is TD VISION CORPORATION S.A. DE C.V. Invention is credited to Manuel Rafael Gutierrez Novelo.
United States Patent Application 20170070742 (Kind Code A1)
Inventor: Gutierrez Novelo; Manuel Rafael
Published: March 9, 2017
Application Number: 15/094808
Family ID: 34910116
SYSTEM AND METHOD FOR DECODING 3D STEREOSCOPIC DIGITAL VIDEO
Abstract
Described herein is an MPEG-2 compatible stereoscopic 3D-video
image digital decoding method and system. In order to obtain
3D images from a digital video stream, modifications are made to
current MPEG2 decoders by means of software and hardware
changes in different parts of the decoding process. Namely, the
video_sequence structures of the video data stream are modified via
software to include the necessary flags, at the bit level, for the
image type in the TDVision.RTM. technology. Modifications are also
made in the decoding processes as well as in decoding the
information via software and hardware, wherein a double output
buffer is activated, a parallel and difference decoding selector is
activated, the decompression process is executed, and the corresponding
output buffer is displayed; the decoder is also programmed via
software to simultaneously receive and decode two independent
program streams, each with a TDVision.RTM. stereoscopic
identifier.
Inventors: Gutierrez Novelo; Manuel Rafael (Naperville, IL)

Applicant:
Name: TD VISION CORPORATION S.A. DE C.V.
City: Col. Nueva Santa Maria
Country: MX

Family ID: 34910116
Appl. No.: 15/094808
Filed: April 8, 2016
Related U.S. Patent Documents

Application Number   Filing Date    Patent Number
11/510,262           Aug 25, 2006   (parent of 15/094,808)
PCT/MX2004/000012    Feb 27, 2004   (parent of 11/510,262)
Current U.S. Class: 1/1
Current CPC Class: H04N 19/61 20141101; H04N 19/42 20141101; H04N 19/597 20141101; H04N 19/593 20141101; H04N 19/44 20141101; H04N 19/70 20141101; H04N 19/30 20141101
International Class: H04N 19/44 20060101 H04N019/44; H04N 19/597 20060101 H04N019/597; H04N 19/42 20060101 H04N019/42
Claims
1. A method for decoding stereoscopic digital video, comprising:
receiving one or more video streams, wherein the one or more video
streams comprise headers and first eye images; reading the headers
to determine if the one or more video streams do or do not also
comprise deltas, wherein the deltas, when present, were calculated
by subtracting the first eye images from second eye images, and
wherein if the headers indicate that the deltas are present, adding
the deltas to the first eye images to form decoded second eye
images; wherein receiving one or more video streams comprises
receiving an MPEG-compatible digital video stream comprising a
first eye image, a header, and a data structure stored in the
MPEG-compatible digital video stream with the first eye image,
wherein reading the headers comprises reading bits in the header of
the MPEG-compatible digital video stream to determine if the header
has bit flags signifying the presence of a delta stored in the data
structure, and wherein the method further comprises if the header
does not have the bit flags, decoding the first eye image according
to MPEG standards and outputting an MPEG-compatible two-dimensional
digital video comprising the decoded first eye image; and if the
header has the bit flags, decoding the first eye image and the
second eye image, the second eye image decoded by comparing the
delta stored in the data structure to the first eye image, and
outputting a stereoscopic digital video comprising the decoded
first eye image and the decoded second eye image.
2. The method of claim 1, wherein, if the header has the bit flags,
the bit flags comprise a stereoscopic identifier configured to
indicate the presence of a delta stored in the data structure of
the digital video stream.
3. The method of claim 2, wherein the data structure is a
PICTURE_DATA3D( ) data structure.
4. The method of claim 2, wherein the data structure is a
USER_DATA( ) data structure.
5. The method of claim 1, wherein the header further comprises a
series of bits configured to indicate the aspect ratio for a
two-dimensional digital video, and wherein, if the header does not
have the bit flags, the first eye image is decoded using the
two-dimensional digital video aspect ratio.
6. The method of claim 1, wherein, if the header has the bit flags,
the header further comprises a series of bits configured to
indicate the aspect ratio for a stereoscopic digital video, and the
first eye image and the second eye image are decoded using the
stereoscopic digital video aspect ratio.
7. The method of claim 1, wherein, if the header has the bit flags,
the method further comprises configuring a digital signal processor
to activate a double buffer and identifying the aspect ratio of the
stereoscopic digital video.
8. The method of claim 1, wherein, if the header has the bit flags,
the delta stored in the data structure is a B type image frame.
9. The method of claim 1, wherein, if the header does not have the
bit flags, the method comprises outputting a two-dimensional
digital video comprising an MPEG2 image.
10. The method of claim 1, wherein reading the bits in the header
comprises identifying, using a digital signal processor, a video
sequence in the digital video stream and determining a type of the
digital video stream.
11. The method of claim 1, wherein, if the header has the bit
flags, the stereoscopic digital video is outputted to a high
definition television system or a stereoscopic vision system.
12. A system for displaying stereoscopic digital video to a user,
comprising: a receiver configured to receive one or more video
streams, wherein the one or more video streams comprise a header
and a first eye image; a decoder configured to read the header and
determine if the one or more video streams do or do not also
comprise a delta, wherein the delta, when present, was calculated
by subtracting the first eye image from a second eye image to
determine a difference between the first eye image and the second
eye image, and wherein if the header indicates that the delta is
present, adding the delta to the first eye image to form a decoded
second eye image; wherein the receiver is configured to receive an
MPEG-compatible digital video stream comprising a first eye image,
a header, and a data structure stored in the video stream with the
first eye image; and wherein the decoder is an MPEG-compatible
decoder configured to decode the MPEG-compatible digital video
stream, the decoder further configured to: read bits in the header,
determine if the header has bit flags signifying the presence of a
delta stored in the data structure, the delta, when present in the
data structure, calculated by determining a difference between the
first eye image and a second eye image, if the header does not have
the bit flags, decode the first eye image according to MPEG
standards and output an MPEG-compatible two-dimensional digital
video comprising the decoded first eye image, and if the header has
the bit flags, decode the first eye image and the second eye image,
the second eye image decoded by comparing the delta stored in the
data structure to the first eye image, and output a stereoscopic
digital video comprising the decoded first eye image and the
decoded second eye image.
13. The system of claim 12, wherein, if the header has the bit
flags, the bit flags comprise a stereoscopic identifier configured
to indicate the presence of a delta stored in the data structure of
the digital video stream.
14. The system of claim 13, wherein the data structure is a
PICTURE_DATA3D( ) data structure.
15. The system of claim 13, wherein the data structure is a
USER_DATA( ) data structure.
16. The system of claim 12, wherein the header further comprises a
series of bits configured to indicate the frame rate for a
two-dimensional digital video, and wherein, if the header does not
have the bit flags, the first eye image is decoded using the
two-dimensional digital video frame rate.
17. The system of claim 12, wherein, if the header has the bit
flags, the header further comprises a series of bits configured to
indicate the frame rate for a stereoscopic digital video, and the
first eye image and the second eye image are decoded using the
stereoscopic digital video frame rate.
18. The system of claim 12, wherein the system further comprises a
digital signal processor, and wherein, if the header has the bit
flags, the digital signal processor is configured to activate a
double buffer and identify the aspect ratio of the stereoscopic
digital video.
19. (canceled)
20. The system of claim 12, wherein, if the header does not have
the bit flags, the video output is configured to output a
two-dimensional digital video comprising an MPEG2 image.
21. (canceled)
22. The system of claim 12, further comprising a digital signal
processor configured to identify a video sequence in the digital
video stream and determine a type of the digital video stream.
23.-31. (canceled)
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. application Ser.
No. 11/510,262, filed Aug. 25, 2006, which is a continuation of PCT
Application No. PCT/MX2004/000012 filed on Feb. 27, 2004 in the
Spanish language. The disclosures of all the above-referenced
applications, publications, and patents are considered part of the
disclosure of this application, and are incorporated by reference
herein in their entirety.
FIELD OF THE INVENTION
[0002] The present invention is related to stereoscopic video image
display in the 3DVisor.RTM. device and, particularly, to a video
image decoding method by means of a digital data compression
system, which allows the storage of three-dimensional information
by using standardized compression techniques.
BACKGROUND OF THE INVENTION
[0003] Presently, data compression techniques are used in order to
decrease the number of bits consumed in the representation of an image or
a series of images. Standardization work was carried out by
groups of experts of the International Organization for Standardization.
The resulting methods are usually known as JPEG (Joint Photographic
Experts Group) and MPEG (Moving Pictures Experts Group).
[0004] A common characteristic of these techniques is that the
image blocks are processed by applying a transform suited
to the block, usually the Discrete Cosine
Transform (DCT). The resulting blocks are submitted to a quantization
process and then coded with a variable-length code.
[0005] The variable-length code is a reversible process, which
allows the exact reconstruction of that which has been coded with
the variable-length code.
[0006] The display of digital video signals includes a certain
number of image frames (30 to 96 fps) displayed or represented
successively at a 30 to 75 Hz frequency. Each image frame is a still
image formed by an array of pixels at the display resolution of a
particular system. For example, the VHS system has a display
resolution of 320 columns and 480 rows, the NTSC system has a
display resolution of 720 columns and 486 rows, and the high
definition television system (HDTV) has a display resolution of
1360 columns and 1020 rows. In reference to a digitized form of the low-resolution,
320-column by 480-row VHS format, a two-hour-long
movie could be equivalent to 100 gigabytes of digital video
information. In comparison, a conventional compact optical disk has
an approximate capacity of 0.6 gigabytes, a magnetic hard disk has
a 1-2 gigabyte capacity, and present compact optical disks have
a capacity of 8 or more gigabytes.
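The 100-gigabyte figure above can be checked with a quick back-of-the-envelope computation, assuming 3 bytes per pixel (one byte per RGB component), 30 frames per second, and a two-hour duration:

```c
#include <stdint.h>

/* Uncompressed storage estimate: columns x rows pixels, 3 bytes per
   pixel, at a given frame rate and duration in seconds. */
uint64_t video_bytes(uint64_t cols, uint64_t rows,
                     uint64_t fps, uint64_t seconds)
{
    return cols * rows * 3u * fps * seconds;
}
```

For the 320-by-480 VHS case, video_bytes(320, 480, 30, 7200) gives 99,532,800,000 bytes, i.e. roughly the 100 gigabytes cited above.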
[0007] All images we watch on cinema and TV screens are based
on the principle of presenting complete images (static images, like
photographs) at great speed. When they are presented in a fast,
sequential manner at a speed of 30 frames per second (30 fps), we
perceive them as an animated image due to the retention
characteristics of the human eye.
[0008] In order to code the images to be presented in a
sequential manner and form video signals, each image needs to be
divided into lines, where each line is in turn divided into picture
elements or pixels; each pixel has two associated values, namely,
luma and chroma. Luma represents the light intensity at each point,
while chroma represents the color as a function of a defined color
space (e.g., RGB), which can be represented by three bytes.
[0009] The images are displayed on a screen in a
horizontal-vertical raster, top to bottom and left to right and so
on, cyclically. The number of lines and frequency of the display
can change as a function of the format, such as NTSC, PAL, or
SECAM.
[0010] The video signals can be digitized for storage in digital
format and, after being transmitted, received, and decoded,
displayed on a display device such as a regular television set or
the 3DVisor.RTM.; this process is known as analog-to-digital video
signal coding-decoding.
[0011] By definition, MPEG has two different methods for
interleaving video and audio in the system streams.
[0012] The transport stream is used in systems with a greater error
probability, such as satellite systems, which are susceptible to
interference. Each packet is 188 bytes long, starting with an
identification header, which makes recognizing gaps and repairing
errors possible. Various audio and video programs can be
transmitted simultaneously on a single transport stream; due to the
header, they can be independently and individually decoded and
integrated into many programs.
[0013] The program stream is used in systems with a lesser error
probability, as in DVD playback. In this case, the packets have a
variable length and a size substantially greater than the packets
used in the transport stream. As a main characteristic, the program
stream allows only a single program content.
[0014] Even though the transport and program streams handle different
packets, the video and audio formats are decoded in an identical
form.
[0015] In turn, there are three compression types applied to the
packets above, namely temporal prediction, compression, and spatial
compression.
[0016] Decoding is associated with a lengthy mathematical process
whose purpose is to decrease the information volume. The complete
image of a full frame is divided into units called macroblocks; each
macroblock is made up of a 16.times.16 pixel matrix and is
ordered and named top to bottom and left to right. Even though the
macroblocks form a matrix array on screen, the information sent over the information
stream follows a sequential order, i.e., the macroblocks
are sent in ascending order: macroblock0, macroblock1,
etc.
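The ascending raster numbering of macroblocks described above can be sketched as follows; `macroblock_index` is a hypothetical helper (not part of any MPEG2 API) that maps a pixel position to its raster-order macroblock number, assuming a frame `width` in pixels:

```c
/* Macroblocks are 16x16 pixel units, numbered in ascending order,
   top to bottom and left to right: macroblock0, macroblock1, ...
   Returns the index of the macroblock covering pixel (x, y). */
int macroblock_index(int x, int y, int width)
{
    int mbs_per_row = (width + 15) / 16;   /* macroblocks per row */
    return (y / 16) * mbs_per_row + (x / 16);
}
```

For a 720-pixel-wide frame (45 macroblocks per row), pixel (16, 0) falls in macroblock1 and pixel (0, 16) in macroblock45.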
[0017] A set of consecutive macroblocks represents a slice; there
can be any number of macroblocks in a slice, provided that the
macroblocks pertain to a single row. As with the macroblocks, the
slices are numbered from left to right and bottom to top. The
slices should cover the whole image, though since this is a form in which
MPEG2 compresses the video, a coded image does not necessarily need
samples for each pixel. Some MPEG profiles require handling a rigid
slice structure, by which the whole image should be covered.
[0018] U.S. Pat. No. 5,963,257, granted on Oct. 5, 1999 to Katata et
al., protects a flat video image decoding device with means to
separate the coded data by position areas and image form, a
bottom-layer code, and a predictive-coding top-layer code, thus obtaining a
hierarchical structure of the coded data; the decoder has means to
separate the data coded in the hierarchical structure in order to
obtain a high quality image.
[0019] U.S. Pat. No. 6,292,588, granted on Sep. 18, 2001 to Shen et
al., protects a device and method for coding predictive flat images
reconstructed and decoded from a small region, in such a way that the
data of the reconstructed flat image is generated from the sum of
the small-region image data and the optimal prediction data for
said image. Said predictive decoding device for an image data
stream includes a variable-length code for one-dimensional DCT
coefficients. U.S. Pat. No. 6,370,276, granted on Apr. 9, 2002 to
Boon, uses a decoding method similar to the above.
[0020] U.S. Pat. No. 6,456,432 granted on Sep. 24, 2002 to Lazzaro
et al., protects a stereoscopic 3D-image display system, which
takes images from two perspectives, displays them on a CRT, and
multiplexes the images in a field-sequential manner with no
flickering for both eyes of the observer.
[0021] U.S. Pat. No. 6,658,056, granted on Dec. 2, 2003 to Duruoz et
al., protects a digital video decoder comprising a logical display
section responding to a "proximal field" command to get a digital
video field from designated locations in an output memory. The
digital video display system is equipped with an MPEG2 video
decoder. Images are decoded into a memory buffer; the memory buffer
is optimized by maintaining compensation variable tables and accessing
fixed memory pointer tables displayed as data fields.
[0022] U.S. Pat. No. 6,665,445, granted on Dec. 16, 2003 to Boon,
protects a data structure for image transmission, a flat-image
coding method, and a flat-image decoding method. The decoding
method is comprised of two parts: the first part codifies the
image-form information data stream, and the second part is a decoding
process for the pixel values of the image data stream; both parts
can be switched in the flat-image signal coding.
[0023] U.S. Pat. No. 6,678,331, granted on Jan. 13, 2004 to Moutin
et al., protects an MPEG decoder, which uses a shared memory.
Specifically, the circuit includes a microprocessor, an MPEG decoder
that decodes an image sequence, and a memory common to the
microprocessor and the decoder. It also includes a circuit for
evaluating the decoder delay, and a control circuit for determining
the memory priority for the microprocessor or the decoder.
[0024] U.S. Pat. No. 6,678,424, granted on Jan. 13, 2004 to
Ferguson, protects a behavior model for a real-time human vision
system; it processes two image signals in two dimensions,
one derived from the other, in different channels.
BRIEF DESCRIPTION OF THE INVENTION
[0025] It is an object of the present invention to provide a
stereoscopic 3D-video image digital decoding system and method,
comprised of changes in software and changes in hardware.
[0026] It is an additional object of the present invention to
provide a decoding method where the normal video_sequence process
is applied to the coded image data, i.e., variable_length_decoding
(VLD), inverse_scan, inverse_quantization,
inverse_discrete_cosine_transform (IDCT), and
motion_compensation.
[0027] It is also an object of the present invention to make
changes in the software information for decoding: identifying
the video format, maintaining 2D-image MPEG2 backward compatibility,
discriminating a TDVision.RTM. type image, storing the last image
buffer, applying information decoding, applying error correction,
and storing the results in the respective channel buffer.
[0028] It is still another object of the present invention to
provide a decoding method with the video_sequence process normal
form, in such a way that when a TDVision.RTM. type image is found,
the buffer of the last complete image is stored in the left or
right channel buffers.
[0029] It is also another object of the present invention to
provide a decoding process in which two interdependent (difference)
video signals can be sent within the same video_sequence, in which
information decoding is applied and is stored as a B type
frame.
[0030] It is still another object of the present invention to
provide a decoding process in which error correction is applied to
the last obtained image when the movement and color correction
vectors are applied.
[0031] It is also an object of the present invention to program the
decoder by software to simultaneously receive and decode two
independent program streams.
[0032] It is still another object of the present invention to
provide a decoding system, which decodes the 3D-image information
via hardware, in which a double output buffer is activated.
[0033] It is another object of the present invention to provide a
decoding system of 3D-image information, which activates an
image-decoding selector in parallel and by differences.
[0034] It is also another object of the present invention to
provide a 3D-image information decoding system, which executes the
decompression process and displays the corresponding output
buffer.
BRIEF DESCRIPTION OF THE DRAWINGS
[0035] FIG. 1 represents one embodiment of a technology map.
[0036] FIG. 2 shows a flowchart in which the steps of one
embodiment of a process are outlined.
[0037] FIG. 3 illustrates structures that can be modified and the
video_sequence of the data stream in order to identify the
TDVision.RTM. technology image type at the bit level.
[0038] FIG. 4 shows one embodiment of the compilation software
format for the TDVision.RTM. decoding method (40).
[0039] FIG. 5 is a representation of one embodiment of the decoding
compilation format of the hardware.
DETAILED DESCRIPTION OF THE INVENTION
[0040] The combination of hardware and software algorithms makes
possible the compression of stereoscopic 3D-image information, which
is received as two independent video signals with the same
time_code, corresponding to the left and right signals coming from
a TDVision.RTM. camera, by sending two simultaneous programs with
stereoscopic pair identifiers, thus promoting the coding-decoding
process. Also, two interdependent video signals can be handled by
obtaining their difference, which is stored as a "B" type frame
with the image type identifier. As the coding process was left open
in order to promote technological development, it is only
necessary to follow this decoding process, namely: apply
variable-length decoding to the coded data, where a substantial
reduction is obtained but a look-up table should be used to carry
out decoding; apply an inverse scan process; apply an inverse
quantization process in which each data value is multiplied by a scalar;
apply the inverse cosine transform function; and apply the error correction
or motion compensation stage, eventually obtaining the decoded
image.
[0041] The novel characteristics of this invention in connection
with its structure and operation method will be better understood
from the description of the accompanying figures, together with the
attached specification, where similar numerals refer to similar
parts and steps.
[0042] FIG. 1 represents the technology map to which the subject
matter of the present invention pertains. It shows a stereoscopic
3D-image coding and decoding system and corresponding method. The
images come from a stereoscopic camera (32), the information is
compiled in (31), and the images are displayed in any adequate system (30) or
(33). The information is coded in (34) and then can be
transmitted to a system having an adequate prior decoding stage
such as (35), which may be a cable system (36), a satellite system
(37), a high definition television system (38), or a stereoscopic
vision system such as TDVision.RTM.'s 3DVisors.RTM. (39).
[0043] FIG. 2 shows a flowchart in which the steps of the process
are outlined. The objective is to obtain three-dimensional images
from a digital video stream by making modifications to the current
MPEG2 decoders, and changes to software (3) and hardware (4) in the
decoding process (2): the decoder (1) should be compatible with
MPEG2-4.
[0044] FIG. 3 outlines the structures that should be modified and
the video_sequence of the data stream in order to identify the
TDVision.RTM. technology image type at the bit level.
[0045] Each of the stages of the decoding process is detailed below
(20):
[0046] The coded data (10) are bytes with block information,
macroblocks, fields, frames, and MPEG2 format video images.
Variable_length_decoding (11) (VLD, variable-length decoding)
reverses a compression algorithm in which the most frequent patterns are
replaced by shorter codes and those occurring less frequently are
replaced by longer codes. The compressed version of this
information occupies less space and can be transmitted faster over
networks. However, it is not an easily editable format and requires
decompression using a look-up table.
[0048] For example, consider the word BEETLE:

Letter  ASCII Code  VLC
B       0100 0010   0000 0010 10
E       0110 0101   11
L       0110 1100   0001 01
T       0111 0100   0100

[0049] Therefore, the ASCII code for the word is:
[0050] 0100 0010 0110 0101 0110 0101 0111 0100 0110 1100 0110
0101
[0051] and in VLC: 0000 0010 10 11 11 0100 0001 01 11.
[0052] A substantial decrease is noted; however, in order to go
back from VLC to the word BEETLE, a search in the look-up table is
needed to decode the bit stream. This is made by exact comparison
of the read bits.
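The look-up-table decoding by exact bit comparison can be sketched as below. For readability the bit stream is represented as a string of '0'/'1' characters rather than packed bits, and the table is the BEETLE table above; `vlc_decode` is an illustrative helper, not part of any standard:

```c
#include <string.h>

/* Look-up table for the BEETLE example. */
static const char *codes[]  = { "0000001010", "11", "000101", "0100" };
static const char letters[] = { 'B', 'E', 'L', 'T' };

/* Decode a VLC bit stream by exact comparison of the read bits
   against the table. Returns the number of letters decoded into
   `out`, or -1 if the stream matches no code. */
int vlc_decode(const char *bits, char *out)
{
    int n = 0;
    while (*bits) {
        int matched = 0;
        for (int i = 0; i < 4; i++) {
            size_t len = strlen(codes[i]);
            if (strncmp(bits, codes[i], len) == 0) {
                out[n++] = letters[i];
                bits += len;
                matched = 1;
                break;
            }
        }
        if (!matched) return -1;   /* invalid stream */
    }
    out[n] = '\0';
    return n;
}
```

Feeding it the concatenated VLC stream for BEETLE recovers the word, since the codes form a prefix-free set.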
Inverse scan (12): The information should be grouped in
blocks; by coding the information with the VLC, a linear stream
is obtained. The blocks are 8.times.8 data matrixes, so it is
necessary to convert the linear information into a square 8.times.8
matrix. This is made in a descending zigzag manner, top to bottom
and left to right, in both sequence types, depending on whether it
is a progressive image or an interlaced image.
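A sketch of the inverse scan for the progressive (zigzag) case follows; the scan order is generated by walking the anti-diagonals of the 8.times.8 block rather than hardcoding the 64-entry table, and the alternate scan used for interlaced images is omitted:

```c
/* Generate the zigzag scan order for an 8x8 block: order[i] is the
   row-major position of the i-th coefficient in the stream. */
void build_zigzag(int order[64])
{
    int idx = 0;
    for (int d = 0; d < 15; d++) {            /* 15 anti-diagonals */
        if (d % 2 == 0) {                     /* even: walk up-right */
            for (int row = (d < 8 ? d : 7); row >= 0 && d - row < 8; row--)
                order[idx++] = row * 8 + (d - row);
        } else {                              /* odd: walk down-left */
            for (int col = (d < 8 ? d : 7); col >= 0 && d - col < 8; col--)
                order[idx++] = (d - col) * 8 + col;
        }
    }
}

/* Inverse scan: place 64 linear coefficients back into the block. */
void inverse_scan(const int stream[64], int block[64])
{
    int order[64];
    build_zigzag(order);
    for (int i = 0; i < 64; i++)
        block[order[i]] = stream[i];
}
```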
[0054] Inverse quantization (13): This consists simply in multiplying
each data value by a factor. When coded, most of the data in the
blocks are quantized to remove information that the human eye is
not able to perceive; the quantization allows a greater
MPEG2 stream compression, and it is therefore also required to perform the
inverse process (inverse quantization) in the decoding process.
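In its simplest form, the multiplication the text describes looks like this; the actual MPEG2 formula also involves a weighting matrix and mismatch control, which this sketch deliberately omits:

```c
/* Simplified inverse quantization: each coefficient is multiplied
   back by the quantizer step it was divided by during coding. */
void inverse_quantize(const int quantized[64], int out[64],
                      int quantizer_scale)
{
    for (int i = 0; i < 64; i++)
        out[i] = quantized[i] * quantizer_scale;
}
```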
[0055] Inverse cosine transform (14) (IDCT,
inverse_discrete_cosine_transform): The data handled within each
block pertain to the frequency domain; the inverse cosine
transform allows returning to the samples of the spatial domain. Once
the data have been transformed by the IDCT, pixels, colors, and
color corrections can be obtained.
[0056] Motion compensation (15) allows correcting some errors
generated before the decoding stage of the MPEG format. Motion
compensation takes a previous frame as a reference and calculates a
motion vector relative to the pixels (it can calculate up to four
vectors), using them to create a new image. This motion
compensation is applied to the P and B type images, whose image
position is located at a time "t" from the reference images.
In addition to the motion compensation, error correction is
also applied, as it is not enough to predict the position of a
particular pixel; a change in its color can also exist. Thus,
the decoded image is obtained (16).
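A sketch of the motion compensation and error correction step for a single 16x16 luma macroblock follows; the flat frame layout and the clipping to 0..255 are assumptions of this illustration, and sub-pixel interpolation is omitted:

```c
/* Fetch the predicted macroblock from the reference frame at the
   position displaced by the motion vector, then add the decoded
   error (residual) to correct each pixel value. `ref` and `out`
   are `width`-pixel-wide luma planes. */
void motion_compensate(const unsigned char *ref, int width,
                       int mb_x, int mb_y,     /* macroblock origin */
                       int mv_x, int mv_y,     /* motion vector */
                       const int residual[16][16],
                       unsigned char *out)
{
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++) {
            int src = (mb_y + mv_y + y) * width + (mb_x + mv_x + x);
            int v = ref[src] + residual[y][x];
            if (v < 0) v = 0;                  /* clip to 8 bits */
            if (v > 255) v = 255;
            out[(mb_y + y) * width + (mb_x + x)] = (unsigned char)v;
        }
}
```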
[0057] To decode a P or B type image, the reference image is taken,
the motion vectors are algebraically added to calculate the next
image, and finally the error correction data is applied, thus
successfully generating the decoded image. Actually, in the
video_sequence, two interdependent video signals exist: R-L=delta,
and it is this delta difference that is stored as a B type stereoscopic pair
frame with the TDVision.RTM. identifier, from which the missing image
is constructed at the moment of decoding by differences. That is,
R-delta=L and L+delta=R: the left image is constructed from the
difference with the right image, which in turn is constructed from
the difference with the left image.
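The reconstruction by differences reduces to per-pixel addition and subtraction; a minimal sketch over integer sample arrays, ignoring chroma, clipping, and the B-frame packaging:

```c
/* The coder stores one eye image plus delta = R - L; the decoder
   rebuilds the missing eye per-pixel. */
void reconstruct_right(const int *left, const int *delta,
                       int *right, int n)
{
    for (int i = 0; i < n; i++)
        right[i] = left[i] + delta[i];   /* L + delta = R */
}

void reconstruct_left(const int *right, const int *delta,
                      int *left, int n)
{
    for (int i = 0; i < n; i++)
        left[i] = right[i] - delta[i];   /* R - delta = L */
}
```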
[0058] The previous process is outlined in such a way that the left
or right signal is taken, both are stored in a temporary buffer,
then the difference between the left and right signals is
calculated, and then it is coded as a B type image stored in the
video_sequence to be later decoded by differences from said
image.
[0059] From the decoding process it can be deduced that the data
input to the VLD stage is much smaller than the data output
by that same stage.
[0060] MPEG video sequence structure: This is the maximum structure
used in the MPEG2 format and has the following format:
[0061] Video sequence (Video_Sequence)
[0062] Sequence header (Sequence_Header)
[0063] Sequence extension (Sequence_Extension)
[0064] User Data (0) and Extension (Extension_and_User_Data
(0))
[0065] Image group header (Group_of_Picture_Header)
[0066] User Data (1) and Extension (Extension_and_User Data
(1))
[0067] Image header (Picture_Header)
[0068] Coded image extension (Picture_Coding_Extension)
[0069] User Data (2) and Extensions (Extension_and_User Data
(2))
[0070] Image Data (Picture_Data)
[0071] Slice (Slice)
[0072] Macroblock (Macroblock)
[0073] Motion vectors (Motion_Vectors)
[0074] Coded Block Pattern (Coded_Block_Pattern)
[0075] Block (Block)
[0076] Final Sequence Code (Sequence_end_Code)
[0077] These structures make up the video sequence. A video
sequence applies to the MPEG format; in order to differentiate each
version, there should be a validation that, immediately after the
sequence header, the sequence extension is present. Should the
sequence extension not follow the header, then the stream is in
MPEG1 format.
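That validation can be sketched as a scan for the two start codes: the sequence_header start code is 0x000001B3 and the extension start code is 0x000001B5. This simplified check only verifies that an extension start code appears somewhere after the header, rather than parsing the header payload byte-exactly:

```c
#include <stddef.h>

/* Find a 0x000001xx start code at or after `from`; -1 if absent. */
static long find_code(const unsigned char *buf, size_t len,
                      unsigned char code, long from)
{
    for (long i = from; i + 3 < (long)len; i++)
        if (buf[i] == 0x00 && buf[i+1] == 0x00 &&
            buf[i+2] == 0x01 && buf[i+3] == code)
            return i;
    return -1;
}

/* Returns 2 for MPEG2 (sequence_extension present after the
   sequence_header), 1 for MPEG1, -1 if no sequence header found. */
int mpeg_version(const unsigned char *buf, size_t len)
{
    long hdr = find_code(buf, len, 0xB3, 0);
    if (hdr < 0) return -1;
    return find_code(buf, len, 0xB5, hdr + 4) >= 0 ? 2 : 1;
}
```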
[0078] At the beginning of a video sequence, the sequence_header
and sequence_extension appear in the video_sequence. The
sequence_extension repetitions should be identical to the first
occurrence, while the repetitions of the sequence_header vary little
compared to the first occurrence; only the portion defining the
quantization matrixes should change. Having sequence repetitions
allows random access to the video stream, i.e., if the decoder wants
to start playing at the middle of the video stream, this may be done,
as it only needs to find the sequence_header and sequence_extension
prior to that moment in order to decode the following images. This
is also useful for video streams that cannot be watched from the
beginning, such as when a satellite decoder is turned on after the
transmission has started.
[0079] The full video signal coding-decoding process is comprised
of the following steps:
[0080] Digitizing the video signals, which can be done in NTSC, PAL
or SECAM format.
[0081] Storing the video signal in digital form
[0082] Transmitting the signals
[0083] Recording the digital video stream in a physical media (DVD,
VCD, MiniDV)
[0084] Receiving the signals
[0085] Playing the video stream
[0086] Decoding the signal
[0087] Displaying the signal
[0088] It is essential to double the memory handled by the
adequate DSP and to have up to 8 output buffers available, which
allow the prior and simultaneous representation of a stereoscopic
image on a device such as TDVision.RTM.'s 3DVisor.RTM.
[0089] In practice, two channels should be initialized when calling
the programming API of the DSP, as in the illustrative
case of the Texas Instruments TMS320C62X DSP:
[0090] MPEG2VDEC_create (const IMPEG2VDEC_fxns*fxns, const
MPEG2VDEC_Params* params).
[0091] Where IMPEG2VDEC_fxns and MPEG2VDEC_Params are pointer
structures defining the operation parameters for each video
channel, e.g.:
[0092] 3DLhandle=MPEG2VDEC_create(fxns3DLEFT, Params3DLEFT).
[0093] 3DRhandle=MPEG2VDEC_create(fxns3DRIGHT, Params3DRIGHT).
[0094] This enables two video channels to be decoded, obtaining two
video handlers, one for each of the left and right stereoscopic
channels.
[0095] A double display output buffer is needed, and it will be
defined by means of software which of the two buffers should
display the output by calling the API function:
[0096] MPEG2VDEC_APPLY(3DRhandle, inputR1, inputR2,
inputR3, 3doutright_pb, 3doutright_fb).
[0097] MPEG2VDEC_APPLY(3DLhandle, inputL1, inputL2, inputL3,
3doutleft_pb, 3doutleft_fb).
[0098] This same procedure can be implemented for any DSP,
microprocessor or electronic device with similar functions.
[0099] Here, 3DLhandle is the pointer to the handle returned by the
DSP's create function, the input1 parameter is the
FUNC_DECODE_FRAME or FUNC_START_PARA address, input2 is the pointer
to the external input buffer address, and input3 is the size of the
external input buffer.
[0100] 3doutleft_pb is the address of the parameter buffer and
3doutleft_fb is the beginning of the output buffer where the
decoded image will be stored.
[0101] The timecode and timestamp will be used for output to the
final device in a sequential, synchronized manner.
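The two-channel setup described above can be sketched as follows. The types and the MPEG2VDEC_create signature below are stand-ins modeled on the names quoted in the text; the real definitions live in the Texas Instruments codec headers, so this is an illustration of the left/right channel creation flow under those assumptions, not the vendor API.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the TI MPEG2VDEC API named above. */
typedef struct { int channel_id; } MPEG2VDEC_Obj;
typedef MPEG2VDEC_Obj *MPEG2VDEC_Handle;
typedef struct { int placeholder; } IMPEG2VDEC_Fxns;
typedef struct { int max_width, max_height; } MPEG2VDEC_Params;

static MPEG2VDEC_Obj channels[2];   /* slot 0 = left, slot 1 = right */
static int next_channel = 0;

/* Stub create: returns a distinct handle per call, as the real API would. */
static MPEG2VDEC_Handle MPEG2VDEC_create(const IMPEG2VDEC_Fxns *fxns,
                                         const MPEG2VDEC_Params *params)
{
    (void)fxns; (void)params;
    if (next_channel >= 2) return NULL;
    channels[next_channel].channel_id = next_channel;
    return &channels[next_channel++];
}

/* Initialize one decode channel per stereoscopic view, mirroring the
 * 3DLhandle/3DRhandle calls in the text. Returns 0 on success. */
static int init_stereo_channels(MPEG2VDEC_Handle *left,
                                MPEG2VDEC_Handle *right)
{
    IMPEG2VDEC_Fxns fxns3DLEFT = {0}, fxns3DRIGHT = {0};
    MPEG2VDEC_Params Params3DLEFT = {720, 480}, Params3DRIGHT = {720, 480};

    *left  = MPEG2VDEC_create(&fxns3DLEFT,  &Params3DLEFT);
    *right = MPEG2VDEC_create(&fxns3DRIGHT, &Params3DRIGHT);
    return (*left != NULL && *right != NULL) ? 0 : -1;
}
```

Each handle then feeds its own MPEG2VDEC_APPLY call and its own output buffer, which is the double-buffer arrangement the text requires.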
[0102] It is essential to double the memory handled by the DSP and
to provide up to 8 output buffers, which allow the prior and
simultaneous display of a stereoscopic image on a device such as
TDVision.RTM. Corporation's 3DVisor.RTM.
[0103] The integration of the software and hardware processes is
carried out by devices known as DSPs, which execute most of the
hardware process. These DSPs are programmed in a hybrid of the C
and Assembly languages provided by the manufacturer. Each DSP has
its own API, consisting of a list of functions or procedure calls
located in the DSP and called from software.
[0104] With this reference information, the present
MPEG2-format-compatible 3D-image decoding application is made.
[0105] In fact, at the beginning of a video sequence the sequence
header (sequence_header) and the sequence extension always appear.
Repetitions of the sequence extension must be identical to the
first occurrence. By contrast, repetitions of the sequence header
may vary slightly from the first occurrence: only the portion
defining the quantization matrices may change.
[0106] FIG. 4 shows the compilation software format for the
TDVision.RTM. decoding method (40). The video_sequence (41) of the
digital stereoscopic image video stream, which may be dependent or
independent (parallel images), is identified in the
sequence_header (42). If the image is of TDVision.RTM. type, the
double buffer is activated and the changes in
aspect_ratio_information are identified. The image information that
can be found here is read in the user_data (43). The
sequence_scalable_extension (44) identifies the information
contained in it and the base and enhancement layers; the
video_sequence can be located here, and it defines the
scalable_mode and the layer identifier. extra_bit_picture (45)
identifies the picture_structure; the picture_header and the
picture_coding_extension (46) read the "B" type images and, if it
is a TDVision.RTM. type image, the second buffer is decoded.
picture_temporal_scalable_extension( ) (47), in case of temporal
scalability, is used to decode B type images.
[0107] Namely, the sequence header (sequence_header) provides a
higher level of information about the video stream. For clarity,
the number of bits corresponding to each field is also indicated;
the most significant bits are located within the sequence extension
(Sequence_Extension) structure. The header is formed by the
following structures:
TABLE-US-00002
Sequence_Header
Field                            bits    Description
Sequence_Header_Code             32      Sequence_Header start code, 0x000001B3
Horizontal_Size_Value            12      12 least significant bits of the width*
Vertical_Size_Value              12      12 least significant bits of the height
Aspect_Ratio_Information         4       image aspect ratio:
                                         0000 forbidden
                                         0001 n/a TDVision .RTM.
                                         0010 4:3 TDVision .RTM.
                                         0011 16:9 TDVision .RTM.
                                         0100 2.21:1 TDVision .RTM.
                                         0111 executes a logical "and" in order to
                                              obtain backward compatibility with
                                              2D systems
                                         0101 . . . 1111 reserved
Frame_rate_code                  4       0000 forbidden
                                         0001 24,000/1001 (23.976) in TDVision .RTM. format
                                         0010 24 in TDVision .RTM. format
                                         0011 25 in TDVision .RTM. format
                                         0100 30,000/1001 (29.97) in TDVision .RTM. format
                                         0101 30 in TDVision .RTM. format
                                         0110 50 in TDVision .RTM. format
                                         0111 60,000/1001 (59.94) in TDVision .RTM. format
                                              (executes a logical "and" in order to
                                              obtain backward compatibility with
                                              2D systems)
                                         1000 60
                                         1111 reserved
Bit_rate_value                   18      the 18 least significant bits of the video
                                         stream bit rate (bit_rate = 400 ×
                                         (bit_rate_value + (bit_rate_extension << 18)));
                                         the most significant bits are located within
                                         the sequence_extension structure
Marker_bit                       1       always 1 (prevents start_code emulation)
Vbv_buffer_size_value            10      the 10 least significant bits of
                                         vbv_buffer_size, which determines the size of
                                         the video buffering verifier (VBV), a
                                         structure used to ensure that a data stream
                                         can be decoded with a buffer of limited size
                                         without overflowing or underflowing it
Constrained_parameters_flag      1       always 0; not used in MPEG2
Load_intra_quantizer_matrix      1       indicates whether an intra-coded quantization
                                         matrix is available
If (load_intra_quantizer_matrix)
  Intra_quantizer_matrix(64)     8×64    if a quantization matrix is indicated, it is
                                         specified here as an 8×64 matrix
Load_non_intra_quantizer_matrix  1       indicates whether a non-intra quantization
                                         matrix is available
If (load_non_intra_quantizer_matrix)
  Non_intra_quantizer_matrix(64) 8×64    if the previous flag is set, the 8×64 data
                                         forming the quantization matrix are stored
                                         here
*The most significant bits are located within the sequence_extension structure.

Picture_coding_extension
Field                            bits    Description
Extension_start_code             32      always 0x000001B5
Extension_start_code_identifier  4       always 1000
F_code(0)(0)                     4       used to decode motion vectors; for an I type
                                         image this field is filled with 1111
F_code(0)(1)                     4
F_code(1)(0)                     4       decoding information for backward motion
                                         vectors (B); for a P type image it is set to
                                         1111, since there is no backward motion
F_code(1)(1)                     4       decoding information for backward motion
                                         vectors; for a P type image it is set to
                                         1111, since there is no backward motion
Intra_dc_precision               2       precision used in the inverse quantization of
                                         the DC discrete cosine transform
                                         coefficients:
                                         00 8-bit precision
                                         01 9-bit precision
                                         10 10-bit precision
                                         11 11-bit precision
Picture_structure                2       specifies whether the image is divided into
                                         fields or is a full frame:
                                         00 reserved (image in TDVision .RTM. format)
                                         01 top field
                                         10 bottom field
                                         11 frame picture
Top_field_first                  1       0 = decode the bottom field first
                                         1 = decode the top field first
Frame_pred_frame_dct             1
Concealment_motion_vectors       1
Q_scale_type                     1
Intra_vlc_format                 1
Alternate_scan                   1
Repeat_first_field               1       0 = display one progressive frame
                                         1 = display two identical progressive frames
Chroma_420_type                  1       if the chroma format is 4:2:0, it must equal
                                         progressive_frame; otherwise it must be zero
Progressive_frame                1       0 = interlaced; 1 = progressive
Composite_display_flag           1       signals the presence of the originally coded
                                         composite information:
V_axis                           1
Field_sequence                   3
Sub_carrier                      1
Burst_amplitude                  7
Sub_carrier_phase                8
Next_start_code( )
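As a worked example of the bit-rate field noted in the table, the following sketch reassembles bit_rate from its split fields. The helper name mpeg2_bit_rate is illustrative; the 18-bit/12-bit split follows the description above, with bit_rate_extension carried in the sequence_extension structure, and the rate counted in units of 400 bit/s.

```c
#include <stdint.h>

/* Reassemble the 30-bit bit-rate field: bit_rate_value holds the 18
 * least significant bits (from sequence_header) and bit_rate_extension
 * the 12 most significant bits (from sequence_extension). MPEG-2
 * expresses the rate in units of 400 bit/s. */
uint64_t mpeg2_bit_rate(uint32_t bit_rate_value, uint32_t bit_rate_extension)
{
    uint32_t units = ((bit_rate_extension & 0xFFFu) << 18)
                   | (bit_rate_value & 0x3FFFFu);
    return 400ull * (uint64_t)units;
}
```

For instance, a bit_rate_value of 20000 with a zero extension corresponds to an 8 Mbit/s stream.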
[0108] Picture_Temporal_Scalable_Extension( )
[0109] In case of temporal scalability, two streams of the same
spatial resolution exist: the bottom layer provides a lower
frame-rate version of the video frames, while the top layer can be
used to derive a higher frame-rate version of the same video.
Temporal scalability can be used by low-quality, low-cost or free
decoders, while the higher frame rate would be used for a fee.
TABLE-US-00003
Picture_temporal_scalable_extension( )
Field                            bits    Definition
Extension_start_code_identifier  4       always 1010
Reference_select_code            2       indicates which reference image is used to
                                         decode intra-coded images.
                                         For P type images:
                                         00 the most recent enhancement-layer images
                                         01 the most recent lower-layer frame, in
                                            display order
                                         10 the next lower-layer frame, in display
                                            order
                                         11 forbidden
                                         For B type images:
                                         00 forbidden
                                         01 the most recently decoded images in
                                            enhanced mode
                                         10 the most recently decoded images in
                                            enhanced mode
                                         11 the most recent lower-layer image, in
                                            display order
Forward_temporal_reference       10      temporal reference
Marker_bit                       1
Backward_temporal_reference      10      temporal reference
Next_start_code( )
[0110] Picture_spatial_scalable_extension( )
[0111] In the case of image spatial scalability, the enhancement
layer contains data that allows a better-resolution version of the
base layer to be reconstructed. When an enhancement layer uses a
base layer as a reference for motion compensation, the bottom layer
must be scaled and offset in order to obtain the greater resolution
of the enhancement layer.
TABLE-US-00004
Picture_spatial_scalable_extension( )
Field                                     bits    Definition
Extension_start_code_identifier           4       always 1001
Lower_layer_temporal_reference            10      reference to the lower layer's
                                                  temporal image
Marker_bit                                1       1
Lower_layer_horizontal_offset             15      horizontal compensation (offset)
Marker_bit                                1       1
Lower_layer_vertical_offset               15      vertical compensation (offset)
Spatial_temporal_weight_code_table_index  2       prediction details
Lower_layer_progressive_frame             1       1 = progressive; 0 = interlaced
Lower_layer_deinterlaced_field_select     1       0 = the top field is used
                                                  1 = the bottom field is used
Next_start_code( )
Copyright_extension( )
Field                            bits    Definition
Extension_start_code_identifier  4       always 0100
Copyright_flag                   1       if equal to 1, copyright applies; if zero
                                         (0), no additional copyright information is
                                         needed
Copyright_identifier             8       identifies the copyright instance
Original_or_copy                 1       1 = original; 0 = copy
Reserved                         7
Marker_bit                       1
Copyright_number_1               20      number granted by the copyright instance
Marker_bit                       1
Copyright_number_2               22      number granted by the copyright instance
Marker_bit                       1
Copyright_number_3               22      number granted by the copyright instance
Next_start_code( )

Picture_data( )
[0112] This is a simple structure; it has no fields of its own.
[0113] Slice( )
[0114] Contains information on one or more macroblocks in the same
vertical position.
[0115] Slice_start_code 32
[0116] Slice_vertical_position_extension 3
[0117] Priority_breakpoint 7
[0118] Quantizer_scale_code 5
[0119] Intra_slice_flag 1
[0120] Intra_slice 1
[0121] Reserved_bits 7
[0122] Extrabit_slice 1
[0123] Extra_information_slice 8
[0124] Extrabit_slice 1
[0125] Macroblock( )
[0126] Macroblock_modes( )
[0127] Motion_vectors( )
[0128] Motion_vector( )
[0129] Coded_block_pattern( )
[0130] Block( )
[0131] EXTENSION_AND_USER_DATA(2)
[0132] The image can be displayed in:
[0133] DVD (Digital Versatile Disks)
[0134] DTV (Digital Television)
[0135] HDTV (High Definition Television)
[0136] CABLE (DVB Digital Video Broadcast)
SATELLITE (DSS Digital Satellite Systems). This display stage
constitutes the integration of the software and hardware processes.
[0138] In the decoding compilation format in the hardware section
(50) of FIG. 5, the DSP input memory is duplicated; at the same
time, the simultaneous input of two independent or dependent video
signals is allowed, corresponding to the existing left-right
stereoscopic signal taken by the stereoscopic TDVision.RTM. camera.
In this procedure, the video_sequence (51) is detected in order to
alternate the left and right frames or send them in parallel; the
sequence_header (52) is identified; the image type (53) is
identified; the stream passes to the normal video stream (54) and
is then submitted to an error-correction process (55); the video
image information is sent to the output buffer (56), which in turn
shares and distributes the information to the left channel (57) and
the right channel (58), where the video stream information is
displayed in 3D or 2D.
[0139] This consists of storing both the L (left) and R (right)
video streams simultaneously as two independent video streams,
synchronized with the same time_code, so that they can later be
decoded and played back in parallel in large output buffers. They
can also be dependent and decoded by differences.
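The time_code pairing of the two streams might look like the following minimal sketch. The Frame record and the find_partner helper are hypothetical names standing in for the decoder's real buffer bookkeeping; they only illustrate matching an L frame with the R frame that carries the same time_code.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative frame record: a decoded picture stamped with its
 * time_code, which is shared between the L and R streams. */
typedef struct {
    uint32_t time_code;   /* shared stamp between the two streams */
    const void *pixels;   /* decoded image data (unused here) */
} Frame;

/* Return the index of the frame in `stream` whose time_code matches
 * `tc`, or -1 if the stereoscopic partner has not arrived yet. */
ptrdiff_t find_partner(const Frame *stream, size_t count, uint32_t tc)
{
    for (size_t i = 0; i < count; i++)
        if (stream[i].time_code == tc)
            return (ptrdiff_t)i;
    return -1;
}
```

A playback loop would decode one L frame, look up its partner in the R buffer with this kind of search, and present both simultaneously.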
[0140] Regarding hardware, most of the process is executed by
devices known as DSPs (Digital Signal Processors). For example,
Motorola models and the Texas Instruments TMS320C62X can be used.
[0141] These DSPs are programmed in a hybrid of the C and Assembly
languages provided by the manufacturer in question. Each DSP has
its own API, consisting of a list of functions or procedure calls
located in the DSP and called from software. From this reference
information, the 3D images are coded, compatible with the MPEG2
format and with their own coding algorithm. When the information is
coded, the DSP is in charge of running the prediction, comparison,
quantization, and DCT function application processes in order to
form the MPEG2 compressed video stream.
[0142] In order to obtain three-dimensional images from a digital
video stream, certain modifications have been made to current MPEG2
decoders, through software and hardware changes in different parts
of the decoding process. The structures and the video_sequence of
the video data stream should be modified to include the flags
necessary to identify, at the bit level, the TDVision.RTM.
technology image type.
[0143] The modifications are made in the following decoding steps.
[0144] Software:
[0145] Video format identification.
[0146] Application of a logical "and" for MPEG2 backward
compatibility in case the video is not a TDVision.RTM. video.
[0147] Image decoding in the normal manner (prior technique).
[0148] Scanning of the video_sequence.
[0149] In case of a TDVision.RTM. type image:
[0150] discriminate whether the video signals are dependent or
independent;
[0151] store the last complete image buffer in the left or right
channel buffer;
[0152] apply the B type frame information decoding;
[0153] apply error correction to the last obtained image by
applying the motion and color correction vectors;
[0154] store the results in their respective channel buffer;
[0155] continue reading the video sequence.
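The software steps above can be condensed into a routing sketch. FrameInfo and decode_step are illustrative names, and the real decoder operates on DSP buffers rather than a struct; the sketch only shows the branch between the normal 2D path and the TDVision.RTM. channel-buffer path.

```c
#include <stdint.h>

/* Illustrative per-frame state recovered from the video_sequence. */
typedef struct {
    int is_tdvision;    /* TDVision flag found in the stream */
    int is_dependent;   /* dependent (difference) vs. independent */
    int target_right;   /* which channel buffer receives the result */
} FrameInfo;

/* Returns the channel buffer index the frame ends up in:
 * 0 = left, 1 = right. Non-TDVision streams take the normal 2D
 * decoding path and land in the single (left) output buffer. */
int decode_step(const FrameInfo *f)
{
    if (!f->is_tdvision)
        return 0;                   /* normal 2D decoding path */
    if (f->is_dependent) {
        /* B-type difference decoding: start from the last complete
         * image buffer, then apply the motion and color correction
         * vectors (omitted in this sketch). */
    }
    return f->target_right ? 1 : 0; /* store in its channel buffer */
}
```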
[0156] Hardware:
[0157] When the information is decoded via hardware:
[0158] discriminate whether the image is 2D or 3D;
[0159] activate a double output buffer (memory is increased);
[0160] the difference decoding selector is activated;
[0161] the parallel decoding selector is activated;
[0162] the decompression process is executed;
[0163] the image is displayed in its corresponding output buffer.
[0164] The following structures, sub-structures and sequences will
be used in specific ways; they belong to the video_sequence
structure for the hardware implementation of the MPEG2
backward-compatible TDVision.RTM. technology.
[0165] Actually:
[0166] Sequence_header
[0167] Aspect_ratio_information
[0168] 1001 n/a in TDVision.RTM.
[0169] 1010 4:3 in TDVision.RTM.
[0170] 1011 16:9 in TDVision.RTM.
[0171] 1100 2.21:1 in TDVision.RTM.
[0172] A logical "and" with 0111 will be executed to obtain
backward compatibility with 2D systems. When this occurs, the DSP
is instructed that the buffer of the stereoscopic pair (left or
right) should be equal to the source, so that all decoded images
are sent to both output buffers, allowing the image to be displayed
on any device.
[0173] Frame_rate_code
[0174] 1001 24,000/1001 (23.976) in TDVision.RTM. format
[0175] 1010 24 in TDVision.RTM. format.
[0176] 1011 25 in TDVision.RTM. format.
[0177] 1100 30,000/1001 (29.97) in TDVision.RTM. format.
[0178] 1101 30 in TDVision.RTM. format.
[0179] 1110 50 in TDVision.RTM. format.
[0180] 1111 60,000/1001 (59.94) in TDVision.RTM. format.
[0181] A logical "and" with 0111 will be executed in order to
obtain backward compatibility with 2D systems.
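The backward-compatibility masking described here is a one-liner; to_2d_code is a hypothetical helper showing that ANDing a TDVision.RTM. 4-bit code with 0111 clears the top bit and recovers the corresponding 2D MPEG2 code.

```c
#include <stdint.h>

/* Map a TDVision frame_rate_code or aspect_ratio_information value
 * back to its 2D MPEG2 equivalent: the TDVision codes listed above
 * are the standard codes with the top bit set, so a logical "and"
 * with 0111 recovers the standard value. */
uint8_t to_2d_code(uint8_t code4)
{
    return code4 & 0x7u;   /* logical "and" with binary 0111 */
}
```

For example, the TDVision frame-rate code 1001 maps to 0001 (24,000/1001), and the aspect code 1100 maps to 0100 (2.21:1).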
[0182] User_data( )
[0183] Sequence_scalable_extension
[0184] Picture_header
[0185] Extra_bit_picture
[0186] 0=TDVision.RTM.
[0187] 1=normal
[0188] Picture_coding_extension
[0189] Picture-structure
[0190] 00=image in TDVision.RTM. format
[0191] Picture_temporal_scalable_extension( )
[0192] At the moment of coding the information, a DSP is used that
executes the prediction, comparison and quantization processes,
applies the DCT to form the MPEG2 compressed video stream, and
discriminates between 2D and 3D images.
[0193] Two video signals are coded independently but with the same
time_code; these signals correspond to the left signal and the
right signal coming from a TDVision.RTM. camera, and both programs
are sent simultaneously with TDVision.RTM. stereoscopic pair
identifiers. This type of decoding is known as "by parallel
images" and consists of storing both left and right (L and R)
video streams simultaneously as two independent,
time_code-synchronized video streams. Later, they will be decoded
and played back in parallel. Only the decoding software needs to be
modified; the coding and compression algorithm of the transport
stream remain identical to the current one.
[0194] Software modifications in the decoder.
[0195] In the decoder, two program streams should be programmed
simultaneously, or two interdependent video signals, i.e.,
constructed from the difference between both and stored as a B type
frame with an identifier, following the programming API as in the
example case of the Texas Instruments TMS320C62X family DSP.
[0196] DSP's programming algorithm and method.
[0197] Create two process channels when starting the DSP (primary
and secondary buffers, or left and right, when calling the API).
[0198] Get the RAM memory pointers for each channel (RAM addresses
in the memory map).
[0199] When a TDVision.RTM. type video sequence is obtained:
[0200] it is taken as B type;
[0201] the image is decoded in real time;
[0202] the change or difference is applied to the complementary
buffer;
[0203] the results are stored in the secondary buffer.
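The "difference applied to the complementary buffer" step can be illustrated per sample. Real correction uses motion and color vectors, so this plain additive delta is only a sketch of the primary-to-secondary buffer flow, with apply_difference as an assumed helper name.

```c
#include <stdint.h>
#include <stddef.h>

/* Apply a stored difference (B-type style) to the primary channel's
 * image to regenerate the complementary view in the secondary
 * buffer. The real correction is motion-compensated; a per-sample
 * delta is shown only to illustrate the buffer flow. */
void apply_difference(const uint8_t *primary, const int8_t *delta,
                      uint8_t *secondary, size_t n)
{
    for (size_t i = 0; i < n; i++)
        secondary[i] = (uint8_t)(primary[i] + delta[i]);
}
```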
[0204] Regarding the software in the video_sequence data stream,
two options are implemented:
[0205] 1.--One modifies only the software and uses the user_data( )
section to store the error correction that allows the stereoscopic
signal to be regenerated.
[0206] 2.--The other enables, via hardware, the PICTURE_DATA3D( )
function, which is transparent to MPEG2-compatible readers and
which can be decoded by a TDVision.RTM.-compatible DSP.
[0207] At the moment the MPEG2 decoder detects a user_data( ) code,
it will search for the 32-bit identifier
3DVISION_START_IDENTIFIER=0x0000ABCD, an extremely uncommon and
difficult-to-reproduce code which does not represent valid data.
The length of the 3D block to be read, a 32-bit datum "n", is then
taken into account. When this information is detected within the
USER_DATA( ), a call to the special decoding function is made; its
result is compared to the output buffer, and the n bytes are
applied from the current read offset of the video_sequence as a
typical correction for B type frames. The output of this correction
is sent to another output address, which is directly associated
with a video output additional to that existing in the electronic
display device.
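Scanning user_data( ) for the identifier might look like the following sketch. find_3d_block is a hypothetical helper name, and big-endian byte order is assumed, as with MPEG2 start codes; the function returns the offset of the 32-bit length field "n" that follows the mark.

```c
#include <stdint.h>
#include <stddef.h>

#define TDV_START_IDENTIFIER 0x0000ABCDu  /* 32-bit mark from the text */

/* Scan a user_data( ) payload for the TDVision start identifier and
 * return the offset of the 32-bit block length "n" that follows it,
 * or -1 if the identifier is absent. */
ptrdiff_t find_3d_block(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++) {
        uint32_t v = ((uint32_t)buf[i]     << 24)
                   | ((uint32_t)buf[i + 1] << 16)
                   | ((uint32_t)buf[i + 2] << 8)
                   |  (uint32_t)buf[i + 3];
        if (v == TDV_START_IDENTIFIER)
            return (ptrdiff_t)(i + 4);  /* length field follows */
    }
    return -1;
}
```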
[0208] If the PICTURE_DATA3D( ) structure is recognized, the
decoder proceeds to read the information directly, but writes it
into a second output buffer, which is likewise connected to a video
output additional to that existing in the electronic display
device.
[0209] In the case of the program stream, two signals (left and
right) synchronized by the time_code will be decoded in parallel by
an MPEG decoder with sufficient capacity to decode multiple video
channels simultaneously. Alternatively, two interdependent video
signals can be sent within the same video_sequence, e.g.,
"R-L=delta", where delta is the difference, stored as a "B" type
frame with a stereoscopic pair TDVision.RTM. identifier, which can
be reconstructed at decoding time by differences from the image,
i.e., "R-delta=L" or "L+delta=R", as in the case of the
aforementioned Texas Instruments DSP, which is considered an
illustrative but not limiting example.
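The delta relation can be checked with a per-sample sketch. make_delta and reconstruct_left are illustrative names; real streams carry the delta as a motion-compensated B type frame rather than raw sample differences, so this only demonstrates the arithmetic R-L=delta and R-delta=L.

```c
#include <stdint.h>
#include <stddef.h>

/* Encoder side: delta = R - L, kept with 16-bit headroom. */
void make_delta(const uint8_t *r, const uint8_t *l,
                int16_t *delta, size_t n)
{
    for (size_t i = 0; i < n; i++)
        delta[i] = (int16_t)((int16_t)r[i] - (int16_t)l[i]);
}

/* Decoder side: L = R - delta (equivalently R = L + delta). */
void reconstruct_left(const uint8_t *r, const int16_t *delta,
                      uint8_t *l, size_t n)
{
    for (size_t i = 0; i < n; i++)
        l[i] = (uint8_t)(r[i] - delta[i]);
}
```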
[0210] A video containing a single video sequence, but alternating
the left and right frames at 60 frames per second (30 frames each),
is also implemented; when decoded, the video buffer image is placed
in the corresponding left or right channel.
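The 60 frame/s demultiplexing can be sketched as follows. The even/odd alternation is an assumption made for illustration, since in practice the stream identifies each frame's channel; demux_lr is a hypothetical helper splitting one sequence into two 30 frame/s channels.

```c
#include <stddef.h>

/* Demultiplex a single 60 frame/s sequence that alternates left and
 * right views into two 30 frame/s channels. Even indices are assumed
 * left, odd indices right. Output counts go in *nl and *nr. */
void demux_lr(const int *frames, size_t n, int *left, int *right,
              size_t *nl, size_t *nr)
{
    *nl = *nr = 0;
    for (size_t i = 0; i < n; i++) {
        if (i % 2 == 0) left[(*nl)++]  = frames[i];
        else            right[(*nr)++] = frames[i];
    }
}
```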
[0211] The system will also have the capacity to detect via
hardware whether the signal is of TDVision.RTM. type; if so, it
will identify whether it is a transport stream, a program stream,
or left-right multiplexing at 60 frames per second.
[0212] In the case of the transport stream, the backward
compatibility system is available in current decoders, which have
the ability to display the same video without 3D characteristics,
only in 2D; in that case the DSP is disabled for displaying the
image on any TDVision.RTM. or prior-technique device.
[0213] In the case of the program stream, unmodified coders are
used, such as those currently used in satellite transmission
systems; but the receiver and decoder have a TDVision.RTM. flag
identification system, thus enabling the second video buffer to
form a left-right pair.
[0214] Finally, in the case of multiplexed video, the MPEG decoder
with two video buffers (left-right) is enabled, identifying the
appropriate frame and separating each signal at 30 frames per
second, thus providing a flicker-free image: since the video stream
is constant and because of the characteristic persistence of vision
of the human eye, the multiplexing effect is not perceived.
[0215] While particular embodiments of the invention have been
illustrated and described, it will be obvious to those skilled in
the art that various modifications or changes can be made without
departing from the scope of the present invention. All such
modifications and changes are intended to be covered by the
following claims, so that all changes and modifications fall within
the scope of the present invention.
* * * * *