U.S. patent application number 13/225202 was filed with the patent office on 2011-09-02 and published on 2012-03-08 for video classification systems and methods.
The invention is credited to Fang SHI and Biao Wang.
Application Number | 13/225202
Publication Number | 20120057633
Family ID | 45770713
Publication Date | 2012-03-08
United States Patent Application 20120057633
Kind Code: A1
SHI; Fang; et al.
March 8, 2012
Video Classification Systems and Methods
Abstract
Video encoder systems and methods are described that employ
table-based content classification. One or more tables relate
quantization parameters and P-points for a frame of video that
typically comprises macroblocks. A deviation representative of a
difference between original and decoded versions of a macroblock is
determined, the deviation being further representative of a
distribution frequency of the value of a distortion for a P-point.
The P-point corresponds to a distortion value that is associated
with a minimum rate difference between encoding modes for a
macroblock. A motion complexity index is updated using a
quantization parameter and non-zero coefficients of the encoded
frame. An encoding mode for the macroblock can be retrieved from
the tables using the motion complexity index to reference mode
information maintained in the tables.
Inventors: SHI; Fang (San Diego, CA); Wang; Biao (Chengdu, CN)
Family ID: 45770713
Appl. No.: 13/225202
Filed: September 2, 2011
Current U.S. Class: 375/240.16; 375/E7.125
Current CPC Class: H04N 19/198 20141101; H04N 19/61 20141101; H04N 19/176 20141101; H04N 19/51 20141101; H04N 19/124 20141101; H04N 19/52 20141101; H04N 5/145 20130101; H04N 19/115 20141101; H04N 19/164 20141101
Class at Publication: 375/240.16; 375/E07.125
International Class: H04N 7/26 20060101; H04N007/26
Foreign Application Data
Date | Code | Application Number
Sep 2, 2010 | CN | PCT/CN2010/076555
Sep 2, 2010 | CN | PCT/CN2010/076564
Sep 2, 2010 | CN | PCT/CN2010/076567
Sep 2, 2010 | CN | PCT/CN2010/076569
Claims
1. A method of content classification in a video encoder,
comprising: calculating a deviation representative of a difference
between original and decoded versions of a macroblock in a frame of
video, a distribution frequency of the value of a distortion and
the location of a P-point, wherein the macroblock is associated
with a bit rate representing bits used to encode the macroblock,
and wherein a P-point represents a point in the frame at which a
rate of change of bit rate is equal to zero; updating a motion
complexity index using a quantization parameter and a number of
non-zero coefficients in the macroblock when encoded; and selecting
an encoding mode for the macroblock using the motion complexity
index to reference mode information maintained in one or more
tables relating quantization parameters to one or more P-points for
the frame of video, wherein the mode is selected to yield a least
cost encoding, wherein the frame comprises a plurality of
macroblocks, each macroblock associated with a bit rate
representing bits used to encode the each macroblock, and wherein
each P-point corresponds to a distortion value that is associated
with a minimum rate difference between encoding modes for a
macroblock.
2. The method of claim 1, wherein the deviation comprises a
weighted difference of estimated distortion and measured distortion
for a selected quantization parameter value.
3. The method of claim 1, wherein the deviation is normalized.
4. The method of claim 1, wherein calculating the deviation
representative of the difference between original and decoded
versions of a macroblock is based on a tangential relationship
between the distortion and a rate difference between the encoding
modes.
5. The method of claim 1, wherein each P-point corresponds to a
distortion value that is associated with no rate difference between
encoding modes for the macroblock.
6. The method of claim 1, wherein the motion complexity index is
initiated during receipt of an initial number of frames in a video
sequence.
7. The method of claim 6, wherein the initial number of frames in
the video sequence comprises 5 frames.
8. The method of claim 1, further comprising modeling a cost of
deviation for each motion complexity class for each macroblock as a
function of P-point, distortion and quantization parameter.
9. The method of claim 1, further comprising looking up a P-point
for a current frame using a weighted quantization parameter value
of a previous frame.
10. The method of claim 1, wherein the encoding modes comprise an
inter-prediction mode and an intra-prediction mode.
11. The method of claim 1, wherein the encoding modes are defined
by the H.264 video standard.
12. A video encoder, comprising: non-transitory storage adapted to
maintain a plurality of tables relating quantization parameters and
encoding modes for a video frame; and a content classifier that
selects an encoding mode for a macroblock of the video frame from
the plurality of tables using a deviation representative of a
difference between original and decoded versions of the macroblock;
and wherein the video encoder maintains a motion complexity index
corresponding to a quantization parameter and non-zero coefficients
of the encoded frame, the motion complexity index being operable to
select the encoding mode as a function of the motion complexity of
the video frame, wherein the selected encoding mode yields a
least-cost encoding.
13. The video encoder of claim 12, wherein the deviation is
represented by a function of a P-point, a distortion and a
quantization parameter, wherein each P-point corresponds to a
distortion value that is associated with a minimum rate difference
between encoding modes for the macroblock.
14. A non-transitory computer-readable medium encoded with data and
instructions wherein the data and instructions, when executed by a
processor of a video encoder, cause the video encoder to perform a
content classification method comprising: calculating a deviation
representative of a difference between original and decoded
versions of a macroblock of a frame of video, a distribution
frequency of the value of a distortion and the location of a
minimum point corresponding to a distortion value associated with a
minimum rate difference between possible encoding modes for the
macroblock; updating a motion complexity index using a quantization
parameter and a number of non-zero coefficients in the encoded
macroblock; and selecting an encoding mode for the macroblock using
the motion complexity index to reference mode information
maintained in one or more tables by the video encoder, the one or
more tables relating quantization parameters and minimum points for
the frame, wherein each macroblock of the frame is associated with
a bit rate representing bits used to encode the each macroblock,
and wherein each minimum point represents a point in the frame at
which a rate of change of bit rate is equal to zero.
15. The non-transitory computer-readable medium of claim 14,
wherein the deviation comprises a weighted difference of estimated
distortion and measured distortion for a selected quantization
parameter value, and wherein the selected mode yields a least cost
encoding.
16. The non-transitory computer-readable medium of claim 15,
wherein the deviation comprises a weighted difference of estimated
distortion and measured distortion for a selected quantization
parameter value and wherein calculating the deviation
representative of the difference between original and decoded
versions of a macroblock includes determining a tangential
relationship between the distortion and a rate difference between
the encoding modes.
17. The non-transitory computer-readable medium of claim 14,
wherein the method further comprises modeling cost of deviation for
each motion complexity class for each macroblock as a function of
minimum point, distortion and quantization parameter.
18. The non-transitory computer-readable medium of claim 14,
wherein the method further comprises looking up a minimum point for
a current frame using a weighted quantization parameter value of a
previous frame.
19. The non-transitory computer-readable medium of claim 14,
wherein the encoding modes comprise an inter-prediction mode and an
intra-prediction mode.
20. The non-transitory computer-readable medium of claim 14,
wherein the encoding modes are defined by the H.264 video standard.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims priority from
PCT/CN2010/076569 (title: "Video Classification Systems and
Methods") which was filed in the Chinese Receiving Office on Sep.
2, 2010, from PCT/CN2010/076564 (title: "Rho-Domain Metrics") which
was filed in the Chinese Receiving Office on Sep. 2, 2010, from
PCT/CN2010/076555 (title: "Video Analytics for Security Systems and
Methods") which was filed in the Chinese Receiving Office on Sep.
2, 2010, and from PCT/CN2010/076567 (title: "Systems And Methods
for Video Content Analysis") which was filed in the Chinese
Receiving Office on Sep. 2, 2010, each of these applications being
hereby incorporated herein by reference. The present application is
also related to concurrently filed U.S. non-provisional patent
applications entitled "Rho-Domain Metrics" (attorney docket no.
043497-0393276), "Video Analytics for Security Systems and Methods"
(attorney docket no. 043497-0393277) and "Systems And Methods for
Video Content Analysis" (attorney docket no. 043497-0393278), which
are expressly incorporated by reference herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] FIG. 1 illustrates the relationship of distortion and rate
difference between Intra and Inter modes for a given quantization
parameter.
[0003] FIG. 2 is a flowchart illustrating a content classification
based mode decision method.
[0004] FIG. 3 is a simplified block schematic illustrating a
processing system employed in certain embodiments of the
invention.
DETAILED DESCRIPTION
[0005] Embodiments of the present invention will now be described
in detail with reference to the drawings, which are provided as
illustrative examples so as to enable those skilled in the art to
practice the invention. Notably, the figures and examples below are
not meant to limit the scope of the present invention to a single
embodiment, but other embodiments are possible by way of
interchange of some or all of the described or illustrated
elements. Wherever convenient, the same reference numbers will be
used throughout the drawings to refer to the same or like parts. Where
certain elements of these embodiments can be partially or fully
implemented using known components, only those portions of such
known components that are necessary for an understanding of the
disclosed embodiments will be described, and detailed descriptions
of other portions of such known components will be omitted so as
not to obscure the disclosed embodiments. In the present
specification, an embodiment showing a singular component should
not be considered limiting; rather, the invention is intended to
encompass other embodiments including a plurality of the same
component, and vice-versa, unless explicitly stated otherwise
herein. Moreover, applicants do not intend for any term in the
specification or claims to be ascribed an uncommon or special
meaning unless explicitly set forth as such. Further, certain
embodiments of the present invention encompass present and future
known equivalents to the components referred to herein by way of
illustration.
[0006] Video standards such as H.264/AVC employ mode decision as an
encoding decision process to determine whether a macroblock ("MB")
is encoded using an intra-prediction mode ("Intra Mode") or an
inter-prediction mode ("Inter Mode"). Rate-distortion optimization
techniques are commonly applied in various implementations. When
encoding an MB, the rate-distortion cost is calculated for both
Intra Modes and Inter Modes, and the minimum cost mode is selected
as the final encoding mode. Depending on the video standard,
multiple Intra Modes and Inter Modes are available. For example, the
H.264 standard provides 4 Intra 16×16 Modes and 9 Intra 4×4 Modes
for each MB, as well as the skip macroblock mode and the Inter
16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4 Modes. Rate-distortion
cost J is defined as

$$J = D + \lambda \cdot R \qquad (1)$$

where distortion D is the difference between the reconstructed MB
and the original MB, rate R represents the bits used to encode the
current MB, and coefficient λ is a weighting factor. In one example,
the sum of absolute differences (SAD) can be used to quantify
distortion.
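To make equation (1) concrete, the following C sketch computes SAD over a 16×16 luma macroblock and the cost J = D + λ·R. It is a minimal illustration under stated assumptions, not the encoder described here: the function names `mb_sad` and `rd_cost` are hypothetical, both buffers are assumed to share one row stride, and in practice R would come from an entropy coder, which is exactly the costly step the following paragraphs try to avoid.

```c
#include <stdint.h>
#include <stdlib.h>

#define MB_SIZE 16  /* H.264 macroblock: 16x16 luma samples */

/* Sum of absolute differences between the original and reconstructed
 * luma samples of one macroblock. Both buffers are assumed to use the
 * same row stride (in samples). */
static unsigned mb_sad(const uint8_t *orig, const uint8_t *recon, int stride)
{
    unsigned sad = 0;
    for (int y = 0; y < MB_SIZE; y++) {
        for (int x = 0; x < MB_SIZE; x++) {
            int diff = orig[y * stride + x] - recon[y * stride + x];
            sad += (unsigned)abs(diff);
        }
    }
    return sad;
}

/* Rate-distortion cost of equation (1): J = D + lambda * R. */
static double rd_cost(double distortion, double lambda, double rate_bits)
{
    return distortion + lambda * rate_bits;
}
```

With D taken from `mb_sad`, comparing `rd_cost` across candidate modes reproduces the conventional minimum-cost selection.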
Rate-Distortion Optimization
[0007] Rate-distortion optimization (RDO) techniques can provide a
balance of encoding quality and compression ratio. An accurate
calculation of rate R in equation (1) is computationally costly and
generally involves a dual-pass encoding process which requires the
use of hardware resources and which introduces additional delays.
Research has been conducted to optimize the calculation of R and to
provide a fast rate-distortion balanced mode decision algorithm.
However, estimation of bit rate R per MB is generally very costly
due to tight pipeline architectures employed in hardware
embodiments that provide real-time encoding and multiple-channel
encoding.
[0008] Accordingly, in certain embodiments, distortion D is used to
make the mode decision, with R omitted from equation (1). Mode
optimization typically cannot be achieved using D alone, however,
without considering the bit rate cost of encoding. For example, in
low-complexity background cases, the Intra Mode SAD values of
background MBs can be smaller than the Inter Mode SAD values, so
Intra Mode is typically selected for background MBs. However, Intra
Mode encoding usually consumes many more bits than Inter Mode
encoding; consequently, encoding bits may be wasted and blocky
background artifacts can be observed.
[0009] Certain embodiments employ a comparison of the rate-distortion
costs J_intra and J_inter. From equation (1), J_intra = D_intra +
λ·R_intra and J_inter = D_inter + λ·R_inter; subtracting λ·R_inter
from both costs shows that comparing J_intra with J_inter is
equivalent to comparing D_intra + λ·ΔR with D_inter, where ΔR =
R_intra − R_inter. The term λ·ΔR (denoted hereafter τ) is the
weighted rate difference between Intra mode and Inter mode, giving
the equivalent costs of Equation (2):

$$J_{\mathrm{inter}} = D_{\mathrm{inter}}, \qquad J_{\mathrm{intra}} = D_{\mathrm{intra}} + \tau, \qquad \tau = \lambda \cdot \Delta R \qquad (2)$$

Experimental results show there is a pseudo-tangent relationship
between ΔR and distortion for a given quantization parameter
("QP"), as shown in FIG. 1.
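A quick numeric check of the equivalence between comparing the full costs and comparing D_intra + τ with D_inter, using illustrative values that are not from the source (λ = 4, D_intra = 900, D_inter = 1000, R_intra = 120 bits, R_inter = 40 bits):

```latex
\begin{aligned}
J_{\mathrm{intra}} - J_{\mathrm{inter}} &= (900 + 4\cdot 120) - (1000 + 4\cdot 40) = 1380 - 1160 = 220,\\
\tau &= \lambda\,\Delta R = 4\cdot(120 - 40) = 320,\\
(D_{\mathrm{intra}} + \tau) - D_{\mathrm{inter}} &= (900 + 320) - 1000 = 220.
\end{aligned}
```

Both differences are 220, so Inter Mode wins under either form; only the rate difference ΔR, not the absolute rates, enters the decision.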
[0010] FIG. 1 shows the relationship of ΔR and D for a given QP
(QP = 26 in FIG. 1), where SAD is used as the distortion and
ΔR = R_intra − R_inter. For the purposes of this description,
R_intra represents the number of bits used by the intra mode encoder
to encode the current macroblock, and R_inter represents the number
of bits used by the inter mode encoder to encode the current
macroblock. A point P is defined as the point at which ΔR (Diff_R in
FIG. 1) equals zero, i.e., where the curve crosses the X axis. Points
with D values less than P consume more bits with Intra Mode encoding
(ΔR = R_intra − R_inter > 0), while points with D values larger than
P consume fewer bits with Intra Mode, as shown in the drawing. The
location of the P-point is a function of QP and video motion
complexity: P-points increase as QP and motion complexity increase.
After a P-point is located, deviation τ can be estimated, and the
Intra Mode/Inter Mode decision can be reached quickly and easily
based on the tangent curve and the D value distribution frequency.
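The P-point can be located from offline training data with a simple zero-crossing search. The sketch below is hypothetical: it assumes (D, ΔR) samples for one QP are already sorted by increasing D, and it uses linear interpolation between the two samples that bracket the sign change of ΔR = R_intra − R_inter. The source does not say how its P-points were extracted, and `find_p_point` is an illustrative name.

```c
#include <stddef.h>

/* Estimate the P-point: the distortion D at which
 * delta_r = R_intra - R_inter first changes from positive to
 * non-positive. Samples must be sorted by ascending D.
 * Returns the interpolated D, or -1.0 if no such crossing exists. */
static double find_p_point(const double *d, const double *delta_r, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (delta_r[i - 1] > 0.0 && delta_r[i] <= 0.0) {
            /* Fraction of the way from sample i-1 to sample i where
             * the straight line between them reaches zero. */
            double t = delta_r[i - 1] / (delta_r[i - 1] - delta_r[i]);
            return d[i - 1] + t * (d[i] - d[i - 1]);
        }
    }
    return -1.0;
}
```

Repeating this per QP and per motion complexity class would yield P-point columns of the kind listed in Table 1.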
Rho-Domain Content Classification
[0011] Certain embodiments of the invention use Rho-domain
("ρ-domain") content classification, and certain embodiments provide
an innovative ρ-domain metric "θ" and employ systems and methods
that apply the metric. In some embodiments, ρ in the ρ-domain can be
taken to be the number of non-zero coefficients after transform and
quantization in a video encoding process. The term "NZ" is used
herein to represent ρ, where NZ can be understood as the number of
non-zero coefficients after quantization of each macroblock in video
standards such as the H.264 video standard. For the purposes of this
description, a ρ-domain deviation metric θ may be defined as a
recursive weighted ratio between the theoretical NZ_QP curve and the
actual NZ_QP curve. Normalized θ typically fluctuates around 1.0. A
value of θ smaller than 1.0 can indicate that the actual encoded bit
rate is larger than expected, implying that more complex motion
content has been encountered. Conversely, a value of θ larger than
1.0 indicates that the actual encoded bit rate is smaller than
expected, implying that smoother motion content has been
encountered. Therefore, ρ-domain deviation θ can be used as an
indicator to classify video content into high, medium, medium-low
and low motion complexity categories, and a fast mode decision
algorithm can be employed based on that classification.
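A sketch of the two ρ-domain quantities follows. Counting NZ is direct. For θ, the text says only that it is a recursive weighted ratio between the theoretical and actual NZ_QP curves, so this sketch assumes a first-order exponential weighting of expected/actual NZ with a caller-chosen weight `w`. Taking expected over actual makes θ fall below 1.0 when more coefficients (and so more bits) are produced than expected, as described above. The theoretical NZ_QP curve is not given, so its value is passed in; `count_nz` and `update_theta` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* rho (NZ): number of non-zero quantized coefficients in a block of
 * n coefficients (e.g. 256 luma coefficients for a 16x16 MB). */
static unsigned count_nz(const int16_t *qcoeff, size_t n)
{
    unsigned nz = 0;
    for (size_t i = 0; i < n; i++)
        nz += (qcoeff[i] != 0);
    return nz;
}

/* One recursive update of theta, ASSUMING exponential weighting with
 * w in [0, 1). nz_expected comes from the (unspecified) theoretical
 * NZ_QP curve at the frame's QP; nz_actual is the measured NZ of the
 * encoded frame. theta < 1 means more non-zero coefficients than
 * expected. */
static double update_theta(double theta_prev, double w,
                           double nz_expected, double nz_actual)
{
    if (nz_actual <= 0.0)
        return theta_prev;  /* nothing coded: keep the previous value */
    double ratio = nz_expected / nz_actual;
    return w * theta_prev + (1.0 - w) * ratio;
}
```

A larger `w` smooths θ over more frames; a smaller one reacts faster to scene changes.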
Example of a Content Classification Based Mode Decision Algorithm
[0012] FIG. 2 illustrates an example of a content classification
based mode decision algorithm. The algorithm may be embodied in a
combination of hardware and software and may be deployed as
instructions and data stored in a non-transitory computer-readable
medium. The instructions and data may be configured and/or adapted
such that execution of the instructions by a processor causes the
processor to perform the method described in FIG. 2.
[0013] At step 200, offline-trained quantization parameter (QP) and
P-point tables QP_P_Tn are built based on ρ-domain content
classifications, where Tn (Tn = 1, 2, 3, . . . 51) denotes different
motion complexity classifications. If at step 202 it is determined
that the current frame belongs to the first 5 frames of a video
sequence, step 204 is performed next; otherwise step 203 is
performed next. At step 204, motion complexity index Tn is
initialized based on the initial QP and complexity information, and
the P-point can be found from the QP_P_Tn tables. Step 206 can then
be performed.
[0014] If at step 202 it is determined that the current frame does
not belong to the first 5 frames of a video sequence, then at step
203 the NZ_QP deviation θ is calculated based on the NZ and QP
information of the encoded frame. At step 205, motion complexity
index Tn is recalculated based on deviation θ. A table lookup in the
QP_P_Tn tables may then be performed to find P for the current
frame, based on the previous frame's weighted QP value and content
classification index Tn, before performing step 206.
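Steps 202 through 205 amount to a small per-frame decision. In this hypothetical C sketch, frames 0 to 4 keep an initial class (step 204), and later frames are reclassified from θ (step 205) into the four motion complexity categories named in the ρ-domain discussion. The thresholds 0.8, 1.1 and 2.0 are placeholder assumptions chosen for illustration, not values the text states for this mapping, and `classify_frame` is an illustrative name.

```c
/* Hypothetical class labels; a lower theta means more complex motion. */
enum motion_class { MC_HIGH = 1, MC_MEDIUM, MC_MEDIUM_LOW, MC_LOW };

#define WARMUP_FRAMES 5  /* "first 5 frames" of the sequence */

/* frame_idx is 0-based. During warm-up the initial estimate is kept
 * (step 204); afterwards the class is re-derived from theta (step 205)
 * using placeholder thresholds. */
static enum motion_class classify_frame(int frame_idx,
                                        enum motion_class initial_class,
                                        double theta)
{
    if (frame_idx < WARMUP_FRAMES)
        return initial_class;
    if (theta < 0.8)
        return MC_HIGH;
    if (theta < 1.1)
        return MC_MEDIUM;
    if (theta < 2.0)
        return MC_MEDIUM_LOW;
    return MC_LOW;
}
```

The returned class would then serve as the index Tn for the QP_P_Tn table lookup.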
[0015] At step 206, deviation τ is calculated with respect to
distortion D based on the tangent relationship of τ and D, the
distribution frequency of D, and the location of the P-point. A
mathematical model φ can be established as a function of P-point,
D and QP for each motion complexity class to represent the cost
deviation τ for each MB.
One example of a QP_P_Tn table is shown in Table 1 below:
TABLE 1: QP_P_Tn table (MD_P_TABLE). Listed values are relative; P_point is obtained from MD_P_TABLE using QP and content classification index Tn.
QP | T1 | T2 | T3 | P_point_T1 | P_point_T2 | P_point_T3
14 | 0.8 | 1.1 | 2 | 4 | 6 | 6
15 | 0.8 | 1.1 | 2 | 4 | 6 | 6
16 | 0.8 | 1.1 | 2 | 5 | 7 | 7
17 | 0.8 | 1.1 | 2 | 5 | 7 | 7
18 | 0.8 | 1.1 | 2 | 6 | 8 | 8
19 | 0.8 | 1.1 | 2 | 6 | 8 | 8
20 | 0.8 | 1.1 | 2 | 7 | 9 | 9
21 | 0.8 | 1.1 | 2 | 8 | 9 | 9
... | ... | ... | ... | ... | ... | ...
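Table 1 can be carried in C roughly as follows. The source prints it as `static int MD_P_TABLE[ ][ ]`, which C does not accept (the inner array dimension is required) and which would truncate the fractional T1 to T3 entries, so this sketch uses `float` with an explicit column count. The eighth row is labeled QP = 20 a second time in the source and is assumed here to be QP = 21. The helper `lookup_p_point` and its clamping of out-of-range inputs are hypothetical; per step 205 the QP passed in would be the previous frame's weighted QP, whose weighting is not specified.

```c
#define MD_P_QP_MIN 14  /* first QP listed in Table 1 */
#define MD_P_ROWS    8  /* rows reproduced from Table 1 */
#define MD_P_COLS    6  /* T1, T2, T3, P_point_T1, P_point_T2, P_point_T3 */

/* Relative values from Table 1; rows for further QPs are elided there. */
static const float MD_P_TABLE[MD_P_ROWS][MD_P_COLS] = {
    /* T1    T2    T3    P_T1  P_T2  P_T3 */
    {0.8f, 1.1f, 2.0f, 4.0f, 6.0f, 6.0f},  /* QP = 14 */
    {0.8f, 1.1f, 2.0f, 4.0f, 6.0f, 6.0f},  /* QP = 15 */
    {0.8f, 1.1f, 2.0f, 5.0f, 7.0f, 7.0f},  /* QP = 16 */
    {0.8f, 1.1f, 2.0f, 5.0f, 7.0f, 7.0f},  /* QP = 17 */
    {0.8f, 1.1f, 2.0f, 6.0f, 8.0f, 8.0f},  /* QP = 18 */
    {0.8f, 1.1f, 2.0f, 6.0f, 8.0f, 8.0f},  /* QP = 19 */
    {0.8f, 1.1f, 2.0f, 7.0f, 9.0f, 9.0f},  /* QP = 20 */
    {0.8f, 1.1f, 2.0f, 8.0f, 9.0f, 9.0f},  /* assumed QP = 21 */
};

/* Return the P-point for class index tn (1..3) at the given QP.
 * Out-of-range tn and QP values are clamped (an assumption). */
static float lookup_p_point(int qp, int tn)
{
    if (tn < 1) tn = 1;
    if (tn > 3) tn = 3;
    int row = qp - MD_P_QP_MIN;
    if (row < 0) row = 0;
    if (row >= MD_P_ROWS) row = MD_P_ROWS - 1;
    return MD_P_TABLE[row][2 + tn];  /* columns 3..5 hold P_point_T1..T3 */
}
```

A single indexed read per frame is what makes this lookup cheap compared with estimating R per MB.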
[0016] At step 208, a mode decision is made for each MB of the
current frame. Inter Mode RD cost J_inter can be replaced by D, as
shown in equation (2), and Intra Mode cost J_intra can be replaced
by D + τ, where τ is derived from experimental model φ as described
for step 206. The winning mode may be selected as the mode that
yields the minimum mode cost J_min. The process is typically
repeated until it is determined at step 210 that encoding of the
current frame is finished.
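Steps 206 and 208 can be sketched for a single MB. Because the experimental model φ is not published here, this hypothetical C sketch takes it as a function pointer and supplies only a placeholder `phi_linear` whose sign mimics FIG. 1 (positive below the P-point, where Intra costs more bits, and negative above it). It also assumes τ is evaluated at the Intra distortion and that ties go to Inter Mode; neither choice is stated in the text.

```c
/* Hypothetical signature for the offline model phi(P-point, D, QP) -> tau. */
typedef double (*phi_fn)(double p_point, double distortion, int qp);

enum mb_mode { MODE_INTER, MODE_INTRA };

/* Step 208 for one MB, using the costs of equation (2):
 * J_inter = D_inter and J_intra = D_intra + tau. */
static enum mb_mode decide_mode(double d_intra, double d_inter,
                                double p_point, int qp, phi_fn phi)
{
    double tau = phi(p_point, d_intra, qp);  /* step 206, assumed input D */
    double j_intra = d_intra + tau;
    double j_inter = d_inter;
    return (j_intra < j_inter) ? MODE_INTRA : MODE_INTER;  /* tie -> Inter */
}

/* Placeholder phi, NOT the source's model: linear in (P - D), so
 * tau > 0 below the P-point and tau < 0 above it. */
static double phi_linear(double p_point, double distortion, int qp)
{
    double slope = 0.5 + 0.05 * qp;  /* arbitrary QP dependence */
    return slope * (p_point - distortion);
}
```

An encoder would call `decide_mode` once per MB with the SAD values of its best Intra and Inter candidates and the frame's P-point; no entropy coding is needed to reach the decision.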
[0017] In certain embodiments, the mode decision algorithm, the
QP_P_Tn table and deviation model φ are built offline from
experimental results. Motion classification index Tn and its
corresponding methods are described in a related, concurrently
filed application titled "ρ-domain metrics θ and its
applications." The video classification based mode decision
algorithms, systems and methods described herein can provide a
cost-efficient, fast and robust alternative to conventional systems,
which tend to be computationally costly and usually involve
dual-pass encoding mode decision algorithms. In certain embodiments
of the present invention, a fast table-lookup method is used to
obtain a P-point value. From the P-point, QP and content
classification index Tn, MB cost deviation τ can be obtained from a
selected experimental model φ. Mode decisions can then be made
efficiently by inserting τ into equation (2).
System Description
[0018] Turning now to FIG. 3, certain embodiments of the invention
employ a processing system that includes at least one computing
system 30 deployed to perform certain of the steps described above.
Computing system 30 may be a commercially available system that
executes commercially available operating systems such as Microsoft
Windows®, UNIX or a variant thereof, Linux, a real-time
operating system and/or a proprietary operating system. The
architecture of the computing system may be adapted, configured
and/or designed for integration in the processing system, for
embedding in one or more of an image capture system, communications
device and/or graphics processing systems. In one example,
computing system 30 comprises a bus 302 and/or other mechanisms for
communicating between processors, whether those processors are
integral to the computing system 30 (e.g. 304, 305) or located in
different, perhaps physically separated computing systems 300.
Typically, processor 304 and/or 305 comprises a CISC or RISC
computing processor and/or one or more digital signal processors.
In some embodiments, processor 304 and/or 305 may be embodied in a
custom device and/or may perform as a configurable sequencer.
Device drivers 303 may provide output signals used to control
internal and external components and to communicate between
processors 304 and 305.
[0019] Computing system 30 also typically comprises memory 306 that
may include one or more of random access memory ("RAM"), static
memory, cache, flash memory and any other suitable type of storage
device that can be coupled to bus 302. Memory 306 can be used for
storing instructions and data that can cause one or more of
processors 304 and 305 to perform a desired process. Main memory
306 may be used for storing transient and/or temporary data such as
variables and intermediate information generated and/or used during
execution of the instructions by processor 304 or 305. Computing
system 30 also typically comprises non-volatile storage such as
read only memory ("ROM") 308, flash memory, memory cards or the
like; non-volatile storage may be connected to the bus 302, but may
equally be connected using a high-speed universal serial bus (USB),
FireWire or other such bus that is coupled to bus 302. Non-volatile
storage can be used for storing configuration and other
information, including instructions executed by processors 304
and/or 305. Non-volatile storage may also include mass storage
device 310, such as a magnetic disk, optical disk, flash disk that
may be directly or indirectly coupled to bus 302 and used for
storing instructions to be executed by processors 304 and/or 305,
as well as other information.
[0020] In some embodiments, computing system 30 may be
communicatively coupled to a display system 312, such as an LCD
flat panel display, including touch panel displays,
electroluminescent display, plasma display, cathode ray tube or
other display device that can be configured and adapted to receive
and display information to a user of computing system 30.
Typically, device drivers 303 can include a display driver,
graphics adapter and/or other modules that maintain a digital
representation of a display and convert the digital representation
to a signal for driving a display system 312. Display system 312
may also include logic and software to generate a display from a
signal provided by system 300. In that regard, display 312 may be
provided as a remote terminal or in a session on a different
computing system 30. An input device 314 is generally provided
locally or through a remote system and typically provides for
alphanumeric input as well as cursor control 316 input, such as a
mouse, a trackball, etc. It will be appreciated that input and
output can be provided to a wireless device such as a PDA, a tablet
computer or other system suitably equipped to display the images
and provide user input.
[0021] In certain embodiments, computing system 30 may be embedded
in a system that captures and/or processes images, including video
images. In one example, computing system 30 may include a video
processor or accelerator 317, which may have its own processor,
non-transitory storage and input/output interfaces. In another
example, video processor or accelerator 317 may be implemented as a
combination of hardware and software operated by the one or more
processors 304, 305. In another example, computing system 30
functions as a video encoder, although other functions may be
performed by computing system 30. In particular, a video encoder
that comprises computing system 30 may be embedded in another
device such as a camera, a communications device, a mixing panel, a
monitor, a computer peripheral, and so on.
[0022] According to one embodiment of the invention, portions of
the described invention may be performed by computing system 30.
Processor 304 executes one or more sequences of instructions. For
example, such instructions may be stored in main memory 306, having
been received from a computer-readable medium such as storage
device 310. Execution of the sequences of instructions contained in
main memory 306 causes processor 304 to perform process steps
according to certain aspects of the invention. In certain
embodiments, functionality may be provided by embedded computing
systems that perform specific functions wherein the embedded
systems employ a customized combination of hardware and software to
perform a set of predefined tasks. Thus, embodiments of the
invention are not limited to any specific combination of hardware
circuitry and software.
[0023] The term "computer-readable medium" is used to define any
medium that can store and provide instructions and other data to
processor 304 and/or 305, particularly where the instructions are
to be executed by processor 304 and/or 305 and/or other peripheral
of the processing system. Such medium can include non-volatile
storage, volatile storage and transmission media. Non-volatile
storage may be embodied on media such as optical or magnetic disks,
including DVD, CD-ROM and BluRay. Storage may be provided locally
and in physical proximity to processors 304 and 305 or remotely,
typically by use of a network connection. Non-volatile storage may be
removable from computing system 30, as in the example of BluRay,
DVD or CD storage or memory cards or sticks that can be easily
connected or disconnected from a computer using a standard
interface, including USB, etc. Thus, computer-readable media can
include floppy disks, flexible disks, hard disks, magnetic tape,
any other magnetic medium, CD-ROMs, DVDs, BluRay, any other optical
medium, punch cards, paper tape, any other physical medium with
patterns of holes, RAM, PROM, EPROM, FLASH/EEPROM, any other memory
chip or cartridge, or any other medium from which a computer can
read.
[0024] Transmission media can be used to connect elements of the
processing system and/or components of computing system 30. Such
media can include twisted pair wiring, coaxial cables, copper wire
and fiber optics. Transmission media can also include wireless
media such as radio, acoustic and light waves. In particular, radio
frequency (RF), fiber optic and infrared (IR) data communications
may be used.
[0025] Various forms of computer readable media may participate in
providing instructions and data for execution by processor 304
and/or 305. For example, the instructions may initially be
retrieved from a magnetic disk of a remote computer and transmitted
over a network or modem to computing system 30. The instructions
may optionally be stored in a different storage or a different part
of storage prior to or during execution.
[0026] Computing system 30 may include a communication interface
318 that provides two-way data communication over a network 320
that can include a local network 322, a wide area network or some
combination of the two. For example, an integrated services digital
network (ISDN) may be used in combination with a local area network
(LAN). In another example, a LAN may include a wireless link.
Network link 320 typically provides data communication through one
or more networks to other data devices. For example, network link
320 may provide a connection through local network 322 to a host
computer 324 or to a wide area network such as the Internet 328.
Local network 322 and Internet 328 may both use electrical,
electromagnetic or optical signals that carry digital data
streams.
[0027] Computing system 30 can use one or more networks to send
messages and data, including program code and other information. In
the Internet example, a server 330 might transmit requested code
for an application program through Internet 328, and computing
system 30 may receive in response a downloaded application that provides or augments
functional modules such as those described in the examples above.
The received code may be executed by processor 304 and/or 305.
Additional Descriptions of Certain Aspects of the Invention
[0028] The foregoing descriptions of the invention are intended to
be illustrative and not limiting. For example, those skilled in the
art will appreciate that the invention can be practiced with
various combinations of the functionalities and capabilities
described above, and can include fewer or additional components
than described above. Certain additional aspects and features of
the invention are further set forth below, and can be obtained
using the functionalities and components described in more detail
above, as will be appreciated by those skilled in the art after
being taught by the present disclosure.
[0029] Certain embodiments of the invention provide video encoder
systems and methods. In some of these embodiments, the encoder
systems employ content classification. Some of these embodiments
comprise maintaining one or more tables relating quantization
parameters and P-points for a frame of video. In some of these
embodiments, the frame comprises one or more macroblocks. Some of
these embodiments comprise calculating a deviation representative
of a difference between original and decoded versions of a
macroblock. Some of these embodiments comprise calculating a
deviation representative of a distribution frequency of the value
of a distortion. Some of these embodiments comprise calculating a
deviation representative of the location of a P-point. In some of
these embodiments, the P-point corresponds to a distortion value
that is associated with a minimum rate difference between encoding
modes for a macroblock. Some of these embodiments comprise updating
a motion complexity index using a quantization parameter and a
number of non-zero coefficients of the encoded frame. Some of these
embodiments comprise selecting an encoding mode for the macroblock
using the motion complexity index to reference mode information
maintained in the one or more tables.
[0030] In some of these embodiments, the selected mode yields a
least cost encoding. In some of these embodiments, the deviation comprises a weighted difference of
estimated distortion and measured distortion for a selected
quantization parameter value. In some of these embodiments, the
deviation is normalized. In some of these embodiments, calculating
the deviation representative of the difference between original and
decoded versions of a macroblock is based on a tangential
relationship between the distortion and a rate difference between
the encoding modes. In some of these embodiments, each P-point
corresponds to a distortion value that is associated with no rate
difference between encoding modes for the macroblock. In some of
these embodiments, the motion complexity index is initiated during
receipt of an initial number of frames in a video sequence. In some
of these embodiments, there are at least 5 frames in the initial
number of frames in the video sequence.
[0031] Some of these embodiments comprise modeling cost of
deviation for each motion complexity class for each macroblock as a
function of P-point, distortion and quantization parameter. Some of
these embodiments comprise looking up a P-point for a current frame
using a weighted quantization parameter value of a previous frame.
In some of these embodiments, the encoding modes comprise an
inter-prediction mode and an intra-prediction mode. In some of
these embodiments, the encoding modes are defined by the H.264
video standard.
[0032] Certain embodiments of the invention provide a video encoder
317 (see FIG. 3). Some of these embodiments comprise a plurality of
tables relating quantization parameters and encoding modes for a
video frame. Some of these embodiments comprise a content
classifier that selects an encoding mode for a macroblock of the
video frame from the plurality of tables using a deviation
representative of a difference between original and decoded versions
of the macroblock. Some of these embodiments comprise a processor
that maintains a motion complexity index using a quantization
parameter and non-zero coefficients of the encoded frame. In some
of these embodiments, the motion complexity index is operable to
select an encoding mode based on the motion complexity of the
frame. In some of these embodiments, the selected mode yields a
least cost encoding for the frame. In some of these embodiments,
the selected mode yields a least cost encoding for the macroblock.
In some of these embodiments, each P-point corresponds to a
distortion value that is associated with a minimum rate difference
between encoding modes for a macroblock.
[0033] Although the present invention has been described with
reference to specific exemplary embodiments, it will be evident to
one of ordinary skill in the art that various modifications and
changes may be made to these embodiments without departing from the
broader spirit and scope of the invention. Accordingly, the
specification and drawings are to be regarded in an illustrative
rather than a restrictive sense.