U.S. patent application number 13/225238, for video analytics for security systems and methods, was filed with the patent office on 2011-09-02 and published on 2012-03-08.
The invention is credited to Keqiang Dai, Jin Ming, Changsong Qi, and Fang Shi.
Application Number: 20120057640 (13/225238)
Family ID: 45770713
Publication Date: 2012-03-08

United States Patent Application 20120057640
Kind Code: A1
SHI; Fang; et al.
March 8, 2012
Video Analytics for Security Systems and Methods
Abstract
Video processing, encoding and decoding systems are described. A
processor receives video frames representative of a sequence of
images captured by a video sensor, and the video frames are encoded
according to a desired video encoding standard. A video analytics
processor receives video analytics metadata generated by the video
encoder from the sequence of images and produces video analytics
messages for transmission to a client device which performs client
side video analytics processing. The video analytics metadata may
comprise pixel domain video analytics information directly from an
analog-to-digital front end or directly from an encoding engine as
the engine is performing compression.
Inventors: SHI; Fang (San Diego, CA); Qi; Changsong (Chengdu, CN); Ming; Jin (Chengdu, CN); Dai; Keqiang (Chengdu, CN)
Family ID: 45770713
Appl. No.: 13/225238
Filed: September 2, 2011
Current U.S. Class: 375/240.26; 375/E7.026
Current CPC Class: H04N 5/145 20130101; H04N 19/115 20141101; H04N 19/51 20141101; H04N 19/52 20141101; H04N 19/198 20141101; H04N 19/124 20141101; H04N 19/61 20141101; H04N 19/176 20141101; H04N 19/164 20141101
Class at Publication: 375/240.26; 375/E07.026
International Class: H04N 7/26 20060101 H04N007/26
Foreign Application Data
Date | Code | Application Number
Sep 2, 2010 | CN | PCT/CN2010/076555
Sep 2, 2010 | CN | PCT/CN2010/076564
Sep 2, 2010 | CN | PCT/CN2010/076567
Sep 2, 2010 | CN | PCT/CN2010/076569
Claims
1. A video processing system comprising: a video encoder operative
to encode a sequence of images captured by a video sensor into
video frames according to a desired video encoding standard and to
generate video analytics metadata based on information in the
sequence of images; and a video analytics processor configured to
receive and process the video analytics metadata to produce video
analytics messages suitable for transmission to a client device and
that are useable for client-side video analytics processing.
2. The video processing system of claim 1, wherein the video
analytics metadata comprise pixel domain video analytics
information received directly from an analog-to-digital front
end.
3. The video processing system of claim 1, wherein the video
encoder comprises an encoding engine, and wherein the video
analytics metadata comprise pixel domain video analytics
information received directly from the encoding engine and
generated as the encoding engine is performing compression on the
sequence of images.
4. The video processing system of claim 3, wherein the video
analytics messages include information related to one or more of a
background model, a motion alarm, a virtual line detection and
electronic image stabilization parameters.
5. The video processing system of claim 2, wherein the video
analytics messages comprise video analytics messages related to a
group of images and include messages related to one or more of a
background frame, a foreground object segmentation descriptor, a
camera parameter, a virtual line and a predefined motion alarm
region.
6. The video processing system of claim 1, wherein the video
analytics messages comprise video analytics messages related to an
individual video frame and include messages related to one or more
of a global motion vector, a motion alarm region alarm status, a
virtual line count, an object tracking parameter and a camera
motion parameter.
7. The video processing system of claim 1, wherein the video
processing system is configured to transmit video analytics
messages to the client device in a layered structured network
bitstream comprising an encoder-generated video bitstream and at
least a portion of the video analytics metadata.
8. The video processing system of claim 7, wherein the video
analytics messages and the portion of the video analytics metadata
are transmitted in a supplemental enhancement information network
abstraction layer package unit of an H.264 bitstream.
9. A video decoding system comprising: a decoder configured to
extract video frames and one or more video analytics messages from
a network bitstream, wherein the video analytics messages comprise
information derived from pixel domain video analytics information
which identifies characteristics of a sequence of images
represented in the video frames; and one or more video processors
configured to produce video analytics metadata related to the video
frames based on the extracted video frames and the information in
the video analytics messages.
10. The video decoding system of claim 9, wherein the video
analytics metadata comprise pixel domain video analytics
information generated directly by an analog-to-digital front
end.
11. The video decoding system of claim 9, wherein the video
analytics metadata comprise pixel domain video analytics
information generated directly by an encoding engine as the engine
performed compression on the sequence of images.
12. The video decoding system of claim 11, wherein the video
analytics messages are received with a portion of the pixel domain
video analytics information in a supplemental enhancement
information network abstraction layer package unit of an H.264
bitstream.
13. The video decoding system of claim 9, wherein one or more video
processors extract a background image for a plurality of the video
frames based on the information in the video analytics
messages.
14. The video decoding system of claim 9, wherein one or more video
processors use the information in the video analytics messages to
monitor objects crossing a virtual line observed in a plurality of
the video frames.
15. The video decoding system of claim 9, wherein the one or more
video processors are configured to produce a global motion vector
using the information in the video analytics messages.
16. The video decoding system of claim 9, wherein one or more video
processors provide electronic image stabilization based on the
information in the video analytics messages.
17. The video decoding system of claim 9, wherein the video
analytics messages include information concerning one or more of a
background frame, a foreground object segmentation descriptor, a
camera parameter, a virtual line and a predefined motion alarm
region.
18. The video decoding system of claim 9, wherein the video
analytics messages comprise video analytics messages concerning an
individual video frame and include information related to one or
more of a global motion vector, a motion alarm region alarm status,
a virtual line count, an object tracking parameter and a camera
motion parameter.
19. A non-transitory computer-readable medium encoded with data and
instructions wherein the data and instructions, when executed by a
processor of a video processing system, cause the video processing
system to perform a method comprising: encoding a sequence of
images captured by a video sensor into video frames according to a
desired video encoding standard; generating pixel domain video
analytics information from the sequence of images while encoding
the sequence of images; producing video analytics messages using
the pixel domain video analytics information; and transmitting the
video analytics messages concurrently with the video frames,
wherein the video analytics messages are configured to facilitate
client-side video analytics processing of the video frames.
20. The non-transitory computer-readable medium of claim 19,
wherein certain video analytics messages correspond to an
individual video frame and relate to one or more of a global motion
vector, a motion alarm region, a virtual line, object tracking and
camera motion.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims priority from
PCT/CN2010/076555 (title: "Video Analytics for Security Systems and
Methods") which was filed in the Chinese Receiving Office on Sep.
2, 2010, from PCT/CN2010/076569 (title: "Video Classification
Systems and Methods") which was filed in the Chinese Receiving
Office on Sep. 2, 2010, from PCT/CN2010/076564 (title: "Rho-Domain
Metrics") which was filed in the Chinese Receiving Office on Sep.
2, 2010, and from PCT/CN2010/076567 (title: "Systems And Methods
for Video Content Analysis") which was filed in the Chinese
Receiving Office on Sep. 2, 2010, each of these applications being
hereby incorporated herein by reference. The present Application is
also related to concurrently filed U.S. Patent non-provisional
applications entitled "Video Classification Systems and Methods"
(attorney docket no. 043497-0393274), "Rho-Domain Metrics"
(attorney docket no. 043497-0393276) and "Systems And Methods for
Video Content Analysis" (attorney docket no. 043497-0393278), which
are expressly incorporated by reference herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] FIG. 1 is a block schematic illustrating a simplified
example of a video security surveillance analytics architecture
according to certain aspects of the invention.
[0003] FIG. 2 is a block schematic depicting an example of a video
analytics engine according to certain aspects of the invention.
[0004] FIG. 3 depicts an example of H.264 standards-defined
bitstream syntax.
[0005] FIG. 4A is an image that includes both foreground and
background objects.
[0006] FIG. 4B is the image of FIG. 4A from which foreground objects
have been extracted using techniques according to certain aspects
of the invention.
[0007] FIGS. 5A and 5B are images illustrating virtual line
counting according to certain aspects of the invention.
[0008] FIG. 6 is a simplified block schematic illustrating a
processing system employed in certain embodiments of the
invention.
DETAILED DESCRIPTION
[0009] Embodiments of the present invention will now be described
in detail with reference to the drawings, which are provided as
illustrative examples so as to enable those skilled in the art to
practice the invention. Notably, the figures and examples below are
not meant to limit the scope of the present invention to a single
embodiment, but other embodiments are possible by way of
interchange of some or all of the described or illustrated
elements. Wherever convenient, the same reference numbers will be
used throughout the drawings to refer to same or like parts. Where
certain elements of these embodiments can be partially or fully
implemented using known components, only those portions of such
known components that are necessary for an understanding of the
disclosed embodiments will be described, and detailed descriptions
of other portions of such known components will be omitted so as
not to obscure the disclosed embodiments. In the present
specification, an embodiment showing a singular component should
not be considered limiting; rather, the invention is intended to
encompass other embodiments including a plurality of the same
component, and vice-versa, unless explicitly stated otherwise
herein. Moreover, applicants do not intend for any term in the
specification or claims to be ascribed an uncommon or special
meaning unless explicitly set forth as such. Further, certain
embodiments of the present invention encompass present and future
known equivalents to the components referred to herein by way of
illustration.
[0010] Certain embodiments of the invention comprise systems having
an architecture that is operable to perform video analytics for
security applications. Video analytics may also be referred to as
video content analysis. In a video security surveillance analytics
architecture where the server encodes captured video images,
certain embodiments provide greatly improved video analytics
efficiency for client side processing applications and systems. By
improving and/or optimizing client side video analytics efficiency,
client-side performance can be greatly improved, consequently
enabling processing of an increased number of video channels.
Moreover, video analytics metadata ("VAMD") created on the server
side according to certain aspects of the invention can enable high
accuracy video analytics on the server side and for the video
security surveillance system as a whole. According to certain
aspects of the invention, the advantages of a layered video
analytics system architecture can include facilitating and/or
enabling a balanced partition of video analytics at multiple
layers. These layers may include server and client layers, pixel
domain layers and motion domain layers. For example, global
analytics defined to include information related to background
frame, segmented object descriptors and camera parameters can
enable cost-efficient yet complex video analytics on the receiver
side for many advanced video intelligence applications and can enable
an otherwise difficult or impossible level of video analytics
efficiency in terms of computational complexity and analytic
accuracy.
[0011] A simplified example of a video security surveillance
analytics architecture is shown in FIG. 1. In the example, the
system is partitioned into server side 10 and client side 12
elements. The terms server and client are used here to include
hardware and software systems, apparatus and other components that
perform types of functions that can be attributed to server side 10
and client side 12 operations. It will be appreciated that certain
elements may be provided on either or both server side 10 and
client side 12, and that at least some client and server
functionality may be committed to hardware components such as
application specific integrated circuits, sequencers, custom logic
devices as needed, typically to improve one or more of efficiency,
reliability, processing speed and security. Server side 10
components may be embodied in a security surveillance or other
camera.
[0012] On server side 10, a video sensor 100 can be configured to
capture information representative of a sequence of images, including
video data, and to pass the information to a video encoder module
102 adapted for use in embodiments of the invention. One example of
such video encoder module 102 is the TW5864 from Intersil Techwell
Inc., which can be adapted and/or configured to generate VAMD 103
related to video bitstream 105. In certain embodiments, video
encoder 102 can be configured to generate one or more compressed
video bitstreams 105 that comply with industry standards and/or
that are generated according to a proprietary specification. The
video encoder 102 is typically configurable to produce VAMD 103 that
can comprise pixel domain video analytics information, such as
information obtained directly from an analog-to-digital ("A/D")
front end (e.g. at the video sensor 100) and/or from an encoding
engine 102 as the encoding engine 102 is performing video
compression to obtain video bitstream 105. VAMD 103 may comprise
block base video analytics information including, for example,
macroblock ("MB") level information such as motion vector, MB-type
and/or number of non-zero coefficients, etc. An MB typically
comprises a 16×16 pixel block.
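The macroblock-level VAMD described above can be sketched as a simple record type. This is an illustrative Python sketch; the field names and the grid-size helper are assumptions made for exposition, not structures defined by the patent or the TW5864 documentation.

```python
from dataclasses import dataclass

# Hypothetical per-macroblock VAMD record; fields mirror the MB-level
# information listed above (motion vector, MB-type, non-zero coefficients).
@dataclass
class MacroblockVAMD:
    mb_x: int             # macroblock column (each MB covers a 16x16 pixel block)
    mb_y: int             # macroblock row
    mb_type: str          # e.g. "I", "P", "SKIP"
    motion_vector: tuple  # (dx, dy) for inter-coded macroblocks
    nonzero_coeffs: int   # count of non-zero transform coefficients

def mb_grid_size(width: int, height: int) -> tuple:
    """Number of 16x16 macroblocks covering a frame (rounded up, since
    H.264 pads frame dimensions to macroblock multiples)."""
    return ((width + 15) // 16, (height + 15) // 16)
```

For a 1080p frame this yields a 120 by 68 macroblock grid, which is the granularity at which the VAMD is produced.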
[0013] In certain embodiments, VAMD 103 can comprise any video
encoding intermediate data such as MB-type, motion vectors,
non-zero coefficient counts (as per the H.264 standard), quantization
parameters, DC or AC information, the motion estimation metric sum of
absolute differences ("SAD"), etc. VAMD 103 can also comprise useful
information such as motionFlag information generated in an analog
to digital front end module, such module being found, for example,
in the TW5864 device referenced above. VAMD is typically processed
in VAE 104 to generate more advanced video intelligent information
that may include, for example, motion indexing, background
extraction, object segmentation, motion detection, virtual line
detection, object counting, motion tracking and speed
estimation.
[0014] Video analytics engine 104 can be configured to receive the
VAMD 103 from the encoder 102 and to process the VAMD 103 using one
or more video analytics algorithms based on application
requirements. Video analytics engine 104 can generate useful video
analytics results, such as background model, motion alarm, virtual
line detections, electronic image stabilization parameters, etc. A
more detailed example of a video analytics engine 104 is shown in
FIG. 2. Video analytics results can comprise video analytics
messages ("VAM") that may be categorized into a global VAM class
and a local VAM class. Global VAM includes video analytics messages
applicable to a group of pictures, such as background frames,
foreground object segmentation descriptors, camera parameters,
predefined motion alarm region coordinates and indices, virtual
lines, etc. Local VAM can be defined as localized VAM applied to a
specific individual video frame, and can include global motion
vectors of a current frame, motion alarm region alarm status of the
current frame, virtual line counting results, object tracking
parameters, camera moving parameters, and so on.
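The global/local VAM split described above can be illustrated with minimal containers. The class and field names below are hypothetical, chosen only to mirror the message categories listed in this paragraph; the patent does not define a concrete message layout.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GlobalVAM:
    """Messages applicable to a group of pictures (illustrative fields)."""
    background_frame_id: int
    camera_params: dict
    virtual_lines: List[Tuple[Tuple[int, int], Tuple[int, int]]]  # endpoint pairs
    motion_alarm_regions: List[Tuple[int, int, int, int]]         # x, y, w, h

@dataclass
class LocalVAM:
    """Messages applicable to one specific video frame (illustrative fields)."""
    frame_number: int
    global_motion_vector: Tuple[int, int]
    alarm_region_status: List[bool]
    virtual_line_counts: List[int]

def is_global(msg) -> bool:
    """Dispatch helper: route a parsed VAM to GOP-level or frame-level handling."""
    return isinstance(msg, GlobalVAM)
```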
[0015] In certain embodiments, an encoder generated video bitstream
105, VAMD 103 and VAM generated by video analytics engine 104 are
packed together as a layered structure into a network bitstream 106
following a predefined packaging format. The network bitstream 106
can be sent through a network to the client side of the system. The
network bitstream 106 may be stored locally, on a server and/or on
a remote storage device for future playback and/or
dissemination.
[0016] FIG. 3 depicts an example of an H.264 standards-defined
bitstream syntax, in which VAM and VAMD 103 can be packed into a
supplemental enhancement information ("SEI") network abstraction
layer package unit. Following SPS, PPS and IDR network abstraction
layer units, a global video analytics ("GVA") SEI network
abstraction layer unit can be inserted into network bitstream 106.
The GVA network abstraction layer unit may include the global video
analytics messages for a corresponding group of pictures, a pointer
to the first local video analytics SEI network abstraction layer
location within the group of pictures, and a pointer to the next GVA
network abstraction layer unit, and may include an indication of
the span of frames to which the GVA is applicable. Following each
individual frame which is associated with VAM or VAMD elements, a
local video analytics ("LVA") SEI network abstraction layer unit is
inserted right after the frame's payload network abstraction layer
unit. The LVA can comprise local VAM, VAMD information and a
pointer to the location of the next frame that has an LVA SEI network
abstraction layer unit. The amount of VAMD packed into an LVA
network abstraction layer unit depends on the network bandwidth
conditions and the complexity of the user's video analytics requirements.
For example, if sufficient network bandwidth is available,
additional VAMD can be packed. The VAMD can be used by client side
video analytics systems and may simplify and/or optimize
performance of certain functions. When network bandwidth is
limited, less VAMD may be sent to meet the network bandwidth
constraints. While FIG. 3 illustrates a bitstream format for H.264
standards, the principles involved may be applied in other video
standards and formats.
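One plausible way to carry VAM in an H.264 SEI network abstraction layer unit is the user_data_unregistered SEI message (payloadType 5). The sketch below follows the standard H.264 byte-stream conventions (start code, NAL header with nal_unit_type 6, ff-coded payload size, emulation-prevention bytes); the 16-byte UUID and the opaque payload are assumptions, since the patent does not specify an on-the-wire encoding for VAM.

```python
def rbsp_to_ebsp(rbsp: bytes) -> bytes:
    """Insert H.264 emulation-prevention bytes: whenever two zero bytes
    are followed by a byte <= 0x03, a 0x03 escape byte is inserted so the
    payload cannot mimic a start code."""
    out = bytearray()
    zeros = 0
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)

def make_vam_sei_nal(vam_payload: bytes, uuid: bytes) -> bytes:
    """Wrap an opaque VAM payload in a user_data_unregistered SEI NAL unit
    (payloadType 5), preceded by a 4-byte start code."""
    assert len(uuid) == 16
    body = uuid + vam_payload
    sei = bytearray([0x05])            # payloadType 5: user_data_unregistered
    size = len(body)
    while size >= 255:                 # ff-coded payload size
        sei.append(0xFF)
        size -= 255
    sei.append(size)
    sei += body
    sei.append(0x80)                   # rbsp_stop_one_bit + byte alignment
    nal = bytes([0x06]) + rbsp_to_ebsp(bytes(sei))  # nal_unit_type 6 = SEI
    return b"\x00\x00\x00\x01" + nal
```

A GVA or LVA unit as described above would serialize its messages into `vam_payload` before wrapping; a decoder can skip the unit entirely if it does not recognize the UUID, which is what makes the embedding backward compatible.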
[0017] In certain embodiments of the invention, a client side
system 12 receives and decodes the network bitstream 106 sent from a
server side system 10. The advantages of a layered video analytics
system architecture, which can include facilitating and/or enabling
a balanced partition of video analytics at multiple layers, become
apparent at the client side 12. Layers can include server and
client layers, pixel domain layers and motion domain layers. Global
video analytics messages such as background frame, segmented object
descriptors and camera parameters can enable cost-efficient yet
sophisticated video analytics on the receiver side for many advanced
video intelligence applications. The VAM enables an otherwise
difficult or impossible level of video analytics efficiency in terms
of computational complexity and analytic accuracy.
[0018] In certain embodiments of the invention, the client side
system 12 separates the compressed video bitstream 125, the VAMD
123 and the VAM from the network bitstream 106. The video bitstream 125
can be decoded using decoder 124 and provided with VAMD 123 and
associated VAM to client application 122. The client application
typically employs video analytics techniques appropriate for the
application at hand. For example, analytics may include background
extraction, motion tracking, object detection, and other functions.
Known analytics can be selected and adapted to use the VAMD 103 and
VAM that were derived from the encoder 102 and video analytics
engine 104 at the server side 10 to obtain richer and more accurate
results 120. Adaptations of the analytics may be based on speed
requirements, efficiency, and the enhanced information available
through the VAM and VAMD 123.
[0019] Certain advantages may accrue from the video analytics
system architecture and layered video analytics information
embedded in network bitstreams according to certain aspects of the
invention. For example, greatly improved video analytics efficiency
can be obtained on the client side 12. In one example, video
analytics engine 104 receives and processes encoder feedback VAMD
to produce the video analytics information that may be embedded in
the network bitstream 106. The use of embedded layered VAM provides
users with direct access to a video analytics message of interest, and
permits use of VAM with limited or no additional processing. In one
example, additional processing would be unnecessary to access the
motion frame, the number of objects passing a virtual line, object
moving speed and classification, etc. In certain embodiments,
information related to object tracking may be generated using
additional, albeit limited, processing related to the motion of the
identified object. Information related to electronic image
stabilization may be obtained by additional processing based on the
global motion information provided in VAM. Accordingly, in certain
embodiments, client side 12 video analytics efficiency can be
optimized and performance can be greatly improved, consequently
enabling processing of an increased number of channels.
[0020] Certain embodiments enable operation of high-accuracy video
analytics applications on the client side 12. According to certain
aspects of the invention, client side 12 video analytics may be
performed using information generated on the server side 10.
Without VAM embedded in the network bitstream 106, client side
video analytics processing would have to rely on video
reconstructed from the decoded video bitstream 125. Decoded
bitstream 125 typically lacks some of the detailed information of
the original video content (e.g. content provided by video sensor
100), which may be discarded or lost in the video compression
process. Consequently, video analytics performed solely on the
client side 12 cannot generally preserve the accuracy that can be
obtained if the processing was performed at the server side 10, or
at the client side 12 using VAMD 123 derived from original video
content on the server side 10. Loss of accuracy due to analytics
processing that is limited to client side 12 can exhibit problems
with the geometric center of an object, object segmentation, etc.
Therefore, embedded VAM can enable improved system-level
accuracy.
[0021] Certain embodiments of the invention enable fast video
indexing, searching and other applications. In particular,
embedded, layered VAM in the network bitstream enables fast video
indexing, video searching, video classification applications and
other applications in the client side. For instance, motion
detection information, object indexing, foreground and background
partition, human detection, human behavior classification
information of the VAM can simplify client-side and/or downstream
tasks that include, for example, video indexing, classification and
fast searching in the client. Without VAM, a client generally needs
vast computational power to process the video data and to rebuild
the required video analytics information for a variety of
applications including the above-listed applications. It will be
appreciated that not all VAM can be accurately reconstructed at the
client side 12 using video bitstream 125 and it is possible that
certain applications, such as human behavioral analysis
applications, cannot even be performed if VAM created at server
side 10 is not available.
[0022] Certain embodiments of the invention permit the use of more
complex server/client algorithms, partitioning of computational
capability and balancing of network bandwidth. In certain
embodiments, the video analytics system architecture allows video
analytics to be partitioned between server and client sides based
on network bandwidth availability, server and client computational
capability and the complexity of the video analytics. In one
example, in response to low network bandwidth conditions, the
system can embed more condensed VAM in the network bitstream 106
after processing by the VAE 104. The VAM can include motion frame
index, object index, and so on. After extracting the VAM from the
bitstream, the client side 12 system can utilize the VAM to assist
further video analytics processing. More VAMD 103 can be directly
embedded into the network bitstream 106 and processing by the VAE
104 can be limited or halted when computational power is limited on
the server side 10. Computational power on the server side 10 may
be limited when, for example, the server side 10 system is embodied
in a camera, a digital video recorder ("DVR") or network video
recorder ("NVR"). Certain embodiments may use client side 12
systems to process embedded VAMD 123 in order to accomplish the
desired video analytics functions. In some embodiments, more
video analytics functions can be partitioned and/or assigned to
server side 10 when, for example, the client side is required to
monitor and/or process multiple channels simultaneously. It will be
appreciated, therefore, that a balanced video analytics system can
be achieved for a variety of system configurations.
EXAMPLES
[0023] With reference to FIG. 2, certain embodiments provide
electronic image stabilization ("EIS") capabilities 220, which find
wide use in video security applications. A currently captured video
frame is processed with reference to one or more previously
reconstructed reference frames to generate a global motion vector
202 for the current frame, and the global motion vector is used to
compensate the reconstructed image on the client side to reduce or
eliminate image instability or shaking.
[0024] In a conventional pixel domain EIS algorithm, the current
and previous reference frames are fetched, a block based or
grey-level histogram based matching algorithm is applied to obtain
local motion vectors, and the local motion vectors are processed to
generate a pixel domain global motion vector. The drawbacks of the
conventional approach include the high computational cost
associated with the matching algorithm used to generate local
motion vectors and the very high memory bandwidth required to fetch
both the current reconstructed frame and previous reference frames.
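The conventional pixel-domain approach described above can be sketched as an exhaustive SAD block search. This is illustrative only; block size and search range are arbitrary choices, and the nested loops make the cost the paragraph complains about plainly visible.

```python
def sad(cur, ref, bx, by, dx, dy, bs=8):
    """Sum of absolute differences between a block of the current frame
    at (bx, by) and the reference frame displaced by (dx, dy)."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total

def best_local_mv(cur, ref, bx, by, search=2, bs=8):
    """Exhaustive full search over a small window; returns the (dx, dy)
    minimizing SAD. This brute-force matching, repeated for every block,
    is what makes pixel-domain EIS computationally expensive."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= by + dy and by + dy + bs <= h
                    and 0 <= bx + dx and bx + dx + bs <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best
```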
[0025] In certain embodiments of the invention, the video encoding
engine 102 can generate VAMD 103 including block-based motion
vectors, MB-type, etc., as a byproduct of video compression
processing. VAMD 103 is fed into VAE 104, which can be configured
to process the VAMD 103 information in order to generate global
motion vector 202 as a VAM. The VAM is then embedded into the
network bitstream 106 for transmission to the client side 12, typically
over a network. A client side 12 processor can parse the network
bitstream 106, extract the global motion information for each frame
and apply global motion compensation to accomplish EIS 220.
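A minimal sketch of this VAMD-based EIS path, assuming the per-macroblock motion vectors have already been extracted from VAMD 103. The component-wise median aggregation and the zero-filled shift are illustrative choices, not steps specified by the patent; a real EIS stage would also smooth the global motion vector over time.

```python
def global_motion_vector(mb_mvs):
    """Aggregate per-macroblock motion vectors (from encoder-side VAMD)
    into one global motion vector via a component-wise median, which is
    robust to foreground objects moving against the camera shake."""
    xs = sorted(mv[0] for mv in mb_mvs)
    ys = sorted(mv[1] for mv in mb_mvs)
    return xs[len(xs) // 2], ys[len(ys) // 2]

def stabilize(frame, gmv):
    """Shift a decoded frame (list of rows) to cancel the global motion;
    uncovered borders are zero-filled for simplicity."""
    dx, dy = gmv
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        sy = y + dy
        if 0 <= sy < h:
            for x in range(w):
                sx = x + dx
                if 0 <= sx < w:
                    out[y][x] = frame[sy][sx]
    return out
```

Note that the client never runs the block-matching search itself: the motion vectors are a free byproduct of the encoder's compression, which is the efficiency argument made above.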
Video Background Modeling
[0026] Certain embodiments of the invention comprise a video
background modeling feature that can construct or reconstruct a
background image 222 which can provide highly desired information
for use in a wide variety of video surveillance applications,
including motion detection, object segmentation, abundant object
detection, etc. Conventional pixel domain background extraction
algorithms operate on a statistical model of multiple frame
co-located pixel values. For example, a Gauss model is used to
model N continuous frames' co-located pixels and to select the
mathematical most likely pixel value as the background pixel. If a
video frame's height is denoted as H, width as W and continuous N
frames to satisfy the statistical model requirement, then total
W*H*N pixels are needed to process to generate a background
frame.
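As a simple stand-in for the Gaussian model, the per-pixel statistical selection described above can be sketched with a most-frequent-value rule. This is illustrative only, and it deliberately preserves the W*H*N processing cost that the paragraph identifies.

```python
from collections import Counter

def extract_background(frames):
    """Pixel-domain background extraction over N co-located frames: for
    each pixel position, take the most frequent value as a stand-in for
    the Gaussian most-likely-value selection. Every one of the W*H*N
    pixels is touched, which is the cost the MB-based approach avoids."""
    h, w = len(frames[0]), len(frames[0][0])
    bg = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            bg[y][x] = Counter(f[y][x] for f in frames).most_common(1)[0][0]
    return bg
```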
[0027] In certain embodiments, MB-based VAMD 103 is used to
generate the background information rather than pixel-based
background information. According to certain aspects of the
invention, the volume of information generated from VAMD 103 is
typically only 1/256 of the volume of pixel-based information. In
one example, MB-based motion vector and non-zero-count information
can be used to distinguish background from foreground moving objects.
FIG. 4A shows an original image with background and foreground
objects, and FIG. 4B shows a typical background extracted by
processing VAMD.
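A sketch of MB-based background detection from VAMD alone: a macroblock with a near-zero motion vector and few non-zero coefficients is likely static. The thresholds and the grid layout are assumptions; the patent does not give a concrete classification rule.

```python
def mb_background_mask(vamd_grid, mv_thresh=1, nzc_thresh=4):
    """Classify each macroblock as background using only encoder-side
    VAMD. vamd_grid[r][c] is a (motion_vector, nonzero_count) pair per
    macroblock, so the data volume is 1/256 of per-pixel processing
    (one entry per 16x16 block instead of 256 pixels)."""
    mask = []
    for row in vamd_grid:
        mask.append([
            abs(mv[0]) <= mv_thresh and abs(mv[1]) <= mv_thresh
            and nzc <= nzc_thresh
            for mv, nzc in row
        ])
    return mask
```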
[0028] Certain embodiments of the invention provide systems and
methods for motion detection 200 and virtual line counting 201. A
motion detector 200 can be used to automatically detect motion of
objects including humans, animals and/or vehicles entering
predefined regions of interest. Virtual line detection and counting
module 201 can detect a moving object that crosses an invisible
line defined by user configuration and can count the number of
objects crossing the line, as illustrated in FIGS. 5A and 5B. The
virtual line can be based on actual lines in the image and can be a
delineation of an area defined by a polygon, circle, ellipse or
irregular area. In some embodiments, the number of objects crossing
one or more lines can be recorded as an absolute number and/or as a
statistical frequency and an alarm may be generated to indicate any
line crossing, a threshold frequency or absolute number of
crossings and/or an absence of crossings within a predetermined
time. In certain embodiments, motion detection 200 and virtual line
counting 201 can be achieved by processing one or more MB-based
VAMDs. Information such as motion alarms and object counts across a
virtual line can be packed as VAM and transmitted to the client
side 12. Motion indexing, object counting or similar customized
applications can be easily achieved by extracting the VAM with
simple processing. It will be appreciated that configuration
information may be provided from client side to server side as a
form of feedback, using packed information as a basis for resetting
lines, areas of interest and so on.
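Virtual line crossing of this kind is commonly detected as a side-of-line sign test on an object's centroid track between consecutive frames. The sketch below is illustrative and not the patent's specific method; a full implementation would also clip to the line segment and report crossing direction for in/out statistics.

```python
def _side(p, a, b):
    """Sign of the cross product: which side of the directed line a->b
    the point p lies on (0 means exactly on the line)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def count_line_crossings(track, a, b):
    """Count how many times an object's centroid track crosses the
    virtual line through a and b, i.e. how many times the side test
    changes sign between consecutive frames."""
    crossings = 0
    prev = _side(track[0], a, b)
    for p in track[1:]:
        cur = _side(p, a, b)
        if prev != 0 and cur != 0 and (prev > 0) != (cur > 0):
            crossings += 1
        prev = cur
    return crossings
```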
[0029] Certain embodiments of the invention provide improved object
tracking within a sequence of video frames using VAMD 103. Certain
embodiments can facilitate client side measurement of speed of
motion of objects and can assist in identifying directions of
movement. Furthermore, VAMD 103 can provide useful information
related to video mosaics 221, including motion indexing and object
counting.
System Description
[0030] Turning now to FIG. 6, certain embodiments of the invention
employ a processing system that includes at least one computing
system 60 deployed to perform certain of the steps described above.
Computing system 60 may be a commercially available system that
executes commercially available operating systems such as Microsoft
Windows.RTM., UNIX or a variant thereof, Linux, a real-time
operating system and/or a proprietary operating system. The
architecture of the computing system may be adapted, configured
and/or designed for integration in the processing system, for
embedding in one or more of an image capture system, a
communications device and/or a graphics processing system. In one
example,
computing system 60 comprises a bus 602 and/or other mechanisms for
communicating between processors, whether those processors are
integral to the computing system 60 (e.g. 604, 605) or located in
different, perhaps physically separated computing systems 60.
Typically, processor 604 and/or 605 comprises a CISC or RISC
computing processor and/or one or more digital signal processors.
In some embodiments, processor 604 and/or 605 may be embodied in a
custom device and/or may perform as a configurable sequencer.
Device drivers 603 may provide output signals used to control
internal and external components and to communicate between
processors 604 and 605.
[0031] Computing system 60 also typically comprises memory 606 that
may include one or more of random access memory ("RAM"), static
memory, cache, flash memory and any other suitable type of storage
device that can be coupled to bus 602. Memory 606 can be used for
storing instructions and data that can cause one or more of
processors 604 and 605 to perform a desired process. Main memory
606 may be used for storing transient and/or temporary data such as
variables and intermediate information generated and/or used during
execution of the instructions by processor 604 or 605. Computing
system 60 also typically comprises non-volatile storage such as
read only memory ("ROM") 608, flash memory, memory cards or the
like; non-volatile storage may be connected to the bus 602, but may
equally be connected using a high-speed universal serial bus (USB),
Firewire or other such bus that is coupled to bus 602. Non-volatile
storage can be used for storing configuration and other
information, including instructions executed by processors 604
and/or 605. Non-volatile storage may also include mass storage
device 610, such as a magnetic disk, optical disk or flash disk that
may be directly or indirectly coupled to bus 602 and used for
storing instructions to be executed by processors 604 and/or 605,
as well as other information.
[0032] In some embodiments, computing system 60 may be
communicatively coupled to a display system 612, such as an LCD
flat panel display, including touch panel displays,
electroluminescent display, plasma display, cathode ray tube or
other display device that can be configured and adapted to receive
and display information to a user of computing system 60.
Typically, device drivers 603 can include a display driver,
graphics adapter and/or other modules that maintain a digital
representation of a display and convert the digital representation
to a signal for driving a display system 612. Display system 612
may also include logic and software to generate a display from a
signal provided by computing system 60. In that regard, display 612 may be
provided as a remote terminal or in a session on a different
computing system 60. An input device 614 is generally provided
locally or through a remote system and typically provides for
alphanumeric input as well as cursor control 616 input, such as a
mouse, a trackball, etc. It will be appreciated that input and
output can be provided to a wireless device such as a PDA, a tablet
computer or other system suitably equipped to display the images
and provide user input.
[0033] In certain embodiments, computing system 60 may be embedded
in a system that captures and/or processes images, including video
images. In one example, computing system 60 may include a video
processor or accelerator 617, which may have its own processor,
non-transitory storage and input/output interfaces. In another
example, video processor or accelerator 617 may be implemented as a
combination of hardware and software operated by the one or more
processors 604, 605. In another example, computing system 60
functions as a video encoder, although other functions may be
performed by computing system 60. In particular, a video encoder
that comprises computing system 60 may be embedded in another
device such as a camera, a communications device, a mixing panel, a
monitor, a computer peripheral, and so on.
[0034] According to one embodiment of the invention, portions of
the described invention may be performed by computing system 60.
Processor 604 executes one or more sequences of instructions. For
example, such instructions may be stored in main memory 606, having
been received from a computer-readable medium such as storage
device 610. Execution of the sequences of instructions contained in
main memory 606 causes processor 604 to perform process steps
according to certain aspects of the invention. In certain
embodiments, functionality may be provided by embedded computing
systems that perform specific functions wherein the embedded
systems employ a customized combination of hardware and software to
perform a set of predefined tasks. Thus, embodiments of the
invention are not limited to any specific combination of hardware
circuitry and software.
[0035] The term "computer-readable medium" is used to define any
medium that can store and provide instructions and other data to
processor 604 and/or 605, particularly where the instructions are
to be executed by processor 604 and/or 605 and/or other peripheral
of the processing system. Such media can include non-volatile
storage, volatile storage and transmission media. Non-volatile
storage may be embodied on media such as optical or magnetic disks,
including DVD, CD-ROM and BluRay. Storage may be provided locally
and in physical proximity to processors 604 and 605 or remotely,
typically by use of network connection. Non-volatile storage may be
removable from computing system 60, as in the example of BluRay,
DVD or CD storage or memory cards or sticks that can be easily
connected or disconnected from a computer using a standard
interface, including USB, etc. Thus, computer-readable media can
include floppy disks, flexible disks, hard disks, magnetic tape,
any other magnetic medium, CD-ROMs, DVDs, BluRay, any other optical
medium, punch cards, paper tape, any other physical medium with
patterns of holes, RAM, PROM, EPROM, FLASH/EEPROM, any other memory
chip or cartridge, or any other medium from which a computer can
read.
[0036] Transmission media can be used to connect elements of the
processing system and/or components of computing system 60. Such
media can include twisted pair wiring, coaxial cables, copper wire
and fiber optics. Transmission media can also include wireless
media such as radio, acoustic and light waves. In particular, radio
frequency (RF), fiber optic and infrared (IR) data communications
may be used.
[0037] Various forms of computer readable media may participate in
providing instructions and data for execution by processor 604
and/or 605. For example, the instructions may initially be
retrieved from a magnetic disk of a remote computer and transmitted
over a network or modem to computing system 60. The instructions
may optionally be stored in a different storage or a different part
of storage prior to or during execution.
[0038] Computing system 60 may include a communication interface
618 that provides two-way data communication over a network 620
that can include a local network 622, a wide area network or some
combination of the two. For example, an integrated services digital
network (ISDN) may be used in combination with a local area network
(LAN). In another example, a LAN may include a wireless link.
Network link 620 typically provides data communication through one
or more networks to other data devices. For example, network link
620 may provide a connection through local network 622 to a host
computer 624 or to a wide area network such as the Internet 628.
Local network 622 and Internet 628 may both use electrical,
electromagnetic or optical signals that carry digital data
streams.
[0039] Computing system 60 can use one or more networks to send
messages and data, including program code and other information. In
the Internet example, a server 630 might transmit a requested code
for an application program through Internet 628 and may receive in
response a downloaded application that provides or augments
functional modules such as those described in the examples above.
The received code may be executed by processor 604 and/or 605.
[0040] Additional Descriptions of Certain Aspects of the
Invention
[0041] The foregoing descriptions of the invention are intended to
be illustrative and not limiting. For example, those skilled in the
art will appreciate that the invention can be practiced with
various combinations of the functionalities and capabilities
described above, and can include fewer or additional components
than described above. Certain additional aspects and features of
the invention are further set forth below, and can be obtained
using the functionalities and components described in more detail
above, as will be appreciated by those skilled in the art after
being taught by the present disclosure.
[0042] Certain embodiments of the invention provide video
processing systems and methods. Some of these embodiments comprise
a processor configured to receive video frames representative of a
sequence of images captured by a video sensor. Some of these
embodiments comprise a video encoder operative to encode the video
frames according to a desired video encoding standard. Some of
these embodiments comprise a video analytics processor that
receives video analytics metadata generated by the video encoder
from the sequence of images. In some of these embodiments, the
video analytics processor is configurable to produce video
analytics messages for transmission to a client device. In some of
these embodiments, the video analytics messages are used for client
side video analytics processing.
[0043] In some of these embodiments, the video analytics metadata
comprise pixel domain video analytics information. In some of these
embodiments, the pixel domain video analytics information includes
information received directly from an analog-to-digital front end.
In some of these embodiments, the pixel domain video analytics
information includes information received directly from an encoding
engine as the engine is performing compression. In some of these
embodiments, the video analytics messages include information
related to one or more of a background model, a motion alarm, a
virtual line detection and electronic image stabilization
parameters. In some of these embodiments, the video analytics
messages comprise video analytics messages related to a group of
images, including messages related to one or more of a background
frame, a foreground object segmentation descriptor, a camera
parameter, a virtual line and a predefined motion alarm region.
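The split between group-of-images messages and per-frame messages
described here can be sketched as two simple records. All field
names below are illustrative placeholders; the disclosure lists the
message contents but does not define a concrete layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class GroupVAM:
    """Analytics message scoped to a group of images (names assumed)."""
    background_frame_id: Optional[int] = None
    foreground_descriptors: List[bytes] = field(default_factory=list)
    camera_params: Optional[dict] = None
    # Each virtual line as endpoint coordinates (x1, y1, x2, y2).
    virtual_lines: List[Tuple[float, float, float, float]] = field(
        default_factory=list)
    # Each alarm region as a bounding box (x, y, width, height).
    motion_alarm_regions: List[Tuple[int, int, int, int]] = field(
        default_factory=list)


@dataclass
class FrameVAM:
    """Analytics message scoped to a single video frame."""
    global_motion_vector: Tuple[int, int] = (0, 0)
    motion_alarm_active: bool = False
    virtual_line_counts: List[int] = field(default_factory=list)
    tracked_objects: List[dict] = field(default_factory=list)
    camera_motion: Optional[dict] = None
```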
[0044] In some of these embodiments, the video analytics messages
comprise video analytics messages related to an individual video
frame, including messages related to one or more of a global motion
vector, a motion alarm region alarm status, a virtual line count,
an object tracking parameter and a camera motion parameter. In some
of these embodiments, the video analytics messages are transmitted
to the client device in a layered structure network bitstream
comprising encoder generated video bitstream, a portion of the
video analytics metadata. In some of these embodiments, the video
analytics messages and the portion of the video analytics metadata
are transmitted in a supplemental enhancement information network
abstraction layer package unit of an H.264 bitstream.
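The SEI carriage described above can be sketched as follows. Using
the H.264 user_data_unregistered SEI payload (payloadType 5) is one
standard way to carry application metadata in an SEI NAL unit (NAL
type 6); the choice of that payload type, the UUID value and the
payload format are assumptions here, since the disclosure does not
specify them.

```python
def pack_vam_sei(vam_payload: bytes, uuid: bytes = b"\x00" * 16) -> bytes:
    """Wrap analytics bytes in an H.264 SEI NAL unit (nal_unit_type 6)
    as a user_data_unregistered payload (payloadType 5).

    `uuid` is an application-chosen 16-byte identifier for the
    metadata format (assumed; all zeros is a placeholder).
    """
    assert len(uuid) == 16
    body = uuid + vam_payload

    def ff_coded(value: int) -> bytes:
        # H.264 codes payloadType/payloadSize as runs of 0xFF plus a
        # final byte in 0..254.
        return b"\xff" * (value // 255) + bytes([value % 255])

    # payloadType, payloadSize, payload, rbsp_trailing_bits (0x80).
    rbsp = ff_coded(5) + ff_coded(len(body)) + body + b"\x80"

    # Emulation prevention: insert 0x03 after 0x00 0x00 whenever the
    # next byte would be 0x00..0x03.
    ebsp = bytearray()
    zeros = 0
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            ebsp.append(0x03)
            zeros = 0
        ebsp.append(b)
        zeros = zeros + 1 if b == 0 else 0

    # NAL header: forbidden_zero_bit=0, nal_ref_idc=0, nal_unit_type=6.
    return b"\x06" + bytes(ebsp)
```

A decoder on the client side would strip the emulation-prevention
bytes and dispatch on the UUID to recover the analytics payload.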
[0045] Certain embodiments of the invention provide video decoding
systems and methods. Some of these embodiments comprise a decoder
configured to extract a video frame and one or more video analytics
messages from a network bitstream. In some of these embodiments,
the video analytics messages provide information related to
characteristics of the video frame. Some of these embodiments
comprise one or more video processors configured to produce video
analytics metadata related to the video frame based on content of
the video frame and the video analytics messages.
[0046] In some of these embodiments, the video analytics metadata
comprise pixel domain video analytics information received directly
from an analog-to-digital front end. In some of these embodiments,
the video analytics metadata comprise pixel domain video analytics
information received directly from an encoding engine as the engine
was performing compression. In some of these embodiments, the video
analytics messages comprise video analytics messages related to a
plurality of video frames, including messages related to one or
more of a background frame, a foreground object segmentation
descriptor, a camera parameter, a virtual line and a predefined
motion alarm region. In some of these embodiments, the video
analytics messages comprise video analytics messages related to an
individual video frame, including messages related to one or more
of a global motion vector, a motion alarm region alarm status, a
virtual line count, an object tracking parameter and a camera
motion parameter.
[0047] In some of these embodiments, the video analytics messages
are received in a supplemental enhancement information network
abstraction layer package unit of an H.264 bitstream. In some of
these embodiments, the video analytics messages are received in a
supplemental enhancement information network abstraction layer
package unit of an H.264 bitstream, together with a portion of
the pixel domain video analytics information. In some of these
embodiments, the one or more video processors are configured to
produce a global motion vector. In some of these embodiments, the
one or
more video processors provide electronic image stabilization based
on the video analytics messages. In some of these embodiments, the
one or more video processors extract a background image for a
plurality of video frames based on the video analytics messages. In
some of these embodiments, the one or more video processors use the
video analytics messages to monitor objects crossing a virtual line
in a plurality of video frames.
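The global motion vector and stabilization steps described here can
be sketched from the macroblock motion vectors carried in the
analytics metadata. The componentwise median below is one common
robust estimator; the disclosure does not specify which estimator
is used, so this is an illustrative assumption.

```python
import statistics


def global_motion_vector(mb_motion_vectors):
    """Estimate a global (camera) motion vector as the componentwise
    median of macroblock motion vectors, which discounts outliers
    from independently moving foreground objects."""
    xs = [mv[0] for mv in mb_motion_vectors]
    ys = [mv[1] for mv in mb_motion_vectors]
    return (statistics.median(xs), statistics.median(ys))


def stabilization_offset(gmv, accumulated=(0, 0)):
    """Accumulate the negated global motion to obtain the shift a
    client-side electronic image stabilizer would apply to the
    decoded frame."""
    return (accumulated[0] - gmv[0], accumulated[1] - gmv[1])
```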
[0048] Although the present invention has been described with
reference to specific exemplary embodiments, it will be evident to
one of ordinary skill in the art that various modifications and
changes may be made to these embodiments without departing from the
broader spirit and scope of the invention. Accordingly, the
specification and drawings are to be regarded in an illustrative
rather than a restrictive sense.
* * * * *