U.S. patent application number 11/520532 was filed with the patent office on 2007-03-15 for "Storage of video analysis data for real-time alerting and forensic analysis."
The invention is credited to Thomas A. Faulhaber, Jr., Stephen D. Fleischer, Timothy B. Frederick, Gordon T. Haupt, Marcus S. Marinelli, Colvin H. Pitts, and Robert P. Vallone.
Application Number: 11/520532
Publication Number: 20070058842
Family ID: 37865593
Filed Date: 2007-03-15

United States Patent Application 20070058842
Kind Code: A1
Vallone; Robert P.; et al.
March 15, 2007

Storage of video analysis data for real-time alerting and forensic analysis
Abstract
A method and apparatus for storing video data is provided. Video
data that comprises a series of frames is received. Information
about changes that are detected in the series of frames is
generated. The information is aggregated to generate a plurality of
video data change records. Each video data change record
corresponds to a plurality of frames and includes change
information that indicates changes that were detected relative to
the corresponding plurality of frames. Events of interest that
satisfy specified search criteria are searched for by comparing the
specified search criteria against change information in one or more
of the plurality of video data change records.
Inventors: Vallone; Robert P. (Palo Alto, CA); Fleischer; Stephen D. (San Francisco, CA); Pitts; Colvin H. (Mountain View, CA); Haupt; Gordon T. (San Francisco, CA); Frederick; Timothy B. (San Francisco, CA); Faulhaber; Thomas A., Jr. (San Francisco, CA); Marinelli; Marcus S. (Palo Alto, CA)

Correspondence Address: HICKMAN PALERMO TRUONG & BECKER, LLP, 2055 GATEWAY PLACE, SUITE 550, SAN JOSE, CA 95110, US

Family ID: 37865593
Appl. No.: 11/520532
Filed: September 12, 2006
Related U.S. Patent Documents:
Application No. 60/716,729, filed Sep. 12, 2005
Current U.S. Class: 382/115; 348/701; G9B/27.029; G9B/27.043
Current CPC Class: G11B 27/322 20130101; G08B 13/19602 20130101; H04N 21/4728 20130101; G06K 9/3233 20130101; G11B 27/28 20130101; H04N 21/45455 20130101; G06K 9/2081 20130101; G06F 16/7837 20190101; H04N 21/2353 20130101; H04N 21/45457 20130101; G06T 7/20 20130101; G08B 13/19671 20130101; H04N 21/4828 20130101; G06K 9/00744 20130101; G06F 16/7335 20190101; G06K 9/00785 20130101; G06K 9/00771 20130101; G06F 16/786 20190101; G08B 13/19676 20130101
Class at Publication: 382/115; 348/701
International Class: G06K 9/00 20060101 G06K009/00; H04N 5/14 20060101 H04N005/14
Claims
1. A machine-implemented method, comprising: generating information
about changes that are detected in video data that includes a
series of frames; aggregating the information to generate a
plurality of video data change records, wherein each video data
change record corresponds to a plurality of frames and includes
change information that indicates changes that were detected in
video data relative to the corresponding plurality of frames; and
storing said aggregated video data change records for subsequent
forensic analysis.
2. The method of claim 1, wherein: the plurality of video data
change records includes a particular video data change record that
corresponds to an event detected within the video data; and the
change information contained in the particular video data change
record indicates changes that occurred in the frames that
correspond to said event.
3. The method of claim 1, wherein: the plurality of video data
change records includes a set of video data change records that
each corresponds to a specified time interval; and the change
information contained in each video data change record of the set
of video data change records indicates changes that occurred in the
frames that correspond to the corresponding specified time
interval.
4. The method of claim 1, wherein the change information contained
in each video data change record of the plurality of video data
change records includes change type information that indicates at
least one of: (a) a change in pixel values that occurred during the
plurality of frames that correspond to said each video data change
record, and (b) a detection of motion of one or more objects in the
plurality of frames that correspond to said each video data change
record.
5. The method of claim 4, wherein: the change type information of
each video data change record indicates the detection of motion of
one or more objects captured in the plurality of frames that
correspond to said each video data change record; and said each
video data change record further indicates at least one of a
direction and a speed of said motion.
6. The method of claim 1, wherein: said plurality of video data
change records includes a plurality of region change records; and
each region change record of the plurality of region change records
corresponds to a separate region of a frame and contains region
change information about the changes that were detected in the
separate region relative to the plurality of frames that correspond
to the particular video data change record.
7. The method of claim 6, wherein: the region change information of
said each region change record is filtered based on other region
change information of other region change records of the plurality
of region change records; and the other region change records
correspond to other regions, of the frame, that are adjacent to the
region that corresponds to said each region change record.
8. The method of claim 1, wherein the change information of a
particular video data change record of the plurality of video data
change records is filtered, based on the plurality of frames that
correspond to said particular video data change record.
9. The method of claim 1, further comprising searching for events
of interest that satisfy specified search criteria by comparing the
specified search criteria against change information in one or more
of said plurality of video data change records.
10. The method of claim 6, wherein: the method further comprises
searching for events of interest that satisfy specified search
criteria by comparing the specified search criteria against change
information in one or more of said plurality of video data change
records; said specified search criteria indicates a first region of
a set of frames in which the events of interest must have occurred;
searching for the events of interest that satisfy the specified
search criteria includes: identifying a first set of video data
change records that correspond to an area of the view that includes
said first region and is larger than said first region; and based
on the first set of video data change records, identifying a second
set of video data change records that correspond to said first
region, wherein the change information of each video data change
record in the second set satisfies the specified search
criteria.
11. The method of claim 1, wherein: the method further comprises
searching for events of interest that satisfy specified search
criteria by comparing the specified search criteria against change
information in one or more of said plurality of video data change
records; said specified search criteria specifies time at which the
events of interest must have occurred; searching for the events of
interest that satisfy the specified search criteria includes:
identifying a first set of video data change records that
correspond to a first time interval that includes said specified
time; and based on said first set of video data change records,
identifying a second set of video data change records that
aggregate change information at a finer level of granularity than
the video data change records in said first set of video data
change records; wherein the change information of each video data
change record in the second set includes said specified time and
satisfies the specified search criteria.
12. The method of claim 1, wherein at least one video data change
record of said plurality of video data change records corresponds
to a time interval that was established, at least in part, by an
event that is not reflected in said video data.
13. The method of claim 9, further comprising: receiving second
specified search criteria; detecting an event in said video data as
the video data is being received; generating aggregated change
information associated with said event; before a video data change
record is generated for said event, comparing said second specified
search criteria against said aggregated change information; and if
said aggregated change information satisfies said second specified
search criteria, immediately generating an alert that indicates
that said second specified search criteria has been satisfied.
14. A machine-implemented method, comprising: generating
information about changes that are detected in video data that
includes a series of frames; and aggregating the information to
generate a first plurality of video data change records, wherein
each video data change record in the first plurality of change
records corresponds to a plurality of frames and includes change
information, aggregated at a relatively fine level of granularity,
that indicates changes that were detected relative to the
corresponding plurality of frames; and aggregating the information
to generate a second plurality of video data change records,
wherein each video data change record in the second plurality of
change records corresponds to a plurality of frames and includes
change information, aggregated at a relatively coarse level of
granularity, that indicates changes that were detected relative to
the corresponding plurality of frames.
15. The method of claim 14 wherein: each of the first plurality of
video data change records corresponds to regions of a view; and
each of the second plurality of video data change records
corresponds to the entire view.
16. The method of claim 14 wherein: each of the first plurality of
video data change records corresponds to time intervals of a first
duration; and each of the second plurality of video data change
records corresponds to time intervals of a second duration, wherein
the second duration is longer than said first duration.
17. A computer-readable medium carrying one or more sequences of
instructions which, when executed by one or more processors, causes
the one or more processors to perform the method recited in claim
1.
18. A computer-readable medium carrying one or more sequences of
instructions which, when executed by one or more processors, causes
the one or more processors to perform the method recited in claim
2.
19. A computer-readable medium carrying one or more sequences of
instructions which, when executed by one or more processors, causes
the one or more processors to perform the method recited in claim
3.
20. A computer-readable medium carrying one or more sequences of
instructions which, when executed by one or more processors, causes
the one or more processors to perform the method recited in claim
4.
21. A computer-readable medium carrying one or more sequences of
instructions which, when executed by one or more processors, causes
the one or more processors to perform the method recited in claim
5.
22. A computer-readable medium carrying one or more sequences of
instructions which, when executed by one or more processors, causes
the one or more processors to perform the method recited in claim
6.
23. A computer-readable medium carrying one or more sequences of
instructions which, when executed by one or more processors, causes
the one or more processors to perform the method recited in claim
7.
24. A computer-readable medium carrying one or more sequences of
instructions which, when executed by one or more processors, causes
the one or more processors to perform the method recited in claim
8.
25. A computer-readable medium carrying one or more sequences of
instructions which, when executed by one or more processors, causes
the one or more processors to perform the method recited in claim
9.
26. A computer-readable medium carrying one or more sequences of
instructions which, when executed by one or more processors, causes
the one or more processors to perform the method recited in claim
10.
27. A computer-readable medium carrying one or more sequences of
instructions which, when executed by one or more processors, causes
the one or more processors to perform the method recited in claim
11.
28. A computer-readable medium carrying one or more sequences of
instructions which, when executed by one or more processors, causes
the one or more processors to perform the method recited in claim
12.
29. A computer-readable medium carrying one or more sequences of
instructions which, when executed by one or more processors, causes
the one or more processors to perform the method recited in claim
13.
30. A computer-readable medium carrying one or more sequences of
instructions which, when executed by one or more processors, causes
the one or more processors to perform the method recited in claim
14.
31. A computer-readable medium carrying one or more sequences of
instructions which, when executed by one or more processors, causes
the one or more processors to perform the method recited in claim
15.
32. A computer-readable medium carrying one or more sequences of
instructions which, when executed by one or more processors, causes
the one or more processors to perform the method recited in claim
16.
Description
CLAIM OF PRIORITY
[0001] This application claims priority to U.S. Provisional Patent
Application Ser. No. 60/716,729 filed Sep. 12, 2005, the contents
of which are incorporated herein in their entirety for all
purposes.
FIELD OF THE INVENTION
[0002] The present invention relates to efficiently storing video
data and, more specifically, to storing information about visual
changes that has been aggregated over a series of frames of the
video data.
BACKGROUND
[0003] Analyzing video streams to determine whether or not any
interesting activities or objects are present is a
resource-intensive operation. Software applications are used to
analyze video data, attempting to recognize certain activities or
objects in the video data. For example, recognition applications
exist for recognizing faces, gestures, vehicles, guns, motion, and
the like. Often, such applications are used to analyze surveillance
video streams for security purposes.
[0004] If a user is interested in whether a particular object (e.g.
face or gun) appears in a video stream, a software application may
be used to detect the particular object, and store data that
records that the object was detected. Typically, the amount of
storage space needed to record the detection of those objects is
relatively small. However, under some circumstances, one may not
know ahead of time what events of interest will occur in a video
stream. In such cases, one could theoretically try to detect and
capture all possible changes that occur within the video stream.
However, doing so would require a prohibitively large amount of
storage space. Not only would storage capacity issues arise from
storing all possible change information, but it would be difficult
to perform searches against such a vast amount of information.
[0005] Due to the impracticality of such an all-changes storage
technique, current approaches for scanning for suspicious behavior
captured in video necessarily employ human involvement. Not only is
significant human involvement prohibitively expensive (especially
for small to mid-size businesses), but people are also prone to error.
Watching hours of live or recorded video is extremely fatiguing,
and that fatigue may result in missed suspicious activity.
Furthermore, a computer may operate continuously, whereas people
require sleep and rest.
[0006] Based on the foregoing, there is a need for efficiently
storing motion and other change information to reduce the amount of
data stored and to increase search speed.
[0007] The approaches described in this section are approaches that
could be pursued, but not necessarily approaches that have been
previously conceived or pursued. Therefore, unless otherwise
indicated, it should not be assumed that any of the approaches
described in this section qualify as prior art merely by virtue of
their inclusion in this section.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The present invention is illustrated by way of example, and
not by way of limitation, in the figures of the accompanying
drawings and in which like reference numerals refer to similar
elements and in which:
[0009] FIG. 1 is a flow diagram that illustrates how video data may
be stored, according to an embodiment of the invention;
[0010] FIG. 2 is a block diagram that illustrates how video data
change records may represent varying amounts of video data,
according to an embodiment of the invention;
[0011] FIG. 3 is a graphical depiction that illustrates how change
information may be stored on a per-region basis, according to an
embodiment of the invention;
[0012] FIG. 4 is a block diagram that illustrates how video data
change records may store specific and generalized change
information, according to an embodiment of the invention; and
[0013] FIG. 5 is a block diagram of a computer system on which
embodiments of the invention may be implemented.
DETAILED DESCRIPTION
[0014] In the following description, for the purposes of
explanation, numerous specific details are set forth in order to
provide a thorough understanding of the present invention. It will
be apparent, however, that the present invention may be practiced
without these specific details. In other instances, well-known
structures and devices are shown in block diagram form in order to
avoid unnecessarily obscuring the present invention.
General Overview
[0015] In order to (1) store a minimal amount of video data and (2)
quickly search video data for certain events, an efficient storage
mechanism is proposed. Instead of storing information related to
changes detected in video data on a frame-by-frame basis, the
information is aggregated across all or most of the corresponding
frames and stored as a single logical record in a storage system.
For example, typical video cameras and display devices operate at
approximately 24 video frames per second. If motion is detected
within a particular view and the motion lasts for one minute, then
instead of storing 1440 different records to represent the motion,
the motion information is stored in a single record that represents
the 1440 frames corresponding to the motion.
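As a rough sketch of this aggregation (written in Python, with a hypothetical record layout; the field names and the 24-frames-per-second figure are taken only from the example above, not from any format prescribed by this description):

FPS = 24  # approximate frame rate from the example above

def aggregate_motion(frame_detections):
    # frame_detections: list of (frame_index, motion_detected) pairs,
    # one per frame, produced by some upstream motion-detection step.
    motion_frames = [i for i, detected in frame_detections if detected]
    if not motion_frames:
        return None
    # A single logical record summarizes the whole run of motion.
    return {
        "start_frame": motion_frames[0],
        "end_frame": motion_frames[-1],
        "frame_count": len(motion_frames),
        "duration_seconds": len(motion_frames) / FPS,
    }

# One minute of continuous motion: 1440 per-frame detections collapse
# into one stored record rather than 1440 records.
record = aggregate_motion([(i, True) for i in range(1440)])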
[0016] Because the amount of information that is stored is
relatively small, the searches for visual changes that satisfy
certain criteria can also be performed very efficiently. For
example, if a user desired to search for a certain type of motion
in 1440 frames, then only one record would have to be searched, as
opposed to 1440 records.
[0017] The embodiments of the invention described herein are
illustrated in the context of video surveillance systems. However,
embodiments of the invention are not limited to that context.
Embodiments of the invention are also relevant in other,
non-surveillance contexts, such as searching for certain motion
patterns in a series of computer-generated frames.
Functional Overview
[0018] FIG. 1 is a flow diagram that illustrates how video data may
be stored and used to search for changes detected in the video
data, according to an embodiment of the invention. In step 102,
video data that comprises a series of frames is received. In step
104, information about visual changes that are detected in the
series of frames is generated. In step 106, the generated
information is aggregated to generate a plurality of video data
change records (VDCRs). Each video data change record corresponds
to a plurality of frames and includes change information that
indicates visual changes that were detected relative to the
corresponding plurality of frames.
[0019] In step 108, events of interest that satisfy specified
search criteria are searched for by comparing the specified search
criteria against change information in one or more of the plurality
of video data change records. As shall be described in greater detail
hereafter, changes detected in the same sequence of video frames
may be aggregated at multiple levels of granularity. For example,
in the spatial dimension, the changes may be aggregated at the
entire-view level, the quadrant level, and the grid-point level.
Similarly, in the temporal dimension, the changes may be aggregated
at per-week, per-day, per-hour, per-minute and per-second levels of
granularity, or at variable time intervals that depend on other
criteria.
Definitions
[0020] A "video data change record" (VDCR) is a logical composition
of one or more fields, items, attributes, and/or objects. A VDCR
corresponds to a plurality of frames and includes change
information (described below). A VDCR corresponds to a particular
level of temporal and spatial granularity. A VDCR may contain
information about one or more events that were detected in the
video data within the spatial/temporal space associated with the
VDCR. VDCRs may also store change information pertaining to
other events that do not appear in the frames that correspond to
the VDCR. For example, a VDCR may store information indicating that
an audible alarm began ringing during the time interval associated
with the VDCR, even though there is no indication within the video
stream of the alarm.
[0021] A VDCR may also include, but is not limited to, (a) a start
time of when the first frame in the plurality of frames was
captured, (b) an end time of when the last frame in the plurality
of frames was captured, (c) a time duration indicating the
difference between the start time and the end time, (d) type data
indicating whether the change corresponds to a detection of motion
or only a pixel change, (e) shape data indicating a shape (e.g.,
person, car) of a moving object that triggered the VDCR, (f)
behavior data indicating a behavior (e.g., walking, running,
driving) of a moving object that triggered the VDCR, and (g) an
indication of whether the VDCR corresponds to an event or a
specified time interval.
[0022] A VDCR may also contain a reference to the actual video data
that corresponds to the plurality of frames of the VDCR in order to
enable a user of the storage system to view the corresponding video
data. If a VDCR contains a start time, then the start time may be
used as the reference.
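For concreteness, the items enumerated in paragraphs [0021] and [0022] might be modeled as follows (a hypothetical Python sketch; the class name, field names, and types are illustrative only and are not prescribed by this description):

from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class VideoDataChangeRecord:
    """Illustrative VDCR layout; all field names are hypothetical."""
    start_time: datetime                 # (a) when the first frame was captured
    end_time: datetime                   # (b) when the last frame was captured
    change_type: str                     # (d) e.g. "motion" or "pixel_change"
    shapes: List[str] = field(default_factory=list)      # (e) e.g. "person", "car"
    behaviors: List[str] = field(default_factory=list)   # (f) e.g. "walking", "running"
    is_event: bool = True                # (g) event vs. fixed time interval
    video_reference: Optional[str] = None  # pointer into the stored video (e.g. the start time)

    @property
    def duration(self):                  # (c) derived from the start and end times
        return self.end_time - self.start_time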
[0023] "Change information" is information that indicates visual
changes that are detected in the temporal/spatial interval
associated with a VDCR. In one embodiment, the change information
for changes associated with a VDCR is stored in the VDCR. Change
information may indicate motion that is detected in the plurality
of frames and/or a change in pixel values, such as brightness or
hue, that is detected in the plurality of frames. For example, a
pixel change may result from the shadow of a person that enters
and leaves a view represented by the frames. A pixel change may
also result from a light bulb turning on or off that affects the
brightness of objects in the frames. In some instances, the last
frame in an event may appear as an exact duplicate of the first
frame of the event. For example, suppose a light bulb faded out and
then back on. By simply differencing the pixel values of the first
frame with the pixel values of the last frame, the difference may
be zero. Thus, the change information may indicate the greatest
amount of change observed at any point during the interval, rather
than the net difference between the first and last frames. For
example, if the light bulb mentioned above went out and then back on
and the possible pixel values range from 0-100, the change
information may indicate 100 instead of zero.
[0024] Correspondingly, if the change information indicates
motion, then the change information may further indicate all
directions and/or speeds of the motion. For example, within a
particular view, an object may move right, left, up, and down.
Thus, the change information may indicate all of those directions.
As another example, if the object moved at five different speeds in
a certain direction, then the change information may indicate the
largest speed.
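A minimal sketch of how such change information might be accumulated over an interval (the per-frame keys and the aggregate fields are assumptions for illustration; any change-detection method may supply the per-frame values):

def summarize_changes(per_frame_changes):
    """per_frame_changes: iterable of dicts with hypothetical keys
    'pixel_delta' (0-100 deviation from the first frame), 'direction',
    and 'speed' for each frame of the interval."""
    summary = {"max_pixel_delta": 0, "directions": set(), "max_speed": 0.0}
    for change in per_frame_changes:
        # Track the greatest deviation seen anywhere in the interval, so a
        # light fading out and back on still registers as a large change.
        summary["max_pixel_delta"] = max(summary["max_pixel_delta"],
                                         change.get("pixel_delta", 0))
        if change.get("direction") is not None:
            summary["directions"].add(change["direction"])  # all directions observed
        summary["max_speed"] = max(summary["max_speed"], change.get("speed", 0.0))
    return summary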
[0025] Any method for detecting and calculating visual changes
(whether just pixel change or motion) may be used. Thus,
embodiments of the invention are not limited to any particular
method.
[0026] Change information may further include information on a
per-region basis. A "region" is a portion of a two-dimensional view
(e.g., captured by a video camera) of the video data. The view may
be divided into multiple uniform regions, such as in a grid layout.
However, a region may be of any arbitrary size and shape. Thus,
change information may include motion and/or pixel change
information for each specified region of the view for the duration
of the plurality of frames that corresponds to the change
information.
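One possible way to compute per-region change information for a uniform grid of regions (a sketch using NumPy and grayscale frames; the grid size and the use of mean absolute pixel difference are assumptions, not requirements of this description):

import numpy as np

def per_region_change(prev_frame, next_frame, rows=10, cols=10):
    """Return a (rows x cols) array of mean absolute pixel change between
    two grayscale frames, one value per uniform grid region.
    A sketch only: regions may in general be of arbitrary size and shape."""
    h, w = prev_frame.shape
    diff = np.abs(next_frame.astype(int) - prev_frame.astype(int))
    region_change = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            block = diff[r * h // rows:(r + 1) * h // rows,
                         c * w // cols:(c + 1) * w // cols]
            region_change[r, c] = block.mean()
    return region_change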
[0027] An "event" is generally associated with a visual change
detected in video data. For example, an event may correspond to the
detection of a person walking in a region of the view. The duration
of the event is typically the length of time that the visual change
occurs. Once no more visual change is detected, the event may
end.
[0028] An event may be initiated, not only on the detection of
visual changes within a view, but also upon the occurrence of an
external event. For example, an event may be triggered by a fire
alarm. Once the fire alarm is detected, the frames of video data
from that point on are used to generate a VDCR that represents the
event. The event may end, for example, when the fire alarm ends or
when an administrator of a video surveillance system indicates that
the event is completed.
[0029] Alternatively, a VDCR may correspond to a specified time
interval instead of to an event. For example, regardless of whether
a visual change is detected, a VDCR may be generated for each
5-minute interval after every hour. As another example, a VDCR may
be generated for each 24-hour period.
[0030] A VDCR may be generated from other VDCRs and not necessarily
from the video data itself. For example, if a VDCR is generated for
each one-hour period of each day, then a "day" VDCR may be
generated directly from the twenty-four "hour" VDCRs that
correspond to that day. Similarly, a "week" VDCR may be generated
from seven "day" VDCRs, and so forth.
[0031] Similarly, a view-level VDCR may be generated based on the
change information in the corresponding quadrant VDCRs, and the
quadrant VDCRs may be generated based on the change information in
the corresponding region VDCRs.
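The roll-up from finer-granularity VDCRs to coarser ones might look like the following sketch (the dictionary keys are hypothetical; the same function could serve for hour-to-day, day-to-week, or region-to-quadrant-to-view aggregation, without revisiting the underlying video):

def roll_up(fine_records):
    """Build one coarser-granularity record directly from a list of
    finer-granularity records (e.g. a 'day' record from 24 'hour' records)."""
    if not fine_records:
        return {"motion_detected": False, "max_pixel_delta": 0, "directions": set()}
    return {
        "motion_detected": any(r["motion_detected"] for r in fine_records),
        "max_pixel_delta": max(r["max_pixel_delta"] for r in fine_records),
        "directions": set().union(*(r["directions"] for r in fine_records)),
    }

# e.g. day_record = roll_up(hour_records); week_record = roll_up(day_records)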
Storage of Visual Changes
[0032] Because a single VDCR may correspond to thousands or
millions of frames of video data, the storage space required to
store visual changes is much smaller than is required otherwise
(e.g., storing a VDCR for each two-frame sequence where motion is
detected).
[0033] FIG. 2 is a block diagram that illustrates how video data
change records may represent varying amounts of video data,
according to an embodiment of the invention. Video data 201
comprises a series of frames. Each block of video data 201 may
represent, for example, 100 frames. Thus, VDCR 202 represents 400
frames and VDCR 206 represents 600 frames. VDCRs 202-226 comprise
two sets: a VDCR set 230 and a VDCR set 240. Suppose
that each VDCR in VDCR set 230 (i.e., VDCRs 202-214) is generated
based on events, as opposed to a pre-specified time interval
regardless of visual changes. For example, VDCR 204 represents an
event that lasted 600 frames, whereas VDCR 206 represents an event
that lasted 200 frames.
[0034] Further suppose that each VDCR in VDCR set 240 (i.e., VDCRs
222-226) is generated for each hour. Thus, for example, VDCR 222
may represent hour #1, VDCR 224 may represent hour #2, and VDCR
226 may represent hour #3. Therefore, a specific period of time may
be represented by multiple VDCRs, each of which represents a
different period of time.
[0035] In one embodiment, if a VDCR represents a number of frames
that is also represented by another VDCR, then each VDCR contains a
reference to the other VDCR, as illustrated. Thus, VDCR 222
contains references to VDCRs 202 and 204 and VDCRs 202 and 204 each
contain a reference to VDCR 222. Similarly, VDCR 226 contains
references to VDCRs 210-214 and VDCRs 210-214 each contain a
reference to VDCR 226.
[0036] VDCRs may be stored on disk in specified tables. Each table
may correspond to VDCRs of a certain type. For example, each table
of a plurality of tables may comprise VDCRs that correspond to a
specified time interval (e.g., day table, week table, month table,
etc.). As another example, each table of a plurality of tables may
comprise VDCRs that correspond to certain time frames (e.g., all
VDCRs generated in January, 2006, or all VDCRs generated during
week #51, etc.). Embodiments of the invention are not limited to
how VDCRs are organized on disk.
Regions
[0037] As described above, a two-dimensional view of video data may
be divided into multiple regions. A region may or may not be convex.
Multiple regions within a view may be of different sizes and
shapes.
[0038] If a VDCR is generated based on a detected change in pixel
values that is not associated with motion of an object, then the
change information of a particular region may indicate the amount
of change in pixel values within that particular region.
[0039] If a VDCR is generated based on a motion event, then the
change information of a particular region may indicate the
direction and/or velocity of the detected motion within that
particular region. For example, suppose a ball was thrown through
the view of a camera. A VDCR was generated for the few frames that
captured the event. The change information in every region of the
view through which the ball traveled will indicate that motion was
detected in that region and may indicate the direction of the ball
and the velocity of the ball.
[0040] FIG. 3 is a graphical depiction that illustrates how change
information may be stored on a per-region basis, according to an
embodiment of the invention. In this example, a camera view is
divided into multiple, non-overlapping rectangular regions. Each
region in FIG. 3 indicates one or more directions of motion. For
example, region 302 indicates that at least four directions of a
motion have been detected for that particular region. As this
example further illustrates, not every region must specify a
direction and/or speed, such as region 304. In such a case, the
change information for such a region may be empty or include some
indicator indicating zero direction and/or speed.
Abstraction of Change Information
[0041] In one embodiment, change information may be kept, not only
in per-region VDCRs, but also in multi-region VDCRs. For example,
if there are 100 regions within a view, and change information is
maintained for each region for a given region-level VDCR, then
view-level VDCRs may indicate change information for the entire
view, quadrant-level VDCRs may indicate change information for each
quadrant of the view, etc. In each VDCR, the change information is
abstracted to the level of the VDCR. Thus, the change information
for a view-level VDCR may contain 1/100th of the information
contained in the corresponding 100 region-level VDCRs. Similarly,
the change information for a single quadrant-level VDCR may contain
1/25th of the information contained in the corresponding
twenty-five region-level VDCRs that correspond to the quadrant.
[0042] It should be noted that VDCRs are logical pieces of
information, and do not necessarily correspond to distinct records
within a repository. For example, a single record or data structure
within a repository may include a view-level VDCR, its
corresponding quadrant-level VDCRs, and its corresponding
region-level VDCRs. Similarly, a single record or data structure
may be used to store change information aggregated at the week
level, the day level, the hour level, and the minute level. Data
structures that store change data that has been aggregated at
multiple levels of granularity are referred to herein as composite
VDCRs.
[0043] The nature of the repository used to store VDCRs may vary
from implementation to implementation. The techniques described
herein are not limited to any particular type of repository. For
example, the VDCRs may be stored in a multi-dimensional database, a
relational database, or an object-relational database. In one
embodiment, separate relational tables are used to store VDCRs at
different levels of granularity. Thus, one relational table may
have rows that correspond to view-level VDCRs, while another
relational table may have rows that correspond to region-level
VDCRs. In such an embodiment, indexes may be used to efficiently
locate the region-level VDCRs that correspond to a particular
view-level VDCR.
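As an illustration of such an embodiment, the sketch below creates two relational tables and the indexes that allow the region-level VDCRs for a given view-level VDCR to be located efficiently (SQLite via Python; the table and column names are invented for this example and are not part of this description):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per level of spatial granularity (a hypothetical schema).
    CREATE TABLE view_vdcr (
        id INTEGER PRIMARY KEY,
        start_time TEXT,
        end_time TEXT,
        motion_detected INTEGER
    );
    CREATE TABLE region_vdcr (
        id INTEGER PRIMARY KEY,
        view_vdcr_id INTEGER REFERENCES view_vdcr(id),
        region_row INTEGER,
        region_col INTEGER,
        max_pixel_delta REAL,
        max_speed REAL
    );
    -- Index so the region-level records that correspond to a particular
    -- view-level record can be found without a full scan.
    CREATE INDEX region_by_view ON region_vdcr (view_vdcr_id);
    CREATE INDEX view_by_time ON view_vdcr (start_time, end_time);
""")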
[0044] FIG. 4 is a block diagram that illustrates how composite
video data change records may store specific and generalized change
information, according to an embodiment of the invention. Video
data 401 comprises a series of frames. Each block of video data 401
may represent any number of frames, such as one hundred. Thus,
composite VDCR 402 may represent four hundred frames. The change
information of VDCR 402 may be represented on a per-region basis, a
per-quadrant basis, a per-view basis, and/or any other basis. In
this example, VDCR 402 comprises view data 403, quadrant data 404,
and region data 405. VDCR 402 may comprise other information such
as whether any visual change was detected in the frames that
correspond to VDCR 402 and the type of visual change (e.g., pixel
change, motion).
Alerts
[0045] In one embodiment, search criteria may be specified in which
all incoming video data is analyzed to determine whether any
detected visual changes satisfy the search criteria. Thus, for an
ongoing event (i.e., before a VDCR is generated for the event), an
alert may be triggered on the precise frame that the change
information (accumulated thus far) first satisfied the search
criteria. Once the event has completed, the accumulated change
information is stored (in the manner described above) in a VDCR so
that future searches with similar search criteria may return that
VDCR.
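A simplified sketch of the real-time check (the criteria fields and change-information keys are hypothetical):

def check_alert(accumulated_change, criteria):
    """Compare the change information accumulated so far for an ongoing
    event against real-time alert criteria."""
    if criteria.get("min_speed") is not None:
        if accumulated_change.get("max_speed", 0) < criteria["min_speed"]:
            return False
    if criteria.get("required_direction") is not None:
        if criteria["required_direction"] not in accumulated_change.get("directions", set()):
            return False
    return True

# Called as each new frame's change information is folded into the ongoing
# event; the first frame on which it returns True triggers the alert.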
Filters
[0046] In one embodiment, several levels of filtering may be
performed for various reasons, some of which may include (1)
reducing noise that may be generated when generating change
information and (2) determining dominant motion areas and
velocities within a scene. The change information of a particular
VDCR may be filtered across adjacent regions within a frame of the
corresponding plurality of frames, or across adjacent frames within
the corresponding plurality of frames, using various methods that
include, but are not limited to, smoothing filters, median filters,
and multi-dimensional clustering algorithms.
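For example, a median filter applied across adjacent regions might be implemented as follows (a sketch using NumPy; the 3x3 neighborhood is an assumption, and this is only one of the filtering methods listed above):

import numpy as np

def median_filter_regions(region_change):
    """Apply a 3x3 median filter over a 2-D grid of per-region change
    values to suppress isolated noisy regions."""
    rows, cols = region_change.shape
    filtered = np.copy(region_change)
    for r in range(rows):
        for c in range(cols):
            neighborhood = region_change[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            filtered[r, c] = np.median(neighborhood)
    return filtered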
Searching
[0047] With the storage techniques described above, the generated
VDCRs facilitate fast searches across multiple events and specified
time intervals. Because searches are executed against change
information, as described above (which may be thought of as
meta-data about visual changes), rather than the entire video data
itself, the searches may be performed much faster than if a user
were required to search through the entire video data or search on a
frame-by-frame basis of each detected change.
[0048] Thus, a user may specify search criteria that are compared
against each VDCR. For example, a user may search for all VDCRs
that indicate any motion, where the motion is more than 20 mph. As
another example, a user may search for a VDCR that indicates a
50% change in brightness in the lower-left quadrant of the
view.
[0049] Multiple indexes may be generated in order to facilitate
faster searches. Such indexes may be based on time, the type of
visual change, the speed of a motion, the direction of a motion,
etc.
[0050] Furthermore, the manner in which change information is
stored and the varying types of information that a VDCR may include
make possible many types of searches. For example, search criteria
of a particular search may include (1) multiple ranges of time, (2)
the speed of motion in some regions, (3) the direction of motion in
other regions, (4) an amount of pixel change in still other
regions, (5) the shape and type of behavior of multiple detected
objects, etc. The number of possible combinations of search
criteria is virtually unlimited.
Multi-Level Searching--Regions
[0051] As described above, change information that is generated
from video data may be aggregated at different levels of spatial
granularity. For example, the change information stored for a
particular time period may include (1) view-level VDCRs that
indicate change information relative to the entire view, (2)
quadrant-level VDCRs that indicate change information for each of
four quadrants of the view, and (3) region-level VDCRs that
indicate change information for each of a thousand grid regions
within the view. The search mechanism may make use of these
different levels of granularity to improve search performance.
[0052] For example, suppose a view is divided into one hundred
non-overlapping regions. Further, suppose that a user is searching
for motion events that occurred over a particular week, and that a
million region-level VDCRs have been generated for each region
during that week. Suppose that the search criteria require that a
specified type of motion occurred within each of twenty-four
specified regions of the view. In this example, if the entire
search is performed at the region-level of granularity, then
twenty-four million region-level VDCRs will have to be inspected
during the search.
[0053] Instead of performing the entire search at the region-level
of granularity, a multi-level search may be performed.
Specifically, during the first phase of the multi-level search,
each of a million view-level VDCRs may be inspected to find those
view-level VDCRs that indicate that the specified motion occurred
anywhere within the view. The determination may be based on
view-level change information in each view-level VDCR. The
view-level change information of a view-level VDCR indicates
whether motion was detected anywhere in the entire view during the
frames associated with the view-level VDCR. In the present example,
the first-level search will involve one million comparisons (one
for each view-level VDCR). For the purpose of explanation, assume
that 50,000 view-level VDCRs matched the first-level search.
[0054] During the second phase of the multi-level search,
quadrant-level VDCRs are inspected. However, rather than inspecting
all 4 million of the quadrant-level VDCRs, only the quadrant-level
VDCRs that correspond to the 50,000 view-level VDCRs are searched
in the second-level search. Further, if the 24 regions specified in
the search criteria only fall within two of the four quadrants,
then the second-level search need only involve the quadrant-level
VDCRs associated with those two quadrants. Thus, the second phase
of the search will involve no more than 100,000 quadrant-level
VDCRs.
[0055] Each quadrant-level VDCR includes quadrant-level data that
indicates whether motion was detected in any portion of the
corresponding quadrant. For the purpose of explanation, assume
that, based on the quadrant-level VDCRs, only 10,000 view-level
VDCRs of the 50,000 VDCRs included motion in those two
quadrants.
[0056] In the third level search, a region-level search is
performed against the region-level VDCRs that correspond to the
10,000 view-level VDCRs. When searching at the region-level of
granularity, 24 region-level VDCRs may need to be inspected for
each of the 10,000 view-level VDCRs. However, because the candidate
set of view-level VDCRs has been pruned down during the first two
search phases, the number of region-level comparisons performed
during the third-level search (240,000, in the present example)
will typically be far fewer than the number of comparisons (24
million) that would have been performed if all searching was done
at the region-level of granularity.
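The multi-level search described in paragraphs [0052] through [0056] can be sketched as follows (Python; the record structure, keyed by quadrant and region identifiers, is hypothetical):

def multi_level_search(view_vdcrs, target_quadrants, target_regions, matches):
    """Prune at coarse granularity before inspecting fine-grained records.
    view_vdcrs: list of dicts with hypothetical keys 'motion', 'quadrants'
    (quadrant id -> quadrant record), and 'regions' (region id -> region
    record). matches(record) applies the user's region-level criteria."""
    # Phase 1: keep only view-level records indicating any motion at all.
    candidates = [v for v in view_vdcrs if v["motion"]]
    # Phase 2: keep only those whose relevant quadrants saw motion.
    candidates = [v for v in candidates
                  if any(v["quadrants"][q]["motion"] for q in target_quadrants)]
    # Phase 3: apply the full criteria only to the surviving region records.
    results = []
    for v in candidates:
        if all(matches(v["regions"][r]) for r in target_regions):
            results.append(v)
    return results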
Multi-Level Searching--Time
[0057] As with areas of a view, a search may be separated into a
multi-level search according to time. For example, suppose a user
wants to find motion events that occurred between the hours of 1:00
AM and 5:00 AM during the past week. Further suppose that an
hour-level VDCR exists for each hour and a day-level VDCR exists for
each day. Thus, in the
first search level, each day-level VDCR of the past week is
examined to determine whether motion was detected in the
corresponding day. In the second search level, each hour-level VDCR
that is associated with a day-level VDCR that was identified in the
first search level is examined to determine whether motion was
detected in the corresponding hour.
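A sketch of the time-based drill-down (the day/hour record structure is assumed for illustration):

def search_by_time(day_vdcrs, hour_filter):
    """Drill down from day-level to hour-level records.
    day_vdcrs: list of dicts with hypothetical keys 'motion' and 'hours'
    (a list of hour-level records, each with 'hour' and 'motion').
    hour_filter(hour_record) encodes the user's criteria."""
    hits = []
    for day in day_vdcrs:
        if not day["motion"]:          # first level: skip days with no motion
            continue
        for hour in day["hours"]:      # second level: inspect only those days' hours
            if hour_filter(hour):
                hits.append(hour)
    return hits

# Example criterion: motion between 1:00 AM and 5:00 AM.
between_1_and_5 = lambda h: h["motion"] and 1 <= h["hour"] < 5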
[0058] In one embodiment, one level of a multi-level search may be
performed based on time and another level of the multi-level search
may be performed based on areas of the view. For example, suppose
search criteria specifies motion that occurred within a certain
area of a view between the hours of 1:00 AM and 5:00 AM during the
past week. Thus, the first two levels of the search may be used to
identify all hour-level/view-level VDCRs of the past week between
1:00 AM and 5:00 AM. Subsequent levels of the search may be used to
identify all hour-level/region-level VDCRs with change information
that indicates the specified motion in the specified area.
[0059] In one embodiment, users may specify the search criteria for
each level of a multi-level search. In another embodiment,
multi-level searches may be performed automatically and transparently
to the user, beginning at relatively coarse temporal/spatial
granularities and ending at the level of granularity of the
search criteria specified by the user. Thus, a single set
of search criteria may be automatically divided (e.g., by a query
compiler) into one or more general searches and one specific
search. Any mechanism for dividing search criteria into a
multi-level query may be used. Embodiments of the invention are not
limited to any specific mechanism.
Hardware Overview
[0060] FIG. 5 is a block diagram that illustrates a computer system
500 upon which an embodiment of the invention may be implemented.
Computer system 500 includes a bus 502 or other communication
mechanism for communicating information, and a processor 504
coupled with bus 502 for processing information. Computer system
500 also includes a main memory 506, such as a random access memory
(RAM) or other dynamic storage device, coupled to bus 502 for
storing information and instructions to be executed by processor
504. Main memory 506 also may be used for storing temporary
variables or other intermediate information during execution of
instructions to be executed by processor 504. Computer system 500
further includes a read only memory (ROM) 508 or other static
storage device coupled to bus 502 for storing static information
and instructions for processor 504. A storage device 510, such as a
magnetic disk or optical disk, is provided and coupled to bus 502
for storing information and instructions.
[0061] Computer system 500 may be coupled via bus 502 to a display
512, such as a cathode ray tube (CRT), for displaying information
to a computer user. An input device 514, including alphanumeric and
other keys, is coupled to bus 502 for communicating information and
command selections to processor 504. Another type of user input
device is cursor control 516, such as a mouse, a trackball, or
cursor direction keys for communicating direction information and
command selections to processor 504 and for controlling cursor
movement on display 512. This input device typically has two
degrees of freedom in two axes, a first axis (e.g., x) and a second
axis (e.g., y), that allows the device to specify positions in a
plane.
[0062] The invention is related to the use of computer system 500
for implementing the techniques described herein. According to one
embodiment of the invention, those techniques are performed by
computer system 500 in response to processor 504 executing one or
more sequences of one or more instructions contained in main memory
506. Such instructions may be read into main memory 506 from
another machine-readable medium, such as storage device 510.
Execution of the sequences of instructions contained in main memory
506 causes processor 504 to perform the process steps described
herein. In alternative embodiments, hard-wired circuitry may be
used in place of or in combination with software instructions to
implement the invention. Thus, embodiments of the invention are not
limited to any specific combination of hardware circuitry and
software.
[0063] The term "machine-readable medium" as used herein refers to
any medium that participates in providing data that causes a
machine to operate in a specific fashion. In an embodiment
implemented using computer system 500, various machine-readable
media are involved, for example, in providing instructions to
processor 504 for execution. Such a medium may take many forms,
including but not limited to, non-volatile media, volatile media,
and transmission media. Non-volatile media includes, for example,
optical or magnetic disks, such as storage device 510. Volatile
media includes dynamic memory, such as main memory 506.
Transmission media includes coaxial cables, copper wire and fiber
optics, including the wires that comprise bus 502. Transmission
media can also take the form of acoustic or light waves, such as
those generated during radio-wave and infra-red data
communications. All such media must be tangible to enable the
instructions carried by the media to be detected by a physical
mechanism that reads the instructions into a machine.
[0064] Common forms of machine-readable media include, for example,
a floppy disk, a flexible disk, hard disk, magnetic tape, or any
other magnetic medium, a CD-ROM, any other optical medium,
punchcards, papertape, any other physical medium with patterns of
holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory
chip or cartridge, a carrier wave as described hereinafter, or any
other medium from which a computer can read.
[0065] Various forms of machine-readable media may be involved in
carrying one or more sequences of one or more instructions to
processor 504 for execution. For example, the instructions may
initially be carried on a magnetic disk of a remote computer. The
remote computer can load the instructions into its dynamic memory
and send the instructions over a telephone line using a modem. A
modem local to computer system 500 can receive the data on the
telephone line and use an infra-red transmitter to convert the data
to an infra-red signal. An infra-red detector can receive the data
carried in the infra-red signal and appropriate circuitry can place
the data on bus 502. Bus 502 carries the data to main memory 506,
from which processor 504 retrieves and executes the instructions.
The instructions received by main memory 506 may optionally be
stored on storage device 510 either before or after execution by
processor 504.
[0066] Computer system 500 also includes a communication interface
518 coupled to bus 502. Communication interface 518 provides a
two-way data communication coupling to a network link 520 that is
connected to a local network 522. For example, communication
interface 518 may be an integrated services digital network (ISDN)
card or a modem to provide a data communication connection to a
corresponding type of telephone line. As another example,
communication interface 518 may be a local area network (LAN) card
to provide a data communication connection to a compatible LAN.
Wireless links may also be implemented. In any such implementation,
communication interface 518 sends and receives electrical,
electromagnetic or optical signals that carry digital data streams
representing various types of information.
[0067] Network link 520 typically provides data communication
through one or more networks to other data devices. For example,
network link 520 may provide a connection through local network 522
to a host computer 524 or to data equipment operated by an Internet
Service Provider (ISP) 526. ISP 526 in turn provides data
communication services through the world wide packet data
communication network now commonly referred to as the "Internet"
528. Local network 522 and Internet 528 both use electrical,
electromagnetic or optical signals that carry digital data streams.
The signals through the various networks and the signals on network
link 520 and through communication interface 518, which carry the
digital data to and from computer system 500, are exemplary forms
of carrier waves transporting the information.
[0068] Computer system 500 can send messages and receive data,
including program code, through the network(s), network link 520
and communication interface 518. In the Internet example, a server
530 might transmit a requested code for an application program
through Internet 528, ISP 526, local network 522 and communication
interface 518.
[0069] The received code may be executed by processor 504 as it is
received, and/or stored in storage device 510, or other
non-volatile storage for later execution. In this manner, computer
system 500 may obtain application code in the form of a carrier
wave.
[0070] In the foregoing specification, embodiments of the invention
have been described with reference to numerous specific details
that may vary from implementation to implementation. Thus, the sole
and exclusive indicator of what is the invention, and is intended
by the applicants to be the invention, is the set of claims that
issue from this application, in the specific form in which such
claims issue, including any subsequent correction. Any definitions
expressly set forth herein for terms contained in such claims shall
govern the meaning of such terms as used in the claims. Hence, no
limitation, element, property, feature, advantage or attribute that
is not expressly recited in a claim should limit the scope of such
claim in any way. The specification and drawings are, accordingly,
to be regarded in an illustrative rather than a restrictive
sense.
* * * * *