U.S. patent application number 10/908,557 was published by the patent
office on 2005-11-10 for "System and Method for Restricting Access
through a Mantrap Portal." This patent application is currently
assigned to COGNEX TECHNOLOGY AND INVESTMENT CORPORATION. Invention is
credited to Raymond A. Fix, Sanjay Nichani, and John Schwab.

United States Patent Application 20050249382
Kind Code: A1
Family ID: 34551584
Schwab, John; et al.
November 10, 2005

System and Method for Restricting Access through a Mantrap Portal
Abstract
A method and system provide increased levels of security for a
mantrap portal by continuously monitoring two zones of the mantrap:
a primary zone and a secondary zone. A primary sensor determines
that exactly one or zero people are present in the primary zone
when requesting access into a secured area. A secondary sensor
determines that exactly zero people are present in the secondary
zone when access to the secured area is granted. The primary and
secondary sensors in combination can detect piggyback events and
tailgating events before granting access to a secured area.
Further, the primary and secondary sensors in combination can
detect the presence of unauthorized persons in a mantrap prior to
granting access to the mantrap for exit from a secured area.
Inventors: Schwab, John (Framingham, MA); Fix, Raymond A. (Natick,
MA); Nichani, Sanjay (Natick, MA)

Correspondence Address: Arthur J. O'Dea, Legal Department, Cognex
Corporation, One Vision Drive, Natick, MA 01760-2077, US

Assignee: COGNEX TECHNOLOGY AND INVESTMENT CORPORATION, 1001
Rengstorff Avenue, Mountain View, CA

Family ID: 34551584

Appl. No.: 10/908,557

Filed: May 17, 2005
Related U.S. Patent Documents

Application Number    Filing Date    Patent Number
10/908,557            May 17, 2005
10/702,059            Nov 5, 2003
Current U.S. Class: 382/115; 340/5.7; 340/541; 340/542; 348/152;
348/156; 707/999.001

Current CPC Class: G07C 9/00 20130101; G06T 7/20 20130101; G07C 9/15
20200101; G06K 9/00778 20130101; G06T 7/593 20170101

Class at Publication: 382/115; 707/001; 340/541; 340/542; 340/005.7;
348/152; 348/156

International Class: G06K 009/00
Claims
What is claimed is:
1. A method of controlling access to a secured area using a
mantrap, the mantrap having a landside door and an airside door,
the method comprising: monitoring a primary zone, the primary zone
comprising a region within the mantrap having an area less than the
area of the mantrap, to determine the presence of one person in the
primary zone; monitoring a secondary zone, the secondary zone being
an area comprising a region of the mantrap not including the
primary zone, to determine the absence of any persons in the
secondary zone; and controlling access through the landside door
and the airside door in response to the steps of monitoring the
primary zone and monitoring the secondary zone.
2. The method according to claim 1 wherein the step of monitoring the
primary zone further comprises: acquiring a stereo image of the
primary zone; computing a first set of 3D features from the stereo
image of the primary zone; and determining the presence of one
person in the primary zone using the first set of 3D features.
3. The method according to claim 2 wherein the step of monitoring
the secondary zone further comprises: acquiring a stereo image of
the secondary zone; computing a second set of 3D features from the stereo
image of the secondary zone; and determining the absence of any
person in the secondary zone using the second set of 3D
features.
4. The method according to claim 1 further comprising setting an
alarm signal if the step of monitoring the primary zone fails to
determine the presence of one person in the primary zone.
5. The method according to claim 4 further comprising setting an
alarm signal if the step of monitoring the secondary zone fails to
determine the absence of any persons in the secondary zone.
6. The method according to claim 2 further comprising filtering the
first set of 3D features to exclude features that are computed to
be substantially near the ground in the primary zone.
7. The method according to claim 3 further comprising filtering the
second set of 3D features to exclude features that are computed to
be substantially near the ground in the secondary zone.
8. The method according to claim 1 wherein both the step of
monitoring the primary zone and the step of monitoring the secondary
zone are performed by a single three-dimensional machine vision
sensor.
9. A system for controlling access to a secured area using a
mantrap, the system comprising: a mantrap having a lockable
landside door and a lockable airside door; a primary sensor to
detect the presence of a person in a primary zone within the
mantrap, the primary zone comprising a region within the mantrap
having an area less than the area of the mantrap; a secondary
sensor to detect the absence of any persons within a secondary zone
within the mantrap, the secondary zone comprising a region within
the mantrap not including the primary zone; a controller coupled to
the primary sensor and the secondary sensor, the controller
actuating the lockable landside door and the lockable airside door
in response to the output of the primary sensor and the secondary
sensor.
10. The system according to claim 9 wherein the primary sensor is a
three-dimensional machine vision sensor adapted to monitor a first
volume of space directly above the primary zone.
11. The system according to claim 10 wherein the secondary sensor
is a three-dimensional machine vision sensor adapted to monitor a
second volume of space directly above the secondary zone.
12. The system according to claim 10 wherein the secondary sensor
comprises a plurality of three-dimensional machine vision sensors,
the plurality of three-dimensional machine vision sensors adapted
to cooperatively monitor a second volume of space directly above
the secondary zone, and wherein the controller is cooperatively
coupled to each of the plurality of three-dimensional machine
vision sensors.
13. The system according to claim 10 wherein the secondary sensor
is a presence/absence detector.
14. The system according to claim 13 wherein the presence/absence
detector is a sensor selected from the list consisting of a
pressure-sensitive mat, a light beam emitter/detector pair, and an
infra-red motion sensor.
15. A method for detecting objects in a mantrap, the method
comprising: acquiring a stereo image of a region of the mantrap,
the stereo image comprising a plurality of two-dimensional images
of the region; computing a set of 3D features from the stereo
image; determining the absence of any person in the region using
the set of 3D features; comparing one of the plurality of
two-dimensional images of the region to a baseline image; and detecting
an object in the mantrap from the step of comparing.
16. The method according to claim 15 wherein the baseline image is
computed from a plurality of images of the region of the mantrap
when no known objects are present.
17. The method according to claim 15 further comprising combining
the baseline image with at least one of the plurality of
two-dimensional images of the region if no objects are detected.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation-in-part of U.S. patent
application Ser. No. 10/702,059, entitled "Method and System for
Enhanced Portal Security through Stereoscopy," filed Nov. 5, 2003,
the contents of which are hereby incorporated by reference.
BACKGROUND OF THE INVENTION
[0002] This invention relates to security systems that permit
controlled access to a secured area. Specifically, this invention
relates to automatic door control in a secured area using a mantrap
portal.
[0003] A typical security issue associated with most access
controlled portal security systems is to ensure that when a person
is authorized for entry into a secured area, it is only that person
that is permitted to enter. A mantrap is a secured-portal
configuration commonly employed to restrict access to only one
authorized person at a time.
[0004] A mantrap portal is a small room with two doors: one door
for access to/from an unsecured area (called "landside"); and one
door for access to/from a secured area (called "airside"). The
basic operation of a mantrap for entry into a secured area from an
unsecured area can be described with reference to FIG. 1.
[0005] A representative mantrap 100 is shown to provide an access
portal between an unsecured landside region 130 and a secured
airside region 140. The mantrap 100 has a landside door 120 and an
airside door 110. The landside door 120 can be locked in the closed
position with landside lock 150, and the airside door 110 can be
locked in the closed position with airside lock 160. In the normal,
unoccupied configuration (not shown), the airside door 110 is
closed and locked with airside lock 160, while the landside door
120 is closed, but not locked by landside lock 150.
[0006] A person seeking access to the secured area (airside) will
approach the mantrap 100 represented by person 125. The landside
door can be opened while the airside door is locked. Once the
person seeking access is fully inside the mantrap, as represented
by person 105, a request for entry can be made at entry request
155. The entry request can be a card reader, a doorbell, or a
biometric input such as a palm or fingerprint reader, or a retina
scan. Once entry access is granted, the landside door 120 is in the
closed position and locked by landside lock 150. With the landside
door 120 locked closed, the airside lock 160 can be released, and
the airside door 110 can be opened. The person seeking access can
enter the secured area, represented as person 115. Once the airside
door 110 is closed and locked by airside lock 160, the landside
lock 150 can be released, to return the mantrap to the normal,
unoccupied position.
[0007] The mantrap 100 can operate to permit a person to exit the
secured airside region 140 while maintaining a high degree of
security. A request can be made at exit request 165, which starts
the door locking cycle. The landside door 120 is locked by landside
lock 150, and the airside door 110 is unlocked by airside lock 160.
The person seeking to exit can enter the mantrap, and the airside
door 110 can be locked so that the landside door 120 can be
unlocked, thereby permitting a person to exit. The mantrap
configuration operates to control access since the door to the
unsecured area can be locked in the closed position while the door
to the secured area is open.
[0008] The basic operation of a mantrap portal becomes increasingly
complex as security of the portal is enhanced. For example, mantrap
portals are commonly equipped with IR sensors and pressure mats to
prevent piggyback and tailgate violations.
[0009] Piggybacking can occur when an authorized person knowingly
or unknowingly provides access through a portal to another
traveling in the same direction. If a second, unauthorized person
is permitted to enter the secured area with the authorized person,
the security is breached.
[0010] Tailgating can occur when an authorized person knowingly or
unknowingly provides unauthorized access through a portal to
another traveling in the opposite direction. For example, an
unauthorized person entering the mantrap from the unsecured area
can wait until someone leaves the secured area; while the door from
the secured area into the mantrap is open, the unauthorized person
can enter, thereby breaching security.
[0011] Piggybacking and tailgating can be prevented in a mantrap
using door locks controlled by a door controller that has the
ability to count the number of people in the mantrap. To prevent
piggybacking violations, the door to the secured area is only
unlocked if there is exactly one authorized person seeking access
to the secured area. Tailgating is prevented by unlocking the door
from the secured area to permit exit only if nobody is detected in
the mantrap.
[0012] Mantrap portals with enhanced security, such as pressure
mats and IR sensors, are easily defeated by two people walking
close together, or by one person carrying another.
Accordingly, there exists a need for a system that can effectively
enhance the security of a mantrap portal.
BRIEF SUMMARY OF THE INVENTION
[0013] The present invention provides for improved methods and
systems for restricting access to a secured area using a mantrap
portal. An embodiment of the present invention continuously
monitors a primary zone to determine the presence or absence of one
person in the primary zone. The primary zone is a region of the
mantrap having an area less than the area of the entire mantrap,
preferably located proximal to the airside door.
While the primary zone is monitored, the present invention
continuously monitors a secondary zone to determine that no persons
are present. The secondary zone is a region of the mantrap not
including the primary zone. When the primary zone has exactly one
or zero people present, and at the same time the secondary zone has
exactly zero people present, the mantrap door locking/unlocking
cycle can commence to permit access/egress to/from the secured
area.
[0014] An exemplary embodiment of the present invention uses a
three-dimensional machine vision sensor to monitor the primary zone
and the secondary zone to identify and track detected features that
can be associated with people or a person. When used in conjunction
with a door access control system, alarm conditions can be
generated when unexpected conditions are detected.
[0015] Other embodiments of the present invention use a
three-dimensional machine vision sensor to monitor the primary zone
in combination with one or more presence/absence detectors to
monitor the secondary zone.
[0016] Further embodiments disclose methods and systems that
perform additional two-dimensional image analysis of regions of the
mantrap in combination with a three-dimensional image analysis so
that the extreme extents of the respective primary and secondary
zones, and regions not captured by the respective primary and
secondary zones are analyzed for the presence of any people or
objects.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
[0017] The present invention is further described in the detailed
description which follows, by reference to the noted drawings by
way of non-limiting exemplary embodiments, in which like reference
numerals represent similar parts throughout the several views of
the drawings, and wherein:
[0018] FIG. 1 is a plan view of a mantrap security portal according
to the background art;
[0019] FIG. 2 is a plan view of a mantrap security portal according
to the present invention;
[0020] FIG. 3 is a block diagram of a control system according to
the present invention;
[0021] FIG. 4 is a flowchart of the operation of the mantrap
security portal according to the present invention;
[0022] FIG. 5 is a perspective view of an embodiment of the present
invention;
[0023] FIG. 6 is a flowchart of the method used to detect people or
objects according to the exemplary embodiment of the present
invention;
[0024] FIG. 7 is a flowchart of the additional image analysis
methods used to detect people or objects according to an alternate
embodiment of the present invention;
[0025] FIG. 8 is a plan view of a mantrap security portal according
to an exemplary embodiment of the present invention.
[0026] FIG. 9 is a block diagram illustrating a coarse segmentation
process that identifies coarse people candidates according to an
embodiment of the present invention;
[0027] FIG. 10 is a diagram illustrating the coarse segmentation
process that identifies coarse people candidates according to an
embodiment of the present invention;
[0028] FIG. 11 is a block diagram illustrating a fine segmentation
process for validating or discarding coarse people candidates
according to an embodiment of the present invention;
[0029] FIG. 12 is a diagram illustrating a fine segmentation
process for validating or discarding coarse people candidates
according to an embodiment of the present invention;
[0030] FIG. 13 is a diagram illustrating a fine segmentation
process for validating or discarding coarse people candidates
according to an embodiment of the present invention; and
[0031] FIG. 14 is a block diagram illustrating a method used to
determine the number of people candidates by confidence level
scoring according to an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0032] Referring to FIG. 2, in accordance with the present
invention, there is provided a mantrap 100 to permit an enhanced
level of security. The mantrap 100 is a portal region between an
insecure landside region 130 and a secured airside region 140. The
mantrap 100 has a landside door 120 for access into and out from
the landside region 130, and an airside door 110, for access into
and out from the airside region 140. An airside door lock 160
permits remote locking of the airside door 110, and a landside door
lock 150 permits remote locking of the landside door 120. An access
request 155 is shown as a panel for requesting access into the
secured airside region 140, and an exit request 165 is shown as a
panel for requesting access from the secured airside region 140
into the mantrap 100.
[0033] As shown in FIG. 2, a primary zone 210 is established as a
region in the mantrap having an area less than the area of the
mantrap 100. A primary sensor 230 monitors the primary zone 210 to
determine if exactly one person is present in the primary zone. The
primary zone 210 can be located anywhere within the mantrap 100,
though preferably, the primary zone 210 is located adjacent to the
airside door 110.
[0034] A secondary zone 220 is established as a region within the
mantrap 100, not including the primary zone 210. The secondary zone
220 does not need to include the entire region of the mantrap 100
exclusive of the primary zone 210, though it is preferred that any
region within the mantrap not inclusive of the primary zone 210 and
the secondary zone 220 not be large enough for a person to
occupy.
[0035] A secondary sensor 240 monitors the secondary zone to
determine whether or not a person or object exists within the
secondary zone 220.
[0036] Referring to FIG. 3, a block diagram is shown in accordance
with the present invention. A controller 310 of the type
conventionally known in the art of access control for security
applications is used to control the airside door lock 160 and the
landside door lock 150. The controller can be any device that is
capable of reading inputs, processing simple logic, and controlling
the landside door and airside door. The controller may have the
capability for performing automatic door control, i.e., opening and
closing, in addition to actuation of the respective door locks. The
controller 310 can be a Programmable Logic Controller (PLC), or a
Personal Computer (PC) with the appropriate software
instructions.
[0037] The controller 310 is responsive to signals from an entry
request 155 and an exit request 165 upon presentation of an
appropriate credential by the person seeking access/exit. Each of
the entry request 155 and exit request 165 being of the type
conventionally known in the art of access control for security,
including, but not limited to, card readers, keypad terminals, or
biometric input stations, such as finger- or palm-print readers,
retinal scanners, or voice recognition stations.
[0038] The controller 310 is adapted to receive input signals from
the primary sensor 230 and the secondary sensor 240 to actuate the
airside door lock 160 and the landside door lock 150 in response to
either of the entry request 155 or exit request 165 terminals. FIG.
4 depicts a flowchart of the basic operation of the controller 310
according to the present invention.
[0039] Referring to FIG. 4, the controller initializes the mantrap
100 with the appropriate signals to lock the airside door at step
410, and unlock the landside door at step 420. The entry request terminal
155 is monitored at step 430 and the exit request terminal 165 is
monitored at step 440. If neither an entry request 430 nor an exit
request 440 is made, processing loops continuously.
[0040] Referring to FIG. 2 in conjunction with FIG. 4, a person
seeking access to the secured airside region approaches the
mantrap, shown as person 125. Once an entry request is made,
processing continues to step 450 where the output of the primary
sensor 230 and the secondary sensor 240 are considered by the controller
310. If the primary sensor does not output a signal indicating that
one person is in the primary zone, or if the secondary sensor does
not output a signal indicating that no objects or people are
detected in the secondary zone, processing continues by looping in
place, as shown by processing path 455, until both conditions are
met.
[0041] When the person seeking access is in the primary zone 210,
shown in FIG. 2 as person 105, the primary sensor outputs a signal
indicating that one person is detected in the primary zone. If
there are no people or objects detected in the secondary zone 220,
the secondary sensor outputs a signal indicating that no such
people or objects are detected, and processing continues. At this
point, the landside door is locked at step 470 and the airside door
is unlocked at step 480, so that the person seeking access can
enter the secured airside region, shown as person 115.
[0042] Processing continues by looping back to step 410 where the
airside door is returned to the locked state, and the landside door
is unlocked at step 420.
[0043] If an exit request is detected at step 440, processing
continues to step 460, where the signals from the primary sensor
230 and secondary sensor 240 are considered by the controller 310.
At step 460, if the primary sensor does not indicate that no people
are present in the primary zone 210 or if the secondary sensor does
not indicate that no people or objects are present in the secondary
zone 220, processing continues by looping in place, as shown by
processing path 465.
[0044] When both the primary sensor detects that zero people are
present in the primary zone 210, and the secondary sensor detects
that no people or objects are present in the secondary zone,
processing continues to step 470 where the landside door is locked.
When the airside door is unlocked at step 480, the person
requesting to exit from the secured airside region can enter the
mantrap through the airside door 110. At that point, the airside
door can be locked at step 410 and the landside door can be
unlocked at 420, so that the person can exit the mantrap through
the landside door 120.
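By way of illustration only (this sketch is not part of the original
disclosure), the FIG. 4 control loop might be rendered in Python as
follows; the sensor, door-lock, and request-terminal interfaces are
hypothetical stand-ins, since the disclosure does not specify an API.

    import time

    def run_mantrap_controller(primary, secondary, doors, requests):
        # primary/secondary: sensors exposing count(), the number of
        # people detected in their zone; doors: lock/unlock methods
        # for each door; requests: entry/exit request predicates.
        while True:
            doors.lock_airside()       # step 410: normal idle state
            doors.unlock_landside()    # step 420

            if requests.entry_requested():      # step 430
                # Step 450 / path 455: wait until exactly one person is
                # in the primary zone and the secondary zone is empty.
                while not (primary.count() == 1 and secondary.count() == 0):
                    time.sleep(0.05)
            elif requests.exit_requested():     # step 440
                # Step 460 / path 465: wait until the mantrap is empty.
                while not (primary.count() == 0 and secondary.count() == 0):
                    time.sleep(0.05)
            else:
                time.sleep(0.05)                # keep polling
                continue

            doors.lock_landside()      # step 470
            doors.unlock_airside()     # step 480
            # Control loops back to step 410 once passage is complete.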
[0045] One skilled in the art of controlling access to a secured
area using a conventional door control system will appreciate that
the basic operation of the mantrap 100 can be modified in various
ways without departing from the scope and spirit of the present
invention. For example, the entry request terminal 155 can be
placed outside the mantrap in the unsecured landside region 130,
and the normal idle state of the mantrap can be configured with
both the airside door 110 and the landside door 120 in the locked
state. Further, several alarm conditions can be initiated by the
controller 310 if the looping path 455 or the looping path 465 are
traversed for a specified duration.
[0046] In an exemplary embodiment of the present invention, the
primary sensor 230 and the secondary sensor 240 are each a
three-dimensional machine vision sensor described herein with
reference to FIG. 5. Each of the primary sensor 230 and the
secondary sensor 240 has a 3D image processor, memory, discrete
I/O, and a set of stereo cameras 10, in an integrated unit mounted
in the mantrap 100. The primary sensor 230 is mounted in the
ceiling above the airside door 110 looking downward and outward
towards the primary zone 210. The secondary sensor 240 is mounted
in a position so that it can observe the secondary zone 220. One
skilled in the art will appreciate that the primary sensor 230 and
the secondary sensor 240 can be mounted in any number of positions
relative to the respective primary and secondary zones.
[0047] In each of the sensors, the set of cameras 10 is calibrated
to provide heights above the ground plane for any point in the
field of view. Therefore, when any object enters the field of view,
it generates interest points called "features," the heights of which
are measured relative to the ground plane. These points are then
clustered in 3D space to provide "objects". These objects are then
tracked in multiple frames to provide "trajectories".
[0048] In an exemplary system, the baseline distance between the
optical centers of the cameras is 12 mm and the lenses have a focal
length of 2.1 mm (150 degree Horizontal Field of View (HFOV)). The
cameras are mounted approximately 2.2 meters from the ground and
have a viewing area that is approximately 2.5 by 2.5 meters. The
surface normal to the plane of the cameras points downward and
outward as shown in FIG. 5 wherein the cameras are angled just
enough to view the area just below the mounting point.
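As an illustrative aside (not part of the original disclosure), the
stated geometry implies depth via the standard stereo relation
Z = f*B/d. The pixel pitch below is an assumed value, since the text
gives only the baseline and the focal length.

    FOCAL_LENGTH_MM = 2.1
    BASELINE_MM = 12.0
    PIXEL_PITCH_MM = 0.006   # assumed 6-micron pixels, illustration only

    def depth_from_disparity(disparity_px):
        # Z = f * B / d, with d converted from pixels to millimeters.
        disparity_mm = disparity_px * PIXEL_PITCH_MM
        return (FOCAL_LENGTH_MM * BASELINE_MM / disparity_mm) / 1000.0

    # A head about 0.5 m below a camera mounted at 2.2 m is roughly
    # 1.7 m away, i.e., a disparity near 2.5 pixels under these values.
    print(depth_from_disparity(2.5))   # ~1.68 m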
[0049] In the exemplary embodiment of the present invention, various
parameters are set up in the factory. The factory setup involves
calibration and the computation of the intrinsic parameters for the
cameras and the relative orientation between the cameras.
Calibration involves the solution of several sub-problems, as
discussed hereinafter, each of which has several solutions that are
well understood by persons having ordinary skill in the art.
Further, rectification coefficients, described hereinafter, must be
computed to enable run time image correction.
[0050] Stereo measurements could be made in a coordinate system
that is different from the coordinate system of either camera. For
example, the scene or world coordinates correspond to the points in
a viewed scene. Camera coordinates (left and right) correspond to
the viewer-centered representation of scene points. Undistorted
image coordinates correspond to scene points projected onto the
image plane. Distorted image coordinates correspond to points
having undergone lens distortion. Pixel coordinates correspond to
the grid of image samples in the image array.
[0051] In the exemplary embodiment one camera is designated to be a
"reference camera",to which the stereo coordinate system is tied.
An interior orientation process is performed to determine the
internal geometry of a camera. These parameters, also called the
intrinsic parameters, include the following: effective focal
length, also called the camera constant; location of the principal
point, also called the image center; radial distortion
coefficients; and horizontal scale factor, also called the aspect
ratio. The cameras used in the exemplary embodiment have
fixed-focus lenses that cannot be modified; therefore these
parameters can be computed and preset at the factory.
[0052] A relative orientation process is also performed to
determine the relative position and orientation between two cameras
from projections of calibration points in the scene. Again, the
cameras are mechanically fixtured such that they stay in alignment
and hence these parameters can also be preset at the factory.
[0053] A rectification process, closely associated with the
relative orientation, is also performed. Rectification is the
process of resampling stereo images so that epipolar lines
correspond to image rows. "An epipolar line on one stereo image
corresponding to a given point in another stereo image is the
perspective projection on the first stereo image of the
three-dimensional ray that is the inverse perspective projection of
the given point from the other stereo image," as described in Robert
M. Haralick & Linda G. Shapiro, Computer and Robot Vision Vol.
II 598 (1993), incorporated herein by reference. If the left and
right images are coplanar and the horizontal axes are collinear (no
rotation about the optical axis), then the image rows are epipolar
lines and stereo correspondences can be found along corresponding
rows. These images, referred to as normal image pairs, provide
computational advantages because the rectification of normal image
pairs need only be performed one time.
[0054] The method for rectifying the images is independent of the
representation used for the given pose of the two cameras. It
relies on the principle that any perspective projection is a
projective projection. Image planes corresponding to the two
cameras are replaced by image planes with the desired geometry
(normal image pair) while keeping the geometry of the rays spanned
by the points and the projection centers intact. This results in a
planar projective transformation. These coefficients can also be
computed at the factory.
[0055] Given the parameters computed in interior orientation,
relative orientation and rectification, the camera images can be
corrected for distortion and misalignment either in software or
hardware. The resulting corrected images have the geometry of a
normal image pair, i.e., square pixels, aligned optical planes,
aligned axes (rows), and pinhole camera model.
[0056] An exterior orientation process is also performed during
factory set up of the exemplary embodiment. The exterior
orientation process is needed because 3D points in a viewed scene
are only known relative to the camera coordinate system. Exterior
orientation determines the position and orientation of a camera in
an absolute coordinate system. An absolute 3D coordinate system is
established such that the XY plane corresponds to the ground plane
and the origin is chosen to be an arbitrary point on the plane.
[0057] Ground plane calibration is performed at the location of the
installation. In an embodiment, the primary sensor 230 and the
secondary sensor 240 are mounted on a plane that is parallel to the
floor, and the distance between the respective sensor and the floor
is entered. Alternatively, calibration targets can be laid out in
the floor to compute the relationship between the stereo coordinate
system attached to the reference camera and the world or scene
coordinates system attached to the ground plane.
[0058] Regions of interest are also set up manually at the location
of the installation. This involves capturing the image from the
reference camera (camera that the stereo coordinate system is tied
to), rectifying it, displaying it and then using a graphics overlay
tool to specify the zones to be monitored. Multiple zones can be
pre-selected to allow for different run-time algorithms to run in
each of the zones. The multiple zones typically include particular
3D spaces of interest. Filtering is performed to eliminate features
outside of the zones being monitored, i.e., the primary zone 210.
In alternative embodiments of the invention, automatic setup can be
performed by laying out fiducial markings or tape on the floor.
[0059] While there are several methods to perform stereo vision to
monitor each of the primary zone 210 and the secondary zone 220
according to the present invention, one such method is outlined
below with reference to FIG. 6. This method detects features in a
3D scene using primarily boundary points or edges (due to occlusion
and reflectance) because the information is most reliable only at
these points. One skilled in the art will appreciate that the
following method can be performed by each of the primary sensor 230
and the secondary sensor 240 simultaneously and independently. By
the manner in which each of the respective sensors are
independently coupled to the controller 310, it is not necessary
for both primary and secondary sensors to communicate directly with
each other.
[0060] Referring to FIG. 6, a set of two-dimensional images is
provided, e.g., a right image and a left image. One of the images
is designated the reference image. Both of the images are rectified
at step 610. Each respective rectification step is performed by
applying an image rectification transform that corrects for
alignment and lens distortion, resulting in virtually coplanar
images. Rectification can be performed by using standard image
rectification transforms known in the art. In an exemplary
embodiment, the image rectification transform is implemented as a
lookup table through which pixels of a raw image are transformed
into pixels of a rectified image.
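As an illustrative sketch only (not part of the original disclosure),
a lookup-table rectification of the kind described above can be
expressed in a few lines of Python with NumPy; the identity table here
is a placeholder for coefficients that would come from calibration.

    import numpy as np

    def rectify(raw, lut_x, lut_y):
        # lut_x/lut_y give, for every rectified pixel, the raw-image
        # coordinates (nearest neighbor) from which to copy its value.
        return raw[lut_y, lut_x]

    h, w = 480, 640
    raw = np.zeros((h, w), dtype=np.uint8)   # stand-in raw image
    lut_y, lut_x = np.indices((h, w))        # identity table as a stand-in
    rectified = rectify(raw, lut_x, lut_y)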
[0061] At 620, the rectified two-dimensional image points from the
reference image (X.sub.R, Y.sub.R) are matched to corresponding
two-dimensional image points in the non-reference image (X.sub.L,
Y.sub.L). By first rectifying the images, reference image points
(X.sub.R, Y.sub.R) are matched to non-reference image points
(X.sub.L, Y.sub.L) along the same row, or epipolar line. Matching
can be performed through known techniques in the art, such as in T.
Kanade et al, A Stereo Machine for Video-rate Dense Depth Mapping
and its New Applications, Proc. IEEE Computer Vision and Pattern
Recognition (CVPR), pp. 196-202 (1996), the entire contents of
which are incorporated herein by reference.
[0062] At 630, a set of disparities D corresponding to the matched
image points is computed relative to the reference image points
(X.sub.R, Y.sub.R), resulting in a disparity map (X.sub.R, Y.sub.R,
D), also called the depth map or the depth image. The disparity map
contains a corresponding disparity `d` for each reference image
point (X.sub.R, Y.sub.R). By rectifying the images, each disparity
`d` corresponds to a shift in the x-direction.
[0063] At 640, a three dimensional model of the door scene is
generated in 3D world coordinates. In one embodiment, the three
dimensional scene is first generated in 3D camera coordinates
(X.sub.c, Y.sub.c, Z.sub.c) from the disparity map (X.sub.R,
Y.sub.R, D) and intrinsic parameters of the reference camera
geometry. The 3D camera coordinates (X.sub.c, Y.sub.c, Z.sub.c) for
each image point are then converted into 3D world coordinates
(X.sub.w, Y.sub.w, Z.sub.w) by applying an appropriate coordinate
system transform.
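For illustration only (not part of the original disclosure), steps 630
and 640 can be sketched as follows; f, b, (cx, cy), R, and t are
hypothetical stand-ins for the factory intrinsic parameters and the
field camera-to-world transform.

    import numpy as np

    def disparity_to_world(xr, yr, d, f, b, cx, cy, R, t):
        # xr, yr: reference-image coordinates; d: disparities (pixels);
        # f: focal length in pixels; b: baseline; R, t: camera-to-world.
        z = f * b / d                      # depth from disparity
        x = (xr - cx) * z / f              # pinhole back-projection
        y = (yr - cy) * z / f
        cam = np.stack([x, y, z])          # (X_c, Y_c, Z_c)
        return R @ cam + t.reshape(3, 1)   # (X_w, Y_w, Z_w)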
[0064] At 650, the target volume, i.e., the volume of space
directly above the observed zone, can be dynamically adjusted and
image points outside the target volume are clipped. The 3D world
coordinates of the mantrap scene (X.sub.w, Y.sub.w, Z.sub.w) that
fall outside the 3D world coordinates of target volume are clipped.
In a particular embodiment, clipping can be effectively performed
by setting the disparity value `d` to zero for each image point
(X.sub.R, Y.sub.R) whose corresponding 3D world coordinates fall
outside the target volume, resulting in a filtered disparity map
"filtered (X.sub.R, Y.sub.R, D)". A disparity value that is equal
to zero is considered invalid. The filtered disparity map is
provided as input to a multi-resolution people segmentation process
commencing at 660.
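As a sketch only (not part of the original disclosure), the clipping
of step 650 might look like the following; modeling the target volume
as an axis-aligned box is an assumption the text does not make.

    import numpy as np

    def clip_to_target_volume(disparity, world_xyz, box_min, box_max):
        # disparity: (H, W) map; world_xyz: (3, H, W) world coordinates
        # per pixel; box_min/box_max: corners of the target volume.
        bmin = np.asarray(box_min).reshape(3, 1, 1)
        bmax = np.asarray(box_max).reshape(3, 1, 1)
        inside = np.all((world_xyz >= bmin) & (world_xyz <= bmax), axis=0)
        # Zero marks an invalid disparity, per the convention above.
        return np.where(inside, disparity, 0.0)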
[0065] At 660, coarse segmentation is performed for identifying
people candidates within the target volume. According to one
embodiment, coarse segmentation includes generating a topological
profile of the target volume from a low resolution view of the
filtered disparity map. Peaks within the topological profile are
identified as potential people candidates. A particular embodiment
for performing coarse segmentation is illustrated in FIGS. 9 and
10.
[0066] At 670, fine segmentation is performed for confirming or
discarding people candidates identified during coarse segmentation.
According to one embodiment, the filtered disparity map is analyzed
within localized areas at full resolution. The localized areas
correspond to the locations of the people candidates identified
during the coarse segmentation process. In particular, the fine
segmentation process attempts to detect head and shoulder profiles
within three dimensional volumes generated from the localized areas
of the disparity map. A particular embodiment for performing fine
segmentation is illustrated in FIGS. 11 through 13.
[0067] Coarse Segmentation of People Candidates
[0068] FIGS. 9 and 10 are diagrams illustrating a coarse
segmentation process that identifies coarse people candidates
according to one embodiment. In particular, FIG. 9 is a flow
diagram illustrating a coarse segmentation process that identifies
coarse people candidates according to one embodiment. The detected
locations of the coarse people candidates resulting from the
segmentation process are then forwarded to a fine segmentation
process for validation or discard.
[0069] At 700, the filtered disparity map is segmented into bins.
For example, in FIG. 10, the filtered disparity map 755 includes
points (X.sub.R, Y.sub.R, D) which are segmented into bins 752,
such that each bin contains a set of image points (X.sub.BIN,
Y.sub.BIN) and their corresponding disparities (D.sub.BIN).
[0070] At 701 of FIG. 9, a low resolution disparity map is
generated from calculated mean disparity values of the bins. For
example, in FIG. 10, a low resolution disparity map 760 is
generated including points (X.sub.M, Y.sub.M, D.sub.M) where the
points (X.sub.M, Y.sub.M) correspond to bin locations in the high
resolution disparity map 755 and D.sub.M corresponds to the mean
disparity values d.sub.M calculated from those bins.
[0071] In a particular embodiment, a mean disparity value d.sub.M
for a particular bin can be calculated by generating a histogram of
all of the disparities D.sub.BIN in the bin having points
(X.sub.BIN, Y.sub.BIN). Excluding the bin points in which the
disparities are equal to zero and thus invalid, a normalized mean
disparity value d.sub.M is calculated. The normalized mean
disparity d.sub.M is assigned to a point in the low resolution
disparity map for that bin.
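For illustration only (not part of the original disclosure), steps 700
and 701 reduce to binning and a mean over the valid disparities in
each bin; the bin size is an assumed value.

    import numpy as np

    def low_resolution_map(filtered, bin_size=16):
        h, w = filtered.shape
        hb, wb = h // bin_size, w // bin_size
        low = np.zeros((hb, wb))
        for i in range(hb):
            for j in range(wb):
                bin_d = filtered[i*bin_size:(i+1)*bin_size,
                                 j*bin_size:(j+1)*bin_size]
                valid = bin_d[bin_d > 0]   # zero disparities are invalid
                low[i, j] = valid.mean() if valid.size else 0.0
        return low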
[0072] At 702 of FIG. 9, peaks are identified in the topological
profile of the low resolution disparity map. In a particular
embodiment, a peak is identified at a location in the low
resolution disparity map having the largest value for mean
disparity value d.sub.M. The extent of the peak is determined by
traversing points in every direction, checking the disparity values
at each point, and stopping in a direction when the disparity
values start to rise. After determining the extent of the first
peak, the process repeats for any remaining points in the low
resolution map that have not been traversed.
[0073] For example, in FIG. 10, peak locations are identified at
(x.sub.M1, y.sub.M1) and (x.sub.M2, y.sub.M2) of the low resolution
disparity map 760 having mean disparity values d.sub.M1, d.sub.M2.
The arrows extending from the peak locations illustrate the paths
traversed from the peak locations. A watershed algorithm can be
implemented for performing the traversal routine.
[0074] Alternatively, pixels in the disparity map whose 3.times.3
neighborhoods are relatively flat can be determined to be peak
locations.
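As a sketch only (not part of the original disclosure), the simpler
3.times.3 flat-neighborhood alternative of the preceding paragraph can
be written as a local-maximum scan; the minimum-disparity cutoff is an
assumption.

    import numpy as np

    def find_peaks(low, min_disparity=1.0):
        peaks = []
        h, w = low.shape
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                window = low[y-1:y+2, x-1:x+2]
                if low[y, x] >= min_disparity and low[y, x] == window.max():
                    peaks.append((x, y, low[y, x]))  # (x_M, y_M, d_M)
        return peaks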
[0075] At 703 of FIG. 9, each of the peak locations is converted
to an approximate head location in the high resolution filtered
disparity map. For example, in FIG. 10, peak locations (x.sub.M1,
y.sub.M1) and (x.sub.M2, y.sub.M2) in the low resolution disparity
map 760 are converted into locations (x.sub.R1, y.sub.R1) and
(x.sub.R2, y.sub.R2) in the high resolution disparity map 755. This
conversion can be accomplished by multiplying the peak locations by
the number and size of the bins in the corresponding x- or
y-direction.
[0076] At 704 of FIG. 9, the locations of the coarse people
candidates (e.g., (x.sub.R1, y.sub.R1) and (x.sub.R2, y.sub.R2)) in
the filtered disparity map and the mean disparity values d.sub.M1,
d.sub.M2 of the corresponding peak locations are forwarded to a
fine segmentation process for validating or discarding these
locations as people candidates, as in FIG. 11.
[0077] Fine Segmentation of People Candidates
[0078] FIGS. 11, 12, and 13 are diagrams illustrating a fine
segmentation process for validating or discarding coarse people
candidates according to one embodiment. In particular, FIG. 11 is a
flow diagram illustrating a fine segmentation process for validating
or discarding coarse people candidates according to one embodiment.
In particular, the fine segmentation process obtains more accurate,
or fine, locations of the coarse people candidates in the filtered
disparity map and then determines whether the coarse people
candidates have the characteristic head/shoulder profiles from
localized analysis of the high resolution filtered disparity map.
Depending on the results, the fine segmentation process either
validates or discards the people candidates.
[0079] At 800, a two-dimensional head template is generated having
a size relative to the disparity of one of the coarse candidates.
Disparity corresponds indirectly to height such that as disparity
increases, the distance from the camera decreases, and thus the
height of the person increases. For example, FIG. 12 is a block
diagram of an exemplary head template according to one embodiment.
In the illustrated embodiment, the template model 870 includes a
head template 875. The head template 875 is a circular model that
corresponds to the top view of a head.
[0080] The dimensions of the head template 875 are based on the
coarse location of the candidate (e.g., x.sub.R1, y.sub.R1), the
mean disparity value (e.g., d.sub.M1), and known dimensions of a
standard head (e.g., 20 cm in diameter, 10 cm in radius). For
example, to compute the dimensions of the head template, the
position of the head is computed in 3D world coordinates (X, Y, Z)
from the calculated coarse location and a mean disparity value
using the factory data (e.g., intrinsic parameters of camera
geometry) and field calibration data (e.g., camera to world
coordinate system transform). Next, consider another point in the
world coordinate system which is (X+10 cm, Y, Z) and compute the
position of the point in the rectified image space (e.g., x.sub.R2,
y.sub.R2) which is the image space in which all the image
coordinates are maintained. The length of the vector defined by
(x.sub.R1, y.sub.R1) and (x.sub.R2, y.sub.R2) corresponds to the
radius of the circular model for the head template 875.
[0081] Furthermore, each point within the area of the resulting
head template 875 is assigned the mean disparity value (e.g.,
d.sub.M1) determined for that candidate. Points outside the head
template 875 are assigned an invalid disparity value equal to
zero.
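For illustration only (not part of the original disclosure), the
template construction of paragraphs [0079]-[0081] might be sketched as
follows; project() and backproject(), which map between world and
rectified-image coordinates, are hypothetical stand-ins for the
calibration machinery.

    import numpy as np

    def make_head_template(coarse_xy, d_mean, project, backproject,
                           shape, head_radius_m=0.10):
        # Head center in world coordinates from the coarse location.
        X, Y, Z = backproject(coarse_xy, d_mean)
        # Project a point 10 cm away to get the radius in pixels.
        x2, y2 = project((X + head_radius_m, Y, Z))
        radius_px = np.hypot(x2 - coarse_xy[0], y2 - coarse_xy[1])
        # d_mean inside the circular head model, invalid (zero) outside.
        yy, xx = np.indices(shape)
        disk = np.hypot(xx - coarse_xy[0], yy - coarse_xy[1]) <= radius_px
        return np.where(disk, d_mean, 0.0)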
[0082] At 810 of FIG. 11, a fine location for the candidate is
determined through template matching. For example, in the
illustrated embodiment of FIG. 13, the template model 870 overlays
the filtered disparity map 755 at an initial position corresponding
to the coarse head location (e.g., x.sub.R1, y.sub.R1). The
disparities of the filtered disparity map 755 that fall within the
head template 875 are then subtracted from the mean disparity value
for the coarse people candidate (e.g., d.sub.M1). A sum of the
absolute values of these differences is then computed as a template
score that serves as a relative indication of whether the
underlying points of the filtered disparity map correspond to a
head. Other correlation techniques may also be implemented to
generate the template score.
[0083] The template matching is repeated, for example, by
positioning the template 870 to other areas such that the center of
the head template 875 corresponds to locations about the original
coarse location of the candidate (e.g., x.sub.R1, y.sub.R1). A fine
location for the candidate (x.sub.F1, y.sub.F1) is obtained from
the position of the head template 875 at which the best template
score was obtained.
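As a sketch only (not part of the original disclosure), the
sum-of-absolute-differences matching of paragraphs [0082]-[0083] might
look like this; the search-window size is an assumption.

    import numpy as np

    def best_head_location(filtered, coarse_xy, d_mean, radius_px,
                           search=5):
        yy, xx = np.indices(filtered.shape)
        best_xy, best_score = coarse_xy, np.inf
        x0, y0 = coarse_xy
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                disk = np.hypot(xx - (x0 + dx), yy - (y0 + dy)) <= radius_px
                score = np.abs(filtered[disk] - d_mean).sum()
                if score < best_score:       # lower SAD = better match
                    best_xy, best_score = (x0 + dx, y0 + dy), score
        return best_xy                       # fine location (x_F, y_F)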
[0084] At 820, another mean disparity value d.sub.F1 is computed
from the points of the filtered disparity map within the head
template 875 centered at the fine candidate location (x.sub.F1,
y.sub.F1). In a particular embodiment, the mean disparity value
d.sub.F1 can be calculated by generating a histogram of all the
disparities of the filtered disparity map that fall within the head
template. Excluding the points in which the disparities are equal
to zero and thus invalid, the normalized mean disparity value
d.sub.F1 is calculated.
[0085] At 830, people candidates are discarded for lack of coverage
by analyzing the disparities that fall within the head template
which is fixed at the fine head location. For example, it is known
that disparity corresponds to the height of an object. Thus, a
histogram of a person's head is expected to have a distribution, or
coverage, of disparities that is centered at a particular disparity
and tapers downward. If the resulting histogram generated at 820 does
not conform to such a distribution, it is likely that the candidate
is not a person and the candidate is discarded for lack of
coverage.
[0086] At 840, the process determines whether there are more coarse
candidates to process. If so, the process returns to 800 to analyze
the next candidate. Otherwise, the process continues at 850.
[0087] At 850, people candidates having head locations that overlap
with head locations of other people candidates are discarded. In a
particular embodiment, the head locations of all of the people
candidates are converted from the filtered disparity map into their
corresponding 3D world coordinates. People candidates whose head
locations overlap with the head locations of other people
candidates result in at least one of the candidates being
discarded. Preferably, the candidate corresponding to a shorter
head location is discarded, because the candidate likely
corresponds to a neck, shoulder, or other object other than a
person.
[0088] At 860, the one or more resulting fine head locations (e.g.,
x.sub.F1, y.sub.F1) of the validated people candidates and the
corresponding mean disparity values (e.g., d.sub.F1) are forwarded
for further processing to determine if the number of people in the
observed zone can be determined, at step 652.
[0089] Confidence Level Scoring of the Fuzzy Scoring Module
[0090] FIG. 14 is a flow diagram illustrating augmenting people
candidates by confidence level scoring according to one embodiment.
The input to the scoring algorithm includes the list of validated
people candidates and their locations in the filtered disparity
map. In particular, the input can be a data structure (e.g., array
or linked list data structure) in which the size of the data
structure corresponds to the number of validated people
candidates.
[0091] If, at 900, the number of validated people candidates is
equal to one or more persons, a confidence score F1 can be
generated at 910. The confidence score F1 corresponds to a
confidence level that the target volume contains only one person.
The confidence score F1 can be a value between 0 and 1.
[0092] If, at 920, the number of validated people candidates is
equal to two or more persons, a confidence score F2 can be
generated at 930. The confidence score F2 corresponds to a
confidence level that the target volume contains two or more
persons. The confidence score F2 can be a value between 0 and
1.
[0093] At 940, a confidence score F0 can be generated regardless of
the number of validated people candidates. The confidence score F0
corresponds to a confidence level that the target volume contains
at least one person. The confidence score F0 can be a value between
0 and 1.
[0094] At 950, 960, and 970 respectively, the confidence scores F0,
F1, and F2 are each averaged with confidence scores from previous
frames, resulting in average confidence scores F0.sub.AVG,
F1.sub.AVG and F2.sub.AVG. In a preferred embodiment, the
confidence scores F0, F1, F2 are weighted according to weights
assigned to each frame. The weights are intended to filter out
confidence scores generated from frames giving spurious
results.
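For illustration only (not part of the original disclosure), the frame
averaging of paragraph [0094] can be sketched as a weighted running
mean; the window length and the down-weighting rule for suspect frames
are assumptions.

    from collections import deque

    class ConfidenceAverager:
        def __init__(self, window=10):
            self.frames = deque(maxlen=window)   # (score, weight) pairs

        def update(self, score, weight=1.0):
            # Add this frame's score; return the weighted average F_AVG.
            self.frames.append((score, weight))
            total = sum(w for _, w in self.frames)
            return sum(s * w for s, w in self.frames) / total

    f1_avg = ConfidenceAverager()
    for f1 in (0.9, 0.8, 0.2, 0.95):     # 0.2 might be a spurious frame
        avg = f1_avg.update(f1, weight=0.1 if f1 < 0.5 else 1.0)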
[0095] At 980, the average confidence scores F0.sub.AVG, F1.sub.AVG
and F2.sub.AVG are used to determine the number of people present
(or absent) in the target volume.
[0096] Referring back to FIG. 6, the primary sensor 230 and the
secondary sensor 240 according to the exemplary embodiment
consider the confidence scores from step 980 to make a
determination about the number of people candidates in the
respective primary zone 210 and secondary zone 220, and a
confidence level of that determination, as shown at decision step
652. If the confidence level is sufficient to make such a
determination, when interfaced to the controller 310 using discrete
I/O, a signal can be asserted to the controller 310 to indicate
whether no people are present, one person is present, or more than
one person is present, at step 672. If the confidence level is not
sufficient to make such a determination, a signal is asserted to
indicate that the sensor is "not ready," at step 662.
[0097] At step 652, motion analysis between frames is also used in
asserting the "ready/not ready" signal, i.e., in determining whether
the respective sensor has an unambiguous result and can
determine the number of people in the observed zone. In an
illustrative embodiment, motion detection is performed using an
orthographic projection histogram of 3D points on the floor. Each
point in the histogram is weighted such that the closer the point
is to the sensor, the less it contributes to the histogram value,
following the square law: a point twice as far away contributes
four times as much, resulting in a normalized count. The sum of
absolute differences is computed for the current frame and several
frames earlier, using a ring buffer. If the difference is
excessive, motion is sufficient to suggest that the observed scene
is not at a steady state to report a result. One skilled in the art
will appreciate that other methods of motion detection and/or
tracking objects between frames can be performed to determine a
steady state sufficient to report a result. A sequence of such
views and statistics for a duration (determined by the size of the
ring buffer) is used to determine if the system "ready/not ready"
signal can be asserted so that the number (or absence) of people in
the observed zone can be determined.
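As a sketch only (not part of the original disclosure), the motion
test of this paragraph might be expressed as follows; the grid
resolution, the 2.5 m floor extent (taken from the viewing area given
earlier), the ring-buffer depth, and the threshold are assumptions.

    import numpy as np
    from collections import deque

    class MotionDetector:
        def __init__(self, grid=32, lag=5, threshold=50.0):
            self.buffer = deque(maxlen=lag)   # ring buffer of histograms
            self.grid, self.threshold = grid, threshold

        def steady(self, points):
            # points: (N, 3) array of 3D points; column 2 is the distance
            # from the sensor. Square-law weighting normalizes the count:
            # a point twice as far contributes four times as much.
            x, y, r = points.T
            hist, _, _ = np.histogram2d(
                x, y, bins=self.grid,
                range=[[0.0, 2.5], [0.0, 2.5]], weights=r ** 2)
            ready = False
            if len(self.buffer) == self.buffer.maxlen:
                sad = np.abs(hist - self.buffer[0]).sum()
                ready = sad < self.threshold  # little motion: steady state
            self.buffer.append(hist)
            return ready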
[0098] The exemplary embodiment of the present invention can be
implemented using the CPS-1000 PeopleSensor available from Cognex
Corporation, Natick, Mass., for both the primary sensor 230 and the
secondary sensor 240.
[0099] While the exemplary embodiment describes an implementation
of the present invention in a basic rectangular mantrap, the
invention can also be applied to large mantrap implementations and
complex geometrical shaped mantraps. The secondary sensor can
accommodate a large or an irregularly shaped secondary zone,
through the use of a plurality of secondary sensors with the
respective outputs logically combined (i.e., "ORed"). FIG. 8
depicts an exemplary arrangement of a plurality of secondary
sensors in an "L" shaped mantrap 105. Referring to FIG. 8, the
primary sensor 230 is mounted to observe the primary zone 210 in
front of the airside door 110. The secondary zone is split into two
regions, each with a secondary sensor. The first secondary zone 221
is observed by a first secondary sensor 241. The second secondary
zone 222 is observed by a second secondary sensor 242. As shown in
FIG. 8, the first secondary zone 221 can overlap the second
secondary zone 222 to ensure complete coverage. One skilled in the
art will appreciate that a plurality of secondary sensors can be
adapted to provide complete coverage of a secondary zone of a
mantrap that is shaped in an irregular pattern, or where regions of
the mantrap secondary zone would be obscured from view of a single
secondary sensor due to internal walls and/or partitions.
[0100] In an alternate embodiment of the present invention,
additional image analysis can be performed to provide increased
levels of security. The primary and secondary sensors in the
exemplary embodiment analyze a three-dimensional space for features
associated with objects or people in the respective zones. As
described above, each of the sensors performs volume filtering to
consider only those features that are detected in the 3D space
above the respective primary zone 210 or secondary zone 220. The
additional image analysis of the alternate embodiment will detect a
person lying down, or attempting to bring foreign objects into the
secure area.
[0101] A flowchart of the operation of the additional image
analysis of the alternate embodiment is shown in FIG. 7. During
operation, the three-dimensional space is analyzed according to the
methods described above. At step 710, if there are no people or
objects detected, e.g., the signal asserted by step 672 of FIG. 6
corresponds to no people or objects present, processing continues
to step 720 where a comparison of a two-dimensional image is made
to a baseline image 725.
[0102] An initial baseline image is provided during an initial
setup configuration. To collect the initial baseline image, a
plurality of images of the scene are acquired and statistics about
the variation of each pixel are computed. If the variance of the
intensity of a pixel is too high, it is added into a mask image so
that it is not considered by subsequent processing. For example, a
video monitor mounted within the mantrap will appear to be
constantly changing appearance, and therefore, can be masked from
consideration so that it does not falsely indicate the presence of
a person or object in the region during operation. The computed
statistics can also be used to set threshold levels used to
determine changes that are significant and those that are not.
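For illustration only (not part of the original disclosure), the
baseline setup of this paragraph reduces to per-pixel statistics over
a stack of empty-scene images; the variance cutoff and the threshold
rule are assumptions.

    import numpy as np

    def build_baseline(images, var_cutoff=100.0):
        # images: (N, H, W) stack of the empty mantrap scene.
        baseline = images.mean(axis=0)
        variance = images.var(axis=0)
        mask = variance > var_cutoff   # True = ignore (e.g., a monitor)
        # Per-pixel change thresholds scaled to the observed noise.
        thresholds = 3.0 * np.sqrt(variance) + 10.0
        return baseline, mask, thresholds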
[0103] The comparison step 720 compares the current two-dimensional
rectified image (from steps 610 and 612 of FIG. 6) to the baseline
image. If a pixel in the current image significantly differs in
value from the baseline, it is noted. These differing points can be
clustered together and if the resulting clusters have sufficient
size, it would suggest that a foreign object is in the mantrap. The
clustering can be performed using conventional blob image
analysis.
[0104] At step 730, if a significant difference is not detected,
processing continues to step 740 where the baseline image is
updated so that the comparison step 720 does not become susceptible
to gradual changes in appearance. At step 740, the baseline image
725 is linearly combined with the current image compared at step
720. Processing then continues with another pass of the continuous
cycle of analysis.
[0105] At step 730, if a significant difference is detected,
processing continues to step 735 where a significantly differing
pixel increments a timestamp count. At step 745, if the timestamp
count exceeds a threshold, the baseline pixel is updated at step
740. This threshold could be user settable, allowing the user to
decide how fast differences in the appearance of the mantrap get
blended into the baseline image. By setting the threshold long
enough, the dynamic baseline can be rendered essentially static. At
step 750, a signal is asserted to indicate to the controller that a
person or object is detected, and processing continues with another
pass of the continuous cycle of analysis.
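As a sketch only (not part of the original disclosure), the
compare-and-update cycle of paragraphs [0103]-[0105] might look like
the following; the blend weight, the persistence threshold, and the
crude pixel-count stand-in for blob clustering are assumptions.

    import numpy as np

    ALPHA = 0.95           # weight kept for the old baseline when blending
    PERSIST_FRAMES = 300   # frames a difference persists before blending in

    def update_baseline(current, baseline, mask, thresholds, persist):
        # Returns (new_baseline, persist, object_detected).
        differs = (np.abs(current - baseline) > thresholds) & ~mask
        persist = np.where(differs, persist + 1, 0)
        # Blend where the scene matches, or where a change has persisted
        # long enough to become part of the baseline (step 740).
        blend = ~differs | (persist > PERSIST_FRAMES)
        new_baseline = np.where(
            blend, ALPHA * baseline + (1 - ALPHA) * current, baseline)
        object_detected = bool(differs.sum() > 50)
        return new_baseline, persist, object_detected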
[0106] Optionally, for an even higher level of security, one might
cluster the pixels being updated, and if there are sufficient
numbers and areas, a security guard might be notified with an image
of the new baseline image.
[0107] If most of the pixels are different than the dynamic
baseline, it could signify a drastic lighting change. This could be
caused by something like a light burning out. In this case one
could automatically reselect image exposure parameters, run 3-d
processing, reselect a dynamic 2D baseline and/or notify a security
guard about the change.
[0108] When a person seeking entry into the secured region
enters the mantrap, the primary zone must be masked out of the
image in addition to the regions of high pixel value variance. When
someone is exiting the secured region through the mantrap, the
entire space (both primary and secondary zones) can be examined to
make sure that the area is clear and no one is attempting an
ambush.
[0109] In a second alternative embodiment of the present invention,
the primary sensor 230 and the secondary sensor 240 are implemented
as a single three-dimensional machine vision sensor configured to
observe both the primary zone and the secondary zone at the same
time, or in rapid succession.
[0110] In yet another alternative embodiment of the present
invention, the secondary sensor 240 is a presence/absence detector,
or a series of presence/absence detectors. In this embodiment, for
example, the secondary sensor can be a pressure-sensitive mat that
outputs a signal indicating that a person or object is standing or
resting on the mat. Alternatively, the presence/absence detector
can be one or more light beam emitter/detector pairs that output a
signal indicating that a person or object blocks the light
emissions directed from the emitter to the detector.
[0111] Alternatively, the presence/absence detector can be an IR
sensor that outputs a signal indicating that motion of a person or
object is detected in the secondary zone. Further, one skilled in
the art will appreciate that the secondary sensor according to the
present invention can be any of a combination of various types of
presence/absence detectors that can be logically combined to output
a signal indicating that a person or object exists in the secondary
zone.
[0112] Although various calibration methods are described herein in
terms of exemplary embodiments of the invention, persons having
ordinary skill in the art should appreciate that any number of
calibration methods can be used without departing from the spirit
and scope of the invention. Although the exemplary embodiment
described herein is set up in the factory using factory setup
procedures, persons having ordinary skill in the art should
appreciate that any of the described setup steps can also be
performed in the field without departing from the scope of the
invention.
[0113] Although an interior orientation process is described for
determining the internal geometry of cameras in terms of the camera
constant, the image center, radial distortion coefficients, and
aspect ratio,
persons having ordinary skill in the art should appreciate that
additional intrinsic parameters may be added or some of these
parameters ignored in alternative embodiments within the scope of
the present invention.
[0114] Although ground plane calibration in the exemplary
embodiments described herein is performed at the location of
installation, persons having ordinary skill in the art should
appreciate that ground plane calibration could also be performed in
the factory or at alternate locations without departing from the
spirit and scope of the invention.
[0115] Although the invention is described herein in terms of a two
camera stereo vision system, persons skilled in the art should
appreciate that a single camera can be used to take two or more
images from different locations to provide stereo images within the
scope of the invention. For example, a camera could take separate
images from a plurality of locations. Alternatively, a plurality of
optical components could be arranged to provide a plurality of
consecutive views to a stationary camera for use as stereo images
according to the invention. Such optical components include
reflective optical components, for example, mirrors, and refractive
optical components, for example, lenses.
[0116] Although exemplary embodiments of the present invention are
described in terms of filtering objects having predetermined
heights above the ground plane, persons having ordinary skill in
the art should appreciate that a stereo vision system according to
the present invention could also filter objects at a predetermined
distance from any arbitrary plane such as a wall, without departing
from the spirit or scope of the invention.
* * * * *