U.S. patent application number 11/305694 was filed with the patent office on 2005-12-16 and published on 2007-06-21 for database assisted OCR for street scenes and other images.
Invention is credited to Bret Taylor, Luc Vincent.
Application Number | 20070140595 11/305694 |
Document ID | / |
Family ID | 38173572 |
Filed Date | 2005-12-16 |
United States Patent
Application |
20070140595 |
Kind Code |
A1 |
Taylor; Bret ; et
al. |
June 21, 2007 |
Database assisted OCR for street scenes and other images
Abstract
Optical character recognition (OCR) for images such as a street
scene image is generally a difficult problem because of the variety
of fonts, styles, colors, sizes, orientations, occlusions and
partial occlusions that can be observed in the textual content of
such scenes. However, a database query can provide useful
information that can assist the OCR process. For instance, a query
to a digital mapping database can provide information such as one
or more businesses in a vicinity, the street name, and a range of
possible addresses. In accordance with an embodiment of the present
invention, this mapping information is used as prior information or
constraints for an OCR engine that is interpreting the
corresponding street scene image, resulting in much greater
accuracy of the digital map data provided to the user.
Inventors: |
Taylor; Bret; (Menlo Park,
CA) ; Vincent; Luc; (Palo Alto, CA) |
Correspondence
Address: |
GOOGLE / FENWICK
SILICON VALLEY CENTER
801 CALIFORNIA ST.
MOUNTAIN VIEW
CA
94041
US
|
Family ID: |
38173572 |
Appl. No.: |
11/305694 |
Filed: |
December 16, 2005 |
Current U.S.
Class: |
382/310 ;
701/468 |
Current CPC
Class: |
G06K 9/00664 20130101;
G06K 9/3258 20130101; G06K 9/72 20130101; G06K 2209/01
20130101 |
Class at
Publication: |
382/310 ;
701/213 |
International
Class: |
G06K 9/03 20060101
G06K009/03; G01C 21/00 20060101 G01C021/00 |
Claims
1. A method for assisting optical character recognition (OCR) of a
street scene image using a mapping system database, the method
comprising: determining a target GPS location for a street scene
image using known GPS data associated with that street scene image;
estimating a street address of the target GPS location; identifying
a target address range based on the street address of the target
GPS location; querying a mapping system database to identify a
business name having a street address in the target address range;
performing OCR of the street scene image to determine if key words
associated with the identified business name are present; and in
response to determining that at least one key word associated with
the identified business name is present, determining an actual GPS
location for the street address of that business name, based on the
known GPS data.
2. The method of claim 1 further comprising: updating the mapping
system database to include the actual GPS location.
3. The method of claim 1 further comprising: repeating the method
for a number of additional target GPS locations.
4. The method of claim 1 wherein performing OCR of the street scene
image to determine if key words associated with the identified
business name are present further includes performing image
analysis of the street scene image to determine if expected
features associated with the identified business name are
present.
5. The method of claim 1 wherein the street scene image includes
one street address.
6. The method of claim 1 wherein the street scene image is a
panoramic image that includes a plurality of street addresses.
7. The method of claim 1 wherein the known GPS data includes known
GPS locations for at least two locations captured in the street
scene image.
8. The method of claim 1 wherein the mapping system database
includes a business listings directory.
9. A method for assisting optical character recognition (OCR) of a
street scene image using a database, the method comprising:
querying a database to identify a feature expected in a street
scene image of one or more street addresses, the street scene image
associated with known GPS data; performing OCR of the street scene
image to determine if the expected feature is present in the street
scene image; and in response to determining that the expected
feature is present, determining an actual GPS location for a street
address associated with that expected feature.
10. The method of claim 9 further comprising: updating the database
to include the actual GPS location.
11. The method of claim 9 wherein querying the database identifies
a number of textual and non-textual features.
12. The method of claim 11 further comprising: performing image
analysis of the street scene image to determine if non-textual
expected features are present.
13. The method of claim 9 wherein the street scene image includes
one street address.
14. The method of claim 9 wherein the street scene image is a
panoramic image that includes a plurality of street addresses.
15. The method of claim 9 wherein the known GPS data includes known
GPS locations for at least two locations captured in the street
scene image.
16. A method for assisting optical character recognition (OCR) of
an image using a database, the method comprising: querying a
database to identify at least one keyword corresponding to text
expected to be in an image; and performing OCR of the image to
determine if the keyword is present in the image.
17. The method of claim 16 wherein the keyword is further
associated with a street address or a key event.
18. The method of claim 16 wherein querying the database identifies
a number of textual and non-textual expected features.
19. The method of claim 18 further comprising: performing image
analysis of the image to determine if non-textual expected features
are present.
20. The method of claim 16 wherein the image is one of a photograph
or video frame.
21. The method of claim 16 wherein the image is associated with
known GPS location data, and in response to determining that the
keyword is present, the method further comprises determining an
actual GPS location associated with that keyword.
Description
RELATED APPLICATIONS
[0001] This application is related to U.S. application Ser. No.
11/088,542, filed Mar. 23, 2005, titled "Generating and Serving
Tiles in a Digital Mapping System." In addition, this application
is related to U.S. application Ser. No. 11/051,534, filed Feb. 5,
2005, titled "A Digital Mapping System." In addition, this
application is related to U.S. application Ser. No. 11/181,386,
filed Jul. 13, 2005, titled "Visually-Oriented Driving Directions
in Digital Mapping System." Each of these applications is herein
incorporated in its entirety by reference.
FIELD OF THE INVENTION
[0002] The invention relates to optical character recognition
(OCR), and more particularly, to database assisted OCR for images
such as street scenes.
BACKGROUND OF THE INVENTION
[0003] There is a current trend for capturing photographic data
(pictures) of cities, streets, businesses, etc. These pictures are
typically captured in a way that also captures GPS location and
orientation (e.g., facing 67 degrees east). This data can then be
used by mapping services, to enhance and augment the quality of the
data being returned. For example, when returning a map of 123
University Avenue, Palo Alto Calif. 94301, street level pictures of
this location can also be returned, which can significantly improve
the user experience and the value of the map information
returned.
[0004] One problem here is that the mapping from a GPS location to
a street address, and vice versa, is not always very accurate. This
problem can be traced to the way map data is collected. In general,
the GPS location of certain "anchor" street addresses along a
particular street is known, but addresses in-between these anchors
are interpolated. As such, significant discrepancies can sometimes
be observed between the actual GPS location of an address and the
interpolated location. As a result, the street images shown by a
mapping service for a particular address could end up being shifted
by as much as 100 yards or more.
[0005] What is needed, therefore, are techniques that improve the
accuracy of interpolated or otherwise estimated street address
locations.
SUMMARY OF THE INVENTION
[0006] One embodiment of the present invention provides a method
for assisting optical character recognition (OCR) of an image using
a database. The method includes querying a database to identify at
least one keyword corresponding to text expected to be in an image,
and performing OCR of the image to determine if the keyword is
present in the image. In one such configuration, the image is
associated with known GPS location data, and the keyword(s) can be
derived from information associated with the image, such as a
business name, address, street name, or other descriptive
information. The keywords are used to assist the OCR process in
identifying text in the image. Another embodiment of the present
invention further extends the above method, by determining, in
response to determining that the keyword is present in the image,
an actual GPS location associated with that keyword. In another
such embodiment, the keyword is further associated with a key event
captured in the image (e.g., a touch down in a
sub-titled/closed-captioned video image). In another such
embodiment, querying the database identifies a number of textual
and non-textual expected features. In this case, the method may
further include performing image analysis of the image to determine
if non-textual expected features are present. The image can be, for
example, one of a photograph or video frame.
[0007] Another embodiment of the present invention provides a
method for assisting optical character recognition (OCR) of a
street scene image using a database. In this embodiment, the method
includes querying a database to identify a feature expected in a
street scene image of one or more street addresses, the street
scene image associated with known GPS data. The method continues
with performing OCR of the street scene image to determine if the
expected feature is present in the street scene image. In response
to determining that the expected feature is present, the method
continues with determining an actual GPS location for a street
address associated with that expected feature. The method may
include updating the database to include the actual GPS location.
In one particular case, querying the database identifies a number
of textual and non-textual features. In such an embodiment, the
method may further include performing image analysis of the street
scene image to determine if non-textual expected features are
present.
[0008] Another embodiment of the present invention provides a
method for assisting optical character recognition (OCR) of a
street scene image using a mapping system database. The method
includes determining a target GPS location for a street scene image
using known GPS data associated with that street scene image,
estimating a street address of the target GPS location, and
identifying a target address range based on the street address of
the target GPS location. The method continues with querying a
mapping system database to identify a business name having a street
address in the target address range, and performing OCR of the
street scene image to determine if key words associated with the
identified business name are present. In response to determining
that at least one key word associated with the identified business
name is present, the method continues with determining an actual
GPS location for the street address of that business name, based on
the known GPS data. The method may include updating the mapping
system database to include the actual GPS location. The method may
include repeating the method for a number of additional target GPS
locations. In one particular case, performing OCR of the street
scene image to determine if key words associated with the
identified business name are present further includes performing
image analysis of the street scene image to determine if expected
non-textual features associated with the identified business name
are present. The street scene image can be, for example, a
panoramic image that includes a plurality of street addresses.
Alternatively, the street scene image (e.g., regular or panoramic)
can include one street address. The known GPS data includes, for
instance, known GPS locations for at least two locations captured
in the street scene image. The mapping system database may include,
for example, a business listings directory, and/or other digital
mapping system data.
[0009] The features and advantages described herein are not
all-inclusive and, in particular, many additional features and
advantages will be apparent to one of ordinary skill in the art in
view of the figures and description. Moreover, it should be noted
that the language used in the specification has been principally
selected for readability and instructional purposes, and not to
limit the scope of the inventive subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] FIG. 1a illustrates a typical city block that includes a
number of physical addresses.
[0011] FIG. 1b illustrates a conventional mapping system's
representation of the city block shown in FIG. 1a.
[0012] FIG. 2 is a block diagram of a database assisted OCR system
configured in accordance with one embodiment of the present
invention.
[0013] FIG. 3 is a block diagram of the street scene image OCR
module shown in FIG. 2, configured in accordance with one
embodiment of the present invention.
[0014] FIG. 4 illustrates a method for assisting optical character
recognition of street scene images using a mapping system database,
in accordance with one embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0015] Optical character recognition (OCR) for images such as
street scenes (e.g., storefronts) is generally a difficult problem
because of the variety of fonts, styles, colors, sizes,
orientations, occlusions and partial occlusions that can be
observed in the textual content of such scenes. However, a database
query can provide useful information that can assist the OCR
process.
[0016] For instance, a query to a digital mapping database can
provide information such as one or more businesses in a vicinity,
the street name, and a range of possible addresses. In accordance
with an embodiment of the present invention, this mapping
information is used as prior information or constraints for an OCR
engine that is interpreting the corresponding street scene. The
result of the OCR process can also be used to refine or otherwise
update the mapping system database, particularly if the GPS
location and orientation where the picture was taken is known. The
result is much greater accuracy of the digital map data provided to
the user.
[0017] Example Application
[0018] Consider the following example shown in FIG. 1a, which shows
a typical city block (University Ave) that includes a number of
addresses, ranging from 100 to 200, and their actual locations.
This range of addresses is included between two intersections, as
are most city blocks. An image of the entire city block could be
captured, for example, in a panoramic photograph. Other features
typically found in such a city block scene, such as parked cars,
curb, sidewalk, trees and planters, retaining walls, glass
storefronts, signage, addresses, etc., may also be included in a
given city block. Note that the use of terms such as "city block"
and "intersection" is not intended to exclude other streets that
include a range of addresses, whether residential or
commercial.
[0019] FIG. 1b illustrates a conventional mapping system's
representation of the city block shown in FIG. 1a. An underlying
digital map system database (e.g., Navteq) has an actual GPS
location for 100 University Ave and 200 University Ave. Assume only
one side of the street is being considered. In conventional mapping
systems, the addresses between 100 and 200 University Ave. are
typically interpolated by dividing the city block equally.
[0020] However, a common situation is where there are only a few
addresses between the anchor addresses of 100 and 200 (e.g., 180,
170, 168, and 164, as shown in FIG. 1a). So, when querying the
database for an address such as 102 University Avenue, one will get
a map image location very close to 100 (based on even address
distribution). In actuality, however, 102 University Avenue could
be as much as halfway between the anchor addresses of 100
University Ave and 200 University Ave, or closer to 200 University
Ave than to 100 University Ave. Indeed, in this example, there is
no 102 University Avenue.
[0021] In more detail, and with reference to FIGS. 1a and 1b, a
first street intersection is at 100 University Ave and a second
intersection is at 200. Significantly, there are no street numbers
between 100 University Ave and 164 University Ave in this example.
Thus, if the GPS location of the store (or other structure) at 164
University Ave is interpolated using conventional techniques (based
on its street address), it would be placed far to the left of its
actual location, as shown in FIG. 1b.
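By way of illustration only, the conventional even-distribution
interpolation discussed above can be sketched as follows. The function
name and the use of a fraction along the block are assumptions made for
this sketch; it does not reproduce any particular mapping system.

```python
def interpolate_address_position(address, anchor_lo, anchor_hi):
    """Return the fraction (0.0 to 1.0) along the block at which a
    conventional system would place `address`, assuming street numbers
    are spread evenly between the two anchor addresses."""
    if anchor_hi == anchor_lo:
        raise ValueError("anchor addresses must differ")
    return (address - anchor_lo) / (anchor_hi - anchor_lo)


# Street numbers named in the example block, plus the non-existent 102.
for number in (102, 164, 168, 170, 180):
    fraction = interpolate_address_position(number, 100, 200)
    print(f"{number} University Ave -> {fraction:.2f} of the way along the block")
```

The fraction depends only on the street number, so it carries no
information about where a structure actually stands. This is the source
of the discrepancy between FIG. 1a and FIG. 1b.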
[0022] As such, the user may be confused or otherwise have a
difficult time when attempting to actually locate 164 University
Ave. This problem is exacerbated on longer streets and streets that
have one or more breaks in the middle. In the latter case, it is
possible that the returned map data provided to the user could send
the user to the wrong section of the street. If the user does not
know that the street continues after a break (e.g., to allow for a
park or large campus), then that user may conclude that the target
street address does not actually exist.
[0023] As will be apparent in light of this disclosure, database
assisted OCR can be used to significantly improve this situation.
For instance, assume a collection of images or photographs taken
between 100 University Ave and 200 University Ave are available.
Alternatively, assume a wide panorama image showing the entire city
block between 100 and 200 University Ave is available. In any such
case, a significant amount of existing database information is
known about the images. For instance, a mapping database (e.g.,
Navteq or Google Local) would indicate that the street numbers are
all even numbers ranging in numerical order between 100 and 200.
Furthermore, actual street numbers are known. In addition, business
names at these addresses are known, as well as the order that these
businesses occur along the street.
[0024] This existing database information can be used as a set of
constraints for an OCR engine specifically trained to work on city
scenes. For example, a list of constraints (such as the mentioned
existing database information) could be used by an OCR engine to
reduce the problem to word spotting (i.e., is this word or number
detected in this image?). Alternatively, or in addition, Hidden
Markov Models (HMM) and other statistical approaches can be used to
provide a set of constraints, as used in OCR applications such as
forms recognition.
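As a minimal sketch of the word spotting idea, the following assumes
that a generic OCR engine has already produced a list of raw text
tokens for an image, and that the database query has returned a list of
expected keywords. The approximate string matching from the Python
standard library is a stand-in chosen for this sketch, not the
statistical approach named above, and the token and keyword values are
hypothetical.

```python
import difflib


def spot_keywords(ocr_tokens, expected_keywords, cutoff=0.75):
    """Return {keyword: matched_token} for each expected keyword that
    approximately matches one of the raw OCR tokens."""
    lowered = [token.lower() for token in ocr_tokens]
    found = {}
    for keyword in expected_keywords:
        # Keep at most one candidate whose similarity ratio is >= cutoff.
        matches = difflib.get_close_matches(
            keyword.lower(), lowered, n=1, cutoff=cutoff
        )
        if matches:
            found[keyword] = matches[0]
    return found


# Hypothetical noisy OCR output and hypothetical database keywords.
tokens = ["UNIVERS1TY", "C0FFEE", "164", "xq"]
expected = ["University", "Coffee", "164", "Bakery"]
print(spot_keywords(tokens, expected))
```

Because the engine only has to decide whether a short list of expected
words is present, misread characters (such as a "1" in place of an "i")
can be tolerated that would defeat unconstrained recognition.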
[0025] Through such a constrained OCR approach, a much refined
mapping from street addresses to GPS locations is provided.
Numerous benefits can be realized, including refinement of the
underlying database (e.g., Navteq), and improved user experience
(the right images would be displayed for each street address). In
addition, this approach as described herein could be used as a more
efficient and cost-effective process to collect Navteq-type data.
Note that this approach is not limited to digital pictures and
mapping systems but could also be used for video data, or panorama
images of the style currently produced for the CityBlock Project at
Stanford University.
[0026] System Architecture
[0027] FIG. 2 is a block diagram of a database assisted OCR system
configured in accordance with one embodiment of the present
invention. As can be seen, the system includes a street scene image
OCR module 205, a map data database 210, and a digital mapping
system 215.
[0028] In the embodiment shown in FIG. 2, the database assisted OCR
system operates in both an off-line capacity, as well as in an
on-line capacity. In particular, OCR results are generated by the
street scene image OCR module 205 in an off-line process, and can
be used to refine/update the map data in database 210. This
off-line process effectively improves the accuracy of map data in
the database 210, by removing errors associated with address
interpolation and other techniques for estimating address locations
on a given street, as previously discussed. The structural and
functional details of the street scene image OCR module 205 are
discussed in more detail with reference to FIGS. 3 and 4.
[0029] During on-line operations, requests (e.g., HTTP) for map
data (e.g., written and graphical driving directions, maps, local
data, etc.) are received by the digital mapping system 215. The
request can be initiated, for example, by a user engaging a web
browser of a computing device to access the system. In response to
such a client request, the digital mapping system 215 accesses the
map data database 210 and integrates the relevant map data into the
response to the request. This map data can then be served to the
requestor via a network (e.g., Internet or local area network) and
web browser of the requestor's computing device.
[0030] The digital mapping system 215 can be implemented with
conventional or custom technology. The map data database 210 can
also be implemented with conventional or custom technology (e.g.,
for storing Navteq and/or Google Local map data). In one particular
embodiment, the digital mapping system 215 and map data database
210 are implemented as described in the previously incorporated
U.S. application Ser. No. 11/088,542. The remote client (not shown)
receives the requested graphical map data, and requests any map
tiles it doesn't already have displayed or cached (e.g., as
explained in the previously incorporated U.S. application Ser. No.
11/051,534). When the tiles are received from the server side of
the digital mapping system, the client draws and displays the map,
along with the driving directions. The client side can also be used
to draw (e.g., overlay) graphical driving directions, location
markers, etc., on the map image. However, the present invention is
not intended to be limited to systems that provide tile-based maps.
Rather, embodiments of the present invention can also be used with
other mapping systems, such as non-tile vector-based and
raster-based mapping systems. Still other embodiments of the
present invention can be used to provide database assisted OCR for
applications other than digital mapping systems.
[0031] For instance, consider a system for analyzing video that
includes sub-titles (including closed-captioning). In this
embodiment, the database for assisting the OCR process would
include the sub-titles for each frame of video (and other textual
information, such as "laughter" or "explosion"). Here, the OCR
engine would be constrained, for example, to look for sub-titled
dialog in the frames of video. Once the OCR process identifies the
target sub-titled text, the frame and/or time associated with that
text could be noted. Such an application might be useful, for
example, in the context of a smart video playback system. In more
detail, assume a sports fan that has video recorded the broadcast
of a game with closed-captioning enabled (to textually capture the
commentary). A playback system configured in accordance with an
embodiment of the present invention would include a database
storing the commentary transcript of the game, or an even more
refined data collection, such as a set of key event terms (e.g.,
"touch down," "home run," "goal," "score," "unsportsmanlike
conduct," "4th and goal," "punt," "double play,"
"interception," etc.). An OCR engine would then search for the
transcript text or key event terms (or other expected features) in
the frames of sub-titled video. Transcript pages or key events
could then be correlated to video frames for quick reference. The
smart video playback system could then be trained to only play
frames leading up to (and just after) the frames where a
significant event occurred.
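The playback example above can be sketched as follows, assuming that
OCR has already been run on each sub-titled frame and has produced
(frame index, caption text) pairs. The function names, the lead-in and
lead-out frame counts, and the sample captions are all hypothetical.

```python
def find_key_event_frames(frame_captions, key_events):
    """Return (frame_index, event) pairs for every key event term found
    in the OCR text of a frame's caption."""
    hits = []
    for frame_index, text in frame_captions:
        lowered = text.lower()
        for event in key_events:
            if event.lower() in lowered:
                hits.append((frame_index, event))
    return hits


def playback_ranges(hits, lead_in=150, lead_out=60):
    """Turn hits into merged (start_frame, end_frame) ranges covering
    the frames leading up to and just after each event."""
    ranges = []
    for frame_index, _event in sorted(hits):
        start = max(0, frame_index - lead_in)
        end = frame_index + lead_out
        if ranges and start <= ranges[-1][1]:
            # Overlaps the previous range, so extend it.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


captions = [
    (100, "and he takes the snap"),
    (400, "TOUCH DOWN! what a play"),
    (450, "touch down confirmed"),
    (2000, "interception at the ten"),
]
events = ["touch down", "interception", "home run"]
hits = find_key_event_frames(captions, events)
print(playback_ranges(hits))
```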
[0032] Street Scene Image OCR Module
[0033] FIG. 3 is a block diagram of the street scene image OCR
module 205 shown in FIG. 2, configured in accordance with one
embodiment of the present invention. The module 205 performs
database assisted OCR, the results of which can then be used to
update the map data database 210. The database 210 can then be
accessed by, for example, the digital mapping system 215 as
previously discussed. Note, however, that the street scene image
OCR module 205 can be used with other applications, as will be
understood in light of this disclosure. Generally, any image
analysis can benefit from using database assisted OCR.
[0034] As can be seen, the module 205 includes a storefront image
database 305, a business listings database 310, an image
registration module 315, an optical character recognition (OCR)
module 320, a target address range estimator module 325, and a
target GPS calculator module 330. Numerous variations on this
configuration for performing database assisted OCR will be apparent
in light of this disclosure, and the present invention is not
intended to be limited to any one such embodiment.
[0035] In operation at preprocessing time (off-line), the street
scene image OCR module 205 employs one or more databases of street
scene images (e.g., storefront image database 305), together with
one or more corresponding databases of business listings (e.g.,
business listings database 310) and/or map data databases (e.g.,
map data database 210). Each database can be structured to
facilitate efficient access of data, and include various types of
information. For example, each street-level image (e.g., digital
photograph taken using a GPS-enabled camera) stored in the
storefront image database 305 can be indexed by geocode, and
associated with corresponding GPS coordinates. The business
listings database 310 and map data database 210 can be structured,
for example, as conventionally done.
[0036] In an alternative embodiment, the illustrated databases are
integrated into a single database that is accessed to assist the
OCR process. Also, other databases or information sets could be
included, such as a conventional residential listings (e.g., white
pages directory) database or other such listing service databases.
Further note that the image databases may include multiple views
and/or zoom levels of each photographed area. For instance, one
storefront image can be taken from an angle as it would be seen
from one direction of the street (e.g., traveling north), while
another storefront image of the same address could be taken from an
angle as it would be seen from the other direction of the street
(e.g., traveling south). Thus, depending on the driving direction
route, either image could be used.
[0037] The storefront image database 305 can store different kinds
of images. In one example embodiment, there are two primary modes
in which the module 205 can operate: mode 1 and mode 2. Either mode
can be used, or a combination of the modes can be used.
[0038] Mode 1 uses panoramic images, where a single panoramic image
captures multiple street addresses (e.g., one city block, or a
string of contiguous address locations on a street), such as shown
in FIG. 1a. Such panoramic pictures can be taken, for example,
using a panoramic camera or a regular camera equipped with a
panoramic lens. In this mode, the GPS coordinates at every point
along the picture are known or can be accurately calculated. For
instance, given a panoramic picture corresponding to the block of
100 to 200 University Ave shown in FIG. 1a, where the GPS locations
at either end of the block are known (e.g., based on GPS receiver
data taken at the time of image capture), then the GPS coordinates
can be calculated at every point along the way using linear
interpolation. Thus, GPS coordinates can be determined for each
corresponding address location in the panoramic image with
reasonable accuracy (e.g., +/-1 to 2 meters).
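A minimal sketch of this mode 1 calculation follows. It assumes that
pixel columns of the panorama are evenly spaced along the street and
that the block is short enough for latitude and longitude to be
interpolated independently; neither assumption is stated in this
disclosure. The function name and coordinates are hypothetical.

```python
def gps_at_column(column, width, gps_start, gps_end):
    """Linearly interpolate a (latitude, longitude) pair for a pixel
    column of a panorama whose first and last columns have known GPS
    locations `gps_start` and `gps_end`."""
    if width < 2:
        raise ValueError("panorama must be at least two columns wide")
    if not 0 <= column < width:
        raise ValueError("column is outside the panorama")
    t = column / (width - 1)  # 0.0 at the first column, 1.0 at the last
    lat = gps_start[0] + t * (gps_end[0] - gps_start[0])
    lon = gps_start[1] + t * (gps_end[1] - gps_start[1])
    return (lat, lon)


# Hypothetical anchor coordinates for the two ends of the block.
start = (37.0, -122.0)
end = (37.002, -122.004)
print(gps_at_column(500, 1001, start, end))
```

Once OCR locates a street number or business name at a given column,
the same function yields the GPS location to associate with that
address.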
[0039] If so desired, exact GPS coordinates of every pixel or
vertical line in a panoramic image can be known. In more detail, a
differential GPS antenna on a moving vehicle can be employed, along
with wheel speed sensors, inertial measurement unit, and other
sensors, which together, enable a very accurate GPS coordinate to
be computed for every pixel in the panorama. However, such high
accuracy is not required. As long as GPS coordinates at some
regularly sampled points (such as street corners) are known,
sufficiently accurate GPS coordinates of locations in-between could
be interpolated, as previously discussed.
[0040] Mode 2 uses more traditional imagery, such as digital
pictures taken with regular digital cameras, or any camera that
generates images upon which OCR can be performed (e.g., disposable
cameras). In this mode, a single set of GPS coordinates is known
for each picture, corresponding to the exact location where each
picture was taken. Each picture corresponds to one particular
street address. Thus, given a series of such picture/GPS data
pairs, exact GPS coordinates are known for each corresponding
address location on that street. Alternatively, the end pictures of
a series can be associated with known GPS data, so that the GPS
data for the in-between addresses can be estimated with reasonable
accuracy.
[0041] The image registration module 315 is programmed or otherwise
configured to construct a mapping between images and business
listings. In one embodiment, this mapping is accomplished by a
combination of image segmentation using standard image-processing
techniques (e.g., edge detection, etc.) and interpolation of a
business's street address within the range of street addresses
known to be contained in the image. Image registration can be done
for the street scene images stored in the storefront image database
305, and any other images that can be used in the OCR image
analysis process (e.g., such as satellite images). The mapping can
be implemented, for example, with a pointer or address scheme that
effectively connects images from an image database to listings in
the business listings database. Alternatively, a single database
can be built as the image registration process is carried out,
where the records of the single database are indexed by geocode
and/or GPS coordinates, and each record includes image data and
related business listings information.
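For illustration, the pointer-style mapping between images and
business listings might be sketched as below. This sketch covers only
the address-range part of the mapping and omits image segmentation
entirely. The record types, field names, and sample listings are
hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StreetImage:
    image_id: str
    street: str
    address_lo: int  # lowest street number known to be in the image
    address_hi: int  # highest street number known to be in the image


@dataclass
class Listing:
    name: str
    street: str
    number: int


def register(images, listings):
    """Map each image_id to the listings whose street address falls
    within the range of addresses known to be contained in that image."""
    mapping = {image.image_id: [] for image in images}
    for image in images:
        for listing in listings:
            same_street = listing.street == image.street
            in_range = image.address_lo <= listing.number <= image.address_hi
            if same_street and in_range:
                mapping[image.image_id].append(listing)
    return mapping


images = [StreetImage("pano_1", "University Ave", 100, 200)]
listings = [
    Listing("Example Cafe", "University Ave", 164),
    Listing("Example Books", "University Ave", 180),
    Listing("Example Bank", "Hamilton Ave", 170),
]
for image_id, matched in register(images, listings).items():
    print(image_id, [listing.name for listing in matched])
```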
[0042] In the embodiment shown, image processing (e.g., OCR) is
performed by accessing the images by way of the image registration
module 315 (e.g., which can access the images stored in the
database 305 using a pointer or addressing scheme). Other
embodiments can access the images directly from their respective
databases. In any case, once a target address range is known,
images associated with that range can be identified and subjected
to image processing using the OCR module 320, to determine actual
GPS location data for each of the addresses detected in the
images.
[0043] The target address range provided to the OCR module 320 can
be determined using the target GPS calculator module 330 and the
target address range estimator module 325. In more detail, actual
GPS location data associated with a particular image is provided to
the target GPS calculator module 330 from, for example, the
storefront image database 305 or the image registration module 315.
This actual GPS location data can be, for instance, known GPS data
associated with two anchor points of an image (e.g., such as
discussed with reference to 100 and 200 University Ave of FIG. 1a).
Alternatively, the actual GPS location data can be, for instance,
known GPS location data associated with each of two single address
images that have a number of addresses between them.
[0044] In any such case, when two actual GPS coordinates are known,
the target GPS calculator module 330 can use that known GPS data to
calculate any in-between GPS data. For instance, a target GPS
location (GPS.sub.target) at the midpoint between two known actual
GPS locations (GPS.sub.1 and GPS.sub.2) can be calculated using
linear interpolation (e.g.,
GPS.sub.target=[|GPS.sub.1-GPS.sub.2|/2]+GPS.sub.1). This is
particularly useful for panoramic images that include multiple
address locations between two anchor points (as is the case
sometimes in mode 1). Likewise, this calculation is useful for a
contiguous series of single address images, where only the images
at the beginning and end of the series have GPS location data (as
is the case sometimes in mode 2). In short, if the actual target
GPS location data is not known, it can be interpolated or otherwise
calculated based on known GPS location data. If the target GPS
location data is already known, then no calculation by module 330
would be necessary.
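The midpoint calculation described above can be sketched as follows. This is a minimal illustration, not the implementation of module 330: it assumes that a GPS location is a (latitude, longitude) pair in decimal degrees, that the two points are close enough for straight-line interpolation on the raw coordinates to be acceptable, and it uses signed differences per component so the result always falls between the two known points. The function name and the anchor coordinates are hypothetical.

```python
from typing import Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def interpolate_gps(gps_1: Coordinate, gps_2: Coordinate,
                    fraction: float = 0.5) -> Coordinate:
    """Linearly interpolate between two known GPS locations.

    fraction=0.0 returns gps_1, fraction=1.0 returns gps_2, and the
    default of 0.5 returns the midpoint. Signed differences are used so
    the result lies between the two points whichever one is larger.
    """
    lat = gps_1[0] + (gps_2[0] - gps_1[0]) * fraction
    lon = gps_1[1] + (gps_2[1] - gps_1[1]) * fraction
    return (lat, lon)


# Hypothetical anchor coordinates for the two ends of a city block.
anchor_100 = (37.4450, -122.1630)
anchor_200 = (37.4460, -122.1610)
gps_target = interpolate_gps(anchor_100, anchor_200)
```

Passing a fraction other than 0.5 yields any other in-between location along the same segment.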
[0045] The target GPS location data (whether previously known or
calculated by module 330) is then provided to the target address
range estimator module 325, which uses the target GPS location data
to estimate a target address range. For instance, the target GPS
location data can be used to identify a corresponding address in a
table lookup operation (e.g., of database 305 or module 315). Once
the corresponding address is identified, the business listings
database 310 can be queried to return a number (e.g., 10) of
addresses before that corresponding address, and a number (e.g.,
10) of addresses after that corresponding address, so as to provide
a range of target addresses. Alternatively (or in addition), the
map data database 210 can be queried to return the set of addresses
before that corresponding address, and the set of addresses after
that corresponding address, so as to provide the range of target
addresses.
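One way to picture the "number of addresses before and after" query is a lookup in a sorted list of street numbers. The sketch below is only illustrative: it assumes the database can return the known addresses on one street as a sorted list of integers, and the function name and sample data are hypothetical.

```python
from bisect import bisect_left
from typing import List


def neighboring_addresses(known_addresses: List[int], corresponding: int,
                          count: int = 10) -> List[int]:
    """Return up to `count` addresses before and after `corresponding`.

    `known_addresses` must be sorted ascending (street numbers on one
    street). The corresponding address itself is included when present.
    """
    i = bisect_left(known_addresses, corresponding)
    start = max(0, i - count)
    # Step past the corresponding address, if present, before counting
    # the addresses that come after it.
    present = i < len(known_addresses) and known_addresses[i] == corresponding
    j = i + 1 if present else i
    end = min(len(known_addresses), j + count)
    return known_addresses[start:end]


# Hypothetical listing addresses on one street.
listings = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200]
target_range = neighboring_addresses(listings, 150, count=2)
```

With a count of 2, the range for address 150 covers 130 through 170 in this sample data.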
[0046] In another embodiment, the target address range estimator
module 325 can estimate the target address range using
interpolation. For example, if the target address is somewhere in
the middle of the city block shown in FIG. 1a, then that target
address can be interpolated or otherwise estimated (e.g.,
[100+200]/2=150). Then an address tolerance can be assigned (e.g.,
+/-20). Thus, in this example, the target address range would be
addresses 130 to 170 on University Ave.
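The arithmetic of this example can be written out as a short sketch. The function name and parameters are hypothetical; the anchor addresses (100 and 200) and the tolerance (+/-20) are the values used in the example above.

```python
from typing import Tuple


def estimate_address_range(anchor_low: int, anchor_high: int,
                           fraction: float = 0.5,
                           tolerance: int = 20) -> Tuple[int, int]:
    """Interpolate a street address between two anchors, then widen it.

    `fraction` is the relative position of the target between the two
    anchor addresses (0.5 is the middle of the block).
    """
    estimated = round(anchor_low + (anchor_high - anchor_low) * fraction)
    return (estimated - tolerance, estimated + tolerance)


low, high = estimate_address_range(100, 200)
```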
[0047] In any case, once the target address range is determined,
then the available databases can be queried to provide information
that can be used to constrain the OCR process. For instance, the
business listings database 310 can be queried to identify the store
names at the addresses included in the target address range.
Alternatively, the map data database 210 can be queried to identify
the store names at the addresses included in the target address
range. Alternatively, both databases 210 and 310 can be queried to
identify the store names, and the results can then be cross-checked
for correlation and to identify business names missing from one of
the databases. Each of these results can be used as an expected
feature for constraining the OCR process.
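The cross-check between the two query results might look like the following sketch. It assumes each query returns a mapping from street number to store name, and it compares names case-insensitively; the function name, the result keys, and the sample data are all hypothetical.

```python
from typing import Dict


def cross_check(map_names: Dict[int, str],
                listing_names: Dict[int, str]) -> Dict[str, dict]:
    """Compare store names per address as returned by two databases."""
    result = {"confirmed": {}, "conflicting": {},
              "only_in_map": {}, "only_in_listings": {}}
    for address in sorted(set(map_names) | set(listing_names)):
        a = map_names.get(address)
        b = listing_names.get(address)
        if a is None:
            result["only_in_listings"][address] = b
        elif b is None:
            result["only_in_map"][address] = a
        elif a.strip().lower() == b.strip().lower():
            result["confirmed"][address] = a
        else:
            result["conflicting"][address] = (a, b)
    return result


# Hypothetical query results for the target address range.
from_map_db = {140: "Pizza Hut", 150: "McDonalds", 160: "Flower Shop"}
from_listings_db = {140: "pizza hut ", 160: "Fry's Electronics",
                    165: "H&R Block"}
checked = cross_check(from_map_db, from_listings_db)
```

Every name in any of the four groups can still serve as an expected feature; the grouping only records how well the two sources agree.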
[0048] The OCR module 320 can now be applied to the storefront
image or images to read the storefront signage (if any) and other
readable imagery, using OCR algorithms and techniques as further
improved by the present invention. As previously explained, the OCR
process can be constrained based on the query results of the
databases 210 and/or 310. In one particular embodiment, the OCR
module 320 is constrained by the business names returned from the
database query or queries. For instance, the OCR module 320 can be
constrained to look for text such as "McDonalds," "Fry's
Electronics," "H&R Block," and "Pizza Hut." The OCR module 320
can also be constrained, for example, by identifying the type of
store or stores in the target address range, based on the
corresponding category listings in the business listings database
310 (e.g., "bars and restaurants" or "flowers" as done in a
conventional yellow pages directory). Recall that the image
registration module 315 has already mapped the images to
corresponding listings within the business listings database 310,
thereby facilitating this context identification for the OCR
process. In addition, text related to that business listings
category can be obtained, for example, by accessing web sites of
stores in that category, and adjusting the language model used for
OCR module 320, accordingly. This supplemental information from the
map database 210 and/or business listings database 310, and/or
websites enables the OCR module 320 to be further informed of the
context in which it is operating (in addition to knowing the store
names for which it is searching).
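As a rough illustration of constraining recognition by the returned business names, the sketch below snaps a noisy raw OCR string to the closest expected name and rejects strings that resemble none of them. This is an assumed post-processing step built on Python's standard difflib module, not the internal behavior of any particular OCR engine; the function name and the cutoff value are hypothetical.

```python
from difflib import get_close_matches
from typing import List, Optional


def constrain_to_lexicon(raw_text: str, expected_names: List[str],
                         cutoff: float = 0.75) -> Optional[str]:
    """Return the expected name closest to `raw_text`, or None.

    Comparison is case-insensitive; `cutoff` is the minimum similarity
    ratio (0..1) required to accept a match.
    """
    by_lower = {name.lower(): name for name in expected_names}
    matches = get_close_matches(raw_text.lower(), list(by_lower),
                                n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


expected = ["McDonalds", "Fry's Electronics", "H&R Block", "Pizza Hut"]
recognized = constrain_to_lexicon("McDona1ds", expected)
rejected = constrain_to_lexicon("XQZ", expected)
```

Here the misread digit "1" is tolerated because the whole string is still close to one expected name, while an unrelated string yields no match.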
[0049] In one particular configuration, the constrained OCR search
can be carried out using a template matching technique. In more
detail, for each expected feature to be used for a given image, at
least one identification template is generated. The identification
templates can be bitmaps, vectors, splines, or other
representations of the expected feature. For any given expected
feature, a number of different templates can be constructed in
various font types, or font sizes, or font styles. Further, where
the expected feature has a predetermined and consistently used font
style, shape, or other form (e.g., the specific font used for
"McDonald's"), then this font is used for the generation of the
identification templates. As the OCR module processes the image,
image features are compared with at least one of the identification
templates. The OCR module then uses the results of the comparison
to make the OCR determination.
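A toy version of the comparison between image features and identification templates is sketched below. It assumes the image and the templates are already binarized bitmaps (lists of rows of 0/1 pixels) and scores each placement by the fraction of agreeing pixels; production template matching would typically use a more robust similarity measure and handle scale explicitly. All names are hypothetical.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # rows of 0/1 pixels


def match_score(image: Bitmap, template: Bitmap, top: int, left: int) -> float:
    """Fraction of template pixels that agree with the image at (top, left)."""
    rows, cols = len(template), len(template[0])
    agree = sum(
        1
        for r in range(rows)
        for c in range(cols)
        if image[top + r][left + c] == template[r][c]
    )
    return agree / (rows * cols)


def best_match(image: Bitmap,
               templates: List[Bitmap]) -> Tuple[float, int, int, int]:
    """Slide every template over the image.

    Returns (score, top, left, template_index) for the best placement,
    keeping the first placement found when scores tie.
    """
    best = (0.0, -1, -1, -1)
    for index, template in enumerate(templates):
        t_rows, t_cols = len(template), len(template[0])
        for top in range(len(image) - t_rows + 1):
            for left in range(len(image[0]) - t_cols + 1):
                score = match_score(image, template, top, left)
                if score > best[0]:
                    best = (score, top, left, index)
    return best


# Two hypothetical renditions of an expected feature, and a small image.
template_bar = [[1, 1, 1], [0, 0, 0], [0, 0, 0]]
template_one = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
]
result = best_match(image, [template_bar, template_one])
```

A caller would accept the match only when the score exceeds some threshold chosen for the application.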
[0050] For instance, suppose that one of the candidate words being
searched for in an image is "165". In this case, a number of bitmap
renditions of "165" could be generated at various scales and using
various fonts to form the identification templates. Then, features
of the image could be compared to the renditions, to see if that
numerical pattern was in the image. Such approaches work
particularly well, for example, for identifying street numbers,
where the range of fonts is relatively limited. There are numerous
such template matching approaches that can be used, as will be
apparent in light of this disclosure. Along the same lines, another
way to constrain the OCR is by using a "digits only" lexicon or
language pack. This limits the search to street numbers only (or
other numeric patterns), but because of the constraint introduced,
greater accuracy is achieved. In one such embodiment, the image can
be binarized using, for example, the Niblack approach (e.g., Wayne
Niblack, An Introduction to Image Processing, Prentice-Hall,
Englewood Cliffs, NJ, 1986, pp. 115-116, which is herein
incorporated in its entirety by reference), and then running a
commercial OCR package (e.g., Abbyy FineReader) with a digits-only
lexicon. Other such image processing techniques can be used as
well.
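Niblack binarization computes a local threshold for each pixel as the neighborhood mean plus k times the neighborhood standard deviation. The sketch below is a plain, unoptimized rendering of that rule; the window size and k = -0.2 are commonly used values assumed here for illustration, not values taken from this disclosure, and the function name is hypothetical. The subsequent OCR pass with a digits-only lexicon is not shown.

```python
from statistics import mean, pstdev
from typing import List


def niblack_binarize(gray: List[List[int]], window: int = 3,
                     k: float = -0.2) -> List[List[int]]:
    """Binarize a grayscale image with a per-pixel Niblack threshold.

    threshold = local_mean + k * local_std over a window centered on the
    pixel (clipped at the image border). Pixels strictly above the
    threshold become 1 (light); all others become 0 (dark).
    """
    half = window // 2
    height, width = len(gray), len(gray[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            neighborhood = [
                gray[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            threshold = mean(neighborhood) + k * pstdev(neighborhood)
            row.append(1 if gray[y][x] > threshold else 0)
        out.append(row)
    return out


# A single dark pixel on a light background.
gray = [
    [200, 200, 200],
    [200, 50, 200],
    [200, 200, 200],
]
binary = niblack_binarize(gray)
```

Because the threshold adapts to each neighborhood, dark text can be separated from a background whose brightness varies across the storefront.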
[0051] In addition to OCR, the OCR module 320 can also be
programmed or otherwise configured to further analyze storefront
images. In one embodiment, this supplemental image analysis is
carried out at both a coarse level (e.g., width, height, color
histograms) and a more refined level (e.g., segmentation into
facade, doors, windows, roof, architectural elements such as
pillars and balconies; decorative elements such as awnings,
signage, neon lights, painted designs). Such analysis can be carried
out, for example, using standard image-processing techniques (e.g.,
computer vision). Standard feature extraction algorithms typically
extract high level information from images, such as shapes, colors,
etc. Pattern recognition algorithms can then be applied to classify
the extracted information so as to "recognize" objects in the
storefront images. The patterns and other features identified
during this supplemental image analysis may be helpful, for
instance, where a particular storefront indicated in an image does
not include any helpful text that can be identified by OCR
processing.
[0052] For instance, the supplemental image analysis can be used to
identify trade dress and logos. In more detail, the image
processing constraints provided from the databases 210 and/or 310
might include store names (e.g., McDonalds and Pizza Hut) as well
as known trade dress and logos corresponding to those store names
(e.g., Golden Arches for McDonalds and the unique-shaped red roof
for Pizza Hut). Thus, the OCR module 320 can be looking for
"McDonalds" and "Pizza Hut", while supplemental image analysis can
be looking for the Golden Arches and the unique red roof design.
Note that the supplemental image analysis can be programmed into
the OCR module 320, or can exist independently as a distinct
module. The various types of image analysis can be carried out in
parallel, if so desired.
[0053] Once the OCR module 320 identifies targeted features (e.g.,
business names and/or other targeted text, symbols, colors,
graphics, logos, etc.) in the image, the known GPS
coordinate(s) associated with that image can then be assigned to
the corresponding addresses determined by the target address range
estimator module 325. As such, each address captured in the image
will now have actual GPS coordinates (as opposed to interpolated or
otherwise estimated). This actual GPS location data can then be
integrated into the map data database 210 (or other databases) to
further improve its accuracy.
[0054] Thus, efficient and effective OCR on images of natural
scenes is enabled. This efficiency and effectiveness is derived
from constraints learned from database queries to one or more
databases or other information stores (e.g., websites). Numerous
variations and applications will be apparent in light of this
disclosure. One such application is to use database assisted OCR
techniques described herein to implement or otherwise complement a
digital mapping system, such as the one described in the previously
incorporated U.S. application Ser. No. 11/181,386, thereby enabling
the service of highly accurate and visually-oriented driving
directions.
[0055] Methodology
[0056] FIG. 4 illustrates a method for assisting optical character
recognition of street scene images using a mapping system database,
in accordance with one embodiment of the present invention. The
method can be carried out, for example, using the street scene
image OCR module 205 of FIGS. 2 and 3, although other
implementations will be apparent in light of this disclosure.
[0057] For this example, assume a street scene image being analyzed
includes a number of address locations, including two address
locations having known GPS locations, with some address locations
therebetween having unknown GPS locations. The method begins with
determining 405 a target GPS location using anchor addresses and/or
other addresses having known GPS locations, as previously explained
(e.g., GPS.sub.target=[|GPS.sub.1-GPS.sub.2|/2]+GPS.sub.1). Note
that this equation can be repeated a number of times to find
multiple target GPS locations (e.g.,
GPS.sub.target1=[|GPS.sub.1-GPS.sub.2|/2]+GPS.sub.1;
GPS.sub.target2=[|GPS.sub.1-GPS.sub.target1|/2]+GPS.sub.1;
GPS.sub.target3=[|GPS.sub.1-GPS.sub.target2|/2]+GPS.sub.1). The
resolution of calculated target GPS locations will depend on the
particular street being analyzed. In one embodiment, a GPS location
is calculated for every 3 meters of the actual street.
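A fixed spacing such as one target location every 3 meters can also be produced by evenly spaced interpolation rather than repeated halving. The sketch below takes that alternative route under stated assumptions: GPS locations are (latitude, longitude) pairs in decimal degrees, the street segment between the two known points is straight, the distance between them is estimated with the haversine formula on a spherical Earth of radius 6,371 km, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in decimal degrees
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(p1: Coordinate, p2: Coordinate) -> float:
    """Great-circle distance in meters between two coordinates."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p1[0], p1[1], p2[0], p2[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def target_locations(gps_1: Coordinate, gps_2: Coordinate,
                     spacing_m: float = 3.0) -> List[Coordinate]:
    """Target locations every `spacing_m` meters strictly between two points."""
    distance = haversine_m(gps_1, gps_2)
    steps = int(distance // spacing_m)
    targets = []
    for n in range(1, steps + 1):
        fraction = (n * spacing_m) / distance
        if fraction >= 1.0:
            break  # do not emit the far anchor itself
        targets.append((gps_1[0] + (gps_2[0] - gps_1[0]) * fraction,
                        gps_1[1] + (gps_2[1] - gps_1[1]) * fraction))
    return targets


# Two hypothetical known locations roughly 11 meters apart.
known_1 = (37.4450, -122.1630)
known_2 = (37.4451, -122.1630)
targets = target_locations(known_1, known_2)
```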
[0058] The method continues with estimating 410 an address of the
target GPS location (e.g., [100+200]/2=150, with respect to the
University Ave example of FIG. 1a). Address interpolation or other
suitable estimating techniques can be used to estimate the address.
The method continues with identifying 415 a target address range
based on the estimated street address of the target GPS location (e.g.,
150+/-20). This target address range can be used for searching in
the map database (or other appropriate database) to provide OCR
constraints.
[0059] The method continues with querying 420 the map database to
identify business names having a street address in the target address
range. The method continues with performing 425 OCR of the street
scene image to determine if keywords (and/or other descriptive
features, as previously explained) associated with the identified
business names are present.
[0060] In response to determining that a keyword is present, the
method continues with determining 430 an actual GPS location for
the street address of that particular business name (as indicated
by the detected keywords). The method continues with updating 435
the map database to include the correct GPS location for the street
address of that business. The method then continues with
determining 440 if there are more target GPS locations to analyze.
If so, then the method repeats for each particular target GPS
location. If there are no more target GPS locations to analyze,
then the method concludes.
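The flow of steps 405 through 440 can be summarized in a short control-flow sketch. It is an illustration only: the address estimator, the database query, and the OCR engine are passed in as caller-supplied functions, each target is assumed to be a pair of a GPS location and its associated image, the map database is modeled as a dictionary from street number to GPS location, and a keyword is treated as present when the business name appears in the recognized text. All names and sample values are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

Coordinate = Tuple[float, float]


def assist_ocr(
    targets: Iterable[Tuple[Coordinate, object]],
    estimate_address: Callable[[Coordinate], int],
    query_businesses: Callable[[int, int], Dict[int, str]],
    run_ocr: Callable[[object], str],
    map_db: Dict[int, Coordinate],
    tolerance: int = 20,
) -> Dict[int, Coordinate]:
    """Loop over target GPS locations and update the map database."""
    for gps, image in targets:  # 405 target location; 440 repeat
        estimated = estimate_address(gps)  # 410 estimate address
        low, high = estimated - tolerance, estimated + tolerance  # 415 range
        businesses = query_businesses(low, high)  # 420 {address: name}
        recognized = run_ocr(image).lower()  # 425 OCR of the image
        for address, name in businesses.items():
            if name.lower() in recognized:  # keyword present
                map_db[address] = gps  # 430 actual location; 435 update
    return map_db


# Stand-ins for the real components, for illustration only.
listings = {140: "Pizza Hut", 165: "H&R Block", 300: "Far Away"}
updated = assist_ocr(
    targets=[((37.4455, -122.1620), "image_a")],
    estimate_address=lambda gps: 150,
    query_businesses=lambda low, high: {
        a: n for a, n in listings.items() if low <= a <= high
    },
    run_ocr=lambda image: "PIZZA HUT open late",
    map_db={},
)
```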
[0061] Variations on this embodiment will be apparent in light of
this disclosure. For instance, another embodiment is a method for
assisting OCR of an image using a database of expected keywords
(and/or other expected data). Here, the method includes querying
the database (any storage facility) to identify at least one
keyword corresponding to text expected to be in an image, and
performing OCR of the image to determine if the keyword is present
in the image. As an alternative to OCR (or in addition to OCR),
image analysis may be performed to identify if expected non-textual
features retrieved from the database are included in the image.
[0062] The foregoing description of the embodiments of the
invention has been presented for the purposes of illustration and
description. It is not intended to be exhaustive or to limit the
invention to the precise form disclosed. Many modifications and
variations are possible in light of this disclosure. It is intended
that the scope of the invention be limited not by this detailed
description, but rather by the claims appended hereto.
* * * * *