U.S. patent number 8,731,233 [Application Number 10/348,225] was granted by the patent office on 2014-05-20 for system of automated document processing.
This patent grant is currently assigned to ABBYY Development LLC. The grantee listed for this patent is Konstantin Anisimovich, Andrey Lubenets, Konstantin Zuev. Invention is credited to Konstantin Anisimovich, Andrey Lubenets, Konstantin Zuev.
United States Patent | 8,731,233
Anisimovich, et al. | May 20, 2014
Please see images for: (Certificate of Correction)
System of automated document processing
Abstract
A system is proposed for automated document processing,
comprising a document, consisting of two sections--a main section,
containing data in printed character form, and a supplementary
section in a machine-readable form; a document forming means; a
document inputting means; a character recognition means; a main and
supplementary data comparison means. Said system uses the
supplementary section data to confirm the main section data. The
supplementary section data can fully or partly duplicate the main
section data, supplement it and also comprise other additional
data. The supplementary machine-readable section can be realized in
a form of coded consecutive characters, printed graphic image
(bar-code), magnetic, optical, microprocessor or other kind of data
storage means. For enhancing security of documents all or a part of
data can be coded prior to introduction into the supplementary
section.
Inventors: Anisimovich; Konstantin (Moscow, RU), Zuev; Konstantin
(Moscow, RU), Lubenets; Andrey (Moscow, RU)
Applicant:
Name | City | State | Country | Type
Anisimovich; Konstantin | Moscow | N/A | RU |
Zuev; Konstantin | Moscow | N/A | RU |
Lubenets; Andrey | Moscow | N/A | RU |
Assignee: ABBYY Development LLC (Moscow, RU)
Family ID: 32501896
Appl. No.: 10/348,225
Filed: January 22, 2003
Prior Publication Data
Document Identifier | Publication Date
US 20040117738 A1 | Jun 17, 2004
Foreign Application Priority Data
Dec 17, 2002 [RU] | 2002133899
Current U.S. Class: 382/100; 382/101; 382/139; 382/181
Current CPC Class: G06V 10/22 (20220101); G06V 30/1444 (20220101);
G06V 30/10 (20220101)
Current International Class: G06K 9/00 (20060101)
Field of Search: 382/181,101,139,100
Primary Examiner: Carter; Aaron W
Attorney, Agent or Firm: Weiland; LeighAnn Krishnan;
Aditya
Claims
The invention claimed is:
1. A system of automated document processing, the system
comprising: a document, containing at least a main section and a
supplementary section, said main section containing data in
character form, said supplementary section containing data in
machine-readable form, wherein the supplementary section data is
written onto a special data storage media, and wherein said special
data storage media is attached to said supplementary section of the
document; a document forming device, including a device for
printing the main section data, and a special device for data
transformation and outputting to the supplementary section of the
document; at least one device for inputting data from the main and
supplementary sections; a character recognition device, capable of
recognizing the data in character form; and a comparison device
capable of comparing a whole or predefined portion of the
recognized character data from the main section of the document
with a whole or a predefined portion of the data from the
supplementary section; wherein said comparing depends on a type of
said document.
2. The system as recited in claim 1, wherein the supplementary
section data comprises a copy of the whole or a significant part of
the main section document data.
3. The system as recited in claim 1, wherein the supplementary
section comprises a complementary portion of document data that is
absent in the main section.
4. The system as recited in claim 1, wherein the supplementary
section comprises other supplementary data.
5. The system as recited in claim 1, wherein a decision about a
type of subsequent document processing is made in accordance with
the result of a comparison of the main and the supplementary
sections data by the comparison device.
6. The system as recited in claim 1, wherein a decision about the
accuracy of the main section data is made in accordance with the
result of the comparison of the main and the supplementary sections
data.
7. The system as recited in claim 6, wherein the decision on the
data accuracy is made automatically.
8. The system as recited in claim 1, wherein the main section data
is considered as more accurate than the supplementary section
data.
9. The system as recited in claim 1, wherein the supplementary
section data is considered as more accurate than the main section
data.
10. The system as recited in claim 1, wherein the decision on the
data accuracy is made only in the case of a full coincidence of
predetermined portions of main and supplementary sections.
11. The system as recited in claim 1, wherein the supplementary
section data is placed onto the document via printing.
12. The system as recited in claim 1, wherein the special data
storage media is a magnetic type storage media.
13. The system as recited in claim 1, wherein the special data
storage media is an optical type storage media.
14. The system as recited in claim 1, wherein the special data
storage media is a microprocessor type storage media.
15. The system as recited in claim 1, wherein the supplementary
section data is placed on the document in the form of aggregate of
points or strokes.
16. The system as recited in claim 1, wherein the supplementary
section data is placed on the document in bar-code form.
17. The system as recited in claim 1, wherein the supplementary
section data is placed on the document in the form of a
one-dimensional bar-code.
18. The system as recited in claim 1, wherein the supplementary
section data is placed on the document in the form of a
two-dimensional bar-code.
19. The system as recited in claim 1, wherein the supplementary
section data is placed on the document in special ink.
20. The system as recited in claim 1, wherein the supplementary
section data is placed on the document as coded in the form of a
character string.
21. The system as recited in claim 20, wherein the supplementary
section data is placed on the document as coded in the form of a
numerical string.
22. The system as recited in claim 1, wherein the supplementary
section data is subjected to extra coding prior to outputting onto
the document.
23. A method of automated document processing comprising: printing
a main section of a document, wherein the main section includes
data in character form; printing a supplementary section of the
document, wherein the supplementary section includes data in
machine-readable form, wherein the machine-readable form includes
an aggregate of at least one of points and strokes, wherein the
supplementary section data is written onto a special data storage
media, and wherein said special data storage media is attached to
said supplementary section of the document; performing optical
character recognition (OCR) on the main section of the document;
comparing the OCR'd data in the main section of the document to the
machine-readable data in the supplementary section of the document,
wherein said comparing depends on a type of said document; in
response to comparing the OCR'd data in the main section to the
machine-readable data in the supplementary section, determining
that the OCR'd data in the main section is accurate; and in
response to determining that the OCR'd data in the main section is
accurate, storing the document.
24. A method according to claim 23 wherein determining that the
OCR'd data in the main section is accurate is performed
automatically.
25. An apparatus for automated document processing comprising: a
text printer that is configured to print a main section of a
document in character form; a data transformer that is configured
to transform into machine-readable form at least one of data points
and data strokes and to print the transformed machine-readable data
in a supplementary section of the document, wherein the
supplementary section data is written onto a special data storage
media, and wherein said special data storage media is attached to
said supplementary section of the document; a data receiver that is
configured to receive the main section of the document and the
supplementary section of the document; an optical character
recognizer that is configured to recognize characters in the main
section of the document; a comparator that is configured to compare
the OCR'd data in the main section of the document to the
machine-readable data in the supplementary section of the document,
wherein said comparison depends on a type of said document; a
decision maker that, in response to comparing the OCR'd data in the
main section to the machine-readable data in the supplementary
section, is configured to decide that the OCR'd data in the main
section is accurate; and a storage that is configured to store the
document in response to the decision that the OCR'd data in the
main section is accurate.
26. An apparatus according to claim 25 wherein the decision maker
is further configured to decide manually that the OCR'd data in the
main section is accurate.
27. An apparatus according to claim 25, wherein the decision maker
is further configured to decide in an automated manner that the
OCR'd data in the main section is accurate.
28. The system of claim 1, wherein said document type determines
whether the main section or the supplementary section is considered
as correct when there is a discrepancy between the two
sections.
29. The method of claim 23, wherein said document type determines
whether the main section or the supplementary section is considered
as correct when there is a discrepancy between the two sections.
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to the automation of
document formation, input, control and processing, and more
particularly to the recognition of printed and handwritten
characters from a bit-mapped image file.
2. Prior Art
According to traditional methods, document processing comprises
forming a document in printed form on paper media and then
inputting it, commonly either manually or by scanning and
recognition, at the point of accepting, processing, registering and
storing. Neither method guarantees the absence of errors. During
manual input, most errors are due to the limitations of a human
operator. Automated input via scanning and recognition causes
errors due to the probabilistic basis of recognition methods.
Automated document input and recognition is used in bank automated
systems, specifically for payment document input, where the point
of accepting documents is equipped with an optical scanner
connected to a computer on which the recognition process is
performed. The system scans the payment document and then
recognizes its text, i.e. it uses probabilistic methods that can
cause errors. In such a system the verification is performed by an
operator, which decreases system productivity in comparison with
fully automated control.
Known systems use various kinds of supplementary machine-readable
data storage means to achieve various technical results.
Traditionally a bar-code is used in connection with documents or
goods to assign them a unique machine-readable identification
number for automated registration or recordation purposes. Examples
include the following US patents: U.S. Pat. No. 5,640,647 issued
Jun. 17, 1997, U.S. Pat. No. 6,276,535 issued Aug. 21, 2001, U.S.
Pat. No. 6,085,975 issued Jul. 11, 2000, U.S. Pat. No. 5,844,221
issued Dec. 1, 1998, U.S. Pat. No. 5,804,806 issued Sep. 8, 1998,
U.S. Pat. No. 5,682,819 issued Nov. 4, 1997, and U.S. Pat. No.
5,493,107 issued Feb. 20, 1996.
The main inadequacy of traditional systems is that bar-code use is
limited mainly to identification number storage. For example, the
system of mail item registration and service according to U.S. Pat.
No. 6,101,487 issued Aug. 8, 2000, proposes coding postal
requisites and inserting them into a bar-code marked on a mail
item, in order to automate passing the item through the process to
the addressee. The inadequacy of such a system lies in the
different sources from which a human operator and a computer get
the address data, and in the inability of an operator to visually
check the bar-code data.
One more known system, proposed in U.S. Pat. No. 6,110,044 issued
Aug. 29, 2000, enhances the security of gaming tickets (tickets in
the gaming business) by embodying a machine-readable indicium
(preferably a bar-code) into a payout ticket from a gaming machine.
This system does not provide for mass ticket processing with
automated data comparison between the text and bar-code sections of
each ticket. The data verification is performed visually for each
winning ticket and is not a quick process. The main disadvantage of
the method is its unsuitability for automated verification of text
data against bar-code data.
Another known method deals with a bar-code used for document
registration purposes in an automated specialized database (U.S.
Pat. No. 6,356,923 issued Mar. 12, 2002). According to it, a
registration card is marked with a bar-code containing accounting
data and a table of contents of the document in coded form. There
is no room for said table of contents on the card in text form.
This system also does not support automated verification or
confirmation of the text data via bar-code data.
In the automated system of payment document formation and control
proposed in patent RU #2190252, Sep. 27, 2002, a bar-code, either
one- or two-dimensional, is used for providing automatic document
input. All significant data of a payment document of standard form
is written to a bar-code printed in the free space of the document.
The system is provided with a special device for bar-code data
input. Payment document data read from the bar-code is directed to
further bank processing and storage. Mutual data confirmation
between the bar-code and the text is not provided.
The system does not sufficiently prevent falsification, since the
text portion of a document and its bar-code portion can contain
different transaction details, and the difference cannot be
verified visually. The differences may concern the payment sum,
beneficiary details, etc. Protection against falsification is
especially important for payment documents, which are the main
subject of the said patent.
To increase the protection of text and/or bar-code data against
falsification, the said system would need to include supplemental
visual verification of the text data against the bar-code data,
which would require involving a human operator in processing and
thus considerably decrease system productivity.
Thus, all known methods are highly limited in their ability to
automate data input and confirmation, and they cannot be used for
achieving the declared technical result.
SUMMARY OF THE INVENTION
The technical result of the proposed invention is the acceleration
of document processing, the reduction of data input errors, and the
confirmation of document data authenticity.
The said technical result is achieved by dividing the document into
two sections: the main section, containing data in text form, and
the supplementary section, containing a copy of all or a
significant portion of the document information and any additional
data, adapted for automated input by special computer-compatible
devices. Such additional data is inconvenient or even impossible
for human visual perception. The addition of the said
machine-readable data section eliminates input errors, provides a
data protection function, and prevents manual data modification. In
the present invention the supplementary data is used either for
automated verification of the recognition results of the main
section text or for confirmation of the data contents.
The system of the present invention comprises:
a) a document, comprising at least two sections: a main section,
containing document data in text form, suitable for human visual
perception, and a supplementary section, containing data in
machine-readable form;
b) document forming means, providing printing of the text portion
of the document, transformation of data to machine-readable form,
and writing of it onto the supplementary section thereof;
c) document data input means, suitable for input of either
character data (commonly an optical scanning device) or
machine-readable data (readers for machine-readable data may
differ, depending on the machine-readable media type);
d) character recognition means, commonly specialized software for
text recognition from a bit-mapped image file obtained from an
optical scanner or the like; in the preferred embodiment,
specialized software such as "ABBYY FineReader" or "ABBYY
FormReader" of the latest versions is successfully used, depending
on the document type ("ABBYY FineReader" Ver. 6.0. User's Guide.
Moscow: 2002. "ABBYY FormReader" Ver. 4.1. User's Guide. Moscow:
2001); and
e) means for the main and supplementary data comparison.
A comparison of all or a part of the document data is provided. The
size of the compared portions of data is set beforehand. Depending
upon the document type, the information of either the main section
or the supplementary one may be considered correct.
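The comparison of predefined portions can be illustrated with a
minimal sketch. It is not the patented implementation: the field
names, the dictionary representation of each section, and the
normalize() helper are assumptions introduced only for
illustration.

```python
from typing import Dict, Iterable, List


def normalize(value: str) -> str:
    """Collapse whitespace and case (an assumed normalization, not from the patent)."""
    return " ".join(value.split()).upper()


def compare_sections(main: Dict[str, str],
                     supplementary: Dict[str, str],
                     fields: Iterable[str]) -> List[str]:
    """Return the names of the predefined fields whose values differ."""
    mismatches = []
    for name in fields:
        if normalize(main.get(name, "")) != normalize(supplementary.get(name, "")):
            mismatches.append(name)
    return mismatches


# Hypothetical recognition result of the main section; the amount
# contains the letter O where the digit 0 was printed.
ocr_main = {"payee": "Acme  Ltd", "amount": "1O0.00", "memo": "invoice 7"}
# Hypothetical data read from the supplementary section.
barcode = {"payee": "ACME LTD", "amount": "100.00"}

print(compare_sections(ocr_main, barcode, ["payee", "amount"]))  # prints ['amount']
```

Only the fields listed beforehand are compared, so the "memo"
field, which is absent from the supplementary section in this
example, does not cause a mismatch.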
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1a shows a document provided with a one-dimensional bar-code.
FIG. 1b shows a document provided with a two-dimensional bar-code.
FIG. 1c shows a document provided with data coded by consecutive
characters.
FIG. 1d shows a document provided with magnetic data media.
FIG. 1e shows a document provided with optical data media.
FIG. 1f shows a document provided with magneto-optical data
media.
FIG. 1g shows a document provided with electro-mechanical data
media.
FIG. 2 shows the flow diagram of the system.
FIGS. 1a, 1b, 1c, 1d, 1e, 1f, 1g show the main (1) and the
supplementary (2) sections of a document.
DETAILED DESCRIPTION OF THE INVENTION
The main distinction of the system proposed by the present
invention is that it uses data from a document main section and a
supplementary section for mutual comparison and confirmation of
data accuracy.
The document formation means are not connected physically with the
rest of the system.
The supplementary section may contain at least a copy of the whole
or a part of the document data. The supplementary section data can
also supplement the main section data, or contain other additional
information.
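As an illustration of a supplementary section that duplicates part
of the main section data and also carries additional data, the
following sketch serializes the data into a single character
string. The field names and the use of URL-style encoding are
assumptions made for this example; the patent does not prescribe a
particular serialization.

```python
from typing import Dict, Iterable
from urllib.parse import parse_qsl, urlencode


def build_supplementary(main_fields: Dict[str, str],
                        duplicated: Iterable[str],
                        extra: Dict[str, str]) -> str:
    """Copy the chosen main-section fields, add extra data, and encode as one string."""
    payload = {name: main_fields[name] for name in duplicated}
    payload.update(extra)
    return urlencode(payload)


def read_supplementary(text: str) -> Dict[str, str]:
    """Decode the character string back into fields (blank values are dropped)."""
    return dict(parse_qsl(text))


# Hypothetical main-section data and additional data.
main_fields = {"payee": "Acme Ltd", "amount": "100.00", "memo": "invoice 7"}
encoded = build_supplementary(main_fields, ["payee", "amount"], {"batch": "B-17"})
print(encoded)
print(read_supplementary(encoded))
```

The resulting string could then be printed as consecutive
characters or placed into a bar-code by whatever output device the
system uses.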
The supplementary section of the document may be realized either as
printed on the document or embodied into it (or attached to
it).
The supplementary section of the document may be placed on an empty
space either on the face or opposite side thereof.
Various kinds of stroke or graphic images can be printed on the
document, and particularly the standard and/or non-standard
barcodes, points and/or spot assemblies, character successions, and
their combinations.
The embodied or attached means can be realized as machine-readable
media of various kinds: magnetic, optical, micro-electronic,
micro-processor or other, provided that its dimensions allow it to
be embedded into an empty band of the document and that its data
can be accessed during the technological process of data
processing.
The decision-making rule may vary depending on the document type.
The data of either the main or the supplementary section may be
assumed to be correct. A conclusion can be made even when the two
sections do not coincide, without giving preference to either of
them.
In the case of a data discrepancy between the main and
supplementary sections, the final decision about the data
correctness and content may be made with the help of a human
operator or by special automated means.
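A document-type-dependent decision rule of this kind might be
sketched as follows. The document type names and the POLICY table
are hypothetical; the sketch only shows one way to express "which
section is trusted on a discrepancy" as data.

```python
from enum import Enum
from typing import Optional


class Trusted(Enum):
    MAIN = "main"
    SUPPLEMENTARY = "supplementary"
    NEITHER = "neither"


# Hypothetical policy table: which section wins for each document type.
POLICY = {
    "payment_order": Trusted.SUPPLEMENTARY,
    "handwritten_form": Trusted.MAIN,
}


def resolve(doc_type: str, main_value: str, supp_value: str) -> Optional[str]:
    """Return the value taken as correct, or None if the case must be escalated."""
    if main_value == supp_value:
        return main_value
    trusted = POLICY.get(doc_type, Trusted.NEITHER)
    if trusted is Trusted.MAIN:
        return main_value
    if trusted is Trusted.SUPPLEMENTARY:
        return supp_value
    # No preference for either section: leave the decision to an operator
    # or to other automated means.
    return None


print(resolve("payment_order", "1O0.00", "100.00"))  # prints 100.00
print(resolve("unknown_type", "1O0.00", "100.00"))   # prints None
```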
For enhancing security of a document, all or a part of data can be
additionally coded prior to introduction into a supplementary
section.
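The patent does not specify how the data is additionally coded. As
one possible illustration only, the sketch below substitutes a
keyed hash (HMAC-SHA256) appended to the payload, which makes
modification of the supplementary data detectable but does not
hide its content. The seal() and unseal() names, the "|"
separator, and the key handling are all assumptions.

```python
import hashlib
import hmac
from typing import Optional


def seal(payload: str, key: bytes) -> str:
    """Append an HMAC-SHA256 tag to the payload (assumed scheme, not from the patent)."""
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{payload}|{tag}"


def unseal(sealed: str, key: bytes) -> Optional[str]:
    """Return the payload if its tag verifies, otherwise None."""
    payload, _, tag = sealed.rpartition("|")
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Compare as bytes in constant time.
    if hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return payload
    return None


key = b"demo-key-for-illustration-only"
sealed = seal("payee=Acme+Ltd&amount=100.00", key)
print(unseal(sealed, key))                            # the original payload
print(unseal(sealed.replace("100.00", "900.00"), key))  # prints None
```

If the content itself must be concealed, an encryption scheme would
be needed instead of, or in addition to, the tag shown here.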
Some kinds of documents adapted to work in the system proposed by
the present invention are shown in FIGS. 1a, 1b, 1c, 1d, 1e, 1f,
and 1g.
A general overview of the invention is illustrated in FIG. 2.
By means of a document forming device (1) a new document (2) is
created, containing two sections: the main section, with all data
of the document printed on it in the usual printed character form
suitable for human visual perception, and the supplementary section
with data in machine-readable form. If a special data media,
differing from a printed image, is used in the supplementary
section, a special input device is necessary.
A document forming device need not be connected physically with
the rest of the system.
The document is directed to the system input device (3), suitable
for optical scanning of the character data of the main section and
of the supplementary section data. If a special data media used in
the supplementary section differs from a printed image, a special
input device is necessary.
The main section data is then directed for character recognition
and marking out the significant portion thereof (4).
The whole or a predefined portion of the main section data is then
compared with the whole or a predefined portion of the
supplementary section data in the block of comparison (5).
If the data from the two sections coincide, the document is assumed
to be correct and is directed to further processing or storage
(7).
If the data from the two sections do not coincide, all data is
directed to additional processing (6). The said additional
processing may be performed with human operator intervention or in
a fully automatic manner. If the data is confirmed at this stage,
the document, assumed to be correct, is directed to further
processing or storage (7). Otherwise, the document is marked as
erroneous and rejected (8).
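The routing through blocks (5) to (8) of FIG. 2 can be summarized
in a short sketch. The function and type names, the dictionary
representation of each section, and the "stored" and "rejected"
labels are assumptions made for illustration; the additional
processing step is passed in as a callable so that it can stand for
either an operator or an automated check.

```python
from typing import Callable, Dict, List

Fields = Dict[str, str]
# Hypothetical signature for the additional processing step (6):
# it receives both sections and the mismatched field names and
# returns True if the data is confirmed.
Reviewer = Callable[[Fields, Fields, List[str]], bool]


def process_document(main_ocr: Fields,
                     supplementary: Fields,
                     compared: List[str],
                     additional_processing: Reviewer) -> str:
    """Route one document through blocks (5) to (8) of FIG. 2."""
    # Block (5): compare the predefined portions of both sections.
    mismatched = [name for name in compared
                  if main_ocr.get(name) != supplementary.get(name)]
    if not mismatched:
        return "stored"        # block (7)
    # Block (6): additional processing, manual or automatic.
    if additional_processing(main_ocr, supplementary, mismatched):
        return "stored"        # block (7)
    return "rejected"          # block (8)


def reject_all(main: Fields, supp: Fields, mismatched: List[str]) -> bool:
    """Stand-in reviewer that never confirms a mismatching document."""
    return False


good = {"amount": "100.00"}
bad = {"amount": "1O0.00"}
print(process_document(good, good, ["amount"], reject_all))  # prints stored
print(process_document(bad, good, ["amount"], reject_all))   # prints rejected
```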
* * * * *