U.S. patent application number 09/022666, for a word recognition device and method, was filed with the patent office on February 12, 1998 and published on 2001-09-06.
Invention is credited to CANEGALLO, ROBERTO; CHINOSI, MAURO; GOZZINI, GIOVANNI; KRAMER, ALAN; NAVONI, LORIS; ROLANDI, PIERLUIGI.
United States Patent Application 20010019629
Kind Code: A1
NAVONI, LORIS; et al.
September 6, 2001
WORD RECOGNITION DEVICE AND METHOD
Abstract
A word recognition device uses an associative memory to store a
plurality of coded words in such a way that a weight is associated
with each character of the alphabet of the stored words, wherein
equal weights correspond to equal characters. To perform the
recognition, a dictionary of words is first chosen; this is stored
in the associative memory according to a pre-determined code; a
string of characters which correspond to a word to be recognized is
received; a sequence of weights corresponding to the string of
characters received is supplied to the associative memory; the
distance between the word to be recognized and at least some of the
stored words is calculated in parallel as the sum of the differences
between the weights of each character of the word to be recognized
and the weights of each character of the stored words; the minimum
distance is identified; and the word stored in the associative
memory having the minimum distance is stored.
Inventors: NAVONI, LORIS (CERNUSCO SUL NAVIGLIO, IT); CANEGALLO, ROBERTO (TORTONA, IT); CHINOSI, MAURO (COLOGNO MONZESE, IT); GOZZINI, GIOVANNI (PALAZZOLO SULL'OGLIO, IT); KRAMER, ALAN (BERKELEY, CA); ROLANDI, PIERLUIGI (VOLPEDO, IT)
Correspondence Address: SEED INTELLECTUAL PROPERTY LAW GROUP PLLC, 701 FIFTH AVE, SUITE 6300, SEATTLE, WA 98104-7092, US
Family ID: 8230567
Appl. No.: 09/022666
Filed: February 12, 1998
Current U.S. Class: 382/229; 707/E17.039
Current CPC Class: Y10S 707/99936 (20130101); G06F 16/90344 (20190101); G11C 15/046 (20130101)
Class at Publication: 382/229
International Class: G06K 009/72
Foreign Application Data
Date | Code | Application Number
Feb 12, 1997 | EP | 97830052.3
Claims
1. A word recognition device, comprising an associative memory.
2. A device according to claim 1 wherein said associative memory
stores weights, each associated with a respective character of the
alphabet.
3. A device according to claim 2 wherein said associative memory
comprises a plurality of memory lines each formed by a plurality of
groups of memory locations each for the storage of a respective
word, adjacent groups of memory locations storing words of
different maximum length.
4. A device according to claim 3, comprising a switch matrix
suitable for selectively enabling one of said groups of locations
in said memory lines.
5. A word recognition method, comprising the step of storing a
plurality of words in an associative memory.
6. A method according to claim 5 wherein said step of storing
comprises the steps of: selecting a dictionary of words;
associating a weight with each character of the alphabet forming
said words, wherein equal weights correspond to equal characters;
storing said weights in said associative memory; the method further
comprising the steps of: receiving a string of characters
corresponding to a word to be recognized; supplying to said
associative memory a sequence of weights corresponding to the
string of characters received; calculating in parallel the distance
between said word to be recognized and at least some of said words
stored as the sum of the difference between the weights of each
character of said word to be recognized and the weights of each
character of said stored words; identifying a minimum distance from
among said distances; and storing the word stored in said
associative memory having said minimum distance.
7. A method comprising: selecting a dictionary of words;
associating an analog weight with each character of an alphabet
forming said words, wherein equal analog weights correspond to
identical characters; storing said analog weights in a first analog
associative memory; dividing said words into groups, with each
group corresponding to words having common lengths; transforming
each word of each group of words via said analog weights stored in
said first analog associative memory into a sequence of analog
weights, to provide groups of analog weight sequences, each said
group of analog weight sequences corresponding to one of said
groups of words having common lengths; separating a second analog
associative memory into portions; and storing said groups of analog
weight sequences in said second analog associative memory such that
each group of words corresponds to only one of said portions in
said second analog associative memory.
8. A method as claimed in claim 7 wherein said step of separating a
second analog associative memory into portions includes a step of
separating said second analog associative memory into
non-overlapping portions each comprising one or more columns of
said second analog associative memory.
9. A method as claimed in claim 7, further comprising steps of:
receiving a character string corresponding to a word; determining a
length of said character string; transforming said character string
into a series of analog signals via said analog weights stored in
said first analog associative memory; coupling said series of
analog signals to a portion of said second analog associative
memory storing words having lengths comparable to said length of
said character string; computing distances between each analog
weight sequence stored in said portion and said series of analog
signals; selecting those distances that are less than a
predetermined distance; and writing data corresponding to those
analog weight sequences providing distances less than said
predetermined distance or providing a distance of zero to a
memory.
10. A method as claimed in claim 9 wherein said step of writing
data corresponding to those analog weight sequences providing
distances less than said predetermined distance includes a step of
writing data describing locations in said second analog associative
memory corresponding to those analog weight sequences providing
distances less than said predetermined distance to a digital
memory.
11. A method as claimed in claim 9 wherein said step of writing
data corresponding to those analog weight sequences providing
distances less than said predetermined distance to a memory
includes a step of writing binary data corresponding to those
analog weight sequences providing distances less than said
predetermined distance to a digital memory.
12. A method as claimed in claim 9 wherein said step of computing
distances includes steps of: providing said series of analog
signals to a corresponding series of columns of said second analog
associative memory; summing, along each row within said series of
columns, output signals from analog memory cells across said series
of columns, to provide said distances; determining, via a
winner-take-all circuit, when any distances from said summing step
are zero; determining, via said winner-take-all circuit, those
distances less than a predetermined distance when none of said
distances from said summing step are zero; and writing data
corresponding to those analog weight sequences providing distances
less than said predetermined distance to a memory.
13. A method as claimed in claim 12 wherein said step of writing
data corresponding to those analog weight sequences providing
distances less than said predetermined distances includes a step of
writing binary data describing locations in said second analog
associative memory corresponding to those analog weight sequences
providing distances less than said predetermined distance to a
digital memory.
14. A method as claimed in claim 12 wherein said summing step
includes, for each sequence of weights corresponding to a word from
said dictionary, a step of summing differences between each weight
of said series of weights and each weight of said sequence of
weights corresponding to a word from said dictionary to provide a
distance for each sequence of weights corresponding to a word from
said dictionary.
15. A method as claimed in claim 14 wherein said summing step
includes a step of carrying out all of said steps of summing
differences for each sequence of weights corresponding to a word
from a dictionary simultaneously.
16. A method comprising steps of: receiving a character string
corresponding to a word; determining a length of said character
string; transforming said character string into a series of analog
signals via analog weights stored in a first analog associative
memory, wherein equal analog weights correspond to identical
characters; coupling said series of analog signals to a portion of
a second analog associative memory having a dictionary stored
therein, each word of said dictionary corresponding to an analog
weight sequence stored in said second analog associative memory,
said portion of said second analog associative memory storing words
having lengths comparable to said length of said character string;
computing distances between each analog weight sequence stored in
said portion and said series of analog signals; selecting those
distances that are less than a predetermined distance; and writing
data corresponding to those analog weight sequences providing
distances less than said predetermined distance to a memory.
17. A method as claimed in claim 16 wherein said step of computing
distances includes steps of: providing said series of analog
signals to a corresponding series of columns of said second analog
associative memory; summing, along each row within said series of
columns, output signals from analog memory cells across said series
of columns, to provide said distances; determining, via a
winner-take-all circuit, when any distances from said summing step
are zero; determining, via said winner-take-all circuit, those
distances less than a predetermined distance when none of said
distances from said summing step are zero; and writing data
corresponding to those analog weight sequences providing distances
less than said predetermined distance or a distance of zero to a
memory.
18. A method as claimed in claim 17 wherein said summing step
includes, for each sequence of weights corresponding to a word from
said dictionary, a step of summing differences between each weight
of said series of weights and each weight of said sequence of
weights corresponding to a word from said dictionary to provide a
distance for each sequence of weights corresponding to a word from
said dictionary.
19. A method as claimed in claim 18 wherein said summing step
includes a step of carrying out all of said steps of summing
differences for each sequence of weights corresponding to a word
from a dictionary simultaneously.
20. A method as claimed in claim 17 wherein said step of writing
data corresponding to those analog weight sequences providing
distances less than said predetermined distance to a memory
includes a step of writing binary data corresponding to those
analog weight sequences providing distances less than said
predetermined distance or a distance of zero to a digital memory.
Description
TECHNICAL FIELD
[0001] The invention relates to a word recognition device and
method.
BACKGROUND OF THE INVENTION
[0002] As is known, for reading text, particularly hand-written
text, various character recognition systems have been developed,
based on text segmentation, to separate the individual characters
or portions thereof one from another, and on processing of the
segments obtained for the identification of the characters. This
procedure outputs a series of characters including spaces and
punctuation marks.
[0003] Current systems are not, however, always capable of
outputting correct data because of the presence of noise, the
particular graphical characteristics of the text or the limited
capacities of the recognition system. Consequently, further
processing of the characters is necessary so as to guarantee the
correctness of the sequence of characters and the extraction of
meaningful words.
[0004] For these reasons, word recognition devices have been
proposed which compare the input word to be recognized with a
plurality of words belonging to a vocabulary, until a word in the
vocabulary which is identical to the word to be recognized is
identified or the word in the vocabulary that is nearest to that to
be recognized is identified. The comparison procedure, when carried
out sequentially on the words in the vocabulary, requires a
considerable amount of time.
SUMMARY OF THE INVENTION
[0005] An object of the invention is to produce a word recognition
device and method capable of processing the input characters so as
to output the word or words having the sequence of characters
closest to the input word in a simple and speedy manner.
[0006] In a first embodiment, the invention includes a method
having steps of selecting a dictionary of words and associating an
analog weight with each character of an alphabet forming the words.
The analog weights are such that equal analog weights correspond to
identical characters. The method also includes steps of storing the
analog weights in a first analog associative memory and dividing
the words into groups. Each group corresponds to words having
common lengths. The method additionally includes a step of
transforming each word of each group of words via the analog
weights stored in the first analog associative memory into a
sequence of analog weights, to provide groups of analog weight
sequences. Each group of analog weight sequences corresponds to one
of the groups of words having common lengths. The method further
includes steps of separating a second analog associative memory
into portions and storing the groups of analog weight sequences in
the second analog associative memory such that each group of words
corresponds to only one of the portions in the second analog
associative memory.
[0007] In a second preferred embodiment, the present invention
includes a method having steps of receiving a character string
corresponding to a word, determining a length of the character
string and transforming the character string into a series of
analog signals via analog weights stored in a first analog
associative memory. Equal analog weights correspond to identical
characters. The method also includes a step of coupling the series
of analog signals to a portion of a second analog associative
memory having a dictionary stored therein. Each word of the
dictionary corresponds to an analog weight sequence stored in the
second analog associative memory. The portion of the second analog
associative memory stores words having lengths comparable to the
length of the character string. The method additionally includes
steps of computing distances between each analog weight sequence
stored in the portion and the series of analog signals, selecting
those distances that are less than a predetermined distance and
writing data corresponding to those analog weight sequences
providing distances less than the predetermined distance to a
memory.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] For an understanding of the invention, a preferred
embodiment will now be described, purely by way of non-exhaustive
example, with reference to the accompanying drawings in which:
[0009] FIG. 1 is a general block diagram of a word recognition
device produced according to the invention;
[0010] FIG. 2 shows the architecture of the memory array;
[0011] FIG. 3 is a flow-chart relating to the recognition method
according to the invention;
[0012] FIGS. 4 and 5 show tables relating to the organization of
the memory array of FIG. 2;
[0013] FIG. 6 shows a table relating to the character recognition
errors;
[0014] FIG. 7 shows a graph obtained from the table of FIG. 6;
and
[0015] FIG. 8 shows an adjacency list obtained from the graph of
FIG. 7.
DETAILED DESCRIPTION OF THE INVENTION
[0016] In FIG. 1, the word recognition device 1 is located
downstream of an OCR or optical character recognition system (not
shown) which is provided according to any known technique, as
described, for example, in the article entitled "Off-line
Handwritten Word Recognition Using a Hidden Markov Model Type
Stochastic Network" by Mou-Yen Chen, Amlan Kundu and Jian Zhou,
IEEE Transactions on Pattern Analysis and Machine Intelligence,
Vol. 16, No. 5, May 1994.
[0017] The device 1 comprises a control unit 2, which coordinates
the activities of the device 1, as described in this specification
below, and has an input 3 at which it receives, from the OCR
system, strings of characters on the basis of which the words are
to be recognized; a data memory 4, storing data necessary for the
control unit 2 and coupled thereto; a switch matrix 6, coupled to
the control unit 2; a reference voltage generator 7, coupled to the
switch matrix 6 by means of lines 8; a memory array 10,
coupled to the switch matrix 6; a selection block 11, coupled to
the outputs of the memory array 10 to identify the output of the
memory array 10 which has minimum value; a priority code generation
block 12, coupled to the output of the selection block; and a
memory element 13, coupled to the output of the priority code
generation block 12.
[0018] In detail, the control unit 2, which may be a microprocessor
or other software processing unit, for example, determines the
length of successive words, supplied by the character recognition
system, on the basis of the length of the strings of characters not
separated by spaces or punctuation marks and, on the basis of the
architecture of the memory and of the coding used for the
characters, it provides commands to the switch matrix 6. For this
purpose the data memory 4 supplies the control unit 2 with data
relating to the organization of the memory array 10 (i.e., data
relating to the columns 25 of the array 10 in which words of a
given length are stored, as explained below) and data relating to
the coding used for the individual characters, i.e., to the weight
(voltage level) associated with each character of the alphabet as
well as which of the lines 8 supplies that weight. Consequently,
the switch matrix 6 couples the lines 8 associated with the weights
corresponding to the word to be recognized to the predetermined
lines 25 of the memory array 10. The voltage values corresponding
to the weights of the different letters, according to a
pre-determined coding, are generated by the reference voltage
generator 7 which may, for example, be provided as described in
European patent application 96830498.0 filed on Sep. 30, 1996 in
the name of this applicant. The switch matrix 6 may be of any
acceptable type of the many known in the prior art, such as that
described in European patent application 96830497.2 filed on Sep.
30, 1996 in the name of this applicant.
[0019] The hardware to implement the memory array 10 comprises a
memory of the associative type, that is, a content-addressable memory of a
type well known in the art. When this type of memory receives a
datum formed by a sequence of elements at its input, it outputs a
datum correlated to the address of the line (generally row) in
which the datum closest to the input datum is stored. Preferably,
the memory 10 is of the auto-associative type, that is, it directly
outputs the stored datum closest to the input datum. For example,
the hardware to perform the memory function for the memory array 10
may be of any acceptable type of the many known in the prior art,
such as that described in the article by A. Kramer, M. Sabatini, R.
Canegallo, M. Chinosi, P. L. Rolandi and P. Zabberoni entitled
"Flash-Based Programmable Nonlinear Capacitor for
Switched-Capacitor Implementations of Neural Networks" in IEDM
Tech. Dig., pp. 17.6.1-17.6.4, December 1994.
[0020] In detail, FIG. 2 shows for clarity one example of the hardware
for the memory array 10, which comprises M×N pairs of cells 15
(4000×64 pairs of cells, for example), located in M rows and N
columns. Each pair of cells 15 comprises a
first cell 16 and a second cell 17. The drain and source terminals
of all the cells 16, 17 disposed on the same row are coupled
together to the inverting input of an operational amplifier 20 in a
charge integration configuration, having a non-inverting input
coupled to earth and an output 21 coupled to the inverting input
via a capacitor 22. A reset switch 23 controlled by the control
unit 2 (in a manner not shown) is located in parallel to the
capacitor 22. The outputs 21 of the operational amplifiers 20
define the outputs of the memory array 10.
[0021] The gate terminals of the first cells 16 belonging to the
same column are coupled to the same input line 25 of the memory
whilst the gate terminals of the second cells 17 belonging to the
same column are coupled to a respective different input 25. With
this hardware configuration, as described in detail in the
above-mentioned article by Kramer et al., by storing a
pre-determined voltage value in each pair of cells and by supplying
complementary voltage values Vg and V'g at the inputs 25 of the two
cells 16, 17 of a pair 15, a voltage value is obtained at each
output 21 of the array 10. This voltage is proportional to the
Manhattan distance between the input vector and the vector stored
in each row.
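The behavior of this circuit can be mirrored numerically. The sketch below is a functional model only, not the analog implementation: it assumes, per the description above, that each pair of cells contributes the absolute difference between its stored weight and the input weight, and that each row's charge integrator sums those contributions; all names are illustrative.

```python
import numpy as np

def array_outputs(stored, inputs):
    """Functional model of the memory array of FIG. 2 (hypothetical sketch).

    stored: (M, N) matrix of weights programmed into the M x N cell pairs.
    inputs: (N,) vector of input weights driven onto the column lines 25
            as complementary voltage pairs.

    Each cell pair contributes |stored - input| to its row, and the
    charge-integrating amplifier 20 on each row sums the contributions,
    so every output 21 is proportional to the Manhattan distance between
    the input vector and that row's stored vector.
    """
    return np.abs(stored - inputs).sum(axis=1)

# Example: three stored rows of four weights each.
stored = np.array([[0, 1, 2, 3],
                   [3, 2, 1, 0],
                   [0, 1, 2, 4]])
print(array_outputs(stored, np.array([0, 1, 2, 3])))  # -> [0 8 1]
```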
[0022] The distance values present at the outputs 21 of the memory
array 10 are supplied to the selection block 11 for identification
of the rows having the shortest distance; the selection block 11 is of
a known type, described, for example, in "Winner-Take-All Networks
of O(n) Complexity" by J. Lazzaro, S. Ryckebusch, M. A. Mahowald and
C. A. Mead, in D. S. Touretzky (ed.), Advances in Neural Information
Processing Systems 1, San Mateo, CA: Morgan Kaufmann, pp.
703-711 (1988). The addresses of the rows at minimum distance (or
the stored contents of the rows) are then supplied to the priority
code generation block 12 which places them in a priority code,
starting from the row (or vector) at minimum distance and then to
the memory element 13 (an EEPROM, ROM, RAM, or other memory for
example) for them to be stored.
[0023] The word recognition device 1 of FIG. 1 operates according
to the following method, described with reference to the flowchart
of FIG. 3.
[0024] Initially a dictionary I is selected, that is, a base of
meaningful words in a certain language, block 30; this dictionary
must be suited to the physical limitations of the memory array 10.
A coding of the dictionary is then defined such as to show the
characteristics of the language in a readily computable way, block
31. As indicated, the coding takes place by associating an
appropriate voltage value (weight) to each character of the
alphabet, as exemplified below. Then, block 32, the dictionary is
inserted into the memory array 10 using the coding defined above,
preferably storing several words in each row of the array, as
described below.
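In software terms, blocks 30 to 32 amount to building a character-to-weight table and encoding every word of the dictionary with it. The following is a minimal sketch, assuming a placeholder alphabetical coding (the patent's actual coding, based on the morphological closeness of characters, is described further below); all names are illustrative.

```python
# Hypothetical coding step (blocks 30-32): one weight per character of
# the alphabet, with equal characters always mapping to equal weights.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
WEIGHTS = {ch: i for i, ch in enumerate(ALPHABET)}  # placeholder coding

def encode(word):
    """Translate a word into the sequence of weights stored in the array."""
    return [WEIGHTS[ch] for ch in word]

dictionary = ["CAT", "CART", "COAT"]
coded_dictionary = {w: encode(w) for w in dictionary}
# e.g. coded_dictionary["CAT"] == [2, 0, 19]
```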
[0025] Subsequently, block 33, the sequence of characters belonging
to a word to be recognized is input into the memory array 10, using
the same coding of the characters used to store the dictionary.
Specifically, on the basis of the coding table stored in the data
memory 4, the control unit 2 commands the switch matrix 6 so that
the matrix 6 supplies to the input lines 25 of the memory array 10
the corresponding pairs of voltage values which are complementary
to each other and generated by the reference voltage generator
7.
[0026] The memory array 10 then calculates the distance between the
word to be recognized and each of the words stored in the memory
array 10 or in the desired portion thereof; that is, it calculates the
sum of the distances between the weights associated with the characters
forming the word to be recognized and the weights associated with
the corresponding characters of the words stored in the individual
rows (or addressed portions of rows), block 34. In particular, if
we call the coding of a single element (character) of a stored word
$a_i$ and the coding of a corresponding element (character) of
the word to be recognized $b_i$, the memory array 10 calculates
the distance dist defined as:

$$\mathrm{dist} = \sum_{i=1}^{L} \theta(a_i, b_i)$$

[0027] in which L is the length of the word to be recognized and
$\theta$ represents the generic calculation function of the distance
(the Manhattan distance in the case of the memory array illustrated
above).
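A direct software rendering of this formula, as a sketch only: θ is taken to be the Manhattan metric used by the array described above, and all names are hypothetical. Note that the hardware evaluates every stored row simultaneously, whereas the dictionary comprehension below is only a sequential stand-in.

```python
def dist(a, b):
    """dist = sum over i of theta(a_i, b_i), with theta(x, y) = |x - y|."""
    return sum(abs(x - y) for x, y in zip(a, b))

def recognize(coded_dictionary, coded_input):
    """Blocks 34-35: compute the distance to every stored word of the same
    length, then rank the words by increasing distance (the role played in
    hardware by the blocks 11-13)."""
    distances = {w: dist(c, coded_input)
                 for w, c in coded_dictionary.items()
                 if len(c) == len(coded_input)}
    return sorted(distances, key=distances.get)

# Tiny example with a hypothetical coding (A=0, B=1, ...):
coded_dictionary = {"CAT": [2, 0, 19], "COT": [2, 14, 19]}
print(recognize(coded_dictionary, [2, 0, 19]))  # -> ['CAT', 'COT']
```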
[0028] On the basis of this distance, as described above, the
blocks 11-13 are capable of showing and storing addresses of the
rows of the array 10 relating to the words closest to the input
word or directly storing the words, block 35.
[0029] In this way, the search for stored words most similar to an
input word is carried out in parallel on the entire array or on a
predetermined portion thereof. The output words may then be
subjected to further processing for a better reconstruction of the
original text on the basis of other criteria such as the context,
more or less common use of the words identified (frequency),
etc.
[0030] To optimize the occupation of the memory array 10 in view of
the presence of words of variable length, it is also proposed to
organize the memory array 10 by dividing it into sub-groups (groups
of columns or of rows) which are selectively addressable by the
control unit 2 through the switch matrix 6, and then to carry out a
dedicated search considering only the words linked to the input
configuration, that is, having homologous dimensions.
[0031] In detail, consider the memory array 10 of dimension M×N and
the base I (dictionary) of storable configurations (words) of
different lengths, possibly coming from different types of data. The
base I is divided into a number s of classes, each containing
configurations having the same maximum length. Indicating by max(j)
the maximum length of the configurations contained in the class j,
plus an arbitrary number of additional elements (such as the
frequency of the configuration, i.e., of the word, expressed as a
codified number), the following inequality must be met:

max(1) + max(2) + ... + max(j-1) + max(j) ≤ N

[0032] for j ≤ s. This configuration excludes at most a limited
number of elements of the base I. It is then possible to organize the
memory array 10 in such a way that each line of the memory array 10
comprises a plurality (s) of groups of memory locations, with each
group of locations of a line being intended for the storage of a
word. Adjacent groups of memory locations on the same line store
words of different maximum length, whilst groups of memory locations
belonging to different lines but disposed on the same columns store
words belonging to the same class (having the same length).
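The packing rule of paragraphs [0031] and [0032] can be sketched as follows. This is an illustration only (the helper names and example classes are hypothetical): words are grouped into length classes, and the corresponding groups of columns fit side by side on a line as long as the sum of the class widths max(1) + ... + max(s) does not exceed N.

```python
def plan_column_groups(words, class_widths, n_columns, extra=0):
    """Split a dictionary into length classes and check that the groups
    of columns fit side by side on each memory line.

    class_widths: maximum word length of each class, max(1)..max(s).
    extra: additional elements stored with each word (e.g. a coded
           word frequency).
    """
    if sum(w + extra for w in class_widths) > n_columns:
        raise ValueError("classes do not fit: sum of max(j) exceeds N")
    groups = {w: [] for w in class_widths}
    dropped = []  # the few words longer than every class
    for word in words:
        fitting = [w for w in class_widths if len(word) <= w]
        # place each word in the narrowest class that can hold it
        (groups[min(fitting)] if fitting else dropped).append(word)
    return groups, dropped

groups, dropped = plan_column_groups(
    ["CAT", "HOUSE", "RECOGNITION"], class_widths=[4, 8, 12], n_columns=24)
# 4 + 8 + 12 = 24 <= N = 24, so the three column groups share each line.
```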
[0033] An example of organization of the memory array 10 in the
case in which the words are stored in rows is shown in the table in
FIG. 4. In this table, the columns of the array 10 are re-grouped
into groups of columns each associated with a different class of
the base I (and the number of columns of each group is equal to the
maximum length of the configurations belonging to the respective
class). The configurations (words) belonging to the same class are
stored in different rows of the respective group of columns.
[0034] Given this organization, by considering a dictionary I of
approx. 25,000 words of different length, taking into account that
the frequency of the words in a text decreases as the length of the
words increases and that words of length greater than 24 characters
represent 0.4% of the total, it is possible to sub-divide the
memory array 10 as illustrated in detail in the table shown in FIG.
5. The organization described above enables 90% occupation of the
memory to be obtained with only 0.4% of the words in the dictionary
that are not stored.
[0035] With this type of organization, word recognition takes place
by comparing the word supplied to the inputs 25 of the memory array
10 with the words stored in the corresponding group of columns, as
stored in the data memory 4.
[0036] The organization described above enables different types of
data to be loaded onto the same line, associating them with the
classes organized by columns and then selecting the calculation on
the basis of the data required. For example, as an alternative to
that shown in the table of FIG. 5, in which the memory array 10
stores only complete words, it is possible to store in the same
memory array (but in another portion thereof) the weights used for
the recognition of the individual characters and in another portion
the weights used for the recognition of the words, thereby using a
single memory device both for the recognition of characters (OCR)
and for the recognition of words.
[0037] By using a five-bit coding of the memory cells 16, 17, it is
possible to program up to 32 different levels. Noting that the
Latin alphabet comprises 26 letters, the coding of the words may
take place by associating a different voltage level of
predetermined value with each character of the alphabet forming a
word. The six remaining levels may be used for the coding of
characters which are special or used as word separators.
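As a quick check of the arithmetic: five bits give 2^5 = 32 programmable levels, 26 of which cover the Latin alphabet, leaving 32 - 26 = 6 levels for separators and special characters. A sketch of one such assignment (the particular special characters chosen here are illustrative):

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # 26 letters
SEPARATORS = " .,;-'"                      # 6 remaining levels (illustrative)

levels = {ch: i for i, ch in enumerate(ALPHABET + SEPARATORS)}
assert len(levels) == 32  # exactly the 2**5 levels a 5-bit cell can hold
```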
[0038] Advantageously, the assignment of the weight (voltage level)
to each character is carried out taking account of the
"morphological closeness" of the different characters. In
particular, this is carried out on the basis of the following
considerations.
[0039] Every character recognition device is capable of supplying
at least three types of responses: a correct response, when the
device succeeds in recognizing the character correctly; an
incorrect response when the device outputs a character which does
not correspond to the original one; or the rejection of the
character, when the device does not have sufficient elements for
the decision. By carrying out functional tests in an OCR device it
is possible to extract data about the discriminatory capacities of
the device and organize them in a table, a so-called confusion
table, which represents, for each input character, the number of
times that character has been recognized in the form of a
pre-determined output character.
[0040] An example of a confusion table obtained for a given
character recognition device is shown in FIG. 6. In the confusion
table of FIG. 6, "rejections" of characters have not been accepted;
the OCR device has been forced to supply an output character in any
case. In this way, if on the one hand there is an increase in
noise, on the other hand it is possible to show to a greater extent
the morphological closenesses of the characters. In the table,
relating to the recognition of upper-case characters, the number in
correspondence with the intersection between a given column and a
given row indicates the number of times the letter indicated in the
column in question has been recognized as the letter indicated in
that row. Consequently, the results relating to correct
recognitions are plotted on the diagonal and the remaining results
refer to incorrect recognitions. In practice, therefore, the
confusion table represents the degree of similarity of one
character to another.
[0041] An oriented and weighted graph has been extracted from the
confusion table, from which graph the noise, produced by occasional
incorrect character recognition, is excluded. This exclusion may be
carried out by regarding as zero the similarity of characters which
may have been confused with one another less frequently than a
pre-determined threshold (less than 5 times for example). An
adjacency graph is obtained in this way which, for the table of
FIG. 6, assumes the form shown in FIG. 7. In practice, the
adjacency graph graphically represents the proximity of groups of
characters having homologous characteristics. The longer the path
from one letter to another, the easier it is to recognize
differences between them. The stronger connections may be
identified by noting the weight of each connection and of the
double connections existing between different pairs of
characters.
[0042] An adjacency list, shown in FIG. 8, in which the characters
most easily confused with each other are placed one next to the
other, has been obtained from the adjacency graph. This list is
used to establish the coding of the characters in the memory array
10. In practice, the weight associated with each character is
chosen such that the difference between two adjacent characters is
represented solely by one weight unit, whilst very different
weights correspond to distant elements. For example, the adjacency
list of FIG. 8 may be coded by associating the weight (or coding) 0
with the letter D, 1 with the letter O, 2 with the letter Q, and so
on, up to 29 for the letter X.
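Putting paragraphs [0039] to [0042] together, the path from confusion table to coding can be sketched as follows. The counts, the threshold of 5 and the three-letter example are illustrative, not the patent's data; only the construction follows the text.

```python
import numpy as np

def adjacency_from_confusion(confusion, chars, threshold=5):
    """Discard rare confusions as noise and keep the strong off-diagonal
    links, as in the construction of the adjacency graph of FIG. 7."""
    links = {}
    for i, a in enumerate(chars):
        for j, b in enumerate(chars):
            if i != j and confusion[i, j] >= threshold:
                links.setdefault(a, set()).add(b)
    return links

def weights_from_adjacency_list(adjacency_list):
    """Assign consecutive weights along the adjacency list, so that easily
    confused characters differ by a single weight unit."""
    return {ch: w for w, ch in enumerate(adjacency_list)}

chars = ["D", "O", "Q"]
confusion = np.array([[90, 7, 1],    # rows: recognized as D, O, Q
                      [6, 85, 9],
                      [2, 8, 88]])
print(adjacency_from_confusion(confusion, chars))
# -> {'D': {'O'}, 'O': {'D', 'Q'}, 'Q': {'O'}}

coding = weights_from_adjacency_list(["D", "O", "Q"])
# coding == {'D': 0, 'O': 1, 'Q': 2}: confusing O for Q costs only one
# weight unit in the distance, while distant letters cost many.
```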
[0043] In this way, and as mentioned above, the memory array 10
supplies, at each of its outputs 21, different values equal to the
distance between the coding of the input word and the word stored
in the row corresponding to that output. In particular, if the
input word has a complete meaning and has been stored in the array
10, one of the outputs will have a value equal to 0, indicating
that the input word has been recognized. When, on the other hand,
none of the outputs has the value zero and one or more outputs has
a very low voltage level, there is distorted recognition of a
character due to the confusion of the original character with one
of the characters adjacent to it on the adjacency list of FIG.
8.
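This decision rule matches the one the claims attribute to the winner-take-all circuit: report the exact match when a zero distance exists, otherwise report every candidate below a predetermined distance. A minimal sketch, with illustrative names and threshold:

```python
def interpret(distances, threshold):
    """Paragraph [0043], sketched: a distance of zero means the input word
    was recognized exactly; otherwise the low-distance rows point at words
    that differ in one or more confusable (adjacent-weight) characters."""
    exact = [w for w, d in distances.items() if d == 0]
    if exact:
        return exact
    return [w for w, d in distances.items() if d <= threshold]

print(interpret({"CAT": 0, "COT": 14}, threshold=2))  # -> ['CAT']
print(interpret({"CAR": 1, "CAB": 6}, threshold=2))   # -> ['CAR']
```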
[0044] Obviously, the coding obtained in this way is strictly
linked to the performance of the OCR device which carries out the
recognition and for different OCR devices it is necessary to repeat
the procedure to obtain the respective adjacency list and hence the
respective coding.
[0045] The advantages that can be obtained with the device
described are as follows. The use of an associative memory enables
the search for words most similar to an input word to be
parallelized, enabling a high response rate to be obtained. The
optimization of the memory occupation enables a large vocabulary to
be stored and hence the recognition capacities of the device to be
improved. The possibility of storing data of different types makes
the device highly flexible with respect to the requirements;
moreover, the type of coding described, which makes use of the
homologies between the characters, makes the correct words more
easily identifiable when the word to be recognized has one or more
errors (characters reconstructed incorrectly).
Consequently the device has improved efficiency and reliability
compared with current devices.
[0046] Finally, it will be clear that numerous modifications and
variants, all of which come within the scope of the inventive
concept, may be introduced to the device described and illustrated
here. In particular, the organization described of the memory array
10 and the coding illustrated are solely exemplary. The array may
be of the auto-associative or hetero-associative type, provided the
data output are sufficient to identify one or more stored elements
on the basis of the distance from the input word to be recognized.
Although the device described permits an identification solely of
words with a given type of characters (upper-case for example), the
use of a larger number of devices or the use of a memory of larger
size enables words with both types of character to be
identified.
* * * * *