Identity Verification By Voice Signals In The Frequency Domain Patent Grant Hair , et al. June 27, 1 [Texas Instruments Incorporated]

Patent Grant 3673331

U.S. patent number 3,673,331 [Application Number 05/003,991] was granted by the patent office on 1972-06-27 for identity verification by voice signals in the frequency domain. This patent grant is currently assigned to Texas Instruments Incorporated. Invention is credited to George D. Hair, James U. Kincaid.

United States Patent	3,673,331
Hair , et al.	June 27, 1972

IDENTITY VERIFICATION BY VOICE SIGNALS IN THE FREQUENCY DOMAIN

Abstract

Voice verification is accomplished at a plurality of spaced apart facilities each having a plurality of terminals. Multiplexing structure interconnects the terminals through a communications link to a central processing station. Analog reproductions of voices transmitted from the terminals are converted into digital signals. The digital signals are transformed into the frequency domain at the central processing station. Predetermined features of the transformed signals are compared with stored predetermined features of each voice to be verified. A verify or non-verify signal is then transmitted to the particular terminal in response to the comparison of the predetermined features.

Inventors:	Hair; George D. (Irving, TX), Kincaid; James U. (Richardson, TX)
Assignee:	Texas Instruments Incorporated (Dallas, TX)
Family ID:	21708579
Appl. No.:	05/003,991
Filed:	January 19, 1970

Current U.S. Class:	704/246; 704/203; 704/238; 704/249; 704/272; 704/E17.005
Current CPC Class:	G10L 17/02 (20130101); G06F 3/16 (20130101); G07C 9/257 (20200101)
Current International Class:	G10L 17/00 (20060101); G06F 3/16 (20060101); G07C 9/00 (20060101); G10l 001/04 (); G10l 001/08 ()
Field of Search:	;179/15A,1VS,15B ;340/146.3,148,149,152,153 ;324/77

References Cited

U.S. Patent Documents


3466394	September 1969	French
3403227	September 1968	Malm
3009106	November 1961	Haase
3281789	October 1966	Willcox
3071753	January 1963	Fritze

Other References

Shively, A Digital Processor to Generate Spectra in Real Time, IEEE Transactions on Computers, pp. 485-491, 5/68.
Dietrich and Maiwald, Digitalized Sound Spectrograph Using FFT and Multiprint Techniques, JASA, p.308, 11/68.

Primary Examiner: Claffy; Kathleen H.
Assistant Examiner: Leaheey; Jon Bradford

Claims

What is claimed is:

1. A method of voice verification comprising:

converting analog representations of a voice into digital signals,

transforming said digital signals into the frequency domain,

comparing predetermined features comprising corresponding points of a spectral estimate of said transformed signals which are characteristic of differences between individual voices with stored predetermined features of the voice to be verified, and

generating a verify or non-verify signal in response to the comparison of said predetermined features.

2. The method of claim 1 wherein said step of transforming comprises Fourier transformation.

3. The method of claim 1 wherein said corresponding points comprise selected segments of selected phonemes.

4. A method for verifying the voice of an individual comprising:

converting the voice into analog electrical signals,

converting said electrical signals into digital signals,

sampling spaced apart portions of said digital signals,

Fourier transforming the sampled portions of said digital signals into the frequency domain,

smoothing the frequency transformed signals,

forming a spectral estimate of the smoothed signals,

comparing the spectral estimate with a stored spectral signal representative of the individual, and

generating a verify or non-verify signal in response to the comparison.

5. The method of claim 4 wherein said Fourier transforming utilizes the Cooley-Tukey algorithm.

6. The method of claim 4 wherein said step of smoothing comprises:

convolving a real smoothing function with said frequency transformed signals.

7. The method of claim 4 wherein said step of forming a spectral estimate comprises:

multiplying the smoothed signals by the complex conjugate of said signals.

8. The method of claim 4 wherein said step of comparing comprises:

computing a non-negative single valued function of the Euclidian distance between vectors corresponding to said spectral estimate and said stored spectral signal.

9. The method of claim 8 and further comprising:

generating a verify signal only if said non-negative single valued function of the Euclidian distance is equal to or less than a preset threshold value.

10. The method of claim 4 and further comprising:

updating said stored spectral signal in response to variations in said spectral estimate and said stored spectral signal.

11. A method of voice verification for a plurality of stations comprising:

transmitting station and person identification signals from a station to a central processing station,

in response to said identification signals generating signals representative of a predetermined series of words to be spoken by the identified person,

displaying the series of words to the identified person,

transmitting spoken words by the identified person to said processing station,

comparing said spoken words with stored representations of the words previously spoken by the identified person, and

transmitting verification signals in response to the comparison to the station.

12. The method of claim 11 wherein said station and person identification signals are transmitted in response to insertion of a uniquely coded card into a station.

13. The method of claim 12 wherein said station and person identification signals are transmitted in response to operation of numerical registers at the station.

14. The method of claim 12 wherein said predetermined series of words are visually displayed from a panel at the station in a random manner.

15. The method of claim 12 and further comprising:

transmitting additional numerical signals from a station to the central processing station, wherein operations may be performed thereon if the identity of the person is verified.

16. A system for voice verification comprising:

means for converting analog reproductions of a voice into digital signals,

means for transforming said digital signals into the frequency domain,

means for comparing predetermined features of said transformed signals which are characteristic of differences between individual voices with stored predetermined features of the voice to be verified,

fast Fourier transformation means for transforming said digital signals into the frequency domain,

means for forming a spectral estimate of points of selected phonemes of said digital signals, and

means for comparing the Euclidian distance between points of said spectral estimate and corresponding points of said stored predetermined features, and

means for generating a verify or non-verify signal in response to the comparison of said predetermined features.

17. A system for voice verification comprising:

means for converting analog reproductions of a voice into digital signals,

means for transforming said digital signals into the frequency domain,

means for comparing predetermined features of said transformed signals which are characteristic of differences between individual voices with stored predetermined features of the voice to be verified,

means for generating a verify or non-verify signal in response to the comparison of said predetermined features, and

means for varying the stored predetermined features in response to the comparison of said predetermined features.

18. A system for verifying the voice of an individual comprising:

means for converting the voice into analog electrical signals,

means for converting said electrical signals into digital signals,

means for sampling spaced apart portions of said digital signals,

Fourier transform means for transforming the sampled portions of said digital signals into the frequency domain,

means for smoothing the frequency transformed signals,

means for forming a spectral estimate of the smoothed signals,

means for comparing the spectral estimate with a stored spectral signal representative of the individual, and

means for generating a verify or non-verify signal in response to the comparison.

19. The system of claim 18 wherein said Fourier transform means operates according to the Cooley-Tukey algorithm.

20. The system of claim 18 wherein said means for smoothing comprises:

means for convolving a real smoothing function with said frequency transformed signals.

21. The system of claim 18 wherein said means for forming a spectral estimate comprises:

means for multiplying the smoothed signals by the complex conjugate of said signals.

22. The system of claim 18 wherein said means for comparing comprises:

means for computing a non-negative single valued function of the Euclidian distance between vectors corresponding to said spectral estimate and said stored spectral signal.

23. The system of claim 22 and further comprising:

means for generating a verify signal only if said non-negative single valued function of the Euclidian distance is equal to or less than a preset threshold value.

24. The system of claim 18 and further comprising:

means for updating said stored spectral signal in response to variations in said spectral estimate and said stored spectral signal.

25. A system for voice verification comprising:

a plurality of spaced apart facilities each requiring verification of the identities of a predetermined group of people,

each facility having a plurality of terminals for receiving identification and voice information and for indicating verification signals,

multiplexing means interconnecting the terminals at each facility to a communications link,

a central processing station for receiving and transmitting signals over the communications link,

means at said central processing station responsive to identification information from a terminal for requesting voice information at said terminal,

frequency conversion means at said central processing station for converting voice information transmitted from said terminal into the frequency domain,

means at said central processing station for comparing the frequency converted information with stored identification information, and

means for generating verification signals for transmission to said terminal in response to said comparison.

26. The system of claim 25 wherein each of said terminals at a facility are connected via wirelines to said multiplexer.

27. The system of claim 25 wherein said communications link comprises an electromagnetic wave communication system.

28. The system of claim 25 and further comprising:

reversible multiplexer means located at said central processing station.

29. The system of claim 25 and further comprising:

gate means at said terminals for allowing entrance only upon the receipt of favorable verification signals.

30. In a system for voice verification, the combination comprising:

a terminal for connection through a communications link with a central voice verification station,

means at said terminal for transmitting a unique identification signal representative of an individual to the central station,

display means at said terminal responsive to a control signal from the central station for displaying in random order a plurality of required words to be spoken by the individual,

microphone means at said terminal for transmitting the spoken required words to the central station, and

means at said terminal for indicating verification signals transmitted from the central station in response to the spoken required words.

31. The combination of claim 30 wherein said display means comprises:

a panel located on said terminal and including a plurality of selectively energizable word display portions, said control signal sequentially energizing said portions in a random order to reduce the possibility of fraudulent operation of the system.

32. The combination of claim 31 and further comprising:

keyboard means disposed on said terminal to enable the transmission of supplemental information to the central station.

33. The combination of claim 30 wherein said means for transmitting a unique identification signal comprises circuitry responsive to the insertion of a coded card.

34. The combination of claim 30 wherein said means for transmitting a unique identification signal comprises selectively operable register means disposed for manual operation on said terminal.

35. The combination of claim 30 wherein said means for indicating verification signals includes means for controlling the unlatching of a passageway.

36. The combination of claim 30 wherein said means for indicating verification signals energizes a visual sign.

Description

This invention relates to voice verification, and more particularly to a method and system of voice verification from a number of spaced apart locations.

It is necessary in a number of applications to positively identify or recognize an individual by the use of some unique non-transferable characteristic. For instance, one such application is the identity verification and selective admittance of employees to industrial plants or high security civilian or military areas. Another type of situation includes high volume retail credit transactions such as in large department stores or the like. It has heretofore been known to recognize an individual's physical characteristics by personal knowledge or by visual comparison with a photograph and description, and it has also been heretofore known to compare an individual's handwriting or fingerprints with known samples. However, these prior techniques have not only been relatively time consuming and cumbersome, but have often not been completely satisfactory with respect to accuracy.

Systems have also been heretofore developed wherein voice signatures of individuals are stored and then compared with spoken words of the individual. An example of such a prior system is disclosed in U. S. Pat. No. 3,466,394, issued to Walter K. French on Sept. 9, 1969. However, many such prior systems, as exemplified by the French patent, attempt to verify speech by matching voltage amplitude peaks and valleys. The present invention recognizes that improved verification results are provided by translating voice data into the frequency domain and then making relatively simple comparisons thereof to provide verification data. Prior verification systems have also not provided a practical verification technique which may be used to perform voice verification over large geographic areas. Moreover, previous voice verification systems have not recognized the requirement for remote terminals which include counterfeit preventing devices therein.

In accordance with the present invention, a method and apparatus of voice verification is provided wherein analog reproductions of a voice are converted into digital signals. The digital signals are transformed into the frequency domain, and predetermined features of the transformed signals are compared with stored predetermined features of the voice to be verified. A "verify" or "non-verify" signal is then generated in response to the comparison of these predetermined features.

In accordance with a more specific aspect of the invention, a method and system for voice verification includes the conversion of the voice into analog electrical signals. The analog electrical signals are converted into digital signals which are sampled at a predetermined rate. The sampled signals are Fourier transformed into the frequency domain and then smoothed. A smoothed spectral estimate of the signal is formed, the spectral estimate being compared with a stored spectral signal representative of the individual to be verified. A "verify" or "non-verify" signal is then generated in response to this comparison.
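The sequence just described (transform, smooth, form a spectral estimate, compare) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation disclosed in the patent: the function names, the three-point smoothing weights, and the threshold passed to `verify` are hypothetical, and a direct discrete Fourier transform stands in for a fast algorithm so that the example needs only the standard library.

```python
import cmath
import math


def dft(samples):
    """Direct discrete Fourier transform, O(n^2); a stand-in for an FFT."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def smooth(spectrum, weights=(0.25, 0.5, 0.25)):
    """Circularly convolve the spectrum with a real smoothing function.

    The three-point weights are an assumption chosen for illustration.
    """
    n = len(spectrum)
    half = len(weights) // 2
    return [
        sum(w * spectrum[(k - j + half) % n] for j, w in enumerate(weights))
        for k in range(n)
    ]


def spectral_estimate(smoothed):
    """Multiply each smoothed value by its complex conjugate."""
    return [(z * z.conjugate()).real for z in smoothed]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(samples, stored_estimate, threshold):
    """Return True ("verify") when the distance is at or below the threshold."""
    estimate = spectral_estimate(smooth(dft(samples)))
    return euclidean_distance(estimate, stored_estimate) <= threshold
```

In this sketch the stored spectral signal would be produced by running `spectral_estimate(smooth(dft(...)))` on an enrollment sample; how the threshold is chosen is left open here.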

In accordance with another aspect of the invention, a method and system of voice verification is provided for a plurality of remote spaced apart facilities each requiring verification of the identities of a predetermined group of people. Each facility includes a plurality of terminals for receiving identification and voice information and for indicating verification signals. Multiplexing circuitry interconnects the terminals at each facility to a communications link which joins with a central processing station. The central processing station is responsive to identification information from each of the terminals and requests voice signals to be input at the terminals. Voice signals transmitted from a terminal are converted into the frequency domain at the central processing station and are compared with stored spectral signals. Verification signals are then transmitted from the central processing station to the terminal in response to the comparison.

In accordance with yet another aspect of the invention, terminals are provided for interconnection to a central voice verification station. Structure is provided at each terminal for transmitting to the central station a unique identification signal representative of an individual. Display structure is provided at the terminals which is responsive to a control signal from the central station, the display structure then displaying in random order a plurality of required words to be uttered by the individual. A microphone is disposed at each terminal for transmitting the spoken required words to the central station. Display structure at the terminal then indicates verification signals transmitted from the central station in response to the spoken required words.

For a more complete understanding of the present invention and for further objects and advantages thereof, reference may now be made to the following description taken in conjunction with the accompanying drawings, in which:

FIG. 1 illustrates a block diagram of the basic components of a system constructed in accordance with the present invention;

FIG. 2 is a somewhat diagrammatic illustration of the use of the present invention with a plurality of spaced apart facilities each having a number of spaced apart terminals connected therewith;

FIG. 3 is a perspective view of one embodiment of a terminal for use with the present invention;

FIG. 4 is a top inverted view of the terminal shown in FIG. 3;

FIG. 5 is a perspective view of another embodiment of a terminal for use with the invention;

FIG. 6 is a block diagram of a terminal and multiplexing arrangement for use with the preferred embodiment of the invention;

FIG. 7 is a block diagram of the central processing station for use with the preferred embodiment of the invention;

FIG. 8 is a flow diagram for storing voice signatures in a digital computer according to the invention; and

FIG. 9 is a flow diagram for comparison of spoken words with stored words in a digital computer according to the invention.

Referring to FIG. 1, a plurality of spaced apart terminals 10, 12 and 14 are connected to a reversible multiplexing system 16. The number of terminals used with the invention will, of course, depend upon the type of voice verification application desired. For instance, a large number of terminals may be spaced around a retail store for use with verifying identity and credit, as well as for implementing accounting and inventory control procedures. Alternatively, the terminals may comprise a plurality of gate stations in a security area which may not be opened until an individual is properly verified according to the invention. The terminals can also be utilized as voting terminals which provide verification of voter identity as well as vote compilation.

Individuals desiring voice verification speak required words into the terminals. The signals flowing from the various terminals are suitably multiplexed by the multiplexer 16 and fed through a transmission link 18 to a reversible demultiplexing station 20 where the signals are demultiplexed. The transmission link 18 may comprise any electromagnetic wave communication system, including radio wave, microwave, laser or the like. The demultiplexed signals are fed to a central processing station 22, where the words spoken at terminals 10, 12 and 14 are compared with previously stored voice information contained in permanent data file units 24. The central processing station 22 also inputs into the remote surveillance system 26 to provide additional information if required, particularly in the industrial security application. After the comparison of the voice inputs with the stored voice information, the central processor 22 transmits verification signals through the reversible demultiplexer station 20, the transmission link 18 and the multiplexer 16 to the respective terminal.

FIG. 2 illustrates in more detail a total system implementing the present technique over a wide geographic area. A plurality of facilities 30, 32, 34 and 36 are spaced about a centrally located facility shown generally by the numeral 38. Each of the facilities has within its general area a group of individuals with whom it is necessary to perform identification operations. For instance, each of the facilities 30, 32, 34 and 36 may comprise a separate industrial complex requiring identification verification as a part of overall security operations. Alternatively, the facilities 30, 32, 34 and 36 may comprise retail store operations wherein it is necessary to verify identity and credit during the conduct of business.

Each facility has a plurality of terminals associated therewith. Each of the terminals 30a-n at facility 30 is joined to a facility transmitting and receiving station 40 by wire connections. Similarly, each of the terminals 32a-n at facility 32 is joined to central transmitting and receiving station 42. Likewise, the terminals at facilities 34 and 36 are respectively connected to transmitting and receiving stations 44 and 46. Each of the transmitting and receiving stations 40, 42, 44 and 46 communicates via radio waves with a transmitting and receiving station 48 located at the central processing station facility 38.

In the preferred embodiment, the wirelines connecting the terminals to each central facility transmitting and receiving station comprise 15 kc studio quality transmission lines currently used by commercial radio stations between their studio and transmitter locations. Additionally, the communications link between the outlying facilities and the central processing station comprises microwave communication systems. It is also noted that the central processing station facility 38 may include a number of outlying terminals which are connected thereto by wire or other suitable connections. It will thus be seen that the present invention allows a number of outlying facilities to utilize a central processing station for voice verification, the details of operation of which will be later described.

FIGS. 3 and 4 illustrate one embodiment of a terminal 50 for use in a retail store. The terminal 50 comprises a case which houses simple logic and light energizing circuits. A microphone 52 is connected to the side of the casing which is faced by the individual to be verified. A receptacle 54 is provided in the central portion of the casing to receive an identification card carried by the individual to be verified. This identification card may comprise a conventional credit card having punched holes using light and photocell matrix logic, or magnetic encoding techniques, which actuate switches according to the code imprinted thereon.

When it is desired to charge an item bought in the store, the purchaser places a card in the receptacle 54. Terminal 50 is energized and a "go ahead" signal is displayed. At this time, the clerk or salesman may enter alphanumeric information for sales and inventory control into the terminal. The unit senses the uniquely coded card and sends an identifying signal via the wirelines or other suitable transmitting link to the transmitting and receiving station of that particular facility. The identifying card signals, along with signals identifying the particular terminal involved, are transmitted over the communications link to the central processing station. This identifying signal is entered into the computer at the central station and utilized to retrieve a stored voice signature of the particular individual involved. The computer then sends an acknowledging signal to the terminal and a panel 58a-e is illuminated to cue the individual to speak the required words in a random sequential order. At the illumination of each word panel, the individual speaks the required word into the microphone 52. This voice data is then transmitted over the communications link to the central processing station.

An important aspect of the invention is that the word panels 58a-e are energized in a random order to reduce the possibility of use of a counterfeit voice source, such as a tape recorder or the like. The panels 56, 58a-e, 60, 62 and 64 in the simplest aspect may comprise glass panels bearing legends disposed in front of a light source. Suitable filtering networks are connected at each light source so that signals bearing different frequencies will energize different ones of the panels. A frequency coded signal is then provided by the central processing station in order to sequentially energize the panels in a random order.
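The random cueing of the word panels can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: the function name, the optional `rng` parameter, and the placeholder words are hypothetical and do not come from the disclosure, which specifies neither the required words nor how the random order is generated.

```python
import random


def random_cue_order(words, rng=None):
    """Return the required words in a freshly shuffled order.

    `rng` is a hypothetical hook for injecting a seeded generator in tests;
    by default an operating-system entropy source is used so that the
    order is hard to predict.
    """
    rng = rng or random.SystemRandom()
    order = list(words)  # copy so the caller's list is left untouched
    rng.shuffle(order)
    return order


# Placeholder words standing in for the legends on panels 58a-e.
REQUIRED_WORDS = ["word_a", "word_b", "word_c", "word_d", "word_e"]
cue_sequence = random_cue_order(REQUIRED_WORDS)
```

Because the order changes on every attempt, a recording of an earlier session would not match the sequence requested in a later one, which is the counterfeit-reduction point made above.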

A "thank you" panel 60 may also be energized by the central control station. If a "non-verify" decision is made and the decision measure is within a predetermined distance of the "verify" threshold, a signal is transmitted to the proper terminal and the "please repeat" panel 56 is energized. Also, if a "non-verify" signal is generated due to background noise or the like, then the "please repeat" panel 56 is energized to enable additional tries for verification. The number of additional tries may be at the discretion of the store clerk or salesman. Referring to FIG. 4, it will be seen that the sales clerk is provided with a "verified" panel 62 and a "non-verified" panel 64. After comparison of the voice signals transmitted by the purchasing individual, the central station selectively energizes either panels 60 and 62 or 56 and 64. If the individual is verified, as indicated by the lighting up of panels 60 and 62, the sales clerk has the option to enter additional inventory control and sales information on a keyboard 66. Panels 67a-b are provided to indicate to the clerk whether price or inventory control information is presently being input into the system. A number of buttons 68 are also provided to initiate the input and inventory information transmission and to request return alphanumeric signals. The keyboard 66 and the buttons 68 may comprise capacitively actuated switches or any other suitable conventional means.
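The three outcomes described above (verify, repeat on a near miss, non-verify) amount to a two-threshold decision. The following Python sketch shows one way to express it; the function name, the `repeat_margin` parameter, and the returned strings are assumptions made for illustration, and the handling of background noise and the limit on retries are not modeled.

```python
def decide(distance, threshold, repeat_margin):
    """Classify a decision measure against the verify threshold.

    `repeat_margin` is a hypothetical name for the "predetermined
    distance" from the threshold within which a retry is offered.
    """
    if distance <= threshold:
        return "verify"
    if distance <= threshold + repeat_margin:
        return "please repeat"
    return "non-verify"
```

For example, with a threshold of 2.0 and a margin of 0.5, a measure of 2.3 would prompt a repeat while a measure of 3.0 would be rejected outright.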

It should be understood that, instead of the identification card shown in FIGS. 3 and 4, a series of manually operable buttons or registers could alternatively be provided at the terminal. The purchasing individual could then set the buttons or registers to a particular unique code for identification purposes. Also, it will be seen that more or fewer required words than the five words illustrated could be presented to the purchasing individual in a variety of ways. For instance, a cathode ray tube could be utilized to project a desired sequence of required words to be spoken by the purchasing individual. The particular required words utilized with the terminals will, of course, be carefully chosen in order to provide the required number and type of phonemes most useful for accurate voice identification.

FIG. 5 illustrates another embodiment of a terminal for use with the invention in a high security area. An opening is formed in walls 70 surrounding the high security area, with the pair of spaced apart semicircular enclosure members 72a-b surrounding a revolving door 74. The partitions forming the revolving door 74 are provided with a series of slotted portions 76 which mesh with outwardly extending members 78 which are rigidly attached to the wall 70.

The members 78 allow entrance through the revolving door 74 on only one side. A terminal 80 constructed in accordance with the invention is disposed on the gate for voice verification purposes. A person wishing to enter through the gate inserts a card in a slot 82 in the terminal, the card indicating the purported identity of the person. The data processing station then sequentially illuminates a plurality of word panels 86. As each panel is illuminated, the individual speaks the required word into a microphone 88. As previously indicated, the word panels 86 are randomly energized in order to reduce the possibility of voice counterfeiting. If the remote data processing center verifies the voice of the individual with respect to the stored voice signals of the individual, a "proceed" light panel 84 is illuminated and the revolving door 74 is unlatched to allow entrance of the individual therethrough. Light panel 90 may be energized to display "please repeat" in the manner previously described. Any type of suitable latch may be utilized to lock the revolving door against unauthorized admittance, such as a movable latch controlled by a solenoid. A surveillance television camera 94 is operated to indicate when an individual is passing through the gate or to provide a visual indication of an individual who is not verified by the system. The camera 94 also prevents entry through the gate of more than one person. When a "non-verify" decision is received, "repeat" panel 90 is illuminated.

FIG. 6 illustrates in detail the interconnections between the terminals of the invention, the reversible multiplexer system, and the transmission link which extends to the central processing center. A start signal is initially generated by the depression of a button on a terminal or by the insertion of a uniquely coded card into the terminal. The start signal operates a service request signal generator 100 which generates a suitable signal to an input-output interface and frequency multiplexer circuit 102. Generator 100 may comprise a frequency tone generator of the type used in conventional touch tone telephone systems. Multiplexer 102 comprises any suitable type of reversible analog multiplexer, of which there are a number of commercially available units made by different manufacturers.

Also applied as an input into the terminal is a voice input which is fed through a microphone to a signal conditioning circuit 104, which may comprise for instance AGC circuitry with preamplification and bandpass filtering, and other noise reduction circuitry. The output of the signal conditioning circuitry 104 is also fed through the frequency multiplexer 102. Alphanumeric user and machine inputs are fed from the remote terminal into an alphanumeric data block signal generator 106. Generator 106 preferably will comprise a frequency shift keying system such as the conventional touch-tone system for generating tones in response to alphanumeric inputs. The output of the signal generator 106 is also fed through the multiplexer 102.

Signals fed from the central processing center back through the frequency multiplexer 102 are demultiplexed thereby and are applied through a terminal command and control circuit 108. The circuit 108 comprises simple logic circuitry to turn signal generator 106 on after reception of a ready acknowledgment from the central processing center. Additionally, the command and control circuitry 108 decodes command signals and controls the operation of the visual signal panels on the terminal. These signal panels are termed, for the purposes of this disclosure, the cue and decision display circuit 110. Display 110 thus includes the randomly operated required word displays and also the verification panels on the terminal. The terminal command and control 108 also controls the operation of the admittance gate 112 at an industrial site, or alternatively, the sales ticket data operation 114 in a retail site.

The multiplexed signals from the frequency multiplexer 102 are fed through an input interface and channel separator 116. Additionally, multiplexed signals from other terminals at the facility are fed into the interface and channel separator 116. The signals fed from each of the terminals at the facility are separated into various frequency channels by the separator 116. The start request and other signals are fed through the signaling channel interface circuitry 118 to the service request sensor circuit 120. Suitable circuitry for use as the service request sensor is a detector or recognizer for the output of the touch-tone system utilized in generator 106. Such a detector may comprise a narrow band filter, along with squaring and integrating circuits and time gate sampling.
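A digital stand-in for such a detector may be sketched as follows. This is an illustrative sketch only: it replaces the narrow band filter with a quadrature correlation at the expected tone frequency, then squares and sums (integrates) the result over one time gate. The function names, the normalization and the 0.25 threshold are assumptions and are not taken from the disclosure.

```python
import math

def tone_power(samples, tone_hz, sample_rate_hz):
    """Normalized power of `samples` at `tone_hz` over one time gate."""
    n = len(samples)
    if n == 0:
        return 0.0
    step = 2.0 * math.pi * tone_hz / sample_rate_hz
    # Correlating against cosine and sine acts as a narrow band filter.
    in_phase = sum(s * math.cos(step * k) for k, s in enumerate(samples))
    quadrature = sum(s * math.sin(step * k) for k, s in enumerate(samples))
    # Squaring and adding gives a phase-independent power estimate; the
    # 4 / n**2 factor scales a unit-amplitude tone to roughly 1.0.
    return 4.0 * (in_phase ** 2 + quadrature ** 2) / (n * n)

def tone_present(samples, tone_hz, sample_rate_hz, threshold=0.25):
    """Time gate decision: is the tone present in this block of samples?"""
    return tone_power(samples, tone_hz, sample_rate_hz) >= threshold

# Hypothetical example: one 0.05 second gate of a 697 Hz tone at 8,000 samples/second.
gate = [math.sin(2.0 * math.pi * 697.0 * k / 8000.0) for k in range(400)]
```

Calling `tone_present(gate, 697.0, 8000.0)` reports the tone, while the same gate tested at a different frequency such as 1,209 Hz falls well below the threshold.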

The voice signals and alphanumeric user and machine inputs are fed through a data channel interface 122 to a signal conditioning circuit 124. Some amplification may be required at interface 122. Again, filtering and other conditioning operations similar to those provided by circuit 104 are performed by the circuit 124 on the voice signals in order to enhance the signal to noise ratio. The other alphanumeric inputs may be amplified by circuit 124. The service request sensor 120 generates a tone signal which is fed to the system command and control circuitry 126. Receipt of a "use" signal by circuitry 126 initiates channel assignment. Circuitry 126 also provides clocking and control functions for the multiplexing operations, and generally coordinates movement of signal flow by opening and inhibiting the various channels. The conditioned data signals are fed from circuit 124 to the transmitter channel assignment circuitry 128 wherein the data signals are fed to available frequency channels for transmission. Clocking and control signals for the transmission are provided by the command control system 126. System 126 also maintains memory of which frequency channels are unassigned and thus available.
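The channel memory kept by system 126 may be pictured with a small bookkeeping structure. The sketch below is hypothetical: the class name, the lowest-free-channel policy and the terminal identifiers are assumptions chosen only to illustrate keeping track of which frequency channels are unassigned.

```python
class ChannelPool:
    """Tracks which frequency channels are free and which are assigned."""

    def __init__(self, channel_count):
        self._free = set(range(channel_count))
        self._assigned = {}  # terminal id -> channel number

    def assign(self, terminal_id):
        """Return the channel for a terminal, or None if none is available."""
        if terminal_id in self._assigned:
            return self._assigned[terminal_id]
        if not self._free:
            return None
        channel = min(self._free)  # assumed policy: lowest free channel first
        self._free.remove(channel)
        self._assigned[terminal_id] = channel
        return channel

    def release(self, terminal_id):
        """Free the channel held by a terminal, if any."""
        channel = self._assigned.pop(terminal_id, None)
        if channel is not None:
            self._free.add(channel)
```

A terminal requesting service when every channel is busy receives `None` here; how the actual system queues such a request is not specified in this passage.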

The command and control system 126 also actuates a "use" signal generator 130 which provides a "use" tone signal to be fed to the central processing station via the transmitter channel assignment circuit 128. The multiplexed signals are again conditioned by preamplification and bandpass filtering by a conditioning circuit 132. The conditioned signals are then fed through a microwave transmitter 134 to the transmitting antenna 136 for transmission to the central processing center.

Acknowledgment and verification signals transmitted from the central processing center are received by a receiver antenna 138, and fed through a microwave receiver 140 to a receiver channel assignment circuit 142. Control of the receiver channel assignment circuit 142 is provided by the transmitter channel assignment circuit 128. Ready or acknowledgment tone signals are fed to a tone sensing circuit 144, of the type previously described, which generates a signal which is routed through the command and control system 126. This signal then actuates the alphanumeric request tone signal generator 146 and the queue tone signal generators 148. It will be understood that a plurality of queue signal generators will be provided, with one generator being provided for each of the required word panels on a terminal. The queue signal generators 148 generate tone signals of different frequency in order to actuate selected ones of the lights behind the word panels on the terminal.

The output signals from the tone signal generators 146 and 148 are fed through the signaling channel interface 118 and the channel separator 116 to the remote terminal. After one of the queue signal generators 148 has operated and the resulting voice signals have been transmitted to and received by the central processing center, a tone indication is provided through the receiver channel assignment 142 and is fed to the sense queue circuit 150. The sense queue circuit senses the tone and then generates tone signals which sequentially actuate the next desired signal from the queue signal generators 148. After the desired number of required spoken words have been received by the central processing center, a decision tone signal on verification is transmitted from the processing center, received by the receiver antenna 138 and fed to the sense decision circuit 152. This decision circuit 152 feeds a signal through the system command and control circuit 126 to actuate the decision signal generator 154. The decision signal is then fed through the signaling channel interface 118 to the remote terminal for actuating the visual verification display panel at the display 110. Additionally, other acknowledgment tone signals, inventory control information, termination tone signals and the like are fed through the receiver channel assignment 142 to the return data circuit 158 for suitable display at the terminal displays 112 or 114. The circuit 158 decodes the coded tone signals applied thereto.

FIG. 7 illustrates the circuitry at the central processing station of the invention. A microwave receiving antenna 160 receives the microwave signal transmissions from the various terminal facilities. A microwave receiver 162 transmits the received signals through a receiver channel interface 164. The channel interface 164 is automatically set by the channel assignment circuit 128 (FIG. 6) and feeds the signals through a data interface and channel assignment circuit 166. The receiver channel interface 164 also provides a signal to the use sensor 168 which determines which time shared input lines, and thus which facility and terminal is requesting service. The sensor 168 then generates a tone signal which notifies the system command and control circuit 170 in order to schedule the requested job and initiate action thereon. Circuit 170 then notifies assignment circuit 166 for channel assignment.

The voice signals fed through the data interface system 166 are fed to a voice data analog-to-digital converter 172. In a typical system, conversion is provided at the converter 172 at an 8,000 to 20,000 samples/second rate. Selection of portions of the voice signals will be based on starting at an arbitrary time following or preceding signal onset as determined by threshold logic. The digitized signals are then fed through a buffer 174 which stores the digital signals until the signals are accepted by a fast Fourier transform circuit 176. Suitable buffer and fast Fourier transform systems are commercially available from such manufacturers as the Digital Systems Division of Texas Instruments Incorporated of Houston, Texas. The resulting spectra are fed to an identification and data processor 178, wherein the spectra are stored on magnetic discs or the like and utilized for voice verification in the manner to be subsequently described in detail. The data processor in the preferred embodiment of the invention will comprise a properly programmed digital computer.
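The selection of a portion of the digitized voice signal relative to signal onset may be sketched as follows. The function name and parameters are hypothetical; the sketch assumes that onset is the first sample whose magnitude reaches the threshold and that a negative offset selects a gate beginning before onset.

```python
def select_gate(samples, threshold, offset, length):
    """Return `length` samples starting `offset` samples from signal onset.

    Onset is taken as the first sample whose magnitude reaches `threshold`.
    A negative `offset` starts the gate before onset, a positive one after.
    An empty list is returned when no sample reaches the threshold.
    """
    onset = next(
        (k for k, s in enumerate(samples) if abs(s) >= threshold), None
    )
    if onset is None:
        return []
    start = max(0, onset + offset)  # never reach back before the buffer start
    return samples[start:start + length]
```

For a 20,000 samples/second conversion rate and a 0.05 second time gate, `length` would be 1,000 samples.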

Several fast Fourier transform units may be required in circuit 176. Each fast Fourier transform unit will be capable of producing Fourier transforms at a rate of about 50 channels of 1,000-word speech samples every half second. A 20 kc sampling rate and a 0.05 second transform time-gate produce 1,000 samples for each phoneme transformed. Assuming 10 phoneme spectra for each individual verified, three such units produce about 300 spectra per second, which yields about 30 verifications per second as a typical maximum verification rate for the system. The time required for processing each set of ten spectra and making the verification for an individual would be about 0.03 seconds. This time is based upon forty words per spectrum, or a total of 400 words per person, required to represent the pertinent spectral features. Thus, most conventional general purpose computers may easily keep up with three fast Fourier transform systems all working at maximum capacity. With computers of much larger capacity, separate fast Fourier transform units would not be necessary, as such transformation functions may be performed by properly programming the digital computer.
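The throughput figures above may be checked with a few lines of arithmetic. The variable names are illustrative only; every number is taken from the preceding paragraph.

```python
sample_rate_hz = 20_000          # 20 kc sampling rate
gate_seconds = 0.05              # transform time-gate
samples_per_transform = round(sample_rate_hz * gate_seconds)

# Each unit transforms about 50 channels every half second.
transforms_per_unit_per_second = 50 / 0.5
fft_units = 3
spectra_per_person = 10
verifications_per_second = (
    fft_units * transforms_per_unit_per_second / spectra_per_person
)

# Storage needed to represent the pertinent spectral features.
words_per_spectrum = 40
words_per_person = words_per_spectrum * spectra_per_person

# The processor needs about 0.03 seconds per person, so it can handle
# roughly 33 verifications per second, more than the transform units supply.
processing_seconds_per_person = 0.03
processor_capacity = 1 / processing_seconds_per_person
keeps_up = processor_capacity >= verifications_per_second
```

The comparison in the last line is the basis for the statement that a general purpose computer can keep pace with three transform units working at maximum capacity.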

Alphanumeric tone signals fed through the data interface 166 are decoded into digital signals and fed through a buffer 180 to the identification and data processor 178. Additionally, portions of the alphanumeric signals are fed through buffer 180 for storage in a memory file 182. The data processor 178 may recall prestored spectral data from the memory file 182 when desired. Additional auxiliary data storage is provided by the storage 184, and the auxiliary data may be processed at a later time by processor 178 or yet another specially programmed general purpose computer. Buffers and storage circuits 180, 182 and 184 may comprise conventional magnetic disc or core storage, or banks of shift register strings.

The operation of each of the buffers and other circuits of the invention is controlled and addressed by clock and timing signals provided by the command control system 170. The command control system 170 also initiates the ready signal generator 186 which supplies a tone signal through the transmitter channel interface system 188. System 188 is automatically set by the operation of circuit 142 (FIG. 6). The tone signal is then assigned to a channel for transmission and is fed through a signal conditioning circuit 190 for filtering and waveshaping operations. The signal is then fed to the microwave transmitter 192 for transmission via the transmitter antenna 194 to the respective facility and terminal.

Upon answer from the terminal, the system command and control 170 feeds signals through an exhaustive random order selector circuit 196 which generates a series of five random control pulses to the queue signal generators 198. An exemplary circuit for use as selector circuit 196 is a continuously cycling five position shift register. When it is desired to generate a random word signal, the instantaneous position of the register is output as the random number. This position is stored by logic circuitry. For the next random number to be generated, the instantaneous position of the register is output and then compared with the stored position. If the two positions are different, the second position is output as a random number. If the positions are the same, the instantaneous position of the register is again detected. This cycle is repeated until five random numbers have been generated, after which the system is reset. The generators 198 include a generator of a different tone frequency for each of the required words to be displayed at the terminal. The different frequency signals are transmitted through the channel interface 188 for transmission via the microwave communication link. Thus, each required word panel is energized during each verification cycle, but the word order is random. After the system has received the voice signals and made the verification comparison at the data processor 178, the system command and control 170 initiates a decision tone signal from the tone generator 200 for transmission to the respective terminal. Additionally, the system command and control 170 controls the operation of a return data buffer 202, which may be a magnetic storage, such that return data may be fed from a data processor 178 to the respective terminal.
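The behavior of the selector circuit 196 may be modeled in software as follows. This is a sketch under stated assumptions: the continuously cycling register is modeled by advancing a counter a random number of steps between reads, and each new position is compared against every position already output, so that all five word panels are used exactly once per cycle. The function name and the step range are hypothetical.

```python
import random

def exhaustive_random_order(positions=5, rng=None):
    """Return every register position exactly once, in random order."""
    rng = rng or random.Random()
    register = 0
    order = []  # positions already output during this verification cycle
    while len(order) < positions:
        # The register cycles freely between reads; the read instant is
        # modeled by advancing it an arbitrary number of steps.
        register = (register + rng.randrange(1, 1000)) % positions
        if register not in order:
            order.append(register)  # new position: output it and store it
        # otherwise the position repeats a stored one and is read again
    return order  # the caller starts a fresh cycle after all positions are used
```

Each call yields a permutation of the five positions, which corresponds to energizing each required word panel once per verification cycle in random order.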

The operation of the system will now be described in detail with reference to FIGS. 6 and 7. The person to be identified approaches the remote terminal and initiates a start signal by the depression of a button or the like. The start signal is fed through the service request signal generator 100 which applies a signal through the interface and frequency multiplexer 102. The channel separator 116 separates the start signal from other signal information being supplied from other remote terminals and feeds the start signal through the interface 118 to the service request sensor 120. A signal is then initiated through the command and control circuit 126 to the use signal generator 130, which generates a "use" signal for application to the transmitter channel assignment circuit 128. A transmitter channel is assigned upon receipt of the service request by circuit 126. The "use" signal is transmitted from antenna 136 via the microwave link to the receiver antenna 160 shown in FIG. 7.

The "use" signal is then transmitted through the channel interface circuit 164 to the use sensor 168. The resulting output signal from the sensor 168 is fed to the system command and control circuit 170, wherein the location and identity of the terminal is stored and utilized for scheduling of the job. Scheduling signals are fed from the command and control circuit 170 to the channel assignment circuit 166. Additionally, the command and control system 170 initiates a signal from a ready signal generator 186 which is fed through the channel system 188 and transmitted via the antenna 194 to the terminal. This ready signal is received by the antenna 138 (FIG. 6) and fed to the sense ready circuit 144, which applies an indication to the system command and control 126.

The alphanumeric request signal generator 146 is then energized to request alphanumeric data blocks through the channel interface 118 and channel separator 116, the multiplexer 102 and through the command and control 108. These user inputs, which may comprise the purported identity contained on the card inserted into the terminal, are fed through signal generator 106. As noted, generator 106 in the preferred embodiment generates tone frequencies which are fed through the multiplexing circuitry and through the data channel interface 122 for transmission from the antenna 136 to the central processing station.

The purported identity and other alphanumeric signals are received by the receiver antenna 160 and suitably demultiplexed and fed through the alphanumeric data buffer 180 to the system command and control 170. The signals are then stored, and in response thereto a signal is fed to the random order selector 196. This selector then feeds a first randomly selected control signal to initiate operation of one of the queue signal generators 198. The selected queue signal generator generates a unique tone frequency which is fed through the channel interface 188 and transmitted via the transmitter antenna 194. The selected queue signal is received by the receiver antenna 138 and fed through the receiver channel assignment system 142 to the sense queue circuit 150. The sense queue signal is fed through the command and control system 126 to the queue signal generators 148.

Only one of the signal generators 148 is initiated by the particular queue signal tone, and this particular generator produces a signal which is fed through the channel interface 118 and through the terminal command and control circuit 108 to the queue display 110. A single one of the required word panels is then energized at the terminal. For instance, referring to FIG. 3, a selected one of the word panels 58a-e is energized. The individual to be verified then speaks the required word through the microphone, the resulting voice input being fed through the signal conditioning circuit 104 and through the multiplexing circuit 102 to the data channel interface 122. The voice signals are suitably conditioned for noise reduction through the conditioning unit 124 and fed through the channel assignment circuit 128 for transmission via the transmitting antenna 136.

The required word is received at the receiver antenna 160 and fed through the data interface and channel assignment circuit 166 which has been preset by the initial start signal. Circuit 166 routes the word signal to the voice data analog-to-digital converter 172. The digital signals are stored in the buffer 174 until they may be transformed into the frequency domain by the fast Fourier transform circuit 176. The transformed signals are then fed to the identification and data processor 178 for processing in the manner to be subsequently described. The command and control system 170 senses the reception and storing of the required word, and then pulses the random order selector 196 in order that a second queue signal may be generated from the signal generators 198. The signal is again a unique tone frequency which is transmitted to the terminal, in the manner previously described, so that a second one of the queue signal generators 148 will be energized to display a second required word at the terminal.

The individual to be verified again repeats the required word through the microphone and the voice input is processed, multiplexed and transmitted to the central processing station. The voice signal is digitally converted in the unit 172 and transformed into the frequency domain by the circuit 176. The second spoken word is then stored in the identification and data processor 178 and the cycle is repeated again. This required word generation and resulting voice response by the individual to be verified is continued until all of the required spoken words are received and stored in the identification and data processor. In the embodiment shown in FIG. 3, this would comprise five spoken words. However, it will be understood that more or fewer words may be required for varying applications of the invention.

After all the required words have been spoken and stored, the stored spectral signals for the individual are retrieved from the memory file 182. The stored signals relative to the individual are then compared with the spoken words within the data processor 178 in the manner to be subsequently described. The data processor 178 then makes a verification decision and transmits the verification signal through the system command control 170 and to the decision signal generator 200. If the person's purported identity is verified, the generator 200 generates one signal which is fed to the channel assignment circuit 188 and transmitted via the microwave link to the receiver antenna 138 at the terminal. The favorable decision is fed through the sense decision circuit 152 and through the command and control system 126 to the decision signal generator 154.

In response to the favorable decision signal, generator 154 generates an indication which is fed through the interface 118, the channel separator 116, the multiplexer 102 and the terminal command and control 108 to either the gate control 112 or to the sales ticket data 114. If "no verification" is decided by the data processor 178, the decision signal generator 200 generates a "non-verify" signal, which is displayed by 110 and 112, and the gate control 112 is not actuated, or no sales ticket data is presented to unit 114. Additionally, the "please repeat" panel will be energized.

In a credit environment, if the data processor 178 generates a "verify" signal, the system command and control 170 actuates the auxiliary data storage to store the input signals for inventory and credit control purposes.

Upon indication of a decision signal, the terminal generates a clear indication which is fed through the service request signal generator loop to the central processing station to release the channel assignment for the terminal. Return signals fed from the data processor may at this time be fed through the return data buffer 202 and through the return data unit 158 for display at the terminal primarily at display 114.
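The exchange described above with reference to FIGS. 6 and 7 may be summarized as an ordered sequence of signals. The sketch below is a simplification: the sender labels and message names are hypothetical, and the intermediate sensing and relaying circuits are omitted.

```python
def verification_session(word_order, verified):
    """Yield (sender, signal) pairs for one verification cycle."""
    yield ("terminal", "start")                 # button or coded card
    yield ("facility", "use")                   # use signal generator 130
    yield ("center", "ready")                   # ready signal generator 186
    yield ("facility", "alphanumeric request")  # request generator 146
    yield ("terminal", "purported identity")    # signal generator 106
    for word in word_order:
        yield ("center", f"queue tone {word}")     # queue generators 198
        yield ("terminal", f"spoken word {word}")  # voice input via 104
    yield ("center", "verify" if verified else "non-verify")  # generator 200
    yield ("terminal", "clear")                 # releases the channel
```

For the five-word embodiment of FIG. 3, `list(verification_session([2, 0, 4, 1, 3], True))` contains seventeen signals, beginning with the start signal and ending with the clear indication.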

It will thus be seen that the present system may have many applications for use as an industrial security system, or for use in credit verification for retail stores and the like. Alternatively, the system can be used as a verification unit in apartment houses to control the entrance thereto. The system may also be used between banks in order to keep credit balances and the like on a real-time basis. For security purposes, codes or scrambling may be utilized in the microwave communications link to reduce the possibility of interception and use by unauthorized persons.

The identification and data processor 178 may take on various forms, as for instance, a special purpose digital computer. However, it is thought that for most purposes it will be desirable to properly program a general purpose digital computer to perform the voice verification functions of the invention. FIGS. 8 and 9 thus illustrate flow diagrams which may be used to program such a digital computer.

Referring to FIG. 8, a flow diagram is illustrated whereby the digital processor may be provided with a stored memory of a voice, termed a voice signature, of an individual. In practice, the individual speaks the required words into a microphone, after which the words are converted into digital form by a conventional analog-to-digital converter. Assuming M utterances of each of N phonemes, T points of the digital words are stored at 300. These T points comprise an arbitrary number to give sufficient information, but in practice T has been set to 1,024. At step 302, m is set equal to one and n is set equal to one at 304. At 306, the spectrum of the mth utterance of the nth phoneme is computed and stored as the Fourier transform $\Phi_n(i)$, wherein $i = a_n, \ldots, b_n$.

At 308, n is incremented once and a decision is made at 310 as to whether or not n = N. If not, the next phoneme is transformed at 306. When the last phoneme of the first utterance is transformed, n = N, and m is incremented once at 312. The decision is made at 314 as to whether or not m = M. If not, the next utterance is Fourier transformed, phoneme by phoneme, in the manner described. After the last utterance has been Fourier transformed, the reference spectra are computed at 316, as indicated, and are stored to provide $\Phi_n^{\mathrm{REF}}(i)$ for use as a reference voice signature in the manner to be subsequently described. The operation at 316 is an arithmetic average of repetitive utterances made by the individual, to provide more meaningful stored data. Spectrum smoothing will generally also be conducted on the transform data to make the data more compatible with the input data treated in the manner as shown in FIG. 9.
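The averaging operation at 316 may be sketched as follows. The sketch assumes that each spectrum has already been reduced to real-valued points and that the average is taken point by point over the M utterances; the function name and the nested-list layout are hypothetical.

```python
def reference_spectra(utterances):
    """Average spectra over utterances.

    `utterances[m][n][i]` is spectral point i of phoneme n in utterance m.
    The result `reference[n][i]` is the arithmetic mean over all utterances.
    """
    if not utterances:
        raise ValueError("at least one utterance is required")
    m_count = len(utterances)
    n_count = len(utterances[0])
    reference = []
    for n in range(n_count):            # loop over the N phonemes
        points = len(utterances[0][n])
        reference.append([
            sum(utterances[m][n][i] for m in range(m_count)) / m_count
            for i in range(points)      # loop over the frequency points
        ])
    return reference
```

With two utterances of two phonemes, `reference_spectra([[[1.0, 3.0], [2.0]], [[3.0, 5.0], [4.0]]])` gives the point-by-point means of the corresponding spectra.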

Referring to FIG. 9, it will be assumed that the required spoken words have been transmitted to the central data processor and have been digitally transformed. T points of each of the N preselected phonemes are stored at 400 as previously noted. In one embodiment T comprised the value of 1,024, with from five to 10 phonemes being utilized. n is set to one at 402 and Fourier transformation is accomplished on the nth phoneme utilizing the Cooley-Tukey algorithm according to the formula:
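The transformation at 404 may be sketched in software as a radix-2 decimation-in-time form of the Cooley-Tukey algorithm. The sketch assumes the common forward convention, in which the kth output is the sum over t of x(t) multiplied by exp(-2 pi j k t / T), with no scaling; the sign and scaling conventions are assumptions and are not taken from the disclosure. The value T = 1,024 is a power of two, as this form of the algorithm requires.

```python
import cmath

def fft(x):
    """Radix-2 Cooley-Tukey transform of a sequence of power-of-two length."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed points
    odd = fft(x[1::2])   # transform of the odd-indexed points
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the two half-length transforms.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

A unit impulse, for example `fft([1, 0, 0, 0])`, transforms to a flat spectrum in which every point is equal to one.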

The Fourier transform of the nth phoneme is smoothed at 406 according to the formula:

wherein $i = a_n, a_n + x, \ldots, b_n$, and $H(m)$ is real. This involves convolving the real function $H(m)$ with $\Phi_n'(i)$ such that only frequencies of interest are examined. It will be noted that the frequencies $a_n$ and $b_n$ may differ for different phonemes and for various applications desired.
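The smoothing step may be sketched as a discrete convolution evaluated only at the frequencies of interest. The weights used for the real function H(m), the centering of those weights and the function name are assumptions made for illustration.

```python
def smooth_band(spectrum, weights, a, b, step=1):
    """Convolve `spectrum` with real `weights` at indices a, a+step, ..., b.

    `weights` is assumed to have odd length and to be centered on the
    output index. Points falling outside the spectrum are treated as zero.
    """
    half = len(weights) // 2
    smoothed = []
    for i in range(a, b + 1, step):
        acc = 0j
        for m, h in enumerate(weights):
            j = i - (m - half)  # convolution index
            if 0 <= j < len(spectrum):
                acc += h * spectrum[j]
        smoothed.append(acc)
    return smoothed
```

For example, the weights `[0.25, 0.5, 0.25]` leave a flat spectrum unchanged in the interior of the band while averaging out point-to-point fluctuations.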

At 408, a spectral estimate of the nth phoneme is formed according to the formula:

$$\Phi_{nn}(i) = \Phi_n(i)\,\Phi_n^{*}(i) \qquad (3)$$

wherein * denotes the complex conjugate and $i = a_n, a_n + x, \ldots, b_n$.
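Equation 3 may be expressed directly in software. The function name is hypothetical; the product of a complex value and its conjugate is real, so only the real part is kept.

```python
def spectral_estimate(smoothed):
    """Multiply each smoothed spectral point by its complex conjugate."""
    return [(value * value.conjugate()).real for value in smoothed]
```

For the two points `3+4j` and `1j`, the estimate is `[25.0, 1.0]`, the squared magnitudes of the inputs.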

This step involves multiplying the smoothed function by its complex conjugate. The spectral estimate $\Phi_{nn}(i)$ is stored and n is incremented at 410. A decision is made at 412 as to whether or not n = N. If not, the subsequent phonemes are Fourier transformed at 404 and the cycle is continued until all phonemes have been processed. At 414, the reference spectra $\Phi_{nn}^{\mathrm{REF}}(i)$ are called from memory for the particular individual to be verified. The observed spectra are then compared at 416 to the reference spectra in the following manner:

This involves computing the Euclidean distance between corresponding vectors in the $\Phi_{nn}(i)$ multidimensional feature space, whose coordinates are the energy densities at the particular frequencies of each of the phoneme spectra. The square of the Euclidean distance, or another suitable non-negative single-valued function of the Euclidean distance, is compared at 418 against a predetermined threshold value. This value is arbitrary and is defined by previously made experiments on large portions of the population. The threshold may, in some instances, be unique for each individual and may be stored within the computer. If the squared distance is greater than the threshold value, a "not verified" signal is transmitted at 420. If the squared distance is less than or equal to the threshold, a "verify" signal is transmitted at 422 in the manner previously described.
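The comparison at 416 and the decision at 418 may be sketched as follows, assuming that the distance is a plain sum of squared differences taken over all phonemes and all frequencies of interest. The function names and the nested-list layout are hypothetical.

```python
def squared_distance(observed, reference):
    """Sum of squared differences between observed and reference spectra.

    `observed[n][i]` and `reference[n][i]` are the energy densities of
    phoneme n at frequency index i.
    """
    if len(observed) != len(reference):
        raise ValueError("phoneme counts differ")
    total = 0.0
    for obs_n, ref_n in zip(observed, reference):
        if len(obs_n) != len(ref_n):
            raise ValueError("spectrum lengths differ")
        for o, r in zip(obs_n, ref_n):
            total += (o - r) ** 2
    return total

def verify(observed, reference, threshold):
    """Return True for a verify decision, False for not verified."""
    return squared_distance(observed, reference) <= threshold
```

A per-individual threshold, where one is stored, would simply be passed as the `threshold` argument for that individual.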

In the preferred embodiment of the invention, the reference spectra stored within the computer are updated at 424 in accordance with the results of the decision at 418. This is accomplished according to the following:

wherein

$i = a_n, \ldots, b_n$

n = 1,2,...,N

P = number of verifications.

The stored reference data is then changed to compensate for changes in voices due to age and the like. In some instances, it may be desirable to track changes in voices faster by utilizing only the most recent past, as for instance six months, when updating the reference spectra at 424.
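One way to realize such an update is a running average over the P previous verifications. This form is an assumption made for illustration and is not necessarily the update formula of the disclosure; the function name and the nested-list layout are hypothetical.

```python
def update_reference(reference, observed, p):
    """Fold a newly verified set of spectra into the stored reference.

    `p` is the number of verifications already represented in `reference`.
    Each stored point becomes the mean over p + 1 observations.
    """
    if p < 0:
        raise ValueError("p must be non-negative")
    return [
        [(p * r + o) / (p + 1) for r, o in zip(ref_n, obs_n)]
        for ref_n, obs_n in zip(reference, observed)
    ]
```

Tracking voice changes faster, as suggested above, would correspond to holding `p` at a small fixed value so that recent utterances carry more weight than older ones.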

Whereas the present invention has been described with respect to specific embodiments thereof, it will be understood that various changes and modifications will be suggested to one skilled in the art, and it is intended to encompass such changes and modifications as fall within the scope of the appended claims.

* * * * *