U.S. patent application number 09/088996 was filed with the patent office on 1998-06-02 and published on 2001-08-23 for voice recognition apparatus and recording medium having voice recognition program recorded therein.
Invention is credited to ONISHI, TAKAFUMI, TAKAHASHI, HIDETAKA.
Publication Number | 20010016815 |
Application Number | 09/088996 |
Family ID | 27279504 |
Filed Date | 1998-06-02 |
United States Patent
Application |
20010016815 |
Kind Code |
A1 |
TAKAHASHI, HIDETAKA ; et
al. |
August 23, 2001 |
VOICE RECOGNITION APPARATUS AND RECORDING MEDIUM HAVING VOICE
RECOGNITION PROGRAM RECORDED THEREIN
Abstract
The present invention causes a computer to read a voice recognition
program from a first recording medium and voice data from a second
recording medium, and causes a CPU in the computer to recognize the
voice represented by the read voice data according to the voice
recognition program, convert the result of voice recognition into
text data, and display the converted text data on a display unit.
Also included is a check mark button used
by a speaker to designate a portion of voice data, which is input
through a microphone, corresponding to an unnecessary word or the
like. The portion of the voice data in which a check mark is
inscribed is not regarded as an object of voice recognition. Only
the other portion of the voice data in which the check mark is not
inscribed is regarded as an object of voice recognition, and voice
recognition is thus carried out. Furthermore, the sound level of a
voiceful portion of voice data is rated. The gain of the voice data
is adjusted according to the rated level. On the basis of the voice
data whose sound level has been adjusted, voice recognition is
carried out.
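The check mark handling described in the abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each check mark is available as a half-open sample interval; the `CheckMark` type and both function names are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class CheckMark:
    """Hypothetical check mark: covers the sample interval [start, end)."""
    start: int
    end: int


def unmarked_segments(num_samples: int,
                      marks: Sequence[CheckMark]) -> List[Tuple[int, int]]:
    """Return the half-open intervals not covered by any check mark."""
    segments: List[Tuple[int, int]] = []
    cursor = 0  # first sample not yet assigned to a segment or a mark
    for mark in sorted(marks, key=lambda m: m.start):
        # Clamp the mark to the recording and skip empty or inverted marks.
        start = max(0, min(mark.start, num_samples))
        end = max(0, min(mark.end, num_samples))
        if end <= start:
            continue
        if start > cursor:
            segments.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < num_samples:
        segments.append((cursor, num_samples))
    return segments


def samples_for_recognition(samples: Sequence[int],
                            marks: Sequence[CheckMark]) -> List[int]:
    """Keep only the samples that lie outside every check-marked interval."""
    kept: List[int] = []
    for start, end in unmarked_segments(len(samples), marks):
        kept.extend(samples[start:end])
    return kept
```

Only the samples returned by `samples_for_recognition` would be handed to the recognizer, so the marked portions never need to be modeled as unnecessary words.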
Inventors: |
TAKAHASHI, HIDETAKA; (TOKYO,
JP) ; ONISHI, TAKAFUMI; (TOKYO, JP) |
Correspondence
Address: |
VOLPE AND KOENIG, P.C.
SUITE 400, ONE PENN CENTER
1617 JOHN F. KENNEDY BOULEVARD
PHILADELPHIA
PA
19103
US
|
Family ID: |
27279504 |
Appl. No.: |
09/088996 |
Filed: |
June 2, 1998 |
Current U.S.
Class: |
704/235 ;
704/E15.045 |
Current CPC
Class: |
G10L 15/26 20130101 |
Class at
Publication: |
704/235 |
International
Class: |
G10L 015/26 |
Foreign Application Data
Date | Code | Application Number |
Jun 6, 1997 | JP | H9-149729 |
Jan 23, 1998 | JP | H10-011631 |
Jan 23, 1998 | JP | H10-011632 |
Claims
What is claimed is:
1. A voice recognition apparatus for recognizing voice within a
programmed computer, comprising: a voice data reading means for
reading voice data from a voice data recording medium in which the
voice data is recorded; a voice recognition means for recognizing
voice represented by the voice data and converting it into text
data; and a display means for displaying the text data.
2. A voice recognition apparatus according to claim 1, wherein
voice data recorded in said voice data recording medium is
compressed digital voice data.
3. A recording medium having a voice recognition program recorded
therein, wherein said voice recognition program causes a computer
to: read voice data from a voice data recording medium in which the
voice data is recorded; recognize voice represented by the voice
data so as to convert it into text data; and display the text
data.
4. A recording medium having a voice recognition program recorded
therein according to claim 3, wherein said voice recognition
program further causes the computer to recognize in voice or
voice-recognize only a given number of words and convert them into
text data at intervals of a given time when causing the computer to
recognize voice represented by the voice data and convert it into
text data.
5. A recording medium having a voice recognition program recorded
therein according to claim 3 or 4, wherein said voice recognition
program further causes the computer to voice-recognize only a given
number of words starting at a given position in said voice data
recording medium having voice data recorded therein and to convert
them into text data when causing the computer to recognize voice
represented by the voice data and convert it to text data.
6. A recording medium having a voice recognition program recorded
therein, wherein said voice recognition program causes a computer
to: read voice data from a voice data recording medium in which the
voice data is recorded; recognize voice represented by the voice
data so as to detect a given word; and indicate the positions of
the given word.
7. A recording medium having a voice recognition program recorded
therein according to claim 6, wherein said voice recognition
program further causes the computer to create an index mark at the
positions of the given word in said voice data recording medium
having the voice data recorded therein after causing the computer
to recognize voice represented by the voice data and detect the
given word.
8. A recording medium having a voice recognition program recorded
therein according to claim 7, wherein said voice recognition
program further causes the computer to reproduce voice data
starting at a given position in said voice data recording medium
having the voice data recorded therein after causing the computer
to indicate the positions of the given word.
9. A recording medium having a voice recognition program recorded
therein, wherein said voice recognition program causes a computer
to: read voice data from a voice data recording medium in which the
voice data is recorded; recognize voice represented by the voice
data so as to convert it into text data; display the text data;
enable designation of at least part of the text data using a
designation input means; and delete a portion of the voice data
corresponding to a portion of the text data designated using said
designation input means from said voice data recording medium, and
cancel display of the designated portion of the text data.
10. A recording medium having a voice recognition program recorded
therein, wherein said voice recognition program causes a computer
to: read voice data from a voice data recording medium in which the
voice data is recorded; recognize voice represented by the voice
data so as to convert it into text data; acquire position
information of positions in said voice data recording medium, at
which portions of the voice data corresponding to words of the text
data are recorded, in one-to-one correspondence with the words;
display the text data; enable designation of at least part of the
text data using a designation input means; acquire position
information of positions in said voice data recording medium, at
which a corresponding portion of the voice data is recorded,
according to a word contained in a portion of the text data
designated using said designation input means; and delete the
corresponding portion of the voice data from said voice data
recording medium having the voice data recorded therein on the
basis of the position information, and cancel display of the
designated portion of the text data.
11. A voice recognition apparatus, comprising: a voice data reading
means for reading voice data from a voice data recording medium in
which the voice data is recorded; a detecting means for detecting a
check mark that is appended to the voice data and distinguishes an
interval within the voice data; a voice recognition means for not
recognizing voice represented by a portion of the voice data
associated with the given check mark but recognizing voice
represented by the other portion of the voice data; and a display
means for displaying the result of recognition performed by said
voice recognition means.
12. A voice recognition apparatus according to claim 11, wherein
the check mark is recorded by a voice recording apparatus
including: a voice data input means for inputting voice data; an
interval designating means enabling designation of a desired
interval within the voice data input by said voice data input
means; a recording means for appending a check mark, which
distinguishes the interval designated using said interval
designating means, to the voice data and recording the voice data
in a voice data recording medium; and a recording medium attaching
means for use in freely detachably attaching said voice data
recording medium.
13. A recording medium having a voice recognition program recorded
therein, wherein said voice recognition program causes a computer
to: read voice data from a voice data recording medium in which the
voice data is recorded; detect a check mark that is appended to the
voice data and distinguishes an interval within the voice data; not
recognize voice represented by a portion of the voice data
associated with the given check mark but recognize voice
represented by the other portion of the voice data; and display the
result of voice recognition.
14. A voice recognition apparatus, comprising: a voice data reading
means for reading voice data from a voice data recording medium in
which the voice data is recorded; a level adjusting means for
adjusting the sound level of the voice data read by said voice data
reading means according to a given procedure; a voice recognizing
means for recognizing voice represented by the voice data whose
sound level has been adjusted by said level adjusting means; and a
display means for displaying the result of recognition performed by
said voice recognizing means.
15. A voice recognition apparatus, comprising: a voice data reading
means for reading voice data from a voice data recording medium in
which the voice data is recorded; a voice rating means for rating
the voice data read by said voice data reading means as voiceful
portions and voiceless portions; a level adjusting means for
adjusting the sound level of the voice data read by said voice data
reading means on the basis of absolute values of amplitudes of
voice signals of voice data items rated as the voiceful portions by
said voice rating means; a voice recognizing means for inputting
the voice data whose sound level has been adjusted by said level
adjusting means, and recognizing voice; and a display means for
displaying the result of recognition performed by said voice
recognizing means.
16. A voice recognition apparatus according to claim 15, further
comprising a minimum value calculating means for calculating a
minimum value of an energy level of voice data of a given interval,
wherein a criterion of said voice rating means is set on the basis
of the minimum value calculated by said minimum value calculating
means.
17. A voice recognition apparatus, comprising: a voice data reading
means for reading voice data from a voice data recording medium in
which the voice data is recorded; a voice rating means for rating
the voice data read by said voice data reading means as voiceful
portions and voiceless portions; an averaging means for averaging
absolute values of voice data items rated as the voiceful portions
by said voice rating means; a gain calculating means for
calculating a gain on the basis of the average value; a multiplying
means for multiplying the voice data by the gain; a voice
recognizing means for recognizing voice represented by the voice
data multiplied by the gain; and a display means for displaying the
result of recognition performed by said voice recognizing
means.
18. A voice recognition apparatus, comprising: a voice data reading
means for reading voice data of a desired file from a voice data
recording medium in which voice data digitized and divided into
frames is recorded in units of a file; a voice rating means for
rating the voice data read by said voice data reading means as
voiceful frames and voiceless frames; an averaging means for
averaging absolute values of voice data items in frames rated as
the voiceful frames by said voice rating means; a gain calculating
means for calculating a gain on the basis of the average value; a
multiplying means for multiplying the voice data by the gain; a
voice recognizing means for recognizing voice represented by the
voice data multiplied by the gain; and a display means for
displaying the result of recognition performed by said voice
recognizing means.
19. A recording medium having a voice recognition program recorded
therein, wherein said voice recognition program causes a computer
to: read voice data from a voice data recording medium in which the
voice data is recorded; adjust the sound level of the read voice
data; recognize voice represented by the voice data whose sound
level has been adjusted; and display the result of voice
recognition.
20. A recording medium having a voice recognition program recorded
therein, wherein said voice recognition program causes a computer
to: read voice data from a voice data recording medium in which the
voice data is recorded; rate the read voice data as voiceful
portions and voiceless portions; adjust the sound level of the read
voice data on the basis of the absolute values of voice data items
rated as the voiceful portions according to a given procedure;
recognize voice represented by the voice data whose sound level has
been adjusted; and display the result of voice recognition.
21. A recording medium having a voice recognition program recorded
therein, wherein said voice recognition program causes a computer
to: read voice data from a voice data recording medium in which the
voice data is recorded; rate the read voice data as voiceful
portions and voiceless portions; average absolute values of voice
data items rated as the voiceful portions; calculate a gain on the
basis of the average value; multiply the voice data by the gain;
input the voice data multiplied by the gain so as to recognize
voice; and display the result of voice recognition.
22. A voice recognition apparatus according to claim 1, wherein
said voice recognition apparatus includes an attachment permitting
attachment of said voice data recording medium.
23. A voice recognition apparatus according to claim 22, wherein
said voice data recording medium is attached to said attachment via
an adaptor.
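The level adjustment recited in claims 15 to 18 and 21 (rating frames as voiceful or voiceless, averaging the absolute values of the voiceful data, calculating a gain, and multiplying the voice data by the gain) can be sketched in Python as follows. The frame size, threshold factor, energy floor, and target level are assumed values chosen for illustration, and the rule "voiceful if the frame energy exceeds a multiple of the minimum frame energy" is only one possible reading of the criterion in claim 16.

```python
from typing import List, Sequence

FRAME_SIZE = 4           # samples per frame (assumed, illustration only)
THRESHOLD_FACTOR = 4.0   # multiple of the minimum frame energy (assumed)
ENERGY_FLOOR = 1.0       # guards against an all-zero minimum frame (assumed)
TARGET_LEVEL = 8000.0    # desired mean absolute amplitude (assumed)
PCM_MAX = 32767          # upper limit of 16-bit signed PCM


def frame_energy(frame: Sequence[int]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def normalize_level(samples: Sequence[int]) -> List[int]:
    """Scale the samples so that voiceful frames reach TARGET_LEVEL on average."""
    frames = [samples[i:i + FRAME_SIZE]
              for i in range(0, len(samples), FRAME_SIZE)]
    if not frames:
        return []
    energies = [frame_energy(f) for f in frames]
    # Voiceful/voiceless criterion derived from the minimum energy level.
    threshold = max(THRESHOLD_FACTOR * min(energies), ENERGY_FLOOR)
    voiceful = [abs(s)
                for f, e in zip(frames, energies) if e > threshold
                for s in f]
    if not voiceful:
        return list(samples)  # nothing rated voiceful: leave the level alone
    average = sum(voiceful) / len(voiceful)
    gain = TARGET_LEVEL / average
    # Multiply every sample by the gain and clip to the 16-bit PCM range.
    return [max(-PCM_MAX - 1, min(PCM_MAX, round(s * gain))) for s in samples]
```

Because the gain is computed from voiceful frames only, long silent stretches in a recording do not pull the estimated level down.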
Description
BACKGROUND OF THE INVENTION
[0001] 1. Field of the Invention
[0002] The present invention relates to a voice recognition
apparatus and a recording medium having a voice recognition program
recorded therein. More particularly, this invention is concerned
with a voice recognition apparatus for recognizing voice data, and
a recording medium in which a voice recognition program causing a
computer to recognize voice data is recorded.
[0003] 2. Description of the Related Art
[0004] In recent years, research and development of a voice
recognition technology has been undertaken in earnest. A
technological means capable of recognizing voice in real time has
been proposed. This kind of technology has been adapted to various
kinds of products or usages, for example, reservation of tickets by
telephone or voice commanding within car navigation.
[0005] Along with a recent breakthrough in voice recognition
technology and improvement in performance of personal computers, a
technology for documenting voice input through a microphone
connected to a personal computer by recognizing the voice within
application software running in the personal computer, and
displaying the document has been developed.
[0006] An example of a software package enabling voice recognition
is a product "Voice Type 3.0 for Windows 95" released recently by
IBM Ltd. This product converts voice input through a microphone
into text data in real time and enjoys a considerably high
recognition ratio.
[0007] However, the application software accepts voice data only as
real-time input through a microphone. An already existing voice file
cannot be recognized directly.
[0008] One object of development of the aforesaid voice recognition
technology is to realize a so-called voice word processor or a
dictation system for automatically creating a document on the basis
of voice data input by performing dictation, and displaying the
document in a screen or the like.
[0009] Conventionally, the contents of a document to be created are
dictated and temporarily recorded by a recording apparatus such as a
tape recorder, and a secretary, typist, or the like then reproduces
the dictated contents and documents them using a documentation
apparatus such as a typewriter or word processor. This style has
been generally adopted as one form of effective utilization of a
recording apparatus such as a tape recorder.
[0010] For such dictation recording, a technique of appending an
index mark or end mark to voice data so as to give instructions to a
secretary or typist has been known in the past. In the prior art of
appending such a mark, the mark designates a single point in the
voice data; a desired region of the voice data cannot be designated
as an interval.
[0011] For the foregoing form of utilization in which a recording
apparatus is used for dictation, a technology for automatically
converting the recorded contents into a document has long been in
great demand.
[0012] In actual dictation, words irrelevant to the contents to be
conveyed may be included. For example, when written sentences are
recited, an incorrectly uttered word or a meaningless word such as
"Ah" or "Well" (hereinafter an unnecessary word) may be included,
in some cases frequently.
[0013] In this case, the performance of voice recognition
deteriorates. This leads to a drawback that a document displayed in
a screen contains many mistakes. A technology for constructing a
dictation system by taking account of the above unnecessary words
and creating language models that cover all words including the
unnecessary words and that are intended to be used for voice
recognition has been proposed in the past.
[0014] For example, according to Japanese Unexamined Patent
Publication No. 7-5893, there is provided a voice recognition
apparatus comprising: a standard pattern memory means for storing
standard patterns; an unnecessary word pattern memory means for
storing patterns of unnecessary words; a word spotting means for
spotting as a word or word-spotting a standard pattern stored in
the standard pattern memory means or a pattern of an unnecessary
word stored in the unnecessary word pattern memory means on the
basis of input voice, and outputting a corresponding interval and
score; a producing means for hypothesizing the contents of uttered
voice and producing a representation of the meaning; and an
analyzing means for analyzing the result of word-spotting, which is
performed by the word spotting means, on the basis of the
representation of the meaning of the hypothesis produced by the
producing means. The analyzing means allocates a score resulting
from word-spotting performed on the pattern of an unnecessary word
to remaining intervals, of which corresponding standard patterns or
patterns of an unnecessary word have not been word-spotted, among
all the intervals of data items constituting the voice. The result
of word-spotting performed by the word spotting means is then
analyzed.
[0015] However, the voice recognition apparatus described in the
Japanese Unexamined Patent Publication No. 7-5893 has difficulty in
carrying out practical processing within an existing computer
(especially a computer of a personal level) because the data size
of language models becomes enormous.
[0016] With currently commercialized products, a speaker must take
care not to utter an unnecessary word or the like, and therefore
cannot help feeling awkward.
[0017] For improving the performance of voice recognition, it is
required that the sound level of input voice is proper. Currently,
it is hard to guarantee a high recognition ratio over a wide range
of sound levels from a low level to a high level. A system is
therefore designed to provide a maximum recognition ratio relative
to an average sound level of voice.
[0018] In a voice recognition apparatus of a mode in which voice is
input through a microphone as mentioned above, a sound-level meter
for indicating a sound level of voice is displayed in, for example,
a screen or the like so that a speaker himself/herself can manage
his/her sound level of voice properly.
[0019] As an example of this technology, Japanese Unexamined Patent
Publication No. 5-231922 describes a sound pressure level display
for a voice recognition apparatus. The display comprises a first
sound receiver for receiving a voice signal, a second sound receiver
for receiving a noise whose level is close to that of the voice
signal received by the first sound receiver, a sound pressure level
ratio calculating means for calculating a ratio of a sound pressure
level of a voice signal input to the first sound receiver to a sound
pressure level of a noise input to the second sound receiver, and a
display means for displaying the ratio of sound pressure levels
calculated by the sound pressure level ratio calculating means.
[0020] However, it is annoying for a speaker to manage his/her own
voice so that the sound level will become proper. There is
therefore an increasing demand for a user-friendly voice
recognition apparatus. Moreover, since the sound level of input
voice cannot be detected using already recorded voice data, the
technology disclosed in the Japanese Unexamined Patent Publication
No. 5-231922 cannot be adapted as it is. It cannot be judged
whether or not the sound level of voice data is suitable for voice
recognition. Besides, since the sound pressure level display is not
provided with a facility for adjusting a sound level of voice
autonomously, a voice recognition ratio may vary abruptly depending
on a sound level indicated by recorded voice data.
OBJECTS AND SUMMARY OF THE INVENTION
[0021] The first object of the present invention is to provide a
voice recognition apparatus for recognizing voice represented by
voice data recorded in a given recording medium and a recording
medium in which a voice recognition program is recorded.
[0022] The second object of the present invention is to provide a
voice recognition apparatus capable of treating an unnecessary word
or the like contained in voice without the need of especially fast
processing, and a recording medium in which a voice recognition
program is recorded.
[0023] The third object of the present invention is to provide a
voice recognition apparatus capable of recognizing voice on a
stable basis irrespective of a sound level indicated by recorded
voice data, and a recording medium in which a voice recognition
program is recorded.
[0024] Briefly, a voice recognition apparatus in accordance with
the present invention for recognizing voice within a programmed
computer comprises a voice data reading means for reading voice
data from a voice data recording medium in which the voice data is
recorded, a voice recognizing means for recognizing voice
represented by the voice data so as to convert the voice into text
data, and a display means for displaying the text data.
[0025] A recording medium in accordance with the present invention
having a voice recognition program recorded therein is used to run
the voice recognition program in a computer, whereby the voice
recognition program causes the computer to read voice data from a
voice data recording medium in which the voice data is recorded,
recognize voice represented by the voice data so as to convert the
voice into text data, and display the text data.
[0026] These objects and advantages of the present invention will
become further apparent from the following detailed
explanation.
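The read, recognize, and display flow summarized in paragraphs [0024] and [0025] can be sketched in Python as below. This is only an illustration under stated assumptions: the three callables stand in for the voice data reading means, the voice recognizing means, and the display means, and the function name and signatures are hypothetical rather than part of the disclosed program.

```python
from typing import Callable, Iterable, List


def run_voice_recognition(
    read_voice_data: Callable[[], bytes],
    recognize: Callable[[bytes], Iterable[str]],
    display: Callable[[str], None],
) -> str:
    """Read recorded voice data, convert it to text, and display the text."""
    voice_data = read_voice_data()               # voice data reading means
    words: List[str] = list(recognize(voice_data))  # voice recognizing means
    text = " ".join(words)                       # result as text data
    display(text)                                # display means
    return text


if __name__ == "__main__":
    # Stub components; a real recognizer and file reader would replace these.
    run_voice_recognition(
        read_voice_data=lambda: b"\x00\x01",
        recognize=lambda data: ["hello", "world"],
        display=print,
    )
```

Passing the three stages as callables keeps the sketch independent of any particular recording medium or recognition engine.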
BRIEF DESCRIPTION OF THE DRAWINGS
[0027] FIG. 1 is a block diagram schematically showing the
configuration of a computer that is the first embodiment of a voice
recognition apparatus in accordance with the present invention;
[0028] FIG. 2 is a flowchart describing the first example (first
voice recognition program) of a voice recognition program recorded
in a recording medium in accordance with the present invention
having the voice recognition program recorded therein, and run in
the first embodiment;
[0029] FIG. 3 is a diagram showing an example of display appearing
when voice recognition application software read from the first
recording medium is activated in the computer of the first
embodiment, or a main screen used to reproduce compressed voice
data;
[0030] FIG. 4 is a diagram showing an example of a screen in which
text data is displayed when the voice recognition application
software read from the first recording medium is activated in the
computer of the first embodiment;
[0031] FIG. 5 is a diagram showing an example of a dialog box
screen used to set a time interval between voice recognitions and
the number of displayed words when a given number of words are
recognized at intervals of a given time since the start of a file
subjected to voice recognition, after the voice recognition
application software read from the first recording medium is
activated in the computer of the first embodiment;
[0032] FIG. 6 is a diagram showing an example of a screen in which
a given number of words recognized at intervals of a given time
since the start of a file subjected to voice recognition after the
voice recognition application software read from the first
recording medium is activated in the computer of the first
embodiment is displayed;
[0033] FIG. 7 is a flowchart describing a second example (second
voice recognition program) of a voice recognition program recorded
in a recording medium in accordance with the present invention
having the voice recognition program recorded therein, and run in
the first embodiment;
[0034] FIG. 8 is a flowchart describing a third example (third
voice recognition program) of a voice recognition program recorded
in a recording medium in accordance with the present invention
having the voice recognition program recorded therein, and run in
the first embodiment;
[0035] FIG. 9 is a diagram showing an example of a dialog box
screen used to set a word to be retrieved for voice recognition
when only a word that must be recognized in voice and contained in
a voice compressed file is recognized in voice after the voice
recognition application software read from the first recording
medium is activated in the computer of the first embodiment;
[0036] FIG. 10 is a flowchart describing a fourth example (fourth
voice recognition program) of a voice recognition program recorded
in a recording medium in accordance with the present invention
having the voice recognition program recorded therein, and run in
the first embodiment;
[0037] FIG. 11 is a flowchart describing a fifth example (fifth
voice recognition program) of a voice recognition program recorded
in a recording medium in accordance with the present invention
having the voice recognition program recorded therein, and run in
the first embodiment;
[0038] FIG. 12 is a conceptual diagram showing the overall
configuration of a dictation system of the second embodiment of the
present invention;
[0039] FIG. 13 is a block diagram showing the electrical
configuration of a digital recorder of the second embodiment;
[0040] FIG. 14 is a diagram showing a scene in which a check mark
button of the digital recorder is handled during dictation in the
second embodiment;
[0041] FIG. 15 is a diagram showing the format of data to be
recorded in a voice memory of a miniature card by means of the
digital recorder of the second embodiment;
[0042] FIG. 16 is a block diagram showing the electrical
configuration of a personal computer of the second embodiment;
[0043] FIG. 17 is a flowchart describing voice recognition carried
out in the personal computer of the second embodiment;
[0044] FIG. 18 is a diagram showing an overall flow of reading
voice data from a voice memory and recognizing voice which is
followed by the dictation system of the third embodiment of the
present invention;
[0045] FIG. 19 is a flowchart describing voice recognition carried
out by a dictation system of the third embodiment of the present
invention;
[0046] FIG. 20 is a flowchart describing the contents of processing
relevant to the judgment of voiceful or voiceless portions, which is
outlined in FIG. 19; and
[0047] FIG. 21 is a flowchart describing the contents of the gain
calculation outlined in FIG. 19.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0048] Referring to the drawings, embodiments of the present
invention will be described below.
[0049] FIG. 1 is a block diagram schematically showing the
configuration of a computer that is the first embodiment of a voice
recognition apparatus in accordance with the present invention.
[0050] A computer 1 consists, as shown in FIG. 1, mainly of: a
central processing unit (CPU) 1a responsible for control of the
whole computer 1; a first input unit 5 in which an external
recording medium (first recording medium 7) having a given program
recorded therein can be mounted freely; a first recording medium
driver 6, incorporated in the first input unit 5, for reading a
given program from the first recording medium 7 under the control
of the CPU 1a when the first recording medium 7 is mounted in the
first input unit 5; a second input/output unit 8 in which an
external recording medium (second recording medium 10) having given
voice data recorded therein can be mounted freely; a second
recording medium driver 9, incorporated in the second input/output
unit 8, for reading given voice data from, and writing given data
to, the second recording medium 10 under the control of the CPU
1a when the second recording medium 10 is mounted in the second
input/output unit 8; an operation unit 2 for inputting a given
instruction entered by a user; a display unit 3 serving as a
display means for displaying given data after given processing is
carried out by the CPU 1a; and a voice output unit 4 for outputting
produced voice after given processing is carried out by the CPU
1a.
[0051] The computer 1 is configured to run an operating system (OS)
capable of executing a plurality of application software programs
concurrently (multitasking). Hereinafter, a description will be made
on the assumption that the OS is installed in the computer 1.
[0052] The first recording medium 7 is a recording medium in which
a given voice recognition program is recorded. In this embodiment,
a portable recording medium such as a CD-ROM or floppy disk is
assumed as the recording medium.
[0053] Moreover, the second recording medium 10 is a voice data
recording medium in which given voice data is recorded. The second
recording medium 10 will be described below.
[0054] The second recording medium 10 is a recording medium in
which voice data acquired by an external solid-state recorder is
recorded. In this embodiment, a card-shaped recording medium that
is a flash memory is assumed.
[0055] In recent years, there has been an increasing demand for
flash memory. Digital solid-state recorders using flash memory as a
recording medium have been commercialized. Flash memory is available
in many types of card-shaped recording media. For example, a
memory card conformable to the PCMCIA standard, a miniature card
manufactured by Intel Corp., an SSDFC manufactured by Toshiba Co.,
Ltd., and a compact flash memory manufactured by SunDisk Co., Ltd.
are known.
[0056] In general, these card-shaped flash memories are connected
to a personal computer via an adaptor or the like, and capable of
transferring given data. Many of the existing card-shaped memories
have a storage capacity ranging from 2M bytes to 8M bytes.
Moreover, the digital solid-state recorders currently on the market
include those capable of recording sound in a card having a storage
capacity of 2M bytes for 20 min. to 40 min.
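The recording times quoted above imply a fairly low average bit rate, which the following Python arithmetic makes explicit. It assumes that "2M bytes" means 2 x 1024 x 1024 bytes and that the whole card is used for voice data; the function name is hypothetical.

```python
CARD_BYTES = 2 * 1024 * 1024  # "2M bytes", read here as 2 MiB (assumption)


def average_bit_rate(card_bytes: int, minutes: float) -> float:
    """Average bits per second that fill card_bytes in the given time."""
    return card_bytes * 8 / (minutes * 60)


for minutes in (20, 40):
    rate = average_bit_rate(CARD_BYTES, minutes)
    print(f"{minutes} min -> {rate / 1000:.1f} kbit/s")
# prints:
# 20 min -> 14.0 kbit/s
# 40 min -> 7.0 kbit/s
```

For comparison, uncompressed 16-bit PCM at an assumed sampling rate of 8 kHz would need 128 kbit/s, so such recording times are only possible with compressed voice data.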
[0057] The solid-state recorders convert an analog signal input
through a microphone into digital data such as PCM (pulse code
modulation) data, compress the PCM data according to an encoding
algorithm based on ADPCM or CELP, and record the compressed data in
a flash memory card. The thus recorded data can be read directly by
a personal computer via an adaptor.
[0058] The computer 1 of this embodiment reads voice data from the
flash memory card (second recording medium 10) mounted as mentioned
above.
[0059] Next, a voice recognition operation for recognizing voice
represented by voice data which is carried out by the computer 1
will be described.
[0060] To begin with, a user mounts a recording medium (first
recording medium 7), in which a given voice recognition program is
recorded, in the first input unit 5 of the computer 1. The computer
1 reads a given voice recognition program, which is application
software, from the connected first recording medium 7 into an
internal memory, which is not shown, via the first recording medium
driver 6. This causes the CPU 1a to control a voice recognition
operation following the program.
[0061] Now, the voice recognition operation to be carried out
according to the voice recognition program will be described.
[0062] FIG. 2 is a flowchart describing the first example (first
voice recognition program) of a voice recognition program recorded
in the recording medium in accordance with the present invention
having the voice recognition program recorded therein.
[0063] When the second recording medium 10 is mounted in the
computer 1, the CPU 1a reads voice data from a voice compressed
file containing voice data compressed and recorded by an external
solid-state recorder (step S1). The first voice recognition program
stretches the compressed voice data into PCM data by reversing the
compression algorithm according to which the data was recorded by
the solid-state recorder (step S2). In other words, processing
identical to the reproduction performed by the solid-state recorder
is carried out by the computer 1 under the control of the first
voice recognition program.
[0064] The PCM data stretched at step S2 is subjected to voice
recognition (step S3). The voice-recognized data is converted into
text data (step S4), and the converted text data is displayed on a
display (display unit 3) (step S5).
This processing is continued until the voice-recognized data comes
to an end (step S6).
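The flow of steps S1 to S6 can be pictured with the following minimal
Python sketch. It is an illustration only, not the program of the
embodiment. The names transcribe_file, toy_decode, and toy_recognize
are hypothetical, and the decoder and recognizer are trivial
stand-ins for an ADPCM or CELP decoder and a real recognition engine.

```python
from typing import Callable, Iterable, Iterator, List


def transcribe_file(
    compressed_blocks: Iterable[bytes],
    decode: Callable[[bytes], List[int]],
    recognize: Callable[[List[int]], str],
) -> Iterator[str]:
    """Yield one piece of text per compressed block until the data ends."""
    for block in compressed_blocks:   # S1: read compressed voice data (loop ends at S6)
        pcm = decode(block)           # S2: stretch the block into PCM samples
        text = recognize(pcm)         # S3, S4: recognize and convert into text
        yield text                    # S5: the caller displays the text


# Hypothetical stand-ins so that the sketch runs on its own.
def toy_decode(block: bytes) -> List[int]:
    return list(block)


def toy_recognize(pcm: List[int]) -> str:
    return "".join(chr(value) for value in pcm)


result = list(transcribe_file([b"hello ", b"world"], toy_decode, toy_recognize))
print("".join(result))
```

Because nothing in the loop depends on wall-clock time, each block
may take as long as the recognizer needs.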
[0065] FIG. 3 shows an example of display appearing when the voice
recognition application software read from the first recording
medium 7 is activated in the computer 1 of this embodiment, or a
main screen used to reproduce voice data that is compressed data
representing voice.
[0066] FIG. 3 shows a main screen 11 in which the following are
displayed: a menu bar 12 used to select file-related handling or
editing-related handling; a tool button bar 13 presenting various
kinds of handling as easily discernible icons; a voice file list
box 14 which displays a list of information such as names of voice
files transferred from the second recording medium 10, recording
times, dates of recording, and priorities and in which a voice file
whose data is reproduced or voice-recognized is highlighted in
contrast with the other voice files; and a reproduction control 18
used to carry out processing such as replay, stop, fast forward, or
fast rewind.
[0067] The tool button bar 13 is provided with a voice recognition
tool button group 21 consisting of a voice recognition start button
22, word recognition button 23, and list display button 24.
[0068] Moreover, the reproduction control 18 is provided with a
current position-of-reproduction indicator slider 15, lines 16, and
an index search button 17.
[0069] In the main screen 11 shown in FIG. 3, when the voice
recognition start button 22 belonging to the voice recognition tool
button group 21 included in the tool button bar 13 is pressed,
voice recognition of a voice file highlighted in the voice file
list box 14 is started. A text editor shown in FIG. 4 is started
up. Recognized voice data is displayed as serial text data in the
editor screen.
[0070] Next, a processing operation of recognizing a given number
of words at intervals of a given time since the start of a file
subjected to voice-recognition and displaying a list of the words
will be described.
[0071] The list display button 24 belonging to the voice
recognition tool button group 21 is a button used to recognize a
certain number of words at intervals of a certain time since the
start of a file subjected to voice recognition, and display the
words in the form of a list.
[0072] When the list display button 24 is pressed, a dialog box
shown in FIG. 5 appears. A user is prompted to enter a time
interval in seconds, at which words will be recognized from the
start of the file (file subjected to voice recognition) highlighted
in the voice file list box 14, and the number of words to be
recognized and displayed. If the user wants to cancel the
processing, he/she presses a cancel button shown in FIG. 5. Control
is then returned to the main screen shown in FIG. 3.
[0073] When the user enters the setting of the time interval and
the setting of the number of words to be recognized and presses the
start button, the dialog box shown in FIG. 5 is closed and a list
box shown in FIG. 6 appears.
[0074] FIG. 7 is a flowchart describing the second example (second
voice recognition program) of a voice recognition program recorded
in a recording medium in accordance with the present invention
having the voice recognition program recorded therein. Herein, a
processing operation of recognizing a given number of words at
intervals of a certain time since the start of a file subjected to
voice recognition, and displaying the words in the form of a list
is described.
[0075] Specifically, when the user sets the time interval and the
number of words to be recognized, and then presses the start
button, voice data is first read from a file subjected to voice
recognition and recorded in the second recording medium 10 (step
S11). The second voice recognition program stretches the compressed
voice data in the same manner as the first voice recognition
program (step S12). If a word coincident with a time instant when
the set time has elapsed is detected (step S13), stretched PCM data
starting with the word is voice-recognized (step S14).
[0076] The voice-recognized data is converted into text data (step
S15), and the converted text data is, as shown in FIG. 6, displayed
by the given number of words on the display (display unit 3).
Specifically, in the list box shown in FIG. 6, display of a
position-of-reproduction time passed since the start of the
voice-recognized file and display of text data starting at the
position of reproduction are carried out sequentially by the number
of words set in the dialog box shown in FIG. 5. This processing is
terminated when data comes to an end (step S17).
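As a rough illustration of the list produced by the second voice
recognition program, the Python sketch below picks a given number of
words at each multiple of the set time interval. It assumes that
recognition has already yielded words paired with their start times,
and that the first entry is taken at time 0; both are assumptions of
this sketch. The name build_word_list is hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def build_word_list(
    words: List[Tuple[float, str]],
    interval_sec: float,
    num_words: int,
) -> List[Tuple[float, str]]:
    """Return (time, text) rows; words are (start time in sec, text), sorted by time."""
    if interval_sec <= 0 or num_words <= 0:
        raise ValueError("interval_sec and num_words must be positive")
    if not words:
        return []
    starts = [start for start, _ in words]
    rows: List[Tuple[float, str]] = []
    k = 0
    while k * interval_sec <= starts[-1]:
        t = k * interval_sec
        first = bisect_left(starts, t)            # first word starting at or after t
        chosen = words[first:first + num_words]   # the set number of words
        rows.append((t, " ".join(text for _, text in chosen)))
        k += 1
    return rows


sample = [(0.0, "the"), (0.5, "sky"), (1.2, "is"), (2.0, "blue"),
          (2.6, "and"), (4.1, "the"), (4.8, "ocean")]
rows = build_word_list(sample, interval_sec=2.0, num_words=2)
for t, text in rows:
    print(f"{t:6.1f} s  {text}")
```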
[0077] Next, a processing operation of recognizing voice started at
a given position in a file subjected to voice recognition will be
described.
[0078] When the position of reproduction indicated by the current
position-of-reproduction indicator slider 15 in the main screen 11
shown in FIG. 3 is changed, if the voice recognition start button
22 belonging to the voice recognition tool button group 21 is
pressed, voice recognition is started at the changed position of
reproduction. The result of voice recognition then appears in the
text editor screen shown in FIG. 4.
[0079] FIG. 8 is a flowchart describing the third example (third
voice recognition program) of a voice recognition program recorded
in a recording medium in accordance with the present invention
having the voice recognition program recorded therein, wherein a
processing operation of starting voice recognition at a given
position in a file subjected to voice recognition and displaying
the result is described.
[0080] Specifically, when a user changes the position of
reproduction indicated by the current position-of-reproduction
indicator slider 15 shown in FIG. 3, voice data is read from a file
subjected to voice recognition in the second recording medium (step
S21). The third voice recognition program stretches compressed
voice data in the same manner as the first voice recognition
program (step S22). If a word coincident with a given position is
detected (step S23), stretched PCM data starting with the word at
the given position is voice-recognized (step S24).
[0081] The voice-recognized data is converted into text data (step
S25), and the converted text data is displayed on the display
(display unit 3) (step S26). In other words, text data starting at
the given position set in the editor screen shown in FIG. 4 is
displayed. This processing is terminated when data comes to an
end.
[0082] Next, a processing operation of voice-recognizing a desired
word, which should be voice-recognized, among those contained in a
file subjected to voice recognition, and indicating the positions
of the desired word will be described.
[0083] The word recognition button 23 belonging to the voice
recognition tool button group 21 shown in FIG. 3 is a button for
use in voice-recognizing a desired word, which should be
voice-recognized, among those contained in a file subjected to
voice recognition, and indicating the positions of the desired
word. Specifically, when the word recognition button 23 is pressed,
only the word that should be voice-recognized is retrieved from a
voice-compressed file by carrying out voice recognition. Retrieved
locations are indicated with the lines 16 in the current
position-of-reproduction indicator slider 15 so that they can be
discerned at sight. The details will be described below.
[0084] When the word recognition button 23 is pressed, the dialog
box shown in FIG. 9 appears. With the dialog box, a user is
prompted to enter a specified word that should be recognized. To
cancel this processing, the user presses the cancel button. The
processing is then exited and control returns to the main screen
shown in FIG. 3.
[0085] FIG. 10 is a flowchart describing the fourth example (fourth
voice recognition program) of a voice recognition program recorded
in a recording medium in accordance with the present invention
having the voice recognition program recorded therein, wherein a
processing operation of voice-recognizing desired words alone,
which should be voice-recognized, among those contained in a file
subjected to voice recognition, and indicating the positions of the
desired words is described.
[0086] Specifically, after a desired word that should be recognized
is entered in the screen shown in FIG. 9 by a user, when the start
button is pressed, voice data is read from a file subjected to
voice recognition in the second recording medium (step S31). The
fourth voice recognition program stretches compressed voice data in
the same manner as the first voice recognition program (step S32).
Voice recognition is then started at the start of the selected
voice-compressed file (step S33).
[0087] Thereafter, when the word registered in the dialog box shown
in FIG. 9 is recognized from among those contained in the file
subjected to voice recognition (step S34), the positions of the
word are indicated with the lines 16 in the current
position-of-reproduction indicator slider 15 in the main screen 11
shown in FIG. 3. An index mark is inserted into the voice data item
coincident with each position. Every time the index search button
17 in the reproduction control 18 in the main screen 11 shown in
FIG. 3 is pressed, control skips sequentially to the next of the
positions indicated with the lines 16 (step S35 and step S36). This
facility is available not only when reproduction is stopped but
also while reproduction is under way.
[0088] When voice recognition has been completed up to the end of
the voice-compressed file, all the positions at which the
registered word is found are indicated with the lines 16 in the
current position-of-reproduction indicator slider 15.
[0089] This processing is terminated when data comes to an end
(step S37).
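The keyword indexing and index search described for the fourth voice
recognition program can be sketched in Python as follows. The sketch
assumes that recognition has produced words paired with their
positions of reproduction in seconds, which is a simplification. The
names find_index_marks and next_index are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def find_index_marks(words: List[Tuple[float, str]], keyword: str) -> List[float]:
    """Return the positions at which the registered word was recognized."""
    key = keyword.lower()
    return [position for position, text in words if text.lower() == key]


def next_index(marks: List[float], current: float) -> Optional[float]:
    """Return the first mark strictly after the current position, if any."""
    i = bisect_right(marks, current)
    return marks[i] if i < len(marks) else None


recognized = [(0.0, "the"), (0.5, "sky"), (2.0, "blue"),
              (4.1, "the"), (6.3, "blue")]
marks = find_index_marks(recognized, "blue")
print(marks)                    # positions to draw as lines on the slider
print(next_index(marks, 0.0))   # one press of the index search button
print(next_index(marks, 2.0))   # a second press
```

Since next_index depends only on the current position, it behaves the
same whether reproduction is stopped or under way.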
[0090] Next, a processing operation of deleting a portion of voice
data corresponding to a designated portion of text data from a file
subjected to voice recognition will be described.
[0091] FIG. 11 is a flowchart describing the fifth example (fifth
voice recognition program) of a voice recognition program recorded
in a recording medium in accordance with the present invention
having the voice recognition program recorded therein, wherein a
processing operation of deleting a portion of voice data
corresponding to a designated portion of text data from the second
recording medium 10 is described.
[0092] First, voice data is read from a file subjected to voice
recognition in the second recording medium 10 (step S41). The fifth
voice recognition program stretches compressed voice data in the
same manner as the first voice recognition program (step S42). The
stretched PCM data is voice-recognized (step S43).
[0093] The voice-recognized data is converted into text data (step
S44). Addresses in the second recording medium 10 associated with
words are detected and then listed (step S45). Table 1 indicates
the addresses in the second recording medium 10 allocated to an
example of text data "The sky is blue and the ocean is also
blue."
TABLE 1

                  Leading and last addresses
      Text word   in a recording medium
 1    the         3468H    3492H
 2    sky         3494H    3560H
 3    is          3580H    3600H
 4    blue        3610H    3620H
 5    and         3622H    3640H
 6    the         3692H    3699H
 7    ocean       3706H    3720H
 8    is          3724H    3736H
 9    also        3740H    3753H
10    blue        3760H    3770H
[0094] Thereafter, the above text data is kept displayed on the
display until the data comes to an end (step S46 and step S47).
[0095] When data comes to an end, it is judged whether or not the
text data should be deleted (step S48). If the data should be
deleted, a position of deletion is designated in the text data
(step S49). Addresses in the second recording medium 10 associated
with the designated position are retrieved from Table 1 (step
S50).
[0096] Thereafter, voice data is read from the second recording
medium 10 (step S51), and stretched (step S52). The portion of the
voice data defined by the addresses is deleted (step S53).
Thereafter, the voice data is compressed again (step S54) and then
overwritten (step S55).
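The lookup at step S50 and the deletion at step S53 can be
illustrated with the short Python sketch below, which uses the
addresses of Table 1. It is a simplified model under stated
assumptions: the recording medium is treated as a flat byte buffer,
the listed addresses are treated as inclusive byte offsets, and the
stretching and recompression of steps S52 and S54 are omitted. The
name delete_words is hypothetical.

```python
from typing import List, Tuple

# (text word, leading address, last address), copied from Table 1.
TABLE_1: List[Tuple[str, int, int]] = [
    ("the", 0x3468, 0x3492), ("sky", 0x3494, 0x3560),
    ("is", 0x3580, 0x3600), ("blue", 0x3610, 0x3620),
    ("and", 0x3622, 0x3640), ("the", 0x3692, 0x3699),
    ("ocean", 0x3706, 0x3720), ("is", 0x3724, 0x3736),
    ("also", 0x3740, 0x3753), ("blue", 0x3760, 0x3770),
]


def delete_words(data: bytes, table: List[Tuple[str, int, int]],
                 first_word: int, last_word: int) -> bytes:
    """Remove the bytes spanning rows first_word..last_word (1-based, inclusive)."""
    if not 1 <= first_word <= last_word <= len(table):
        raise ValueError("word numbers out of range")
    start = table[first_word - 1][1]   # leading address of the first word
    end = table[last_word - 1][2]      # last address of the last word
    return data[:start] + data[end + 1:]


# Dummy medium contents; delete the ninth word ("also").
medium = bytes(i % 256 for i in range(0x4000))
edited = delete_words(medium, TABLE_1, 9, 9)
print(len(medium) - len(edited), "bytes removed")
```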
[0097] In this embodiment, addresses are listed so that a position
of deletion in text data can be associated with a position in the
second recording medium. The present invention is not limited to
this mode. For example, times passed since the start of a file may
be recorded in the form of a list.
[0098] Conventionally, a CPU is required to exhibit great
processing capability because, when voice input through a
microphone is recognized directly, voice recognition must be
carried out in real time. According to the voice recognition
program of the first embodiment recorded in a recording medium to
be adapted to a computer, however, stretching of a
voice-compressed file and voice recognition need merely be
repeated. This provides the advantage that real-time processing is
not required and the CPU need not exhibit great processing
capability.
[0099] Moreover, since real-time processing is not required, there
is the advantage that an algorithm permitting voice recognition
with high precision can be created.
[0100] Furthermore, since the contents of a portion of a
voice-compressed file can be discerned at sight, what is recorded
at which position of reproduction can be grasped broadly.
[0101] Only a portion of an existing voice-compressed file which
should be converted into text data can be voice-recognized.
[0102] In addition, within an existing voice-compressed file,
control can be skipped immediately to the position of a word
serving as a keyword. A position of the word that should be
retrieved can thus be reached at once.
[0103] Furthermore, even after data is recorded, since a word can
be designated later and an index mark can be inscribed in the
recorded data, usefulness improves. Besides, even after data is
recorded, since an unnecessary word can be designated later and
deleted from the recorded data, an unsuccessful dictation can be
deleted easily.
[0104] In the computer 1 of the first embodiment, the first
recording medium 7 is an external recording medium. After a
recording medium having a given voice recognition program recorded
therein is mounted in the computer 1, the given voice recognition
program that is application software can be read from the recording
medium. The present invention is not limited to this mode.
Alternatively, any mode will do as long as a given voice
recognition program can be activated by working on the CPU 1a in
the computer.
[0105] For example, the computer 1 may be provided with a recording
medium having a voice recognition program recorded therein in
advance so that the voice recognition program can be read any
time.
[0106] FIGS. 12 to 17 relate to the second embodiment of the
present invention. FIG. 12 is a conceptual diagram showing the
overall configuration of a dictation system to which the present
invention is adapted.
[0107] The dictation system comprises, as shown in FIG. 12: a
digital recorder 26 that is a voice recording apparatus for
converting voice into an electric signal and producing voice data;
a miniature card 10A, freely detachably attached to the digital
recorder 26, serving as a voice data recording medium in which
voice data is recorded; a PC card adaptor 27 used to insert the
miniature card 10A into a PC card slot 9A (See FIG. 16) to be
described later for connection; and a personal computer 1A
including a display 3A serving as a display means, and a keyboard
2A and mouse 2B serving as an operation unit, and acting as a voice
recognition apparatus for processing voice data read from the
miniature card 10A through the PC card slot 9A according to a
control program 28 or a voice recognition program 29.
[0108] FIG. 13 is a block diagram showing the electrical
configuration of the digital recorder 26.
[0109] The digital recorder 26 comprises, as shown in FIG. 13: a
microphone 31 serving as a voice data input means for inputting
voice and converting it into an electric signal; a microphone
amplifier 32 for amplifying a voice signal sent from the microphone
31 to a proper level; a lowpass filter 33 for removing unnecessary
high-frequency components from the voice signal amplified by the
microphone amplifier 32; an A/D converter 34 for converting an
analog voice signal output from the lowpass filter 33 into digital
data; an encoder-decoder 35 for encoding (compressing) the
digitized voice signal during a recording operation, and decoding
(stretching) encoded data during a reproduction operation; a memory
control unit 36 serving as a recording means for controlling
recording or reproduction of voice information in or from a voice
memory 37, which will be described later, on the basis of address
information given by a system control unit 38 to be described
later; a voice memory 37 incorporated in the miniature card 10A
serving as a voice data recording medium and formed with, for
example, a semiconductor memory; a miniature card attachment 44
serving as a recording medium attaching means enabling the
miniature card 10A including the voice memory 37 to be freely
attached or detached to or from the digital recorder 26; a D/A
converter 39 for converting the digital voice signal output from
the encoder-decoder 35 into an analog signal; a lowpass filter 40
for removing unnecessary high-frequency components from a voice
signal converted into an analog form by the D/A converter 39; a
power amplifier 41 for amplifying an analog voice signal output
from the lowpass filter 40; a loudspeaker 42 for uttering sound
when driven by the power amplifier 41; an operation input unit 43
composed of various kinds of operation buttons including a check
mark button 43a (See FIG. 14) to be described later; and a system
control unit 38 that controls the digital recorder 26 including the
encoder-decoder 35, memory control unit 36, and voice memory 37 in
a centralized manner and that serves as a recording means to which
an output terminal of the operation input unit 43 is connected.
[0110] FIG. 14 is a diagram showing a scene in which the check mark
button of the digital recorder is handled during dictation.
[0111] The check mark button 43a serving as an interval designating
means of the operation input unit 43 is, as shown in FIG. 14,
located where it can be operated easily by the thumb of the hand
holding the digital recorder 26. The check mark button is pressed
in order to append a check mark, which indicates that an uttered
word is an unnecessary word, to voice data when an unnecessary word
or the like is uttered while the contents of a document to be
created are being dictated.
[0112] An unnecessary word or the like is uttered unconsciously.
The instant an unnecessary word is uttered, however, the speaker
can recognize the uttered word as an unnecessary word. Since the
check mark button 43a is located at a position enabling the speaker
to press it easily, a check mark can be appended readily when
necessary.
[0113] FIG. 15 is a diagram showing the format of data to be
recorded in the voice memory 37 in the miniature card 10A by the
digital recorder 26.
[0114] Each recording is managed in the form of a file. In each
file, information such as a date of recording and a recording time
is written as a file header. In the remaining area, data divided
into frames is written.
[0115] Moreover, each frame includes check mark information
indicating whether or not the check mark button 43a has been
pressed, and encoded voice data. The check mark information is
structured as, for example, a 1-bit flag. When the check mark
button 43a is pressed, the flag is set to "1." When the check mark
button 43a is not pressed, the flag is set to "0."
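One conceivable way to lay out such frames is sketched below in
Python. The byte layout is an assumption made for illustration: one
header byte whose least significant bit is the check mark flag,
followed by a fixed-size block of encoded voice data. The actual
format of FIG. 15 is not reproduced here, and the names Frame,
pack_frame, and unpack_frames are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    check_mark: bool   # True when the check mark button was pressed
    payload: bytes     # encoded (compressed) voice data


def pack_frame(frame: Frame) -> bytes:
    """Serialize one frame as a flag byte followed by the payload."""
    return bytes([1 if frame.check_mark else 0]) + frame.payload


def unpack_frames(data: bytes, payload_size: int) -> List[Frame]:
    """Split the frame area of a file into fixed-size frames."""
    size = 1 + payload_size
    if len(data) % size != 0:
        raise ValueError("data length is not a whole number of frames")
    return [
        Frame(bool(data[i] & 1), data[i + 1:i + size])
        for i in range(0, len(data), size)
    ]


original = [Frame(False, b"\x10\x20"), Frame(True, b"\x30\x40")]
blob = b"".join(pack_frame(f) for f in original)
restored = unpack_frames(blob, payload_size=2)
print(restored)
```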
[0116] FIG. 16 is a block diagram showing the electrical
configuration of the personal computer 1A.
[0117] The personal computer 1A carries out voice reproduction,
information display, and the like according to the control program
28, carries out documentation according to the voice recognition
program 29, and also carries out various kinds of processing
according to the other various kinds of programs. The personal
computer 1A comprises: a CPU 51 serving as a detecting means, a
level adjusting means, a voice recognizing means, a voice rating
means, a minimum value calculating means, a gain value calculating
means, a multiplying means, and an averaging means; a main memory
52 serving as a recording medium offering a work area for the CPU
51; an internal recording medium 53 serving as a recording medium
which is formed with, for example, a hard disk or floppy disk and
in which the control program 28 and voice recognition program 29
are recorded; an external port 54 used to connect the personal
computer to various kinds of external equipment; an interface 55
used to connect the display 3A to the personal computer; an
interface 56 used to connect the keyboard 2A or mouse 2B; a
loudspeaker 4A that is a voice output unit for uttering sound on
the basis of voice data; an interface 57 used to connect the
loudspeaker 4A; a PC card slot 9A which serves as a voice data
reading means and into which the miniature card 10A attached to the
PC card adaptor 27 is inserted; and an interface 58 used to connect
the PC card slot 9A. The CPU 51, main memory 52, internal recording
medium 53, external port 54, and interfaces 55, 56, 57, and 58 are
interconnected over a bus.
[0118] Voice data may be read directly from the miniature card 10A
via the PC card slot 9A. Alternatively, the voice data may be
temporarily recorded in the internal recording medium 53 and read
from the internal recording medium 53. Otherwise, the voice data
may be read directly from the digital recorder 26 via a
communication means or the like. Thus, the voice data reading means
is not limited to the PC card slot.
[0119] Moreover, an example of screen display attained by running
the control program in the personal computer is nearly identical to
that shown in FIG. 3.
[0120] FIG. 17 is a flowchart describing processing of voice
recognition carried out in the personal computer 1A.
[0121] The voice recognition is, as mentioned later, carried out
stepwise in the order of phoneme recognition, word recognition, and
sentence recognition.
[0122] Specifically, when the voice recognition start button 22
belonging to the voice recognition tool button group 21 in the tool
button bar 13 in the main screen 11 is clicked, voice recognition
is started. A voice file highlighted in the voice file list box 14
is read in units of a given frame (step S61), and decoded in units
of the frame (step S62).
[0123] The decoded voice data is passed to the voice recognition
program 29. First, a phoneme is identified (step S63). Word
recognition is then carried out, wherein a word stream that matches
input voice most satisfactorily is retrieved on the basis of a
given language model suggested by the identified phoneme (step
S64).
[0124] The language model is a model that gives the probability of
occurrence of a given word stream. Various forms of language model
have been conceived. However, an efficient model taking account of
unnecessary words or the like has not been devised yet.
[0125] In this embodiment, therefore, check mark information
located at the start of each frame shown in FIG. 15 is checked to
see if a word represented by data in a frame immediately preceding
the frame is an unnecessary word or the like.
[0126] Specifically, it is judged whether or not the check mark
information is 1 (step S65). If the check mark information is 1, a
word represented by data in a frame immediately preceding the frame
is not regarded as an object of processing of sentence recognition
of the next step (step S66). If the check mark information is 0,
sentence recognition is carried out (step S67).
[0127] Character conversion is then carried out for converting
voice data into character codes on the basis of a recognized
sentence (step S68). The result of recognition is displayed in a
screen on the display 3A (step S69).
[0128] Thereafter, it is judged whether or not the voice file has
come to an end (step S70). If the voice file has not come to an
end, control is returned to step S61. If the voice file has come to
an end, the processing is terminated.
[0129] The processing of not regarding an unnecessary word as an
object of recognition according to the result of detecting check
mark information has been described to be carried out within the
voice recognition program 29. The present invention is not limited
to this mode. Alternatively, the processing may be carried out
within, for example, the control program 28, and the result may be
passed to the voice recognition program 29.
[0130] In this case, the control program 28 causes the personal
computer 1A to fetch voice data from the miniature card 10A, and to
detect check mark information appended to the voice data. If the
check mark information is 1, the voice data is not passed to the
voice recognition program 29. If the check mark information is 0,
the voice data is passed to the voice recognition program 29.
[0131] Moreover, a word represented by data in a frame immediately
preceding a frame including check mark information of 1 has been
described as being excluded from voice recognition. The present
invention is not limited to this mode. For example, a word
represented by data in the frame that itself includes check mark
information of 1 may instead be excluded from voice recognition.
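The two modes of excluding an unnecessary word can be contrasted with
the following Python sketch. It assumes, purely for illustration,
that each frame carries exactly one recognized word together with its
check mark flag; real frames need not align with word boundaries. The
name drop_marked is hypothetical.

```python
from typing import List, Tuple


def drop_marked(frames: List[Tuple[str, bool]], mode: str = "preceding") -> List[str]:
    """Return the words kept for sentence recognition.

    frames: (word, check_mark) pairs in recording order.
    mode "preceding": drop the word in the frame just before a marked frame.
    mode "same": drop the word in the marked frame itself.
    """
    if mode not in ("preceding", "same"):
        raise ValueError("mode must be 'preceding' or 'same'")
    keep = [True] * len(frames)
    for i, (_, marked) in enumerate(frames):
        if not marked:
            continue
        if mode == "same":
            keep[i] = False
        elif i > 0:
            keep[i - 1] = False
    return [word for (word, _), kept in zip(frames, keep) if kept]


dictated = [("the", False), ("um", False), ("sky", True), ("is", False)]
print(drop_marked(dictated, "preceding"))   # the word before the mark is dropped
print(drop_marked(dictated, "same"))        # the marked word itself is dropped
```

The "preceding" mode suits a speaker who presses the button just
after noticing the unnecessary word.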
[0132] Furthermore, the result of voice recognition has been
described to be displayed as characters on the display 3A. The
present invention is not limited to this mode. For example, the
characters may be output as character data to a recording medium or
may be displayed and output simultaneously.
[0133] The check mark information has been described to be recorded
during recording by the digital recorder 26. Alternatively, the
system may be configured so that the check mark information can be
designated during reproduction by the digital recorder 26 or
reproduction by the personal computer 1A.
[0134] According to the second embodiment, when a speaker presses
the check mark button, a check mark is recorded in voice data.
During processing of reproduction and voice recognition, the check
mark is detected. A word represented by data in a frame having a
check mark inscribed therein or a word represented by data in a
frame preceding or succeeding the frame having the check mark
inscribed therein is not regarded as an object of voice
recognition. Consequently, an unnecessary word or the like, which
could not be handled in the past, can be treated easily without
increasing the load of voice recognition, that is, without the need
for especially fast processing. This results in a good-quality
dictation system capable of achieving voice recognition properly
and creating a document with few mistakes.
[0135] FIGS. 18 to 21 relate to the third embodiment of the present
invention. The conceptual overall configuration of a dictation
system of the third embodiment is identical to that shown in FIG.
12. Moreover, the electrical configuration of the personal computer
1A is identical to that shown in FIG. 16.
[0136] Next, FIG. 18 is a diagram showing the overall flow of
reading voice data from a voice memory and recognizing voice which
is followed by the dictation system, and FIG. 19 is a flowchart
describing processing of voice recognition carried out by the
dictation system.
[0137] As described in FIG. 19, when the processing is started,
voice data recorded in units of a file is read from a voice memory
61 in the miniature card 10A or internal recording medium 53, and
Decoding 62 is executed (step S71).
[0138] The result of decoding 62 is sent to Voiceful-or-voiceless
Judgment 63 and Sample Absolute Value Averaging 64.
[0139] Voiceful-or-voiceless Judgment 63 calculates a threshold
value used for voiceful-or-voiceless judgment (step S72). Based on
the calculated threshold value, whether voice data is voiceful or
voiceless is judged (step S73). This processing will be explained
in detail later in conjunction with FIG. 20. The result of
voiceful-or-voiceless judgment 63 is sent to Sample Absolute Value
Averaging 64.
[0140] Sample Absolute Value Averaging 64 and Gain Calculation 65
are executed to calculate a gain (step S74). This processing will
be described in conjunction with FIG. 21 later. Based on a gain
calculated by Gain Calculation 65, Gain Multiplication 66 amplifies
an output of Decoding 62 (step S75).
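A rough Python sketch of steps S74 and S75 follows. The text states
only that a gain is calculated from averaged absolute sample values
of voiceful data and then multiplied into the decoded output. The
rule gain = target_level / average, the value of target_level, and
the clipping to a 16-bit range are assumptions of this sketch, and
the names mean_abs_level and apply_gain are hypothetical.

```python
from typing import List, Sequence, Tuple


def mean_abs_level(frames: Sequence[Sequence[int]], voiceful: Sequence[bool]) -> float:
    """Average absolute sample value over the voiceful frames only."""
    total = 0
    count = 0
    for frame, is_voiceful in zip(frames, voiceful):
        if is_voiceful:
            total += sum(abs(s) for s in frame)
            count += len(frame)
    return total / count if count else 0.0


def apply_gain(frames: Sequence[Sequence[int]], voiceful: Sequence[bool],
               target_level: float = 8000.0,
               limit: int = 32767) -> Tuple[float, List[List[int]]]:
    """Scale every sample so that voiceful data averages about target_level."""
    level = mean_abs_level(frames, voiceful)
    gain = target_level / level if level > 0 else 1.0   # assumed gain rule
    scaled = [
        [max(-limit, min(limit, round(s * gain))) for s in frame]
        for frame in frames
    ]
    return gain, scaled


decoded = [[1, -1], [1000, -1000]]
gain, adjusted = apply_gain(decoded, voiceful=[False, True])
print(gain, adjusted)
```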
[0141] Voice data adjusted to a proper level by Gain Multiplication
66 is sent to Voice Recognition 67, whereby voice recognition is
carried out (step S76).
[0142] Character conversion is carried out for converting the
result of voice recognition into character codes (step S77).
Resultant character codes are output and displayed 68 in a screen
on the display 3A or the like (step S78).
[0143] FIG. 20 is a flowchart describing the contents of processing
relevant to voiceful-or-voiceless judgment performed at steps S72
and S73.
[0144] When this processing is started, first, a variable f
indicating a count of the number of frames is initialized to 0
(step S81).
[0145] After the variable f is incremented (step S82), a level of
frame energy e(f) is calculated according to an illustrated formula
(step S83). In the formula, s(i) denotes an input signal of the
(i-1)-th sample out of one frame, and N denotes the number of
frames constituting one file.
[0146] It is then judged whether or not the variable f is 1, that
is, a frame to be treated is an initial frame (step S84). If the
variable f is 1, a variable min indicating a minimum level of frame
energy is set to e(1) (step S86).
[0147] If it is found at step S84 that the variable f is not 1, it
is judged whether or not the level of frame energy e(f) is smaller
than the variable min (step S85). If the level of frame energy e(f)
is smaller, the variable min is set to the level of frame energy
e(f) (step S87). By contrast, if the level of frame energy e(f) is
not smaller, nothing is done and control is passed to the next step
S88.
[0148] It is then judged whether or not the file has come to an end
(step S88). If the file has not come to an end, control is returned
to step S82 and the foregoing processing is repeated.
[0149] If it is judged at step S88 that the file has come to an
end, a product of the variable min by a given value a (for example,
1.8) is set as a threshold value trs (step S89). The processing is
then exited.
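The loop of steps S81 to S89 can be sketched in Python as follows. Because the formula for the frame energy e(f) appears only in the drawing, the sketch assumes that e(f) is the mean of the squared samples of a frame. The function names frame_energy and voiced_threshold are hypothetical, and every frame is assumed to contain at least one sample.

```python
from typing import List, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Assumed definition of e(f): mean of the squared samples of one frame."""
    return sum(s * s for s in frame) / len(frame)


def voiced_threshold(
    frames: Sequence[Sequence[float]], a: float = 1.8
) -> Tuple[float, List[float]]:
    """Return (trs, energies) for a whole recorded file.

    trs is the minimum frame energy multiplied by the given value a
    (1.8 is the example value given in the description).
    """
    if not frames:
        raise ValueError("at least one frame is required")
    energies: List[float] = []
    min_energy = 0.0
    for f, frame in enumerate(frames, start=1):   # steps S82 and S88
        e = frame_energy(frame)                   # step S83
        energies.append(e)
        if f == 1 or e < min_energy:              # steps S84 and S85
            min_energy = e                        # steps S86 and S87
    return a * min_energy, energies               # step S89
```

A frame can then be treated as voiceful when its energy exceeds trs, for example with voiced = [e > trs for e in energies].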
[0150] This procedure of setting a threshold value makes the most
of the fact that voice data is already recorded. Since the
threshold value can be determined on the basis of the minimum
energy level of the whole file, voiceful-or-voiceless judgment can
be achieved with little error.
[0151] As described above, the minimum value is calculated over all
read intervals (that is, all the frames constituting a voice file).
The present invention is not limited to this mode. Instead of the
minimum value over all the intervals, the minimum value over an
interval of a certain length will do.
[0152] Next, FIG. 21 is a flowchart describing the contents of gain
calculation to be performed at step S74 in FIG. 19.
[0153] When this processing is started, a variable f indicating a
count of the number of frames, a variable SumAbs indicating a sum
of absolute values of samples, and a variable Cnt indicating the
number of additions are initialized to 0 (step S91).
[0154] The variable f is then incremented (step S92). It is judged
whether or not the level of frame energy e(f) calculated within the
processing described in FIG. 20 is larger than the threshold value
trs (step S93). If the level of frame energy e(f) is larger than
the threshold value trs, the sum of absolute values of samples of
frames is added to the variable SumAbs (step S94), and the variable
Cnt is incremented (step S95).
[0155] If it is found at step S93 that the level of frame energy
e(f) is equal to or smaller than the threshold value, control is
passed to the next step S96.
[0156] Thereafter, it is judged whether or not the file has come to
an end (step S96). If the file has not come to an end, control is
returned to step S92 and the foregoing processing is repeated.
[0157] If it is judged at step S96 that the file has come to an
end, the variable SumAbs is divided by the variable Cnt in order to
calculate an average value, average, of the absolute values of
samples of frames (step S97).
[0158] A given value LEV is divided by the average value, average,
in order to calculate a gain, gain (step S98). Herein, the given
value LEV is set to the average value of the predicted absolute
values of samples. For example, an average value of absolute values
of voice samples used to learn voice data by a voice recognizer is
employed.
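Steps S91 to S98 can be sketched in Python as follows. The sketch takes the frame energies e(f) and the threshold value trs as inputs, since they are obtained in the processing of FIG. 20. Two points are assumptions not stated in this description: the value added per voiceful frame is the frame's mean absolute sample value (so that the resulting average is comparable with LEV, which is described as an average of absolute sample values), and a gain of 1.0 is returned when no frame exceeds the threshold. The function name calculate_gain is hypothetical.

```python
from typing import Sequence


def calculate_gain(
    frames: Sequence[Sequence[float]],
    energies: Sequence[float],
    trs: float,
    lev: float,
) -> float:
    """Return gain = LEV / average over the voiceful frames only."""
    sum_abs = 0.0   # SumAbs
    cnt = 0         # Cnt, the number of additions
    for frame, e in zip(frames, energies):        # steps S92 and S96
        if e > trs:                               # step S93
            # Assumption: add the mean absolute value of the frame.
            sum_abs += sum(abs(s) for s in frame) / len(frame)  # step S94
            cnt += 1                              # step S95
    if cnt == 0 or sum_abs == 0.0:
        # Assumption: leave the level unchanged if nothing is voiceful.
        return 1.0
    average = sum_abs / cnt                       # step S97
    return lev / average                          # step S98
```

Since frames at or below the threshold are skipped, silent intervals do not pull the average down, and the gain reflects only the level of the voiceful portion.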
[0159] According to the third embodiment, already-recorded voice
data can be adjusted to a sound level suitable for voice
recognition. Voice recognition can therefore be carried out on a
stable basis irrespective of a sound level of recorded voice data.
This results in a high-quality dictation system.
[0160] It is apparent that a wide range of different working modes
can be formed on the basis of this invention without departing
from the spirit and scope of the invention. This invention is not
restricted to any specific embodiment except as limited by the
appended claims.
* * * * *