U.S. patent application number 10/375272, for a singing voice synthesizing apparatus, singing voice synthesizing method and program for singing voice synthesizing, was published by the patent office on 2003-08-28. The application is currently assigned to Yamaha Corporation. The invention is credited to Jordi Bonada, Hideki Kemmochi and Yasuo Yoshioka.
United States Patent Application 20030159568
Kind Code: A1
Kemmochi, Hideki; et al.
Published: August 28, 2003
Application Number: 10/375272
Document ID: /
Family ID: 27750971
Singing voice synthesizing apparatus, singing voice synthesizing
method and program for singing voice synthesizing
Abstract
A singing voice synthesizing apparatus comprises: a storage device that stores singing voice information for synthesizing a singing voice; a phoneme database that stores articulation data of a transition part, including an articulation for a transition from one phoneme to another, and stationary data of a long sound part, including a stationary part in which one phoneme is pronounced stably; a selecting device that selects data in the phoneme database in accordance with the singing voice information; a first outputting device that outputs a characteristic parameter of the transition part by extracting it from the selected articulation data; and a second outputting device that obtains the articulation data before and after the stationary data of a long sound part selected by the selecting device, and generates and outputs a characteristic parameter of the long sound part by interpolating the obtained data.
Inventors: Kemmochi, Hideki (Shimizu-shi, JP); Yoshioka, Yasuo (Hamamatsu-shi, JP); Bonada, Jordi (Barcelona, ES)
Correspondence Address: Mr. Roger R. Wise, PILLSBURY MADISON & SUTRO LLP, 725 South Figueroa Street, Suite 1200, Los Angeles, CA 90017, US
Assignee: Yamaha Corporation, Hamamatsu-shi, JP
Family ID: 27750971
Appl. No.: 10/375272
Filed: February 27, 2003
Current U.S. Class: 84/626; 704/E13.001
Current CPC Class: G10H 7/00 20130101; G10H 2250/455 20130101; G10L 13/00 20130101; G10H 2240/056 20130101; G10H 2250/235 20130101
Class at Publication: 84/626
International Class: G10H 001/02; G10H 007/00; G01P 003/00

Foreign Application Data
Date: Feb 28, 2002; Code: JP; Application Number: 2002-054487
Claims
What is claimed is:
1. A singing voice synthesizing apparatus, comprising: a storage device that stores singing voice information for synthesizing a singing voice; a phoneme database that stores articulation data of a transition part that includes an articulation for a transition from one phoneme to another phoneme and stationary data of a long sound part that includes a stationary part where one phoneme is stably pronounced; a selecting device that selects data stored in the phoneme database in accordance with the singing voice information; a first outputting device that outputs a characteristic parameter of the transition part by extracting the characteristic parameter of the transition part from the articulation data selected by the selecting device; and a second outputting device that obtains the articulation data before and after the stationary data of a long sound part selected by the selecting device, generates a characteristic parameter of the long sound part by interpolating the two obtained articulation data and outputs the generated characteristic parameter of the long sound part.
2. A singing voice synthesizing apparatus according to claim 1,
wherein the second outputting device generates the characteristic
parameter of the long sound part by adding a changing component of
the stationary data to the interpolated articulation data.
3. A singing voice synthesizing apparatus according to claim 1, wherein the articulation data stored in the phoneme database includes a characteristic parameter of the articulation and a stochastic component, and the first outputting device further separates the stochastic component.
4. A singing voice synthesizing apparatus according to claim 3, wherein the characteristic parameter of the articulation and the stochastic component are obtained by an SMS analysis of a voice.
5. A singing voice synthesizing apparatus according to claim 1, wherein the stationary data stored in the phoneme database includes a characteristic parameter of the stationary part and a stochastic component, and the second outputting device further separates the stochastic component.
6. A singing voice synthesizing apparatus according to claim 5, wherein the characteristic parameter of the stationary part and the stochastic component are obtained by an SMS analysis of a voice.
7. A singing voice synthesizing apparatus according to claim 1, wherein the singing voice information includes dynamics information, and the apparatus further comprises a correcting device that corrects the characteristic parameters of the transition part and the long sound part in accordance with the dynamics information.
8. A singing voice synthesizing apparatus according to claim 7, wherein the singing voice information further includes pitch information, and the correcting device at least comprises a first calculating device that calculates a first amplitude value corresponding to the dynamics information and a second calculating device that calculates a second amplitude value corresponding to the characteristic parameters of the transition part and the long sound part and to the pitch, and corrects the characteristic parameters in accordance with a difference between the first and the second amplitude values.
9. A singing voice synthesizing apparatus according to claim 8,
wherein the first calculating device comprises a table storing a
relationship between the dynamics information and the amplitude
values.
10. A singing voice synthesizing apparatus according to claim 9, wherein the table stores the relationship for each kind of phoneme.
11. A singing voice synthesizing apparatus according to claim 9, wherein the table stores the relationship for each frequency.
12. A singing voice synthesizing apparatus according to claim 1, wherein the phoneme database stores the articulation data and the stationary data respectively associated with pitches, and the selecting device stores the characteristic parameters of the same articulation respectively associated with pitches and selects the articulation data and the stationary data in accordance with input pitch information.
13. A singing voice synthesizing apparatus according to claim 12,
wherein the phoneme database further stores expression data, and
the selecting device selects the expression data in accordance with
expression information included in the input singing voice
information.
14. A singing voice synthesizing method, comprising the steps of: (a) storing articulation data of a transition part that includes an articulation for a transition from one phoneme to another phoneme and stationary data of a long sound part that includes a stationary part where one phoneme is stably pronounced into a phoneme database; (b) inputting singing voice information for synthesizing a singing voice; (c) selecting data stored in the phoneme database in accordance with the singing voice information; (d) outputting a characteristic parameter of the transition part by extracting the characteristic parameter of the transition part from the articulation data selected by the step (c); and (e) obtaining the articulation data before and after the stationary data of a long sound part selected by the step (c), generating a characteristic parameter of the long sound part by interpolating the two obtained articulation data and outputting the generated characteristic parameter of the long sound part.
15. A singing voice synthesizing method according to claim 14,
wherein the step (e) generates the characteristic parameter of the
long sound part by adding a changing component of the stationary
data to the interpolated articulation data.
16. A singing voice synthesizing method according to claim 14, wherein the singing voice information includes dynamics information, and the method further comprises the step of (f) correcting the characteristic parameters of the transition part and the long sound part in accordance with the dynamics information.
17. A singing voice synthesizing method according to claim 16,
wherein the singing voice information further includes pitch
information, and the step (f) at least comprises sub-steps of (f1)
calculating a first amplitude value corresponding to the dynamics
information and (f2) calculating a second amplitude value
corresponding to the characteristic parameters of the transition
part and the long sound parts and the pitch, and correcting the
characteristic parameters in accordance with a difference between
the first and the second amplitude value.
18. A singing voice synthesizing program which a computer can execute, the program comprising the instructions of: (a) storing articulation data of a transition part that includes an articulation for a transition from one phoneme to another phoneme and stationary data of a long sound part that includes a stationary part where one phoneme is stably pronounced into a phoneme database; (b) inputting singing voice information for synthesizing a singing voice; (c) selecting data stored in the phoneme database in accordance with the singing voice information; (d) outputting a characteristic parameter of the transition part by extracting the characteristic parameter of the transition part from the articulation data selected by the instruction (c); and (e) obtaining the articulation data before and after the stationary data of a long sound part selected by the instruction (c), generating a characteristic parameter of the long sound part by interpolating the two obtained articulation data and outputting the generated characteristic parameter of the long sound part.
19. A singing voice synthesizing program according to claim 18,
wherein the instruction (e) generates the characteristic parameter
of the long sound part by adding a changing component of the
stationary data to the interpolated articulation data.
20. A singing voice synthesizing program according to claim 18, wherein the singing voice information includes dynamics information, and the program further comprises the instruction of (f) correcting the characteristic parameters of the transition part and the long sound part in accordance with the dynamics information.
21. A singing voice synthesizing program according to claim 20,
wherein the singing voice information further includes pitch
information, and the instruction (f) at least comprises
sub-instructions of (f1) calculating a first amplitude value
corresponding to the dynamics information and (f2) calculating a
second amplitude value corresponding to the characteristic
parameters of the transition part and the long sound parts and the
pitch, and correcting the characteristic parameters in accordance
with a difference between the first and the second amplitude value.
Description
CROSS REFERENCE TO RELATED APPLICATION
[0001] This application is based on Japanese Patent Application No. 2002-054487, filed on Feb. 28, 2002, the entire contents of which are incorporated herein by reference.
BACKGROUND OF THE INVENTION
[0002] A) Field of the Invention
[0003] This invention relates to a singing voice synthesizing apparatus, a singing voice synthesizing method and a program for synthesizing a human singing voice.
[0004] B) Description of the Related Art
[0005] In a conventional singing voice synthesizing apparatus, data obtained from an actual human singing voice is stored in a database, and data that matches the contents of input performance data (musical notes, lyrics, expressions and the like) is selected from the database. A singing voice close to a real human singing voice is then synthesized by converting the performance data based on the selected data.
[0006] A principle of singing voice synthesizing is explained in Japanese Patent Application No. 2001-67258, filed by the applicant of the present invention, with reference to FIGS. 7 and 8.
[0007] The principle of the singing voice synthesizing apparatus of Japanese Patent Application No. 2001-67258 is shown in FIG. 7. This apparatus includes a timbre template database 51, which stores characteristic-parameter data for a phoneme at one point in time (the timbre template); a constant part (stationary) template database 53, which stores data describing the slight changes of the characteristic parameters during a long sound (the stationary template); and a phonemic chain (articulation) template database 52, which stores data describing how the characteristic parameters change from one phoneme to the next in a transition part (the articulation template).
[0008] The characteristic parameters are generated by applying these templates as follows.
[0009] That is, synthesis of the long sound part is executed by adding the changing component included in the stationary template to the characteristic parameter obtained from the timbre template.
[0010] Synthesis of the transition part, on the other hand, is also executed by adding the changing component included in the articulation template to a characteristic parameter, but the parameter to which it is added differs by case. For example, when the phonemes before and after the transition are both voiced sounds, the changing component included in the articulation template is added to a linear interpolation of the characteristic parameter of the preceding phoneme and the characteristic parameter of the following phoneme. When the preceding phoneme is a voiced sound and the following phoneme is a silence, the changing component is added to the characteristic parameter of the preceding phoneme. When the preceding phoneme is a silence and the following phoneme is a voiced sound, the changing component is added to the characteristic parameter of the following phoneme. In this way, in the singing voice synthesizing apparatus disclosed in Japanese Patent Application No. 2001-67258, the characteristic parameter generated from the timbre template is the standard, and singing voice synthesis is executed by changing the characteristic parameter of the articulation part so that it agrees with the characteristic parameter of the timbre part.
[0011] In the singing voice synthesizing apparatus disclosed in Japanese Patent Application No. 2001-67258, the synthesized singing voice was sometimes unnatural. The causes are the following:
[0012] the change in the characteristic parameter of the transition part differs from the change in the original transition part, because the change of the articulation template is itself altered; and
[0013] the long sound part always sounds the same regardless of the kind of the preceding phoneme, because the characteristic parameter of the long sound part is always calculated by adding the changing component of the stationary template to the characteristic parameter generated from the timbre template.
[0014] That is, in the singing voice synthesizing apparatus disclosed in Japanese Patent Application No. 2001-67258, the synthesized singing voice was sometimes unnatural because the parameters of the long sound part and the transition part were built up from the characteristic parameter of the timbre template, which represents only one point of the whole song.
[0015] For example, when the conventional singing voice synthesizing apparatus is made to sing "saita", the phonemes do not transition into one another naturally, and the synthesized singing voice sounds unnatural. In some cases it cannot even be judged what the synthesized voice is singing.
[0016] That is, when singing "saita", for example, a singer does not pronounce the phonemes ("sa", "i" and "ta") with partitions between them; the word is normally pronounced with long sound parts and transition parts inserted between the phonemes, as "[#s] sa (a) [ai] i (i) [it] ta (a)" (where "#" represents a silence). In this example of "saita", [#s], [ai] and [it] are the transition parts, and (a), (i) and (a) are the long sounds. Therefore, when a singing voice is synthesized from performance data such as MIDI information, it is significant how realistically the transition parts and the long sound parts are generated.
SUMMARY OF THE INVENTION
[0017] It is an object of the present invention to provide a
singing voice synthesizing apparatus that can naturally reproduce a
transition part.
[0018] According to the present invention, high naturalness of the synthesized singing voice in the transition part can be maintained.
[0019] According to one aspect of the present invention, there is provided a singing voice synthesizing apparatus, comprising: a storage device that stores singing voice information for synthesizing a singing voice; a phoneme database that stores articulation data of a transition part that includes an articulation for a transition from one phoneme to another phoneme and stationary data of a long sound part that includes a stationary part where one phoneme is stably pronounced; a selecting device that selects data stored in the phoneme database in accordance with the singing voice information; a first outputting device that outputs a characteristic parameter of the transition part by extracting the characteristic parameter of the transition part from the articulation data selected by the selecting device; and a second outputting device that obtains the articulation data before and after the stationary data of a long sound part selected by the selecting device, generates a characteristic parameter of the long sound part by interpolating the two obtained articulation data and outputs the generated characteristic parameter of the long sound part.
[0020] According to another aspect of the present invention, there is provided a singing voice synthesizing method, comprising the steps of: (a) storing articulation data of a transition part that includes an articulation for a transition from one phoneme to another phoneme and stationary data of a long sound part that includes a stationary part where one phoneme is stably pronounced into a phoneme database; (b) inputting singing voice information for synthesizing a singing voice; (c) selecting data stored in the phoneme database in accordance with the singing voice information; (d) outputting a characteristic parameter of the transition part by extracting the characteristic parameter of the transition part from the articulation data selected by the step (c); and
[0021] (e) obtaining the articulation data before and after the stationary data of a long sound part selected by the step (c), generating a characteristic parameter of the long sound part by interpolating the two obtained articulation data and outputting the generated characteristic parameter of the long sound part.
[0022] According to the present invention, only the articulation
template database 52 and the stationary template database 53 are
used, and the timbre template is basically not necessary.
[0023] After the performance data is divided into transition parts and long sound parts, the articulation template is used without change in the transition parts. Therefore, the singing voice of the transition parts, which are significant parts of the song, sounds natural, and the quality of the synthesized singing voice is high.
[0024] As for the long sound part, the characteristic parameters of the transition parts at both ends of the long sound are linearly interpolated, and a characteristic parameter is generated by adding the changing component included in the stationary template to the interpolated characteristic parameter. The singing voice does not become unnatural, because the interpolation is based on unmodified template data.
BRIEF DESCRIPTION OF THE DRAWINGS
[0025] FIGS. 1A to 1C are a functional block diagram of a singing
voice synthesizing apparatus and an example of phoneme database
according to a first embodiment of the present invention.
[0026] FIGS. 2A and 2B show an example of a phoneme database 10
shown in FIG. 1.
[0027] FIG. 3 is a detail of a characteristic parameter correcting
unit 21 shown in FIG. 1.
[0028] FIG. 4 is a flow chart showing steps of data management in
the singing voice synthesizing apparatus according to a first
embodiment of the present invention.
[0029] FIGS. 5A to 5C are a functional block diagram of the singing
voice synthesizing apparatus and an example of phoneme database
according to a second embodiment of the present invention.
[0030] FIGS. 6A to 6C are a functional block diagram of the singing
voice synthesizing apparatus and an example of phoneme database
according to a third embodiment of the present invention.
[0031] FIG. 7 shows a principle of a singing voice synthesizing
apparatus disclosed in Japanese Patent Application
No.2001-67258.
[0032] FIG. 8 shows a principle of a singing voice synthesizing
apparatus according to the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0033] FIGS. 1A to 1C (hereinafter just called FIGS. 1) are a
functional block diagram of a singing voice synthesizing apparatus
and an example of phoneme database according to a first embodiment
of the present invention. The singing voice synthesizing apparatus
is, for example, realized by a general personal computer, and
functions of each block shown in FIGS. 1 can be accomplished by a
CPU, a RAM and a ROM in the personal computer. They can also be implemented by a DSP and logic circuits. A phoneme database 10 holds data for synthesizing a voice based on performance data. FIG. 1C shows an example of this phoneme database 10, which is explained later with reference to FIGS. 2.
[0034] As shown in FIG. 2A, a voice signal such as actually recorded singing data is separated into a deterministic component (a sine wave component) and a stochastic component by a spectral modeling synthesis (SMS) analyzing device 31. Other analyzing methods, such as linear predictive coding (LPC), can be used instead of the SMS analysis.
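As a rough intuition for the deterministic/stochastic split, the sketch below subtracts a fitted sinusoid from a signal and keeps the residual as the stochastic part. This is a toy illustration only, not the actual SMS algorithm (which fits time-varying sinusoids to spectral peaks); all names and values are illustrative.

```python
import math

def split_deterministic_stochastic(signal, freq, amp, phase, sr):
    """Toy SMS-style split: the deterministic part is a single fitted
    sinusoid; the stochastic part is whatever remains."""
    det = [amp * math.sin(2 * math.pi * freq * n / sr + phase)
           for n in range(len(signal))]
    sto = [s - d for s, d in zip(signal, det)]
    return det, sto

# A 100 Hz sinusoid plus a small constant "noise" floor.
sr = 8000
sig = [0.5 * math.sin(2 * math.pi * 100 * n / sr) + 0.01 for n in range(16)]
det, sto = split_deterministic_stochastic(sig, 100.0, 0.5, 0.0, sr)
print(all(abs(x - 0.01) < 1e-12 for x in sto))
```

In the real analyzer the deterministic part is a sum of many partials tracked over time, but the subtraction step that yields the stochastic residual has this shape.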
[0035] Next, the voice signal is divided into phonemes by a phoneme dividing unit 32 based on phoneme dividing information. The phoneme dividing information is normally input by a human operator pressing a predetermined switch while referring to the waveform of the voice signal.
[0036] Then, a characteristic parameter is extracted from the deterministic component of each phoneme-divided voice signal by a characteristic parameter extracting unit 33. The characteristic parameter includes an excitation waveform envelope, a formant frequency, a formant width, a formant intensity, a differential spectrum and the like.
[0037] The excitation waveform envelope (excitation curve) consists of EGain, which represents the magnitude of the vocal cord waveform (dB); ESlope, which represents the slope of the spectrum envelope of the vocal cord vibration waveform; and ESlopeDepth, which represents the depth from the maximum value to the minimum value of that spectrum envelope (dB). The excitation curve can be expressed by the following equation (A), where f is frequency:
ExcitationCurve(f) = EGain + ESlopeDepth*(exp(-ESlope*f) - 1) (A)
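As a numeric check of equation (A), the small sketch below evaluates the envelope at two frequencies. The parameter values are arbitrary illustrations, not values from the patent.

```python
import math

def excitation_curve(f, e_gain, e_slope_depth, e_slope):
    """Equation (A): ExcitationCurve(f) = EGain + ESlopeDepth*(exp(-ESlope*f) - 1)."""
    return e_gain + e_slope_depth * (math.exp(-e_slope * f) - 1.0)

# At f = 0 the envelope equals EGain; as f grows, the exponential term
# vanishes and the envelope approaches EGain - ESlopeDepth.
print(excitation_curve(0.0, 60.0, 30.0, 0.001))  # 60.0
print(excitation_curve(1e6, 60.0, 30.0, 0.001))  # approaches 60 - 30 = 30
```

This matches the roles described above: ESlope controls how fast the envelope falls, and ESlopeDepth controls how far it falls in total.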
[0038] The excitation resonance represents chest resonance. It consists of three parameters, a central frequency (ERFreq), a bandwidth (ERBW) and an amplitude (ERAmp), and has the character of a second-order filter.
[0039] The formants represent the vocal tract resonance by combining 1 to 12 resonances. Each resonance consists of three parameters: a central frequency (FormantFreqi, where i is an integer from 1 to 12), a bandwidth (FormantBWi) and an amplitude (FormantAmpi).
[0040] The differential spectrum is a characteristic parameter that holds the difference between the original deterministic component and the spectrum expressed by the above three: the excitation waveform envelope, the excitation resonance and the formants.
[0041] The characteristic parameters are stored in a phoneme database 10 in association with phoneme names. The stochastic component is also stored in the phoneme database 10 in association with the phoneme names. In the phoneme database 10, as shown in FIG. 2B, the entries are divided into articulation (phonemic chain) data and stationary data. Hereinafter, "voice synthesis unit data" is a general term for the articulation data and the stationary data.
[0042] The articulation data is a chain of data consisting of a first phoneme name, a following phoneme name, the characteristic parameters and the stochastic component.
[0043] The stationary data, on the other hand, is a chain of data consisting of one phoneme name, a chain of the characteristic parameters and the stochastic component.
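The two record layouts of [0042] and [0043] can be sketched as simple data classes. The field names and types here are illustrative assumptions, not the patent's storage format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Frame:
    """One frame of a characteristic parameter (a small illustrative
    subset of the parameters named above)."""
    e_gain: float
    formant_freqs: List[float] = field(default_factory=list)

@dataclass
class ArticulationData:
    """Transition-part record: first phoneme, following phoneme, a chain
    of characteristic-parameter frames, and the stochastic component."""
    first_phoneme: str
    following_phoneme: str
    frames: List[Frame]
    stochastic: bytes = b""

@dataclass
class StationaryData:
    """Long-sound record: one phoneme name, a chain of
    characteristic-parameter frames, and the stochastic component."""
    phoneme: str
    frames: List[Frame]
    stochastic: bytes = b""

# An "a-to-i" transition unit with two frames of parameters.
a = ArticulationData("a", "i", [Frame(60.0), Frame(58.5)])
print(a.first_phoneme, a.following_phoneme, len(a.frames))
```

The phoneme database of FIG. 2B would then be, in this sketch, a lookup from phoneme names (or name pairs) to such records.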
[0044] Returning to FIGS. 1, a performance data storage unit 11 stores the performance data. The performance data is, for example, MIDI information that includes musical notes, lyrics, pitch bend, dynamics, etc.
[0045] A voice synthesis unit selector 12 accepts the performance data kept in the performance data storage unit 11 frame by frame (hereinafter called frame data), and reads the voice synthesis unit data corresponding to the lyrics data included in the input performance data by selecting it from the phoneme database 10.
[0046] A previous articulation data storage unit 13 and a later articulation data storage unit 14 store articulation data. The previous articulation data storage unit 13 stores the articulation data preceding the stationary data being processed, while the later articulation data storage unit 14 stores the articulation data following it.
[0047] A characteristic parameter interpolation unit 15 reads the characteristic parameter of the last frame of the articulation data stored in the previous articulation data storage unit 13 and the characteristic parameter of the first frame of the articulation data stored in the later articulation data storage unit 14, and interpolates between these characteristic parameters in a time sequence corresponding to the time indicated by a timer 27.
[0048] A stationary data storage unit 16 temporarily stores the stationary data from the voice synthesis unit data read by the voice synthesis unit selector 12. Likewise, an articulation data storage unit 17 temporarily stores the articulation data.
[0049] A characteristic parameter change detecting unit 18 reads the stationary data stored in the stationary data storage unit 16, extracts the change (throb) of the characteristic parameter, and outputs it as a change component.
[0050] An adding unit K1 outputs the deterministic component data of the long sound by adding the output of the characteristic parameter interpolation unit 15 and the output of the characteristic parameter change detecting unit 18.
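The long-sound path of [0047]-[0050], interpolating between the last frame of the previous articulation data and the first frame of the later articulation data and then adding the change component from the stationary data, can be sketched as follows. The function and all values are illustrative (a single scalar parameter stands in for a full parameter vector).

```python
def synth_long_sound(prev_last_frame, later_first_frame, change_component):
    """For each frame time t in [0, 1], linearly interpolate the two
    boundary characteristic parameters (unit 15) and add the throb
    extracted from the stationary data (unit 18 plus adder K1)."""
    n = len(change_component)
    out = []
    for k, delta in enumerate(change_component):
        t = k / (n - 1) if n > 1 else 0.0
        base = (1.0 - t) * prev_last_frame + t * later_first_frame
        out.append(base + delta)
    return out

# Boundary parameters 60 dB -> 50 dB with a small per-frame throb.
print(synth_long_sound(60.0, 50.0, [0.0, 1.0, -1.0, 0.0, 0.5]))
# [60.0, 58.5, 54.0, 52.5, 50.5]
```

Because the interpolation endpoints come from unmodified articulation data, the long sound joins smoothly onto the surrounding transition parts, which is the point of [0024].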
[0051] A frame reading unit 19 reads the articulation data stored in the articulation data storage unit 17 as frame data in accordance with the time indicated by the timer 27, and divides it into a characteristic parameter and a stochastic component for output.
[0052] A pitch defining unit 20 defines the pitch of the voice to be finally synthesized based on the musical note data in the frame data. A characteristic parameter correction unit 21 corrects the characteristic parameter of the long sound output from the adding unit K1 and the characteristic parameter of the transition part output from the frame reading unit 19 based on the dynamics information included in the performance data. A switch SW1 is provided in front of the characteristic parameter correction unit 21 and feeds either the characteristic parameter of the long sound or that of the transition part into the correction unit. Details of the process in the characteristic parameter correction unit 21 are explained later. A switch SW2 selects, for output, either the stochastic component of the long sound read from the stationary data storage unit 16 or the stochastic component of the transition part read from the frame reading unit 19.
[0053] A harmonic chain generating unit 22 generates a harmonic
chain for formant synthesizing on a frequency axis in accordance
with a determined pitch.
[0054] A spectrum envelope generating unit 23 generates a spectrum envelope in accordance with the characteristic parameter corrected in the characteristic parameter correction unit 21.
[0055] A harmonics amplitude/phase calculating unit 24 calculates the amplitude and phase of each harmonic generated in the harmonic chain generating unit 22 in accordance with the spectrum envelope generated in the spectrum envelope generating unit 23.
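Units 22-24 amount to generating harmonics at integer multiples of the determined pitch and reading each harmonic's amplitude off the spectrum envelope. A minimal sketch; the envelope function here is an arbitrary stand-in for the one produced by unit 23:

```python
def harmonic_amplitudes(pitch_hz, n_harmonics, envelope):
    """Generate the harmonic chain k * pitch (unit 22) and sample the
    spectrum envelope at each harmonic frequency (unit 24)."""
    return [(k * pitch_hz, envelope(k * pitch_hz))
            for k in range(1, n_harmonics + 1)]

# Stand-in envelope: flat 60 dB below 1 kHz, rolling off above.
env = lambda f: 60.0 if f < 1000.0 else 60.0 - 0.01 * (f - 1000.0)
print(harmonic_amplitudes(440.0, 4, env))
```

The resulting (frequency, amplitude) pairs are what the adder K2 combines with the stochastic component before the inverse FFT of unit 25.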
[0056] An adding unit K2 adds the deterministic component output from the harmonics amplitude/phase calculating unit 24 and the stochastic component output from the switch SW2.
[0057] An inverse FFT unit 25 converts the signal from a frequency-domain expression into a time-domain expression by applying the inverse fast Fourier transform (IFFT) to the output of the adding unit K2.
[0058] An overlapping unit 26 outputs the synthesized singing voice by overlapping the signals obtained one after another as the lyrics data is processed in time-sequential order.
[0059] Details of the characteristic parameter correction unit 21 are explained based on FIG. 3. The characteristic parameter correction unit 21 includes an amplitude defining unit 41, which outputs a desired amplitude value A1 corresponding to the dynamics information input from the performance data storage unit 11 by referring to a dynamics-amplitude transformation table Tda.
[0060] Also, a spectrum envelope generating unit 42 generates a
spectrum envelope based on the characteristic parameter output from
the switch SW1.
[0061] A harmonics chain generating unit 43 generates harmonics based on the pitch defined in the pitch defining unit 20. An amplitude calculating unit 44 calculates an amplitude A2 corresponding to the generated spectrum envelope and harmonics. The amplitude can be calculated, for example, by an inverse FFT.
[0062] An adding unit K3 outputs the difference between the desired amplitude value A1 defined in the amplitude defining unit 41 and the amplitude value A2 calculated in the amplitude calculating unit 44. A gain correcting unit 45 calculates a gain correction amount from this difference and corrects the characteristic parameter based on the amount of this gain correction. In this way, a new characteristic parameter matched with the desired amplitude is obtained.
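The correction in units 41-45 reduces to: look up the desired amplitude A1 for the given dynamics, measure the amplitude A2 the current parameters would produce, and shift the parameters by the difference. A sketch; the table values and the choice of correcting only EGain are assumptions for illustration:

```python
def correct_egain(dynamics, e_gain, dyn_to_amp_table, measure_amp):
    """Units 41-45: A1 from the dynamics->amplitude table Tda, A2 from
    the current parameters, then apply the gain A1 - A2 (in dB)."""
    a1 = dyn_to_amp_table[dynamics]   # amplitude defining unit 41
    a2 = measure_amp(e_gain)          # units 42-44, simplified
    gain = a1 - a2                    # adding unit K3
    return e_gain + gain              # gain correcting unit 45

# Hypothetical table: MIDI-like dynamics value -> desired amplitude (dB).
tda = {64: 55.0, 100: 62.0}
# Simplified "measurement": the produced amplitude tracks EGain directly.
print(correct_egain(100, 58.0, tda, lambda g: g))  # 62.0
```

In the apparatus, A2 is measured from the full spectrum envelope and harmonics rather than from EGain alone, and the gain correction is applied to the characteristic parameter as a whole.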
[0063] Further, although in FIG. 3 the amplitude is defined based only on the dynamics with reference to the table Tda, a table that defines the amplitude in accordance with the kind of phoneme can be used in addition to the table Tda; that is, a table that outputs different amplitude values for different phonemes even when the dynamics are the same. Similarly, a table that defines the amplitude in accordance with frequency in addition to the dynamics can also be used.
[0064] Next, an operation of the singing voice synthesizing
apparatus according to a first embodiment of the present invention
is explained by referring a flow chart shown in FIG. 4.
[0065] The performance data storage unit 11 outputs frame data in
time sequential order. Since transition parts and long sound parts
appear by turns, the processes differ between the transition part
and the long sound part.
[0066] When frame data is input from the performance data storage
unit 11 (S1), the voice synthesis unit selector 12 judges whether
the frame data is related to a long sound part or an articulation
part (S2). In the case of the long sound part, previous
articulation data, later articulation data and stationary data are
transmitted to the previous articulation data storage unit 13, the
later articulation data storage unit 14 and the stationary data
storage unit 16, respectively (S3).
[0067] Then, the characteristic parameter interpolation unit 15
picks up the characteristic parameter of the last frame of the
previous articulation data stored in the previous articulation data
storage unit 13 and the characteristic parameter of the first frame
of the later articulation data stored in the later articulation
data storage unit 14. Then a characteristic parameter of the long
sound to be processed is generated by linear interpolation of these
two characteristic parameters (S4).
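The linear interpolation of step S4 can be sketched as follows; representing a characteristic parameter as a flat list of numbers, and spreading the interpolation evenly over a fixed frame count, are assumptions for illustration.

```python
# Sketch of step S4: linear interpolation between the last frame of the
# previous articulation data and the first frame of the later articulation
# data, producing the frames of the long sound part.

def interpolate_long_sound(prev_last, later_first, num_frames):
    """Linearly interpolate two characteristic parameter vectors
    (lists of equal length) over num_frames frames."""
    frames = []
    for i in range(num_frames):
        # t runs from 0.0 at the first frame to 1.0 at the last frame.
        t = i / (num_frames - 1) if num_frames > 1 else 0.0
        frames.append([(1 - t) * a + t * b
                       for a, b in zip(prev_last, later_first)])
    return frames
```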
[0068] Also, the characteristic parameter of the stationary data
stored in the stationary data storage unit 16 is provided to the
characteristic parameter change detecting unit 18, and a change
component of the characteristic parameter of the stationary data is
extracted (S5). This change component is added to the
characteristic parameter output from the characteristic parameter
interpolation unit 15 in the adding unit K1 (S6). This added value
is output to the characteristic parameter correction unit 21 as a
characteristic parameter of a long sound via the switch SW1, and
correction of the characteristic parameter is executed (S9). On the
other hand, the stochastic component of stationary data stored in
the stationary data storage unit 16 is provided to the adding unit
K2 via the switch SW2.
[0069] The spectrum envelope generating unit 23 generates a
spectrum envelope for this corrected characteristic parameter. The
harmonics amplitude/phase calculating unit 24 calculates an
amplitude or a phase of each harmonic generated in the harmonic
chain generating unit 22 in accordance with the spectrum envelope
generated in the spectrum envelope generating unit 23. This
calculated result is output to the adding unit K2 as a chain of
parameters (deterministic component) of the processed long sound
part.
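One common way to realize the amplitude part of this calculation is to sample the spectrum envelope at each harmonic of the pitch; the sketch below assumes that approach and a toy envelope function, neither of which is specified in the disclosure.

```python
# Sketch of the harmonics amplitude calculation in unit 24, assuming the
# amplitude of each harmonic is the spectrum envelope value at its frequency.

def harmonic_amplitudes(pitch_hz, spectrum_envelope, max_freq_hz):
    """Return the amplitude of each harmonic k*pitch_hz up to max_freq_hz,
    sampled from spectrum_envelope (a function of frequency in Hz)."""
    amps = []
    k = 1
    while k * pitch_hz <= max_freq_hz:
        amps.append(spectrum_envelope(k * pitch_hz))
        k += 1
    return amps

# Toy triangular envelope, purely illustrative.
env = lambda f: max(0.0, 1.0 - f / 1000.0)
amps = harmonic_amplitudes(200.0, env, 1000.0)
```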
[0070] On the other hand, in the case that the obtained frame data
is judged to be a transition part (NO) in Step S2, articulation
data of the transition part is stored in the articulation data
storage unit 17 (S7).
[0071] Next, the frame reading unit 19 reads articulation data
stored in the articulation data storage unit 17 as frame data in
accordance with a time indicated by a timer 27, and divides it into
a characteristic parameter and a stochastic component for output.
The characteristic parameter is output to the characteristic
parameter correction unit 21, and the stochastic component is
output to the adding unit K2. This characteristic parameter of the
transition part undergoes the same processing as the characteristic
parameter of the above long sound in the characteristic parameter
correction unit 21, the spectrum envelope generating unit 23, the
harmonics amplitude/phase calculating unit 24 and the like.
[0072] Moreover, the switches SW1 and SW2 switch depending on the
kind of data being processed. The switch SW1 connects the
characteristic parameter correction unit 21 to the adding unit K1
during processing of the long sound and connects the characteristic
parameter correction unit 21 to the frame reading unit 19 during
processing of the transition part. The switch SW2 connects the
adding unit K2 to the stationary data storage unit 16 during
processing of the long sound and connects the adding unit K2 to the
frame reading unit 19 during processing of the transition part.
[0073] When the characteristic parameter of the transition part or
the long sound and the stochastic component have been calculated,
the added value is processed in the inverse FFT unit 25, and it is
overlapped in the overlapping unit 26 to output a final synthesized
waveform (S10).
[0074] The singing voice synthesizing apparatus according to a
second embodiment of the present invention is explained based on
FIGS. 5A to 5C, which are a block diagram of the singing voice
synthesizing apparatus and an example of a phoneme database
according to the second embodiment. An explanation of the same
parts as in the first embodiment is omitted by giving them the same
symbols. One of the differences from the first embodiment is that
the articulation data and the stationary data stored in the phoneme
database are assigned characteristic parameters and stochastic
components that differ in accordance with the pitch.
[0075] Also, the pitch defining unit 20 defines a pitch based on
musical note information in the performance data, and outputs the
result to the voice synthesis unit selector 12.
[0076] As for the operation of the second embodiment, the pitch
defining unit 20 defines the pitch of the frame data to be
processed based on the musical note from the performance data
storage unit 11, and outputs the result to the voice synthesis unit
selector 12. The voice synthesis unit selector 12 reads the
articulation data and stationary data which are the closest to the
defined pitch and to the phoneme information in the lyrics
information. The subsequent process is the same as that of the
first embodiment.
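The closest-pitch selection performed by the voice synthesis unit selector 12 can be sketched as follows; the flat dictionary keyed by (phoneme, pitch) is a hypothetical database layout chosen for the sketch, not the patent's actual storage format.

```python
# Sketch of nearest-pitch selection in the voice synthesis unit selector 12,
# assuming a hypothetical flat database keyed by (phoneme, pitch_hz).

def select_closest_pitch_data(database, phoneme, target_pitch):
    """Among entries for the given phoneme, return the (pitch, data)
    pair whose pitch is closest to the defined target pitch."""
    candidates = [(p, data) for (ph, p), data in database.items()
                  if ph == phoneme]
    if not candidates:
        raise KeyError(phoneme)
    return min(candidates, key=lambda c: abs(c[0] - target_pitch))

db = {
    ("a", 220.0): "a_low",
    ("a", 440.0): "a_high",
    ("i", 330.0): "i_mid",
}
```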
[0077] The singing voice synthesizing apparatus according to a
third embodiment of the present invention is explained based on
FIGS. 6A to 6C, which are a block diagram of the singing voice
synthesizing apparatus and an example of a phoneme database
according to the third embodiment. An explanation of the same parts
as in the first embodiment is omitted by giving them the same
symbols. One of the differences from the first embodiment is that,
in addition to the phoneme database 10, an expression database 30
in which vibrato information and the like are stored is provided,
together with an expression template selector 30A that selects an
appropriate vibrato template from the expression database 30 based
on expression information in the performance data.
[0078] Also, the pitch defining unit 20 defines a pitch based on
musical note information in the performance data and vibrato data
from the expression template selector 30A.
[0079] As for the operation of the third embodiment, the voice
synthesis unit selector 12 reads articulation data and stationary
data from the phoneme database 10 based on the musical note from
the performance data storage unit 11, in the same manner as in the
first embodiment. The subsequent process is the same as that of the
first embodiment.
[0080] On the other hand, the expression template selector 30A
reads the most suitable vibrato data from the expression database
30 based on expression information from the performance data
storage unit 11. The pitch is defined by the pitch defining unit 20
based on the read vibrato data and the musical note information in
the performance data.
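One simple way the pitch defining unit 20 might combine the note pitch with a vibrato template is sinusoidal modulation in cents; the sinusoidal model and the parameter names (depth in cents, rate in Hz) are illustrative assumptions, since the disclosure does not fix the form of the vibrato data.

```python
import math

# Sketch of pitch definition combining a note pitch with a vibrato
# template. The sinusoidal model and cent-based depth are assumptions.

def apply_vibrato(base_pitch_hz, depth_cents, rate_hz, t):
    """Pitch at time t (seconds) with sinusoidal vibrato applied:
    the pitch deviates by up to depth_cents around the note pitch."""
    cents = depth_cents * math.sin(2.0 * math.pi * rate_hz * t)
    return base_pitch_hz * 2.0 ** (cents / 1200.0)
```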
[0081] The present invention has been described in connection with
the preferred embodiments. The invention is not limited only to the
above embodiments. It is apparent that various modifications,
improvements, combinations, and the like can be made by those
skilled in the art.
* * * * *