U.S. patent application number 10/531632 was filed with the patent office on 2006-06-08 for a method for generating and consuming a 3D audio scene with extended spatiality of a sound source.
Invention is credited to Chie-Teuk Ahn, Dae-Young Jang, Kyeong-Ok Kang, Jin-Woong Kim, Jeong-Il Seo.
Application Number: 20060120534 (10/531632)
Document ID: /
Family ID: 36574228
Filed Date: 2006-06-08

United States Patent Application 20060120534
Kind Code: A1
Seo; Jeong-Il; et al.
June 8, 2006
Method for generating and consuming 3D audio scene with extended spatiality of sound source
Abstract
A method of generating and consuming a 3D audio scene with extended spatiality of a sound source describes the shape and size attributes of the sound source. The method includes the steps of: generating an audio object; and generating 3D audio scene description information including attributes of the sound source of the audio object.
Inventors: Seo; Jeong-Il (Daejon, KR); Jang; Dae-Young (Daejon, KR); Kang; Kyeong-Ok (Daejon, KR); Kim; Jin-Woong (Daejon, KR); Ahn; Chie-Teuk (Daejon, KR)
Correspondence Address: BLAKELY SOKOLOFF TAYLOR & ZAFMAN, 12400 WILSHIRE BOULEVARD, SEVENTH FLOOR, LOS ANGELES, CA 90025-1030, US
Family ID: 36574228
Appl. No.: 10/531632
Filed: October 15, 2003
PCT Filed: October 15, 2003
PCT No.: PCT/KR03/02149
371 Date: October 31, 2005
Current U.S. Class: 381/19
Current CPC Class: H04S 7/302 (20130101); H04S 2420/13 (20130101); H04S 3/002 (20130101); H04S 3/00 (20130101)
Class at Publication: 381/019
International Class: H04R 5/00 (20060101); H04R 005/00

Foreign Application Data
Date | Code | Application Number
Oct 15, 2002 | KR | 1020020062962
Oct 14, 2003 | KR | 1020030071345
Claims
1. A method for generating a three-dimensional audio scene with a
sound source whose spatiality is extended, comprising the steps of:
a) generating a sound object; and b) generating three-dimensional
audio scene description information including sound source
characteristics information for the sound object, wherein the sound
source characteristics information includes spatiality extension
information of the sound source which is information on the size
and shape of the sound source expressed in a three-dimensional
space.
2. The method as recited in claim 1, wherein the spatiality extension information of the sound source includes sound source dimension information that is expressed as an x component, a y component and a z component of a three-dimensional rectangular coordinate system.
3. The method as recited in claim 2, wherein the spatiality
extension information of the sound source further includes
geometrical center location information of the sound source
dimension information.
4. The method as recited in claim 2, wherein the spatiality extension information of the sound source further includes direction information of the sound source and describes a three-dimensional audio scene by extending the spatiality of the sound source in a direction perpendicular to the direction of the sound source.
5. A method for consuming a three-dimensional audio scene with a
sound source whose spatiality is extended, comprising the steps of:
a) receiving a sound object and three-dimensional audio scene
description information including sound source characteristics
information for the sound object; and b) outputting the sound
object based on the three-dimensional audio scene description
information, wherein the sound source characteristics information
includes spatiality extension information which is information on
the size and shape of the sound source expressed in a
three-dimensional space.
6. The method as recited in claim 5, wherein the spatiality extension information of the sound source includes sound source dimension information that is expressed as an x component, a y component and a z component of a three-dimensional rectangular coordinate system.
7. The method as recited in claim 6, wherein the spatiality
extension information of the sound source further includes
geometrical center location information of the sound source
dimension information.
8. The method as recited in claim 6, wherein the spatiality extension information of the sound source further includes direction information of the sound source and describes a three-dimensional audio scene by extending the spatiality of the sound source in a direction perpendicular to the direction of the sound source.
9. A three-dimensional audio scene data stream with a sound source
whose spatiality is extended, comprising: a sound object; and
three-dimensional audio scene description information including
sound source characteristics information for the sound object data,
wherein the sound source characteristics information includes
spatiality extension information which is information on the size
and shape of the sound source expressed in a three-dimensional
space.
10. The data stream as recited in claim 9, wherein the spatiality extension information of the sound source includes sound source dimension information that is expressed as an x component, a y component and a z component of a three-dimensional rectangular coordinate system.
11. The data stream as recited in claim 9, wherein the spatiality
extension information of the sound source further includes
geometrical center location information of the sound source
dimension information.
12. The data stream as recited in claim 9, wherein the spatiality extension information of the sound source further includes direction information of the sound source and describes a three-dimensional audio scene by extending the spatiality of the sound source in a direction perpendicular to the direction of the sound source.
Description
TECHNICAL FIELD
[0001] The present invention relates to a method for generating and consuming a three-dimensional audio scene having a sound source whose spatiality is extended; and, more particularly, to a method for generating and consuming a three-dimensional audio scene in which the spatiality of a sound source is extended.
BACKGROUND ART
[0002] Generally, a content providing server encodes contents in a
predetermined encoding method and transmits the encoded contents to
content consuming terminals that consume the contents. The content
consuming terminals decode the contents in a predetermined decoding
method and output the transmitted contents.
[0003] Accordingly, the content providing server includes an encoding unit for encoding the contents and a transmission unit for transmitting the encoded contents. The content consuming terminals, in turn, include a reception unit for receiving the transmitted encoded contents, a decoding unit for decoding the encoded contents, and an output unit for outputting the decoded contents to users.
[0004] Many encoding/decoding methods for audio/video signals are known, and among them, an encoding/decoding method based on Moving Picture Experts Group 4 (MPEG-4) is widely used these days. MPEG-4 is a technical standard for data compression and restoration technology defined by the MPEG to transmit moving pictures at a low transmission rate.
[0005] According to MPEG-4, an object of an arbitrary shape can be
encoded and the content consuming terminals consume a scene
composed of a plurality of objects. Therefore, MPEG-4 defines Audio
Binary Format for Scene (Audio BIFS) with a scene description
language for designating a sound object expression method and the
characteristics thereof.
[0006] Meanwhile, along with developments in video, users want to consume contents with more lifelike sound and video quality. In MPEG-4 AudioBIFS, an AudioFX node and a DirectiveSound node are used to express the spatiality of a three-dimensional audio scene. In these nodes, the modeling of a sound source usually depends on a point source, because a point source can be described and embodied easily in a three-dimensional sound space.
[0007] Actual sound sources, however, tend to have one or more dimensions rather than being points in the literal sense. More important here is that the shape of a sound source can be recognized by human beings, as disclosed by J. Blauert, "Spatial Hearing," The MIT Press, Cambridge, Mass., 1996.
[0008] For example, the sound of waves breaking along a coastline stretched in a straight line is perceived as a linear sound source rather than a point sound source. To improve the realism of a three-dimensional audio scene using AudioBIFS, the size and shape of the sound source should be expressed. Otherwise, the realism of a sound object in the three-dimensional audio scene is seriously damaged.
[0009] That is, the spatiality of a sound source should be describable so that a three-dimensional audio scene can be endowed with a sound source of one or more dimensions.
DISCLOSURE OF INVENTION
[0010] It is, therefore, an object of the present invention to
provide a method for generating and consuming a three-dimensional
audio scene having a sound source whose spatiality is extended by
adding sound source characteristics information having information
on extending the spatiality of the sound source to
three-dimensional audio scene description information.
[0011] The other objects and advantages of the present invention
can be easily recognized by those of ordinary skill in the art from
the drawings, detailed description and claims of the present
specification.
[0012] In accordance with one aspect of the present invention,
there is provided a method for generating a three-dimensional audio
scene with a sound source whose spatiality is extended, including
the steps of: a) generating a sound object; and b) generating
three-dimensional audio scene description information including
sound source characteristics information for the sound object,
wherein the sound source characteristics information includes
spatiality extension information of the sound source which is
information on the size and shape of the sound source expressed in
a three-dimensional space.
[0013] In accordance with another aspect of the present invention,
there is provided a method for consuming a three-dimensional audio
scene with a sound source whose spatiality is extended, including
the steps of: a) receiving a sound object and three-dimensional
audio scene description information including sound source
characteristics information for the sound object; and b) outputting
the sound object based on the three-dimensional audio scene
description information, wherein the sound source characteristics
information includes spatiality extension information which is
information on the size and shape of a sound source expressed in a
three-dimensional space.
BRIEF DESCRIPTION OF DRAWINGS
[0014] The above and other objects and features of the present
invention will become apparent from the following description of
the preferred embodiments given in conjunction with the
accompanying drawings, in which:
[0015] FIG. 1 is a diagram illustrating various shapes of sound
sources;
[0016] FIG. 2 is a diagram describing a method for expressing
spatial sound source by grouping successive point sound
sources;
[0017] FIG. 3 shows an example where spatiality extension
information is added to a "DirectiveSound" node of AudioBIFS in
accordance with the present invention;
[0018] FIG. 4 is a diagram illustrating how a sound source is
extended in accordance with the present invention; and
[0019] FIG. 5 is a diagram depicting the distributions of point
sound sources based on the shapes of various sound sources in
accordance with the present invention.
BEST MODE FOR CARRYING OUT THE INVENTION
[0020] Other objects and aspects of the invention will become
apparent from the following description of the embodiments with
reference to the accompanying drawings, which is set forth
hereinafter.
[0021] The following description merely exemplifies the principles of the present invention. Even if they are not described or illustrated explicitly in the present specification, one of ordinary skill in the art can embody the principles of the present invention and devise various apparatuses within its concept and scope.
[0022] The conditional terms and embodiments presented in the present specification are intended only to make the concept of the present invention understood; the invention is not limited to the embodiments and conditions mentioned in the specification.
[0023] In addition, all the detailed description on the principles,
viewpoints and embodiments and particular embodiments of the
present invention should be understood to include structural and
functional equivalents to them. The equivalents include not only
currently known equivalents but also those to be developed in
future, that is, all devices invented to perform the same function,
regardless of their structures.
[0024] For example, block diagrams of the present invention should
be understood to show a conceptual viewpoint of an exemplary
circuit that embodies the principles of the present invention.
Similarly, all the flowcharts, state conversion diagrams, pseudo
codes and the like can be expressed substantially in a
computer-readable media, and whether or not a computer or a
processor is described distinctively, they should be understood to
express various processes operated by a computer or a
processor.
[0025] Functions of various devices illustrated in the drawings
including a functional block expressed as a processor or a similar
concept can be provided not only by using hardware dedicated to the
functions, but also by using hardware capable of running proper
software for the functions. When a function is provided by a
processor, the function may be provided by a single dedicated
processor, single shared processor, or a plurality of individual
processors, part of which can be shared.
[0026] The explicit use of a term such as `processor` or `control`, or a similar concept, should not be understood to refer exclusively to a piece of hardware capable of running software; it should be understood to implicitly include a digital signal processor (DSP), hardware, and ROM, RAM and non-volatile memory for storing software. Other known and commonly used hardware may be included as well.
[0027] In the claims of the present specification, an element expressed as a means for performing a function described in the detailed description is intended to include every way of performing that function, including all forms of software, combinations of circuits for performing the intended function, firmware/microcode and the like. To perform the intended function, the element cooperates with a proper circuit for executing the software. The present invention defined by the claims includes diverse means for performing particular functions, and these means are connected with each other in the manner required by the claims. Therefore, any means that can provide the function should be understood to be an equivalent of what is set forth in the present specification.
[0028] The same reference numeral is given to the same element even when the element appears in different drawings. In addition, detailed description of related prior art is omitted where it would obscure the point of the present invention. Hereinafter, preferred embodiments of the present invention will be described in detail.
[0029] FIG. 1 is a diagram illustrating various shapes of sound sources. Referring to FIG. 1, a sound source can be a point, a line, a surface, or a space having a volume. Since a sound source can have an arbitrary shape and size, describing it is very complicated. However, if the shape of the sound source to be modeled is constrained, the sound source can be described with far less complexity.
[0030] In the present invention, it is assumed that point sound sources are distributed uniformly over the dimension of a virtual sound source in order to model sound sources of various shapes and sizes. As a result, sound sources of various shapes and sizes can be expressed as continuous arrays of point sound sources. Here, the location of each point sound source in the virtual object can be calculated using the vector location of the sound source defined in the three-dimensional scene.
[0031] When a spatial sound source is modeled with a plurality of point sound sources, the spatial sound source should be described using a node defined in AudioBIFS, hereinafter referred to as an AudioBIFS node. When an AudioBIFS node is used, any effect can be included in the three-dimensional scene. Therefore, an effect corresponding to the spatial sound source can be programmed through the AudioBIFS node and inserted into the three-dimensional scene.
[0032] However, this requires a very complicated Digital Signal Processing (DSP) algorithm, and it is very troublesome to control the dimension of the spatial sound source.
[0033] Alternatively, point sound sources distributed over the limited dimension of an object can be grouped using AudioBIFS, and the spatial location and direction of the sound sources can be changed by transforming the sound source group. First, the characteristics of the point sound sources are described using a plurality of "DirectiveSound" nodes. The locations of the point sound sources are calculated so that they are distributed uniformly on the surface of the object.
[0034] Subsequently, the point sound sources are placed with a spatial distance that can eliminate spatial aliasing, as disclosed by A. J. Berkhout, D. de Vries, and P. Vogel, "Acoustic control by wave field synthesis," J. Acoust. Soc. Am., Vol. 93, No. 5, pp. 2764-2778, May 1993. The spatial sound source can be vectorized by grouping the point sound sources with a group node.
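The anti-aliasing spacing mentioned above can be estimated with the usual wave field synthesis rule of thumb, under the assumption that the highest frequency to be reproduced is known. A minimal sketch; the function name and interface are illustrative, not part of any standard:

```python
def max_source_spacing(f_max_hz, c=343.0):
    """Largest spacing (in meters) between adjacent point sound
    sources that avoids spatial aliasing up to f_max_hz, using the
    common wave field synthesis rule: spacing <= c / (2 * f_max).
    c is the speed of sound in air (m/s)."""
    return c / (2.0 * f_max_hz)
```

For instance, reproducing content up to 343 Hz without spatial aliasing would allow point sources up to 0.5 m apart under this rule.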
[0035] FIG. 2 is a diagram describing a method for expressing a spatial sound source by grouping successive point sound sources. In the drawing, a virtual continuous linear sound source is modeled by using three point sound sources distributed uniformly along the axis of the linear sound source.
[0036] The locations of the point sound sources are determined to be (x0-dx, y0-dy, z0-dz), (x0, y0, z0), and (x0+dx, y0+dy, z0+dz) according to the concept of the virtual sound source. Here, dx, dy and dz can be calculated from the vector between the listener and the location of the sound source and the angle between the direction vectors of the sound source, where the vector and the angle are defined in an "angle" field and a "direction" field.
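The three positions of FIG. 2 can be sketched as the source center plus uniform offsets along the axis of the linear source. The function name and parameters below are illustrative assumptions, not drawn from the AudioBIFS specification:

```python
import math

def linear_source_points(center, axis, spacing, n=3):
    """Place n point sound sources uniformly along a virtual linear
    sound source.  `center` is (x0, y0, z0), `axis` gives the
    direction of the line (normalized internally), and `spacing` is
    the distance between adjacent point sources."""
    norm = math.sqrt(sum(a * a for a in axis))
    ux, uy, uz = (a / norm for a in axis)
    x0, y0, z0 = center
    # Offsets ..., -spacing, 0, +spacing, ... around the center,
    # matching (x0-dx, y0-dy, z0-dz), (x0, y0, z0), (x0+dx, y0+dy, z0+dz).
    half = (n - 1) / 2.0
    return [(x0 + (k - half) * spacing * ux,
             y0 + (k - half) * spacing * uy,
             z0 + (k - half) * spacing * uz) for k in range(n)]
```

With n=3 this reproduces exactly the three-point arrangement described above; larger n gives a denser model of the same line.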
[0037] FIG. 2 thus describes a spatial sound source by using a plurality of point sound sources, and AudioBIFS appears able to support the description of such a scene. However, this method requires defining too many unnecessary sound objects, because many objects must be defined to model one single object.
[0038] Given that the genuine aim of the hybrid description of Moving Picture Experts Group 4 (MPEG-4) is a more object-oriented representation, it is desirable to combine the point sound sources used to model one spatial sound source and reproduce them as one single object.
[0039] In accordance with the present invention, a new field is
added to a "DirectiveSound" node of the AudioBIFS to describe the
shape and size attributes of a sound source. FIG. 3 shows an
example where spatiality extension information is added to a
"DirectiveSound" node of AudioBIFS in accordance with the present
invention.
[0040] Referring to FIG. 3, a new rendering scheme corresponding to the value of a "SourceDimensions" field is applied to the "DirectiveSound" node. The "SourceDimensions" field also carries shape information of the sound source. If the value of the "SourceDimensions" field is "0,0,0", the sound source is a single point, and no additional processing for extending the sound source is applied to the "DirectiveSound" node. If the value of the "SourceDimensions" field is other than "0,0,0", the dimension of the sound source is extended virtually.
[0041] The location and direction of the sound source are defined in a "location" field and a "direction" field, respectively, of the "DirectiveSound" node. The dimension of the sound source is extended perpendicular to the vector defined in the "direction" field, based on the value of the "SourceDimensions" field.
[0042] The "location" field defines the geometrical center of the extended sound source, whereas the "SourceDimensions" field defines the three-dimensional size of the sound source. In short, the size of the spatially extended sound source is determined by the values of Δx, Δy and Δz.
[0043] FIG. 4 is a diagram illustrating how a sound source is extended in accordance with the present invention. As illustrated in the drawing, the value of the "SourceDimensions" field is (0, Δy, Δz), with Δy and Δz nonzero (Δy≠0, Δz≠0). This indicates a surface sound source having an area of Δy×Δz.
[0044] The illustrated sound source is extended in a direction perpendicular to the vector defined in the "direction" field, based on the values of the "SourceDimensions" field, i.e., (0, Δy, Δz), thereby forming a surface sound source. As shown above, when the dimension and location of a sound source are defined, the point sound sources are located on the surfaces of the extended sound source. In the present invention, the locations of the point sound sources are calculated so that they are distributed uniformly on the surfaces of the extended sound source.
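The perpendicular extension can be sketched by constructing two orthonormal vectors spanning the plane normal to the direction vector and laying a uniform grid of point sources over the Δy-by-Δz area. All names below are illustrative assumptions; the standard does not prescribe this implementation:

```python
import math

def _cross(a, b):
    # Cross product of two 3-vectors.
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def surface_source_points(location, direction, dy, dz, ny, nz):
    """Distribute ny*nz point sound sources uniformly over a dy-by-dz
    surface centered at `location` and perpendicular to `direction`,
    as for a "SourceDimensions" value of (0, dy, dz)."""
    d = _unit(direction)
    # Pick a helper axis not parallel to d, then build an orthonormal
    # basis (u, v) spanning the plane perpendicular to d.
    helper = (0.0, 0.0, 1.0) if abs(d[2]) < 0.9 else (0.0, 1.0, 0.0)
    u = _unit(_cross(d, helper))
    v = _cross(d, u)                      # already unit length
    points = []
    for i in range(ny):
        a = -dy / 2.0 + dy * i / (ny - 1) if ny > 1 else 0.0
        for j in range(nz):
            b = -dz / 2.0 + dz * j / (nz - 1) if nz > 1 else 0.0
            points.append(tuple(location[k] + a * u[k] + b * v[k]
                                for k in range(3)))
    return points
```

Because the grid is built from vectors perpendicular to the "direction" field, every generated point lies in the plane of the extended surface, with "location" as its geometric center.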
[0045] FIGS. 5A to 5C are diagrams depicting the distributions of point sound sources based on the shapes of various sound sources in accordance with the present invention. The dimension of a sound source and the distance between point sound sources are free variables, so the size of the sound source perceived by a user can be formed freely.
[0046] For example, multi-track audio signals recorded using an array of microphones can be expressed by extending point sound sources linearly, as shown in FIG. 5A. In this case, the value of the "SourceDimensions" field is (0, 0, Δz).
[0047] Also, different sound signals can be expressed as an extension of a point sound source to generate a spread sound source. FIGS. 5B and 5C show a surface sound source expressed through the spread of the point sound source and a spatial sound source having a volume, respectively. In the case of FIG. 5B, the value of the "SourceDimensions" field is (0, Δy, Δz) and, in the case of FIG. 5C, it is (Δx, Δy, Δz).
[0048] When the dimension of a spatial sound source is defined as described above, the number of point sound sources (i.e., the number of input audio channels) determines the density of the point sound sources in the extended sound source.
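The relationship between channel count and density can be expressed as the number of point sources divided by the extent spanned by the nonzero dimensions. This is a sketch under the assumption that zero-valued dimensions are simply collapsed; the function is illustrative:

```python
def point_source_density(num_chan, dx, dy, dz):
    """Density of point sound sources over the extended source: the
    number of input audio channels divided by the extent (length,
    area, or volume) spanned by the nonzero "SourceDimensions"
    components."""
    extent = 1.0
    for d in (dx, dy, dz):
        if d > 0:
            extent *= d
    return num_chan / extent
```

For a surface source of (0, Δy, Δz) = (0, 2 m, 2 m) fed by 8 channels, this gives 2 point sources per square meter.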
[0049] If an "AudioSource" node is defined in a "source" field, the value of a "numChan" field may indicate the number of point sound sources used. The directivity defined in the "angle," "directivity" and "frequency" fields of the "DirectiveSound" node can be applied uniformly to all point sound sources included in the extended sound source.
[0050] The apparatus and method of the present invention can
produce more effective three-dimensional sounds by extending the
spatiality of sound sources of contents.
[0051] While the present invention has been described with respect
to certain preferred embodiments, it will be apparent to those
skilled in the art that various changes and modifications may be
made without departing from the scope of the invention as defined
in the following claims.
* * * * *