U.S. patent application number 11/686291 was filed with the patent office on March 14, 2007 and published on 2007-07-19 for a method and system for proximity based voice chat.
This patent application is currently assigned to Sony Computer Entertainment America Inc. Invention is credited to Masayuki Chatani and Mark Lester Jacob.
Application Number: 11/686291
Publication Number: 20070168359
Family ID: 25296986
Publication Date: July 19, 2007
United States Patent Application 20070168359, Kind Code A1
Jacob; Mark Lester; et al.
METHOD AND SYSTEM FOR PROXIMITY BASED VOICE CHAT
Abstract
A method for use in processing audio includes receiving audio
data associated with each of two or more characters in a virtual
world, determining a proximity of the other characters in the
virtual world for each of the characters, altering the received
audio data based on the determined proximity of another one of the
characters for each of the characters, and providing the altered
audio data to a client associated with the other character for
which the audio data was altered for each of the characters.
Inventors: Jacob; Mark Lester (San Diego, CA); Chatani; Masayuki (Foster City, CA)
Correspondence Address: FITCH EVEN TABIN AND FLANNERY, 120 SOUTH LA SALLE STREET, SUITE 1600, CHICAGO, IL 60603-3406, US
Assignee: Sony Computer Entertainment America Inc., Foster City, CA
Filed: March 14, 2007
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
09846115 | Apr 30, 2001 |
11686291 | Mar 14, 2007 |
Current U.S. Class: 1/1; 704/E13.004; 707/999.01
Current CPC Class: G10L 13/033 20130101; A63F 2300/65 20130101; A63F 13/40 20140902; G10L 2015/226 20130101; A63F 13/215 20140902; A63F 2300/1081 20130101; A63F 13/34 20140902; G10L 2021/0135 20130101; A63F 13/12 20130101
Class at Publication: 707/010
International Class: G06F 17/30 20060101; G06F017/30
Claims
1. A method for use in processing audio, comprising: receiving
audio data associated with each of two or more characters in a
virtual world; for each of the characters, determining a proximity
of the other characters in the virtual world; for each of the
characters, altering the received audio data based on the
determined proximity of another one of the characters; and for each
of the characters, providing the altered audio data to a client
associated with the other character for which the audio data was
altered.
2. A method in accordance with claim 1, further comprising: for
each of the characters, mixing the altered audio data into a
multi-channel format.
3. A method in accordance with claim 2, further comprising: for
each of the characters, encoding the multi-channel format
stream.
4. A method in accordance with claim 1, further comprising: in each
of the clients, mixing the altered audio data with other audio and
playing it out through one or more speakers.
5. A method in accordance with claim 1, wherein the step of
altering the received audio data comprises scaling volume levels of
the audio data.
6. A method in accordance with claim 1, wherein the step of
altering the received audio data comprises introducing effects into
the audio data.
7. A method in accordance with claim 6, wherein the effects
comprise a delay effect.
8. A method for use in processing audio, comprising: receiving
audio data from each of a plurality of client computers coupled to
an interactive network, wherein each client computer is associated
with a character represented in a program executed on each
computer; determining a relative location of the characters in an
environment defined by the program; altering characteristics of the
audio data received from each of the client computers based upon
the determined relative locations of the characters in the
environment defined by the program; and providing the altered audio
data received from each of the client computers to another one of
the client computers corresponding to the character for which the
audio data was altered.
9. A method in accordance with claim 8, further comprising: mixing
the altered audio data received from each of the client computers
into a multi-channel format.
10. A method in accordance with claim 9, further comprising:
encoding each of the multi-channel format streams.
11. A method in accordance with claim 8, further comprising: in
each of the client computers, mixing the altered audio data with
other audio and playing it out through one or more speakers.
12. A method in accordance with claim 8, wherein the step of
altering characteristics of the audio data comprises introducing
effects into the audio data.
13. A method in accordance with claim 12, wherein the effects
comprise a delay effect.
14. A system for use in processing audio, comprising: a server that
is configured to: receive audio data associated with each of two or
more characters in a virtual world; for each of the characters,
determine a proximity of the other characters in the virtual world;
for each of the characters, alter the received audio data based on
the determined proximity of another one of the characters; and for
each of the characters, provide the altered audio data to a client
associated with the other character for which the audio data was
altered.
15. A system in accordance with claim 14, wherein the server is
further configured to: for each of the characters, mix the altered
audio data into a multi-channel format.
16. A system in accordance with claim 14, wherein the server is
further configured to: for each of the characters, encode the
multi-channel format stream.
17. A system in accordance with claim 14, wherein the server is
further configured to: in each of the clients, mix the altered
audio data with other audio and play it out through one or more
speakers.
18. A system in accordance with claim 14, wherein the altering the
received audio data comprises scaling volume levels of the audio
data.
19. A system in accordance with claim 14, wherein the altering the
received audio data comprises introducing effects into the audio
data.
20. A system in accordance with claim 19, wherein the effects
comprise a delay effect.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation-in-part (CIP) of U.S.
patent application Ser. No. 09/846,115, filed on Apr. 30, 2001,
entitled "ALTERING NETWORK TRANSMITTED CONTENT DATA BASED UPON USER
SPECIFIED CHARACTERISTICS", which is hereby incorporated by
reference herein in its entirety.
FIELD OF THE INVENTION
[0002] The present invention relates generally to computer
networks, and more specifically, to a system for transforming data
transmitted over a network through characteristics specified by a
user.
BACKGROUND OF THE INVENTION
[0003] The basic functions of a computer network are to transmit,
exchange or store data transmitted among computers coupled to the
network. Most network implementations use a computer network simply
as a point-to-point system to route and channel information among
the networked computers. Some processes, such as compression or
encryption techniques that speed transmission rates or enhance
transmission security, may be implemented on the transmitted data.
In general, however, relatively little processing is performed on
most data once it is transmitted from the sending terminal. Data is
typically processed at the sending terminal and transmitted to the
receiving terminal in its processed form. Standard network
transmission systems therefore do not provide flexibility or
opportunity for a receiver or third party to transform or process
the data according to the receiving party's needs.
[0004] Present communication systems also typically do not provide
effective mechanisms in which the relative location of various
users is reflected in the audio output of characters representing
the users in a networked game or other application.
[0005] What is needed, therefore, is a system that allows
transmitted content data to be processed or transformed according
to a receiver's needs after it has been generated and transmitted
by a sending terminal.
SUMMARY OF THE INVENTION
[0006] One embodiment provides a method for use in processing
audio, comprising: receiving audio data associated with each of two
or more characters in a virtual world; for each of the characters,
determining a proximity of the other characters in the virtual
world; for each of the characters, altering the received audio data
based on the determined proximity of another one of the characters;
and for each of the characters, providing the altered audio data to
a client associated with the other character for which the audio
data was altered.
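The per-character loop of this embodiment can be sketched in Python as follows. This is a minimal sketch under assumptions the application does not state: characters have 2-D positions, each client delivers one equal-length frame of float samples per tick, and proximity is turned into a volume scale by a hypothetical inverse-distance rule with a made-up 50-unit audible range. The names `Character`, `proximity_gain`, and `mix_for_each_listener` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class Character:
    name: str
    x: float  # position in the virtual world (assumed 2-D)
    y: float


def proximity_gain(distance: float, ref: float = 1.0, max_range: float = 50.0) -> float:
    """Hypothetical volume rule: full volume within `ref`, 1/d falloff, silent beyond `max_range`."""
    if distance >= max_range:
        return 0.0
    return min(1.0, ref / max(distance, 1e-9))


def mix_for_each_listener(characters, frames):
    """Return one altered, summed frame per listening character.

    `frames` maps each character name to its received frame of samples;
    all frames are assumed to have the same length.
    """
    frame_len = len(next(iter(frames.values())))
    mixes = {}
    for listener in characters:
        mix = [0.0] * frame_len
        for speaker in characters:
            if speaker.name == listener.name:
                continue  # a client does not get its own voice back
            d = math.hypot(speaker.x - listener.x, speaker.y - listener.y)
            g = proximity_gain(d)  # alter the audio according to proximity
            for i, sample in enumerate(frames[speaker.name]):
                mix[i] += g * sample
        mixes[listener.name] = mix  # would then be sent to the listener's client
    return mixes
```

Summing into one mono frame per listener is only one option; claims 2 and 4 describe mixing the altered audio into a multi-channel format on the server, or mixing it with other audio on the client, instead.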
[0007] Another embodiment provides a method for use in processing
audio, comprising: receiving audio data from each of a plurality of
client computers coupled to an interactive network, wherein each
client computer is associated with a character represented in a
program executed on each computer; determining a relative location
of the characters in an environment defined by the program;
altering characteristics of the audio data received from each of
the client computers based upon the determined relative locations
of the characters in the environment defined by the program; and
providing the altered audio data received from each of the client
computers to another one of the client computers corresponding to
the character for which the audio data was altered.
[0008] Another embodiment provides a system for use in processing
audio, comprising: a server that is configured to: receive audio
data associated with each of two or more characters in a virtual
world; for each of the characters, determine a proximity of the
other characters in the virtual world; for each of the characters,
alter the received audio data based on the determined proximity of
another one of the characters; and for each of the characters,
provide the altered audio data to a client associated with the
other character for which the audio data was altered.
[0009] Other objects, features, and advantages of the present
invention will be apparent from the accompanying drawings and from
the detailed description that follows below.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] The present invention is illustrated by way of example and
not limitation in the figures of the accompanying drawings, in
which like references indicate similar elements, and in which:
[0011] FIG. 1 illustrates a block diagram of a computer network
system that implements embodiments of the present invention;
[0012] FIG. 2 illustrates a block diagram of a network that
includes a content data conversion process for text data, according
to an embodiment of the present invention;
[0013] FIG. 3 illustrates a block diagram of a network that
includes a content data conversion process for voice data,
according to an embodiment of the present invention;
[0014] FIG. 4 is a flow diagram illustrating the processing of data
through the voice conversion process illustrated in FIG. 3,
according to one embodiment of the present invention;
[0015] FIG. 5 illustrates a character profile setup input screen
displayed in a graphical user interface system, according to one
embodiment of the present invention;
[0016] FIG. 6 illustrates a networked game environment in which
user game consoles communicate over a network, according to one
embodiment of the present invention;
[0017] FIG. 7 illustrates a networked game environment in which
user game consoles communicate over a network, according to an
alternative embodiment of the present invention;
[0018] FIG. 8 is a flow diagram illustrating a method for use in
processing audio in accordance with an embodiment of the present
invention; and
[0019] FIG. 9 is a schematic diagram illustrating locations of
characters in a virtual world for illustrating an example
application of the method shown in FIG. 8 in accordance with an
embodiment of the present invention.
DETAILED DESCRIPTION
[0020] A data conversion system for processing downloaded content
over a computer network is described. In the following description,
for purposes of explanation, numerous specific details are set
forth in order to provide a thorough understanding of the present
invention. It will be evident, however, to one of ordinary skill in
the art, that the present invention may be practiced without these
specific details. In other instances, well-known structures and
devices are shown in block diagram form to facilitate explanation.
The description of preferred embodiments is not intended to limit
the scope of the claims appended hereto.
[0021] Aspects of the present invention may be implemented on one
or more computers executing software instructions. According to one
embodiment of the present invention, server and client computer
systems transmit and receive data over a computer network or
standard telephone line. The steps of accessing, downloading, and
manipulating the data, as well as other aspects of the present
invention are implemented by central processing units (CPU) in the
server and client computers executing sequences of instructions
stored in a memory. The memory may be a random access memory (RAM),
read-only memory (ROM), a persistent store, such as a mass storage
device, or any combination of these devices. Execution of the
sequences of instructions causes the CPU to perform steps according
to embodiments of the present invention.
[0022] The instructions may be loaded into the memory of the server
or client computers from a storage device or from one or more other
computer systems over a network connection. For example, a client
computer may transmit a sequence of instructions to the server
computer in response to a message transmitted to the client over a
network by the server. As the server receives the instructions over
the network connection, it stores the instructions in memory. The
server may store the instructions for later execution, or it may
execute the instructions as they arrive over the network
connection. In some cases, the downloaded instructions may be
directly supported by the CPU. In other cases, the instructions may
not be directly executable by the CPU, and may instead be executed
by an interpreter that interprets the instructions. In other
embodiments, hardwired circuitry may be used in place of, or in
combination with, software instructions to implement the present
invention. Thus, the present invention is not limited to any
specific combination of hardware circuitry and software, nor to any
particular source for the instructions executed by the server or
client computers.
[0023] FIG. 1 is a block diagram of a computer network system that
can be used to implement data transmission and conversion,
according to one embodiment of the present invention. The system
100 of FIG. 1 enables the transmission and conversion of content
data. The term "content data" in the context of the specification
and claims shall be understood to refer to any type of downloadable
data, which may consist of any one of text data, video linear
streaming data, such as motion picture data in MPEG or MPEG2
format; linear audio streaming data, such as music data in MP3
format; binary program data; voice data; or any combination of such
data or similar data. In general, content data does not include
services or data that are used solely to provide access to a
network, such as browser software or protocol handlers whose main
function is only to establish a network connection.
[0024] FIG. 1 illustrates a computer network system 100 that
implements one or more embodiments of the present invention. In
system 100, a network server computer 104 is coupled, directly or
indirectly, to one or more network client computers 102 through a
network 110. The network interface between server computer 104 and
client computer 102 may also include one or more routers. The
routers serve to buffer and route the data transmitted between the
server and client computers. Network 110 may be the Internet, a
Wide Area Network (WAN), a Local Area Network (LAN), intranet,
extranet, wireless network, or any combination thereof.
[0025] In one embodiment of the present invention, the server
computer 104 is a World-Wide Web (WWW) server that stores data in
the form of "web pages" and transmits these pages as Hypertext
Markup Language (HTML) files over the Internet network 110 to one
or more of the client computers 102. For this embodiment, the
client computer 102 runs a "web browser" program 114 to access the
web pages served by server computer 104. Additional web based
content can be provided to client computer 102 by separate content
providers, such as supplemental server 103.
[0026] In one embodiment of the present invention, server 104 in
network system 100 includes a download service management process
112 that is configured to handle download requests from a user.
Access to the server 104, which may comprise one of several
servers, is facilitated typically through a router on network 110
which directs requests to the download management server. When the
server 104 receives requests from a user, the server executes a
download of requested content from a contents database that is
stored internally or externally to the server. Along with
processing requests for downloading of content data, the server 104
may also retrieve the requesting user's customer data from a
customer database and attach it to the requested primary contents
or use it to modify content or transmission parameters for
particular users. This data is then transmitted via the network 110
by means of a known networking protocol standard, such as the File
Transfer Protocol (FTP).
[0027] In one embodiment of the present invention, wherein network
110 is the Internet, network server 104 also executes a web server
process 116 to provide HTML documents to client computers coupled
to network 110. To access the HTML files provided by server 104,
client computer 102 runs a web client process (typically a web
browser) 114 that accesses and provides links to web pages
available on server 104 and other Internet server sites. It should
be noted that a network system 100 that implements embodiments of
the present invention may include a larger number of interconnected
client and server computers than shown in FIG. 1.
[0028] The network 110 is normally a bi-directional digital
communications network that connects the user's terminal hardware
with the download management server provided on the server side of
the system. With current technologies, a CATV (cable television)
bi-directional network, ISDN (Integrated Services Digital Network),
DSL (Digital Subscriber Line), or xDSL high-speed networks are
examples of existing network infrastructures enabling the necessary
network connections for implementing embodiments of the present
invention.
[0029] The client computer of system 100 may comprise a personal
computer that includes a modem or network adapter, or it may
comprise a networked game console (entertainment system) that
utilizes a detachable storage medium therein, and a TV monitor or
any other suitable display device connected to the game console.
The modem or network adapter is a device that is used to connect
the client's terminal hardware, e.g., a game console, to the
network 110. For example, if network 110 is a
CATV network, the modem may be implemented as a cable modem device;
and if network 110 is an ISDN network, the modem may be implemented
as a terminal adapter.
[0030] The server can supply digital content such as voice data,
music clips, full-length audio and video programs, movies, still
picture data, and other similar types of content. The content might
further comprise promotional or advertising data associated with
the primary content, such as movie previews, demo games, sample
data, and other similar types of content.
[0031] In one embodiment, network system 100 includes a conversion
system that transforms or processes the data transmitted from the
server to the client to improve the user interface and quality of
entertainment. For the embodiment in which the transmitted data
comprises voice data, the conversion system can be used in various
IP telephony, network chat, video game, or 3D virtual chat
applications, among others.
[0032] FIG. 2 illustrates a block diagram of a network that
includes a content conversion process, according to one embodiment
of the present invention. For the embodiment illustrated in FIG. 2,
the data transmitted from the server comprises text data 201
generated by a server computer and transmitted to a client computer
over a network 210. The text data is converted into voice output
through a digital-to-analog (D/A) converter 208 coupled to the
client computer. The conversion system 202 includes a
conversion process 204 and a receiver preference database 206. In
one embodiment, the conversion system 202 is resident within the
client computer. Alternatively, the conversion system 202 can be
included within a separate computer coupled to the network and the
client computer.
[0033] The conversion process 204 includes circuits or processes
that convert the input text data to output data, as well as
processes that modify or transform the characteristics of the text
data. For example, for voice output, the conversion process can be
used to control various characteristics such as tone, accents,
intonation, and effects, such as echo, reverberation, and so on.
For speech output, the conversion process can control
characteristics such as language, dialect, expression, and so on.
For example, the conversion process 204 may include a translator
that translates speech in one language to another language. The
conversion process can also include processes that mimic the voice
characteristics of well-known characters or personalities.
[0034] FIG. 3 illustrates a block diagram of a network that
includes a content conversion process for content data that
comprises voice data, according to one embodiment of the present
invention. For the embodiment illustrated in FIG. 3, the data
transmitted from the server comprises voice data generated by a
server computer and transmitted to a client computer over a network
310. The voice data is first input through an analog-to-digital
(A/D) converter 302 for conversion into digital form. The voice
packets can be addressed in one of several ways, including Unicast,
Multicast, or broadcast format.
[0035] Alternatively, if the voice data includes data that is first
input into the server computer, the data can be digitized prior to
input to the server computer. For example, a microphone or other
input means may include an A/D converter to convert the voice data
to digital form prior to input to the server computer. The
digitized voice data is then transmitted over network 310 for
further processing by voice conversion means 312.
[0036] The voice of the transmitted data can be changed and sent to
other assigned user(s) over the network using a protocol such as
Voice over IP (VoIP). The voice can be changed based on various
factors such as virtual character talk parameters, or user provided
preferences. The digitized voice data is transformed into output
voice data at the client computer through digital-to-analog (D/A)
converter 304. The digitized voice data output from A/D converter
302 is processed through conversion system 312. The conversion
system 312 includes a voice conversion process 314 and a conversion
rules database 316. Alternatively, the digitized voice data can be
converted to analog form after output from the client computer
through an external D/A converter. Such a D/A converter may be
incorporated into speaker, headphone, or other sound output systems
that receive digital audio output from the client computer.
[0037] The voice conversion process 314 comprises processes that
alter or modify the digitized voice data output from A/D converter
302 in the server computer into converted voice data to be output
from D/A converter 304 on the client computer. FIG. 4 illustrates
the basic flow of data through the voice conversion process
illustrated in FIG. 3, according to one embodiment of the present
invention. In flow diagram 400, audio data 402 represents the
digitized voice data that is output from A/D converter 302 after being input
into the server computer through an input device, such as a
microphone. The digitized audio data 402 is converted into text
data 404 through a voice recognition process that converts
digitized audio data to equivalent digital text data. The text data
404 is then processed by a text conversion process 414 to produce
converted text data 406. This converted text data 406 is then
processed through a voice synthesis process 416 to produce audio
data 408. The audio data 408 comprises digital audio data that is
input to D/A converter 304 for conversion to analog voice to be
output through speakers on the client computer.
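The data flow of FIG. 4 can be sketched as a chain of pluggable stages. The sketch below assumes each stage is an ordinary function; the stand-in recognizer and synthesizer merely decode and encode UTF-8 bytes, because real speech recognition and synthesis are outside the scope of a short example. All names, including the word-substitution "dialect" step, are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical stage types mirroring FIG. 4.
Recognizer = Callable[[bytes], str]    # audio data 402 -> text data 404
TextStep = Callable[[str], str]        # one sub-process of text conversion 414
Synthesizer = Callable[[str], bytes]   # converted text 406 -> audio data 408


def convert_voice(audio: bytes, recognize: Recognizer,
                  text_steps: Sequence[TextStep], synthesize: Synthesizer) -> bytes:
    text = recognize(audio)
    for step in text_steps:  # e.g. translation, dialect, or expression changes
        text = step(text)
    return synthesize(text)  # result would go to D/A converter 304 on the client


# Toy stand-ins only: a real system would operate on PCM samples.
def toy_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


def toy_dialect(text: str) -> str:
    # Hypothetical word-substitution "dialect" step.
    return " ".join({"hello": "howdy"}.get(w, w) for w in text.split())
```

For example, `convert_voice(b"hello there", toy_recognize, [toy_dialect, str.upper], toy_synthesize)` returns `b"HOWDY THERE"`. Keeping the text steps as a list lets the conversion rules select which sub-processes run for a given character.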
[0038] The text conversion process 414 includes several
sub-processes that alter the original voice data to change the
voice as it is played back on the client computer. Such changes can
include modification of the original voice tone, accent,
intonation, and so on. The text conversion process can also include
processes that alter the substance of the input data, such as
language translation (e.g., English-French) or dialect translation.
Primarily, the text conversion process alters the expression of the
original voice data. The expression conveys a character's personality
or attributes (e.g., a male, female, or child speaker), the character's
circumstance or environment (e.g., in a tunnel or cave), and the
character's condition (e.g., excited, sad, or injured). The text
conversion process can also include special effects that alter the
input voice data, such as Doppler effect, echo, and so on.
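An echo of the kind mentioned here, like the delay effect recited in claims 7, 13, and 20, is commonly implemented as a feedback delay line. The sketch below applies y[n] = x[n] + g * y[n - D] to a list of float samples; the 0.25 s delay and 0.5 decay defaults are arbitrary choices, not values from the application, and the output is truncated to the input length.

```python
def add_echo(samples, sample_rate, delay_s=0.25, decay=0.5):
    """Feedback echo: y[n] = x[n] + decay * y[n - D], where D = round(delay_s * sample_rate).

    The result has the same length as the input, so the echo tail is cut off.
    """
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1) to keep the feedback loop stable")
    d = round(delay_s * sample_rate)
    if d < 1:
        raise ValueError("delay must be at least one sample")
    out = [float(s) for s in samples]
    for n in range(d, len(out)):
        out[n] += decay * out[n - d]  # out[n - d] already includes earlier echoes
    return out
```

Feeding in a single impulse shows the repeating, decaying copies: with a 2-sample delay and decay 0.5, `[1, 0, 0, 0, 0]` becomes `[1.0, 0.0, 0.5, 0.0, 0.25]`.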
[0039] In one embodiment of the present invention, the
characteristics that dictate how the voice data is converted are
provided by a conversion rules process 316. The rules process 316
specifies several parameters used by the voice conversion process
314 that are used to alter the input voice data. The voice rules
process 316 includes user provided character profiles. In one
embodiment, the character profiles are entered by the user through
a user interface provided on the client computer.
[0040] The character profile can be used to tailor the voice that a
displayed character speaks with in applications such as video
games, educational programs, interactive applications,
text-to-speech programs, and the like. The character's talking voice
is determined by fundamental parameters, such as frequency and
waveform. The voice conversion process shapes the basic
waveform to produce a converted voice that corresponds to the
selected character profile. In one embodiment, a user can set the
profile for the character.
[0041] FIG. 5 illustrates a character profile input screen in a
graphical user interface. The character profile setup
display window 500 includes several user selectable input fields
that the user can change to alter the characteristics of the voice
output. The user first selects the gender of the character that
will recite the playback voice. As shown, the user can select a
man's voice or a woman's voice. Other voice type characteristics
can also be provided, such as child or baby. Various voice
characteristics are also provided, such as age, sociability,
activity, intelligence, and masculinity. Each of these
characteristics shapes the voice playback parameters. For example,
choosing an older age or increasing the masculinity generally
lowers the voice. The sociability, activity, and intelligence
characteristics generally affect how active and articulate the
playback voice is portrayed.
[0042] For the embodiment illustrated in FIG. 5, the user
characteristics are displayed as slider bars that the user can move
through an input device, such as a mouse, to select a relative
value for the respective characteristic. It should be noted that
various other input methods could be provided, such as numerical
value entries, percentage value entries, and the like.
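One way the FIG. 5 slider settings could be turned into synthesis parameters is sketched below. The description states only the direction of some effects (an older age or greater masculinity lowers the voice; sociability, activity, and intelligence make the voice more active and articulate), so the base pitches, weights, slider normalization, and output parameter names are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical base fundamental frequencies for the FIG. 5 voice types.
BASE_PITCH_HZ = {"man": 120.0, "woman": 210.0}


@dataclass
class CharacterProfile:
    voice_type: str      # "man" or "woman"; other types (child, baby) could be added
    age: float           # slider positions normalized to 0.0 .. 1.0 (assumed)
    masculinity: float
    sociability: float
    activity: float
    intelligence: float


def voice_parameters(p: CharacterProfile) -> dict:
    sliders = (p.age, p.masculinity, p.sociability, p.activity, p.intelligence)
    if any(not 0.0 <= s <= 1.0 for s in sliders):
        raise ValueError("slider values must be in [0, 1]")
    # Older age and higher masculinity lower the voice (weights are invented).
    pitch = BASE_PITCH_HZ[p.voice_type] * (1.0 - 0.2 * p.age) * (1.0 - 0.2 * p.masculinity)
    # Sociability, activity, and intelligence raise "liveliness", used here
    # for speaking rate and intonation range (an assumed interpretation).
    liveliness = (p.sociability + p.activity + p.intelligence) / 3.0
    return {
        "pitch_hz": pitch,
        "rate": 0.8 + 0.4 * liveliness,        # 0.8x .. 1.2x normal speed
        "intonation_range": 0.5 + liveliness,  # relative pitch-contour width
    }
```

The returned dictionary would be handed to the voice conversion process as its target parameters; numerical or percentage entry fields could feed the same function.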
[0043] In an alternative embodiment, the character's talking voice
can be created based on each preset character's profile. For this
embodiment, the rules process 316 can include a user specified
database that stores certain parameters or data entries for various
variables of the voice data. For example, database parameters could
include values that dictate the gender of the output voice,
language, expression, and so on. Through the use of such a
database, the voice data output on the client computer could, for
example, be set to speak in English in a male voice with an English
accent.
[0044] In one embodiment of the present invention, the voice
conversion process is implemented in a distributed interactive game
system comprising a plurality of networked games coupled among two
or more users. FIG. 6 illustrates a networked game environment in
which user game consoles communicate over a network, according to
one embodiment of the present invention. A first user game console
605 is coupled to network 608 through a cable modem 606. For this
embodiment, network 608 is typically a cable TV (CATV) network.
Also coupled to game console 605 is a speaker pair 604 for voice
output, and a microphone 602 for voice input. A second user game
console 607 is coupled to network 608 through a cable modem 612. A
microphone 614 and speaker pair 616 are coupled to the game console
607.
[0045] In system 600, a server computer 610 may be coupled to
network 608. The server computer can perform a variety of
functions, such as game monitoring, providing game or application
programs, managing user accounts, and the like.
[0046] FIG. 7 illustrates a networked game environment in which
user game consoles communicate over a network, according to an
alternative embodiment of the present invention. For system 700, the network
708 comprises the Internet, and the first game console 705 is
coupled to the second game console 707 through Voice over IP
gateways 706 and 712. Game consoles 705 and 707 are attached to
speaker 704 and microphone 702, and to speaker 716 and microphone
714, respectively.
[0047] For the embodiments illustrated in FIGS. 6 and 7, the output
voice characteristic depends upon user information. In this manner,
each user participant (player) can have a different voice assigned
to his character or terminal. It is assumed that each user controls
a character that is displayed on the terminal of each game console.
The characteristics of the character's voice can then be determined
based on the location of the user to whom the character belongs.
For example, assuming each game console has a left and right pair
of speakers, the output voice volume ratio of the speaker pair is
set based on the direction of the sender location. This provides
some spatial effect of the voice relative to the location of the
speaking character. The volume can also be changed based on the
distance between the sender and the receiver. Alternatively, when a
plurality of users is communicating with one another, each user's
voice is assigned to each speaker based on their location.
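The direction-dependent volume ratio and distance-dependent volume described above can be sketched with constant-power panning, a standard technique swapped in here because the application does not specify a panning law. The sketch assumes planar coordinates with x pointing east and y north, a receiver that faces north, and a hypothetical inverse-distance volume rule; `stereo_gains` is an illustrative name.

```python
import math


def stereo_gains(listener_xy, sender_xy, ref_distance=1.0):
    """Return (left, right) gains for a sender heard by a listener facing +y.

    Direction sets the left/right ratio (constant-power pan law);
    distance scales the overall volume as ref_distance / distance.
    """
    dx = sender_xy[0] - listener_xy[0]
    dy = sender_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)
    azimuth = math.atan2(dx, dy)          # 0 = straight ahead, +pi/2 = to the right
    pan = math.sin(azimuth)               # -1 full left .. +1 full right
    angle = (pan + 1.0) * math.pi / 4.0   # maps pan to 0 .. pi/2
    volume = ref_distance / max(distance, ref_distance)
    return volume * math.cos(angle), volume * math.sin(angle)
```

A sender due east of the receiver comes out of the right speaker only, and one twice as far away straight ahead plays equally from both speakers at half volume. With only two speakers, a sender directly behind the receiver cannot be distinguished from one directly ahead, which is one reason additional speakers help.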
[0048] The user location determination process is included in the
voice conversion process as a means of altering the voice of a
character played back on a user game console. In this process, the
direction and/or distance between the sender and the receiver is
calculated, and the volume ratio of the left-right speaker pair is
set based on the calculated data. In a surround-sound
environment in which multiple speakers are coupled to a console,
the other speakers are also considered.
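For the surround-sound case, one common approach, assumed here because the application names none, is pairwise constant-power panning: the voice is placed between the two speakers whose azimuths bracket the sender's direction. The speaker angles in the check follow the usual five-channel layout (center 0 degrees, front left/right at 30, surrounds at 110, measured clockwise so that right is positive); the function name and layout are illustrative.

```python
import math


def surround_gains(source_az_deg, speaker_az_deg):
    """Return one gain per speaker using pairwise constant-power panning.

    Azimuths are in degrees, clockwise from straight ahead. Speaker azimuths
    must be distinct, and at least two speakers are required.
    """
    n = len(speaker_az_deg)
    if n < 2 or len({a % 360 for a in speaker_az_deg}) != n:
        raise ValueError("need at least two speakers at distinct azimuths")
    order = sorted(range(n), key=lambda i: speaker_az_deg[i] % 360)
    src = source_az_deg % 360
    gains = [0.0] * n
    for k in range(n):
        a, b = order[k], order[(k + 1) % n]  # adjacent pair, wrapping around
        start = speaker_az_deg[a] % 360
        span = (speaker_az_deg[b] - speaker_az_deg[a]) % 360
        offset = (src - start) % 360
        if offset <= span:  # source lies between speakers a and b
            frac = offset / span
            gains[a] = math.cos(frac * math.pi / 2)
            gains[b] = math.sin(frac * math.pi / 2)
            break
    return gains
```

Only two speakers are active for any direction and the squared gains sum to one, so the perceived loudness stays constant as a character moves around the listener; an overall distance-based volume can then be multiplied onto every gain.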
[0049] In one embodiment, user location information for a plurality
of networked game players is determined by using address
information for each of the players. Address information can be
stored in a database provided in each game console.
[0050] The address or location information may be provided by using
the telephone number for each player. In this case, the area code
provides a rough approximation of a user's location relative to the
other users. An address database related to telephone numbers is
stored in the memory of each terminal. A particular user's terminal
receives a sender's telephone number and retrieves the location
based on the telephone number. Using the retrieved location data
and the user's own location data, the receiver terminal calculates
the direction and/or distance.
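A minimal sketch of the telephone-number approach in paragraph [0050]. The area-code table is a hypothetical two-entry stand-in for the address database, with rough coordinates given only for illustration. The direction and distance are computed with the standard initial-bearing and haversine great-circle formulas, which the description does not itself prescribe.

```python
import math

# Hypothetical address database: area code -> approximate (lat, lon) in
# degrees. Coordinates are rough illustrative values only.
AREA_CODE_LOCATIONS = {
    "619": (32.7, -117.2),   # San Diego, CA (approximate)
    "650": (37.6, -122.3),   # San Mateo County, CA (approximate)
}

EARTH_RADIUS_KM = 6371.0


def location_from_phone(number):
    """Look up a rough location from a North American telephone number."""
    digits = "".join(ch for ch in number if ch.isdigit())
    # The area code is the first three of the last ten digits.
    return AREA_CODE_LOCATIONS[digits[-10:-7]]


def direction_and_distance(own, other):
    """Initial bearing (degrees clockwise from north) and great-circle
    distance (km) from the receiver's location to the sender's location."""
    lat1, lon1 = map(math.radians, own)
    lat2, lon2 = map(math.radians, other)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    # Haversine formula for the central angle.
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    distance = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
    # Initial bearing toward the other location.
    y = math.sin(dlon) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return bearing, distance
```

With these placeholder entries, a receiver in area code 619 hearing a sender in area code 650 would obtain a bearing to the northwest and a distance of roughly 700 km, which the terminal could then map to a speaker volume ratio.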
[0051] In an alternative embodiment, the location information can
be provided using a personal database stored in each game console
memory (e.g., secondary memory). In this case, each user must
enter the other users' addresses in advance. Zip code information
could be used to provide reasonable approximations of user
locations. The information is stored in a memory location of the
game console. When a connection between users is established, ID
information (e.g., user ID, telephone number, etc.) is sent to each
user. Using the ID information, the corresponding user location is
retrieved from each personal database, and the direction and/or
distance is calculated based on that location.
[0052] Instead of storing user location information in each game
console, the address information for a group of networked users can
be stored in a central server, such as server 610 in FIG. 6. For
this embodiment, the server stores the addresses or location
information (zip code, area code, etc.) for all of the users in a
database. The direction and/or the distance are calculated based on
the stored user information in the server. The server sends each
user direction and/or distance information for the other users.
Each individual user terminal then sets the volume ratio or overall
volume based on the location information. For this embodiment,
voice data is sent to each user through the server.
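For the central-server variant in paragraph [0052], the server can compute the direction and distance from every user to every other user once, then send each user only its own row. The sketch below assumes that stored zip codes or area codes have already been converted to flat-map coordinates in kilometers; that conversion, the data layout, and the function name are illustrative assumptions.

```python
import math
from itertools import permutations


def build_location_tables(user_locations):
    """Server-side: direction and distance from each user to every other user.

    user_locations maps a user ID to an (x_km, y_km) position on a flat
    map projection (an assumed representation).
    Returns {receiver_id: {sender_id: (bearing_deg, distance_km)}}.
    """
    tables = {uid: {} for uid in user_locations}
    for receiver, sender in permutations(user_locations, 2):
        rx, ry = user_locations[receiver]
        sx, sy = user_locations[sender]
        dx, dy = sx - rx, sy - ry
        # Bearing measured clockwise from north (+y), in [0, 360).
        bearing = math.degrees(math.atan2(dx, dy)) % 360.0
        tables[receiver][sender] = (bearing, math.hypot(dx, dy))
    return tables
```

The server would then send `tables[user_id]` to each user, whose terminal sets the volume ratio or overall volume for each incoming voice from the corresponding entry.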
[0053] It should be noted that the process of altering the data in
accordance with output voice characteristics can be implemented
either in the server (data sending) computer, the client (data
receiving) computer, or a network server computer coupled to the
server and client computers. Each computer capable of altering the
transmitted data would have associated with it a voice or text
conversion means, such as that illustrated in FIG. 4. Such a
conversion means could be implemented in hardware circuitry coupled
to the computer, a software program executed by the computer, or a
combination of dedicated hardware and software processes. Moreover,
the database storing the various voice characteristics for each
associated client computer or character within a client computer
can be stored locally in each client computer or centrally in a
database accessible to a network server computer.
[0054] Depending upon where the output alteration process is
performed, the steps of transmitting, altering, and receiving the
data can be done in various different step sequences. For example,
the data can be first transmitted from the server computer, altered
in the server or other computer, and then received by the client
computer. If the alteration process is performed by the client
computer, the process can be performed by first transmitting the
data from the server computer to the client computer, receiving the
data in the client computer, and then altering the data in
accordance with the specified output characteristics.
[0055] Besides game or entertainment programs, the voice conversion
process described in relation to FIGS. 6 and 7 can be used in
various other applications involving speech content transmitted
among a plurality of users. Examples include chat room
applications, Internet telephony, and other similar
applications.
[0056] The characters that represent users in a networked game,
cyberspace, or other application are sometimes referred to as
"avatars". That is, an avatar may comprise a user's own interactive
graphical representation in cyberspace or virtual reality
environment. In some embodiments of the present invention, the
audio may be adjusted based on the relative location of avatars or
other characters in the game world.
[0057] More specifically, some embodiments provide for the dynamic
processing of the audio channel so that the virtual distance
between two avatars may be reflected in the audio output, such as
for example through audio attenuation levels. For example, when the
avatars are far apart in the game world the user chat channel
between the two controlling users may be attenuated, whereas when
the avatars come closer together in the game world the volume may
be increased.
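One way to express the distance-dependent attenuation in paragraph [0057] is a gain curve applied to the chat channel between two avatars. The linear ramp and the two radius parameters below are hypothetical choices for illustration; an inverse-distance or decibels-per-doubling rolloff would serve equally well.

```python
def chat_gain(distance, full_volume_radius=5.0, max_hearing_radius=50.0):
    """Gain (0.0 to 1.0) for the chat channel between two avatars.

    Hypothetical parameters: within full_volume_radius the channel is
    unattenuated, beyond max_hearing_radius it is muted, and in between
    the gain falls off linearly with virtual distance.
    """
    if distance <= full_volume_radius:
        return 1.0
    if distance >= max_hearing_radius:
        return 0.0
    span = max_hearing_radius - full_volume_radius
    return 1.0 - (distance - full_volume_radius) / span
```

As the avatars move apart, the gain drops toward zero; as they approach each other, it rises back to full volume.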
[0058] Such a system may be implemented in many different types of
game related voice chat systems. For example, some game related
voice chat systems are implemented as part of a "push-to-talk"
system. The "push-to-talk" method is meant to simulate radio
communications via walkie-talkie or other radio frequency (RF)
based devices. As another example, some game related voice chat
systems are implemented as part of an "always-on" system using a
voice activity detection algorithm. The "always-on" method is used
to simulate a local area network (LAN) party environment where the
user merely needs to yell across the room to heckle an opponent
or to speak to his or her neighbor to coordinate a strategy.
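The two triggering styles in paragraph [0058] can be contrasted in a short sketch. The description does not name a voice activity detection algorithm, so the always-on branch below uses a simple frame-energy threshold with a short hangover as a stand-in. The threshold, the hangover length, and the function names are assumptions.

```python
import math


def rms(frame):
    """Root-mean-square level of a frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def should_transmit(frames, mode, button_pressed=None, threshold=0.02, hangover=3):
    """Return one transmit decision per audio frame.

    mode "push_to_talk": transmit only while button_pressed[i] is True.
    mode "always_on": transmit when frame energy exceeds threshold, and keep
    transmitting for `hangover` frames after speech stops so that word
    endings are not clipped.
    """
    decisions = []
    remaining = 0
    for i, frame in enumerate(frames):
        if mode == "push_to_talk":
            decisions.append(bool(button_pressed[i]))
            continue
        if rms(frame) > threshold:
            remaining = hangover
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions
```

In push-to-talk mode the user decides when the channel is open, as with a walkie-talkie; in always-on mode the detector opens the channel whenever the user speaks loudly enough.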
[0059] In these and other voice chat systems, the user talks, but
he or she speaks on behalf of his or her avatar or other character
in the game world. Such systems may be used in any type of interactive
system or game, including but not limited to so-called first-person
perspective simulations and third-person perspective simulations.
Specifically, a first-person perspective simulation employs a
camera scheme that may be referred to herein as a first-person game
or more generally a first-person camera setting. Similarly, a
third-person perspective simulation employs a camera scheme that
may be referred to herein as a third-person game or more generally
a third-person camera setting. In a first-person camera setting the
on-screen view simulates the in-game character's point of view with
the camera movement normally being linked to the movement of the
character. In a third-person camera setting the player normally
sees the game world from a viewpoint above and behind the main
character. In many third-person games the camera movement is not
linked to the movement of the main character.
[0060] Some types of games or other similar applications use a
server for networked game play. The server typically has knowledge
of the game data and is able to either wholly run the simulation
and deliver updates to the clients or run the simulation in
parallel with the clients currently in the game. The clients
typically send their voice data to the server and let the server
handle the delivery of the voice traffic to the appropriate
clients.
[0061] In some embodiments of the present invention, the server
side simulation may be used to determine avatar proximities within
the game world and adjust the volume levels and position of the
individual streams. The individual streams may then be mixed
together into a multi-channel signal which is then delivered to the
appropriate client. In some embodiments a server such as the server
610 (FIG. 6) described above may be used for these functions and
the functions described below.
[0062] Referring to FIG. 8, there is illustrated a method 800 that
operates in accordance with an embodiment of the present invention.
The method 800, which may be used for processing audio, begins in
step 802 in which audio data associated with each of two or more
characters in a virtual world is received. In some embodiments the
audio data may be received by a server, such as described above.
Thus, in some embodiments each client device sends the audio data
associated with its character to the server.
[0063] FIG. 9 illustrates an example of a virtual world 900 that
may be used in accordance with an embodiment of the present
invention. The virtual world 900 may comprise any type of virtual
world or area of cyberspace. In some embodiments, the virtual world
900 may comprise a game world or any other environment defined by a
program. Several characters 902, 904, 906, 908, which may comprise
avatars or any other type of character, are dispersed within the
virtual world 900.
[0064] The audio data associated with each of the characters 902,
904, 906, 908 may be received by a server. In some embodiments, the
audio data may include a talking voice of the character, which is
typically provided by a user talking into a microphone of an
associated client device. In some embodiments, the audio data may
also include other audio or sounds, such as ambient noise around
the character in the virtual world. For example, such ambient noise
may include the character stepping on a twig or other items, a car
or train passing nearby the character, sounds made by other nearby
characters, or even sounds coming from a radio held by the
character. Thus, the audio data associated with a character may
include any sounds, such as any sounds that the character could
hear.
[0065] In the next step of method 800 (FIG. 8), the proximity of each character with
respect to each of the other characters in the virtual world is
determined. In some embodiments, the relative locations of the
characters in the environment defined by the program may be
determined. Such determinations may be used to indicate the
distance between each of the characters. In some embodiments, a
server, such as a game server, may be used for each character in
the game to determine the proximity of all the other characters in
the game.
[0066] For example, the proximity of character 902 (FIG. 9) with
respect to each of the other characters 904, 906, 908 in the
virtual world 900 may be determined. Namely, the proximity of
character 902 with respect to character 904 may be determined, the
proximity of character 902 with respect to character 906 may be
determined, and the proximity of character 902 with respect to
character 908 may be determined. As illustrated, character 902 is
located very close to character 904 and very far away from
character 908. In some embodiments this process may be repeated so
that the proximity of each character with respect to every other
character is determined.
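The proximity determination of paragraphs [0065] and [0066] amounts to computing a distance for every pair of characters. In the sketch below, the coordinates assigned to characters 902, 904, 906, and 908 are invented for illustration, since FIG. 9 gives no numeric positions.

```python
import math
from itertools import combinations


def proximity_table(positions):
    """Distance between every pair of characters.

    positions maps a character ID to its (x, y, z) location in the virtual
    world. Distances are symmetric, so each pair is computed once and stored
    under both orderings.
    """
    table = {}
    for a, b in combinations(positions, 2):
        d = math.dist(positions[a], positions[b])
        table[(a, b)] = d
        table[(b, a)] = d
    return table


# Hypothetical layout: 904 is close to 902, 908 is far away.
positions = {
    902: (0.0, 0.0, 0.0),
    904: (3.0, 4.0, 0.0),
    906: (40.0, 30.0, 0.0),
    908: (300.0, 400.0, 0.0),
}
table = proximity_table(positions)
```

Here `table[(902, 904)]` is 5.0 and `table[(902, 908)]` is 500.0, mirroring the "very close" and "very far" relationships described for FIG. 9.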
[0067] In step 804 (FIG. 8), the received audio data for each of
the characters is altered based on the determined proximity of
another one of the characters. In some embodiments, the received
audio data for each of the characters may be altered based on the
determined proximity of each one of the other characters. Thus, if
a first character is located within a specified range of a second
character, then the first character's audio stream may be altered
based on the relative distance between the two characters. In some
embodiments, a server, such as a game server, may be used to
perform such altering. In some embodiments, such altering may
comprise scaling, adjusting volume levels, adding effects, etc.
[0068] For example, since character 902 (FIG. 9) is located very
close to character 904, the audio data received for character 902
may be scaled or altered so that it is louder. This means that
character 904 will hear the audio data associated with character
902 relatively loudly since character 904 is close to character
902. In contrast, since character 902 (FIG. 9) is located very far
away from character 908, the audio data received for character 902
may be scaled or altered so that it is quieter. This means that
character 908 will hear the audio data associated with character
902 relatively quietly since character 908 is far away from
character 902.
[0069] In step 806 (FIG. 8), the audio data for one character that
was altered based on the determined proximity of another character
is provided to a client associated with that other character. For
example, the audio data for character 902 (FIG. 9) that was scaled
based on the determined proximity of character 904 may be provided
to a client associated with character 904. This way, the user of
the client device associated with character 904 will hear the
sounds associated with or coming from character 902 with the volume
of those sounds being adjusted or scaled based on the distance
between characters 902 and 904 in the virtual world 900. Since
character 902 is located very close to character 904, the user of
the client device associated with character 904 will hear the
sounds associated with or coming from character 902 relatively
loudly. In contrast, the user of the client device associated with
character 908 will hear the sounds associated with or coming from
character 902 relatively quietly, or maybe not at all, since
character 902 is located very far away from character 908. Thus, in
some embodiments, the user will only be able to hear those sounds that
are within hearing distance of his or her character or avatar in
the virtual or game world. Similarly, when a user talks or causes
his or her character to make some other noise, those sounds will
get quieter to other users whose characters are located at
increasing distances from the speaking character in the virtual
world.
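The altering and providing steps of paragraphs [0067] through [0069] can be sketched as a per-listener mix: for each listening character, every other character's stream is scaled by a distance-dependent gain and summed, and the result is what would be sent to that listener's client. The linear gain curve and the `hearing_radius` parameter are assumptions made for the example.

```python
import math


def mix_for_listeners(streams, positions, hearing_radius=50.0):
    """Build one mixed mono buffer per listening character.

    streams: {character_id: [float samples]} received from each client
    (equal-length frames assumed).
    positions: {character_id: (x, y)} from the server-side simulation.
    A sender's samples are scaled by a gain that falls linearly from 1 at
    distance 0 to 0 at hearing_radius (an assumed curve). A character
    never receives its own stream.
    """
    frame_len = len(next(iter(streams.values())))
    mixes = {}
    for listener in streams:
        out = [0.0] * frame_len
        for sender, samples in streams.items():
            if sender == listener:
                continue
            d = math.dist(positions[sender], positions[listener])
            gain = max(0.0, 1.0 - d / hearing_radius)
            if gain == 0.0:
                continue  # Out of earshot: contributes nothing.
            for i, s in enumerate(samples):
                out[i] += gain * s
        mixes[listener] = out
    return mixes
```

With character 904 five units from 902 and character 908 one hundred units away, 904's mix carries 902's voice at a gain of 0.9 while 908's mix omits it entirely.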
[0070] As mentioned above, in some embodiments other ambient noise
around a character in the virtual world may also be altered, scaled
or adjusted based on the character's location. Again, such ambient
noise may include any sounds, noise or audio in the vicinity of the
character, such as for example, the character stepping on something
or reloading a gun, a vehicle near the character, sounds made by
other nearby characters, etc. Thus, the audio data associated with
a character may include any sounds and not just the character's
voice.
[0071] It was also mentioned above that in some embodiments the
ambient noise may include sounds coming from a radio held by the
character. Along these lines, in some embodiments the audio data
associated with a character may include not only the character's
speaking voice, but also the character's voice as heard through a
radio. That is, the audio data associated with a character may also
include a simulation of the character's voice transmitted over a
radio into which the character may be speaking. Such a simulation
may be created or generated by adding various audio effects to the
audio data so that the character's voice sounds like it is being
transmitted over a radio. Such audio effects may comprise, for
example, static, distortion, reverb, compression, presence, delay,
echo, etc.
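A rough sketch of the radio simulation in paragraph [0071]. It applies two effects from the list above, distortion (soft clipping) and static (seeded random noise), plus a band-limiting filter, which is an added assumption about what makes a voice sound radio-like. The cutoff frequencies, drive, noise level, and function name are all hypothetical.

```python
import math
import random


def radio_effect(samples, sample_rate=16000, low_hz=300.0, high_hz=3000.0,
                 drive=4.0, static_level=0.02, seed=0):
    """Make a voice (float samples in [-1, 1]) sound as if heard over a radio.

    Band limiting uses a one-pole high-pass followed by a one-pole low-pass;
    distortion is tanh soft clipping; static is uniform noise.
    """
    rng = random.Random(seed)
    dt = 1.0 / sample_rate
    rc_hp = 1.0 / (2 * math.pi * low_hz)
    a_hp = rc_hp / (rc_hp + dt)          # high-pass coefficient
    rc_lp = 1.0 / (2 * math.pi * high_hz)
    a_lp = dt / (rc_lp + dt)             # low-pass coefficient
    out = []
    prev_x = prev_hp = prev_lp = 0.0
    for x in samples:
        hp = a_hp * (prev_hp + x - prev_x)     # remove low rumble
        lp = prev_lp + a_lp * (hp - prev_lp)   # remove high frequencies
        prev_x, prev_hp, prev_lp = x, hp, lp
        distorted = math.tanh(drive * lp) / math.tanh(drive)  # soft clip
        noisy = distorted + static_level * rng.uniform(-1.0, 1.0)
        out.append(max(-1.0, min(1.0, noisy)))  # keep within full scale
    return out
```

Reverb, compression, presence, or delay from the same list could be chained after this function in the same per-sample style.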
[0072] As an example, if character 902 is speaking into a radio,
and character 908 has a radio and is listening to the transmission,
then the only sound heard by the user associated with character 908
may be character 902's voice as heard over the radio. This is
because characters 902 and 908 are located very far apart in the
virtual world 900. However, if character 908 starts walking towards
character 902 in the virtual world, then the user associated with
character 908 will continue to hear character 902's voice as heard
over the radio and may also start to hear character 902's actual
voice because character 902 may be within earshot.
[0073] In some embodiments, the server that is altering or scaling
the audio data associated with character 902 may insert a very
slight delay between character 902's actual voice heard by earshot
and character 902's voice as heard over the radio. Such a delay may
more realistically simulate the sounds heard by the user associated
with character 908 by introducing or producing an echo effect
between character 902's actual voice heard by earshot and character
902's voice as heard over the radio. That is, there will sometimes
be a delay between hearing a speaker's actual voice because he or
she is close by and simultaneously hearing the speaker's voice over
a radio, cellular telephone, or similar device. This is because the
speaker's voice goes through the radio and the radio introduces a
slight delay. This delay may produce an echo effect, which in some
embodiments the server may simulate.
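The echo effect in paragraph [0073] can be produced by mixing the earshot voice with a slightly delayed copy of the radio voice. The 40 ms default delay is an assumed value, since the description only calls it "very slight". Both inputs are taken to be already scaled for distance and, in the radio case, already processed with radio effects.

```python
def combine_direct_and_radio(direct, radio, sample_rate=16000, radio_delay_ms=40.0):
    """Mix a character's voice heard by earshot with the same voice heard
    over a radio, delaying the radio copy to create an echo effect.

    direct, radio: lists of float samples. radio_delay_ms is an assumed
    value for the radio's transmission delay.
    """
    delay = int(round(sample_rate * radio_delay_ms / 1000.0))
    length = max(len(direct), len(radio) + delay)
    out = [0.0] * length
    for i, s in enumerate(direct):
        out[i] += s
    for i, s in enumerate(radio):
        out[i + delay] += s   # radio copy arrives `delay` samples later
    return out
```

For example, at 1000 samples per second with a 3 ms delay, a direct impulse `[1.0, 0.0, 0.0, 0.0]` and a radio impulse `[0.5]` combine to `[1.0, 0.0, 0.0, 0.5]`.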
[0074] In some embodiments, the altering or scaling of audio data
based on the relative proximities of the characters in the virtual
world may create some interesting scenarios from a game strategy
point of view. For example, the user corresponding to an enemy
character in an area may overhear another character. As an
illustration, character 904 may be an enemy of character 902, and
since character 904 is located near character 902 in the virtual
world 900, the user corresponding to character 904 may be able to
overhear character 902's actual voice as character 902 talks to
character 908 on the radio. Character 902 may not realize this and
thus may unknowingly give away valuable secrets to the enemy.
[0075] In some embodiments, a character's audio data that is
altered or scaled may optionally be mixed or encoded into a
multi-channel format stream prior to being provided to a client
associated with another character. For example, the individual
audio streams for the characters may be mixed into a multi-channel
signal stream based on relative position, which would then be
delivered to the clients. Any multi-channel format may be used. For
example, the Audio Coding 3 (AC-3) format may be used, which is a
known high-quality multi-channel digital audio coding format. As
another example, the Digital Theater Systems (DTS) sound format may
also be used, which is an established multi-channel audio format in
movie theaters.
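Mixing the individual streams into a multi-channel signal "based on relative position", as paragraph [0075] describes, can be sketched with pairwise constant-power panning between the two speakers that bracket each source's direction. Both the panning method and the five-channel layout (LFE omitted, angles following the common ITU-R BS.775 placement) are assumptions; the sketch produces PCM channels only, and encoding them as AC-3 or DTS would be a separate encoder stage not shown here.

```python
import math

# Assumed 5-channel layout (LFE omitted): azimuth in degrees, 0 = front,
# positive = toward the listener's right.
SPEAKERS = {"C": 0.0, "R": 30.0, "Rs": 110.0, "Ls": -110.0, "L": -30.0}


def pan_gains(azimuth_deg):
    """Pairwise constant-power panning between the two adjacent speakers
    that bracket the source direction."""
    names = sorted(SPEAKERS, key=lambda n: SPEAKERS[n] % 360.0)
    angles = [SPEAKERS[n] % 360.0 for n in names]
    src = azimuth_deg % 360.0
    gains = {n: 0.0 for n in SPEAKERS}
    for i in range(len(names)):
        j = (i + 1) % len(names)
        width = (angles[j] - angles[i]) % 360.0
        offset = (src - angles[i]) % 360.0
        if offset <= width:
            t = offset / width
            gains[names[i]] = math.cos(t * math.pi / 2)
            gains[names[j]] = math.sin(t * math.pi / 2)
            break
    return gains


def mix_to_channels(sources):
    """sources: list of (azimuth_deg, samples), already distance-scaled.
    Returns {channel_name: samples}."""
    length = max(len(s) for _, s in sources)
    channels = {n: [0.0] * length for n in SPEAKERS}
    for azimuth, samples in sources:
        g = pan_gains(azimuth)
        for n in SPEAKERS:
            if g[n]:
                for i, s in enumerate(samples):
                    channels[n][i] += g[n] * s
    return channels
```

A source straight ahead lands entirely in the center channel, while a source directly behind is split equally between the two surround channels; in every case the squared gains sum to one, so perceived loudness does not change with direction.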
[0076] In some embodiments, the multi-channel format stream may
then be encoded and/or compressed as necessary to reduce network
bandwidth. Once compressed, the custom multi-channel stream may be
delivered to the appropriate client. Each client may then receive
the stream and decode it. The decoded stream may then be mixed with
other game audio and played out to the speakers.
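The round trip in paragraph [0076] (server encodes and compresses, client decodes and mixes with game audio) can be sketched with 16-bit interleaved PCM and `zlib`. Here `zlib` is only a lossless stand-in used to keep the example self-contained; a real system would more likely use a perceptual audio codec. The channel names and function names are illustrative assumptions.

```python
import struct
import zlib


def encode_multichannel(channels, order):
    """Server side: interleave channels as 16-bit little-endian PCM and
    compress the result (zlib stands in for a real audio codec)."""
    length = len(channels[order[0]])
    interleaved = []
    for i in range(length):
        for name in order:
            s = max(-1.0, min(1.0, channels[name][i]))
            interleaved.append(int(round(s * 32767)))
    raw = struct.pack("<%dh" % len(interleaved), *interleaved)
    return zlib.compress(raw)


def decode_multichannel(payload, order):
    """Client side: decompress and de-interleave back into float channels."""
    raw = zlib.decompress(payload)
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    n = len(order)
    return {name: [v / 32767 for v in values[k::n]] for k, name in enumerate(order)}


def mix_with_game_audio(voice, game):
    """Client side: add decoded chat audio to game audio per channel,
    clipping the sum to [-1, 1]."""
    return {name: [max(-1.0, min(1.0, a + b)) for a, b in zip(voice[name], game[name])]
            for name in voice}
```

Quantizing to 16 bits introduces an error of at most about 1/32767 per sample, so the decoded voice matches the server's mix closely before it is combined with the game's own sound effects and music.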
[0077] As mentioned above, in some embodiments a server, such as
the server 610 (FIG. 6) described above, may be used for
implementing the functions described herein and/or illustrated in
FIG. 8. In general, such a server may coordinate the game or other
program or simulation being played by a plurality of users. As
such, the server will generally know the game state and may be able
to determine the proximities of the characters corresponding to the
various users. In some embodiments, the server may receive encoded
audio data streams from the clients and then decode those streams.
The server may use the determined location information to alter,
scale, manipulate or otherwise modify the audio data as described
above. In some embodiments, plug-ins or custom modules may be used
with the server for implementing these functions.
[0078] Thus, embodiments of the present invention provide for
determining the relative locations of the avatars or other
characters corresponding to users in an environment defined by the
program or other simulation. The output characteristics of the
output audio may be altered depending upon the location or
proximities of each character associated with each of the users.
The altered audio data may then be provided to the appropriate
client devices.
[0079] In the foregoing, a system has been described for modifying
transmitted content data based on user preferences. Although the
present invention has been described with reference to specific
exemplary embodiments, it will be evident that various
modifications and changes may be made to these embodiments without
departing from the broader spirit and scope of the invention as set
forth in the claims. Accordingly, the specification and drawings
are to be regarded in an illustrative rather than a restrictive
sense.
* * * * *