U.S. patent application number 15/433196 was filed with the patent office on 2017-02-15 and published on 2017-06-08 for method and apparatus for executing voice command in electronic device.
The applicant listed for this patent is Samsung Electronics Co., Ltd.. Invention is credited to Subhojit CHAKLADAR, Hee-Woon KIM, Sang-Hoon LEE.
Publication Number | 20170162198 |
Application Number | 15/433196 |
Document ID | / |
Family ID | 48625739 |
Filed Date | 2017-02-15 |
United States Patent Application | 20170162198 |
Kind Code | A1 |
CHAKLADAR; Subhojit ; et al. | June 8, 2017 |
METHOD AND APPARATUS FOR EXECUTING VOICE COMMAND IN ELECTRONIC
DEVICE
Abstract
An apparatus and method for executing a voice command in an
electronic device. In an exemplary embodiment, a voice signal is
detected and speech thereof is recognized. When the recognized
speech contains a wakeup command, a voice command mode is
activated, and a signal containing at least a portion of the
detected voice signal is transmitted to a server. The server
generates a control signal or a result signal corresponding to the
voice command, and transmits the same to the electronic device. The
device receives and processes the control or result signal, and
awakens. Thereby, voice commands are executed without the need for
the user to physically touch the electronic device.
Inventors: | CHAKLADAR; Subhojit (Gyeonggi-do, KR); LEE; Sang-Hoon (Gyeonggi-do, KR); KIM; Hee-Woon (Gyeonggi-do, KR) |
Applicant: |
Name | City | State | Country | Type |
Samsung Electronics Co., Ltd. | Gyeonggi-do | | KR | |
Family ID: | 48625739 |
Appl. No.: | 15/433196 |
Filed: | February 15, 2017 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
13903345 | May 28, 2013 | 9619200 |
15433196 | | |
Current U.S. Class: | 1/1 |
Current CPC Class: | G06F 3/167 20130101; G10L 2015/225 20130101; G10L 15/26 20130101; G10L 2015/223 20130101; G10L 15/22 20130101 |
International Class: | G10L 15/22 20060101 G10L015/22; G06F 3/16 20060101 G06F003/16 |
Foreign Application Data
Date | Code | Application Number |
May 29, 2012 | KR | 10-2012-0057044 |
Claims
1. A computer program product comprising one or more computer
readable storage medium having a computer readable program stored
therein, wherein the computer readable program, when executed on an
electronic device, causes the device to: recognize a wakeup command
in a voice signal received through a microphone of the electronic
device, wherein the voice signal further includes a voice command
subsequent to the wakeup command, where there is a duration of
silence between the wakeup command and the voice command; in
response to the recognizing the wakeup command by the electronic
device, transmit a signal including the voice command of the voice
signal to a server for conducting speech recognition on the voice
command through a transceiver; and perform an operation responsive
to a control signal, corresponding to the recognized voice command,
received from the server.
2. The computer program product of claim 1, wherein the wakeup
command comprises at least one word which is predefined or set by a
user.
3. The computer program product of claim 2, wherein the recognizing
comprises recognizing the wakeup command based on information on a
specific user voice, and wherein the information is acquired from
the voice signal, corresponding to the wakeup command, spoken
plural times by the specific user in advance.
4. The computer program product of claim 3, wherein the
transmitting the signal comprises transmitting the signal including
both of the wakeup command and the voice command.
5. The computer program product of claim 3, wherein the wakeup
command included in the voice signal is received while the device
is displaying a lock screen on a display or is in idle mode.
6. The computer program product of claim 5, wherein, in response to
recognizing the wakeup command, the computer readable program
further causes the device to unlock the lock screen or change from
the idle mode to active mode.
7. The computer program product of claim 6, wherein, in response to
unlocking the lock screen or changing from the idle mode to the
active mode, the computer readable program further causes the
device to display a visual indication related to a voice
recognition.
8. The computer program product of claim 7, wherein the
transmitting the signal comprises transmitting the signal including
both of the wakeup command and the voice command.
9. The computer program product of claim 1, wherein, in response to
recognizing the wakeup command, the computer readable program
further causes the device to provide a visual indication related to
a voice recognition via a display of the device.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present application is a Continuation of U.S. patent
application Ser. No. 13/903,345 filed on May 28, 2013 which claims
the benefit under 35 U.S.C. § 119(a) to a Korean patent
application filed in the Korean Intellectual Property Office on May
29, 2012, and assigned Serial No. 10-2012-0057044, the entire
disclosure of which is hereby incorporated by reference.
TECHNICAL FIELD
[0002] The present disclosure relates generally to an electronic
device. More particularly, the present disclosure relates to an
apparatus and a method for executing a voice command in an
electronic device.
BACKGROUND
[0003] In recent times, as multimedia technologies have grown,
electronic devices having multiple functions have proliferated.
Examples of such multi-function devices include portable terminals
such as smart phones, tablet PCs, and smart cameras, as well as fixed
home-based devices such as electronic devices integrated with home
kitchen appliances. The electronic devices mostly include a
convergence function which combines a number of functions.
[0004] Portable terminal designers strive to achieve advanced
performance and the convergence function, as well as slim
and aesthetic designs of the device as a whole. Terminal
manufacturers compete to present substantially the same or advanced
performance and to engineer new models that are smaller and slimmer
than prior designs.
[0005] Among the various functions available, a recently
commercialized device provides a voice recognition function of
relatively high accuracy. Such a voice recognition function
accurately recognizes a user's voice to easily execute a
corresponding function of the device without having to press a
separate button or touch a key or touchscreen.
[0006] For example, the voice recognition function allows the user
to make a call or write a text message without separate
manipulation in a portable terminal such as a smart phone, to send
the generated message, and to easily set various functions such as
a route planner, Internet search, and alarm.
[0007] To execute the voice recognition function, the related art
drives a corresponding voice recognition application, activates the
voice recognition function, and then performs the corresponding
function.
[0008] However, to perform the voice recognition, the voice
recognition application is initially started in response to a touch
input command on a separate key or the touchscreen. This operation
goes against the unique function of the voice recognition for
facilitating data input without touch. Further, launching the voice
recognition application requires finding it in a display screen
including various application objects, which may be difficult and
time consuming in some circumstances.
SUMMARY
[0009] Embodiments of apparatus and methods for executing a voice
command in an electronic device are disclosed. In an exemplary
embodiment, a voice signal is detected and speech thereof is
recognized. When the recognized speech contains a wakeup command, a
voice command mode is activated, and a signal containing at least a
portion of the detected voice signal is transmitted to a server.
The server generates a control signal or a result signal
corresponding to the voice command, and transmits the same back to
the electronic device. The device receives and processes the
control or result signal, and awakens. Thereby, voice commands are
executed without the need for the user to physically touch the
electronic device.
[0010] In various embodiments:
[0011] The voice signal may comprise the wakeup command followed by
the voice command.
[0012] The wakeup command may also comprise the voice command.
[0013] A silence duration may be determined between the wakeup
command and the voice command.
[0014] Processing the control signal or the result signal may
comprise executing a particular application of the electronic
device.
[0015] Processing of the control signal or the result signal may
comprise displaying data corresponding to the result signal.
[0016] Once the voice command mode is activated, an object may be
activated on a display indicative of the voice command mode being
activated.
[0017] When a screen is locked prior to recognizing the wakeup
command in the speech, the screen may be unlocked responsive to the
recognized wakeup command.
[0018] The speech may be recognized to contain a predetermined
wakeup command only if a predetermined speaker of the voice signal
is recognized. The wakeup command may be detected automatically
when the voice of the predetermined speaker is recognized.
[0019] Alternatively, the wakeup command may be detected when the
voice of the predetermined speaker is recognized and a
predetermined wakeup command is recognized within the speech of the
predetermined speaker. In another embodiment, a method for
executing a voice command in an electronic device, comprises:
detecting a voice signal which contains at least one of a wakeup
command and a voice command; transmitting the voice signal to a
server; awakening the electronic device upon receiving a result
signal indicative of the server detecting the wakeup command in the
voice signal; receiving a control signal or a result signal
corresponding to the voice command from the server; and processing
the control signal or the result signal corresponding to the voice
command.
[0020] In an embodiment, a method operable in a server, for
supporting a voice command of an electronic device, comprises:
receiving a transmitted voice signal which contains at least a
voice command, from the electronic device; generating a control
signal or a result signal corresponding to the voice command by
recognizing and analyzing the voice command; and sending the
control signal or the result signal corresponding to the
voice command, to the electronic device.
[0021] In an embodiment, an electronic device comprises: one or
more processors; a memory; and one or more programs stored in the
memory and configured for execution by the one or more processors,
wherein the program comprises instructions for detecting a voice
signal and recognizing speech thereof; when the speech is
recognized to contain a wakeup command, activating a voice command
mode and transmitting a transmit signal containing at least a
portion of the detected voice signal to a server; and receiving and
processing a control signal or a result signal generated and
transmitted by the server in response to a voice command within the
transmit signal recognized by the server.
[0022] According to another aspect of the present invention,
[0023] Other aspects, advantages, and salient features of the
invention will become apparent to those skilled in the art from the
following detailed description, which, taken in conjunction with
the annexed drawings, discloses exemplary embodiments of the
invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0024] The above and other aspects, features, and advantages of
certain exemplary embodiments of the present invention will be more
apparent from the following description taken in conjunction with
the accompanying drawings, in which:
[0025] FIG. 1A is a block diagram of an electronic device for
executing a voice command according to an exemplary embodiment of
the present invention;
[0026] FIG. 1B is a diagram of a system for executing voice
commands according to an embodiment;
[0027] FIG. 2 is a flowchart of a method for executing wakeup and
voice commands in an electronic device according to one exemplary
embodiment of the present invention;
[0028] FIG. 3 is a flowchart of a method operable in a server
according to one exemplary embodiment of the present invention;
[0029] FIG. 4 is a flowchart of a method for executing a voice
command in an electronic device according to another exemplary
embodiment of the present invention;
[0030] FIG. 5 is a flowchart of another method operable in a server
according to another exemplary embodiment of the present
invention;
[0031] FIG. 6 is a flowchart of a method for executing a voice
command in an electronic device according to yet another exemplary
embodiment of the present invention;
[0032] FIG. 7 illustrates a voice signal including a wakeup command
and the voice command that may be detected and recognized according
to embodiments of the present invention;
[0033] FIG. 8A, FIG. 8B and FIG. 8C illustrate a dialing based on
the voice signal including the wakeup command and the voice command
according to an exemplary embodiment of the present invention;
and
[0034] FIG. 9A and FIG. 9B depict screenshots for illustrating a
screen unlocked through wakeup command detection according to an
exemplary embodiment of the present invention.
[0035] Throughout the drawings, like reference numerals will be
understood to refer to like parts, components and structures.
DETAILED DESCRIPTION
[0036] The following description with reference to the accompanying
drawings is provided to assist in a comprehensive understanding of
exemplary embodiments of the invention as defined by the claims and
their equivalents. It includes various specific details to assist
in that understanding but these are to be regarded as merely
exemplary. Accordingly, those of ordinary skill in the art will
recognize that various changes and modifications of the embodiments
described herein can be made without departing from the scope and
spirit of the invention. In addition, descriptions of well-known
functions and constructions may be omitted for clarity and
conciseness.
[0037] The terms and words used in the following description and
claims are not limited to the bibliographical meanings, but are
merely used by the inventor to enable a clear and consistent
understanding of the invention. Accordingly, it should be apparent
to those skilled in the art that the following description of
exemplary embodiments of the present invention is provided for
illustration purpose only and not for the purpose of limiting the
invention as defined by the appended claims and their
equivalents.
[0038] It is to be understood that the singular forms "a," "an,"
and "the" include plural referents unless the context clearly
dictates otherwise. Thus, for example, reference to "a component
surface" includes reference to one or more of such surfaces.
[0039] By the term "substantially" it is meant that the recited
characteristic, parameter, or value need not be achieved exactly,
but that deviations or variations, including for example,
tolerances, measurement error, measurement accuracy limitations and
other factors known to those of skill in the art, may occur in
amounts that do not preclude the effect the characteristic was
intended to provide.
[0040] Exemplary embodiments of the present invention provide an
apparatus and a method for executing a voice command in an
electronic device and a server.
[0041] FIG. 1A depicts an electronic device 100 for executing a
voice command according to an exemplary embodiment of the present
invention. Electronic device 100 can be any one of a variety of
fixed or portable devices. A portable device can be a portable
terminal, mobile terminal, mobile pad, media player, tablet
computer, smart phone, a notebook/desktop computer, a Personal
Digital Assistant (PDA), a smart camera, and so forth. The
electronic device may be a portable electronic device combining two
or more functions of those devices. An example of a fixed
electronic device is an electronic display device attached to a
home appliance such as a kitchen appliance.
[0042] Electronic device 100 can include a controller 110, a
speaker/microphone 112, a camera 120, a Global Positioning System
(GPS) receiver 130, a Radio Frequency (RF) unit 140, a sensor
module 150, a touch screen 160, a touch screen controller 165, and
an external memory 170.
[0043] Briefly, according to embodiments of the invention, device
100 detects a voice signal and recognizes speech in the detected
signal. When the speech is recognized to contain a wakeup command,
device 100 activates a voice command mode. In the voice command
mode, device 100 is able to respond to subsequent voice commands.
Thus, if the device is in a locked screen state or an idle state
(or in both of these states if conditions allow), the device can be
awakened through the wakeup command recognition without the need
for the user to physically touch a key or touchscreen on the device
100. Once the wakeup command is detected, a speech recognition
process is performed to discern if a voice command has been
uttered. If a voice command is recognized, the device 100 can then
perform an operation associated with that command.
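By way of illustration only, the mode change described in this paragraph may be sketched as a small state transition function. The mode names, the function name `next_mode`, and the phrase "hello device" are hypothetical and are not taken from the disclosure; the sketch also assumes that speech recognition has already produced text.

```python
from enum import Enum, auto


class Mode(Enum):
    IDLE = auto()           # device idle
    LOCKED = auto()         # lock screen displayed
    VOICE_COMMAND = auto()  # awake and able to respond to voice commands


# Hypothetical wakeup phrase; the disclosure only requires that it be
# predefined or set by a user.
WAKEUP_PHRASE = "hello device"


def next_mode(mode: Mode, recognized_text: str) -> Mode:
    """Return the mode after one utterance has been recognized."""
    text = recognized_text.strip().lower()
    # Only an idle or locked device is awakened by the wakeup command.
    if mode in (Mode.IDLE, Mode.LOCKED) and text.startswith(WAKEUP_PHRASE):
        return Mode.VOICE_COMMAND
    return mode
```

In this sketch an utterance that does not begin with the wakeup phrase leaves an idle or locked device unchanged, which mirrors the requirement that no touch input is needed but a wakeup command is.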
[0044] In one embodiment, both the wakeup command detection and the
voice command recognition are performed in the device 100. In
another embodiment, the wakeup command detection is performed at
device 100 and the voice command recognition is performed at a
server, following a transmission of a portion of the voice signal
from the device 100 to the server. In still another embodiment,
both the wakeup command detection and the voice command detection
are performed at the server.
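The three arrangements above differ only in where each stage runs. A minimal sketch, assuming hypothetical names (`Site`, `Embodiment`, `needs_network`) that do not appear in the disclosure, may make the distinction explicit:

```python
from dataclasses import dataclass
from enum import Enum


class Site(Enum):
    DEVICE = "device"
    SERVER = "server"


@dataclass(frozen=True)
class Embodiment:
    wakeup_detection: Site
    command_recognition: Site


# The three arrangements named in the paragraph above.
ON_DEVICE = Embodiment(Site.DEVICE, Site.DEVICE)
SPLIT = Embodiment(Site.DEVICE, Site.SERVER)
ON_SERVER = Embodiment(Site.SERVER, Site.SERVER)


def needs_network(e: Embodiment) -> bool:
    """True when any stage runs at the server, so audio must be transmitted."""
    return Site.SERVER in (e.wakeup_detection, e.command_recognition)
```

Only the fully on-device arrangement avoids transmitting any portion of the voice signal.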
[0045] The controller 110 can include an interface 101, one or more
processors 102 and 103, and an internal memory 104. In some cases,
the whole controller 110 may be referred to as the processor. The
interface 101, the application processor 102, the communication
processor 103, and the internal memory 104 can be separate
components or integrated onto one or more integrated circuits.
[0046] The application processor 102 performs various functions for
the electronic device by running various software programs and the
communication processor 103 processes and controls voice
communication and data communication. In addition to those typical
functions, the processors 102 and 103 also execute a particular
software module (instruction set) stored in the external memory 170
or the internal memory 104 and conduct particular functions
corresponding to the module. That is, the processors 102 and 103
carry out the method of the present invention in association with
software modules stored in the external memory 170 or the internal
memory 104.
[0047] According to one exemplary embodiment of the present
invention (corresponding to the method of FIG. 2) the application
processor 102 receives a voice signal including a wakeup command
and a subsequent voice command from a user through the microphone
112, and performs speech recognition on the voice signal to detect
the presence of the wakeup command. When the wakeup command is
detected, the application processor 102 may detect a silence
duration between the wakeup command and the subsequent voice
command in the voice signal. Thus the application processor 102
determines whether the portion of the voice signal corresponding to
the voice command begins, and when it does, the application
processor 102 sends that portion of the voice signal to a server.
Next, the application processor 102 receives a voice recognition
result corresponding to the voice command from the server and
performs a corresponding operation based on this result.
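The disclosure does not specify how the silence duration is detected. One possible approach, offered here only as an assumption, is a frame energy threshold: frames after the wakeup command whose mean squared amplitude falls below a threshold are counted as silence, and the first louder frame after enough silent frames marks the start of the voice command portion. The function names, frame length, and threshold values below are hypothetical.

```python
from typing import List, Optional, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each full frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def command_start(
    samples: Sequence[float],
    wakeup_end: int,
    frame_len: int = 4,
    threshold: float = 0.01,
    min_silent_frames: int = 2,
) -> Optional[int]:
    """Sample index where speech resumes after the silence that follows
    the wakeup command, or None if the voice command has not begun."""
    silent = 0
    for i, energy in enumerate(frame_energies(samples[wakeup_end:], frame_len)):
        if energy < threshold:
            silent += 1                      # still inside the silence
        elif silent >= min_silent_frames:
            return wakeup_end + i * frame_len  # speech resumed
        else:
            silent = 0                       # residual wakeup speech; keep waiting
    return None
```

Under these assumptions, the portion sent to the server would be `samples[command_start(...):]` once a start index is returned.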
[0048] According to another exemplary embodiment of the present
invention (corresponding to the method of FIG. 4), the application
processor 102 sends to the server all of the voice signal including
the wakeup command portion and the voice command portion, and
receives a speaker verification result corresponding to the wakeup
command from the server. When the result indicates that the wakeup
command is detected, the application processor 102 activates the
system. Next, the application processor 102 receives a voice
recognition result corresponding to the voice command and performs
the operation based on the voice recognition result.
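The exchange of this embodiment may be sketched as follows. The sketch is a simplification under stated assumptions: the server is represented by a hypothetical callable `send_to_server`, and the verification result and the recognition result are collapsed into a single hypothetical `ServerReply` object, whereas the paragraph above describes them as being received in sequence.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ServerReply:
    wakeup_detected: bool        # speaker verification result for the wakeup command
    command_text: Optional[str]  # voice recognition result, if any


def handle_voice_signal(
    signal: bytes,
    send_to_server: Callable[[bytes], ServerReply],
    log: List[str],
) -> bool:
    """Send the whole voice signal and act on the server's results."""
    reply = send_to_server(signal)  # wakeup and command portions together
    if not reply.wakeup_detected:
        return False                # device stays idle or locked
    log.append("activate")          # awaken the system
    if reply.command_text is not None:
        log.append(f"execute:{reply.command_text}")
    return True
```

The `log` list stands in for the device's actual activation and command execution, which are outside the scope of this sketch.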
[0049] According to yet another exemplary embodiment of the present
invention (corresponding to the method of FIG. 6), the application
processor 102 receives the voice signal including the wakeup
command and the voice command from the user through the microphone
112, and performs the wakeup command detection using speech
recognition. When the wakeup command is detected, the application
processor 102 uses a voice recognition algorithm to recognize a
voice command in a subsequent portion of the voice signal and
performs the corresponding operation based on the recognized voice
command.
[0050] One or more voice recognition processors and a speaker
verification processor can be a part of the application processor
102, or can be provided as separate processors. The voice
recognition processor and the speaker verification processor may be
unified, and include a plurality of processors for different
functions according to their implementation. The interface 101
interconnects the touch screen controller 165 of the electronic
device with the external or internal memory 170 or 104.
[0051] The sensor module 150 is coupled to the interface 101 to
allow various functions. For example, a motion sensor and an
optical sensor can be coupled to the interface 101 to detect a
motion of the electronic device or to detect the light from the
outside. Besides these, other sensors such as a position determining
system, a temperature sensor, or a biometric sensor can be connected to
the interface 101 to conduct relevant functions.
[0052] The camera 120 is coupled to the sensor module 150 through
the interface 101 to perform a camera function such as photo and
video clip recording.
[0053] The RF unit 140, which may include at least one processor,
performs a communication function. For example, under control of
the communication processor 103, the RF unit 140 converts an RF
signal to a baseband signal and provides the baseband signal to the
communication processor 103, or converts a baseband signal output
from the communication processor 103 to an RF signal and transmits
the RF signal through an antenna ANT. Here, the communication
processor 103 processes the baseband signal according to various
communication schemes. For example, the communication scheme can
include, but is not limited to, a Global System for Mobile
communication (GSM) communication scheme, an Enhanced Data GSM
Environment (EDGE) communication scheme, a Code Division Multiple
Access (CDMA) communication scheme, a W-CDMA communication scheme,
a Long Term Evolution (LTE) communication scheme, an Orthogonal
Frequency Division Multiple Access (OFDMA) communication scheme, a
Wireless Fidelity (Wi-Fi) communication scheme, a WiMax
communication scheme, and/or a Bluetooth communication scheme.
[0054] The speaker/microphone 112 can input and output an audio
signal such as one for voice recognition (used during a training
process to train device 100 to recognize a particular speaker
and/or wakeup command and/or voice command), voice reproduction,
digital recording, and telephone function. That is, the
speaker/microphone 112 converts the voice signal to an electric
signal or converts the electric signal to the voice signal. An
attachable and detachable earphone, headphone, or headset (not
shown) can be connected to the electronic device through an
external port.
[0055] The touch screen controller 165 can be coupled to the touch
screen 160. The touch screen 160 and the touch screen controller
165 can detect a touch, a motion, or the cessation thereof using
techniques including, but not limited to, capacitive, resistive,
infrared, and surface acoustic wave techniques for determining one
or more touch points on the touch screen 160, as well as multi-touch
detection techniques including various proximity sensor arrays or
other elements.
[0056] The touch screen 160 provides an input/output interface
between the electronic device and the user. That is, the touch
screen 160 forwards a user's touch input to electronic device 100.
The touch screen 160 also presents an output of device 100 to the
user. That is, the touch screen 160 presents a visual output to the
user. Here, the visual output can be represented as text, graphics,
video, or a combination of these.
[0057] The touch screen 160 can employ various displays, examples
of which include, but are not limited to, Liquid Crystal Display
(LCD), Light Emitting Diode (LED), Light emitting Polymer Display
(LPD), Organic LED (OLED), Active Matrix OLED (AMOLED) or Flexible
LED (FLED).
[0058] The GPS receiver 130 converts a signal received from an
artificial satellite to information such as location, speed, or
time. For example, a distance between the satellite and the GPS
receiver 130 can be calculated by multiplying the speed of light by
a signal arrival time, and the location of the electronic device
can be measured using the well-known triangulation by obtaining
accurate positions and distances of three satellites.
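As a worked example of the distance calculation, the following sketch multiplies the speed of light by the signal arrival time, read here as the time the signal takes to travel from the satellite to the receiver. The function name and the sample time of 0.07 s are illustrative assumptions.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def satellite_distance_m(travel_time_s: float) -> float:
    """Distance covered by a signal travelling at the speed of light."""
    return SPEED_OF_LIGHT_M_PER_S * travel_time_s
```

For an illustrative travel time of 0.07 s, `satellite_distance_m(0.07)` gives about 20,985 km.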
[0059] The external memory 170 or the internal memory 104 can
include fast random access memory and/or non-volatile memory such
as one or more magnetic disc storage devices, one or more optical
storage devices, and/or a flash memory (e.g., NAND and NOR).
[0060] The external memory 170 or the internal memory 104 stores
software. Software components include an operating system software
module, a communication software module, a graphic software module,
a user interface software module, an MPEG module, a camera software
module, and one or more application software modules. Since the
module being the software component can be a set of instructions,
the module can be referred to as an instruction set. The module may
be referred to as a program.
[0061] The operating system software includes various software
components for controlling general system operations. The control
of the general system operations includes, for example, memory
management and control, storage hardware (device) control and
management, and power control and management. The operating system
software may process normal communication between various hardware
devices and the software components (modules).
[0062] The communication software module allows communication with
other electronic devices such as a computer, a server, and/or a
portable terminal, through the RF unit 140. The communication software
module is configured in a protocol architecture of the
corresponding communication scheme.
[0063] The graphic software module includes various software
components for providing and displaying graphics on the touch
screen 160. The term `graphics` embraces text, webpage, icon,
digital image, video, animation, and the like.
[0064] The user interface software module includes various software
components relating to a user interface. The user interface
software module is involved in the status change of the user
interface and the condition of the user interface status
change.
[0065] The camera software module includes camera related software
components allowing camera related processes and functions. The
application module includes a browser, e-mail, instant messaging,
word processing, keyboard emulation, an address book, a touch list,
a widget, Digital Rights Management (DRM), voice recognition, voice
reproduction, a position determining function, a location based
service, and the like. The memories 170 and 104 can include
additional modules (instructions) in addition to the above-stated
modules. Alternatively, if necessary, some modules (instructions)
may not be used.
[0066] Herein, the application module includes an instruction for
carrying out a speaker recognition function or a speech recognition
function and a voice command execution function. The instructions
according to exemplary embodiments of the present invention
correspond to those for executing the operations illustrated in
FIGS. 2, 4 and 6.
[0067] The various functions of electronic device 100 as mentioned
above and to be explained, can be executed in hardware and/or
software and/or their combination including one or more signal
processing and/or Application Specific Integrated Circuits
(ASICs).
[0068] FIG. 1B illustrates a system 195 for executing voice
commands according to an embodiment of the present invention.
System 195 includes the portable terminal 100 which communicates
with a server 190 through a network 180. Server 190 can be e.g., a
home network server, or a remote server accessed through a large
network such as the Internet. Alternatively, server 190 can be a
third party portable electronic device capable of performing a
speech/voice/speaker recognition and analysis function on voice
signals transmitted thereto. Server 190 includes at least one
processor 192 and a memory 194 to perform a host of
operations. Exemplary operations of the server 190 in conjunction
with electronic device 100 will be described hereafter.
[0069] FIG. 2 is a flowchart of a method 200 for executing wakeup
and voice commands in the electronic device 100 according to one
exemplary embodiment of the present invention.
[0070] At step 201, the electronic device 100 detects a voice
signal which may contain the wakeup command and the voice command
from the user through the microphone 112. The wakeup command
activates a voice command mode of the system, in which no touch
contact is required with the touchscreen or a key in order to
receive and analyze a voice command. Prior to receiving this voice
signal, the device 100 can be in an idle mode or a lockscreen mode.
In some embodiments, prior to receiving the voice signal, the
device 100 can be in an application execution mode in which no
listening for voice commands or operations responsive to voice
commands are executed.
[0071] In the following description, it will be assumed that the
wakeup command is typically independent of a voice command that
temporally follows the wakeup command. However, in some
"speaker-dependent" embodiments also discussed below, any voice
signal detected to be spoken by a predetermined speaker can serve
as a wakeup command. In still other embodiments (speaker dependent
or speaker independent), the wakeup command also contains an
inherent voice command. In the latter case, the wakeup command both
activates the voice command mode and is a catalyst for the device
100 to perform an additional predetermined task, such as running a
predefined application set by the user.
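The case in which a wakeup command carries an inherent voice command may be sketched as a lookup from wakeup phrase to an optional task. The phrases, the task name, and the function `on_wakeup` are hypothetical and serve only to illustrate the idea.

```python
from typing import Dict, Optional, Tuple

# Hypothetical phrases and task names, for illustration only.
WAKEUP_TASKS: Dict[str, Optional[str]] = {
    "hello device": None,             # plain wakeup: only enter voice command mode
    "hello camera": "launch_camera",  # wakeup that also carries a command
}


def on_wakeup(phrase: str) -> Tuple[bool, Optional[str]]:
    """Return (voice_command_mode_active, task_to_run)."""
    key = phrase.strip().lower()
    if key not in WAKEUP_TASKS:
        return False, None            # not a wakeup command
    return True, WAKEUP_TASKS[key]
```

A phrase mapped to `None` only activates the voice command mode, while a phrase mapped to a task both activates the mode and triggers the predefined task.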
[0072] For example, the wakeup command can instruct the device to switch to a
mode for inputting the voice command ("voice command mode") and/or
to unlock the screen. The voice command executes various functions
provided by the electronic device 100. For example, the voice
command executes dialing, photographing, MP3 playing, and so on. In
various implementations, the voice command can request the server
190 to search a map and to plan a route.
[0073] In step 202, the electronic device 100 performs speech
recognition on the voice signal to discern whether the voice signal
contains a predetermined wakeup command. This speech recognition
can include a speaker dependent recognition scheme in one
embodiment, or a speaker independent recognition scheme in another
embodiment. Additional schemes are possible where a number of
different wakeup commands are predetermined, in which one or more
predetermined commands is a speaker dependent wakeup command and
one or more other commands is a speaker independent command.
[0074] According to the speaker dependent recognition scheme, a
particular speaker or user needs to train a recognizer with his/her
own voice in advance. In this case, the speech recognizer can
recognize only the speech of the trained voice. The speaker
independent recognition scheme can recognize speech of an arbitrary
speaker voice. The speaker independent recognition scheme extracts
and databases information about hundreds or thousands of voices in
advance, and thus any user can use it without a separate training
process.
[0075] Using the speaker dependent recognition, in some
embodiments, the speaker can be verified using the voice command
portion of the voice signal (which may comprise the entire voice
signal). Hence, there is no need to input a separate wakeup
command. For example, when the speaker is verified using unique
voice characteristics of the user, there is no need to input the
separate wakeup command. Accordingly, in these embodiments, the
voice command can also operate as the wakeup command. Thus, in
steps 202 and 204, the specific voice of the particular user is
recognized, and the wakeup command is automatically detected via this
recognition.
[0076] Alternatively, when using the speaker dependent recognition
scheme, a speaker verification is performed based on the
information of a voice signal corresponding to a word(s) that is
predefined, or set by a user in advance through repeated voice
training. The user can train the device 100 to verify the speaker
(and a specific wakeup command(s)) by inputting his/her voice
corresponding to a predefined text. In this case, it is necessary to
input the wakeup command. Herein, the predefined text can be input
directly by the user or obtained by converting voice that the user
inputs several times. The electronic device 100 or the server 190
can convert the voice to the text.
[0077] When the wakeup command is detected in step 204, the method
proceeds to step 206. Otherwise, it returns to step 201.
[0078] Although not illustrated in FIG. 2, when the wakeup command
is successfully detected, an "object for recognizing the voice
command" is activated on the display in the locked screen (see FIG.
8A). This object, which may be a virtual microphone, indicates
activation of the voice command mode, i.e., that the device is
actively listening for voice commands. At this time, the object may
be displayed for the first time in the locked screen or, if a faded
version was previously visible, displayed in an emphasized manner. A
Graphical User Interface (GUI) relating to the speech/voice
recognition can also be displayed at this time. Alternatively, when
the wakeup command is detected, the displayed object can be
activated and the GUI relating to the voice recognition can be
immediately displayed in the unlocked screen.
[0079] In an embodiment variation, when the wakeup command is
detected in the idle mode and the screen is locked, the object for
recognizing the voice command and the GUI relating to the voice
recognition are displayed together. When the screen is not locked,
the object for recognizing the voice command and the GUI relating
to the voice recognition can also be displayed together.
[0080] In step 206, the electronic device 100 detects a silence
duration (if one exists) between a first portion of the detected
voice signal (hereafter, "first voice signal") corresponding to the
wakeup command and a second portion of the detected voice signal
(hereafter, "second voice signal") corresponding to the voice
command. Of course, this assumes that the voice command is a
separate entity from the wakeup command (as mentioned above, an
embodiment is possible where the wakeup command is also a voice
command). For example, assuming that the wakeup command is "Hi
Galaxy" and the voice command is "Call Hong Gil-dong", when the
user consecutively pronounces "Hi Galaxy" and "Call Hong Gil-dong",
a silence duration exists between "Hi Galaxy" and "Call Hong
Gil-dong".
[0081] A short pause between two words in the detected speech can
be used to detect the start of the voice command. In an embodiment,
an extraneous portion of the detected voice signal immediately
following the wakeup command can be blocked from being sent to the
server together with the ensuing voice command. To do so, a
Voice Activity Detection (VAD) technique can be used. For example,
a voice signal typically has more energy than a background noise
signal including the "silence" period. However, when background
noise is low, unique characteristics of the human voice can be
additionally identified. Typically, the unique characteristics of
the human voice are identified by observing energy distribution
throughout various frequencies. The human voice exhibits a
characteristic signature that noise lacks. Hence, the VAD technique
can distinguish speech from a silence period including background
noise. Accordingly, in an embodiment, instead of transmitting to
the server an audio signal including all sounds detected subsequent
to the wakeup command, the device 100 waits until speech is
detected, and thereafter transmits only sound signals beginning
with the detected speech that follows the wakeup command. That is,
the method 200 avoids transmitting signals containing just noise of
a silent period following a wakeup command detection.
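The energy criterion described above can be illustrated with a short sketch. It is a simplification under stated assumptions: only frame energy is compared against a threshold, and the frequency-distribution check mentioned for low-noise conditions is omitted. The function names, the frame length, the threshold, and the `min_frames` parameter are hypothetical and are not taken from the disclosure.

```python
from typing import List, Optional, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def speech_start(samples: Sequence[float], frame_len: int,
                 threshold: float, min_frames: int = 2) -> Optional[int]:
    """Sample index where speech begins, or None if no speech is found.

    Speech is declared at the first run of `min_frames` consecutive frames
    whose energy exceeds `threshold` (an assumed, tunable noise floor).
    """
    run = 0
    for idx, energy in enumerate(frame_energies(samples, frame_len)):
        run = run + 1 if energy > threshold else 0
        if run == min_frames:
            return (idx - min_frames + 1) * frame_len
    return None


def trim_to_command(samples: Sequence[float], frame_len: int,
                    threshold: float) -> List[float]:
    """Drop the silent period that follows the wakeup command.

    `samples` is assumed to be the audio captured after wakeup detection;
    only the part starting at the detected speech is kept for transmission.
    """
    start = speech_start(samples, frame_len, threshold)
    return [] if start is None else list(samples[start:])
```

Requiring several consecutive frames above the threshold is one way to keep a brief noise burst from being mistaken for the start of the voice command.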
[0082] In step 208, device 100 determines whether the second voice
signal corresponding to the voice command begins. For example,
device 100 checks the start point of the voice signal corresponding
to "Call Hong Gil-dong". When the second voice signal begins,
device 100 at step 210 sends the voice signal corresponding to the
voice command (e.g., "Call Hong Gil-dong") to the server. (The
portion of the voice signal transmitted to the server is variously
referred to herein as "the transmit signal.") (When the voice signal
corresponding to the voice command does not begin at 208, the flow
returns to 206.) Advantageously, by transmitting the voice command
to the server, device 100 is freed from the processor intensive
task of recognizing the speech of the voice command.
[0083] In step 212, device 100 receives the voice recognition
result corresponding to the voice command from the server. For
example, the server analyzes the voice command "Call Hong
Gil-dong", and sends a control signal corresponding to "Call Hong
Gil-dong" to device 100 or sends a search result of the route
planning request or the map search request.
[0084] In step 214, device 100 performs the corresponding operation
based on the voice recognition result corresponding to the voice
command, or displays a result corresponding to the voice
recognition. For example, when receiving the control signal
corresponding to "Call Hong Gil-dong" from the server, device 100
searches a phonebook for a phone number of Hong Gil-dong and tries
to connect the call with the searched phone number. In the case of
the map/route request, device 100 displays the search result of the
route planning request or the map search request. Thereafter, the
process ends.
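The device-side flow of steps 201 through 214 can be sketched as follows. This is an illustration only, not the claimed implementation: the voice signal is modeled as a list of already-recognized segments in which `None` stands for a silent frame, and `run_method_200`, the `server` callable, and the layout of the response dictionary are hypothetical.

```python
from typing import Callable, Dict, List, Optional

WAKEUP = "hi galaxy"  # example wakeup phrase used in the description

Segment = Optional[str]    # None models a silent frame (assumption)
Response = Dict[str, str]  # hypothetical layout of the server reply


def run_method_200(segments: List[Segment],
                   server: Callable[[List[str]], Response]) -> str:
    """Return a description of what the device does for one voice signal."""
    # Steps 202/204: local wakeup detection; on failure go back to step 201.
    if not segments or segments[0] != WAKEUP:
        return "idle"
    # Steps 206/208: skip the silence duration until the voice command begins.
    start = 1
    while start < len(segments) and segments[start] is None:
        start += 1
    if start == len(segments):
        return "voice command mode: waiting for command"
    # Step 210: only the command portion becomes the transmit signal.
    transmit = [s for s in segments[start:] if s is not None]
    # Step 212: receive the voice recognition result from the server.
    response = server(transmit)
    # Step 214: act on a control signal, otherwise display the result.
    if response["type"] == "control":
        return "execute " + response["value"]
    return "display " + response["value"]
```

Note that the wakeup segment and the silent frames never reach the `server` callable, which mirrors the division of work between the device and the server in this embodiment.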
[0085] FIG. 3 is a flowchart illustrating a method, 300, performed
by server 190 according to one exemplary embodiment of the present
invention. This method may complement the operations of the
above-described method 200 operating in device 100. In this
embodiment, the server receives the transmit signal, i.e., the
voice signal corresponding to the voice command (e.g., "Call Hong
Gil-dong") from the electronic device (e.g., transmitted at step
210 of FIG. 2) in step 301.
[0086] Next, the server analyzes the voice signal corresponding to
the voice command using a voice recognition algorithm
(equivalently, "speech recognition" algorithm) in step 302. That
is, the server analyzes the voice signal to recognize speech and
discern a voice command from the recognized speech. The server then
determines whether the result corresponding to the voice
recognition is a control signal in step 304. If so, the server
sends the control signal corresponding to the voice recognition to
device 100 in step 306. For example, after recognizing "Call Hong
Gil-dong", the server provides the corresponding control signal to
device 100 to instruct device 100 to call Hong Gil-dong at an
associated phone number extracted from a phone book storage
thereof.
[0087] When the result corresponding to the voice recognition is
not the control signal, the server provides the result
corresponding to the voice recognition to the electronic device in
step 308. Alternatively, the server sends image content containing
the search result of the route planning request or the map search
request, whereby the device 100 displays the content.
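The decision made in steps 304 through 308 can be sketched as a small dispatch function. The sketch assumes that step 302 has already produced recognized text; the verb table, the signal names, and the reply layout are hypothetical and serve only to show the split between a control signal and a result.

```python
from typing import Dict

# Hypothetical table mapping recognized command verbs to device control signals.
CONTROL_VERBS = {"call": "DIAL", "play": "PLAY_MEDIA", "photograph": "CAPTURE"}


def build_response(recognized_text: str) -> Dict[str, str]:
    """Steps 304-308: classify a recognized command and build the reply."""
    text = recognized_text.strip()
    verb, _, argument = text.partition(" ")
    control = CONTROL_VERBS.get(verb.lower())
    if control is not None:
        # Step 306: the device itself must act, so send a control signal.
        return {"type": "control", "signal": control, "argument": argument}
    # Step 308: otherwise the server fulfills the request and returns content
    # (a placeholder string stands in for a real map or route search result).
    return {"type": "result", "content": "search result for: " + text}
```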
[0088] Accordingly, in the exemplary embodiments of methods 200 and
300, the electronic device fulfills the wakeup command detection
and the server fulfills the voice recognition of the voice command.
According to another exemplary embodiment of the present invention,
the server carries out both of the wakeup command detection and the
voice recognition of the voice command.
[0089] FIG. 4 is a flowchart of another example method, 400,
performed in device 100 according to another exemplary embodiment
of the present invention. Here, in an idle mode and/or locked
screen mode, device 100 receives a voice signal including the
wakeup command followed shortly thereafter (or continuously
thereafter) by the voice command from the user through the
microphone 110 in step 401. As described earlier, the wakeup
command, when recognized, activates the system. For example, the
wakeup command can instruct the device 100 to switch to the mode for
inputting the voice command and/or to unlock the screen. The voice command
commands execution of various functions provided by the electronic
device 100. For example, the voice command executes dialing,
photographing, MP3 playing, and so on.
[0090] In step 402, device 100 sends the entire voice signal
including the wakeup command and the voice command, to the server
as the transmit signal. Next, a voice verification result
corresponding to the wakeup command is received from the server
(step 404). That is, when the server detects that the transmit
signal contains the wakeup command, it sends the voice recognition
result that is received in step 404; otherwise, the server may not
send any recognition signal back to device 100. For example, when
device 100 receives the recognition result at step 404, this
indicates that the wakeup command was detected, and device 100
activates the system in step 406. The system activation unlocks the
screen or switches from the idle mode to an active mode or voice
command mode. (With the system activated and in voice command mode,
device 100 may subsequently detect new voice signals containing
voice commands as in step 401 and repeat steps 402-404 and
subsequent steps accordingly.)
[0091] Next, device 100 receives the voice recognition result
corresponding to the voice command in step 408, and performs the
operation based on the voice recognition result or displays the
result corresponding to the voice recognition in step 410. For
example, when receiving the control signal corresponding to "Call
Hong Gil-dong" from the server, device 100 searches the phonebook
for the phone number of Hong Gil-dong and tries to connect the call
with the searched phone number. In the map/route example, device
100 displays the search result of the route planning request or the
map search request. Thereafter, the process ends, and device 100
may receive new voice signals at step 401 and forward these to the
server for processing, whereby the server may continue to respond
by sending control signals and/or results corresponding to the
subsequent voice commands. That is, steps 401 through 410 may be
repeated with relevant operations only for the voice commands, but
of course not for the wakeup command since the device 100 has
already been awakened.
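The behavior of method 400, including the repetition of steps 401 through 410 without a further wakeup command, can be sketched as a small state machine. This is a toy model under stated assumptions: the class name `Device400`, the `server` callable, the flag telling the server whether a wakeup command is still required, and the reply keys `wakeup_detected` and `result` are all hypothetical.

```python
from typing import Callable, Dict, List


class Device400:
    """Toy model of method 400: the server performs all recognition."""

    def __init__(self, server: Callable[[str, bool], Dict[str, object]]):
        self.server = server      # hypothetical transport to server 190
        self.active = False       # False models idle mode / locked screen
        self.log: List[str] = []  # operations performed or results displayed

    def on_voice_signal(self, signal: str) -> None:
        # Step 402: send the entire voice signal as the transmit signal,
        # telling the server whether a wakeup command is still required.
        reply = self.server(signal, not self.active)
        if not self.active:
            # Steps 404/406: activate only if the server confirmed the wakeup.
            if not reply.get("wakeup_detected"):
                return
            self.active = True
            self.log.append("system activated")
        # Steps 408/410: perform the operation or display the result.
        if reply.get("result") is not None:
            self.log.append(str(reply["result"]))
```

Once `active` is set, later voice signals are handled as voice commands only, which corresponds to the repetition described at the end of the paragraph above.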
[0092] FIG. 5 is a flowchart of an exemplary method 500 performed
by server 190 according to another exemplary embodiment of the
present invention. This method may complement the operations of the
above-described method 400 operating in device 100.
[0093] At step 501, the server receives the voice signal including
the wakeup command and the voice command from the electronic device
100 (i.e., the transmit signal transmitted at step 402). In step
502, the server analyzes the voice signal corresponding to the
wakeup command using the voice recognition algorithm. That is, the
server analyzes the voice signal corresponding to the wakeup
command and thus determines whether or not the wakeup command is
detected. This operation may be the same as that of steps 202 and
204 in FIG. 2 performed by device 100 in that embodiment. Note that
a speaker-dependent and/or speaker independent recognition
operation may be performed, as in the embodiment of FIG. 2. (Both
types of recognition schemes may be employed if multiple
predetermined wakeup commands are under consideration.)
[0094] In step 504, the server provides a speech verification
result to the electronic device. (Note that step 504 may be omitted
in other implementations.)
[0095] When the wakeup command is detected in step 506 as a result
of the speech recognition processing, the server then analyzes the
voice signal corresponding to the voice command using the voice
recognition algorithm in step 508. That is, the server recognizes
the speech corresponding to the voice command, and generates a
response signal corresponding to an action to be performed by
device 100 for the particularly discerned voice command. By
contrast, when the wakeup command is not detected in the voice
signal, the flow returns to step 501. To this end, the server may
transmit a signal informing device 100 that no wakeup command has
been detected, whereby device 100 may continue to transmit to the
server newly detected voice signals at step 501. In various
implementations, when the voice signal corresponding to the
previous voice command is normal, the server can request and
receive only the first voice signal corresponding to the wakeup
command.
[0096] Although not illustrated, the server can detect a silence
duration between the voice signal corresponding to the wakeup
command and the voice signal corresponding to the voice command,
and thus distinguish the wakeup command and the voice command.
[0097] In step 510, the server notifies the electronic device of the
wakeup command result and the voice recognition result (the response
signal). For example, the server determines whether the
wakeup command is detected by analyzing whether the speech contains
the phrase "Hi Galaxy", analyzes the voice command "Call Hong
Gil-dong", and thus sends the control signal corresponding to "Call
Hong Gil-dong" to the electronic device 100.
[0098] Next, the server finishes this process, and may be
configured to listen for subsequent voice signal transmissions from
device 100 as in step 501. To this end, suitable signaling between
device 100 and server 190 can be designed to inform server 190 if
device 100 has returned to an idle mode or a lock screen mode. If
so, the server would treat a subsequently received voice signal as
one that may contain the wakeup command. If not, the server would
naturally just listen for a new voice command.
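The server-side handling of method 500, including the dependence on whether device 100 has returned to an idle or lock screen mode, can be sketched as follows. The sketch rests on several assumptions: the transmit signal is modeled as a list of recognized segments with `None` for silent frames, the device state arrives as a boolean `device_idle`, and the function name and reply keys are hypothetical.

```python
from typing import Dict, List, Optional

WAKEUP = "hi galaxy"  # example wakeup phrase used in the description


def handle_transmit_signal(segments: List[Optional[str]],
                           device_idle: bool) -> Dict[str, object]:
    """Steps 501-510 for one transmit signal; None models a silent frame."""
    spoken = [s for s in segments if s is not None]
    if device_idle:
        # Steps 502/506: the first spoken portion must be the wakeup command.
        if not spoken or spoken[0] != WAKEUP:
            return {"wakeup_detected": False, "response": None}
        # The remaining portions are those that follow the silence duration.
        spoken = spoken[1:]
    # Step 508: recognize the voice command portion (recognition is stubbed;
    # the joined text stands in for the generated response signal).
    command = " ".join(spoken) if spoken else None
    # Step 510: report the wakeup result together with the response signal.
    # When the device is already awake, no wakeup check is made, so the
    # wakeup flag is reported as False.
    return {"wakeup_detected": device_idle, "response": command}
```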
[0099] FIG. 6 is a flowchart depicting operations of an example
method, 600, performed by the electronic device according to yet
another exemplary embodiment of the present invention. In this
embodiment, the electronic device carries out both the wakeup
command detection and the voice recognition for the voice
commands.
[0100] Steps 601, 602, 604, 606 and 608 may be the same as steps
201, 202, 204, 206 and 208, respectively, of FIG. 2. The following
description of steps 601-608 reiterates some of the concepts
described in connection with steps 201-208.
[0101] At step 601, in the idle mode or in the locked screen, the
electronic device 100 receives the voice signal including the
wakeup command and the voice command from the user through the
microphone 110. The wakeup command activates the system. For
example, the wakeup command can instruct the device 100 to switch to
the mode for inputting the voice command or to unlock the screen. The voice
command executes various functions provided by the electronic
device 100. For example, the voice command executes dialing,
photographing, MP3 playing, and so on.
[0102] In step 602, device 100 analyzes the voice signal using a
speech recognition algorithm (voice recognition algorithm) to
determine whether the voice signal contains the wakeup command. As
explained earlier, if speaker-dependent recognition is employed,
this operation may involve merely detecting that the voice matches
a predetermined voice, or that the predetermined voice also
contains particular speech matching a predetermined wakeup
command(s). Alternatively, a speaker-independent recognition scheme
may be utilized. When the wakeup command is recognized in step 604,
the flow proceeds to step 606; otherwise, it returns to step
601.
[0103] In step 606, device 100 detects the silence duration between
the voice signal portion corresponding to the wakeup command and
the voice signal portion corresponding to the voice command. For
example, provided that the wakeup command is "Hi Galaxy" and the
voice command is "Call Hong Gil-dong", when the user consecutively
pronounces "Hi Galaxy" and "Call Hong Gil-dong", the silence
duration lies between "Hi Galaxy" and "Call Hong Gil-dong".
[0104] In step 608, the electronic device 100 determines whether
the voice signal corresponding to the voice command begins. For
example, the electronic device 100 checks the start point of the
voice signal corresponding to "Call Hong Gil-dong" in step 608.
[0105] When the voice signal corresponding to the voice command
begins in step 608, the electronic device 100 analyzes the voice
signal corresponding to the voice command using the voice
recognition algorithm in step 610.
[0106] In step 612, the electronic device 100 performs the
corresponding operation based on the recognized voice command. For
example, when the recognized voice command is "Call Hong Gil-dong",
the electronic device 100 searches the phonebook for the phone
number of Hong Gil-dong and tries to connect the call with the
searched phone number. Thereafter, the process ends.
[0107] FIG. 7 depicts an example voice signal including a wakeup
command and a voice command that may be analyzed in the embodiments
described above. The illustrative voice signal input to device 100
may contain a wakeup command and a voice command in succession.
That is, the voice signal may have a portion 700 corresponding to
the wakeup command and a portion 720 corresponding to the voice
command, which are successively input to the electronic device. A
silence duration portion 710 lies between the wakeup portion 700
and the voice command portion 720.
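The division of the voice signal of FIG. 7 into a wakeup portion, a silence duration, and a voice command portion can be sketched on per-frame speech flags. This is an illustrative sketch only: the flags are assumed to come from a voice activity detector that is not shown, and `split_voice_signal` and its `min_gap` parameter are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # half-open frame range [start, end)


def split_voice_signal(is_speech: List[bool],
                       min_gap: int = 2) -> Optional[Tuple[Span, Span, Span]]:
    """Return (wakeup, silence, command) spans of frame indices.

    The first silent run of at least `min_gap` frames that is followed by
    more speech is taken as the silence duration; shorter pauses (such as
    the pause between the words of a wakeup phrase) stay inside the wakeup
    span. Returns None when no such separation exists.
    """
    n = len(is_speech)
    start = 0
    while start < n and not is_speech[start]:
        start += 1                      # skip leading silence
    i = start
    while i < n:
        if is_speech[i]:
            i += 1
            continue
        gap_start = i
        while i < n and not is_speech[i]:
            i += 1                      # walk to the end of this silent run
        if i < n and i - gap_start >= min_gap:
            end = n
            while not is_speech[end - 1]:
                end -= 1                # drop trailing silence
            return (start, gap_start), (gap_start, i), (i, end)
    return None
```

The `min_gap` threshold is a design choice of the sketch: it lets a short pause inside "Hi Galaxy" be treated as part of the wakeup portion rather than as the boundary before the voice command.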
[0108] FIGS. 8A, 8B and 8C are screenshots depicting a dialing
operation using the voice signal including the wakeup command and
the voice command according to an exemplary embodiment of the
present invention. As shown in FIG. 8A, an icon object 800 for
recognizing the voice command is activated according to the voice
signal portion 700 corresponding to the wakeup command. The voice
command ("Call Hong Gil-dong") corresponding to the voice command
portion 720 of the voice signal is recognized as shown in FIG. 8B,
and then the operation is conducted according to the voice command.
For example, the phone number of Hong Gil-dong is searched in the
phonebook and the call connection automatically commences with the
searched phone number as shown in FIG. 8C.
[0109] FIGS. 9A and 9B depict screenshots of a screen unlocked
through speech recognition and control operations according to an
exemplary embodiment of the present invention. FIG. 9A depicts an
example lock screen; FIG. 9B shows an example unlocked screen. The
process of unlocking the locked screen to generate the unlocked
screen as illustrated in FIGS. 9A and 9B can be performed in any of
the above-described methods of FIGS. 2, 4 and 6 (e.g., steps 214,
406, 410 or 612).
[0110] In response to detection of the wakeup command portion 700
of the voice signal matching a predetermined wakeup command or
matching a particular user's voice, the locked screen of FIG. 9A is
switched to the unlocked screen of FIG. 9B. Although not depicted,
the corresponding operation can be performed by recognizing the
voice command portion 720 corresponding to the voice command ("Call
Hong Gil-dong") following the voice signal 700 corresponding to the
wakeup command after the screen is unlocked.
[0111] In the exemplary embodiments of the present invention
described above, the wakeup command and the voice command are
separated. Alternatively, the voice signal corresponding to the
voice command can be used for both of the speaker verification and
the voice command. Namely, the speaker is verified with the voice
signal corresponding to the voice command. When the speaker
verification is successful, the corresponding function of the
electronic device can be controlled or executed according to the
voice command.
[0112] The above-described methods according to the present
disclosure can be implemented in hardware or software alone or in
combination.
[0113] For software, a computer-readable storage medium containing
one or more programs (software modules) can be provided. One or
more programs stored to the computer-readable storage medium are
configured for execution by one or more processors of the
electronic device and/or the server. One or more programs include
instructions making the electronic device and/or the server execute
the methods according to the embodiments as described in the claims
and/or the specification of the present disclosure.
[0114] Such programs (software module, software) can be stored to a
random access memory, a non-volatile memory including a flash
memory, a Read Only Memory (ROM), an Electrically Erasable
Programmable ROM (EEPROM), a magnetic disc storage device, a
Compact Disc ROM (CD-ROM), Digital Versatile Discs (DVDs) or other
optical storage devices, and a magnetic cassette. Alternatively,
the programs can be stored to a memory combining part or all of
those recording media. A plurality of memories may be equipped.
[0115] The programs can be stored to an attachable storage device
of the electronic device and/or the server accessible via the
communication network such as the Internet, an Intranet, a Local Area
Network (LAN), a Wireless LAN (WLAN), or a Storage Area Network (SAN), or a
communication network by combining the networks. The storage device
can access the electronic device and/or the server through an
external port.
[0116] A separate storage device in the communication network can
access the portable electronic device/server.
[0117] As set forth above, since the detected voice signal
including the wakeup command portion and the voice command portion
is processed, the user can easily execute the voice command.
[0118] In addition, since the wakeup command detection is fulfilled
before the voice command is executed, the voice command can be
carried out while preserving security and protecting personal
information.
[0119] While the invention has been shown and described with
reference to certain exemplary embodiments thereof, it will be
understood by those skilled in the art that various changes in form
and details may be made therein without departing from the spirit
and scope of the invention as defined by the appended claims and
their equivalents.
* * * * *