U.S. patent application number 15/877,016 was filed with the patent
office on January 22, 2018, and published on October 25, 2018, for
determining if an action can be performed based on a dialogue.
This patent application is currently assigned to Digital Genius
Limited. The applicant listed for this patent is Digital Genius
Limited. The invention is credited to YORAM BACHRACH and PAVEL
MINKOVSKY.
United States Patent Application 20180307745
Kind Code: A1
BACHRACH, YORAM; et al.
Publication Date: October 25, 2018
DETERMINING IF AN ACTION CAN BE PERFORMED BASED ON A DIALOGUE
Abstract
A method comprises: receiving input of a dialogue; processing
the dialogue by a neural network based system, to output, for each
of a plurality of slots, a probability distribution over a range of
values associated with the respective slot, the neural network
based system being trained using a training dataset comprising a
plurality of dialogues and, for each dialogue, a value
corresponding to each slot, wherein each dialogue resulted in an
action; determining, based at least on the probability distribution
for each slot, if an action requiring a value for at least
some of the slots can be performed; if not, causing continuing of
the dialogue.
Inventors: BACHRACH, YORAM (London, GB); MINKOVSKY, PAVEL (Belmont, CA)
Applicant: Digital Genius Limited, London, GB
Assignee: Digital Genius Limited, London, GB
Family ID: 60037296
Appl. No.: 15/877,016
Filed: January 22, 2018
Current U.S. Class: 1/1
Current CPC Class: G06N 3/084 (20130101); G06F 40/35 (20200101);
G06N 3/006 (20130101); G06N 3/0445 (20130101); G06N 3/0454
(20130101); G10L 15/16 (20130101); G10L 15/1822 (20130101); G06F
16/3329 (20190101); G10L 15/22 (20130101); G06F 40/30 (20200101)
International Class: G06F 17/30 (20060101); G10L 15/22 (20060101);
G10L 15/16 (20060101); G06F 17/27 (20060101); G10L 15/18 (20060101)

Foreign Application Data
Date: Apr 25, 2017; Code: GB; Application Number: 1713746.4
Claims
1. A method comprising: receiving input of a dialogue; processing
the dialogue by a neural network based system, to output, for each
of a plurality of slots, a probability distribution over a range of
values associated with the respective slot, the neural network
based system being trained using a training dataset comprising a
plurality of dialogues and, for each dialogue, a value
corresponding to each slot, wherein each dialogue resulted in an
action; determining, based at least on the probability distribution
for each slot, if an action requiring a value for at least some of
the slots can be performed; if not, causing continuing of the
dialogue.
2. The method of claim 1, wherein the determining if the action can
be performed comprises: determining, for each slot, if one of the
values can be selected based at least on the probability
distribution and at least one selection criterion; determining if
the action can be performed at least based also on a result of the
determining if one of the values can be selected for each slot.
3. The method of claim 2, further comprising: for each of the slots
for which a value can be selected, selecting the value for the
slot; and if the required values are selected, causing the action
to be performed using the selected values.
4. The method of claim 3, wherein, for each slot, if a result of
the determining is that no value can be selected for a slot,
associating an indication that no value can be selected with the
slot.
5. The method of claim 3, wherein the selecting the values for the
slots comprises selecting the mode value of the probability
distribution for the respective slot.
6. The method of claim 5, wherein the at least one selection
criterion comprises determining if the probability distribution
indicates that a probability score for the mode value meets a
requirement for the extent to which the probability score for the
mode value is greater than the probability score for other of the
values.
7. The method of claim 2, wherein the at least one selection
criterion comprises: determining, for each slot, a prior
distribution of the values for that slot in the training dataset;
determining, for each slot, a divergence value indicative of
divergence of the probability distribution from the prior
distribution; comparing the divergence value to a predetermined
threshold value; determining that one of the values can be selected
based on a result of the comparing.
8. The method of claim 7, wherein the determining, for each slot,
the divergence value, comprises evaluating the Kullback-Leibler
divergence between the prior distribution and the probability
distribution.
9. The method of claim 1, wherein the action has parameters, and
each slot corresponds to a respective one of the parameters.
10. The method of claim 9, wherein the determining if an action
requiring at least some of the values can be performed comprises
determining if a value is selected for each of the slots.
11. The method of claim 10, wherein the action comprises an API
routine.
12. The method of claim 11, wherein the training dataset comprises
API calls data comprising the plurality of dialogues, for each
dialogue, information indicative of each parameter, and, for each
parameter a respective value, each of the values was recorded by a
human agent when such a value was known to the human agent from the
corresponding dialogue, and the human agent invoked an API call to
the corresponding routine.
13. The method of claim 1, wherein the neural network based system
comprises a recurrent neural network component and, for each slot,
a respective classifier, wherein the processing the input dialogue
comprises: generating word representation vectors for the dialogue;
inputting the vectors into the recurrent neural network component,
and outputting a further vector for each slot; processing, for each
slot, the respective further vector, using the respective
classifier, to generate the probability distribution for the values
of the respective slot.
14. The method of claim 3, wherein the determining, for each slot,
if an action requiring at least one of the values can be performed
comprises: inputting a selected value or an indication that a value
cannot be selected for each slot to a decision module; determining,
by the decision module, to perform at least one of: causing the
action to be performed, and the causing continuing of the dialogue
by a non-person agent.
15. The method of claim 1, further comprising: determining, using
the training dataset, the slots; determining possible values for
each of the slots; setting the determined values for each slot as a
range for that slot.
16. The method of claim 1, further comprising: training the neural
network based system using the training dataset comprising a
plurality of dialogues and, for each dialogue, the value
corresponding to each slot, wherein each dialogue resulted in the
action in the form of an API call invocation.
17. A system comprising: a neural network based system configured
to: receive input of a dialogue; process the dialogue by a neural
network based system; output, for each of a plurality of slots, a
probability distribution over a range of values associated with the
respective slot, the neural network based system being trained
using a training dataset comprising a plurality of dialogues and,
for each dialogue, a value corresponding to each slot, wherein each
dialogue resulted in an action; a decision module configured to:
determine, based at least on the probability distribution for each
slot, if an action requiring a value for at least some of the slots
can be performed; if not, causing continuing of the dialogue.
18. A computer program product comprising computer program code
stored on a computer readable storage medium, wherein, the computer
program code is configured to, when run on a processing unit,
perform the steps of: receiving input of a dialogue; processing the
dialogue by a neural network based system, to output, for each of a
plurality of slots, a probability distribution over a range of
values associated with the respective slot, the neural network
based system being trained using a training dataset comprising a
plurality of dialogues and, for each dialogue, a value
corresponding to each slot, wherein each dialogue resulted in an
action; determining, based at least on the probability distribution
for each slot, if an action requiring a value for at least some of
the slots can be performed; if not, causing continuing of the
dialogue.
19. The computer program product of claim 18, wherein the
determining if the action can be performed comprises: determining,
for each slot, if one of the values can be selected based at least
on the probability distribution and at least one selection
criterion; determining if the action can be performed at least
based also on a result of the determining if one of the values can
be selected for each slot.
20. The computer program product of claim 19, further comprising:
for each of the slots for which a value can be selected, selecting
the value for the slot; and if the required values are selected,
causing the action to be performed using the selected values.
Description
FIELD OF THE INVENTION
[0001] The invention relates to a method of determining if an
action requiring values can be performed based on a dialogue, in
particular where the action is a routine invokable by an API call
and where the dialogue is between a user and an automated agent,
such as a chatbot. The invention also relates to a related system
and computer program product. The invention also relates to a
method of automatically determining a state structure for a
dialogue tracking system, together with a related system and
computer program product.
BACKGROUND
[0002] Dialogue systems, sometimes called conversational agents,
are systems designed to converse with humans in natural language in
a coherent way, typically in order to help the user achieve some
goal. Uses of such systems include responding to customer
questions, replying to queries regarding a knowledge base,
automating help desk functions or providing technical support or
training and education.
[0003] Conversational agents that both converse with users and
autonomously take actions on their behalf are particularly
challenging to build. One method for designing such conversational
agents is though statistical dialogue systems, which maintain a
distribution over multiple hypotheses regarding correct state of
the dialogue, so as to be robust to complex requests by users
stated using utterances that may be noisy or ambiguous. Quality of
tracking of a correct dialogue state at different points in a
conversation has a strong impact on achieving a high system-wide
performance Dialogue systems comprise a dialogue state tracker
(DST), which infers a user's intentions as a conversation
progresses. DST systems represent the user's intention at a point
in a conversation as a belief-state, composed of a set of
slot-value pairs. Assigning a specific value to a slot reflects a
constraint or requirement a user has.
[0004] For example, an automated subscription management system may
allow creating, freezing or canceling subscriptions to various
magazines, such as Sports, Arts or Science magazines. In this case,
a person may have subscriptions to any subset of the magazines, and
a subscription can be frozen for a month, a quarter (three months)
or a whole year (assuming that a created subscription lasts until
canceled, and that once a subscription is canceled, no magazines
are sent until the user creates a new subscription). Thus, a
conversational agent talking to a customer has to determine which
action the user wants to take (create, freeze or cancel), which
magazine they want to take the action on (Sports, Arts or Science);
in the case of freezing a subscription, the agent must also
determine the duration (a month, quarter or a year). A DST system
for the above example domain might include three allowed actions:
Create, Freeze, Cancel. Further, it is reasonable to track two
slots: a Magazine slot (which can take the values Sports, Arts or
Science), and a Duration slot (which can take the values Month,
Quarter, Year, depending on how long the user wants to freeze the
subscription for). The list of slots and the possible values each
can take are referred to as the state-structure. This example is
referred to throughout.
[0005] It is an object of the present invention to provide a way to
automatically take actions based on dialogue.
SUMMARY OF THE INVENTION
[0006] An example conversation, along with the desired belief-state
that a DST would ideally output at every point in the conversation,
is provided in FIG. 1. Actions taken on the external subscription
management system are also listed (represented as routine calls to
the external system). The left side shows the conversation text
itself (chat log) along with the executed actions, and the right
side shows the desired belief-state to be outputted by the DST
system.
[0007] A major shortcoming of the current approach to building DST
systems is the lack of labelled data. Many firms typically do store
historical logs of past conversations between human agents and
customers. However, these conversations do not come annotated with
the correct belief state at every point in the conversation. In
order to get training data to build the DST, researchers have
proposed using a Wizard-of-Oz approach. In this approach, a human
domain expert first examines the historical chat logs to identify
key slots to track through the conversation, and the values that
these slots may take; the domain expert then constructs the full
state-structure for the DST. Then, human annotators are asked to
examine each conversation in the historical chat logs; they are
asked to provide the belief-state at every point in each such
conversation, under the state-structure specified by the domain
expert. Such annotations are indicated on the right in FIG. 1.
[0008] The above described approach is referred to as a "tight
supervision" approach, as the machine learning system gets a
supervision signal at every point in every conversation in the
training set. As this tight supervision approach requires
annotations to be done manually, it is very costly and does not
scale. Furthermore, it prevents DST models from easily generalizing
across different domains.
[0009] In accordance with a first aspect of the present invention,
there is provided a method comprising: receiving input of a
dialogue; processing the dialogue by a neural network based system,
to output, for each of a plurality of slots, a probability
distribution over a range of values associated with the respective
slot, the neural network based system being trained using a
training dataset comprising a plurality of dialogues and, for each
dialogue, a value corresponding to each slot, wherein each dialogue
resulted in an action; determining, based at least on the
probability distribution for each slot, if an action requiring a
value for at least some of the slots can be performed; if not,
causing continuing of the dialogue.
[0010] Thus, where the neural network based system is trained with
such a dataset, it can be determined whether an action can be
performed, for example a routine executed, based on the dialogue,
and if the action cannot be performed, the dialogue system can be
instructed to continue the dialogue. The method may be applied for
each utterance input by a user.
[0011] The need to annotate training datasets is thus avoided.
[0012] The determining if the action can be performed may comprise:
determining, for each slot, if one of the values can be selected
based at least on the probability distribution and at least one
selection criterion; determining if the action can be performed at
least based also on a result of the determining if one of the
values can be selected for each slot.
[0013] The method may further comprise: for each of the slots for
which a value can be selected, selecting the value for the slot;
and if the required values are selected, causing the action to be
performed using the selected values.
[0014] For each slot, if a result of the determining is that no
value can be selected for a slot, an indication that no value can
be selected may be associated with the slot.
[0015] The selecting the values for the slots may comprise
selecting the mode value of the probability distribution for the
respective slot.
[0016] The at least one selection criterion may comprise
determining if the probability distribution indicates that a
probability score for the mode value meets a requirement for the
extent to which the probability score for the mode value is greater
than the probability score for other of the values.
[0017] The at least one selection criterion may comprise:
determining, for each slot, a prior distribution of the values for
that slot in the training dataset; determining, for each slot, a
divergence value indicative of divergence of the probability
distribution from the prior distribution; comparing the divergence
value to a predetermined threshold value; determining that one of
the values can be selected based on a result of the comparing.
[0018] The determining, for each slot, the divergence value, may
comprise evaluating the Kullback-Leibler divergence between the
prior distribution and the probability distribution.
[0019] According to the method, the action may have parameters, and
each slot may correspond to a respective one of the parameters.
[0020] The determining if an action requiring at least some of the
values can be performed may comprise determining if a value is
selected for each of the slots.
[0021] The action may comprise an API routine.
[0022] The training dataset may comprise API calls data comprising
the plurality of dialogues, for each dialogue, information
indicative of each parameter, and, for each parameter, a respective
value, wherein each of the values was recorded by a human agent when such a
value was known to the human agent from the corresponding dialogue,
and the human agent invoked an API call to the corresponding
routine.
[0023] The neural network based system may comprise a recurrent
neural network component and, for each slot, a respective
classifier, wherein the processing the input dialogue comprises:
generating word representation vectors for the dialogue; inputting
the vectors into the recurrent neural network component, and
outputting a further vector for each slot; processing, for each
slot, the respective further vector, using the respective
classifier, to generate the probability distribution for the values
of the respective slot.
[0024] The determining, for each slot, if an action requiring at
least one of the values can be performed may comprise: inputting a
selected value or an indication that a value cannot be selected for
each slot to a decision module; determining, by the decision
module, to perform at least one of: causing the action to be
performed, and the causing continuing of the dialogue by a
non-person agent.
[0025] The method may comprise: determining, using the training
dataset, the slots; determining possible values for each of the
slots; setting the determined values for each slot as a range for
that slot.
[0026] The method may further comprise: training the neural network
based system using the training dataset comprising a plurality of
dialogues and, for each dialogue, the value corresponding to each
slot, wherein each dialogue resulted in the action in the form of
an API call invocation.
[0027] According to a second aspect of the present invention, a
system may comprise: a neural network based system configured to:
receive input of a dialogue; process the dialogue by a neural
network based system; output, for each of a plurality of slots, a
probability distribution over a range of values associated with the
respective slot, the neural network based system being trained
using a training dataset comprising a plurality of dialogues and,
for each dialogue, a value corresponding to each slot, wherein each
dialogue resulted in an action; a decision module configured to:
determine, based at least on the probability distribution for each
slot, if an action requiring a value for at least some of the slots
can be performed; if not, causing continuing of the dialogue.
[0028] According to a third aspect of the present invention, a
computer program product comprising computer program code stored on
a computer readable storage medium, wherein, the computer program
code is configured to, when run on a processing unit, perform the
steps of: receiving input of a dialogue; processing the dialogue by
a neural network based system, to output, for each of a plurality
of slots, a probability distribution over a range of values
associated with the respective slot, the neural network based
system being trained using a training dataset comprising a
plurality of dialogues and, for each dialogue, a value
corresponding to each slot, wherein each dialogue resulted in an
action; determining, based at least on the probability distribution
for each slot, if an action requiring a value for at least some of
the slots can be performed; if not, causing continuing of the
dialogue.
[0029] In accordance with a fourth aspect of the present invention,
there is provided a method of determining a state structure for API
calls to a predetermined API, comprising: determining one or more
slots of an API using past API calls data, wherein the API calls
data comprises information indicative of one or more slots and a
plurality of values for the or each of the parameters; determining
a plurality of possible values for the or each slot; setting the
determined values for each slot as a range for that slot.
[0030] The API calls data may comprise one or more parameters for
an API, wherein the or each slot corresponds to a respective
parameter, wherein the API calls data represents the one or more
parameters and the plurality of values for the or each slot in a
first format, the method further comprising converting the API call
data to a second format, wherein the determining the one or more
slots of the API and the plurality of possible values for the or
each slot is performed using the API calls data in the second
format.
[0031] The converting may comprise: inputting each API call in the
first format into a trained neural network based on a
sequence-to-sequence model; processing each API call in the first
format by the neural network; and outputting each API call in the
second format.
[0032] The trained neural network may comprise a recurrent neural
network having an encoder-decoder architecture. The creating of the
slot for each parameter and the setting of the determined values
for each slot may be performed using a parsing function.
[0033] In accordance with a fifth aspect of the present invention,
there is provided a system for determining a state structure for
API calls to a predetermined API, comprising: a determining unit
configured to: determine one or more slots of an API using past API
calls data, wherein the API calls data comprises information
indicative of one or more slots and a plurality of values for the
or each of the parameters; determine a plurality of possible values
for the or each slot; set the determined values for each slot as a
range for that slot.
[0034] In accordance with a sixth aspect of the present invention,
there is provided a computer program product comprising computer
program code stored on a computer readable storage medium, wherein,
the computer program code is configured to, when run on a
processing unit, perform the steps of: determining one
or more slots of an API using past API calls data, wherein the API
calls data comprises information indicative of one or more slots
and a plurality of values for the or each of the parameters;
determining a plurality of possible values for the or each slot;
setting the determined values for each slot as a range for that
slot.
BRIEF DESCRIPTION OF THE FIGURES
[0035] For better understanding of the present invention,
embodiments will now be described, by way of example only, with
reference to the accompanying Figures in which:
[0036] FIG. 1 shows a dialogue with manually provided belief states
indicated for each utterance in the dialogue;
[0037] FIG. 2 shows illustratively an example of conversion of an
API calls dataset in a first format to an API calls dataset in a
canonised format in accordance with embodiments;
[0038] FIG. 3 shows illustratively an architecture of a
sequence-to-sequence model for use in the conversion;
[0039] FIG. 4 is a flowchart indicating steps in a process of
extracting a state structure from an example API calls dataset, in
accordance with embodiments of the invention;
[0040] FIG. 5 shows illustratively an architecture of a dialogue
state tracking (DST) system in accordance with embodiments;
[0041] FIG. 6 is a flowchart indicating steps that take place in
the DST system and a strategy network in accordance with
embodiments;
[0042] FIG. 7 shows illustratively an architecture of a strategy
network in accordance with embodiments of the invention;
[0043] FIG. 8 illustrates a comparison between tight supervision,
as known from prior art, and loose supervision in accordance with
embodiments of the invention;
[0044] FIG. 9 shows diagrammatically components in an example
computing device on which embodiments of the invention may be
implemented.
DETAILED DESCRIPTION OF EMBODIMENTS
[0045] Embodiments of the invention relate to a system configured
to automatically determine values required to perform an action,
and to determine whether the values required for the action have
been automatically determined. Actions in the form of routines that
can be caused to be performed by invocation of API calls to an
external system are referred to herein, but embodiments of the
system are not limited to such. Embodiments may be implemented in
other systems where values required to perform an action are to be
determined based on dialogue.
[0046] The term "utterance" is to be understood herein as an
uninterrupted sequence of words. An utterance may be input as text
by the user, or spoken, in which case the system includes a
conversion module to convert the speech to text. In the context of
the embodiments, a dialogue consists of alternating utterances by
the user and by a computerised agent.
[0047] An API consists of stored protocols and routines for using
the external system. An API routine name identifies a single
routine that can be called. An invocation of that routine relates
to a specific call to that routine and includes specific values for
parameters of the routine that are passed to the routine.
[0048] For instance, the routine "Freeze" may be used to access a
subscription management system to freeze a subscription, and take
two parameters: "Magazine", relating to the specific magazine (such
as "Sport" for the Sports Magazine or "Art" for the Arts Magazine)
and "Duration", relating to the time frame to freeze the
subscription for (such as "Month" for a single month, "Quarter" for
three months or "Year" for a full year). An API call invocation may
result from a GUI (graphical user interface) call. For instance,
when manipulating the GUI, a user may select a specific magazine
and time duration from drop-down menus, and click a "freeze
subscription" button. This would result in an API call invocation
for freezing a subscription. For example, a call for freezing a
certain user's Sports Magazine for a month might look like
"Freeze(Magazine=Sport, Duration=Month)", where "Freeze" is the API
routine name, and where "Magazine" and "Duration" are the names of
the parameters, and where "Sport" and "Month" are the concrete
parameter values in the call.
[0049] The routine requires values for certain parameters in order
to execute, and these values have to be provided. Accordingly, an
API call to the external system for the routine has to identify a
value for each of the parameters, as well as typically the routine.
Embodiments of the invention relate to a state structure extractor
(SSE). The SSE is configured to determine parameters required by
the API and also a set of possible values for each of the
parameters, using a corpus of API calls data of past calls to the
API. Embodiments also relate to a system for determining, based on
a dialogue, values for use in performance of an action and
determining if required values for the performance of the action
have been determined with acceptable certainty.
[0050] The SSE is configured to determine, based on the corpus of
API calls data to the particular API, a state structure defining
dialogue state tracking (DST) slots and possible values for each
slot. Each of the slots corresponds to a respective parameter of
the routine that was the subject of at least one API call. The
possible values for each slot are referred to as the range of that
slot. The corpus comprises dialogues and API calls data, where
dialogue was between a user and a human agent, and resulted in the
human agent causing an API call using parameter values that the
human agent determined from the dialogue.
[0051] Referring to FIG. 2, the SSE is configured to use a parser,
for example a regular expression ("regex") matching algorithm, to
extract parameters and values from the API calls data in the corpus
when the API calls data are provided as text strings in a standard
format, referred to herein as the "canonical API format", and to
determine the state structure.
[0052] However, there are many alternative formats for API calls
which could be used; the delimiters may be different, parameter
names and their values may be separated by other characters, and so
on. For instance, API calls could be given as XML under some
schema, or in a JSON structure. When the API calls data is not
provided in the canonical API format, the SSE is configured to
convert the API calls data into the canonical API format as a prior
step. This is achieved using a sequence-to-sequence conversion
module, into which the original API call data is input and from
which the API call data in the canonical format is output.
[0053] The state structure is denoted herein as a tuple $R=(S,f)$,
which describes a set of slots to be tracked in a conversation, and
the values that each slot can take. The set of slots is denoted
$S=\{s_1,\ldots,s_k\}$. For a slot $s_i$, which can take $d_i$
different values, the possible values are denoted
$V_i=\{v_{i,1},v_{i,2},\ldots,v_{i,d_i}\}$; $V_i$ is called the
range of the slot $s_i$. The state structure thus consists of the
set $S$ of slots and a function $f$ mapping each slot to the set of
values it can take, so $R=(S,f)$ and $f(s_i)=V_i$.
[0054] In some implementations, it may be desirable to indicate
that the user has not, in the dialogue, expressed a desire or a
constraint for a certain slot. In this case, the range of a slot
may include an indicator $\varnothing$ indicating that the user has
not yet expressed an intent regarding this slot.
[0055] If the user has indicated in the dialogue that any value for
a particular slot is acceptable to them, the SSE is configured to
assign a value A to that slot indicative of such.
[0056] In the canonical format, each API call is represented by a
respective routine, in association with each parameter for the
routine and a value for each parameter. This may be represented,
for example, as
$$\text{Routine}(\text{param}_1=v_1;\ \text{param}_2=v_2;\ \ldots;\ \text{param}_u=v_u)$$
[0057] The term "Routine" relates to the name of the called
routine, $\text{param}_i$ is the name of the $i$th parameter, and
$v_i$ is the value passed to this parameter.
[0058] For example, given an API data set of the form:
freeze-subscription(duration=month;magazine=art)
freeze-subscription(duration=quarter;magazine=sports)
freeze-subscription(duration=year;magazine=cooking)
[0059] The state structure becomes:
action:[freeze-subscription]
duration:[month;quarter;year]
magazine:[art;sports;cooking]
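By way of illustration only, the following minimal Python sketch
shows the kind of parsing described in paragraph [0051] applied to
calls that are already in the canonical format. The regular
expression and the dictionary-based representation of the state
structure are assumptions made for this sketch, not the patented
implementation.

import re
from collections import defaultdict

CALL_RE = re.compile(r"^(?P<routine>[\w-]+)\((?P<args>[^)]*)\)$")

def extract_state_structure(api_calls):
    """Scan canonical API call strings and build {slot: range}."""
    slots = defaultdict(set)  # slot name -> set of observed values
    for call in api_calls:
        m = CALL_RE.match(call.strip())
        if m is None:
            continue  # skip calls not in the canonical format
        slots["action"].add(m.group("routine"))
        for pair in m.group("args").split(";"):
            name, _, value = pair.partition("=")
            if name:
                slots[name.strip()].add(value.strip())
    return {slot: sorted(values) for slot, values in slots.items()}

calls = [
    "freeze-subscription(duration=month;magazine=art)",
    "freeze-subscription(duration=quarter;magazine=sports)",
    "freeze-subscription(duration=year;magazine=cooking)",
]
print(extract_state_structure(calls))
# {'action': ['freeze-subscription'],
#  'duration': ['month', 'quarter', 'year'],
#  'magazine': ['art', 'cooking', 'sports']}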
[0060] Formally, given an API routine $r$, the set of API calls to
the routine $r$ in the corpus of API calls is denoted
$A_r=(c_1, c_2, \ldots, c_{n_r})$, where $n_r$ is the number of
times $r$ has been called according to the corpus, and where each
$c_j$ is an API call of the form
$r(p_1=y_{j,1}, p_2=y_{j,2}, \ldots, p_u=y_{j,u})$, with $u$
denoting the number of parameters of the routine $r$, $p_k$
denoting the name of the $k$th parameter of routine $r$, and
$y_{j,k}$ denoting the concrete value specified for parameter $k$
on the $j$th call to routine $r$.
[0061] The SSE is configured to create one slot for every
parameter. Given a routine $r$ and its set $A_r$ of API calls in
the data as defined above, creation of the $u$ slots (where $u$ is
the number of parameters observed in calls to $r$) is denoted as
follows:
$$s_{r,1}:=p_1;\quad s_{r,2}:=p_2;\quad \ldots;\quad s_{r,u}:=p_u$$
where $s_{r,j}$ denotes the $j$th slot created for routine $r$, and
where the symbol ":=" denotes the creation of a slot (the name of
the slot $s_{r,j}$ is the same as the name of the $j$th parameter
in the call to routine $r$).
[0062] The SSE is also configured, after creating the slots
$S_r=(s_{r,1}, s_{r,2}, \ldots, s_{r,u})$ for routine $r$, to
generate the ranges for each slot. The SSE sets the range for each
slot to be the union of all the values observed in the corpus of
API calls data. The range generated for the $k$th slot created for
routine $r$ is denoted $V_{s_{r,k}}$ and is set as follows:
$$V_{s_{r,k}} := \bigcup_{j=1}^{n_r} \{y_{j,k}\} \cup \{\varnothing_{r,k}\}$$
where $y_{j,k}$ denotes the value passed to the $k$th parameter in
the $j$th call to routine $r$ in the corpus of API calls $A_r$, and
where the symbol $\varnothing_{r,k}$ is a special value indicating
that no constraint has yet been specified for parameter $k$ of
routine $r$.
[0063] Given a corpus containing API calls over the $c$ routines
$I=(r_1, r_2, \ldots, r_c)$, where routine $r_i$ has $u_i$
parameters, all the slots can be created for all the routines.
$S_{r_i}$ denotes the set of slots for routine $r_i$, so
$$S_{r_i} = (s_{r_i,1}, s_{r_i,2}, \ldots, s_{r_i,u_i})$$
where $s_{r_i,j}$ relates to the slot created for the $j$th
parameter of routine $r_i$. The final list of slots consists of all
the slots across all the routines:
$$S = \bigcup_{r_i \in I} S_{r_i} = (s_{r_1,1}, \ldots, s_{r_1,u_1},\ s_{r_2,1}, \ldots, s_{r_2,u_2},\ \ldots,\ s_{r_c,1}, \ldots, s_{r_c,u_c})$$
[0064] For each slot $j$ (which relates to some parameter) of each
routine $r_i$, a range $V_{r_i,j}$ is generated as discussed above.
[0065] The final dialogue state structure is denoted $R=(S,f)$,
where the slots $S$ are as defined above, and where
$f(s_{r_i,j})=V_{r_i,j}$.
[0066] The trained sequence-to-sequence ("Seq2Seq") conversion
module is configured to convert previously unobserved API call
formats to the canonical format. In an alternative embodiment, an
algorithm may be provided that is configured to convert a specific
API call format to the canonical format.
[0067] The API calls data in a non-canonical format comprise text,
in the form of strings of symbols (including at least letters and
numbers). Referring to FIG. 3, the Seq2Seq conversion module
includes an encoder configured to receive input of such text and to
produce a numerical representation of it in the form of a
low-dimensional embedding $v \in \mathbb{R}^d$, where $d$ is the
chosen dimensionality of the embedding. The conversion module also
includes a decoder, which processes the embedding to generate an
output in the form of the API calls data in the canonical format.
[0068] The encoder and decoder have recurrent neural network (RNN)
architectures. The RNN may include long short-term memory (LSTM) or
gated recurrent unit (GRU) cells. Alternatively, a feedforward
neural network architecture may be used in place of an RNN
architecture, although the outputs may be less accurate,
particularly for longer sequences.
[0069] In an example, a set of input symbols is denoted by $A$, and
an input sequence is denoted by $(x_1, x_2, \ldots, x_n)$, where
each element is a symbol (i.e. $x_i \in A$ for all $i$). Given a
required hidden size $d$, a recurrent neural network (RNN)
iteratively processes the input text to yield a low-dimensional
embedding $h_t \in \mathbb{R}^d$ by applying a cell function
$c: A \times \mathbb{R}^d \to \mathbb{R}^d$; the RNN iterates over
the equation $h_t = c(x_t, h_{t-1})$.
[0070] Given the RNN hidden states $\{h_t\}_{t=1}^{n}$, an output
$y_t \in \mathbb{R}^k$ (with $k$ denoting the output's
dimensionality) is produced at every timestep, by applying a
transformation function $g: \mathbb{R}^d \to \mathbb{R}^k$.
[0071] The Seq2Seq conversion module is trained using a historical
training set. The historical training set comprises pairs of an
input sequence, denoted $a_i$, and an output sequence, denoted
$b_i$. The encoder is referred to in the following as RNN E and the
decoder is referred to as RNN D. RNN E is configured to receive an
input sequence $a=(x_1, \ldots, x_n)$ and to yield the hidden
states $(h_1, \ldots, h_n)$. The final hidden state $h_n$ of RNN E
is then copied as the first hidden state of the decoder.
[0072] During training, the decoder RNN D receives as an input the
ground-truth output text shifted by one location,
$b=(s, z_1, \ldots, z_n)$ (where $s$ is a special start symbol); at
every timestep $t$ the decoder yields a decoder hidden state
$h'_t$ and an output $y_t$, where the dimensionality of $y_t$ is
chosen so that $y_t$ encodes a distribution over the words in a
vocabulary. The desired output from the decoder after digesting the
$t$th ground-truth output word $z_t$ is a probability distribution
placing most of the probability mass on the next ground-truth word
$z_{t+1}$.
[0073] The encoder and decoder are jointly trained so that, once
the encoder receives text as input, the decoder will produce the
desired output (predicting the next ground-truth word at every
timestep). The loss function used for training is the sum of the
softmax cross-entropy losses between the output distribution and
the one-hot encoding of the correct word. Given a vocabulary $V$,
the decoder input at time $t$ is denoted as $s$ for $t=1$ and
$z_{t-1}$ for $t>1$, and the decoder output at time $t$ is denoted
$y_t=(r_t^1, r_t^2, \ldots, r_t^{|V|})$. This is transformed into a
normalized distribution $u_t=(u_t^1, u_t^2, \ldots, u_t^{|V|})$ by
applying the softmax operator
$$u_t^i = \frac{\exp(r_t^i)}{\sum_{j=1}^{|V|} \exp(r_t^j)}.$$
[0074] The target output word at time $t$ is $z_t$, and its
"one-hot" encoding is denoted
$\alpha(z_t)=(\alpha_t^1, \ldots, \alpha_t^{|V|}),$
[0075] where $\alpha_t^i=1$ for $i=z_t$ and $\alpha_t^i=0$
elsewhere (i.e. if $z_t$ is the $k$th word in the vocabulary, then
$\alpha(z_t)$ is a vector with all coordinates set to 0 except in
the $k$th location, where it is set to 1). The overall loss is the
cross-entropy loss between $u_t$ and $\alpha(z_t)$ summed along all
the timesteps:
$$\mathcal{L} = -\sum_{t=1}^{n} \sum_{i=1}^{|V|} \alpha_t^i \log u_t^i.$$
[0076] The Seq2Seq model may be trained by applying a variant of
stochastic gradient descent (SGD) with backpropagation over a
training set consisting of inputs and their ground-truth outputs.
Training results in setting the parameters of the encoder and
decoder RNN cells so as to achieve a low loss.
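As a concrete illustration of this encoder-decoder arrangement, the
following PyTorch sketch trains on one mini-batch. The hidden size,
vocabulary size and teacher-forcing setup are assumptions for the
sketch; the patent does not prescribe a framework or specific
hyperparameters.

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size, d=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.encoder = nn.LSTM(d, d, batch_first=True)  # RNN E
        self.decoder = nn.LSTM(d, d, batch_first=True)  # RNN D
        self.out = nn.Linear(d, vocab_size)             # transformation g

    def forward(self, src, tgt_in):
        # Encode the raw-format API call; (h, c) is the final encoder state.
        _, (h, c) = self.encoder(self.embed(src))
        # The decoder starts from the encoder's final state and digests the
        # ground-truth output shifted by one location (teacher forcing).
        dec, _ = self.decoder(self.embed(tgt_in), (h, c))
        return self.out(dec)  # logits y_t over the vocabulary

model = Seq2Seq(vocab_size=100)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()  # softmax cross entropy against z_{t+1}

src = torch.randint(0, 100, (8, 20))  # tokenised raw-format calls
tgt = torch.randint(0, 100, (8, 21))  # start symbol + canonical calls
logits = model(src, tgt[:, :-1])
loss = loss_fn(logits.reshape(-1, 100), tgt[:, 1:].reshape(-1))
loss.backward()
optimizer.step()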
[0077] The training data to train the Seq2Seq model may be prepared
by a human. For example, a state structure may have three slots,
and each slot may have 10,000 values. To create the training data,
values are sampled to fill their respective slots. A list of
different symbols is used for delimiters, parentheses, etc.; these
are uniformly sampled to create the structure of each input x. The
targets (y) may be created simultaneously with each x, by simply
placing the slots and values in a constant canonical format.
Example x's and y's are illustrated in FIG. 2. A sketch of this
data-generation process follows.
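The Python sketch below makes the sampling concrete. The slot
names, value lists, delimiter sets and assignment symbols are
invented for illustration; only the overall recipe (random surface
format in, fixed canonical format out) is taken from the paragraph
above.

import random

SLOTS = {"duration": ["month", "quarter", "year"],
         "magazine": ["art", "sports", "cooking"]}
SEPARATORS = [";", ",", "|", "&"]
BRACKETS = [("(", ")"), ("[", "]"), ("{", "}")]
ASSIGNS = ["=", ":", "->"]

def sample_pair(routine="freeze-subscription"):
    """Return one (x, y) training pair for the Seq2Seq converter."""
    values = {s: random.choice(v) for s, v in SLOTS.items()}
    sep = random.choice(SEPARATORS)
    lb, rb = random.choice(BRACKETS)
    asn = random.choice(ASSIGNS)
    # x: the same content dressed in a randomly sampled surface format.
    x = routine + lb + sep.join(f"{s}{asn}{v}" for s, v in values.items()) + rb
    # y: the constant canonical format the decoder must learn to emit.
    y = routine + "(" + ";".join(f"{s}={v}" for s, v in values.items()) + ")"
    return x, y

print(sample_pair())
# e.g. ('freeze-subscription[duration:year|magazine:art]',
#       'freeze-subscription(duration=year;magazine=art)')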
[0078] Referring to FIG. 4, operation of the SSE is now described.
At step 400, the API calls data is converted from its original
format to the canonical format, where the API calls data is
initially in a non-canonical format. The API call data, that is,
the data indicative of at least the routine, parameters and values
for each API call, is input into the encoder RNN, processed, and
then output by the decoder in the canonical format. Some API
formats may not include names for parameters; in this case, a name
is assigned.
[0079] At step 402, the SSE determines slots for the or each
routine using the API calls data in the canonical format. This is
achieved by scanning the calls data and identifying the parameters
of each routine, and creating and storing a slot for a parameter
each time that a new parameter is identified.
[0080] At step 404, the SSE determines a set of possible values for
each slot, that is, the range of each slot. This is achieved by
scanning the values in the API calls data for each parameter and,
each time a new value is found, storing the value in association
with the slot for that parameter.
[0081] The system comprises a DST and a decision module in the form
of a "strategy network". The DST is configured to receive the state
structure and a dialogue as inputs and to predict values for each
slot in the state structure. The DST is configured to output a
prediction for each slot, yielding an element of:
$$T = V_{r_1,1} \times V_{r_1,2} \times \ldots \times V_{r_1,u_1} \times V_{r_2,1} \times \ldots \times V_{r_2,u_2} \times \ldots \times V_{r_c,1} \times \ldots \times V_{r_c,u_c}$$
[0082] where "$\times$" denotes a Cartesian product. A probability
distribution $\Pi_{s_{r_i,j}}$ is required for each slot
$s_{r_i,j}$, rather than a single value from the range $V_{r_i,j}$
of the slot (where $V_{r_i,j}$ is determined by the SSE). The DST
system produces the distribution $\Pi_{s_{r_i,j}}$, and the mode of
this distribution is output as the prediction (or the value
$\varnothing$, indicating no constraint, is output if the
distribution fails the tests described below). Thus, any element of
the set $T$ is a valid possible DST output.
[0083] The strategy network is configured to use the values
predicted by the DST and to determine whether a routine can be
executed or whether more information is needed. The DST and the
strategy network, and their operation, are described in detail in
the following.
[0084] The system is coupled to a dialogue system, for example a
chatbot engine. Components of a dialogue system may include an
automatic speech recognition module (for cases where the dialogue
is voice based rather than chat based), a natural language
understanding unit for obtaining semantic information for
utterances or parts of the conversation (name identification, part
of speech tagging, semantic parsing), a dialogue manager which
keeps the history and state of the dialogue and manages the flow of
the conversation, and an output generator which produces utterances
for continuing the conversation. Detailed description of the
dialogue system is outside the scope of this description. The
dialogue system may be a chatbot.
[0085] Referring to FIG. 5, an architecture of the DST comprises a
recurrent neural network (RNN) and a DST head for every slot.
[0086] The RNN is configured to receive an utterance and to output
a dense numerical vector representation of the utterance in a
latent space. For example, the RNN may be a bidirectional long
short-term memory (LSTM) RNN.
[0087] Each such DST head is configured to receive the numerical
vector representation produced by the RNN, and to output a
prediction regarding the correct value for the slot. Each DST head
is in the form of a simple classifier; for example, each DST head
may be configured to classify the vector using logistic regression
to output a probability distribution over the range of its slot in
the state structure, or may use a feedforward neural network
configured to receive the vector and to output the probability
distribution. Alternatively, other kinds of classifier may be
used.
[0088] Preferably, the dialogue is processed by the DST immediately
following receipt of a new utterance from the user. This means that
the conversation need only continue for the least time necessary,
since the dialogue can be ended after the strategy network
determines that there is sufficient information to execute a
routine. However, the DST can also receive as inputs a prefix of a
conversation, or an entire conversation, C.
[0089] Before it can be used, the DST is trained using the API
calls data. This is the same API calls data from which the state
structure was extracted by the SSE, although, to train the DST,
dialogue associated with each API call from that API calls data is
also used. The data consists of pairs $(g, s)$, where $g$ is a text
representing a conversation or a part of it, and where $s$ is a
belief-state adhering to the state structure (i.e. $s$ assigns a
value to every slot from the range of that slot).
[0090] The training set uses conversations along with their API
call invocations. A conversation $c$ is a sequence of utterances,
denoted $u_i$, and API call invocations, denoted $a_i$. For
instance, a possible conversation could be
$c=(u_1, u_2, u_3, u_4, a_1, u_5, u_6, a_2, u_7, u_8)$ (where
locations 1 to 4 are utterances, location 5 is an API call
invocation, locations 6 and 7 are utterances, location 8 is an API
call invocation, and locations 9 and 10 are utterances). $t(c)$
denotes the set of indices where an API call occurs, so in the
above example $t(c)=\{5, 8\}$. $p_i(c)$ denotes the prefix of the
conversation up until (but not including) index $i$; for instance,
in the above example $p_5(c)=(u_1, u_2, u_3, u_4)$ and
$p_8(c)=(u_1, u_2, u_3, u_4, a_1, u_5, u_6)$. For a set of indices
$J$, $P_J(c)$ denotes the set of all conversation prefixes up until
each of the indices in $J$ (i.e. every prefix of $c$ ending in an
index from $J$). Thus, $P_{t(c)}(c)$ is the set of prefixes ending
in (but not including) API calls; in the previous example,
$P_{t(c)}(c)=\{p_5(c), p_8(c)\}$.
[0091] At every point in a conversation where an API call is
executed, a ground-truth supervision signal for the correct
belief-state at that point in the conversation can be obtained by
converting the API call invocation to its canonical format and
extracting the passed parameter values as described above. For
every API call invocation $a$ occurring at an index $j$ of a
conversation $c$, a training instance $(g, s)$ is generated, where
$g=p_j(c)$ (the prefix up until the API call), and where $s$ is the
belief-state extracted from the API call invocation. Thus, for a
conversation $c$ containing multiple API call invocations, multiple
training instances are generated, with the prefixes $P_{t(c)}(c)$
along with their respective belief-states. By applying this process
to all the conversations in the training set, the DST training set
is obtained; a minimal sketch of this construction follows.
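A minimal Python sketch of this "loose supervision" construction is
given below. The conversation representation (a list of dicts with
"type" and "text" keys) and the parse_call helper are assumptions
for the sketch; parse_call stands for the canonical-format
extraction described above.

def build_dst_training_set(conversations, parse_call):
    """Emit (prefix, belief_state) pairs, one per API call invocation."""
    instances = []
    for conv in conversations:  # conv: list of utterances and API calls
        prefix = []
        for item in conv:
            if item["type"] == "api_call":
                # The belief state is read off the canonical call itself.
                belief_state = parse_call(item["text"])  # {slot: value}
                instances.append((list(prefix), belief_state))
            else:
                prefix.append(item["text"])  # plain utterance
    return instances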
[0092] Following the training, the DST can receive as an input a
previously unobserved conversation (or part of a conversation) and
predict the correct value for each slot.
[0093] Preferably, both the encoder RNN and DST heads are trained
simultaneously. The loss of a single DST head is a softmax
cross-entropy, the standard classifier loss. The overall loss of
the network is the sum of the losses across all the slots (the sum
of the DST head losses). Co-training the encoder and DST heads
makes the network generate a latent embedding that is expressive
with respect to slot values (and that ignores information that does
not pertain to any of the slots).
[0094] By way of specific example, the encoder may be a
bidirectional RNN with LSTM cells, and each DST head may be a small
feedforward network with one hidden layer. The DST head for a slot
$s$ with a range $V$ may have the form:
$$h_1 = c_{enc}(u_1, i_{enc})$$
$$h_t = c_{enc}(h_{t-1}, u_t)$$
$$\pi'_i = \sigma(W_i h_T + b_i)$$
$$\pi_i = \frac{\exp(\pi'_i)}{\sum_{j=1}^{|V|} \exp(\pi'_j)}$$
[0095] In the above equations, $c_{enc}$ is an RNN cell such as an
LSTM cell, $h_i$ is the $i$th hidden state of the encoder network,
which is digesting a text sequence consisting of $T$ utterances
denoted $(u_1, u_2, \ldots, u_T)$; $W_i$ is a parameter weight
matrix for slot $i$ of dimension $d \times |V_i|$ (with $d$
denoting the hidden layer size of the RNN cell), and $\sigma(x)$
denotes the sigmoid function
$$\sigma(x) = \frac{\exp(x)}{1+\exp(x)}.$$
The output $\Pi_s = (\pi_1, \pi_2, \ldots, \pi_{|V|})$
forms a normalized distribution over the range $V$ of the slot $s$.
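A minimal PyTorch sketch of this architecture, with one shared
bidirectional LSTM encoder and one small feedforward head per slot,
is given below. The exact layer sizes and the use of the
final-timestep encoding as the utterance summary are assumptions
consistent with, but not mandated by, paragraph [0094].

import torch
import torch.nn as nn

class DST(nn.Module):
    def __init__(self, vocab_size, slot_ranges, d=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.encoder = nn.LSTM(d, d, batch_first=True, bidirectional=True)
        # One head per slot: a hidden layer, then a projection to |V_i|.
        self.heads = nn.ModuleDict({
            slot: nn.Sequential(nn.Linear(2 * d, d), nn.Sigmoid(),
                                nn.Linear(d, n_values))
            for slot, n_values in slot_ranges.items()
        })

    def forward(self, tokens):
        enc, _ = self.encoder(self.embed(tokens))
        summary = enc[:, -1, :]  # final-timestep representation h_T
        # Softmax over each head's logits yields the distribution Pi_s.
        return {slot: head(summary).softmax(dim=-1)
                for slot, head in self.heads.items()}

model = DST(vocab_size=100, slot_ranges={"duration": 3, "magazine": 3})
dists = model(torch.randint(0, 100, (1, 40)))
# dists["duration"] is a (1, 3) distribution over month/quarter/year

For training, the softmax would be folded into a cross-entropy loss
per head and the head losses summed, as described in the following
paragraphs.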
[0096] The softmax cross-entropy function may be used to calculate
the loss for a single head for the slot $s$ using:
$$L_{g,s} = -\sum_{i=1}^{|V|} y_{g,i} \log \pi_{g,i}$$
[0097] The index $g$ above relates to the training instance $g$, so
$\pi_{g,i}$ is the probability associated with the $i$th value for
the slot $s$ when feeding in the training instance $g$, and where
$y_{g,i}$ is an indicator variable denoting the ground-truth value
for the slot (i.e. $y_{g,i}=1$ for the value $i$ which is the
correct value for the slot $s$ on the training instance $g$, and
$y_{g,i}=0$ otherwise). The overall loss for the DST network is the
sum of the head losses across all slots (and across the fed
training instances):
$$L = \sum_{g \in G} \sum_{s \in S} L_{g,s}$$
where $G$ denotes the training instances used for the training
procedure, and where $S$ is the set of all tracked slots.
[0098] During training, model parameters may be iteratively tuned
after examining mini-batches of training set instances, by applying
backpropagation, for example by stochastic gradient descent with an
Adam optimizer.
[0099] The DST system, for a state structure $R=(S,f)$, takes as
input a conversation, as indicated at 600 in FIG. 6, or a part of
the conversation, and outputs a representation vector for each slot
at step 602. Preferably, the state structure used by the DST system
is generated using the SSE, as described above, although in
alternative embodiments the DST may receive a state structure
prepared by a human domain expert in place of the state structure
generated by the SSE.
[0100] The DST head then processes the representation vector to
generate a probability distribution over the range of that slot at
604. Thus, given a conversation $C$, for every sentence in the
conversation and for every slot $s_i$, the DST outputs a
probability distribution $\pi_i$ over the range of $s_i$, i.e. the
list of values
$$(\pi_i(v_{i,1}), \pi_i(v_{i,2}), \ldots, \pi_i(v_{i,d_i}))$$
[0101] where, for any $j \in \{1, 2, \ldots, d_i\}$,
$\pi_i(v_{i,j}) \geq 0$, and for any slot $s_i$ we have
$\sum_{j=1}^{d_i} \pi_i(v_{i,j}) = 1$.
[0102] The DST then determines a mode of the probability
distribution for each slot at step 606.
[0103] The DST then determines whether a first selection criterion
is met for the mode value for each of the slots at steps 608 and
610, to determine if the mode value has been determined for the
slot with an acceptable degree of certainty. If the probability
distribution for the slot is heavily weighted towards a particular
one of the possible values for a slot, this value is determined to
be the correct one for the slot (proceed to step 612); if the
probability distribution is spread across several of the possible
values, indicated by not having an obvious "peak" for the model's
output probability distribution, the DST system determines that a
value has not been determined for the slot. In this case, an
indication of no value is assigned to the slot (step 614).
[0104] The purpose of applying the first selection criterion is to
address a problem that the DST's RNN and the DST head have been
trained with the API data sets where values have been specified for
all slots, and thus have not been trained with data sets where a
value is unspecified. Accordingly, the DST will not, by itself,
predict that the value of a slot is unspecified.
[0105] The first selection criterion requires examining the
distribution to determine that it places significant mass on the
mode value, i.e. that
$\max_{j=1}^{d_i} \pi_i(v_{i,j}) \geq \alpha$ for a parameter
$\alpha$. The value $\max_{j=1}^{d_i} \pi_i(v_{i,j})$ is a proxy
for the degree of certainty regarding the correct value for the
slot $s_i$. A value of $\alpha=0.6$ has been used, although other
values may be used and an optimum value may be determined by
experiment. If $\max_{j=1}^{d_i} \pi_i(v_{i,j}) < \alpha$, the DST
output is set to the value $\varnothing$, indicating that no
constraint has yet been specified for $s_i$. A minimal sketch of
this check follows.
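The Python sketch below illustrates the first criterion, with None
standing in for the no-constraint indicator $\varnothing$; the
function name is invented for illustration.

def select_by_mass(values, probs, alpha=0.6):
    """Return the mode value if it carries at least alpha of the mass."""
    k = max(range(len(probs)), key=probs.__getitem__)  # index of the mode
    return values[k] if probs[k] >= alpha else None

print(select_by_mass(["month", "quarter", "year"], [0.85, 0.10, 0.05]))
# 'month' -- a clear peak, so the value is accepted
print(select_by_mass(["month", "quarter", "year"], [0.40, 0.35, 0.25]))
# None -- the mass is spread, so no value is selected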
[0106] The DST also applies, at step 612, a second selection
criterion to determine whether the reason that the probability
distribution is heavily weighted towards a particular value is the
additional information gained from examining the dialogue, rather
than information regarding the prior distribution of the values of
the slot.
[0107] A priori, without even examining the input dialogue to the
DST task, some values are more likely to occur than other values
for a slot. For instance, referring to the example above, if 80% of
the dialogues regarding subscriptions relate to the Sports magazine
(and only 20% are regarding other magazines), the magazine slot is
a priori far more likely to take the value Sports than the other
values.
[0108] The prior distribution over the possible values
$\{v_{i,1}, v_{i,2}, \ldots, v_{i,d_i}\}$ for a slot $s_i$ can be
computed by examining all training instances in the corpus and
checking the value that $s_i$ takes. Alternatively, the prior
distribution can be taken as an even spread over all possible
values for a slot. $T_{s_i=v_{i,j}}$ denotes all the instances in
the corpus where $s_i$ takes the value $v_{i,j}$, which are all the
API call invocations where slot $s_i$ takes the value $v_{i,j}$.
$q_{i,j}$ is used to denote the proportion of training instances
where $s_i$ takes the value $v_{i,j}$, that is:
$$q_{i,j} = \frac{|T_{s_i=v_{i,j}}|}{\sum_{k=1}^{d_i} |T_{s_i=v_{i,k}}|}$$
[0109] The prior distribution over the values slot $s_i$ can take
is denoted $Q_i=(q_{i,1}, q_{i,2}, \ldots, q_{i,d_i})$. After
examining an input conversation (or a prefix of a conversation),
the DST produces a posterior distribution over the values for slot
$s_i$, denoted
$\Pi_i=(\pi_i(v_{i,1}), \pi_i(v_{i,2}), \ldots, \pi_i(v_{i,d_i}))$.
[0110] Determining whether the second selection criterion is met
requires examining the degree to which the posterior distribution
$\Pi_i$ differs from the prior distribution $Q_i$, by evaluating
the Kullback-Leibler divergence $D_{KL}(\Pi_i \| Q_i)$ between
them. The Kullback-Leibler divergence $D_{KL}(\Pi_i \| Q_i)$ is a
measure of the amount of information gained in the posterior model
distribution $\Pi_i$ relative to the prior probability distribution
$Q_i$, and is defined as:
$$D_{KL}(\Pi_i \| Q_i) = \sum_{j=1}^{d_i} \pi_i(v_{i,j}) \log\left(\frac{\pi_i(v_{i,j})}{q_{i,j}}\right)$$
[0111] A high value of $D_{KL}(\Pi_i \| Q_i)$ indicates that the
prior and posterior distributions are very different, indicating
that the reason for the certainty about the value of $s_i$ is the
dialogue; a low value indicates that the reason for the certainty
is information known a priori, with the dialogue contributing
little to no new information. If $D_{KL}(\Pi_i \| Q_i) < \beta$ for
some threshold parameter $\beta$, the second selection criterion is
not met. A value of $\beta=0.05$ has been used in inventor
experiments.
[0112] If both the first and second selection criteria are met, so
that both $\max_{j=1}^{d_i} \pi_i(v_{i,j}) \geq \alpha$ and
$D_{KL}(\Pi_i \| Q_i) \geq \beta$, the mode of the distribution
$\Pi_i$ is output as the predicted value for slot $s_i$, i.e. the
prediction for this slot is $v_{i,k}$ where
$k=\arg\max_{j=1}^{d_i} \pi_i(v_{i,j})$. A sketch combining the two
criteria follows.
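The following Python sketch combines the two criteria; the
threshold values and the use of None for $\varnothing$ are taken
from the text, while the function names are invented for
illustration.

import math

def kl_divergence(posterior, prior):
    """D_KL(posterior || prior) in nats."""
    return sum(p * math.log(p / q)
               for p, q in zip(posterior, prior) if p > 0)

def select_value(values, posterior, prior, alpha=0.6, beta=0.05):
    k = max(range(len(posterior)), key=posterior.__getitem__)
    if posterior[k] >= alpha and kl_divergence(posterior, prior) >= beta:
        return values[k]
    return None  # no constraint determined yet

prior = [0.8, 0.1, 0.1]  # e.g. 80% of past calls concerned Sports
print(select_value(["sports", "art", "science"], [0.82, 0.09, 0.09], prior))
# None: the mode mass is high, but the dialogue added almost no
# information over the prior (the KL divergence is ~0.001 < beta)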
[0113] In an embodiment, the prior distribution may be uniform, so

$$Q_i = \left(\frac{1}{d_i}, \frac{1}{d_i}, \ldots, \frac{1}{d_i}\right),$$

every value for this slot is equally likely a priori and the
divergence value is:

$$D_{KL}(\Pi_i \| Q_i) = \sum_{v \in V_i} \pi_i(v) \log\left(\frac{\pi_i(v)}{1/d_i}\right) = \sum_{v \in V_i} \pi_i(v)\left(\log d_i + \log \pi_i(v)\right) = \sum_{v \in V_i} \pi_i(v) \log \pi_i(v) + \log d_i \sum_{v \in V_i} \pi_i(v)$$
[0114] As $\Pi_i$ is a probability distribution over the values
$V_i$ that the slot can take, $\sum_{v \in V_i} \pi_i(v) = 1$, so

$$D_{KL}(\Pi_i \| Q_i) = \sum_{v \in V_i} \pi_i(v) \log \pi_i(v) + \log d_i = -\sum_{v \in V_i} \pi_i(v) \log \frac{1}{\pi_i(v)} + \log d_i = \log d_i - H(\Pi_i),$$

where $H(\Pi_i)$ denotes the Shannon entropy of the distribution $\Pi_i$.
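For the uniform-prior case, a short check confirming that
$\log d_i - H(\Pi_i)$ matches the divergence computed directly (a
sketch; the example distribution is arbitrary):

```python
import math

def kl_to_uniform(posterior):
    """log(d_i) - H(posterior): the KL divergence to a uniform prior."""
    d = len(posterior)
    entropy = -sum(p * math.log(p) for p in posterior if p > 0.0)
    return math.log(d) - entropy

posterior = [0.7, 0.2, 0.1]
direct = sum(p * math.log(p / (1.0 / 3)) for p in posterior)  # D_KL(Pi || Q)
assert abs(kl_to_uniform(posterior) - direct) < 1e-12
```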
[0115] In variant embodiments, other ways of determining a
divergence value may be used when determining whether one of the
values should be selected for a slot from the range of possible
values for that slot. Other selection criteria may also be used.
[0116] If the second selection criterion is also met, the selected
value for the slot is passed to the strategy network. Otherwise, an
indication that a value has not been assigned is allocated to the
respective slot at step 616.
[0117] The strategy network is configured to determine whether a
routine can be executed based on the determined values and either
to cause an action relating to the routine, such as invocation of
an API call to that routine, or to cause the dialogue system to
continue the dialogue with the user. Such an action may simply be
communicating the determined values to a human agent, together with
an indication that the routine is able to be executed.
[0118] A routine may require values to have been determined by the
DST system for all of the parameters of the routine. As described
above, one or more slots may have an indication in the form of a
value O, indicating that no value is assigned to the slot. If a
routine requires a slot to have a value and the value for that slot
is O, then that routine cannot be performed and further dialogue is
required.
[0119] Alternatively, values may be essential for some parameters
and not others, and in this case the strategy network is configured
to determine that the routine can be executed if essential values
are output from the DST system.
[0120] In a variant embodiment, the strategy network may determine
whether each of at least two routines can be executed based on the
determined values. In this case, the strategy network can be
configured to cause execution of any one or more of the routines
for which required values are output by the DST system, and
optionally instruct the dialogue system to continue the dialogue
with the user. Alternatively, if the strategy network causes
execution of any routine, or any particular one or more routines,
the strategy network may also be configured to instruct the
dialogue system that there is no more need for information and the
dialogue can thus be ended.
[0121] The strategy network may be configured to determine
algorithmically whether values have been selected for the slots
required by the routine, such that the routine can be executed.
[0122] In the embodiment indicated in FIG. 7, the strategy network
is a neural network configured to receive as an input the
probability distributions generated by the DST and to determine
whether values have been determined for predetermined slots, such
that a routine can be executed and thus an API call can be invoked.
Use of a neural network means that the strategy network does not
have to be configured with information on which parameters of the
one or more routines, or which combinations thereof, are essential
in order for the one or more routines to be executed. In the case
of m routines, the neural network is designed to have m+1 output
neurons, denoted $t = (t_0, t_1, t_2, \ldots, t_m)$,
where $t_0$ relates to not invoking an API call (and further
conversing with the customer), and where $t_i$ relates to executing
$r_i$, the i-th routine. The inputs to the neural network are the
DST outputs, denoted $a = (a_1, a_2, \ldots, a_w)$. The
number of inputs is the sum of all the slot sizes, i.e.
$w = \sum_{i=1}^{k} d_i$, where $d_i$ denotes the number of
elements in the range of slot $s_i$.
[0123] The neural network is a feedforward neural network, although
other kinds of neural network may be used. For instance, an
implementation with one hidden layer would apply a linear
transformation followed by a sigmoid non-linearity to get the
hidden layer $h = \sigma(W_1 a + b_1)$, then apply another
linear transformation followed by a sigmoid non-linearity to obtain
the outputs $t = \sigma(W_2 h + b_2)$, where $W_1$, $W_2$ are
matrix parameters and $b_1$, $b_2$ are vector parameters to be
learned during training. The loss of this neural network is the
softmax cross-entropy loss with the ground truth (the identity of
the executed routine, or the value specifying that no routine was
called), similarly to the loss of the DST heads.
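A minimal PyTorch sketch of such a strategy network, under the
layout just described (inputs are the concatenated per-slot DST
distributions, outputs are m+1 neurons); the class name, hidden
size, and example dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn

class StrategyNetwork(nn.Module):
    """One-hidden-layer feedforward strategy network: input a is the
    concatenation of the DST's per-slot probability distributions,
    output t has m+1 neurons (t_0 = keep conversing, t_i = invoke
    routine r_i)."""

    def __init__(self, slot_sizes, num_routines, hidden=64):
        super().__init__()
        w = sum(slot_sizes)  # w = sum of the slot range sizes d_i
        self.layer1 = nn.Linear(w, hidden)
        self.layer2 = nn.Linear(hidden, num_routines + 1)

    def forward(self, a):
        h = torch.sigmoid(self.layer1(a))      # h = sigma(W1 a + b1)
        return torch.sigmoid(self.layer2(h))   # t = sigma(W2 h + b2)

# Softmax cross-entropy loss against the ground-truth routine index
# (0 for "no routine called"), as described in the text:
net = StrategyNetwork(slot_sizes=[3, 4, 5], num_routines=2)
a = torch.rand(1, 12)         # a batch of one belief-state vector
target = torch.tensor([0])    # ground truth: keep conversing
loss = nn.CrossEntropyLoss()(net(a), target)
loss.backward()
```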
[0124] The neural network is configured with an output to indicate
that there are insufficient values for the routine to be performed,
and an output to indicate that the required values for the routine
are present. Where more than one routine may be called, there may
be an output corresponding to each routine. The strategy network is
configured to indicate to the dialogue system if dialogue with the
user should be continued. If the conversation state is not properly
constrained, then the conversation agent continues the dialogue
with the customer, asking for information until the missing
constraints are filled. Formally, the strategy network is denoted
by

$$y = \text{softmax}(VO^T + b),$$

where $V$ is a weight vector of length
$\sum_{i=1}^{\text{size}(S)} \text{size}(V_i)$,
and $b, y \in \mathbb{R}$, and softmax is the softmax operator over
some vector $v$, producing $v'$:

$$v'_i = \frac{\exp(v_i)}{\sum_{j=1}^{m} \exp(v_j)}$$

where $v' = [v'_1, \ldots, v'_m]$ is a probability distribution.
[0125] Referring again to FIG. 6, at step 618 the strategy network
determines whether a routine should be executed and, if so, which
one. If a routine r is to be executed, the DST slots relating to
the parameters of the routine r are examined. If a constraint
(value) is specified for a parameter, it is passed to the routine,
and if no value is specified (i.e. the DST output is O for that
slot), the parameter is not passed to the routine. If sufficient
values have been determined so that the routine can be performed,
an API invocation module (not shown) may then invoke an API call at
step 620. Otherwise, at step 622, the strategy network signals to
the dialogue system to continue the dialogue with the user.
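A sketch of this dispatch logic; the routine registry, the
invoke/continue_with_user helpers, and the use of None for the O
output are all assumptions made for the example:

```python
def dispatch(routine_choice, slot_values, routines, dialogue):
    """Steps 618-622: either invoke the chosen routine's API call
    with the constrained parameters, or continue the dialogue.

    routine_choice: argmax of the strategy network's outputs
                    (0 means 'keep conversing').
    slot_values:    dict of slot name -> selected value or None (O).
    routines:       hypothetical registry mapping index -> routine,
                    each with .params (names) and .invoke(**kwargs).
    """
    if routine_choice == 0:
        dialogue.continue_with_user()  # step 622
        return
    routine = routines[routine_choice]
    # Pass only the parameters for which a constraint was specified.
    kwargs = {p: slot_values[p] for p in routine.params
              if slot_values.get(p) is not None}
    routine.invoke(**kwargs)  # step 620: the API call
```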
[0126] Given the context of the conversation and the outputs from
the DST, the strategy network infers whether the state space is
properly constrained, i.e. whether the system has learned enough
information from the user to issue an API call.
[0127] The neural network can be trained by taking historical
conversations and creating a training instance $q_u = (a, t)$ for
every utterance u in each such conversation. If an API call was not
executed following the utterance u, the correct SN output t is the
conversing output (not invoking an API call), and if an API call
was executed, it is the identity of the executed routine. The
training instance's input a is the DST's belief-state following the
utterance u (i.e. the belief-state after the DST has ingested the
prefix of the conversation ending with the utterance u). The SN can
then be trained using backpropagation (applying stochastic gradient
descent), similarly to the DST neural network.
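A sketch of assembling these training instances from historical
conversations; the conversation format and the dst_belief_state
helper are assumptions made for the example:

```python
def build_training_instances(conversations, dst_belief_state):
    """Create one instance q_u = (a, t) per utterance: a is the DST
    belief-state over the conversation prefix ending at u, and t is
    0 (keep conversing) unless an API call followed u, in which case
    t is the index of the executed routine.

    conversations: list of lists of (utterance, routine_index_or_None)
                   pairs -- a hypothetical format for this sketch.
    """
    instances = []
    for conversation in conversations:
        prefix = []
        for utterance, routine in conversation:
            prefix.append(utterance)
            a = dst_belief_state(prefix)  # DST output on the prefix
            t = 0 if routine is None else routine
            instances.append((a, t))
    return instances
```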
[0128] The processes described above are implemented by computer
programs. The computer programs comprise computer program code. The
computer programs are stored on one or more computer readable
storage media and may be located in one or more physical
locations.
[0129] The computer programs may be implemented in any one or more
of a number of computer programming languages and/or deep learning
frameworks, for example PyTorch, TensorFlow, Theano, or DL4J. When
run on one or more processors, the computer programs are configured
to enable the functionality described herein.
[0130] As will be apparent to a person skilled in the art, the
processes described herein may be carried out by executing suitable
computer program code on any computing device suitable for
executing such code and meeting suitable minimum processing and
memory requirements. For example, the computing device may be a
server or a personal computer. Some components of such a computing
device are now described with reference to FIG. 9. In practice such
a computing device will have a greater number of components. The
computer system 700 comprises a processor 702, computer readable
storage media 704 and input/output interfaces 706, all operatively
interconnected with one or more busses. The computer system 700 may
include a plurality of processors or a plurality of computer
readable storage media 704, operatively connected. The input/output
interfaces 706 allow coupling of input/output devices, such as a
keyboard, a pointer device, a display, et cetera.
[0131] The processor 702 may be a conventional central processing
unit (CPU). The processor 702 may be a CPU augmented by a graphics
processing unit (GPU) to speed up training. Tensor processing units
(TPUs) may also be used. The computer readable storage media 704 may
comprise volatile and non-volatile, removable and non-removable
media. Examples of such media include ROM, RAM, EEPROM, flash
memory or other solid state memory technology, optical storage
media, or any other media that can be used to store the desired
information including the computer program code and to which the
processor 702 has access.
[0132] As an alternative to being implemented in software, the
computer programs may be implemented in hardware, for example a GPU,
CPU or special purpose logic circuitry such as a field programmable
gate array (FPGA) or an application specific integrated circuit
(ASIC) such as a TPU. Alternatively, the computer programs may be
implemented in a combination of hardware and software.
[0133] Embodiments of the invention are not limited to use with any
particular kind of API. The API may be, for example, a web API or a
Java API.
[0134] It will be appreciated by persons skilled in the art that
various modifications are possible to the embodiments.
[0135] The applicant hereby discloses in isolation each individual
feature or step described herein and any combination of two or more
such features, to the extent that such features or steps or
combinations of features and/or steps are capable of being carried
out based on the present specification as a whole in the light of
the common general knowledge of a person skilled in the art,
irrespective of whether such features or steps or combinations of
features and/or steps solve any problems disclosed herein, and
without limitation to the scope of the claims. The applicant
indicates that aspects of the present invention may consist of any
such individual feature or step or combination of features and/or
steps. In view of the foregoing description it will be evident to a
person skilled in the art that various modifications may be made
within the scope of the invention.
* * * * *