U.S. patent application number 17/468007, for conversational bot interaction with utterance ranking, was published by the patent office on 2022-09-15.
The applicant listed for this patent is Drift.com, Inc. The invention is credited to Paul A. Crowley, Jeffrey D. Orkin, and Christopher M. Ward.
Application Number: 20220294748 (publication) / 17/468007
Family ID: 1000005830050
Publication Date: 2022-09-15
United States Patent Application 20220294748
Kind Code: A1
Crowley; Paul A.; et al.
September 15, 2022
Conversational bot interaction with utterance ranking
Abstract
A conversational bot system uses a set of conversations that
have been annotated to identify speech acts, wherein a speech act
is a labeled grouping of utterances. To facilitate processing, a
data model associated with a multi-turn conversation is received.
The data model comprises an observation history. Upon receipt of a
query that includes a sequence of two or more utterances, an
utterance ranking algorithm is applied. The algorithm selectively
reorders the utterances in the sequence into a ranked order of
importance that reflects a lowest to highest priority of response.
In response to applying the utterance ranking algorithm, the data
model is then updated to reflect the ranked order. In one
embodiment, updating the data model positions the highest priority
utterance as the most recent utterance in the observation history.
The updated data model is then used to attempt to generate a
coherent response to the query.
Inventors: Crowley; Paul A.; (Boston, MA); Orkin; Jeffrey D.; (Arlington, MA); Ward; Christopher M.; (Somerville, MA)

Applicant: Drift.com, Inc. (Boston, MA, US)

Family ID: 1000005830050
Appl. No.: 17/468007
Filed: September 7, 2021
Related U.S. Patent Documents

Application Number   Filing Date    Patent Number
17195776             Mar 9, 2021    11115353
17468007
Current U.S. Class: 1/1
Current CPC Class: G10L 15/22 20130101; H04L 51/02 20130101; G10L 2015/223 20130101; G10L 15/183 20130101
International Class: H04L 12/58 20060101 H04L012/58; G10L 15/183 20060101 G10L015/183; G10L 15/22 20060101 G10L015/22
Claims
1. A method for imitating a human conversational response using a
set of conversations that have been annotated to identify speech
acts and physical acts, wherein a speech act is a labeled grouping
of utterances, comprising: in association with an automated
conversational bot executing in a computing system: receiving a
data model associated with a multi-turn conversation, the data
model comprising an observation history; upon receipt of a query
that includes a sequence of utterances, applying an utterance
ranking algorithm that outputs a ranked order of importance of the
utterances in the sequence, the ranked order of importance
reflecting a lowest to highest priority of response, and wherein a
position of a given utterance in the ranked order of importance
also reflects an importance of the given utterance within a context
of the multi-turn conversation; in response to applying the
utterance ranking algorithm, updating the data model to reflect the
ranked order; and using the updated data model to attempt to
generate a coherent response to the query for the automated
conversational bot; and the automated conversational bot returning
the coherent response to the query.
2. The method as described in claim 1 wherein updating the data
model positions a highest priority utterance as a most recent
utterance in the observation history.
3. The method as described in claim 2 wherein using the data model
to attempt to generate a coherent response searches for a correct
action to a next highest priority utterance when a response to the
highest priority utterance cannot be found.
4. The method as described in claim 1 wherein the given utterance
is an utterance that has a low frequency of language use.
Description
BACKGROUND
Technical Field
[0001] This application relates generally to information retrieval
methods and systems.
Background of the Related Art
[0002] Online conversational marketing is a well-known technique
designed to move buyers or prospects through marketing and sales
funnels using the power of real-time conversations. In this
approach, instead of forcing users to go through lead capture
forms and wait days for a response, conversational marketing uses
targeted messaging, live chat and intelligent conversational bots
(chatbots) to engage with users in real-time when they visit a
website. A chatbot is a software application that executes on the
site and that is used to interact with the user, often in lieu of a
direct human interaction. Typically, conversational bots are of
several types. A rules-based chatbot follows pre-designed rules
that form a decision tree. A more sophisticated approach is an
Artificial Intelligence (AI)-based chatbot. An AI chatbot can
understand language outside of a closed set of pre-programmed
recognized inputs, and it has the capability to learn based on the
inputs it receives. It can also make changes based on patterns, and
it can become smarter over time as new situations are experienced.
This type of chatbot can be applied to a range of uses, e.g.,
providing alternatives to human representatives, sentiment
analysis, making predictions about what a visitor is looking for on
a website, learning and adapting to user preferences, and
others.
[0003] Although AI-based chatbot solutions provide significant
advantages, certain common use cases present significant
challenges. For example, when a customer makes more than one
statement at once to the conversational bot, one typical default
operation is that the system responds to the most recent utterance
for which a response action can be determined. Depending on
context, however, this default behavior sometimes is not correct,
in which case the bot typically produces a wrong (or
context-inappropriate) response.
[0004] There remains a need to provide enhancements to
conversational bot systems that enable coherent processing of
utterance sequences if and when they occur during a
conversation.
BRIEF SUMMARY
[0005] A method and computing platform provide query understanding
to select a response in a multi-turn conversation between a user
and a conversational bot, where the query comprises a sequence of
utterances. As used herein, the notion of a sequence refers to a
scenario in which the human user interacting with the bot makes two
or more utterances before the bot is able to provide a response.
According to this disclosure, an utterance ranking is applied to
the utterances in the sequence to enable the bot to provide a
coherent and context-appropriate response.
[0006] To this end, and in a representative implementation, the
conversational bot system uses a set of conversations that have
been annotated to identify speech acts, wherein a speech act is a
labeled grouping of utterances. To facilitate processing, a data
model associated with a multi-turn conversation is received. The
data model comprises an observation history. Upon receipt of a
query that includes a sequence of two or more utterances, an
utterance ranking algorithm is applied. The algorithm selectively
reorders the utterances in the sequence into a ranked order of
importance that reflects a lowest to highest priority of response.
In response to applying the utterance ranking algorithm, the data
model is then updated to reflect the ranked order. In one
embodiment, updating the data model positions the highest priority
utterance as a most recent utterance in the observation history.
The updated data model is then used to attempt to generate a
coherent response to the query.
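The flow just summarized can be sketched in a few lines of Python. The `priority` function below is a hypothetical stand-in for the utterance ranking algorithm (whose operation is described in detail later); the utterances and the questions-outrank-statements rule are invented purely for illustration.

```python
def apply_ranking(history, utterances, priority):
    """Reorder a sequence of utterances into a ranked order of importance
    (lowest to highest priority of response) and append them to the
    observation history, so that the highest-priority utterance becomes
    the most recent observation."""
    ranked = sorted(utterances, key=priority)  # lowest ... highest priority
    history.extend(ranked)
    return history

# Example: the user makes a statement and a question in a single turn.
history = ["bot: Would you like to schedule a demo?"]
turn = ["user: What does it cost?", "user: We use Salesforce."]
# Toy rule standing in for the ranking model: questions outrank statements.
priority = lambda u: 1 if u.endswith("?") else 0
apply_ranking(history, turn, priority)
assert history[-1] == "user: What does it cost?"
```

With the question positioned as the most recent observation, the downstream response-selection machinery operates on it first, falling back to the statement if no coherent response to the question can be found.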
[0007] The foregoing has outlined some of the more pertinent
features of the subject matter. These features should be construed
to be merely illustrative. Many other beneficial results can be
attained by applying the disclosed subject matter in a different
manner or by modifying the subject matter as will be described.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] For a more complete understanding of the subject matter and
the advantages thereof, reference is now made to the following
descriptions taken in conjunction with the accompanying drawings,
in which:
[0009] FIG. 1 is a block diagram depicting an information retrieval
system in which the technique of this disclosure may be
implemented;
[0010] FIG. 2 is a block diagram of a relational database
comprising a set of inter-related data tables in which a corpus of
annotated conversation transcripts is organized to provide
storage-efficient informational retrieval according to one aspect
of this disclosure;
[0011] FIG. 3 depicts a representative data model that is persisted
in a database and represents a conversational history for a
particular user-chatbot conversation or session over one-to-many
turns;
[0012] FIG. 4 depicts a representative system interaction showing
processing of a user query according to an embodiment of this
disclosure;
[0013] FIG. 5 depicts a first example of a sequence of utterances
provided to a bot during a conversation;
[0014] FIG. 6 depicts a second example of a sequence of utterances
provided to the bot;
[0015] FIG. 7 depicts a process flow of a routine to detect a
sequence of utterances that will then be processed to identify
which utterance should be processed by the bot;
[0016] FIG. 8 depicts a process flow of an utterance ranking
algorithm according to this disclosure;
[0017] FIG. 9 depicts a process flow of how the rankings identified
by the utterance ranking algorithm are then applied by the AI-based
bot;
[0018] FIG. 10 depicts a first example scenario (multiple
statements) showing the operation of the ranking algorithm;
[0019] FIG. 11 depicts a second example scenario (mixture of
questions and statement) showing the operation of the ranking
algorithm; and
[0020] FIG. 12 depicts a third example scenario (multiple
questions) showing the operation of the ranking algorithm.
DETAILED DESCRIPTION
[0021] The following is a glossary of terms that are used
herein:
[0022] Event: a sequence of observations, recognizable as a
coherent behavior. Observations within the sequence can be an
arbitrary mix of speech acts and physical acts, from multiple
actors. One event can be represented by many different possible
expressions of that event.
[0023] Event expression: one specific sequence of one or more
observations that can represent some event. An event may unfold in
many different ways, i.e., there may be many different sequences of
saying things, and taking physical actions.
[0024] Event recognition: inferring which event is unfolding, or
has unfolded, by comparing a sequence of observations to sequences
that compose the expressions for some event.
[0025] Event aliasing: inferring which event is unfolding, or has
unfolded, with an inexact match. When an exact sequence is not
recognized, a closest inexact match can be determined and presented
as an alias such that further processing can then be performed as
if the closest inexact match was recognized as a known
sequence.
[0026] Observation: an atomic action taken by an actor. Each action
may be a physical act or a speech act. One or more observations may
be intermingled between or among events as the events unfold in
parallel, in which case the events themselves are active (in time)
concurrently. Each observation is a member of one event, but as
noted events can overlap in time, e.g., given observations 1-4,
observations 1 and 3 may be part of event 1, while observations 2
and 4 may be part of event 2, such that events 1 and 2 are both
active over a time period. As this example shows, observations from
different events may be arbitrarily sequenced among observations
from other events.
[0027] Physical act: a non-linguistic action taken by an actor,
e.g., clicking a button or a link on a Graphical User Interface
(GUI), entering data in a form, or the like. A physical act also
may be received via more sophisticated data entry mechanisms, such
as a gesture-based interface that reads and interprets facial
expressions, biometric devices that receive and interpret physical
features or movements, etc.
[0028] Speech act: an utterance expressed by one actor to another,
in natural language. A speech act has some purpose for expressing
it, and typically it can be expressed in many different ways to
accomplish the same thing. The utterance may be typed text (e.g.,
in a chat window), transcribed from spoken audio, or the like.
[0029] Utterance: a sequence of words that is grammatically
complete; usually one sentence.
[0030] Multi-turn conversation: an interaction between an end user
participating in a chat and a conversational bot, typically over
multiple question and answer cycles. A multi-turn
conversation may involve more than one human user, and more than
one bot. For example, a bot may be configured to talk to multiple
users concurrently (e.g., participants in a Zoom.RTM. web
conference).
[0031] FIG. 1 depicts a representative information retrieval system
in which the techniques of this disclosure may be implemented. In
one embodiment, and which is not intended to be limiting, a
multi-turn conversation is carried out between an end user 100, and
a conversational bot software application 102 that executes in a
network-accessible computing platform 104. The end user 100 is a
human being that interacts with the platform, typically using a
client machine 106 having a browser or mobile application (app). In
a typical use case, the software application 102 executes in
association with a website 108, although the chatbot functionality
may be utilized by multiple distinct websites operated by separate
and independent content providers. As such, the computing platform
provides the chatbot functionality in a multi-tenant operating
environment, although this is not a requirement. The user provides
input to the chatbot as speech, as one or more physical actions
(e.g., selecting a button or link, entering data in a field, etc.),
or as some combination of speech and physical action. The chatbot
102 herein is an AI-based conversational bot. As depicted in FIG.
1, preferably the platform is configured with a network-accessible
front end component 110, together with an Application Programming
Interface (API)-accessible back-end system 112. The API is depicted
at 114. The back-end system executes an instance of the software
application 102 for each multi-turn conversation and, in
particular, it provides both an understanding of a user's query (a
speech and/or physical input), as well as a possible response to
this query. Typically, the front-end system handles inputs from
multiple end users that access the platform, and individual
responses as determined by the back-end system (and in particular
the software application instance that is managing the particular
multi-turn conversation) are passed from the back-end to the
front-end for delivery to the end user. In the context of a
chatbot, the response typically is provided in a chat window.
[0032] Preferably, during a multi-turn conversation between the
user and the conversational bot, it is desired that the system
responds to a user input within a given time period (e.g., a
few seconds), accounting for network latency and transit time, and that such
response is both correct and coherent with respect to the history
of the conversation so far, i.e., the history of previous
utterances or physical actions by the user. To this end, and
according to a first aspect of this disclosure, information against
which a query is processed preferably is organized in a particular
manner. This information typically is a corpus of historical
conversation transcripts that are first annotated (e.g., by human
analysts) to provide an information database, e.g., queries and
associated responses, that have been generated by the system
previously. The nature and scope of the annotations are
implementation-specific, but the notion is that the database
provides a large corpus of possible conversational flows that the
system may then use during a real-time multi-turn conversation
between the user and the chatbot.
[0033] According to this disclosure, and as shown in FIG. 2,
preferably the conversational transcripts 200 are annotated and
then re-organized into a set of inter-related data tables that are
structured in a specific manner to enable the system to make fast,
discrete comparisons between a live conversation (i.e., a current
one) and a corpus of conversations that have been seen in the past
(i.e., the annotated historical conversations). This
re-organization is sometimes referred to herein as packing. The set
of data tables comprises an event bindings table 202, an events
table 204, an observations table 206, a physical acts table 208,
and a speech acts table 210. The event bindings 202 table stores a
set of event bindings, which are entities that connect lines of
transcripts with whatever events these lines have been annotated to
represent in the transcript. The events table 204 stores pointers
to events, wherein an event typically is composed of a linear
sequence of observations that represent an expression of the event.
The observations table 206 stores pointers to observations that are
either physical acts in the physical acts table 208, or speech acts
in the speech acts table 210. As will be described, information in
the relational database is searched in response to physical acts or
speech acts, and the latter typically are organized as clusters of
utterances 212. As defined above, an utterance 214 typically is a
grammatically-complete set of words, nominally a complete sentence.
Thus, and as depicted, the event bindings (event expressions) point
to events, which point to the observations, either physical or
speech acts. As will be described, this layout of the relational
database enables the system to place an utterance or user action
into its appropriate context in a highly-efficient manner, thereby
giving the user input meaning within the conversational flow. In
particular, the tables allow the system to efficiently search for
all instances of an utterance or action (speech or physical act)
within the annotated transcripts. Typically, the relational
database tables are continually updated as the system interacts
with end users; in one embodiment, the data tables are repacked
periodically, e.g., every 10-20 minutes, although this is not a
requirement.
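The packed layout of FIG. 2 can be sketched as a small relational schema. The table and column names below are illustrative assumptions for the sketch, not taken verbatim from the disclosure.

```python
import sqlite3

# A minimal sketch of the inter-related data tables of FIG. 2.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE speech_acts   (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE physical_acts (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE observations  (id INTEGER PRIMARY KEY,
                            speech_act_id   INTEGER REFERENCES speech_acts(id),
                            physical_act_id INTEGER REFERENCES physical_acts(id));
CREATE TABLE events        (id INTEGER PRIMARY KEY, type TEXT);
-- event bindings connect annotated transcript lines to the events they express
CREATE TABLE event_bindings(event_id       INTEGER REFERENCES events(id),
                            observation_id INTEGER REFERENCES observations(id),
                            position       INTEGER);
""")
tables = {r[0] for r in db.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
assert "event_bindings" in tables and "speech_acts" in tables
```

The `position` column encodes the linear order of observations within an event expression, which is what lets the system search for all instances of a given utterance or action and compare sequences with exact look-ups.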
[0034] Efficient querying according to the techniques herein is
facilitated by representing observations as either speech or
physical acts, and which are taken by either actor (conversation
participants), and that are fluidly intermixed to form events. To
this end, and according to a further aspect of this disclosure, as
a multi-turn conversation proceeds, the system generates and
persists in memory a data model (or, more generally, a data
structure) that represents a conversation history between the user
and the bot. Generally, the data model comprises an observation
history, together with a set of events that have been determined to
represent the conversation up to at least one turn. FIG. 3 depicts
an example of one such data model. As depicted, event 302 has a
particular event identifier "787968," which is an entry in the
events table. The event has a particular type, in this example
"Schedule_Demo." Based on the conversation so far, the event is
composed of a sequence of observations (looking left to right)
including an observation 302 (identifier "765576," which is an
entry in the observations table), and observation 304 (identifier
"4453," which is another entry in the observations table).
Observation 302 points to a speech act 306 (identifier "53709,"
which is an entry in the speech acts table), which speech act in
turn is associated to the conversation participants 308, in this
case wherein the "actor" is the bot and the "listener" is the user.
As also indicated, the speech act 306 has been associated to an
utterance cluster 310, in this example the "Offer_Demo." In this
example, the utterance cluster is common to utterances 312 and 314
(e.g., "Would you like to schedule a demo?" or "Would you like to
see how the app works on a website?"). Although just two utterances
are depicted for the cluster, there may be any number of
utterances. The entities 302, 306, 308, 310, 312 and 314 thus
comprise a hierarchical representation of a first observation in
the sequence of observations that comprise the event, and this
activity is implemented by the chatbot.
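An in-memory form of this hierarchy might look as follows; the class and field names are assumptions made for the sketch, while the identifiers and labels are those of the FIG. 3 example.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class SpeechAct:
    id: int
    cluster: str     # utterance cluster, e.g. "Offer_Demo"
    actor: str
    listener: str

@dataclass
class PhysicalAct:
    id: int
    button: str      # e.g. the "Yes_Demo" button
    actor: str
    target: str

@dataclass
class Observation:
    id: int
    act: Union[SpeechAct, PhysicalAct]

@dataclass
class Event:
    id: int
    type: str        # e.g. "Schedule_Demo"
    observations: List[Observation] = field(default_factory=list)

# The "Schedule_Demo" event of FIG. 3: a bot prompt, then a button click.
event = Event(787968, "Schedule_Demo", [
    Observation(765576, SpeechAct(53709, "Offer_Demo", "bot", "user")),
    Observation(4453, PhysicalAct(87302, "Yes_Demo", "user", "bot")),
])
assert event.observations[0].act.cluster == "Offer_Demo"
```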
[0035] More generally, utterances such as described preferably are
used as training data for a machine learning (ML)-based statistical
classifier; upon training, the classifier is then useful for both
checking for exact matches, as well as for further generalization,
i.e., finding other wordings that have a similar meaning to words
and phrases recognized by the classifier.
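The disclosure does not fix a particular model family for this classifier. As a toy illustration only, the sketch below does nearest-neighbor matching with bag-of-words cosine similarity, which captures the exact-match-plus-generalization behavior in miniature; a production system would train a proper statistical classifier on the annotated utterance clusters.

```python
from collections import Counter
import math

def train(examples):
    """examples: list of (utterance, speech_act_label) pairs."""
    return [(Counter(u.lower().split()), label) for u, label in examples]

def classify(model, utterance):
    """Return the label of the most similar training utterance."""
    q = Counter(utterance.lower().split())
    def cos(a, b):
        dot = sum(a[w] * b[w] for w in a)
        norm = (math.sqrt(sum(v * v for v in a.values())) *
                math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0
    return max(model, key=lambda m: cos(q, m[0]))[1]

model = train([("hey man nice to see you", "hello"),
               ("would you like to schedule a demo", "Offer_Demo")])
# A wording not seen verbatim still maps to the "hello" speech act.
assert classify(model, "nice to see you") == "hello"
```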
[0036] Referring back to FIG. 3, the second observation 304
represents the user's response to what in this example is a prompt
from the bot. In particular, the observation 304 is associated with
physical act 316 having identifier "87302," which is another entry
in the observations data table. In this portion of the tree,
conversation participants are reversed, such that in entity 318 the
"actor" is the user, and the "target" of the physical act is the
bot. As also depicted, physical act 316 also points to a particular
button 320, representing the user responding that he or she wants
to see the demonstration (the "Yes_Demo" action). The data model
continues (on the right) with the next follow-on observation, e.g.,
an utterance by the user, another physical act, and so forth.
[0037] As can be seen then, the data model represents each event as
being composed of an observation sequence, which is sometimes
referred to herein as an observation history. As the data model is
persisted (and in the depicted tree grows right-ward), the
conversation history between the user and the bot is represented.
In particular, and at any point in the multi-turn conversation, the
data model comprises the observation history, namely, a
hierarchical set of events that have been determined to represent
the conversation up to at least one conversation turn (and
typically many turns). More specifically, the data model is the
observation history (everything said, and every button clicked,
since the beginning of the conversation), and a tree of events that
the system has overlaid on top of the observations to represent the
system's best guess at explaining how a linear sequence of
observations breaks down into coherent fragments of conversation.
Persisting a data model that is built in this manner (and based on
both utterances and physical actions) provides significant
advantages because real conversations do not always unfold neatly
into a linear sequence of topics. Instead, they are often quite
messy, fluidly oscillating between several topics.
[0038] According to a further aspect of this disclosure, the data
model keeps track of any number of events, all of which can be
actively "extended" at any time. As described herein, an event can
be extended when appending an identified observation creates a new
sequence of observations that exactly matches an existing event
expression. Thus, an event can be extended if appending the most
recent observation creates a new (longer) sequence that exactly
matches a sequence in the events table of the relational database.
As noted above, the events table against which the matching is
attempted is populated during packing, based on human annotation of
events found in real human conversation transcripts. When appending
an observation to an existing event expression in the data model is
not possible, the system determines if it can create a new event
with the observation then being the initial observation for the new
event sequence. In circumstances when a particular observation does
not fall neatly within an event expression (i.e., it cannot be used
to extend the event), or when the observation does not make sense as a
new event, the system then provides a fallback (or failover)
operation to enable processing to continue efficiently. This
fallback operation is referred to herein as event aliasing.
[0039] As defined above, event aliasing refers to the notion of
inferring which event is unfolding, or has unfolded, with an
inexact match. In particular, when an exact sequence is not
recognized, preferably a closest inexact match is determined and
presented as an alias such that further processing can then be
performed as if the closest inexact match was recognized as a known
sequence. When an aliased event is created, the system then
proceeds to process the alias event as the event being spoofed (and
thus no longer sees the inexact match). Aliasing may be carried out
either to extend an event (an "aliased extension"), or even to
create a new event ("aliasing a new event"). Preferably, and as
noted above, the system first tries to extend an event (using an
exact match); then, if extending an event fails, the system then
determines if it can create a new event (if this can be done
logically given the conversation history so far). Once the latter
option fails, the system fails over to attempt to generate an
aliased extension and, failing that, performs aliasing for a new
event. Event aliasing is advantageous, as it enables the system to
provide coherent and appropriate responses even when the database
lookup does not return exact (or otherwise useful) results. The
mechanism enables the system to pretend it has seen a sequence of
observations that exactly matches something in the database, when
in reality the system observed something very similar but that,
e.g., skips an utterance in the middle, re-orders two utterances
within the sequence, or the like. Event aliasing enables the system
to operate coherently even when there is no exact match for a
sequence that can be extended with the most recent observation. In
this manner, the sequence is auto-corrected to something that is
close enough to what has been observed. This approach ensures that
the rest of the system functions efficiently with exact look-ups
into tables within the relational database.
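One plausible way to realize the "closest inexact match" is a similarity search over the known event expressions. The disclosure does not name a specific distance measure, so the sketch below uses Python's difflib sequence matching purely as an illustration.

```python
import difflib

def closest_expression(observed, known_expressions):
    """Pick the known event expression most similar to the observed
    sequence of observations; the alias is then processed as if it
    had been recognized exactly."""
    return max(known_expressions,
               key=lambda expr: difflib.SequenceMatcher(
                   None, observed, expr).ratio())

known = [("hello", "offer_demo", "yes_demo"),
         ("hello", "ask_pricing")]
# The observed sequence re-orders two observations; alias it to the
# closest known expression so exact table look-ups still work.
alias = closest_expression(("offer_demo", "hello", "yes_demo"), known)
assert alias == ("hello", "offer_demo", "yes_demo")
```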
[0040] The intelligent packing of the relational database, and the
use of the data model structured as described, enables the system
to participate in a multi-turn conversation, coherently, and
quickly. The system has several options for flexibly understanding
the meaning of an utterance, where aliasing is the fallback, and
where meaning is ascribed based on which observation is seen at
which step of an event, within a sequence of events that compose a
conversation. Events are of arbitrary length, and they can overlap
in time with other events. Thus, and although not depicted in the
example event hierarchical representation shown in FIG. 3, multiple
events may be active concurrently.
[0041] In the example depicted in FIG. 3, two different utterances
are indicated as being members of an utterance cluster. Preferably,
and as mentioned above, the system defines utterance clusters by
training a machine learning (ML) classifier.
[0042] A conversational bot system 400 that implements the
techniques of this disclosure is shown in FIG. 4. The system is
configured to process a query (e.g., a user utterance or physical
act) within a given turn of the multi-turn conversation. It assumes
that the corpus of annotated conversation transcripts is packed
into a relational database 402 comprising the set of inter-related
tables as shown in FIG. 2. Preferably, these tables include a table
of events, a table of observations, a table of physical acts, and a
table of speech acts. As will be described, the relational
structure includes the pointers (keys) identified, and preferably
look-ups into the database are performed on a constant-time basis
such that only a single database query into the data structures is
required for each database access (although there are typically
several of such accesses as will be described).
[0043] At a high level, the system 400 comprises several main
components, namely, an event recognition sub-system 404, and an
action selection sub-system 406. The event recognition sub-system
404 has several basic functions. Initially, and upon receipt of a
new user input, the event recognition sub-system 404 tries to
decide how to interpret what was actually said to the bot given
what has been observed previously. Typically, an input can be
successfully interpreted if two conditions are met: (1) the user's
input can be recognized as a speech act or physical act that exists
as an observation in the historical data, and (2) this observation
can legally extend an event or start a new event, giving contextual
meaning to the observation.
[0044] As noted above, event recognition is the process of first
trying to extend an event, or adding a new event, or (failing those
options) performing aliasing. In general, this evaluation is
carried out using an initial lookup into the database of historical
transcripts to identify one or more candidate interpretations,
followed by a filtering of the candidates for coherency (sometimes
referred to herein as "coherency filtering"). More formally, a
candidate interpretation typically corresponds to some human
annotation found in the historical conversation transcripts that
were indexed during the packing operation. In one exemplary
implementation, the candidate interpretation is a pointer
identifying specific lines of historical data in the set of
inter-related tables that comprise that relational database 402.
Another way of describing a candidate interpretation is as an
instance of observations, e.g., one candidate interpretation might
be a specific utterance in a specific transcript that expresses the
speech act "hello" with the line "hey man nice to see you."
Multiple candidate interpretations are identified by a set of
pointers to specific lines in the transcripts, representing
instances of observations.
[0045] Preferably, the input to event recognition sub-system 404,
where the system tries to extend an event, etc., is represented as
a sequence of observation identifiers (IDs), rather than instances.
An ID is akin to a type of observation that may appear many times
in the annotated historical data, rather than one instance of an
observation with that ID. To take a concrete example, assume the
user utterance is "hello kind sir." The classifier is run against
the utterance, and in this example the classifier assigns the
utterance the speech act label "hello." From this, the system looks
up a speech act ID (for this label) from the speech acts table.
Then, from this speech act ID, the system looks up an associated
observation ID in the observations table. These lookups typically
are done at once, through SQL table joins. After determining what
observation ID "hello kind sir" maps to, the system tries to use
this observation ID to first extend an event. To this end,
preferably the data model is updated to reflect that the event is
extended (at this point, merely an assumption that the event can be
extended), and it is the resulting modified event that the system
then uses to perform the initial lookup described above, namely, to
fetch the one or more candidate interpretations. These candidate
interpretations are instances of observations with the observation
ID previously mentioned, e.g., where the system found things like
"hello."
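The label-to-speech-act-ID-to-observation-ID chain described above, done "at once, through SQL table joins," might look like the following; the table and column names are assumptions for the sketch.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE speech_acts  (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE observations (id INTEGER PRIMARY KEY,
                           speech_act_id INTEGER REFERENCES speech_acts(id));
INSERT INTO speech_acts  VALUES (53709, 'hello');
INSERT INTO observations VALUES (765576, 53709);
""")

# "hello kind sir" has been classified as the speech act labeled 'hello';
# a single joined query resolves the label to an observation ID.
row = db.execute("""
    SELECT o.id FROM observations o
    JOIN speech_acts s ON o.speech_act_id = s.id
    WHERE s.label = ?""", ("hello",)).fetchone()
assert row[0] == 765576  # the observation ID used to try extending an event
```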
[0046] In addition to obtaining the one or more candidate
interpretations, the event recognition sub-system 404 evaluates the
candidate interpretations that it fetches, with the goal of
identifying one candidate interpretation that it will then provide
to update the data model before that updated model is then provided
to the action selection sub-system 406. In this initial pass, the
event recognition sub-system 404 functions to attempt to understand
the user's input. A mechanism 408 for performing this function
preferably comprises a set of evaluation agents that are sometimes
referred to herein as "critics." Critics preferably operate over
candidate interpretations. As used herein, a critic typically is a
software agent (e.g., implemented as a Python class) that evaluates
a candidate interpretation against a statistical model, a set of
hard-coded rules, or some other data or criteria, in each case to
determine whether the candidate interpretation should be used to
potentially update the data model. Preferably, there are a
plurality of separate and distinct "critics," with each critic
providing a particular type of analysis, and a particular candidate
interpretation must pass each and every critic. In other words,
preferably the critics collectively provide a Boolean AND function
such that a candidate interpretation is not accepted for use to
update the data model unless all critics are satisfied. Although
this approach is preferred, the all-or-nothing criteria may be
relaxed. The nature and processing performed by a critic may vary
from computationally-simple tasks (e.g., evaluation of business
logic), to more complex evaluation tasks (e.g., evaluation against
a statistical model). The critics may be arranged sequentially,
such that the more computationally-simple tasks are tested before
the more complex ones. Critics may also operate in parallel with
one another, and one or more critics may be combined into a single
critic. In this manner, the critics selectively filter the
candidate interpretations so as to ensure that the bot operates
coherently and consistently given the conversation so far.
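The critic pipeline described above (a Boolean AND over separate evaluation agents, with computationally-simple critics tested first) can be sketched as follows. The critic classes and the candidate structure shown are illustrative assumptions, not the actual implementation.

```python
# Each critic is a software agent with an evaluate() method; a candidate
# interpretation is accepted only if every critic approves it.
class Critic:
    def evaluate(self, candidate, data_model):
        raise NotImplementedError

class CheapRuleCritic(Critic):
    # A computationally-simple check, e.g., a piece of business logic.
    def evaluate(self, candidate, data_model):
        return candidate.get("valid", True)

class ModelCritic(Critic):
    # A more complex check, e.g., evaluation against a statistical model.
    def evaluate(self, candidate, data_model):
        return candidate.get("score", 0.0) > 0.5

def passes_all_critics(candidate, data_model, critics):
    # Boolean AND: all critics must be satisfied; critics are ordered so
    # the simple checks run (and can reject early) before the complex ones.
    return all(c.evaluate(candidate, data_model) for c in critics)

critics = [CheapRuleCritic(), ModelCritic()]
print(passes_all_critics({"valid": True, "score": 0.9}, {}, critics))  # True
print(passes_all_critics({"valid": True, "score": 0.2}, {}, critics))  # False
```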
[0047] As noted above, event recognition tries to extend an event
if it can do so. To this end, and as part of the event recognition
function, the critics are executed against the set of candidate
interpretations. If this initial pass through the critics returns a
positive result, work of the event recognition sub-system is done
(for this first pass), as the sub-system has recognized the user's
input. If, however, the initial pass through the critics returns
empty-handed (i.e., no candidate interpretation survives), then the
event recognition sub-system tries to assign a new event. The new
event is a new instance of an event that will initially have only
one observation in it, namely, a candidate interpretation that
makes it through the critics. To this end, a second pass through
the critics is performed. This second pass may or may not identify
a candidate interpretation. If a candidate interpretation passes
the critics, it is used for the new event. If, however, evaluation
of the new event also returns empty-handed (i.e., no candidate
interpretation logically starts a new event), the routine then
continues by moving on to event aliasing. As noted above, first the
system tries an aliased event, where the system allows an aliased
addition, namely, where a new instance of an event is started at a
position in the conversation where the system has not seen (in the
historical data) that kind (label) of event start. Once again, the
critics are run again in an attempt to find a candidate
interpretation. Processing efficiencies are provided by caching
results of the critic evaluations where possible. The result of
this iterative process (trying first to extend an event, or start a
new event, or aliasing an event, etc.) is a given candidate
interpretation, and processing then continues at the action
selection sub-system 406.
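The iterative recognition order just described (first try to extend an existing event, then start a new event, then fall back to an aliased event, running the critics over the fetched candidates at each stage) can be sketched as follows; `fetch_candidates` and `run_critics` are illustrative stand-ins for the database lookup and the critic pipeline.

```python
def recognize(observation_id, data_model, fetch_candidates, run_critics):
    # Try each recognition strategy in order, stopping at the first
    # candidate interpretation that makes it through all the critics.
    for strategy in ("extend_event", "new_event", "aliased_event"):
        candidates = fetch_candidates(observation_id, data_model, strategy)
        for candidate in candidates:
            if run_critics(candidate, data_model):  # all critics must pass
                return strategy, candidate
    return None, None  # no coherent interpretation found

# Tiny illustrative stand-ins: only the "new event" strategy succeeds here.
fetch = lambda oid, dm, s: [{"strategy": s, "ok": s == "new_event"}]
run = lambda c, dm: c["ok"]
print(recognize(7, {}, fetch, run))  # -> ('new_event', {...})
```

In practice, and as noted above, results of the critic evaluations would be cached where possible across these passes.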
[0048] To this end, the given candidate interpretation identified
by the event recognition sub-system is applied to the data model to
produce an adjusted data model. As used herein, adjusting the data
model may involve an addition to the data model, or not rolling
back a change (which typically occurs when a candidate
interpretation is rejected by a critic). The adjusted data model is
then applied as an input to the action selection sub-system 406,
whose primary function is to propose what observations might come
next. To this end, action selection sub-system 406 tries different
ways of navigating the transcript data to come up with a set of one
or more such observation(s) that might logically come next. An
observation identified by the action selection sub-system
represents a component of an imagined future state of the
conversation. Preferably, the action selection sub-system does not
evaluate correctness (coherency) of the observation. Instead, and
as described below, action selection sub-system 406 hands the
observations (in particular, their observation IDs) that it
identifies back to the event recognition sub-system 404, which then
repeats its above-described operation(s) to evaluate correctness
(coherence filtering). Thus, and according to this disclosure, the
event recognition sub-system is used to both understand the user,
and to select a response to return to the user.
[0049] The action selection sub-system 406 preferably operates as
follows. Taking the adjusted (modified) data model as input, it
performs an additional database lookup pass through the historical
transcripts in the database 402, and in response generates a set of
next possible observations. If these observations were to be
applied to the data model, the resulting data model would represent
one or more imagined future states of the conversation. As such,
this set of observations represents possible responses that might
be returned by the bot (as a response to the new user input). The
action selection sub-system 406 then iterates over different ways
(strategies) to suggest what observation(s) could come next, and it
returns a resulting set of observation IDs (corresponding to the
identified observations) back to the event recognition sub-system
404, thereby re-cycling them back through the coherency filtering
process to test their coherency. Using the set of observation IDs
proposed by the action selection sub-system, the event recognition
sub-system 404 again tries to extend the event, etc., once again
performing a query into the database for candidate interpretations.
The above-described processing by the event recognition sub-system
404 is then repeated. In particular, the critics in the event
processing sub-system 404 are then run over the candidate
interpretations to find one single approved candidate (an
instance). At this point the system has found a coherent response
to the user input, the data model is adjusted to include it, and
this response is then selected to be returned.
[0050] Preferably, the action selection sub-system 406 includes a
mechanism 410 to identify the set of observations. Akin to the
critics in the event recognition sub-system 404, the mechanism 410
comprises a set of evaluation agents that are sometimes referred to
as "action selectors." As used herein, an action selector typically
is a software agent (e.g., once again implemented as a Python
class) that provides a strategy for navigating through the
transcript data. Preferably, there are a plurality of separate and
distinct "action selectors," with each action selector providing a
particular strategy for navigating through the transcript data. The
nature and processing performed by an action selector may vary from
computationally-simple tasks, to more complex evaluation tasks. A
simple action selector just tries whatever came next in one
particular log. A more complicated action selector tries to find
ways to complete events that are otherwise hanging open. Or, a
selector might try to urge the bot to start an event that will
capture some variable value that the bot knows it needs. These are
just example action selectors. As the action selection sub-system
iterates over its action selectors, one or more observations (of
what might come next) are identified and sent back over to the
event recognition sub-system, as previously noted. Once an action
selector has found an observation that is then approved via event
recognition, the operation of the action selection sub-system
ends.
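The action-selection loop described above can be sketched as follows: iterate over the selector strategies (simplest first), have each propose observations that might come next, and hand each proposal back for coherence filtering, stopping at the first proposal that event recognition approves. The selector classes and `is_coherent` helper are illustrative assumptions.

```python
class ActionSelectorContinueLog:
    # Simplest strategy: propose whatever came next in one particular log.
    def propose(self, data_model):
        return [data_model.get("next_in_log")]

class ActionSelectorCompleteEvent:
    # Propose observations that would extend events still hanging open.
    def propose(self, data_model):
        return data_model.get("open_events", [])

def select_action(data_model, selectors, is_coherent):
    for selector in selectors:                       # simple strategies first
        for obs_id in selector.propose(data_model):  # possible next observations
            if is_coherent(obs_id, data_model):      # recycled through the critics
                return obs_id                        # first approved observation
    return None

dm = {"next_in_log": 10, "open_events": [20, 30], "approved": {30}}
coherent = lambda obs, d: obs in d.get("approved", set())
print(select_action(dm, [ActionSelectorContinueLog(),
                         ActionSelectorCompleteEvent()], coherent))  # -> 30
```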
[0051] In operation, action selectors may be arranged sequentially,
such that the more computationally-simple tasks are tested before
the more complex ones. Action selectors may also operate in
parallel with one another, and one or more action selectors may be
combined into a single action selector. Preferably, as an action
selector identifies a possible observation that could come next, it
is returned to the event recognition sub-system for filtering, even
as other action selectors are still executing. In another
embodiment, all of the action selections are enabled to complete
their processing before the resulting one or more observations are
returned (in effect, in batch) back for filtering.
[0052] Critics and action selectors may be the same or distinct
software agents. By way of example, the following agents apply to
both event recognition and action selection:
[0053] CriticStaleExtension--after some number of conversational
turns have passed, it becomes unlikely that something someone just
said is supposed to be interpreted as an extension of something
said long ago in the conversation. This critic rejects an
interpretation that tries to extend an event that has not changed
in the recent several turns.
[0054] CriticExtendedEventTypeChange--sometimes extending an event
with one more observation changes the label (and thus the meaning)
of that event. The system needs to be careful about allowing a
semantic shift like this. If the original event was not an alias,
and changing the event label leaves it in a position in the
conversation history where this ordering of events has not been
observed, then the label change (via the event extension) that
would lead to this unfamiliar ordering of events is not
allowed.
[0055] ActionSelectorContinueLog--this is the simplest action
selector. Every time critics approve a candidate interpretation,
the system records in the data model which line of which transcript
that candidate pointed at. This action selector blindly proposes
that the next thing to say is whatever was said next in that
particular transcript.
[0056] ActionSelectorCompleteEvent--this action selector detects
that the data model contains one or more events that cannot yet be
considered complete--they still require extension with more
observations before they represent a whole instance of that event.
This action selector proposes the bot say (or do) something next
that would be the next step in extending some existing event, to
get it closer to completion. For example, if someone asked the bot
"how are you today?", that begins an event that is not complete
until the bot responds with "I'm well."
[0057] ActionSelectorMotivation--this action selector encourages
the bot to say something that prompts the user to answer with some
missing piece of qualifying information (e.g., in the context of
qualifying sales leads). For example, the bot is aware that it
still needs to know how many employees work for some company, so if
the simpler action selectors fail, it tries to move the
conversation forward productively.
[0058] The following agents are only used during action
selection:
[0059] CriticDuplicateUtterance--prevents the bot from saying the
exact thing it has said previously within the same session;
[0060] CriticBizrule--prevents the bot from saying something that
contradicts some piece of business logic. For example, this agent
ensures that the bot does not say "sure, we will connect you to one
of our sales reps right away!" if the system has already detected
that this user's company is in an industry or geographic location
that is not being serviced. After the critic rejects this candidate
bot response, it moves on to another candidate that will get
approved that might say "sorry, we don't offer service in your
location. Would you like to join our waiting list to be notified
when coverage is extended?" or the like.
[0061] The above example agents are not intended to be
limiting.
Utterance Ranking
[0062] As noted above, a common problem that occurs in an AI-based
chatbot solution such as described above occurs when a user (e.g.,
a customer visiting a website) says more than one statement at once
to the conversational bot. A typical default operation is that the
system responds to the most recent utterance for which a response
action can be determined. Depending on context, however, this
default behavior sometimes is not correct, in which case the bot
typically produces a wrong (or context-inappropriate) response.
FIG. 5 depicts an example conversation illustrating this problem.
In this example, the bot defaults to responding to the last utterance
when it should be responding to the user's first utterance; here,
the second utterance can simply be ignored. This is a common
variety of the multi-utterance problem, in which the bot is
responding to a most recent user utterance, even though the
utterance itself is vacuous. FIG. 6 depicts a second example
wherein, although the last utterance in the group is not
necessarily vacuous, it is still not the most important utterance
to which the bot should respond. In this example, the last (most
recent) utterance to which the bot responds is not classified correctly.
Once again, here the most important utterance is the first one in
which the customer is asking to talk with a person. Both examples
depict ways in which users interact with bots and, in particular,
where the user is following up a request with a reason for the
request. Other variants of multiple customer utterance sequences
include multiple question sequences, multiple statement sequences,
and sequences with many more utterances. In all of these
situations, it is desirable that the bot be able to recognize such
sequences of utterances and be able to appropriately determine
which utterance of the sequence should be addressed. The subject
matter herein addresses this need.
[0063] According to a preferred approach, an utterance ranking
algorithm generates a list of indexes that represent a ranking of
simultaneous (concurrent) utterances by importance. In operation,
the list generated by the ranking algorithm is then passed to the
component of the AI system that selects a next appropriate action
(i.e., a response to a query). As has been described, the action
selector then steps through the history of the conversation
according to the list to determine an approved response action for
one of the utterances.
[0064] The utterance ranking preferably takes into consideration a
set of factors, which are now described. A primary factor is a
"priority status." This is a special status that is provisioned for
a (preferably) relatively small predetermined set of utterance
classes that, based on the implementation requirements, are flagged
as priorities. When a priority (utterance) is observed in a
sequence of customer utterances, the bot puts it at a top of an
utterance ranking, regardless of what other linguistic properties
it may have. In one embodiment involving a sales bot, the following
types of utterances may be afforded priority status: cross-domain
priorities (e.g., demo_request, pricing_request, and
person_request), domain-specific priorities (e.g.,
career_opportunity), and the like. Of course, the nature and type
of utterances that are afforded priority status will depend on the
implementation (the expected bot interactions).
[0065] A particular priority utterance class may have one or more
associated sub-rankings, with the sub-ranking then taking effect
when there is more than one priority uttered at once by a customer.
Continuing with the above examples, utterances classified as
demo_request or pricing_request are both ranked as more important
than utterances classified as person_request. To provide a more
concrete example, consider two utterances U1: "I want to schedule a
demo" and U2: "Can I talk to a person?" In these situations, it is
safely assumed that the customer is requesting a person to schedule
a demonstration, without knowing whether the bot can do this task
itself. An example ranking such as
demo_request>person_request tells the bot to respond to the
demonstration request, in which case it will get a demo scheduled,
all without having to route the request to a person. Where
utterances are not likely to be made simultaneously, typically
there is no need to do a relative ranking between them. Again, the
nature of the sub-rankings and how they are provisioned will be
implementation-specific.
[0066] Another factor preferably considered by the algorithm
distinguishes "statements" versus "non-statements." In particular,
whether or not a customer utterance is a question/command (a
"non-statement") or a statement is factored into the rankings.
Non-statements (questions and commands) are ranked higher than
statements. Preferably, questions and commands are not ranked
relative to each other, as they are not likely to co-occur in the
same sequence.
[0067] Another factor that is considered by the algorithm is
"content value," and this factor computes a quantitative measure of
how much actual content is contained in an utterance. Preferably,
and as will be seen, content value is measured and factored into
the utterance ranking, typically with higher content utterances
being ranked higher than lower content utterances. The content
value for a particular utterance is determined by the content of
the words that make up the utterance. The following describes one
representative implementation of the content scoring factor and,
preferably, a content score is assigned to every utterance in a
sequence of customer utterances, including those that fall into the
prioritized utterance classes.
[0068] As noted, the content score of an utterance is determined by
the content value of the words in the utterance. A content value is
a numerical weight assigned to a word. In one embodiment, there are
multiple (e.g., three or more) distinct word categories, with
different "values" then associated with the different categories.
For example, consider a schema wherein there are four (4) main
categories of words corresponding to their content values, namely,
"zero-value" words, "low-value" words, "high-value" words, and "all
other" words. In this example, "zero-value" words correspond to
stop-words, high frequency, functional, vacuous or anaphoric words
(e.g. "they," "because," "thanks" and others). As the name implies,
these words are afforded a content value per word=0. In the next
category, "low-value" words correspond to a set of words that are
not necessarily vacuous but should be treated as low-content given
how informative they are for what the bot cares about. Example
words in this category are "possible," "great," "helpful" and the
like, and they are afforded a content value per word=0.5.
High-value words, in contrast, typically comprise a relatively
small set of words that indicate high content value in chats. In
practice, these are words that strongly indicate requests that the
bot should particularly care about, e.g., "demo," "buy,"
"integrate" and so forth, and they are afforded a content value per
word=3. In this example schema, the "all other" words class is then
a complement of the above three classes, with content value per
word=1. Of course, the above-identified classes, values, and
associated words are merely exemplary, as the scoring schema will
vary depending on implementation. The content score for an
utterance then is the sum of the content values of each word in the
utterance.
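The content-scoring schema described above can be sketched as follows. The word lists here are tiny illustrative samples only; a real deployment would provision much larger, implementation-specific vocabularies.

```python
# Illustrative word categories (assumptions, not the actual vocabularies).
ZERO_VALUE = {"would", "it", "be", "for", "you", "to", "do", "that",
              "me", "thanks", "does", "with", "they", "because"}
LOW_VALUE = {"possible", "great", "helpful"}   # content value 0.5 per word
HIGH_VALUE = {"demo", "buy", "integrate"}      # content value 3 per word

def word_value(word):
    w = word.lower().strip("?.,!")
    if w in ZERO_VALUE:
        return 0.0
    if w in LOW_VALUE:
        return 0.5
    if w in HIGH_VALUE:
        return 3.0
    return 1.0                                 # "all other" words

def content_score(utterance):
    # The utterance's content score is the sum of its words' content values.
    return sum(word_value(w) for w in utterance.split())

print(content_score("Would it be possible for you to do that for me, thanks?"))  # 0.5
print(content_score("Does Drift integrate with Salesforce?"))                    # 5.0
```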
[0069] To provide a concrete example, a sentence such as "Would it
be possible for you to do that for me, thanks?" has a word
length=12 but a content score=0.5, as all of the words are
zero-value except the word "possible," which is a low-value word.
Another example, "Does Drift.RTM. integrate with Salesforce.RTM.?"
has a length=5 and a content score=5.0, as the word "integrate" is
a high-value word, and the company names are "all other" words. The
first example then corresponds to a "low-content utterance,"
namely, those with content score <1, whereas the second example
corresponds to a "high-content utterance," namely, those with
content score >=1. Of course, this breakpoint is also arbitrary, and
the notion of "high-content" does not necessarily mean a high
number for a content score, but only high relative to the
low-content scores. This latter distinction enables the algorithm
to distinguish between utterances with a low content value and
utterances with a high content value, regardless of whether they
are questions/commands or not. While questions and commands
(non-statements) are generally more important than statements, if a
question/command has a content score less than, for example, 1, it
is assumed to be too low in content to respond to unless there is
no high-content alternative in the sequence.
[0070] As will be seen, preferably the priority utterances have
priority regardless of their linguistic properties, and the
non-priority utterances are evaluated for content and for
non-statement status. To this end, the following describes a
preferred algorithm for creating utterance rankings.
[0071] In particular, FIG. 7 depicts a process flow that detects a
sequence of utterances that are to be prioritized for the bot. The
routine begins at step 700 given a list of utterances (or
"observations") representing the history of the conversation up to
the current point-in-time (turn). At step 702, a backward search in
this history is initiated until a bot utterance is found. At step
704, if no customer utterance is found before hitting a bot
utterance (i.e., if the most recent utterance is from the bot), the
routine waits for a customer response. At step 706, if there is
only one customer utterance found before finding a bot response,
the routine reverts to a default behavior of responding to the most
recent customer
utterance. At step 708, if a sequence of two or more customer
utterances is found, however, then the ranking algorithm is
called.
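The detection flow of FIG. 7 can be sketched as follows: walk the conversation history backward, collecting customer utterances until a bot utterance is hit. The `(speaker, text)` tuple representation of the history is an assumption for illustration.

```python
def unanswered_customer_utterances(history):
    """history: list of (speaker, text) tuples, oldest first."""
    sequence = []
    # Step 702: backward search until a bot utterance is found.
    for speaker, text in reversed(history):
        if speaker == "bot":
            break
        sequence.append(text)
    sequence.reverse()  # restore chronological order
    # Step 704: empty -> wait for the customer; step 706: one utterance ->
    # default behavior; step 708: two or more -> call the ranking algorithm.
    return sequence

history = [("bot", "Hi!"),
           ("customer", "Can I talk to a person?"),
           ("customer", "I have a billing question.")]
print(unanswered_customer_utterances(history))
# -> ['Can I talk to a person?', 'I have a billing question.']
```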
[0072] FIG. 8 depicts a representative operation of the ranking
algorithm. Preferably, the algorithm operates to construct a
ranking of the multiple customer utterances found based on the
three (3) types of factors described above. In general, the routine
creates rankings within subclasses of utterances found in the
sequence before joining the sub-rankings and rearranging the order
to reflect all three factors. Stated another way, preferably the
system ranks utterances in a user-input sequence by generating
sub-rankings based on the three factors of utterance meaning,
utterance type and an utterance content score, and ordering these
sub-rankings with respect to one another.
[0073] To this end, assume a sequence of unanswered customer
utterances is found in the conversational history (per the process
flow in FIG. 7). The ranking algorithm begins at step 800 to
extract the sequence of unanswered utterances from the history and
calculate a content score for each utterance. At step 802, the
routine separates this collection into a set of four (4) buckets
based on content scores and utterance type (question/commands vs.
non-question): high-content questions/commands, low-content
questions/commands, high-content non-questions/commands, and
low-content non-questions/commands. At step 804, and within each
sub-group, the
routine internally ranks the utterances based on their content
scores. At step 806, the routine joins the lists created in steps
802 and 804, preferably in the following order: high-content
questions/commands+high-content statements+low-content
questions/commands+low-content statements. At step 808, the routine
conducts a search of the list created in step 806 and, in
particular, for utterances in a priority class. At step 810, the
routine removes any such utterances (from a priority class) from
the list created at step 806 and adds them to a new list of
priorities. If there is more than one utterance in this new list of
priorities, they are ordered with the most recent in the history
being first. At step 812, the routine prepends the priority list
(the new list of priorities, possibly as re-ordered) to the front
of the list created in step 806 (with the priorities having been
previously removed). This results in a ranked list of the utterances
in the sequence, placing them into a ranked order of importance
that reflects a lowest to highest priority of response. At step
814, the ranked list is passed to a component that then implements
the rankings in the bot's response to the customer.
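The FIG. 8 ranking algorithm can be sketched as follows, assuming each utterance is represented as a dict with precomputed `score`, `is_question` and `priority` fields (the classification itself is outside this sketch). The returned list is ordered most-important-first.

```python
def rank_utterances(utterances):
    """utterances: dicts with 'score', 'is_question', 'priority' (or None),
    in chronological order. Returns them most-important-first."""
    def bucket(u):
        # Step 806 join order: high-content Q/C, high-content statements,
        # low-content Q/C, low-content statements (lower bucket = higher rank).
        high = u["score"] >= 1
        if u["is_question"]:
            return 0 if high else 2
        return 1 if high else 3

    # Steps 802-806: bucket non-priority utterances, rank within buckets
    # by content score, and join the buckets in order.
    non_priority = [u for u in utterances if not u.get("priority")]
    ranked = sorted(non_priority, key=lambda u: (bucket(u), -u["score"]))
    # Steps 808-812: pull out priority-class utterances (most recent first)
    # and prepend them to the joined list.
    priorities = [u for u in reversed(utterances) if u.get("priority")]
    return priorities + ranked

us = [{"text": "U1", "score": 6.0, "is_question": False, "priority": None},
      {"text": "U2", "score": 0.5, "is_question": True, "priority": None},
      {"text": "U3", "score": 2.0, "is_question": True, "priority": None}]
print([u["text"] for u in rank_utterances(us)])  # -> ['U3', 'U1', 'U2']
```

Note that any per-class sub-rankings among the priorities (e.g., demo_request above person_request) are omitted from this sketch for brevity.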
[0074] In particular, FIG. 9 represents one embodiment that
implements the rankings provided by the ranking algorithm. In this
embodiment, the system modifies the history that is tracked in the
bot's data model to reflect the re-ranking. If there is a
multi-utterance input from the human, the utterances are
represented in the history in order of lowest to highest priority.
this end, at step 900 the ranking output from the ranking algorithm
is received. At step 902, the AI bot is then set to respond to the
most recent utterance (the last utterance) in the history, which
(based on the re-ranking provided by the algorithm) will always be
the most important given a sequence of utterances. This
implementation (wherein the bot responds to the most recent
utterance following the re-ranking of the utterances) avoids
interfering with the bot's behavior with regard to event
continuation, as described generally above. After the history is
re-ordered, in the action selection component of the AI system, the
bot considers the last utterance in the conversation history first
and checks for an approved action. At step 904, and if an approved
action is found, the bot performs the action. At step 906, and if
an approved action is not found, the bot continues stepping through
the history backwards (i.e., through the utterance ranking) until
an approved action is found for one of the utterances.
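The FIG. 9 implementation can be sketched as follows: the ranked utterances (most-important-first in this sketch) are rewritten into the history from lowest to highest priority, and the bot then steps backward through that re-ordered history until an approved action is found. `find_approved_action` is an illustrative stand-in for the action-selection machinery.

```python
def respond_to_ranked(ranked_most_important_first, find_approved_action):
    # Steps 900/902: the history tail is re-ordered lowest-to-highest
    # priority, so the most important utterance becomes the most recent.
    history_tail = list(reversed(ranked_most_important_first))
    # Steps 904/906: consider the last (most important) utterance first,
    # stepping backward through the ranking until an action is approved.
    for utterance in reversed(history_tail):
        action = find_approved_action(utterance)
        if action is not None:
            return action
    return None  # no approved action for any utterance in the sequence

find = lambda u: "route_to_sales" if u == "pricing?" else None
print(respond_to_ranked(["pricing?", "hello"], find))  # -> route_to_sales
```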
[0075] Although in a preferred embodiment, the re-ranking positions
the highest-ranked utterance as the most recent (so that the bot
can process it preferentially), this is not a requirement. In an
alternative embodiment, the action selection component may receive
the highest priority utterance directly irrespective of its
particular placement in the data model.
[0076] In general, the process flows described above in FIGS. 7-9
are implemented in software, as a set of computer program
instructions executed by one or more machines. The process flows
may be integrated or executed as separate flows.
[0077] Illustrative examples of the operation of the ranking
algorithm are provided in FIGS. 10-12. These are representative but
non-limiting situations that would motivate the bot to respond to
an utterance other than the most recent when facing a sequence of
two or more customer utterances. FIG. 10 depicts multiple statement
utterances. In Example 1, both utterances are non-priority
statements and U1 is determined to be more important because it has
a content score of 6, where U2 has a content score of 3. FIG. 11
depicts a mixture of questions and statements. In Example 2, all
utterances are non-priority. U1 and U3 are statements and U2 is a
question. U3 has a content value of 0 and is ranked last. U2 is a
question but is a low-content question (score of 0.5), whereas U1
has a content value of 2.5 (with the word "something" being treated
as a low content word with value 0.5) and is thus ranked higher. In
Example 3, both utterances are non-priority utterances and, while
U2 has a higher content value than U1, U1 is determined to be more
important because it is a high-content question, whereas U2 is a
high-content statement. FIG. 12 depicts multiple question
utterances. Both utterances here are questions. U1 has a content
score of 1 and U2 a content score=2.5 but, because U1 is in the
priority category pricing_request, it is given a higher rank.
Variants
[0078] The bot may also be configured to determine when a subset of
utterances (e.g., 2) in a set of multiple utterances are both worth
responding to, in which case both are then responded to
irrespective of the final priority determination. Example 4 in FIG.
12 is an example (here where both utterances should be given a
response). This approach may be used when there are multiple
questions.
[0079] Another variant provides for more fine-grained content
values, so that the utterance ranking may be better informed. For
example, the fine-grained scoring may be configured
to reflect the importance of the utterance(s) within the context of
the conversational chat (or even in a non-conversational chat), and
not just relative importance/frequency in language use generally.
When making more fine-grained scores for words, importance need
not be equated with low frequency (as is typical in NLP).
Typically, the most important words for the bot will be frequent in
comparison to other non-stop-words.
[0080] Other implementation-specific ranking criteria may also be
applied by the algorithm.
Enabling Technologies
[0081] Typically, the computing platform is managed and operated
"as-a-service" by a service provider entity. In one embodiment, the
platform is accessible over the publicly-routed Internet at a
particular domain, or sub-domain. The platform is a
securely-connected infrastructure (typically via SSL/TLS
connections), and that infrastructure includes data encrypted at
rest, e.g., in an encrypted database, and in transit. The computing
platform typically comprises a set of applications implemented as
network-accessible services. One or more applications (services)
may be combined with one another. An application (service) may be
implemented using a set of computing resources that are co-located
or themselves distributed. Typically, an application is implemented
using one or more computing systems. The computing platform (or
portions thereof) may be implemented in a dedicated environment, in
an on-premises manner, as a cloud-based architecture, or some
hybrid.
[0082] The system may be implemented on-premises (e.g., in an
enterprise network), in a cloud computing environment, or in a
hybrid infrastructure. An individual end user typically accesses
the system using a user application executing on a computing device
(e.g., mobile phone, tablet, laptop or desktop computer,
Internet-connected appliance, etc.). In a typical use case, a user
application is a mobile application (app) that a user obtains from
a publicly-available source, such as a mobile application
storefront. The platform may be managed and operated by a service
provider. Although typically the platform is network-accessible,
e.g., via the publicly-routed Internet, the computing system may be
implemented in a standalone or on-premises manner. In addition, one
or more of the identified components may interoperate with some
other enterprise computing system or application.
[0083] Preferably, the platform supports a machine learning system.
The nature and type of Machine Learning (ML) algorithms that are
used to process the query may vary. As is known, ML algorithms
iteratively learn from the data, thus allowing the system to find
hidden insights without being explicitly programmed where to look.
ML tasks are typically classified into various categories depending
on the nature of the learning signal or feedback available to a
learning system, namely supervised learning, unsupervised learning,
and reinforcement learning. In supervised learning, the algorithm
trains on labeled historic data and learns general rules that map
input to output/target. The discovery of relationships between the
input variables and the label/target variable in supervised
learning is done with a training set, and the system learns from
the training data. In this approach, a test set is used to evaluate
whether the discovered relationships hold; the strength and utility
of the predictive relationship are assessed by feeding the model the
input variables of the test data and comparing the label predicted
by the model with the actual label of the data. The
most widely used supervised learning algorithms are Support Vector
Machines, linear regression, logistic regression, naive Bayes, and
neural networks. As will be described, the techniques herein
preferably leverage a network of neural networks. Formally, an NN is
a function g: X.fwdarw.Y, where X is an input space, and Y is an
output space representing a categorical set in a classification
setting (or a real number in a regression setting). For a sample x
that is an element of X, g(x)=f.sub.L(f.sub.L-1( . . .
(f.sub.1(x)))). Each f.sub.i represents a layer, and f.sub.L is
the last output layer. The last output layer creates a mapping from
a hidden space to the output space (class labels) through a softmax
function that outputs a vector of real numbers in the range [0, 1]
that add up to 1. The output of the softmax function is a
probability distribution of input x over C different possible
output classes.
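[0083a] By way of illustration only, the layered composition and softmax output described above may be sketched in Python as follows; the single toy layer and its coefficients are hypothetical and stand in for trained network weights:

```python
import math

def softmax(z):
    # Map real-valued scores to a probability distribution over classes.
    # Subtracting the max before exponentiating improves numerical stability.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def compose(layers, x):
    # g(x) = f_L(f_(L-1)( . . . (f_1(x)))): apply each layer in order.
    for f in layers:
        x = f(x)
    return x

# A hypothetical hidden layer mapping a 1-D input to three class scores.
f1 = lambda x: [2.0 * x[0] + 1.0, -x[0], 0.5 * x[0]]

probs = compose([f1, softmax], [1.0])  # distribution over C=3 classes
```

Because softmax normalizes the scores, the entries of probs lie in the range [0, 1] and sum to 1, consistent with the description above.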
[0084] Thus, for example, in one embodiment, and without
limitation, a neural network such as described is used to extract
features from an utterance, with those extracted features then
being used to train a Support Vector Machine (SVM).
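[0084a] A minimal, dependency-free sketch of this feature-extraction pattern follows; the fixed tanh map stands in for a trained network's hidden layer, and a simple perceptron stands in for the SVM (a practical implementation would fit an actual SVM on the extracted features):

```python
import math

def hidden_features(x):
    # Stand-in for a trained NN's penultimate layer: a fixed nonlinear map.
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1])]

def train_linear(feats, labels, epochs=20, lr=0.1):
    # Perceptron-style linear classifier trained on the extracted features;
    # it substitutes for the SVM to keep the sketch self-contained.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            pred = 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else -1
            if pred != y:  # update weights only on mistakes
                w = [w[i] + lr * y * f[i] for i in range(2)]
                b += lr * y
    return w, b

# Hypothetical labeled inputs; features are extracted, then a classifier fit.
X = [[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -1.5]]
labels = [1, 1, -1, -1]
feats = [hidden_features(x) for x in X]
w, b = train_linear(feats, labels)
```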
[0085] In unsupervised machine learning, the algorithm trains on
unlabeled data. The goal of these algorithms is to explore the data
and find some structure within. The most widely used unsupervised
learning algorithms are Cluster Analysis and Market Basket
Analysis. In reinforcement learning, the algorithm learns through a
feedback system. The algorithm takes actions and receives feedback
about the appropriateness of its actions and based on the feedback,
modifies the strategy and takes further actions that would maximize
the expected reward over a given amount of time.
[0086] The following provides additional details regarding
supervised machine learning. As noted above, supervised learning is
the machine learning task of inferring a function from labeled
training data. The training data consist of a set of training
examples. In supervised learning, typically each example is a pair
consisting of an input object (typically a vector), and a desired
output value (also called the supervisory signal). A supervised
learning algorithm analyzes the training data and produces an
inferred function, which can be used for mapping new examples. An
optimal scenario allows for the algorithm to correctly determine
the class labels for unseen instances. This requires the learning
algorithm to generalize reasonably from the training data to unseen
situations.
[0087] For supervised learning, the following steps are used. An
initial determination is made as to what kind of data is to be used
as the training set. The training set is then gathered. In particular, a
set of input objects is gathered and corresponding outputs are also
gathered, either from human experts or from measurements. Then, an
input feature representation of the learned function is determined.
In this approach, typically the input object is transformed into a
feature vector, which contains a number of features that are
descriptive of the object. The structure of the learned function
and corresponding learning algorithm are then determined. For
example, support vector machines or decision trees may be used. The
learning algorithm is then run on the gathered training set. Some
supervised learning algorithms require a user to determine certain
control parameters. These parameters may be adjusted by optimizing
performance on a subset (called a validation set) of the training
set, or via cross-validation. The accuracy of the learned function
is then evaluated. After parameter adjustment and learning, the
performance of the resulting function is measured on a test set
that is separate from the training set.
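[0087a] The steps above can be sketched end to end with a deliberately simple learned function, a nearest-centroid classifier; the data, the feature representation (a single number per object), and the split are hypothetical:

```python
def train(train_set):
    # Learn one centroid per class from labeled (feature, label) pairs.
    sums, counts = {}, {}
    for x, y in train_set:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(model, x):
    # Assign the class whose centroid is nearest to the input feature.
    return min(model, key=lambda y: abs(x - model[y]))

def accuracy(model, data):
    return sum(predict(model, x) == y for x, y in data) / len(data)

# Gathered training set and a separate test set, per the steps above.
train_set = [(0.5, "a"), (1.0, "a"), (1.5, "a"), (4.0, "b"), (5.0, "b")]
test_set = [(0.8, "a"), (4.5, "b")]

model = train(train_set)
acc = accuracy(model, test_set)  # performance measured on held-out data
```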
[0088] One or more functions of the computing platform of this
disclosure may be implemented in a cloud-based architecture. As is
well-known, cloud computing is a model of service delivery for
enabling on-demand network access to a shared pool of configurable
computing resources (e.g. networks, network bandwidth, servers,
processing, memory, storage, applications, virtual machines, and
services) that can be rapidly provisioned and released with minimal
management effort or interaction with a provider of the service.
Available service models that may be leveraged in whole or in part
include: Software as a Service (SaaS) (the provider's applications
running on cloud infrastructure); Platform as a service (PaaS) (the
customer deploys applications that may be created using provider
tools onto the cloud infrastructure); Infrastructure as a Service
(IaaS) (customer provisions its own processing, storage, networks
and other computing resources and can deploy and run operating
systems and applications).
[0089] The platform may comprise co-located hardware and software
resources, or resources that are physically, logically, virtually
and/or geographically distinct. Communication networks used to
communicate to and from the platform services may be packet-based,
non-packet based, and secure or non-secure, or some combination
thereof.
[0090] More generally, the techniques described herein are provided
using a set of one or more computing-related entities (systems,
machines, processes, programs, libraries, functions, or the like)
that together facilitate or provide the functionality described
above. In a typical implementation, a representative
machine on which the software executes comprises commodity
hardware, an operating system, an application runtime environment,
and a set of applications or processes and associated data, that
provide the functionality of a given system or subsystem. As
described, the functionality may be implemented in a standalone
machine, or across a distributed set of machines.
[0091] Other enabling technologies for the machine learning
algorithms include, without limitation, vector autoregressive
modeling (e.g., Autoregressive Integrated Moving Average (ARIMA)),
state space modeling (e.g., using a Kalman filter), a Hidden Markov
Model (HMM), recurrent neural network (RNN) modeling, RNN with long
short-term memory (LSTM), Random Forests, Generalized Linear
Models, Extreme Gradient Boosting, Extreme Random Trees, and
others. By applying these modeling techniques, new types of
features are extracted, e.g., as follows: model parameters (e.g.
coefficients for dynamics, noise variance, etc.), latent states,
and predicted values for a next couple of observation periods.
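[0091a] By way of a minimal sketch, an autoregressive model of order one can be fit by least squares and used to produce the kinds of derived features mentioned above (a model coefficient and predicted values for upcoming observation periods); the series is hypothetical:

```python
def fit_ar1(series):
    # Least-squares estimate of phi in x[t] = phi * x[t-1] + noise;
    # phi is a model-parameter feature of the kind described above.
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den

def forecast(series, phi, steps=2):
    # Predicted values for the next couple of observation periods.
    preds, last = [], series[-1]
    for _ in range(steps):
        last = phi * last
        preds.append(last)
    return preds

series = [8.0, 4.0, 2.0, 1.0]  # toy series decaying by half each period
phi = fit_ar1(series)          # latent dynamics coefficient
preds = forecast(series, phi)  # predicted next observations
```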
[0092] Typically, but without limitation, a client device is a
mobile device, such as a smartphone, tablet, or wearable computing
device, or a laptop or desktop computer. A typical mobile device comprises a CPU
(central processing unit), computer memory, such as RAM, and a
drive. The device software includes an operating system (e.g.,
Google® Android™, or the like), and generic support
applications and utilities. The device may also include a graphics
processing unit (GPU). The mobile device also includes a
touch-sensing device or interface configured to receive input from
a user's touch and to send this information to the processor. The
touch-sensing device typically is a touch screen. The mobile device
comprises suitable programming to facilitate gesture-based control,
in a manner that is known in the art.
[0093] Generalizing, the mobile device is any wireless client
device, e.g., a cellphone, pager, a personal digital assistant
(PDA, e.g., with GPRS NIC), a mobile computer with a smartphone
client, or the like. Other mobile devices in which the technique
may be practiced include any access protocol-enabled device (e.g.,
an Android™-based device, or the like) that is capable of
sending and receiving data in a wireless manner using a wireless
protocol. Typical wireless protocols are: WiFi, GSM/GPRS, CDMA or
WiMax. These protocols implement the ISO/OSI Physical and Data Link
layers (Layers 1 & 2) upon which a traditional networking stack
is built, complete with IP, TCP, SSL/TLS and HTTP.
[0094] Each above-described process preferably is implemented in
computer software as a set of program instructions executable in
one or more processors, as a special-purpose machine.
[0095] While the above describes a particular order of operations
performed by certain embodiments of the invention, it should be
understood that such order is exemplary, as alternative embodiments
may perform the operations in a different order, combine certain
operations, overlap certain operations, or the like. References in
the specification to a given embodiment indicate that the
embodiment described may include a particular feature, structure,
or characteristic, but every embodiment may not necessarily include
the particular feature, structure, or characteristic.
[0096] While the disclosed subject matter has been described in the
context of a method or process, the subject matter also relates to
apparatus for performing the operations herein. This apparatus may
be a particular machine that is specially constructed for the
required purposes, or it may comprise a computer otherwise
selectively activated or reconfigured by a computer program stored
in the computer. Such a computer program may be stored in a
computer readable storage medium, such as, but not limited to,
any type of disk including an optical disk, a CD-ROM, and a
magnetic-optical disk, a read-only memory (ROM), a random access
memory (RAM), a magnetic or optical card, or any type of media
suitable for storing electronic instructions, and each coupled to a
computer system bus.
[0097] A given implementation of the computing platform is software
that executes on a hardware platform running an operating system
such as Linux. A machine implementing the techniques herein
comprises a hardware processor, and non-transitory computer memory
holding computer program instructions that are executed by the
processor to perform the above-described methods.
[0098] The functionality may be implemented with other application
layer protocols besides HTTP/HTTPS, or any other protocol having
similar operating characteristics.
[0099] There is no limitation on the type of computing entity that
may implement the client-side or server-side of the connection. Any
computing entity (system, machine, device, program, process,
utility, or the like) may act as the client or the server.
[0100] While given components of the system have been described
separately, one of ordinary skill will appreciate that some of the
functions may be combined or shared in given instructions, program
sequences, code portions, and the like. Any application or
functionality described herein may be implemented as native code,
by providing hooks into another application, by facilitating use of
the mechanism as a plug-in, by linking to the mechanism, and the
like.
[0101] The platform functionality may be co-located, or various
parts/components may be separated and run as distinct functions,
perhaps in one or more locations (over a distributed network).
[0103] The techniques herein generally provide for the
above-described improvements to a technology or technical field, as
well as the specific technological improvements to various fields,
all as described above.
[0104] The above-described solution may be implemented across
various use cases. A representative (but non-limiting) use case is
a chatbot that is used as an alternative to a human Sales
Development Representative (SDR) to help an organization scale to
handle a larger volume of traffic than a sales team can support,
and to handle conversations at off-hours or otherwise when no
humans are standing by or available.
[0105] Although the above-described workflow is described in the
context of a chat, this is not a limitation. Generalizing, the
multi-turn conversation is one of: chat, e-mail, SMS and a
voice-based interaction.
* * * * *