U.S. patent application number 13/415809 was filed with the patent office on 2012-03-08 and published on 2012-10-04 for systems, methods, and media for determining fraud risk from audio signals.
Invention is credited to Lisa Guerra, Anthony Rajakumar, Vipul Vyas, Torsten Zeppenfeld.
Application Number | 20120253805 13/415809 |
Family ID | 46928417 |
Filed Date | 2012-03-08 |
United States Patent Application | 20120253805 |
Kind Code | A1 |
Rajakumar; Anthony ; et al. | October 4, 2012 |
SYSTEMS, METHODS, AND MEDIA FOR DETERMINING FRAUD RISK FROM AUDIO
SIGNALS
Abstract
Systems, methods, and media for determining fraud risk from
audio signals and non-audio data are provided herein. Some
exemplary methods include receiving an audio signal and an
associated audio signal identifier, receiving a fraud event
identifier associated with a fraud event, determining a speaker
model based on the received audio signal, determining a channel
model based on a path of the received audio signal, using a server
system, updating a fraudster channel database to include the
determined channel model based on a comparison of the audio signal
identifier and the fraud event identifier, and updating a fraudster
voice database to include the determined speaker model based on a
comparison of the audio signal identifier and the fraud event
identifier.
Inventors: |
Rajakumar; Anthony;
(Fremont, CA) ; Zeppenfeld; Torsten; (Emerald
Hills, CA) ; Guerra; Lisa; (Los Altos, CA) ;
Vyas; Vipul; (Palo Alto, CA) |
Family ID: |
46928417 |
Appl. No.: |
13/415809 |
Filed: |
March 8, 2012 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number | |
13290011 | Nov 4, 2011 | | 13415809 |
11754974 | May 29, 2007 | 8073691 | 13290011 |
11754975 | May 29, 2007 | | 11754974 |
12352530 | Jan 12, 2009 | | 11754975 |
12856200 | Aug 13, 2010 | | 12352530 |
12856118 | Aug 13, 2010 | | 12856200 |
12856037 | Aug 13, 2010 | | 12856118 |
11404342 | Apr 14, 2006 | | 12856037 |
13278067 | Oct 20, 2011 | | 11404342 |
11754974 | May 29, 2007 | 8073691 | 13278067 |
13415816 | Mar 8, 2012 | | 11754974 |
60923195 | Apr 13, 2007 | | |
60808892 | May 30, 2006 | | |
60923195 | Apr 13, 2007 | | |
60808892 | May 30, 2006 | | |
61197848 | Oct 31, 2008 | | |
61010701 | Jan 11, 2008 | | |
61335677 | Jan 11, 2010 | | |
61335677 | Jan 11, 2010 | | |
61335677 | Jan 11, 2010 | | |
60673472 | Apr 21, 2005 | | |
60923195 | Apr 13, 2007 | | |
60808892 | May 30, 2006 | | |
Current U.S.
Class: |
704/236 ;
704/E15.001 |
Current CPC
Class: |
H04M 3/436 20130101;
H04W 12/12 20130101; G06Q 20/4016 20130101; H04M 15/47 20130101;
G10L 17/00 20130101; G10L 17/04 20130101; G10L 25/48 20130101; G10L
17/02 20130101; H04M 2201/41 20130101; H04W 12/1206 20190101; G10L
17/06 20130101; G10L 17/26 20130101; H04M 2201/18 20130101; H04M
2203/6027 20130101; H04M 2203/6054 20130101 |
Class at
Publication: |
704/236 ;
704/E15.001 |
International
Class: |
G10L 15/00 20060101
G10L015/00 |
Claims
1. A method comprising: receiving an audio signal and an associated
audio signal identifier; receiving a fraud event identifier
associated with a fraud event; determining a speaker model based on
the received audio signal, using a server system; determining a
channel model based on a path of the received audio signal, using
the server system; updating a fraudster channel database to include
the determined channel model based on a comparison of the audio
signal identifier and the fraud event identifier; and updating a
fraudster voice database to include the determined speaker model
based on a comparison of the audio signal identifier and the fraud
event identifier.
2. The method of claim 1, wherein the audio signal is received
without regard to fraud activities.
3. The method of claim 1, wherein the channel model represents
distortion of a source of the received audio signal along the path
of the received audio signal.
4. The method of claim 1, further comprising: receiving a candidate
audio sample; determining a channel match score based on a match
between candidate audio sample and the determined channel model in
the fraudster channel database; determining a voice match score
based on a match between candidate audio sample and the determined
speaker model in the fraudster voice database; and determining an
audio sample risk score based on the channel match score and the
voice match score.
5. The method of claim 4, further comprising selecting the
fraudster voice database based on a match between candidate audio
sample and the determined channel model in the fraudster channel
database.
6. The method of claim 4, wherein the channel match score
represents a match between the path of the received audio signal
and a path traversed by the received candidate audio sample.
7. The method of claim 6, further comprising selecting the
fraudster voice database based on the channel match score.
8. The method of claim 1, wherein updating a fraudster voice
database further comprises comparing a time stamp of the fraud
event and a time stamp of the received audio signal.
9. The method of claim 1, wherein the fraudster voice database and
the fraudster channel database are included in a fraudster
database.
10. A method for screening an audio sample, the method comprising:
maintaining a set of channel models in a server system, each
channel model belonging to a disqualified candidate and
representing a path of an audio signal associated with an
identifier that has been matched to information associated with an
instance of fraud; receiving a screening request, the screening
request comprising an audio sample for a candidate; comparing the
audio sample with the channel models in the set of channel models
in the server system; and generating a channel match score based on
at least a partial match between the audio sample and a channel
model in the set of channel models.
11. The method of claim 10, further comprising: maintaining a set
of speaker models in the server system, each speaker model
belonging to a disqualified candidate and based on a speaker model
associated with an identifier that has been matched to information
associated with an instance of fraud; and generating a voice match
score on at least partial match between the audio sample and a
speaker model in the set of speaker models.
12. The method of claim 11, further comprising generating a risk
score based on the voice match score and the channel match score,
and providing the risk score to a third party.
13. The method of claim 11, further comprising selecting the
maintained set of speaker models belonging to disqualified
candidates based on the channel score.
14. A system for analyzing audio, comprising: a memory for storing
executable instructions for analyzing audio; a processor for
executing the instructions; a communications module stored in
memory and executable by the processor to receive an audio signal
and an associated audio signal identifier, and to receive a fraud
event identifier associated with a fraud event; an audio analysis
module stored in memory and executable by the processor to extract
signatures from the received audio signal; and an enrollment module
stored in memory and executable by the processor to compare the
audio signal identifier and the fraud event identifier and based on
the comparison to store in a fraudster database any of a channel
model extracted from the audio signal using the analysis module, a
speaker model extracted from the audio signal using the analysis
module, or combinations thereof.
15. The system of claim 14, wherein the channel model represents a
distortion of a signal originating from a source of the received
audio signal, the distortion occurring along a path of the audio
signal between the source and the communications module.
16. The system of claim 14, wherein the enrollment module is
further executable by the processor to compare the audio signal
identifier and a fraud event identifier, and based on the
comparison store in a whitelist database a channel model extracted
from the audio signal using the analysis module.
17. The system of claim 14, further comprising: a scoring module
stored in memory and executable by the processor to: receive a
candidate audio sample; determine a channel match score based on an
at least partial match between candidate audio sample and the
extracted channel model in the fraudster database; determine a
voice match score based on an at least partial match between
candidate audio sample and the extracted speaker model in the
fraudster database; and determine a fraud score based on the
channel match score and the voice match score.
18. The system of claim 17, wherein the channel match score
represents an at least partial match of a path between a source of
the received audio signal and the communications module, and a path
traversed by the received candidate audio.
19. The system of claim 17, wherein the scoring module is further
executable to select a subset of the fraudster database for
comparison of the candidate audio and speaker models in the
fraudster database based on the channel match score.
20. The system of claim 17, wherein the scoring module stored in
memory is further executable by the processor to determine the
fraud score based on a whitelist risk score.
21. The system of claim 20, wherein the scoring module selects a
whitelist database based on the channel match score for comparison
of the candidate audio and speaker models in the whitelist
database.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation-in-part and claims
benefit of and priority to U.S. patent application Ser. No.
13/290,011, filed on Nov. 4, 2011, entitled "SYSTEMS, METHODS, AND
MEDIA FOR DETERMINING FRAUD PATTERNS AND CREATING FRAUD BEHAVIORAL
MODELS," which is a continuation-in-part of U.S. patent application
Ser. No. 11/754,974, (now U.S. Pat. No. 8,073,691) filed on May 29,
2007, entitled "METHOD AND SYSTEM FOR SCREENING USING VOICE DATA
AND METADATA," which in turn claims the benefit of and priority to
U.S. Provisional Applications 60/923,195, filed on Apr. 13, 2007,
entitled "Selecting Techniques and Geographical Optimization
Details for a Fraud Detection System that uses Voiceprints," and
60/808,892, filed on May 30, 2006, entitled "Optimizations for a
Fraud Detection System that uses Voiceprints."
[0002] This application is also a continuation-in-part and claims
benefit of and priority to U.S. patent application Ser. No.
11/754,975, filed on May 29, 2007, entitled "Method and System to
Seed a Voice Database," which in turn claims the benefit of and
priority to U.S. Provisional Applications 60/923,195, filed on Apr.
13, 2007, entitled "Selecting Techniques and Geographical
Optimization Details for a Fraud Detection System that uses
Voiceprints," and 60/808,892, filed on May 30, 2006, entitled
"Optimizations for a Fraud Detection System that uses
Voiceprints."
[0003] This application is also a continuation-in-part and claims
benefit of and priority to U.S. patent application Ser. No.
12/352,530, filed on Jan. 12, 2009, entitled "BUILDING WHITELISTS
COMPRISING VOICEPRINTS NOT ASSOCIATED WITH FRAUD AND SCREENING
CALLS USING A COMBINATION OF A WHITELIST AND BLACKLIST," which in
turn claims the benefit of and priority to U.S. Provisional
Applications 61/197,848, filed Oct. 31, 2008, entitled "Fraud
system incorporating both Voiceprint Whitelists and Voiceprint
Blacklists," and 61/010,701, filed Jan. 11, 2008, entitled
"Voiceprint databases of any group of individuals."
[0004] This application is also a continuation-in-part and claims
benefit of and priority to U.S. patent application Ser. No.
12/856,200, filed on Aug. 13, 2010, entitled "SPEAKER
VERIFICATION-BASED FRAUD SYSTEM FOR COMBINED AUTOMATED RISK SCORE
WITH AGENT REVIEW AND ASSOCIATED USER INTERFACE," which in turn
claims the benefit of and priority to U.S. Provisional Application
61/335,677, filed on Jan. 11, 2010, entitled "A method for
Voice-Biometrics-Based Fraud Risk Scoring."
[0005] This application is also a continuation-in-part and claims
benefit of and priority to U.S. patent application Ser. No.
12/856,118, filed on Aug. 13, 2010, entitled "METHOD AND SYSTEM FOR
GENERATING A FRAUD RISK SCORE USING TELEPHONY CHANNEL BASED AUDIO
AND NON-AUDIO DATA," which in turn claims the benefit of and
priority to U.S. Provisional Applications 61/335,677, filed on Jan.
11, 2010, entitled "A method for Voice-Biometrics-Based Fraud Risk
Scoring," and 60/673,472, filed on Apr. 21, 2005, entitled
"Detecting Fraudulent Use of Financial Account Numbers Using
Voiceprints."
[0006] This application is also a continuation-in-part and claims
benefit of and priority to U.S. patent application Ser. No.
12/856,037, filed on Aug. 13, 2010, entitled "METHOD AND SYSTEM FOR
ENROLLING A VOICEPRINT IN A FRAUDSTER DATABASE," which in turn
claims the benefit of and priority to U.S. Provisional Applications
61/335,677, filed on Jan. 11, 2010, and 60/673,472, filed on Apr.
21, 2005.
[0007] This application and each of the aforementioned
Non-Provisional U.S. patent applications is a continuation-in-part
and claims benefit of and priority to U.S. patent application Ser.
No. 11/404,342, filed on Apr. 14, 2006, entitled "Method and system
to detect fraud using voice data," which in turn claims the benefit
of U.S. Provisional Application 60/673,472, filed on Apr. 21, 2005,
entitled "Detecting Fraudulent Use of Financial Account Numbers
Using Voiceprints."
[0008] This application is also a continuation-in-part and claims
the benefit of and priority to U.S. patent application Ser. No.
13/278,067, filed on Oct. 20, 2011, entitled "Method and System for
Screening Using Voice Data and Metadata," which in turn is a
continuation of and claims the benefit of and priority to U.S.
patent application Ser. No. 11/754,974, filed on May 29, 2007,
entitled "METHOD AND SYSTEM FOR SCREENING USING VOICE DATA AND
METADATA," which in turn claims the benefit of and priority to U.S.
Provisional Applications 60/923,195, filed on Apr. 13, 2007,
entitled "Seeding Techniques and Geographical Optimization Details
for a Fraud Detection System that uses Voiceprints," and
60/808,892, filed on May 30, 2006, entitled "Optimizations for a Fraud
Detection System that uses Voiceprints."
[0009] This application is also a continuation-in-part and claims
benefit of and priority to U.S. Patent Application Attorney Docket
Number PA5872US, filed concurrently herewith on Mar. 8, 2012,
entitled "SYSTEMS, METHODS, AND MEDIA FOR GENERATING HIERARCHICAL
FUSED RISK SCORES." All of the above applications and patents are
hereby incorporated by reference herein in their entirety.
FIELD OF THE TECHNOLOGY
[0010] Embodiments of the disclosure relate to determining fraud
risk from audio signals, and more specifically, but not by way of
limitation, to systems, methods, and media for extracting and using
characteristics of audio signals to determine fraud risk. Signal
processing may include comparing components such as speaker models,
channel models, and/or operational models of a candidate audio
signal to a plurality of types of data stored in various fraudster
databases. Matches between the audio signal and data stored in the
various fraudster databases may indicate that an audio signal is
associated with fraud.
BACKGROUND
[0011] Fraud, such as credit card fraud and identity fraud, is
common. To deal with fraud, enterprises such as merchants and banks
use a variety of fraud detection systems. However, these fraud
detection systems are susceptible to becoming obsolete within a
short time because fraudsters change their methods of perpetrating
fraud in order to maneuver past such fraud detection systems.
SUMMARY
[0012] According to some embodiments, the present technology may be
directed to methods that comprise: (a) receiving an audio signal
and an associated audio signal identifier without regard to fraud
activities; (b) receiving a fraud event identifier associated with
a fraud event; (c) determining a speaker model based on the
received audio signal, using a server system; (d) determining a
channel model based on a path of the received audio signal, using
the server system; (e) updating a fraudster channel database to
include the determined channel model based on a comparison of the
audio signal identifier and the fraud event identifier; and (f)
updating a fraudster voice database to include the determined
speaker model based on a comparison of the audio signal identifier
and the fraud event identifier.
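The enrollment steps (a) through (f) above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `FraudsterDatabases` and `enroll_if_fraud` are hypothetical, the models are placeholder lists of numbers, and the "comparison" of identifiers is assumed here to be a simple equality test against a set of fraud event identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class FraudsterDatabases:
    """Hypothetical in-memory stand-ins for the fraudster channel and voice databases."""
    channel_models: dict = field(default_factory=dict)
    speaker_models: dict = field(default_factory=dict)


def enroll_if_fraud(audio_signal_id, speaker_model, channel_model,
                    fraud_event_ids, databases):
    """Store both models only when the audio signal identifier matches a fraud event.

    Assumption: the comparison is plain set membership; a real system could
    compare account numbers, order numbers, time stamps, and so forth.
    """
    if audio_signal_id not in fraud_event_ids:
        return False
    databases.channel_models[audio_signal_id] = channel_model
    databases.speaker_models[audio_signal_id] = speaker_model
    return True


dbs = FraudsterDatabases()
fraud_events = {"acct-1001"}  # identifiers reported with fraud events (made up)

# Audio is received without regard to fraud; only the matching signal is enrolled.
enrolled = enroll_if_fraud("acct-1001", [0.2, 0.7], [0.1, 0.9], fraud_events, dbs)
skipped = enroll_if_fraud("acct-2002", [0.5, 0.5], [0.3, 0.3], fraud_events, dbs)
```

In this sketch the speaker model and channel model are computed for every call and enrolled later, which mirrors step (a), where audio is received without regard to fraud activities.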
[0013] According to other embodiments, the present technology may
be directed to methods for screening an audio sample that include:
(a) maintaining a list of channel models in a server system, each
channel model belonging to a disqualified candidate and
representing a path of an audio signal associated with an
identifier that has been matched to information associated with an
instance of fraud; (b) receiving a screening request, the screening
request comprising an audio sample for a candidate; (c) comparing
the audio sample with the channel models in the list of channel
models in the server system; and (d) sending a channel score to a
third party, the channel score indicating at least a partial match
between the audio sample and a channel model in the list of channel
models.
[0014] According to some embodiments, the present technology may be
directed to systems for analyzing audio. The systems may comprise:
(a) a memory for storing executable instructions for analyzing
audio; (b) a processor for executing the instructions; (c) a
communications module stored in memory and executable by the
processor to receive an audio signal and an associated audio signal
identifier, and to receive a fraud event identifier associated with
a fraud event; (d) an audio analysis module stored in memory and
executable by the processor to extract signatures from the received
audio signal; and (e) an enrollment module stored in memory and
executable by the processor to compare the audio signal identifier
and the fraud event identifier and based on the comparison to store
in a fraudster database any of: (i) a channel model extracted from
the audio signal using the analysis module; (ii) a speaker model
extracted from the audio signal using the analysis module; and
(iii) combinations thereof.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] The accompanying drawings, where like reference numerals
refer to identical or functionally similar elements throughout the
separate views, together with the detailed description below, are
incorporated in and form part of the specification, and serve to
further illustrate embodiments of concepts that include the claimed
disclosure, and explain various principles and advantages of those
embodiments.
[0016] The methods and systems disclosed herein have been
represented where appropriate by conventional symbols in the
drawings, showing only those specific details that are pertinent to
understanding the embodiments of the present disclosure so as not
to obscure the disclosure with details that will be readily
apparent to those of ordinary skill in the art having the benefit
of the description herein.
[0017] FIG. 1 illustrates a pictorial representation of an
exemplary implementation of a system for fraud detection;
[0018] FIG. 2 illustrates an exemplary audio analysis system for
processing call data;
[0019] FIG. 3 shows a flowchart of an exemplary method for
processing audio signals; and
[0020] FIG. 4 illustrates an exemplary computing system that may be
used to implement embodiments according to the present
technology.
DETAILED DESCRIPTION
[0021] In the following description, for purposes of explanation,
numerous specific details are set forth in order to provide a
thorough understanding of the disclosure. It will be apparent,
however, to one skilled in the art, that the disclosure may be
practiced without these specific details. In other instances,
structures and devices are shown in block diagram form only in
order to avoid obscuring the disclosure.
[0022] Generally speaking, the present technology is directed to
systems, methods, and media for analyzing call data against speaker
models and channel models derived from audio signals and/or
non-audio data to determine a fraud risk for a call event. That is,
when a caller engages an enterprise telephonically (e.g., a
call event) the call data of the call event may be recorded and
analyzed to determine if the call event is likely to be associated
with fraud.
[0023] The term "speaker model" may be understood to comprise a
voice model and/or a language model. A voice model may be
understood to include structural features associated with a speaker
such as tenor, timbre, frequency, overtones, and so forth. A
language model may be understood to comprise features such as word
choice, accent, language, word order, and so forth. The combination
of a voice model and a language model may provide a robust model to
uniquely identify a speaker given an audio signal.
[0024] In some embodiments, call data including features and/or
characteristics of the call data received from candidates may be
compared to fraudster models that are stored in a fraudster
database. The fraudster model may comprise any combination of a
speaker model, a channel model, and an operational model for a
given fraudster. The operational model for a fraudster may comprise
data such as aliases/names utilized, ANIs used, geographical area
of operation (e.g., shipping address, zip code, etc.), fraudulent
activities, and so forth. Each fraudster may be associated with a
fraud identifier that uniquely identifies a fraudster and allows
the fraudster to be tracked.
[0025] Facets of the analysis of the call event may include
determining audio signal characteristics of the audio signal of the
call event such as speaker characteristics and channel
characteristics. These audio signal characteristics may be analyzed
by comparing them to fraud indicators to determine a fraud risk for
the call event. It will be understood that audio signal data
and/or characteristics (voice and non-voice portions) of a call
event may be analyzed by determining whether one or more audio
signal data and/or characteristics match a characteristic of a
landline, VoIP (Voice over Internet Protocol), a cellular phone, or
any other telecommunications path that may be traversed by an
audio signal. In some instances the present technology may decipher
the specific type of communications method (e.g., CDMA, GSM, VoIP,
etc.) employed in the transmission of the audio signal. In
additional embodiments the present technology may determine a call
traversal path by analyzing the delay, jitter, or other artifacts
inherent in the call path. Further details regarding determining a
call traversal path feature are disclosed in PinDrOp: Using
Single-Ended Audio Features To Determine Call Provenance,
Converging Infrastructure Security (CISEC) Laboratory, Georgia
Tech. Information Security Center (GTISC); Authors: Vijay A.
Balasubramaniyan, Aamir Poonawalla, Mustaque Ahamad, Michael T.
Hunter, and Patrick Traynor--which is hereby incorporated herein by
reference in its entirety.
[0026] Advantageously, once the present technology has
characterized the audio signal, the information can be used for
improving risk scoring in one or more of the following non-limiting
examples. In some instances audio signal characteristics and/or
non-audio data may be utilized to detect changes in behavior of an
account, for example, by comparing the audio characteristic of past
calls (for a given account) with the current calls, with risk being
higher depending on the amount and type of a change. Alternatively,
the present technology may compare the audio signal characteristics
with what should be expected based on an automatic number
identification (ANI) associated with the account. Databases may be
compiled in which the ANI can be looked up to determine the type of
phone a given ANI is associated with, and thus, it may be possible
to determine what expected audio signal characteristics should be
present in the audio signal. This information can be compared to
the observed characteristics, and calls that have a mismatch with
expected values may be rated higher for risk.
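As an illustration of the ANI comparison described above, the following Python sketch rates a call higher for risk when the observed channel type disagrees with the phone type on record for the ANI. The lookup table `ANI_PHONE_TYPE`, the function `ani_mismatch_risk`, and the numeric risk values are all hypothetical and chosen only for the example.

```python
# Hypothetical lookup table: ANI -> phone type on record for that number.
ANI_PHONE_TYPE = {
    "+14155550100": "landline",
    "+14155550101": "cellular",
}


def ani_mismatch_risk(ani, observed_channel_type,
                      mismatch_risk=0.8, match_risk=0.1, unknown_risk=0.5):
    """Return an illustrative risk value from expected versus observed channel type.

    The three risk levels are assumptions; the source only states that calls
    with a mismatch may be rated higher for risk.
    """
    expected = ANI_PHONE_TYPE.get(ani)
    if expected is None:
        return unknown_risk  # ANI not in the compiled database
    return match_risk if expected == observed_channel_type else mismatch_risk


# A number registered as a landline whose audio looks like VoIP is rated riskier.
risk_mismatch = ani_mismatch_risk("+14155550100", "voip")
risk_match = ani_mismatch_risk("+14155550100", "landline")
risk_unknown = ani_mismatch_risk("+14155550199", "cellular")
```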
[0027] Additionally, the present technology may be utilized to
associate noise characteristics with fraudster channel models to
reduce false positives and aid in selecting fraudster models that
should be scanned based on this association. These methods may be
employed to "partition" or "segment" the fraudster voiceprint
database to reduce the total number of potential fraudster
voiceprints that may be compared against the voiceprint of the
candidate audio sample.
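The partitioning idea can be sketched as follows. This is a simplified illustration, assuming each enrolled fraudster record carries a single channel label; the record layout and the function names `build_partitions` and `candidates_to_scan` are hypothetical.

```python
from collections import defaultdict


def build_partitions(fraudster_records):
    """Group fraudster voiceprints by the channel label enrolled with them."""
    partitions = defaultdict(list)
    for record in fraudster_records:
        partitions[record["channel"]].append(record)
    return partitions


def candidates_to_scan(partitions, candidate_channel):
    """Return only the voiceprints whose channel matches the candidate's channel."""
    return partitions.get(candidate_channel, [])


# Made-up records; voiceprints are placeholder feature lists.
records = [
    {"fraudster_id": "F1", "channel": "voip", "voiceprint": [0.1, 0.4]},
    {"fraudster_id": "F2", "channel": "cellular", "voiceprint": [0.7, 0.2]},
    {"fraudster_id": "F3", "channel": "voip", "voiceprint": [0.3, 0.3]},
]

partitions = build_partitions(records)
voip_subset = candidates_to_scan(partitions, "voip")
landline_subset = candidates_to_scan(partitions, "landline")
```

A candidate audio sample that arrives over VoIP is then compared against two voiceprints rather than three, which is the reduction in comparisons the paragraph describes.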
[0028] General applications of the present technology allow for the
generation and utilization of a historical set of speaker models
and/or channel models for a particular customer account. The
history may include a speaker model and/or channel model for each
call event for a given customer account. When a fraud report or
fraud event identifier is received, a time stamp associated with
the fraud event may be utilized to determine speaker models and/or
channel models that are proximate the time stamp of the fraud
event. These and other advantages of the present technology are
described infra with reference to the collective drawings, FIGS.
1-4.
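A minimal sketch of selecting historical models near a fraud event's time stamp is shown below. The history layout, the function name `models_near_fraud_event`, and the one-day window are assumptions made for illustration; the source does not specify how proximity is measured.

```python
from datetime import datetime, timedelta


def models_near_fraud_event(call_history, fraud_time, window=timedelta(days=1)):
    """Return history entries whose call time falls within `window` of the fraud event."""
    return [entry for entry in call_history
            if abs(entry["timestamp"] - fraud_time) <= window]


# Hypothetical per-account history: one speaker/channel model pair per call event.
history = [
    {"call_id": "c1", "timestamp": datetime(2012, 1, 3, 10, 0),
     "speaker_model": [0.2], "channel_model": [0.9]},
    {"call_id": "c2", "timestamp": datetime(2012, 2, 14, 9, 30),
     "speaker_model": [0.6], "channel_model": [0.4]},
    {"call_id": "c3", "timestamp": datetime(2012, 2, 15, 8, 0),
     "speaker_model": [0.6], "channel_model": [0.5]},
]

fraud_time = datetime(2012, 2, 14, 12, 0)  # time stamp from a fraud report
nearby = models_near_fraud_event(history, fraud_time)
nearby_ids = [entry["call_id"] for entry in nearby]
```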
[0029] Referring now to FIG. 1, a pictorial representation of an
exemplary implementation of a system for fraud detection is shown,
in accordance with various embodiments of the present disclosure.
As can be seen from FIG. 1, an enterprise call center 100, a fraud
detection system 102 (hereinafter "FDS 102"), and a plurality of
callers 104 are shown. The call center 100 may receive and process
calls on behalf of an enterprise. The enterprise may include a
merchant, an insurance company, an affiliate of a company, a bank,
a telecommunication company, a cellular service provider, a credit
card company, a credit card service company, and the like. The call
center may be located at the enterprise and/or at a separate
entity. In some embodiments, the enterprise is cloud-based.
[0030] According to some embodiments, the call center 100 may
receive calls from the plurality of callers 104 (hereinafter "the
callers 104") for goods and/or services provided by the enterprise.
The callers 104 may call the call center 100 using a VoIP/Public
Switched Telephone Network (PSTN)/mobile network 106A. The calls
from the callers 104 may enter an automatic call distributor 108,
which distributes calls across individual agents 110a-n. Call
events may be recorded by a recording device 112 of the call center
100 for processing in real time and/or later for fraud detection.
It will be understood that the callers 104 may include legitimate
customers and fraudsters.
[0031] The callers 104 may request call center agents (who receive
phone calls) to process transactions related to goods/services. In
some embodiments, the call center 100 may apply one or more
business rules to determine whether to process a
transaction directly or to have a fraud check performed on the
caller.
[0032] The term "call data" for a call event or a segment of the
call event may be understood to include not only audio data (e.g.,
audio signals, or call audio data) for the call event, but
non-audio data for the call event. The term "call audio data" for
the call event or segment of the call event may be understood to
include the audio portion of the call data (e.g., audio signals).
"Call audio data," "audio sample," "audio signal," and "audio data"
may be used interchangeably. The above-described examples of audio
signal data are to be understood to be non-limiting, and one of
ordinary skill in the art will appreciate that many other types of
audio signal may likewise be utilized in accordance with the
present technology. Additionally, audio information or data may be
extracted from call audio data including both speaker models that
represent the voice of a speaker and channel models that represent
a communication profile of an audio path for a channel used by the
speaker. The communications profile may include noise models,
background noise, transfer path functions (as will be described in
greater detail infra), as well as other representative
characteristics that may be determined for a communications channel
that would be known to one of ordinary skill in the art.
[0033] Examples of non-audio data include identification (e.g., the
phone number the caller called from), dialed number
identification service information (e.g., the phone number the caller
dialed), agent identification (e.g., the agent that handled the
call), a timestamp (e.g., the date and time of the call), type of call
(e.g., subject matter of the call), an account or order identification
(e.g., some unique transaction or account identifier that the call was
in reference to), a shipping zip code (e.g., if a product was to
be delivered to a particular location), and any other
available data that may be relevant to the call.
[0034] Additional examples of non-audio data include in various
combinations a call identification that includes a unique
identifier that identifies the call, an automatic number
identification that represents the number that initiated a call
event, a dialed number identification service that comprises a
dialed number (e.g., telephone number, short code, etc.), an agent
identification that specifies the call agent associated with the
call event, a queue identifier that identifies the telephony queue
into which a call event has been directed by the call center 100
(e.g., sales, technical support, fraud review, etc.), a timestamp
that indicates a date and time when the call event was initiated, a
call center identifier that indicates the call center which
initially received the call event, and/or the like.
[0035] For a call in reference to an account and/or transaction,
examples of non-audio data include an account number that specifies
the account number that the call event was in reference to, a
transaction number that specifies a transaction that the call was
in reference to, names associated with the account (first, last,
etc.), a social security number or other government-issued
identification number, an address (current and/or previous), a
telephone number (current and/or previous), an email address, an account
type (business, consumer, reseller, etc.), an account opening date,
a credit limit, and a list of transactions associated with the account.
[0036] Examples of transaction non-audio data include a transaction
identifier that uniquely identifies the transaction, a timestamp
specifying a date and time for the transaction, a transaction
disposition (e.g., change of address, account balance check,
account payment details, account plan change, and so forth), a
shipping address, and combinations thereof.
[0037] For a call in reference to an order, examples of non-audio
data include an order number such as a unique order identification,
a list of items ordered, an order value, a timestamp, a name, a
shipping address, an email address, a phone number, a shipping
method, billing details, and combinations thereof. Any of the above
non-audio data may be used as an audio signal identifier.
[0038] All of the aforementioned types of data including audio
and/or non-audio data may be employed to generate risk scores for a
call event, as will be described in greater detail infra.
[0039] Many types of customer metadata may be determined from an
evaluation of the above mentioned call data. Exemplary types of
metadata include account, transaction, and/or order metadata, along
with call metadata. Additional data may also be extracted from
non-audio data, such as patterns or relationships.
[0040] It will be understood that the channel characteristics for a
segment of call audio data may be sufficiently unique to determine
that separate segments of call audio data belong to two separate
speakers. For example, a customer calling into an enterprise may
have channel characteristics that are inherently distinctive
relative to the channel model associated with call agents of the
enterprise. Therefore, differences in channel characteristics may
alone suffice as a basis for diarizing and separating segments of
call audio data.
[0041] The term "speaker model" may be understood to include a
voice model representing the unique characteristics of an
individual's voice, and/or a language model representing linguistic
characteristics of the speaker. The voice model may include a
collection of features that are extracted from an audio signal of
the individual's voice and encoded within a specific statistical
framework. In various embodiments, these features include cadence,
tone, rate of speech, spectral characteristics, and/or other
descriptive information about the voice and vocal tract of the
speaker that describes the speaker (separately from the words
spoken). Other synonyms for a voice model may include, but are not
limited to, a voice signature, a voice print, a voice portion of a
speaker model, and also in some instances, simply a speaker voice.
In various embodiments, the language model comprises features
or characteristics (such as the words spoken and speech choices
made by the speaker) and a statistical framework for encoding those
features. Examples of a statistical framework include the
probability of an occurrence of a string of words, and how that
probability is calculated. In various embodiments, the language
model includes language(s) spoken, word choice, word order, accent,
grammar, diction, slang, jargon, rate of speech, and/or the like.
It is noteworthy that in some instances information in addition to
a speaker model (voice model and language model) can be extracted
from call audio data. For example, a channel model may be extracted
from call audio data, as described elsewhere herein. Further, word
spotting or word recognition may be used to extract data, for
example, name, account number, social security number, address,
and/or the like from call audio data.
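By way of illustration only, the statistical framework mentioned
above (the probability of an occurrence of a string of words, and
how that probability is calculated) can be sketched with a bigram
model. This is a minimal sketch and not the disclosed
implementation: the add-one smoothing, the "<s>" start token, the
function names, and the two-sentence corpus are all assumptions
chosen for brevity.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Count unigrams and bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words  # "<s>" marks the start of a string
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def string_probability(words, unigrams, bigrams, vocab_size):
    """Probability of a word string using add-one smoothed bigrams."""
    tokens = ["<s>"] + words
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
    return prob

# Hypothetical transcripts attributed to one speaker.
corpus = [["change", "my", "address"], ["change", "my", "plan"]]
unigrams, bigrams = train_bigram_model(corpus)
vocab_size = len(unigrams)

p_seen = string_probability(["change", "my", "address"],
                            unigrams, bigrams, vocab_size)
p_unseen = string_probability(["address", "my", "change"],
                              unigrams, bigrams, vocab_size)
```

In this sketch a word order the speaker has used before receives a
higher probability than a reordering of the same words, which is
the kind of signal a language model may contribute to a speaker
model.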
[0042] In some embodiments, all callers are recorded automatically,
and an audio signal and/or non-audio data is stored for all calls.
In other embodiments, a portion of the calls are recorded and/or
stored. Additionally, the audio signal may be time stamped. Call
audio data may be streamed for processing in real time and/or
recorded and stored for processing.
[0043] The present technology may also enroll the stored voice
signals determined to correspond to a fraudster into a blacklist
that includes speaker/channel models determined to be associated
with fraudsters. For additional details regarding the enrollment of
speaker models into a blacklist see, e.g., U.S. patent application
Ser. Nos. 11/404,342, 11/754,974, 11/754,975, 12/352,530,
12/856,037, 12/856,118, 12/856,200, which are all hereby
incorporated by reference herein in their entireties. Similarly,
the present technology may enroll the stored channel signals
determined to correspond to a fraudster into a blacklist that
includes channel models determined to be associated with
fraudsters.
[0044] A call database 114 may store call data such as audio and
non-audio data. Customer accounts for each customer may be stored
in or linked to the call database 114. In some embodiments, various
elements of the call data (including audio data and/or non-audio
data) are stored in multiple separate databases and linked across
those databases. For example, non-audio data in one database may be
linked to the customer account in another database using a call
identifier that associates a particular call event with a customer
account. In another example, a call event regarding an inquiry
about a particular order may be linked to the order and/or an
account associated with the order. Both legitimate and fraudulent
call data may be linked to the customer account. In some
embodiments, the call database 114 is a collection of multiple
databases, for example, customer account database, order database,
customer support database, RMA (returned merchandise authorization)
database, warranty database, white list, black list, customer
history database, and/or the like.
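The linking of call data across separate databases by way of a call
identifier can be sketched as follows. The two dictionaries stand
in for separate databases, and every key, field name, and value
shown is hypothetical.

```python
# Hypothetical stand-ins for two separate databases.
call_records = {
    "call-001": {"account_id": "acct-17", "ani": "5550100"},
    "call-002": {"account_id": "acct-42", "ani": "5550199"},
}
customer_accounts = {
    "acct-17": {"name": "A. Customer", "account_type": "consumer"},
    "acct-42": {"name": "B. Reseller", "account_type": "reseller"},
}

def link_call_to_account(call_id, call_records, customer_accounts):
    """Follow a call identifier to the customer account it references."""
    call = call_records.get(call_id)
    if call is None:
        return None  # unknown call identifier
    account = customer_accounts.get(call["account_id"])
    return {"call_id": call_id, "call": call, "account": account}

linked = link_call_to_account("call-002", call_records, customer_accounts)
```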
[0045] In some embodiments, the call center 100 may include a fraud
management system 116 that receives data indicative of potential or
actual fraudulent activities from the FDS 102. The fraud management
system 116 may utilize the fraud data provided by the FDS 102,
along with other enterprise-specific information, to process and
remedy fraudulent account activity.
[0046] A file transfer server 118 of the call center 100 may
communicate recorded, live, or stored audio signals to the FDS 102
using Internet/LAN 106B. In some instances the audio signals and/or
non-audio data may be streamed to the FDS 102 via the file transfer
server 118. The Internet/LAN 106B may utilize a secure
communications protocol. File transfer server 118 may communicate
audio signal and/or non-audio data to an audio processing system,
hereinafter "system 200" via an application programming interface
("API") or any other suitable data transmission protocols, which
may be secure or insecure. The audio signal and/or non-audio data
may be communicated via Internet or LAN. Additional operational
details of the system 200 are described in greater detail with
regard to FIG. 2.
[0047] It will be understood that the FDS 102 may detect any type
of fraud. However, for the purposes of brevity, the present
disclosure focuses on fraud perpetrated by fraudsters utilizing
telephonic devices. While not shown, the FDS 102 may include
additional modules or engines that determine fraud and generate
fraud reports. Additional details regarding the FDS 102 have been
omitted so as not to obscure the description of the present
technology. See, e.g., U.S. Patent Application Attorney Docket
Number PA5872US, filed concurrently herewith on Mar. 8, 2012,
entitled "SYSTEMS, METHODS, AND MEDIA FOR GENERATING HIERARCHICAL
FUSED RISK SCORES."
[0048] The enrolled speaker models and/or channel models in one or
more fraudster databases/blacklists may be used as a corpus that
may be queried against for comparing voice and/or channel data of a
candidate audio sample.
[0049] The enrollment of speaker models into a fraudster database
uses one or more precursor fraud databases. A precursor fraud
database may be seeded with audio samples and associated audio
sample identifiers collected without regard to fraudulent activity
associated with the audio samples. The audio sample identifiers may
be matched with identifiers in a fraud report. Speaker models
extracted from audio in the precursor fraud database that is
associated with the matched audio sample identifiers may be
enrolled into the fraudster database. In various embodiments, the
audio sample identifiers include any type of information that links
the audio signal with the fraud identifiers. The audio sample
identifiers include one or a combination of a call identifier, a
customer account, a timestamp, identity information (name, social
security number, etc.), agent information, and/or a communications
device, such as a cellular telephone, a landline, or computing
system that communicates via VoIP protocols. Information for a
communications device may include data such as ANI, IMEI, caller
identification, and so forth. As will be discussed below, channel
models extracted from audio in the precursor fraud database that is
associated with the matched audio sample identifiers may be
enrolled into the fraudster database in a manner similar to speaker
models.
[0050] Further details regarding precursor fraud databases as well as
the enrollment of fraudster voice signature/speaker models into a
fraudster database/blacklist using precursor fraud databases are
described in U.S. patent application Ser. Nos. 11/404,342,
11/754,974, 11/754,975, 12/352,530, 12/856,037, 12/856,118,
12/856,200, all of which are hereby incorporated by reference in
their entirety herein. Channel model enrollment may be performed in
a similar manner to speaker model enrollment, as described in these
U.S. patent applications.
[0051] A channel model may be understood to include information
that corresponds to the traversal path traveled by an audio sample.
This information may be referred to as a "distortion" of a source
of the received audio signal. Other terms for referring to this
information include "noise" and "artifact." An example of noise
includes random and/or systematic features or artifacts in the
audio sample that are present due to background or ambient noise
generated from one or more sources. For example, noise features for
an agent may include background voices from other call agents that
are proximate the agent. Examples of artifacts include filtering,
error recovery, packet handling, segmentation, beat frequencies,
and so forth. One of ordinary skill in the art will appreciate that
the terms noise, artifact, distortion, and similar terms may be
utilized interchangeably in some contexts.
[0052] In some embodiments, an audio signal and/or non-audio data
for call events is stored in a precursor database for enrollment
into a fraudster database, see e.g., U.S. patent application Ser.
Nos. 11/404,342, 11/754,975 and 12/856,037, which are all hereby
incorporated by reference herein in their entirety.
[0053] FIG. 2 illustrates the system 200 which may be utilized to
process candidate audio samples to determine potential fraud.
Additionally, the system 200 may enroll channel and/or speaker
models that have been determined as being associated with fraud
into one or more fraudster databases such as the fraudster
database, fraudster voice database and/or fraudster channel
database. In various embodiments, one or more fraudster databases,
fraudster voice databases and/or fraudster channel databases may be
employed. Generally speaking, the system 200 may include a
diarization module 202 and an analysis module 204.
[0054] It is noteworthy that the system 200 may include additional
modules, engines, or components, and still fall within the scope of
the present technology. As used herein, the term "module" may also
refer to any of an application-specific integrated circuit
("ASIC"), an electronic circuit, a processor (shared, dedicated, or
group) that executes one or more software or firmware programs, a
combinational module circuit, and/or other suitable components that
provide the described functionality. In other embodiments,
individual modules of the system 200 may include separately
configured web servers.
[0055] In some embodiments, the system 200 may be implemented in a
cloud computing environment. Generally speaking, a cloud computing
environment or "cloud" is a resource that typically combines the
computational power of a large grouping of processors and/or that
combines the storage capacity of a large grouping of computer
memories or storage devices. For example, systems that provide a
cloud resource may be utilized exclusively by their owners, such as
Google.TM. or Yahoo.TM.; or such systems may be accessible to
outside users who deploy applications within the computing
infrastructure to obtain the benefit of large computational or
storage resources.
[0056] The cloud may be formed, for example, by a network of
servers with each server providing processor and/or storage
resources. These servers may manage workloads provided by multiple
users (e.g., cloud resource customers or other users). Typically,
each user may place workload demands upon the cloud that vary in
real-time, sometimes dramatically. The nature and extent of these
variations typically depends on the type of business associated
with the user.
[0057] The present technology may leverage the computational
resources of distributed computing (e.g., cloud computing systems)
to facilitate efficient processing of call data.
[0058] It is envisioned that the system 200 may cooperate with the
FDS 102 or may, in some embodiments, function as a stand-alone
audio processing system that may be utilized by an enterprise,
separately from the FDS 102.
[0059] In other embodiments, a portion (or potentially all
portions) of system 200 may be integrated into FDS 102, while in
other embodiments, the constituent sub-modules/components of the
system 200 may be remotely distributed from one another in a remote
computing arrangement, wherein each of the modules may communicate
with one another via the Internet 106B utilizing any one (or
combination) of a number of communications protocols or
communications mechanisms (e.g., API, HTTP, FTP, etc.).
[0060] An audio signal is received by the diarization module 202
from any of one or more originating sources, such as file transfer
server 118, or may be received directly from callers 104 (see FIG.
1). Again, the audio signal may include not only voice data, but
also channel data, and metadata associated with the voice data and
the channel data.
[0061] Upon receiving call data, the diarization module 202 may
diarize the audio signal into one or more segments. It will be
understood that a segment of the audio signal may include voice
data, channel data, and metadata for a unique speaker.
[0062] It is noteworthy that in some instances, the system 200 may
receive diarized audio signals from the enterprise, such as when
the diarization module 202 is associated with the call center 100.
The system 200 may receive the diarized audio signals as recorded
data or streamed data. Moreover, the call center 100 may record
incoming call data (also referred to as the calling leg) and the
outgoing call data (also referred to as the called leg) separately
from one another. These two separate legs of call data may be
stored and transmitted or optionally streamed to the FDS 102.
Either of these two separate legs may include more than one voice.
For example, agent 1 in the called leg may ask agent 2 in the
called leg to speak with the caller on the calling leg.
[0063] According to some embodiments, the analysis module 204
comprises a communications module 206, an audio analysis module
208, an enrollment module 210, and a scoring module 212. It is
noteworthy that the analysis module 204 may include additional or
fewer modules, engines, or components, and still fall within the
scope of the present technology. Additionally, the functionalities
of two or more modules may be combined into a single module.
[0064] Generally speaking, the analysis module 204 may be executed
upon receiving a fraud event identifier that indicates that a fraud
event has occurred. Non-limiting examples of fraud event
identifiers may include a fraud report that comprises one or more
fraud events. That is, a fraud report may include multiple
instances of fraud events. A fraud event identifier may specify
details regarding an instance of fraud, such as the customer
account which was defrauded, or other identifying information that
allows the system to link a fraud event to enterprise related
information, such as customer accounts, call queues, phone numbers,
and so forth. The fraud event may also include a time stamp that
identifies an approximate day and/or time that the fraud event
occurred.
[0065] Once an audio signal has been received, the audio signal may
be processed by execution of the audio analysis module 208.
Generally speaking, the audio analysis module 208 may parse through
the data included in the audio signal and determine both speaker
models and channel models for the audio signal. The term
"determine" may also include extract, calculate, analyze, evaluate,
and so forth.
[0066] Again, the speaker model may be generated by initially
generating a voice model and a language model for the call data. If
the call data has already been diarized it may be inferred that the
voice model, language model and also the speaker model are
associated with a unique speaker. The speaker model may provide a
robust and multifaceted profile of the speech of a particular
speaker.
[0067] With regard to analyzing the audio signal, the audio
analysis module 208 may employ speaker recognition and/or speech
recognition. In general, speaker recognition may attempt to
recognize the identity of the speaker (e.g., recognition of the
voice of the speaker), whereas speech recognition refers to the
process of recognizing what words have been spoken by the
speaker.
[0068] In addition to determining a speaker model and a channel
model for the audio signal, the audio analysis module 208 may also
evaluate non-audio data, such as an audio signal identifier
associated with the audio signal. It will be understood that each
call event may be assigned an audio signal identifier that uniquely
identifies a call event. The identifier may be used for tracking
the call event and data associated with the call event during
evaluation and/or enrollment into a fraudster database.
[0069] As mentioned above, a channel model may include information
regarding the path that was traversed by an audio sample (e.g., the
path between the caller and the call agent or enterprise system).
The audio analysis module 208 may evaluate and model the delay
present in the audio signal to characterize the path taken by the
audio signal. In addition to modeling delay, the audio analysis
module 208 may model jitter, echo, artifacts (such as artifacts
introduced by audio compression/encoding techniques), error
recovery, packet loss, changes to the signal bandwidth, spectral
characteristics, and/or other audio artifacts that occur at
switching boundaries. With particular regard to VoIP paths,
discrete devices (e.g., routers, gateways, servers, computing
devices, etc.) involved in the transmission of VoIP data may also
imprint artifacts in an audio sample. The channel model also can
model handset characteristics such as microphone type.
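As a rough illustration of the channel characteristics listed
above, the following sketch summarizes delay, jitter, and packet
loss for a VoIP path. It assumes that per-packet send and receive
times are available, which the present disclosure does not
require; an embodiment may instead estimate such quantities from
the audio signal itself. The function name, the jitter definition
(mean absolute change between consecutive delays), and the sample
values are assumptions.

```python
from statistics import mean

def channel_features(packets, expected_count):
    """Summarize delay, jitter, and packet loss for one audio path.

    packets: list of (send_time_s, receive_time_s) pairs for the
    packets that actually arrived, in arrival order.
    """
    delays = [rx - tx for tx, rx in packets]
    if len(delays) > 1:
        # Jitter taken here as the mean absolute change in delay.
        jitter = mean(abs(b - a) for a, b in zip(delays, delays[1:]))
    else:
        jitter = 0.0
    return {
        "mean_delay": mean(delays),
        "jitter": jitter,
        "packet_loss": 1 - len(packets) / expected_count,
    }

# Hypothetical timings: four of five expected packets arrived.
packets = [(0.00, 0.10), (0.02, 0.14), (0.04, 0.13), (0.06, 0.18)]
features = channel_features(packets, expected_count=5)
```

A feature dictionary of this kind could serve as one simple form
of channel model for later comparison against stored channel
models.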
[0070] The audio analysis module 208 may also be adapted to utilize
voice changer detection. That is, if a voice changer has been
utilized in the generation of a candidate audio sample, audio
artifacts may persist within the audio signal as it propagates to
the enterprise. Audio signal characteristics may be correlated with
signal tampering to detect the use of a voice changer.
[0071] In sum, the channel model may include a representation of
the many types of artifacts and/or distortions (e.g., degradations,
modifications, etc.) of the audio signal as it traverses along a
given path. These distortions may be utilized to determine if the
call originated from a particular source (e.g., geographic region
such as a country), passed through a cellular telephone network, or
was subjected to many other types of distorting processes/features that impose
unique noise signatures on the audio signal.
[0072] According to some embodiments, the enrollment module 210 is
executed to perform a comparison of the audio signal identifier
and/or other associated non-audio data in a precursor database,
with the fraud event identifier and/or other associated non-audio
data in the fraud report. If there is a match then the enrollment
module 210 may update a fraudster voice database to include a
speaker model associated with the audio signal identifier.
Similarly, the enrollment module 210 may update a fraudster channel
database to include a channel model associated with the audio
signal identifier. The speaker model and/or channel model may be
determined or extracted by the audio analysis module 208 before or
after the enrollment module 210 is executed to perform a comparison
of the audio signal identifier and the fraud event identifier. In
some embodiments, the enrollment module 210 may find a match
between a fraud event identifier and an audio signal identifier for
multiple speaker models in the precursor database.
[0073] If a match is found between an audio signal identifier
(and/or other non-audio data) and a fraud event identifier, the
speaker models, voice models, language models and/or the channel
models associated with the audio signal identifier may be
automatically enrolled into a blacklist.
[0074] In some instances, a speaker model and/or a channel model
for a call event may be automatically enrolled into a whitelist.
For example, a customer may be explicitly prompted to enroll by
stating a particular phrase multiple times. In another example, no
match may be found between an audio signal identifier (and/or other
non-audio data) and a set of fraud events. The speaker model and
channel model extracted from the call data for the audio signal
identifier may then be automatically enrolled into the whitelist.
Enrollment that is conducted without the involvement of the speaker
may be referred to as "passive enrollment."
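The identifier comparison performed by the enrollment module 210
in the preceding paragraphs can be sketched as follows. This is a
simplified illustration under stated assumptions: the precursor
database is a dictionary keyed by audio signal identifier, the
fraud report is a list of identifiers, only exact identifier
matches are considered, and the model values are placeholders.

```python
def enroll_models(precursor_db, fraud_event_ids):
    """Split precursor entries into blacklist and whitelist entries.

    Entries whose audio signal identifier appears in the fraud report
    are enrolled into the blacklist; the remainder are passively
    enrolled into the whitelist.
    """
    fraud_ids = set(fraud_event_ids)
    blacklist, whitelist = {}, {}
    for audio_id, models in precursor_db.items():
        if audio_id in fraud_ids:
            blacklist[audio_id] = models
        else:
            whitelist[audio_id] = models
    return blacklist, whitelist

# Hypothetical precursor database and fraud report.
precursor_db = {
    "call-001": {"speaker_model": [0.2, 0.7], "channel_model": [0.1, 0.3]},
    "call-002": {"speaker_model": [0.9, 0.1], "channel_model": [0.8, 0.5]},
    "call-003": {"speaker_model": [0.4, 0.4], "channel_model": [0.2, 0.2]},
}
fraud_report = ["call-002", "call-999"]

blacklist, whitelist = enroll_models(precursor_db, fraud_report)
```

Fraud identifiers with no corresponding audio in the precursor
database (such as "call-999" here) simply produce no enrollment.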
[0075] The enrollment module 210 may also store a channel model for
a candidate audio sample in a whitelist database when an at least
partial match between an audio signal identifier (and/or other
non-audio data) for the candidate audio sample and a fraud event
identifier cannot be determined. Stated otherwise, because the
channel model extracted from a candidate audio sample does not
match any fraud event identifiers stored in a fraudster database,
it can be inferred that the candidate audio sample is not
associated with a fraudster. For further details regarding the use
of a whitelist see, e.g., U.S. patent application Ser. No.
12/352,530 which is hereby incorporated herein by reference in its
entirety.
[0076] According to some embodiments, enrollment of a speaker model
or a channel model may be affected by comparing a time stamp
associated with a fraud event to a time stamp associated with the
audio sample. That is, audio samples with time stamps that are
temporally adjacent to time stamps of fraud events may more likely
be associated with fraudsters than audio samples with time stamps
that are temporally remote from fraud events. Audio samples with
time stamps that are temporally remote from fraud events may not be
automatically enrolled into the fraudster database, but may be
flagged as subject to further review before being enrolled.
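The time stamp comparison described above can be sketched as
follows. The 24-hour window, the function name, and the two
outcome labels are assumptions made for illustration; the
disclosure does not specify a particular window.

```python
from datetime import datetime, timedelta

def enrollment_decision(audio_time, fraud_time, window=timedelta(hours=24)):
    """Enroll temporally adjacent samples; flag remote ones for review."""
    gap = abs(audio_time - fraud_time)
    return "enroll" if gap <= window else "review"

fraud_time = datetime(2012, 3, 1, 18, 0)
near = enrollment_decision(datetime(2012, 3, 1, 10, 0), fraud_time)
far = enrollment_decision(datetime(2012, 2, 1, 10, 0), fraud_time)
```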
[0077] In some instances, the present technology may utilize
active/dynamic scoring to further process call data to reduce the
likelihood that a speaker model or channel model of an audio signal
is mistakenly enrolled into a fraudster database.
[0078] According to other embodiments, the enrollment module 210
may maintain a list of channel models, where each channel model
belongs to a disqualified candidate (e.g., fraudster). As mentioned
previously, the channel model may represent a path traversed by an
audio signal. The audio signal may be provided with an identifier
(and/or non-audio data) that links the audio signal to a customer
account, a communications device, and/or a specific fraudster.
Stated otherwise, when a disqualified candidate is determined,
information indicative of that disqualified candidate may be stored
in a fraudster database.
[0079] In other examples, the scoring module 212 may compare the
candidate audio sample to one or more speaker models stored in a
fraudster database to generate a voice match score. These voice
match scores represent an at least partial match between the
candidate audio sample and one or more speaker models stored in the
fraudster voice database. These match scores may represent the
degree of similarity between the candidate audio sample and the one
or more speaker models.
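One simple way to express a degree of similarity between a
candidate audio sample and stored speaker models is cosine
similarity over feature vectors, sketched below. Representing a
speaker model as a fixed-length vector and using cosine similarity
are assumptions made for illustration; the disclosure does not
limit the scoring module 212 to this measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def voice_match_scores(candidate, fraudster_models):
    """Score the candidate against every enrolled speaker model."""
    return {name: cosine_similarity(candidate, model)
            for name, model in fraudster_models.items()}

# Hypothetical feature vectors.
candidate = [1.0, 2.0, 3.0]
fraudster_models = {
    "fraudster_a": [2.0, 4.0, 6.0],
    "fraudster_b": [3.0, -1.0, 0.0],
}
scores = voice_match_scores(candidate, fraudster_models)
best_match = max(scores, key=scores.get)
```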
[0080] According to additional embodiments, the scoring module 212
may be executed to generate various types of risk scores for
constituent parts of the call data. With reference to generating
risk scores for a candidate audio sample (e.g., signal), the
scoring module 212 may compare the candidate audio sample to a
variety of call data components that have been stored in fraudster
databases. For example, the scoring module 212 may compare the
candidate audio sample to one or more channel models stored in a
fraudster database to generate a match score and/or a channel risk
score. Generally speaking, the risk score for a call event may
represent the likelihood that a call event is associated with an
instance of fraud, or even a particular fraudster. For example, a
number between 0 and 1000 or between 0 and 10 may be generated by
the scoring module 212; the higher the number, the higher the risk.
The scoring module 212 may employ any of a variety of functions
and/or mathematical relations for computing match scores and risk
scores. Examples include averaging, weighted averaging, min/max
selection, weighted min/max selection, correlation, likelihood
function, and/or the like. In some embodiments a likelihood
function is defined as the likelihood that a particular candidate
call is fraud, or the likelihood that a particular candidate call
is a specific fraudster, etc. The risk score may be used to
evaluate both the call audio data (e.g., audio signals, extracted
speaker characteristics, extracted channel characteristics) and
call non-audio data such as account, transactional, order, or other
call related records and metadata.
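Two of the combining functions named above, weighted averaging and
max selection, can be sketched together with a mapping onto a 0 to
1000 risk scale. The assumption that match scores lie between 0
and 1, the weights, and the function names are all hypothetical.

```python
def weighted_average(scores, weights):
    """Weighted average of match scores."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

def to_risk_scale(match_score, scale=1000):
    """Clamp a match score to [0, 1] and map it onto 0..scale."""
    return round(max(0.0, min(1.0, match_score)) * scale)

# Hypothetical voice and channel match scores for one call event.
voice_match, channel_match = 0.9, 0.5

risk_weighted = to_risk_scale(
    weighted_average([voice_match, channel_match], weights=[3, 1]))
risk_max = to_risk_scale(max(voice_match, channel_match))
```

Here the weighted average favors the voice match score three to
one, whereas max selection reports the single strongest match; in
both cases a higher number indicates higher risk.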
[0081] It will be understood that in some instances, the particular
fraudster voice database that is selected may be based upon a
comparison between the candidate audio sample and a channel model
stored in the fraudster channel database. This type of analysis
helps to "partition" the fraudster database into subsets or
segments, where the candidate audio sample may be compared to more
relevant samples, rather than generally comparing the audio sample
to the entire fraudster database. Such partitioning may reduce the
time required to determine a risk score for a speaker model, thus
enhancing real-time detection of fraudsters.
[0082] Stated otherwise, many speaker models may be maintained in a
collection of various fraudster databases and/or fraudster voice
databases. The scoring module 212 may be prevented from scanning
each and every database until an at least partial match is
determined. Thus, multiple fraudster databases may be scanned for a
match, while allowing the scoring module 212 to select a subset of
the databases to utilize based on the scan, rather than using all
data included in the entire collection of fraudster databases for
scoring. Similarly, a fraudster database may include multiple
subsets, partitions or segments and may be scanned globally. Based
on the results of the global scan, the scoring module 212 may
select a segment or some of the multiple segments of the databases
for scoring rather than all segments included in the fraudster
database.
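The partitioning described above can be sketched as a two-stage
lookup: a channel comparison first selects a segment of the
fraudster database, and only the speaker models in that segment
are then scored. The partition names, the vector form of the
channel models, and the squared-distance comparison are
assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def select_partition(candidate_channel, partitions):
    """Return the name of the partition with the closest channel model."""
    return min(
        partitions,
        key=lambda name: squared_distance(
            candidate_channel, partitions[name]["channel_model"]),
    )

# Hypothetical partitioned fraudster database.
partitions = {
    "cellular": {
        "channel_model": [0.9, 0.1],
        "speaker_models": {"f1": [0.3, 0.3], "f2": [0.6, 0.1]},
    },
    "voip": {
        "channel_model": [0.1, 0.8],
        "speaker_models": {"f3": [0.5, 0.5]},
    },
}

chosen = select_partition([0.2, 0.7], partitions)
models_to_score = partitions[chosen]["speaker_models"]
```

Only the speaker models in the chosen partition are compared with
the candidate audio sample, which reduces the number of
comparisons relative to scanning every model.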
[0083] These optimization techniques may be employed utilizing
other risk scores. For example, a fraudster channel database may be
selected based upon a voice match score. In other embodiments, the
fraudster voice database may be selected based upon the channel
match score. In another embodiment, fraudster models may be
maintained in a single fraudster database, and an analysis of the
call data may aid in determining a subset of fraudster models to be
used for comparison to the candidate audio sample. A subsequent
generation of channel and voice match scores may be based on the
comparison of the call data against the subset of the fraudster
models, rather than the entire fraudster database. One of ordinary
skill in the art will appreciate that other permutations and/or
variations of the optimization concepts described herein may
likewise be utilized in accordance with the present disclosure.
[0084] In other embodiments, the scoring module 212 may combine a
channel match score and a voice match score to create an audio
sample risk score. These match scores may be fused together to
generate a fused risk score that represents a more comprehensive
risk score for the candidate audio sample than would be available
by considering the channel match score and the voice match score
alone. Specific details for generating fused risk scores for call
data are described in co-pending U.S. Patent Application Attorney
Docket Number PA5872US, filed concurrently herewith on March XX.
2012, entitled "SYSTEMS, METHODS, AND MEDIA FOR GENERATING
HIERARCHICAL FUSED RISK SCORES," which is hereby incorporated by
reference herein in its entirety.
[0085] The scoring module 212 may also be configured to select a
whitelist database, based on a channel match score. It is
noteworthy that an entry in a whitelist includes one or more
qualified candidates that are associated with a customer account.
More specifically, in some instances a speaker model for a
qualified candidate may be stored in the whitelist database. A
candidate audio sample may be compared to speaker models included
in the whitelist database to determine if the candidate is a
qualified candidate. If the scoring module 212 is unable to match
the candidate audio sample with a speaker model included in the
database, this does not automatically indicate that the candidate
is a disqualified candidate.
[0086] By way of non-limiting example, a husband and wife may be
customers associated with the same credit card account. The
whitelist database may only include a speaker model for the wife.
Therefore, when the husband calls the credit card entity, an audio
sample collected for the husband and compared against the whitelist
database may not match the speaker model associated with the
account. An alert may be provided to the caller agent that the
speaker model for the candidate does not match, but upon gathering
additional information, the caller agent may ultimately verify the
candidate as a legitimate account holder. The collected audio
sample for the husband may be stored in the whitelist database and
associated with the customer account.
[0087] Additionally, the scoring module 212 may be executed to
compare a candidate audio sample to the whitelist database to
generate a whitelist match score. It may then incorporate this
match score into the comprehensive audio sample risk score.
[0088] When a screening request is received by the communications
module 206, the audio analysis module 208 may be executed to
process an audio sample included in the request. More specifically,
the audio analysis module 208 may compare the audio sample with
channel models included in the list maintained by the enrollment
module 210 in the fraudster channel database.
[0089] One or more channel match scores may be generated by the
scoring module 212 that indicate an at least partial match between
the audio sample and one or more channel models in the fraudster
channel database.
[0090] Similarly to channel models, the audio sample may be
compared to a list of speaker models that are associated with
disqualified candidates in the fraudster voice database and/or to
qualified candidates in the whitelist database. Voice match scores
may be generated based on the comparisons. Audio sample match
scores may be generated based on the voice match score and the
channel match score. Voice match scores, channel match scores,
and/or audio sample match scores may be used to generate risk
scores for the audio sample, or may be provided to a third party
for review. In some instances, a voice match score, channel match
score, and/or audio sample match score may be provided to a call
agent that is currently speaking with the speaker from which the
audio sample was captured. That is, the system 200 may operate in
near-real-time such that risk scores based on audio samples may be
obtained during a transaction between a caller agent and a
candidate. Risk scores generated from various match scores (voice
match scores, channel match scores, and/or audio sample match
scores) may be generated and provided to the caller agent to assist
the caller agent in conducting the transaction. Upon receiving a
risk score that indicates a high degree of risk, the caller agent
may prompt the candidate for further information, may flag the call
event for further review, and so forth. Conversely, upon receiving
a risk score that indicates a low degree of risk, the caller agent
may approve the current transaction. It is noteworthy that the risk
score may be utilized along with other scores to
determine a risk level for a call event. For example, a risk score
may indicate a level of risk rather than indicate that a particular
caller is either a fraudster or a legitimate caller. Thus, if the
risk score is high, that may indicate that the call event should be
evaluated more carefully. The risk score may be combined with other
risk scores using various relations or functions.
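
One possible way to combine risk scores and turn the result into a
suggestion for the caller agent is sketched below. The disclosure
does not specify the combining relation or any thresholds, so the
weighted mean, the 0-to-1 score range, the threshold values, and the
action labels are assumptions made purely for illustration.

```python
from typing import Optional, Sequence


def combine_risk_scores(
    scores: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean of risk scores (assumed to lie in [0, 1])."""
    if not scores:
        raise ValueError("at least one risk score is required")
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


def suggested_action(risk: float, high: float = 0.7, low: float = 0.3) -> str:
    """Map a combined risk score to a hint for the agent (thresholds are assumed)."""
    if risk >= high:
        # High risk: prompt for further information or flag the call event.
        return "prompt_or_flag"
    if risk <= low:
        # Low risk: the agent may approve the current transaction.
        return "approve"
    # The text does not say what happens between the two extremes.
    return "no_recommendation"
```

For example, an audio sample risk score of 0.9 weighted three times as
heavily as another risk score of 0.5 gives a combined score of 0.8,
which this sketch maps to "prompt_or_flag". The output is a level of
risk for the call event, not a verdict on the caller.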
[0091] FIG. 3 illustrates a flowchart of an exemplary method for
processing audio signals. The method may include a step 305 of
receiving an audio signal and an associated audio signal identifier
(and/or non-audio data). The audio signal may be received without
regard to fraud activities. The audio signal may be received from a
call center, or may be included in a diarized segment extracted
from call data. The method may also include a step 310 of receiving
a fraud event identifier associated with a fraud event. The fraud
event identifier may include a timestamp that indicates an
approximate time that a fraud event occurred.
[0092] Next, the method may include a step 315 of determining a
speaker model and a channel model based on the received audio
signal. The channel model may represent distortion included in the
audio signal that uniquely identifies details of the audio signal
such as country of origin, communications protocols and paths, and
the like. The combination of the received audio signal, the speaker
characteristics, and the channel characteristics provides a robust
set of data that may be compared against various fraudster
databases to determine whether the caller associated with the audio
file is a fraudster.
[0093] The method may also include a step 320 of updating a
fraudster channel database to include the determined channel model
based on a comparison of the audio signal identifier and the fraud
event identifier, along with a step 325 of updating a fraudster
voice database to include the determined speaker model based on a
comparison of the audio signal identifier and the fraud event
identifier.
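
Steps 320 and 325 can be sketched as shown below. This is a
hypothetical illustration: it assumes that the audio signal
identifier carries a receipt timestamp, that the comparison with the
fraud event identifier is a simple time-window test, and that models
are stored as plain vectors keyed by the audio signal identifier. The
class names, field names, and the 24-hour window are not taken from
the disclosure.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class AudioRecord:
    audio_id: str              # audio signal identifier
    received_at: datetime      # assumed to be part of the identifier
    speaker_model: List[float]
    channel_model: List[float]


@dataclass
class FraudEvent:
    event_id: str              # fraud event identifier
    occurred_at: datetime      # approximate time of the fraud event


@dataclass
class FraudsterDatabases:
    voice: Dict[str, List[float]] = field(default_factory=dict)
    channel: Dict[str, List[float]] = field(default_factory=dict)


def enroll_if_linked(
    record: AudioRecord,
    event: FraudEvent,
    dbs: FraudsterDatabases,
    window: timedelta = timedelta(hours=24),  # assumed matching window
) -> bool:
    """Enroll both models when the audio falls within the window of the event."""
    if abs(record.received_at - event.occurred_at) > window:
        return False
    dbs.channel[record.audio_id] = record.channel_model  # step 320
    dbs.voice[record.audio_id] = record.speaker_model    # step 325
    return True
```

In this sketch both databases are updated together, so a speaker model
and its channel model remain linked by the same audio signal
identifier.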
[0094] The method may also include various steps for generating
different types of match scores and risk scores for the speaker
model, channel model, and the audio signal. For example, the method
may include an optional step 330 of receiving a candidate audio
sample and a step 335 of determining a channel match score based on
a match between the candidate audio sample and a channel model in
the fraudster channel database.
[0095] Additionally, the method may include a step 340 of
determining a voice match score based on a match between the
candidate audio sample and a speaker model in the fraudster voice
database,
along with a step 345 of determining an audio sample risk score
based on the channel match score and the voice match score.
[0096] It will be understood that the method may include additional
or fewer steps than those listed above. Additionally, optional
steps have been shown as dotted-line objects in the Figures.
[0097] FIG. 4 illustrates an exemplary computing system 400 that
may be used to implement an embodiment of the present technology.
The computing system 400 of FIG. 4 may be implemented in the
contexts of the likes of computing systems, clouds, modules,
engines, networks, servers, and so forth. The computing system 400
of FIG. 4 includes one or more processor units 410 and main memory
420. Main memory 420 stores, in part, instructions and data for
execution by processor unit 410. Main memory 420 may store the
executable code when in operation. The system 400 of FIG. 4 further
includes a mass storage device 430, portable storage device(s)
440, output devices 450, input devices 460, a graphics display 470,
and peripherals 480.
[0098] The components shown in FIG. 4 are depicted as being
connected via a single bus 490. The components may be connected
through one or more data transport means. Processor unit 410 and
main memory 420 may be connected via a local microprocessor bus,
and the mass storage device 430, peripheral(s) 480, portable
storage device 440, and display system 470 may be connected via one
or more input/output (I/O) buses.
[0099] Mass storage device 430, which may be implemented with a
magnetic disk drive and/or an optical disk drive, is a non-volatile
storage device for storing data and instructions for use by
processor unit 410. Mass storage device 430 may store the system
software for implementing embodiments of the present technology for
purposes of loading that software into main memory 420. The mass
storage device 430 may also be used for storing databases, such as
the fraudster voice database, the fraudster channel database, and
the precursor database.
[0100] Portable storage device 440 operates in conjunction with a
portable non-volatile storage medium, such as a floppy disk,
compact disk, digital video disc, or USB storage device, to input
and output data and code to and from the computing system 400 of
FIG. 4. The system software for implementing embodiments of the
present technology may be stored on such a portable medium and
input to the computing system 400 via the portable storage device
440.
[0101] Input devices 460 provide a portion of a user interface.
Input devices 460 may include an alphanumeric keypad, such as a
keyboard, for inputting alphanumeric and other information, or a
pointing device, such as a mouse, a trackball, stylus, or cursor
direction keys. Additionally, the system 400 as shown in FIG. 4
includes output devices 450. Suitable output devices include
speakers, printers, network interfaces, and monitors.
[0102] Display system 470 may include a liquid crystal display
(LCD) or other suitable display device. Display system 470 receives
textual and graphical information, and processes the information
for output to the display device.
[0103] Peripherals 480 may include any type of computer support
device to add additional functionality to the computing system.
Peripheral device(s) 480 may include a modem or a router.
[0104] The components provided in the computing system 400 of FIG.
4 are those typically found in computing systems that may be
suitable for use with embodiments of the present technology and are
intended to represent a broad category of such computer components
that are well known in the art. Thus, the computing system 400 of
FIG. 4 may be a personal computer, hand held computing system,
telephone, mobile computing system, workstation, server,
minicomputer, mainframe computer, or any other computing system.
The computer may also include different bus configurations,
networked platforms, multi-processor platforms, etc. Various
operating systems may be used including Unix, Linux, Windows,
Macintosh OS, Palm OS, Android, iOS (iPhone OS), VMWare OS, and
other suitable operating systems.
[0105] It is noteworthy that any hardware platform suitable for
performing the processing described herein is suitable for use with
the technology. Computer-readable storage media refer to any medium
or media that participate in providing instructions to a central
processing unit (CPU), a processor, a microcontroller, or the like.
Such media may take forms including, but not limited to,
non-volatile and volatile media such as optical or magnetic disks
and dynamic memory, respectively. Common forms of computer-readable
storage media include a floppy disk, a flexible disk, a hard disk,
magnetic tape, any other magnetic storage medium, a CD-ROM disk,
digital video disk (DVD), any other optical storage medium, RAM,
PROM, EPROM, a FLASH-EPROM, and any other memory chip or cartridge.
[0106] While certain exemplary embodiments have been described and
shown in the accompanying drawings, it is to be understood that
such embodiments are merely illustrative and not restrictive of the
broad disclosure and that this disclosure is not limited to the
specific constructions and arrangements shown and described, since
various other modifications may occur to those ordinarily skilled
in the art upon studying this disclosure. In an area of technology
such as this, where growth is fast and further advancements are not
easily foreseen, the disclosed embodiments may be readily
modifiable in arrangement and detail as facilitated by enabling
technological advancements without departing from the principles of
the present disclosure.
[0107] In the foregoing specification, specific embodiments of the
present disclosure have been described. However, one of ordinary
skill in the art appreciates that various modifications and changes
can be made without departing from the scope of the present
disclosure as set forth in the claims below. Accordingly, the
specification and figures are to be regarded in an illustrative
rather than a restrictive sense, and all such modifications are
intended to be included within the scope of the present disclosure. The
benefits, advantages, solutions to problems, and any element(s)
that may cause any benefit, advantage, or solution to occur or
become more pronounced are not to be construed as critical,
required, or essential features or elements of any or all the
claims. The disclosure is defined solely by the appended claims
including any amendments made during the pendency of this
application and all equivalents of those claims as issued.
[0108] In the foregoing specification, the invention is described
with reference to specific embodiments thereof, but those skilled
in the art will recognize that the invention is not limited
thereto. Various features and aspects of the above-described
invention can be used individually or jointly. Further, the
invention can be utilized in any number of environments and
applications beyond those described herein without departing from
the broader spirit and scope of the specification. The
specification and drawings are, accordingly, to be regarded as
illustrative rather than restrictive. It will be recognized that
the terms "comprising," "including," and "having," as used herein,
are specifically intended to be read as open-ended terms of art.
Moreover, the phrases "at least one of . . . and" and "at least one
of . . . or" will both be understood to allow for individual
selection of any of the listed features, or any combination of the
individual features.
* * * * *