U.S. patent application number 16/902901 was filed with the patent office on 2020-06-16, and published on 2021-12-16, for an automated third-party data evaluation for modeling system.
The applicant listed for this patent is Hartford Fire Insurance Company. The invention is credited to Kudakwashe F. Chibanda, Sterling M. Cutler, Daniela Fassbender, Haibin Li, Jing-Ru Jimmy Li, Cyan Justina Manuel, Ahmad J. Paintdakhi, Alexi Resto, Peter Ross Thomas-Melly.
Application Number: 20210390564 (Appl. No. 16/902901)
Family ID: 1000004904915
Publication Date: 2021-12-16

United States Patent Application 20210390564
Kind Code: A1
Chibanda; Kudakwashe F.; et al.
December 16, 2021
AUTOMATED THIRD-PARTY DATA EVALUATION FOR MODELING SYSTEM
Abstract
In some embodiments, a system may evaluate third-party data for
an enterprise (e.g., a potential risk enterprise such as an
insurance company), based on information about potential customers
received via a first-party data source and additional information
about the potential customers from sources other than the
enterprise received via a third-party data source. A model factory
may provide information about at least one enterprise predictive
model, and a third-party data evaluation platform may analyze the
additional information to determine an impact on the enterprise
predictive model. The third-party data evaluation platform may then
output an indication of a result of said analysis to a database of
findings (e.g., for use by data scientists).
Inventors: Chibanda; Kudakwashe F.; (Brooklyn, NY); Cutler; Sterling M.; (West Hartford, CT); Fassbender; Daniela; (Holland, MI); Li; Haibin; (Livingston, NJ); Li; Jing-Ru Jimmy; (Hartford, CT); Manuel; Cyan Justina; (Chattanooga, TN); Paintdakhi; Ahmad J.; (New Milford, CT); Resto; Alexi; (Bloomfield, CT); Thomas-Melly; Peter Ross; (Northampton, MA)

Applicant: Hartford Fire Insurance Company, Hartford, CT, US
Family ID: 1000004904915
Appl. No.: 16/902901
Filed: June 16, 2020
Current U.S. Class: 1/1
Current CPC Class: G06Q 10/0635 20130101; G06Q 30/0201 20130101; G06Q 40/08 20130101; G06N 20/00 20190101
International Class: G06Q 30/02 20060101 G06Q030/02; G06N 20/00 20060101 G06N020/00; G06Q 10/06 20060101 G06Q010/06; G06Q 40/08 20060101 G06Q040/08
Claims
1. A system to evaluate third-party data for an enterprise,
comprising: a first-party data source to provide information about
potential customers from the enterprise; a third-party data source
to provide additional information about the potential customers
from sources other than the enterprise; a model factory to provide
information about at least one enterprise predictive model; a
third-party data evaluation platform coupled to the first-party
data source, the third-party data source, and the model factory
platform, including: a computer processor; and a storage device in
communication with said processor and storing instructions adapted
to be executed by said processor to: (i) receive information about
the enterprise predictive model from the model factory, (ii)
receive the additional information about the potential customers
from the third-party data source, (iii) analyze the additional
information to determine an impact on the enterprise predictive
model in connection with potential risk applications for insurance
company underwriting, and (iv) output an indication of a result of
said analysis; and a database of findings to store information
about the indication of the result of said analysis, wherein the
system executes performance monitoring using machine learning to
automatically and proactively identify potential issues, wherein
the system automatically re-trains the enterprise predictive model
using the additional information about the potential customers from
the third-party data source, and wherein information in the
database of findings is used by data scientists and an
unconstrained loss modeling team to identify and feedback important
information to an unconstrained loss modeling component of the
model factory.
2. (canceled)
3. The system of claim 1, wherein said identification is performed
via cloud analytics associated with at least one of: (i) object
storage, (ii) a data catalog, (iii) a data lake store, (iv) a data
factory, (v) machine learning, and (vi) artificial intelligence
services.
4. (canceled)
5. The system of claim 1, wherein the system automatically scores
the additional information about the potential customers from the
third-party data source.
6. The system of claim 1, wherein the information from the
first-party or third-party data source includes all of: a risk claim
file, a medical report, a police report, and social network
data.
7. The system of claim 1, wherein a first enterprise predictive
model is associated with large loss and volatile claim detection
and a second enterprise predictive model is associated with a
premium evasion analysis.
8. The system of claim 7, wherein the indication of the result of
said analysis is to: (i) trigger a risk application, or (ii) update
a risk application.
9. The system of claim 1, wherein the indication of the result of
said analysis is associated with a variable or weighting factor of a
predictive model.
10. A computer-implemented method to evaluate third-party data for
an enterprise, comprising: receiving from the enterprise
information about potential customers via a first-party data
source; receiving from sources other than the enterprise additional
information about the potential customers via a third-party data
source; receiving information about at least one enterprise
predictive model from a model factory; analyzing, by a third-party
data evaluation platform, the additional information to determine
an impact on the enterprise predictive model in connection with
potential risk applications for insurance company underwriting; and
storing an indication of a result of said analysis in a database of
findings, wherein a system associated with the method executes
performance monitoring using machine learning to automatically and
proactively identify potential issues, wherein the system
automatically re-trains the enterprise predictive model using the
additional information about the potential customers from the
third-party data source, and wherein information in the database of
findings is used by data scientists and an unconstrained loss
modeling team to identify and feedback important information to an
unconstrained loss modeling component of the model factory.
11. (canceled)
12. The method of claim 10, wherein said identification is
performed via cloud analytics associated with at least one of: (i)
object storage, (ii) a data catalog, (iii) a data lake store, (iv)
a data factory, (v) machine learning, and (vi) artificial
intelligence services.
13. (canceled)
14. The method of claim 10, further comprising: automatically
scoring the additional information about the potential customers
from the third-party data source.
15. The method of claim 10, wherein the information from the
first-party or third-party data source includes all of: a risk claim
file, a medical report, a police report, and social network
data.
16. The method of claim 10, wherein a first enterprise predictive
model is associated with large loss and volatile claim detection
and a second enterprise predictive model is associated with a
premium evasion analysis.
17. The method of claim 16, wherein the indication of the result of
said analysis is to: (i) trigger a risk application, or (ii) update
a risk application.
18. The method of claim 10, wherein the indication of the result of
said analysis is associated with a variable or weighting factor of a
predictive model.
19. A non-transitory, computer-readable medium storing instructions
adapted to be executed by a computer processor to perform a method
to evaluate third-party data for an enterprise, said method
comprising: receiving from the enterprise information about
potential customers via a first-party data source; receiving from
sources other than the enterprise additional information about the
potential customers via a third-party data source; receiving
information about at least one enterprise predictive model from a
model factory; analyzing, by a third-party data evaluation
platform, the additional information to determine an impact on the
enterprise predictive model in connection with potential risk
applications for insurance company underwriting; and storing an
indication of a result of said analysis in a database of findings,
wherein a system associated with the method executes performance
monitoring using machine learning to automatically and proactively
identify potential issues, wherein the system automatically
re-trains the enterprise predictive model using the additional
information about the potential customers from the third-party data
source, and wherein information in the database of findings is used
by data scientists and an unconstrained loss modeling team to
identify and feedback important information to an unconstrained
loss modeling component of the model factory.
20. (canceled)
21. The medium of claim 19, wherein said identification is
performed via cloud analytics associated with at least one of: (i)
object storage, (ii) a data catalog, (iii) a data lake store, (iv)
a data factory, (v) machine learning, and (vi) artificial
intelligence services.
Description
BACKGROUND
[0001] An entity, such as an enterprise that analyzes risk
information, may want to analyze or "mine" large amounts of data,
such as internal enterprise data and/or data that is available from
other parties (e.g., "third-party data"). For example, a risk
enterprise might want to analyze tens of thousands of credit files
to look for detailed information about potential customers (e.g.,
customer names, addresses, ZIP codes, etc.). Note that an entity
might analyze this data in connection with different types of
risk-related models, and, moreover, different models may use the
data differently. For example, a description of a business or
residence might have different meanings depending on the types of
risk being evaluated. It can be difficult, however, to identify
useful information across such large amounts of data and different
types of predictive models. In addition, manually managing the
different needs and requirements (e.g., different business logic
rules) associated with various models can be a time-consuming and
error-prone process. Increasingly, third-party data has a shorter
and shorter shelf life (and the amount of data available is growing
at a substantial rate). As a result, data scientists spend a
substantial amount of valuable time trying to figure out if new
third-party data has any real value to the enterprise. It would
therefore be desirable to provide improved third-party data
evaluation for a modeling system.
SUMMARY OF THE INVENTION
[0002] According to some embodiments, systems, methods, apparatus,
computer program code and means are provided to improve third-party
data evaluation for a modeling system. In some embodiments, a
system may evaluate third-party data for an enterprise (e.g., a
potential risk enterprise such as an insurance company) based on
information about potential customers received via a first-party
data source and additional information about the potential
customers from sources other than the enterprise received via a
third-party data source. A model factory may provide information
about at least one enterprise predictive model, and a third-party
data evaluation platform may analyze the additional information to
determine an impact on the enterprise predictive model. The
third-party data evaluation platform may then output an indication
of a result of said analysis to a database of findings (e.g., for
use by data scientists).
[0003] Some embodiments provide: means for receiving from an
enterprise information about potential customers via a first-party
data source; means for receiving from sources other than the
enterprise additional information about the potential customers via
a third-party data source; means for receiving information about at
least one enterprise predictive model from a model factory; means
for analyzing, by a third-party data evaluation platform, the
additional information to determine an impact on the enterprise
predictive model; and means for storing an indication of a result
of said analysis in a database of findings.
[0004] A technical effect of some embodiments of the invention is
improved third-party data evaluation for a modeling system. With
these and other advantages and features that will become
hereinafter apparent, a more complete understanding of the nature
of the invention can be obtained by referring to the following
detailed description and to the drawings appended hereto.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] FIG. 1 is a block diagram of a system according to some
embodiments of the present invention.
[0006] FIG. 2 illustrates a method in accordance with some
embodiments of the present invention.
[0007] FIG. 3 is a modeling workflow according to some
embodiments.
[0008] FIG. 4 is a modeling process ecosystem in accordance with
some embodiments.
[0009] FIG. 5 is a modeling system schema according to some
embodiments.
[0010] FIG. 6 illustrates data science assets in accordance with
some embodiments.
[0011] FIG. 7 is a modeling system data strategy according to some
embodiments.
[0012] FIG. 8 is a data science modeling system workflow in
accordance with some embodiments.
[0013] FIG. 9 is a generic model factory according to some
embodiments.
[0014] FIG. 10 is a general third-party data evaluation setup in
accordance with some embodiments.
[0015] FIG. 11 illustrates third-party data evaluation
orchestration according to some embodiments.
[0016] FIG. 12 illustrates third-party data scorecards in
accordance with some embodiments.
[0017] FIG. 13 is a machine learning refinery display in accordance
with some embodiments.
[0018] FIG. 14 is a block diagram of a platform according to some
embodiments of the present invention.
[0019] FIG. 15 illustrates a tabular portion of a machine learning
refinery database in accordance with some embodiments.
[0020] FIG. 16 illustrates a wireless or handheld tablet device in
accordance with some embodiments of the present invention.
DETAILED DESCRIPTION
[0021] The present invention provides significant technical
improvements to facilitate a monitoring and/or processing of
third-party data, risk related data modeling, and dynamic data
processing. The present invention is directed to more than merely a
computer implementation of a routine or conventional activity
previously known in the industry as it significantly advances the
technical efficiency, access and/or accuracy of communications
between devices by implementing a specific new method and system as
defined herein. The present invention is a specific advancement in
the areas of data and model monitoring and/or processing by
providing benefits in data accuracy, analysis speed, data
availability, and data integrity, and such advances are not merely
a longstanding commercial practice. The present invention provides
improvement beyond a mere generic computer implementation as it
involves the processing and conversion of significant amounts of
data in a new beneficial manner as well as the interaction of a
variety of specialized risk-related applications and/or third-party
systems, networks, and subsystems. For example, in the present
invention third-party data and related risk information may be
processed, forecast, and/or scored via an analytics engine and
results may then be analyzed efficiently to evaluate risk-related
data, thus improving the overall performance of an enterprise
system, including message storage requirements and/or bandwidth
considerations (e.g., by reducing a number of messages that need to
be transmitted via a network). Moreover, embodiments associated
with predictive models might further improve the performance of
claims processing applications, resource allocation decisions,
reduce errors in templates, improve future risk estimates, etc.
[0022] An enterprise may want to analyze or "mine" large amounts of
data, such as third-party data received from various sources. By
way of example, a risk enterprise might want to analyze tens of
thousands of risk-related third-party data files to look for useful
information (e.g., to find information that might correct and/or
supplement existing first-party data used by the enterprise). Note
that an entity might analyze this data in connection with different
types of applications (e.g., potential risk applications of an
insurance company), and that different applications may need to
analyze the data differently. It may therefore be desirable to
provide systems and methods that permit third-party data evaluation
for a modeling system in an automated, efficient, and accurate
manner.
[0023] FIG. 1 is a block diagram of a system 100 according to some
embodiments of the present invention. The system includes a model
factory 110 that might have a pricing module 112, an underwriting
module 114, an unconstrained loss monitoring module 116, etc. The
model factory 110 provides information to a third-party data
evaluation platform 150 that also receives new first-party data 120
and new third-party data 130 (e.g., information about an entity at
a ZIP code or state level). Examples of new third-party data 130
might include, for example, information from EXPERIAN.RTM., Dun
& Bradstreet ("D&B"), the Bureau of Labor Statistics
("BLS"), the National Oceanic and Atmospheric Administration
("NOAA"), TransUnion credit scores, credit reports and credit
checks, etc.
[0024] The pricing module 112 may feed an initial baseline model to
the third-party data evaluation platform 150. The third-party data
evaluation platform 150 may then bring in the new first-party and
third-party data 120, 130 elements and kick off a data analysis
processing loop and/or a scorecard processing loop. The results may
then be stored in a database of findings 160 for use by data
scientists and/or an unconstrained loss modeling team 190 (to
determine important information and feedback that data to the
unconstrained loss modeling component 116). This process might be
performed automatically or be initiated via a command from a remote
interface device. As used herein, the term "automatically" may
refer to, for example, actions that can be performed with little or
no human intervention.
[0025] As used herein, devices, including those associated with the
system 100 and any other device described herein, may exchange
information via any communication network which may be one or more
of a Local Area Network ("LAN"), a Metropolitan Area Network
("MAN"), a Wide Area Network ("WAN"), a proprietary network, a
Public Switched Telephone Network ("PSTN"), a Wireless Application
Protocol ("WAP") network, a Bluetooth network, a wireless LAN
network, and/or an Internet Protocol ("IP") network such as the
Internet, an intranet, or an extranet. Note that any devices
described herein may communicate via one or more such communication
networks.
[0026] The third-party data evaluation platform 150 may store
information into and/or retrieve information from various data
stores (e.g., the database of findings 160), which may be locally
stored or reside remote from the third-party data evaluation
platform 150. Although a single third-party data evaluation
platform 150 and model factory 110 are shown in FIG. 1, any number
of such devices may be included. Moreover, various devices
described herein might be combined according to embodiments of the
present invention. For example, in some embodiments, the
third-party data evaluation platform 150 and database of findings
160 might comprise a single apparatus. Any of the system 100
functions may be performed by a constellation of networked
apparatuses, such as in a distributed processing or cloud-based
architecture.
[0027] A user or administrator may access the system 100 via a
remote device (e.g., a Personal Computer ("PC"), tablet, or
smartphone) to view information about and/or manage operational
information in accordance with any of the embodiments described
herein. In some cases, an interactive graphical user interface
display may let an operator or administrator define and/or adjust
certain parameters (e.g., to define advanced rules or business
logic) and/or provide or receive automatically generated
recommendations or results from the system 100.
[0028] Ingestion of information into the third-party data
evaluation platform 150 may include key assignment and ingestion of
existing tags (e.g., latitude and longitude) that are associated
with the data. This information might then be processed to
determine an appropriate domain assignment (e.g., using general tag
learning and artificial intelligence) and/or custom tagging (e.g.,
using custom tags and feedback from users) to create a broad set of
tags. As a result, the system 100 might automatically evaluate data
quality (e.g., duplication), size, timeliness, grain, completeness,
etc. Moreover, embodiments may leverage matching techniques for name
and/or address matching, perform dislocation analysis (how does the new
third-party data 130 "move" groupings), and assess which variables have
the strongest relationship with a target using a Least Absolute
Shrinkage and Selection Operator ("LASSO") algorithm, a Gradient
Boosting Machine ("GBM") algorithm, a Random Forest ("RF") method,
etc. The system 100 may use an existing model as a baseline to
determine how much additional impact the third-party data 130 has
on the model (e.g., by comparing the performance of existing
variables and new variables on a predictive enterprise model).
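By way of example, the baseline-relative variable assessment described above might be sketched as follows. This is a minimal Python illustration only: a simple correlation screen against the baseline model's residuals stands in for the LASSO, GBM, or RF methods named above, and all function and variable names are illustrative assumptions rather than part of any actual implementation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no signal
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

def screen_new_variables(target, baseline_pred, candidates):
    """Rank candidate third-party variables by how strongly they relate
    to what the existing baseline model leaves unexplained (residuals)."""
    residual = [t - p for t, p in zip(target, baseline_pred)]
    scores = {name: abs(pearson(residual, vals))
              for name, vals in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A new third-party variable that ranks high against the residuals is one the baseline model has not already captured, which mirrors the "additional impact" comparison described above.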
[0029] In this way, the system 100 may mine third-party data in an
efficient and accurate manner. For example, FIG. 2 illustrates a
method that might be performed by some or all of the elements of
the system 100 described with respect to FIG. 1 according to some
embodiments of the present invention. The flow charts described
herein do not imply a fixed order to the steps, and embodiments of
the present invention may be practiced in any order that is
practicable. Note that any of the methods described herein may be
performed by hardware, software, or any combination of these
approaches. For example, a computer-readable storage medium may
store thereon instructions that when executed by a machine result
in performance according to any of the embodiments described
herein.
[0030] At 202, the system may receive from an enterprise
information about potential customers via a first-party data
source. At 204, the system may receive from sources other than the
enterprise additional information about the potential customers via
a third-party data source. The information from the first-party or
third-party data source might be associated with, for example, a
risk claim file, a risk claim note, a medical report, a police
report, social network data, web image data, Internet of Things
data, Global Positioning System ("GPS") satellite data, activity
tracking data, big data information, a loss, an injury, a first
notice of loss statement, video chat stream, optical character
recognition data, a governmental agency, etc.
[0031] At 206, the system may receive information about at least
one enterprise predictive model from a model factory. In some
embodiments, a plurality of enterprise predictive models are
associated with a plurality of risk applications, including at
least two of: a workers' compensation claim, a personal risk
policy, a business risk policy, an automobile risk policy, a home
risk policy, a sentiment analysis, risk event detection, a cluster
analysis, a predictive model, a subrogation analysis, fraud
detection, a recovery factor analysis, large loss and volatile
claim detection, a premium evasion analysis, a risk policy
comparison, an underwriting decision, indicator incidence rate
trending, etc.
[0032] At 208, the system may analyze, by a third-party data
evaluation platform, the additional information to determine an
impact on the enterprise predictive model. At 210, the system may
store an indication of a result of said analysis in a database of
findings. According to some embodiments, the indication of the
result of said analysis will trigger a risk application and/or
update a risk application. Moreover, in some embodiments, the
indication of the result of said analysis is associated with a
variable or weighting factor of a predictive model.
[0033] According to some embodiments, the system may also execute
performance monitoring to automatically and proactively identify
potential issues. This identification might be performed, for
example, via cloud analytics associated with object storage, a data
catalog, a data lake store, a data factory, machine learning,
artificial intelligence services, etc. Moreover, in some
embodiments the system automatically re-trains the enterprise
predictive model using the additional information about the
potential customers from the third-party data source and
automatically scores the additional information about the potential
customers from the third-party data source.
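The impact determination and automatic re-training just described might be sketched, under illustrative assumptions, as a simple holdout comparison between the baseline model's predictions and those of a re-trained, third-party-augmented model; the function name and the promotion threshold are hypothetical, not part of any actual implementation.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared prediction error on a holdout set."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def evaluate_retrain(y_true, baseline_pred, augmented_pred, min_gain=0.01):
    """Decide whether the re-trained (third-party-augmented) model
    should replace the baseline, based on relative error reduction."""
    base = mean_squared_error(y_true, baseline_pred)
    aug = mean_squared_error(y_true, augmented_pred)
    gain = (base - aug) / base if base else 0.0
    return {"baseline_mse": base, "augmented_mse": aug,
            "relative_gain": gain, "promote": gain >= min_gain}
```

Only when the augmented model clears the gain threshold would the system promote it, which keeps a noisy third-party source from silently degrading the enterprise predictive model.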
[0034] FIG. 3 is a modeling workflow according to some embodiments.
Note that typical model rebuilds require extensive resources and
time commitment. Moreover, they involve cumbersome processes and
competing priorities that may hinder unconstrained modeling
practices. According to some embodiments, business partners and
data scientists work together to investigate issues and, after
investigation, a project may be initiated to adjust a model. In
particular, performance monitoring may be performed at 302 to
proactively identify potential issues. At 304, the system may
provide an ability to re-train a model quickly and implement
relevant factors. At 306, self-service tools may unlock additional
analysis for the business and a capacity to explore unconstrained
modeling may be created at 308.
[0035] Some embodiments may be associated with a model factory that
streamlines maintenance activities of pricing models and supports
unconstrained modeling with baseline data and model foundation. For
example, FIG. 4 is a modeling process ecosystem in accordance with
some embodiments. The system may digest a data supply at 402 (e.g.,
analytic base data may be collected at a lowest grain with standard
data diagnostic and source checks provided by a performance
monitoring team). The system may then preserve a current state at
404 (e.g., to remediate, monitor, and/or re-train models on a
regular cadence to provide consistency and knowledge across
different enterprise lines of business). The system may generate
outcomes at 406 via guided tutorials of an application with
standard Key Performance Indicators ("KPIs"). At 408, the system
may consume the analysis (note that a value-add analysis may still
be required to digest the reports generated and signals detected).
Finally, some embodiments may provide a foundation for a larger
effort 410 (e.g., process, tools and pricing model datasets may
bring efficiency and transparency). Note that embodiments may gain
speed with respect to refresh and rebuild and/or release capacity
for unconstrained modeling activity.
[0036] FIG. 5 is a modeling system schema 500 according to some
embodiments. A data transformation portion (e.g., data, business
insights, etc.) may take new sources, concept traces, filters, and
merge keys to be combined 510 thus collecting data 512 from
preferred sources at the lowest possible granularity. Data
performance monitoring 514 may be performed along with an analysis
of enabled data 516. A modeling portion of the schema 500 may
include a pricing factor toolbox 520 that provides information to
an output portion. The output portion might include, for example,
scoring indicated and implemented information 530 during a modeling
process that also trains a model 532. If something significant is
found, the system may train (or re-train) and score the model 540
creating an analysis report for data science 542. The analysis
report for data science 542 might result in an action or
unconstrained model training and scoring 550, including rebuild
and/or refresh requirements (and another analysis report 552). This
process may be repeated in an iterative fashion (as illustrated by
the dashed arrows in FIG. 5) to improve the model as
appropriate.
[0037] According to some embodiments, the system may automatically
update documentation, pull and cleanse data, identify and cleanse
code, update models, and update modeling reports (e.g., in a single
day). Moreover, a driver table--such as an EXCEL.RTM. workbook--may
serve as documentation for models, help steer the process, and
host, for example, code chunks, model structure, data items needed,
data transformations, what to store, etc. Moreover, embodiments may
automatically assess the impact of various new data indicators and
the predictability of new data sources to a current pricing
model.
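The driver-table approach just described might be illustrated as follows. This is a hedged sketch only: the table rows, step names, and the single transformation chunk are hypothetical stand-ins for the workbook contents (code chunks, model structure, data items, transformations, what to store) listed above.

```python
# Hypothetical driver-table rows; an actual table might live in a workbook.
DRIVER_TABLE = [
    {"step": "pull",      "items": ["policy_id", "zip", "premium"]},
    {"step": "transform", "code": "premium_per_unit = premium / max(units, 1)"},
    {"step": "store",     "items": ["premium_per_unit"]},
]

def run_driver(table, record):
    """Walk the driver table and apply each documented step to one record."""
    kept = {}
    for row in table:
        if row["step"] == "pull":
            # data items needed, as documented in the table
            kept.update({k: record[k] for k in row["items"]})
        elif row["step"] == "transform":
            local = dict(record)
            exec(row["code"], {}, local)   # code chunk hosted in the table
            record.update(local)
        elif row["step"] == "store":
            # "what to store", as documented in the table
            kept.update({k: record[k] for k in row["items"]})
    return kept
```

Because the table both documents and drives each step, updating the workbook updates the pipeline, which is what lets documentation, data pulls, and model reports be refreshed together.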
[0038] FIG. 6 illustrates data science assets 600 in accordance
with some embodiments. Note that a supply (including internal first
party data 620 and third-party data 630) may get processed through
a third-party data evaluation platform 650. The platform 650 may,
for example, evaluate the value of a new third-party data 630
source by performing an unconstrained modeling exercise and by
comparing information to a pricing factory baseline model.
Initially, an entity resolution 660 may strengthen an enterprise
connection with the third-party data 630 to find a quicker and/or
more reliable match. An assessment of data sources 670 may then
provide a faster assessment of new data sources (and an indication
of which elements the enterprise should focus on). A pricing
factory 680 may provide an automated baseline model data to enable
the fast startup of the unconstrained modeling. In this way, all
three pieces 660, 670, 680 may empower data scientists 690 to
perform a deep analysis and improve both the quality of analysis
and the turn-around time (and even providing new insights).
[0039] Note that according to some embodiments data is not brought
in isolation to simply increase supply, but instead as an integral
part of a thoughtful end-to-end process to power business functions.
Such an approach may help create a sustainable ecosystem to
establish flexible data-driven workflows by connecting several
different elements of an enterprise. FIG. 7 is a modeling system
data strategy according to some embodiments. As shown, embodiments
may connect supply 702 elements (e.g., internal and external data),
input 704 elements (e.g., an object, people, places, events, etc.),
process 706 elements, outcome 708 elements (e.g., trusted,
connected, and/or curated), consume 710 elements (e.g., prospecting,
underwriting, and capacity planning), etc.
[0040] FIG. 8 is a data science modeling system workflow in
accordance with some embodiments. Testing 802 may be performed by
an incubation laboratory on enterprise data, third-party data,
ground truth information, etc. Feature engineering 804 may be
associated with a client hub, image factory, text factory, entity
resolution, etc. A factory 806 may provide third-party testing, a
pricing factory, location intelligence (ZIP code level data), etc.
Product delivery 808 may be provided, for example, with respect to
insurance claims or service. Warranty 810 may also be provided to
sustain and evolve a product, model automation, performance
monitoring, refresh and/or retrain models, etc.
[0041] FIG. 9 is a generic model factory 900 according to some
embodiments. One or more suppliers 910 (e.g., associated with
Architecture Development Method ("ADM") data, point-in-time
tactical information, etc.) may provide inputs 920, such as driver
tables via GitHub, a customer table, production and data scores,
etc. This information may be sent to a data transformation module
930 of a process 950 (e.g., including a read driver, an initial
data pull, appropriate transformations, an analytics enabled data
release, initial data testing, transformation testing, etc.). A
modeling and analysis module 940 may then perform monitoring 942
(e.g., indicated offset modeling, significance checks) and re-train
944 the model as appropriate. A scoring engine 960 may score each
model component and aggregate model components to create a final
score that may be provided to an output data module 970 (that
collects data and feeds an R-shiny application). Output 980 of the
process 950, including monitoring application data (e.g., analysis
reports and model documentation), analysis foundation, and a
scoring mart (e.g., a risk score repository) may be provided to
consumers 990 (e.g., actuarial, product, data science,
underwriting, etc.).
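The scoring engine 960 described above scores each model component and aggregates the components into a final score. A minimal sketch of that flow follows; the component names ("frequency", "severity"), the lambda stand-ins for trained models, and the weighted-sum aggregation are all illustrative assumptions, not the factory's actual implementation.

```python
# Hypothetical sketch of the scoring engine 960: score each model
# component separately, then aggregate into one final score.

def score_components(record, component_models):
    """Score each model component for a single input record."""
    return {name: model(record) for name, model in component_models.items()}

def aggregate(component_scores, weights):
    """Combine component scores into one final score via a weighted sum."""
    total_weight = sum(weights.values())
    return sum(component_scores[name] * w
               for name, w in weights.items()) / total_weight

# Illustrative component "models" (in practice these would be trained).
component_models = {
    "frequency": lambda r: r["claims"] / max(r["exposure"], 1),
    "severity": lambda r: r["total_loss"] / max(r["claims"], 1),
}
weights = {"frequency": 0.6, "severity": 0.4}

record = {"claims": 4, "exposure": 10, "total_loss": 20000.0}
scores = score_components(record, component_models)
final_score = aggregate(scores, weights)
```

The final score could then be handed to the output data module 970 for collection and display.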
[0042] FIG. 10 is a general third-party data evaluation setup 1000
in accordance with some embodiments. A user input 1010 may include an
R-shiny application 1012 that lets a user upload data and initiate a
process by exchanging information with driver tables 1022 in data
collection. A batch submitter 1014 may provide information to a
read request log 1032 in request fulfillment 1030. The setup 1000
may add new sources, add analysis files, and run scorecards 1034
and results may be provided to a data pre-processor and/or a
scorecard batch submitter 1036. These elements 1036 may then
provide information to a driver table manager and/or results
manager 1038 that updates a Hadoop 1024 data store. In addition,
data may be provided to storage 1042 and/or an email to the user
1044 in an output 1040 section of the process. According to some
embodiments, an automated mining platform may access rules in an
event rules database to mine received data. The mining platform may
then transmit results to external systems, such as an email alert
server, a workflow application, and/or reporting and calendar
functions (e.g., executing on a server).
[0043] FIG. 11 illustrates third-party data evaluation
orchestration 1100 according to some embodiments. An Oracle
database 1110 (e.g., storing a document request) may receive
information from a user input 1120 (shiny) during a kickoff. A
request handler 1130 may also receive information from the user
input 1120 along with a utility library 1150. A request processor
1140 may also receive information from the utility library 1150 and
update a Hadoop 1160 data store (where user input files are merged
with added files and key reference information is stored). The
handler may also transmit an email to the user 1170. In particular,
at (A) the request handler 1130 may be initiated by the user input
1120 that requests information collection and kicks off the request
processor 1140. The user input 1120 may provide for user
friendliness (enabling users not familiar with H2O or R to perform
analysis), documentation (the application records results and
reports that can be leveraged for future runs to avoid
duplication), and/or allow a future addition of pricing factory
models. At (B), the request processor 1140 picks up user input in
Hadoop 1160 (e.g., via a Hadoop Distributed Files System ("HDFS")),
processes the request (e.g., by adding an identifier), and generates
merge files based on a key reference file. At (C), the orchestration
1100 performs the final steps, including returning results to the
request handler 1130, sending out the email to the user 1170, sending
a document to the Oracle database 1110, closing the information loop
to shiny 1120, etc.
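The (A)-(B)-(C) flow above can be sketched as follows. This is a hedged illustration only: the function names, the in-memory request log, and the dictionary-based merge are assumptions standing in for the actual shiny, Oracle, and HDFS components.

```python
# Minimal sketch of the orchestration 1100: a handler logs the request
# and kicks off a processor, which tags rows with an identifier and
# merges them against a key reference file.
import uuid

REQUEST_LOG = []  # stands in for the read request log / Oracle store

def process_request(request_id, rows, key_reference):
    """(B) Add the identifier and merge rows against the key reference."""
    return [
        {**row, "request_id": request_id, **key_reference.get(row["key"], {})}
        for row in rows
    ]

def handle_request(user_file_rows, key_reference):
    """(A) Record the request and kick off processing."""
    request_id = str(uuid.uuid4())
    REQUEST_LOG.append(request_id)
    merged = process_request(request_id, user_file_rows, key_reference)
    # (C) Final steps: results returned; emailing the user is stubbed out.
    return {"request_id": request_id, "rows": merged}

result = handle_request(
    [{"key": "a", "value": 1}],          # uploaded user file
    {"a": {"reference": "key-ref-data"}},  # key reference file
)
```

In the described system the merged files would land in Hadoop 1160 rather than being returned in memory.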
[0044] FIG. 12 illustrates third-party data scorecards 1200 in
accordance with some embodiments. The scorecards 1200 may be
associated with, according to some embodiments, an object-oriented
design framework and include a main class 1210 (e.g., with a
parameter parser, a data retriever, a metric calculation, and
utility modules). A pre-match scorecard 1220 or report may comprise
a PDF visual and include database of findings information (to let
someone quickly go through a data set to see if it is worth
processing) and a post-match scorecard 1230 may include third-party
displacement information after the process has merged data, looking for
similar things (grains, completeness, machine learning, etc.). A
third-party data only model scorecard 1240 may be associated with
model generation and predict a target from new data that was
identified (e.g., and recommend either "yes," "no," or "maybe" with
respect to further evaluation). A residual model scorecard 1250 may
comprise a visualization while a base model and third-party data
scorecard 1260 may bring information together to complete the
evaluation.
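The object-oriented scorecard framework described above (a main class with a parameter parser, data retriever, and metric calculation) might be organized as in the following sketch. All class and method names are hypothetical; the pre-match completeness metric is one plausible example of a "worth processing" check.

```python
# Hedged sketch of the scorecard class hierarchy 1210-1220.

class Scorecard:
    """Main class: parse parameters, retrieve data, compute metrics."""

    def __init__(self, params):
        self.params = self.parse_params(params)

    def parse_params(self, params):
        # Parameter parser: drop unset parameters.
        return {k: v for k, v in params.items() if v is not None}

    def retrieve_data(self, source):
        # Data retriever: would query Hadoop/Oracle; pass-through here.
        return source

    def metrics(self, data):
        raise NotImplementedError  # each scorecard defines its own metrics

class PreMatchScorecard(Scorecard):
    """Quick look at a raw data set to decide if it is worth processing."""

    def metrics(self, data):
        rows = self.retrieve_data(data)
        non_null = sum(1 for r in rows for v in r.values() if v is not None)
        total = sum(len(r) for r in rows)
        return {"completeness": non_null / total if total else 0.0}

card = PreMatchScorecard({"threshold": 0.5, "unused": None})
report = card.metrics([{"a": 1, "b": None}])
```

Post-match, residual, and combined scorecards 1230-1260 could be further subclasses adding their own metric calculations.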
[0045] According to some embodiments, an administrator or operator
interface may display various Graphical User Interface ("GUI")
elements. For example, FIG. 13 illustrates a machine learning
refinery GUI display 1300 in accordance with some embodiments of
the present invention. The display 1300 may include a graphical
representation 1310 of the components associated with a third-party
data evaluation platform. According to some embodiments, an
administrator or operator may then select an element (e.g., via a
touchscreen or computer mouse pointer 1320) to see more information
about that element (e.g., in a popup window) and/or adjust
parameters (e.g., linking to a new third-party data source).
Selection of an "Edit" icon 1330 may also allow for alteration of
the system's operation.
[0046] The embodiments described herein may be implemented using
any number of different hardware configurations. For example, FIG.
14 illustrates a platform or apparatus 1400 that may be, for
example, associated with the system 100 of FIG. 1 as well as the
other systems described herein. The apparatus 1400 comprises a
processor 1410, such as one or more commercially available Central
Processing Units ("CPUs") in the form of one-chip microprocessors,
coupled to a communication device 1420 configured to communicate
via a communication network (not shown in FIG. 14). The
communication device 1420 may be used to communicate, for example,
with one or more first-party and/or third-party data sources and
risk applications. The apparatus 1400 further includes an input
device 1440 (e.g., a mouse and/or keyboard to define data rules and
events) and an output device 1450 (e.g., a computer monitor to
display reports and data mining results to an administrator).
[0047] The processor 1410 also communicates with a storage device
1430. The storage device 1430 may comprise any appropriate
information storage device, including combinations of magnetic
storage devices (e.g., a hard disk drive), optical storage devices,
mobile telephones, and/or semiconductor memory devices. The storage
device 1430 stores a program 1412 and/or a machine learning
refinery engine 1414 (e.g., associated with a modeling system
engine plug-in) for controlling the processor 1410. The processor
1410 performs instructions of the programs 1412, 1414, and thereby
operates in accordance with any of the embodiments described
herein. For example, the processor 1410 may evaluate third-party
data for an enterprise, including information about potential
customers via a first-party data source and additional information
about the potential customers from sources other than the
enterprise via a third-party data source. The processor 1410 may
provide information about at least one enterprise predictive model
and analyze the additional information to determine an impact on
the enterprise predictive model. The processor 1410 may then output
an indication of a result of said analysis to a database of
findings (e.g., for use by data scientists).
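The impact analysis performed by the processor 1410 can be sketched as follows, purely as an illustration: a base model using only first-party features is compared against the same model augmented with a third-party feature, and the resulting lift is written to a stand-in, in-memory database of findings. The error metric, the toy records, and the lambda models are all assumptions.

```python
# Hedged sketch: measure the impact of third-party data on a model.

def mean_abs_error(predict, records):
    """Average absolute error of a model over a set of records."""
    return sum(abs(predict(r) - r["target"]) for r in records) / len(records)

def evaluate_impact(base_model, augmented_model, records, findings):
    """Write the lift from third-party data to the database of findings."""
    base_err = mean_abs_error(base_model, records)
    aug_err = mean_abs_error(augmented_model, records)
    finding = {"base_error": base_err,
               "augmented_error": aug_err,
               "lift": base_err - aug_err}
    findings.append(finding)
    return finding

records = [{"x": 1.0, "tp": 0.5, "target": 1.5},
           {"x": 2.0, "tp": 1.0, "target": 3.0}]
base = lambda r: r["x"]                  # first-party features only
augmented = lambda r: r["x"] + r["tp"]   # adds the third-party feature
findings = []
result = evaluate_impact(base, augmented, records, findings)
```

A positive lift suggests the third-party source improves the enterprise predictive model and merits further review by data scientists.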
[0048] The programs 1412, 1414 may be stored in a compressed,
uncompiled and/or encrypted format. The programs 1412, 1414 may
furthermore include other program elements, such as an operating
system, a database management system, and/or device drivers used by
the processor 1410 to interface with peripheral devices.
[0049] As used herein, information may be "received" by or
"transmitted" to, for example: (i) the apparatus 1400 from another
device; or (ii) a software application or module within the
apparatus 1400 from another software application, module, or any
other source.
[0050] In some embodiments (such as shown in FIG. 14), the storage
device 1430 further stores third-party input data 1460, an event
rules database 1470, and a Machine Learning ("ML") refinery
database 1500. An example of a database that may be used in
connection with the apparatus 1400 will now be described in detail
with respect to FIG. 15. Note that the database described herein is
only one example, and additional and/or different information may
be stored therein. Moreover, various databases might be split or
combined in accordance with any of the embodiments described
herein.
[0051] Referring to FIG. 15, a table is shown that represents the
ML refinery database 1500 that may be stored at the apparatus 1400
according to some embodiments. The table may include, for example,
entries identifying rules and algorithms that may facilitate
third-party data evaluation and/or mining. The table may also
define fields 1502, 1504, 1506, 1508, 1510 for each of the entries.
The fields 1502, 1504, 1506, 1508, 1510 may, according to some
embodiments, specify: ML refinery identifier 1502, a model
identifier 1504, a date and time 1506, a third-party data
identifier 1508, and scorecard data 1510. The ML refinery database
1500 may be created and updated, for example, based on information
received from an operator or administrator (e.g., when a potential
new data source is added).
[0052] The ML refinery identifier 1502 may be, for example, a unique
alphanumeric code identifying a system currently being operated by an
enterprise. The model identifier 1504 might indicate an enterprise
predictive model that is being evaluated and the date and time 1506
might indicate the last time the model was updated or executed. The
third-party data identifier 1508 may indicate a data source that is
providing information to be evaluated and the scorecard data 1510
may include the results of that evaluation (e.g., a category or
numerical score).
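One way the table entries of FIG. 15 might be represented is sketched below; the field names and sample values are assumptions chosen to mirror fields 1502-1510, not the actual schema.

```python
# Illustrative record layout for the ML refinery database 1500.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class MLRefineryEntry:
    ml_refinery_id: str        # unique alphanumeric system code (1502)
    model_id: str              # enterprise predictive model (1504)
    timestamp: datetime        # last model update or execution (1506)
    third_party_data_id: str   # data source being evaluated (1508)
    scorecard_data: dict       # evaluation results, e.g. a score (1510)

entry = MLRefineryEntry(
    ml_refinery_id="MLR_101",
    model_id="M_10001",
    timestamp=datetime(2021, 6, 16, 12, 0),
    third_party_data_id="TP_55",
    scorecard_data={"category": "yes", "score": 0.87},
)
```

An operator adding a potential new data source would, in this sketch, create a new entry keyed by a fresh third-party data identifier.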
[0053] According to some embodiments, the third-party data
evaluation is associated with a "big data" activity that may use
machine learning to sift through large amounts of unstructured data
to find meaningful patterns to support business decisions. As used
herein, the phrase "big data" may refer to massive amounts of data
that are collected over time that may be difficult to analyze and
handle using common database management tools. This type of big
data may include web data, business transactions, email messages,
activity logs, and/or machine-generated data. In addition, data
from sensors and unstructured documents posted on the Internet, such
as blogs and social media, may be included in embodiments described
herein.
[0054] According to some embodiments, the data evaluation performed
herein may be associated with hypothesis testing. For example, one
or more theories may be provided (e.g., "the elimination of this
parameter will not negatively impact underwriting decisions").
Knowledge engineering may then translate common smart tags for
industry and scenario specific business context analysis.
[0055] In some embodiments, the data and model evaluations
described herein may be associated with insight discovery wherein
unsupervised data mining techniques may be used to discover common
patterns in data. For example, highly recurrent themes may be
classified, and other concepts may then be highlighted based on a
sense of adjacency to these recurrent themes. In some cases,
cluster analysis and drilldown tools may be used to explore the
business context of such themes. For example, sentiment analysis
may be used to determine how an entity is currently perceived
and/or the detection of a real-world event may be triggered (e.g.,
it might be noted that a particular automobile model is frequently
experiencing a particular unintended problem).
[0056] Thus, embodiments may provide improved third-party data
evaluation for a modeling system. FIG. 16 illustrates a wireless or
tablet device 1600 displaying elements of a system in accordance
with some embodiments of the present invention. For example, in
some embodiments, the device 1600 is an iPhone.RTM. from Apple,
Inc., a BlackBerry.RTM. from RIM, a mobile phone using the Google
Android.RTM. operating system, a portable or tablet computer (such
as the iPad.RTM. from Apple, Inc.), a mobile device operating the
Android.RTM. operating system or other portable computing device
having an ability to communicate wirelessly with a remote
entity.
[0057] The device 1600 presents a display 1610 that may be used to
display information about a data evaluation system. For example,
the elements may be selected by an operator (e.g., via a
touchscreen interface of the device 1600) to view more information
about that element and/or to adjust settings or parameters
associated with that element (e.g., to introduce a new third-party
data source to the system).
[0058] The following illustrates various additional embodiments of
the invention. These do not constitute a definition of all possible
embodiments, and those skilled in the art will understand that the
present invention is applicable to many other embodiments. Further,
although the following embodiments are briefly described for
clarity, those skilled in the art will understand how to make any
changes, if necessary, to the above-described apparatus and methods
to accommodate these and other embodiments and applications.
[0059] Although specific hardware and data configurations have been
described herein, note that any number of other configurations may
be provided in accordance with embodiments of the present invention
(e.g., some of the information associated with the databases
described herein may be combined or stored in external systems).
Applicants have discovered that embodiments described herein may be
particularly useful in connection with insurance policies and
associated claims. Note that other types of business and risk data
may also benefit from the present invention. For example,
embodiments might be used in connection with bank loan
applications, warranty services, etc.
[0060] Moreover, although some embodiments have been described with
respect to particular data evaluation approaches, note that any of
the embodiments might instead be associated with other information
processing techniques. For example, third-party evaluations may be
performed to process and/or mine certain characteristic information
from various social networks to determine whether a party is
engaging in certain risky behavior or providing high risk products.
It is also contemplated that embodiments may process data including
text in one or more languages, such as English, French, Arabic,
Spanish, Chinese, German, Japanese and the like. In an exemplary
embodiment, a system can be employed for sophisticated data
analyses, wherein information can be recognized irrespective of the
source.
[0061] According to some embodiments, third-party data may be used
in conjunction with one or more predictive models to take into
account a large number of underwriting and/or other parameters. The
predictive model(s), in various implementations, may include one or
more of neural networks, Bayesian networks (such as Hidden Markov
models), expert systems, decision trees, collections of decision
trees, support vector machines, or other systems known in the art
for addressing problems with large numbers of variables.
Preferably, the predictive model(s) are trained on prior data and
outcomes known to the risk company. The specific data and outcomes
analyzed may vary depending on the desired functionality of the
particular predictive model. The particular data parameters
selected for analysis in the training process may be determined by
using regression analysis and/or other statistical techniques known
in the art for identifying relevant variables and associated
weighting factors in multivariable systems. The parameters can be
selected from any of the structured data parameters stored in the
present system (e.g., tags and event data), whether the parameters
were input into the system originally in a structured format or
whether they were extracted from previously unstructured objects,
such as from big data.
[0062] In the present invention, the selection of weighting factors
(either on an event level or a data source level) may improve the
predictive power of the data mining. For example, more reliable
data sources may be associated with a higher weighting factor,
while newer or less reliable sources might be associated with a
relatively lower weighting factor.
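The source-level weighting described above can be illustrated with a short sketch. The weights, source names, and event scores here are assumptions; the point is only that events from more reliable sources contribute more to the mined result.

```python
# Hedged sketch of data-source weighting: more reliable sources get
# higher weighting factors, so their events count more.

def weighted_event_score(events, source_weights, default_weight=0.5):
    """Sum event scores, each scaled by the reliability of its source."""
    return sum(e["score"] * source_weights.get(e["source"], default_weight)
               for e in events)

events = [
    {"source": "established_vendor", "score": 1.0},
    {"source": "new_vendor", "score": 1.0},
]
source_weights = {"established_vendor": 0.9, "new_vendor": 0.4}
total = weighted_event_score(events, source_weights)
```

Identical events thus contribute unequally: the established vendor's event carries more than twice the weight of the newer vendor's.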
[0063] The present invention has been described in terms of several
embodiments solely for the purpose of illustration. Persons skilled
in the art will recognize from this description that the invention
is not limited to the embodiments described, but may be practiced
with modifications and alterations limited only by the spirit and
scope of the appended claims.
* * * * *