U.S. patent application number 14/280065, for a claims analytics engine, was filed with the patent office on 2014-05-16 and published on 2014-09-04.
This patent application is currently assigned to Accenture Global Services Limited. The applicant listed for this patent is Accenture Global Services Limited. The invention is credited to Ajay K. Easo, Dmitriy Feferman, Rayid Ghani, Nicholas F. Howell, Michael S. Irish, Laura J. Jantzen, Mohit Kumar, Lindsey J. Lizardi, Zhu-Song Mei, and Leana R. Wallace.
Application Number: 14/280065
Publication Number: 20140249865
Document ID: /
Family ID: 42827395
Filed Date: 2014-05-16
United States Patent Application 20140249865
Kind Code: A1
Ghani; Rayid; et al.
September 4, 2014
CLAIMS ANALYTICS ENGINE
Abstract
Methods and systems for processing claims (e.g., healthcare
insurance claims) are described. For example, prior to payment of
an unpaid claim, a prediction is made as to whether or not an
attribute specified in the claim is correct. Depending on the
prediction results, the claim can be flagged for an audit. Feedback
from the audit can be used to update the prediction models in order
to refine the accuracy of those models.
Inventors: Ghani; Rayid; (Evanston, IL); Wallace; Leana R.; (San Francisco, CA); Irish; Michael S.; (San Francisco, CA); Easo; Ajay K.; (Chicago, IL); Kumar; Mohit; (Chicago, IL); Mei; Zhu-Song; (Chicago, IL); Feferman; Dmitriy; (Chicago, IL); Howell; Nicholas F.; (San Marino, CA); Lizardi; Lindsey J.; (State College, PA); Jantzen; Laura J.; (Greenwood Village, CO)
Applicant: Accenture Global Services Limited, Dublin, IE
Assignee: Accenture Global Services Limited, Dublin, IE
Family ID: 42827395
Appl. No.: 14/280065
Filed: May 16, 2014
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
12547110 | Aug 25, 2009 | 8762180
14280065 | |
Current U.S. Class: 705/4
Current CPC Class: G06N 20/00 20190101; G06Q 40/08 20130101; G06Q 10/10 20130101
Class at Publication: 705/4
International Class: G06Q 40/08 20120101 G06Q040/08; G06N 99/00 20060101 G06N099/00
Claims
1. A computer-implemented method comprising: generating training
data for training a predictive model to identify claims as likely
erroneous or as not likely erroneous, the training data including,
for each of a plurality of claims, (i) one or more intrinsic
features that are derived from the claim itself, and (ii) a label
indicating whether the claim is erroneous or not, the training data
further including one or more extrinsic features that are not
derived from the plurality of claims; training the predictive model
using the one or more intrinsic features, the one or more extrinsic
features, and the labels included in the training data; after
training the predictive model, receiving a particular claim;
generating (i) one or more intrinsic features for the particular
claim and (ii) one or more extrinsic features associated with the
particular claim; providing, to the predictive model, (i) the one
or more intrinsic features for the particular claim and (ii) the
one or more extrinsic features associated with the particular
claim; and obtaining, from the predictive model, an indication of
whether the particular claim is likely erroneous or is not likely
erroneous based on providing (i) the one or more intrinsic features
for the particular claim and (ii) the one or more extrinsic
features for the particular claim to the predictive model.
2. The computer-implemented method of claim 1, further comprising
submitting the particular claim to an audit process in response to
obtaining an indication that the particular claim is likely
erroneous.
3. The computer-implemented method of claim 2, further comprising:
receiving feedback associated with the particular claim from the
audit process; and updating the predictive model based on the
received feedback.
4. The computer-implemented method of claim 2, wherein submitting
the particular claim to the audit process includes providing a
description of why the particular claim is likely erroneous.
5. The computer-implemented method of claim 4, wherein the
description includes one or more potential errors in the particular
claim.
6. The computer-implemented method of claim 1, wherein the
indication of whether the particular claim is likely erroneous or
is not likely erroneous includes a probability score that indicates
a probability that the claim is erroneous.
7. The computer-implemented method of claim 1, wherein the training
data for training the predictive model is generated from historical
claim information.
8. The computer-implemented method of claim 1, wherein each of the
one or more extrinsic features included in the training data is
associated with at least one claim from the plurality of
claims.
9. The computer-implemented method of claim 1, wherein each of the
one or more extrinsic features included in the training data is
associated with a patient included in at least one claim from the
plurality of claims.
10. A non-transitory, computer-readable medium storing instructions
operable when executed to cause at least one processor to perform
operations comprising: generating training data for training a
predictive model to identify claims as likely erroneous or as not
likely erroneous, the training data including, for each of a
plurality of claims, (i) one or more intrinsic features that are
derived from the claim itself, and (ii) a label indicating whether
the claim is erroneous or not, the training data further including
one or more extrinsic features that are not derived from the
plurality of claims; training the predictive model using the one or
more intrinsic features, the one or more extrinsic features, and
the labels included in the training data; after training the
predictive model, receiving a particular claim; generating (i) one
or more intrinsic features for the particular claim and (ii) one or
more extrinsic features associated with the particular claim;
providing, to the predictive model, (i) the one or more intrinsic
features for the particular claim and (ii) the one or more
extrinsic features associated with the particular claim; and
obtaining, from the predictive model, an indication of whether the
particular claim is likely erroneous or is not likely erroneous
based on providing (i) the one or more intrinsic features for the
particular claim and (ii) the one or more extrinsic features for
the particular claim to the predictive model.
11. The computer-readable medium of claim 10, the operations
further comprising submitting the particular claim to an audit
process in response to obtaining an indication that the particular
claim is likely erroneous.
12. The computer-readable medium of claim 11, the operations
further comprising: receiving feedback associated with the
particular claim from the audit process; and updating the
predictive model based on the received feedback.
13. The computer-readable medium of claim 11, wherein submitting
the particular claim to the audit process includes providing a
description of why the particular claim is likely erroneous.
14. The computer-readable medium of claim 13, wherein the
description includes one or more potential errors in the particular
claim.
15. The computer-readable medium of claim 10, wherein the
indication of whether the particular claim is likely erroneous or
is not likely erroneous includes a probability score that indicates
a probability that the claim is erroneous.
16. The computer-readable medium of claim 10, wherein the training
data for training the predictive model is generated from historical
claim information.
17. The computer-readable medium of claim 10, wherein each of the
one or more extrinsic features included in the training data is
associated with at least one claim from the plurality of
claims.
18. The computer-readable medium of claim 10, wherein each of the
one or more extrinsic features included in the training data is
associated with a patient included in at least one claim from the
plurality of claims.
19. A system comprising: memory for storing data; and one or more
processors operable to perform operations comprising: generating
training data for training a predictive model to identify claims as
likely erroneous or as not likely erroneous, the training data
including, for each of a plurality of claims, (i) one or more
intrinsic features that are derived from the claim itself, and (ii)
a label indicating whether the claim is erroneous or not, the
training data further including one or more extrinsic features that
are not derived from the plurality of claims; training the
predictive model using the one or more intrinsic features, the one
or more extrinsic features, and the labels included in the training
data; after training the predictive model, receiving a particular
claim; generating (i) one or more intrinsic features for the
particular claim and (ii) one or more extrinsic features associated
with the particular claim; providing, to the predictive model, (i)
the one or more intrinsic features for the particular claim and
(ii) the one or more extrinsic features associated with the
particular claim; and obtaining, from the predictive model, an
indication of whether the particular claim is likely erroneous or
is not likely erroneous based on providing (i) the one or more
intrinsic features for the particular claim and (ii) the one or
more extrinsic features for the particular claim to the predictive
model.
20. The system of claim 19, the operations further comprising
submitting the particular claim to an audit process in response to
obtaining an indication that the particular claim is likely
erroneous.
Description
BACKGROUND
[0001] Processes used by healthcare claims payers are manually
intensive and inconsistently executed, and are subject to error,
fraud, and abuse. As a result, healthcare administrators have
difficulty identifying and preventing claim payment errors.
Currently, about 30 percent of the expense of administering claims
is associated with back-end operations and support, particularly
activities associated with "reworking" claims. That is, a great
deal of expense is associated with auditing claims to identify
payment errors, handling provider and patient complaints when
underpayments are made, and contacting providers and patients to
recover overpayments. These costs are ultimately borne by customers
(both providers and patients), and errors in processing claims can
also result in increasing customer dissatisfaction.
SUMMARY
[0002] Embodiments according to the present invention pertain to an
analytical tool that can be utilized in, for example, the
healthcare industry but can also be applied outside the healthcare
industry. Using healthcare as an example, the analytical tool is
used to address problems with reworking claims, such as but not
limited to payment issues (e.g., overpayment or underpayment of
claims). Importantly, the analytical tool is intended to identify
claims that have a high probability of being problematic so that
those claims can be proactively reconciled, thus avoiding or
reducing the cost and effort of reworking erroneous claims.
[0003] The analytical tool is generally characterized as a
predictive, learning, and real-time system of models. To develop
the tool, information is collected from a number of disparate
sources and transformed into a useful format. The information can
include relatively unstructured text and semantic data collected
from a variety of sources, as well as structured (e.g., numerical,
statistical) data read from standardized claim forms. The
information is analyzed using methods such as segmentation,
classification, etc., to create predictive models that have the
capability to continuously learn--that is, the models can be
continually improved as new information is collected and as more
claims are evaluated.
[0004] In practice, as an example, claims known to have resulted in
a payment error can be analyzed using the inventive analytical
tool, and attributes of those claims can be compared against those
of other claims to identify additional claims that may also result
in a payment error. Those additional claims can be identified to an
auditor (automated or human), so that any errors in the claims can
be corrected before payment; if payment has been made, then errors
can be proactively rectified. Feedback from the auditor is
incorporated into the analytical tool, in this manner refining the
accuracy of the tool for application to subsequent claims.
[0005] By automatically detecting claims that may require rework
(correction or adjustment) in advance, customer relations can be
improved, and administrative efforts and costs can be reduced.
[0006] These and other objects and advantages of the present
invention will be recognized by one skilled in the art after having
read the following detailed description, which is illustrated in
the various drawing figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] The accompanying drawings, which are incorporated in and form
a part of this specification, illustrate embodiments of the
invention and, together with the description, serve to explain the
principles of the invention. Like numbers denote like elements
throughout the drawings and specification.
[0008] FIG. 1 is a block diagram of an example of a computer system
upon which embodiments of the present invention can be
implemented.
[0009] FIG. 2 is an embodiment of a training process that can be
implemented by an analytical tool according to the invention.
[0010] FIG. 3 is an embodiment of a deployment process that can be
implemented by an analytical tool according to the invention.
[0011] FIG. 4 is a block diagram showing an embodiment of a
computer-implemented analytical tool according to the
invention.
[0012] FIGS. 5 and 6 are flowcharts showing steps in
computer-implemented methods for processing claims according to
embodiments of the present invention.
DETAILED DESCRIPTION
[0013] In the following detailed description of embodiments
according to the present invention, numerous specific details are
set forth in order to provide a thorough understanding of those
embodiments. However, it will be recognized by one skilled in the
art that the present invention may be practiced without these
specific details or with equivalents thereof. In other instances,
well-known methods, procedures, components, and circuits have not
been described in detail as not to unnecessarily obscure aspects of
the present invention.
[0014] Some portions of the detailed descriptions which follow are
presented in terms of procedures, logic blocks, processing and
other symbolic representations of operations on data bits within a
computer memory. These descriptions and representations are the
means used by those skilled in the data processing arts to most
effectively convey the substance of their work to others skilled in
the art. In the present application, a procedure, logic block,
process, or the like, is conceived to be a self-consistent sequence
of steps or instructions leading to a desired result. The steps are
those requiring physical manipulations of physical quantities.
Usually, although not necessarily, these quantities take the form
of electrical or magnetic signals capable of being stored,
transferred, combined, compared, and otherwise manipulated in a
computer system.
[0015] It should be borne in mind, however, that all of these and
similar terms are to be associated with the appropriate physical
quantities and are merely convenient labels applied to these
quantities. Unless specifically stated otherwise as apparent from
the following discussions, it is appreciated that throughout the
present application, discussions utilizing the terms such as
"accessing," "combining," "incorporating," "identifying,"
"extracting," "predicting," "deriving," "flagging," "evaluating,"
"updating," "comparing," "applying," "quantifying," "associating"
"selecting" or the like, may refer to the actions and processes of
a computer system, or similar electronic computing device, that
manipulates and transforms data represented as physical
(electronic) quantities within the computer system's registers and
memories into other data similarly represented as physical
quantities within the computer system memories or registers or
other such information storage, transmission or display
devices.
[0016] Embodiments described herein may be discussed in the general
context of computer-executable instructions residing on some form
of computer-usable medium, such as program modules, executed by one
or more computers or other devices. Generally, program modules
include routines, programs, objects, components, data structures,
etc., that perform particular tasks or implement particular
abstract data types. The functionality of the program modules may
be combined or distributed as desired in various embodiments.
[0017] FIG. 1 shows a block diagram of an example of a computer
system 100 upon which the embodiments described herein may be
implemented. In its most basic configuration, the system 100
includes at least one processing unit 102 and memory 104. This most
basic configuration is illustrated in FIG. 1 by dashed line 106.
The system 100 may also have additional features/functionality. For
example, the system 100 may also include additional storage
(removable and/or non-removable) including, but not limited to,
magnetic or optical disks or tape. Such additional storage is
illustrated in FIG. 1 by removable storage 108 and non-removable
storage 120. The system 100 may also contain communications
connection(s) 122 that allow the device to communicate with other
devices.
[0018] Generally speaking, the system 100 includes at least some
form of computer-usable media. Computer-usable media can be any
available media that can be accessed by the system 100. By way of
example, and not limitation, computer-usable media may comprise
computer storage media and communication media.
[0019] Computer storage media includes volatile and nonvolatile,
removable and non-removable media implemented in any method or
technology for storage of information such as computer-readable
instructions, data structures, program modules or other data.
Computer storage media includes, but is not limited to, random
access memory (RAM), read only memory (ROM), electrically erasable
programmable ROM (EEPROM), flash memory or other memory technology,
compact disk ROM (CD-ROM), digital versatile disks (DVDs) or other
optical storage, magnetic cassettes, magnetic tape, magnetic disk
storage or other magnetic storage devices, or any other medium that
can be used to store the desired information and that can be
accessed by the system 100. Any such computer storage media may be
part of the system 100. The memory 104, removable storage 108, and
non-removable storage 120 are all examples of computer storage
media.
[0020] Communication media can embody computer-readable
instructions, data structures, program modules or other data in a
modulated data signal such as a carrier wave or other transport
mechanism and includes any information delivery media. The term
"modulated data signal" means a signal that has one or more of its
characteristics set or changed in such a manner as to encode
information in the signal. By way of example, and not limitation,
communication media includes wired media such as a wired network or
direct-wired connection, and wireless media such as acoustic, radio
frequency (RF), infrared, and other wireless media. Combinations of
any of the above can also be included within the scope of
computer-readable media. The communications connection(s) 122
is/are an example of communication media.
[0021] The system 100 may also have input device(s) 124 such as
keyboard, mouse, pen, voice input device, touch input device, etc.
Output device(s) 126 such as a display, speakers, printer, etc.,
may also be included.
[0022] The system 100 may operate in a networked environment using
logical connections to one or more remote computers. When used in a
networking environment, the system 100 can be connected to the
network through the communication connection(s) 122.
[0023] In the example of FIG. 1, the memory 104 includes
computer-readable instructions, data structures, program modules
and the like associated with an analytics engine 150. However, the
analytics engine 150 may instead reside in any one of the computer
storage media used by the system 100, or may be distributed over
some combination of the computer storage media, or may be
distributed over some combination of networked computers.
[0024] The analytics engine 150 is generally characterized as a
predictive, learning, and real-time system of models. In overview,
the analytics engine (or analytical tool) 150 can be utilized in,
for example, the healthcare industry, including but not limited to
health insurance claim payers, hospitals, and physician groups.
However, embodiments according to the invention are not limited to
healthcare applications. Generally speaking, embodiments of the
invention can be utilized in businesses, industries, and the like
that utilize claims and other types of records, files, and forms
(other than claim forms) on a regular basis. In addition to
healthcare, embodiments according to the invention can be used to
evaluate claims, records, files, and the like that are used for
workman's compensation, property insurance, and casualty insurance,
for example.
[0025] When processing claims, for example, the analytics engine
150 can be used to identify claims that have a high probability of
being problematic so that those claims can be proactively
reconciled, thus avoiding or at least reducing the cost and effort
of reworking erroneous claims. In practice, claims known to include
an error, or to have resulted in an error, can be analyzed using
the analytics engine 150, and attributes of those claims can be
compared against other claims to identify any other claims that may
be associated with the same type of error. As an example, claims
known to have resulted in a payment error can be analyzed using the
analytics engine 150, and attributes of those claims can be used as
the basis for identifying any other claims that may also result in
a payment error. Claims identified as potentially being problematic
can be submitted to an auditor (human or automated), so that any
errors can be corrected before handling of the claim is completed
(e.g., before payment). If the claim has been settled or finalized
(e.g., payment has been made), then errors can be proactively
rectified. Feedback from the auditor is incorporated into the
analytics engine 150, in this manner refining the accuracy of the
tool for subsequent application to other claims.
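The flag-and-audit loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the 0.8 threshold, the claim fields, and the stand-in scoring model are all assumed for the example.

```python
# Minimal sketch of the flag-for-audit loop. The scoring model is a
# stand-in; a deployed system would call the trained prediction models.

AUDIT_THRESHOLD = 0.8  # hypothetical cutoff for "likely erroneous"

def process_claims(model, claims):
    """Route each claim to audit or toward payment based on its score."""
    flagged, passed = [], []
    for claim in claims:
        if model(claim) >= AUDIT_THRESHOLD:
            flagged.append(claim)   # submit to human or automated audit
        else:
            passed.append(claim)    # proceed toward payment
    return flagged, passed

def incorporate_feedback(training_set, claim, audit_says_erroneous):
    """Auditor feedback becomes a new labeled example for retraining."""
    training_set.append((claim, audit_says_erroneous))

# Toy model: treat unusually large billed amounts as suspicious.
toy_model = lambda c: 0.9 if c["billed"] > 10_000 else 0.1
claims = [{"id": 1, "billed": 12_500}, {"id": 2, "billed": 300}]
flagged, passed = process_claims(toy_model, claims)
print([c["id"] for c in flagged])  # prints: [1]
```

The essential point is the last function: audited outcomes flow back into the training set, which is how the models continuously learn.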
[0026] Elements and functionalities associated with the analytics
engine 150 are presented in detail below. The analytics engine 150
can be utilized in a training process (see FIG. 2) and/or in a
deployment process (see FIG. 3).
[0027] FIG. 2 shows an embodiment of a training process 200 that
can be implemented using the analytics engine 150 according to the
present invention. Process 200 can be implemented as
computer-readable instructions stored in a computer usable
medium.
[0028] In the FIG. 2 embodiment, the analytics engine 150 accesses
a claims database 202. The claims database 202 can include paid
and/or unpaid claims, claims that have been audited, claims that
have not yet been audited, claims that have been finalized/settled,
and claims that are pending. In the training process, consideration
of claims that have been audited can be important because the
lessons learned from those claims can be incorporated into the
prediction models 214 that are discussed below.
[0029] In a healthcare implementation, the claims are, in general,
stored in a standardized, computer-usable (computer-readable)
format, such as a format based on Health Care Financing
Administration (HCFA) Form 1500 or Uniform Billing (UB) Form 92.
The information in the claims database 202 may be referred to
herein as "first historical information."
[0030] The analytics engine 150 can also access information such as
unstructured text data 204 and external data 206. Generally
speaking, unstructured text data 204 and external data 206
encompass information not included in the claims database 202. As
will be seen from the discussion below, an objective of the
training process 200 is to develop models that can be used to
predict whether or not a claim contains accurate information--in
general, a purpose of the analytics engine 150 is to identify
potentially problematic claims and intercept those claims before
payment (as noted above, the tool can also be used to identify
potentially problematic claims after payment). A prediction can be
made by correlating information in a claim to information that is
known to be correct and/or information that is known to be
incorrect weighted by other types of information found to be
interesting by virtue of that information's value as an error
predictor or marker. In general, unstructured text data 204 and
external data 206 constitute those "other types of information,"
and as such they include a wide variety of different types of
information.
[0031] Unstructured text data 204 and external data 206 can be
based on, for example, dispute information, provider call
information, recovery information, and audit result information.
Unstructured text data 204 may further include information such as,
but not limited to, doctor's notes, auditor's notes, notes from
customer service calls, etc. External data 206, in general,
includes other information that is not included in the claims and
is not already included in the unstructured text data 204. External
data 206 may include information such as, but not limited to,
Web-based information or information from other sources that are
up-to-date--for example, information that a certain area of a
country or the world is experiencing a flu outbreak may be included
in the external data. External data 206 may also include
information from various public or private databases that may be
freely available or may be sold commercially.
[0032] The types of information included in the unstructured text
data 204 and the external data 206, as well as the information
itself, may be dynamic. For example, information about the
aforementioned flu outbreak may be considered relevant for a period
of time but become less and less relevant, and so may be
weighted less or discarded. Unstructured text data 204 and external
data 206, as well as any other information not included in the
claims database 202 or not derived from the information in the
claims database, may be collectively referred to herein as "second
historical information."
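The time-sensitivity of external data described above can be modeled with a simple decay weight. The half-life and discard cutoff below are invented for illustration; the patent does not specify a particular weighting scheme.

```python
# Sketch of down-weighting time-sensitive external data (e.g., a flu
# outbreak report) as it ages. Half-life and cutoff are assumptions.

HALF_LIFE_DAYS = 30.0   # relevance halves every 30 days (assumed)
DISCARD_BELOW = 0.05    # drop a feature once its weight is negligible

def relevance_weight(age_days):
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def weighted_external_features(features):
    """features: iterable of (name, value, age_days) tuples."""
    out = {}
    for name, value, age_days in features:
        w = relevance_weight(age_days)
        if w >= DISCARD_BELOW:
            out[name] = value * w
    return out

feats = [("flu_outbreak_level", 1.0, 15), ("old_advisory", 1.0, 365)]
print(weighted_external_features(feats))  # the year-old item is dropped
```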
[0033] The information in the data sources--the claims database
202, the unstructured text data 204, and the external data 206--is
used as the basis for identifying features that are expected to be
useful in identifying problematic claims. A "feature," generally
speaking, is a named parameter or variable; a value for the feature
is obtained from or derived from the aforementioned data sources on
a per-claim basis (although a feature may have the same value for a
group of claims).
[0034] Features may be based on the type of entries included in the
data sources --for example, the patient's age, the doctor's name, a
contract number, an insurance code, a monetary amount, and an
address can be read directly from the data sources. Features may
also be derived from the types of information included in the data
sources--for example, the number of days between the time when a
medical procedure was performed and the time when the claim is
submitted, the number of days between claim receipt and data
processing, the procedure code that had the highest billed amount
in a claim, the amount paid for a particular procedure minus the
average amount paid for that procedure over the past six months and
then divided by the standard deviation for the amount paid for that
procedure over the same six months, or the cost of a particular
type of medication averaged over a group of doctors or over a
particular geographical region over a period of time, may be
considered useful features that can be determined using the
information included in the data sources.
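One of the derived features above, the amount paid minus the six-month average divided by the six-month standard deviation, is a standard z-score and is easy to make concrete. The history values below are made up for illustration.

```python
import statistics

# Derived feature: z-score of a paid amount against six months of
# history for the same procedure (sample values are invented).

def paid_amount_zscore(paid, history):
    """history: amounts paid for this procedure over the past six months."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)   # sample standard deviation
    return (paid - mean) / stdev

history = [100.0, 110.0, 90.0, 105.0, 95.0]
print(round(paid_amount_zscore(180.0, history), 2))  # far above typical
```

A large positive score marks a payment that is unusually high for that procedure, exactly the kind of value the models can treat as an error marker.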
[0035] The transformed data 208 include values for the chosen
features. The transformed data 208 may include the claims
themselves (paid and unpaid, audited and not audited, settled and
pending), actual data that is parsed from the aforementioned data
sources, and derived data that is determined by manipulating,
translating, or mining the actual data.
[0036] A variety of analytical techniques can be used, individually
and jointly, to generate the transformed data 208. These techniques
include, but are not limited to, machine learning techniques.
[0037] Machine learning techniques include text mining, data
mining, neural networks, and natural language processing. Examples
of machine learning techniques include, but are not limited to,
linear perception, support vector machines, decision trees, and
Naive Bayes. These techniques can each be used to more quickly
evaluate unstructured data; identify and delineate interesting
segmentations, groupings and patterns in the data; and facilitate
further human understanding and interpretation of large sets of
data. Machine learning techniques can rely on advanced statistical
methods like linear and non-linear regression techniques, as well
as non-statistical methods like decision trees, segmentation,
classification, and neural networks. Using machine learning
techniques, the unstructured data 204 and external data 206 can be
searched to find data that matches or satisfactorily matches a
specified text string, for example.
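As a concrete instance of one classifier family named above, here is a minimal perceptron trained on synthetic claim features. The features (normalized billed amount and days-to-submit) and labels are invented; a real deployment would use whichever of the listed techniques validates best.

```python
# A minimal perceptron, one of the classifier families named above.
# Training data is synthetic: [billed amount, days-to-submit], both
# normalized to [0, 1], with label 1 meaning "erroneous".

def train_perceptron(samples, epochs=20, lr=0.1):
    """samples: list of (feature_vector, label) pairs, label in {0, 1}."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in samples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred           # 0 when correct; +1 or -1 when wrong
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

data = [([0.9, 0.8], 1), ([0.1, 0.2], 0), ([0.8, 0.9], 1), ([0.2, 0.1], 0)]
w, b = train_perceptron(data)
print(predict(w, b, [0.85, 0.9]), predict(w, b, [0.1, 0.1]))  # prints: 1 0
```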
[0038] Essentially, the portion of the process 200 described to
this point constitutes aspects of an ETL (extract, transform, load)
process. Accordingly, disparate data in a variety of different
formats from a variety of different sources can be transformed into
a useful and relatively standardized format.
[0039] In the example of FIG. 2, the transformed data 208 is
separated into training data 210 and validation data 212. The
training data 210 can be used to develop the models 214. The models
214 are essentially qualification or regression models that
implement machine learning techniques to evaluate the transformed
data 208 in order to calculate a probability, and perhaps an
associated confidence level, that a claim contains an error. Once
developed, the models 214 can be validated using the validation
data 212.
[0040] More specifically, the training data 210 is used to develop
correlations between the features included in the transformed data
208 and the likelihood that a particular claim is either correct or
incorrect. Embodiments according to the invention can be used to
detect various types of errors, not just errors associated with
payment/reimbursement. For example, an error in one or more of the
claim attributes can be identified--e.g., an error in a contract
number, procedure code, etc., specified in the claim, as well as
any monetary amount included in the claim may be detected and
identified.
[0041] However, it may be that only a certain characteristic of the
claim is of interest. For example, if the desired goal is accurate
reimbursement of money owed to a provider or patient, then the
claim may be classified as incorrect only if a monetary amount
included in the claim is incorrect, or if some other claim
attribute that affects proper reimbursement is incorrect, or if
some claim attribute that is an accurate predictor of a potential
reimbursement error has a certain value or is itself erroneous.
Thus, in one implementation, the venous types of errors that are
identified can be filtered in order to focus on a particular type
of error (e.g., payment/reimbursement errors).
[0042] Each feature included in the training data 210 can be
appropriately considered and assessed using the various analytical
techniques mentioned above, until one or more models 214 are
produced. In practice, many different models are produced. The
models 214 use the value associated with a particular feature, or
values associated with a particular combination of features, to
calculate a probability that a particular claim is problematic (e.g.,
the claim may contain an error, or may contain information that
results in an error). The models 214 can also be used to calculate
a confidence level associated with the calculated probability. Once
the training data 210 can be satisfactorily predicted using the
models 214--that is, once problematic claims can be identified to a
satisfactory level of confidence--then the validation data 212 can
be used to independently test and verify the accuracy of the
models. Model development is an iterative process between training
and validation that proceeds until the validation data 212 is
satisfactorily predicted.
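The iterative train/validate cycle described in paragraphs [0039]-[0042] can be sketched as follows. This is a non-limiting illustration: the claim record layout, the simple per-procedure-code error-rate "model," and the split fraction are assumptions introduced here for clarity, not part of the disclosure.

```python
# Illustrative sketch of separating transformed claims data into training
# and validation sets, fitting a toy model, and checking its predictions.

def split_data(claims, train_fraction=0.8):
    """Separate transformed claims into training and validation sets."""
    cut = int(len(claims) * train_fraction)
    return claims[:cut], claims[cut:]

def train_model(training_data):
    """Learn, per procedure code, the historical error rate.

    Returns a mapping used later to score unseen claims; it stands in
    for the classification/regression models 214 described in the text.
    """
    counts = {}
    for claim in training_data:
        code = claim["procedure_code"]
        total, errors = counts.get(code, (0, 0))
        counts[code] = (total + 1, errors + (1 if claim["erroneous"] else 0))
    return {code: errors / total for code, (total, errors) in counts.items()}

def validate(model, validation_data, threshold=0.5):
    """Fraction of validation claims whose error status the model predicts."""
    correct = 0
    for claim in validation_data:
        prob = model.get(claim["procedure_code"], 0.0)
        predicted_error = prob >= threshold
        correct += int(predicted_error == claim["erroneous"])
    return correct / len(validation_data)

claims = [
    {"procedure_code": "A1", "erroneous": True},
    {"procedure_code": "A1", "erroneous": True},
    {"procedure_code": "B2", "erroneous": False},
    {"procedure_code": "B2", "erroneous": False},
    {"procedure_code": "A1", "erroneous": True},
]
training, validation = split_data(claims)
model = train_model(training)
accuracy = validate(model, validation)
```

In practice, training and validation would iterate--refitting and re-splitting--until the validation data is satisfactorily predicted, as the paragraph above describes.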
[0043] With reference now to FIG. 3, an embodiment of a deployment
process 300 that can be implemented using the analytics engine 150
according to the present invention is shown. Process 300 can be
implemented as computer-readable instructions stored in a
computer-usable medium.
[0044] In the FIG. 3 embodiment, the analytics engine 150 accesses
a claims database 302. The type of content in the claims database
302 is similar to that of the claims database 202 (FIG. 2), but the
claims in the deployment process 300 are different from, or in
addition to, the claims used in the training process 200.
[0045] In a healthcare claims payer example, the claims database
302 can include claims that have been paid as well as claims that
have not yet been paid. In general, the claims database 302 can
include claims that have been processed/finalized, and claims that
have not yet been processed or that are undergoing processing. More
generally, the analytics engine 150 can be used to evaluate any
instantiated claim.
[0046] The analytics engine 150 can also access information such as
unstructured text data 304 and external data 306 of FIG. 3. The
unstructured text data 304 and external data 306 are, in general,
similar to the respective elements described above in conjunction
with FIG. 2 with regard to the type of content, but the
unstructured text data 304 and external data 306 used in the
deployment process 300 may be different from, or in addition to,
the information used in the training process 200.
[0047] The data from the data sources 302, 304, and 306 of FIG. 3
is transformed into transformed data 308 based on the features
identified during the training process 200 of FIG. 2. In other
words, the transformed data 308 may include the claims themselves,
actual data parsed from the FIG. 3 data sources, and derived data
determined by manipulating, translating, or mining the actual data.
Thus, generally speaking, relatively large and diverse sets of data
(diverse in terms of both content and source) are accommodated and
managed.
[0048] Continuing with reference to FIG. 3, for each claim selected
from the claims database 302, the models 214 are applied to the
corresponding transformed data 308 to calculate a probability that
the claim is incorrect (or, conversely, is correct). In essence,
each evaluated claim is scored. A single score may be assigned to
the claim as a whole, or there may be multiple scores associated
with the claim. In the latter case, for example, one or more
attributes of a claim may be individually scored. In one
embodiment, the claim's score can be compared to a conditional
value such as a specified threshold value; if the score satisfies
the conditional value (e.g., exceeds the threshold), then the claim
can be forwarded to an auditor 312 for further evaluation (e.g.,
audit). The audit can be implemented in an automated fashion (e.g.,
in software) by adhering to pre-established auditing rules.
Alternatively, a human auditor can perform the audit.
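The scoring-and-threshold step of paragraph [0048] might be sketched as below; the threshold value and the claim identifiers are hypothetical, and a deployed system could score individual attributes as well as whole claims.

```python
# Minimal sketch of comparing each claim's error score against a
# conditional value (a threshold) and queueing qualifying claims for audit.

AUDIT_THRESHOLD = 0.7  # hypothetical cutoff; tunable in practice

def flag_for_audit(scored_claims, threshold=AUDIT_THRESHOLD):
    """Return the claims whose error score exceeds the threshold."""
    return [claim for claim, score in scored_claims if score > threshold]

scored = [("claim-001", 0.92), ("claim-002", 0.15), ("claim-003", 0.71)]
to_audit = flag_for_audit(scored)
```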
[0049] As mentioned previously herein, there can be many different
types of models 214. For example, the models 214 may be based on
neural networks, regression logic (e.g., based on linear
regression, logistic regression, polynomial regression, and/or
kernel regression), and decision trees. One or more of the models
214 may be used to assess a claim. The models themselves may weight
the various claim attributes to calculate a result, and the results
from each model may also be weighted.
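The per-model weighting described in paragraph [0049] can be illustrated with a weighted average of several scoring functions. The three functions below are placeholders standing in for trained neural-network, regression, and decision-tree models; their internals and the weights are assumptions made for this sketch.

```python
# Hedged sketch of combining results from multiple models 214, with the
# result from each model weighted before the scores are merged.

def neural_net_score(claim):
    return 0.8  # placeholder for a trained network's output

def regression_score(claim):
    return 0.6  # placeholder for a regression model's output

def decision_tree_score(claim):
    # Placeholder rule: large monetary amounts are treated as risky.
    return 1.0 if claim.get("amount", 0) > 10000 else 0.0

def ensemble_score(claim, weighted_models):
    """Weighted average of the individual model scores."""
    total_weight = sum(w for _, w in weighted_models)
    return sum(model(claim) * w for model, w in weighted_models) / total_weight

models = [(neural_net_score, 0.5), (regression_score, 0.3), (decision_tree_score, 0.2)]
score = ensemble_score({"amount": 12500}, models)
```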
[0050] In the audit, a potentially problematic claim can be
reviewed and a decision made as to whether the claim is in fact
erroneous. Feedback from the auditor 312 can be incorporated into
the models 214 in order to refine the accuracy of those models. For
example, if the claim is indeed erroneous, then the auditor's
feedback can be used to reinforce the accuracy of the models 214;
if the claim is in fact not problematic (although predicted to be)
then the auditor's feedback can be used to refine the models 214 so
that erroneous predictions are less likely to occur during
evaluations of subsequent claims. As shown in FIG. 3, the training
process 200 can be repeated using feedback from the auditor 312 to
refine (update) the existing models 214 or to develop (create) new,
additional models. Generally speaking, the analytics engine 150 can
be continually updated--as new results are generated from the
deployment process 300, the training process 200 can be repeated to
update the models 214. In this manner, the audit process helps
teach the analytics engine 150 by reinforcing correct decisions and
identifying incorrect ones.
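The feedback loop of paragraph [0050] can be sketched as folding confirmed audit verdicts back into the training data and refitting. The record layout, the per-code error-rate model, and the retrain-on-each-batch cadence are illustrative assumptions.

```python
# Sketch of incorporating auditor feedback into the models: audited
# outcomes overwrite the recorded labels, then the model is refit.

def retrain(training_data, audit_feedback):
    """Fold audited outcomes back into the training data and refit.

    audit_feedback maps a claim id to the auditor's confirmed verdict
    (True = erroneous). A toy per-code error-rate model is refit here.
    """
    for claim in training_data:
        if claim["id"] in audit_feedback:
            claim["erroneous"] = audit_feedback[claim["id"]]
    rates = {}
    for claim in training_data:
        code = claim["procedure_code"]
        total, errors = rates.get(code, (0, 0))
        rates[code] = (total + 1, errors + int(claim["erroneous"]))
    return {code: e / t for code, (t, e) in rates.items()}

history = [
    {"id": 1, "procedure_code": "A1", "erroneous": True},
    {"id": 2, "procedure_code": "A1", "erroneous": True},  # predicted error
]
# The auditor determines claim 2 was in fact not problematic:
model = retrain(history, {2: False})
```

Refitting on the corrected labels is how a false positive reduces the model's estimated error rate for that procedure code, making a similar erroneous prediction less likely on subsequent claims.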
[0051] The analytics engine 150 can also be updated because of
temporal changes. For example, payer practices or procedures may
change for some reason--for example, the terms of a contract may
change--and the analytics engine 150 can be updated accordingly.
Furthermore, as experience is gained through the deployment process
300, new claim attributes or features of interest may be identified
and added to the models 214, and new models may be developed.
[0052] In one embodiment, in addition to generating a score for a
claim of interest, the analytics engine 150 also generates comments
or explanations that accompany the claim and are specific to that
claim. The comments may take a variety of forms such as those
described below.
[0053] In one implementation, a human auditor is presented with an
outline (electronic, displayed) version of the claim form, with
potentially problematic items in the claim highlighted in some
manner. For example, potential errors in the claims can be
presented in a different color, or have a different background
color, relative to other items in the claim. The degree of
coloration can be varied to reflect the degree of probability of a
potential error. For example, items that are more likely to be
incorrect can be displayed using a darker or brighter color
relative to items less likely to be incorrect. For example, if the
analytics engine 150 indicates that a monetary amount in the claim
is possibly in error, then in addition to highlighting that amount,
the attribute(s) of the claim that triggered the identification of
that error can also be highlighted to varying degrees.
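One way to realize the graded highlighting of paragraph [0053] is to map each item's error probability to a display color intensity. The breakpoints and color names below are assumptions introduced for illustration, not values from the disclosure.

```python
# Sketch of varying the degree of coloration with the probability of a
# potential error: more-likely errors get a darker/brighter highlight.

def highlight_color(probability):
    """Return a display color whose intensity tracks the error probability."""
    if probability >= 0.9:
        return "red"       # very likely erroneous
    if probability >= 0.6:
        return "orange"    # likely erroneous
    if probability >= 0.3:
        return "yellow"    # possibly erroneous
    return "none"          # below the highlighting threshold

colors = {attr: highlight_color(p)
          for attr, p in {"amount": 0.95, "contract_number": 0.4}.items()}
```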
[0054] In another implementation, the auditor is provided with a
text-based explanation of why a claim, or an attribute of the
claim, may be incorrect. As the number of claims evaluated by the
analytics engine 150 increases, recurring problems/errors can be
identified and uniformly described. In other words, in one
embodiment, a set of standard comments is generated; these standard
comments may each be associated with a respective numerical code or
key that in turn is associated with an appropriate text-based
explanation. These standard comments provide a plain language
explanation of the potential problem with or error in a claim. Once
an error is identified, the set of standard comments can be
automatically reviewed to determine which comment, if any, is
probably the most appropriate, and the selected comment can be
provided to the auditor along with the potentially problematic
claim. While auditing the claim, the auditor can also audit the
appropriateness of the selected comment and provide feedback to the
analytics engine 150 in that regard. The feedback about the comment
can be used to refine the part of the analytics engine 150 that
selects comments so that, in subsequent claim evaluations, the
appropriate comment is more accurately selected.
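The standard-comment mechanism of paragraph [0054]--numerical codes keyed to plain-language explanations--might look like the following. The codes and wording are hypothetical examples, not text from the disclosure.

```python
# Sketch of selecting a standard comment for a flagged claim: each
# recurring error type has a numerical code keyed to a text explanation.

STANDARD_COMMENTS = {
    101: "Billed amount exceeds the contracted rate for this procedure.",
    102: "Procedure code is inconsistent with the listed diagnosis.",
    103: "Contract number does not match the provider on record.",
}

def select_comment(error_type_code):
    """Look up the plain-language explanation for a detected error type."""
    return STANDARD_COMMENTS.get(
        error_type_code, "Flagged for audit; no standard comment matched.")

comment = select_comment(101)
```

The auditor's feedback on whether the selected comment was apt would then be used to refine the selection logic, as the paragraph above describes.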
[0055] The additional information provided to the auditor is not
necessarily limited to an explanation highlighting one or more
attributes of the claim. An explanation can also be associated with
information (e.g., a feature or features) derived from those
attributes. An explanation can also be associated with the claim in
general, in order to broadly characterize the claim; for example,
the explanation may label a claim as an underpayment or
overpayment.
[0056] Thus, generally speaking, auditors can be presented with
information that helps them identify the reason(s) why a claim was
flagged and what aspect(s) of the claim should probably be the
focus of the audit. Such information can be about the claim, about
an attribute of the claim, or about a feature derived from an
attribute of the claim.
[0057] In one embodiment, instead of providing all potentially
problematic claims to the auditors, only certain claims are
forwarded. More specifically, only those claims deemed to be more
important, or the most important, may be provided to auditors.
Different criteria can be applied to identify which claims are the
most important. In one implementation, those claims that would
result in the most improvement to the analytics engine 150 are
deemed to be the most important claims. For example, those claims
that would result in the greatest improvement in prediction
accuracy would be deemed the most important. The most important
claims can be, for example, those claims that represent a large
number of identical or similar claims, or the claims that have the
potential for confusing the analytics engine because the
probability that they are erroneous is very close to the threshold
value mentioned above. Claims that may result in the greatest cost
savings, either directly or indirectly, if correctly identified as
problematic may also be considered important. For example,
substantial cost savings may result if erroneous claims that are
more likely to result in a phone call to a service center are
intercepted and corrected before payment is mailed--by reducing the
number of such claims, perhaps the size of the call center can be
reduced to reduce costs.
[0058] In one embodiment, to identify the most important
problematic claims, a measure of the improvement in the accuracy of
the analytics engine 150, assuming those claims were correctly
predicted, is quantified. For example, for a potentially
problematic claim identified as such, the effect on the analytics
engine 150 can be simulated in the background (e.g., by executing
the training process 200 in parallel with the deployment process
300), with the simulation based on an assumption that the claim at
issue has been properly characterized as correct/incorrect. Claims
that have the largest impact on system accuracy are forwarded to
the auditors. For example, as mentioned above, a measure of
improvement can be calculated per claim; in such an implementation,
only those claims whose measure is greater than a threshold value
are forwarded to auditors.
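The filtering step of paragraph [0058] can be sketched as below. The per-claim improvement values are stand-ins for the background simulation the text describes (re-running the training process under an assumed correct label); the cutoff value is likewise an assumption.

```python
# Sketch of forwarding only the most important flagged claims: each claim
# carries a simulated measure of how much system accuracy would improve if
# it were correctly labeled, and only claims above a cutoff reach auditors.

def most_important(flagged, improvement, cutoff=0.05):
    """Keep claims whose simulated accuracy gain exceeds the cutoff."""
    return [c for c in flagged if improvement[c] > cutoff]

gains = {"claim-A": 0.12, "claim-B": 0.01, "claim-C": 0.07}
forwarded = most_important(["claim-A", "claim-B", "claim-C"], gains)
```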
[0059] In another embodiment, once a properly predicted claim--that
is, a claim that is properly predicted as being correct or, perhaps
of greater interest, a claim that is properly predicted as being
problematic--is identified, an auditor can request additional
claims that are similar to that claim. A similar claim may include
attributes that are, at least to some degree, identical to the
properly predicted claim. If, for example, a particular attribute
or feature of a properly predicted, problematic claim is identified
as a source of error, then an auditor can request other claims that
have the same value for that attribute or feature. An auditor can
alternatively request, for example, other claims that received the
same or similar scores as the processed claim. As noted previously
herein, attributes of a claim can be individually scored;
accordingly, an auditor can request other claims with an attribute
or attributes that received the same or similar scores as the
attribute(s) in the processed claim. Also, as noted previously
herein, additional information (e.g., a standard explanation,
perhaps text-based or identified by a numerical code) can be
associated with a processed claim; accordingly, an auditor can
request other claims associated with the same or similar additional
information as that associated with the processed claim. In
general, an auditor can use one or more attributes of any claim of
interest, and/or the results associated with a processed claim, to
define search criteria that can be used to identify and select
other claims that may be of interest to the auditor. Even more
generally, any information associated with any claim of interest
can be used to search for, identify, and select another claim from
the claims database 202 or 302.
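The similar-claim retrieval of paragraph [0059] amounts to using attributes (or scores, or associated comments) of a claim of interest as search criteria over the claims database. The attribute names and the exact-match criterion below are illustrative assumptions; a real system could also match on ranges of scores.

```python
# Sketch of an auditor's similarity search: return every claim matching
# all of the attribute/value pairs drawn from a processed claim.

def find_similar(claims, criteria):
    """Return claims matching every attribute/value pair in criteria."""
    return [c for c in claims
            if all(c.get(attr) == value for attr, value in criteria.items())]

database = [
    {"id": 1, "procedure_code": "A1", "contract": "C-9"},
    {"id": 2, "procedure_code": "A1", "contract": "C-7"},
    {"id": 3, "procedure_code": "B2", "contract": "C-9"},
]
# A problematic claim had procedure code "A1"; pull other claims sharing it:
similar = find_similar(database, {"procedure_code": "A1"})
```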
[0060] FIG. 4 is a block diagram showing elements of an embodiment
of an analytics engine 150 according to the present invention. In
one embodiment, the analytics engine 150 is implemented as
computer-readable components residing on a computer-usable
medium.
[0061] In the example of FIG. 4, the analytics engine 150 includes
a data extraction module 404 and one or more predictive models 214.
The data extraction module 404 can access the transformed data 308
and extract information associated with each claim being
evaluated.
[0062] Claims selected for evaluation may be paid or unpaid,
audited or not audited, pending or settled; in general, any
instance of a claim, regardless of the claim's status, can be
evaluated. Claims can be selected for evaluation in a variety of
ways. In one implementation, all claims are evaluated. In other
implementations, only selected claims are evaluated. In the latter
implementations, claims can be selected at random, or they can be
selected in response to queries, rules, or other specified
selection (search) criteria. For example, once a claim has been
identified as being potentially problematic, or after such a claim
has been audited and confirmed as being problematic, parameters
based on that claim can be defined to execute a search of the
transformed data 308 (or of the claims database 302) in order to
identify other claims that may be similarly problematic. Examples
of other mechanisms for selecting claims for processing have been
mentioned above. Claims selected as a result of such a search may
bypass the models 214, proceeding directly to an audit stage (e.g.,
auditor 312).
[0063] As described above, the models 214 can be used to predict
whether a claim is correct or incorrect. For example, the models 214
can predict whether a monetary value specified in each claim is
correct. Furthermore, the models 214 can flag a claim for an audit
if the claim is predicted to be incorrect, generate comments
associated with each claim, and highlight aspects of a claim that
may be of particular interest to an auditor, as previously
described herein.
[0064] In one embodiment, the analytics engine 150 also includes a
data transformation module 402 that can be used in the training
process 200 to access the data sources 202, 204, and 206 and
generate the transformed data 208, and that can be used in the
deployment process 300 to access the data sources 302, 304, and 306
and generate the transformed data 308, as previously described
herein. The data transformation module 402 can use, for example,
machine learning techniques.
[0065] As previously described herein, auditing may be performed by
a human. However, in one embodiment, the analytics engine 150 also
includes an auditor 312 that can automatically audit flagged claims
using pre-established auditing rules. In one such embodiment, the
auditor 312 is used to identify flagged claims that may be more
important than other claims, as previously mentioned herein. That
is, the auditor 312 can, in effect, filter flagged claims so that
only selected claims are forwarded to a human auditor. Results from
the auditor 312, or from the audit process in general, can be used
to identify other claims for evaluation. Results from the auditor
312, or from the audit process in general, are also fed back into
the training process 200 to update the models 214 as previously
described herein.
[0066] The analytics engine 150 can provide other functionality in
addition to that just described. For example, the analytics engine
150 can incorporate functionality that permits tracking of the
status of unpaid claims or paid claims that need to be recovered or
adjusted. For example, the analytics engine 150 can incorporate
functionality that allows patients and providers to be
automatically contacted once an erroneous claim is identified--in
other words, the analytics engine can generate and perhaps send
standardized form letters. Also, for example, the analytics engine
150 can provide management and financial reporting
functionality.
[0067] Furthermore, information regarding the type and frequency of
errors can be recorded by the analytics engine 150 so that such
information can be subsequently used to perform root cause analyses
or to spot emerging trends.
[0068] FIGS. 5 and 6 are flowcharts showing embodiments of
computer-implemented methods for processing claims. Although
specific steps are disclosed in the flowcharts, such steps are
examples only. That is, various other steps or variations of the
steps recited in the flowcharts can be performed. The steps in the
flowcharts may be performed in an order different than presented.
Furthermore, the features of the various embodiments described by
the flowcharts can be used alone or in combination with each other.
In one embodiment, the flowcharts are implemented as
computer-readable instructions stored in a computer-usable
medium.
[0069] With reference first to FIG. 5, in block 502, first
historical information associated with a set of claims is accessed.
The first historical information includes attributes of the claims
and information derived from the claim attributes. Specifically,
during a training process, the first historical information
includes information in the claims database 202 of FIG. 2.
[0070] In block 504, second historical information is accessed. The
second historical information includes information in addition to
the aforementioned first historical information. Specifically,
during the training process, the second historical information
includes the unstructured text data 204 and the external data 206
of FIG. 2.
[0071] In block 506, using a training process 200 (FIG. 2) such as
that described above, the first and second historical information
can be combined to develop models 214 that are useful for
predicting whether a claim is correct.
[0072] In block 508, feedback from an audit of the claim is
incorporated into the models 214.
[0073] With reference now to FIG. 6, in block 602, attributes of a
claim are accessed.
[0074] In block 604, the attributes of the claim are evaluated
using the models 214 (FIG. 3) to determine a probability that the
claim is erroneous. For example, a score is calculated for the
claim being evaluated.
[0075] In block 606, if the probability (score) satisfies a
threshold, then the claim is flagged for further evaluation (e.g.,
an audit) to determine whether the claim is indeed erroneous.
[0076] In block 608, in one embodiment, if the claim is flagged for
an audit, then additional information (e.g., a comment or
explanation) is associated with the claim, to facilitate the audit.
The additional information, generally speaking, is used to
highlight in some fashion an attribute or attributes of the claim
that likely triggered the audit. The additional information may be,
for example, a text-based comment or non-textual visual cue (e.g.,
a different color or brightness may be used to highlight an
attribute).
[0077] In block 610, in one embodiment, if the claim is flagged for
an audit, then a measure of improvement to the model 214 is
determined prior to the audit. The audit may only be performed if
the measure satisfies a threshold value. In other words, an effort
can be made to identify important claims, where importance may be
defined in any number of ways, with only the more important claims
being audited.
[0078] In summary, embodiments according to the present invention
provide an automated system/tool that allows seamless integration
of claims data and other data sets including unstructured
text/semantic data from a variety of sources, including Web-based
sources and databases. Using that information, an automated
analysis of claims can be performed to identify problems before the
claim is settled, or to reconcile errors identified after the claim
is settled. Moreover, the system/tool can continually learn from
the results of the claims analysis. In the long run, less rework
will be needed, thereby reducing costs. Also, as accuracy
increases, consumer (patient and provider) satisfaction will
increase. Errors may be unintentional or intentional (e.g.,
fraudulent)--by systematically improving the capability to
accurately identify errors, fraudulent claims can be more readily
identified.
[0079] Although described in the context of insurance claims in the
healthcare industry, embodiments according to the present invention
are not so limited. For example, aspects of the present invention can
be applied to insurance claims in other industries. Also, aspects
of the invention can be applied to records, files, and other types
of forms (other than claim forms) that may be utilized on a regular
basis in various types of industries.
[0080] The foregoing descriptions of specific embodiments according
to the present invention have been presented for purposes of
illustration and description. They are not intended to be
exhaustive or to limit the invention to the precise forms
disclosed, and many modifications and variations are possible in
light of the above teaching. The embodiments were chosen and
described in order to best explain the principles of the invention
and its practical application, to thereby enable others skilled in
the art to best utilize the invention and various embodiments with
various modifications as are suited to the particular use
contemplated. It is intended that the scope of the invention be
defined by the claims appended hereto and their equivalents.
* * * * *