U.S. patent application number 17/493205 was filed with the patent office on 2021-10-04 and published on 2022-01-27 for a system and method for aggregating data from a plurality of data sources.
The applicant listed for this patent is Palantir Technologies Inc. The invention is credited to Eli Bingham, Jasjit Grewal, Engin Ural, and Nicholas White.
Application Number | 17/493205
Publication Number | 20220027426
Filed Date | 2021-10-04
Publication Date | 2022-01-27
United States Patent Application | 20220027426
Kind Code | A1
White; Nicholas; et al. | January 27, 2022
SYSTEM AND METHOD FOR AGGREGATING DATA FROM A PLURALITY OF DATA
SOURCES
Abstract
According to certain aspects, a computer system may be
configured to aggregate and analyze data from a plurality of data
sources. The system may obtain data from a plurality of data
sources, each of which can include various types of data, including
email data, system logon data, system logoff data, badge swipe
data, employee data, job processing data, etc. associated with a
plurality of individuals. The system may also transform data from
each of the plurality of data sources into a format that is
compatible for combining the data from the plurality of data
sources. The system can resolve the data from each of the plurality
of data sources to unique individuals of the plurality of
individuals. The system can also determine an efficiency indicator
based at least in part on a comparison of individuals of the unique
individuals that have at least one common characteristic.
Inventors: White; Nicholas (London, GB); Bingham; Eli (New York, NY); Ural; Engin (Brooklyn, NY); Grewal; Jasjit (Brooklyn, NY)

Applicant: Palantir Technologies Inc. (Denver, CO, US)

Appl. No.: 17/493205
Filed: October 4, 2021
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number | Continued in
16173408 | Oct 29, 2018 | 11138279 | 17493205
14816599 | Aug 3, 2015 | 10198515 | 16173408
14304741 | Jun 13, 2014 | 9105000 | 14816599
61914229 | Dec 10, 2013 | |
International Class: G06F 16/9535 (20060101); G06Q 10/06 (20060101); G06F 16/215 (20060101); G06F 16/242 (20060101); G06F 16/2457 (20060101); G06F 16/34 (20060101)
Claims
1. A computer system comprising: a hardware computer processor
configured to execute code to cause the computer system to: access,
from a first data source, first data items each associated with one
of a plurality of individuals, the first data items indicating one
or more of a badge-in time indicating entrance to a physical
facility or a badge-out time indicating exit from the physical
facility; generate, for each individual, a first summary of first
data items associated with the individual; access, from a second
data source, second data items each associated with one of the
plurality of individuals, the second data items indicating one or
more of system login time, system logout time, VPN login time, or
VPN logout time; generate, for each individual, a second summary of
second data items associated with the individual, wherein at least
the first summary and the second summary are each accessible by the
computer system; determine a first group of unique individuals
each sharing a first correlation between the first summary and the
second summary; determine a second group of unique individuals each
sharing a second correlation between the first summary and the
second summary; generate a first output based at least on first
data items and second data items of individuals in the first group;
generate a second output based at least on first data items and
second data items of individuals in the second group; determine a
first security indicator for the first group relative to the second
group based at least in part on comparison of the first output and
the second output; and generate user interface data configured to
cause display of a user interface on a user computing device, the
user interface including an indication of the first group, the
second group, and the determined first security indicator.
2. The computing system of claim 1, wherein the first output
indicates one or more of an average badge-in time or an average
badge-out time.
3. The computing system of claim 1, wherein the code is further
configured to cause the computer system to: receive, via input from
the user interface, selection of a comparison characteristic;
determine a third group of unique individuals each sharing the
comparison characteristic; generate a third output based at least
on first data items of individuals in the third group; determine a
second security indicator for the first group based at least in
part on comparison of the first output and the third output; and
update the user interface data to indicate the first group, the
third group and the determined second security indicator.
4. The computer system of claim 1, wherein the first common
characteristic comprises badge-in time within a timespan or
badge-out time within the timespan.
5. The computing system of claim 1, wherein the code is further
configured to cause the computer system to: generate a mapping of
unique individuals within each of the first group and the second
group.
6. The computing system of claim 1, wherein the code is further
configured to cause the computer system to: access expected format
information indicating an expected format of data from the first
data source providing the first data items or the second data
source providing the second data items; and detect inconsistencies
in a first format of the data from the first data source or a
second format from the second data source as compared to the
expected format.
7. The computing system of claim 6, wherein the code is further
configured to cause the computer system to: in response to
detection of an inconsistency, obtain the first or second data from
the respective first or second data source such that the first or
second data no longer has the inconsistency.
8. The computing system of claim 7, wherein the code is further
configured to cause the computer system to: update the user
interface data to include an indicator of the inconsistency.
9. The computing system of claim 1, wherein the code is further
configured to cause the computer system to: determine a first file
size for the first data items or second data items; access a
previous file size for a previous version of the first data items
or second data items; detect a discrepancy in size between the
previous file size and the first file size; and in response to
detection of the discrepancy, obtain the first data items or second
data items from respective first or second data sources such that
the first or second data no longer has the discrepancy.
10. A computerized method, performed by a computing system having
one or more hardware computer processors and one or more
non-transitory computer readable storage devices storing software
instructions executable by the computing system to perform the
computerized method comprising: accessing, from a first data
source, first data items each associated with one of a plurality of
individuals, the first data items indicating one or more of a
badge-in time indicating entrance to a physical facility or a
badge-out time indicating exit from the physical facility;
generating, for each individual, a first summary of first data
items associated with the individual; accessing, from a second data
source, second data items each associated with one of the plurality
of individuals, the second data items indicating one or more of
system login time, system logout time, VPN login time, or VPN
logout time; generating, for each individual, a second summary of
second data items associated with the individual, wherein at least
the first summary and the second summary are each accessible by the
computer system; determining a first group of unique individuals
each sharing a first correlation between the first summary and the
second summary; determining a second group of unique individuals
each sharing a second correlation between the first summary and the
second summary; generating a first output based at least on first
data items and second data items of individuals in the first group;
generating a second output based at least on first data items and
second data items of individuals in the second group; determining a
first security indicator for the first group relative to the second
group based at least in part on comparison of the first output and
the second output; and generating user interface data configured to
cause display of a user interface on a user computing device, the
user interface including an indication of the first group, the
second group, and the determined first security indicator.
11. The computerized method of claim 10, wherein the first output
indicates one or more of an average badge-in time or an average
badge-out time.
12. The computerized method of claim 10, further comprising:
receiving, via input from the user interface, selection of a
comparison characteristic; determining a third group of unique
individuals each sharing the comparison characteristic; generating
a third output based at least on first data items of individuals in
the third group; determining a second security indicator for the
first group based at least in part on comparison of the first
output and the third output; and updating the user interface data
to indicate the first group, the third group and the determined
second security indicator.
13. The computerized method of claim 10, wherein the first common
characteristic comprises badge-in time within a timespan or
badge-out time within the timespan.
14. The computerized method of claim 10, further comprising:
generating a mapping of unique individuals within each of the first
group and the second group.
15. The computerized method of claim 10, further comprising:
accessing expected format information indicating an expected format
of data from the first data source providing the first data items
or the second data source providing the second data items; and
detecting inconsistencies in a first format of the data from the
first data source or a second format from the second data source as
compared to the expected format.
16. The computerized method of claim 10, further comprising: in
response to detection of an inconsistency, obtaining the first or
second data from the respective first or second data source such
that the first or second data no longer has the inconsistency.
17. The computerized method of claim 10, further comprising:
updating the user interface data to include an indicator of the
inconsistency.
18. The computerized method of claim 10, further comprising:
determining a first file size for the first data items or second
data items; accessing a previous file size for a previous version
of the first data items or second data items; detecting a
discrepancy in size between the previous file size and the first
file size; and in response to detection of the discrepancy,
obtaining the first data items or second data items from respective
first or second data sources such that the first or second data no
longer has the discrepancy.
Description
INCORPORATION BY REFERENCE TO ANY PRIORITY APPLICATIONS
[0001] Any and all applications for which a foreign or domestic
priority claim is identified in the Application Data Sheet as filed
with the present application are hereby incorporated by reference
under 37 CFR 1.57.
[0002] This application is a continuation of U.S. application Ser.
No. 16/173,408, filed on Oct. 29, 2018, which is a continuation of
U.S. application Ser. No. 14/816,599, filed on Aug. 3, 2015, which
is a continuation of U.S. application Ser. No. 14/304,741, filed
Jun. 13, 2014, now U.S. Pat. No. 9,105,000, which claims the
benefit of U.S. Provisional Application No. 61/914,229, filed Dec.
10, 2013. Each of these applications is hereby incorporated by
reference herein in its entirety.
TECHNICAL FIELD
[0003] The present disclosure relates to systems and techniques for
data integration and analysis. More specifically, the present
disclosure relates to aggregating data from a plurality of data
sources and analyzing the aggregated data.
BACKGROUND
[0004] Organizations and/or companies are producing increasingly
large amounts of data. Data may be stored across various storage
systems and/or devices. Data may include different types of
information and have various formats.
SUMMARY
[0005] The systems, methods, and devices described herein each have
several aspects, no single one of which is solely responsible for
its desirable attributes. Without limiting the scope of this
disclosure, several non-limiting features will now be discussed
briefly.
[0006] In one embodiment, a computer system configured to aggregate
and analyze data from a plurality of data sources comprises: one or
more hardware computer processors configured to execute code in
order to cause the system to: obtain data from a plurality of data
sources, each of the plurality of data sources comprising one or
more of: email data, system logon data, system logoff data, badge
swipe data, employee data, software version data, software license
data, remote access data, phone call data, or job processing data
associated with a plurality of individuals; detect inconsistencies
in formatting of data from each of the plurality of data sources;
transform data from each of the plurality of data sources into a
format that is compatible for combining the data from the plurality
of data sources; associate the data from each of the plurality of
data sources to unique individuals of the plurality of individuals;
and determine efficiency indicators for respective individuals
based at least in part on a comparison of data associated with the
respective individuals and other individuals that have at least one
common characteristic.
[0007] In another embodiment, a computer system configured to
aggregate and analyze data from a plurality of data sources
comprises: one or more hardware computer processors configured to
execute code in order to cause the system to: access data from a
plurality of data sources, each of the plurality of data sources
comprising one or more of: email data, system logon data, system
logoff data, badge swipe data, employee data, software version
data, software license data, remote access data, phone call data,
or job processing data associated with a plurality of individuals;
detect inconsistencies in formatting of datum from respective data
sources; associate respective datum from the plurality of data
sources to respective individuals of the plurality of individuals;
and provide statistics associated with respective individuals based
on data associated with the respective individuals from the
plurality of data sources. In certain embodiments, the code is
further configured to cause the computer system to: transform
respective datum from the plurality of data sources into one or
more formats usable to generate the statistics.
[0008] In yet another embodiment, a non-transitory computer
readable medium comprises instructions for aggregating and
analyzing data from a plurality of data sources that cause a
computer processor to: access data from a plurality of data
sources, each of the plurality of data sources comprising one or
more of: email data, system logon data, system logoff data, badge
swipe data, employee data, software version data, software license
data, remote access data, phone call data, or job processing data
associated with a plurality of individuals; detect inconsistencies
in formatting of datum from respective data sources; associate
respective datum from the plurality of data sources to respective
individuals of the plurality of individuals; and provide statistics
associated with respective individuals based on data associated
with the respective individuals from the plurality of data sources.
In certain embodiments, the instructions are further configured to
cause the computer processor to: transform respective datum from
the plurality of data sources into one or more formats usable to
generate the statistics.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] FIG. 1 is a block diagram illustrating one embodiment of a
data analysis system configured to aggregate and analyze data from
a plurality of data sources.
[0010] FIG. 2 is a block diagram illustrating components of the
data analysis system of FIG. 1, according to one embodiment.
[0011] FIG. 3 is a flowchart illustrating one embodiment of a
process for aggregating and analyzing data from a plurality of data
sources.
[0012] FIG. 4 is a flowchart illustrating one embodiment of a
process for performing a quality reliability test for determining
the reliability of data from one or more of a plurality of data
sources.
[0013] FIG. 5 is a flow diagram illustrating examples of types of
data that can be aggregated and analyzed for employee efficiency
and/or productivity analysis.
[0014] FIG. 6 illustrates an example user interface displaying an
output of a data analysis system.
[0015] FIG. 7 illustrates another example user interface displaying
an output of a data analysis system.
[0016] FIG. 8 is a block diagram illustrating a computer system
with which certain methods discussed herein may be implemented.
DETAILED DESCRIPTION
Overview
[0017] Organizations and/or companies may generate, collect, and
store large amounts of data related to activities of employees.
Such data may be stored across various storage systems and/or
devices and may have different formats. For example, data of an
organization may be stored in different locations (e.g., different
cities, countries, etc.) and different types of media (disk
storage, tapes, etc.). Data may be available in the form of web
services, databases, flat files, log files, etc. Even within the
same organization, the format of various data sources can be
different (e.g., different identifiers can be used to refer to an
employee). Therefore, it may be difficult to query and extract
relevant information from vast amounts of data scattered in
different data sources. Accordingly, there is a need for
aggregating and analyzing data from various data sources in an
efficient and effective way in order to obtain meaningful analysis.
For example, there is a need for improved analysis of data from
multiple data sources in order to track activities of and determine
productivity of employees.
[0018] As disclosed herein, a data analysis system may be
configured to aggregate and analyze data from various data sources.
Such a data analysis system may also be referred to as a "data
pipeline." The data pipeline can accept data from various data
sources, transform and cleanse the data, aggregate and resolve the
data, and generate statistics and/or analysis of the data. The data
analysis system can accept data in different formats and transform
or convert them into a format that is compatible for combining with
data from other data sources. The data analysis system can also
resolve the data from different sources and provide useful analysis
of the data. Because data from any type or number of data sources
can be resolved and combined, the resulting analysis can be robust
and provide valuable insight into activities relating to an
organization or company. For example, a company can obtain analysis
relating to employee email activity, employee efficiency, real
estate resource utilization, etc. As used herein, combining of data
may refer to associating data items from different data sources
without actually combining the data items within a data structure,
as well as storing data from multiple data sources together.
Data Pipeline
[0019] FIG. 1 is a block diagram illustrating one embodiment of a
data analysis system 100 configured to aggregate and analyze data
from a plurality of data sources. The data sources may provide data
in various data formats that are accepted by the data analysis
system 100, such as web services, databases, flat files, log files,
etc. The data can be in various formats. In some instances, the
data may be defined as CSV, in rows and columns, in XML, etc. The
data analysis system 100 can accept data from multiple data sources
and produce statistics and/or analysis related to the data. Some
examples of types of data sources that the data analysis system 100
can accept as input include employee data, email data, phone log
data, email log data, single sign-on (SSO) data, VPN login data,
system logon/logoff data, software version data, software license
data, remote access data, badge swipe data, etc.
[0020] The data analysis system 100 can function as a general
transform system that can receive different types of data as input
and generate an output specified by an organization or company. For
example, the data analysis system 100 can accept employee data and
email data of an organization and output a list of top 10 email
senders or recipients for each employee. The output from the data
analysis system 100 may package the aggregated data in a manner
that facilitates querying and performing analysis of the data. One
type of querying and/or analysis may be operational efficiency
analysis.
[0021] FIG. 2 is a block diagram illustrating components of the
data analysis system 100 of FIG. 1, according to one embodiment.
The data analysis system 200 of FIG. 2 can be similar to the data
analysis system 100 of FIG. 1. The data analysis system 200 can
accept data from multiple data sources and output processed data.
The system 200 can include a data reliability/consistency module
210, a job orchestration module 220, an aggregation/cleansing
module 230, and an analysis module 240. The system 200 can include
fewer or additional modules or components, depending on the
embodiment. One or more modules may be combined or reside on the
same computing device, depending on the embodiment. In certain
embodiments, functions performed by one module in the system 200
may be performed by another module in the system 200.
[0022] The data reliability/consistency module 210 may perform
quality reliability tests on various data sources. The system 200
may have access to information about the format of a data source,
and the data reliability/consistency module 210 can detect whether
the format of the data from the data source is consistent with the
expected format. Performing reliability tests prior to aggregating
the data can ensure that the output and analysis generated by the
system 200 is reliable. Details relating to quality reliability
tests are explained further below.
[0023] The job orchestration module 220 may automate jobs for
aggregating, cleansing, and/or analyzing the data. The job
orchestration module 220 can manage the steps involved in each
process and schedule them. For example, the job orchestration module
220 can define and schedule steps for transforming and resolving
data from multiple data sources. The job orchestration module 220
can define workflows for various processes, e.g., through
coordinated sequences of commands, scripts, jobs, etc. In some
embodiments, the system 200 uses open source tools, such as
Rundeck, Kettle, etc.
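The orchestration role described above can be illustrated with a minimal sketch; the step names and functions below are hypothetical, not taken from the system, and real tools such as Rundeck or Kettle would add scheduling, retries, and logging:

```python
# Minimal sketch of a job orchestrator that runs pipeline steps in a
# defined order, in the spirit of the workflows described in the text.
# Step names and step bodies are hypothetical illustrations.

def run_workflow(steps, data):
    """Run each (name, function) step in sequence, passing data along."""
    for name, step in steps:
        data = step(data)
        print(f"completed step: {name}")
    return data

# A toy two-step workflow: transform (strip whitespace), then resolve
# (deduplicate and order).
workflow = [
    ("transform", lambda d: [r.strip() for r in d]),
    ("resolve",   lambda d: sorted(set(d))),
]

result = run_workflow(workflow, [" b ", "a", "a "])
assert result == ["a", "b"]
```

A real orchestrator would also persist the state of each step so a failed job can be resumed rather than restarted.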
Cleansing/Aggregation
[0024] The aggregation/cleansing module 230 aggregates and/or
cleanses data from various sources. "Cleansing" may refer to
transforming and resolving the data from various sources so that
they can be combined. Data in different sources may not be readily
combined although the data may relate to a common entity (e.g., an
employee or group of employees) and/or same type of information
(e.g., time). For example, the IDs used to identify an employee can
be different from one data source to another. In such case, the IDs
may be mapped so that they can be resolved to specific employees.
The system 200 may identify a standard identifier that can map two
or more employee IDs used by different data sources to a particular
employee. One example of a standard identifier can be the
employee's email address. If both data sources include the
corresponding email addresses for employee IDs, the data from the
two sources can be associated with an employee associated with the
email addresses.
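The ID-resolution step above can be sketched as follows; this is a minimal illustration, with hypothetical source records, of using the email address as the standard identifier that maps source-specific employee IDs to one individual:

```python
# Hypothetical sketch: resolve source-specific employee IDs to a single
# entity, using the email address as the standard identifier (as in the
# example in the text). Record shapes are illustrative assumptions.

def resolve_ids(*sources):
    """Map each source-specific ID to a canonical key (the email address)."""
    mapping = {}
    for records in sources:
        for rec in records:
            # Every record is assumed to carry the employee's email address.
            mapping[rec["employee_id"]] = rec["email"].lower()
    return mapping

badge_data = [{"employee_id": "B-1027", "email": "ana@example.com"}]
sso_data = [{"employee_id": "ana.w", "email": "Ana@example.com"}]

ids = resolve_ids(badge_data, sso_data)
# Both source IDs now resolve to the same individual.
assert ids["B-1027"] == ids["ana.w"] == "ana@example.com"
```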
[0025] In another example, data sources may include timestamp or
time data (e.g., employee badge-in time, employee badge-out time,
etc.), but the time information in data received from different
sources may have the same time reference and, thus, may not be in
the local time zone of any particular employee (except those employees
that may be in the standard time reference used by the
organization). For example, an organization may have locations in
various time zones, but all timestamps may be represented in UTC
(Coordinated Universal Time) or GMT (Greenwich Mean Time) format.
In order to make comparison across different time zones, the time
information may need to be converted or adjusted according to the
local time zone. Each employee's timestamp can be shifted or
adjusted so that the time information reflects the local time. The
aggregation/cleansing module 230 may obtain the local time zone
information based on the employee's city, state, and country
information in order to make the appropriate adjustment. Data for
employees related to time (e.g., time arrived at office, time
checked-out for lunch, etc.) may then be compared in a more
meaningful manner with the time entries converted to represent
local times. The examples above have been explained for
illustrative purposes, and cleansing can include any processing
that is performed on the data from multiple sources in order to
combine them.
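The time-zone adjustment described above can be sketched with the standard library's `zoneinfo` module; the zone name here is a hypothetical stand-in for one that would, per the text, be looked up from the employee's city, state, and country record:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical sketch: shift a UTC badge-in timestamp into an employee's
# local time zone so times can be compared meaningfully across offices.

def to_local(utc_ts: datetime, zone_name: str) -> datetime:
    """Convert an aware UTC timestamp to the given IANA time zone."""
    return utc_ts.astimezone(ZoneInfo(zone_name))

badge_in = datetime(2014, 6, 13, 13, 30, tzinfo=timezone.utc)
local = to_local(badge_in, "America/New_York")
# In June, New York observes UTC-4, so 13:30 UTC is 09:30 local.
assert (local.hour, local.minute) == (9, 30)
```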
[0026] Once the data from various data sources are cleansed, they
can be combined and aggregated. For example, multiple different IDs
can be mapped to the same employee (or other entity) based on a
common or standard identifier such that data using any of those
multiple different IDs can each be associated with the same
employee. The vast amounts of data can be joined and combined so
that they are available for analysis.
[0027] In some embodiments, the data can be imported into and
aggregated using a distributed computing framework (e.g., Apache
Hadoop). A distributed computing framework may allow for
distributed processing of large data sets across clusters of
computers. Such framework may provide scalability for large amounts
of data. Data may be cleansed and/or aggregated using software
that facilitates querying and managing of large data sets in
distributed storage (e.g., Apache Hive).
[0028] The aggregation/cleansing module 230 can generate an output
from the combined data. The output can be defined as appropriate by
the organization or company that is requesting the data analysis.
The output may combine data and provide an intermediate format that
is configured for further analysis, such as querying and/or other
analysis. In one example, employee data and email data are
aggregated in order to provide an intermediate outcome that may be
analyzed to provide insights into employee email activity. For
example, an organization may have about 5 billion emails, but
querying all emails can be slow and may not yield valuable
information. Instead, the aggregation/cleansing module 230 can
aggregate the email data and provide an intermediate output
including data such as a list of top email senders, top email
recipients, top sender domains, top recipient domains, number of
sent emails, number of received emails, etc. for each employee.
This intermediate output, which is a reduced amount of data, may
then be considered in querying. Using the intermediate output, the
aggregation/cleansing module 230 may search for top
sender/recipient domains, top email sending employees, top email
receiving employees, etc. For instance, the aggregation/cleansing
module 230 can search through the intermediate output (which can
include, e.g., top email senders, top email recipients, top sender
domains, top recipient domains, number of sent emails, number of
received emails, etc. for each employee) in order to produce
statistics and/or analysis for the entire organization (e.g.,
information such as top email sending employees, top email
receiving employees, top domains, etc. and/or information output in
user interfaces illustrated in FIGS. 6-7). In this manner, the data
can be analyzed and reduced in a manner that makes it easier for
organizations to ask questions about various aspects of their
operations or business. For instance, a company may be interested
in finding out information on its operational efficiency.
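The intermediate per-employee output described above can be sketched as a single pass over raw email-log rows; the record layout is a hypothetical assumption, and only two of the listed statistics (sent count and top recipient domains) are shown:

```python
from collections import Counter

# Hypothetical sketch of the intermediate output described in the text:
# per-employee counts of sent emails and top recipient domains, reduced
# from raw email-log rows so later queries touch far less data.

def summarize(email_log):
    summaries = {}
    for row in email_log:
        s = summaries.setdefault(row["sender"],
                                 {"sent": 0, "domains": Counter()})
        s["sent"] += 1
        s["domains"][row["recipient"].split("@")[1]] += 1
    return summaries

log = [
    {"sender": "ana@corp.example", "recipient": "bo@ext.example"},
    {"sender": "ana@corp.example", "recipient": "cy@ext.example"},
    {"sender": "ana@corp.example", "recipient": "di@other.example"},
]

out = summarize(log)
assert out["ana@corp.example"]["sent"] == 3
assert out["ana@corp.example"]["domains"].most_common(1) == [("ext.example", 2)]
```

Organization-wide questions (top sending employees, top domains) then reduce to scans over these small summaries rather than over billions of raw messages.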
[0029] In another example, the aggregation/cleansing module 230 can
output the number of emails an employee sends in 15-minute buckets
(e.g., how many emails an employee has sent every 15 minutes). This
output can serve as an intermediate output for determining a
relationship between the number of emails an employee sends and
employee efficiency or productivity. In other embodiments, the
aggregation/cleansing module 230 can generate an output that is not
used as an intermediate output for analysis, but that can be
directly output to the users (e.g., information output in user
interfaces illustrated in FIGS. 6-7). In certain embodiments, the
aggregation/cleansing module 230 may provide combined data to the
analysis module 240, and the analysis module 240 may generate the
intermediate output. For example, the aggregation/cleansing module
230 may resolve and combine the data sets, and the analysis module
240 can generate an intermediate output from the combined data
set.
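The 15-minute bucketing above amounts to truncating each timestamp to the start of its window and counting; a minimal sketch, with hypothetical sample timestamps:

```python
from collections import Counter
from datetime import datetime

# Hypothetical sketch of bucketing sent emails into 15-minute windows,
# as in the intermediate output described in the text.

def bucket_15min(timestamps):
    """Count timestamps per 15-minute window, keyed by window start."""
    counts = Counter()
    for ts in timestamps:
        window = ts.replace(minute=ts.minute - ts.minute % 15,
                            second=0, microsecond=0)
        counts[window] += 1
    return counts

sent = [datetime(2014, 6, 13, 9, 2), datetime(2014, 6, 13, 9, 14),
        datetime(2014, 6, 13, 9, 16)]

buckets = bucket_15min(sent)
# 09:02 and 09:14 fall in the 09:00 window; 09:16 falls in 09:15.
assert buckets[datetime(2014, 6, 13, 9, 0)] == 2
assert buckets[datetime(2014, 6, 13, 9, 15)] == 1
```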
Analysis
[0030] The analysis module 240 can perform analysis based on the
output from the aggregation/cleansing module 230. For example, the
analysis module 240 can perform queries for answering specific
questions relating to the operations of an organization. One
example question may be what are the top domains that send emails
to employees of an organization. The analysis module 240 can search
through the top sender domains for all employees and produce a list
of top sender domains. The analysis module 240 may be an analysis
platform that is built to interact with various outputs from the
data analysis system 200, which users from an organization can use
to obtain answers to various questions relating to the
organization. In some embodiments, the output from the
aggregation/cleansing module 230 may be an intermediate output for
facilitating further analysis. In other embodiments, the output
from the aggregation/cleansing module 230 may be a direct output
from the cleansing and/or aggregating step that does not involve an
intermediate output.
Quality Reliability Tests
[0031] As mentioned above, the system 200 can perform quality
reliability tests to determine the reliability of the data from
various data sources. The data reliability/consistency module 210
of the system 200 can perform the reliability tests. The data
reliability/consistency module 210 can detect inconsistencies
and/or errors in the data from a data source. The data
reliability/consistency module 210 may have access to information
about data from a certain data source, such as the typical size of
the data, typical format and/or structure of the data, etc. The
data reliability/consistency module 210 may refer to the
information about a data source in order to detect inconsistencies
and/or errors in received data.
[0032] The data reliability/consistency module 210 can flag a
variety of issues, including: whether the file size is similar to
the file size of previous version of the data, whether the
population count is similar to the previously received population
count, whether the structure of the data has changed, whether the
content or the meaning of the data has changed, etc. In one
embodiment, population count may refer to the expected number of
items in the data, as opposed to their size. For example, the
data reliability/consistency module 210 can identify that the
content or the meaning of a column may have changed if the
information used to be numeric but now is text, or if a timestamp
used to be in one format but now is in a different format. Large
discrepancies or significant deviations in size and/or format can
indicate that the data was not properly received or pulled.
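The checks described in this paragraph can be sketched as follows. This is an illustrative sketch only; the field names, the expected-profile structure, and the 20% tolerance band are hypothetical assumptions rather than part of the disclosed embodiment.

```python
# Sketch of the quality reliability checks of paragraph [0032].
# Field names and the tolerance value are hypothetical assumptions.

def check_reliability(data, expected, size_tolerance=0.2):
    """Flag deviations of received data from a data source's expected profile.

    `data` holds observed properties of a received file; `expected` holds
    the known-typical values for that source. Returns a list of issues.
    """
    issues = []

    # File size should be close to the file size of a previous version.
    if abs(data["file_size"] - expected["file_size"]) > size_tolerance * expected["file_size"]:
        issues.append("file size deviates significantly from previous version")

    # Population count: the number of items expected, as opposed to their size.
    if abs(data["row_count"] - expected["row_count"]) > size_tolerance * expected["row_count"]:
        issues.append("population count differs from previously received count")

    # Structure: the set of columns should not have changed.
    if data["columns"] != expected["columns"]:
        issues.append("structure of the data has changed")

    # Content/meaning: e.g., a column that used to be numeric is now text.
    for col, col_type in data["column_types"].items():
        if expected["column_types"].get(col) not in (None, col_type):
            issues.append(f"column {col!r} changed type to {col_type}")

    return issues
```

An empty returned list would indicate that the data passed all of the sketched checks; any returned entry corresponds to one of the flagged issue categories above.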
[0033] The data reliability/consistency module 210 can run one or
more tests on data received from a data source. If the data
reliability/consistency module 210 determines that data from a data
source is not reliable, the system 200 may attempt to pull or
receive the data again and run reliability tests until the data is
considered sufficiently reliable. By making sure that the data of a
data source is reliable, the system 200 can prevent introducing
inaccurate data into the analysis further down the process.
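The pull-and-retest loop of this paragraph can be sketched as below. The `pull_data` and `run_reliability_tests` callables and the attempt limit are hypothetical placeholders for a source-specific import routine and the reliability tests described above.

```python
# Sketch of the re-pull loop of paragraph [0033]. The callables and the
# maximum attempt count are hypothetical assumptions.

def obtain_reliable_data(pull_data, run_reliability_tests, max_attempts=3):
    """Re-pull a data source until its reliability tests pass."""
    for attempt in range(1, max_attempts + 1):
        data = pull_data()
        issues = run_reliability_tests(data)
        if not issues:
            return data  # data considered sufficiently reliable
    # Give up rather than introduce inaccurate data further down the process.
    raise RuntimeError(f"data still unreliable after {max_attempts} attempts: {issues}")
```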
[0034] In some embodiments, the data reliability/consistency module
210 may also perform quality reliability tests on aggregated data
to make sure that the output from the aggregation/cleansing module
230 does not have errors or inconsistencies. In one example, the
number of unique employees may be known or expected to be around
250,000. However, if the resolved number of unique employees is
much more or less than the known or expected number, this may
indicate an error with the output. In such case, the data
reliability/consistency module 210 can flag issues with the output
and prevent introduction of error in later steps of the
process.
Employee Email Activity Example
[0035] In one embodiment, the data analysis system 200 aggregates
data relating to employees of an organization, such as email
activity of the employees. For example, an organization may be
interested in finding out about any patterns in employee email
activity and employee efficiency. The data analysis system 200
accepts data from at least two data sources: one data source that
includes employee data and another data source that includes email
data. The email data may be available in the form of email logs
from an email server application (e.g., Microsoft Exchange), for
example, and may include information such as sender, recipient,
subject, body, attachments, etc. The employee data may be accessed
in one or more databases, for example, and may include information
such as employee ID, employee name, employee email address,
etc.
[0036] The data reliability/consistency module 210 can perform
quality reliability tests on data received from each of the data
sources. For example, the data reliability/consistency module 210
can compare the file size of the employee data against the file
size of the previously received version of the employee data, or
the data reliability/consistency module 210 can compare the file
size against an expected size. The data reliability/consistency
module 210 can also check whether the structure of the employee
data has changed (e.g., number and/or format of rows and columns).
The data reliability/consistency module 210 can also run similar
tests for the email data. The data reliability/consistency module
210 can check the file size, structure, etc. If the data
reliability/consistency module 210 determines that the data from a
data source has errors or inconsistencies, the system 200 may try
to obtain the data again. The data reliability/consistency module
210 can run the reliability tests on the newly received data to
check whether it is now free of errors and/or inconsistencies. Once
all (or selected samplings) of the data from the data sources is
determined to be reliable, the system 200 can cleanse and aggregate
the data from the data sources.
[0037] The job orchestration module 220 can schedule the steps
involved in cleansing and aggregating data from the various
sources, such as employee data and email data. For example, the job
orchestration module 220 can define a series of commands to be
performed to import data from multiple data sources and transform
the data appropriately for combining. As explained above, the data
from the data sources can be imported into a distributed computing
framework that can accommodate large amounts of data, such as
Hadoop. The data can be cleansed and aggregated using the
distributed computing framework.
[0038] The aggregation/cleansing module 230 can map the employee
data and the email data by using the employee's email address. The
email data may include emails that have the employee's email
address as either the sender or the recipient, and these emails can
be resolved to the employee who has the corresponding email
address. By mapping the email address to employee ID, the email
data can be resolved to unique individuals. After resolving the
data to unique individuals, the aggregation/cleansing module 230
can aggregate the data and generate an intermediate output that can
be used to perform further analysis (e.g., by the analysis module
240).
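The mapping of this paragraph can be sketched as below: email records are resolved to unique employees through the employee email address, with the address matched against either the sender or the recipient field. The record layout is a hypothetical assumption.

```python
# Sketch of the resolution step of paragraph [0038].
# Field names are hypothetical assumptions.

def resolve_emails_to_employees(employees, emails):
    """Return email records annotated with the matching employee ID."""
    # Index employees by email address for constant-time lookup.
    by_address = {e["email"]: e["employee_id"] for e in employees}

    resolved = []
    for msg in emails:
        # An email is resolved to the employee whose address appears as
        # either the sender or the recipient.
        for role in ("sender", "recipient"):
            emp_id = by_address.get(msg[role])
            if emp_id is not None:
                resolved.append({**msg, "employee_id": emp_id, "role": role})
    return resolved
```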
[0039] Because the size of email data in an organization can be
quite large (e.g., 5 billion emails), analyzing all emails after
they have been resolved to unique individuals may not be the most
efficient way to proceed. Accordingly, the aggregation/cleansing
module 230 may make the amount of data to be analyzed more
manageable by aggregating or summarizing the data set. In a
specific example, the aggregation/cleansing module 230 generates a
list of top email senders and top email recipients for each
employee from all of the emails associated with that employee. For
instance, the aggregation/cleansing module 230 can generate an
output of top email senders in the format "sender name," "sender
domain," and "number of emails from sender" (e.g., "John Doe,"
"gmail.com," "10"). The aggregation/cleansing module 230 can
extract the domain information from the sender email address to
provide the sender domain information. The aggregation/cleansing
module 230 can generate an output of top email recipients for an
employee in a similar manner. The aggregation/cleansing module 230
may also extract file extensions for attachments and generate a
list of types of attachments and counts of attachments for each
employee. The aggregation/cleansing module 230 can generate any
intermediate output of interest for each employee, and such
intermediate output can be used in further analysis by the analysis
module 240. In some embodiments, the intermediate output may be
directly output to the users. For example, the system 200 may
display in the user interface the list of top senders and top
recipients for employees who send and/or receive the largest number of
emails within the organization.
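The per-employee summarization of this paragraph can be sketched as below, producing top senders in the "sender name," "sender domain," "number of emails from sender" format. The input record layout is a hypothetical assumption.

```python
from collections import Counter

# Sketch of the intermediate-output generation of paragraph [0039].
# The input record layout is a hypothetical assumption.

def top_senders(received_emails, n=10):
    """Summarize one employee's received emails into a top-senders list."""
    counts = Counter(
        (msg["sender_name"], msg["sender_address"]) for msg in received_emails
    )
    out = []
    for (name, address), count in counts.most_common(n):
        # The sender domain is extracted from the sender email address.
        domain = address.split("@", 1)[1]
        out.append({"sender name": name, "sender domain": domain,
                    "number of emails from sender": count})
    return out
```

For instance, ten emails from "John Doe" at "john@gmail.com" would yield the entry "John Doe," "gmail.com," "10" described above.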
[0040] The users in an organization can interact with the analysis
module 240 in order to obtain information regarding operational
efficiency. In the above example, the analysis module 240 may
produce a list of employees who send the highest number of emails,
employees who receive the highest number of emails, employees who
receive the highest number of attachments, top domains for all
emails, etc. for the whole organization. The final output can be
based on the intermediate output for each employee. For example,
the top domains for all emails can be determined by querying which
domains are most common in the top sender domains and top recipient
domains for all employees.
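The final-output step of this paragraph can be sketched as below: per-employee intermediate domain counts are combined into an organization-wide ranking. The summary structure is a hypothetical assumption.

```python
from collections import Counter

# Sketch of the top-domains query of paragraph [0040]. The shape of the
# per-employee intermediate output is a hypothetical assumption.

def org_top_domains(per_employee_summaries, n=10):
    """Combine per-employee domain counts into an organization-wide ranking."""
    totals = Counter()
    for summary in per_employee_summaries:
        # Both top sender domains and top recipient domains contribute.
        for domain, count in summary["top_sender_domains"].items():
            totals[domain] += count
        for domain, count in summary["top_recipient_domains"].items():
            totals[domain] += count
    return totals.most_common(n)
```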
[0041] In certain embodiments, the email activity may be compared
with other activities of employees (e.g., loan processing as
explained below) to determine whether certain patterns or trends in
an employee's email activity affect employee efficiency. In some
embodiments, the analysis module 240 can generate an efficiency
indicator for each employee based on the combined and aggregated
data from multiple data sources. An efficiency indicator can
provide information relating to the efficiency of an employee. The
efficiency indicator may be based on some or all aspects of
employee activity.
Loan Processing Example
[0042] In another embodiment, the data analysis system 200 accepts
data from data sources relating to loan processing. For example,
the data analysis system 200 can receive employee data, loan
processing data, and other related information in order to analyze
employee efficiency in processing loans and factors affecting
employee efficiency. The data reliability/consistency module 210
can perform any relevant quality reliability tests on each of the
data sources.
[0043] The aggregation/cleansing module 230 can cleanse and combine
the data from multiple data sources. Any combination of data can be
aggregated. For instance, the employee data and the loan processing
data can be combined with one or more of: software version or
upgrade data, employee arrival/departure time data, training
platform data, etc. In one example, the employee data and the loan
processing data are combined with software version or upgrade data,
and the resulting data may show that employees that have a certain
version of the software are more efficient. In another example, the
employee data and the loan processing data are combined with
employee arrival time, and the resulting data may show that
employees who arrive earlier in the day are more efficient.
[0044] The analysis module 240 may compare a group of individuals
who have one or more common characteristics against each other. For
example, a group may be defined by the same position, manager,
department, location, etc. The members within a group may be
compared to analyze trends. The analysis module 240 may also
compare one group against another group. For instance, one group
may report to one manager, and another group may report to a
different manager. A group of individuals that share a common
characteristic may be referred to as a "cohort." Comparison of
groups or cohorts may reveal information about why one group is
more efficient than another group. For example, the analysis of the
resulting data may show that Group A is more efficient than Group
B, and the software for Group A was recently upgraded whereas the
software for Group B has not been. Such correlation can allow the
organization to decide whether and when to proceed with the
software upgrade for Group B.
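The cohort comparison of this paragraph can be sketched as below: employees are grouped by a shared characteristic (position, manager, department, location, etc.) and a metric is averaged within each cohort so that cohorts can be compared. The field names are hypothetical assumptions.

```python
from collections import defaultdict

# Sketch of the cohort comparison of paragraph [0044].
# Field names are hypothetical assumptions.

def compare_cohorts(employees, key, metric):
    """Average a numeric metric within each cohort defined by `key`."""
    groups = defaultdict(list)
    for emp in employees:
        groups[emp[key]].append(emp[metric])
    return {cohort: sum(vals) / len(vals) for cohort, vals in groups.items()}
```

Comparing the returned averages, e.g. for cohorts keyed by manager or by software version, could surface correlations such as the Group A/Group B upgrade example above.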
Real Estate Resource Example
[0045] In certain embodiments, the data analysis system 200 can
accept data relating to real estate resources of an organization.
The data sources may include information relating to one or more
of: real estate resource data, costs associated with various real
estate resources, state of various real estate resources, employees
assigned to various real estate resources, functions assigned to
various real estate resources, etc. For example, the data analysis
system 200 can accept real estate resource data and employee
activity data from multiple data sources. The data from multiple
data sources can be combined to analyze whether certain real estate
resources can be merged, eliminated, or used to temporarily replace
other resources during emergencies, etc. For example, if there are
multiple locations of an organization within the same city, it may
make sense to merge the offices based on the analysis.
Organizations may also determine which real estate resources can
carry on fundamental business processes during emergencies or
natural disasters, e.g., as part of business continuity planning.
The analysis module 240 can perform such analysis based on the
resulting output from the aggregation/cleansing module 230. In some
embodiments, the aggregation/cleansing module 230 may produce an
intermediate output that can be used by the analysis module
240.
Other Examples
[0046] In some embodiments, employee activity data can be combined
and analyzed to detect any security breaches. The data analysis
system 200 can aggregate employee data relating to badge-in time,
badge-out time, system login/logout time, VPN login/logout time,
etc. in order to identify inconsistent actions. For example, if an
employee badged in at 9:30 am at the office and logged on to the
system at 9:32 am, but there is a VPN login at 9:30 am, the data
analysis system 200 can identify that the VPN login is probably not
by the employee.
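The consistency check of this paragraph can be sketched as below: a VPN login that falls within a time window of an on-site badge-in for the same employee is flagged as probably not by the employee. The one-hour window and record layout are hypothetical assumptions.

```python
from datetime import datetime, timedelta

# Sketch of the inconsistent-action detection of paragraph [0046].
# The window and field names are hypothetical assumptions.

def suspicious_vpn_logins(badge_ins, vpn_logins, window=timedelta(hours=1)):
    """Flag VPN logins occurring near an on-site badge-in by the same employee."""
    flagged = []
    for vpn in vpn_logins:
        for badge in badge_ins:
            if (vpn["employee_id"] == badge["employee_id"]
                    and abs(vpn["time"] - badge["time"]) <= window):
                # The employee was physically in the office around this time,
                # so a concurrent remote login is probably not the employee.
                flagged.append(vpn)
                break
    return flagged
```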
[0047] FIG. 3 is a flowchart illustrating one embodiment of a
process 300 for aggregating and analyzing data from a plurality of
data sources. The process 300 may be implemented by one or more
systems described with respect to FIGS. 1-2 and 8. For illustrative
purposes, the process 300 is explained below in connection with the
system 100 in FIG. 1. Certain details relating to the process 300
are explained in more detail with respect to FIGS. 1-2 and 4-8.
Depending on the embodiment, the process 300 may include fewer or
additional blocks, and the blocks may be performed in an order that
is different than illustrated.
[0048] At block 301, the data analysis system 100 accesses and/or
obtains data from a plurality of data sources. In the examples of
employee monitoring, the data sources can include various data
types, including one or more of: email data, system logon data,
system logoff data, badge swipe data, employee data, software
version data, software license data, remote access data, phone call
data, job processing data, etc. associated with a plurality of
individuals. The type of data source accepted by the system 100 can
include a database, a web service, a flat file, a log file, or any
other format data source. The data in one data source can have a
different format from the data in another data source.
[0049] At block 302, the system 100 performs a quality reliability
test for determining reliability of data from each of the plurality
of data sources. Quality reliability tests may be based on expected
characteristics of data from a particular data source. Each data
source may have a different set of expected characteristics. In one
embodiment, the system 100 detects inconsistencies in formatting of
data from each of the plurality of data sources. In one embodiment,
the system 100 may perform multiple reliability tests on data from
each of the plurality of data sources in order to identify any
errors or inconsistencies in data received from the data sources.
For example, the system 100 can check whether the file size matches
the expected file size, the structure of the data matches the
expected structure, and/or the number of entries matches the expected
number of entries, among other data quality checks. Any significant
deviations may signal problems with a particular data source.
Details relating to performing quality reliability tests are
further explained in connection with FIG. 4.
[0050] At block 303, the system 100 transforms the data into a
format that is compatible for combining and/or analysis. When the
data from different data sources are imported into the system 100,
the data from the data sources may not be in a format that can be
combined. In one example, time information may be available from
multiple data sources, but one time data source can use the
universal time, and another data source can use local time. In such
case, the time data in one of the data sources should be converted
to the format of the other data source so that the combined data
has the same time reference.
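The time conversion of this paragraph can be sketched as below: naive local timestamps from one source are converted to universal time so they share a reference with another source. The fixed local offset and record layout are hypothetical assumptions.

```python
from datetime import datetime, timezone, timedelta

# Sketch of the transformation step of paragraph [0050]. The local offset
# and field names are hypothetical assumptions.

def to_universal_time(records, local_offset=timedelta(hours=-5)):
    """Convert naive local timestamps to universal time for combining."""
    local_tz = timezone(local_offset)
    converted = []
    for rec in records:
        aware = rec["time"].replace(tzinfo=local_tz)  # interpret as local time
        converted.append({**rec, "time": aware.astimezone(timezone.utc)})
    return converted
```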
[0051] At block 304, the system 100 resolves the data from each of
the plurality of data sources to unique individuals. Unique
individuals may be a subset of the plurality of individuals with
whom the data from the plurality of data sources is associated. For
example, some data sources may include information about
individuals who are not employees (e.g., consultants), and such
data may not be resolved to specific employees. The system 100 may
resolve the data from each of the plurality of data sources at
least partly by mapping a column in one data source to a column in
another data source.
[0052] At block 305, the system 100 generates output data
indicating analysis of the resolved data, such as efficiency
indicators that are calculated using algorithms that consider data
of employees gathered from multiple data sources. In one
embodiment, the system 100 determines an efficiency indicator based
at least in part on a comparison of individuals of the unique
individuals that have at least one common characteristic. The at
least one common characteristic can be the same title, same
position, same location, same department, same manager or
supervisor, etc.
[0053] In certain embodiments, the system 100 may generate an
intermediate output based on the resolved data, and the system 100
can determine an efficiency indicator based on the intermediate
output. The intermediate output may be a reduced version of the
resolved data. A reduced version may not contain all of the
resolved data, but may include a summary or aggregation of some of
the resolved data. For example, the reduced version of employee
email data does not contain all employee emails, but can include a
list of top senders and top recipients for each employee.
[0054] In one embodiment, a first data source of the plurality of
data sources includes employee data, and a second data source of
the plurality of data sources includes email data. The system 100
can resolve the data from each of the plurality of data sources by
resolving the employee data and the email data to unique employees.
The efficiency indicator can indicate an efficiency level
associated with an employee out of the unique employees.
[0055] FIG. 4 is a flowchart illustrating one embodiment of a
process 400 for performing a quality reliability test for
determining the reliability of data from one or more of a plurality
of data sources. The process 400 may be implemented by one or more
systems described with respect to FIGS. 1-2 and 8. For illustrative
purposes, the process 400 is explained below in connection with the
system 100 in FIG. 1. Certain details relating to the process 400
are explained in more detail with respect to FIGS. 1-3 and 5-8.
Depending on the embodiment, the process 400 may include fewer or
additional blocks, and the blocks may be performed in an order that
is different than illustrated.
[0056] At block 401, the data analysis system 100 accesses
information associated with a format of data from a data source.
The information may specify the structure of data (e.g., number of
columns, type of data for each column, etc.), expected size of the
data, expected number of entries in the data, etc. For example,
each data source (or set of data sources) may have a different
format.
[0057] At block 402, the system 100 determines whether data from a
data source is consistent with the expected format of data as
indicated in the accessed data format information. For example, the
system 100 can check if the structure of the data is consistent
with the expected format. The system 100 can also check if the size
of the data is similar to the expected size of the data.
[0058] At block 403, if data from a data source is not consistent
with the expected format, size, or other expected characteristic,
the system 100 identifies an inconsistency in the data from the
data source. If the system 100 identifies any inconsistencies, the
system 100 can output indications of the inconsistency in the data
to the user. The system 100 may also attempt to obtain the data
from the data source until the data no longer has
inconsistencies.
[0059] FIG. 5 is a flow diagram illustrating examples of types of
data that can be aggregated and analyzed for employee efficiency
and/or productivity analysis. In one embodiment, the data analysis
system 500 accepts email data 510, building/equipment access data
520, and/or human resources data 530. Based on the imported data,
the data analysis system 500 can produce an employee productivity
report 550. The employee productivity report 550 can be based on
any combination of data from multiple data sources.
[0060] As explained above, the building/equipment access data 520
and human resources data 530 can be cleansed and aggregated to
perform security-based analysis (e.g., are there any suspicious
system logins or remote access). The building/equipment access data
520 and human resources data 530 can also be combined to perform
efficiency analysis (e.g., what are the work hour patterns of
employees and how efficient are these employees). In other
embodiments, the email data 510 can be combined with human
resources data 530 to perform efficiency analysis (e.g., how does
employee email activity affect efficiency). An organization can
aggregate relevant data that can provide answers to specific
queries about the organization. Certain details relating to
analysis of employee efficiency or email activity are explained in
more detail with respect to FIGS. 1-4 and 6-7.
[0061] The employee productivity report 550 can provide a
comparison of an employee to individuals who share common
characteristics. Depending on the embodiment, the employee may be
compared to individuals who have different characteristics (e.g.,
supervisors). The comparison can also be between a group to which
an employee belongs and a group to which an employee does not
belong.
[0062] FIG. 6 illustrates an example user interface 600 displaying
an output of a data analysis system. The data analysis system can
be similar to systems explained in connection with FIGS. 1-2 and 8.
The user interface 600 shows an example of results from employee
email analysis. As illustrated, the user interface 600 includes a
list of top 10 senders 610, a list of top 10 recipients 620, a list
of attachment count 630, and a list of top 10 domains 640.
[0063] The list of top 10 senders 610 can include top 10 employees
of an organization who sent the largest number of emails in a specific
time period. The top 10 senders list 610 can show, for each
employee in the list, the name of the employee, the email address
of the employee, and the total number of emails sent by the
employee. The time period or span for which the list 610 is
generated can vary depending on the requirements of the
organization. For instance, the list 610 may include top 10 senders
for a specific day, week, month, etc.
[0064] The list of top 10 recipients 620 may be similar to the list
of top 10 senders 610. The top 10 recipients list 620 can include
top 10 employees of the organization who received the largest number
of emails in a specific time period. The top 10 recipients list 620
can show, for each employee in the list, the name of the employee,
the email address of the employee, and the total number of emails
received by the employee. The time period or span for which the top
10 recipients list 620 is generated can be the same as the time
period or span for the top 10 senders list 610.
[0065] The list of attachment count 630 can list top employees who
have sent or received the largest number of attachments. The
attachment list 630 can display the name of the employee and the
total number of attachments. The attachment list 630 can provide an
overview of employees who may potentially use a large percentage of
storage resources due to sending and/or receiving of numerous
attachments.
[0066] The list of top 10 domains 640 can show the list of common
domains from which emails are sent to the employees of the
organization or common domains to which the employees send emails.
In the example of FIG. 6, the top domains list 640 lists
"gmail.com" as the top domain, "yahoo.com" as the second domain,
and so forth. Since employees can send many internal emails, the
domain for the organization may not be included in the top domain
list.
[0067] In this manner, the data analysis system can provide an
analysis of certain aspects of employee behavior. The email
activity data may be combined and/or aggregated with other types of
data in order to examine relationships between employee email
activity and other aspects of employee behavior. Such relationships
may provide insights into factors that affect employee
efficiency.
[0068] FIG. 7 illustrates another example user interface 700
displaying an output of a data analysis system. The data analysis
system can be similar to systems explained in connection with FIGS.
1-2 and 8. The user interface 700 shows an example of results from
employee loan processing analysis. As illustrated, the user
interface 700 includes columns for the following information:
employee name 710, employee ID 715, employee position/title 720,
employee location 725, average badge-in time 730, average badge-out
time 735, average number of processed jobs 740, employee efficiency
745, percentage of total emails sent to applicants 750, and average
percentage of total emails sent to applicants for others with the
same title 755.
[0069] Employee name 710 can refer to the name of an employee, and
employee ID 715 can be an identifier that designates a particular
employee. Position/title 720 can refer to an employee's position or
title. The user interface 700 shows two different positions: loan
processor and loan processing supervisor. The location 725 can
refer to the office location of an employee. The user interface 700
shows three different locations: A, B, and C.
[0070] The average badge-in time 730 can refer to the average time
an employee badges in to the office during a period of time. The
average badge-out time 735 can refer to the average time an
employee badges out of the office during a period of time. The
average can be calculated based on badge-in or badge-out times over
a specific period of time, such as several days, a week, several
weeks, a month, etc.
[0071] The average number of processed jobs 740 may refer to the
number of loan jobs an employee processed over a period of time.
The period of time can be determined as appropriate by the
organization (e.g., a week, several weeks, a month, etc.). The time
period over which the number of jobs is averaged may match the time
period used for determining average badge-in time and badge-out
time.
[0072] The employee efficiency 745 may refer to the efficiency
level or indicator associated with an employee. The values shown in
user interface 700 are low, medium, and high, but the efficiency
level can be defined as any metric or scale that the organization
wants.
[0073] The percentage of total emails sent to applicants 750 may
refer to the percentage of emails sent to loan applicants out of
all of the emails sent by an employee. In the user interface 700,
80% of Jane Doe's emails are sent to loan applicants, while 95% of
John Smith's emails are sent to loan applicants. John Doe sends
only 50% of his emails to loan applicants, and Jane Smith sends 85%
of her emails to loan applicants.
[0074] The average percentage of total emails sent to applicants
for others with the same title 755 can refer to the average
percentage for employees that have the same position/title. The
data analysis system can provide a point of comparison with other
employees with respect to a specific attribute or property. In the
example of FIG. 7, the average percentage column provides a point
of comparison for percentage of emails sent to loan applicants with
respect to employees having the same title. The average percentage
for the position of "loan processing supervisor" is 52%, and the
average percentage for the position of "loan processor" is 87%.
This column can provide a point of comparison with other employees
that have the same title. For example, John Doe's percentage of
emails sent to applicants is very low compared to the average
percentage for all employees who are loan processors.
[0075] The user interface 700 also includes a drop-down menu or
button 760 that allows the user to change the comparison group. In
the example of FIG. 7, the comparison group is other employees that
have the same title. The comparison group can be changed by
selecting a different category from the options provided in the
drop-down menu 760. For example, the user can change the comparison
group to employees at the same location or employees at the same
location with the same title. The options in the drop-down menu can
be a list of items or checkboxes. Depending on the embodiment,
multiple items or checkboxes can be selected or checked. The
comparison group can be changed by the user as appropriate, and the
content displayed in the user interface 700 can be updated
accordingly. In some embodiments, the comparison group can have
different attributes from an employee, or can be different from the
group to which an employee belongs. For example, the comparison
group can include employees who have a different position,
employees from a different department, etc.
[0076] The efficiency level or indicator 745 can be based on any
combination of data that may be available to the data analysis
system. As explained above, an efficiency indicator can provide
information relating to one or more aspects of an employee's
efficiency. In one example, the efficiency level can be based on
the average number of processed jobs and the time spent in the
office during a particular period of time. In another example, the
efficiency level can be based on a comparison with other employees.
In FIG. 7, the percentage of emails sent to applicants for an
employee is compared to the average percentage of emails sent to
applicants for employees having the same title. The efficiency
level may incorporate the comparison to others having the same
title. In such case, the efficiency level for John Doe can be very
low since his percentage of emails sent to applicants is far below
the average percentage for employees with the same title.
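One possible form of the indicator described in this paragraph can be sketched as below: an employee's percentage of emails sent to applicants is graded against the average for others with the same title. The thresholds and the low/medium/high scale cutoffs are hypothetical assumptions; the organization can define any metric or scale.

```python
# Sketch of one possible efficiency indicator per paragraph [0076].
# The ratio thresholds are hypothetical assumptions.

def efficiency_level(employee_pct, title_average_pct):
    """Grade an employee relative to the average for the same title."""
    ratio = employee_pct / title_average_pct
    if ratio < 0.75:
        return "low"       # far below peers with the same title
    if ratio <= 1.1:
        return "medium"    # roughly in line with peers
    return "high"          # well above peers
```

Under these assumed thresholds, John Doe's 50% against the 87% loan-processor average would grade as "low," consistent with the example above.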
Implementation Mechanisms
[0077] According to one embodiment, the techniques described herein
are implemented by one or more special-purpose computing devices.
The special-purpose computing devices may be hard-wired to perform
the techniques, or may include circuitry or digital electronic
devices such as one or more application-specific integrated
circuits (ASICs) or field programmable gate arrays (FPGAs) that are
persistently programmed to perform the techniques, or may include
one or more hardware processors programmed to perform the
techniques pursuant to program instructions in firmware, memory,
other storage, or a combination. Such special-purpose computing
devices may also combine custom hard-wired logic, ASICs, or FPGAs
with custom programming to accomplish the techniques. The
special-purpose computing devices may be desktop computer systems,
server computer systems, portable computer systems, handheld
devices, networking devices or any other device or combination of
devices that incorporate hard-wired and/or program logic to
implement the techniques.
[0078] Computing device(s) are generally controlled and coordinated
by operating system software, such as iOS, Android, Chrome OS,
Windows XP, Windows Vista, Windows 7, Windows 8, Windows Server,
Windows CE, Unix, Linux, SunOS, Solaris, Blackberry OS,
VxWorks, or other compatible operating systems. In other
embodiments, the computing device may be controlled by a
proprietary operating system. Conventional operating systems
control and schedule computer processes for execution, perform
memory management, provide file system, networking, and I/O services,
and provide user interface functionality, such as a graphical
user interface ("GUI"), among other things.
[0079] For example, FIG. 8 is a block diagram that illustrates a
computer system 1800 upon which an embodiment may be implemented.
For example, the computing system 1800 may comprise a server
system that accesses law enforcement data and provides user
interface data to one or more users (e.g., executives) that allows
those users to view their desired executive dashboards and
interface with the data. Other computing systems discussed herein,
such as those of the user (e.g., an executive), may include any portion of the
circuitry and/or functionality discussed with reference to system
1800.
[0080] Computer system 1800 includes a bus 1802 or other
communication mechanism for communicating information, and a
hardware processor, or multiple processors, 1804 coupled with bus
1802 for processing information. Hardware processor(s) 1804 may be,
for example, one or more general purpose microprocessors.
[0081] Computer system 1800 also includes a main memory 1806, such
as a random access memory (RAM), cache and/or other dynamic storage
devices, coupled to bus 1802 for storing information and
instructions to be executed by processor 1804. Main memory 1806
also may be used for storing temporary variables or other
intermediate information during execution of instructions to be
executed by processor 1804. Such instructions, when stored in
storage media accessible to processor 1804, render computer system
1800 into a special-purpose machine that is customized to perform
the operations specified in the instructions.
[0082] Computer system 1800 further includes a read only memory
(ROM) 1808 or other static storage device coupled to bus 1802 for
storing static information and instructions for processor 1804. A
storage device 1810, such as a magnetic disk, optical disk, or USB
thumb drive (Flash drive), etc., is provided and coupled to bus
1802 for storing information and instructions.
[0083] Computer system 1800 may be coupled via bus 1802 to a
display 1812, such as a cathode ray tube (CRT) or LCD display (or
touch screen), for displaying information to a computer user. An
input device 1814, including alphanumeric and other keys, is
coupled to bus 1802 for communicating information and command
selections to processor 1804. Another type of user input device is
cursor control 1816, such as a mouse, a trackball, or cursor
direction keys for communicating direction information and command
selections to processor 1804 and for controlling cursor movement on
display 1812. This input device typically has two degrees of
freedom in two axes, a first axis (e.g., x) and a second axis
(e.g., y), that allows the device to specify positions in a plane.
In some embodiments, the same direction information and command
selections as cursor control may be implemented via receiving
touches on a touch screen without a cursor.
[0084] Computing system 1800 may include a user interface module to
implement a GUI that may be stored in a mass storage device as
executable software code that is executed by the computing
device(s). This and other modules may include, by way of example,
components, such as software components, object-oriented software
components, class components and task components, processes,
functions, attributes, procedures, subroutines, segments of program
code, drivers, firmware, microcode, circuitry, data, databases,
data structures, tables, arrays, and variables.
[0085] In general, the word "module," as used herein, refers to
logic embodied in hardware or firmware, or to a collection of
software instructions, possibly having entry and exit points,
written in a programming language, such as, for example, Java, Lua,
C or C++. A software module may be compiled and linked into an
executable program, installed in a dynamic link library, or may be
written in an interpreted programming language such as, for
example, BASIC, Perl, or Python. It will be appreciated that
software modules may be callable from other modules or from
themselves, and/or may be invoked in response to detected events or
interrupts. Software modules configured for execution on computing
devices may be provided on a computer readable medium, such as a
compact disc, digital video disc, flash drive, magnetic disc, or
any other tangible medium, or as a digital download (and may be
originally stored in a compressed or installable format that
requires installation, decompression or decryption prior to
execution). Such software code may be stored, partially or fully,
on a memory device of the executing computing device, for execution
by the computing device. Software instructions may be embedded in
firmware, such as an EPROM. It will be further appreciated that
hardware modules may be comprised of connected logic units, such as
gates and flip-flops, and/or may be comprised of programmable
units, such as programmable gate arrays or processors. The modules
or computing device functionality described herein are preferably
implemented as software modules, but may be represented in hardware
or firmware. Generally, the modules described herein refer to
logical modules that may be combined with other modules or divided
into sub-modules despite their physical organization or storage.
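As an illustrative sketch only (not part of any claimed embodiment), the "module" concept described above can be shown in an interpreted language such as Python: a unit of logic with entry and exit points that is callable from other modules and may be invoked in response to detected events. The module name, function names, and badge-swipe payload below are hypothetical, chosen to echo the badge swipe data mentioned elsewhere in this disclosure.

```python
# badge_module.py -- a hypothetical software module with defined
# entry points, callable from other modules or invoked on events.

_handlers = []  # callbacks registered by other (hypothetical) modules


def register(callback):
    """Entry point: allow another module to subscribe to badge events."""
    _handlers.append(callback)


def process_swipe(employee_id: str) -> dict:
    """Entry point: normalize one badge-swipe record and notify any
    registered handlers (event-style invocation of other modules)."""
    record = {"employee_id": employee_id, "source": "badge"}
    for handler in _handlers:
        handler(record)  # invoke subscriber modules on this event
    return record


# Another "module" subscribes, then the entry point fires on an event.
seen = []
register(seen.append)
result = process_swipe("E-1001")
```

Such a module could equally be compiled and linked into an executable program or embedded in firmware, as the paragraph above notes; the interpreted form is shown only because it is the most compact to sketch.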
[0086] Computer system 1800 may implement the techniques described
herein using customized hard-wired logic, one or more ASICs or
FPGAs, firmware and/or program logic which in combination with the
computer system causes or programs computer system 1800 to be a
special-purpose machine. According to one embodiment, the
techniques herein are performed by computer system 1800 in response
to processor(s) 1804 executing one or more sequences of one or more
instructions contained in main memory 1806. Such instructions may
be read into main memory 1806 from another storage medium, such as
storage device 1810. Execution of the sequences of instructions
contained in main memory 1806 causes processor(s) 1804 to perform
the process steps described herein. In alternative embodiments,
hard-wired circuitry may be used in place of or in combination with
software instructions.
The terms "non-transitory media," and similar terms, as used
herein, refer to any media that store data and/or instructions
that cause a machine to operate in a specific fashion. Such
non-transitory media may comprise non-volatile media and/or
volatile media. Non-volatile media includes, for example, optical
or magnetic disks, such as storage device 1810. Volatile media
includes dynamic memory, such as main memory 1806. Common forms of
non-transitory media include, for example, a floppy disk, a
flexible disk, hard disk, solid state drive, magnetic tape, or any
other magnetic data storage medium, a CD-ROM, any other optical
data storage medium, any physical medium with patterns of holes, a
RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip
or cartridge, and networked versions of the same.
[0088] Non-transitory media is distinct from but may be used in
conjunction with transmission media. Transmission media
participates in transferring information between non-transitory
media. For example, transmission media includes coaxial cables,
copper wire and fiber optics, including the wires that comprise bus
1802. Transmission media can also take the form of acoustic or
light waves, such as those generated during radio-wave and
infra-red data communications.
[0089] Various forms of media may be involved in carrying one or
more sequences of one or more instructions to processor 1804 for
execution. For example, the instructions may initially be carried
on a magnetic disk or solid state drive of a remote computer. The
remote computer can load the instructions into its dynamic memory
and send the instructions over a telephone line using a modem. A
modem local to computer system 1800 can receive the data on the
telephone line and use an infra-red transmitter to convert the data
to an infra-red signal. An infra-red detector can receive the data
carried in the infra-red signal and appropriate circuitry can place
the data on bus 1802. Bus 1802 carries the data to main memory
1806, from which processor 1804 retrieves and executes the
instructions. The instructions received by main memory 1806 may
optionally be stored on storage device 1810 either before or after
execution by processor 1804.
[0090] Computer system 1800 also includes a communication interface
1818 coupled to bus 1802. Communication interface 1818 provides a
two-way data communication coupling to a network link 1820 that is
connected to a local network 1822. For example, communication
interface 1818 may be an integrated services digital network (ISDN)
card, cable modem, satellite modem, or a modem to provide a data
communication connection to a corresponding type of telephone line.
As another example, communication interface 1818 may be a local
area network (LAN) card to provide a data communication connection
to a compatible LAN (or a WAN component to communicate with a WAN).
Wireless links may also be implemented. In any such implementation,
communication interface 1818 sends and receives electrical,
electromagnetic or optical signals that carry digital data streams
representing various types of information.
[0091] Network link 1820 typically provides data communication
through one or more networks to other data devices. For example,
network link 1820 may provide a connection through local network
1822 to a host computer 1824 or to data equipment operated by an
Internet Service Provider (ISP) 1826. ISP 1826 in turn provides
data communication services through the world wide packet data
communication network now commonly referred to as the "Internet"
1828. Local network 1822 and Internet 1828 both use electrical,
electromagnetic or optical signals that carry digital data streams.
The signals through the various networks and the signals on network
link 1820 and through communication interface 1818, which carry the
digital data to and from computer system 1800, are example forms of
transmission media.
[0092] Computer system 1800 can send messages and receive data,
including program code, through the network(s), network link 1820
and communication interface 1818. In the Internet example, a server
1830 might transmit a requested code for an application program
through Internet 1828, ISP 1826, local network 1822 and
communication interface 1818.
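As an illustrative sketch only (not part of any claimed embodiment), the two-way exchange described above, in which a server transmits requested data through a network to computer system 1800, can be modeled over the loopback interface using Python's standard socket library. The payload and function names are hypothetical; no real network link, ISP, or server 1830 is involved.

```python
import socket
import threading


def serve_once(server_sock: socket.socket, payload: bytes) -> None:
    """Accept a single connection and transmit the payload,
    playing the role of the remote server in the example above."""
    conn, _ = server_sock.accept()
    with conn:
        conn.sendall(payload)


def fetch(host: str, port: int) -> bytes:
    """Open a two-way connection and read the stream until the peer
    closes it, analogous to receiving digital data through a
    communication interface."""
    chunks = []
    with socket.create_connection((host, port)) as conn:
        while chunk := conn.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)


# Demonstration entirely on the loopback interface.
server = socket.socket()
server.bind(("127.0.0.1", 0))  # the OS assigns a free port
server.listen(1)
port = server.getsockname()[1]
t = threading.Thread(target=serve_once, args=(server, b"requested code"))
t.start()
data = fetch("127.0.0.1", port)
t.join()
server.close()
```

The received bytes could then be executed or stored for later execution, as the following paragraph describes.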
[0093] The received code may be executed by processor 1804 as it is
received, and/or stored in storage device 1810, or other
non-volatile storage for later execution.
[0094] Each of the processes, methods, and algorithms described in
the preceding sections may be embodied in, and fully or partially
automated by, code modules executed by one or more computer systems
or computer processors comprising computer hardware. The processes
and algorithms may be implemented partially or wholly in
application-specific circuitry.
[0095] The various features and processes described above may be
used independently of one another, or may be combined in various
ways. All possible combinations and subcombinations are intended to
fall within the scope of this disclosure. In addition, certain
method or process blocks may be omitted in some implementations.
The methods and processes described herein are also not limited to
any particular sequence, and the blocks or states relating thereto
can be performed in other sequences that are appropriate. For
example, described blocks or states may be performed in an order
other than that specifically disclosed, or multiple blocks or
states may be combined in a single block or state. The example
blocks or states may be performed in serial, in parallel, or in
some other manner. Blocks or states may be added to or removed from
the disclosed example embodiments. The example systems and
components described herein may be configured differently than
described. For example, elements may be added to, removed from, or
rearranged compared to the disclosed example embodiments.
[0096] Conditional language, such as, among others, "can," "could,"
"might," or "may," unless specifically stated otherwise, or
otherwise understood within the context as used, is generally
intended to convey that certain embodiments include, while other
embodiments do not include, certain features, elements and/or
steps. Thus, such conditional language is not generally intended to
imply that features, elements and/or steps are in any way required
for one or more embodiments or that one or more embodiments
necessarily include logic for deciding, with or without user input
or prompting, whether these features, elements and/or steps are
included or are to be performed in any particular embodiment.
[0097] Any process descriptions, elements, or blocks in the flow
diagrams described herein and/or depicted in the attached figures
should be understood as potentially representing modules, segments,
or portions of code which include one or more executable
instructions for implementing specific logical functions or steps
in the process. Alternate implementations are included within the
scope of the embodiments described herein in which elements or
functions may be deleted, executed out of order from that shown or
discussed, including substantially concurrently or in reverse
order, depending on the functionality involved, as would be
understood by those skilled in the art.
[0098] It should be emphasized that many variations and
modifications may be made to the above-described embodiments, the
elements of which are to be understood as being among other
acceptable examples. All such modifications and variations are
intended to be included herein within the scope of this disclosure.
The foregoing description details certain embodiments of the
invention. It will be appreciated, however, that no matter how
detailed the foregoing appears in text, the invention can be
practiced in many ways. As is also stated above, it should be noted
that the use of particular terminology when describing certain
features or aspects of the invention should not be taken to imply
that the terminology is being re-defined herein to be restricted to
including any specific characteristics of the features or aspects
of the invention with which that terminology is associated. The
scope of the invention should therefore be construed in accordance
with the appended claims and any equivalents thereof.
* * * * *