U.S. patent application number 15/639557, filed on 2017-06-30, was published by the patent office on 2018-09-20 for predictive modeling from distributed datasets.
This patent application is currently assigned to Microsoft Technology Licensing, LLC. The applicant listed for this patent is Microsoft Technology Licensing, LLC. Invention is credited to Melissa E. Chase, Ran Gilad-Bachrach, Kim Henry Martin Laine, Kristin Estella Lauter.
Application Number | 20180268283 15/639557 |
Family ID | 63519484 |
Publication Date | 2018-09-20 |
United States Patent
Application |
20180268283 |
Kind Code |
A1 |
Gilad-Bachrach; Ran; et al. |
September 20, 2018 |
Predictive Modeling from Distributed Datasets
Abstract
Techniques for using data sets for a predictive model are
described. According to various implementations, techniques
described herein enable different data sets to be used to generate
a predictive model, while minimizing the risk that individual data
points of the data sets will be exposed by the predictive model.
This aids in protecting individual privacy (e.g., protecting
personally identifying information for individuals), while enabling
robust predictive models to be generated using data sets from a
variety of different sources.
Inventors: |
Gilad-Bachrach; Ran (Hogla, IL); Laine; Kim Henry Martin (Seattle, WA);
Chase; Melissa E. (Seattle, WA); Lauter; Kristin Estella (Redmond, WA) |
Applicant: |
Name | City | State | Country | Type |
Microsoft Technology Licensing, LLC | Redmond | WA | US | |
Assignee: |
Microsoft Technology Licensing, LLC (Redmond, WA) |
Family ID: |
63519484 |
Appl. No.: |
15/639557 |
Filed: |
June 30, 2017 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
62472962 | Mar 17, 2017 | |
Current U.S.
Class: |
1/1 |
Current CPC
Class: |
G06F 17/11 20130101;
G06N 3/04 20130101; G06N 20/00 20190101; G06N 3/084 20130101; H04L
9/085 20130101; G06K 9/6261 20130101; G06F 16/27 20190101; G06F
21/6254 20130101; G06F 17/18 20130101; G06N 5/04 20130101; G06N
5/003 20130101 |
International
Class: |
G06N 3/04 20060101
G06N003/04; G06N 3/08 20060101 G06N003/08; G06F 17/30 20060101
G06F017/30 |
Claims
1. A system comprising: at least one processor; and one or more
computer-readable storage media including instructions stored
thereon that, responsive to execution by the at least one
processor, cause the system to perform operations including:
calculating a gradient value based on a data set applied to a data
model, the gradient value including a weight value calculated for
the data model; communicating the gradient value to an external
service; receiving an average gradient value from the external
service; applying the average gradient value to the data model; and
obtaining, based on ascertaining that a termination criterion
occurs, a predictive model that represents a trained version of the
data model.
2. A system as recited in claim 1, wherein said calculating
comprises using a backpropagation procedure to train the data model
using the data set.
3. A system as recited in claim 1, wherein said calculating
comprises: dividing the data set into a set of mini-batches; and
calculating the gradient value using a particular mini-batch of the
set of mini-batches.
4. A system as recited in claim 1, wherein said calculating
comprises: dividing the data set into a set of mini-batches; and
calculating the gradient value using a particular mini-batch of the
set of mini-batches, wherein the termination criterion comprises
determining that each mini-batch of the set of mini-batches is
evaluated to generate a respective gradient value.
5. A system as recited in claim 1, wherein said applying comprises
applying the average gradient value to update a weight value of the
data model.
6. A system as recited in claim 1, wherein the predictive model
comprises a neural network trained using the average gradient
value.
7. A system as recited in claim 1, wherein the operations further
include: applying a set of input data to the predictive model;
ascertaining an output of the predictive model; and performing an
action based on the output of the predictive model.
8. A computer-implemented method, comprising: receiving multiple
gradient values from multiple different source systems; generating
an average gradient value from the multiple gradient values; adding
a noise term to the average gradient value to generate a noisy
gradient average; communicating the noisy gradient average to the
multiple different source systems; and obtaining a predictive model
trained using the noisy gradient average.
9. A method as described in claim 8, wherein said adding the noise
term comprises adding a Laplace-distributed random number to the
average gradient value to generate the noisy gradient average.
10. A method as described in claim 8, wherein said adding the noise
term comprises performing a garbled circuits protocol using the
average gradient value.
11. A method as described in claim 8, wherein the predictive model
comprises a neural network trained using the noisy gradient
average.
12. A computer-implemented method, comprising: calculating a
gradient value based on a data set applied to a data model;
generating a perturbed gradient value based on the gradient value
and a perturbation value; communicating the perturbed gradient
value to a first host system; communicating the perturbation value
to a second host system; receiving an average gradient value from
one or more of the first host system or the second host system, the
average gradient value calculated based on the perturbed gradient
value and the perturbation value; applying the average gradient
value to the data model; and obtaining a predictive model that
represents a trained version of the data model, the data model
trained at least in part using the average gradient value.
13. A method as described in claim 12, wherein said calculating
comprises applying backpropagation to the data model and using the
data set to calculate the gradient value.
14. A method as described in claim 12, wherein said calculating
comprises: dividing the data set into a set of mini-batches; and
calculating the gradient value using a particular mini-batch of the
set of mini-batches.
15. A method as described in claim 12, wherein said generating the
perturbed gradient value comprises generating the perturbation
value as a random vector, and adding the random vector to the
gradient value to generate the perturbed gradient value.
16. A method as described in claim 12, wherein said applying
comprises applying a weight value from the average gradient value
to the data model.
17. A method as described in claim 12, wherein said obtaining is
performed in response to ascertaining that a termination criterion
occurs.
18. A method as described in claim 12, wherein the average gradient
value is calculated using a garbled circuits protocol.
19. A method as described in claim 12, wherein the predictive model
comprises a neural network trained using the average gradient
value.
20. A method as described in claim 12, further comprising: applying
a set of input data to the predictive model; ascertaining an output
of the predictive model; performing an action based on the output
of the predictive model.
Description
RELATED APPLICATION
[0001] This application claims priority to U.S. provisional
application No. 62/472,962, filed on 17 Mar. 2017 and titled
"Predictive Modeling," the disclosure of which is incorporated by
reference in its entirety herein.
BACKGROUND
[0002] Today's era of "big data" includes different data systems
with access to tremendous amounts of data of a variety of different
types, such as consumer data, educational data, medical data,
social networking data, and so forth. This data can be processed in
various ways and utilized for different useful purposes.
Educational data, for instance, can be analyzed to identify
different trends and outcomes in educational processes to optimize
those processes. Medical data can be analyzed to identify
predictive indicators of different medical conditions. Protecting
privacy of individuals associated with data, however, is of
paramount importance.
SUMMARY
[0003] This Summary is provided to introduce a selection of
concepts in a simplified form that are further described below in
the Detailed Description. This Summary is not intended to identify
key features or essential features of the claimed subject matter,
nor is it intended to be used as an aid in determining the scope of
the claimed subject matter.
[0004] Techniques for using data sets for a predictive model are
described. According to various implementations, techniques
described herein enable different data sets to be used to generate
a predictive model, while minimizing the risk that individual data
points of the data sets will be exposed by the predictive model or
by the process of generating it. This aids in protecting individual
privacy (e.g., protecting personally identifying information for
individuals), while enabling robust predictive models to be
generated using data sets from a variety of different sources.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] The detailed description is described with reference to the
accompanying figures. In the figures, the left-most digit(s) of a
reference number identifies the figure in which the reference
number first appears. The use of the same reference numbers in
different instances in the description and the figures may indicate
similar or identical items. Identical numerals followed by
different letters in a reference number may refer to different
instances of a particular item.
[0006] FIG. 1 is an illustration of an environment in an example
implementation that is operable to employ techniques discussed
herein.
[0007] FIG. 2 depicts an example implementation scenario for a high
level overview of predictive model training in accordance with one
or more implementations.
[0008] FIG. 3 depicts an example implementation scenario for
predictive model training using distributed hosts in accordance
with one or more implementations.
[0009] FIG. 4 is a flow diagram that describes steps in a method
for enabling a predictive model to be generated in accordance with
one or more implementations.
[0010] FIG. 5 is a flow diagram that describes steps in a method
for generating a predictive model in accordance with one or more
implementations.
[0011] FIG. 6 is a flow diagram that describes steps in a method
for enabling a predictive model to be generated using multiple
hosts in accordance with one or more implementations.
[0012] FIG. 7 is a flow diagram that describes steps in a method
for enabling a predictive model to be generated using multiple
hosts in accordance with one or more implementations.
[0013] FIG. 8 is a flow diagram that describes steps in a method
for utilizing a predictive model in accordance with one or more
implementations.
[0014] FIG. 9 illustrates an example system and computing device as
described with reference to FIG. 1, which are configured to
implement the techniques described herein.
DETAILED DESCRIPTION
[0015] Techniques for using data sets for a predictive model are
described. Generally, a predictive model represents a collection of
evaluable conditions to which a data set can be applied to
determine a possible, predicted outcome. In at least one
implementation, a predictive model is a neural network.
[0016] According to various implementations, techniques described
herein enable different data sets to be used to generate a
predictive model, while minimizing the risk that individual data
points of the data sets will be exposed by the predictive model.
This aids in protecting individual privacy (e.g., protecting
personally identifying information for individuals), while enabling
robust predictive models to be generated using data sets from a
variety of different sources.
[0017] In example implementations, different data sources with
different data sets use their respective data sets as training sets
to train a data model. As part of the training, the data sources
obtain gradient values and submit the gradient values to an
external system that processes the gradient values to determine
optimal ways for training the data model to generate a predictive
model, e.g., a trained neural network. The external system, for
example, determines average gradient values based on a collection
of gradient values from different data sources. Further, the
external system adds noise to the average gradient values to avoid
directly or inferentially exposing information about individual
data points of the local data sets. The noisy gradient values are
used to further train the data model and generate a trained
predictive model.
[0018] According to various implementations, data sets used to
generate a predictive model can be very large. Thus, techniques
described herein enable local data sources that maintain the data
sets to perform various local computations on their large data sets
to generate gradient values. The gradient values can then be
communicated to an external system that uses the gradient values to
calculate optimum gradient values and add noise to the optimum
gradient values for generating a predictive model that protects
individual data points from exposure outside their respective data
sets.
[0019] Thus, techniques described herein protect individual and
group privacy by reducing the likelihood that individual records of
a data set will be exposed when generating a predictive model using
the data set. Further, computational and network resources are
conserved by enabling local data sources to perform computations of
gradients based on their own respective data sets, and enabling an
external system to use the gradients to generate a predictive model
based on the different data sets. The external system, for example,
need not process entire large data sets, but can perform various
calculations described herein using smaller data sets that
summarize the larger data sets.
[0020] In the following discussion, an example environment is first
described that is operable to employ techniques described herein.
Next, some example implementation scenarios are described in
accordance with one or more implementations. Following this, some
example procedures are described in accordance with one or more
implementations. Finally, an example system and device are
described that are operable to employ techniques discussed herein
in accordance with one or more implementations. Consider now an
example environment in which example implementations may be
employed.
[0021] FIG. 1 is an illustration of an environment 100 in an
example implementation that is operable to employ techniques for
using data sets for a predictive model described herein. Generally,
the environment 100 includes various devices, services, and
networks that enable data communication via a variety of different
modalities. For instance, the environment 100 includes source
systems 102 and a host system 104 connected to a network 106.
Generally, the source systems 102 represent different data sources
that can provide data for generating predictive models. The source
systems 102 include various instances of information systems that
collect and aggregate different types of data, such as medical
information (e.g., patient records, medical statistics, and so
forth) from medical institutions, education information from
educational institutions, consumer information from enterprise
entities, government information from governmental entities, social
networking information regarding users of different social
networking platforms, and so on. The source systems 102 may be
implemented in various ways, such as servers, server systems,
distributed computing systems (e.g., cloud servers), corpnets, and
so on. Examples of different implementations of the source systems
102 are described below with reference to the example system
900.
[0022] The source systems 102 include data sets 108 and local
computation modules ("local modules") 110. The data sets 108
represent sets of different types of data, examples of which are
described above. Generally, each of the source systems 102
aggregates and maintains its own respective data set 108. The local
modules 110 represent functionality for performing different sets
of computations on the data sets 108 as well as other types of
data. As further detailed herein, some forms of computation can be
performed locally by the local modules 110, while others can be
performed at the host system 104.
[0023] The host system 104 is representative of functionality to
perform various computations outside of the context of the source
systems 102. For instance, the host system 104 can receive data
from the source systems 102, and can perform different calculations
using the data. Accordingly, the host system 104 includes a
multiparty computation module ("multiparty module") 112, which in
turn includes a privacy module 114. In accordance with
implementations for using data sets for a predictive model
described herein, the multiparty module 112 represents
functionality for performing various calculations on data received
from the source systems 102 to generate predictive models 116.
Generally, the predictive models 116 represent statistical models
that are generated based on attributes of the data sets 108 and
that can be used to predict various outcomes dependent on input
data values. In at least one implementation, the predictive models
116 represent different instances of a neural network.
[0024] As further detailed below, cooperation between the source
systems 102 and the host system 104 enables various attributes of
the different data sets 108 to be used to generate the predictive
models 116, while protecting the raw data from an individual data
set 108 from being exposed (e.g., directly or inferred) across the
different source systems 102. This enables multiple data sets 108
to be used to generate an individual predictive model 116, thus
increasing the robustness and accuracy of the individual predictive
model 116, while protecting a data set 108 from one source system
102 from being exposed to a different source system 102.
[0025] The network 106 is representative of a network that provides
the source systems 102 and the host system 104 with connectivity to
various networks and/or services, such as the Internet. The network
106 may be implemented via a variety of different connectivity
technologies, such as broadband cable, digital subscriber line
(DSL), wireless cellular, wireless data connectivity (e.g.,
WiFi.TM.), T-carrier (e.g., T1), Ethernet, and so forth. In at
least some implementations, the network 106 represents different
interconnected wired and wireless networks.
[0026] While the source systems 102 and the host system 104 are
depicted as being remote from one another, it is to be appreciated
that in one or more implementations, one or more of the source
systems 102 and the host system 104 may be implemented as part of a
single, multifunctional system to perform various aspects of using
data sets for a predictive model described herein. For instance, in
some implementations, the host system 104 can be implemented as a
secure hardware environment that is local to a particular source
system 102, but that is protected from tampering by functionalities
outside of the secure hardware environment.
[0027] Having described an example environment in which the
techniques described herein may operate, consider now a discussion
of some example implementation scenarios for using data sets for a
predictive model in accordance with one or more implementations.
The implementation scenarios may be implemented in the environment
100 discussed above, the system 900 described below, and/or any
other suitable environment.
[0028] FIG. 2 depicts an example implementation scenario 200 which
represents a high level overview of predictive model training in
accordance with one or more implementations. The scenario 200
includes various entities and components introduced above with
reference to the environment 100.
[0029] In the scenario 200, the host system 104 distributes an
initial model 202 separately to a source system 102a and a source
system 102b. Generally, the initial model 202 represents a starting
data model that is subsequently trained according to techniques
described herein to generate a predictive model. The source systems
102a, 102b represent separate sources of a data set 108a and a data
set 108b, respectively. In at least one implementation, the data
sets 108a, 108b represent different respective sets of data of a
same type. For instance, the data sets 108a, 108b can include
medical data, education data, enterprise data, and so forth.
[0030] Continuing with the scenario 200, local modules 110a, 110b
of the source systems 102a, 102b each perform training operations
204a, 204b on their respective instances of the initial model 202
and using their respective data sets 108a, 108b to generate
respective gradient values 206a, 206b. Generally, the training
operations 204a, 204b can be performed in a variety of different
ways for training a neural network. In this particular example, the
training operations 204a, 204b represent a backpropagation
technique that is applied to the initial models 202 using
mini-batches 208a, 208b of the respective data sets 108a, 108b.
Consider, for example, that the data sets 108a, 108b represent
collections of data records, such as patient records from medical
data. Accordingly, the mini-batches 208a, 208b represent subsets of
the collections of data records. As further detailed below,
generating a trained data model can be implemented as an iterative
process with each iteration using a different mini-batch 208a, 208b
of the data sets 108a, 108b.
[0031] The gradient values 206a, 206b generally represent
respective gradients of a loss function utilized as part of the
training operations 204a, 204b. Proceeding with the scenario 200,
the source systems 102a, 102b communicate their respective gradient
values 206a, 206b to the multiparty module 112, which processes the
gradients 206a, 206b to generate an average gradient 210. An
averaging function, for instance, is applied to the gradients 206a,
206b to generate the average gradient 210. The privacy module 114
then processes the average gradient 210 to generate a noisy
gradient 212. For example, the privacy module 114 adds noise to the
average gradient 210 to generate the noisy gradient 212. Generally,
adding noise to the average gradient 210 reduces a likelihood that
actual data values from the data sets 108a, 108b can be discovered
or inferred from the noisy gradient 212.
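A minimal sketch of the averaging and noise steps performed at the host follows. The function names are hypothetical, and the noise distribution is an assumption: this paragraph does not fix one, so the sampler is passed in as a parameter and a zero-mean Gaussian is used purely as a placeholder.

```python
import random
from typing import Callable, List, Sequence


def average_gradients(gradients: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of gradient vectors received from the source systems."""
    if not gradients:
        raise ValueError("need at least one gradient")
    dim = len(gradients[0])
    if any(len(g) != dim for g in gradients):
        raise ValueError("all gradients must have the same dimension")
    return [sum(g[i] for g in gradients) / len(gradients) for i in range(dim)]


def add_noise(gradient: Sequence[float], sample_noise: Callable[[], float]) -> List[float]:
    """Add an independent noise sample to every coordinate of the gradient."""
    return [g + sample_noise() for g in gradient]


# Hypothetical gradient values from two source systems.
gradient_a = [0.2, -0.4, 1.0]
gradient_b = [0.4, 0.0, -1.0]

average_gradient = average_gradients([gradient_a, gradient_b])

rng = random.Random(0)
# Placeholder noise: zero-mean Gaussian with a small standard deviation (assumption).
noisy_gradient = add_noise(average_gradient, lambda: rng.gauss(0.0, 0.01))
```

Passing the sampler as an argument keeps the averaging logic independent of whichever noise distribution an implementation selects.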
[0032] The multiparty module 112 then communicates the noisy
gradient 212 separately to the source systems 102a, 102b. The local
modules 110a, 110b on the source systems 102a, 102b utilize the
noisy gradient 212 to perform a training iteration on the initial
model 202 to generate an updated model 214. According to various
implementations, this process is repeated (e.g., for each of the
mini-batches 208a, 208b) until all of the data sets 108a, 108b have
been evaluated to generate a predictive model 116. Generally, the
predictive model 116 represents an optimized version of the initial
model 202 that can be evaluated with a set of input data to
generate a predicted outcome value or set of values. The predictive
model 116 may be generated at the host system 104, and/or
individually at the source systems 102a, 102b.
[0033] FIG. 3 depicts an example implementation scenario 300 which
represents predictive model training using distributed hosts in
accordance with one or more implementations. The scenario 300, for
instance, represents a variation on the scenario 200 described
above.
[0034] The scenario 300 includes a host system 104a and a host
system 104b, which represent different instances of the host system
104 introduced above. Generally, the host systems 104a, 104b
represent individual autonomous systems that are able to
communicate with one another to perform various aspects of
techniques described herein, but that are also able to protect
certain data from being accessible across the host systems 104a,
104b.
[0035] Similarly to the scenario 200, the source systems 102a, 102b
start with an initial model 302 and calculate respective gradients
304a, 304b based on their respective data sets 108a, 108b. As
mentioned above, the gradients 304a, 304b can be calculated using a
backpropagation technique that is applied to the initial models
302a, 302b using the mini-batches 208a, 208b of the respective data
sets 108a, 108b.
[0036] In the scenario 300, however, the source systems 102a, 102b
use a secret sharing technique to further enhance the security and
privacy aspects of techniques for using data sets for a predictive model described herein.
Accordingly, the source system 102a calculates a perturbation value
306a and generates a perturbed gradient 308a which represents the
gradient 304a+perturbation value 306a. In at least one
implementation, the perturbation value 306a represents a random
vector with the same dimensions as the gradient 304a. The source
system 102a then communicates the perturbed gradient 308a to the
host system 104a, and the perturbation value 306a to the host
system 104b.
[0037] Similarly, the source system 102b calculates a perturbation
value 306b and generates a perturbed gradient 308b which represents
the gradient 304b+perturbation value 306b. The source system 102b
then communicates the perturbed gradient 308b to the host system
104a, and the perturbation value 306b to the host system 104b.
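The split performed by each source system can be sketched as below. The function name `split_gradient` and the `scale` parameter are hypothetical. The sketch uses floating-point values and a uniform random vector for readability, which is an assumption; a deployed secret sharing scheme would more plausibly work over fixed-point values in a finite ring so that the perturbed vector reveals nothing about the gradient.

```python
import random
from typing import List, Sequence, Tuple


def split_gradient(
    gradient: Sequence[float], rng: random.Random, scale: float = 1000.0
) -> Tuple[List[float], List[float]]:
    """Return (perturbed gradient, perturbation value) for one source system.

    The perturbation is a random vector with the same dimensions as the gradient.
    """
    perturbation = [rng.uniform(-scale, scale) for _ in gradient]
    perturbed = [g + p for g, p in zip(gradient, perturbation)]
    return perturbed, perturbation


rng = random.Random(1)
gradient = [0.5, -0.25, 2.0]
perturbed, perturbation = split_gradient(gradient, rng)

# The perturbed gradient goes to one host and the perturbation to the other.
# Only a party holding both shares can recover the gradient.
recovered = [pg - p for pg, p in zip(perturbed, perturbation)]
```

Neither share on its own is meant to be informative; in this floating-point sketch that property holds only approximately.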
[0038] Continuing with the scenario 300, the host systems 104a,
104b sum the values that they've received from the respective
source systems 102a, 102b. The host system 104a, for instance, sums
the perturbed gradients 308a, 308b to generate a gradient sum 310.
The host system 104a then adds noise to the gradient sum 310 to
generate a noisy gradient sum 312.
[0039] Further, the host system 104b sums the perturbation values
306a, 306b to generate a perturbation sum 314. The host system 104b
then adds noise to the perturbation sum 314 to generate a noisy
perturbation sum 316.
[0040] The host systems 104a, 104b then engage in a cooperative
protocol 318 using the noisy gradient sum 312 and the noisy
perturbation sum 316 to generate a noisy gradient 320. The
cooperative protocol 318, for instance, represents a secure
computation procedure performed between the host systems 104a,
104b. In one example implementation, the cooperative protocol 318
is implemented as a garbled circuit protocol using the noisy
gradient sum 312 and the noisy perturbation sum 316 as inputs to
generate the noisy gradient 320. Generally, the noisy gradient 320
represents an average of the gradients 304a, 304b, recovered from the
perturbed gradients 308a, 308b, with noise added to the data.
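The arithmetic behind the two-host scenario can be sketched as follows. The function names are hypothetical. As a loudly stated assumption, the cooperative protocol is replaced here by a plain subtraction in a single process: this shows why the perturbations cancel, but it provides none of the protection of a garbled circuit protocol, in which neither host would see the other's input.

```python
from typing import List, Sequence


def host_sum(vectors: Sequence[Sequence[float]], noise: Sequence[float]) -> List[float]:
    """Sum the vectors one host received, then add that host's noise."""
    return [sum(column) + n for column, n in zip(zip(*vectors), noise)]


def combine(
    noisy_gradient_sum: Sequence[float],
    noisy_perturbation_sum: Sequence[float],
    num_sources: int,
) -> List[float]:
    """Stand-in for the cooperative protocol: subtract the sums and average."""
    return [
        (gs - ps) / num_sources
        for gs, ps in zip(noisy_gradient_sum, noisy_perturbation_sum)
    ]


# Hypothetical gradients and perturbation values from two source systems.
gradients = [[1.0, 2.0], [3.0, 6.0]]
perturbations = [[10.0, -5.0], [-4.0, 7.0]]
perturbed = [[g + p for g, p in zip(gv, pv)] for gv, pv in zip(gradients, perturbations)]

# With zero noise, the result is exactly the average of the original gradients.
zero = [0.0, 0.0]
exact = combine(host_sum(perturbed, zero), host_sum(perturbations, zero), 2)

# With per-host noise, the result is that average shifted by the noise terms.
noisy = combine(
    host_sum(perturbed, [0.5, 0.0]), host_sum(perturbations, [0.0, 0.5]), 2
)
```

Because each perturbation appears once in each host's sum, the subtraction removes it, leaving the sum of the underlying gradients plus whatever noise the hosts added.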
[0041] Accordingly, the noisy gradient 320 is communicated to the
source systems 102a, 102b, which use the noisy gradient 320 to
update the initial model 302 to generate an updated model 322.
Generally, this process is performed iteratively until a
termination criterion is reached, such as when all of the
mini-batches 208a, 208b have been evaluated, to obtain the
predictive model 116. Thus, the scenario 300 illustrates that
distributed calculations can be utilized to further enhance
security of techniques for using data sets for a predictive model described herein.
[0042] Having discussed some example implementation scenarios,
consider now a discussion of some example procedures in accordance
with one or more implementations.
[0043] The following discussion describes some example procedures
for using data sets for a predictive model in accordance with one
or more implementations. The example procedures may be employed in
the environment 100 of FIG. 1, the system 900 of FIG. 9, and/or any
other suitable environment. The procedures, for instance, represent
example procedures for performing the implementation scenarios
described above. In at least some implementations, the steps
described for the various procedures are implemented automatically
and independent of user interaction.
[0044] FIG. 4 is a flow diagram that describes steps in a method in
accordance with one or more implementations. The method describes
an example procedure for enabling a predictive model to be
generated in accordance with one or more implementations.
[0045] Step 400 calculates a gradient value based on a data set
applied to an initial data model. In at least one implementation, a
source system 102 calculates the gradient value using
backpropagation with a data set 108 and an initial model as input.
As described above, the data set 108 may be divided into
mini-batches, and thus a particular gradient value can be
calculated for a discrete mini-batch.
[0046] Step 402 communicates the gradient value to an external
service. A source system 102, for instance, communicates the
gradient value to the host system 104.
[0047] Step 404 receives an average gradient value from the
external service. For example, a source system 102 receives the
average gradient value from the host system 104. Generally, the
average gradient value represents an average of multiple gradient
values received from multiple different source systems 102 and
based on multiple different data sets 108. Further, the average
gradient value is a noisy gradient, i.e., a raw average gradient
value to which a noise term has been added.
[0048] Step 406 applies the average gradient value to the initial
data model. A local module 110 at a source system 102, for
instance, applies the average gradient value to an initial model to
generate an updated model. For example, the average gradient value
is used to update one or more weight and bias values for the
initial model 202.
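One way to picture the update in step 406 is an ordinary gradient descent step. This is an assumption: the text says only that the average gradient value is used to update weight and bias values, so the update rule, the `learning_rate` parameter, and the function name below are illustrative.

```python
from typing import List, Sequence


def apply_gradient(
    weights: Sequence[float], average_gradient: Sequence[float], learning_rate: float
) -> List[float]:
    """Move each weight against the received average gradient (assumed rule)."""
    if len(weights) != len(average_gradient):
        raise ValueError("weights and gradient must have the same dimension")
    return [w - learning_rate * g for w, g in zip(weights, average_gradient)]


initial_weights = [1.0, -2.0]
received_average_gradient = [0.5, -1.0]  # hypothetical value from the host system
updated_weights = apply_gradient(initial_weights, received_average_gradient, 0.5)
```

Because every source system receives the same average gradient value, applying the same rule keeps their local copies of the model in step.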
[0049] Step 408 ascertains whether a termination criterion occurs.
Generally, a termination criterion represents an event that
indicates whether an iterative process of training the data model
is to terminate. In at least one implementation, the termination
criterion represents an indication that a set number of
mini-batches 208 have been evaluated according to the process
described above. In another example implementation, the termination
criterion represents an indication that a specified number of
iterations through the process have been performed. In another
example implementation, the termination criterion represents an
indication that the trained model did not significantly change for
the last few iterations. In another example implementation, the
termination criterion represents an indication that the accuracy of
the model, as tested on some validation set, did not improve, or
even deteriorated, over the last few iterations.
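Two of the termination criteria listed above, a fixed iteration budget and a lack of improvement in validation accuracy, can be combined in a small check such as the one below. The function name, the `patience` parameter, and the choice of three evaluations for "the last few iterations" are assumptions made for illustration.

```python
from typing import Sequence


def should_terminate(
    iteration: int,
    max_iterations: int,
    validation_accuracy: Sequence[float],
    patience: int = 3,
) -> bool:
    """Return True when training should stop.

    Stops when the iteration budget is spent, or when none of the last
    `patience` accuracy values beats the best value seen before them.
    """
    if iteration >= max_iterations:
        return True
    if len(validation_accuracy) > patience:
        best_before = max(validation_accuracy[:-patience])
        if max(validation_accuracy[-patience:]) <= best_before:
            return True
    return False


still_improving = should_terminate(5, 10, [0.5, 0.6, 0.7, 0.8, 0.9])
stalled = should_terminate(5, 10, [0.5, 0.8, 0.7, 0.8, 0.79])
budget_spent = should_terminate(10, 10, [])
```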
[0050] If the termination criterion does not occur ("No"), the
process returns to step 400 where additional gradient values are
calculated and used to update the data model.
[0051] If the termination criterion occurs ("Yes"), step 410
obtains a predictive model that represents a trained version of the
initial data model. The predictive model, for instance, represents
a neural network whose weights and biases have been trained
according to techniques for using data sets for a predictive model described herein. In at least
one implementation, the predictive model can be generated locally
at a source system 102 using noisy gradient values obtained from the
host system 104. Alternatively or additionally, the predictive
model can be received from the host system 104. Generally, the
predictive model can be used for various purposes, such as
predicting an outcome based on a set of input values.
[0052] FIG. 5 is a flow diagram that describes steps in a method in
accordance with one or more implementations. The method describes
an example procedure for generating a predictive model in
accordance with one or more implementations.
[0053] Step 500 receives multiple gradient values from multiple
different source systems. The host system 104, for instance,
receives gradient values from multiple different source systems
102.
[0054] Step 502 generates an average gradient value from the
multiple gradient values. Each of the multiple gradient values, for
instance, is a different value, e.g., a different gradient of a
loss function calculated at a respective source system 102. Thus,
the host system 104 averages the different gradient values to
obtain an average gradient value.
[0055] Step 504 adds a noise term to the average gradient value to
generate a noisy gradient average. In at least one implementation,
the noise term is random noise, such as a Laplace-distributed
random number, that is added to the average gradient value. In at
least one implementation, the
noisy gradient average can be calculated via interaction between
multiple hosts, such as discussed with reference to the scenario
300. For instance, the noisy gradient average can be calculated via
a garbled circuits protocol performed between the host systems
104a, 104b.
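As a minimal single-host sketch of steps 502 and 504, assuming that
each gradient value is a list of floats of equal length, the
averaging and noise addition might look like the following Python.
The function names, the scale parameter, and the choice to draw an
independent Laplace sample per coordinate are assumptions made for
illustration only. The Laplace sample is produced as the difference
of two exponential samples, which is a standard construction.

```python
import random


def sample_laplace(scale, rng):
    """Draw one zero-mean Laplace sample with the given scale (scale > 0)."""
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_gradient_average(gradients, scale, rng):
    """Average the gradient vectors, then add Laplace noise per coordinate."""
    if not gradients:
        raise ValueError("at least one gradient value is required")
    dim = len(gradients[0])
    if any(len(g) != dim for g in gradients):
        raise ValueError("all gradient values must have the same dimension")
    count = len(gradients)
    # Step 502: coordinate-wise average of the received gradient values.
    average = [sum(g[k] for g in gradients) / count for k in range(dim)]
    # Step 504: add a noise term to each coordinate of the average.
    return [a + sample_laplace(scale, rng) for a in average]
```

For example, noisy_gradient_average([[1.0, 2.0], [3.0, 6.0]], 0.1,
random.Random()) returns a two-element list centered on [2.0, 4.0].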
[0056] Step 506 communicates the noisy gradient average to the
multiple different source systems. The host system 104, for
instance, communicates the noisy gradient average to the multiple
different source systems 102.
[0057] Step 508 ascertains whether a termination criterion occurs.
Different examples of a termination criterion are discussed above.
If the termination criterion does not occur ("No"), the process
returns to step 500. For instance, further gradient values are
received and are averaged to generate further noisy gradient
averages, which are communicated back to the source systems 102.
This process can be performed iteratively to enable the source
systems 102 to iteratively train their respective data models.
[0058] If the termination criterion occurs ("Yes"), step 510
obtains a predictive model trained using the noisy gradient
average. In at least one implementation, the predictive model can
be generated locally at the host system 104, and/or locally at the
individual source systems 102.
[0059] FIG. 6 is a flow diagram that describes steps in a method in
accordance with one or more implementations. The method describes
an example procedure for enabling a predictive model to be
generated using multiple hosts in accordance with one or more
implementations.
[0060] Step 600 calculates a gradient value based on a data set
applied to a data model. In at least one implementation, a source
system 102 calculates the gradient value using backpropagation with
a data set 108 and the initial model 202 as input.
[0061] In at least one implementation, the gradient value is
calculated as:
$$g^{i} = \sum_{z \in Z^{i}} \mathrm{Clip}\left(C, F'(w_t, z)\right) \qquad \text{(Equation 1)}$$
where $Z^{i}$ is the dataset used in the i-th mini-batch, C is
a bound on a size of the gradient, F is the function of the data
model to be optimized, $w_t$ is the current weight vector, and z
is an example from the current mini-batch. Clip can be calculated
as:
$$\mathrm{Clip}(C, x) = \min\left(1, \frac{C}{\lVert x \rVert}\right) x \qquad \text{(Equation 2)}$$
where x is the vector being calculated for the gradient value.
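For illustration, Equations 1 and 2 can be sketched in Python as
shown below. The sketch assumes that the norm in Equation 2 is the
Euclidean norm and that the per-example gradients F'(w_t, z) have
already been computed (for instance, by backpropagation) and are
passed in as lists of floats. The function names clip and
clipped_gradient_sum are hypothetical.

```python
import math


def clip(bound, x):
    """Equation 2: scale x so that its norm is at most `bound`."""
    norm = math.sqrt(sum(v * v for v in x))  # Euclidean norm (assumption)
    if norm == 0.0:
        return list(x)  # a zero vector is already within the bound
    factor = min(1.0, bound / norm)
    return [factor * v for v in x]


def clipped_gradient_sum(bound, per_example_gradients):
    """Equation 1: sum of clipped per-example gradients over a mini-batch."""
    if not per_example_gradients:
        raise ValueError("the mini-batch must contain at least one example")
    total = [0.0] * len(per_example_gradients[0])
    for gradient in per_example_gradients:
        for k, value in enumerate(clip(bound, gradient)):
            total[k] += value
    return total
```

Because each clipped term has norm at most C, the contribution of
any single example to the sum is bounded by C.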
[0062] Step 602 generates a perturbed gradient value based on the
gradient value and a perturbation value. A source system 102, for
instance, generates a perturbation value, and adds the perturbation
value to the original gradient value to generate the perturbed
gradient value.
[0063] In one example, the perturbation value $r^{i}$ is generated
as:
$$r^{i} \leftarrow \mathrm{Laplace}(b) \qquad \text{(Equation 3)}$$
which represents a random vector with the same dimension as $g^{i}$
sampled from the Laplace distribution.
[0064] Accordingly, the perturbed gradient value can be generated
as $g^{i} + r^{i}$.
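A minimal Python sketch of step 602 follows, assuming that each
coordinate of the perturbation vector is drawn independently from a
zero-mean Laplace distribution with scale b. The function names are
hypothetical. The function returns both the perturbed gradient value
and the perturbation value, since the source system uses each of
them separately.

```python
import random


def sample_laplace(scale, rng):
    """Draw one zero-mean Laplace sample with the given scale (scale > 0)."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_gradient(gradient, scale, rng):
    """Return (g + r, r), where r is a Laplace vector shaped like g."""
    # Equation 3: r has the same dimension as the gradient value.
    perturbation = [sample_laplace(scale, rng) for _ in gradient]
    perturbed = [g + r for g, r in zip(gradient, perturbation)]
    return perturbed, perturbation
```

Subtracting the second returned vector from the first recovers the
original gradient value, which is why neither vector is meant to be
held together with the other by a single party.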
[0065] Step 604 communicates the perturbed gradient value to a
first host system. The source system 102, for instance,
communicates the perturbed gradient value to a first host system
104.
[0066] Step 606 communicates the perturbation value to a second
host system. For example, the source system 102 communicates the
perturbation value to a second host system 104. In at least one
implementation, the first host system 104 and the second host
system 104 represent host systems that are physically and/or
communicatively remote from one another and that are protected from
mutual access. Alternatively, the first host system 104 and the
second host system 104 represent protected portions of a single
larger system, such as different trusted platform modules (TPM)
that reside on a single server and/or other computing device.
[0067] Step 608 receives an average gradient value from one or more
of the first host system or the second host system. Generally, the
average gradient value represents a perturbed average gradient
value and is based on calculations performed at the different host
systems using the perturbed gradient value and the perturbation
value, as well as other perturbed gradient values and perturbation
values from other source systems.
[0068] Step 610 applies the average gradient value to the data
model. For instance, a weight value and/or a bias value from the
average gradient value are applied to update (e.g., train) the data
model.
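One way to read step 610 is as a gradient-descent update of the
model parameters. The sketch below assumes a flat list of weights
and a hypothetical learning_rate parameter, neither of which is
specified in this description, and is offered only as an
illustration of that reading.

```python
def apply_average_gradient(weights, average_gradient, learning_rate):
    """Return updated weights after one descent step along the gradient."""
    if len(weights) != len(average_gradient):
        raise ValueError("weights and gradient must have the same dimension")
    # Move each weight against its gradient component.
    return [w - learning_rate * g for w, g in zip(weights, average_gradient)]
```

The same function can be applied to bias values by including them in
the weights list.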
[0069] Step 612 ascertains whether a termination criterion occurs.
Different examples of termination criteria are discussed above. If
the termination criterion does not occur ("No"), the process
returns to step 600. For instance, the source system 102 determines
a further gradient value based on the updated data model, and the
process proceeds as indicated above using the further gradient
value.
[0070] If the termination criterion occurs ("Yes"), step 614
obtains a predictive model that represents a trained version of the
data model. The predictive model, for instance, is generated
locally at the source system 102 and based on different gradient
values received from the host systems 104. Alternatively or
additionally, the predictive model is communicated to the source
system 102 from one or more of the host systems 104.
[0071] FIG. 7 is a flow diagram that describes steps in a method in
accordance with one or more implementations. The method describes
an example procedure for enabling a predictive model to be
generated using multiple hosts in accordance with one or more
implementations. In this particular example, portions of the method
are divided into actions at a first host system and actions at a
second host system.
[0072] Step 700 receives perturbed gradients representing gradient
values summed with perturbation values from multiple different
source systems. The host system 104a, for instance, receives the
perturbed gradients from different source systems 102.
[0073] Step 702 sums the perturbed gradients to generate a gradient
sum. For example, the host system 104a sums a set of perturbed
gradients to generate a gradient sum. The gradient sum
$\tilde{g}_1$, for instance, is generated as:
$$\tilde{g}_1 = \sum_{i} \left(g^{i} + r^{i}\right)\ \mathrm{smod}\ mC \qquad \text{(Equation 4)}$$
[0074] In at least one implementation, smod is a symmetric modulo
operation, such as calculated as:
$$x\ \mathrm{smod}\ C = \left((x + C) \bmod 2C\right) - C \qquad \text{(Equation 5)}$$
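The symmetric modulo operation of Equation 5 can be sketched in
Python as shown below. The function name smod follows the equation.
The sketch relies on the fact that Python's % operator returns a
non-negative result for a positive modulus, so the output lies in
the half-open interval from -c (inclusive) to c (exclusive).

```python
def smod(x, c):
    """Equation 5: reduce x into the symmetric range [-c, c), for c > 0."""
    return ((x + c) % (2 * c)) - c
```

For instance, with c = 5, the values 3, 7, and -7 map to 3, -3, and
3, respectively. A value masked by a large offset can be recovered
after reduction: smod(smod(3 + 104, 5) - smod(104, 5), 5) yields 3.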
[0075] Step 704 calculates a first seed for a random number
generator. The host system 104a, for example, calculates a seed
value s.sub.1.
[0076] Step 706 receives perturbation values from the multiple
different source systems. For instance, the host system 104b
receives perturbation values that were used to generate the
perturbed gradients from multiple different source systems 102.
[0077] Step 708 sums the perturbation values to generate a
perturbation sum. The host system 104b, for example, sums the
perturbation values as:
$$\tilde{g}_2 = \sum_{i} r^{i}\ \mathrm{smod}\ mC \qquad \text{(Equation 6)}$$
[0078] Step 710 calculates a second seed for the random number
generator. The host system 104b, for example, calculates a seed
value s.sub.2.
[0079] Step 712 implements a secure computation protocol using the
gradient sum, the perturbation sum, the first seed, and the second
seed to generate a noisy average of the gradient values. The host
systems 104a, 104b, for example, interact to perform a secure
computation protocol using these different sets of values. In at
least one implementation, the host systems 104a, 104b participate
in a garbled circuits protocol to compute the noisy average as:
$$\left(\left(\tilde{g}_1 - \tilde{g}_2\right)\ \mathrm{smod}\ mC\right) + \mathrm{Rand}_{s_1 \oplus s_2}(b) \qquad \text{(Equation 7)}$$
where b is an arbitrarily defined random noise parameter that is a
function of the required privacy.
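The arithmetic of Equations 4, 6, and 7 can be illustrated with the
following Python simulation. This is not a garbled circuits
protocol: it computes in the clear, in a single process, the value
that such a protocol would compute jointly. The parameter modulus
stands in for the quantity mC, the seeds are assumed to be integers
combined by bitwise XOR, and the noise generator is assumed to be
Laplace with scale b; these choices and all function names are
assumptions for illustration. The result is the quantity given by
Equation 7 as written, with no further scaling applied.

```python
import random


def smod(x, c):
    """Symmetric modulo: reduce x into [-c, c), for c > 0."""
    return ((x + c) % (2 * c)) - c


def modular_sum(vectors, modulus):
    """Coordinate-wise sum of the vectors, reduced with smod."""
    dim = len(vectors[0])
    return [smod(sum(v[k] for v in vectors), modulus) for k in range(dim)]


def combine_host_sums(sum_1, sum_2, seed_1, seed_2, b, modulus):
    """Equation 7: unmask the gradient sum and add seeded random noise."""
    # Neither host alone fixes the generator seed: it is s_1 XOR s_2.
    rng = random.Random(seed_1 ^ seed_2)
    result = []
    for g1, g2 in zip(sum_1, sum_2):
        # Laplace(b) noise as a difference of exponentials (assumption).
        noise = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
        result.append(smod(g1 - g2, modulus) + noise)
    return result


# Example: two source systems, two-dimensional gradient values.
gradients = [[1.0, -2.0], [0.5, 0.25]]
perturbations = [[100.0, -50.0], [7.0, 3.0]]
perturbed = [[g + r for g, r in zip(gv, rv)]
             for gv, rv in zip(gradients, perturbations)]
sum_1 = modular_sum(perturbed, 10.0)      # held by the first host
sum_2 = modular_sum(perturbations, 10.0)  # held by the second host
noisy = combine_host_sums(sum_1, sum_2, 1234, 5678, 0.01, 10.0)
```

In the example, the unmasked sum is [1.5, -1.75], so noisy is that
vector plus a small amount of noise. The unmasking is exact only
when each coordinate of the true sum lies within the range of smod.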
[0080] Step 714 communicates the noisy average to a source system
to enable a predictive model to be trained using the noisy average.
One or more of the host systems 104a, 104b, for instance,
communicate the noisy gradient 320 to the source systems 102a,
102b. Generally, the noisy gradient 320 can be used as part of a
training step to generate a trained predictive model 116, e.g., a
trained neural network.
[0081] Step 716 ascertains whether a termination criterion occurs.
Different examples of termination criteria are discussed above. If
the termination criterion does not occur ("No"), the process
returns to step 700. For instance, the host systems 104a, 104b
receive further gradient values from the source systems 102a, 102b,
and the process iterates until a termination criterion occurs.
[0082] If the termination criterion occurs ("Yes"), step 718
obtains a predictive model that represents a trained version of an
initial data model. The predictive model, for instance, is
generated locally at the source systems 102a, 102b and based on
different noisy gradient values received from the host systems 104.
Alternatively or additionally, the predictive model is communicated
to the source systems 102a, 102b from one or more of the host
systems 104a, 104b.
[0083] Generally, a predictive model generated according to
techniques for using data sets for a predictive model described
herein can be used for various purposes, such as predicting
outcomes based on various input data
sets and scenarios.
[0084] FIG. 8 is a flow diagram that describes steps in a method in
accordance with one or more implementations. The method describes
an example procedure for utilizing a predictive model in accordance
with one or more implementations. The method, for instance,
represents a continuation of one or more of the procedures
described above.
[0085] Step 800 applies a set of input data to a predictive model.
A source system 102, for example, receives a set of data and uses
the set of data to evaluate a predictive model generated according
to techniques for using data sets for a predictive model described
herein. In at least some implementations, the set of data includes
data values that are evaluated using the predictive model.
[0086] Step 802 ascertains an output of the predictive model. For
instance, the predictive model provides an output prediction value
based on values of the input data.
[0087] Step 804 performs, by a computing device, an action based on
the output of the predictive model. Generally, the action can take
various forms, such as performing different computation tasks based
on the output of the predictive model. For example, consider that
the predictive model is configured to provide a prediction of
health condition. If the output of the predictive model indicates a
possible adverse health condition, the action can include
performing an automatic scheduling of a health procedure and/or an
automatic communication to an individual regarding the possible
adverse health condition.
[0088] As another example, consider that the predictive model is
configured to provide a prediction of a possible computer network
malfunction. For instance, the predictive model can include various
conditions and events that are indicative of a potential network
failure. Accordingly, the action can include performing an
automated maintenance and/or diagnostic procedure on the network to
attempt to prevent and/or repair a network malfunction.
[0089] These examples are presented for purpose of illustration
only, and it is to be appreciated that predictive models generated
and/or trained according to techniques for using data sets for a
predictive model described herein can be used for a variety of
different purposes not expressly discussed in this disclosure.
[0090] Thus, techniques for using data sets for a predictive model
described herein provide ways for generating predictive models
based on data sets from a variety of different sources, while
protecting the data used to generate the predictive models from
being exposed to unauthorized parties. Further, computational
resources are conserved by enabling local data sources to perform
averaging of data points from large data sets, while allowing a
centralized service (e.g., a host system 104 or set of host systems
104) to generate predictive models using the locally averaged data
points.
[0091] Having discussed some example procedures, consider now a
discussion of an example system and device in accordance with one
or more implementations.
[0092] FIG. 9 illustrates an example system generally at 900 that
includes an example computing device 902 that is representative of
one or more computing systems and/or devices that may implement
various techniques described herein. For example, the source
systems 102 and/or the host systems 104 discussed above with
reference to FIG. 1 can be embodied as the computing device 902.
The computing device 902 may be, for example, a server of a service
provider, a device associated with the client (e.g., a client
device), an on-chip system, and/or any other suitable computing
device or computing system.
[0093] The example computing device 902 as illustrated includes a
processing system 904, one or more computer-readable media 906, and
one or more Input/Output (I/O) Interfaces 908 that are
communicatively coupled, one to another. Although not shown, the
computing device 902 may further include a system bus or other data
and command transfer system that couples the various components,
one to another. A system bus can include any one or combination of
different bus structures, such as a memory bus or memory
controller, a peripheral bus, a universal serial bus, and/or a
processor or local bus that utilizes any of a variety of bus
architectures. A variety of other examples are also contemplated,
such as control and data lines.
[0094] The processing system 904 is representative of functionality
to perform one or more operations using hardware. Accordingly, the
processing system 904 is illustrated as including hardware elements
910 that may be configured as processors, functional blocks, and so
forth. This may include implementation in hardware as an
application specific integrated circuit or other logic device
formed using one or more semiconductors. The hardware elements 910
are not limited by the materials from which they are formed or the
processing mechanisms employed therein. For example, processors may
be comprised of semiconductor(s) and/or transistors (e.g.,
electronic integrated circuits (ICs)). In such a context,
processor-executable instructions may be electronically-executable
instructions.
[0095] The computer-readable media 906 is illustrated as including
memory/storage 912. The memory/storage 912 represents
memory/storage capacity associated with one or more
computer-readable media. The memory/storage 912 may include
volatile media (such as random access memory (RAM)) and/or
nonvolatile media (such as read only memory (ROM), Flash memory,
optical disks, magnetic disks, and so forth). The memory/storage
912 may include fixed media (e.g., RAM, ROM, a fixed hard drive,
and so on) as well as removable media (e.g., Flash memory, a
removable hard drive, an optical disc, and so forth). The
computer-readable media 906 may be configured in a variety of other
ways as further described below.
[0096] Input/output interface(s) 908 are representative of
functionality to allow a user to enter commands and information to
computing device 902, and also allow information to be presented to
the user and/or other components or devices using various
input/output devices. Examples of input devices include a keyboard,
a cursor control device (e.g., a mouse), a microphone (e.g., for
voice recognition and/or spoken input), a scanner, touch
functionality (e.g., capacitive or other sensors that are
configured to detect physical touch), a camera (e.g., which may
employ visible or non-visible wavelengths such as infrared
frequencies to detect movement that does not involve touch as
gestures), and so forth. Examples of output devices include a
display device (e.g., a monitor or projector), speakers, a printer,
a network card, tactile-response device, and so forth. Thus, the
computing device 902 may be configured in a variety of ways as
further described below to support user interaction.
[0097] Various techniques may be described herein in the general
context of software, hardware elements, or program modules.
Generally, such modules include routines, programs, objects,
elements, components, data structures, and so forth that perform
particular tasks or implement particular abstract data types. The
terms "module," "functionality," "entity," and "component" as used
herein generally represent software, firmware, hardware, or a
combination thereof. The features of the techniques described
herein are platform-independent, meaning that the techniques may be
implemented on a variety of commercial computing platforms having a
variety of processors.
[0098] An implementation of the described modules and techniques
may be stored on or transmitted across some form of
computer-readable media. The computer-readable media may include a
variety of media that may be accessed by the computing device 902.
By way of example, and not limitation, computer-readable media may
include "computer-readable storage media" and "computer-readable
signal media."
[0099] "Computer-readable storage media" may refer to media and/or
devices that enable persistent storage of information in contrast
to mere signal transmission, carrier waves, or signals per se.
Computer-readable storage media do not include signals per se. The
computer-readable storage media includes hardware such as volatile
and non-volatile, removable and non-removable media and/or storage
devices implemented in a method or technology suitable for storage
of information such as computer readable instructions, data
structures, program modules, logic elements/circuits, or other
data. Examples of computer-readable storage media may include, but
are not limited to, RAM, ROM, EEPROM, flash memory or other memory
technology, CD-ROM, digital versatile disks (DVD) or other optical
storage, hard disks, magnetic cassettes, magnetic tape, magnetic
disk storage or other magnetic storage devices, or other storage
device, tangible media, or article of manufacture suitable to store
the desired information and which may be accessed by a
computer.
[0100] "Computer-readable signal media" may refer to a
signal-bearing medium that is configured to transmit instructions
to the hardware of the computing device 902, such as via a network.
Signal media typically may embody computer readable instructions,
data structures, program modules, or other data in a modulated data
signal, such as carrier waves, data signals, or other transport
mechanism. Signal media also include any information delivery
media. The term "modulated data signal" means a signal that has one
or more of its characteristics set or changed in such a manner as
to encode information in the signal. By way of example, and not
limitation, communication media include wired media such as a wired
network or direct-wired connection, and wireless media such as
acoustic, radio frequency (RF), infrared, and other wireless
media.
[0101] As previously described, hardware elements 910 and
computer-readable media 906 are representative of instructions,
modules, programmable device logic and/or fixed device logic
implemented in a hardware form that may be employed in some
implementations to implement at least some aspects of the
techniques described herein. Hardware elements may include
components of an integrated circuit or on-chip system, an
application-specific integrated circuit (ASIC), a
field-programmable gate array (FPGA), a complex programmable logic
device (CPLD), and other implementations in silicon or other
hardware devices. In this context, a hardware element may operate
as a processing device that performs program tasks defined by
instructions, modules, and/or logic embodied by the hardware
element as well as a hardware device utilized to store instructions
for execution, e.g., the computer-readable storage media described
previously.
[0102] Combinations of the foregoing may also be employed to
implement various techniques and modules described herein.
Accordingly, software, hardware, or program modules and other
program modules may be implemented as one or more instructions
and/or logic embodied on some form of computer-readable storage
media and/or by one or more hardware elements 910. The computing
device 902 may be configured to implement particular instructions
and/or functions corresponding to the software and/or hardware
modules. Accordingly, implementation of modules that are executable
by the computing device 902 as software may be achieved at least
partially in hardware, e.g., through use of computer-readable
storage media and/or hardware elements 910 of the processing
system. The instructions and/or functions may be
executable/operable by one or more articles of manufacture (for
example, one or more computing devices 902 and/or processing
systems 904) to implement techniques, modules, and examples
described herein.
[0103] As further illustrated in FIG. 9, the example system 900
enables ubiquitous environments for a seamless user experience when
running applications on a personal computer (PC), a television
device, and/or a mobile device. Services and applications run
substantially similarly in all three environments for a common user
experience when transitioning from one device to the next while
utilizing an application, playing a video game, watching a video,
and so on.
[0104] In the example system 900, multiple devices are
interconnected through a central computing device. The central
computing device may be local to the multiple devices or may be
located remotely from the multiple devices. In one embodiment, the
central computing device may be a cloud of one or more server
computers that are connected to the multiple devices through a
network, the Internet, or other data communication link.
[0105] In one embodiment, this interconnection architecture enables
functionality to be delivered across multiple devices to provide a
common and seamless experience to a user of the multiple devices.
Each of the multiple devices may have different physical
requirements and capabilities, and the central computing device
uses a platform to enable the delivery of an experience to the
device that is both tailored to the device and yet common to all
devices. In one embodiment, a class of target devices is created
and experiences are tailored to the generic class of devices. A
class of devices may be defined by physical features, types of
usage, or other common characteristics of the devices.
[0106] In various implementations, the computing device 902 may
assume a variety of different configurations, such as for computer
914, mobile 916, and television 918 uses. Each of these
configurations includes devices that may have generally different
constructs and capabilities, and thus the computing device 902 may
be configured according to one or more of the different device
classes. For instance, the computing device 902 may be implemented
as the computer 914 class of a device that includes a personal
computer, desktop computer, a multi-screen computer, laptop
computer, netbook, and so on.
[0107] The computing device 902 may also be implemented as the
mobile 916 class of device that includes mobile devices, such as a
mobile phone, portable music player, portable gaming device, a
tablet computer, a wearable device, a multi-screen computer, and so
on. The computing device 902 may also be implemented as the
television 918 class of device that includes devices having or
connected to generally larger screens in casual viewing
environments. These devices include televisions, set-top boxes,
gaming consoles, and so on.
[0108] The techniques described herein may be supported by these
various configurations of the computing device 902 and are not
limited to the specific examples of the techniques described
herein. For example, functionalities discussed with reference to
the source systems 102 and/or the host systems 104 may be
implemented all or in part through use of a distributed system,
such as over a "cloud" 920 via a platform 922 as described
below.
[0109] The cloud 920 includes and/or is representative of a
platform 922 for resources 924. The platform 922 abstracts
underlying functionality of hardware (e.g., servers) and software
resources of the cloud 920. The resources 924 may include
applications and/or data that can be utilized while computer
processing is executed on servers that are remote from the
computing device 902. Resources 924 can also include services
provided over the Internet and/or through a subscriber network,
such as a cellular or Wi-Fi network.
[0110] The platform 922 may abstract resources and functions to
connect the computing device 902 with other computing devices. The
platform 922 may also serve to abstract scaling of resources to
provide a corresponding level of scale to encountered demand for
the resources 924 that are implemented via the platform 922.
Accordingly, in an interconnected device embodiment, implementation
of functionality described herein may be distributed throughout the
system 900. For example, the functionality may be implemented in
part on the computing device 902 as well as via the platform 922
that abstracts the functionality of the cloud 920.
[0111] Discussed herein are a number of methods that may be
implemented to perform techniques discussed herein. Aspects of the
methods may be implemented in hardware, firmware, or software, or a
combination thereof. The methods are shown as a set of steps that
specify operations performed by one or more devices and are not
necessarily limited to the orders shown for performing the
operations by the respective blocks. Further, an operation shown
with respect to a particular method may be combined and/or
interchanged with an operation of a different method in accordance
with one or more implementations. Aspects of the methods can be
implemented via interaction between various entities discussed
above with reference to the environment 100.
[0112] In the discussions herein, various different implementations
are described. It is to be appreciated and understood that each
implementation described herein can be used on its own or in
connection with one or more other implementations described herein.
Further aspects of the techniques discussed herein relate to one or
more of the following implementations.
[0113] A system for obtaining a predictive model, the system
including: at least one processor; and one or more
computer-readable storage media including instructions stored
thereon that, responsive to execution by the at least one
processor, cause the system to perform operations including:
calculating a gradient value based on a data set applied to a data
model, the gradient value including a weight value calculated for
the data model; communicating the gradient value to an external
service; receiving an average gradient value from the external
service; applying the average gradient value to the data model; and
obtaining, based on ascertaining that a termination criterion
occurs, a predictive model that represents a trained version of the
data model.
[0114] In addition to any of the above described systems, any one
or combination of: wherein said calculating includes using a
backpropagation procedure to train the data model using the data
set; wherein said calculating includes: dividing the data set into
a set of mini-batches; and calculating the gradient value using a
particular mini-batch of the set of mini-batches; wherein said
calculating includes: dividing the data set into a set of
mini-batches; and calculating the gradient value using a particular
mini-batch of the set of mini-batches, wherein the termination
criterion includes determining that each mini-batch of the set of
mini-batches is evaluated to generate a respective gradient value;
wherein said applying includes applying the average gradient value
to update a weight value of the data model; wherein the predictive
model includes a neural network trained using the average gradient
value; wherein the operations further include: applying a set of
input data to the predictive model; ascertaining an output of the
predictive model; and performing an action based on the output of
the predictive model.
[0115] A computer-implemented method for obtaining a predictive
model, the method including: receiving multiple gradient values
from multiple different source systems; generating an average
gradient value from the multiple gradient values; adding a noise
term to the average gradient value to generate a noisy gradient
average; communicating the noisy gradient average to the multiple
different source systems; and obtaining a predictive model trained
using the noisy gradient average.
[0116] In addition to any of the above described methods, any one
or combination of: wherein said adding the noise term includes
adding a Laplace-distributed random number to the average gradient
value to generate the noisy gradient average; wherein said adding
the noise term includes performing a garbled circuits protocol
using the average gradient value; wherein the predictive model
includes a neural network trained using the noisy gradient
average.
[0117] A computer-implemented method for obtaining a predictive
model, the method including: calculating a gradient value based on
a data set applied to a data model; generating a perturbed gradient
value based on the gradient value and a perturbation value;
communicating the perturbed gradient value to a first host system;
communicating the perturbation value to a second host system;
receiving an average gradient value from one or more of the first
host system or the second host system, the average gradient value
calculated based on the perturbed gradient value and the
perturbation value; applying the average gradient value to the data
model; and obtaining a predictive model that represents a trained
version of the data model, the data model trained at least in part
using the average gradient value.
[0118] In addition to any of the above described methods, any one
or combination of: wherein said calculating includes applying
backpropagation to the data model and using the data set to
calculate the gradient value; wherein said calculating includes:
dividing the data set into a set of mini-batches; and calculating
the gradient value using a particular mini-batch of the set of
mini-batches; wherein said generating the perturbed gradient value
includes generating the perturbation value as a random vector, and
adding the random vector to the gradient value to generate the
perturbed gradient value; wherein said applying includes applying a
weight value from the average gradient value to the data model;
wherein said obtaining is performed in response to ascertaining
that a termination criterion occurs; wherein the average gradient
value is calculated using a garbled circuits protocol; wherein the
predictive model includes a neural network trained using the
average gradient value; further including: applying a set of input
data to the predictive model; ascertaining an output of the
predictive model; and performing an action based on the output of the
predictive model.
[0119] Techniques for using data sets for a predictive model are
described. Although implementations are described in language
specific to structural features and/or methodological acts, it is
to be understood that the implementations defined in the appended
claims are not necessarily limited to the specific features or acts
described. Rather, the specific features and acts are disclosed as
example forms of implementing the claimed implementations.
* * * * *