U.S. patent application number 16/549671 was filed with the patent office on 2019-08-23 and published on 2020-03-05 for method for the safe training of a dynamic model.
The applicant listed for this patent is Robert Bosch GmbH. Invention is credited to Mona Meister, The Duy Nguyen-Tuong, Christoph Zimmer.
Application Number | 20200074237 16/549671 |
Document ID | / |
Family ID | 67658669 |
Filed Date | 2019-08-23 |
Publication Date | 2020-03-05 |
![](/patent/app/20200074237/US20200074237A1-20200305-D00000.png)
![](/patent/app/20200074237/US20200074237A1-20200305-D00001.png)
![](/patent/app/20200074237/US20200074237A1-20200305-D00002.png)
![](/patent/app/20200074237/US20200074237A1-20200305-P00001.png)
United States Patent Application | 20200074237 |
Kind Code | A1 |
Zimmer; Christoph; et al. | March 5, 2020 |
METHOD FOR THE SAFE TRAINING OF A DYNAMIC MODEL
Abstract
A computer-implemented method for the safe, active training of a
computer-aided model for modeling time series of a physical system
using Gaussian processes, including the steps of establishing a
safety threshold value $\alpha$; initializing by implementing safe
initial curves as input values on the system, creating an initial
regression model and an initial safety model; repeatedly carrying
out the steps of updating the regression model; updating the safety
model; determining a new curve section; implementing the determined
new curve section on the physical system and measuring output
variables; incorporating the new output values in the regression
model and the safety model until N passes have been carried out;
and outputting the regression model and the safety model.
Inventors: | Zimmer; Christoph (Korntal, DE); Meister; Mona (Stuttgart, DE); Nguyen-Tuong; The Duy (Calw, DE) |
Applicant: |
Name | City | State | Country | Type |
Robert Bosch GmbH | Stuttgart | | DE | |
Family ID: | 67658669 |
Appl. No.: | 16/549671 |
Filed: | August 23, 2019 |
Current U.S. Class: | 1/1 |
Current CPC Class: | G06K 9/6293 20130101; G06N 20/10 20190101; G06N 5/003 20130101; G06F 17/16 20130101; G06K 9/6211 20130101; G06N 7/005 20130101; G06K 9/6262 20130101 |
International Class: | G06K 9/62 20060101 G06K009/62; G06F 17/16 20060101 G06F017/16 |
Foreign Application Data
Date | Code | Application Number |
Sep 5, 2018 | DE | 102018215061.3 |
Claims
1. A computer-implemented method for the safe, active training of a
time series-based model of a physical system using Gaussian
processes, the method comprising: establishing a safety threshold
value; initializing by implementing safe initial curves as input
values on the system, creating an initial regression model and an
initial safety model; repeatedly carrying out for a plurality of
passes, the following steps: updating the regression model;
updating the safety model; determining a new curve section;
implementing a determined new curve on the physical system and
measuring new output values; and incorporating the new output
values in the regression model and the safety model; and updating
and outputting the regression model and the safety model.
2. The method as recited in claim 1, wherein the new curve section
is determined in such a way that information gain is maximized
while meeting safety criteria of the safety model.
3. The method as recited in claim 1, wherein the new curve section
is determined while using a covariance matrix.
4. The method as recited in claim 1, wherein the new curve section
is determined via optimization under a secondary condition.
5. The method as recited in claim 1, wherein the system is a test
bench for internal combustion engines, a robot controller, a
physical sensor, or a chemical reaction.
6. The method as recited in claim 1, wherein the safety model
includes values of the system, the values of the system including:
pressure values, and/or exhaust values, and/or consumption values,
and/or power values, and/or joint position values, and/or movement
limits, and/or sensor values, and/or temperature values, and/or
acidity values.
7. A computer-aided model of a physical system which is trained
using Gaussian processes, the model being trained by: establishing
a safety threshold value; initializing by implementing safe initial
curves as input values on the system, creating an initial
regression model and an initial safety model; repeatedly carrying
out, for a plurality of passes, the following steps: updating the
regression model; updating the safety model; determining a new
curve section; implementing a determined new curve on the physical
system and measuring new output values; and incorporating the new
output values in the regression model and the safety model; and
updating and outputting the regression model and the safety
model.
8. A non-transitory machine-readable memory medium on which is
stored a computer program for active training of a time
series-based model of a physical system using Gaussian processes,
the computer program, when executed by a computer, causing the
computer to perform: establishing a safety threshold value;
initializing by implementing safe initial curves as input values on
the system, creating an initial regression model and an initial
safety model; repeatedly carrying out, for a plurality of passes,
the following steps: updating the regression model; updating the
safety model; determining a new curve section; implementing a
determined new curve on the physical system and measuring new
output values; and incorporating the new output values in the
regression model and the safety model; and updating and outputting
the regression model and the safety model.
9. A device configured to actively train a time series-based model
of a physical system using Gaussian processes, the device
configured to: establish a safety threshold value; initialize by
implementing safe initial curves as input values on the system,
creating an initial regression model and an initial safety model;
repeatedly carry out, for a plurality of passes, the following:
updating the regression model; updating the safety model;
determining a new curve section; implementing a determined new
curve on the physical system and measuring new output values; and
incorporating the new output values in the regression model and the
safety model; and update and output the regression model and the
safety model.
Description
CROSS REFERENCE
[0001] The present application claims the benefit under 35 U.S.C.
§ 119 of German Patent Application No. DE 102018215061.3 filed
on Sep. 5, 2018, which is expressly incorporated herein by
reference in its entirety.
BACKGROUND INFORMATION
[0002] The present invention relates to a method including a safety
condition for active learning for modelling dynamic systems with
the aid of time series based on Gaussian processes, a system that
has been trained using this method, a computer program which
includes instructions that are configured to carry out the method
when it is executed on a computer, a machine-readable memory medium
on which the computer program is stored, and a computer which is
configured to carry out the method.
[0003] Safe exploration during active learning is described in
"Safe Exploration for Active Learning with Gaussian Processes" by
J. Schreiter, D. Nguyen-Tuong, M. Eberts, B. Bischoff, H. Markert
and M. Toussaint (ECML/PKDD, Volume 9286, 2015). Specifically,
selective data are detected in a static state.
[0004] Active learning deals with sequential data identification
for learning an unknown function. In the process, the data points
are selected sequentially for identification in such a way that the
availability of the pieces of information required for the
approximation of the unknown function is maximized. The general aim
is to create an accurate model without providing more pieces of
information than are necessary. In this way, the model becomes more
efficient, since potentially cost-intensive measurements may be
avoided.
[0005] Active learning is commonplace for classifying data, for
example, for identifying images. For active learning in the case of
time series models that represent physical systems, the data must
be generated in such a way that relevant dynamic processes are able
to be detected.
[0006] This means, the physical system must be stimulated by
dynamic movement in the input area by input curves in such a way
that the collected data, i.e., input and output curves, contain as
many pieces of information about the dynamics as possible. Examples
of input curves that may be used are, among others, sine
functions, ramp functions, step functions, and white noise. When
stimulating the physical systems, however, it is also necessary to
observe safety requirements. The stimulation must not damage the
physical system as the input area is being dynamically
explored.
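As a purely illustrative sketch of the input curve types named above, the following generates discretized sine, ramp, step and white-noise signals. The function name `example_input_curves`, the unit time interval and the amplitudes are assumptions made for this example and are not taken from the application.

```python
import numpy as np

def example_input_curves(num_points=100, seed=0):
    """Return simple discretized excitation signals on the interval [0, 1]."""
    t = np.linspace(0.0, 1.0, num_points)
    rng = np.random.default_rng(seed)
    return {
        "sine": np.sin(2.0 * np.pi * t),         # one full period
        "ramp": t.copy(),                        # linear rise from 0 to 1
        "step": np.where(t < 0.5, 0.0, 1.0),     # jump at t = 0.5
        "white_noise": rng.standard_normal(num_points),
    }

curves = example_input_curves()
```

Which of these signals may be applied safely depends on the physical system, which is the question the safety model addresses.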
[0007] It is important, therefore, to identify areas in which the
dynamic stimulation may be safely carried out.
SUMMARY
[0008] An example method according to the present invention may
have an advantage that it combines dynamic exploration, active
exploration and safe exploration.
[0009] Dynamic exploration in this case is understood to mean the
detection of pieces of information under changing conditions of the
system to be measured. Active exploration aims for a preferably
rapid detection of pieces of information, the pieces of information
being detected sequentially in such a way that many pieces of
information are able to be detected in a short period of time. In
other words, the information gain of the individual measurement is
maximized. Finally, safe exploration ensures that the system to be
measured is preferably not damaged.
[0010] These three types of exploration may be combined using the
example method according to the present invention.
[0011] Advantageous refinements of and improvements on the example
method are described herein.
[0012] The present invention provides an active learning
environment with dynamic exploration for time series models based
on Gaussian processes, which takes the aspect of safety into
consideration by deriving a suitable criterion for the dynamic
exploration of the input area.
[0013] Active learning is useful in a range of applications, such
as simulations and forecasts. The goal of learning methods is, in
general, to create a model that describes reality. For this
purpose, a real process, a real system or a real object (referred
to below as the objective) is measured, in the sense that pieces of
information about the objective are detected. The resulting model
of reality may then be used in a simulation or forecast instead of
the objective. One advantage of this approach is the savings
resulting from not having to repeat the real process, which usually
consumes resources. Another advantage is that the object or the
system is not exposed to the process to be simulated and is
therefore not consumed, damaged or modified by it.
[0014] It is advantageous if the model describes reality as
accurately as possible. In the present invention, it is
particularly advantageous that active learning may be used while
taking safety conditions into consideration. These safety
conditions are to ensure that the objective to be detected is
influenced negatively/critically as little as possible, for
example, in the sense that the object or the system is damaged.
[0015] In the present invention, a Gaussian process having a time
series structure is used, for example, with the non-linear
exogenous structure or with the non-linear autoregressive exogenous
structure. By dynamically exploring the input area, input curves
and output curves or output measurements appropriate for the time
series model are generated. The output measurements, i.e., the data
identification, serve as pieces of information for the time series
model. In the process, the input curve is parameterized in
successive curve areas, for example, successive sections of ramp
functions or of step functions which, given safety requirements and
preceding observations, are determined gradually using an
explorative approach.
[0016] The respectively subsequent section is determined while
taking the previous observations into consideration in such a way
that the information gain with respect to a criterion relating to
the model is maximized.
[0017] In the process, a Gaussian process having non-linear
exogenous structures is used with a suitable exploration criterion
as a time series model. At the same time, an additional Gaussian
model is used in order to forecast safe input areas with respect to
the given safety demands. The sections of the input curve are
determined by solving an optimization problem with secondary
conditions for taking the safety forecast into consideration.
[0018] Exemplary applications of the present invention are, for
example, test benches for internal combustion engines, in which
processes in the engines are intended to be able to be simulated.
Parameters to be detected in this case are, for example, pressure
values, exhaust values, consumption values, power values, etc.
Another application is, for example, the learning of dynamic models
of robot controllers, in which a dynamic model that maps joint
positions to joint torques of the robot is to be learned and may be
used for controlling the robot. This model may be actively learned
through exploration of the joint area; however, this should be
carried out in a safe manner so that the movement limits of the
joints are not exceeded, which could damage the robot. Another
application is, for example, the learning of a
dynamic model that is used as a substitute for a physical sensor.
The data for learning this model may be actively generated and
measured through exploration on the physical system. A safe
exploration is essential in this case, since a measurement in an
unsafe region may damage the physical system. Another application
is, for example, the learning of the behavior of a chemical
reaction, in which the safety requirements may relate to parameters
such as temperature, pressure, acidity or the like.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] Exemplary embodiments of the present invention are shown in
the figures and are explained in greater detail below.
[0020] FIG. 1 shows the sequence 100 of the method for the safe
training of a computer-aided model.
[0021] FIG. 2 shows the sequence 200 of the method for the safe
training of a computer-aided model.
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0022] The approximation of an unknown function
$f: X \subset \mathbb{R}^d \to Y \subset \mathbb{R}$ is to be
achieved. In the case of time series models such as, for example,
the well-known non-linear exogenous (NX) model, the input area is
made up of discrete values, the so-called manipulated variables.
[0023] For the point in time $k$,
$x_k = (u_k, u_{k-1}, \dots, u_{k-\tilde{d}+1})$ applies, with
$(u_k)_k$, $u_k \in \pi \subset \mathbb{R}^{\bar{d}}$ representing
the discretized curve. In this case, $\bar{d}$ is the dimension of
the input area $\pi$ of the system, $\tilde{d}$ the dimension of
the NX structure and $d = \bar{d}\,\tilde{d}$ the dimension of $X$.
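The NX input structure described above can be illustrated with a minimal sketch that stacks lagged inputs into regressor vectors. The function name `nx_regressors` and the argument name `d_tilde` (number of lags in the NX structure) are hypothetical and chosen only for this example; equidistant sampling is assumed.

```python
import numpy as np

def nx_regressors(u, d_tilde):
    """Stack x_k = (u_k, u_{k-1}, ..., u_{k-d_tilde+1}) for every valid k.

    u has shape (K,) or (K, input_dim); the result has shape
    (K - d_tilde + 1, input_dim * d_tilde).
    """
    u = np.asarray(u, dtype=float)
    if u.ndim == 1:
        u = u[:, None]
    num_steps = u.shape[0]
    if not 1 <= d_tilde <= num_steps:
        raise ValueError("d_tilde must lie between 1 and the curve length")
    rows = []
    for k in range(d_tilde - 1, num_steps):
        lags = u[k - d_tilde + 1 : k + 1][::-1]  # newest value u_k first
        rows.append(lags.reshape(-1))
    return np.vstack(rows)

u = np.arange(5.0)               # scalar manipulated variable with u_k = k
X = nx_regressors(u, d_tilde=3)  # rows are (u_k, u_{k-1}, u_{k-2})
```

Each row of `X` is one input point of the time series model, so its length equals the input dimension of the system multiplied by the number of lags.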
[0024] The elements $u_k$ are measured by the physical system and
need not be equidistant. For simplicity of notation, equidistant
spacing is assumed by way of example. In general, the control
curves are continuous signals and may be explicitly controlled.
[0025] Data in the form of $n$ successive curve sections
$D_n^f = \{\tau_i, \rho_i\}_{i=1}^{n}$ are observed in the learning
environment of the model, the input curve $\tau_i$ being a matrix
made up of $m$ input points of dimension $d$, i.e.,
$\tau_i = (x_1^i, \dots, x_m^i) \in \mathbb{R}^{d \times m}$.
Output curve $\rho_i$ includes $m$ corresponding output
measurements, i.e.,
$\rho_i = (y_1^i, \dots, y_m^i) \in \mathbb{R}^m$.
[0026] The next curve section $\tau_{n+1}$ to be input as
stimulation into the physical system is to be determined in such a
way that the information gain of $D_{n+1}^f$ with respect to the
modelling of $f$ is increased while taking safety conditions into
consideration.
[0027] As an approximation of the function $f$, a Gaussian process
(hereinafter abbreviated as GP) is used, which is specified by its
mean value function $\mu(x)$ and its covariance function
$k(x_i, x_j)$, i.e., $f(x_i) \sim GP(\mu(x_i), k(x_i, x_j))$.
[0028] Assuming noisy observations of the input and output curves,
the joint distribution according to the Gaussian process is given
as $p(P_n \mid T_n) = \mathcal{N}(P_n \mid 0, K_n + \sigma^2 I)$,
$P_n \in \mathbb{R}^{nm}$ being a vector that concatenates the
output curves, and $T_n \in \mathbb{R}^{nm \times d}$ being a
matrix containing the input curves. The covariance matrix is
represented by $K_n \in \mathbb{R}^{nm \times nm}$. As an
illustration, a Gaussian kernel is used as the covariance function,
i.e.,
$k(x_i, x_j) = \sigma_f^2 \exp(-\tfrac{1}{2}(x_i - x_j)^T \Lambda_f^2 (x_i - x_j))$,
which is parameterized by $\theta_f = (\sigma_f^2, \Lambda_f^2)$.
Also assumed are a zero vector $0 \in \mathbb{R}^{nm}$ as the mean
value, an $nm$-dimensional identity matrix $I$, and $\sigma^2$ as
the output noise variance.
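A minimal sketch of the Gaussian covariance function and of the noisy covariance matrix from the paragraph above might look as follows. The function name `gaussian_kernel`, the argument names and the example values are assumptions made for illustration; `Lambda_f2` stands for the matrix written as Lambda_f^2 in the text.

```python
import numpy as np

def gaussian_kernel(A, B, sigma_f2, Lambda_f2):
    """k(a, b) = sigma_f2 * exp(-0.5 * (a - b)^T Lambda_f2 (a - b)).

    A has shape (p, d), B has shape (q, d), Lambda_f2 has shape (d, d).
    Returns the (p, q) matrix of pairwise kernel values.
    """
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    diff = A[:, None, :] - B[None, :, :]                     # shape (p, q, d)
    quad = np.einsum("pqi,ij,pqj->pq", diff, Lambda_f2, diff)
    return sigma_f2 * np.exp(-0.5 * quad)

rng = np.random.default_rng(0)
T_n = rng.uniform(-1.0, 1.0, size=(6, 2))  # n*m = 6 stacked input points, d = 2
Lambda_f2 = np.diag([4.0, 1.0])            # assumed example value
K_n = gaussian_kernel(T_n, T_n, sigma_f2=1.5, Lambda_f2=Lambda_f2)
noise_var = 0.01                           # assumed output noise variance
C = K_n + noise_var * np.eye(len(T_n))     # covariance of the noisy outputs
```

The diagonal of `K_n` equals the signal variance, and adding the noise term keeps `C` well conditioned enough to be factorized.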
[0029] Under the given joint distribution, the forecast
distribution $p(\rho^* \mid \tau^*, D_n^f)$ may be expressed for a
new curve section $\tau^*$ as
$p(\rho^* \mid \tau^*, D_n^f) = \mathcal{N}(\rho^* \mid \mu(\tau^*), \Sigma(\tau^*))$, [equation A]
with
$\mu(\tau^*) = k(\tau^*, T_n)^T (K_n + \sigma^2 I)^{-1} P_n$ [equation B]
and
$\Sigma(\tau^*) = k^{**}(\tau^*, \tau^*) - k(\tau^*, T_n)^T (K_n + \sigma^2 I)^{-1} k(\tau^*, T_n)$, [equation C]
[0030] $k^{**} \in \mathbb{R}^{m \times m}$ being a matrix with
$k^{**}_{ij} = k(x_i, x_j)$. The matrix
$k(\tau^*, T_n)^T \in \mathbb{R}^{m \times nm}$ further contains
the kernel evaluations relating $\tau^*$ to the previous $n$ input
curves. Since the covariance matrix is fully populated, input
points $x$ are correlated both within a curve section and across
different curves, and these correlations are utilized for planning
the next curve. Since matrix $K_n + \sigma^2 I$ potentially has a
high dimension $nm$, its inversion may be time-consuming, so that
GP approximation techniques may be used.
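The forecast mean and covariance of equations B and C can be sketched in a few lines. This is a minimal illustration, not the applicant's implementation: the names `kernel` and `gp_predict` are hypothetical, the kernel is the isotropic special case with Lambda_f^2 = I / length^2, and a Cholesky factorization is used in place of an explicit matrix inverse.

```python
import numpy as np

def kernel(A, B, sigma_f2=1.0, length=0.5):
    # Assumed isotropic Gaussian kernel, i.e. Lambda_f^2 = I / length**2.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return sigma_f2 * np.exp(-0.5 * d2 / length**2)

def gp_predict(tau_star, T_n, P_n, noise_var):
    """Return the mean (m,) and covariance (m, m) of equations B and C."""
    C = kernel(T_n, T_n) + noise_var * np.eye(len(T_n))  # K_n + sigma^2 I
    k_star = kernel(T_n, tau_star)                       # shape (nm, m)
    k_ss = kernel(tau_star, tau_star)                    # k**, shape (m, m)
    L = np.linalg.cholesky(C)
    weights = np.linalg.solve(L.T, np.linalg.solve(L, P_n))
    V = np.linalg.solve(L, k_star)
    mu = k_star.T @ weights                              # equation B
    Sigma = k_ss - V.T @ V                               # equation C
    return mu, Sigma

T_n = np.linspace(0.0, 1.0, 8)[:, None]       # stacked previous input points
P_n = np.sin(2.0 * np.pi * T_n[:, 0])         # stacked previous outputs
tau_star = np.array([[0.25], [0.5], [0.75]])  # candidate section, m = 3
mu, Sigma = gp_predict(tau_star, T_n, P_n, noise_var=0.1)
```

The off-diagonal entries of `Sigma` carry the correlations between the points of the candidate section, which is what a matrix-valued exploration criterion operates on.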
[0031] The safety status of the system is described by an unknown
function $g$, with
$g: X \subset \mathbb{R}^d \to Z \subset \mathbb{R}$, which assigns
to each input point $x$ a safety value $z$, which serves as a
safety indicator. Values $z$ are determined using pieces of
information from the system, and are configured in such a way that
for all values of $z$ that are greater than or equal to zero, the
corresponding input point $x$ is considered safe.
[0032] Such safety values z are a function of the respective system
and may, as explained above, embody system-dependent values for
safe or unsafe pressure values, exhaust values, consumption values,
power values, joint position values, movement limits, sensor
values, temperature values, acidity values or the like.
[0033] The values of $z$ are generally continuous and indicate the
distance of a given point $x$ from the unknown safety limit in the
input area. Thus, the safety level for a curve $\tau$ may be
ascertained with the given function $g$ or with an estimation
thereof. A curve is classified as safe if the probability that its
safety values $z_1, \dots, z_m$ are greater than or equal to zero
is sufficiently great, i.e.,
$\int_{z_1, \dots, z_m \geq 0} p(z_1, \dots, z_m \mid \tau)\, dz_1 \dots dz_m > 1 - \alpha$,
with $\alpha \in [0, 1]$ representing the threshold value for
considering $\tau$ unsafe. With the given data
$D_n^g = \{\tau_i, \zeta_i\}_{i=1}^{n}$, with
$\zeta_i = (z_1^i, \dots, z_m^i) \in \mathbb{R}^m$, a GP may be
used in order to approximate the function $g$. The forecast
distribution $p(\zeta^* \mid \tau^*, D_n^g)$ for a given curve
section $\tau^*$ is calculated as
$p(\zeta^* \mid \tau^*, D_n^g) = \mathcal{N}(\zeta^* \mid \mu_g(\tau^*), \Sigma_g(\tau^*))$, [equation D]
[0034] $\mu_g(\tau^*)$ and $\Sigma_g(\tau^*)$ being the
corresponding mean value and covariance. The variables $\mu_g$ and
$\Sigma_g$ are calculated as shown in equations B and C, however,
with $Z_n \in \mathbb{R}^{nm}$ as the target vector, which
concatenates all $\zeta_i$. By using a GP for approximating $g$,
safety condition $\xi(\tau)$ may be calculated for a curve $\tau$
as follows:
$\xi(\tau) = \int_{z_1, \dots, z_m \geq 0} \mathcal{N}(\zeta \mid \mu_g(\tau), \Sigma_g(\tau))\, dz_1 \dots dz_m > 1 - \alpha$. [equation E]
[0035] The calculation of $\xi(\tau)$ is generally difficult to
solve analytically, and thus an approximation may be used such as,
for example, a Monte Carlo simulation or expectation propagation.
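One way to approximate the safety condition of equation E is a plain Monte Carlo estimate: draw samples from the forecast distribution of the safety model and count how often all m safety values are non-negative. The sketch below assumes the mean and covariance are already available; the function names and the sample count are illustrative choices.

```python
import numpy as np

def safety_probability_mc(mu_g, Sigma_g, num_samples=20000, seed=0):
    """Estimate P(z_1 >= 0, ..., z_m >= 0) for z ~ N(mu_g, Sigma_g)."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mu_g, Sigma_g, size=num_samples)
    return float(np.mean(np.all(samples >= 0.0, axis=1)))

def is_safe(mu_g, Sigma_g, alpha, **kwargs):
    """Check the condition xi(tau) > 1 - alpha with the Monte Carlo estimate."""
    return safety_probability_mc(mu_g, Sigma_g, **kwargs) > 1.0 - alpha

# Two independent standard normal safety values: exact probability is 0.25.
p_quarter = safety_probability_mc(np.zeros(2), np.eye(2))
far_inside = is_safe(np.array([3.0, 3.0]), np.eye(2), alpha=0.05)
far_outside = is_safe(np.array([-3.0, -3.0]), np.eye(2), alpha=0.05)
```

The estimate carries sampling error, so a curve whose probability is close to 1 - alpha may be classified differently from run to run unless the seed is fixed or the sample count is increased.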
[0036] For the efficient selection of an optimal .tau., the curve
must be parameterized in a suitable manner. One possibility is to
perform the parameterization already in the input area. The
parameterization of the curve may be implemented, for example, as
ramp functions or step functions.
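As a hedged illustration of a ramp parameterization in the input area, the following sketch describes a curve section by its end point alone. Treating the parameter `eta` as the ramp end point, and the function name `ramp_section`, are assumptions made for this example.

```python
import numpy as np

def ramp_section(start, eta, m):
    """Return m equally spaced points on the line from start to eta.

    The start point itself is excluded (it closes the previous section)
    and the end point eta is included. The result has shape (m, input_dim).
    """
    start = np.asarray(start, dtype=float)
    eta = np.asarray(eta, dtype=float)
    fractions = np.arange(1, m + 1) / m
    return start + fractions[:, None] * (eta - start)

section = ramp_section(start=[0.0, 0.0], eta=[1.0, 2.0], m=4)
```

With this choice the search for the next section reduces to a search over a single point in the input area, which keeps the optimization low-dimensional.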
[0037] For a curve parameterization with a forecast distribution
according to equation A and safety conditions according to equation
E, the next curve section $\tau_{n+1}(\eta^*)$ may be obtained by
solving the following optimization problem with secondary
conditions:
$\eta^* = \arg\max_{\eta \in \pi} J(\Sigma(\eta))$ [equation F]
subject to $\xi(\eta) > 1 - \alpha$, [equation G]
[0038] $\eta \in \pi$ representing the curve parameterization and
$J$ an optimality criterion.
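The constrained problem of equations F and G can be illustrated with a simple search over a finite candidate set. This is a swapped-in technique chosen for brevity, not necessarily the optimizer used in practice; the callables `predictive_cov` and `safety_prob` are hypothetical stand-ins for the regression and safety models.

```python
import numpy as np

def select_next_section(candidates, predictive_cov, safety_prob, alpha, criterion):
    """Pick the candidate maximizing J(Sigma(eta)) among the safe candidates."""
    best_eta, best_score = None, -np.inf
    for eta in candidates:
        if not safety_prob(eta) > 1.0 - alpha:   # violates equation G
            continue
        score = criterion(predictive_cov(eta))   # objective of equation F
        if score > best_score:
            best_eta, best_score = eta, score
    return best_eta, best_score

# Toy stand-ins: uncertainty grows with eta, estimated safety shrinks with eta.
candidates = np.arange(0, 9) * 0.25              # 0.0, 0.25, ..., 2.0
best_eta, best_score = select_next_section(
    candidates,
    predictive_cov=lambda eta: np.array([[0.1 + eta**2]]),
    safety_prob=lambda eta: 1.0 - eta / 2.0,
    alpha=0.5,
    criterion=np.linalg.det,
)
```

In this toy setting the most informative candidates lie outside the safe set, so the search settles on the largest value that still satisfies the safety condition. If no candidate is safe, the function returns `None` as the selected parameter.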
[0039] According to equation F, predictive variance $\Sigma$ from
equation A is used for the exploration. This is a covariance
matrix, which is mapped to a real number by optimality criterion
$J$, as shown in equation F. Different optimality criteria may be
used for $J$ as a function of the system. Thus, $J$ may, for
example, be the determinant, i.e., equivalent to maximizing the
volume of the forecast confidence ellipsoid of the multivariate
normal distribution, the trace, i.e., equivalent to maximizing the
average forecast variance, or the maximum eigenvalue, i.e.,
equivalent to maximizing the largest axis of the forecast
confidence ellipsoid. However, other optimality criteria are also
conceivable.
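The three optimality criteria named above each map a covariance matrix to a single real number. A minimal sketch, with hypothetical function names, is:

```python
import numpy as np

def j_determinant(Sigma):
    """Determinant criterion, related to the volume of the confidence ellipsoid."""
    return float(np.linalg.det(Sigma))

def j_trace(Sigma):
    """Trace criterion, proportional to the average forecast variance."""
    return float(np.trace(Sigma))

def j_max_eigenvalue(Sigma):
    """Largest eigenvalue, related to the longest axis of the ellipsoid."""
    return float(np.linalg.eigvalsh(Sigma)[-1])  # eigvalsh sorts ascending

Sigma = np.diag([4.0, 1.0])
scores = (j_determinant(Sigma), j_trace(Sigma), j_max_eigenvalue(Sigma))
```

For larger sections the determinant can underflow, so the logarithm of the determinant (for example via `numpy.linalg.slogdet`) is a common numerically safer substitute that yields the same maximizer.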
[0040] Referring to FIG. 1, an initialization is carried out in
step 120 by implementing $n_0$ safe initial curves. A regression
process and a safety process (Gaussian processes) are also created
in the process. The initial curves are located in a small safe area
in which the exploration begins. This small safe area is selected
in advance as a result of prior knowledge about the system. The
initial curves yield the data
$D_0^{f,g} = \{\tau_i, \rho_i, \zeta_i\}_{i=1}^{n}$ with $n = n_0$.
[0041] A new curve section $\tau_{n+1}$ is subsequently determined
in step 160 according to equations F and G by optimizing $\eta$.
[0042] The determined curve section $\tau_{n+1}$ is subsequently
applied as input in step 170, and $\rho_{n+1}$ and $\zeta_{n+1}$
are measured on the physical system over this section.
[0043] The regression and safety processes are then updated in step
150. Regression model $f$ is updated according to equation A using
$D_n^f = \{\tau_i, \rho_i\}_{i=1}^{n}$, and safety model $g$ is
updated according to equation D using
$D_n^g = \{\tau_i, \zeta_i\}_{i=1}^{n}$.
[0044] Steps 150 through 170 are passed through $N$ times. In
addition to a previously established number of passes, automatic
termination upon reaching a termination condition is also possible.
This could be based, for example, on training errors (an error
metric between model prediction and system response) or on the
additional potential information gain (terminating if the
optimality criterion becomes too small).
[0045] Subsequently, the regression model and the safety model are
output in step 190.
[0046] Referring to FIG. 2, an implementation 200 of the method is
explained. In step 210, a safety threshold value is established. In
the process, a value between 0 and 1 is selected for $\alpha$. An
initialization is then carried out in step 220 by implementing
$n_0$ safe initial curves,
$D_0^{f,g} = \{\tau_i, \rho_i, \zeta_i\}_{i=1}^{n}$, with
$n = n_0$. In this way, regression and safety processes (Gaussian
processes) are also created. The initial curves are located in a
small safe area in which the exploration begins. This small safe
area is selected in advance as a result of prior knowledge about
the system.
[0047] The part of the method encompassing steps 240 through 280 is
carried out $N$ times, $k$ being the control variable, i.e.,
indicating the current pass. As in FIG. 1, automatic termination
upon reaching a termination condition is conceivable, in addition
to a previously established number of passes. This could, for
example, be based on training errors (an error metric between model
prediction and system response) or on the additional potential
information gain (terminating if the optimality criterion becomes
too small).
[0048] Regression model $f$ according to equation A is first
updated in step 240 using
$D_k^f = \{\tau_i, \rho_i\}_{i=1}^{n}$. In step 250, safety model
$g$ according to equation D is updated using
$D_k^g = \{\tau_i, \zeta_i\}_{i=1}^{n}$. In the first pass of steps
240 through 280, steps 240 and 250 may be omitted.
[0049] A new curve section $\tau_{n+1}$ is subsequently determined
in step 260 according to equations F and G by optimizing $\eta$.
[0050] The determined curve section $\tau_{n+1}$ is subsequently
applied as input in step 270, and $\rho_{n+1}$ and $\zeta_{n+1}$
are measured on the physical system over this section.
[0051] The input and output curves processed in the preceding steps
are then added to $D_{k-1}^f$ and $D_{k-1}^g$ in step 280.
[0052] After the repetitions of steps 240 through 280 are
completed, step 290 follows, in which the regression and safety
model are updated and output.
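The control flow of FIG. 2 can be summarized in a short skeleton. It is a sketch of the ordering of steps only: the callables `fit_models`, `choose_section` and `measure` are hypothetical placeholders for the Gaussian process updates, the constrained optimization and the physical measurement, and the early exit when no safe section is found is an assumption of this example.

```python
def safe_active_training(alpha, initial_f, initial_g, fit_models,
                         choose_section, measure, num_passes):
    """Run the update / select / measure / store cycle for num_passes passes."""
    data_f = list(initial_f)   # pairs (tau_i, rho_i) from the initialization
    data_g = list(initial_g)   # pairs (tau_i, zeta_i) from the initialization
    for _ in range(num_passes):
        regression_model, safety_model = fit_models(data_f, data_g)   # 240, 250
        tau_next = choose_section(regression_model, safety_model, alpha)  # 260
        if tau_next is None:   # assumed exit: no candidate judged safe
            break
        rho_next, zeta_next = measure(tau_next)                       # 270
        data_f.append((tau_next, rho_next))                           # 280
        data_g.append((tau_next, zeta_next))
    return fit_models(data_f, data_g)                                 # 290

# Toy stand-ins that only track how many curve sections have been seen.
def fit_models(data_f, data_g):
    return {"num_curves": len(data_f)}, {"num_curves": len(data_g)}

def choose_section(regression_model, safety_model, alpha):
    return [float(regression_model["num_curves"])]  # dummy one-point section

def measure(tau):
    return [2.0 * tau[0]], [1.0]                    # dummy output and safety value

initial_f = [([0.0], [0.0]), ([1.0], [2.0])]
initial_g = [([0.0], [1.0]), ([1.0], [1.0])]
reg_model, safe_model = safe_active_training(
    0.05, initial_f, initial_g, fit_models, choose_section, measure, num_passes=3
)
```

Passing the three operations in as callables keeps the loop independent of the particular Gaussian process library and of the test bench interface.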
[0053] The incremental updating of the GP model for new data, i.e.,
step 150, or steps 240 and 250 respectively, may be efficiently
carried out, for example, by a rank-one update of the matrix. An
NX structure is shown by way of example here in combination with
the GP model for time series modelling; however, the general
non-linear autoregressive exogenous case may also be used, i.e., a
GP with NARX input structure, where
$x_k = (y_k, y_{k-1}, \dots, y_{k-q}, u_k, u_{k-1}, \dots, u_{k-d})$.
In this case, the forecast mean value of
$p(\rho \mid \tau, D_n^f)$, for example, may be used as a
substitute for $y_k$ for optimizing and for planning the next curve
section. The input stimulation of the system is nevertheless
carried out via manipulated variable $u_k$ in the case of NARX.
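The NARX input structure mentioned above can be sketched for scalar series as follows. The function name `narx_regressor` and the argument name `d_lag` (number of input lags, written d in the formula) are hypothetical; during planning, the entries of `y` would be forecast mean values rather than measurements.

```python
import numpy as np

def narx_regressor(y, u, k, q, d_lag):
    """Build x_k = (y_k, ..., y_{k-q}, u_k, ..., u_{k-d_lag}) from scalar series."""
    if k < max(q, d_lag) or k >= min(len(y), len(u)):
        raise IndexError("k must leave room for all requested lags")
    y_part = [y[k - j] for j in range(q + 1)]      # outputs, newest first
    u_part = [u[k - j] for j in range(d_lag + 1)]  # inputs, newest first
    return np.array(y_part + u_part, dtype=float)

y_pred = [10.0, 11.0, 12.0, 13.0]  # e.g. forecast mean values used in place of y
u = [0.0, 1.0, 2.0, 3.0]           # manipulated variable
x_3 = narx_regressor(y_pred, u, k=3, q=1, d_lag=2)
```

Only the `u` entries can be chosen by the experimenter, which is why the stimulation of the system still takes place through the manipulated variable.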
* * * * *