U.S. patent application number 17/499546 was published by the patent office on 2022-04-14 for a model learning apparatus, control apparatus, model learning method and computer program.
This patent application is currently assigned to KABUSHIKI KAISHA TOYOTA CHUO KENKYUSHO. The applicant listed for this patent is KABUSHIKI KAISHA TOYOTA CHUO KENKYUSHO. The invention is credited to Taro IKEDA, Ryuta MORIYASU, Masato TAKEUCHI.
Application Number | 20220114461 / 17/499546
Publication Date | 2022-04-14
United States Patent Application 20220114461
Kind Code: A1
MORIYASU; Ryuta; et al.
April 14, 2022
MODEL LEARNING APPARATUS, CONTROL APPARATUS, MODEL LEARNING METHOD AND COMPUTER PROGRAM
Abstract
A model learning apparatus is configured to learn a model that
shows a relationship between an input variable v input into a
system and an output variable y output from the system. The model
learning apparatus includes a storage that stores a model used to
learn a nonlinear equation of state for predicting the output
variable y by using the input variable v, and a processor
programmed to learn the equation of state by using the model and an
input-output data set including multiple sets of input variable
data and output variable data with respect to the model. The model
is an equation of state including a bijective mapping Ψ that uses
the input variable v as an input thereof and a bijective mapping Φ
that uses the output variable y as an input thereof.
Inventors: MORIYASU; Ryuta (Nagakute-shi, JP); IKEDA; Taro (Nagakute-shi, JP); TAKEUCHI; Masato (Kariya-shi, JP)
Applicant: KABUSHIKI KAISHA TOYOTA CHUO KENKYUSHO, Nagakute-shi, JP
Assignee: KABUSHIKI KAISHA TOYOTA CHUO KENKYUSHO, Nagakute-shi, JP
Appl. No.: 17/499546
Filed: October 12, 2021
International Class: G06N 5/04 (20060101) G06N005/04; G06N 5/02 (20060101) G06N005/02
Foreign Application Data
Date | Code | Application Number
Oct 14, 2020 | JP | 2020-173380
Claims
1. A model learning apparatus configured to learn a model that
shows a relationship between an input variable v input into a
system and an output variable y output from the system, the model
learning apparatus comprising: a storage that stores a model used
to learn a nonlinear equation of state for predicting the output
variable y by using the input variable v; and a processor
programmed to learn the equation of state by using the model and an
input-output data set including multiple sets of input variable
data and output variable data with respect to the model, wherein
the model is an equation of state including a bijective mapping Ψ
that uses the input variable v as an input thereof and a bijective
mapping Φ that uses the output variable y as an input thereof.
2. The model learning apparatus according to claim 1, wherein the model is defined by an expression (1):

$$\dot{y} = \left(\frac{\partial \Phi}{\partial y}\right)^{-1}\left\{A'(d)\,\Phi(y, d) + B'(d)\,\Psi(v, d) + c'(d) - \frac{\partial \Phi}{\partial d}\,\dot{d}\right\} \tag{1}$$

where a left side of an equal sign is a time derivative of an n-dimensional vector that indicates the output variable y, where n denotes an integer number; and in a right side of the equal sign, the input variable v is an m-dimensional vector, where m denotes an integer number, an exogenous input d is a p-dimensional vector that indicates an uncontrollable input affecting a variation of the output variable y, where p denotes an integer number, the mapping Ψ is a function that gives an m-dimensional vector by using the input variable v and the exogenous input d as inputs thereof, the mapping Φ is a function that gives an n-dimensional vector by using the output variable y and the exogenous input d as inputs thereof, and a function A', a function B' and a function c' are respectively functions that give an n×n matrix, an n×m matrix, and an n-dimensional vector by using the exogenous input d as an input thereof.
3. The model learning apparatus according to claim 2, wherein in the expression (1), the mapping Ψ is defined as an internal variable u and the mapping Φ is defined as an internal variable x, and the processor learns the equation of state defined by an expression (2) to an expression (4):

$$u = \Psi(v, d); \tag{2}$$
$$y = \Phi^{-1}(x, d); \tag{3}$$
$$\dot{x} = A'(d)\,x + B'(d)\,u + c'(d). \tag{4}$$
4. The model learning apparatus according to claim 3, wherein the mapping Ψ is defined by an expression (5) to an expression (8):

$$\Psi(v, d) = v_\Psi^{(L_\Psi)}; \tag{5}$$
$$v_\Psi^{(i)} = \psi_\Psi^{(i)}\!\left(u_\Psi^{(i)}, d\right); \tag{6}$$
$$u_\Psi^{(i)} = W_\Psi^{(i)}(d)\,v_\Psi^{(i-1)} + b_\Psi^{(i)}(d); \tag{7}$$
$$v_\Psi^{(0)} = v, \tag{8}$$

and the mapping Φ is defined by an expression (9) to an expression (12):

$$\Phi(y, d) = y_\Phi^{(L_\Phi)}; \tag{9}$$
$$y_\Phi^{(i)} = \varphi_\Phi^{(i)}\!\left(x_\Phi^{(i)}, d\right); \tag{10}$$
$$x_\Phi^{(i)} = W_\Phi^{(i)}(d)\,y_\Phi^{(i-1)} + b_\Phi^{(i)}(d); \tag{11}$$
$$y_\Phi^{(0)} = y, \tag{12}$$

where i denotes a layer number in a multilayer neural network; each of L_Ψ and L_Φ denotes the number of layers in the multilayer neural network; each of W_Ψ and W_Φ denotes a weight; each of b_Ψ and b_Φ denotes a bias; and each of ψ_Ψ and φ_Φ is an activation function and denotes an arbitrary bijective mapping that gives an output of the same dimension as its input.
5. The model learning apparatus according to claim 1, wherein the
processor is programmed to: transmit a set of the input variable
data in the input-output data set to the model and estimate an
output; evaluate a matching degree of the estimated output with a
set of the output variable data in the input-output data set; and
update a learning parameter of the model according to a result of
the evaluation, so as to learn the equation of state.
6. The model learning apparatus according to claim 2, wherein the
processor is programmed to: give a set of the input variable data
in the input-output data set to the model and estimate an output;
evaluate a matching degree of the estimated output with a set of
the output variable data in the input-output data set; and update a
learning parameter of the model according to a result of the
evaluation, so as to learn the equation of state.
7. The model learning apparatus according to claim 3, wherein the
processor is programmed to: give a set of the input variable data
in the input-output data set to the model and estimate an output;
evaluate a matching degree of the estimated output with a set of
the output variable data in the input-output data set; and update a
learning parameter of the model according to a result of the
evaluation, so as to learn the equation of state.
8. The model learning apparatus according to claim 4, wherein the
processor is programmed to: give a set of the input variable data
in the input-output data set to the model and estimate an output;
evaluate a matching degree of the estimated output with a set of
the output variable data in the input-output data set; and update a
learning parameter of the model according to a result of the
evaluation, so as to learn the equation of state.
9. The model learning apparatus according to claim 3, wherein the processor is programmed to learn an equation of state expressed by an expression (13) to an expression (15) obtained by discretizing the expression (2) to the expression (4) by a time step at a discrete time k:

$$u_k = \Psi(v_k, d_k); \tag{13}$$
$$y_k = \Phi^{-1}(x_k, d_k); \tag{14}$$
$$x_{k+1} = A(d_k)\,x_k + B(d_k)\,u_k + c(d_k). \tag{15}$$
10. A control apparatus configured to control a system, comprising:
the model learning apparatus according to claim 9, wherein the
processor is programmed to determine a target value of the input
variable v corresponding to a target value of the output variable y
by using the equation of state learned by the processor, and the
processor solves an optimal control problem using the equation of
state expressed by the expression (13) to the expression (15) and
learned by the processor.
11. A model learning method of learning a model that shows a
relationship between an input variable v input into a system and an
output variable y output from the system, the model learning method
comprising: a process of obtaining a model used to learn a
nonlinear equation of state for predicting the output variable y by
using the input variable v; and a process of learning the equation
of state by using the model and an input-output data set including
multiple sets of input variable data and output variable data with
respect to the model, wherein the model is an equation of state
including a bijective mapping Ψ that uses the input variable v
as an input thereof and a bijective mapping Φ that uses the
output variable y as an input thereof.
12. A non-transitory computer-readable storage medium that stores a program that causes an information processing apparatus to perform learning of a model that shows a relationship between an input variable v input into a system and an output variable y output from the system, the computer program causing the information processing apparatus to perform: a function of obtaining a model used to learn a nonlinear equation of state for predicting the output variable y by using the input variable v; and a function of learning the equation of state by using the model and an input-output data set including multiple sets of input variable data and output variable data with respect to the model, wherein the model is an equation of state including a bijective mapping Ψ that uses the input variable v as an input thereof and a bijective mapping Φ that uses the output variable y as an input thereof.
Description
CROSS-REFERENCE TO RELATED APPLICATION
[0001] The present application claims priority to Japanese Patent
Application No. 2020-173380 filed on Oct. 14, 2020. The disclosure
of the prior application is hereby incorporated by reference in its
entirety.
TECHNICAL FIELD
[0002] The present disclosure relates to a model learning
apparatus, a control apparatus, a model learning method, and a
computer program.
BACKGROUND
[0003] A model learning apparatus has conventionally been known to
learn a model showing a relationship between an input for
controlling a system and an output from the system in response to
this input.
[0004] For example, Japanese Patent Application No. 2018-179888 (JP
2020-51305A) describes a model learning apparatus configured to
learn a model, which is used for model predictive control that
predicts and controls a future state of a system, by machine
learning. Non-Patent Literature "Optimal Control Via Neural
Networks: A Convex Approach" (Yize Chen, Yuanyuan Shi, Baosen Zhang
URL: https://arxiv.org/abs/1805.11835) describes a technique of
maximizing the output of a system by model predictive control using
a special model.
SUMMARY
Technical Problem
[0005] The proposed techniques described above, however, still leave room for improvement with respect to learning a model that is capable of establishing a control apparatus configured to determine an input that improves the correlation of an output to a target value, while stably controlling the system. Model predictive control using a model solves a type of optimization problem called an optimal control problem (OCP) at every control period of the system. The optimal control problem uses a model to predict a future state of the system and a change in the output of the system, and determines an optimum time series of inputs that provides the most desirable behavior with regard to the state of the system and the change in output. More specifically, the model predictive control solves an optimization (minimization) problem to determine a time series of inputs that minimizes an objective function arbitrarily set by a designer.
[0006] In the technique described in JP 2020-51305A, the model learnt by machine learning has a relatively high non-linearity. An optimal control problem is thus likely to become a nonconvex optimization problem. This is unlikely to guarantee the uniqueness of a solution and is also likely to cause an irregular fluctuation of an input, depending on a set initial condition. It is accordingly difficult to assure reliability. The technique of the Non-Patent Literature establishes a control apparatus by using a special model to determine an input that maximizes or minimizes a certain output or a state itself. It is, however, difficult to determine a unique input that minimizes a deviation of the output in the case where the output is controlled to follow a given target value. The control of making the output follow the target value is thus likely to become unstable.
[0007] In order to solve the problems described above, with respect
to a model learning apparatus configured to learn a model showing a
relationship between an input and an output in a system, an object
of the present disclosure is to provide a technique of learning a
model to determine an input that improves the correlation of an
output to a target value, while stably controlling the system.
Solution to Problem
[0008] The present disclosure may be implemented by aspects
described below to solve the problems described above.
[0009] (1) According to one aspect of the present disclosure, there
is provided a model learning apparatus configured to learn a model
that shows a relationship between an input variable v input into a
system and an output variable y output from the system. This model
learning apparatus comprises a model storage portion configured to
store a model used to learn a nonlinear equation of state for
predicting the output variable y by using the input variable v; and
a learning portion configured to learn the equation of state by
using the model and an input-output data set including multiple
sets of input variable data and output variable data with respect
to the model. The model is an equation of state including a
bijective mapping Ψ that uses the input variable v as an input
thereof and a bijective mapping Φ that uses the output variable
y as an input thereof.
[0010] In the model learning apparatus of this aspect, the model is
the equation of state including the bijective mapping Ψ that
uses the input variable v input into the system as the input
thereof and the bijective mapping Φ that uses the output
variable y output from the system as the input thereof. This
equation of state is linearized by using the respective mappings
Ψ and Φ as internal variables. This configuration thus
guarantees that even a control problem using a model having a
nonlinear structure gives a unique solution. This allows for
determination of just one optimum value of the input variable v
input into the system. In the case where this model learning
apparatus is applied to a control apparatus configured to control a
system, the control apparatus uses the optimum value of the input
variable v to improve the correlation of an output from the system
to a target value, while stably controlling the system.
Accordingly, the model learning apparatus of this configuration
learns a model that is capable of establishing a control apparatus
configured to determine an input that improves the correlation of
an output to a target value, while stably controlling the
system.
[0011] (2) In the model learning apparatus of the above aspect, the model may be defined by an expression (1):

[Math. 16]
$$\dot{y} = \left(\frac{\partial \Phi}{\partial y}\right)^{-1}\left\{A'(d)\,\Phi(y, d) + B'(d)\,\Psi(v, d) + c'(d) - \frac{\partial \Phi}{\partial d}\,\dot{d}\right\} \tag{1}$$
[0012] where a left side of an equal sign is a time derivative of an n-dimensional vector that indicates the output variable y, where n denotes an integer number; and in a right side of the equal sign, the input variable v is an m-dimensional vector, where m denotes an integer number, an exogenous input d is a p-dimensional vector that indicates an uncontrollable input affecting a variation of the output variable y, where p denotes an integer number, the mapping Ψ is a function that gives an m-dimensional vector by using the input variable v and the exogenous input d as inputs thereof, the mapping Φ is a function that gives an n-dimensional vector by using the output variable y and the exogenous input d as inputs thereof, and a function A', a function B' and a function c' are respectively functions that give an n×n matrix, an n×m matrix, and an n-dimensional vector by using the exogenous input d as an input thereof. In the model learning apparatus of this aspect, the mappings Ψ and Φ are respectively the bijective mappings using the input variable v and the output variable y as their inputs, so that the expression (1) can be formally rewritten with, for example, functions F and G such that F⁻¹ = Ψ and G⁻¹ = Φ. The exogenous input d, which is the uncontrollable input affecting a variation of the output variable y, is included in each of the mappings Ψ and Φ included in the model of the expression (1). Furthermore, in the model of the expression (1), a function A'(d) and a function B'(d) that use the exogenous input d as inputs thereof respectively work as coefficients of the mappings Φ and Ψ. Additionally, the model of the expression (1) includes a function c'(d) that uses the exogenous input d as an input thereof and a time derivative term of the exogenous input d. This causes the model of the expression (1) to be an equation of state that takes into account the influence of the uncontrollable exogenous input d affecting a variation of the output variable y. Using this model thus enables a future state of the system to be predicted with high accuracy. Accordingly, the model learning apparatus of this configuration learns a model that controls the system with high accuracy.
[0013] (3) In the model learning apparatus of the above aspect, in the expression (1), when the mapping Ψ is defined as an internal variable u and the mapping Φ is defined as an internal variable x, the learning portion may learn the equation of state defined by an expression (2) to an expression (4):

[Math. 17]
$$u = \Psi(v, d); \tag{2}$$
[Math. 18]
$$y = \Phi^{-1}(x, d); \tag{3}$$
[Math. 19]
$$\dot{x} = A'(d)\,x + B'(d)\,u + c'(d). \tag{4}$$
[0014] In the model learning apparatus of this aspect, the equation of state of the expression (1) is linearized by defining the mapping Ψ and the mapping Φ in the equation of state of the expression (1) respectively as the internal variable u and the internal variable x. This configuration guarantees that an optimal control problem using the equation of state shown by the expression (1) gives a unique solution. Accordingly, the model learning apparatus of this configuration learns a model that is capable of establishing a control apparatus configured to determine an input that improves the correlation of an output to a target value, while stably controlling the system.
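The structure of the expressions (2) to (4) can be sketched numerically. In the toy script below, Ψ = tanh, Φ = sinh, and the constant scalars standing in for A'(d), B'(d) and c'(d) are all illustrative assumptions, not values from the disclosure; the point is only that the dynamics are linear in the internal variables u and x while remaining nonlinear in v and y.

```python
import numpy as np

# Hypothetical scalar instance of expressions (2)-(4). Psi and Phi are
# illustrative bijections; A_, B_, c_ stand in for A'(d), B'(d), c'(d).
Psi = np.tanh              # u = Psi(v), bijective from R onto (-1, 1)
Phi = np.sinh              # x = Phi(y), bijective with a closed-form inverse
Phi_inv = np.arcsinh       # y = Phi^{-1}(x)
A_, B_, c_ = -1.0, 2.0, 0.5

def step(y, v, dt=0.01):
    """One explicit-Euler step of the linear internal dynamics (4),
    mapped back to the output coordinate via (3)."""
    u = Psi(v)                                # expression (2): lift the input
    x = Phi(y)                                # internal state for the output y
    x_next = x + dt * (A_ * x + B_ * u + c_)  # expression (4), Euler-discretized
    return Phi_inv(x_next)                    # expression (3): recover the output

y = 0.0
for _ in range(1000):   # drive the system with a constant input v = 0.3
    y = step(y, 0.3)
```

Because the internal dynamics are linear and stable here (A' < 0), the internal state settles at x* = −(B'u + c')/A' and the output settles at Φ⁻¹(x*).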
[0015] (4) In the model learning apparatus of the above aspect, the mapping Ψ may be defined by an expression (5) to an expression (8):

[Math. 20]
$$\Psi(v, d) = v_\Psi^{(L_\Psi)}; \tag{5}$$
[Math. 21]
$$v_\Psi^{(i)} = \psi_\Psi^{(i)}\!\left(u_\Psi^{(i)}, d\right); \tag{6}$$
[Math. 22]
$$u_\Psi^{(i)} = W_\Psi^{(i)}(d)\,v_\Psi^{(i-1)} + b_\Psi^{(i)}(d); \tag{7}$$
[Math. 23]
$$v_\Psi^{(0)} = v, \tag{8}$$

[0016] the mapping Φ may be defined by an expression (9) to an expression (12):

[Math. 24]
$$\Phi(y, d) = y_\Phi^{(L_\Phi)}; \tag{9}$$
[Math. 25]
$$y_\Phi^{(i)} = \varphi_\Phi^{(i)}\!\left(x_\Phi^{(i)}, d\right); \tag{10}$$
[Math. 26]
$$x_\Phi^{(i)} = W_\Phi^{(i)}(d)\,y_\Phi^{(i-1)} + b_\Phi^{(i)}(d); \tag{11}$$
[Math. 27]
$$y_\Phi^{(0)} = y, \tag{12}$$
[0017] In the expression (5) to the expression (12), i denotes a layer number in a multilayer neural network; each of L_Ψ and L_Φ denotes the number of layers in the multilayer neural network; each of W_Ψ and W_Φ denotes a weight; each of b_Ψ and b_Φ denotes a bias; and each of ψ_Ψ and φ_Φ is an activation function and denotes an arbitrary bijective mapping that gives an output of the same dimension as its input. In the model learning apparatus of this aspect, each of the mappings Ψ and Φ is defined by using a multilayer neural network. This configuration enables a model that predicts an actual output of the system with high accuracy to be learnt by adjusting the weights W_Ψ and W_Φ and the biases b_Ψ and b_Φ in each layer of the multilayer neural network such as to cause the output variable y, calculated by using the model from the input variable v, to approach an actual output of the system. Accordingly, the model learning apparatus of this configuration learns a model that is capable of establishing a control apparatus configured to determine an input that further improves the correlation of an output to a target value.
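A minimal sketch of such a layered bijection, under simplifying assumptions: the weights and biases are fixed (the d-dependence of W(d) and b(d) in expressions (5) to (8) is omitted), the weight matrices are square and well-conditioned so each affine step is invertible, and sinh serves as a strictly increasing elementwise activation. All sizes and values are illustrative, not from the disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 3  # number of layers, playing the role of L_Psi
# Near-identity square weights stay invertible; sinh is a strictly
# increasing elementwise bijection, so the whole stack is bijective.
Ws = [np.eye(2) + 0.1 * rng.standard_normal((2, 2)) for _ in range(L)]
bs = [0.1 * rng.standard_normal(2) for _ in range(L)]

def psi_forward(v):
    """Expressions (5)-(8): v^(0) = v, u^(i) = W^(i) v^(i-1) + b^(i),
    v^(i) = sinh(u^(i)); the output is v^(L)."""
    for W, b in zip(Ws, bs):
        v = np.sinh(W @ v + b)
    return v

def psi_inverse(u):
    """Undo the layers in reverse order: arcsinh inverts the activation,
    a linear solve inverts the affine map."""
    for W, b in zip(reversed(Ws), reversed(bs)):
        u = np.linalg.solve(W, np.arcsinh(u) - b)
    return u

v = np.array([0.7, -0.2])
u = psi_forward(v)   # psi_inverse(u) recovers v
```

The round trip psi_inverse(psi_forward(v)) returns v exactly, which is what bijectivity of each layer buys.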
[0018] (5) In the model learning apparatus of the above aspect, the
learning portion may be configured to: give a set of the input
variable data in the input-output data set to the model and
estimate an output; evaluate a matching degree of the estimated
output with a set of the output variable data in the input-output
data set; and update a learning parameter of the model according to
a result of the evaluation, so as to learn the equation of state.
In the model learning apparatus of this aspect, the learning portion evaluates the matching degree of the output, estimated by using the input variable data in the input-output data set, with the output variable data. The learning portion updates the learning parameter of the model according to this evaluation of the matching degree to learn the equation of state. The learning portion can thus learn a nonlinear equation of state according to a learning procedure that uses an input-output data set provided in advance as teaching data. This enables the model to be learnt in accordance with an actual system. Accordingly, this configuration learns a model that is capable of establishing a control apparatus configured to further improve the correlation of an output from the system to a target value, while more stably controlling the system.
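The give–estimate–evaluate–update cycle can be sketched with a deliberately tiny stand-in: a single learnable gain a in ŷ = a·v plays the role of the model's learning parameter, synthetic data play the role of the input-output data set, and mean squared error plays the role of the matching degree. None of these choices come from the disclosure.

```python
import numpy as np

rng = np.random.default_rng(1)
v_data = rng.standard_normal(200)   # input variable data (teaching data)
y_data = 1.8 * v_data               # output variable data; the true gain is 1.8

a = 0.0                             # learning parameter of the toy model
for _ in range(500):
    y_hat = a * v_data                               # estimate an output
    grad = 2.0 * np.mean((y_hat - y_data) * v_data)  # evaluate the mismatch
    a -= 0.1 * grad                                  # update the parameter
# a converges to the true gain 1.8
```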
[0019] (6) In the model learning apparatus of the above aspect, the learning portion may learn an equation of state expressed by an expression (13) to an expression (15) obtained by discretizing the expression (2) to the expression (4) by a time step at a discrete time k:

[Math. 28]
$$u_k = \Psi(v_k, d_k); \tag{13}$$
[Math. 29]
$$y_k = \Phi^{-1}(x_k, d_k); \tag{14}$$
[Math. 30]
$$x_{k+1} = A(d_k)\,x_k + B(d_k)\,u_k + c(d_k). \tag{15}$$
[0020] In the model learning apparatus of this aspect, the learning
portion learns the equation of state expressed by the expression
(13) to the expression (15) obtained by discretizing the equation
of state expressed by the expression (2) to the expression (4) by
the time step at the discrete time k. This configuration limits the
numbers of the internal variables x and u and thereby shortens a
time period required for learning the model. Accordingly, this
configuration learns, in a relatively short time, a model that is
capable of establishing a control apparatus configured to determine
an input that improves the correlation of an output to a target
value, while stably controlling the system.
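For a scalar case with the exogenous input held constant, the discretization of the expression (4) into the expression (15) under a zero-order hold on u can be sketched as follows; the coefficient values are illustrative stand-ins, not values from the disclosure.

```python
import numpy as np

Ap, Bp, cp = -1.0, 2.0, 0.5    # stand-ins for A'(d), B'(d), c'(d)
dt = 0.1                       # discretization time step

# Exact zero-order-hold discretization of xdot = Ap*x + Bp*u + cp:
A = np.exp(Ap * dt)
B = (A - 1.0) / Ap * Bp
c = (A - 1.0) / Ap * cp

x = 0.0
u = np.tanh(0.3)               # u_k = Psi(v_k) with a fixed v_k = 0.3
for _ in range(200):           # roll expression (15) forward
    x = A * x + B * u + c
# the discrete fixed point (B*u + c)/(1 - A) equals the continuous
# equilibrium -(Bp*u + cp)/Ap
```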
[0021] (7) According to another aspect of the present disclosure,
there is provided a control apparatus configured to control a
system. This control apparatus comprises the model learning
apparatus described in the above aspect (6); and a determination
portion configured to determine a target value of the input
variable v corresponding to a target value of the output variable y
by using the equation of state learnt by the learning portion. The
determination portion solves an optimal control problem using the
equation of state expressed by the expression (13) to the
expression (15) and learnt by the learning portion. In the control
apparatus of this aspect, the determination portion uses the
equation of state expressed by the expression (13) to the
expression (15) and learnt by the learning portion to solve the
optimal control problem and thereby determine the target value of
the input variable v. Because the expression (15) is a linear
model, the optimal control problem using the expression (13) to
the expression (15) can be regarded as a convex optimization
problem. This configuration allows for determination
of just one optimum value of the input variable v input into the
system. The control apparatus accordingly improves the correlation
of an output from the system to a target value, while stably
controlling the system.
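The convexity claim can be made concrete: with the linear model (15), every predicted state is an affine function of the input sequence, so an unconstrained quadratic tracking objective reduces to a linear least-squares problem with a unique minimizer. The scalar coefficients, horizon, and target below are illustrative assumptions.

```python
import numpy as np

A, B, c = 0.9, 0.2, 0.05        # stand-ins for A(d_k), B(d_k), c(d_k)
N, x0, x_ref = 20, 0.0, 1.0     # horizon, initial state, target output state

# Build x_k = (M @ u)_k + m_k for k = 1..N by the forward recursion of (15);
# the states are affine in the inputs, hence the tracking cost is convex.
M = np.zeros((N, N))
m = np.zeros(N)
prev_row, prev_m = np.zeros(N), x0
for k in range(N):
    row = A * prev_row
    row[k] += B
    M[k], m[k] = row, A * prev_m + c
    prev_row, prev_m = M[k], m[k]

# Minimize sum_k (x_k - x_ref)^2 over the inputs: a least-squares solve
# with a unique solution, since M is lower triangular with nonzero diagonal.
u_opt, *_ = np.linalg.lstsq(M, np.full(N, x_ref) - m, rcond=None)

x = x0
for u in u_opt:                 # replay the optimal inputs through (15)
    x = A * x + B * u + c
```

Adding input bounds or a quadratic input penalty keeps the problem convex; only the solver (a QP instead of least squares) changes.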
[0022] (8) According to another aspect of the present disclosure,
there is provided a model learning method of learning a model that
shows a relationship between an input variable v input into a
system and an output variable y output from the system. This model
learning method comprises a process of obtaining a model used to
learn a nonlinear equation of state for predicting the output
variable y by using the input variable v; and a process of learning
the equation of state by using the model and an input-output data
set including multiple sets of input variable data and output
variable data with respect to the model. The model is an equation
of state including a bijective mapping Ψ that uses the input
variable v as an input thereof and a bijective mapping Φ that
uses the output variable y as an input thereof. In the model
learning method of this aspect, the model obtained by the model
obtaining process is the equation of state including the bijective
mapping Ψ that uses the input variable v input into the system
as the input thereof and the bijective mapping Φ that uses the
output variable y output from the system as the input thereof. This
equation of state is linearized by using the respective mappings
Ψ and Φ as internal variables. This configuration thus
guarantees that even a control problem using a model having a
nonlinear structure gives a unique solution. This allows for
determination of just one optimum value of the input variable v
input into the system. In the case where this model learning method
is applied to a control apparatus configured to control a system,
the control apparatus uses the optimum value of the input variable
v to improve the correlation of an output from the system to a
target value, while stably controlling the system. Accordingly, the
model learning method of this configuration learns a model that is
capable of establishing a control apparatus configured to determine
an input that improves the correlation of an output to a target
value, while stably controlling the system.
[0023] (9) According to another aspect of the present disclosure,
there is provided a computer program that causes an information
processing apparatus to perform learning of a model that shows a
relationship between an input variable v input into a system and an
output variable y output from the system. This computer program
causes the information processing apparatus to perform: a function
of obtaining a model used to learn a nonlinear equation of state
for predicting the output variable y by using the input variable v;
and a function of learning the equation of state by using the model
and an input-output data set including multiple sets of input
variable data and output variable data with respect to the model.
The model is an equation of state including a bijective mapping
Ψ that uses the input variable v as an input thereof and a
bijective mapping Φ that uses the output variable y as an input
thereof. In the computer program of this aspect, the model obtained
by the model obtaining function is the equation of state including
the bijective mapping Ψ that uses the input variable v input
into the system as the input thereof and the bijective mapping
Φ that uses the output variable y output from the system as the
input thereof. This equation of state is linearized by using the
respective mappings Ψ and Φ as internal variables. This
configuration thus guarantees that even a control problem using a
model having a nonlinear structure gives a unique solution. This
allows for determination of just one optimum value of the input
variable v input into the system. In the case where this computer
program is applied to the information processing apparatus of a
control apparatus configured to control a system, the control
apparatus uses the optimum value of the input variable v to improve
the correlation of an output from the system to a target value,
while stably controlling the system. Accordingly, the information
processing apparatus learns a model that is capable of establishing
a control apparatus configured to determine an input that improves
the correlation of an output to a target value, while stably
controlling the system.
[0024] The present disclosure may be implemented by a variety of
aspects: for example, an apparatus and a method of learning a model
of a nonlinear system; an apparatus and a method of estimating a
state by using a model obtained by learning; a system including
these apparatuses; a computer program executed in these apparatuses
and the system; a server apparatus configured to deliver the
computer program; and a non-transitory storage medium configured to
store the computer program therein.
BRIEF DESCRIPTION OF DRAWINGS
[0025] FIG. 1 is a schematic diagram illustrating the configuration
of a model learning apparatus according to a first embodiment;
[0026] FIG. 2 is a flowchart showing a model learning method
according to the first embodiment;
[0027] FIG. 3 is a schematic diagram illustrating the configuration
of a control apparatus according to a second embodiment;
[0028] FIG. 4 is a flowchart showing a predictive control method
according to the second embodiment;
[0029] FIG. 5 is a schematic diagram illustrating one example of a
convex function and a nonconvex function;
[0030] FIG. 6 is a first schematic diagram illustrating results of
calculation in the model learning apparatus; and
[0031] FIG. 7 is a second schematic diagram illustrating results of
calculation in the model learning apparatus.
DESCRIPTION OF EMBODIMENTS
First Embodiment
[0032] FIG. 1 is a schematic diagram illustrating the configuration
of a model learning apparatus 100 according to a first embodiment.
The model learning apparatus 100 of this embodiment is an apparatus
configured to learn a model of a nonlinear system. The "nonlinear
system" herein means a system having such a characteristic that a
relationship between an input parameter and an output parameter
with respect to an arbitrary control object (system) is not
expressed by or is not approximated by a linear expression.
According to this embodiment, a nonlinear equation of state is
illustrated as the "model". More specifically, the model learning
apparatus 100 learns a nonlinear equation of state that predicts an
output variable y of an arbitrary system as a result of control
with an input variable v input into the system by regarding a state
of the system as the output variable y output from the system. The
"equation of state" means an equation that determines the time
derivative ẏ(t) of an output variable by using the output variable
y(t) at a present time t, like "ẏ(t)=f(y(t), . . . )". Hereinafter,
as a matter of convenience of notation, a time derivative of an
arbitrary variable z is expressed as "ż".
[0033] The system includes, for example, an internal combustion
engine, a hybrid engine, a power train or the like. When the system
is a driving engine such as an internal combustion engine, a hybrid
engine, or a power train, the model to be learnt by the model
learning apparatus 100 is a nonlinear equation of state that
indicates a relationship of a variety of parameters relating to
driving of the system, for example, an operation amount of an
actuator of a control object, a disturbance to the control object,
a state of the control object, an output of the control object, and
an output target value of the control object. When an internal
combustion engine mounted on a vehicle is assumed as the system of
the embodiment, the model learning apparatus learns the equation of
state for predicting an output value of the internal combustion
engine, an emission amount of carbon dioxide, and an emission
amount of hydrocarbons, which are output from the internal
combustion engine, as the output variable y, in response to input
of an accelerator position, a speed of the vehicle and an
acceleration of the vehicle as the input variable v. When a hybrid
engine comprised of an internal combustion engine and a motor
mounted on a vehicle is assumed as the system of the embodiment,
the model learning apparatus learns the equation of state for
predicting an output value of the internal combustion engine, an
output value of the motor, a power storage amount of a battery, and
a limiting value of the power storage amount, which are output from
the hybrid engine, as the output variable y, in response to input
of an accelerator position, an operation amount of a brake and an
acceleration of the vehicle as the input variable. In these cases,
the running condition of the vehicle that varies in the course of a
run (for example, whether the vehicle is turning or not or whether
the vehicle is going up an uphill road) is the "initial condition"
described in paragraph [0068].
[0034] The model learning apparatus 100 is configured by, for
example, a personal computer (PC) and includes a CPU 110, a storage
module 120, a ROM/RAM 130, a communication module 140, and an
input-output module 150. The respective components of the model
learning apparatus 100 are connected with each other by means of
buses.
[0035] The CPU 110 includes a controller 111 and a learning module
112. The controller 111 loads a computer program stored in the ROM
130 and expands and executes the computer program on the RAM 130 to
control the respective components of the model learning apparatus
100. The CPU 110 may be one of a plurality of CPUs with a similar
hardware configuration, where each CPU executes the computer
program. The CPU may either include or be a neural processing unit
(NPU) that is specifically designed to accelerate machine learning.
The learning module 112 functions to learn a nonlinear equation of
state for predicting an output variable y that indicates a state of
an arbitrary system (nonlinear system). The learning module 112 may
be a software program such as a machine learning algorithm executed
by the CPU 110. The details of the functions of the learning module
112 will be described later.
[0036] The storage module 120 is a storage medium configured by a
hard disk, a flash memory, a memory card or the like. In other
words, the storage module 120 may be a computer-readable
nonvolatile storage medium. The storage module 120 includes a model
storage portion 121 and a data set storage portion 122. The model
storage portion 121 stores in advance a model that is used to learn
the equation of state by the learning module 112. According to the
embodiment, the model stored in the model storage portion 121 is an
equation of state that includes a bijective mapping Ψ using an
input variable v as an input thereof and a bijective mapping Φ
using the output variable y as an input thereof, and is defined
by Expression (1) given below. The term "bijective" herein means a
state that, when the result of mapping of a set A is a set B,
respective elements of the set A and respective elements of the set
B necessarily have a one-to-one mapping relationship. This is
synonymous with, for example, a state that a bijective function f
assures the presence of a unique inverse function f⁻¹.
$$\dot{y} = \left(\frac{\partial\Phi}{\partial y}\right)^{-1}\left\{A'(d)\,\Phi(y,d) + B'(d)\,\Psi(v,d) + c'(d) - \frac{\partial\Phi}{\partial d}\,\dot{d}\right\} \tag{1}$$
[0037] In the above expression, the left side of the equal sign is
the time derivative of the output variable y, an n-dimensional
vector (where n denotes an integer number). In the right side of
the equal sign, the input variable v is an m-dimensional vector
(where m denotes an integer number), and an exogenous input d is a
p-dimensional vector (where p denotes an integer number) that
indicates an uncontrollable input affecting a variation of the
output variable y. In the right side of the equal sign, the mapping
Ψ is a function that gives an m-dimensional vector by using the
input variable v and the exogenous input d as inputs thereof; the
mapping Φ is a function that gives an n-dimensional vector by using
the output variable y and the exogenous input d as inputs thereof;
and a function A', a function B' and a function c' are respectively
functions that give an n×n matrix, an n×m matrix, and an
n-dimensional vector by using the exogenous input d as an input
thereof.
[0038] The data set storage portion 122 stores in advance an
input-output data set including multiple sets of input variable
data and output variable data with respect to the model expressed
by Expression (1). These sets of the input variable data and the
output variable data are determined in advance by experiment or
calculation with respect to the system. The input-output data set
is used as teaching data, which is employed to learn the equation
of state by the learning module 112. In the description below, in
the input-output data set, a plurality of input variable data may
collectively be referred to as "input variable data set", and a
plurality of output variable data may collectively be referred to
as "output variable data set".
[0039] The communication module 140 controls communication via a
communication interface between the model learning apparatus 100
and another apparatus. Another apparatus is, for example, a control
apparatus configured to control the system, another information
processing apparatus, or a measuring instrument configured to
obtain the input-output data set from the data set storage portion
122. The communication module 140 may include wired communication
circuitry, such as controller area network (CAN) bus circuitry or
Ethernet communication circuitry. The communication module may in
other embodiments include wireless communication circuitry with an
antenna to enable wireless communication by Wi-Fi, LTE, or
Bluetooth. The input-output module 150 serves as various interfaces
used for input and output of information between the model learning
apparatus 100 and users. Examples of the input-output module 150
include a touch panel, a keyboard, a mouse, an operation button,
and a microphone as an input portion and a touch panel, a monitor,
a speaker, and an LED (light emitting diode) indicator as an output
portion.
[0040] FIG. 2 is a flowchart showing a model learning method
according to the first embodiment. The model learning method in the
model learning apparatus 100 is performed, for example, in response
to a user's request, such as activation of a predetermined
application. According to the embodiment, the model learning method
learns (estimates) a function form of a function F expressed by
Expression (16) given below by using a known input-output data set
including an output variable y, an input variable v, an exogenous
input d in a system, a time derivative ẏ of the output variable y,
and a time derivative ḋ of the exogenous input d in the equation of
state shown by Expression (1). In this embodiment, the output
variable y is an n-dimensional vector, the input variable v is an
m-dimensional vector, and the exogenous input d is a p-dimensional
vector.
$$\dot{y}(t) = F(y, v, d, \dot{d}) \tag{16}$$
[0041] The learning module 112 first obtains a model that is stored
in the model storage portion 121 (step S11). More specifically, the
learning module 112 assumes a model for learning the function F as
the equation of state expressed by Expression (1) given below. The
learning module 112 sets each of the values of the respective
variables to zero or a random value in the equation of state
expressed by Expression (1), so as to initialize the respective
variables.
$$\dot{y} = \left(\frac{\partial\Phi}{\partial y}\right)^{-1}\left\{A'(d)\,\Phi(y,d) + B'(d)\,\Psi(v,d) + c'(d) - \frac{\partial\Phi}{\partial d}\,\dot{d}\right\} \tag{1}$$
[0042] According to the embodiment, the learning module 112 defines
the mapping Ψ included in Expression (1) as an internal
variable u expressed by Expression (2) given below and defines the
mapping Φ included in Expression (1) as an internal variable x
expressed by Expression (3) given below. The learning module 112
accordingly learns an equation of state expressed by Expression (4)
given below and obtained by rewriting Expression (1) by using the
internal variables u and x. The advantageous effects of
respectively defining the respective mappings Φ and Ψ
included in the equation of state in Expression (1) as the internal
variables x and u will be described later.
$$u = \Psi(v, d) \tag{2}$$
$$y = \Phi^{-1}(x, d) \tag{3}$$
$$\dot{x} = A'(d)\,x + B'(d)\,u + c'(d) \tag{4}$$
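The effect of this change of variables can be sketched numerically. The hypothetical scalar example below (not from the patent) takes Φ(y) = tanh(y) and Ψ(v) = v with the exogenous input held at zero, so the ∂Φ/∂d term of Expression (1) vanishes; simulating the linear dynamics of Expression (4) in x and mapping back through Φ⁻¹ closely tracks a direct simulation of the nonlinear Expression (1).

```python
import math

# Hypothetical scalar model: Phi(y) = tanh(y), Psi(v) = v, exogenous input d
# frozen at 0 (illustrative values only).
a, b, c = -1.0, 0.5, 0.0           # A'(d), B'(d), c'(d) at d = 0
v = 0.2                            # constant control input, so u = Psi(v) = v
dt, steps = 1e-3, 1000

# (i) Direct explicit-Euler simulation of the nonlinear Expression (1).
y = 0.5
for _ in range(steps):
    y_dot = (a * math.tanh(y) + b * v + c) / (1.0 - math.tanh(y) ** 2)
    y += dt * y_dot

# (ii) Euler simulation of the linear Expression (4) in x, then y = Phi^-1(x).
x = math.tanh(0.5)
for _ in range(steps):
    x += dt * (a * x + b * v + c)
y_from_x = math.atanh(x)

# The two trajectories agree up to discretization error.
print(y, y_from_x)
```

The two state histories differ only by the Euler discretization error, which illustrates why solving the linearized Expression (4) recovers the solution of the nonlinear Expression (1).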
[0043] Furthermore, according to the embodiment, the learning
module 112 employs the concept of a multilayer neural network to
define Expression (5) to Expression (8) given below with respect to
the mapping Ψ:
$$\Psi(v, d) = v_\Psi^{(L_\Psi)} \tag{5}$$
$$v_\Psi^{(i)} = \psi_\Psi^{(i)}(u_\Psi^{(i)}, d) \tag{6}$$
$$u_\Psi^{(i)} = W_\Psi^{(i)}(d)\,v_\Psi^{(i-1)} + b_\Psi^{(i)}(d) \tag{7}$$
$$v_\Psi^{(0)} = v \tag{8}$$
[0044] According to the embodiment, like Expression (5) to
Expression (8) with respect to the mapping Ψ, the learning
module 112 also employs the concept of the multilayer neural
network to define Expression (9) to Expression (12) given below
with respect to the mapping Φ:
$$\Phi(y, d) = y_\Phi^{(L_\Phi)} \tag{9}$$
$$y_\Phi^{(i)} = \phi_\Phi^{(i)}(x_\Phi^{(i)}, d) \tag{10}$$
$$x_\Phi^{(i)} = W_\Phi^{(i)}(d)\,y_\Phi^{(i-1)} + b_\Phi^{(i)}(d) \tag{11}$$
$$y_\Phi^{(0)} = y \tag{12}$$
[0045] where i denotes a layer number in the multilayer neural
network; each of L_Ψ and L_Φ denotes the number of layers in the
multilayer neural network; each of W_Ψ and W_Φ denotes a weight;
each of b_Ψ and b_Φ denotes a bias; and each of ψ_Ψ and φ_Φ is
an activation function and denotes an arbitrary bijective mapping
that gives an output of an identical dimension with the dimension
of an input thereof. Each of the weights W_Ψ and W_Φ, the biases
b_Ψ and b_Φ, and the activation functions ψ_Ψ and φ_Φ may be set
for each layer of the multilayer neural network.
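A sketch may help here. The block below is an illustration, not the patent's implementation: it builds a one-dimensional three-layer mapping of the form of Expressions (5) to (8), using a leaky-ReLU-style bijective activation and nonzero scalar weights (both assumptions chosen for the example). Because every layer is bijective, the whole composition has an exact inverse.

```python
# One-dimensional stack of layers following Expressions (5)-(8):
#   u^(i) = w^(i) * v^(i-1) + b^(i),  v^(i) = act(u^(i)),  Psi(v) = v^(L).

def act(x):
    # Leaky-ReLU-style activation: strictly monotone, hence bijective.
    return x if x >= 0 else 0.1 * x

def act_inv(x):
    return x if x >= 0 else x / 0.1

weights = [2.0, -0.5, 1.5]   # nonzero, so each affine layer is bijective
biases = [0.3, -0.1, 0.2]

def psi(v):
    for w, b in zip(weights, biases):
        v = act(w * v + b)
    return v

def psi_inv(x):
    # Undo the layers in reverse order: invert the activation, then the
    # affine map of each layer.
    for w, b in reversed(list(zip(weights, biases))):
        x = (act_inv(x) - b) / w
    return x

v0 = -1.7
print(psi(v0), psi_inv(psi(v0)))
```

The round trip psi_inv(psi(v)) returns the original input for any v, which is exactly the property the internal variables u and x rely on.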
[0046] The learning module 112 subsequently obtains an input-output
data set [y, v, d, ẏ and ḋ] with respect to the output variable y,
the input variable v, the exogenous input d, the time derivative ẏ
of the output variable y and the time derivative ḋ of the exogenous
input d from the data set storage portion 122 (step S12). According
to the embodiment, the input-output data set [y, v, d, ẏ and ḋ]
includes j sets of the respective data (where j denotes a natural
number and j=1 to N). In the obtained input-output data set,
[y_j, v_j, d_j, ḋ_j] corresponds to the input variable data set,
and [ẏ_j] corresponds to the output variable data set.
[0047] The learning module 112 subsequently gives an input data set
to the model and estimates an output (step S13). More specifically,
the learning module 112 gives the input variable data set [y_j,
v_j, d_j, ḋ_j] obtained at step S12 to the equation of state of
Expression (1) obtained and initialized at step S11. The learning
module 112 accordingly obtains an estimated value of the time
derivative ẏ_j of the output variable (the left side of Expression
(17)). In Expression (17), (∂Φ/∂y)⁻¹ is a function of the output
variable y and the exogenous input d and is thereby evaluable by
substitution of the output variable y_j and the exogenous input
d_j, and (∂Φ/∂d) in the right side of Expression (17) is likewise a
function of the output variable y and the exogenous input d and is
thereby evaluable by substitution of the output variable y_j and
the exogenous input d_j.
$$\hat{\dot{y}}_j = \left(\frac{\partial\Phi}{\partial y}\right)^{-1}\left\{A'(d_j)\,\Phi(y_j, d_j) + B'(d_j)\,\Psi(v_j, d_j) + c'(d_j) - \frac{\partial\Phi}{\partial d}\,\dot{d}_j\right\} \tag{17}$$
[0048] The learning module 112 subsequently evaluates a matching
degree of the estimated output with the output variable data set
(step S14). More specifically, the learning module 112 evaluates
the matching degree of the estimated value of the time derivative
ẏ_j obtained at step S13 with the output variable data set
[ẏ_j] obtained at step S12. The learning module 112 may use,
for example, a mean square error (MSE) shown by Expression (18)
given below, as an index of the matching degree. In the case of
MSE, a smaller value of J in the left side of the equal sign
indicates a higher matching degree. The learning module 112 may
use another index such as a mean absolute error or a cross
entropy to evaluate the matching degree, in place of the mean
square error.
$$J = \frac{1}{N}\sum_{j=1}^{N}\left(\dot{y}_j - \hat{\dot{y}}_j\right)^2 \tag{18}$$
[0049] The learning module 112 subsequently determines whether the
matching degree is sufficient (step S15). For example, in the case
of using MSE of Expression (18), the learning module 112 may
determine that the matching degree is sufficient when the value of
J is equal to or smaller than a predetermined value. According to a
modification, the learning module 112 may determine that the
matching degree is sufficient when a rate of change in the value of
J is equal to or smaller than a predetermined value. The
predetermined value may be determined arbitrarily.
[0050] When the matching degree is not sufficient (step S15: NO),
the learning module 112 proceeds to step S16 to update learning
parameters in the model of Expression (1) defined at step S11, for
example, the function A', the function B', and the function c'
included in Expression (1), and the weights W_Ψ and W_Φ and the
biases b_Ψ and b_Φ included in Expression (5) to Expression (12).
The learning module 112 may, for
example, evaluate a gradient of J with respect to each of the
learning parameters by back propagation and update each learning
parameter based on any of various gradient methods. The learning
module 112 then proceeds to step S13 and repeats the estimation and
the evaluation of the output.
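Steps S13 to S16 form a standard regression loop. The following self-contained sketch is illustrative only: instead of the full Expression (1), it fits a hypothetical scalar linear model ẏ = a·y + b·v to synthetic teaching data, cycling through estimate (S13), evaluate the MSE of Expression (18) (S14, S15) and gradient update (S16) with hand-derived gradients in place of back propagation.

```python
import random

random.seed(0)

# Synthetic teaching data from a hypothetical "true" system y_dot = -2y + v.
a_true, b_true = -2.0, 1.0
data = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
data = [(y, v, a_true * y + b_true * v) for y, v in data]

a, b = 0.0, 0.0   # learning parameters, initialized (step S11)
lr = 0.1
for _ in range(1000):
    # Step S13: estimate y_dot; steps S14/S16: gradient of the MSE J of
    # Expression (18) with respect to each learning parameter.
    grad_a = grad_b = 0.0
    for y, v, y_dot in data:
        err = (a * y + b * v) - y_dot
        grad_a += 2.0 * err * y / len(data)
        grad_b += 2.0 * err * v / len(data)
    a -= lr * grad_a      # step S16: update by a gradient method
    b -= lr * grad_b

J = sum(((a * y + b * v) - y_dot) ** 2 for y, v, y_dot in data) / len(data)
print(a, b, J)
```

Since the model class contains the data-generating system, J is driven essentially to zero and the learning parameters recover the true coefficients; the embodiment's stopping test at step S15 corresponds to checking J against a threshold.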
[0051] When the matching degree is sufficient (step S15: YES), on
the other hand, the learning module 112 terminates the series of
processing. In this case, the learning module 112 may output the
learnt function F to the input-output module 150, may store the
learnt function F in the storage module 120, or may send the learnt
function F to another apparatus via the communication module
140.
[0052] When the model learning apparatus 100 of the embodiment is
provided in combination with a control apparatus configured to
control an operation amount of a system, the model learning
apparatus 100 outputs the function F learnt by the learning module
112 to the control apparatus. The control apparatus uses the output
function F and calculates an input for controlling a future output,
based on an output of the system at a present time. The control
apparatus outputs the calculated input to the system and controls
the system.
[0053] The following describes a reason for ensuring the uniqueness
of a solution in the model (equation of state) learnt by the model
learning method described with reference to FIG. 2. In general,
when a dynamic model that reproduces a transient phenomenon is
established by a neural network (machine learning), there is no
guarantee that the model is stable or, in other words, the model
does not diverge. Expression (4), which is an equivalent
transformation of the equation of state expressed by Expression (1)
described above by using the internal variable x obtained by
converting the output variable y with the mapping Φ, is,
however, a linear differential equation with respect to the
internal variable x. The internal variable u obtained by
converting the input variable v by using the mapping Ψ is
similarly a linear term of the differential equation. The
respective mappings Φ and Ψ are bijective mappings and
accordingly have unique inverse functions. The internal variable x
and the output variable y are convertible to each other, and the
input variable v and the internal variable u are convertible to
each other, so that the solution of nonlinear Expression (1) is
determinable by solving linearized Expression (4). The control
apparatus equipped with the model learning apparatus 100 uses the
model learnt by the model learning method described with reference
to FIG. 2 to improve the correlation of an output from the system
to a target value, while stably controlling the system.
[0054] In the model learning apparatus 100 of the embodiment
described above, the model is the equation of state including the
bijective mapping Ψ that uses the input variable v input into
the system as an input thereof and the bijective mapping Φ that
uses the output variable y output from the system as an input
thereof. This equation of state is linearized by using the
respective mappings Ψ and Φ as the internal variables. This
guarantees that the solution is unique even in a control problem
using a model having a nonlinear structure. This allows for
determination of just one optimum value of the input variable v
input into the system. When this model learning apparatus 100 is
applied to a control apparatus configured to control the system,
this improves the correlation of an output from the system to a
target value, while stably controlling the system, by using the
optimum value of the input variable v. This configuration
accordingly learns a model configured to determine an input that
improves the correlation of an output to a target value, while
stably controlling the system.
[0055] In general, a model learnt by machine learning has a
relatively high non-linearity. An optimal control problem that
causes an output predicted by using this model to appropriately
follow some target is thus likely to become a nonconvex
optimization problem. Accordingly, a solution obtained is likely to
vary significantly, depending on an initial condition set in the
process of solving the problem. This leads to a reliability problem
such as fluctuation of the input and makes it very difficult to
obtain an optimal solution. The model learning apparatus 100 of the
embodiment, on the other hand, guarantees that a solution is unique
and thus enables an optimal control problem corresponding to a
control problem of making the output (state) of the system follow a
target value to become a convex optimization problem. This
configuration guarantees that the solution is an optimal unique
solution, regardless of the initial condition. This improves the
correlation of an output from the system to a target value, while
stably controlling the system.
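The initial-condition sensitivity described above can be demonstrated in a few lines. In the hypothetical example below (chosen for illustration, not taken from the patent), gradient descent on a nonconvex double well f(x) = (x² − 1)² lands in different minima depending on the starting point, while on a convex quadratic it reaches the same unique minimizer from any start.

```python
def descend(grad, x, lr=0.05, steps=2000):
    # Plain gradient descent from initial condition x.
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Nonconvex double well f(x) = (x^2 - 1)^2: gradient 4x(x^2 - 1).
grad_noncvx = lambda x: 4.0 * x * (x * x - 1.0)
# Convex quadratic g(x) = (x - 1)^2: gradient 2(x - 1).
grad_cvx = lambda x: 2.0 * (x - 1.0)

# The nonconvex problem gives different "solutions" from different starts,
# whereas the convex problem gives the same minimizer regardless of start.
print(descend(grad_noncvx, 0.5), descend(grad_noncvx, -0.5))
print(descend(grad_cvx, 5.0), descend(grad_cvx, -5.0))
```

This is the fluctuation problem in miniature: the nonconvex objective returns +1 or −1 depending on initialization, while the convex objective always returns 1.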
[0056] Moreover, in the model learning apparatus 100 of the
embodiment, each of the mappings Ψ and Φ included in the
model of Expression (1) includes the exogenous input d that is an
uncontrollable input affecting a variation of the output variable
y. In the model of Expression (1), a function A'(d) and a function
B'(d) that use the exogenous input d as inputs thereof respectively
work as coefficients of the mappings Φ and Ψ. Additionally,
the model of Expression (1) includes a function c'(d) that uses the
exogenous input d as an input thereof and a time derivative term of
the exogenous input d. This causes the model of Expression (1) to
be an equation of state that takes into account the influence of
the uncontrollable exogenous input d affecting a variation of the
output variable y. Using this model thus enables a future state of
the system to be predicted with high accuracy. Accordingly, this
configuration learns a model that is capable of establishing a
control apparatus configured to control the system with high
accuracy.
[0057] In the model learning apparatus 100 of the embodiment, the
equation of state of Expression (1) is linearized as shown by
Expression (4) by respectively defining the mapping Ψ and the
mapping Φ in the equation of state of Expression (1) as the
internal variable u and the internal variable x. This guarantees
that the equation of state expressed by Expression (1) has a unique
solution. Accordingly, this configuration learns a model that is
capable of establishing a control apparatus configured to determine
an input that improves the correlation of an output to a target
value, while stably controlling the system.
[0058] Furthermore, in the model learning apparatus 100 of the
embodiment, each of the mappings Ψ and Φ is defined by
using the multilayer neural network (Expression (5) to Expression
(12)). Adjusting the weights W_Ψ and W_Φ and the biases b_Ψ and
b_Φ in each layer of the multilayer neural network enables the
output of the system based on the input of the input variable v
calculated by using the model to approach an actual value.
Accordingly, this configuration learns a model
that is capable of establishing a control apparatus configured to
determine an input that further improves the correlation of an
output to a target value.
[0059] In the model learning apparatus 100 of the embodiment, the
learning module 112 evaluates the matching degree of the output,
which is estimated by using the input variable data set included in
the input-output data set, with the output variable data set. The
learning module 112 updates the learning parameters with respect to
the model according to the evaluation of this matching degree and
learns the equation of state. The learning module 112 can thus
learn a nonlinear equation of state according to a learning
procedure using the input-output data set provided in advance as
teaching data. This enables the model to be learnt in accordance
with an actual system. Accordingly, this configuration learns a
model that is capable of establishing a control apparatus
configured to further improve the correlation of an output from
the system to a target value, while more stably controlling
the system.
[0060] Furthermore, in the model learning method of the embodiment,
the model obtained at step S11, which is the model obtaining step,
is the equation of state including the bijective mapping Ψ that
uses the input variable v input into the system as an input thereof
and the bijective mapping Φ that uses the output variable y
output from the system as an input thereof. This equation of state
is linearized by using the respective mappings Ψ and Φ as
the internal variables u and x. This guarantees that the solution
is unique even in a control problem using a model having a
nonlinear structure. This allows for determination of just one
optimum value of the input variable v input into the system. When
this model learning method is applied to a control apparatus
configured to control the system, this improves the correlation of
an output from the system to a target value, while stably
controlling the system, by using the optimum value of the input
variable v. Accordingly, this configuration learns a model that is
capable of establishing a control apparatus configured to determine
an input that improves the correlation of an output to a target
value, while stably controlling the system.
Second Embodiment
[0061] FIG. 3 is a schematic diagram illustrating the configuration
of a control apparatus 200 according to a second embodiment. The
control apparatus 200 of the second embodiment has a CPU 210
including a learning module 212 and a determination module 213.
[0062] The control apparatus 200 may be implemented as an
in-vehicle ECU (electronic control unit). The control apparatus 200
of this embodiment may be used to control a system 300. Like the
first embodiment, the system 300 is, for example, an internal
combustion engine, a hybrid engine, or a power train. The control
apparatus 200 may be configured by a personal computer and may be
used to analyze the system 300.
[0063] The control apparatus 200 includes a CPU 210, a storage
module 120, a ROM/RAM 130, a communication module 140 and an
input-output module 150. The respective components of the control
apparatus 200 are connected with each other by means of buses. At
least part of the functional portions of the control apparatus 200
may be implemented by an ASIC (application specific integrated
circuit).
[0064] The CPU 210 includes a controller 111, the learning module
212 and the determination module 213. Like the controller 111 of
the first embodiment, the controller 111 loads a computer program
stored in the ROM 130 and expands and executes the computer program
on the RAM 130 to control the respective components of the control
apparatus 200. The learning module 212 learns a nonlinear equation
of state for predicting an output variable y that indicates a state
of the system 300 in a predictive control method described later.
The determination module 213 uses the equation of state learnt by
the learning module 212 to determine a target value of an input
variable v corresponding to a target value of the output variable
y.
[0065] FIG. 4 is a flowchart showing a predictive control method
according to the second embodiment. The predictive control method
of the system 300 is performed, for example, in response to a
user's request, such as activation of a predetermined
application.
[0066] The learning module 212 first obtains a model, an objective
function and a constraint function (step S21). More specifically,
the learning module 212 reads a nonlinear equation of state stored
in a model storage portion 121, an objective function J used to
control the system 300 optimally, and a constraint function G.
According to this embodiment, the learning module 212 reads an
equation of state expressed by Expression (13) to Expression (15)
given below and obtained by discretizing Expression (2) to
Expression (4) given above by a predetermined time step .DELTA.t at
a discrete time k.
$$u_k = \Psi(v_k, d_k) \tag{13}$$
$$y_k = \Phi^{-1}(x_k, d_k) \tag{14}$$
$$x_{k+1} = A(d_k)\,x_k + B(d_k)\,u_k + c(d_k) \tag{15}$$
[0067] where A(d_k), B(d_k) and c(d_k) included in
Expression (15) may respectively be expressed as Expression (19) to
Expression (21) given below, for example, by using the function
A'(d), the function B'(d) and the function c'(d) of Expression (2)
to Expression (4) given above:
$$A(d_k) = I + \Delta t\,A'(d_k) \tag{19}$$
$$B(d_k) = \Delta t\,B'(d_k) \tag{20}$$
$$c(d_k) = \Delta t\,c'(d_k) \tag{21}$$
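For a fixed exogenous input, Expressions (19) to (21) amount to the explicit (forward) Euler discretization of Expression (4). A scalar check with illustrative numbers (not from the patent):

```python
dt = 0.01
a_p, b_p, c_p = -2.0, 0.5, 0.1        # A'(d), B'(d), c'(d) frozen at one d

# Expressions (19)-(21): A = I + dt*A', B = dt*B', c = dt*c'.
A, B, C = 1.0 + dt * a_p, dt * b_p, dt * c_p

x, u = 0.7, 0.3
euler_step = x + dt * (a_p * x + b_p * u + c_p)   # explicit Euler on Expression (4)
discrete_step = A * x + B * u + C                  # Expression (15)
print(euler_step, discrete_step)
```

Both expressions produce the same next state, confirming that the discrete dynamics of Expression (15) inherit the linearity of Expression (4).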
[0068] The learning module 212 subsequently determines parameters
of an optimal control problem at a present time (step S22). More
specifically, the learning module 212 sets a present time as a time
k and reads an output variable y_k, a control input v_{k-1},
an exogenous input d_k and a target value y_{kt} obtained
from sensors or the like provided in advance at respective
locations of the system 300. The learning module 212 uses
Expression (13) to Expression (15) to calculate an internal
variable x_k, a target value x_{kt} of the internal variable
x_k and an internal variable u_{k-1}.
[0069] The determination module 213 then reads an initial input
time series for optimization (step S23). More specifically, the
determination module 213 determines initial values of an input time
series u_k, . . . , u_{kf} starting from the discrete time k
as a starting point to a time k_f = k + N (where N denotes a
predetermined natural number).
[0070] The determination module 213 subsequently solves the optimal
control problem (step S24). More specifically, the determination
module 213 solves the optimal control problem shown by Expression
(22) and Expression (23) given below:
    \min_{\{u_\kappa\}_{\kappa=k,\ldots,k_f}} \; J = g(x_k, \ldots, x_{k_f+1}, u_{k-1}, \ldots, u_{k_f}, d_{k-1}, \ldots, d_{k_f}) + \sum_{\kappa=k}^{k_f} (x_{\kappa+1} - x_{(\kappa+1)t})^T Q\, (x_{\kappa+1} - x_{(\kappa+1)t})    (22)

    \text{subject to} \quad G(x_k, \ldots, x_{k_f+1}, u_{k-1}, \ldots, u_{k_f}, d_{k-1}, \ldots, d_{k_f}) \leq 0    (23)
[0071] where x.sub..kappa. (.kappa.=k, . . . , k.sub.f+1) follows
Expression (15); g denotes an arbitrary scalar function that is
convex with respect to x.sub.k, . . . , x.sub.kf+1 and u.sub.k-1, .
. . , u.sub.kf; the constraint function G is an arbitrary vector
function that is convex with respect to x.sub.k, . . . , x.sub.kf+1
and u.sub.k-1, . . . , u.sub.kf; Q denotes an n.times.n positive
definite symmetric matrix; and the target value x.sub.kt is a
target value of x at the discrete time k, converted from the target
value y.sub.kt of the output variable y at the discrete time k by
x.sub.kt=.PHI.(y.sub.kt,d.sub.k).
[0072] The optimal control problem shown by Expression (22) and
Expression (23) determines a time series of u.sub..kappa.
(.kappa.=k, . . . , k.sub.f) that minimizes the objective function
J. In order to decrease Expression (24) included in Expression
(22), the internal variable x.sub..kappa.+1 is required to promptly
follow its target value x.sub.(.kappa.+1)t. Accordingly, the
solution of u.sub..kappa. (.kappa.=k, . . . , k.sub.f) that
minimizes the objective function J including Expression (24)
achieves control of making the internal variable x, and thereby the
output, promptly follow the target value.
    (x_{\kappa+1} - x_{(\kappa+1)t})^T Q\, (x_{\kappa+1} - x_{(\kappa+1)t})    (24)
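The tracking term of Expression (22), i.e. the quadratic penalties of the form of Expression (24) accumulated over the horizon, can be sketched as follows (illustrative values only, not from the patent):

```python
import numpy as np

def tracking_cost(x_traj, x_ref_traj, Q):
    # Sum over the horizon of (x - x_t)^T Q (x - x_t), the term of
    # Expression (24) accumulated in Expression (22)
    return sum(float((x - xt) @ Q @ (x - xt))
               for x, xt in zip(x_traj, x_ref_traj))

Q = np.diag([1.0, 2.0])                        # positive definite, symmetric
x_traj = [np.array([1.0, 0.0]), np.array([0.5, 0.5])]
x_ref  = [np.array([0.0, 0.0]), np.array([0.5, 0.0])]
cost = tracking_cost(x_traj, x_ref, Q)         # 1*1^2 + 2*0.5^2 = 1.5
```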
[0073] The scalar function g is set freely to have an additional
function and may be set, for example, as follows:
    g = \sum_{\kappa=k}^{k_f} \left( u_\kappa^T R\, u_\kappa + (u_\kappa - u_{\kappa-1})^T S\, (u_\kappa - u_{\kappa-1}) \right)    (25)
[0074] where each of R and S denotes a positive definite m.times.m
symmetric matrix. Expression (26) included in Expression (25)
decreases as the internal variable u approaches zero, and
Expression (27) included in Expression (25) decreases as the time
change of the internal variable u decreases. Accordingly, the
solution that minimizes the objective function J makes the internal
variable u as close as possible to zero and minimizes the change of
the internal variable u.
    u_\kappa^T R\, u_\kappa    (26)

    (u_\kappa - u_{\kappa-1})^T S\, (u_\kappa - u_{\kappa-1})    (27)
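A sketch of the scalar function g of Expression (25), with hypothetical one-dimensional weights R and S (not from the patent):

```python
import numpy as np

def g_cost(u_seq, u_prev, R, S):
    # g of Expression (25): the effort term (26) pulls u toward zero,
    # the difference term (27) penalizes the time change of u
    total, last = 0.0, u_prev
    for u in u_seq:
        total += float(u @ R @ u)                    # Expression (26)
        total += float((u - last) @ S @ (u - last))  # Expression (27)
        last = u
    return total

R, S = np.eye(1), 2.0 * np.eye(1)
u_seq = [np.array([1.0]), np.array([1.0])]
g_val = g_cost(u_seq, np.array([0.0]), R, S)  # (1 + 2) + (1 + 0) = 4.0
```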
[0075] Desired constraint conditions may be set for the constraint
function G that is the vector function. For example, the following
conditions may be set for the constraint function G:
    G = \begin{bmatrix} \underline{u}_k - u_k \\ \vdots \\ \underline{u}_{k_f} - u_{k_f} \\ u_k - \overline{u}_k \\ \vdots \\ u_{k_f} - \overline{u}_{k_f} \end{bmatrix}    (28)
[0076] Expression (28) expresses upper limit and lower limit
constraints shown by Expression (29) given below:
    \underline{u}_\kappa \leq u_\kappa \leq \overline{u}_\kappa \quad (\kappa = k, \ldots, k_f)    (29)
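The stacking in Expression (28) can be sketched as follows (hypothetical bounds, not from the patent); requiring G \leq 0 elementwise reproduces Expression (29):

```python
import numpy as np

def box_constraint_G(u_seq, u_low, u_high):
    # Expression (28): lower-limit rows (u_low - u) followed by
    # upper-limit rows (u - u_high); G <= 0 gives Expression (29)
    lower = [lo - u for u, lo in zip(u_seq, u_low)]
    upper = [u - hi for u, hi in zip(u_seq, u_high)]
    return np.concatenate(lower + upper)

u_seq  = [np.array([0.5]), np.array([1.5])]
u_low  = [np.array([0.0]), np.array([0.0])]
u_high = [np.array([1.0]), np.array([1.0])]
G = box_constraint_G(u_seq, u_low, u_high)
# The single positive entry flags the second input exceeding its upper limit
```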
[0077] The determination module 213 solves the problem described
above to determine the internal variable u.sub.k and then uses
Expression (13) given above to determine the target value of the
input variable vi.
[0078] FIG. 5 is a schematic diagram illustrating one example of a
convex function and a nonconvex function. A convex function here
denotes a function that satisfies Expression (30) given below for
any t with 0<t<1 and for any x and y:
    f((1-t)x + ty) \leq (1-t)f(x) + t f(y)    (30)
[0079] Intuitively, a function in such a shape as shown in FIG.
5(a) is a convex function, and a function in such a shape as shown
in FIG. 5(b) is a nonconvex function. In the case of the convex
function, a unique optimum value (a minimum value L0 in the case of
FIG. 5(a)) can be determined. In the case of the nonconvex
function, however, as shown in FIG. 5(b), there are a plurality of
local minimum values (values L1, L2, L3, L4, L5 and L6 in the case
of FIG. 5(b)), so that an optimum value is not necessarily
determined.
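As a purely illustrative numerical companion to FIG. 5 (not from the patent), Expression (30) can be sampled along a segment; a single sampled violation proves nonconvexity on that segment, while passing the sampled check does not prove convexity:

```python
import numpy as np

def violates_convexity(f, x, y, ts):
    # Test Expression (30): f((1-t)x + t y) <= (1-t) f(x) + t f(y)
    # at sampled t values; True means the inequality fails somewhere
    return any(f((1 - t) * x + t * y) > (1 - t) * f(x) + t * f(y) + 1e-12
               for t in ts)

ts = np.linspace(0.01, 0.99, 99)
convex_f = lambda x: x ** 2            # convex: Expression (30) always holds
nonconvex_f = lambda x: np.sin(5 * x)  # multiple local minima, as in FIG. 5(b)
```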
[0080] At step S24, the determination module 213 solves the optimal
control problem of Expressions (22) and (23) by using the initial
values determined at step S23 under the conditions determined at
step S22. This problem may be solved by using, for example, a
mathematical programming method, such as a sequential quadratic
programming method.
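As one concrete possibility (not prescribed by the patent), SciPy's SLSQP implementation of sequential quadratic programming can solve a toy instance of Expressions (22) and (23); the one-state model and all numerical values below are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical scalar instance of Expression (15): x_{k+1} = A x_k + B u_k + c
A, B, c = 0.9, 0.5, 0.0
x0, x_target, N = 0.0, 1.0, 5
Q, R = 10.0, 0.1

def rollout(u_seq):
    # Simulate Expression (15) over the horizon
    x, traj = x0, []
    for u in u_seq:
        x = A * x + B * u + c
        traj.append(x)
    return traj

def J(u_seq):
    # Tracking term of Expression (22) plus an effort term as in (25)/(26)
    return sum(Q * (x - x_target) ** 2 for x in rollout(u_seq)) \
         + sum(R * u ** 2 for u in u_seq)

res = minimize(J, np.zeros(N), method="SLSQP",
               bounds=[(-2.0, 2.0)] * N)  # box constraints, Expression (29)
u_opt = res.x                             # input time series u_k, ..., u_kf
```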
[0081] The controller 111 then reflects the obtained solution as an
input into the system 300 (step S25). More specifically, the
controller 111 converts the optimal solution u.sub.k, . . . ,
u.sub.kf obtained at step S24 into v.sub.k, . . . , v.sub.kf by
using the mapping .psi. of Expression (13) and specifies v.sub.k in
the converted optimal solution as the actual control input.
[0082] The controller 111 subsequently determines whether the
control is to be terminated or not (step S26). More specifically,
the controller 111 determines whether the control is to be
terminated or not, according to the state of reception of an
external signal to terminate the control. When the controller 111
receives the external signal, the controller 111 outputs the
predicted control input v.sub.k to outside and terminates a current
cycle of the control process. The predicted control input v.sub.k
may be output to the input-output module 150, may be stored in the
storage module 120 or may be sent to another apparatus, for
example, a caller ECU, via the communication module 140. When the
controller 111 does not receive the external signal, the controller
111 proceeds to step S27.
[0083] When the controller 111 does not receive the external signal
at step S26, the controller 111 advances the time (step S27). After
advancing the time, the controller 111 returns to step S22. The
processing of steps S22 to S25 is then repeated, and the controller
111 determines again whether the controller 111 receives the
external signal to terminate the control at step S26.
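The cycle of steps S22 to S27 can be sketched as a receding-horizon loop. This is not from the patent: the scalar plant and the one-step "solver" below are hypothetical stand-ins for solving Expressions (22) and (23) at each time k:

```python
import numpy as np

A, B, c = 0.9, 0.5, 0.0     # hypothetical scalar plant, Expression (15)

def solve_ocp(x, x_target):
    # Placeholder for step S24: a clipped one-step deadbeat input,
    # standing in for the full optimal control problem
    return float(np.clip((x_target - A * x - c) / B, -2.0, 2.0))

x, x_target, history = 0.0, 1.0, []
for k in range(20):              # step S27: advance time, return to S22
    u = solve_ocp(x, x_target)   # steps S22-S24: set up and solve
    x = A * x + B * u + c        # step S25: apply the input to the system
    history.append(x)
# the loop would exit on an external stop signal (step S26)
```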
[0084] FIG. 6 is a first schematic diagram illustrating results of
calculation in the model learning apparatus 100. The following
describes the results of calculation to predict an input from an
output of a virtual system by using the model learning apparatus
100 of the first embodiment. FIG. 6 shows, as the results of this
calculation, time changes of a plurality of outputs in the virtual
system. FIG. 6 illustrates time changes of four outputs
("output 1", "output 2", "output 3" and "output 4") by solid line
curves OP1, OP2, OP3 and OP4. Among the four outputs, the output 1,
the output 2 and the output 3 are different types of outputs, and
target values are set in the respective outputs (as shown by dotted
line curves Do1, Do2 and Do3 of the output 1, the output 2 and the
output 3). With respect to the output 4, an upper limit constraint
is shown by a dotted line curve Do4.
[0085] FIG. 7 is a second schematic diagram illustrating results of
calculation in two model learning apparatuses. FIG. 7 shows the
results of calculation of inputs to cause the four outputs shown in
FIG. 6 to be output from the virtual system. FIG. 7 illustrates
time changes of three inputs ("input 1", "input 2" and "input 3")
calculated by using the model learning apparatus of the embodiment,
inside of an encirclement of one-dot chain line. FIG. 7 also
illustrates time changes of the three inputs calculated by using a
model learning apparatus of a comparative example, inside of an
encirclement of two-dot chain line. Unlike in the model learning
apparatus of the embodiment, in the model learning apparatus of the
comparative example, a bijective mapping is not employed in the
model for a mapping that uses an input variable and an output
variable as inputs thereof.
[0086] The input 1 to the input 3 shown in FIG. 7 are the results
of calculation under a plurality of different initial conditions
with respect to the four outputs shown in FIG. 6. In the model
learning apparatus of the comparative example, the different
initial conditions cause fluctuations of the values of the input 1
to the input 3. The input 2, for example, is unstable and
fluctuates. Accordingly, it is difficult for a prediction process
of the comparative example to determine just one input that
achieves the output 1 to the output 4. In the model learning
apparatus of the embodiment, on the other hand, even the different
initial conditions do not cause fluctuations of the values of the
input 1 to the input 3. Accordingly, the model learning apparatus
of the embodiment allows for determination of just one input and
thereby stabilizes the input.
[0087] In the control apparatus 200 of the embodiment described
above, the model obtained by the learning module 212 is the
equation of state including the bijective mapping .psi. that uses
the input variable v input into the system 300 as an input thereof
and the bijective mapping .PHI. that uses the output variable y
output from the system 300 as an input thereof. This equation of
state is linearized by using the outputs of the mappings .psi. and
.PHI. as the internal variables u and x. This guarantees that the
solution is unique
even in a control problem using a model having a nonlinear
structure. Accordingly, this configuration learns a model that is
capable of establishing a control apparatus configured to determine
an input that improves the correlation of an output to a target
value, while stably controlling the system 300.
[0088] In the control apparatus 200 of the embodiment, the learning
module 212 learns the equation of state expressed by Expression
(13) to Expression (15) obtained by discretizing Expressions (2) to
(4) by the time step at the discrete time k. This configuration
limits the numbers of the internal variables x and u and thereby
shortens a time period required for learning the model.
Accordingly, this configuration learns, in a relatively short time,
a model that is capable of establishing a control apparatus
configured to determine an input that improves the correlation of
an output to a target value, while stably controlling the system
300.
[0089] In the control apparatus 200 of the embodiment, the
determination module 213 uses the equation of state expressed by
Expressions (13) to (15) and learnt by the learning module 212 to
solve the optimal control problem shown by Expression (22) and
Expression (23) and thereby determine the input variable v. This
causes the optimal control problem to become a control problem with
respect to a linear model and causes the optimal control problem
using Expressions (13) to (15) to become a convex optimization
problem. Accordingly, this configuration allows for determination
of just one optimum value of the input variable v input into the
system 300. The control apparatus thus improves the correlation of
an output from the system 300 to a target value, while stably
controlling the system 300.
<Modifications of Embodiments>
[0090] The present disclosure is not limited to the embodiments
described above but may be implemented by a variety of other
aspects without departing from the scope of the disclosure. Some
examples of possible modification are given below. In the above
embodiments, part of the configuration implemented by hardware may
be replaced by software. On the contrary, part of the configuration
implemented by software may be replaced by hardware.
[Modification 1]
[0091] The above embodiments illustrate the examples of the
configuration of the model learning apparatus and the configuration
of the control apparatus provided with the model learning
apparatus. The configuration of the model learning apparatus and
the configuration of the control apparatus may, however, be
modified in various ways and are not limited to the configurations
of these embodiments. For example, at least one of the model
learning apparatus and the control apparatus may be configured by
cooperation of a plurality of information processing apparatuses
(including a server apparatus and an in-vehicle ECU) located on a
network.
[Modification 2]
[0092] The above embodiments illustrate the examples of the
procedures of the model learning method (shown in FIG. 2) and the
predictive control method (shown in FIG. 4). The procedures of
these methods may, however, be modified in various ways and are not
limited to the procedures of these embodiments. For example, part
of the steps may be omitted, or other steps that are not described
herein may be added. The sequence of execution of part of the steps
may also be changed.
[Modification 3]
[0093] According to the first embodiment, the equation of state is
defined by Expression (1), and the respective mappings .psi. and
.PHI. included in Expression (1) are defined by the respective
internal variables u and x shown by Expressions (2) and (3). These
definitions of the respective mappings .psi. and .PHI. are,
however, only illustrative, and the mappings .psi. and .PHI. may be
defined in any forms. A mapping that uses as its inputs, in
addition to the internal variable, an uncontrollable exogenous
input d that affects variation in the output variable y provides a
model configured to predict a future state of the system with high
accuracy.
[Modification 4]
[0094] According to the first embodiment, the learning module 112
uses the matching degree to learn the model at step S14 in the
model learning method (shown in FIG. 2). According to a
modification, however, the learning module 112 may determine
whether constraint conditions are satisfied, in addition to the
evaluation for the matching degree. For example, the constraint
conditions may respectively be set for the function A'(d), the
function B'(d) and the function c'(d) included in the equation of
state of Expression (1).
[Modification 5]
[0095] According to the first embodiment, the mapping .psi., the
mapping .PHI., the function A'(d), the function B'(d) and the
function c'(d) are output in response to input of the exogenous
input d. According to a modification, however, the outputs of the
mapping .psi., the mapping .PHI., the function A'(d), the function
B'(d) and the function c'(d) may not be changed, depending on the
exogenous input d.
[Modification 6]
[0096] According to the second embodiment, the learning module 212
solves the optimal control problem by using the discretized
equation of state expressed by Expression (13) to Expression (15)
converted from Expression (2) to Expression (4). According to a
modification, however, the learning module 212 may solve the
optimal control problem without discretizing the equation of state.
Solving the optimal control problem by using the equation of state
converted to Expression (13) to Expression (15) limits the numbers
of the internal variables x and u and thus relatively shortens the
time period required for learning the model.
[0097] The aspects of the present disclosure are described above,
based on the embodiments and the modifications. The embodiments and
the modifications described above are, however, presented to
facilitate understanding of the present disclosure and are not at
all intended to limit the present disclosure. The aspects of the
present disclosure may be changed, altered, modified or improved
without departing from the subject matter or the scope of the
present disclosure and include equivalents thereof. Furthermore,
any of the technical features may be omitted appropriately unless
it is described as essential in the description hereof.
REFERENCE SIGNS LIST
[0098] 100 model learning apparatus
[0099] 110, 210 CPU
[0100] 111 controller
[0101] 112, 212 learning module
[0102] 120 storage module
[0103] 121 model storage portion
[0104] 122 data set storage portion
[0105] 130 ROM/RAM
[0106] 140 communication module
[0107] 150 input-output module
[0108] 200 control apparatus
[0109] 213 determination module
[0110] 300 system
* * * * *