U.S. patent application number 10/393078 was published by the patent office on 2003-11-13 for optimized artificial neural networks. This patent application is currently assigned to Koninklijke Philips Electronics N.V. Invention is credited to Caruana, Richard A., Eshelman, Larry J., and Schaffer, J. David.
United States Patent Application: 20030212645
Kind Code: A1
Schaffer, J. David; et al.
November 13, 2003
Optimized artificial neural networks
Abstract
Neural network architectures are represented by symbol strings.
An initial population of networks is trained and evaluated. The
strings representing the fittest networks are modified according to
a genetic algorithm and the process is repeated until an optimized
network is produced.
Inventors: Schaffer, J. David (Wappingers Falls, NY); Eshelman, Larry J. (Ossining, NY); Caruana, Richard A. (Ridgefield, CT)
Correspondence Address: Corporate Patent Counsel, U.S. Philips Corporation, 580 White Plains Road, Tarrytown, NY 10591, US
Assignee: Koninklijke Philips Electronics N.V.
Family ID: 23391506
Appl. No.: 10/393078
Filed: March 20, 2003
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
10/393078 | Mar 20, 2003 |
09/578428 | May 25, 2000 | 6601053
08/484695 | Jun 7, 1995 |
08/212373 | Mar 10, 1994 |
07/714320 | Jun 10, 1991 |
07/354004 | May 19, 1989 |
Current U.S. Class: 706/25; 706/13; 706/26
Current CPC Class: G06N 3/086 20130101
Class at Publication: 706/25; 706/26; 706/13
International Class: G06G 007/00; G06E 003/00; G06E 001/00; G06F 015/18; G06N 003/08; G06N 003/12; G06N 003/00
Claims
What is claimed is:
1. A method of optimizing the structure of an artificial neural
network, comprising: defining a neural network having an initial
architecture comprised of a plurality of neurons; defining a symbol
string representing the architecture of the neural network;
providing a set of neural network inputs including a set for
training and a set for evaluation; training the neural network
using the training set of inputs; evaluating the trained neural
network using the evaluation set of inputs; modifying the symbol
string representation of the neural network architecture according
to a genetic algorithm; successively training and evaluating the
neural networks represented by the modified symbol strings and
modifying the symbol strings representing improved neural
networks.
2. A method according to claim 1, further comprising the step of selecting the fittest of the evaluated trained networks according to a defined criterion; and carrying out the step of modifying according to a genetic algorithm on the selected fittest networks.
3. A method according to claim 2, wherein the step of training the
neural network is carried out by supervised learning.
4. A method according to claim 1, wherein the step of training the
neural network is carried out by supervised learning.
5. A method according to claim 2, wherein the symbol string representation of the neural network architecture represents the number of layers of hidden neurons, and the number of neurons within each hidden layer of the network.
6. A method according to claim 5, wherein the network is trained by
back propagation, and the symbol string representation of the
neural network architecture further represents the back propagation
parameters of learning rate, momentum and dispersion of initial
link weights.
7. An artificial neural network optimized according to claim 1.
8. An artificial neural network optimized according to claim 2.
9. An artificial neural network optimized according to claim 3.
10. An artificial neural network optimized according to claim
4.
11. An artificial neural network optimized according to claim
5.
12. An artificial neural network optimized according to claim
6.
13. A method of optimizing the structure of an artificial neural
network, comprising: defining a neural network having an initial
architecture comprised of a plurality of input neurons, output
neurons and hidden neurons, and a plurality of signal transmission
paths for applying output signals from said input neurons to said
hidden neurons and for applying output signals from said hidden
neurons to said output neurons; defining a symbol string
representing the architecture of the neural network; providing a
set of neural net input-output pairs including a set for training
and a set for evaluation; training the neural network by supervised
learning using the training set of input-output pairs; evaluating
the trained neural network using the evaluation set of input-output
pairs; modifying the symbol string representation of the neural
network architecture according to a genetic algorithm; successively
training and evaluating the neural networks represented by the
modified symbol strings and modifying the symbol strings
representing improved neural networks.
14. A method according to claim 13, wherein the step of training by
supervised learning is carried out by a back propagation
algorithm.
15. A method according to claim 14, wherein the symbol string
representation of the neural network architecture represents the
number of intermediate neurons.
16. A method according to claim 15, wherein the symbol string
representation of the neural network architecture further
represents the back propagation parameters of learning rate,
momentum and dispersion of initial link weights.
17. An artificial neural network optimized according to claim
13.
18. An artificial neural network optimized according to claim
14.
19. An artificial neural network optimized according to claim
15.
20. An artificial neural network optimized according to claim
16.
21. An optimized artificial neural network, comprising: a plurality
of input neurons for receiving network input signals and for
developing output signals in response thereto; a plurality of
output neurons receptive of signals for developing network output
signals; a plurality of hidden neurons which receive input signals
and develop output signals; signal transmission means comprised of
a plurality of signal paths for applying output signals from said
input neurons to said hidden neurons and for applying output
signals from said hidden neurons to said output neurons; and the
number of hidden neurons, the neuron threshold functions and the
signal path weights having values optimized by supervised learning
and network modification by a genetic algorithm.
22. A neural network according to claim 21, wherein one hidden
neuron layer has only one neuron.
Description
BACKGROUND OF THE INVENTION
[0001] The present invention relates to artificial neural networks,
and more particularly artificial neural networks having an
architecture optimized by the use of a genetic algorithm.
[0002] The term artificial neural network is used herein to
describe a highly connected network of artificial neurons. For
simplicity, the modifier "artificial" will usually be omitted.
[0003] Artificial neurons themselves have simple, easily described behavior. They are threshold circuits that receive input signals and develop one or more output signals. The input-output
relationship of a neuron is determined by a neuron activation or
threshold function and the sum of the input signals. The activation
function may be a simple step function, a sigmoid function or some
other monotonically increasing function.
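As an illustrative sketch (not part of the patent disclosure), the two activation functions mentioned above, a step function and a sigmoid, can be written as follows; the function names are chosen here for clarity only.

```python
import math

def step_activation(x, threshold=0.0):
    # Step function: the output switches from 0 to 1 once the summed
    # input signal crosses the neuron's threshold.
    return 1.0 if x >= threshold else 0.0

def sigmoid_activation(x):
    # Sigmoid: a smooth, monotonically increasing function bounded
    # in (0, 1), usable in place of the hard step.
    return 1.0 / (1.0 + math.exp(-x))
```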
[0004] Neurons are combined in highly connected networks by signal
transmission paths to form neural networks. The signal transmission
paths have weights associated with them, so that a signal applied
to a neuron has a signal strength equal to the product of the
signal applied to the signal path and the weight of that signal
path. Consequently the signals received by the neurons are weighted
sums determined by the weight values of the signal transmission
paths and the applied signal values.
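The weighted sum described above can be sketched in a few lines; this is a generic illustration, not code from the patent.

```python
def weighted_input(signals, weights):
    # The signal reaching a neuron is the sum, over its incoming
    # transmission paths, of (signal on the path) * (path weight).
    return sum(s * w for s, w in zip(signals, weights))
```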
[0005] The interconnectivity of the neurons in a neural network
gives rise to behavior substantially more complex than that of
individual neurons. This complex behavior is determined by which
neurons have signal transmission paths connecting them, and the
respective values of the signal transmission path weights. Desired
network behavior can be obtained by the appropriate selection of
network topology and weight values. The process of selecting weight
values to obtain a particular network characteristic is called
training. Different neural network architectures and techniques for
training them are described in Parallel Distributed Processing, Vol. 1, D. E. Rumelhart, J. L. McClelland and the PDP Research Group, Editors, MIT Press, 1986.
[0006] Properly trained neural networks exhibit interesting and
useful properties, such as pattern recognition functions. A neural
network having the correct architecture and properly trained will
possess the ability to generalize. For example, if an input signal
is corrupted by noise, the application of the noisy input signal to
a neural network trained to recognize the input signal will cause
it to generate the appropriate output signal. Similarly, if the set
of training signals has shared properties, the application of an
input signal not belonging to the training set, but having the
shared properties, will cause the network to generate the
appropriate output signal. This ability to generalize has been a
factor in the interest and tremendous activity in neural network
research that is now going on.
[0007] Trained neural networks having an inappropriate architecture
for a particular problem do not always correctly generalize after
being trained. They can exhibit an "over training" condition in
which the input signals used for training will cause the network to
generate the appropriate output signals, but an input signal not
used for training, and having a shared property with the training
set, will not cause the appropriate output signal to be generated.
The emergent property of generalization is lost by over
training.
[0008] It is an object of the invention to optimize the
architecture of a neural network so that over training will not
occur, and yet have a network architecture such that the trained
network will exhibit the desired emergent property.
SUMMARY OF THE INVENTION
[0009] According to the invention a neural network is defined, and
its architecture is represented by a symbol string. A set of
input-output pairs for the network is provided, and the
input-output pairs are divided into a training set and an
evaluation set. The initially defined network is trained with the
training set, and then evaluated with the evaluation set. The best
performing networks are selected.
[0010] The symbol strings representing the selected network
architectures are modified according to a genetic algorithm to
generate new symbol strings representing new neural network
architectures. These new neural network architectures are then
trained by the training set, evaluated by the evaluation set, and
the best performing networks are again selected. Symbol strings
representative of improved networks are again modified according to
the genetic algorithm and the process is continued until a
sufficiently optimized network architecture is realized.
BRIEF DESCRIPTION OF THE DRAWING
[0011] The method and network architecture according to the
invention is more fully described below in conjunction with the
accompanying drawing in which:
[0012] FIG. 1 illustrates the architecture of one kind of neural
network having one hidden layer of neurons;
[0013] FIG. 2 illustrates the operation of a genetic algorithm;
[0014] FIGS. 3A-3C show how a neural network architecture is
represented by a symbol string and how a genetic algorithm
recombination operator changes the symbol string;
[0015] FIG. 4 illustrates the sequence of steps of the method
according to the invention; and
[0016] FIG. 5 illustrates an optimized neural network architecture
realized according to the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0017] FIG. 1 illustrates a neural network of the feed forward
type. The network is comprised of a layer 10 of input neurons
11-14, a hidden layer 20 of neurons 21-24 and a layer 30 of output
neurons 31-34. In the network shown the input neurons are connected
to the intermediate neurons by signal transmission paths 41, 42 . .
. . , and the intermediate neurons are connected to the output neurons by signal transmission paths 51, 52, . . . A more general form of
feedforward network could also include signal transmission paths
from the input neurons directly to the output neurons. For clarity,
some of the possible signal transmission paths have been omitted
from the drawing.
[0018] Initially, the neuron thresholds and the weights of the
signal transmission paths are set to some random values. After
training, the neuron thresholds and the path weights will have
values such that the network will exhibit the performance for which
it was trained.
[0019] The underlying theory of genetic algorithms is described in
J. H. Holland, Adaptation in Natural and Artificial Systems, Univ.
of Michigan Press, 1975. In carrying out the present invention, a
symbol string is used to represent a neural network architecture. A
genetic algorithm operates on the symbol strings, and by changing
them it changes the architecture of the networks which they
represent. The most frequently used symbols are binary, i.e. 0 and
1, but the algorithm is not restricted to binary symbols.
[0020] FIG. 2 shows the steps in the genetic algorithm. An initial
population P(i=0) of symbol strings representing an initial
population of neural networks is defined. The population is
evaluated according to some criterion and then the fittest members,
i.e. those closest to the evaluation criterion, are selected. If the fitness of the selected members meets the criterion, the algorithm stops. Otherwise, genetic recombination operators are applied to the symbol strings representing the population to create a new population P(i=i+1). The steps are repeated until the selection criterion is met and the algorithm halts.
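The loop of FIG. 2 can be sketched generically as below. This is an illustrative skeleton only: the selection scheme (keeping the top half as parents) and the population-refill strategy are assumptions, not details taken from the patent.

```python
import random

def genetic_algorithm(init_population, fitness, recombine, mutate,
                      target_fitness, max_generations=100):
    # Evaluate the population, select the fittest, stop if the
    # criterion is met, otherwise recombine and mutate to form a new
    # population, and repeat.
    population = init_population
    for _ in range(max_generations):
        scored = sorted(population, key=fitness, reverse=True)
        if fitness(scored[0]) >= target_fitness:
            return scored[0]
        # Assumed selection scheme: keep the top half as parents and
        # refill the population with their offspring.
        parents = scored[:len(scored) // 2]
        offspring = []
        while len(parents) + len(offspring) < len(population):
            a, b = random.sample(parents, 2)
            offspring.append(mutate(recombine(a, b)))
        population = parents + offspring
    return max(population, key=fitness)
```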
[0021] FIGS. 3A-3C illustrate how a neural network architecture is
represented by a symbol string, and how a recombination operator of
the genetic algorithm operates on the strings to change the
population of neural networks represented by the strings. The field
of genetic algorithms has adopted biological terminology, and that
terminology will be used in the following discussion.
[0022] In FIG. 3A various neural network parameters are mapped into
a binary sequence. The entire binary sequence is referred to as a
chromosome, and the particular substrings of the chromosome into
which the network parameters are mapped are referred to as genes.
The network parameters represented by the genes, and the mapping or
representation used for each gene are discussed below in connection
with the example given. For now, it is sufficient to understand
that the chromosome represents a network having an architecture
with the parameters (and the parameter values) represented by the
genes. Different chromosomes represent different network
architectures.
[0023] FIG. 3B illustrates a pair of chromosomes to which the
genetic recombination operator will be applied. These two
chromosomes are referred to as parents, and each represents a
different network architecture.
[0024] An arbitrary position along the two chromosome strings
called the crossover point is selected. The parent chromosomes are
severed at the crossover point, and the resulting substrings are
recombined in the following manner. The first substring of the
first parent chromosome and the second substring of the second
parent chromosome are combined to form a new chromosome, called an
offspring. Likewise the first substring of the second parent
chromosome and the second substring of the first parent chromosome
are combined to form another offspring.
[0025] The offspring chromosomes are shown in FIG. 3C. The position
where the substrings of the severed parent chromosomes were joined
to form the offspring, i.e. the crossover point, is marked with a
colon. The colon is not part of the chromosome string but is merely
a marker to show where the parents were severed and the resulting
substrings recombined to form the offspring. It will be appreciated
that the repeated application of this recombination operator to a
pair of chromosomes and their offspring will generate a tremendous
number of different chromosomes and corresponding neural network
architectures. Additionally, the genetic algorithm includes a mutation operator which will cause random bit changes in the chromosomes.
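The one-point crossover of FIGS. 3B-3C and the mutation operator can be sketched as below; the per-bit mutation rate is a hypothetical parameter, not a value given in the patent.

```python
import random

def crossover(parent_a, parent_b):
    # Sever both chromosomes at a random crossover point and swap the
    # tails, producing two offspring (FIGS. 3B-3C).
    point = random.randint(1, len(parent_a) - 1)
    child1 = parent_a[:point] + parent_b[point:]
    child2 = parent_b[:point] + parent_a[point:]
    return child1, child2

def mutate(chromosome, rate=0.01):
    # Flip each bit independently with a small probability (the rate
    # here is an illustrative choice).
    return [bit ^ 1 if random.random() < rate else bit
            for bit in chromosome]
```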
[0026] The method according to the invention can be understood with
reference to FIG. 4. An initial population of networks is trained
by a training set of input-output pairs, and the trained network is
evaluated by an evaluation set of input-output pairs. Networks are
selected for reproduction stochastically according to their
performance evaluation relative to the population average. The
chromosomes representing the network architectures of the selected
networks are then modified by the genetic algorithm to create
offspring representing new network architectures, and networks
having the new architectures. The networks having the new
architecture are then trained and evaluated. The best performing
networks are again modified according to the genetic algorithm, and
the process is continued until some specified performance criterion is met.
EXAMPLE
[0027] An example of the invention was carried out by digital
computer simulation. The task to be learned by the neural network
was a pattern discrimination learning task called the minimum
interesting coding problem. The input to the neural network was
four binary signals in which the first two are noise having no
relation to the output pattern. The next two represented a binary
power-of-two coded integer and the output signals were to be the
Gray coding of the input signal.
[0028] Network training was by the back propagation method. For a complete description of back propagation see Parallel Distributed Processing, Vol. 1, supra, pp. 318-362. The network architecture was represented by a sixteen bit binary string, as shown in FIG. 3A. The first two bits represented the back propagation learning rate (η); the next two, the back propagation momentum (α); and the next two, the range of initial path weights (W). Two sets
of five bits followed, each for representing a hidden layer. Flag
bit F1 indicates whether the first hidden layer is present, and the
remaining four bits N1 represented the number of neurons in that
layer. Similarly, flag bit F2 indicates whether the second hidden
layer is present, and the next four bits represent the number of
neurons N2 in the second hidden layer. Thus, the representation
could produce nets as large as having two hidden layers of sixteen
neurons each and as small as no hidden layers. The particular
representations used for the network parameters are as follows.
[0029] Learning rate η=1/2^n, where n=1+Gray code value of the chromosome=1, 2, 3, 4. Thus, η=(0.5, 0.25, 0.125, 0.0625).
[0030] Momentum α=(1-(n/10)), where n=1+Gray code value of the chromosome=1, 2, 3, 4. Thus α=(0.9, 0.8, 0.7, 0.6).
[0031] Weight W=2/2^n, where n=1+Gray code value of the chromosome=1, 2, 3, 4. Thus, W=(1.0, 0.5, 0.25, 0.125). Alternatively W=2/2^n - constant.
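The decoding of the sixteen-bit chromosome can be sketched as follows. The field order (η, α, W, then two five-bit hidden-layer fields) follows the text, but the exact bit layout and the sub-encoding of the four-bit neuron counts (assumed here to be plain binary, offset by 1 to span 1..16) are assumptions not spelled out in the patent.

```python
def gray_to_int(bits):
    # Decode a reflected (Gray) binary code, most significant bit first.
    value, acc = 0, 0
    for g in bits:
        acc ^= g
        value = (value << 1) | acc
    return value

def decode_chromosome(bits):
    # Hypothetical decoding of the 16-bit string of FIG. 3A.
    assert len(bits) == 16
    n_eta = 1 + gray_to_int(bits[0:2])
    n_alpha = 1 + gray_to_int(bits[2:4])
    n_w = 1 + gray_to_int(bits[4:6])
    eta = 1.0 / 2 ** n_eta        # 0.5, 0.25, 0.125, 0.0625
    alpha = 1.0 - n_alpha / 10.0  # 0.9, 0.8, 0.7, 0.6
    w = 2.0 / 2 ** n_w            # 1.0, 0.5, 0.25, 0.125
    layers = []
    for start in (6, 11):         # flag bit, then 4 neuron-count bits
        if bits[start]:
            count_bits = bits[start + 1:start + 5]
            # Assumed sub-encoding: plain binary plus 1 (1..16 neurons).
            layers.append(1 + int("".join(map(str, count_bits)), 2))
    return eta, alpha, w, layers
```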
[0032] The activation function of the neurons was a simple sigmoid
function and is not critical. The number of input nodes was fixed
at four and the number of output nodes was fixed at two, in order
to match the problem.
[0033] The input-output pairs of the network are shown in the
following Table I.
TABLE I: The Minimum Interesting Coding Problem

input | output
0000 | 00
1100 | 00
1001 | 01
1101 | 01
0010 | 11
0110 | 11
0011 | 10
1011 | 10
0100 | 00
1000 | 00
0001 | 01
0101 | 01
1010 | 11
1110 | 11
0111 | 10
1111 | 10
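The pairs of Table I follow mechanically from the problem statement: the last two input bits encode an integer whose 2-bit Gray coding (n XOR n>>1) is the required output, while the first two bits are noise. A short sketch reproducing the table:

```python
def gray2(n):
    # 2-bit reflected Gray code of n in 0..3: n XOR (n >> 1).
    return format(n ^ (n >> 1), "02b")

def coding_problem_pairs():
    # Rebuild Table I: the first two input bits are noise with no
    # relation to the output; the last two are the binary integer
    # whose Gray coding the network must produce.
    return [(format(noise, "02b") + format(value, "02b"), gray2(value))
            for noise in range(4) for value in range(4)]
```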
[0034] The method was applied using only the first eight entries
from Table I. One of the first eight entries of the table was
chosen at random and reserved for the evaluation set. The remaining
seven table entries were used for the training set, and the network
was trained until either the sum of the squared error decreased to
the preset value 0.10 or after a prespecified number (in the case
of this example, 2,000) of exposures to a training pair and back
propagation. Once a network was trained, the evaluation set of one
input-output pair was applied to it and the mean square error was
used as an estimate of its ability to generalize. The sets with the
lowest mean square error were selected. The strings representing
the selected network architectures were modified according to the
genetic algorithm and the process was repeated.
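The hold-one-out split used in the example (one randomly chosen pair reserved for evaluation, the rest for training) can be sketched as below; the function name and interface are illustrative.

```python
import random

def split_train_eval(pairs, rng=None):
    # Reserve one randomly chosen input-output pair for evaluation and
    # use the remaining pairs as the training set.
    rng = rng or random.Random()
    pairs = list(pairs)
    i = rng.randrange(len(pairs))
    return pairs[:i] + pairs[i + 1:], [pairs[i]]
```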
[0035] After producing and testing approximately 1,000 individual
network architectures, the genetic algorithm's population had
converged on several network properties. All of the final
population of individual network architectures had two hidden
layers, and the majority of them (19 of 30) had only a single
neuron in the first intermediate layer. The most prevalent single architecture is shown in FIG. 5. The path weights and neuron thresholds for one example are shown in the drawing.
[0036] Initially, one would imagine that this architecture could
not possibly solve the problem. There are four distinct classes of
input patterns, and the network channels all input information into
a single neuron. However, the activation level or threshold of the
bottleneck neuron discriminates the four classes of input, and the
low weights on the two noise input signals show that the network
learned to ignore them. The connections above the bottleneck
perform a recoding of the input.
[0037] Finally, the best architectures produced were repeatedly
trained on all of the first eight table entries and then tested
using the eight table entries they had never seen before. This was
repeated fifty times and the total sum square error on the test set
was determined, together with the number of times at least one of
the eight test cases was incorrectly classified. For comparison,
this procedure was also performed on the full network architecture
with two hidden layers of sixteen neurons each, trained using back
propagation learning only. These results are shown in the following
Table II.
TABLE II

criterion | full network | evolved network
total error (mean) | 0.675 | 0.207
total error (standard error) | 0.056 | 0.034
error-free tests | 19/50 | 48/50
[0038] A t-test of the mean difference on total error is significant (α<0.001). Clearly, the full network
architecture exhibits over specificity to the training set, i.e.
overtraining. It generalizes poorly. On the other hand, the severe
restriction of the architecture determined through the application
of the genetic algorithm exhibits a substantially better
performance and generalizes much better.
[0039] The preferred embodiment disclosed is a feedforward neural
network. It is contemplated that the invention covers other types
of optimized networks such as networks having feedback, networks
without hidden neurons and other network configurations.
Additionally, the genetic algorithm could use a more elaborate
recombination operator such as one with more than one crossover
point. Accordingly, the particular example given above should not
be construed as limiting the scope of the invention which is
defined by the following claims.
* * * * *