U.S. patent application number 17/204287 was filed with the patent office on 2021-03-17 and published on 2021-10-28 for method and device for on-vehicle active learning to be used for training perception network of autonomous vehicle. The applicant listed for this patent is Stradvision, Inc. Invention is credited to Sung An Gweon, Hongmo Je, Bongnam Kang, Yongjoong Kim.

Publication Number: 20210334652
Application Number: 17/204287
Family ID: 1000005895404
Filed: 2021-03-17
Published: 2021-10-28
United States Patent Application 20210334652
Kind Code: A1
Je; Hongmo; et al.
October 28, 2021
METHOD AND DEVICE FOR ON-VEHICLE ACTIVE LEARNING TO BE USED FOR
TRAINING PERCEPTION NETWORK OF AUTONOMOUS VEHICLE
Abstract
A method of on-vehicle active learning for training a perception
network of an autonomous vehicle is provided. The method includes
steps of: an on-vehicle active learning device, (a) if a driving
video and sensing information are acquired from a camera and
sensors on an autonomous vehicle, inputting frames of the driving
video and the sensing information into a scene code assigning
module to generate scene codes including information on scenes in
the frames and on driving events; and (b) at least one of selecting
a part of the frames, whose object detection information satisfies
a condition, as specific frames by using the scene codes and the
object detection information and selecting a part of the frames,
matching a training policy, as the specific frames by using the
scene codes and the object detection information, and storing the
specific frames and specific scene codes in a frame storing
part.
Inventors: Je; Hongmo (Pohang-si, KR); Kang; Bongnam (Pohang-si, KR); Kim; Yongjoong (Pohang-si, KR); Gweon; Sung An (Seoul, KR)

Applicant: Stradvision, Inc., Pohang-si, KR

Family ID: 1000005895404
Appl. No.: 17/204287
Filed: March 17, 2021
Related U.S. Patent Documents

Application Number: 63014877
Filing Date: Apr 24, 2020
Current U.S. Class: 1/1
Current CPC Class: G05D 1/0231 (20130101); G06N 3/08 (20130101); G06N 3/04 (20130101)
International Class: G06N 3/08 (20060101) G06N003/08; G06N 3/04 (20060101) G06N003/04
Claims
1. A method for on-vehicle active learning to be used for training
a perception network of an autonomous vehicle, comprising steps of:
(a) an on-vehicle active learning device, if a driving video and
sensing information are acquired respectively from a camera and one
or more sensors mounted on an autonomous vehicle while the
autonomous vehicle is driven, performing or supporting another
device to perform a process of inputting one or more consecutive
frames of the driving video and the sensing information into a
scene code assigning module, to thereby allow the scene code
assigning module to generate each of one or more scene codes
including information on each of scenes in each of the frames and
information on one or more driving events by referring to the
frames and the sensing information; (b) the on-vehicle active
learning device performing or supporting another device to perform
at least one of (i) a process of selecting a first part of the
frames, whose object detection information generated during the
driving events satisfies a preset condition, as specific frames to
be used for training the perception network of the autonomous
vehicle by using each of the scene codes of each of the frames and
the object detection information, for each of the frames, detected
by an object detector and a process of storing the specific frames
and their corresponding specific scene codes in a frame storing
part such that the specific frames and their corresponding specific
scene codes match with one another and (ii) a process of selecting
a second part of the frames, matching with a training policy of the
perception network of the autonomous vehicle, as the specific
frames by using the scene codes and the object detection
information and a process of storing the specific frames and their
corresponding specific scene codes in the frame storing part such
that the specific frames and their corresponding specific scene
codes match with one another; and (c) the on-vehicle active
learning device performing or supporting another device to perform
(c1) a process of sampling the specific frames stored in the frame
storing part by using the specific scene codes to thereby generate
training data and (c2) a process of executing on-vehicle learning
of the perception network of the autonomous vehicle by using the
training data.
2. (canceled)
3. The method of claim 1, wherein, at the step of (c), the
on-vehicle active learning device performs or supports another
device to perform at least one of (i) a process of under-sampling
the specific frames by referring to the scene codes or a process of
over-sampling the specific frames by referring to the scene codes,
to thereby generate the training data and thus train the perception
network, at the step of (c1) and (ii) (ii-1) a process of
calculating one or more weight-balanced losses on the training
data, corresponding to the scene codes, by weight balancing and
(ii-2) a process of training the perception network via
backpropagation using the weight-balanced losses, at the step of
(c2).
4. A method for on-vehicle active learning to be used for training
a perception network of an autonomous vehicle, comprising steps of:
(a) an on-vehicle active learning device, if a driving video and
sensing information are acquired respectively from a camera and one
or more sensors mounted on an autonomous vehicle while the
autonomous vehicle is driven, performing or supporting another
device to perform a process of inputting one or more consecutive
frames of the driving video and the sensing information into a
scene code assigning module, to thereby allow the scene code
assigning module to generate each of one or more scene codes
including information on each of scenes in each of the frames and
information on one or more driving events by referring to the
frames and the sensing information; and (b) the on-vehicle active
learning device performing or supporting another device to perform
at least one of (i) a process of selecting a first part of the
frames, whose object detection information generated during the
driving events satisfies a preset condition, as specific frames to
be used for training the perception network of the autonomous
vehicle by using each of the scene codes of each of the frames and
the object detection information, for each of the frames, detected
by an object detector and a process of storing the specific frames
and their corresponding specific scene codes in a frame storing
part such that the specific frames and their corresponding specific
scene codes match with one another and (ii) a process of selecting
a second part of the frames, matching with a training policy of the
perception network of the autonomous vehicle, as the specific
frames by using the scene codes and the object detection
information and a process of storing the specific frames and their
corresponding specific scene codes in the frame storing part such
that the specific frames and their corresponding specific scene
codes match with one another; and wherein, at the step of (a), the
on-vehicle active learning device performs or supports another
device to perform a process of allowing the scene code assigning
module to (i) apply a learning operation to each of the frames, to
thereby classify each of the scenes of each of the frames into one
of classes of driving environments and one of classes of driving
roads and thus generate each of class codes of each of the frames,
via a scene classifier based on deep learning, (ii) detect each of
driving events, which occurs while the autonomous vehicle is
driven, by referring to each of the frames and each piece of the
sensing information on each of the frames, to thereby generate each
of event codes, via a driving event detecting module, and (iii)
generate each of the scene codes for each of the frames by using
each of the class codes of each of the frames and each of the event
codes of each of the frames.
5. The method of claim 4, wherein the on-vehicle active learning
device performs or supports another device to perform a process of
allowing the scene code assigning module to (i) detect one or more
scene changes in the frames via the driving event detecting module
and thus generate one or more frame-based event codes and (ii)
detect one or more operation states, corresponding to the sensing
information, of the autonomous vehicle and thus generate one or
more vehicle-based event codes, to thereby generate the event
codes.
6. The method of claim 1, wherein, at the step of (b), the
on-vehicle active learning device performs or supports another
device to perform a process of selecting a certain frame, on which
no object is detected from its collision area, corresponding to a
collision event, as one of the specific frames by referring to the
scene codes, wherein the collision area is an area, in the certain
frame, where an object is estimated as being located if the
autonomous vehicle collides with the object or where the object is
estimated to be located if the autonomous vehicle is estimated to
collide with the object.
7. A method for on-vehicle active learning to be used for training
a perception network of an autonomous vehicle, comprising steps of:
(a) an on-vehicle active learning device, if a driving video and
sensing information are acquired respectively from a camera and one
or more sensors mounted on an autonomous vehicle while the
autonomous vehicle is driven, performing or supporting another
device to perform a process of inputting one or more consecutive
frames of the driving video and the sensing information into a
scene code assigning module, to thereby allow the scene code
assigning module to generate each of one or more scene codes
including information on each of scenes in each of the frames and
information on one or more driving events by referring to the
frames and the sensing information; and (b) the on-vehicle active
learning device performing or supporting another device to perform
at least one of (i) a process of selecting a first part of the
frames, whose object detection information generated during the
driving events satisfies a preset condition, as specific frames to
be used for training the perception network of the autonomous
vehicle by using each of the scene codes of each of the frames and
the object detection information, for each of the frames, detected
by an object detector and a process of storing the specific frames
and their corresponding specific scene codes in a frame storing
part such that the specific frames and their corresponding specific
scene codes match with one another and (ii) a process of selecting
a second part of the frames, matching with a training policy of the
perception network of the autonomous vehicle, as the specific
frames by using the scene codes and the object detection
information and a process of storing the specific frames and their
corresponding specific scene codes in the frame storing part such
that the specific frames and their corresponding specific scene
codes match with one another; and wherein, at the step of (b), the
on-vehicle active learning device performs or supports another
device to perform a process of selecting a certain frame, on which
an object is detected from its collision area, corresponding to a
normal event, as one of the specific frames by referring to the
scene codes, wherein the collision area is an area, in the certain
frame, where an object is estimated as being located if the
autonomous vehicle collides with the object or where the object is
estimated to be located if the autonomous vehicle is estimated to
collide with the object.
8. The method of claim 1, wherein, at the step of (b), the
on-vehicle active learning device performs or supports another
device to perform a process of selecting a certain frame where an
object, with its confidence score included in the object detection
information equal to or lower than a preset value, is located as
one of the specific frames.
9. The method of claim 1, wherein, at the step of (b), the
on-vehicle active learning device performs or supports another
device to perform a process of selecting a certain frame, from
which a pedestrian in a rare driving environment is detected, as
one of the specific frames, by referring to the scene codes.
10. An on-vehicle active learning device for on-vehicle active
learning to be used for training a perception network of an
autonomous vehicle, comprising: at least one memory that stores
instructions; and at least one processor configured to execute the
instructions to perform or support another device to perform: (I)
if a driving video and sensing information are acquired
respectively from a camera and one or more sensors mounted on an
autonomous vehicle while the autonomous vehicle is driven, a
process of inputting one or more consecutive frames of the driving
video and the sensing information into a scene code assigning
module, to thereby allow the scene code assigning module to
generate each of one or more scene codes including information on
each of scenes in each of the frames and information on one or more
driving events by referring to the frames and the sensing
information and (II) at least one of (i) a process of selecting a
first part of the frames, whose object detection information
generated during the driving events satisfies a preset condition,
as specific frames to be used for training the perception network
of the autonomous vehicle by using each of the scene codes of each
of the frames and the object detection information, for each of the
frames, detected by an object detector and a process of storing the
specific frames and their corresponding specific scene codes in a
frame storing part such that the specific frames and their
corresponding specific scene codes match with one another and (ii)
a process of selecting a second part of the frames, matching with a
training policy of the perception network of the autonomous
vehicle, as the specific frames by using the scene codes and the
object detection information and a process of storing the specific
frames and their corresponding specific scene codes in the frame
storing part such that the specific frames and their corresponding
specific scene codes match with one another; and wherein the
processor further performs or supports another device to perform:
(III) (III-1) a process of sampling the specific frames stored in
the frame storing part by using the specific scene codes to thereby
generate training data and (III-2) a process of executing
on-vehicle learning of the perception network of the autonomous
vehicle by using the training data.
11. (canceled)
12. The on-vehicle active learning device of claim 10, wherein, at
the process of (III), the processor performs or supports another
device to perform at least one of (i) a process of under-sampling
the specific frames by referring to the scene codes or a process of
over-sampling the specific frames by referring to the scene codes,
to thereby generate the training data and thus train the perception
network, at the process of (III-1) and (ii) (ii-1) a process of
calculating one or more weight-balanced losses on the training
data, corresponding to the scene codes, by weight balancing and
(ii-2) a process of training the perception network via
backpropagation using the weight-balanced losses, at the process of
(III-2).
13. An on-vehicle active learning device for on-vehicle active
learning to be used for training a perception network of an
autonomous vehicle, comprising: at least one memory that stores
instructions; and at least one processor configured to execute the
instructions to perform or support another device to perform: (I)
if a driving video and sensing information are acquired
respectively from a camera and one or more sensors mounted on an
autonomous vehicle while the autonomous vehicle is driven, a
process of inputting one or more consecutive frames of the driving
video and the sensing information into a scene code assigning
module, to thereby allow the scene code assigning module to
generate each of one or more scene codes including information on
each of scenes in each of the frames and information on one or more
driving events by referring to the frames and the sensing information and (II) at least one of
(i) a process of selecting a first part of the frames, whose object
detection information generated during the driving events satisfies
a preset condition, as specific frames to be used for training the
perception network of the autonomous vehicle by using each of the
scene codes of each of the frames and the object detection
information, for each of the frames, detected by an object detector
and a process of storing the specific frames and their
corresponding specific scene codes in a frame storing part such
that the specific frames and their corresponding specific scene
codes match with one another and (ii) a process of selecting a
second part of the frames, matching with a training policy of the
perception network of the autonomous vehicle, as the specific
frames by using the scene codes and the object detection
information and a process of storing the specific frames and their
corresponding specific scene codes in the frame storing part such
that the specific frames and their corresponding specific scene
codes match with one another; and wherein, at the process of (I),
the processor performs or supports another device to perform a
process of allowing the scene code assigning module to (i) apply a
learning operation to each of the frames, to thereby classify each
of the scenes of each of the frames into one of classes of driving
environments and one of classes of driving roads and thus generate
each of class codes of each of the frames, via a scene classifier
based on deep learning, (ii) detect each of driving events, which
occurs while the autonomous vehicle is driven, by referring to each
of the frames and each piece of the sensing information on each of
the frames, to thereby generate each of event codes, via a driving
event detecting module, and (iii) generate each of the scene codes
for each of the frames by using each of the class codes of each of
the frames and each of the event codes of each of the frames.
14. The on-vehicle active learning device of claim 13, wherein the
processor performs or supports another device to perform a process
of allowing the scene code assigning module to (i) detect one or
more scene changes in the frames via the driving event detecting
module and thus generate one or more frame-based event codes and
(ii) detect one or more operation states, corresponding to the
sensing information, of the autonomous vehicle and thus generate
one or more vehicle-based event codes, to thereby generate the
event codes.
15. The on-vehicle active learning device of claim 10, wherein, at
the process of (II), the processor performs or supports another
device to perform a process of selecting a certain frame, on which
no object is detected from its collision area, corresponding to a
collision event, as one of the specific frames by referring to the
scene codes, wherein the collision area is an area, in the certain
frame, where an object is estimated as being located if the
autonomous vehicle collides with the object or where the object is
estimated to be located if the autonomous vehicle is estimated to
collide with the object.
16. An on-vehicle active learning device for on-vehicle active
learning to be used for training a perception network of an
autonomous vehicle, comprising: at least one memory that stores
instructions; and at least one processor configured to execute the
instructions to perform or support another device to perform: (I)
if a driving video and sensing information are acquired
respectively from a camera and one or more sensors mounted on an
autonomous vehicle while the autonomous vehicle is driven, a
process of inputting one or more consecutive frames of the driving
video and the sensing information into a scene code assigning
module, to thereby allow the scene code assigning module to
generate each of one or more scene codes including information on
each of scenes in each of the frames and information on one or more
driving events by referring to the frames and the sensing
information and (II) at least one of (i) a process of selecting a
first part of the frames, whose object detection information
generated during the driving events satisfies a preset condition,
as specific frames to be used for training the perception network
of the autonomous vehicle by using each of the scene codes of each
of the frames and the object detection information, for each of the
frames, detected by an object detector and a process of storing the
specific frames and their corresponding specific scene codes in a
frame storing part such that the specific frames and their
corresponding specific scene codes match with one another and (ii)
a process of selecting a second part of the frames, matching with a
training policy of the perception network of the autonomous
vehicle, as the specific frames by using the scene codes and the
object detection information and a process of storing the specific
frames and their corresponding specific scene codes in the frame
storing part such that the specific frames and their corresponding
specific scene codes match with one another; and wherein, at the
process of (II), the processor performs or supports another device
to perform a process of selecting a certain frame, on which an
object is detected from its collision area, corresponding to a
normal event, as one of the specific frames by referring to the
scene codes, wherein the collision area is an area, in the certain
frame, where an object is estimated as being located if the
autonomous vehicle collides with the object or where the object is
estimated to be located if the autonomous vehicle is estimated to
collide with the object.
17. The on-vehicle active learning device of claim 10, wherein, at
the process of (II), the processor performs or supports another
device to perform a process of selecting a certain frame where an
object, with its confidence score included in the object detection
information equal to or lower than a preset value, is located as
one of the specific frames.
18. The on-vehicle active learning device of claim 10, wherein, at
the process of (II), the processor performs or supports another
device to perform a process of selecting a certain frame, from
which a pedestrian in a rare driving environment is detected, as
one of the specific frames, by referring to the scene codes.
Description
CROSS REFERENCE TO RELATED APPLICATION
[0001] This application claims the benefit of priority to U.S. Provisional Patent Application No. 63/014,877, filed on Apr. 24, 2020, the entire contents of which are incorporated herein by reference.
FIELD OF THE DISCLOSURE
[0002] The present disclosure relates to a method for on-vehicle
active learning to be used for training a perception network of an
autonomous vehicle; and more particularly, to the method for
selecting training data, to be used for training the perception
network, from real-time data of the autonomous vehicle, and for
training the perception network with the selected training data,
and an on-vehicle active learning device using the same.
BACKGROUND OF THE DISCLOSURE
[0003] Recently, research has been conducted on methods of identifying objects via machine learning technologies.
[0004] Among these machine learning technologies, deep learning, which uses a neural network including multiple hidden layers between an input layer and an output layer, shows high performance in object identification.
[0005] And, the neural network is generally trained via
backpropagation using one or more losses.
[0006] Conventionally, in order to train a deep learning network, raw data are collected according to a data collection policy, and human labelers then annotate the raw data to generate new training data. Thereafter, the deep learning network is trained by using the new training data together with the existing training data, and the training algorithm for the deep learning network is revised and improved by referring to analysis results produced by human engineers. Moreover, the data collection policy and incorrect annotations are revised by referring to the analysis results.
[0007] However, as the performance of the deep learning network improves, hard examples useful for training become scarce in such conventional methods. Accordingly, training the deep learning network with new training data becomes less efficient, and the return on investment in data annotation performed by the human labelers is therefore reduced.
[0008] Meanwhile, an autonomous vehicle is a vehicle driven without any action of a driver, in response to driving information and driving environments of the vehicle, and it uses a perception network based on deep learning in order to detect driving environment information, e.g., objects, lanes, traffic signals, etc., near the vehicle.
[0009] Such an autonomous vehicle requires online learning, that is, training with the perception network installed, in order to update the perception network. However, since the storage capacity of an embedded system for the autonomous vehicle is limited, the autonomous vehicle must perform data sampling on a database, e.g., cloud storage, in which the training data are stored, in order to acquire a part of the training data and update the perception network using that part.
[0010] Conventionally, sampling methods such as random sampling, metadata sampling, and manual curation sampling have been used for performing the data sampling. However, such sampling methods are inappropriate for on-vehicle active learning, since they must store all data under offline conditions in order to perform the active learning.
SUMMARY OF THE DISCLOSURE
[0011] It is an object of the present disclosure to solve all the
aforementioned problems.
[0012] It is another object of the present disclosure to provide a
method for allowing on-line active learning.
[0013] It is still another object of the present disclosure to
provide a method for improving an efficiency of training a
perception network with new training data.
[0014] It is still yet another object of the present disclosure to
provide a method for performing on-vehicle learning of the
perception network of an autonomous vehicle.
[0015] In accordance with one aspect of the present disclosure,
there is provided a method for on-vehicle active learning to be
used for training a perception network of an autonomous vehicle,
including steps of: (a) an on-vehicle active learning device, if a
driving video and sensing information are acquired respectively
from a camera and one or more sensors mounted on an autonomous
vehicle while the autonomous vehicle is driven, performing or
supporting another device to perform a process of inputting one or
more consecutive frames of the driving video and the sensing
information into a scene code assigning module, to thereby allow
the scene code assigning module to generate each of one or more
scene codes including information on each of scenes in each of the
frames and information on one or more driving events by referring
to the frames and the sensing information; and (b) the on-vehicle
active learning device performing or supporting another device to
perform at least one of (i) a process of selecting a first part of
the frames, whose object detection information generated during the
driving events satisfies a preset condition, as specific frames to
be used for training the perception network of the autonomous
vehicle by using each of the scene codes of each of the frames and
the object detection information, for each of the frames, detected
by an object detector and a process of storing the specific frames
and their corresponding specific scene codes in a frame storing
part such that the specific frames and their corresponding specific
scene codes match with one another and (ii) a process of selecting
a second part of the frames, matching with a training policy of the
perception network of the autonomous vehicle, as the specific
frames by using the scene codes and the object detection
information and a process of storing the specific frames and their
corresponding specific scene codes in the frame storing part such
that the specific frames and their corresponding specific scene
codes match with one another.
[0016] As one example, the method further includes a step of: (c)
the on-vehicle active learning device performing or supporting
another device to perform (c1) a process of sampling the specific
frames stored in the frame storing part by using the specific scene
codes to thereby generate training data and (c2) a process of
executing on-vehicle learning of the perception network of the
autonomous vehicle by using the training data.
[0017] As one example, at the step of (c), the on-vehicle active
learning device performs or supports another device to perform at
least one of (i) a process of under-sampling the specific frames by
referring to the scene codes or a process of over-sampling the
specific frames by referring to the scene codes, to thereby
generate the training data and thus train the perception network,
at the step of (c1) and (ii) (ii-1) a process of calculating one or
more weight-balanced losses on the training data, corresponding to
the scene codes, by weight balancing and (ii-2) a process of
training the perception network via backpropagation using the
weight-balanced losses, at the step of (c2).
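As a minimal sketch of how the scene-code-based sampling and weight balancing above could be realized (the disclosure does not fix an implementation; the function names, the per-code quota scheme, and the inverse-frequency weighting below are illustrative assumptions):

```python
import random
from collections import Counter

def sample_by_scene_code(frames, scene_codes, per_code_quota):
    """Under-sample frequent scene codes and over-sample rare ones so
    that each scene code contributes about `per_code_quota` frames."""
    by_code = {}
    for frame, code in zip(frames, scene_codes):
        by_code.setdefault(code, []).append(frame)
    training_data = []
    for group in by_code.values():
        if len(group) >= per_code_quota:   # under-sampling
            training_data += random.sample(group, per_code_quota)
        else:                              # over-sampling (with replacement)
            training_data += random.choices(group, k=per_code_quota)
    return training_data

def scene_code_weights(scene_codes):
    """Inverse-frequency weights for weight balancing: per-sample losses
    are multiplied by the weight of their scene code, so rare scenes
    contribute as much to backpropagation as common ones."""
    counts = Counter(scene_codes)
    return {code: len(scene_codes) / (len(counts) * n)
            for code, n in counts.items()}
```

A weight-balanced loss for a mini-batch would then be, for example, the mean over the batch of the weight of each sample's scene code multiplied by that sample's loss.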
[0018] As one example, at the step of (a), the on-vehicle active
learning device performs or supports another device to perform a
process of allowing the scene code assigning module to (i) apply a
learning operation to each of the frames, to thereby classify each
of the scenes of each of the frames into one of classes of driving
environments and one of classes of driving roads and thus generate
each of class codes of each of the frames, via a scene classifier
based on deep learning, (ii) detect each of driving events, which
occurs while the autonomous vehicle is driven, by referring to each
of the frames and each piece of the sensing information on each of
the frames, to thereby generate each of event codes, via a driving
event detecting module, and (iii) generate each of the scene codes
for each of the frames by using each of the class codes of each of
the frames and each of the event codes of each of the frames.
[0019] As one example, the on-vehicle active learning device
performs or supports another device to perform a process of
allowing the scene code assigning module to (i) detect one or more
scene changes in the frames via the driving event detecting module
and thus generate one or more frame-based event codes and (ii)
detect one or more operation states, corresponding to the sensing
information, of the autonomous vehicle and thus generate one or
more vehicle-based event codes, to thereby generate the event
codes.
[0020] As one example, at the step of (b), the on-vehicle active
learning device performs or supports another device to perform a
process of selecting a certain frame, on which no object is
detected from its collision area, corresponding to a collision
event, as one of the specific frames by referring to the scene
codes, wherein the collision area is an area, in the certain frame,
where an object is estimated as being located if the autonomous
vehicle collides with the object or where the object is estimated
to be located if the autonomous vehicle is estimated to collide
with the object.
[0021] As one example, at the step of (b), the on-vehicle active
learning device performs or supports another device to perform a
process of selecting a certain frame, on which an object is
detected from its collision area, corresponding to a normal event,
as one of the specific frames by referring to the scene codes,
wherein the collision area is an area, in the certain frame, where
an object is estimated as being located if the autonomous vehicle
collides with the object or where the object is estimated to be
located if the autonomous vehicle is estimated to collide with the
object.
[0022] As one example, at the step of (b), the on-vehicle active
learning device performs or supports another device to perform a
process of selecting a certain frame where an object, with its
confidence score included in the object detection information equal
to or lower than a preset value, is located as one of the specific
frames.
[0023] As one example, at the step of (b), the on-vehicle active
learning device performs or supports another device to perform a
process of selecting a certain frame, from which a pedestrian in a
rare driving environment is detected, as one of the specific
frames, by referring to the scene codes.
[0024] In accordance with another aspect of the present disclosure,
there is provided an on-vehicle active learning device for
on-vehicle active learning to be used for training a perception
network of an autonomous vehicle, including: at least one memory
that stores instructions; and at least one processor configured to
execute the instructions to perform or support another device to
perform: (I) if a driving video and sensing information are
acquired respectively from a camera and one or more sensors mounted
on an autonomous vehicle while the autonomous vehicle is driven, a
process of inputting one or more consecutive frames of the driving
video and the sensing information into a scene code assigning
module, to thereby allow the scene code assigning module to
generate each of one or more scene codes including information on
each of scenes in each of the frames and information on one or more
driving events by referring to the frames and the sensing
information and (II) at least one of (i) a process of selecting a
first part of the frames, whose object detection information
generated during the driving events satisfies a preset condition,
as specific frames to be used for training the perception network
of the autonomous vehicle by using each of the scene codes of each
of the frames and the object detection information, for each of the
frames, detected by an object detector and a process of storing the
specific frames and their corresponding specific scene codes in a
frame storing part such that the specific frames and their
corresponding specific scene codes match with one another and (ii)
a process of selecting a second part of the frames, matching with a
training policy of the perception network of the autonomous
vehicle, as the specific frames by using the scene codes and the
object detection information and a process of storing the specific
frames and their corresponding specific scene codes in the frame
storing part such that the specific frames and their corresponding
specific scene codes match with one another.
[0025] As one example, the processor further performs or supports
another device to perform: (III) (III-1) a process of sampling the
specific frames stored in the frame storing part by using the
specific scene codes to thereby generate training data and (III-2)
a process of executing on-vehicle learning of the perception
network of the autonomous vehicle by using the training data.
[0026] As one example, at the process of (III), the processor
performs or supports another device to perform at least one of (i)
a process of under-sampling the specific frames by referring to the
scene codes or a process of over-sampling the specific frames by
referring to the scene codes, to thereby generate the training data
and thus train the perception network, at the process of (III-1)
and (ii) (ii-1) a process of calculating one or more
weight-balanced losses on the training data, corresponding to the
scene codes, by weight balancing and (ii-2) a process of training
the perception network via backpropagation using the
weight-balanced losses, at the process of (III-2).
[0027] As one example, at the process of (I), the processor
performs or supports another device to perform a process of
allowing the scene code assigning module to (i) apply a learning
operation to each of the frames, to thereby classify each of the
scenes of each of the frames into one of classes of driving
environments and one of classes of driving roads and thus generate
each of class codes of each of the frames, via a scene classifier
based on deep learning, (ii) detect each of driving events, which
occurs while the autonomous vehicle is driven, by referring to each
of the frames and each piece of the sensing information on each of
the frames, to thereby generate each of event codes, via a driving
event detecting module, and (iii) generate each of the scene codes
for each of the frames by using each of the class codes of each of
the frames and each of the event codes of each of the frames.
[0028] As one example, the processor performs or supports another
device to perform a process of allowing the scene code assigning
module to (i) detect one or more scene changes in the frames via
the driving event detecting module and thus generate one or more
frame-based event codes and (ii) detect one or more operation
states, corresponding to the sensing information, of the autonomous
vehicle and thus generate one or more vehicle-based event codes, to
thereby generate the event codes.
[0029] As one example, at the process of (II), the processor
performs or supports another device to perform a process of
selecting a certain frame, on which no object is detected from its
collision area, corresponding to a collision event, as one of the
specific frames by referring to the scene codes, wherein the
collision area is an area, in the certain frame, where an object is
estimated as being located if the autonomous vehicle collides with
the object or where the object is estimated to be located if the
autonomous vehicle is estimated to collide with the object.
[0030] As one example, at the process of (II), the processor
performs or supports another device to perform a process of
selecting a certain frame, on which an object is detected from its
collision area, corresponding to a normal event, as one of the
specific frames by referring to the scene codes, wherein the
collision area is an area, in the certain frame, where an object is
estimated as being located if the autonomous vehicle collides with
the object or where the object is estimated to be located if the
autonomous vehicle is estimated to collide with the object.
[0031] As one example, at the process of (II), the processor
performs or supports another device to perform a process of
selecting a certain frame where an object, with its confidence
score included in the object detection information equal to or
lower than a preset value, is located as one of the specific
frames.
[0032] As one example, at the process of (II), the processor
performs or supports another device to perform a process of
selecting a certain frame, from which a pedestrian in a rare
driving environment is detected, as one of the specific frames, by
referring to the scene codes.
[0033] In addition, recordable media readable by a computer for
storing a computer program to execute the method of the present
disclosure is further provided.
BRIEF DESCRIPTION OF THE DRAWINGS
[0034] The following drawings, to be used to explain example embodiments of the present disclosure, are only part of the example embodiments of the present disclosure, and other drawings can be obtained based on these drawings by those skilled in the art of the present disclosure without inventive work.
[0035] FIG. 1 is a drawing schematically illustrating an on-vehicle
active learning device for on-vehicle active learning to be used
for training a perception network of an autonomous vehicle in
accordance with one example embodiment of the present
disclosure.
[0036] FIG. 2 is a drawing schematically illustrating a method for
the on-vehicle active learning in accordance with one example
embodiment of the present disclosure.
[0037] FIG. 3 is a drawing schematically illustrating a method for
generating a scene code during processes of the on-vehicle active
learning in accordance with one example embodiment of the present
disclosure.
[0038] FIG. 4 is a drawing schematically illustrating a method for
determining a useful frame, which has a degree of usefulness higher
than a threshold usefulness value, for the on-vehicle active
learning in accordance with one example embodiment of the present
disclosure.
[0039] FIG. 5 is a drawing schematically illustrating another
method for determining the useful frame for the on-vehicle active
learning in accordance with one example embodiment of the present
disclosure.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0040] The detailed explanation of the present disclosure made below refers to the attached drawings and diagrams, illustrated as specific embodiment examples under which the present disclosure may be implemented, to make clear the purposes, technical solutions, and advantages of the present disclosure. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention.
[0041] Besides, in the detailed description and claims of the present disclosure, the term "include" and its variations are not intended to exclude other technical features, additions, components, or steps. Other objects, benefits, and features of the present disclosure will be revealed to one skilled in the art, partially from the specification and partially from the implementation of the present disclosure. The following examples and drawings are provided as examples, but they are not intended to limit the present disclosure.
[0042] Moreover, the present disclosure covers all possible
combinations of example embodiments indicated in this
specification. It is to be understood that the various embodiments
of the present disclosure, although different, are not necessarily
mutually exclusive. For example, a particular feature, structure,
or characteristic described herein in connection with one
embodiment may be implemented within other embodiments without
departing from the spirit and scope of the present disclosure. In
addition, it is to be understood that the position or arrangement
of individual elements within each disclosed embodiment may be
modified without departing from the spirit and scope of the present
disclosure. The following detailed description is, therefore, not
to be taken in a limiting sense, and the scope of the present
disclosure is defined only by the appended claims, appropriately
interpreted, along with the full range of equivalents to which the
claims are entitled. In the drawings, similar reference numerals
refer to the same or similar functionality throughout the several
aspects.
[0043] To allow those skilled in the art to carry out the present disclosure easily, the example embodiments of the present disclosure will be explained in detail below by referring to the attached diagrams.
[0044] FIG. 1 is a drawing schematically illustrating an on-vehicle
active learning device for on-vehicle active learning, to be used
for training a perception network of an autonomous vehicle, in
accordance with one example embodiment of the present disclosure.
By referring to FIG. 1, the on-vehicle active learning device 1000
may include a memory 1001 which stores one or more instructions for
performing the on-vehicle active learning of one or more
consecutive frames in a driving video acquired from the autonomous
vehicle, and a processor 1002 which performs functions for the
on-vehicle active learning in response to the instructions stored
in the memory 1001.
[0045] Specifically, the on-vehicle active learning device 1000 may typically achieve a desired system performance by using combinations of at least one computing device and at least one piece of computer software, e.g., a computer processor, a memory, a storage, an input device, an output device, or any other conventional computing components, an electronic communication device such as a router or a switch, and an electronic information storage system such as a network-attached storage (NAS) device and a storage area network (SAN) as the computing device, and any instructions that allow the computing device to function in a specific way as the computer software.
[0046] The processor of the computing device may include hardware
configuration of MPU (Micro Processing Unit) or CPU (Central
Processing Unit), cache memory, data bus, etc. Additionally, the
computing device may further include software configuration of OS
and applications that achieve specific purposes.
[0047] However, such description of the computing device does not
exclude an integrated device including any combination of a
processor, a memory, a medium, or any other computing components
for implementing the present disclosure.
[0048] Meanwhile, a method for using the on-vehicle active learning
device 1000 for the on-vehicle active learning, to be used for
training the perception network of the autonomous vehicle, is
explained below by referring to FIG. 2 in accordance with one
example embodiment of the present disclosure.
[0049] First, if the driving video and sensing information are
acquired respectively from a camera, e.g., an image sensor, and one
or more sensors mounted on the autonomous vehicle while the
autonomous vehicle is driven, the on-vehicle active learning device
1000 may perform or support another device to perform a process of
inputting one or more consecutive frames of the driving video and
the sensing information into a scene code assigning module 1200, to
thereby allow the scene code assigning module 1200 to generate each
of one or more scene codes including information on each of scenes
in each of the frames and information on one or more driving events
by referring to the frames and the sensing information.
[0050] Herein, each of the scene codes may be created by encoding,
e.g., codifying, information on each of the scenes of each of the
frames and information on the driving events.
[0051] For example, by referring to FIG. 3, the scene code
assigning module 1200 may perform or support another device to
perform a process of applying a learning operation to each of the
frames and thus classifying each of the scenes of each of the
frames into one of preset classes of driving environments and one
of preset classes of driving roads, to thereby generate each of
class codes of each of the frames, via a scene classifier 1210
based on deep learning. That is, the scene classifier 1210 may
extract features of each of the frames and classify the extracted
features into one of the classes of the driving environments and
one of the classes of the driving roads, to thereby generate each
of the class codes of each of the frames.
[0052] Herein, the driving environments may include information on
weather and information on a time zone of an area where the
autonomous vehicle is driven, but the scope of the present
disclosure is not limited thereto, and may include various
information on weather in a local area or region where the
autonomous vehicle is driven. Also, the information on weather may
include information on weather phenomena like sunshine, rain, snow,
fog, etc. and the information on the time zone may include
information like day, night, etc. Also, the driving roads may
include types of roads, e.g., a highway, an urban road, a tunnel,
etc., where the autonomous vehicle is driven, but the scope of the
present disclosure is not limited thereto, and may include various
road environments where the autonomous vehicle is driven.
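For illustration, the class codes produced by the scene classifier 1210 could be encoded as in the following sketch; the enumerated classes follow paragraph [0052], while the bit layout is an assumption, since the disclosure does not fix a concrete code format:

```python
from enum import Enum

class Weather(Enum):
    SUNSHINE = 0
    RAIN = 1
    SNOW = 2
    FOG = 3

class TimeZone(Enum):
    DAY = 0
    NIGHT = 1

class Road(Enum):
    HIGHWAY = 0
    CITY = 1
    TUNNEL = 2

def make_class_code(weather: Weather, time_zone: TimeZone, road: Road) -> int:
    """Pack the scene classifier's outputs into one integer class code."""
    return (weather.value << 4) | (time_zone.value << 3) | road.value

# e.g., the class code of a rainy night on a highway:
rainy_night_highway = make_class_code(Weather.RAIN, TimeZone.NIGHT, Road.HIGHWAY)
```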
[0053] Also, the scene code assigning module 1200 may perform or
support another device to perform a process of detecting each of
the driving events, which occurs while the autonomous vehicle is
driven, by referring to each of the frames and each piece of the
sensing information on each of the frames, to thereby generate each
of event codes, via a driving event detecting module 1220.
[0054] Herein, the event codes may include (1) frame-based event
codes detected by using the consecutive frames and (2)
vehicle-based event codes detected by using the sensing
information.
[0055] As one example, the scene code assigning module 1200 may
perform or support another device to perform a process of inputting
the consecutive frames into a scene change detector of the driving
event detecting module 1220, to thereby allow the scene change
detector to detect whether each of the scenes of each of the
consecutive frames is changed and thus generate each of the
frame-based event codes corresponding to each of the frames.
Herein, the frame-based event codes may include codes respectively
corresponding to a uniform sample, a scene change, etc. according
to whether the scenes are changed. In addition, the scene code
assigning module 1200 may perform or support another device to
perform a process of detecting operation states of the autonomous
vehicle by using the sensing information and thus detecting events
which occur while the autonomous vehicle is driven, to thereby
generate vehicle-based event codes. Herein, the vehicle-based event
codes may include codes respectively corresponding to a rapid
steering, rapid brake slamming, normal action, AEB activated
action, etc. And, the scene code assigning module 1200 may perform
or support another device to perform a process of generating each
of the scene codes of each of the frames by using each of the class
codes of each of the frames and each of the event codes of each of
the frames.
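A sketch of how the driving event detecting module 1220 could derive the frame-based and vehicle-based event codes, and how the scene code assigning module 1200 could combine them, is given below; the histogram-difference test, the sensing field names, and the thresholds are assumptions for illustration:

```python
import numpy as np

SCENE_CHANGE_THRESHOLD = 0.3  # assumed tuning parameter

def _normalized_histogram(frame: np.ndarray) -> np.ndarray:
    hist, _ = np.histogram(frame, bins=64, range=(0, 255))
    return hist / max(hist.sum(), 1)

def frame_based_event_code(prev_frame: np.ndarray, frame: np.ndarray) -> str:
    """Label a frame 'scene_change' when its intensity histogram differs
    strongly from the previous frame's, else 'uniform_sample'."""
    diff = np.abs(_normalized_histogram(prev_frame)
                  - _normalized_histogram(frame)).sum()
    return "scene_change" if diff > SCENE_CHANGE_THRESHOLD else "uniform_sample"

def vehicle_based_event_code(sensing: dict) -> str:
    """Map the sensing information to an operation-state event code."""
    if sensing.get("aeb_activated"):
        return "aeb_activated"
    if sensing.get("brake_pressure", 0.0) > 0.8:
        return "rapid_brake_slamming"
    if abs(sensing.get("steering_rate", 0.0)) > 1.0:
        return "rapid_steering"
    return "normal_action"

def assign_scene_code(class_code: int, frame_event: str, vehicle_event: str):
    """A scene code combines the class code with both event codes."""
    return (class_code, frame_event, vehicle_event)
```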
[0056] The following table may indicate each of the scene codes assigned to each of the frames.

TABLE-US-00001
Class code
  Driving environment (weather/time): sunshine, rain, snow, fog, etc. / day, night
  Driving road: highway, city, tunnel
Event code
  Frame-based event code: uniform sample, scene change
  Vehicle-based event code: rapid steering, rapid brake slamming, normal action, AEB activated
[0057] However, it should be noted that the scene codes listed in
the above table are not to be taken in a limiting sense, and
various types of the scene codes of the frames in the driving video
can be generated.
[0058] Herein, by referring to FIG. 2 again, the driving video and
the sensing information may be inputted into a driving video &
driving information analyzing module 1110 of the autonomous
vehicle.
[0059] Then, the driving video & driving information analyzing
module 1110 may perform or support another device to perform a
process of applying a learning operation to the consecutive frames
of the driving video, to thereby detect information on a nearby
environment of the autonomous vehicle, for example, information on
objects, such as vehicles, pedestrians, etc., information on lanes,
information on traffic signals of the driving road, etc., via the
perception network, and a process of detecting information on the
operation states of the autonomous vehicle by referring to the
sensing information. And, the information on the nearby environment
and the information on the operation states of the autonomous
vehicle may be transmitted to an autonomous driving controlling
part 1500, and the autonomous driving controlling part 1500 may
control operation of the autonomous vehicle by using the
information on the nearby environment and the information on the
operation states.
[0060] As one example, the driving video & driving information
analyzing module 1110 may perform or support another device to
perform a process of detecting objects from the frames of the
driving video, to thereby generate object detection information of
each of the frames, via an object detector based on deep learning,
for example, the object detector based on a convolutional neural
network (CNN), or a process of segmenting the frames of the driving
video, to thereby generate the information on the lanes on each of
the frames, via a segmentation network based on deep learning.
Also, the driving video & driving information analyzing module
1110 may also perform or support another device to perform a
process of outputting the information on the operation states of
the autonomous vehicle. Herein, the information on the operation
states may include information on driving conditions of the autonomous vehicle respectively corresponding to an acceleration, a deceleration, a steering wheel operation, an activation of the autonomous emergency braking (AEB), etc. of the autonomous vehicle.
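The disclosure does not mandate a particular detector architecture; as one illustrative example, a pretrained CNN-based detector such as torchvision's Faster R-CNN could produce the object detection information (boxes, labels, and confidence scores) referenced above:

```python
import torch
import torchvision

# A pretrained detector stands in for the CNN-based object detector;
# any comparable deep learning detector could be used instead.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

def detect_objects(frame_tensor: torch.Tensor) -> dict:
    """Return object detection information for one frame.

    frame_tensor: float tensor shaped [3, H, W] with values in [0, 1].
    Returns a dict with 'boxes', 'labels', and 'scores' (confidence).
    """
    with torch.no_grad():
        return detector([frame_tensor])[0]
```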
[0061] Next, the on-vehicle active learning device 1000 may perform
or support another device to perform a process of selecting frames
useful for the training data, with which the perception network of
the autonomous vehicle is to be trained, by using each of the scene
codes of each of the frames and the object detection information on
each of the frames detected by the object detector, via a frame
selecting module 1300 and a process of storing the frames, selected
as the training data, in a frame storing part 1400.
[0062] That is, the on-vehicle active learning device 1000 may
perform or support another device to perform a process of allowing
the frame selecting module 1300 to select frames, i.e., images,
which are useful for training the perception network based on deep
learning of the autonomous vehicle, among the consecutive frames
acquired from the driving video.
[0063] Herein, the frame selecting module 1300 may select the
frames useful for training the perception network in various
ways.
[0064] That is, the on-vehicle active learning device 1000 may perform or support another device to perform (i) a process of selecting a first part of the frames, whose object detection information generated during the driving events satisfies a preset condition, as specific frames to be used for training the perception network of the autonomous vehicle, via the frame selecting module 1300, by using each of the scene codes of each of the frames and the object detection information, for each of the frames, detected by the object detector and (ii) a process of storing the specific frames and their corresponding specific scene codes in the frame storing part 1400, i.e., a memory with limited capacity installed on the autonomous vehicle, such that the specific frames and their corresponding specific scene codes match with one another.
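The frame storing part 1400 could be sketched as a bounded buffer that keeps each selected frame matched with its scene code; the capacity and the oldest-first eviction policy below are assumptions reflecting the limited on-vehicle memory:

```python
from collections import deque

class FrameStoringPart:
    """Bounded on-vehicle store matching specific frames with their
    specific scene codes; the oldest entry is evicted when full."""

    def __init__(self, capacity: int = 1000):
        self._store = deque(maxlen=capacity)

    def store(self, frame, scene_code) -> None:
        self._store.append((frame, scene_code))

    def frames_with_codes(self) -> list:
        return list(self._store)
```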
[0065] Also, the on-vehicle active learning device 1000 may perform
or support another device to perform a process of selecting a
second part of the frames, matching with a training policy of the
perception network of the autonomous vehicle, as the specific
frames among the frames by using the scene codes and the object
detection information, via the frame selecting module 1300 and a
process of storing the specific frames and their corresponding
specific scene codes in the frame storing part 1400 such that the
specific frames and their corresponding specific scene codes match
with one another.
[0066] As one example, the on-vehicle active learning device 1000
may perform or support another device to perform a process of
selecting a certain frame, which has a collision area where no
object is detected in a collision event, as one of the specific
frames useful for training the perception network by referring to
the scene codes. Herein, the collision event may be a driving event
performed in a situation, e.g., a sudden braking, a sudden right
turn, a sudden left turn, etc., in which an operation state of the
autonomous vehicle represents a traffic collision or an estimated
traffic collision. For example, the collision event may include an
event where braking of the autonomous vehicle occurs when a traffic
collision is expected to be imminent, but the scope of the present
disclosure is not limited thereto. Herein, the collision area may
be an area, in the certain frame, where an object is estimated as
being located if the autonomous vehicle collides with the object or
where the object is estimated to be located if the autonomous
vehicle is estimated to collide with the object.
[0067] That is, if an event code of the autonomous vehicle corresponds to a sudden braking, a sudden right turn, a sudden left turn, etc., an object should be detected in the collision area. However, if no object is detected in the collision area on one of the frames of the driving video, a false negative is suspected, and therefore said frame may be selected as one of the specific frames useful for training the perception network.
[0068] Also, the on-vehicle active learning device 1000 may perform
or support another device to perform a process of selecting a
certain frame, which has the collision area where an object is
detected in a normal event, as one of the specific frames useful
for training the perception network by referring to the scene
codes. Herein, the normal event may be an event where the
autonomous vehicle is driven normally without any accidents or
collisions.
[0069] That is, if the autonomous vehicle is driven normally
without any accidents or collisions, etc., no object should be
detected in the collision area. Therefore, if an object is detected
in the collision area on one of the frames of the driving video, a
false positive, i.e., a false alarm, is suspected, and said frame
may be selected as one of the specific frames useful for training
the perception network.
[0070] Also, the on-vehicle active learning device 1000 may perform
or support another device to perform a process of selecting a
certain frame, containing an object whose confidence score, included
in the object detection information, is equal to or lower than a
preset value, as one of the specific frames which are useful for
training the perception network.
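Purely for illustration, the selection rules above may be read as a
single rule-based check, sketched below in Python. The scene-code
fields, the box format, and the overlaps helper are assumptions of
this sketch, not definitions from the disclosure.

    def overlaps(box, area):
        # axis-aligned intersection test; boxes and areas are (x1, y1, x2, y2)
        return not (box[2] < area[0] or area[2] < box[0] or
                    box[3] < area[1] or area[3] < box[1])

    def is_useful_frame(scene_code, detections, collision_area, conf_threshold=0.5):
        # detections: list of (box, confidence) pairs from the object detector
        in_area = [d for d in detections if overlaps(d[0], collision_area)]
        if scene_code.get("event") == "collision" and not in_area:
            return True   # suspected false negative: nothing detected in the collision area
        if scene_code.get("event") == "normal" and in_area:
            return True   # suspected false alarm: object detected while driving normally
        if any(conf <= conf_threshold for _, conf in detections):
            return True   # low-confidence detection: likely a hard example
        return False      # perception network presumed to operate properly; frame discarded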
[0071] And, for frames corresponding to situations other than the
specific situations described above, the perception network may be
determined as operating properly, and therefore such frames may be
determined as not useful for training the perception network and be
discarded.
[0072] Meanwhile, according to a training policy of the perception
network, the on-vehicle active learning device 1000 may perform or
support another device to perform a process of selecting a certain
frame, from which a pedestrian in a rare driving environment is
detected, as one of the specific frames which are useful for
training the perception network by referring to the scene
codes.
[0073] As one example, in case the scene code corresponds to a
rainy night, a frame where a pedestrian is detected may be
determined as a hard example, that is, an example whose degree of
usefulness for training the perception network is higher than a
threshold usefulness value, and thus said frame may be determined
as useful for training the perception network. As another example,
in case the scene code corresponds to a sunny day, the perception
network may be determined as sufficiently trained on such scenes,
and therefore, said frame may be determined as not useful for
training the perception network in order to avoid overfitting.
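As one non-limiting sketch of such a training policy, the Python
predicate below keeps a frame only in the rare scene described above;
the scene-code keys and the class name "pedestrian" are assumptions
of this sketch.

    def matches_training_policy(scene_code, detected_classes):
        # hard example: a pedestrian detected in a rare driving environment
        rare_scene = (scene_code.get("weather") == "rainy"
                      and scene_code.get("time") == "night")
        if rare_scene and "pedestrian" in detected_classes:
            return True
        # common scene (e.g., a sunny day): the network is presumed
        # sufficiently trained, so skip the frame to avoid overfitting
        return False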
[0074] However, it should be noted that the method described above
for determining whether the frames of the driving video are useful
for training the perception network or not is just an example. That
is, the scope of the present disclosure is not limited thereto, and
the method may vary according to preset conditions.
[0075] Meanwhile, the frame selecting module 1300 may determine
whether the frames of the driving video are useful for training the
perception network or not by using a trained network, i.e., a
trained deep learning network.
[0076] For example, by referring to FIG. 4, the frame selecting
module 1300 may perform or support another device to perform a
process of inputting the frames into an auto labeling network 1310
and the trained deep learning network 1320, respectively.
Thereafter, by performing an output comparison, which is a process
of comparing an output from the auto labeling network 1310 and an
output from the trained deep learning network 1320, the frames may
be determined as useful or not for training the perception network.
If the outputs are identical or similar to each other, the frames
may be determined as not useful. And, if a difference between the
outputs is equal to or greater than a predetermined value, the
frames may be considered as hard examples and determined as useful
for training the perception network.
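One possible, purely illustrative realization of this output
comparison in Python is sketched below; both networks are assumed to
be callables returning label sets for a frame, and the Jaccard
distance is chosen here only as one example of a difference measure.

    def is_hard_example_by_comparison(frame, auto_labeler, trained_net,
                                      diff_threshold=0.3):
        labels_a = set(auto_labeler(frame))   # output of the auto labeling network 1310
        labels_b = set(trained_net(frame))    # output of the trained deep learning network 1320
        union = labels_a | labels_b
        if not union:
            return False                      # both outputs empty, i.e., identical: not useful
        difference = 1.0 - len(labels_a & labels_b) / len(union)
        # a difference equal to or greater than the threshold marks a hard example
        return difference >= diff_threshold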
[0077] As another example, by referring to FIG. 5, the frame
selecting module 1300 may perform or support another device to
perform a process of modifying the frames in various ways, to
thereby create various modified frames. Herein, the various ways of
modifying the frames may include resizing the frames, changing
aspect ratios of the frames, changing color tone of the frames,
etc. And then, the frame selecting module 1300 may perform or
support another device to perform a process of inputting each of
the modified frames into the trained deep learning network 1320.
Thereafter, by computing a variance of output values of each of the
modified frames from the trained deep learning network 1320, the
frames may be determined as useful or not for training the
perception network. If the computed variance is equal to or smaller
than a preset threshold, the frames may be determined as not
useful. And, if the computed variance is greater than the preset
threshold, the frames may be considered as hard examples and thus
determined as useful for training the perception network.
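A minimal, non-limiting sketch of this variance test follows; the
modification helpers are passed in as hypothetical callables (e.g.,
resizing, aspect-ratio, or color-tone functions), and the trained
network is assumed to return one scalar score per frame.

    import numpy as np

    def is_hard_example_by_variance(frame, trained_net, modifiers,
                                    variance_threshold=0.02):
        # modifiers: list of frame-modification functions, e.g.,
        # [resize, change_aspect, shift_color]
        variants = [frame] + [m(frame) for m in modifiers]
        scores = np.array([trained_net(v) for v in variants])
        # a stable network scores all modified frames similarly; a variance
        # greater than the preset threshold flags the frame as a hard example
        return scores.var() > variance_threshold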
[0078] Next, the on-vehicle active learning device 1000 may perform
or support another device to perform (i) a process of sampling the
specific frames stored in the frame storing part 1400 by using the
specific scene codes to thereby generate training data and (ii) a
process of executing on-vehicle learning of the perception network
of the autonomous vehicle by using the training data.
[0079] Herein, the on-vehicle active learning device 1000 may
perform or support another device to perform (i) a process of
under-sampling through selecting a part of the specific frames in a
majority class and as many as possible of the specific frames in a
minority class by referring to the scene codes or (ii) a process of
over-sampling through generating as many copies of the specific
frames in the minority class as the number of the specific frames
in the majority class, by referring to the scene codes, at the step
of sampling the specific frames stored in the frame storing part
1400, to thereby generate the training data and thus train the
perception network with the sampled training data. For example, in
case that the number of frames corresponding to the majority class
is 100 and the number of frames corresponding to the minority class
is 10, then if a desired number of frames to be sampled is 30, all
ten frames corresponding to the minority class may be selected and
twenty frames corresponding to the majority class may be
selected.
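The numeric example above can be reproduced, under the simplifying
assumption of exactly two scene-code classes, by the illustrative
Python helpers below; frames_by_class is a hypothetical mapping from
a scene-code class to its stored frames.

    import random

    def under_sample(frames_by_class, desired_total):
        # keep as many minority-class frames as possible, then fill the rest
        # of the budget from the majority class (e.g., 10 + 20 for a budget of 30)
        minority, majority = sorted(frames_by_class.values(), key=len)
        picked = list(minority[:desired_total])
        remaining = desired_total - len(picked)
        picked += random.sample(majority, min(remaining, len(majority)))
        return picked

    def over_sample(frames_by_class):
        # copy minority-class frames until both classes are the same size;
        # assumes a non-empty minority class
        minority, majority = sorted(frames_by_class.values(), key=len)
        copies = [minority[i % len(minority)] for i in range(len(majority))]
        return copies + list(majority)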
[0080] Also, the on-vehicle active learning device 1000 may perform
or support another device to perform a process of calculating one
or more weight-balanced losses on the training data, corresponding
to the scene codes, by weight balancing, to thereby train the
perception network via backpropagation by using the weight-balanced
losses, at the step of executing the on-vehicle learning of the
perception network by using the specific frames stored in the frame
storing part 1400.
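As one hedged illustration of such weight balancing, written here
with PyTorch, class weights may be set inversely proportional to the
per-scene-code frame counts; the example counts and the
classification form of the loss are assumptions of this sketch, not
requirements of the disclosure.

    import torch
    import torch.nn as nn

    # frame counts contributed by each scene-code class (e.g., majority, minority)
    counts = torch.tensor([100.0, 10.0])
    # inverse-frequency weights: rare classes contribute larger losses
    weights = counts.sum() / (len(counts) * counts)
    criterion = nn.CrossEntropyLoss(weight=weights)   # weight-balanced loss

    # during the on-vehicle learning step, the weighted loss is
    # backpropagated as usual:
    #   loss = criterion(perception_net(frames), labels)
    #   loss.backward()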
[0081] The present disclosure has an effect of providing the method
for improving the efficiency of training the perception network with
new training data by performing a process of assigning the scene
code corresponding to a frame of a video, a process of determining
whether the frame is useful for training or not, and then a process
of storing the data in a storage of the vehicle.
[0082] The present disclosure has another effect of providing the
method for performing the on-line active learning on the vehicle
itself, through sampling balancing on the training data according
to the scene code.
[0083] The present disclosure has still another effect of providing
the method for performing the on-vehicle learning of the perception
network of the autonomous vehicle by performing the sampling
balancing on the training data according to its corresponding scene
code.
[0084] The embodiments of the present disclosure as explained above
can be implemented in a form of executable program command through
a variety of computer means recordable to computer readable media.
The computer readable media may include, solely or in combination,
program commands, data files, and data structures. The program
commands recorded to the media may be components specially designed
for the present disclosure or may be usable to those skilled in the
art. Computer readable media include magnetic media such as a hard
disk, a floppy disk, and magnetic tape, optical media such as
CD-ROM and DVD, magneto-optical media such as a floptical disk, and
hardware devices such as ROM, RAM, and flash memory specially
designed to store and carry out program commands. Program commands
include not only machine language code produced by a compiler but
also high-level code that can be executed by a computer using an
interpreter, etc. The aforementioned hardware device can work as
one or more software modules to perform the processes of the
present disclosure, and vice versa.
[0085] As seen above, the present disclosure has been explained by
specific matters such as detailed components, limited embodiments,
and drawings. They have been provided only to help more general
understanding of the present disclosure. It, however, will be
understood by those skilled in the art that various changes and
modifications may be made from the description without departing
from the spirit and scope of the disclosure as defined in the
following claims.
[0086] Accordingly, the thought of the present disclosure must not
be confined to the explained embodiments, and the following patent
claims, as well as everything including variations equal or
equivalent to the patent claims, pertain to the category of the
thought of the present disclosure.
* * * * *