U.S. patent application number 17/442075, for an information processing method, program, and information processing apparatus, was published by the patent office on 2022-06-02. This patent application is currently assigned to Sony Semiconductor Solutions Corporation. The applicant listed for this patent is Sony Semiconductor Solutions Corporation. Invention is credited to Haruyoshi Yonekawa.

Publication Number: 20220172484
Application Number: 17/442075
Family ID: 1000006209744
Publication Date: 2022-06-02
United States Patent Application 20220172484
Kind Code: A1
Yonekawa; Haruyoshi
June 2, 2022

INFORMATION PROCESSING METHOD, PROGRAM, AND INFORMATION PROCESSING APPARATUS
Abstract
The present technology relates to an information processing
method, a program, and an information processing apparatus capable
of analyzing a learning situation of a model using a neural
network. In step S1, feature data that numerically represents a
feature of a feature map generated from input data in a model using
a neural network is generated. In step S2, analysis data based on
the feature data of a plurality of the feature maps is generated.
The present technology can be applied to, for example, a system
that recognizes a vehicle in front of a vehicle.
Inventor: Yonekawa; Haruyoshi (Kanagawa, JP)
Applicant: Sony Semiconductor Solutions Corporation (Kanagawa, JP)
Assignee: Sony Semiconductor Solutions Corporation (Kanagawa, JP)
Family ID: 1000006209744
Appl. No.: 17/442075
Filed: March 17, 2020
PCT Filed: March 17, 2020
PCT No.: PCT/JP2020/011601
371 Date: September 22, 2021

Current U.S. Class: 1/1
Current CPC Class: G01S 13/89 (20130101); G06V 10/82 (20220101); G06V 20/56 (20220101); G06N 3/08 (20130101)
International Class: G06V 20/56 (20060101); G06V 10/82 (20060101); G06N 3/08 (20060101); G01S 13/89 (20060101)

Foreign Application Priority Data
Mar 29, 2019 (JP) 2019-065378
Claims
1. An information processing method comprising: a feature data
generation step of generating feature data that numerically
represents a feature of a feature map generated from input data in
a model using a neural network; and an analysis data generation
step of generating analysis data based on the feature data of a
plurality of the feature maps.
2. The information processing method according to claim 1, wherein
the feature data of a plurality of the feature maps is arranged in
the analysis data.
3. The information processing method according to claim 2, wherein
in the analysis data, the feature data of a plurality of the
feature maps generated from a plurality of the input data is
arranged in a predetermined hierarchy of the model that generates a
plurality of the feature maps from one of the input data.
4. The information processing method according to claim 3, wherein
the feature data of a plurality of the feature maps generated from
the same input data is arranged in a first direction of the
analysis data, and the feature data of the feature maps
corresponding to each other of the different input data is arranged
in a second direction orthogonal to the first direction.
5. The information processing method according to claim 4, wherein
the feature data indicates a degree of dispersion of pixel values
of the feature map.
6. The information processing method according to claim 5, further
comprising: an analysis step of analyzing a learning situation of
the model on a basis of a line in the first direction and a line in
the second direction of the analysis data.
7. The information processing method according to claim 6, further
comprising: a parameter setting step of setting a parameter for
learning of the model on a basis of an analysis result of the
learning situation of the model.
8. The information processing method according to claim 7, wherein
in the parameter setting step, a regularization parameter for
learning of the model is set on a basis of the number of lines in
the second direction of the analysis data.
9. The information processing method according to claim 3, wherein
the feature data indicates a frequency distribution of pixel values
of the feature map, and in the analysis data, the feature data of a
plurality of the feature maps generated from a plurality of the
input data is three-dimensionally arranged in the hierarchy of the
model.
10. The information processing method according to claim 1, wherein
the input data is image data, and the model performs recognition
processing of an object.
11. The information processing method according to claim 10,
wherein the model performs recognition processing of a vehicle.
12. The information processing method according to claim 11,
wherein the input data is image data representing a distribution of
intensity of a received signal of a millimeter wave radar in a
bird's-eye view.
13. The information processing method according to claim 12,
wherein the model converts the image data into an image in a camera
coordinate system.
14. The information processing method according to claim 1, wherein
the analysis data includes the feature data satisfying a
predetermined condition among the feature data of a plurality of
the feature maps.
15. A program that causes a computer to execute processing
comprising: a feature data generation step of generating feature
data that numerically represents a feature of a feature map
generated from input data in a model using a neural network; and an
analysis data generation step of generating analysis data based on
the feature data of a plurality of the feature maps.
16. An information processing apparatus comprising: a feature data
generation unit that generates feature data that numerically
represents a feature of a feature map generated from input data in
a model using a neural network; and an analysis data generation
unit that generates analysis data based on the feature data of a
plurality of the feature maps.
Description
TECHNICAL FIELD
[0001] The present technology relates to an information processing
method, a program, and an information processing apparatus, and
particularly relates to an information processing method, a
program, and an information processing apparatus suitable for use
in a case of analyzing a model using a neural network.
BACKGROUND ART
[0002] Conventionally, a neural network is used in an image pattern
recognition device (see, for example, Patent Document 1).
CITATION LIST
Patent Document
[0003] Patent Document 1: Japanese Patent Application Laid-Open No.
H5-61976
SUMMARY OF THE INVENTION
Problems to be Solved by the Invention
[0004] Meanwhile, it is desired to be able to analyze a learning
situation of a model using a neural network.
[0005] The present technology has been made in view of such a
situation, and enables analysis of a learning situation of a model
using a neural network.
Solutions to Problems
[0006] An information processing method according to one aspect of
the present technology includes: a feature data generation step of
generating feature data that numerically represents a feature of a
feature map generated from input data in a model using a neural
network; and an analysis data generation step of generating
analysis data based on the feature data of a plurality of the
feature maps.
[0007] A program according to one aspect of the present technology
causes a computer to execute processing including: a feature data
generation step of generating feature data that numerically
represents a feature of a feature map generated from input data in
a model using a neural network; and an analysis data generation
step of generating analysis data based on the feature data of a
plurality of the feature maps.
[0008] An information processing apparatus according to one aspect
of the present technology includes: a feature data generation unit
that generates feature data that numerically represents a feature
of a feature map generated from input data in a model using a
neural network; and an analysis data generation unit that generates
analysis data based on the feature data of a plurality of the
feature maps.
[0009] In one aspect of the present technology, feature data that
numerically represents a feature of a feature map generated from
input data in a model using a neural network is generated, and
analysis data based on the feature data of a plurality of the
feature maps is generated.
BRIEF DESCRIPTION OF DRAWINGS
[0010] FIG. 1 is a block diagram illustrating a configuration
example of a vehicle control system.
[0011] FIG. 2 is a diagram illustrating a first embodiment of an
object recognition model.
[0012] FIG. 3 is a diagram illustrating an example of feature
maps.
[0013] FIG. 4 is a block diagram illustrating a configuration
example of an information processing apparatus to which the present
technology is applied.
[0014] FIG. 5 is a flowchart for explaining learning situation
analysis processing.
[0015] FIG. 6 is a diagram illustrating an example of feature maps
having large variance of pixel values and feature maps having small
variance of pixel values.
[0016] FIG. 7 is a diagram illustrating an example in which feature
maps are arranged in a line.
[0017] FIG. 8 is a diagram illustrating an example in which feature
maps are arranged two-dimensionally.
[0018] FIG. 9 is a diagram illustrating an example of a variance
vector and an image of the variance vector.
[0019] FIG. 10 is a diagram illustrating an example of analysis
data.
[0020] FIG. 11 is a diagram illustrating a first example of a state
of change in the analysis data with progress of learning.
[0021] FIG. 12 is a diagram illustrating an example of feature maps
at the beginning of learning and at the end of learning.
[0022] FIG. 13 is a diagram illustrating an example of the analysis
data.
[0023] FIG. 14 is a diagram illustrating an example of the analysis
data in a case where regularization is too strong.
[0024] FIG. 15 is a diagram illustrating an example of the analysis
data in a case where the regularization is too weak.
[0025] FIG. 16 is a diagram illustrating a second embodiment of an
object recognition model.
[0026] FIG. 17 is a diagram illustrating a second example of a
state of change in analysis data with progress of learning.
[0027] FIG. 18 is an enlarged view of the analysis data in E of
FIG. 17.
[0028] FIG. 19 is a diagram illustrating a first example of a
millimeter wave image and a captured image.
[0029] FIG. 20 is a diagram illustrating a second example of the
millimeter wave image and the captured image.
[0030] FIG. 21 is a diagram illustrating a third example of the
millimeter wave image and the captured image.
[0031] FIG. 22 is a diagram illustrating an example of feature maps
generated from a millimeter wave image.
[0032] FIG. 23 is a graph illustrating an example of a histogram of
pixel values of a feature map.
[0033] FIG. 24 is a diagram illustrating an example of analysis
data based on frequency distribution of pixel values of a feature
map.
[0034] FIG. 25 is a diagram illustrating a configuration example of
a computer.
MODE FOR CARRYING OUT THE INVENTION
[0035] Hereinafter, a mode for carrying out the present technology
will be described. The description will be given in the following
order.
[0036] 1. Embodiment
[0037] 2. Case Example of Analysis of Learning Situation
[0038] 3. Modified Examples
[0039] 4. Others
1. Embodiment
[0040] First, an embodiment of the present technology will be
described with reference to FIGS. 1 to 15.
[0041] <Configuration Example of Vehicle Control System
100>
[0042] FIG. 1 is a block diagram illustrating a configuration
example of schematic functions of a vehicle control system 100
which is an example of a moving body control system to which the
present technology can be applied.
[0043] Note that, hereinafter, in a case where a vehicle 10
provided with the vehicle control system 100 is distinguished from
other vehicles, it is referred to as a host vehicle or a host
car.
[0044] The vehicle control system 100 includes an input unit 101, a
data acquisition unit 102, a communication unit 103, an in-vehicle
device 104, an output control unit 105, an output unit 106, a drive
system control unit 107, a drive system 108, a body system control
unit 109, a body system 110, a storage unit 111, and an automatic
driving control unit 112. The input unit 101, the data acquisition
unit 102, the communication unit 103, the output control unit 105,
the drive system control unit 107, the body system control unit
109, the storage unit 111, and the automatic driving control unit
112 are connected to one another via a communication network 121.
The communication network 121 includes, for example, a bus, a
vehicle-mounted communication network conforming to any standard
such as a controller area network (CAN), a local interconnect
network (LIN), a local area network (LAN), FlexRay (registered
trademark), and the like. Note that there is also a case where the
units of the vehicle control system 100 are directly connected to
each other without the communication network 121.
[0045] Note that, hereinafter, in a case where each unit of the
vehicle control system 100 performs communication via the
communication network 121, description of the communication network
121 will be omitted. For example, in a case where the input unit
101 and the automatic driving control unit 112 perform
communication via the communication network 121, it is simply
described that the input unit 101 and the automatic driving control
unit 112 perform communication.
[0046] The input unit 101 includes a device used by a passenger for
inputting various data, instructions, and the like. For example,
the input unit 101 includes an operation device such as a touch
panel, a button, a microphone, a switch, a lever, and the like, as
well as an operation device that allows input by a method other
than manual operation, such as voice, gesture, and the like.
Furthermore, for example, the input unit 101 may be a remote
control device using infrared rays or other radio waves, or an
external connection device such as a mobile device, a wearable
device, and the like corresponding to an operation of the vehicle
control system 100. The input unit 101 generates an input signal on
the basis of data, instructions, and the like input by the
passenger, and supplies the input signal to each unit of the
vehicle control system 100.
[0047] The data acquisition unit 102 includes various sensors and
the like that acquire data used for processing of the vehicle
control system 100, and supplies the acquired data to each unit of
the vehicle control system 100.
[0048] For example, the data acquisition unit 102 includes various
sensors for detecting a state and the like of the host vehicle.
Specifically, for example, the data acquisition unit 102 includes a
gyro sensor, an acceleration sensor, an inertial measurement unit
(IMU), and a sensor and the like for detecting an operation amount
of an accelerator pedal, an operation amount of a brake pedal, a
steering angle of a steering wheel, an engine speed, a motor speed,
a wheel rotation speed, and the like.
[0049] Furthermore, for example, the data acquisition unit 102
includes various sensors for detecting information outside the host
vehicle. Specifically, for example, the data acquisition unit 102
includes an imaging device such as a time of flight (ToF) camera, a
stereo camera, a monocular camera, an infrared camera, another
camera, and the like. Furthermore, for example, the data
acquisition unit 102 includes an environment sensor for detecting
weather, atmospheric phenomena, and the like, and a surrounding
information detection sensor for detecting an object around the
host vehicle. The environment sensor includes, for example, a
raindrop sensor, a fog sensor, a sunshine sensor, a snow sensor,
and the like. The surrounding information detection sensor
includes, for example, an ultrasonic sensor, a radar, a light
detection and ranging or laser imaging detection and ranging
(LiDAR), a sonar, and the like.
[0050] Moreover, for example, the data acquisition unit 102
includes various sensors for detecting a current position of the
host vehicle. Specifically, for example, the data acquisition unit
102 includes a global navigation satellite system (GNSS) receiver
and the like that receives a GNSS signal from a GNSS satellite.
[0051] Furthermore, for example, the data acquisition unit 102
includes various sensors for detecting information inside the
vehicle. Specifically, for example, the data acquisition unit 102
includes an imaging device that images a driver, a biological
sensor that detects biological information of the driver, a
microphone that collects voice in the vehicle interior, and the
like. The biological sensor is provided on, for example, a seat
surface, a steering wheel, and the like and detects biological
information of a passenger sitting on the seat or a driver gripping
the steering wheel.
[0052] The communication unit 103 performs communication with the
in-vehicle device 104, various devices outside the vehicle, a
server, a base station, and the like, transmits data supplied from
each unit of the vehicle control system 100, and supplies received
data to each unit of the vehicle control system 100. Note that a
communication protocol supported by the communication unit 103 is
not particularly limited, and furthermore, the communication unit
103 can support a plurality of types of communication
protocols.
[0053] For example, the communication unit 103 performs wireless
communication with the in-vehicle device 104 by a wireless LAN,
Bluetooth (registered trademark), near field communication (NFC), a
wireless USB (WUSB), and the like. Furthermore, for example, the
communication unit 103 performs wired communication with the
in-vehicle device 104 by a universal serial bus (USB), a
high-definition multimedia interface (HDMI) (registered trademark),
a mobile high-definition link (MHL), and the like via a connection
terminal (not shown) (and a cable if necessary).
[0054] Moreover, for example, the communication unit 103 performs
communication with a device (for example, an application
server or a control server) existing on an external network (for
example, the Internet, a cloud network, or an operator-specific
network) via the base station or an access point. Furthermore, for
example, the communication unit 103 performs communication with a
terminal existing near the host vehicle (for example, a terminal of
a pedestrian or a store, or a machine type communication (MTC)
terminal) using peer to peer (P2P) technology. Moreover, for
example, the communication unit 103 performs V2X communication such
as vehicle to vehicle communication, vehicle to infrastructure
communication, host vehicle to home communication, vehicle to
pedestrian communication, and the like. Furthermore, for example,
the communication unit 103 includes a beacon receiving unit that
receives radio waves or electromagnetic waves transmitted from a
wireless station and the like installed on a road and acquires
information such as a current position, traffic congestion, traffic
regulation, required time, and the like.
[0055] The in-vehicle device 104 includes, for example, a mobile
device or a wearable device possessed by a passenger, an
information device carried in or attached to the host vehicle, a
navigation device that searches for a route to an arbitrary
destination, and the like.
[0056] The output control unit 105 controls output of various
information to a passenger of the host vehicle or the outside of
the vehicle. For example, the output control unit 105 generates an
output signal including at least one of visual information (for
example, image data) or auditory information (for example, voice
data) and supplies the output signal to the output unit 106,
thereby controlling output of the visual information and the
auditory information from the output unit 106. Specifically, for
example, the output control unit 105 synthesizes image data imaged
by different imaging devices of the data acquisition unit 102 to
generate an overhead view image, a panoramic image, and the like,
and supplies an output signal including the generated image to the
output unit 106. Furthermore, for example, the output control unit
105 generates voice data including a warning sound, a warning
message, and the like for danger such as collision, contact, or
entry into a danger zone, and supplies an output signal including
the generated voice data to the output unit 106.
[0057] The output unit 106 includes a device capable of outputting
visual information or auditory information to a passenger of the
host vehicle or the outside of the vehicle. For example, the output
unit 106 includes a display device, an instrument panel, an audio
speaker, a headphone, a wearable device such as a glasses-type
display and the like worn by a passenger, a projector, a lamp, and
the like. In addition to a device having a normal display, the
display device included in the output unit 106 may be, for example,
a device that displays visual information in a field of view of a
driver, such as a head-up display, a transmissive display, a device
having an augmented reality (AR) display function, and the
like.
[0058] The drive system control unit 107 controls the drive system
108 by generating various control signals and supplying them to the
drive system 108. Furthermore, the drive system control unit 107
supplies a control signal to each unit other than the drive system
108 as necessary, and performs notification of a control state of
the drive system 108 and the like.
[0059] The drive system 108 includes various devices related to the
drive system of the host vehicle. For example, the drive system 108
includes a driving force generation device for generating a driving
force of an internal combustion engine, a driving motor, and the
like, a driving force transmission mechanism for transmitting the
driving force to wheels, a steering mechanism for adjusting a
steering angle, a braking device for generating a braking force, an
antilock brake system (ABS), an electronic stability control (ESC),
an electric power steering device, and the like.
[0060] The body system control unit 109 controls the body system
110 by generating various control signals and supplying them to the
body system 110. Furthermore, the body system control unit 109
supplies a control signal to each unit other than the body system
110 as necessary, and performs notification of a control state of
the body system 110 and the like.
[0061] The body system 110 includes various devices of a body
system mounted on a vehicle body. For example, the body system 110
includes a keyless entry system, a smart key system, a power window
device, a power seat, a steering wheel, an air conditioner, various
lamps (for example, a head lamp, a back lamp, a brake lamp, a
blinker, a fog lamp, and the like), and the like.
[0062] The storage unit 111 includes, for example, a read only
memory (ROM), a random access memory (RAM), a magnetic storage
device such as a hard disc drive (HDD) and the like, a
semiconductor storage device, an optical storage device, a
magneto-optical storage device, and the like. The storage unit 111
stores various programs, data, and the like used by each unit of
the vehicle control system 100. For example, the storage unit 111
stores map data such as a three-dimensional high-precision map (for
example, a dynamic map), a global map that is less precise
than the high-precision map and covers a wider area, a local map
including information around the host vehicle, and the like.
[0063] The automatic driving control unit 112 performs control
related to automatic driving such as autonomous traveling, driving
assistance, and the like. Specifically, for example, the automatic
driving control unit 112 performs cooperative control for the
purpose of realizing functions of an advanced driver assistance
system (ADAS) including collision avoidance or shock mitigation of
a host vehicle, following running based on a following distance,
vehicle speed maintaining running, collision warning of the host
vehicle, lane departure warning of the host vehicle, and the like.
Furthermore, for example, the automatic driving control unit 112
performs cooperative control for the purpose of automatic driving
and the like in which a vehicle autonomously travels without
depending on an operation of the driver. The automatic driving
control unit 112 includes a detection unit 131, a self-position
estimation unit 132, a situation analysis unit 133, a planning unit
134, and an operation control unit 135.
[0064] The detection unit 131 detects various information necessary
for controlling the automatic driving. The detection unit 131
includes a vehicle exterior information detection unit 141, a
vehicle interior information detection unit 142, and a vehicle
state detection unit 143.
[0065] The vehicle exterior information detection unit 141 performs
detection processing of information outside the host vehicle on the
basis of data or signals from each unit of the vehicle control
system 100. For example, the vehicle exterior information detection
unit 141 performs detection processing, recognition processing, and
tracking processing of an object around the host vehicle, and
detection processing of a distance to the object. The object to be
detected includes, for example, a vehicle, a person, an obstacle, a
structure, a road, a traffic light, a traffic sign, a road sign,
and the like. Furthermore, for example, the vehicle exterior
information detection unit 141 performs detection processing of a
surrounding environment of the host vehicle. The surrounding
environment to be detected includes, for example, weather,
temperature, humidity, brightness, a state of a road surface, and
the like. The vehicle exterior information detection unit 141
supplies data indicating a result of the detection processing to
the self-position estimation unit 132, a map analysis unit 151, a
traffic rule recognition unit 152, and a situation recognition unit
153 of the situation analysis unit 133, an emergency avoidance unit
171 of the operation control unit 135, and the like.
[0066] The vehicle interior information detection unit 142 performs
detection processing of vehicle interior information on the basis
of data or signals from each unit of the vehicle control system
100. For example, the vehicle interior information detection unit
142 performs authentication processing and recognition processing
of a driver, detection processing of a state of the driver,
detection processing of a passenger, detection processing of a
vehicle interior environment, and the like. The state of the driver
to be detected includes, for example, a physical condition, a
wakefulness level, a concentration level, a fatigue level, a
line-of-sight direction, and the like. The vehicle interior
environment to be detected includes, for example, temperature,
humidity, brightness, odor, and the like. The vehicle interior
information detection unit 142 supplies data indicating a result of
the detection processing to the situation recognition unit 153 of
the situation analysis unit 133, the emergency avoidance unit 171
of the operation control unit 135, and the like.
[0067] The vehicle state detection unit 143 performs detection
processing of a state of the host vehicle on the basis of data or
signals from each unit of the vehicle control system 100. The state
of the host vehicle to be detected includes, for example, speed,
acceleration, a steering angle, presence/absence and contents of
abnormality, a state of a driving operation, a position and
inclination of a power seat, a state of a door lock, states of
other in-vehicle devices, and the like. The vehicle state detection
unit 143 supplies data indicating a result of the detection
processing to the situation recognition unit 153 of the situation
analysis unit 133, the emergency avoidance unit 171 of the
operation control unit 135, and the like.
[0068] The self-position estimation unit 132 performs estimation
processing of a position, posture, and the like of the host vehicle
on the basis of data or signals from each unit of the vehicle
control system 100 such as the vehicle exterior information
detection unit 141, the situation recognition unit 153 of the
situation analysis unit 133, and the like. Furthermore, the
self-position estimation unit 132 generates a local map used for
estimating a self-position (hereinafter referred to as a
self-position estimation map) as necessary. The self-position
estimation map is, for example, a highly accurate map using a
technique such as simultaneous localization and mapping (SLAM) and
the like. The self-position estimation unit 132 supplies data
indicating a result of the estimation processing to the map
analysis unit 151, the traffic rule recognition unit 152, the
situation recognition unit 153, and the like of the situation
analysis unit 133. Furthermore, the self-position estimation unit
132 stores the self-position estimation map in the storage unit
111.
[0069] The situation analysis unit 133 performs analysis processing
of the host vehicle and a surrounding situation. The situation
analysis unit 133 includes the map analysis unit 151, the traffic
rule recognition unit 152, the situation recognition unit 153, and
a situation prediction unit 154.
[0070] The map analysis unit 151 performs analysis processing of
various maps stored in the storage unit 111 while using data or
signals from each unit of the vehicle control system 100 such as
the self-position estimation unit 132, the vehicle exterior
information detection unit 141, and the like as necessary, and
constructs a map including information necessary for automatic
driving processing. The map analysis unit 151 supplies the
constructed map to the traffic rule recognition unit 152, the
situation recognition unit 153, the situation prediction unit 154,
a route planning unit 161, an action planning unit 162, and an
operation planning unit 163 of the planning unit 134, and the
like.
[0071] The traffic rule recognition unit 152 performs recognition
processing of traffic rules around the host vehicle on the basis of
data or signals from each unit of the vehicle control system 100
such as the self-position estimation unit 132, the vehicle exterior
information detection unit 141, the map analysis unit 151, and the
like. By this recognition processing, for example, a position and a
state of a signal around the host vehicle, contents of traffic
regulations around the host vehicle, a travelable lane, and the
like are recognized. The traffic rule recognition unit 152 supplies
data indicating a result of the recognition processing to the
situation prediction unit 154 and the like.
[0072] The situation recognition unit 153 performs recognition
processing of a situation related to the host vehicle on the basis
of data or signals from each unit of the vehicle control system 100
such as the self-position estimation unit 132, the vehicle exterior
information detection unit 141, the vehicle interior information
detection unit 142, the vehicle state detection unit 143, the map
analysis unit 151, and the like. For example, the situation
recognition unit 153 performs recognition processing of a situation
of the host vehicle, a situation around the host vehicle, a
situation of a driver of the host vehicle, and the like.
Furthermore, the situation recognition unit 153 generates a local
map used to recognize the situation around the host vehicle
(hereinafter referred to as a situation recognition map) as
necessary. The situation recognition map is, for example, an
occupancy grid map.
[0073] The situation of the host vehicle to be recognized includes,
for example, a position, posture, and movement (for example, speed,
acceleration, a moving direction, and the like) of the host
vehicle, presence/absence and contents of abnormality, and the
like. The situation around the host vehicle to be recognized
includes, for example, a type and a position of a surrounding
stationary object, a type, a position, and movement (for example,
speed, acceleration, a moving direction, and the like) of a
surrounding moving object, a configuration of a surrounding road
and a state of a road surface, surrounding weather, temperature,
humidity, brightness, and the like. The state of the driver to be
recognized includes, for example, a physical condition, a
wakefulness level, a concentration level, a fatigue level, movement
of a line of sight, a driving operation, and the like.
[0074] The situation recognition unit 153 supplies data indicating
a result of the recognition processing (including the situation
recognition map as necessary) to the self-position estimation unit
132, the situation prediction unit 154, and the like. Furthermore,
the situation recognition unit 153 stores the situation recognition
map in the storage unit 111.
[0075] The situation prediction unit 154 performs prediction
processing of a situation related to the host vehicle on the basis
of data or signals from each unit of the vehicle control system 100
such as the map analysis unit 151, the traffic rule recognition
unit 152, the situation recognition unit 153, and the like. For
example, the situation prediction unit 154 performs prediction
processing of a situation of the host vehicle, a situation around
the host vehicle, a situation of the driver, and the like.
[0076] The situation of the host vehicle to be predicted includes,
for example, behavior of the host vehicle, occurrence of
abnormality, a travelable distance, and the like. The situation
around the host vehicle to be predicted includes, for example,
behavior of a moving object around the host vehicle, a change in a
signal state, a change in an environment such as weather, and the
like. The situation of the driver to be predicted includes, for
example, behavior, a physical condition, and the like of the
driver.
[0077] The situation prediction unit 154 supplies data indicating a
result of the prediction processing together with the data from the
traffic rule recognition unit 152 and the situation recognition
unit 153 to the route planning unit 161, the action planning unit
162, and the operation planning unit 163 of the planning unit 134,
and the like.
[0078] The route planning unit 161 plans a route to a destination
on the basis of data or signals from each unit of the vehicle
control system 100 such as the map analysis unit 151, the situation
prediction unit 154, and the like. For example, the route planning
unit 161 sets a route from a current position to a designated
destination on the basis of the global map. Furthermore, for
example, the route planning unit 161 appropriately changes the
route on the basis of a situation such as traffic congestion, an
accident, traffic regulation, construction, and the like, and a
physical condition of the driver, and the like. The route planning
unit 161 supplies data indicating the planned route to the action
planning unit 162 and the like.
[0079] The action planning unit 162 plans an action of the host
vehicle for safely traveling the route planned by the route
planning unit 161 within a planned time on the basis of data or
signals from each unit of the vehicle control system 100 such as
the map analysis unit 151, the situation prediction unit 154, and
the like. For example, the action planning unit 162 performs
planning of start, stop, a traveling direction (for example,
forward movement, backward movement, left turn, right turn,
direction change, and the like), a traveling lane, traveling speed,
overtaking, and the like. The action planning unit 162 supplies
data indicating the planned action of the host vehicle to the
operation planning unit 163 and the like.
[0080] The operation planning unit 163 plans operation of the host
vehicle for realizing the action planned by the action planning
unit 162 on the basis of data or signals from each unit of the
vehicle control system 100 such as the map analysis unit 151, the
situation prediction unit 154, and the like. For example, the
operation planning unit 163 plans acceleration, deceleration, a
travel trajectory, and the like. The operation planning unit 163
supplies data indicating the planned operation of the host vehicle
to an acceleration/deceleration control unit 172, a direction
control unit 173, and the like of the operation control unit
135.
[0081] The operation control unit 135 controls the operation of the
host vehicle. The operation control unit 135 includes the emergency
avoidance unit 171, the acceleration/deceleration control unit 172,
and the direction control unit 173.
[0082] The emergency avoidance unit 171 performs detection
processing of an emergency such as collision, contact, entry into a
danger zone, abnormality of the driver, abnormality of the vehicle,
and the like on the basis of the detection results of the vehicle
exterior information detection unit 141, the vehicle interior
information detection unit 142, and the vehicle state detection
unit 143. In a case of detecting occurrence of an emergency, the
emergency avoidance unit 171 plans operation of the host vehicle
for avoiding the emergency such as a sudden stop, a sudden turn,
and the like. The emergency avoidance unit 171 supplies data
indicating the planned operation of the host vehicle to the
acceleration/deceleration control unit 172, the direction control
unit 173, and the like.
[0083] The acceleration/deceleration control unit 172 performs
acceleration/deceleration control for realizing operation of the
host vehicle planned by the operation planning unit 163 or the
emergency avoidance unit 171. For example, the
acceleration/deceleration control unit 172 calculates a control
target value of the driving force generation device or the braking
device for realizing planned acceleration, deceleration, or sudden
stop, and supplies a control command indicating the calculated
control target value to the drive system control unit 107.
[0084] The direction control unit 173 performs direction control
for realizing operation of the host vehicle planned by the
operation planning unit 163 or the emergency avoidance unit 171.
For example, the direction control unit 173 calculates a control
target value of the steering mechanism for realizing the traveling
trajectory or the sudden turn planned by the operation planning
unit 163 or the emergency avoidance unit 171, and supplies a
control command indicating the calculated control target value to
the drive system control unit 107.
[0085] <Configuration Example of Object Recognition Model
201>
[0086] FIG. 2 illustrates a configuration example of an object
recognition model 201. The object recognition model 201 is used,
for example, in the vehicle exterior information detection unit 141
of the vehicle control system 100 in FIG. 1.
[0087] For example, a captured image which is image data obtained
by imaging the front of the vehicle 10 (for example, a captured
image 202) is input to the object recognition model 201 as input
data. Then, the object recognition model 201 performs recognition
processing of a vehicle in front of the vehicle 10 on the basis of
the captured image, and outputs an output image which is image data
indicating a recognition result (for example, an output image 203)
as output data.
[0088] The object recognition model 201 is a model using a
convolutional neural network (CNN), and includes a feature
extraction layer 211 and a prediction layer 212.
[0089] The feature extraction layer 211 includes a plurality of
hierarchies, and each hierarchy includes a convolutional layer, a
pooling layer, and the like. Each hierarchy of the feature
extraction layer 211 generates one or more feature maps indicating
features of the captured image by predetermined calculation, and
supplies the feature maps to the next hierarchy. Furthermore, some
hierarchies of the feature extraction layer 211 supply the feature
maps to the prediction layer 212.
[0090] Note that the size (the number of pixels) of the feature maps
generated in each hierarchy of the feature extraction layer 211
differs from hierarchy to hierarchy and gradually decreases as the
hierarchy becomes deeper.
[0091] FIG. 3 illustrates an example of feature maps 231-1 to 231-n
generated in and output from a hierarchy 221 that is an
intermediate layer surrounded by a dotted square of the feature
extraction layer 211 in FIG. 2.
[0092] In the hierarchy 221, n feature maps 231-1 to 231-n are
generated for one captured image. The feature maps 231-1 to 231-n
are, for example, image data in which 38 pixels in the vertical
direction × 38 pixels in the horizontal direction are
two-dimensionally arranged. Furthermore, the feature maps 231-1 to
231-n are feature maps having the largest size among the feature
maps used for the recognition processing in the prediction layer
212.
[0093] Note that, hereinafter, it is assumed that serial numbers
starting from 1 are assigned to the feature maps generated from
each captured image for each hierarchy. For example, it is assumed
that serial numbers 1 to n are respectively assigned to the feature
maps 231-1 to 231-n generated from one captured image in the
hierarchy 221. Similarly, it is assumed that serial numbers 1 to N
are assigned to feature maps generated from one captured image in
another hierarchy. Note that N represents the number of feature
maps generated from one captured image in the hierarchy.
[0094] The prediction layer 212 performs recognition processing of
a vehicle in front of the vehicle 10 on the basis of the feature
maps supplied from the feature extraction layer 211. The prediction
layer 212 outputs an output image indicating a recognition result
of the vehicle.
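
For reference, the following is a minimal sketch of how the feature maps output by an intermediate hierarchy such as the hierarchy 221 could be captured for analysis, assuming a PyTorch-style convolutional model; the module names, channel counts (512 maps of 38 × 38 pixels), and input size are illustrative assumptions and not details taken from the patent.

import torch
import torch.nn as nn

# Stand-in feature extraction layer: each hierarchy halves the feature map size.
class TinyFeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        self.hierarchy1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.hierarchy2 = nn.Sequential(
            nn.Conv2d(64, 512, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2))

    def forward(self, x):
        return self.hierarchy2(self.hierarchy1(x))

captured = {}

def save_feature_maps(module, inputs, output):
    # output shape: (batch, number_of_feature_maps, height, width)
    captured["feature_maps"] = output.detach()

model = TinyFeatureExtractor()
model.hierarchy2.register_forward_hook(save_feature_maps)

image = torch.randn(1, 3, 152, 152)    # dummy captured image
_ = model(image)
print(captured["feature_maps"].shape)  # torch.Size([1, 512, 38, 38])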
[0095] <Configuration Example of Information Processing
Apparatus 301>
[0096] FIG. 4 illustrates a configuration example of an information
processing apparatus 301 used for learning of a model using a
neural network (hereinafter referred to as a learning model) such
as the object recognition model 201 and the like in FIG. 2.
[0097] The information processing apparatus 301 includes an input
unit 311, a learning unit 312, a learning situation analysis unit
313, an output control unit 314, and an output unit 315.
[0098] The input unit 311 includes an input device used for
inputting various data, instructions, and the like, generates an
input signal on the basis of the input data, instructions, and the
like, and supplies the input signal to the learning unit 312. For
example, the input unit 311 is used to input teacher data for
learning of the learning model.
[0099] The learning unit 312 performs learning processing of the
learning model. Note that a learning method of the learning unit
312 is not limited to a specific method. Furthermore, the learning
unit 312 supplies a feature map generated in the learning model to
the learning situation analysis unit 313.
[0100] The learning situation analysis unit 313 performs analysis
and the like of a learning situation of the learning model by the
learning unit 312 on the basis of the feature map supplied from the
learning unit 312. The learning situation analysis unit 313
includes a feature data generation unit 321, an analysis data
generation unit 322, an analysis unit 323, and a parameter setting
unit 324.
[0101] The feature data generation unit 321 generates feature data
that numerically represents a feature of the feature map, and
supplies the feature data to the analysis data generation unit
322.
[0102] The analysis data generation unit 322 generates analysis
data in which the feature data of the plurality of feature maps is
arranged (arrayed), and supplies the analysis data to the analysis
unit 323 and the output control unit 314.
[0103] The analysis unit 323 performs analysis processing of the
learning situation of the learning model on the basis of the
analysis data, and supplies data indicating an analysis result to
the parameter setting unit 324.
[0104] The parameter setting unit 324 sets various parameters for
learning of the learning model on the basis of the analysis result
of the learning situation of the learning model. The parameter
setting unit 324 supplies data indicating the set parameters to the
learning unit 312.
[0105] The output control unit 314 controls output of various
information by the output unit 315. For example, the output control
unit 314 controls display of the analysis data by the output unit
315.
[0106] The output unit 315 includes an output device capable of
outputting various information such as visual information, auditory
information, and the like. For example, the output unit 315
includes a display, a speaker, and the like.
[0107] <Learning Situation Analysis Processing>
[0108] Next, learning situation analysis processing executed by the
information processing apparatus 301 will be described with
reference to a flowchart of FIG. 5.
[0109] This processing is started, for example, when the learning
processing of the learning model is started by the learning unit
312.
[0110] Note that, hereinafter, a case where the learning processing
of the object recognition model 201 in FIG. 2 is performed and its
learning situation is analyzed will be described as an example.
Furthermore, hereinafter, a case where 512 feature maps are
generated from one captured image in the hierarchy 221 of the
object recognition model 201 will be described as an example.
[0111] In step S1, the feature data generation unit 321 generates
feature data. For example, the feature data generation unit 321
calculates variance indicating a degree of dispersion of pixel
values of the feature maps by the following formula (1).
[Mathematical Formula 1]

var = \frac{1}{I \cdot J} \sum_{i=0}^{I} \sum_{j=0}^{J} \left( A_{i,j} - \mathrm{mean} \right)^{2}   (1)

mean = \frac{1}{I \cdot J} \sum_{i=0}^{I} \sum_{j=0}^{J} A_{i,j}   (2)
[0112] In formula (1), var represents the variance of the pixel
values of the feature map. I represents a value obtained by
subtracting 1 from the number of pixels in the column direction
(vertical direction) of the feature map, and J represents a value
obtained by subtracting 1 from the number of pixels in the row
direction (horizontal direction) of the feature map. A_{i,j}
represents the pixel value at coordinates (i, j) of the feature map.
mean represents the average of the pixel values of the feature map,
and is calculated by formula (2).
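
As an illustration, a minimal NumPy sketch of formulas (1) and (2), normalizing by the total number of pixels of the feature map, might look as follows; the function name and test data are hypothetical.

import numpy as np

def feature_map_variance(feature_map: np.ndarray) -> float:
    # mean of the pixel values of the feature map (formula (2))
    mean = feature_map.mean()
    # variance of the pixel values of the feature map (formula (1))
    return float(((feature_map - mean) ** 2).mean())

example_map = np.random.rand(38, 38).astype(np.float32)  # dummy 38 x 38 feature map
print(feature_map_variance(example_map))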
[0113] Note that A of FIG. 6 illustrates an example of feature maps
having large variance of pixel values, and B of FIG. 6 illustrates
an example of feature maps having small variance of pixel values. A
feature map having larger variance of pixel values has a higher
possibility of capturing a feature of a captured image, and a
feature map having smaller variance of pixel values has a lower
possibility of capturing a feature of a captured image.
[0114] For example, every time teacher data is input, the learning
unit 312 performs learning processing of the object recognition
model 201, and supplies a plurality of feature maps generated from
a captured image included in the teacher data in the hierarchy 221
of the object recognition model 201 to the feature data generation
unit 321. The feature data generation unit 321 calculates variance
of pixel values of each feature map, and supplies feature data
indicating the calculated variance to the analysis data generation
unit 322.
[0115] In step S2, the analysis data generation unit 322 generates
analysis data.
[0116] FIG. 7 illustrates an example in which the feature maps
generated in the hierarchy 221 of the object recognition model 201
are arranged in a line. The feature maps contain useful
information, but are not very suitable for analyzing the learning
situation of the object recognition model 201. For example, in a
case where 200 captured images are used for the learning
processing, a total of 102,400 (= 512 × 200) feature maps are
generated in the hierarchy 221. Therefore, it is difficult to
analyze the learning situation of the object recognition model 201
using the feature maps as they are because an information amount is
too large.
[0117] On the other hand, the analysis data generation unit 322
arranges the feature data of the plurality of feature maps
generated from the plurality of captured images in the hierarchy
221 in predetermined order, thereby compressing an information
amount and generating one piece of analysis data.
[0118] FIG. 8 illustrates an example in which feature maps
generated from 200 captured images are two-dimensionally arranged
in the hierarchy 221 of the object recognition model 201. In the
drawing, a column in the vertical direction indicates a captured
image number, and a row in the horizontal direction indicates a
feature map number. In each row, 512 feature maps generated from
each captured image are arranged in numerical order from left to
right. For example, in the first row, 512 feature maps generated
from the first captured image are arranged in numerical order from
left to right.
[0119] Then, the analysis data generation unit 322 arranges 512
pieces of feature data (variance of each feature map) based on 512
feature maps generated from each captured image in numerical order
of corresponding feature maps for each captured image, thereby
generating a 512-dimensional vector (hereinafter referred to as a
variance vector). For example, for a captured image 1 in FIG. 8,
one variance vector having 512 pieces of feature data based on
feature maps 1 to 512 generated from the captured image 1 as
elements is generated. Then, variance vectors 1 to 200 for the
captured images 1 to 200 are generated.
[0120] FIG. 9 illustrates an example of a variance vector and an
imaged variance vector. The image of the variance vector is
obtained by arranging pixels indicating colors according to values
of elements (feature data) of the variance vector in order in the
horizontal direction. For example, the color of the pixel is set to
be redder as the value of the feature data (variance of the feature
map), which is the pixel value, decreases, and to be bluer as the
value of the feature data (variance of the feature map), which is
the pixel value, increases. Note that the image of the variance
vector is actually a color image, but is illustrated by a grayscale
image here.
[0121] Then, the analysis data generation unit 322 generates
analysis data including image data in which elements (feature data)
of the variance vector of each captured image are pixels.
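
Expressed as code, the generation of the analysis data from the variance vectors could be sketched as follows; this is a NumPy illustration under the assumption that the feature maps of each captured image are already available in numerical order, and the names are hypothetical.

import numpy as np

def build_analysis_data(feature_maps_per_image):
    # feature_maps_per_image: one array per captured image,
    # each of shape (number_of_feature_maps, height, width).
    rows = []
    for maps in feature_maps_per_image:
        # variance vector: one variance value per feature map, in numerical order
        variances = maps.reshape(maps.shape[0], -1).var(axis=1)
        rows.append(variances)
    # analysis data: shape (number_of_captured_images, number_of_feature_maps)
    return np.stack(rows, axis=0)

# Example: 200 captured images, 512 feature maps of 38 x 38 pixels each
dummy = [np.random.rand(512, 38, 38) for _ in range(200)]
analysis_data = build_analysis_data(dummy)
print(analysis_data.shape)  # (200, 512)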
[0122] FIG. 10 illustrates an example in which analysis data
generated on the basis of the feature maps of FIG. 8 is imaged.
Note that, for example, similarly to the image of the variance
vector in FIG. 9, a color of a pixel of the analysis data is set to
be redder as a value of feature data, which is a pixel value,
decreases, and to be bluer as the value of the feature data, which
is the pixel value, increases. Note that the image of the analysis
data is actually a color image, but is illustrated by a grayscale
image here. Furthermore, an x-axis direction (horizontal direction)
of the analysis data indicates a feature map number, and a y-axis
direction (vertical direction) thereof indicates a captured image
number.
[0123] In the analysis data of FIG. 10, feature data of each
feature map is arranged in the same arrangement order as the
feature maps of FIG. 8. In other words, in the x-axis direction
(horizontal direction) of the analysis data, 512 pieces of feature
data based on the 512 feature maps generated from the same captured
image are arranged in numerical order of the feature maps.
Furthermore, in the y-axis direction of the analysis data, the
feature data based on the feature maps corresponding to each other
(of the same number) of the different captured images is arranged
in numerical order of the captured images. A white circle in the
drawing indicates a position of a pixel corresponding to the 200th
feature map 200 of the 100th captured image 100.
[0124] The analysis data generation unit 322 supplies the generated
analysis data to the analysis unit 323 and the output control unit
314.
[0125] The output unit 315 displays the analysis data, for example,
under the control of the output control unit 314.
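
One possible way to display the analysis data in the manner of FIG. 10, with smaller values rendered redder and larger values bluer, is sketched below; the matplotlib usage and colormap choice are assumptions for illustration, not part of the patent.

import matplotlib.pyplot as plt

def show_analysis_data(analysis_data):
    # Reversed "coolwarm" colormap: small variance -> red, large variance -> blue.
    plt.imshow(analysis_data, cmap="coolwarm_r", aspect="auto")
    plt.xlabel("feature map number")
    plt.ylabel("captured image number")
    plt.colorbar(label="variance of feature map pixel values")
    plt.show()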
[0126] In step S3, the analysis unit 323 analyzes a learning
situation by using the analysis data.
[0127] FIG. 11 illustrates an example of a state of change in the
analysis data with progress of learning. Analysis data in A to E of
FIG. 11 is arranged in learning progress order. The analysis data
in A of FIG. 11 is the oldest, and the analysis data in E of FIG.
11 is the latest.
[0128] In the analysis data in A of FIG. 11 at the beginning of
learning, there are many horizontal lines (lines in the x-axis
direction) and vertical lines (lines in the y-axis direction). On
the other hand, as the learning progresses, the number of
horizontal lines and vertical lines decreases. In the analysis data
in E of FIG. 11, no horizontal lines exist, and the number of
vertical lines has decreased to about ten.
[0129] Here, a horizontal line is, for example, a row in the x-axis
direction in which the number of pixels having pixel values (values
of feature data) equal to or greater than a predetermined threshold
value is equal to or greater than a predetermined number. A vertical
line is, for example, a column in the y-axis direction in which the
number of pixels having pixel values (values of feature data) equal
to or greater than a predetermined threshold value is equal to or
greater than a predetermined number.
Furthermore, the number of horizontal lines and vertical lines is
counted for each row in the x-axis direction and each column in the
y-axis direction of the analysis data. In other words, even if a
plurality of horizontal lines or a plurality of vertical lines is
adjacent to each other and looks like one line, the lines are
counted as different lines.
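
Under this definition, counting the horizontal and vertical lines of the analysis data could be sketched as follows; the threshold values are illustrative assumptions, and separate count thresholds could of course be used for rows and columns.

import numpy as np

def count_lines(analysis_data, value_threshold, count_threshold):
    # Pixels whose value (feature data) is at or above the value threshold.
    high = analysis_data >= value_threshold
    # A vertical line is a column with at least count_threshold such pixels;
    # a horizontal line is a row with at least count_threshold such pixels.
    vertical_lines = int((high.sum(axis=0) >= count_threshold).sum())
    horizontal_lines = int((high.sum(axis=1) >= count_threshold).sum())
    return vertical_lines, horizontal_lines

# Example (thresholds chosen arbitrarily for illustration):
# v, h = count_lines(analysis_data, value_threshold=0.08, count_threshold=150)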
[0130] Then, as illustrated in this example, a state in which there
is no horizontal line of the analysis data and the vertical lines
thereof converge within a predetermined range is an ideal state,
and is a state in which learning of the learning model is
appropriately performed.
[0131] Specifically, in a case where there is a feature map in
which variance of pixel values increases regardless of contents of
a captured image (hereinafter referred to as a high variance
feature map), a vertical line appears in a column corresponding to
the high variance feature map of the analysis data. The high
variance feature map is a feature map that extracts a feature of a
captured image regardless of contents of the captured image and is
highly likely to contribute to recognition of an object of the
learning model.
[0132] On the other hand, in a case where there is a feature map in
which variance of pixel values decreases regardless of contents of
a captured image (hereinafter referred to as a low variance feature
map), no vertical line appears in a column corresponding to the low
variance feature map of the analysis data. The low variance feature
map is a feature map that does not extract a feature of a captured
image regardless of contents of the captured image and is less
likely to contribute to recognition of an object of the learning
model.
[0133] Therefore, as the number of high variance feature maps
increases, the number of vertical lines of the analysis data
increases, and as the number of high variance feature maps
decreases, the number of vertical lines of the analysis data
decreases.
[0134] Note that the high variance feature map does not necessarily
contribute to recognition of an object to be recognized by the
learning model (for example, a vehicle), and in a case where
learning is not sufficient, there is also a case where it
contributes to recognition of an object other than the object to be
recognized. However, if the learning model has been appropriately
learned, the high variance feature map is a feature map that
contributes to recognition of an object to be recognized by the
learning model.
[0135] Here, in general, in learning processing of a learning model
using a neural network, regularization processing is performed in
order to suppress over-learning, reduce weight of the neural
network, and the like. By this regularization processing, a type of
a feature amount extracted from a captured image is narrowed down
to some extent. In other words, the number of feature maps from
which features of the captured image can be extracted, that is, the
number of high variance feature maps is narrowed down to some
extent.
[0136] For example, as illustrated in FIG. 12, there are many high
variance feature maps at the beginning of learning, but the number
of high variance feature maps is narrowed down at the end of the
learning.
[0137] Therefore, for example, as indicated by arrows in FIG. 13, a
state in which the number of vertical lines of the analysis data
converges within a predetermined range is an ideal state in which
regularization is normally performed.
[0138] On the other hand, as illustrated in FIG. 14, a state in
which the number of vertical lines of the analysis data decreases
too much or becomes zero as the learning progresses is a state in
which the regularization is too strong and the feature of the
captured image cannot be sufficiently extracted. Furthermore, as
illustrated in FIG. 15, a state in which the number of vertical
lines of the analysis data does not decrease even when the learning
progresses is a state in which the regularization is too weak and
types of the features to be extracted from the captured image are
not completely narrowed down.
[0139] Therefore, in a case where the number of vertical lines of
the analysis data converges within a predetermined range, the
analysis unit 323 determines that regularization of the learning
processing is normally performed. On the other hand, in a case
where the number of vertical lines of the analysis data converges
to a value exceeding the predetermined range or in a case where the
number of vertical lines does not converge, the analysis unit 323
determines that the regularization is too weak. Furthermore, in a
case where the number of vertical lines of the analysis data
converges to a value less than the predetermined range, the
analysis unit 323 determines that the regularization is too
strong.
[0140] Note that even in a case where the regularization processing
is not performed, the number of vertical lines of the analysis data
decreases and converges as the learning progresses. In this case,
however, the number of vertical lines of the analysis data converges
to a larger value than in a case where the regularization processing
is performed. Nevertheless, similarly to the case where the
regularization processing is performed, it is possible to determine
whether or not the learning processing is normally performed on the
basis of the number of vertical lines.
[0141] Furthermore, in a case where there is a captured image in
which variance of pixel values of most of the feature maps
increases (hereinafter referred to as a high variance captured
image), a horizontal line appears in a row corresponding to the
high variance captured image of the analysis data. On the other
hand, in a case where there is a captured image in which variance
of pixel values of most of the feature maps decreases (hereinafter
referred to as a low variance captured image), a horizontal line
does not appear in a row corresponding to the low variance captured
image of the analysis data.
[0142] Therefore, in a case where the high variance captured image
and the low variance captured image are mixed, a horizontal line
appears in the analysis data. This indicates a state in which role
sharing of each feature map is not sufficiently performed, that is,
features extracted by each feature map are not clearly separated,
and the features extracted by each feature map vary depending on
contents of the captured images.
[0143] Therefore, the smaller the number of horizontal lines of the
analysis data, the better, and a state in which the number of
horizontal lines is zero is the most ideal state.
[0144] Therefore, in a case where the number of horizontal lines of
the analysis data converges to a value less than a predetermined
threshold value, the analysis unit 323 determines that the learning
processing is normally performed. On the other hand, in a case
where the number of horizontal lines of the analysis data converges
to a value equal to or more than the predetermined threshold value,
or in a case where the number of horizontal lines does not
converge, the analysis unit 323 determines that the learning
processing is not normally performed.
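The determinations described in [0139] and [0144] can be sketched, for
example, as follows. This is a simplified illustration in which
convergence is judged from whether the counts of recent iterations
stay within a small tolerance; the convergence criterion, the function
names, and the arguments expected_range and threshold are assumptions
made only for this illustration.

def has_converged(history, window=5, tolerance=1):
    # Treat a count as converged when its last `window` values stay
    # within `tolerance` of each other (an illustrative criterion).
    recent = history[-window:]
    return len(recent) == window and max(recent) - min(recent) <= tolerance

def judge_regularization(vertical_line_history, expected_range):
    # Judge the regularization from the number of vertical lines.
    low, high = expected_range
    if not has_converged(vertical_line_history):
        return "too weak"          # the count does not converge
    final = vertical_line_history[-1]
    if final > high:
        return "too weak"          # converges above the range
    if final < low:
        return "too strong"        # converges below the range
    return "normal"                # converges within the range

def judge_horizontal(horizontal_line_history, threshold):
    # Judge role sharing of the feature maps from horizontal lines.
    if has_converged(horizontal_line_history) and horizontal_line_history[-1] < threshold:
        return "normal"
    return "not normal"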
[0145] Then, the analysis unit 323 supplies data indicating an
analysis result of the learning situation to the parameter setting
unit 324.
[0146] In step S4, the parameter setting unit 324 adjusts a
parameter of the learning processing on the basis of the analysis
result. For example, in a case where the analysis unit 323
determines that the regularization is too strong, the parameter
setting unit 324 makes a value of a regularization parameter used
for the regularization processing smaller than a current value. On
the other hand, in a case where the analysis unit 323 determines
that the regularization is too weak, the parameter setting unit 324
makes the value of the regularization parameter larger than the
current value. Note that the larger the value of the regularization
parameter, the stronger the regularization, and the smaller the
value of the regularization parameter, the weaker the
regularization. The parameter setting unit 324 supplies data
indicating the regularization parameter after adjustment to the
learning unit 312.
[0147] The learning unit 312 sets the value of the regularization
parameter to the value set by the parameter setting unit 324. As a
result, the regularization is performed more appropriately in the
learning processing of the learning model.
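For example, the adjustment of step S4 can be sketched as follows. The
multiplicative step and the function name are illustrative
assumptions; the judgement values correspond to the determinations of
the analysis unit 323 described above.

def adjust_regularization_parameter(current_value, judgement, step=0.5):
    # A larger regularization parameter strengthens the regularization
    # and a smaller one weakens it, so the parameter is decreased when
    # the regularization is too strong and increased when it is too weak.
    if judgement == "too strong":
        return current_value * step
    if judgement == "too weak":
        return current_value / step
    return current_value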
[0148] Note that, for example, a user may adjust a learning parameter
such as the regularization parameter with reference to the analysis
data displayed on the output unit 315.
[0149] In step S5, the learning situation analysis unit 313
determines whether or not the learning processing by the learning
unit 312 has ended. In a case where it is determined that the
learning processing has not ended yet, the processing returns
to step S1. Thereafter, the processing of steps S1 to S5 is
repeatedly executed until it is determined in step S5 that the
learning processing has ended.
[0150] On the other hand, in a case where it is determined in step
S5 that the learning processing has ended, the learning situation
analysis processing ends.
[0151] As described above, the learning situation of the learning
model by the learning unit 312 can be analyzed. Furthermore, it is
possible to appropriately set the parameter of the learning
processing on the basis of the analysis result to improve learning
accuracy or shorten a learning time.
[0152] Furthermore, the user can easily recognize the learning
situation of the learning model by visually recognizing the
analysis data.
2. Case Example of Analysis of Learning Situation
[0153] Next, a case example in which a learning situation of a
learning model is analyzed using the information processing
apparatus 301 of FIG. 4 will be described with reference to FIGS.
16 to 22.
[0154] Note that, in the above description, an example of analyzing
the learning situation of the object recognition model 201 that
performs the recognition processing of the vehicle on the basis of
the captured image has been described. However, the type of image
data used by the learning model to be analyzed by the information
processing apparatus 301 and the type of the object to be
recognized are not particularly limited. Hereinafter, a case
example will be described in which a learning situation of an
object recognition model that performs recognition processing of a
vehicle in front of the vehicle 10 is analyzed on the basis of a
captured image obtained by imaging the front of the vehicle 10 and
a millimeter wave image output from a millimeter wave radar that
monitors the front of the vehicle 10.
[0155] <Configuration Example of Object Recognition Model
401>
[0156] FIG. 16 illustrates a configuration example of an object
recognition model 401 that is a learning model to be analyzed for a
learning situation. The object recognition model 401 is used, for
example, in the vehicle exterior information detection unit 141 of
the vehicle control system 100 in FIG. 1.
[0157] For example, a captured image that is image data obtained by
imaging the front of the vehicle 10 (for example, a captured image
402) and a millimeter wave image output from the millimeter wave
radar that monitors the front of the vehicle 10 (for example, a
millimeter wave image 403) are input to the object recognition
model 401 as input data. Note that the millimeter wave image is,
for example, image data representing a distribution of intensity of
a received signal of the millimeter wave radar reflected by the
object in front of the vehicle 10 in a bird's-eye view. Then, the
object recognition model 401 performs recognition processing of a
vehicle in front of the vehicle 10 on the basis of the captured
image and the millimeter wave image, and outputs an output image
which is image data indicating a recognition result (for example,
an output image 404) as output data.
[0158] The object recognition model 401 is a learning model using a
Deconvolutional Single Shot Detector (DSSD). The object recognition
model 401 includes a feature extraction layer 411, a feature
extraction layer 412, a combining unit 413, and a prediction layer
414.
[0159] The feature extraction layer 411 has a configuration similar
to that of the feature extraction layer 211 of the object
recognition model 201 in FIG. 2. Each hierarchy of the feature
extraction layer 411 generates a feature map indicating a feature
of the captured image by predetermined calculation, and supplies
the feature map to the next hierarchy. Furthermore, some
hierarchies of the feature extraction layer 411 supply the feature
maps to the combining unit 413.
[0160] The feature extraction layer 412 has a hierarchical
structure, and each hierarchy includes a convolutional layer, a
pooling layer, and the like. Each hierarchy of the feature
extraction layer 412 generates a feature map indicating a feature
of the millimeter wave image by predetermined calculation, and
supplies the feature map to the next hierarchy. Furthermore, some
hierarchies of the feature extraction layer 412 supply the feature
maps to the combining unit 413. Moreover, the feature extraction
layer 412 converts the millimeter wave image into an image in the
same camera coordinate system as the captured image.
[0161] The combining unit 413 combines the feature maps output from
the corresponding hierarchies of the feature extraction layer 411
and the feature extraction layer 412, and supplies the combined
feature maps to the prediction layer 414.
[0162] The prediction layer 414 performs recognition processing of
a vehicle in front of the vehicle 10 on the basis of the feature
maps supplied from the combining unit 413. The prediction layer 414
outputs an output image indicating a recognition result of the
vehicle.
[0163] Note that the object recognition model 401 is divided into a
camera network 421, a millimeter wave radar network 422, and a
combining network 423.
[0164] The camera network 421 includes a first half portion of the
feature extraction layer 411 that generates a feature map that is
not to be combined from the captured image.
[0165] The millimeter wave radar network 422 includes a first half
portion of the feature extraction layer 412 that generates a
feature map that is not to be combined from the millimeter wave
image.
[0166] The combining network 423 includes a second half portion of
the feature extraction layer 411 that generates a feature map to be
combined, a second half portion of the feature extraction layer 412
that generates a feature map to be combined, the combining unit
413, and the prediction layer 414.
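As a rough, hedged sketch of this kind of two-branch structure (not
the actual configuration of the object recognition model 401 and not a
full DSSD), the camera network, the millimeter wave radar network, and
the combining network may be illustrated in PyTorch as follows. The
channel counts, the use of concatenation for the combining unit, and
the assumption that both inputs already share the same camera
coordinate system and size are all illustrative.

import torch
import torch.nn as nn

class TwoBranchRecognizer(nn.Module):
    # Simplified two-branch recognizer: a camera branch and a radar
    # branch each produce feature maps, which are combined and fed to a
    # small prediction head (layer sizes and operations are illustrative).
    def __init__(self):
        super().__init__()
        self.camera_branch = nn.Sequential(   # feature extraction from the captured image
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.radar_branch = nn.Sequential(    # feature extraction from the millimeter wave image
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.prediction = nn.Conv2d(128, 1, 1)  # 1-channel map of the recognition result

    def forward(self, captured_image, millimeter_wave_image):
        f_cam = self.camera_branch(captured_image)
        f_mmw = self.radar_branch(millimeter_wave_image)
        combined = torch.cat([f_cam, f_mmw], dim=1)  # combining unit (concatenation assumed)
        return self.prediction(combined)

In an actual DSSD-based configuration, feature maps from several
hierarchies of both branches would be combined and deconvolution
layers would be used on the prediction side; those details are omitted
from this sketch.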
[0167] Here, learning processing of the object recognition model 401
has been performed, and recognition processing of a vehicle in front
of the vehicle 10 has then been performed using the object recognition
model 401 after learning. As a result, the recognition of the vehicle
has failed, with nothing being recognized at all.
[0168] Therefore, a learning situation of the object recognition
model 401 has been analyzed by the learning situation analysis
processing described above with reference to FIG. 5. Specifically,
analysis processing of a feature map generated in and output from a
hierarchy 431, which is an intermediate layer of the feature
extraction layer 411 surrounded by a thick frame line in FIG. 16,
and a feature map generated in and output from a hierarchy 432,
which is an intermediate layer of the feature extraction layer 412,
has been performed. Note that the feature map generated in the
hierarchy 431 is a feature map having the largest size among the
feature maps of the feature extraction layer 411 used for the
recognition processing of the prediction layer 414. The feature map
generated in the hierarchy 432 is a feature map having the largest
size among the feature maps of the feature extraction layer 412
used for the recognition processing of the prediction layer
414.
[0169] Then, in analysis data based on the feature map generated in
the hierarchy 431 of the feature extraction layer 411, as the
learning processing proceeds, a horizontal line disappears, and the
number of vertical lines converges within a predetermined range, as
illustrated in FIG. 11 described above.
[0170] On the other hand, in analysis data based on the feature map
generated in the hierarchy 432 of the feature extraction layer 412,
as illustrated in A to E of FIG. 17, even if the learning
processing proceeds, horizontal lines do not disappear and remain.
In other words, it has been found that learning is not
appropriately performed in the feature extraction layer 412 and a
feature of the millimeter wave image is not appropriately
extracted. Note that analysis data in A to E of FIG. 17 is arranged
in learning progress order, similarly to A to E of FIG. 11. FIG. 18
is an enlarged view of the analysis data in E of FIG. 17.
[0171] Therefore, in order to investigate a cause of appearance of
the horizontal lines in the analysis data, the captured images and
the millimeter wave images corresponding to rows in which the
horizontal lines appear have been examined.
[0172] FIGS. 19 to 21 each schematically illustrate a millimeter
wave image and a captured image corresponding to a row indicated by
an arrow in FIG. 18 in which the horizontal line of the analysis
data appears. A of FIGS. 19 to 21 illustrates the millimeter wave
image converted into a grayscale image, and B of FIGS. 19 to 21
illustrates the captured image as a diagram.
[0173] Then, as illustrated in the examples of the captured images in
B of FIGS. 19 to 21, it has been found that the vehicle 10 travels in
a center lane of a highway in all the rows in which the horizontal
lines of the analysis data appear.
[0174] Furthermore, FIG. 22 illustrates an example of feature maps
of rows in which horizontal lines of analysis data appear. A
millimeter wave image 501 is obtained by converting a millimeter
wave image of rows in which horizontal lines of analysis data
appear into a grayscale image, and feature maps 502-1 to 502-4 are
some of feature maps generated from the millimeter wave image
501.
[0175] In the feature maps 502-1 to 502-4, pixel values greatly
change near left and right ends in front of the vehicle 10.
Therefore, it can be seen that, in the feature maps generated from
the millimeter wave image, features corresponding to not the
vehicle in front of the vehicle 10 but left and right walls and the
like in front of the vehicle are easily extracted.
[0176] Therefore, it has been found that there is a possibility
that the feature extraction layer 412 is more suitable for
recognizing the left and right walls and the like than the vehicle
in front of the vehicle 10.
[0177] As described above, by using the analysis data, it is
possible to easily specify the cause of the failure of the object
recognition model 401 in recognizing the vehicle.
3. Modified Examples
[0178] Hereinafter, modified examples of the above-described
embodiments of the present technology will be described.
[0179] Feature data used for analysis data is not limited to the
variance of the pixel values of the feature map described above,
and other numerical values representing the features of the feature
map can be used.
[0180] For example, an average value, a maximum value, a median
value, and the like of the pixel values of the feature map can be
used for the feature data.
[0181] Furthermore, for example, a norm of the feature map
calculated by the following formula (3) may be used for the feature
data.
[Mathematical formula 2]

norm = \left\{ \dfrac{1}{I \times J} \sum_{i=0}^{I} \sum_{j=0}^{J} \left| A_{i,j} - \mathrm{mean} \right|^{n} \right\}^{m}  (3)
[0182] norm in the formula (3) indicates the norm of the feature
map. n and m represent arbitrary numbers. Other symbols are similar
to those in the above-described formula (1).
[0183] The norm of the feature map increases as a degree of
dispersion of pixel values of pixels with respect to an average
value of the pixel values of the feature map increases, and
decreases as the degree of dispersion of the pixel values of the
pixels with respect to the average value of the pixel values of the
feature map decreases. Therefore, the norm of the feature map
indicates the degree of dispersion of the pixel values of the
feature map.
[0184] Moreover, for example, a frequency distribution of the pixel
values of the feature map may be used for the feature data.
[0185] FIG. 23 illustrates an example of a histogram (frequency
distribution) of the pixel values of the feature map. A horizontal
axis indicates a grade based on the pixel values of the feature
map. In other words, the pixel values of the feature map are
classified into a plurality of grades. A vertical axis indicates a
frequency. In other words, the number of pixels of the feature map
whose pixel values belong to each grade is indicated.
[0186] Then, analysis data illustrated in FIG. 24 is generated on
the basis of a plurality of feature maps generated from a plurality
of input images (for example, the captured image, the millimeter
wave image, and the like described above).
[0187] Analysis data 521 in FIG. 24 is three-dimensional data in
which two-dimensional frequency maps 522-1 to 522-m generated for
each grade of the histogram are arranged in a z-axis direction (a
depth direction).
[0188] Specifically, an x-axis direction (a horizontal direction)
of the frequency map 522-1 indicates a feature map number, and a
y-axis direction thereof indicates an input image number.
[0189] In the frequency map 522-1, frequencies of a first grade of
the histogram of the feature map of each input image are arranged
in the x-axis direction (horizontal direction) in numerical order
of the corresponding feature maps. Furthermore, frequencies
corresponding to the feature maps of the same number of different
input images are arranged in the y-axis direction (vertical
direction) in numerical order of the corresponding input
images.
[0190] Similarly, in the frequency maps 522-2 to 522-m, frequencies
of an i-th (i=2 to m) grade of the histogram of the feature map of
each input image are arranged in the x-axis direction and the
y-axis direction.
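For example, such three-dimensional analysis data may be assembled as
in the following sketch. The axis layout, with the grade along the
z-axis, the input image along the y-axis, and the feature map along
the x-axis, follows the description above; the array shapes, the
function name, and the use of NumPy histograms are illustrative
assumptions.

import numpy as np

def build_histogram_analysis_data(feature_maps, num_grades, value_range):
    # feature_maps: array of shape (num_images, num_maps, height, width).
    # Returns an array of shape (num_grades, num_images, num_maps), where
    # element [k, y, x] is the frequency of grade k for feature map x
    # generated from input image y (one frequency map 522-(k+1) per grade).
    num_images, num_maps = feature_maps.shape[:2]
    data = np.zeros((num_grades, num_images, num_maps))
    for y in range(num_images):
        for x in range(num_maps):
            hist, _ = np.histogram(feature_maps[y, x],
                                   bins=num_grades, range=value_range)
            data[:, y, x] = hist
    return data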
[0191] The frequency maps 522-1 to 522-m can be used for analysis
of a learning situation on the basis of vertical lines and
horizontal lines, for example, similarly to the two-dimensional
analysis data described above. Furthermore, for example, the
analysis data 521 can be used for the analysis of the learning
situation on the basis of lines in the z-axis direction.
[0192] Furthermore, for example, only feature data satisfying a
predetermined condition may be extracted from among feature data
based on a plurality of feature maps generated from one or more
image data, and analysis data including the extracted feature data
may be generated. For example, in a case where the feature data is
variance of pixel values of a feature map, feature data having a
value equal to or greater than a predetermined threshold value may
be extracted from among feature data based on a plurality of
feature maps generated from one image data, and analysis data
including the extracted feature data may be generated. In this
case, the analysis data includes only feature data (=variance of
pixel values) of a feature map having variance of pixel values
equal to or greater than the predetermined threshold value.
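A minimal sketch of this extraction, again with NumPy and with the
function name and the shape convention of feature_maps taken as
assumptions, is as follows.

import numpy as np

def extract_high_variance_feature_data(feature_maps, threshold):
    # feature_maps: array of shape (num_maps, height, width) generated
    # from one image. Only the feature data (variance of pixel values)
    # equal to or greater than the threshold is kept for the analysis data.
    variances = feature_maps.reshape(feature_maps.shape[0], -1).var(axis=1)
    return variances[variances >= threshold]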
[0193] Moreover, for example, analysis data may be generated on the
basis of a plurality of feature maps generated for one or more
image data in different hierarchies of a neural network. For
example, three-dimensional analysis data may be generated by
laminating, in the z-axis direction, data for each hierarchy
obtained by two-dimensionally arraying feature data of the feature
map generated in each hierarchy in the x-axis direction and the
y-axis direction. In other words, in this analysis data, feature
data of feature maps generated in the same hierarchy is arranged in
the x-axis direction and the y-axis direction, and feature data of
feature maps generated in different hierarchies is arranged in the
z-axis direction.
[0194] Furthermore, in the above description, an example has been
described in which the analysis processing of the learning
situation is performed in parallel with the learning processing.
However, for example, the analysis processing of the learning
situation can be performed after the end of the learning
processing. For example, feature maps generated during the learning
processing may be accumulated, and the analysis data may be
generated on the basis of the feature maps accumulated after the
learning processing to analyze the learning situation.
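For example, feature maps of a particular hierarchy can be accumulated
during the learning processing as in the following sketch, so that the
analysis data is generated afterwards. PyTorch forward hooks are
assumed for illustration, and the model and the layer being hooked are
hypothetical.

def attach_feature_map_recorder(layer, storage):
    # Record the feature maps output from the given hierarchy each time
    # the model runs; the recorded maps can later be turned into feature
    # data and analysis data after the learning processing has ended.
    def hook(module, inputs, output):
        storage.append(output.detach().cpu())
    return layer.register_forward_hook(hook)

# Usage (hypothetical model and layer):
# recorded_feature_maps = []
# handle = attach_feature_map_recorder(model.feature_extraction_layer, recorded_feature_maps)
# ... run the learning processing ...
# handle.remove()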
[0195] Moreover, the learning model whose learning processing is to
be analyzed is not limited to the above-described example, and can be
any general learning model using a neural network. For example, a
recognition model that recognizes an object other than the vehicle
and a recognition model that recognizes a plurality of objects
including the vehicle are also targets. Furthermore, a learning
model whose input data is other than image data is also a target.
For example, a voice recognition model using voice data as the
input data, a sentence analysis model using sentence data as the
input data, and the like are also targets.
4. Others
[0196] <Configuration Example of Computer>
[0197] The series of processing described above can be executed by
hardware or software. In a case where the series of processing is
executed by the software, a program constituting the software is
installed on a computer. Here, the computer includes a computer
incorporated in dedicated hardware, a general-purpose personal
computer capable of executing various functions by installing
various programs, and the like, for example.
[0198] FIG. 25 is a block diagram illustrating a configuration
example of hardware of a computer that executes the above-described
series of processing by a program.
[0199] In a computer 1000, a central processing unit (CPU) 1001, a
read only memory (ROM) 1002, and a random access memory (RAM) 1003
are mutually connected by a bus 1004.
[0200] Moreover, an input/output interface 1005 is connected to the
bus 1004. An input unit 1006, an output unit 1007, a recording unit
1008, a communication unit 1009, and a drive 1010 are connected to
the input/output interface 1005.
[0201] The input unit 1006 includes an input switch, a button, a
microphone, an imaging element, and the like. The output unit 1007
includes a display, a speaker, and the like. The recording unit
1008 includes a hard disk, a nonvolatile memory, and the like. The
communication unit 1009 includes a network interface and the like.
The drive 1010 drives a removable medium 1011 such as a magnetic
disk, an optical disk, a magneto-optical disk, or a semiconductor
memory.
[0202] In the computer 1000 configured as described above, for
example, the CPU 1001 loads a program recorded in the recording
unit 1008 into the RAM 1003 via the input/output interface 1005 and
the bus 1004 and executes the program, whereby the above-described
series of processing is performed.
[0203] The program executed by the computer 1000 (CPU 1001) can be
provided by recording on the removable medium 1011 as a package
medium and the like, for example. Furthermore, the program can be
provided via a wired or wireless transmission medium such as a
local area network, the Internet, or digital satellite
broadcasting.
[0204] In the computer 1000, the program can be installed in the
recording unit 1008 via the input/output interface 1005 by
attaching the removable medium 1011 to the drive 1010. Furthermore,
the program can be received by the communication unit 1009 via the
wired or wireless transmission medium and installed in the
recording unit 1008. In addition, the program can be installed in
the ROM 1002 or the recording unit 1008 in advance.
[0205] Note that the program executed by the computer may be a
program in which processing is performed in time series in the
order described in the present specification, or may be a program
in which processing is performed in parallel or at necessary timing
such as when a call is made, and the like.
[0206] Furthermore, in the present specification, the system means
a set of a plurality of components (devices, modules (parts), and
the like), and it does not matter whether or not all the components
are in the same housing. Therefore, a plurality of devices housed
in separate housings and connected via a network, and one device
housing a plurality of modules in one housing are both systems.
[0207] Moreover, embodiments of the present technology are not
limited to the above-described embodiments, and various
modifications can be made without departing from the scope of the
present technology.
[0208] For example, the present technology can be configured as
cloud computing in which one function is shared and jointly
processed by a plurality of devices via a network.
[0209] Furthermore, each step described in the above-described
flowcharts can be executed by one device or shared and executed by
a plurality of devices.
[0210] Moreover, in a case where one step includes a plurality of
processing, the plurality of processing included in the one step
can be executed by one device or shared and executed by a plurality
of devices.
[0211] <Combination Example of Configurations>
[0212] The present technology can have the following
configurations.
[0213] (1)
[0214] An information processing method including:
[0215] a feature data generation step of generating feature data
that numerically represents a feature of a feature map generated
from input data in a model using a neural network; and an analysis
data generation step of generating analysis data based on the
feature data of a plurality of the feature maps.
[0216] (2)
[0217] The information processing method according to (1),
[0218] in which the feature data of a plurality of the feature maps
is arranged in the analysis data.
[0219] (3)
[0220] The information processing method according to (2),
[0221] in which in the analysis data, the feature data of a
plurality of the feature maps generated from a plurality of the
input data is arranged in a predetermined hierarchy of the model
that generates a plurality of the feature maps from one of the
input data.
[0222] (4)
[0223] The information processing method according to (3),
[0224] in which the feature data of a plurality of the feature maps
generated from the same input data is arranged in a first direction
of the analysis data, and the feature data of the feature maps
corresponding to each other of the different input data is arranged
in a second direction orthogonal to the first direction.
[0225] (5)
[0226] The information processing method according to (4),
[0227] in which the feature data indicates a degree of dispersion
of pixel values of the feature map.
[0228] (6)
[0229] The information processing method according to (5), further
including:
[0230] an analysis step of analyzing a learning situation of the
model on the basis of a line in the first direction and a line in
the second direction of the analysis data.
[0231] (7)
[0232] The information processing method according to (6), further
including:
[0233] a parameter setting step of setting a parameter for learning
of the model on the basis of an analysis result of the learning
situation of the model.
[0234] (8)
[0235] The information processing method according to (7),
[0236] in which in the parameter setting step, a regularization
parameter for learning of the model is set on the basis of the
number of lines in the second direction of the analysis data.
[0237] (9)
[0238] The information processing method according to (3),
[0239] in which the feature data indicates a frequency distribution
of pixel values of the feature map, and
[0240] in the analysis data, the feature data of a plurality of the
feature maps generated from a plurality of the input data is
three-dimensionally arranged in the hierarchy of the model.
[0241] (10)
[0242] The information processing method according to any one of
(1) to (9),
[0243] in which the input data is image data, and
[0244] the model performs recognition processing of an object.
[0245] (11)
[0246] The information processing method according to (10),
[0247] in which the model performs recognition processing of a
vehicle.
[0248] (12)
[0249] The information processing method according to (11),
[0250] in which the input data is image data representing a
distribution of intensity of a received signal of a millimeter wave
radar in a bird's-eye view.
[0251] (13)
[0252] The information processing method according to (12),
[0253] in which the model converts the image data into an image in
a camera coordinate system.
[0254] (14)
[0255] The information processing method according to (1),
[0256] in which the analysis data includes the feature data
satisfying a predetermined condition among the feature data of a
plurality of the feature maps.
[0257] (15)
[0258] A program that causes a computer to execute processing
including:
[0259] a feature data generation step of generating feature data
that numerically represents a feature of a feature map generated
from input data in a model using a neural network; and
[0260] an analysis data generation step of generating analysis data
in which the feature data of a plurality of the feature maps is
arranged.
[0261] (16)
[0262] An information processing apparatus including:
[0263] a feature data generation unit that generates feature data
that numerically represents a feature of a feature map generated
from input data in a model using a neural network; and
[0264] an analysis data generation unit that generates analysis
data in which the feature data of a plurality of the feature maps
is arranged.
[0265] Note that the effects described in the present specification
are merely examples and are not limited, and there may be other
effects.
REFERENCE SIGNS LIST
[0266] 10 Vehicle [0267] 100 Vehicle control system [0268] 141
Vehicle exterior information detection unit [0269] 201 Object
recognition model [0270] 211 Feature extraction layer [0271] 221
Hierarchy [0272] 301 Information processing apparatus [0273] 312
Learning unit [0274] 313 Learning situation analysis unit [0275]
321 Feature data generation unit [0276] 322 Analysis data
generation unit [0277] 323 Analysis unit [0278] 324 Parameter
setting unit [0279] 401 Object recognition model [0280] 411, 412
Feature extraction layer [0281] 421 Camera network [0282] 422
Millimeter wave radar network [0283] 423 Combining network [0284]
431, 432 Hierarchy
* * * * *