U.S. patent application number 15/431465 was filed with the patent office on 2017-02-13 and published on 2017-07-20 for apparatus and methods for operating robotic devices using selective state space training.
The applicant listed for this patent is Brain Corporation. Invention is credited to Eugene Izhikevich, Jean-Baptiste Passot, Oleg Sinyavskiy.
Publication Number | 20170203437
Application Number | 15/431465
Family ID | 53007606
Publication Date | 2017-07-20
United States Patent Application | 20170203437
Kind Code | A1
Passot; Jean-Baptiste; et al. | July 20, 2017
APPARATUS AND METHODS FOR OPERATING ROBOTIC DEVICES USING SELECTIVE STATE SPACE TRAINING
Abstract
Apparatus and methods for training and controlling of e.g.,
robotic devices. In one implementation, a robot may be utilized to
perform a target task characterized by a target trajectory. The
robot may be trained by a user using supervised learning. The user
may interface to the robot, such as via a control apparatus
configured to provide a teaching signal to the robot. The robot may
comprise an adaptive controller comprising a neuron network, which
may be configured to generate actuator control commands based on
the user input and output of the learning process. During one or
more learning trials, the controller may be trained to navigate a
portion of the target trajectory. Individual trajectory portions
may be trained during separate training trials. Some portions may
be associated with robot executing complex actions and may require
additional training trials and/or more dense training input
compared to simpler trajectory actions.
Inventors:
Passot; Jean-Baptiste (Solana Beach, CA)
Sinyavskiy; Oleg (San Diego, CA)
Izhikevich; Eugene (San Diego, CA)
Applicant:
Name | City | State | Country | Type
Brain Corporation | San Diego | CA | US |
Family ID: 53007606
Appl. No.: 15/431465
Filed: February 13, 2017
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
14070269 | Nov 1, 2013 | 9566710
15431465 | |
Current U.S. Class: 1/1
Current CPC Class: G05B 2219/39289 20130101; G06N 20/00 20190101; G05B 2219/33034 20130101; B25J 9/163 20130101; G06N 3/008 20130101; G05B 2219/39298 20130101; G06N 3/08 20130101; B25J 9/161 20130101; G06N 3/049 20130101; Y10S 901/03 20130101
International Class: B25J 9/16 20060101 B25J009/16; G06N 3/08 20060101 G06N003/08
Claims
1.-26. (canceled)
27. A method of operating a robotic controller apparatus, the
method comprising: determining by the robotic controller apparatus
a current performance measure of the robotic controller apparatus
associated with performing a target task autonomously along a
target trajectory; selecting by the robotic controller apparatus a
first portion of the target trajectory, the first portion being
characterized by a first performance measure that is lower as
compared to a second performance measure associated with another
portion of the target trajectory; and receiving a teaching input
for navigating the first portion of the target trajectory, the
teaching input being configured to navigate the robot towards the
target trajectory and improve the current performance measure.
28. The method of claim 27, wherein the robotic controller
apparatus is operable in accordance with a supervised learning
process configured based at least in part on the teaching input,
the supervised learning process being adapted based at least in
part on the current performance measure.
29. The method of claim 27, wherein the first portion of the target
trajectory is characterized by a state space, and an extent of the
state space is characterized by a first dimension having a first
value and by a second dimension having a second value.
30. The method of claim 29, further comprising performing the
target task autonomously, wherein: the performance of the target
task by the robotic controller apparatus comprises a provision of a
control signal by the robotic controller apparatus to a robotic
platform; and the first dimension is selected from the group
consisting of a spatial coordinate, a velocity, an acceleration,
and an orientation of the platform.
31. The method of claim 29, wherein: the determining the first
portion of the target trajectory is based at least on the first
dimension being outside a target range of at least one state space
parameter.
32. The method of claim 27, wherein the first performance measure
and second performance measure are determined based at least in
part on a deviation of an actual position of the robotic controller
apparatus from the target trajectory.
33. The method of claim 27, wherein the robotic controller
apparatus is configured to operate in accordance with a supervised
learning process based at least on the teaching input and a
plurality of training trials, the supervised learning process being
adapted based on the current performance measure.
34. The method of claim 27, wherein the teaching input comprises a
control signal for a controllable degree of freedom of motion of
the robotic controller apparatus.
35. A robot comprising: a platform configured to navigate an
environment; a sensor module configured to provide information
related to the environment of the platform; and an adaptive
controller configured to: provide navigation instructions to the
platform based at least in part on the information provided by the
sensor module; receive one or more teaching inputs providing
instructions for the navigation of the platform; determine a
current performance measure of the platform associated with
navigating autonomously along a target trajectory; select a first
portion of the target trajectory, the first portion being
characterized by a first performance measure that is lower as
compared to a second performance measure associated with another
portion of the target trajectory; and receive a first teaching
input for navigating the first portion of the target trajectory,
the first teaching input being configured to provide instructions
to navigate the robot towards the target trajectory and improve the
current performance measure.
36. The robot of claim 35, further comprising a user interface
configured to receive the first teaching input.
37. The robot of claim 36, wherein the user interface is remotely
located from the robot and communicatively coupled with the
robot.
38. The robot of claim 35, further comprising one or more actuators
configured to actuate the robot along one or more controllable
degrees of freedom of motion.
39. The robot of claim 35, wherein the adaptive controller is
further configured to: request assistance for navigating the first
portion of the target trajectory, wherein receiving the first
teaching input is in response to the request for assistance.
40. The robot of claim 35, wherein the first portion comprises a
state space extent that is less than half of a state space extent
associated with traversing all of the target trajectory.
41. An adaptive controller apparatus comprising a plurality of
computer readable instructions configured to, when executed, cause
a performance of a target task, the computer readable instructions
configured to cause the adaptive controller apparatus to: receive
information relating to autonomous navigation of a robotic device;
determine from the information a current performance measure of the
robotic device associated with navigating autonomously along a
target trajectory; select a first portion of the target trajectory,
the first portion being characterized by a first performance
measure that is lower as compared to a second performance measure
associated with another portion of the target trajectory; and
receive a first teaching input for navigating the first portion of
the target trajectory, the first teaching input being configured to
provide instructions to navigate the robotic device towards the
target trajectory and improve the current performance measure.
42. The apparatus of claim 41, wherein the computer readable
instructions are further configured to cause the adaptive
controller apparatus to perform the target task autonomously by
sending a control signal to the robotic device.
43. The apparatus of claim 41, wherein the computer readable
instructions are further configured to cause the adaptive
controller apparatus to perform a supervised learning process over
a plurality of trials, the supervised learning process being
adapted based at least in part on the first teaching input and the
current performance measure.
44. The apparatus of claim 41, wherein the teaching input comprises
a control signal for a controllable degree of freedom of motion of
the adaptive controller apparatus.
45. The apparatus of claim 41, wherein: the received information
provides a context to the adaptive controller apparatus; and the
adaptive controller apparatus determines a navigation control
signal associated with the context and sends the control signal to
the robotic device.
46. The apparatus of claim 41, wherein the target task comprises a
cleaning task.
Description
PRIORITY
[0001] This application is a continuation of, and claims priority
to, U.S. patent application Ser. No. 14/070,269 filed Nov. 1, 2013
of the same title, the contents of which are incorporated herein by
reference in their entirety.
CROSS-REFERENCE TO RELATED APPLICATIONS
[0002] This application is related to co-pending and co-owned U.S.
patent application Ser. No. 14/070,239 entitled "REDUCED DEGREE OF
FREEDOM ROBOTIC CONTROLLER APPARATUS AND METHODS", filed herewith,
and U.S. patent application Ser. No. 14/070,114 entitled "APPARATUS
AND METHODS FOR ONLINE TRAINING OF ROBOTS", filed herewith, each of
the foregoing being incorporated herein by reference in its
entirety.
[0003] This application is also related to commonly owned, and
co-pending U.S. patent application Ser. No. 13/866,975, entitled
"APPARATUS AND METHODS FOR REINFORCEMENT-GUIDED SUPERVISED
LEARNING", filed Apr. 19, 2013, Ser. No. 13/918,338 entitled
"ROBOTIC TRAINING APPARATUS AND METHODS", filed Jun. 14, 2013, Ser.
No. 13/918,298, entitled "HIERARCHICAL ROBOTIC CONTROLLER APPARATUS
AND METHODS", filed Jun. 14, 2013, Ser. No. 13/907,734, entitled
"ADAPTIVE ROBOTIC INTERFACE APPARATUS AND METHODS", filed May 31,
2013, Ser. No. 13/842,530, entitled "ADAPTIVE PREDICTOR APPARATUS
AND METHODS", filed Mar. 15, 2013, Ser. No. 13/842,562, entitled
"ADAPTIVE PREDICTOR APPARATUS AND METHODS FOR ROBOTIC CONTROL",
filed Mar. 15, 2013, Ser. No. 13/842,616, entitled "ROBOTIC
APPARATUS AND METHODS FOR DEVELOPING A HIERARCHY OF MOTOR
PRIMITIVES", filed Mar. 15, 2013, Ser. No. 13/842,647, entitled
"MULTICHANNEL ROBOTIC CONTROLLER APPARATUS AND METHODS", filed Mar.
15, 2013, Ser. No. 13/842,583, entitled "APPARATUS AND METHODS FOR
TRAINING OF ROBOTIC DEVICES", filed Mar. 15, 2013, Ser. No.
13/152,084, filed Jun. 2, 2011, entitled "APPARATUS AND METHODS FOR
PULSE-CODE INVARIANT OBJECT RECOGNITION", Ser. No. 13/757,607,
filed Feb. 1, 2013, entitled "TEMPORAL WINNER TAKES ALL SPIKING
NEURON NETWORK SENSORY PROCESSING APPARATUS AND METHODS", Ser. No.
13/623,820, filed Sep. 20, 2012, entitled "APPARATUS AND METHODS
FOR ENCODING OF SENSORY DATA USING ARTIFICIAL SPIKING NEURONS",
Ser. No. 13/623,842, entitled "SPIKING NEURON NETWORK ADAPTIVE
CONTROL APPARATUS AND METHODS", filed Sep. 20, 2012, Ser. No.
13/487,499, entitled "STOCHASTIC APPARATUS AND METHODS FOR
IMPLEMENTING GENERALIZED LEARNING RULES", filed Jun. 4, 2012, Ser.
No. 13/465,903 entitled "SENSORY INPUT PROCESSING APPARATUS IN A
SPIKING NEURAL NETWORK", filed May 7, 2012, Ser. No. 13/488,106,
entitled "SPIKING NEURON NETWORK APPARATUS AND METHODS", filed Jun.
4, 2012, Ser. No. 13/541,531, entitled "CONDITIONAL PLASTICITY
SPIKING NEURON NETWORK APPARATUS AND METHODS", filed Jul. 3, 2012,
Ser. No. 13/691,554, entitled "RATE STABILIZATION THROUGH
PLASTICITY IN SPIKING NEURON NETWORK", filed Nov. 30, 2012, Ser.
No. 13/660,967, entitled "APPARATUS AND METHODS FOR ACTIVITY-BASED
PLASTICITY IN A SPIKING NEURON NETWORK", filed Oct. 25, 2012, Ser.
No. 13/660,945, entitled "MODULATED PLASTICITY APPARATUS AND
METHODS FOR SPIKING NEURON NETWORKS", filed Oct. 25, 2012, Ser. No.
13/774,934, entitled "APPARATUS AND METHODS FOR RATE-MODULATED
PLASTICITY IN A SPIKING NEURON NETWORK", filed Feb. 22, 2013, Ser.
No. 13/763,005, entitled "SPIKING NETWORK APPARATUS AND METHOD WITH
BIMODAL SPIKE-TIMING DEPENDENT PLASTICITY", filed Feb. 8, 2013,
Ser. No. 13/660,923, entitled "ADAPTIVE PLASTICITY APPARATUS AND
METHODS FOR SPIKING NEURON NETWORK", filed Oct. 25, 2012, Ser. No.
13/239,255 filed Sep. 21, 2011, entitled "APPARATUS AND METHODS FOR
SYNAPTIC UPDATE IN A PULSE-CODED NETWORK", Ser. No. 13/588,774,
entitled "APPARATUS AND METHODS FOR IMPLEMENTING EVENT-BASED
UPDATES IN SPIKING NEURON NETWORKS", filed Aug. 17, 2012, Ser. No.
13/560,891 entitled "APPARATUS AND METHODS FOR EFFICIENT UPDATES IN
SPIKING NEURON NETWORK", filed Jul. 27, 2012, Ser. No. 13/560,902,
entitled "APPARATUS AND METHODS FOR STATE-DEPENDENT LEARNING IN
SPIKING NEURON NETWORKS", filed Jul. 27, 2012, Ser. No. 13/722,769
filed Dec. 20, 2012, and entitled "APPARATUS AND METHODS FOR
STATE-DEPENDENT LEARNING IN SPIKING NEURON NETWORKS", Ser. No.
13/842,530 entitled "ADAPTIVE PREDICTOR APPARATUS AND METHODS",
filed Mar. 15, 2013, Ser. No. 13/239,255 filed Sep. 21, 2011,
entitled "APPARATUS AND METHODS FOR SYNAPTIC UPDATE IN A
PULSE-CODED NETWORK", Ser. No. 13/487,576 entitled "DYNAMICALLY
RECONFIGURABLE STOCHASTIC LEARNING APPARATUS AND METHODS", filed
Jun. 4, 2012; Ser. No. 13/953,595 entitled "APPARATUS AND METHODS
FOR TRAINING AND CONTROL OF ROBOTIC DEVICES", filed Jul. 29, 2013;
Ser. No. 13/918,620 entitled "PREDICTIVE ROBOTIC CONTROLLER
APPARATUS AND METHODS", filed Jun. 14, 2013; and commonly owned
U.S. Pat. No. 8,315,305, issued Nov. 20, 2012, entitled "SYSTEMS
AND METHODS FOR INVARIANT PULSE LATENCY CODING"; each of the
foregoing incorporated herein by reference in its entirety.
COPYRIGHT
[0004] A portion of the disclosure of this patent document contains
material that is subject to copyright protection. The copyright
owner has no objection to the facsimile reproduction by anyone of
the patent document or the patent disclosure, as it appears in the
Patent and Trademark Office patent files or records, but otherwise
reserves all copyright rights whatsoever.
BACKGROUND
[0005] Technological Field
[0006] The present disclosure relates to adaptive control and
training, such as control and training of robotic devices.
[0007] Background
[0008] Robotic devices are used in a variety of industries, such as
manufacturing, medical, safety, military, exploration, and/or
other. Robotic "autonomy", i.e., the degree of human control,
varies significantly according to application. Some existing
robotic devices (e.g., manufacturing assembly and/or packaging) may
be programmed in order to perform desired functionality without
further supervision. Some robotic devices (e.g., surgical robots)
may be controlled by humans.
[0009] Robotic devices may comprise hardware components that enable
the robot to perform actions in 1-dimension (e.g., a single range
of movement), 2-dimensions (e.g., a plane of movement), and/or
3-dimensions (e.g., a space of movement). Typically, movement is
characterized according to so-called "degrees of freedom". A degree
of freedom is an independent range of movement; a mechanism with a
number of possible independent relative movements (N) is said to
have N degrees of freedom. Some robotic devices may operate with
multiple degrees of freedom (e.g., a turret and/or a crane arm
configured to rotate around vertical and/or horizontal axes). Other
robotic devices may be configured to follow one or more
trajectories characterized by one or more state parameters (e.g.,
position, velocity, acceleration, orientation, and/or other). It is
further appreciated that some robotic devices may simultaneously
control multiple actuators (degrees of freedom) resulting in very
complex movements.
SUMMARY
[0010] One aspect of the disclosure relates to a non-transitory
computer readable medium having instructions embodied thereon. The
instructions, when executed, are configured to control a robotic
platform.
[0011] In another aspect, a method of operating a robotic
controller apparatus is disclosed. In one implementation, the
method includes: determining a current controller performance
associated with performing a target task; determining a "difficult"
portion of a target trajectory associated with the target task, the
difficult portion characterized by an extent of a state space; and
providing a training input for navigating the difficult portion,
the training input configured to transition the current performance
towards the target trajectory.
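The determination of a "difficult" portion described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the trajectory is split into segments of paired 2D samples and that performance is measured as the mean Euclidean deviation from the target; the names `segment_deviation`, `find_difficult_segments`, and `max_deviation` are hypothetical and do not appear in the disclosure.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def segment_deviation(actual: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean Euclidean distance between paired actual and target samples."""
    if len(actual) != len(target) or not actual:
        raise ValueError("segments must be non-empty and of equal length")
    return sum(math.dist(a, t) for a, t in zip(actual, target)) / len(actual)


def find_difficult_segments(
    actual_segments: Sequence[Sequence[Point]],
    target_segments: Sequence[Sequence[Point]],
    max_deviation: float,
) -> List[int]:
    """Return indices of segments whose deviation exceeds the allowed range.

    Assumption: a segment is "difficult" when its mean deviation is
    greater than max_deviation; other criteria are equally possible.
    """
    return [
        i
        for i, (a, t) in enumerate(zip(actual_segments, target_segments))
        if segment_deviation(a, t) > max_deviation
    ]


# Segment 0 tracks the target closely; segment 1 drifts by one unit.
target = [[(0.0, 0.0), (1.0, 0.0)], [(2.0, 0.0), (3.0, 0.0)]]
actual = [[(0.0, 0.1), (1.0, 0.1)], [(2.0, 1.0), (3.0, 1.0)]]
print(find_difficult_segments(actual, target, max_deviation=0.5))  # [1]
```

Only the flagged segments would then be presented to the trainer for additional teaching input, while the remaining segments are left to autonomous operation.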
[0012] In one variant, the difficult portion of the target
trajectory is determined based at least on the current performance
being outside a range from the target trajectory; the state space
is associated with performing of the target task by the controller;
and performing by the controller of a portion of the target task
outside the extent is configured based on autonomous controller
operation.
[0013] In another variant, the controller is operable in accordance
with a supervised learning process configured based on the teaching
input, the learning process being adapted based on the current
performance; and the navigating of the difficult portion is based
at least in part on a combination of the teaching input and an
output of the controller learning process.
[0014] In a further variant, the extent is characterized by a first
dimension having a first value, and the state space is
characterized by a second dimension having a second value; and the
first value is less than one-half (1/2) of the second value.
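The one-half relationship between the extent of the difficult portion and the extent of the state space can be checked with a short sketch. It assumes that both extents are measured along a single state dimension (e.g., position along one axis) as the span between the smallest and largest sampled values; the helper names `extent` and `is_minor_extent` are hypothetical.

```python
from typing import Sequence


def extent(values: Sequence[float]) -> float:
    """Span of the sampled values along one state space dimension."""
    return max(values) - min(values)


def is_minor_extent(
    portion_values: Sequence[float],
    full_values: Sequence[float],
    max_fraction: float = 0.5,
) -> bool:
    """True when the portion spans less than max_fraction of the full extent.

    Assumption: a zero-width full extent never qualifies.
    """
    full = extent(full_values)
    if full == 0:
        return False
    return extent(portion_values) < max_fraction * full


# A portion spanning 1 unit of a 10 unit trajectory qualifies.
print(is_minor_extent([2.0, 3.0], [0.0, 10.0]))  # True
```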
[0015] In yet another variant, the controller is operable in
accordance with a supervised learning process configured based on
the teaching input and a plurality of training trials, the learning
process being adapted based on the current performance; and the
difficult trajectory portion determination is based at least on a
number of trials within the plurality of trials required to attain
the target performance.
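One way to read the trial-count criterion above is sketched below. The sketch assumes that an error value is recorded per trajectory portion after every training trial, and that the portion needing the most trials to reach a target error is treated as the difficult one. The function names, the dictionary layout, and the portion labels are illustrative assumptions.

```python
from typing import Dict, Optional, Sequence


def trials_to_target(errors: Sequence[float], target_error: float) -> Optional[int]:
    """Number of trials (1-based) until the error first meets the target.

    Returns None when the target is never attained in the recorded trials.
    """
    for trial, err in enumerate(errors, start=1):
        if err <= target_error:
            return trial
    return None


def hardest_portion(error_history: Dict[str, Sequence[float]], target_error: float) -> str:
    """Pick the portion that needed the most trials to attain the target.

    Assumption: a portion that never attained the target ranks as hardest.
    """

    def cost(name: str) -> float:
        n = trials_to_target(error_history[name], target_error)
        return float("inf") if n is None else float(n)

    return max(error_history, key=cost)


history = {
    "straight": [0.4, 0.1],
    "hairpin": [0.9, 0.7, 0.5, 0.2],
}
print(hardest_portion(history, target_error=0.25))  # hairpin
```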
[0016] In another aspect, an adaptive controller apparatus is
disclosed. In one implementation, the apparatus includes a
plurality of computer readable instructions configured to, when
executed, cause performing of a target task by at least: during a
first training trial, determining a predicted signal configured in
accordance with a sensory input, the predicted signal configured to
cause execution of an action associated with the target task, the
action execution being characterized by a first performance; during
a second training trial, based on a teaching input and the
predicted signal, determining a combined signal configured to cause
execution of the action, the action execution during the second
training trial being characterized by a second performance; and
adjusting a learning parameter of the controller based on the first
performance and the second performance.
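The three steps above (predicted signal, combined signal, parameter adjustment) can be sketched as follows. This is not the disclosed controller: it assumes a scalar linear predictor, an additive combiner, and a delta-rule update that is applied only when the second trial outperformed the first. The class `LinearPredictor`, the function `combine`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


def combine(predicted: float, teaching: float) -> float:
    """Additive combiner of predicted and teaching signals (an assumption)."""
    return predicted + teaching


@dataclass
class LinearPredictor:
    weight: float = 0.0  # the learning parameter being adjusted
    rate: float = 0.5    # learning rate of the delta-rule update

    def predict(self, sensory: float) -> float:
        """Predicted control signal for a given sensory input."""
        return self.weight * sensory

    def adjust(self, sensory: float, combined: float,
               first_error: float, second_error: float) -> None:
        """Move the prediction toward the combined signal.

        Assumption: the update is applied only when the combined signal
        (second trial) performed better than the prediction alone.
        """
        if second_error < first_error:
            self.weight += self.rate * (combined - self.predict(sensory)) * sensory


target_signal = 1.0
sensory = 1.0
predictor = LinearPredictor()

# First trial: the predicted signal alone drives the action.
predicted = predictor.predict(sensory)
first_error = abs(target_signal - predicted)

# Second trial: the trainer supplies the missing correction.
teaching = target_signal - predicted
combined = combine(predicted, teaching)
second_error = abs(target_signal - combined)

predictor.adjust(sensory, combined, first_error, second_error)
print(predictor.weight)  # 0.5
```

After the adjustment, a further trial without teaching input yields a prediction of 0.5, which is closer to the target than the first-trial prediction of 0.0.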
[0017] In one variant of the apparatus, the execution of the target
task comprises execution of the action and at least one other
action; the adjusting of the learning parameter is configured to
enable the controller to determine, during a third training trial,
another predicted signal configured in accordance with the sensory
input; and the execution, based on the another predicted signal, of
the action during the third training trial is characterized by a
third performance that is closer to the target task compared to the
first performance.
[0018] In another variant, execution of the target task is
characterized by a target trajectory in a state space;
execution of the action is characterized by a portion of the target
trajectory having a state space extent associated therewith; and
the state space extent occupies a minority fraction of the state
space.
[0019] In a further aspect, a robotic apparatus is disclosed. In
one implementation, the apparatus includes a platform characterized
by first and second degrees of freedom; a sensor module configured
to provide information related to the platform's environment; and
an adaptive controller apparatus configured to determine first and
second control signals to facilitate operation of the first and the
second degrees of freedom, respectively.
[0020] In one variant, the first and the second control signals are
configured to cause the platform to perform a target action; the
first control signal is determined in accordance with the
information and a teaching input; the second control signal is
determined in an absence of the teaching input and in accordance
with the information and a configuration of the controller; and the
configuration is determined based at least on an outcome of
training of the controller to operate the second degree of
freedom.
[0021] In another variant, the determination of the first control
signal is effectuated based at least on a supervised learning
process characterized by multiple iterations; and performance of
the target action in accordance with the first control signal at a
given iteration is characterized by a first performance.
[0022] In a further aspect, a method of optimizing the operation of
a robotic controller apparatus is disclosed. In one implementation,
the method includes: determining a current controller performance
associated with performing a target task, the current performance
being non-optimal for accomplishing the task; and for at least a
selected first portion of a target trajectory associated with the
target task, the first portion characterized by an extent of a
state space, providing a training input that facilitates navigation
of the first portion, the training input configured to transition
the current performance towards the target trajectory.
[0023] In one variant, the first portion of the target trajectory
is selected based at least on the current performance not meeting
at least one prescribed criterion with respect to the target
trajectory. The at least one prescribed criterion comprises for
instance the current performance exceeding a disparity from, or
range associated with, an acceptable performance.
[0024] In another variant, a performance by the controller of a
portion of the target task outside the extent is effectuated in the
absence of the training input.
[0025] In yet another variant, the controller is configured to be
trained to perform the target task using multiple iterations; and
for a given iteration of the multiple iterations, the selected
first portion comprises a portion with a higher rate of non-optimal
performance determined based on one or more prior iterations of the
multiple iterations.
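The selection of a portion by its rate of non-optimal performance over prior iterations can be sketched as below. The sketch assumes that each prior iteration yields one error value per trajectory portion and that "non-optimal" means the error exceeded a fixed threshold; `non_optimal_rate`, `select_portion`, and the threshold value are illustrative assumptions.

```python
from typing import Sequence


def non_optimal_rate(errors: Sequence[float], threshold: float) -> float:
    """Fraction of prior iterations in which the error exceeded the threshold."""
    return sum(e > threshold for e in errors) / len(errors)


def select_portion(history: Sequence[Sequence[float]], threshold: float) -> int:
    """Index of the portion with the highest non-optimal rate.

    history[i][p] is the error of portion p during prior iteration i.
    Ties are resolved in favor of the lowest index.
    """
    num_portions = len(history[0])
    rates = [
        non_optimal_rate([iteration[p] for iteration in history], threshold)
        for p in range(num_portions)
    ]
    return max(range(num_portions), key=rates.__getitem__)


# Three prior iterations over three portions; portion 1 fails every time.
history = [
    [0.1, 0.6, 0.2],
    [0.1, 0.7, 0.6],
    [0.2, 0.8, 0.1],
]
print(select_portion(history, threshold=0.5))  # 1
```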
[0026] These and other features, and characteristics of the present
disclosure, as well as the methods of operation and functions of
the related elements of structure and the combination of parts and
economies of manufacture, will become more apparent upon
consideration of the following description and the appended claims
with reference to the accompanying drawings, all of which form a
part of this specification, wherein like reference numerals
designate corresponding parts in the various figures. It is to be
expressly understood, however, that the drawings are for the
purpose of illustration and description only and are not intended
as a definition of the limits of the disclosure. As used in the
specification and in the claims, the singular forms "a", "an",
and "the" include plural referents unless the context clearly
dictates otherwise.
BRIEF DESCRIPTION OF THE DRAWINGS
[0027] FIG. 1 is a graphical illustration depicting a robotic
manipulator apparatus operable in two degrees of freedom, according
to one or more implementations.
[0028] FIG. 2 is a graphical illustration depicting a robotic
control apparatus configured to activate a single robotic actuator
at a given time, according to one or more implementations.
[0029] FIG. 3 is a graphical illustration depicting a robotic rover
platform operable in two degrees of freedom, according to one or
more implementations.
[0030] FIG. 4 is a graphical illustration depicting a multilayer
neuron network configured to operate multiple degrees of freedom
of, e.g., a robotic apparatus of FIG. 1, according to one or more
implementations.
[0031] FIG. 5 is a graphical illustration depicting a single layer
neuron network configured to operate multiple degrees of freedom
of, e.g., a robotic apparatus of FIG. 1, according to one or more
implementations.
[0032] FIG. 6 is a logical flow diagram illustrating a method of
operating an adaptive robotic device, in accordance with one or
more implementations.
[0033] FIG. 7 is a logical flow diagram illustrating a method of
training an adaptive controller of a robot using a reduced degree
of freedom methodology, in accordance with one or more
implementations.
[0034] FIG. 8 is a logical flow diagram illustrating a method of
training an adaptive controller apparatus to control a robot using
a reduced degree of freedom methodology, in accordance with one or
more implementations.
[0035] FIG. 9 is a logical flow diagram illustrating a method of
training an adaptive controller of a robot using selective state
space training methodology, in accordance with one or more
implementations.
[0036] FIG. 10A is a graphical illustration depicting a race
vehicle trajectory useful with the selective state space training
methodology, according to one or more implementations.
[0037] FIG. 10B is a graphical illustration depicting a
manufacturing robot trajectory useful with the selective state
space training methodology, according to one or more
implementations.
[0038] FIG. 10C is a graphical illustration depicting an exemplary
state space trajectory useful with the selective state space
training methodology, according to one or more implementations.
[0039] FIG. 11A is a block diagram illustrating a computerized
system useful for, inter alia, operating a parallel network
configured using backwards error propagation methodology, in
accordance with one or more implementations.
[0040] FIG. 11B is a block diagram illustrating a cell-type
neuromorphic computerized system useful with, inter alia, backwards
error propagation methodology of the disclosure, in accordance with
one or more implementations.
[0041] FIG. 11C is a block diagram illustrating hierarchical
neuromorphic computerized system architecture useful with, inter
alia, backwards error propagation methodology, in accordance with
one or more implementations.
[0042] FIG. 11D is a block diagram illustrating cell-type
neuromorphic computerized system architecture useful with, inter
alia, backwards error propagation methodology, in accordance with
one or more implementations.
[0043] All Figures disclosed herein are © Copyright 2017
Brain Corporation. All rights reserved.
DETAILED DESCRIPTION
[0044] Implementations of the present technology will now be
described in detail with reference to the drawings, which are
provided as illustrative examples so as to enable those skilled in
the art to practice the technology. Notably, the figures and
examples below are not meant to limit the scope of the present
disclosure to a single implementation, but other implementations
are possible by way of interchange of, or combination with, some or
all of the described or illustrated elements. Wherever convenient,
the same reference numbers will be used throughout the drawings to
refer to same or like parts.
[0045] Where certain elements of these implementations can be
partially or fully implemented using known components, only those
portions of such known components that are necessary for an
understanding of the present technology will be described, and
detailed descriptions of other portions of such known components
will be omitted so as not to obscure the disclosure.
[0046] In the present specification, an implementation showing a
singular component should not be considered limiting; rather, the
disclosure is intended to encompass other implementations including
a plurality of the same components, and vice-versa, unless
explicitly stated otherwise herein.
[0047] Further, the present disclosure encompasses present and
future known equivalents to the components referred to herein by
way of illustration.
[0048] As used herein, the term "bus" is meant generally to denote
all types of interconnection or communication architecture that are
used to access the synaptic and neuron memory. The "bus" may be
electrical, optical, wireless, infrared, and/or any type of
communication medium. The exact topology of the bus could be, for
example: a standard "bus", a hierarchical bus, a network-on-chip,
an address-event-representation (AER) connection, and/or any other
type of communication topology configured to access e.g., different
memories in a pulse-based system.
[0049] As used herein, the terms "computer", "computing device",
and "computerized device" may include one or more of personal
computers (PCs) and/or minicomputers (e.g., desktop, laptop, and/or
other PCs), mainframe computers, workstations, servers, personal
digital assistants (PDAs), handheld computers, embedded computers,
programmable logic devices, personal communicators, tablet
computers, portable navigation aids, J2ME equipped devices,
cellular telephones, smart phones, personal integrated
communication and/or entertainment devices, and/or any other device
capable of executing a set of instructions and processing an
incoming data signal.
[0050] As used herein, the term "computer program" or "software"
may include any sequence of human and/or machine cognizable steps
which perform a function. Such program may be rendered in a
programming language and/or environment including one or more of
C/C++, C#, Fortran, COBOL, MATLAB™, PASCAL, Python, assembly
language, markup languages (e.g., HTML, SGML, XML, VoXML),
object-oriented environments (e.g., Common Object Request Broker
Architecture (CORBA)), Java™ (e.g., J2ME, Java Beans), Binary
Runtime Environment (e.g., BREW), and/or other programming
languages and/or environments.
[0051] As used herein, the terms "synaptic channel", "connection",
"link", "transmission channel", "delay line", and "communications
channel" include a link between any two or more entities (whether
physical (wired or wireless), or logical/virtual) which enables
information exchange between the entities, and may be characterized
by one or more variables affecting the information exchange.
[0052] As used herein, the term "memory" may include an integrated
circuit and/or other storage device adapted for storing digital
data. By way of non-limiting example, memory may include one or
more of ROM, PROM, EEPROM, DRAM, Mobile DRAM, SDRAM, DDR/2 SDRAM,
EDO/FPMS, RLDRAM, SRAM, "flash" memory (e.g., NAND/NOR), memristor
memory, PSRAM, and/or other types of memory.
[0053] As used herein, the terms "integrated circuit (IC)", and
"chip" are meant to refer without limitation to an electronic
circuit manufactured by the patterned diffusion of elements in or
on to the surface of a thin substrate. By way of non-limiting
example, integrated circuits may include field programmable gate
arrays (e.g., FPGAs), programmable logic devices (PLD),
reconfigurable computer fabrics (RCFs), application-specific
integrated circuits (ASICs), printed circuits, organic circuits,
and/or other types of computational circuits.
[0054] As used herein, the terms "microprocessor" and "digital
processor" are meant generally to include digital processing
devices. By way of non-limiting example, digital processing devices
may include one or more of digital signal processors (DSPs),
reduced instruction set computers (RISC), general-purpose (CISC)
processors, microprocessors, gate arrays (e.g., field programmable
gate arrays (FPGAs)), PLDs, reconfigurable computer fabrics (RCFs),
array processors, secure microprocessors, application-specific
integrated circuits (ASICs), and/or other digital processing
devices. Such digital processors may be contained on a single
unitary IC die, or distributed across multiple components.
[0055] As used herein, the term "network interface" refers to any
signal, data, and/or software interface with a component, network,
and/or process. By way of non-limiting example, a network interface
may include one or more of FireWire (e.g., FW400, FW800, etc.), USB
(e.g., USB2), Ethernet (e.g., 10/100, 10/100/1000 (Gigabit
Ethernet), 10-Gig-E, etc.), MoCA, Coaxsys (e.g., TVnet.TM.), radio
frequency tuner (e.g., in-band or OOB, cable modem, and/or other),
Wi-Fi (802.11), WiMAX (802.16), PAN (e.g., 802.15), cellular (e.g.,
3G, LTE/LTE-A/TD-LTE, GSM, etc.), IrDA families, and/or other
network interfaces.
[0056] As used herein, the term "Wi-Fi" includes one or more of
IEEE-Std. 802.11, variants of IEEE-Std. 802.11, standards related
to IEEE-Std. 802.11 (e.g., 802.11 a/b/g/n/s/v), and/or other
wireless standards.
[0057] As used herein, the term "wireless" means any wireless
signal, data, communication, and/or other wireless interface. By
way of non-limiting example, a wireless interface may include one
or more of Wi-Fi, Bluetooth, 3G (3GPP/3GPP2), HSDPA/HSUPA, TDMA,
CDMA (e.g., IS-95A, WCDMA, etc.), FHSS, DSSS, GSM, PAN/802.15,
WiMAX (802.16), 802.20, narrowband/FDMA, OFDM, PCS/DCS,
LTE/LTE-A/TD-LTE, analog cellular, CDPD, satellite systems,
millimeter wave or microwave systems, acoustic, infrared (i.e.,
IrDA), and/or other wireless interfaces.
Overview and Description of Exemplary Implementations
[0058] Apparatus and methods for training and controlling of
robotic devices are disclosed. In one implementation, a robot or
other entity may be utilized to perform a target task characterized
by e.g., a target trajectory. The target trajectory may be, e.g., a
race circuit, a surveillance route, a manipulator trajectory
between a bin of widgets and a conveyor, and/or other. The robot
may be trained by a user, such as by using an online supervised
learning approach. The user may interface to the robot via a
control apparatus, configured to provide teaching signals to the
robot. In one variant, the robot may comprise an adaptive
controller comprising a neuron network, and configured to generate
actuator control commands based on the user input and output of the
learning process. During one or more learning trials, the
controller may be trained to navigate a portion of the target
trajectory. Individual trajectory portions may be trained during
separate training trials. Some trajectory portions may be
associated with the robot executing complex actions that may
require more training trials and/or more dense training input
compared to simpler trajectory actions. A complex trajectory
portion may be characterized by e.g., a selected range of state
space parameters associated with the task and/or operation by the
robot.
[0059] By way of illustration and example only, a robotic
controller of a race car may be trained to navigate a trajectory
(e.g., a race track), comprising one or more sharp turns (e.g.,
greater than, or equal to, 90.degree. in some implementations).
During training, the track may be partitioned into one or more
segments comprised of e.g., straightaway portions and turn
portions. The controller may be trained on one or more straightaway
portions during a first plurality of trials (e.g., between 1 and 10
in some implementations depending on the car characteristics,
trainer experience, and target performance). The controller may be
trained on one or more turn portions (e.g., a 180.degree. turn)
during a second plurality of trials. The number of trials in the
second plurality may be greater than the number of trials in the
first plurality (e.g., between
10 and 1000 in some implementations), and may depend on factors
such as the car characteristics, trainer experience, and/or target
performance. Training may be executed in one or more training
sessions, e.g., every week to improve a particular performance for
a given turn.
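The partitioning described above may be sketched as a simple trial
budget. This is a minimal illustration only: the function name
allot_trials and the default counts (5 and 100) are assumptions
chosen from within the example ranges given above, not values
prescribed by the disclosure.

```python
def allot_trials(segment_kinds, straight_trials=5, turn_trials=100):
    """Return (kind, trials) pairs, giving turn segments the larger budget."""
    if turn_trials <= straight_trials:
        raise ValueError("turn segments are expected to receive more trials")
    budget = {"straight": straight_trials, "turn": turn_trials}
    return [(kind, budget[kind]) for kind in segment_kinds]


# Hypothetical track made of alternating straightaway and turn portions.
track = ["straight", "turn", "straight", "turn"]
schedule = allot_trials(track)
total_trials = sum(trials for _, trials in schedule)
```

In practice the two budgets would be tuned per car, trainer, and
target performance rather than fixed in advance.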
[0060] In the exemplary context of the above race car, individual
ones of the one or more turn portions may be characterized by
corresponding ranges (subsets) of the state space associated with
the full trajectory of navigation. The range of state parameters
associated with each of the one or more turn portions may be
referred to as a selected subset of the state space. The added
training associated with the state space subset may be referred to
as selective state space sampling (SSSS). Selection of a trajectory
portion for SSSS added training may be configured based on one or
more state parameters associated with the robotic device navigation
of the target trajectory. In one or more implementations, the
selection may be based on location (a range of coordinates),
velocity, acceleration, jerk, operational performance (e.g., lap
time), the rate of performance change over multiple trials, and/or
other parameters.
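One way to picture the selection of a state space subset is as a
range test applied to each sampled state. The sketch below is an
assumption-laden illustration: the parameter names (position_m,
speed_mps), the numeric ranges, and the helper in_selected_subset
are hypothetical and do not come from the disclosure.

```python
def in_selected_subset(state, ranges):
    """True if every listed state parameter lies within its [low, high] range."""
    return all(low <= state[key] <= high for key, (low, high) in ranges.items())


# Hypothetical subset: a turn located 40 m to 60 m along the track,
# entered at 5 m/s or faster.
turn_subset = {"position_m": (40.0, 60.0), "speed_mps": (5.0, float("inf"))}

trajectory = [
    {"position_m": 10.0, "speed_mps": 8.0},
    {"position_m": 45.0, "speed_mps": 6.5},
    {"position_m": 55.0, "speed_mps": 4.0},
    {"position_m": 58.0, "speed_mps": 5.0},
]

# Indices of the samples that would receive the added (SSSS) training.
selected = [i for i, s in enumerate(trajectory) if in_selected_subset(s, turn_subset)]
```

Other criteria named above, such as lap time or the rate of
performance change, could be added as further keys of the same
range dictionary.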
[0061] In some implementations of devices characterized by multiple
controllable degrees of freedom (CDOF), the trajectory portion
selection may correspond to training a subset of CDOF of the
device, and operating one or more remaining CDOF based on prior
training and/or pre-configured operational instructions.
[0062] An exemplary implementation of the robot may comprise an
adaptive controller implemented using e.g., a neuron network.
Training the adaptive controller may comprise, for instance,
partial set training during so-called "trials". The user may train
the adaptive controller to operate a first actuator subset and a
second actuator subset of the robot separately. During a first
set of trials, the control apparatus may be configured to select
and operate a first subset of the robot's complement of actuators
e.g., operate a shoulder joint of a manipulator arm. The adaptive
controller network may be configured to generate control commands
for the shoulder joint actuator based on the user input and output
of the learning process. However, since a single actuator (e.g.,
the shoulder joint) may be inadequate for achieving a target task
(e.g., reaching a target object), the adaptive controller may
subsequently be trained to operate the second subset
(e.g., an elbow joint) during a second set of trials. During
individual trials of the second set of trials, the user may provide
control input for the second actuator, while the previously trained
network may provide control signaling for the first actuator (the
shoulder). Subsequent to performing the second set of trials, the
adaptive controller may be capable of controlling the first and the
second actuators in absence of user input by e.g., combining the
training of the first and second trials.
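The division of labor during the second set of trials may be
sketched as follows. The helper compose_commands and the numeric
values are hypothetical; the sketch only shows that, per actuator,
the command is taken from the user when that actuator is being
trained and from the previously trained network otherwise.

```python
def compose_commands(user_input, network_output, trained_now):
    """Pick, per actuator, the teacher's command or the network's prediction."""
    return {
        joint: user_input[joint] if joint in trained_now else network_output[joint]
        for joint in network_output
    }


# Assumed state during the second set of trials: the shoulder was trained
# earlier, so the network drives it; the user drives the elbow.
network_output = {"shoulder": 0.30, "elbow": 0.00}
user_input = {"elbow": -0.45}
commands = compose_commands(user_input, network_output, trained_now={"elbow"})
```

Once both subsets have been trained, calling the same helper with
an empty trained_now set yields commands produced by the network
alone.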
[0063] FIG. 1 illustrates one implementation of a robotic apparatus
for use with the robot training methodology set forth herein. The
apparatus 100 of FIG. 1 may comprise a manipulator arm comprised of
limbs 110, 112. The limb 110 orientation may be controlled by a
motorized joint 102, the limb 112 orientation may be controlled by
a motorized joint 106. The joints 102, 106 may enable control of
the arm 100 in two degrees of freedom, shown by arrows 108, 118 in
FIG. 1. The robotic arm apparatus 100 may be controlled in order to
perform one or more target actions, e.g., reach a target 120.
[0064] In some implementations, the arm 100 may be controlled using
an adaptive controller (e.g., comprising a neuron network described
below with respect to FIGS. 4-5). The controller may be operable in
accordance with a supervised learning process described in e.g.,
commonly owned, and co-pending U.S. patent application Ser. No.
13/866,975, entitled "APPARATUS AND METHODS FOR
REINFORCEMENT-GUIDED SUPERVISED LEARNING", filed Apr. 19, 2013,
Ser. No. 13/918,338 entitled "ROBOTIC TRAINING APPARATUS AND
METHODS", filed Jun. 14, 2013, Ser. No. 13/918,298 entitled
"HIERARCHICAL ROBOTIC CONTROLLER APPARATUS AND METHODS", filed Jun.
14, 2013, Ser. No. 13/907,734 entitled "ADAPTIVE ROBOTIC INTERFACE
APPARATUS AND METHODS", filed May 31, 2013, Ser. No. 13/842,530
entitled "ADAPTIVE PREDICTOR APPARATUS AND METHODS", filed Mar. 15,
2013, Ser. No. 13/842,562 entitled "ADAPTIVE PREDICTOR APPARATUS
AND METHODS FOR ROBOTIC CONTROL", filed Mar. 15, 2013, Ser. No.
13/842,616 entitled "ROBOTIC APPARATUS AND METHODS FOR DEVELOPING A
HIERARCHY OF MOTOR PRIMITIVES", filed Mar. 15, 2013, Ser. No.
13/842,647 entitled "MULTICHANNEL ROBOTIC CONTROLLER APPARATUS AND
METHODS", filed Mar. 15, 2013, and Ser. No. 13/842,583 entitled
"APPARATUS AND METHODS FOR TRAINING OF ROBOTIC DEVICES", filed Mar.
15, 2013, each of the foregoing being incorporated herein by
reference in its entirety.
[0065] During controller training, the supervised learning process
may receive supervisory input (training) from a trainer. In one or
more implementations, the trainer may comprise a computerized agent
and/or a human user. In some implementations of controller training
by a human user, the training input may be provided by the user via
a remote control apparatus, e.g., as illustrated in FIG. 2. The
control apparatus 200 may be configured to provide teaching input
to the adaptive controller and/or operate the robotic arm 100 via
control element 214.
[0066] In the implementation illustrated in FIG. 2, the control
element 214 comprises a slider with a single direction 218
representing one degree of freedom (DOF), which may comprise a
controllable DOF (CDOF). A lateral or "translation" degree of
freedom refers to a displacement with respect to a point of
reference. A rotational degree of freedom refers to a rotation
about an axis. Other common examples of control elements include
e.g., joysticks, touch pads, mice, track pads, dials, and/or other.
More complex control elements may offer even more DOF; for example,
so-called 6DOF controllers may offer translation in 3 directions
(forward/backward, left/right, up/down), and rotation about 3 axes
(pitch, yaw,
roll). The control apparatus 200 provides one or more control
signals (e.g., teaching input).
[0067] In one exemplary embodiment, the one or more control signals
represent a fewer number of CDOF than the robot can support. For
instance, with respect to FIGS. 1 and 2, the control apparatus 200
provides control signals for a single (1) DOF, whereas the robotic
arm 100 supports two (2) DOF. In order to train and/or control
multiple degrees of freedom of the arm 100, the control apparatus
200 may further comprise a switch element 210 configured to select
the joint 102 or joint 106 the control signals should be associated
with. Other common input apparatus which may be useful to specify
the appropriate DOF include, without limitation: buttons,
keyboards, mice, and/or other devices.
[0068] Referring now to FIG. 3, the control apparatus 200 may be
utilized to provide supervisory input to train a mobile robotic
platform 300 characterized by two degrees of freedom (indicated by
arrows 314, 310). The platform 300 may comprise a motorized set of
wheels 312 configured to move the platform (as shown, along the
direction 314). The platform 300 may also comprise a motorized
turret 304 (adapted to support an antenna and/or a camera) that is
configured to be rotated about the axis 310.
[0069] In the exemplary robotic devices of FIGS. 1 and 3, the
supervisory signal comprises: (i) an actuator displacement value
(selected by the slider 218), and (ii) a selection as to the
appropriate actuator mechanism (selected by the switch element
210), torque values for individual joints, and/or other. As shown
in FIG. 1, the actuators control the angular displacement for the
robotic limbs. In contrast in FIG. 3, the actuators control the
linear displacement (via a motorized wheel drive), and a rotational
displacement about the axis 310. The foregoing exemplary
supervisory signal is purely illustrative and those of ordinary
skill in the related arts will readily appreciate that the present
disclosure contemplates supervisory signals that include e.g.,
multiple actuator displacement values (e.g., for multi-CDOF
controller elements), multiple actuator selections, and/or other
components.
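As a minimal sketch of the two-part supervisory signal described
above, the switch reading may select the actuator and the slider
reading may supply the displacement. The type SupervisorySignal,
the function read_control, and the use of the joint reference
numerals 102 and 106 as identifiers are illustrative assumptions.

```python
from typing import NamedTuple


class SupervisorySignal(NamedTuple):
    actuator: int        # identifier of the selected actuator
    displacement: float  # value read from the slider


def read_control(switch_position, slider_value, actuators=(102, 106)):
    """Combine the switch and slider readings into one supervisory signal."""
    return SupervisorySignal(actuators[switch_position], slider_value)


signal = read_control(switch_position=1, slider_value=0.25)
```

A multi-CDOF control element would carry several displacement
values in the same structure instead of a single one.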
[0070] It is further appreciated that the illustrated examples are
readily understood to translate the value from the actuator
displacement value to a linear displacement, angular displacement,
rotational displacement, and/or other. Translation may be
proportional, non-proportional, linear, non-linear, and/or other.
For example, in some variable translation schemes, the actuator
displacement value may be "fine" over some ranges (e.g., allowing
small precision manipulations), and much more "coarse" over other
ranges (e.g., enabling large movements). While the present examples
use an actuator displacement value, it is appreciated that e.g.,
velocity values may also be used. For example, an actuator velocity
value may indicate the velocity of movement which may be useful for
movement which is not bounded within a range per se. For example,
with respect to FIG. 3, the motorized wheel drive and the turret
rotation mechanisms may not have a limited range.
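A variable translation scheme of the kind described above may be
sketched as a piecewise-linear map that is "fine" near the center
of the slider travel and "coarse" toward its ends. The function
translate, the normalized slider range [-1, 1], and the gains are
assumptions made for illustration.

```python
def translate(slider, fine_limit=0.5, fine_gain=10.0, coarse_gain=80.0):
    """Map a slider value in [-1, 1] to a joint angle in degrees."""
    if not -1.0 <= slider <= 1.0:
        raise ValueError("slider value must lie in [-1, 1]")
    magnitude = abs(slider)
    if magnitude <= fine_limit:
        # Fine region: small angle change per unit of slider travel.
        angle = fine_gain * magnitude
    else:
        # Coarse region: continue from the end of the fine region.
        angle = fine_gain * fine_limit + coarse_gain * (magnitude - fine_limit)
    return angle if slider >= 0 else -angle


small_move = translate(0.2)   # inside the fine region
large_move = translate(1.0)   # end of the coarse region
```

With these assumed gains, the inner half of the slider travel
covers 5 degrees and the outer half covers a further 40 degrees.
For unbounded mechanisms such as the wheel drive, the same map
could be read as a velocity rather than a displacement.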
[0071] Those of ordinary skill will appreciate that actuator
mechanisms vary widely based on application. Actuators may use
hydraulic, pneumatic, electrical, mechanical, and/or other
mechanisms to generate e.g., linear force, rotational force, linear
displacement, angular displacement, and/or other. Common examples
include: pistons, comb drives, worm drives, motors, rack and
pinion, chain drives, and/or other.
[0072] In some implementations of supervised learning by neuron
networks, the training signal may comprise a supervisory signal
(e.g., a spike) that triggers neuron response. Referring now to
FIGS. 4-5, adaptive controllers of robotic apparatus (e.g., 100,
300 of FIGS. 1, 3) comprising a neuron network are graphically
depicted.
[0073] FIG. 4 presents a multilayer neuron network configured to
control multiple degrees of freedom (e.g., the robotic arm
apparatus 100 of FIG. 1), according to one or more implementations.
[0074] The multilayer network 500 of neurons is depicted within
FIG. 4. The network 500 comprises: an input neuron layer (neurons
502, 504, 506), a hidden neuron layer (neurons 522, 524, 526), and
an output neuron layer (neurons 542, 544). The neurons 502, 504,
506 of the input layer may receive sensory input 508 and
communicate their output to the neurons 522, 524, 526 via one or
more connections (512, 514, 516 in FIG. 4). In one or more
implementations of sensory data processing and/or object
recognition, the input layer of neurons may be referred to as
non-adaptive feature extraction layer that is configured to respond
to occurrence of one or more features/objects (e.g., edges, shapes,
color, and/or other) represented by the input 508. The neurons 522,
524, 526 of the hidden layer may communicate output (generated
based on one or more inputs 512, 514, 516 and feedback signal 530)
to one or more output layer neurons 542, 544 via one or more
connections (532, 534, 536 in FIG. 4). In one or more
implementations, the network 500 of FIG. 4 may be referred to as
a two-layer network comprising two learning layers: a layer of
connections between the input and the hidden neuron layers (e.g.,
512, 514, characterized by efficacies 518, 528), and a layer of
connections between the hidden and the output neuron layers (e.g.,
532, 534 characterized by efficacies 548, 538). Those of ordinary
skill in the related arts will readily appreciate that the
foregoing network is purely illustrative and that other networks
may have different connectivity; network connectivity may be e.g.,
one-to-one, one-to-all, all-to-one, some to some, and/or other
methods.
[0075] In some instances, a network layer may provide an error
feedback signal to a preceding layer. For example, as shown by
arrows 530, 520 in FIG. 4, the neurons (542, 544) of the output
layer provide error feedback to the neurons (522, 524, 526) of the
hidden layer. The neurons (522, 524, 526) of the hidden layer provide
feedback to the input layer neurons (502, 504, 506). The error
propagation may be implemented using any applicable methodologies
including those described in, e.g. U.S. patent application Ser. No.
14/054,366 entitled "APPARATUS AND METHODS FOR BACKWARD PROPAGATION
OF ERRORS IN A SPIKING NEURON NETWORK", filed Oct. 15, 2013,
incorporated herein by reference in its entirety.
[0076] The exemplary network 500 may comprise a network of spiking
neurons configured to communicate with one another by means of
"spikes" or electrical pulses. Additionally, as used herein, the
terms "pre-synaptic" and "post-synaptic" are used to describe a
neuron's relation to a connection. For example, with respect to the
connection 512, the units 502 and 522 are referred to as the
pre-synaptic and the post-synaptic unit, respectively. It is
noteworthy, that the same unit is referred to differently with
respect to different connections. For instance, unit 522 is
referred to as the pre-synaptic unit with respect to the connection
532, and the post-synaptic unit with respect to the connection 512.
In one or more implementations of spiking networks, the error
signal 520, 530 may be propagated using spikes, e.g., as described
in U.S. patent application Ser. No. 14/054,366, entitled "APPARATUS
AND METHODS FOR BACKWARD PROPAGATION OF ERRORS IN A SPIKING NEURON
NETWORK", filed Oct. 15, 2013, the foregoing being incorporated
herein by reference in its entirety.
[0077] The input 508 may comprise data used for solving a
particular control task. For example, the signal 508 may comprise a
stream of raw sensor data and/or preprocessed data. Raw sensor data
may include data conveying information associated with one or more
of proximity, inertial, terrain imaging, and/or other information.
Preprocessed data may include data conveying information associated
with one or more of velocity, information extracted from
accelerometers, distance to obstacle, positions, and/or other
information. In some implementations, such as those involving
object recognition, the signal 508 may comprise an array of pixel
values in the input image, or preprocessed data. Preprocessed data
may include data conveying information associated with one or more
of levels of activations of Gabor filters for face recognition,
contours, and/or other information. In one or more implementations,
the input signal 508 may comprise a target motion trajectory. The
motion trajectory may be used to predict a future state of the
robot on the basis of a current state and the target state. In one
or more implementations, the signal 508 in FIG. 4 may be encoded as
spikes, as described in detail in commonly owned, and co-pending
U.S. patent application Ser. No. 13/842,530 entitled "ADAPTIVE
PREDICTOR APPARATUS AND METHODS", filed Mar. 15, 2013, incorporated
supra.
[0078] In one or more implementations, such as object recognition
and/or obstacle avoidance, the input 508 may comprise a stream of
pixel values associated with one or more digital images. In one or
more implementations (e.g., video, radar, sonography, x-ray,
magnetic resonance imaging, and/or other types of sensing), the
input may comprise electromagnetic waves (e.g., visible light, IR,
UV, and/or other types of electromagnetic waves) entering an
imaging sensor array. In some implementations, the imaging sensor
array may comprise one or more of RGCs, a charge coupled device
(CCD), an active-pixel sensor (APS), and/or other sensors. The
input signal may comprise a sequence of images and/or image frames.
The sequence of images and/or image frame may be received from a
CCD camera via a receiver apparatus and/or downloaded from a file.
The image may comprise a two-dimensional matrix of RGB values
refreshed at a 25 Hz frame rate. It will be appreciated by those
skilled in the arts that the above image parameters are merely
exemplary, and many other image representations (e.g., bitmap,
CMYK, HSV, HSL, grayscale, and/or other representations) and/or
frame rates are equally useful with the present technology. Pixels
and/or groups of pixels associated with objects and/or features in
the input frames may be encoded using, for example, latency
encoding described in commonly owned and co-pending U.S. patent
application Ser. No. 12/869,583, filed Aug. 26, 2010 and entitled
"INVARIANT PULSE LATENCY CODING SYSTEMS AND METHODS"; U.S. Pat. No.
8,315,305, issued Nov. 20, 2012, entitled "SYSTEMS AND METHODS FOR
INVARIANT PULSE LATENCY CODING"; Ser. No. 13/152,084, filed Jun. 2,
2011, entitled "APPARATUS AND METHODS FOR PULSE-CODE INVARIANT
OBJECT RECOGNITION"; and/or latency encoding comprising a temporal
winner take all mechanism described in U.S. patent application Ser.
No. 13/757,607, filed Feb. 1, 2013 and entitled "TEMPORAL WINNER
TAKES ALL SPIKING NEURON NETWORK SENSORY PROCESSING APPARATUS AND
METHODS", each of the foregoing being incorporated herein by
reference in its entirety.
[0079] In one or more implementations, encoding may comprise
adaptive adjustment of neuron parameters, such as neuron excitability
described in commonly owned and co-pending U.S. patent application
Ser. No. 13/623,820 entitled "APPARATUS AND METHODS FOR ENCODING OF
SENSORY DATA USING ARTIFICIAL SPIKING NEURONS", filed Sep. 20,
2012, the foregoing being incorporated herein by reference in its
entirety.
[0080] Individual connections (e.g., 512, 532) may be assigned,
inter alia, a connection efficacy, which in general may refer to a
magnitude and/or probability of input into a neuron affecting
neuron output. The efficacy may comprise, for example a parameter
(e.g., synaptic weight) used for adaptation of one or more state
variables of post-synaptic units (e.g., 530). The efficacy may
comprise a latency parameter characterizing propagation delay
from a pre-synaptic unit to a post-synaptic unit. In some
implementations, greater efficacy may correspond to a shorter
latency. In some other implementations, the efficacy may comprise
a probability parameter characterizing propagation probability
from a pre-synaptic unit to a post-synaptic unit; and/or a parameter
characterizing an impact of a pre-synaptic spike on the state of
the post-synaptic unit.
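The alternative readings of connection efficacy listed above
(weight, latency, transmission probability) may be held side by
side in one record. The class Connection, its field names, and the
transmit method are hypothetical and are offered only as a sketch
of how the three parameters could act on a single spike.

```python
import random
from dataclasses import dataclass


@dataclass
class Connection:
    weight: float = 1.0        # impact of a pre-synaptic spike on the target
    latency_ms: float = 1.0    # propagation delay to the post-synaptic unit
    probability: float = 1.0   # chance that a given spike is transmitted

    def transmit(self, spike_time_ms, rng):
        """Return (arrival_time_ms, weight), or None if the spike is dropped."""
        if rng.random() >= self.probability:
            return None
        return (spike_time_ms + self.latency_ms, self.weight)


rng = random.Random(0)
delivered = Connection(weight=0.5, latency_ms=2.0).transmit(10.0, rng)
dropped = Connection(probability=0.0).transmit(10.0, rng)
```

Under this sketch, adapting a connection during learning would
mean changing one or more of the three fields.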
[0081] Individual neurons of the network 500 may be characterized
by a neuron state. The neuron state may, for example, comprise a
membrane voltage of the neuron, conductance of the membrane, and/or
other parameters. The learning process of the network 500 may be
characterized by one or more learning parameters, which may
comprise input connection efficacy, output connection efficacy,
training input connection efficacy, response generating (firing)
threshold, resting potential of the neuron, and/or other
parameters. In one or more implementations, some learning
parameters may comprise probabilities of signal transmission
between the units (e.g., neurons) of the network 500.
[0082] Referring back to FIG. 4, the training input 540 is
differentiated from sensory inputs (e.g., inputs 508) as follows.
During learning, input data (e.g., spike events) received at the
first neuron layer via the input 508 may cause changes in the
neuron state (e.g., increase neuron membrane potential and/or other
parameters). Changes in the neuron state may cause the neuron to
generate a response (e.g., output a spike). The training input 540
(also "teaching data") causes (i) changes in the neuron dynamic
model (e.g., modification of parameters a,b,c,d of Izhikevich
neuron model, described for example in commonly owned and
co-pending U.S. patent application Ser. No. 13/623,842, entitled
"SPIKING NEURON NETWORK ADAPTIVE CONTROL APPARATUS AND METHODS",
filed Sep. 20, 2012, incorporated herein by reference in its
entirety), and/or (ii) modification of connection efficacy, based,
for example, on the timing of input spikes, teaching spikes, and/or
output spikes. In some implementations, the teaching data may
trigger neuron output in order to facilitate learning. In some
implementations, the teaching data may be communicated to other
components of the control system.
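For orientation, the parameters a, b, c, d mentioned above belong
to the Izhikevich neuron model as commonly published: a membrane
potential v and a recovery variable u evolve as
dv/dt = 0.04v^2 + 5v + 140 - u + I and du/dt = a(bv - u), with the
reset v = c, u = u + d when v reaches 30 mV. The sketch below
integrates that published form with a simple Euler step. The step
size, the input currents, and the function name are assumptions,
and the code is not the implementation of the incorporated
application.

```python
def simulate_izhikevich(current, a=0.02, b=0.2, c=-65.0, d=8.0,
                        steps=1000, dt=0.5):
    """Euler-integrate one Izhikevich neuron; return spike times in ms."""
    v, u = c, b * c
    spikes = []
    for step in range(steps):
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
        if v >= 30.0:
            # Spike: record the time, then apply the reset defined by c and d.
            spikes.append(step * dt)
            v, u = c, u + d
    return spikes


silent = simulate_izhikevich(current=0.0)    # no input, neuron stays at rest
driven = simulate_izhikevich(current=10.0)   # constant input, neuron spikes
```

Modifying a, b, c, or d, as the training input may do, changes the
firing pattern produced for the same input current.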
[0083] During normal operation (e.g., subsequent to learning), data
508 arriving to neurons of the network may cause changes in the
neuron state (e.g., increase neuron membrane potential and/or other
parameters). Changes in the neuron state may cause the neuron to
generate a response (e.g., output a spike). However, during normal
operation, the training input 540 is absent; the input data 508 is
required for the neuron to generate output.
[0084] In some implementations, one of the outputs (e.g., generated
by neuron 542) may be configured to actuate the first CDOF of the
robotic arm 100 (e.g., joint 102); another output (e.g., generated
by neuron 544) may be configured to actuate the second CDOF of the
robotic arm 100 (e.g., the joint 106).
[0085] While FIG. 4 illustrates a multilayer neuron network having
three layers of neurons and two layers of connections, it will be
appreciated by those of ordinary skill in the related arts that any
number of layers of neurons are contemplated by the present
disclosure. Complex systems may require more neuron layers whereas
simpler systems may utilize fewer layers. In other cases,
implementation may be driven by other cost/benefit analysis. For
example, power consumption, system complexity, number of inputs,
number of outputs, the presence (or lack of) existing technologies,
and/or other factors may affect the multilayer neuron network
implementation.
[0086] FIG. 5 depicts an exemplary neuron network 550 for
controlling multiple degrees of freedom (e.g., the robotic arm
apparatus 100 of FIG. 1), according to one or more implementations.
[0087] The network 550 of FIG. 5 may comprise two layers of
neurons. The first layer (also referred to as the input layer) may
comprise multiple neurons (e.g., 552, 554, 556). The second layer
(also referred to as the output layer) may comprise two neurons
(572, 574). The input layer neurons (e.g., 552, 554, 556) receive
sensory input 558 and communicate their output to the output layer
neurons (572, 574) via one or more connections (e.g., 562, 564, 566
in FIG. 5). In one or more implementations, the network 550 of FIG.
5 may be referred to as a single-layer network comprising one
learning layer of connections (e.g., 562, 566 characterized by
efficacies e.g., 578, 568).
[0088] In sensory data processing and/or object recognition
implementations, the first neuron layer (e.g., 552, 554, 556) may
be referred to as non-adaptive feature extraction layer configured
to respond to occurrence of one or more features/objects (e.g.,
edges, shapes, color, and/or other) in the input 558. The second
layer neurons (572, 574) generate control output 576, 570 for a
respective actuator (e.g., the joints 102, 106 in FIG. 1) based on
one or more inputs received from the first neuron layer (e.g., via
connections 562, 564, 566). Those of ordinary skill in the related
arts will readily
appreciate that the foregoing network is purely illustrative and
that other networks may have different connectivity; network
connectivity may be e.g., one-to-one, one-to-all, all-to-one, some
to some, and/or other methods.
[0089] The network 500 and/or 550 of FIGS. 4-5 may be operable in
accordance with a supervised learning process configured based on
teaching signal 540, 560, respectively. In one or more
implementations, the network 500, 550 may be configured to optimize
performance (e.g., performance of the robotic apparatus 100 of FIG.
1) by minimizing the average value of a performance function e.g.,
as described in detail in commonly owned and co-pending U.S. patent
application Ser. No. 13/487,499, entitled "STOCHASTIC APPARATUS AND
METHODS FOR IMPLEMENTING GENERALIZED LEARNING RULES" filed Jun. 4,
2012, incorporated herein by reference in its entirety. It will be
appreciated by those skilled in the arts that supervised learning
methodologies may be used for training artificial neural networks,
including but not limited to, an error back propagation, described
in, e.g. U.S. patent application Ser. No. 14/054,366 entitled
"APPARATUS AND METHODS FOR BACKWARD PROPAGATION OF ERRORS IN A
SPIKING NEURON NETWORK", filed Oct. 15, 2013, incorporated supra,
naive and semi-naive Bayes classifier, described in, e.g. U.S.
patent application Ser. No. 13/756,372 entitled "SPIKING NEURON
CLASSIFIER APPARATUS AND METHODS USING CONDITIONALLY INDEPENDENT
SUBSETS", filed Jan. 31, 2013, the foregoing being incorporated
herein by reference in its entirety, and/or other approaches, such
as ensembles of classifiers, random forests, support vector
machine, Gaussian processes, decision tree learning, boosting
(using a set of classifiers with a low correlation to the true
classification), and/or other. During learning, the efficacy (e.g.,
518, 528, 538, 548 in FIG. 4 and 568, 578 in FIG. 5) of connections
of the network may be adapted in accordance with one or more
adaptation rules. The rules may be configured to implement synaptic
plasticity in the network. In some implementations, the synaptic
plasticity rules may comprise one or more spike-timing dependent
plasticity rules, such as rules comprising feedback described in
commonly owned and co-pending U.S. patent application Ser. No.
13/465,903 entitled "SENSORY INPUT PROCESSING APPARATUS IN A
SPIKING NEURAL NETWORK", filed May 7, 2012; rules configured to
modify feed-forward plasticity due to activity of neighboring
neurons, described in co-owned U.S. patent application Ser. No.
13/488,106, entitled "SPIKING NEURON NETWORK APPARATUS AND
METHODS", filed Jun. 4, 2012; conditional plasticity rules
described in U.S. patent application Ser. No. 13/541,531, entitled
"CONDITIONAL PLASTICITY SPIKING NEURON NETWORK APPARATUS AND
METHODS", filed Jul. 3, 2012; plasticity configured to stabilize
neuron response rate as described in U.S. patent application Ser.
No. 13/691,554, entitled "RATE STABILIZATION THROUGH PLASTICITY IN
SPIKING NEURON NETWORK", filed Nov. 30, 2012; activity-based
plasticity rules described in co-owned U.S. patent application Ser.
No. 13/660,967, entitled "APPARATUS AND METHODS FOR ACTIVITY-BASED
PLASTICITY IN A SPIKING NEURON NETWORK", filed Oct. 25, 2012, U.S.
patent application Ser. No. 13/660,945, entitled "MODULATED
PLASTICITY APPARATUS AND METHODS FOR SPIKING NEURON NETWORK", filed
Oct. 25, 2012; and U.S. patent application Ser. No. 13/774,934,
entitled "APPARATUS AND METHODS FOR RATE-MODULATED PLASTICITY IN A
SPIKING NEURON NETWORK", filed Feb. 22, 2013; multi-modal rules
described in U.S. patent application Ser. No. 13/763,005, entitled
"SPIKING NETWORK APPARATUS AND METHOD WITH BIMODAL SPIKE-TIMING
DEPENDENT PLASTICITY", filed Feb. 8, 2013, each of the foregoing
being incorporated herein by reference in its entirety.
[0090] In one or more implementations, neuron operation may be
configured based on one or more inhibitory connections providing
input configured to delay and/or depress response generation by the
neuron, as described in commonly owned and co-pending U.S. patent
application Ser. No. 13/660,923, entitled "ADAPTIVE PLASTICITY
APPARATUS AND METHODS FOR SPIKING NEURON NETWORK", filed Oct. 25,
2012, the foregoing being incorporated herein by reference in its
entirety. Connection efficacy updates may be effectuated using a
variety of applicable methodologies such as, for example,
event-based updates described in detail in commonly owned and
co-pending U.S. patent application Ser. No. 13/239,255 filed Sep.
21, 2011, entitled "APPARATUS AND METHODS FOR SYNAPTIC UPDATE IN A
PULSE-CODED NETWORK", Ser. No. 13/588,774, entitled "APPARATUS AND
METHODS FOR IMPLEMENTING EVENT-BASED UPDATES IN SPIKING NEURON
NETWORKS", filed Aug. 17, 2012; and Ser. No. 13/560,891 entitled
"APPARATUS AND METHODS FOR EFFICIENT UPDATES IN SPIKING NEURON
NETWORK", each of the foregoing being incorporated herein by
reference in its entirety.
[0091] A neuron process may comprise one or more learning rules
configured to adjust neuron state and/or generate neuron output in
accordance with neuron inputs. In some implementations, the one or
more learning rules may comprise state dependent learning rules
described, for example, in commonly owned and co-pending U.S.
patent application Ser. No. 13/560,902, entitled "APPARATUS AND
METHODS FOR GENERALIZED STATE-DEPENDENT LEARNING IN SPIKING NEURON
NETWORKS", filed Jul. 27, 2012 and/or U.S. patent application Ser.
No. 13/722,769 filed Dec. 20, 2012, and entitled "APPARATUS AND
METHODS FOR STATE-DEPENDENT LEARNING IN SPIKING NEURON NETWORKS",
each of the foregoing being incorporated herein by reference in its
entirety.
[0092] In some implementations, the single-layer network 550 of
FIG. 5 may be embodied in an adaptive controller configured to
operate a robotic platform characterized by multiple degrees of
freedom (e.g., the robotic arm 100 of FIG. 1 with two CDOF). By way
of an illustration, the network 550 outputs 570, 576 of FIG. 5 may
be configured to operate the joints 102, 106, respectively, of the
robotic arm in FIG. 1. During a first plurality of trials, the
network 550 may be trained to operate a first subset of the robot's
available CDOF (e.g., the joint 102 in FIG. 1). Efficacy of the
connections communicating signals from the first layer of the
network 550 (e.g., the neurons 552, 554, 556) to the second layer
neurons (e.g., efficacy 568 of the connection 566 communicating
data to the neuron 574 in FIG. 5) may be adapted in accordance with
a learning method.
[0093] Similarly, during a second plurality of trials, the network
550 may be trained to operate a second subset of the robot's available
CDOF (e.g., the joint 106 in FIG. 1). Efficacy of the connections
communicating signal from the first layer of the network 550 (e.g.,
the neurons 552, 554, 556) to the second layer neurons (e.g.,
efficacy 578 of the connection 562 communicating data to the neuron
572 in FIG. 5) may be adapted in accordance with the learning
method.
[0094] By employing time multiplexed learning of multiple CDOF
operations, learning speed and/or accuracy may be improved,
compared to a combined learning approach wherein the entire
complement of the robot's CDOF are being trained contemporaneously.
It is noteworthy that the two-layer network architecture (e.g., of
the network 550 in FIG. 5) may enable separate adaptation of
efficacy for individual network outputs. That is, efficacy of
connections into the neuron 572 (obtained when training the neuron
572 to operate the joint 102) may be left unchanged when training
the neuron 574 to operate the joint 106.
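By way of a non-limiting illustration, the separate adaptation of efficacy for individual network outputs may be sketched as follows. The sketch assumes a linear output layer and a simple delta-rule update, neither of which is specified by the disclosure; the names predict and train_output, the learning rate, and the numeric values are hypothetical.

```python
# Hypothetical sketch: one row of efficacies per output neuron.
# A delta rule stands in for the (unspecified) learning method.

def predict(weights, features):
    """Return one output per output neuron (weighted sum of first-layer activity)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def train_output(weights, selected, features, target, rate=0.1):
    """Update only the connections into the selected output neuron."""
    error = target - predict(weights, features)[selected]
    weights[selected] = [w + rate * error * x
                         for w, x in zip(weights[selected], features)]
    return error


# Two outputs (e.g., one per joint) fed by three first-layer neurons.
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
features = [1.0, 0.5, -0.5]

# First plurality of trials: adapt the connections of output 0 only.
for _ in range(50):
    train_output(weights, 0, features, target=1.0)
row0_after_first_stage = list(weights[0])

# Second plurality of trials: adapt output 1; the row of output 0 is untouched.
for _ in range(50):
    train_output(weights, 1, features, target=-0.5)
```

Because each output neuron owns its own row of efficacies in this sketch, the second stage cannot disturb what was learned in the first stage.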
[0095] In some implementations, the multi-layer network 500 of FIG.
4 may be embodied in an adaptive controller configured to operate a
robotic platform characterized by multiple degrees of freedom
(e.g., the robotic arm 100 of FIG. 1 with two CDOF). By way of
illustration, the network 500 outputs 546, 547 of FIG. 4 may be
configured to operate the joints 102, 106, respectively, of the arm
in FIG. 1. During a first plurality of trials, the network 500 may
be trained to operate a first subset of the robot's available CDOF
(e.g., the joint 102 in FIG. 1). Efficacy of connections
communicating signal from the first layer of the network 500 (e.g.,
the neurons 502, 504, 506) to the second layer neurons (e.g.,
efficacy 518, 528 of connections 514, 512 communicating data to
neurons 526, 522 in FIG. 4) may be adapted in accordance with a
learning method. Efficacy of connections communicating signal from
the second layer of the network 500 (e.g., the neurons 522, 524,
526) to the second layer output neuron (e.g., efficacy 548 of
connections 532 communicating data to the neuron 542 in FIG. 4) may
be adapted in accordance with the learning method.
[0096] During a second plurality of trials, the network 500 may
be trained to operate a second subset of the robot's available CDOF
(e.g., the joint 106 in FIG. 1). During individual trials of the
second plurality of trials efficacy of connections communicating
signal from the second layer of the network 500 (e.g., the neurons
522, 524, 526) to the second layer output neuron (e.g., efficacy
538 of connections 534 communicating data to the neuron 544 in FIG.
4) may be adapted in accordance with the learning method. In some
implementations, the efficacy of connections communicating signal
from the first layer of the network to the second layer neurons
determined during the first plurality of trials may be further
adapted or refined during the second plurality of trials in
accordance with the learning method, using, e.g., optimization
methods based on a cost/reward function. The cost/reward function
may be configured by the user and/or determined by the adaptive system
during the first learning stage.
[0097] A robotic device may be configured to execute a target task
associated with a target trajectory. A controller of the robotic
device may be trained to navigate the target trajectory comprising
multiple portions. Some trajectory portions may be associated with
the robot executing complex actions (e.g., that may require more
training trials and/or more dense training input compared to
simpler trajectory actions). A complex trajectory portion may be
characterized by, e.g., a selected range of state space parameters
associated with the task operation by the robot. In one or more
implementations, the complex action may be characterized by a high
rate of change of one or more motion parameters (e.g.,
acceleration), higher position tolerance (e.g., tight corners,
precise positioning of components during manufacturing, fragile
items for grasping by a manipulator), high target performance (e.g.,
lap time of less than N seconds), actions engaging multiple CDOF of
a manipulator arm, and/or other parameters.
[0098] The range of state parameters associated with the complex
trajectory portion may be referred to as a selected subset of the
state space. The added training associated with the state space
subset may be referred to as selective state space sampling. The
selection of a trajectory portion for the added training of
selective state space sampling may be configured based on one or more
state parameters associated with the robotic device navigation of
the target trajectory in the state space.
[0099] The target trajectory navigation may be characterized by a
performance measure determined based on one or more state
parameters. In some implementations, the selection of the
trajectory portion (e.g., complex trajectory portion, and/or
other.) may be determined based on an increased level of target
performance. By way of illustration, consider one exemplary
autonomous rover implementation: the rover performance may be
determined based on a deviation of the actual rover position from a
nominal or expected position (e.g., position on a road). The rover
trajectory may comprise unrestricted straightaway portions and one
or more portions disposed in a constricted terrain e.g., with a
drop on one side and a wall on the other side. The rover target
position deviation range may be reduced for the trajectory portions
in the constricted environment, compared to the rover target
position deviation range for the unrestricted straightaway
portions.
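By way of a non-limiting illustration, a position-deviation performance measure with a reduced deviation range for the constricted portions may be sketched as follows. The tolerance values, the portion labels, and the function name within_tolerance are hypothetical and are not taken from the disclosure.

```python
import math

# Hypothetical allowed deviation (in meters) per kind of trajectory portion.
TOLERANCE_M = {"straightaway": 0.50, "constricted": 0.10}


def within_tolerance(actual, nominal, portion_kind):
    """Check the deviation of the actual (x, y) position from the nominal one."""
    deviation = math.hypot(actual[0] - nominal[0], actual[1] - nominal[1])
    return deviation <= TOLERANCE_M[portion_kind]


# The same 0.3 m deviation is judged against two different ranges.
on_straight = within_tolerance((10.3, 0.0), (10.0, 0.0), "straightaway")
in_constricted = within_tolerance((10.3, 0.0), (10.0, 0.0), "constricted")
```

In this sketch the same deviation is acceptable on a straightaway but not in the constricted terrain, which is one way a portion may come to be selected for added training.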
[0100] In some implementations, the amount of time associated with
traversing the complex trajectory portion may comprise less than
half the time used for traversing the whole trajectory. In one or
more implementations, state space extent associated with the
complex trajectory portion may comprise less than half of the
state space extent associated with the whole trajectory.
[0101] Individual trajectory portions may be trained during
respective training trials. In one or more implementations, a
selective CDOF methodology, such as that described herein, may be
employed when training one or more portions associated with
multiple CDOF operations.
[0102] FIGS. 10A through 10C illustrate selective state space
sampling methodology in accordance with some implementations. FIG.
10A depicts an exemplary trajectory for an autonomous vehicle
useful with, e.g., cleaning, surveillance, racing, exploration,
search and rescue, and/or other robotic applications.
[0103] A robotic platform 1010 may be configured to perform a
target task comprising navigation of the target trajectory 1000.
One or more portions 1002, 1004, 1012 of the trajectory 1000 in
FIG. 10A may comprise execution of a complex action(s) by the
controller of the robotic platform 1010. In some implementations,
the trajectory portion 1004 (shown by broken line in FIG. 10A) may
comprise one or more sharp turns (e.g., greater than 90.degree.)
that may be navigated at a target speed and/or with a running
precision metric of the target position by the platform 1010.
[0104] Training of the robotic platform 1010 controller navigating
the trajectory portion 1004 may be configured on one or more
trials. During individual trials, the controller of the platform
1010 may receive teaching input, indicated by symbols `X` in FIG.
10A. Teaching input 1008 may comprise one or more control commands
provided by a training entity and configured to aid the traversal
of the trajectory portion 1004. In one or more implementations, the
teaching input 1008 may be provided via a remote controller
apparatus, such as described, e.g., in commonly owned and
co-pending U.S. patent application Ser. No. 13/953,595 entitled
"APPARATUS AND METHODS FOR CONTROLLING OF ROBOTIC DEVICES", filed
Jul. 29, 2013; Ser. No. 13/918,338 entitled "ROBOTIC TRAINING
APPARATUS AND METHODS", filed Jun. 14, 2013; Ser. No. 13/918,298
entitled "HIERARCHICAL ROBOTIC CONTROLLER APPARATUS AND METHODS",
filed Jun. 14, 2013; Ser. No. 13/918,620 entitled "PREDICTIVE
ROBOTIC CONTROLLER APPARATUS AND METHODS", filed Jun. 14, 2013;
Ser. No. 13/907,734 entitled "ADAPTIVE ROBOTIC INTERFACE APPARATUS
AND METHODS", filed May 31, 2013, each of the foregoing being
incorporated herein by reference in its entirety.
[0105] The adaptive controller may be configured to produce control
output based on the teaching input and output of the learning
process. Output of the controller may comprise a combination of an
adaptive predictor output and the teaching input. Various
realizations of adaptive predictors may be utilized with the
methodology described including, e.g. those described in U.S.
patent application Ser. No. 13/842,562 entitled "ADAPTIVE PREDICTOR
APPARATUS AND METHODS FOR ROBOTIC CONTROL", filed Mar. 15, 2013,
incorporated supra.
[0106] Training may be executed in one or more training sessions,
e.g., every week or according to a prescribed periodicity, in an
event-driven manner, aperiodically, and/or other, to improve a
particular performance for a given trajectory portion. By way of
illustration, subsequent to an initial group of training trials, a
particularly difficult operation (e.g., associated with the portion
1004) may continue to be trained in order to improve performance,
while the remaining trajectory is based on the training information
determined during the initial group of training trials.
[0107] Actions associated with navigating the portion 1004 of the
trajectory may be characterized by a corresponding range (subset)
of the state space associated with the full trajectory 1000
navigation. In one or more implementations, the selection may be
based on location (a range of coordinates), velocity, acceleration,
jerk, operational performance (e.g., lap time), the rate of
performance change over multiple trials, and/or other
parameters.
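By way of a non-limiting illustration, selection of a state space subset based on a range of coordinates and an acceleration threshold may be sketched as follows. The sample layout, the threshold values, and the name in_selected_subset are hypothetical; uniformly spaced samples are assumed so that the sample fraction approximates the time fraction.

```python
def in_selected_subset(sample, x_range, accel_threshold):
    """True when the sample lies in the coordinate range or exceeds the acceleration threshold."""
    x_lo, x_hi = x_range
    return x_lo <= sample["x"] <= x_hi or abs(sample["accel"]) > accel_threshold


# Hypothetical, uniformly sampled states along a trajectory.
samples = [
    {"x": 0.0, "accel": 0.1},
    {"x": 1.0, "accel": 0.2},
    {"x": 2.0, "accel": 2.5},
    {"x": 3.0, "accel": 0.3},
    {"x": 4.0, "accel": 0.1},
    {"x": 5.0, "accel": 0.0},
]

selected = [i for i, s in enumerate(samples)
            if in_selected_subset(s, x_range=(2.5, 3.5), accel_threshold=2.0)]
fraction = len(selected) / len(samples)
```

Other parameters listed above (e.g., jerk or lap time) could be added as further predicates in the same manner.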
[0108] The partial trajectory training methodology (e.g., using the
selective state space sampling) may enable the trainer to focus on
more difficult sections of a trajectory compared to other
relatively simple trajectory portions (e.g., 1004 compared to 1002
in FIG. 10A). By focusing on more difficult sections of the
trajectory (e.g., portion 1004), the overall target performance,
and/or a particular attribute thereof (e.g., a shorter lap time in
racing implementations, fewer collisions in cleaning
implementations, and/or other.) may be improved in a shorter amount
of time, as compared to performing the same number of trials for
the complete trajectory in accordance with prior art approaches.
Reducing the amount of training data and/or training trials for
simpler tasks (e.g., the portions 1002, 1012 in FIG. 10A) may
further reduce or prevent errors associated with over-fitting.
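By way of a non-limiting illustration, one possible way of focusing a fixed budget of training trials on the more difficult trajectory portions is sketched below. The disclosure does not prescribe an allocation rule; the proportional rule, the difficulty scores, and the name allocate_trials are assumptions made for the example.

```python
def allocate_trials(difficulty, budget):
    """Split a trial budget across portions in proportion to a difficulty score."""
    total = sum(difficulty.values())
    raw = {p: budget * d / total for p, d in difficulty.items()}
    trials = {p: int(r) for p, r in raw.items()}
    # Hand any leftover trials to the portions with the largest remainders.
    leftover = budget - sum(trials.values())
    by_remainder = sorted(raw, key=lambda p: raw[p] - trials[p], reverse=True)
    for p in by_remainder[:leftover]:
        trials[p] += 1
    return trials


# Hypothetical difficulty scores for the portions of FIG. 10A.
difficulty = {"1002": 1.0, "1004": 6.0, "1012": 1.0}
trials = allocate_trials(difficulty, budget=20)
```

With these hypothetical scores most of the budget goes to the portion 1004, while the simpler portions receive only a few trials each.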
[0109] FIG. 10B illustrates an exemplary trajectory for a
manufacturing robot useful with the selective state space training
methodology, according to one or more implementations. The
trajectory 1040 of FIG. 10B may correspond to operations of a
manufacturing process e.g., performed by a robotic manipulator 100
of FIG. 1. For example, as shown, the manufacturing process
comprises the assembly of a portable electronic device. The
operations 1042, 1044, 1048 may correspond to so called "pick and
place" of larger components (e.g., enclosure, battery), whereas the
operation 1046 may correspond to handling of smaller, irregular
components (e.g., wires). The operations 1042, 1044, 1048 may
comprise action(s) that may be trained in a small number of trials
(e.g., between 1 and 10 in some implementations). One or more
operations (e.g., shown by hashed rectangle 1046) may comprise more
complex action(s) that may require a larger number of trials (e.g.,
greater than 10) compared to the operations 1042, 1044, 1048. The
operation 1046 may be characterized by increased state parameter
variability between individual trials compared to the operations
1042, 1044, 1048.
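By way of a non-limiting illustration, the state parameter variability between individual trials may be estimated as sketched below. The use of a population standard deviation, the recorded parameter, and the numeric values are hypothetical choices made for the example.

```python
from statistics import pstdev


def trial_variability(values):
    """Population standard deviation of one state parameter across trials."""
    return pstdev(values)


# Hypothetical end-effector positions (mm) recorded at the end of four trials.
final_position_mm = {
    "1042": [10.0, 10.1, 9.9, 10.0],
    "1046": [10.0, 11.5, 8.2, 10.9],
}

variability = {op: trial_variability(v) for op, v in final_position_mm.items()}
most_variable = max(variability, key=variability.get)
```

An operation with markedly larger variability than the others may be a candidate for a larger number of training trials.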
[0110] In some implementations of robotic devices characterized by
multiple controllable degrees of freedom (CDOF), the trajectory
portion selection may correspond to training a subset of CDOF and
operating one or more remaining CDOF based on prior training and/or
pre-configured operational instructions.
[0111] FIG. 10C illustrates an exemplary state space trajectory
useful with the selective state space training methodology,
according to one or more implementations. The trajectory 1060 of
FIG. 10C may correspond to execution of a target task e.g., a task
described above with respect to FIGS. 10A-10B and/or operation of
the arm 100 of FIG. 1 characterized by multiple CDOF. The task may
comprise navigation from a start point 1062 to an end point 1064.
The trajectory 1060 may be characterized by two (2) states: s1, s2.
In one or more implementations, state s1 and s2 may correspond to
one or more parameters associated with the operation of the robot
(e.g., arm 100) such as, for example, position (a range of
coordinates), velocity, acceleration, jerk, joint orientation,
operational performance (e.g., distance to target), the rate of
performance change over multiple trials (e.g., improving or not),
motor torque, current draw, battery voltage, available power,
parameters describing the environment (e.g., wind, temperature,
precipitation, pressure, distance, motion of obstacles, and/or
targets, and/or other.) and/or other parameters.
[0112] As shown, the trajectory 1060 is characterized by portions
1066, 1070, 1068. The portion 1070 may be more difficult to train
compared to the portions 1066, 1068. In one or more
implementations, the training difficulty may be characterized by
one or more of lower performance, longer training time, a larger
number of training trials, frequency of training input, and/or
variability of other parameters associated with operating the
portion 1070 as compared to the portions 1066, 1068. The trajectory
portions 1066, 1070, and 1068 may be characterized by state space
extents 1076, 1074, and 1078, respectively. As illustrated in FIG.
10C, the state space extent 1074 associated with the more difficult
to train portion 1070 may occupy a smaller extent of the state
space s1-s2, compared to the state space portions 1076, 1078. The
state space configuration of FIG. 10C may correspond to the state
space s1-s2 corresponding to time-space coordinates associated with
e.g., the trajectory 1000 of FIG. 10A. In one or more implementations
(not shown), the state space s1-s2 may characterize controller
training time, platform speed/acceleration, and/or other parameters
of the trajectory.
[0113] The trajectory portion 1070 may correspond to execution of
an action (or multiple actions) that may be more difficult to learn
compared to other actions. The learning difficulty may arise from
one or more of the following: (i) the action is more complex (e.g., a
sharp turn characterized by an increased rate of change of speed,
direction, and/or other state parameter of a vehicle, and/or
increased target precision of a manipulator); (ii) the context
associated with the action is difficult to identify (e.g., another
portion of the trajectory may be associated with a similar context
but may require a different set of motor commands); (iii) there are
multiple and contradictory ways to solve this part of the
trajectory (e.g., a wider turn with faster speed, and/or a sharp
turn with low speed) and the teacher is not consistent in the way
he solves the problem; or a combination thereof.
[0114] In one or more implementations, the state space
configuration of FIG. 10C may correspond to operation of a robotic
arm (e.g., 100 in FIG. 1) having two CDOF. State parameters s1, s2
may correspond to control parameters (e.g., orientation) of joints
102, 106 in FIG. 1. The partial trajectory training methodology
(e.g., using the selective state space sampling) may comprise: (i)
operation of one of the joints 102 (or 106) based on results of
prior training; and (ii) training the other joint 106 (or 102)
using any of the applicable methodologies described herein.
[0115] The selective state space sampling may reduce training
duration and/or amount of training data associated with the
trajectory portions 1066, 1068. Reducing the amount of training
data and/or training trials for simpler tasks (e.g., the portions
1066, 1068 in FIG. 10C) may further reduce or prevent errors that
may be associated with over-fitting.
[0116] FIGS. 6-9 illustrate methods of training an adaptive
apparatus of the disclosure in accordance with one or more
implementations. In some implementations, methods 600, 700, 800,
900 may be accomplished with one or more additional operations not
described, and/or without one or more of the operations discussed.
Additionally, the order in which the operations of methods 600,
700, 800, 900 are illustrated in FIGS. 6-9 and described below is not
limiting; the various steps may be performed in other orders.
Similarly, various steps of the methods 600, 700, 800, 900 may be
substituted for equivalent or substantially equivalent steps. The
methods 600, 700, 800, 900 presented below are illustrative; any
and all of the modifications described herein are readily performed
by those of ordinary skill in the related arts.
[0117] In some implementations, methods 600, 700, 800, 900 may be
implemented in one or more processing devices (e.g., a digital
processor, an analog processor, a digital circuit designed to
process information, an analog circuit designed to process
information, a state machine, and/or other mechanisms for
electronically processing information). The one or more processing
devices may include one or more devices executing some or all of
the operations of methods 600, 700, 800, 900 in response to
instructions stored electronically on an electronic storage medium.
The one or more processing devices may include one or more devices
configured through hardware, firmware, and/or software to be
specifically designed for execution of one or more of the
operations of methods 600, 700, 800, 900. Operations of methods
600, 700, 800, 900 may be utilized with a robotic apparatus (see
e.g., the robotic arm 100 of FIG. 1 and the mobile robotic platform
300 of FIG. 3) using a remote control robotic apparatus (such as is
illustrated in FIG. 2).
[0118] FIG. 6 is a logical flow diagram illustrating a generalized
method for operating an adaptive robotic device, in accordance with
one or more implementations.
[0119] At operation 602 of method 600, a first actuator associated
with a first CDOF operation of a robotic device is selected. In
some implementations, the CDOF selection may be effectuated by
issuing an instruction to the robotic control apparatus (e.g.,
pressing a button, issuing a voice command, an audible signal
(e.g., a click), an initialization after power-on/reset sequence, a
pre-defined programming sequence, and/or other.). In one or more
implementations, the CDOF selection may be effectuated based on a
timer event, and/or training performance reaching a target level,
e.g., determined based on ability of the trainer to position one
of the joints within a range from a target position. For example,
in the context of FIG. 1, in one exemplary embodiment, the first
CDOF selection comprises selecting joint 102 of the robotic arm
100.
[0120] At operation 604, the adaptive controller is trained to
actuate movement in the first CDOF of the robot to accomplish a
target action. In some implementations, the task may be
too complex to be handled with a single CDOF and may thus require
multiple CDOF.
[0121] Operation 604 may comprise training a neuron network (such
as e.g., 500, 550 of FIGS. 4-5) in accordance with a supervised
learning method. In one or more implementations, the adaptive
controller may comprise one or more predictors; training may be
based on a cooperation between the trainer and the controller,
e.g., as described in commonly owned and co-pending U.S. patent
application Ser. No. 13/953,595 entitled "APPARATUS AND METHODS FOR
CONTROLLING OF ROBOTIC DEVICES", filed Jul. 29, 2013, and/or U.S.
patent application Ser. No. 13/918,338 entitled "ROBOTIC TRAINING
APPARATUS AND METHODS", filed Jun. 14, 2013, each incorporated
supra. During training, the trainer may provide control commands
(such as the supervisory signals 540, 560 in the implementations of
FIGS. 4-5). Training input may be combined with the predicted
output.
[0122] At operation 606, a second actuator associated with a second
CDOF operation of the robotic device is selected. The CDOF
selection may be effectuated by issuing an instruction to the
robotic control apparatus (e.g., pressing the button 210, issuing a
voice command, and/or using another communication method). For
example, in the context of FIG. 1, the second CDOF selection may
comprise selecting the other joint 106 of the robotic arm.
[0123] At operation 608, the adaptive controller may be trained to
operate the second CDOF of the robot in order to accomplish the
target action. In some implementations, the operation 608 may
comprise training a neuron network (such as e.g., 500, 550 of FIGS.
4-5) in accordance with a supervised learning method. In one or
more implementations, the adaptive controller may be configured to
operate the first CDOF of the robot based on outcome of the
training during operation 604. The trainer may initially operate
the second CDOF of the robot. Training based on cooperation between
the trainer and the controller, e.g., as described above with
respect to operation 604, may enable knowledge transfer from the
trainer to the controller so as to enable the controller to operate
the robot using the first and the second CDOF. During controller
training of operations 604, 608, the trainer may utilize a remote
interface (e.g., the control apparatus 200 of FIG. 2) in order to
provide teaching input for the first and the second CDOF training
trials.
[0124] It is appreciated that the method 600 may be used with any
number of degrees of freedom, additional degrees being iteratively
implemented. For example, for a device with six (6) degrees of
freedom, training may be performed with six independent iterations,
where individual iteration may be configured to train one (1)
degree of freedom. Moreover, more complex controllers may further
reduce iterations by training multiple simultaneous degrees of
freedom; e.g., three (3) iterations of a controller with two (2)
degrees of freedom, two (2) iterations of a controller with three
(3) degrees of freedom, and/or other.
[0125] Still further it is appreciated that the robotic apparatus
may support a number of degrees of freedom which is not evenly
divisible by the degrees of freedom of the controller. For example,
a robotic mechanism that supports five (5) degrees of freedom can
be trained in two (2) iterations with a controller that supports
three (3) degrees of freedom.
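By way of a non-limiting illustration, the number of training iterations in the examples above corresponds to the robot's degrees of freedom divided by the controller's degrees of freedom, rounded up. The function names and the zero-based numbering of the degrees of freedom below are hypothetical.

```python
import math


def training_iterations(robot_dof, controller_dof):
    """Number of iterations needed when controller_dof degrees are trained at once."""
    return math.ceil(robot_dof / controller_dof)


def dof_groups(robot_dof, controller_dof):
    """Partition degrees of freedom 0..robot_dof-1 into per-iteration groups."""
    dofs = list(range(robot_dof))
    return [dofs[i:i + controller_dof]
            for i in range(0, robot_dof, controller_dof)]


iterations_6_by_2 = training_iterations(6, 2)
iterations_5_by_3 = training_iterations(5, 3)
groups_5_by_3 = dof_groups(5, 3)
```

In the five degree of freedom case the last iteration in this sketch trains a smaller group than the controller could support.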
[0126] FIG. 7 illustrates a method of training an adaptive
controller of a robotic apparatus using the reduced degree of
freedom methodology described herein, in accordance with one or
more implementations. In one or more implementations, the adaptive
controller may comprise a neuron network operable in accordance
with a supervised learning process (e.g., the network 500, 550 of
FIGS. 4-5, described supra.).
[0127] At operation 702 of method 700, a context is determined. In
some implementations, the context may be determined based on one or
more sensory input and/or feedback that may be provided by the
robotic apparatus to the controller. In some implementations, the
sensory aspects may include an object being detected in the input,
a location of the object, an object characteristic (color/shape), a
sequence of movements (e.g., a turn), a characteristic of an
environment (e.g., an apparent motion of a wall and/or other
surroundings turning and/or approaching) responsive to the
movement. In some implementations, the sensory input may be
received during one or more training trials of the robotic
apparatus.
[0128] At operation 704, a first or a second actuator associated
with a first or second CDOF of the robotic apparatus is selected
for operation. For example, the first and the second CDOF may
correspond to operation of the motorized joints 102, 106,
respectively, of the manipulator arm 100 in FIG. 1.
[0129] Responsive to selecting the first actuator of the robotic
apparatus, the method may proceed to operation 706, wherein the
neuron network of the adaptive controller may be operated in
accordance with the learning process to generate the first CDOF
control output based on the context (e.g., learn a behavior
associated with the context). In some implementations, the teaching
signal for the first CDOF may comprise (i) a signal provided by the
user via a remote controller, (ii) a signal provided by the
adaptive system for the controlled CDOF, and/or (iii) a weighted
combination of the above (e.g., using constant and/or adjustable
weights).
[0130] Responsive to selecting the second actuator of the robotic
apparatus, the method may proceed to operation 710 wherein the
neuron network of the adaptive controller is operated in accordance
with the learning process configured to generate the second CDOF
control output based on the context (e.g., learn a behavior
associated with the context).
[0131] At operation 708, network configuration associated with the
learned behavior at operation 706 and/or 710 may be stored. In one
or more implementations, the network configuration may comprise
efficacy of one or more connections of the network (e.g., weights)
that may have been adapted during training.
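By way of a non-limiting illustration, storing and restoring a network configuration comprising connection efficacies may be sketched as follows. The JSON serialization, the in-memory store, the behavior label, and the function names are assumptions made for the example; the disclosure does not specify a storage format.

```python
import json


def store_configuration(store, behavior, weights):
    """Serialize the efficacies learned for a behavior into the store."""
    store[behavior] = json.dumps({"weights": weights})


def load_configuration(store, behavior):
    """Restore the efficacies previously stored for a behavior."""
    return json.loads(store[behavior])["weights"]


store = {}  # stands in for an electronic storage medium
store_configuration(store, "first_cdof", [[0.2, -0.1, 0.4]])
restored = load_configuration(store, "first_cdof")
```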
[0132] FIG. 8 illustrates a method of training an adaptive
apparatus to control a robot using a reduced degree of freedom
methodology, in accordance with one or more implementations. The
robot may be characterized by two or more degrees of freedom; the
adaptive controller apparatus may be configured to control a
selectable subset of the CDOF of the robot during a trial.
[0133] At operation 822 of method 800, an actuator associated with
a CDOF is selected for training. In one or more implementations,
the CDOF selection may be effectuated by issuing an instruction to
the robotic control apparatus (e.g., pressing a button, issuing an
audible signal (e.g., a click, and/or a voice command), and/or
using another communication method). In one or more
implementations, the CDOF selection may be effectuated based on a
timer event, and/or training performance reaching a target level.
For example, upon learning to position/move one joint to a target
location, the controller may automatically switch to training of
another joint.
[0134] Responsive to selection of a first actuator associated with
a first CDOF of the robotic apparatus, the method proceeds to
operation 824, where training input for the first CDOF (CDOF1) is
provided. For example, in the context of the robotic arm 100 of
FIG. 1, the first CDOF training comprises training the joint 102.
The training input may include one or more motor commands and/or
action indications communicated using the remote control apparatus
200 of FIG. 2.
[0135] At operation 828, the control output may be determined in
accordance with the learning process and context. In some
implementations, the context may comprise the input into the
adaptive controller e.g., as described above with respect to
operation 702 of method 700.
[0136] The control output determined at operation 828 may comprise
the first CDOF control instructions 830 and/or the second CDOF
control instructions 844. The learning process may be implemented
using an iterative approach wherein control of one CDOF may be
learned partly before switching to learning another CDOF. Such back
and forth switching may be employed until the target performance is
attained.
[0137] Referring now to operation 826, the control CDOF 1 output
830 may be combined with the first CDOF training input provided at
operation 824. The combination of operation 826 may be configured
based on a transfer function. In one or more implementations, the
transfer function may comprise addition, union, a logical `AND`
operation, and/or other operations e.g., as described in commonly
owned and co-pending U.S. patent application Ser. No. 13/842,530
entitled "ADAPTIVE PREDICTOR APPARATUS AND METHODS", filed Mar. 15,
2013, incorporated supra.
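By way of a non-limiting illustration, the addition, union, and logical `AND` transfer functions named above may be sketched as follows. The exact definitions used in the referenced application are not reproduced here; treating addition as operating on analog signals, union as operating on sets of spike times, and `AND` as operating on binary indications is an assumption made for the example, as are the function names.

```python
def combine_add(predicted, teaching):
    """Additive combination of an analog predicted output and teaching input."""
    return predicted + teaching


def combine_union(predicted_spikes, teaching_spikes):
    """Union of two collections of spike times, returned in time order."""
    return sorted(set(predicted_spikes) | set(teaching_spikes))


def combine_and(predicted, teaching):
    """Logical AND of two binary indications."""
    return bool(predicted) and bool(teaching)


combined_analog = combine_add(0.25, 0.5)
combined_spikes = combine_union([1, 3], [2, 3])
combined_binary = combine_and(True, False)
```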
[0138] At operation 832, the first actuator associated with the
first CDOF (CDOF1) of the robotic device is operated in accordance
with the control output determined at operation 826. Within the
context of the robotic arm 100 of FIG. 1, the actuator for joint
102 is operated based on a combination of the teaching input
provided by a trainer and a predicted control signal determined by
the adaptive controller during learning and in accordance with the
context.
[0139] Responsive to selection of a second actuator associated with
a second CDOF of the robotic apparatus, the method proceeds to
operation 840, where training input for the second CDOF (CDOF2) is
provided. For example, in the context of the robotic arm 100 of
FIG. 1, the second CDOF training comprises training the joint 106.
The training input includes one or more motor commands and/or
action indications communicated using the remote control apparatus
200 of FIG. 2.
[0140] Referring now to operation 842, the control CDOF 2 output
844 may be combined with the second CDOF training input provided at
operation 840. The combination of operation 842 may be configured
based on a transfer function. In one or more implementations, the
transfer function may comprise addition, union, a logical `AND`
operation, and/or other operations e.g., as described in commonly
owned and co-pending U.S. patent application Ser. No. 13/842,530
entitled "ADAPTIVE PREDICTOR APPARATUS AND METHODS", filed Mar. 15,
2013, incorporated supra.
[0141] At operation 846, the second actuator associated with the
second CDOF (CDOF2) of the robotic device is operated in accordance
with the control output determined at operation 842. Within the
context of the robotic arm 100 of FIG. 1, the actuator for joint
106 is operated based on a combination of the teaching input
provided by a trainer and a predicted control signal determined by
the adaptive controller during learning and in accordance with the
context. In some implementations, the CDOF 1 may be operated
contemporaneously with the operation of the CDOF 2 based on the
output 830 determined during prior training trials.
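By way of a non-limiting illustration, the combination of the predicted control output with the teaching input (operations 826 and 842) may be sketched as follows. The function name `combine`, its parameters, and the scalar signal representation are hypothetical and are introduced for illustration only; only the additive and logical AND transfer functions are shown, and the referenced application Ser. No. 13/842,530 may define the combiner differently.

```python
def combine(predicted, teaching, mode="add"):
    """Combine predicted control output with teaching input (hypothetical helper)."""
    if mode == "add":
        # Additive transfer function: the teaching input corrects the prediction.
        return predicted + teaching
    if mode == "and":
        # Logical AND: the output is active only when both inputs are nonzero.
        return 1.0 if (predicted != 0 and teaching != 0) else 0.0
    raise ValueError(f"unsupported transfer function: {mode}")


# CDOF 2 is being trained: the trainer's correction is added to the prediction.
cdof2_command = combine(predicted=0.25, teaching=0.5)

# CDOF 1 was trained during prior trials: it runs on the predicted output alone.
cdof1_command = combine(predicted=0.75, teaching=0.0)
```

In this sketch, a zero teaching input leaves the additive output equal to the prediction, which corresponds to operating an already trained CDOF without trainer input.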
[0142] FIG. 9 illustrates a method for training an adaptive
controller of a robot to perform a task using selective state space
training methodology, in accordance with one or more
implementations. In one or more implementations, the task may
comprise following a race circuit (e.g., 1000 in FIG. 10A),
cleaning a room, performing a manufacturing procedure (e.g., shown
by the sequence 1040 in FIG. 10B), and/or operating a multi-joint
manipulator arm 100 of FIG. 1.
[0143] At operation 902 of method 900 illustrated in FIG. 9, a
trajectory portion may be determined. In some implementations, the
trajectory portion may comprise one or more portions (e.g., 1002,
1004 in FIG. 10A and/or 1066, 1070, 1068 in FIG. 10D) of the task
trajectory (e.g., 1000 in FIG. 10A and/or 1060 in FIG. 10D). In one
or more implementations, the trajectory portion is further
characterized by operation of a subset of degrees of freedom of a
robot characterized by multiple CDOF (e.g., joints 102 or 106 of
the arm 100 in FIG. 1).
[0144] At operation 904 a determination may be made as to whether a
teaching input may be expedient for navigating the trajectory
portion selected at operation 902. In some implementations, the
determination of expediency is based on complexity of the task
(e.g., required precision, speed of operation, desired success
rate, minimum failure rate, and/or other).
[0145] Responsive to a determination at operation 904 that the
teaching input is not expedient (and will not be provided), the
method may proceed to operation 910 wherein the trajectory portion
determined at operation 902 may be navigated based on a previously
learned controller configuration. In one or more implementations of
a controller comprising a neuron network, the previously learned
controller configuration may comprise an array of connection
efficacies (e.g., 578 in FIG. 5) determined at one or more prior
trials. In some implementations, the previously learned controller
configuration may comprise a look up table (LUT) learned by the
controller during one or more prior training trials. In some
implementations, the controller training may be configured based on
an online learning methodology, e.g., such as described in co-owned
and co-pending U.S. patent application Ser. No. 14/070,114 entitled
"APPARATUS AND METHODS FOR ONLINE TRAINING OF ROBOTS", filed Nov.
1, 2013, incorporated by reference in its entirety. The trajectory
portion navigation of operation 910 may be configured based on
operation of an adaptive predictor configured to produce predicted
control output in accordance with sensory context, e.g., such as
described in commonly owned and co-pending U.S. patent application
Ser. No. 13/842,530 entitled "ADAPTIVE PREDICTOR APPARATUS AND
METHODS", filed Mar. 15, 2013; co-owned U.S. patent application
Ser. No. 13/842,562 entitled "ADAPTIVE PREDICTOR APPARATUS AND
METHODS FOR ROBOTIC CONTROL", filed Mar. 15, 2013; co-owned U.S.
patent application Ser. No. 13/842,616 entitled "ROBOTIC APPARATUS
AND METHODS FOR DEVELOPING A HIERARCHY OF MOTOR PRIMITIVES", filed
Mar. 15, 2013; co-owned U.S. patent application Ser. No. 13/842,647
entitled "MULTICHANNEL ROBOTIC CONTROLLER APPARATUS AND METHODS",
filed Mar. 15, 2013; and co-owned U.S. patent application Ser. No.
13/842,583 entitled "APPARATUS AND METHODS FOR TRAINING OF ROBOTIC
DEVICES", filed Mar. 15, 2013; each of the foregoing being
incorporated herein by reference in its entirety. Various other
learning controller implementations may be utilized with the
disclosure including, for example, artificial neural network
(analog, binary, spiking, and/or hybrid), single or multi-layer
perceptron, support vector machines, Gaussian process,
convolutional networks, and/or other.
[0146] Responsive to a determination at operation 904 that teaching
input may be expedient, the method may proceed to operation 906,
wherein training input may be determined. In some implementations
of multiple controllable CDOF robots (e.g., the arm 100 in FIG. 1),
the teaching input may comprise control instructions configured to
aid operation of a subset of CDOF (e.g., the joint 102 or 106 in
FIG. 1). In one or more implementations, the teaching input may
comprise control instructions configured to provide supervisory
input to the robot's controller in order to aid the robot to
navigate the trajectory portion selected at operation 902. In one
or more implementations, the teaching input may be provided via a
remote controller apparatus, such as described, e.g., in commonly
owned and co-pending U.S. patent application Ser. No. 13/953,595
entitled "APPARATUS AND METHODS FOR CONTROLLING OF ROBOTIC
DEVICES", filed Jul. 29, 2013; U.S. patent application Ser. No.
13/918,338 entitled "ROBOTIC TRAINING APPARATUS AND METHODS", filed
Jun. 14, 2013; U.S. patent application Ser. No. 13/918,298 entitled
"HIERARCHICAL ROBOTIC CONTROLLER APPARATUS AND METHODS", filed Jun.
14, 2013; U.S. patent application Ser. No. 13/918,620 entitled
"PREDICTIVE ROBOTIC CONTROLLER APPARATUS AND METHODS", filed Jun.
14, 2013; U.S. patent application Ser. No. 13/907,734 entitled
"ADAPTIVE ROBOTIC INTERFACE APPARATUS AND METHODS", filed May 31,
2013, incorporated supra.
[0147] At operation 908 the trajectory portion may be navigated
based on a previously learned controller configuration and the
teaching input determined at operation 906. In some
implementations, the trajectory portion navigation may be
effectuated over one or more training trials configured in
accordance with an online supervised learning methodology, e.g.,
such as described in co-owned U.S. patent application Ser. No.
14/070,114 entitled "APPARATUS AND METHODS FOR ONLINE TRAINING OF
ROBOTS", filed Nov. 1, 2013, incorporated supra. During individual
trials, the controller may be provided with the supervisor input
(e.g., the input 1008, 1028 in FIGS. 10A-10B) configured to
indicate to the controller a target trajectory that is to be
followed. In one or more implementations, the teaching input may
comprise one or more control instructions, way points, and/or
other.
[0148] At operation 912 a determination may be made as to whether
the target task has been accomplished. In one or more
implementations, task completion may be based on an evaluation of a
performance measure associated with the learning process of the
controller. Responsive to a determination at operation 912 that the
target task has not been completed, the method may proceed to
operation 902, wherein additional trajectory portion(s) may be
determined.
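By way of a non-limiting illustration, the flow of operations 902 through 912 of method 900 may be sketched as the loop below. The function `run_method_900`, the callables passed to it, the `skill` table, and the 0.9 threshold are hypothetical stand-ins for the controller, the trainer, and the performance measure; they are assumptions made for illustration and are not part of the disclosure.

```python
def run_method_900(portions, needs_teaching, navigate, task_done, max_trials=100):
    """Iterate over trajectory portions; all callables are hypothetical."""
    log = []
    for trial in range(max_trials):
        portion = portions[trial % len(portions)]   # operation 902
        if needs_teaching(portion):                 # operation 904
            teaching = "teaching_input"             # operation 906 (placeholder)
            navigate(portion, teaching)             # operation 908
            log.append((portion, True))
        else:
            navigate(portion, None)                 # operation 910
            log.append((portion, False))
        if task_done():                             # operation 912
            break
    return log


# Hypothetical per-portion performance measure in the range 0..1.
skill = {"straight": 1.0, "hairpin": 0.0}

def needs_teaching(portion):
    return skill[portion] < 0.9

def navigate(portion, teaching):
    # Assumption: each taught trial improves performance on that portion by 0.5.
    if teaching is not None:
        skill[portion] = min(1.0, skill[portion] + 0.5)

def task_done():
    return all(value >= 0.9 for value in skill.values())

log = run_method_900(["straight", "hairpin"], needs_teaching, navigate, task_done)
```

In this sketch only the "hairpin" portion receives teaching input, while the "straight" portion is navigated using the previously learned configuration.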
[0149] Various exemplary computerized apparatus configured to
implement learning methodology set forth herein are now described
with respect to FIGS. 11A-11D.
[0150] A computerized neuromorphic processing system, consistent
with one or more implementations, for use with an adaptive robotic
controller described, supra, is illustrated in FIG. 11A. The
computerized system 1100 of FIG. 11A may comprise an input device
1110, such as, for example, an image sensor and/or digital image
interface. The input device 1110 may be coupled to the
processing block (e.g., a single or multi-processor block) via the
input communication interface 1114. In some implementations, the
interface 1114 may comprise a wireless interface (e.g., cellular
wireless, Wi-Fi, Bluetooth, and/or other) that enables data
transfer to the processor 1102 from a remote I/O interface 1100.
One such implementation may comprise a central processing apparatus
coupled to one or more remote camera devices providing sensory
input to the pre-processing block.
[0151] The system 1100 further may comprise a random access memory
(RAM) 1108, configured to store neuronal states and connection
parameters and to facilitate synaptic updates. In some
implementations, synaptic updates may be performed according to the
description provided, for example, in commonly owned and
co-pending U.S. patent application Ser. No. 13/239,255 filed Sep.
21, 2011, entitled "APPARATUS AND METHODS FOR SYNAPTIC UPDATE IN A
PULSE-CODED NETWORK", incorporated by reference, supra.
[0152] In some implementations, the memory 1108 may be coupled to
the processor 1102 via a direct connection 1116 (e.g., memory bus).
The memory 1108 may also be coupled to the processor 1102 via a
high-speed processor bus 1112.
[0153] The system 1100 may comprise a nonvolatile storage device
1106. The nonvolatile storage device 1106 may comprise, inter alia,
computer readable instructions configured to implement various
aspects of spiking neuronal network operation. Examples of various
aspects of spiking neuronal network operation may include one or
more of sensory input encoding, connection plasticity, operation
model of neurons, learning rule evaluation, other operations,
and/or other aspects. In one or more implementations, the
nonvolatile storage 1106 may be used to store state information of
the neurons and connections for later use and loading previously
stored network configuration. The nonvolatile storage 1106 may be
used to store state information of the neurons and connections
when, for example, saving and/or loading network state snapshot,
implementing context switching, saving current network
configuration, and/or performing other operations. The current
network configuration may include one or more of connection
weights, update rules, neuronal states, learning rules, and/or
other parameters.
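By way of a non-limiting illustration, saving and loading of a network state snapshot may be sketched as follows. The JSON file format, the function names `save_snapshot` and `load_snapshot`, and the field names of the configuration dictionary are hypothetical and are chosen for illustration only; the disclosure does not specify a storage format.

```python
import json
import os
import tempfile


def save_snapshot(path, config):
    """Write a network configuration to nonvolatile storage (hypothetical format)."""
    with open(path, "w") as f:
        json.dump(config, f)


def load_snapshot(path):
    """Read a previously stored network configuration."""
    with open(path) as f:
        return json.load(f)


# Hypothetical configuration for the current task.
task_a = {
    "connection_weights": [0.1, 0.5, 0.9],
    "neuronal_states": [0.0, 1.0],
    "learning_rule": "rule_A",
}

with tempfile.TemporaryDirectory() as storage:
    path_a = os.path.join(storage, "task_a.json")
    save_snapshot(path_a, task_a)      # save the current network configuration
    restored = load_snapshot(path_a)   # later: load it in place of another task
```

A context switch in this sketch amounts to saving the configuration of the current task and loading the configuration stored for another task.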
[0154] In some implementations, the computerized apparatus 1100 may
be coupled to one or more of an external processing device, a
storage device, an input device, and/or other devices via an I/O
interface 1120. The I/O interface 1120 may include one or more of a
computer I/O bus (PCI-E), wired (e.g., Ethernet) or wireless (e.g.,
Wi-Fi) network connection, and/or other I/O interfaces.
[0155] In some implementations, the input/output (I/O) interface
may comprise a speech input (e.g., a microphone) and a speech
recognition module configured to receive and recognize user
commands.
[0156] It will be appreciated by those skilled in the arts that
various processing devices may be used with computerized system
1100, including but not limited to, a single core/multicore CPU,
DSP, FPGA, GPU, ASIC, combinations thereof, and/or other processing
entities (e.g., computing clusters and/or cloud computing
services). Various user input/output interfaces may be similarly
applicable to implementations of the disclosure including, for
example, an LCD/LED monitor, touch-screen input and display device,
speech input device, stylus, light pen, trackball, and/or other
devices.
[0157] Referring now to FIG. 11B, one implementation of
neuromorphic computerized system configured to implement
classification mechanism using a neuron network is described in
detail. The neuromorphic processing system 1130 of FIG. 11B may
comprise a plurality of processing blocks (micro-blocks) 1140.
Individual micro-blocks may comprise a computing logic core 1132 and
a memory block 1134. The logic core 1132 may be configured to
implement various aspects of neuronal node operation, such as the
node model, and synaptic update rules and/or other tasks relevant
to network operation. The memory block may be configured to store,
inter alia, neuronal state variables and connection parameters
(e.g., weights, delays, I/O mapping) of connections 1138.
[0158] The micro-blocks 1140 may be interconnected with one another
using connections 1138 and routers 1136. As is appreciated by
those skilled in the arts, the connection layout in FIG. 11B is
exemplary, and many other connection implementations (e.g., one to
all, all to all, and/or other maps) are compatible with the
disclosure.
[0159] The neuromorphic apparatus 1130 may be configured to receive
input (e.g., visual input) via the interface 1142. In one or more
implementations, applicable for example to interfacing with
computerized spiking retina, or image array, the apparatus 1130 may
provide feedback information via the interface 1142 to facilitate
encoding of the input signal.
[0160] The neuromorphic apparatus 1130 may be configured to provide
output via the interface 1144. Examples of such output may include
one or more of an indication of recognized object or a feature, a
motor command (e.g., to zoom/pan the image array), and/or other
outputs.
[0161] The apparatus 1130, in one or more implementations, may
interface to external fast response memory (e.g., RAM) via high
bandwidth memory interface 1148, thereby enabling storage of
intermediate network operational parameters. Examples of
intermediate network operational parameters may include one or more
of spike timing, neuron state, and/or other parameters. The
apparatus 1130 may interface to external memory via lower bandwidth
memory interface 1146 to facilitate one or more of program loading,
operational mode changes, retargeting, and/or other operations.
Network node and connection information for a current task may be
saved for future use and flushed. Previously stored network
configuration may be loaded in place of the network node and
connection information for the current task, as described for
example in commonly owned and co-pending U.S. patent application
Ser. No. 13/487,576 entitled "DYNAMICALLY RECONFIGURABLE STOCHASTIC
LEARNING APPARATUS AND METHODS", filed Jun. 4, 2012, incorporated
herein by reference in its entirety. External memory may include
one or more of a Flash drive, a magnetic drive, and/or other
external memory.
[0162] FIG. 11C illustrates one or more implementations of shared
bus neuromorphic computerized system 1145 comprising micro-blocks
1140, described with respect to FIG. 11B, supra. The system 1145 of
FIG. 11C may utilize shared bus 1147, 1149 to interconnect
micro-blocks 1140 with one another.
[0163] FIG. 11D illustrates one implementation of cell-based
neuromorphic computerized system architecture configured to
implement an optical flow encoding mechanism in a spiking network.
The neuromorphic system 1150 may comprise a hierarchy of
processing blocks (cell blocks). In some implementations, the
lowest level L1 cell 1152 of the apparatus 1150 may comprise logic
and memory blocks. The lowest level L1 cell 1152 of the apparatus
1150 may be configured similar to the micro block 1140 of the
apparatus shown in FIG. 11B. A number of cell blocks may be
arranged in a cluster and may communicate with one another via
local interconnects 1162, 1164. Individual clusters may form higher
level cell, e.g., cell L2, denoted as 1154 in FIG. 11D. Similarly,
several L2 clusters may communicate with one another via a second
level interconnect 1166 and form a super-cluster L3, denoted as
1156 in FIG. 11D. The super-clusters 1156 may communicate via a
third level interconnect 1168 and may form a next level cluster. It
will be appreciated by those skilled in the arts that the
hierarchical structure of the apparatus 1150, comprising four
cells-per-level, is merely one exemplary implementation, and other
implementations may comprise more or fewer cells per level, and/or
fewer or more levels.
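By way of a non-limiting illustration, the cell hierarchy may be modeled as a nested structure in which a cell at a given level contains a fixed number of cells of the next lower level. The sketch below assumes that "four cells-per-level" means that each L2 cell groups four L1 cells and each L3 cell groups four L2 cells; the function names and the nested-list representation are hypothetical.

```python
def build_cell(level, cells_per_level=4):
    """Nested-list model: an L1 cell is a leaf; higher levels list their children."""
    if level == 1:
        return "L1"
    return [build_cell(level - 1, cells_per_level) for _ in range(cells_per_level)]


def count_l1(cell):
    """Count the L1 cells contained in a cell of any level."""
    if cell == "L1":
        return 1
    return sum(count_l1(child) for child in cell)


def l1_cells_under(level, cells_per_level=4):
    """Closed form for the same count: cells_per_level ** (level - 1)."""
    return cells_per_level ** (level - 1)


l3 = build_cell(3)           # one super-cluster made of four L2 clusters
l1_in_l3 = count_l1(l3)      # L1 cells available for parallel processing
```

Changing `cells_per_level` or the level passed to `build_cell` corresponds to implementations with more or fewer cells per level and/or more or fewer levels.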
[0164] Different cell levels (e.g., L1, L2, L3) of the apparatus
1150 may be configured to perform functionality of various levels of
complexity. In some implementations, individual L1 cells may
process in parallel different portions of the visual input (e.g.,
encode individual pixel blocks, and/or encode motion signal), with
the L2, L3 cells performing progressively higher level
functionality (e.g., object detection). Individual ones of L2, L3
cells may perform different aspects of operating a robot with one
or more L2/L3 cells processing visual data from a camera, and other
L2/L3 cells operating motor control block for implementing lens
motion when tracking an object or performing lens stabilization
functions.
[0165] The neuromorphic apparatus 1150 may receive input (e.g.,
visual input) via the interface 1160. In one or more
implementations, applicable for example to interfacing with
computerized spiking retina, or image array, the apparatus 1150 may
provide feedback information via the interface 1160 to facilitate
encoding of the input signal.
[0166] The neuromorphic apparatus 1150 may provide output via the
interface 1170. The output may include one or more of an indication
of recognized object or a feature, a motor command, a command to
zoom/pan the image array, and/or other outputs. In some
implementations, the apparatus 1150 may perform all of the I/O
functionality using single I/O block (not shown).
[0167] The apparatus 1150, in one or more implementations, may
interface to external fast response memory (e.g., RAM) via a high
bandwidth memory interface (not shown), thereby enabling storage of
intermediate network operational parameters (e.g., spike timing,
neuron state, and/or other parameters). In one or more
implementations, the apparatus 1150 may interface to external
memory via a lower bandwidth memory interface (not shown) to
facilitate program loading, operational mode changes, retargeting,
and/or other operations. Network node and connection information
for a current task may be saved for future use and flushed.
Previously stored network configuration may be loaded in place of
the network node and connection information for the current task,
as described for example in commonly owned and co-pending U.S.
patent application Ser. No. 13/487,576, entitled "DYNAMICALLY
RECONFIGURABLE STOCHASTIC LEARNING APPARATUS AND METHODS",
incorporated, supra.
[0168] In one or more implementations, one or more portions of the
apparatus 1150 may be configured to operate one or more learning
rules, as described for example in commonly owned and co-pending
U.S. patent application Ser. No. 13/487,576 entitled "DYNAMICALLY
RECONFIGURABLE STOCHASTIC LEARNING APPARATUS AND METHODS", filed
Jun. 4, 2012, incorporated herein by reference in its entirety. In
one such implementation, one block (e.g., the L3 block 1156) may be
used to process input received via the interface 1160 and to
provide a teaching signal to another block (e.g., the L2 block
1154) via internal interconnects 1166, 1168.
[0169] The partial trajectory training methodology (e.g., using the
selective state space sampling) described herein may enable a
trainer to focus on portions of particular interest or value e.g.,
more difficult trajectory portions as compared to other trajectory
portions (e.g., 1004 compared to 1002 in FIG. 10A). By focusing on
these trajectory portions 1004, the overall target task
performance, characterized by e.g., a shorter lap time in racing
implementations, and/or fewer collisions in cleaning
implementations, may be improved in a shorter amount of time, as
compared to performing the same number of trials for the complete
trajectory 1000 in accordance with the prior art methodologies. The
selective state space sampling methodology applied to robotic
devices with multiple CDOF may advantageously allow a trainer to
train one degree of freedom (e.g., a shoulder joint), while
operating another CDOF (an elbow joint) without trainer input using
previously trained controller configurations.
[0170] In some implementations, a user may elect to re-train and/or
to provide additional training to a previously trained controller
configuration for a given target trajectory. The additional
training may be focused on a subset of the trajectory (e.g., one or
more complex actions) so as to reduce training time and/or reduce
over-fitting errors for trajectory portions comprising less complex
actions.
[0171] In some implementations, the trajectory portion (e.g., the
subset characterized by complex actions) may be associated with an
extent of the state space. Based on the training of a controller to
navigate the portion, the state space extent may be reduced and
autonomy of the robotic device may be increased. In some
implementations, the training may enable full autonomy so as to
enable the robot to traverse the trajectory in absence of teaching
input.
[0172] The selective state space sampling methodology may be
combined with online training approaches, e.g., such as described
in co-owned U.S. patent application Ser. No. 14/070,114 entitled
"APPARATUS AND METHODS FOR ONLINE TRAINING OF ROBOTS", filed Nov.
1, 2013, incorporated supra. During some implementations of online
training of a robot to perform a task, a trainer may determine one
or more portions of the task trajectory wherein the controller may
exhibit difficulty controlling the robot. In one or more
implementations, the robot may detect an `unknown state` (e.g.,
previously not encountered). The robot may be configured to request
assistance (e.g., teaching input) from one or more teachers (e.g.,
humans, supervisory processes or entities, algorithms, etc.). In
accord with the selective state space sampling methodology, the
trainer may elect to train the controller on the one or more
challenging trajectory portions online thereby reducing and/or
eliminating delays that may be associated with offline training
approaches of the prior art that may rely on
recording/replaying/review of training results in order to evaluate
quality of training.
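By way of a non-limiting illustration, detection of an `unknown state` and the resulting request for teaching input may be sketched as follows. The disclosure does not specify how a previously not encountered state is detected; the Euclidean distance test, the `radius` threshold, and the function name `is_unknown` are assumptions made for illustration only.

```python
import math


def is_unknown(state, known_states, radius=0.5):
    """A state is 'unknown' if no previously encountered state lies within radius."""
    return all(math.dist(state, known) > radius for known in known_states)


# Hypothetical two-dimensional states encountered during prior trials.
known = [(0.0, 0.0), (1.0, 0.0)]
requests = []

for state in [(0.1, 0.0), (1.0, 0.2), (3.0, 3.0)]:
    if is_unknown(state, known):
        requests.append(state)   # request assistance (teaching input) online
        known.append(state)      # after training, the state is treated as known
```

In this sketch teaching input is requested only for the state that lies far from all previously encountered states, so that trainer effort is spent on the unfamiliar portion of the state space.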
[0173] One or more of the methodologies comprising partial degree
of freedom learning and/or use of reduced CDOF robotic controller
described herein may facilitate training and/or operation of
robotic devices. In some implementations, a user interface may be
configured to operate a subset of robot's CDOF (e.g., one joint of
a two joint robotic manipulator arm). The methodologies of the
present disclosure may enable a user to train complex robotic
devices (e.g., comprising multiple CDOF) using the reduced CDOF
control interface. During initial training of a given CDOF subset,
the user may focus on achieving target performance (e.g., placing
the manipulator joint at a target orientation) without being
burdened by control of the whole robotic device. During subsequent
training trials for another CDOF subset, operation of the robot by
the user (e.g., the joint 106) may be augmented by the controller
output for the already trained CDOF (e.g., the joint 102 in FIG.
1). Such cooperation between the controller and the user may enable
the latter to focus on training the second CDOF subset without
being distracted by the necessity of controlling the first CDOF
subset. The methodology described herein may enable use of simpler
remote control devices (e.g., single joystick) to train multiple
CDOF robots, more complex tasks, and/or more robust learning
results (e.g., in a shorter time and/or with a lower error compared
to the prior art). By gradually training one or more DOF of a
robot, operator involvement may be gradually reduced. For example,
the trainer may provide occasional corrections to CDOF that may
require an improvement in performance, switching from one DOF to
another as needed.
[0174] In some implementations, the training methodologies
described herein may reduce cognitive load on a human trainer,
e.g., by enabling the trainer to control a subset of DOF at a given
trial, and alleviating the need to coordinate control signals for
all DOF.
[0175] Dexterity constraints placed on the user may be reduced,
when controlling fewer degrees of freedom (e.g., the user may use a
single hand to train one DOF at a time of a six DOF robot).
[0176] The selective state space sampling methodology described
herein may reduce training time compared to the prior art as only
the DOF and/or trajectory portions that require improvement in
performance may be trained. As training progresses, trainer
involvement may be reduced over time. In some implementations, the
trainer may provide corrections to DOF that need to improve
performance, switching from one to the other as needed.
[0177] The selective state space sampling methodology described
herein may enable development of robotic autonomy. Based on
learning to navigate one or more portions of the task trajectory
and/or operate one or more CDOF, the robot may gradually gain
autonomy (e.g., perform actions based on the learned behaviors
and in absence of supervision by a trainer or other entity).
[0178] Dexterity requirements placed on a trainer
may be simplified as the user may utilize, e.g., a single to train
and/or control a complex (e.g., with multiple CDOF) robotic body.
Use of the partial degree of freedom (cascade) training methodology
of the disclosure may enable use of a simpler (e.g., a single DOF)
control interface configured, e.g., to control a single CDOF to
control a complex robotic apparatus comprising multiple CDOF.
[0179] Partial degree of freedom training and/or selective state
space sampling training may enable the trainer to focus on a subset
of DOF that may be more difficult to train, compared to other DOF.
Such approach may reduce training time for the adaptive control
system as additional training time may be dedicated to
the difficult to train DOF portion without retraining (and
potentially confusing) a better behaving DOF portion.
[0180] It will be recognized that while certain aspects of the
disclosure are described in terms of a specific sequence of steps
of a method, these descriptions are only illustrative of the
broader methods of the disclosure, and may be modified as required
by the particular application. Certain steps may be rendered
unnecessary or optional under certain circumstances. Additionally,
certain steps or functionality may be added to the disclosed
implementations, or the order of performance of two or more steps
permuted. All such variations are considered to be encompassed
within the disclosure disclosed and claimed herein.
[0181] While the above detailed description has shown, described,
and pointed out novel features of the disclosure as applied to
various implementations, it will be understood that various
omissions, substitutions, and changes in the form and details of
the device or process illustrated may be made by those skilled in
the art without departing from the disclosure. This description is
in no way meant to be limiting, but rather should be taken as
illustrative of the general principles of the technology. The scope
of the disclosure should be determined with reference to the
claims.
* * * * *