U.S. patent application number 16/171681 was filed with the patent office on 2018-10-26 for apparatus and methods for matrix addition and subtraction.
The applicant listed for this patent is Cambricon Technologies Corporation Limited. Invention is credited to Tianshi Chen, Yunji Chen, Shaoli Liu, Xiao Zhang.
United States Patent Application: 20190065436
Kind Code: A1
Application Number: 16/171681
Document ID: /
Family ID: 60160565
Zhang; Xiao; et al.
Publication Date: February 28, 2019
APPARATUS AND METHODS FOR MATRIX ADDITION AND SUBTRACTION
Abstract
Aspects for matrix addition in a neural network are
described herein. The aspects may include a controller unit
configured to receive a matrix-addition instruction. The aspects
may further include a computation module configured to receive a
first matrix and a second matrix. The first matrix may include one
or more first elements and the second matrix includes one or more
second elements. The one or more first elements and the one or more
second elements may be arranged in accordance with a
two-dimensional data structure. The computation module may be
further configured to respectively add each of the first elements
to each of the second elements based on a correspondence in the
two-dimensional data structure to generate one or more third
elements for a third matrix.
Inventors: Zhang; Xiao (Beijing, CN); Liu; Shaoli (Beijing, CN); Chen; Tianshi (Beijing, CN); Chen; Yunji (Beijing, CN)

Applicant:
Name | City | State | Country | Type
Cambricon Technologies Corporation Limited | Beijing | | CN |
Family ID: 60160565
Appl. No.: 16/171681
Filed: October 26, 2018
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
PCT/CN2016/081117 | May 5, 2016 |
16171681 | |
Current U.S. Class: 1/1
Current CPC Class: G06N 3/04 (20130101); G06F 17/16 (20130101)
International Class: G06F 17/16 (20060101)

Foreign Application Data

Date | Code | Application Number
Apr 26, 2016 | CN | 201610266805.X
Claims
1. An apparatus for matrix operations in a neural network,
comprising: a controller unit configured to receive a
matrix-addition instruction that indicates a first address of a
first matrix and a second address of a second matrix; and a
computation module configured to: retrieve the first matrix and the
second matrix from a storage device based on the first address of
the first matrix and the second address of the second matrix,
wherein the first matrix includes one or more first elements and
the second matrix includes one or more second elements, and wherein
the one or more first elements and the one or more second elements
are arranged in accordance with a two-dimensional data structure,
and respectively add each of the first elements to each of the
second elements based on a correspondence in the two-dimensional
data structure in accordance with the matrix-addition instruction
to generate one or more third elements for a third matrix.
2. The apparatus of claim 1, wherein the computation module
includes a data controller configured to select a first portion of
the first elements and a second portion of the second elements.
3. The apparatus of claim 2, wherein the computation module
includes one or more adders configured to respectively add each of
the first portion of the first elements to each of the second
portion of the second elements.
4. The apparatus of claim 1, wherein the matrix-addition
instruction includes a first size of the first matrix, a second
size of the second matrix, and an output address.
5. The apparatus of claim 4, further comprising an instruction
register configured to store the first address of the first matrix,
the first size of the first matrix, the second address of the
second matrix, the second size of the second matrix, and the output
address.
6. The apparatus of claim 1, wherein the first matrix is a first
vector, and wherein the second matrix is a second vector.
7. The apparatus of claim 1, wherein the controller unit comprises
an instruction obtaining module configured to obtain the
matrix-addition instruction from an instruction storage device.
8. The apparatus of claim 7, wherein the controller unit further
comprises a decoding module configured to decode the
matrix-addition instruction into one or more
micro-instructions.
9. The apparatus of claim 8, wherein the controller unit further
comprises an instruction queue module configured to temporarily
store the matrix-addition instruction and one or more previously
received instructions, and retrieve information corresponding to
operation fields in the matrix-addition instruction.
10. The apparatus of claim 9, wherein the controller unit further
comprises an instruction register configured to store the
information corresponding to the operation fields in the
matrix-addition instruction.
11. The apparatus of claim 10, wherein the controller unit further
comprises a dependency processing unit configured to determine
whether the matrix-addition instruction has a dependency
relationship with the one or more previously received
instructions.
12. The apparatus of claim 11, wherein the controller unit further
comprises a storage queue module configured to store the
matrix-addition instruction while the dependency processing unit is
determining whether the matrix-addition instruction has the
dependency relationship with the one or more previously received
instructions.
13. An apparatus for matrix operations in a neural network,
comprising: a controller unit configured to receive a
matrix-subtraction instruction that indicates a first address of
a first matrix and a second address of a second matrix; and a
computation module configured to retrieve the first matrix and the
second matrix from a storage device based on the first address of
the first matrix and the second address of the second matrix,
wherein the first matrix includes one or more first elements and
the second matrix includes one or more second elements, and wherein
the one or more first elements and the one or more second elements
are arranged in accordance with a two-dimensional data structure,
and respectively subtract each of the first elements from each of
the second elements based on a correspondence in the
two-dimensional data structure in accordance with the
matrix-subtraction instruction to generate one or more third
elements for a third matrix.
14. The apparatus of claim 13, wherein the computation module
includes a data controller configured to select a first portion of
the first elements and a second portion of the second elements.
15. The apparatus of claim 14, wherein the computation module
includes one or more subtractors configured to respectively
subtract each of the first portion of the first elements from each
of the second portion of the second elements.
16. The apparatus of claim 13, wherein the matrix-subtraction
instruction includes a first size of the first matrix, a second
size of the second matrix, and an output address.
17. The apparatus of claim 13, further comprising an instruction
register configured to store the first address of the first matrix,
the first size of the first matrix, the second address of the
second matrix, the second size of the second matrix, and the output
address.
18. The apparatus of claim 13, wherein the first matrix is a first
vector, and wherein the second matrix is a second vector.
19. The apparatus of claim 13, wherein the controller unit
comprises an instruction obtaining module configured to obtain the
matrix-subtraction instruction from an instruction storage
device.
20. The apparatus of claim 19, wherein the controller unit further
comprises a decoding module configured to decode the
matrix-subtraction instruction into one or more
micro-instructions.
21. The apparatus of claim 20, wherein the controller unit further
comprises an instruction queue module configured to temporarily
store the matrix-subtraction instruction and one or more previously
received instructions, and retrieve information corresponding to
operation fields in the matrix-subtraction instruction.
22. The apparatus of claim 21, wherein the controller unit further
comprises an instruction register configured to store the
information corresponding to the operation fields in the
matrix-subtraction instruction.
23. The apparatus of claim 22, wherein the controller unit further
comprises a dependency processing unit configured to determine
whether the matrix-subtraction instruction has a dependency
relationship with the one or more previously received
instructions.
24. The apparatus of claim 23, wherein the controller unit further
comprises a storage queue module configured to store the
matrix-subtraction instruction while the dependency processing unit
is determining whether the matrix-subtraction instruction has the
dependency relationship with the one or more previously received
instructions.
25. A method for matrix operations in a neural network, comprising:
receiving, by a controller unit, a matrix-addition instruction that
indicates a first address of a first matrix and a second address
of a second matrix; retrieving, by a computation module, the
first matrix and the second matrix based on the first address of
the first matrix and the second address of the second matrix,
wherein the first matrix includes one or more first elements and
the second matrix includes one or more second elements, and wherein
the one or more first elements and the one or more second elements
are arranged in accordance with a two-dimensional data structure;
and respectively adding, in response to the matrix-addition
instruction, by the computation module, each of the first elements
to each of the second elements based on a correspondence in the
two-dimensional data structure to generate one or more third
elements for a third matrix.
26. The method of claim 25, further comprising selecting, by a data
controller of the computation module, a first portion of the first
elements and a second portion of the second elements.
27. The method of claim 26, further comprising respectively adding,
by one or more adders of the computation module, each of the first
portion of the first elements to each of the second portion of the
second elements.
28. The method of claim 25, wherein the matrix-addition instruction
includes a first size of the first matrix, a second size of the
second matrix, and an output address.
29. The method of claim 25, further comprising storing, by an
instruction register, the first address of the first matrix, the
first size of the first matrix, the second address of the second
matrix, the second size of the second matrix, and an output
address.
30. The method of claim 25, further comprising obtaining, by an
instruction obtaining module of the controller unit, the
matrix-addition instruction from an instruction storage device.
31. The method of claim 30, further comprising decoding, by a
decoding module of the controller unit, the matrix-addition
instruction into one or more micro-instructions.
32. The method of claim 31, further comprising temporarily storing,
by an instruction queue module of the controller unit, the
matrix-addition instruction and one or more previously received
instructions, and retrieving information corresponding to operation
fields in the matrix-addition instruction.
33. The method of claim 32, further comprising storing, by an
instruction register of the controller unit, the information
corresponding to the operation fields in the matrix-addition
instruction.
34. The method of claim 33, further comprising determining, by a
dependency processing unit of the controller unit, whether the
matrix-addition instruction has a dependency relationship with the
one or more previously received instructions.
35. The method of claim 34, further comprising storing, by a
storage queue module of the controller unit, the matrix-addition
instruction while the dependency processing unit is determining
whether the matrix-addition instruction has the dependency
relationship with the one or more previously received
instructions.
36. A method of matrix operations in a neural network, comprising:
receiving, by a controller unit, a matrix-subtraction instruction
that indicates a first address of a first matrix and a second
address of a second matrix; retrieving, by a computation module,
the first matrix and the second matrix from a storage device based
on the first address of the first matrix and the second address of
the second matrix, wherein the first matrix includes one or more
first elements and the second matrix includes one or more second
elements, and wherein the one or more first elements and the one or
more second elements are arranged in accordance with a
two-dimensional data structure, and respectively subtracting, by
the computation module, each of the first elements from each of the
second elements based on a correspondence in the two-dimensional
data structure in accordance with the matrix-subtraction
instruction to generate one or more third elements for a third
matrix.
37. The method of claim 36, further comprising selecting, by a data
controller, a first portion of the first elements and a second
portion of the second elements.
38. The method of claim 37, further comprising respectively
subtracting, by one or more subtractors of the computation module,
each of the first portion of the first elements from each of the
second portion of the second elements.
39. The method of claim 36, wherein the matrix-subtraction
instruction includes a first size of the first matrix, a second
size of the second matrix, and an output address.
40. The method of claim 39, further comprising storing, by an
instruction register, the first address of the first matrix, the
first size of the first matrix, the second address of the second
matrix, the second size of the second matrix, and the output
address.
41. The method of claim 36, wherein the first matrix is a first
vector, and wherein the second matrix is a second vector.
42. The method of claim 36, further comprising obtaining, by an
instruction obtaining module of the controller unit, the
matrix-subtraction instruction from an instruction storage
device.
43. The method of claim 42, further comprising decoding, by a
decoding module of the controller unit, the matrix-subtraction
instruction into one or more micro-instructions.
44. The method of claim 43, further comprising temporarily storing,
by an instruction queue module of the controller unit, the
matrix-subtraction instruction and one or more previously received
instructions, and retrieving information corresponding to operation
fields in the matrix-subtraction instruction.
45. The method of claim 44, further comprising storing, by an
instruction register of the controller unit, the information
corresponding to the operation fields in the matrix-subtraction
instruction.
46. The method of claim 45, further comprising determining, by a
dependency processing unit of the controller unit, whether the
matrix-subtraction instruction has a dependency relationship with
the one or more previously received instructions.
47. The method of claim 46, further comprising storing, by a
storage queue module of the controller unit, the matrix-subtraction
instruction while the dependency processing unit is determining
whether the matrix-subtraction instruction has the dependency
relationship with the one or more previously received instructions.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present invention is a continuation-in-part of PCT
Application No. PCT/CN2016/081117, filed on May 5, 2016, which
claims priority to commonly owned CN application number
201610266805.X, filed on Apr. 26, 2016. The entire contents of each
of the aforementioned applications are incorporated herein by
reference.
BACKGROUND
[0002] Multilayer neural networks (MNNs) are widely applied in
fields such as pattern recognition, image processing, function
approximation, and optimal computation. In recent years, due to
their higher recognition accuracy and better parallelizability,
multilayer artificial neural networks have received increasing
attention from academic and industrial communities.
[0003] A known method to add matrices of a multilayer artificial
neural network is to use a general-purpose processor. Such a method
uses a general-purpose register file and a general-purpose
functional unit to execute general-purpose instructions to perform
matrix-addition operations in MNNs. However, one defect of this
method is that the operational performance of a single
general-purpose processor is low and cannot meet the performance
requirements of typical multilayer neural network operations. When
multiple general-purpose processors execute concurrently, the
intercommunication among them also becomes a performance
bottleneck.
[0004] Another known method to add matrices of the multilayer
artificial neural network is to use a graphics processing unit
(GPU). Such a method uses a general-purpose register file and a
general-purpose stream processing unit to execute general purpose
single-instruction-multiple-data (SIMD) instructions to support the
algorithms in MNNs. However, since a GPU contains only a rather
small on-chip cache, the model data (weight values) of a multilayer
artificial neural network may be repeatedly moved from off-chip
memory; off-chip bandwidth thus becomes a main performance
bottleneck and causes significant power consumption.
SUMMARY
[0005] The following presents a simplified summary of one or more
aspects in order to provide a basic understanding of such aspects.
This summary is not an extensive overview of all contemplated
aspects, and is intended to neither identify key or critical
elements of all aspects nor delineate the scope of any or all
aspects. Its sole purpose is to present some concepts of one or
more aspects in a simplified form as a prelude to the more detailed
description that is presented later.
[0006] One example aspect of the present disclosure provides an
example apparatus for matrix operations in a neural network. The
example apparatus may include a controller unit configured to
receive a matrix-addition instruction. The example apparatus may
include a computation module configured to receive a first matrix
and a second matrix from a storage device. The first matrix may
include one or more first elements and the second matrix includes
one or more second elements. The one or more first elements and the
one or more second elements may be arranged in accordance with a
two-dimensional data structure. The computation module may be
further configured to respectively add each of the first elements
to each of the second elements based on a correspondence in the
two-dimensional data structure in accordance with the
matrix-addition instruction to generate one or more third elements
for a third matrix.
[0007] Another example apparatus may include a controller unit
configured to receive a matrix-add-scalar instruction. The example
apparatus may further include a computation module configured to
receive a first matrix and a scalar value from a storage device.
The first matrix may include one or more first elements. The one or
more first elements may be arranged in accordance with a
two-dimensional data structure. The computation module may be
further configured to respectively add the scalar value to each of
the one or more first elements of the first matrix to generate one
or more second elements for a second matrix.
[0008] Another example aspect of the present disclosure provides an
example method for matrix operations in a neural network. The
example method may include receiving, by a computation module, a
first matrix and a second matrix. The first matrix may include one
or more first elements and the second matrix includes one or more
second elements. The one or more first elements and the one or more
second elements may be arranged in accordance with a
two-dimensional data structure. The example method may further
include respectively adding, by the computation module, each of the
first elements to each of the second elements based on a
correspondence in the two-dimensional data structure to generate
one or more third elements for a third matrix.
[0009] Another example method may include receiving, by a direct
memory access unit, a first matrix and a scalar value. The first
matrix includes one or more first elements. The one or more first
elements may be arranged in accordance with a two-dimensional data
structure. The example method may further include respectively
adding, by a computation module, the scalar value to each of the
one or more first elements of the first matrix to generate one or
more second elements for a second matrix.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] The disclosed aspects will hereinafter be described in
conjunction with the appended drawings, provided to illustrate and
not to limit the disclosed aspects, wherein like designations
denote like elements, and in which:
[0011] FIG. 1 illustrates a block diagram of an example neural
network acceleration processor by which matrix operations may be
implemented in a neural network;
[0012] FIG. 2 illustrates an example matrix operation between two
matrices that may be performed by the example neural network
acceleration processor;
[0013] FIG. 3 illustrates an example computation module in the
example neural network acceleration processor by which matrix
operations may be implemented in a neural network;
[0014] FIG. 4 illustrates a flow chart of an example method for
matrix operations in a neural network; and
[0015] FIG. 5 illustrates a flow chart of another example method
for matrix operations in a neural network.
DETAILED DESCRIPTION
[0016] Various aspects are now described with reference to the
drawings. In the following description, for purpose of explanation,
numerous specific details are set forth in order to provide a
thorough understanding of one or more aspects. It may be evident,
however, that such aspect(s) may be practiced without these
specific details.
[0017] In the present disclosure, the terms "comprising" and
"including," as well as their derivatives, are meant to be
inclusive rather than limiting; the term "or" is also inclusive,
meaning and/or.
[0018] In this specification, the various embodiments used to
illustrate principles of the present disclosure are for
illustrative purposes only and should not be understood as limiting
the scope of the present disclosure in any way. The following
description, taken in conjunction with the accompanying drawings,
is intended to facilitate a thorough understanding of the
illustrative embodiments of the present disclosure defined by the
claims and their equivalents. The following description includes
specific details to facilitate understanding; however, these
details are only for illustrative purposes. Persons skilled in the
art should therefore understand that various alterations and
modifications may be made to the embodiments illustrated in this
description without departing from the scope and spirit of the
present disclosure. In addition, for clarity and conciseness, some
well-known functionality and structures are not described.
Identical reference numbers refer to identical functions and
operations throughout the accompanying drawings.
[0019] Addition between two matrices in a neural network may be
presented as follows: R=A+B, in which A represents a first matrix,
B represents a second matrix, and R represents a result matrix.
Similarly, subtraction between the two matrices may be presented as
R=A-B. The first matrix and the second matrix may be structured to
include m rows and n columns and may be referred to as an m×n
matrix. In other words, both the elements of the first matrix and
the second matrix may be arranged in a two-dimensional data
structure that includes m rows and n columns. The first matrix A
may be described as
$$\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1i} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2i} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{j1} & a_{j2} & \cdots & a_{ji} & \cdots & a_{jn} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mi} & \cdots & a_{mn}
\end{bmatrix}$$
and the second matrix B may be described as
$$\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1i} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2i} & \cdots & b_{2n} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
b_{j1} & b_{j2} & \cdots & b_{ji} & \cdots & b_{jn} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
b_{m1} & b_{m2} & \cdots & b_{mi} & \cdots & b_{mn}
\end{bmatrix}.$$
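The element-wise addition and subtraction described above can be sketched in plain Python; this is an illustrative model only, with nested lists standing in for the m×n two-dimensional data structure, not the patented hardware:

```python
def matrix_add(a, b):
    """Return R = A + B by adding corresponding elements of the
    two-dimensional data structures."""
    return [[a[j][i] + b[j][i] for i in range(len(a[0]))]
            for j in range(len(a))]

def matrix_sub(a, b):
    """Return R = A - B by subtracting corresponding elements."""
    return [[a[j][i] - b[j][i] for i in range(len(a[0]))]
            for j in range(len(a))]

A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]
print(matrix_add(A, B))  # [[11, 22], [33, 44]]
print(matrix_sub(A, B))  # [[-9, -18], [-27, -36]]
```

Each result element depends only on the pair of elements at the same (row, column) position, which is the "correspondence in the two-dimensional data structure" the claims recite.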
[0020] FIG. 1 illustrates a block diagram of an example neural
network acceleration processor by which matrix operations may be
implemented in a neural network.
[0021] As depicted, the example neural network acceleration
processor 100 may include a controller unit 106, a direct memory
access unit 102, a computation module 110, and a matrix caching
unit 112. Any of the above-mentioned components or devices may be
implemented by a hardware circuit (e.g., an application-specific
integrated circuit (ASIC), coarse-grained reconfigurable
architectures (CGRAs), field-programmable gate arrays (FPGAs),
analog circuits, memristors, etc.).
[0022] In some examples, a matrix-addition instruction may be
transmitted from an instruction storage device 134 to the controller
unit 106. An instruction obtaining module 132 may be configured to
obtain a matrix addition instruction from the instruction storage
device 134 and transmit the instruction to a decoding module
130.
[0023] The decoding module 130 may be configured to decode the
instruction. The instruction may include one or more operation
fields that indicate parameters for executing the instruction. The
parameters may refer to identification numbers of different
registers ("register ID" hereinafter) in the instruction register
126. Thus, by modifying the parameters in the instruction register
126, the neural network acceleration processor 100 may modify the
instruction without receiving new instructions. The decoded
instruction may be transmitted by the decoding module 130 to an
instruction queue module 128. In some other examples, the one or
more operation fields may store immediate values, such as addresses
in the memory 101 and a scalar value, rather than the register
IDs.
[0024] The instruction queue module 128 may be configured to
temporarily store the received instruction and/or one or more
previously received instructions. Further, the instruction queue
module 128 may be configured to retrieve information according to
the register IDs included in the instruction from the instruction
register 126.
[0025] For example, the instruction queue module 128 may be
configured to retrieve information corresponding to operation
fields in the instruction from the instruction register 126.
Information for the operation fields in a matrix-add-matrix (MAM)
instruction, for example, may include a starting address of a first
matrix, a length of the first matrix, a starting address of a
second matrix, a length of the second matrix, and an address for a
result matrix. As depicted, in some examples, the instruction
register 126 may be implemented by one or more registers external
to the controller unit.
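As a rough illustration of the operation fields just described, the following Python sketch models a MAM instruction and the register-ID lookup performed by the instruction queue module. The field names, the dictionary standing in for the instruction register 126, and the example addresses are assumptions for illustration, not the actual instruction format:

```python
from dataclasses import dataclass

@dataclass
class MAMInstruction:
    # Operation fields of a hypothetical matrix-add-matrix instruction.
    first_matrix_addr: int   # starting address of the first matrix
    first_matrix_len: int    # length of the first matrix
    second_matrix_addr: int  # starting address of the second matrix
    second_matrix_len: int   # length of the second matrix
    result_addr: int         # address for the result matrix

def resolve_fields(register_ids, instruction_register):
    """Look up operation-field values by register ID, as the instruction
    queue module is described as doing against the instruction register."""
    return MAMInstruction(*(instruction_register[rid] for rid in register_ids))

# Example: registers 0..4 hold the five field values.
regs = {0: 0x1000, 1: 64, 2: 0x2000, 3: 64, 4: 0x3000}
inst = resolve_fields([0, 1, 2, 3, 4], regs)
print(inst.result_addr)  # 12288 (0x3000)
```

Because the fields are resolved through register IDs rather than stored as immediates, changing a value in the instruction register changes the behavior of the instruction without issuing a new one, as paragraph [0023] notes.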
[0026] The instruction register 126 may be further configured to
store scalar values for the instruction. Once the relevant values
are retrieved, the instruction may be sent to a dependency
processing unit 124.
[0027] The dependency processing unit 124 may be configured to
determine whether the instruction has a dependency relationship
with the data of a previous instruction that is being executed. If
such a relationship exists, the instruction may be stored in the
storage queue module 122 until it no longer has a dependency
relationship on the data with any previous instruction that has not
finished executing. If no dependency relationship exists, the
controller unit 106 may be configured to decode the instruction
into micro-instructions for controlling operations of other
modules, including the direct memory access unit 102 and the
computation module 110.
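The dependency check described in this paragraph might be modeled roughly as follows; the address-range overlap test and the reads/writes representation are illustrative assumptions, not the patented mechanism:

```python
def has_dependency(inst, in_flight):
    """True if `inst` touches a memory range that conflicts with an
    unfinished instruction (read-after-write, write-after-read, or
    write-after-write), so it must wait in the storage queue."""
    def overlaps(r1, r2):
        # Each range is (start_address, length).
        return r1[0] < r2[0] + r2[1] and r2[0] < r1[0] + r1[1]

    for prev in in_flight:
        for r in inst["reads"]:
            if any(overlaps(r, w) for w in prev["writes"]):
                return True  # read-after-write hazard
        for w in inst["writes"]:
            if any(overlaps(w, p) for p in prev["reads"] + prev["writes"]):
                return True  # write-after-read / write-after-write hazard
    return False

# A matrix-addition instruction reading matrices at 0x1000 and 0x2000:
add_inst = {"reads": [(0x1000, 64), (0x2000, 64)], "writes": [(0x3000, 64)]}
prev = {"reads": [], "writes": [(0x2000, 64)]}  # still writing the second matrix
print(has_dependency(add_inst, [prev]))  # True
```

In this model the addition instruction would sit in the storage queue until the earlier write to 0x2000 retires, after which the check returns False and the instruction can proceed.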
[0028] The direct memory access unit 102 may be configured to
access an external address range (e.g., in an external storage
device such as a memory 101) and directly read or write matrix data
into respective caching units in the computation module 110 in
accordance with the received instruction. Hereinafter, a caching
unit (e.g., the matrix caching unit 112) may refer to an
on-chip caching unit integrated in the neural network acceleration
processor 100, rather than other storage devices in memory 101 or
other external devices. In some examples, the on-chip caching unit
may be implemented as an on-chip buffer, an on-chip Static Random
Access Memory (SRAM), or other types of on-chip storage devices
that may provide higher access speed than the external memory. In
some other examples, the instruction register 126 may be
implemented as a scratchpad memory, e.g., Dynamic random-access
memory (DRAM), embedded DRAM (eDRAM), memristor, 3D-DRAM,
non-volatile memory, etc.
[0029] For example, the direct memory access unit 102 may store
data (i.e., elements) of the first matrix A and the second matrix B
in the matrix caching unit 112 or other caching units in the
computation module 110.
[0030] Upon receiving a matrix operation instruction from the
controller unit 106, the computation module 110 may be configured
to receive the data of the first matrix A and the second matrix B.
Since the elements of the first matrix A and the second matrix B
are both arranged and stored in the same m×n two-dimensional
data structure, the computation module 110 may be configured to add
each of the elements in the first matrix A to the elements in the
second matrix B based on a correspondence in the two-dimensional
data structure. For example, when the first matrix A may be
described as
$$\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1i} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2i} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{j1} & a_{j2} & \cdots & a_{ji} & \cdots & a_{jn} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mi} & \cdots & a_{mn}
\end{bmatrix}$$
and the second matrix B may be described as
$$\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1i} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2i} & \cdots & b_{2n} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
b_{j1} & b_{j2} & \cdots & b_{ji} & \cdots & b_{jn} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
b_{m1} & b_{m2} & \cdots & b_{mi} & \cdots & b_{mn}
\end{bmatrix},$$
the computation module 110 may be configured to add the elements in
the first matrix A and in the second matrix B correspondingly to
generate a result matrix that may be described as
$$\begin{bmatrix}
a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1i}+b_{1i} & \cdots & a_{1n}+b_{1n} \\
a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2i}+b_{2i} & \cdots & a_{2n}+b_{2n} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{j1}+b_{j1} & a_{j2}+b_{j2} & \cdots & a_{ji}+b_{ji} & \cdots & a_{jn}+b_{jn} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{m1}+b_{m1} & a_{m2}+b_{m2} & \cdots & a_{mi}+b_{mi} & \cdots & a_{mn}+b_{mn}
\end{bmatrix}.$$
Similarly, when the matrix operation instruction includes an
operation code that instructs the computation module 110 to perform
a subtraction between the two matrices ("matrix-minus-matrix"
operation hereinafter), the computation module 110 may be
configured to subtract the elements in the second matrix B from the
elements in the first matrix A correspondingly to generate a result
matrix that may be described as
$$\begin{bmatrix}
a_{11}-b_{11} & a_{12}-b_{12} & \cdots & a_{1i}-b_{1i} & \cdots & a_{1n}-b_{1n} \\
a_{21}-b_{21} & a_{22}-b_{22} & \cdots & a_{2i}-b_{2i} & \cdots & a_{2n}-b_{2n} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{j1}-b_{j1} & a_{j2}-b_{j2} & \cdots & a_{ji}-b_{ji} & \cdots & a_{jn}-b_{jn} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{m1}-b_{m1} & a_{m2}-b_{m2} & \cdots & a_{mi}-b_{mi} & \cdots & a_{mn}-b_{mn}
\end{bmatrix}.$$
In an example wherein the matrix operation instruction instructs
the computation module 110 to add a scalar value to the first
matrix A, the computation module 110 may be configured to first
duplicate the scalar value (hereinafter "S") to each element of the
second matrix B and then similarly add the elements in the first
matrix A to the second matrix B to generate a result matrix as
$$\begin{bmatrix}
a_{11}+S & a_{12}+S & \cdots & a_{1i}+S & \cdots & a_{1n}+S \\
a_{21}+S & a_{22}+S & \cdots & a_{2i}+S & \cdots & a_{2n}+S \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{j1}+S & a_{j2}+S & \cdots & a_{ji}+S & \cdots & a_{jn}+S \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{m1}+S & a_{m2}+S & \cdots & a_{mi}+S & \cdots & a_{mn}+S
\end{bmatrix}.$$
Similarly, the computation module 110 may also be instructed to
perform an operation to subtract a scalar value ("S") from the
first matrix A ("matrix-minus-scalar" operation hereinafter) to
generate a result matrix as
$$\begin{bmatrix}
a_{11}-S & a_{12}-S & \cdots & a_{1i}-S & \cdots & a_{1n}-S \\
a_{21}-S & a_{22}-S & \cdots & a_{2i}-S & \cdots & a_{2n}-S \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{j1}-S & a_{j2}-S & \cdots & a_{ji}-S & \cdots & a_{jn}-S \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{m1}-S & a_{m2}-S & \cdots & a_{mi}-S & \cdots & a_{mn}-S
\end{bmatrix}.$$
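The matrix-add-scalar and matrix-minus-scalar operations described above can be sketched as follows, assuming (as the description suggests) that the scalar value S is first duplicated to every element position and the element-wise operation then proceeds as before. This is an illustrative Python model, not the hardware implementation:

```python
def broadcast_scalar(s, rows, cols):
    """Duplicate the scalar value S into an m-by-n matrix."""
    return [[s] * cols for _ in range(rows)]

def matrix_add_scalar(a, s):
    """Matrix-add-scalar: duplicate S, then add element-wise."""
    b = broadcast_scalar(s, len(a), len(a[0]))
    return [[a[j][i] + b[j][i] for i in range(len(a[0]))]
            for j in range(len(a))]

def matrix_minus_scalar(a, s):
    """Matrix-minus-scalar: duplicate S, then subtract element-wise."""
    b = broadcast_scalar(s, len(a), len(a[0]))
    return [[a[j][i] - b[j][i] for i in range(len(a[0]))]
            for j in range(len(a))]

A = [[1, 2], [3, 4]]
print(matrix_add_scalar(A, 10))   # [[11, 12], [13, 14]]
print(matrix_minus_scalar(A, 1))  # [[0, 1], [2, 3]]
```

Duplicating the scalar reduces both scalar operations to the same element-wise machinery used for matrix-matrix addition and subtraction.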
[0031] FIG. 2 illustrates an example matrix operation between two
matrices that may be performed by the example neural network
acceleration processor. As depicted, the first matrix A and the
second matrix B may be formatted in an m×n two-dimensional
data structure and stored in the matrix caching unit 112. The
computation module 110 may include multiple adders (e.g., adder(1)
. . . adder(k)). The count of the multiple adders may be less than
the total count of elements (i.e., k < m×n). Thus, the
multiple adders in the computation module may be configured to
sequentially process portions of the elements in the first matrix A
and the second matrix B. For example, the multiple adders may be
configured to add the first k elements (e.g., A(1,1) to A(1,k)) in
the first matrix A to the first k elements (e.g., B(1,1) to
B(1,k)). After the addition of the first k elements, the multiple
adders may be configured to process the next k elements in the
first matrix A and the second matrix B.
[0032] In the processing of subtraction, the adders of the
computation module 110 may be similarly configured to sequentially
process the portions of the elements in the first matrix A and the
second matrix B.
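The sequential, k-wide sweep described in the two paragraphs above can be sketched as follows. The flattening order and the function name are assumptions for illustration; the patent does not fix how the adders are scheduled.

```python
def chunked_elementwise(A_flat, B_flat, k, op):
    # Model of k parallel adders (or subtractors): the matrices are
    # processed k elements at a time, since k may be less than m * n.
    result = []
    for start in range(0, len(A_flat), k):
        # One pass: each of the k units handles one element pair.
        pairs = zip(A_flat[start:start + k], B_flat[start:start + k])
        result.extend(op(a, b) for a, b in pairs)
    return result

# Addition with k = 2 adders over 5 elements (three passes: 2 + 2 + 1).
sums = chunked_elementwise([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], 2,
                           lambda a, b: a + b)
print(sums)  # [11, 22, 33, 44, 55]
```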
[0033] FIG. 3 illustrates an example computation module in the
example neural network acceleration processor by which matrix
operations may be implemented in a neural network. As depicted, the
computation module 110 may include a computation unit 302, a data
dependency relationship determination unit 304, a neuron caching
unit 306. The computation unit 302 may further include one or more
adders 308, one or more subtractors 310, a data access unit 312,
and a data controller 314.
[0034] The data dependency relationship determination unit 304 may
be configured to perform data access operations (e.g., reading or
writing operations) on the caching units including the neuron
caching unit 306 during the computation process. The data
dependency relationship determination unit 304 may be configured to
prevent conflicts in reading and writing of the data in the caching
units. For example, the data dependency relationship determination
unit 304 may be configured to determine whether there is a data
dependency between a micro-instruction that is to be executed and a
micro-instruction being executed. If no dependency relationship
exists, the micro-instruction may be allowed to be executed;
otherwise, the micro-instruction may not be allowed to be executed
until all micro-instructions on which it depends have been executed
completely. A dependency relationship may be determined to exist
when a target operation range of the micro-instruction to be
executed overlaps a target operation range of a micro-instruction
being executed. For example, all micro-instructions sent to the
data dependency relationship determination unit 304 may be stored
in an instruction queue within the data dependency relationship
determination unit 304. The
instruction queue may indicate the relative priorities of the
stored micro-instructions. In the instruction queue, if the target
operation range of reading data by a reading instruction conflicts
with or overlaps the target operation range of writing data by a
writing instruction of higher priority in the front of the
instruction queue, then the reading instruction may not be executed
until the writing instruction is executed.
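The overlap test at the heart of this check can be sketched as follows, treating each micro-instruction's target operation range as a half-open address interval. The interval representation and function names are assumptions for illustration.

```python
def ranges_overlap(r1, r2):
    # Half-open intervals [start, end) conflict exactly when they intersect.
    return r1[0] < r2[1] and r2[0] < r1[1]

def may_issue(candidate_range, pending_ranges):
    # A queued micro-instruction may issue only if its target range does
    # not overlap the range of any earlier, still-pending micro-instruction
    # of higher priority in the queue.
    return not any(ranges_overlap(candidate_range, p) for p in pending_ranges)

print(may_issue((0, 4), [(8, 12)]))  # True: disjoint ranges, no conflict
print(may_issue((0, 4), [(2, 6)]))   # False: read overlaps a pending write
```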
[0035] The neuron caching unit 306 may be configured to store the
elements in the first matrix A and the second matrix B and the
scalar value.
[0036] The computation unit 302 may be configured to receive
micro-instructions from the controller unit 106 and perform
arithmetical logic operations according to the
micro-instructions.
[0037] In the example that the micro-instructions instruct the
computation module 110 to perform an addition for the first matrix
A and the second matrix B, the data controller 314 may be
configured to select a portion of the elements in the first matrix
A and the second matrix B. For example, as described in accordance
with FIG. 2, the data controller 314 may be configured to select
the first k elements in the first matrix A and the second matrix B.
The adders 308 may be configured to respectively add each element
in the first k elements of the first matrix A to the corresponding
element in the first k elements of the second matrix B.
Sequentially, the data
controller 314 may be configured to select other portions in the
first matrix A and the second matrix B. The adders 308 may be
configured to similarly add the elements in the portions
accordingly to generate the result matrix.
[0038] In the example that the micro-instructions instruct the
computation module 110 to perform a subtraction to subtract the
second matrix B from the first matrix A, the data controller 314
may be configured to similarly select a portion of the elements in
the first matrix A and the second matrix B. For example, as
described in accordance with FIG. 2, the data controller 314 may be
configured to select the first k elements in the first matrix A and
the second matrix B. The subtractors 310 may be configured to
respectively subtract each element in the first k elements of the
second matrix B from the corresponding element in the first k
elements of the first matrix A. Sequentially, the data controller
314 may be configured to select other portions in the first matrix
A and the second matrix B. The subtractors 310 may be configured to
similarly subtract the elements of the portions in the second
matrix B from the elements of the portions in the first matrix A.
In some examples, the subtractors 310 may include one or more
inverters to
invert the elements in the second matrix B. The inverted elements
may be transmitted to the adders 308. The adders 308 may be
configured to add the elements from the first matrix A to the
inverted elements from the second matrix B to generate the result
matrix for the subtraction operation.
[0039] In some other examples, the adders 308 may include one or
more inverters configured to invert the elements in the second
matrix B. As such, the adders 308 may be configured to subtract
elements in the second matrix B from the elements in the first
matrix A.
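In two's-complement hardware, "invert and add" realizes subtraction because a - b = a + ~b + 1. A Python model of that identity follows; the 16-bit datapath width is an assumed example, not something the text specifies.

```python
WIDTH_MASK = 0xFFFF  # assumed 16-bit datapath width, for illustration only

def subtract_via_inverter(a, b):
    # The inverter produces ~b; the adder then computes a + ~b + 1,
    # which equals a - b modulo 2**16 in two's-complement arithmetic.
    inverted = (~b) & WIDTH_MASK
    return (a + inverted + 1) & WIDTH_MASK

print(subtract_via_inverter(7, 3))  # 4
print(subtract_via_inverter(3, 7))  # 65532, i.e. (3 - 7) mod 2**16
```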
[0040] In the example that the micro-instructions instruct the
computation module 110 to add a scalar value to the first matrix A
("matrix-add-scalar" operation hereinafter) or to subtract the
scalar value from the first matrix A ("matrix-minus-scalar"
operation hereinafter), the data access unit 312 may be configured
to read the scalar value and write the scalar value to each element
in the second matrix B. In other words, the second matrix B may be
shown as
$$\begin{bmatrix}
S & S & \cdots & S \\
S & S & \cdots & S \\
\vdots & \vdots & & \vdots \\
S & S & \cdots & S
\end{bmatrix}.$$
The adders 308 may then be configured to perform the addition
operation between the first matrix A and the second matrix B with
the duplicated scalar value as the elements to generate the result
for the matrix-add-scalar operation. Similarly, the subtractors 310
may be configured to perform the subtraction operation between the
first matrix A and the second matrix B with the duplicated scalar
value as the elements to generate the result for the
matrix-minus-scalar operation.
[0041] The results of the operations performed by the computation
unit 302 may be transmitted to the matrix caching unit 112.
[0042] FIG. 4 illustrates a flow chart of an example method 400 for
matrix operation in a neural network. The example method 400 may be
performed by one or more components of the apparatus of FIGS. 1 and
3. Dash-lined blocks may indicate optional operations.
[0043] At block 402, the example method 400 may include receiving,
by a controller unit, a matrix-addition instruction that indicates
a first address of the first matrix and a second address of the
second matrix. For example, the controller unit 106 may receive a
matrix-add-matrix (MAM) instruction that indicates a starting
address of a first matrix, a length of the first matrix, a starting
address of a second matrix, a length of the second matrix, and an
address for a result matrix. For example, the MAM instruction may
include one or more register IDs that identify one or more
registers configured to store the starting address of the first
matrix, the length of the first matrix, the starting address of the
second matrix, the length of the second matrix, and the address for
the result matrix. Alternatively, the MAM instruction may store the
immediate values of the starting address of the first matrix, the
length of the first matrix, the starting address of the second
matrix, the length of the second matrix, and the address for the
result matrix.
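The two encodings described above (register IDs versus immediate values) can be sketched as a small operand-resolution step. The field names and layout here are hypothetical, since the text does not fix a binary instruction format.

```python
from dataclasses import dataclass

@dataclass
class MAMOperands:
    # Operands of a matrix-add-matrix (MAM) instruction, as listed above.
    src1_addr: int  # starting address of the first matrix
    src1_len: int   # length of the first matrix
    src2_addr: int  # starting address of the second matrix
    src2_len: int   # length of the second matrix
    dst_addr: int   # address for the result matrix

def resolve_operands(fields, register_file=None):
    # Register-ID form: each field indexes the register file.
    # Immediate form: the fields are used directly as values.
    if register_file is not None:
        fields = [register_file[i] for i in fields]
    return MAMOperands(*fields)

regs = {0: 0x100, 1: 16, 2: 0x200, 3: 16, 4: 0x300}
print(resolve_operands([0, 1, 2, 3, 4], regs).src1_addr)  # 256 (0x100)
```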
[0044] At block 404, the example method 400 may include retrieving,
by a computation module, the first matrix and the second matrix
based on the first address of the first matrix and the second
address of the second matrix. For example, the computation module
110 may be configured to retrieve the first matrix and the second
matrix from the direct memory access unit 102 based on the first
address and the second address.
[0045] At block 406, the example method 400 may include
respectively adding, in response to the matrix-addition
instruction, by the computation module, each of the first elements
to each of the second elements based on a correspondence in the
two-dimensional data structure to generate one or more third
elements for a third matrix.
[0046] For example, since the elements of the first matrix A and
the second matrix B are both arranged and stored in the same
m×n two-dimensional data structure, the computation module
110 may be configured to add each of the elements in the first
matrix A to the elements in the second matrix B based on a
correspondence in the two-dimensional data structure. For example,
when the first matrix A may be described as
$$\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1i} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2i} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{j1} & a_{j2} & \cdots & a_{ji} & \cdots & a_{jn} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mi} & \cdots & a_{mn}
\end{bmatrix}$$
and the second matrix B may be described as
$$\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1i} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2i} & \cdots & b_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
b_{j1} & b_{j2} & \cdots & b_{ji} & \cdots & b_{jn} \\
\vdots & \vdots & & \vdots & & \vdots \\
b_{m1} & b_{m2} & \cdots & b_{mi} & \cdots & b_{mn}
\end{bmatrix},$$
the computation module 110 may be configured to add the elements in
the first matrix A and in the second matrix B correspondingly to
generate a result matrix that may be described as
$$\begin{bmatrix}
a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1i}+b_{1i} & \cdots & a_{1n}+b_{1n} \\
a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2i}+b_{2i} & \cdots & a_{2n}+b_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{j1}+b_{j1} & a_{j2}+b_{j2} & \cdots & a_{ji}+b_{ji} & \cdots & a_{jn}+b_{jn} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{m1}+b_{m1} & a_{m2}+b_{m2} & \cdots & a_{mi}+b_{mi} & \cdots & a_{mn}+b_{mn}
\end{bmatrix}.$$
[0047] FIG. 5 illustrates a flow chart of another example method 500 for
matrix operation in a neural network. The example method 500 may be
performed by one or more components of the apparatus of FIGS. 1 and
3. Dash-lined blocks may indicate optional operations.
[0048] At block 502, the example method 500 may include receiving,
by a controller unit, a matrix-subtraction instruction that
indicates a first address of the first matrix and a second address
of the second matrix. For example, the controller unit 106 may
receive a matrix-minus-matrix instruction that indicates a starting
address of a first matrix, a length of the first matrix, a starting
address of a second matrix, a length of the second matrix, and an
address for a result matrix. For example, the matrix-minus-matrix
instruction may include one or more register IDs that identify one
or more registers configured to store the starting address of the
first matrix, the length of the first matrix, the starting address
of the second matrix, the length of the second matrix, and the
address for the result matrix. Alternatively, the
matrix-minus-matrix instruction may store the immediate values of
the starting address of the first matrix, the length of the first
matrix, the starting address of the second matrix, the length of
the second matrix, and the address for the result matrix.
[0049] At block 504, the example method 500 may include retrieving,
by a computation module, the first matrix and the second matrix
based on the first address of the first matrix and the second
address of the second matrix. For example, the computation module
110 may be configured to retrieve the first matrix and the second
matrix from the direct memory access unit 102 based on the first
address and the second address.
[0050] At block 506, the example method 500 may include
respectively subtracting, by the computation module, each of the
second elements from each of the first elements based on a
correspondence in the two-dimensional data structure in accordance
with the matrix-subtraction instruction to generate one or more
third elements for a third matrix. For example, since the elements
of the first matrix A and the second matrix B are both arranged and
stored in the same m×n two-dimensional data structure, the
computation module 110 may be configured to subtract the elements
in the second matrix B from the elements in the first matrix A
correspondingly to generate a result matrix. For example, when the
first matrix A may be described as
$$\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1i} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2i} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{j1} & a_{j2} & \cdots & a_{ji} & \cdots & a_{jn} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mi} & \cdots & a_{mn}
\end{bmatrix}$$
and the second matrix B may be described as
$$\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1i} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2i} & \cdots & b_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
b_{j1} & b_{j2} & \cdots & b_{ji} & \cdots & b_{jn} \\
\vdots & \vdots & & \vdots & & \vdots \\
b_{m1} & b_{m2} & \cdots & b_{mi} & \cdots & b_{mn}
\end{bmatrix},$$
the computation module 110 may be configured to subtract the
elements in the second matrix B from the elements in the first
matrix A correspondingly to generate a result matrix that may be
described as
$$\begin{bmatrix}
a_{11}-b_{11} & a_{12}-b_{12} & \cdots & a_{1i}-b_{1i} & \cdots & a_{1n}-b_{1n} \\
a_{21}-b_{21} & a_{22}-b_{22} & \cdots & a_{2i}-b_{2i} & \cdots & a_{2n}-b_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{j1}-b_{j1} & a_{j2}-b_{j2} & \cdots & a_{ji}-b_{ji} & \cdots & a_{jn}-b_{jn} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{m1}-b_{m1} & a_{m2}-b_{m2} & \cdots & a_{mi}-b_{mi} & \cdots & a_{mn}-b_{mn}
\end{bmatrix}.$$
[0051] The process or method described in the above accompanying
figures can be performed by processing logic including hardware
(for example, circuitry, dedicated logic, etc.), firmware, software
(for example, software embodied in a non-transitory
computer-readable medium), or a combination thereof.
Although the process or method is described above in a certain
order, it should be understood that some operations described may
also be performed in different orders. In addition, some operations
may be executed concurrently rather than in order.
[0052] In the above description, each embodiment of the present
disclosure is illustrated with reference to certain illustrative
embodiments. Evidently, various modifications may be made to each
embodiment without departing from the broader spirit and scope of
the present disclosure as set forth in the appended claims.
Correspondingly, the description and accompanying figures should be
understood as illustration only rather than limitation. It is
understood that the specific order or hierarchy of steps in the
processes disclosed is an illustration of exemplary approaches.
Based upon design preferences, it is understood that the specific
order or hierarchy of steps in the processes may be rearranged.
Further, some steps may be combined or omitted. The accompanying
method claims present elements of the various steps in a sample
order, and are not meant to be limited to the specific order or
hierarchy presented.
[0053] The previous description is provided to enable any person
skilled in the art to practice the various aspects described
herein. Various modifications to these aspects will be readily
apparent to those skilled in the art, and the generic principles
defined herein may be applied to other aspects. Thus, the claims
are not intended to be limited to the aspects shown herein but are
to be accorded the full scope consistent with the language of the
claims,
wherein reference to an element in the singular is not intended to
mean "one and only one" unless specifically so stated, but rather
"one or more." Unless specifically stated otherwise, the term
"some" refers to one or more. All structural and functional
equivalents to the elements of the various aspects described herein
that are known or later come to be known to those of ordinary skill
in the art are expressly incorporated herein by reference and are
intended to be encompassed by the claims. Moreover, nothing
disclosed herein is intended to be dedicated to the public
regardless of whether such disclosure is explicitly recited in the
claims. No claim element is to be construed as a means plus
function unless the element is expressly recited using the phrase
"means for."
[0054] Moreover, the term "or" is intended to mean an inclusive
"or" rather than an exclusive "or." That is, unless specified
otherwise, or clear from the context, the phrase "X employs A or B"
is intended to mean any of the natural inclusive permutations. That
is, the phrase "X employs A or B" is satisfied by any of the
following instances: X employs A; X employs B; or X employs both A
and B. In addition, the articles "a" and "an" as used in this
application and the appended claims should generally be construed
to mean "one or more" unless specified otherwise or clear from the
context to be directed to a singular form.
* * * * *