U.S. patent number 10,559,360 [Application Number 16/109,542] was granted by the patent office on 2020-02-11 for apparatuses and methods for determining population count.
This patent grant is currently assigned to Micron Technology, Inc. The grantee listed for this patent is Micron Technology, Inc. Invention is credited to Timothy P. Finkbeiner, Glen E. Hush, Richard C. Murphy.
![](/patent/grant/10559360/US10559360-20200211-D00000.png)
![](/patent/grant/10559360/US10559360-20200211-D00001.png)
![](/patent/grant/10559360/US10559360-20200211-D00002.png)
![](/patent/grant/10559360/US10559360-20200211-D00003.png)
![](/patent/grant/10559360/US10559360-20200211-D00004.png)
![](/patent/grant/10559360/US10559360-20200211-D00005.png)
![](/patent/grant/10559360/US10559360-20200211-D00006.png)
![](/patent/grant/10559360/US10559360-20200211-D00007.png)
![](/patent/grant/10559360/US10559360-20200211-D00008.png)
![](/patent/grant/10559360/US10559360-20200211-D00009.png)
![](/patent/grant/10559360/US10559360-20200211-D00010.png)
United States Patent 10,559,360
Finkbeiner, et al.
February 11, 2020
Apparatuses and methods for determining population count
Abstract
The present disclosure includes apparatuses and methods related
to determining population count. An example apparatus comprises an
array of memory cells coupled to sensing circuitry. The apparatus
can include a controller configured to cause: summing, in parallel,
of data values corresponding to respective ones of a plurality of
first vectors stored in memory cells of the array as a data value
sum representing a population count thereof, wherein a second
vector is stored as the plurality of first vectors, and wherein
each first vector of the plurality of first vectors is stored in
respective memory cells of the array that are coupled to a
respective sense line of a plurality of sense lines; and
iteratively summing, in parallel, of data value sums corresponding
to the plurality of first vectors to provide a single data value
sum corresponding to the second vector.
Inventors: Finkbeiner; Timothy P. (Boise, ID), Hush; Glen E. (Boise, ID), Murphy; Richard C. (Boise, ID)
Applicant: Micron Technology, Inc. (Boise, ID, US)
Assignee: Micron Technology, Inc. (Boise, ID)
Family ID: 55402539
Appl. No.: 16/109,542
Filed: August 22, 2018
Prior Publication Data
Document Identifier | Publication Date
US 20180366202 A1 | Dec 20, 2018
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number | Issue Date
14833796 | Aug 24, 2015 | 10068652 |
62045257 | Sep 3, 2014 | |
Current U.S. Class: 1/1
Current CPC Class: G11C 7/1006 (20130101); G11C 7/12 (20130101); G11C 16/26 (20130101); G11C 7/06 (20130101); G11C 7/065 (20130101); G11C 7/1009 (20130101)
Current International Class: G11C 16/26 (20060101); G11C 7/06 (20060101); G11C 7/10 (20060101); G11C 7/12 (20060101)
Field of Search: 708/210
Primary Examiner: Caldwell; Andrew
Assistant Examiner: Larocque; Emily E
Attorney, Agent or Firm: Brooks, Cameron & Huebsch,
PLLC
Parent Case Text
PRIORITY INFORMATION
This application is a Continuation of U.S. application Ser. No.
14/833,796, filed Aug. 24, 2015, which claims the benefit of U.S.
Provisional Application No. 62/045,257, filed Sep. 3, 2014, the
contents of which are incorporated herein by reference.
Claims
What is claimed is:
1. An apparatus, comprising: an array of memory cells coupled to
sensing circuitry and to a plurality of sense lines; the sensing
circuitry comprising a sense amplifier and a compute component
coupled to the plurality of sense lines, wherein the sense
amplifier comprises a primary latch and the compute component
comprises a secondary latch; and a controller configured to use the
sense amplifier and the compute component to cause: summing, in
parallel and using the primary latch and the secondary latch, of
data values corresponding to respective ones of a plurality of
vectors stored in memory cells of the array as data value sums
representing population counts thereof, iteratively summing, in
parallel and using the primary latch and the secondary latch, of
the data value sums corresponding to the plurality of vectors to
provide a single data value sum.
2. The apparatus of claim 1, wherein the apparatus is configured to
sum data values corresponding to respective ones of the plurality
of vectors without transferring data values via an input/output
(I/O) line outside the array of memory cells and the sensing
circuitry.
3. The apparatus of claim 1, wherein the apparatus is configured to
perform the summing acts without transferring data values to a
control component external to the array and the sensing
circuitry.
4. The apparatus of claim 3, wherein the sensing circuitry
comprises transistors formed on pitch with transistors of the array
of memory cells.
5. An apparatus, comprising: an array of memory cells coupled to
sensing circuitry and to a number of sense lines; the sensing
circuitry comprising sense amplifiers and compute components
coupled to the number of sense lines; and a controller configured
to use the sense amplifiers and the compute components to cause:
summing, in parallel and using the sense amplifiers and the compute
components, of data values stored in memory cells coupled to the
number of sense lines and to a number of access lines of the array
of memory cells to determine a data value sum corresponding to each
respective one of the number of sense lines; storing of a
respective data value sum of the data value sums in memory cells
coupled to the respective sense line to which the respective data
value sum corresponds; and summing, using the compute component, of
the stored data value sums to determine a resultant sum
representing a population count corresponding to the data values
stored in the memory cells coupled to the number of sense lines and
to the number of access lines.
6. The apparatus of claim 5, wherein the controller is configured
to cause storing of the data value sums in memory cells coupled to
access lines other than the number of access lines to which memory
cells storing the data values are coupled.
7. The apparatus of claim 5, wherein the controller is configured
to cause storing of the data value sums as vertical bit
vectors.
8. The apparatus of claim 7, wherein the controller being
configured to cause summing of the data value sums to determine a
resultant sum comprises the controller being configured to cause:
summing, in parallel, of a number of pairs of the stored data value
sums to determine a number of updated data value sums; storing of
the number of updated data value sums in memory cells coupled to
respective ones of the number of sense lines to which the data
value sums correspond; responsive to the number of updated data
value sums being greater than one, subsequently summing, in
parallel, of a number of pairs of the updated data value sums to
determine a number of updated updated data value sums; and storing
of the number of updated updated data value sums in memory cells
coupled to respective ones of the number of sense lines to which
the updated data value sums correspond.
9. The apparatus of claim 5, wherein the controller being
configured to cause the storing of the data value sums comprises
the controller being configured to cause the data value sums to be
stored as: M-bit vertical bit vectors, and wherein the controller
being configured to cause summing of the stored data value sums
comprises the controller being configured to cause: summing
respective pairs of the M-bit vertical bit vectors; and storing the
sums of the respective pairs of the M-bit vertical bit vectors in
log_2(M) memory cells of the array of memory cells.
10. The apparatus of claim 5, wherein the controller is configured
to cause storing of the data values as a plurality of vertical bit
vectors.
11. The apparatus of claim 5, wherein the controller is configured
to cause storing of the data values in the array of memory cells
using the memory cells coupled to the number of sense lines.
12. The apparatus of claim 5, wherein the controller is configured
to cause storing the data values in the array as a plurality of
vertical bit vectors, each of the plurality of vertical bit vectors
being arranged such that elements of a respective one of the
plurality of vertical bit vectors is stored in memory cells coupled
to a respective one of the number of sense lines.
13. The apparatus of claim 5, wherein the apparatus is configured
to perform logical operations without transferring data out of the
array of memory cells and sensing circuitry.
14. The apparatus of claim 5, wherein each sense line of the number
of sense lines corresponds to a different column of the array of
memory cells.
15. A method for determining population count, comprising: storing
a vector in memory cells coupled to a number (R) of access lines
and a number (C) of sense lines of an array coupled to sensing
circuitry comprising a sense amplifier and a compute component
coupled to the C sense lines; performing, in parallel and using the
sense amplifier and the compute component, a first summation
operation on data values stored in memory cells coupled to
respective ones of the C sense lines to determine C sense line
population count sums corresponding to the respective sense lines;
subsequently performing a second summation operation, comprising:
summing, in parallel and using the sense amplifier and the compute
component, at least two particular sense line population count sums
of the C sense line population count sums to provide a first number
of updated population count sums; and responsive to the first
number of updated population count sums being greater than one,
subsequently performing a number of summation operation iterations,
wherein each one of the number of summation operation iterations
comprises: summing, in parallel and using the sense amplifier and
the compute component, at least two particular population count
sums of the first number of updated population count sums
determined in an immediate prior summation operation iteration to
provide an updated number of updated population count sums; and
wherein the number of summation operation iterations are performed
until the first number of updated population count sums is equal to
one.
16. The method of claim 15, wherein each of the C sense line
population count sums corresponds to a respective population count
of a particular data value stored in the memory cells coupled to a
respective one of the C sense lines.
17. The method of claim 15, further comprising storing each
respective one of the C sense line population count sums in memory
cells coupled to access lines of the array other than the R access
lines to which memory cells storing the vector are coupled.
18. The method of claim 15, wherein performing the number of
summation operation iterations until the number of updated
population count sums is equal to one includes performing log.sub.2
(C)-1 summation operations; and wherein the updated population
count sum when the number of updated population counts sums is one
indicates the population count for the vector.
19. The method of claim 15, wherein the vector comprises a number
(M) of digits with M/R digits of the vector being stored in memory
cells coupled to each of the R access lines and M/C of the digits
of the vector being stored in memory cells coupled to each of the C
sense lines.
20. The method of claim 15, wherein: performing the second
summation operation includes summing, in parallel, a number of
pairs of the C sense line population count sums such that the first
number of updated population count sums is C/2 population count
sums; and performing a first summation operation iteration of the
number of summation operation iterations includes summing, in
parallel, pairs of the C/2 updated population count sums such that
the updated number of updated population count sums is C/4.
Description
TECHNICAL FIELD
The present disclosure relates generally to semiconductor memory
and methods, and more particularly, to apparatuses and methods
related to determining population count in a memory array using
sensing circuitry.
BACKGROUND
Memory devices are typically provided as internal, semiconductor,
integrated circuits in computers or other electronic systems. There
are many different types of memory including volatile and
non-volatile memory. Volatile memory can require power to maintain
its data (e.g., host data, error data, etc.) and includes random
access memory (RAM), dynamic random access memory (DRAM), static
random access memory (SRAM), synchronous dynamic random access
memory (SDRAM), and thyristor random access memory (TRAM), among
others. Non-volatile memory can provide persistent data by
retaining stored data when not powered and can include NAND flash
memory, NOR flash memory, and resistance variable memory such as
phase change random access memory (PCRAM), resistive random access
memory (RRAM), and magnetoresistive random access memory (MRAM),
such as spin torque transfer random access memory (STT RAM), among
others.
Electronic systems often include a number of processing resources
(e.g., one or more processors), which may retrieve and execute
instructions and store the results of the executed instructions to
a suitable location. A processor can comprise a number of
functional units such as arithmetic logic unit (ALU) circuitry,
floating point unit (FPU) circuitry, and/or a combinatorial logic
block, for example, which can be used to execute instructions by
performing logical operations such as AND, OR, NOT, NAND, NOR, XOR,
and invert (e.g., inversion) operations on data (e.g.,
one or more operands). For example, functional unit circuitry (FUC)
may be used to perform arithmetic operations such as addition,
subtraction, multiplication, and/or division on operands via a
number of logical operations.
A number of components in an electronic system may be involved in
providing instructions to the FUC for execution. The instructions
may be generated, for instance, by a processing resource such as a
controller and/or host processor. Data (e.g., the operands on which
the instructions will be executed) may be stored in a memory array
that is accessible by the FUC. The instructions and/or data may be
retrieved from the memory array and sequenced and/or buffered
before the FUC begins to execute instructions on the data.
Furthermore, as different types of operations may be executed in
one or multiple clock cycles through the FUC, intermediate results
of the instructions and/or data may also be sequenced and/or
buffered.
In many instances, the processing resources (e.g., processor and/or
associated FUC) may be external to the memory array, and data is
accessed via a bus between the processing resources and the memory
array to execute a set of instructions. Processing performance may
be improved in a processor-in-memory (PIM) device, in which a
processor may be implemented internal to and/or near a memory
(e.g., directly on a same chip as the memory array), which may
conserve time and power in processing.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of an apparatus in the form of an
electronic system including a memory device in accordance with a
number of embodiments of the present disclosure.
FIG. 2 illustrates a vector arranged in a memory array in
accordance with a number of embodiments of the present
disclosure.
FIG. 3 illustrates an arrangement of data values in a memory array
after columnar summing in accordance with a number of embodiments
of the present disclosure.
FIG. 4 illustrates data value sums arranged in a memory array after
horizontal summing in accordance with a number of embodiments of
the present disclosure.
FIG. 5 is a schematic diagram illustrating sensing circuitry in
accordance with a number of embodiments of the present
disclosure.
FIG. 6 is a schematic diagram illustrating a portion of sensing
circuitry in accordance with a number of embodiments of the present
disclosure.
FIG. 7 illustrates a timing diagram associated with performing a
number of logical operations using sensing circuitry in accordance
with a number of embodiments of the present disclosure.
FIG. 8 illustrates a timing diagram associated with performing a
number of logical operations using sensing circuitry in accordance
with a number of embodiments of the present disclosure.
FIG. 9 illustrates a timing diagram associated with performing a
number of logical operations using sensing circuitry in accordance
with a number of embodiments of the present disclosure.
FIG. 10 illustrates a timing diagram associated with performing a
number of logical operations using sensing circuitry in accordance
with a number of embodiments of the present disclosure.
FIG. 11 illustrates a timing diagram associated with performing a
number of logical operations using sensing circuitry in accordance
with a number of embodiments of the present disclosure.
FIG. 12 illustrates a timing diagram associated with performing a
number of logical operations using sensing circuitry in accordance
with a number of embodiments of the present disclosure.
FIG. 13 is a schematic diagram illustrating sensing circuitry
having selectable logical operation selection logic in accordance
with a number of embodiments of the present disclosure.
FIG. 14 is a logic table illustrating selectable logic operation
results implemented by a sensing circuitry in accordance with a
number of embodiments of the present disclosure.
DETAILED DESCRIPTION
The present disclosure includes apparatuses and methods related to
determining population count. An example apparatus comprises an
array of memory cells coupled to sensing circuitry. The apparatus
can include a controller configured to cause: summing, in parallel,
of data values corresponding to respective ones of a plurality of
first vectors stored in memory cells of the array as a data value
sum representing a population count thereof, wherein a second
vector is stored as the plurality of first vectors, and wherein
each first vector of the plurality of first vectors is stored in
respective memory cells of the array that are coupled to a
respective sense line of a plurality of sense lines; and
iteratively summing, in parallel, of data value sums corresponding
to the plurality of first vectors to provide a single data value
sum corresponding to the second vector.
A vector is an ordered collection of data values (e.g., digits,
bits, etc.). As an example, a vector comprising 512 data values can
be stored in a memory array, with each data value being stored in a
different memory cell of the memory array. As such, the data values
of the vector can be stored (e.g., arranged) in the memory array in
a variety of ways. For example, a 512 bit vector can be arranged
such that the bits are stored in memory cells coupled to a single
access line (e.g., row) and to 512 sense lines (e.g., columns), or such that
the 512 bits are stored in memory cells coupled to a single column
and to 512 rows, or in other configurations involving 512 memory
cells of multiple rows and/or multiple columns (e.g., 32 rows and
16 columns).
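The arrangements listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `arrange` and the column-by-column fill order are hypothetical choices, since the passage does not fix how positions in the vector map to rows and columns.

```python
def arrange(bits, rows, cols):
    """Lay out a flat bit list as a rows x cols grid, filling column by column.

    The column-major fill order is an assumption made for illustration only.
    """
    if len(bits) != rows * cols:
        raise ValueError("vector length must equal rows * cols")
    # grid[r][c] models the cell at access line (row) r and sense line (column) c
    return [[bits[c * rows + r] for c in range(cols)] for r in range(rows)]


# Arbitrary 512-bit test pattern (hypothetical data, not from the disclosure)
vector = [int((i * 7) % 3 == 0) for i in range(512)]

single_row = arrange(vector, rows=1, cols=512)     # one access line, 512 sense lines
single_column = arrange(vector, rows=512, cols=1)  # 512 access lines, one sense line
block = arrange(vector, rows=32, cols=16)          # 32 access lines, 16 sense lines
```

Each layout holds the same 512 data values; only the row and column coordinates of each value differ.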
A quantity of particular data values can be referred to as a
population count. As an example, in a string of bits, a population
count can be the quantity of bits having a data value of "1" (e.g.,
logical "1" data state), which is also the sum of the string of
bits. Population count can also be referred to as a "popcount" or
"sideways sum."
The population count of a vector (e.g., digit string) can be of
interest in cryptography and other applications. The population
count may also be of interest in determining the Hamming weight of
a string, which is the number of symbols that are different from
the zero-symbol of the alphabet used. The Hamming weight is
equivalent to the Hamming distance from the all-zero string of the
same length. The Hamming distance of two words A and B can be
calculated as the Hamming weight of A XOR B. Also, the quantity of
data values (e.g., bits) of "0" can be determined (e.g., counted)
if of interest, for example, in evaluation of matrix sparsity.
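The quantities described in the two preceding paragraphs can be illustrated in software. The following is a minimal sketch, assuming non-negative Python integers as binary words; the function names are hypothetical and the code says nothing about how the memory device itself computes these values.

```python
def popcount(bits):
    """Number of 1 values in a sequence of 0/1 data values."""
    return sum(1 for b in bits if b == 1)


def hamming_weight(word):
    """Population count of a non-negative integer viewed as a bit string."""
    return bin(word).count("1")


def hamming_distance(a, b):
    """Hamming distance of two words: the Hamming weight of a XOR b."""
    return hamming_weight(a ^ b)


def zero_count(word, width):
    """Number of 0 bits in a word of the given bit width."""
    return width - hamming_weight(word)


bit_string = [1, 0, 1, 1, 0, 0, 1, 0]
a = 0b10110010
b = 0b10010110

ones_in_string = popcount(bit_string)   # also equals sum(bit_string)
distance = hamming_distance(a, b)       # counts the positions where a and b differ
zeros_in_a = zero_count(a, width=8)
```

Counting zeros as the width minus the population count is one simple option; it relies on the width of the word being known in advance.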
Performing logical operations, in parallel, using the sensing
circuitry coupled to columns of a memory array, can be employed to
rapidly and efficiently determine a population count of a vector
stored in the memory array. According to various embodiments of the
present disclosure, a vector can be stored in a memory array as a
plurality of smaller vectors. Each smaller vector can be stored in
a column of the memory array (e.g., depicted in a vertical
direction). The population count of the smaller vectors can each be
determined by adding the contents of the memory cells storing the
digits of the smaller vector using the sensing circuitry associated
with the column. The result can be stored, for example, as a binary
representation of the quantity in other rows of the column. The
population counts of the multiple smaller vectors can be determined
simultaneously in this manner where each column is equipped with
sensing circuitry capable of performing logical operations.
Thereafter, pairs of the population count of the smaller vectors
can be added together, for example, using shifting and adding
capabilities of the sensing circuitry, and pairs of those sums can
be subsequently added together in a number of iterations until a
single total resultant sum is obtained.
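The two phases described above, per-column summing followed by repeated pairwise addition, can be modeled in software as follows. This is a sketch of the arithmetic only, not of the sensing circuitry: the function names, the test pattern, and the zero-padding of odd-length lists are assumptions made for illustration, and the columns that the hardware would process in parallel are handled sequentially here.

```python
def column_popcounts(grid):
    """Phase 1: one sum per column (sense line), modeled sequentially here."""
    cols = len(grid[0])
    return [sum(row[c] for row in grid) for c in range(cols)]


def pairwise_reduce(sums):
    """Phase 2: add neighbouring pairs repeatedly until one sum remains.

    Returns the total and the number of pairwise-add rounds used.
    """
    sums = list(sums)
    rounds = 0
    while len(sums) > 1:
        if len(sums) % 2:
            sums.append(0)  # assumption: pad so every sum has a partner
        sums = [sums[i] + sums[i + 1] for i in range(0, len(sums), 2)]
        rounds += 1
    return sums[0], rounds


# 32 rows x 16 columns holding an arbitrary 512-bit pattern (hypothetical data)
grid = [[int((r * 5 + c * 3) % 4 == 0) for c in range(16)] for r in range(32)]

per_column = column_popcounts(grid)          # 16 column population counts
total, rounds = pairwise_reduce(per_column)  # single resultant sum
```

With 16 columns the loop halves the number of sums each round (16, 8, 4, 2, 1), so four rounds of pairwise addition produce the final total.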
A number of embodiments of the present disclosure can provide
improved parallelism and/or reduced power consumption in
association with performing compute functions as compared to
previous systems such as previous PIM systems and systems having an
external processor (e.g., a processing resource located external
to a memory array, such as on a separate integrated circuit
chip). For instance, a number of embodiments can provide for
performing complete compute functions such as integer add,
subtract, multiply, divide, and CAM (content addressable memory)
functions without transferring data out of the memory array and
sensing circuitry via a bus (e.g., data bus, address bus, control
bus), for instance. Such compute functions can involve performing a
number of logical operations (e.g., logical functions such as AND,
OR, NOT, NOR, NAND, XOR, etc.). However, embodiments are not
limited to these examples. For instance, performing logical
operations can include performing a number of non-Boolean logic
operations such as copy, compare, destroy, etc.
In previous approaches, data may be transferred from the array and
sensing circuitry (e.g., via a bus comprising input/output (I/O)
lines) to a processing resource such as a processor,
microprocessor, and/or compute engine, which may comprise ALU
circuitry and/or other functional unit circuitry configured to
perform the appropriate logical operations. However, transferring
data from a memory array and sensing circuitry to such processing
resource(s) can involve significant power consumption. Even if the
processing resource is located on a same chip as the memory array,
significant power can be consumed in moving data out of the array
to the compute circuitry, which can involve performing a sense line
(which may be referred to herein as a digit line or data line)
address access (e.g., firing of a column decode signal) in order to
transfer data from sense lines onto I/O lines (e.g., local I/O
lines), moving the data to the array periphery, and providing the
data to the processing resource.
Furthermore, the circuitry of the processing resource(s) (e.g.,
compute engine) may not conform to pitch rules associated with a
memory array. For example, the cells of a memory array may have a
4F^2 or 6F^2 cell size, where "F" is a feature size
corresponding to the cells. As such, the devices (e.g., logic
gates) associated with ALU circuitry of previous PIM systems may
not be capable of being formed on pitch with the memory cells,
which can affect chip size and/or memory density, for example. A
number of embodiments of the present disclosure include sensing
circuitry formed on pitch with memory cells of the array and
capable of performing compute functions such as those described
herein below.
In the following detailed description of the present disclosure,
reference is made to the accompanying drawings that form a part
hereof, and in which is shown by way of illustration how one or
more embodiments of the disclosure may be practiced. These
embodiments are described in sufficient detail to enable those of
ordinary skill in the art to practice the embodiments of this
disclosure, and it is to be understood that other embodiments may
be utilized and that process, electrical, and/or structural changes
may be made without departing from the scope of the present
disclosure. As used herein, the designator "N," particularly with
respect to reference numerals in the drawings, indicates that a
number of the particular feature so designated can be included. As
used herein, "a number of" a particular thing can refer to one or
more of such things (e.g., a number of memory arrays can refer to
one or more memory arrays).
The figures herein follow a numbering convention in which the first
digit or digits correspond to the drawing figure number and the
remaining digits identify an element or component in the drawing.
Similar elements or components between different figures may be
identified by the use of similar digits. For example, 506 may
reference element "06" in FIG. 5, and a similar element may be
referenced as 606 in FIG. 6. As will be appreciated, elements shown
in the various embodiments herein can be added, exchanged, and/or
eliminated so as to provide a number of additional embodiments of
the present disclosure. In addition, as will be appreciated, the
proportion and the relative scale of the elements provided in the
figures are intended to illustrate certain embodiments of the
present invention, and should not be taken in a limiting sense.
FIG. 1 is a block diagram of an apparatus in the form of a
computing system 100 including a memory device 120 in accordance
with a number of embodiments of the present disclosure. As used
herein, a memory device 120, a memory array 130, and/or sensing
circuitry 150 might also be separately considered an
"apparatus."
System 100 includes a host 110 coupled (e.g., connected) to memory
device 120, which includes a memory array 130. Host 110 can be a
host system such as a personal laptop computer, a desktop computer,
a digital camera, a smart phone, or a memory card reader, among
various other types of hosts. Host 110 can include a system
motherboard and/or backplane and can include a number of processing
resources (e.g., one or more processors, microprocessors, or some
other type of controlling circuitry). The system 100 can include
separate integrated circuits or both the host 110 and the memory
device 120 can be on the same integrated circuit. The system 100
can be, for instance, a server system and/or a high performance
computing (HPC) system and/or a portion thereof. Although the
example shown in FIG. 1 illustrates a system having a Von Neumann
architecture, embodiments of the present disclosure can be
implemented in non-Von Neumann architectures (e.g., a Turing
machine), which may not include one or more components (e.g., CPU,
ALU, etc.) often associated with a Von Neumann architecture.
For clarity, the system 100 has been simplified to focus on
features with particular relevance to the present disclosure. The
memory array 130 can be a DRAM array, SRAM array, STT RAM array,
PCRAM array, TRAM array, RRAM array, NAND flash array, and/or NOR
flash array, for instance. The array 130 can comprise memory cells
arranged in rows coupled by access lines (which may be referred to
herein as word lines or select lines) and columns coupled by sense
lines. Although a single array 130 is shown in FIG. 1, embodiments
are not so limited. For instance, memory device 120 may include a
number of arrays 130 (e.g., a number of banks of DRAM cells). An
example DRAM array is described in association with FIG. 5.
The memory device 120 includes address circuitry 142 to latch
address signals provided over an I/O bus 156 (e.g., a data bus)
through I/O circuitry 144. Address signals are received and decoded
by a row decoder 146 and a column decoder 152 to access the memory
array 130. Data can be read from memory array 130 by sensing
voltage and/or current changes on the data lines using sensing
circuitry 150. The sensing circuitry 150 can read and latch a page
(e.g., row) of data from the memory array 130. The I/O circuitry
144 can be used for bi-directional data communication with host 110
over the I/O bus 156. The write circuitry 148 is used to write data
to the memory array 130.
Controller 140 decodes signals provided by control bus 154 from the
host 110. These signals can include chip enable signals, write
enable signals, and address latch signals that are used to control
operations performed on the memory array 130, including data read,
data write, and data erase operations. In various embodiments, the
controller 140 is responsible for executing instructions from the
host 110. The controller 140 can be a state machine, a sequencer,
or some other type of controller.
The controller 140 can include a shift controller 141 that can
control signals provided to, for instance, shift circuitry in
association with performing data shifting as described further
herein. For example, the shift controller 141 can control shifting
data (e.g., right or left) in an array.
Examples of the sensing circuitry 150 are described further below.
For instance, in a number of embodiments, the sensing circuitry 150
can comprise a number of sense amplifiers (e.g., sense amplifier
506 shown in FIG. 5 or sense amplifier 606 shown in FIG. 6) and a
number of compute components (e.g., compute component 531-1 shown
in FIG. 5), which may serve as, and be referred to herein as, an
accumulator and can be used to perform logical operations (e.g., on
data associated with complementary data lines).
In a number of embodiments, the sensing circuitry (e.g., 150) can
be used to perform logical operations using data stored in array
130 as inputs and store the results of the logical operations back
to the array 130 without transferring data via a sense line address
access (e.g., without firing a column decode signal). As such,
various compute functions can be performed using, and within,
sensing circuitry 150 rather than (or in association with) being
performed by processing resources external to the sensing circuitry
(e.g., by a processor associated with host 110 and/or other
processing circuitry, such as ALU circuitry, located on device 120
(e.g., on controller 140 or elsewhere)).
In various previous approaches, data associated with an operand,
for instance, would be read from memory via sensing circuitry and
provided to external ALU circuitry via I/O lines (e.g., via local
I/O lines and/or global I/O lines). The external ALU circuitry
could include a number of registers and would perform compute
functions using the operands, and the result would be transferred
back to the array via the I/O lines. In contrast, in a number of
embodiments of the present disclosure, sensing circuitry (e.g.,
150) is configured to perform logical operations on data stored in
memory (e.g., array 130) and store the result back to the memory
without enabling an I/O line (e.g., a local I/O line) coupled to
the sensing circuitry, which can be formed on pitch with the memory
cells of the array. Enabling an I/O line can include enabling
(e.g., turning on) a transistor having a gate coupled to a decode
signal (e.g., a column decode signal) and a source/drain coupled to
the I/O line. Embodiments are not so limited. For instance, in a
number of embodiments, the sensing circuitry (e.g., 150) can be
used to perform logical operations without enabling column decode
lines of the array; however, the local I/O line(s) may be enabled
in order to transfer a result to a suitable location other than
back to the array (e.g., to an external register).
As such, in a number of embodiments, circuitry external to array
130 and sensing circuitry 150 is not needed to perform compute
functions as the sensing circuitry 150 can perform the appropriate
logical operations to perform such compute functions without the
use of an external processing resource. Therefore, the sensing
circuitry 150 may be used to complement and/or to replace, at least
to some extent, such an external processing resource (or at least
the bandwidth of such an external processing resource). However, in
a number of embodiments, the sensing circuitry 150 may be used to
perform logical operations (e.g., to execute instructions) in
addition to logical operations performed by an external processing
resource (e.g., host 110). For instance, host 110 and/or sensing
circuitry 150 may be limited to performing only certain logical
operations and/or a certain number of logical operations.
FIG. 2 illustrates a vector 266 arranged in a memory array 260 in
accordance with a number of embodiments of the present disclosure.
FIG. 2 shows a memory array 260 having memory cells 261 arranged
into a number of columns 262 (e.g., C columns) and a number of rows
264 (e.g., R rows). An example N-bit vector 266 (e.g., 512-bit
vector comprising bits 0 to 511) can be stored in the memory array
260 with one digit of the vector 266 being stored per memory cell
of the memory array 260. The vector 266 can be arranged to be
stored in C columns and R rows of the memory array, where C.times.R
is greater than or equal to N.
FIG. 2 shows that vector 266 can be stored as a number of smaller
M-bit vectors 268 (e.g., where M is less than N). The smaller
vectors 268 can be stored in respective columns of the memory array
260 (e.g., vertically). As such, the smaller M-bit vectors can also
be referred to herein as vertical bit vectors. Vertical bit vectors
can also be referred to as vertical vectors. Although physical
implementations of columns in a memory array may not be oriented
vertically, columns are generally depicted in two dimensions as being
oriented vertically. As used herein, columns refer to the
arrangement of memory cells that are coupled to a particular sense
line, which in turn is coupled to an associated sensing circuitry.
As shown in FIG. 2, the 512-bit vector 266 can be stored as sixteen
(16) 32-bit vectors 268, corresponding to columns 0-15.
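The layout described above can be sketched as follows. This is a
minimal illustration, not the arrangement of FIG. 2 itself: the helper
name to_vertical_vectors and the column-by-column bit ordering are
assumptions, since the text fixes only the shape (16 columns of 32
rows), and the test pattern is arbitrary.

```python
def to_vertical_vectors(bits, num_columns):
    """Split a flat bit list into num_columns equal-length vertical vectors.

    Assumption: bits are placed column by column (bit 0 at row 0 of column 0).
    The text fixes only the shape of the layout, not this ordering.
    """
    if num_columns <= 0 or len(bits) % num_columns != 0:
        raise ValueError("vector length must be a positive multiple of the column count")
    rows = len(bits) // num_columns
    return [bits[c * rows:(c + 1) * rows] for c in range(num_columns)]


# Arbitrary 512-bit test pattern (illustrative data only).
vector = [1 if (i * 7) % 3 == 0 else 0 for i in range(512)]
columns = to_vertical_vectors(vector, 16)

print(len(columns), len(columns[0]))  # 16 columns of 32 rows each
```

Whatever ordering is chosen, the population count of the original
vector equals the sum of the population counts of the vertical bit
vectors, which is the property the later summing steps rely on.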
Certain efficiencies can be gained by storing a vector in a memory
array using as few rows as possible to minimize logical operations
associated with columnar addition. Stated in another way, certain
efficiencies can be gained by storing a vector in as many columns
as possible so as to minimize the quantity of rows (e.g.,
arrangements of memory cells that are coupled to a particular word
line). Certain efficiencies can further be gained by
storing a vector in C columns where C is a power of 2, or at least
by avoiding an odd C, since horizontal additions of the present
method can involve pairs of addends. According to various
embodiments of the present disclosure, a vector can be stored in a
memory array using as few rows as possible and using C columns
where C is a power of 2 (e.g., 2 columns, 4 columns, 16 columns,
etc.).
FIG. 3 illustrates an arrangement of data values in a memory array
after columnar summing in accordance with a number of embodiments
of the present disclosure. Performing various logical operations
using sensing circuitry is discussed in detail below with respect
to FIGS. 5-12. For purposes of the discussion of FIGS. 3 and 4 in
regards to determining population count of a vector, sensing
circuitry is associated with each column of memory cells and is
configured to implement addition of data values stored in memory
cells, and storing the resultant data value sum back to memory
cells of the memory array. Also, shifting circuitry can be used in
conjunction with the sensing circuitry to shift operands and/or
resultants of a logical operation such that the contents of memory
cells in other (e.g., adjacent) columns can be used in the logical
operations and/or resultants of the logical operations can be
stored to memory cells in columns adjacent to the column to which
the sensing circuitry is associated (e.g., coupled in a normal
configuration). An addend is an operand of an addition
operation.
According to various embodiments of the present disclosure, a
population count of vertical bit vectors (e.g., 368-0, 368-15) can
be determined, for example, by successively summing (e.g., adding)
the data values stored in the memory cells of the vertical bit
vectors and storing the resultant data value sum back to the memory
array. The data values for a plurality of vertical bit vectors can
be summed in parallel.
The sensing circuitry is configured to sum, in parallel, data
values stored in memory cells coupled to a number of the plurality
of sense lines and to a number of access lines of the array to
determine data value sums corresponding to each respective one of
the number of sense lines. For example, the data value of each
memory cell of an M-bit vertical bit vector can be summed with a
data value sum stored in log.sub.2 (M) memory cells, the data value
sum being a data value sum of a quantity such as the population
count of the M-bit vertical bit vector.
According to some embodiments the data value sums (e.g., population
count) can be stored back to memory cells in the same column in
which data values of the vertical bit vector are stored. As
previously mentioned, a population count of a binary string (e.g.,
a quantity of data values in the string having a "1" logical
value), such as a vertical bit vector, can be determined by summing
each of the data values of the string together. Data values having
a "1" logical value will increment the sum and data values having a
"0" logical value will not increment the sum.
FIG. 3 shows a bit vector comprising 512 bits (e.g., shown as "BIT
0" through "BIT 511") being stored as a number of vertical bit
vectors 368-0, . . . , 368-15 (referred to generally as vertical
bit vectors 368) in rows 0-31 of each column and a data value sum
(e.g., binary number stored as bit vectors 370-0, . . . , 370-15
and referred to generally as data value sums 370) of the quantity
of the population count corresponding to the vertical bit vector
368 of a particular column being stored immediately below the
respective vertical bit vector 368 in each column. For example, the
data value sum 370-0 of the population count of vertical bit vector
368-0 is stored in column 0 immediately below vertical bit vector
368-0, and the data value sum 370-15 of the population count of
vertical bit vector 368-15 is stored in column 15 immediately below
vertical bit vector 368-15. FIG. 3 similarly shows the data value
sum 370 of the population count of a vertical bit vector 368 being
stored in the corresponding column immediately below the respective
vertical bit vector 368.
However, embodiments of the present disclosure are not so limited,
and the data value sum 370 of the population count of a particular
vertical bit vector 368 need not be stored in the same column as
the vertical bit vector 368, need not be stored immediately below
the vertical bit vector 368, and need not even be stored below the
vertical bit vector 368. For example, a data value sum 370 of the
population count of a particular vertical bit vector 368 may be
stored in another column, in a row, in a same column above the
vertical bit vector 368, or stored further below the vertical bit
vector 368 than immediately therebelow.
FIG. 3 shows the data value sum 370 of the population count of a
particular vertical bit vector 368 being arranged such that a least
significant bit (e.g., bit 0) is stored in a memory cell of the
array of memory cells immediately below the particular vertical bit
vector 368, and successively more significant bits of the data
value sum 370 of the population count being stored therebelow.
However, embodiments of the present disclosure are not so limited,
and the arrangement of the data value sum 370 can be different than
that shown (e.g., with a most significant bit being stored closest
to the particular vertical bit vector 368 or other arrangement).
Also, data values of the vertical bit vector 368 and/or data value
sum 370 are not limited to binary data values, and can be other
multiple-state data values stored in multiple-state memory cells.
FIG. 3 shows an example utilizing binary data values (i.e., bits)
for simplicity and clarity.
According to various embodiments of the present disclosure, the
length of the vertical bit vector (e.g., number of rows used to
store the vertical bit vector) is not limited to being 32 bits, as
is shown in FIG. 3. The vertical bit vector can be of any length
that can be accommodated by the array of memory cells. Nor is the
length of the data value sum limited to 6 data values; it can be
more or fewer depending on the quantity of data values in the
vertical bit vector. The data value sum can be of sufficient length
to represent a maximum population count for a particular vertical
bit vector. For example, a 32-bit vertical bit vector (e.g., 368-0)
can have a maximum population count of 32 (e.g., if all data values
of the vertical bit vector are a "1" logical value). Therefore, the
data value sum requires 6 bits, since the binary representation of
32 (100000) has floor(log.sub.2 (32))+1=6 digits.
Although FIG. 3 shows the resultant binary number representing the
population count being stored in the same column as the vertical
bit vector 368 from which the population count is determined,
embodiments of the present disclosure are not so limited and the
resultant can be stored in other arrangements, such as in an
adjacent column (e.g., in preparation for a subsequent addition) or
in another C quantity of columns located in another portion of the
memory array.
Summing the data values stored in the memory cells of the vertical
bit vectors 368 can be accomplished using the sensing circuitry of
the PIM architecture shown below with respect to FIGS. 5 and 6. For
each example M-bit (e.g., 32-bit) vector 368 stored in respective
columns, the quantity of those digits having a "1" logical state
can range from zero to 32. As such, floor(log.sub.2 (M))+1 memory
cells can contain the resultant
population count represented as a binary number. In the example
shown in FIG. 3, floor(log.sub.2 (32))+1=6 memory cells are allocated to
store the resultant population count of the M-digit vector for each
column, the data value sum of population count of each vertical bit
vector 368 being stored in rows 32-37.
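As a quick check of the storage requirement, the following sketch
computes how many binary digits are needed to hold any population
count of an M-bit vector (0 through M). The function name
count_bits_needed is illustrative only.

```python
def count_bits_needed(m):
    """Bits needed to hold any population count of an m-bit vector (0..m).

    For m >= 1, int.bit_length() equals floor(log2(m)) + 1.
    """
    if m < 1:
        raise ValueError("vector length must be at least 1")
    return m.bit_length()


for m in (32, 64, 256, 512):
    print(m, count_bits_needed(m))
```

For the 32-bit vertical bit vectors of FIG. 3 this gives 6, matching
the six rows (32-37) allocated to each data value sum.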
Mathematical addition, a higher level operation, can be
accomplished using logical operations. Let Sa and Sb be two single
bit addends. Let Ci be a carry input. The sum output, So, and carry
output, Co, can be determined by the following Boolean equations:
So=Ci^Sa^Sb
Co=Sa·Sb+Sa·Ci+Sb·Ci
where "·" denotes a Boolean AND, "+" denotes a Boolean OR, and "^"
denotes a Boolean XOR (e.g., exclusive OR). As is discussed further
below with
respect to FIGS. 5-12, individual logical operations (e.g., Boolean
logical operations) can be implemented using the sensing circuitry
coupled to the memory array. An XOR function can be implemented in
a variety of ways including by implementing equivalent Boolean
algebraic expressions involving NOT, AND and/or OR logical
operations such as A·NOT(B)+NOT(A)·B and (A+B)·NOT(A·B), among others,
where A and B are the operands of the XOR logical operations. That
is, an apparatus that can implement NOT, AND, and OR logical
operations, such as that described with respect to FIGS. 5-12, can
be used to implement XOR and other logical operations that have
equivalent Boolean expressions involving NOT, AND, and OR logical
operations.
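The Boolean relationships above can be checked exhaustively in
software. The sketch below is only a truth-table verification, not a
model of the sensing circuitry; the function names are illustrative,
and single bits are represented as the integers 0 and 1.

```python
from itertools import product


def full_adder(sa, sb, ci):
    """Sum and carry outputs for single-bit addends sa, sb and carry-in ci."""
    so = ci ^ sa ^ sb                        # So = Ci XOR Sa XOR Sb
    co = (sa & sb) | (sa & ci) | (sb & ci)   # Co = SaSb + SaCi + SbCi
    return so, co


def xor_from_not_and_or(a, b):
    """XOR built as A AND NOT(B), OR, NOT(A) AND B."""
    return (a & (1 - b)) | ((1 - a) & b)


def xor_from_or_and_nand(a, b):
    """XOR built as (A OR B) AND NOT(A AND B)."""
    return (a | b) & (1 - (a & b))


# Every input combination: the two outputs encode the arithmetic sum.
for sa, sb, ci in product((0, 1), repeat=3):
    so, co = full_adder(sa, sb, ci)
    assert 2 * co + so == sa + sb + ci

# Both NOT/AND/OR expressions agree with XOR on all inputs.
for a, b in product((0, 1), repeat=2):
    assert xor_from_not_and_or(a, b) == (a ^ b) == xor_from_or_and_nand(a, b)
```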
An example of pseudo code for a multi-bit addition can be
summarized as follows. Each of Ci, Co, Sa, and Sb represent the
data value of a memory cell. The details of how to implement
loading a data value from a memory cell associated with a
particular row of the column to the sensing circuitry, saving a
data value from the sensing circuitry to a memory cell, performing
logical operations between two operands, and/or shifting operations
between adjacent columns will better be understood with respect to
the discussions that follow regarding FIGS. 5-12.
As is described with respect to FIGS. 5-12, several methods can be
used to perform the same logical operations with the result being
initially stored in the sense amplifier, or being initially stored
in the accumulator of the sensing circuitry. As such, the
operations described here refer to the sensing circuitry, which
is intended to mean the appropriate portion of the sensing circuitry
depending
on the particular one of several methods being used to perform the
particular logical operation. For example, some action to move a
data value "into the sensing circuitry" can mean moving the data
value to the accumulator using one mode of operating the sensing
circuitry, or can mean moving the data value to a sense amplifier
using another mode of operating the sensing circuitry. The below
pseudocode describes an example multi-bit addition operation:
Load Ci into the sensing circuitry;
Save sensing circuitry (Ci) to a temp row (Ci also still residing
in the sensing circuitry);
XOR Sa into the sensing circuitry (Sa^Ci);
Shift to align data, if necessary;
XOR Sb into the sensing circuitry (Sb^(Sa^Ci));
Shift to align data, if necessary;
Save sensing circuitry in So;
Load sensing circuitry with Sa (replacing previous data value in
sensing circuitry);
Shift to align data, if necessary;
AND Sb into sensing circuitry (Sb·Sa);
Shift to align data, if necessary;
Save sensing circuitry into a second temp row (Co);
Load sensing circuitry from temp row (Ci);
AND Sa into the sensing circuitry (Sa·Ci);
OR the second temp row (Co) into the sensing circuitry;
Save sensing circuitry to the second temp row (Co);
Load sensing circuitry from temp row (Ci);
Shift to align data, if necessary;
AND Sb into the sensing circuitry;
Shift to align data, if necessary;
OR the temp row (Co) into the sensing circuitry;
Increment the Sa, Sb, and So row addresses;
If not at the end of the multi-bit add, return above to Save
sensing circuitry (Ci) to a temp row;
If at the end of the multi-bit addition, save sensing circuitry
into So;
The shift operations to align data, if necessary, are used if the
two addends are not in the same column, as may be the case if the
data values involved in the mathematical addition are stored in a
different column than the vertical bit vector (which is not the
case for the example shown in FIG. 3 but is the case for subsequent
additions such as those discussed with respect to FIG. 4 below).
That is, the data value sums can be stored in memory cells coupled
to access lines other than the number of access lines to which
memory cells storing the data values (e.g., summed to determine the
data value sum) are coupled. Since the addends are all located in
the same column for the example shown in FIG. 3, the shift
operations can be omitted in determining the population count for
the 32-bit vectors 368 in each column.
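A software sketch of the multi-bit addition may help in following the
sequence of loads, XORs, ANDs, and ORs. It is a model under stated
assumptions, not the circuit behavior: each row is represented as a
Python integer whose bit c stands for the memory cell in column c, so
one bitwise operation acts on every column at once; the variable acc
stands in for the sensing circuitry; the carry is formed from the
Boolean equation Co=Sa·Sb+Sa·Ci+Sb·Ci; shift operations are omitted
because the addends share a column; and the destination rows are
assumed not to overlap the addend rows. All names are illustrative.

```python
def multi_bit_add(rows, sa, sb, so, width):
    """Add the width-bit numbers at rows sa.. and sb.. into rows so.. (LSB first).

    rows maps a row address to an int whose bit c is the cell in column c.
    Row so + width receives the final carry. Destination rows must not
    overlap the addend rows.
    """
    acc = 0                           # load Ci (initial carry-in of zero)
    for i in range(width):
        temp_ci = acc                 # save Ci to a temp row
        acc ^= rows[sa + i]           # XOR Sa into the sensing circuitry
        acc ^= rows[sb + i]           # XOR Sb into the sensing circuitry
        rows[so + i] = acc            # save sum bit to So
        acc = rows[sa + i]            # load Sa
        acc &= rows[sb + i]           # AND Sb
        temp_co = acc                 # second temp row holds Sa AND Sb
        acc = temp_ci & rows[sa + i]  # Ci AND Sa
        acc |= temp_co
        temp_co = acc
        acc = temp_ci & rows[sb + i]  # Ci AND Sb
        acc |= temp_co                # acc now holds the carry-out
    rows[so + width] = acc            # final carry becomes the top bit


def store_vertical(rows, start, values, width):
    """Store values[c] in column c, least significant bit at row start."""
    for i in range(width):
        rows[start + i] = sum(((v >> i) & 1) << c for c, v in enumerate(values))


def read_vertical(rows, start, width, num_columns):
    """Read one width-bit number per column, starting at row start."""
    return [sum(((rows[start + i] >> c) & 1) << i for i in range(width))
            for c in range(num_columns)]


rows = {}
store_vertical(rows, 0, [5, 9, 15], 4)   # addend Sa in rows 0-3
store_vertical(rows, 4, [3, 6, 15], 4)   # addend Sb in rows 4-7
multi_bit_add(rows, sa=0, sb=4, so=8, width=4)
print(read_vertical(rows, 8, 5, 3))      # per-column sums of the two addends
```

Three independent additions complete in the same number of row
operations as one, which is the parallelism the columnar arrangement
is meant to exploit.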
The pseudo code for a multi-bit addition summarized above can be
used to add the data value of one memory cell to the data value sum
of the quantity of the population count for the vertical bit
vector. However, the vertical bit vector (e.g., 368) can be
comprised of M-bits (e.g., 32 for the example illustrated in FIG.
3). As such, each of the 32 data values of the vertical bit vector
368 can be added to the total population count represented by the
multi-bit data value sum 370 corresponding to the column. That is,
the above-detailed addition can be repeated 32 times to account for
each of the data values of the vertical bit vectors 368 comprising
the 512-bit vector described in association with FIG. 2.
The population count of the M-digit (e.g., columnar) vector can be
determined by performing logical operations involving various
memory cells in the column. For an M-digit vector in a column, M
additions can be utilized to determine the population count (e.g.,
adding each digit of the M-digit vector to a binary total). One
method for implementing the addition of the digits comprising an
M-bit vector stored in a column can be as follows:
Let Src be the starting row address of an M-bit vector stored in a
column;
Let Cnt be the starting row address of a [log.sub.2 (M)]-bit data
value sum of a columnar population count for the M-bit vector;
Let Sx be the starting row address of a [log.sub.2 (M)]-bit columnar
scratch area that can be set to zero;
Loop to clear the log.sub.2 (M) Cnt address rows;
Loop to clear Sx scratch rows;
Load Src into the sensing circuitry;
Do the multi-bit columnar add with Cnt being Sa, and Sx being Sb;
Increment the Src row addresses;
If not at the end of the M-bit vertical bit vector, return to Load
another Src into the sensing circuitry;
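The columnar loop can be sketched in software under a few
assumptions. Each row is modeled as a Python integer whose bit c is
the cell in column c, so all columns are processed in parallel. The
loaded Src value plays the role of the carry-in, and because the
scratch addend Sx is zero, the adder equations reduce to So=Sa^Ci and
Co=Sa·Ci. Writing the sum back into the Cnt rows, the use of
floor(log.sub.2 (M))+1 count rows, and the test pattern are
assumptions of this sketch rather than details given in the text.

```python
def columnar_popcount(rows, src, m, cnt):
    """Count the 1s in rows src..src+m-1 of every column at once.

    rows maps a row address to an int whose bit c is the cell in column c,
    so each bitwise operation below acts on all columns in parallel.
    The count is written LSB-first to rows cnt..cnt+width-1 (assumed layout).
    """
    width = m.bit_length()                # bits needed to hold a count of m
    for i in range(width):
        rows[cnt + i] = 0                 # loop to clear the Cnt rows
    for r in range(m):
        carry = rows[src + r]             # load Src; it acts as the carry-in
        for i in range(width):            # ripple the carry through Cnt
            sa = rows[cnt + i]
            rows[cnt + i] = sa ^ carry    # So = Sa XOR Ci (Sb is zero)
            carry = sa & carry            # Co = Sa AND Ci (Sb is zero)
    return width


def bit(r, c):
    """Arbitrary test pattern; column 0 is all ones."""
    return 1 if (r * c) % 3 == 0 else 0


NUM_COLUMNS, M = 16, 32
rows = {r: sum(bit(r, c) << c for c in range(NUM_COLUMNS)) for r in range(M)}
width = columnar_popcount(rows, src=0, m=M, cnt=32)
counts = [sum(((rows[32 + i] >> c) & 1) << i for i in range(width))
          for c in range(NUM_COLUMNS)]
print(counts)
```

With M=32 the counts occupy rows 32-37, the same rows FIG. 3 uses
for the data value sums 370.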
As will be appreciated with respect to the discussion of FIG. 5,
each column of the memory array is coupled to an associated sensing
circuitry (e.g., sense amplifier and compute component), which can
be utilized to perform logical operations in parallel and thus
simultaneously determine the population count for each of the 16
32-bit vectors 368 shown in the example illustrated in FIG. 3. The
16 6-bit sums that quantify the population count of the 32-bit
vectors 368 stored in rows 0-31 of the corresponding column are
shown in FIG. 3 being stored back into the memory array in rows
32-37 (e.g., as sums 370).
FIG. 4 illustrates data value sums arranged in a memory array after
horizontal summing in accordance with a number of embodiments of
the present disclosure. Data value sums (e.g., sums 370-0 and 370-1
representing a population count of particular vertical bit vectors
368-0 and 368-1) can be summed to determine a resultant sum (e.g.,
472-1) representing a population count corresponding to the data
values stored in the memory cells coupled to the number of the
plurality of sense lines and to the number of access lines. For
example, a first iteration of horizontal summing can sum a pair
(e.g., 2) of data value sums to determine an updated data value
sum. A pair of updated data value sums can be summed in a second
iteration of horizontal summing to determine a further updated
data value sum, and so on until only a single data value sum
remains that represents the population counts of each of the
vertical bit vectors, and thus the population count of an initial
vector (e.g., the 512-bit vector described in association with FIG.
2) that was stored as a plurality of vertical bit vectors (e.g.,
bit vectors 368 described in FIG. 3). Summing of data value sums,
updated data value sums, etc., can continue responsive to the
number of data value sums or updated data value sums being greater
than one. The updated data value sums can be stored in memory cells
coupled to respective ones of the number of sense lines to which
one of the updated data value sums correspond, or in memory cells
coupled to other sense lines (e.g., sense lines to which no updated
data value sums correspond).
After the 16 6-bit data value sums (e.g., sums 370 shown in FIG. 3)
quantifying the population count of each of the 32-bit vectors
(e.g., 368 shown in FIG. 3) stored in rows 0-31 of the
corresponding column are determined, the data value sums are
subsequently summed in log.sub.2 (C) logical addition operations,
where C is the quantity of vertical bit vectors (e.g., 368). In
other words, after the population counts for the vertical bit
vectors are determined, the population counts are added together to
arrive at the total population count for the original vector (e.g.,
the 512-bit vector shown in FIG. 2 stored in columns 0-15 and rows
0-31).
FIG. 4 shows the 6-bit data value sums of the population count of
respective vertical bit vectors (e.g., 470-0 for column 0, 470-1
for column 1) stored in rows 32-37 for each column, just as was
shown in FIG. 3. FIG. 4 additionally shows a 7-bit data value sum
of a population count 472-1, which can be a sum of the data value
sums of the population count 470-0 for column 0 and the data value
sums of the population count 470-1 for column 1. That is, the
multi-bit data value sums of the population count 470-0 for column
0 can be added to the data value sums of the population count 470-1
for column 1, with the result being the data value sum 472-1 shown
stored in rows 38-44 of column 1. Alternatively, the data value sum
472-1 shown stored in rows 38-44 of column 1 can be initially
zeroed, after which each of the data value sums of the population
count 470-0 for column 0 and the data value sums of the population
count 470-1 for column 1 can be added thereto.
It should be noted that in the example illustrated in FIG. 4, data
value sum 472-1 is a 7-bit binary number whereas each of the data
value sums 470-0 and 470-1 is a 6-bit number. The maximum
population count for each of the 32-bit vertical bit vectors is 32;
therefore, data value sum 472-1 for a sum result may have to
represent a maximum value of 2.times.32 or 64. A binary
representation of 64 requires floor(log.sub.2 (64))+1=7 bits, so
7 digits are necessary for the data value sum 472-1.
FIG. 4 further shows that pairs of the 16 data value sums of
population counts for the vertical bit vectors (e.g., 470-0 and
470-1) are added together and stored in a new data value sum of the
sum (e.g., 472-1). The addition of the multi-bit data value sums of
the population counts for the vertical bit vectors can be
accomplished as described above. However, since the data value sums
of the population counts for pairs of vertical bit vectors are
stored in different columns, the shifting operations shown in the
pseudo code above can be used to move the data values into the
sensing circuitry of an adjacent column. Shifting operations are
discussed in more detail below with respect to FIG. 5.
The addition of two multi-bit data value sums (e.g., 470-0 and
470-1) stored in adjacent columns (e.g., columns 0 and 1,
respectively) can be accomplished by the sensing circuitry of
either of the two columns. However, the addition of two multi-bit
data value sums stored in adjacent columns can be most efficiently
accomplished by the sensing circuitry associated with the column in
which the resultant data value sum (e.g., 472-1) is going to be
stored (e.g., column 1 for the example shown in FIG. 4) so as to
eliminate a shifting operation to accomplish the storing.
The addition of two multi-bit data value sums stored in adjacent
columns can be referred to herein as horizontal addition or
horizontal summing. As can be observed in FIG. 4, a first iteration
of horizontal addition that adds pairs of the 6-bit data value sums
of population count of 16 vertical bit vectors together results in
8 new 7-bit data value sums. These 8 additions can be accomplished
simultaneously utilizing the PIM sensing circuitry of the present
disclosure.
Another iteration of multi-bit addition can add pairs of the 8
7-bit data value sums together and store the resulting data value
sums in 4 8-bit data value sums. Additional adding iterations can
result in 2 9-bit data value sums, and then 1 10-bit data value sum
of the total population count for all of the vertical bit vectors,
which is also the population count for the original 512-bit vector.
That is, for 16 vertical bit vectors 4 horizontal addition
operations will result in a singular data value sum of total
population count. More generally, log.sub.2 (C) horizontal addition
operations are used to sum the data value sums of population count
corresponding to vertical bit vectors stored in C columns.
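The pairwise reduction can be sketched at the level of numeric values
rather than memory cells. In this sketch each pass stands for one
horizontal addition iteration in which all pairs are added at once,
and the result width grows by one bit per pass. The function name and
the worst-case input (every column holding a count of 32) are
illustrative assumptions.

```python
def horizontal_sum(sums, width):
    """Pairwise-reduce per-column counts to a single total.

    Returns (total, iterations, final_width). Each iteration halves the
    number of sums and widens the result by one bit to hold the carry.
    """
    if not sums or len(sums) & (len(sums) - 1):
        raise ValueError("the number of columns must be a power of 2")
    iterations = 0
    while len(sums) > 1:
        # All pairs in one pass are independent, so they can run in parallel.
        sums = [sums[i] + sums[i + 1] for i in range(0, len(sums), 2)]
        width += 1
        iterations += 1
    return sums[0], iterations, width


# Worst case for FIG. 3: sixteen 6-bit sums, each equal to 32.
total, iterations, final_width = horizontal_sum([32] * 16, width=6)
print(total, iterations, final_width)
```

For 16 columns this takes log.sub.2 (16)=4 iterations and ends with
a 10-bit result, wide enough to hold the maximum count of 512.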
The efficiency of storing the original vector so as to minimize the
quantity of rows and maximize the quantity of columns should be
apparent since r addition operations are used to add r elements of
a vertical bit vector, but only log.sub.2 (C) horizontal additions
are used to add the data value sums of population count for each
column. Alternatively stated, 1 summing operation is used for each
element of a vertical bit vector and log.sub.2 (C) summing
operations are used to add the population counts for each of C
vertical bit vectors. For another example, the population count of
a vector stored in 16,384 columns and 256 rows of a memory array,
each column having a sensing circuitry, can be determined in
256+log.sub.2 (16,384)=256+14=270 addition operations.
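The operation count can be reproduced with a short calculation. The
sketch below assumes, as the text does, that the column count is a
power of 2; the function name and the alternative 1,024-row layout
used for comparison are illustrative and do not come from the
disclosure.

```python
def addition_operations(num_rows, num_columns):
    """Columnar additions (one per row) plus log2(C) horizontal additions."""
    if num_columns < 1 or num_columns & (num_columns - 1):
        raise ValueError("the number of columns must be a power of 2")
    horizontal = num_columns.bit_length() - 1   # exact log2 for a power of 2
    return num_rows + horizontal


print(addition_operations(256, 16384))   # layout from the example above
print(addition_operations(32, 16))       # layout of FIGS. 3 and 4
print(addition_operations(1024, 4096))   # same bit count, fewer columns
```

The last two layouts illustrate the design point: for a fixed number
of bits, spreading the vector over more columns trades many row
additions for a few horizontal ones.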
The log.sub.2 (C) addition operations to arrive at a total
population count reflect that iterative additions can be
implemented to add pairs of population counts together until a
single final sum is determined. Utilizing the PIM sensing circuitry
of the present disclosure, multiple pairs of population counts can
be added together simultaneously, in parallel, with each iteration
of horizontal addition. The result of a first iteration of adding
pairs of the 16 6-bit sums quantifying the population count of the
32-bit vectors is shown in FIG. 4.
While FIG. 4 shows a result of horizontal addition being stored
immediately below the addend in the right-most column, embodiments
of the present disclosure are not so limited. The result of
horizontal addition can be stored in a different location than that
shown for the example illustrated in FIG. 4, in manners similar to
that discussed for the data value sums 370 shown in FIG. 3. Also,
the arrangement with respect to least significant bit and most
significant bit of a numerical result (e.g., 472-1) can be
different than that shown in FIG. 4.
FIG. 5 is a schematic diagram illustrating sensing circuitry in
accordance with a number of embodiments of the present disclosure.
A memory cell comprises a storage element (e.g., capacitor) and an
access device (e.g., transistor). For instance, transistor 502-1
and capacitor 503-1 comprise a memory cell, and transistor 502-2
and capacitor 503-2 comprise a memory cell, etc. In this example,
the memory array 530 is a DRAM array of 1T1C (one transistor one
capacitor) memory cells. In a number of embodiments, the memory
cells may be destructive read memory cells (e.g., reading the data
stored in the cell destroys the data such that the data originally
stored in the cell is refreshed after being read). The cells of the
memory array 530 are arranged in rows coupled by word lines 504-X
(Row X), 504-Y (Row Y), etc., and columns coupled by pairs of
complementary data lines DIGIT(n-1)/DIGIT(n-1)_,
DIGIT(n)/DIGIT(n)_, DIGIT(n+1)/DIGIT(n+1)_. The individual data
lines corresponding to each pair of complementary data lines can
also be referred to as data lines 505-1 (D) and 505-2 (D_)
respectively. Although only three pairs of complementary data lines
are shown in FIG. 5, embodiments of the present disclosure are not
so limited, and an array of memory cells can include additional
columns of memory cells and/or data lines (e.g., 4,096, 8,192,
16,384, etc.).
Memory cells can be coupled to different data lines and/or word
lines. For example, a first source/drain region of a transistor
502-1 can be coupled to data line 505-1 (D), a second source/drain
region of transistor 502-1 can be coupled to capacitor 503-1, and a
gate of a transistor 502-1 can be coupled to word line 504-Y. A
first source/drain region of a transistor 502-2 can be coupled to
data line 505-2 (D_), a second source/drain region of transistor
502-2 can be coupled to capacitor 503-2, and a gate of a transistor
502-2 can be coupled to word line 504-X. The cell plate, as shown
in FIG. 5, can be coupled to each of capacitors 503-1 and 503-2.
The cell plate can be a common node to which a reference voltage
(e.g., ground) can be applied in various memory array
configurations.
The memory array 530 is coupled to sensing circuitry 550 in
accordance with a number of embodiments of the present disclosure.
In this example, the sensing circuitry 550 comprises a sense
amplifier 506 and a compute component 531 corresponding to
respective columns of memory cells (e.g., coupled to respective
pairs of complementary data lines). The sense amplifier 506 can
comprise a cross coupled latch, which can be referred to herein as
a primary latch. The sense amplifier 506 can be configured, for
example, as described with respect to FIG. 6.
In the example illustrated in FIG. 5, the circuitry corresponding
to compute component 531 comprises a static latch 564 and an
additional ten transistors that implement, among other things, a
dynamic latch. The dynamic latch and/or static latch of the compute
component 531 can be collectively referred to herein as a secondary
latch, which can serve as an accumulator. As such, the compute
component 531 can operate as and/or be referred to herein as an
accumulator. The compute component 531 can be coupled to each of
the data lines D 505-1 and D_ 505-2 as shown in FIG. 5. However,
embodiments are not limited to this example. The transistors of
compute component 531 can all be n-channel transistors (e.g., NMOS
transistors), for example.
In this example, data line D 505-1 can be coupled to a first
source/drain region of transistors 516-1 and 539-1, as well as to a
first source/drain region of load/pass transistor 518-1. Data line
D_ 505-2 can be coupled to a first source/drain region of
transistors 516-2 and 539-2, as well as to a first source/drain
region of load/pass transistor 518-2.
The gates of load/pass transistor 518-1 and 518-2 can be commonly
coupled to a LOAD control signal, or respectively coupled to a
PASSD/PASSDB control signal, as discussed further below. A second
source/drain region of load/pass transistor 518-1 can be directly
coupled to the gates of transistors 516-1 and 539-2. A second
source/drain region of load/pass transistor 518-2 can be directly
coupled to the gates of transistors 516-2 and 539-1.
A second source/drain region of transistor 516-1 can be directly
coupled to a first source/drain region of pull-down transistor
514-1. A second source/drain region of transistor 539-1 can be
directly coupled to a first source/drain region of pull-down
transistor 507-1. A second source/drain region of transistor 516-2
can be directly coupled to a first source/drain region of pull-down
transistor 514-2. A second source/drain region of transistor 539-2
can be directly coupled to a first source/drain region of pull-down
transistor 507-2. A second source/drain region of each of pull-down
transistors 507-1, 507-2, 514-1, and 514-2 can be commonly coupled
together to a reference voltage line 591-1 (e.g., ground (GND)). A
gate of pull-down transistor 507-1 can be coupled to an AND control
signal line, a gate of pull-down transistor 514-1 can be coupled to
an ANDinv control signal line 513-1, a gate of pull-down transistor
514-2 can be coupled to an ORinv control signal line 513-2, and a
gate of pull-down transistor 507-2 can be coupled to an OR control
signal line.
The gate of transistor 539-1 can be referred to as node S1, and the
gate of transistor 539-2 can be referred to as node S2. The circuit
shown in FIG. 5 stores accumulator data dynamically on nodes S1 and
S2. Activating the LOAD control signal causes load/pass transistors
518-1 and 518-2 to conduct, and thereby load complementary data
onto nodes S1 and S2. The LOAD control signal can be elevated to a
voltage greater than V.sub.DD to pass a full V.sub.DD level to
S1/S2. However, elevating the LOAD control signal to a voltage
greater than V.sub.DD is optional, and functionality of the circuit
shown in FIG. 5 is not contingent on the LOAD control signal being
elevated to a voltage greater than V.sub.DD.
The configuration of compute component 531 shown in FIG. 5 has the
benefit of balancing the sense amplifier for functionality when the
pull-down transistors 507-1, 507-2, 514-1, and 514-2 are conducting
before the sense amplifier 506 is fired (e.g., during pre-seeding
of the sense amplifier 506). As used herein, firing the sense
amplifier 506 refers to enabling the sense amplifier 506 to set the
primary latch and subsequently disabling the sense amplifier 506 to
retain the set primary latch. Performing logical operations after
equilibration is disabled (in the sense amp), but before the sense
amplifier fires, can save power usage because the latch of the
sense amplifier does not have to be "flipped" using full rail
voltages (e.g., V.sub.DD, GND).
Inverting transistors can pull-down a respective data line in
performing certain logical operations. For example, transistor
516-1 (having a gate coupled to S2 of the dynamic latch) in series
with transistor 514-1 (having a gate coupled to an ANDinv control
signal line 513-1) can be operated to pull-down data line 505-1
(D), and transistor 516-2 (having a gate coupled to S1 of the
dynamic latch) in series with transistor 514-2 (having a gate
coupled to an ORinv control signal line 513-2) can be operated to
pull-down data line 505-2 (D_).
The latch 564 can be controllably enabled by coupling to an active
negative control signal line 512-1 (ACCUMB) and an active positive
control signal line 512-2 (ACCUM) rather than be configured to be
continuously enabled by coupling to ground and V.sub.DD. In various
embodiments, load/pass transistors 518-1 and 518-2 can each have
a gate coupled to one of a LOAD control signal or a PASSD/PASSDB
control signal.
According to some embodiments, the gates of load/pass transistors
518-1 and 518-2 can be commonly coupled to a LOAD control signal.
In the configuration where the gates of load/pass transistors 518-1
and 518-2 are commonly coupled to the LOAD control signal,
transistors 518-1 and 518-2 can be load transistors. Activating the
LOAD control signal causes the load transistors to conduct, and
thereby load complementary data onto nodes S1 and S2. The LOAD
control signal can be elevated to a voltage greater than V.sub.DD
to pass a full V.sub.DD level to S1/S2. However, elevating the LOAD
control signal to a voltage greater than V.sub.DD is
optional, and functionality of the circuit shown in FIG. 5 is not
contingent on the LOAD control signal being elevated to a voltage
greater than V.sub.DD.
According to some embodiments, the gate of load/pass transistor
518-1 can be coupled to a PASSD control signal, and the gate of
load/pass transistor 518-2 can be coupled to a PASSDb control
signal. In the configuration where the gates of transistors 518-1
and 518-2 are respectively coupled to one of the PASSD and PASSDb
control signals, transistors 518-1 and 518-2 can be pass
transistors. Pass transistors can be operated differently (e.g., at
different times and/or under different voltage/current conditions)
than load transistors. As such, the configuration of pass
transistors can be different than the configuration of load
transistors.
Load transistors are constructed to handle loading associated with
coupling data lines to the local dynamic nodes S1 and S2, for
example. Pass transistors are constructed to handle heavier loading
associated with coupling data lines to an adjacent accumulator
(e.g., through the shift circuitry 523, as shown in FIG. 5).
According to some embodiments, load/pass transistors 518-1 and
518-2 can be configured to accommodate the heavier loading
corresponding to a pass transistor but be coupled and operated as a
load transistor. Load/pass transistors 518-1 and 518-2 configured
as pass transistors can also be utilized as load transistors.
However, load/pass transistors 518-1 and 518-2 configured as load
transistors may not be capable of being utilized as pass
transistors.
In a number of embodiments, the compute component 531, including
the latch 564, can comprise a number of transistors formed on pitch
with the transistors of the corresponding memory cells of an array
(e.g., array 530 shown in FIG. 5) to which they are coupled, which
may conform to a particular feature size (e.g., 4F.sup.2, 6F.sup.2,
etc.). According to various embodiments, latch 564 includes four
transistors 508-1, 508-2, 509-1, and 509-2 coupled to a pair of
complementary data lines D 505-1 and D_ 505-2 through load/pass
transistors 518-1 and 518-2. However, embodiments are not limited
to this configuration. The latch 564 can be a cross coupled latch
(e.g., gates of a pair of transistors, such as n-channel
transistors (e.g., NMOS transistors) 509-1 and 509-2 are cross
coupled with the gates of another pair of transistors, such as
p-channel transistors (e.g., PMOS transistors) 508-1 and 508-2). As
described further herein, the cross coupled latch 564 can be
referred to as a static latch.
The voltages or currents on the respective data lines D and D_ can
be provided to the respective latch inputs 517-1 and 517-2 of the
cross coupled latch 564 (e.g., the input of the secondary latch).
In this example, the latch input 517-1 is coupled to a first
source/drain region of transistors 508-1 and 509-1 as well as to
the gates of transistors 508-2 and 509-2. Similarly, the latch
input 517-2 can be coupled to a first source/drain region of
transistors 508-2 and 509-2 as well as to the gates of transistors
508-1 and 509-1.
In this example, a second source/drain region of transistors 509-1
and 509-2 is commonly coupled to a negative control signal line
512-1 (e.g., ground (GND) or ACCUMB control signal similar to
control signal RnIF shown in FIG. 6 with respect to the primary
latch). A second source/drain region of transistors 508-1 and 508-2
is commonly coupled to a positive control signal line 512-2 (e.g.,
V.sub.DD or ACCUM control signal similar to control signal ACT
shown in FIG. 6 with respect to the primary latch). The positive
control signal 512-2 can provide a supply voltage (e.g., V.sub.DD)
and the negative control signal 512-1 can be a reference voltage
(e.g., ground) to enable the cross coupled latch 564. According to
some embodiments, the second source/drain regions of transistors
508-1 and 508-2 are commonly coupled directly to the supply voltage
(e.g., V.sub.DD), and the second source/drain regions of transistors
509-1 and 509-2 are commonly coupled directly to the reference
voltage (e.g., ground) so as to continuously enable latch 564.
The enabled cross coupled latch 564 operates to amplify a
differential voltage between latch input 517-1 (e.g., first common
node) and latch input 517-2 (e.g., second common node) such that
latch input 517-1 is driven to either the activated positive
control signal voltage (e.g., V.sub.DD) or the activated negative
control signal voltage (e.g., ground), and latch input 517-2 is
driven to the other of the activated positive control signal
voltage (e.g., V.sub.DD) or the activated negative control signal
voltage (e.g., ground).
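As a non-limiting illustration, the amplifying behavior of an enabled cross coupled latch can be modeled behaviorally. The Python sketch below is a simplification under stated assumptions: the function name `resolve_latch`, the supply value of 1.2 volts, and the treatment of a zero differential (left unresolved) are hypothetical choices for this example and do not describe the circuit of FIG. 5 at the transistor level.

```python
VDD = 1.2  # hypothetical supply voltage in volts (assumption)
GND = 0.0

def resolve_latch(v_in1: float, v_in2: float, enabled: bool = True):
    """Return the voltages on the two latch inputs after amplification.

    An enabled latch drives the higher input to VDD and the other to GND.
    """
    if not enabled or v_in1 == v_in2:
        # Assumption: a disabled latch, or one with no differential,
        # is simply left at its input voltages in this model.
        return v_in1, v_in2
    return (VDD, GND) if v_in1 > v_in2 else (GND, VDD)
```

For example, a small differential of 0.65 V versus 0.60 V resolves to the full rail pair (V.sub.DD, ground) once the latch is enabled.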
FIG. 6 is a schematic diagram illustrating a portion of sensing
circuitry in accordance with a number of embodiments of the present
disclosure. According to various embodiments, sense amplifier 606
(e.g., corresponding to sense amplifier 506 shown in FIG. 5) can
comprise a cross coupled latch. However, embodiments of the sense
amplifier 606 are not limited to a cross coupled latch. As an
example, the sense amplifier 606 can be a current-mode sense
amplifier and/or a single-ended sense amplifier (e.g., a sense
amplifier coupled to one data line). Also, embodiments of the
present disclosure are not limited to a folded data line
architecture.
In a number of embodiments, a sense amplifier (e.g., 606) can
comprise a number of transistors formed on pitch with the
transistors of the corresponding compute component 631 and/or the
memory cells of an array (e.g., 530 shown in FIG. 5) to which they
are coupled, which may conform to a particular feature size (e.g.,
4F.sup.2, 6F.sup.2, etc.). The sense amplifier 606 comprises a
latch 615 including four transistors coupled to a pair of
complementary data lines D 605-1 and D_ 605-2. The latch 615 can be
a cross coupled latch (e.g., gates of a pair of transistors, such
as n-channel transistors (e.g., NMOS transistors) 627-1 and 627-2
are cross coupled with the gates of another pair of transistors,
such as p-channel transistors (e.g., PMOS transistors) 629-1 and
629-2). As described further herein, the latch 615 comprising
transistors 627-1, 627-2, 629-1, and 629-2 can be referred to as a
primary latch. However, embodiments are not limited to this
example.
The voltages or currents on the respective data lines D and D_ can
be provided to the respective latch inputs 633-1 and 633-2 of the
cross coupled latch 615 (e.g., the input of the primary latch).
In this example, the latch input 633-1 is coupled to a first
source/drain region of transistors 627-1 and 629-1 as well as to
the gates of transistors 627-2 and 629-2. Similarly, the latch
input 633-2 can be coupled to a first source/drain region of
transistors 627-2 and 629-2 as well as to the gates of transistors
627-1 and 629-1. The compute component 631 (e.g., serving as an
accumulator) can be coupled to latch inputs 633-1 and 633-2 of the
cross coupled latch 615 as shown; however, embodiments are not
limited to the example shown in FIG. 6.
In this example, a second source/drain region of transistors 627-1
and 627-2 is commonly coupled to an active negative control signal
628 (RnIF). A second source/drain region of transistors 629-1 and
629-2 is commonly coupled to an active positive control signal 690
(ACT). The ACT signal 690 can be a supply voltage (e.g., V.sub.DD)
and the RnIF signal can be a reference voltage (e.g., ground).
Activating signals 628 and 690 enables the cross coupled latch
615.
The enabled cross coupled latch 615 operates to amplify a
differential voltage between latch input 633-1 (e.g., first common
node) and latch input 633-2 (e.g., second common node) such that
latch input 633-1 is driven to one of the ACT signal voltage and
the RnIF signal voltage (e.g., to one of V.sub.DD and ground), and
latch input 633-2 is driven to the other of the ACT signal voltage
and the RnIF signal voltage.
The sense amplifier 606 can also include circuitry configured to
equilibrate the data lines D and D_ (e.g., in association with
preparing the sense amplifier for a sensing operation). In this
example, the equilibration circuitry comprises a transistor 624
having a first source/drain region coupled to a first source/drain
region of transistor 625-1 and data line D 605-1. A second
source/drain region of transistor 624 can be coupled to a first
source/drain region of transistor 625-2 and data line D_ 605-2. A
gate of transistor 624 can be coupled to gates of transistors 625-1
and 625-2.
The second source/drain regions of transistors 625-1 and 625-2 are
coupled to an equilibration voltage 638, which
can be equal to V.sub.DD/2, where V.sub.DD is a supply voltage
associated with the array. The gates of transistors 624, 625-1, and
625-2 can be coupled to control signal 625 (EQ). As such,
activating EQ enables the transistors 624, 625-1, and 625-2, which
effectively shorts data line D to data line D_ such that the data
lines D and D_ are equilibrated to equilibration voltage
V.sub.DD/2. According to various embodiments of the present
disclosure, a number of logical operations can be performed using
the sense amplifier, and storing the result in the compute
component (e.g., serving as an accumulator).
As shown in FIG. 5, the sense amplifier 506 and the compute
component 531 can be coupled to the array 530 via shift circuitry
523. In this example, the shift circuitry 523 comprises a pair of
isolation devices (e.g., isolation transistors 521-1 and 521-2)
coupled to data lines 505-1 (D) and 505-2 (D_), respectively. The
isolation transistors 521-1 and 521-2 are coupled to a control
signal 522 (NORM) that, when activated, enables (e.g., turns on)
the isolation transistors 521-1 and 521-2 to couple the
corresponding sense amplifier 506 and compute component 531 to a
corresponding column of memory cells (e.g., to a corresponding pair
of complementary data lines 505-1 (D) and 505-2 (D_)). According to
various embodiments, conduction of isolation transistors 521-1 and
521-2 can be referred to as a "normal" configuration of the shift
circuitry 523.
In the example illustrated in FIG. 5, the shift circuitry 523
includes another (e.g., a second) pair of isolation devices (e.g.,
isolation transistors 521-3 and 521-4) coupled to a complementary
control signal 519 (SHIFT), which can be activated, for example,
when NORM is deactivated. The isolation transistors 521-3 and 521-4
can be operated (e.g., via control signal 519) such that a
particular sense amplifier 506 and compute component 531 are
coupled to a different pair of complementary data lines (e.g., a
pair of complementary data lines different than the pair of
complementary data lines to which isolation transistors 521-1 and
521-2 couple the particular sense amplifier 506 and compute
component 531), or can couple a particular sense amplifier 506 and
compute component 531 to another memory array (and isolate the
particular sense amplifier 506 and compute component 531 from a
first memory array). According to various embodiments, the shift
circuitry 523 can be arranged as a portion of (e.g., within) the
sense amplifier 506, for instance.
Although the shift circuitry 523 shown in FIG. 5 includes isolation
transistors 521-1 and 521-2 used to couple particular sensing
circuitry 550 (e.g., a particular sense amplifier 506 and
corresponding compute component 531) to a particular pair of
complementary data lines 505-1 (D) and 505-2 (D_) (e.g., DIGIT(n)
and DIGIT(n)_) and isolation transistors 521-3 and 521-4 are
arranged to couple the particular sensing circuitry 550 to an
adjacent pair of complementary data lines in one particular
direction (e.g., adjacent data lines DIGIT(n+1) and DIGIT(n+1)_
shown to the right in FIG. 5), embodiments of the present
disclosure are not so limited. For instance, shift circuitry can
include isolation transistors 521-1 and 521-2 used to couple
particular sensing circuitry to a particular pair of complementary
data lines (e.g., DIGIT(n) and DIGIT(n)_) and isolation transistors
521-3 and 521-4 arranged so as to be used to couple the particular
sensing circuitry to an adjacent pair of complementary data lines
in another particular direction (e.g., adjacent data lines
DIGIT(n-1) and DIGIT(n-1)_ shown to the left in FIG. 5).
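As a non-limiting illustration, the effect of the NORM and SHIFT configurations can be sketched as a choice of which column each sense amplifier is coupled to. In the Python sketch below, the function name `sense_row`, the `direction` parameter, and the `fill` value seen by an amplifier at the edge of the array are assumptions introduced for this example only.

```python
def sense_row(row, mode="NORM", direction=+1, fill=0):
    """Return the value each sense amplifier sees for a row of column data.

    mode="NORM":  amplifier n is coupled to column n.
    mode="SHIFT": amplifier n is coupled to adjacent column n + direction.
    """
    if mode == "NORM":
        return list(row)
    if mode != "SHIFT":
        raise ValueError("mode must be 'NORM' or 'SHIFT'")
    n = len(row)
    sensed = []
    for amp in range(n):
        col = amp + direction
        # Assumption: an amplifier with no adjacent column sees `fill`.
        sensed.append(row[col] if 0 <= col < n else fill)
    return sensed

row = [1, 0, 1, 1, 0, 0, 1, 0]
from_right = sense_row(row, "SHIFT", direction=+1)  # couples to DIGIT(n+1)
from_left = sense_row(row, "SHIFT", direction=-1)   # couples to DIGIT(n-1)
```

With `direction=+1` the sketch corresponds to the arrangement shown in FIG. 5, and with `direction=-1` to the alternative arrangement described above.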
Embodiments of the present disclosure are not limited to the
configuration of shift circuitry 523 shown in FIG. 5. In a number
of embodiments, shift circuitry 523 such as that shown in FIG. 5
can be operated (e.g., in conjunction with sense amplifiers 506 and
compute components 531) in association with performing compute
functions such as adding and subtracting functions without
transferring data out of the sensing circuitry 550 via an I/O line
(e.g., local I/O line (IO/IO_)), for instance.
Although not shown in FIG. 5, each column of memory cells can be
coupled to a column decode line that can be enabled to transfer,
via local I/O line, a data value from a corresponding sense
amplifier 506 and/or compute component 531 to a control component
external to the array such as an external processing resource
(e.g., host processor and/or other functional unit circuitry). The
column decode line can be coupled to a column decoder.
However, as described herein, in a number of embodiments,
data need not be transferred via such I/O lines to perform logical
operations in accordance with embodiments of the present
disclosure. In a number of embodiments, shift circuitry 523 can be
operated in conjunction with sense amplifiers 506 and compute
components 531 to perform compute functions such as adding and
subtracting functions without transferring data to a control
component external to the array, for instance.
The sensing circuitry 550 can be operated in several modes to
perform logical operations, including a first mode in which a
result of the logical operation is initially stored in the compute
component 531, and a second mode in which a result of the logical
operation is initially stored in the sense amplifier 506. Operation
of the sensing circuitry 550 in the first mode is described in
association with the timing diagrams shown in FIGS. 9-12. Operation
of the sensing circuitry 550 in the second mode is described below
with respect to FIGS. 7 and 8. Additionally, with respect to the
second operating mode, sensing circuitry 550 can be operated in
both pre-sensing (e.g., sense amps fired before logical operation
control signal active) and post-sensing (e.g., sense amps fired
after logical operation control signal active) modes with a result
of a logical operation being initially stored in the sense
amplifier 506.
As described further below, the sense amplifier 506 can, in
conjunction with the compute component 531, be operated to perform
various logical operations using data from an array as input. In a
number of embodiments, the result of a logical operation can be
stored back to the array without transferring the data via a data
line address access (e.g., without firing a column decode signal
such that data is transferred to circuitry external from the array
and sensing circuitry via local I/O lines). As such, a number of
embodiments of the present disclosure can enable performing logical
operations and compute functions associated therewith using less
power than various previous approaches. Additionally, since a
number of embodiments eliminate the need to transfer data across
I/O lines in order to perform compute functions (e.g., between
memory and discrete processor), a number of embodiments can enable
an increased parallel processing capability as compared to previous
approaches.
The functionality of the sensing circuitry 550 of FIG. 5 is
described below and summarized in Table 2 below with respect to
performing logical operations and initially storing a result in the
sense amplifier 506. Initially storing the result of a particular
logical operation in the primary latch of sense amplifier 506 can
provide improved versatility as compared to previous approaches in
which the result may initially reside in a secondary latch (e.g.,
serving as an accumulator) of a compute component 531, and then be
subsequently transferred to the sense amplifier 506, for
instance.
TABLE 2
Operation    Accumulator    Sense Amp
AND          Unchanged      Result
OR           Unchanged      Result
NOT          Unchanged      Result
SHIFT        Unchanged      Shifted Data
Initially storing the result of a particular operation in the sense
amplifier 506 (e.g., without having to perform an additional
operation to move the result from the compute component 531 (e.g.,
serving as an accumulator) to the sense amplifier 506) is
advantageous because, for instance, the result can be written to a
row (of the array of memory cells) or back into the accumulator
without performing a precharge cycle (e.g., on the complementary
data lines 505-1 (D) and/or 505-2 (D_)).
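As a non-limiting illustration, the behavior summarized in Table 2 can be expressed as a small state update in which the accumulator is never modified and the result lands in the sense amplifier. In the Python sketch below, the function name `apply_operation` and its parameters are hypothetical. The choice of operands is also an assumption: AND and OR combine the accumulator bit with the sense amplifier bit, NOT is modeled as the complement of the accumulator bit, and SHIFT takes the bit from an adjacent column.

```python
def apply_operation(op, accumulator, sense_amp, neighbor=None):
    """Return (accumulator, sense_amp) after one operation per Table 2.

    accumulator: bit held in the compute component.
    sense_amp:   bit held in the sense amplifier.
    neighbor:    bit on the adjacent column, used only by SHIFT.
    """
    if op == "AND":
        result = accumulator & sense_amp
    elif op == "OR":
        result = accumulator | sense_amp
    elif op == "NOT":
        result = accumulator ^ 1  # assumption: NOT inverts the accumulator bit
    elif op == "SHIFT":
        if neighbor is None:
            raise ValueError("SHIFT needs the adjacent column's bit")
        result = neighbor
    else:
        raise ValueError(f"unknown operation: {op}")
    # Per Table 2, the accumulator is unchanged in every case.
    return accumulator, result
```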
FIG. 7 illustrates a timing diagram associated with performing a
number of logical operations using sensing circuitry in accordance
with a number of embodiments of the present disclosure.
The sensing circuitry 550 can be operated in several modes to
perform logical operations, including a second mode in which a
result of the logical operation is initially stored in the sense
amplifier 506, and a first mode in which a result of the logical
operation is initially stored in the compute component 531.
Operation of the sensing circuitry 550 in the second mode is
described below with respect to FIGS. 7 and 8, and operation of the
sensing circuitry 550 in the first mode is described below with
respect to FIGS. 9-12. Additionally, with respect to the second
operating mode, sensing circuitry 550 can be operated in both
pre-sensing (e.g., sense amps fired before logical operation
control signal active) and post-sensing (e.g., sense amps fired
after logical operation control signal active) modes with a result
of a logical operation being initially stored in the sense
amplifier 506.
FIG. 7 illustrates a timing diagram associated with initiating an
AND logical operation on a first operand and a second operand. In
this example, the first operand is stored in a memory cell coupled
to a first access line (e.g., ROW X) and the second operand is
stored in a memory cell coupled to a second access line (e.g., ROW
Y). Although the example refers to performing an AND on data stored
in cells corresponding to one particular column, embodiments are
not so limited. For instance, an entire row of data values can be
ANDed, in parallel, with a different row of data values. For
example, if an array comprises 2,048 columns, then 2,048 AND
operations could be performed in parallel.
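As a non-limiting illustration, ANDing one row with another across 2,048 columns can be sketched as an element-wise operation. In the Python sketch below, the function name `and_rows`, the list representation of a row, and the randomly generated row contents are assumptions for this example. The list comprehension evaluates columns one after another, whereas the sensing circuitry would perform all 2,048 operations at once.

```python
import random

NUM_COLUMNS = 2048  # column count taken from the example above

def and_rows(row_x, row_y):
    """AND two rows column by column; each column is independent."""
    if len(row_x) != len(row_y):
        raise ValueError("rows must have the same number of columns")
    return [x & y for x, y in zip(row_x, row_y)]

rng = random.Random(0)  # fixed seed so the illustration is repeatable
row_x = [rng.randint(0, 1) for _ in range(NUM_COLUMNS)]
row_y = [rng.randint(0, 1) for _ in range(NUM_COLUMNS)]
result = and_rows(row_x, row_y)
```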
FIG. 7 illustrates a number of control signals associated with
operating sensing circuitry (e.g., 550) to perform the AND logical
operation. "EQ" corresponds to an equilibrate signal applied to the
sense amplifier 506, "ROW X" corresponds to an enabling signal
applied to access line 504-X, "ROW Y" corresponds to an enabling
signal applied to access line 504-Y, "Act" and "RnIF" correspond to
a respective active positive and negative control signal applied to
the sense amplifier 506, "LOAD" corresponds to a load control
signal (e.g., LOAD/PASSD and LOAD/PASSDb shown in FIG. 5), and
"AND" corresponds to the AND control signal shown in FIG. 5. FIG. 7
also illustrates the waveform diagrams showing the signals (e.g.,
voltage signals) on the digit lines D and D_ corresponding to sense
amplifier 506 and on the nodes S1 and S2 corresponding to the
compute component 531 (e.g., serving as an accumulator) during an
AND logical operation for the various data value combinations of
the Row X and Row Y data values (e.g., diagrams correspond to
respective data value combinations 00, 10, 01, 11). The particular
timing diagram waveforms are discussed below with respect to the
pseudo code associated with an AND operation of the circuit shown
in FIG. 5.
An example of pseudo code associated with loading (e.g., copying) a
first data value stored in a cell coupled to row 504-X into the
accumulator can be summarized as follows:
Copy Row X into the Accumulator:
    Deactivate EQ;
    Open Row X;
    Fire Sense Amps (after which Row X data resides in the sense amps);
    Activate LOAD (sense amplifier data (Row X) is transferred to nodes
        S1 and S2 of the Accumulator and resides there dynamically);
    Deactivate LOAD;
    Close Row X;
    Precharge;
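As a non-limiting illustration, the sequence of the pseudo code above can be traced with a toy single-column model. In the Python sketch below, the `Column` class, its field names, and the function `copy_row_into_accumulator` are hypothetical. The model tracks only logical values; it ignores voltages, timing, and the polarity of cells coupled to the complementary data line.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Column:
    """Toy model of one column; all field names are illustrative."""
    cells: Dict[str, int]              # row name -> stored bit
    eq: bool = True                    # True while data lines are equilibrated
    open_row: Optional[str] = None     # row whose access transistor is on
    sense_amp: Optional[int] = None    # value held in the primary latch
    accumulator: Optional[int] = None  # value held dynamically on S1/S2

def copy_row_into_accumulator(col: Column, row: str) -> None:
    col.eq = False                     # Deactivate EQ
    col.open_row = row                 # Open Row X
    col.sense_amp = col.cells[row]     # Fire Sense Amps
    col.accumulator = col.sense_amp    # Activate LOAD
    # Deactivate LOAD: the accumulator keeps its value dynamically.
    col.open_row = None                # Close Row X
    col.eq = True                      # Precharge

column = Column(cells={"X": 1, "Y": 0})
copy_row_into_accumulator(column, "X")
```

After the call, the accumulator holds the Row X value, the row is closed, and the data lines are equilibrated again, matching the end state of the pseudo code.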
In the pseudo code above, "Deactivate EQ" indicates that an
equilibration signal (EQ signal shown in FIG. 7) corresponding to
the sense amplifier 506 is disabled at t.sub.1 as shown in FIG. 7
(e.g., such that the complementary data lines (e.g., 505-1 (D) and
505-2 (D_)) are no longer shorted to V.sub.DD/2). After
equilibration is disabled, a selected row (e.g., ROW X) is enabled
(e.g., selected, opened such as by activating a signal to select a
particular row) as indicated by "Open Row X" in the pseudo code and
shown at t.sub.2 for signal Row X in FIG. 7. When the voltage
signal applied to ROW X reaches the threshold voltage (Vt) of the
access transistor (e.g., 502-2) corresponding to the selected cell,
the access transistor turns on and couples the data line (e.g.,
505-2 (D_)) to the selected cell (e.g., to capacitor 503-2) which
creates a differential voltage signal between the data lines.
After Row X is enabled in the pseudo code above, "Fire Sense Amps"
indicates that the sense amplifier 506 is enabled to set the
primary latch and subsequently disabled. For example, as shown at
t.sub.3 in FIG. 7, the ACT positive control signal (e.g., 690 shown
in FIG. 6) goes high and the RnIF negative control signal (e.g.,
628 shown in FIG. 6) goes low, which amplifies the differential
signal between 505-1 (D) and D_ 505-2, resulting in a voltage
(e.g., V.sub.DD) corresponding to a logic 1 or a voltage (e.g.,
GND) corresponding to a logic 0 being on data line 505-1 (D) (and
the voltage corresponding to the other logic state being on
complementary data line 505-2 (D_)). The sensed data value is
stored in the primary latch of sense amplifier 506. The primary
energy consumption occurs in charging the data lines (e.g., 505-1
(D) or 505-2 (D_)) from the equilibration voltage V.sub.DD/2 to the
rail voltage V.sub.DD.
The four sets of possible sense amplifier and accumulator signals
illustrated in FIG. 7 (e.g., one for each combination of Row X and
Row Y data values) show the behavior of signals on data lines D
and D_. The Row X data value is stored in the primary latch of the
sense amp. It should be noted that FIG. 5 shows that the memory
cell including storage element 503-2 and access transistor 502-2,
corresponding to Row X, is coupled to the complementary data line
D_, while the memory cell including storage element 503-1 and
access transistor 502-1, corresponding to Row Y, is coupled to data
line D. However, as can be seen in FIG. 5, the charge stored in the
memory cell comprising access transistor 502-2 (corresponding to
Row X) corresponding to a "0" data value causes the voltage on data
line D_ (to which access transistor 502-2 is coupled) to go high
and the charge stored in the memory cell comprising access
transistor 502-2 corresponding to a "1" data value causes the
voltage on data line D_ to go low, which is the opposite of the
correspondence between data states and charge stored in the memory
cell comprising access transistor 502-1, corresponding to Row Y,
that is coupled to data line D. These differences in storing charge
in memory cells coupled to different data lines are appropriately
accounted for when writing data values to the respective memory cells.
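As a non-limiting illustration, accounting for the opposite correspondence between data states and stored charge can be sketched as a conditional inversion on write and on read. In the Python sketch below, the function names `charge_for_write` and `data_from_read` and the encoding of charge as 1 (charged) or 0 (discharged) are assumptions for this example.

```python
def charge_for_write(data_bit: int, on_complement_line: bool) -> int:
    """Return the charge state (1 = charged, 0 = discharged) to store.

    Cells coupled to data line D store the data value directly; cells
    coupled to complementary data line D_ store the inverted value.
    """
    return data_bit ^ 1 if on_complement_line else data_bit

def data_from_read(charge: int, on_complement_line: bool) -> int:
    """Recover the data value from a sensed cell charge."""
    return charge ^ 1 if on_complement_line else charge
```

Because the same inversion is applied in both directions, a value written to either kind of cell reads back unchanged.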
After firing the sense amps, in the pseudo code above, "Activate
LOAD" indicates that the LOAD control signal goes high as shown at
t.sub.4 in FIG. 7, causing load/pass transistors 518-1 and 518-2 to
conduct. In this manner, activating the LOAD control signal enables
the secondary latch in the accumulator of the compute component
531. The sensed data value stored in the sense amplifier 506 is
transferred (e.g., copied) to the secondary latch. As shown for
each of the four sets of possible sense amplifier and accumulator
signals illustrated in FIG. 7, the behavior at inputs of the
secondary latch of the accumulator indicates the secondary latch is
loaded with the Row X data value. As shown in FIG. 7, the secondary
latch of the accumulator may flip (e.g., see accumulator signals
for Row X="0" and Row Y="0" and for Row X="1" and Row Y="0"), or
not flip (e.g., see accumulator signals for Row X="0" and Row Y="1"
and for Row X="1" and Row Y="1"), depending on the data value
previously stored in the dynamic latch.
After setting the secondary latch from the data values stored in
the sense amplifier (and present on the data lines 505-1 (D) and
505-2 (D_)), in the pseudo code above, "Deactivate LOAD" indicates
that the LOAD control signal goes back low as shown at t.sub.5 in
FIG. 7 to cause the load/pass transistors 518-1 and 518-2 to stop
conducting and thereby isolate the dynamic latch from the
complementary data lines. However, the data value remains
dynamically stored in the secondary latch of the accumulator.
After storing the data value on the secondary latch, the selected
row (e.g., ROW X) is disabled (e.g., deselected, closed such as by
deactivating a select signal for a particular row) as indicated by
"Close Row X" and indicated at t.sub.6 in FIG. 7, which can be
accomplished by the access transistor turning off to decouple the
selected cell from the corresponding data line. Once the selected
row is closed and the memory cell is isolated from the data lines,
the data lines can be precharged as indicated by the "Precharge" in
the pseudo code above. A precharge of the data lines can be
accomplished by an equilibrate operation, as indicated in FIG. 7 by
the EQ signal going high at t.sub.7. As shown in each of the four
sets of possible sense amplifier and accumulator signals
illustrated in FIG. 7 at t.sub.7, the equilibrate operation causes
the voltage on data lines D and D_ to each return to V.sub.DD/2.
Equilibration can occur, for instance, prior to a memory cell
sensing operation or the logical operations (described below).
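The load phase described above (opening Row X through precharging)
can be sketched as a small behavioral model. This is an illustration
only: the class SensingModel, its field names, and the use of None to
represent data lines equilibrated at V.sub.DD/2 are assumptions made
here, not elements of the disclosed circuit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SensingModel:
    """Hypothetical behavioral model of one sense amp and accumulator."""
    data_line: Optional[int] = None    # None models D and D_ at VDD/2
    accumulator: Optional[int] = None  # secondary (dynamic) latch
    load: bool = False                 # LOAD control signal
    row_open: bool = False             # selected row enabled


def copy_row_x_into_accumulator(model: SensingModel, row_x: int) -> SensingModel:
    """Apply the load-phase steps in the order given in the text."""
    model.row_open = True                # Open Row X
    model.data_line = row_x              # Fire Sense Amps
    model.load = True                    # Activate LOAD (t4)
    model.accumulator = model.data_line  # value copied to secondary latch
    model.load = False                   # Deactivate LOAD (t5)
    model.row_open = False               # Close Row X (t6)
    model.data_line = None               # Precharge / equilibrate (t7)
    return model


if __name__ == "__main__":
    print(copy_row_x_into_accumulator(SensingModel(), 1))
```

In this sketch the accumulator keeps the Row X data value after the
data lines are equilibrated, matching the statement that the data
value remains dynamically stored in the secondary latch.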
A subsequent operation phase associated with performing the AND or
the OR operation on the first data value (now stored in the sense
amplifier 506 and the secondary latch of the compute component 531)
and the second data value (stored in a memory cell 502-1 coupled to
Row Y 504-Y) includes performing particular steps which depend on
whether an AND or an OR is to be performed. Examples of pseudo
code associated with "ANDing" and "ORing" the data value residing
in the accumulator (e.g., the first data value stored in the memory
cell 502-2 coupled to Row X 504-X) and the second data value (e.g.,
the data value stored in the memory cell 502-1 coupled to Row Y
504-Y) are summarized below. Example pseudo code associated with
"ANDing" the data values can include:
Deactivate EQ;
Open Row Y;
Fire Sense Amps (after which Row Y data resides in the sense
amps);
Close Row Y; The result of the logic operation, in the next
operation, will be placed on the sense amp, which will overwrite
any row that is active; Even when Row Y is closed, the sense
amplifier still contains the Row Y data value;
Activate AND; This results in the sense amplifier being written to
the value of the function (e.g., Row X AND Row Y); If the
accumulator contains a "0" (i.e., a voltage corresponding to a "0"
on node S2 and a voltage corresponding to a "1" on node S1), the
sense amplifier data is written to a "0"; If the accumulator
contains a "1" (i.e., a voltage corresponding to a "1" on node S2
and a voltage corresponding to a "0" on node S1), the sense
amplifier data remains unchanged (Row Y data); This operation
leaves the data in the accumulator unchanged;
Deactivate AND;
Precharge;
In the pseudo code above, "Deactivate EQ" indicates that an
equilibration signal corresponding to the sense amplifier 506 is
disabled (e.g., such that the complementary data lines 505-1 (D)
and 505-2 (D_) are no longer shorted to V.sub.DD/2), which is
illustrated in FIG. 7 at t.sub.8. After equilibration is disabled,
a selected row (e.g., ROW Y) is enabled as indicated in the pseudo
code above by "Open Row Y" and shown in FIG. 7 at t.sub.9. When the
voltage signal applied to ROW Y reaches the threshold voltage (Vt)
of the access transistor (e.g., 502-1) corresponding to the
selected cell, the access transistor turns on and couples the data
line (e.g., D 505-1) to the selected cell (e.g., to capacitor
503-1) which creates a differential voltage signal between the data
lines.
After Row Y is enabled, in the pseudo code above, "Fire Sense Amps"
indicates that the sense amplifier 506 is enabled to amplify the
differential signal between 505-1 (D) and 505-2 (D_), resulting in
a voltage (e.g., V.sub.DD) corresponding to a logic 1 or a voltage
(e.g., GND) corresponding to a logic 0 being on data line 505-1 (D)
(and the voltage corresponding to the other logic state being on
complementary data line 505-2 (D_)). As shown at t.sub.10 in FIG.
7, the ACT positive control signal (e.g., 690 shown in FIG. 6) goes
high and the RnIF negative control signal (e.g., 628 shown in FIG.
6) goes low to fire the sense amps. The sensed data value from
memory cell 502-1 is stored in the primary latch of sense amplifier
506, as previously described. The secondary latch still corresponds
to the data value from memory cell 502-2 since the dynamic latch is
unchanged.
After the second data value sensed from the memory cell 502-1
coupled to Row Y is stored in the primary latch of sense amplifier
506, in the pseudo code above, "Close Row Y" indicates that the
selected row (e.g., ROW Y) can be disabled if it is not desired to
store the result of the AND logical operation back in the memory
cell corresponding to Row Y. However, FIG. 7 shows that Row Y is
left enabled such that the result of the logical operation can be
stored back in the memory cell corresponding to Row Y. Isolating
the memory cell corresponding to Row Y can be accomplished by the
access transistor turning off to decouple the selected cell 502-1
from the data line 505-1 (D). After the selected Row Y is
configured (e.g., to isolate the memory cell or not isolate the
memory cell), "Activate AND" in the pseudo code above indicates
that the AND control signal goes high as shown in FIG. 7 at
t.sub.11, causing pull down transistor 507-1 to conduct. In this
manner, activating the AND control signal causes the value of the
function (e.g., Row X AND Row Y) to be written to the sense
amp.
With the first data value (e.g., Row X) stored in the dynamic latch
of the accumulator 531 and the second data value (e.g., Row Y)
stored in the sense amplifier 506, if the dynamic latch of the
compute component 531 contains a "0" (i.e., a voltage corresponding
to a "0" on node S2 and a voltage corresponding to a "1" on node
S1), the sense amplifier data is written to a "0" (regardless of
the data value previously stored in the sense amp) since the
voltage corresponding to a "1" on node S1 causes transistor 509-1
to conduct thereby coupling the sense amplifier 506 to ground
through transistor 509-1, pull down transistor 507-1 and data line
505-1 (D). When either data value of an AND operation is "0," the
result is a "0." Here, when the first data value (in the dynamic
latch) is a "0," the result of the AND operation is a "0"
regardless of the state of the second data value, and so the
configuration of the sensing circuitry causes the "0" result to be
written and initially stored in the sense amplifier 506. This
operation leaves the data value in the accumulator unchanged (e.g.,
from Row X).
If the secondary latch of the accumulator contains a "1" (e.g.,
from Row X), then the result of the AND operation depends on the
data value stored in the sense amplifier 506 (e.g., from Row Y).
The result of the AND operation should be a "1" if the data value
stored in the sense amplifier 506 (e.g., from Row Y) is also a "1,"
but the result of the AND operation should be a "0" if the data
value stored in the sense amplifier 506 (e.g., from Row Y) is
a "0." The sensing circuitry 550 is configured such that if the
dynamic latch of the accumulator contains a "1" (i.e., a voltage
corresponding to a "1" on node S2 and a voltage corresponding to a
"0" on node S1), transistor 509-1 does not conduct, the sense
amplifier is not coupled to ground (as described above), and the
data value previously stored in the sense amplifier 506 remains
unchanged (e.g., Row Y data value so the AND operation result is a
"1" if the Row Y data value is a "1" and the AND operation result
is a "0" if the Row Y data value is a "0"). This operation leaves
the data value in the accumulator unchanged (e.g., from Row X).
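The pull-down behavior described above for the AND operation can be
sketched at the node level as follows. The function name and the
integer representation of node voltages are assumptions made for
illustration; the transistor reference numbers follow the text.

```python
def and_pull_down(sense_amp: int, accumulator: int, and_signal: bool) -> int:
    """Return the sense amplifier value after the AND control signal acts.

    Hypothetical behavioral model, not a circuit simulation.
    """
    s1 = 1 - accumulator             # node S1 holds the complement of S2
    t509_1_conducts = (s1 == 1)      # gated by the voltage on node S1
    t507_1_conducts = and_signal     # pull-down gated by the AND signal
    if t509_1_conducts and t507_1_conducts:
        return 0                     # data line D is coupled to ground
    return sense_amp                 # Row Y data value remains unchanged


if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, and_pull_down(y, x, True))
```

With the AND control signal asserted, the returned value equals
Row X AND Row Y for all four operand combinations, and the
accumulator value is never modified.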
After the result of the AND operation is initially stored in the
sense amplifier 506, "Deactivate AND" in the pseudo code above
indicates that the AND control signal goes low as shown at t.sub.12
in FIG. 7, causing pull down transistor 507-1 to stop conducting to
isolate the sense amplifier 506 (and data line 505-1 (D)) from
ground. If not previously done, Row Y can be closed (as shown at
t.sub.13 in FIG. 7) and the sense amplifier can be disabled (as
shown at t.sub.14 in FIG. 7 by the ACT positive control signal
going low and the RnIF negative control signal going high). With the
data lines isolated, "Precharge" in the pseudo code above can cause
a precharge of the data lines by an equilibrate operation, as
described previously (e.g., commencing at t.sub.14 shown in FIG.
7).
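The effect of leaving Row Y enabled or closing it before the AND
control signal is activated can be sketched as below. The dictionary
of row values and the parameter keep_row_y_open are hypothetical
names introduced for this illustration.

```python
def and_operation(rows: dict, keep_row_y_open: bool) -> int:
    """Perform Row X AND Row Y, optionally storing the result in Row Y.

    Hypothetical model: rows maps "X" and "Y" to stored data values.
    """
    accumulator = rows["X"]   # Row X was loaded in the earlier phase
    sense_amp = rows["Y"]     # Open Row Y; Fire Sense Amps
    if accumulator == 0:      # Activate AND
        sense_amp = 0         # sense amplifier is written to "0"
    if keep_row_y_open:
        rows["Y"] = sense_amp  # an enabled row takes the sense amp value
    return sense_amp          # Deactivate AND; Precharge follows


if __name__ == "__main__":
    stored = {"X": 0, "Y": 1}
    print(and_operation(stored, keep_row_y_open=True), stored)
```

When keep_row_y_open is False, the memory cell corresponding to Row Y
retains its original data value and the result resides only in the
sense amplifier.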
FIG. 7 shows, in the alternative, the behavior of voltage signals
on the data lines (e.g., 505-1 (D) and 505-2 (D_) shown in FIG. 5)
coupled to the sense amplifier (e.g., 506 shown in FIG. 5) and the
behavior of voltage signals on nodes S1 and S2 of the secondary
latch of the compute component (e.g., 531 shown in FIG. 5) for an
AND logical operation involving each of the possible combinations of
operands (e.g., Row X/Row Y data values 00, 10, 01, and 11).
Although the timing diagrams illustrated in FIG. 7 and the pseudo
code described above indicate initiating the AND logical operation
after starting to load the second operand (e.g., Row Y data value)
into the sense amplifier, the circuit shown in FIG. 5 can be
successfully operated by initiating the AND logical operation
before starting to load the second operand (e.g., Row Y data value)
into the sense amplifier.
FIG. 8 illustrates a timing diagram associated with performing a
number of logical operations using sensing circuitry in accordance
with a number of embodiments of the present disclosure. FIG. 8
illustrates a timing diagram associated with initiating an OR
logical operation after starting to load the second operand (e.g.,
Row Y data value) into the sense amplifier. FIG. 8 illustrates the
sense amplifier and accumulator signals for various combinations of
first and second operand data values. The particular timing diagram
signals are discussed below with respect to the pseudo code
associated with an OR logical operation of the circuit shown in
FIG. 5.
A subsequent operation phase can alternately be associated with
performing the OR operation on the first data value (now stored in
the sense amplifier 506 and the secondary latch of the compute
component 531) and the second data value (stored in a memory cell
502-1 coupled to Row Y 504-Y). The operations to load the Row X
data into the sense amplifier and accumulator that were previously
described with respect to times t.sub.1-t.sub.7 shown in FIG. 7 are
not repeated with respect to FIG. 8. Example pseudo code associated
with "ORing" the data values can include:
Deactivate EQ;
Open Row Y;
Fire Sense Amps (after which Row Y data resides in the sense
amps);
Close Row Y; When Row Y is closed, the sense amplifier still
contains the Row Y data value;
Activate OR; This results in the sense amplifier being written to
the value of the function (e.g., Row X OR Row Y), which may
overwrite the data value from Row Y previously stored in the sense
amplifier as follows: If the accumulator contains a "0" (i.e., a
voltage corresponding to a "0" on node S2 and a voltage
corresponding to a "1" on node S1), the sense amplifier data
remains unchanged (Row Y data); If the accumulator contains a "1"
(i.e., a voltage corresponding to a "1" on node S2 and a voltage
corresponding to a "0" on node S1), the sense amplifier data is
written to a "1"; This operation leaves the data in the accumulator
unchanged;
Deactivate OR;
Precharge;
The "Deactivate EQ" (shown at t.sub.8 in FIG. 8), "Open Row Y"
(shown at t.sub.9 in FIG. 8), "Fire Sense Amps" (shown at t.sub.10
in FIG. 8), and "Close Row Y" (shown at t.sub.13 in FIG. 8, and
which may occur prior to initiating the particular logical function
control signal), shown in the pseudo code above indicate the same
functionality as previously described with respect to the AND
operation pseudo code. Once selected Row Y is
appropriately configured (e.g., enabled if the logical operation result
is to be stored in the memory cell corresponding to Row Y, or closed to
isolate the memory cell if the logical operation result is not to
be stored in the memory cell corresponding to Row Y), "Activate OR" in
the pseudo code above indicates that the OR control signal goes
high as shown at t.sub.11 in FIG. 8, which causes pull down
transistor 507-2 to conduct. In this manner, activating the OR
control signal causes the value of the function (e.g., Row X OR Row
Y) to be written to the sense amp.
With the first data value (e.g., Row X) stored in the secondary
latch of the compute component 531 and the second data value (e.g.,
Row Y) stored in the sense amplifier 506, if the dynamic latch of
the accumulator contains a "0" (i.e., a voltage corresponding to a
"0" on node S2 and a voltage corresponding to a "1" on node S1),
then the result of the OR operation depends on the data value
stored in the sense amplifier 506 (e.g., from Row Y). The result of
the OR operation should be a "1" if the data value stored in the
sense amplifier 506 (e.g., from Row Y) is a "1," but the result of
the OR operation should be a "0" if the data value stored in the
sense amplifier 506 (e.g., from Row Y) is also a "0." The sensing
circuitry 550 is configured such that if the dynamic latch of the
accumulator contains a "0," with the voltage corresponding to a "0"
on node S2, transistor 509-2 is off and does not conduct (and pull
down transistor 507-1 is also off since the AND control signal is
not asserted) so the sense amplifier 506 is not coupled to ground
(either side), and the data value previously stored in the sense
amplifier 506 remains unchanged (e.g., Row Y data value such that
the OR operation result is a "1" if the Row Y data value is a "1"
and the OR operation result is a "0" if the Row Y data value is a
"0").
If the dynamic latch of the accumulator contains a "1" (i.e., a
voltage corresponding to a "1" on node S2 and a voltage
corresponding to a "0" on node S1), transistor 509-2 conducts, as
does pull down transistor 507-2 since the OR control signal is
asserted, and the sense amplifier 506 input coupled to data line
505-2 (D_) is thereby coupled to ground. In this manner, a "1" is
initially stored in
the sense amplifier 506 as a result of the OR operation when the
secondary latch of the accumulator contains a "1" regardless of the
data value previously stored in the sense amp. This operation
leaves the data in the accumulator unchanged. FIG. 8 shows, in the
alternative, the behavior of voltage signals on the data lines
(e.g., 505-1 (D) and 505-2 (D_) shown in FIG. 5) coupled to the
sense amplifier (e.g., 506 shown in FIG. 5) and the behavior of
voltage signals on nodes S1 and S2 of the secondary latch of the
compute component 531 for an OR logical operation involving each of
the possible combinations of operands (e.g., Row X/Row Y data values
00, 10, 01, and 11).
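The OR behavior described above can be sketched at the node level in
the same manner. The function name and integer node representation
are assumptions made for illustration; the transistor reference
numbers follow the text.

```python
def or_pull_down(sense_amp: int, accumulator: int, or_signal: bool) -> int:
    """Return the sense amplifier value after the OR control signal acts.

    Hypothetical behavioral model, not a circuit simulation.
    """
    s2 = accumulator                 # node S2 follows the stored value
    t509_2_conducts = (s2 == 1)      # gated by the voltage on node S2
    t507_2_conducts = or_signal      # pull-down gated by the OR signal
    if t509_2_conducts and t507_2_conducts:
        return 1                     # D_ is grounded, so a "1" is stored
    return sense_amp                 # Row Y data value remains unchanged


if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, or_pull_down(y, x, True))
```

With the OR control signal asserted, the returned value equals
Row X OR Row Y for all four operand combinations.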
After the result of the OR operation is initially stored in the
sense amplifier 506, "Deactivate OR" in the pseudo code above
indicates that the OR control signal goes low as shown at t.sub.12
in FIG. 8, causing pull down transistor 507-2 to stop conducting to
isolate the sense amplifier 506 (and data line D_ 505-2) from
ground. If not previously done, Row Y can be closed (as shown at
t.sub.13 in FIG. 8) and the sense amplifier can be disabled (as
shown at t.sub.14 in FIG. 8 by the ACT positive control signal
going low and the RnIF negative control signal going high). With
the data lines isolated, "Precharge" in the pseudo code above can
cause a precharge of the data lines by an equilibrate operation, as
described previously and shown at t.sub.14 in FIG. 8.
The sensing circuitry 550 illustrated in FIG. 5 can provide
additional logical operations flexibility as follows. By
substituting operation of the ANDinv control signal for operation
of the AND control signal, and/or substituting operation of the
ORinv control signal for operation of the OR control signal in the
AND and OR operations described above, the logical operations can
be changed from {Row X AND Row Y} to {.about.Row X AND Row Y}
(where ".about.Row X" indicates an opposite of the Row X data
value, e.g., NOT Row X) and can be changed from {Row X OR Row Y} to
{.about.Row X OR Row Y}. For example, during an AND operation
involving the inverted data values, the ANDinv control signal can
be asserted instead of the AND control signal, and during an OR
operation involving the inverted data values, the ORinv control
signal can be asserted instead of the OR control signal. Activating
the ORinv control signal causes transistor 514-1 to conduct and
activating the ANDinv control signal causes transistor 514-2 to
conduct. In each case, asserting the appropriate inverted control
signal can flip the sense amplifier and cause the result initially
stored in the sense amplifier 506 to be that of the AND operation
using inverted Row X and true Row Y data values or that of the OR
operation using the inverted Row X and true Row Y data values. A
true or complement version of one data value can be used in the
accumulator to perform the logical operation (e.g., AND, OR), for
example, by loading a data value to be inverted first and a data
value that is not to be inverted second.
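The four logical functions selectable through the AND, OR, ANDinv,
and ORinv control signals can be summarized in a short sketch. The
function apply_control and its string arguments are hypothetical and
model only the logical result, not the transistor-level behavior.

```python
def apply_control(accumulator: int, sense_amp: int, control: str) -> int:
    """Return the value left in the sense amplifier for a control signal.

    accumulator holds the Row X data value; sense_amp holds Row Y.
    """
    x, y = accumulator, sense_amp
    if control == "AND":
        return x & y
    if control == "OR":
        return x | y
    if control == "ANDinv":
        return (1 - x) & y    # {~Row X AND Row Y}
    if control == "ORinv":
        return (1 - x) | y    # {~Row X OR Row Y}
    raise ValueError(f"unknown control signal: {control}")


if __name__ == "__main__":
    for control in ("AND", "OR", "ANDinv", "ORinv"):
        table = [apply_control(x, y, control) for x in (0, 1) for y in (0, 1)]
        print(control, table)
```

Because only the accumulator operand is inverted, the operand to be
inverted is the one loaded first, consistent with the loading order
noted above.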
In a similar approach to that described above with respect to
inverting the data values for the AND and OR operations described
above, the sensing circuitry shown in FIG. 5 can perform a NOT
(e.g., invert) operation by putting the non-inverted data value
into the dynamic latch of the accumulator and using that data to
invert the data value in the sense amplifier 506. As previously
mentioned, activating the ORinv control signal causes transistor
514-1 to conduct and activating the ANDinv control signal causes
transistor 514-2 to conduct. The ORinv and/or ANDinv control
signals are used in implementing the NOT function, as described
further below:
Copy Row X into the Accumulator;
Deactivate EQ;
Open Row X;
Fire Sense Amps (after which Row X data resides in the sense
amps);
Activate LOAD (sense amplifier data (Row X) is transferred to nodes
S1 and S2 of the Accumulator and resides there dynamically);
Deactivate LOAD;
Activate ANDinv and ORinv (which puts the complement data value on
the data lines); This results in the data value in the sense
amplifier being inverted (e.g., the sense amplifier latch is
flipped); This operation leaves the data in the accumulator
unchanged;
Deactivate ANDinv and ORinv;
Close Row X;
Precharge;
The "Deactivate EQ," "Open Row X," "Fire Sense Amps," "Activate
LOAD," and "Deactivate LOAD" shown in the pseudo code above
indicate the same functionality as the same operations in the
pseudo code for the "Copy Row X into the Accumulator" initial
operation phase described above prior to pseudo code for the AND
operation and OR operation. However, rather than closing the Row X
and precharging after the Row X data is loaded into the sense
amplifier 506 and copied into the dynamic latch, a complement
version of the data value in the dynamic latch of the accumulator
can be placed on the data line and thus transferred to the sense
amplifier 506 by enabling (e.g., causing transistor to conduct) and
disabling the invert transistors (e.g., ANDinv and ORinv). This
results in the sense amplifier 506 being flipped from the true data
value that was previously stored in the sense amplifier to a
complement data value (e.g., inverted data value) stored in the
sense amp. That is, a true or complement version of the data value
in the accumulator can be transferred to the sense amplifier by
activating and deactivating ANDinv and ORinv. This operation leaves
the data in the accumulator unchanged.
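The NOT sequence described above can be sketched as follows. The
function name and the returned tuple are assumptions made for
illustration; the sketch models only the values held in the sense
amplifier and the accumulator.

```python
def not_row_x(row_x: int) -> tuple:
    """Return (sense_amp, accumulator) after the NOT sequence.

    Hypothetical behavioral model of the steps in the pseudo code.
    """
    if row_x not in (0, 1):
        raise ValueError("row_x must be 0 or 1")
    sense_amp = row_x            # Open Row X; Fire Sense Amps
    accumulator = sense_amp      # Activate LOAD, then Deactivate LOAD
    sense_amp = 1 - accumulator  # Activate ANDinv and ORinv: complement
                                 # is placed on the data lines
    # Deactivate ANDinv and ORinv; Close Row X; Precharge
    return sense_amp, accumulator


if __name__ == "__main__":
    for value in (0, 1):
        print(value, not_row_x(value))
```

The sense amplifier ends with the complement data value while the
accumulator still holds the true Row X data value.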
Because the sensing circuitry 550 shown in FIG. 5 initially stores
the result of the AND, OR, and NOT logical operations in the sense
amplifier 506 (e.g., on the sense amplifier nodes), these logical
operation results can be communicated easily and quickly to any
enabled row and/or into the secondary latch of the compute
component 531. The sense amplifier 506 and sequencing for the AND,
OR, and/or NOT logical operations can also be interchanged by
appropriate firing of the AND, OR, ANDinv, and/or ORinv control
signals (a
References