U.S. patent application number 15/078682 was filed with the patent office on 2016-03-23 and published on 2016-12-15 for image stream pipeline controller for deploying image primitives to a computation fabric.
This patent application is currently assigned to Intel Corporation. The applicant listed for this patent is Intel Corporation. Invention is credited to Scott A. Krig, Stewart N. Taylor.
Application Number | 20160364832 15/078682 |
Document ID | / |
Family ID | 48698178 |
Filed Date | 2016-03-23 |
United States Patent Application | 20160364832 |
Kind Code | A1 |
Krig; Scott A.; et al. | December 15, 2016 |
IMAGE STREAM PIPELINE CONTROLLER FOR DEPLOYING IMAGE PRIMITIVES TO A COMPUTATION FABRIC
Abstract
According to some embodiments, an image pipeline controller may
determine an image stream having a plurality of image primitives to
be executed. Each image primitive may be, for example, associated
with an image algorithm and a set of primitive attributes. The
image pipeline controller may then automatically deploy the set of
image primitives to an image computation fabric based at least in
part on primitive attributes.
Inventors: | Krig; Scott A. (Folsom, CA); Taylor; Stewart N. (Los Altos, CA) |
Applicant: |
Name | City | State | Country | Type |
Intel Corporation | Santa Clara | CA | US | |
Assignee: | Intel Corporation, Santa Clara, CA |
Family ID: | 48698178 |
Appl. No.: | 15/078682 |
Filed: | March 23, 2016 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
13994013 | Jun 13, 2013 | 9378534 |
PCT/US2011/067487 | Dec 28, 2011 | |
15078682 | | |
Current U.S. Class: | 1/1 |
Current CPC Class: | G06T 1/20 20130101; G06T 7/12 20170101; G06K 9/48 20130101; G06T 2207/10016 20130101; G06T 7/49 20170101; G06K 9/00335 20130101; G06T 15/005 20130101 |
International Class: | G06T 1/20 20060101 G06T001/20; G06T 7/40 20060101 G06T007/40; G06K 9/00 20060101 G06K009/00; G06T 7/00 20060101 G06T007/00 |
Claims
1-29. (canceled)
30. A method, comprising generating, with a run-time framework, a
plurality of image primitives grouped into image segments, wherein
at least one of the image primitives to create output image data;
wherein at least one image primitive is associated with an image
primitive library, the image primitives to be at least one of: (i)
a histogram primitive, (ii) a scaling primitive; or (iii) a machine
vision primitive; and deploying, by the run-time framework, the
plurality of image primitives to hardware for execution.
31. The method of claim 30, wherein image segments are grouped for
at least one of in-order execution or out-of-order execution.
32. The method of claim 30, wherein image segments are grouped in
the run-time framework for in-order execution and out-of-order
execution, wherein a first run-time framework is executed in order
and a second run-time framework is executed out of order.
33. The method of claim 30, wherein the output image data is output
to a display.
34. The method of claim 30, wherein the image primitives generate
output image data from input image data received from an image
sensor.
35. The method of claim 30, wherein the generation of the run-time
framework is powered by a battery power source.
36. The method of claim 30, wherein the run-time framework
comprises at least one of: (i) a hardware run-time framework, (ii)
a software run-time framework, or (iii) a combination of hardware
and software run-time framework components.
37. The method of claim 30, wherein image primitives associated
with a plurality of image segments are deployed.
38. The method of claim 30, wherein at least one of the segments is
executed by an operating system, and information about the segments
is associated with an application programming interface.
39. The method of claim 30, wherein the hardware for
execution comprises at least one of a system on a chip, a
computation fabric, a processing unit, or a fixed function hardware
image processing unit.
40. The method of claim 30, further comprising executing a
sequencing algorithm to order the image primitives within an image
segment for an in-order image primitive execution in the run-time
framework.
41. The method of claim 30, wherein the image primitives comprise
an original order and said executing is performed for at least some
of the image primitives in an order different than the original
order for an out-of-order image primitive execution in the run-time
framework.
42. The method of claim 30, wherein the output image data is
produced by the image primitives before any reader of that data
accesses it.
43. The method of claim 30, wherein at least one image segment
comprises at least one of: (i) pixel correction, (ii) artifact
removal, (iii) histogram information, (iv) a scaling function, (v)
face recognition, (vi) visual object recognition, (vii) visual
scene analysis, (viii) machine vision, (ix) gesture recognition, or
(x) depth map calculation.
44. A non-transitory computer-readable storage on at least one
medium having stored thereon instructions that when executed:
generate, with a run-time framework, a plurality of image
primitives grouped into image segments, wherein at least one of the
image primitives to create output image data; wherein at least one
image primitive is associated with an image primitive library, the
image primitives to be at least one of: (i) a histogram primitive,
(ii) a scaling primitive; or (iii) a machine vision primitive; and
deploy, by the run-time framework, the plurality of image
primitives to hardware for execution.
45. The non-transitory computer-readable storage of claim 44,
wherein image segments are grouped for at least one of in-order
execution and out-of-order execution.
46. The non-transitory computer-readable storage of claim 44,
wherein image segments are grouped in the run-time framework for
in-order execution and out-of-order execution, wherein a first
run-time framework is executed in order and a second run-time
framework is executed out of order.
47. The non-transitory computer-readable storage of claim 44,
wherein the output image data is output to a display.
48. The non-transitory computer-readable storage of claim 44,
wherein the image primitives generate output image data from input
received from a camera.
49. The non-transitory computer-readable storage of claim 44,
wherein generation of the run-time framework is powered by a
battery power source.
50. The non-transitory computer-readable storage of claim 44,
wherein the run-time framework comprises at least one of: (i) a
hardware run-time framework, (ii) a software run-time framework, or
(iii) a combination of hardware and software run-time framework
components.
51. The non-transitory computer-readable storage of claim 44,
wherein image primitives associated with a plurality of image
segments are deployed.
52. The non-transitory computer-readable storage of claim 44,
wherein at least one of the segments is executed by an operating
system, and information about the segments is associated with an
application programming interface.
53. The non-transitory computer-readable storage of claim 44,
wherein the hardware for execution comprises at least one of a
system on a chip, a computation fabric, a processing unit, or a
fixed function hardware image processing unit.
54. The non-transitory computer-readable storage of claim 44,
further comprising executing a sequencing algorithm to order the
image primitives within an image segment for an in-order image
primitive execution in the run-time framework.
55. The non-transitory computer-readable storage of claim 44,
wherein the image primitives comprise an original order and said
executing is performed for at least some of the image primitives in
an order different than the original order for an out-of-order
image primitive execution in the run-time framework.
56. The non-transitory computer-readable storage of claim 44,
wherein the output image data is produced by the image primitives
before any reader of that data accesses it.
57. The non-transitory computer-readable storage of claim 44,
wherein at least one image segment comprises at least one of: (i)
pixel correction, (ii) artifact removal, (iii) histogram
information, (iv) a scaling function, (v) face recognition, (vi)
visual object recognition, (vii) visual scene analysis, (viii)
machine vision, (ix) gesture recognition, or (x) depth map
calculation.
58. A system, comprising: a processor; an image sensor to create
input image data; and a memory to store instructions that, when executed
by the processor: generate, with a run-time framework, a plurality
of image primitives grouped into image segments, wherein at least
one of the image primitives to create output image data in response
to the input image data, wherein at least one image primitive is
associated with an image primitive library, the image primitives to
be at least one of: (i) a histogram primitive, (ii) a scaling
primitive; or (iii) a machine vision primitive; and deploy, by the
run-time framework, the plurality of image primitives to hardware
for execution.
59. The system of claim 58, wherein image segments are grouped for
at least one of in-order execution and out-of-order execution.
60. The system of claim 58, wherein image segments are grouped in
the run-time framework for in-order execution and out-of-order
execution, wherein a first run-time framework is executed in order
and a second run-time framework is executed out of order.
61. The system of claim 58, wherein the image primitives generate
output image data from input received from a camera.
62. The system of claim 58, wherein generation of the run-time
framework is powered by a battery power source.
63. The system of claim 58, wherein the run-time framework
comprises at least one of: (i) a hardware run-time framework, (ii)
a software run-time framework, or (iii) a combination of hardware
and software run-time framework components.
64. The system of claim 58, wherein image primitives associated
with a plurality of image segments are deployed.
65. The system of claim 58, wherein at least one of the segments is
executed by an operating system, and information about the segments
is associated with an application programming interface.
66. The system of claim 58, wherein the hardware for execution
comprises at least one of a system on a chip, a computation fabric,
a processing unit, or a fixed function hardware image processing
unit.
67. The system of claim 58, further comprising executing a
sequencing algorithm to order the image primitives within an image
segment for an in-order image primitive execution in the run-time
framework.
68. The system of claim 58, wherein the image primitives comprise
an original order and said executing is performed for at least some
of the image primitives in an order different than the original
order for an out-of-order image primitive execution in the run-time
framework.
69. The system of claim 58, wherein the output image data is
produced by the image primitives before any reader of that data
accesses it.
70. The system of claim 58, wherein at least one image segment
comprises at least one of: (i) pixel correction, (ii) artifact
removal, (iii) histogram information, (iv) a scaling function, (v)
face recognition, (vi) visual object recognition, (vii) visual
scene analysis, (viii) machine vision, (ix) gesture recognition, or
(x) depth map calculation.
71. A system, comprising: a means to process information; a means
for creating input image data; and a means to store instructions
that, when executed by the processor: generate a plurality of image
primitives grouped into image segments to create output image data
in response to the input image data, wherein at least one image
primitive is associated with an image primitive library, the image
primitives to be at least one of: (i) a histogram primitive, (ii) a
scaling primitive; or (iii) a machine vision primitive; and deploy
the plurality of image primitives to means for execution.
72. The system of claim 71, wherein image segments are grouped for
at least one of in-order execution and out-of-order execution.
73. The system of claim 71, wherein image segments are grouped for
in-order execution and out-of-order execution, wherein a first
means of execution executes image segments in order and a second
means of execution executes image segments out-of-order.
74. The system of claim 71, wherein the image primitives generate
output image data from input received from a means for capturing
input.
75. The system of claim 71, wherein generation of means of
execution for image segments is powered by means of supplying
power.
76. The system of claim 71, wherein at least one of the segments is
executed by an operating system, and information about the segments
is associated with an application programming interface.
77. The system of claim 71, wherein the hardware for execution
comprises at least one of a system on a chip, a computation fabric,
a processing unit, or a fixed function hardware image processing
unit.
78. The system of claim 71, further comprising executing a
sequencing algorithm to order the image primitives within an image
segment for an in-order image primitive execution in means of
run-time execution.
79. The system of claim 71, wherein the image primitives comprise
an original order and said executing is performed for at least some
of the image primitives in an order different than the original
order for an out-of-order image primitive execution in means of
run-time execution.
80. The system of claim 71, wherein the output image data is
produced by the image primitives before any reader of that data
accesses it.
81. The system of claim 71, wherein at least one image segment
comprises at least one of: (i) pixel correction, (ii) artifact
removal, (iii) histogram information, (iv) a scaling function, (v)
face recognition, (vi) visual object recognition, (vii) visual
scene analysis, (viii) machine vision, (ix) gesture recognition, or
(x) depth map calculation.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application is a continuation of U.S. patent
application Ser. No. 13/994,013, filed Jun. 13, 2013, which is a
National Stage Entry of International PCT Application
PCT/US2011/067487, filed Dec. 28, 2011, both of which applications
are incorporated herein by reference.
TECHNICAL FIELD
[0002] Many devices include one or more image sensors and/or image
displays, and an image processing unit may facilitate the
processing of data coming from the sensor, being provided to the
display, and/or otherwise being utilized by applications running
on the device. For example, a smart phone might include a number of
different cameras and a touch screen. The image processing unit may
include an image computation fabric having a number of different
components to process image information.
BACKGROUND
[0003] In some cases, the image processing unit may execute a
series of image primitives to create output image data (e.g., to be
sent to a touch screen) based on input image data (e.g., received
from a smart phone's camera). The image primitives may be, for
example, associated with an image primitive library and might
include, for example, sensor primitives, calibration primitives,
optics primitives, etc.
[0004] Typically, an application executing in connection with the image
processing unit determines which image primitives will be executed
by the various components of the image computation fabric. For
example, the application might determine that a filter primitive
will be executed by fixed function hardware. Such an approach,
however, can have several disadvantages. For example, the
application might be unaware that another application is also
attempting to use the same fixed function hardware. As a result, an
application may "stall" or need to wait until the fixed function
hardware becomes free, and the performance of the system may be
degraded.
[0005] Moreover, the substantial number and relative complexity of
image primitives (and the fact that they may operate differently in
connection with different components of different image execution
fabrics) may result in substantial software development costs and
inhibit innovation for application software developers (who may be
forced to create customized software for each new platform).
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] FIG. 1 is a block diagram of a device.
[0007] FIG. 2 is a block diagram of an imaging processing unit.
[0008] FIG. 3 is a block diagram of an imaging processing unit in
accordance with some embodiments.
[0009] FIG. 4 is a flow diagram illustrating a method in accordance
with some embodiments.
[0010] FIG. 5 illustrates primitive attributes for a convolution
image primitive according to some embodiments.
[0011] FIG. 6 is a block diagram of an imaging processing unit
having a primitive attribute database or other data structure in
accordance with some embodiments.
[0012] FIG. 7 is a block diagram of an imaging processing unit with
an initialize component and a sequencer component in accordance
with some embodiments.
[0013] FIG. 8 is a block diagram of an imaging processing unit to
process multiple image streams in accordance with some
embodiments.
[0014] FIG. 9 is a block diagram of an imaging processing unit
providing a software proxy emulation of fixed function hardware
according to some embodiments.
[0015] FIGS. 10A through 10C illustrate segments in image streams
according to some embodiments.
[0016] FIG. 11 is an example of a graphical user interface for
segment attribute definition in accordance with some
embodiments.
[0017] FIG. 12 is a flow diagram illustrating a method associated
with image stream segments in accordance with some embodiments.
[0018] FIG. 13 is an overall view including an image computation
fabric and software architecture characteristics according to some
embodiments.
DESCRIPTION OF THE EMBODIMENTS
[0019] FIG. 1 is a block diagram of a device 100 that may include,
for example, one or more image sensors 110 and/or image displays
120. The sensor 110 might comprise, for example, a camera, video
camera, a depth sensor, and/or a stereo image sensor. The display
120 might comprise, for example, a touch screen, a high resolution
display, and/or a three dimensional image. An image processing unit
130 may facilitate the processing of data coming from the sensor
110, being provided to the display 120, and/or otherwise being
utilized by applications running on the device 100. Note that the
device 100 may further include one or more supplemental interfaces
140, such as a digital display port (e.g., to be coupled to a
digital television or computer monitor), a wireless antenna, or a
Universal Serial Bus (USB) interface. Note that the device might be
associated with, for example, a smart phone, a tablet computer, a
mobile computing device, a mobile telephone, a desktop computer, a
laptop computer, a gaming system, a set-top box, or a
television.
[0020] The device 100 illustrated in FIG. 1 may exchange
information via any communication network which may be one or more
of a Local Area Network ("LAN"), a Metropolitan Area Network
("MAN"), a Wide Area Network ("WAN"), a proprietary network, a
Public Switched Telephone Network ("PSTN"), a Wireless Application
Protocol ("WAP") network, a Bluetooth network, a wireless LAN
network, and/or an Internet Protocol ("IP") network such as the
Internet, an intranet, or an extranet. Note that any devices
described herein may communicate via one or more such communication
networks.
[0021] All systems and processes discussed herein may be embodied
in program code stored on one or more non-transitory
computer-readable media. Such media may include, for example, a
solid state Random Access Memory ("RAM") or Read Only Memory
("ROM") storage units. Embodiments are therefore not limited to any
specific combination of hardware and software.
[0022] FIG. 2 is a block diagram of an imaging processing unit 200
that might be used in the device 100 of FIG. 1. The image
processing unit 200 includes an image computation fabric 210 that
may process image information. The image computation fabric 210
might include, for example, a fixed function hardware image
processing unit 212, a Single Instruction, Multiple Data (SIMD)
image execution unit 214, a Very Long Instruction Word (VLIW)
processing unit 216, and/or a general processing unit 218.
[0023] The image processing unit 200 may execute a series of image
primitives 220 to create output image data (e.g., to be sent to a
touch screen) based on input image data (e.g., received from a
smart phone's camera). The image primitives 220 are associated with
an image primitive library stored in an image primitive database
260 and might include, for example, sensor primitives, calibration
primitives, optics primitives, lighting primitives, depth
primitives, segmentation primitives, color primitives, filter
primitives, and/or three dimensional depth primitives.
[0024] The set of image primitives 220 executed on the stream of
image information may represent a set of resources used by an
application to process the image data. For example, an imaging
application might require a small set of image primitives 220 to
provide processing to implement specific high level algorithms,
such as face recognition, gesture recognition, etc. That is, the
image primitives 220 may be used together to process image data and
achieve higher level goals. The image primitives 220 may represent
building blocks for larger algorithms, and may be resources which
must be managed and made available to multiple simultaneous imaging
and visual computing applications.
[0025] A set of image primitives 220 may be associated with many
different types of image algorithms, such as those associated with
pixel correction, artifact removal, histogram information, scaling
functions, face recognition, visual object recognition, visual
scene analysis, machine vision, gesture recognition, and/or depth
map calculations. Moreover, different types of image primitives 220
might be associated with, by way of examples only, camera sensor
format processing (Bayer Red Green Blue (RGB), Aptina.TM. RGB,
Kodak.TM. RGBW, etc.), camera sensor dimensions (1080p, etc.),
camera sensor frame rates, calibrations (Auto White Balance, Auto
Shutter, Auto Focus, etc.), dead pixel detection and correction,
lighting controls, optics controls, three dimensional depth sensor
controls (structured light, stereo triangulation, etc.), color
conversion (RGB, YUV, HSV, etc.), Look-Up Table (LUT) processing
and value substitution, Boolean operations, segmenting an image
into various component parts (foreground, background, objects,
etc.), filters (sharpen, blur, median, etc.), edge detection (Sobel,
Roberts, Prewitt, etc.), point operations (Pixel Math, etc.),
and/or domain processing (Fourier, Haar, Karhunen-Loeve, Slant
Transform, etc.).
[0026] Typically, an application executing in connection with the image
processing unit 200 determines which image primitives 220 will be
executed by the various components 212, 214, 216, 218 of the image
computation fabric 210. For example, the application might
determine that a filter primitive will be executed by the fixed
function hardware 212. Such an approach, however, can have several
disadvantages. For example, the application might be unaware that
another application is also attempting to use the fixed function
hardware 212. As a result, an application may "stall" or need to
wait until the fixed function hardware becomes free, and the
performance of the system may be degraded.
[0027] Moreover, the substantial number and relative complexity of
image primitives 220 (and the fact that they may operate
differently in connection with different components of different
image execution fabrics 210) may result in substantial software
development costs and inhibit innovation for application software
developers (who may be forced to create customized software for
each new platform).
[0028] Thus, embodiments provided herein may provide for improved
deployment of image primitives to a computation fabric. In
particular, FIG. 3 is a block diagram of an imaging processing unit
300 in accordance with some embodiments. As before, the image
processing unit 300 includes an image computation fabric 310 that
may process image information. The image computation fabric 310
might include, for example, a fixed function hardware image
processing unit 312, an SIMD image execution unit 314, a VLIW
processing unit 316, and/or a general processing unit 318. The
image processing unit 300 may execute a series of image primitives
320 to create output image data (e.g., to be sent to a touch
screen) based on input image data (e.g., received from a smart
phone's camera). The image primitives 320 are associated with an
image primitive library stored in an image primitive database 360.
According to this embodiment, an image pipeline controller 330 may
be used to help deploy the image primitives 320 to the image
computation fabric 310. Note that the image pipeline controller 330
might be associated with a hardware image pipeline controller, a
software image pipeline controller, or a combination of hardware
and software image pipeline controller components.
[0029] The image pipeline controller 330 may deploy image
primitives 320 (e.g. to various components of the image computation
fabric 310) in a number of different ways. For example, FIG. 4 is a
flow diagram of a process that might be associated with the
pipeline controller 330 of FIG. 3 according to some embodiments.
Note that all processes described herein may be executed by any
combination of hardware and/or software. The processes may be
embodied in program code stored on a tangible medium and executable
by a computer to provide the functions described herein. Further
note that the flow charts described herein do not imply a fixed
order to the steps, and embodiments of the present invention may be
practiced in any order that is practicable.
[0030] At 402, an image pipeline controller may determine an image
stream having a plurality of image primitives to be executed, each
image primitive being associated with an image algorithm and a set
of primitive attributes. The image stream might be, for example,
received from a video camera. At 404, the image pipeline controller
may automatically deploy the set of image primitives to an image
computation fabric based at least in part on primitive
attributes.
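The two steps of FIG. 4 can be illustrated with a minimal Python sketch. It is not taken from the application: the class `ImagePrimitive`, the `rank_preference` field, the unit names, and the fall-back to the general processing unit are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical labels for the four fabric components named in the text.
FABRIC_UNITS = ("fixed_function", "simd", "vliw", "general")


@dataclass(frozen=True)
class ImagePrimitive:
    name: str                # e.g. "convolution"
    algorithm: str           # image algorithm the primitive is associated with
    rank_preference: tuple   # assumed attribute: fabric units in preferred order


def deploy(stream, available_units):
    """Assign each primitive to its first preferred unit that is available."""
    plan = {}
    for prim in stream:
        for unit in prim.rank_preference:
            if unit in available_units:
                plan[prim.name] = unit
                break
        else:
            # Assumption: fall back to the general processing unit.
            plan[prim.name] = "general"
    return plan


# Step 402: determine the image stream and its primitives.
stream = [
    ImagePrimitive("convolution", "filter", ("fixed_function", "simd")),
    ImagePrimitive("histogram", "statistics", ("simd", "general")),
]
# Step 404: deploy based on primitive attributes (fixed function hardware busy).
plan = deploy(stream, available_units={"simd", "general"})
```

Here the convolution primitive is moved to the SIMD unit because its first preference is unavailable, which is the kind of decision the text assigns to the controller rather than to the application.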
[0031] As used herein, a primitive "attribute" may be any
information that describes aspects of the operation or execution of
the image primitive. One skilled in the art will recognize that a
wide range of attributes may be assigned to each primitive or group
of primitives within a segment, thus the attributes listed herein
serve to illustrate the concepts of this invention and therefore do
not limit the applicability of this invention to incorporate other
useful attributes beside those listed.
[0032] For example, FIG. 5 illustrates primitive attributes 500 for
a convolution image primitive according to some embodiments. The
primitive attributes 500 may be, for example, defined by a developer of
the image primitive. According to some embodiments, the primitive
attributes 500 might reflect a number of computation units, a
performance value, a power value, a thermal value, and/or a rank
preference for the image primitive. By way of example, each image
primitive might be assigned attributes a-priori by design engineers
who characterize the primitive in connection with various criteria
to define the image primitive in terms of: a performance rank for
software primitives on various processors or fixed function
hardware, a preferred processor(s) for software primitives, a
ranking of performance versus power, a fixed function hardware
availability (e.g., some primitives may be implemented only in
software), and/or an indication of whether or not the image
primitive can process a subset of image information (e.g., to be
processed as 4 k cache-resident pixel image "tiles" for two
dimensional images). That is, these primitive attributes 500 may be
used to define image primitive behavior at run-time.
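As a rough illustration, the attributes listed above could be held in a record such as the following Python sketch. The field names, the numeric values, and the reading of a "4 k" tile as a 64 x 64 block of pixels are assumptions for illustration and do not come from FIG. 5.

```python
from dataclasses import dataclass, field


@dataclass
class PrimitiveAttributes:
    computation_units: int            # number of computation units used
    performance: float                # performance value (arbitrary scale)
    power: float                      # power value (arbitrary scale)
    thermal: float                    # thermal value (arbitrary scale)
    rank_preference: list = field(default_factory=list)
    fixed_function_available: bool = False   # False: software only
    supports_tiling: bool = False            # can process a subset of the image


# Hypothetical values for a convolution primitive.
convolution = PrimitiveAttributes(
    computation_units=2,
    performance=0.9,
    power=0.4,
    thermal=0.3,
    rank_preference=["fixed_function", "simd", "general"],
    fixed_function_available=True,
    supports_tiling=True,
)


def tile_origins(width, height, tile_side=64):
    """Yield the (x, y) origin of each tile; 64 * 64 = 4096 pixels per tile."""
    for y in range(0, height, tile_side):
        for x in range(0, width, tile_side):
            yield (x, y)


origins = list(tile_origins(128, 64))
```

A primitive whose `supports_tiling` flag is set could be run once per origin returned by `tile_origins`, so that each tile stays cache-resident while it is processed.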
[0033] The primitive attributes may be stored within a primitive
attribute database or other data structure and used by a compiler
or translator that is accessed by a pipeline controller to
interpret the attributes and execute primitives in accordance with
the attributes. For example, FIG. 6 is a block diagram of an
imaging processing unit 600 having a primitive attribute database
640 or other data structure in accordance with some embodiments.
The image processing unit 600 includes an image computation fabric
610 to execute image information including a fixed function
hardware image processing unit 612, an SIMD image execution unit
614, a VLIW processing unit 616, and/or a general processing unit
618. The image processing unit 600 may execute a series of image
primitives 620 to create output image data based on input image
data, and an image pipeline controller 630 may be used to help
deploy the image primitives 620 to the image computation fabric
610. The image pipeline controller 630 may deploy image primitives
620 (e.g. to various components of the image computation fabric
610) based on information in the primitive attribute database 640
or other data structure. For example, a software application may
use an Application Programming Interface (API) to query image
primitive attributes and associated assets, and then the
application may choose a preferred method for using the image
primitive 620 based on the available primitive attributes.
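The query step described above might look like the following Python sketch. The application does not define an API, so the class `PrimitiveAttributeDatabase`, its `register` and `query` methods, and the attribute keys are hypothetical.

```python
class PrimitiveAttributeDatabase:
    """Hypothetical in-memory stand-in for the primitive attribute database."""

    def __init__(self):
        self._records = {}

    def register(self, primitive, **attributes):
        # Store a copy of the attributes under the primitive's name.
        self._records[primitive] = dict(attributes)

    def query(self, primitive):
        # Return a copy so callers cannot modify the stored record.
        return dict(self._records.get(primitive, {}))


db = PrimitiveAttributeDatabase()
db.register(
    "convolution",
    implementations=("fixed_function", "software"),
    preferred="fixed_function",
)

# The application queries the attributes, then chooses how to use the primitive.
attrs = db.query("convolution")
method = attrs.get("preferred", "software")
```

Returning copies from `query` keeps the application's choice separate from the stored attributes, which the text treats as defined in advance by the primitive's developer.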
[0034] According to some embodiments, the image pipeline controller
630 and/or primitive attribute database 640 or other data structure
at run-time may read the primitive attributes of each image
primitive 620 to determine the best way to run a workload within a
given image computation fabric 610. For example, an image primitive
620 may be available both in fixed function hardware 612 and a
software proxy as defined in the primitive attributes, in which
case an application might choose which type should be executed to
achieve a performance versus wattage target.
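One way to picture the performance versus wattage choice is the Python sketch below. The selection rule (fastest implementation within a power budget) and every number in it are illustrative assumptions, not figures from the application.

```python
def choose_implementation(implementations, max_watts):
    """Return the name of the fastest implementation within the power budget.

    Returns None when no implementation fits the budget.
    """
    within_budget = {
        name: spec
        for name, spec in implementations.items()
        if spec["watts"] <= max_watts
    }
    if not within_budget:
        return None
    return max(within_budget, key=lambda name: within_budget[name]["fps"])


# Hypothetical attribute values for one primitive available in both forms.
convolution_impls = {
    "fixed_function": {"fps": 120, "watts": 0.8},
    "software_proxy": {"fps": 45, "watts": 0.3},
}

high_budget_choice = choose_implementation(convolution_impls, max_watts=1.0)
low_budget_choice = choose_implementation(convolution_impls, max_watts=0.5)
```

With these made-up numbers the fixed function hardware is selected when the budget allows it, and the software proxy is selected under the tighter budget.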
[0035] According to some embodiments, when a software application
has not specified how to use an image primitive 620 via a primitive
attribute, the image pipeline controller 630 and/or primitive
attribute database 640 or other data structure may be used by the
various components comprising the run-time framework within this
invention to automatically attempt to optimize performance.
According to some embodiments, the run-time framework may
automatically attempt to optimize performance of primitives across
a compute fabric according to a-priori defined attributes of each
primitive, where primitives may be grouped into segments which may
be executed in-order or out-of-order according to their attributes.
Moreover, as described with respect to FIGS. 10A through 10C
segments may be chained together to form a pipeline, and the run-time
framework may optimize the workload according to the available
compute resources as per the attributes defined for each primitive.
Moreover, the optimization may include adjusting the behavior of
the computing assets such as a clock frequency, voltage, bus speed,
processor speed, processor time slice size for threads, device and
thread priorities, bus arbitration priorities, memory tile sizes,
cache behavior, memory behavior, primitive implementation method of
SW or FF HW, etc.
[0036] For example, FIG. 7 is a block diagram of an imaging
processing unit 700 with an initialize component 732 and a
sequencer component 734 in accordance with some embodiments. The
image processing unit 700 includes an image computation fabric 710
to execute image information including a fixed function hardware
image processing unit 712, an SIMD image execution unit 714, a VLIW
processing unit 716, and/or a general processing unit 718. The
image processing unit 700 may execute a series of image primitives
720 to create output image data based on input image data, and an
image pipeline controller 730 may be used to help deploy the image
primitives 720 to the image computation fabric 710. The image
pipeline controller 730 may deploy image primitives 720 (e.g. to
various components of the image computation fabric 710) based on
information in the primitive attribute database 740 or other data
structure. The initialize component 732 may be used, for example,
to initialize a camera, image sensor, or any other device.
[0037] The sequencer component 734 may execute a sequencing
algorithm to order the image primitives 720 within the image stream
for an in-order image primitive execution in a pipeline sequence.
According to some embodiments, the image primitives 720 may be
associated with an original order, and the execution of the image
primitives 720 may be performed for at least some of the image
primitives 720 in an order different than the original order for an
"out-of-order" primitive execution in a pipeline sequence. For
example, at run time the sequencer component 734 may order the
image primitives 720 to execute efficiently within the image
computation fabric 710. For example, portions of an image stream
may allow out-of-order image primitive execution (and may have no
dependencies) and such image primitives 720 may be candidates for
parallel execution across the components of the image computation
fabric 710.
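The ordering behavior described above can be sketched as a grouping
of image primitives into "waves," where every primitive in a wave has
no unmet dependencies and is therefore a candidate for parallel
execution. This is only an illustrative sketch: the function name
schedule_waves, the depends_on mapping, and the primitive names are
assumptions made for the example and are not taken from the described
embodiments.

```python
def schedule_waves(primitives, depends_on):
    """Group primitives into waves; each wave may run in parallel.

    primitives: primitive names in their original order.
    depends_on: hypothetical mapping from a primitive to the
                primitives that must complete before it starts.
    """
    remaining = {p: set(depends_on.get(p, ())) for p in primitives}
    waves = []
    while remaining:
        # Primitives with no unmet dependencies are ready to execute.
        ready = [p for p in primitives if p in remaining and not remaining[p]]
        if not ready:
            raise ValueError("unsatisfiable dependency among primitives")
        waves.append(ready)
        for p in ready:
            del remaining[p]
        # Completed primitives no longer block the ones that follow.
        for deps in remaining.values():
            deps.difference_update(ready)
    return waves


waves = schedule_waves(
    ["denoise", "sobel", "histogram", "merge"],
    {"sobel": ["denoise"], "merge": ["sobel", "histogram"]},
)
print(waves)
```

In this sketch, primitives without dependencies are pulled ahead of
their original position, which corresponds to an out-of-order
execution, while a chain of dependent primitives still executes in
order.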
[0038] A resource manager and run time resource lock mechanism may
be responsible for determining the availability of assets or
components of the image computation fabric 710, locking assets for
exclusive use by a pipeline or application, monitoring asset
states, and/or freeing assets for use by other pipelines or
applications. Such an approach may permit, for example, multiple
simultaneous applications to use the components of the image
computation fabric 710. For example, FIG. 8 is a block diagram of
an image processing unit 800 to process multiple image streams
820 in accordance with some embodiments. The image processing unit
800 includes an image computation fabric 810 to execute image
information including a fixed function hardware image processing
unit 812, an SIMD image execution unit 814, a VLIW processing unit
816, and/or a general processing unit 818. The image processing
unit 800 may execute a series of image primitives for multiple
image streams 820 to create output image data based on input image
data, and an image pipeline controller 830 may be used to help
deploy the image primitives of the image streams 820 to the image
computation fabric 810. The image pipeline controller 830 may
deploy image primitives of the image stream 820 (e.g. to various
components of the image computation fabric 810) based on
information in a primitive attribute database 840 or other data
structure. For example, different image streams 820 may be
associated with different applications being executed by an
operating system, and information about the image streams 820 may
be associated with an API.
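A resource manager with a run time lock mechanism of the kind
described above might be sketched as follows. The class name
ResourceManager, its method names, and the component and application
names are hypothetical and are used only for illustration; an actual
implementation may track additional asset states.

```python
import threading


class ResourceManager:
    """Tracks which pipeline, if any, holds each fabric component."""

    def __init__(self, components):
        self._owner = {name: None for name in components}
        self._guard = threading.Lock()

    def is_available(self, component):
        """Report whether a component is currently unlocked."""
        with self._guard:
            return self._owner[component] is None

    def try_lock(self, component, pipeline):
        """Reserve a component for exclusive use; False if it is busy."""
        with self._guard:
            if self._owner[component] is None:
                self._owner[component] = pipeline
                return True
            return False

    def release(self, component, pipeline):
        """Free a component, but only if this pipeline holds it."""
        with self._guard:
            if self._owner[component] == pipeline:
                self._owner[component] = None


manager = ResourceManager(["fixed_function", "simd", "vliw", "general"])
first = manager.try_lock("simd", "camera_app")    # lock is granted
second = manager.try_lock("simd", "gesture_app")  # refused while held
manager.release("simd", "camera_app")
third = manager.try_lock("simd", "gesture_app")   # granted after release
```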
[0039] According to some embodiments, a tile processor 836 in the
image pipeline controller 830 may determine whether a tile subset
of image data is to be deployed to the image computation fabric 810
based at least in part on a primitive attribute in the primitive
attribute database 840. For example, a primitive attribute might
indicate that a convolution image primitive in an image stream 820
can be divided into tiles that can be separately processed by
components of the image computation fabric 810 (e.g., to allow for
more efficient execution). That is, at run time the tile processor
836 may manage dividing an image stream 820 being sent through the
pipeline into tiled regions when possible and/or specified by an
application. The tiling technique may let an image be processed in
smaller tiles that fit inside a cache line, enabling swap-free
access to the data with little or no page faults. This may speed up
performance as compared to processing each image primitive over an
entire image, sequentially.
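As a rough illustration of the tiling decision, the following sketch
divides an image into rectangular tiles only when a primitive
attribute marks the primitive as tileable. The attribute key
"tileable", the function names, and the default 64 by 64 tile size
are assumptions made for this example.

```python
def tile_regions(width, height, tile_w, tile_h):
    """Yield (x, y, w, h) rectangles that cover the image exactly once."""
    for y in range(0, height, tile_h):
        for x in range(0, width, tile_w):
            # Tiles on the right and bottom edges may be smaller.
            yield x, y, min(tile_w, width - x), min(tile_h, height - y)


def maybe_tile(width, height, attributes, tile_w=64, tile_h=64):
    """Return tiles if the primitive is marked tileable, else one region."""
    if attributes.get("tileable", False):
        return list(tile_regions(width, height, tile_w, tile_h))
    return [(0, 0, width, height)]


tiles = maybe_tile(200, 100, {"tileable": True})
whole = maybe_tile(200, 100, {"tileable": False})
```

Note that this sketch ignores border handling. A convolution with a
kernel larger than one pixel reads neighboring pixels, so a practical
tiling would likely need overlapping tile borders.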
[0040] According to some embodiments, a load distributor and
balancer 838 in the image pipeline controller 830 may execute a
load-balancing algorithm between image primitives in different
image streams 820. For example, at run time the load distributor
and balancer 838 may let multiple applications simultaneously use
available assets in the image computation fabric 810, and a stream
multiplexer may manage resource locks and resource contention
issues. The load distributor and balancer 838 may also execute a
workload distribution algorithm to select an image processing
component to receive one of the image primitives in the image
streams 820. The selection may be based on a power and performance
policy, resource reservation priorities, pipeline priorities,
and/or resource availability arbitration priorities. According to
some embodiments, a workload distribution algorithm may reduce
stalls and/or optimize for power or performance associated with
execution of the image primitives in the image computation fabric
810. Thus, the load distributor and balancer 838 may spread the
workload across available resources in the image computation fabric
810, to parallelize workload execution when possible. According to
some embodiments, information in the primitive attribute database
840 may provide guidance for the load distributor and balancer
838.
[0041] For example, a workload distribution algorithm might select
one of the fixed function hardware image processing unit 812 or a
"software emulation" or proxy of the fixed function hardware image
processing unit 812 based on primitive attributes and/or an image
processing component status (e.g., when the fixed function hardware
image processing unit 812 is in use by another application, the
load distributor and balancer 838 might select to use a software
proxy of that component instead). FIG. 9 is a block diagram of an
image processing unit 900 providing a software proxy emulation of
fixed function hardware 950 according to some embodiments. The
image processing unit 900 includes an image computation fabric 910
to execute image information including a fixed function hardware
image processing unit 912, an SIMD image execution unit 914, a VLIW
processing unit 916, and/or a general processing unit 918. The
image processing unit 900 may execute a series of image primitives
for multiple image streams 920 to create output image data based on
input image data, and an image pipeline controller 930 may deploy
the image primitives of the image streams 920 to the image
computation fabric 910. The image pipeline controller 930 may
deploy image primitives of the image stream 920 to various
components of the image computation fabric 910 and/or the software
proxy emulations 950 based on information in a primitive attribute
database 940 or other data structure. For example, an image
primitive might be deployed to a software proxy emulation 950 when
a corresponding component in the image computation fabric 910 is
currently being used by another image stream 920 and/or another
application.
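The choice between fixed function hardware and its software proxy
might be sketched as a simple fallback rule. The attribute keys
"has_fixed_function" and "has_software_proxy", the target names, and
the "wait" result are hypothetical; an actual workload distribution
algorithm may also weigh power and performance policies and
priorities.

```python
def select_target(attributes, busy_components):
    """Pick an execution target for one image primitive.

    attributes: hypothetical primitive attribute record.
    busy_components: set of components locked by other streams.
    """
    hw_available = (
        attributes.get("has_fixed_function", False)
        and "fixed_function" not in busy_components
    )
    if hw_available:
        return "fixed_function"
    # Fall back to the software proxy when the hardware is in use.
    if attributes.get("has_software_proxy", False):
        return "software_proxy"
    # Neither option can run now; the caller may retry later.
    return "wait"


attrs = {"has_fixed_function": True, "has_software_proxy": True}
idle_choice = select_target(attrs, set())
busy_choice = select_target(attrs, {"fixed_function"})
```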
[0042] Note that FIG. 9 represents a logical architecture according
to some embodiments, and actual implementations may include more or
different components arranged in other manners. Moreover, each
system described herein may be implemented by any number of devices
in communication via any number of communication paths. Two or more
of the devices may be implemented in a single component.
Further, each device may comprise any number of hardware and/or
software elements suitable to provide the functions described
herein as well as any other functions. Other topologies may be used
in conjunction with other embodiments.
[0043] The image streams 920 are composed of sequences of image
primitives. According to some embodiments, a subset of the image
primitives within a stream are associated with an image stream
"segment." For example, FIG. 10A illustrates 1000 segments in image
streams according to some embodiments. In particular, a first image
stream 1010 includes an image stream segment 1012 comprising image
primitives A, B, and C. The first image stream 1010 also includes a
number of individual image primitives 1014, 1016. A second image
stream 1020 includes other image stream segments 1022 (comprising
image primitives A, E, and F) and 1024 (comprising image primitives
B, D, and G).
[0044] The image streams 1010, 1020 of FIG. 10A might be composed
of "in-order" image stream segments. That is, each image stream
segment might be deployed to the image computation fabric only
after the prior segment has executed (sequential execution). Note,
however, that some image streams might support out-of-order
execution. For example, FIG. 10B illustrates 1030 three image
stream segments being executed in parallel by the image computation
fabric (spread out across computing resources). Similarly, FIG. 10C
illustrates 1040 an image pipeline composed of both in-order and
out-of-order image stream segments chained together.
[0045] According to some embodiments, the image stream segments may
be associated with one or more image stream attributes for workload
distribution, stall reduction, power optimization, performance
optimization, load balancing, and/or a sequencing algorithm. Thus,
a pipeline or image stream may be composed of segments, where
segments are composed of sets of image primitives. Moreover, sets
of primitives may be combinations of either fixed function
hardware, software proxy emulations of the fixed function hardware
that can be used when the fixed function hardware is busy, or
"software only" primitives. Moreover, segments might be executed
either in-order or out-of-order. According to some embodiments,
image primitives, segments and/or entire pipelines may have policy
attributes such as priority, power/performance budget, memory size
requests, and memory bandwidth requests. Note that a programmable
segment could be provided such that it is associated with an
arbitrary set of image primitives and/or an arbitrary image
primitive order (e.g., to allow a customer to program an area image
processing function).
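The relationship between pipelines, segments, and image primitives
might be represented with a data structure such as the following.
The class and field names are assumptions made for illustration, and
only a few of the policy attributes mentioned above are shown. The
example values mirror the second image stream 1020 of FIG. 10A.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Primitive:
    name: str
    # One of "fixed_function", "software_proxy", or "software_only".
    implementation: str = "software_only"


@dataclass
class Segment:
    primitives: List[Primitive]
    in_order: bool = True   # False permits out-of-order execution
    priority: int = 0       # example policy attribute


@dataclass
class Pipeline:
    segments: List[Segment] = field(default_factory=list)

    def primitive_names(self):
        """List primitive names in segment order."""
        return [p.name for s in self.segments for p in s.primitives]


segment_1022 = Segment([Primitive("A"), Primitive("E"), Primitive("F")])
segment_1024 = Segment(
    [Primitive("B"), Primitive("D"), Primitive("G")], in_order=False
)
stream_1020 = Pipeline([segment_1022, segment_1024])
names = stream_1020.primitive_names()
```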
[0046] Thus, a segment of an image stream may be assigned various
attributes to control its execution during run time. For example,
FIG. 11 is an example of a Graphical User Interface 1100 for
segment attribute definition in accordance with some embodiments.
The GUI 1100 may, for example, let a software developer define
attributes for a camera fixed function hardware segment and an
associated camera software proxy segment. In particular, the GUI
1100 may be used to turn various attributes (e.g., de-noise,
artifact removal, and/or video stabilization attributes) "on" or
"off" for the segments as appropriate.
[0047] These attributes may be used by an image pipeline controller
when deploying the segment to an image computation fabric. For
example, FIG. 12 is a flow diagram illustrating a method associated
with image stream segments in accordance with some embodiments. At
1202, an image stream with multiple segments may be received by an
image pipeline controller. For example, the image stream might be
received from a smartphone video camera. At 1204, a stall avoidance
and power optimization analysis may be performed for the segment.
For example, an image pipeline controller might try to avoid
deploying multiple tasks to the same component at the same time. At
1206, a
workload distribution and load balancing analysis may be performed
in connection with various pipelines, segments and/or individual
image primitives. For example, an image pipeline controller might
attempt to deploy tasks to underutilized image resources or
assets. At 1208, an execution component may be selected for the
segment. Based on the selection at 1208, the segment may be
assigned to a fixed function hardware image processing unit at 1210
or a software proxy of a fixed function hardware image processing
unit at 1212.
[0048] FIG. 13 is an overall view 1300 including an image
computation fabric 1310 (e.g., associated with any of the image
execution components described herein) and software architecture
characteristics 1320 according to some embodiments. The software
architecture characteristics 1320 may include a number of
applications executing simultaneously. Moreover, the applications
may access a software framework for hardware context management
(enabling the multiple simultaneous applications) via software
libraries that may be provided for common use cases. Moreover,
software primitives at a hardware abstraction layer may be provided
for the software architecture characteristics 1320. In this way, a
programming model may be provided using image primitives to
simplify development of software applications. Moreover, sensor
processing may be associated with decreased software development
and/or tool costs and improved software scalability for System On
Chip (SOC) image products.
[0049] Embodiments described herein may provide a standard software
API across different execution components and/or visual computing
assets associated with perceptual computing software and fixed
function hardware, camera pipelines and assets to help provide an
improved user experience and performance versus wattage
advantages.
[0051] According to some embodiments, a run-time framework may
automatically attempt to facilitate or optimize performance of
primitives across a compute fabric according to a-priori defined
attributes of each primitive. Moreover, according to some
embodiments, primitives may be grouped into segments which might be
executed in-order or out-of-order according to their attributes.
Moreover, segments may be chained together into a pipeline, and the
run-time framework may attempt to facilitate or optimize the
workload according to the available compute resources as per the
attributes defined for each primitive or segment. According to some
embodiments, the facilitation or optimization might include support
for multiple simultaneous applications to share the compute fabric,
interleaving for resource sharing and usage by different
applications, resource locking and sharing mechanisms for
primitives in a compute fabric, adjusting the behavior of the
computing primitive assets such as by adjusting a clock frequency,
voltage, bus speed, processor speed, processor time slice size for
threads, device and thread priorities, bus arbitration priorities,
memory tile sizes, cache behavior, memory behavior, primitive
implementation method of SW or FF HW, etc.
[0052] The following illustrates various additional embodiments and
does not constitute a definition of all possible embodiments, and
those skilled in the art will understand that the present invention
is applicable to many other embodiments. Further, although the
following embodiments are briefly described for clarity, those
skilled in the art will understand how to make any changes, if
necessary, to the above-described apparatus and methods to
accommodate these and other embodiments and applications.
[0053] Although embodiments have been described with respect to
particular types of image sensors and displays, note that
embodiments may be associated with other types of sensors and
displays. For example, three dimensional cameras and/or displays
may be supported by any of the embodiments described herein.
Moreover, while embodiments have been illustrated using particular
ways of processing image information, note that embodiments might
instead be associated with any other sorts of image primitives
and/or algorithms.
[0054] Embodiments have been described herein solely for the
purpose of illustration. Persons skilled in the art will recognize
from this description that embodiments are not limited to those
described, but may be practiced with modifications and alterations
limited only by the spirit and scope of the appended claims.
* * * * *