U.S. patent application number 14/727427, for validation of applications for graphics processing unit, was filed with the patent office on June 1, 2015 and published on September 17, 2015.
The applicant listed for this patent is QUALCOMM Incorporated. The invention is credited to Alexei V. Bourd and Jay Chunsup Yun.
United States Patent Application | 20150261651 |
Application Number | 14/727427 |
Family ID | 47846123 |
Publication Date | September 17, 2015 |
Kind Code | A1 |
Bourd; Alexei V.; et al. |
VALIDATION OF APPLICATIONS FOR GRAPHICS PROCESSING UNIT
Abstract
The techniques described in this disclosure are directed to
validating an application that is to be executed on a graphics
processing unit (GPU). For example, a validation server device may
receive code of the application. The validation server device may
provide some level of assurance that the application satisfies one
or more performance criteria. In this manner, the probability of a
problematic application executing on the device that includes the
GPU may be reduced.
Inventors: | Bourd; Alexei V. (San Diego, CA); Yun; Jay Chunsup (Carlsbad, CA) |
Applicant: | QUALCOMM Incorporated (San Diego, CA, US) |
Family ID: | 47846123 |
Appl. No.: | 14/727427 |
Filed: | June 1, 2015 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
13406272 | Feb 27, 2012 | 9075913 |
14727427 | | |
Current U.S. Class: | 717/110; 717/168 |
Current CPC Class: | G06F 11/0793 20130101; G06F 11/3652 20130101; G06F 8/30 20130101; G06F 11/3604 20130101; G06F 2221/033 20130101; G06F 11/3664 20130101; G06F 21/566 20130101; G06F 11/0751 20130101; G06F 11/0736 20130101; G06F 11/3612 20130101 |
International Class: | G06F 11/36 20060101 G06F011/36; G06F 21/56 20060101 G06F021/56; G06F 9/44 20060101 G06F009/44 |
Claims
1. A computer-readable storage medium comprising instructions that,
when executed, cause one or more processors to: receive, with a
server device, an application that is to be executed by a graphics
processing unit (GPU) that resides on a device external to the
server device; determine, by the server device, that the
application would execute inefficiently on the GPU; modify, by the
server device and based on the determination that the application
would execute inefficiently on the GPU, code of the application to
create a modified version of the application that would execute
more efficiently on the GPU than the received application; perform,
with the server device, an analysis of the modified version of the
application during execution of the modified version of the
application on the server device, wherein the instructions that
cause the one or more processors to perform the analysis comprise
instructions that cause the one or more processors to: execute a
virtual GPU model; execute the modified version of the application
on the virtual GPU model; and analyze functionality of the virtual
GPU model during the execution of the modified version of the
application on the virtual GPU model; determine whether the
modified version of the application satisfies one or more
performance criteria based on at least one of the analyses; and
transmit, to the device, the modified code of the application and a
validation of the application if the application satisfies the one
or more performance criteria.
2. The computer-readable storage medium of claim 1, wherein the
instructions that cause the one or more processors to receive the
application further comprise instructions that cause the one or
more processors to receive an identification of the GPU that
resides on the device external to the server device, and further
comprising instructions that cause the one or more processors to:
identify, based on the received identification of the GPU, a
particular virtual GPU model of a plurality of virtual GPU models,
wherein the instructions that cause the one or more processors to
execute the virtual GPU model comprise instructions that cause the
one or more processors to execute the identified particular virtual
GPU model, and wherein the instructions that cause the one or more
processors to execute the modified version of the application on
the virtual GPU model comprise instructions that cause the one or
more processors to execute the modified version of the application
on the identified particular virtual GPU model.
3. A method comprising: receiving an application that is to be
executed by a graphics processing unit (GPU) of a device;
transmitting the application and an identification of the GPU to a
server device external to the device for validation of the
application on a virtual GPU model associated with the identified
GPU of the device; receiving a modified version of the application
from the server device, wherein the modified version of the
application would execute more efficiently on the GPU; and
receiving a validation from the server device that indicates that
the modified version of the application satisfies one or more
criteria for execution on the GPU.
4. The method of claim 3, further comprising: executing the
modified version of the application on the GPU based on the
received validation.
5. The method of claim 3, wherein receiving the modified version of
the application comprises receiving at least one of source code for
the modified version of the application, intermediate code of the
modified version of the application, and compiled code for the
modified version of the application, and wherein transmitting the
application comprises transmitting at least one of the source code
for the application, intermediate code of the application, and the
compiled code of the application.
6. The method of claim 3, further comprising: executing the
modified version of the application on the GPU.
7. The method of claim 3, wherein transmitting the application
comprises transmitting at least one of a source code of the
application and an intermediate code of the application, wherein
receiving the modified version of the application comprises
receiving compiled object code of the modified version of the
application from the server device, the method further comprising:
executing the compiled object code of the modified version of the
application on the GPU.
8. The method of claim 3, wherein transmitting the application to
the server device comprises transmitting the application only once
to the server device, and wherein receiving the validation from the
server device comprises receiving, only once, the validation from
the server device.
9. An apparatus comprising: a graphics processing unit (GPU); a
device memory operable to store an application that is to be
executed by the GPU; and a processor configured to: transmit the
application and an identification of the GPU to a server device
external to the apparatus for validation of the application on a
virtual GPU model associated with the identified GPU of the device;
receive a modified version of the application from the server
device, wherein the modified version of the application would
execute more efficiently on the GPU; and receive a validation from
the server device that indicates that the modified version of the
application satisfies one or more criteria for execution on the
GPU.
10. The apparatus of claim 9, wherein the processor is further
configured to instruct the GPU to execute the modified version of
the application based on the received validation, and wherein the
GPU is operable to execute the application in response to the
instruction from the processor.
11. The apparatus of claim 9, wherein the processor receives at
least one of source code for the modified version of the
application, intermediate code of the modified version of the
application, and compiled code for the modified version of the
application, and wherein the processor transmits at least one of
the source code for the application, intermediate code of the
application, and the compiled code of the application.
12. The apparatus of claim 9, wherein the GPU is configured to
execute the modified version of the application.
13. The apparatus of claim 9, wherein the processor transmits at
least one of a source code of the application and an intermediate
code of the application, wherein the processor is configured to
receive the modified version of the application by at least
receiving compiled object code of the modified version of the
application from the server device, and wherein the GPU is
configured to execute the compiled object code of the modified
version of the application.
14. The apparatus of claim 9, wherein the processor transmits the
application only once to the server device, and wherein the
processor receives the validation from the server device only
once.
15. A device comprising: a graphics processing unit (GPU); means
for receiving an application that is to be executed by the GPU;
means for transmitting the application and an identification of the
GPU to a server device external to the device for validation of the
application on a virtual GPU model associated with the identified
GPU of the device; means for receiving a modified version of the
application from the server device, wherein the modified version of
the application would execute more efficiently on the GPU; and
means for receiving a validation from the server device that
indicates that the modified version of the application satisfies
one or more criteria for execution on the GPU.
16. The device of claim 15, further comprising: means for executing
the modified version of the application on the GPU based on the
received validation.
17. The device of claim 15, wherein the means for receiving the
modified version of the application comprise means for receiving at
least one of source code for the modified version of the
application, intermediate code of the modified version of the
application, and compiled code for the modified version of the
application, and wherein the means for transmitting the application
comprise means for transmitting at least one of the source code for
the application, intermediate code of the application, and the
compiled code of the application.
18. The device of claim 15, further comprising: means for executing
the modified version of the application on the GPU.
19. The device of claim 15, wherein the means for transmitting the
application comprise means for transmitting at least one of a
source code of the application and an intermediate code of the
application, wherein the means for receiving the modified version
of the application comprise means for receiving compiled object
code of the modified version of the application from the server
device, the device further comprising: means for executing the
compiled object code of the modified version of the application on
the GPU.
20. The device of claim 15, wherein the means for transmitting the
application to the server device comprise means for transmitting
the application only once to the server device, and wherein the
means for receiving the validation from the server device comprise
means for receiving, only once, the validation from the server
device.
Description
[0001] This application is a continuation of U.S. application Ser.
No. 13/406,272 filed Feb. 27, 2012, the entire content of which is
incorporated herein by reference in its entirety.
TECHNICAL FIELD
[0002] This disclosure is directed to applications that execute on
a graphics processing unit (GPU), and more particularly, to
validation of such applications.
BACKGROUND
[0003] Graphics processing units (GPUs) traditionally have been
limited to performing only graphics-related processing in
fixed-function pipelines that provide very limited functional
flexibility. Newer GPUs include programmable cores that execute
programs, and thereby provide greater functional flexibility as
compared to the traditional GPUs. The programmable cores may
execute both graphics-related applications and non-graphics-related
applications.
SUMMARY
[0004] In general, this disclosure is related to techniques for
identifying potentially problematic applications that are to be
executed on a graphics processing unit (GPU), prior to execution.
Examples of problematic applications include, but are not limited
to, malicious applications, as well as inefficient or error-prone
applications. For example, a server device external to the device
that houses the GPU may validate the application. Validation of the
application may mean that the application satisfies one or more
criteria. As one example, validation may mean determining with some
level of assurance that the application is not a malicious
application, an error-prone application, or an inefficient
application. The server device may transmit an indication to the
device that indicates whether it is safe or inadvisable for
the GPU to execute the application. The device may then elect
whether to execute the application on the GPU based on the received
indication.
[0005] In one example, the disclosure describes a method that
includes receiving, with a server device, an application that is to
be executed by a graphics processing unit (GPU) that resides on a
device external to the server device. The method also includes
performing, with the server device, at least one of an analysis of
the application prior to and during compilation of the application
on the server device, and an analysis of the application during
execution of the application on the server device. The method
further includes determining whether the application satisfies one
or more performance criteria based on at least one of the analyses,
and transmitting to the device a validation of the application if
the application satisfies the one or more performance criteria.
[0006] In another example, the disclosure describes an apparatus
that includes an emulator unit operable to receive an application
that is to be executed by a graphics processing unit (GPU) that
resides on a device external to the apparatus. The emulator unit is
also operable to perform at least one of an analysis of the
application prior to and during compilation of the application on
the apparatus, and an analysis of the application during execution
of the application on the apparatus. The emulator unit is also
operable to determine whether the application satisfies one or more
performance criteria based on at least one of the analyses, and
transmit to the device a validation of the application if the
application satisfies the one or more performance criteria.
[0007] In another example, the disclosure describes a server device
that includes means for receiving an application that is to be
executed by a graphics processing unit (GPU) that resides on a
device external to the server device. The server device also
includes means for performing at least one of an analysis of the
application prior to and during compilation of the application on
the server device, and an analysis of the application during
execution of the application on the server device. The server
device further includes means for determining whether the
application satisfies one or more performance criteria based on at
least one of the analyses, and means for transmitting to the device
a validation of the application if the application satisfies the
one or more performance criteria.
[0008] In another example, the disclosure describes a
non-transitory computer-readable storage medium comprising
instructions that cause one or more processors to receive, with a
server device, an application that is to be executed by a graphics
processing unit (GPU) that resides on a device external to the
server device. The instructions further cause one or more
processors to perform, with the server device, at least one of an
analysis of the application prior to and during compilation of the
application on the server device, and an analysis of the
application during execution of the application on the server
device. The instructions also cause the one or more processors to
determine whether the application satisfies one or more performance
criteria based on at least one of the analyses, and transmit to the
device a validation of the application if the application satisfies
the one or more performance criteria.
[0009] In another example, the disclosure describes a method that
includes receiving an application that is to be executed by a
graphics processing unit (GPU) of a device, and transmitting the
application to a server device external to the device for
validation of the application. The method further includes
receiving a validation from the server device that indicates that
the application satisfies one or more criteria for execution on the
GPU.
[0010] In another example, the disclosure describes an apparatus
that includes a graphics processing unit (GPU), and a device memory
operable to store an application that is to be executed by the GPU.
The apparatus also includes a processor operable to transmit the
application to a server device external to the apparatus, and
receive a validation from the server device that indicates that the
application satisfies one or more criteria for execution on the
GPU.
[0011] In another example, the disclosure describes a device that
includes a graphics processing unit (GPU). The device also includes
means for receiving an application that is to be executed by the
GPU, and means for transmitting the application to a server device
external to the device for validation of the application. The
device further includes means for receiving a validation from the
server device that indicates that the application satisfies one or
more criteria for execution on the GPU.
[0012] In another example, the disclosure describes a
non-transitory computer-readable storage medium comprising
instructions that cause one or more processors to receive an
application that is to be executed by a graphics processing unit
(GPU) of a device, and transmit the application to a server device
external to the device for validation of the application. The
instructions further cause the one or more processors to receive a validation
from the server device that indicates that the application
satisfies one or more criteria for execution on the GPU.
[0013] The details of one or more aspects of the disclosure are set
forth in the accompanying drawings and the description below. Other
features, objects, and advantages of the disclosure will be
apparent from the description and drawings, and from the
claims.
BRIEF DESCRIPTION OF DRAWINGS
[0014] FIG. 1 is a block diagram illustrating an example of a
system that may be operable to implement one or more aspects of
this disclosure.
[0015] FIG. 2 is a flowchart illustrating an example operation of a
device that may be operable to implement one or more aspects of
this disclosure.
[0016] FIG. 3 is a flowchart illustrating an example operation of a
server that may be operable to implement one or more aspects of
this disclosure.
[0017] FIG. 4 is a flowchart illustrating another example
operation of a server that may be operable to implement one or more
aspects of this disclosure.
[0018] FIG. 5 is a block diagram illustrating an example device,
illustrated in FIG. 1, in further detail.
DETAILED DESCRIPTION
[0019] In general, this disclosure is related to techniques to
ensure proper functionality of applications that are to be executed
on a graphics processing unit (GPU). Some previous GPUs included
only fixed-function hardware pipelines that did not provide
programming capabilities. However, to increase functional
flexibility, newer GPUs allow for programmable shader cores. For
example, these GPUs execute applications such as vertex shaders and
fragment shaders that perform functions that were previously
delegated to components of the fixed-function hardware
pipelines.
[0020] While programmable shader cores allow for functional
flexibility, they also invite misuse or suboptimal use of the GPU.
For example, a malicious developer may develop an application that
launches a denial-of-service attack or spreads a virus. In some instances,
a developer, who may not have malicious intent, may nevertheless
inadvertently develop an inefficient or error-prone application. A
problematic application (e.g., a malicious, inefficient or
error-prone application) can substantially undermine the operation
of the GPU or a device in which the GPU is provided.
[0021] The techniques of this disclosure may assist in identifying
possibly malicious, inefficient and/or error-prone GPU-executed
applications, prior to execution by the GPU. For example, the
techniques of this disclosure may be directed to a cloud-based
solution in which a server device, external to the device that
houses the GPU, and coupled to the device housing the GPU via one
or more network connections, functions as an emulator for execution
of an application. The server may emulate the results of the
application, as if the application is executing on the GPU. Based
on the results, the server may validate the application (e.g.,
determine whether or not the program is malicious, inefficient, or
error-prone), and indicate as such to the device that houses the
GPU. The GPU may then execute the application based on the received
indication.
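As a rough sketch of the exchange described above (and in claims 3 and 4), the device-side decision could look like the Python below. The message classes `ValidationRequest` and `ValidationResponse` and the function `choose_code_to_run` are hypothetical names, not part of any published interface; the sketch assumes the server may return modified code along with its verdict, and that the device runs the modified version whenever one is supplied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationRequest:
    """Hypothetical message from the device: application code plus its GPU identification."""
    app_code: bytes
    gpu_id: str


@dataclass
class ValidationResponse:
    """Hypothetical reply from the server: the verdict and, optionally, modified code."""
    validated: bool
    modified_code: Optional[bytes] = None
    reason: str = ""


def choose_code_to_run(request: ValidationRequest,
                       response: ValidationResponse) -> Optional[bytes]:
    """Return the code the GPU should execute, or None if execution is inadvisable."""
    if not response.validated:
        return None  # Server flagged the application; do not run it on the GPU.
    if response.modified_code is not None:
        return response.modified_code  # Prefer the server's modified version.
    return request.app_code


req = ValidationRequest(app_code=b"kernel v1", gpu_id="gpu-a")
print(choose_code_to_run(req, ValidationResponse(validated=True, modified_code=b"kernel v2")))
# b'kernel v2'
print(choose_code_to_run(req, ValidationResponse(validated=False, reason="possible hang")))
# None
```

Keeping the verdict and the optional modified code in one response mirrors the claims, where the validation and the modified version both come back from the server device.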
[0022] There may be various ways in which the server may execute a
validation process to validate the application. The validation
process may be a software process. The software process may be
executed in conjunction with a general purpose processor and/or
special purpose hardware. For example, the server may execute
virtual model software. The virtual model causes the server to
emulate the GPU, or the actual device that includes the GPU, upon which
the application will execute. In alternate examples, instead of or
in addition to virtual models, the server may include a hardware
emulation board to validate the application. The server may also
include an application that is specifically designed to test for
security violations by the application that is to be executed by the
GPU.
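Claim 2 describes choosing, from a plurality of virtual GPU models, the one that matches the identification of the GPU sent by the device. A minimal Python sketch of that lookup follows. `VirtualGPUModel`, its fields, `ModelRegistry`, and the `fits_in_memory` check are illustrative assumptions; a real virtual model would emulate the GPU's behavior rather than merely carry a few parameters.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class VirtualGPUModel:
    """Hypothetical stand-in for a virtual GPU model; the fields are illustrative."""
    gpu_id: str
    num_cores: int
    memory_bytes: int


class ModelRegistry:
    """Maps a GPU identification to the virtual model that emulates that GPU."""

    def __init__(self, models: Iterable[VirtualGPUModel]):
        self._models = {m.gpu_id: m for m in models}

    def select(self, gpu_id: str) -> VirtualGPUModel:
        model = self._models.get(gpu_id)
        if model is None:
            # Without a matching model, the server cannot emulate this GPU.
            raise LookupError(f"no virtual model for GPU {gpu_id!r}")
        return model


Check = Callable[[VirtualGPUModel, bytes], bool]


def validate_on_model(registry: ModelRegistry, gpu_id: str,
                      app_code: bytes, checks: Iterable[Check]) -> bool:
    """Run every check against the model selected for the device's GPU."""
    model = registry.select(gpu_id)
    return all(check(model, app_code) for check in checks)


def fits_in_memory(model: VirtualGPUModel, code: bytes) -> bool:
    # Toy criterion for illustration only.
    return len(code) <= model.memory_bytes
```

Passing the checks in as callables keeps model selection separate from the analyses themselves, so static and dynamic checks can be added without changing the lookup.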
[0023] To validate the application that is to be executed by the
GPU, the server may perform static analysis, dynamic analysis, or a
combination thereof. Static analysis refers to analysis of the
application that can be performed without execution of the
application. For instance, static analysis can be performed during
compilation. During compilation, the server may identify errors in
the application; two non-limiting examples are infinite loops and
out-of-bounds accesses to array locations.
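To make the two static checks concrete, the sketch below inspects a program without running it. It assumes, purely for illustration, that the kernel is written in a small Python subset and that array sizes are known ahead of time; real GPU programs (for example GLSL or OpenCL C) would need their own parser. It uses Python's standard `ast` module (Python 3.9 or later) to flag `while True:` loops with no `break` and integer-literal indices outside a known array size.

```python
import ast


def static_check(source: str, array_sizes: dict[str, int]) -> list[str]:
    """Report possible infinite loops and constant out-of-bounds indices in `source`."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        # `while True:` with no `break` anywhere in its body. A `break` inside a
        # nested loop also suppresses the warning, so this check is lenient.
        if (isinstance(node, ast.While)
                and isinstance(node.test, ast.Constant)
                and node.test.value is True
                and not any(isinstance(n, ast.Break) for n in ast.walk(node))):
            issues.append(f"line {node.lineno}: possible infinite loop")
        # `name[k]` where k is an integer literal and `name` has a known size.
        if (isinstance(node, ast.Subscript)
                and isinstance(node.value, ast.Name)
                and node.value.id in array_sizes
                and isinstance(node.slice, ast.Constant)
                and type(node.slice.value) is int):
            size = array_sizes[node.value.id]
            if not 0 <= node.slice.value < size:
                issues.append(f"line {node.lineno}: {node.value.id}[{node.slice.value}] "
                              f"out of bounds (size {size})")
    return issues


kernel = "out = [0] * 4\nout[4] = 1\nwhile True:\n    pass\n"
for issue in static_check(kernel, {"out": 4}):
    print(issue)
```

Both findings come from the program text alone, which is what distinguishes this from the dynamic analysis that follows; indices computed at run time are out of reach of a check this simple.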
[0024] Dynamic analysis refers to analysis of the application
during execution, which may additionally result in identifying
problematic applications (e.g., malicious, inefficient, and
error-prone applications). For example, the server may execute
the compiled code and provide it with hypothetical input values.
The hypothetical input values may be,
for example, different input images, input images with different
sizes, and the like.
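A minimal sketch of this kind of dynamic analysis follows, assuming the "compiled code" is represented by an ordinary Python callable that takes a grayscale image as a list of rows. The helper names `make_images`, `dynamic_check`, and `row_sum3` are hypothetical. The validator feeds the kernel random images of several sizes and records which inputs make it fail.

```python
import random
from typing import Callable, Iterable, Iterator

Image = list[list[int]]


def make_images(sizes: Iterable[tuple[int, int]],
                seed: int = 0) -> Iterator[tuple[tuple[int, int], Image]]:
    """Generate random 8-bit grayscale test images of each (width, height)."""
    rng = random.Random(seed)
    for width, height in sizes:
        image = [[rng.randrange(256) for _ in range(width)] for _ in range(height)]
        yield (width, height), image


def dynamic_check(kernel: Callable[[Image], object],
                  sizes: Iterable[tuple[int, int]]) -> list[tuple[tuple[int, int], str]]:
    """Execute the kernel on each test image and record which inputs make it fail."""
    failures = []
    for size, image in make_images(sizes):
        try:
            kernel(image)
        except Exception as exc:  # Any crash counts as error-prone behavior.
            failures.append((size, type(exc).__name__))
    return failures


def row_sum3(image: Image) -> list[int]:
    # Hidden bug: assumes every row has at least three pixels.
    return [row[0] + row[1] + row[2] for row in image]


print(dynamic_check(row_sum3, [(8, 8), (3, 2), (1, 1), (0, 0)]))
# [((1, 1), 'IndexError')]
```

The bug surfaces only for the 1x1 input, which illustrates why varying the input image sizes matters: a single "typical" test image would have passed.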
[0025] The server, executing a validation process, may monitor the
results and the functions performed by the executed code. For
example, the server may monitor memory accesses by the virtual
model of the GPU, and determine whether the memory accesses are
out-of-bounds memory accesses. The server may also monitor the
memory addresses where the virtual model of the GPU is writing
information. Based on the memory accesses of the virtual model of
the GPU and memory addresses where the virtual model of the GPU is
writing information, the server may be able to determine whether
the application is error-prone. Such memory tracking may be
particularly useful when the application reads or writes to
variables using pointers.
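The memory tracking described above might be sketched as follows. This assumes a deliberately simplified virtual GPU memory made of named buffers addressed by (buffer, offset) pairs, a stand-in for pointer arithmetic; `TrackedMemory` and `copy_kernel` are hypothetical names. Every access is logged, and out-of-bounds accesses are recorded rather than allowed to crash or corrupt other buffers.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedMemory:
    """Hypothetical memory of a virtual GPU model: named buffers addressed by offset."""
    buffers: dict[str, list[int]]
    log: list[tuple[str, str, int]] = field(default_factory=list)  # (op, buffer, offset)
    violations: list[str] = field(default_factory=list)

    def _check(self, op: str, name: str, offset: int) -> bool:
        self.log.append((op, name, offset))
        size = len(self.buffers.get(name, []))
        if not 0 <= offset < size:
            self.violations.append(f"{op} {name}[{offset}] outside buffer of size {size}")
            return False
        return True

    def read(self, name: str, offset: int) -> int:
        # Out-of-bounds reads return 0 so emulation can continue.
        return self.buffers[name][offset] if self._check("read", name, offset) else 0

    def write(self, name: str, offset: int, value: int) -> None:
        # Out-of-bounds writes are dropped rather than corrupting other buffers.
        if self._check("write", name, offset):
            self.buffers[name][offset] = value


def copy_kernel(mem: TrackedMemory, n: int) -> None:
    for i in range(n + 1):  # Off-by-one bug: touches index n.
        mem.write("dst", i, mem.read("src", i))


mem = TrackedMemory({"src": [1, 2, 3], "dst": [0, 0, 0]})
copy_kernel(mem, 3)
print(mem.violations)
# ['read src[3] outside buffer of size 3', 'write dst[3] outside buffer of size 3']
```

Recording violations instead of raising lets the validator finish the run and report every bad access at once, along with the full access log.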
[0026] The server may also detect applications that generate or
enable denial of service attacks. For example, the server may
monitor the rate at which the virtual model of the GPU is able to
execute the application. If the server detects slow responsiveness,
unintended termination, or hanging, the server may determine that
the application is an application designed for a denial of service
attack, or a very poorly designed application. In either case,
execution of such an application may negatively impact the
experience of a user.
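The responsiveness checks above can be approximated with a watchdog. The sketch below is a stand-in under stated assumptions: it times a Python callable on a thread rather than measuring a virtual GPU model, and `watchdog_run` with its `soft_limit_s`/`hard_limit_s` thresholds are hypothetical parameters a validator would have to choose. A run that exceeds the hard limit is reported as a hang, one that finishes but exceeds the soft limit as slow, and one that raises as an unintended termination.

```python
import threading
import time
from typing import Callable


def watchdog_run(kernel: Callable[[], object], soft_limit_s: float, hard_limit_s: float) -> str:
    """Classify one run as 'completed', 'slow', 'hang', or 'terminated (<error>)'."""
    result: dict = {}

    def target() -> None:
        start = time.perf_counter()
        try:
            kernel()
            result["status"] = "completed"
        except Exception as exc:
            result["status"] = f"terminated ({type(exc).__name__})"
        result["elapsed"] = time.perf_counter() - start

    # A daemon thread lets the validator give up on a hung kernel without
    # blocking interpreter exit; Python cannot forcibly stop the thread itself.
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(hard_limit_s)
    if worker.is_alive():
        return "hang"
    if result["status"] == "completed" and result["elapsed"] > soft_limit_s:
        return "slow"
    return result["status"]


print(watchdog_run(lambda: None, 0.5, 1.0))   # completed
print(watchdog_run(lambda: 1 / 0, 0.5, 1.0))  # terminated (ZeroDivisionError)
```

A real emulator would more likely count emulated cycles than wall-clock time, since the server's own load would otherwise affect the verdict.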
[0027] In addition to validating the application, in some examples,
the server may be able to tune and optimize the application as
well. For example, the server may insert code into, or replace, the source
code or portions of the source code, or may collect statistics to
determine how well the compiled code works. In some examples, the
server may validate the application and optimize or tune the
application once. After such validation, the device may execute the
application as often as the user would like without requiring
further validations or optimization. Also, in some examples, after
validating a certain application, the server may store an
indication that indicates that this application has already been
validated. If the server receives the same source code or
pre-compiled object code again, the server may first ensure that
the code is identical, and if so, immediately validate that
application.
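The reuse of earlier validations could be sketched as below, assuming the server keeps validated code in memory keyed by a SHA-256 digest from Python's `hashlib`. `ValidationCache` and `validate` are hypothetical names. A production server might also key on the GPU identification and persist the cache; neither is shown. Failed validations are deliberately not cached, so resubmitted code is analyzed again.

```python
import hashlib
from typing import Callable


class ValidationCache:
    """Remembers code that has already passed validation."""

    def __init__(self) -> None:
        self._validated: dict[str, bytes] = {}

    def is_validated(self, code: bytes) -> bool:
        stored = self._validated.get(hashlib.sha256(code).hexdigest())
        # The digest narrows the lookup; the byte comparison confirms the code is identical.
        return stored is not None and stored == code

    def record(self, code: bytes) -> None:
        self._validated[hashlib.sha256(code).hexdigest()] = code


def validate(code: bytes, cache: ValidationCache,
             full_validation: Callable[[bytes], bool]) -> bool:
    if cache.is_validated(code):
        return True  # Same code seen before: validate immediately.
    passed = full_validation(code)
    if passed:
        cache.record(code)
    return passed
```

Comparing the stored bytes, not just the digest, follows the text's requirement that the server "ensure that the code is identical" before skipping the full analysis.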
[0028] FIG. 1 is a block diagram illustrating an example of a
system that may be operable to implement one or more aspects of
this disclosure. For example, FIG. 1 illustrates system 10 that
includes device 12, network 22, validation server device 24, and
application server device 38. Although only one each of device 12,
validation server device 24, and application server device 38 is
illustrated in FIG. 1, in other examples, system 10 may include a
plurality of devices 12, validation servers 24, and application
servers 38. System 10 may be referred to as a cloud-based system to
indicate that validation of application 20 occurs in validation
server device 24, which is external to device 12, as described in
more detail below. For example, the techniques of this disclosure may be
directed to validating application 20 in the cloud (e.g., in
validation server device 24, which is external to device 12).
[0029] Examples of device 12 include, but are not limited to, video
devices such as media players, set-top boxes, wireless handsets
such as mobile telephones, personal digital assistants (PDAs),
desktop computers, laptop computers, gaming consoles, video
conferencing units, tablet computing devices, and the like.
Examples of validation server device 24 and application server
device 38 include, but are not limited to, laptops, desktops, web
servers, and the like. In general, validation server device 24 and
application server device 38 may be any type of device capable of
performing the functions attributed to validation server device 24
and application server device 38 in this disclosure.
[0030] Network 22 may allow device 12 to securely communicate with
validation server device 24 and application server device 38. For
security purposes, any communication between device 12 and either
validation server device 24 or application server device 38 may be
encrypted or otherwise secured. For further protection, any
communication between device 12 and either validation server device 24 or
application server device 38 may require user authorization.
[0031] In some examples, network 22 may ensure that information
transmitted by any one of device 12, validation server device 24,
and application server device 38 is received only by the intended
device or devices, and no other device. Network 22 may be a local
area network (LAN), a wide area network (WAN), the Internet, or
the like. Device 12, validation server device 24, and application
server device 38 may be coupled to network 22 wirelessly or through
a wired link. In some examples, it may be possible for device 12 to
be coupled directly to validation server device 24 and/or
application server device 38. For example, device 12 may directly
communicate with validation server device 24 and/or application
server device 38 through a wireless or wired connection. In these
examples, network 22 may not be needed in system 10.
[0032] As illustrated in FIG. 1, device 12 may include GPU 14,
processor 16, and device memory 18. Device 12 may include
components in addition to those illustrated in FIG. 1. For example,
FIG. 5 illustrates an example of device 12 that includes more
components than those illustrated in FIG. 1.
[0033] Examples of GPU 14 and processor 16 include, but are not
limited to, a digital signal processor (DSP), a general purpose
microprocessor, an application specific integrated circuit (ASIC),
a field programmable gate array (FPGA), or other equivalent
integrated or discrete logic circuitry. Furthermore, although GPU
14 and processor 16 are illustrated as separate components, aspects
of this disclosure are not so limited. In alternate examples, GPU
14 and processor 16 may be part of a common integrated circuit. For
purposes of illustration and ease of description, GPU 14 and
processor 16 are illustrated as separate components.
[0034] Examples of device memory 18 include, but are not limited
to, a random access memory (RAM), a read only memory (ROM), or an
electrically erasable programmable read-only memory (EEPROM).
Examples of device memory 18 may also include storage devices such
as CD-ROM or other optical disk storage, magnetic disk storage or
other magnetic storage devices, or flash memory. In general, device
memory 18 may include media that can be used to store desired
program code in the form of instructions or data structures and
that can be accessed by GPU 14 and processor 16. In some examples,
device memory 18 may comprise one or more computer-readable storage
media, such as a computer-readable storage device. For instance, in
some example implementations, device memory 18 may include
instructions that cause GPU 14 and processor 16 to perform the
functions ascribed to GPU 14 and processor 16 in this
disclosure.
[0035] Device memory 18 may, in some examples, be considered as a
non-transitory storage medium. The term "non-transitory" may
indicate that the storage medium is not embodied in a carrier wave
or a propagated signal. However, the term "non-transitory" should
not be interpreted to mean that device memory 18 is non-movable. As
one example, device memory 18 may be removed from device 12, and
moved to another device. As another example, a storage device,
substantially similar to device memory 18, may be inserted into
device 12. In certain examples, a non-transitory storage medium may
store data that can, over time, change (e.g., in RAM).
[0036] GPU 14 may be operable to execute one or more software
applications. For example, GPU 14 may include a processor core on
which one or more software applications may execute. The
applications that execute on GPU 14 may be graphics applications
such as vertex shaders and fragment shaders for generating graphics
data. However, it may be possible for the applications that execute
on GPU 14 to be unrelated to graphics processing. For example, a
developer may consider it beneficial to exploit the massive
parallelism of GPU 14 by developing a software application that is
unrelated to graphics processing. In these cases, GPU 14 may be
referred to as a general purpose
GPU (GP-GPU).
[0037] As one example, FIG. 1 illustrates GPU 14 executing
application 20. Application 20 may be a graphics application or a
non-graphics application that executes on GPU 14. Application 20 is
illustrated in a dashed box within GPU 14 to indicate that
application 20 is executing on GPU 14. GPU 14 does not actually
include application 20. For instance, application 20 may be stored
in device memory 18, as illustrated in FIG. 1.
[0038] Application 20 may be developed using a wide variety of
different application programming interfaces (APIs). For
example, a developer may have developed application 20 using a
programming API such as OpenGL, OpenCL, WebGL, or WebCL. In
general, applications that are developed using the OpenGL or WebGL
APIs are designed for graphics processing. Applications that are
developed using the OpenCL or WebCL APIs are designed for
processing unrelated to graphics processing. The OpenGL, OpenCL,
WebGL, and WebCL APIs are provided for illustration purposes and
should not be considered limiting. The techniques of this
disclosure may be extended to APIs in addition to the examples
provided above. In general, the techniques of this disclosure may
be extended to any technique utilized by a developer to develop
application 20.
[0039] As illustrated, device memory 18 may store application 20.
For example, a user of device 12 may cause device 12 to download
application 20 from application server device 38 via network 22. In
turn, device 12 may store application 20 in device memory 18. There
may be other ways in which device 12 stores application 20 in
device memory 18. For instance, a user of device 12 may insert a
flash drive that stores application 20 into device 12, and device
12 may retrieve application 20 from the flash drive and store
application 20 in device memory 18. In this example, application
server device 38 may not be needed. The above examples that
describe the manner in which device 12 stores application 20 in
device memory 18 are provided for purposes of illustration and
should not be considered limiting. The techniques of this
disclosure may be applicable to any technique in which application
20 is loaded into device memory 18.
[0040] Device memory 18 may store the source code of application
20, an intermediate representation of application 20, or the object
code of application 20. The source code of application 20 may be the
text in the programming language in which application 20 was
developed. The object code of application 20 may be the binary bits
resulting from the compilation of application 20. For example,
application server device 38 may compile the source code of
application 20, and device 12 may download this pre-compiled object
code of application 20. The intermediate representation of
application 20 may be intermediate to the source code and the
object code. For example, in the intermediate representation of
application 20, the variables of the source code of application 20
may be replaced with register or memory identifiers for where the
variables will be stored in device memory 18.
[0041] The capability of the programmable core or cores of GPU 14
to execute applications, such as application 20, increases the
functionality of GPU 14. However, the capability of GPU 14 to
execute applications may invite misuse or suboptimal use of GPU 14
and make device 12 more susceptible to malicious applications or
error-prone applications. For example, applications that execute
solely on a central processing unit (CPU), such as processor 16,
execute in a virtual machine setting that allocates the amount of
device memory 18, and the storage locations within device memory 18,
that are accessible to the applications. Because the applications
are confined to the virtual machine of processor 16, the applications
are unable to access out-of-bounds memory addresses and are limited
to accessing memory addresses specifically provided to them by the
virtual machine of processor 16. In this way, it may be difficult for
applications executing on processor 16 to drastically and negatively
impact processor 16 and, in turn, device 12.
[0042] In some instances, it may not be practical to implement
virtual machines on GPU 14. For example, the massive parallel
processing capabilities of GPU 14 may not be well suited for
executing virtual machines. For instance, if virtual machines were
to execute on GPU 14, the virtual machines would dominate the
resources of GPU 14, possibly restricting other applications from
being executed on GPU 14. Accordingly, in some instances, virtual
machines may not be able to limit the negative impacts of malicious
or error-prone applications that execute on GPU 14.
[0043] Applications that execute on GPU 14, such as application 20,
may be considered as applications that execute "natively" (i.e.,
are not confined to a virtual machine). Native execution of
application 20 may allow for application 20 to access larger
portions of device memory 18. Such access may allow problematic
applications, such as malicious applications or poorly designed
(e.g., error-prone) applications, to negatively impact the
performance capabilities of GPU 14 and device 12.
[0044] As one example, the developer of application 20 may develop
application 20 such that application 20, when executed, provokes a
denial of service attack on device 12, or propagates a virus that
impacts the performance of device 12. For example, when GPU 14
executes application 20, application 20 may control GPU 14 such
that GPU 14 may not be able to perform any other tasks such as
rendering graphics content for a user interface. This may cause
device 12 to "hang," which may drastically impact the functionality
of device 12. In some cases, the developer of application 20 may
develop application 20 to access portions of device memory 18 that
it should be limited from accessing. Application 20 may store
instructions for a virus in these portions of device memory 18.
Then, when processor 16 or GPU 14 accesses these portions of device
memory 18, processor 16 or GPU 14 may accidentally execute the
stored virus. There may be additional examples of malicious
applications, and aspects of this disclosure should not be
considered limited to denial of service attacks or viruses.
[0045] As another example, the developer of application 20 may
inadvertently develop application 20 such that application 20 is
inefficient or error-prone. For instance, an error-prone
application may include infinite loops, out-of-bounds access to an
array, or out-of-bounds access to memory locations of device memory
18. An inefficient application may not properly utilize the
functionality of GPU 14. For example, an inefficient application
may not properly use the programmable functionality of GPU 14.
[0046] In some cases, application server device 38 may potentially
provide a modicum of protection from malicious and error-prone
applications. For example, the owner of application server device
38 may guarantee that none of the applications stored on
application server device 38 are malicious or error-prone
applications. However, this may not be the case in every instance
(e.g., the owner of application server device 38 may not provide a
guarantee of safe and proper operation), or the purported
"guarantee" from the owner of application server device 38 may not
be trustworthy.
[0047] The techniques of this disclosure may assist in identifying
whether applications that are to be executed on GPU 14 (e.g.,
application 20) are problematic applications such as malicious
applications, as well as inefficient and error-prone applications,
prior to execution. For example, the techniques of this disclosure
may validate application 20 prior to GPU 14 executing application
20. Validation of application 20 may mean determining that application 20
satisfies one or more performance criteria. For example, validation
may mean determining with some level of assurance that application
20 is not a malicious application, an inefficient application, or
an error-prone application. The example techniques described in
this disclosure may transmit an indication to device 12 that
indicates whether it is safe or inadvisable for GPU 14 to execute
application 20. Processor 16 may then elect to instruct GPU 14 to
execute application 20 based on the received indication.
[0048] For example, processor 16 may instruct GPU 14 to execute
application 20 if the indication is favorable, i.e., indicates that
the program is not malicious, not inefficient, and/or not
error-prone. In some examples, processor 16 may instruct GPU 14 to
execute application 20 even if the indication is unfavorable. For
example, if application 20 is not malicious or error-prone, but is
inefficient, processor 16 may instruct GPU 14 to execute
application 20, because such execution may not harm GPU 14 or
device 12 even though application 20 may not execute as efficiently
as possible.
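The decision described above can be sketched in Python as follows. The `ValidationIndication` type and its three boolean flags are hypothetical; the disclosure does not define a format for the indication that validation server device 24 transmits to device 12.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationIndication:
    """Hypothetical indication received by device 12 from the validation server."""
    malicious: bool
    error_prone: bool
    inefficient: bool


def should_execute_on_gpu(indication: ValidationIndication) -> bool:
    """Return True if processor 16 would instruct GPU 14 to execute the application.

    Malicious or error-prone applications are rejected. An application that is
    only inefficient is still allowed to run, since it may run slowly but is
    not expected to harm the GPU or the device.
    """
    return not (indication.malicious or indication.error_prone)


print(should_execute_on_gpu(ValidationIndication(False, False, True)))  # True
```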
[0049] In some examples, the techniques of this disclosure may also
tune, or otherwise optimize, an inefficient application that is to
be executed on GPU 14. For example, the developer of application 20
may not have any malicious intent, and may have developed
application 20 such that application 20 is not prone to errors.
Nevertheless, it may be possible that application 20 may not
efficiently utilize the resources of GPU 14.
[0050] As one example, one of the functions of application 20 may
be to divide a task into workgroups and perform parallel processing
on the workgroups to exploit the parallelism of GPU 14. For
example, application 20 may divide an image into blocks and perform
parallel processing on the blocks. The size of each of the blocks may
be based on the amount of local memory available on GPU 14.
[0051] Because the developer of application 20 may want to design
application 20 to execute on a variety of different GPUs, the
developer may not know ahead of time how much local memory is
available on a particular GPU, such as GPU 14, as different GPUs
may include different amounts of local memory. To address this, the
developer may develop application 20 to utilize variable sized
blocks. In some instances, utilizing variable sized blocks may be
less efficient than utilizing fixed sized blocks. The techniques of
this disclosure may tune or optimize application 20 such that
application 20 utilizes fixed sized blocks based on the amount of
available memory in GPU 14.
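As a minimal sketch of such tuning, the Python code below picks a fixed, power-of-two block size from the local memory size of the target GPU and tiles an image with it. The square blocks, the bytes-per-pixel parameter, the power-of-two rounding, and the 32 KiB example value are illustrative assumptions, not requirements of the disclosure.

```python
def fixed_block_side(local_mem_bytes: int, bytes_per_pixel: int) -> int:
    """Largest power-of-two side s such that one s x s block fits in local memory."""
    if local_mem_bytes < bytes_per_pixel:
        raise ValueError("local memory cannot hold a single pixel")
    side = 1
    # Keep doubling while the next-larger square block still fits.
    while (2 * side) ** 2 * bytes_per_pixel <= local_mem_bytes:
        side *= 2
    return side


def split_into_blocks(width: int, height: int, side: int):
    """Return (x, y, w, h) tiles covering the image; edge tiles may be smaller."""
    return [
        (x, y, min(side, width - x), min(side, height - y))
        for y in range(0, height, side)
        for x in range(0, width, side)
    ]


# Example: 32 KiB of local memory and 4-byte RGBA pixels give 64 x 64 blocks.
side = fixed_block_side(32 * 1024, 4)
print(side, len(split_into_blocks(1920, 1080, side)))  # 64 510
```

Because the block size is computed once for the known GPU, the kernel no longer needs to handle arbitrary block sizes at run time, which is the source of the inefficiency described above.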
[0052] As another example, application 20 may perform matrix
operations. The developer of application 20 may have developed
application 20 to perform row-based matrix operations or
column-based matrix operations. In some instances, GPU 14 may be
better suited to perform row-based matrix operations, as compared
to column-based matrix operations, or vice-versa. In this example,
the techniques of this disclosure may modify application 20 to
perform row-based matrix operations, if application 20 uses
column-based matrix operations, to more efficiently utilize GPU
14.
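The following Python sketch shows the kind of rewrite described above for a matrix-vector product: the column-based loop order is replaced by a row-based one that produces the same result. Which order a given GPU favors is hardware dependent; the assumption here, for illustration only, is that the row-based form is preferred.

```python
def matvec_column_based(a, x):
    """Compute y = A x by sweeping A one column at a time."""
    rows, cols = len(a), len(a[0])
    y = [0] * rows
    for j in range(cols):
        for i in range(rows):
            y[i] += a[i][j] * x[j]
    return y


def matvec_row_based(a, x):
    """Compute the same product by sweeping A one row at a time."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


a = [[1, 2, 3], [4, 5, 6]]
x = [1, 0, -1]
print(matvec_column_based(a, x), matvec_row_based(a, x))  # [-2, -2] [-2, -2]
```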
[0053] As yet another example, the developer may have developed
application 20 for older versions of GPUs, and application 20 may
not be optimized for GPU 14. The techniques of this disclosure may
modify application 20 so that application 20 is more optimized for
newer GPUs, such as GPU 14. GPU 14 may then execute application 20,
which is optimized to execute on newer GPUs.
[0054] In accordance with techniques of this disclosure, validation
server device 24 may validate application 20, and in some examples,
optimize or tune application 20. To validate application 20,
validation server device 24 may implement a validation process that
determines whether application 20 satisfies one or more performance
criteria. For example, validation server device 24 may determine,
with some reasonable level of assurance, whether application 20 is
a malicious application, an error-prone application, or an
inefficient application. In examples where application 20 is an
error-prone application or an inefficient application, validation
server device 24 may attempt to correct the errors in application
20, or optimize application 20 to be more efficient.
[0055] It may be generally difficult to absolutely guarantee that
application 20 is not a problematic application because it may be
difficult to test all of the various ways in which application 20
may affect GPU 14 and device 12. Although an absolute guarantee
that application 20 is not a problematic application may be
difficult, validation server device 24 may employ different types
of analysis to ensure with some reasonable amount of certainty that
application 20 is not a problematic application.
[0056] As illustrated in FIG. 1, validation server device 24 is
external to device 12. Accordingly, the validation of application
20 and optimization of application 20 may be offloaded from device
12, which may be referred to as validating application 20 in the
"cloud" because validation server device 24 is a server that is
external to device 12. By offloading the validation of application
20 to validation server device 24, the probability of application
20 negatively impacting GPU 14 and device 12 may be reduced, in
cases where application 20 is a malicious application or an
error-prone application. Also, by offloading the optimization of
application 20 to validation server device 24, power savings and
processing efficiency may be realized because processor 16 does not
need to consume power and clock cycles validating or optimizing
application 20.
[0057] There may be various examples of performance criteria that
application 20 may need to satisfy for validation server device 24
to validate application 20. In general, the performance criteria
can be part of static analysis, dynamic analysis, or a combination
thereof. Static analysis refers to analysis of application 20 that
can be performed without execution of application 20 to ensure that
application 20 satisfies one or more performance criteria
associated with static analysis. Dynamic analysis refers to
analysis of application 20 during execution to ensure that
application 20 satisfies one or more performance criteria
associated with dynamic analysis.
[0058] Validation server device 24 may be operable to perform
static analysis, dynamic analysis, or both static analysis and
dynamic analysis. For purposes of illustration, validation server
device 24 is described as being operable to perform both static
analysis and dynamic analysis, and therefore, operable to ensure
that application 20 satisfies the performance criteria associated
with both static analysis and dynamic analysis. In alternate
examples, validation server device 24 may be operable to perform
one of static analysis or dynamic analysis, and in these alternate
examples, validation server device 24 may be operable to ensure
that application 20 satisfies the performance criteria associated
with the type of analysis that validation server device 24 is
operable to perform (e.g., performance criteria associated with
static analysis or dynamic analysis).
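A minimal Python sketch of how these alternatives combine: each performance criterion is modeled as a hypothetical predicate over the application's code, and the application is validated only if every criterion associated with the analyses the server performs is satisfied.

```python
from typing import Callable, Iterable

# A performance criterion is modeled as a predicate over the application code.
Criterion = Callable[[bytes], bool]


def validate(code: bytes,
             static_criteria: Iterable[Criterion] = (),
             dynamic_criteria: Iterable[Criterion] = ()) -> bool:
    """Validate only if every criterion of the analyses performed is satisfied.

    A server that performs only static (or only dynamic) analysis passes an
    empty iterable for the other kind.
    """
    criteria = list(static_criteria) + list(dynamic_criteria)
    if not criteria:
        raise ValueError("at least one static or dynamic criterion is required")
    return all(check(code) for check in criteria)
```

For example, a server that performs only static analysis would call `validate(code, static_criteria=[...])` with a list of its static checks, where the individual check functions are hypothetical.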
[0059] As illustrated in FIG. 1, validation server device 24
includes emulator unit 26 and server memory 28. Server memory 28
may include data and/or instructions defining one or more GPU
models 30, one or more GPU inputs 32, and one or more device models
34. Emulator unit 26 may be a processing unit that is operable to
execute one or more of GPU models 30 and device models 34. As
another example, emulator unit 26 may be a hardware emulation
board, which may be a GPU. In some examples, emulator unit 26 may
include two portions, which may be part of the same circuitry or
separate, distinct circuits, where the first portion is a
processing unit that is operable to execute one or more of GPU
models 30 and device models 34, and the second portion is the
hardware emulation board (e.g., a GPU). Examples of emulator unit
26 include, but are not limited to, a DSP, a general purpose
microprocessor, an ASIC, an FPGA, or other equivalent integrated or
discrete logic circuitry.
[0060] Server memory 28 may be similar to device memory 18. For
instance, server memory 28 may be any medium that can be used to
store desired program code in the form of instructions, data,
and/or data structures that can be accessed by emulator unit 26
and that cause emulator unit 26 to perform one or more of the
functions ascribed to emulator unit 26. Similar to device memory
18, server memory 28 may, in some examples, be considered as a
non-transitory storage medium, as described above with respect to
device memory 18.
[0061] As illustrated, server memory 28 may store data and/or
instructions defining one or more GPU models 30, GPU inputs 32, and
device models 34. It may not be necessary for server memory 28 to
store one or more GPU models 30, GPU inputs 32, and device models
34 in every example. For example, server memory 28 may store GPU
models 30 and GPU inputs 32, but may not store device models 34. If
validation server device 24 is operable to perform only static
analysis, GPU models 30, GPU inputs 32, and device models 34 may
not be needed. In some examples, emulator unit 26 uses GPU models 30,
GPU inputs 32, and device models 34 to perform dynamic analysis.
[0062] Each of the one or more GPU models 30 may correspond to a
particular GPU type, and each of the one or more device models 34
may correspond to a particular device type. For instance, each one
of the GPU models 30 may model the configuration of its
corresponding GPU type in terms of parallel processing
capabilities, local memory availability, and any other pertinent
characteristic that defines the functionality of GPUs of that GPU
type. Each one of the device models 34 may model the configuration
of its corresponding device type in terms of memory configuration,
processor speed, system bus speed, device memory, and any other
pertinent characteristics that define the functionality of devices
of that device type. For example, different vendors provide
different types of devices with different functional
characteristics, and device models 34 may be models for each of
these different device types.
[0063] The one or more GPU models 30 and device models 34 may each
be considered as virtual model software that emulator unit 26 can
execute. For example, when emulator unit 26 executes one of the GPU
models 30, emulator unit 26 emulates the GPU to which the executed
GPU model 30 corresponds. When emulator unit 26 executes one of the
GPU models 30 and one of the device models 34, emulator unit 26
emulates the device to which the executed device model 34
corresponds, as if such a device included the GPU to which the
executed GPU model 30 corresponds. In some examples, the GPU
vendors and the device vendors may supply GPU models 30 and device
models 34, respectively. There may be other ways in which server
memory 28 stores GPU models 30 and device models 34, and aspects of
this disclosure are not limited to the specific examples where
vendors provide GPU models 30 and device models 34.
[0064] For example, when emulator unit 26 executes one of GPU
models 30, emulator unit 26 may function as if the parallel
processing capabilities and local memory availability of emulator
unit 26 (as two examples) are functionally equivalent to those of the
GPU type associated with the executed one of GPU models 30. Similarly,
when emulator unit 26 executes one of device models 34, emulator unit
26 may function as if the memory configuration, processor speed,
system bus speed, and device memory of emulator unit 26 (as four
examples) are functionally equivalent to those of the device type
associated with the executed one of device models 34. In other words,
the execution of one of GPU models 30 causes emulator unit 26 to
function as the GPU associated with the executed one of GPU models
30. The execution of one of GPU models 30 and one of device models
34 causes emulator unit 26 to function as a device associated with
the executed one of device models 34 that includes the GPU
associated with the executed one of GPU models 30.
[0065] One of the plurality of GPU models 30 may be a generic GPU
model 30, and one of the plurality of device models 34 may be a
generic device model 34. In some examples, server memory 28 may
store a generic GPU model and a generic device model instead of a
plurality of GPU models and device models. The generic GPU model
and device model may not correspond to a particular GPU or device
type, but may be suitable for static and dynamic analysis. In some
examples, if server memory 28 does not store a GPU model that
corresponds to GPU 14, then the generic GPU model may be suitable
for validation purposes. The generic GPU model and the generic
device model may conform to a base profile of operation common to
most GPUs or devices.
[0066] There may be various types of GPUs and devices that may be
modeled by the generic GPU and generic device models. As one
example, the generic GPU model may model a GPU with average
parallel processing capabilities and local memory availability as
compared to other GPUs. The generic device model may model a device
with average memory configuration, processor speed, system bus
speed, and device memory as compared to other devices.
[0067] As an illustrative example of validating and/or optimizing
application 20 for execution on GPU 14, device 12 may download
application 20 from application server device 38. Application 20
may be source code, an intermediate representation, or pre-compiled
object code, as described above. Processor 16 may then install
application 20 on device 12. If application 20 is source code or an
intermediate representation, i.e., not pre-compiled object code,
part of the installation may be processor 16 executing a compiler to
compile the code of application 20.
[0068] In some examples, where the downloaded code of application
20 is source code or the intermediate representation, prior to
compiling, processor 16 may cause device 12 to transmit the
downloaded code of application 20 to validation server device 24
for validation. In some examples, where the downloaded code of
application 20 is pre-compiled object code, processor 16 may cause
device 12 to transmit the pre-compiled object code to validation
server device 24 for validation before allowing GPU 14 to execute
application 20.
[0069] For security purposes, processor 16 may encrypt or otherwise
make secure the downloaded code of application 20 that device 12
transmits to validation server device 24. In some examples,
processor 16 may require authorization from a user prior to
transmitting the downloaded code of application 20 to validation
server device 24. Furthermore, in some examples of dynamic
analysis, processor 16 may cause device 12 to transmit the GPU type
of GPU 14 or both the GPU type of GPU 14 and the device type of
device 12 to validation server device 24. In some of these
instances, processor 16 may require authorization from the user
prior to transmitting the GPU type of GPU 14 or the GPU type of GPU
14 and device type of device 12 to validation server device 24.
[0070] Emulator unit 26 may be operable to perform static analysis
on application 20 to determine whether application 20 satisfies the
performance criteria associated with static analysis. For example,
emulator unit 26 may analyze application 20 without executing
application 20. As one example, emulator unit 26 may parse through
the downloaded code of application 20 to identify code known to be
code for a virus. For instance, server memory 28 may store code of
known viruses, and emulator unit 26 may compare the downloaded code
of application 20 to the code of the known viruses. Determining
that the downloaded code of application 20 does not include code of
known viruses may be one example of a performance criterion that must
be satisfied to validate application 20.
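The signature comparison described above can be sketched in Python as a simple byte-pattern search. The signature table is a hypothetical stand-in for the known-virus code stored in server memory 28, and exact substring matching is a simplifying assumption.

```python
def find_known_signatures(code: bytes, signatures: dict) -> list:
    """Return the names of known-virus byte signatures that occur in the code."""
    return [name for name, sig in signatures.items() if sig and sig in code]


def passes_signature_check(code: bytes, signatures: dict) -> bool:
    """Static-analysis criterion: the code contains none of the known signatures."""
    return not find_known_signatures(code, signatures)


# Hypothetical signature table; a real one would be loaded from server memory.
KNOWN = {"demo-virus": bytes.fromhex("deadbeef")}
print(passes_signature_check(b"\x00\x01\x02", KNOWN))  # True
```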
[0071] As part of the static analysis, emulator unit 26 may compile
the downloaded code of application 20, in examples where the
downloaded code of application 20 is the source code or
intermediate representation of application 20, to identify errors
in application 20 during compilation. For example, emulator unit 26
may execute compiler 36, as indicated by dashed lines within
emulator unit 26. The compilation of application 20, with compiler
36, may identify any infinite loops in application 20 or
out-of-bounds access to memory array locations within application
20. In this example, determining that application 20 contains no
errors that can be found during compilation may be another example
of a performance criterion that must be satisfied to validate
application 20.
[0072] Static analysis may be limited in the types of errors,
inefficiencies, and malicious code that can be found. For example,
if the downloaded code of application 20 is pre-compiled object
code, it may not be possible for emulator unit 26 to identify
errors in application 20 during compilation because the code for
application 20 is already pre-compiled object code. As another
example, if application 20 relies on pointers for storage, it may
not be possible to determine if there are any out-of-bounds memory
access errors in application 20 based simply on compiling
application 20.
[0073] To further determine whether application 20 is problematic
(e.g., inefficient, error-prone, or malicious), emulator unit 26
may perform dynamic analysis. As indicated above, dynamic analysis
refers to analysis of application 20 during execution. In some
examples, to perform dynamic analysis, emulator unit 26 may cause
itself to appear as if it is GPU 14. For example, in some
instances, in addition to transmitting the downloaded code of
application 20, processor 16 may cause device 12 to transmit the
GPU type of GPU 14 to emulator unit 26 of validation server device
24, or both the GPU type of GPU 14 and the device type of device 12
to emulator unit 26 of validation server device 24 via network 22.
Emulator unit 26, in turn, may identify which one of GPU models 30
corresponds to the GPU type of GPU 14, and may execute that one of
GPU models 30 to emulate GPU 14 on validation server device 24. In
examples where emulator unit 26 also receives the device type,
emulator unit 26 may identify which one of device models 34
corresponds to the device type of device 12, and may execute that
one of device models 34 to emulate device 12 on validation server
device 24.
[0074] In examples where device 12 does not transmit the GPU type
of GPU 14 and/or the device type of device 12, emulator unit 26 may
execute the generic GPU model and/or the generic device model.
Alternatively, if device 12 does transmit the GPU type of GPU 14
and/or the device type of device 12, but none of GPU models 30 and
device models 34 correspond to the GPU and device type, emulator
unit 26 may execute the generic GPU model and/or generic device
model. In examples where emulator unit 26 is or includes a hardware
emulation board, such a hardware emulation board may be designed to
function, at least in part, as a generic GPU on a generic
device.
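A minimal Python sketch of this model selection follows. The fields of `GpuModel` and `DeviceModel` and the numeric values of the generic profiles are hypothetical placeholders for the characteristics described for GPU models 30 and device models 34.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class GpuModel:
    gpu_type: str
    parallel_units: int
    local_memory_bytes: int


@dataclass(frozen=True)
class DeviceModel:
    device_type: str
    bus_bandwidth_mb_s: int
    device_memory_bytes: int


# Hypothetical "average" base profiles used when no specific model applies.
GENERIC_GPU = GpuModel("generic", parallel_units=4, local_memory_bytes=32 * 1024)
GENERIC_DEVICE = DeviceModel("generic", bus_bandwidth_mb_s=4000,
                             device_memory_bytes=1024 ** 3)


def select_models(gpu_models: Dict[str, GpuModel],
                  device_models: Dict[str, DeviceModel],
                  gpu_type: Optional[str] = None,
                  device_type: Optional[str] = None) -> Tuple[GpuModel, DeviceModel]:
    """Pick the models matching the reported types, falling back to the generic ones.

    The fallback covers both cases: the type was not transmitted (None), or
    no stored model corresponds to it.
    """
    gpu = gpu_models.get(gpu_type, GENERIC_GPU) if gpu_type else GENERIC_GPU
    device = device_models.get(device_type, GENERIC_DEVICE) if device_type else GENERIC_DEVICE
    return gpu, device
```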
[0075] Once emulator unit 26 is emulating GPU 14, or GPU 14 as part
of device 12, emulator unit 26 may execute application 20. For
example, if emulator unit 26 received the source code or the
intermediate representation of application 20, emulator unit 26 may
compile that code via compiler 36 and execute the resulting object
code. If emulator unit 26 received pre-compiled object code of
application 20, emulator unit 26 may execute the pre-compiled object
code of application 20.
[0076] The techniques of this disclosure may be considered, in some
examples, as being performed at least in part by emulator unit 26
executing a virtual model based on the type of GPU 14 (e.g., one of
GPU models 30). Then, when emulator unit 26 executes application
20, application 20 can be considered as executing in the virtual
model (e.g., the one of GPU models 30 that is executing on emulator
unit 26). For example, both the GPU model, of GPU models 30, that
corresponds to GPU 14 and application 20 are executing on emulator
unit 26. In the techniques of this disclosure, because emulator
unit 26 functions as if it is GPU 14, due to the execution of the
GPU model that corresponds to GPU 14, when emulator unit 26
executes application 20, application 20 may execute on the GPU
model that corresponds to GPU 14.
[0077] As part of the dynamic analysis, emulator unit 26 may
receive hypothetical input values for application 20 that is
executing on emulator unit 26. As illustrated, server memory 28 may
store one or more GPU inputs 32. These one or more GPU inputs 32
may be values for different graphical images or objects. In some
examples, each of these different images may be of different sizes.
In examples where application 20 is not related to graphics
processing, GPU inputs 32 may be non-graphics inputs. It may be
difficult to ensure that emulator unit 26 tests every permutation
and combination of possible input values. Accordingly, server
memory 28 may store a sufficient number and/or range of GPU inputs
32, e.g., as samples or test inputs, to provide some reasonable
level of assurance that application 20 is not a malicious or highly
error-prone application (e.g., a problematic application). The GPU
inputs 32 may include different types of images or objects to be
processed and rendered by GPU 14.
[0078] During execution of application 20, emulator unit 26 may
input the values of GPU inputs 32 and may analyze functionality of
the executed GPU model of GPU models 30. In examples where
emulator unit 26 is a hardware emulation board, emulator unit 26
may analyze the functionality of the hardware emulation board. For
example, emulator unit 26 may monitor memory accesses by the
executed GPU model of GPU models 30. In this example, emulator unit
26 may determine whether any of the memory accesses by the executed
GPU model of GPU models 30 are out-of-bounds memory accesses of
server memory 28. As another example, emulator unit 26 may monitor
the memory addresses where the executed GPU model of GPU models 30
is writing information in server memory 28. Based on the memory
accesses of the GPU model and the memory addresses where the GPU
model is writing information, emulator unit 26 may be able to
determine whether application 20 is error-prone. Such memory
tracking may be particularly useful when application 20 reads or
writes to variables using pointers.
[0079] For example, if the executed GPU model writes information to
or reads information from out-of-bounds memory locations, emulator
unit 26 may determine that application 20 is error-prone, and
possibly malicious. For example, if the executed GPU model writes
information to or reads information from a non-existent memory
location, emulator unit 26 may determine that application 20 is
error-prone. If the executed GPU model writes information to a
memory location that is not reserved for the GPU model, emulator
unit 26 may determine that application 20 is error-prone or
possibly malicious. For example, emulator unit 26 may determine
that application 20 is attempting to load a virus into the memory
locations which application 20 should not be able to access.
[0080] The limitations of where application 20 can write
information to or read information from (e.g., access) during
execution may be an example of performance criteria associated with
dynamic analysis. For example, the performance criteria may be a
limitation of the memory locations that application 20 is allowed
to access. If, due to the execution of application 20, the GPU model
of GPU models 30 accesses memory locations outside of the limited
memory locations, application 20 may be in violation of the
performance criteria. For example, the performance criteria may allow
a threshold number of accesses outside the limited memory locations.
The threshold number may be zero to provide the highest level of
assurance that application 20 is not attempting to access memory
locations outside of the limited memory locations.
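A minimal Python sketch of this criterion: each read or write by the executed GPU model is checked against the address ranges reserved for application 20, and the application satisfies the criterion only while the number of out-of-bounds accesses stays at or below the threshold. The `MemoryAccessMonitor` class and the way accesses are reported to it are hypothetical; the disclosure does not specify how emulator unit 26 intercepts memory accesses.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MemoryAccessMonitor:
    """Counts accesses by the emulated GPU that fall outside the permitted ranges.

    `allowed` holds half-open [start, end) address ranges that the application
    may access; `threshold` is the number of out-of-bounds accesses tolerated
    (zero gives the highest level of assurance).
    """
    allowed: List[Tuple[int, int]]
    threshold: int = 0
    violations: List[Tuple[int, int, bool]] = field(default_factory=list)

    def record(self, address: int, size: int, is_write: bool) -> None:
        """Record one access of `size` bytes; log it if no allowed range contains it."""
        end = address + size
        if not any(start <= address and end <= stop for start, stop in self.allowed):
            self.violations.append((address, size, is_write))

    def criterion_satisfied(self) -> bool:
        return len(self.violations) <= self.threshold


monitor = MemoryAccessMonitor(allowed=[(0x1000, 0x2000)])
monitor.record(0x1FFC, 4, is_write=True)  # last 4 bytes of the range: allowed
monitor.record(0x1FFE, 4, is_write=True)  # straddles the end of the range: violation
print(monitor.criterion_satisfied())  # False
```

Checking the full extent of each access, rather than only its start address, catches accesses that begin inside a permitted range but run past its end, a typical symptom of the pointer-based errors that static analysis may miss.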
[0081] In examples where emulator unit 26 also executes one of
device models 34, emulator unit 26 may similarly analyze
functionality of the executed device model of device models 34. For
example, emulator unit 26 may monitor the functions performed by
the executed one of device models 34 while emulator unit 26
executes one of GPU models 30. For example, the execution of one of
device models 34 may result in emulator unit 26 emulating device 12,
which includes a system bus. Emulator unit 26 may determine whether
the execution of application 20 causes the system bus to overload,
resulting in device 12 slowing down.
[0082] The monitoring of the system bus to determine whether the
system bus is being overloaded may be an example of performance
criteria associated with dynamic analysis. For example, if the
execution of application 20 causes the system bus to overload,
application 20 may be in violation of the performance criteria. In
this example, the performance criteria may allow for some level of
overloading of the system bus, as it may not be possible to prevent
all overloading of the system bus. For example, the performance
criteria may establish a percentage threshold of system bus
overload. If the system bus overload is below the allowable
percentage, the performance criteria are satisfied. Otherwise, the
performance criteria are not satisfied.
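One way to express the percentage threshold is sketched below. It is an assumption, not the method of this disclosure, that "system bus overload" is measured as the percentage of sampling intervals in which the bandwidth demanded by the emulated device exceeded bus capacity; the function names and sample values are hypothetical.

```python
def bus_overload_percentage(demand_samples, capacity):
    """Percentage of sampled intervals in which demanded bandwidth exceeded capacity."""
    if not demand_samples:
        return 0.0
    overloaded = sum(1 for demand in demand_samples if demand > capacity)
    return 100.0 * overloaded / len(demand_samples)


def satisfies_bus_criterion(demand_samples, capacity, allowed_percent):
    """The criterion is met only if the overload percentage is below the allowance."""
    return bus_overload_percentage(demand_samples, capacity) < allowed_percent
```

For example, samples of 5, 12, 8, and 15 units against a capacity of 10 give an overload of 50 percent, which satisfies a 60 percent allowance but not a 50 percent one.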
[0083] Emulator unit 26 may similarly detect malicious applications
such as those designed for denial of service attacks. For example, emulator unit 26
may monitor the rate at which the GPU model of GPU models 30 is
able to execute application 20. If emulator unit 26 detects slow
responsiveness, unintended termination, or hanging, emulator unit
26 may determine that application 20 is an application designed for a
denial of service attack, or a very poorly designed application. In
this example, the performance criteria may be a threshold execution
time or execution rate for a particular task of application 20. If
application 20 takes longer than the threshold execution time to
complete a particular task or executes the task at a rate less than
the threshold execution rate, application 20 may be in violation of
the performance criteria.
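The execution-time and execution-rate thresholds might be checked as in the following sketch. It assumes, hypothetically, that the emulator reports `None` for a task that hung or terminated unexpectedly, and otherwise reports the elapsed time and the number of work items the task completed.

```python
from typing import Optional


def task_violates_criteria(elapsed_s: Optional[float], work_items: int,
                           max_time_s: float, min_rate: float) -> bool:
    """True if a task hung or terminated early (elapsed_s is None), ran longer
    than max_time_s, or processed fewer than min_rate work items per second."""
    if elapsed_s is None:
        return True  # hanging or unintended termination
    if elapsed_s > max_time_s:
        return True  # exceeded the threshold execution time
    if elapsed_s == 0:
        return False  # finished instantly; rate is effectively unbounded
    return work_items / elapsed_s < min_rate  # below the threshold rate
```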
[0084] As another example of emulator unit 26 detecting malicious
applications or error-prone applications, emulator unit 26 may
monitor instructions issued by application 20. For instance, in
some examples, instructions issued by application 20 may be 96-bit
words. However, not all combinations of 96 bits represent a valid
instruction. In some examples, GPU 14 may be designed to ignore
invalid instructions; however, this may not be the case for every
example of GPU 14. To prevent GPU 14 from inadvertently executing an
invalid instruction, emulator unit 26 may determine whether the
instructions issued by application 20 during execution are valid or
invalid instructions. If emulator unit 26 determines that
application 20 is issuing invalid instructions, emulator unit 26
may determine that application 20 is a malicious application, an
error-prone application, or an inefficient application.
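A validity check over 96-bit instruction words could look like the sketch below. The instruction encoding here is entirely hypothetical: it assumes an 8-bit opcode in bits 95 through 88, a reserved field in bits 87 through 80 that must be zero, and a small table of valid opcodes. A real GPU instruction set would supply its own encoding rules.

```python
VALID_OPCODES = {0x01, 0x02, 0x10, 0x11}  # hypothetical opcode table
OPCODE_SHIFT = 88                          # hypothetical: opcode in bits 95..88
RESERVED_MASK = 0xFF << 80                 # hypothetical: bits 87..80 must be zero


def is_valid_instruction(word: int) -> bool:
    """Check a single instruction word against the hypothetical encoding."""
    if word < 0 or word >> 96:
        return False  # not representable as a 96-bit word
    opcode = word >> OPCODE_SHIFT  # top 8 bits of the 96-bit word
    return opcode in VALID_OPCODES and (word & RESERVED_MASK) == 0


def invalid_instructions(stream):
    """Return the positions of invalid words in an issued instruction stream."""
    return [i for i, word in enumerate(stream) if not is_valid_instruction(word)]
```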
[0085] As another example, during execution, application 20 may
write data to and read data from registers. A malicious
application, error-prone application, or inefficient application
may read data from unwritten registers. If application 20 attempts
to read data from a register that was not previously written to,
the data read by application 20 may be meaningless data (i.e.,
uninitialized data). Such reading of uninitialized data may result
in unpredictable behavior. In some examples, emulator unit 26 may
monitor which registers application 20 writes to during execution,
and may determine whether application 20 is reading from a register
that has not previously been written to. If emulator unit 26
determines that application 20 is reading from unwritten registers,
emulator unit 26 may determine that application 20 is a malicious
application, error-prone application, or an inefficient
application.
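Tracking written registers to detect reads of uninitialized data can be sketched as below. The sketch assumes the emulator emits a trace of `("write", register)` and `("read", register)` events in program order, with an instruction's source reads listed before its destination write; the trace format is hypothetical.

```python
def uninitialized_reads(trace):
    """Return (step, register) pairs for reads of registers not yet written."""
    written = set()
    flagged = []
    for step, (op, reg) in enumerate(trace):
        if op == "write":
            written.add(reg)
        elif op == "read" and reg not in written:
            flagged.append((step, reg))  # reads meaningless (uninitialized) data
    return flagged
```

For example, in the trace `write r0, read r0, read r1, write r1, read r1`, only the read of `r1` at step 2 is flagged.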
[0086] If emulator unit 26 determines that the performance criteria
associated with static analysis and dynamic analysis are met,
validation server device 24 may transmit an indication to device 12
indicating that application 20, with some level of assurance,
satisfies one or more performance criteria associated with static
analysis, dynamic analysis, or both static and dynamic analysis
(e.g., validates application 20). In this case, validation server
device 24 may provide an indication that application 20 is
validated for use by GPU 14. Otherwise, in some examples,
validation server device 24 may transmit an indication to device 12
indicating that application 20 is invalidated for use by GPU 14,
such that it is inadvisable for GPU 14 to execute application 20.
In response, processor 16 may instruct GPU 14 to execute
application 20 based on the received indication.
[0087] In examples where validation server device 24 received
source code or intermediate code of application 20, emulator unit
26 may also transmit the compiled object code of application 20, as
compiled by compiler 36. In this way, the compilation of
application 20 may also be offloaded from device 12 to an external
device, such as validation server device 24.
[0088] Validation server device 24 may also be tasked with
optimizing or tuning application 20. For example, emulator unit 26
may receive the source code or intermediate code of application 20.
As part of the static and/or dynamic analysis, emulator unit 26 may
determine that application 20 is somewhat error-prone or would
inefficiently utilize the capabilities of GPU 14. In these
examples, rather than transmitting an indication to device 12
indicating that it is inadvisable for GPU 14 to execute application
20, emulator unit 26 may attempt to correct the errors of
application 20 or attempt to tune application 20 for GPU 14 when it
is determined that application 20 may execute inefficiently or with
errors on GPU 14.
[0089] If emulator unit 26 is able to correct the errors or make
application 20 more efficient, emulator unit 26 may compile the
modified code of application 20 to generate object code that GPU 14
should execute. Emulator unit 26 may then transmit the resulting
object code to device 12 with an indication that GPU 14 should
execute the resulting object code. In this case, GPU 14 may execute
the object code generated from the modified code, rather than the
object code generated from the original code of application 20.
Alternatively, emulator unit 26 may transmit the modified code of
application 20 without compilation.
[0090] In either of these examples, the validation of application
20 may be considered as being part of the transmission of the
modified code of application 20 (e.g., the transmission of the
modified code or the resulting object code). For example, when
device 12 receives modified code of application 20 from validation
server device 24, device 12 may automatically determine that the
modified code of application 20 is suitable for execution because
device 12 received the modified code of application 20 from
validation server device 24. In this sense, the validation that
device 12 receives from validation server device 24 may be an
explicit validation or an implicit validation. In either case,
i.e., explicit or implicit validation, emulator unit 26 may
determine with some level of assurance that application 20 or the
modified version of application 20 satisfies one or more
performance criteria.
[0091] If emulator unit 26 is unable to correct the errors of
application 20, emulator unit 26 may transmit the indication
indicating that it is inadvisable to execute application 20 on GPU
14. If emulator unit 26 is unable to make application 20 more
efficient, emulator unit 26 may still transmit an indication to
device 12 indicating that it may be suitable for GPU 14 to execute
application 20 because while application 20 may not be completely
efficient, application 20 may not be error-prone or malicious.
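The decision logic in the preceding paragraphs can be summarized in a short sketch. The boolean inputs are hypothetical summaries of what the static and dynamic analysis found and whether emulator unit 26 succeeded in correcting or tuning the code; the three outcomes mirror the indications described above.

```python
from enum import Enum


class Outcome(Enum):
    VALIDATED = "validated"
    VALIDATED_MODIFIED = "validated (modified code transmitted)"
    INVALIDATED = "invalidated"


def decide(has_errors: bool, errors_fixed: bool,
           inefficient: bool, tuned: bool) -> Outcome:
    """Choose the indication to send to device 12 (illustrative only)."""
    if has_errors:
        # Error-prone code is usable only if the errors were corrected.
        return Outcome.VALIDATED_MODIFIED if errors_fixed else Outcome.INVALIDATED
    if inefficient and tuned:
        return Outcome.VALIDATED_MODIFIED
    # Inefficient but error-free code may still be suitable for execution.
    return Outcome.VALIDATED
```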
[0092] To tune or optimize application 20, emulator unit 26 may
insert code (e.g., source code or intermediate code), replace code,
or modify code of application 20 in some other manner. In some
examples, emulator unit 26 may collect statistics to determine how
well the compiled code of application 20 works. For example,
application 20 may utilize array indices for storing variable
values in an array. Emulator unit 26 may add code into the source
code of application 20 that checks that array indices, utilized by
application 20, are within the bounds of the array. Emulator unit 26 may add code
into the source code of application 20 that causes application 20
to abort when an array index is not within range. Emulator unit 26
then may compile the modified source code to produce object code
for execution of application 20 by GPU 14.
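Inserting bounds checks into source code can be illustrated with Python's own `ast` module, standing in for whatever kernel language application 20 is written in (an assumption; this disclosure does not specify one). The sketch rewrites each `a[i]` into `a[_checked_index(a, i)]`, and the hypothetical `IndexOutOfRange` exception stands in for aborting the application. It requires Python 3.9 or later, assumes integer indices, and leaves slices and multi-dimensional indices unchecked.

```python
import ast
import copy


class IndexOutOfRange(RuntimeError):
    """Raised by the inserted check; stands in for aborting the application."""


def _checked_index(container, index):
    # Inserted guard: reject any index outside [0, len(container)).
    if not 0 <= index < len(container):
        raise IndexOutOfRange(f"index {index} outside [0, {len(container)})")
    return index


class BoundsCheckInserter(ast.NodeTransformer):
    """Rewrites a[i] into a[_checked_index(a, i)] for both loads and stores."""

    def visit_Subscript(self, node):
        self.generic_visit(node)  # instrument nested subscripts first
        if isinstance(node.slice, (ast.Slice, ast.Tuple)):
            return node  # slices and multi-dimensional indices are left alone
        node.slice = ast.Call(
            func=ast.Name(id="_checked_index", ctx=ast.Load()),
            args=[copy.deepcopy(node.value), node.slice],
            keywords=[],
        )
        return node


def instrument(source):
    """Parse, instrument, compile, and execute `source`; return its namespace."""
    tree = ast.fix_missing_locations(BoundsCheckInserter().visit(ast.parse(source)))
    namespace = {"_checked_index": _checked_index}
    exec(compile(tree, "<instrumented>", "exec"), namespace)
    return namespace
```

Because the check runs before every indexed load or store, an out-of-range index stops execution instead of silently touching memory outside the array.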
[0093] Optimization or tuning may be based on the assumption that
applications, such as application 20, are generally developed to
exploit the high level of parallelism of GPU 14. If the developer
did not intend to exploit the parallelism of GPU 14, the developer
would have developed application 20 to execute on processor 16
rather than on GPU 14.
[0094] For example, the developer of application 20 may have
developed application 20 to perform image processing on blocks of
images in parallel. As described above, the size of the blocks of
the images may be based on the amount of available local memory on
GPU 14. Because the developer may not know how much memory is
available on GPU 14, the developer may develop application 20 to
use variable-sized blocks instead of the more efficient fixed-size
blocks. For example, fixed-size blocks may be more efficient
because the size of the blocks does not change during
execution.
[0095] In some examples, emulator unit 26 may determine the optimal
size for the blocks because the GPU model of GPU models 30 that
corresponds to GPU 14 may include information that indicates the
size of the local memory of GPU 14. In this example, emulator unit
26 may select the optimal size for the blocks based on the amount
of available local memory on GPU 14, the amount of data that will
be needed to write to or read from the local memory of GPU 14, and
other such information, which may not be available to the developer
of application 20. In aspects of this disclosure, emulator unit 26
would know how much local memory is available and how much data
needs to be written to or read from local memory because emulator
unit 26 may execute application 20 on the GPU model of GPU models 30
that corresponds to GPU 14.
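A simple block-size selection rule is sketched below. The assumptions are hypothetical: blocks are square with power-of-two side lengths, each block needs `buffers_per_block` copies of its pixels (for example, an input and an output buffer) resident in local memory at once, and the local memory size comes from the GPU model that corresponds to GPU 14.

```python
def choose_block_side(local_mem_bytes, bytes_per_pixel,
                      buffers_per_block=2, max_side=256):
    """Largest power-of-two side length whose square block fits in local memory."""
    def block_bytes(side):
        return side * side * bytes_per_pixel * buffers_per_block

    if block_bytes(1) > local_mem_bytes:
        raise ValueError("local memory cannot hold even a 1x1 block")
    side = 1
    while side * 2 <= max_side and block_bytes(side * 2) <= local_mem_bytes:
        side *= 2
    return side
```

With 32 KiB of local memory and 4-byte pixels, a 64x64 block needs 64 * 64 * 4 * 2 = 32768 bytes and fits exactly, so the sketch fixes the block side at 64.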
[0096] In these examples, emulator unit 26 may update or otherwise
modify the source code or intermediate code of application 20 to
fix the block size to the optimally determined size. In other words,
emulator unit 26 may determine the optimal size of the blocks to
best utilize the parallelism of GPU 14. Emulator unit 26 may then
compile this modified code of application 20, and transmit the
resulting object code to device 12 for execution on GPU 14. In this
way, when GPU 14 executes the modified application 20, the modified
application 20 may execute more efficiently on GPU 14, as compared
to the original application 20.
[0097] In another example for optimization, as described above,
application 20 may perform matrix operations. In this example,
emulator unit 26 may determine whether column-based matrix
operations or row-based matrix operations are handled more efficiently by GPU
14. For instance, emulator unit 26 may cause the GPU model of GPU
models 30 that corresponds to GPU 14 to execute application 20
using row-based matrix operations and using column-based matrix
operations. Emulator unit 26 may compare the efficiency of the
column-based and row-based matrix operations (e.g., number of
accesses to memory, amount of processing time, and other such
efficiency measures). Based on the measured efficiency, emulator
unit 26 may modify the code of application 20. For example, if
column-based operations are more efficiently executed than
row-based operations, emulator unit 26 may modify the code of
application 20 so that the matrix operations are performed as
column-based operations. Similarly, if row-based operations are
more efficiently executed than column-based operations, emulator
unit 26 may modify the code of application 20 so that the matrix
operations are performed as row-based operations.
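Comparing row-based and column-based traversal by counting memory accesses can be sketched with a toy cost model. The model is an assumption: a memory fetch is counted whenever the next element lies in a different cache line from the previous one, and the storage layout and line size (`layout`, `elems_per_line`) stand in for characteristics that the GPU model might supply.

```python
def count_line_fetches(rows, cols, order, layout="row-major", elems_per_line=16):
    """Count cache-line fetches for traversing a rows x cols matrix.

    Hypothetical cost model: a fetch occurs whenever the accessed element
    lies in a different cache line from the previously accessed element.
    """
    if order == "row":
        coords = ((r, c) for r in range(rows) for c in range(cols))
    elif order == "column":
        coords = ((r, c) for c in range(cols) for r in range(rows))
    else:
        raise ValueError(f"unknown order: {order!r}")
    fetches, current_line = 0, None
    for r, c in coords:
        address = r * cols + c if layout == "row-major" else c * rows + r
        line = address // elems_per_line
        if line != current_line:
            fetches += 1
            current_line = line
    return fetches


def pick_order(rows, cols, layout="row-major", elems_per_line=16):
    """Return whichever traversal needs fewer fetches (ties favor rows)."""
    row = count_line_fetches(rows, cols, "row", layout, elems_per_line)
    col = count_line_fetches(rows, cols, "column", layout, elems_per_line)
    return "row" if row <= col else "column"
```

For a 64x64 matrix stored row-major with 16 elements per line, row traversal needs 256 fetches and column traversal needs 4096, so the sketch would rewrite the operations as row-based; with column-major storage the result flips.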
[0098] In another example for optimization, as described above, the
developer of application 20 may have developed application 20 to be
executed on older versions of GPUs. In this case, application 20 may
properly execute on a GPU such as GPU 14; however, application 20
may not fully exploit the functionality of GPU 14. For example,
application 20 may unnecessarily limit the amount of graphics or
non-graphics data that GPU 14 should process in parallel because
older versions of GPUs may be limited in processing capabilities.
In this example, emulator unit 26 may modify the code of
application 20 such that, when application 20 is executed,
application 20 causes GPU 14 to process more data in parallel.
There may be other examples of ways in which emulator unit 26 may
modify application 20 such that application 20 is better suited for
execution on newer GPUs, and aspects of this disclosure should not
be considered limited to the above examples.
[0099] After optimizing application 20, emulator unit 26 may
transmit the modified or updated code of application 20 to device
12. In this example, processor 16 may compile the code of
application 20, as received from emulator unit 26, and instruct GPU
14 to execute the resulting object code. In some other examples,
emulator unit 26 may compile the modified application 20, via
compiler 36, and transmit the resulting object code to device 12.
In this example, processor 16 may instruct GPU 14 to execute the
received object code for application 20.
[0100] In some examples, emulator unit 26 may validate application
20 and optimize or tune application 20 once. After such validation,
GPU 14 may execute application 20 as needed without requiring
further validation or optimization. Also, in some examples, after
emulator unit 26 validates application 20, emulator unit 26 may
store an indication in server memory 28 that indicates that this
application 20 has already been validated. In these examples, when
emulator unit 26 receives code for validation, emulator unit 26 may
first determine whether emulator unit 26 previously validated the
code based on the indication stored in server memory 28. If
emulator unit 26 previously validated the code, emulator unit 26
may immediately validate that received code. For example, emulator
unit 26 may validate application 20, as received from device 12.
Subsequently, emulator unit 26 may receive code for application 20
from a device other than device 12. In this case, emulator unit 26
may first determine that the received code is the same as the code that
emulator unit 26 previously validated, and if so, may immediately
validate the received code. In this manner, emulator unit 26 may
not need to perform the static and/or dynamic analysis again for
previously validated code.
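Storing an indication of previously validated code might look like the sketch below. Keying the stored indication by a SHA-256 digest of the received code is an assumption made for illustration; `analyze` is a hypothetical callable that runs the static and/or dynamic analysis and returns whether the code passed.

```python
import hashlib
from typing import Callable, Set


class ValidationCache:
    """Remembers code that has already passed validation."""

    def __init__(self) -> None:
        self._validated: Set[str] = set()

    def validate(self, code: bytes, analyze: Callable[[bytes], bool]) -> bool:
        key = hashlib.sha256(code).hexdigest()
        if key in self._validated:
            return True  # same code validated before: skip the analysis
        ok = analyze(code)
        if ok:
            self._validated.add(key)  # only successful validations are stored
        return ok
```

Because the key depends only on the code, identical code received from a device other than device 12 is validated without repeating the analysis.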
[0101] FIG. 2 is a flowchart illustrating an example operation of
device 12. For purposes of illustration only, reference is made to
FIG. 1. Device 12 may receive application 20 that is to be executed
by GPU 14 (40). For example, device 12 may download application 20
from application server device 38. As another example, application
20 may be preloaded on device memory 18. As described above, device
12 may receive the source code, intermediate code (e.g.,
intermediate representation of application 20), or object code of
application 20.
[0102] Device 12 may transmit the code of application 20 to
validation server device 24 (42). For example, device 12 may
transmit the source code, intermediate code, or object code of
application 20 to validation server device 24 for validation of
application 20. In some examples, device 12 may transmit the code
of application 20 to validation server device 24 once for
validation. GPU 14, of device 12, may then execute application 20
as needed without requiring subsequent validation.
[0103] In response to transmitting the code of application 20 to
validation server device 24 for validation, device 12 may receive
the validation from validation server device 24 (44).
Alternatively, device 12 may receive an invalidation; that is, device
12 may receive either a validation or an invalidation. The validation
from validation server device 24
may indicate that application 20 satisfies one or more performance
criteria. If application 20 does not satisfy the one or more
performance criteria, validation server device 24 may indicate that
application 20 did not satisfy the performance criteria. For
example, the validation may indicate that application 20 satisfies
performance criteria associated with static analysis, dynamic
analysis, or both static and dynamic analysis. In some examples,
validation server device 24 may optimize or tune application 20 to
make application 20 more efficient or less error-prone. In this
case, the validation may indicate that the modified version of
application 20 satisfies one or more performance criteria.
[0104] In some examples, processor 16 of device 12 may instruct GPU
14 of device 12 to execute application 20 based on the validation
(48). For example, if validation server device 24 indicates that
application 20 satisfies the performance criteria, processor 16 may
instruct GPU 14 to execute application 20. Otherwise, processor 16
may not allow GPU 14 to execute application 20.
[0105] In some alternate examples, prior to execution, device 12
may receive a modified version of application 20 (46). In FIG. 2,
the dashed line from block 44 to block 46, and from block 46 to
block 48 is used to indicate that the functions of block 46 may not
be necessary in every example. For instance, validation server
device 24 may be able to optimize or tune application 20, and may
transmit the modified version of application 20. As another
example, device 12 may transmit the source code or intermediate
code of application 20, and receive a compiled version of
application 20 from validation server device 24. As yet another
example, device 12 may receive a compiled version of the code as
modified by validation server device 24 (e.g., modified for
optimization or tuning). In these examples, processor 16 may
instruct GPU 14 to execute the modified version of application 20
(48).
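On the device side, the choice among blocks 44, 46, and 48 reduces to selecting which code, if any, GPU 14 should execute. The sketch below assumes a hypothetical `ValidationResponse` structure carrying the validation result and, optionally, the modified (tuned or compiled) code from validation server device 24.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationResponse:
    valid: bool
    modified_code: Optional[bytes] = None  # present if the server tuned or compiled the code


def code_to_execute(original: bytes, response: ValidationResponse) -> Optional[bytes]:
    """Return the code GPU 14 should execute, or None if execution is not allowed."""
    if not response.valid:
        return None  # invalidated: processor 16 does not instruct GPU 14 to execute it
    if response.modified_code is not None:
        return response.modified_code  # block 46: run the modified version
    return original
```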
[0106] FIG. 3 is a flowchart illustrating an example operation of
validation server device 24. For purposes of illustration only,
reference is made to FIG. 1. Validation server device 24 may
receive application 20, which is to be executed by GPU 14, from
device 12 (50). For example, validation server device 24 may
receive source code, intermediate code, or object code of
application 20 from device 12 via network 22.
[0107] Validation server device 24 may perform at least one of
static analysis and dynamic analysis on application 20 (52). For
example, as part of static analysis, emulator unit 26 of validation
server device 24 may compile the code of application 20, and
monitor for any errors during the compilation of application 20. As
part of the dynamic analysis, emulator unit 26 of validation server
device 24 may execute a virtual model of GPU 14 or the virtual
model of GPU 14 and a virtual model of device 12. As described
above, GPU models 30 and device models 34 may include a virtual
model of GPU 14 and device 12, respectively. In some examples, GPU
models 30 and device models 34 may include a generic GPU model and
a generic device model.
[0108] For example, emulator unit 26 may receive an identification
of GPU 14 and/or device 12 from device 12. Emulator unit 26 may
identify which one of GPU models 30 corresponds to GPU 14 and which
one of device models 34 corresponds to device 12, and execute the
corresponding GPU and device models. If there are no corresponding
GPU and/or device models for GPU 14 and device 12, or if emulator
unit 26 did not receive an identification of GPU 14 and/or device
12, emulator unit 26 may execute the generic GPU and device
models.
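The model lookup with a generic fallback can be sketched as follows. Representing GPU models 30 and device models 34 as dictionaries keyed by an identification string, with the generic model stored under the key `"generic"`, is a hypothetical choice for illustration.

```python
GENERIC = "generic"  # hypothetical key under which the generic model is stored


def select_model(models: dict, identifier=None):
    """Return the model matching `identifier`, or the generic model otherwise."""
    if identifier is not None and identifier in models:
        return models[identifier]
    return models[GENERIC]


def select_models(gpu_models: dict, device_models: dict,
                  gpu_id=None, device_id=None):
    """Pick GPU and device models independently, each falling back to generic."""
    return select_model(gpu_models, gpu_id), select_model(device_models, device_id)
```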
[0109] As part of the dynamic analysis, emulator unit 26 may
execute application 20 and provide GPU inputs 32 to application 20
for analyzing application 20. In these examples, application 20 may
be considered as executing on the corresponding virtual model of
GPU 14, which is executing on emulator unit 26. In this way,
emulator unit 26 may execute application 20, as if application 20
is executing on GPU 14. Emulator unit 26 may monitor the functions
performed by the corresponding virtual model of GPU 14 such as
memory accesses, rate of execution, termination instance, and other
functions pertinent to the functionality of GPU 14.
[0110] Emulator unit 26 may determine whether application 20
satisfies one or more performance criteria (54). The one or more
performance criteria may be performance criteria associated with
static analysis and performance criteria associated with dynamic
analysis. For example, the one or more performance criteria may be
criteria that there are no errors in the compilation of application
20, as evaluated by compiling application 20 during the static
analysis. As another example, the one or more performance criteria
may be criteria that application 20 not access out-of-bounds memory
locations and not use up resources of GPU 14 such that GPU 14 is
not able to perform other tasks in parallel, as evaluated by
executing application 20 and providing application 20 with GPU
inputs 32 during the dynamic analysis. There may be other examples
of performance criteria that emulator unit 26 may determine that
application 20 satisfies.
[0111] Validation server device 24 may transmit a validation of
application 20 to device 12 based on the determination (56). For
example, validation server device 24 may transmit a validation of
application 20 to device 12 if application 20 satisfies the one or
more performance criteria. Otherwise, validation server device 24
may transmit an invalidation if application 20 does not satisfy the
one or more performance criteria. For example, if emulator unit 26
determines that application 20 satisfies the one or more
performance criteria, validation server device 24 may transmit an
indication to device 12 indicating as such. Alternatively, if
emulator unit 26 determines that application 20 does not satisfy
the one or more performance criteria, validation server device 24
may transmit an indication to device 12 indicating as such.
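Steps (54) and (56) can be sketched as evaluating a set of named criteria against an analysis report. The `Report` fields and criterion names are hypothetical summaries of the static and dynamic analysis results described above; the decision rule is that every criterion must pass for a validation to be sent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Report:
    compile_errors: int          # from static analysis (compilation)
    out_of_bounds_accesses: int  # from dynamic analysis
    gpu_monopolized: bool        # True if other GPU tasks could not run in parallel


CRITERIA: Dict[str, Callable[[Report], bool]] = {
    "compiles without errors": lambda r: r.compile_errors == 0,
    "no out-of-bounds memory access": lambda r: r.out_of_bounds_accesses == 0,
    "leaves GPU resources for other tasks": lambda r: not r.gpu_monopolized,
}


def evaluate(report: Report, criteria=CRITERIA) -> Tuple[str, List[str]]:
    """Return ("validation", []) or ("invalidation", names of failed criteria)."""
    failed = [name for name, check in criteria.items() if not check(report)]
    return ("validation" if not failed else "invalidation", failed)
```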
[0112] FIG. 4 is a flowchart illustrating another example operation
of validation server device 24. For purposes of illustration only,
reference is made to FIGS. 1 and 3. Similar to FIG. 3, validation
server device 24 may receive application 20, which is to be
executed by GPU 14, from device 12 (58). In this example, emulator
unit 26 may modify application 20 (e.g., the source code or
intermediate code of application 20) to optimize or tune
application 20. For example, emulator unit 26 may modify the code
of application 20 so that application 20 executes more efficiently
on GPU 14. Validation server device 24 may then transmit modified
application 20 to device 12 (62). In some examples, validation
server device 24 may transmit the source code or intermediate code
of the modified application 20. As another example, validation
server device 24 may compile the modified code of application 20, and
transmit the resulting object code to device 12.
[0113] FIG. 5 is a block diagram illustrating device 12 of FIG. 1
in further detail. As indicated above,
examples of device 12 include, but are not limited to, mobile
wireless telephones, PDAs, video gaming consoles that include video
displays, mobile video conferencing units, laptop computers,
desktop computers, television set-top boxes, and the like.
[0114] As illustrated in FIG. 5, device 12 may include GPU 14,
processor 16, device memory 18, transceiver module 64, user
interface 66, display 68, and display processor 70. GPU 14,
processor 16, and device memory 18 may be substantially similar or
identical to those illustrated in FIG. 1. For purposes of brevity,
only the components that are shown in FIG. 5, but not shown in FIG.
1 are described in detail.
[0115] Device 12 may include additional modules or units not shown
in FIG. 5 for purposes of clarity. For example, device 12 may
include a speaker and a microphone, neither of which is shown in
FIG. 5, to effectuate telephonic communications in examples where
device 12 is a mobile wireless telephone, or a speaker where device
12 is a media player. Furthermore, the various modules and units
shown in device 12 may not be necessary in every example of device
12. For example, user interface 66 and display 68 may be external
to device 12 in examples where device 12 is a desktop computer or
other device that is equipped to interface with an external user
interface or display.
[0116] Examples of user interface 66 include, but are not limited
to, a trackball, a mouse, a keyboard, and other types of input
devices. User interface 66 may also be a touch screen and may be
incorporated as a part of display 68. Transceiver module 64 may
include circuitry to allow wireless or wired communication between
device 12 and another device or a network. Transceiver module 64
may include one or more modulators, demodulators, amplifiers,
antennas and other such circuitry for wired or wireless
communication. Display 68 may comprise a liquid crystal display
(LCD), an organic light emitting diode display (OLED), a cathode
ray tube (CRT) display, a plasma display, a polarized display, or
another type of display device.
[0117] In some examples, after GPU 14 generates the graphics data
for display on display 68, GPU 14 may output the resulting graphics
data to device memory 18 for temporary storage. Display processor
70 may retrieve the graphics data from device memory 18, perform
any post-processing on the graphics data, and output the resulting
graphics data to display 68. For example, display processor 70
may perform any further enhancements or scale the graphics data
generated by GPU 14.
[0118] In one or more examples, the functions described may be
implemented in hardware, software, firmware, or any combination
thereof. If implemented in software, the functions may be stored as
one or more instructions or code on a computer-readable medium.
Computer-readable media may include computer data storage media.
Data storage media may be any available media that can be accessed
by one or more computers or one or more processors to retrieve
instructions, code and/or data structures for implementation of the
techniques described in this disclosure. By way of example, and not
limitation, such computer-readable media can comprise random access
memory (RAM), read-only memory (ROM), EEPROM, CD-ROM or other
optical disk storage, magnetic disk storage or other magnetic
storage devices, or any other medium that can be used to store
desired program code in the form of instructions or data structures
and that can be accessed by a computer. Disk and disc, as used
herein, include compact disc (CD), laser disc, optical disc,
digital versatile disc (DVD), floppy disk, and Blu-ray disc, where
disks usually reproduce data magnetically, while discs reproduce
data optically with lasers. Combinations of the above should also
be included within the scope of computer-readable media.
[0119] The code may be executed by one or more processors, such as
one or more digital signal processors (DSPs), general purpose
microprocessors, application specific integrated circuits (ASICs),
field programmable logic arrays (FPGAs), or other equivalent
integrated or discrete logic circuitry. Accordingly, the term
"processor," as used herein may refer to any of the foregoing
structure or any other structure suitable for implementation of the
techniques described herein. Also, the techniques could be fully
implemented in one or more circuits or logic elements.
[0120] The techniques of this disclosure may be implemented in a
wide variety of devices or apparatuses, including a wireless
handset, an integrated circuit (IC) or a set of ICs (i.e., a chip
set). Various components, modules or units are described in this
disclosure to emphasize functional aspects of devices configured to
perform the disclosed techniques, but do not necessarily require
realization by different hardware units. Rather, as described
above, various units may be combined in a hardware unit or provided
by a collection of interoperative hardware units, including one or
more processors as described above, in conjunction with suitable
software and/or firmware.
[0121] Various examples have been described. These and other
examples are within the scope of the following claims.