U.S. patent application number 14/853405 was published by the patent office on 2016-03-17 for a method and apparatus for executing an application based on an open computing language.
This patent application is currently assigned to Samsung Electronics Co., Ltd. The applicants listed for this patent are Samsung Electronics Co., Ltd. and SNU R&DB Foundation. The invention is credited to Cheolyong Jeon, Hongjune Kim, Jaejin Lee, and Jun Lee.
Application Number: 14/853405
Publication Number: 20160080284
Family ID: 55455940
Publication Date: 2016-03-17

United States Patent Application 20160080284
Kind Code: A1
Jeon; Cheolyong; et al.
March 17, 2016

METHOD AND APPARATUS FOR EXECUTING APPLICATION BASED ON OPEN COMPUTING LANGUAGE
Abstract
Disclosed is a method of executing a kernel of a mobile
application program using an open computing language (OpenCL). The
method includes receiving, from a server, a resource list including
resources to execute a kernel for the application program;
determining, if the application program is executed, resources to
execute the kernel for the application program among resources of
the terminal and the server; and transmitting, if the resources to
execute the kernel are determined as the resources of the server,
data and the kernel to the server.
Inventors: Jeon; Cheolyong (Seoul, KR); Lee; Jaejin (Seoul, KR); Kim; Hongjune (Seoul, KR); Lee; Jun (Seoul, KR)

Applicants: Samsung Electronics Co., Ltd. (Gyeonggi-do, KR); SNU R&DB Foundation (Seoul, KR)

Assignee: Samsung Electronics Co., Ltd.; SNU R&DB Foundation

Family ID: 55455940

Appl. No.: 14/853405

Filed: September 14, 2015

Current U.S. Class: 709/226

Current CPC Class: H04W 4/60 20180201; G06F 9/445 20130101; G06F 2209/509 20130101; G06F 8/30 20130101; G06F 9/50 20130101; G06F 9/44 20130101; H04L 67/10 20130101; Y02D 10/00 20180101; G06F 9/5027 20130101

International Class: H04L 12/911 20060101 H04L012/911; H04L 29/08 20060101 H04L029/08

Foreign Application Priority Data

Sep 12, 2014 (KR) 10-2014-0121379
Claims
1. A method of executing an application program by a terminal, the
method comprising: receiving, from a server, a resource list
including resources to execute a kernel for the application
program; determining, if the application program is executed,
resources to execute the kernel for the application program among
resources of the terminal and the server; and transmitting, if the
resources to execute the kernel are determined as the resources of
the server, data and the kernel to the server.
2. The method of claim 1, wherein determining the resources
comprises: predicting costs when kernels are executed by the
resources of the terminal and the server; and determining the
resources to execute the kernels based on the predicted costs.
3. The method of claim 1, wherein the server includes a cloud
server capable of executing the kernel.
4. The method of claim 2, wherein predicting the costs comprises:
measuring a time required when the kernel is executed by the
terminal; and predicting the cost based on the measured time and an
energy consumed by the terminal when the kernel is executed.
5. The method of claim 2, wherein predicting the cost comprises:
predicting an execution time of the kernel based on information on a
capability of the server received from the server; and predicting
the cost based on the predicted time and energy consumed by the
terminal when the terminal is in an idle state.
6. The method of claim 1, wherein the kernel for the application
program is dependent on at least one other kernel.
7. The method of claim 6, wherein determining the resources
comprises: identifying a location in which the at least one other
kernel is executed; predicting costs when the kernels are executed
by the resources of the terminal and the server based on data
transmission and reception with the at least one other kernel
according to the identified location; and determining the resources
to execute the kernel based on the predicted costs.
8. The method of claim 7, wherein predicting the cost comprises:
measuring a time required when the kernel is executed by the
terminal; and predicting the cost based on the measured time and an
energy consumed by the terminal when the kernel is executed.
9. The method of claim 7, wherein predicting the cost comprises:
predicting an execution time of the kernel based on information on
a capability of the server received from the server; and predicting
the cost based on the predicted time and energy consumed by the
terminal when the terminal is in an idle state.
10. The method of claim 1, further comprising receiving information
on a result of the execution of the kernel from the server.
11. A terminal for executing an application program, the terminal
comprising: a communication unit configured to transmit and receive
information; and a controller configured to receive, from a server,
a resource list including resources to execute a kernel for the
application program, determine, if the application program is
executed, resources to execute the kernel for the application
program among resources of the terminal and the server, and
transmit, if the resources to execute the kernel are determined as
the resources of the server, data and the kernel to the server.
12. The terminal of claim 11, wherein the controller is further
configured to predict costs when kernels are executed by the
resources of the terminal and the server and to determine resources
to execute the kernels based on the predicted costs.
13. The terminal of claim 11, wherein the server includes a cloud
server capable of executing the kernel.
14. The terminal of claim 12, wherein the controller is further
configured to measure a time required when the kernel is executed
by the terminal and predict the cost based on the measured time and
energy consumed by the terminal when the kernel is executed.
15. The terminal of claim 12, wherein the controller is further
configured to predict an execution time of the kernel based on
information on a capability of the server received from the server
and to predict the cost based on the predicted time and energy
consumed by the terminal when the terminal is in an idle state.
16. The terminal of claim 11, wherein the kernel for the
application program is dependent on at least one other kernel.
17. The terminal of claim 16, wherein the controller is further
configured to identify a location in which the at least one other
kernel is executed, to predict costs when the kernels are executed
by the resources of the terminal and the server based on data
transmission and reception with the at least one other kernel
according to the identified location, and to determine the
resources to execute the kernel based on the predicted costs.
18. The terminal of claim 17, wherein the controller is further
configured to measure a time required when the kernel is executed
by the terminal and to predict the cost based on the measured time
and energy consumed by the terminal when the kernel is
executed.
19. The terminal of claim 17, wherein the controller is further
configured to predict an execution time of the kernel based on
information on a capability of the server received from the server
and to predict the cost based on the predicted time and energy
consumed by the terminal when the terminal is in an idle state.
20. The terminal of claim 11, wherein the controller is further
configured to receive information on a result of the execution of
the kernel from the server.
Description
PRIORITY
[0001] This application claims priority under 35 U.S.C. § 119(a) to
Korean Patent Application No. 10-2014-0121379, filed in the Korean
Intellectual Property Office on Sep. 12, 2014, the contents of which
are incorporated herein by reference.
BACKGROUND
[0002] 1. Field of the Invention
[0003] The present disclosure relates generally to an open
universal parallel computing framework, and more particularly, to a
method and an apparatus for executing an application based on an
open computing language.
[0004] 2. Description of the Related Art
[0005] A local device directly used by a user may have limitations
on tasks that it can perform, due to limited computing capability
or power resources. Particularly, a mobile device is largely
limited in terms of hardware capability for the sake of
portability. Accordingly, one method of overcoming these restrictions
on the tasks that can be performed is to transfer the workload to be
processed to an external high-performance device.
For example, a cloud service provides a computing system in which
an input or output operation is mainly performed through a user
terminal and operations such as information analysis, processing,
storage, management and circulation are performed in a third space
called a cloud. Through the use of such a service, plentiful
computing resources may be provided at a relatively low cost,
rendering the service a suitable alternative for mitigating some of
the aforementioned limitations of the mobile device. However, it is
difficult for a general programmer to use such a task-transferring
scheme with a cloud service, due to the nonexistence of an integrated
programming model and development environment capable of using the
service.
[0006] More specifically, a high level of technical skill is required
to transfer a task that has been performed in the local device to an
external service and to perform the transferred task there. Because
of the incompatibility of instruction set architectures (ISAs)
between different devices, a new program suitable for each device
must be written, and a program that transfers a task to the external
device, performs the task, and collects and receives the results must
be developed. In addition, it is necessary to determine which tasks
are suitable to be transferred to the external device and, when the
type of the external device or the local device changes, a new
suitability analysis for the changed type must be performed.
SUMMARY
[0007] The present invention has been made to address the
above-mentioned problems and disadvantages, and to provide at least
the advantages described below.
[0008] Accordingly, an aspect of the present invention is to
provide a scheme of transferring a task to an internal or external
device according to a user preference through a programming model
using an open computing language (OpenCL).
[0009] According to another aspect of the present disclosure, support
is provided, for an application program written in OpenCL, to select
the mobile device or the server according to a user's selection, or
to automatically select the more effective device among the mobile
device and the server, to execute the application program, and to
selectively transfer only the parts suitable for remote execution
through performance analysis based on a cost model, so as to promote
optimal performance and energy efficiency.
[0010] In accordance with an aspect of the present disclosure, a
method of executing an application program by a terminal includes
receiving, from a server, a resource list including resources to
execute a kernel for the application program; determining, if the
application program is executed, resources to execute the kernel
for the application program among resources of the terminal and the
server; and transmitting, if the resources to execute the kernel
are determined as the resources of the server, data and the kernel
to the server.
[0011] In accordance with another aspect of the present disclosure,
a terminal for executing an application program includes a
communication unit configured to transmit and receive information;
and a controller configured to receive, from a server, a resource
list including resources to execute a kernel for the application
program, determine, if the application program is executed,
resources to execute the kernel for the application program among
resources of the terminal and the server, and transmit, if the
resources to execute the kernel are determined as the resources of
the server, data and the kernel to the server.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] The above and other aspects, features and advantages of the
present disclosure will be more apparent from the following
detailed description in conjunction with the accompanying drawings,
in which:
[0013] FIG. 1 illustrates an overall configuration of an application
execution system based on an OpenCL according to an embodiment of
the present disclosure;
[0014] FIG. 2 illustrates an operation of a kernel offloading
method according to an embodiment of the present disclosure;
[0015] FIG. 3 illustrates cost estimation to determine whether to
perform offloading in an automatic offloading mode according to an
embodiment of the present disclosure;
[0016] FIG. 4 illustrates a multi-kernel in a method of determining
whether to perform offloading according to an embodiment of the
present disclosure;
[0017] FIG. 5 illustrates when a dependent kernel exists in a local
device in a method of determining whether to perform offloading
according to an embodiment of the present disclosure;
[0018] FIG. 6 illustrates when a dependent kernel exists in a
server node in a method of determining whether to perform
offloading according to an embodiment of the present
disclosure;
[0019] FIG. 7 illustrates an offloading method when a plurality of
kernels having a dependent relationship therebetween exist
according to an embodiment of the present disclosure;
[0020] FIG. 8 illustrates an operation for determining a kernel to
be offloaded by a client manager of a terminal (local device)
according to an embodiment of the present disclosure;
[0021] FIG. 9 illustrates an operation for determining a kernel to
be offloaded by the client manager of the terminal (local device)
when a plurality of kernels having dependency exist according to an
embodiment of the present disclosure; and
[0022] FIG. 10 illustrates a structure of the terminal to which the
method of determining whether to perform offloading can be applied
according to the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE DISCLOSURE
[0023] Embodiments of the present disclosure will be described in
detail with reference to the accompanying drawings. A description
of technical matters well-known in the art and not directly
associated with the present disclosure will be omitted for the sake
of clarity and conciseness.
[0024] In addition, some elements may be exaggerated, omitted, or
schematically illustrated in the accompanying drawings, and the
size of each element does not completely reflect the actual size
thereof. In the respective drawings, the same or corresponding
elements are provided with the same reference numerals.
[0025] The advantages and features of the present disclosure and
methods to achieve the same will be apparent when reference is made
to embodiments as described below in detail in conjunction with the
accompanying drawings. However, the present disclosure is not
limited to the embodiments set forth below, but may be implemented
in various different forms. The following embodiments are provided
in an effort to completely disclose the present disclosure and
inform those skilled in the art of the scope of the present
disclosure, which is defined only by the appended claims.
Throughout the specification, the same or like reference signs are
used to designate the same or like elements.
[0026] A terminal of the present specification may be a device
including a Central Processing Unit (CPU) and a storage unit, such
as a smart phone, a tablet personal computer (PC), a mobile phone,
a video phone, an e-book reader, a desktop PC, a laptop PC, a
netbook computer, a personal digital assistant (PDA), a portable
multimedia player (PMP), an MP3 player, a mobile medical device, a
camera, and a wearable device including a head-mounted device (HMD)
such as electronic glasses, electronic clothes, an electronic
bracelet, an electronic necklace, an electronic appcessory, an
electronic tattoo, and a smart watch.
[0027] The terminal of the present specification may also be a
smart home appliance including a CPU and a storage unit, such as a
television, a digital video disk (DVD) player, an audio player, a
refrigerator, an air conditioner, a cleaner, an oven, a microwave
oven, a washing machine, an air purifier, a set-top box, a TV box
such as Samsung HomeSync™, Apple TV™, or Google TV™, a
game console, an electronic dictionary, an electronic key, a
camcorder, or an electronic frame.
[0028] According to some embodiments, the terminal includes at
least one of various medical appliances such as magnetic resonance
angiography (MRA), magnetic resonance imaging (MRI), computed
tomography (CT), and ultrasonic machines, navigation equipment, a
global positioning system (GPS) receiver, an event data recorder
(EDR), a flight data recorder (FDR), an automotive infotainment
device, electronic equipment for ships such as ship navigation
equipment and a gyrocompass, avionics, security equipment, a
vehicle head unit, an industrial or home robot, an automatic teller
machine (ATM) of a banking system, and a point of sales (POS)
device of a shop.
[0029] It will be understood that each block of the flowchart
illustrations, and combinations of blocks in the flowchart
illustrations, can be implemented by computer program instructions
provided to a processor of a general purpose computer, special
purpose computer, or other programmable data processing apparatus
to produce a machine, such that the instructions, which execute via
the processor of the computer or other programmable data processing
apparatus, create means for implementing the functions specified in
the flowchart block or blocks. These computer program instructions
may also be stored in a non-transitory computer usable or
computer-readable memory that can direct a computer or other
programmable data processing apparatus to function in a particular
manner, such that the instructions stored in the computer usable or
computer-readable memory produce an article of manufacture
including instructions that implement the function specified in the
flowchart. The computer program instructions may also be loaded
onto a computer or other programmable data processing apparatus to
cause a series of operational steps to be performed on the computer
or other programmable apparatus to produce a computer implemented
process such that the instructions that are executable on the
computer or other programmable apparatus provide steps for
implementing the functions specified in the flowchart.
[0030] Each block of the flowchart illustrations may represent a
module, segment, or portion of code, which includes one or more
executable instructions for implementing the specified logical
function(s). In some alternative implementations, the functions
noted in the blocks may occur out of the order illustrated. For
example, two blocks shown in succession may in fact be executed
substantially concurrently or in reverse order, depending upon the
functionality involved.
[0031] As used herein, the terms "unit" or "module" refer to a
software element or a hardware element, such as a field
programmable gate array (FPGA) or an application specific
integrated circuit (ASIC), which performs a predetermined function.
However, the "unit" or "module" does not always have a meaning
limited to software or hardware, and may be constructed either to
be stored in an addressable storage medium or to execute one or
more processors. Therefore, the terms "unit" or "module" include,
for example, software elements, object-oriented software elements,
class elements or task elements, processes, functions, properties,
procedures, sub-routines, segments of a program code, drivers,
firmware, micro-codes, circuits, data, database, data structures,
tables, arrays, and parameters. The elements and functions provided
by the "unit" or "module" may be either combined into a smaller
number of elements or divided into a larger number of elements.
Moreover, the elements and the "units" or "modules" may be
implemented to reproduce one or more CPUs within a device or a
security multimedia card.
[0032] In this specification, a kernel may be a core part of an
operating system that performs the function of managing and
controlling system hardware, and may refer to the control of hardware
resources to perform a function required for running an application.
The term "kernel" has the same meaning as "kernel program", and the
kernel may control or manage system resources used for executing
operations or functions implemented in programming modules, such as
middleware, an application programming interface (API), or an
application.
[0033] In this specification, OpenCL is a type of API, that is, an
interface made so that functions provided by an operating system or a
programming language can be used by an application program. More
specifically, OpenCL includes OpenCL C, a language based on C99 for
writing kernel code, and an API for defining and controlling the
platform. OpenCL provides task-based and data-based parallel
computing.
[0034] According to the present disclosure, a device to execute an
OpenCL programming model can be selected according to user
preference, and an optimal target to be offloaded can be determined
using a cost model.
[0035] Since whether to perform offloading is determined herein
through automatic kernel analysis, the offloading can be applied
directly even when the type and environment of the server or local
device change. Thus, performance restrictions can be overcome in
various environments.
[0036] According to the present disclosure, high energy efficiency
is provided when the local device is a battery-dependent device such
as a mobile device.
[0037] FIG. 1 illustrates a configuration of an application
execution system based on an OpenCL according to an embodiment of
the present disclosure.
[0038] Referring to FIG. 1, an OpenCL application includes a kernel
program and a host program. The kernel program corresponds to a
function executed by a computing device such as a central processing
unit (CPU) or a graphics processing unit (GPU), and the host program
corresponds to a program run by a host processor that manages memory
and the kernel program in order to execute the kernel program.
[0039] The present disclosure is applied to a system that is
divided into an OpenCL host 100 and an OpenCL computing device 105
according to a basic configuration of the OpenCL as illustrated in
FIG. 1, and further includes a client manager 110 in the host for
basic operations and a resource manager 120 and a server manager
125 in the computing device.
[0040] The resource manager 120 is a process that is executed first
when an application is driven, and it exists in a server. All clients
130 and servers 140 know the address of the resource manager 120 in
advance and thus may connect to the resource manager 120 at any time.
The computing devices within each server inform the resource manager
120 of the state of that server's computation resources through the
server manager 125.
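As a minimal sketch of the resource management table described above (written in Python; the class and field names are illustrative assumptions, since the disclosure describes the behavior only at a high level), the resource manager can be modeled as a registry that server managers report into and that clients query:

```python
class ResourceManager:
    """Maintains a table of computation-resource states reported by each
    server's server manager; clients query it for available servers."""

    def __init__(self):
        self.table = {}  # server address -> reported resource state

    def report(self, server_addr, state):
        """Called by a server manager to inform the resource manager of
        the state of its server's computation resources."""
        self.table[server_addr] = state

    def available_servers(self):
        """Servers that currently report free compute capacity."""
        return [addr for addr, st in self.table.items() if st.get("free", 0) > 0]


rm = ResourceManager()
rm.report("server-a", {"free": 2})   # two free compute devices
rm.report("server-b", {"free": 0})   # fully occupied
print(rm.available_servers())        # ['server-a']
```

Because all clients and servers know the resource manager's address in advance, this registry can be contacted at any time, which is why it is the first process executed when an application is driven.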
[0041] The client 130, such as a local device, drives an OpenCL
application with the client manager 110 and receives the desired
server resources through the resource manager as necessary. The
kernel is automatically analyzed by the client manager and, when
needed, is offloaded to a server node allocated by the resource
manager, and the server node executes the kernel through an OpenCL
runtime system 127 within the server. The client manager 110 collects
the results of the kernel execution performed by the server node and
the local device.
[0042] FIG. 2 illustrates an operation of a kernel offloading
method according to an embodiment of the present disclosure.
[0043] Referring to FIG. 2, when an application is driven, a terminal
determines in step 201, based on a user's selection or preference,
whether to execute the kernel in the local device in step 202 in a
local execution mode, in the server in step 203 in a server execution
mode, or by offloading the kernel according to an automatically
preset reference in step 204 in an automatic offloading mode. All
kernels are executed within the local device (i.e., the terminal) in
the local execution mode 202; all kernels are offloaded to the server
and executed in the server execution mode 203; and, in the automatic
offloading mode 204, the kernels to be offloaded are automatically
selected in a framework based on the calculation and network
capabilities of the terminal and the server, and the selected kernels
are executed.
[0044] The local execution mode 202 corresponds to a type in which
the terminal executes all kernels without offloading the kernels,
and is selected when there is no limitation on hardware resources
of the terminal and there is no restriction on tasks which can be
performed. In this case, until the OpenCL application program ends
in step 206, the kernels are executed by the terminal in step
205.
[0045] In the server execution mode 203, all kernels are offloaded
to the server and executed. To this end, information on the targets
to be offloaded is required. The resource manager updates the
available server information in a resource management table and, when
receiving a request for executing the kernel from the client manager,
informs the client manager of suitable server information. The server
information includes at least one of the calculation or computing
performance capability of the server, the state of communication with
the server, and the energy consumed by a data upload to or download
from the server. The client manager directly transmits the data and
the kernel to the corresponding server manager based on this
information. The received data and kernel are executed in an OpenCL
runtime system on the corresponding server in step 208. The results
of the execution are transferred to the client manager, which
receives the results in step 209, and the process is repeated until
the program execution ends in step 210.
[0046] In the automatic offloading mode 204, kernels to be
offloaded are automatically selected in the framework based on
calculation and network capabilities of the terminal and an
external device and the kernels are executed by each of the
terminal and the server.
[0047] An estimated cost of each kernel is used as a reference for
determining whether the kernel is executed by the server or by the
terminal in step 211.
[0048] A method of estimating the cost will be described below in
more detail with reference to FIGS. 3 to 7.
[0049] When the kernels to be executed by the terminal or the server
are determined in step 211, the corresponding locations transmit and
receive the required data and kernels in steps 212 and 214,
respectively, and the terminal and the server execute the kernels in
steps 213 and 215, respectively. The required data, which is the
result of the kernel execution performed by the server, is received
by the terminal in step 216. The process is repeated until the
program ends in step 217.
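The per-kernel decision of step 211 can be sketched as a simple partitioning step (Python; illustrative only, assuming a precomputed local and server cost per kernel as produced by the cost models described below):

```python
def partition_kernels(costs):
    """Split kernels into local and server execution sets by comparing
    the predicted local cost with the predicted server cost (step 211).
    costs: kernel name -> (local_cost, server_cost)
    """
    local, remote = [], []
    for kernel, (c_local, c_server) in costs.items():
        # Offload only when the server-side cost is strictly lower;
        # ties stay on the terminal (an illustrative tie-breaking choice).
        (remote if c_server < c_local else local).append(kernel)
    return local, remote


costs = {"k1": (5.0, 9.0), "k2": (12.0, 4.0), "k3": (3.0, 3.0)}
local, remote = partition_kernels(costs)
print(local)   # ['k1', 'k3']
print(remote)  # ['k2']
```

The terminal then executes the `local` set while transmitting the `remote` set, with its data, to the server, and collects the server's results when execution completes.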
[0050] FIG. 3 illustrates cost estimation to determine whether to
perform offloading in the automatic offloading mode according to an
embodiment of the present disclosure.
[0051] Referring to FIG. 3, costs are calculated by cost models
determined according to Equation (1) and Equation (2) below, and
whether to transfer (offload) the kernel to be executed is
determined based on the calculated costs. More specifically,
referring to FIG. 3, when kernel 1 is first executed, the time for
which the kernel is executed by the terminal is recorded as
indicated by reference numeral 301. When a cost of kernel 1 is
analyzed, the analysis is based on at least one of, for example, a
calculation capability of the terminal, a calculation capability of
the server node to be allocated, a data transmission time according
to a data size and a communication state (bandwidth) used by the
kernel, a time for which the kernel to be executed is actually
executed by the terminal, and energy consumption when the kernels
are executed. That is, through the analysis of the cost of kernel
1, the benefit of transferring the kernel is analyzed according to
Equations (1) and (2), as follows. The terminal determines whether
to transfer the kernel based on the analysis, and transfers the
kernel or performs local execution based on the determination 302.
Cost_L = T_L * E_C + T_D * E_D (1)

Cost_S = T_S * E_I + T_U * E_U (2)
[0052] In Equations (1) and (2), Cost_L indicates the execution cost
calculated for the terminal, and Cost_S indicates the execution cost
calculated for the server. T_L and T_S indicate the processing times
of the corresponding kernel in the terminal and the server,
respectively; E_C and E_I indicate the energy consumed when the
kernel is executed by the terminal and the energy consumed in an idle
state, respectively; T_D and T_U indicate the data download time and
the data upload time based on the communication state; and E_D and
E_U indicate the energies consumed during data download and upload
when data is transmitted.
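As a minimal sketch (in Python; the parameter names are illustrative, since the disclosure defines the symbols but not an implementation), the two cost models of Equations (1) and (2) and the resulting offloading decision can be expressed as:

```python
def cost_local(t_local, e_compute, t_download, e_download):
    """Equation (1): cost of executing the kernel on the terminal.
    t_local: kernel processing time on the terminal (T_L)
    e_compute: terminal energy while computing (E_C)
    t_download: data download time, e.g. for dependent data (T_D)
    e_download: energy consumed during data download (E_D)
    """
    return t_local * e_compute + t_download * e_download


def cost_server(t_server, e_idle, t_upload, e_upload):
    """Equation (2): cost of executing the kernel on the server.
    t_server: kernel processing time on the server (T_S)
    e_idle: terminal energy while idle, waiting for results (E_I)
    t_upload: time to upload the data and kernel (T_U)
    e_upload: energy consumed during data upload (E_U)
    """
    return t_server * e_idle + t_upload * e_upload


def should_offload(t_local, e_compute, t_download, e_download,
                   t_server, e_idle, t_upload, e_upload):
    """Transfer the kernel when the server-side cost is lower."""
    return (cost_server(t_server, e_idle, t_upload, e_upload)
            < cost_local(t_local, e_compute, t_download, e_download))


# Example: a compute-heavy kernel with a modest upload is offloaded.
print(should_offload(t_local=10.0, e_compute=5.0, t_download=0.0, e_download=0.0,
                     t_server=1.0, e_idle=1.0, t_upload=2.0, e_upload=3.0))  # True
```

Here the local cost is 10.0 × 5.0 = 50.0 while the server cost is 1.0 × 1.0 + 2.0 × 3.0 = 7.0, so the kernel is transferred.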
[0053] The execution times of the current kernels are collected
through profiling. In the initial execution, the kernel is executed
by the terminal and its execution time is measured. The kernel
execution time in each computing device is then predicted based on
the measured value and the hardware (HW) capability of each computing
device. The predicted information is stored in a database file and
is then used for later executions.
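The profile-based prediction in the preceding paragraph can be sketched as follows (Python; the linear scaling by a hardware-capability ratio is an illustrative assumption, since the disclosure does not give the exact prediction function):

```python
profile_db = {}  # kernel name -> measured execution time on the terminal
                 # (standing in for the "database file" of the disclosure)


def record_initial_run(kernel, measured_time):
    """On the initial execution, the terminal runs the kernel and stores
    the measured execution time in the profile database."""
    profile_db[kernel] = measured_time


def predict_time(kernel, local_capability, device_capability):
    """Predict the kernel's execution time on another computing device by
    scaling the measured local time by the ratio of HW capabilities
    (assumption: throughput scales linearly with capability)."""
    measured = profile_db[kernel]
    return measured * local_capability / device_capability


record_initial_run("matmul", 8.0)          # 8 s measured on the terminal
print(predict_time("matmul", 1.0, 4.0))    # server 4x faster -> 2.0
```

The predicted time feeds T_S in Equation (2), while the measured local time feeds T_L in Equation (1).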
[0054] FIG. 4 illustrates a multi-kernel in a method of determining
whether to perform offloading according to an embodiment of the
present disclosure.
[0055] In single-kernel analysis, the cost model of FIG. 3, which
depends only on the location in which the corresponding kernel is
executed, is sufficient in the automatic offloading mode. However, a
plurality of kernels may exist within one OpenCL program, and there
may be dependency between the kernels. In this case, the quantity of
communication that actually occurs may change according to the
dependency.
[0056] In FIG. 4, kernel 1 including data 1 depends on kernel 2
including data 2, and kernel 3 including data 3 and kernel 4
including data 4 depend on kernel 5 including data 5. Accordingly,
when kernel 1 with data 1 is executed by the terminal and kernel 2
with data 2 is executed by the server node, data
transmission/reception between the two kernels is necessary, which
incurs transmission time and energy. Therefore, in the multi-kernel
case in which a plurality of kernels exists, a more accurate analysis
is acquired by adding the dependency to the cost analysis.
[0057] FIG. 5 illustrates the multi-kernel in which kernels having
a dependent relationship therebetween exist in the local device in
the method of determining whether to perform offloading according
to an embodiment of the present disclosure.
[0058] The kernel dependency may be expressed as data dependency
between kernels. In the kernel dependency, one kernel may be
dependent on several kernels and several kernels may be dependent
on one kernel. A communication quantity of dependent data may be
determined according to a location in which the dependent kernel is
executed. Communication for data transmission is not necessary when
dependent kernels are executed in the same location, but is
necessary when the dependent kernels are executed in different
locations.
[0059] Referring to FIG. 5, since kernel 1 including data 1 and
kernel 2 including data 2 have a dependent relationship
therebetween and both the kernels are executed by the client, data
transmission/reception between the client and the server is not
necessary.
[0060] FIG. 6 illustrates the multi-kernel in which kernels
having a dependent relationship therebetween exist in the local
device and the server node in the method of determining whether to
perform offloading according to an embodiment of the present
disclosure.
[0061] In FIG. 6, kernel 1 including data 1 and kernel 2 including
data 2 have a dependent relationship therebetween, but kernel 1
including data 1 is executed by the client (i.e., the local device
or terminal) and kernel 2 is executed by the server, so that data
is transmitted to the server node from the client. Accordingly, in
this case, the offloading is determined based on costs of time and
energy resources due to the data communication.
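The offloading decision in this scenario can be sketched as a simple
cost comparison; the cost terms and the linear energy weighting below
are illustrative assumptions, not the disclosure's exact model.

```python
# Hypothetical sketch: offloading a kernel is worthwhile only if the
# remote execution time plus the time and energy costs of the data
# communication beat pure local execution.
def should_offload(local_time, remote_time, transfer_time,
                   local_energy, transfer_energy, energy_weight=1.0):
    """Return True when the offloaded (remote) cost is lower."""
    local_cost = local_time + energy_weight * local_energy
    remote_cost = remote_time + transfer_time + energy_weight * transfer_energy
    return remote_cost < local_cost
```

A heavy kernel with little data favors offloading; a light kernel with
costly data transfer favors local execution.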
[0062] FIG. 7 illustrates an offloading method when a plurality of
kernels having a dependent relationship therebetween exist
according to an embodiment of the present disclosure.
[0063] When one kernel (kernel 3) is executed and a plurality of
kernels dependent on that kernel exists, the cost analysis of the
independent kernel considers the relationships with all the dependent
kernels in order to reflect the offloading costs of all the dependent
kernels and thereby perform optimal offloading. To this end, the
client groups all the dependent kernels into one dependent set and
uses the dependent set for the cost analysis. Once the dependent set
is generated, it is stored in a database to be used for later cost
analysis.
[0064] When kernel 3 is executed, a dependent set including that
kernel (kernel 3) is searched for in the database. When no such
dependent set exists, the cost analysis of the corresponding kernel is
performed using only the cost model described in FIG. 2. When a
dependent set including the kernel exists, an optimal location in
which the corresponding kernel is to be executed is determined through
total cost analysis of the set.
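The database lookup described above can be sketched roughly as
follows; the in-memory list standing in for the database and the
kernel identifiers are hypothetical.

```python
# Assumed structure: the "database" holds previously generated
# dependent sets, each a set of kernel identifiers.
dependent_set_db = [
    {"k1", "k2"},        # kernels 1 and 2 form one dependent set
    {"k3", "k4", "k5"},  # kernels 3-5 form another dependent set
]

def find_dependent_set(kernel_id, db=dependent_set_db):
    """Return the stored dependent set containing kernel_id, or None."""
    for dep_set in db:
        if kernel_id in dep_set:
            return dep_set
    return None  # no set found: fall back to single-kernel cost analysis
```

A `None` result corresponds to the fallback case where only the
single-kernel cost model of FIG. 2 is applied.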
[0065] When a dependent set of kernel 1 to kernel 4 is found in the
database, whether to offload each kernel is determined through the
cost analysis of each kernel. Once whether to transfer the task has
been determined through the cost analysis considering the dependency,
the client manager transfers the corresponding kernel to the server or
executes the kernel on the local device according to the
determination. When the location in which a kernel dependent on the
current independent kernel was executed differs from the location in
which the current kernel will be executed, the client manager moves
the dependent data to the location in which the independent kernel is
executed. When the user desires to check a result of the kernel
execution and the kernel resides in the server according to the
determination, the client manager moves the kernel to the local device
so that the user can check it. The client manager also collects
results when the computing of all kernels is completed.
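The rule for moving dependent data when execution locations differ can
be sketched as follows; the dictionary-based representation of the
kernel-to-location assignment is an assumed illustration.

```python
# Illustrative placement rule: if a dependent kernel was executed in a
# different location from where the independent kernel will run, its
# output data must be moved there first.
def plan_data_moves(assignment, dependencies):
    """assignment: kernel -> location; dependencies: kernel -> [deps].
    Returns (dep_kernel, src, dst) tuples for every required transfer."""
    moves = []
    for kernel, deps in dependencies.items():
        dst = assignment[kernel]
        for dep in deps:
            src = assignment[dep]
            if src != dst:  # different locations: data must be moved
                moves.append((dep, src, dst))
    return moves
```

Co-located dependencies produce no entries, matching the case of
FIG. 5 in which no client-server communication is necessary.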
[0066] FIG. 8 illustrates an operation for determining a kernel to
be offloaded by the client manager of the terminal according to an
embodiment of the present disclosure.
[0067] More specifically, FIG. 8 illustrates an operation for
determining a kernel to be offloaded by the client manager in the
automatic offloading mode.
[0068] Referring to FIG. 8, when the kernel is first executed, the
client manager measures an execution time of the corresponding
kernel in step 801. When the corresponding kernel is re-executed at
a predetermined time point, the client manager predicts an
execution cost of the corresponding kernel based on the measured time
and the capability of each computing device (that is, the terminal or
the server node) in step 802. Each computing device is
included in an available computing resource list which the client
manager requests and receives from the resource manager. The
resource manager receives and stores the availability from the
server manager of each server node in advance.
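One plausible way to project the measured time onto each device in the
available-resource list, assuming that execution time scales inversely
with a device's relative compute capability (an assumption not stated
in the disclosure), is:

```python
# Sketch of steps 801-802: a time measured once on the local device is
# projected onto every device in the resource list by capability ratio.
def predict_times(measured_local_time, capabilities, local_device="terminal"):
    """capabilities: device -> relative compute capability (higher = faster)."""
    base = measured_local_time * capabilities[local_device]
    return {dev: base / cap for dev, cap in capabilities.items()}
```

For instance, a server node four times as capable as the terminal is
predicted to finish the same kernel in a quarter of the measured time.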
[0069] Thereafter, the client manager predicts the execution cost
of the corresponding kernel based on at least one of: the calculation
capability of the local device, the calculation capability of the
server node to be allocated, the data transmission time according to
the size of the data used by the kernel and the communication state
(bandwidth), the time actually taken to execute the kernel on the
local device, and the energy consumption when the kernels are
executed. That is, the client manager analyzes the benefit of
transferring the kernel by predicting the execution cost.
[0070] The client manager selects resources such as the server node
to execute the corresponding kernel based on the predicted cost
according to the analysis in step 803. Although not illustrated in
FIG. 8, the client manager may collect results of the execution
performed by each server node.
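Steps 802 and 803 can be sketched together as follows; the additive
cost formula combining compute time, transfer time, and weighted
energy is an illustrative assumption rather than the disclosure's
exact model.

```python
# Sketch of steps 802-803: predict a total cost per candidate device,
# then select the device with the lowest predicted cost.
def select_resource(candidates, data_bytes, energy_weight=1.0):
    """candidates: device -> (compute_time, bandwidth_bps or None, energy).
    A None bandwidth denotes the local device (no transfer needed)."""
    def cost(entry):
        compute_time, bandwidth, energy = entry
        # transfer time applies only when data must cross the network
        transfer = 0.0 if bandwidth is None else (data_bytes * 8) / bandwidth
        return compute_time + transfer + energy_weight * energy
    return min(candidates, key=lambda dev: cost(candidates[dev]))
```

The same candidate list can flip between the server and the terminal
as the size of the data the kernel uses grows.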
[0071] FIG. 9 illustrates an operation for determining a kernel to
be offloaded by the client manager of the terminal when a plurality
of kernels having dependency exist according to an embodiment of
the present disclosure.
[0072] More specifically, FIG. 9 illustrates an operation for
determining a kernel to be offloaded by the client manager in the
automatic offloading mode.
[0073] Referring to FIG. 9, the terminal determines, as a dependent
set, an independent kernel to be executed together with all kernels
dependent on that independent kernel in step 901.
[0074] More specifically, kernel dependency is expressed as data
dependency between kernels, and one kernel may be dependent on several
kernels. The communication quantity of dependent data is determined
according to the location in which the dependent kernel is executed.
Communication for data transmission is not needed when dependent
kernels are executed in the same location, but is needed when the
dependent kernels are executed in different locations.
[0075] The terminal groups all the dependent kernels into one
dependent set and stores the dependent set to be used for cost
analysis. Once the dependent set is generated, it is stored in a
database (i.e., storage in the terminal) to be used for later cost
analysis. In step 901, when one independent kernel is executed, a
dependent set including the kernels dependent on that kernel is
searched for in the database. When a dependent set for the
corresponding kernel does not exist, the cost analysis of the
corresponding kernel is performed using only the cost model described
in FIG. 8.
[0076] When the dependent set exists, total cost analysis of the
set is performed in step 902. This corresponds to analyzing the cost
required for executing each kernel, based on the cost model described
in FIG. 8, with respect to all the kernels included in the
corresponding dependent set. Through this total cost analysis, an
optimal location in which the corresponding kernel should be executed
is determined.
[0077] In step 903, the client manager determines resources to
execute each kernel, that is, to offload each kernel based on
results of the executed cost analysis.
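The total cost analysis and resource determination of steps 902 and
903 can be sketched as a search over placements; the brute-force
enumeration and the uniform per-edge transfer cost below are
illustrative assumptions, since the disclosure does not prescribe a
particular search strategy.

```python
from itertools import product

# Sketch of steps 902-903: enumerate every placement of the kernels in
# a dependent set over the available locations and pick the assignment
# whose summed compute cost plus cross-location transfer cost is lowest.
def best_assignment(kernels, locations, compute_cost, edges, transfer_cost):
    """compute_cost: (kernel, location) -> cost; edges: dependent pairs.
    Returns (assignment dict, total cost) for the cheapest placement."""
    best, best_total = None, float("inf")
    for placement in product(locations, repeat=len(kernels)):
        assign = dict(zip(kernels, placement))
        total = sum(compute_cost[(k, assign[k])] for k in kernels)
        # dependency edges crossing locations each incur a transfer cost
        total += sum(transfer_cost for a, b in edges if assign[a] != assign[b])
        if total < best_total:
            best, best_total = assign, total
    return best, best_total
```

With a high transfer cost the search keeps dependent kernels
co-located; with a low one it may split them across devices.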
[0078] Although not illustrated in FIG. 9, once whether to transfer
the task has been determined through the cost analysis considering the
dependency, the client manager may transfer the corresponding kernel
to the server or execute the kernel on the local device according to
the determination. When the location in which a kernel dependent on
the current independent kernel was executed differs from the location
in which the current kernel will be executed, the client manager may
move the dependent data to the location in which the independent
kernel is executed. When the user desires to check a result of the
kernel execution and the kernel resides in the server according to the
determination, the client manager may move the kernel to the local
device so that the user can check it. The client manager may also
collect results when the computing of all kernels is completed.
[0079] FIG. 10 illustrates a structure of the terminal to which the
method of determining whether to perform offloading can be applied
according to an embodiment of the present disclosure.
[0080] As previously noted, the terminal to which the method of
determining whether to perform offloading is applied according to
the present disclosure may also be referred to as the local device.
Further, the server that receives data and a kernel according to
the determination of the offloading, executes the data and the
kernel, and then transmits a result thereof to the local device may
correspond to an embodiment of the terminal.
[0081] Referring to FIG. 10, a terminal 1000 according to the
present disclosure includes a controller 1001 and a communication
unit 1002.
[0082] The communication unit 1002 performs data communication and
transmits/receives data and a kernel to perform offloading.
[0083] The controller 1001 controls general operations of the
terminal. Although it is illustrated in FIG. 10 that the terminal
1000 includes only the controller 1001 and the communication unit
1002, the terminal 1000 may further include a module for performing
various functions and a module for determining a kernel to be
offloaded or for collecting results of kernel execution.
Hereinafter, it is assumed that the controller 1001 controls all
the general operations of the terminal.
[0084] The controller 1001 receives a resource list for executing
the kernel from the server, predicts kernel execution costs of one
or more kernel execution resources included in the resource list,
and transmits the kernel to the kernel execution resources to
execute the kernel according to a result of the prediction.
[0085] In the above embodiments, all operations may be optionally
performed or may be omitted, and steps in each embodiment do not
have to be sequentially performed and orders thereof may be
changed. In addition, the embodiments disclosed in the
specification and drawings are merely presented to easily describe
technical details of the present disclosure and to assist in the
understanding of the present disclosure, and are not intended to
limit the scope of the present disclosure. That is, it is obvious
to those skilled in the art to which the present disclosure belongs
that different modifications can be achieved based on the technical
spirit of the present disclosure.
[0086] Although the present disclosure has been described above
using specific terms in connection with the certain embodiments
disclosed in the specification and drawings, it will be understood
by those skilled in the art that various changes in form and
details may be made therein without departing from the scope of the
present invention. Therefore, the scope of the present invention
should not be defined as being limited to the embodiments, but
should be defined by the appended claims and equivalents
thereof.
* * * * *