U.S. patent application number 17/486066 was filed with the patent office on 2022-01-13 for stream computing method, apparatus, and system.
The applicant listed for this patent is Huawei Technologies Co., Ltd.. Invention is credited to Fengbin Fang, Yunlong Shi.
Application Number | 20220012288 17/486066 |
Document ID | / |
Family ID | 1000005869105 |
Filed Date | 2022-01-13 |
United States Patent
Application |
20220012288 |
Kind Code |
A1 |
Shi; Yunlong ; et
al. |
January 13, 2022 |
Stream Computing Method, Apparatus, and System
Abstract
A stream computing method performed by a manager node includes
obtaining input channel description information, a structured query
language (SQL) statement, and output channel description
information, dynamically generating a data flow diagram according
to the input channel description information, the SQL statement,
and the output channel description information, and controlling,
according to the data flow diagram, a worker node to execute a
stream computing task.
Inventors: |
Shi; Yunlong; (Hangzhou,
CN) ; Fang; Fengbin; (Shenzhen, CN) |
|
Applicant: |
Name |
City |
State |
Country |
Type |
Huawei Technologies Co., Ltd. |
Shenzhen |
|
CN |
|
|
Family ID: |
1000005869105 |
Appl. No.: |
17/486066 |
Filed: |
September 27, 2021 |
Related U.S. Patent Documents
|
|
|
|
|
|
Application
Number |
Filing Date |
Patent Number |
|
|
16261014 |
Jan 29, 2019 |
11132402 |
|
|
17486066 |
|
|
|
|
PCT/CN2017/094331 |
Jul 25, 2017 |
|
|
|
16261014 |
|
|
|
|
Current U.S.
Class: |
1/1 |
Current CPC
Class: |
G06F 16/284 20190101;
G06F 16/24568 20190101; H04L 67/10 20130101; G06F 16/9024
20190101 |
International
Class: |
G06F 16/901 20060101
G06F016/901; G06F 16/2455 20060101 G06F016/2455; G06F 16/28
20060101 G06F016/28; H04L 29/08 20060101 H04L029/08 |
Foreign Application Data
Date |
Code |
Application Number |
Jul 29, 2016 |
CN |
201610617253.2 |
Claims
1. A stream computing method performed by a manager node of a
stream computing system, wherein the stream computing method
comprises: generating a first data flow diagram according to input
channel description information, a structured query language (SQL)
statement, and output channel description information, wherein the
first data flow diagram comprises a plurality of logical nodes,
wherein the input channel description information defines an input
channel, wherein the input channel is a first logical channel for
inputting an input data stream from a data production system to the
first data flow diagram, wherein the output channel description
information defines an output channel, and wherein the output
channel is a logical channel for outputting an output data stream
of the first data flow diagram to a data consumption system;
classifying the logical nodes in the first data flow diagram to
obtain a plurality of logical node groups; selecting, from a preset
operator library, a common operator corresponding to each logical
node group of the logical node groups; generating a second data
flow diagram according to the common operator, wherein each
operator in the second data flow diagram implements functions of
one or more logical nodes in a logical node group corresponding to
the operator; and controlling, according to the second data flow
diagram, a worker node of the stream computing system to execute a
stream computing task.
2. The stream computing method of claim 1, wherein the first data
flow diagram comprises a source logical node, an intermediate
logical node, and a sink logical node coupled using first directed
edges, and wherein the second data flow diagram comprises a source
operator, an intermediate operator, and a sink operator coupled
using second directed edges.
3. The stream computing method of claim 1, wherein the SQL
statement comprises a plurality of SQL rules, and wherein the
stream computing method further comprises: generating a source
logical node in the first data flow diagram according to the input
channel description information, wherein the source logical node is
for receiving the input data stream from the data production
system; generating intermediate logical nodes in the first data
flow diagram according to select substatements in the SQL rules,
wherein the intermediate logical nodes indicate computational logic
for computing the input data stream, and wherein each intermediate
logical node of the intermediate logical nodes corresponds to one
SQL rule of the SQL rules; generating a sink logical node in the
first data flow diagram according to the output channel description
information, wherein the sink logical node is for sending the
output data stream to the data consumption system; and generating
first directed edges between the source logical node, the
intermediate logical nodes, and the sink logical node according to
an input substatement in each SQL rule of the SQL rules, an output
substatement in each SQL rule of the SQL rules, or both the input
substatement and the output substatement.
4. The stream computing method of claim 3, wherein the second data
flow diagram comprises a source operator, an intermediate operator,
and a sink operator coupled using second directed edges, wherein
the preset operator library comprises a common source operator, a
common intermediate operator, and a common sink operator, and
wherein the stream computing method further comprises: compiling
the common source operator to obtain the source operator in the
second data flow diagram; selecting, from the preset operator
library, at least one common intermediate operator for each logical
node group of the logical node groups; compiling the at least one
common intermediate operator to obtain the intermediate operator in
the second data flow diagram; compiling, the common sink operator
to obtain the sink operator in the second data flow diagram; and
generating the second directed edges according to the first
directed edges.
5. The stream computing method of claim 4, further comprising:
receiving, from a client, modification information for modifying an
SQL rule of the SQL rules; and adding, modifying, or deleting,
according to the modification information, an intermediate operator
in the second data flow diagram.
6. The stream computing method of claim 4, further comprising:
receiving, from a client, modification information for modifying
the input channel description information; and adding, modifying,
or deleting the source operator in the second data flow diagram
according to the modification information.
7. The stream computing method of claim 4, further comprising:
receiving, from a client, modification information for modifying
the output channel description information; and adding, modifying,
or deleting the sink operator in the second data flow diagram
according to the modification information.
8. The stream computing method of claim 1, wherein controlling the
worker node to execute the stream computing task comprises:
scheduling each operator in the second data flow diagram to at
least one worker node of a plurality of working nodes in the stream
computing system, generating, according to an output data stream of
each operator in the second data flow diagram, subscription
publication information corresponding to the operator, wherein the
subscription publication information indicates a manner of sending
a respective output data stream corresponding to the operator;
configuring, for each operator in the second data flow diagram, the
subscription publication information for the operator; generating,
for each operator in the second data flow diagram and according to
an input data stream of the operator, input stream definition
information corresponding to the operator, wherein the input stream
definition information indicates a manner of receiving a respective
input data stream corresponding to the operator; and configuring,
for each operator in the second data flow diagram, the input stream
definition information for the operator.
9. A manager node, comprising: a memory configured to store
instructions; and a processor coupled to the memory and configured
to execute the instructions to cause the manager node to: generate
a first data flow diagram according to input channel description
information, a structured query language (SQL) statement, and
output channel description information, wherein the first data flow
diagram comprises a plurality of logical nodes, wherein the input
channel description information defines an input channel, wherein
the input channel is a logical channel for inputting an input data
stream from a data production system to the first data flow
diagram, the output channel description information defines an
output channel, and wherein the output channel is a logical channel
for outputting an output data stream of the first data flow diagram
to a data consumption system; classify the logical nodes in the
first data flow diagram to obtain a plurality of logical node
groups; select, from a preset operator library, a common operator
corresponding to each logical node group of the logical node
groups; and generate a second data flow diagram according to the
common operator, wherein each operator in the second data flow
diagram implements functions of one or more logical nodes in a
logical node group corresponding to the operator; and control,
according to the second data flow diagram, a worker node of a
stream computing system to execute a stream computing task.
10. The manager node of claim 9, wherein the first data flow
diagram comprises a source logical node, an intermediate logical
node, and a sink logical node coupled using first directed edges,
and wherein the second data flow diagram comprises a source
operator, an intermediate operator, and a sink operator coupled
using second directed edges.
11. The manager node of claim 9, wherein the SQL statement
comprises a plurality of SQL rules, and wherein the instructions
further cause the processor to be manage node to: generate a source
logical node in the first data flow diagram according to the input
channel description information, wherein the source logical node is
for receiving the input data stream from the data production
system; generating intermediate logical nodes in the first data
flow diagram according to select substatements in each SQL rule of
the SQL rules, wherein the intermediate logical nodes indicate
computational logic for computing the input data stream, and
wherein each intermediate logical node of the intermediate logical
nodes corresponds to one SQL rule of the SQL rules; generate a sink
logical node in the first data flow diagram according to the output
channel description information, wherein the sink logical node is
for sending the output data stream to the data consumption system;
and generate first directed edges between the source logical node,
the intermediate logical nodes, and the sink logical node according
to an input substatement in each SQL rule of the SQL rules, an
output substatement in each SQL rule of the SQL rules, or both the
input substatement or the output substatement.
12. The manager node of claim 11, wherein the second data flow
diagram comprises a source operator, an intermediate operator, and
a sink operator coupled using second directed edges, wherein the
preset operator library comprises a common source operator, a
common intermediate operator, and a common sink operator, and
wherein the instructions further cause the manager node to: compile
the common source operator to obtain the source operator in the
second data flow diagram; select, from the preset operator library,
at least one common intermediate operator for each logical node
group of the logical node groups; compile the at least one common
intermediate operator to obtain the intermediate operator in the
second data flow diagram; compile the common sink operator to
obtain the sink operator in the second data flow diagram; and
generate the second directed edges according to the first directed
edges.
13. The manager node of claim 12, wherein the instructions further
cause the manager node to: receive, from a client, modification
information for modifying an SQL rule of the SQL rules; and add,
modify, or delete, according to the modification information, an
intermediate operator in the second data flow diagram.
14. The manager node of claim 12, wherein the instructions further
cause the manager node to: receive, from a client, modification
information for modifying the input channel description
information; and add, modify, or delete the source operator in the
second data flow diagram according to the second modification
information.
15. The manager node of claim 12, wherein the instructions further
cause the manager node to: receive, from a client, modification
information for modifying the output channel description
information; and add, modify, or delete the sink operator in the
second data flow diagram according to the modification
information.
16. The manager node of claim 10, wherein the instructions further
cause the manager node to schedule each operator in the second data
flow diagram to at least one worker node in the stream computing
system.
17. A computer program product comprising instructions that are
stored on a computer-readable medium and that, when executed by a
processor, cause a manager node to: generate a first data flow
diagram according to input channel description information, a
structured query language (SQL) statement, and output channel
description information, wherein the first data flow diagram
comprises a plurality of logical nodes, wherein the input channel
description information defines an input channel, wherein the input
channel is a logical channel for inputting an input data stream
from a data production system to the first data flow diagram,
wherein the output channel description information defines an
output channel, and wherein the output channel is a logical channel
for outputting an output data stream of the first data flow diagram
to a data consumption system; classify the logical nodes in the
first data flow diagram to obtain a plurality of logical node
groups; select, from a preset operator library, a common operator
corresponding to each logical node group of the logical node
groups; and generate a second data flow diagram according to the
common operator, wherein each operator in the second data flow
diagram implements functions of one or more logical nodes in a
logical node group corresponding to the operator; and control,
according to the second data flow diagram, a worker node of a
stream computing system to execute the stream computing task.
18. The computer program product of claim 17, wherein the first
data flow diagram comprises a source logical node, an intermediate
logical node, and a sink logical node coupled using first directed
edges, and wherein the second data flow diagram comprises a source
operator, an intermediate operator, and a sink operator coupled
using second directed edges.
19. The computer program product of claim 17, wherein the SQL
statement comprises a plurality of SQL rules, and wherein when
executed by the processor, the instructions further cause the
manager node to: generate a source logical node in the first data
flow diagram according to the input channel description
information, wherein the source logical node is for receiving the
input data stream from the data production system; generate
intermediate logical nodes in the first data flow diagram according
to select substatements in the SQL rules, wherein the intermediate
logical nodes indicate computational logic for computing the input
data stream, and wherein each intermediate logical node of the
intermediate logical nodes corresponds to one SQL rule of the SQL
rules; generate a sink logical node in the first data flow diagram
according to the output channel description information, wherein
the sink logical node is for sending the output data stream to the
data consumption system; and generate first directed edges between
the source logical node, the intermediate logical nodes, and the
sink logical node according to an input substatement in each SQL
rule of the SQL rules, an output substatement in each SQL rule of
the SQL rules, or both the input substatement and the output
substatement.
20. The computer program product of claim 19, wherein the second
data flow diagram comprises a source operator, an intermediate
operator, and a sink operator coupled using second directed edges,
wherein the preset operator library comprises a common source
operator, a common intermediate operator, and a common sink
operator, and wherein when executed by the processor, the
instructions further cause the manager node to: compile the common
source operator to obtain the source operator in the second data
flow diagram; select, from the preset operator library, at least
one common intermediate operator for each logical node group of the
logical node groups; compile the at least one common intermediate
operator to obtain the intermediate operator in the second data
flow diagram; compile the common sink operator to obtain the sink
operator in the second data flow diagram; and generate the second
directed edges according to the first directed edges.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This is a continuation of U.S. patent application Ser. No.
16/261,014, filed on Jan. 29, 2019, now U.S. Pat. No. 11,132,402.
which is a continuation of International Patent Application No.
PCT/CN2017/094331 filed on Jul. 25, 2017, The International
Application claims priority to Chinese Patent Application No.
201610617253.2 filed on Jul. 29, 2016. All of the afore-mentioned
patent applications are hereby incorporated by reference in their
entireties.
TECHNICAL FIELD
[0002] The present disclosure relates to the big data computing
field, and in particular, to a stream computing method, apparatus,
and system.
BACKGROUND
[0003] In application fields such as a finance service, sensor
monitoring, and network monitoring, a data stream (also called data
flow) is real-time, volatile, sudden, disordered, and infinite. As
a system that can perform computing processing on a real-time data
stream, a stream computing system is increasingly widely
applied.
[0004] A directed acyclic graph (DAG) may be used to represent
processing logic of a streaming application (or generally referred
to as a stream application) deployed in the stream computing
system, and the DAG is also referred to as a data flow diagram
(DFD). Referring to FIG. 1, a data flow diagram 100 is used to
represent the processing logic of the stream application. Each
directed edge in the data flow diagram 100 represents a data
stream, each node represents an operator, and each operator in the
diagram has at least one input data stream and at least one output
data stream. An operator is a smallest unit that is in the stream
computing system and that may be scheduled to execute a computing
task, and the operator may also be referred to as an execution
operator.
[0005] When the stream application is deployed in the stream
computing system, a user needs to first construct a data flow
diagram for the stream application, and then the stream application
is compiled and runs in the stream system in a data flow diagram
form to execute a task for processing a data stream. The stream
computing system provides an integrated development environment
(IDE) for the user. The IDE provides a graphical user interface
that is used for constructing a data flow diagram, and the
graphical user interface includes several basic operators. The user
constructs a data flow diagram on the graphical user interface by
dragging a basic operator, and needs to configure various running
parameters for the data flow diagram.
[0006] Although a manner in which the data flow diagram is
constructed by dragging the basic operator is extremely intuitive,
a function of each basic operator provided in the IDE is
pre-divided at an extremely fine granularity to help the user
construct a data flow diagram. Consequently, complexity of
constructing a data flow diagram is relatively high, a data flow
diagram that is actually constructed by the user is relatively
complex, and overall computing performance of the data flow diagram
is relatively poor.
SUMMARY
[0007] To improve overall computing performance of a data flow
diagram, embodiments of the present application provide a stream
computing method, apparatus, and system. The technical solutions
are as follows.
[0008] The stream computing system is generally a distributed
computing architecture. The distributed computing architecture
includes a manager node and at least one worker node. A user
configures a data flow diagram in the manager node using a client,
and the manager node schedules each operator in the data flow
diagram to the worker node for running.
[0009] According to a first aspect, an embodiment of the present
application provides a stream computing method, and the method is
applied to a stream computing system including a manager node and a
worker node, and includes obtaining, by the manager node, input
channel description information, a structured query language (SQL)
statement, and output channel description information from a
client, generating, by the manager node, a data flow diagram
according to the input channel description information, the SQL
statement, and the output channel description information, where
the data flow diagram is used to define computational logic of
multiple operators for executing a stream computing task and a data
stream input/output relationship between the operators, and
controlling, by the manager node according to the data flow
diagram, an operator in the worker node to execute the stream
computing task, and scheduling the multiple operators to one or
more worker nodes in the stream computing system for execution.
[0010] The input channel description information is used to define
an input channel, and the input channel is a logical channel that
is used to input a data stream from a data production system to the
data flow diagram. The output channel description information is
used to define an output channel, and the output channel is a
logical channel that is used to output an output data stream of the
data flow diagram to a data consumption system.
[0011] In this embodiment of the present application, the manager
node generates the executable data flow diagram according to the
input channel description information, the SQL statement, and the
output channel description information, and then the manager node
controls, according to the data flow diagram, the worker node to
perform stream computing. This resolves, to some extent, a problem
that complexity of constructing a data flow diagram is relatively
high and overall computing performance of the generated data flow
diagram is relatively poor because a function of each basic
operator is divided at an extremely fine granularity when the data
flow diagram is constructed in a current stream computing system
using the basic operator provided by an IDE. An SQL is a relatively
common database management language, and the stream computing
system supports the SQL statement in constructing a data flow
diagram such that system usability can be improved, and user
experience can be improved. In addition, the user uses the SQL
statement using a programming language characteristic of the SQL
language to define processing logic of the data flow diagram, and
the manager node dynamically generates the data flow diagram
according to the processing logic defined using the SQL statement
such that overall computing performance of the data flow diagram is
improved.
[0012] With reference to the first aspect, in a first possible
implementation of the first aspect, the SQL statement includes
several SQL rules, and each SQL rule includes at least one SQL sub
statement, and generating, by the manager node, a data flow diagram
according to the input channel description information, the SQL
statement, and the output channel description information includes
generating, by the manager node, a first data flow diagram
according to the input channel description information, the several
SQL rules, and the output channel description information, where
the first data flow diagram includes several logical nodes, and
classifying, by the manager node, the logical nodes in the first
data flow diagram to obtain several logical node groups, and
selecting a common operator from a preset operator library
according to each logical node group, and generating a second data
flow diagram according to the selected common operator, where each
operator in the second data flow diagram is used to implement
functions of one or more logical nodes in a logical node group
corresponding to the operator.
[0013] In conclusion, according to the stream computing method
provided in this implementation, the user only needs to logically
write the SQL rule. The manager node generates the first data flow
diagram according to the SQL rule, where the first data flow
diagram includes the several logical nodes. Then, the manager node
classifies the logical nodes in the first data flow diagram using
the preset operator library, and converts each logical node group
into an operator in the second data flow diagram, where each
operator in the second data flow diagram is used to implement
logical nodes that belong to a same logical node group in the first
data flow diagram. In this way, the user neither needs to have a
stream programming thought nor needs to care about classification
logic of an operator, and a data flow diagram can be constructed
provided that the SQL rule is logically written. The manager node
generates an operator in the data flow diagram such that code
editing work of constructing a stream computing application by the
user is reduced, and complexity of constructing the stream
computing application by the user is reduced.
[0014] With reference to the first possible implementation of the
first aspect, in a second possible implementation of the first
aspect, the first data flow diagram includes a source logical node,
an intermediate logical node, and a sink logical node that are
connected by directed edges, and the generating, by the manager
node, a first data flow diagram according to the input channel
description information, the several SQL rules, and the output
channel description information includes generating, by the manager
node, the source logical node in the first data flow diagram
according to the input channel description information, where the
source logical node is used to receive an input data stream from
the data production system, generating, by the manager node, the
intermediate logical node in the first data flow diagram according
to a select substatement in each SQL rule, where the intermediate
logical node is used to indicate computational logic for computing
the input data stream, and each intermediate logical node
corresponds to one SQL rule, generating, by the manager node, the
sink logical node in the first data flow diagram according to the
output channel description information, where the sink logical node
is used to send an output data stream to the data consumption
system, and generating, by the manager node, the directed edges
between the source logical node, the intermediate logical node, and
the sink logical node according to an input substatement and/or an
output substatement in each SQL rule.
[0015] In conclusion, according to the stream computing method
provided in this implementation, the input substatement, the select
substatement, and the output substatement in the SQL statement are
converted in the stream computing system, and the stream computing
system supports the user in logically defining a logical node in
the data flow diagram using an SQL rule such that difficulty of
defining the stream computing application is reduced using an SQL
syntax familiar to the user, and a data flow diagram customized
manner with extremely high usability is provided.
[0016] With reference to the first or the second possible
implementation of the first aspect, in a third possible
implementation of the first aspect, the second data flow diagram
includes a source operator, an intermediate operator, and a sink
operator that are connected by directed edges, and the preset
operator library includes a common source operator, a common
intermediate operator, and a common sink operator, and classifying,
by the manager node, the logical nodes in the first data flow
diagram, selecting a common operator from a preset operator library
according to each logical node group, and generating a second data
flow diagram according to the selected common operator includes
compiling, by the manager node, the common source operator to
obtain the source operator in the second data flow diagram,
selecting, by the manager node from the preset operator library, at
least one common intermediate operator for each logical node group
that includes the intermediate logical node, and compiling the
selected common intermediate operator to obtain the intermediate
operator in the second data flow diagram, compiling, by the manager
node, the common sink operator to obtain the sink operator in the
second data flow diagram, and generating, by the manager node, the
directed edges between operators in the second data flow diagram
according to the directed edges between the source logical node,
the intermediate logical node, and the sink logical node.
[0017] In conclusion, according to the stream computing method
provided in this implementation, the manager node classifies the
multiple logical nodes in the first data flow diagram, and
implements, using a same common intermediate operator, logical
nodes that are classified into a same logical node group. The user
does not need to consider factors such as load balance and
concurrent execution, and the manager node determines generation of
the second data flow diagram according to the factors such as load
balance and concurrent execution such that difficulty of generating
the second data flow diagram by the user is further reduced,
provided that the user is capable of constructing the logic-level
first data flow diagram using the SQL.
[0018] With reference to any one of the first to the third possible
implementations of the first aspect, in a fourth possible
implementation, the controlling, by the manager node according to
the data flow diagram, the worker node to perform stream computing
includes scheduling, by the manager node, each operator in the
second data flow diagram to at least one worker node in the stream
computing system, where the worker node is configured to execute
the operator, generating, by the manager node according to an
output data stream of each operator, subscription publication
information corresponding to the operator, and configuring the
subscription publication information for the operator, and
generating, by the manager node according to an input data stream
of each operator, input stream definition information corresponding
to the operator, and configuring the input stream definition
information for the operator, where the subscription publication
information is used to indicate a manner of sending an output data
stream corresponding to a current operator, and the input stream
definition information is used to indicate a manner of receiving an
input data stream corresponding to the current operator.
[0019] In conclusion, according to the stream computing method
provided in this implementation, a subscription mechanism is set,
and a citation relationship between the input data stream and the
output data stream of each operator in the second data flow diagram
is decoupled such that each operator in the second data flow
diagram can still be dynamically adjusted after the second data
flow diagram is executed, and overall usability and maintainability
of the stream computing application are improved.
[0020] With reference to any one of the first to the fourth
possible implementations of the first aspect, in a fifth possible
implementation, the method further includes receiving, by the
manager node, first modification information from the client, where
the first modification information is information for modifying the
SQL rule, and adding, modifying, or deleting, by the manager node,
the corresponding intermediate operator in the second data flow
diagram according to the first modification information.
[0021] In conclusion, according to the stream computing method
provided in this implementation, the client sends the first
modification information to the manager node, and the manager node
adds, modifies, or deletes the intermediate operator in the second
data flow diagram according to the first modification information
such that the manager node can still dynamically adjust the
intermediate operator in the second data flow diagram after the
second data flow diagram is generated.
[0022] With reference to any one of the first to the fifth possible
implementations of the first aspect, in a sixth possible
implementation, the method further includes receiving, by the
manager node, second modification information from the client,
where the second modification information is information for
modifying the input channel description information, and adding,
modifying, or deleting the source operator in the second data flow
diagram according to the second modification information, and/or
receiving, by the manager node, third modification information from
the client, where the third modification information is information
for modifying the output channel description information, and
adding, modifying, or deleting the sink operator in the second data
flow diagram according to the third modification information.
[0023] In conclusion, according to the stream computing method
provided in this implementation, the client sends the second
modification information and/or the third modification information
to the manager node, and the manager node adds, modifies, or
deletes the source operator and/or the sink operator in the second
data flow diagram such that the manager node can still dynamically
adjust the source operator and/or the sink operator in the second
data flow diagram after the second data flow diagram is
generated.
[0024] According to a second aspect, a stream computing apparatus
is provided, where the stream computing apparatus includes at least
one unit, and the at least one unit is configured to implement the
stream computing method in any one of the first aspect or the
possible implementations of the first aspect.
[0025] According to a third aspect, a manager node is provided,
where the manager node includes a processor and a memory, the
processor is configured to store one or more instructions, the
instruction is instructed to be executed by the processor, and the
processor is configured to implement the stream computing method in
any one of the first aspect or the possible implementations of the
first aspect.
[0026] According to a fourth aspect, an embodiment of the present
application provides a computer readable storage medium, and the
computer readable storage medium stores an executable program for
implementing the stream computing method in any one of the first
aspect or the possible implementations of the first aspect.
[0027] According to a fifth aspect, a stream computing system is
provided, where the stream computing system includes a manager node
and at least one worker node, and the manager node is the manager
node in the third aspect.
BRIEF DESCRIPTION OF DRAWINGS
[0028] FIG. 1 is a schematic structural diagram of a data flow
diagram;
[0029] FIG. 2A is a schematic structural diagram of a stream
computing system according to an embodiment of the present
application;
[0030] FIG. 2B is a structural block diagram of a stream computing
system according to another embodiment of the present
application;
[0031] FIG. 3A is a structural block diagram of a manager node
according to an embodiment of the present application;
[0032] FIG. 3B is a structural block diagram of a manager node
according to another embodiment of the present application;
[0033] FIG. 4 is a schematic principle diagram of a stream
computing process according to an embodiment of the present
application;
[0034] FIG. 5 is a method flowchart of a stream computing method
according to an embodiment of the present application;
[0035] FIG. 6 is a schematic principle diagram of a stream
computing method according to an embodiment of the present
application;
[0036] FIG. 7 is a method flowchart of a stream computing method
according to another embodiment of the present application;
[0037] FIG. 8A and FIG. 8B are a method flowchart of a stream
computing method according to another embodiment of the present
application;
[0038] FIG. 8C is a schematic principle diagram of a stream
computing method according to another embodiment of the present
application;
[0039] FIG. 8D is a method flowchart of a stream computing method
according to another embodiment of the present application;
[0040] FIG. 8E is a method flowchart of a stream computing method
according to another embodiment of the present application;
[0041] FIG. 8F is a method flowchart of a stream computing method
according to another embodiment of the present application;
[0042] FIG. 9A is a schematic principle diagram of specific
implementation of a stream computing method according to an
embodiment of the present application;
[0043] FIG. 9B is a schematic principle diagram of specific
implementation of a stream computing method according to another
embodiment of the present application;
[0044] FIG. 10 is a structural block diagram of a stream computing
apparatus according to another embodiment of the present
application; and
[0045] FIG. 11 is a structural block diagram of a stream computing
system according to another embodiment of the present
application.
DESCRIPTION OF EMBODIMENTS
[0046] To make the objectives, technical solutions, and advantages
of the present application clearer, the following further describes
implementations of the present application in detail with reference
to the accompanying drawings. FIG. 2A shows a schematic structural
diagram of a stream computing system according to an embodiment of
the present application. For example, the stream computing system
is a distributed computing system, and the distributed computing
system includes a terminal 220, a manager node 240, and multiple
worker nodes 260.
[0047] The terminal 220 is an electronic device such as a mobile
phone, a tablet computer, a laptop portable computer, or a desktop
computer, and a hardware form of the terminal 220 is not limited in
this embodiment. A client runs in the terminal 220, and the client
is configured to provide a human-computer interaction entry between
a user and the distributed computing system. The client is capable
of obtaining input channel description information, several SQL
rules, and output channel description information according to user
input.
[0048] Optionally, the client is an original client provided by the
distributed computing system, or the client is a client
independently developed by the user.
[0049] The terminal 220 is connected to the manager node 240 using
a wired network, a wireless network, or a special-purpose hardware
interface.
[0050] The manager node 240 is a server or a combination of some
servers, and a hardware form of the manager node 240 is not limited
in this embodiment. The manager node 240 is a node for managing
each worker node 260 in the distributed computing system.
Optionally, the manager node 240 is configured to perform at least
one of resource management, active/standby management, application
management, or task management on each worker node 260. The
resource management is management on a computing resource of each
worker node 260. The active/standby management is active/standby
switching management implemented when a fault occurs on each worker
node 260. The application management is management on at least one
stream computing application running in the distributed computing
system. The task management is management on a computing task of
each operator in a stream computing application. In different
stream computing systems, the manager node 240 may have different
names, for example, a master node.
[0051] The manager node 240 is connected to the worker node 260
using a wired network, a wireless network, or a special-purpose
hardware interface.
[0052] The worker node 260 is a server or a combination of some
servers, and a hardware form of the worker node 260 is not limited
in this embodiment. Optionally, an operator in a stream computing
application runs on the worker node 260. Each worker node 260 is
responsible for a computing task of one or more operators. For
example, each process in the worker node 260 is responsible for a
computing task of one operator.
[0053] When there are multiple worker nodes 260, the multiple
worker nodes 260 are connected using a wired network, a wireless
network, or a special-purpose hardware interface.
[0054] It can be understood that, in a virtualization scenario, the
manager node 240 and the worker node 260 in the stream computing
system may be implemented using a virtual machine running on
commodity hardware. FIG. 2B shows a structural block diagram of a
stream computing system according to another embodiment of the
present application. For example, the stream computing system
includes a distributed computing platform including several
computing devices 22. At least one virtual machine runs in each
computing device 22, and each virtual machine is a manager node 240
or a worker node 260.
[0055] The manager node 240 and the worker node 260 are different
virtual machines (as shown in FIG. 2B) located in a same computing
device 22. Optionally, the manager node 240 and the worker node 260
are different virtual machines located in different computing
devices 22.
[0056] Optionally, more than one worker node 260 runs in each
computing device 22, and each worker node 260 is a virtual machine.
A quantity of worker nodes 260 that can run in each computing
device 22 depends on compute power of the computing device 22.
[0057] Optionally, the computing devices 22 are connected using a
wired network, a wireless network, or a special-purpose hardware
interface. Optionally, the special-purpose hardware interface is an
optical fiber, a cable of a predetermined interface type, or the
like.
[0058] That is, in this embodiment of the present application,
whether the manager node 240 is a physical entity or a logical
entity is not limited, and whether the worker node 260 is a
physical entity or a logical entity is not limited either. A
structure and a function of the manager node 240 are further
described below.
[0059] FIG. 3A shows a structural diagram of a manager node 240
according to an embodiment of the present application. The manager
node 240 includes a processor 241, a network interface 242, a bus
243, and a memory 244.
[0060] The processor 241 is separately connected to the network
interface 242 and the memory 244 using the bus 243.
[0061] The network interface 242 is configured to implement
communication between a terminal 220 and a worker node 260.
[0062] The processor 241 includes one or more processing cores. The
processor 241 implements a management function in a stream
computing system by running an operating system or an application
program module.
[0063] Optionally, the memory 244 may store an operating system 245
and an application program module 25 required by at least one
function. The application program module 25 includes an obtaining
module 251, a generation module 252, an execution module 253, and
the like.
[0064] The obtaining module 251 is configured to obtain input
channel description information, an SQL statement, and output
channel description information from a client.
[0065] The generation module 252 is configured to generate a data
flow diagram according to the input channel description
information, the SQL statement, and the output channel description
information, where the data flow diagram is used to define
computational logic of operators for executing a stream computing
task and a data stream input/output relationship between the
operators.
[0066] The execution module 253 controls, according to the data
flow diagram, the worker node to execute the stream computing
task.
[0067] In addition, the memory 244 may be implemented by a volatile
or non-volatile storage device of any type or a combination
thereof, such as a static random access memory (SRAM), an
electrically erasable programmable read-only memory (EEPROM), an
erasable programmable read-only memory (EPROM), a programmable
read-only memory (PROM), a read-only memory (ROM), a magnetic
memory, a flash memory, a magnetic disk, or an optical disc.
[0068] A person skilled in the art can understand that, the
structure shown in FIG. 3A does not constitute a limitation on the
manager node 240, and the manager node may include more or fewer
components than those shown in the diagram, or may combine some
components or have different component arrangements.
[0069] FIG. 3B shows an embodiment of a manager node 240 in a
virtualization scenario. As shown in FIG. 3B, the manager node 240
is a virtual machine (designated as VM) 224 running in a computing
device 22. The computing device 22 includes a hardware layer 221, a
virtual machine monitor (VMM) 222 running at the hardware layer
221, and a host machine (designated as Host) 223 and several
virtual machines running on the VMM 222. The hardware layer 221
includes but is not limited to an input/output (I/O) device, a
central processing unit (CPU), and a memory. An executable program
runs on the VM, and the VM invokes a hardware resource of the
hardware layer 221 by running the executable program and using the
Host 223 in a program running process to implement functions of the
obtaining module 251, the generation module 252, and the execution
module 253. Further, the obtaining module 251, the generation
module 252, and the execution module 253 may be included in the
executable program in a form of a software module or a function,
and the VM 224 runs the executable program by invoking resources
such as the CPU and the memory at the hardware layer 221 to
implement the functions of the obtaining module 251, the generation
module 252, and the execution module 253.
[0070] With reference to FIG. 2A-2B, and FIG. 3A-3B, an overall
process of stream computing performed in a stream computing system
is described below. FIG. 4 shows a schematic principle diagram of a
stream computing process according to an embodiment of the present
application. A data production system 41, a stream computing system
42, and a data consumption system 43 are included in the overall
stream computing process.
[0071] The data production system 41 is used to generate data. In
different implementation environments, the data production system
41 may be a finance system, a network monitoring system, a
manufacturing system, a web application system, a sensing and
detection system, or the like.
[0072] Optionally, a storage form of the data generated by the data
production system 41 includes but is not limited to at least one of
a file, a network data packet, or a database. The storage form of
the data is not limited in this embodiment of the present
application.
[0073] Optionally, in terms of hardware, the data production system
41 is connected to the stream computing system 42 using a hardware
line such as a network, an optical fiber, or a hardware interface
card. In terms of software, the data production system 41 is
connected to the stream computing system 42 using an input channel
411. The input channel 411 is a logical channel that is used to
input a data stream from the data production system 41 to a data
flow diagram in the stream computing system 42, and the logical
channel is used to implement an interconnection between the data
production system 41 and the stream computing system 42 in a
transmission path, a transmission protocol, a data format, a data
encoding/decoding scheme, and the like.
[0074] The stream computing system 42 generally includes a data
flow diagram including multiple operators. The data flow diagram
may be regarded as a stream computing application. The data flow
diagram includes a source operator 421, at least one intermediate
operator 422, and a sink operator 423. The source operator 421 is
used to receive an input data stream from the data production
system 41, and the source operator 421 is further used to send the
input data stream to the intermediate operator 422. The
intermediate operator 422 is used to compute the input data stream,
and an output data stream obtained by means of computing to a
next-level intermediate operator 422 or the sink operator 423. The
sink operator 423 is used to send the output data stream to the
data consumption system 43. The operators are scheduled by the
manager node in FIG. 2, and run on the multiple worker nodes 260 in
FIG. 2 in a distributed form. At least one operator runs on each
worker node 260.
[0075] Optionally, in terms of hardware, the stream computing
system 42 is connected to the data consumption system 43 using a
hardware line such as a network, an optical fiber, or a hardware
interface card. In terms of software, the stream computing system
42 is connected to the data consumption system 43 using an output
channel 431. The output channel 431 is a logical channel that is
used to output an output data stream of the stream computing system
42 to the data consumption system 43, and the logical channel is
used to implement an interconnection between the stream computing
system 42 and the data consumption system 43 in a transmission
path, a transmission protocol, a data format, a data
encoding/decoding scheme, and the like.
[0076] The data consumption system 43 is used to use the output
data stream computed by the stream computing system 42.
[0077] The data consumption system 43 persistently stores or reuses
the output data stream. For example, the data consumption system 43
is a recommendation system, and the recommendation system
recommends an interested web page, a text, audio, a video, shopping
information, and the like to a user according to the output data
stream.
[0078] The data flow diagram in the stream computing system 42 is
generated, deployed, or adjusted by the user using a client 44.
[0079] In an embodiment of the present application, a stream
computing system provides a data flow diagram construction manner
in which a data flow diagram is constructed using an SQL statement.
For example, FIG. 5 shows a flowchart of a stream computing method
according to an embodiment of the present application. An example
in which the stream computing method is applied to the manager node
shown in FIG. 2A-2B and FIG. 3A-3B is described in this embodiment.
The method includes the following steps.
[0080] Step 501: A manager node obtains input channel description
information, an SQL statement, and output channel description
information from a client.
[0081] A user sends the input channel description information, the
SQL statement, and the output channel description information to
the manager node using the client.
[0082] The input channel description information is used to define
an input channel, or the input channel description information is
used to describe an input manner of an input data stream, or the
input channel description information is used to describe
construction information of an input channel. The input channel is
a logical channel that is used to input a data stream from a data
production system to a data flow diagram.
[0083] Optionally, the input channel description information
includes at least one of transmission medium information,
transmission path information, data format information, or data
decoding scheme information. For example, one piece of input
channel description information includes an Ethernet medium, an
Internet Protocol (IP) address, a port number, a transmission
control protocol (TCP) data packet, and a default decoding scheme.
Another piece of input channel description information includes a
file storage path and an Excel file.
[0084] The SQL statement is used to define computational logic of
each operator in the data flow diagram, and an input data stream
and an output data stream of each operator. Optionally, each
operator has at least one input data stream, and each operator has
at least one output data stream.
[0085] The output channel description information is used to define
an output channel, or the output channel description information is
used to describe an output manner of an output data stream, or the
output channel description information is used to describe
construction information of an output channel. The output channel
is a logical channel that is used to output an output data stream
of the data flow diagram to a data consumption system.
[0086] Optionally, the output channel description information
includes at least one of transmission medium information,
transmission path information, data format information, or data
encoding scheme information. For example, one piece of output
channel description information includes a file storage path and a
comma-separated values (CSV) file.
[0087] The manager node receives the input channel description
information, the SQL statement, and the output channel description
information that are sent by the client.
[0088] Step 502: The manager node generates a data flow diagram
according to the input channel description information, the SQL
statement, and the output channel description information, where
the data flow diagram is used to define computational logic of
operators in stream computing and a data stream input/output
relationship among the operators.
[0089] Optionally, the SQL statement includes several SQL rules,
and each SQL rule is used to define computational logic of one
logical operator, and an input data stream and an output data
stream of the operator. Each SQL rule includes at least one SQL
substatement.
[0090] Optionally, each operator has at least one input data
stream, and each operator has at least one output data stream.
[0091] Optionally, an executable data flow diagram includes a
source operator (Source), an intermediate operator, and a sink
operator (Sink). The source operator is used to receive an input
data stream from the data production system, and input the input
data stream to the intermediate operator. The intermediate operator
is used to compute the input data stream from the source operator,
or the intermediate operator is used to compute an input data
stream from another intermediate operator. The sink operator is
used to send an output data stream to the data consumption system
according to a computing result from the intermediate operator.
[0092] Step 503: The manager node controls, according to the data
flow diagram, a worker node to execute a stream computing task.
[0093] The manager node controls, according to the data flow
diagram, each worker node in the stream computing system to execute
a stream computing task. The "data flow diagram" herein should be
understood as an executable stream application.
[0094] Optionally, the manager node schedules the generated data
flow diagram to each worker node for distributed execution.
Multiple worker nodes perform stream computing on the input data
stream from the data production system according to the data flow
diagram to obtain a final output data stream, and output the output
data stream to the data consumption system.
[0095] In conclusion, according to the stream computing method
provided in this implementation, the manager node generates the
executable data flow diagram according to the input channel
description information, the SQL statement, and the output channel
description information, and then the manager node controls,
according to the data flow diagram, the worker node to perform
stream computing. This resolves a problem that overall computing
performance of a generated data flow diagram is relatively poor
because a function of each basic operator is divided at an
extremely fine granularity when the data flow diagram is
constructed in a current stream computing system using the basic
operator provided by an IDE. An SQL is a relatively common database
management language, and the stream computing system supports the
SQL statement in constructing a data flow diagram such that
usability of constructing the data flow diagram by the user using
the SQL statement is ensured. In addition, the user uses the SQL
statement using a programming language characteristic of the SQL
language to define processing logic of the data flow diagram, and
the manager node dynamically generates the data flow diagram with a
proper quantity of operators according to the processing logic
defined using the SQL statement such that overall computing
performance of the data flow diagram is improved.
[0096] To more clearly understand a computing principle of the
stream computing method provided in the embodiment in FIG. 5,
referring to FIG. 6, from a perspective of a user, the user needs
to configure input channel description information 61a, configure a
service-related SQL rule 62a, and configure output channel
description information 63a, from a perspective of a manager node,
the manager node introduces an input data stream from a data
production system according to input channel description
information 61b, constructs an operator in a data flow diagram
using an SQL statement 62b, and sends an output data stream to a
data consumption system according to output channel description
information 63b, from a perspective of a worker node, the worker
node needs to execute a source operator (designated as Source), an
intermediate operator complex event processing (CEP), and a sink
operator (designated as Sink) in a stream computing application
that are generated by a manager node.
[0097] Step 502 may be implemented by several subdivided steps. In
an optional embodiment, as shown in FIG. 7, step 502 may be
replaced with step 502a and step 502b for implementation.
[0098] Step 502a: The manager node generates a first data flow
diagram according to the input channel description information,
several SQL rules, and the output channel description information,
where the first data flow diagram includes several logical
nodes.
[0099] Step 502b: The manager node classifies the logical nodes in
the first data flow diagram to obtain several logical node groups,
and selects a common operator corresponding to each logical node
group from a preset operator library, and generates a second data
flow diagram according to the selected common operator, where each
operator in the second data flow diagram is used to implement
functions of one or more logical nodes in a logical node group
corresponding to the operator.
[0100] Optionally, the first data flow diagram is a temporary
logic-level data flow diagram, and the second data flow diagram is
an executable code-level data flow diagram. The first data flow
diagram is a temporary data flow diagram obtained after one-tier
compiling is performed according to the several SQL rules in the
SQL statement, and the second data flow diagram is an executable
data flow diagram obtained after two-tier compiling is performed
according to the first data flow diagram. An operator in the second
data flow diagram may be assigned by means of management scheduling
to a worker node for execution.
[0101] After obtaining the input channel description information,
the several SQL rules, and the output channel description
information, the manager node first obtains the first data flow
diagram by means of one-tier compiling. The first data flow diagram
includes a source logical node, several intermediate logical nodes,
and a sink logical node that are connected by directed edges. The
first data flow diagram includes several logical nodes.
[0102] Then, the manager node classifies the logical nodes in the
first data flow diagram, and performs two-tier compiling on the
logical node groups in the first data flow diagram using the common
operator in the preset operator library to obtain the second data
flow diagram. Each operator in the second data flow diagram is used
to implement logical nodes in the first data flow diagram that are
classified into a same logical node group.
[0103] The common operator is a preset universal operator that is
used to implement one or more functions.
[0104] For example, one operator is used to implement a function of
one source logical node, or one operator is used to implement
functions of one or more intermediate logical nodes, or one
operator is used to implement a function of one sink logical
node.
[0105] For example, one operator is used to implement functions of
one source logical node and one intermediate logical node, one
operator is used to implement functions of one source logical node
and multiple intermediate logical nodes, one operator is used to
implement functions of multiple intermediate logical nodes, one
operator is used to implement functions of one intermediate logical
node and one sink node, or one operator is used to implement
functions of multiple intermediate logical nodes and one sink
node.
[0106] When the logical nodes in the first data flow diagram are
being classified, the manager node may classify the logical nodes
according to at least one factor of load balance, operator
concurrence, intimacy between the logical nodes, and mutual
exclusiveness between the logical nodes.
[0107] When the manager node performs classification according to
the load balance, the manager node classifies the logical nodes
with reference to compute power of each operator and a computing
resource consumed by each logical node such that a computing amount
of each operator is relatively balanced. For example, if compute
power of one operator is 100%, a computing resource that needs to
be consumed by a logical node A is 30%, a computing resource that
needs to be consumed by a logical node B is 40%, a computing
resource that needs to be consumed by a logical node C is 50%, and
a computing resource that needs to be consumed by a logical node D
is 70%, the logical node A and the logical node D are classified
into a same logical node group, and the logical node B and the
logical node C are classified into a same logical node group.
[0108] When the manager node performs classification according to
the operator concurrence, the manager node obtains a data stream
size of each input data stream, and determines, according to the
data stream size of each input data stream, a quantity of logical
nodes used to process the input data stream such that computing
speeds of all input data streams are the same or similar.
[0109] When the manager node performs classification according to
the intimacy between the logical nodes, the manager node computes
the intimacy between the logical nodes according to a type of an
input data stream and/or a dependency relationship between the
logical nodes, and then classifies logical nodes with higher
intimacy into a same logical node group. For example, if an input
data stream 1 is an input data stream of both the logical node A
and the logical node D, intimacy between the logical node A and the
logical node D is relatively high, and the logical node A and the
logical node D are classified into a same logical node group and
are implemented by a same operator such that a quantity of data
streams transmitted between operators can be reduced. For another
example, if an output data stream of the logical node A is an input
data stream of the logical node B, and the logical node B depends
on the logical node A, intimacy between the logical node A and the
logical node B is relatively high, and the logical node A and the
logical node B are classified into a same logical node group and
are implemented by a same operator such that a quantity of data
streams transmitted between operators can also be reduced.
[0110] When the manager node performs classification according to
the mutual exclusiveness between the logical nodes, the manager
node detects whether there is mutual exclusiveness in arithmetic
logic between the logical nodes, and classifies two logical nodes
into different logical node groups when there is mutual
exclusiveness in arithmetic logic between the two logical nodes.
Because a distributed computing system is based on concurrence and
coordination between multiple operators, mutually exclusive access
to a shared resource by the multiple operators is inevitable. To
avoid an access conflict, two mutually exclusive logical nodes are
classified into different logical node groups.
[0111] In conclusion, according to the stream computing method
provided in this embodiment, the user only needs to logically write
the SQL rule. The manager node generates the first data flow
diagram according to the SQL rule, where the first data flow
diagram includes the several logical nodes. Then, the manager node
classifies the logical nodes in the first data flow diagram using
the preset operator library, to obtain the several logical node
groups, and converts each logical node group into an operator in
the second data flow diagram, where each operator in the second
data flow diagram is used to implement logical nodes that belong to
a same logical node group. In this way, the user neither needs to
have a stream programming thought nor needs to care about
classification logic of an operator, and a flow diagram can be
constructed provided that the SQL rule is logically written. The
manager node generates an operator in the second data flow diagram
such that code editing work of constructing a stream computing
application by the user is reduced, and complexity of constructing
the stream computing application by the user is reduced.
[0112] An example of the foregoing stream computing method is
described below in detail in an embodiment in FIG. 8A and FIG.
8B.
[0113] FIG. 8A and FIG. 8B show a flowchart of a stream computing
method according to another embodiment of the present application.
This embodiment describes an example in which the stream computing
method is applied to the stream computing system shown in FIG. 2.
The method includes the following steps.
[0114] Step 801: A management node obtains input channel
description information, an SQL statement, and output channel
description information from a client.
[0115] 1. The input channel description information is used to
define an input channel, and the input channel is a logical channel
that is used to input a data stream from a data production system
to a data flow diagram.
[0116] An Extensible Markup Language (XML) file is used as an
example for the input channel description information, and an
example of the input channel description information is as
follows:
TABLE-US-00001 <channel name="tcp_channel_xdr" type="in">
//channel name tcp_channel_xdr, type input <transfers
type="tcp"> //transfer type: tcp <mode>server</mode>
// transmission mode: server <addr>127.0.0.1:8080;</ip>
//transmission address: 127.0.0.1:8080 </transfers>
<!--global data stream format definition, and encoding/decoding
format definition--> <schemadep> <schema name="XDR"
type="binary" > //schema name "XDR", type: binary file
<attribute name="MSISON" type="string" length="12"/>
//attribute name: MSISON type: string length: 12 < attribute
name="HOST" type="string" length="4"/> //attribute name: HOST
type: string length: 4 < attribute name= "Case ID" type="unit32"
/> //attribute name: Case ID type: 32-bit integer
</schema> </schemadep> <!--input data stream
definition--> <in> <stream name="cau_xdr"
decode="default" schema="XDR"/> //input data stream name:
cau_xdr decoding scheme: default, schema name:XDR </in>
</channel>.
[0117] A specific form of the input channel description information
is not limited in this embodiment of the present application, and
the foregoing example is merely an example for description.
[0118] Optionally, the input data stream from the data production
system is a TCP or UDP data stream, a file, a database, a
distributed file system (for example, Hadoop Distributed File
System, HDFS for short), or the like.
[0119] 2. An SQL is used to define computational logic of each
operator in the data flow diagram, and an input data stream and an
output data stream of each operator.
[0120] The SQL includes a data definition language (DLL) and a data
manipulation language (DML). When each operator in the data flow
diagram is defined using the SQL, the input data stream and/or the
output data stream are/is usually defined using the DLL language,
for example, a create substatement, and the computational logic is
defined using the DML language, for example, an insert into
substatement or a select substatement.
[0121] To define multiple operators in the data flow diagram, the
SQL statement generally includes multiple SQL rules, each SQL rule
includes at least one SQL substatement, and each SQL rule is used
to define a logical node in the data flow diagram.
[0122] For example, a set of typical SQL rules includes
[0123] insert into B . . .
[0124] select . . .
[0125] from A . . .
[0126] where . . . .
[0127] In the database field, an insert into substatement is a
statement for inserting data into a data table in the SQL, a select
substatement is a statement for selecting data from a data table in
the SQL, a from substatement is a statement for reading data from a
data table in the SQL, and a where substatement is a condition
statement added to the select substatement when data needs to be
selected from a data table according to a condition. In the
foregoing example, the input data stream is A, and the output data
stream is B.
[0128] In the SQL in this embodiment, the insert into substatement
is converted into a statement that is used to define an output data
stream, the select substatement is converted into a statement that
is used to indicate computational logic, the from substatement is
converted into a statement that is used to define an input data
stream, and the where substatement is converted into a statement
for selecting data.
[0129] For example, the several SQL rules entered by the user that
are used to configure a data flow diagram include the
following:
TABLE-US-00002 Create stream s_edr(TriggerType uint32,MSISDN
string,QuotaName string,QuotaConsumption uint32, QuotaBalance
uint32, CaseID uint32) as select * from tcp_channel_edr.edr_event;
//SQL rule 1 Create stream s_xdr(MSISDN string,Host string,CaseID
uint32,CI uint32,App_Category uint32,App_sub_Category
uint32,Up_Thoughput uint32,Down_Thoughput uint32) as select * from
tcp_channel_xdr.xdr_event; //SQL rule 2 insert into temp1 select
*form s_edr as a where a.QuotaName=`GPRS`and a.QuotaConsumption *
10 >=a.QuotaBalance * 8; //SQL rule 3 insert into
file_channel_result1.cep_result select b.*,1 as Fixnum from s_xdr
as a,temp1.win:time_sliding(15 sec) as b where a.MSISON= b.MSISDN;
//SQL rule 4 insert into file_channel_result2.cep_result select
MSISDN,App_Category,App_sub)_category; sum
(Up_Thoughput+Down_Thoughput) as Thoughput from
s_xdr.win:time_tumbling(5 min) group by
MSISDN,App_Category,APP_Sub_Category //SQL rule 5
[0130] For the SQL rule 1, an input data stream is tcp_channel_edr,
and an output data stream is s_edr. For the SQL rule 2, an input
data stream is tcp_channel_xdr, and an output data stream is s_xdr.
For the SQL rule 3, an input data stream is tcp_channel_edr, and an
output data stream is s. For the SQL rule 4, an input data stream
is s_xdr and temp1, and an output data stream is
file_channel_result1. For the SQL rule 5, an input data stream is
s_xdr, and an output data stream is file_channel_result2.
[0131] 3. The output channel description information is used to
define an output channel, and the output channel is a logical
channel that is used to send an output data stream to a data
consumption system.
[0132] An XML file is used as an example for the output channel
description information, and an example of the output channel
description information is as follows:
TABLE-US-00003 <channel name="file_channel_result"
type="out"> //channel name tcp_channel_xdr, type: output
<parameter> //parameter <type>file<type> //type:
file <mode>server</mode> // transmission mode: csv file
<line_terminator>\n< line_terminator >//line
terminator:\n
<file_name>/home/demo/result.csv</file_name> //file
name: /home/demo/result.csv </ parameter > <schema
name="RESULT_OUT" type="text" delimiter "," > //schema name:
RESULT_OUT, type: text, delimiter:, <attribute
name="TriggerType" type="uint32"/> //attribute name: TriggerType
type: uint32 <attribute name="MSISDN" type="string"/>
//attribute name: MSISDN type: string < attribute name="Case ID"
type="unit32" /> //attribute name: Case ID type: 32-bit integer
</schema> <!--output data stream definition-->
<out> <stream name="outevent" schema="RESULT_OUT"/>
//output stream name: outevent, schema name: RESULT_OUT
</out> </channel>.
[0133] Optionally, the input data stream from the data production
system is a TCP or User Datagram Protocol (UDP) data stream, a
file, a database, a distributed file system (for example, HADOOP
Distributed File System (HDFS)), or the like.
[0134] A first data flow diagram is a temporary data flow diagram
including a source logical node, an intermediate logical node, and
a sink logical node. The first data flow diagram is a logic-level
data flow diagram. A generation process of the first data flow
diagram may include step 802 to step 805.
[0135] Step 802: The manager node generates a source logical node
according to the input channel description information.
[0136] Optionally, the source logical node is used to receive an
input data stream from the data production system. Generally, each
source logical node is used to receive one input data stream from
the data production system.
[0137] Step 803: The manager node generates an intermediate logical
node according to each SQL rule in the SQL statement and a select
substatement in the SQL rule.
[0138] Optionally, for each SQL rule, the intermediate logical node
is generated according to computational logic defined by the select
substatement in the SQL rule.
[0139] For example, an intermediate logical node that is used to
compute the input data stream tcp_channel_edr is generated
according to a select statement in the SQL rule 1. For another
example, an intermediate logical node that is used to compute the
input data stream tcp_channel_xdr is generated according to a
select statement in the SQL rule 2.
[0140] Step 804: The manager node generates a sink logical node
according to the output channel description information.
[0141] Optionally, the sink logical node is used to send an output
data stream to the data consumption system. Generally, each sink
logical node is used to output one output data stream.
[0142] Step 805: The manager node generates a directed edge between
the source logical node and the intermediate logical node, a
directed edge between intermediate logical nodes, and a directed
edge between the intermediate logical node and the sink logical
node according to an input substatement and an output substatement
in the SQL rule.
[0143] An input edge of the intermediate logical node corresponding
to the SQL rule is generated according to a substatement in the SQL
rule. The other end of the input edge is connected to the source
logical node, or the other end of the input edge is connected to
another intermediate logical node.
[0144] An output edge of the intermediate logical node
corresponding to the SQL rule is generated according to an insert
into substatement in the SQL rule. The other end of the output edge
is connected to another intermediate logical node, or the other end
of the output edge is connected to the sink logical node.
[0145] For an intermediate logical node, an input edge is a
directed edge pointing to the intermediate logical node, and an
output edge is a directed edge pointing from the intermediate
logical node to another intermediate logical node or a sink logical
node.
[0146] For example, as shown in FIG. 8C, the first data flow
diagram includes a first source logical node 81, a second source
logical node 82, a first intermediate logical node 83, a second
intermediate logical node 84, a third intermediate logical node 85,
a fourth intermediate logical node 86, a fifth intermediate logical
node 87, a first sink logical node 88, and a second sink logical
node 89.
[0147] An output data stream tcp_channel_edr of the first source
logical node 81 is an input data stream of the first intermediate
logical node 83.
[0148] An output data stream tcp_channel_xdr of the second source
logical node 82 is an input data stream of the second intermediate
logical node 84.
[0149] An output data stream s_edr of the first intermediate
logical node 83 is an input data stream of the third intermediate
logical node 85.
[0150] An output data stream temp1 of the third intermediate
logical node 85 is an input data stream of the fourth intermediate
logical node 86.
[0151] An output data stream s_xdr of the second intermediate
logical node 84 is an input data stream of the fourth intermediate
logical node 86.
[0152] An output data stream s_xdr of the second intermediate
logical node 84 is an input data stream of the fifth intermediate
logical node 87.
[0153] An output data stream file_channel_result1 of the fourth
intermediate logical node 86 is an input data stream of the first
sink logical node 88.
[0154] An output data stream file_channel_result2 of the fifth
intermediate logical node 87 is an input data stream of the second
sink logical node 89.
[0155] It should be noted that, a sequence of performing step 802,
step 803, and step 804 is not limited in this embodiment.
Optionally, step 802, step 803, and step 804 are concurrently
performed, or step 802, step 803, and step 804 are sequentially
performed.
[0156] A second data flow diagram is an executable stream computing
application, and the second data flow diagram is a code-level data
flow diagram. A generation process of the second data flow diagram
may include step 806 to step 808.
[0157] Step 806: The manager node compiles a common source operator
to obtain a source operator in a second data flow diagram.
[0158] Optionally, the manager node selects the common source
operator from a preset operator library according to the source
logical node, and obtains the source operator in the second data
flow diagram by means of compilation according to the common source
operator.
[0159] Optionally, one or more common source operators are set in
the preset operator library, for example, a common source operator
corresponding to the TCP, a common source operator corresponding to
the UDP, a common source operator corresponding to a file type A, a
common source operator corresponding to a file type B, a common
source operator corresponding to a database type A, and a common
source operator corresponding to a database type B.
[0160] Optionally, the manager node classifies source logical nodes
into one logical node group, and each source logical node is
implemented as a source operator.
[0161] The manager node selects a corresponding common source
operator from the preset operator library according to the source
logical node in the first data flow diagram for compilation in
order to obtain the source operator in the second data flow
diagram. The source operator is used to receive an input data
stream from the data production system.
[0162] Step 807: The manager node selects, from a preset operator
library, at least one common intermediate operator for each logical
node group that includes the intermediate logical node, and
compiles the selected common intermediate operator to obtain an
intermediate operator in the second data flow diagram.
[0163] Optionally, the manager node classifies at least one
intermediate logical node to obtain several logical node groups,
selects, according to intermediate logical nodes that are
classified into a same logical node group, a common intermediate
operator corresponding to the logical node group from the preset
operator library, and obtains the intermediate operator in the
second data flow diagram by means of compilation according to the
common intermediate operator.
[0164] Optionally, one or more common intermediate operators are
set in the preset operator library, for example, a common
intermediate operator used to implement a multiplication operation,
a common intermediate operator used to implement a subtraction
operation, a common intermediate operator used to implement a
sorting operation, and a common intermediate operator used to
implement a screening operation. Certainly, a common intermediate
operator may have multiple types of functions, that is, the common
intermediate operator is an operator with multiple types of
computing functions. When a common intermediate operator has
multiple types of functions, multiple logical nodes can be
implemented on the common intermediate operator.
[0165] Because computing types and/or computing amounts of
intermediate logical nodes in the first data flow diagram are
different, the manager node classifies the intermediate logical
nodes according to at least one factor of load balance, a
concurrence requirement, intimacy between logical nodes, or mutual
exclusiveness between logical nodes, and compiles, using a same
common intermediate operator in the preset operator library,
intermediate logical nodes that are classified into a same logical
node group, to obtain an intermediate operator in the second data
flow diagram.
[0166] For example, the manager node classifies two intermediate
logical nodes with a small computing amount into a same group. For
another example, the manager node classifies an intermediate
logical node A, an intermediate logical node B, and an intermediate
logical node C into a same group, where an output data stream of
the intermediate logical node A is an input data stream of the
intermediate logical node B, and an output data stream of the
intermediate logical node B is an input data stream of the
intermediate logical node C. For still another example, the manager
node classifies an intermediate logical node A and an intermediate
logical node D that have a same input data stream into a same
group.
[0167] Step 808: The manager node compiles a common sink operator
to obtain a sink operator in the second data flow diagram.
[0168] Optionally, the manager node selects the common sink
operator from the preset operator library according to the sink
logical node, and obtains the sink operator in the second data flow
diagram by means of compilation according to the common sink
operator.
[0169] Optionally, one or more common sink operators are set in the
preset operator library, for example, a common sink operator
corresponding to the TCP, a common sink operator corresponding to
the UDP, a common sink operator corresponding to a file type A, a
common sink operator corresponding to a file type B, a common sink
operator corresponding to a database type A, and a common sink
operator corresponding to a database type B.
[0170] Optionally, the manager node classifies sink logical nodes
into one logical node group, and each sink logical node is
implemented as a sink operator.
[0171] The manager node selects a corresponding common sink
operator from the preset operator library according to the sink
logical node in the first data flow diagram for compilation in
order to obtain the sink operator in the second data flow diagram.
The sink operator is used to send a final output data stream to the
data consumption system.
[0172] For example, referring to FIG. 8C, the first source logical
node 81 in the first data flow diagram is compiled using a common
source operator to obtain a first source operator Source 1. The
second source logical node 82 in the first data flow diagram is
compiled using a common source operator to obtain a second source
operator Source 2. The first intermediate logical node 83 to the
fifth intermediate logical node 87 in the first data flow diagram
are classified into a same group, and are compiled using a same
common intermediate operator to obtain an intermediate operator
CEP. The first sink logical node in the first data flow diagram is
compiled using a common sink operator to obtain a first sink
operator Sink 1. The second sink logical node in the first data
flow diagram is compiled using a common sink operator to obtain a
second sink operator Sink 2.
[0173] Finally, the second data flow diagram includes the first
source operator Source 1, the second source operator Source 2, the
intermediate operator CEP, the first sink operator Sink 1, and the
second sink operator Sink 2.
[0174] Step 809: The manager node generates directed edges between
operators in the second data flow diagram according to the directed
edge between the source logical node and the intermediate logical
node, the directed edge between the intermediate logical nodes, and
the directed edge between the intermediate logical node and the
sink logical node.
[0175] The manager node correspondingly generates the directed
edges between the operators in the second data flow diagram
according to the directed edges in the first data flow diagram.
[0176] In this case, an executable data flow diagram is generated.
The data flow diagram may be regarded as a stream computing
application.
[0177] It should be noted that, a sequence of performing step 806,
step 807, and step 808 is not limited in this embodiment.
Optionally, step 806, step 807, and step 808 are concurrently
performed, or step 806, step 807, and step 808 are sequentially
performed.
[0178] Step 810: The manager node schedules operators in the second
data flow diagram to at least one worker node in a distributed
computing system, where the worker node is configured to execute
the operator.
[0179] The distributed computing system includes multiple worker
nodes, and the manager node schedules, according to a physical
execution plan determined by the manager node, the operators in the
second data flow diagram to the multiple worker nodes for
execution. Each worker node is configured to execute at least one
operator. Generally, at least one process runs on each worker node,
and each process is used to execute one operator.
[0180] For example, the first source operator Source 1 is scheduled
to a worker node 1, the second source operator Source 2 is
scheduled to a worker node 2, the intermediate operator CEP is
scheduled to a worker node 3, and the first sink operator Sink 1
and the second sink operator Sink 2 are scheduled to a worker node
4.
[0181] To decouple a data stream citation relationship between the
operators, a subscription mechanism is further introduced in this
embodiment.
[0182] Step 811: The manager node generates, according to an output
data stream of each operator, subscription publication information
corresponding to the operator, and configures the subscription
publication information for the operator.
[0183] The subscription publication information is used to indicate
a publication manner of an output data stream corresponding to a
current operator.
[0184] The manager node generates, according to the output data
stream of the current operator, the directed edge in the second
data flow diagram, and a topology structure between worker nodes,
subscription publication information corresponding to the current
operator.
[0185] For example, if an output data stream of the first source
operator Source 1 is tcp_channel_edr, a directed edge corresponding
to tcp_channel_edr in the second data flow diagram points to the
intermediate operator CEP, and a network interface 3 of the worker
node 1 is connected to a network interface 4 of the worker node 3,
the manager node generates subscription publication information for
publishing the output data stream tcp_channel_edr from the network
interface 3 of the worker node 1 in a predetermined form. Then, the
manager node delivers the subscription publication information to
the first source operator 1 on the worker node 1, and the first
source operator Source 1 publishes the output data stream
tcp_channel_edr according to the subscription publication
information. In this case, the first source operator Source 1
neither needs to care about a specific downstream operator nor
needs to care about a worker node on which the downstream operator
is located, provided that the output data stream is published from
the network interface 3 of the worker node 1 according to the
subscription publication information.
[0186] Step 812: The manager node generates, according to an input
data stream of each operator, input stream definition information
corresponding to the operator, and configures the input stream
definition information for the operator.
[0187] The input stream definition information is used to indicate
a receive manner of an input data stream corresponding to the
current operator.
[0188] The manager node generates, according to the input data
stream of the current operator, the directed edge in the second
data flow diagram, and a topology structure between worker nodes,
subscription information corresponding to the current operator.
[0189] For example, if an input data stream of the intermediate
operator CEP includes tcp_channel_edr, a directed edge
corresponding to tcp_channel_edr in the second data flow diagram is
from the first source operator Source 1, and the network interface
3 of the worker node 1 is connected to the network interface 4 of
the worker node 3, the manager node generates the input stream
definition information that is received from the network interface
4 in a predetermined form. Then, the manager node delivers the
input stream definition information to the intermediate operator
CEP on the worker node 3, and the intermediate operator CEP
receives the input data stream tcp_channel_edr according to the
input stream definition information. In this case, the intermediate
operator CEP neither needs to care about a specific upstream
operator nor needs to care about a worker node on which the
upstream operator is located, provided that the input data stream
is received from the network interface 4 of the worker node 3
according to the input stream definition information.
[0190] Step 813: The worker node executes each operator in the
second data flow diagram.
[0191] Each worker node executes each operator in the second data
flow diagram according to scheduling by the manager node. For
example, each process is responsible for a computing task of one
operator.
[0192] In conclusion, according to the stream computing method
provided in this embodiment, the manager node generates the
executable data flow diagram according to the input channel
description information, the SQL statement, and the output channel
description information, and then the manager node controls,
according to the data flow diagram, the worker node to perform
stream computing. This resolves a problem that overall computing
performance of a generated data flow diagram is relatively poor
because a function of each basic operator is divided at an
extremely fine granularity when the data flow diagram is
constructed in a current stream computing system using the basic
operator provided by an IDE. An SQL is a relatively common database
management language, and the stream computing system supports the
SQL statement in constructing a data flow diagram such that
usability of constructing the data flow diagram by the user using
the SQL statement is ensured. In addition, the user uses the SQL
statement using a programming language characteristic of the SQL
language to define processing logic of the data flow diagram, and
the manager node dynamically generates the data flow diagram with a
proper quantity of operators according to the processing logic
defined using the SQL statement such that overall computing
performance of the data flow diagram is improved.
[0193] Further, the manager node classifies the multiple logical
nodes in the first data flow diagram, and implements, using a same
common intermediate operator, logical nodes that are classified
into a same group. The user does not need to consider factors such
as load balance, concurrent execution, intimacy, and mutual
exclusiveness, and the manager node determines generation of the
second data flow diagram according to the factors such as load
balance, concurrent execution, intimacy, and mutual exclusiveness
such that difficulty of generating the second data flow diagram by
the user is further reduced, provided that the user is capable of
constructing the logic-level first data flow diagram using the
SQL.
[0194] Further, the subscription mechanism is set, and a citation
relationship between the input data stream and the output data
stream of each operator in the second data flow diagram is
decoupled such that the user can still dynamically adjust each
operator in the second data flow diagram in the stream computing
system after the second data flow diagram is executed, and overall
usability and maintainability of the stream computing application
are improved.
[0195] When the second data flow diagram is executed in the stream
computing system, and a service function is changed and adjusted in
an actual use scenario, the executed second data flow diagram also
needs to be changed for adapting to a new requirement. Different
from the other approaches in which the second data flow diagram
usually needs to be reconstructed, this embodiment of the present
application provides a capability of dynamically modifying the
executed second data flow diagram. For details, refer to FIG. 8D to
FIG. 8F.
[0196] After the second data flow diagram is executed, the user may
further modify the intermediate operator in the second data flow
diagram, as shown in FIG. 8D.
[0197] Step 814: The client sends first modification information to
the manager node.
[0198] The first modification information is information for
modifying the SQL rule, or the first modification information
carries a modified SQL rule.
[0199] If the intermediate operator in the second data flow diagram
needs to be modified, the client sends, to the manager node, the
first modification information that is used to modify the SQL
rule.
[0200] Step 815: The manager node receives the first modification
information from the client.
[0201] Step 816: The manager node adds, modifies, or deletes the
intermediate operator in the second data flow diagram according to
the first modification information.
[0202] Optionally, in a modification process in which an original
intermediate operator is replaced with a new intermediate operator,
the original intermediate operator may be deleted, and then the new
intermediate operator is added.
[0203] Step 817: The manager node reconfigures subscription
publication information and/or input stream definition information
for the modified intermediate operator.
[0204] Optionally, if an input data stream of the modified
intermediate operator is a newly added data stream or a changed
data stream, the manager node further needs to reconfigure the
input stream definition information for the intermediate
operator.
[0205] If an output data stream of the modified intermediate
operator is a newly added data stream or a changed data stream, the
manager node further needs to reconfigure the subscription
publication information for the intermediate operator.
[0206] In conclusion, according to the stream computing method
provided in this embodiment, the client sends the first
modification information to the manager node, and the manager node
adds, modifies, or deletes the intermediate operator in the second
data flow diagram according to the first modification information
such that the manager node can dynamically adjust the intermediate
operator in the second data flow diagram.
[0207] After the second data flow diagram is executed, the user may
further modify the source operator in the second data flow diagram,
as shown in FIG. 8E.
[0208] Step 818: The client sends second modification information
to the manager node.
[0209] The second modification information is information for
modifying the input channel description information, or the second
modification information carries modified input channel description
information.
[0210] If the source operator in the second data flow diagram needs
to be modified, the client sends, to the manager node, the second
modification information that is used to modify the input channel
description information.
[0211] Step 819: The manager node receives the second modification
information from the client.
[0212] Step 820: The manager node adds, modifies, or deletes the
source operator in the second data flow diagram according to the
second modification information.
[0213] Optionally, in a modification process in which an original
source operator is replaced with a new source operator, the
original source operator may be deleted, and then the new source
operator is added.
[0214] Step 821: The manager node reconfigures subscription
publication information for the modified source operator.
[0215] Optionally, if an output data stream of the modified source
operator is a newly added data stream or a changed data stream, the
manager node further needs to reconfigure the subscription
publication information for the source operator.
[0216] In conclusion, according to the stream computing method
provided in this embodiment, the client sends the second
modification information to the manager node, and the manager node
adds, modifies, or deletes the source operator in the second data
flow diagram according to the second modification information such
that the manager node can dynamically adjust the source operator in
the second data flow diagram.
[0217] After the second data flow diagram is executed, the user may
further modify the sink operator in the second data flow diagram,
as shown in FIG. 8F.
[0218] Step 822: The client sends third modification information to
the manager node.
[0219] The third modification information is information for
modifying the output channel description information, or the third
modification information carries modified output channel
description information.
[0220] If the sink operator in the second data flow diagram needs
to be modified, the client sends, to the manager node, the third
modification information that is used to modify the output channel
description information.
[0221] Step 823: The manager node receives the third modification
information from the client.
[0222] Step 824: The manager node adds, modifies, or deletes the
sink operator in the second data flow diagram according to the
third modification information.
[0223] Optionally, in a modification process in which an original
sink operator is replaced with a new sink operator, the original
sink operator may be deleted, and then the new sink operator is
added.
[0224] Step 825: The manager node reconfigures input stream
definition information for the modified sink operator.
[0225] Optionally, if an input data stream of the modified sink
operator is a newly added data stream or a changed data stream, the
manager node further needs to reconfigure the input stream
definition information for the sink operator.
[0226] In conclusion, according to the stream computing method
provided in this embodiment, the client sends the third
modification information to the manager node, and the manager node
adds, modifies, or deletes the sink operator in the second data
flow diagram according to the third modification information such
that the manager node can dynamically adjust the sink operator in
the second data flow diagram.
[0227] In a specific embodiment, as shown in FIG. 9A, a stream
computing system provides two types of clients for a user an
original client 92 provided by the stream computing system and a
client 94 secondarily developed by the user. An SQL application
programming interface (API) is provided for both the original
client 92 and the secondarily developed client 94, and the SQL API
is used to implement a function of defining a data flow diagram
using an SQL language. The user enters input/output channel
description information and an SQL statement at the original client
92 or the secondarily developed client 94, and the original client
92 or the secondarily developed client 94 sends the input/output
channel description information and the SQL statement to a manager
node (Master), that is, step 1 in the diagram.
[0228] The manager node (Master) establishes a connection to the
original client 92 or the secondarily developed client 94 using an
App connection service. The manager node (Master) obtains the
input/output channel description information and the SQL statement,
and an SQL engine 96 generates an executable data flow diagram
according to the input/output channel description information and
the SQL statement, that is, step 2 in the diagram.
[0229] The manager node (Master) further includes a stream platform
execution framework management module 98, and the stream platform
execution framework management module 98 is configured to implement
management transactions such as resource management, application
management, active/standby management, and task management. The SQL
engine 96 generates an executable data flow diagram. The stream
platform execution framework management module 98 plans and makes a
decision on an execution plan of the data flow diagram on each
worker node (Worker), that is, step 3 in the diagram.
[0230] A processing element container (PEC) on each worker node
(Worker) includes multiple processing elements (PEs), and each PE
is configured to invoke a source operator, or an intermediate
operator CEP, or a sink operator in the executable data flow
diagram. Each operator in the executable data flow diagram is
processed by means of coordination between PEs.
[0231] FIG. 9B shows a schematic principle diagram of specific
implementation of an SQL engine 96 according to an embodiment of
the present application. After obtaining input/output channel
description information and an SQL statement, the SQL engine 96
performs the following processes:
[0232] 1. The SQL engine 96 parses each SQL rule in the SQL
statement. 2. The SQL engine 96 generates a temporary first data
flow diagram according to a result of the parsing. 3. The SQL
engine 96 classifies logical nodes in the first data flow diagram
according to factors such as load balance, intimacy, and mutual
exclusiveness to obtain at least one logical node group, where each
logical node group includes one or more logical nodes. 4. The SQL
engine 96 simulates operator concurrence computing, and adjusts
each logical node group according to a result of the simulating
operator concurrence computing. 5. The SQL engine 96 generates a
second data flow diagram according to the adjusted logical node
group, and assigns, to an executable operator in the second data
flow diagram, logical nodes that are classified into a same logical
node group. 6. The SQL engine 96 parses each executable operator in
the second data flow diagram, and analyzes information such as a
computing requirement of each operator. 7. The SQL engine 96
generates a logical execution plan for each executable operator in
the second data flow diagram. 8. The SQL engine 96 performs code
editing optimization on the logical execution plan of the second
data flow diagram, and generates a physical execution plan. 9. The
SQL engine 96 sends the physical execution plan to the stream
platform execution framework management module 98, and the stream
platform execution framework management module 98 executes a stream
computing application according to the physical execution plan.
[0233] Step 1 to step 5 belong to a one-tier compilation process,
and step 6 to step 9 belong to a two-tier compilation process.
[0234] The following describes an apparatus embodiment of the
present application, and the apparatus embodiment corresponds to
the foregoing method embodiment. For details not described in
detail in the apparatus embodiment, refer to the description in the
foregoing method embodiment.
[0235] FIG. 10 shows a structural block diagram of a stream
computing apparatus 1000 according to an embodiment of the present
application. The stream computing apparatus 1000 may be implemented
as all or a part of a manager node 240 using a special-purpose
hardware circuit or a combination of software and hardware. The
stream computing apparatus 1000 includes an obtaining unit 1020, a
generation unit 1040, and an execution unit 1060.
[0236] The obtaining unit 1020 is configured to implement functions
of step 501 and step 801.
[0237] The generation unit 1040 is configured to implement
functions of step 502, step 502a, step 502b, and step 802 to step
808.
[0238] The execution unit 1060 is configured to implement functions
of step 503, and step 810 to step 812.
[0239] Optionally, the apparatus further includes a modification
unit 1080.
[0240] The modification unit 1080 is configured to implement
functions of step 815 to step 825.
[0241] For related details, refer to the method embodiments in FIG.
5, FIG. 6, FIG. 7, FIG. 8A, FIG. 8B, FIG. 8C, FIG. 8D, FIG. 8E, and
FIG. 8F.
[0242] Optionally, the obtaining unit 1020 is implemented by
executing an obtaining module 251 in a memory 244 using a network
interface 242 and a processor 241 of the manager node 240. The
network interface 242 is an Ethernet network interface card, an
optical fiber transceiver, a universal serial bus (USB) interface,
or another I/O interface.
[0243] Optionally, the generation unit 1040 is implemented by
executing a generation module 252 in the memory 244 using the
processor 241 of the manager node 240. A data flow diagram
generated by the generation unit 1040 is an executable distributed
stream computing application including multiple operators, and the
operators in the distributed stream computing application may be
assigned to different worker nodes for execution.
[0244] Optionally, the execution unit 1060 is implemented by
executing an execution module 253 in the memory 244 using the
network interface 242 and the processor 241 of the manager node
240. The network interface 242 is an Ethernet network interface
card, an optical fiber transceiver, a USB interface, or another I/O
interface. The processor 241 assigns the operators in the data flow
diagram to different worker nodes using the network interface 242,
and then the worker nodes perform data computing on the
operators.
[0245] Optionally, the modification unit 1080 is implemented by
executing a modification module (not shown in the diagram) in the
memory 244 using the processor 241 of the manager node 240.
[0246] It should be noted that, when the stream computing apparatus
provided in this embodiment generates the data flow diagram and
performs stream computing, division of the function modules is
merely used as an example for description. In practical
application, the functions may be allocated to different function
modules for completion according to a requirement, that is, an
internal structure of a device is divided into different function
modules to complete all or some of the functions described above.
In addition, the stream computing apparatus provided in the
foregoing embodiment pertains to a same concept as the method
embodiment of the stream computing method. For a specific
implementation process of the stream computing apparatus, refer to
the method embodiment. Details are not described herein again.
[0247] FIG. 11 shows a structural block diagram of a stream
computing system 1100 according to an embodiment of the present
application. The stream computing system 1100 includes a terminal
1120, a manager node 1140, and a worker node 1160.
[0248] The terminal 1120 is configured to perform the steps
performed by the terminal or the client in the foregoing method
embodiment.
[0249] The manager node 1140 is configured to perform the steps
performed by the manager node in the foregoing method
embodiment.
[0250] The worker node 1160 is configured to perform the steps
performed by the worker node in the foregoing method
embodiment.
[0251] The sequence numbers of the foregoing embodiments of the
present application are merely for illustrative purposes, and are
not intended to indicate priorities of the embodiments.
[0252] A person of ordinary skill in the art may understand that
all or some of the steps of the embodiments may be implemented by
hardware or a program instructing related hardware. The program may
be stored in a computer-readable storage medium. The storage medium
may include a ROM, a magnetic disk, or an optical disc.
* * * * *