U.S. patent application number 10/795304 was filed with the patent office on 2004-09-02 for method and system for installing program in parallel computer system.
This patent application is currently assigned to HITACHI, LTD.. Invention is credited to Aramaki, Hiromitsu, Takatsu, Hiroyuki, Tatsumi, Akio.
Application Number | 20040172628 10/795304 |
Document ID | / |
Family ID | 18603136 |
Filed Date | 2004-09-02 |
United States Patent
Application |
20040172628 |
Kind Code |
A1 |
Aramaki, Hiromitsu ; et
al. |
September 2, 2004 |
Method and system for installing program in parallel computer
system
Abstract
A distributing node initiates an install control program in
receiving nodes, and then broadcasts or multicasts program data to
the receiving nodes. Thereby, the installation of the program into
the nodes is carried out in shorter time. In this event, the
distributing node and the receiving nodes buffer the program data
in units of data block sizes of storage devices associated
therewith. The distributing node executes in parallel the
processing for storing data read from the storage device in a
buffer, and the processing for reading the data from the buffer and
broadcasting or multicasting the read data to the receiving node.
The receiving node executes in parallel the processing for storing
the data received from the distributing node in a buffer, and the
processing for reading the program data from the buffer and storing
the program data in the storage device thereof.
Inventors: |
Aramaki, Hiromitsu;
(Yokohama, JP) ; Takatsu, Hiroyuki; (Yokohama,
JP) ; Tatsumi, Akio; (Yokohama, JP) |
Correspondence
Address: |
MATTINGLY, STANGER & MALUR, P.C.
1800 DIAGONAL ROAD
SUITE 370
ALEXANDRIA
VA
22314
US
|
Assignee: |
HITACHI, LTD.
|
Family ID: |
18603136 |
Appl. No.: |
10/795304 |
Filed: |
March 9, 2004 |
Related U.S. Patent Documents
|
|
|
|
|
|
Application
Number |
Filing Date |
Patent Number |
|
|
10795304 |
Mar 9, 2004 |
|
|
|
09791698 |
Feb 26, 2001 |
|
|
|
6721612 |
|
|
|
|
Current U.S.
Class: |
717/176 ;
717/171 |
Current CPC
Class: |
G06F 8/61 20130101; Y10S
707/99952 20130101; G06F 8/60 20130101 |
Class at
Publication: |
717/176 ;
717/171 |
International
Class: |
G06F 009/445; G06F
009/44 |
Foreign Application Data
Date |
Code |
Application Number |
Mar 23, 2000 |
JP |
2000-087064 |
Claims
1. A method of installing a program in a parallel computer system
including plural nodes comprising: selecting one of said plural
nodes as a distributing node for delivering to remaining nodes of
said plural nodes a program to be executed in parallel in said
parallel computer system, and selecting said remaining nodes as
receiving nodes; providing each of said receiving nodes with an
install control program which receives and stores a program from
said distributing node; causing said distributing node to initiate
said install control program in each of said receiving nodes; and
broadcasting said program to be executed in parallel, from said
distributing node to said receiving nodes, and causing said
receiving nodes to simultaneously receive said program.
2. An installing method according to claim 1, wherein: said
distributing node confirms whether or not each of said receiving
nodes is powered on before initiating said install control program
in each of said receiving nodes, and broadcasts said program to
receiving nodes which are powered on.
3. An installing method according to claim 1, further comprising:
notifying all said receiving nodes that said program to be executed
in parallel has been delivered, after completing the delivery of
said program.
4. An installing method according to claim 1, wherein in
broadcasting said program to be executed in parallel, following
four steps are executed in parallel: a step of causing said
distributing node to read said program from a storage device, and
to store the read program in a first buffer; a step of causing said
distributing node to read said program from said first buffer, and
to send the read program to said receiving nodes; a step of causing
said receiving node to store said received program in a second
buffer; and a step of causing said receiving node to read a program
distributed from said second buffer, and to store the distributed
program in a storage device.
5. An installing method according to claim 4, wherein said
distributing node reads a program distributed in data block units
of said storage device, and stores the read program in said first
buffer.
6. An installing method according to claim 4., wherein said
receiving node reads said received program from said second buffer
in data block units of said storage device of said receiving node
itself, and stores the read program in said storage device of said
receiving node itself.
7. An installing method according to claim 4, wherein said
distributing node divides said first buffer by a block size of a
data block in said storage device, and manages said first buffer as
a ring buffer using a first buffer management table.
8. An installing method according to claim 7, wherein said first
buffer management table stores a three-value buffer state
indicative of a state in which a data block is being read or has
been completely transmitted to said receiving nodes, a state in
which said data block has been completely read, or a state in which
all data has been fully read, to manage said first buffer in
blocks.
9. An installing method according to claim 4, wherein said
receiving node divides said second buffer by a block size of a data
block in the storage device to manages said second buffer as a ring
buffer using a second buffer management table.
10. An installing method according to claim 9, wherein said second
buffer management table stores a three-value buffer state
indicative of a state in which data is being received or has been
completely written into a disk, or a state in which said data has
been completely received, or a state in which all data has been
completely written, to manage said second buffer in blocks.
11. A parallel computer system including plural nodes, comprising:
a storage device for storing a program which is executed in
parallel in said plural nodes of said parallel computer system; a
distributing node being one of said plural nodes, and for
delivering said program to remaining nodes of said plural nodes;
and receiving nodes for receiving said program delivered from said
distributing node, wherein said distributing node includes a master
install control unit, said master install control unit notifying
said receiving nodes, which receive said program, of delivery of
said program before delivering said program, reading said program
to be delivered from said storage device, and delivering said
program simultaneously to said receiving nodes; and said receiving
node includes an install control unit, said install control unit
receiving said program from said distributing node, and storing
said program in a storage device of said receiving node itself.
12. A parallel computer system according to claim 11, wherein said
master install control unit includes a first buffer, said first
buffer temporarily holding said program to be delivered which is
read from said storage device; and said master install control unit
performs in parallel an operation involved in reading said program
to be delivered from said storage device and writing said program
to be delivered in said first buffer, and an operation involved in
reading said program to be delivered from said first buffer and
transferring data to said receiving nodes.
13. A parallel computer system according to claim 12, wherein said
master install control unit includes a first buffer management
table, said first buffer management table managing said first
buffer for each data block size of said storage unit; and said
first buffer management table stores a three-value buffer state
indicative of a state in which a data block is being read or has
been completely transmitted to said receiving nodes, a state in
which said data block has been completely read, or a state in which
all data has been completely read.
14. A parallel computer system according to claim 11, wherein said
install control unit includes a second buffer, said second buffer
temporarily buffering said program delivered from said distributing
node; and said install control unit performs in parallel an
operation involved in receiving said program from said distributing
node and writing said program in said second buffer, and an
operation involved in reading said program delivered from said
second buffer and writing said program in said storage device of
said receiving node itself.
15. A parallel computer system according to claim 14, wherein said
install control unit includes a second buffer management table,
said second buffer management table managing said second buffer for
each data block size of said storage unit; and said second buffer
management table stores a three-value buffer state indicative of a
state in which said program to be delivered is being received or
has been completely written into said disk, a state in which said
program to be delivered has been completely received, or a state in
which all data has been completely written.
16. A parallel computer system according to claim 11, wherein said
master install control unit includes a power source control unit,
said power source control unit checking whether or not said
receiving node is powered on before delivering said program.
17. A medium storing a computer executable installing program for a
parallel computer system, said computer executable installing
program comprising: a step of causing a distributing node to
initiate an install control program in plural receiving nodes which
receive a program; a step of causing said distributing node to read
a program to be delivered in order to be executed in each node,
from a storage device of said distributing node itself, and to
store the read program in a first buffer; a step of causing said
distributing node to read said program from said first buffer, and
to send the read program to said receiving nodes; a step of causing
said receiving node to store a received program in a second buffer;
a step of causing said receiving node to read a distributed program
from said second buffer, and to store the read program in a storage
device of said receiving node itself; and a step of causing said
distributing node to notify all said receiving nodes of completion
of program delivery after completing delivery of said program.
Description
BACKGROUND OF THE INVENTION
[0001] The present invention relates to installation of a program
into nodes independent of one another in a parallel computer system
including plural nodes.
[0002] An installing method for use in a parallel computer system
is disclosed, for example, in JP-A-11-296349, wherein upon
completion of installation into a particular node, this node serves
as a server machine to sequentially install software into different
nodes to reduce time required for the installation in the overall
system.
[0003] Also, JP-A-6-309261 discloses a method including a step of
sending an install instruction from a server machine, a step of
requesting required software from client machines to the server
machine, and a step of starting installation of the software into
plural client machines.
[0004] Further, JP-A-6-59994 discloses a method including a step of
sending an install start instruction and install information from a
primary station computer device to plural secondary station
computer devices to install a program into plural client
machines.
[0005] A parallel computer system may include a number of nodes
ranging from several tens to several thousands or more because of
requirements imposed thereto to execute a large scale of
computations. When the same programs are incorporated into these
nodes, it is necessary to reduce time required for installing the
programs. In the prior art JP-A-11-296349, assuming that the number
of nodes in a system is N, and time required for installation per
node is T, time required for the installation into all nodes is
expressed by (log.sub.2N).times.T.
SUMMARY OF THE INVENTION
[0006] It is an object of the present invention to further reduce
the above installation time (log.sub.2N).times.T required for
installing into plural nodes.
[0007] The present invention is characterized by simultaneously
performing install processing in plural nodes by simultaneously
transferring data on a program to be installed, utilizing
communication means interconnecting the respective nodes.
[0008] An arbitrary node in a parallel computer system reads every
predefined amount of programs from a storage medium which stores
the programs, and delivers program data to all nodes, into which
the programs are to be installed, through the communication means.
Each node receives the data and writes the data into a storage
device of the node itself to install the same program in
parallel.
[0009] Also, a master install control program for distributing a
program is executed by one node or an install device in the
parallel computer system. The master install control program reads
a program from a storage medium which stores programs, and
transfers the read program. In this event, plural buffers are used
for communication of data associated with the reading and
transferring of the program.
[0010] A node receiving the program executes an install control
program for receiving the distributed data. The install control
program receives data on the program, which is to be executed in
the node, from the distribution node, and writes the received data
into a storage device of the node itself. Plural buffers are
utilized for communication of data during the reception of data and
the writing into the storage device.
[0011] The master install control program and the install control
program rely on the buffers to process in parallel the reading of
the program from the recording medium, the delivery of the read
program, the reception of the program, and the writing of the
program into the storage device, to reduce time required for
installing the program into plural nodes.
[0012] In an environment in which the present invention is
implemented under the best condition, transfer time is calculated
as follows. Assuming for example that the number of nodes is N; a
total data size of a program to be distributed is A; a predefined
amount of data size for distribution is B; time required for
reading the predefined amount of data is C; time required for
transferring the predefined amount of data to all nodes is D; time
required for receiving the predefined amount of data is E; and time
required for writing the predefined amount of data into an external
storage device is F, time required for installing the program into
all nodes is expressed by ((A/B).times.F)+(C+D+E). (C+D+E) is time
taken for transferring the first predefined amount of data in the
processing for writing the predefined amount of data into the
external storage device. Subsequently, the data read processing,
the transfer-to-node processing and the data reception processing
are performed in parallel through the buffers, so that time
required for the processing is included in time required for
writing data into the storage device.
[0013] As described above, since a program is distributed to all
nodes at one time, time required for installing the program into
all the nodes does not depend on the number of nodes N.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] FIG. 1 is a block diagram illustrating the configuration of
a parallel computer system;
[0015] FIG. 2 is a diagram illustrating a data flow during
installation of a program;
[0016] FIG. 3 is a block diagram illustrating the configuration of
a master install control program;
[0017] FIG. 4 is a block diagram illustrating the configuration of
an install control program;
[0018] FIG. 5 is a flow chart illustrating the flow of processing
executed by the master install control program;
[0019] FIG. 6 is a flow chart illustrating data read processing in
the master install control program;
[0020] FIG. 7 is a flow chart illustrating data transfer processing
in the master install control program;
[0021] FIG. 8 is a flow chart illustrating the flow of processing
executed by the install control program;
[0022] FIG. 9 is a flow chart illustrating data reception
processing in the install control program; and
[0023] FIG. 10 is a flow chart illustrating disk write processing
in the install control program.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0024] FIG. 1 illustrates the configuration of a parallel computer
system. In a parallel computer system 1, plural nodes, each
including a CPU, a memory and a disk drive, are interconnected
through an internal network 11.
[0025] For example, a node (1) 2 includes a computation unit (1) 21
having a CPU and a memory; a disk 22; and an install device 23 such
as a hard disk or a magnetic tape. The node (1) 2 is connected to
the internal network 11 through a communication device 12 which has
a broadcast or multicast transfer function. In this way, all nodes
(i.e., node (1) 2, node (2) 3, node (3) 4, node (4) 5, . . . , node
(n-1) 6 and node (n) 7) are interconnected to the internal network
11.
[0026] The disk 22 is an external storage device for storing a
distributed program, and may be implemented by a hard disk or the
like. The install device 23 is an external storage device for
storing programs to be distributed, and may be implemented by a
hard disk, a magnetic tape, an optical disk or the like.
Alternatively, instead of the external storage device connected to
each node, a storage region may be reserved on a memory in each
node, or a program may be directly stored in a work region.
[0027] FIG. 2 is a block diagram illustrating an example in which a
program to be distributed is read from the install device 23 and
installed into all the nodes through the internal network 11. Data
read processing 81 at the node (1) 2 reads every predefined amount
of the program to be distributed, to data dividing buffers 90 from
the install device 23 storing the program to be distributed. Data
transfer processing 82 transfers the data read into the buffers to
all the nodes to which the program is to be distributed, through
the internal network 11. One of all the nodes is represented by the
node (2) 3. In the node (2) 3, data reception processing 111
remains in a data waiting state, for waiting data from the network
11. As the program data is delivered by the data transfer
processing 82, the data reception processing 111 initiates to read
the transferred data into data dividing buffers 120. Disk write
processing 112 writes the data read into the buffers into a disk 33
of the node (2) 3.
[0028] A master install control program 80 including the data read
processing 81 and the data transfer processing 82, and an install
control program 120 including the data reception processing 111 and
the disk write processing 112 are stored in the storage device of
each node. Alternatively, the install control program 120 and the
master install control program 80 may be stored in a storage device
of a distributing node, such that the distributing node distributes
the install control program 120 to receiving nodes when the
parallel computer system is powered on. In this event, the
distributing node sequentially transfers the install control
program to the receiving nodes.
[0029] FIG. 3 illustrates the configuration of the master install
control program 80. In the node (1) 2 to which the install device
23 storing the program to be distributed is connected, the master
install control program 80 transfers every predefined amount of
data read from the install device 23 to all of the node (1) 2, node
(2) 3, node (3) 4, node (4) 5, . . . , node (n-1) 6 and node (n) 7
to which the program is distributed. In the following, the master
install control program 80 will be described in detail.
[0030] The master install control program 80 includes the data read
processing 81 for reading data from the install device 23 which
stores programs; the data transfer processing 82 for transferring
the data read in the data read processing 81 to all the nodes; data
dividing buffers 90 each of which stores the predefined amount of
data read from the install device 23 which stores programs; and
buffer management tables 100 for managing the data dividing buffers
90. Each buffer in the data dividing buffers 90 has a size
equivalent to that of the predefined amount read from the install
device 23.
[0031] The buffer management tables 100 store information
indicative of the states of the associated buffers to control the
data dividing buffers 90 used in the data read processing 81 and
the data transfer processing 82. The buffer management tables 100
include table (1) 101, table (2) 102, . . . , table (m-1) 103 and
table (m) 104 corresponding to buffer (1) 91, buffer (2) 92, . . .
, buffer (m-1) 93 and buffer (m) 94 in the data dividing buffers
90.
[0032] FIG. 4 illustrates the configuration of the install control
program 110.
[0033] The install control program 110 in each of the node (1) 2,
node (2) 3, node (3) 4, node (4)5, node (n-1) 6 and node (n) 7 is
initiated by the master install control program 80.
[0034] The install control program 110 includes data reception
processing 111 for receiving data transferred from the master
install control program 80; disk write processing 112 for writing
data read in the data reception processing 111 into a disk; data
dividing buffers 120 for storing every predefined amount of data
transferred from the master install control program 80 and received
in the data reception processing 111; and buffer management tables
130 for managing the data dividing buffers 120. Each buffer in the
data dividing buffers 120 has a size equivalent to that of the
predefined amount read from the install device 23.
[0035] The operation of the master install control program 80 in
the configuration of FIG. 3 will be described along flow charts
illustrated in FIGS. 5, 6 and 7
[0036] The flow chart in FIG. 5 illustrates the flow of processing
executed by the master install control program 80. First, the
master install control program 80 reserves a region in the data
dividing buffers 90 for storing data read from the install device
23, and a region in the buffer management tables 100 for managing
the data dividing buffers 90 (F50), and initializes the data
division management tables 100 to an initial state (F51). Next, the
program 80 initiates the install control programs 110 in all the
nodes (F52). Finally, the program 80 initiates the data read
processing 81 and the data transfer processing 82 in the master
install control program 80 (F53). In this event, the install
control programs 110 in all the nodes are sequentially initiated in
each of the nodes (F52).
[0037] Each of the receiving nodes is additionally provided with
confirmation means for confirming whether a node power source is ON
or OFF, and notification means for notifying a distributing node of
the power-source state of the node itself. The distributing node
may identify receiving nodes in which the power source is in ON
state, before initiating the install control program 110 in all the
nodes (F52), to initiate the install control programs 110 in the
operable receiving nodes (F52).
[0038] FIG. 6 is a flow chart illustrating the data read processing
81. The data read processing 81 is the processing for sequentially
reading the predefined amount of data from the install device 23
into buffer (1) 91, buffer (2) 92, . . . , buffer (m-1) 93 and
buffer (m) 94.
[0039] The data read processing 81 stores the location of the
buffer (1) 91 which is the head of the data dividing buffers 90
(F60). Next, the data read processing 81 finds a corresponding
table in the buffer management tables 100 from the stored buffer
location (F61), and checks the state of the buffer from the found
table (F62). A buffer may take one of the following four states: a
state (a reading-in-progress state) in which the predefined amount
of data is being read from the install device 23 into the buffer; a
state (a reading-completion state) in which the data has been
completely read into the buffer; a state (a transfer-completion
state) in which the data has been completely transferred to all the
nodes; and a state in which a program has been fully read from the
install device 23 (End of File). These states are represented by
numerical values 0, 1, 0, -1 from the first state. It should be
noted that the reading-in-progress state and the
transfer-completion state are synonym. When the buffer is in the
state "1," the processing 81 waits for an event (F63). When the
buffer is in the state "0" the processing 81 checks whether or not
data still remains in the install device 23 (F64). If data remains,
the predefined amount of data is read from the install device 23
into the buffer (F65). Then, the processing 81 transitions the
state of the buffer to the reading-completion state (F66). The
processing 81 finds and stores the location of the next buffer
(F67), and returns to F61. If no data remains in the install device
23, the processing 81 sets a table corresponding to the buffer
location to "-1" (F68), followed by termination of the flow. The
correspondence between the buffers and the tables is made by
reserving arrays of the same length. If the end location of the
array is reached in determining the next location, the head of the
array is pointed. Also, when the state of the buffer is set to "1"
(F66), and when the data transfer processing 82 is waiting for an
event when the state of the buffer is set to "-1" (F68), the data
transfer processing 82 is released from the event waiting state,
and forced to continue the processing.
[0040] FIG. 7 is a flow chart illustrating the data transfer
processing 82. The data transfer processing 82 is the processing
for transferring data read into the data dividing buffers 90 in the
data read processing 81 to the install control program 110 in each
of the nodes. The data transfer processing 82 stores the location
of the buffer (1) 91 which is the head of the data dividing buffers
90 (F70).
[0041] Next, the processing 82 finds a corresponding table in the
buffer management tables 100 from the stored buffer location (F71),
and checks the state of the buffer from the found table (F72, F74).
The buffer may take one of the four states similar to the
foregoing. When the buffer is in the state "0" the processing 82
waits for an event (F73). When the buffer is in the state "-1," the
processing 82 notifies all the nodes that the data has been fully
read (the end of data is reached) (F75), followed by termination of
the flow. When the buffer is in the state "1," the processing 82
transfers the data in the buffer to all the nodes (F76). Then, the
processing 82 transitions the state of the buffer to the
transfer-completion state (F77), stores the location of the next
buffer (F78), and returns to F71. If the end location of the array
is reached in determining the next location, the head of the array
is pointed in a manner similar to the foregoing. Also, when the
data read processing 81 is waiting for an event when the state of
the buffer is set to "0" (F77), the data read processing 81 is
released from the event waiting state, and forced to continue the
processing.
[0042] Next, the operation of the install control program 110 in
the configuration of FIG. 4 will be described along flow charts
illustrated in FIGS. 8, 9 and 10.
[0043] The flow chart in FIG. 8 illustrates the flow of processing
executed by the install control program 110. First, the program 110
reserves a region in the data dividing buffers 120 for storing data
transferred from the data transfer processing 82 in the master
install control program 80, and a region in the buffer management
tables 130 for managing the data dividing buffers 120 (F80), and
initializes the data-division management tables 130 (F81) into the
initial state. Finally, the program 110 initiates the data
reception processing 111 and the disk write processing 112 in the
install control program 110 (F82).
[0044] FIG. 9 is a flow chart illustrating the data reception
processing 111. The data reception processing 111 is the processing
for sequentially receiving the predefined amount of data from the
data transfer processing 82 in the master install control program
80 in buffer (1) 121, buffer (2) 122, buffer (m-1) 123 and buffer
(m) 124.
[0045] The data reception processing 111 stores the location of the
buffer (1) 121 which is the head of the data dividing buffers 120
(F90).
[0046] Next, the processing 111 finds a corresponding table in the
buffer management tables 130 from the stored buffer location (F91),
and checks the state of the buffer from the table (F92). The buffer
may take one of the following four states: a state (a
reception-in-progress state) in which data is being received from
the data transfer processing 82; a state (a receiving-completion
state) in which data has been completely received; a state (a
writing-completion state) in which data has been completely written
into a disk, and a state in which the end of data has been reached.
The respective states are represented by numerical values 0, 1, 0,
-1 from the first one. It should be noted that the
reception-in-progress state and the writing-completion state are
synonym. When the buffer is in the state "1," the processing 111
waits for an event (F93). When the buffer is in the state "0," the
processing 111 checks whether or not data to be transferred from
the data transfer processing 82 still remains (F94). If data
remains, data transferred from the data transfer processing 82 is
read into the buffer (F95). Then, the processing 111 transitions
the state of the buffer to the receiving-completion state (F96).
The processing 111 finds and stores the location of the next buffer
(F97), and returns to F91. If no data remains, the processing 111
sets a table corresponding to the buffer location to "-1" (F98),
followed by termination of the flow. The correspondence between the
buffers and the tables is made by reserving arrays of the same
length. If the end location of the array is reached in determining
the next location, the head of the array is pointed. Also, when the
state of the buffer is set to "-1" (F96), and when the data write
processing 112 is waiting for an event when the state of the buffer
is set to "-1" (F98), the data write processing 112 is released
from the event waiting state, and forced to continue the
processing.
[0047] FIG. 10 is a flow chart illustrating the disk write
processing 112. The disk write processing 112 is the processing for
writing data read into the data dividing buffers in the data
reception processing 111 into a disk connected to the node. The
disk write processing 112 stores the location of the buffer (1) 121
which is the head of the data dividing buffers 120 (F100).
[0048] Next, the processing 112 finds a corresponding table in the
buffer management tables 130 from the stored buffer location
(F101), and checks the state of the buffer from the found table
(F102, F104). The buffer may take one of the four states similar to
the foregoing. When the buffer is in the state "0," the processing
112 waits for an event (F103). When the buffer is in the state "-1"
the flow is terminated. When the buffer is in the state "1," the
processing 112 writes the data in the buffer into a disk (Fl05).
Then, the processing 112 transitions the state of the buffer to the
writing-completion state (F106), finds and stores the location of
the next buffer (F107), and returns to F101. If the end location of
the array is reached in determining the next location, the head of
the array is pointed in a manner similar to the foregoing. Also,
when the data reception processing 111 is waiting for an event when
the state of the buffer is set to "0" (F106), the data reception
processing 111 is released from the event waiting state, and forced
to continue the processing.
[0049] The foregoing description has been made for the embodiment
according to the present invention which distributes the
executed-in-parallel program in the parallel computer system
including plural nodes interconnected through the internal network.
It goes without saying, however, that the present invention can be
applied to a system which has plural computers connected to a
network and executed in parallel by multicasting program data from
a distributing node to receiving nodes.
* * * * *