Methods and systems for creating and communicating with computer processes Morrison, Conor P. ; et al. [Microsoft Corporation]


Patent Application Summary

U.S. patent application number 11/042293 was filed with the patent office on 2005-01-24 and published on 2005-06-16 for methods and systems for creating and communicating with computer processes. This patent application is currently assigned to Microsoft Corporation. Invention is credited to Gopalan, Arvind; Morrison, Conor P.; Padisetty, Sivaprasad V.

Application Number	20050132384 11/042293
Family ID	25359184
Filed Date	2005-01-24

United States Patent Application	20050132384
Kind Code	A1
Morrison, Conor P. ; et al.	June 16, 2005

Methods and systems for creating and communicating with computer processes

Abstract

Disclosed are mechanisms for creating and communicating with computer processes. An application programming interface (API) presents services of the system to applications. The API is usable with all processes, local and remote, and is transparent with respect to the location of processes. A process table stores information about processes created using the system. The process table supports centralized process control and peer-to-peer process communication and synchronization. Each process is assigned a Universally Unique Identifier (UUID) that uniquely identifies the process no matter the computing device on which it runs. A parent UUID and a group UUID may be attached to the process and used for enforcing dependencies (e.g., for halting the process and all of its child processes) and for managing arbitrary, user-defined groups, respectively. A global event is associated with each process. When a process receives this event, it performs a controlled shutdown, cleans up, and reports status.

Inventors:	Morrison, Conor P.; (Seattle, WA) ; Padisetty, Sivaprasad V.; (Redmond, WA) ; Gopalan, Arvind; (Hacienda Heights, CA)
Correspondence Address:	MICROSOFT CORPORATION MICROSOFT PATENT GROUP DOCKETING DEPARTMENT ONE MICROSOFT WAY BUILDING 109 REDMOND WA 98052-6399 US
Assignee:	Microsoft Corporation Redmond WA
Family ID:	25359184
Appl. No.:	11/042293
Filed:	January 24, 2005

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
11042293	Jan 24, 2005
09872257	Jun 1, 2001

Current U.S. Class:	719/312 ; 719/313; 719/316; 719/328
Current CPC Class:	G06F 9/54 20130101; H04L 67/40 20130101; H04L 29/06 20130101
Class at Publication:	719/312 ; 719/313; 719/316; 719/328
International Class:	G06F 013/00

Claims

We claim:

1. A computer-readable medium having stored thereon a data structure, the data structure comprising: a first data field containing data representing a UUID associated with a process; and a second data field containing data representing a process identifier associated with the process by an operating system.

2. The data structure of claim 1, further comprising: a third data field comprising data representing a UUID associated with a parent process of the process.

3. The data structure of claim 1, further comprising: a third data field comprising data representing a UUID associated with a group comprising the process.

4. The data structure of claim 1, further comprising: a third data field comprising data representing a time of creation of the process; a fourth data field comprising data representing the most recent time that the process logged a heartbeat; and a fifth data field comprising data representing a type of the process.

5. The data structure of claim 1, further comprising: a third data field comprising data representing an identity of a computing device on which the data structure resides; and a fourth data field comprising data representing an identity of a computing device on which the process runs.

6. The data structure of claim 5, wherein the identities of the computing devices are represented by data in the set: name, IP address.

7. A computer-readable medium having stored thereon a data structure, the data structure comprising: a first data field containing data representing a type of the new process; a second data field containing data representing a UUID; and a third data field containing data representing a command line to execute to initiate the process.

8. The data structure of claim 7, wherein the UUID is a NIL UUID.

9. The data structure of claim 7, further comprising: a fourth data field comprising data representing a username to use when creating the process; and a fifth data field comprising data representing a password to use when creating the process.

10. The data structure of claim 7, further comprising: a fourth data field comprising data representing a directory in which to execute the process.

11. The data structure of claim 7, further comprising: a fourth data field comprising data representing a UUID of a parent of the process.

12. The data structure of claim 7, further comprising: a fourth data field comprising data representing a UUID of a group comprising the new process.

13. The data structure of claim 7, further comprising: a fourth data field comprising data representing a computing device on which the process will run.

14. The data structure of claim 13, wherein the data representing the computing device are in the set: name, IP address.

Description

CROSS-REFERENCE TO RELATED APPLICATION(S)

[0001] This application is a divisional application of and claims the benefit of U.S. patent application Ser. No. 09/872,257, filed Jun. 1, 2001, content of which is hereby incorporated by reference.

TECHNICAL FIELD

[0002] The present invention relates generally to computer operating systems, and, more particularly, to communications mechanisms for computer processes.

BACKGROUND OF THE INVENTION

[0003] Often, a process running on one computing device may need to create or communicate with a process on another device. The use of remote devices may simply be a convenience as, for example, when a program requires so many resources that it cannot effectively be run on one device. The work of the program may then be shared among several devices by invoking processes on the remote devices to perform pieces of the overall task. The results produced by the remote processes are collected in a central, coordinating process. In other cases, the use of remote devices is inherent in the nature of the work at hand. For example, communications protocols cannot be fully tested on one device. A script for testing a protocol may be run on a test host device. To perform the test, the script may start an application on a second device, start a peer application on a third device, and start an application on a fourth device to monitor the communications between the applications on the second and third devices.

[0004] Methods exist for a process running on a host computing device to create a process on a remote device. However, these methods provide much less functionality for communicating with the remote process than is available for processes running locally. Often, these methods only allow the host device to start the remote process, receive output from it, and terminate it. The termination is uncontrolled, not giving the remote process a chance to clean up before exiting. Another drawback of these methods is the distinction they draw between local and remote processes. This makes it very difficult to debug a program on one device and know that it will work correctly when it is running on multiple devices.

[0005] Even for purely local processes, current methods of communication are in some ways inadequate. Local processes may be limited in their ability to log ongoing status information. Termination of local processes may be as uncontrolled as for remote processes.

[0006] What is needed is a method that enhances the communications abilities of all processes and that provides the full functionality of local processes to processes on remote computing devices. The method would ideally hide the distinction between local and remote processes, allowing all processes to be treated in the same manner.

SUMMARY OF THE INVENTION

[0007] The above problems and shortcomings, and others, are addressed by the present invention, which can be understood by referring to the specification, drawings, and claims. The present invention provides mechanisms for creating and communicating with computer processes. An application programming interface (API) presents the services of the invention to applications. The API is usable with all processes, local and remote, and is transparent with respect to the location of processes. The invention also works with processes that do not use the API, although some enhanced services are available only to processes using the API.

[0008] A process table stores information about processes created using the invention. The process table is accessible by all processes, local and remote, and supports centralized process control and peer-to-peer process communication and synchronization. Locks are used to synchronize access to the process table.

[0009] Each process is assigned a Universally Unique Identifier (UUID) that uniquely identifies the process no matter the computing device on which it runs. A parent UUID and a group UUID may be attached to the process and used for enforcing dependencies (e.g., for waiting for or halting the process and all of its child processes) and for managing arbitrary, user-defined groups, respectively.

[0010] A global event is associated with each process. When a process receives this event, it performs a controlled shutdown, cleans up, and reports its status. Users define other global events and assign meanings to them. Global events form a generally useful message-passing mechanism.

[0011] At frequent intervals, processes and process threads log heartbeat entries in the process table. If a process or thread stops updating this field, then other processes can assume that this process or thread broke into the debugger. A process may log other information such as the number of its threads and the current status of the threads.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] While the appended claims set forth the features of the present invention with particularity, the invention, together with its objects and advantages, may be best understood from the following detailed description taken in conjunction with the accompanying drawings of which:

[0013] FIG. 1 is a schematic drawing of an exemplary environment in which the invention may be practiced: multiple computing devices running multiple processes and communicating with each other;

[0014] FIG. 2 is a block diagram generally illustrating an exemplary computer system that supports the present invention;

[0015] FIGS. 3A and 3B are flow charts showing the steps in creating a process using the invention; and

[0016] FIG. 4 is a schematic diagram of representative process tables.

DETAILED DESCRIPTION OF THE INVENTION

[0017] Turning to the drawings, wherein like reference numerals refer to like elements, the invention is illustrated as being implemented in a suitable computing environment. The following description is based on embodiments of the invention and should not be taken as limiting the invention with regard to alternative embodiments that are not explicitly described herein.

[0018] In the description that follows, the invention is described with reference to acts and symbolic representations of operations that are performed by one or more computers, unless indicated otherwise. As such, it will be understood that such acts and operations, which are at times referred to as being computer-executed, include the manipulation by the processing unit of the computer of electrical signals representing data in a structured form. This manipulation transforms the data or maintains them at locations in the memory system of the computer, which reconfigures or otherwise alters the operation of the computer in a manner well understood by those skilled in the art. The data structures where data are maintained are physical locations of the memory that have particular properties defined by the format of the data. However, while the invention is being described in the foregoing context, it is not meant to be limiting as those of skill in the art will appreciate that various of the acts and operations described hereinafter may also be implemented in hardware.

Creating and Communicating with Local and Remote Processes

[0019] The present invention provides services for creating and communicating with computer processes, whether the processes are all running locally on one computing device or are scattered among several remote devices. Information about processes is gathered into data structures called "process tables." The process tables are accessible by all processes, local and remote, and support centralized process control and peer-to-peer process communication and synchronization.

[0020] This section provides an overview of the mechanisms and capabilities of the invention and includes implementation details only when they are useful to illustrate the discussion. The following section expands on this overview by presenting, in great detail, an exemplary embodiment of the invention.

[0021] FIG. 1 shows an exemplary environment in which the invention may be practiced. It is a schematic drawing showing multiple computing devices 100, 102, and 104 running multiple processes and communicating with each other via a LAN 106. Computing device 100 is running four processes. The indentation is intended to show that Process 1 invokes Process 2, Process 2 invokes Process 3, and Process 3 invokes Process 4. For purposes of illustration, Process 1 is a command and control interface program. The user of the computing device 100 invokes other processes through this interface. Here, the user invokes Process 2 which coordinates and schedules jobs that may comprise several tasks. Process 2 invokes Process 3 which is a communications job. To do its work, Process 3 invokes Processes 4, 5, and 6. Processes 4 and 5 communicate with each other via the LAN 106, Process 4 running on computing device 100 and Process 5 running on computing device 102. Process 6 monitors the communications between Processes 4 and 5 and runs on computing device 104. The choice of a communications job is merely illustrative as the invention works with all single- or multi-process jobs.

[0022] Each computing device runs a service called "spsrv" that coordinates communications among the devices. The spsrv service listens for requests coming in to a device and processes them. These requests include requests to create a process, requests to provide updated status information, and requests to send information to a process. The spsrv service also sends out status updates and responses to enquiries. This service generally makes communications details transparent so that an application can deal with processes regardless of the device on which they are running. Details specific to remote communications are discussed in the section below entitled "Specific Considerations When Communicating with Remote Processes."

[0023] Each computing device contains a process table that has an entry for each process running on, or invoked by a process running on, the computing device. The process table 108 of computing device 100 contains six entries. The first four entries are for Processes 1 through 4, which run on the device. In addition, the process table contains entries for Processes 5 and 6, which do not run locally but were invoked by Process 3, which does run locally. Process table 110 on computing device 102 contains an entry for Process 5 because that process runs locally, even though the process was invoked on another device. Similarly, process table 112 on computing device 104 contains entries for Process 6, running locally though invoked remotely, and Process 7, running locally. Process 7 illustrates processes running on a computing device that have nothing to do with the job run by the user of computing device 100. Process tables are described in greater detail with reference to FIG. 4. For the moment, note that process tables are populated when a process is created and contain information useful for controlling and monitoring the processes.

[0024] The computing devices 100, 102, and 104 of FIG. 1 may be of any architecture. FIG. 2 is a block diagram generally illustrating an exemplary computer system that supports the present invention. The computing device 100 is only one example of a suitable environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing device 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in FIG. 2. The invention is operational with numerous other general-purpose or special-purpose computing environments or configurations. Examples of well-known computing systems, environments, and configurations suitable for use with the invention include, but are not limited to, personal computers, servers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices. In its most basic configuration, computing device 100 typically includes at least one processing unit 200 and memory 202. The memory 202 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.), or some combination of the two. This most basic configuration is illustrated in FIG. 2 by the dashed line 204. The computing device may have additional features and functionality. For example, computing device 100 may include additional storage (removable and non-removable) including, but not limited to, magnetic and optical disks and tape. Such additional storage is illustrated in FIG. 2 by removable storage 206 and non-removable storage 208. 
Computer-storage media include volatile and non-volatile, removable and non-removable, media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Memory 202, removable storage 206, and non-removable storage 208 are all examples of computer-storage media. Computer-storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory, other memory technology, CD-ROM, digital versatile disks (DVD), other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, other magnetic storage devices, and any other media which can be used to store the desired information and which can be accessed by device 100. Any such computer storage media may be part of device 100. Device 100 may also contain communications connections 210 that allow the device to communicate with other devices. Communications connections 210 are examples of communications media. Communications media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and include any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communications media include wired media, such as wired networks (including the LAN 106 of FIG. 1) and direct-wired connections, and wireless media such as acoustic, RF, infrared, and other wireless media. The term computer-readable media as used herein includes both storage media and communications media. The computing device 100 may also have input devices 212 such as a keyboard, mouse, pen, voice-input device, touch-input device, etc. Output devices 214 such as a display, speakers, printer, etc., may also be included. All these devices are well known in the art and need not be discussed at length here.

[0025] The services of the present invention are presented to applications by means of an Application Programming Interface (API). The API can be used with all processes, local and remote, and is transparent with respect to the location of a process. The API returns sensible values if a request fails because of a network problem and does not falter if remote devices are unavailable. If a process uses the API, then the process is called a "WINDOWS Test Technologies (WTT)-based process." The name "WTT" is of only historical interest, and the invention is not limited to use in the testing field or to use with Microsoft's "WINDOWS" operating systems. The invention works with any combination of WTT-based and non-WTT-based processes, although some enhanced services are available only to WTT-based processes. For purposes of this discussion, the services provided by the API are roughly divided into four major categories of communications tasks: creating processes, monitoring processes, waiting for processes, and sending signals to processes, especially termination signals.

[0026] Using the API, applications can create new processes and run them either on the local computing device or on a remote device. Each process is tagged by a Universally Unique Identifier (UUID) that uniquely identifies the process no matter the computing device on which it resides. In addition, a parent UUID and a group UUID may be assigned to the process and used for enforcing dependencies (e.g., for signaling the process and all of its child processes) and for managing arbitrary, user-defined groups, respectively. The process table stores information about processes created on the computing device, whether the process runs locally on the device or runs remotely. The process table is created as a memory-mapped file and is visible to all processes on the device. A global event is associated with each process created via the API and is used for process control and signaling.

[0027] FIGS. 3A and 3B illustrate the steps taken when a process is created by means of calls to the API. In step 300 of FIG. 3A, the API is called to create a process. The call is made by a parent application running on the "source" computing device. Steps 302, 304, and 306 set up information associated with the new process and record that information in the process table on the source device. If desired, a group UUID, parent UUID, or other information can be added to the process table (not shown). Step 308 asks whether the new process will run on the source device or on a remote device. If the new process is to run on the source device, as, for example, when Process 3 in FIG. 1 invokes Process 4, the new process is started in step 310. Otherwise, step 314 sends pertinent information about the new process to the spsrv service running on the remote device, called the "target" device, on which the process will run. This is the case when Process 3 in FIG. 1 invokes Process 6. The information necessary for invoking Process 6 is sent from the source device 100 to the target device 104. FIG. 3B illustrates what happens on the target device when it receives a request from the source device to run a process. After receiving the request in step 318, the target device creates an entry for the process in its process table, step 320, and runs the process, step 322. Note that in the case where the source and target devices are distinct, the process table on each device has an entry for the process. Process 6 shows up both in the process table 108 on the source device 100 and in the process table 112 on the target device 104. This is an implementation detail and is not necessary for the invention, but it helps when monitoring and controlling remote processes, as discussed further below.
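The flow of FIGS. 3A and 3B can be sketched in a few lines of C++. This is only an illustrative model under stated assumptions: the names Entry, Device, handle_remote_create, and create_process are hypothetical, a UUID is modeled as a plain string, and the hand-off to the target device is a direct function call rather than a request to the spsrv service over a network.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration: a UUID is modeled as a string
// and a computing device as a name plus its process table.
struct Entry {
    std::string uuid;
    std::string source_device;  // device where the process was invoked
    std::string target_device;  // device where the process runs
    bool started;               // true only on the device that runs it
};

struct Device {
    std::string name;
    std::map<std::string, Entry> process_table;  // keyed by UUID

    // FIG. 3B, steps 318-322: record the request, then run the process.
    void handle_remote_create(const Entry& request) {
        Entry local = request;
        local.source_device = name;  // on the target, both fields name the target
        local.started = true;
        process_table[local.uuid] = local;
    }
};

// FIG. 3A: record the new process on the source device, then either start
// it there (step 310) or forward the request to the target (step 314).
void create_process(Device& source, Device& target, const std::string& uuid) {
    const bool local = (&source == &target);
    Entry entry = {uuid, source.name, target.name, local};
    source.process_table[uuid] = entry;
    if (!local) {
        target.handle_remote_create(entry);
    }
}
```

In this model, creating Process 6 from device 100 with device 104 as the target leaves one entry in each device's table, which mirrors the duplication described above for process tables 108 and 112.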

[0028] FIG. 4 is a schematic diagram of representative process tables. The tables are populated to reflect the situation in FIG. 1. The first field shown, the UUID assigned to each process, is a useful key into the process tables. Next, the Process ID is assigned by the operating system when the process is created. Because the operating system may not understand the UUID, the Process ID is available when operating system calls need to be made in relation to a process. The Parent UUID and Group UUID are optional fields and are discussed above. Creation Time marks the moment when the process began running. Heartbeat Time stores the last time that a WTT-based process posted a heartbeat update. Uses of the heartbeat timer are discussed further below. The Source Device and Target Device fields identify the computing device where the process was invoked and where it runs, respectively. For local processes such as Process 3, these fields contain the same value. The fields also contain the same value in the process table on the target machine, as shown by the Process 6 entry in Process Table 112. The specific semantics of these two fields are unimportant, as long as the values uniquely identify the devices. Some possible values are the name of the computing device and its IP address. The final field shown, Process Type, is a flag showing whether the process is aware of this API. A Process Table may contain other fields, not shown, and some of these other fields are discussed below. The fields illustrated in FIG. 4 are, arguably, the basic fields used by the API.
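As a rough illustration of the fields just described, a process table row might be modeled as follows. The type and function names (ProcessRow, ProcessTable, os_pid_for, runs_remotely) are hypothetical, UUIDs and times are modeled as text for brevity, and the table is an ordinary in-memory map rather than the memory-mapped file of the described embodiment.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

enum class ProcessType { NonWttBased = 1, WttBased = 2 };

// One row of a process table, following the fields shown in FIG. 4.
struct ProcessRow {
    std::string uuid;            // key into the table
    std::uint32_t process_id;    // assigned by the operating system
    std::string parent_uuid;     // optional; empty when unset
    std::string group_uuid;      // optional; empty when unset
    std::string creation_time;   // when the process began running
    std::string heartbeat_time;  // last heartbeat posted by a WTT-based process
    std::string source_device;   // where the process was invoked
    std::string target_device;   // where the process runs
    ProcessType type;            // whether the process is aware of the API
};

typedef std::map<std::string, ProcessRow> ProcessTable;

// Translates a UUID into the operating system's Process ID, for use when an
// operating system call must be made. Returns 0 if the UUID is unknown.
std::uint32_t os_pid_for(const ProcessTable& table, const std::string& uuid) {
    ProcessTable::const_iterator it = table.find(uuid);
    return it == table.end() ? 0 : it->second.process_id;
}

// A process is remote, from the point of view of the device holding this
// row, when its Target Device differs from the local device.
bool runs_remotely(const ProcessRow& row, const std::string& local_device) {
    return row.target_device != local_device;
}
```

The runs_remotely check is the decision an application makes before reading a field such as Heartbeat Time: a local row can be read directly, while a remote one requires a request to the target device.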

[0029] Because a process table is accessible to all processes on the computing device, mechanisms exist for coordinating access to the table. One mechanism involves software locks, both for the entire table and for each individual row. For example, a process updating its heartbeat time can lock access to its row while it writes the current time into the Heartbeat Time field. When a process is created or deleted, the entire process table is locked so that a row can be added or deleted without interference.
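A minimal sketch of this two-level locking follows, assuming an in-process table guarded by std::mutex objects. The described embodiment shares the table among processes through a memory-mapped file, which would call for cross-process locks instead; the class and member names here are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

struct Row {
    std::mutex row_lock;  // guards the fields of this row only
    long heartbeat_time;
    Row() : heartbeat_time(0) {}
};

class LockedProcessTable {
public:
    // Adding or deleting a row changes the table itself: take the table lock.
    void add_row(const std::string& uuid) {
        std::lock_guard<std::mutex> guard(table_lock_);
        rows_[uuid] = std::make_shared<Row>();
    }
    void delete_row(const std::string& uuid) {
        std::lock_guard<std::mutex> guard(table_lock_);
        rows_.erase(uuid);
    }
    // A heartbeat update touches one row: lock only that row while writing.
    bool update_heartbeat(const std::string& uuid, long now) {
        std::shared_ptr<Row> row = find_row(uuid);
        if (!row) return false;
        std::lock_guard<std::mutex> guard(row->row_lock);
        row->heartbeat_time = now;
        return true;
    }
    bool read_heartbeat(const std::string& uuid, long& out) {
        std::shared_ptr<Row> row = find_row(uuid);
        if (!row) return false;
        std::lock_guard<std::mutex> guard(row->row_lock);
        out = row->heartbeat_time;
        return true;
    }

private:
    // The table lock is held only long enough to locate the row. The shared
    // pointer keeps the row alive if it is deleted while still in use.
    std::shared_ptr<Row> find_row(const std::string& uuid) {
        std::lock_guard<std::mutex> guard(table_lock_);
        std::map<std::string, std::shared_ptr<Row> >::iterator it = rows_.find(uuid);
        if (it == rows_.end()) return std::shared_ptr<Row>();
        return it->second;
    }

    std::mutex table_lock_;
    std::map<std::string, std::shared_ptr<Row> > rows_;
};
```

Keeping the row lock separate from the table lock means that frequent heartbeat writes by many processes contend only briefly on the table lock and otherwise proceed independently.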

[0030] At frequent intervals, for each process, a monitor thread logs heartbeat entries in the Heartbeat Time field in the local process table. Each thread in a process updates a local heartbeat and the monitor thread keeps track of these local heartbeats, updating the heartbeat field in the local process table if all the threads are updating their local heartbeats. If any thread deadlocks and stops updating its local heartbeat, the monitor thread detects this, logs the fact, and either breaks into the debugger or marks the process as requiring assistance. When an application wants to monitor the heartbeat of a process, the application begins by looking up the entry for the process in the process table on the computing device on which the application is running. The application reads the Target Device field to see where the process is running. Then, if the target device is the local device, the application reads the Heartbeat Time field in the local process table. Otherwise, the target device is distinct from the local device and the application sends a request to the spsrv service running on the target device asking it to send the value of the Heartbeat Time of the process. For example, if Process 3 in FIG. 1 wants to know whether Process 6 is still running normally, that is to say, is still logging heartbeats, Process 3 would consult Process Table 108 on its local computing device 100. Reading the entry for Process 6, Process 3 discovers that Process 6 is running remotely, on computing device 104. (See FIG. 4.) Process 3 formulates a request and sends it to the computing device 104. That device reads its process table 112 and reports to Process 3 that the Heartbeat Time field of Process 6 currently reads "14:24:56". Process 3 compares that heartbeat time (adjusted, if necessary, for time zone differences) to its local clock and decides whether Process 6 is running or has broken into the debugger.
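The two halves of this heartbeat scheme, the monitor thread inside a process and the application observing it, can be sketched as two small functions. Times are modeled as integer seconds, and the function names and the max_age threshold are assumptions made for illustration; the text does not specify how stale a heartbeat must be before it is treated as stopped.

```cpp
#include <cassert>
#include <vector>

// One pass of the monitor thread: the process-level heartbeat advances only
// if every thread's local heartbeat is recent. Returns false, leaving the
// process heartbeat untouched, when some thread has stopped updating.
bool monitor_tick(const std::vector<long>& thread_heartbeats, long now,
                  long max_age, long& process_heartbeat) {
    for (long hb : thread_heartbeats) {
        if (now - hb > max_age) {
            return false;  // a thread has stalled, for example in a deadlock
        }
    }
    process_heartbeat = now;
    return true;
}

// The observer's decision: compare the reported Heartbeat Time with the
// local clock (already adjusted for time zone, if necessary).
bool heartbeat_is_live(long heartbeat_time, long now, long max_age) {
    return now - heartbeat_time <= max_age;
}
```

Because a stalled thread prevents the process-level field from advancing, an observer needs only the single Heartbeat Time value to notice that something inside the process has stopped.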

[0031] In addition to its heartbeat, a process may log other information including the number of its threads, the current status of the threads, console output, log file output, etc. An application wishing to monitor this output can use the same techniques described above with respect to heartbeats. The application can also obtain ongoing status information by requesting that a copy of new information written by the process be sent to the application as it is written. Using parent and group UUIDs, an application can monitor all of the processes in a dependency list or in a user-defined process group.
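To make the use of parent and group UUIDs concrete, the following sketch selects the processes to monitor from a table. The names TableEntry, dependency_list, and group_members are hypothetical, and UUIDs are modeled as strings; the traversal shown is one straightforward way to follow a parent-child chain, not necessarily the one used by the described embodiment.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct TableEntry {
    std::string uuid;
    std::string parent_uuid;  // empty when the process has no recorded parent
    std::string group_uuid;   // empty when the process belongs to no group
};

// Collects a process and all of its direct and indirect children by
// repeatedly adding entries whose parent is already in the result.
std::set<std::string> dependency_list(const std::vector<TableEntry>& table,
                                      const std::string& root) {
    std::set<std::string> result;
    result.insert(root);
    bool grew = true;
    while (grew) {
        grew = false;
        for (const TableEntry& e : table) {
            if (result.count(e.parent_uuid) != 0 && result.insert(e.uuid).second) {
                grew = true;
            }
        }
    }
    return result;
}

// Collects every process tagged with the given group UUID.
std::vector<std::string> group_members(const std::vector<TableEntry>& table,
                                       const std::string& group) {
    std::vector<std::string> members;
    for (const TableEntry& e : table) {
        if (e.group_uuid == group) {
            members.push_back(e.uuid);
        }
    }
    return members;
}
```

Applied to the processes of FIG. 1, the dependency list rooted at Process 3 contains Processes 3, 4, 5, and 6, while Process 7 is excluded because it has no parent in the chain.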

[0032] A process may wait for other processes to achieve a specified status, for example, to complete their initialization or to terminate. The API provides a function that waits until the processes achieve the status or until a timeout period elapses. The function checks the heartbeat of all WTT-based processes and, if a process is not logging heartbeats, then the process may be assumed to have broken into the debugger. Using the processes in FIG. 1 as an example, assume that Process 3 calls the API function to wait for Processes 4, 5, and 6 to complete their initialization. Because Processes 5 and 6 run on remote computing devices, the API function sends a wait request to those remote devices. Each device waits on the processes local to it and then reports the results to Process 3. For each process in the wait list, the returned status may be Completed Initialization, Still Initializing, or Heartbeat Stopped. Using UUIDs in the same manner as in process monitoring, a process can wait for all of the processes in a dependency list or in a user-defined process group. Note that because non-WTT-based processes do not update their Heartbeat Time field, it cannot be assumed that these processes broke into the debugger.
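The wait behavior can be sketched as a polling loop over process snapshots. This is an illustration under assumptions: WaitStatus, InitSnapshot, classify, and wait_for_init are hypothetical names, time advances in simulated one-second steps rather than by sleeping, and the poll callback stands in for reading the local process table or querying a remote device.

```cpp
#include <cassert>
#include <functional>
#include <vector>

enum class WaitStatus { CompletedInitialization, StillInitializing, HeartbeatStopped };

struct InitSnapshot {
    bool wtt_based;       // only WTT-based processes log heartbeats
    bool initialized;     // has the process reported completing initialization?
    long heartbeat_time;  // last logged heartbeat, in seconds
};

// Classifies one process. A stale heartbeat is held against WTT-based
// processes only, since other processes never update the field.
WaitStatus classify(const InitSnapshot& s, long now, long max_age) {
    if (s.initialized) return WaitStatus::CompletedInitialization;
    if (s.wtt_based && now - s.heartbeat_time > max_age) return WaitStatus::HeartbeatStopped;
    return WaitStatus::StillInitializing;
}

// Polls once per simulated second until no process is still initializing
// or the timeout elapses, then returns the last observed statuses.
std::vector<WaitStatus> wait_for_init(
        const std::function<std::vector<InitSnapshot>(long)>& poll,
        long start, long timeout, long max_age) {
    std::vector<WaitStatus> statuses;
    for (long now = start; now <= start + timeout; ++now) {
        statuses.clear();
        bool pending = false;
        for (const InitSnapshot& s : poll(now)) {
            statuses.push_back(classify(s, now, max_age));
            if (statuses.back() == WaitStatus::StillInitializing) pending = true;
        }
        if (!pending) break;
    }
    return statuses;
}
```

The loop ends early once every process has either completed initialization or stopped logging heartbeats, so a stopped process does not force the caller to wait out the whole timeout.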

[0033] When a job is divided into discrete processes, the processes often need to communicate among themselves to coordinate the tasks they perform. The API provides a generally useful signaling mechanism for this purpose in the form of Global Events. As an example, one particular event is the Controlled Shutdown. When a WTT-based process receives this event, it releases the resources it is using, reports its status, and performs a controlled shutdown. Users may define other Global Events and assign meanings to them. When a process receives an event, it responds in a fashion appropriate to the event's meaning. However, if a process receives an event it does not understand, it may terminate in an uncontrolled fashion. A process may use parent and group UUIDs to send an event to groups of processes.
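One way to picture the receiving side of Global Events is a table of handlers keyed by event name. The class EventReceiver and its methods are hypothetical, events are modeled as strings, and delivery is a direct call rather than an operating system event. The sketch reports an event that is not understood back to the caller instead of terminating, which is a choice made here for illustration only.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

class EventReceiver {
public:
    // Associates a meaning (a handler) with a named event.
    void on(const std::string& event, std::function<void()> handler) {
        handlers_[event] = std::move(handler);
    }

    // Runs the handler for the event. Returns false when the process does
    // not understand the event, leaving the consequence to the caller.
    bool deliver(const std::string& event) {
        std::map<std::string, std::function<void()> >::iterator it = handlers_.find(event);
        if (it == handlers_.end()) return false;
        it->second();
        return true;
    }

private:
    std::map<std::string, std::function<void()> > handlers_;
};
```

A process would register a handler for the Controlled Shutdown event that releases resources and reports status before exiting, and register further handlers for any user-defined events it takes part in.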

An Exemplary Application Programming Interface

[0034] The services provided by the invention as described in the previous section are presented again in this section but with more attention paid to the details of an exemplary API. In its specific details, this embodiment is oriented towards use with Microsoft's "WINDOWS" operating system, but the principles are applicable to other environments. This section begins by describing the fundamental data structures used in this embodiment.

[0035] Note that UUIDs are sometimes called GUIDs (Globally Unique Identifiers).

[0036] The variable types TCHAR and Tstring are used in the definitions below to provide source code compatibility between Unicode and non-Unicode machines. If the parameter _UNICODE is defined during the build, then TCHAR is defined to be the basic Unicode wide character type, "wchar_t"; otherwise it becomes the standard ASCII 8-bit signed "char." Similarly, Tstring is a string of TCHARs and becomes either the Unicode wide string, "wstring," or the ASCII "string."
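The switch described here can be sketched with a pair of conditional typedefs. On Windows, TCHAR is supplied by the platform headers, so the definitions below are only a portable stand-in; the macro TEXT_LITERAL and the function service_name are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <string>

#ifdef _UNICODE
typedef wchar_t TCHAR;         // wide character build
typedef std::wstring Tstring;
#define TEXT_LITERAL(s) L##s   // prefixes the literal to make it wide
#else
typedef char TCHAR;            // narrow character build
typedef std::string Tstring;
#define TEXT_LITERAL(s) s
#endif

// The same source line compiles in both builds.
Tstring service_name() {
    return Tstring(TEXT_LITERAL("spsrv"));
}
```

Code written against TCHAR and Tstring in this way needs no changes when the build setting is switched.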

[0037] WTTPROCESSPARAM

[0038] Describes the input parameters to the WTTCreateProcess call.

    // From the winbase.h file.
    #define MAX_COMPUTERNAME_LENGTH 31

    // Type of processes: WTT-based or not.
    #define WTT_PROC_TYPE_NONWTT_BASED 1
    #define WTT_PROC_TYPE_WTT_BASED 2
    // System processes and other non-WTT-based processes launched outside the
    // scope of the API.
    #define WTT_PROC_TYPE_SYSTEM_BASED 3

    typedef struct _WTTPROCESSPARAM
    {
        // Sizeof this structure (including this field). User needs to input a value of
        // sizeof(WTTPROCESSPARAM) for this.
        IN DWORD dwStructSizeOf;

        // Flags. Reserved: must be zero (MBZ).
        IN DWORD dwFlags;

        // Flags used in WTTCreateProcess. Only CREATE_NEW_CONSOLE,
        // CREATE_NEW_PROCESS, and DETACHED_PROCESS are currently
        // supported.
        IN DWORD dwCreateProcessFlags;

        // Is this a WTT-based process?
        IN DWORD dwProcessType;

        // The username and password to use when running the process. The password
        // is unencoded text but is encrypted before sending to the target device.
        IN TCHAR *szUserName;
        IN TCHAR *szPassword;

        // The command line to execute when starting the process.
        IN TCHAR *szCommandLine;

        // NULL or a debugger string such as "ntsd -g".
        IN TCHAR *szDebugger;

        // NULL or the UNC-style (e.g., \\machine\share\path . . .) name of a generated
        // log file.
        IN TCHAR *szLogFile;

        // The directory where the process is created. Can be NULL, which means use
        // the current directory for launching the process.
        IN TCHAR *szCurrentDirectory;

        // If the process was invoked remotely, then get the GUID from the caller.
        // From an external caller's perspective, this is not provided as an input.
        UUID Guid;

        // This optionally identifies a group with which the process is associated.
        UUID GroupGuid;

        // The GUID of the parent of this process. There may be a chain of parent-
        // child processes.
        UUID ParentGuid;

        // The identity of the target computing device, for example, its name or IP
        // address.
        IN TCHAR szTargetMachine[MAX_COMPUTERNAME_LENGTH + 1];
    } WTTPROCESSPARAM, *PWTTPROCESSPARAM;
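A caller would typically zero the structure, record its size, and copy the target device name into the fixed-length buffer with a length check. The sketch below illustrates that pattern on a trimmed, portable stand-in: ProcessParamSketch keeps only a few of the fields, uses fixed-width integers and narrow characters in place of DWORD and TCHAR, and fill_param is a hypothetical helper. It does not call WTTCreateProcess, whose signature is not given in this text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

const std::size_t kMaxComputerNameLength = 31;  // value quoted in the listing above
const std::uint32_t kProcTypeWttBased = 2;      // mirrors WTT_PROC_TYPE_WTT_BASED

// Trimmed stand-in for WTTPROCESSPARAM (assumption: most fields omitted).
struct ProcessParamSketch {
    std::uint32_t dwStructSizeOf;
    std::uint32_t dwFlags;        // reserved: must be zero
    std::uint32_t dwProcessType;
    const char* szCommandLine;
    char szTargetMachine[kMaxComputerNameLength + 1];
};

// Fills in the structure for a WTT-based process. Returns false, leaving
// the structure untouched, if the target name does not fit the buffer.
bool fill_param(ProcessParamSketch& param, const char* command_line,
                const char* target_machine) {
    if (std::strlen(target_machine) > kMaxComputerNameLength) return false;
    param = ProcessParamSketch();  // value-initialization zeroes every field
    param.dwStructSizeOf = static_cast<std::uint32_t>(sizeof(ProcessParamSketch));
    param.dwProcessType = kProcTypeWttBased;
    param.szCommandLine = command_line;
    std::strcpy(param.szTargetMachine, target_machine);  // length checked above
    return true;
}
```

Recording the structure size in its first field is a common way to let an API detect which version of a structure a caller was compiled against.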

[0039] By associating a group GUID with a set of processes, processes can communicate with all the processes in the set. This is similar to a "process group" in Windows NT or Unix.
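The group mechanism can be illustrated with a minimal sketch. It assumes a simplified process table in which GUIDs are held as plain strings; `ProcessEntry` and `processesInGroup` are hypothetical names introduced only for this illustration and are not part of the API.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one process-table row; GUIDs are plain strings here.
struct ProcessEntry {
    std::string guid;       // unique per process
    std::string groupGuid;  // empty when the process belongs to no group
};

// Return the GUIDs of all processes tagged with the given group GUID.
std::vector<std::string> processesInGroup(const std::vector<ProcessEntry>& table,
                                          const std::string& groupGuid) {
    std::vector<std::string> members;
    for (const ProcessEntry& entry : table) {
        if (!groupGuid.empty() && entry.groupGuid == groupGuid) {
            members.push_back(entry.guid);
        }
    }
    return members;
}
```

A caller could pass the resulting set to a signaling routine to reach every member of the group at once.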

[0040] 2 WTTPROCLISTINFO

[0041] Defines information relating to a process. WTTGetProcessListInfo returns this information. A pointer to this structure is passed as an input parameter to WTTOpenProcess. An application receives a handle to a process by calling WTTOpenProcess and can use that handle to monitor the process, even if the process was not created by the application.

    typedef struct _WTTPROCLISTINFO {
        // The GUID, Process ID, and type of the process. The process type can be:
        // WTT_PROC_TYPE_NONWTT_BASED (defined to be 1);
        // WTT_PROC_TYPE_WTT_BASED (2); or
        // WTT_PROC_TYPE_SYSTEM_BASED (3).
        UUID Guid;
        DWORD dwPid;
        DWORD dwProcType;
        // These variables are meaningful only if the process is WTT-based and is
        // logging heartbeats. For non-WTT-based processes, dwHBTime is zero and
        // ulLastHBUpdateTime is the time the process was created.
        DWORD dwHBTime;
        ULARGE_INTEGER ulLastHBUpdateTime;
        // The number of seconds since the process was created (reported as zero for
        // non-WTT-based processes).
        DWORD dwElapsedSeconds;
        // This is the status of the process. Its possible values are given below in
        // the section describing WTTGetProcessInfo. For non-WTT-based processes, the
        // reported status is WTTHANDLE_PROCSTATUS_UNDEFINED.
        DWORD dwProcStatus;
        // The module name (not fully qualified with path).
        TCHAR szModuleName[256];
    } WTTPROCLISTINFO, *PWTTPROCLISTINFO;

[0042] 3 WTTTHREADINFO

[0043] Holds information about a thread including the Thread Identifier and a list of comments. Comments may be pushed onto the stack, and the most recent comment may be popped off the stack and examined.

    typedef struct _WTTTHREADINFO {
        DWORD dwThreadId;
        // The Standard Template Library (STL) contains type-parameterized classes.
        // slThreadCommentStack is an STL stack of STL strings.
        stack <string> slThreadCommentStack;
    } WTTTHREADINFO, *PWTTTHREADINFO;
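The push and pop behavior of the comment stack might look like the following sketch. `ThreadInfo`, `pushComment`, and `popComment` are hypothetical names, and `unsigned long` stands in for DWORD so that the example does not depend on Windows headers.

```cpp
#include <stack>
#include <string>

// Portable stand-in for WTTTHREADINFO; DWORD is replaced by unsigned long.
struct ThreadInfo {
    unsigned long threadId = 0;
    std::stack<std::string> comments;
};

// Push a comment describing what the thread is about to do.
void pushComment(ThreadInfo& info, const std::string& comment) {
    info.comments.push(comment);
}

// Pop and return the most recent comment; returns an empty string if none remain.
std::string popComment(ThreadInfo& info) {
    if (info.comments.empty()) return std::string();
    std::string top = info.comments.top();
    info.comments.pop();
    return top;
}
```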

[0044] 4 WTTPROCESSINFO

[0045] Holds detailed process information.

    typedef class _WTTPROCESSINFO {
        // All members are public (can use a ctor and a dtor).
    public:
        DWORD dwProcType;
        // Status of the process (initialized, debug break, terminated, etc.). This is
        // the same as in the WTTPROCLISTINFO structure. That one is there for
        // convenience only.
        DWORD dwProcStatus;
        UUID Guid;
        DWORD dwProcPid;
        DWORD dwProcExitCode;
        // Time elapsed since the creation of the process.
        ULARGE_INTEGER ulElapsedTime;
        TCHAR *pszModuleName;
        TCHAR *pszCommandLine;
        TCHAR *pszTargetMachine;
        // Singly-linked list of thread information (used to store elements of type
        // WTTTHREADINFO).
        list <PWTTTHREADINFO> slThreadList;
        // List of log files associated with the process.
        list <string> slLogList;
        // List of variations covered.
        list <string> slVarnList;
    public:
        _WTTPROCESSINFO( ) {
            pszModuleName = new TCHAR[MAX_PATH];
            pszCommandLine = new TCHAR[MAX_CMD_LINE];
            pszTargetMachine = new TCHAR[MAX_COMPUTERNAME_LENGTH + 1];
        }
    } WTTPROCESSINFO, *PWTTPROCESSINFO;

5 WTTP_LOG_INFO

    typedef struct _WTTP_LOG_INFO {
        TCHAR szLogFileName[128]; // UNC path of log file.
    } WTTP_LOG_INFO, *PWTTP_LOG_INFO;

[0046] 6 HWTTPROCESS

[0047] This structure is opaque to the user and is used as a handle for future operations. This process-specific handle may be replaced by WTTHANDLE.

[0048] 7 WTTHANDLE

[0049] This data structure is opaque to the user and is used as a handle for future operations. This handle is capable of handling objects no matter their type--whether processes, events, mutexes, etc. For "WINDOWS" implementations, this handle is similar to the handles used by Win32 processes.

    typedef struct _WTT_HANDLE {
        // The exit status of the process as would be returned by a local call to the
        // Win32 function GetExitCodeProcess( ).
        DWORD dwStatus;
        // The Process Identifier of a created child.
        DWORD dwProcID;
        // Was the process successfully created? If not, then this is set to
        // ERROR_SERVICE_NOT_ACTIVE.
        DWORD dwProcCreationStatus;
        // This points to information such as the heartbeat timer, etc. This field is
        // opaque and only makes sense on the device on which the process is created.
        PWTT_SHAREDINFO pSharedInfo;
        // The current status of the process.
        DWORD dwProcStatus;
        // Store the following data in the process handle. While marshaling the
        // parameters, the offsets are clearly defined and the strings are put towards
        // the end of the buffer.
        // If the call comes from a remote device, then get the GUID from the caller.
        UUID Guid;
        TCHAR *szCommandLine;
        // The following two parameters are supplied in case the process needs to be
        // launched by a specified user.
        TCHAR *szUserName;
        TCHAR *szPasswd;
        // Both for storage in the local process table and for redirection.
        TCHAR *szTargetMachineName;
        TCHAR *szModule;
        // The object type can be WTT_PROC_OBJECT, WTT_EVENT_OBJECT, etc.
        DWORD dwObjectType;
        PHANDLE hObjectHandle;
    } WTT_HANDLE, *WTTHANDLE;

[0050] Having presented the data structures used in this implementation, the following describes the function calls provided by the API.

[0051] 8 WTTCreateProcess

[0052] Create a process, whether WTT-based or not. The user's input parameters are passed in as part of the WTTPROCESSPARAM structure. The returned structure pointer (pHWTTProcess) is opaque and is used in future calls. If UserName and Password are specified as part of the input structure, then the process is created with the logon credentials of the specified user.

[0053] The call is basically asynchronous in nature and returns as soon as possible after the process is successfully created or with a meaningful error value explaining why the process creation failed.

    DWORD WTTCreateProcess (
        IN OUT PWTTPROCESSPARAM pWTTProcessParam,
        OUT WTTHANDLE *pHWTTProcess
    );

[0054] Parameters:

[0055] pWTTProcessParam

[0056] Points to a structure of type WTTPROCESSPARAM, which contains the input parameters. Some of the fields in this structure are appropriately updated to store output values. For example, if the passed in GUID is "NIL" (see Note on UUIDs below), then the newly created GUID is stored when the function returns.

[0057] The following flags are supported in the WTTPROCESSPARAM structure's dwCreateProcessFlags field: CREATE_NEW_CONSOLE, CREATE_NEW_PROCESS, and DETACHED_PROCESS.

[0058] pHWTTProcess

[0059] An opaque pointer used in future calls to the API for accessing information about the process.

[0060] Return Values:

[0061] ERROR_SUCCESS if the process is successfully created, else Win32 error. In the latter case, the returned handle is NOT valid.

[0062] Implementation Notes:

[0063] This function assigns a GUID to the process that uniquely identifies the process no matter the device on which it runs. Then the function locks access to the process table and finds an empty slot in the table. Assigning the slot to the new process, this function stores in the slot the initial data for the process including its GUID, Parent GUID, Group GUID, etc. The parent of the process updates the heartbeat field and writes a zero value into the HB field. This makes it possible for the WTTWaitForMultipleObjects function to detect a DEBUG_BREAK that occurs before the creation of the Global Event.
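The slot-reservation step described above can be sketched as follows. This is a simplified, single-machine illustration: `Slot`, `ProcessTable`, and `reserveSlot` are hypothetical names, GUIDs are held as strings, a `std::mutex` stands in for the process table lock, and the table holds only two rows.

```cpp
#include <array>
#include <cstddef>
#include <mutex>
#include <string>

// Hypothetical process-table row; field names are illustrative only.
struct Slot {
    bool inUse = false;
    std::string guid, parentGuid, groupGuid;
    unsigned long heartbeat = 0;
};

struct ProcessTable {
    std::mutex globalLock;      // guards slot allocation
    std::array<Slot, 2> slots;  // deliberately tiny table for the sketch
};

// Reserve the first empty slot and record the initial data.
// Returns the slot index, or -1 when the table is full.
int reserveSlot(ProcessTable& table, const std::string& guid,
                const std::string& parentGuid, const std::string& groupGuid) {
    std::lock_guard<std::mutex> guard(table.globalLock);
    for (std::size_t i = 0; i < table.slots.size(); ++i) {
        Slot& slot = table.slots[i];
        if (!slot.inUse) {
            slot.inUse = true;
            slot.guid = guid;
            slot.parentGuid = parentGuid;
            slot.groupGuid = groupGuid;
            slot.heartbeat = 0;  // the parent writes zero; the child updates it later
            return static_cast<int>(i);
        }
    }
    return -1;
}
```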

[0064] If the process is to run on a remote device, then the parameters of the call are marshaled over the network and sent to the remote (target) device. The process is then created locally on the target device.

[0065] Once the new process starts, its status in the process table (the dwProcStatus field) is automatically updated.

[0066] 9 WTTSignalProcesses

[0067] Send a signal to the processes in a set. The set may include both WTT-based and non-WTT-based processes. The global event handle is set for each process. One currently defined signal is "terminate the process." On receipt of that signal, a process cleans up after itself and performs a controlled stop. Sending a terminate signal is similar to sending a "kill" signal.

    DWORD WTTSignalProcesses (
        IN DWORD nCount,
        IN WTTHANDLE *phWTTProcess,
        IN DWORD dwFlags
    );

[0068] Parameters:

[0069] nCount

[0070] The number of processes in the phWTTProcess array.

[0071] phWTTProcess

[0072] The set of processes to signal. This is an array of WTTHANDLEs for WTTProcesses as returned by the WTTCreateProcess and WTTOpenProcess functions.

[0073] dwFlags

[0074] The type of signal to send:

[0075] WTT_SIGNAL_PROCESS

[0076] Attempt a controlled stop by signaling the event associated with the process. It is the responsibility of non-WTT-based processes to check the global event.

[0077] WTT_TERMINATE_PROCESS

[0078] Force-terminate the process. This cannot be combined with WTT_SIGNAL_PROCESS.

[0079] WTT_TERMINATE_ALL_CHILDREN

[0080] This terminates all processes in a process tree. For every process in the process tree, internal process APIs are recursively used to terminate the children. The process table is searched to find all the descendants so that they can be signaled.
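One way to find the descendants by searching the process table is sketched below. `Row` and `findDescendants` are hypothetical names, GUIDs are held as strings, and the sketch assumes that the parent links contain no cycles.

```cpp
#include <string>
#include <vector>

// Hypothetical process-table row reduced to the two fields the search needs.
struct Row {
    std::string guid;
    std::string parentGuid;  // empty for a process with no recorded parent
};

// Collect every descendant of rootGuid by repeatedly scanning the table for
// rows whose parent has already been visited. Result order is not significant.
std::vector<std::string> findDescendants(const std::vector<Row>& table,
                                         const std::string& rootGuid) {
    std::vector<std::string> result;
    std::vector<std::string> pending{rootGuid};
    while (!pending.empty()) {
        std::string parent = pending.back();
        pending.pop_back();
        for (const Row& row : table) {
            if (row.parentGuid == parent) {
                result.push_back(row.guid);
                pending.push_back(row.guid);
            }
        }
    }
    return result;
}
```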

[0081] Return Values:

[0082] ERROR_SUCCESS if the signal is successfully sent, else Win32 error.

[0083] Implementation Notes:

[0084] For non-WTT-based processes, the standard global event handle is signaled. If a non-WTT-based process does not clean up within an acceptable period of time after being sent a WTT_SIGNAL_PROCESS signal, then the calling process can send a WTT_TERMINATE_PROCESS signal.

[0085] 10 WTTWaitForMultipleObjects

[0086] Wait for processes in a set to achieve a specified status, but stop waiting if a timeout period expires. The function checks the heartbeats of all WTT-based processes, and if a process is not logging heartbeats, then it is assumed to have broken into the debugger. This function is often used to wait for processes to terminate. In that case, the different possible scenarios on returning from this function are as follows:

[0087] all processes stopped successfully;

[0088] some processes stopped successfully, and some processes broke into the debugger; and

[0089] some processes stopped successfully, some broke into the debugger, and some did neither but are still logging heartbeats.

[0090] In the last case, the function timed out before all the processes were finished so the function returns the value WAIT_TIMEOUT.

[0091] A debug break cannot be declared for a non-WTT-based process because this type of process does not log heartbeats.

    DWORD WTTWaitForMultipleObjects (
        IN DWORD nCount,
        IN WTTHANDLE *phWTTProcess,
        IN BOOL fWaitAll,
        IN DWORD dwTimeoutInSeconds,
        IN DWORD dwDebugTimeoutInSeconds,
        IN DWORD dwWaitType,
        OUT DWORD *pdwSummaryStatus,
        OUT DWORD *pdwSummaryIndex
    );

[0092] Parameters:

[0093] nCount

[0094] The number of processes in the phWTTProcess array.

[0095] phWTTProcess

[0096] The set of processes stored as an array of WTTHANDLEs.

[0097] fWaitAll

[0098] TRUE means wait for all processes in the set. FALSE means wait for the first process to achieve the specified status.

[0099] dwTimeoutInSeconds

[0100] The function timeout period. The function waits no longer than this before returning. If a process does not achieve the specified status (e.g., terminated) during this period of time, its status is returned as WAIT_TIMEOUT.

[0101] dwDebugTimeoutInSeconds

[0102] If a process has not logged a heartbeat during this period, then the process is declared to have broken into the debugger. The value of this parameter may be smaller than the value of dwTimeoutInSeconds. A value of INFINITE is also possible which effectively ignores heartbeats.

[0103] If fWaitAll is TRUE, then the value of this parameter should be the maximum of the debug timeout values of all the processes in the monitored set.

[0104] dwWaitType

[0105] The type of status to wait for. These values cannot be combined. Many more statuses are possible; the following are currently implemented:

[0106] WTT_PROCESS_INITIALIZE

[0107] Wait for the processes to complete their initialization.

[0108] WTT_PROCESS_TERMINATE

[0109] Wait for the processes to finish.

[0110] pdwSummaryStatus

[0111] The address to receive the first failure status of the array (or NULL if this information is not desired). This field is meaningful only if the return value is ERROR_SUCCESS and if fWaitAll is FALSE.

[0112] pdwSummaryIndex

[0113] The address to receive the index corresponding to the summary status (or NULL if this information is not desired).

[0114] Return Values:

[0115] ERROR_SUCCESS if all the processes successfully achieve the specified status.

[0116] WAIT_TIMEOUT if the timeout expires before all the processes achieve the specified status. In this case, *pdwSummaryIndex and *pdwSummaryStatus are undefined.

[0117] WTT_ERROR_DEBUG_BREAK if a process breaks into the debugger. *pdwSummaryStatus contains WTT_ERROR_DEBUG_BREAK and the index of that process in the phWTTProcess array is returned in *pdwSummaryIndex. There could be several processes in such a state in which case pdwSummaryIndex points to the first one.

[0118] A Win32 error if the function call fails.

[0119] Implementation Notes:

[0120] When processes in the set run on a distributed set of computing devices, there may be one thread per process (or one per computing device) which the overall thread monitors.

[0121] For non-WTT-based processes, ulLastHBUpdateTime is the time the process was created and is not updated. No debug break can be declared for these processes.
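The outcome rules for a wait-all call can be sketched as a single evaluation over a snapshot of the monitored set. This is an illustration only: `Snapshot`, `WaitResult`, and `evaluateWaitAll` are hypothetical names, and the sketch classifies the state at the moment the function timeout expires rather than performing the wait itself.

```cpp
#include <cstddef>
#include <vector>

enum class WaitResult { Success, Timeout, DebugBreak };

// Hypothetical snapshot of one monitored process.
struct Snapshot {
    bool wttBased;                   // only WTT-based processes log heartbeats
    bool reachedStatus;              // e.g., the process has terminated
    unsigned secondsSinceHeartbeat;  // ignored for non-WTT-based processes
};

// Evaluate a wait-all over the set once the function timeout has elapsed.
// On DebugBreak, *index receives the position of the first offending process.
WaitResult evaluateWaitAll(const std::vector<Snapshot>& procs,
                           unsigned debugTimeoutSeconds, std::size_t* index) {
    bool allDone = true;
    for (std::size_t i = 0; i < procs.size(); ++i) {
        const Snapshot& p = procs[i];
        if (p.reachedStatus) continue;
        allDone = false;
        // A silent WTT-based process is assumed to have broken into the debugger.
        if (p.wttBased && p.secondsSinceHeartbeat > debugTimeoutSeconds) {
            if (index) *index = i;
            return WaitResult::DebugBreak;
        }
    }
    return allDone ? WaitResult::Success : WaitResult::Timeout;
}
```

Because non-WTT-based processes never log heartbeats, the sketch never reports a debug break for them, however long they have been silent.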

[0122] 11 WTTGetProcessInfo

[0123] Query the status of a process that was launched by the WTTCreateProcess function. After reviewing the information returned, WTTFreeProcessInfo is called to release the memory allocated by this function.

    DWORD WTTGetProcessInfo (
        IN WTTHANDLE phWTTProcess,
        OUT PWTTPROCESSINFO *ppWTTProcessinfo
    );

[0124] Parameters:

[0125] phWTTProcess

[0126] Process information is stored in a WTTHANDLE structure. The handle could have been obtained either by a call to WTTCreateProcess or by a call to WTTOpenProcess (after a call to WTTGetProcessListInfo).

[0127] Additionally, this could have a value of NULL. In that case, the information returned pertains to the process that called this function. This is useful when a non-WTT-based process wishes to get GUID information about itself, which it can then use to open a handle to the Global Event.

[0128] ppWTTProcessinfo

[0129] This stores information about the process being queried. The information includes the threads present, the stack of thread comments for each thread, a list of log files that this process monitors, and a list of variations completed by the process.

[0130] Return Values:

[0131] ERROR_SUCCESS if the request is successfully processed, else Win32 error.

[0132] Implementation Notes:

[0133] For WTT-based processes, the following information is returned:

[0134] a list of the threads present in the process;

[0135] a stack of comments stored on a per-thread basis;

[0136] a list of log files that are directly created by the process;

[0137] a list of variations covered by the process;

[0138] the module name;

[0139] the type of the process (WTT_PROC_TYPE_WTT_BASED); and

[0140] the current state of the process.

[0141] The data returned are stored in the form of simple linked lists or stacks. Small routines are provided to return the size, traverse, and list the contents of the lists or stacks.

[0142] For non-WTT-based processes, a list of thread identifiers, the module name, the type of the process, and the current state of the process are returned. The current state of the process may not be very accurate because non-WTT-based processes do not log heartbeats.

[0143] The process statuses are:

    WTTHANDLE_PROCSTATUS_UNDEFINED
    WTTHANDLE_PROCSTATUS_INITIALIZED
    WTTHANDLE_PROCSTATUS_RUNNING
    WTTHANDLE_PROCSTATUS_GE_CREATED (The Global Event is ready for signaling.)
    WTTHANDLE_PROCSTATUS_TERMINATED
    WTTHANDLE_PROCSTATUS_DEBUG_BREAK
    WTTHANDLE_PROCSTATUS_HANDLE_CLOSED

[0144] The macro GET_PROC_STATUS(pWTTProcessinfo->dwProcStatus) returns a string corresponding to the process status.

[0145] 12 WTTFreeProcessInfo

[0146] Release the memory allocated within the WTTPROCESSINFO structure during a WTTGetProcessInfo function call.

[0147] DWORD WTTFreeProcessInfo(IN PWTTPROCESSINFO *ppWTTProcessinfo);

[0148] Parameter:

[0149] ppWTTProcessinfo

[0150] Pointer to a pointer to a structure containing information about a process returned by a call to WTTGetProcessInfo.

[0151] Return Values:

[0152] ERROR_SUCCESS if the allocated memory is successfully released, else Win32 error. The pointer to the WTTPROCESSINFO structure is not defined after a call to this function.

[0153] 13 WTTGetProcessListInfo

[0154] Get the process list from the target machine's process table. The information returned varies depending upon the values specified in dwFlags. Memory allocation is done within the function call itself. WTTFreeProcessListInfo is called to release the memory after reviewing the information returned.

    DWORD WTTGetProcessListInfo (
        IN LPCTSTR pszMachine,
        BOOL bResolveRemote,
        IN DWORD dwFlags,
        OUT DWORD *pdwCount,
        OUT PWTTPROCLISTINFO *ppWTTProcessListInfo
    );

[0155] Parameters:

[0156] pszMachine

[0157] The name of the computing device from which to retrieve the process table information.

[0158] bResolveRemote

[0159] TRUE means remote entries should be resolved. In that case, extra heartbeat-related information is retrieved for processes initiated by WTTCreateProcess on the computing device specified by pszMachine. A query is made to that remote device.

[0160] dwFlags

[0161] Include_wtt_based_procs

[0162] Include all WTT-based processes created by WTTCreateProcess or otherwise.

[0163] Include_non_wtt_based_procs

[0164] Include non-WTT-based processes created by WTTCreateProcess.

[0165] Include_system_procs

[0166] Include system processes. The GUID is displayed as NULL for these. WTTOpenProcess cannot be called for processes of this type.

[0167] pdwCount

[0168] Pointer to the number of elements in the ppWTTProcessListInfo array.

[0169] ppWTTProcessListInfo

[0170] An array of output information for the processes.

[0171] Return Values:

[0172] ERROR_SUCCESS if the information is successfully retrieved, else Win32 error.

[0173] Implementation Notes:

[0174] During the marshaling of parameters to a remote device, pszMachine is marshaled into the szTargetMachine field of the buffer.

[0175] This function needs to carefully check to see if a process actually exists. If the entry for a particular process is present in the <GUID>.ini file but not present in the process table, then the process no longer exists. There is a problem, however, because there may be entries in the process table for processes that have exited. This happens only if a WTT-based process is killed with a forced kill signal. Even doing an OpenProcess( ) on the process identifier (PID) is not a foolproof check as the PID could have been recycled. The solution is to use the Phandle pointer in the process table (on the local machine where the process was instantiated) to wait on the Process Handle with a timeout of zero. If the process is gone, then Phandle is signaled immediately.

[0176] When returning the list of process information, allocate space for one more than the total number of entries returned. The last entry is a "NULL": NIL for GUIDs and ZERO for DWORDS.
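The terminating entry can be illustrated with a short sketch. `Guid16`, `ListEntry`, `buildList`, and `countEntries` are hypothetical, trimmed-down stand-ins for the real structures. Because system processes also report a NIL GUID, the sketch treats an entry as the terminator only when both the GUID and the process identifier are zero, which is an assumption about how a caller would tell the two apart.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Portable stand-ins: a 16-byte GUID and a trimmed list entry.
struct Guid16 { unsigned char bytes[16]; };
struct ListEntry {
    Guid16 guid;
    unsigned long pid;
};

bool isNil(const Guid16& g) {
    static const Guid16 nil = {};
    return std::memcmp(g.bytes, nil.bytes, sizeof nil.bytes) == 0;
}

// Build an array with one extra, zero-filled terminating entry.
std::vector<ListEntry> buildList(const std::vector<ListEntry>& live) {
    std::vector<ListEntry> out(live);
    out.push_back(ListEntry{});  // NIL GUID, zero PID
    return out;
}

// Count entries by walking until the terminator, as a caller without a count might.
std::size_t countEntries(const ListEntry* list) {
    std::size_t n = 0;
    while (!(isNil(list[n].guid) && list[n].pid == 0)) ++n;
    return n;
}
```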

[0177] 14 WTTFreeProcessListInfo

[0178] Release the memory allocated during a WTTGetProcessListInfo function call.

    DWORD WTTFreeProcessListInfo (
        IN PWTTPROCLISTINFO *ppWTTProcessListInfo
    );

[0179] Parameter:

[0180] ppWTTProcessListInfo

[0181] The array for which memory is to be released.

[0182] Return Values:

[0183] ERROR_SUCCESS if the allocated memory is successfully released, else Win32 error.

[0184] 15 WTTTailLog

[0185] Retrieve a copy of output as it is added to a log file. The effect is that of a distributed "tail -f" command. A callback allows this function to return asynchronously.

    DWORD WTTTailLog (
        WTTHANDLE pWTTProcInfo,
        WTTP_LOG_INFO *pWTTLogInfo,
        DWORD dwBytes,
        WTTPROC_CALLBACK CALLBACKFUNCTION
    );

[0186] Parameters:

[0187] pWTTProcInfo

[0188] Information about the process of interest to be passed over to the remote device.

[0189] pWTTLogInfo

[0190] This structure contains the log information. It includes the UNC path of the log file. If this pointer is NULL, then the first log file is used, as specified in the <GUID>.ini file.

[0191] dwBytes

[0192] The number of bytes to be retrieved. If this is set to the value WTTPROCESS_FULL_LOGSIZE, then entire log files are retrieved.

[0193] CALLBACKFUNCTION

[0194] Register a callback function with the spsrv service to retrieve data (the tail of the log file) asynchronously.

[0195] Return Values:

[0196] ERROR_SUCCESS if the log file stream is successfully initialized, else Win32 error.

[0197] 16 WTTCancelTailLog

[0198] Cancel the effect of a previous call to WTTTailLog.

    DWORD WTTCancelTailLog (
        WTTHANDLE pWTTProcInfo,
        WTTP_LOG_INFO *pWTTLogInfo
    );

[0199] Parameters:

[0200] pWTTProcInfo

[0201] Information about the process of interest to be passed over to the remote device.

[0202] pWTTLogInfo

[0203] This structure contains the log information. It includes the UNC path of the log file. If this pointer is NULL, then cancel all tail logs for the process identified by the pWTTProcInfo parameter.

[0204] Return Values:

[0205] ERROR_SUCCESS if the cancellation is successful, else Win32 error.

[0206] 17 WTTOpenProcess

[0207] Get a WTT process handle.

    DWORD WTTOpenProcess (
        IN WTTPROCLISTINFO *pWTTProcessInfo,
        OUT WTTHANDLE *pWTTProcInfo
    );

[0208] Parameters:

[0209] pWTTProcessInfo

[0210] A pointer to the element in the array retrieved by WTTGetProcessListInfo that concerns the process of interest.

[0211] pWTTProcInfo

[0212] A returned pointer to a handle to the process of interest.

[0213] Return Values:

[0214] ERROR_SUCCESS if the handle is successfully retrieved, else Win32 error.

[0215] Implementation Notes:

[0216] The handle has information like the GUID of the process, the name of the device on which the process runs, etc. Once the handle is received, it is more efficient to store its information in a local process table and to then call WTTCloseHandle to release the memory.

[0217] 18 WTTCloseHandle

[0218] Close a WTT process handle. This releases the memory allocated by the WTTOpenProcess call. The local process table entry created for the process is marked as invalid.

[0219] DWORD WTTCloseHandle(WTTHANDLE *pWTTProcInfo);

[0220] Parameter:

[0221] pWTTProcInfo

[0222] A pointer to a handle to the process of interest.

[0223] Return Values:

[0224] ERROR_SUCCESS if the handle is successfully closed, else Win32 error.

[0225] 19 WTTConsoleOutput

[0226] Provide console output for a process. A callback allows this function to return asynchronously.

    DWORD WTTConsoleOutput (
        WTTHANDLE pWTTProcInfo,
        WTTPROC_CALLBACK CALLBACKFUNCTION
    );

[0227] Parameters:

[0228] pWTTProcInfo

[0229] Process information stored in a WTTHANDLE structure.

[0230] CALLBACKFUNCTION

[0231] Register a callback function with the spsrv service to retrieve data asynchronously.

[0232] Return Values:

[0233] ERROR_SUCCESS if the console output stream is successfully initialized, else Win32 error.

[0234] 20 WTTCancelConsoleOutput

[0235] Cancel the console output associated with a particular process.

[0236] DWORD WTTCancelConsoleOutput(WTTHANDLE pWTTProcInfo);

[0237] Parameter:

[0238] pWTTProcInfo

[0239] Process information stored in a WTTHANDLE structure.

[0240] Return Values:

[0241] ERROR_SUCCESS if the cancellation is successful, else Win32 error.

[0242] 21 WTTSetLogFile

[0243] Add a log file to the list of log files to which a process logs.

    DWORD WTTSetLogFile (
        WTTHANDLE pProcessInfo,
        LPCWSTR pszLogFile
    );

[0244] Parameters:

[0245] pProcessInfo

[0246] Process information stored in a WTTHANDLE structure.

[0247] pszLogFile

[0248] The name of the log file to add to the list.

[0249] Return Values:

[0250] ERROR_SUCCESS if the log file is successfully added to the list, else Win32 error.

[0251] 22 WTTPROC_CALLBACK

[0252] The functions WTTTailLog and WTTConsoleOutput use callback functions to allow them to return asynchronously. The structure of the callback function is as follows:

    typedef DWORD (*WTTPROC_CALLBACK) (
        SOCKET hSocket,
        LPVOID pData,
        DWORD dwBytes
    );
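A callback of this shape might simply accumulate the bytes it is handed. In the sketch below, `DWORD_T`, `LPVOID_T`, and `SOCKET_T` are portable stand-ins for the Win32 and Winsock types, and `collectOutput` and `deliver` are hypothetical; in particular, `deliver` only imitates whatever component actually streams the data and invokes the callback.

```cpp
#include <string>

// Portable stand-ins for the Win32/Winsock types used in the typedef above.
typedef unsigned long DWORD_T;
typedef void* LPVOID_T;
typedef int SOCKET_T;

typedef DWORD_T (*ProcCallback)(SOCKET_T hSocket, LPVOID_T pData, DWORD_T dwBytes);

// Illustrative sink: collects every chunk delivered to the callback.
static std::string g_received;

DWORD_T collectOutput(SOCKET_T /*hSocket*/, LPVOID_T pData, DWORD_T dwBytes) {
    g_received.append(static_cast<const char*>(pData), dwBytes);
    return 0;  // 0 mirrors ERROR_SUCCESS
}

// Hypothetical delivery step standing in for the service that streams data.
DWORD_T deliver(ProcCallback cb, const std::string& chunk) {
    return cb(0, const_cast<char*>(chunk.data()),
              static_cast<DWORD_T>(chunk.size()));
}
```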

[0253] 23 Note on UUIDs

[0254] UUIDs (also called GUIDs) provide unique designations of objects such as processes, interfaces, manager entry-point vectors, and client objects. In practice, these identifiers need only be unique within the context of their use, that is, within the set of communicating computing devices. Because techniques already exist for making the identifiers truly unique, those techniques are used here.

    typedef struct _GUID {
        unsigned long Data1;
        unsigned short Data2;
        unsigned short Data3;
        unsigned char Data4[8];
    } GUID;
    typedef GUID UUID;

[0255] Members:

[0256] Data1

[0257] The first eight hexadecimal digits of the UUID.

[0258] Data2

[0259] The first group of four hexadecimal digits of the UUID.

[0260] Data3

[0261] The second group of four hexadecimal digits of the UUID.

[0262] Data4

[0263] An array of eight elements. The first two elements of the array contain the third group of four hexadecimal digits of the UUID. The remaining six elements contain the final twelve hexadecimal digits of the UUID.
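The mapping from these members to the familiar 8-4-4-4-12 textual form can be sketched as follows. `PortableGuid` and `guidToString` are hypothetical names; the sketch uses `unsigned int` for Data1 on the assumption that it is 32 bits wide, as `unsigned long` is on Windows.

```cpp
#include <cstdio>
#include <string>

// Portable mirror of the GUID layout (Data1 is assumed to be 32 bits).
struct PortableGuid {
    unsigned int   Data1;
    unsigned short Data2;
    unsigned short Data3;
    unsigned char  Data4[8];
};

// Format the members as 8-4-4-4-12 lowercase hexadecimal digits.
std::string guidToString(const PortableGuid& g) {
    char buf[37];  // 36 characters plus the terminating NUL
    std::snprintf(buf, sizeof buf,
                  "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x",
                  g.Data1, g.Data2, g.Data3,
                  g.Data4[0], g.Data4[1], g.Data4[2], g.Data4[3],
                  g.Data4[4], g.Data4[5], g.Data4[6], g.Data4[7]);
    return std::string(buf);
}
```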

[0264] Remarks:

[0265] For implementations based on Microsoft's "WINDOWS" operating systems, the following standard Win32 functions are used to create, compare, and manipulate UUIDs. Other implementation platforms provide similar functions.

    signed int RPC_ENTRY UuidCompare(UUID *Uuid1, UUID *Uuid2, RPC_STATUS *Status);
    RPC_STATUS RPC_ENTRY UuidCreate(UUID *Uuid);
    RPC_STATUS RPC_ENTRY UuidCreateNil(UUID *Nil_Uuid);
    RPC_STATUS RPC_ENTRY UuidFromString(unsigned char *StringUuid, UUID *Uuid);
    RPC_STATUS RPC_ENTRY UuidToString(UUID *Uuid, unsigned char **StringUuid);

[0266] 24 Note on Non-WTT-Based Processes

[0267] A suitable infrastructure is provided for tagging and monitoring non-WTT-based processes. Every non-WTT-based process created by the WTTCreateProcess function is given a WTT-created GUID for tagging. The GUID is stored in the WTT-based process handle for future tracking purposes.

[0268] A Global Event handle is present for every non-WTT-based process. The naming structure of this handle is "Event\<GUID>" and it is present on the device on which the process is created. When a non-WTT-based process is created, it has the option of waiting on this event handle and performing a clean shutdown if requested.

[0269] 25 Note on Locking

[0270] Central to the implementation of this API is the process table. The process table has row-level exclusive locks and a global process table lock that overrides the row-level locks.

[0271] There are at least six points in time when locking comes into play:

[0272] (a) When the parent process looks for an empty slot in the process table for the new child process;

[0273] (b) When the parent process reserves a slot in the process table by writing in the GUID of the child process, the GUID of the parent process, a Group GUID (if any), the time the child process was created, the Heartbeat Time, the Source Device, the Target Device, and the Process Type (WTT-based or non-WTT-based) (see FIG. 4 and accompanying text for a description of these fields);

[0274] (c) When the child process soon after creation writes in its process identifier and the heartbeat time;

[0275] (d) When a process periodically updates the Heartbeat Time;

[0276] (e) When multiple processes are querying either at the row level or at the process table level; and

[0277] (f) When a WTT-based process is created outside the scope of this API. It looks for a slot in the process table and then gives itself a GUID for identification.

[0278] Considering all these, a global lock (mutex) is needed whenever a write affects the entire process table, as in cases (a), (b), and (f) above. A row-level exclusive lock is needed (after acquiring the global process table lock) when updating process-specific information, as in cases (c), (d), and (e) above.
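The two-level locking for a row update such as case (d) might be sketched as follows. `LockedRow`, `LockedTable`, and `updateHeartbeat` are hypothetical names, and `std::mutex` stands in for whatever lock primitive an implementation uses. The sketch holds both locks for the duration of the write, which is the simplest reading of the text; an implementation could release the global lock earlier.

```cpp
#include <array>
#include <cstddef>
#include <mutex>

// Hypothetical row with its own exclusive lock.
struct LockedRow {
    std::mutex rowLock;
    unsigned long heartbeatTime = 0;
};

struct LockedTable {
    std::mutex globalLock;  // overrides the row-level locks
    std::array<LockedRow, 4> rows;
};

// Case (d): a process updates its own Heartbeat Time.
// The global lock is taken first, then the row lock, always in that order.
bool updateHeartbeat(LockedTable& table, std::size_t row, unsigned long now) {
    if (row >= table.rows.size()) return false;
    std::lock_guard<std::mutex> globalGuard(table.globalLock);
    std::lock_guard<std::mutex> rowGuard(table.rows[row].rowLock);
    table.rows[row].heartbeatTime = now;
    return true;
}
```

Taking the locks in one fixed order (global, then row) avoids deadlock between a table-wide writer and a row-level writer.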

Specific Considerations when Communicating with Remote Processes

[0279] While the invention is useful when all processes run on the same computing device, it is also designed for the case when some processes run remotely. This section discusses specific considerations that come into play when the API supports remote processes.

[0280] PWTTPROCESSINFO contains a field called szDestMachine that holds the value of the target device on which the process runs. If the value is NULL, then the call is local. If not, the command and its parameters are sent to the target device, and the results are piped back to the originating device. All calls are synchronous in nature. So, if the target device crashes during the period of passing the command, an appropriate error is returned.

[0281] The need to pass by value argues for using Remote Procedure Calls (RPC) as a message-passing paradigm. On the other hand, if all input parameters to a call are based on parameters passed only by value, then interfaces (function tables) for the call can be set up and the spsrv service used to handle the commands on the remote device. Another consideration is that if 32-bit-based machines communicate with IA64 cluster machines, then RPC is very useful as it takes care of architectural differences. RPC interfaces are flexible in terms of marshaling both pointer-based and value-based parameters.

[0282] Every time a new API call is made, a new GUID may be generated on the device that initiated the call. This GUID is used to "track" the call. The GUID is sent with the call to the target device. The target device keeps track of the GUID. If the target device crashes, then the target device, after re-booting, "calls back" its parent device with the knowledge of the GUID of the last call and the name or IP address of the parent device.

[0283] For every process created on a particular device, a <GUID>.ini file is created in the %windir%\WTTbin\GUID directory. (For non-"WINDOWS" implementations, a similar directory is used.) This directory stores information about the process, its threads, and its stack comments. The files store information more persistently than memory can and avoid using memory for ever-changing, bulky data. A process is free to update the information in its file whenever the thread comments are updated. If a query about the state of a process is made and if the process no longer has an entry in the process table, but a <GUID>.ini file exists, then the status of the process is updated to ERROR_SERVICE_NOT_ACTIVE. Because multiple threads may operate on this file simultaneously, synchronization is important. A cleanup routine removes .ini files three or more days old. This is the structure of a <GUID>.ini file:

    [GLOBAL]
    GGUID = nnn
    PID = nnn
    Status = WTT_PROCESS_RUNNING // Or some other status.
    [LogFiles]
    <Log1.log>
    <Log2.log>
    [<ThreadId1>]
    Comment1
    Comment2
    ...
    [<ThreadId2>]
    Comment1
    Comment2
    ...
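A routine that writes this layout could be sketched as follows. `IniData` and `renderIni` are hypothetical names; the section and key names are copied from the layout above, and the sketch returns the text as a string rather than writing a file.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of the data kept in a <GUID>.ini file.
struct IniData {
    std::string guid;
    unsigned long pid = 0;
    std::string status;
    std::vector<std::string> logFiles;
    // Thread identifier -> comments for that thread, oldest first.
    std::map<unsigned long, std::vector<std::string>> threadComments;
};

// Render the sections in the order shown in the layout above.
std::string renderIni(const IniData& d) {
    std::ostringstream out;
    out << "[GLOBAL]\n"
        << "GGUID = " << d.guid << "\n"
        << "PID = " << d.pid << "\n"
        << "Status = " << d.status << "\n"
        << "[LogFiles]\n";
    for (const std::string& log : d.logFiles) out << log << "\n";
    for (const auto& thread : d.threadComments) {
        out << "[" << thread.first << "]\n";
        for (const std::string& comment : thread.second) out << comment << "\n";
    }
    return out.str();
}
```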

[0284] For marshaling parameters for a function call, the spsrv service has a function table that is used to form the receive and send stubs for the spsrv service running on the remote device. To form the stub for receiving data, the buffer is as generic and as flexible as possible. It identifies the function, determines the number of parameters, and sets a fixed order of parameters depending on the function. The following structure is used. It is marshaled into a byte buffer, sent out the socket, and un-marshaled on the other end. When the call completes, the same procedure gets the returned value of the call.

    // This is the index into the function dispatch table on the remote device.
    DWORD dwTestAPINum;
    // This usually corresponds to nCount.
    DWORD dwNumHWTTProcesses;
    // Offset into the non-variable-length buffers.
    DWORD dwHWTTProcOffset[MAX_PROCS];
    // The number of processes present in the WTTPROCESSMARSHALPARAM
    // structure (see below).
    DWORD dwNumMPProcesses;
    // Offset into the non-variable-length buffers.
    DWORD dwNumMPOffset[MAX_PROCS];
    // The total number of bytes taken up by the buffer.
    DWORD dwBytesForBuffer;
    DWORD dwNumWTTPLogInfo;
    // Offset into the non-variable-length buffers.
    DWORD dwNumWTTPLogOffset[MAX_PROCS];
    DWORD dwNumWTTProcListElem;
    // Offset into the non-variable-length buffers.
    DWORD dwNumWTTProcListOffset[MAX_PROCS];
    DWORD dwWaitTimeout;
    DWORD dwFlags;
    DWORD dwWaitAll;
    DWORD dwBytes;
    // Now for storage for the variable-length data fields.
    (dwNumHWTTProcesses * sizeof(_M_HWTTPROCESS))
    (dwNumMPProcesses * sizeof(WTTPROCESSMARSHALPARAM))
    (dwNumWTTPLogInfo * sizeof(WTTP_LOG_INFO))
    (dwNumWTTProcListElem * sizeof(WTTPROCLISTINFO))

[0285] The WTTPROCESSMARSHALPARAM structure is based on WTTPROCESSPARAM, but each instance of a TCHAR* field is replaced by a DWORD dwLen<sss> and a CHAR *szStr<sss> containing a string and a NULL character. The variable-length data are moved to the end of the buffer so as not to affect the offsets of the non-variable-length fields. The dwLen<sss> length information is stored with the help of the offsets. Each GUID is converted to a string, marshaled, and then re-converted into a GUID on the target device. WTTPROCESSMARSHALPARAM is as follows:

    typedef struct {
        DWORD dwFlags;              // Flags; currently a reserved field. Input.
        DWORD dwCreateProcessFlags; // Flags used in CreateProcess. Input.
        DWORD dwProcessType;        // Is this a WTT-based process? Input.
        DWORD dwOffSets[25];        // Offsets to the variable-length strings.
        void *pBuf;
        ...
    } WTTPROCESSMARSHALPARAM, *PWTTPROCESSMARSHALPARAM;

[0286] The variable-length strings in WTTPROCESSMARSHALPARAM include szUserName, szPasswd, stCommandLine, stDebugger, stClusterName, stLogFile, szGuid, szGroupGuid, szParentGuid, szSourceMachine, and szTargetMachine.
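The technique of paragraphs [0285] and [0286], in which variable-length strings are moved to the end of the buffer and located through stored offsets and lengths, might be sketched in C as follows. The FixedPart type, the function names, and the limit of three strings (standing in, for example, for szUserName, stCommandLine, and szTargetMachine) are assumptions made for this sketch; it also copies the fixed part as raw memory, which assumes both devices share the same structure layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t DWORD;

/* Hypothetical: the source reserves 25 offsets; three suffice here. */
#define NUM_STRINGS 3u

/* Fixed-length part; its size never depends on the string contents. */
typedef struct {
    DWORD dwFlags;
    DWORD dwOffSets[NUM_STRINGS]; /* byte offset of each string in buffer */
    DWORD dwLen[NUM_STRINGS];     /* length including terminating NUL   */
} FixedPart;

/* Pack the fixed part first and append the strings after it.
   Returns total bytes used, or 0 if the buffer is too small. */
size_t pack_params(DWORD dwFlags, const char *const strs[NUM_STRINGS],
                   uint8_t *buf, size_t cap)
{
    FixedPart fixed;
    size_t cursor = sizeof fixed;
    size_t i;

    assert(strs != NULL && buf != NULL);
    if (cap < sizeof fixed) {
        return 0;
    }
    fixed.dwFlags = dwFlags;
    for (i = 0; i < NUM_STRINGS; i++) {
        size_t n = strlen(strs[i]) + 1; /* keep the NUL */
        if (n > cap - cursor) {
            return 0;
        }
        memcpy(buf + cursor, strs[i], n);
        fixed.dwOffSets[i] = (DWORD)cursor;
        fixed.dwLen[i] = (DWORD)n;
        cursor += n;
    }
    memcpy(buf, &fixed, sizeof fixed);
    return cursor;
}

/* Locate string number index in a packed buffer after validating its
   offset and length. Returns NULL if the entry is out of bounds. */
const char *unpack_string(const uint8_t *buf, size_t len, size_t index)
{
    FixedPart fixed;
    DWORD off;
    DWORD n;

    assert(buf != NULL);
    if (len < sizeof fixed || index >= NUM_STRINGS) {
        return NULL;
    }
    memcpy(&fixed, buf, sizeof fixed);
    off = fixed.dwOffSets[index];
    n = fixed.dwLen[index];
    if (n == 0 || off < sizeof fixed || off > len || n > len - off) {
        return NULL;
    }
    if (buf[off + n - 1] != '\0') {
        return NULL;
    }
    return (const char *)(buf + off);
}
```

Because the strings follow the fixed part, the position of every fixed-length field is the same for every call, whatever the string lengths happen to be.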

[0287] The output buffer for most calls contains the following information: information in HWTTPROCESS, marshaled as _M_HWTTPROCESS; dwSummaryStatus; and dwSummaryIndex. Variable-length data are put at the end of the buffer. For WTTGetProcessListInfo, a list is formed of entries containing information about the processes of interest. The information carried back is as follows: a list of threads present including their thread identifiers; a list of comments on a per-thread basis; and a list of variations completed by the process. The data structures useful for marshaling this data are as follows:

    Struct_WTTP_THREAD_INFO {
        DWORD dwThreadId;
        // Offset into the comments strings for a thread.
        DWORD dwCommentOffset[MAX_COMMENTS_PER_THREAD];
    }
    Struct_WTTP_VARIATION_INFO {
        // Offset into the variable-length name strings.
        DWORD dwVarnNameOffset[MAX_VARNS];
    }
    Struct_WTTP_LOG_INFO {
        // Offset into the log strings.
        DWORD dwLogOffset[MAX_LOGS_PER_PROC];
    }

[0288] The structure of the marshaling buffer is as follows (no pointers are passed):

    // The size of this entire buffer in bytes.
    DWORD dwBuffSize;
    DWORD _dwThreadCount;
    // All fixed-length data for threads (i.e., the thread identifier and the offsets for the
    // comments) go here while the actual comments are in the variable-length section.
    Struct_WTTP_THREAD_INFO *pThreadInfo;
    DWORD _dwVariationCount;
    Struct_WTTP_VARIATION_INFO *pVarnInfo;
    DWORD _dwLogCount;
    Struct_WTTP_LOG_INFO *pLogInfo;

[0289] (The variable-length data go here.)
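Because no pointers are passed, a receiver can locate each section of this buffer from the counts alone. A minimal C sketch of that computation follows. It rests on several assumptions not stated in the description: the thread records are stored inline where pThreadInfo is listed, exactly one variation record and one log record follow their respective counts, and the MAX_ constants have the small values shown. The BufferLayout type and compute_layout function are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DWORD;

/* Hypothetical limits; the source names these constants without values. */
#define MAX_COMMENTS_PER_THREAD 4
#define MAX_VARNS 8
#define MAX_LOGS_PER_PROC 2

typedef struct {
    DWORD dwThreadId;
    DWORD dwCommentOffset[MAX_COMMENTS_PER_THREAD];
} WTTP_THREAD_INFO;

typedef struct {
    DWORD dwVarnNameOffset[MAX_VARNS];
} WTTP_VARIATION_INFO;

typedef struct {
    DWORD dwLogOffset[MAX_LOGS_PER_PROC];
} WTTP_LOG_INFO;

/* Byte offsets of each section, measured from the start of the buffer. */
typedef struct {
    size_t threadCount;  /* _dwThreadCount                   */
    size_t threadInfo;   /* first thread record              */
    size_t varnCount;    /* _dwVariationCount                */
    size_t varnInfo;     /* variation record                 */
    size_t logCount;     /* _dwLogCount                      */
    size_t logInfo;      /* log record                       */
    size_t variableData; /* start of variable-length strings */
} BufferLayout;

/* Walk the sections in the order listed, advancing a running offset. */
void compute_layout(DWORD threads, BufferLayout *out)
{
    size_t at = sizeof(DWORD); /* skip dwBuffSize */

    assert(out != NULL);
    out->threadCount = at;
    at += sizeof(DWORD);
    out->threadInfo = at;
    at += (size_t)threads * sizeof(WTTP_THREAD_INFO);
    out->varnCount = at;
    at += sizeof(DWORD);
    out->varnInfo = at;
    at += sizeof(WTTP_VARIATION_INFO);
    out->logCount = at;
    at += sizeof(DWORD);
    out->logInfo = at;
    at += sizeof(WTTP_LOG_INFO);
    out->variableData = at;
}
```

Every offset stored in the records can then be checked against variableData and dwBuffSize before it is followed.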

[0290] The following two variables are exported:

    PDWORD pdwThreadCount;
    PWTTPROCESS_THREAD_INFO *pThreadInfo;

[0291] WTTGetProcessListInfo retrieves information about a set of processes. Its return buffer contains the following information:

    // The size of this entire buffer in bytes.
    DWORD dwBuffSize;
    // The number of processes whose information is returned in this buffer.
    DWORD dwProcs;
    DWORD dwProcInfoOffset[WTT_MAX_PROCS];
    DWORD dwProcessId;
    DWORD dwGuidOffSet;
    DWORD dwSrcMcOffset;
    DWORD dwDestMcOffset;
    DWORD _dwProcListCount;
    // The time of the last recorded heartbeat is split into two parts.
    DWORD LastHBTimeHighDword;
    DWORD LastHBTimeLowDword;
    DWORD dwHeartBeat;

[0292] In view of the many possible embodiments to which the principles of this invention may be applied, it should be recognized that the embodiments described herein with respect to the drawing figures are meant to be illustrative only and should not be taken as limiting the scope of invention. Therefore, the invention as described herein contemplates all such embodiments as may come within the scope of the following claims and equivalents thereof.

* * * * *