GLOBUS ASCII HELPER PROTOCOL for HTCondor
VERSION 1.0

Kirill (Carey) Kireyev [ckireyev@cs.wisc.edu]
June 2005

HTCondor Project [http://research.cs.wisc.edu/htcondor]
Department of Computer Sciences
University of Wisconsin-Madison
1210 W. Dayton Street
Madison, WI 53706
[http://www.cs.wisc.edu]

3.0 C-GAHP COMMANDS

    The following commands can be recognized in C-GAHP as of version 1.0.0 of the protocol:

        ASYNC_MODE_ON
        ASYNC_MODE_OFF
        COMMANDS
        CONDOR_JOB_SUBMIT
        CONDOR_JOB_REMOVE
        CONDOR_JOB_STATUS_CONSTRAINED
        CONDOR_JOB_UPDATE_CONSTRAINED
        CONDOR_JOB_UPDATE
        CONDOR_JOB_HOLD
        CONDOR_JOB_RELEASE
        CONDOR_JOB_STAGE_IN
        CONDOR_JOB_STAGE_OUT
        CONDOR_JOB_REFRESH_PROXY
        REFRESH_PROXY_FROM_FILE
        QUIT
        RESULTS
        VERSION

3.1 CONVENTIONS AND TERMS USED IN SECTION 3.2

    Below are definitions for the terms used in the sections to follow:

    <CRLF>
        The characters carriage return and line feed (in that order), _or_ solely the line feed character.

    <SP>
        The space character.

    line
        A sequence of ASCII characters ending with a <CRLF>.

    Request Line
        A request for action on the part of the GAHP server.

    Return Line
        A line immediately returned by the GAHP server upon receiving a Request Line.

    Result Line
        A line sent by the GAHP server in response to a RESULTS request, which communicates the results of a previous asynchronous command Request.

    S: and R:
        In the Example sections for the commands below, the prefix "S: " is used to signify what the client sends to the GAHP server. The prefix "R: " is used to signify what the client receives from the GAHP server. Note that the "S: " or "R: " should not actually be sent or received.

3.2 GAHP COMMAND STRUCTURE

    GAHP commands consist of three parts:

        * Request Line
        * Return Line
        * Result Line

    Each of these "Lines" consists of a variable length character string ending with the character sequence <CRLF>.

    A Request Line is a request from the client for action on the part of the GAHP server. Each Request Line consists of a command code followed by argument field(s).
    Command codes are strings of alphabetic characters. Upper and lower case alphabetic characters are to be treated identically with respect to command codes. Thus, any of the following may represent the condor_job_submit command:

        condor_job_submit
        Condor_Job_Submit
        conDor_joB_suBMit
        CONDOR_JOB_SUBMIT

    In contrast, the argument fields of a Request Line are _case sensitive_.

    The Return Line is always generated by the server as an immediate response to a Request Line. The first character of a Return Line will contain one of the following characters:

        S - for Success
        F - for Failure
        E - for a syntax or parse Error

    Any Request Line which contains an unrecognized or unsupported command, or a command with an insufficient number of arguments, will generate an "E" response.

    The Result Line is used to support commands that would otherwise block. Any GAHP command which may require the implementation to block on network communication requires a "request id" as part of the Request Line. For such commands, the Return Line just communicates whether the request has been successfully parsed and queued for service by the GAHP server. At this point, the GAHP server would typically dispatch a new thread to actually service the request. Once the request has completed, the dispatched thread should create a Result Line and enqueue it until the client issues a RESULTS command.

3.3 TRANSPARENCY

    Arguments on a particular Line (be it Request, Return, or Result) are typically separated by a <SP>. In the event that a string argument needs to contain a <SP> within the string itself, it may be escaped by placing a backslash ("\") in front of the <SP> character. Thus, the character sequence "\ " (no quotes) must not be treated as a separator between arguments, but instead as a space character within a string argument.

3.4 SEQUENCE OF EVENTS

    Upon startup, the GAHP server should output to stdout a banner string which is identical to the output from the VERSION command without the beginning "S " sequence (see example below).
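
    The banner rule can be checked mechanically. The following Python sketch is an illustration only and not part of the protocol; the helper name banner_matches_version is hypothetical, and it assumes that both lines have already had their line terminator removed.

```python
def banner_matches_version(banner: str, version_return_line: str) -> bool:
    """Check that the startup banner equals the VERSION Return Line
    with its leading "S " removed.

    Hypothetical helper: both arguments are assumed to be single lines
    with the trailing line terminator already stripped.
    """
    prefix = "S "
    if not version_return_line.startswith(prefix):
        # VERSION did not succeed, so there is nothing to compare against.
        return False
    return banner == version_return_line[len(prefix):]
```

    A client could run this check once after issuing its first VERSION command, to confirm it is talking to the server that printed the banner.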
    Next, the GAHP server should wait for a complete Request Line from the client (e.g. on stdin). The server is to take no action until a Request Line sequence is received.

    At startup, allowable commands are limited to the following:

        COMMANDS
        INITIALIZE_FROM_FILE
        INITIALIZE_FROM_MYPROXY
        QUIT
        VERSION

    Once a successful INITIALIZE_* command is completed, any other command may be issued. If any command outside of the above list is issued before a successful INITIALIZE_* command, the Return Line should consist of:

        E <CRLF>

    Example:

        R: $GahpVersion 2.0.0 Jan 21 2004 HTCondor\ GAHP $
        S: COMMANDS
        R: S CONDOR_JOB_SUBMIT CONDOR_JOB_REMOVE CONDOR_JOB_STATUS_CONSTRAINED CONDOR_JOB_UPDATE_CONSTRAINED CONDOR_JOB_UPDATE CONDOR_JOB_HOLD CONDOR_JOB_RELEASE CONDOR_JOB_STAGE_IN CONDOR_JOB_STAGE_OUT CONDOR_JOB_REFRESH_PROXY ASYNC_MODE_ON ASYNC_MODE_OFF RESULTS QUIT
        S: VERSION
        R: S $GahpVersion 2.0.0 Jan 21 2004 HTCondor\ GAHP $
        S: INITIALIZE_FROM_FILE /tmp/grid_proxy_554523.txt
        R: S
        S: RESULTS
        R: S 0
        S: RESULTS
        R: S 1
        R: 100 0
        S: QUIT
        R: S

3.5 COMMAND SYNTAX

    This section contains the syntax for the Request, Return, and Result line for each command.

    -----------------------------------------------

    INITIALIZE_FROM_FILE

    Initialize the GAHP server and provide it with a GSI (Grid Security Infrastructure) proxy certificate which will be used by the GAHP server for all subsequent authentication which requires GSI credentials.

    + Request Line:

        INITIALIZE_FROM_FILE <SP> <path-to-local-proxy-file> <CRLF>

        * path-to-local-proxy-file = a fully-qualified pathname to a file local to the GAHP server which contains a valid GSI proxied certificate.

    + Return Line:

        One of the following:

        S <CRLF>
        F <SP> <error_string> <CRLF>

        Upon success, use the "S" version; if not recognized, use the "F" version.

        * error_string = brief string description of the error, appropriate for reporting to a human end-user.

    + Result Line:

        None.

    -----------------------------------------------

    COMMANDS

    List all the commands from this protocol specification which are implemented by this GAHP server.

    + Request Line:

        COMMANDS <CRLF>

    + Return Line:

        S <SP> <command 1> <SP> <command 2> <SP> ... <command X> <CRLF>
    + Result Line:

        None.

    -----------------------------------------------

    VERSION

    Return the version string for this GAHP. The version string follows a specified format (see below). Ideally, the entire version string, including the starting and ending dollar sign ($) delimiters, should be a literal string in the text of the GAHP server executable. This way, the Unix/RCS "ident" command can produce the version string.

    The version returned should correspond to the version of the protocol supported.

    + Request Line:

        VERSION <CRLF>

    + Return Line:

        S <SP> $GahpVersion: <SP> <major>.<minor>.<subminor> <SP> <build-month> <SP> <build-day-of-month> <SP> <build-year> <SP> <general-descrip> <SP> $ <CRLF>

        * major.minor.subminor = for this version of the protocol, use version 1.0.0.

        * build-month = string with the month abbreviation when this GAHP server was built or released. Permitted values are: "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", and "Dec".

        * build-day-of-month = day of the month when the GAHP server was built or released; an integer between 1 and 31 inclusive.

        * build-year = four digit integer specifying the year in which the GAHP server was built or released.

        * general-descrip = a string identifying a particular GAHP server implementation.

    + Result Line:

        None.

    + Example:

        S: VERSION
        R: S $GahpVersion: 1.0.0 Nov 26 2001 NCSA\ CoG\ Gahpd $

    -----------------------------------------------

    ASYNC_MODE_ON

    Enable asynchronous notification when the GAHP server has results pending for a client. This is most useful for clients that do not want to periodically poll the GAHP server with a RESULTS command. When asynchronous notification mode is active, the GAHP server will print out an 'R' (without the quotes) on column one when the 'RESULTS' command would return one or more lines. The 'R' is printed only once between successive 'RESULTS' commands. The 'R' is also guaranteed to only appear in between atomic return lines; the 'R' will not interrupt another command's output.
    If there are already pending results when the asynchronous results available mode is activated, no indication of the presence of those results will be given. A GAHP server is permitted to only consider changes to its result queue for additions after the ASYNC_MODE_ON command has successfully completed. GAHP clients should issue a 'RESULTS' command immediately after enabling asynchronous notification, to ensure that any results that may have been added to the queue during the processing of the ASYNC_MODE_ON command are accounted for.

    + Request Line:

        ASYNC_MODE_ON <CRLF>

    + Return Line:

        S <CRLF>

        Immediately afterwards, the client should be prepared to handle an R appearing in the output of the GAHP server.

    + Result Line:

        None.

    + Example:

        S: ASYNC_MODE_ON
        R: S
        S: GRAM_PING 00001 beak.cs.wisc.edu/jobmanager
        R: S
        S: GRAM_PING 00002 nostos.cs.wisc.edu/jobmanager
        R: S
        R: R
        S: RESULTS
        R: S 2
        R: 00001 0
        R: 00002 0

    Note that you are NOT guaranteed that the 'R' will not appear between the dispatching of a command and the return line(s) of that command; the GAHP server only guarantees that the 'R' will not interrupt an in-progress return. The following is also a legal example:

        S: ASYNC_MODE_ON
        R: S
        S: GRAM_PING 00001 beak.cs.wisc.edu/jobmanager
        R: S
        S: GRAM_PING 00002 nostos.cs.wisc.edu/jobmanager
        R: R
        R: S
        S: RESULTS
        R: S 2
        R: 00001 0
        R: 00002 0

    (Note the reversal of the R and the S after GRAM_PING 00002)

    -----------------------------------------------

    ASYNC_MODE_OFF

    Disable asynchronous results-available notification. In this mode, the only way to discover available results is to poll with the RESULTS command. This mode is the default. Asynchronous mode can be enabled with the ASYNC_MODE_ON command.

    + Request Line:

        ASYNC_MODE_OFF <CRLF>

    + Return Line:

        S <CRLF>

    + Result Line:

        None.

    + Example:

        S: ASYNC_MODE_OFF
        R: S

    -----------------------------------------------

    QUIT

    Free any/all system resources (close all sockets, etc) and terminate as quickly as possible.
    + Request Line:

        QUIT <CRLF>

    + Return Line:

        S <CRLF>

        Immediately afterwards, the command pipe should be closed and the GAHP server should terminate.

    + Result Line:

        None.

    -----------------------------------------------

    RESULTS

    Display all of the Result Lines which have been queued since the last RESULTS command was issued. Upon success, the first return line specifies the number of subsequent Result Lines which will be displayed. Then each result line appears (one per line) -- each starts with the request ID which corresponds to the request ID supplied when the corresponding command was submitted. The exact format of the Result Line varies based upon which corresponding Request command was issued.

    IMPORTANT: Result Lines must be displayed in the _exact order_ in which they were queued. In other words, the Result Lines displayed must be sorted in the order by which they were placed into the GAHP's result line queue, from earliest to most recent.

    + Request Line:

        RESULTS <CRLF>

    + Return Line(s):

        S <SP> <num-of-subsequent-result-lines> <CRLF>
        <reqid> <SP> ... <CRLF>
        <reqid> <SP> ... <CRLF>
        ...

        * reqid = integer Request ID, set to the value specified in the corresponding Request Line.

    + Result Line:

        None.

    + Example:

        S: RESULTS
        R: S 1
        R: 100 0

    -----------------------------------------------

    CONDOR_JOB_SUBMIT

    Submit a job request to a specified resource. This will cause the job to be submitted to Globus.

    + Request Line:

        CONDOR_JOB_SUBMIT <SP> <reqid> <SP> <resource_contact_string> <SP> <job_classad> <CRLF>

        * reqid = non-zero integer Request ID

        * resource_contact_string = a contact string of the remote resource (usually a name or hostname of an accessible schedd)

        * job_classad = a description of an HTCondor job in ClassAd format (see: http://research.cs.wisc.edu/htcondor/classad/)

    + Return Line:

        <result> <CRLF>

        * result = the character "S" (no quotes) for successful submission of the request (meaning that the request is now pending), or an "E" for error on the parse of the request or its arguments (e.g. an unrecognized or unsupported command, or for missing or malformed arguments).
    + Result Line:

        <reqid> <SP> <result_code> <SP> <job_contact> <CRLF>

        * reqid = integer Request ID, set to the value specified in the corresponding Request Line.

        * result_code = integer equal to 0 on success, or an error code

        * job_contact = on success, a string representing a unique identifier for the job. This identifier must not be bound to this GAHP server, but instead must be allowed to be used in subsequent GAHP server instantiations. For instance, the job_contact must be implemented in such a fashion that the following sequence of events by the caller must be permissible:

            a) issue a CONDOR_JOB_SUBMIT command
            b) read the job_contact in the result line
            c) store the job_contact persistently
            d) subsequently kill and restart the GAHP server process
            e) issue a CONDOR_JOB_REMOVE command, passing it the stored job_contact value obtained in step (b).

          It is strongly suggested that GAHP server implementers use the Job Contact string as returned by the Gatekeeper, i.e. a unique contact string (URL) to a Globus Job Manager.

    -----------------------------------------------

    CONDOR_JOB_REMOVE

    This function removes a job from the remote resource. This is the equivalent of running the "condor_rm" tool.

    + Request Line:

        CONDOR_JOB_REMOVE <SP> <reqid> <SP> <resource_name> <SP> <job_id> <SP> <reason_string> <CRLF>

        * reqid = non-zero integer Request ID

        * resource_name = the name or hostname of the remote resource

        * job_id = the ID of the job on the remote site (as returned by CONDOR_JOB_SUBMIT)

        * reason_string = a user-readable string explaining the reason for removal.

    + Return Line:

        <result> <CRLF>

        * result = the character "S" (no quotes) for successful submission of the request (meaning that the request is now pending), or an "E" for error on the parse of the request or its arguments (e.g. an unrecognized or unsupported command, or for missing or malformed arguments).

    + Result Line:

        <reqid> <SP> <result_code> <SP> <error_message> <CRLF>

        * reqid = integer Request ID, set to the value specified in the corresponding Request Line.

        * result_code = integer equal to 0 on success, or an error code

        * error_message = a user-readable message explaining the error reason, "NULL" if success.
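
    As an illustration of the CONDOR_JOB_REMOVE exchange, the following Python sketch builds a Request Line and parses a Result Line, applying the backslash escaping rule from section 3.3. It is a sketch under stated assumptions, not a reference implementation: the helper names (escape_arg, split_args, build_remove_request, parse_remove_result) are hypothetical, the arguments are assumed to appear on the line in the order in which they are listed above, and only the space character is assumed to need escaping.

```python
from typing import List, Tuple


def escape_arg(arg: str) -> str:
    """Escape embedded spaces so the argument is not split (section 3.3)."""
    return arg.replace(" ", "\\ ")


def split_args(line: str) -> List[str]:
    """Split a line on unescaped spaces; a backslash-space pair becomes a space."""
    args: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] == " ":
            current.append(" ")  # escaped space stays inside the argument
            i += 2
        elif ch == " ":
            args.append("".join(current))  # unescaped space ends the argument
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    args.append("".join(current))
    return args


def build_remove_request(reqid: int, resource_name: str, job_id: str, reason: str) -> str:
    """Build a CONDOR_JOB_REMOVE Request Line (assumed argument order)."""
    fields = ["CONDOR_JOB_REMOVE", str(reqid), resource_name, job_id, reason]
    return " ".join(escape_arg(f) for f in fields) + "\r\n"


def parse_remove_result(line: str) -> Tuple[int, int, str]:
    """Parse a Result Line into (reqid, result_code, error_message)."""
    reqid, result_code, error_message = split_args(line.rstrip("\r\n"))
    return int(reqid), int(result_code), error_message
```

    For example, build_remove_request(7, "schedd.example.org", "12.0", "no longer needed") escapes the two spaces in the reason string. The resource name and job ID shown here are made-up values. The same pair of helpers applies unchanged to the other commands whose Result Line has the reqid, result_code, error_message shape.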
    -----------------------------------------------

    CONDOR_JOB_STATUS_CONSTRAINED

    Query and report the current status of a set of jobs, selected by a constraint over arbitrary job attributes.

    + Request Line:

        CONDOR_JOB_STATUS_CONSTRAINED <SP> <reqid> <SP> <resource_name> <SP> <constraint_query> <CRLF>

        * reqid = non-zero integer Request ID

        * resource_name = the name of the remote resource (typically a SchedD)

        * constraint_query = a logical expression referencing job attributes. See "condor_q -constraint" for more info.

    + Return Line:

        <result> <CRLF>

        * result = the character "S" (no quotes) for successful submission of the request (meaning that the request is now pending), or an "E" for error on the parse of the request or its arguments (e.g. an unrecognized or unsupported command, or for missing or malformed arguments).

    + Result Line:

        <reqid> <SP> <result_code> <SP> <error_string> <SP> <number_results> <SP> ... <CRLF>

        * reqid = integer Request ID, set to the value specified in the corresponding Request Line.

        * result_code = integer equal to 0 on success, or an error code

        * error_string = a string containing additional information about a failure.

        * number_results = an integer indicating the number of results that match the query.

        * ... = zero or more strings containing ClassAds of jobs that match the query.

    -----------------------------------------------

    CONDOR_JOB_UPDATE_CONSTRAINED

    Update a specified set of attributes on all jobs that match the given constraint.

    + Request Line:

        CONDOR_JOB_UPDATE_CONSTRAINED <SP> <reqid> <SP> <resource_name> <SP> <constraint_query> <SP> <update_ad> <CRLF>

        * reqid = non-zero integer Request ID

        * resource_name = the name of the remote resource (typically a SchedD)

        * constraint_query = a logical expression referencing job attributes. See "condor_q -constraint" for more info.

        * update_ad = a ClassAd containing the attributes that need to be updated and their new values.

    + Return Line:

        <result> <CRLF>

        * result = the character "S" (no quotes) for successful submission of the request (meaning that the request is now pending), or an "E" for error on the parse of the request or its arguments (e.g. an unrecognized or unsupported command, or for missing or malformed arguments).
    + Result Line:

        <reqid> <SP> <result_code> <SP> <error_string> <CRLF>

        * reqid = integer Request ID, set to the value specified in the corresponding Request Line.

        * result_code = integer equal to 0 on success, or an error code

        * error_string = a string containing additional information about a failure.

    -----------------------------------------------

    CONDOR_JOB_UPDATE

    Update a specified set of attributes for the job with a given remote ID.

    + Request Line:

        CONDOR_JOB_UPDATE <SP> <reqid> <SP> <resource_name> <SP> <job_id> <SP> <update_ad> <CRLF>

        * reqid = non-zero integer Request ID

        * resource_name = the name or hostname of the remote resource

        * job_id = the ID of the job on the remote site (as returned by CONDOR_JOB_SUBMIT)

        * update_ad = a ClassAd containing the attributes that need to be updated and their new values.

    + Return Line:

        <result> <CRLF>

        * result = the character "S" (no quotes) for successful submission of the request (meaning that the request is now pending), or an "E" for error on the parse of the request or its arguments (e.g. an unrecognized or unsupported command, or for missing or malformed arguments).

    + Result Line:

        <reqid> <SP> <result_code> <SP> <error_message> <CRLF>

        * reqid = integer Request ID, set to the value specified in the corresponding Request Line.

        * result_code = integer equal to 0 on success, or an error code

        * error_message = a user-readable message explaining the error reason, "NULL" if success.

    -----------------------------------------------

    CONDOR_JOB_HOLD

    This function puts the specified job on hold on the remote resource. This is the equivalent of running the "condor_hold" tool.

    + Request Line:

        CONDOR_JOB_HOLD <SP> <reqid> <SP> <resource_name> <SP> <job_id> <SP> <reason_string> <CRLF>

        * reqid = non-zero integer Request ID

        * resource_name = the name or hostname of the remote resource

        * job_id = the ID of the job on the remote site (as returned by CONDOR_JOB_SUBMIT)

        * reason_string = a user-readable string explaining the reason for putting the job on hold.

    + Return Line:

        <result> <CRLF>

        * result = the character "S" (no quotes) for successful submission of the request (meaning that the request is now pending), or an "E" for error on the parse of the request or its arguments (e.g. an unrecognized or unsupported command, or for missing or malformed arguments).

    + Result Line:

        <reqid> <SP> <result_code> <SP> <error_message> <CRLF>

        * reqid = integer Request ID, set to the value specified in the corresponding Request Line.

        * result_code = integer equal to 0 on success, or an error code

        * error_message = a user-readable message explaining the error reason, "NULL" if success.

    -----------------------------------------------

    CONDOR_JOB_RELEASE

    This function takes the specified job off hold (releases it) on the remote resource. This is the equivalent of running the "condor_release" tool.

    + Request Line:

        CONDOR_JOB_RELEASE <SP> <reqid> <SP> <resource_name> <SP> <job_id> <SP> <reason_string> <CRLF>

        * reqid = non-zero integer Request ID

        * resource_name = the name or hostname of the remote resource

        * job_id = the ID of the job on the remote site (as returned by CONDOR_JOB_SUBMIT)

        * reason_string = a user-readable string explaining the reason for releasing the job.

    + Return Line:

        <result> <CRLF>

        * result = the character "S" (no quotes) for successful submission of the request (meaning that the request is now pending), or an "E" for error on the parse of the request or its arguments (e.g. an unrecognized or unsupported command, or for missing or malformed arguments).

    + Result Line:

        <reqid> <SP> <result_code> <SP> <error_message> <CRLF>

        * reqid = integer Request ID, set to the value specified in the corresponding Request Line.

        * result_code = integer equal to 0 on success, or an error code

        * error_message = a user-readable message explaining the error reason, "NULL" if success.

    -----------------------------------------------

    CONDOR_JOB_STAGE_IN

    Perform file staging to the remote resource for a given job.

    + Request Line:

        CONDOR_JOB_STAGE_IN <SP> <reqid> <SP> <resource_contact_string> <SP> <job_classad> <CRLF>

        * reqid = non-zero integer Request ID

        * resource_contact_string = a contact string of the remote resource (usually a name or hostname of an accessible schedd)

        * job_classad = a description of an HTCondor job in ClassAd format. It is primarily used to specify the list of files to transfer.
    + Return Line:

        <result> <CRLF>

        * result = the character "S" (no quotes) for successful submission of the request (meaning that the request is now pending), or an "E" for error on the parse of the request or its arguments (e.g. an unrecognized or unsupported command, or for missing or malformed arguments).

    + Result Line:

        <reqid> <SP> <result_code> <SP> <error_message> <CRLF>

        * reqid = integer Request ID, set to the value specified in the corresponding Request Line.

        * result_code = integer equal to 0 on success, or an error code

        * error_message = a user-readable message explaining the error reason, "NULL" if success.

    -----------------------------------------------

    CONDOR_JOB_STAGE_OUT

    Perform file out-staging from the remote resource for a given job.

    + Request Line:

        CONDOR_JOB_STAGE_OUT <SP> <reqid> <SP> <resource_contact_string> <SP> <job_id> <CRLF>

        * reqid = non-zero integer Request ID

        * resource_contact_string = a contact string of the remote resource (usually a name or hostname of an accessible schedd)

        * job_id = the ID of the job on the remote resource

    + Return Line:

        <result> <CRLF>

        * result = the character "S" (no quotes) for successful submission of the request (meaning that the request is now pending), or an "E" for error on the parse of the request or its arguments (e.g. an unrecognized or unsupported command, or for missing or malformed arguments).

    + Result Line:

        <reqid> <SP> <result_code> <SP> <error_message> <CRLF>

        * reqid = integer Request ID, set to the value specified in the corresponding Request Line.

        * result_code = integer equal to 0 on success, or an error code

        * error_message = a user-readable message explaining the error reason, "NULL" if success.

    -----------------------------------------------

    REFRESH_PROXY_FROM_FILE

    Reset the GSI (Grid Security Infrastructure) proxy certificate cached by the GAHP server, which is used by the GAHP server for all subsequent authentication which requires GSI credentials. This command can be used to tell the GAHP server to use a different credential than the one initially specified via the INITIALIZE_FROM_FILE command.
    The intent is that this command will be used to refresh an about-to-expire credential used by the GAHP server.

    + Request Line:

        REFRESH_PROXY_FROM_FILE <SP> <path-to-local-proxy-file> <CRLF>

        * path-to-local-proxy-file = a fully-qualified pathname to a file local to the GAHP server which contains a valid GSI proxied certificate.

    + Return Line:

        One of the following:

        S <CRLF>
        F <SP> <error_string> <CRLF>

        Upon success, use the "S" version; if not recognized, use the "F" version.

        * error_string = brief string description of the error, appropriate for reporting to a human end-user.

    + Result Line:

        None.

    -----------------------------------------------

    CONDOR_JOB_REFRESH_PROXY

    Forces the GAHP server to update the remote job's proxy with its current (local) state.

    + Request Line:

        CONDOR_JOB_REFRESH_PROXY <SP> <reqid> <SP> <resource_contact_string> <SP> <job_id> <SP> <proxy_file> <CRLF>

        * reqid = non-zero integer Request ID

        * resource_contact_string = a contact string of the remote resource (usually a name or hostname of an accessible schedd)

        * job_id = the ID of the job on the remote resource

        * proxy_file = a fully-qualified pathname to a file local to the GAHP server which contains a valid GSI proxied certificate.

    + Return Line:

        <result> <CRLF>

        * result = the character "S" (no quotes) for successful submission of the request (meaning that the request is now pending), or an "E" for error on the parse of the request or its arguments (e.g. an unrecognized or unsupported command, or for missing or malformed arguments).

    + Result Line:

        <reqid> <SP> <result_code> <SP> <error_message> <CRLF>

        * reqid = integer Request ID, set to the value specified in the corresponding Request Line.

        * result_code = integer equal to 0 on success, or an error code

        * error_message = a user-readable message explaining the error reason, "NULL" if success.
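
    Every command above that takes a request ID reports its outcome through the RESULTS command, so a client needs a routine that reads a RESULTS reply. The following Python sketch shows one way to do that for a file-like object connected to the GAHP server's output. The function name read_results is hypothetical, and the sketch assumes that asynchronous notification is off (so no stray 'R' lines appear in the stream) and that the request ID itself contains no escaped spaces.

```python
import io
from typing import List, TextIO, Tuple


def read_results(stream: TextIO) -> List[Tuple[int, str]]:
    """Read the reply to a RESULTS command.

    Returns (reqid, rest_of_line) pairs in the order received, which the
    protocol requires to be the order in which results were queued.
    """
    header = stream.readline().rstrip("\r\n")
    parts = header.split(" ")
    if len(parts) != 2 or parts[0] != "S":
        raise ValueError(f"unexpected RESULTS return line: {header!r}")
    count = int(parts[1])

    results: List[Tuple[int, str]] = []
    for _ in range(count):
        line = stream.readline().rstrip("\r\n")
        # The request ID is the first field; the rest is command-specific
        # and is left unparsed here.
        reqid, _sep, rest = line.partition(" ")
        results.append((int(reqid), rest))
    return results
```

    Using io.StringIO("S 2\r\n00001 0\r\n00002 0\r\n") as the stream reproduces the reply from the ASYNC_MODE_ON example. The remainder of each line is returned unparsed because its format depends on which command the request ID belongs to.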