Skip to content

Submitting Jobs Remotely to an HTCondor-CE

This document outlines how to submit jobs to an HTCondor-CE from a remote client using two different methods:

  • With dedicated tools for quickly verifying end-to-end job submission, and
  • From an existing HTCondor submit host, useful for developing pilot submission infrastructure

If you are the administrator of an HTCondor-CE, consider verifying your HTCondor-CE using the administrator-focused documentation.

Before Starting

Before attempting to submit jobs to an HTCondor-CE as documented below, ensure the following:

Submission with Debugging Tools

The HTCondor-CE client contains debugging tools designed to quickly test an HTCondor-CE. To use these tools, install the RPM package from the relevant Yum repository:

root@host # yum install htcondor-ce-client

Verify end-to-end submission

The HTCondor-CE client package includes a debugging tool that perform tests of end-to-end job submission called condor_ce_trace. To submit a diagnostic job with condor_ce_trace, run the following command:

user@host $ condor_ce_trace --debug <CE HOST>

Replacing <CE HOST> with the hostname of the CE you wish to test. On success, you will see Job status: Completed and the job's environment on the worker node where it ran. If you do not see the expected output, refer to the troubleshooting guide.

CONDOR_CE_TRACE_ATTEMPTS

For a busy site cluster, it may take longer than the default 5 minutes to test end-to-end submission. To extend the length of time that condor_ce_trace waits for the job to complete, prepend the command with _condor_CONDOR_CE_TRACE_ATTEMPTS=<TIME IN SECONDS>.

(Optional) Requesting resources

condor_ce_trace doesn't make any specific resource requests so its jobs are only given the default resources as configured by the HTCondor-CE you are debugging. To request specific resources (or other job attributes), you can specify the --attribute option on the command line:

user@host $ condor_ce_trace --debug \
                            --attribute='+resource1=value1'... \
                            --attribute='+resourceN=valueN' \
                            ce.htcondor.org

For example, the following command submits a test job requesting 4 cores, 4 GB of RAM, a wall clock time of 2 hours, and the 'osg' queue, run the following command:

user@host $ condor_ce_trace --debug \
                            --attribute='+xcount=4' \
                            --attribute='+maxMemory=4000' \
                            --attribute='+maxWallTime=120' \
                            --attribute='+remote_queue=osg' \
                            ce.htcondor.org

For a list of other attributes that can be set with the --attribute option, consult the submit file commands section.

Note

Non-HTCondor batch systems may need additional HTCondor-CE configuration to support these job attributes. See the batch system integration for details on how to support them.

Submission with HTCondor Submit

If you need to submit more complicated jobs than a trace job as described above (e.g. for developing piilot job submission infrastructures) and have access to an HTCondor submit host, you can use standard HTCondor submission tools.

Submit the job

To submit jobs to a remote HTCondor-CE (or any other externally facing HTCondor SchedD) from an HTCondor submit host, you need to construct an HTCondor submit file describing an HTCondor-C job:

  1. Write a submit file, ce_test.sub:

    # Required for remote HTCondor-CE submission
    universe = grid
    use_x509userproxy = true
    grid_resource = condor ce.htcondor.org ce.htcondor.org:9619
    
    # Files
    executable = ce_test.sh
    output = ce_test.out
    error = ce_test.err
    log = ce_test.log
    
    # File transfer behavior
    ShouldTransferFiles = YES
    WhenToTransferOutput = ON_EXIT
    
    # Optional resource requests
    #+xcount = 4            # Request 4 cores
    #+maxMemory = 4000      # Request 4GB of RAM
    #+maxWallTime = 120     # Request 2 hrs of wall clock time
    #+remote_queue = "osg"  # Request the OSG queue
    
    # Submit a single job
    queue
    

    Replacing ce_test.sh with the path to the executable you wish to run and ce.htcondor.org with the hostname of the CE you wish to test.

    Note

    The grid_resource line should start with condor and is not related to which batch system you are using.

  2. Submit the job:

    user@host $ condor_submit ce_test.sub
    

Tracking job progress

You can track job progress by by querying the local queue:

user@host $ condor__q

As well as the remote HTCondor-CE queue:

user@host $ condor__q -name <CE HOST> -pool <CE HOST>:9619

Replacing <CE HOST> with the FQDN of the HTCondor-CE. For reference, condor_q -help status will provide details of job status codes.

user@host $ condor_q -help status | tail

    JobStatus codes:
     1 I IDLE
     2 R RUNNING
     3 X REMOVED
     4 C COMPLETED
     5 H HELD
     6 > TRANSFERRING_OUTPUT
     7 S SUSPENDED

Troubleshooting

All interactions between condor_submit and the HTCondor-CE will be recorded in the file specified by the log command in your submit file. This includes acknowledgement of the job in your local queue, connection to the CE, and a record of job completion:

000 (786.000.000) 12/09 16:49:55 Job submitted from host: <131.225.154.68:53134>
...
027 (786.000.000) 12/09 16:50:09 Job submitted to grid resource
    GridResource: condor ce.htcondor.org ce.htcondor.org:9619
    GridJobId: condor ce.htcondor.org ce.htcondor.org:9619 796.0
...
005 (786.000.000) 12/09 16:52:19 Job terminated.
        (1) Normal termination (return value 0)
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
        0  -  Total Bytes Sent By Job
        0  -  Total Bytes Received By Job

If there are issues contacting the HTCondor-CE, you will see error messages about a Down Globus Resource:

020 (788.000.000) 12/09 16:56:17 Detected Down Globus Resource
    RM-Contact: ce.htcondor.org
...
026 (788.000.000) 12/09 16:56:17 Detected Down Grid Resource
    GridResource: condor ce.htcondor.org ce.htcondor.org:9619

This indicates a communication issue with the HTCondor-CE that may be diagnosed with condor_ce_ping.

Submit File Commands

The following table is a reference of commands that are commonly included in HTCondor submit files used for HTCondor-CE resource allocation requests. A more comprehensive list of submit file commands specific to HTCondor can be found in the HTCondor manual.

HTCondor string values

If you are setting an attribute to a string value, make sure enclose the string in double-quotes (")

Command Description
arguments Arguments that will be provided to the executable for the resource allocation request.
error Path to the file on the client host that stores stderr from the resource allocation request.
executable Path to the file on the client host that the resource allocation request will execute.
input Path to the file on the client host that stores input to be piped into the stdin of the resource allocation request.
+maxMemory The amount of memory in MB that you wish to allocate to the resource allocation request.
+maxWallTime The maximum walltime (in minutes) the resource allocation request is allowed to run before it is removed.
output Path to the file on the client host that stores stdout from the resource allocation request.
+remote_queue Assign resource allocation request to the target queue in the scheduler.
transfer_input_files A comma-delimited list of all the files and directories to be transferred into the working directory for the resource allocation request, before the resource allocation request is started.
transfer_output_files A comma-delimited list of all the files and directories to be transferred back to the client, after the resource allocation request completes.
+WantWholeNode When set to True, request entire node for the resource allocation request (HTCondor batch systems only)
+xcount The number of cores to allocate for the resource allocation request.

Getting Help

If you have any questions or issues with job submission, please contact us for assistance.