Skip to content

For HTCondor Batch Systems

This page contains information about job routes that can be used if you are running an HTCondor pool at your site.

Setting periodic hold or release

Avoid setting PERIODIC_REMOVE expressions

The HTCondor Job Router will automatically resubmit jobs that are removed by the underlying batch system, which can result in unintended churn. Therefore, it is recommended to append removal expressions to HTCondor-CE's configuration by adding the following to a file in /etc/condor-ce/config.d/

SYSTEM_PERIODIC_REMOVE = $(SYSTEM_PERIODIC_REMOVE) || <YOUR REMOVE EXPRESSION>

To release or put routed jobs on hold if they meet certain criteria, use the Periodic* family of attributes. By default, periodic expressions are evaluated once every 300 seconds but this can be changed by setting PERIODIC_EXPR_INTERVAL in your local HTCondor configuration.

In this example, we set the routed job on hold if the job is idle and has been started at least once or if the job has tried to start more than once. This will catch jobs which are starting and stopping multiple times.

JOB_ROUTER_ROUTE_Condor_Pool @=jrt
  UNIVERSE VANILLA
  # Puts the routed job on hold if the job's been idle and has been started at least
  # once or if the job has tried to start more than once
  SET PeriodicHold ((NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1)
  # Release routed jobs if the condor_starter couldn't start the executable and 
  # 'VMGAHP_ERR_INTERNAL' is in the HoldReason
  SET PeriodicRelease = (HoldReasonCode == 6 && regexp("VMGAHP_ERR_INTERNAL", HoldReason))
@jrt

JOB_ROUTER_ROUTE_NAMES = Condor_Pool
JOB_ROUTER_ENTRIES @=jre
[
  TargetUniverse = 5;
  name = "Condor_Pool";
  # Puts the routed job on hold if the job's been idle and has been started at least
  # once or if the job has tried to start more than once
  set_PeriodicHold = (NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1;
  # Release routed jobs if the condor_starter couldn't start the executable and
  # 'VMGAHP_ERR_INTERNAL' is in the HoldReason
  set_PeriodicRelease = HoldReasonCode == 6 && regexp("VMGAHP_ERR_INTERNAL", HoldReason);
]
@jre

JOB_ROUTER_ROUTE_NAMES = Condor_Pool

Setting routed job requirements

If you need to set requirements on your routed job, you will need to use SET REQUIREMENTS or set_Requirements instead of Requirements for ClassAd transform and deprecated syntaxes, respectively. The Requirements attribute filters jobs coming into your CE into different job routes whereas the set function will set conditions on the routed job that must be met by the worker node it lands on. For more information on requirements, consult the HTCondor manual.

To ensure that your job lands on a Linux machine in your pool:

JOB_ROUTER_ROUTE_Condor_Pool @jrt
  UNIVERSE VANILLA
  SET Requirements = (TARGET.OpSys == "LINUX")
@jrt

JOB_ROUTER_ROUTE_NAMES = Condor_Pool
JOB_ROUTER_ENTRIES @=jre
[
  TargetUniverse = 5;
  name = "Condor_Pool";
  set_Requirements =  (TARGET.OpSys == "LINUX");
]
@jre

JOB_ROUTER_ROUTE_NAMES = Condor_Pool

Preserving original job requirements

To preserve and include the original job requirements, rather than just setting new requirements, you can use COPY Requirements or copy_Requirements to store the current value of Requirements to another variable, which we'll call original_requirements. To do this, replace the above SET Requirements or set_Requirements lines with:

SET Requirements = ($(MY.Requirements)) && (<YOUR REQUIREMENTS EXPRESSION>)
copy_Requirements = "original_requirements";
set_Requirements = original_requirements && ...;

Setting the accounting group based on the credential of the submitted job

A common need in the CE is to want to set the accounting identity of the routed job using information from the credential of the submitter of the job. This originally was done using information from the x509 certificate, in particular X509UserProxyVOName and x509UserProxySubject. With the switch to SCITOKENs, the equivalent job attributes are AuthTokenIssuer and AuthTokenSubject.

It is important to understand that the condor_schedd treats AuthTokenSubject and AuthTokenIssuer as secure attributes. The values of these attributes cannot be supplied by the condor_job_router directly, they will be set based on what credential the condor_job_router uses to submit the routed job. Because of this the value of these attributes in the routed job is almost never the same as the value in the original job. This is different from the way the x509* job attributes behaved.

Because of this, the default CE config will copy all attributes that match AuthToken* to orig_AuthToken* before the route transforms are applied.

Example of setting the accounting group from AuthToken or x509 attributes.

JOB_ROUTER_CLASSAD_USER_MAP_NAMES = $(JOB_ROUTER_CLASSAD_USER_MAP_NAMES) AcctGroupMap
CLASSAD_USER_MAPFILE_AcctGroupMap = <path-to-mapfile>

JOB_ROUTER_TRANSFORM_SetAcctGroup @=end
   REQUIREMENTS (orig_AuthTokenSubject ?: x509UserProxySubject) isnt undefined
   EVALSET AcctGroup UserMap("AcctGroupMap", orig_AuthTokenSubject ?: x509UserProxySubject, AcctGroup)
   EVALSET AccountingGroup join(".", AcctGroup, Owner)
@end

JOB_ROUTER_PRE_ROUTE_TRANSFORMS = $(JOB_ROUTER_PRE_ROUTE_TRANSFORMS) SetAcctGroup

Refer to the HTCondor documentation for information on mapfiles.

Getting Help

If you have any questions or issues with configuring job routes, please contact us for assistance.