Sub-DAGs vs. Splices

This document explains why you might want to use the DAGMan external sub-DAG and splice features, and which of the two is the better fit in a particular situation. (See http://research.cs.wisc.edu/htcondor/manual/v8.5/2_10DAGMan_Applications.html#SECTION003108900000000000000 and http://research.cs.wisc.edu/htcondor/manual/v8.5/2_10DAGMan_Applications.html#SECTION0031081000000000000000 for detailed information about external sub-DAGs and splices, respectively.)

When should I use one of these features?

Both external sub-DAGs and splices allow you to compose a large workflow from various sub-pieces that are defined in individual DAG files. This is the basic motivation for using either external sub-DAGs or splices: you want to create a single workflow from a number of DAG files, either because the smaller DAG files already exist, or because it's easier to deal with sub-parts of the workflow. (One use case might be that you have sub-workflows that you want to combine in different ways to make different overall workflows.)

Some reasons to use external sub-DAGs or splices:

  • Create a workflow from separate sub-workflows
  • Dynamically create parts of the workflow (external sub-DAGs only)
  • Re-try multiple nodes as a unit (external sub-DAGs only)
  • Short-circuit parts of the workflow (external sub-DAGs only)
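
To make the list above more concrete, here is a minimal sketch of a top-level DAG file that uses both features. It is an illustration only, not taken from a real workflow: all node names and file names (setup.sub, inner.dag, make_inner_dag.sh, analysis.dag) are hypothetical.

```
# top.dag (all node and file names here are hypothetical)
JOB Setup setup.sub

# External sub-DAG: runs as its own DAGMan instance.
# inner.dag does not have to exist when top.dag is submitted,
# so a PRE script on the node can generate it dynamically.
SUBDAG EXTERNAL Inner inner.dag
SCRIPT PRE Inner make_inner_dag.sh

# Re-try the whole sub-workflow as a unit, up to 2 more times.
RETRY Inner 2

# Splice: the nodes of analysis.dag are merged into this DAG,
# so analysis.dag must already exist at submission.
SPLICE Analysis analysis.dag

PARENT Setup CHILD Inner
PARENT Inner CHILD Analysis
```

In this sketch the PRE script and the RETRY line are attached to the external sub-DAG node; per the comparison below, neither is available on a splice.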

Feature comparison

Here's a table comparing external sub-DAGs and splices. Note that the bold entries are the ones that are advantageous for a given feature.

Feature | External sub-DAGs | Splices | Notes
--------|-------------------|---------|------
Ability to incorporate separate sub-workflow files | yes | yes |
Rescue DAG(s) created upon failure | yes | yes |
DAG recovery (e.g., from submit machine crash) | yes | yes |
Creates multiple DAGMan instances in the queue | yes | no |
Possible combinatorial explosion of dependencies (see below) | no | yes | Until we implement socket nodes for splices
Sub-workflow files must exist at submission | no | yes |
PRE/POST scripts allowed on sub-workflows | yes | no | Until we implement socket nodes for splices
Ability to retry sub-workflows | yes | no |
Job/script throttling applies across entire workflow | no | yes |
Separate job/script throttles for each sub-workflow | yes | no |
Node categories can apply across entire workflow | no | yes |
Ability to set priority on sub-workflows as nodes | yes | no |
Ability to reduce workflow memory footprint | yes? | no | If used properly
Ability to have separate final nodes in sub-workflows | yes | no |
Ability to abort sub-workflows individually | yes | no |
Ability to associate variables with sub-workflow nodes | yes | no |
Ability to configure sub-workflows individually | yes | no | Can be good or bad
Separate node status files, etc., for sub-workflows | yes | no |
A single halt file or condor_hold suspends the entire workflow | no | yes |

Possible combinatorial explosion of dependencies

When one splice is the immediate parent of another splice, it is possible for an extremely large number of dependencies to be created. This is because every "terminal" node of the parent splice becomes a parent of every "initial" node in the child splice. (A "terminal" node is a node that has no children within its splice, and an "initial" node is a node that has no parents within its splice.) In general, a parent splice with m terminal nodes and a child splice with n initial nodes produce m x n dependencies. So, for example, if the parent splice has 1000 "terminal" nodes and the child splice has 1000 "initial" nodes, 1 million dependencies will be created.
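
A scaled-down sketch of the same effect is shown below. The file names, node names, and submit files are hypothetical, and the three files are shown together only for brevity.

```
# x.dag: three independent nodes, so each one is a "terminal" node
JOB X1 x.sub
JOB X2 x.sub
JOB X3 x.sub

# y.dag: three independent nodes, so each one is an "initial" node
JOB Y1 y.sub
JOB Y2 y.sub
JOB Y3 y.sub

# top.dag: one splice is the immediate parent of the other
SPLICE X x.dag
SPLICE Y y.dag
PARENT X CHILD Y
# Each of X1, X2, X3 becomes a parent of each of Y1, Y2, Y3,
# so this single PARENT/CHILD line yields 3 x 3 = 9 dependencies.
```

With three nodes on each side the cost is negligible, but the count grows as the product of the two sizes, which is how 1000 terminal nodes and 1000 initial nodes reach 1 million dependencies.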

Should I use external sub-DAGs or splices?

The simple answer is that, unless you need one of the features that's available with external sub-DAGs but not with splices (see the table above), you should use splices. Splices are generally simpler and have less overhead than external sub-DAGs (unless the workflow is specifically designed to minimize the external sub-DAG overhead). Also, workflow-wide throttling is generally more useful than separate throttles for sub-parts of the workflow.
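
As a rough illustration of workflow-wide throttling with splices, the sketch below limits one kind of node across two spliced sub-workflows. The file and node names are hypothetical, and it assumes the convention described in the DAGMan manual that a category name beginning with "+" is shared across splices rather than being scoped to a single splice; check the manual for your HTCondor version before relying on it.

```
# stage1.dag (hypothetical)
JOB FetchA fetch_a.sub
CATEGORY FetchA +transfer

# stage2.dag (hypothetical)
JOB FetchB fetch_b.sub
CATEGORY FetchB +transfer

# top.dag (hypothetical)
SPLICE Stage1 stage1.dag
SPLICE Stage2 stage2.dag
# Assumed to cap the number of submitted "+transfer" nodes at 5
# across both splices, because the category name starts with "+".
MAXJOBS +transfer 5
```

If Stage1 and Stage2 were external sub-DAGs instead, each would run in its own DAGMan instance with its own throttles, so a single limit like this would not span both.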

How to use external sub-DAGs to reduce workflow memory footprint

(Coming soon!)

Note: This document is valid for HTCondor version 8.5.5.