How To Chirp Remote IO

How to do remote IO in the vanilla universe with Chirp and Parrot (on the CHTC submit node)

The directions below work for the CHTC submit node.

One of the neat features of the standard universe is remote IO: you can read from and write to files in your submit directory without transferring them back and forth between the submit node and the execute node.

This howto briefly describes how you can do remote IO in the vanilla universe using Parrot and condor_chirp, which is integrated into HTCondor. All of this should work with HTCondor versions 7.6.0 and newer.

What is required?

All you need is HTCondor version 7.6.0 or newer and Parrot. The Parrot build must match the architecture of your execute nodes. It is more convenient to install Parrot directly on the execute nodes, but you can also transfer it with your job data. A small caveat: Parrot only works on Linux, so your execute machines need to run some flavor of Linux. Your submit node, however, can run any OS that is supported by HTCondor 7.6.0.

How to create your submit file?

To have your job do remote IO, all you need to do now is create a wrapper so that your job runs under Parrot. The easiest way is a script such as:

#!/bin/sh
######## wrapper.sh ################
# This script directly passes all
# its arguments into 'parrot_run'
####################################

./parrot_run "$@"

Ultimately, this script will be your job executable and you'll specify your actual program to be run (let's assume its name is 'foo') and its arguments as job args:

#####################
#### condor.submit
#####################

universe = vanilla
executable = wrapper.sh
arguments = foo arg1 arg2 arg3
+WantIOProxy = True
...

+WantIOProxy = True triggers startup of the Chirp service, which is needed for remote IO, so make sure it is in your submit file too.
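
The wrapper.sh shown earlier assumes that parrot_run is present and executable in the job's working directory. As a minimal sketch, a slightly more defensive variant could check this first, so that a missing binary produces a clear message in the job's stderr. The PARROT_RUN variable and the run_under_parrot function are hypothetical names introduced here for illustration; they are not part of HTCondor or Parrot.

```shell
#!/bin/sh
######## wrapper.sh (defensive sketch) ########
# PARROT_RUN is a hypothetical override for the location of parrot_run;
# by default it is expected in the job's working directory.
PARROT_RUN="${PARROT_RUN:-./parrot_run}"

# run_under_parrot is a hypothetical helper: it refuses to start when
# parrot_run is missing or not executable, otherwise it passes all of
# its arguments straight to parrot_run.
run_under_parrot() {
    if [ ! -x "$PARROT_RUN" ]; then
        echo "wrapper.sh: $PARROT_RUN is missing or not executable" >&2
        return 127
    fi
    "$PARROT_RUN" "$@"
}

# Only start the job when a program to run was actually given.
if [ "$#" -gt 0 ]; then
    run_under_parrot "$@"
fi
```

Used as the job executable, this behaves like the original wrapper when parrot_run is in place, and exits with status 127 otherwise.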

Now, the only thing left to know: remote IO does not just happen in your current working directory, as you may be used to from your standard universe jobs. Files in your submit directory are reached through the prefix /chirp/CONDOR/ followed by the directory's path on the submit node. So whenever you want to read a file from there or write to there, you'll have to prefix your file name with this path.

Example: Your application wants to read the file input/inputRead.dat and write to output/outWrite.dat, and you pass these as arguments. Typically you would call:

./foo input/inputRead.dat output/outWrite.dat

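Now you will need to pass the following instead, where $ENV(PWD) is submit file syntax that expands to the directory you submit from: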

./foo /chirp/CONDOR/$ENV(PWD)/input/inputRead.dat /chirp/CONDOR/$ENV(PWD)/output/outWrite.dat

This will be easier for you if your application does not have paths encoded statically but rather takes them as arguments (or environment variables).
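
Writing the prefixed paths by hand gets tedious when there are many files. As a minimal sketch, a small shell helper run on the submit node can build them for you. The function name chirp_path is hypothetical (it is not part of HTCondor or Parrot), and the sketch assumes you run it from the submit directory, mirroring the /chirp/CONDOR/$ENV(PWD)/ form used above.

```shell
#!/bin/sh
# chirp_path is a hypothetical helper, not part of HTCondor or Parrot.
# Run it on the submit node from the submit directory: it prints the
# remote IO path for a file given relative to that directory, using the
# same layout as /chirp/CONDOR/$ENV(PWD)/... in the submit file.
chirp_path() {
    printf '/chirp/CONDOR/%s/%s\n' "$PWD" "$1"
}

# Print the paths to pass to the application from the example.
chirp_path input/inputRead.dat
chirp_path output/outWrite.dat
```

The printed paths can then be pasted into the arguments line of the submit file or exported as environment variables, depending on how your application expects its file names.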

Complete submit file example

Last but not least, a complete example submit file combining all the steps above:

#####################
#### condor.submit
#####################

universe = vanilla
executable = wrapper.sh
arguments = foo /chirp/CONDOR/$ENV(PWD)/input/inputRead.dat /chirp/CONDOR/$ENV(PWD)/output/outWrite.dat
+WantIOProxy = True
should_transfer_files = YES
when_to_transfer_output = ON_EXIT

# in this example we transfer parrot_run with the application
# and assume it is located in /usr/local/bin
transfer_input_files = /usr/local/bin/parrot_run,foo

queue

Additional Resources

There are a variety of other useful things you can do with Chirp and Parrot. Remote IO is only one of them.