Condor Tutorial
First EuroGlobus Workshop
June 2001
Tutorial Outline
The Condor Project (Established ’85)
What is High-Throughput Computing?
What is Condor?
The Condor System
Some HTC Challenges
What is ClassAd Matchmaking?
Upgrade to Condor-G
What Have We Done on the Grid Already?
NUG30 Solved on the Grid with Condor + Globus
NUG30 - Solved!!!
The Idea
Meet Frieda.
Frieda’s Application …
I have 600 simulations to run.
Where can I get help?
Installing Condor
So Frieda Installs Personal Condor on her machine…
Personal Condor?!
What’s the benefit of a Condor “Pool” with just one user and one machine?
Your Personal Condor will ...
Getting Started: Submitting Jobs to Condor
Making your job batch-ready
Creating a Submit Description File
Simple Submit Description File
Running condor_submit
Another Submit Description File
“Clusters” and “Processes”
Example Submit Description File for a Cluster
Submit Description File for a BIG Cluster of Jobs
Using condor_rm
Temporarily halt a Job
Using condor_history
Getting Email from Condor
Getting Email from Condor (cont’d)
A Job’s life story: The “User Log” file
Sample Condor User Log
Uses for the User Log
Condor JobMonitor Screenshot
Job Priorities w/ condor_prio
Want other Scheduling possibilities?
Extend with the Scheduler Universe
DAGMan
What is a DAG?
Defining a DAG
Submitting a DAG
Running a DAG
Running a DAG (cont’d)
Recovering a DAG
Recovering a DAG (cont’d)
Finishing a DAG
Additional DAGMan Features
We’ve seen how Condor will
What if each job needed to run for 20 days?
What if I wanted to interrupt a job with a higher priority job?
Condor’s Standard Universe to the rescue!
Process Checkpointing
Relinking Your Job for submission to the Standard Universe
Limitations in the Standard Universe
When will Condor checkpoint your job?
What Condor Daemons are running on my machine, and what do they do?
Condor Daemon Layout
condor_master
condor_master (cont’d)
condor_startd
condor_schedd
condor_collector
condor_negotiator
Happy Day! Frieda’s organization purchased a Beowulf Cluster!
Layout of the Condor Pool
condor_status
Frieda tries out parallel jobs…
The Boss says Frieda can add her co-workers’ desktop machines into her Condor pool as well… but only if they can also submit jobs.
Layout of the Condor Pool
Some of the machines in the Pool do not have enough memory or scratch disk space to run my job!
Specify Requirements!
Specify Rank!
How can my jobs access their data files?
Access to Data in Condor
Remote System Calls
Job Startup
condor_q -io
I am adding nodes to the Cluster… but the Engineering Department has priority on these nodes.
The Machine (Startd) Policy Expressions
Frieda’s Current Settings
Frieda’s New Settings for the Chemistry nodes
Submit file with Custom Attribute
What if “Department” is not specified?
Another example
The Cluster is fine. But not the desktop machines. Condor can only use the desktops when they would otherwise be idle.
So Frieda decides she wants the desktops to:
Macros in the Config File
Desktop Machine Policy
Policy Review
General User Commands
Administrator Commands
CondorView Usage Graph
Back to the Story:
Disaster Strikes!
Frieda Goes to the Grid!
How Flocking Works
Condor Flocking
Condor Flocking, cont.
Condor-G: Globus + Condor
Condor-G Installation: Tell it what you need…
… and watch it go!
Frieda Submits a Globus Universe Job
How It Works
Condor Globus Universe
Globus Universe Concerns
Changes to the Globus JobManager for Fault Tolerance
Globus Universe Fault-Tolerance: Submit-side Failures
Globus Universe Fault-Tolerance: Lost Contact with Remote Jobmanager
Globus Universe Fault-Tolerance: Credential Management
But Frieda Wants More…
Solution: Condor GlideIn
How It Works
GlideIn Concerns
Common Questions, cont.
In Review
Case Study: CMS Production
CMS Physics
ENORMOUS Data Challenges Ahead
Leveraging Grid Resources
Challenges of a CMS Run
CMS Run on the Grid
CMS Run Details
Future Directions
Thank you!