Condor
University of Wisconsin-Madison
NCSA NINTH ANNUAL INDUSTRIAL PARTNER EXECUTIVE MEETING
APRIL 28, 1997 - MAY 1, 1997
A New PACI Partner Opportunity
KEY CONCEPTS
High Throughput Computing
Large amounts of processing capacity sustained over a very long period of time (measured in FLOPY = Floating Point Operations per Year).
- Higher Throughput enables better research, better designs, and better products.
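As a rough illustration of the FLOPY measure, the sketch below converts a sustained floating point rate into a yearly total. The function name, the sustained rate, and the availability fraction are hypothetical inputs chosen for illustration, not Condor measurements.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds in a non-leap year


def flopy(sustained_flops: float, availability: float = 1.0) -> float:
    """Floating point operations delivered in one year.

    sustained_flops: average rate while the resource is running (hypothetical input)
    availability:    fraction of the year the resource is actually usable
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return sustained_flops * availability * SECONDS_PER_YEAR


# A fast machine that is rarely available can deliver fewer FLOPY than a
# slower one that runs all year, which is the point of the HTC metric.
burst_machine = flopy(sustained_flops=1e9, availability=0.05)
steady_machine = flopy(sustained_flops=1e8, availability=0.90)
print(burst_machine < steady_machine)
```

The comparison at the end is the design point: what matters over a year is the rate multiplied by the time the capacity is actually delivered, not the peak rate alone.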
Resource Fragmentation
- As an organization’s aggregate distributed compute power increases, the percentage accessible to any given individual decreases.
(due to distributed ownership)
The Condor System
Supporting High-Throughput Computing in large, distributively owned environments.
- Harnesses the power of existing non-dedicated resources.
- Does not require (in most cases) changes to the application.
- Adapts to local requirements and policies.
- Supports sequential and parallel (PVM) applications.
A culture, not just a technology
Novel Layered Design
Unique Mechanisms
- Checkpointing
- Enables preemptive-resume resource allocation (essential in an opportunistic environment)
- Remote I/O
- Enables computation across administrative domains (essential for HTC)
- ClassAds
- Enables flexible resource matchmaking (essential in a distributively owned environment)
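The matchmaking idea behind ClassAds can be sketched in a few lines. This is a simplified illustration, not the ClassAd language or Condor's implementation: the `Ad` class, the attribute names (`Arch`, `OpSys`, `MemoryMB`, `KeyboardIdleMin`), and the 15-minute idle policy are all assumptions made for the example. Each side advertises its attributes plus a requirements predicate, and a match needs both predicates to hold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Attrs = Dict[str, object]


@dataclass
class Ad:
    """Hypothetical stand-in for an advertisement: attributes plus policy."""
    attrs: Attrs
    requirements: Callable[[Attrs, Attrs], bool]  # (my attrs, other attrs) -> acceptable?
    rank: Callable[[Attrs, Attrs], float]         # (my attrs, other attrs) -> preference


def match(job: Ad, machines: List[Ad]) -> Optional[Ad]:
    """Return the machine the job ranks highest among mutually acceptable ones."""
    candidates = [
        m for m in machines
        if job.requirements(job.attrs, m.attrs)   # job accepts the machine
        and m.requirements(m.attrs, job.attrs)    # owner's policy accepts the job
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda m: job.rank(job.attrs, m.attrs))


# Assumed owner policy: only lend the workstation after 15 idle minutes.
def owner_policy(my: Attrs, other: Attrs) -> bool:
    return my["KeyboardIdleMin"] >= 15


def no_preference(my: Attrs, other: Attrs) -> float:
    return 0.0


def machine(name: str, arch: str, opsys: str, mem: int, idle: int) -> Ad:
    attrs = {"Name": name, "Arch": arch, "OpSys": opsys,
             "MemoryMB": mem, "KeyboardIdleMin": idle}
    return Ad(attrs, owner_policy, no_preference)


machines = [
    machine("ws1", "SPARC", "Solaris", 128, 2),   # owner is at the keyboard
    machine("ws2", "SPARC", "Solaris", 64, 40),
    machine("ws3", "Intel", "Linux", 256, 90),    # wrong platform for this job
    machine("ws4", "SPARC", "Solaris", 96, 30),
]

job = Ad(
    attrs={"Owner": "alice", "MemoryMB": 64},
    requirements=lambda my, other: (other["Arch"] == "SPARC"
                                    and other["OpSys"] == "Solaris"
                                    and other["MemoryMB"] >= my["MemoryMB"]),
    rank=lambda my, other: float(other["MemoryMB"]),  # prefer more memory
)

best = match(job, machines)
print(best.attrs["Name"] if best else "no match")
```

Keeping the owner's policy inside the machine's own advertisement is what lets each workstation owner set different rules without any change to the matchmaker.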
The Condor Team
- More than 10 years of Experience with a production HTC Environment.
- Close interaction with owners, customers, and system administrators worldwide.
- Maintain a large HTC environment.
The Condor UW-Flock
- >500 Desktop Workstations
- Multiple Administrative Domains
- Supported by the UW Graduate School
- Architectures/Operating Systems
- Sun SPARC/Solaris
- HP PA-RISC/HP-UX
- SGI/IRIX
- DEC Alpha/Digital UNIX
- Intel/Linux
- Intel/Solaris
- Soon… Windows NT!
maximizing the utility of
UW-Flock Engineering Pool
- Pool Profile
- 190 HP Workstations
- 35 active users
- 120 CPU-days per day
- 90 jobs completed per day
- Average Job Profile
- Consumed 26 CPU-hours
- 44 hours response time
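The pool profile figures can be combined into a few derived quantities. The sketch below is only arithmetic on the numbers above; treating CPU-days per day divided by workstation count as a utilization figure assumes one CPU per workstation, which the profile does not state.

```python
# Figures taken from the pool profile above.
WORKSTATIONS = 190
CPU_DAYS_PER_DAY = 120
JOBS_PER_DAY = 90
AVG_CPU_HOURS_PER_JOB = 26
AVG_RESPONSE_HOURS = 44

# Assumption: one CPU per workstation, so this approximates the average
# fraction of the pool delivering cycles to Condor jobs.
pool_utilization = CPU_DAYS_PER_DAY / WORKSTATIONS

# CPU time accounted for by the jobs that complete in a typical day.
completed_cpu_days = JOBS_PER_DAY * AVG_CPU_HOURS_PER_JOB / 24

# Wall-clock hours elapsed per CPU-hour consumed by an average job.
response_to_cpu_ratio = AVG_RESPONSE_HOURS / AVG_CPU_HOURS_PER_JOB

print(f"utilization:            {pool_utilization:.0%}")
print(f"completed CPU-days/day: {completed_cpu_days:.1f}")
print(f"response / CPU time:    {response_to_cpu_ratio:.2f}")
```

The profile does not say how its averages were taken, so the CPU-days implied by completed jobs need not equal the 120 CPU-days per day delivered by the pool.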
Customer Example: KIVA
- “Condor enabled us to push the futility-point farther away enabling KIVA to be used as a design tool.”
- Dan Mather, UW Engine Research Center
- futility-point: the point at which the time needed for multiple computations to complete so far exceeds the time available that the computations are not even worth considering
KIVA: a state-of-the-art computational fluid dynamics code for modeling two-phase, turbulent, reacting flows within the combustion chambers of diesel engines.
Customer Example: KIVA
- In 2 weeks, the Engineering Condor Pool delivered to the KIVA customer:
- 420 completed KIVA runs
- Average run consumed 25 CPU-hours
- Average run response time: 36 hours
- Overall: ~30 CPU-days/day
- Other customer experiences: http://www.cs.wisc.edu/condor/stories.html
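The KIVA figures above can be cross-checked with simple arithmetic. The sketch assumes "2 weeks" means 14 calendar days; everything else comes directly from the list.

```python
# Figures from the KIVA customer example above.
RUNS = 420
AVG_CPU_HOURS_PER_RUN = 25
AVG_RESPONSE_HOURS = 36
DAYS = 14  # assumption: "2 weeks" taken as 14 calendar days

total_cpu_hours = RUNS * AVG_CPU_HOURS_PER_RUN   # 10,500 CPU-hours
total_cpu_days = total_cpu_hours / 24            # 437.5 CPU-days
cpu_days_per_day = total_cpu_days / DAYS         # 31.25 CPU-days per day

# Wall-clock hours elapsed per CPU-hour consumed by an average run.
response_to_cpu_ratio = AVG_RESPONSE_HOURS / AVG_CPU_HOURS_PER_RUN

print(f"{cpu_days_per_day:.2f} CPU-days/day")
print(f"{response_to_cpu_ratio:.2f} hours of response time per CPU-hour")
```

The result of 31.25 CPU-days per day agrees with the quoted overall rate of about 30 CPU-days/day.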
Joint Projects/Opportunities
- Establish and support Condor Pools at Industrial Partners.
- Identify Industrial HTC applications and adapt them to the Condor environment.
- Develop and Implement customized HTC layers to meet special Industrial requirements.
Condor Takes Flight at NCSA