HTCondor Week 2021 Summary

Monday

Introduction and Guidelines

Speaker(s): Mark Coatsworth ( University of Wisconsin )

Plenary: Accelerating SARS-CoV-2 variant sequencing with HTCondor

Speaker(s): Dave O'Connor ( UW Medical Foundation Professor of Pathology and Laboratory Medicine, UW-Madison )

Welcome to HTCondor Week 2021

Speaker(s): Miron Livny ( UW-Madison CHTC )

What's New? What's Improved?

Speaker(s): Todd Tannenbaum ( University of Wisconsin )

HTC Philosophy

Speaker(s): Greg Thain ( UW-Madison CHTC )

Introducing the HTCondor 9.0 Series (Users)

Speaker(s): Christina Koch ( UW-Madison )

Introducing the HTCondor 9.0 Series (Admins)

Speaker(s): Greg Thain ( Center for High Throughput Computing )

Security in HTCondor 9.0

Speaker(s): Brian Bockelman ( Morgridge Institute for Research )

Town Hall Discussion: Authorization and Identity

Speaker(s): Miron Livny ( UW-Madison CHTC ) Brian Bockelman ( Morgridge Institute for Research ) Jim Basney ( University of Illinois Urbana-Champaign ) Frank Würthwein ( UCSD / Open Science Grid ) Jeny Teheran ( Fermilab )

Tuesday

Campus Research and Facilitation

Speaker(s): Lauren Michael ( UW-Madison CHTC )

HTCondor Tutorials on YouTube

Speaker(s): Christina Koch ( UW-Madison CHTC )

Running COPASI biochemical simulations with HTCondor

COPASI is a widely used simulator for chemical and
biochemical reaction networks based on ordinary differential equations
or stochastic methods. It includes various analysis methods such as
optimization, parameter estimation, sensitivity analysis, and several
others. While COPASI is mostly used in a standalone GUI-based mode,
several compute-intensive tasks benefit from parallelization. We created
a web-based system that facilitates transforming such tasks into
smaller sub-tasks that can be run independently. This system then allows
the user to submit these tasks to HTCondor from the web interface, and
assembles the numerical results in their expected order. Thus the end
user never has to interact directly with HTCondor.
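
As a rough illustration of the split-and-submit pattern the abstract describes (not the actual COPASI web system), the sketch below queues independent sub-tasks through the HTCondor Python bindings and reads the results back in order. The wrapper script name and chunk count are placeholder assumptions.

```python
import htcondor  # HTCondor Python bindings (9.x Schedd.submit API)

N_CHUNKS = 100  # hypothetical number of independent sub-tasks

# One sub-task per chunk; the wrapper script is a placeholder for whatever
# invokes the simulator on a slice of the parameter space.
sub = htcondor.Submit({
    "executable": "copasi_chunk.sh",
    "arguments": f"--chunk $(ProcId) --total {N_CHUNKS}",
    "output": "results/chunk_$(ProcId).out",
    "error": "logs/chunk_$(ProcId).err",
    "log": "logs/scan.log",
    "request_cpus": "1",
    "request_memory": "1GB",
})

schedd = htcondor.Schedd()
result = schedd.submit(sub, count=N_CHUNKS)
print("Submitted cluster", result.cluster())

# Once all jobs finish, numerical results can be reassembled in their
# expected order:
# ordered = [open(f"results/chunk_{i}.out").read() for i in range(N_CHUNKS)]
```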

Speaker(s): Pedro Mendes ( UConn Health )

Advancing data intensive science at George Washington University

Speaker(s): Clark Gaylord ( George Washington University )

High-throughput horizon screening for invasive species at USGS

The US Geological Survey (USGS) is currently leading a horizon scan for new invasive species in the United States (US). This horizon scan uses climate matching to assess how the climate in a potential invasive species’ non-US range matches the climate in different parts of the US. We developed a high-throughput assessment using HTCondor to examine 8,000+ species. We will describe our workflow and how we created an R package and used Docker and HTCondor for the assessment, and provide suggestions for others wanting to use R with HTCondor.
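
For readers wanting to combine R, Docker, and HTCondor as the abstract suggests, here is a minimal sketch using the HTCondor Python bindings and the docker universe. The container image, script, and data layout are illustrative assumptions, and the itemdata form of Schedd.submit assumes the newer (9.x) bindings.

```python
import htcondor

# Docker-universe submit description for an R-based climate-match job.
sub = htcondor.Submit({
    "universe": "docker",
    "docker_image": "rocker/r-ver:4.0.5",  # assumed public R image
    "executable": "climate_match.R",       # assumed to start with #!/usr/bin/env Rscript
    "arguments": "$(species)",
    "should_transfer_files": "YES",
    "transfer_input_files": "climate_match.R, data/$(species).csv",
    "output": "out/$(species).out",
    "error": "err/$(species).err",
    "log": "climate_match.log",
    "request_memory": "2GB",
})

# One job per species; $(species) is filled in from itemdata.
species_rows = [{"species": s} for s in ("species_0001", "species_0002")]
htcondor.Schedd().submit(sub, itemdata=iter(species_rows))
```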

Speaker(s): Richard Erickson ( USGS )

Building better tools with the help of the Open Science Grid

Speaker(s): Nick Cooley ( University of Pittsburgh )

Evolution of the CMS Submission Infrastructure to Support Heterogeneous Resources

The landscape of computing power available for the CMS experiment is already evolving from almost exclusively x86 processors, predominantly deployed at WLCG sites, towards a more diverse mixture of Grid, HPC and Cloud facilities incorporating a higher fraction of non-CPU components, such as GPUs. The CMS Global Pool is consequently adapting to this heterogeneous resource scenario, aiming to make the new resource types available to CMS. An optimal level of granularity in resource description and matchmaking strategy will be essential to ensure efficient allocation to CMS workflows. Current uncertainties include what types of resources will be available in the future, how to prioritize diverse workflows across those diverse resource types, and how to deal with the varied policy preferences of resource providers. This contribution will describe the present capabilities of the CMS Submission Infrastructure and its critical dependencies on the underlying tools (such as HTCondor and GlideinWMS), along with its necessary evolution towards full integration and support of heterogeneous resources according to CMS needs.
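
As a generic illustration of how a job can describe non-CPU needs to the HTCondor matchmaker (this is not CMS's actual GlideinWMS configuration), a GPU request in a submit description might look like the following; the wrapper script name is a placeholder.

```python
import htcondor

# Illustrative GPU-requesting job: request_* attributes drive matchmaking
# against the resources each slot advertises.
sub = htcondor.Submit({
    "executable": "cms_payload.sh",        # hypothetical workflow wrapper
    "request_cpus": "8",
    "request_gpus": "1",                   # advertise the need for a GPU slot
    "requirements": 'TARGET.Arch == "X86_64"',
    "output": "wf.out",
    "error": "wf.err",
    "log": "wf.log",
})
print(sub)  # inspect the generated submit description
```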

Speaker(s): Marco Mascheroni ( CERN )

LIGO Monitoring and the Grafana Dashboard

Speaker(s): Michael Thomas ( LIGO )

HTCondor in a Digitization Workflow: Helping Preserve Cultural Heritage

Digitization is an important aspect of the preservation and promotion of heritage materials. Once physical documents are too fragile or damaged to manipulate, the digital copy often becomes the only version that is available to the public. The digitization workflow must produce files that reliably meet high standards.

HTCondor's combination of cycle scavenging and distributed computing allows the digital collections team to complete tasks faster with a small pool of 50 available workstations. The team submits projects to HTCondor through a web server that automatically prepares the submit file and input list.

Each task launches a Java application that handles file verification and executes tools such as Tesseract (optical character recognition), FFmpeg (audio and video conversion) or ImageMagick (image conversion). Once the project is complete, the web server prepares a report using custom exit codes and notifies the owner.

After being processed through HTCondor, the files are ready to be preserved for future generations. The projects for which the institution has dissemination rights then become available through our web platform: https://numerique.banq.qc.ca/patrimoine/
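
A minimal sketch of what such a generated submission might look like, using the HTCondor Python bindings: one job per file in the input list. The wrapper script, file layout, and exit-code mapping are assumptions for illustration, not BAnQ's actual implementation.

```python
import htcondor

# One job per input file; the wrapper is assumed to launch the Java application.
sub = htcondor.Submit({
    "executable": "run_task.sh",
    "arguments": "$(input_file)",
    "transfer_input_files": "$(input_file)",
    "should_transfer_files": "YES",
    "when_to_transfer_output": "ON_EXIT",
    "output": "logs/$(Cluster).$(Process).out",
    "error": "logs/$(Cluster).$(Process).err",
    "log": "project.log",
})

with open("input_list.txt") as f:  # one file path per line, prepared by the web server
    items = [{"input_file": line.strip()} for line in f if line.strip()]

htcondor.Schedd().submit(sub, itemdata=iter(items))

# A report step could then map each job's ExitCode (e.g. 0 = success,
# 2 = OCR failure) from the job log or history into a summary for the owner.
```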

Speaker(s): David Lamarche ( Bibliothèque et Archives nationales du Québec )

Wednesday

Upgrading to HTCondor 9.0

Speaker(s): Todd Miller ( CHTC )

Managing Dropbox Workflows with HTCondor

Dropbox-driven workflows, where the appearance of new files in a given directory triggers work to be done on those inputs, are common in many contexts. Customarily these are implemented with cron jobs or a system service daemon. The HTCondor platform has a number of features, such as built-in e-mail notifications, “crondor” for repeating jobs, and a well-conceived model of jobs and resources, which make building Dropbox workflows easier and the result far more manageable.
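
For instance, a minimal "crondor" sketch (script name, schedule, and address are placeholders, not the talk's actual setup): the cron_* submit commands schedule the job cron-style, and on_exit_remove = false keeps it queued so the schedule repeats.

```python
import htcondor

# Rescan the dropbox directory every five minutes and e-mail on failure.
sub = htcondor.Submit({
    "executable": "scan_dropbox.sh",  # assumed script that processes newly arrived files
    "cron_minute": "*/5",             # cron-style schedule, as in crontab
    "on_exit_remove": "false",        # stay in the queue so the job repeats
    "notification": "Error",          # built-in e-mail notification on failure
    "notify_user": "owner@example.org",
    "output": "scan.out",
    "error": "scan.err",
    "log": "scan.log",
})
htcondor.Schedd().submit(sub)
```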

The techniques I will describe were developed during 2020 to support an automated, AI-driven visual and X-ray inspection process for silicon wafer and other component production. The process delivered $50 million worth of benefits by reducing SME work and improving product quality and manufacturing yield, as the data gathered was fed back into the component design, and it was recognized with a prestigious Raytheon Missiles & Defense CIO Award.

Speaker(s): Michael Pelletier ( Raytheon )

Python Workflows and htcondor.dags

Speaker(s): Patrick Godwin ( LIGO )

Pegasus 5.0 + Ensemble Manager

Pegasus 5.0 is the latest stable release of Pegasus, released in November 2020. A key highlight of this release is a brand-new Python 3 based Pegasus API that allows users to compose workflows and to control their execution programmatically. This talk will give an overview of the new API and highlight key improvements that address system usability (including comprehensive yet easy-to-navigate documentation and training), the development of core functionality for improving the management and processing of large, distributed data sets, and the management of experiment campaigns defined as ensembles.
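
A minimal sketch of the new Python API; the transformation name and files are illustrative and assume a populated transformation catalog.

```python
from Pegasus.api import *

# Declare workflow files and a single job.
fa = File("f.a")
fb = File("f.b")

job = (
    Job("preprocess")            # transformation assumed to be in the catalog
    .add_args("-i", fa, "-o", fb)
    .add_inputs(fa)
    .add_outputs(fb)
)

wf = Workflow("diamond")
wf.add_jobs(job)

# Plan and submit programmatically instead of via the pegasus-plan CLI.
wf.plan(submit=True)
```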

Speaker(s): Karan Vahi ( Pegasus Team - USC )

Office Hours

Town Hall Discussion: Multiple GPU Jobs

Speaker(s): David Schultz ( UW-Madison WIPAC ) John Knoeller ( University of Wisconsin, Madison ) Josh Willis ( LIGO ) Todd Miller ( UW-Madison CHTC )

Pegasus Tutorial

Speaker(s): Karan Vahi ( Pegasus Team - USC Information Sciences Institute )

Thursday

dHTC for LHAASO Experiments

Speaker(s): Jingyan Shi ( IHEP )

Unchaining JupyterHub: Running notebooks on resources without inbound connectivity

JupyterLab has become an increasingly popular platform for rapid prototyping, teaching algorithms or sharing small analyses in a self-documenting manner.

However, it is commonly operated using dedicated cloud-like infrastructure (e.g. Kubernetes), which often needs to be maintained in addition to existing HTC systems. Furthermore, federation of resources and opportunistic usage are not possible due to the requirement of direct inbound connectivity to the execute nodes.

This talk presents a new, open development in the context of the JupyterHub batchspawner:
by extending the existing functionality to leverage the connection broker of the HTCondor batch system, the requirement for inbound connectivity to the execute nodes can be dropped; only outbound connectivity to the Hub is needed.

Combined with a container runtime leveraging user namespaces, unprivileged CVMFS and the HTCondor file transfer mechanism, notebooks can not only be executed directly on existing local HTC systems, but also on opportunistically usable resources such as HPC centres or clouds via an overlay batch system.

The presented prototype paves the way towards a federation of heterogeneous and distributed resources behind a single point of entry.
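
A rough sketch of how a JupyterHub deployment might point the batchspawner at HTCondor; the batch_script below is abbreviated and illustrative, not the exact configuration presented in the talk.

```python
# jupyterhub_config.py -- sketch only; assumes the batchspawner package.
# (the `c` config object is provided by JupyterHub at load time)
c.JupyterHub.spawner_class = "batchspawner.CondorSpawner"

# HTCondor submit description template; {cmd}, {nprocs}, and {memory}
# are filled in by batchspawner from the spawner's request traits.
c.CondorSpawner.batch_script = """
executable = /bin/sh
arguments = "-c 'exec {cmd}'"
request_cpus = {nprocs}
request_memory = {memory}
output = .jupyterhub.condor.out
error = .jupyterhub.condor.err
queue
"""
```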

Speaker(s): Oliver Freyermuth ( University of Bonn )

Improving Kubernetes support for batch scheduling of high throughput and parallel jobs

Kubernetes is an open source cluster orchestration system whose popularity stems in part from the fact that it acts as a standard resource management interface across cloud providers and on-premises data centers. Originally designed by Google, which remains a major contributor, Kubernetes is now governed by the Cloud Native Computing Foundation. There is significant interest in managing HTCondor services and scheduling user jobs in Kubernetes clusters; these solutions often rely on running standard HTCondor daemons inside a container or developing custom Kubernetes operators to bridge the two services. We will describe recent (1.21) and planned (1.22+) contributions to improve direct support for batch scheduling of high throughput and parallel jobs, as well as developments in our Google Kubernetes Engine product, which offers Kubernetes clusters with reduced management overhead.
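
One of the 1.21 batch-scheduling improvements, Indexed Jobs, can be sketched with the Kubernetes Python client as follows; the job name, image, and command are placeholders, not material from the talk.

```python
from kubernetes import client, config

# Create an Indexed Job (completionMode added in Kubernetes 1.21):
# each pod receives its own JOB_COMPLETION_INDEX, enabling high-throughput
# "one chunk per completion" workloads.
config.load_kube_config()

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="indexed-demo"),
    spec=client.V1JobSpec(
        completions=10,
        parallelism=5,
        completion_mode="Indexed",
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[client.V1Container(
                    name="worker",
                    image="busybox",
                    command=["sh", "-c", "echo processing chunk $JOB_COMPLETION_INDEX"],
                )],
            ),
        ),
    ),
)
client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```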

Speaker(s): Abdullah Gharaibeh ( Google Cloud )

HTCondor CE 5 and Job Router Transform Language

Speaker(s): Brian Lin ( UW-Madison CHTC ) John (TJ) Knoeller ( UW-Madison CHTC )

ML, Image Analysis for Livestock Data

Speaker(s): Joao Dorea ( UW-Madison Animal & Dairy Sciences )

Scaling Virtual Screening to Ultra-Large Virtual Chemical Libraries

Progress in chemical synthesis strategies has given rise to vast “make-on-demand” chemical libraries. Such libraries, now virtual, are bounded only by synthetic feasibility and are growing exponentially. Making and testing significant portions of such libraries on a new drug target is not feasible. We increasingly rely on computational approaches called virtual screening methods to help us navigate large chemical spaces and to prioritize the most promising molecules for testing. The main challenge now is to scale existing virtual screening methods, or develop new ones, with sufficient molecule throughput and scoring accuracy to accommodate ultra-large compound libraries. Here I will describe some promising approaches that leverage high-throughput computing to meet this challenge.

Speaker(s): Spencer Ericksen ( UW-Carbone Cancer Center, Drug Development Core, Small Molecule Screening Facility )

Using high-throughput computing to develop precision mental health algorithms

Speaker(s): Gaylen Fronk ( UW-Madison Addiction Research Center )

IceCube Glideins and Autonomous Pilots

Speaker(s): Benedikt Riedel ( UW-Madison WIPAC )

Closing Remarks

Speaker(s): Miron Livny ( UW-Madison CHTC )