Throughput Computing 2024 Summary

Monday

David Swanson Awardee Talk

Speaker(s): Cort Posnansky ( Penn State )

The Event Horizon Telescope Science Gateway

Speaker(s): Esen Gokpinar-Shelton Jun Wang ( Indiana University )

Introducing Pelican: Powering the OSDF

Speaker(s): Brian Bockelman ( Morgridge Institute for Research )

Thousands of Little Artificial Societies: Experimental Machine Ethics at Scale

Speaker(s): Nate Kremer-Herman ( Seattle University )

Closing Remarks and Announcements

Speaker(s): Frank Wuerthwein ( UC San Diego )

David Swanson Award Introduction

Speaker(s): Ronda Swanson

Large Scale Processing of Water Column Sonar Data to a Cloud Native Zarr Format

Speaker(s): Rudy Kluckik ( CIRES, University of Colorado Boulder, NOAA National Centers for Environmental Information )

"Replace the D with an O!" : No more "Data," "Directories," "Folders" and "Files." Only "Buckets" and "Objects"

Speaker(s): Miron Livny ( UW-Center for High Throughput Computing )

Towards universal mapping of cellular phenotypes with microscopy and large-scale computing

Speaker(s): Juan Caicedo ( Morgridge Institute for Research, UW–Madison )

Challenges in Radio Astronomical Imaging: TeraBytes, PetaFLOPS, and Algorithms

Processing data from interferometric telescopes requires the application of computationally expensive algorithms to relatively large data volumes for imaging the sky at the sensitivity and resolution afforded by current and future telescopes. Therefore, in addition to their numerical performance, algorithms for data processing for imaging must also pay attention to computational complexity and runtime performance. As a result, algorithms R&D involves complex interactions between evolution in telescope capabilities, scientific use cases, and computing hardware and software technologies.

In this talk I will briefly describe the workings of a radio interferometric telescope and highlight the resulting data processing challenges for imaging with next-generation telescopes like the ngVLA. I will then discuss the general data processing landscape, and the algorithms and computing architecture developed by the NRAO Algorithms R&D Group (ARDG) to navigate this landscape with a focus on (near) future needs and on hardware/software technology projections. Recently, in collaboration with the Center for High Throughput Computing, we deployed this architecture on the OSG, PATh, San Diego Supercomputer Center (SDSC) and National Research Platform (NRP) resources to process a large database for the first time. This produced the deepest image ever at radio frequencies of the Hubble Ultra-Deep Field (HUDF). I will also briefly discuss this work, the lessons learnt, and the work in progress for the challenges ahead.

Speaker(s): Sanjay Bhatnagar ( NRAO )

Processing Historic Aerial Photography with High Throughput Computing Resources

Speaker(s): Jim Lacy ( Wisconsin State Cartographer's Office, Dept. of Geography, University of Wisconsin–Madison )

Deployment Scale and Use of OSDF

Speaker(s): Frank Wuerthwein ( UC San Diego )

Developing the HTC Community

Speaker(s): Christina Koch ( UW Madison )

Cache-ing in on SLEAP (Remote Speaker)

Speaker(s): Jarryd Ramborger ( George Lab and Preclinical Addiction Research Consortium (PARC) at University of California, San Diego )

Using Astronomy Big Data and National, Distributed, Equitable Cyberinfrastructure to Drive AI Innovation

Speaker(s): Stanley Dodds ( Institute for Astronomy, University of Hawaii )

Tuesday

Introductory Remarks - LIGO

Speaker(s): Ron Tapia ( Institute for Computational and Data Sciences PSU )

Connecting Pelican to your Data

Speaker(s): Justin Hiemstra ( Morgridge Institute for Research )

Stress testing an OSDF/Pelican origin and OSDF Ops

Speaker(s): Fabio Andrijauskas

Closing Remarks and Announcements

Speaker(s): Brian Bockelman ( Morgridge Institute for Research )

OSG Campus Services and the HTCondor-CE dashboard

Speaker(s): Tim Cartwright ( University of Wisconsin–Madison, OSG ) Todd Tannenbaum ( University of Wisconsin )

Cyberinfrastructure Planning Community of Practice

Speaker(s): Russell Hofmann ( Internet2/MS-CC )

Pelican under the hood: how the data federation works

Speaker(s): Brian Bockelman ( Morgridge Institute for Research )

Collaborations Panel Discussion: Opening Remarks

Speaker(s): Pascal Paschos ( University of Chicago )

NSF’s Campus Cyberinfrastructure program and OSPool/OSDF

Speaker(s): Kevin Thompson ( NSF )

Using NRP to share resources with the OSPool

Speaker(s): Frank Wuerthwein ( UC San Diego )

Collaborations Panel Discussion

Speaker(s):

What did Pelican do to my transfer? A Monitoring Story

Speaker(s): Jason Patton ( University of Wisconsin )

Introductory Remarks - ePIC/EIC

Speaker(s): Sakib Rahman

Introductory Remarks - IceCube

Speaker(s): Benedikt Riedel ( UW-Madison )

Wednesday

Post-DC24 Activities and Plans for DC26 Preparation

Speaker(s): Farrukh Khan ( Fermi National Accelerator Laboratory ) Shawn McKee ( University of Michigan )

Group Discussion USATLAS Operations

See notes https://docs.google.com/document/d/1pjwG1LAjOPWsSrdas4WvYYfoyn8NyLuk5u4L6opNb4Q/edit?tab=t.0#heading=h.o4xh9deafqh6

US cloud Ops Organization
- Effort:
  - What about M. Maeno & Armen?
    - KD - Mayuko is supported by Physics Support for US user support. I don't think she is in WBS 2.3.
    - KD - Armen is 50% ADCoS and the remainder mostly K8 deployment, testing.
  - 1.5 FTE @CERN
  - T1/T2s technical expertise [?]
    - KD - we are very thin here. Need more help.
- Communication channel(s) and meetings
- Known challenges and issues
- What and how we want to improve
- Wish list to ADC Ops

Speaker(s): Alexei Klimentov ( Brookhaven National Lab )

The ePIC Simulation Campaigns: Experience so far on the OSG and Future Use Cases (ePIC/EIC Collaboration)

Speaker(s): Sakib Rahman ( Brookhaven National Laboratory )

Token Transition Update (PATh Production Services)

Speaker(s): Brian Bockelman ( Morgridge Institute for Research )

Data in Flight - Delivering Data with Pelican Tutorial

If you are interested in participating in the hands-on portion of Wednesday's tutorial "Data in Flight - Delivering Data with Pelican", you will need to register at https://go.wisc.edu/cfsl43 before end-of-day Tuesday. This tutorial is aimed at those who may be interested in contributing their data to the OSDF via a Pelican data origin. In-person and remote attendees can participate in the tutorial, and experience using SSH and Bash commands is recommended. Registration is not required to observe the tutorial.

When you click the registration link, you'll be asked to log in using CILogon. New users will be prompted to select an "identity provider" - most major institutions are available, but if you do not see your institution you can select "ORCID", "GitHub", or "Google" (GMail) instead (be sure to choose an institution that you remember the login information for!). If you have issues logging in, you may want to try using an incognito/private browser session.

Once you've logged in, you'll land on a page with "Basic Account Creation" - click on the "Begin" button. Enter your information, enter "HTC24 Pelican Tutorial" in the comment box, and click the "Submit" button. After your information has been processed, confirm your email address following the instructions in the email you should have received in your inbox.

If you have issues with registration or other questions, please email support@osg-htc.org.

Speaker(s): Andrew Owen ( UW-Center for High Throughput Computing )

Case study: dynamic jobs, subdags, and resource requests in HTCondor DAGs (Remote Speaker)

Speaker(s): Joseph Areeda ( Cal State Fullerton )

Analysis Facilities

Speaker(s): Dirk Hufnagel ( Fermilab ) Matteo Cremone ( CMU ) Ofer Rind ( Brookhaven National Lab ) Rob Gardner ( University of Chicago ) Wei Yang ( SLAC )

Advanced HTCondor config for GPU machines

Speaker(s): John Knoeller ( University of Wisconsin, Madison )

Discussion

Speaker(s):

R2R4 / ATLAS ADC Input - WBS 2.3 Milestones

Cover existing and proposed USATLAS milestones.

See https://docs.google.com/spreadsheets/d/1YEEzfcXkQ_KHg1to-aSsFLE7z708c5rumFmRGDaQEmk/edit?gid=1636071618#gid=1636071618 (milestone spreadsheet WBS 2.3 working copy)

Description of changes for Apr-Jun 2024 https://docs.google.com/document/d/1QykafgLCRQtFzgozLQHnnZImbJ2zdYjSAj1Bb2S2d5s/edit#heading=h.hvne1q6a1adg

Spreadsheet summarizing WBS 2.3 milestones https://docs.google.com/spreadsheets/d/1RDuvkuOvHG6RhuUcDB42JsRxGUjDW9zcXIvvI8wPL3w/edit?gid=1747360723#gid=1747360723

Speaker(s): Shawn McKee ( University of Michigan )

Automating Workflows via DAGMan

Speaker(s): Cole Bollig ( UWM )

ATLAS Distributed Computing Operations Today and in the Future

Speaker(s): Andres Pacheco Pages ( IFAE Barcelona )

T2 Meeting

Speaker(s): Farrukh Khan ( Fermi National Accelerator Laboratory )

Advanced Debugging with eBPF and Linux perf tools

Speaker(s): Greg Thain ( Center for High Throughput Computing )

An update on the REDTOP collaboration (REDTOP Collaboration - Remote Speaker)

TBD

Speaker(s): Corrado Gatto ( INFN and NIU ) Vito Di Benedetto ( Fermi National Accelerator Laboratory )

GlueX in the integrated global infrastructure for distributed computing (GlueX Collaboration - Remote Speaker)

We present our vision for using the globally integrated infrastructure to advance the mission of the GlueX collaboration by leveraging available computing and storage capacity. GlueX has been using the distributed resources of the OSPool along with its own pool resources at a number of institutions in the US, Canada and Europe. With lessons learned and adapted know-how, we will be able to chart paths forward to becoming more efficient and productive in our computing workflows for both simulations and data processing.

Speaker(s): Richard Jones ( University of Connecticut )

Rucio/SENSE/Xrootd

Speaker(s): Justas Balcas ( ESnet )

Wrap up, Action Items

Summarize discussion and create relevant action items

Speaker(s):

Introduction and Welcome

Speaker(s): Alexei Klimentov ( Brookhaven National Lab ) Shawn McKee ( University of Michigan )

Wrap up, document action items

Speaker(s):

WLCG Strategy Implications for USLHC

Link to shared Google doc with details and background https://docs.google.com/document/d/1W5yxKIVLGQWuzf7iw_izk5HPq9pm_6l9bg7Ty2XAPNY/edit

Speaker(s): Alexei Klimentov ( Brookhaven National Lab ) Dirk Hufnagel ( Fermilab )

New Heterogeneous Integration and Operations area (2.3.3) overview/discussion

Need to discuss the changes in WBS 2.3.3
How to improve our toolkit for the future (shouldn't take 0.2+ FTE for EACH HPC)
What do we do about TACC?
Small amount of effort that we need to optimize for our goals.
Potential candidates to go from R&D → (pre)production

Speaker(s): Alexei Klimentov ( Brookhaven National Lab ) Rui Wang ( Argonne National Lab )

Analysis Facilities

Speaker(s): Matteo Cremone ( CMU )

Intro to Pegasus

Speaker(s): Karan Vahi ( Pegasus - Team USC Information Sciences Institute ) Mats Rynge ( USC / ISI )

Hunting for the Screams from the Stellar Graveyard (DES Collaboration - Remote Speaker)

Speaker(s): Nora Sherman ( University of Michigan - Ann Arbor )

Network Traffic Optimization (Jumbo, protocols, pacing)

Speaker(s): Justas Balcas ( Caltech ) Syed Asif Raza Shah ( Fermilab (FNAL) ) Shawn McKee ( University of Michigan )

PATh Collaboration Support Services

We present a list of services that support collaborations on computing pools on the global cyberinfrastructure.

Speaker(s): Pascal Paschos ( University of Chicago )

Thursday

IceCube's SkyDriver: An Application of the Event Workflow Management System for Scalable Solutions of Distributed Workflows

Speaker(s): Ric Evans ( UW-Madison / IceCube )

Cgroup v2: what lies beneath

Speaker(s): Greg Thain ( Center for High Throughput Computing )

Closing Remarks and Announcements

Speaker(s): Miron Livny ( UW-Center for High Throughput Computing )

Monitoring HTCondor in HEPCloud's Decision Engine using Prometheus and Grafana

Speaker(s): Ilya Baburashvili ( Fermi Lab )

Optimizing Cost and Performance: Best practices for Efficient HTCondor Workload Deployment in AWS

Speaker(s): Sudheendra Bhat ( Amazon Web Services )

GlideinWMS handling of credentials and the new challenges they present

Speaker(s): Bruno Coimbra ( Fermilab )

Unleashing the power of protein engineering with artificial intelligence

Speaker(s): Anthony Gitter ( University of Wisconsin-Madison; Morgridge Institute for Research )

Python Bindings: Version 2

Speaker(s): Todd Miller ( CHTC )

Building High Throughput Function-Oriented Workflows with TaskVine

Speaker(s): Douglas Thain ( University of Notre Dame )

Working from Both Ends to Bridge the Gap Between Silicon and Human Cognition

Speaker(s): Ranganath Selagamsetty ( University of Wisconsin - Madison ) Robert Klock ( University of Wisconsin - Madison )

Throughput Machine Learning in CHTC

Speaker(s): Ian Ross ( U. Wisconsin )

Introduction of Keynote Speaker Anthony Gitter

Speaker(s): Frank Wuerthwein ( UC San Diego )

HTCondor 24: What's new and upcoming

Speaker(s): Todd Tannenbaum ( University of Wisconsin )

Implementation of NRAO's imaging workflow on HTCondor (Remote Speaker)

Speaker(s): Felipe Madsen ( National Radio Astronomy Observatory )

Friday

Practical experience with an interactive-first approach to leverage HTC resources (Remote Speaker)

Development and execution of scientific code requires increasingly complex software stacks and specialized resources such as machines with huge system memory or GPUs. Such resources are present in HTC/HPC clusters and have been used for batch processing for decades, but users struggle with adapting their software stacks and their development workflows to those dedicated resources. Hence, it is crucial to enable interactive use with a low-threshold user experience, i.e. offering an SSH-like experience to enter development environments or start JupyterLab sessions from a web browser.

Turning some knobs, HTCondor unlocks these interactive use cases of HTC and HPC resources, leveraging the resource control functionality of a workload manager, wrapping execution within unprivileged containers and even enabling the use of federated resources crossing network boundaries without loss of security.

This talk presents the positive experience with an interactive-first approach, hiding the complexities of containers and different operating systems from the users, enabling them to use HTC resources in an SSH-like fashion and with their JupyterLab environments. It also provides a short outlook on scaling this approach to a federated infrastructure.

Speaker(s): Oliver Freyermuth ( University of Bonn )

Research Computing and Interactive Services

Speaker(s): Miron Livny ( UW-Center for High Throughput Computing )

Improving CMS CPU Efficiency through Strategic Pilot Overloading

Speaker(s): Marco Mascheroni ( UCSD )

Versions and Upgrades

Speaker(s): Tim Theisen ( UW-Madison CHTC )

Closing Remarks

Speaker(s): Miron Livny ( UW-Center for High Throughput Computing )

Jupyter Notebooks as a frontend for a htc analysis facility (Remote Speaker)

Speaker(s): Christoph Beyer ( Deutsches Elektronen-Synchrotron )

Upgrading from EL7 to EL9: tales from Chicago

Speaker(s): Judith Stephen ( University of Chicago )

The DataVault: A comprehensive approach to research data management

Speaker(s): Derek Cooper ( CHTC Team Member ) Em Craft ( Wisconsin Institute for Discovery )

Jupyter Live Demonstration (Remote Speaker)

Speaker(s): Christoph Beyer ( Deutsches Elektronen-Synchrotron )