ALICE-USA Computing Project 2023 Tasklist


Our general to-do list includes evaluating new hardware purchases for 2023, attending important meetings, updating technical tasks, and proposing new initiatives. Specific tasks involve migrating systems, managing operational considerations, and setting priorities and deadlines to meet project goals.

  • ALICE-USA
  • Computing Project
  • Tasklist
  • Hardware
  • Meetings

Uploaded on Apr 13, 2025



Presentation Transcript


  1. ALICE-USA Computing Project 2023 Tasklist
     R. Jeff Porter (LBNL), December 9th, 2022

  2. Our General To Do List (see PEAP)
       • Evaluate new hardware purchases for 2023
           • New CPU at both LBNL and ORNL
           • New disk as needed and occupied
       • Attend Meetings
           • EOS workshop at CERN, OSG All hands meetings, CHEP 2023
           • Report operational status: US Spring visit, 2023 T1/T2 workshop
       • Ongoing technical tasks
           • IPv6 at ORNL
           • perfSONAR at both sites & update OSG dashboard
           • LHCONE at LBNL
       • Report to DOE
           • Quarterly reports
           • New AF Proposal FY2024
           • Update to PEAP
     Jeff Porter LBNL - 2 -

  3. US AF Prototype at LBL
       • Target systems: LBL legacy hardware
           • Oldest 2 JBODs from current production EOS
           • ~20+ 2015 PDSF nodes (~640 CPU cores)
       • Configuration details:
           • Drain and reconfigure old JBODs in current EOS
           • Spin up new VOBox + new ALICE LDAP entries
           • Create new SLURM partition
       • Operational/Management Considerations:
           • Irakli matrixed into HPCS group
           • Can we spin up the ORNL Testbed for additional training, or is that what is in rack 2?
           • LHCONE for AF data feed? (maybe not prototype but future)
       • Priority & ETA: Moderately high; April 2023

  4. Migrate LBL to whole-node scheduling
       • Target systems: Current LBL T2
       • Configuration details:
           • Move STAR off of ALICE T2
           • Copy data from STAR EOS into purchased file system on Lawrencium
           • Request allocation on Lawrencium for STAR based on CPU procurements
           • Modify SLURM and ALICE LDAP entries
       • Operational/Management Considerations: None?
       • Priority & ETA: High; January 2023

  5. Migration to Perlmutter - baseline
       • Target system: NERSC/US supercomputers
       • Configuration details:
           • Spin up new VOBox VM on HPCS system + ALICE LDAP entries
           • Service run by alicepro account available to Sergiu/Irakli
           • Integrate Superfacility API
           • Test access by collaboration account
           • Develop token renewal service; push for 30 day lifetime
           • Define R&D / logging structure (Give Maarten access!)
       • Operational/Management Considerations: None
       • Priority & ETA: Moderately high; March 2023 (End of CORI)

  6. Migration to Perlmutter - extended
       • Target system: NERSC GPU resource
       • Configuration or R&D details:
           • Confirm hand-build and run-ability on NVIDIA A100
           • Optimize and test
           • If workable, integrate build into ALICE Build system
       • Operational consideration: 2nd VOBox for GPU partition?
       • Priority & ETA: Modest/low; 6-12 months

  7. US CCDB Repository
       • Target systems: US production EOS systems
       • Configuration details:
           • Details from CERN team
           • Subdirectory definition: /something/
           • Replication tag (expect 2 or 3 replicas/site)
       • Operational/Management Considerations: None?
       • Priority & ETA: LBL ASAP; High; January 2023

  8. ORNL Storage refresh
       • Target Systems: ORNL::EOS
       • Configuration details:
           • Rack and configure new systems as ORNL::EOS with single disk FSID
           • REL 8 or REL 9?
           • Connect to legacy storage that still contains data as read-only
           • Remove legacy storage once drained
       • Operational/Management Considerations: Drain speed?
       • Priority & ETA: Moderately high; February 2023

  9. CTF production resource at ORNL
       • Target Systems: legacy ORNL::EOS, ORNL T2
       • Configuration details:
           • Configure CERN10 + 1 MGM w/ QuarkDB as new EOS storage
           • EL 8 or EL 9?
           • New name? ORNL::PRF
           • Start copy of CTFs into new EOS storage
           • Expand with old hardware as old ORNL::EOS is drained
       • Operational/Management Considerations: None?
       • Priority & ETA: High; December 2022? Likely January 2023

  10. ORNL CPU refresh
       • Target Systems: ORNL T2
       • Configuration details:
           • Rack systems and run HEPSpec on new systems
           • Cable as 10GigE
           • Migrate to EL8 or EL9
           • Evaluate amount of original hardware that can be retired
       • Operational/Management Considerations: None
       • Priority & ETA: Moderately high (to be used for CTF production); January 2023

  11. Task tracking software
       • Target Systems: ALICE-USA computing project
       • Configuration details: Select a tool that uses Google Auth
       • Operational/Management Considerations: None
       • Priority & ETA: High; January 2023
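
Slide 5 lists a token renewal service with a requested 30 day token lifetime. The slides give no design for it, so the following is only a minimal sketch of the timing logic such a service might use. The 30 day lifetime comes from the slide; the 3 day renewal margin and the function names are hypothetical, and no NERSC or Superfacility API call is shown or implied.

```python
from datetime import datetime, timedelta, timezone

# Lifetime the slide proposes to push for.
TOKEN_LIFETIME = timedelta(days=30)
# Hypothetical safety margin before expiry; not taken from the slides.
RENEWAL_MARGIN = timedelta(days=3)


def expiry_time(issued_at: datetime,
                lifetime: timedelta = TOKEN_LIFETIME) -> datetime:
    """Return the moment a token issued at `issued_at` stops being valid."""
    return issued_at + lifetime


def needs_renewal(issued_at: datetime,
                  now: datetime,
                  lifetime: timedelta = TOKEN_LIFETIME,
                  margin: timedelta = RENEWAL_MARGIN) -> bool:
    """Return True once `now` is within `margin` of the token's expiry."""
    return now >= expiry_time(issued_at, lifetime) - margin


if __name__ == "__main__":
    issued = datetime(2023, 3, 1, tzinfo=timezone.utc)
    for days in (10, 26, 27, 29):
        now = issued + timedelta(days=days)
        print(days, needs_renewal(issued, now))
```

A periodic job (for example under the service account on the VOBox) could call `needs_renewal` and request a fresh token only when it returns True, which keeps renewals infrequent while leaving a buffer for failed attempts.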
