
Transition to Tokens in CMS Experience
Explore the transition from GSI to tokens in the CMS Submission Infrastructure, focusing on complex infrastructures, dynamic HTCondor pools with GlideinWMS, and the motivation behind moving towards industry standards. Learn about the components of the CMS Global Pool and the implementation of IDTOKENS in GlideinWMS, with practical examples and timelines for the adoption of capabilities-based authorization and the retirement of the Globus Toolkit.
Presentation Transcript
Transition to tokens: the CMS experience. Saqib Haleem, Marco Mascheroni, Antonio Perez-Calero Yzquierdo, Edita Kizinevi, for the CMS Submission Infrastructure team. HTCondor Week 2022
Outline: The CMS Submission Infrastructure (SI) overview. Transition status: from GSI to tokens (IDTOKENS, SciToken). Conclusions and next steps. 2
A complex infrastructure. The CMS SI model has evolved to running multiple federated pools, with extensive use of flocking. Multiple sets of specialized workflow managers (CRAB & WMAgent) are attached to schedds. The main Global Pool: peaks at ~350k CPU cores; up to 200k running jobs; 50+ schedds. Redundant infrastructure for HA. Resource provisioning is mainly done with GlideinWMS pilots, but also with vacuum-like instantiated resources: DODAS, BOINC (CMS@Home), opportunistic (HLT), HPC... 3
Building dynamic HTCondor pools with GlideinWMS. The CMS computing pool is built using two components: GlideinWMS, a resource provisioning overlay batch system which grows and shrinks based on job pressure; and HTCondor, the batch system for job scheduling. 4
Moving to tokens in the CMS Submission Infrastructure. Motivation: towards industry standards, i.e. capabilities-based authorization for distributed services (new); Globus Toolkit retirement. Practical example: multiple tokens with different capabilities instead of a single identity, i.e. a powerful pilot (GSI) proxy. Timeline: in coordination with the WLCG/OSG timeline; the OSG 3.6 release removed the Globus Toolkit dependency; November 2022: HTCondor GSI End Of Life. 5
Components of the CMS Global Pool. Authentication between SI internal components: IDTOKENS. Authentication between Factories <-> Sites: SciToken. [Diagram labels: GlideinWMS Frontend, GlideinWMS Factories, CRAB Task Workers, Compute Elements (CE), collectors, schedds, CCB, startds (GWMS pilots), startds (pilots).] 6
IDTOKEN Implementation. The whole infrastructure already uses IDTOKENS as the preferred authentication method between components. 96% of startds use IDTOKENS; 4% fall back to GSI due to token expiration. IDTOKENS are created with need-based authorization capabilities and a limited lifetime. [Diagram labels: GlideinWMS Frontend, GlideinWMS Factories, CRAB Task Workers, Compute Elements (CE), collectors, schedds, CCB, startds (GWMS pilots), startds (non-gwms pilots); DAEMON, WRITE, RS, ADMIN, Adv. STARTD.] 8
IDTOKEN authentication for GlideinWMS pilots. The same signing key is placed on both the Frontend and the Collector. The Frontend generates an IDTOKEN for each site. The IDTOKEN is transferred to the factory, the CE, the batch system and the WN (as the pilot proxy was before). The startd is then authenticated and authorized by the collector. [Diagram labels: VO, GWMS Factory, Frontend, Glidein, Pool Collector, Startd.] 9
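The shared-key flow on this slide can be illustrated with a minimal sketch. It assumes a plain HS256-signed JWT and uses only the Python standard library; it is not HTCondor's actual token format or key derivation, and the subject, scope string and function names are hypothetical. One side (the Frontend role) issues a short-lived token, the other side (the Collector role) accepts it only if the signature matches its own copy of the key and the token has not expired.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(signing_key: bytes, subject: str, issuer: str,
                scope: str, lifetime_s: int, now: int) -> str:
    """Frontend role: build and sign a token with a limited lifetime."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iss": issuer, "scope": scope,
               "iat": now, "exp": now + lifetime_s}
    signing_input = (b64url(json.dumps(header).encode()) + "." +
                     b64url(json.dumps(payload).encode()))
    signature = hmac.new(signing_key, signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify_token(signing_key: bytes, token: str, now: int):
    """Collector role: return the claims if signature and expiry check out."""
    signing_input, _, sig_part = token.rpartition(".")
    expected = hmac.new(signing_key, signing_input.encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_part)):
        return None  # signed with a different key (e.g. another trust domain)
    claims = json.loads(b64url_decode(signing_input.split(".")[1]))
    if now >= claims["exp"]:
        return None  # expired token
    return claims


# Hypothetical values for illustration only.
shared_key = b"0" * 32
token = issue_token(shared_key, "pilot@example-site", "cms-global.cern.ch",
                    "condor:/ADVERTISE_STARTD", lifetime_s=3600, now=1000)
print(verify_token(shared_key, token, now=1500))
```

An expired token is rejected in the same way as a token signed with the wrong key, which is the situation behind the GSI fallback mentioned in the talk.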
IDTOKEN Implementation: trust domains. Separate trust domains for: Global and CERN pools; external schedds; pilots and non-gwms startds. Each trust domain has its own signing key. External startds issue an IDTOKEN to the CM, which is required for admin operations, e.g. condor_drain. [Diagram labels: cms-global.cern.ch: CRAB Schedd, Global CM, non-gwms Startds, WM Agents, Backup CM, GWMS Pilots, external Schedd; cms-t0.cern.ch: T0 CM, GWMS Pilots, T0 Schedd, Backup CM.] 10
SciToken Implementation. Identity and Access Management (IAM): https://cms-auth.web.cern.ch. SciToken authentication is used for pilot submission from Factory -> Compute Element (CE): HTCondor-CE (newer versions) and ARC-CE (REST API). Example token payload:
{
  "wlcg.ver": "1.0",
  "sub": "bad55f4e-62c-2113",
  "aud": "https://wlcg.cern.ch/jwt/v1/any",
  "nbf": 1647430802,
  "scope": "compute.cancel compute.create compute.modify compute.read wlcg.groups:/cms/pilot",
  "iss": "https://cms-auth.web.cern.ch/",
  "exp": 1647434402,
  "iat": 1647430802,
  "jti": "01864ce3-a40d-4c-ae5b-dccc80870760"
}
SciTokens with different scopes/subjects are issued for different categories of pilots, e.g. local vs generic pilots. Cron job: a client registered with CMS IAM fetches a fresh token every 10 minutes and puts it on the FE, where it is then used by the factory for pilot submission. [Diagram labels: GWMS Frontend, GWMS Factory, Pilot Job (Glidein) x3, ARC-CE, HTCONDOR-CE.] 12
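As an illustration of how the claims in the example payload can be read, here is a small sketch using only the Python standard library. It works on the decoded payload shown on the slide (no signature verification is attempted), and the helper names `summarize` and `allows` are hypothetical.

```python
import json

# Payload copied from the slide.
CLAIMS_JSON = """
{
  "wlcg.ver": "1.0",
  "sub": "bad55f4e-62c-2113",
  "aud": "https://wlcg.cern.ch/jwt/v1/any",
  "nbf": 1647430802,
  "scope": "compute.cancel compute.create compute.modify compute.read wlcg.groups:/cms/pilot",
  "iss": "https://cms-auth.web.cern.ch/",
  "exp": 1647434402,
  "iat": 1647430802,
  "jti": "01864ce3-a40d-4c-ae5b-dccc80870760"
}
"""


def summarize(claims: dict) -> dict:
    """Split the space-separated scope claim into capabilities and groups."""
    scopes = claims["scope"].split()
    return {
        "issuer": claims["iss"],
        "lifetime_s": claims["exp"] - claims["iat"],
        "compute": sorted(s for s in scopes if s.startswith("compute.")),
        "groups": [s.split(":", 1)[1] for s in scopes
                   if s.startswith("wlcg.groups:")],
    }


def allows(claims: dict, capability: str, now: int) -> bool:
    """True if the token is within its validity window and lists the capability."""
    in_window = claims["nbf"] <= now < claims["exp"]
    return in_window and capability in claims["scope"].split()


claims = json.loads(CLAIMS_JSON)
print(summarize(claims))
print(allows(claims, "compute.create", now=claims["iat"]))
```

The difference between `exp` and `iat` gives the token lifetime, which is the reason a periodic refresh such as the cron job described on the slide is needed.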
SciToken Implementation (HTCondor CEs). Nearly 70% of the HTCondor CEs we interact with are already running recent enough HTCondor versions and support SciTokens authentication. CERN is the last major site pending! CMS is working in a systematic way with each grid site to minimize disruption during the transition (= transparent from the CMS Operations point of view). A separate glideinWMS FE group "main-token" was created for submitting jobs with SciToken credentials. Individual CEs are moved to the token group after a successful condor_ping test. Site admins map the different jobs based on the SciToken's subject in HTCondor-CEs, e.g.:
# CMS ITB generic pilots:
SCITOKENS /^https\:\/\/cms\-auth\.web\.cern\.ch\/,07f75a9a\-bb78\-4735-938b\-7e61b2b6d5c$/ cmspilot
# CMS ITB local pilots:
SCITOKENS /^https\:\/\/cms\-auth\.web\.cern\.ch\/,efbed8c1\-f9a7\-4063\-92f7\-f89c04c04a3$/ cmslocal
[Chart: SciToken-enabled CEs (HTCondor): 69.7% Yes, 30.3% No.] 13
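The subject-based mapping above can be sketched as follows. This is an illustrative stand-in, not HTCondor-CE code: it uses Python's `re` module in place of the CE's own regex engine, assumes (as the patterns on the slide suggest) that the matched string has the form `issuer,subject`, and the function names are hypothetical.

```python
import re

# Mapfile entries copied from the slide.
MAPFILE_LINES = [
    r"SCITOKENS /^https\:\/\/cms\-auth\.web\.cern\.ch\/,07f75a9a\-bb78\-4735-938b\-7e61b2b6d5c$/ cmspilot",
    r"SCITOKENS /^https\:\/\/cms\-auth\.web\.cern\.ch\/,efbed8c1\-f9a7\-4063\-92f7\-f89c04c04a3$/ cmslocal",
]


def parse_line(line: str):
    """Split 'METHOD /regex/ account' into its three parts."""
    method, rest = line.split(None, 1)
    pattern, account = rest.rsplit(None, 1)
    # Drop the leading and trailing '/' that delimit the regex.
    return method, re.compile(pattern[1:-1]), account


RULES = [parse_line(line) for line in MAPFILE_LINES]


def map_identity(issuer: str, subject: str):
    """Return the local account for a token identity, or None if unmapped."""
    identity = f"{issuer},{subject}"
    for method, regex, account in RULES:
        if method == "SCITOKENS" and regex.search(identity):
            return account
    return None


print(map_identity("https://cms-auth.web.cern.ch/",
                   "07f75a9a-bb78-4735-938b-7e61b2b6d5c"))
```

Because each pilot category carries a different subject, the same issuer can be mapped to different local accounts on the CE.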
SciToken Implementation (ARC CEs). In the case of ARC CEs (a part of our total sites and CEs use this technology), our strategy so far has been to test that we can interact with them via X.509 proxies but through the new REST interface: about 74% of all ARC CEs we use are already OK. Secondly, we are already testing an HTCondor pre-release capable of submitting pilots with SciTokens to an ARC CE: tested with T2_IT_Rome, OK! [Chart: ARC-CEs REST interface status: 74% OK, 26% Not OK.] 14
Summary. [Diagram of the authentication methods across the infrastructure. Token labels: scitoken; IDTOKEN with DAEMON, WRITE, READ and Adv. SCHEDD authorizations. Component labels: GWMS Frontend, GWMS Factory, CRAB TW, remote submission tool, CE, Glidein/Pilot Job, Global CM, Backup CM, T0 CM, CRAB Schedd, WM Agents, external Schedd, T0 Schedd, GWMS STARTDs, non-glideinWMS resources/HPC/Vacuum STARTDs.] 16
Conclusions and next steps. CMS Submission Infrastructure internal components have fully switched to IDTOKENS, with fallback to GSI. No dependency on external components (CRLs, Argus). The CMS ITB Pool (i.e. with the same components) is running fine with HTCondor feature release 9.8.0 (i.e. without GSI support). We can start dropping the GSI fallback method in production pools soon. We are still working with CMS WLCG sites to guarantee a seamless transition to SciTokens for Factory<->CE communication. We thank the HTCondor development team for the continued support to the CMS Submission Infrastructure over the years, a model of excellent partnership! 17