Using GPUs with HTCondor 9.x
HTCondor Week 2022
John (TJ) Knoeller
GPUs are "custom" resources GPUs are a custom resource with the tag : GPUs MACHINE_RESOURCE_INVENTORY_GPUs Startd runs a program to determine the number of GPUs and their properties condor_gpu_discovery reports GPU ids and properties number of GPUs is inferred 2
GPU discovery (old output)
$ condor_gpu_discovery -extra -by-index
DetectedGPUs="CUDA0, CUDA1"
CUDACapability=8.0
CUDADeviceName="A100-SXM4-40GB"
CUDAGlobalMemoryMb=40536
CUDAMaxSupportedVersion=11020
CUDA0DevicePciBusId="0000:01:00.0"
CUDA0DeviceUuid="887efb86-35ba-3928-8b22-8f98126311f7"
CUDA1DevicePciBusId="0000:41:00.0"
CUDA1DeviceUuid="ee7237b4-7e82-64c1-4693-db39b705ecfa"
Properties that vary by GPU have a CUDA<N> or OCL<N> prefix.
Startd slot attributes
- condor_gpu_discovery defines slot attributes for the GPUs
- GPUs : number of items in DetectedGPUs
- AssignedGPUs : items assigned to the slot
  - For p-slots this is usually the same as DetectedGPUs
  - For other slots it is one or more GPU ids from the list
- AssignedGPUs from the slot can be configured to set CUDA_VISIBLE_DEVICES or other environment for the job (see the sketch below)
- All other attributes from discovery become slot attributes in all slots*
  * (this is a problem - details later in the talk)
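A hedged configuration sketch of the environment hookup mentioned above; the knob names ENVIRONMENT_FOR_AssignedGPUs and ENVIRONMENT_VALUE_FOR_UnAssignedGPUs are assumptions here, so check the manual for your HTCondor version:

# Publish the slot's AssignedGPUs list into the job's environment (assumed knob name).
ENVIRONMENT_FOR_AssignedGPUs = CUDA_VISIBLE_DEVICES
# Value to use when the slot has no GPUs assigned (assumed knob name).
ENVIRONMENT_VALUE_FOR_UnAssignedGPUs = -1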
GPU ids
- Originally GPU ids were CUDA<n>, and CUDA_VISIBLE_DEVICES = <n>
- But device indexes are not stable
- Starting with HTCondor 9.0 we use uuids (-short-uuid is the default), and CUDA_VISIBLE_DEVICES = GPU-<uuid>
- Some GPU property attribute names change
- Bigger name changes in the 9.x series (more on this later)
GPU discovery in 9.0
$ condor_gpu_discovery -extra
DetectedGPUs="GPU-110a08e4, GPU-6c6a9b39"
CUDACapability=8.0
CUDADeviceName="A100-SXM4-40GB"
CUDAGlobalMemoryMb=40536
CUDAMaxSupportedVersion=11020
GPU_110a08e4DevicePciBusId="0000:81:00.0"
GPU_110a08e4DeviceUuid="110a08e4-1c0c-334c-f62b-ce1bf355d691"
GPU_6c6a9b39DevicePciBusId="0000:C1:00.0"
GPU_6c6a9b39DeviceUuid="6c6a9b39-af67-1df1-2827-d30d5ef32421"
Properties that vary by GPU have a <gpu-id> prefix.
Mixed GPU types in 9.0
$ condor_gpu_discovery -extra
DetectedGPUs="GPU-c4a646d7, GPU-6a96bd13"
CUDAMaxSupportedVersion=11020
GPU_6a96bd13Capability=7.5
GPU_6a96bd13DeviceName="TITAN RTX"
GPU_6a96bd13DevicePciBusId="0000:AF:00.0"
GPU_6a96bd13DeviceUuid="6a96bd13-70bc-6494-6d62-1b77a9a7f29f"
GPU_6a96bd13GlobalMemoryMb=24220
GPU_c4a646d7Capability=7.0
GPU_c4a646d7DeviceName="Tesla V100-PCIE-16GB"
GPU_c4a646d7DevicePciBusId="0000:3B:00.0"
GPU_c4a646d7DeviceUuid="c4a646d7-aa14-1dd1-f1b0-57288cda864d"
GPU_c4a646d7GlobalMemoryMb=16160
Heterogeneous GPUs in 9.0
- Matching jobs to GPUs is per-machine only, not per-GPU
- You can say "Run on a machine that has a GPU like this", but not "Run on this GPU"
- The Matchmaker doesn't know about GPUs that are in use
- No clean way to write Requirements expressions for machines that have more than a single type of GPU (illustrated below)
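A sketch of why this is awkward, using the flat attribute names from the previous two slides (the expressions are illustrative, not recommended practice):

# On a machine with only one GPU type the flat names work at the machine level:
Requirements = (CUDADeviceName == "A100-SXM4-40GB") && (CUDACapability >= 8.0)
# On the mixed machine above the properties are prefixed with the GPU id
# (GPU_6a96bd13..., GPU_c4a646d7...), so there is no portable expression that
# says "give me the V100" - and even a match does not control which GPU is assigned.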
How to match individual GPUs
- The Matchmaker needs to know individual GPU properties
  - A new format for properties is needed
  - Stop advertising properties of GPUs not assigned to the slot
- The job needs to be able to select GPUs based on properties
  - Match a p-slot if any unused GPU has the desired properties
  - Match a d-slot only if all GPUs have the desired properties
  - We generalize this as: match a slot when sufficient GPUs have the desired properties
GPU discovery in 9.x (-nested)
$ condor_gpu_discovery -extra -nested -short-uuid
DetectedGPUs="GPU-c4a646d7, GPU-6a96bd13"
Common = [ CoresPerCU=64; DriverVersion=11.20; MaxSupportedVersion=11020; ]
GPU_6a96bd13 = [ Capability=7.5; DeviceName="TITAN RTX"; DeviceUuid="6a96bd13-70bc-6494-6d62-1b77a9a7f29f"; GlobalMemoryMb=24220; ]
GPU_c4a646d7 = [ Capability=7.0; DeviceName="Tesla V100-PCIE-16GB"; DeviceUuid="c4a646d7-aa14-1dd1-f1b0-57288cda864d"; GlobalMemoryMb=16160; ]
For each GPU there is an attribute with the GPU properties.
STARTD handling of -nested GPU props
- An internal property classad is stored for each GPU
  - The Common ad and the <gpu-id> ad are merged for each GPU
- The GPU property ad for each GPU in AssignedGPUs is published to the slot
  - Properties of GPUs not assigned to the slot are omitted
- New attribute AvailableGPUs
  - A list of GPU property ads that are assigned to the slot, not assigned to a child slot, and not offline
STARTD attributes from -nested discovery
GPUs = 1
TotalSlotGPUs = 2
AssignedGPUs = "GPU-c4a646d7, GPU-6a96bd13"
AvailableGPUs = { GPUs_GPU_c4a646d7 }
GPUs_GPU_6a96bd13 = [ Capability=7.5; Id = "GPU-6a96bd13"; DeviceName="TITAN RTX"; DeviceUuid="6a96bd13-70bc-6494-6d62-1b77a9a7f29f"; GlobalMemoryMb=24220; ]
GPUs_GPU_c4a646d7 = [ Capability=7.0; ... ]
(This is the partitionable slot when GPU-6a96bd13 is assigned to a child slot.)
Targeted binding of GPUs to slots
- New multi-line STARTD slot resource configuration:
SLOT_TYPE_1 @=slot1
  CPUs = 50%
  Memory = 50%
  GPUs = 2 : Capability > 7.0
@slot1
- The constraint is evaluated against each GPU property ad
- Affects binding of GPUs into static slots and p-slots
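To show the new syntax in context, a hedged sketch of a complete slot-type definition; NUM_SLOTS_TYPE_1 and SLOT_TYPE_1_PARTITIONABLE are the usual companion knobs and are not taken from the slide:

# A partitionable slot owning half the machine and only GPUs with Capability > 7.0.
SLOT_TYPE_1 @=slot1
  CPUs = 50%
  Memory = 50%
  GPUs = 2 : Capability > 7.0
@slot1
SLOT_TYPE_1_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_1 = 1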
Targeted binding of GPUs to dynamic slots
- New job attribute RequireGPUs constrains assignment
  - RequestGPUs controls the number of GPUs assigned
  - RequireGPUs constrains which GPUs are assigned
  - Evaluated against each GPU property ad
  RequireGPUs = Capability > 7.0
- Affects binding of GPUs into dynamic slots
- Does not directly affect matching of jobs to slots
Using AvailableGPUs properties
Two new classad functions for lists of property ads:
- evalInEachContext(<expression>, <list-of-property-ads>)
  - returns a list containing the result of each evaluation
  - try this: evalInEachContext(DeviceName, AvailableGPUs)
- countMatches(<expression>, <list-of-property-ads>)
  - returns the number of ads for which the expression evaluates to true
  - equivalent* to sum(evalInEachContext( ))
  * more permissive of undefined and error in arguments
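A hedged usage sketch for trying these functions from the command line with condor_status (the slot and host names are hypothetical):

# List the device name of every available GPU in the slot ad:
condor_status -af 'evalInEachContext(DeviceName, AvailableGPUs)' slot1@gpu-host.example.org
# Count the available GPUs with Capability >= 8.0:
condor_status -af 'countMatches(Capability >= 8.0, AvailableGPUs)' slot1@gpu-host.example.org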
New submit keyword
require_gpus (HTCondor 9.8.0 or later)

# in the submit file
request_gpus = 2
require_gpus = Capability > 7.0

# in the job classad
# without require_gpus
Requirements = .. && TARGET.GPUs >= RequestGPUs && ..
# with require_gpus
RequireGPUs = Capability > 7.0
Requirements = .. && countMatches(MY.RequireGPUs, TARGET.AvailableGPUs) >= RequestGPUs && ..
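Putting the new keyword into a minimal, self-contained submit file (the executable name and resource numbers are made up for illustration):

universe       = vanilla
executable     = my_gpu_job.sh
request_cpus   = 4
request_memory = 8GB
request_gpus   = 2
require_gpus   = Capability > 7.0 && GlobalMemoryMb >= 16000
queue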
Now we can match individual GPUs
If the Startd, Schedd, and Negotiator are all 9.8.0 or later, jobs can use the new require_gpus submit command:
- The Negotiator will only match machines that have the needed quantity of available GPUs with the desired properties
- The Schedd will only re-use slots that have at least the needed quantity of GPUs with the required properties
- The Schedd/Startd will only create dynamic slots that have GPUs with the required properties (multiple negotiation cycles may be necessary for now)
- The Startd will only assign GPUs that match the requirements to d-slots
Questions on Heterogeneous GPUs
Before we go on to new topics, any questions?
Take a GPU offline
- Take a GPU offline with a reconfig of the Startd (see the sketch below)
- Offline GPU ids are not assigned to new dynamic slots
- Offline GPUs are not in the AvailableGPUs list
- Requires stable GPU ids to work properly
  - Stable GPU ids are the default (-short-uuid)
  - Works with GPU indexes only if the GPU is not hung
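A hedged sketch of the offline workflow; OFFLINE_MACHINE_RESOURCE_GPUs is the knob name this assumes, so verify it against the manual for your HTCondor version:

# In the Startd configuration: mark one GPU offline by its stable id.
OFFLINE_MACHINE_RESOURCE_GPUs = GPU-6a96bd13
# Then push the change to the running daemons:
condor_reconfig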
Multiple jobs sharing a GPU (the unsafe way)
New arguments to condor_gpu_discovery in 9.0:
- -repeat : duplicate GPU ids
- -divide : duplicate GPU ids and reduce advertised GPU memory
Add to the GPU_DISCOVERY_EXTRA knob, as shown below.
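For example, appending -divide to the existing discovery arguments in the configuration (GPU_DISCOVERY_EXTRA is the knob named on the slide):

# Advertise each physical GPU twice, halving the advertised GPU memory.
GPU_DISCOVERY_EXTRA = $(GPU_DISCOVERY_EXTRA) -divide 2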
Using divide to share GPUs
$ condor_gpu_discovery -by-index -extra -divide 2
DetectedGPUs="CUDA0, CUDA0"
CUDACapability=7.0
CUDADeviceMemoryMb=16160
CUDADeviceName="Tesla V100-PCIE-16GB"
CUDADeviceUuid="c4a646d7-aa14-1dd1-f1b0-57288cda864d"
CUDAGlobalMemoryMb=8080
CUDAMaxSupportedVersion=11020
Upcoming - safer GPU sharing
- New slot splitting protocol
  - The Schedd will request that a partitionable slot be split into N slots that share the same GPU
  - The Schedd will do this only for jobsets that request GPU sharing
- Share GPUs only with your own jobs
NVIDIA MIG support
- Some NVIDIA GPUs have MIG capability
  - Split a GPU device into up to 7 MIGs
  - Each MIG behaves like a smaller GPU device
  - The GPU can no longer be used directly
- condor_gpu_discovery discovers the MIGs (and hides the MIG parent GPU)
- MIGs are usually heterogeneous; GlobalMemoryMb will vary
Discovering MIGs
$ condor_gpu_discovery -extra -nested
DetectedGPUs="MIG-115b4463-372e-5b55-811a-00fb1374034d, MIG-633563ce-cd57-5081-9386-c7a1b374a74d, GPU-124d06a7"
Common = [ DriverVersion=11.70; MaxSupportedVersion=11070; ]
GPU_124d06a7 = [ id="GPU-124d06a7"; Capability=8.0; DeviceName="NVIDIA A100-SXM4-40GB"; DeviceUuid="GPU-124d06a7-6642-3962-9afa-c86c31b9a7e6"; GlobalMemoryMb=40390; ]
MIG_115b4463_372e_5b55_811a_00fb1374034d = [ id="MIG-115b4463-372e-5b55-811a-00fb1374034d"; ComputeUnits=42; DeviceName="NVIDIA A100-SXM4-40GB MIG 3g.20gb"; DeviceUuid="MIG-115b4463-372e-5b55-811a-00fb1374034d"; GlobalMemoryMb=19949; ]
MIG_633563ce_cd57_5081_9386_c7a1b374a74d = [ id="MIG-633563ce-cd57-5081-9386-c7a1b374a74d"; ComputeUnits=42; DeviceName="NVIDIA A100-SXM4-40GB MIG 3g.20gb"; DeviceUuid="MIG-633563ce-cd57-5081-9386-c7a1b374a74d"; GlobalMemoryMb=19945; ]
MIGs
Lots of caveats from NVIDIA:
- Big differences between driver 460 and 470
- Only one MIG can be used per process
- Use the long uuid name (the short uuid name is not supported)
- CUDA device enumeration does not see them
- The GPU forgets MIGs on reboot
MIGs "just work" in HTCondor 9.0:
- Must be set up before the Startd runs discovery (see the sketch below)
- Restart required to re-run discovery
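A hedged sketch of the setup order implied above, using NVIDIA's own tooling rather than anything HTCondor-specific; the profile name and exact commands depend on the driver version, so treat this as an outline only and check "nvidia-smi mig -lgip" on your host:

# Enable MIG mode on GPU 0, create two 3g.20gb GPU instances (plus compute instances),
# then restart HTCondor so the Startd re-runs condor_gpu_discovery.
nvidia-smi -i 0 -mig 1
nvidia-smi mig -i 0 -cgi 3g.20gb,3g.20gb -C
condor_restart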
More changes to come
- -nested discovery will become the default
- Better support for GPU sharing
- Watching NVIDIA for MIG changes
This work is supported by NSF under Cooperative Agreement OAC-2030508 as part of the PATh Project. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.