ILD MC Production for Detector Optimization


Overview of the ILD MC production system for detector optimization: a performance study based on large-scale MC samples, covering different detector sizes, calorimeter options, and software updates. Also covers the resource estimate (CPU time and data size) and the use of DIRAC and ILCDirac for job management and large-scale production.

  • MC Production
  • Detector Optimization
  • Performance Study
  • Large-scale Samples
  • DIRAC

Uploaded on Apr 12, 2025



Presentation Transcript


  1. ILD MC Production for detector optimization. Akiya Miyamoto, 23 Oct. 2018, LCWS2018, Simulation and Reconstruction Session

  2. Contents
     • Overview of ILD MC production system
     • MC production in Spring-Summer 2018
     • Summary

  3. Optimization production
     • ILD detector optimization: performance study based on large-scale MC samples
       - Large and small detector geometry
       - Calorimeter options: Hybrid Analog HCAL vs Semi-Digital HCAL
       - Software: full update since DBD (Mokka → DD4Sim); new/improved reconstruction tools are used
     • Samples
       - Calibration samples: single particles, UDS, flavor tag
       - DBD samples: 500 GeV SM plus some additions. Statistics same as DBD or more.
       - Overlay of pairs as well as aa_lowpt
       - Correct IP vertex offset and smearing
       - 250 GeV: generated by Whizard2 with the latest ILC beam parameters

  4. Resource: initial estimation for 500 GeV
     • CPU time and data size were estimated on the KEK batch system (~23 HS06/CPU), with 50 events/process, 1 detector (large) and 1 CAL option.
     • Per-process table. Processes: uds, single, higgs, 2f, 4f, 5f, 6f, aa_4f, flavortag, Sum. Quantities: Nb. Procs, k Evts, Nb. Jobs, CPU days (SIM, REC), Data size in GB (SIM, REC, DST). Values in slide order:
       12 94 32 430 589 6,189 16,475 48,219 7,627 120 30 37 322 13 36 243 706 385 382 379 307 6 4 1,520 951 3,780 11,289 2,029 6,907 530,029 816 535 27,947 617,231 4,367 14,386 33,369 6,175 38,456 1,794 3,943 103,257 4,417 13,740 34,721 6,442 40,506 2,042 4,142 106,696 72 215 475 116 725 41 76 8 1,198 3,108 520 2,900 158 290 8,563 40 200 188 80 1,726 342 2,564 115 262 6,007 2,344 5,329 5 659 1,732
     • Resource needs for all samples, total for 500 GeV:
                            2 Size x 1 Cal   2 Size x 2 Cal
       CPU (HS06 year)           1,882            2,657
       Storage (TB)                423              640
     • Key for timely production: the reduction of baby-sitting work.
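The totals on this slide can be cross-checked with a short calculation. The sketch below is an illustration only. It assumes that the per-detector sums in the table are 8,563 and 6,007 CPU days for SIM and REC, and 103,257, 106,696 and 1,732 GB for SIM, REC and DST, and that a second calorimeter option reuses the simulated files and repeats only the reconstruction. Neither assumption is stated on the slide, and the function names are hypothetical.

```python
# Assumed per-detector "Sum" values read from the flattened table (not confirmed).
CPU_DAYS = {"SIM": 8563, "REC": 6007}
SIZE_GB = {"SIM": 103257, "REC": 106696, "DST": 1732}
HS06_PER_CPU = 23.0  # the slide quotes "~23 HS06/CPU" for the KEK batch system


def storage_tb(n_sizes, n_cals):
    """Storage in TB, assuming SIM files are shared between calorimeter options."""
    sim = n_sizes * SIZE_GB["SIM"]
    rec_and_dst = n_sizes * n_cals * (SIZE_GB["REC"] + SIZE_GB["DST"])
    return (sim + rec_and_dst) / 1000.0


def cpu_hs06_years(n_sizes, n_cals, hs06_per_cpu=HS06_PER_CPU):
    """CPU need in HS06 years, with reconstruction repeated per calorimeter option."""
    sim_days = n_sizes * CPU_DAYS["SIM"]
    rec_days = n_sizes * n_cals * CPU_DAYS["REC"]
    return (sim_days + rec_days) * hs06_per_cpu / 365.0


if __name__ == "__main__":
    for n_cals in (1, 2):
        print("2 Size x %d Cal: %.0f TB, %.0f HS06 years"
              % (n_cals, storage_tb(2, n_cals), cpu_hs06_years(2, n_cals)))
```

Under these assumptions the storage comes out at about 423 TB and 640 TB, in line with the quoted totals. The CPU estimate with exactly 23 HS06/CPU is about 1,840 and 2,590 HS06 years, a few percent below the quoted 1,882 and 2,657.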

  5. DIRAC and ILCDirac
     • DIRAC (Distributed Infrastructure with Remote Agent Control): high-level interface between users and distributed resources
       - Job management, file catalog, ...
       - Transformation system for productions
       - Written in Python 2
       - Web interface
     • ILCDirac: an extension for the ILC VO and CALICE VO
       - Developed and operated by the CLICdp group since 2010
       - Provides a simple interface for user jobs
       - DIRAC file catalog for files and metadata
       - Central system for large-scale production
     • ILD began to use ILCDirac after the DBD. It is an essential tool for large-scale MC production.

  6. Production with ILCDirac
     • Transformation: automatically creates job scripts, submits jobs, retries failed jobs, and maintains job logs.
     • ILCDirac jobs have to be submitted using the same steering. File names should be known to the server.
       - Input file: picked up from the DB based on the query given in the operator's request.
       - Output file: the file name is generated by the transformation server and the file is written to a defined directory.
       - BKG overlay: relevant files are picked randomly and given to the job.
     • The workflow lives in the ILCDIRAC server and is shared with other groups. It is hard, though not impossible, to modify it or to implement a new workflow.
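The file handling described on this slide (inputs selected by a metadata query, server-generated output names, randomly chosen overlay files) can be illustrated with a toy model. The sketch below is plain Python and does not use the DIRAC or ILCDirac API; the catalog contents, the output name pattern, and all function names are hypothetical.

```python
import random

# Hypothetical stand-in for a file catalog: logical file name -> metadata.
CATALOG = {
    "/demo/gen/I250108.n001.slcio": {"Datatype": "gen", "ProcessID": 250108},
    "/demo/gen/I250108.n002.slcio": {"Datatype": "gen", "ProcessID": 250108},
    "/demo/gen/I250114.n001.slcio": {"Datatype": "gen", "ProcessID": 250114},
    "/demo/sim/other.slcio": {"Datatype": "sim", "ProcessID": 250108},
}


def query(catalog, **conditions):
    """Return the sorted file names whose metadata match every condition."""
    return sorted(
        path for path, meta in catalog.items()
        if all(meta.get(key) == value for key, value in conditions.items())
    )


def build_tasks(catalog, conditions, prod_id, out_dir, overlay_pool,
                n_overlay, seed=None):
    """Create one task per matching input file."""
    rng = random.Random(seed)
    tasks = []
    for task_id, lfn in enumerate(query(catalog, **conditions), start=1):
        tasks.append({
            "input": lfn,
            # The output name is derived from production and task IDs,
            # so the operator never chooses it by hand.
            "output": "%s/sim_%08d_%03d.slcio" % (out_dir, prod_id, task_id),
            # Background files are drawn at random, without repetition.
            "overlay": rng.sample(overlay_pool, n_overlay),
        })
    return tasks
```

A call such as `build_tasks(CATALOG, {"Datatype": "gen", "ProcessID": 250108}, 10410, "/demo/out", pool, 3)` yields one task per matching generator file, each with its own output name and overlay selection.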

  7. ILD specific features
     • Naming rules for files, directories, and meta keys matter.
       - Input file search by DIRAC is based on the meta keys attached to the directory and the file. In addition, names are constructed from the meta keys.
         ex: ild/rec/500-TDR_ws/2f_Z_leptonic/ILD_o1_v05/v01-16-p05_500/ rv01-16-p05_500.sv01-14-01-p00.mILD_o1_v05.E500-TDR_ws.I250108 .
       - Modules for the ILD naming rule have been prepared.
       - Avoid very long file names: they need to be within 128 characters.
     • Generator inputs (DBD stdhep files) are provided as files.
       - GenSplitting is mandatory for jobs to stay within the CPU limit.
       - GenSplit is performed on a local host, then the files are uploaded to DIRAC.
       - stdhep to lcio conversion.
       - Add the split number to the file name; it is used later to merge DST files.
       - Add header information to the LCIO header, consistent with whizard2 files.
       - The number of events per file is limited so that the rec. file size stays at O(1 GB) and the CPU time at O(~8 hours).
     • Grouping of generator processes
       - Grouped by output directory, beam nature (beam or not), lepton or not.
       - ~600 generator processes are grouped into O(50) groups.
       - During the production, the initial grouping was subdivided in some cases to limit the number of jobs per production to O(10k) and to complete all production steps in O(2 weeks).
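As an illustration of names being constructed from meta keys, the sketch below rebuilds the directory and the visible part of the file name from the example on this slide and enforces the 128-character limit. It is not the ILD naming module: the dictionary keys and function names are hypothetical, and the reading of the prefixes (r for reconstruction version, s for simulation version, m for detector model, E for energy and machine setting, I for process ID) is an assumption based on the example. Fields beyond the part shown on the slide are not guessed.

```python
MAX_NAME_LENGTH = 128  # limit quoted on the slide


def build_directory(meta):
    """Directory path assembled from meta keys, in the order of the slide example."""
    return "/".join([
        "ild", meta["datatype"], meta["energy_machine"],
        meta["event_class"], meta["detector_model"], meta["reco_version"],
    ])


def build_name_prefix(meta):
    """Dot-separated file name prefix, one tagged field per meta key."""
    return ".".join([
        "r" + meta["reco_version"],
        "s" + meta["sim_version"],
        "m" + meta["detector_model"],
        "E" + meta["energy_machine"],
        "I%d" % meta["process_id"],
    ])


def check_length(name, limit=MAX_NAME_LENGTH):
    """Return the name unchanged, or raise if it exceeds the limit."""
    if len(name) > limit:
        raise ValueError("file name has %d characters, limit is %d"
                         % (len(name), limit))
    return name


# Values taken from the example on the slide.
EXAMPLE = {
    "datatype": "rec",
    "energy_machine": "500-TDR_ws",
    "event_class": "2f_Z_leptonic",
    "detector_model": "ILD_o1_v05",
    "reco_version": "v01-16-p05_500",
    "sim_version": "v01-14-01-p00",
    "process_id": 250108,
}

if __name__ == "__main__":
    print(build_directory(EXAMPLE))
    print(check_length(build_name_prefix(EXAMPLE)))
```

Building names from a single metadata record keeps the directory, the file name, and the catalog query consistent with each other, which is the point the slide makes about naming rules.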

  8. ILD specific features - 2
     • Not all tasks of ILDProduction are provided by ILCDirac.
     • DST merge
       - Waits for completion of all jobs.
       - Keeps the order of gensplit using the split number.
       - ProcessID-aware merging; note that one production contains many process IDs.
       - Not implemented as an ILCDirac transformation; run as a user job.
     • Log files: not registered in the catalog; copied to the standard location after production.
     • Database of produced samples
       - DIRAC catalog and meta information
       - ILD: provides an ELOG server, https://ild.ngt.ndu.ac.jp/elog/dbd-prod/
       - Timely update of ELOG information
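The DST merge requirements (merge per process ID, keep the gensplit order) can be sketched as a grouping and sorting step. This is a minimal illustration, not the ILD merge script: it assumes that file names carry a `.I<process ID>.` field and a `.n<split number>.` field, and the file names in the usage example are invented.

```python
import re
from collections import defaultdict

# Assumed name fields: ".I<digits>." for the process ID, ".n<digits>." for the split number.
PID_RE = re.compile(r"\.I(\d+)\.")
SPLIT_RE = re.compile(r"\.n(\d+)\.")


def merge_plan(filenames):
    """Map each process ID to its files, ordered by numeric split number."""
    groups = defaultdict(list)
    for name in filenames:
        pid = PID_RE.search(name)
        split = SPLIT_RE.search(name)
        if pid is None or split is None:
            raise ValueError("cannot parse process ID or split number: %s" % name)
        groups[int(pid.group(1))].append((int(split.group(1)), name))
    # Sorting on the integer split number keeps n010 after n002.
    return {pid: [name for _, name in sorted(items)]
            for pid, items in groups.items()}


if __name__ == "__main__":
    files = [
        "demo.I250108.P2f.n010.d_dst.slcio",
        "demo.I250108.P2f.n002.d_dst.slcio",
        "demo.I250114.P4f.n001.d_dst.slcio",
        "demo.I250108.P2f.n001.d_dst.slcio",
    ]
    for pid, ordered in sorted(merge_plan(files).items()):
        print(pid, ordered)
```

Comparing split numbers as integers rather than strings avoids a wrong order if the zero padding is ever inconsistent.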

  9. ILD MC production workflow (established since v02-00)
     • Legend: Process, ILCDirac, Local batch jobs
     • WishList: ProcessName, Int. Lumi
     • Resource needs: sec/ev, MB/ev
     • Local scripts and manual editing: Nb. Evts/Job, process grouping, ProdPara excel
     • Build production scripts: loop over process groups, initiated by the operator
     • GenSplit: stdhep split to lcio, register to catalog; progress recorded in Elog
     • ILCDirac production: DDSim & Marlin
     • Cron task: shows progress on the web, initiates sub-steps of DST merge and log saving, monitors errors
     • Save Prod. Log
     • DST Merge: submit Dirac user job, replicate files, merge job logs
     • Update production summary
     • GenSplit and DSTMerge could be implemented in ILCDirac in principle.
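The step from resource needs (sec/ev, MB/ev) to the number of events per job can be sketched as follows. The limits of about 8 CPU hours and about 1 GB per reconstructed file are the orders of magnitude quoted for ILD jobs in this talk; the function names and the example inputs are hypothetical, and the real scripts may apply further rules.

```python
import math


def events_per_job(sec_per_event, mb_per_event,
                   max_cpu_hours=8.0, max_file_mb=1000.0):
    """Largest event count that respects both the CPU and the file size limit."""
    by_cpu = math.floor(max_cpu_hours * 3600.0 / sec_per_event)
    by_size = math.floor(max_file_mb / mb_per_event)
    return max(1, min(by_cpu, by_size))


def jobs_needed(n_events, per_job):
    """Number of jobs needed to process n_events (ceiling division)."""
    return -(-n_events // per_job)


if __name__ == "__main__":
    # Invented inputs: 60 s/event and 2 MB/event for 1000 requested events.
    per_job = events_per_job(60.0, 2.0)
    print(per_job, jobs_needed(1000, per_job))
```

Whichever limit is reached first determines the split, so a slow process is limited by CPU time and a process with large events by file size.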

  10. Databases in ILDProduction (other than ILCDirac)
      • Purpose: keep track of DIRAC transformations; database for the information not provided by ILCDirac.
      • Elog database (ref. https://midas.psi.ch/elog/): ASCII data records. Easy to use, flexible, with a search function and a Python interface. No performance issues yet.
      • Databases defined in our Elog, https://ild.ngt.ndu.ac.jp/elog/
        - genmeta: generator meta data imported from /ilc/prod/ilc/mc-dbd/generated. Process-ID-based entries. A JSON file of all meta data is also provided: https://ild.ngt.ndu.ac.jp/CDS/files/genmetaByID.json
        - dbd-prod: database for the ILD MC production since 2017. Production progress is entered at each production step.
        - opt-prod: database based on process ID and detector model. Updated after completion of production.

  11. ELog server: https://ild.ngt.ndu.ac.jp/elog/dbd-prod/
      • Searchable entries
      • Updated by scripts using the Python interface; manual correction is possible

  12. Web page summarizing produced samples: https://ild.ngt.ndu.ac.jp/mc-prod/prodmon/summary-by-evttype.html (continued on the next page)

  13. In total: 4 energy points, 50 event types, ~146M events. Links to:
      - ElogID (production set) based database in elog (dbd-prod): https://ild.ngt.ndu.ac.jp/elog/dbd-prod/
      - Process ID based database in elog (opt-data): https://ild.ngt.ndu.ac.jp/elog/opt-data/

  14. Cumulative normalized CPU (April to August 2018)
      • Plot labels: v02-00; v02-00-01; left over; Opt. prod.; add. requests
      • Essentially no break during this period. The slope was limited by the transfer rate.

  15. From 19 Apr. to 4 Oct.

  16. Produced data since mid April 2018
      • Data size registered in the DIRAC catalog for /ilc/prod/ilc/mc-opt-3 and /ilc/prod/ilc/mc-opt.dsk
      • (Some data are missing due to errors of the monitoring job.)

  17. Number of replicas since mid April 2018, for /ilc/prod/ilc/mc-opt-3 and /ilc/prod/ilc/mc-opt.dsk

  18. > 1.5GB/sec at peak

  19. Too many transfer errors occurred when too many jobs were running. The number of concurrent jobs is limited by adjusting the speed of gensplit and/or the number of tasks in the transformation.
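The throttling idea on this slide can be sketched as a simple rule for how many new tasks to release in each cycle. This is an illustration only, not the production scripts: the function name, the cap on concurrent jobs, and the batch size are hypothetical.

```python
def tasks_to_release(n_running, n_waiting, max_concurrent, batch_size):
    """Number of new tasks to release in this cycle.

    Never exceeds the free slots under the concurrency cap, the number of
    tasks still waiting, or the per-cycle batch size.
    """
    free_slots = max(0, max_concurrent - n_running)
    return min(free_slots, n_waiting, batch_size)


if __name__ == "__main__":
    # Invented numbers: cap of 1000 concurrent jobs, at most 200 released per cycle.
    print(tasks_to_release(n_running=900, n_waiting=500,
                           max_concurrent=1000, batch_size=200))
```

Releasing tasks in small batches under a cap keeps the number of simultaneous transfers bounded, at the cost of a slower ramp-up.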

  20. Repository of ILD production scripts
      • Git repository: https://gitlab.cern.ch/amiyamot
      • Reorganized recently, taking into account the experience of the production with ILCSoft-v02-00-01
      • Documentation:
        - https://gitlab.cern.ch/amiyamot/ildprod/blob/master/docs/GetStarted.md
        - https://gitlab.cern.ch/amiyamot/ildprod/blob/master/docs/ProductionManual.md

  21. Summary
      • ILD has developed a tool for MC production using ILCDirac.
      • The tool has been used for the production of MC samples for ILD detector optimization.
      • About 1 PB of data has been produced since May 2018, mostly 500 GeV samples with ILD Large and Small.
      • Baby-sitting work, which had been the major limitation in previous test productions (summer 2017 to winter 2018), was reduced significantly.
      • Currently, the data transfer rate limits the production rate.
      • Refinement of the production scripts is in progress, aiming at use in the coming 250 GeV production.

  22. Backup
