Best Practices for Diagnostics and Analysis Development
Proposed engineering best practices include unit testing (small tests for simple infrastructure), verification through continuous integration, regression testing, and strategies for parallelism. Coding best practices advocate using Python 3.6+, following the PEP 8 style guide for readability, and avoiding modifications to the module search path. Packaging best practices emphasize proper packaging techniques, distribution, and dependency management. These practices aim to improve code maintainability, reproducibility, and usability in development projects.
Presentation Transcript
Best Practices for diagnostics and analysis development
Xylar Asay-Davis, Joe Kennedy
Proposed engineering best practices
- Unit testing
  - Small tests for simple infrastructure (i/o, climatologies, remapping, time keeping, etc.)
  - Continuous Integration (CI) can verify these with each pull request
- Regression testing
  - Small E3SM runs (e.g. oQU240 in the ocean) on which analysis can be tested on a laptop
  - Perhaps nightly testing (e.g. with Jenkins) if help can be provided to get there
  - Making data (image and NetCDF files) available
  - e3sm_diags is the gold standard right now
- Reproducibility and provenance
  - Provide config file and/or command-line options to reproduce output
- Strategies for parallelism
  - MPI support? (high-res analysis may be too big for one node)
  - Tasks with prerequisites/dependencies (e.g. plotting happens after climatology is computed and remapped)
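The "tasks with prerequisites" idea can be sketched as a small dependency ordering. This is only an illustration, not how any E3SM analysis package actually schedules work: the function name `order_tasks` and the three task names are hypothetical, and the sketch assumes every prerequisite is itself listed as a task.

```python
from collections import deque


def order_tasks(prerequisites):
    """Return task names ordered so each task follows its prerequisites.

    `prerequisites` maps a task name to the list of tasks it depends on.
    Every prerequisite must also appear as a key of the mapping.
    """
    # Count unmet prerequisites and record which tasks each task unblocks.
    remaining = {task: len(deps) for task, deps in prerequisites.items()}
    unblocks = {task: [] for task in prerequisites}
    for task, deps in prerequisites.items():
        for dep in deps:
            unblocks[dep].append(task)

    # Start with the tasks that have no prerequisites at all.
    ready = deque(sorted(t for t, n in remaining.items() if n == 0))
    ordered = []
    while ready:
        task = ready.popleft()
        ordered.append(task)
        for dependent in unblocks[task]:
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                ready.append(dependent)

    if len(ordered) != len(prerequisites):
        raise ValueError('circular dependency between tasks')
    return ordered


# Hypothetical task names, for illustration only.
tasks = {
    'plot_climatology': ['remap_climatology'],
    'remap_climatology': ['compute_climatology'],
    'compute_climatology': [],
}
print(order_tasks(tasks))
```

Tasks that become ready at the same time do not depend on each other, so a real implementation could hand them to separate processes or nodes.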
Proposed coding best practices
- Stop writing scripts in Python 2.7; use 3.6+ from now on (more soon)
- Practices that make it easier to learn/navigate the source code
  - Follow PEP 8 style guide as much as possible
    - Improves readability; consistency across Python code bases
    - Deviations from PEP 8 should be documented prominently
    - https://pep8.org/
  - Imports:
    - Use absolute imports ("required" in py3): from package.subpackage import some_function
    - from thing import * is usually an anti-idiom
  - Don't modify the search path for modules
    - I.e., avoid: sys.path.append(...)
    - Only use for external tools that can't be packaged
Proposed packaging best practices
- Packaging (example on next slide)
  - Basics: https://packaging.python.org/tutorials/packaging-projects/
  - Target conda-forge for distribution (and PyPI if possible)
    - Requires that all dependencies are on conda-forge
    - Joe and I can help you get them there!
  - Package should be in its own directory in the repo
    - Don't hide Python code in repo-level or unindexed src, lib, etc. directories
- Prefer entry points:
  - Pip/conda handles script creation and installation; linux, osx, and windows!
  - Example: package/__main__.py for CLI "script"
  - setup.py: entry_points={'console_scripts': ['script = package.__main__:main']},
  - Execute with:
    - python -m package [ARGS]
    - pip install -e ./  # once!; then: script [ARGS]
  - Package changes are immediately reflected; execute same way as users
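A minimal sketch of what package/__main__.py might contain so that both `python -m package` and the `script` console entry point call the same `main` function. The `--name` option and the `build_message` helper are made up for illustration; only the `main` name comes from the entry-point string on the slide.

```python
import argparse


def build_message(name):
    """Return the text the command prints (hypothetical behaviour)."""
    return 'Hello, {}!'.format(name)


def main(argv=None):
    """Entry point referenced by 'script = package.__main__:main'.

    `argv` defaults to None so argparse reads sys.argv[1:]; passing a list
    makes the function easy to call from tests.
    """
    parser = argparse.ArgumentParser(prog='script',
                                     description='Example CLI.')
    parser.add_argument('--name', default='world',
                        help='who to greet (illustrative option)')
    args = parser.parse_args(argv)
    print(build_message(args.name))
    # Console-script wrappers pass the return value to sys.exit(),
    # so return 0 (or None) on success rather than a message.
    return 0


if __name__ == '__main__':
    # Runs when invoked as: python -m package [ARGS]
    main()
```

Keeping argument parsing inside `main(argv=None)` means the installed `script` command, `python -m package`, and the test suite all exercise the same code path.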
Example repository layout (repository name: pynameless; Python package name: nameless)

pynameless/
    .git
    .gitignore
    README.rst
    CHANGELOG.rst
    CODE_OF_CONDUCT.rst
    CONTRIBUTING.rst
    LICENSE
    docs/
        conf.py
        usage.rst
        ...
    MANIFEST.in
    setup.py
    nameless/
        subpackage/
            __init__.py
            ...
        module.py
        __init__.py
        __main__.py
    .travis.yml
    appveyor.yml
    ci/
        ...
    pytest.ini
    tox.ini
    tests/
        test_nameless.py

- Note: conda recipe in feedstock!
- README.rst through LICENSE: user/developer community information included in repo
- docs:
  - Documentation source in working branches; require docs updates for PR integration!
  - Built documentation only in orphaned gh-pages branch
- nameless: package contains all Python code
  - Don't hide package in repo-level src/lib directories
  - Don't hide Python libraries in unindexed directories
    - I.e., avoid: sys.path.append(...)
    - Only use for external tools that can't be packaged
  - Prefer entry points (e.g., __main__.py) over scripts/bin
- Testing and continuous integration (.travis.yml through tests)
  - Test all supported Python versions (at least 2.7, 3.6, 3.7 for conda-forge)
  - Prefer CI testing where possible
  - Automate longer/intensive testing as much as possible (e.g., Jenkins)
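As a rough idea of what tests/test_nameless.py could hold, here is a pytest-style test for a tiny climatology helper. The helper `monthly_climatology` is hypothetical and is defined inline only to keep the sketch self-contained; in the layout above it would live in the package (for example nameless/module.py) and the test file would import it with an absolute import.

```python
def monthly_climatology(values, months):
    """Average `values` by calendar month; return {month: mean}.

    `values` and `months` are equal-length sequences, with months as 1-12.
    """
    totals = {}
    counts = {}
    for value, month in zip(values, months):
        totals[month] = totals.get(month, 0.0) + value
        counts[month] = counts.get(month, 0) + 1
    return {month: totals[month] / counts[month] for month in totals}


def test_monthly_climatology():
    # Two Januaries and one February from a made-up time series.
    result = monthly_climatology([1.0, 3.0, 5.0], [1, 1, 2])
    assert result == {1: 2.0, 2: 5.0}
```

A test this small runs in well under a second, which is what makes it practical for CI to run it on every pull request.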
Move to Python 3
- From 2008(!): "The End Of Life date (EOL, sunset date) for Python 2.7 has been moved five years into the future, to 2020." See https://legacy.python.org/dev/peps/pep-0373/
- Packages on conda-forge/pypi are already dropping py2 support
- To avoid an abrupt transition, E3SM-Unified needs to support both for at least 2 releases
  - Python 3 version will become the "default" as soon as it is available so users hopefully notice problems that arise
- Many, many scripts may need to be ported.
Resources for Python 2.7 to 3.x transition
- Automated conversion is possible but not recommended: https://docs.python.org/2/library/2to3.html
- Biggest change: print statements need parentheses
    print('Output: {}'.format(someinfo))
- A good starting point: put this at the top of each Python file:
    # coding=utf-8
    from __future__ import absolute_import, division, print_function, \
        unicode_literals
  This will make Python 2.7 act more like 3.x
- A tutorial: https://docs.python.org/3/howto/pyporting.html
- Joe and I are happy to help!
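Two of the behaviours that the `__future__` imports switch on in Python 2.7 are the print function and true division. The sketch below is written for Python 3, where both are the default; the function name `divide_both_ways` is made up for illustration.

```python
def divide_both_ways(numerator, denominator):
    """Return (true quotient, floor quotient) for two numbers."""
    # In Python 3 (and in 2.7 with `from __future__ import division`),
    # `/` is true division even for integers; `//` is floor division.
    return numerator / denominator, numerator // denominator


# print is a function, so the parentheses are required.
print('Output: {}'.format(divide_both_ways(7, 2)))
```

In Python 2.7 without the `division` import, `7 / 2` would instead give the integer 3, which is the kind of silent change worth checking for when porting analysis scripts.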
Joe: proposed best practices
- Follow PEP 8 style guide as much as possible
  - Improves readability; consistency across Python code bases
  - https://pep8.org/
- Packaging
  - Example on next slide
  - Target PyPI and conda-forge for distribution
  - https://packaging.python.org/tutorials/packaging-projects/
- Prefer entry points:
  - Pip/conda handles script creation and installation; linux, osx, and windows!
  - package/__main__.py for CLI
  - setup.py: entry_points={'console_scripts': ['script = package.__main__:main']},
  - Execute with:
    - python -m package ...
    - pip install -e ./  # once!; then: script
  - Package changes are immediately reflected; execute same way as users
Xylar: proposed best practices
- Stop writing scripts in Python 2.7, use Python 3.x from now on (more soon)
- Python packages should go to conda-forge whenever possible
  - Requires that all dependencies are on conda-forge
  - Enables support for Python 3.6 and 3.7, linux and osx without any extra work (via continuous integration (CI) bots). For now, also builds Python 2.7 if supported.
- Python packages should:
  - use absolute imports (required in Python 3): from mpas_analysis.shared.io import write_netcdf
  - not modify the search path for modules (I believe CIME does this)
- These practices make it easier for users to navigate the source code
Xylar: Move to Python 3
- From 2008(!): "The End Of Life date (EOL, sunset date) for Python 2.7 has been moved five years into the future, to 2020." See https://legacy.python.org/dev/peps/pep-0373/
- Packages on conda-forge and pypi are already beginning to drop support
- To avoid an abrupt transition, E3SM-Unified needs to support both for at least 2 releases
  - Python 3 version will become the "default" as soon as it is available so users hopefully notice problems that arise
- Many, many scripts may need to be ported.
Xylar: Resources for Python 2.7 to Python 3.7 transition
- Automated conversion is possible but not recommended: https://docs.python.org/2/library/2to3.html
- Biggest change: print statements need parentheses
    print('Output: {}'.format(someinfo))
- A good starting point: put this at the top of each Python file:
    from __future__ import absolute_import, division, print_function, \
        unicode_literals
  This will make Python 2.7 act more like 3.x
- A tutorial: https://docs.python.org/3/howto/pyporting.html
- Joe and I are happy to help!