Effortless Provenance Tracking in Python with Recipy by Robin Wilson

Recipy, presented by Robin Wilson, tracks the provenance of your data in Python with almost no effort: add one import to a script and it works with existing libraries without modification.

  • Python
  • Provenance Tracking
  • Data Management
  • Robin Wilson
  • Recipy

Presentation Transcript


  1. recipy: Effortless provenance tracking in Python
     Robin Wilson, robin@rtwilson.com, @sciremotesense

  2. Provenance Lab notebook

  3. It must:
       • Be easy: no effort!
       • Work with libraries without modification
     recipy

  4. WINNERS! Raquel Alegre, Robin Wilson, Janneke van der Zwaan #CollabW2015 www.software.ac.uk/cw15

  5. import recipy

  6. import pandas as pd
     from matplotlib.pyplot import savefig
     data = pd.read_csv('data.csv')
     data.plot(x='year', y='temperature')
     savefig('graph.png')
     data.temperature = data.temperature * 100
     data.to_csv('output.csv')

  7. import recipy
     import pandas as pd
     from matplotlib.pyplot import savefig
     data = pd.read_csv('data.csv')
     data.plot(x='year', y='temperature')
     savefig('graph.png')
     data.temperature = data.temperature * 100
     data.to_csv('output.csv')

  8. DEMO

  9. Set up:
         import recipy
         import pandas as pd
         from matplotlib.pyplot import *
     Monkey Patched Hooks:
         data = pd.read_csv('data.csv')
         data.plot(x='year', y='temperature')
         savefig('graph.png')
         data.temperature = data.temperature * 100
         data.to_csv('output.csv')
     DB

  10. Your turn!
        • pip install recipy
        • Add pull requests & issues at github.com/recipy/recipy/
        • Contribute? (code or money!)
        • Fill in survey: https://raquelalegre.typeform.com/to/H7wQKZ

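  11. Monkey Patching
        • No on_save hooks
        • So, change code at runtime
            import pandas as pd
            original_read_csv = pd.read_csv  # keep the unpatched function
            def wrapped_read_csv(*args):
                print('You called read_csv!')
                # Call the saved original (pd.read_csv would now recurse) and return its result
                return original_read_csv(*args)
            pd.read_csv = wrapped_read_csv
        • In general: patch_function(mod, f, wrapper_function)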
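
The runtime replacement on slide 11 can be sketched in a generic form. This is a minimal illustration, not recipy's code: the `patch_function` body, `logging_wrapper`, and the `calls` list are all hypothetical, and the standard-library `json` module stands in for pandas so the example runs without extra installs.

```python
import functools
import json

calls = []  # records (function name, positional args) for every wrapped call


def logging_wrapper(original):
    """Return a function that logs the call, then defers to `original`."""
    @functools.wraps(original)
    def wrapped(*args, **kwargs):
        calls.append((original.__name__, args))
        return original(*args, **kwargs)  # pass the result back to the caller
    return wrapped


def patch_function(mod, f, wrapper_function):
    """Hypothetical helper: replace attribute `f` of module `mod` at runtime."""
    original = getattr(mod, f)
    setattr(mod, f, wrapper_function(original))
    return original  # handed back so the patch can be undone


original_dumps = patch_function(json, 'dumps', logging_wrapper)
text = json.dumps({'year': 2015})  # goes through the wrapper
json.dumps = original_dumps        # restore the unpatched function
```

Keeping a reference to the original function matters twice here: the wrapper needs it to avoid calling itself, and the caller needs it to undo the patch.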

  12. NoSQL Database
        • Client-Server: separate installation; can be remote; scalable?
        • Pure Python: no install needed! JSON-based; scalability?
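
To make the "pure Python, JSON-based" option on slide 12 concrete, here is a rough sketch of a run log kept in a single JSON file using only the standard library. The record fields (`script`, `inputs`, `outputs`), the file name, and both helper functions are assumptions for illustration; they are not recipy's schema or storage code.

```python
import json
import os
import tempfile


def load_runs(db_path):
    """Return the list of run records, or an empty list if no file exists yet."""
    if not os.path.exists(db_path):
        return []
    with open(db_path) as f:
        return json.load(f)


def append_run(db_path, run):
    """Append one run record by rewriting the whole JSON file."""
    runs = load_runs(db_path)
    runs.append(run)
    with open(db_path, 'w') as f:
        json.dump(runs, f, indent=2)


db_path = os.path.join(tempfile.mkdtemp(), 'runs.json')
append_run(db_path, {
    'script': 'analysis.py',                 # hypothetical field names
    'inputs': ['data.csv'],
    'outputs': ['graph.png', 'output.csv'],
})
runs = load_runs(db_path)
```

Rewriting the whole file on every append is the simplest possible design and hints at why the slide ends with "scalability?".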

  13. sys.meta_path is magic!
        1. Find module: search file system
        2. Load module: load as standard Python module

  14. sys.meta_path is magic!
        1. Find module: search file system, only work with one module
        2. Load module: load as standard Python module AND patch functions to use wrapper
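
The two steps on slide 14 can be sketched with the current Python 3 import protocol (`find_spec` and `exec_module`), which may differ from the mechanism the talk was built on. `PatchFinder`, `PatchingLoader`, and the `calls` list are hypothetical names, and the standard-library `colorsys` module is used only as a convenient target.

```python
import importlib
import importlib.abc
import importlib.machinery
import sys

calls = []  # names of patched functions that were called


class PatchingLoader(importlib.abc.Loader):
    """Load a module as usual, then replace selected functions with wrappers."""

    def __init__(self, real_loader, function_names):
        self.real_loader = real_loader
        self.function_names = function_names

    def create_module(self, spec):
        return self.real_loader.create_module(spec)

    def exec_module(self, module):
        self.real_loader.exec_module(module)  # step 2: load as standard module
        for name in self.function_names:      # ...AND patch functions
            setattr(module, name, self._wrap(getattr(module, name)))

    @staticmethod
    def _wrap(original):
        def wrapped(*args, **kwargs):
            calls.append(original.__name__)
            return original(*args, **kwargs)
        return wrapped


class PatchFinder(importlib.abc.MetaPathFinder):
    """Handles exactly one module; every other import falls through."""

    def __init__(self, modulename, function_names):
        self.modulename = modulename
        self.function_names = function_names

    def find_spec(self, fullname, path, target=None):
        if fullname != self.modulename:
            return None  # not ours: let the normal import machinery handle it
        # Step 1: find the module by searching the file system.
        spec = importlib.machinery.PathFinder.find_spec(fullname, path)
        if spec is None or spec.loader is None:
            return None
        spec.loader = PatchingLoader(spec.loader, self.function_names)
        return spec


sys.modules.pop('colorsys', None)  # force a fresh import for the demo
finder = PatchFinder('colorsys', ['rgb_to_hsv'])
sys.meta_path.insert(0, finder)
try:
    colorsys = importlib.import_module('colorsys')
    hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)  # runs through the wrapper
finally:
    sys.meta_path.remove(finder)
```

The finder goes at the front of `sys.meta_path` so it sees the import before the default finders do, and it delegates the actual file-system search to `PathFinder` rather than reimplementing it.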

  15. PatchImporter, PatchSimple, PatchPandas, PatchNumpy, PatchMPL

  16. PatchImporter (crazy magic!), PatchSimple (simplification), PatchPandas, PatchNumpy, PatchMPL

  17. class PatchNumpy(PatchSimple):
          modulename = 'numpy'
          input_functions = ['genfromtxt', 'loadtxt', 'load', 'fromfile']
          output_functions = ['save', 'savez', 'savez_compressed', 'savetxt']
          input_wrapper = create_wrapper(log_input, 0, 'numpy')
          output_wrapper = create_wrapper(log_output, 0, 'numpy')
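
One way a declarative class like the one on slide 17 could drive the patching is sketched below. Everything here is an assumption made for illustration: the `PatchSimple` base class, the `create_wrapper` body, the reading of its second argument as the position of the filename argument, and the `fakeio` stand-in module that replaces numpy so the example is self-contained.

```python
import types

log = []  # (kind, filename, source) tuples recorded by the wrappers


def log_input(filename, source):
    log.append(('input', filename, source))


def log_output(filename, source):
    log.append(('output', filename, source))


def create_wrapper(log_function, arg_index, source):
    """Hypothetical: build a wrapper that logs args[arg_index] as the filename."""
    def wrapper(original):
        def wrapped(*args, **kwargs):
            log_function(args[arg_index], source)
            return original(*args, **kwargs)
        return wrapped
    return wrapper


class PatchSimple:
    """Hypothetical base class: patch the listed functions on a module."""
    input_functions = []
    output_functions = []

    def patch(self, mod):
        cls = type(self)  # read wrappers from the class so they are not bound as methods
        for name in self.input_functions:
            setattr(mod, name, cls.input_wrapper(getattr(mod, name)))
        for name in self.output_functions:
            setattr(mod, name, cls.output_wrapper(getattr(mod, name)))
        return mod


class PatchFakeIO(PatchSimple):
    modulename = 'fakeio'
    input_functions = ['load']
    output_functions = ['save']
    input_wrapper = create_wrapper(log_input, 0, 'fakeio')
    output_wrapper = create_wrapper(log_output, 0, 'fakeio')


# A stand-in module with one reader and one writer.
fakeio = types.ModuleType('fakeio')
fakeio.load = lambda filename: 'contents of ' + filename
fakeio.save = lambda filename, data: None

PatchFakeIO().patch(fakeio)
result = fakeio.load('data.csv')
fakeio.save('output.csv', result)
```

The appeal of this layout is that supporting a new library only means listing its function names; the wrapping logic lives in one place.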

  20. What next?
        • Wrap IO from more packages
        • Track custom packages
        • Store metadata in output files
        • Store file hashes
        • Annotate, export & share runs
        • Automated testing & better docs
