EDirect for PubMed


In this guide, you will learn strategies for scripting and automating PubMed data access with EDirect. It covers identifying your goal, choosing a tool, and understanding the data, then building a script one step at a time. Earlier parts of the series covered extracting data, formatting results, and using xtract conditional arguments; this part shows how to combine searching, retrieving, and arranging data in a single script.

  • PubMed
  • Data Access
  • EDirect
  • Scripting
  • Automation

Uploaded on Mar 06, 2025



Presentation Transcript


  1. The Insider's Guide to Accessing NLM Data: EDirect for PubMed. Part 5: Developing and Building Scripts. Mike Davidson, MLS, National Library of Medicine, National Institutes of Health, U.S. Department of Health and Human Services

  2. EDirect for PubMed Agenda
       • Part 1: Getting PubMed Data
       • Part 2: Extracting Data from XML
       • Part 3: Formatting Results and Unix Tools
       • Part 4: xtract Conditional Arguments
       • Part 5: Developing and Building Scripts

  3. Today's Agenda
       • Recap of Part Four
       • Strategies for building scripts
       • Basic step-by-step case study

  4. Recap of Part Four
       • -if: limits output based on whether an element is present
       • -if/-equals: limits output based on whether an element equals a certain value
       • -if/-contains: limits output based on whether an element contains a certain string

  5. Recap of Part Four (cont'd)
       • -or: at least one condition must be true
       • -and: both conditions must be true
       • -position: include a block based on position
       • -def: define a placeholder for blank cells
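The conditional arguments recapped above can be mimicked in a few lines of Python to make their behavior concrete. This is only a rough analogue, run against a hand-written sample in the shape of PubMed XML. It is not how xtract is implemented, the helper names (`is_present`, `equals`, `contains`, `cell`) are hypothetical, and the comparisons here are case-sensitive, which may differ from xtract.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape of PubMed XML (not real records).
SAMPLE = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>111</PMID>
      <Article>
        <Journal><ISOAbbreviation>J Example</ISOAbbreviation></Journal>
        <DataBankList>
          <DataBank>
            <DataBankName>ClinicalTrials.gov</DataBankName>
            <AccessionNumberList>
              <AccessionNumber>NCT00000001</AccessionNumber>
            </AccessionNumberList>
          </DataBank>
        </DataBankList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>222</PMID>
      <Article>
        <Journal><ISOAbbreviation>Example Rev</ISOAbbreviation></Journal>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def values(record, tag):
    """Return the text of every element named `tag` inside `record`."""
    return [el.text or "" for el in record.iter(tag)]


def is_present(record, tag):          # rough analogue of: -if TAG
    return bool(values(record, tag))


def equals(record, tag, wanted):      # rough analogue of: -if TAG -equals VALUE
    return any(v == wanted for v in values(record, tag))


def contains(record, tag, fragment):  # rough analogue of: -if TAG -contains STRING
    return any(fragment in v for v in values(record, tag))


def cell(record, path, default="-"):  # rough analogue of: -def PLACEHOLDER
    return record.findtext(path) or default


records = ET.fromstring(SAMPLE).findall("PubmedArticle")

# -and / -or correspond to combining tests with Python's `and` / `or`.
trial_linked = [
    r.findtext("MedlineCitation/PMID")
    for r in records
    if is_present(r, "DataBankName") and equals(r, "DataBankName", "ClinicalTrials.gov")
]
print(trial_linked)
```

Only the first sample record carries a DataBankName, so it is the only one that passes the combined test; calling `cell` on the second record with a placeholder shows how a blank cell would be filled.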

  6. Questions from last class? Homework?

  7. We have all the pieces...
       • esearch: search a database
       • efetch: retrieve records in XML
       • xtract: arrange XML data in tables
     ...but how do we put them together?
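One common way to put the three pieces together is a Unix pipeline: each command's output becomes the next command's input. The sketch below only composes the command line as a string so it can be inspected before it is run; it does not contact NCBI. The function name `build_pipeline` is hypothetical, and the column choices are just an illustration.

```python
import shlex


def build_pipeline(query, columns):
    """Compose an esearch | efetch | xtract pipeline as one shell command string."""
    search = ["esearch", "-db", "pubmed", "-query", query]   # search a database
    fetch = ["efetch", "-format", "xml"]                     # retrieve records in XML
    arrange = ["xtract", "-pattern", "PubmedArticle",        # arrange XML data in a table
               "-element", *columns]
    # shlex.join quotes any argument containing spaces or shell metacharacters.
    return " | ".join(shlex.join(stage) for stage in (search, fetch, arrange))


command = build_pipeline(
    "breast cancer AND clinicaltrials.gov[si]",
    ["MedlineCitation/PMID", "ISOAbbreviation"],
)
print(command)
```

Pasting the printed line into a terminal with EDirect installed would run the search, fetch, and arrange steps in order. Building the string first makes it easy to check each stage separately.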

  8. Strategies for Developing a Script
       1. Identify your goal
       2. Choose your tool
       3. Understand the data
       4. Decide how much to automate
       5. Build one step at a time

  9. Strategy 1: Identify your goal
       • Identify your input: What do you know?
       • Identify your output: What do you want to know?
       • Identify your format: What do you want it to look like?

  10. Strategy 2: Choose your tool
       • Is this actually a job for EDirect?
       • Can you do this faster another way?
       • How much data do you need?

  11. Working with ALL of PubMed
       • E-utilities limits: usage restrictions and practical limits
       • Data Distribution: bulk downloads of PubMed XML

  12. Get the best of both worlds?
       • Create a local copy of PubMed
       • New feature in EDirect v. 8.00!
       • Requires some extra hardware
       • Takes some time to configure
       • Remember: xtract works with any XML!

  13. Strategy 3: Understand the data
       • Get familiar with what is available
       • Know the data's limitations
       • Figure out what is possible, given the data

  14. Strategy 4: Decide how much to automate
       • Multiple solutions to most problems
       • Is a 100% solution worth the effort?
       • Does this job need a human?

  15. Strategy 5: Build one step at a time
       • Create each command separately
       • Find opportunities to troubleshoot
       • Test early. Test often.

  16. Case Study
       • Start with a goal
       • Identify our input, output, and format
       • Build one step at a time
       • Test frequently

  17. Case Study: Our Goal
     We want a list of articles about breast cancer that were published in the last year and are linked to ClinicalTrials.gov entries. For each article, we want:
       • PMID
       • NCT Number(s)
       • First Author
       • Journal

  18. Case Study: Identify your input
       • A PubMed search string: "breast cancer AND clinicaltrials.gov[si]"
       • Limited by date (March 2017 to February 2018)
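The input above could be expressed as an esearch command with a date limit. The sketch below assembles that command as a string rather than running it. It assumes the limit should apply to the publication date (`-datetype PDAT`) and that year/month values are precise enough for `-mindate` and `-maxdate`; the slide states only the month range, so both choices are assumptions. The function name `build_search` is hypothetical.

```python
import shlex


def build_search(query, mindate, maxdate, datetype="PDAT"):
    """Return the esearch stage, limited to a date range, as a shell command string."""
    args = [
        "esearch", "-db", "pubmed", "-query", query,
        "-datetype", datetype,   # assumption: limit by publication date
        "-mindate", mindate,     # start of range, YYYY/MM
        "-maxdate", maxdate,     # end of range, YYYY/MM
    ]
    return shlex.join(args)


search = build_search("breast cancer AND clinicaltrials.gov[si]", "2017/03", "2018/02")
print(search)
```

Running this stage on its own first and checking the reported record count is a quick way to confirm the search behaves as expected before adding efetch and xtract.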

  19. Case Study: Identify your output
       • PMID: MedlineCitation/PMID
       • NCT Number: AccessionNumber, but only if DataBankName is "ClinicalTrials.gov"

  20. Case Study: Identify your output (cont'd)
       • First Author: Author/LastName,Author/Initials, but only for the first author
       • Journal: ISOAbbreviation

  21. Case Study: Identify your format
       • One row per article
       • Four columns: PMID, Journal TA, First Author, NCT Number
       • Columns separated by tabs
       • Multiple NCT Numbers separated by "|"
       • Saved to a text file (to open in Excel)
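Before building the real command, it can help to pin down the target format on a tiny sample. The Python sketch below applies the output and format rules from the case study to two hand-written records in the shape of PubMed XML. The records, the function name `article_row`, and the choice to join last name and initials with a space are all assumptions for illustration; this is not the xtract command built in class.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample records in the shape of PubMed XML (not real articles).
SAMPLE = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>111</PMID>
      <Article>
        <Journal><ISOAbbreviation>J Example</ISOAbbreviation></Journal>
        <AuthorList>
          <Author><LastName>Doe</LastName><Initials>JA</Initials></Author>
          <Author><LastName>Smith</LastName><Initials>K</Initials></Author>
        </AuthorList>
        <DataBankList>
          <DataBank>
            <DataBankName>ClinicalTrials.gov</DataBankName>
            <AccessionNumberList>
              <AccessionNumber>NCT00000001</AccessionNumber>
              <AccessionNumber>NCT00000002</AccessionNumber>
            </AccessionNumberList>
          </DataBank>
          <DataBank>
            <DataBankName>GENBANK</DataBankName>
            <AccessionNumberList>
              <AccessionNumber>AB000001</AccessionNumber>
            </AccessionNumberList>
          </DataBank>
        </DataBankList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>222</PMID>
      <Article>
        <Journal><ISOAbbreviation>Example Rev</ISOAbbreviation></Journal>
        <AuthorList>
          <Author><LastName>Roe</LastName><Initials>B</Initials></Author>
        </AuthorList>
        <DataBankList>
          <DataBank>
            <DataBankName>ClinicalTrials.gov</DataBankName>
            <AccessionNumberList>
              <AccessionNumber>NCT00000003</AccessionNumber>
            </AccessionNumberList>
          </DataBank>
        </DataBankList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def article_row(article):
    """Build one tab-separated row: PMID, journal, first author, NCT numbers."""
    citation = article.find("MedlineCitation")
    pmid = citation.findtext("PMID", default="")
    journal = citation.findtext("Article/Journal/ISOAbbreviation", default="")

    # find() returns only the first matching element, i.e. the first author.
    first = citation.find("Article/AuthorList/Author")
    author = ""
    if first is not None:
        parts = [first.findtext("LastName"), first.findtext("Initials")]
        author = " ".join(p for p in parts if p)

    # Keep accession numbers only from DataBank blocks named ClinicalTrials.gov.
    ncts = [
        acc.text
        for bank in citation.iterfind("Article/DataBankList/DataBank")
        if bank.findtext("DataBankName") == "ClinicalTrials.gov"
        for acc in bank.iterfind("AccessionNumberList/AccessionNumber")
        if acc.text
    ]
    return "\t".join([pmid, journal, author, "|".join(ncts)])


rows = [article_row(a) for a in ET.fromstring(SAMPLE).iter("PubmedArticle")]
print("\n".join(rows))
```

The first sample record also carries a GENBANK accession number, which is left out of the NCT column because its DataBankName does not match. Writing the joined rows to a `.txt` file would give a tab-delimited file that can be opened in Excel.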

  22. Case Study: Time to build!

  23. Solving your problems!

  24. Next steps
       • NCBI EDirect Cookbook
       • Insider's Guide online: https://dataguide.nlm.nih.gov
       • Sign up for the "utilities-announce" mailing list.
       • CE Credit? Complete your final assignment!

  25. Final Assignment
       • A few questions based on real-world problems
       • Will be distributed via e-mail shortly
       • Instructions are on the assignment
       • DUE: 11:59 PM EDT, March 26, 2018

  26. Questions?
