Study on Detecting Modified Peptides in Complex Mixtures by the ABRF Proteome Informatics Research Group


The ABRF Proteome Informatics Research Group (iPRG) ran its 2012 study to evaluate participants' ability to identify modified peptides in a complex mixture. Secondary goals were to understand why result sets differ between participants and to produce a benchmark dataset. Participants received a common dataset, sequence database, and reporting template, and were free to analyze the data with the tools and methods of their choosing.

  • Proteomics
  • Peptide Analysis
  • Bioinformatics
  • Research Study
  • Data Analysis


Presentation Transcript


  1. ABRF Proteome Informatics Research Group. iPRG 2012: A Study on Detecting Modified Peptides in a Complex Mixture. ABRF 2012, Orlando, FL, 3/17-20/2012

  2. iPRG 2012 Study: Design

  3. Study Goals
     • Primary: Evaluate the ability of participants to identify modified peptides present in a complex mixture
     • Secondary: Find out why result sets might differ between participants
     • Tertiary: Produce a benchmark dataset, along with an analysis resource

  4. Study Design
     • Use a common, rich dataset
     • Use a common sequence database
     • Allow participants to use the bioinformatic tools and methods of their choosing
     • Use a common reporting template
     • Report results at an estimated 1% FDR (at the spectrum level)
     • Ignore protein inference
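The slides do not say how each participant estimated the 1% spectrum-level FDR. One widely used approach is target-decoy counting, and the sketch below assumes it: each PSM carries a score (higher is better) and a flag saying whether it matched a decoy sequence. The function name and the input layout are hypothetical, chosen only for illustration.

```python
def threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) tuples; higher score = better match.
    FDR is estimated as decoys / targets among PSMs at or above the cutoff
    (an assumed estimator; the study did not prescribe one).
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate once at least one target has been accepted.
        if targets and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


example = [(5.0, False), (4.0, False), (3.0, True), (2.0, False)]
print(threshold_at_fdr(example, max_fdr=0.1))
```

Tied scores are not handled specially here; a production filter would evaluate the FDR only after all PSMs sharing a score have been counted.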

  5. Sample
     • Tryptic digest of yeast (RM8323 NIST), spiked with 69 synthetic modified peptides (tryptic peptides from 6 different proteins; sPRG)
     • Modifications:
       • Phospho (STY)
       • Sulfo (Y)
       • Mono-, di-, trimethyl (K)
       • Mono-, dimethyl (R)
       • Acetyl (K)
       • Nitro (Y)

  6. Supplied Study Materials
     • 5600 TripleTOF dataset (i.e., WIFF file)
       • WIFF, mzML, dta, MGF (de-isotoped); conversions by MS Data Converter 1.1.0
       • MGF (not de-isotoped); conversion by Mascot Distiller 2.4
     • 1 FASTA file (UniProtKB/SwissProt S. cerevisiae, human, + 1 bovine protein + trypsin, from Dec. 2011)
     • 1 template (Excel)
     • 1 online survey (Survey Monkey)

  7. Instructions to Participants
     1. Retrieve and analyze the data file in the format of your choosing, with the method(s) of your choosing
     2. Report the peptide-to-spectrum matches in the provided template
     3. Report measures of reliability for PTM site assignments (optional)
     4. Fill out the survey
     5. Attach a 1-2 page description of the methodology employed

  8. iPRG 2012 Study: Participation

  9. Soliciting Participants and Logistics
     • Study advertised on the ABRF website and listserv and by direct invitation from iPRG members
     1. Participant: email participation request to iPRGxxxx@gmail.com
     2. iPRG members: send official study letter with instructions
     3. Anonymizer: all further communication (e.g., questions and answers, submission) goes through iPRGxxx.anonymous@gmail.com

  10. Participants (i): overall numbers
      • 24 submissions
      • One participant submitted two result sets
      • 9 initialed iPRG member submissions (with appended "i")
      • 2 vendor submissions (identifiable by appended "v")

  11. About the Participant
      [Pie charts; proportions not recoverable from the transcript.]
      • Membership: ABRF Member; Nonmember
      • Experience with this type of data: I routinely analyze these sorts of data; I have worked with several data sets; I have worked with a few data sets; Complete Novice
      • Role: Bioinformatician/Software Developer; Mass Spectrometrist; Lab Scientist; Director/Manager; Post-Grad

  12. About the Participant's Lab
      [Pie charts; proportions not recoverable from the transcript.]
      • Sector: Academic; Manufacturer/Vendor; Biotech/Pharma/Industry; Government
      • Region: North America; Europe; Asia; Australia/NZ; Africa; Other
      • Lab type: Core only; Software development only (no research facility); Conduct both core functions and non-core lab research

  13. Participation in sPRG Study
      [Pie chart: YES / NO; proportions not recoverable from the transcript.]
      Only one participant indicated he used sPRG information to aid his analysis. This person was one of the least successful in identifying the spiked-in peptides!

  14. Search Engine Used
      [Bar chart, y-axis 0-10; bar labels and values not recoverable from the transcript.]

  15. Site Localization Software
      [Bar chart, y-axis 0-8; bar labels and values not recoverable from the transcript.]
      4 participants did not list using software for site localization.

  16. Summary of Submitted Results
      [Bar chart, y-axis 0-7000. Series: "# spectra Id Yes" and "# unique Peptides UC ID Yes". Chart annotation: "Only reported modified peptides". Bar values not recoverable from the transcript.]
      Participant identifiers on the x-axis: 93128i, 87133i, 94158i, 97053i, 42424i, 77777i, 40104i, 87048i, 34284i, 71755v, 58288v, 33564, 11211, 58409, 23068, 92653, 23117, 74564, 14151, 52781, 47603, 14152, 45511, 11821

  17. Summary of IDs and Localizations
      [Two stacked bar charts per participant; bar values not recoverable from the transcript.]
      • Peptide Identification in all Spectra (y-axis: # Spectra, 0-7000). Series: # No Mods; # Common Mods (^q, ^c, m, n, q); # Nterm Mods; # AA Mutation Mods; # Interesting Mods
      • Site Localization in Spectra With Interesting Modifications (y-axis: # Spectra, 0-700). Series: # Interesting Mod Loc Certainty Y; # Interesting Mod Loc Certainty N

  18. Overlap of spectrum identifications
      [Chart: cumulative # spectra (y-axis, 0-12000) against # participants agreeing (x-axis, 1-24); curve values not recoverable from the transcript.]
      7840 spectra were agreed on by 3 or more participants.
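The consensus rule on this slide (a spectrum counts when 3 or more participants agree, and per the later consensus slide agreement is on sequence, not modification localization) can be sketched as below. This is a minimal illustration, not the group's actual analysis code: the input layout, the function name, and the participant and spectrum labels are hypothetical, and lowercase letters are assumed to mark modified residues as in the slides' peptide lists.

```python
from collections import Counter, defaultdict


def consensus_spectra(results, min_agree=3):
    """Find spectra on which at least `min_agree` participants agree.

    results: {participant_id: {spectrum_id: peptide_sequence}}
    Sequences are upper-cased before voting so that differing site
    assignments (lowercase residues) still count as the same sequence.
    Returns {spectrum_id: (sequence, number_of_agreeing_participants)}.
    """
    votes = defaultdict(Counter)
    for identifications in results.values():
        for spectrum_id, sequence in identifications.items():
            votes[spectrum_id][sequence.upper()] += 1

    consensus = {}
    for spectrum_id, counter in votes.items():
        sequence, count = counter.most_common(1)[0]
        if count >= min_agree:
            consensus[spectrum_id] = (sequence, count)
    return consensus


# Hypothetical submissions from three participants.
submissions = {
    "A": {"s1": "DISLSDyK", "s2": "PEPTIDEK"},
    "B": {"s1": "DIsLSDYK", "s2": "PEPTIDER"},
    "C": {"s1": "DISLSDYK"},
}
print(consensus_spectra(submissions))
```

If two different sequences each reach the threshold for one spectrum, this sketch simply keeps the one with more votes; how such conflicts were resolved in the study is not stated in the slides.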

  19. Room for improvement in thresholding?
      [Stacked bar chart per participant (y-axis: # Spectra, 0-7500); bar values not recoverable from the transcript.]
      • Series:
        • #ND: No Id, Diff from Consensus
        • #Y<3P: Id Yes
        • #YD: Yes Id, Diff from Consensus
        • #NS: No Id, Same as Consensus
        • #YS: Yes Id, Same as Consensus
      • A table below the chart listed, for each participant: Peaklist (WIFF, mzML, mgf, or mgf_nd), Spectral Pre-Processing, Peptide Identification, Discovery of Unexpected Mods, Site Localization, Results Filtering, NTT (1, 2, or ?), and Experience. The per-participant cells are not recoverable from the transcript.
      • Experience values reported across the 24 submissions: < 1 year (1), 1-2 years (4), 3-4 years (2), 5-10 years (8), >10 years (9)
      • Abbreviations used in the table:
        • An: Andromeda/MaxQuant
        • AS: A-Score
        • By: Byonic
        • IH: In-house software
        • IP: IDPicker
        • M: Mascot
        • MDe: Mascot Delta Score
        • MDi: Mascot Distiller
        • MG: MS-GFDB
        • MM: MyriMatch
        • MO: MODa
        • O: OMSSA
        • Ot: Other
        • P/PP: Pep/Prot Prophet
        • PD: Proteome Discoverer
        • pF: pFind
        • Pk: PEAKS
        • PkDB: PEAKSDB
        • PPi: Protein Pilot
        • PPr: Protein Prospector
        • PR: PhosphoRS
        • PW: ProteoWizard
        • Sc: Scaffold
        • SM: Spectrum Mill
        • Sq: Sequest
        • ST: SpectraST
        • TPP: Trans-Proteomic Pipeline
        • XL: Excel
        • XT: X!Tandem

  20. ESR and FDR: Extraordinary Skill Rate or High False Discovery Rate?
      ESR + FDR = 100 * (Y<3P + YD) / (total ids Y)
      24 participants; 3 required for consensus.
      [Stacked bar chart per participant (y-axis: %, 0-22). Series: Y<3P percent; YD percent. Bar values not recoverable from the transcript.]
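The ESR + FDR figure on this slide is a simple ratio and can be computed as sketched below. One assumption is made here and should be read as such: "total ids Y" is taken to be the sum of the three "Yes" categories from the thresholding slide (YS, YD, and Y<3P). The function and argument names are illustrative.

```python
def esr_plus_fdr(n_ys, n_yd, n_y_lt3p):
    """Percentage of a participant's confident IDs not backed by consensus.

    n_ys:     confident IDs that match the consensus (YS)
    n_yd:     confident IDs that differ from the consensus (YD)
    n_y_lt3p: confident IDs on spectra with fewer than 3 agreeing
              participants, so no consensus exists (Y<3P)
    Assumes total ids Y = YS + YD + Y<3P.
    """
    total_y = n_ys + n_yd + n_y_lt3p
    if total_y == 0:
        raise ValueError("participant reported no confident identifications")
    return 100.0 * (n_y_lt3p + n_yd) / total_y


# Hypothetical counts for one participant.
print(esr_plus_fdr(n_ys=900, n_yd=40, n_y_lt3p=60))
```

The metric cannot by itself separate the two readings in the slide title: an identification outside the consensus may be a false positive, or a correct match that fewer than three others found.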

  21. Characteristics of consensus spectra
      7840 spectra with >=3 participants agreeing on sequence. Consensus requires agreement on sequence, but not on modification localization.
      [Bar chart on a log axis (1-10000). Counts as extracted from the bar labels:]
      • Nterm-Acetyl: 447
      • Nterm-Carbamyl: 3
      • Nterm-Other: 3
      • PyroGlu Q: 70
      • PyroGlu E: 11
      • PyroCarbamidomethylCys: 6
      • m (Oxidation): 94
      • n (Deamidation): 310
      • q (Deamidation): 183
      • c: 6
      • d: 94
      • e: 107
      • w (Oxidation): 5
      • p: 3
      • k: 165
      • r: 45
      • s: 294
      • t: 137
      • y: 132
      • No Variable Mods: 6117

  22. Peak lists
      • Two types of peak lists were supplied: deisotoped and non-deisotoped
      • Fragment charge state can only be determined from the non-deisotoped peak list
      • This requires the search engine to be able to de-isotope the spectrum

  23. Peaklists
      • Number of spectra with undefined precursor charge state:
        • Deisotoped: 1031 (304 in consensus results)
        • Non-deisotoped: 6094 (1140 in consensus results)
      • For 1013 out of 7840 consensus spectra, the precursor m/z differs by more than 0.02 Da between the deisotoped and non-deisotoped peak lists
      • For 238 consensus spectra, the peak lists specified different charge states:
        • 193 consensus results only possible with the deisotoped peak list
        • 45 consensus results only possible with the non-deisotoped peak list
      • For 19 consensus results, multiple people who searched the non-deisotoped (nd) peak list agreed on a confident different answer
      • For 4 consensus results, multiple people who searched the deisotoped peak list agreed on a confident different answer
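The peak-list comparison summarized on this slide (precursor m/z differing by more than 0.02 Da, and differing charge states) could be reproduced along the following lines once precursor values have been read from both files. This is a sketch under stated assumptions: parsing of the MGF files is left out, the dictionaries and scan names are hypothetical, and an undefined charge is represented as None.

```python
def compare_peak_lists(deisotoped, non_deisotoped, mz_tolerance=0.02):
    """Compare precursor m/z and charge between two peak lists.

    Each argument maps spectrum_id -> (precursor_mz, charge or None).
    Returns two sorted lists of spectrum ids:
      - ids whose precursor m/z differs by more than mz_tolerance (Da)
      - ids where both charges are defined and disagree
    Only spectra present in both peak lists are compared.
    """
    mz_differs = []
    charge_differs = []
    for spectrum_id in sorted(deisotoped.keys() & non_deisotoped.keys()):
        mz_a, charge_a = deisotoped[spectrum_id]
        mz_b, charge_b = non_deisotoped[spectrum_id]
        if abs(mz_a - mz_b) > mz_tolerance:
            mz_differs.append(spectrum_id)
        if charge_a is not None and charge_b is not None and charge_a != charge_b:
            charge_differs.append(spectrum_id)
    return mz_differs, charge_differs


# Illustrative values only; scan names are made up.
deiso = {"scan1": (465.19, 2), "scan2": (500.25, 2), "scan3": (600.30, None)}
nondeiso = {"scan1": (464.59, 3), "scan2": (500.26, 2), "scan3": (600.30, 3)}
print(compare_peak_lists(deiso, nondeiso))
```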

  24. Mixed Spectra
      [Figure: a mixed spectrum containing two precursors, 465.19 (2+) and 464.59 (3+); the deisotoped and non-deisotoped peak lists reported different precursors for it.]

  25. Synthetic Peptide ID by Peptide
      [Two bar charts showing # participants identifying each spiked peptide, grouped by modification; bar values not recoverable from the transcript. Lowercase letters mark the modified residues.]
      • Sulfo: TVIDyNGER, TIAQDyGVLK, SVSDyEGK, DISLSDyK, ALAPEyAK
      • Phospho: yKPEsDELtAEK, WVtFIsLLFLFssAYSR, VPQVstPtLVEVsR, VPQVstPtLVEVSR, VDAtEEsDLAQQyGVR, tyEtTLEK, tLsDyNIQK, TLSDyNIQK, tLsDYNIQK, tLSDYNIQK, tItLEVEPsDtIENVK, TITLEVEPsDtIENVK, THILLFLPKsVSDYEGK, tHILLFLPKsVsDyEGK, SVsDYEGK, NVAVDELsR, LVQAFQFtDK, LVNEVtEFAK, IFsIVEQR, EstLHLVLR, EsTLHLVLR, EStLHLVLR, DQGGELLsLR, DIsLsDyK, DIsLSDyK, DISLSDyK, DISLsDyK, AEFAEVsK, ADEGIsFR
      • Nitro: TLSDyNIQK, TIAQDyGVLK, SVSDyEGK, DISLSDyK, ALAPEyAK
      • Trimethyl: NGDTASPkEYTAGR, LkAQLGPDESK, LkAEGSEIR, FPkAEFAEVSK, EkLLDFIK
      • Methyl (K): NGDTASPkEYTAGR, LkAQLGPDESK, LkAEGSEIR, FPkAEFAEVSK, EkLLDFIK
      • Methyl (R): GTrDYSPR, GILrQITVNDLPVGR, LDELrDEGK, ESTLHLVLrLR, AEGSEIrLAK
      • Dimethyl (K): NGDTASPkEYTAGR, LkAQLGPDESK, LkAEGSEIR, EkLLDFIK, FPkAEFAEVSK
      • Dimethyl (R): GTrDYSPR, GILrQITVNDLPVGR, LDELrDEGK, AEGSEIrLAK, ADEGISFrGLFIIDDK
      • Acetyl: LkLVSELWDAGIK, LkAEGSEIR, GLFIIDDkGILR, FkDLGEENFK, DEGkASSAK

  26. Synthetic Peptide ID by Participant
      [Matrix figure: one column per participant (24 columns) and one row per spiked peptide, with rows grouped by modification: Acetyl (K), Dimethyl (K), Dimethyl (R), Methyl (K), Methyl (R), Nitro (Y), Phospho (STY), Sulfo (Y), Trimethyl (K). A "1" marks a peptide reported by that participant. Individual cells are not recoverable from the transcript.]

  27. Correct Localization of Modified Synthetic Peptides
      • 70 synthetic modified peptides were spiked into the sample
      • 7 of these were confidently found by no participant
      • Counted: correct localization and name of modification reported
      [Stacked bar chart per participant (y-axis: # Unique Peptides Correct Mod Localization, 0-70). Series: # Spiked Unique Peptides Correct Mod Loc, Name; Mod Loc Certainty N. # Spiked Unique Peptides Correct Mod Loc, Name; Mod Loc Certainty Y. Bar values not recoverable from the transcript.]

  28. FLR of Modified Synthetic Peptides
      FLR = 100% * (# PSMs with wrong localization of s,t,y,k,r) / (# PSMs with wrong + right localization of s,t,y,k,r)
      Ignored PSMs contain mods of residues other than s,t,y,k,r, i.e., sample handling mods (n, q, d, e, etc.).
      [Two bar charts per participant; bar values not recoverable from the transcript.]
      • # Spiked Peptide PSMs (y-axis 0-500). Series: # Spiked PSMs Mod Loc Certainty N; # Spiked PSMs Mod Loc Certainty Y, Ignored; # Spiked PSMs Wrong Mod Loc, Mod Loc Certainty Y; # Spiked PSMs Correct Mod Loc, Mod Loc Certainty Y
      • Spiked Peptide PSM FLR (%) (y-axis 0-18)
      • Annotations below the chart, as extracted (participant mapping not recoverable): 1%, 1-2%, 10%, <30%, <1%, 5%, 5%, 1%, <1, 0.5, 0.01, <5%
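The FLR definition on this slide can be expressed as a short function. The sketch below makes simplifying assumptions that are not in the slides: each PSM is a dictionary with one reported modification, described by the hypothetical keys `certain`, `residue`, `reported_site`, and `true_site`. PSMs flagged as not certain, or whose modification is on a residue other than s, t, y, k, r, are skipped, mirroring the "Certainty N" and "Ignored" categories in the chart.

```python
SCORED_RESIDUES = set("STYKR")


def false_localization_rate(psms):
    """FLR (%) = 100 * wrong / (wrong + right) over scored, certain PSMs.

    Each PSM is a dict with (hypothetical) keys:
      certain:       True if the participant marked the site as certain
      residue:       one-letter code of the reported modified residue
      reported_site: 1-based position reported by the participant
      true_site:     1-based position in the synthetic peptide
    Returns None when no PSM qualifies.
    """
    wrong = 0
    right = 0
    for psm in psms:
        if not psm["certain"]:
            continue  # Mod Loc Certainty N: not scored
        if psm["residue"].upper() not in SCORED_RESIDUES:
            continue  # e.g. sample handling mods on n, q, d, e: ignored
        if psm["reported_site"] == psm["true_site"]:
            right += 1
        else:
            wrong += 1
    if wrong + right == 0:
        return None
    return 100.0 * wrong / (wrong + right)


# Hypothetical PSMs for illustration.
example_psms = [
    {"certain": True, "residue": "s", "reported_site": 10, "true_site": 10},
    {"certain": True, "residue": "s", "reported_site": 12, "true_site": 10},
    {"certain": True, "residue": "y", "reported_site": 7, "true_site": 7},
    {"certain": True, "residue": "k", "reported_site": 2, "true_site": 2},
    {"certain": False, "residue": "t", "reported_site": 1, "true_site": 3},
    {"certain": True, "residue": "n", "reported_site": 4, "true_site": 4},
]
print(false_localization_rate(example_psms))
```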

  29. Incorrect Localization by Peptide
      Number of PSMs with incorrect site localization (Mod Loc Confidence Y).
      [Table: one row per peptide, one column per participant; cell counts are not recoverable from the transcript.]
      • Annotations on the table:
        • Present as sulfo-Tyr
        • Present as phospho; S-10 often mislocalized as S-12 or Y-14
        • Present as mono-, di-, trimethyl K; often mislocalized at R
      • Peptides listed: EKLLDFIK, AEGSEIRLAK, VDATEESDLAQQYGVR, TITLEVEPSDTIENVK, ESTLHLVLR, AEFAEVSK, LKLVSELWDAGIK, DQGGELLSLR, TYETTLEK, NGDTASPKEYTAGR, LKAEGSEIR, TVIDYNGER, ADEGISFR, YKPESDELTAEK, GTRDYSPR, VPQVSTPTLVEVSR, ADEGISFRGLFIIDDK, ALAPEYAK, TIAQDYGVLK, THILLFLPKSVSDYEGK, WVTFISLLFLFSSAYSR, IFSIVEQR, TLSDYNIQK, GILRQITVNDLPVGR, NVAVDELSR, LDELRDEGK, ESTLHLVLRLR, DEGKASSAK, SVSDYEGK, LVQAFQFTDK, LVNEVTEFAK, GLFIIDDKGILR, FPKAEFAEVSK, LKAQLGPDESK, DISLSDYK, FKDLGEENFK

  30. Phospho vs Sulfo
      • DISLSDY(Phospho)K: modified fragment ions are observed
      • DISLSDY(Sulfo)K: unmodified fragment ions are observed; the spectrum looks essentially identical to the unmodified peptide spectrum
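As background to this slide, the two modifications are also close in mass, so when the fragment ions carry no modification the precursor mass is the main remaining evidence. The sketch below computes the gap for the slide's example peptide using standard monoisotopic masses (HPO3 for phospho, SO3 for sulfo). It is an illustration added here, not part of the study's analysis, and the helper names are hypothetical.

```python
# Standard monoisotopic masses in Da; only the residues of DISLSDYK are listed.
RESIDUE_MASS = {
    "D": 115.026943,
    "I": 113.084064,
    "S": 87.032028,
    "L": 113.084064,
    "Y": 163.063329,
    "K": 128.094963,
}
WATER = 18.010565
PHOSPHO = 79.966331  # HPO3
SULFO = 79.956815    # SO3


def neutral_mass(sequence, mod_delta=0.0):
    """Monoisotopic neutral mass of a peptide plus one modification delta."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + mod_delta


phospho_mass = neutral_mass("DISLSDYK", PHOSPHO)
sulfo_mass = neutral_mass("DISLSDYK", SULFO)
delta_da = phospho_mass - sulfo_mass
delta_ppm = 1e6 * delta_da / phospho_mass

print(f"phospho: {phospho_mass:.4f} Da, sulfo: {sulfo_mass:.4f} Da")
print(f"difference: {delta_da * 1000:.2f} mDa ({delta_ppm:.1f} ppm)")
```

A precursor tolerance wider than this gap would accept either modification, which is consistent with sulfo(Y) being reported as phospho in the study's conclusions.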

  31. Conclusions
      • Reasonable number of participants from around the globe, mainly experienced users but a few first-timers
      • Large spread in number of spectra identified
      • False negatives (NS) are generally much higher than false positives, so there is generally room for improvement
      • Peak list was a significant factor in performance
      • Varied performance in detecting PTMs
        • Most participants struggled with sulfation
        • Multiply phosphorylated peptides were harder to find than singly phosphorylated ones
      • Most common errors in site assignment were:
        • Reporting sulfo(Y) as phospho(ST)
        • Mis-assignment of site(s) in multiply phosphorylated peptides

  32. What did the participants think?
      • "The spiked proteins made it possible to game the study - look for the uncommon modifications only on the spikes. Of course we didn't do this."
      • "Overall I'd say this was a flawed but very interesting ABRF study."
      • 22 out of 24 participants found the study useful
      • "Too many modifications at the same time. Manual validation is necessary and the right time necessary for this study is too demanding for this challenge."

  33. Participant's Confidence in Analyzing PTM Data
      [Two pie charts, Before and After; proportions not recoverable from the transcript. Categories: Very Confident; Confident; Not Confident; No Experience.]

  34. How difficult do you think this study was?
      [Pie chart: Too difficult; Challenging; Just right. Proportions not recoverable from the transcript.]
      What was your total analysis time for the entire project?
      [Histogram, y-axis 0-12; x-axis Time (hrs) in bins 0-8, 8-16, 16-24, 24-32, 32-40, 40-48, >48. Bar values not recoverable from the transcript.]

  35. Based on this study, would you consider participating in future ABRF studies?
      [Pie chart: Yes / No; proportions not recoverable from the transcript.]

  36. Thank you! Questions?
      THANK YOU TO ALL STUDY PARTICIPANTS!
      • iPRG: Nuno Bandeira, Robert Chalkley (chair), Matt Chambers, Karl Clauser, John Cottrell, Eric Deutsch, Eugene Kapp, Henry Lam, Hayes McDonald, Tom Neubert (EB liaison), Ruixiang Sun
      • Dataset Creation: Chris Colangelo
      • Anonymizer: Jeremy Carver, UCSD
