Protein Sequencing and Mass Spectrometry Insights in CSE182 Course

Explore the dynamic nature of cells, peptide backbone fragmentation, mass spectrometry's role in proteomics, and the promise of computational algorithms for data interpretation in the CSE182 course on protein sequencing and mass spectrometry. Engage with topics ranging from gene finding, protein motifs, and DNA signals to population genetics and cellular functions.

Presentation Transcript


  1. CSE182-L11 Protein sequencing and Mass Spectrometry CSE182

  2. Course Summary: Sequence comparison (BLAST and other tools); protein motifs (profiles, regular expressions, HMMs); discovering protein-coding genes (gene-finding HMMs, DNA signals such as splice signals); how the genomic sequence itself is obtained (sequencing and assembly, LW statistics); population genetics; ESTs; protein sequence analysis (BLAST, dictionary matching, profiles, regular expressions, HMMs). Next topic: the dynamic aspects of the cell.

  3. The dynamic nature of the cell: The molecules in the body, such as RNA and proteins, are constantly turning over. New ones are created through transcription and translation, proteins are modified post-translationally, and old molecules are degraded.

  4. Dynamic aspects of cellular function: expressed transcripts (microarrays count the number of copies of each RNA); expressed proteins (mass spectrometry is used to count the number of copies of a protein sequence); protein-protein interactions (protein networks); protein-DNA interactions; population studies.

  5. The peptide backbone: The peptide backbone breaks to form fragments with characteristic masses. [Figure: the backbone H-...-NH-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-...-OH, running from the N-terminus to the C-terminus, with amino-acid residues i-1, i and i+1 labeled.]

  6. Mass Spectrometry

  7. Nobel citation 02

  8. The promise of mass spectrometry: Mass spectrometry is coming of age as the tool of choice for proteomics: protein sequencing, networks, quantitation, interactions, and structure. Computation has a big role to play in the interpretation of MS data. We will discuss algorithms for sequencing, modifications, and interactions.

  9. Sample preparation: enzymatic digestion (trypsin) and fractionation.

  10. Single-stage MS: LC-MS acquires 1 MS spectrum per second.

  11. Tandem MS: secondary fragmentation of the ionized parent peptide.

  12. The peptide backbone (repeat of slide 5): The peptide backbone breaks to form fragments with characteristic masses.

  13. Ionization: The peptide backbone breaks to form fragments with characteristic masses. [Figure: the backbone H-...-NH-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-...-OH (N-terminus to C-terminus) carrying an extra proton (H+), labeled "Ionized parent peptide".]

  14. Fragment ion generation: The peptide backbone breaks to form fragments with characteristic masses. [Figure: the protonated backbone cleaved at the peptide bond between residues i-1 and i (H-...-NH-CH(R_{i-1})-CO | NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-...-OH), labeled "Ionized peptide fragment".]

  15. Tandem MS for Peptide ID (March 25)
      [Figure: MS/MS spectrum, % intensity (0 to 100) vs. m/z (0 to 1000), with the [M+2H]2+ precursor peak marked.]
      Residue:  S    G    F    L    E    E    D    E    L    K
      b ions:   88   145  292  405  534  663  778  907  1020 1166
      y ions:   1166 1080 1022 875  762  633  504  389  260  147

  16. Peak Assignment: Peak assignment implies sequence (residue tag) reconstruction! [Figure: the spectrum and b/y ladder of slide 15 (SGFLEEDELK), with peaks labeled b3 to b9 and y2 to y9 and the [M+2H]2+ precursor marked.]

  17. END OF MS LECTURE 5/26/16

  18. Database searching for peptide ID: For every peptide from a database, generate a hypothetical spectrum, compute a correlation between the observed (experimental) and the hypothetical (theoretical) spectra, and choose the best match. Database searching is very powerful and is the de facto standard for MS (Sequest, Mascot, and many others).
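
The database-search loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Sequest or Mascot score spectra: the helper names (`theoretical_spectrum`, `shared_peaks`, `search`) are hypothetical, masses are nominal integers, only singly charged b- and y-ions are generated (using the b = P+1, y = S+19 offsets introduced later in the lecture), and a simple shared-peak count stands in for the correlation score.

```python
# Nominal (integer) residue masses in Da; L/I and K/Q are indistinguishable here.
NOMINAL = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
           "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def theoretical_spectrum(peptide):
    """Hypothetical spectrum: singly charged b-ions (P+1) and y-ions (S+19)."""
    masses = [NOMINAL[aa] for aa in peptide]
    total = sum(masses)
    peaks = set()
    prefix = 0
    for m in masses[:-1]:  # every internal cleavage site
        prefix += m
        peaks.add(prefix + 1)           # b-ion
        peaks.add(total - prefix + 19)  # complementary y-ion
    return peaks


def shared_peaks(observed, peptide, tol=0.5):
    """Score = number of observed peaks within tol of some theoretical peak."""
    theory = theoretical_spectrum(peptide)
    return sum(1 for p in observed if any(abs(p - t) <= tol for t in theory))


def search(observed, database):
    """Return the database peptide whose hypothetical spectrum scores best."""
    return max(database, key=lambda pep: shared_peaks(observed, pep))


observed = [88, 145, 147, 276]
print(search(observed, ["GASK", "SGEK", "SGFLEEDELK"]))  # SGEK
```

A real search would also restrict candidates to peptides whose mass matches the precursor; that filter is left out here for brevity.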

  19. Spectra, the real story: noise peaks; ions, not prefixes and suffixes; mass-to-charge ratio, not mass; multiply charged ions; isotope patterns, not single peaks.

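  20. Peptide fragmentation possibilities (ion types): [Figure: the backbone -HN-CH(R_i)-CO-NH-CH(R_{i+1})-CO-NH- with its cleavage sites marked. N-terminal fragments: a_i, b_i, c_i, b_{i+1}, d_{i+1}; C-terminal fragments: x_{n-i}, y_{n-i}, z_{n-i}, v_{n-i}, w_{n-i}, y_{n-i-1}. The fragments are grouped into low-energy and high-energy fragments.]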

  21. Ion types and offsets: P = prefix residue mass, S = suffix residue mass. b-ions = P+1; y-ions = S+19; a-ions = P-27.
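
As a sketch of how these offsets generate a full fragment ladder, the snippet below applies b = P+1, a = P-27 and y = S+19 at every cleavage site of the slide 15 peptide. The function name `ion_ladder` and the nominal (integer) residue masses are assumptions made for illustration.

```python
# Nominal (integer) residue masses in Da.
NOMINAL = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
           "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def ion_ladder(peptide):
    """Return singly charged (b, a, y) ion masses.

    b and a are listed from the N-terminus (b1, b2, ...),
    y from the C-terminus (y1, y2, ...).
    """
    masses = [NOMINAL[aa] for aa in peptide]
    total = sum(masses)
    b, a, y = [], [], []
    prefix = 0
    for m in masses[:-1]:
        prefix += m               # P: prefix residue mass
        suffix = total - prefix   # S: complementary suffix residue mass
        b.append(prefix + 1)
        a.append(prefix - 27)
        y.append(suffix + 19)
    return b, a, y[::-1]          # reverse so that y[0] is y1


b, a, y = ion_ladder("SGFLEEDELK")
print(b)  # [88, 145, 292, 405, 534, 663, 778, 907, 1020]
print(y)  # [147, 260, 389, 504, 633, 762, 875, 1022, 1079]
```

The b-ions and y1 to y8 match the slide 15 values. For y9 this sketch gives 1079 where the slide shows 1080, presumably because the slide rounds a monoisotopic value (about 1079.53) rather than summing nominal masses.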

  22. Mass-charge ratio: The x-axis is not mass but m/z = (M+Z)/Z. Z=1 implies that the peak is at M+1; Z=2 implies that the peak is at (M+2)/2. For example, with M=1000 and Z=2, the peak is at 501. Quiz: Suppose you see a peak at 501. Is the mass 500, or is it 1000?
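
The m/z relation and the quiz can be checked with two small functions. They keep the slide's simplification that each added proton contributes 1 Da; `mz` and `candidate_masses` are hypothetical helper names.

```python
def mz(mass, z):
    """Peak position (M + Z) / Z, treating each added proton as 1 Da."""
    return (mass + z) / z


def candidate_masses(peak, max_charge=3):
    """Invert the formula: the mass M that puts a peak at this m/z, per charge."""
    return {z: peak * z - z for z in range(1, max_charge + 1)}


print(mz(1000, 2))               # 501.0
print(candidate_masses(501, 2))  # {1: 500, 2: 1000}
```

A single peak at 501 is therefore ambiguous on its own; the charge has to come from other evidence, such as the spacing of the isotope peaks.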

  23. Isotopic peaks: Example: consider the peptide SAM, with mass 308.12802. You would expect to see a single peak at 308.13. Instead, you see a cluster of isotopic peaks (the figure marks 308.13 and 310.13).

  24. Isotopes: C-12 is the most common carbon isotope. Suppose C-13 occurs with probability 1%. Example: SAM, composition C11 H22 N3 O5 S1. What is the probability that you will see a single C-13? $\binom{11}{1}(0.01)(0.99)^{10}$. Note that C, S, O and N all have isotopes. Can you compute the isotopic distribution?

  25. All atoms have isotopes: for example O-16/O-18, C-12/C-13 and S-32/S-34. Each isotope has a frequency of occurrence. If a molecule (peptide) contains a single C-13 atom, its peak shifts by 1 Da. With multiple copies of a peptide, we get a distribution of intensities over a range of masses (the isotopic profile). How can you compute the isotopic profile of a peak?

  26. Isotope calculation: Denote N_C: the number of carbon atoms in the peptide; p_C: the probability of occurrence of C-13 (~1%). Then
      $\Pr[\text{peak at } M] = \binom{N_C}{0}(1-p_C)^{N_C}\,p_C^{0}$
      $\Pr[\text{peak at } M+1] = \binom{N_C}{1}(1-p_C)^{N_C-1}\,p_C^{1}$
      [Figure: the resulting M and M+1 peaks for N_C = 50 and N_C = 200.]
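
The binomial formulas above can be evaluated directly. This minimal sketch assumes, as the slides do, that each carbon is independently C-13 with probability 1% and ignores every other element; `carbon_isotope_probs` is a hypothetical name.

```python
from math import comb


def carbon_isotope_probs(n_carbon, p13=0.01, max_extra=2):
    """Pr[peak at M + k] for k = 0..max_extra, assuming each carbon is C-13
    independently with probability p13 and ignoring all other elements."""
    return [comb(n_carbon, k) * p13**k * (1 - p13) ** (n_carbon - k)
            for k in range(max_extra + 1)]


# Slide 24: exactly one C-13 among the 11 carbons of SAM.
print(round(carbon_isotope_probs(11)[1], 4))  # 0.0995

for n in (50, 200):
    print(n, [round(p, 3) for p in carbon_isotope_probs(n)])
```

For N_C = 50 the peak at M (about 0.61) still dominates, while for N_C = 200 the M+1 term (about 0.27) is larger than the M term (about 0.13), so the most intense peak of a large peptide need not be the monoisotopic one.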

  27. Isotope calculation example: Suppose we consider nitrogen and carbon. N_N: the number of nitrogen atoms; p_N: the probability of occurrence of N-15. What are Pr(peak at M), Pr(peak at M+1) and Pr(peak at M+2)?
      $\Pr[\text{peak at } M] = \binom{N_C}{0}(1-p_C)^{N_C}p_C^{0}\binom{N_N}{0}(1-p_N)^{N_N}p_N^{0}$
      $\Pr[\text{peak at } M+1] = \binom{N_C}{1}(1-p_C)^{N_C-1}p_C^{1}\binom{N_N}{0}(1-p_N)^{N_N}p_N^{0} + \binom{N_C}{0}(1-p_C)^{N_C}p_C^{0}\binom{N_N}{1}(1-p_N)^{N_N-1}p_N^{1}$
      How do we generalize? How can we handle oxygen (O-16, O-18)?

  28. General isotope computation: Definition: let p_{i,a} be the abundance of the isotope of element a whose mass is i Da above the least mass. Example: p_{0,C} is the abundance of C-12, p_{2,O} that of O-18, etc. Characteristic polynomial, where N_a is the number of atoms of element a:
      $f(x) = \prod_a \left(p_{0,a} + p_{1,a}x + p_{2,a}x^2 + \cdots\right)^{N_a}$
      Prob{M+i} is the coefficient of x^i in f(x) (a binomial convolution).
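
Since Prob{M+i} is a coefficient of a product of polynomials, the isotopic profile amounts to repeated polynomial multiplication, i.e. convolution of coefficient lists. The sketch below assumes that representation; the helper names are hypothetical, and the abundance table holds approximate natural abundances included only for illustration, not values taken from the lecture.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (a convolution)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_pow(p, n):
    """p(x) ** n by repeated multiplication."""
    result = [1.0]
    for _ in range(n):
        result = poly_mul(result, p)
    return result


def isotope_profile(composition, abundances):
    """Coefficient i of prod_a f_a(x) ** N_a, i.e. Prob{M + i}."""
    profile = [1.0]
    for element, count in composition.items():
        profile = poly_mul(profile, poly_pow(abundances[element], count))
    return profile


# Approximate natural isotope abundances (index i = i Da above the lightest
# isotope). Illustrative values only; check a reference table before real use.
ABUNDANCES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}

sam = isotope_profile({"C": 11, "H": 22, "N": 3, "O": 5, "S": 1}, ABUNDANCES)
print([round(p, 4) for p in sam[:4]])
```

With a single element and two isotopes, the same code reduces to the binomial formulas of slides 26 and 27.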

  29. The isotope characteristic polynomial: What is the characteristic polynomial for 1 carbon atom? For 1 nitrogen atom? For N_C carbon atoms? For 1 carbon and 1 nitrogen atom together? For a peptide, e.g. SAM (C11 H22 N3 O5 S1)?

  30. Isotopic profile application: In DxMS, hydrogen atoms are exchanged with deuterium. The rate of exchange indicates how buried the peptide is (in the folded state). Consider the observed characteristic polynomials f_{t1}(x) and f_{t2}(x) of the isotope profile at time points t1 and t2. Then
      $f_{t_2}(x) = f_{t_1}(x)\,(p_{0,H} + p_{1,H}x)^{N_H}$
      Estimates of p_{1,H} can be obtained by deconvolution. Such estimates at various time points should give the rate of incorporation of deuterium and, therefore, the accessibility.
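
One way to carry out the deconvolution is to search for the p1,H that best explains f_t2 given f_t1. The sketch below uses a plain grid search with squared error as a stand-in for a proper deconvolution method. It assumes p0,H = 1 - p1,H and a known N_H; `predict_t2` and `estimate_p1` are hypothetical names, and the check uses synthetic data, not real DxMS measurements.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_pow(p, n):
    """p(x) ** n by repeated multiplication."""
    result = [1.0]
    for _ in range(n):
        result = poly_mul(result, p)
    return result


def predict_t2(f_t1, p1_h, n_h):
    """f_t2(x) = f_t1(x) * (p0_H + p1_H x) ** N_H, assuming p0_H = 1 - p1_H."""
    return poly_mul(f_t1, poly_pow([1.0 - p1_h, p1_h], n_h))


def estimate_p1(f_t1, f_t2, n_h, steps=1000):
    """Pick the p1_H on a grid over [0, 1] whose prediction best matches f_t2
    in squared error (a simple stand-in for a proper deconvolution)."""
    best_p, best_err = 0.0, float("inf")
    for s in range(steps + 1):
        p = s / steps
        pred = predict_t2(f_t1, p, n_h)
        n = max(len(pred), len(f_t2))
        pred = pred + [0.0] * (n - len(pred))        # pad to a common length
        obs = list(f_t2) + [0.0] * (n - len(f_t2))
        err = sum((u - v) ** 2 for u, v in zip(pred, obs))
        if err < best_err:
            best_p, best_err = p, err
    return best_p


# Synthetic check: build f_t2 from a known p1_H, then recover it.
f_t1 = [0.6, 0.3, 0.1]
f_t2 = predict_t2(f_t1, 0.3, 5)
print(estimate_p1(f_t1, f_t2, 5))  # 0.3
```

Repeating the estimate at several time points would give the incorporation curve described above; a least-squares or polynomial-division approach would be faster than a grid, at the cost of more code.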

  31. Quiz: How can you determine the charge on a peptide? The difference between the first and second isotope peaks is 1/Z. Proposal: given a mass, predict a composition and its isotopic profile; do a goodness-of-fit test to isolate the peaks corresponding to the isotopes; compute the difference.
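
The 1/Z spacing rule translates into a short estimate once the isotope peaks of a single cluster have been isolated; the goodness-of-fit step above is assumed to have happened already. The spacing is treated as exactly 1/Z, matching the slides' 1 Da approximation, and `charge_from_isotopes` is a hypothetical helper.

```python
def charge_from_isotopes(mz_peaks):
    """Estimate Z as 1 / (mean spacing of consecutive isotope peaks),
    assuming all peaks belong to one isotope cluster."""
    peaks = sorted(mz_peaks)
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return round(len(gaps) / sum(gaps))


print(charge_from_isotopes([501.0, 501.5, 502.0]))  # 2
```

Applied to the quiz on slide 22, isotope peaks at 501.0, 501.5 and 502.0 imply Z = 2, so the mass is 1000 rather than 500.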

  32. Tandem MS summary: The basics of peptide ID using tandem MS are simple: correlate experimental with theoretical spectra. In practice, there may be many confounding problems: isotope peaks, noise peaks, varying charges, post-translational modifications, and no database. Recall that we discussed how peptides could be identified by scanning a database. What if the database did not contain the peptide of interest?

  33. De novo analysis basics: Suppose all ions were prefix ions. Could you tell what the peptide was? Can post-translational modifications help?

  34. Ion mass computations: Amino acids are linked into peptide chains by forming peptide bonds. Residue mass: Res.Mass(aa) = Mol.Mass(aa) - 18 (loss of water).

  35. Peptide chains: MolMass(SGFAL) = resM(S) + resM(G) + resM(F) + resM(A) + resM(L) + 18
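
The residue-mass bookkeeping of slides 34 and 35 can be written out as follows. The free amino-acid masses are nominal integer values for the five residues of SGFAL only, and the function names are hypothetical.

```python
# Nominal masses (Da) of a few free amino acids; illustrative integer values.
FREE_AA = {"G": 75, "A": 89, "S": 105, "L": 131, "F": 165}
WATER = 18


def residue_mass(aa):
    """Res.Mass(aa) = Mol.Mass(aa) - 18: one water is lost per peptide bond."""
    return FREE_AA[aa] - WATER


def peptide_mass(peptide):
    """Sum of residue masses plus one water for the free N- and C-termini."""
    return sum(residue_mass(aa) for aa in peptide) + WATER


print(residue_mass("S"))      # 87
print(peptide_mass("SGFAL"))  # 493
```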

  36. M/Z values for b/y-ions: [Figure: an ionized peptide (H+) NH2-CH(R)-CO-...-NH-CH(R)-COOH and its two fragments: a b-ion (N-terminal, ending in -CO) and a y-ion (C-terminal, NH3+-...-COOH).] Singly charged b-ion = ResMass(prefix) + 1. Singly charged y-ion = ResMass(suffix) + 18 + 1. What if the ions have higher units of charge?

  37. De novo interpretation: Given a spectrum (a collection of b- and y-ions), compute the peptide that generated the spectrum. A database of peptides is not given! Why is this useful? Many genomes have not been sequenced; tagging/filtering; PTMs.

  38. De novo interpretation: example
      Peptide:  S  G  E  K
      b-ions:   0, 88, 145, 274, 402
      y-ions:   420, 333, 276, 147, 0
      Ion offsets: b = P+1; y = S+19 = M-P+19.
      [Figure: spectrum over M/Z 100 to 500 with peaks labeled b1, b2, y1 and y2.]

  39. Computing possible prefixes: We know the parent mass M = 401 (the total residue mass of SGEK, 87 + 57 + 129 + 128). Consider the mass value 88 and assume it is either a b-ion or a y-ion. If it is a b-ion, it corresponds to a prefix of the peptide with residue mass 88 - 1 = 87. If it is a y-ion, y = M - P + 19; therefore the prefix has mass P = M - y + 19 = 401 - 88 + 19 = 332. Compute all possible prefix residue masses (PRMs) for all ions.

  40. Putative prefix masses: Only a subset of the prefix masses is correct. The correct mass values form a ladder of amino-acid residues.
      Peak   PRM if b-ion   PRM if y-ion (M = 401)
      88     87             332
      145    144            275
      147    146            273
      276    275            144
      Correct ladder: 0, 87, 144, 273, 401 (residues S, G, E, K)
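
Computing the putative prefix masses is a direct application of the two offsets. This sketch reproduces the prefix masses listed above for the peaks 88, 145, 147 and 276 with M = 401; `candidate_prms` is a hypothetical name.

```python
def candidate_prms(peaks, parent_mass):
    """For each peak, the prefix residue mass (PRM) under the b-ion reading
    (P = peak - 1) and under the y-ion reading (P = M - peak + 19)."""
    return {p: (p - 1, parent_mass - p + 19) for p in peaks}


prms = candidate_prms([88, 145, 147, 276], 401)
for peak, (as_b, as_y) in prms.items():
    print(peak, as_b, as_y)
```

The two readings of any peak always sum to (p - 1) + (M - p + 19) = M + 18, here 419, so every b/y pair sits symmetrically around (M + 18)/2.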

  41. Spectral graph: Each prefix residue mass (PRM) corresponds to a node. Two nodes are connected by an edge if their mass difference is a residue mass. A path in the graph is a de novo interpretation of the spectrum. [Figure: nodes 87 and 144 joined by an edge labeled G.]

  42. Spectral graph: Each peak, when assigned to a prefix/suffix ion type, generates a unique prefix residue mass. In the spectral graph, each node u defines a putative prefix residue mass M(u), and (u,v) ∈ E if M(v) - M(u) is the residue mass of an amino acid (tag) or 0. Paths in the spectral graph correspond to interpretations. [Figure: nodes 0, 87, 144, 146, 273, 275, 332 and 401 on a mass axis, with the residues S, G, E, K labeled along the correct path.]
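
A small Python sketch of the spectral graph: nodes are the distinct PRMs plus 0 and M, and an edge joins two nodes whose difference is a nominal residue mass. Merging equal PRMs into one node plays the role of the zero-mass edges. All names are hypothetical, and L/I and K/Q are reported together because they share a nominal mass.

```python
# Nominal residue mass (Da) -> residue label.
RESIDUES = {57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
            113: "L/I", 114: "N", 115: "D", 128: "K/Q", 129: "E", 131: "M",
            137: "H", 147: "F", 156: "R", 163: "Y", 186: "W"}


def spectral_graph(prms, parent_mass):
    """Nodes: distinct PRMs plus 0 and M; u -> v if m(v) - m(u) is a residue mass."""
    nodes = sorted({0, parent_mass, *prms})
    return {u: [v for v in nodes if v > u and v - u in RESIDUES] for u in nodes}


def all_paths(graph, start, end):
    """Enumerate every start -> end path (each one is a de novo interpretation)."""
    paths = []

    def dfs(u, path):
        if u == end:
            paths.append(path)
            return
        for v in graph[u]:
            dfs(v, path + [v])

    dfs(start, [start])
    return paths


prms = [87, 332, 144, 275, 146, 273, 275, 144]  # both readings of each peak (slide 40)
graph = spectral_graph(prms, 401)
for path in all_paths(graph, 0, 401):
    print(path, [RESIDUES[b - a] for a, b in zip(path, path[1:])])
```

The graph contains two paths from 0 to M: S-G-E-K/Q and S-W-K/Q (G + E and W both have nominal mass 186). The longer path explains more peaks, which is why an objective function is needed to choose among interpretations.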

  43. Re-defining de novo interpretation: Find a subset of nodes in the spectral graph such that 0 and M are included; each peak contributes at most one node (interpretation)(*); each adjacent pair (when sorted by mass) is connected by an edge (valid residue mass); and an appropriate objective function (e.g. the number of peaks interpreted) is maximized. [Figure: the spectral graph with nodes 0, 87, 144, 146, 273, 275, 332 and 401, and an edge labeled G between 87 and 144.]

  44. Two problems: (1) Too many nodes: only a small fraction of them correspond to b/y ions (leading to true PRMs); this is a learning problem. (2) Multiple interpretations: even if the b/y ions were correctly predicted, each peak generates multiple possibilities, only one of which is correct. We need to find a path that uses each peak only once (an algorithmic problem). In general, the forbidden pairs problem is NP-hard. [Figure: the spectral graph with nodes 0, 87, 144, 146, 273, 275, 332 and 401; residues S, G, E, K.]

  45. Too many nodes: We will use other properties to decide whether a peak is a b/y peak or not. For now, assume that σ(u) is a score function for a peak u being a b/y ion. [Figure: the spectral graph with nodes 0, 87, 144, 146, 273, 275, 332 and 401, each node u weighted by σ(u).]

  46. Multiple interpretations: Each peak generates multiple possibilities, only one of which is correct. We need to find a path that uses each peak only once (an algorithmic problem). In general, the forbidden pairs problem is NP-hard. However, the b and y ions have a special non-interleaving property: consider pairs (b1, y1) and (b2, y2); if b1 < b2, then y1 > y2.

  47. Non-intersecting forbidden pairs: If we consider only b and y ions, forbidden node pairs are non-intersecting, and the de novo problem can be solved efficiently using dynamic programming. [Figure: nested forbidden pairs, such as 87 and 332, on a mass axis from 0 to 400, with residues S, G, E, K.]

  48. The forbidden pairs method: Sort the PRMs by increasing mass. For each node u, f(u) denotes its forbidden partner. Let m(u) denote the mass value of the PRM and σ(u) the score of u. Objective: find a path of maximum score with no forbidden pairs. [Figure: a node u and its forbidden partner f(u) on a mass axis from 0 to 400.]

  49. D.P. for forbidden pairs: Consider all pairs u, v with m[u] ≤ M/2 and m[v] > M/2. Define S(u,v) as the best score of a path from 0 to u together with a path from v to M, with no forbidden pairs. Is it sufficient to compute S(u,v) for all u, v? [Figure: nodes u and v on either side of M/2 on a mass axis from 0 to 400.]

  50. D.P. for forbidden pairs: Note that the best interpretation is given by $\max_{(u,v) \in E} S(u,v)$. [Figure: nodes u and v joined by an edge on a mass axis from 0 to 400.]
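
Below is a minimal sketch of a forbidden-pairs dynamic program over the spectral graph of slide 42. It is not necessarily the lecture's exact recurrence. Instead of splitting at M/2, it grows a path from 0 and a path from M and always extends the end that is closer to its own terminus, comparing m(u) with the mirrored value M + 18 - m(v), because the b- and y-readings of one peak sum to M + 18 under the slide's offsets. With this order, a new node can only clash with the current end of the other path, so one check per step suffices. Every peak node scores 1 (a stand-in for the score function), masses are nominal, the zero-mass edges are omitted, and all names are hypothetical.

```python
from functools import lru_cache

# Nominal residue masses (Da); L/I and K/Q share a nominal mass.
RESIDUES = {57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
            113: "L/I", 114: "N", 115: "D", 128: "K/Q", 129: "E", 131: "M",
            137: "H", 147: "F", 156: "R", 163: "Y", 186: "W"}


def forbidden_pairs_denovo(peaks, parent_mass):
    """Best-scoring 0 -> M path that never uses both readings of one peak.

    Returns (score, list of PRMs on the path), or None if no path exists.
    parent_mass is the total residue mass M; each peak node scores 1.
    """
    M = parent_mass
    mirror = M + 18  # b-PRM + y-PRM of the same peak always equals M + 18
    interior = []
    for k, p in enumerate(peaks):
        for prm in (p - 1, M - p + 19):  # b-ion reading, y-ion reading
            if 0 < prm < M:
                interior.append((prm, k))
    # Endpoints 0 and M get peak id -1, so they have no forbidden partner.
    nodes = [(0, -1)] + sorted(interior) + [(M, -1)]
    mass = [m for m, _ in nodes]
    peak = [k for _, k in nodes]
    score = [0] + [1] * len(interior) + [0]

    def edge(i, j):
        return mass[j] - mass[i] in RESIDUES

    def forbidden(i, j):
        return peak[i] >= 0 and peak[i] == peak[j]

    @lru_cache(maxsize=None)
    def solve(i, j):
        """Left path ends at node i, right path starts at node j (i < j).
        Returns (extra score, left nodes to add, right nodes to add) or None."""
        best = (0, (), ()) if edge(i, j) else None  # join the two paths now
        if mass[i] <= mirror - mass[j]:
            # Extend the left path; a new node can only clash with node j.
            for k in range(i + 1, j):
                if edge(i, k) and not forbidden(k, j):
                    sub = solve(k, j)
                    if sub is not None and (best is None or score[k] + sub[0] > best[0]):
                        best = (score[k] + sub[0], (k,) + sub[1], sub[2])
        else:
            # Extend the right path; a new node can only clash with node i.
            for k in range(i + 1, j):
                if edge(k, j) and not forbidden(k, i):
                    sub = solve(i, k)
                    if sub is not None and (best is None or score[k] + sub[0] > best[0]):
                        best = (score[k] + sub[0], sub[1], sub[2] + (k,))
        return best

    result = solve(0, len(nodes) - 1)
    if result is None:
        return None
    path = [0, *result[1], *result[2], len(nodes) - 1]
    return result[0], [mass[i] for i in path]


print(forbidden_pairs_denovo([88, 145, 147, 276], 401))
# (3, [0, 87, 144, 273, 401])
```

On the slide 40 peaks this returns the ladder 0, 87, 144, 273, 401 (S, G, E, K/Q) with three peaks explained, preferring it over the shorter S-W-K/Q reading. The memoized states are node pairs, so the running time is roughly cubic in the number of PRMs.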
