
Python Implementation of PAM250 Matrix Construction
Learn how to replicate the PAM250 matrix used in scoring sequence alignments through Python code. Explore the steps taken to create this matrix accurately, along with solving asymmetry issues in a DNA example.
RECREATING THE PAM250 MATRIX
MIRIAM BERN
THE PROBLEM
- The PAM250 matrix is used to score sequence alignments
- It provides a more accurate understanding of which amino acids will be conserved and which mutations will occur
- Is it easy to replicate?
- Anna's DNA example: why was it not symmetric?
HOW DID I SOLVE IT?
- Turned the steps of the Dayhoff paper into Python code
- Wrote functions to compute the PAM250 matrix from the Accepted Point Mutations Matrix
- Used the same steps to replicate Anna's DNA example and see what went wrong
DATA
- Started with the Accepted Point Mutations Matrix from the paper, saved as a text file
- Made Python dictionaries from the relative mutabilities and frequencies listed in the paper
- Used the equations in the paper to construct the matrices
- Checked the program output against the matrices in the paper
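As a rough illustration of the data-loading step, the sketch below reads a lower-triangular count table from text and mirrors it into a symmetric dictionary of dictionaries. The file layout (one label per row followed by the counts against the earlier labels), the function name `parse_lower_triangle`, and every number shown are assumptions made up for this example; they are not the values or the format used in the talk or the paper.

```python
def parse_lower_triangle(text):
    """Parse rows such as 'N 5 2' into a symmetric dict-of-dicts count matrix.

    Assumed layout (hypothetical): row i holds a label followed by i counts,
    one for each label that appeared on an earlier row (no diagonal entry).
    """
    labels, rows = [], []
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        labels.append(parts[0])
        rows.append([float(x) for x in parts[1:]])

    counts = {a: {b: 0.0 for b in labels} for a in labels}
    for i, row in enumerate(rows):
        if len(row) != i:
            raise ValueError(f"row for {labels[i]} should have {i} values")
        for j, value in enumerate(row):
            # Accepted point mutations are counted without direction,
            # so the same value is stored on both sides of the diagonal.
            counts[labels[i]][labels[j]] = value
            counts[labels[j]][labels[i]] = value
    return labels, counts


# Invented toy table with three residues, for illustration only.
toy_text = """
A
R 3
N 5 2
"""
labels, counts = parse_lower_triangle(toy_text)

# Placeholder dictionaries standing in for the paper's tables (made-up values).
relative_mutability = {"A": 1.0, "R": 0.5, "N": 2.0}
frequency = {"A": 0.5, "R": 0.3, "N": 0.2}

# Put the dictionary values in the same order as the matrix rows.
mut_vector = [relative_mutability[a] for a in labels]
freq_vector = [frequency[a] for a in labels]
```

Keeping the labels list next to the matrix makes it easy to line up the dictionaries with the matrix rows before doing any arithmetic.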
HIGH LEVEL STEPS
Step 1: Calculate the proportionality constant (lambda)
Step 2: Using the Accepted Point Mutations Matrix, the relative mutabilities, and lambda, create the Mutation Probability Matrix for 1 PAM
Step 3: Using the Python package NumPy, raise the 1 PAM matrix to the 250th power (repeated multiplication by itself) to get the Mutation Probability Matrix for 250 PAMs
Step 4: Divide each entry by the frequency of the corresponding amino acid to get the Relatedness Odds Matrix
Step 5: Take the log of these values to get the Log Odds Matrix
Step 6: Reorder the Log Odds Matrix to match the amino acid order in the paper and keep only half of it (the matrix is symmetric, so one triangle is enough)
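The six steps can be sketched in NumPy roughly as follows. This is not the code from the talk: the function names `pam_matrices` and `reorder_and_halve`, the four-letter toy alphabet, and all numbers are invented for illustration. The sketch also assumes that lambda is chosen so that 1 PAM leaves 99% of positions unchanged, that column j of the mutation matrix describes what residue j turns into, and that the log odds are reported as 10 × log10 (the slides only say "take the log").

```python
import numpy as np


def pam_matrices(A, m, f, n_pam=250):
    """Return (M1, Mn, R, S) from symmetric counts A, mutabilities m, frequencies f."""
    A = np.asarray(A, dtype=float)  # accepted point mutations, zero diagonal
    m = np.asarray(m, dtype=float)  # relative mutabilities
    f = np.asarray(f, dtype=float)  # frequencies, summing to 1

    # Step 1: lambda scales the mutabilities so that 1% of positions change.
    lam = 0.01 / np.sum(f * m)

    # Step 2: M1[i, j] = probability that residue j is replaced by i in 1 PAM.
    # Broadcasting scales column j by m[j] / (total changes involving j).
    M1 = lam * m * A / A.sum(axis=0)
    np.fill_diagonal(M1, 1.0 - lam * m)

    # Step 3: the 250 PAM matrix is the 1 PAM matrix raised to the 250th power.
    Mn = np.linalg.matrix_power(M1, n_pam)

    # Step 4: relatedness odds, dividing row i by the frequency of residue i.
    R = Mn / f[:, None]

    # Step 5: log odds (assumed convention: 10 * log10).
    S = 10.0 * np.log10(R)
    return M1, Mn, R, S


def reorder_and_halve(S, labels, target_order):
    """Step 6: permute rows and columns, round, and keep the lower triangle."""
    idx = [labels.index(a) for a in target_order]
    return np.tril(np.rint(S[np.ix_(idx, idx)]))


# Invented toy data on a four-letter alphabet.
labels = ["A", "C", "G", "T"]
A = np.array([[0, 2, 6, 1],
              [2, 0, 1, 5],
              [6, 1, 0, 2],
              [1, 5, 2, 0]], dtype=float)
f = np.array([0.3, 0.2, 0.2, 0.3])
m = A.sum(axis=0) / f  # mutabilities chosen to agree with the counts

M1, Mn, R, S = pam_matrices(A, m, f)
half = reorder_and_halve(S, labels, ["T", "G", "C", "A"])
print(half)
```

`np.linalg.matrix_power` does the repeated multiplication in one call; a loop of 249 `@` products would give the same matrix up to floating-point rounding.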
RESULTS: PAM250
- I got the same answers! (With a few rounding errors)
RESULTS: CLASS EXAMPLE
- Question: Why was the class example asymmetric?
- No proportionality constant was used in the DNA example
- The definitions of frequency and relative mutability make it unlikely that nucleotides with different mutabilities would appear the same number of times
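One way to see how frequencies and mutabilities that do not fit together can break the symmetry is the small experiment below. It is a sketch with invented numbers, not Anna's class example: the helper `relatedness_odds`, the count matrix, and both mutability vectors are assumptions made for this illustration. With equal frequencies, the odds matrix should come out symmetric when the mutabilities are derived from the counts, and asymmetric when they are picked independently.

```python
import numpy as np


def relatedness_odds(A, m, f, n_pam=250):
    """Relatedness odds matrix after n_pam steps (same assumptions as Dayhoff-style PAM)."""
    A, m, f = (np.asarray(x, dtype=float) for x in (A, m, f))
    lam = 0.01 / np.sum(f * m)           # 1% change per PAM
    M = lam * m * A / A.sum(axis=0)      # column j: what residue j turns into
    np.fill_diagonal(M, 1.0 - lam * m)
    return np.linalg.matrix_power(M, n_pam) / f[:, None]


# Invented symmetric counts for four nucleotides, all equally frequent.
A = np.array([[0, 2, 6, 1],
              [2, 0, 1, 5],
              [6, 1, 0, 2],
              [1, 5, 2, 0]], dtype=float)
f_equal = np.full(4, 0.25)

# Mutabilities that agree with the counts: changes per unit of frequency.
m_consistent = A.sum(axis=0) / f_equal

# Mutabilities chosen by hand, unrelated to the counts (made-up values).
m_arbitrary = np.array([100.0, 50.0, 80.0, 20.0])

R_ok = relatedness_odds(A, m_consistent, f_equal)
R_bad = relatedness_odds(A, m_arbitrary, f_equal)

print("consistent mutabilities symmetric:", np.allclose(R_ok, R_ok.T))
print("arbitrary mutabilities symmetric: ", np.allclose(R_bad, R_bad.T))
```

The symmetry depends on mutability × frequency being proportional to each residue's total number of observed changes; when that holds for the 1 PAM matrix it carries through to every power of it.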