
Molecular Genetics and Machine Learning in AI for Medicine
Explore the intersection of molecular genetics and machine learning in the context of AI for Medicine. Learn about the limitations of consensus sequences and the role of perceptrons in DNA and RNA sequence analysis. Discover how perceptrons work in classifying genetic data based on linear separability.
AI for Medicine, Lecture 6: Molecular Genetics and Machine Learning. February 3, 2021. Mohammad Hammoud, Carnegie Mellon University in Qatar.
Today
Last Wednesday's session: molecular genetics and sequence alignment
Today's session: molecular genetics and machine learning
Announcements: Assignment 1 is due tomorrow (Feb 4) by midnight. Quiz 1 is on Feb 15 during class time (all material included).
Limitation of Consensus Sequences: Similarity Does Not Necessarily Entail Functionality
As discussed last lecture, while it is easy to find a consensus sequence that identifies existing binding sites, it is not easy to find one that is optimal for predicting the occurrence of new sites. Stormo, Gary D., et al. [3] discovered that some published binding sites identified by consensus sequences did not function as translation initiation sites in mRNA of E. coli. This led to the hypothesis that there could be features (beyond only similarity between sequences) that can serve to distinguish true ribosome binding sites from non-sites. In an attempt to learn these features and distinguish between true-sites and non-sites, Stormo, Gary D., et al. used an ML algorithm called the perceptron!
Perceptrons
A perceptron is a linear binary classifier. Given an input such as a DNA or RNA sequence, it outputs one of two (hence, binary) classes (thus, classification): true-site or non-site. The same scheme applies to other domains; given an email as input, a perceptron can output spam or not spam. What does linear mean then?
Perceptrons
The way a perceptron works is by learning a hyperplane that clearly separates examples into two classes; it divides the space by a hyperplane into two half-spaces. This entails that the space has to be linearly separable (otherwise, the perceptron will not converge, i.e., will not be able to correctly classify all examples). If the space is not linearly separable, the perceptron will not be effective. If it is linearly separable, there may be many valid hyperplanes; the perceptron will converge to one of them and classify all examples correctly. But how do we represent the given examples in a computer program like a perceptron?
From Abstraction to Representation
Consider again the problem of detecting whether an email is spam or not. The two output classes can be represented as integers: spam as +1 and not spam as -1. An email can be represented as a vector x = [x1, x2, ..., xd], with each component xi corresponding to the presence (xi = 1) or absence (xi = 0) of a particular word in the email.
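This representation can be sketched in a few lines of code. The 5-word vocabulary below mirrors the toy dataset used later in the lecture; the function name and vocabulary choice are illustrative, not from the original slides.

```python
# A minimal sketch of the bag-of-words representation described above.
VOCAB = ["and", "vaccine", "the", "of", "nigeria"]

def email_to_vector(text, vocab=VOCAB):
    """Map an email to a 0/1 vector: 1 if the word occurs, 0 otherwise."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocab]

print(email_to_vector("claim the vaccine of nigeria and win"))
# -> [1, 1, 1, 1, 1]
```

Note that the vector records only presence or absence of each word, not how many times it occurs.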
What Comes After Representation?
In our pipeline, abstraction leads to representation, which leads to learning: first training, then inferring.
From Representation to Training
Let us see how we can train a perceptron to recognize spam emails. The training set consists of pairs (x, y), where x is a vector of 0s and 1s representing an email and y is +1 or -1 indicating whether the email is spam or not. So the email is treated as a set of words, represented as a vector of 0s and 1s, where 0 indicates the absence of a certain word and 1 indicates its presence in the email. Words in this case are referred to as features, and y is treated as the label of x.
From Representation to Training
Here is a simplified example of a training set of six emails, assuming only 5 possible words (the features) in any email. The label y is spam (+1) or not spam (-1):

           and  vaccine  the  of  nigeria    y
Email a     1      1      0    1     1      +1
Email b     0      0      1    1     0      -1
Email c     0      1      1    0     0      +1
Email d     1      0      0    1     0      -1
Email e     1      0      1    0     1      +1
Email f     1      0      1    1     0      -1

For instance, Email a = (x, y) = ([1, 1, 0, 1, 1], +1) and Email d = (x, y) = ([1, 0, 0, 1, 0], -1).
Training
With that, we can train a perceptron using this training set and, once done, examine any future email and infer whether it is spam or not. For this sake, we need to associate a weight, wi, with each xi in any input feature vector x = [x1, x2, ..., xn] (hence, we can define w = [w1, w2, ..., wn]) and a threshold θ such that the output is:
+1 (or spam) if w1·x1 + w2·x2 + ... + wn·xn > θ
-1 (or not spam) if w1·x1 + w2·x2 + ... + wn·xn < θ
The special case where the sum is exactly θ will be regarded as a misclassification.
Training
In short, we simply need to learn w = [w1, w2, ..., wn] based on the given training set. Once we learn w, we can multiply it by any new x = [x1, x2, ..., xn] representing a new email and infer whether it is spam or not based on the output value:
If the output value is greater than θ, the email is spam.
If the output value is less than θ, the email is not spam.
But how do we learn w?
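The decision rule can be written directly; a minimal sketch, assuming θ defaults to 0 and returning 0 for the ambiguous case where the dot product equals θ exactly (the case the slides regard as a misclassification):

```python
def predict(w, x, theta=0.0):
    """Perceptron decision rule: +1 (spam) if w.x > theta,
    -1 (not spam) if w.x < theta, 0 for the boundary case w.x == theta."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > theta else (-1 if s < theta else 0)

# Using the weight vector this lecture eventually learns:
print(predict([0, 1, 0, -0.5, 0.5], [1, 1, 0, 1, 1]))  # -> 1 (spam)
```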
Training
Perceptron algorithm:
1. Assume θ is 0 and initialize the weight vector, w, to all 0s.
2. Pick a learning rate, η, that is a small, positive real number.
   Note: the choice of η affects the convergence of the perceptron. If η is too small, convergence is slow; if it is too big, the decision boundary will dance around and again will converge slowly, if at all.
3. Consider each training example t = (x, y) in turn:
   a. Let y' = w.x.
   b. If y and y' have the same sign, do nothing; t is properly classified.
   c. If y and y' have different signs, or y' = 0, replace w by w + η·y·x. That is, adjust w slightly in the direction of x.
Training
When should the perceptron algorithm stop? Ideally, you want it to stop when it converges (i.e., when it has learned enough to render quite accurate predictions during inference). We can repeat step 3 of the perceptron algorithm and:
a. Terminate after a fixed number of rounds.
b. Or, terminate when the number of misclassified training examples stops changing.
c. Or, withhold a test set from the training data and, after each round, run the perceptron on the test data; terminate the algorithm when the number of errors on the test set stops changing.
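Steps 1-3 plus stopping rule (a) can be sketched as a short function; `train_perceptron` and its parameter names are illustrative, not from the original slides.

```python
def train_perceptron(examples, eta=0.5, max_rounds=100):
    """Train a perceptron with theta = 0 and w initialized to all 0s.
    An example counts as misclassified when w.x has the wrong sign or is 0."""
    w = [0.0] * len(examples[0][0])
    for _ in range(max_rounds):          # stopping rule (a): fixed rounds
        updated = False
        for x, y in examples:            # step 3: one pass over the data
            y_prime = sum(wi * xi for wi, xi in zip(w, x))
            if y * y_prime <= 0:         # different signs, or y' = 0
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:                  # every example classified: converged
            break
    return w
```

Stopping early once a full pass makes no updates implements convergence detection on top of the fixed-round cap.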
Training: A Concrete Example
Let us apply the perceptron algorithm to our spam email recognition problem, assuming η = 0.5 and starting with w = [0, 0, 0, 0, 0]. Processing the training emails in order (w is updated whenever y' = w.x is 0 or has a different sign than y):

Email a: x = [1,1,0,1,1], y = +1. y' = [0,0,0,0,0].[1,1,0,1,1] = 0, so update: w = [0,0,0,0,0] + 0.5·1·[1,1,0,1,1] = [0.5,0.5,0,0.5,0.5].
Email b: x = [0,0,1,1,0], y = -1. y' = [0.5,0.5,0,0.5,0.5].[0,0,1,1,0] = 0.5, signs differ, so update: w = [0.5,0.5,0,0.5,0.5] + 0.5·(-1)·[0,0,1,1,0] = [0.5,0.5,-0.5,0,0.5].
Email c: x = [0,1,1,0,0], y = +1. y' = [0.5,0.5,-0.5,0,0.5].[0,1,1,0,0] = 0, so update: w = [0.5,0.5,-0.5,0,0.5] + 0.5·1·[0,1,1,0,0] = [0.5,1,0,0,0.5].
Email d: x = [1,0,0,1,0], y = -1. y' = [0.5,1,0,0,0.5].[1,0,0,1,0] = 0.5, signs differ, so update: w = [0.5,1,0,0,0.5] + 0.5·(-1)·[1,0,0,1,0] = [0,1,0,-0.5,0.5].
Email e: x = [1,0,1,0,1], y = +1. y' = [0,1,0,-0.5,0.5].[1,0,1,0,1] = 0.5, same sign, so no update: w stays [0,1,0,-0.5,0.5].
Email f: x = [1,0,1,1,0], y = -1. y' = [0,1,0,-0.5,0.5].[1,0,1,1,0] = -0.5, same sign, so no update: w stays [0,1,0,-0.5,0.5].
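This hand trace can be reproduced in a few lines of code; a minimal sketch, assuming θ = 0 and updates whenever w.x is 0 or has the wrong sign:

```python
# Reproducing the worked example: eta = 0.5, theta = 0, w starts at zeros.
emails = [                      # (x, y) for emails a-f
    ([1, 1, 0, 1, 1], +1),      # a
    ([0, 0, 1, 1, 0], -1),      # b
    ([0, 1, 1, 0, 0], +1),      # c
    ([1, 0, 0, 1, 0], -1),      # d
    ([1, 0, 1, 0, 1], +1),      # e
    ([1, 0, 1, 1, 0], -1),      # f
]
eta, w = 0.5, [0.0] * 5
changed = True
while changed:                  # repeat step 3 until a full clean pass
    changed = False
    for x, y in emails:
        y_prime = sum(wi * xi for wi, xi in zip(w, x))
        if y * y_prime <= 0:    # wrong sign, or exactly 0
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
            changed = True
print(w)  # -> [0.0, 1.0, 0.0, -0.5, 0.5]
```

A second pass over the data makes no updates, so the loop terminates with the same w obtained by hand.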
What Comes After Training?
In our pipeline, training is followed by the final stage of learning: inferring.
Inference
If we check this learned w = [0,1,0,-0.5,0.5], we realize that it correctly classifies all the given emails, and potentially new emails such as Email g below:

           and  vaccine  the  of  nigeria    y    y' = w.x
Email a     1      1      0    1     1      +1    [0,1,0,-0.5,0.5].[1,1,0,1,1] = +1 > 0: Spam
Email b     0      0      1    1     0      -1    [0,1,0,-0.5,0.5].[0,0,1,1,0] = -0.5 < 0: Not spam
Email c     0      1      1    0     0      +1    [0,1,0,-0.5,0.5].[0,1,1,0,0] = +1 > 0: Spam
Email d     1      0      0    1     0      -1    [0,1,0,-0.5,0.5].[1,0,0,1,0] = -0.5 < 0: Not spam
Email e     1      0      1    0     1      +1    [0,1,0,-0.5,0.5].[1,0,1,0,1] = +0.5 > 0: Spam
Email f     1      0      1    1     0      -1    [0,1,0,-0.5,0.5].[1,0,1,1,0] = -0.5 < 0: Not spam
Email g     0      1      0    0     1       ?    [0,1,0,-0.5,0.5].[0,1,0,0,1] = +1.5 > 0: Spam

This makes a certain amount of sense, as the learned w considers vaccine and nigeria indicative of spam!
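Inference with the learned w reduces to one dot product per email; a minimal sketch (the label strings are illustrative):

```python
w = [0, 1, 0, -0.5, 0.5]              # weights obtained from training

def classify(x, theta=0):
    """Label a new email vector using the learned weights."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return "Spam" if score > theta else "Not spam"

print(classify([0, 1, 0, 0, 1]))      # Email g: w.x = 1.5 -> Spam
```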
Back to Where We Started
While it is easy to find a consensus sequence that identifies existing binding sites, it is not easy to find one that is optimal for predicting the occurrence of new sites. Stormo, Gary D., et al. [3] discovered that some published binding sites identified by consensus sequences did not function as translation initiation sites in mRNA of E. coli. This led to the hypothesis that there could be features (beyond only similarity between sequences) that can serve to distinguish true ribosome binding sites from non-sites. In an attempt to learn these features and distinguish between true-sites and non-sites, Stormo, Gary D., et al. used an ML algorithm called the perceptron!
From Abstraction to Representation
To this end, they used a training dataset that contains 78,612 bases of transcribed RNA on which reside (at least) 124 genes. The first question was: how to represent any given sequence (say, a seven-base sequence ACGGTAC)? They used a matrix of 4 x N elements, where N is the length of the sequence, with 0s and 1s indicating the absence or presence of a base at each position. For ACGGTAC:

      1  2  3  4  5  6  7
A     1  0  0  0  0  1  0
C     0  1  0  0  0  0  1
G     0  0  1  1  0  0  0
T     0  0  0  0  1  0  0
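Building this 4 x N presence/absence matrix is mechanical; a minimal sketch (the function name is illustrative):

```python
BASES = "ACGT"                               # one row per nucleotide

def sequence_to_matrix(seq):
    """Return a 4 x len(seq) matrix of 0s and 1s, one-hot per column."""
    return [[1 if ch == base else 0 for ch in seq] for base in BASES]

for base, row in zip(BASES, sequence_to_matrix("ACGGTAC")):
    print(base, row)
# A [1, 0, 0, 0, 0, 1, 0]
# C [0, 1, 0, 0, 0, 0, 1]
# G [0, 0, 1, 1, 0, 0, 0]
# T [0, 0, 0, 0, 1, 0, 0]
```

Each column contains exactly one 1, marking which base occupies that position.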
From Representation to Training
They then trained a perceptron using the given training set and, once done, examined any new sequence and inferred whether it is a true-site or a non-site. For this sake, they needed to associate a weight, wij, with each xij in any input feature matrix x (hence, they defined a matrix w) and a threshold θ such that the (simplified) output is:
+1 if a defined score over w.x > θ
-1 if a defined score over w.x < θ
The special case where the score is exactly θ will be regarded as a misclassification.
Recall: Scores Over Position Weight Matrices
A score over a weight matrix representation can be calculated as follows (this matrix was presented in the previous lecture for Pribnow sequences; it has one row for each nucleotide and one column for each position, and it plays the role of the result of w.x):

       1    2    3    4    5    6
A    -38   19    1   12   10  -48
C    -15  -38   -8  -10   -3  -32
G    -13  -48   -6   -7  -10  -48
T     17  -32    8   -9   -6   19

Score of TATAAT = 17 + 19 + 8 + 12 + 10 + 19 = 85
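The scoring rule above can be sketched as a short lookup-and-sum; the dictionary layout is an illustrative encoding of the matrix shown in the slide:

```python
# Position weight matrix for Pribnow sequences (rows: A, C, G, T).
PWM = {
    "A": [-38,  19,   1,  12,  10, -48],
    "C": [-15, -38,  -8, -10,  -3, -32],
    "G": [-13, -48,  -6,  -7, -10, -48],
    "T": [ 17, -32,   8,  -9,  -6,  19],
}

def score(seq):
    """Sum the matrix entry for the observed base at each position."""
    return sum(PWM[base][i] for i, base in enumerate(seq))

print(score("TATAAT"))  # -> 85, matching the hand computation above
```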
From Training to Inference
In short, a matrix w can be learned from a given training set using the perceptron algorithm (in the same way we did for spam emails). Once learned, w can be multiplied by any new sequence, x, represented as a matrix of 0s and 1s, after which we can infer whether x is a true-site or a non-site based on the output score:
If the output score is greater than θ, the site is a true-site.
If the output score is less than θ, the site is a non-site.
Next Wednesday's Lecture
Perceptrons exhibit several limitations, which will be discussed next Wednesday. These limitations serve as a motivation for a better learning algorithm known as the Support-Vector Machine (SVM), which we will discuss next Wednesday as well.
References
[1] Rajaraman, Anand, and Jeffrey David Ullman. Mining of Massive Datasets. Cambridge University Press, 2011.
[2] Stormo, Gary D. "DNA binding sites: representation and discovery." Bioinformatics 16.1 (2000): 16-23.
[3] Stormo, Gary D., et al. "Use of the 'Perceptron' algorithm to distinguish translational initiation sites in E. coli." Nucleic Acids Research 10.9 (1982): 2997-3011.
[4] de Smit, Maarten H., and J. van Duin. "Secondary structure of the ribosome binding site determines translational efficiency: a quantitative analysis." Proceedings of the National Academy of Sciences 87.19 (1990): 7668-7672.