Probabilistic Models for Sequence Data: Foundations of Algorithms and Machine Learning
This content covers topics such as Markov models, state transition matrices, Maximum Likelihood Estimation for Markov Chains, and Hidden Markov Models. It explains the concepts with examples and visuals, focusing on applications in various fields like NLP, weather forecasting, and stock market analysis.
Presentation Transcript
Probabilistic Models for Sequence Data
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya
Random Sequence Data
$x_1, x_2, \dots, x_T$, where the subscript denotes steps in time.
Many applications: NLP, arrival data, weather, stocks, biology.
Naive models: complete independence is too simple; complete pairwise dependence needs too many parameters.
Markov Model
$p(x_{1:T}) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_T \mid x_{T-1})$
Graphical model: $x_1 \rightarrow x_2 \rightarrow \cdots \rightarrow x_{T-1} \rightarrow x_T$
Markov (conditional independence) assumption: $x_{t+1} \perp x_{1:t-1} \mid x_t$ for all $t$, i.e. the future is conditionally independent of the past given the present.
Still too many parameters.
Homogeneous / stationary: $p(x_{t+1} \mid x_t)$ is the same for all $t$.
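As a concrete illustration of the homogeneous Markov assumption, here is a minimal sampling sketch, not from the original slides; the two-state example and the name `sample_chain` are made up for illustration.

```python
import numpy as np

def sample_chain(pi, A, T, rng=None):
    """Sample a length-T state sequence from a homogeneous Markov chain.

    pi : (K,) initial distribution p(x_1)
    A  : (K, K) transition matrix, A[i, j] = p(x_{t+1} = j | x_t = i)
    """
    rng = rng or np.random.default_rng(0)
    K = len(pi)
    x = np.empty(T, dtype=int)
    x[0] = rng.choice(K, p=pi)
    for t in range(1, T):
        # Markov assumption: only the previous state matters.
        x[t] = rng.choice(K, p=A[x[t - 1]])
    return x

# Hypothetical two-state chain (e.g. 0 = rainy, 1 = sunny).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
print(sample_chain(pi, A, T=10))
```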
State Transition Matrix
For discrete $x_t$: state transition matrix $A$, with $A_{ij} = p(x_{t+1} = j \mid x_t = i)$ (a stochastic matrix: each row sums to 1).
n-step matrix $A(n)$: $A_{ij}(n) = p(x_{t+n} = j \mid x_t = i)$.
Chapman-Kolmogorov equation: $A_{ij}(m + n) = \sum_k A_{ik}(m)\, A_{kj}(n)$, i.e. $A(m + n) = A(m)\, A(n)$, so $A(n) = A^n$.
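A small numerical check, using a made-up two-state transition matrix, that the n-step matrix is the matrix power $A^n$ and that Chapman-Kolmogorov holds:

```python
import numpy as np

A = np.array([[0.7, 0.3],   # A[i, j] = p(x_{t+1} = j | x_t = i)
              [0.2, 0.8]])

def n_step(A, n):
    """n-step transition matrix A(n) = A^n."""
    return np.linalg.matrix_power(A, n)

# Chapman-Kolmogorov: A(m + n) = A(m) A(n)
m, n = 3, 5
assert np.allclose(n_step(A, m + n), n_step(A, m) @ n_step(A, n))

# Each row of A(n) is still a probability distribution (stochastic matrix).
assert np.allclose(n_step(A, 8).sum(axis=1), 1.0)
print(n_step(A, 8))
```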
MLE for a Markov Chain
Data: multiple sequences $D = \{x^1, \dots, x^N\}$, where $x^i = (x_{i1}, \dots, x_{iT_i})$.
Formulate the log-likelihood:
$\log p(D) = \sum_i \log \Big[ p(x_{i1}) \prod_{t=2}^{T_i} p(x_{it} \mid x_{i,t-1}) \Big] = \sum_k N_k^1 \log \pi_k + \sum_j \sum_k N_{jk} \log A_{jk}$
where $N_k^1$ counts the sequences that start in state $k$ and $N_{jk}$ counts the transitions from state $j$ to state $k$.
Maximize with respect to the parameters:
$\hat{\pi}_k = \frac{N_k^1}{\sum_{k'} N_{k'}^1}, \qquad \hat{A}_{jk} = \frac{N_{jk}}{\sum_{k'} N_{jk'}}$
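A minimal sketch of these count-based estimates, assuming discrete states encoded as integers; the three short sequences are made up for illustration.

```python
import numpy as np

def markov_mle(sequences, K):
    """MLE of the initial distribution pi and transition matrix A from
    sequences of integer states in {0, ..., K-1}."""
    N1 = np.zeros(K)       # N1[k]: number of sequences starting in state k
    N = np.zeros((K, K))   # N[j, k]: number of transitions j -> k
    for x in sequences:
        N1[x[0]] += 1
        for t in range(1, len(x)):
            N[x[t - 1], x[t]] += 1
    pi_hat = N1 / N1.sum()
    A_hat = N / N.sum(axis=1, keepdims=True)   # normalize each row
    return pi_hat, A_hat

sequences = [[0, 1, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1, 1]]
pi_hat, A_hat = markov_mle(sequences, K=2)
print(pi_hat)
print(A_hat)
```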
Digression: Stationary Distribution
Does a Markov chain ever stabilize, i.e. $\pi_{t+1} = \pi_t$?
Only if $A$ satisfies particular properties: ergodic Markov chains.
Without ergodic Markov chains our daily lives would come to a standstill.
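A brief illustration with a toy two-state chain (made up for this note): repeatedly applying $A$ to any starting distribution converges to a fixed point $\pi = \pi A$ when the chain is ergodic.

```python
import numpy as np

A = np.array([[0.7, 0.3],
              [0.2, 0.8]])

# Power iteration: for an ergodic chain, pi A^t converges as t grows.
pi = np.array([1.0, 0.0])          # arbitrary starting distribution
for _ in range(1000):
    pi = pi @ A
print(pi)                          # approximately [0.4, 0.6]
assert np.allclose(pi, pi @ A)     # numerically a fixed point of A
```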
Hidden Markov Model
Sequence over discrete states. States generate emissions. Emissions are observed, but states are hidden.
Motivations
States have physical meaning: speech recognition, speaker diarization, part-of-speech tagging.
Long-range dependencies in a Markov model with a few parameters.
Hidden Markov Model
Conditional independence assumptions: $z_{t+1} \perp z_{1:t-1} \mid z_t$, and $x_t \perp x_{1:t-1}, z_{1:t-1} \mid z_t$.
Likelihood: $p(x, z) = p(z_1)\, p(x_1 \mid z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1})\, p(x_t \mid z_t)$
Hidden Markov Model: Generative Process
1. Sample an initial state $z_1 \sim p(z_1)$
2. Sample an initial output from the initial state: $x_1 \sim p(x_1 \mid z_1)$
3. Repeat for N steps:
4. Sample the current state based on the previous state: $z_t \sim p(z_t \mid z_{t-1})$
5. Sample the current output based on the current state: $x_t \sim p(x_t \mid z_t)$
Compare with i.i.d. generative mixture models.
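A minimal sketch of this generative process for an HMM with discrete emissions; the parameters and the name `sample_hmm` are made up for illustration.

```python
import numpy as np

def sample_hmm(pi, A, B, T, rng=None):
    """Sample hidden states z and emissions x from a discrete-emission HMM.

    pi : (K,)   initial state distribution p(z_1)
    A  : (K, K) transitions, A[i, j] = p(z_{t+1} = j | z_t = i)
    B  : (K, V) emissions,   B[k, v] = p(x_t = v | z_t = k)
    """
    rng = rng or np.random.default_rng(0)
    K, V = B.shape
    z = np.empty(T, dtype=int)
    x = np.empty(T, dtype=int)
    z[0] = rng.choice(K, p=pi)                # 1. sample initial state
    x[0] = rng.choice(V, p=B[z[0]])           # 2. sample initial output
    for t in range(1, T):                     # 3. repeat
        z[t] = rng.choice(K, p=A[z[t - 1]])   # 4. state from previous state
        x[t] = rng.choice(V, p=B[z[t]])       # 5. output from current state
    return z, x

pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
z, x = sample_hmm(pi, A, B, T=10)
print(z)
print(x)
```

Unlike an i.i.d. mixture model, consecutive states here are coupled through $A$, so the emissions are correlated over time.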
Inference Problems
Filtering: $p(z_t \mid x_{1:t})$
Smoothing: $p(z_{t-\ell} \mid x_{1:t})$ for $\ell = 1, 2, \dots$
Prediction: $p(z_{t+h} \mid x_{1:t})$ for $h = 1, 2, \dots$
MAP estimation / Viterbi decoding: $\arg\max_{z_{1:T}} p(z_{1:T} \mid x_{1:T})$
Inference: Challenge
Naive summation does not work:
$p(z_t \mid x_{1:T}) = \dfrac{\sum_{z_1, \dots, z_{t-1}, z_{t+1}, \dots, z_T} p(z_1, \dots, z_T, x_1, \dots, x_T)}{\sum_{z_1, \dots, z_T} p(z_1, \dots, z_T, x_1, \dots, x_T)}$
What is the cost? The sums have exponentially many terms (order $K^T$ for $K$ state values).
Solution: identify and reuse shared computations.
Forward Algorithm
Define $\alpha_t(j) \triangleq p(z_t = j \mid x_{1:t})$
$\alpha_t(j) \propto p(z_t = j, x_t \mid x_{1:t-1}) = p(x_t \mid z_t = j)\, p(z_t = j \mid x_{1:t-1})$ (Derivation? Normalization?)
$= p(x_t \mid z_t = j) \sum_i p(z_t = j \mid z_{t-1} = i)\, p(z_{t-1} = i \mid x_{1:t-1}) = p(x_t \mid z_t = j) \sum_i A(i, j)\, \alpha_{t-1}(i)$
Simple recursive algorithm. Base case: $\alpha_1(j) \propto p(x_1 \mid z_1 = j)\, \pi_j$.
Complexity $O(K^2 T)$, where $K$ is the number of state values. Compare with i.i.d. models.
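A sketch of the normalized forward (filtering) recursion for discrete emissions, following the slide's definition of $\alpha_t(j)$ as $p(z_t = j \mid x_{1:t})$; the toy parameters are made up for illustration.

```python
import numpy as np

def forward(pi, A, B, x):
    """Normalized forward pass: alpha[t, j] = p(z_t = j | x_{1:t}).

    Also accumulates log p(x_{1:T}) from the per-step normalizers.
    """
    T, K = len(x), len(pi)
    alpha = np.zeros((T, K))
    loglik = 0.0
    a = B[:, x[0]] * pi                    # base case: p(x_1 | z_1 = j) * pi_j
    loglik += np.log(a.sum())
    alpha[0] = a / a.sum()
    for t in range(1, T):
        # p(x_t | z_t = j) * sum_i A[i, j] * alpha_{t-1}(i), then normalize
        a = B[:, x[t]] * (alpha[t - 1] @ A)
        loglik += np.log(a.sum())
        alpha[t] = a / a.sum()
    return alpha, loglik                   # O(K^2 T) overall

pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
alpha, loglik = forward(pi, A, B, x=[0, 0, 1, 0, 1])
print(alpha)
print(loglik)
```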
Backward Algorithm
Define $\beta_t(i) \triangleq p(x_{t+1:T} \mid z_t = i)$
$= \sum_j p(z_{t+1} = j, x_{t+1}, x_{t+2:T} \mid z_t = i)$
$= \sum_j p(x_{t+2:T} \mid z_{t+1} = j)\, p(z_{t+1} = j, x_{t+1} \mid z_t = i)$
$= \sum_j p(x_{t+2:T} \mid z_{t+1} = j)\, p(x_{t+1} \mid z_{t+1} = j)\, p(z_{t+1} = j \mid z_t = i)$
$= \sum_j \beta_{t+1}(j)\, p(x_{t+1} \mid z_{t+1} = j)\, A(i, j)$
Simple recursive algorithm. Base case?
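A matching sketch of the backward recursion, taking $\beta_T(i) = 1$ as the base case (one standard convention); the toy parameters are again made up.

```python
import numpy as np

def backward(A, B, x):
    """Backward pass: beta[t, i] = p(x_{t+1:T} | z_t = i), unnormalized."""
    T, K = len(x), A.shape[0]
    beta = np.ones((T, K))                 # base case: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j A[i, j] * p(x_{t+1} | z_{t+1} = j) * beta_{t+1}(j)
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1])
    return beta

A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
print(backward(A, B, x=[0, 0, 1, 0, 1]))
```

For long sequences the unnormalized $\beta$ underflows, so in practice it is rescaled with the same per-step normalizers as the forward pass, as in the next sketch.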
Forward Backward Algorithm
Define $\gamma_t(j) = p(z_t = j \mid x_{1:T}) \propto \alpha_t(j)\, \beta_t(j)$
Overall algorithm: compute the forward pass, compute the backward pass, combine to get $\gamma_t(j)$.
Sum-product algorithm; belief propagation / message passing algorithm.
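Combining the two passes into the smoothed marginals $\gamma_t(j)$, written as one self-contained sketch with per-step rescaling; same toy parameters as above.

```python
import numpy as np

def forward_backward(pi, A, B, x):
    """Smoothed marginals gamma[t, j] = p(z_t = j | x_{1:T})."""
    T, K = len(x), len(pi)
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    c = np.zeros(T)                        # per-step normalizers
    a = B[:, x[0]] * pi
    c[0] = a.sum()
    alpha[0] = a / c[0]
    for t in range(1, T):                  # forward pass
        a = B[:, x[t]] * (alpha[t - 1] @ A)
        c[t] = a.sum()
        alpha[t] = a / c[t]
    for t in range(T - 2, -1, -1):         # backward pass, rescaled by c
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1]) / c[t + 1]
    gamma = alpha * beta                   # combine
    return gamma / gamma.sum(axis=1, keepdims=True)

pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
print(forward_backward(pi, A, B, x=[0, 0, 1, 0, 1]))
```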
Viterbi Algorithm
Most likely sequence of states: $z^* = \arg\max_{z_{1:T}} p(z_{1:T} \mid x_{1:T})$
Not the same as the sequence of most likely states: $\big(\arg\max_{z_1} p(z_1 \mid x_{1:T}),\ \arg\max_{z_2} p(z_2 \mid x_{1:T}),\ \dots,\ \arg\max_{z_T} p(z_T \mid x_{1:T})\big)$
Forward-backward plus additional bookkeeping: track a trellis of states.
Max-product algorithm.
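A sketch of Viterbi decoding in log space, tracking back-pointers through the trellis; same toy parameters, and all names are made up for illustration.

```python
import numpy as np

def viterbi(pi, A, B, x):
    """Most likely state sequence argmax_z p(z_{1:T} | x_{1:T})."""
    T, K = len(x), len(pi)
    logA, logB = np.log(A), np.log(B)
    delta = np.zeros((T, K))               # best log-score ending in state j at t
    back = np.zeros((T, K), dtype=int)     # back-pointers through the trellis
    delta[0] = np.log(pi) + logB[:, x[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA      # scores[i, j]: come from i
        back[t] = scores.argmax(axis=0)            # best predecessor for each j
        delta[t] = scores.max(axis=0) + logB[:, x[t]]
    z = np.zeros(T, dtype=int)
    z[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):          # trace the back-pointers
        z[t] = back[t + 1, z[t + 1]]
    return z

pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
print(viterbi(pi, A, B, x=[0, 0, 1, 0, 1]))
```

Taking the per-step argmax of the smoothed marginals $\gamma_t$ from the previous sketch can give a different, possibly inconsistent, sequence, which is exactly the distinction drawn above.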
Parameter Estimation
Straightforward for fully observed data. Homework.
EM for HMMs (Baum-Welch Algorithm)
Formulate the complete-data log-likelihood (with $N_k^1$ and $N_{jk}$ the initial-state and transition counts of the hidden sequences):
$\log p(x, z) = \sum_k N_k^1 \log \pi_k + \sum_j \sum_k N_{jk} \log A_{jk} + \sum_i \sum_t \sum_k \mathbb{1}[z_{it} = k] \log p(x_{it} \mid \phi_k)$
Formulate its expectation given the current parameters:
$Q(\theta, \theta^{old}) = \sum_k \mathbb{E}[N_k^1] \log \pi_k + \sum_j \sum_k \mathbb{E}[N_{jk}] \log A_{jk} + \sum_i \sum_t \sum_k p(z_{it} = k \mid x_i, \theta^{old}) \log p(x_{it} \mid \phi_k)$
E-step
Compute the current expectations:
$\mathbb{E}[N_k^1] = \sum_i p(z_{i1} = k \mid x_i, \theta^{old})$ (use the backward algorithm)
$\mathbb{E}[N_{jk}] = \sum_i \sum_t p(z_{i,t-1} = j, z_{it} = k \mid x_i, \theta^{old})$ (use an extension of forward-backward; HW)
$\mathbb{E}[N_k] = \sum_i \sum_t p(z_{it} = k \mid x_i, \theta^{old})$ (use forward-backward)
Compare with i.i.d. models.
M-step
Maximize the expected complete-data log-likelihood using the current expected sufficient statistics:
$\hat{A}_{jk} = \frac{\mathbb{E}[N_{jk}]}{\sum_{k'} \mathbb{E}[N_{jk'}]}, \qquad \hat{\pi}_k = \frac{\mathbb{E}[N_k^1]}{N}$
Emission parameter estimates depend on the emission distribution.
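Putting the E-step and M-step together, a compact Baum-Welch sketch for a single sequence with discrete emissions (scaled forward-backward inside the E-step); the parameters and data are toy values made up for illustration.

```python
import numpy as np

def baum_welch_step(pi, A, B, x):
    """One EM (Baum-Welch) iteration for one discrete observation sequence."""
    x = np.asarray(x)
    T, K = len(x), len(pi)
    # E-step: scaled forward-backward
    alpha = np.zeros((T, K)); beta = np.ones((T, K)); c = np.zeros(T)
    a = B[:, x[0]] * pi
    c[0] = a.sum(); alpha[0] = a / c[0]
    for t in range(1, T):
        a = B[:, x[t]] * (alpha[t - 1] @ A)
        c[t] = a.sum(); alpha[t] = a / c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1]) / c[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)      # p(z_t = k | x)
    xi = np.zeros((K, K))                          # expected transition counts E[N_jk]
    for t in range(1, T):
        m = alpha[t - 1][:, None] * A * (B[:, x[t]] * beta[t])[None, :]
        xi += m / m.sum()
    # M-step: re-estimate parameters from the expected sufficient statistics
    pi_new = gamma[0]                              # E[N_k^1] / N, with N = 1 sequence
    A_new = xi / xi.sum(axis=1, keepdims=True)
    B_new = np.vstack([gamma[x == v].sum(axis=0) for v in range(B.shape[1])]).T
    B_new /= B_new.sum(axis=1, keepdims=True)
    return pi_new, A_new, B_new

pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
x = [0, 0, 1, 0, 1, 1, 0]
for _ in range(20):
    pi, A, B = baum_welch_step(pi, A, B, x)
print(A)
print(B)
```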
Choosing the Number of Hidden States
Similar to choosing the number of components in a mixture model.
One possibility: cross-validation (computationally more costly).
Many, Many Generalizations
Workhorse of signal processing (e.g. speech) for decades.
Continuous data, long-range dependencies, multiple hidden layers, handling inputs.
Linear Chain Conditional Random Fields
Discriminative Markov model (recall NB vs. LR):
$p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_t \sum_k w_k f_k(y_t, y_{t-1}, x_t) \Big)$
where $Z(x) = \sum_{y'} \exp\Big( \sum_t \sum_k w_k f_k(y'_t, y'_{t-1}, x_t) \Big)$
Forward-backward algorithm for inference; gradient descent instead of EM for parameter estimation.
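A toy sketch of the linear-chain CRF definition, not from the slides: the features are reduced to simple lookup tables (one emission score per label-observation pair, one transition score per label pair), and $Z(x)$ is computed by brute force over all label sequences just to make the normalizer concrete; a real implementation would compute $Z(x)$ with the forward algorithm. All tables and names are made up.

```python
import numpy as np
from itertools import product

def score(y, x, emit, trans):
    """Unnormalized score sum_t [trans(y_{t-1}, y_t) + emit(y_t, x_t)]."""
    s = emit[y[0], x[0]]
    for t in range(1, len(x)):
        s += trans[y[t - 1], y[t]] + emit[y[t], x[t]]
    return s

def crf_prob(y, x, emit, trans, K):
    """p(y | x) with Z(x) computed by brute force over all K^T label sequences."""
    logZ = np.log(sum(np.exp(score(yy, x, emit, trans))
                      for yy in product(range(K), repeat=len(x))))
    return np.exp(score(y, x, emit, trans) - logZ)

emit = np.array([[1.0, -1.0], [-1.0, 1.0]])    # emission scores (label, observation)
trans = np.array([[0.5, -0.5], [-0.5, 0.5]])   # transition scores (prev, curr)
print(crf_prob([0, 0, 1], [0, 0, 1], emit, trans, K=2))
```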
State Space Models
HMM with continuous hidden states: linear dynamical systems.
Conditional distributions are linear-Gaussian, so inference is mathematically tractable: the Kalman filter.
Widely used in time-series analysis and forecasting, object tracking, robotics, etc.
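A minimal Kalman filter sketch for a linear-Gaussian state space model $z_t = F z_{t-1} + w_t$, $y_t = H z_t + v_t$; the 1-D random-walk example and all names are made up for illustration.

```python
import numpy as np

def kalman_filter(mu0, P0, F, Q, H, R, ys):
    """Return the filtered means E[z_t | y_{1:t}] for a linear-Gaussian model."""
    mu, P = mu0, P0
    I = np.eye(len(mu0))
    means = []
    for y in ys:
        # Predict: push the previous posterior through the dynamics.
        mu = F @ mu
        P = F @ P @ F.T + Q
        # Update: correct with the new observation.
        S = H @ P @ H.T + R                  # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        mu = mu + K @ (y - H @ mu)
        P = (I - K @ H) @ P
        means.append(mu.copy())
    return np.array(means)

# Toy 1-D random walk observed with noise.
F = np.array([[1.0]]); Q = np.array([[0.01]])
H = np.array([[1.0]]); R = np.array([[0.5]])
ys = [np.array([v]) for v in [0.1, 0.3, 0.2, 0.5, 0.6]]
print(kalman_filter(np.array([0.0]), np.array([[1.0]]), F, Q, H, R, ys))
```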
Probabilistic Graphical Models
Used for modelling conditional independences in joint distributions over large numbers of random variables.
Directed graphical models / Bayes nets; undirected graphical models / random fields.
Inference is computationally hard in general; parameter estimation with hidden variables is harder.
Probability theory + graph theory.