Probabilistic Models for Sequence Data: Foundations of Algorithms and Machine Learning
This content covers topics such as Markov models, state transition matrices, Maximum Likelihood Estimation for Markov Chains, and Hidden Markov Models. It explains the concepts with examples and visuals, focusing on applications in various fields like NLP, weather forecasting, and stock market analysis.
Presentation Transcript
Probabilistic Models for Sequence Data
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya
Random Sequence Data
$x_1, x_2, \dots, x_T$, where the subscript denotes steps in time.
Many applications: NLP, arrival data, weather, stocks, biology.
Naive models: complete independence is too simple; complete pairwise dependence needs too many parameters.
Markov Model
$p(x_{1:T}) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_T \mid x_{T-1})$
Graphical model: $x_1 \rightarrow x_2 \rightarrow \cdots \rightarrow x_{T-1} \rightarrow x_T$
Markov (conditional independence) assumption: $x_{t+1} \perp x_{1:t-1} \mid x_t$ for all $t$, i.e. the future is conditionally independent of the past given the present.
Still too many parameters.
Homogeneous / stationary: $p(x_{t+1} \mid x_t)$ is the same for all $t$.
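As a concrete illustration of the homogeneous Markov assumption, here is a minimal sampling sketch, not from the original slides; the two-state example and the name `sample_chain` are made up for illustration.

```python
import numpy as np

def sample_chain(pi, A, T, rng=None):
    """Sample a length-T state sequence from a homogeneous Markov chain.

    pi : (K,) initial distribution p(x_1)
    A  : (K, K) transition matrix, A[i, j] = p(x_{t+1} = j | x_t = i)
    """
    rng = rng or np.random.default_rng(0)
    K = len(pi)
    x = np.empty(T, dtype=int)
    x[0] = rng.choice(K, p=pi)
    for t in range(1, T):
        # Markov assumption: only the previous state matters.
        x[t] = rng.choice(K, p=A[x[t - 1]])
    return x

# Hypothetical two-state chain (e.g. 0 = rainy, 1 = sunny).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
print(sample_chain(pi, A, T=10))
```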
State Transition Matrix
For discrete $x_t$: state transition matrix $A$, with $A_{ij} = p(x_{t+1} = j \mid x_t = i)$ (a stochastic matrix: each row sums to 1).
n-step matrix $A(n)$: $A_{ij}(n) = p(x_{t+n} = j \mid x_t = i)$.
Chapman-Kolmogorov equation: $A_{ij}(m + n) = \sum_k A_{ik}(m)\, A_{kj}(n)$, i.e. $A(m + n) = A(m)\, A(n)$, so $A(n) = A^n$.
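A small numerical check, using a made-up two-state transition matrix, that the n-step matrix is the matrix power $A^n$ and that Chapman-Kolmogorov holds:

```python
import numpy as np

A = np.array([[0.7, 0.3],   # A[i, j] = p(x_{t+1} = j | x_t = i)
              [0.2, 0.8]])

def n_step(A, n):
    """n-step transition matrix A(n) = A^n."""
    return np.linalg.matrix_power(A, n)

# Chapman-Kolmogorov: A(m + n) = A(m) A(n)
m, n = 3, 5
assert np.allclose(n_step(A, m + n), n_step(A, m) @ n_step(A, n))

# Each row of A(n) is still a probability distribution (stochastic matrix).
assert np.allclose(n_step(A, 8).sum(axis=1), 1.0)
print(n_step(A, 8))
```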
MLE for a Markov Chain
Data: multiple sequences $D = \{x^1, \dots, x^N\}$, where $x^i = (x_{i1}, \dots, x_{iT_i})$.
Formulate the log-likelihood:
$\log p(D) = \sum_i \log \Big[ p(x_{i1}) \prod_{t=2}^{T_i} p(x_{it} \mid x_{i,t-1}) \Big] = \sum_k N_k^1 \log \pi_k + \sum_j \sum_k N_{jk} \log A_{jk}$
where $N_k^1$ counts the sequences that start in state $k$ and $N_{jk}$ counts the transitions from state $j$ to state $k$.
Maximize with respect to the parameters:
$\hat{\pi}_k = \frac{N_k^1}{\sum_{k'} N_{k'}^1}, \qquad \hat{A}_{jk} = \frac{N_{jk}}{\sum_{k'} N_{jk'}}$
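A minimal sketch of these count-based estimates, assuming discrete states encoded as integers; the three short sequences are made up for illustration.

```python
import numpy as np

def markov_mle(sequences, K):
    """MLE of the initial distribution pi and transition matrix A from
    sequences of integer states in {0, ..., K-1}."""
    N1 = np.zeros(K)       # N1[k]: number of sequences starting in state k
    N = np.zeros((K, K))   # N[j, k]: number of transitions j -> k
    for x in sequences:
        N1[x[0]] += 1
        for t in range(1, len(x)):
            N[x[t - 1], x[t]] += 1
    pi_hat = N1 / N1.sum()
    A_hat = N / N.sum(axis=1, keepdims=True)   # normalize each row
    return pi_hat, A_hat

sequences = [[0, 1, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1, 1]]
pi_hat, A_hat = markov_mle(sequences, K=2)
print(pi_hat)
print(A_hat)
```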
Digression: Stationary Distribution
Does a Markov chain ever stabilize, i.e. $\pi_{t+1} = \pi_t$?
Only if $A$ satisfies particular properties: ergodic Markov chains.
Without ergodic Markov chains our daily lives would come to a standstill.
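A brief illustration with a toy two-state chain (made up for this note): repeatedly applying $A$ to any starting distribution converges to a fixed point $\pi = \pi A$ when the chain is ergodic.

```python
import numpy as np

A = np.array([[0.7, 0.3],
              [0.2, 0.8]])

# Power iteration: for an ergodic chain, pi A^t converges as t grows.
pi = np.array([1.0, 0.0])          # arbitrary starting distribution
for _ in range(1000):
    pi = pi @ A
print(pi)                          # approximately [0.4, 0.6]
assert np.allclose(pi, pi @ A)     # numerically a fixed point of A
```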
Hidden Markov Model
Sequence over discrete states. States generate emissions. Emissions are observed, but states are hidden.
Motivations
States have physical meaning: speech recognition, speaker diarization, part-of-speech tagging.
Long-range dependencies in a Markov model with a few parameters.
Hidden Markov Model
Conditional independence assumptions: $z_{t+1} \perp z_{1:t-1} \mid z_t$, and $x_t \perp x_{1:t-1}, z_{1:t-1} \mid z_t$.
Likelihood: $p(x, z) = p(z_1)\, p(x_1 \mid z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1})\, p(x_t \mid z_t)$
Hidden Markov Model: Generative Process
1. Sample an initial state $z_1 \sim p(z_1)$
2. Sample an initial output from the initial state: $x_1 \sim p(x_1 \mid z_1)$
3. Repeat for N steps:
4. Sample the current state based on the previous state: $z_t \sim p(z_t \mid z_{t-1})$
5. Sample the current output based on the current state: $x_t \sim p(x_t \mid z_t)$
Compare with i.i.d. generative mixture models.
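A minimal sketch of this generative process for an HMM with discrete emissions; the parameters and the name `sample_hmm` are made up for illustration.

```python
import numpy as np

def sample_hmm(pi, A, B, T, rng=None):
    """Sample hidden states z and emissions x from a discrete-emission HMM.

    pi : (K,)   initial state distribution p(z_1)
    A  : (K, K) transitions, A[i, j] = p(z_{t+1} = j | z_t = i)
    B  : (K, V) emissions,   B[k, v] = p(x_t = v | z_t = k)
    """
    rng = rng or np.random.default_rng(0)
    K, V = B.shape
    z = np.empty(T, dtype=int)
    x = np.empty(T, dtype=int)
    z[0] = rng.choice(K, p=pi)                # 1. sample initial state
    x[0] = rng.choice(V, p=B[z[0]])           # 2. sample initial output
    for t in range(1, T):                     # 3. repeat
        z[t] = rng.choice(K, p=A[z[t - 1]])   # 4. state from previous state
        x[t] = rng.choice(V, p=B[z[t]])       # 5. output from current state
    return z, x

pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
z, x = sample_hmm(pi, A, B, T=10)
print(z)
print(x)
```

Unlike an i.i.d. mixture model, consecutive states here are coupled through $A$, so the emissions are correlated over time.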
Inference Problems
Filtering: $p(z_t \mid x_{1:t})$
Smoothing: $p(z_{t-\ell} \mid x_{1:t})$ for $\ell = 1, 2, \dots$
Prediction: $p(z_{t+h} \mid x_{1:t})$ for $h = 1, 2, \dots$
MAP estimation / Viterbi decoding: $\arg\max_{z_{1:T}} p(z_{1:T} \mid x_{1:T})$
Inference: Challenge
Naive summation does not work:
$p(z_t \mid x_{1:T}) = \dfrac{\sum_{z_1, \dots, z_{t-1}, z_{t+1}, \dots, z_T} p(z_1, \dots, z_T, x_1, \dots, x_T)}{\sum_{z_1, \dots, z_T} p(z_1, \dots, z_T, x_1, \dots, x_T)}$
What is the cost? The sums have exponentially many terms (order $K^T$ for $K$ state values).
Solution: identify and reuse shared computations.
Forward Algorithm
Define $\alpha_t(j) \triangleq p(z_t = j \mid x_{1:t})$
$\alpha_t(j) \propto p(z_t = j, x_t \mid x_{1:t-1}) = p(x_t \mid z_t = j)\, p(z_t = j \mid x_{1:t-1})$ (Derivation? Normalization?)
$= p(x_t \mid z_t = j) \sum_i p(z_t = j \mid z_{t-1} = i)\, p(z_{t-1} = i \mid x_{1:t-1}) = p(x_t \mid z_t = j) \sum_i A(i, j)\, \alpha_{t-1}(i)$
Simple recursive algorithm. Base case: $\alpha_1(j) \propto p(x_1 \mid z_1 = j)\, \pi_j$.
Complexity $O(K^2 T)$, where $K$ is the number of state values. Compare with i.i.d. models.
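A sketch of the normalized forward (filtering) recursion for discrete emissions, following the slide's definition of $\alpha_t(j)$ as $p(z_t = j \mid x_{1:t})$; the toy parameters are made up for illustration.

```python
import numpy as np

def forward(pi, A, B, x):
    """Normalized forward pass: alpha[t, j] = p(z_t = j | x_{1:t}).

    Also accumulates log p(x_{1:T}) from the per-step normalizers.
    """
    T, K = len(x), len(pi)
    alpha = np.zeros((T, K))
    loglik = 0.0
    a = B[:, x[0]] * pi                    # base case: p(x_1 | z_1 = j) * pi_j
    loglik += np.log(a.sum())
    alpha[0] = a / a.sum()
    for t in range(1, T):
        # p(x_t | z_t = j) * sum_i A[i, j] * alpha_{t-1}(i), then normalize
        a = B[:, x[t]] * (alpha[t - 1] @ A)
        loglik += np.log(a.sum())
        alpha[t] = a / a.sum()
    return alpha, loglik                   # O(K^2 T) overall

pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
alpha, loglik = forward(pi, A, B, x=[0, 0, 1, 0, 1])
print(alpha)
print(loglik)
```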
Backward Algorithm
Define $\beta_t(i) \triangleq p(x_{t+1:T} \mid z_t = i)$
$= \sum_j p(z_{t+1} = j, x_{t+1}, x_{t+2:T} \mid z_t = i)$
$= \sum_j p(x_{t+2:T} \mid z_{t+1} = j)\, p(z_{t+1} = j, x_{t+1} \mid z_t = i)$
$= \sum_j p(x_{t+2:T} \mid z_{t+1} = j)\, p(x_{t+1} \mid z_{t+1} = j)\, p(z_{t+1} = j \mid z_t = i)$
$= \sum_j \beta_{t+1}(j)\, p(x_{t+1} \mid z_{t+1} = j)\, A(i, j)$
Simple recursive algorithm. Base case?
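A matching sketch of the backward recursion, taking $\beta_T(i) = 1$ as the base case (one standard convention); the toy parameters are again made up.

```python
import numpy as np

def backward(A, B, x):
    """Backward pass: beta[t, i] = p(x_{t+1:T} | z_t = i), unnormalized."""
    T, K = len(x), A.shape[0]
    beta = np.ones((T, K))                 # base case: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j A[i, j] * p(x_{t+1} | z_{t+1} = j) * beta_{t+1}(j)
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1])
    return beta

A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
print(backward(A, B, x=[0, 0, 1, 0, 1]))
```

For long sequences the unnormalized $\beta$ underflows, so in practice it is rescaled with the same per-step normalizers as the forward pass, as in the next sketch.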
Forward Backward Algorithm
Define $\gamma_t(j) = p(z_t = j \mid x_{1:T}) \propto \alpha_t(j)\, \beta_t(j)$
Overall algorithm: compute the forward pass, compute the backward pass, combine to get $\gamma_t(j)$.
Sum-product algorithm; belief propagation / message passing algorithm.
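Combining the two passes into the smoothed marginals $\gamma_t(j)$, written as one self-contained sketch with per-step rescaling; same toy parameters as above.

```python
import numpy as np

def forward_backward(pi, A, B, x):
    """Smoothed marginals gamma[t, j] = p(z_t = j | x_{1:T})."""
    T, K = len(x), len(pi)
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    c = np.zeros(T)                        # per-step normalizers
    a = B[:, x[0]] * pi
    c[0] = a.sum()
    alpha[0] = a / c[0]
    for t in range(1, T):                  # forward pass
        a = B[:, x[t]] * (alpha[t - 1] @ A)
        c[t] = a.sum()
        alpha[t] = a / c[t]
    for t in range(T - 2, -1, -1):         # backward pass, rescaled by c
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1]) / c[t + 1]
    gamma = alpha * beta                   # combine
    return gamma / gamma.sum(axis=1, keepdims=True)

pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
print(forward_backward(pi, A, B, x=[0, 0, 1, 0, 1]))
```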
Viterbi Algorithm
Most likely sequence of states: $z^* = \arg\max_{z_{1:T}} p(z_{1:T} \mid x_{1:T})$
Not the same as the sequence of most likely states: $\big(\arg\max_{z_1} p(z_1 \mid x_{1:T}),\ \arg\max_{z_2} p(z_2 \mid x_{1:T}),\ \dots,\ \arg\max_{z_T} p(z_T \mid x_{1:T})\big)$
Forward-backward plus additional bookkeeping: track a trellis of states.
Max-product algorithm.
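A sketch of Viterbi decoding in log space, tracking back-pointers through the trellis; same toy parameters, and all names are made up for illustration.

```python
import numpy as np

def viterbi(pi, A, B, x):
    """Most likely state sequence argmax_z p(z_{1:T} | x_{1:T})."""
    T, K = len(x), len(pi)
    logA, logB = np.log(A), np.log(B)
    delta = np.zeros((T, K))               # best log-score ending in state j at t
    back = np.zeros((T, K), dtype=int)     # back-pointers through the trellis
    delta[0] = np.log(pi) + logB[:, x[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA      # scores[i, j]: come from i
        back[t] = scores.argmax(axis=0)            # best predecessor for each j
        delta[t] = scores.max(axis=0) + logB[:, x[t]]
    z = np.zeros(T, dtype=int)
    z[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):          # trace the back-pointers
        z[t] = back[t + 1, z[t + 1]]
    return z

pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
print(viterbi(pi, A, B, x=[0, 0, 1, 0, 1]))
```

Taking the per-step argmax of the smoothed marginals $\gamma_t$ from the previous sketch can give a different, possibly inconsistent, sequence, which is exactly the distinction drawn above.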
Parameter Estimation
Straightforward for fully observed data. Homework.
EM for HMMs (Baum-Welch Algorithm)
Formulate the complete-data log-likelihood (with $N_k^1$ and $N_{jk}$ the initial-state and transition counts of the hidden sequences):
$\log p(x, z) = \sum_k N_k^1 \log \pi_k + \sum_j \sum_k N_{jk} \log A_{jk} + \sum_i \sum_t \sum_k \mathbb{1}[z_{it} = k] \log p(x_{it} \mid \phi_k)$
Formulate its expectation given the current parameters:
$Q(\theta, \theta^{old}) = \sum_k \mathbb{E}[N_k^1] \log \pi_k + \sum_j \sum_k \mathbb{E}[N_{jk}] \log A_{jk} + \sum_i \sum_t \sum_k p(z_{it} = k \mid x_i, \theta^{old}) \log p(x_{it} \mid \phi_k)$
E-step
Compute the current expectations:
$\mathbb{E}[N_k^1] = \sum_i p(z_{i1} = k \mid x_i, \theta^{old})$ (use the backward algorithm)
$\mathbb{E}[N_{jk}] = \sum_i \sum_t p(z_{i,t-1} = j, z_{it} = k \mid x_i, \theta^{old})$ (use an extension of forward-backward; HW)
$\mathbb{E}[N_k] = \sum_i \sum_t p(z_{it} = k \mid x_i, \theta^{old})$ (use forward-backward)
Compare with i.i.d. models.
M-step
Maximize the expected complete-data log-likelihood using the current expected sufficient statistics:
$\hat{A}_{jk} = \frac{\mathbb{E}[N_{jk}]}{\sum_{k'} \mathbb{E}[N_{jk'}]}, \qquad \hat{\pi}_k = \frac{\mathbb{E}[N_k^1]}{N}$
Emission parameter estimates depend on the emission distribution.
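Putting the E-step and M-step together, a compact Baum-Welch sketch for a single sequence with discrete emissions (scaled forward-backward inside the E-step); the parameters and data are toy values made up for illustration.

```python
import numpy as np

def baum_welch_step(pi, A, B, x):
    """One EM (Baum-Welch) iteration for one discrete observation sequence."""
    x = np.asarray(x)
    T, K = len(x), len(pi)
    # E-step: scaled forward-backward
    alpha = np.zeros((T, K)); beta = np.ones((T, K)); c = np.zeros(T)
    a = B[:, x[0]] * pi
    c[0] = a.sum(); alpha[0] = a / c[0]
    for t in range(1, T):
        a = B[:, x[t]] * (alpha[t - 1] @ A)
        c[t] = a.sum(); alpha[t] = a / c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1]) / c[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)      # p(z_t = k | x)
    xi = np.zeros((K, K))                          # expected transition counts E[N_jk]
    for t in range(1, T):
        m = alpha[t - 1][:, None] * A * (B[:, x[t]] * beta[t])[None, :]
        xi += m / m.sum()
    # M-step: re-estimate parameters from the expected sufficient statistics
    pi_new = gamma[0]                              # E[N_k^1] / N, with N = 1 sequence
    A_new = xi / xi.sum(axis=1, keepdims=True)
    B_new = np.vstack([gamma[x == v].sum(axis=0) for v in range(B.shape[1])]).T
    B_new /= B_new.sum(axis=1, keepdims=True)
    return pi_new, A_new, B_new

pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
x = [0, 0, 1, 0, 1, 1, 0]
for _ in range(20):
    pi, A, B = baum_welch_step(pi, A, B, x)
print(A)
print(B)
```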
Choosing the Number of Hidden States
Similar to choosing the number of components in a mixture model.
One possibility: cross-validation (computationally more costly).
Many, Many Generalizations
Workhorse of signal processing (e.g. speech) for decades.
Continuous data, long-range dependencies, multiple hidden layers, handling inputs.
Linear Chain Conditional Random Fields
Discriminative Markov model (recall NB vs. LR):
$p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_t \sum_k w_k f_k(y_t, y_{t-1}, x_t) \Big)$
where $Z(x) = \sum_{y'} \exp\Big( \sum_t \sum_k w_k f_k(y'_t, y'_{t-1}, x_t) \Big)$
Forward-backward algorithm for inference; gradient descent instead of EM for parameter estimation.
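A toy sketch of the linear-chain CRF definition, not from the slides: the features are reduced to simple lookup tables (one emission score per label-observation pair, one transition score per label pair), and $Z(x)$ is computed by brute force over all label sequences just to make the normalizer concrete; a real implementation would compute $Z(x)$ with the forward algorithm. All tables and names are made up.

```python
import numpy as np
from itertools import product

def score(y, x, emit, trans):
    """Unnormalized score sum_t [trans(y_{t-1}, y_t) + emit(y_t, x_t)]."""
    s = emit[y[0], x[0]]
    for t in range(1, len(x)):
        s += trans[y[t - 1], y[t]] + emit[y[t], x[t]]
    return s

def crf_prob(y, x, emit, trans, K):
    """p(y | x) with Z(x) computed by brute force over all K^T label sequences."""
    logZ = np.log(sum(np.exp(score(yy, x, emit, trans))
                      for yy in product(range(K), repeat=len(x))))
    return np.exp(score(y, x, emit, trans) - logZ)

emit = np.array([[1.0, -1.0], [-1.0, 1.0]])    # emission scores (label, observation)
trans = np.array([[0.5, -0.5], [-0.5, 0.5]])   # transition scores (prev, curr)
print(crf_prob([0, 0, 1], [0, 0, 1], emit, trans, K=2))
```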
State Space Models
HMM with continuous hidden states: linear dynamical systems.
Conditional distributions are linear-Gaussian, so inference is mathematically tractable: the Kalman filter.
Widely used in time-series analysis and forecasting, object tracking, robotics, etc.
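A minimal Kalman filter sketch for a linear-Gaussian state space model $z_t = F z_{t-1} + w_t$, $y_t = H z_t + v_t$; the 1-D random-walk example and all names are made up for illustration.

```python
import numpy as np

def kalman_filter(mu0, P0, F, Q, H, R, ys):
    """Return the filtered means E[z_t | y_{1:t}] for a linear-Gaussian model."""
    mu, P = mu0, P0
    I = np.eye(len(mu0))
    means = []
    for y in ys:
        # Predict: push the previous posterior through the dynamics.
        mu = F @ mu
        P = F @ P @ F.T + Q
        # Update: correct with the new observation.
        S = H @ P @ H.T + R                  # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        mu = mu + K @ (y - H @ mu)
        P = (I - K @ H) @ P
        means.append(mu.copy())
    return np.array(means)

# Toy 1-D random walk observed with noise.
F = np.array([[1.0]]); Q = np.array([[0.01]])
H = np.array([[1.0]]); R = np.array([[0.5]])
ys = [np.array([v]) for v in [0.1, 0.3, 0.2, 0.5, 0.6]]
print(kalman_filter(np.array([0.0]), np.array([[1.0]]), F, Q, H, R, ys))
```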
Probabilistic Graphical Models
Used for modelling conditional independences in joint distributions over large numbers of random variables.
Directed graphical models / Bayes nets; undirected graphical models / random fields.
Inference is computationally hard in general; parameter estimation with hidden variables is harder.
Probability theory + graph theory.