Probabilistic Models for Sequence Data: Foundations of Algorithms and Machine Learning

This content covers topics such as Markov models, state transition matrices, Maximum Likelihood Estimation for Markov Chains, and Hidden Markov Models. It explains the concepts with examples and visuals, focusing on applications in various fields like NLP, weather forecasting, and stock market analysis.

  • Probabilistic Models
  • Sequence Data
  • Markov Models
  • Machine Learning
  • Hidden Markov Models




Presentation Transcript


  1. Probabilistic Models for Sequence Data. Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya

  2. Random Sequence Data
     $X_1, X_2, \ldots, X_T$, where the subscript denotes steps in time.
     • Many applications: NLP, arrival data, weather, stocks, biology
     • Naive models: complete independence is too simple; complete pairwise dependence has too many parameters

  3. Markov Model
     $P(X_{1:T}) = P(X_1)\, P(X_2 \mid X_1) \cdots P(X_T \mid X_{T-1})$
     • Markov (conditional independence) assumption: $X_{t+1} \perp X_{1:t-1} \mid X_t$ for all $t$, i.e. the future is conditionally independent of the past given the present
     • Still too many parameters
     • Homogeneous / stationary chain: $P(X_{t+1} \mid X_t)$ is the same for all $t$
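To make the factorization concrete, here is a minimal sketch, assuming a toy two-state chain with made-up values for `pi` and `A`:

```python
import numpy as np

# Assumed toy parameters for a 2-state homogeneous Markov chain.
pi = np.array([0.6, 0.4])            # P(X_1): initial distribution
A = np.array([[0.7, 0.3],            # A[i, j] = P(X_{t+1} = j | X_t = i)
              [0.2, 0.8]])

def sequence_prob(x, pi, A):
    """P(x_1, ..., x_T) = P(x_1) * prod_t P(x_t | x_{t-1})."""
    p = pi[x[0]]
    for prev, curr in zip(x[:-1], x[1:]):
        p *= A[prev, curr]
    return p

print(sequence_prob([0, 0, 1, 1], pi, A))  # 0.6 * 0.7 * 0.3 * 0.8 = 0.1008
```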

  4. State Transition Matrix
     For discrete $X_t$:
     • State transition matrix $A$: $A_{ij} = P(X_{t+1} = j \mid X_t = i)$, a stochastic matrix
     • $n$-step matrix $A(n)$: $A_{ij}(n) = P(X_{t+n} = j \mid X_t = i)$
     • Chapman-Kolmogorov equation: $A_{ij}(m+n) = \sum_k A_{ik}(m)\, A_{kj}(n)$, i.e. $A(m+n) = A(m)\, A(n)$, so $A(n) = A^n$
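A small numerical check of these identities, again with an assumed two-state matrix:

```python
import numpy as np

# Verify A(n) = A^n and Chapman-Kolmogorov numerically on a toy chain.
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])

A4 = np.linalg.matrix_power(A, 4)        # A(4)
A1_3 = A @ np.linalg.matrix_power(A, 3)  # A(1) A(3)
assert np.allclose(A4, A1_3)             # A(m+n) = A(m) A(n)
print(A4[0])                             # row i is P(X_{t+4} = . | X_t = 0)
```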

  5. MLE for a Markov Chain
     • Data: multiple sequences $D = \{x_1, \ldots, x_N\}$, where $x_i = (x_{i1}, \ldots, x_{iT_i})$
     • Formulate the log-likelihood: $\log P(D) = \sum_i \log \pi_{x_{i1}} + \sum_i \sum_t \log A_{x_{i,t-1},\, x_{it}}$, where $\pi$ and $A$ denote the initial-state distribution and the transition matrix
     • Maximize w.r.t. the parameters: with counts $N_k^1 = \sum_i \mathbb{I}(x_{i1} = k)$ and $N_{jk} = \sum_i \sum_t \mathbb{I}(x_{i,t-1} = j,\, x_{it} = k)$, the MLEs are $\hat{\pi}_k = N_k^1 / N$ and $\hat{A}_{jk} = N_{jk} / \sum_{k'} N_{jk'}$
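A sketch of these count-and-normalize estimates, assuming two toy training sequences over states {0, 1}:

```python
import numpy as np

K = 2
data = [[0, 0, 1, 1, 1], [1, 0, 0, 1]]   # assumed toy sequences

N1 = np.zeros(K)          # N_k^1: initial-state counts
N = np.zeros((K, K))      # N_jk: transition counts j -> k
for x in data:
    N1[x[0]] += 1
    for j, k in zip(x[:-1], x[1:]):
        N[j, k] += 1

pi_hat = N1 / N1.sum()                     # pi_hat_k = N_k^1 / N
A_hat = N / N.sum(axis=1, keepdims=True)   # A_hat_jk = N_jk / sum_k' N_jk'
print(pi_hat, A_hat, sep="\n")
```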

  6. Digression: Stationary Distribution
     • Does a Markov chain ever stabilize, i.e. $\pi_{t+1} = \pi_t$?
     • Only if $A$ satisfies particular properties: ergodic Markov chains
     • Without ergodic Markov chains, our daily lives would come to a standstill
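A sketch of this stabilization by power iteration on an assumed ergodic two-state chain; repeatedly applying $\pi_{t+1} = \pi_t A$ converges to a fixed point regardless of the start:

```python
import numpy as np

A = np.array([[0.7, 0.3],        # assumed ergodic toy chain
              [0.2, 0.8]])

pi = np.array([1.0, 0.0])        # arbitrary starting distribution
for _ in range(100):
    pi = pi @ A                  # one step of the chain
print(pi)                        # approx [0.4, 0.6]; satisfies pi = pi A
```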

  7. Hidden Markov Model
     • Sequence over discrete states
     • States generate emissions
     • Emissions are observed, but states are hidden

  8. Motivations
     • States have physical meaning: speech recognition, speaker diarization, part-of-speech tagging
     • Long-range dependencies in a Markov model with few parameters

  9. Hidden Markov Model
     • Conditional independence assumptions: $X_{t+1} \perp X_{1:t-1} \mid X_t$, and $Y_t \perp \{X_{t'}, Y_{t'}\}_{t' \neq t} \mid X_t$
     • Likelihood: $P(X, Y) = P(X_1)\, P(Y_1 \mid X_1) \prod_{t=2}^{T} P(X_t \mid X_{t-1})\, P(Y_t \mid X_t)$

  10. Hidden Markov Model: Generative Process
     1. Sample an initial state $X_1 \sim P(X_1)$
     2. Sample an initial output from the initial state: $Y_1 \sim P(Y_1 \mid X_1)$
     3. Repeat for $N$ steps:
     4. Sample the current state based on the previous state: $X_t \sim P(X_t \mid X_{t-1})$
     5. Sample the current output based on the current state: $Y_t \sim P(Y_t \mid X_t)$
     Compare with i.i.d. generative mixture models. A sampler sketch follows below.
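A sketch of this generative process for a discrete-emission HMM; the parameters `pi`, `A`, `B` (2 hidden states, 3 output symbols) are assumed toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.6, 0.4])                  # P(X_1)
A = np.array([[0.7, 0.3], [0.2, 0.8]])     # P(X_t | X_{t-1})
B = np.array([[0.5, 0.4, 0.1],             # B[k, v] = P(Y_t = v | X_t = k)
              [0.1, 0.3, 0.6]])

def sample_hmm(T):
    x = [rng.choice(2, p=pi)]                # 1. initial state
    y = [rng.choice(3, p=B[x[0]])]           # 2. initial output
    for _ in range(T - 1):                   # 3. repeat
        x.append(rng.choice(2, p=A[x[-1]]))  # 4. state from previous state
        y.append(rng.choice(3, p=B[x[-1]]))  # 5. output from current state
    return x, y

print(sample_hmm(5))
```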

  11. Inference Problems
     • Filtering: $P(X_t \mid Y_{1:t})$
     • Smoothing: $P(X_{t-\ell} \mid Y_{1:t})$ for $\ell = 1, 2, \ldots$
     • Prediction: $P(X_{t+h} \mid Y_{1:t})$ for $h = 1, 2, \ldots$
     • MAP estimation / Viterbi decoding: $\operatorname{argmax}_{X_{1:T}} P(X_{1:T} \mid Y_{1:T})$

  12. Inference: Challenge
     • Naive summation does not work: $P(X_t = j \mid Y_{1:T}) = \dfrac{\sum_{X_1, \ldots, X_T:\, X_t = j} P(X_1, \ldots, X_T, Y_1, \ldots, Y_T)}{\sum_{X_1, \ldots, X_T} P(X_1, \ldots, X_T, Y_1, \ldots, Y_T)}$
     • What is the cost? Each sum ranges over $K^T$ state sequences
     • Solution: identify and reuse shared computations (see the brute-force sketch below for contrast)
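For contrast with the efficient algorithms that follow, here is the naive enumeration as a sketch, reusing the assumed toy parameters; it touches all $K^T$ state sequences and is hopeless beyond short chains:

```python
import itertools
import numpy as np

pi = np.array([0.6, 0.4])                        # assumed toy HMM
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

def brute_force_likelihood(y):
    """P(y_{1:T}) by summing the joint over every state sequence."""
    K, T = len(pi), len(y)
    total = 0.0
    for x in itertools.product(range(K), repeat=T):   # K^T terms
        p = pi[x[0]] * B[x[0], y[0]]
        for t in range(1, T):
            p *= A[x[t-1], x[t]] * B[x[t], y[t]]
        total += p
    return total

print(brute_force_likelihood([0, 1, 2, 0]))   # fine for T = 4, hopeless for large T
```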

  13. Forward Algorithm
     • Define $\alpha_t(j) \triangleq P(X_t = j \mid Y_{1:t})$
     • Recursion: $\alpha_t(j) \propto P(Y_t \mid X_t = j)\, P(X_t = j \mid Y_{1:t-1}) = P(Y_t \mid X_t = j) \sum_i P(X_t = j \mid X_{t-1} = i)\, P(X_{t-1} = i \mid Y_{1:t-1}) = P(Y_t \mid X_t = j) \sum_i A_{ij}\, \alpha_{t-1}(i)$ (Derivation? Normalization?)
     • Simple recursive algorithm; base case: $\alpha_1(j) \propto \pi_j\, P(Y_1 \mid X_1 = j)$
     • Complexity: $O(K^2 T)$, where $K$ is the number of state values
     • Compare with i.i.d. models
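A sketch of the normalized forward recursion, with the same assumed toy parameters:

```python
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

def forward(y):
    T, K = len(y), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, y[0]]                    # base case (unnormalized)
    alpha[0] /= alpha[0].sum()                    # normalize -> P(X_1 | y_1)
    for t in range(1, T):
        alpha[t] = B[:, y[t]] * (alpha[t-1] @ A)  # O(K^2) per step
        alpha[t] /= alpha[t].sum()
    return alpha                                  # alpha[t, j] = P(X_t = j | y_{1:t})

print(forward([0, 1, 2, 0]))
```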

  14. Backward Algorithm
     • Define $\beta_t(i) \triangleq P(Y_{t+1:T} \mid X_t = i)$
     • Derivation: $\beta_t(i) = \sum_j P(X_{t+1} = j, Y_{t+1}, Y_{t+2:T} \mid X_t = i) = \sum_j P(Y_{t+2:T} \mid X_{t+1} = j)\, P(Y_{t+1} \mid X_{t+1} = j)\, P(X_{t+1} = j \mid X_t = i) = \sum_j \beta_{t+1}(j)\, P(Y_{t+1} \mid X_{t+1} = j)\, A_{ij}$
     • Simple recursive algorithm. Base case?
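A matching sketch of the backward recursion, which answers the base-case question with $\beta_T(i) = 1$:

```python
import numpy as np

A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

def backward(y):
    T, K = len(y), A.shape[0]
    beta = np.ones((T, K))                   # base case: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j A[i, j] * P(y_{t+1} | j) * beta_{t+1}(j)
        beta[t] = A @ (B[:, y[t+1]] * beta[t+1])
    return beta

print(backward([0, 1, 2, 0]))
```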

  15. Forward-Backward Algorithm
     • Define $\gamma_t(j) \triangleq P(X_t = j \mid Y_{1:T}) \propto \alpha_t(j)\, \beta_t(j)$
     • Overall algorithm: compute the forward pass, compute the backward pass, combine to get $\gamma_t(j)$
     • Sum-product algorithm; belief propagation / message-passing algorithm
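The combination step as a sketch, reusing the `forward()` and `backward()` functions from the two sketches above (so this block assumes those definitions are in scope):

```python
def forward_backward(y):
    alpha, beta = forward(y), backward(y)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)   # normalize over states at each t
    return gamma                                # gamma[t, j] = P(X_t = j | y_{1:T})

print(forward_backward([0, 1, 2, 0]))
```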

  16. Viterbi Algorithm
     • Most likely sequence of states: $X^{*} = \operatorname{argmax}_{X_{1:T}} P(X_{1:T} \mid Y_{1:T})$
     • Not the same as the sequence of most likely states: $\left(\operatorname{argmax}_{X_1} P(X_1 \mid Y_{1:T}),\, \operatorname{argmax}_{X_2} P(X_2 \mid Y_{1:T}),\, \ldots,\, \operatorname{argmax}_{X_T} P(X_T \mid Y_{1:T})\right)$
     • Forward-backward + additional bookkeeping: track a trellis of states
     • Max-product algorithm
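A sketch of Viterbi decoding in log space, tracking the trellis with backpointers; same assumed toy parameters:

```python
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

def viterbi(y):
    T, K = len(y), len(pi)
    delta = np.zeros((T, K))                     # best log-score into each state
    psi = np.zeros((T, K), dtype=int)            # backpointers
    delta[0] = np.log(pi) + np.log(B[:, y[0]])   # log-space for stability
    for t in range(1, T):
        scores = delta[t-1][:, None] + np.log(A)  # scores[i, j]: i -> j
        psi[t] = scores.argmax(axis=0)            # best predecessor of j
        delta[t] = scores.max(axis=0) + np.log(B[:, y[t]])
    x = [delta[-1].argmax()]                      # backtrack from the best end state
    for t in range(T - 1, 0, -1):
        x.append(psi[t][x[-1]])
    return x[::-1]

print(viterbi([0, 1, 2, 0]))
```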

  17. Parameter Estimation
     • Straightforward for fully observed data (homework)

  18. EM for HMMs (Baum-Welch Algorithm)
     • Formulate the complete-data log-likelihood: $\log P(X, Y \mid \theta) = \sum_k N_k^1 \log \pi_k + \sum_j \sum_k N_{jk} \log A_{jk} + \sum_i \sum_t \log P(y_{it} \mid x_{it})$
     • Formulate its expectation given the current parameters: $Q(\theta, \theta^{old}) = \sum_k E[N_k^1] \log \pi_k + \sum_j \sum_k E[N_{jk}] \log A_{jk} + \sum_i \sum_t \sum_k P(x_{it} = k \mid y_i, \theta^{old}) \log P(y_{it} \mid k)$

  19. E-step
     Compute the current expectations:
     • $E[N_k^1] = \sum_i P(x_{i1} = k \mid y_i, \theta^{old})$: use the backward algorithm
     • $E[N_{jk}] = \sum_i \sum_t P(x_{i,t-1} = j,\, x_{it} = k \mid y_i, \theta^{old})$: use an extension of forward-backward (HW)
     • $E[N_k] = \sum_i \sum_t P(x_{it} = k \mid y_i, \theta^{old})$: use forward-backward
     • Compare with i.i.d. models

  20. M-step
     • Maximize the expected complete-data log-likelihood using the current expected sufficient statistics: $\hat{A}_{jk} = \dfrac{E[N_{jk}]}{\sum_{k'} E[N_{jk'}]}$, $\hat{\pi}_k = \dfrac{E[N_k^1]}{N}$
     • Emission parameter estimates depend on the emission distribution (see the sketch below)
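A sketch of one full EM iteration for a single sequence, reusing `forward()`, `backward()`, and the assumed toy `pi`, `A`, `B` from the sketches above; the emission update shown is for discrete emissions:

```python
import numpy as np

def baum_welch_step(y):
    alpha, beta = forward(y), backward(y)
    gamma = alpha * beta                                 # E-step: gamma_t(k)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # xi[t, i, j] proportional to alpha_t(i) A_ij P(y_{t+1}|j) beta_{t+1}(j)
    xi = (alpha[:-1, :, None] * A[None, :, :] *
          (B[:, y[1:]].T * beta[1:])[:, None, :])
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    pi_new = gamma[0]                                    # M-step: E[N_k^1] / N
    A_new = xi.sum(axis=0)                               # E[N_jk], then normalize
    A_new /= A_new.sum(axis=1, keepdims=True)
    B_new = np.zeros_like(B)                             # discrete-emission update
    for v in range(B.shape[1]):
        B_new[:, v] = gamma[np.asarray(y) == v].sum(axis=0)
    B_new /= B_new.sum(axis=1, keepdims=True)
    return pi_new, A_new, B_new

print(baum_welch_step([0, 1, 2, 0]))
```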

  21. Choosing the Number of Hidden States
     • Similar to choosing the number of components in a mixture model
     • One possibility: cross-validation, though it is computationally more costly

  22. Many, Many Generalizations
     • Workhorse of signal processing (e.g. speech) for decades
     • Continuous data
     • Long-range dependencies
     • Multiple hidden layers
     • Handling inputs

  23. Linear-Chain Conditional Random Fields
     • Discriminative Markov model (recall NB vs. LR)
     • $P(Y \mid X) = \dfrac{1}{Z(X)} \exp\left(\sum_t \sum_k \lambda_k f_k(y_t, y_{t-1}, x_t)\right)$, where $Z(X) = \sum_{Y'} \exp\left(\sum_t \sum_k \lambda_k f_k(y'_t, y'_{t-1}, x_t)\right)$
     • Forward-backward algorithm for inference
     • Gradient descent instead of EM for parameter estimation
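A toy sketch of this definition with two feature functions and weights invented purely for illustration, computing $Z(X)$ by brute force on a short chain:

```python
import itertools
import numpy as np

K_LABELS = 2
lam = np.array([1.5, -0.7])          # assumed weights lambda_k

def features(y_t, y_prev, x_t):
    # Assumed features: f_1 fires when the label matches a thresholded input,
    # f_2 fires when the label repeats.
    return np.array([float(y_t == (x_t > 0)), float(y_t == y_prev)])

def score(y, x):
    s = lam @ features(y[0], 0, x[0])     # assume a dummy initial label 0
    for t in range(1, len(y)):
        s += lam @ features(y[t], y[t-1], x[t])
    return s

def p_y_given_x(y, x):
    Z = sum(np.exp(score(yp, x))          # brute-force partition function Z(X)
            for yp in itertools.product(range(K_LABELS), repeat=len(x)))
    return np.exp(score(y, x)) / Z

print(p_y_given_x([1, 1, 0], [0.3, 0.8, -0.2]))
```

In practice $Z(X)$ is computed with the forward-backward algorithm rather than by enumeration, exactly as the slide notes.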

  24. State Space Models
     • HMM with continuous hidden states
     • Linear dynamical systems: conditional distributions are linear-Gaussian
     • Mathematically tractable inference: the Kalman filter
     • Widely used in time-series analysis and forecasting, object tracking, robotics, etc.
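A minimal one-dimensional Kalman filter sketch under assumed toy noise parameters, showing the predict and update steps for the model $x_t = a\, x_{t-1} + \text{process noise}$, $y_t = x_t + \text{observation noise}$:

```python
# Assumed toy parameters: transition coeff, process var, observation var.
a, q, r = 1.0, 0.1, 0.5
mu, P = 0.0, 1.0                 # prior mean and variance of the state

ys = [0.9, 1.1, 0.7, 1.3]        # assumed observations
for y in ys:
    mu_pred, P_pred = a * mu, a * a * P + q     # predict step
    K = P_pred / (P_pred + r)                   # Kalman gain
    mu = mu_pred + K * (y - mu_pred)            # update with the innovation
    P = (1 - K) * P_pred
    print(f"filtered mean {mu:.3f}, var {P:.3f}")
```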

  25. Probabilistic Graphical Models
     • Used for modelling conditional independences in joint distributions over a large number of random variables
     • Directed graphical models / Bayes nets; undirected graphical models / random fields
     • Inference is computationally hard in general; parameter estimation with hidden variables is harder
     • Probability theory + graph theory
