Probabilistic Inference in PRISM: Solving Statistical Machine Learning Challenges

PRISM, developed by Taisuke Sato at Tokyo Institute of Technology, is a high-level modeling language that simplifies the labor-intensive process of statistical machine learning. From its basic ideas to an ABO blood type program, this presentation shows how PRISM offers universal learning and inference methods applicable to every model.

  • Probabilistic Inference
  • PRISM
  • Statistical Machine Learning
  • Modeling Language
  • Taisuke Sato




Presentation Transcript


  1. Probabilistic Inference in PRISM Taisuke Sato Tokyo Institute of Technology

  2. Problem  Statistical machine learning is a labor-intensive process: {modeling, learning, evaluation}* of trial and error, with the pain of deriving and implementing model-specific learning algorithms and model-specific probabilistic inference. [Figure: Model 1 ... Model n, each tied to its own model-specific learning algorithm (EM1 ... EMn, VB, MCMC)]

  3. Our solution  Develop a high-level modeling language that offers universal learning and inference methods applicable to every model. [Figure: Model 1 ... Model n are all written in the modeling language, which connects them to shared EM, VB, and MCMC routines] The user concentrates on modeling; the rest (learning and inference) is taken care of by the system.

  4. PRISM (http://sato-www.cs.titech.ac.jp/prism/)  A logic-based high-level modeling language. [Figure: probabilistic models such as Bayesian networks, HMMs, PCFGs, and new models are expressed in PRISM; the PRISM system supplies the learning methods EM/MAP, VT, VB-VT, VB, and MCMC] Its generic inference/learning methods subsume standard algorithms such as forward-backward (FB) for HMMs and belief propagation (BP) for Bayesian networks.

  5. Basic ideas
  • Semantics: program = Turing machine + probabilistic choice + Dirichlet prior; denotation = a probability measure over possible worlds
  • Propositionalized probability computation (PPC): programs are written at the predicate logic level, while probability computation is done at the propositional logic level
  • Dynamic programming for PPC: proof search generates a directed graph (an explanation graph), and probabilities are computed from bottom to top in the graph
  • Discriminative use: generatively define a model by a PRISM program and discriminatively use it for better prediction performance

  6. ABO blood type program

    values(abo,[a,b,o],[0.5,0.2,0.3]).     % msw(abo,a) is true with prob. 0.5

    btype(X):- gtype(Gf,Gm), pg_table(X,[Gf,Gm]).

    pg_table(X,GT):-                       % phenotype X from genotype GT
        ( (X=a;X=b), (GT=[X,o];GT=[o,X];GT=[X,X])
        ; X=o,  GT=[o,o]
        ; X=ab, (GT=[a,b];GT=[b,a]) ).

    gtype(Gf,Gm):- msw(abo,Gf), msw(abo,Gm).
        % probabilistic primitives simulate gene inheritance
        % from father (left) and mother (right)

  [Figure: a father with genes a/b (type AB) and a mother with genes a/o (type A) pass genes b and o to a child of blood type B]
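To make the generative reading concrete, below is a minimal Python sketch (an illustration, not PRISM's implementation) of the forward sampling process this program defines: each msw(abo,_) call draws one gene, and pg_table maps the resulting genotype to a blood type. The function names and encoding here are assumptions made for this sketch.

    import random

    ABO_PROBS = {'a': 0.5, 'b': 0.2, 'o': 0.3}   # values(abo,[a,b,o],[0.5,0.2,0.3])

    def msw_abo(rng):
        """Simulate one msw(abo, Gene) choice."""
        r, acc = rng.random(), 0.0
        for gene, p in ABO_PROBS.items():
            acc += p
            if r < acc:
                return gene
        return 'o'

    def pg_table(gf, gm):
        """Map a genotype (Gf, Gm) to a blood type, mirroring pg_table/2."""
        s = {gf, gm}
        if s == {'a', 'b'}:
            return 'ab'
        if s == {'o'}:
            return 'o'
        return 'a' if 'a' in s else 'b'

    def btype(rng):
        gf, gm = msw_abo(rng), msw_abo(rng)      # gtype(Gf,Gm)
        return pg_table(gf, gm)

    rng = random.Random(0)
    samples = [btype(rng) for _ in range(100000)]
    print(samples.count('a') / len(samples))     # close to the exact P(btype(a)) = 0.55

Sampling many blood types this way approximates the exact probability computed on the next slide.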

  7. Propositionalized probability computation  Explanation graph for btype(a), which explains how btype(a) is proved via the probabilistic choices made by msw atoms:

    btype(a)   <=> gtype(a,a) v gtype(a,o) v gtype(o,a)     0.55 = 0.25 + 0.15 + 0.15
    gtype(a,a) <=> msw(abo,a) & msw(abo,a)                   0.25 = 0.5 x 0.5
    gtype(a,o) <=> msw(abo,a) & msw(abo,o)                   0.15 = 0.5 x 0.3
    gtype(o,a) <=> msw(abo,o) & msw(abo,a)                   0.15 = 0.3 x 0.5

  Probabilities are computed in a bottom-up, sum-product manner using the probabilities assigned to msw atoms. The explanation graph is acyclic, so dynamic programming (DP) is possible. PPC+DP subsumes forward-backward, belief propagation, and inside-outside computation.
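The sum-product computation over this explanation graph can be sketched in a few lines of Python (an assumed encoding for illustration; PRISM builds and evaluates explanation graphs internally):

    import math

    msw_prob = {('abo', 'a'): 0.5, ('abo', 'b'): 0.2, ('abo', 'o'): 0.3}

    # Each defined atom is explained by a disjunction of explanations (sum),
    # and each explanation is a conjunction of subgoals / msw atoms (product).
    expl_graph = {
        'btype(a)':   [['gtype(a,a)'], ['gtype(a,o)'], ['gtype(o,a)']],
        'gtype(a,a)': [[('abo', 'a'), ('abo', 'a')]],
        'gtype(a,o)': [[('abo', 'a'), ('abo', 'o')]],
        'gtype(o,a)': [[('abo', 'o'), ('abo', 'a')]],
    }

    def prob(node, memo=None):
        """Bottom-up sum-product over the acyclic graph, memoized (dynamic programming)."""
        if memo is None:
            memo = {}
        if node in msw_prob:                 # leaf: an msw atom
            return msw_prob[node]
        if node not in memo:
            memo[node] = sum(math.prod(prob(c, memo) for c in expl)
                             for expl in expl_graph[node])
        return memo[node]

    print(prob('btype(a)'))                  # 0.25 + 0.15 + 0.15 = 0.55

The same memoized sum-product, applied to the larger shared explanation graphs of HMMs or PCFGs, is what lets PPC+DP subsume forward-backward and inside-outside computation.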

  8. Learning  A program defines a joint distribution P(x,y|θ), where x is hidden and y is observed, e.g. P(msw(abo,a),..., btype(a) | θ_a, θ_b, θ_o) with θ_a + θ_b + θ_o = 1.
  • Learning from observed data y: maximize P(y|θ) (MLE/MAP), or maximize P(x*,y|θ) where x* = argmax_x P(x,y|θ) (VT).
  • From a Bayesian point of view, a program defines the marginal likelihood ∫ P(x,y|θ,α) dθ. We wish to compute the predictive distribution ∫ P(x|y,θ,α) dθ and the marginal likelihood P(y|α) = Σ_x ∫ P(x,y|θ,α) dθ. Both need approximation: variational Bayes (VB, VB-VT) or MCMC (Metropolis-Hastings).

  9. Sample session 1 - Expl. graph and prob. computation

    | ?- prism(blood)                 % built-in predicate
    loading::blood.psm.out

    | ?- show_sw
    Switch gene: unfixed_p: a (p: 0.500000000) b (p: 0.200000000) o (p: 0.300000000)

    | ?- probf(btype(a))
    btype(a) <=> gtype(a,a) v gtype(a,o) v gtype(o,a)
    gtype(a,a) <=> msw(gene,a) & msw(gene,a)
    gtype(a,o) <=> msw(gene,a) & msw(gene,o)
    gtype(o,a) <=> msw(gene,o) & msw(gene,a)

    | ?- prob(btype(a),P)
    P = 0.55

  10. Sample session 2 - MLE and Viterbi inference

    | ?- D=[btype(a),btype(a),btype(ab),btype(o)], learn(D)
    Exporting switch information to the EM routine ... done
    #em-iters: 0(4) (Converged: -4.965121886)
    Statistics on learning:
      Graph size: 18
      Number of switches: 1
      Number of switch instances: 3
      Number of iterations: 4
      Final log likelihood: -4.965121886

    | ?- prob(btype(a),P)
    P = 0.598211

    | ?- viterbif(btype(a))
    btype(a) <= gtype(a,a)
    gtype(a,a) <= msw(gene,a) & msw(gene,a)
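For intuition about what learn/1 is doing here, the following Python sketch runs EM for the ABO program on the same data (an assumed re-implementation for illustration, not PRISM's EM routine): the E-step computes posterior-weighted allele counts from the explanations of each observation, and the M-step renormalizes them.

    from itertools import product

    GENES = ['a', 'b', 'o']

    def phenotype(gf, gm):
        s = {gf, gm}
        if s == {'a', 'b'}:
            return 'ab'
        if s == {'o'}:
            return 'o'
        return 'a' if 'a' in s else 'b'

    def em(data, theta, n_iter=50):
        for _ in range(n_iter):
            counts = {g: 0.0 for g in GENES}
            for obs in data:
                # explanations of btype(obs): genotypes whose phenotype matches
                expls = [(gf, gm) for gf, gm in product(GENES, GENES)
                         if phenotype(gf, gm) == obs]
                z = sum(theta[gf] * theta[gm] for gf, gm in expls)
                for gf, gm in expls:                      # E-step: posterior-weighted counts
                    w = theta[gf] * theta[gm] / z
                    counts[gf] += w
                    counts[gm] += w
            total = sum(counts.values())
            theta = {g: counts[g] / total for g in GENES}  # M-step: renormalize
        return theta

    theta = em(['a', 'a', 'ab', 'o'], {'a': 0.5, 'b': 0.2, 'o': 0.3})
    p_a = theta['a']**2 + 2 * theta['a'] * theta['o']      # P(btype(a)) under learned theta
    print(theta, p_a)

With these inputs, p_a should come out near 0.598, in line with the P = 0.598211 reported by prob/2 after learning in the session above.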

  11. Sample session 3 - Bayes inference by MCMC

    | ?- D=[btype(a), btype(a), btype(ab), btype(o)],
         marg_mcmc_full(D,[burn_in(1000),end(10000),skip(5)],[VFE,ELM]),
         marg_exact(D,LogM)
    VFE = -5.54836
    ELM = -5.48608
    LogM = -5.48578

    | ?- D=[btype(a), btype(a), btype(ab), btype(o)],
         predict_mcmc_full(D,[btype(a)],[[_,E,_]]),
         print_graph(E,[lr('<=')])
    btype(a) <= gtype(a,a)
    gtype(a,a) <= msw(gene,a) & msw(gene,a)

  12. Summary  PRISM = probabilistic Prolog for statistical machine learning
  • Forward sampling
  • Exact probability computation
  • Parameter learning: MLE/MAP, VT
  • Bayesian inference: VB, VB-VT, MCMC
  • Viterbi inference
  • Model score (BIC, Cheeseman-Stutz, VFE), smoothing
  Current version: 2.1
