
Active Inference and Artificial Curiosity in Computational Neuroscience
An overview of active inference and artificial curiosity in computational neuroscience, as presented at the Gatsby-Kaken Joint Workshop. It describes how agents infer, learn and gain insight through Bayesian inference, model selection and structure learning, leading to aha moments and novel discoveries about the world.
Presentation Transcript
Gatsby-Kaken Joint Workshop on AI and Neuroscience: May 11-12, 2017
Active inference and artificial curiosity
Karl Friston, University College London

Abstract: This talk offers a formal account of insight and learning in terms of active (Bayesian) inference. It deals with the dual problem of inferring states of the world and learning its statistical structure. In contrast to current trends in machine learning, we focus on how agents learn from a small number of ambiguous outcomes to form insights. I will present simulations of abstract rule-learning and approximate Bayesian inference to show that minimising (expected) free energy leads to active sampling of novel contingencies. This epistemic, curious, novelty-seeking behaviour closes explanatory gaps in knowledge about the causal structure of the world, in addition to resolving uncertainty about states of the known world. We then move from inference to model selection or structure learning to show how abductive processes emerge when agents test plausible hypotheses about symmetries in their generative models. The ensuing Bayesian model reduction evokes mechanisms that have all the hallmarks of 'aha' moments.

Key words: active inference, insight, novelty, curiosity, model reduction, free energy, epistemic value, structure learning
A game (abstract rule) Your prior beliefs: You will choose the correct colour based on the three large circles. The location of the correct colour depends upon the colour of the top circle.
Active inference
- Action and the path of least resistance
- Generative models and active inference
- From principles to process theories
- Some empirical predictions
- Artificial insight and aha moments
Optimal action depends on states of the world versus optimal action depends on beliefs about states:

$$u_t = \operatorname{argmax}_u V(s_{t+1} \mid u) \qquad \text{(Bellman's Optimality Principle)}$$
$$u_t = \operatorname{argmin}_u F(Q(s_{t+1}) \mid u) \qquad \text{(Hamilton's Principle of Least Action)}$$

The first underwrites optimal control theory, dynamic programming, reinforcement learning and expected utility, backwards induction, state-action policy iteration and MDPs. The second underwrites the free energy principle, active inference, artificial curiosity and intrinsic motivation, Bayesian decision theory, Bayesian sequential policy optimisation and POMDPs.
Prior beliefs about policies

$$P(\pi \mid \gamma) = \sigma(-\gamma \cdot G(\pi))$$

Cost of a policy = expected free energy:

$$G(\pi, \tau) = E_{Q(o_\tau, s_\tau \mid \pi)}\big[\ln Q(s_\tau \mid \pi) - \ln P(o_\tau, s_\tau \mid \pi)\big]$$

This decomposes into epistemic value (information gain) and extrinsic value:

$$G(\pi, \tau) = \underbrace{-E_{Q(o_\tau \mid \pi)}\big[D_{KL}[Q(s_\tau \mid o_\tau, \pi) \,\|\, Q(s_\tau \mid \pi)]\big]}_{\text{epistemic value}} \; \underbrace{-\,E_{Q(o_\tau \mid \pi)}[\ln P(o_\tau \mid m)]}_{\text{extrinsic value}}$$

Special cases:
- In the absence of ambiguity (known states), expected free energy reduces to predicted divergence, i.e. KL or risk-sensitive control: $G(\pi,\tau) = D_{KL}[Q(o_\tau \mid \pi) \,\|\, P(o_\tau \mid m)]$
- With flat prior beliefs about outcomes, it reduces to (negative) Bayesian surprise or mutual information (Infomax): $G(\pi,\tau) = -E_{Q(o_\tau \mid \pi)}\big[D_{KL}[Q(s_\tau \mid o_\tau, \pi) \,\|\, Q(s_\tau \mid \pi)]\big]$
- In the absence of uncertainty or risk, it reduces to expected utility: $G(\pi,\tau) = -E_{Q(o_\tau \mid \pi)}[\ln P(o_\tau \mid m)]$
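To make this concrete, here is a minimal sketch of a one-step expected free energy computation for a discrete generative model, using the equivalent risk-plus-ambiguity rearrangement of the decomposition above. All names (A, qs, logC) and numbers are illustrative, not from the talk:

```python
import numpy as np

def expected_free_energy(A, qs, logC):
    """One-step expected free energy, G = risk + ambiguity.

    A    : likelihood matrix, A[o, s] = P(o | s)
    qs   : predicted state beliefs under a policy, Q(s | pi)
    logC : log prior preferences over outcomes, ln P(o | m)
    """
    eps = 1e-16
    qo = A @ qs                                              # predicted outcomes Q(o | pi)
    ambiguity = -(np.sum(A * np.log(A + eps), axis=0) @ qs)  # E_Q[H[P(o | s)]]
    risk = qo @ (np.log(qo + eps) - logC)                    # D[Q(o | pi) || P(o | m)]
    return risk + ambiguity

# Toy example: two states, two outcomes, a mildly ambiguous likelihood
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
qs = np.array([0.5, 0.5])                  # uncertain about the state
logC = np.log(np.array([0.99, 0.01]))      # strong preference for outcome 0
print(expected_free_energy(A, qs, logC))
```

Evaluating this quantity for each candidate policy and passing the results through a softmax gives the policy prior $P(\pi) = \sigma(-\gamma \cdot G(\pi))$ described above.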
Overview
- Action and the path of least resistance
- Generative models and active inference
- From principles to process theories
- Some empirical predictions
- Artificial insight and aha moments
A (Markovian) generative model

$$P(\tilde o, \tilde s, \pi) = P(\pi) \prod_{t} P(o_t \mid s_t)\, P(s_t \mid s_{t-1}, \pi)$$

- Likelihood: $P(o_t \mid s_t) = \mathrm{Cat}(\mathbf{A})$
- Empirical priors (transitions among hidden states under a policy): $P(s_{t+1} \mid s_t, \pi) = \mathrm{Cat}(\mathbf{B}(u = \pi(t)))$
- Prior preferences over outcomes: $P(o_t) = \mathrm{Cat}(\mathbf{C})$
- Initial hidden states: $P(s_0) = \mathrm{Cat}(\mathbf{D})$, with full priors $P(\mathbf{D}) = \mathrm{Dir}(d)$
- Policies and precision: $P(\pi \mid \gamma) = \sigma(-\gamma \cdot \mathbf{G}(\pi))$, where $\mathbf{G}(\pi) = E_{Q(o,s \mid \pi)}[\ln Q(s \mid \pi) - \ln P(o, s \mid \pi)]$ is the expected free energy and $\gamma$ is a precision with its own prior
- Approximate (mean-field) posterior: $Q(\tilde s, \pi, \gamma) = Q(\pi)\, Q(\gamma) \prod_\tau Q(s_\tau \mid \pi)$, with $Q(s_\tau \mid \pi) = \mathrm{Cat}(\mathbf{s}_\tau^\pi)$
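As a rough sketch, the ingredients of such a model can be written down directly as arrays (a minimal discrete POMDP; all shapes and numbers below are illustrative, not those used in the talk):

```python
import numpy as np

# Likelihood: A[o, s] = P(o | s)
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])

# Transitions: one matrix per control state u, B[u][s', s] = P(s' | s, u)
B = [np.eye(2),                        # u = 0: stay
     np.array([[0.0, 1.0],
               [1.0, 0.0]])]           # u = 1: switch

# Prior preferences over outcomes: C = ln P(o)
C = np.log(np.array([0.8, 0.2]))

# Initial-state prior: P(s0) = Cat(D), with Dirichlet prior P(D) = Dir(d)
d = np.ones(2)                         # concentration parameters (learnable)
D = d / d.sum()

# Policies are sequences of control states; their prior is a softmax of
# negative expected free energy, P(pi) = sigma(-gamma * G(pi))
policies = [(0, 0), (0, 1), (1, 0), (1, 1)]
```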
Variational updates and functional anatomy

- Perception (state estimation; occipital cortex, hippocampus): $\mathbf{s}_\tau^\pi = \sigma\big(\ln \mathbf{A} \cdot o_\tau + \ln \mathbf{B}_{\tau-1}^\pi \mathbf{s}_{\tau-1}^\pi + \ln \mathbf{B}_\tau^{\pi\top} \mathbf{s}_{\tau+1}^\pi\big)$
- Policy selection (prefrontal cortex): $\boldsymbol{\pi} = \sigma(-\gamma \cdot \mathbf{G})$
- Incentive salience, i.e. precision (striatum, VTA/SN): $\gamma = \operatorname{argmin}_\gamma F(o, \mathbf{s}, \boldsymbol{\pi}, \gamma)$
- Action selection (motor cortex): the action that best realises predicted outcomes, $a_t = \operatorname{argmin}_a D_{KL}\big[Q(o_{t+1}) \,\|\, \mathbf{A}\mathbf{B}(a)\mathbf{s}_t\big]$

[Figure: simulated behaviour; success rate (%) as a function of prior preference, comparing free energy (FE), KL control (KL), reinforcement learning (RL) and dopamine/precision (DA) schemes.]
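A simplified sketch of these updates in code (single-pass versions of the perception and policy-selection equations; the full scheme uses a gradient descent on free energy, so treat this as illustrative only):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def perceive(A, B_u, o, s_prev):
    """Perception: the posterior over the current state combines the
    likelihood message ln(A^T o) with the transition message ln(B s_prev)."""
    eps = 1e-16
    return softmax(np.log(A.T @ o + eps) + np.log(B_u @ s_prev + eps))

def select_policy(G, gamma=4.0):
    """Policy selection: pi = sigma(-gamma * G), with G the vector of
    expected free energies (one per policy) and gamma the precision."""
    return softmax(-gamma * np.asarray(G))

# Example: observe outcome 0 after a 'stay' action
A = np.array([[0.9, 0.1], [0.1, 0.9]])
o = np.array([1.0, 0.0])
s = perceive(A, np.eye(2), o, np.array([0.5, 0.5]))
pi = select_policy([1.2, 0.3, 0.9])
print(s, pi)
```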
[Schematic: functional anatomy of belief updating. Sensory input arrives in occipital cortex; state estimation under plausible policies involves occipital cortex and hippocampus; evaluation of policies occurs in ventral prefrontal cortex; policy selection rests on the striatum and VTA/SN (precision); the Bayesian model average of the next state is conveyed to dorsal prefrontal cortex, and predicted action to motor cortex.]
Two-step maze task: generative model

Control states: $u \in U = \{1, 2, 3, 4\}$, corresponding to the four locations the agent can move to. Transitions $\mathbf{B}(u): P(s_{t+1} \mid s_t, u)$ move the agent to the chosen location, with the baited arms acting as absorbing states.

Hidden states: location (four levels) by context (two levels: reward on the left or right arm).

Likelihood $\mathbf{A}: P(o_t \mid s_t)$. Outcomes at the baited arms are ambiguous, with probabilities $p$ and $q$ of reward (US) depending on the context, while the cue (CS) location discloses the context unambiguously.

Prior preferences: $\ln P(o_t) = \mathbf{C}$, assigning high utility ($+3$) to the rewarding outcome and low utility ($-3$) to its absence.
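A sketch of how the maze's transition and preference structure might look in code (assuming location 0 is the centre, 1 and 2 are the baited arms, and 3 is the cue location; the arms are absorbing, so a policy can visit the cue first; all values are illustrative):

```python
import numpy as np

n_loc = 4   # 0: centre, 1: left arm, 2: right arm, 3: cue (CS) location

# B[u][l', l] = P(l' | l, u): action u moves the agent to location u,
# except from the absorbing arms, which the agent cannot leave
B = []
for u in range(n_loc):
    Bu = np.zeros((n_loc, n_loc))
    Bu[u, :] = 1.0
    for arm in (1, 2):
        Bu[:, arm] = 0.0
        Bu[arm, arm] = 1.0
    B.append(Bu)

# Prior preferences ln P(o) over (start, reward, no-reward, cue) outcomes
C = np.array([0.0, 3.0, -3.0, 0.0])

# Ambiguity: outcomes at the arms depend probabilistically (p, q) on the
# unknown context, while the cue location discloses the context for certain
p, q = 0.9, 0.1
```

Because the cue resolves uncertainty about the context, policies that visit it before committing to an arm have lower expected free energy: epistemic foraging falls out of the same objective as reward seeking.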
[Figure: simulated two-step maze behaviour over 30 trials. Panels: initial state and policy selection; final outcome, performance and reaction times (expected utility per trial); state estimation (simulated ERPs); precision (simulated dopamine responses); learning of C and D.]
Overview
- Action and the path of least resistance
- Generative models and active inference
- From principles to process theories
- Some empirical predictions
- Artificial insight and aha moments
Simulated behaviour with learning

Belief updating comprises three processes:
- Inference (state estimation): $\mathbf{s}_\tau^\pi = \sigma\big(\ln \mathbf{A} \cdot o_\tau + \ln \mathbf{B}_{\tau-1}^\pi \mathbf{s}_{\tau-1}^\pi + \ln \mathbf{B}_\tau^{\pi\top} \mathbf{s}_{\tau+1}^\pi\big)$
- Policy selection: $\boldsymbol{\pi} = \sigma(-\gamma \cdot \mathbf{G})$
- Learning: $d \leftarrow d + \mathbf{s}_0$, so that $\mathbf{D} = d / \sum_i d_i$ accumulates Dirichlet concentration parameters over trials.

[Figure: simulated behaviour over 30 trials, as above, now with learning of D. Panels: initial state and policy selection; final outcome, performance and reaction times (expected utility); state estimation (ERPs); precision (dopamine); learning (D).]
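The learning update is just an accumulation of Dirichlet concentration parameters; a minimal sketch (assuming the posterior belief about the initial state is available at the end of each trial; names are illustrative):

```python
import numpy as np

def learn_D(d, s0):
    """Learning: accumulate evidence about the initial state,
    d <- d + s0, then renormalise to obtain D."""
    d = d + s0
    return d, d / d.sum()

d = np.ones(2)                        # flat Dirichlet prior over contexts
for s0 in [np.array([0.9, 0.1]), np.array([0.8, 0.2])]:
    d, D = learn_D(d, s0)
print(d, D)                           # beliefs about context sharpen over trials
```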
Overview
- Action and the path of least resistance
- Generative models and active inference
- From principles to process theories
- Some empirical predictions
- Artificial insight and aha moments
Some empirical predictions: evidence accumulation, phase precession, place-cell activity, oddball (MMN) responses and dopamine transfer.

[Figure: simulated electrophysiology. Panels: unit responses and firing rates for chosen and unchosen options; time-frequency response showing phase precession; local field potentials for states $s_1, s_2, s_3$; phasic dopamine responses (changes in precision) transferring from US to CS.]
Place-cell activity

[Figure: simulated unit responses over time, separating expectations about the past (memory) from expectations about the future (planning) for states $s_1, s_2, s_3$.]
Repetition suppression and dopamine transfer

[Figure: simulated oddball paradigm. Panels: unit responses; time-frequency response; local field potentials for standard and oddball stimuli, with the difference waveform (MMN); phasic dopamine responses (changes in precision) transferring from US to CS.]
Overview
- Action and the path of least resistance
- Generative models and active inference
- From principles to process theories
- Some empirical predictions
- Artificial insight and aha moments
A game (abstract rule) Your prior beliefs: You will choose the correct colour indicated by the large circles. The location of the correct colour depends upon the colour of the upper circle.
Our game: hidden states and outcomes

Hidden states:
- $s^1$ rule: left, centre, right
- $s^2$ colour: red, green, blue
- $s^3$ where: start, left, centre, right (the location currently being sampled)
- $s^4$ choice: red, green, blue, undecided

Outcomes:
- $o^1$ what: the colour seen (red, green, blue)
- $o^2$ where: start, left, centre, right
- $o^3$ feedback: correct, incorrect, undecided

Priors: the likelihood mapping $\mathbf{A}$ links the rule in play and the sampled location to the colour seen, and thereby determines the correct colour.
Rule learning

[Figure: simulated rule learning over 30 trials. Panels: initial state and policy selection; final outcome and performance (expected utility); free energy (nats); confidence (nats).]
Epistemic learning and novelty

[Figure: simulated epistemic behaviour over time. Panels: inferred and selected actions (where; choice) and posterior beliefs over hidden states (rule; colour; where; choice).]
Expected free energy and novelty

So far, we have considered expected free energy in terms of risk and ambiguity, which reflect uncertainty about states. Exactly the same principles can be applied to model parameters, leading to epistemic policies that resolve ignorance; i.e., that seek out novel contingencies. Including beliefs about the likelihood parameters $\mathbf{A}$ gives:

$$G(\pi,\tau) = E_{\tilde Q}\big[\ln Q(s_\tau, \mathbf{A} \mid \pi) - \ln P(o_\tau, s_\tau, \mathbf{A} \mid \tilde o, \pi)\big]$$

which decomposes into novelty, intrinsic value and extrinsic value:

$$G(\pi,\tau) = \underbrace{-E_{\tilde Q}\big[D_{KL}[Q(\mathbf{A} \mid o_\tau, s_\tau, \pi) \,\|\, Q(\mathbf{A})]\big]}_{\text{novelty}} \; \underbrace{-\,E_{\tilde Q}\big[D_{KL}[Q(s_\tau \mid o_\tau, \pi) \,\|\, Q(s_\tau \mid \pi)]\big]}_{\text{intrinsic value}} \; \underbrace{-\,E_{\tilde Q}[\ln P(o_\tau)]}_{\text{extrinsic value}}$$

or, equivalently, into ignorance, risk and ambiguity:

$$G(\pi,\tau) = \underbrace{-E_{\tilde Q}\big[D_{KL}[Q(\mathbf{A} \mid o_\tau, s_\tau, \pi) \,\|\, Q(\mathbf{A})]\big]}_{\text{ignorance}} + \underbrace{D_{KL}[Q(o_\tau \mid \pi) \,\|\, P(o_\tau)]}_{\text{risk}} + \underbrace{E_{\tilde Q}\big[H[P(o_\tau \mid s_\tau, \mathbf{A})]\big]}_{\text{ambiguity}}$$
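To show how the novelty term can be scored in practice, here is a sketch using a common approximation for a Dirichlet-parameterised likelihood, where the expected information gain about $\mathbf{A}$ reduces to $o \cdot \mathbf{W} s$ with $\mathbf{W} = \tfrac{1}{2}(a^{-1} - a_0^{-1})$ elementwise and $a_0$ the column sums of the concentration parameters. Treat the exact form, and all names and values below, as assumptions of this sketch:

```python
import numpy as np

def novelty(a, qs, qo):
    """Approximate novelty (expected information gain about A) when the
    likelihood is parameterised by Dirichlet concentration parameters a.
    Uses W = 0.5 * (1/a - 1/a0), with a0 the column sums of a, and
    novelty ~ qo . (W qs). A sketch, not an exact expression."""
    a0 = a.sum(axis=0, keepdims=True)     # column sums (per hidden state)
    W = 0.5 * (1.0 / a - 1.0 / a0)
    return qo @ (W @ qs)

# Rarely observed state-outcome pairs (small a) yield large W, so policies
# that expect to visit them acquire epistemic (novelty) value.
a = np.array([[8.0, 1.0],
              [2.0, 1.0]])
qs = np.array([0.0, 1.0])                 # beliefs over hidden states
qo = np.array([0.5, 0.5])                 # predicted outcomes
print(novelty(a, qs, qo))
```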
The neural correlates of insight

[Figure: simulated unit responses and firing rates on trial 1 (A) and trial 31 (B), encoding expectations about past and future states, with the corresponding local field potentials.]
Bayes optimal rule learning?

[Figure: as for rule learning above, over 30 trials. Panels: initial state and policy selection; final outcome and performance (expected utility); free energy (nats); confidence (nats).]
Bayesian model reduction and insight

So far, we have considered optimising expectations about states and parameters; however, we can also optimise the model itself with respect to free energy. This is known as Bayesian model comparison or reduction, and it can proceed in the absence of new information. For example, the awake brain learns causal associations in synaptic connections, while during sleep redundant connections are removed (Tononi and Cirelli, 2006) to minimise complexity (Hobson and Friston, 2012): literally, to clear one's mind. Here, plausible models are implicitly defined under the prior belief that there is a rule (i.e., a symmetry or invariance of contingencies in the likelihood mapping). By applying Bayes rule to the full and reduced models, it is straightforward to show that the change in free energy can be expressed in terms of the posterior concentration parameters $a$, the prior concentration parameters $a_0$, and the prior concentration parameters $\tilde a_0$ that define a reduced or simpler model:

$$\Delta F = \ln \mathcal{B}(a) + \ln \mathcal{B}(\tilde a_0) - \ln \mathcal{B}(a_0) - \ln \mathcal{B}(a + \tilde a_0 - a_0)$$

where $\mathcal{B}$ denotes the multivariate beta function. This equation returns the difference in free energy we would have observed, had we started observing outcomes with simpler prior beliefs.
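A minimal sketch of this computation for Dirichlet parameters, using log multivariate beta functions built from gammaln (the acceptance criterion and example values are illustrative assumptions):

```python
import numpy as np
from scipy.special import gammaln

def ln_beta(a):
    """Log multivariate beta function for concentration vector a."""
    return gammaln(a).sum() - gammaln(a.sum())

def delta_F(a_post, a_prior, a_reduced):
    """Change in free energy under a reduced (simpler) prior:
    dF = ln B(a) + ln B(a~0) - ln B(a0) - ln B(a + a~0 - a0),
    with a = posterior, a0 = full prior, a~0 = reduced prior."""
    return (ln_beta(a_post) + ln_beta(a_reduced)
            - ln_beta(a_prior) - ln_beta(a_post + a_reduced - a_prior))

# Accept the simpler model when dF < 0 (lower free energy): an 'aha' moment
a_post    = np.array([9.0, 1.0, 1.0])   # posterior concentration parameters
a_prior   = np.array([1.0, 1.0, 1.0])   # full prior
a_reduced = np.array([8.0, 0.5, 0.5])   # simpler prior (hypothesised rule)
print(delta_F(a_post, a_prior, a_reduced))
```

Because everything is computed from concentration parameters already in hand, the comparison needs no new observations, which is what licenses the interpretation of model reduction as offline insight (e.g., during sleep or quiet wakefulness).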
Insight and model reduction

[Figure: Dirichlet parameters of the likelihood mapping $\mathbf{A}$ before and after Bayesian model reduction. Sample: left - centre - right; rule: right - centre - left. Panels: priors, posteriors and the reduced (post-insight) mapping between contexts and the correct colour.]
The emergence of insight

[Figure: averages over 64 simulated subjects and 30 trials. Panels: average performance (probability of correct response); average free energy, with and without model reduction; average confidence, with and without model reduction; and a raster of 'aha' moments by subject and trial.]
Thank you

And thanks to collaborators: Rick Adams, Ryszard Auksztulewicz, Andre Bastos, Sven Bestmann, Harriet Brown, Jean Daunizeau, Mark Edwards, Chris Frith, Thomas FitzGerald, Xiaosi Gu, Stefan Kiebel, James Kilner, Christoph Mathys, Jérémie Mattout, Rosalyn Moran, Dimitri Ognibene, Sasha Ondobaka, Will Penny, Giovanni Pezzulo, Lisa Quattrocki Knight, Francesco Rigoli, Klaas Stephan, Philipp Schwartenbeck, and many others.

And colleagues: Micah Allen, Felix Blankenburg, Andy Clark, Peter Dayan, Ray Dolan, Allan Hobson, Paul Fletcher, Pascal Fries, Geoffrey Hinton, James Hopkins, Jakob Hohwy, Mateus Joffily, Henry Kennedy, Simon McGregor, Read Montague, Tobias Nolte, Anil Seth, Mark Solms, Paul Verschure.