Solving MDPs

An MDP is defined by a tuple {S, A, R, P} representing states, actions, rewards, and transition probabilities. The goal is to find an action policy that maximizes future-discounted expected reward, and solving an MDP involves iteratively updating the value function and the action policy. MDPs sit within a larger family of AI models, distinguished from Markov chains, HMMs, and POMDPs by whether the agent has control over actions and whether states are observable. Using MDPs to model human decisions raises challenges: real-world states are rarely cleanly conceptualized, and it is unclear where rewards come from. By contrast, solving a Markov chain means identifying its stationary distribution, while an HMM's goal is to estimate latent state transition and emission probabilities from a sequence of observations. Reinforcement learning addresses the MDP setting where rewards and transition probabilities are not known in advance and optimal policies must be learned, either model-based or model-free.

  • Markov Decision Processes
  • MDP Framework
  • Reinforcement Learning
  • AI Models
  • Human Decision Modeling


Presentation Transcript


  1. Solving MDPs (CS786, 27th January 2022)

  2. The MDP framework. An MDP is the tuple {S, A, R, P}: a set of states (S), a set of actions (A), possible rewards (R) for each {s, a} combination, and transition probabilities, where P(s'|s, a) is the probability of reaching state s' given you took action a while in state s.
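As a concrete illustration (not from the lecture), the tuple can be written down directly in Python; the state names, actions, and numbers below are made up for the sketch.

```python
# A minimal sketch of the {S, A, R, P} tuple for a toy two-state MDP.
# All names and numbers here are illustrative, not from the lecture.

S = ["low", "high"]            # states
A = ["wait", "search"]         # actions

# P[(s, a)] maps each next state s' to P(s' | s, a)
P = {
    ("low", "wait"):    {"low": 1.0},
    ("low", "search"):  {"low": 0.7, "high": 0.3},
    ("high", "wait"):   {"high": 1.0},
    ("high", "search"): {"high": 0.8, "low": 0.2},
}

# R[(s, a)] is the reward for each {s, a} combination
R = {
    ("low", "wait"):    0.0,
    ("low", "search"):  1.0,
    ("high", "wait"):   0.5,
    ("high", "search"): 2.0,
}
```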

  3. Solving an MDP. Solving an MDP is equivalent to finding an action policy AP(s) that tells you what action to take whenever you reach a state s. The typical rational solution is to maximize future-discounted expected reward.
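To make "future-discounted expected reward" concrete, here is a minimal sketch of a discounted return computation; the discount factor 0.9 and the reward sequence are arbitrary example values, not from the slides.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards r_t weighted by gamma**t (gamma in [0, 1))."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Example: three steps of reward
print(discounted_return([1.0, 0.0, 2.0]))  # 1.0 + 0.9*0.0 + 0.81*2.0 = 2.62
```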

  4. Solution strategy. Notation: P(s'|s, a) is the probability of moving to s' from s via action a, and R(s', a) is the reward received for reaching state s' via action a. Update the value function and action policy iteratively. https://towardsdatascience.com/getting-started-with-markov-decision-processes-reinforcement-learning-ada7b4572ffb
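The iterative update the slide refers to is commonly implemented as value iteration. The sketch below assumes the toy S, A, P, R dictionaries from the earlier example (with rewards keyed on (s, a)) and an assumed discount factor gamma; it is one possible implementation, not necessarily the one used in the course.

```python
def value_iteration(S, A, P, R, gamma=0.9, tol=1e-6):
    """Iteratively update state values, then read off a greedy policy."""
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            # Q-value of each action: immediate reward plus discounted future value
            q_values = [
                R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
                for a in A
            ]
            best = max(q_values)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy action policy AP(s): pick the action with the highest Q-value
    AP = {
        s: max(A, key=lambda a: R[(s, a)] + gamma *
               sum(p * V[s2] for s2, p in P[(s, a)].items()))
        for s in S
    }
    return V, AP
```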

  5. Part of a larger universe of AI models. Two questions place a model in this universe: does the agent have control over actions, and are the states observable? No control with hidden states gives an HMM; control with hidden states gives a POMDP; no control with observable states gives a Markov chain; control with observable states gives an MDP.

  6. Modeling human decisions? States are seldom nicely conceptualized in the real world. Where do rewards come from? Storing transition probabilities is hard. Do people really look ahead into the infinite time horizon?

  7. Markov chain. The goal in solving a Markov chain is to identify the stationary distribution. (Diagram: a two-state Sunny/Rainy chain with transition probabilities 0.1 and 0.3.)
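A minimal sketch of finding the stationary distribution numerically, assuming the diagram's 0.1 and 0.3 are the Sunny-to-Rainy and Rainy-to-Sunny probabilities (the exact assignment did not survive the slide export):

```python
import numpy as np

# Row-stochastic transition matrix; rows and columns are [Sunny, Rainy].
T = np.array([[0.9, 0.1],    # from Sunny: stay 0.9, move to Rainy 0.1 (assumed)
              [0.3, 0.7]])   # from Rainy: move to Sunny 0.3, stay 0.7 (assumed)

# The stationary distribution pi satisfies pi = pi @ T. One simple way to find
# it is to raise T to a large power; every row converges to pi.
pi = np.linalg.matrix_power(T, 1000)[0]
print(pi)  # approximately [0.75, 0.25]
```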

  8. HMM. The HMM goal is to estimate the latent state transition and emission probabilities from a sequence of observations.
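Full estimation of transition and emission probabilities is usually done with the Baum-Welch (EM) algorithm; the sketch below shows only its building block, the forward pass that scores an observation sequence under candidate parameters. All parameter values are illustrative, not from the lecture.

```python
import numpy as np

# Illustrative HMM: 2 hidden states, 2 observation symbols.
T = np.array([[0.9, 0.1],    # hidden-state transition probabilities
              [0.3, 0.7]])
E = np.array([[0.8, 0.2],    # emission probabilities P(obs | hidden state)
              [0.1, 0.9]])
init = np.array([0.5, 0.5])  # initial hidden-state distribution

def forward_likelihood(obs):
    """P(observation sequence | T, E, init), computed by the forward recursion."""
    alpha = init * E[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ T) * E[:, o]
    return alpha.sum()

print(forward_likelihood([0, 0, 1]))
```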

  9. MDP vs. RL. In an MDP, {S, A, R, P} are known. In RL, R and P are not known to begin with; they are learned from experience, and the optimal policy is updated sequentially to account for increased information about rewards and transition probabilities. Model-based RL learns the transition probabilities P as well as the optimal policy. Model-free RL learns only the optimal policy, not the transition probabilities P.
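A standard example of model-free RL is Q-learning, sketched below: it learns action values directly from experience without ever estimating P. The environment interface (reset, step, actions) and all hyperparameter values are assumptions for illustration, not from the lecture.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Model-free RL sketch: learn Q(s, a) from experience, never estimating P.

    `env` is assumed to expose reset() -> state,
    step(action) -> (next_state, reward, done), and a list env.actions.
    """
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)
            # Temporal-difference update toward r + gamma * max_a' Q(s', a')
            target = r + gamma * max(Q[(s2, act)] for act in env.actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```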
