Exponentially Many Experts: On-Line Decision Problems

Dive into the world of on-line decision problems with exponentially many experts, exploring algorithms, regret bounds, and implementations like the Kalai-Vempala algorithm for coping with vast numbers of possible actions. Discover strategies for minimizing regret and making optimal choices in a dynamic setting.

  • Decision Problems
  • Regret Bounds
  • On-Line Algorithms
  • Exponential Actions
  • Kalai-Vempala




Presentation Transcript


  1. On-Line Decision Problems with Exponentially Many Experts. Richard M. Karp. Hoffman Fest, September 19-20, 2014.

  2. On-Line Decision Problem A game between an adversary and a chooser. At each of T steps the adversary secretly designates a subset of a fixed set A of n possible actions. The designated actions are called winning and the others are called losing. The chooser, knowing the winning actions at all preceding steps, chooses an action. The set of winning actions at the current step is then revealed to the chooser. The chooser's goal: minimize regret, defined as the number of times the chooser picks a losing action minus min_e (the number of losses of the best single action e). Any deterministic policy for the chooser can be made to lose at every step.
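The game can be made concrete with a short simulation loop. The sketch below is not from the talk; the function names (play, adversary, chooser) and the representation of actions as the integers 0..n-1 are assumptions made for illustration.

```python
def play(T, n, adversary, chooser):
    """Run T steps of the on-line decision problem and report the chooser's regret.

    adversary(t, history) -> set of winning actions, fixed before the choice is seen
    chooser(t, history)   -> an action in range(n), based only on past winning sets
    """
    history = []                       # winning sets revealed at earlier steps
    chooser_losses = 0
    action_losses = [0] * n            # L(t, e) for every action e
    for t in range(T):
        win = adversary(t, history)    # designated secretly at this step
        choice = chooser(t, history)
        if choice not in win:
            chooser_losses += 1
        for e in range(n):
            if e not in win:
                action_losses[e] += 1
        history.append(win)            # revealed to the chooser after the step
    regret = chooser_losses - min(action_losses)
    return chooser_losses, regret
```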

  3. Randomized On-Line Algorithms with Proven Regret Bounds W(t,e), L(t,e): the number of wins and losses, respectively, for action e in the first t steps. Multiplicative Update Algorithm: choose a small positive epsilon. At each step t+1, assign each action e the weight (1 - epsilon)^L(t,e) and draw an action from the distribution in which the probability of each action is proportional to its weight. Kalai-Vempala Algorithm: at each step t+1, assign each action e the score W(t,e) - R(t+1,e), where R(t+1,e) is drawn by the chooser from an exponential distribution with mean 1/epsilon, and choose the action argmax_e [W(t,e) - R(t+1,e)].
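A minimal sketch of one step of each rule, assuming actions are the integers 0..n-1 and the full history of winning sets is available explicitly (the exponential-action case is the subject of the later slides). The sign of the perturbation follows the score W(t,e) - R(t+1,e) as stated above; the function names are illustrative only.

```python
import random

def multiplicative_update_step(history, n, eps):
    """Pick an action with probability proportional to (1 - eps)**L(t, e)."""
    losses = [sum(1 for win in history if e not in win) for e in range(n)]
    weights = [(1 - eps) ** L for L in losses]
    return random.choices(range(n), weights=weights)[0]

def kalai_vempala_step(history, n, eps):
    """Pick argmax_e of W(t, e) - R(e), with R(e) exponential of mean 1/eps."""
    wins = [sum(1 for win in history if e in win) for e in range(n)]
    return max(range(n), key=lambda e: wins[e] - random.expovariate(eps))
```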

  4. Bounds on Regret For the multiplicative update algorithm and the Kalai-Vempala algorithm, the expected regret after step t is at most epsilon * min_e L(t,e) + O(log(n)/epsilon), where n is the number of possible actions.
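Written out in display form; the second line, which balances the two terms by a particular choice of epsilon, is a standard consequence and not a statement from the slides.

```latex
\[
\mathbb{E}\bigl[\text{regret after step } t\bigr]
  \;\le\; \epsilon \,\min_{e} L(t,e) \;+\; O\!\left(\frac{\log n}{\epsilon}\right),
\]
\[
\text{and choosing } \epsilon = \sqrt{\frac{\log n}{\min_e L(t,e)}}
  \text{ gives expected regret } O\!\left(\sqrt{\min_e L(t,e)\,\log n}\right).
\]
```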

  5. Examples The actions may be: paths through a network, multicommodity flows, packings of a knapsack, truth assignments, stock levels in an inventory system, spanning trees, minimum spanning trees, perfect bipartite matchings, transversals in a bipartite graph, bases in a matroid etc. How to implement the Kalai-Vempala algorithm when the number of possible actions is exponentially large?

  6. Coping with Exponentially Many Actions Approach: find a compact representation that enables counting the number of winning actions at any prior step, sampling uniformly at random from the set of winning actions at that step, and testing whether a given action is winning at that step. Implement the Kalai-Vempala algorithm using these primitives.
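The three primitives can be captured in a small interface; the class and method names below are hypothetical, not from the talk.

```python
from abc import ABC, abstractmethod

class WinningSetOracle(ABC):
    """Compact representation of the winning sets WIN(1), ..., WIN(t) seen so far."""

    @abstractmethod
    def count(self, t):
        """Return |WIN(t)|, the number of winning actions at prior step t."""

    @abstractmethod
    def sample(self, t):
        """Return an action drawn uniformly at random from WIN(t)."""

    @abstractmethod
    def contains(self, t, action):
        """Return True if the given action was winning at step t."""
```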

  7. Ideal Case WIN(t): the set of winning actions at step t. Ideal case: it is possible to count the elements of WIN(t), sample uniformly from WIN(t), and test membership in WIN(t). Sets for which the ideal case holds: source-sink paths respecting a deadline, feasible solutions to a provisioning problem, feasible solutions to a knapsack problem, spanning trees, minimum spanning trees, satisfying assignments to a Boolean formula described by a BDD.

  8. Nearly Ideal Case Approximate counting can be done. The nearly ideal case applies to perfect matchings in a bipartite graph, integer multicommodity flows, satisfying assignments to a DNF Boolean formula and bases of a transversal matroid.

  9. Example: The On-Line Path Selection Problem The setting: A graph G, a source a and sink z, where a is different from z, and a positive integer deadline d. The actions are the paths from a to z in G. At each step t the adversary assigns positive integer costs to the edges. A path is winning if the sum of the costs of its edges is less than or equal to d.

  10. A Compact Representation A pair (c,u), where c is a nonnegative integer and u is a vertex, is good at step t if there is a path of cost c from a to u and a path of cost at most d-c from u to z. In the graph of winning paths H, the vertex set is the set of good pairs plus the pair (d,z); there is an edge from (c,u) to (c',v) if c' - c is the cost of edge (u,v), and an edge from (c,u) to (d,z) if the cost of the edge (u,z) is at most d-c. The winning paths are the paths in H from (0,a) to (d,z).

  11. Constructing the Graph H By breadth-first search forward from (0,a) and backward from (d,z), find the set of good pairs. Counting: examining the pairs (c,u) in H in decreasing order of c, compute L(c,u), the number of paths in H from (c,u) to (d,z); L(0,a) is the number of winning paths. Sampling: impose an arbitrary linear ordering on the edges out of each pair in H. This induces a linear ordering on the winning paths, and for any N one can quickly find the Nth path in this ordering.
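A sketch of the counting and sampling steps on H, assuming H is given as a dict mapping each vertex (a (cost, vertex) pair) to an ordered list of its successors; the function names are illustrative only.

```python
import random

def count_and_sample_paths(H, source, sink):
    """Count the paths from source to sink in the DAG H and draw one uniformly.

    Every edge strictly increases the cost coordinate, so processing vertices in
    decreasing order of cost visits each vertex after all of its successors.
    """
    npaths = {sink: 1}                                   # L(c, u) in the slides
    for v in sorted(H, key=lambda pair: pair[0], reverse=True):
        if v != sink:
            npaths[v] = sum(npaths.get(w, 0) for w in H[v])

    def nth_path(N):
        # The fixed ordering of successors induces a linear order on the paths;
        # descend through H, skipping over blocks of paths that come before N.
        path, v = [source], source
        while v != sink:
            for w in H[v]:
                if N < npaths.get(w, 0):
                    path.append(w)
                    v = w
                    break
                N -= npaths.get(w, 0)
        return path

    total = npaths.get(source, 0)                        # number of winning paths
    return total, (nth_path(random.randrange(total)) if total > 0 else None)
```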

  12. A Provisioning Problem Each day, the adversary secretly specifies a vector d = (d(1), d(2), ..., d(m)) of demands for m perishable goods and the chooser chooses supply levels e(1), e(2), ..., e(m) summing to C. The action (e(1), e(2), ..., e(m)) is winning if e(i) >= d(i) for all i. Easy exercise: count the winning choices and sample uniformly at random from the set of winning choices.
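One way to do the exercise, assuming the supply levels are nonnegative integers summing exactly to C; winning vectors then correspond to compositions of the slack C - sum(d) into m nonnegative parts (stars and bars). The function names are illustrative.

```python
import math
import random

def count_winning(d, C):
    """Number of integer vectors e with e[i] >= d[i] for all i and sum(e) == C."""
    m, slack = len(d), C - sum(d)
    return math.comb(slack + m - 1, m - 1) if slack >= 0 else 0

def sample_winning(d, C):
    """Uniformly random winning supply vector (assumes count_winning(d, C) > 0)."""
    m, slack = len(d), C - sum(d)
    # Choose positions for m-1 bars among slack + m - 1 slots; the gap sizes are
    # a uniformly random composition of the slack into m nonnegative parts.
    bars = sorted(random.sample(range(slack + m - 1), m - 1))
    cuts = [-1] + bars + [slack + m - 1]
    extras = [cuts[i + 1] - cuts[i] - 1 for i in range(m)]
    return [di + fi for di, fi in zip(d, extras)]
```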

  13. Naive Algorithm for the Ideal Case At step t+1, explicitly compute W(t,e) and W(t,e) - R(t+1,e) for all e, and choose argmax_e [W(t,e) - R(t+1,e)]. Here W(t,e) = W(t-1,e) + 1 if e is a winning action at step t, and W(t-1,e) otherwise. This is expensive in time and especially in storage.
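A sketch of the naive update, with W kept as an explicit table holding one entry per action; that table of size n is exactly the storage cost the slide points out. The name naive_step is illustrative.

```python
import random

def naive_step(W, win_prev, eps):
    """Update the explicit table W of win counts with the latest winning set,
    then choose argmax_e of W(t, e) - R(t+1, e) over all n actions."""
    for e in win_prev:                 # W(t, e) = W(t-1, e) + 1 for winning actions
        W[e] += 1
    return max(W, key=lambda e: W[e] - random.expovariate(eps))
```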

  14. Thinning Choose a small positive parameter p. At each step, compute a thinned winning set in which each winning action occurs independently with probability p. Estimate each W(t,e) by V(t,e)/p, where V(t,e) is the number of thinned winning sets containing e among the first t steps. Execute the naive algorithm using these estimates.
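A literal rendering of the thinning step over an explicitly given winning set; in the exponential case the thinned set would instead be produced from the compact representation. The names are illustrative.

```python
import random
from collections import Counter

def thin(win_set, p):
    """Keep each winning action independently with probability p."""
    return {e for e in win_set if random.random() < p}

def estimate_win_counts(thinned_history, p):
    """Estimate W(t, e) by V(t, e) / p, where V(t, e) counts the thinned sets
    among the first t steps that contain e."""
    V = Counter()
    for s in thinned_history:
        V.update(s)
    return {e: count / p for e, count in V.items()}
```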

  15. Intersection Trick Winning sets: WIN(1), WIN(2), ..., WIN(T). In many cases, the intersection of k winning sets has a compact representation supporting counting, uniform sampling and membership testing. One can estimate the fraction of winning sets containing action e up to time t by picking random k-tuples of prior winning sets and estimating the frequency with which the k-fold intersection contains action e.

  16. A Thought Experiment Given an urn filled with balls of many different colors, find all colors that occur with frequency at least 1/r, where r is a positive integer. Solution: repeat as long as possible: discard a set of r balls of distinct colors. At the end of this process, fewer than r colors remain, and every color of frequency greater than 1/r in the original urn is among them.
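The discarding process in code; this is a sketch of the thought experiment (in the style of the Misra-Gries frequent-items algorithm), with illustrative names.

```python
from collections import Counter

def frequent_color_candidates(balls, r):
    """Repeatedly discard one ball of each of r distinct colors until fewer than
    r colors remain.  Every color whose frequency exceeds 1/r survives, because
    there are at most len(balls)/r discarding rounds and each round removes at
    most one ball of any given color."""
    counts = Counter(balls)
    while len(counts) >= r:
        for color in list(counts)[:r]:     # r distinct colors, one ball of each
            counts[color] -= 1
            if counts[color] == 0:
                del counts[color]
    return set(counts)
```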

  17. Finding Frequent Actions with Limited Storage WIN(i) = set of winning actions at time i. Action e is (t,r)-frequent if W(t,e) > (1/r)(|WIN(1)| + ... + |WIN(t)|). Problem: for t = 1, 2, ..., T, find the (t,r)-frequent actions. These are candidates for argmax_e [W(t,e) - R(t+1,e)]. Actions that are not (t,r)-frequent can often be eliminated.

  18. A Streaming Algorithm Process the successive sets WIN(1), WIN(2), ..., WIN(T) action by action. Create sets S(1), S(2), ..., each containing distinct actions and each of cardinality at most r. Place each occurrence of an action in turn into the first set that can accept it, i.e., the first set that does not already contain that action and has fewer than r actions.

  19. Streaming Algorithm (continued) A set S(j) is incomplete if it contains fewer than r distinct actions. For each action occurring in an incomplete set, the algorithm keeps a count of the number of incomplete sets containing it. Lemma: at any point, the number of actions occurring in incomplete sets is less than r. Lemma: if action e is (t,r)-frequent then, after the processing of WIN(1), WIN(2), ..., WIN(t), action e occurs in at least one incomplete set.
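A sketch of the streaming pass described in slides 18-19. The class and method names are illustrative, and each S(j) is kept as an ordinary Python set.

```python
class FrequentActionStream:
    """Place each occurrence of a winning action into the first set S(j) that
    does not already contain it and still has fewer than r distinct actions."""

    def __init__(self, r):
        self.r = r
        self.sets = []                      # S(1), S(2), ...

    def process(self, win_set):
        for e in win_set:
            for S in self.sets:
                if e not in S and len(S) < self.r:
                    S.add(e)
                    break
            else:
                self.sets.append({e})       # open a new set for this occurrence

    def candidates(self):
        """Actions occurring in some incomplete set; by the lemmas, there are
        fewer than r of them and they include every (t, r)-frequent action."""
        out = set()
        for S in self.sets:
            if len(S) < self.r:
                out |= S
        return out

    def incomplete_count(self, e):
        """Number of incomplete sets containing action e."""
        return sum(1 for S in self.sets if len(S) < self.r and e in S)
```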

  20. Estimating the Frequencies of Actions Goal: for each action e, estimate the number of complete sets containing e. Associate with each complete set S(j) a consensus set C(j) as follows. C(1) = S(1). For j = 2, 3, ...: if e is in both S(j) and C(j-1), then e is in C(j); pair the remaining elements of S(j) one-to-one with the remaining elements of C(j-1), and place one action from each pair into C(j), choosing the element from S(j) with probability 1/j and the element from C(j-1) with probability (j-1)/j.
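One step of the consensus-set construction in code, using the probabilities that make the theorem on the next slide come out exactly: 1/j for the element coming from S(j) and (j-1)/j for the element from C(j-1). The function name is illustrative.

```python
import random

def extend_consensus(C_prev, S_j, j):
    """Build C(j) from C(j-1) and the complete set S(j); both hold r distinct
    actions, so the two 'remaining' parts below have equal size and can be paired."""
    kept = C_prev & S_j                       # actions in both survive automatically
    from_S = list(S_j - C_prev)
    from_C = list(C_prev - S_j)
    C_new = set(kept)
    for s, c in zip(from_S, from_C):          # one survivor per pair
        C_new.add(s if random.random() < 1.0 / j else c)
    return C_new
```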

  21. Estimating the Frequencies of Actions (continued) Theorem: the probability that action e lies in C(j) is exactly the number of occurrences of e in the sets S(1), S(2), ..., S(j), divided by j. The construction of the consensus sets C(t) is randomized; by maintaining several consensus sets for each set S(t), one can estimate the number of occurrences of e in the sets S(1), S(2), ..., S(t). Combining this with the identification of the (t,r)-frequent actions, one can reliably estimate argmax_e [W(t,e) - R(t+1,e)] for all t.

  22. Acknowledgement This is ongoing joint work with Evdokia Nikolova of UT Austin. Further work will focus on algorithms for the non-ideal case, in which counting and sampling can only be done approximately, and the non-binary case, in which an action receives a score at each step rather than merely winning or losing.
