Explore Refresh Policy and Page Freshness Optimization

cs246 refresh policy n.w
1 / 27
Embed
Share

Dive into the world of maintaining website freshness with a focus on optimizing page refresh policies based on change frequencies. Discover ideas on formalizing freshness metrics and tackling computational problems to enhance user experience through updated content.

  • Website Freshness
  • Page Optimization
  • Change Frequencies
  • Computational Problems
  • Web Content

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. CS246: Refresh Policy Junghoo John Cho UCLA

  2. Ideas? Q: How can we maintain pages fresh ? What ideas can we explore to refresh pages well? Idea 1: Some pages change often, some pages do not. News archive vs daily news article Q: Can we do things differently depending on how often they change? Idea 2: A set of pages change together Java manual pages Q: Can we do something when we notice that some pages change together? Q: How can we formalize these ideas as a computational problem? 2

  3. Idea 1: Leveraging Change Frequencies Q: What is the idea? How can we formalize the idea? A: Given different change frequencies of pages, come up with a refresh policy that maximizes the freshness of our pages Page refresh policy as optimization problem Given changes frequencies ?1,?2, ,??of pages ?1,?2, ,??, find a refresh policy that maximizes the freshness of downloaded pages Q: What is a refresh policy? What does freshness mean? What does the change frequency ?? of page ?? mean? 3

  4. Change Metric: Formalizing Freshness Freshness of page ?? at time ? is: web collection pi ? ??;? = 1 if ?? is up to date at time ? 0 otherwise pi ... ... Freshness of our collection ? at time ? is: ? ? ?;? =1 ? ?=1 ?(??;?) Q: Why binary metric? Any other way to define freshness ? 4

  5. Change Metric: Formalizing Freshness Age of page ?? at time ? is: web collection pi ? ??;? = 0 if ?? is up to date at time ? (? modification time) otherwise pi ... ... Age of our collection ? at time ? is: ? ? ?;? =1 ? ?=1 ?(??;?) Q: But both change over time! What time should we optimize for? 5

  6. Change Metrics Over Time Time averages: F(pi) ? 1 ? 1 ? 1 ? ?? = lim ? ??;? ?? ? 0 0 ? time ? ? = lim ? ?;? ?? ? 0 A(pi) ? 1 ? ? ?? = lim ? ??;? ?? 0 ? 0 time ? 1 ? ? ? = lim ? ?;? ?? refresh update ? 0 6

  7. Page Refresh Policy as Optimization Problem Given changes frequencies ?1,?2, ,??of pages ?1,?2, ,??, find a refresh policy that maximizes the freshness (or age) of the collection ?: ? ? = ? ?=1 Q: What does change frequency ?? mean? For example, if ??= once/day, how does the page change? Exactly once a day? At what time? How much predictability? How can we model web page changes given ??? 1 ? ?(??) 7

  8. Modeling Page Change Poisson process with rate . Memoryless, independent Q: But what do they exactly mean? Simulating Poisson process Split time into very small intervals Flip a coin at each interval with the same probability p: ~ p Under Poisson process with rate ?, if ? is time to the next event, the p.d.f. of ? is: ?(?) = ?? ??(? > 0) Q: Is this really a good model for Web page change? 8

  9. Web Evolution Experiment February 17 to June 24, 1999 270 sites visited (with permission) identified 400 sites with highest page rank contacted administrators 720,000 pages collected 3,000 pages from each site daily start at root, visit breadth first (get new & old pages) 9

  10. Average Change Interval fraction of pages 10

  11. Average Change Interval By Domain fraction of pages 11

  12. Change Interval of Pages for pages that change every 10 days on average fraction of changes with given interval Poisson model interval in days 12

  13. Poisson Model Seems to be good for the pages whose change frequencies are in the range of once every week through once every month The range can could be verified using the available dataset Q: But Is the Poisson model correct in general? 13

  14. Question on Poisson Model 14

  15. Back to Page Refresh Policy as Optimization Problem Given pages ?1,?2, ,?? whose changes follow Poisson processes of rate ?1,?2, ,??, respectively, find a refresh policy that maximizes the time-averaged freshness (or age) of the collection ?: ? ? = ? ?=1 Q: What is the refresh policy ? What things do we need to decide for a page refresh policy ? 1 ? ?(??) 15

  16. Refresh Policies Refresh order In what order? Refresh frequency How often to refresh? Refresh points Exactly when to refresh? Resource allocation How often each page? 16

  17. Refresh Order Fixed order Example: Explicit list of URLs to visit Random Order Example: Start from seed URLs & follow links Purely Random Example: Refresh pages on demand, as requested by user Let us assume that all pages change at the same rate ?, we refresh pages at the same rate ? and see how freshness (or age) change depending on the choice of refresh order 17

  18. Refresh Order and Refresh Frequency r = / f = change frequency / refresh frequency 18

  19. Refresh Order and Refresh Frequency = Age / time to refresh all N elements r = / f = change frequency / refresh frequency 19

  20. Refresh Order and Refresh Frequency Fixed order is optimal Given change frequency ? and refresh frequency ?, the time averaged freshness and age are: ? ? = ? ?,? =1 ? ?/? ?/? ? ?,? =1 ? ?+1 ? ?/? ?/?2 1 2 ? ? ? = 20

  21. Resource Allocation Two page collection p1 changes daily p2 changes once a week Can visit pages once a week How should we visit pages? p1 p1 p1 p1 p1 p1 ... p2 p2 p2 p2 p2 p2 ... p1 p2 p1 p2 p1 p2... [uniform] p1 p1 p1 p1 p1 p1 p1 p2 p1 ... [proportional] ? p1 p1 p2 p2 web collection 21

  22. Proportional Often Not Good! Visit fast changing p1 get 1 day of freshness Visit slow changing p2 get 1 week of freshness Visiting p2 is a better deal! Q: What is the best resource-allocation policy? 22

  23. Resource Allocation as Optimization Problem Given change rates ?1,?2, ,?? and average refresh rate ?, find the optimal individual refresh rates ?1,?2, ,?? (with 1 maximize the time-averaged freshness (or age) of the collection ?: ? ? ?=1 ??= ? )that 1 ? ??/?? ??/?? 1 ? ?=1 ? ??,?? =1 ? ? ? ? = ? ?=1 Q: How can we solve the optimization problem? A: The function has derivatives! Use gradient descent (with lagrange multiplier method to take into account the constraint 1 ? ? ?=1 ??= ?) 23

  24. Selecting Optimal Refresh Frequency Shape of curve is the same in all cases Holds for any change frequency distribution 24

  25. Optimal Refresh Frequency for Age Shape of curve is the same in all cases Holds for any change frequency distribution 25

  26. Proportional vs Uniform We can mathematically prove that uniform is always better than proportional Assuming every page changes Proportional allocates resources to pages that change too often Proof is based on the concavity of freshness/age curve 26

  27. Comparing Policies Freshness Age Proportional Uniform Optimal 0.12 0.57 0.62 400 days 5.6 days 4.3 days 27

More Related Content