Understanding Machine Learning Ranking Algorithms

Explore the world of machine learning ranking algorithms, including classification, regression, and pointwise, pairwise, and listwise ranking methods. Learn about evaluating rankings, training data for ranking models, and the RankNet algorithm. Discover the intricacies of ranking systems and how they play a crucial role in various applications like search results, ad targeting, and movie recommendations.

zany_98

Uploaded on May 09, 2025 | 1 Views

Presentation Transcript

ML Design Pattern: Ranking
Geoff Hulten

Setup for a ranking system
Goal of classification: find the correct label.
Goal of regression: predict the correct number.
Goal of ranking: sort samples in the correct order.
Pointwise (regression for a relevance score): f(query, item) = relevance
Pairwise (which response is better): f(query, item1, item2) = P(item1 is better than item2)
Listwise (1-to-N ranking): f(query, [items]) = [ranked items]
Reasons for ranking: search results, ad targeting, movie recommendation, skills for an assistant, designs, digital marketplace.
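The three formulations on this slide differ mainly in what the model function takes and returns. The sketch below illustrates those shapes only. The word-overlap scorer and all function names are hypothetical stand-ins for a trained model and do not come from the presentation.

```python
from typing import List


def pointwise_score(query: str, item: str) -> float:
    """Pointwise: f(query, item) = relevance.

    Toy relevance (an assumption): fraction of query words found in the item.
    """
    words = query.lower().split()
    text = item.lower()
    return sum(w in text for w in words) / len(words) if words else 0.0


def pairwise_prefers(query: str, item1: str, item2: str) -> float:
    """Pairwise: f(query, item1, item2) = P(item1 is better than item2)."""
    s1 = pointwise_score(query, item1)
    s2 = pointwise_score(query, item2)
    if s1 == s2:
        return 0.5  # no preference either way
    return 1.0 if s1 > s2 else 0.0


def listwise_rank(query: str, items: List[str]) -> List[str]:
    """Listwise: f(query, [items]) = [ranked items], best first."""
    return sorted(items, key=lambda item: pointwise_score(query, item), reverse=True)


if __name__ == "__main__":
    candidates = ["blue hat", "red shoes on sale", "red hat"]
    print(listwise_rank("red shoes", candidates))
```

Here the listwise ranker simply sorts by the pointwise score. A real listwise model would typically be trained against a list-level objective instead.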

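Ranking Flow
[Diagram: the user interacts through a user experience (text, speech, movie, product, content). Query triggering issues a query to query interpretation and planning (topic, parameters, augmentation), which sends queries to the query engines (task-specific solutions, domain-specific answers, focused or legacy indexes). The engines return the top N answers to the ranker, which uses history, user, context, etc. to maximize the objective and returns the best answers to the user experience.]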

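One way of evaluating ranking: Mean Average Precision
$\mathrm{AvgP}(q) = \frac{\sum_{k=1}^{n} \mathrm{Precision@}k \cdot \mathrm{Relevant}(k)}{\#\mathrm{relevant}}$, where $n$ is the number of results.
$\mathrm{MAP} = \frac{\sum_{q \in Q} \mathrm{AvgP}(q)}{|Q|}$
Example for one query where the A answers are the relevant ones.
Possible answers with scores from the ranking algorithm: A .5, A .9, B .8, B .2, C .4, C .7
Ranked answers, with (Precision@k, Relevant(k)):
1. A (1.0, 1)
2. B (0.5, 0)
3. C (0.33, 0)
4. A (0.5, 1)
5. C (0.4, 0)
6. B (0.33, 0)
AvgP = (1.0 + 0.5) / 2 = 1.5 / 2 = 0.75
If the ranker put both As at the top: AvgP = 1.
If the ranker put both As at the end: AvgP = (1/5 + 2/6) / 2 ≈ 0.27.
Many other options: Mean Reciprocal Rank, Precision@k, clickthrough rate, user outcomes, and more.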
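The average-precision arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and it assumes every relevant answer appears somewhere in the ranked list, so the number of relevant answers can be counted from the list itself.

```python
from typing import Sequence


def average_precision(relevant: Sequence[int]) -> float:
    """Average precision for one ranked list.

    relevant[k-1] is 1 if the answer at rank k is relevant, else 0.
    """
    hits = 0
    total = 0.0
    for k, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            total += hits / k  # Precision@k, counted only at relevant ranks
    return total / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[int]]) -> float:
    """MAP: the mean of average precision over all queries."""
    return sum(average_precision(q) for q in queries) / len(queries)


if __name__ == "__main__":
    # Ranked order from the slide: A, B, C, A, C, B (only A is relevant).
    print(average_precision([1, 0, 0, 1, 0, 0]))  # 0.75
    # Both relevant answers ranked at the top.
    print(average_precision([1, 1, 0, 0, 0, 0]))  # 1.0
```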

Ranking algorithm sketch (RankNet)
Training data: a set of queries, each with labeled items: Query, {<item_1, rel_1>, ..., <item_n, rel_n>}
$P_{ij} \equiv P(\mathrm{item}_i \rhd \mathrm{item}_j) = \frac{1}{1 + e^{-(s_i - s_j)}}$, where $s_i$ is the model's score for item $i$.
$C = -\bar{P}_{ij} \log P_{ij} - (1 - \bar{P}_{ij}) \log(1 - P_{ij})$, where $\bar{P}_{ij}$ is 1 if $i$ should rank above $j$, else 0.
While not converged:
  Iterate over training data
  Apply current model to all items
  For every pair of items i, j: adjust the model weights to make them more correctly ordered
Source: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/MSR-TR-2010-82.pdf
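The training loop on this slide can be sketched in NumPy under some simplifying assumptions that are not in the presentation: a linear scoring model s = w · x instead of a neural network, a single query, plain batch gradient descent, and illustrative values for the learning rate and epoch count. The name `ranknet_train` is hypothetical.

```python
import numpy as np


def ranknet_train(features, labels, lr=0.1, epochs=200):
    """Fit linear scoring weights with a RankNet-style pairwise loss.

    features: array of shape (n_items, n_features) for one query.
    labels: relevance label per item (higher means should rank higher).
    """
    features = np.asarray(features, dtype=float)
    n_items, n_features = features.shape
    w = np.zeros(n_features)
    for _ in range(epochs):
        scores = features @ w  # apply the current model to all items
        grad = np.zeros(n_features)
        for i in range(n_items):
            for j in range(n_items):
                if labels[i] <= labels[j]:
                    continue  # keep only pairs where i should rank above j
                # P_ij = 1 / (1 + exp(-(s_i - s_j)))
                p_ij = 1.0 / (1.0 + np.exp(-(scores[i] - scores[j])))
                # Cross-entropy with target 1: dC/d(s_i - s_j) = p_ij - 1
                grad += (p_ij - 1.0) * (features[i] - features[j])
        w -= lr * grad  # nudge weights toward a more correct ordering
    return w


if __name__ == "__main__":
    x = np.array([[1.0, 0.2], [0.5, 0.9], [0.1, 0.4]])
    y = [2, 1, 0]
    weights = ranknet_train(x, y)
    print(np.argsort(-(x @ weights)))  # item indices, best first
```

Only pairs with a target of 1 are visited, because each such pair already covers the reversed pair with a target of 0.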

Getting training data for ranking models
Corpus centric: sample queries from the system and:
  Pay labelers to find relevant (good) answers
  Pay labelers to grade the responses the system gives
  Do active learning
Closed loop:
  Record the interactions users have: clickthrough rate, outcomes they achieve
  Explore for ranking training: ε-greedy, show a random answer ε percent of the time
  Explore for query engine training: ε2-greedy, let a random engine show random answers some percent of the time
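The ε-greedy exploration mentioned on this slide can be sketched as follows. The function name and signature are hypothetical, and ε is treated here as a probability between 0 and 1 rather than a percentage.

```python
import random
from typing import List, Optional


def epsilon_greedy_answer(
    ranked_answers: List[str],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the answer to show the user.

    With probability epsilon, show a uniformly random answer so that
    interactions with answers the ranker would not pick get recorded.
    Otherwise show the ranker's top answer.
    """
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.choice(ranked_answers)  # explore
    return ranked_answers[0]  # exploit the current ranking


if __name__ == "__main__":
    answers = ["best", "second", "third"]
    shown = [epsilon_greedy_answer(answers, 0.1) for _ in range(10)]
    print(shown)
```

A small ε keeps most users on the ranker's best answer while still collecting some feedback on the alternatives.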

Where the Models Live
[Diagram: the flow from the Ranking Flow slide (user experience, query triggering, query interpretation and planning, query engines, top N answers, ranker, best answers), annotated with the concerns at each stage.]
Query triggering: latency, privacy, data cost.
Query engines: large data indexes, analysis and interpretation.
Ranker: history, user, context, query results, etc.

What does it matter where intelligence lives?
Latency in updating: quality is evolving quickly; the problem is evolving quickly; risk of costly mistakes.
Latency in execution: slowing the experience; the right answer changes too fast.
Cost of operation: cost of distributing intelligence; cost of executing intelligence.
Offline operation: work without Internet?
Keep it out of abusers' hands.
Privacy.

Where Intelligence Lives
Scenario: 1 MB model, daily update, 100k users, 100 KB/call, 10 calls/day.
Lives in service: intelligence creation sends the model to the server (1 MB x 1); clients call the server (100 KB x 10 x 100,000). Total: 100,001 MB + compute.
Lives on client: intelligence creation sends the model to every client (1 MB x 100,000); clients send telemetry to the server. Total: 100,000 MB + telemetry.
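The daily data-transfer totals on this slide follow from simple arithmetic, reproduced below. The constant and function names are illustrative, and the conversion assumes 1 MB = 1,000 KB, which is what the slide's totals imply.

```python
MODEL_MB = 1          # model size, updated once per day
USERS = 100_000       # number of clients
CALL_KB = 100         # data per service call
CALLS_PER_DAY = 10    # calls per client per day
KB_PER_MB = 1000      # decimal units, matching the slide's totals


def service_daily_mb() -> int:
    """Model lives in the service: ship one model copy, pay for every call."""
    model_traffic = MODEL_MB * 1
    call_traffic = CALL_KB * CALLS_PER_DAY * USERS // KB_PER_MB
    return model_traffic + call_traffic  # compute cost comes on top


def client_daily_mb() -> int:
    """Model lives on the client: ship one model copy to every client."""
    return MODEL_MB * USERS  # telemetry comes on top


if __name__ == "__main__":
    print(service_daily_mb())  # 100001
    print(client_daily_mb())   # 100000
```

In this scenario the two options move almost the same amount of data, so the choice turns on the other costs: server compute on one side, telemetry on the other.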

Places Intelligence Can Live
Where it lives      Latency in updating   Latency in execution   Cost of operation      Offline?
Static in product   Poor                  Excellent              Cheap                  Yes
Client side         Variable              Excellent              Based on update rate   Yes
Server-centric      Good                  Internet roundtrip     Can be high            No
Back end            Variable              Variable               Variable               Partial
Hybrid              ??                    ??                     ??                     ??

Where the Models Live
Query triggering: client side. Reduces server traffic and preserves user privacy.
Query engines: hybrid. Back end: cache common queries. Server-centric: tail queries.
Ranker: server-centric if heavily focused on query-engine confidence; client side if simple and heavily user sensitive.

Deploying and Lighting Up (Online Evaluation)
Single deployment: all users see all updates at once. Simple. Relies on great offline tests. Drawback: risk of costly or hard-to-find mistakes.
Controlled rollout: several versions live at once, transition slowly. Lets you observe user interactions. Overhead to build and manage. Drawback: adds latency.
Silent intelligence: run two versions at once. Ensures online is the same as offline. Gives time to see new contexts. Drawbacks: latency; no interactions.
Flighting intelligence (A/B test): deploy options, track until one is better. Connects accuracy to the true objective. Overhead to build and manage. Drawbacks: latency; hard to confirm small gains.
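Controlled rollouts and flighting both need a stable way to decide which users see which version. One common approach, shown here as a sketch and not as something the presentation specifies, is to hash the user ID into a bucket. The function name, the experiment label, and the flight names are all hypothetical.

```python
import hashlib


def assign_flight(user_id: str, experiment: str, treatment_fraction: float) -> str:
    """Deterministically assign a user to 'treatment' or 'control'.

    The same user always lands in the same bucket for a given experiment,
    so the experience stays consistent across sessions.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform-ish value in [0, 1)
    return "treatment" if bucket < treatment_fraction else "control"


if __name__ == "__main__":
    for user in ["alice", "bob", "carol"]:
        print(user, assign_flight(user, "new-ranker", 0.1))
```

Because the bucket is fixed per user, raising `treatment_fraction` over time grows the treatment group without moving anyone back to control, which suits a slow transition.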

Summary
Choosing where your models live can have a large impact on cost and effectiveness. Options include static, client side, server-centric, back end, and hybrid.
Ranking sorts possible responses in the correct order based on a query.
Evaluation metrics for ranking include Mean Average Precision, Mean Reciprocal Rank, Precision@k, clickthrough rate, outcomes, and more.
Training data for ranking can be gathered corpus-centric or with a closed loop.
