
Understanding User Needs Through Data Mining
Explore the potential for personalization in search, using data mining techniques to understand user needs. The talk, based on work published in Transactions on Computer-Human Interaction, shows how to capture variation in user intent and how explicit and implicit indicators of relevance can be used to improve search results.
Presentation Transcript
Potential for Personalization
Transactions on Computer-Human Interaction, 17(1), March 2010
Data Mining for Understanding User Needs
Jaime Teevan, Susan Dumais, and Eric Horvitz
Microsoft Research
CFP Paper
Questions
  How good are search results?
  Do people want the same results for a query?
  How to capture variation in user intent?
    Explicitly
    Implicitly
  How can we use what we learn?
personalization research
  Ask the searcher: Is this relevant?
  Look at the searcher's clicks
  Similarity to content the searcher has seen before
Ask the Searcher
  Explicit indicator of relevance
  Benefits
    Direct insight
  Drawbacks
    Amount of data limited
    Hard to get answers for the same query
    Unlikely to be available in a real system
Searcher's Clicks
  Implicit behavior-based indicator of relevance
  Benefits
    Possible to collect from all users
  Drawbacks
    People click by mistake or get sidetracked
    Biased towards what is presented
Similarity to Seen Content
  Implicit content-based indicator of relevance
  Benefits
    Can collect from all users
    Can collect for all queries
  Drawbacks
    Privacy considerations
    Measures of textual similarity noisy
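The slides do not say how textual similarity is measured. As a minimal sketch, assuming a bag-of-words profile built from content the searcher has already seen and cosine similarity as the measure, a content-based relevance score could look like the following. The function names (`term_counts`, `cosine_similarity`, `content_scores`) are hypothetical and chosen for illustration only.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def content_scores(seen_documents, results):
    """Score each result by its similarity to everything the searcher has seen."""
    # Assumption: the profile is a simple sum of term counts over seen documents.
    profile = Counter()
    for doc in seen_documents:
        profile.update(term_counts(doc))
    return [cosine_similarity(profile, term_counts(r)) for r in results]


seen = ["personalized web search ranking", "search personalization research"]
results = ["research on search personalization", "cheap flights to paris"]
print(content_scores(seen, results))
```

A result that shares no terms with the profile scores zero, which hints at why the slide calls textual similarity noisy: relevant pages that use different vocabulary look unrelated.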
Summary of Data Sets

Indicator             # Users   # Queries   >5 Users   # Instances
Explicit              125       119         17         308
Implicit: Behavior    1.5 M     44 K        44 K       2.4 M
Implicit: Content     59        24          24         822
How Good Are Search Results?
[Figure: normalized gain by result rank (1 to 10) for explicit, behavior, and content data]
  Lots of relevant results ranked low
  Behavior data has presentation bias
  Content data also identifies low results
Do People Want the Same Results?
  What's best for "personalization research"?
    For you?
    For everyone?
  When it's just you, can rank perfectly
  With many people, ranking must be a compromise
Do People Want the Same Results?
[Figure: normalized DCG (0.55 to 1) versus number of people in group (1 to 6) for Group, Individual, and Web rankings, annotated "Potential for Personalization"]
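The idea that one person can be ranked for perfectly while a group ranking must be a compromise can be illustrated with a small sketch. It assumes one common DCG variant (gain divided by log2 of rank + 1) and a simple compromise ranking that orders results by gain summed over group members; neither is confirmed by the slides as the authors' exact method, and the function names and sample gains are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of a ranked list of gains (rank 1 first)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranking, user_gains):
    """DCG of `ranking` for one user, normalized by that user's ideal DCG."""
    ideal = dcg(sorted(user_gains.values(), reverse=True))
    if ideal == 0:
        return 0.0
    return dcg([user_gains.get(doc, 0) for doc in ranking]) / ideal


def group_ranking(group):
    """One ranking for the whole group: order results by gain summed over members."""
    total = {}
    for user_gains in group:
        for doc, gain in user_gains.items():
            total[doc] = total.get(doc, 0) + gain
    # Ties are broken by document id so the output is deterministic.
    return sorted(total, key=lambda doc: (-total[doc], doc))


def average_group_ndcg(group):
    """Average normalized DCG the shared ranking achieves across group members."""
    ranking = group_ranking(group)
    return sum(ndcg(ranking, g) for g in group) / len(group)


# Hypothetical relevance gains for two searchers issuing the same query.
alice = {"a": 2, "b": 0, "c": 1}
bob = {"a": 0, "b": 2, "c": 1}
print(average_group_ndcg([alice]))       # group of one: ranking is ideal
print(average_group_ndcg([alice, bob]))  # shared ranking is a compromise
```

With a group of one the shared ranking is that person's ideal ranking, so normalized DCG is 1. As members with different preferences are added the average drops, and the distance between the two values is one way to read the "Potential for Personalization" annotation in the figure.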
How to Capture Variation?
[Figure: normalized DCG (0.55 to 1) versus number of people in group (1 to 6) for explicit, behavior, and content data]
  Behavior gap smaller because of presentation bias
  Content data shows more variation than explicit judgments
How to Use What We Have Learned?
  Identify ambiguous queries
  Solicit more information about need
  Personalize search
    Using content- and behavior-based measures
[Figure: normalized DCG (0.52 to 0.6) for Web and Personalized rankings, plotted over a 0 to 1 axis running from Content to Behavior]
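The 0 to 1 axis running from Content to Behavior suggests a mixing weight between the two kinds of score. The slides do not give the combination formula, so the following is only a sketch under the assumption of a linear interpolation; `blend_scores`, `personalized_ranking`, and the sample scores are hypothetical.

```python
def blend_scores(content, behavior, weight):
    """Linear mix: weight=0 uses only content scores, weight=1 only behavior scores."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    docs = content.keys() | behavior.keys()
    return {
        doc: (1 - weight) * content.get(doc, 0.0) + weight * behavior.get(doc, 0.0)
        for doc in docs
    }


def personalized_ranking(content, behavior, weight):
    """Rank results by blended score, highest first (ties broken by document id)."""
    scores = blend_scores(content, behavior, weight)
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical per-result scores for one searcher and one query.
content = {"a": 0.9, "b": 0.1, "c": 0.5}
behavior = {"a": 0.0, "b": 1.0, "c": 0.4}
for w in (0.0, 0.3, 1.0):
    print(w, personalized_ranking(content, behavior, w))
```

Sweeping the weight and measuring normalized DCG at each setting would produce a curve of the kind shown in the figure, with the endpoints corresponding to content-only and behavior-only ranking.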
Answers
  Lots of relevant content ranked low
  Potential for personalization high
  Implicit measures capture explicit variation
    Behavior-based: Highly accurate
    Content-based: Lots of variation
  Example: Personalized Search
    Behavior + content work best together
    Improves search result click through
Potential for Personalization THANK YOU!