
Learning Probabilistic Relational Models in Web Mining Agents
Explore the world of probabilistic relational models in the realm of web mining agents with insights from experts in the field. Discover the power of combining relational logic and Bayesian networks to model uncertainty and relationships in data effectively.
Uploaded on | 0 Views
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Web-Mining Agents Learning Probabilistic Relational Models Prof. Dr. Ralf M ller Dr. zg r L. z ep Universit t zu L beck Institut f r Informationssysteme Tanya Braun (Exercises)
Slides taken from the presentation (subset only) Learning Statistical Models From Relational Data Lise Getoor University of Maryland, College Park Includes work done by: Nir Friedman, Hebrew U. Daphne Koller, Stanford Avi Pfeffer, Harvard Ben Taskar, Stanford
Outline Motivation and Background PRMs w/ Attribute Uncertainty PRMs w/ Link Uncertainty PRMs w/ Class Hierarchies
Discovering Patterns in Structured Data Strain Contact Patient Treatment
Learning Statistical Models Contact Patient Traditional approaches work well with flat representations fixed length attribute-value vectors assume independent (IID) sample flatten Problems: introduces statistical skew loses relational structure incapable of detecting link-based patterns must fix attributes in advance
Probabilistic Relational Models Combine advantages of relational logic & Bayesian networks: natural domain modeling: objects, properties, relations; generalization over a variety of situations; compact, natural probability models. Integrate uncertainty with relational model: properties of domain entities can depend on properties of related entities; uncertainty over relational structure of domain.
Relational Schema Author Review Mood Length Good Writer Smart Paper Quality Accepted Has Review Author of Describes the types of objects and relations in the database
Probabilistic Relational Model Review Author Smart Mood Good Writer Length Paper Quality Accepted
Probabilistic Relational Model Review Author Smart Mood Good Writer Length Paper Paper.Accepted | Quality Paper.Quality, P Accepted Paper.Review.Mood
Probabilistic Relational Model Review Author Smart Mood Good Writer Length Paper P(A | Q, M) Q M , f f , 1 . 0 9 . 0 Quality f t , 2 . 0 8 . 0 Accepted t f , 6 . 0 4 . 0 t t , 7 . 0 3 . 0
Relational Skeleton Primary Keys Paper P1 Author: A1 Review: R1 Author A1 Review R1 Paper P2 Author: A1 Review: R2 Review R2 Author A2 Review R2 Paper P3 Author: A2 Review: R2 Foreign Keys Fixed relational skeleton : set of objects in each class relations between them
PRM w/ Attribute Uncertainty Paper P1 Author: A1 Review: R1 Quality Author A1 Review R1 Smart Mood Good Writer Accepted Length Paper P2 Author: A1 Review: R2 Quality Author A2 Review R2 Mood Smart Length Good Writer Accepted Paper P3 Author: A2 Review: R2 Quality Review R3 Mood Length Accepted PRM defines distribution over instantiations of attributes
A Portion of the BN P(A | Q, M) P(A | Q, M) Q Q M M , f , f f r2.Mood Bad P2.Quality Low f f , , 1 . 0 1 . 0 9 . 0 9 . 0 P2.Accepted f t t , , 2 . 0 2 . 0 8 . 0 8 . 0 t t f f , , 6 . 0 6 . 0 4 . 0 4 . 0 r3.Mood P3.Quality t t t t , , 7 . 0 7 . 0 3 . 0 3 . 0 P3.Accepted
A Portion of the BN P(A | Q, M) Q M , f r2.Mood Bad P2.Quality Low f , 1 . 0 9 . 0 f t , 2 . 0 8 . 0 P2.Accepted t f , 6 . 0 4 . 0 r3.Mood Bad P3.Quality High t t , 7 . 0 3 . 0 P3.Accepted
PRM: Aggregate Dependencies Paper Review Mood Quality Length Accepted Review R1 Mood Review R2 Length Mood Paper P1 Quality Review R3 Length Mood Accepted Length
PRM: Aggregate Dependencies Paper Review Mood Quality Length Accepted P(A | Q, M) Q M , f Review R1 f , 1 . 0 9 . 0 Mood f t , 2 . 0 8 . 0 Review R2 t f , 6 . 0 4 . 0 Length Mood Paper P1 t t , 7 . 0 3 . 0 Quality Review R3 Length mode Mood Accepted sum, min, max, avg, max-occurrence, count Length
PRM with AU Semantics Author Review Author A1 Paper R1 Paper P1 Review Author A2 R2 Review Paper P2 Review R3 Paper P3 relational skeleton PRM + = probability distribution over completions I: x , S = P I P x A parents x A ( | , ) ( . | ( . )) S , x A . Objects Attributes
Learning PRMs w/ AU Strain Database Patient Contact PRM PRM Strain Patient Contact Parameter estimation Structure selection Relational Schema
Parameter Estimation in PRMs Assume known dependency structure S Goal: estimate PRM parameters entries in local probability models, . | ( . ) x A parents x A is good if it is likely to generate the observed data, instance I . log ) , : ( = S l I I ( | , ) P S MLE Principle: Choose so as to maximize l
Learning PRMs w/ AU Author Database Paper Review PRM PRM Author Paper Review Parameter estimation Structure selection Relational Schema
ML Parameter Estimation Review Mood Length Paper Quality P(A | Q, M) ? ? ? ? Q , f M ? ? ? ? , f Accepted f , t t , f t , t N P N Q . R , M P , A . . = = P Q . R , M . N where is the number of accepted, low quality papers whose reviewer was in a poor mood P Q . R M . P A , , .
ML Parameter Estimation Review Mood Length Paper Quality P(A | Q, M) ? ? ? ? Q , f M ? ? ? ? , f Accepted f , t t , f t , t N P N Q . R , M P , A . . = = P Q . R , M . Query for counts: . p Review table Paper table Count ? ?P.Q =q R.M = m P.A = m P Quality Mood . R . P Accepted
Structure Selection Idea: define scoring function do local search over legal structures Key Components: legal models scoring models searching model space
Structure Selection Idea: define scoring function do local search over legal structures Key Components: legal models scoring models searching model space
Legal Models PRM defines a coherent probability model over a skeleton if the dependencies between object attributes are acyclic (prop. BN). Paper author-of P1 Researcher Prof. Gump Reputation high Accepted yes Paper P2 sum Accepted yes How do we guarantee that a PRM is acyclic for every skeleton?
Attribute Stratification PRM dependency graph dependency structure S Paper.Accecpted if Researcher.Reputation depends directly on Paper.Accepted Researcher.Reputation Attribute stratification: dependency graph acyclic acyclic for any Algorithm more flexible; allows certain cycles along guaranteed acyclic relations
(Father) (Mother) Person Person Blood Type Blood Type P-chromosome P-chromosome M-chromosome M-chromosome Person P-chromosome M-chromosome Blood Type Contaminated Blood Test Result
Structure Selection Idea: define scoring function do local search over legal structures Key Components: legal models scoring models searching model space
Scoring Models Bayesian approach: marginal likelihood P I prior ( P I I = ( : ) log ( | ) log[ ( | ) ] ) Score S P S S S Standard approach to scoring models; used in Bayesian network learning
Structure Selection Idea: define scoring function do local search over legal structures Key Components: legal models scoring models searching model space
Searching Model Space Phase 0: consider only dependencies within a class Author Review Paper Author Review Paper Potential = e Parents ( R . A ) R . B descriptiv attributes R . B ( R ) Author Review Paper
Phased Structure Search Phase 1: consider dependencies from neighboring classes, via schema relations Author Review Paper Author Review Parents Paper Potential = attributes e ( R . A ) S . C descriptiv S . C ( R S ) Author Review Paper
Phased Structure Search Phase 2: consider dependencies from further classes, via relation chains Author Review Paper Author Potential Review Paper Parents = attributes ( R . A ) T . D descriptiv T . D e ( R S T ) Author Review Paper
Issue PRM w/ AU applicable only in domains where we have full knowledge of the relational structure Next we introduce PRMs which allow uncertainty over relational structure
Kinds of structural uncertainty How many objects does an object relate to? how many Authors does Paper1 have? Which object is an object related to? does Paper1 cite Paper2 or Paper3? Which class does an object belong to? is Paper1a JournalArticleor aConferencePaper? Does an object actually exist? Are two objects identical?
Structural Uncertainty Motivation: PRM with AU only well-defined when the skeleton structure is known May be uncertain about relational structure itself Construct probabilistic models of relational structure that capture structural uncertainty Mechanisms: Reference uncertainty Existence uncertainty Number uncertainty Type uncertainty Identity uncertainty
PRMs w/ Link Uncertainty Advantages: Applicable in cases where we do not have full knowledge of relational structure Incorporating uncertainty over relational structure into probabilistic model can improve predictive accuracy Two approaches: Reference uncertainty Existence uncertainty Different probabilistic models; varying amount of background knowledge required for each
Citation Relational Schema Author Institution Research Area Wrote Paper Topic Word1 Word2 Paper Topic Word1 Word2 Cites Count Citing Paper WordN Cited Paper WordN
Attribute Uncertainty Author Institution P( Institution | Research Area) Research Area Wrote P( Topic | Paper.Author.Research Area Paper Topic P( WordN | Topic) ... Word1 WordN
Reference Uncertainty Bibliography 1. ----- 2. ----- 3. ----- ? ? ? ` Scientific Paper Document Collection
PRM w/ Reference Uncertainty Paper Topic Words Paper Topic Words Cites Cited Citing Dependency model for foreign keys Na ve Approach: multinomial over primary key noncompact limits ability to generalize Use attribute partition instead
Reference Uncertainty Example Paper Paper Paper Paper Paper P5 C1 Paper P5 Topic AI P4 P3 M2 Paper P4 Topic Theory Paper P1 Topic Theory Topic Topic Topic Topic Topic Theory Paper P2 Topic Theory AI P1 AI AI AI Paper P3 Topic AI C2 Paper.Topic = AI Paper.Topic = Theory Cites Citing Cited C1 C2 3 . 0 7 . 0
Reference Uncertainty Example Paper Paper Paper Paper Paper P1 Topic Theory P5 P4 C1 Paper P5 Topic AI Paper Paper P4 Topic Theory Paper P1 Topic Theory P3 M2 Topic Topic Topic Topic Paper P2 Topic Theory AI AI AI AI P3 Topic AI C2 Paper.Topic = AI Paper.Topic = Theory Paper Topic Words C1 1 . 0 . 0 C2 9 . 0 . 0 Topic Cites Citing Cited C1 3 . 0 C2 7 . 0 Theory 99 01 AI
Introduce Selector RVs P2.Topic Cites1.Selector P3.Topic Cites1.Cited P1.Topic P4.Topic Cites2.Selector P5.Topic Cites2.Cited P6.Topic Introduce Selector RV, whose domain is {C1,C2} The distribution over Cited depends on all of the topics, and the selector
PRMs w/ RU Semantics Paper Paper P2 Topic Theory P3 Topic AI Paper Paper P2 Topic Theory P3 Topic AI Paper P5 Topic AI Topic Theory Topic ??? Paper P5 Topic AI Topic Theory Topic ??? Paper P4 Paper P1 Topic Words Paper P4 Paper P1 Paper Paper Topic Words Cites Reg Reg Cited Citing Reg Reg Cites PRM RU entity skeleton PRM-RU + entity skeleton probability distribution over full instantiations I
Learning PRMs w/ RU Idea: just like in PRMs w/ AU define scoring function do greedy local structure search Issues: expanded search space construct partitions new operators
Learning PRMs w/ RU Idea: define scoring function do phased local search over legal structures Key Components: legal models model new dependencies scoring models unchanged searching model space new operators
Legal Models Review Mood Paper Paper Important Important Accepted Cites Accepted Citing Cited
Legal Models Cites1.Selector Cites1.Cited P2.Important R1.Mood P3.Important P1.Accepted P4.Important When a node s parent is defined using an uncertain relation, the reference RV must be a parent of the node as well.
Structure Search Paper Paper Paper Paper Paper Paper Topic Words Paper Topic Words Paper Paper Paper Paper Paper Cites Citing Cited Author Institution Paper Cited