Rocchio's Algorithm in Information Retrieval

Discover the intricacies of Rocchio's algorithm in information retrieval, its relevance in feedback systems, and its variants. Explore how it differs from Naive Bayes and its application in document processing experiments. Gain insights into the process of mapping words to document frequency and learning algorithms based on Rocchio's principles.

  • Information Retrieval
  • Rocchio's Algorithm
  • Relevance Feedback
  • Document Processing
  • Naive Bayes




Presentation Transcript


  1. Rocchio's Algorithm

  2. Motivation. Naive Bayes is unusual as a learner: it needs only one pass through the data, and the order of the data doesn't matter.

  3. Rocchio's algorithm. Rocchio, "Relevance Feedback in Information Retrieval", in The SMART Retrieval System: Experiments in Automatic Document Processing, 1971, Prentice-Hall Inc.

  4. Rocchio's algorithm (many variants of these formulae):
     DF(w) = # different docs w occurs in
     TF(w,d) = # different times w occurs in doc d
     IDF(w) = |D| / DF(w)
     u(w,d) = log(TF(w,d)+1) * log(IDF(w))
     u(d) = <u(w1,d), ..., u(w|V|,d)>
     u(y) = a * (1/|Cy|) * sum over d in Cy of u(d)/||u(d)||2  -  b * (1/|D-Cy|) * sum over d' in D-Cy of u(d')/||u(d')||2
     f(d) = argmax_y (u(d)/||u(d)||2) . (u(y)/||u(y)||2),  where ||u||2 = sqrt(sum_i u_i^2)
     Since u(w,d) = 0 for words not in d, store only the non-zeros in u(d), so its size is O(|d|). But the size of u(y) is O(|V|).

  5. Rocchio's algorithm. Given a table mapping w to DF(w), we can compute v(d) from the words in d, and the rest of the learning algorithm is just adding:
     DF(w) = # different docs w occurs in
     TF(w,d) = # different times w occurs in doc d
     IDF(w) = |D| / DF(w)
     u(w,d) = log(TF(w,d)+1) * log(IDF(w))
     u(d) = <u(w1,d), ..., u(w|V|,d)>,  v(d) = u(d)/||u(d)||2 = <v(w1,d), ...>
     u(y) = a * (1/|Cy|) * sum over d in Cy of v(d)  -  b * (1/|D-Cy|) * sum over d' in D-Cy of v(d'),  v(y) = u(y)/||u(y)||2
     f(d) = argmax_y v(d) . v(y)
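As a concrete single-machine illustration of the two slides above, here is a minimal Python sketch of Rocchio training and classification. The function and variable names are mine, not from the slides; a and b are the slide's mixing weights (b = 0 is a common variant).

```python
import math
from collections import Counter, defaultdict

def train_rocchio(docs, labels, a=1.0, b=0.25):
    """docs: list of token lists; labels: parallel list of class labels."""
    n_docs = len(docs)                                   # |D|
    df = Counter(w for d in docs for w in set(d))        # DF(w): one scan

    def v(d):
        # u(w,d) = log(TF(w,d)+1) * log(IDF(w)),  IDF(w) = |D|/DF(w)
        tf = Counter(w for w in d if w in df)            # ignore unseen words
        u = {w: math.log(c + 1) * math.log(n_docs / df[w]) for w, c in tf.items()}
        norm = math.sqrt(sum(x * x for x in u.values())) or 1.0
        return {w: x / norm for w, x in u.items()}       # v(d) = u(d)/||u(d)||2

    sums = defaultdict(Counter)   # sum of v(d) over d in Cy, per class y
    total = Counter()             # sum of v(d) over all d in D
    n_y = Counter()               # |Cy|
    for d, y in zip(docs, labels):
        n_y[y] += 1
        for w, x in v(d).items():
            sums[y][w] += x
            total[w] += x

    protos = {}
    for y in n_y:
        pos, neg = n_y[y], max(1, n_docs - n_y[y])
        # u(y) = a*(1/|Cy|)*sum_{d in Cy} v(d) - b*(1/|D-Cy|)*sum_{d' in D-Cy} v(d')
        u_y = {w: a * sums[y][w] / pos - b * (total[w] - sums[y][w]) / neg
               for w in total}
        norm = math.sqrt(sum(x * x for x in u_y.values())) or 1.0
        protos[y] = {w: x / norm for w, x in u_y.items()}  # v(y)
    return protos, v

def classify(d, v_fn, protos):
    vd = v_fn(d)
    # f(d) = argmax_y v(d) . v(y); only the non-zero dims of v(d) matter
    return max(protos, key=lambda y: sum(x * protos[y].get(w, 0.0)
                                         for w, x in vd.items()))
```

As the slides stress, training is one scan to build the DF table plus one scan of additions, and only the non-zero entries of each v(d) are stored.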

  6. Rocchio vs Bayes. Imagine a similar process, but for labeled documents. Naive Bayes streams through the train data (rows id1 y1 w1,1 w1,2 w1,3 ... w1,k1; id2 y2 w2,1 w2,2 w2,3 ...; ...) and accumulates event counts such as C[X=w1 ^ Y=sports] = 5245, C[X=w1 ^ Y=worldNews] = 1054, C[X=...] = 2120, .... Recall the Naive Bayes test process: for each test row id_i y_i w_i,1 ..., look up the stored counts C[X=w_i,1 ^ Y=sports] = 5245, C[X=w_i,1 ^ Y=...], C[X=w_i,2 ^ ...], ... for every word in the document.

  7. Rocchio.... Rocchio needs DF counts: stream through the train data (rows id1 y1 w1,1 w1,2 w1,3 ... w1,k1; id2 y2 w2,1 ...; ...) and accumulate one count per word (aardvark: 12, agent: 1054, ...). Then each training row id_i y_i w_i,1 ... is mapped to its vector components v(w_i,1, id_i), v(w_i,2, id_i), ....

  8. Rocchio.... With the DF counts in hand, the formulas
     DF(w) = # different docs w occurs in
     TF(w,d) = # different times w occurs in doc d
     IDF(w) = |D| / DF(w)
     u(w,d) = log(TF(w,d)+1) * log(IDF(w))
     u(d) = <u(w1,d), ..., u(w|V|,d)>,  v(d) = u(d)/||u(d)||2
     turn each training row id_i y_i w_i,1 ... into a document vector v(id_i).

  9. Rocchio.... Using
     u(y) = a * (1/|Cy|) * sum over d in Cy of v(d)  -  b * (1/|D-Cy|) * sum over d' in D-Cy of v(d'),  v(y) = u(y)/||u(y)||2
     f(d) = argmax_y v(d) . v(y)
     each row id1 y1 w1,1 w1,2 w1,3 ... w1,k1 becomes the document vector v(id1) = <v(w1,1,d), v(w1,2,d), ...>. For each (y, v), go through the non-zero values in v (one for each w in the document d) and increment a counter for that dimension of v(y). Message: increment v(y1)'s weight for w1,1 by v(w1,1,d)/|Cy|. Message: increment v(y1)'s weight for w1,2 by v(w1,2,d)/|Cy|.

  10. Rocchio at Test Time. f(d) = argmax_y v(d) . v(y) = argmax_y sum over w of v(w,d) * v(w,y). Using the DF counts and the stored class weights (e.g. v(y1,w) = 0.0012, v(y2,w) = 0.013, ...), each test row id_i y_i w_i,1 ... w_i,ki is mapped to v(id_i) together with the lookups v(w_i,1,y1), ..., v(w_i,ki,yk), and the per-class dot products are accumulated.

  11. Rocchio Summary.
     • Compute DF: one scan through the docs; output size O(n), time O(n), where n = corpus size (like NB event-counts).
     • Compute v(id_i) for each document: time O(n); one scan if the DF table fits in memory (like the first part of the NB test procedure), otherwise ...
     • Add up vectors to get v(y): time O(n); one scan if the v(y)'s fit in memory (like NB training), otherwise ...
     • Classification ~= disk NB.

  12. Rocchio results. Joachims '98, "A Probabilistic Analysis of the Rocchio Algorithm". Rocchio's method (w/ linear TF); variant TF and IDF formulas.

  13. [figure]

  14. Rocchio results. Schapire, Singer, Singhal, "Boosting and Rocchio Applied to Text Filtering", SIGIR '98. Reuters-21578, all classes (not just the frequent ones).

  15. A hidden agenda. Part of machine learning is a good grasp of theory; part of ML is a good grasp of what hacks tend to work. These are not always the same, especially in big-data situations. Catalog of useful tricks so far:
     • Brute-force estimation of a joint distribution
     • Naive Bayes
     • Stream-and-sort, request-and-answer patterns
     • BLRT and KL-divergence (and when to use them)
     • TF-IDF weighting, especially IDF: it's often useful even when we don't understand why

  16. One more Rocchio observation. Rennie et al., ICML 2003, "Tackling the Poor Assumptions of Naive Bayes Text Classifiers": NB + a cascade of hacks.

  17. One more Rocchio observation. Rennie et al., ICML 2003, "Tackling the Poor Assumptions of Naive Bayes Text Classifiers": "In tests, we found the length normalization to be most useful, followed by the log transform ... these transforms were also applied to the input of SVM."

  18. One? more Rocchio observation. Computing the DFs in parallel: split the documents/labels into subsets (documents/labels 1, 2, 3); compute the DFs of each subset (DFs-1, DFs-2, DFs-3); then sort and add the counts to get the global DFs.

  19. One?? more Rocchio observation. Given the DFs, split the documents/labels into subsets (documents/labels 1, 2, 3); compute partial v(y)'s (v-1, v-2, v-3); then sort and add the vectors to get the v(y)'s.

  20. O(1) more Rocchio observation. Split the documents/labels into subsets; compute partial v(y)'s, with each worker holding its own copy of the DFs (v-1, v-2, v-3); then sort and add the vectors to get the v(y)'s. Coming up: how. We have shared access to the DFs, but only shared read access; we don't need to share write access. So we only need to copy the information across the different processes.
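The split/compute/merge pattern of the last three slides can be sketched for the DF step in a few lines of Python; the subset contents and the partial_dfs name are illustrative, and merging counters stands in for "sort and add counts":

```python
from collections import Counter

def partial_dfs(doc_subset):
    # each worker computes DFs over its own subset of documents
    return Counter(w for doc in doc_subset for w in set(doc))

subsets = [
    [["found", "aardvark"], ["found"]],   # documents/labels 1
    [["found", "cat"]],                   # documents/labels 2
    [["cat"], ["found", "cat"]],          # documents/labels 3
]

# "sort and add counts": merging the partial counters yields the global DFs
dfs = Counter()
for part in map(partial_dfs, subsets):
    dfs += part
```

The partial v(y)'s of slides 19 and 20 merge the same way, since u(y) is itself just a sum of per-document vectors.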

  21. Abstract Implementation: [TF]IDF 1/2
     data = pairs (docid, term), where term is a word that appears in the document with id docid; e.g. rows (d123, found), (d123, aardvark).
     operators: DISTINCT, MAP, JOIN, GROUP BY ... [RETAINING ...] REDUCING TO ... (a reduce step)
     docFreq = DISTINCT data | GROUP BY (docid,term):term REDUCING TO count  /* (term, df); e.g. key found has values (d123,found),(d134,found),..., reduced to 2456; key aardvark has (d123,aardvark),..., reduced to 7 */
     docIds = MAP data BY (docid,term):docid | DISTINCT
     numDocs = GROUP docIds BY docid:1 REDUCING TO count  /* (1, numDocs), e.g. (1, 12451) */  -- question: how many reducers should I use here?
     dataPlusDF = JOIN data BY (docid,term):term, docFreq BY (term,df):term | MAP ((docid,term),(term,df)):(docid,term,df)  /* (docid, term, document-freq) */
     unnormalizedDocVecs = JOIN dataPlusDF BY row:1, numDocs BY row:1 | MAP ((docid,term,df),(dummy,numDocs)):(docid,term,log(numDocs/df))  /* (docid, term, weight-before-normalizing) : u */
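A minimal in-memory Python simulation of this dataflow; the variable names mirror the slide's relations, and the sample rows are illustrative:

```python
import math

# data: (docid, term) pairs, one per word occurrence
data = [("d1", "found"), ("d1", "aardvark"),
        ("d2", "found"), ("d2", "found"), ("d3", "cat")]

# docFreq = DISTINCT data | GROUP BY term REDUCING TO count -> (term, df)
doc_freq = {}
for docid, term in set(data):          # DISTINCT collapses repeats within a doc
    doc_freq[term] = doc_freq.get(term, 0) + 1

# docIds = MAP data BY docid | DISTINCT; numDocs = count of distinct docids
num_docs = len({docid for docid, _ in data})

# dataPlusDF = JOIN data with docFreq on term -> (docid, term, df)
data_plus_df = [(docid, term, doc_freq[term]) for docid, term in data]

# unnormalizedDocVecs -> (docid, term, log(numDocs/df))
unnormalized_doc_vecs = [(docid, term, math.log(num_docs / df))
                         for docid, term, df in data_plus_df]
```

On the slide's question: numDocs groups everything under the constant key 1, so (presumably the point) no matter how many reducers the job requests, all those counts flow through a single reducer.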

  22. Two ways to join: reduce-side join and map-side join.

  23. Two ways to join: reduce-side join for A, B. A holds (term, df) rows, e.g. (found, 2456), (aardvark, 7); B holds (term, docId) rows, e.g. (aardvark, d15), (found, d7), (found, d23), .... Concat and sort: the sort brings each term's A and B tuples together, which does the join.

  24. Two ways to join: reduce-side join for A, B. Tricky bit: we need to sort by the first two values, (term, relation tag), because we want the DFs to come first, but all tuples with key aardvark should still go to the same worker: (aardvark, A, 7), (aardvark, B, d15), ..., (found, A, 2456), (found, B, d7), (found, B, d23), ....

  25. Two ways to join: reduce-side join for A, B. Tricky bit: sorting by the first two values (term, relation tag) so the DFs come first, while still sending all tuples with key aardvark to the same worker, takes a custom sort (secondary sort key): a Writable with your own Comparator, plus a custom Partitioner (specified for the job like the Mapper, Reducer, ...).
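The reduce-side join with a secondary sort key can be imitated in plain Python: tag each tuple with its relation so that, after sorting by (term, tag), each term's DF record arrives before its docid records. The sample rows follow the slides, and this sketch assumes every term in B also occurs in A (as it does for docFreq):

```python
from itertools import groupby

# A: (term, df) rows from docFreq; B: (term, docid) occurrence rows
A = [("found", 2456), ("aardvark", 7)]
B = [("aardvark", "d15"), ("found", "d7"), ("found", "d23")]

# Tag tuples so that, within each term, the A record sorts before the B records.
tagged = [(term, "A", df) for term, df in A] + \
         [(term, "B", docid) for term, docid in B]
tagged.sort(key=lambda t: (t[0], t[1]))    # (primary key, secondary sort key)

joined = []
for term, group in groupby(tagged, key=lambda t: t[0]):
    group = list(group)
    df = group[0][2]                       # the A tuple comes first: the DF
    for _, tag, docid in group[1:]:        # the rest are B tuples: docids
        joined.append((term, docid, df))
```

In Hadoop terms, the sort key here plays the role of the custom Comparator (order by term, then tag), while grouping by term alone plays the role of the custom Partitioner, which must partition on term only so a key's A and B tuples land on the same worker.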

  26. Two ways to join: map-side join. Write the smaller relation out to disk and send it to each Map worker (DistributedCache). When you initialize each Mapper, load in the small relation (configure() is called at initialization time). Then map through the larger relation and do the join. Faster, but requires one relation to fit in memory.
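A Python sketch of the map-side join, with the small relation loaded into a dict up front (standing in for DistributedCache plus configure()); the names and sample rows are illustrative:

```python
# Small relation, loaded once into memory at "mapper initialization" time.
doc_freq = dict([("found", 2456), ("aardvark", 7)])

def map_join(large_relation, small):
    # One pass over the large relation; no shuffle/sort needed.
    for term, docid in large_relation:
        if term in small:
            yield (docid, term, small[term])

# Large relation: streamed, never held in memory as a whole.
occurrences = [("found", "d7"), ("aardvark", "d15"), ("found", "d23")]
joined = list(map_join(occurrences, doc_freq))
```

The trade-off is exactly the slide's: one scan and no shuffle, at the price of keeping the small relation in every worker's memory.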

  27. Abstract Implementation: [TF]IDF 1/2 (repeat of slide 21).

  28. Abstract Implementation: [TF]IDF 1/2 (repeat of slide 21, highlighting the question: how many reducers should I use here?).

  29. Abstract Implementation: TFIDF 2/2
     normalizers = GROUP unnormalizedDocVecs BY (docid,term,w):docid RETAINING (docid,term,w):w^2 REDUCING TO sum  /* (docid, sum-of-square-weights); e.g. key d1234 with values (d1234,found,1.542), (d1234,aardvark,13.23), ... reduces to (d1234, 37.234); also (d3214, 29.654) */
     docVec = JOIN unnormalizedDocVecs BY (docid,term,w):docid, normalizers BY (docid,norm):docid | MAP ((docid,term,w),(docid,norm)):(docid,term,w/sqrt(norm))  /* (docid, term, weight) */
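A Python sketch of this normalization step, starting from a few illustrative unnormalized rows (the names mirror the slide's relations):

```python
import math
from collections import defaultdict

# unnormalizedDocVecs: (docid, term, w) rows, as produced by part 1/2
unnormalized = [("d1234", "found", 1.542),
                ("d1234", "aardvark", 13.23),
                ("d3214", "cat", 2.0)]

# normalizers = GROUP BY docid RETAINING w^2 REDUCING TO sum
normalizers = defaultdict(float)
for docid, term, w in unnormalized:
    normalizers[docid] += w * w            # (docid, sum-of-square-weights)

# docVec = JOIN on docid | MAP to (docid, term, w / sqrt(norm))
doc_vec = [(docid, term, w / math.sqrt(normalizers[docid]))
           for docid, term, w in unnormalized]
```

After this step each document's vector has unit L2 norm, which is what makes the dot product in f(d) a cosine similarity.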

  30. ANOTHER EXAMPLE: COMPUTING TFIDF IN GUINEA PIG

  31. Actual Implementation

  32. Actual Implementation. [code figure; example rows: (docId d123, w found), (docId d123, w aardvark)]

  33. Actual Implementation. [code figure; example rows before and after grouping: (d123, found), (d123, aardvark) -> key found: (d123,found),(d134,found),..., count 2456; key aardvark: (d123,aardvark),..., count 7]

  34. Actual Implementation. Augment: loads a preloaded object b at mapper initialization time, cycles through the input, and generates pairs (a, b).

  35. Full Implementation

  36. ANOTHER EXAMPLE: COMPUTING TFIDF IN PIG LATIN

  37. The Pig Latin pipeline:
     (docid, token) -> (docid, token, tf(token in doc)); (docid, token, tf) -> (docid, token, tf, length(doc)); (docid, token, tf, n) -> (..., tf/n), giving (docid, token, tf, n, tf/n); -> (..., df); ndocs.total_docs: relation-to-scalar casting; (docid, token, tf, n, tf/n) -> (docid, token, tf/n * idf).
