ClRank: A Method for Keyword Extraction Using Clustering and Distributions

ClRank is a method for extracting keywords from web pages using clustering and the distributions of nouns. The process consists of text extraction and pre-processing, POS tagging and noun extraction, lemmatization, noun similarity computation, clustering and cluster ranking, and keyword selection. Results are reported as precision, recall, and F-measure on two data sets for term frequency, TextRank, and two ClRank variants (average and complete). The slides conclude that ClRank outperforms both baselines, improving on TextRank from 37% to 47% and on term frequency from 41% to 47%, and that the distribution of nouns is more effective than term frequency.

  • ClRank
  • Keyword Extraction
  • Clustering
  • Nouns

Uploaded on Feb 24, 2025


Presentation Transcript


  1. ClRank: A method for keyword extraction from web pages using clustering and distributions of nouns. Mohammad Rezaei, Najlah Gali, Pasi Fränti. School of Computing, 7.12.2015

  2. Problem: not all web pages have keywords. Examples shown: a scientific document, and an HTML document with <meta name="keywords" content="" />
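A minimal sketch of how one might check whether a page already declares keywords, assuming the page is available as an HTML string. The class and function names (`MetaKeywordsParser`, `meta_keywords`) are hypothetical and not part of the presentation; only the Python standard library is used.

```python
from html.parser import HTMLParser


class MetaKeywordsParser(HTMLParser):
    """Collects keywords declared in <meta name="keywords" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here by default.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            # Split the comma-separated list and drop empty entries.
            self.keywords.extend(k.strip() for k in content.split(",") if k.strip())


def meta_keywords(html):
    """Return the list of declared keywords (empty if none are declared)."""
    parser = MetaKeywordsParser()
    parser.feed(html)
    return parser.keywords


# The empty tag from the slide yields no keywords, which is the motivating case.
print(meta_keywords('<meta name="keywords" content="" />'))  # → []
```

A page for which this returns an empty list is the kind of page a keyword extraction method would be applied to.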

  3. Overall process: Web page → Text extraction and pre-processing → POS tagging and noun extraction → Noun lemmatization → Noun similarities → Noun clustering and cluster ranking → Keyword selection → Keywords

  4. Overall process (web page → text extraction and pre-processing → POS tagging and noun extraction → noun lemmatization)
     Part of the extracted text: <h1>ABOUT FORME SPA</h1> <p>Forme Spa offers a tranquil environment designed for relaxation and rejuvenation. The spa is located in . </p>
     POS tagging: Forme/NNP Spa/NNP offers/VBZ a/DT tranquil/JJ environment/NN designed/VBN for/IN relaxation/NN and/CC
     Extracted lemmas: Forme, spa, building, treatment, massage, therapy
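The noun extraction step can be sketched as follows, assuming the tagger output is a string of `word/TAG` tokens with Penn Treebank tags, as in the slide's example. The function name `extract_nouns` is hypothetical; the tagger itself and the lemmatizer are not reproduced here, since the slides do not say which tools were used.

```python
def extract_nouns(tagged_text):
    """Return the words whose POS tag is a noun tag (NN, NNS, NNP, NNPS)."""
    nouns = []
    for token in tagged_text.split():
        # Split on the last slash so words containing "/" are kept whole.
        word, sep, tag = token.rpartition("/")
        if sep and tag.startswith("NN"):
            nouns.append(word)
    return nouns


# Tagged fragment taken from the slide.
tagged = (
    "Forme/NNP Spa/NNP offers/VBZ a/DT tranquil/JJ environment/NN "
    "designed/VBN for/IN relaxation/NN and/CC"
)
print(extract_nouns(tagged))  # → ['Forme', 'Spa', 'environment', 'relaxation']
```

The extracted nouns would then be lemmatized before similarities are computed.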

  5. Overall process (noun similarities → noun clustering and cluster ranking → keyword selection)
     Similarity matrix of lemmas:
                    Spa    Building  Treatment  Massage  Therapy
       Spa          1.00   0.89      0.23       0.20     0.19
       Building     0.89   1.00      0.70       0.67     0.63
       Treatment    0.23   0.70      1.00       0.95     0.91
       Massage      0.20   0.67      0.95       1.00     0.87
       Therapy      0.19   0.63      0.91       0.87     1.00
     Complete-link clustering and ranked clusters:
       11  Cluster 1: Spa (33), Building (1)
       8   Cluster 2: Treatment (20), Massage (7), Therapy (2)
       5   Cluster 3: Auckland (7), Wellington (6), City (2)
       2   Cluster 4: Service (5), Care (2)
     Keywords: Spa, Treatment, Massage, Auckland, Wellington
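A rough sketch of complete-link clustering applied to the similarity matrix on this slide. Several things here are assumptions rather than details from the presentation: the stopping threshold of 0.8, the function names, the reading of the parenthesized numbers as noun frequencies, and the use of summed frequency to rank clusters (the slides do not state how clusters are scored).

```python
def complete_link_clusters(names, sim, threshold):
    """Agglomerative clustering; cluster similarity is the MINIMUM pairwise similarity."""
    clusters = [[i] for i in range(len(names))]

    def link(a, b):
        # Complete link: the two least similar members decide the cluster similarity.
        return min(sim[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = link(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        s, x, y = best
        if s < threshold:  # hypothetical stopping rule
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [[names[i] for i in c] for c in clusters]


def rank_clusters(clusters, freq):
    """Assumed ranking: clusters with a higher total noun frequency come first."""
    return sorted(clusters, key=lambda c: sum(freq[n] for n in c), reverse=True)


names = ["Spa", "Building", "Treatment", "Massage", "Therapy"]
sim = [
    [1.00, 0.89, 0.23, 0.20, 0.19],
    [0.89, 1.00, 0.70, 0.67, 0.63],
    [0.23, 0.70, 1.00, 0.95, 0.91],
    [0.20, 0.67, 0.95, 1.00, 0.87],
    [0.19, 0.63, 0.91, 0.87, 1.00],
]
# Counts as read from the slide's parenthesized numbers (an interpretation).
freq = {"Spa": 33, "Building": 1, "Treatment": 20, "Massage": 7, "Therapy": 2}

clusters = complete_link_clusters(names, sim, threshold=0.8)
ranked = rank_clusters(clusters, freq)
print(ranked)  # → [['Spa', 'Building'], ['Treatment', 'Massage', 'Therapy']]
```

With this threshold the sketch reproduces Cluster 1 and Cluster 2 from the slide; a different threshold or linkage would give different groupings.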

  6. The effect of clustering

  7. The effect of clustering

  8. Overall results
     Columns: Method | Set 1: Precision, Recall, F-measure | Set 2: Precision, Recall, F-measure
     Methods: Term frequency, TextRank, ClRank (average), ClRank (complete)
     Values (in extracted order): 0.38 0.38 0.37 0.36 0.33 0.33 0.51 0.38 0.42 0.51 0.38 0.42 0.35 0.52 0.33 0.46 0.46 0.52 0.46 0.52 0.41 0.37 0.46 0.47
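For reference, the three measures named in the results table can be computed for a single page as in the sketch below. It assumes case-insensitive exact matching between extracted and ground-truth keywords, which is an assumption (the slides do not say how matches were counted), and the ground-truth list in the example is invented purely for illustration.

```python
def precision_recall_f(extracted, ground_truth):
    """Precision, recall, and F-measure under case-insensitive exact matching."""
    extracted = {k.lower() for k in extracted}
    ground_truth = {k.lower() for k in ground_truth}
    hits = len(extracted & ground_truth)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Extracted keywords from the slide example; the ground truth is hypothetical.
extracted = ["Spa", "Treatment", "Massage", "Auckland", "Wellington"]
ground_truth = ["spa", "massage", "beauty", "Auckland"]
p, r, f = precision_recall_f(extracted, ground_truth)
print(round(p, 2), round(r, 2), round(f, 2))  # → 0.6 0.75 0.67
```

Averaging these per-page values over a data set would give figures comparable in form to those in the table.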

  9. Conclusions
     1. Outperforms state-of-the-art: improves on TextRank from 37% to 47% and on term frequency from 41% to 47%.
     2. The distribution of nouns is more effective than term frequency.

  10. Overall process (repeats the Forme Spa worked example from slides 4 and 5 on a single slide)
