Lemmas and Word Frequencies in Linguistics

Learn about the significance of lemma frequency and word coverage in linguistic analysis based on research by J. Hana from UFAL and Charles University. Discover how common lemmas impact text understanding and the challenges of achieving complete lexicon coverage.

  • Lemmas
  • Linguistics
  • Word Frequencies
  • Language Analysis
  • UFAL

Presentation Transcript


  1. Zipf's law. J. Hana, UFAL, Charles University

  2. Zipf's law: a word's frequency is inversely proportional to its frequency rank.

  3. Corpus coverage by lemma frequency (PDT nouns).

  4. What does it mean? The good news: complete unsupervised morphology is not necessary. The 2.5K most frequent lemmas cover 3/4 of tokens; the 7K most frequent lemmas cover nearly 90% of tokens.

  5. What does it mean? The bad news: a complete lexicon is impossible, and a nearly complete one is hard. Coverage gains drop quickly: each of the 5 lower deciles adds ca. 1%. Infrequent lemmas are text-specific: 70% (!!) of the less frequent half of the lemmas from tr1 do not occur in tr2.
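
The quantities on the slides above can be reproduced on any lemmatized text with a few lines of counting. The following is a minimal sketch, not the author's method: the function names and the toy data are hypothetical, the input is assumed to be a plain list of lemma strings, and "the less frequent half of the lemmas" is interpreted here as the lower half of lemma types when ranked by frequency.

```python
from collections import Counter


def rank_times_frequency(lemmas):
    """Return (rank, frequency, rank * frequency) per lemma, most frequent first.

    Under Zipf's law the product rank * frequency stays roughly constant.
    """
    counts = Counter(lemmas)
    return [
        (rank, freq, rank * freq)
        for rank, (_, freq) in enumerate(counts.most_common(), start=1)
    ]


def coverage_of_top_k(lemmas, k):
    """Share of all tokens covered by the k most frequent lemmas."""
    counts = Counter(lemmas)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(freq for _, freq in counts.most_common(k))
    return covered / total


def unseen_share_of_rare_half(first_text, second_text):
    """Share of the less frequent half of lemma types in first_text
    that never occur in second_text (assumed reading of the slide)."""
    counts = Counter(first_text)
    ranked = [lemma for lemma, _ in counts.most_common()]
    rare_half = ranked[len(ranked) // 2:]
    if not rare_half:
        return 0.0
    seen_elsewhere = set(second_text)
    missing = sum(1 for lemma in rare_half if lemma not in seen_elsewhere)
    return missing / len(rare_half)


# Toy data (hypothetical), standing in for two lemmatized texts.
toy_first = ["a"] * 6 + ["b"] * 3 + ["c"] * 2 + ["d"]
toy_second = ["a", "b", "c", "e"]

if __name__ == "__main__":
    print(rank_times_frequency(toy_first))
    print(coverage_of_top_k(toy_first, 2))
    print(unseen_share_of_rare_half(toy_first, toy_second))
```

On a real corpus, calling `coverage_of_top_k` with increasing `k` traces a coverage curve of the kind slide 3 refers to, and `unseen_share_of_rare_half` gives a figure comparable to the 70% quoted on slide 5, provided the two texts are lemmatized the same way.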
