Bootstrapping Corpora and Terminology Translation Methods

An overview of the BootCaT method for building domain-specific corpora from the web, and of its extension to comparable corpora and bilingual terminology. BootCaT piggybacks on a search engine to collect pages for a set of seed terms; WebBootCaT adds a web interface integrated with a corpus tool (Sketch Engine).

  • BootCaT
  • Corpora
  • Translation
  • Linguistic Computing
  • Multilingual

Presentation Transcript


  1. Comparable Corpora BootCaT (CCBC). Adam Kilgarriff, Avinesh PVS. Lexical Computing Ltd.

  2. BootCaT: Bootstrapping Corpora and Terms. Translators know the language but are not domain experts: they can interpret domain terms but can't guess them. Instant domain corpus from the web. Marco Baroni and Silvia Bernardini (2004).

  3. BootCaT method: piggyback on a search engine (Google, Yahoo, Bing). Start from a set of seed terms. Repeat: take 3 random seeds, send them to the search engine, gather the pages behind the search hits. Remove duplicates, find terms. Can iterate.

  4. WebBootCaT: web interface; improved cleaning and duplicate removal; integrated with a corpus tool (Sketch Engine).

  5. Going multilingual: Google-translate the seed terms. English: volcanology volcanologist "volcanic eruption" seismographs Eyjafjallajokull geodic "deformation monitoring" tephra magma stratigraphic tephrochronology geochronological "volcanic ash" ablation rhyolitic. French: vulcanologue volcanologie "éruption volcanique" sismographes Eyjafjallajokull "surveillance de la déformation" géodiques tephra magma téphrochronologie stratigraphique géochronologiques "de cendres volcaniques" ablation rhyolitiques. And do the same thing for French.

  6. By July 2011: all steps integrated; propose bilingual terminology.
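
The loop described on the BootCaT method slide (draw random seed triples, query a search engine, gather pages, drop duplicates) can be sketched in Python as below. This is a minimal illustration, not the BootCaT or WebBootCaT implementation: the function names are invented for this example, and `search` and `fetch` are hypothetical callables supplied by the caller, since the slides do not specify a search API. The seed line is assumed to be space-separated with multi-word terms in double quotes, as in the multilingual slide. The term-finding step and further iterations are left out.

```python
import random
import shlex
from typing import Callable, Iterable, List, Optional


def parse_seeds(seed_line: str) -> List[str]:
    """Split a seed line into terms, keeping "quoted multi-word terms" whole.

    Note: shlex also treats apostrophes as quotes, so seeds containing
    apostrophes would need escaping or a different parser.
    """
    return shlex.split(seed_line)


def make_queries(seeds: List[str], n_queries: int, tuple_size: int = 3,
                 rng: Optional[random.Random] = None) -> List[str]:
    """Build n_queries query strings, each from tuple_size distinct random seeds.

    Requires len(seeds) >= tuple_size, otherwise random.sample raises ValueError.
    """
    rng = rng or random.Random()
    queries = []
    for _ in range(n_queries):
        chosen = rng.sample(seeds, tuple_size)
        # Re-quote multi-word terms so they are searched as phrases.
        queries.append(" ".join(f'"{t}"' if " " in t else t for t in chosen))
    return queries


def bootstrap_corpus(seeds: List[str],
                     search: Callable[[str], Iterable[str]],  # hypothetical: query -> URLs
                     fetch: Callable[[str], str],             # hypothetical: URL -> page text
                     n_queries: int = 10,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Collect page texts for random seed tuples, skipping duplicates."""
    seen_urls = set()
    seen_texts = set()
    pages = []
    for query in make_queries(seeds, n_queries, rng=rng):
        for url in search(query):
            if url in seen_urls:
                continue
            seen_urls.add(url)
            text = fetch(url)
            # Crude exact-duplicate check: case-folded, whitespace-normalised text.
            key = " ".join(text.split()).lower()
            if key in seen_texts:
                continue
            seen_texts.add(key)
            pages.append(text)
    return pages
```

For the multilingual case, the same functions could be run once on the English seed line and once on the translated French seed line, giving two corpora on the same topic. The duplicate check here only catches identical text after normalisation; near-duplicate detection would need a stronger method.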
