Creating Effective ISOcat Entries for Annotation Schemes


Learn how to create high-quality DC entries in ISOcat for better annotation schemes. Understand the key criteria for good entries and how to link, adopt, or create new entries effectively. Improve your annotation process with these guidelines.

  • Annotation Schemes
  • ISOcat Tutorial
  • Annotation Entries
  • Effective Practices




Presentation Transcript


  1. ISOcat: How to create a DC (including dos and don'ts). www.isocat.org. CLARIN-NL ISOcat tutorial, 20 June 2013.

  2. Your work wrt ISOcat: adopt an existing entry, create an entry, or link with an existing entry. In all cases, the entries should be GOOD ones. But what makes an entry a good one, one that you can use?

  3. A good DC. What defines a good DC? It should match the way you use a specific notion in the annotation scheme or application at hand. It should come with the same profile. It should handle the same phenomenon: SpeakerID =/= SingerID.

  4. Speaker vs Singer. SingerID and SpeakerID are siblings. SingerID is a subclass of both Singer and ID (RELcat!). Candidate notions, from generic to specific: String, Name, Person, Singer, Opera singer, Tenor, Tenor in La Bohème. The first is too generic, the last too specific. The others are in themselves candidates for DCs.

  5. Standards. Hardly any are available (cf. morning session). We really should try to arrive at a series of sound DCs, useful for YOU and for as many other people as possible => not too specific, not too general.

  6. What defines a good DC? A meaningful definition. For "indefinite pronoun", not: "pronoun that is indefinite", unless both "pronoun" and "indefinite" are defined elsewhere AND it is mentioned explicitly which definitions are involved AND these definitions are correct (for you).

  7. Correct definition. For "personal pronoun", not: "pronoun referring to persons". Counterexamples: "That cat has five kittens. SHE ..."; "This table was very expensive but I like IT very much"; "And John shook HIS head". [Note: in a particular tagset the definition may be correct! In general it is not.]

  8. Reusable definition. For "personal pronoun", not: "In CGN a personal pronoun ..."; not: "In Dutch a personal pronoun ..."; not: "A personal pronoun (ik, ikke and ikzelf) is characterized by ...". A definition should be as neutral (with respect to project and language) as possible, while still valid for your purposes!

  9. Good DC => good name. Sometimes confused: (1) Identifier (=/= PID), (2) Data Element Name, (3) Name. Re 1: the identifier should come in camelCase format, start with an alphabetical character (not 1stPerson, but firstPerson), be in English, and be meaningful (not EVON, but singularNeuterForm).
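
The mechanical part of the identifier rules on slide 9 can be checked automatically. The following is a minimal sketch, not an ISOcat tool: the function name `identifier_problems` is hypothetical, and it assumes "camelCase" means lower camelCase, as in the slide's own examples (firstPerson, singularNeuterForm). Whether an identifier is English and meaningful still needs a human reader; the all-capitals check is only a rough hint that a tag such as EVON was used instead.

```python
import re

# Assumption: lower camelCase, i.e. a lowercase start followed by
# capitalised or numeric segments (firstPerson, singularNeuterForm).
CAMEL_CASE = re.compile(r"[a-z]+(?:[A-Z0-9][a-z0-9]*)*")


def identifier_problems(identifier: str) -> list[str]:
    """Return a list of format problems; an empty list means none were found."""
    problems = []
    if not identifier or not identifier[0].isalpha():
        problems.append("should start with an alphabetical character")
    elif not CAMEL_CASE.fullmatch(identifier):
        problems.append("should be in camelCase")
    # An all-capitals string looks like a tag; tags belong in the
    # Data Element Name field rather than in the identifier.
    if len(identifier) > 1 and identifier.isupper():
        problems.append("looks like a tag or abbreviation")
    return problems


for candidate in ["1stPerson", "firstPerson", "EVON", "singularNeuterForm"]:
    print(candidate, identifier_problems(candidate))
```

An empty result only says that the format is acceptable, not that the name is a good one.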

  10. Re 2: the field Data Element Name (DEN) is the proper place to mention abbreviations/tags used for a particular notion, and not just for English (N, NPlur, EVON). Re 3: in all Language Sections, the correct full name(s) in the working language at hand are provided.

  11. Decision process:

  12. Flagged DCs: why?

  13. Flagged DCs. Try to avoid linking with deprecated or superseded DCs! Do not use DCs with 2 definitions!! In other cases the flags show whether the DC specification is correct from a more technical point of view. Note that only DCs with a green marking qualify for standardization (or CLARIN-NL/VL recommendation).

  14. DC/DCS and profile. Profiles are not added automatically; a DCS may contain elements with various profiles. "Profile not available" is only to be used when the correct profile is not contained in the list! In such a case, use "Not available" for the time being, AND contact isocat@mpi.nl.

  15. Which elements to include? Cf. slide on SingerID/SpeakerID. In general: all linguistically meaningful notions mentioned in your schema, manual, definition, PLUS the metadata (CMDI!). Abbreviations (PST for /past tense/) are to be mentioned as Data Element Name.

  16. Dos & don'ts. Dos: Create a DCS for your scheme (name project, annotation scheme, ...). Provide a clear definition (short, to the point) for your scheme, application, ... Take care not to leave concepts used in your definition undefined or vague (note section!). Use the appropriate profile (NOT: "undecided"). Use appropriate vocabulary (per profile). Check adopted DCs regularly till standardization!

  17. Dos. When creating a DC, fill out: Justification: used in XYZ, part of tagset N, and why existing DCs could not be reused (!!!!!). Language section: always an English language section (+ Dutch!); strong recommendation: sections for the object language(s) and for the working language (like the language in which the manual is written); sections in the various languages should match (+/- be translations of each other). Profile: "Undecided" is NOT correct!

  18. When creating a DC, fill out: Example section. Note that *negative* examples may be very helpful! Identifier: foreignWord. Dutch language section, example section: the, house, NOT: poster. Explanation section: "een woord als poster heeft Nederlandse diminutief: postertje, itt house (*housje, *houseje)" (a word like poster has a Dutch diminutive, postertje, unlike house).
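
The "fill out" checklist of slides 17 and 18 can be turned into a small completeness check. This is a sketch under stated assumptions: the classes `LanguageSection` and `DataCategory`, their field names, and the function `missing_parts` are all hypothetical and do not mirror the actual ISOcat data model. The profile value and the definition text in the example are placeholders; only the identifier, the examples and the Dutch explanation come from slide 18.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageSection:
    language: str                      # language code, e.g. "en" or "nl"
    name: str
    definition: str
    examples: list[str] = field(default_factory=list)
    negative_examples: list[str] = field(default_factory=list)
    explanation: str = ""


@dataclass
class DataCategory:
    identifier: str
    profile: str
    justification: str
    language_sections: list[LanguageSection] = field(default_factory=list)


def missing_parts(dc: DataCategory) -> list[str]:
    """List the parts of the checklist that are still missing for a DC."""
    missing = []
    if not dc.justification.strip():
        missing.append("justification")
    if dc.profile.strip().lower() in {"", "undecided"}:
        missing.append("profile")
    if "en" not in {section.language for section in dc.language_sections}:
        missing.append("English language section")
    if not any(section.examples for section in dc.language_sections):
        missing.append("example section")
    return missing


foreign_word = DataCategory(
    identifier="foreignWord",
    profile="morphosyntax",            # placeholder, not stated on the slide
    justification="used in XYZ, part of tagset N",
    language_sections=[
        LanguageSection(
            language="nl",
            name="(Dutch name goes here)",
            definition="(definition goes here)",
            examples=["the", "house"],
            negative_examples=["poster"],
            explanation="een woord als poster heeft Nederlandse diminutief: postertje",
        )
    ],
)
print(missing_parts(foreign_word))
```

With only the Dutch section filled in, the check reports that the English language section is still missing, which is the one part the slides call obligatory in all cases.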

  19. Example sections. Suppose you want to illustrate a real Dutch phenomenon (neuter vs non-neuter): example section in EN language section: Dutch example with translation in English; example section in DE language section: Dutch example with translation in German; example section in EN linguistic section: EN example; example section in DE linguistic section: DE example with translation in English.

  20. Don'ts. Confuse Language and Linguistic section (the latter contains language-specific values for closed domains). Be (too) language specific in the definition. Mention the scheme in the definition. Use several definitions in one DC. Use circular definitions. Rely on authority. Rely on standardized status. The definition should fit YOUR scheme, etc.

  21. Questions?

  27. RELcat. Linking DCs is not just a nice feature. Proper noun, common noun, mass noun and count noun are all instances of noun (i.e. have an IsA relation with it).
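
The IsA relations named on slide 27 can be pictured as a small graph. The sketch below is purely illustrative and is not the RELcat data model or API: the dictionary `IS_A`, the identifier spellings and the function `ancestors` are assumptions. The singerID entry is taken from slide 4, where a DC has two parents, which is why each entry maps to a set.

```python
# Each DC maps to the set of DCs it has an IsA relation with.
IS_A = {
    "properNoun": {"noun"},
    "commonNoun": {"noun"},
    "massNoun": {"noun"},
    "countNoun": {"noun"},
    "singerID": {"singer", "id"},   # slide 4: subclass of both Singer and ID
}


def ancestors(category: str, relations: dict = IS_A) -> set:
    """Collect everything a category is related to via IsA, directly or indirectly."""
    found = set()
    stack = [category]
    while stack:
        for parent in relations.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(ancestors("massNoun"))
print(ancestors("singerID"))
```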

  28. RELcat. Essential for several Dutch tag sets: N(soort, ...) comes with 2 DCs: (1) Noun, (2) Common. How to relate this to one of the DCs for "common noun", even in case we would find the definition perfect? Good news: in progress!

  29. Some considerations. DC N(common) as a unit, or DC Noun and DC Common. We are to take care that a definition for Common is not seen as a definition of common noun (i.e. the whole). We are to take care that, when a notion "noun" is used in the definition of "common", it gets the intended reading.

  30. More complex: N(soort,mv,dim), i.e. noun(common,plural,diminutive). More problematic to define as a whole, not just stating: "a diminutive common noun used as plural". This doesn't mean anything! Possible solution: linking it with the intended readings of the features involved.
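
The "possible solution" on slide 30 (link a complex tag with the DCs of its features instead of defining it as a whole) can be sketched as follows. The tag syntax and the pairing of N, soort, mv and dim with noun, common, plural and diminutive come from the slide; the function `decompose`, the table `FEATURE_TO_DC` and the DC identifier spellings are hypothetical.

```python
import re

# A tag of the form HEAD(feature,feature,...), e.g. N(soort,mv,dim).
TAG_PATTERN = re.compile(r"(\w+)\(([^()]*)\)")

# Hypothetical table linking each tag component to a DC identifier.
FEATURE_TO_DC = {
    "N": "noun",
    "soort": "common",
    "mv": "plural",
    "dim": "diminutive",
}


def decompose(tag: str) -> list[str]:
    """Split a complex tag into the DC identifiers of its head and features."""
    match = TAG_PATTERN.fullmatch(tag.strip())
    if match is None:
        raise ValueError(f"not a tag of the form HEAD(features): {tag!r}")
    head, features = match.groups()
    parts = [head] + [f.strip() for f in features.split(",") if f.strip()]
    # An unknown component raises KeyError: it still needs a DC of its own.
    return [FEATURE_TO_DC[part] for part in parts]


print(decompose("N(soort,mv,dim)"))
```

Each component then carries its own definition, so that no separate definition for the whole combination has to be written.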

  31. Searching. How to detect which DCs are standardized? Or have a German language section? How to search using the keys? And what about the language of keywords? How to detect which DCs belong together (unless one mentions the tag set in the definition, e.g.)?

  32. Searching. How to search for alternative names (Data Element Names): Konjunktion, Bindewort; Präposition, Verhältniswort? And the results: when not using exact match and a specific field, MANY results come up, apparently unordered, while using exact match + a specific field or profile may make you miss relevant entries.
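
The trade-off described on slide 32 can be reproduced on a toy in-memory index. This does not model the real ISOcat search: the `ENTRIES` structure, the field names and the function `search` are assumptions made for illustration, and only the German names come from the slide.

```python
# Toy index: DC identifier -> field name -> list of strings.
ENTRIES = {
    "conjunction": {
        "name": ["conjunction"],
        "dataElementName": ["Konjunktion", "Bindewort"],
    },
    "preposition": {
        "name": ["preposition"],
        "dataElementName": ["Präposition", "Verhältniswort"],
    },
}


def search(entries, query, field=None, exact=False):
    """Return identifiers whose values match the query, optionally in one field."""
    needle = query.casefold()
    hits = []
    for identifier, fields in entries.items():
        if field:
            values = fields.get(field, [])
        else:
            values = [value for group in fields.values() for value in group]
        if any(
            (needle == value.casefold()) if exact else (needle in value.casefold())
            for value in values
        ):
            hits.append(identifier)
    return hits


print(search(ENTRIES, "wort"))                                  # loose: many hits
print(search(ENTRIES, "Bindewort", field="name", exact=True))   # strict: entry missed
print(search(ENTRIES, "Bindewort", field="dataElementName", exact=True))
```

The loose query returns every entry containing the substring, while the strict query in the wrong field returns nothing, even though a relevant entry exists.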

  33. Consequences of mapping. Suppose you map with a specific DC, and some essential changes are made to that DC. You may no longer want to map, but how do you know? Suppose there are several relevant DCs, you select one, and just that one doesn't get standardized. You have to redo your work (but first you have to be aware that ...).
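
Slide 33 asks how you would know that a DC you mapped to has changed. One possible approach on the user's side, not a feature the tutorial attributes to ISOcat, is to record a fingerprint of the essential fields at mapping time and compare it later. In this sketch the DC is a plain dictionary, and the field names in `WATCHED_FIELDS` as well as both function names are hypothetical.

```python
import hashlib
import json

# Assumption: these are the fields whose change should trigger a re-check.
WATCHED_FIELDS = ("definition", "profile", "status")


def fingerprint(dc: dict) -> str:
    """Hash the watched fields of a DC in a stable, order-independent way."""
    watched = {name: dc.get(name) for name in WATCHED_FIELDS}
    canonical = json.dumps(watched, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_since_mapping(recorded: dict, current: dict) -> list:
    """Return identifiers of mapped DCs that changed or are no longer present."""
    changed = []
    for identifier, old_fingerprint in recorded.items():
        dc = current.get(identifier)
        if dc is None or fingerprint(dc) != old_fingerprint:
            changed.append(identifier)
    return sorted(changed)


# At mapping time: store one fingerprint per DC you mapped to.
snapshot = {"commonNoun": {"definition": "d1", "profile": "morphosyntax", "status": "candidate"}}
recorded = {identifier: fingerprint(dc) for identifier, dc in snapshot.items()}

# Later: fetch the DCs again and compare.
later = {"commonNoun": {"definition": "d2", "profile": "morphosyntax", "status": "candidate"}}
print(changed_since_mapping(recorded, later))
```

A reported identifier only means that the mapping needs a second look; whether the change is essential remains a judgment call.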

  34. Ill-defined DCs. Profile: morphosyntax, but definition: semantic. Definition too narrow/broad. Definition unclear (and no examples available). A concept in the definition is not defined in ISOcat, or that concept comes with several DCs (which one was meant?).

  35. Too many DCs: there are too many almost identical DCs, even within the same profile. Too vague DCs: there are many DCs with rather empty definitions. Proper noun: "a noun or adjective denoting a single object". Common noun: "a noun or adjective denoting a class of objects".

  36. Too language-specific DCs. Quite a number of DCs are too specific, mostly Polish ones; this makes it difficult to map with them. In these cases, stuff that belongs in the Polish language section is in the general, English one. *** ISOcat: not yet perfect.

  37. Therefore, while for some technical issues solutions will come up or are coming up, YOU should also be very careful yourself, especially wrt the soundness of the DCs, in particular as far as definitions, profile, and translation are concerned! Only in that case can ISOcat become a success story!

  38. Thanks!
