DC specifications

This presentation covers the specifications for creating meaningful and correct DC (data category) entries in ISOcat: the dos and don'ts, what makes an entry a good one, and how to recognize a matching DC. It works through the SpeakerID vs. SingerID example, the state of CLARIN standards, and the characteristics of a good DC, including why a definition should be reusable and how to keep your DCs valid and useful for other people.

  • Specifications
  • Dos and Don'ts
  • Matching DC
  • Reusable Definition
  • CLARIN Standards

Uploaded on Feb 23, 2025 | 0 Views


Presentation Transcript


  1. DC specifications, or dos and don'ts when creating a DC

  2. Your work with respect to ISOcat
       • Create an entry
       • Link with an existing entry
     In both cases, the entries should be GOOD ones. But what makes an entry a good one, one that you can use?

  3. What defines a matching DC?
       • It should match the way you use a specific notion in the annotation scheme or application at hand
       • It should come with the same profile
       • It should handle the same phenomenon: SpeakerID =/= SingerID

  4. Speaker vs. Singer
       • SingerID and SpeakerID: siblings
       • SingerID is a subclass of both Singer and ID (RELcat!)
       • From generic to specific: String, Name, Person, Singer, Opera singer, Tenor, Tenor in La Bohème
       • The first is too generic, the last too specific
       • The others are in themselves candidates for DCs

  5. (CLARIN) standards
       • Hardly any are available (cf. the morning session)
       • We really should try to arrive at a series of sound DCs, useful for YOU and for as many other people as possible

  6. What defines a good DC? A meaningful definition
       • Example: indefinite pronoun
       • Not: "pronoun that is indefinite"
       • Unless both "pronoun" and "indefinite" are defined elsewhere, AND it is mentioned explicitly which definitions are involved, AND these definitions are correct (for you)
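As a rough illustration of the "pronoun that is indefinite" problem, the sketch below flags a definition that merely reuses every word of the DC name. The function name `restates_name` and the word-matching heuristic are assumptions made for this example, not part of ISOcat. A hit only signals that the definition deserves a second look, since such a definition is acceptable when the component terms are defined elsewhere and referenced explicitly.

```python
import re


def restates_name(name: str, definition: str) -> bool:
    """Return True if every word of the name reappears in the definition.

    Hypothetical heuristic: it compares lowercase ASCII words only and
    knows nothing about whether the component terms are defined elsewhere.
    """
    name_words = set(re.findall(r"[a-z]+", name.lower()))
    definition_words = set(re.findall(r"[a-z]+", definition.lower()))
    # An empty name is never treated as "restated".
    return bool(name_words) and name_words <= definition_words


# The definition on the slide reuses both words of the name.
print(restates_name("indefinite pronoun", "pronoun that is indefinite"))
```

A definition that passes this check can still be wrong or not reusable; those properties need a human reader.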

  7. What defines a good DC? A correct definition
       • Example: personal pronoun
       • Not: "pronoun referring to persons", as shown by:
           • That cat has five kittens. SHE ...
           • This table was very expensive, but I like IT very much.
       • [Note: in a particular tagset the definition may be correct! In general it is not.]

  8. What defines a good DC? A reusable definition
       • Example: personal pronoun
       • Not: "In CGN a personal pronoun ..."
       • Not: "In Dutch a personal pronoun ..."
       • Not: "A personal pronoun (ik, ikke and ikzelf) is characterized by ..."
       • A definition should be as neutral (with respect to project and language) as possible, while still valid for your purposes!

  9. Good DC, good name. Sometimes confused:
       1. Identifier (=/= PID)
       2. Data Element Name
       3. Name
     Re 1: the identifier should be in camelCaseFormat, start with an alphabetical character (not 1stPerson, but firstPerson), be in English, and be meaningful (not EVON, but singularNeuterForm).

  10. Good DC, good name
       • Re 2: the Data Element Name field is the proper place to mention abbreviations or tags used for a particular notion, and not just English ones (N, NPlur, EVON)
       • Re 3: in all Language Sections, the correct full name(s) in the working language at hand are provided
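The formal part of the identifier rules under "Re 1" (camelCase, first character alphabetical) can be checked mechanically. The sketch below is one possible reading of those rules: the function name and the regular expression are assumptions for illustration, and the pattern additionally assumes a lowercase first letter and ASCII characters only. Whether an identifier is English and meaningful still has to be judged by a person.

```python
import re

# Assumed pattern: a lowercase ASCII letter followed by ASCII letters or
# digits. This is an interpretation of "camelCaseFormat", not an ISOcat rule.
_IDENTIFIER = re.compile(r"[a-z][A-Za-z0-9]*")


def is_well_formed_identifier(identifier: str) -> bool:
    """Check only the shape of an identifier, not its meaningfulness."""
    return _IDENTIFIER.fullmatch(identifier) is not None


for candidate in ("firstPerson", "1stPerson", "singularNeuterForm", "first person"):
    print(candidate, is_well_formed_identifier(candidate))
```

An abbreviation such as EVON belongs in the Data Element Name field rather than in the identifier, whatever its shape.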

  11. Flagged DCs
       • Never link with deprecated DCs!
       • In other cases, the flags show whether the DC specification is correct from a technical point of view
       • Note that only DCs with a green marking qualify for standardization
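The two flag rules can be written down as a pair of small predicates. This is a minimal sketch: the `Flag` enum and both function names are hypothetical, and `OTHER` is a stand-in for any marking the slide does not name. It does not reflect how ISOcat actually stores or exposes flags.

```python
from enum import Enum


class Flag(Enum):
    # Hypothetical values; the slide only names "deprecated" and "green".
    GREEN = "green"
    DEPRECATED = "deprecated"
    OTHER = "other"  # stand-in for any other marking


def may_link(flag: Flag) -> bool:
    """Never link with a deprecated DC."""
    return flag is not Flag.DEPRECATED


def qualifies_for_standardization(flag: Flag) -> bool:
    """Only DCs with a green marking qualify for standardization."""
    return flag is Flag.GREEN


for flag in Flag:
    print(flag.value, may_link(flag), qualifies_for_standardization(flag))
```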

  12. DC/DCS and profile
       • Profiles are not added automatically; a DCS may contain elements with various profiles
       • If the profile you need is not yet available, contact Menzo

  13. What to include?
       • Cf. the slide on SingerID/SpeakerID
       • In general: all linguistically meaningful notions mentioned in your schema, manual, or definition
       • Abbreviations (PST for /past tense/) are to be mentioned as Data Element Name

  14. Dos & don'ts. Dos:
       • Create a DCS for your scheme (name project, annotation scheme, ...)
       • Provide a clear definition (short, to the point) for your scheme, application, ...
       • Take care not to leave concepts used in your definition undefined or vague
       • Use appropriate vocabulary (per profile)
       • Check adopted DCs regularly until standardization!

  15. Dos (continued). When creating a DC, fill out:
       • Justification: used in XYZ, part of tagset N
       • Language section
           • Always: an English language section
           • Strong recommendation: sections for the object language(s) and for the working language (such as the language in which the manual is written)
           • Sections in the various languages should match (be more or less translations of each other)

  16. Dos (continued). When creating a DC, fill out:
       • Example section. Note that *negative* examples may be very helpful!
       • foreignWord, Dutch language section:
           • example section: the, house, NOT: poster
           • explanation section: een woord als poster heeft Nederlandse diminutief: postertje, itt house (*housje, *houseje) [a word like poster has a Dutch diminutive, postertje, unlike house (*housje, *houseje)]
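The fields named in these dos can be pictured as a small record with one section per language. The sketch below is an assumed data model for illustration only: the class names, field names, and language codes are hypothetical and do not mirror the ISOcat schema. It encodes two of the rules above (fill out the justification, always include an English language section) and reuses the foreignWord example with its negative example.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageSection:
    # All field names are hypothetical, chosen to mirror the slide wording.
    language: str
    name: str = ""
    definition: str = ""
    examples: list[str] = field(default_factory=list)
    negative_examples: list[str] = field(default_factory=list)
    explanation: str = ""


@dataclass
class DataCategory:
    identifier: str
    justification: str = ""
    language_sections: dict[str, LanguageSection] = field(default_factory=dict)

    def missing_required_parts(self) -> list[str]:
        """List the parts the slides ask for that are still missing."""
        missing = []
        if not self.justification.strip():
            missing.append("justification")
        if "en" not in self.language_sections:
            missing.append("English language section")
        return missing


dutch = LanguageSection(
    language="nl",
    examples=["the", "house"],
    negative_examples=["poster"],
    explanation="een woord als poster heeft Nederlandse diminutief: "
                "postertje, itt house (*housje, *houseje)",
)
foreign_word = DataCategory(
    identifier="foreignWord",
    justification="used in XYZ, part of tagset N",  # placeholder from the slide
    language_sections={"nl": dutch},
)
print(foreign_word.missing_required_parts())
```

Keeping negative examples in their own list makes it harder to mistake them for positive ones when sections are translated.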

  17. Example sections. Suppose you want to illustrate a Dutch phenomenon:
       • Example section in EN language section: Dutch example with translation in English
       • Example section in DE language section: Dutch example with translation in German
       • Example section in EN linguistic section: EN example
       • Example section in DE linguistic section: DE example with translation in English

  18. Don'ts
       • Confuse the Language and Linguistic sections (the latter contains language-specific values for closed domains)
       • Be (too) language-specific in the definition
       • Mention the scheme in the definition
       • Use several definitions in one DC
       • Use circular definitions
       • Rely on authority
       • Rely on standardized status
     The definition should fit YOUR scheme, etc.
