Improving Resource Discovery with Controlled Vocabularies

Enhance resource discovery by implementing controlled vocabularies in metadata management. Discover the benefits of subject headings, authority control, metadata standards, and equivalence relationships. Explore how consistent terms and hierarchies of terms can optimize information organization and retrieval.

  • Metadata
  • Vocabularies
  • Controlled Vocabulary
  • Resource Discovery
  • Information Management

Presentation Transcript


  1. Vocabularies and Linked Data (IST 653)

  2. Benefits of Subject Headings
     • Subject headings improve resource discovery: users can retrieve meaningful sets of digital objects.
     • Retrieve by:
         • Type: Audio, Moving Image, Document
         • Format: .wav, .tif, .pdf
         • Place: Albany County (N.Y.)
         • Person: Cuomo, Mario M., 1932-2015
     • Consistent terms are used for names and subjects.
     • There are many authorities.
     • The terms "ontology" and "authority" are used interchangeably.

  3. Authority Control or Controlled Vocabularies
     • Organizes bibliographic information.
     • Dictates what can be entered in a metadata field.
     • Uses one distinct spelling for each term.
     • Authorizes the names of people, places, things, and concepts.
     • Facilitates browsing.
     • Subject and name authorities were the only option for searching before the rise of computerized search indexes.
     • A decades-old, tried-and-true method for organizing information.

  4. Metadata, Vocabularies, Content, Format
     • Metadata standard: structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Examples: MARC, Dublin Core, EAD.
     • Content standard: a set of rules for describing information. Examples: DACS, RDA, AACR2.
     • Controlled format: rules concerning the allowable data types and formatting of information. Example: ISO 8601 dates such as 2017-04-07.
     • Controlled vocabulary: a standardized list of terms that have been selected for consistent use in describing information. Examples: LCSH, SKOS.
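A controlled format can be checked mechanically. The following is a minimal sketch, assuming the rule is "dates must be written as YYYY-MM-DD"; the function name `is_controlled_date` is hypothetical and not part of any metadata standard.

```python
import re
from datetime import date

# Assumed rule: exactly four digits, two digits, two digits, separated by hyphens.
ISO_DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_controlled_date(value: str) -> bool:
    """Return True only if value is a real calendar date written as YYYY-MM-DD."""
    if not ISO_DATE_PATTERN.fullmatch(value):
        return False
    try:
        # Rejects impossible dates such as 2017-02-30.
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


for candidate in ["2017-04-07", "04/07/2017", "April 7, 2017", "2017-02-30"]:
    print(candidate, is_controlled_date(candidate))
```

Only the first candidate passes; the others are readable to a person but do not follow the controlled format.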

  5. Inconsistent Values for Resource Type
     • There is no consistency without control.
     • Every variant spelling creates a unique entry.
     • Human users can read the values, but computers cannot interpret them logically.
     • Computers are very literal.
     • Uncontrolled metadata is nearly impossible to sustain or migrate into the future.

  6. Equivalence relationship
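An equivalence relationship can be sketched as a lookup from variant entry terms to one preferred term, which also resolves the kind of inconsistent resource type values described in slide 5. The variant spellings and the `normalize_type` helper below are hypothetical illustrations; only the preferred terms (Audio, Moving Image, Document) come from the slides.

```python
from typing import Optional

# Hypothetical variant terms, each pointing at its preferred (authorized) term.
PREFERRED_TERM = {
    "audio": "Audio",
    "sound recording": "Audio",
    "moving image": "Moving Image",
    "video": "Moving Image",
    "movie": "Moving Image",
    "document": "Document",
    "text document": "Document",
}


def normalize_type(raw: str) -> Optional[str]:
    """Map a free-text resource type to its preferred term, or None if unknown."""
    # Lowercase and collapse whitespace so trivial variants compare equal.
    key = " ".join(raw.lower().split())
    return PREFERRED_TERM.get(key)


raw_values = ["Video", "video ", "MOVIE", "Moving  image", "Sound Recording", "pamphlet"]
normalized = [normalize_type(v) for v in raw_values]

print(len(set(raw_values)), "distinct literal strings before control")
print(normalized)
```

A value that maps to `None` is not silently accepted; it would be flagged for a cataloger to review or to add to the vocabulary.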

  7. Handles Hierarchy of Terms
     • A stricter form of controlled vocabulary is a thesaurus.
     • Broader terms
     • Narrower terms
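The broader/narrower structure of a thesaurus can be sketched as a mapping from each term to its broader term. The terms below are hypothetical examples, not taken from a real thesaurus, and `expand` is one possible way to use the hierarchy: a search on a broad term also retrieves items indexed under its narrower terms.

```python
from typing import Dict, List

# Hypothetical thesaurus: each term points to its single broader term.
BROADER: Dict[str, str] = {
    "Stringed instruments": "Musical instruments",
    "Wind instruments": "Musical instruments",
    "Violin": "Stringed instruments",
    "Guitar": "Stringed instruments",
    "Flute": "Wind instruments",
}


def narrower(term: str) -> List[str]:
    """Return the terms directly narrower than the given term, sorted by name."""
    return sorted(t for t, parent in BROADER.items() if parent == term)


def expand(term: str) -> List[str]:
    """Return the term followed by all of its narrower terms, depth first."""
    result = [term]
    for child in narrower(term):
        result.extend(expand(child))
    return result


print(narrower("Stringed instruments"))
print(expand("Musical instruments"))
```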

  8. Spectrum of Controlled Vocabularies

  9. Handles Ambiguity and Disambiguation
     • Bank (financial institution)
     • Bank (container)
     • Bank (a mound of dirt)
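Disambiguation with qualifiers can be sketched as a lookup that returns every qualified heading for an ambiguous label. The data structure and the `candidates` helper are hypothetical; the three senses of "Bank" are the ones listed on the slide.

```python
from typing import Dict, List

# Each ambiguous label maps to its qualified, unambiguous headings.
QUALIFIED_HEADINGS: Dict[str, List[str]] = {
    "Bank": [
        "Bank (financial institution)",
        "Bank (container)",
        "Bank (a mound of dirt)",
    ],
}


def candidates(label: str) -> List[str]:
    """Return the qualified headings for a label, or the label itself if unambiguous."""
    return QUALIFIED_HEADINGS.get(label, [label])


def is_ambiguous(label: str) -> bool:
    """A label is ambiguous when more than one qualified heading exists for it."""
    return len(candidates(label)) > 1


print(candidates("Bank"))
print(is_ambiguous("Bank"), is_ambiguous("Violin"))
```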

  10. Rise of Search
     • Google was founded by Larry Page and Sergey Brin.
     • It runs many server farms, each composed of thousands of low-cost computers running stripped-down versions of Linux.
     • Google does not give out specifics on how its process works.
     • It is estimated to operate more than 450,000 servers around the world.

  11. Google's Search
     • Google knows the web is unstructured: it takes the messy web and creates orderly indexes.
     • It crawls the web and computes PageRank.
     • PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
     • Search has, until now, been dominated by this paradigm.
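The idea of counting both the number and the quality of links can be sketched with the textbook power-iteration form of PageRank. This is a simplified illustration, not Google's production system; the damping factor of 0.85, the iteration count, and the four-page link graph are assumptions chosen for the example.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Estimate page importance from a mapping of page -> pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical web: a, b, and d all link to c; c links back to a.
example_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(example_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page c ranks highest because it receives the most links, and page a ranks well above b and d because its single incoming link comes from the important page c.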

  12. Linked Data or Semantic Web
     • Coined by Tim Berners-Lee and part of his original plan for the World Wide Web.
     • Uses a standard called the Resource Description Framework (RDF), which can be serialized as XML.
     • We are using XML, so isn't the web already understandable by computers?
     • The Semantic Web was laid out in a famous article published in 2001 in Scientific American by Tim Berners-Lee, James Hendler, and Ora Lassila.

  13. But, We Have XML
     • XML marked up web information and structured it.
     • Computers could read XML documents; restructure, manipulate, update, and delete the data; and send it over the web safely.
     • XML allowed computers to talk to each other via serialization.
     • Before that, most documents were binary, meaning that each document needed a specific program to read it. For example, a Word document needed MS Word to read it.

  14. Before Linked Data
     • The web is linked together, with lots of relationships.
     • This is great for humans: we can link to images, charts, and other documents.
     • But computers don't understand the relationships the way humans do.

  15. Linked Data
     • Explicitly express things and relationships, such as names, birthdays, places, friend, parent, etc.
     • Link a name on one website, like Frank, to another person, like Jan, on another website.
     • Example: Frank knows Jan, and she is the parent of Tim.
     • Express it in a common format, using RDFa or JSON-LD.
     • How do we do it? Authorities, URIs, and triples.

  16. Resource Description Framework in Attributes (RDFa)
     • RDFa is a W3C Recommendation that adds a set of attribute-level extensions to HTML, XHTML, and various XML-based document types for embedding rich metadata within web documents.
     • XML serialization is the physical format of the data, while RDF, which RDFa embeds, is the data model that lets us represent a resource's inherent meaning (for example, a book's).

  17. JSON for Linking Data
     • JSON, or JavaScript Object Notation, is a minimal, readable format for structuring data. It is used primarily to transmit data between a server and a web application, as an alternative to XML.
     • JSON-LD, or JavaScript Object Notation for Linked Data, is a method of encoding Linked Data using JSON.
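A JSON-LD document is ordinary JSON with a few reserved keys such as `@context`, `@id`, and `@type`. The sketch below builds one with Python's standard `json` module for the "Frank knows Jan, and she is the parent of Tim" example from slide 15. It assumes the schema.org vocabulary (`Person`, `knows`, `children`); the `https://example.org/...` identifiers are hypothetical placeholders.

```python
import json

# Hypothetical identifiers; a real dataset would use URIs it actually controls.
frank = {
    "@context": "https://schema.org",
    "@id": "https://example.org/people/frank",
    "@type": "Person",
    "name": "Frank",
    "knows": {
        "@id": "https://example.org/people/jan",
        "@type": "Person",
        "name": "Jan",
        "children": {
            "@id": "https://example.org/people/tim",
            "@type": "Person",
            "name": "Tim",
        },
    },
}

# Serialize to text for embedding in a page or sending over the web.
serialized = json.dumps(frank, indent=2)
print(serialized)

# Any JSON parser can read it back; the @-keys carry the Linked Data meaning.
parsed = json.loads(serialized)
print(parsed["knows"]["name"])
```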

  18. Triples
     • Subject: the resource that's being described. A book, person, LCSH heading, website, or function: anything that can be described can be a subject. This is always a URI (uniform resource identifier) of some kind. That could be a URL. It could also be another kind of identifier, such as an ISBN.
     • Predicate: also known as a property, this is a URI that fulfills the role of a database field name or the name of an XML tag. It declares what is going to be stated about the subject. A very simple example would be <dc:title> in XML. In RDF, this would be the equivalent of <http://purl.org/dc/terms/title>, although it may sometimes be written as dc:title.
     • Object: the value of a statement. This can be a URI, like the other two, or it can be what's called a Literal, meaning a string, a number, or a date, enclosed in quotation marks. Strings are what we normally think of as text. We can get more specific about what this Literal is with datatype and language modifiers.

  19. Triple: Frank knows Jan
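The "Frank knows Jan" statement can be sketched as a subject-predicate-object triple in plain Python. The predicate URI `http://xmlns.com/foaf/0.1/knows` comes from the FOAF vocabulary; the `https://example.org/...` identifiers, the `Triple` and `Literal` types, and the simplified `to_ntriples` helper are assumptions made for this illustration, not a full RDF implementation.

```python
from dataclasses import dataclass
from typing import List, NamedTuple, Union


@dataclass(frozen=True)
class Literal:
    """A plain string value, as opposed to a URI."""
    value: str


class Triple(NamedTuple):
    subject: str                 # always a URI
    predicate: str               # always a URI
    obj: Union[str, Literal]     # a URI or a Literal


FRANK = "https://example.org/people/frank"   # hypothetical identifier
JAN = "https://example.org/people/jan"       # hypothetical identifier
KNOWS = "http://xmlns.com/foaf/0.1/knows"
NAME = "http://xmlns.com/foaf/0.1/name"

graph: List[Triple] = [
    Triple(FRANK, KNOWS, JAN),
    Triple(FRANK, NAME, Literal("Frank")),
    Triple(JAN, NAME, Literal("Jan")),
]


def to_ntriples(triple: Triple) -> str:
    """Write one triple as a line of N-Triples (simplified string escaping)."""
    if isinstance(triple.obj, Literal):
        escaped = triple.obj.value.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


for triple in graph:
    print(to_ntriples(triple))

# A simple query: whom does Frank know?
known_by_frank = [t.obj for t in graph if t.subject == FRANK and t.predicate == KNOWS]
print(known_by_frank)
```

Because every part of the statement is an identifier rather than free text, a program on another website can follow the same URIs and merge its own statements about Jan with these.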

  20. Google Knowledge Graph
     • We have autocomplete, but that uses older indexing technology.
     • The Knowledge Graph uses Linked Data to answer questions for the user.
     • https://www.google.com/
