
Understanding Schema Matching Techniques and Challenges
An overview of schema matching, in which concepts from one schema are matched with those of another to enable data mapping. Topics include schema heterogeneity, linguistic and constraint-based matching approaches, and complicating issues such as insufficient schema information and structural and semantic conflicts.
Schema Matching
Sneha Priya Devarakonda
Presentation ID: 16 (137-143)
Outline
- Schema Matching
- Schema Heterogeneity
- Linguistic Matching Approaches
- Constraint-based Matching Approaches
Schema Matching
Schema matching determines which concepts of one schema match those of another. If the global conceptual schema (GCS) has already been defined, then one of these schemas is typically the GCS, and the task is to match each local conceptual schema (LCS) to the GCS. Otherwise, matching is done on two LCSs. The matches determined in this phase are then used in schema mapping to produce a set of directed mappings which, when applied to the source schema, map its concepts to the target schema.
The matches that are defined or discovered during schema matching are specified as a set of rules, where each rule (r) identifies a correspondence (c) between two elements, a predicate (p) that indicates when the correspondence may hold, and a similarity value (s) between the two elements identified in the correspondence. A set of matches can be defined as M = {r}, where r = ⟨c, p, s⟩. Although it is desirable to automate this process, there are many complicating factors, as discussed below.
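To make the rule representation concrete, here is a minimal Python sketch of a match set M = {r} with r = ⟨c, p, s⟩. It assumes that the predicate p is a test evaluated against a data instance (here, a dict) and that similarity values lie in [0, 1]. The class names, the `applicable` helper, its 0.5 threshold, and all SC1/SC2 element names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping

Instance = Mapping[str, Any]  # a data instance, e.g. one row as a dict


@dataclass(frozen=True)
class Correspondence:
    """c: pairs an element of the source schema with one of the target schema."""
    source: str
    target: str


@dataclass(frozen=True)
class MatchRule:
    """r = <c, p, s>."""
    c: Correspondence
    p: Callable[[Instance], bool]  # assumption: p is tested against an instance
    s: float                       # assumption: similarity lies in [0, 1]

    def __post_init__(self) -> None:
        if not 0.0 <= self.s <= 1.0:
            raise ValueError(f"similarity {self.s} is outside [0, 1]")


def always(_: Instance) -> bool:
    """Predicate for correspondences that hold unconditionally."""
    return True


def applicable(matches, instance, threshold=0.5):
    """Rules of M whose predicate holds for `instance` and whose s >= threshold."""
    return {r for r in matches if r.s >= threshold and r.p(instance)}


# Hypothetical match set M between schemas SC1 and SC2.
M = {
    MatchRule(Correspondence("SC1.EMP.ENAME", "SC2.Employee.Name"), always, 0.9),
    MatchRule(Correspondence("SC1.EMP.SAL", "SC2.Employee.Salary"),
              lambda row: row.get("currency") == "USD", 0.7),
    MatchRule(Correspondence("SC1.EMP.TITLE", "SC2.Employee.Role"), always, 0.4),
}

print(sorted(r.c.target for r in applicable(M, {"currency": "EUR"})))
# ['SC2.Employee.Name']
```

The predicate lets a correspondence hold only conditionally (in this made-up case, salaries match only when amounts are in USD), while the similarity threshold filters out weak correspondences such as TITLE/Role before they reach schema mapping.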
The most important complicating factor is schema heterogeneity, which refers to differences in the way real-world phenomena are captured in different schemas. This is a critically important issue, and we devote a separate section to it.
Apart from schema heterogeneity, the issues that complicate the matching process are:
- Insufficient schema and instance information
- Unavailability of schema documentation
- Subjectivity of matching
Other issues that affect the particular matching algorithm are:
- Schema versus instance matching
- Element-level versus structure-level matching
- Matching cardinality
Schema Heterogeneity
Schema matching algorithms deal with both structural heterogeneity and semantic heterogeneity among the matched schemas. Structural conflicts occur in four possible ways:
- Type conflicts
- Dependency conflicts
- Key conflicts
- Behavioral conflicts
Structural differences among schemas are important, but identifying and resolving them is not sufficient. Schema matching must also take into account the (possibly different) semantics of the schema concepts. This is referred to as semantic heterogeneity, a fairly loaded term without a clear definition.
There have been attempts to formalize semantic heterogeneity and to establish its link to structural heterogeneity. Among the problems that matching algorithms need to deal with are:
- Synonyms, homonyms, and hypernyms
- Different ontologies
- Imprecise wording
Linguistic Matching Approaches
Linguistic matching approaches, as the name implies, use element names and other textual information (such as textual descriptions/annotations in schema definitions) to perform matches among elements. In many cases, they may use external sources, such as thesauri, to assist in the process.
Linguistic techniques can be applied in both schema-based approaches and instance-based ones. In the former case, similarities are established among schema elements, whereas in the latter, they are specified among elements of individual data instances. We will use the notation ⟨SC1.element-1 ≈ SC2.element-2, p, s⟩ to represent that element-1 in schema SC1 corresponds to element-2 in schema SC2 if predicate p holds, with a similarity value of s.
Matchers use these rules and similarity values to determine the similarity of schema elements. Linguistic matchers that operate at the schema element level typically deal with the names of schema elements and handle cases such as synonyms, homonyms, and hypernyms. Schema-level linguistic matchers use a set of linguistic (also called terminological) rules that can be hand-crafted or discovered using auxiliary data sources such as thesauri.
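A minimal sketch of an element-level linguistic matcher driven by hand-crafted terminological rules follows. The synonym, hypernym, and homonym entries stand in for a thesaurus; the similarity values assigned to each relationship (1.0, 0.9, 0.7), the 0.5 threshold, and every element name are assumptions for illustration. It emits tuples in the ⟨SC1.element-1 ≈ SC2.element-2, p, s⟩ form, with p always "true" in this simplified version.

```python
from itertools import product

# Hypothetical hand-crafted terminological rules (all terms are illustrative).
SYNONYMS = {frozenset({"salary", "pay"}), frozenset({"employee", "worker"})}
HYPERNYMS = {("employee", "person")}  # (narrower term, broader term)
# Inter-schema rule for a known homonym: same name, different meaning.
HOMONYMS = {("sc1.title", "sc2.title")}

# Assumed similarity values per relationship type.
SIM_IDENTICAL, SIM_SYNONYM, SIM_HYPERNYM = 1.0, 0.9, 0.7


def name_similarity(a: str, b: str) -> float:
    """Similarity of two element names under the rules above (case-insensitive)."""
    a, b = a.lower(), b.lower()
    if a == b:
        return SIM_IDENTICAL
    if frozenset({a, b}) in SYNONYMS:
        return SIM_SYNONYM
    if (a, b) in HYPERNYMS or (b, a) in HYPERNYMS:
        return SIM_HYPERNYM
    return 0.0


def linguistic_match(sc1, elems1, sc2, elems2, threshold=0.5):
    """Emit (SC1.x, SC2.y, p, s) tuples; p is always "true" in this sketch."""
    matches = []
    for x, y in product(elems1, elems2):
        left, right = f"{sc1}.{x}", f"{sc2}.{y}"
        if (left.lower(), right.lower()) in HOMONYMS:
            continue  # the rule base says these identically named elements differ
        s = name_similarity(x, y)
        if s >= threshold:
            matches.append((left, right, "true", s))
    return matches


for rule in linguistic_match("SC1", ["Salary", "Employee", "Title"],
                             "SC2", ["Pay", "Person", "Title"]):
    print(rule)
# ('SC1.Salary', 'SC2.Pay', 'true', 0.9)
# ('SC1.Employee', 'SC2.Person', 'true', 0.7)
```

On name alone, the two Title elements would score 1.0; only the inter-schema homonym rule keeps them apart, which illustrates why purely name-based matching benefits from a rule base.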
The hand-crafted linguistic rules may deal with capitalization, abbreviations, concept relationships, etc. In most cases, the rule base contains both intra-schema and inter-schema rules. Element name similarities can also be determined automatically. For example, COMA uses the following techniques to determine the similarity of two element names: affixes (common prefixes and suffixes), n-grams, edit distance, and the Soundex code.
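The four name-similarity techniques listed for COMA can be sketched as follows. These are common textbook formulations (longest common prefix/suffix, a trigram Dice coefficient, Levenshtein distance, and American Soundex), not COMA's actual implementation; the normalizations and the equal-weight average in `name_similarity` are assumptions made for illustration.

```python
def affix_similarity(a: str, b: str) -> float:
    """Longer of the common prefix / common suffix, relative to the shorter name."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 0.0
    shortest = min(len(a), len(b))
    prefix = 0
    while prefix < shortest and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    while suffix < shortest and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    return max(prefix, suffix) / shortest


def ngrams(s: str, n: int = 3) -> set:
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over the sets of character n-grams (trigrams by default)."""
    ga, gb = ngrams(a.lower(), n), ngrams(b.lower(), n)
    if not ga or not gb:  # names shorter than n: fall back to exact comparison
        return float(a.lower() == b.lower())
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete, and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


_SOUNDEX = {ch: digit for digit, letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
}.items() for ch in letters}


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, e.g. Robert -> R163."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    code, prev = letters[0].upper(), _SOUNDEX.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h/w do not separate equal codes; vowels do
            prev = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")


def soundex_similarity(a: str, b: str) -> float:
    ca, cb = soundex(a), soundex(b)
    return 1.0 if ca and ca == cb else 0.0


def name_similarity(a: str, b: str) -> float:
    """Equal-weight average of the four measures (an assumed combination)."""
    scores = (affix_similarity(a, b), ngram_similarity(a, b),
              edit_similarity(a, b), soundex_similarity(a, b))
    return sum(scores) / len(scores)


for candidate in ("EmployeeName", "EmpNo", "Salary"):
    print(candidate, round(name_similarity("EmpName", candidate), 2))
```

Each measure catches different cases: affixes and n-grams reward abbreviations such as EmpName versus EmployeeName, edit distance tolerates small spelling differences, and Soundex groups names that sound alike. Averaging is only one simple way to combine them.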