
Matching Approaches in Schema Integration
An overview of constraint-based and learning-based matching approaches in schema integration. Constraint-based matchers use schema definitions, instance data, and schema structure to determine the similarity between schema elements, while learning-based matchers treat matching as a classification problem. Hybrid and composite algorithms combine several individual matchers, and the resulting correspondences feed the schema integration step.
Presentation Transcript
Kalyan Chakravarthy Ala Student Id: 1559530 Presentation Id: 17
Constraint-based Matching Approaches
Learning-based Matching
Classifying concepts
Combined Matching Approaches
Schema Integration
Integration methodologies
Schema definitions almost always contain semantic information that constrains the values in the database. In instance-based techniques, the existing value ranges and some patterns in the instance data can be extracted. In structure-based approaches, the structural similarities between the two schemas can be exploited to determine the similarity of the schema elements.
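As a rough illustration of the constraint-based idea, the sketch below scores two columns by the compatibility of their declared data types and by how much their observed numeric value ranges overlap. The `Column` class, the type groups, and the 0.5 score for "compatible but not identical" types are all assumptions made for this example, not part of the presentation.

```python
from dataclasses import dataclass

@dataclass
class Column:
    name: str
    dtype: str    # declared type taken from the schema definition
    values: list  # sample numeric instance values

# Hypothetical groups of mutually compatible types (an assumption).
NUMERIC = {"int", "float", "decimal"}
TEXT = {"char", "varchar", "text"}

def type_similarity(a, b):
    """1.0 for identical types, 0.5 for types in the same group, else 0.0."""
    if a.dtype == b.dtype:
        return 1.0
    for group in (NUMERIC, TEXT):
        if a.dtype in group and b.dtype in group:
            return 0.5
    return 0.0

def range_overlap(a, b):
    """Shared part of the two value ranges divided by their combined span."""
    lo = max(min(a.values), min(b.values))
    hi = min(max(a.values), max(b.values))
    span = max(max(a.values), max(b.values)) - min(min(a.values), min(b.values))
    if span == 0:
        return 1.0  # every value in both columns is the same
    return max(0.0, hi - lo) / span

salary = Column("salary", "int", [1000, 5000])
wage = Column("wage", "float", [2000.0, 6000.0])
print(type_similarity(salary, wage), range_overlap(salary, wage))
```

A real matcher would also look at key constraints and value patterns; this sketch covers only types and ranges.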
Learning-based approaches formulate the problem as one of classification where concepts from various schemas are classified into classes according to their similarity. The similarity is determined by checking the features of the data instances of the databases that correspond to these schemas.
A training set (t) is prepared that consists of example correspondences between the concepts of two databases Di and Dj. The learner uses this training data to acquire probabilistic information about the features of the data sets. The classifier, when given two other database instances (Dk and Dl), then uses this knowledge to go through the data instances in Dk and Dl and predict how the elements of Dk and Dl should be classified.
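The train-then-classify flow described above might be sketched as follows. The presentation does not name a specific learner, so a small naive Bayes classifier is used here purely as an assumption, and the `features` function, the concept labels, and the sample data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def features(value):
    """Hypothetical character-level features of one instance value."""
    s = str(value)
    return [
        "digits" if any(c.isdigit() for c in s) else "no_digits",
        "at_sign" if "@" in s else "no_at_sign",
        "long" if len(s) > 10 else "short",
    ]

class NaiveBayesMatcher:
    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        """examples: iterable of (instance_value, concept_label) pairs."""
        for value, label in examples:
            self.class_counts[label] += 1
            for f in features(value):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)

    def log_score(self, value, label):
        """Log prior plus Laplace-smoothed log likelihood of each feature."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        denom = sum(self.feature_counts[label].values()) + len(self.vocab)
        for f in features(value):
            score += math.log((self.feature_counts[label][f] + 1) / denom)
        return score

    def classify_column(self, values):
        """Pick the concept whose summed log score over all values is highest."""
        totals = {label: sum(self.log_score(v, label) for v in values)
                  for label in self.class_counts}
        return max(totals, key=totals.get)

# Training set t: labelled instances drawn from Di and Dj (made-up data).
matcher = NaiveBayesMatcher()
matcher.train([
    ("alice@example.com", "email"), ("bob@example.org", "email"),
    ("555-1234", "phone"), ("555-9876", "phone"),
])
# Unlabelled column from Dk.
print(matcher.classify_column(["carol@example.net", "dave@example.com"]))
```

Two columns from Dk and Dl that are assigned the same concept would then be proposed as a correspondence.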
The individual matching techniques that we have considered so far have their strong points and their weaknesses. Each may be more suitable for matching certain cases. Therefore, a complete matching algorithm or methodology usually needs to make use of more than one individual matcher. There are two possible ways in which matchers can be combined: hybrid and composite.
Hybrid algorithms combine multiple matchers within one algorithm. For example, a hybrid algorithm may use string matching together with data type and/or structural matching to determine the overall similarity of two elements. Composite algorithms, on the other hand, apply each matcher to the elements of the two schemas individually, obtain individual similarity scores, and then apply a method for combining these scores.
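The difference between the two combination styles can be sketched in a few lines. In this example the element representation (a dict with `name` and `type`), the two individual matchers, the weighted average used by the composite, and the type-veto rule used by the hybrid are all illustrative assumptions; other combination methods are equally possible.

```python
from difflib import SequenceMatcher

def name_matcher(a, b):
    """String similarity of the element names, in [0, 1]."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()

def type_matcher(a, b):
    """1.0 when the declared data types agree, otherwise 0.0."""
    return 1.0 if a["type"] == b["type"] else 0.0

def composite_similarity(a, b, matchers, weights):
    """Composite: run each matcher independently, then combine the scores
    (here with a weighted average, which is only one possible method)."""
    scores = [m(a, b) for m in matchers]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def hybrid_similarity(a, b):
    """Hybrid: the criteria are interleaved inside a single algorithm.
    In this sketch a type mismatch rules the pair out before names are compared."""
    if a["type"] != b["type"]:
        return 0.0
    return name_matcher(a, b)

e1 = {"name": "emp_name", "type": "varchar"}
e2 = {"name": "EmpName", "type": "varchar"}
e3 = {"name": "emp_name", "type": "int"}

print(composite_similarity(e1, e2, [name_matcher, type_matcher], [0.7, 0.3]))
print(hybrid_similarity(e1, e3))
```

With the composite style a new matcher can be added to the list without touching the others, whereas the hybrid style can use one criterion to prune work for another.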
Once schema matching is done, the correspondences between the various local conceptual schemas (LCSs) have been identified. The next step is to create the global conceptual schema (GCS), and this is referred to as schema integration.
Integration methodologies can be classified as binary or nary mechanisms based on the manner in which the local schemas are handled in the first phase.
Binary integration methodologies involve the manipulation of two schemas at a time. These can occur in a stepwise (ladder) fashion where intermediate schemas are created for integration with subsequent schemas or in a purely binary fashion where each schema is integrated with one other, creating an intermediate schema for integration with other intermediate schemas.
Nary integration mechanisms integrate more than two schemas at each iteration. One-pass integration occurs when all schemas are integrated at once, producing the global conceptual schema after one iteration. Benefits of this approach include the availability of complete information about all databases at integration time. Iterative nary integration offers more flexibility (typically, more information is available) and is more general.
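The order in which schemas are combined under each methodology can be sketched as below. Schemas are modelled as dicts from entity name to a set of attribute names, and `merge` is a toy stand-in for a real integration step (it simply unions attributes of same-named entities); both are assumptions made for this example. Because this toy merge has no conflicts to resolve, all three strategies give the same result here, which would not generally hold in practice.

```python
from functools import reduce

def merge(*schemas):
    """Toy integration step: union the attributes of same-named entities."""
    result = {}
    for schema in schemas:
        for entity, attrs in schema.items():
            result.setdefault(entity, set()).update(attrs)
    return result

def ladder(schemas):
    """Stepwise binary: fold each schema into the running intermediate schema."""
    return reduce(merge, schemas)

def pure_binary(schemas):
    """Pure binary: pair schemas up, then pair the intermediate schemas."""
    level = list(schemas)
    while len(level) > 1:
        level = [merge(*level[i:i + 2]) for i in range(0, len(level), 2)]
    return level[0]

def one_pass(schemas):
    """One-pass n-ary: integrate all schemas in a single iteration."""
    return merge(*schemas)

local_schemas = [
    {"Employee": {"id", "name"}},
    {"Employee": {"id", "salary"}},
    {"Dept": {"dno"}},
]
print(ladder(local_schemas))
print(pure_binary(local_schemas))
print(one_pass(local_schemas))
```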