Mapping Maintenance and Data Cleaning Strategies in Dynamic Environments

Explore mapping maintenance and data cleaning techniques in dynamic environments. Understand how to detect and adapt invalid mappings, along with addressing errors in data cleaning processes. Learn about the importance of schema evolution over time and the use of machine learning techniques for automatic detection of invalid schema mappings.

  • Mapping Maintenance
  • Data Cleaning
  • Schema Evolution
  • Machine Learning
  • Dynamic Environments

Presentation Transcript


  1. Chapter 4: Mapping Maintenance and Data Cleaning. Krishna Teja Talasila. Presentation ID #19

  2. Overview: Mapping maintenance. Detecting invalid mappings. Adapting invalid mappings. Data cleaning. Errors in data cleaning.

  3. Mapping Maintenance: Schemas evolve over time in dynamic environments. Structural and constraint changes can make existing schema mappings invalid, so it becomes important to detect invalid or inconsistent schema mappings and to adapt them to the new schema structures and constraints.

  4. Automatic detection of invalid schema mappings becomes desirable as complexity increases. Schema adaptation aims to resolve semantic correspondences using three inputs: known changes in intra-schema semantics, the semantics captured in existing mappings, and detected semantic inconsistencies.

  5. Detecting invalid mappings: Invalid mappings that result from schema changes can be detected either proactively or reactively. In proactive detection environments, schema mappings are tested for inconsistencies as soon as a user makes a schema change. In reactive detection environments, the mapping maintenance system does not know when schema changes are made or what they are. To detect invalid schema mappings in this setting, mappings are tested at regular intervals by running queries against the data sources and translating the resulting data using the existing mappings.
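
The reactive check described above can be sketched as follows. This is a minimal illustration, not the method from the presentation: it assumes a mapping is a plain dictionary from source attribute names to target attribute names, and the names `translate`, `probe_mapping`, and `query_source` are hypothetical. A scheduler (not shown) would call `probe_mapping` at regular intervals.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Mapping = Dict[str, str]  # assumed form: source attribute -> target attribute


def translate(row: Row, mapping: Mapping) -> Row:
    """Translate one source row; raises KeyError if a mapped attribute is missing."""
    return {target: row[source] for source, target in mapping.items()}


def probe_mapping(query_source: Callable[[], List[Row]], mapping: Mapping) -> List[str]:
    """Query the source, try to translate each row, and report attributes that failed."""
    failures = []
    for row in query_source():
        try:
            translate(row, mapping)
        except KeyError as err:
            failures.append(err.args[0])  # name of the missing source attribute
    return sorted(set(failures))


# Hypothetical source whose "cust_name" column was renamed without notice.
def query_source() -> List[Row]:
    return [{"id": 1, "customer_name": "Ada"}, {"id": 2, "customer_name": "Lin"}]


mapping = {"id": "customer_id", "cust_name": "name"}
problems = probe_mapping(query_source, mapping)
if problems:
    print("mapping looks invalid; missing source attributes:", problems)
```

A failed translation is only one possible signal. A real system would likely also inspect the translated values, since a mapping can stay syntactically valid while producing wrong data.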

  6. Machine learning techniques: Build an ensemble of trained sensors to detect invalid mappings. Each sensor reports its own finding, and a weighted combination of the individual findings is then calculated. If the combined result indicates a change, an alert is generated.
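
The weighted combination can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the two sensors are hand-written checks rather than trained models, and the weights, the `0.5` threshold, and the field names `name` and `customer_id` are invented.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]
Sensor = Callable[[Sequence[Row]], float]  # returns a suspicion score in [0, 1]


def empty_name_sensor(rows: Sequence[Row]) -> float:
    """Fraction of translated rows whose 'name' field is missing or empty."""
    if not rows:
        return 1.0
    return sum(1 for r in rows if not r.get("name")) / len(rows)


def non_integer_id_sensor(rows: Sequence[Row]) -> float:
    """Fraction of translated rows whose 'customer_id' is not an integer."""
    if not rows:
        return 1.0
    return sum(1 for r in rows if not isinstance(r.get("customer_id"), int)) / len(rows)


def combined_score(rows: Sequence[Row], weighted: List[Tuple[Sensor, float]]) -> float:
    """Weighted average of the individual sensor scores."""
    total_weight = sum(weight for _, weight in weighted)
    return sum(weight * sensor(rows) for sensor, weight in weighted) / total_weight


def should_alert(rows: Sequence[Row], weighted: List[Tuple[Sensor, float]],
                 threshold: float = 0.5) -> bool:
    """Raise an alert when the combined score reaches the (assumed) threshold."""
    return combined_score(rows, weighted) >= threshold


# Assumed weights; in a trained ensemble these would be learned.
sensors = [(empty_name_sensor, 0.6), (non_integer_id_sensor, 0.4)]

rows_ok = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Lin"}]
rows_bad = [{"customer_id": "1", "name": None}, {"customer_id": "2", "name": ""}]
```

With these inputs, `should_alert(rows_ok, sensors)` is false and `should_alert(rows_bad, sensors)` is true, because every sensor flags every row in `rows_bad`.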

  7. Adapting invalid mappings: Once invalid schema mappings are detected, they must be adapted to the schema changes and made valid again. There are two high-level mapping adaptation approaches. Fixed rule approaches define a re-mapping rule for every type of expected schema change. Map bridging approaches compare the original schema S with the updated schema S1 and generate a new mapping from S to S1.
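
A fixed rule approach can be sketched as a lookup table with one re-mapping rule per expected change type. The change types (`rename_attribute`, `drop_attribute`), the dictionary layout of a change, and the function names below are assumptions chosen for illustration, not part of the presentation.

```python
from typing import Callable, Dict

Mapping = Dict[str, str]   # assumed form: source attribute -> target attribute
Change = Dict[str, str]    # assumed form: {"type": ..., plus type-specific fields}


def on_rename(mapping: Mapping, change: Change) -> Mapping:
    """Re-key the mapping entry when a source attribute is renamed."""
    updated = dict(mapping)
    if change["old"] in updated:
        updated[change["new"]] = updated.pop(change["old"])
    return updated


def on_drop(mapping: Mapping, change: Change) -> Mapping:
    """Remove the mapping entry when a source attribute is dropped."""
    return {src: tgt for src, tgt in mapping.items() if src != change["attribute"]}


# One rule per expected schema change type.
RULES: Dict[str, Callable[[Mapping, Change], Mapping]] = {
    "rename_attribute": on_rename,
    "drop_attribute": on_drop,
}


def adapt(mapping: Mapping, change: Change) -> Mapping:
    """Apply the rule registered for this change type, or fail loudly if none exists."""
    rule = RULES.get(change["type"])
    if rule is None:
        raise ValueError(f"no re-mapping rule for change type {change['type']!r}")
    return rule(mapping, change)


old_mapping = {"id": "customer_id", "cust_name": "name"}
change = {"type": "rename_attribute", "old": "cust_name", "new": "customer_name"}
new_mapping = adapt(old_mapping, change)
```

The `ValueError` branch shows the main limitation of this style: a change type that nobody anticipated has no rule and cannot be handled automatically.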

  8. Data Cleaning: Errors can always occur in source databases, and they must be cleaned before user queries can be answered correctly. In data warehouses, cleaning is performed as the global database is created. In data integration systems, cleaning must be performed during query processing, when data are returned from the source databases.

  9. Errors in Data Cleaning: Schema-level. Instance-level.

  10. Schema-level problems can arise in each individual local conceptual schema (LCS) due to violations of explicit and implicit constraints. These problems need to be identified at the schema matching stage and fixed during schema integration. Instance-level errors are those that exist at the data level.

  11. The popular approach to data cleaning has been to define a number of operators that work either on schemas or on individual data items. Given the large amount of data that needs to be handled, data-level cleaning is expensive, and efficiency is a significant issue.
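
Data-level cleaning operators can be sketched as small functions that are chained over each row. The two operators, the `country` alias table, and the sample row below are hypothetical. The generator processes one row at a time, which is one simple way to limit memory use on large inputs, though it does not address every efficiency concern.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Operator = Callable[[Row], Row]

# Assumed alias table for illustration only.
COUNTRY_ALIASES = {"usa": "US", "u.s.": "US", "united states": "US"}


def trim_strings(row: Row) -> Row:
    """Strip leading and trailing whitespace from every string value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def normalize_country(row: Row) -> Row:
    """Replace known spellings of a country with one canonical code."""
    value = row.get("country")
    if isinstance(value, str):
        return {**row, "country": COUNTRY_ALIASES.get(value.lower(), value)}
    return row


def clean(rows: Iterable[Row], operators: List[Operator]) -> Iterator[Row]:
    """Apply the operators in order to each row, yielding rows one at a time."""
    for row in rows:
        for operator in operators:
            row = operator(row)
        yield row


dirty = [{"name": "  Ada ", "country": " USA "}]
cleaned = list(clean(dirty, [trim_strings, normalize_country]))
```

Operator order matters here: `normalize_country` only recognizes `" USA "` after `trim_strings` has removed the surrounding spaces.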

  12. Thank You
