Optimizing Data Generation and Database System Constraints

"Explore the challenges of testing data constraints in database systems and the need for more realistic datasets. Learn about designing databases, constraints, data types, and generating region-consistent data. Discover methods for ensuring randomness and uniqueness. Dive into frontend and backend frameworks for efficient data handling."

  • Data Generation
  • Database Systems
  • Realistic Datasets
  • Data Constraints
  • Frontend Backend

Presentation Transcript


  1. P15: Data Set Generator. Team: Li Xiangqun, Wu Xudong, Wu Dan, Yu Fangzhou

  2. Motivation: Existing test data does not support the constraints of database systems, and the datasets are not realistic enough.

  3. Goals
     • Realistic data sets (sample input: Chinese, 21 < age < 41)
     • Data range covering Southeast Asia and East Asia
     • Enforce data integrity constraints

  4. Database Design: Assumptions
     • Each country uses a different phone country code, i.e. country -> country code
     • Countries may use different languages, and one country may use more than one language
     • Language and gender affect first names and last names
     • Different countries may have different email domains

  5. Database Design: 3NF, small relations

  6. Frontend
     • Framework: Twitter Bootstrap front-end framework
     • Language: HTML and JavaScript

  7. Data Types
     With region-consistency constraint:
     • Regional: Name, Email, Phone, Country
     • Non-regional: Name, Email, Phone, Country, Gender, String, Integer, Float, Date
     With uniqueness constraint:
     • Regional, unique: Name, Email, Phone
     • Regional, non-unique: Name, Email, Phone, Country
     • Non-regional, unique: Name, Email, Phone, Country, Gender, String, Integer, Float, Date
     • Non-regional, non-unique: Name, Email, Phone, Country, Gender, String, Integer, Float, Date

  8. Constraints
     Region-consistency:
     • Regional data generator
     • Non-regional data generator
     Randomness and uniqueness:
     • Randomly generate data and use a hash table to check uniqueness
     • Generate a permutation of unique data and use a shuffle algorithm to ensure randomness
     Distribution:
     • Uniform: use the random function
     • Normal: Box-Muller transform (U1 and U2 uniformly distributed in the interval (0, 1))

  9. Backend

  10. Backend Problems
     • Inserting data into the database is too slow
     • Processing time is too long
     • The amount of data is limited to 10 thousand

  11. Backend

  12. Backend Improvements
     • Processing is faster
     • Drawback: cannot generate very large amounts of data

  13. Features
     • User-friendly UI
     • Performance: generates data very quickly, in under 10 seconds even on a weak server
     • Outputs CSV, a format that many testing programs accept
     • Supports enforcing database constraints
     • Realistic data and results

  14. Conclusion
     • We can generate regional data in several data types
     • We can ensure uniqueness of data if required
     • We can generate normally distributed numeric data
     • The data generator consumes a lot of computing power; larger data sets require more computing power
     • More improvements can be made: multiple tables with foreign key constraints, and more output file formats
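
The two uniqueness strategies and the Box-Muller transform named on slide 8 can be illustrated with a short Python sketch. This is not the team's implementation, which the transcript does not show; the function names (`unique_by_rejection`, `unique_by_shuffle`, `box_muller`), their parameters, and the retry limit are hypothetical choices made for illustration only.

```python
import math
import random


def unique_by_rejection(draw, count, max_attempts=1_000_000):
    """Strategy 1: draw random values and reject any already seen.

    `draw` is a zero-argument callable returning one hashable value.
    A set (a hash table) makes each uniqueness check O(1) on average.
    """
    seen = set()
    result = []
    attempts = 0
    while len(result) < count:
        attempts += 1
        if attempts > max_attempts:
            # Assumed safeguard: stop if the value space is too small.
            raise ValueError("could not find enough unique values")
        value = draw()
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


def unique_by_shuffle(candidates, count, rng=random):
    """Strategy 2: enumerate distinct values, shuffle, take the first `count`."""
    pool = list(dict.fromkeys(candidates))  # drop duplicates, keep order
    if count > len(pool):
        raise ValueError("not enough distinct candidates")
    rng.shuffle(pool)  # uniform random permutation, in place
    return pool[:count]


def box_muller(mean=0.0, stddev=1.0, rng=random):
    """Return one normally distributed sample via the Box-Muller transform."""
    u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is defined
    u2 = rng.random()
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mean + stddev * z


if __name__ == "__main__":
    phones = unique_by_rejection(lambda: random.randint(10_000_000, 99_999_999), 5)
    ids = unique_by_shuffle(range(1, 101), 5)
    ages = [round(box_muller(mean=31, stddev=3)) for _ in range(5)]
    print(phones, ids, ages)
```

Rejection sampling suits large value spaces where collisions are rare, while the shuffle approach never retries but needs the whole candidate pool in memory, which may relate to the data-volume limits mentioned on slides 10 and 12.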
