Unraveling the Power of Database Constraints - Improve Data Quality and Integrity

tame your unruly data n.w
1 / 36
Embed
Share

Discover the significance of database constraints in maintaining data integrity and quality. Learn why constraints are essential to prevent bad data and ensure proper data types, uniqueness, and validity. Explore the reasons behind underutilization of constraints and their benefits in enhancing data management practices.

  • Database
  • Constraints
  • Data Quality
  • Integrity
  • Database Management

Uploaded on | 2 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Tame Your Unruly Data! ....With Constraints! Rob Volk

  2. Speaker Bio / Contact Rob Volk SQL Server DBA since v6.5 ...and even older DBs... SQL Saturday Atlanta Organizer SQLTeam http://weblogs.sqlteam.com/robv/ Twitter: @sql_r Email: sql_r@outlook.com

  3. Just like Jimi Hendrix We love to get feedback Please complete the session feedback forms

  4. SQLBits - It's all about the community... Please visit Community Corner, we are trying this year to get more people to learn about the SQL Community, equally if you would be happy to visit the community corner we d really appreciate it.

  5. Apologies for the acoustics Please ask me to speak up if I drift off or mic acts up Please ask me to repeat questions if I forget

  6. What are Constraints? constraint |k n str nt| noun a limitation or restriction : the availability of water is the main constraint on food production | time constraints make it impossible to do everything. Constraints maintain the integrity of your database Domain what kind of data types/attributes Entity what kind of entities/objects/rows Relational control relations between entities Constraints don t allow good data, they prevent bad data But be careful about what they allow!

  7. Why Dont We Use Constraints? Developer unawareness - didn t know they re available Cross-database integrity not supported (use triggers instead) Multi-platform support (or lack thereof) They slow the database down! And the #1 reason...

  8. My App Can Do That! Yeah, that's what the previous dev team said... The team whose app you're currently rewriting... That isn't documented anywhere... That left all the bad data in the database... That you're writing huge data cleansing code to fix...

  9. What are constraints good for? Improve data quality & integrity Proper types Ensure data is required (not null) Prevent duplicate data (primary key & unique) Apply basic rules, e.g. postal code, phone number (check) Handle missing data (default) Ensure related tables have valid data (foreign keys) Document your data model! (discovery script) Improve performance (???) - YES! Reduce application and SQL coding! Reduce need for data cleansing (DQS, MDM, ETL/data warehouse)

  10. What are constraints NEEDED for? I mean, besides Good Database Design Full Text Indexes XML Indexes Hekaton/In-memory tables (SQL 2014+) Transactional Replication Change Tracking and Change Data Capture Partitioned views

  11. What happens without constraints? Duplicate rows Incomplete phone numbers or postal codes Still invalid if padded or trimmed Sales to non-existent customer accounts Did they get the merchandise? Did they pay us? WHO ARE THESE PEOPLE? February 31* Lots of manual data cleansing/fixing/validation More code!

  12. Show us! Constraint discovery script! Demo performance improvements! Uniqueness Foreign Key Check constraints Partitioned views Trusted vs. untrusted Demo integrity! Sales Commission Dupe Identity SalesOrderDetail - Your data tells a story!

  13. Domain Constraints Domain fancy name for type or attribute (column in SQL) Data type - what kind of data you re storing (number, date, etc.) Nullability - whether it s required or not Default Check - what values are valid and invalid

  14. Domain Constraints - Types First constraint: Data type Choose the right type VARCHAR(max) is a bad sign Float or real -> Bad! Money -> Better (but still problematic) Dates (use proper date/time types) Good data type = storage and performance efficiency Andy Yun - Every Byte Counts https://www.youtube.com/watch?v=zSNEaCujxmE Important for Hekaton too

  15. Domain Constraints NOT NULL Limit or eliminate nullable columns Nulls are not comparable, even to themselves Not applicable vs. unknown Which one does Null represent? Nulls cannot be typed properly (or at all) Read Chris Date In SQL Server, nulls have storage overhead For 2008+ the overhead is fixed, except for sparse columns

  16. Domain Constraints - Default Use this default when value is unknown or unsupplied N/A vs. Unknown can be addressed with defined values Can be useful in cascading foreign keys SET DEFAULT supported in SQL Server 2005+

  17. Domain Constraints - CHECK Restrict values further than what data type allows Dates Numeric/Money Character Partitioned views partition elimination Same performance enhancement as partitioned tables Mike Fal - Eating the Elephant http://www.mikefal.net/presentations_page/ Read Alex Kuznetsov https://www.red-gate.com/simple-talk/author/alex-kuznetsov/

  18. Domain Constraints - External Very often ignored, not all values are valid (special meaning) Postal Codes Phone numbers Social Security/Tax IDs Doesn t matter if external system can t provide invalid value Verify it anyway Read Karen L pez (@DataChick) http://www.datamodel.com/index.php/category/blog/

  19. Domain Constraints - Internal Very often ignored (and rarely documented) $10,000 utility bills for single (unoccupied) apartment 99 occupants for single room apartment (legacy calculation support) Percentage values (tax rate, discount) See AdventureWorks Sales tables (smallmoney) App could handle, but then must also fix if wrong

  20. Entity Constraints Entity fancy name for row or record (tuple even fancier) Primary Key - value(s) that identify the entity Unique - value(s) that must be unique Table-level check - values that are valid or invalid One or more domains/attributes/columns

  21. Entity Constraints Primary Key Uniquely identifies the row/record/tuple/entity Must provide a value (NOT NULL by definition) Logical construct, NOT physical Surrogate vs. natural key (e.g. dimensional tables)

  22. Entity Constraints - Unique Candidate keys (natural keys) Phone number is unique, but not good candidate for PK Screen Actors Guild (billed/credited name) SSN, Credit card # (unique but can t be used for ID) Multiple columns are unique Constraint vs. index (logical vs. physical) Unique filtered index on nullable column

  23. Entity Constraints - CHECK Multi-column conditions Pregnancy status (N/A for males or young girls) Unless you're British...http://goo.gl/msSeP Single table only ANSI/ISO supports subqueries...no one else does

  24. Relational Constraints Foreign keys Requires data in one table to reference another table s data Referenced table (parent) Referencing table (child) Can be one OR MORE columns Referential actions on UPDATE/DELETE in referenced

  25. Performance UNIQUE/PRIMARY keys can improve performance Can also save space in index structures CHECK and Foreign Key Eliminate data access if WHERE clause violates CHECK constraint Must be trusted (next slides)

  26. Trusted Constraints ALTER TABLE CHECK/NOCHECK CONSTRAINT CHECK WITH CHECK Performance improvement

  27. Untrusted Constraints Constraint could be enabled but data could be invalid Performance hit How to detect: sys.check_constraints.is_not_trusted = 1 sys.foreign_keys.is_not_trusted = 1 Primary key and Unique are always trusted

  28. Cool Constraint Use Alex Kuznetsov: Time overlap prevention: http://goo.gl/K5j4hK Prevent updates if related activity: http://goo.gl/V1axJ4 RedGate book Defensive Database Programming http://goo.gl/dDEGUj Chapters 6 & 7 for constraints And maybe my lunatic rantings: http://goo.gl/p23NL

  29. Considerations Be wary of meets our current needs Be wary of will also meet all possible future needs Aaron Bertrand Bad Habits to Kick blog If there is a standard defined for your industry, use it (ISO) Sanity check, make sure you covered everything If you must deviate from the standard, document why Multi-platform support Doomed Least-feature support = baseline Constraints (if any) go into app logic Maintenance - use DBCC CHECKCONSTRAINTS! Thanks to Neil Hambly for reminding me!

  30. Can you have too many? IMHO, no What about performance? Layers? UI + App logic + stored procedure + triggers + constraints Don t assume a single layer can do all the work What s the cost of fixing bad data? And will it actually get fixed? Importing/migrating bad data or design Cruft from old systems (Ariane 501) ETL transform bad data, but don t constrain it in destination Never time to do it right, Always time to do it over -Ron Soukup, SQL Server PM

  31. Butbut Devs say Application does everything And DB constraints will make sure it works Then why use SQL Server? Write your own storage engine. Or use NoSQL. The data will be right eventually. NoSQL reliability: http://aphyr.com/tags/jepsen In 5 years we ll rewrite your app in Ocaml, or Scala, or Blub Moral: you re using SQL Server anyway, use what it offers Constraints won t interfere Even if you switch RDBMS, USE CONSTRAINTS! Credit: Andy Leonard & Buck Woody

  32. And if you think Imstrict SQLite testing procedure: http://www.sqlite.org/testing.html 1177x as much test code as program code Space Shuttle Software Development: https://www.fastcompany.com/28121/they-write-right-stuff 11 versions * 400K+ lines each = 17 errors total F-35 (JSF) Software coding standard: http://www.stroustrup.com/JSF-AV-rules.pdf 141 pages on C++ programming guidelines Which is really ironic Data Integrity is its own reward http://goo.gl/gZ3TUk Mars Spacecraft (Curiosity, Opportunity/Spirit, Phoenix) Extremely limited resources (20 MHz/10 MB RAM)

  33. But really, how bad could it be? Ariane Flight 501 (16 bit integer overflow) Mars Climate Orbiter (metric vs. US) USS Yorktown (1997, divide by zero) F-22 (crashes, Intl. Date Line) F-35 (grounded by mechanical, oxygen, needs 8M+ lines codefix) Qantas Flight 72 (didn t crash, nose dived twice, injuries) Patriot Missile (clock drift, missed target, soldiers died) Therac-25 (deaths) Not data but process, safeguards missing/misprogrammed More: http://goo.gl/4Akqm

  34. Conclusion Performance & integrity benefits of constraints Types and definition of constraints Domain, Entity, Relational Assume layers/components/processes will fail or change Protect data and DB integrity with constraints Constraints document integrity rules Use them! And be sure to check on them (DBCC CHECKCONSTRAINTS)

  35. References Books Online & MSDN Books Beginning SQL Server 2008 for Developers: From Novice to Professional by Robin Dewson (Apress) Pro SQL Server 2012 Relational Database Design and Implementation by Louis Davidson, with Jessica Moss(Apress) Microsoft SQL Server 2012 Internals by Kalen Delaney,Paul S. Randal, Conor Cunningham, Jonathan Kehayias, et. al. (MSPress) Defensive Database Programming by Alex Kuznetsov (Red Gate Books, PDF link below) A Developer s Guide to Data Modeling for SQL Server, by Eric Johnson and Joshua Jones (Addison-Wesley) SQL Antipatterns: Avoiding the Pitfalls of Database Programming by Bill Karwin (Pragmatic Bookshelf) SQL and Relational Theory: How to Write Accurate SQL Code by C. J. Date(O Reilly) SQL Server MVP Deep Dives by SQL Server MVPs for War Child International (Manning) SQL Server MVP Deep Dives 2 by SQL Server MVPs for Operation Smile (Manning) T-SQL Fundamentals by Itzik Ben-Gan (MSPress) Blogs/Sites http://weblogs.sqlteam.com/ http://sqlblog.com/ (lots of great people) http://sqlskills.com/ (Paul Randal, Kimberly Tripp, Jonathan Kehayias, Bob Beuchemin, Erin Stellato) http://tsql.solidq.com/ (Itzik Ben-Gan) http://www.simple-talk.com/sql/database-administration/constraints-and-the-test-driven-database/ http://www.simple-talk.com/books/sql-books/defensive-database-programming/ http://datachix.com/2010/09/09/can-you-trust-your-foreign-key-or-feature-821-i-didnt-already-know-about-sql-server/ (Datachix - Trusted Constraints) https://sqlserverfast.com/blog/hugo/2007/03/can-you-trust-your-constraints/ http://blogs.sqlsentry.com/aaronbertrand/bad-habits-revival/ Aaron Bertrand Bad Habits to Kick data type issues

  36. Thank You Sponsors! Organizers & Volunteers! YOU!

Related


More Related Content