
Mastering MDM: Key Insights for Data Scientists
Learn from Malcolm T. Hawker, Head of Data Strategy at Profisee, about the critical aspects of Master Data Management (MDM) that data scientists need to know. Discover how MDM enhances data governance, improves analytics, and streamlines business operations through analytical and operational approaches. Gain insights into the significance of MDM in ensuring data uniformity, accuracy, and consistency across organizational data assets.
Presentation Transcript
What Data Scientists Need to Know About MDM
Malcolm T. Hawker, Head of Data Strategy, Profisee
Presented to the Data Science Dojo, 2/16/2023

DEFINITION OF MDM
"A technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise's official, shared master data assets."*
* Gartner
Highlights
- MDM is a discipline first, a data management technology second.
- You can have data governance without MDM, but you can't have MDM without data governance.
- MDM is all about applying consistent governance policies to data that is widely *shared* across the org.

WHAT PROBLEMS DOES MDM SOLVE?
- MDM improves both analytics and business operations/efficiencies using a data hub, which is referenced by analytical and operational systems, to solve for the "single version/source of truth" problem.
- As organizations grow over time, it's typical to have competing versions of the same data object by business function / silo.
- Inconsistent, inaccurate, incomplete, or duplicated data is typically replicated from each source into analytics environments.
- By deeply integrating into the applications that create or update data, MDM can solve for the "garbage in" problem.

TWO APPROACHES TO MDM: ANALYTICAL MDM
o Apply one (or more) set of governance policies to allow duplicate versions of a given data object (customer, product, location, etc.) to be grouped together under a single ID, for the purpose of correctly and consistently aggregating data for that object.
o You can create a "stub" of a master record and persist it in an MDM "hub", but it's not required. What is persisted are the links between the master record and all the child records, which can then be consumed in downstream analytical processes.
o Solves for "360 of something" but does not change any source data. Most of the heavy lifting here involves complex entity and/or identity resolution.
o Technically any DB can act as an MDM persistence layer, so long as the rules used to create the master record are consistently applied to source records.
Pro: can be done quickly, doesn't require any data cleanup or data stewardship, and can drive significant business value without requiring any changes to business processes.
Con: doesn't fix data and doesn't stop the flow of "garbage in".
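
The analytical pattern above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the source-system names, and the match rule (same normalized email means same customer) are all hypothetical, and real entity resolution is far more involved. What it shows is that only the links between source records and a master ID are persisted, and that the source data is never changed.

```python
from collections import defaultdict

# Hypothetical source records: (source_system, source_id, name, email, amount)
RECORDS = [
    ("crm", "C-1", "Acme Corp", "Billing@Acme.com", 100.0),
    ("erp", "E-7", "ACME Corporation", "billing@acme.com ", 250.0),
    ("erp", "E-9", "Globex", "ap@globex.com", 75.0),
]


def match_key(email: str) -> str:
    """Naive match rule (an assumption): same normalized email, same entity."""
    return email.strip().lower()


def build_links(records):
    """Return {(source, source_id): master_id}. Source records are not modified."""
    master_ids = {}  # match key -> master ID
    links = {}
    for source, source_id, _name, email, _amount in records:
        key = match_key(email)
        # Reuse the master ID for a known key, otherwise mint the next one.
        master_id = master_ids.setdefault(key, f"M-{len(master_ids) + 1}")
        links[(source, source_id)] = master_id
    return links


def total_by_master(records, links):
    """Aggregate amounts per master ID instead of per duplicate source record."""
    totals = defaultdict(float)
    for source, source_id, _name, _email, amount in records:
        totals[links[(source, source_id)]] += amount
    return dict(totals)


links = build_links(RECORDS)
totals = total_by_master(RECORDS, links)
```

Here the two Acme records end up under one master ID, so downstream aggregation counts them as a single customer, while the CRM and ERP rows themselves stay exactly as they were.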

TWO APPROACHES TO MDM: OPERATIONAL MDM
o Apply one (or more) set of governance policies to allow duplicate versions of a given data object (customer, product, location, etc.) to be clustered together.
o Using business rules defined by a governance organization, the multiple records are then consolidated down to a single master record. To enable roll-back, source records are generally not deleted, but new master records are created and persisted.
o Master records are then syndicated down into source systems for consumption in core business processes. Reference data is essentially a type of Operational MDM.
o Is the classic "single version of truth" approach and remains popular in manufacturing and other industries that demand high degrees of control over data.
Pro: will deliver the most value by ensuring consistency across business processes leveraging that data. Bad data is fixed, typically at the source.
Con: can be highly disruptive and time consuming. Requires high levels of MDM and governance maturity, and is increasingly seen as "old school", particularly by acolytes of more decentralized patterns of data management.
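
A minimal sketch of the consolidate-then-syndicate flow described above, assuming the records have already been clustered. The field names and the survivorship rule (the most recently updated non-null value wins) are assumptions chosen for illustration; in practice a governance organization defines these rules.

```python
from datetime import date

# Hypothetical cluster of source records already matched to one customer.
CLUSTER = [
    {"source": "crm", "id": "C-1", "name": "Acme Corp", "phone": None,
     "updated": date(2022, 5, 1)},
    {"source": "erp", "id": "E-7", "name": "ACME Corporation", "phone": "555-0100",
     "updated": date(2023, 1, 15)},
]

ATTRIBUTES = ("name", "phone")


def consolidate(cluster, master_id):
    """Build one master record. Assumed rule: newest non-null value wins."""
    newest_first = sorted(cluster, key=lambda r: r["updated"], reverse=True)
    master = {"master_id": master_id}
    for attr in ATTRIBUTES:
        master[attr] = next(
            (r[attr] for r in newest_first if r[attr] is not None), None
        )
    # Keep lineage so the merge can be rolled back; sources are not deleted.
    master["sources"] = [(r["source"], r["id"]) for r in cluster]
    return master


def syndicate(master, cluster):
    """Return updated copies of the source records carrying the mastered values."""
    mastered = {attr: master[attr] for attr in ATTRIBUTES}
    return [{**r, **mastered, "master_id": master["master_id"]} for r in cluster]


master = consolidate(CLUSTER, "M-1")
synced = syndicate(master, CLUSTER)
```

The sketch returns new copies rather than overwriting `CLUSTER`, which mirrors the roll-back point on the slide: the original source values remain available if a merge turns out to be wrong.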

TWO APPROACHES TO MDM
[Diagram: two panels, "Analytical MDM" and "Operational MDM". Each panel shows Source 1, Source 2, ... Source X, an MDM Hub, and Analytics Platforms.]

THREE MDM IMPLEMENTATION STYLES
1. Consolidation Style: the system of record is the source systems; the MDM hub persists the links between the source systems and a master record. Widely used in analytical MDM.
2. Centralized Style: the system of record is the MDM hub, which persists a "gold master" record. Reference data typically follows a centralized style. The typical "hub and spoke", "old school" approach that is falling out of favor due to the controls it typically requires.
3. Registry Style: an *extremely* rare form of MDM where the hub acts as a sophisticated integration hub and the source systems are the system of record. Used to be widespread in healthcare but has not been spotted in the wild in years. Could make a comeback thanks to data fabric architectures with highly virtualized approaches to data management.
4. Hybrid: some combination of the above. Most large companies have combinations of consolidation styles and centralized styles depending on the use case, industry, and data types.

WHAT MAKES MDM SOFTWARE UNIQUE
Also known as "critical capabilities", these features/functions must be present in a solution to be considered enterprise-class MDM:
1. Augmented MDM capabilities (AI/ML, Graph, other advanced data mgt)
2. Support for multiple domains (customer, product, location, etc.)
3. Data quality capabilities, inc. complex entity resolution
4. Workflow capabilities
5. Data governance capabilities
6. UIs for data stewards
7. Configurable data modelling and persistence
8. Data integration capabilities
9. Complex hierarchy / relationship management
10. Support for multiple implementation styles and approaches
Many systems claim to support MDM, but unless they check all of the above boxes, they fall short of what most larger companies will need from MDM.

SOLUTIONS CLAIMING TO SUPPORT MDM BUT CAN'T
1. Data Warehouses, Swamps, Lakes, Lakehouses, Boathouses, etc.: Having data in a single place, with some access controls and permissions on it, is a great start, but it's not MDM.
2. Customer Data Platforms (CDPs): Built for, and used exclusively by, marketing users, CDPs fall significantly short on data quality, governance, and hierarchy mgt.
3. CRMs: As much as CRM vendors think they can enable "360 views", those insights are typically limited only to the customer domain, supporting a single use case (sales).
4. Data Governance or Quality Platforms (Collibra, Alation, Purview, etc.): Most only check two or three of the "must have" boxes for MDM, but are important pieces of an entire data estate.

MDM IS NOT DEAD
o For the last twenty years, the coming of every new data fad has foretold the demise of MDM. It has never happened, and won't happen so long as companies need a way to share data within operational systems, at scale.
o Data mesh and data fabric both need MDM, and both have failed to present any compelling technical alternative to allow for widespread data sharing across domains at scale, and in compliance with governance policies.
Semantic layer + operational governance + scale + widespread integration = ???
o Digital transformation efforts have caused a drastic increase in the amount of transactional data which companies want to use to drive deeper levels of insights on their customers/products/employees, etc. This has put massive pressure on companies to increase their MDM skills and maturity levels.
o AI/ML, Graph, and other new technologies are enabling new levels of automation in MDM, and are exposing new insights (esp. around complex relationships in master data), thereby increasing the demand for MDM as a discipline and a technology.

BUT, MDM IS CHANGING
o "Single version of truth" is less and less relevant, although MDM is still an effective "single source of truth". Data hubs and limited data replication still enable scale.
o Context is becoming king. More mature organizations are implementing solutions to support multiple versions of the truth based on context, which is difficult to do in the realm of operational uses of data. Remember, MDM supports both analytical AND operational uses of data.
o More adaptive forms of data governance are required to operationalize multiple versions of truth. The business rules for using master data this way are complex.
o AI/ML is expanding the boundaries of data that has historically been considered for inclusion into MDM hubs, and is adding new layers of automation in master data mgt (inc. modelling, discovery, etc.) and valuable new business insights.
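
One way to picture "multiple versions of the truth based on context" is a master record with a shared core plus per-context overrides. The sketch below is an assumption about how that could be modelled, not a description of any particular MDM product; the record layout and context names are hypothetical.

```python
# Hypothetical master record: shared core values plus per-context overrides.
MASTER = {
    "master_id": "M-1",
    "core": {"name": "ACME Corporation", "address": "1 Main St, Springfield"},
    "contexts": {
        "billing": {"address": "PO Box 42, Springfield"},
        "shipping": {"address": "Dock 3, 9 Harbor Rd, Springfield"},
    },
}


def view(master, context=None):
    """Return the version of the record for a context, falling back to the core."""
    overrides = master["contexts"].get(context, {})
    # Later keys win, so context-specific values replace core values.
    return {"master_id": master["master_id"], **master["core"], **overrides}


billing_view = view(MASTER, "billing")
default_view = view(MASTER)
```

Even in this toy form, the governance question from the slide shows up: someone has to decide which contexts exist, which attributes may differ between them, and who may change each override.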

FUTURE MDM
o More automation of data management and governance
  - Auto detection of what is / what isn't master data, based on insights from active metadata
  - Automated development of master data models and hierarchies
o Virtualized master data hubs, mastery of source data "in situ", flexible data models
o Deeper integration into other components of the data estate, including becoming a critical component of any fully cloud-based data ecosystem
o Blockchain enabled governance and inter-company data sharing, creation of data sharing ecosystems (with shared governance and stewardship) enabled by MDM solutions

SOUND FAMILIAR?
In order to build more accurate or robust models or analytical insights, are some of you being asked to build custom solutions to solve for MDM use cases or capabilities which could be resolved with your company implementing an MDM program and supporting software?
- Data quality problems
- Complex entity resolution / matching
- Customized business rules or processes to resolve data discrepancies
- Data normalization / transformation and integration
- Etc., etc., etc.
If yes, maybe you need MDM.
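
For a sense of what the custom matching work in that list tends to look like, here is a small sketch using only Python's standard library. The suffix list and the 0.85 threshold are illustrative assumptions, and string similarity alone is a long way from the complex entity resolution an MDM platform provides.

```python
from difflib import SequenceMatcher

# Illustrative (not exhaustive) company suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "corporation", "llc", "ltd"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop common suffixes."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(word for word in cleaned.split() if word not in SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Similarity of the normalized names, between 0.0 and 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two names as the same entity when similarity meets the threshold."""
    return similarity(a, b) >= threshold
```

With these rules, `is_match("Acme Corp.", "ACME Corporation")` is true because both names normalize to the same string, while `is_match("Acme Corp", "Globex Inc")` is false. Every team that writes and tunes rules like these independently is, in effect, rebuilding a small piece of MDM.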