
Efficient SDG Data Validation and Structural Evaluation
How SDG datasets are validated in SDMX: structural validation against the SDG DSD, content validation against the SDG global dataflows, and the content constraints that define the valid reporting type and disaggregations for each series.
Presentation Transcript
SDMX Validation
Structural validation: ensures that all dimensions and mandatory attributes are in place and have valid values, and that the dataset does not contain duplicates.
Content validation: in addition to structural validation, ensures that more complex data relationships are also observed.
Statistics Division

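The structural checks listed above can be sketched in a few lines of Python. This is a minimal illustration, not an SDMX validator: the shortened dimension list, the mandatory attribute set, the small SEX codelist and the helper name `validate_structure` are all assumptions made for the example, not the actual content of the SDG DSD.

```python
from collections import Counter

# Assumed, heavily simplified structure; a real DSD defines many more concepts.
DIMENSIONS = ["SERIES", "REF_AREA", "SEX", "TIME_PERIOD"]
MANDATORY_ATTRIBUTES = ["UNIT_MEASURE"]
# Hypothetical codelist excerpt; concepts without an entry are not code-checked.
CODELISTS = {"SEX": {"F", "M", "_T"}}


def validate_structure(rows):
    """Return a list of structural error messages for a list of dict rows."""
    errors = []
    for i, row in enumerate(rows):
        # Every dimension and mandatory attribute must be present and non-empty.
        for concept in DIMENSIONS + MANDATORY_ATTRIBUTES:
            if not row.get(concept):
                errors.append(f"row {i}: missing {concept}")
        # Coded concepts must use a value from their codelist.
        for concept, allowed in CODELISTS.items():
            value = row.get(concept)
            if value and value not in allowed:
                errors.append(f"row {i}: invalid {concept}={value}")
    # Two rows sharing the same full dimension key are duplicates.
    keys = Counter(tuple(row.get(d) for d in DIMENSIONS) for row in rows)
    for key, count in keys.items():
        if count > 1:
            errors.append(f"duplicate key {key} appears {count} times")
    return errors
```

An empty result means the rows passed all three checks; content validation would then run on top of this.
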
Validation against the SDG DSD
Validating a dataset against the SDG DSD achieves structural validation. It ensures that the dataset has all dimensions and mandatory attributes in place, and that all concepts have correct values according to the DSD.

SDMX Dataflows
A dataflow defines a view on a Data Structure Definition (DSD). It is a structure that helps describe, categorize and constrain datasets.
Each dataflow is linked to one DSD. Each DSD may have one or more dataflows linked to it.
In its simplest form, a dataflow admits any data that is valid according to its DSD.

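The one-to-many relationship between a DSD and its dataflows can be pictured with two small Python classes. This is only an illustrative model: the class and field names are hypothetical and do not correspond to any SDMX library, and the artefact identifiers are the ones named elsewhere in this presentation.

```python
from dataclasses import dataclass, field


@dataclass
class DataStructureDefinition:
    id: str
    # Back-references to the dataflows that use this DSD (zero or more).
    dataflows: list = field(default_factory=list, repr=False, compare=False)


@dataclass
class Dataflow:
    id: str
    dsd: DataStructureDefinition  # exactly one DSD per dataflow
    constraints: list = field(default_factory=list)  # ids of attached constraints

    def __post_init__(self):
        # Register this dataflow with its DSD so the link works both ways.
        self.dsd.dataflows.append(self)


sdg = DataStructureDefinition("SDG")
glc = Dataflow("DF_SDG_GLC", sdg, ["CN_SDG_GLC", "CN_SERIES_SDG_GLC"])
glh = Dataflow("DF_SDG_GLH", sdg, ["CN_SDG_GLH", "CN_SERIES_SDG_GLH"])
```

A dataflow created with an empty `constraints` list corresponds to the simplest form described above: anything valid under the DSD is valid under the dataflow.
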
SDG Global Dataflows
DF_SDG_GLH: Harmonized Global Dataflow. This dataflow is used by the Custodian Agencies to report SDG indicators that are part of the global dataset, regardless of how the data was obtained. It is also used to disseminate the global dataset through the SDMX API.
DF_SDG_GLC: Country Global Dataflow. This dataflow is used by countries to report data to UNSD, as well as to disseminate national data in compliance with the SDG Global DSD.

SDG Cube Region Content Constraints
CN_SDG_GLC, attached to dataflow DF_SDG_GLC: restricts the dimension REPORTING_TYPE to code N (National). Ensures that data from countries always have REPORTING_TYPE=N, i.e. that countries always use the correct reporting type for the national dataset.
CN_SDG_GLH, attached to dataflow DF_SDG_GLH: restricts the dimension REPORTING_TYPE to code G (Global). Ensures that data from custodian agencies always have REPORTING_TYPE=G, i.e. that agencies always use the correct reporting type for the global dataset.

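As a rough sketch of what these two cube region constraints enforce, the check below pins REPORTING_TYPE to the single code allowed for each dataflow. The dataflow ids, constraint ids and codes come from the slide; the dictionary layout and the function `cube_region_violations` are assumptions for illustration, not how an SDMX registry stores or applies constraints.

```python
# Each dataflow restricts REPORTING_TYPE to one code (values from the slide).
CUBE_REGION_CONSTRAINTS = {
    "DF_SDG_GLC": {"REPORTING_TYPE": {"N"}},  # CN_SDG_GLC: national data
    "DF_SDG_GLH": {"REPORTING_TYPE": {"G"}},  # CN_SDG_GLH: global data
}


def cube_region_violations(dataflow_id, rows):
    """Return (row index, dimension, value) for every row that breaks the constraint."""
    constraint = CUBE_REGION_CONSTRAINTS[dataflow_id]
    violations = []
    for i, row in enumerate(rows):
        for dimension, allowed in constraint.items():
            value = row.get(dimension)
            if value not in allowed:
                violations.append((i, dimension, value))
    return violations
```

For example, a country dataset containing a row with REPORTING_TYPE=G would be flagged when checked against DF_SDG_GLC, while the same row is acceptable under DF_SDG_GLH.
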
Validation against an SDG dataflow
Countries should always use the Country Global Dataflow, DF_SDG_GLC, for SDG data that they exchange.
SDG Content Constraints are attached to the global dataflows.
When a dataset is validated against a global dataflow, the Content Constraints are applied.
Structural validation is carried out as in the case of the DSD.
In addition, relationships between the dimensions are validated using the Content Constraints.

SDG Content Constraints
Cube region constraints: CN_SDG_GLC and CN_SDG_GLH. These ensure that the reporting type is correct for each dataflow: N for DF_SDG_GLC, and G for DF_SDG_GLH.
Series constraints: CN_SERIES_SDG_GLC and CN_SERIES_SDG_GLH. These ensure that the dataset has valid combinations of dimensions, i.e. that a valid disaggregation is provided for each SDG series.
The Content Constraint Matrix helps visualize the content constraints and apply them when mapping SDG series.

SDG Content Constraints Matrix
An informal representation of the SDG series content constraints in CSV/Excel.
Can be used to determine how to correctly map an SDG series.

Content Constraint Matrix: Columns
SERIES
Series name [for convenience, ignored in validation]
Unit of Measure* (attribute)
Unit Multiplier* (attribute)
SEX, AGE, URBANISATION, COMPOSITE_BREAKDOWN, EDUCATION_LEV, DISABILITY_STATUS, OCCUPATION, INCOME_WEALTH_QUANTILE, PRODUCT, ACTIVITY
* In SDMX 2.1, validation of attributes is not supported. Values for Unit of Measure and Unit Multiplier are listed for correct mapping, and will be enforced on import to SDG Lab.

Content Constraint Matrix: Values
Allowed disaggregation codes are listed for each series.
One or more codes are separated with a semicolon (;), for example Y15T19;Y10T14.
The special value ALL means there are no restrictions on the corresponding dimension, i.e. all values are allowed.

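The cell syntax described above is simple enough to parse directly. The following is a minimal sketch; `parse_cell` and `is_allowed` are hypothetical helper names, and representing ALL as `None` is a design choice of this example rather than anything prescribed by the matrix.

```python
def parse_cell(cell):
    """Return None for ALL (no restriction), otherwise the set of allowed codes."""
    text = cell.strip()
    if text == "ALL":
        return None
    # Codes are separated by semicolons; ignore stray whitespace and empty parts.
    return {code.strip() for code in text.split(";") if code.strip()}


def is_allowed(cell, value):
    """Check one dimension value against one matrix cell."""
    allowed = parse_cell(cell)
    return allowed is None or value in allowed
```

With these helpers, `is_allowed("Y15T19;Y10T14", "Y15T19")` holds, and any value is accepted for a cell containing ALL.
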
Content Constraint Matrix: example series
Matrix columns: SERIES, Name, UNIT_MEASURE, UNIT_MULT, SEX, AGE, URBANISATION, COMPOSITE_BREAKDOWN, EDUCATION_LEV, DISABILITY_STATUS, OCCUPATION, INCOME_WEALTH_QUANTILE, PRODUCT, ACTIVITY
Series: SP_DYN_ADKL, Adolescent birth rate (per 1,000 women aged 15-19 and 10-14 years) [3.7.2]
Allowed concept values:
Unit of measure: PER_1000_POP
Unit multiplier: 0
Sex: F
Age: Y15T19 and Y10T14
Composite Breakdown: _T;MS_MIGRANT;MS_NOMIGRANT;MS_EUMIGRANT;MS_NONEUMIGRANT
Product: _T
Activity: _T
Urbanisation, Disability status, Occupation, Income or Wealth Quantile: all values allowed (ALL)

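To show how such a matrix row could be applied when mapping data, the sketch below checks one observation against the SP_DYN_ADKL values listed above. It covers dimensions only, since the presentation notes that attributes are not validated in SDMX 2.1. The dictionary layout, the function `series_violations` and the sample observations are assumptions for illustration; this is not the validation logic of any SDMX tool.

```python
# Dimension values for SP_DYN_ADKL as listed on the slide; "ALL" means unrestricted.
SP_DYN_ADKL = {
    "SEX": "F",
    "AGE": "Y15T19;Y10T14",
    "URBANISATION": "ALL",
    "COMPOSITE_BREAKDOWN": "_T;MS_MIGRANT;MS_NOMIGRANT;MS_EUMIGRANT;MS_NONEUMIGRANT",
    "DISABILITY_STATUS": "ALL",
    "OCCUPATION": "ALL",
    "INCOME_WEALTH_QUANTILE": "ALL",
    "PRODUCT": "_T",
    "ACTIVITY": "_T",
}


def series_violations(matrix_row, observation):
    """Return {dimension: offending value} for each restricted dimension that fails."""
    violations = {}
    for dimension, cell in matrix_row.items():
        if cell.strip() == "ALL":
            continue  # no restriction on this dimension
        allowed = {code.strip() for code in cell.split(";")}
        value = observation.get(dimension)
        if value not in allowed:
            violations[dimension] = value
    return violations


# Hypothetical observations: one valid, one with a wrong sex and age group.
valid_obs = {"SEX": "F", "AGE": "Y15T19", "URBANISATION": "U",
             "COMPOSITE_BREAKDOWN": "_T", "PRODUCT": "_T", "ACTIVITY": "_T"}
invalid_obs = dict(valid_obs, SEX="M", AGE="Y20T24")
```

Here `series_violations(SP_DYN_ADKL, valid_obs)` is empty, while the second observation is reported for SEX and AGE.
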
Diagram of SDG artefacts
[Diagram: relationships between the SDG artefacts]
Codelists: CL_FREQ through CL_PRODUCT
Concept Scheme: SDG_CONCEPTS
DSD: SDG
Dataflows: DF_SDG_GLC, DF_SDG_GLH
Content Constraints: CN_SDG_GLC, CN_SERIES_SDG_GLC, CN_SERIES_SDG_GLH, CN_SDG_GLH