
Clustered Data Grids and Methodology
Explore clustered data grids and methodology by Aksel Thomsen and Erik Sommer. The content covers various aspects including grid data visualization, a case study on Bornholm, population analysis, and detailed principles and algorithms for clustering data. Learn about the method's application through multiple examples.
Presentation Transcript
Clustered data - grids Aksel Thomsen Erik Sommer
Outline
1. Grid data
2. Our method
3. Result examples
4. Potential expansions
5. Commercial aspects
Grid data
Either 100x100 m or 1x1 km grid cells

Grid size    No. of cells    Cells with <100 households    Percentage
100x100 m    423,755         421,655                       99.5%
1x1 km       38,908          34,951                        89.9%

The vast majority of cells consist of few households. Clustering is needed.
Method - Principles
- Each grid is assigned to a unique municipality
- Time consistent: the no. of households in a grid is defined as the minimum over e.g. two years
- All cells with min. K households are their own cluster
- The remaining cells are clustered by an algorithm
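The principles above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenters' implementation: the function name `split_cells`, the dictionary input format and the example counts are all hypothetical.

```python
def split_cells(households_by_year, k):
    """Split cells into those that form their own cluster and the rest.

    households_by_year: dict mapping a cell id to a list of yearly
    household counts (hypothetical input format).
    k: minimum number of households per cluster.
    """
    own_cluster, remaining = {}, {}
    for cell, counts in households_by_year.items():
        # Time consistency: use the minimum count over the years supplied.
        households = min(counts)
        if households >= k:
            own_cluster[cell] = households
        else:
            remaining[cell] = households
    return own_cluster, remaining


# Illustrative counts only: cell "b" reaches 110 in one year,
# but its minimum (30) is below K, so it still needs clustering.
own, rest = split_cells({"a": [120, 105], "b": [30, 110], "c": [5, 7]}, k=100)
```

Taking the minimum rather than the latest count means a cell only stands alone if it meets the threshold in every year considered.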
Method - Algorithm
1. Start in the south-western corner
2. The starting cell is combined with the nearest remaining cell
3. A new center is calculated
4. The nearest still remaining cell is added to the cluster
5. Steps 3 and 4 are repeated until the cluster consists of min. K households
6. If fewer than K households remain, they are added to the last cluster
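A minimal Python sketch of the algorithm steps listed above may help. Several details are assumptions because the slide does not state them: the "south-western corner" is taken as the remaining cell with the smallest y, then smallest x; the center is the unweighted mean of the member cells' coordinates; each new cluster starts again from the south-western-most remaining cell; and all cells passed in belong to one municipality. The name `cluster_cells` and the input format are hypothetical.

```python
import math


def cluster_cells(cells, k):
    """Greedy nearest-cell clustering.

    cells: dict mapping a cell id to (x, y, households), hypothetical format.
    k: minimum number of households per cluster.
    Returns a list of clusters, each a list of cell ids.
    """
    remaining = dict(cells)
    clusters = []
    while remaining:
        households_left = sum(h for _, _, h in remaining.values())
        if clusters and households_left < k:
            # Too few households left for a new cluster: add to the last one.
            clusters[-1].extend(sorted(remaining))
            break
        # Assumed definition of the south-western corner: min y, then min x.
        start = min(remaining, key=lambda c: (remaining[c][1], remaining[c][0]))
        members = [start]
        total = remaining.pop(start)[2]
        while total < k and remaining:
            # Recalculate the center (assumed: unweighted mean of coordinates).
            cx = sum(cells[m][0] for m in members) / len(members)
            cy = sum(cells[m][1] for m in members) / len(members)
            # Add the nearest still remaining cell to the cluster.
            nearest = min(
                remaining,
                key=lambda c: math.hypot(remaining[c][0] - cx, remaining[c][1] - cy),
            )
            members.append(nearest)
            total += remaining.pop(nearest)[2]
        clusters.append(members)
    return clusters


# Illustrative cells on a line, K = 5.
example = {
    "a": (0, 0, 3), "b": (1, 0, 3), "c": (2, 0, 3), "d": (3, 0, 3), "e": (4, 0, 2),
}
result = cluster_cells(example, k=5)
```

In this example the last cell holds only 2 households, so it joins the previous cluster instead of forming one below K.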
Potential expansions
Modify the distance parameter
Now: Only geographical distance
Potential: Prioritize similar grids nearby
- Same household types
- Same income
- Same demographics
- Avoid mixing very different households in the same cluster
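One possible way to modify the distance parameter is to add a penalty for dissimilarity to the geographical distance. The sketch below is purely illustrative: the slide does not say how similarity would be measured or weighted, so the function `combined_distance`, the `weight` parameter, the use of income as the only similarity variable and the example values are all assumptions.

```python
import math


def combined_distance(a, b, weight=1.0):
    """Geographical distance plus a weighted dissimilarity penalty.

    a, b: dicts with keys "x", "y" and "income" (hypothetical format).
    weight: hypothetical trade-off between distance and similarity;
    weight=0 reproduces the purely geographical distance.
    """
    geographical = math.hypot(a["x"] - b["x"], a["y"] - b["y"])
    dissimilarity = abs(a["income"] - b["income"])
    return geographical + weight * dissimilarity


cell_a = {"x": 0, "y": 0, "income": 300}
cell_b = {"x": 3, "y": 4, "income": 320}
only_geography = combined_distance(cell_a, cell_b, weight=0)
with_income = combined_distance(cell_a, cell_b, weight=0.5)
```

With a positive weight, a nearby cell with very different households can end up "further away" than a slightly more distant but similar cell, which is one way to avoid mixing very different households in the same cluster.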
Commercial aspects (1 of 4): clustering done by customers. Many customers handle the clustering themselves. Clustering done by customers/users has to meet our requirement for the minimum number of households for at least two years. The clustering done by customers can be very complex and may already include a number of the potential expansions listed by Statistics Denmark.
Commercial aspects (2 of 4): the role of Statistics Denmark. The primary role of Statistics Denmark with regard to the clustering of grids is to be an alternative supplier. The primary demand has been for us to supply simple clusters that are easy to understand and easy to use: keeping it simple. Very often the creators of clusters seem to forget the important task of explaining and illustrating the methods used, so this is an important factor for us as a supplier.
Commercial aspects (3 of 4): two approaches. Clusters can be made either simply, using the nearest-cell approach (as shown by Aksel), or in a more complex way, including various factors in the algorithm to create more optimized clusters (as listed under potential expansions for Statistics Denmark and already used by existing customers). Clusters can either be created first and fixed as static (non-dynamic) clusters, to which variables are then added, or be created by sorting on the selected variable, giving dynamic clusters that change with each variable used.
Commercial aspects (4 of 4): two approaches. Clusters with a minimum of 20, 50, 100 or 150 households are used for the static (non-dynamic) clusters. Micro clusters with a minimum of 5 households are used to create dynamic macro clusters with a total minimum of 300 households within a municipality, where the first cluster has the best value of the selected variable, the second cluster the next best value, and so on, for example sorted by decreasing average household income.
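The dynamic approach described above can be sketched as follows. It is a rough illustration, not the production method: the function name `macro_clusters`, the tuple format and the example figures are hypothetical, and the handling of a leftover group below the minimum (merged into the last macro cluster) is an assumption that the slide does not confirm.

```python
def macro_clusters(micro, min_households=300):
    """Build dynamic macro clusters for one municipality.

    micro: list of (micro_id, households, value) tuples, where value is the
    selected variable, e.g. average household income (hypothetical format).
    """
    # Sort so the first macro cluster gets the best (highest) values.
    ordered = sorted(micro, key=lambda m: m[2], reverse=True)
    result, current, total = [], [], 0
    for micro_id, households, _ in ordered:
        current.append(micro_id)
        total += households
        if total >= min_households:
            result.append(current)
            current, total = [], 0
    if current:
        # Assumption: a leftover group below the minimum joins the last cluster.
        if result:
            result[-1].extend(current)
        else:
            result.append(current)
    return result


# Illustrative micro clusters: (id, households, average household income).
micro = [
    ("m1", 200, 500), ("m2", 150, 700), ("m3", 100, 600),
    ("m4", 250, 400), ("m5", 50, 300), ("m6", 10, 100),
]
macros = macro_clusters(micro)
```

Because the sort key is the selected variable, choosing a different variable produces a different set of macro clusters from the same micro clusters, which is what makes them dynamic.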