Creating Specialized APIs for Efficient Data Management

Learn about the requirements and strategies for developing two distinct APIs for managing statistical classifications efficiently. These APIs address the need for handling ready classifications and internal access restrictions while utilizing naming conventions for enhanced organization. Explore the challenges faced and the solutions adopted, including the integration of Solr for improved performance.

  • API Development
  • Data Management
  • Solr Integration
  • Classification Services
  • SQL Database

Presentation Transcript


  1. Statistical Classifications API. Katja Pulkkinen, Nordic Web Meeting, 11 April 2019

  2. Background
     • SQL database called Luoti (luoti_fyysinen_malli_2018-02-01.pdf)
         • The policy is not to use any database directly, only via APIs
         • Not duplicated
     • Classification Services (Luokituspalvelut): API for internal use (luokituspalvelut.png)
         • Also contains classifications that are not public or not ready
     • Classification editor (Luokituseditori): editor for entering data into the SQL database (luokituseditori.png)
     13 April 2025, Katja Pulkkinen

  3. Requirements 1
     • Create a new open API that contains only ready classifications that are allowed to be published
     • Create another API that contains all classifications and can be accessed only from the internal network or with authentication
     • In both APIs the fields must be renamed: the column names of the underlying SQL database, which the internal API uses, could not be used as JSON field names in the open API
     • The fields must also be organized, although JSON objects are by definition unordered
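The renaming requirement can be illustrated with a small mapping table. This is only a sketch: the column names, the public field names, and the sample row are all hypothetical, since the presentation does not list the real ones. A Python dict keeps insertion order, which is one simple way to give the output a stable field order.

```python
# Hypothetical mapping from SQL column names to public JSON field names.
# None of these names come from the presentation.
FIELD_MAP = {
    "LUOKITUS_ID": "localId",
    "NIMI_FI": "name",
    "VOIMASSA_ALKU": "validFrom",
}


def to_public_record(row: dict) -> dict:
    """Keep only mapped columns, renamed, in the order given by FIELD_MAP."""
    return {
        public: row[column]
        for column, public in FIELD_MAP.items()
        if column in row
    }


# Hypothetical database row; the unmapped internal column is dropped.
row = {"NIMI_FI": "Toimialaluokitus", "LUOKITUS_ID": 42, "SISAINEN_HUOM": "draft"}
record = to_public_record(row)
print(record)
```

Because unmapped columns are simply left out, the same table doubles as an allow-list of what the open API may expose.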

  4. Requirements 2
     • Minimum and maximum amount of metadata, selected with a parameter
     • The minimum metadata of the parent must be included in the child, i.e. the minimum metadata of a classification has to be returned with each classification item
     • Always all languages, or the language as a parameter
     • Visio-ulosjaettava_luokitustieto.pdf
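A minimal sketch of how these two rules could combine, assuming multilingual values are stored as dicts keyed by language code. The field names, the choice of "minimum" fields, and the sample data are assumptions made for illustration, not the API's actual schema.

```python
# Assumed set of fields that make up the parent's "minimum metadata".
MIN_FIELDS = ("localId", "name")


def pick_language(value, lang=None):
    """Return every language when lang is None, otherwise only the requested one."""
    if isinstance(value, dict) and lang is not None:
        return value.get(lang)
    return value


def render_item(item, parent, lang=None):
    """Render a classification item with the parent's minimum metadata embedded."""
    out = {key: pick_language(val, lang) for key, val in item.items()}
    out["classification"] = {
        key: pick_language(parent[key], lang) for key in MIN_FIELDS
    }
    return out


# Hypothetical classification (parent) and classification item (child).
parent = {
    "localId": "example_classification",
    "name": {"fi": "Esimerkkiluokitus", "en": "Example classification"},
    "description": {"fi": "Pitkä kuvaus", "en": "Long description"},
}
item = {"code": "A", "name": {"fi": "Esimerkkiluokka", "en": "Example class"}}

print(render_item(item, parent, lang="en"))
print(render_item(item, parent))  # no lang given: all languages are returned
```

The parent's `description` never appears in the item response, because it is not part of the assumed minimum set.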

  5. Requirements 3
     • Duplicated data server for the web application
         • The SQL database was not duplicated
         • We could not get a regular dump from the SQL database, so we could not build a duplicated database from it
     • We were also required to implement some data aggregates and searches that were not available from the Classification Services API
         • For the searches, duplicated Solr instances were taken into use; these could also be used for the data aggregates

  6. Requirements 4: Feedback
     • CSC wanted to take the API into use, but it was too slow
     • The new page for classifications (Luoksi) needed a faster solution
     • Solution: we took Solr into use in every endpoint of the API (https://lucene.apache.org/solr/7_3_1/, https://pypi.org/project/pysolr/)
     • Major improvements in performance
     • Too late for CSC, but not for Luoksi
     • Side effect: independence from the Luokituspalvelut API (it has maintenance breaks lasting hours during working hours and is not duplicated)
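To give a feel for what serving an endpoint from Solr involves, the sketch below builds a request URL for Solr's standard `/select` handler using only the Python standard library. It does not use pysolr and it is not the project's code. The core name, the field names, and the host are hypothetical; `q`, `fq`, `rows` and `wt` are standard Solr query parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def solr_select_url(base_url, core, query, filters=(), rows=10):
    """Build a URL for Solr's /select handler with a main query and filter queries."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    params += [("fq", f) for f in filters]  # one fq parameter per filter
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical core and field names; 8983 is Solr's default port.
url = solr_select_url(
    "http://localhost:8983/solr",
    "classifications",
    "name_fi:toimiala*",
    filters=["status:published"],
)
print(url)
print(parse_qs(urlsplit(url).query))  # decoded parameters, for inspection
```

A filter query such as the hypothetical `status:published` is one way an open endpoint could be restricted to publishable classifications at query time.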

  7. Performance measurements

     Endpoint (content)            Old: mean [ms]  Old: failed  New: mean [ms]  New: failed
     Classifications (URLs)              9263.767      280/300         357.591        0/300
     Classifications (data)              8691.135      286/300         383.488        0/300
     Classification (URL)                 848.985        0/300         137.025        0/300
     Classification (data)                828.830        0/300         140.389        0/300
     Classification items (URLs)          951.168        0/300         257.689        0/300
     Classification items (data)          934.896        0/300         258.154        0/300
     Classification item (URL)           1693.415        0/300         134.428        0/300
     Classification item (data)          1589.208        0/300         138.206        0/300

     Taken from the Luoksi page with debugging information on (it used to be more than a second!):
     content data last modified: Wed, 03 Apr 2019 04:07:17 GMT, , Wed, 03 Apr 2019 04:07:17 GMT - lang=fi
     https://data.stat.fi/api/classifications/v1/classificationFamilies?content=data&lang=fi (0.1 s)
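The size of the improvement is easier to see as a ratio. The snippet below takes the mean response times reported on slide 7 and divides old by new for each endpoint; the dictionary keys are shortened labels chosen here, not names used by the API.

```python
# Mean response times in milliseconds from slide 7: (old, new).
MEAN_MS = {
    "classifications (url)": (9263.767, 357.591),
    "classifications (data)": (8691.135, 383.488),
    "classification (url)": (848.985, 137.025),
    "classification (data)": (828.830, 140.389),
    "classification items (url)": (951.168, 257.689),
    "classification items (data)": (934.896, 258.154),
    "classification item (url)": (1693.415, 134.428),
    "classification item (data)": (1589.208, 138.206),
}

# Speedup factor: how many times faster the Solr-backed version answers.
speedups = {name: old / new for name, (old, new) in MEAN_MS.items()}

for name, factor in speedups.items():
    print(f"{name:30s} {factor:5.1f}x faster")
```

The factors range from roughly 3.6 for the classification items listing to about 26 for the classifications listing. The listing endpoints also went from 280 and 286 failed requests out of 300 to none, which the ratio alone does not show.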

  8. Results
     • A small set of endpoints is published as an open API: https://data.stat.fi/api/classifications/v1/ (open.html)
     • The Luoksi page with the general-purpose classifier (Luokitin) and the building classifier (Rakennusluokitin) use the open API: https://tilastokeskus.fi/fi/luokitukset/, https://www.stat.fi/rakennusluokitin
         • These mostly implemented their searches by filtering on the client side
     • Solr makes it possible to implement the searches needed by the development team of our new web page; one (classifications used in a specific statistic) is already implemented

  9. Future: API with all classifications
     • The API with all classifications, not only the publishable ones, known as Lurppa (Droopy the Dog), was never fully implemented or taken into use (lurppa.html)
     • It will be implemented as part of the open API, with authentication (for data collection forms and possibly other users)
     • Huge number of items in particular (2-3 million), and of maps
         • How much memory and disk space is needed on the servers containing Solr instances (8 servers)?
         • Is Solr still fast enough after all of these are indexed?
     • Indexing to Solr is already slow: change messages are needed in the internal Classification Services API

  10. [Deployment diagram] Test: https://test.stat.fi:xxxx/api/classifications/v1/. Production: https://data.stat.fi/api/classifications/v1/. Diagram labels: Proxy cluster, Proxy (nginx), Backend 1, Backend 2, API, Solr (port 1, port 2), Indexer.

  11. Current architecture
     [Diagram spanning three network zones: Out, DMZ, In. Components: Web11 UI, Web11, Copy of classifications (SQL + API), Classifications (SQL + API), Lurppa API, Indexing, Solr, Open API, Building Classifier, Classification pages, Classifier.]

  12. Plans for the future
     • Stage 1: Combine Lurppa and the Open API.
     • Stage 2: Implement the capability to send messages of changes. Just in case, a synchronized data dump, e.g. once a week, so that indexing everything is still possible.
     • Stage 3: Web11 UI and other similar systems can be transformed to use the Open Lurppa API.
     • Diagram annotation: capability to receive change messages.
     [Diagram spanning three network zones: Out, DMZ, In. Components: Web11, Web11 UI, Copy of classifications, Classifications (SQL + API), Indexing, Messages, Solr, Open Lurppa API, Building Classifier, Classification pages, Classifier. Some components are crossed out with an X.]
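The combination of change messages and an occasional full dump can be sketched as follows. A plain dict stands in for the Solr index, and the message format (`op`, `id`, `doc`) is entirely hypothetical; the slides do not describe what the messages will look like.

```python
def full_reindex(dump):
    """Rebuild the whole index from a data dump (the "just in case" path)."""
    return {doc["id"]: doc for doc in dump}


def apply_changes(index, messages):
    """Apply change messages in order, touching only the affected documents."""
    for msg in messages:
        if msg["op"] == "upsert":
            index[msg["id"]] = msg["doc"]
        elif msg["op"] == "delete":
            index.pop(msg["id"], None)  # deleting a missing document is a no-op
        else:
            raise ValueError(f"unknown operation: {msg['op']!r}")
    return index


# Hypothetical weekly dump followed by two change messages.
dump = [
    {"id": "c1", "name": "Example classification"},
    {"id": "c2", "name": "Old classification"},
]
index = full_reindex(dump)
apply_changes(index, [
    {"op": "upsert", "id": "c3", "doc": {"id": "c3", "name": "New classification"}},
    {"op": "delete", "id": "c2"},
])
print(sorted(index))
```

With messages, the cost of keeping the index current depends on the number of changes rather than on the total number of items, which matters once the 2-3 million items mentioned on slide 9 are indexed.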

  13. Future: new versions of the API
     • Only non-breaking changes are possible in the current version, because it is used at least in Luoksi and Rakennusluokitin
     • It uses the database row id as mapid in the path; this should be fixed so that the correspondencetable-related endpoints can be published
     • The Families service is structured differently from the others (it reflects the implementation in the internal Luokituspalvelut API); it might not be published until it is refactored to use the same structure, or not at all
     • Some parts of the returned JSON are unnecessarily complicated (they were implemented in a hurry and more or less reflect the structure in Solr)
