An Exploration of Alternative Data for Index Creation in the Airline Industry


This project explores whether an alternative data source can replace traditionally collected data in creating the airline industry and commodity indexes of the Producer Price Index (PPI). The work is led by Ayme Tomson, a data scientist in the Office of Prices and Living Conditions (OPLC) at the Bureau of Labor Statistics (BLS), together with the PPI Airline team. Alternative data is non-traditional data such as geolocation, social media, tracking and traffic, sensor, and public data. The team validates the alternative source by building indexes from three years of historical airline data and comparing them with the current airline PPI methodology.

  • Alternative Data
  • Index Creation
  • Airline Industry
  • Bureau of Labor Statistics
  • Data Science


Presentation Transcript


  1. An Alternative Data Approach to Index Creation. Ayme Tomson, Data Scientist, Office of Prices and Living Conditions (OPLC), Bureau of Labor Statistics (BLS). FedCASIC 2024. U.S. BUREAU OF LABOR STATISTICS, bls.gov

  2. What is Alternative Data?
     • Alternative data is non-traditional data
     • Traditional data = established data for a given task
     • Types of alternative data: Geolocation (satellite / weather); Social media; Tracking / Traffic (clicks / usage / feeds / reviews); Sensors (IoT); Public data
     • Source: Grand View Research (CAGR = compound annual growth rate)

  3. Alternative Data: Characteristic, Pro, Con
     • Availability (Ownership): Pro X, Con X
     • Type (Linkage / Harmonization): Pro X, Con X
     • Size: Pro X, Con X
     • Accuracy (Scope): Pro X, Con X
     • Bias (Weighting): Pro X, Con X
     • Creation (Transparency / Informed Gaming): X
     • Unintended Consequences: X

  4. Project Description & Goal
     • The Producer Price Index (PPI) Airline team currently uses traditionally collected data for the creation of the airline industry and commodity indexes.
     • This project validates replacing traditionally collected data with an alternative data source.
     • Goal: Validate the alternative data source through index creation and comparison to current airline PPI methodologies, using 3 years of historical airline data.

  5. Project Members
     • OPLC Data Scientists: Ayme Tomson, Gerald Thomas / Stephen Hanna
     • Producer Price Index (PPI) Airline Team: John Lucier, Bill Page, Chelsea Velic

  6. Alternative Data Characteristics (V1 and V2)
     • Availability (Ownership): V1 3rd party purchase; V2 3rd party purchase
     • Type (Linkage / Harmonization): V1 no geographical data; V2 no geographical data
     • Size: V1 7 GB; V2 4 GB
     • Accuracy (Scope): V1 monthly origin-destination; V2 monthly origin-destination
     • Bias (Weighting): V1 value padding +/- 20%; V2 value padding +/- 20%, rolling 7-day averages
     • Creation (Transparency / Informed Gaming): V1 sales transactions; V2 sales transactions
     • Unintended Consequences: (no entry)
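
The V2 data set lists rolling 7-day averages among its characteristics. The slide does not say whether the window is trailing or centered, or exactly which values are averaged, so the following is only a minimal sketch of a trailing 7-day mean. The function name `rolling_mean` and the sample fares are made up for illustration.

```python
from collections import deque

def rolling_mean(values, window=7):
    """Trailing rolling mean; yields None until a full window is available."""
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]      # oldest value is about to leave the window
        buf.append(v)
        total += v
        out.append(total / window if len(buf) == window else None)
    return out

# Made-up daily fares (assumption, not project data).
daily_fares = [100, 110, 90, 105, 95, 100, 100, 170]
smoothed = rolling_mean(daily_fares)
```

With this input the first six entries are `None` because no full week is available yet; the single high fare on the last day moves the smoothed value far less than it moves the raw series.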

  7. Project Journey to representative data
     • Stages: Raw data, Data Filtering, Preprocessing, Aggregation, Index Calculation, Final Data (Ready!)
     • Raw data: historical airline price data from a third party
     • Data Filtering: remove out-of-scope data
     • Preprocessing: create labels
     • Aggregation: group by key variables
     • Row counts in slide order (V1, V2): 120 million rows, 49 million rows, 49 million rows, 26 million rows
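
The aggregation and index calculation stages can be illustrated with a small sketch. The slides do not give the index formula, so this is not the PPI methodology: it assumes, purely for illustration, that fares are grouped by month and index label, averaged, and then expressed relative to a base month set to 100. The names `mean_fare_by_group` and `simple_index` and the sample rows are hypothetical.

```python
from collections import defaultdict

def mean_fare_by_group(rows):
    """Group (month, label, fare) rows by (month, label) and average the fares."""
    sums = defaultdict(lambda: [0.0, 0])   # key -> [fare total, row count]
    for month, label, fare in rows:
        acc = sums[(month, label)]
        acc[0] += fare
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

def simple_index(means, label, base_month):
    """Illustrative index: mean fare relative to the base month (base = 100)."""
    base = means[(base_month, label)]
    return {
        month: 100.0 * price / base
        for (month, lab), price in sorted(means.items())
        if lab == label
    }

# Made-up rows (assumption, not project data).
rows = [
    ("2019-01", "Domestic", 200.0),
    ("2019-01", "Domestic", 220.0),
    ("2019-02", "Domestic", 231.0),
    ("2019-02", "Domestic", 231.0),
    ("2019-01", "Atlantic", 800.0),
]
means = mean_fare_by_group(rows)
domestic_index = simple_index(means, "Domestic", "2019-01")
```

Grouping before the index calculation is also what shrinks the row count: many transactions collapse into one value per group and month.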

  8. Data & Decision Architecture
     • Extract-Transform-Load (ETL): Alt DATA for 2019, 2020, and 2021; data processing in monthly batches
     • Scope: remove international to international flights; remove unscheduled flights
     • Index Labeling: Airport, Cabin Class, Region
     • Aggregation: Industry, Commodity
     • Outputs: Industry Index, Commodity Index
     • Country Codes: IATA CC mappings (CSV file, Python library)
     • International Regions: Canada & Latin America, Atlantic, Pacific
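
The scope and region-labeling steps on this slide can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the two lookup dictionaries stand in for the IATA country-code mapping the slide mentions, the assignment of countries to regions is assumed, and the record fields (`origin`, `dest`, `scheduled`) and function names are hypothetical.

```python
# Hypothetical lookup tables standing in for the IATA country-code mapping.
AIRPORT_COUNTRY = {
    "JFK": "US", "LAX": "US", "YYZ": "CA",
    "LHR": "GB", "CDG": "FR", "NRT": "JP",
}
COUNTRY_REGION = {  # assumed assignment of countries to the slide's regions
    "CA": "Canada & Latin America",
    "GB": "Atlantic", "FR": "Atlantic",
    "JP": "Pacific",
}

def in_scope(flight):
    """Keep scheduled flights that touch at least one U.S. airport."""
    origin = AIRPORT_COUNTRY.get(flight["origin"])
    dest = AIRPORT_COUNTRY.get(flight["dest"])
    if origin is None or dest is None:
        return False  # assumption: drop airports that cannot be geocoded
    if origin != "US" and dest != "US":
        return False  # international to international
    return flight["scheduled"]

def region_label(flight):
    """Label an in-scope flight as Domestic or by its international region."""
    origin = AIRPORT_COUNTRY[flight["origin"]]
    dest = AIRPORT_COUNTRY[flight["dest"]]
    if origin == "US" and dest == "US":
        return "Domestic"
    foreign = dest if origin == "US" else origin
    return COUNTRY_REGION.get(foreign, "Unmapped")

# Made-up records (assumption, not project data).
flights = [
    {"origin": "JFK", "dest": "LAX", "scheduled": True},
    {"origin": "JFK", "dest": "LHR", "scheduled": True},
    {"origin": "NRT", "dest": "LAX", "scheduled": True},
    {"origin": "LHR", "dest": "CDG", "scheduled": True},   # foreign to foreign
    {"origin": "JFK", "dest": "YYZ", "scheduled": False},  # unscheduled
]
kept = [f for f in flights if in_scope(f)]
labels = [region_label(f) for f in kept]
```

Because the purchased data carries no geographical fields, a lookup of this kind has to run before either the scope filter or the region label can be applied.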

  9. Data & Decision Architecture: Extract-Transform-Load (ETL). Alt DATA (2019, 2020, 2021); Scope; Geo-Coding; Index Labeling; Aggregation; Index 1 and Index 2. Data processing in MONTHLY batches.

  10. Data & Decision Architecture: Extract-Transform-Load (ETL). Alt DATA (2019, 2020, 2021); Scope; Geo-Coding; Index Labeling; Aggregation; Index 1 and Index 2. Data processing in MONTHLY batches. Callout: Data Size.

  11. Data & Decision Architecture: Extract-Transform-Load (ETL). Alt DATA (2019, 2020, 2021); Scope; Geo-Coding; Index Labeling; Aggregation; Index 1 and Index 2. Data processing in MONTHLY batches. Callout: Data Type.

  12. Data & Decision Architecture: Extract-Transform-Load (ETL). Alt DATA (2019, 2020, 2021); Scope; Geo-Coding; Index Labeling; Aggregation; Index 1 and Index 2. Data processing in MONTHLY batches. Callout: Data Creation, Accuracy, & Bias.

  13. Data & Decision Architecture
     • Extract-Transform-Load (ETL): Alt DATA for 2019, 2020, and 2021; data processing in monthly batches
     • Scope: remove international to international flights; remove unscheduled flights
     • Index Labeling: Airport, Cabin Class, Region
     • Aggregation: Industry, Commodity
     • Outputs: Industry Index, Commodity Index
     • Country Codes: IATA CC mappings (CSV file, Python library)
     • International Regions: Canada & Latin America, Atlantic, Pacific

  14. Project Methodology
     • Data Type: no geographical data. Excludes flights that are foreign airport to foreign airport. Index labels are based on geographical region (Domestic / International).
     • Data Creation, Accuracy, and Bias: index labels are based on cabin class; aggregation decisions.

  15. Project Outcomes (V1 and V2)
     • Index 1 Year-to-Year Correlation: V1 0.95; V2 0.81
     • Index 2 Year-to-Year Correlation: V1 0.94; V2 0.82
     • Index 1 Month-to-Month Correlation: V1 0.78; V2 0.44
     • Index 2 Month-to-Month Correlation: V1 0.75; V2 0.41
     • Sample Size: V1 Sample to Universe; V2 Sample to Universe
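
The slide reports year-to-year and month-to-month correlations but does not define how they were computed. One common reading, assumed here, is the Pearson correlation between the percent changes of the alternative-data index and those of the current index, with a lag of 1 month or 12 months. The helper names and the sample series below are hypothetical.

```python
from math import sqrt

def pct_change(series, lag):
    """Percent change of each value relative to the value `lag` periods earlier."""
    return [series[i] / series[i - lag] - 1.0 for i in range(lag, len(series))]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Made-up monthly index levels (assumption, not project results).
current_index = [100.0, 102.0, 101.0, 105.0, 107.0]
alt_index = [2.0 * v for v in current_index]  # same movements, different level

month_to_month = pearson(pct_change(current_index, 1), pct_change(alt_index, 1))
```

Working on percent changes rather than levels means an index that differs only by a constant scale still correlates perfectly, as in the toy series above. A year-to-year figure would use `lag=12` on at least 13 months of data.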

  16. Project Outcomes: It is still to be determined (TBD) whether this alternative data will be used in production of the PPI Airline Indexes.

  17. Contact Information: Tomson.Ayme@bls.gov
