Discovering Occupational Topics with Topic Modeling

Leveraging the Occupational Information Network (O*NET) and Occupational Requirements Survey (ORS) data, this project aims to classify and structure tasks using Topic Modeling methods. By utilizing Latent Dirichlet Allocation (LDA) models, both ORS and O*NET data are organized into abstract topics, facilitating better understanding and categorization of occupational information.

  • Topic Modeling
  • Occupational Data
  • O*NET
  • ORS
  • LDA

Presentation Transcript


  1. ORS Task List Classification - Topic Modeling
     Drake Gibson, Data Scientist
     Office of Compensation and Working Conditions
     April 6, 2022
     U.S. BUREAU OF LABOR STATISTICS bls.gov

  2. Outline
     • O*NET and ORS background
     • Purpose of the project
     • Leveraging task data and methods
     • Topic Modeling

  3. Occupational Information Network (O*NET)
     • The nation's primary source of occupational information
     • Highlights changes in the workforce
     • Helps people find the training needed for jobs
     • Generalized Work Activities (GWA) classify workers' tasks in the data source

  4. Occupational Requirements Survey (ORS)
     • Establishment survey collected on behalf of the Social Security Administration (SSA)
     • ORS supports adjudication of SSA's disability program
     • Captures the requirements for a job, such as:
         • Physical Demands
         • Environmental Conditions
     • Captures minimally structured task data

  5. Purpose
     • Why? Publish ORS task data
     • How? Using O*NET as a taxonomy, we may be able to classify and structure ORS task data

  6. ORS Task List Classification: Topic Modeling
     • Type of statistical model for discovering the abstract topics that occur in a collection of documents (Li, Susan. 2018. Topic Modeling and Latent Dirichlet Allocation (LDA) in Python.)
     • A method for unsupervised classification of such documents, similar to clustering on numeric data, which finds natural groups of items even when we're not sure what we're looking for (Silge, Julia and David Robinson. 2021. Text Mining with R: A Tidy Approach)

  7. ORS Task List Classification: Topic Modeling
     • Latent Dirichlet Allocation (LDA)
         • Each task falls into a topic
         • Each word in each task falls into a topic
     • Topic models for both ORS and O*NET
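The two LDA statements on slide 7 (each word gets a topic, and each task gets a topic) can be illustrated with a small collapsed Gibbs sampler. This is only a sketch under stated assumptions: the slides do not say which LDA implementation or settings were used, and the task texts, the `fit_lda` function, and the `alpha`/`beta` values below are invented for illustration.

```python
import random
from collections import Counter

def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (illustrative)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # words per topic, per doc
    topic_word = [Counter() for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                        # total words per topic
    assign = []                                           # topic of every word

    # Start from a random topic for every word.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)

    # Resample each word's topic given all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return assign, doc_topic, topic_word

# Invented example tasks, already lowercased and tokenized.
tasks = [
    "clean floor sweep floor".split(),
    "sweep floor clean equipment".split(),
    "record patient medication".split(),
    "chart patient record medication".split(),
]
assign, doc_topic, topic_word = fit_lda(tasks, num_topics=2)

# Word level: assign[d][i] is the topic of word i in task d.
# Task level: take the topic that holds most of the task's words.
dominant = [max(range(2), key=lambda k: counts[k]) for counts in doc_topic]
```

Taking the most frequent word-level topic as the task's topic is one simple convention; a task can also be described by its full mix of topics.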

  8. ORS Task List Classification
     • Models were built with 8, 16, 32, 48 and 64 topics
     • Models with common words and verbs with O*NET were also built
     • Currently working on evaluating results for a final suggestion

  9. O*NET Topic Modeling
     • First, a 4-topic model is used to compare log likelihoods for each topic in the model
     • Compared common words in tasks between topics in O*NET and ORS
     • Compared common verbs in tasks between topics in O*NET and ORS
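One way to compare common words between O*NET and ORS topics is to measure the overlap of each topic's top words. The sketch below uses Jaccard similarity, which is an assumption: the slides do not state how the comparison was done. The topic names and word lists are invented, and `jaccard` and `best_matches` are hypothetical helpers.

```python
def jaccard(a, b):
    """Share of words two lists have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_matches(topics_a, topics_b):
    """For each topic in topics_a, find the most similar topic in topics_b."""
    matches = {}
    for name_a, words_a in topics_a.items():
        name_b = max(topics_b, key=lambda n: jaccard(words_a, topics_b[n]))
        matches[name_a] = (name_b, jaccard(words_a, topics_b[name_b]))
    return matches

# Invented top-word lists per topic, for illustration only.
ors_topics = {
    "ors_0": ["clean", "floor", "equipment", "sweep"],
    "ors_1": ["patient", "record", "medication", "chart"],
}
onet_topics = {
    "onet_0": ["patient", "medication", "treatment", "record"],
    "onet_1": ["clean", "equipment", "area", "floor"],
}

matches = best_matches(ors_topics, onet_topics)
for ors_name, (onet_name, score) in matches.items():
    print(f"{ors_name} -> {onet_name} (overlap {score:.2f})")
```

The same function works for the verb comparison if each list holds only the verbs from a topic's top words.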

  10. O*NET Topic Modeling Results
      Coherence and R-Squared Measures for the Models

      NUMBER OF TOPICS   COHERENCE    R-SQUARED
      8                  0.07385021   0.01108270
      16                 0.06835653   0.01893237
      32                 0.06733141   0.02808492
      48                 0.06486691   0.03074370
      64                 0.07247086   0.03334604
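The figures on slide 10 can be checked with a few lines of arithmetic. The sketch assumes that, for each number of topics, the first value is coherence and the second is R-squared, following the order of the column headers; the slide layout does not make this certain. The variable names are illustrative.

```python
# Values copied from slide 10; which column is which is an assumption.
coherence = {8: 0.07385021, 16: 0.06835653, 32: 0.06733141,
             48: 0.06486691, 64: 0.07247086}
r_squared = {8: 0.01108270, 16: 0.01893237, 32: 0.02808492,
             48: 0.03074370, 64: 0.03334604}

# Pick the number of topics that maximizes each measure.
best_by_coherence = max(coherence, key=coherence.get)
best_by_r_squared = max(r_squared, key=r_squared.get)

# R-squared rises with every increase in the number of topics in this table.
topic_counts = sorted(r_squared)
r_squared_increasing = all(
    r_squared[a] < r_squared[b] for a, b in zip(topic_counts, topic_counts[1:])
)

print("Best by coherence:", best_by_coherence)
print("Best by R-squared:", best_by_r_squared)
```

Under that reading the two measures point to different models (8 topics by coherence, 64 by R-squared), so neither measure alone settles the optimal number of topics.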

  11. Topic Modeling Analysis
      • ORS and O*NET tasks are classified
      • Optimal number of topics for this exercise?
          • Coherence vs Log likelihood vs R-Squared
      • The structure of the results
          • Are tasks following the Standard Occupational Classification (SOC)?
      • Still need to further explore other models for better results

  12. The Team
      Nicole Nestoriak & David Oh

  13. Contact Information
      Drake Gibson
      Operations Research Analyst (Data Scientist)
      Bureau of Labor Statistics
      Gibson.Drake@bls.gov
