Leveraging Large Language Models in Census Data Processing and Analysis

Explore how Large Language Models (LLMs) are utilized in Census data processing and analysis, with a focus on AI classification, language translation challenges, and operational efficiency goals. Discover tools like the SAS to Python conversion tool and the LLM-assisted search engine for NAICS codes. Dive into document analysis, code conversion, and survey response optimization for increased efficiency and resource savings.

  • Language Models
  • Census Data
  • AI Classification
  • SAS to Python
  • Operational Efficiency

Presentation Transcript


  1. Leveraging Large Language Models (LLMs) in Census Data Processing and Analysis. Taylor Wilson, Director, Applied Statistics & Data Science; Cameron Milne, Angi Lee & Hector Ferronato

  2. Outline: AI Solution/capability; AI Classification; SAS to Python Conversion Tool; Language Translation; Challenges; NIST Framework; Consideration

  3. Census Use Cases / Pilot Studies
     • LLM Assisted Search: NAICS Code Classifier (Dr. NAICS); Occupation & Industry Code Classifier
     • Document Analysis: Qualitative Analysis; Multiple Document Analysis
     • Code Conversion: SAS to Python
     • Language Translation: Survey Responses; Survey Questionnaires
     Goal: operational efficiency, cost saving & resource optimization

  4. LLM Assisted Search

  5. LLM Assisted Search: Dr. NAICS, a North American Industry Classification System (NAICS) search engine

  6. Information Retrieval: Document(s), Indexing, Hugging Face Semantic Search, Query Processing, User's Inquiry, Ranking

  7. Embedding Space and Distance-based Retrieval. User Inquiry: "Cybersecurity Services". Finds the closest embeddings in a vector space, such as:
     • 513210, Software Publishers
     • 541511, Custom Computer Programming Services
     • 541512, Computer Systems Design Services
     • 541513, Computer Facilities Management Services
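
The distance-based retrieval on slide 7 can be sketched as follows. This is a minimal illustration, not the Dr. NAICS implementation: the three-dimensional vectors are made up, the 111110 (Soybean Farming) entry is added only as an unrelated contrast case, and cosine similarity is an assumed choice of distance measure. A real system would obtain its embeddings from a sentence-embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=5):
    """Return the k index entries most similar to the query, best first."""
    scored = [
        (code, title, cosine_similarity(query_vec, vec))
        for code, title, vec in index
    ]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical values, for illustration only).
INDEX = [
    ("513210", "Software Publishers", [0.7, 0.6, 0.1]),
    ("541511", "Custom Computer Programming Services", [0.8, 0.5, 0.2]),
    ("541512", "Computer Systems Design Services", [0.9, 0.4, 0.1]),
    ("541513", "Computer Facilities Management Services", [0.6, 0.3, 0.6]),
    ("111110", "Soybean Farming", [0.0, 0.1, 0.9]),
]

# Hypothetical embedding of the inquiry "Cybersecurity Services".
query = [0.9, 0.4, 0.1]
results = top_k(query, INDEX, k=4)

for code, title, score in results:
    print(f"{code}  {title}  (similarity {score:.3f})")
```

Ranking by similarity rather than by keyword overlap is what lets an inquiry retrieve codes whose titles share no words with it.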

  8. Input Documentation: Title, Description, Illustrative Examples, Cross References, Indexed Keywords. Accuracy rate of 87% in the first 5 retrieved responses.
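
One way to read "accuracy in the first 5 retrieved responses" is as a hit rate at k: the share of inquiries whose correct code appears among the first k results. The sketch below assumes that interpretation; the slides do not describe the evaluation set or the exact metric, and the ranked lists here are invented for illustration.

```python
def hit_rate_at_k(ranked_results, expected_codes, k=5):
    """Share of inquiries whose expected code is among the first k results."""
    hits = sum(
        1
        for ranked, expected in zip(ranked_results, expected_codes)
        if expected in ranked[:k]
    )
    return hits / len(expected_codes)


# Hypothetical ranked outputs for three inquiries, best match first.
ranked = [
    ["541512", "541511", "513210"],
    ["541511", "541513"],
    ["513210", "541512"],
]
# Hypothetical correct code for each inquiry.
expected = ["541511", "541512", "513210"]

rate = hit_rate_at_k(ranked, expected, k=2)
print(f"hit rate at 2: {rate:.2f}")
```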

  9. SAS to Python Conversion Tool

  10. SAS to Python: Convert legacy SAS code into open-source programming languages. Solution: multilayer processing steps & error feedback loop.
     Advantages:
     • Speed
     • Cost
     • Consistency
     Challenges:
     • Lack of context: AI may struggle with code that lacks clear context
     • Nuances: AI might miss nuanced coding practices or specific business logic
     • Need for human review: AI isn't perfect; human expertise is essential to validate and refine translations
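
The "error feedback loop" on slide 10 is not described in detail. A minimal sketch of one plausible form follows: translate, check that the result at least parses as Python, and pass any syntax error back into the next attempt. The `convert` callable stands in for the LLM call and is hypothetical, as are `convert_with_feedback` and `fake_convert`; a syntax check is only the weakest useful validation and does not replace human review.

```python
import ast


def convert_with_feedback(sas_code, convert, max_attempts=3):
    """Retry a translation until the output parses as Python.

    `convert(sas_code, feedback)` is a hypothetical stand-in for an LLM call;
    `feedback` is None on the first attempt, then the last syntax error.
    """
    feedback = None
    for attempt in range(1, max_attempts + 1):
        python_code = convert(sas_code, feedback)
        try:
            ast.parse(python_code)  # syntax check only; the code is not run
        except SyntaxError as exc:
            feedback = f"line {exc.lineno}: {exc.msg}"
            continue
        return python_code, attempt
    raise ValueError(
        f"no syntactically valid translation after {max_attempts} attempts; "
        f"last error: {feedback}"
    )


def fake_convert(sas_code, feedback):
    """Toy converter: returns broken code first, fixed code once given feedback."""
    if feedback is None:
        return "total = (1 + 2"  # unbalanced parenthesis
    return "total = 1 + 2"


code, attempts = convert_with_feedback("data a; set b; run;", fake_convert)
print(f"accepted after {attempts} attempt(s): {code}")
```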

  11. SAS to Python: processing flow
     • Code Input
     • Cleaning (removes comments, white spaces)
     • Chunk 1, then Python Conversion of Chunk 1
     • Chunk 2 plus Summary of Chunk 1, then Python Conversion of Chunk 2
     • Chunk n plus Summary of Chunk (n-1), then Python Conversion of Chunk n
     • Post Processing (merging chunks and cleaning any repeated statement)
     • Python Output
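
The flow on slide 11 can be sketched as a small pipeline. Everything below is an assumption made for illustration: the function names, the fixed line-count chunking, the handling of only `/* ... */` block comments, and the reading of "cleaning any repeated statement" as dropping duplicated import lines. `convert_chunk` and `summarize` stand in for LLM calls.

```python
import re


def clean_sas(code):
    """Remove /* ... */ block comments, surrounding whitespace and blank lines."""
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)
    lines = [line.strip() for line in code.splitlines()]
    return "\n".join(line for line in lines if line)


def split_into_chunks(code, max_lines=2):
    """Split code into consecutive chunks of at most max_lines lines."""
    lines = code.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def merge_chunks(pieces):
    """Join converted chunks, keeping only the first copy of each import line."""
    seen_imports, merged = set(), []
    for piece in pieces:
        for line in piece.splitlines():
            if line.startswith(("import ", "from ")):
                if line in seen_imports:
                    continue
                seen_imports.add(line)
            merged.append(line)
    return "\n".join(merged)


def convert_program(sas_code, convert_chunk, summarize, max_lines=2):
    """Convert chunk by chunk, passing each chunk the summary of the previous one."""
    converted, prev_summary = [], None
    for chunk in split_into_chunks(clean_sas(sas_code), max_lines):
        converted.append(convert_chunk(chunk, prev_summary))
        prev_summary = summarize(chunk)
    return merge_chunks(converted)


# Stubs standing in for the LLM calls (hypothetical).
calls = []


def stub_convert(chunk, prev_summary):
    calls.append(prev_summary)  # record the context each chunk received
    return "import pandas as pd\n# converted: " + chunk.splitlines()[0]


def stub_summarize(chunk):
    return f"{len(chunk.splitlines())} line(s) processed"


sas = "/* demo */\ndata a;\n  set b;\nrun;\n"
result = convert_program(sas, stub_convert, stub_summarize, max_lines=2)
print(result)
```

Carrying a summary of the previous chunk forward is what gives each conversion some of the context that the slides list as a challenge for AI translation.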

  12. SAS to Python
     • Convert SAS built-in functions into Python functions or equivalents
     • Add comments
     • Translate dynamic macro constructs (like %LET or &) into equivalent Python variables or calls using f-strings
     • Convert SAS macros into equivalent Python functions
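
As a hand-written illustration of the mappings on slide 12, the snippet below shows a small SAS fragment (in comments) next to one possible Python equivalent. The fragment, the variable names, and the `pct` macro are invented for this example and are not output from the conversion tool.

```python
# SAS (for reference; hypothetical fragment):
#   %let year = 2022;
#   %let state = MD;
#   title "Survey summary for &state., &year.";
#   %macro pct(num, den); (&num / &den) * 100 %mend;

# %LET assignments become ordinary Python variables.
year = 2022
state = "MD"

# Macro-variable references (&state., &year.) become f-string fields.
title = f"Survey summary for {state}, {year}"


def pct(num, den):
    """Python counterpart of the hypothetical SAS macro %pct."""
    return (num / den) * 100


print(title)
print(pct(1, 4))
```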

  13. SAS to Python: Convert SAS macros into equivalent Python functions. For recodes, the AI translated exactly as the human would have.

  14. Language Translation

  15. Language Translation (Pilot)
     Capabilities:
     • Captures cultural nuances
     • Professional wording
     • Careful word choices for sensitive topics
     • Preserves context
     Language pairs: English to Hindi; English to Spanish; English to Korean

  16. Translation Example: Question (source: CDC Question Bank)
     • Main Topic: Changes in sales or revenue
     • Question Wording: "<Reference period>, how did this <business/agency/etc.>'s net profits change compared to what was normal before <event>? If this <business/agency/etc.> did not make a profit during this time period, select Not applicable."
     • If grid, subitems
     • Response options: Increased a lot; Increased a moderate amount; Did not change; Decreased a moderate amount; Decreased a lot; Not applicable

  17. Translation Example (EN-KO). Columns: English, OpenAI LLM, Google Translate
     • Word choices: Profit | Profit | Act of selling
     • Word choices: Event | Incident | A planned public or social occasion
     • Language flow (Emphasis): "If this <business/agency/etc.> did not make a profit during this time period, select Not applicable." Use more adequate language for Not Applicable. More straightforward and flows well with previous sentence. Emphasis on the If statement.
     • Formality: Informal, inconsistent use of formal language. Formal, Consistent, and Concise.
     • Response options: Increased a lot; Increased a moderate amount; Did not change; Decreased a moderate amount; Decreased a lot; Not applicable

  18. Challenges
     • Data privacy and security
     • Model accuracy and bias mitigation
     • Measuring Model Performance / Quality Assessment
     • Integration with existing systems

  19. Lifecycle and Key Dimensions of an AI System. Source: NIST, https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf

  20. Consideration
     • Tailoring the solution to the current organization structure and considering AI lifecycle stages
     • Scalability and flexibility requirements to accommodate future growth and evolving business needs
     • Training and upskilling initiatives to empower staff with the necessary skills for successful AI implementation

  21. Thank you. For any questions, please contact us at taylor.wilson@revealgc.com or angi.lee@revealgc.com
