Predicting Poverty Using Socio-Economic Indicators in Shelby County and Memphis
Explore a data science approach to predict poverty using socio-economic indicators in Shelby County and Memphis. Discover the challenges with poverty data estimates and data-driven approaches to address them. Research questions focus on quantifying infrastructure, urban revitalization, and education attainment to predict poverty rates.
Presentation Transcript
A Data Science Approach to Predicting Poverty Using Socio-Economic Indicators: The Case of Shelby County and City of Memphis. Srikar Velichety, Chen Zhang, Brian Hoogstra
The Problem: Unacceptable
Where does Memphis Stand?

MSA Greater than 1,000,000 (53 MSA), 2017
Overall Poverty Rank   Overall Poverty Rate   Metro Area
1                      19%                    New Orleans-Metairie, LA Metro Area
2                      17%                    Memphis, TN-MS-AR Metro Area
3                      17%                    Tucson, AZ Metro Area
4                      15%                    Cleveland-Elyria, OH Metro Area
5                      15%                    Detroit-Warren-Dearborn, MI Metro Area
6                      15%                    Birmingham-Hoover, AL Metro Area
7                      15%                    San Antonio-New Braunfels, TX Metro Area
8                      14%                    Riverside-San Bernardino-Ontario, CA Metro Area
9                      14%                    Miami-Fort Lauderdale-West Palm Beach, FL Metro Area
10                     14%                    Buffalo-Cheektowaga-Niagara Falls, NY Metro Area

MSA Greater than 500,000 (107 MSA), 2017
Overall Poverty Rank   Overall Poverty Rate   Metro Area
1                      30.0%                  McAllen-Edinburg-Mission, TX Metro Area
2                      21.4%                  Bakersfield, CA Metro Area
3                      21.1%                  Fresno, CA Metro Area
4                      21.1%                  El Paso, TX Metro Area
5                      18.6%                  New Orleans-Metairie, LA Metro Area
6                      17.1%                  Memphis, TN-MS-AR Metro Area
7                      16.7%                  Tucson, AZ Metro Area
8                      16.3%                  Baton Rouge, LA Metro Area
9                      16.2%                  Winston-Salem, NC Metro Area
10                     16.0%                  Lakeland-Winter Haven, FL Metro Area
Issues with Poverty Data Estimates: Household surveys of wealth are expensive and not comprehensive. Data is spread over a variety of agencies and sources. Local and national-level data sources are incompatible.
Data-Driven Approaches: Use of cell phone call data records to predict the incidence of poverty. Nighttime imaging as a proxy for economic activity to identify concentrations of poor regions.
Research Questions: How do we quantify infrastructure and urban revitalization measures from satellite image data, and how are they associated with poverty rates? How are the neighborhood characteristics of a census tract associated with the poverty rate of the focal tract? How do measures of education attainment supplement infrastructure and neighborhood characteristics to predict poverty rates?
Methodology
PROBLEM IDENTIFICATION: How to accurately predict the poverty rate in a census tract using infrastructure development and revitalization, neighborhood, education and ethnicity measures?
OBJECTIVES: (a) Devise metrics for quantifying infrastructure development and revitalization in a census tract using image analytics on historic snapshots. (b) Quantify the spatial impact of the characteristics of the neighborhood tracts on the poverty rate in the focal census tract using social network analysis. (c) Build robust models for predicting the poverty rate using education, ethnicity, infrastructure and neighborhood measures.
DESIGN: Feature engineering and analytics to identify (a) variables related to infrastructure development and revitalization using deep learning methods, (b) variables related to the social network structure of neighborhood census tracts, and (c) variables related to education and ethnicity measures.
DEMONSTRATION: Models for predicting the poverty rate in a census tract. Comparison with existing models.
EVALUATION: (a) How well does the model perform compared to the existing models? (b) What value does each of the variables add to the model? (c) How robust is the model to changes in data and techniques?
COMMUNICATION: (a) Methods of combining Convolutional Neural Networks and Recurrent Neural Networks to detect sequences in images. (b) Presentation of artifacts that can be replicated by researchers and practitioners for mining images and social networks.
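As a rough illustration of the neighborhood objective, the sketch below derives two simple network features for each census tract: its degree (number of adjacent tracts) and the mean poverty rate of its neighbors. The slides do not specify which social network measures were used, so these two features, the tract IDs, the adjacency map and the rates are all hypothetical.

```python
# Hypothetical adjacency map: which census tracts share a border.
# Tract IDs and links are illustrative only, not Shelby County data.
ADJACENCY = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
}

# Hypothetical poverty rates (as fractions) per tract; not real figures.
POVERTY = {"A": 0.10, "B": 0.30, "C": 0.20, "D": 0.40}


def neighborhood_features(adjacency, values):
    """Return, per tract, its degree and the mean value over adjacent tracts."""
    features = {}
    for tract, neighbors in adjacency.items():
        degree = len(neighbors)
        # A tract with no neighbors has no defined neighbor mean.
        mean = sum(values[n] for n in neighbors) / degree if degree else None
        features[tract] = {"degree": degree, "neighbor_mean": mean}
    return features


features = neighborhood_features(ADJACENCY, POVERTY)
for tract, feats in sorted(features.items()):
    print(tract, feats)
```

Features like these could sit alongside the education, ethnicity and infrastructure variables as model inputs. Note that using neighbors' poverty rates as predictors of a focal tract's poverty rate requires care to avoid leaking the target into the inputs.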
Data Sources: Day-time satellite images. Night-time satellite images. Real estate transactions. Real estate attributes.
Challenges with Data Collection: Cloud cover for daytime images. Biases in Census Bureau reporting. Missing latitude/longitude data in the City Assessor's Office DB.
Image Segmentation [Figure panels: Original, 3 Clusters, 4 Clusters]
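The slide shows an image segmented into 3 and 4 clusters but does not name the clustering method. As a minimal sketch, assuming a k-means style approach on pixel intensities, the code below segments a tiny hypothetical grayscale image. Real satellite images would have color channels and far more pixels; the function names and the toy image are illustrative only.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster scalar intensities into k groups with Lloyd's algorithm."""
    ordered = sorted(set(values))
    if len(ordered) < k:
        raise ValueError("need at least k distinct intensities")
    # Deterministic initialisation: spread centers over the sorted distinct values.
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centers = [float(ordered[round(i * step)]) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        new_centers = [
            sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def segment(image, k):
    """Replace every pixel with the index of its nearest cluster center."""
    pixels = [p for row in image for p in row]
    centers = kmeans_1d(pixels, k)
    return [
        [min(range(k), key=lambda j: abs(p - centers[j])) for p in row]
        for row in image
    ]


# Hypothetical 3x4 grayscale image (0-255); not taken from the presentation.
IMAGE = [
    [10, 12, 11, 200],
    [9, 100, 105, 205],
    [11, 98, 102, 198],
]

labels_3 = segment(IMAGE, 3)
labels_4 = segment(IMAGE, 4)
for row in labels_3:
    print(row)
```

Comparing `labels_3` and `labels_4` mirrors the 3-cluster versus 4-cluster panels: the extra cluster splits one of the existing intensity groups rather than changing the overall layout.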
Basic Correlation Plot [Figure: correlation plot of the variables against the target]
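A correlation plot of this kind is typically built from pairwise correlation coefficients. The sketch below computes the Pearson correlation of each feature with a target column. The column names and values are hypothetical stand-ins, not the presentation's data, and Pearson correlation is assumed because the slide does not state which coefficient was used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlations_with_target(data, target):
    """Correlate every non-target column with the target column."""
    return {
        name: pearson(column, data[target])
        for name, column in data.items()
        if name != target
    }


# Hypothetical tract-level columns; values are illustrative only.
DATA = {
    "poverty_rate": [0.10, 0.20, 0.30, 0.40],
    "college_share": [0.50, 0.40, 0.30, 0.20],
    "night_light": [8.0, 9.0, 3.0, 4.0],
}

result = correlations_with_target(DATA, "poverty_rate")
for name, r in result.items():
    print(f"{name}: {r:+.3f}")
```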
Next Steps: Combine Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) to develop infrastructure development indicators (validated against data on the number of business licenses). Use high-dimensional statistics to build models that predict poverty rates. Analyze feature importance and interactions (using interpretability metrics).
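The slides do not name the interpretability metrics to be used. One common, model-agnostic option is permutation importance: shuffle one feature column at a time and measure how much the prediction error grows. The sketch below assumes that choice. `toy_model` is a hypothetical stand-in for a fitted poverty model, and the rows are made-up values.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_features, repeats=5, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    baseline = mse(targets, [predict(r) for r in rows])
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            error = mse(targets, [predict(r) for r in permuted])
            increases.append(error - baseline)
        importances.append(sum(increases) / repeats)
    return importances


def toy_model(row):
    """Hypothetical fitted model: uses feature 0 and ignores feature 1."""
    return 0.5 - 0.4 * row[0]


# Hypothetical rows: [college_share, unrelated_feature]; not real data.
ROWS = [
    [0.10, 5.0], [0.20, 1.0], [0.30, 7.0], [0.40, 3.0],
    [0.50, 8.0], [0.60, 2.0], [0.70, 6.0], [0.80, 4.0],
]
TARGETS = [toy_model(r) for r in ROWS]

importances = permutation_importance(toy_model, ROWS, TARGETS, n_features=2)
print(importances)
```

A feature the model ignores gets an importance of zero, while a feature the model relies on shows a positive error increase. Interactions between features would need a separate analysis, such as permuting pairs of columns together.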