
NBA Rebounding Analysis with Alteryx - Web Scraping & Predictive Analytics
Explore how Alteryx can be used to scrape NBA player data from BasketballReference.com, collect season statistics, and build a predictive model. Key concepts include HTML table structure, batch macros, the Download tool, XML parsing, regex parsing, and the Alteryx Predictive Suite.
Presentation Transcript
Scraping the Glass: NBA Rebounding Analysis. Web Scraping and Predictive Analytics with Alteryx. Greg Murray, Hector Amaya.
Objectives
1. Use Alteryx to scrape BasketballReference.com
   A. Return a list of all basketball players in the catalog
   B. Return season statistics for all players in 2000 or later
2. Use the resulting data set to build a predictive model
Key Concepts: Batch Macros; Download Tool; XML Parse Tool; Regex Tool and Regex Parsing; Alteryx Predictive Suite
Objective 1a: Collect list of Players from BR Catalog. Step 1: Examine the webpage(s) that will be scraped.
Step 2: Examine the structure of the HTML for the desired section. Use your web browser's Inspect function (right-click, then Inspect).
Basic structure for HTML tables
Rendered table:
    Column 1 | Column 2 | Column 3
    Data 1   | Data 2   | Data 3
Underlying HTML, with the role of each tag:
    <table>                    - Table Tag
      <thead>                  - Table Heading Tag
        <tr>                   - Table Row Tag
          <th>Column 1</th>    - Column Headers
          <th>Column 2</th>
          <th>Column 3</th>
        </tr>                  - Table Row End Tag
      </thead>                 - Table Heading End Tag
      <tbody>                  - Table Body Tag
        <tr>
          <td>data 1</td>      - Table Detail (cell)
          <td>data 2</td>
          <td>data 3</td>
        </tr>
      </tbody>                 - Table Body End Tag
    </table>                   - Table End Tag
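To make the tag roles concrete, here is a minimal Python sketch that walks the same three-column example and separates header cells from data cells. It is not part of the Alteryx workflow; the class name `TableParser` and the `SAMPLE` string are illustrative assumptions.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect header cells (<th>) and data rows (<td>) from one HTML table."""

    def __init__(self):
        super().__init__()
        self.headers = []   # text of each <th> cell
        self.rows = []      # one list of <td> texts per <tr> that holds data
        self._cell = None   # tag name of the cell currently being read
        self._row = []      # <td> texts collected for the current <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = tag

    def handle_data(self, data):
        text = data.strip()
        if not text or self._cell is None:
            return
        if self._cell == "th":
            self.headers.append(text)
        else:
            self._row.append(text)

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._cell = None
        elif tag == "tr" and self._row:
            # Only rows that contained <td> cells are kept as data rows.
            self.rows.append(self._row)
            self._row = []


# Hypothetical sample mirroring the slide's three-column table.
SAMPLE = """
<table>
  <thead>
    <tr><th>Column 1</th><th>Column 2</th><th>Column 3</th></tr>
  </thead>
  <tbody>
    <tr><td>data 1</td><td>data 2</td><td>data 3</td></tr>
  </tbody>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE)
print(parser.headers)
print(parser.rows)
```

Real pages may place extra markup inside cells, so this sketch only illustrates the header/body split described on the slide.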
Step 3: Use Alteryx to retrieve the HTML. Pass the first URL to the Download tool using a Text Input tool. Use the Text to Columns tool to split the HTML into rows. Add record IDs to the rows.
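For readers who want to see the same three actions outside Alteryx, the following Python sketch downloads a page, splits it into one row per line, and numbers the rows. The function names are assumptions made for illustration, numbering from 1 is also an assumption, and `fetch_html` is defined but deliberately not called so the example runs offline.

```python
from urllib.request import Request, urlopen


def fetch_html(url):
    """Download a page and return its body as text (not called in this sketch)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def html_to_rows(html):
    """Split HTML on line breaks and attach a record ID to each row, starting at 1."""
    return list(enumerate(html.splitlines(), start=1))


# Hypothetical stand-in for a downloaded page.
sample = "<table>\n<tr><td>data 1</td></tr>\n</table>"
rows = html_to_rows(sample)
for record_id, text in rows:
    print(record_id, text)
```

Keeping a record ID on every row is what lets later steps refer to the table by position.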
Step 4: Isolate the code containing the table data. Challenge: the table must be identified dynamically, because it can be located on different lines on each page and can have a different number of rows depending on the number of players.
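One way to express dynamic isolation is to search for the opening and closing table tags rather than hard-coding line numbers. The Python sketch below assumes rows are `(record_id, text)` pairs and that the page holds a single, non-nested table; the function name and sample pages are hypothetical.

```python
def isolate_table(rows):
    """Return the rows from the first <table ...> line through its </table> line."""
    start = end = None
    for record_id, text in rows:
        lowered = text.lower()
        if start is None and "<table" in lowered:
            start = record_id
        if start is not None and "</table>" in lowered:
            end = record_id
            break
    if start is None or end is None:
        return []  # no complete table found on this page
    return [(rid, text) for rid, text in rows if start <= rid <= end]


# Two hypothetical pages: the table starts on a different line in each
# and has a different number of rows.
page_a = [(1, "<html>"), (2, '<table id="players">'),
          (3, "<tr><td>x</td></tr>"), (4, "</table>"), (5, "</html>")]
page_b = [(1, "<html>"), (2, "<p>intro</p>"), (3, '<table id="players">'),
          (4, "<tr><td>x</td></tr>"), (5, "<tr><td>y</td></tr>"),
          (6, "</table>"), (7, "</html>")]

print(isolate_table(page_a))
print(isolate_table(page_b))
```

Because the boundaries are found by content, the same logic works on both pages without adjustment.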
Step 5: Isolate and parse the table headers. Isolate the table headers by reusing the section of tools used to isolate the table. Parse the table headers using an XML Parse tool.
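As a rough analogue of parsing the isolated header block as XML, this Python sketch reads the `<th>` elements from a header fragment. It assumes the fragment is well-formed XML, which real HTML does not guarantee; `parse_headers` and the sample fragment are illustrative names.

```python
import xml.etree.ElementTree as ET


def parse_headers(thead_xml):
    """Return the text of every <th> element in a well-formed <thead> fragment."""
    root = ET.fromstring(thead_xml)
    return ["".join(th.itertext()).strip() for th in root.iter("th")]


# Hypothetical header fragment matching the three-column example.
thead = ("<thead><tr><th>Column 1</th><th>Column 2</th>"
         "<th>Column 3</th></tr></thead>")
headers = parse_headers(thead)
print(headers)
```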
Step 6: Isolate and parse the table detail. Isolate the table detail by reusing the section of tools used to isolate the table and table headers.
Step 6 (continued): Parse the table detail using a combination of XML parsing and regex.
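The combination of XML parsing and regex can be sketched in Python as follows: XML parsing yields the cell text, while a regular expression pulls the link target out of the raw markup. The row markup, the player name, and the URL suffix are invented for illustration and are not taken from the site.

```python
import re
import xml.etree.ElementTree as ET

# Captures the value of the first href attribute in a row.
HREF_PATTERN = re.compile(r'href="([^"]+)"')


def parse_detail_row(row_xml):
    """Return (cell texts, first link target or None) for one well-formed <tr>."""
    root = ET.fromstring(row_xml)
    cells = ["".join(cell.itertext()).strip()
             for cell in root if cell.tag in ("th", "td")]
    match = HREF_PATTERN.search(row_xml)
    suffix = match.group(1) if match else None
    return cells, suffix


# Hypothetical detail row: a linked name followed by two plain cells.
row = ('<tr><td><a href="/players/x/example01.html">Example Player</a></td>'
       "<td>2001</td><td>2010</td></tr>")
cells, suffix = parse_detail_row(row)
print(cells)
print(suffix)
```

The link target matters here because the visible cell text alone would not tell a later step which page to request.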
Step 7: Union headers to detail. Use the Auto Config by Position option. Ensure the header data stream is first in the Output Order.
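The effect of stacking by position with the header stream first can be sketched in Python. Promoting the first row to field names afterwards is an assumption about what happens downstream, not something the slide states; both function names are hypothetical.

```python
def union_by_position(header_row, detail_rows):
    """Stack the header row on top of the detail rows, matching columns by position."""
    width = len(header_row)
    stacked = [list(header_row)]
    for row in detail_rows:
        if len(row) != width:
            raise ValueError("row width does not match the header width")
        stacked.append(list(row))
    return stacked


def promote_first_row(stacked):
    """Use the first stacked row as field names for the remaining rows."""
    header, *data = stacked
    return [dict(zip(header, row)) for row in data]


stacked = union_by_position(
    ["Column 1", "Column 2", "Column 3"],
    [["data 1", "data 2", "data 3"]],
)
records = promote_first_row(stacked)
print(records)
```

Positional matching only works if headers and detail cells were parsed in the same column order, which is why the header stream has to come first.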
Step 8: Validate the output. Compare the data in the workflow to the webpage.
Step 9: Convert the workflow into a batch macro. Add a Control Parameter tool to the Text Input tool at the beginning of the workflow. Add a Macro Output tool at the end of the workflow.
Step 9 (continued): Save the workflow as a batch macro in your macro repository. If necessary, add a new macro repository through the user settings.
Step 10: Create a new workflow using the new macro. Pass the macro the player catalog URLs to download and parse all of the Player Catalog webpages.
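Conceptually, the batch macro runs the same parsing logic once per URL and stacks the results. A minimal Python sketch of that pattern follows; `run_batch`, `fake_parse_page`, and the `example.com` URLs are placeholders, and the parser is passed in as an argument so the sketch runs without network access.

```python
def run_batch(urls, parse_page):
    """Apply parse_page to every URL and combine the records, tagging each with its source."""
    combined = []
    for url in urls:
        for record in parse_page(url):
            combined.append({"source_url": url, **record})
    return combined


def fake_parse_page(url):
    """Stand-in for the download-and-parse workflow; returns canned records."""
    canned = {
        "https://example.com/players/a/": [{"Player": "Example A"}],
        "https://example.com/players/b/": [{"Player": "Example B1"},
                                           {"Player": "Example B2"}],
    }
    return canned.get(url, [])


catalog_urls = ["https://example.com/players/a/",
                "https://example.com/players/b/"]
players = run_batch(catalog_urls, fake_parse_page)
print(len(players))
```

Tagging each record with its source URL makes it easier to trace a bad row back to the page that produced it.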
Objective 1b: Return Season Statistics for Individual Players. Step 1: Examine the webpage(s) that will be scraped.
Step 2: Examine the structure of the HTML for the desired section. Use your web browser's Inspect function (right-click, then Inspect).
Step 3: Use Alteryx to retrieve the HTML. Pass the first URL to the Download tool using a Text Input tool. Use the Text to Columns tool to split the HTML into rows. Add record IDs to the rows.
Step 4: Isolate the code containing the table data. Challenge: the table must be identified dynamically, because it can be located on different lines on each page, it can have a different number of rows depending on the seasons played, and there are multiple tables on a player's page.
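When a page holds several tables, the opening tag needs a distinguishing attribute. The Python sketch below selects a table by its `id` attribute; the ids `totals` and `per_game` are hypothetical, the function name is illustrative, and the logic assumes tables are not nested.

```python
import re


def isolate_table_by_id(lines, table_id):
    """Return the lines of the table whose opening tag carries the given id."""
    opening = re.compile(
        r'<table\b[^>]*\bid="' + re.escape(table_id) + r'"', re.IGNORECASE
    )
    start = None
    for index, line in enumerate(lines):
        if start is None and opening.search(line):
            start = index
        if start is not None and "</table>" in line.lower():
            return lines[start:index + 1]
    return []  # no table with that id was found


# Hypothetical player page with two tables.
page = [
    '<table id="totals">',
    "<tr><td>1</td></tr>",
    "</table>",
    '<table class="stats" id="per_game">',
    "<tr><td>2</td></tr>",
    "</table>",
]
print(isolate_table_by_id(page, "per_game"))
```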
Step 5: Isolate and parse the table headers. Isolate the table headers by reusing the section of tools used to isolate the table. Parse the table headers using an XML Parse tool.
Step 6: Isolate and parse the table detail. Isolate the table detail by reusing the section of tools used to isolate the table and table headers. Parse the table detail using a combination of XML parsing and regex.
Step 7: Union headers to detail and append the URL suffix. Use the Auto Config by Position option. Ensure the header data stream is first in the Output Order.
Step 8: Convert the workflow into a batch macro. Add a Control Parameter tool to the Text Input tool at the beginning of the workflow. Add a Macro Output tool at the end of the workflow.
Step 8 (continued): Open the Interface Designer (Ctrl+Alt+D). Change the batch macro output mode to Auto Configure by Name. Save the workflow as a batch macro in your macro repository.
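Combining batch outputs by name rather than by position matters when different runs return different sets of columns. The slide does not say why this setting was chosen; one plausible reason, assumed here, is that season tables do not all share the same columns. A minimal Python sketch with hypothetical column names:

```python
def union_by_name(batches):
    """Stack records from several batches, aligning fields by name.

    Fields missing from a record are filled with None.
    """
    columns = []
    for batch in batches:
        for record in batch:
            for name in record:
                if name not in columns:
                    columns.append(name)
    rows = [{name: record.get(name) for name in columns}
            for batch in batches for record in batch]
    return columns, rows


# Two hypothetical batch outputs with different schemas.
batch_one = [{"Season": "2000-01", "TRB": 7.1}]
batch_two = [{"Season": "2005-06", "TRB": 9.4, "3P": 0.2}]
columns, rows = union_by_name([batch_one, batch_two])
print(columns)
print(rows)
```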
Step 9: Use the batch macro to parse the player pages. Open a new workflow. Use the Player Catalog list as your input data. Concatenate https://www.basketball-reference.com with the player URL suffix. Pass the concatenated URL into the batch macro. Limit the number of records per run using a Sample or Filter tool.
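The URL concatenation and the per-run record limit can be sketched in Python as below. The base URL comes from the slide; the suffixes are invented examples, and taking the first `limit` records is just one way to cap a run.

```python
BASE_URL = "https://www.basketball-reference.com"


def build_player_urls(suffixes, limit=None):
    """Prefix each player URL suffix with the site root, keeping at most `limit` URLs."""
    urls = [BASE_URL + suffix for suffix in suffixes]
    return urls if limit is None else urls[:limit]


# Hypothetical suffixes as they might come out of the Player Catalog list.
suffixes = ["/players/x/example01.html", "/players/y/example02.html"]
first_run = build_player_urls(suffixes, limit=1)
print(first_run)
```

Capping the number of records per run keeps each test cycle short and avoids sending a large burst of requests to the site at once.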
Resources
Regex: www.regexr.com
Batch Macros: https://community.alteryx.com/t5/Live-Training/Live-Training-Build-Your-First-Batch-Macro/m-p/52900
Alteryx Tool Mastery: Download, Regex