
Improving Computing Degree Programs for IT Job Market
This research delves into enhancing computing degree programs to meet industry demands, analyzing job availability and types, and sourcing job data from various platforms. Data collection and skills categorization are detailed to address the IT job market's needs effectively.
Presentation Transcript
Mining for Jobs in IT
Andrew Aken
April 2008

Synopsis
  Purpose
  Data source
  Data Collection process
  Skill Extraction process
  Skill Clustering process
  Results
Computer Science Data Mining Colloquium, 25 April 2008

Purpose of Research
The primary driver for this research is to determine what improvements need to be made in Computing degree programs to address the skills needs of the organizations hiring their graduates.
Additionally:
  Find what types of jobs are available
  Determine the relative quantity of different job types

Job Ad Data Sources
Jobs extracted from:
  HotJobs.com
  Monster.com
  SimplyHired.com (through August 2007)
Searches based upon a combination of:
  Majors
    E.g., CS or Computer Science
  Degree names
    Bachelors or BS or Masters or BA or Degree
Daily search of jobs posted within the past 24 hours

Data Collection Process
Define format of search fields
  Root URI: http://jobsearch.monster.com/Search.aspx?re=130&cy=us&JSNONREG=1&...
  Job posting URI: http://jobview.monster.com/getjob.asp?JobID=56049704
  Search field: q
  Page number field: pg
  Number of days field: tm
  JobID field: JobID
  Results per page: 25
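
The search fields above can be combined into a request URL roughly as follows. This is only a sketch: the slide's root URI is truncated, so `SEARCH_BASE` is a hypothetical placeholder, and passing `tm=1` for "the past 24 hours" is an assumption about how the number-of-days field is used.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical placeholder: the real root URI on the slide is truncated.
SEARCH_BASE = "http://jobsearch.example.com/Search.aspx"
RESULTS_PER_PAGE = 25  # value given on the slide


def build_search_url(query: str, page: int, days: int = 1) -> str:
    """Build a search URL from the q, pg, and tm fields named on the slide."""
    params = {"q": query, "pg": page, "tm": days}
    return f"{SEARCH_BASE}?{urlencode(params)}"


def extract_job_id(job_url: str) -> Optional[str]:
    """Return the JobID field of a job posting URI, or None if it is absent."""
    values = parse_qs(urlparse(job_url).query).get("JobID")
    return values[0] if values else None


def pages_needed(total_results: int) -> int:
    """Number of result pages to request at 25 results per page."""
    return -(-total_results // RESULTS_PER_PAGE)  # ceiling division
```

Calling `extract_job_id` on the job posting URI shown on the slide would give the identifier used to tell postings apart.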

Data Collection (cont.)
Each day, jobs are automatically retrieved from job search sites
Job data is stored in an SQL database
  Date retrieved
  JobID
  Job title (when found)
  Salary (in progress)
  Location (in progress)
  Web page title
  URL
  Referrer URI
  Original HTML
  Extracted text
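
One way to picture the stored record is the table sketch below. The slide does not name the database product or the column names, so SQLite, the `job_ads` table, every column name, and the choice of JobID as primary key are all assumptions made for illustration.

```python
import sqlite3
from datetime import date

# Hypothetical schema: one column per field listed on the slide.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_ads (
    job_id         TEXT PRIMARY KEY,  -- assumed unique per posting
    date_retrieved TEXT NOT NULL,
    job_title      TEXT,              -- stored when found
    salary         TEXT,              -- "in progress" on the slide
    location       TEXT,              -- "in progress" on the slide
    page_title     TEXT,
    url            TEXT,
    referrer_uri   TEXT,
    original_html  TEXT,
    extracted_text TEXT
)
"""


def open_store(path=":memory:"):
    """Open (or create) the job ad store."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_job(conn, job_id, url, original_html, extracted_text, **optional):
    """Insert one ad; return False if this JobID was already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO job_ads (job_id, date_retrieved, job_title, "
        "salary, location, page_title, url, referrer_uri, original_html, "
        "extracted_text) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
            job_id,
            date.today().isoformat(),
            optional.get("job_title"),
            optional.get("salary"),
            optional.get("location"),
            optional.get("page_title"),
            url,
            optional.get("referrer_uri"),
            original_html,
            extracted_text,
        ),
    )
    conn.commit()
    return cur.rowcount == 1
```

Keying on the job identifier means a posting that shows up in several daily searches is kept only once, which seems a reasonable fit for a crawler that re-runs every day.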

Skills Categorizations
Soft skills
  Problem-solving (8)
  Interpersonal (5)
  Work ethic (7)
  Communications (4)
Business skills
  Business processes (6)
  Management (6)
  Project management (2)
  Strategy (4)
Technical skills
  Software development (10)
  Business applications (6)
  Information management (5)
  Hardware management (5)
  ...
  Programming (15)

Skill Retrieval Process
Skills stored in their root word forms where applicable
  E.g., compilers is stored as compiler
Job ad is stripped of HTML, hidden text, and links
Stripped job ad is also converted to root words
Search for skills and their synonyms in the stripped ad or the root word ad
Multiple word phrases combined in several ways
  visual basic searched as visualbasic, visual_basic, & visual-basic
Phrases between the first and last non-common usage skills are evaluated to see if they may indicate a new skill
  Proximity analysis
Discovered skills are periodically reviewed manually
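
The matching steps above might look something like the following sketch. It is not the project's actual code: `root_word` is a crude plural-stripping stand-in for whatever stemmer was used, treating script, style, and anchor contents as the "hidden text and links" to drop is an assumption, and all function names are hypothetical.

```python
import re
from html.parser import HTMLParser

SKIPPED_TAGS = ("script", "style", "a")  # assumed "hidden text" and links


class _TextExtractor(HTMLParser):
    """Collect text outside the skipped tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def root_word(word):
    """Crude stand-in for a real stemmer: drop a trailing plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def phrase_variants(phrase):
    """'visual basic' -> itself plus visualbasic, visual_basic, visual-basic."""
    words = phrase.lower().split()
    return {sep.join(words) for sep in (" ", "", "_", "-")}


def find_skills(html, skills):
    """Return the skills found in the stripped ad or its root-word form."""
    tokens = re.findall(r"[a-z0-9+#_-]+", strip_html(html).lower())
    plain = " " + " ".join(tokens) + " "
    rooted = " " + " ".join(root_word(t) for t in tokens) + " "
    found = set()
    for skill in skills:
        for variant in phrase_variants(skill):
            needle = " " + variant + " "
            if needle in plain or needle in rooted:
                found.add(skill)
                break
    return found
```

Padding the text and each variant with spaces keeps a short skill name from matching inside a longer word, which matters for names such as "c" or "ada".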

Skill Frequencies (May-June)
Frequency:
  33.29%, 28.69%, 27.57%, 26.08%, 23.18%, 21.09%, 20.10%, 18.86%, 18.01%, 17.19%, 17.15%, 17.06%, 14.88%, 13.98%, 13.56%, 13.43%, 12.80%, 12.50%, 12.47%, 12.10%, 11.70%, 11.37%, 11.16%, 10.55%
Skill:
  Security
  C/C++
  SQL
  Programming
  Microsoft operating systems
  Java / J2EE / J2P
  Leadership
  Project Management / Planning / Budgeting / Scheduling
  Software Development
  Oracle databases
  UNIX operating systems
  Business Strategy
  Certification
  Finance
  XML
  Generic databases
  HTML / XHTML / DHTML
  Open-source operating systems
  Marketing
  JavaScript
  Accounting
  Microsoft databases
  Object Oriented Programming
  .NET

Skill Frequencies (May-June)
Frequency  Skill
  1.88%  Python
  1.68%  Apple operating systems
  1.51%  Eclipse
  1.38%  Mainframe operating systems
  1.31%  System Testing
  1.13%  Cobol
  1.12%  Systems Design
  1.09%  Electronic Data Interchange
  0.95%  Negotiation Skills
  0.87%  ColdFusion
  0.72%  TCL
  0.69%  CGI
  0.59%  Oral Communications
  0.48%  Conflict Resolution
  0.43%  Ada
  0.32%  Forth
  0.29%  Business Continuity Planning
  0.28%  4GL
  0.28%  DSS / GDSS
  0.23%  LAMP
  0.21%  Fortran
  0.20%  FrontPage
  0.14%  Pascal
  0.10%  Transaction Processing Systems
  0.06%  Lisp
  0.04%  Smalltalk
  0.02%  Prolog
  0.00%  Eiffel
  0.00%  Modula-2
  0.00%  Simula
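
The slides do not define how a frequency is computed. Assuming it is the share of job ads that mention a skill at least once, a minimal sketch would be the following, where the function name and input shape are hypothetical.

```python
from collections import Counter


def skill_frequencies(ad_skills):
    """Percentage of ads mentioning each skill, highest first.

    ad_skills maps a job ID to the collection of skills found in that ad.
    Assumption: a skill counts once per ad, however often it appears.
    """
    total = len(ad_skills)
    counts = Counter(
        skill for skills in ad_skills.values() for skill in set(skills)
    )
    return sorted(
        ((skill, 100.0 * n / total) for skill, n in counts.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )
```

Under this reading the percentages need not sum to 100, since one ad usually lists several skills.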

Skill Clustering Process
Skills found are stored in a database with ID of job ad
Skills are retrieved from table with each ad as a case and each skill representing an attribute
Outliers are eliminated
Hierarchical Agglomerative Clustering (HAC) is used on a subset of the data
  Produces nested sequence of partitions to determine the most appropriate number of clusters
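
To illustrate what a nested sequence of partitions means, here is a small pure-Python agglomerative clustering sketch. The study ran HAC in SPSS; the Jaccard distance and average linkage used here are assumptions chosen for illustration, not the settings reported in the talk.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two skill sets: 1 - |intersection| / |union|."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(c1, c2, ads):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(jaccard_distance(ads[i], ads[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hac(ads):
    """Agglomerative clustering over a list of skill sets (one per ad).

    Returns a list of (merge_distance, partition) pairs, starting with one
    cluster per ad and ending with a single cluster. Each partition is a
    list of tuples of ad indices.
    """
    clusters = [(i,) for i in range(len(ads))]
    history = [(0.0, list(clusters))]
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        (a, b), dist = min(
            (
                ((x, y), average_linkage(x, y, ads))
                for x, y in combinations(clusters, 2)
            ),
            key=lambda item: item[1],
        )
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        history.append((dist, list(clusters)))
    return history
```

A large jump in merge distance between consecutive partitions is one common signal for where to cut the hierarchy. The repeated pairwise comparisons also show why HAC becomes costly as the number of job records grows.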

Sample HAC Dendrogram

Skill Clustering Process (cont.)
K-Means cluster analysis is performed on the entire set to generate job classifications
  SPSS can only handle about 2200 job records with 69 skill attributes using HAC
  HAC with subset of job records used to determine appropriate number of clusters
  20 clusters were identified and later verified using multiple runs with differing numbers of clusters
Clusters are named according to skill groups found
Generated clusters are then grouped by function
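
For readers unfamiliar with K-Means, the sketch below runs Lloyd's algorithm on 0/1 skill indicator rows (one row per ad, one column per skill). The study used SPSS; the farthest-first initialisation, squared Euclidean distance, and function names here are assumptions chosen to keep the example deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(rows, k):
    """Deterministic start: first row, then repeatedly the farthest row."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        nxt = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(rows, k, max_iter=50):
    """Return (assignments, centroids) for k clusters of the given rows."""
    centroids = farthest_first(rows, k)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each ad goes to its nearest centroid.
        new = [
            min(range(k), key=lambda c: sq_dist(row, centroids[c]))
            for row in rows
        ]
        if new == assignments:
            break  # converged: no ad changed cluster
        assignments = new
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [r for r, a in zip(rows, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids
```

With 0/1 inputs, each centroid coordinate is the fraction of the cluster's ads that mention that skill. Running the function for several values of `k` and comparing the results is one way to approximate the "multiple runs with differing numbers of clusters" check described above.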

Categories of IT Jobs

Sample Cluster: Web Programmers
Members: 6,187
Cluster number: 18
  86.10%  HTML / XHTML / DHTML
  82.80%  JavaScript
  50.10%  XML
  49.20%  Java / J2EE / J2P
  35.20%  AJAX
  32.20%  Programming
  21.80%  Web-based Application Development
  20.30%  Software Development
  19.60%  Marketing
  19.10%  C/C++
  18.50%  JavaServer Pages
  18.20%  SQL
  17.40%  User Interface Design
  16.10%  Microsoft Operating Systems
  16.10%  Object-Oriented Programming
  15.70%  ASP
  15.50%  Security
  13.30%  Dreamweaver
  13.20%  PHP
  11.20%  Business Strategy
  10.60%  UNIX operating systems

Sample Cluster: Project Analysts / Managers
Members: 21,942
Cluster number: 17
  100.00%  Project Management / Planning / Budgeting / Scheduling
   56.10%  Leadership
   42.70%  Business Strategy
   25.60%  Finance
   25.00%  Certification
   19.50%  Marketing
   19.40%  Enterprise Resource Planning
   18.30%  Accounting
   16.60%  Responsibility
   15.90%  C/C++
   15.40%  Security
   14.50%  CASE Tools
   14.20%  Business Process Design / Reengineering
   13.70%  Software Development
   13.20%  Contracting and legal
   10.90%  Integrity / Honesty / Ethics
   10.20%  Change Management

dogs-it.org website
The complete set of clusters, skills, and certifications from the data collected April through June of 2007 is located on the website: www.dogs-it.org

Collected Data
Initial analysis period of April-June 2007
  240,000 jobs
  286 Skills
  196 Certifications

Current Data
Currently 703,998 jobs stored (growing at about 6000/day)
  456,136 CS jobs (65.12%)
  223,594 MIS jobs (31.92%)
  117,562 IST jobs (16.78%)
Started in January 2008
1,709 skills & their synonyms are being analyzed
  Includes skills from O*NET database
5,582,167 references to skills found in job ads

So what?
What can we expect to learn from this type of analysis?
  Which skills are most in demand for a given degree
  What trends can be observed in popularity of skills (over time)
  What skill groups are most popular
    Skill groups are analogous to job categories/types
  What new skills are emerging (over time)
  Strength of market for IT jobs
  Provide new methods for searching for a job

Remaining issues & enhancements
  Eliminate invalid pages
  Complete word root analysis of skills and documents
  Complete proximity search of skills
  Effective weighting of skills
  SPSS cannot handle large set of data for performing HAC
  Implement HAC within DOGS-IT
  Real-time clustering
  Develop UI to allow direct access to data
  Expert system for students, job seekers, industry, & education
  Alternative job search mechanisms

Questions?