Discover the Power of Variable Clustering Analysis


Variable clustering groups correlated variables together and separates them from the variables they are uncorrelated with. This presentation covers the algorithm, the eigenvalue threshold, and the VARCLUS procedure, and shows how to extract the names of the numeric input variables and use ODS to evaluate cluster quality.




Presentation Transcript


  1. Variable Clustering

  2. Variable Clustering. Variable clustering finds groups of variables that are as correlated as possible among themselves and as uncorrelated as possible with variables in other clusters. The basic algorithm is binary and divisive: all variables start in one cluster, and a principal components analysis is done on the variables in that cluster.

  3. If the second eigenvalue is greater than a specified threshold (in other words, there is more than one dominant dimension), the cluster is split. The PC scores are then rotated obliquely so that the variables can be split into two groups. This process is repeated for the two child clusters, and so on, until no cluster has a second eigenvalue above the threshold.
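The split test described above can be illustrated outside SAS. The following is a minimal Python sketch, not the VARCLUS implementation: it only computes the correlation matrix of a cluster's variables and checks whether the second-largest eigenvalue exceeds the threshold. The function names (`correlation_matrix`, `eigenvalues_symmetric`, `should_split`) and the Jacobi eigenvalue routine are assumptions made for illustration, and the oblique rotation step that actually assigns variables to the two child clusters is omitted.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]

def eigenvalues_symmetric(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix (cyclic Jacobi rotations), largest first."""
    a = [row[:] for row in matrix]
    k = len(a)
    for _ in range(sweeps):
        # Stop once the off-diagonal part is numerically zero.
        off = sum(a[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
        if off < tol:
            break
        for p in range(k - 1):
            for q in range(p + 1, k):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.copysign(math.pi / 4, a[p][q])
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for i in range(k):  # rotate columns p and q
                    aip, aiq = a[i][p], a[i][q]
                    a[i][p] = c * aip - s * aiq
                    a[i][q] = s * aip + c * aiq
                for j in range(k):  # rotate rows p and q
                    apj, aqj = a[p][j], a[q][j]
                    a[p][j] = c * apj - s * aqj
                    a[q][j] = s * apj + c * aqj
    return sorted((a[i][i] for i in range(k)), reverse=True)

def should_split(columns, maxeigen=0.7):
    """True if the cluster's second eigenvalue exceeds the threshold."""
    eig = eigenvalues_symmetric(correlation_matrix(columns))
    return len(eig) > 1 and eig[1] > maxeigen

# Made-up illustration data: x and y are almost collinear, u and v are uncorrelated.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 8, 10, 12.5]
u = [1, -1, 1, -1]
v = [1, 1, -1, -1]
print(should_split([x, y]))  # one dominant dimension, so no split
print(should_split([u, v]))  # two dimensions, so split
```

The default of 0.7 mirrors the maxeigen=.7 value used later in the slides; a lower threshold produces more, smaller clusters.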

  4. The VARCLUS Procedure
       PROC VARCLUS DATA=SAS-data-set <options>;
          VAR variables;
       RUN;

  5. Variable Clustering, the develop data set. Variable labels shown in the slide's figure: Checking Deposits, Mortgage Balance, Number of Checks, Teller Visits, Credit Card Balance, Age.

  6. Where we are
       proc contents data=d.imputed;
       run;

  7. Get the names of the numeric input variables from dictionary.columns
       /* get names of numeric variables in a macro variable;
          note that leaving out brclus5 gives us 4 indicator variables */
       proc sql;
          describe table dictionary.columns;
          select name into :inputs separated by " "
             from dictionary.columns
             where memname="IMPUTED" and libname="D"
               and name ^= "Ins" and name ^= "brclus5"
               and type="num";
       quit;
       %put &inputs;

  8. Variable Clustering
       proc varclus data=d.imputed maxeigen=.7 hi short;
          var &inputs;
          title "Variable Clustering of Imputed Data Set";
       run;
       title;

  9. Use ODS to save the cluster quality and R-square tables as data sets.
       ods output clusterquality=summary rsquare=clusters;
       proc varclus data=d.imputed maxeigen=.7 short hi;
          var &inputs;
       run;

  10. Print the cluster quality summary.
       proc print data=summary;
       run;

  11. Numerous possibilities for summarizing clusters:
       • Principal components
       • Pick one variable:
           • Based on subject matter
           • Based on statistics

  12. Print the R-square table for the 39-cluster solution.
       proc print data=clusters;
          where numberofclusters=39;
       run;

  13. One variable per cluster
       /* Pick one variable per cluster for the first 10.
          The others are clusters of one variable. */
       %let reduced=
          MIPhone MICCBal Dep MM ILS MTGBal Income POS CD IRA
          brclus1 Sav NSF Age SavBal LOCBal NSFAmt Inv MIHMVal CRScore
          MIAcctAg InvBal DirDep CCPurc SDB CashBk AcctAge InArea ATMAmt DDABal
          DDA brclus2 CC HMOwn DepAmt Phone ATM LORes brclus4;
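The slides do not say which statistic was used to pick each cluster's representative. One common choice, assumed here, is the 1-R² ratio, (1 - R² with own cluster) / (1 - R² with next closest cluster): the variable with the lowest ratio is strongly tied to its own cluster and weakly tied to the others. The Python sketch below illustrates that rule on made-up rows; the function names, the variable names x1 to x3, and the R² values are all hypothetical and do not come from the presentation's data.

```python
def one_minus_r2_ratio(r2_own, r2_next):
    """(1 - R^2 with own cluster) / (1 - R^2 with next closest cluster)."""
    return (1.0 - r2_own) / (1.0 - r2_next)

def pick_representatives(rows):
    """Return {cluster: variable} choosing the lowest 1-R^2 ratio in each cluster.

    rows: iterable of (cluster, variable, r2_own, r2_next) tuples.
    """
    best = {}
    for cluster, variable, r2_own, r2_next in rows:
        ratio = one_minus_r2_ratio(r2_own, r2_next)
        if cluster not in best or ratio < best[cluster][1]:
            best[cluster] = (variable, ratio)
    return {cluster: variable for cluster, (variable, _) in best.items()}

# Hypothetical rows, for illustration only.
rows = [
    ("Cluster 1", "x1", 0.9, 0.1),
    ("Cluster 1", "x2", 0.6, 0.3),
    ("Cluster 2", "x3", 1.0, 0.05),  # a one-variable cluster has R^2 = 1 with itself
]
print(pick_representatives(rows))
```

A one-variable cluster always selects its only member, which matches the slide's note that the clusters beyond the first 10 need no choice.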
