
Feature Selection Techniques for Data Analysis
Explore feature selection with Prof. V. B. More at MET's IOE, BKC, Nashik. Learn about Scikit-learn datasets, managing categorical data, non-negative matrix factorization, and more. Discover how NMF can be used instead of standard PCA and the details of the X ≈ WH factorization. Dive into matrix factorization methods and their applications in data analysis, and enhance your understanding of feature selection in data science.
Presentation Transcript
Feature Selection
Prof. V. B. More, MET's IOE, BKC, Nashik
Unit 2: Feature Selection
Scikit-learn datasets, creating training and test sets, managing categorical data, managing missing features, data scaling and normalization, feature selection and filtering, Principal Component Analysis (PCA), non-negative matrix factorization, Sparse PCA, Kernel PCA, Atom Extraction and Dictionary Learning.
Non-Negative Matrix Factorization

When the dataset is made up of non-negative elements, it is possible to use non-negative matrix factorization (NNMF) instead of standard PCA. The algorithm optimizes a loss function based on the Frobenius norm:

    L = (1/2) ||X − WH||²_F, subject to W ≥ 0 and H ≥ 0
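To make the objective concrete, here is a minimal NumPy sketch (the matrices and their values are illustrative, not from the slides) that evaluates this loss for a candidate factorization:

    import numpy as np

    # Illustrative non-negative matrices (values chosen arbitrarily)
    X = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    W = np.array([[0.5],
                  [1.5]])          # dim(W) = n x p, here 2 x 1
    H = np.array([[2.0, 2.5]])     # dim(H) = p x m, here 1 x 2

    # Frobenius-norm loss: L = 0.5 * ||X - WH||_F^2
    loss = 0.5 * np.linalg.norm(X - W @ H, ord='fro') ** 2
    print(loss)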
X ≈ WH. If dim(X) = n x m, then dim(W) = n x p and dim(H) = p x m, with p equal to the number of requested components (so p x m < n x m).
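These shapes can be verified directly in scikit-learn (a small sketch; it reuses the Iris data that appears later in these slides, with p = 2 chosen arbitrarily):

    from sklearn.datasets import load_iris
    from sklearn.decomposition import NMF

    X = load_iris().data                      # dim(X) = n x m = 150 x 4
    nmf = NMF(n_components=2, init='random', random_state=0)
    W = nmf.fit_transform(X)                  # dim(W) = n x p = 150 x 2
    H = nmf.components_                       # dim(H) = p x m = 2 x 4
    print(W.shape, H.shape)                   # (150, 2) (2, 4)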
NMF (Non-negative Matrix Factorization) is a matrix factorization method in which all the matrices involved are required to be non-negative.
Suppose we factorize a matrix X into two matrices W and H so that X ≈ WH. After factorization, there is no guarantee that we can recover the original matrix X exactly; it is approximated as well as possible.
Now, suppose that X is composed of m rows x1, x2, …, xm, W is composed of m rows w1, w2, …, wm, and H is composed of k rows h1, h2, …, hk. Each row in X can be considered a data point. For example, when decomposing images, each row in X is a single image, and each column represents some feature.
For the ith row in X, i.e. xi, the equation becomes:

    xi ≈ wi H = Σj wij hj

Here, xi is the weighted sum of some components, where each row hj in H is a component, and row wi in W contains the weights of each component.
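This row-wise view is easy to check numerically. The sketch below (the variable names and random_state are my own choices) fits the model and confirms that row i of the reconstruction equals the weighted sum of the rows of H:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.decomposition import NMF

    X = load_iris().data
    nmf = NMF(n_components=3, init='random', random_state=0)
    W = nmf.fit_transform(X)
    H = nmf.components_

    # x_i is approximated by sum_j W[i, j] * H[j], i.e. row i of W @ H
    i = 0
    xi_hat = sum(W[i, j] * H[j] for j in range(H.shape[0]))
    print(np.allclose(xi_hat, (W @ H)[i]))    # True
    print(X[i], xi_hat)                       # original row vs. approximation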
The final reconstruction is purely additive, and it is particularly efficient for images or text, where there are normally no negative elements.

    >>> from sklearn.datasets import load_iris
    >>> from sklearn.decomposition import NMF
    >>> iris = load_iris()
    >>> iris.data.shape
    (150, 4)
    >>> nmf = NMF(n_components=3, init='random', l1_ratio=0.1)
    >>> Xt = nmf.fit_transform(iris.data)
    >>> nmf.reconstruction_err_
    1.8819327624141866
    >>> iris.data[0:4]
    array([[5.1, 3.5, 1.4, 0.2],
           [4.9, 3. , 1.4, 0.2],
           [4.7, 3.2, 1.3, 0.2],
           [4.6, 3.1, 1.5, 0.2]])
    >>> Xt[0:4]
    array([[1.80836871, 0.12714934, 0.34218569],
           [1.55297544, 0.        , 0.53735756],
           [1.65309356, 0.11752503, 0.32910144],
           [1.56050028, 0.17917708, 0.40195966]])
    >>> nmf.inverse_transform(Xt[0:4])  # reconstruction
    array([[5.10179755, 3.49786954, 1.39825837, 0.20234533],
           [4.85168299, 3.05106855, 1.45919973, 0.15445694],
           [4.69663952, 3.20306782, 1.30575459, 0.19071325],
           [4.62798492, 3.07159854, 1.45974132, 0.26207716]])

nmf.reconstruction_err_ is the Frobenius norm of the matrix difference between the training data X and the reconstructed data WH from the fitted model.
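As a sanity check (a short sketch; the exact numbers depend on the random initialization), reconstruction_err_ can be recomputed by hand from the fitted factors:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.decomposition import NMF

    X = load_iris().data
    nmf = NMF(n_components=3, init='random', l1_ratio=0.1, random_state=0)
    Xt = nmf.fit_transform(X)
    WH = nmf.inverse_transform(Xt)            # reconstructed data, W @ H

    # Frobenius norm of the residual matches the stored reconstruction error
    print(np.linalg.norm(X - WH, ord='fro'))
    print(nmf.reconstruction_err_)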
NNMF, together with other factorization methods, will be very useful for more advanced techniques, such as topic modeling and recommendation systems.
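As a pointer toward those applications, here is a minimal topic-modeling sketch (the tiny corpus, parameter values, and variable names are invented for illustration). TF-IDF produces a non-negative document-term matrix, so NMF can factor it directly into topics:

    from sklearn.decomposition import NMF
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Tiny made-up corpus with two rough themes: pets and finance
    docs = [
        "the cat sat on the mat",
        "dogs and cats are popular pets",
        "stock markets rose sharply today",
        "investors traded stocks and bonds",
    ]

    # TF-IDF weights are non-negative, so X ~ WH applies directly:
    # rows of H are topics, rows of W are per-document topic weights
    tfidf = TfidfVectorizer(stop_words='english')
    X = tfidf.fit_transform(docs)
    nmf = NMF(n_components=2, init='nndsvd', random_state=0)
    W = nmf.fit_transform(X)

    terms = tfidf.get_feature_names_out()
    for t, topic in enumerate(nmf.components_):
        top = topic.argsort()[-3:][::-1]      # three highest-weight terms
        print(f"topic {t}:", [terms[i] for i in top])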