Advanced Feature Selection Techniques in Data Analysis

Explore the intricacies of feature selection with Professor V.B. More at MET's IOE in BKC, Nashik. Dive into topics like creating training and test sets, managing categorical data, handling missing features, scaling and normalizing data, and utilizing techniques like Sparse PCA and Kernel PCA for improved analysis and interpretation of data.

  • Feature Selection
  • Data Analysis
  • PCA Techniques
  • Nashik
  • MET


Presentation Transcript


  1. Feature Selection. Prof. V. B. More, MET's IOE, BKC, Nashik

  2. Unit 2: Feature Selection. Scikit-learn datasets, creating training and test sets, managing categorical data, managing missing features, data scaling and normalization, feature selection and filtering, Principal Component Analysis (PCA), non-negative matrix factorization, Sparse PCA, Kernel PCA, atom extraction and dictionary learning.

  3. Sparse PCA

  4. Sparse PCA. Sparse PCA exploits the natural sparsity of the data while extracting principal components. Standard PCA selects only the features that are most important on average, assuming that every sample can be rebuilt using the same set of components.

  5. Sparse PCA. In standard PCA this is equivalent to writing every sample as x = c1·v1 + c2·v2 + ... + ck·vk, with the same components v1, ..., vk for all samples. In Sparse PCA, each sample can be rebuilt using its own specific components, which may include elements normally discarded by a dense PCA, so the expression becomes a sparse combination in which most of the coefficients are zero.

  6. Sparse PCA. PCA suffers from the fact that each principal component is a linear combination of all the original variables, so the results are often difficult to interpret. Sparse PCA uses the lasso (elastic net) penalty to produce modified principal components with sparse loadings.
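
  A minimal sketch of Sparse PCA in scikit-learn (the n_components and alpha values below are illustrative assumptions, not values from the slides); a larger alpha gives sparser loadings:
  from sklearn.datasets import load_digits
  from sklearn.decomposition import SparsePCA
  >>> digits = load_digits()
  # alpha controls the L1 penalty: larger values give sparser loadings
  >>> spca = SparsePCA(n_components=10, alpha=1.0)
  >>> X_spca = spca.fit_transform(digits.data)
  # Many loadings are exactly zero, unlike dense PCA components
  >>> print((spca.components_ == 0).sum(), "zero loadings out of", spca.components_.size)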

  7. Kernel PCA

  8. Kernel PCA. Kernel PCA (KPCA) performs PCA on datasets that are not linearly separable. The idea is to project each sample into a space where the dataset becomes linearly separable. The axes of this space correspond to the first, second, and subsequent principal components, and the kernel PCA algorithm computes the projection of the samples onto each of them.

  9. Kernel PCA. Consider a dataset made up of a circle with a blob inside:
  from sklearn.datasets import make_circles
  >>> Xb, Yb = make_circles(n_samples=500, factor=0.1, noise=0.05)
  The graphical representation is shown in the following picture.

  10. Kernel PCA. The classic PCA approach is not able to capture the non-linear dependency between the existing components. However, looking at the samples in polar coordinates, the two sets are easy to separate by considering only the radius: classic PCA works in Cartesian coordinates, where the non-linear behaviour cannot be separated, whereas in polar coordinates the radius alone separates the two sets.
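
  A small sketch of the polar-coordinate argument, reusing Xb and Yb from the make_circles call above: the radius alone separates the inner blob from the outer circle.
  import numpy as np
  # Radius of each sample (distance from the origin)
  >>> radius = np.sqrt(Xb[:, 0]**2 + Xb[:, 1]**2)
  # The two classes have clearly different mean radii, so a single threshold separates them
  >>> print(radius[Yb == 0].mean(), radius[Yb == 1].mean())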

  11. Kernel PCA (figure: plot of the circles dataset)

  12. Kernel PCA. It is possible to investigate the behaviour of PCA with a radial basis function (RBF) kernel. The default value of gamma is 1.0 divided by the number of features. Technically, the gamma parameter is inversely proportional to the variance of the RBF (Gaussian) kernel, which is used as a similarity measure between two points: a small gamma value defines a Gaussian function with a large variance.
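
  A minimal sketch of the RBF similarity itself (plain NumPy, with illustrative points and gamma values), showing how gamma controls how quickly the similarity decays with distance:
  import numpy as np
  def rbf_similarity(a, b, gamma):
      # RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)
      return np.exp(-gamma * np.sum((a - b) ** 2))
  >>> a, b = np.array([0.0, 0.0]), np.array([1.0, 1.0])
  # Small gamma (large variance): distant points still look similar
  >>> print(rbf_similarity(a, b, gamma=0.1))
  # Large gamma (small variance): similarity drops quickly with distance
  >>> print(rbf_similarity(a, b, gamma=10.0))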

  13. Kernel PCA.
  from sklearn.decomposition import KernelPCA
  >>> kpca = KernelPCA(n_components=2, kernel='rbf', fit_inverse_transform=True, gamma=1.0)
  >>> X_kpca = kpca.fit_transform(Xb)
  The instance variable X_transformed_fit_ will contain the projection of our dataset into the new space. Plotting it, we get:

  14. Kernel PCA (figure: projection of the dataset onto the kernel principal components)

  15. Kernel PCA. Kernel PCA is a powerful instrument when we can think of our dataset as made up of elements that are functions of radial basis functions or polynomials, but we are not able to determine a linear relationship among them.

  16. Atom Extraction and Dictionary Learning

  17. Atom Extraction and Dictionary Learning. Dictionary learning is a technique that allows rebuilding a sample starting from a sparse dictionary of atoms (similar to principal components). Let X be an input dataset; the target is to find both a dictionary D and a set of weights (one vector per sample) so that each sample can be rebuilt from the dictionary atoms.

  18. Atom Extraction and Dictionary Learning. Sparse dictionary learning is a representation learning method which aims at finding a sparse representation of the input data (also known as sparse coding) in the form of a linear combination of basic elements, as well as finding those basic elements themselves. These elements are called atoms, and they compose a dictionary.

  19. Atom Extraction and Dictionary Learning. After the training process, an input vector can be computed as x = D·alpha, where alpha is the sparse weight vector for that sample. The optimization problem (which involves both D and the alpha vectors) can be expressed as the minimization of a loss function combining the reconstruction error with an L1 norm on the weights, for example (1/2)·||X - D·alpha||^2 + c·||alpha||_1, where the L1 term enforces sparsity.
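
  A plain-NumPy sketch of this formulation (the matrices here are small random stand-ins, purely for illustration): the sample is rebuilt as a linear combination of atoms, and the loss combines the reconstruction error with an L1 penalty on the weights.
  import numpy as np
  rng = np.random.default_rng(0)
  # Illustrative shapes: a dictionary of 10 atoms of dimension 64
  D = rng.normal(size=(10, 64))         # one atom per row
  alpha = rng.normal(size=(1, 10))      # weights (code) for one sample
  alpha[:, 5:] = 0.0                    # most weights are zero (sparsity)
  x = rng.normal(size=(1, 64))          # the sample to be rebuilt
  x_hat = alpha @ D                     # reconstruction of x as alpha * D
  c = 0.1                               # strength of the L1 penalty
  loss = 0.5 * np.sum((x - x_hat) ** 2) + c * np.sum(np.abs(alpha))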

  20. Atom Extraction and Dictionary Learning. The scikit-learn class DictionaryLearning is applied to the digits dataset to learn a chosen number of atoms (here n_components=36):
  from sklearn.decomposition import DictionaryLearning
  from sklearn.datasets import load_digits
  >>> digits = load_digits()
  >>> dl = DictionaryLearning(n_components=36, fit_algorithm='lars', transform_algorithm='lasso_lars')
  >>> X_dict = dl.fit_transform(digits.data)
  A plot of each atom (component) is shown in the following figure:
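
  A small sketch of how such a plot could be produced, reusing dl from the code above; the 6x6 grid and the 8x8 reshape match the 36 atoms and the digits image size, but the plotting details are an illustrative assumption:
  import matplotlib.pyplot as plt
  >>> fig, axes = plt.subplots(6, 6, figsize=(8, 8))
  >>> for i, ax in enumerate(axes.ravel()):
  ...     ax.imshow(dl.components_[i].reshape(8, 8), cmap='gray')
  ...     ax.axis('off')
  >>> plt.show()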

  21. Atom Extraction and Dictionary Learning (figure: plot of the learned atoms)
