Understanding Principal Component Analysis: Key Concepts and Applications


Explore the underlying concepts and practical applications of Principal Component Analysis (PCA) in data processing, feature extraction, and dimensionality reduction. Learn how PCA helps in efficiently representing data and identifying patterns for tasks like classification, visualization, and noise reduction.

  • PCA
  • Data Analysis
  • Dimensionality Reduction
  • Feature Extraction
  • Data Applications




Presentation Transcript


  1. Principal Component Analysis: Feature extraction and representation

  2. Principal Component Analysis Abstract: In the present big-data era, there is a need to process large amounts of unlabeled data and find patterns in it for further use: unimportant features should be discarded, keeping only the representations that are needed. High-dimensional data can be converted to low-dimensional data using different techniques; this dimensionality reduction makes tasks such as classification, visualization, communication, and storage much easier. The loss of information should be small when mapping data from the high-dimensional space to the low-dimensional space.

  3. Principal Component Analysis (PCA) Ideas: Does the data set span the whole of d-dimensional space? For a matrix of m samples x n genes, create a new covariance matrix of size n x n. Transform the large number of variables into a smaller number of uncorrelated variables called principal components (PCs), chosen to capture as much of the variation in the data as possible.

  4. Example Applications: face recognition, image compression, gene expression analysis, data reduction, data classification, trend analysis, factor analysis, noise reduction.

  5. Principal Component Analysis [Figure: scatter plot of data points with two axes drawn through them; Y1 is the first eigenvector and Y2 is the second. Y2 is ignorable. Key observation: the variance along Y1 is largest.]

  6. Principal Component Analysis, one attribute first. Temperature samples: 42, 40, 24, 30, 15, 18, 15, 30, 15, 30, 35, 30, 40, 30. Question: how much spread is in the data along the axis (distance to the mean)? Variance is the standard deviation squared:

     $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
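     To make the formula concrete, here is a minimal NumPy sketch (not part of the original slides) that computes the sample variance of the temperature values above, both by the formula and with NumPy's built-in:

         import numpy as np

         # Temperature samples from the slide (one attribute)
         temps = np.array([42, 40, 24, 30, 15, 18, 15, 30, 15, 30, 35, 30, 40, 30], dtype=float)

         # Sample variance: s^2 = sum((X_i - mean)^2) / (n - 1)
         mean = temps.mean()
         s2 = ((temps - mean) ** 2).sum() / (len(temps) - 1)

         print(s2)                  # manual computation
         print(temps.var(ddof=1))   # NumPy equivalent; ddof=1 gives the n - 1 denominator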

  7. Now consider two dimensions, X = Temperature and Y = Humidity.

     X (Temperature): 40, 40, 40, 30, 15, 15, 15, 30, 15, 30, 30, 30, 40, 30
     Y (Humidity):    90, 90, 90, 90, 70, 70, 70, 90, 70, 70, 70, 90, 70, 90

     Covariance measures how X and Y vary together: cov(X, Y) = 0 means the variables are uncorrelated; cov(X, Y) > 0 means they move in the same direction; cov(X, Y) < 0 means they move in opposite directions.

     $\mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
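     As an illustration (a sketch, not part of the original deck), the covariance of the paired temperature/humidity samples above can be computed directly from the formula:

         import numpy as np

         # Paired samples from the slide: X = Temperature, Y = Humidity
         temp  = np.array([40, 40, 40, 30, 15, 15, 15, 30, 15, 30, 30, 30, 40, 30], dtype=float)
         humid = np.array([90, 90, 90, 90, 70, 70, 70, 90, 70, 70, 70, 90, 70, 90], dtype=float)

         # cov(X, Y) = sum((X_i - mean_X) * (Y_i - mean_Y)) / (n - 1)
         cov_xy = ((temp - temp.mean()) * (humid - humid.mean())).sum() / (len(temp) - 1)
         print(cov_xy)  # positive, so the two attributes move in the same direction here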

  8. More than two attributes: covariance matrix. The covariance matrix contains the covariance values between all possible pairs of dimensions (= attributes):

     $C_{n \times n} = (c_{ij}), \qquad c_{ij} = \mathrm{cov}(\mathrm{Dim}_i, \mathrm{Dim}_j)$

     Example for three attributes (x, y, z):

     $C = \begin{pmatrix} \mathrm{cov}(x,x) & \mathrm{cov}(x,y) & \mathrm{cov}(x,z) \\ \mathrm{cov}(y,x) & \mathrm{cov}(y,y) & \mathrm{cov}(y,z) \\ \mathrm{cov}(z,x) & \mathrm{cov}(z,y) & \mathrm{cov}(z,z) \end{pmatrix}$
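     In NumPy the full covariance matrix can be obtained in one call. The sketch below uses randomly generated stand-in data for the three attributes, since the slide gives no numbers:

         import numpy as np

         # Stand-in data: 100 samples of three attributes (x, y, z)
         data = np.random.default_rng(0).normal(size=(100, 3))

         # rowvar=False tells np.cov that rows are samples and columns are attributes
         C = np.cov(data, rowvar=False)  # 3 x 3; C[i, j] = cov(dim_i, dim_j)
         print(C)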

  9. What is Principal Component Analysis? Principal components are the directions where there is the most variance, the directions where the data is most spread out.

  10. To find the direction where there is the most variance, find the straight line along which the data is most spread out when projected onto it. A vertical straight line with the points projected onto it will look like this: [Figure: data points projected onto a vertical line, showing little spread.]

  11. [Figure: the same points projected onto a horizontal line.] On this line the data is far more spread out: it has a large variance. In fact, there isn't a straight line you can draw that has a larger variance than the horizontal one. The horizontal line is therefore the principal component in this example.

  12. Eigenvalues & Eigenvectors. Vectors x having the same direction as Ax are called eigenvectors of A (where A is an n by n matrix). In the equation $Ax = \lambda x$, $\lambda$ is called an eigenvalue of A. Example:

     $\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \begin{pmatrix} 3 \\ 2 \end{pmatrix}$
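     The slide's numeric example can be checked directly; this small sketch verifies that (3, 2) is an eigenvector of the given matrix with eigenvalue 4:

         import numpy as np

         A = np.array([[2.0, 3.0],
                       [2.0, 1.0]])
         x = np.array([3.0, 2.0])

         print(A @ x)                       # [12. 8.], which equals 4 * [3. 2.]
         eigvals, eigvecs = np.linalg.eig(A)
         print(eigvals)                     # the eigenvalues 4 and -1 (order may vary)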

  13. Principal components: The first principal component (PC1) corresponds to the eigenvalue with the largest absolute value; the data have the largest variance along its eigenvector, the direction along which there is the greatest variation. The second principal component (PC2) is the direction with the maximum variation left in the data, orthogonal to PC1. In general, only a few directions manage to capture most of the variability in the data.

  14. Steps of PCA:
     • Let $\bar{X}$ be the mean vector (taking the mean of all rows of X).
     • Adjust the original data by the mean: $X' = X - \bar{X}$.
     • Compute the covariance matrix C of the adjusted data $X'$.
     • Find the eigenvectors and eigenvalues of C.
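     Put together, these steps translate into a short NumPy sketch (a minimal illustration, assuming data arranged with samples in rows and attributes in columns):

         import numpy as np

         def pca(X):
             """Follow the slide's steps: mean-adjust, covariance, eigendecomposition."""
             mean = X.mean(axis=0)                 # mean vector over all rows
             X_adj = X - mean                      # adjust the original data by the mean
             C = np.cov(X_adj, rowvar=False)       # covariance matrix of the adjusted data
             eigvals, eigvecs = np.linalg.eigh(C)  # eigh, since C is symmetric
             order = np.argsort(eigvals)[::-1]     # largest eigenvalue first
             return eigvals[order], eigvecs[:, order], mean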

  15. Eigenvalues: Calculate the eigenvalues $\lambda_j$ and eigenvectors $x_j$ of the covariance matrix. The eigenvalues $\lambda_j$ are used to calculate the percentage of total variance $V_j$ captured by each component j:

     $V_j = 100 \cdot \frac{\lambda_j}{\sum_{x=1}^{n} \lambda_x}$
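     For example (using made-up eigenvalues, since the slide gives none), the percentages follow directly from the formula:

         import numpy as np

         eigvals = np.array([4.5, 2.1, 0.9, 0.3, 0.2])  # hypothetical eigenvalues for illustration
         V = 100 * eigvals / eigvals.sum()              # V_j = 100 * lambda_j / sum of all lambda_x
         print(V)                                       # PC1 alone captures 56.25% of the variance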

  16. Principal components - Variance [Figure: bar chart of variance (%) explained by each component, PC1 through PC10.]

  17. Transformed Data: Each eigenvalue $\lambda_j$ corresponds to the variance along component j, so sort by $\lambda_j$ and take the first p eigenvectors $e_i$, where p is the number of top eigenvalues. These are the directions with the largest variances:

     $\begin{pmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{ip} \end{pmatrix} = \begin{pmatrix} e_1^{T} \\ e_2^{T} \\ \vdots \\ e_p^{T} \end{pmatrix} \begin{pmatrix} x_{i1} - \bar{x}_1 \\ x_{i2} - \bar{x}_2 \\ \vdots \\ x_{in} - \bar{x}_n \end{pmatrix}$
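     The projection can be written compactly in NumPy. The sketch below (with stand-in random data) keeps the top p = 2 eigenvectors and maps every sample into the reduced space:

         import numpy as np

         rng = np.random.default_rng(0)
         X = rng.normal(size=(100, 5))            # stand-in data: 100 samples, 5 attributes

         mean = X.mean(axis=0)
         eigvals, eigvecs = np.linalg.eigh(np.cov(X - mean, rowvar=False))
         order = np.argsort(eigvals)[::-1]
         E = eigvecs[:, order[:2]]                # top p = 2 eigenvectors, one per column

         Y = (X - mean) @ E                       # y_i = E^T (x_i - mean) for every sample
         print(Y.shape)                           # (100, 2): the reduced representation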

  18. PCA → Original Data: Retrieving the old data (e.g., in data compression):

     RetrievedRowData = (RowFeatureVector^T x FinalData) + OriginalMean

     This yields the original data using the chosen components.
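     Continuing the projection sketch above (reusing its Y, E, and mean), the retrieval is the inverse mapping; the result is only approximate because the discarded components are lost:

         X_retrieved = Y @ E.T + mean             # (RowFeatureVector^T x FinalData) + OriginalMean
         print(np.abs(X - X_retrieved).max())     # residual error from the dropped components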

  19. References:
     • https://pdfs.semanticscholar.org/6e72/2fafa3b1191f7c779ddf09b64eda01c94a28.pdf
     • https://en.wikipedia.org/wiki/Principal_component_analysis
     • https://georgemdallas.wordpress.com/2013/10/30/principal-component-analysis-4-dummies-eigenvectors-eigenvalues-and-dimension-reduction/
     • https://www.slideshare.net/CvilleDataScience/az-tecpca-datasciencemeetupkscott20140218?next_slideshow=2

  20. THANK YOU
