
Hodgepodge CSE 312 Spring 21 Lecture 27
Announcements
- Monday is a holiday; we're listing changed office hours on a pinned Ed post.
- Remember to find groups for the final (unless you want to work alone, of course). An Ed post is up for this too; consider filling it out if you're a group of two and want a third person.
- We've made it through the core content! Today we're revisiting some old topics. Wednesday is an application lecture (probability and algorithms). Friday will be a victory lap (wrap up the course, put it in the context of what comes next, answer lingering questions).
- Concept checks for this week are due Tuesday (because of the holiday).
Today
We'll cover a topic or two that you got a small taste of earlier, but that show up much more frequently in ML:
- Random vectors
- More on covariance
- Multidimensional Gaussians
- More on conditioning
Preliminary: Random Vectors
In ML, our data points are often multidimensional. For example:
- To predict housing prices, each data point might have: number of rooms, number of bathrooms, square footage, zip code, year built, ...
- To make movie recommendations, each data point might have: ratings of existing movies, whether you started a movie and stopped after 10 minutes, ...
A single data point is a full vector.
Preliminary: Random Vector
A random vector X is a vector where each entry is a random variable. E[X] is a vector where each entry is the expectation of the corresponding entry of X. For example, if X is drawn uniformly from the sample space {(0, 1), (2, 3), (4, 6)}, then E[X] = (2, 10/3).
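The entrywise definition of E[X] is easy to compute directly; a minimal sketch in Python, using exact arithmetic via `Fraction` (the three outcomes below are an illustrative uniform sample space):

```python
from fractions import Fraction

def expectation(outcomes):
    """Expectation of a random vector uniform over `outcomes`.

    E[X] is taken entrywise: average each coordinate over the outcomes.
    """
    n = len(outcomes)
    dim = len(outcomes[0])
    return tuple(Fraction(sum(o[i] for o in outcomes), n) for i in range(dim))

print(expectation([(0, 1), (2, 3), (4, 6)]))  # (Fraction(2, 1), Fraction(10, 3))
```

Each coordinate is averaged independently: (0 + 2 + 4)/3 = 2 and (1 + 3 + 6)/3 = 10/3.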
Covariance Matrix
Remember covariance? Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y].
We'll want to talk about covariance between entries of a random vector, so define the covariance matrix Σ, whose (i, j) entry is Cov(Xi, Xj):
Σ = [ Cov(X1, X1) ... Cov(X1, Xn)
      ...
      Cov(Xn, X1) ... Cov(Xn, Xn) ]
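The matrix can be computed straight from the Cov(Xi, Xj) = E[Xi Xj] − E[Xi]E[Xj] form. A pure-Python sketch, treating the list of points as a uniform sample space (so dividing by n is exact, not a sample estimate):

```python
def cov_matrix(points):
    """Covariance matrix of a random vector uniform over `points`:
    Sigma[i][j] = Cov(X_i, X_j) = E[X_i X_j] - E[X_i] E[X_j]."""
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    return [[sum(p[i] * p[j] for p in points) / n - mean[i] * mean[j]
             for j in range(d)]
            for i in range(d)]

# Diagonal entries are the variances; the matrix is symmetric.
sigma = cov_matrix([(0, 1), (2, 3), (4, 6)])
```

Note that Σ is always symmetric (Cov(Xi, Xj) = Cov(Xj, Xi)), and its diagonal holds the variances Cov(Xi, Xi) = Var(Xi).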
Covariance
Let's think about 2 dimensions. Let X = (X1, X2), where each Xi ~ N(0, 1) and X1 and X2 are independent. What is Σ? Which of these pictures shows 200 i.i.d. samples of X?

Σ = [ 1  0
      0  1 ]
Unequal Variances, Still Independent
Let's think about 2 dimensions. Let X = (X1, X2), where X1 ~ N(0, 5), X2 ~ N(0, 1), and X1 and X2 are independent. What is Σ? Which of these pictures shows i.i.d. samples of X?

Σ = [ 5  0
      0  1 ]
What about dependence?
When we introduce dependence, we need to know the mean vector and the covariance matrix to define the distribution (instead of just the mean and the variance). Let's see a few examples.
Dependence
Let's think about 2 dimensions. Let X = (X1, X2), where Var(X1) = 1 and Var(X2) = 1, BUT X1 and X2 are dependent, with Cov(X1, X2) = .5. What is Σ? Which of these pictures shows i.i.d. samples of X?

Σ = [ 1   .5
      .5  1 ]
Dependence
Let's think about 2 dimensions. Let X = (X1, X2), where Var(X1) = 1 and Var(X2) = 1, BUT X1 and X2 are dependent, with Cov(X1, X2) = .3. What is Σ? Which of these pictures shows i.i.d. samples of X?

Σ = [ 1   .3
      .3  1 ]
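To generate such a correlated pair with unit variances and a target covariance ρ, one standard trick (a 2×2 Cholesky factorization, in effect) is X1 = Z1 and X2 = ρZ1 + sqrt(1 − ρ²)Z2 for independent standard normals Z1, Z2. A sketch using ρ = 0.5:

```python
import math
import random

random.seed(1)

def correlated_pair(rho):
    """One draw of (X1, X2): unit variances, Cov(X1, X2) = rho."""
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    return z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2

n, rho = 20000, 0.5
samples = [correlated_pair(rho) for _ in range(n)]
# Empirical covariance should be close to rho.
cov12 = (sum(x1 * x2 for x1, x2 in samples) / n
         - (sum(x1 for x1, _ in samples) / n) * (sum(x2 for _, x2 in samples) / n))
```

The trick works because Cov(Z1, ρZ1 + sqrt(1 − ρ²)Z2) = ρ·Var(Z1) = ρ, while Var(X2) = ρ² + (1 − ρ²) = 1.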
Using the Covariance Matrix
What were those ellipses in those datasets? How do we know how many standard deviations from the mean a 2D point is, in the independent, variance-1 case? Well, (x1 − E[X1]) is the distance from x to the center in the x-direction, and (x2 − E[X2]) is the distance from x to the center in the y-direction. So the number of standard deviations is
sqrt((x1 − E[X1])^2 + (x2 − E[X2])^2).
That's just the distance! In general, the major/minor axes of those ellipses are the eigenvectors of the covariance matrix, and the associated eigenvalues tell you how the directions should be weighted.
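For a 2×2 symmetric covariance matrix there is a closed form for the eigenvalues and eigenvectors, so we can find the ellipse axes directly. A sketch (the matrix entries at the bottom are illustrative):

```python
import math

def eig_sym2(a, b, c):
    """Eigenvalues (descending) and top eigenvector of the symmetric matrix [[a, b], [b, c]]."""
    m = (a + c) / 2                  # mean of the two eigenvalues
    r = math.hypot((a - c) / 2, b)   # half the gap between them
    lam1, lam2 = m + r, m - r
    if b != 0:
        v = (b, lam1 - a)            # solves (A - lam1*I) v = 0
    else:                            # already diagonal: axis-aligned eigenvectors
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(*v)
    return lam1, lam2, (v[0] / norm, v[1] / norm)

# Covariance [[1, .5], [.5, 1]]: the major axis points along (1, 1)/sqrt(2).
lam1, lam2, v = eig_sym2(1.0, 0.5, 1.0)
```

Here lam1 = 1.5 and lam2 = 0.5: the ellipse is stretched along the diagonal direction (1, 1) and squeezed along (1, −1), matching the tilted scatter of a positively correlated pair.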
Probability and ML
You're going to do a lot of conditional expectations, so let's talk about why. Many problems in ML look like this: given a bunch of data points, you find a function f that you hope will predict future points well. We usually assume there is some true distribution D of data points (e.g., all theoretically possible houses and their prices). You get a dataset S that you assume was sampled from D, and you use S to find f_S. f_S is a lot like an MLE: it depends on the data, so before you knew what S was, f_S was a random variable. You then want to figure out the true error, i.e., how f_S would perform if you knew D.
Probability and ML
But D is a theoretical construct. What can we do instead? Get a second dataset T drawn from D, independently of S (or, in practice, set aside part of your dataset before you start). Then E_T[error of f_S on T] = error of f_S on D. But how confident can you be? You'll make confidence intervals (statements like "the true error is within 5% of our estimate with probability at least .9") using concentration inequalities.
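One common way to get such an interval: if the per-example error is bounded in [0, 1], Hoeffding's inequality says the estimate is within sqrt(ln(2/δ) / (2n)) of the true error with probability at least 1 − δ. A sketch of the interval half-width (Hoeffding is one choice of concentration inequality here, not the only one):

```python
import math

def hoeffding_radius(n, delta):
    """Half-width of a (1 - delta)-confidence interval for the mean of n values in [0, 1].

    From Hoeffding: P(|estimate - truth| >= eps) <= 2 * exp(-2 * n * eps**2);
    setting the right side to delta and solving for eps gives this radius.
    """
    return math.sqrt(math.log(2 / delta) / (2 * n))

# With 1000 held-out examples and 90% confidence, the estimate is within about 0.039.
radius = hoeffding_radius(1000, 0.1)
```

Note the sqrt(1/n) scaling: to halve the interval width you need four times as much held-out data.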
Practice with conditional expectations
Consider the following process:
- Flip a fair coin; if it's heads, pick up a 4-sided die, and if it's tails, pick up a 6-sided die (both fair).
- Roll that die independently 3 times. Let D1, D2, D3 be the results of the three rolls.
What is E[D2]? E[D2 | D1 = 5]? E[D2 | D3 = 1]?
Using conditional expectations
Let A be the event "the four-sided die was chosen."
E[D2] = P(A) E[D2 | A] + P(A^c) E[D2 | A^c] = (1/2)(2.5) + (1/2)(3.5) = 3.
E[D2 | D1 = 5]: the event D1 = 5 tells us we're using the 6-sided die, so E[D2 | D1 = 5] = 3.5.
E[D2 | D3 = 1]: we aren't sure which die we got, but is it still 50/50?
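The first two answers are easy to verify with exact arithmetic over the coin flip and the die faces:

```python
from fractions import Fraction

def die_mean(sides):
    """E[one roll of a fair die with faces 1..sides]."""
    return Fraction(sum(range(1, sides + 1)), sides)

# E[D2] by the law of total expectation over the coin flip:
e_d2 = Fraction(1, 2) * die_mean(4) + Fraction(1, 2) * die_mean(6)

# D1 = 5 rules out the 4-sided die, so E[D2 | D1 = 5] is just the 6-sided mean:
e_d2_given_d1_5 = die_mean(6)
```

Here die_mean(4) = 5/2 and die_mean(6) = 7/2, giving e_d2 = 3 and e_d2_given_d1_5 = 7/2.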
Setup
Let B be the event D3 = 1. By the LTP,
P(B) = (1/2)(1/4) + (1/2)(1/6) = 5/24.
By Bayes' rule,
P(A | B) = P(B | A) P(A) / P(B) = (1/4)(1/2) / (5/24) = 3/5
P(A^c | B) = P(B | A^c) P(A^c) / P(B) = (1/6)(1/2) / (5/24) = 2/5
(we could also get this with the LTP, but it's good confirmation).
Analysis
E[D2 | D3 = 1] = P(A | D3 = 1) E[D2 | D3 = 1, A] + P(A^c | D3 = 1) E[D2 | D3 = 1, A^c]
Wait, what? This is the LTE, applied in the space where we've conditioned on D3 = 1. Everything is conditioned on D3 = 1; beyond that conditioning, it's the LTE. Since the rolls are independent given the die, E[D2 | D3 = 1, A] = E[D2 | A] = 2.5 (and likewise 3.5 for the 6-sided die), so
E[D2 | D3 = 1] = (3/5)(2.5) + (2/5)(3.5) = 2.9.
A little lower than the unconditioned expectation of 3, because seeing a 1 has made it ever so slightly more probable that we're using the 4-sided die.
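The whole chain, Bayes' step included, can be checked exactly:

```python
from fractions import Fraction

half = Fraction(1, 2)

# P(D3 = 1), by the law of total probability over which die was chosen:
p_b = half * Fraction(1, 4) + half * Fraction(1, 6)

# Bayes: P(4-sided die | D3 = 1).
p_a_given_b = (Fraction(1, 4) * half) / p_b

# Conditional means of one roll of each die.
mean4, mean6 = Fraction(5, 2), Fraction(7, 2)

# LTE inside the conditioned space; rolls are independent given the die,
# so conditioning on D3 = 1 doesn't change E[D2 | die].
e_d2_given_b = p_a_given_b * mean4 + (1 - p_a_given_b) * mean6
```

This reproduces P(D3 = 1) = 5/24, P(A | D3 = 1) = 3/5, and E[D2 | D3 = 1] = 29/10 = 2.9.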