
Understanding VC Dimensions and Growth Functions in Machine Learning
Learn about VC dimensions, Vapnik-Chervonenkis theory, growth functions, and shatter functions in machine learning. Explore examples and the concept of shattered sets. Discover how to determine the size of a database needed for training samples.
VC-Dimensions: The Vapnik-Chervonenkis Dimension
Example: Consider a database consisting of the salary and age for a random sample of the adult population in the United States. We are interested in using the database to answer questions of the form: what fraction of the adult population in the US has age between 35 and 45 and salary between $50,000 and $70,000? [Figure: an axis-parallel rectangle in the age-salary plane, spanning ages 35 to 45 and salaries $50,000 to $70,000.]
How large does our database need to be?

Theorem (sample bound for finite classes, from last week): For any finite class H and distribution D, if a training sample S is drawn from D of size

n ≥ (1/ε)·[ln|H| + ln(1/δ)]

then with probability ≥ 1 − δ, every h ∈ H with err_D(h) ≥ ε has err_S(h) > 0; equivalently, every h ∈ H with err_S(h) = 0 has err_D(h) < ε.
There are N adults in the US, so there are at most N⁴ distinct rectangles, i.e. |H| ≤ N⁴. The bound gives

n ≥ (1/ε)·[ln N⁴ + ln(1/δ)],

which goes to infinity as N → ∞. By using the VC dimension, we will be able to achieve

n = O((1/ε)·(VCdim(H)·log(1/ε) + log(1/δ))).
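To make the gap concrete, here is a minimal numeric sketch comparing the two sample sizes; the values ε = 0.05, δ = 0.01 and N = 250,000,000 are assumed for illustration, and the VC-based bound is evaluated with its hidden constant taken as 1.

```python
import math

# Illustrative comparison of the two sample-size bounds. Assumed values:
eps, delta, N = 0.05, 0.01, 250_000_000   # accuracy, confidence, #US adults

# Finite-class bound: n >= (1/eps) * (ln|H| + ln(1/delta)), with |H| <= N^4,
# so the bound grows with the population size N.
n_finite = (1 / eps) * (4 * math.log(N) + math.log(1 / delta))

# VC-based bound (hidden constant taken as 1), with d = VCdim = 4 for
# axis-parallel rectangles -- independent of N.
d = 4
n_vc = (1 / eps) * (d * math.log(1 / eps) + math.log(1 / delta))

print(round(n_finite), round(n_vc))
```

The finite-class bound keeps growing with the population size, while the VC-based bound depends only on the geometry of the rectangle class.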
Definitions: Given a set S of examples and a concept class H, we say S is shattered by H if for every A ⊆ S there exists some h ∈ H that labels all examples in A as positive and all examples in S \ A as negative.

The VC-dimension of H is the size of the largest set S shattered by H, i.e. the maximal number d such that there exists a set of d points that is shattered by H.

Example: intervals of the real axis.
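The definition can be checked by brute force on small sets. In this sketch the concept class is intervals on the real line, represented as indicator functions; the endpoint grid and test points are arbitrary choices for illustration.

```python
# Brute-force check of the shattering definition. The concept class is
# intervals [a, b] on the real line, represented as indicator functions.
def interval(a, b):
    return lambda x: 1 if a <= x <= b else 0

def shatters(hypotheses, S):
    """S is shattered iff the hypotheses induce all 2^|S| labelings on S."""
    induced = {tuple(h(x) for x in S) for h in hypotheses}
    return len(induced) == 2 ** len(S)

H = [interval(a, b) for a in range(5) for b in range(a, 5)]
H.append(interval(1, 0))  # empty interval: labels everything negative

print(shatters(H, [1, 3]))     # two points are shattered -> True
print(shatters(H, [1, 2, 3]))  # (1, 0, 1) is unrealizable -> False
```

Two points can always be shattered by intervals, but no three points can, since the labeling "positive, negative, positive" needs a disconnected positive region; this is exactly why VCdim = 2 for intervals.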
Growth Function / Shatter Function: Given a set S of examples and a concept class H, define the restriction

H[S] = {h ∩ S : h ∈ H}.

For an integer n and class H, the growth function is

H[n] = max over |S| = n of |H[S]|.
Examples:
- Intervals of the real axis: VCdim = 2, H[n] = O(n²)
- Rectangles with axis-parallel edges: VCdim = 4, H[n] = O(n⁴)
- Union of 2 intervals of the real axis (divide an ordered set of numbers into two intervals): VCdim = 4, H[n] = O(n⁴)
- Convex polygons: VCdim = ∞, H[n] = 2ⁿ
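The first example above can be verified by enumeration; this sketch counts the distinct labelings intervals induce on n increasing points, matching H[n] = n(n+1)/2 + 1 = O(n²).

```python
# Count the distinct labelings intervals [a, b] induce on n increasing
# points: an interval marks a consecutive run of the points positive.
def interval_growth(n):
    labelings = set()
    for i in range(n + 1):
        for j in range(i, n + 1):
            labelings.add(tuple(1 if i <= k < j else 0 for k in range(n)))
    return len(labelings)

for n in range(1, 6):
    print(n, interval_growth(n))  # n(n+1)/2 + 1: quadratic, not 2^n
```

For n = 1 and n = 2 the count equals 2ⁿ (those sets are shattered); from n = 3 on it falls strictly below 2ⁿ, confirming VCdim = 2.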
Half-spaces in d dimensions: VCdim = d + 1.

Proof that VCdim ≥ d + 1: Let S consist of the d unit coordinate vectors plus the origin. Let A be any subset of S (assume A includes the origin; the other case is symmetric). Let the vector w have −1 in each coordinate corresponding to a unit vector not in A and 0 elsewhere, and take threshold t = −1/2. Then for each x ∈ A, w·x = 0 ≥ t, and for each x ∉ A, w·x = −1 < t, so the half-space {x : w·x ≥ t} labels exactly A positive. Hence S is shattered.
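The construction can be checked directly; this sketch (with d = 4 and an arbitrarily chosen subset) builds w and t for a subset A containing the origin and labels all of S.

```python
# Check of the lower-bound construction for half-spaces {x : w.x >= t},
# here with d = 4. S = the d unit coordinate vectors plus the origin.
d = 4
S = [[1 if j == i else 0 for j in range(d)] for i in range(d)] + [[0] * d]

def halfspace_labels(A_indices):
    """Labels of S under the half-space built for A = {e_i : i in A_indices}
    plus the origin: w_i = -1 for e_i not in A, else 0; threshold t = -1/2."""
    w = [0 if i in A_indices else -1 for i in range(d)]
    t = -0.5
    return [int(sum(wi * xi for wi, xi in zip(w, x)) >= t) for x in S]

print(halfspace_labels({0, 2}))  # [1, 0, 1, 0, 1]: e_0, e_2, origin positive
```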
Proof that VCdim < d + 2:

Theorem (Radon): Any set S ⊆ ℝᵈ with |S| ≥ d + 2 can be partitioned into two disjoint subsets A and B such that convex(A) ∩ convex(B) ≠ ∅.

Consequently no set of d + 2 points can be shattered by half-spaces: a half-space containing all of A contains convex(A), and if it excluded all of B, its complementary half-space would contain convex(B); both half-spaces would then contain a common point of the two hulls, a contradiction.
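A small illustration of Radon's theorem for d = 2: among the four corners of the unit square, a scan over the 2+2 partitions finds the pair of diagonals, whose convex hulls (segments) intersect.

```python
from itertools import combinations

# Radon's theorem for d = 2: any 4 points in the plane split into two parts
# whose convex hulls intersect. For the unit square the diagonals cross.
def seg_intersect(p, q, r, s):
    """Do segments pq and rs properly cross? (orientation test; ignores
    degenerate collinear cases, which do not occur in this example)"""
    def orient(a, b, c):
        v = (b[0]-a[0])*(c[1]-a[1]) - (b[1]-a[1])*(c[0]-a[0])
        return (v > 0) - (v < 0)
    return orient(p, q, r) != orient(p, q, s) and orient(r, s, p) != orient(r, s, q)

pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
for (i, j) in combinations(range(4), 2):
    rest = [k for k in range(4) if k not in (i, j)]
    if seg_intersect(pts[i], pts[j], pts[rest[0]], pts[rest[1]]):
        print({i, j}, set(rest))  # the Radon partition found
        break
```

No half-space can label {(0,0), (1,1)} positive and {(1,0), (0,1)} negative, which is the labeling a shattering would require.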
Growth function sample bound: Recall that for any finite class H and distribution D, a training sample S drawn from D of size

n ≥ (1/ε)·[ln|H| + ln(1/δ)]

guarantees that with probability ≥ 1 − δ, every h ∈ H with err_D(h) ≥ ε has err_S(h) > 0; equivalently, every h with err_S(h) = 0 has err_D(h) < ε.

If we use the VC-dimension, we will instead show that

n ≥ (2/ε)·[log₂(2·H[2n]) + log₂(1/δ)]

suffices, and later we will see n = O((1/ε)·(d·log(1/ε) + log(1/δ))), where d = VCdim(H).
Sauer's Lemma: If VCdim(H) = d, then

H[n] ≤ Σᵢ₌₀ᵈ C(n, i),

where C(n, i) denotes the binomial coefficient.
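For intervals (VCdim d = 2) the lemma can be checked numerically; in fact intervals meet Sauer's bound with equality, since n(n+1)/2 + 1 = C(n,0) + C(n,1) + C(n,2).

```python
from math import comb

# Numeric check of Sauer's lemma for intervals on the real line: VCdim d = 2,
# and the growth function of intervals is H[n] = n(n+1)/2 + 1.
d = 2
for n in range(1, 10):
    growth = n * (n + 1) // 2 + 1                  # labelings by intervals
    sauer = sum(comb(n, i) for i in range(d + 1))  # sum_{i<=d} C(n, i)
    assert growth <= sauer
    print(n, growth, sauer)  # intervals meet Sauer's bound with equality
```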
Proof: Instead of |H[S]| ≤ Σᵢ₌₀ᵈ C(|S|, i), it is sufficient to prove a stronger claim: for any set S,

|H[S]| ≤ |{B ⊆ S : B is shattered by H}|.

(This suffices because every shattered B has |B| ≤ d, so the number of shattered subsets of S is at most Σᵢ₌₀ᵈ C(|S|, i).)
Proof (base case): Assume |S| = n; the proof is by induction on n. If n = 1, either H[S] = {(0), (1)} or H[S] consists of a single labeling, e.g. H[S] = {(0)}. The subsets of S shattered by H are {∅, {x₁}} in the first case and only {∅} in the second (the empty set is always shattered by a nonempty H). So |H[S]| = 2 matches two shattered subsets in the first example, and |H[S]| = 1 matches one in the second. Now assume the inequality holds for sets of size n − 1 and prove it for sets of size n.
Proof (induction step): Let S = {x₁, …, xₙ} and define

Y₀ = {(y₂, …, yₙ) : (0, y₂, …, yₙ) ∈ H[S] or (1, y₂, …, yₙ) ∈ H[S]}
Y₁ = {(y₂, …, yₙ) : (0, y₂, …, yₙ) ∈ H[S] and (1, y₂, …, yₙ) ∈ H[S]}

Then |Y₀| + |Y₁| = |H[S]|: every labeling in H[S] contributes once to Y₀, and the labelings extendable both ways on x₁ are counted a second time through Y₁.
Proof (continued): Define S′ = {x₂, …, xₙ}. Notice that Y₀ = H[S′]. Using the induction assumption:

|Y₀| = |H[S′]| ≤ |{B ⊆ S′ : B is shattered by H}| = |{B ⊆ S : x₁ ∉ B and B is shattered by H}|.

Define H′ ⊆ H by

H′ = {h ∈ H : ∃h′ ∈ H such that (1 − h′(x₁), h′(x₂), …, h′(xₙ)) = (h(x₁), h(x₂), …, h(xₙ))},

i.e. H′ contains the pairs of hypotheses that agree on S′ (on x₂, …, xₙ) and differ on x₁. It can be seen that H′ shatters B ⊆ S′ if and only if H′ shatters B ∪ {x₁}.
Proof (continued): Notice that Y₁ = H′[S′]. Using the induction assumption:

|Y₁| = |H′[S′]| ≤ |{B ⊆ S′ : B is shattered by H′}| = |{B ⊆ S′ : B ∪ {x₁} is shattered by H′}|
= |{B ⊆ S : x₁ ∈ B and B is shattered by H′}| ≤ |{B ⊆ S : x₁ ∈ B and B is shattered by H}|.

Finally:

|H[S]| = |Y₀| + |Y₁| ≤ |{B ⊆ S : x₁ ∉ B and B is shattered by H}| + |{B ⊆ S : x₁ ∈ B and B is shattered by H}| = |{B ⊆ S : B is shattered by H}|.
Lemma: Let H be a concept class over some domain X and let S and S′ be sets of n elements drawn from some distribution D on X, where n ≥ 8/ε. Let A be the event that there exists h ∈ H with err_S(h) = 0 but err_D(h) ≥ ε, and let B be the event that there exists h ∈ H with err_S(h) = 0 but err_{S′}(h) ≥ ε/2. Then P(A) ≤ 2·P(B).
Proof: Clearly P(B) ≥ P(A ∩ B) = P(A)·P(B | A). Let's bound P(B | A): fix some h ∈ H with err_S(h) = 0 and err_D(h) ≥ ε, and draw the set S′. Then E[err_{S′}(h)] = err_D(h) ≥ ε, and by a Chernoff bound, since n ≥ 8/ε, err_{S′}(h) ≥ ε/2 with probability at least 1/2. Hence P(B | A) ≥ 1/2 and P(A) ≤ 2·P(B).
Growth function sample bound: For any class H and distribution D, if a training sample S is drawn from D of size

n ≥ (2/ε)·[log₂(2·H[2n]) + log₂(1/δ)]

then with probability ≥ 1 − δ, every h ∈ H with err_D(h) ≥ ε has err_S(h) > 0; equivalently, every h with err_S(h) = 0 has err_D(h) < ε.
Proof: Consider a set S of size n drawn from distribution D. Let A denote the event that there exists h ∈ H with err_D(h) > ε but err_S(h) = 0; we will prove P(A) ≤ δ. Let B denote the event that there exists h ∈ H with err_{S′}(h) ≥ ε/2 but err_S(h) = 0, for a second sample S′ of size n. By the previous lemma, it is enough to prove P(B) ≤ δ/2.
Proof (continued): We will draw a set S″ of 2n points and partition it into two sets S, S′: randomly put the points of S″ into pairs (a₁, b₁), …, (aₙ, bₙ), and for each index i flip a fair coin; if heads, put aᵢ in S and bᵢ in S′ (if tails, the other way around). P(B) over the resulting S, S′ is identical to P(B) over S, S′ drawn directly from D.
Proof (continued): Fix some labeling h ∈ H[S″] and consider the probability, over these n fair coin flips, that h makes zero mistakes on S yet more than εn/2 mistakes on S′. There are three cases:
- If for some index i, h makes a mistake on both aᵢ and bᵢ, then P(err_S(h) = 0) = 0.
- If there are fewer than εn/2 indices on which h makes a mistake on either aᵢ or bᵢ, then P(err_{S′}(h) ≥ ε/2) = 0.
- Otherwise there are r ≥ εn/2 indices i such that h makes a mistake on exactly one of aᵢ and bᵢ; the chance that all those mistakes land in S′ is (1/2)^r ≤ (1/2)^{εn/2}.
A union bound over the at most H[2n] labelings in H[S″] then gives P(B) ≤ H[2n]·2^{−εn/2} ≤ δ/2 for the stated sample size n.
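The key coin-flip probability (1/2)^r can be sanity-checked by simulation; r = 6 and the trial count below are arbitrary choices for illustration.

```python
import random

# Monte-Carlo sanity check of the coin-flip step: at each of r indices the
# classifier errs on exactly one of (a_i, b_i), and the coin puts that error
# in S' with probability 1/2 independently, so all r errors land in S' with
# probability (1/2)^r.
random.seed(0)
r, trials = 6, 200_000
hits = sum(all(random.random() < 0.5 for _ in range(r)) for _ in range(trials))
print(hits / trials, 0.5 ** r)  # empirical frequency vs exact 1/64
```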
Growth function uniform convergence: For any class H and distribution D, if a training sample S is drawn from D of size

n ≥ (8/ε²)·[ln(2·H[2n]) + ln(1/δ)]

then with probability ≥ 1 − δ, every h ∈ H satisfies |err_S(h) − err_D(h)| ≤ ε.
When we combine Sauer's lemma with the theorems: for n ≥ d, Sauer's lemma gives H[2n] ≤ (2en/d)^d (using Σᵢ₌₀ᵈ C(2n, i) ≤ (2en/d)^d), so the requirement

n ≥ (2/ε)·[log₂(2·H[2n]) + log₂(1/δ)]

becomes

n ≥ (2/ε)·[log₂(2·(2en/d)^d) + log₂(1/δ)] = (2/ε)·[1 + d·log₂(2en/d) + log₂(1/δ)],

and this inequality is solved with

n = O((1/ε)·(d·log(1/ε) + log(1/δ))), where d = VCdim(H).
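Since n appears on both sides, the bound is implicit in n; a minimal sketch (with assumed illustration values ε = 0.1, δ = 0.05, d = 4) finds the smallest n satisfying it by scanning.

```python
import math

# The sample bound n >= (2/eps)*(log2(2*H[2n]) + log2(1/delta)) with
# H[2n] <= (2*e*n/d)^d is implicit in n. Assumed illustration values:
eps, delta, d = 0.1, 0.05, 4

def rhs(n):
    return (2 / eps) * (math.log2(2 * (2 * math.e * n / d) ** d)
                        + math.log2(1 / delta))

# Scan for the smallest n satisfying n >= rhs(n); since rhs grows only
# logarithmically in n, the scan terminates.
n = 1
while n < rhs(n):
    n += 1
print(n)
```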
VC-dimensions of Combinations of Concepts: For concepts h₁, …, h_k ∈ H and a Boolean function f, define

comb_f(h₁, …, h_k) = {x ∈ X : f(h₁(x), …, h_k(x)) = 1},

where h(x) denotes the indicator for whether or not x ∈ h. Given a concept class H, a Boolean function f and an integer k, define

COMB_{f,k}(H) = {comb_f(h₁, …, h_k) : hᵢ ∈ H}.
Corollary: If VCdim(H) = d, then for any combination function f,

VCdim(COMB_{f,k}(H)) = O(k·d·log(k·d)).