
Understanding Discrimination and Classification Methods
Discrimination and classification methods aim to separate cases into known groups based on measurement variables. This presentation covers how these methods work, their notation and underlying concepts, setting up discriminant functions, and the mathematics of classifying observations into populations, including prior probabilities and misclassification costs.
Presentation Transcript
Discrimination & Classification

- Discrimination: the goal is to separate individual cases into known population groups based on their measurements on several variables. Makes use of graphical and algebraic methods to separate the groups as much as possible based on the numeric values. Referred to as Separation.
- Classification: new cases are observed along with their numeric values and assigned to groups based on those values. Makes use of an algorithm generated from known cases, applied to new cases whose population is unknown. Referred to as Allocation.
Notation and Concepts

- Notation: populations $\pi_1, \pi_2$; measured variables $\mathbf{X}$.
- Conceptual settings in which the population is unknown:
  - Incomplete knowledge of the outcome: the outcome lies in the future and cannot be observed when $\mathbf{X}$ is measured.
  - Destruction necessary to observe the outcome: a product must be destroyed to observe its quality status.
  - Unavailable or expensive assessment of the outcome: authorship is unknown, or assessment by an expensive gold standard may be needed.
Setting up a Discriminant Function

- Prior probabilities for the 2 populations: assume knowledge of the relative population sizes. A rule will tend to classify individual cases into the larger population unless there is strong evidence in favor of the smaller one.
- Misclassification costs: is the cost of misclassification the same for objects from each of the populations?
- Probability density functions: the distributions of the numeric variables for the elements of the 2 populations. Population 1: $f_1(\mathbf{x})$; Population 2: $f_2(\mathbf{x})$.
- Classification regions: given an observation's $\mathbf{x}$ values, it will be assigned to Population 1 or 2. $R_1$ is the set of $\mathbf{x}$ for which an observation is classified into Population 1; $R_2$, the complement of $R_1$, is the set of $\mathbf{x}$ for which it is classified into Population 2.
Mathematical Notation

Conditional probabilities of classifying into the 2 populations, given the true population:

$P(1\mid 1) = P(\mathbf{X} \in R_1 \mid \pi_1) = \int_{R_1} f_1(\mathbf{x})\,d\mathbf{x} \qquad P(2\mid 1) = P(\mathbf{X} \in R_2 \mid \pi_1) = \int_{R_2} f_1(\mathbf{x})\,d\mathbf{x} = 1 - P(1\mid 1)$

$P(2\mid 2) = P(\mathbf{X} \in R_2 \mid \pi_2) = \int_{R_2} f_2(\mathbf{x})\,d\mathbf{x} \qquad P(1\mid 2) = P(\mathbf{X} \in R_1 \mid \pi_2) = \int_{R_1} f_2(\mathbf{x})\,d\mathbf{x} = 1 - P(2\mid 2)$

Overall (unconditional) prior probabilities: $p_1 = P(\pi_1)$, $p_2 = P(\pi_2)$, with $p_1 + p_2 = 1$.

Joint probabilities:

$P(\text{observation comes from } \pi_1 \text{ and is classified as } \pi_1) = P(1\mid 1)\,p_1$
$P(\text{observation comes from } \pi_1 \text{ and is classified as } \pi_2) = P(2\mid 1)\,p_1$
$P(\text{observation comes from } \pi_2 \text{ and is classified as } \pi_2) = P(2\mid 2)\,p_2$
$P(\text{observation comes from } \pi_2 \text{ and is classified as } \pi_1) = P(1\mid 2)\,p_2$

Misclassification costs: $c(i\mid j)$ = cost of classifying as $\pi_i$ given the observation is from $\pi_j$, with $c(1\mid 1) = c(2\mid 2) = 0$.

Expected Cost of Misclassification (ECM):

$ECM = c(2\mid 1)\,P(2\mid 1)\,p_1 + c(1\mid 2)\,P(1\mid 2)\,p_2$
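As a quick numeric check of the ECM formula above, the following sketch plugs in hypothetical priors, misclassification probabilities, and costs (all of the numbers are made up for illustration; nothing here comes from the slides):

```python
# Hypothetical inputs, for illustration only:
p1, p2 = 0.7, 0.3            # prior probabilities, p1 + p2 = 1
P2_given_1 = 0.10            # P(2|1): classify into pi_2 given from pi_1
P1_given_2 = 0.25            # P(1|2): classify into pi_1 given from pi_2
c2_given_1 = 5.0             # c(2|1): cost of classifying a pi_1 case as pi_2
c1_given_2 = 1.0             # c(1|2): cost of classifying a pi_2 case as pi_1

# ECM = c(2|1) * P(2|1) * p1 + c(1|2) * P(1|2) * p2
ecm = c2_given_1 * P2_given_1 * p1 + c1_given_2 * P1_given_2 * p2
```

With these numbers the two terms are 0.35 and 0.075, so the expected cost per classified observation is 0.425.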
Regions that Minimize the Expected Cost of Misclassification

$R_1: \dfrac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \ge \left(\dfrac{c(1\mid 2)}{c(2\mid 1)}\right)\left(\dfrac{p_2}{p_1}\right) \qquad R_2: \dfrac{f_1(\mathbf{x})}{f_2(\mathbf{x})} < \left(\dfrac{c(1\mid 2)}{c(2\mid 1)}\right)\left(\dfrac{p_2}{p_1}\right)$

where $f_1(\mathbf{x})/f_2(\mathbf{x})$ is the density ratio, $c(1\mid 2)/c(2\mid 1)$ is the cost ratio, and $p_2/p_1$ is the prior-probability ratio.

Special cases:

1) Equal prior probabilities: $R_1: \dfrac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \ge \dfrac{c(1\mid 2)}{c(2\mid 1)}$, $R_2$ otherwise.
2) Equal cost ratios: $R_1: \dfrac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \ge \dfrac{p_2}{p_1}$, $R_2$ otherwise.
3) Equal prior probabilities and cost ratios: $R_1: \dfrac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \ge 1$, $R_2$ otherwise.

Total Probability of Misclassification (see case 2 above for the rule): $TPM = p_1 P(2\mid 1) + p_2 P(1\mid 2)$
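The minimum-ECM regions reduce to a single threshold on the density ratio. A minimal sketch of the rule (the function name and argument names are my own, not from the slides; the densities are passed in already evaluated at x):

```python
def classify_min_ecm(f1x, f2x, p1, p2, c1_given_2, c2_given_1):
    # Classify into population 1 iff
    #   f1(x) / f2(x) >= (c(1|2) / c(2|1)) * (p2 / p1)
    # i.e. x lies in region R1; otherwise x lies in R2.
    threshold = (c1_given_2 / c2_given_1) * (p2 / p1)
    return 1 if f1x / f2x >= threshold else 2
```

With equal priors and equal costs the threshold is 1, recovering special case 3 above.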
Allocation of a New Observation x0 to a Population

Bayes' Rule:

$P(\pi_1 \mid \mathbf{x}_0) = \dfrac{P(\pi_1 \text{ occurs and } \mathbf{x}_0 \text{ is observed})}{P(\mathbf{x}_0 \text{ is observed})} = \dfrac{p_1 f_1(\mathbf{x}_0)}{p_1 f_1(\mathbf{x}_0) + p_2 f_2(\mathbf{x}_0)}$

$P(\pi_2 \mid \mathbf{x}_0) = \dfrac{p_2 f_2(\mathbf{x}_0)}{p_1 f_1(\mathbf{x}_0) + p_2 f_2(\mathbf{x}_0)} = 1 - P(\pi_1 \mid \mathbf{x}_0)$

Allocation rule: classify the observation as $\pi_1$ if $P(\pi_1 \mid \mathbf{x}_0) \ge P(\pi_2 \mid \mathbf{x}_0)$, equivalently if $p_1 f_1(\mathbf{x}_0) \ge p_2 f_2(\mathbf{x}_0)$.
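The posterior probability and the resulting allocation rule can be sketched as follows (helper names are hypothetical; densities are passed in already evaluated at x0):

```python
def posterior_p1(f1x0, f2x0, p1, p2):
    # Bayes' rule: P(pi_1 | x0) = p1 f1(x0) / (p1 f1(x0) + p2 f2(x0))
    return p1 * f1x0 / (p1 * f1x0 + p2 * f2x0)

def allocate(f1x0, f2x0, p1, p2):
    # Classify as pi_1 iff P(pi_1 | x0) >= P(pi_2 | x0),
    # which simplifies to p1 f1(x0) >= p2 f2(x0)
    return 1 if p1 * f1x0 >= p2 * f2x0 else 2
```

Note the denominator cancels in the comparison, which is why the rule only needs the products $p_i f_i(\mathbf{x}_0)$.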
Normal Populations with Equal Covariance Matrices ($\Sigma_1 = \Sigma_2 = \Sigma$)

$\mathbf{X} \sim N_p(\boldsymbol{\mu}_i, \Sigma), \quad i = 1, 2$

$f_i(\mathbf{x}) = (2\pi)^{-p/2}\,|\Sigma|^{-1/2} \exp\left\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\right\}, \quad i = 1, 2$

Minimum ECM regions (the normalizing constants are common to both populations and cancel):

$R_1: \exp\left\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_1)'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_1) + \tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_2)'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_2)\right\} \ge \left(\dfrac{c(1\mid 2)}{c(2\mid 1)}\right)\left(\dfrac{p_2}{p_1}\right)$

with $R_2$ the reverse inequality.

Equivalently, allocate the new observation $\mathbf{x}_0$ to Population 1 if:

$(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)'\Sigma^{-1}\mathbf{x}_0 - \tfrac{1}{2}(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)'\Sigma^{-1}(\boldsymbol{\mu}_1+\boldsymbol{\mu}_2) \ge \ln\left[\left(\dfrac{c(1\mid 2)}{c(2\mid 1)}\right)\left(\dfrac{p_2}{p_1}\right)\right]$
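The linear rule above can be sketched in a few lines of NumPy, taking the population parameters as known (the function name, argument names, and default values are assumptions for illustration):

```python
import numpy as np

def linear_rule_population(x0, mu1, mu2, Sigma, p1=0.5, p2=0.5,
                           c1_given_2=1.0, c2_given_1=1.0):
    # Allocate x0 to population 1 (return 1) iff
    #   (mu1 - mu2)' Sigma^{-1} x0
    #     - (1/2)(mu1 - mu2)' Sigma^{-1} (mu1 + mu2)
    #   >= ln[(c(1|2)/c(2|1)) (p2/p1)]
    Sinv = np.linalg.inv(Sigma)
    diff = mu1 - mu2
    lhs = diff @ Sinv @ x0 - 0.5 * diff @ Sinv @ (mu1 + mu2)
    rhs = np.log((c1_given_2 / c2_given_1) * (p2 / p1))
    return 1 if lhs >= rhs else 2
```

With equal priors and costs the right-hand side is 0, so the rule classifies by which population mean x0 is "closer" to in the Mahalanobis sense.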
Sample-Based Discrimination

Random samples of size $n_i$ from $\pi_i$: $\mathbf{x}_{i1},\ldots,\mathbf{x}_{in_i}$, $i = 1, 2$.

$\bar{\mathbf{x}}_i = \dfrac{1}{n_i}\sum_{j=1}^{n_i}\mathbf{x}_{ij} \qquad \mathbf{S}_i = \dfrac{1}{n_i-1}\sum_{j=1}^{n_i}(\mathbf{x}_{ij}-\bar{\mathbf{x}}_i)(\mathbf{x}_{ij}-\bar{\mathbf{x}}_i)' \qquad \mathbf{S}_{Pooled} = \dfrac{(n_1-1)\mathbf{S}_1 + (n_2-1)\mathbf{S}_2}{n_1+n_2-2}$

Allocate the new observation $\mathbf{x}_0$ to Population 1 if:

$(\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2)'\mathbf{S}_{Pooled}^{-1}\mathbf{x}_0 - \tfrac{1}{2}(\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2)'\mathbf{S}_{Pooled}^{-1}(\bar{\mathbf{x}}_1+\bar{\mathbf{x}}_2) \ge \ln\left[\left(\dfrac{c(1\mid 2)}{c(2\mid 1)}\right)\left(\dfrac{p_2}{p_1}\right)\right]$

Note that if $c(1\mid 2) = c(2\mid 1)$ and $p_1 = p_2$, the right-hand side is $\ln(1) = 0$. Then, defining:

$\hat{y} = \hat{\mathbf{a}}'\mathbf{x} = (\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2)'\mathbf{S}_{Pooled}^{-1}\mathbf{x} \qquad \hat{m} = \tfrac{1}{2}(\bar{y}_1+\bar{y}_2) = \tfrac{1}{2}(\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2)'\mathbf{S}_{Pooled}^{-1}(\bar{\mathbf{x}}_1+\bar{\mathbf{x}}_2)$

the decision is based on which side of $\hat{m}$ the value $\hat{y}_0 = \hat{\mathbf{a}}'\mathbf{x}_0$ falls.
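A minimal sketch of the sample-based rule for the equal-costs, equal-priors case, estimating the pooled covariance from data (all helper names are my own):

```python
import numpy as np

def pooled_cov(X1, X2):
    # S_pooled = ((n1 - 1) S1 + (n2 - 1) S2) / (n1 + n2 - 2)
    n1, n2 = len(X1), len(X2)
    S1 = np.cov(X1, rowvar=False)   # rows = observations, cols = variables
    S2 = np.cov(X2, rowvar=False)
    return ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

def classify_lda(x0, X1, X2):
    # Allocate x0 to population 1 iff y0 = a'x0 >= m, where
    # a = S_pooled^{-1} (xbar1 - xbar2) and m = (1/2) a'(xbar1 + xbar2)
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    a = np.linalg.solve(pooled_cov(X1, X2), xbar1 - xbar2)
    m = 0.5 * a @ (xbar1 + xbar2)
    return 1 if a @ x0 >= m else 2
```

Using `np.linalg.solve` instead of forming the explicit inverse is the usual numerically preferable choice.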
Fisher's Method for 2 Populations

Linearly transform the multivariate observations to univariate values $y = \hat{\mathbf{a}}'\mathbf{x}$, chosen so that the transformed samples $y_{11},\ldots,y_{1n_1}$ and $y_{21},\ldots,y_{2n_2}$ are as different as possible:

$\text{Separation} = \dfrac{(\bar{y}_1-\bar{y}_2)^2}{s_y^2} \qquad s_y^2 = \dfrac{\sum_{j=1}^{n_1}(y_{1j}-\bar{y}_1)^2 + \sum_{j=1}^{n_2}(y_{2j}-\bar{y}_2)^2}{n_1+n_2-2}$

Result 11.3: $\hat{y} = \hat{\mathbf{a}}'\mathbf{x} = (\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2)'\mathbf{S}_{Pooled}^{-1}\mathbf{x}$ maximizes:

$\dfrac{(\bar{y}_1-\bar{y}_2)^2}{s_y^2} = \dfrac{(\hat{\mathbf{a}}'\mathbf{d})^2}{\hat{\mathbf{a}}'\mathbf{S}_{Pooled}\hat{\mathbf{a}}} \qquad \mathbf{d} = \bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2$

over all $\hat{\mathbf{a}} \ne \mathbf{0}$ (Maximization Lemma).

Allocate $\mathbf{x}_0$ to $\pi_1$ if $\hat{y}_0 = (\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2)'\mathbf{S}_{Pooled}^{-1}\mathbf{x}_0 \ge \hat{m} = \tfrac{1}{2}(\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2)'\mathbf{S}_{Pooled}^{-1}(\bar{\mathbf{x}}_1+\bar{\mathbf{x}}_2)$; otherwise allocate it to $\pi_2$.
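The Maximization Lemma can be checked numerically: with $\hat{\mathbf{a}} = \mathbf{S}_{Pooled}^{-1}\mathbf{d}$, the separation criterion attains its maximum value $\mathbf{d}'\mathbf{S}_{Pooled}^{-1}\mathbf{d}$, and no other direction exceeds it. The numeric values below are hypothetical, chosen only to make the check concrete:

```python
import numpy as np

# Hypothetical mean difference and pooled covariance:
d = np.array([1.0, 2.0])                  # d = xbar1 - xbar2
Sp = np.array([[2.0, 0.5],
               [0.5, 1.0]])               # S_pooled (positive definite)

def criterion(a):
    # Fisher's separation criterion (a'd)^2 / (a' S_pooled a)
    return (a @ d) ** 2 / (a @ Sp @ a)

a_star = np.linalg.solve(Sp, d)           # a = S_pooled^{-1} d
best = criterion(a_star)                  # equals d' S_pooled^{-1} d

# No random direction beats a_star (Maximization Lemma)
rng = np.random.default_rng(0)
for _ in range(1000):
    a = rng.standard_normal(2)
    assert criterion(a) <= best + 1e-9
```

For these particular values the maximum works out to exactly 4, and note that the criterion is scale-invariant in a, so only the direction of a_star matters.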
Classification of Multivariate Normal Populations when $\Sigma_1 \ne \Sigma_2$

Define:

$k = \tfrac{1}{2}\ln\left(\dfrac{|\Sigma_1|}{|\Sigma_2|}\right) + \tfrac{1}{2}\left(\boldsymbol{\mu}_1'\Sigma_1^{-1}\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2'\Sigma_2^{-1}\boldsymbol{\mu}_2\right)$

$R_1: -\tfrac{1}{2}\mathbf{x}'\left(\Sigma_1^{-1}-\Sigma_2^{-1}\right)\mathbf{x} + \left(\boldsymbol{\mu}_1'\Sigma_1^{-1} - \boldsymbol{\mu}_2'\Sigma_2^{-1}\right)\mathbf{x} - k \ge \ln\left[\left(\dfrac{c(1\mid 2)}{c(2\mid 1)}\right)\left(\dfrac{p_2}{p_1}\right)\right]$

with $R_2$ the reverse inequality.

Sample data: allocate $\mathbf{x}_0$ to $\pi_1$ if:

$-\tfrac{1}{2}\mathbf{x}_0'\left(\mathbf{S}_1^{-1}-\mathbf{S}_2^{-1}\right)\mathbf{x}_0 + \left(\bar{\mathbf{x}}_1'\mathbf{S}_1^{-1} - \bar{\mathbf{x}}_2'\mathbf{S}_2^{-1}\right)\mathbf{x}_0 - \hat{k} \ge \ln\left[\left(\dfrac{c(1\mid 2)}{c(2\mid 1)}\right)\left(\dfrac{p_2}{p_1}\right)\right]$

$\hat{k} = \tfrac{1}{2}\ln\left(\dfrac{|\mathbf{S}_1|}{|\mathbf{S}_2|}\right) + \tfrac{1}{2}\left(\bar{\mathbf{x}}_1'\mathbf{S}_1^{-1}\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2'\mathbf{S}_2^{-1}\bar{\mathbf{x}}_2\right)$
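The quadratic rule can be sketched with population parameters as follows (function name, argument names, and defaults are assumptions for illustration; when $\Sigma_1 = \Sigma_2$ the quadratic term vanishes and the rule reduces to the linear one):

```python
import numpy as np

def classify_quadratic(x0, mu1, mu2, S1, S2, p1=0.5, p2=0.5,
                       c1_given_2=1.0, c2_given_1=1.0):
    # Quadratic minimum-ECM rule for unequal covariance matrices:
    # allocate to pi_1 iff
    #   -(1/2) x0'(S1^{-1} - S2^{-1})x0
    #     + (mu1'S1^{-1} - mu2'S2^{-1})x0 - k
    #   >= ln[(c(1|2)/c(2|1)) (p2/p1)]
    S1i, S2i = np.linalg.inv(S1), np.linalg.inv(S2)
    k = (0.5 * np.log(np.linalg.det(S1) / np.linalg.det(S2))
         + 0.5 * (mu1 @ S1i @ mu1 - mu2 @ S2i @ mu2))
    lhs = -0.5 * x0 @ (S1i - S2i) @ x0 + (mu1 @ S1i - mu2 @ S2i) @ x0 - k
    rhs = np.log((c1_given_2 / c2_given_1) * (p2 / p1))
    return 1 if lhs >= rhs else 2
```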
Evaluation of Classification Functions

Total Probability of Misclassification: $TPM = p_1\displaystyle\int_{R_2} f_1(\mathbf{x})\,d\mathbf{x} + p_2\int_{R_1} f_2(\mathbf{x})\,d\mathbf{x}$

Optimum Error Rate: the TPM evaluated with the optimal regions

$R_1: \dfrac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \ge \dfrac{p_2}{p_1} \qquad R_2: \dfrac{f_1(\mathbf{x})}{f_2(\mathbf{x})} < \dfrac{p_2}{p_1}$

In practice, the parameters of the probability density functions are estimated, giving estimated regions $\hat{R}_1, \hat{R}_2$:

Actual Error Rate: $AER = p_1\displaystyle\int_{\hat{R}_2} f_1(\mathbf{x})\,d\mathbf{x} + p_2\int_{\hat{R}_1} f_2(\mathbf{x})\,d\mathbf{x}$

$\hat{R}_1: (\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2)'\mathbf{S}_{Pooled}^{-1}\mathbf{x} - \tfrac{1}{2}(\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2)'\mathbf{S}_{Pooled}^{-1}(\bar{\mathbf{x}}_1+\bar{\mathbf{x}}_2) \ge \ln\left[\left(\dfrac{c(1\mid 2)}{c(2\mid 1)}\right)\left(\dfrac{p_2}{p_1}\right)\right]$

with $\hat{R}_2$ the reverse inequality.

The Apparent Error Rate, measured on the training data (actual membership known), is obtained from the confusion matrix:

                        Predicted pi_1       Predicted pi_2       Total
    Actual pi_1         n1c                  n1m = n1 - n1c       n1
    Actual pi_2         n2m = n2 - n2c       n2c                  n2

where $n_{ic}$ = number of correctly classified items from population $i$ and $n_{im}$ = number of misclassified items from population $i$, $i = 1, 2$:

$APER = \dfrac{n_{1m} + n_{2m}}{n_1 + n_2}$

The Apparent Error Rate tends to underestimate the AER (it overfits to the training sample).
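APER is a one-line computation from the confusion-matrix counts (the argument names follow the $n_{ic}$/$n_{im}$ notation of the table; the demo numbers are made up):

```python
def aper(n1c, n1m, n2c, n2m):
    # Apparent Error Rate from confusion-matrix counts:
    #   APER = (n1m + n2m) / (n1 + n2),
    # where n1 = n1c + n1m and n2 = n2c + n2m.
    return (n1m + n2m) / (n1c + n1m + n2c + n2m)
```

For example, with 45 of 50 Population-1 cases and 40 of 50 Population-2 cases classified correctly, `aper(45, 5, 40, 10)` gives 0.15. Remember this is an optimistic estimate, since the same data were used to fit the rule.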
Jackknife Cross-Validation (Lachenbruch's Holdout Method)

- For Population 1, remove each observation one at a time and fit the classifier on the $(n_1-1)+n_2$ remaining cases. Classify the held-out case. Repeat for all $n_1$ cases from Population 1; let $n_{1m}^{(H)}$ be the number misclassified as $\pi_2$.
- Repeat for all $n_2$ cases from Population 2; let $n_{2m}^{(H)}$ be the number misclassified as $\pi_1$.

Conditional probabilities of misclassification $\hat{P}(j\mid i) = \Pr(\text{misclassified as } \pi_j \mid \text{from } \pi_i)$:

$\hat{P}(2\mid 1) = \dfrac{n_{1m}^{(H)}}{n_1} \qquad \hat{P}(1\mid 2) = \dfrac{n_{2m}^{(H)}}{n_2}$

For reasonably large sample sizes: $\hat{E}(AER) = \dfrac{n_{1m}^{(H)} + n_{2m}^{(H)}}{n_1 + n_2}$
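The holdout loop itself is short. The sketch below uses a simple nearest-mean rule as a stand-in classifier purely for brevity (the slides' LDA rule could be plugged in instead); all names here are my own:

```python
import numpy as np

def classify_nearest_mean(x0, X1, X2):
    # Stand-in classifier: allocate to the population with the nearer sample mean
    d1 = np.linalg.norm(x0 - X1.mean(axis=0))
    d2 = np.linalg.norm(x0 - X2.mean(axis=0))
    return 1 if d1 <= d2 else 2

def holdout_error(X1, X2, classify):
    # Lachenbruch's holdout: refit with one case held out, classify that case.
    # n1m, n2m count held-out cases assigned to the wrong population.
    n1m = sum(classify(X1[i], np.delete(X1, i, axis=0), X2) == 2
              for i in range(len(X1)))
    n2m = sum(classify(X2[i], X1, np.delete(X2, i, axis=0)) == 1
              for i in range(len(X2)))
    return {"P(2|1)": n1m / len(X1),
            "P(1|2)": n2m / len(X2),
            "E(AER)": (n1m + n2m) / (len(X1) + len(X2))}
```

Because each case is classified by a rule fit without it, this estimate avoids the optimistic bias of the APER.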