
Binary Hypothesis Testing in Engineering Applications
Binary hypothesis testing is a fundamental tool for making decisions from limited information in engineering applications. Learn about detection rules, maximum likelihood (ML) and maximum a posteriori (MAP) estimation, and likelihood ratio tests in ECE 313 with Professor Ravi K. Iyer at the University of Illinois.
Presentation Transcript
Binary Hypothesis Testing
ECE 313: Probability with Engineering Applications, Lecture 21
Professor Ravi K. Iyer, Dept. of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
Today's Topics
- Introduction to detection rules
- Binary hypothesis testing: maximum likelihood (ML) estimation
- Maximum a posteriori probability (MAP) estimation
- Likelihood ratio tests
Reference: ECE 313 Course Notes by Prof. Hajek, pages 59-65 and 138-143, available at: https://courses.engr.illinois.edu/ece313/fa2014/probabilityAug14.pdf
Announcements: Group activity on hypothesis testing in class next Wednesday, April 19. First progress meeting on the project coming up.
Binary Hypothesis Testing
In many practical problems we need to make decisions on the basis of limited information contained in a sample. For instance, a system administrator may have to decide whether to upgrade the capacity of an installation. The choice is binary in nature (e.g., upgrade or not).
To arrive at a decision we often make an assumption or guess (an assertion) about the nature of the underlying population or situation. Such an assertion, which may or may not be valid, is called a statistical hypothesis. Procedures that enable us to decide whether to accept or reject a hypothesis, based on the available information, are called statistical tests.
Binary Hypothesis Testing: Basic Framework
Learning a decision rule based on past measurements.
[Figure: MRI scan machine; images captured from the brain under H1 or H0, e.g., X = number of abnormal spots in the image.]
H1: patient has a tumor. H0: patient does not have a tumor.
Binary: either hypothesis H1 is true or hypothesis H0 is true. Based on which hypothesis is true, the system generates a set of observations X. We use the past observations to find the conditional probabilities of different patterns under each hypothesis: e.g., what is the probability of observing two abnormal spots (X = 2) given that the patient has a tumor (H1)?
Binary Hypothesis Testing: Basic Framework
Formulating a decision rule based on past measurements.
[Figure: training on previous measurements, testing on new measurements.]
Either hypothesis H1 is true or hypothesis H0 is true. Based on which hypothesis is true, the system generates a set of observations X. We use the past observations to learn the conditional probabilities of different patterns under each hypothesis: e.g., what is the probability of observing two abnormal spots (X = 2) if the patient has a tumor (H1)?
Next we formulate a decision rule that is then used to determine whether H1 or H0 is true for new measurements. Note that because the system has uncertainty, or the features are not the best, the decision rule can sometimes declare the true hypothesis and can sometimes make an error.
Example 1
Suppose the data is from a computer-aided tomography (CAT) scan system, and the hypotheses are:
H1: a tumor is present. H0: no tumor is present.
We model the number of abnormal spots observed in the image by a discrete random variable X. Suppose:
If hypothesis H1 is true (given), then X has pmf p1.
If hypothesis H0 is true (given), then X has pmf p0.
Note that p1 and p0 are conditional pmfs, described by a likelihood matrix (the values below are the ones used in the calculations on the following slides):

                     X = 0   X = 1   X = 2   X = 3
H1: P(X = k | H1)     0.0     0.1     0.3     0.6
H0: P(X = k | H0)     0.4     0.3     0.2     0.1

Given a tumor, the probability of observing 3 spots is P(X = 3 | H1) = 0.6.
Example 1: Decision Rule
A decision rule specifies, for each possible observation (each of the possible values of X), which hypothesis is declared. Typically a decision rule is described using the likelihood matrix: we underline one entry in each column to specify which hypothesis is to be declared for each possible value of X.
Example: a probability-based decision rule: H1 is declared when the probability of an observation given hypothesis H1 is higher than its probability given H0. If we observe X = 2 or 3 in the new measurements, H1 (tumor) is declared; otherwise H0 (no tumor) is declared.
Example: an intuitive decision rule: H1 is declared whenever X ≥ 1.
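Since a rule is just a choice of hypothesis per column, it is easy to encode. A minimal Python sketch, assuming the Example 1 pmf values reconstructed in the likelihood matrix above:

```python
# A minimal sketch of Example 1. A decision rule is represented as the
# set of X values for which H1 is declared.
p1 = [0.0, 0.1, 0.3, 0.6]  # P(X = k | H1), k = 0..3
p0 = [0.4, 0.3, 0.2, 0.1]  # P(X = k | H0), k = 0..3

# Probability-based rule: declare H1 where the H1 likelihood is larger.
probability_based_rule = {k for k in range(4) if p1[k] > p0[k]}
print(probability_based_rule)  # {2, 3}: declare a tumor for X = 2 or X = 3

# Intuitive rule: declare H1 whenever X >= 1.
intuitive_rule = {1, 2, 3}
```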
Binary Hypothesis Testing
In a binary hypothesis testing problem, there are two possibilities for which hypothesis is true and two possibilities for which hypothesis is declared:
H0 = negative or null hypothesis; H1 = positive or alternative hypothesis.
So there are four possible outcomes:
1. H1 is declared given hypothesis H1 is true => True Positive
2. H1 is declared given hypothesis H0 is true => False Positive (False Alarm)
3. H0 is declared given hypothesis H0 is true => True Negative
4. H0 is declared given hypothesis H1 is true => False Negative (Missed Detection)
Probability of False Alarm and Miss
Two conditional probabilities are defined:
p_false_alarm = P(declare H1 | H0 is true) => Type I error
p_miss = P(declare H0 | H1 is true) => Type II error
p_false_alarm is the sum of the entries in the H0 row of the likelihood matrix that are not underlined. For the rule that declares H1 whenever X ≥ 1:
p_false_alarm = 0.3 + 0.2 + 0.1 = 0.6
p_miss is the sum of the entries in the H1 row of the likelihood matrix that are not underlined:
p_miss = 0.0
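A short sketch of these error-probability sums, again assuming the reconstructed Example 1 pmfs:

```python
# Sketch: the two error probabilities of a decision rule.
p1 = [0.0, 0.1, 0.3, 0.6]  # P(X = k | H1)
p0 = [0.4, 0.3, 0.2, 0.1]  # P(X = k | H0)

def error_probs(declare_h1):
    """p_false_alarm and p_miss for the rule that declares H1 on declare_h1."""
    p_false_alarm = sum(p0[k] for k in declare_h1)                # H1 declared, H0 true
    p_miss = sum(p1[k] for k in range(4) if k not in declare_h1)  # H0 declared, H1 true
    return round(p_false_alarm, 2), round(p_miss, 2)

print(error_probs({1, 2, 3}))  # rule "X >= 1": (0.6, 0.0), as on this slide
```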
Decision Rule
If we modify the sample decision rule in the previous example to declare H1 only when X = 1:
Then p_false_alarm will decrease to 0.3.
p_miss will increase to 0.0 + 0.3 + 0.6 = 0.9.
So what decision rule should be used? There is a trade-off between the two types of error probabilities: evaluate the pair of conditional error probabilities (p_false_alarm, p_miss) for multiple decision rules and then make a final selection.
Maximum Likelihood (ML) Decision Rule
The maximum likelihood (ML) decision rule declares the hypothesis that maximizes the probability (likelihood) of the observation, P(X = k | H1) or P(X = k | H0).
The ML decision rule is based on the likelihood matrix, the matrix of conditional probabilities P(X = k | Hi). The sum of the elements in each row of the likelihood matrix is one.
The ML decision rule is specified by underlining the larger entry in each column of the likelihood matrix. If the entries in a column are identical, then either can be underlined. For Example 1:
p_false_alarm = 0.2 + 0.1 = 0.3
p_miss = 0.0 + 0.1 = 0.1
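The ML rule and its error probabilities can be checked the same way, a sketch under the same assumed pmfs:

```python
# Sketch of the ML rule: declare the hypothesis with the larger entry
# in each column of the likelihood matrix, then read off the errors.
p1 = [0.0, 0.1, 0.3, 0.6]  # P(X = k | H1)
p0 = [0.4, 0.3, 0.2, 0.1]  # P(X = k | H0)

ml_rule = {k for k in range(4) if p1[k] > p0[k]}           # {2, 3}
p_false_alarm = sum(p0[k] for k in ml_rule)                # 0.2 + 0.1 = 0.3
p_miss = sum(p1[k] for k in range(4) if k not in ml_rule)  # 0.0 + 0.1 = 0.1
print(ml_rule, round(p_false_alarm, 2), round(p_miss, 2))
```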
Maximum A Posteriori Probability (MAP) Decision Rule
The maximum a posteriori probability (MAP) decision rule declares the hypothesis that maximizes the a posteriori probability. An a posteriori probability is the conditional probability of a hypothesis given the observation k:
P(H0 | X = k) and P(H1 | X = k)
So given an observation X = k, the MAP decision rule chooses the hypothesis with the larger a posteriori probability. The a posteriori probabilities are not given directly, so we use Bayes' formula to calculate them.
Maximum A Posteriori Probability (MAP) Decision Rule (Cont'd)
P(H0 | X = k) = P(H0, X = k) / P(X = k) = P(H0, X = k) / (P(H0, X = k) + P(H1, X = k))
P(H1 | X = k) = P(H1, X = k) / P(X = k) = P(H1, X = k) / (P(H0, X = k) + P(H1, X = k))
So the MAP decision rule requires the computation of the joint probabilities P(H0, X = k) and P(H1, X = k). But we have P(X = k | Hi) from the likelihood matrix, together with P(H0) and P(H1), which are called the prior probabilities. The prior probabilities are determined independently of the experiment, prior to any observation being made.
Maximum A Posteriori Probability (MAP) Decision Rule (Cont'd)
Let us write π0 = P(H0) and π1 = P(H1). We apply Bayes' rule to calculate the joint probabilities P(H0, X = k) and P(H1, X = k) as follows:
P(Hi, X = k) = P(Hi) P(X = k | Hi) = πi P(X = k | Hi)
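As an illustration, a sketch of Bayes' rule for a single observation, assuming the Example 1 pmfs and the priors π0 = 0.8, π1 = 0.2 used on the next slide:

```python
# Sketch of Bayes' rule at one observation, X = 2, for Example 1.
pi0, pi1 = 0.8, 0.2
p1_k, p0_k = 0.3, 0.2                    # P(X = 2 | H1), P(X = 2 | H0)

joint1, joint0 = pi1 * p1_k, pi0 * p0_k  # P(H1, X = 2) = 0.06, P(H0, X = 2) = 0.16
p_x = joint1 + joint0                    # P(X = 2) = 0.22
print(round(joint1 / p_x, 3), round(joint0 / p_x, 3))  # 0.273, 0.727
```

Note that at X = 2 the posterior favors H0 even though the likelihood favors H1: the priors tilt the decision.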
Maximum A Posteriori Probability (MAP) Decision Rule (Cont'd)
The MAP decision rule creates a joint probability matrix: the matrix of the joint probabilities P(Hi, X = k) just calculated. Each row in the joint probability matrix is πi times the corresponding row of the likelihood matrix. The sum of the entries in row Hi is πi, and the sum of all the entries in the matrix is one.
For π0 = 0.8 and π1 = 0.2, the likelihood matrix of Example 1 gives the joint probability matrix:

                    X = 0   X = 1   X = 2   X = 3
H1: P(H1, X = k)     0.00    0.02    0.06    0.12
H0: P(H0, X = k)     0.32    0.24    0.16    0.08
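A sketch of the joint-matrix construction under the same assumptions:

```python
# Sketch of the joint probability matrix: each row is the prior times
# the corresponding row of the (assumed) likelihood matrix.
p1 = [0.0, 0.1, 0.3, 0.6]
p0 = [0.4, 0.3, 0.2, 0.1]
pi0, pi1 = 0.8, 0.2

joint_h1 = [round(pi1 * p, 2) for p in p1]  # [0.0, 0.02, 0.06, 0.12], sums to pi1
joint_h0 = [round(pi0 * p, 2) for p in p0]  # [0.32, 0.24, 0.16, 0.08], sums to pi0
print(joint_h1, joint_h0, round(sum(joint_h1) + sum(joint_h0), 2))  # total is 1.0
```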
Maximum A Posteriori Probability (MAP) Decision Rule (Cont'd)
Remember: P(Hi | X = k) = P(Hi, X = k) / (P(H0, X = k) + P(H1, X = k)).
So the posterior probability P(Hi | X = k) can be computed by dividing the value in row Hi, column X = k, by the sum of the entries in column X = k. Since the denominators for both P(H0 | X = k) and P(H1 | X = k) are the same (both equal to P(X = k)), we only need to compare the numerators (the joint probabilities).
The MAP decision rule is specified by underlining the larger entry in each column of the joint probability matrix. If the entries in a column are identical, then either can be underlined. Here MAP declares H1 only for X = 3, so:
p_false_alarm = 0.08 / 0.8 = 0.1
p_miss = (0.00 + 0.02 + 0.06) / 0.2 = 0.4
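And a sketch of the MAP rule read off the joint matrix:

```python
# Sketch of the MAP rule: compare the joint probabilities column by
# column (the shared denominator P(X = k) cancels).
joint_h1 = [0.00, 0.02, 0.06, 0.12]  # pi1 * p1(k), with pi1 = 0.2
joint_h0 = [0.32, 0.24, 0.16, 0.08]  # pi0 * p0(k), with pi0 = 0.8

map_rule = {k for k in range(4) if joint_h1[k] > joint_h0[k]}
p_false_alarm = sum(joint_h0[k] for k in map_rule) / 0.8                # 0.08/0.8 = 0.1
p_miss = sum(joint_h1[k] for k in range(4) if k not in map_rule) / 0.2  # 0.08/0.2 = 0.4
print(map_rule, round(p_false_alarm, 2), round(p_miss, 2))  # {3} 0.1 0.4
```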
Average Error Probability
Remember the conditional probabilities:
p_false_alarm = P(declare H1 | H0 is true)
p_miss = P(declare H0 | H1 is true)
The average error probability is defined as:
p_e = π0 p_false_alarm + π1 p_miss
p_false_alarm is the sum of the entries in the H0 row of the joint probability matrix that are not underlined, divided by π0. p_miss is the sum of the entries in the H1 row of the joint probability matrix that are not underlined, divided by π1. So p_e is simply the sum of all the entries in the joint probability matrix that are not underlined.
Among all decision rules, MAP is the one that minimizes p_e => optimal.
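A quick check of p_e for the two Example 1 rules, using the error pairs from the earlier slides:

```python
# Sketch: average error probability p_e = pi0*p_fa + pi1*p_miss.
pi0, pi1 = 0.8, 0.2
p_e_ml = pi0 * 0.3 + pi1 * 0.1   # ML rule:  0.24 + 0.02 = 0.26
p_e_map = pi0 * 0.1 + pi1 * 0.4  # MAP rule: 0.08 + 0.08 = 0.16
print(round(p_e_ml, 2), round(p_e_map, 2))  # MAP achieves the smaller p_e
```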
Likelihood Ratio Test (LRT)
A way to generalize the ML and MAP decision rules into a single framework is the LRT. Define the likelihood ratio for each possible observation k as the ratio of the two conditional probabilities:
Λ(k) = p1(k) / p0(k) = P(X = k | H1) / P(X = k | H0)
A decision rule can be expressed as an LRT with threshold τ:
Λ(X) > τ: declare H1 is true
Λ(X) < τ: declare H0 is true
If the threshold τ is increased, then there are fewer observations that lead to deciding H1 is true. As τ increases, p_false_alarm decreases and p_miss increases.
Likelihood Ratio Test (LRT) (Cont'd)
If the observation is X = k:
The ML rule declares hypothesis H1 is true if p1(k) ≥ p0(k), and otherwise declares H0 is true. So the ML rule can be specified using an LRT with τ = 1:
Λ(X) ≥ 1: declare H1 is true; Λ(X) < 1: declare H0 is true.
The MAP rule declares hypothesis H1 is true if π1 p1(k) ≥ π0 p0(k), and otherwise declares H0 is true. So the MAP rule can be specified using an LRT with τ = π0/π1:
Λ(X) ≥ π0/π1: declare H1 is true; Λ(X) < π0/π1: declare H0 is true.
For uniform priors (π0 = π1), the MAP and ML decision rules are the same.
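A single LRT routine reproduces both rules on the Example 1 pmfs, same assumptions as the earlier sketches:

```python
# Sketch: one LRT routine covers both the ML and the MAP rule.
p1 = [0.0, 0.1, 0.3, 0.6]
p0 = [0.4, 0.3, 0.2, 0.1]

def lrt_rule(tau):
    """Set of observations k with likelihood ratio p1(k)/p0(k) above tau."""
    return {k for k in range(4) if p1[k] / p0[k] > tau}  # p0(k) > 0 here

print(lrt_rule(1.0))        # ML,  tau = 1:           {2, 3}
print(lrt_rule(0.8 / 0.2))  # MAP, tau = pi0/pi1 = 4: {3}
```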
Example 2: Discrete Case
Suppose you have a coin and you know that either:
H1: the coin is biased, showing heads on each flip with probability 2/3; or
H0: the coin is fair, showing heads and tails each with probability 1/2.
Suppose you flip the coin five times. Let X be the number of times heads shows. Describe the ML and MAP decision rules using an LRT. Find p_false_alarm, p_miss, and p_e for both decision rules. Use the prior probabilities (π0, π1) = (0.2, 0.8).
Example 2 (Cont'd)
X (the number of times that heads shows) has a binomial distribution with n = 5, and:
p = 2/3 for H1 (coin is biased)
p = 1/2 for H0 (coin is fair)
Remember, for a binomial distribution: P(X = k) = C(5, k) p^k (1 − p)^(5−k). So we have:
P(X = k | H1) = C(5, k) (2/3)^k (1/3)^(5−k)
P(X = k | H0) = C(5, k) (1/2)^5
The rows of the likelihood matrix consist of the pmf of X:

        X = 0    X = 1    X = 2    X = 3    X = 4    X = 5
H1:     1/243   10/243   40/243   80/243   80/243   32/243
H0:      1/32     5/32    10/32    10/32     5/32     1/32
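These pmfs are easy to tabulate, a sketch using the standard library:

```python
# Sketch: the two binomial pmfs of Example 2 (n = 5 flips).
from math import comb

p1 = [comb(5, k) * (2/3)**k * (1/3)**(5 - k) for k in range(6)]  # biased coin, H1
p0 = [comb(5, k) * (1/2)**5 for k in range(6)]                   # fair coin, H0
print([round(p, 4) for p in p1])  # 1/243, 10/243, 40/243, 80/243, 80/243, 32/243
print([round(p, 4) for p in p0])  # 1/32, 5/32, 10/32, 10/32, 5/32, 1/32
```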
Example 2 (Cont'd)
In computing the likelihood ratio, the binomial coefficients cancel, so:
Λ(k) = (2/3)^k (1/3)^(5−k) / (1/2)^5 = 2^(k+5) / 3^5 = 2^(k+5) / 243
The ML decision rule is: declare H1 whenever Λ(X) ≥ 1, or equivalently X ≥ 3.
The MAP decision rule is: declare H1 whenever Λ(X) ≥ π0/π1 = 0.2/0.8 = 0.25, or equivalently X ≥ 1.
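A quick check of the two thresholds against the ratio 2^(k+5)/243:

```python
# Sketch: crossing the likelihood ratio with the two thresholds
# recovers the rules "X >= 3" (ML) and "X >= 1" (MAP).
ratios = [2**(k + 5) / 243 for k in range(6)]           # Lambda(k), k = 0..5
print([k for k in range(6) if ratios[k] >= 1.0])        # ML,  tau = 1:    [3, 4, 5]
print([k for k in range(6) if ratios[k] >= 0.2 / 0.8])  # MAP, tau = 0.25: [1, 2, 3, 4, 5]
```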
Example 2 (Cont'd)
For the ML rule (declare H1 when X ≥ 3):
p_false_alarm = (10 + 5 + 1)(1/2)^5 = 16/32 = 0.5
p_miss = (1 + 10 + 40)/243 = 51/243 ≈ 0.21
p_e = (0.2)(0.5) + (0.8)(0.21) ≈ 0.27
For the MAP rule (declare H1 when X ≥ 1):
p_false_alarm = 1 − (1/2)^5 = 31/32 ≈ 0.97
p_miss = (1/3)^5 = 1/243 ≈ 0.0041
p_e = (0.2)(0.97) + (0.8)(0.0041) ≈ 0.20
As expected, the probability of error p_e for the MAP rule is smaller than for the ML rule.
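A sketch verifying these numbers:

```python
# Sketch: Example 2 error probabilities and p_e, priors pi0 = 0.2, pi1 = 0.8.
from math import comb

p1 = [comb(5, k) * (2/3)**k * (1/3)**(5 - k) for k in range(6)]
p0 = [comb(5, k) * (1/2)**5 for k in range(6)]
pi0, pi1 = 0.2, 0.8

for name, rule in [("ML ", range(3, 6)), ("MAP", range(1, 6))]:
    p_fa = sum(p0[k] for k in rule)
    p_miss = sum(p1[k] for k in range(6) if k not in rule)
    p_e = pi0 * p_fa + pi1 * p_miss
    print(name, round(p_fa, 4), round(p_miss, 4), round(p_e, 4))
# ML  0.5    0.2099 0.2679
# MAP 0.9688 0.0041 0.197
```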
Example 3
An observation X is drawn from a standard normal distribution (i.e., N(0,1)) if hypothesis H1 is true, and from a uniform distribution with support [−a, a] if hypothesis H0 is true. As shown in the figure, the pdfs of the two distributions are equal when |u| = b.
a) Describe the maximum likelihood (ML) decision rule in terms of the observation X and the constants a and b.
b) Shade and label the regions in the figure such that the area of one region is p_false_alarm and the area of the other region is p_miss.
c) Express p_false_alarm and p_miss for the ML decision rule in terms of a and b and the Φ(u) function (the CDF) of the standard normal distribution.
d) Determine the maximum a posteriori probability (MAP) decision rule for a = 2/3, b = 0.6, and a given prior probability π1 that hypothesis H1 is true.
Example 3 (Cont'd)
[Figure: the pdf f1(u) of N(0,1) and the pdf f0(u) of the Uniform[−a, a] distribution, which cross at |u| = b.]
Example 3 (Cont'd)
The two densities are:
f1(u) = (1/√(2π)) e^(−u²/2) (the N(0,1) pdf)
f0(u) = 1/(2a) for |u| ≤ a; with a = 2/3 this gives f0(u) = 3/4
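Parts (a) to (c) can be checked numerically. The sketch below encodes my own reading of the figure: the ML rule declares H1 when |X| < b (where the normal density is the larger one) or when |X| > a (where the uniform density is zero), giving p_false_alarm = b/a and p_miss = 2(Φ(a) − Φ(b)). The value a = 2 is a hypothetical choice, used because the two densities only cross inside the support when 1/(2a) is below the normal peak 1/√(2π):

```python
# Sketch of the part (c) expressions under the assumptions stated above.
from math import sqrt, log, pi
from statistics import NormalDist

Phi = NormalDist().cdf                      # standard normal CDF
a = 2.0                                     # hypothetical support half-width
b = sqrt(-2 * log(sqrt(2 * pi) / (2 * a)))  # solve exp(-b^2/2)/sqrt(2*pi) = 1/(2a)
p_false_alarm = b / a                       # uniform mass on (-b, b)
p_miss = 2 * (Phi(a) - Phi(b))              # normal mass on b <= |X| <= a
print(round(b, 3), round(p_false_alarm, 3), round(p_miss, 3))  # 0.967 0.483 0.288
```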