From Linear Classifiers to Neural Networks - A Comprehensive Overview
This content delves into the transition from linear classifiers to neural networks, covering topics such as discriminant functions, cost functions, loss functions, and the structure of linear classifiers. Explore the representation power of sigmoidal neural networks and the challenges posed by non-differentiable functions in the ideal case scenario.
Presentation Transcript
From Linear Classifiers to Neural Networks
Skeleton
Recap linear classifier
Nonlinear discriminant function
Nonlinear cost function
Example linear classifiers
Introduction to neural network
Representation power of sigmoidal neural network
Linear Classifier: Recap & Notation
We focus on two-class classification for the entire class.
$x_i = [x_{i1}, x_{i2}, \ldots, x_{id}]$: input feature ($i$: index of training tokens, $d$: feature dimension)
$w = [w_1, w_2, \ldots, w_d]$: weight vector
$a_i = b + w^T x_i$: linear output
$\hat{y}_i$: predicted class, with $\hat{y}_i = 1$ if $a_i \ge 0$ and $-1$ otherwise, or $\hat{y}_i = 1$ if $a_i \ge 0$ and $0$ otherwise
$y_i$: labelled class
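Discriminant function
$a_i = b + w^T x_i$: linear output
$\hat{y}_i = 1$ if $a_i \ge 0$, $-1$ otherwise
Equivalently, $\hat{y}_i = f(a_i)$, where $f(a) = 1$ if $a \ge 0$ and $-1$ otherwise
$f(\cdot)$: nonlinear discriminant function
[Figure: plot of the step function $f(a)$, jumping from $-1$ to $1$]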
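The thresholding rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the example weights are hypothetical and do not come from the slides.

```python
def linear_output(w, b, x):
    """Linear output a = b + w^T x for one feature vector x."""
    return b + sum(wj * xj for wj, xj in zip(w, x))

def step(a):
    """Hard-threshold discriminant: 1 if a >= 0, otherwise -1."""
    return 1 if a >= 0 else -1

def predict(w, b, x):
    """Predicted class: the step function applied to the linear output."""
    return step(linear_output(w, b, x))

# Hypothetical weights and inputs, chosen only for illustration.
w, b = [2.0, -1.0], 0.5
print(predict(w, b, [1.0, 1.0]))    # a = 0.5 + 2 - 1 = 1.5, so class 1
print(predict(w, b, [-1.0, 1.0]))   # a = 0.5 - 2 - 1 = -2.5, so class -1
```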
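Loss function
To evaluate the performance of the classifier:
$L(\hat{y}_{1:N}, y_{1:N}) = \sum_{i=1}^{N} l(\hat{y}_i, y_i)$
Loss function: $l(\hat{y}, y) = 1$ if $\hat{y} \ne y$, $0$ otherwise
[Figure: plots of $l(\hat{y}, 1)$ and $l(\hat{y}, -1)$ against $\hat{y}$]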
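As a small illustration of the total loss, the sketch below counts misclassified tokens, assuming the per-token loss is the 0-1 loss (1 for a wrong prediction, 0 for a correct one). The function names and the toy labels are hypothetical.

```python
def zero_one_loss(y_hat, y):
    """Per-token loss: 1 if the predicted class differs from the label, else 0."""
    return 0 if y_hat == y else 1

def total_loss(y_hats, ys):
    """Total loss: sum of the per-token losses over all training tokens."""
    return sum(zero_one_loss(p, t) for p, t in zip(y_hats, ys))

# Toy predictions and labels (made up for illustration).
predicted = [1, -1, 1, 1]
labelled = [1, 1, 1, -1]
print(total_loss(predicted, labelled))  # two tokens are misclassified
```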
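Structure of linear classifier
[Diagram: the inputs $x_{i1}, \ldots, x_{id}$ and a constant 1 feed the linear unit $a_i = b + w^T x_i$; the discriminant function $f(\cdot)$ maps $a_i$ to $\hat{y}_i$, which the loss $l(\hat{y}_i, y_i)$ compares with the label. Inset plots show the step-shaped $f$ and the losses $l(\hat{y}, 1)$ and $l(\hat{y}, -1)$.]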
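Alternatively
Nonlinear function: $f(a) = a$
Loss function: $l(\hat{y}, y) = 1$ if $y\hat{y} < 0$, $0$ otherwise
[Figure: plots of $l(\hat{y}, 1)$ and $l(\hat{y}, -1)$ against $\hat{y}$]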
Structure of linear classifier
[Diagram: the same structure, with inset plots of $f(\cdot)$ and of the losses $l(\hat{y}, 1)$ and $l(\hat{y}, -1)$ for both formulations.]
Ideal case - Problem?
Nonlinear function: a step between $-1$ and $1$.
Loss function: the step-shaped $l(\hat{y}, 1)$ and $l(\hat{y}, -1)$.
Neither is differentiable, so the classifier cannot be trained using gradient methods.
We need a proxy for both.
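One way to see the problem is to probe the ideal loss numerically. The sketch below uses a made-up 1-D data set and a deliberately bad weight (all names and values are hypothetical): every token is misclassified, yet a finite-difference estimate of the gradient is exactly zero, so a gradient method receives no signal.

```python
def step(a):
    """Ideal discriminant: 1 if a >= 0, otherwise -1."""
    return 1 if a >= 0 else -1

def ideal_loss(w, b, data):
    """Number of misclassified tokens for a 1-D linear classifier."""
    return sum(1 for x, y in data if step(w * x + b) != y)

def finite_difference(w, b, data, eps=1e-4):
    """Central-difference estimate of d(loss)/dw."""
    return (ideal_loss(w + eps, b, data) - ideal_loss(w - eps, b, data)) / (2 * eps)

# Toy data (x, label) and a weight with the wrong sign.
data = [(-2.0, -1), (-1.0, -1), (1.0, 1), (2.0, 1)]
w, b = -0.3, 0.1
print(ideal_loss(w, b, data))         # all four tokens are wrong
print(finite_difference(w, b, data))  # flat: the loss does not move with w
```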
Nonlinear Discriminant function
Our goal: find a function that is differentiable and approximates the step function.
Solution: a sigmoid function.
Definition: a bounded, differentiable and monotonically increasing function.
Sigmoid Function - Examples
Logistic function: $f(a) = \frac{1}{1 + e^{-a}}$
$f'(a) = \frac{e^{-a}}{(1 + e^{-a})^2}$
[Figure: plots over $a \in [-5, 5]$ of $f(a)$, rising from 0 to 1, and of $f'(a)$, peaking at 0.25]
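A minimal sketch of the logistic function and its derivative, using only the Python standard library. The function names are illustrative, not taken from the slides.

```python
import math

def logistic(a):
    """Logistic function 1 / (1 + e^{-a})."""
    return 1.0 / (1.0 + math.exp(-a))

def logistic_derivative(a):
    """Derivative e^{-a} / (1 + e^{-a})^2."""
    e = math.exp(-a)
    return e / (1.0 + e) ** 2

print(logistic(0.0))             # 0.5, the midpoint of the curve
print(logistic_derivative(0.0))  # 0.25, the peak of the derivative
```

The derivative shrinks quickly away from zero, which matters later when the function is used inside a gradient-trained classifier.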
Sigmoid Function - Examples
Hyperbolic tangent: $f(a) = \tanh(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}}$
$f'(a) = \frac{(e^{a} + e^{-a})^2 - (e^{a} - e^{-a})^2}{(e^{a} + e^{-a})^2} = 1 - \tanh^2 a$
[Figure: plots over $a \in [-5, 5]$ of $f(a)$, rising from $-1$ to 1, and of $f'(a)$, between 0 and 1]
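The identity for the derivative of the hyperbolic tangent can be checked numerically. The sketch below builds tanh from exponentials and compares it with the standard library's `math.tanh`; the helper names are hypothetical.

```python
import math

def tanh_from_exponentials(a):
    """tanh(a) written out as (e^a - e^{-a}) / (e^a + e^{-a})."""
    return (math.exp(a) - math.exp(-a)) / (math.exp(a) + math.exp(-a))

def tanh_derivative(a):
    """Derivative of tanh: 1 - tanh^2(a)."""
    return 1.0 - math.tanh(a) ** 2

for a in (-2.0, 0.0, 0.7):
    # The two columns of tanh values should agree closely.
    print(a, tanh_from_exponentials(a), math.tanh(a), tanh_derivative(a))
```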
Nonlinear Loss function
Our goal: find a function that is differentiable and is an UPPER BOUND of the step function.
Why?
For training: minimizing the loss keeps the error from being large.
For test: generalized error < generalized loss < upper bounds.
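Loss function example
Square loss: $l(\hat{y}, y) = (\hat{y} - y)^2$
$\frac{\partial l}{\partial \hat{y}} = 2(\hat{y} - y)$
[Figure: plots of $l(\hat{y}, 1)$ and $l(\hat{y}, -1)$ against $\hat{y}$]
Advantage: easy to solve, common for regression.
Disadvantage: punishes correctly classified tokens whose output lies far from the label.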
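The disadvantage of the square loss is easy to see with numbers. In the sketch below (function names hypothetical), an output of 3 for a token labelled 1 is on the correct side of the decision boundary, yet it is penalized more heavily than an output of 0, which sits right on the boundary.

```python
def square_loss(y_hat, y):
    """Square loss (y_hat - y)^2."""
    return (y_hat - y) ** 2

def square_loss_grad(y_hat, y):
    """Derivative of the square loss with respect to y_hat."""
    return 2 * (y_hat - y)

print(square_loss(3.0, 1))       # 4.0: correct side, but heavily punished
print(square_loss(0.0, 1))       # 1.0: on the boundary, punished less
print(square_loss_grad(3.0, 1))  # 4.0: the gradient pushes the output back down
```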
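Loss function example
Hinge loss: $l(\hat{y}, y) = -y\hat{y} + 1$ if $y\hat{y} < 1$, $0$ otherwise
$\frac{\partial l}{\partial \hat{y}} = -y$ if $y\hat{y} < 1$, $0$ otherwise
[Figure: plots of $l(\hat{y}, 1)$ and $l(\hat{y}, -1)$ against $\hat{y}$]
Advantage: easy to solve, good for classification.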
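A short sketch of the hinge loss and its derivative, with a check that it stays above the ideal step-shaped loss. It assumes labels in {-1, 1} and takes the ideal loss to be 1 when the output has the wrong sign; the function names are hypothetical.

```python
def hinge_loss(y_hat, y):
    """Hinge loss: 1 - y*y_hat when y*y_hat < 1, otherwise 0."""
    return max(0.0, 1.0 - y * y_hat)

def hinge_grad(y_hat, y):
    """Derivative with respect to y_hat: -y inside the margin, 0 outside."""
    return -float(y) if y * y_hat < 1 else 0.0

def step_loss(y_hat, y):
    """Ideal loss (assumed form): 1 if the output has the wrong sign."""
    return 1.0 if y * y_hat < 0 else 0.0

# Check the upper-bound property on a grid of outputs for both labels.
grid = [-3.0 + 0.25 * j for j in range(25)]
is_upper_bound = all(hinge_loss(v, y) >= step_loss(v, y)
                     for y in (1, -1) for v in grid)
print(is_upper_bound)
```

Unlike the square loss, the hinge loss is zero for any output beyond the margin, so confidently correct tokens are not punished.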
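Linear classifiers example
Nonlinear discriminant function $f(\cdot)$: linear or sigmoid
Loss function $l(\hat{y}_i, y_i)$: square or hinge
Linear + square: MSE classifier
Sigmoid + square: nonlinear MSE classifier
Linear + hinge + regularization: SVM
[Diagram: the inputs $x_{i1}, \ldots, x_{id}$ and a constant 1 feed $a_i = w^T x_i$, followed by $\hat{y}_i = f(a_i)$; the constant input absorbs the bias $b$ into $w$]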
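MSE Classifier
$L = \sum_i l(\hat{y}_i, y_i) = \sum_i (\hat{y}_i - y_i)^2 = \sum_i (w^T x_i - y_i)^2$
$\frac{\partial L}{\partial w} = 2 \sum_i (w^T x_i - y_i) x_i = 0$
$\Rightarrow \sum_i x_i x_i^T w = \sum_i x_i y_i$
With $X = [x_1, \ldots, x_N]$ and $y = [y_1, \ldots, y_N]^T$: $X X^T w = X y$, so $w = (X X^T)^{-1} X y$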
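The closed-form solution can be tried on a tiny example. The sketch below assumes a single scalar feature with the bias absorbed by appending a constant 1, so the normal equations form a 2x2 system that is solved by Cramer's rule. The function name and the toy data are hypothetical.

```python
def fit_mse_classifier(xs, ys):
    """Solve (X X^T) w = X y for columns x_i = [x, 1] (bias absorbed)."""
    # Entries of the 2x2 matrix X X^T = sum_i x_i x_i^T.
    sxx = sum(x * x for x in xs)
    sx = sum(xs)
    n = float(len(xs))
    # Entries of the vector X y = sum_i x_i y_i.
    sxy = sum(x * y for x, y in zip(xs, ys))
    sy = sum(ys)
    # Cramer's rule; assumes the matrix is invertible (det != 0).
    det = sxx * n - sx * sx
    weight = (sxy * n - sx * sy) / det
    bias = (sxx * sy - sx * sxy) / det
    return weight, bias

# Toy 1-D data with labels in {-1, 1}.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1, -1, 1, 1]
weight, bias = fit_mse_classifier(xs, ys)
predictions = [1 if weight * x + bias >= 0 else -1 for x in xs]
print(weight, bias, predictions)
```

For more than a couple of features, a linear-algebra routine would normally replace the hand-written 2x2 solve.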
Nonlinear MSE Classifier
$L = \sum_i l(\hat{y}_i, y_i) = \sum_i (\hat{y}_i - y_i)^2 = \sum_i (f(w^T x_i) - y_i)^2$
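Training a Nonlinear MSE Classifier
$L = \sum_i l(\hat{y}_i, y_i) = \sum_i (\hat{y}_i - y_i)^2 = \sum_i (f(w^T x_i) - y_i)^2$
Chain rule: $\frac{\partial L}{\partial w} = 2 \sum_i (f(w^T x_i) - y_i) f'(w^T x_i) x_i$
Disadvantage: training can stagnate, because the gradient vanishes wherever $f'$ is close to zero.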
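A minimal gradient-descent sketch for the nonlinear MSE classifier follows. It assumes tanh as the sigmoid, a scalar feature with an explicit bias, and a made-up learning rate and data set; none of these choices come from the slides. The last lines illustrate the stagnation issue: with a badly initialized, saturated weight, the loss is large but the gradient is almost zero.

```python
import math

def loss_and_grad(w, b, xs, ys):
    """Squared loss of tanh(w*x + b) and its gradient via the chain rule."""
    loss, grad_w, grad_b = 0.0, 0.0, 0.0
    for x, y in zip(xs, ys):
        y_hat = math.tanh(w * x + b)
        err = y_hat - y
        slope = 1.0 - y_hat ** 2              # f'(a) for tanh
        loss += err ** 2
        grad_w += 2.0 * err * slope * x       # 2 (f(a) - y) f'(a) x
        grad_b += 2.0 * err * slope
    return loss, grad_w, grad_b

def train(w, b, xs, ys, rate=0.1, steps=200):
    """Plain gradient descent with a fixed (hypothetical) learning rate."""
    for _ in range(steps):
        _, grad_w, grad_b = loss_and_grad(w, b, xs, ys)
        w, b = w - rate * grad_w, b - rate * grad_b
    return w, b

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1, -1, 1, 1]

initial_loss, _, _ = loss_and_grad(0.1, 0.0, xs, ys)
w, b = train(0.1, 0.0, xs, ys)
final_loss, _, _ = loss_and_grad(w, b, xs, ys)
print(initial_loss, final_loss)

# Saturated start: every token is wrong, yet the gradient is nearly zero.
stuck_loss, stuck_grad_w, _ = loss_and_grad(-10.0, 0.0, xs, ys)
print(stuck_loss, stuck_grad_w)
```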
Introduction of neural network
$\hat{y}_i$ is a function of $x_i$.
For a linear classifier, this function takes a simple form.
What if we need more complicated functions?
[Diagram: the inputs $x_{i1}, \ldots, x_{id}$ and a constant 1 feed $a_i$, followed by $\hat{y}_i$]
Introduction of neural network
[Diagrams, built up over three slides: the single unit is stacked into layers. Each layer takes the previous layer's output and a constant 1, forms a linear output, and applies $f(\cdot)$. The slides show layers 0 and 1, then layers 0 to 3, then the general pair of layers $l-1$ and $l$.]
Notation
Input: $y_0$
Hidden layer 1: $a_1 = b_1 + W_1 y_0$, $y_1 = f(a_1)$
Layer 2: $a_2 = b_2 + W_2 y_1$, $y_2 = f(a_2)$
Notation
Layer $l$: $a_l = b_l + W_l y_{l-1}$, $y_l = f(a_l)$
Layer $l-1$: $a_{l-1} = b_{l-1} + W_{l-1} y_{l-2}$, $y_{l-1} = f(a_{l-1})$
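The layer-by-layer recursion, in which each layer applies a nonlinearity to a bias plus a weighted sum of the previous layer's outputs, can be sketched as a forward pass. The code assumes the logistic function in every layer and uses made-up weights; the function names and the list-of-rows weight layout are illustrative choices, not the slides' notation.

```python
import math

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

def layer(W, b, y_prev, f):
    """One layer: apply f to b_j + sum_k W[j][k] * y_prev[k] for each node j."""
    return [f(b_j + sum(w_jk * y_k for w_jk, y_k in zip(row, y_prev)))
            for row, b_j in zip(W, b)]

def forward(layers, y0, f=logistic):
    """Feed the input through every (W, b) pair in turn."""
    y = y0
    for W, b in layers:
        y = layer(W, b, y, f)
    return y

# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output node.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -0.5]),
    ([[2.0, -2.0]], [0.1]),
]
output = forward(layers, [1.0, 2.0])
print(output)
```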
Question
$y_L$ is a function of $y_0$: how many kinds of function can be represented by a neural net?
Are sigmoid functions good candidates for $f(\cdot)$?
Answer: Given enough nodes, a 3-layer network with sigmoid or linear activation functions can approximate ANY function with bounded support sufficiently accurately.
Proof: representation power
For simplicity, we only consider functions of one variable, with real input and real output. In this case, two layers are enough.
$g(x) \approx \sum_{k=1}^{K} [g(k+1) - g(k)] f(x - k)$
Proof: representation power
First layer: $K$ hidden nodes; for input $y_0 = x$, node $k$ represents $y_{1k} = f(y_0 - k)$, where $f$ is the logistic function, $W_{1k} = 1$ and $b_{1k} = -k$.
Second (output) layer: $W_{2k} = g(k+1) - g(k)$
$g(x) \approx \sum_{k=1}^{K} [g(k+1) - g(k)] f(x - k)$
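The construction can be tried numerically: hidden logistic nodes act as shifted soft steps, and the output layer weights each one by the increment of the target function over one grid cell. The sketch below goes beyond the slide in two ways, both assumptions: it adds a grid spacing `delta` and a `sharpness` factor so that the accuracy can be tuned, and it uses a made-up target function with bounded support.

```python
import math

def logistic(a):
    """Logistic function, written to avoid overflow for large |a|."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def target(x):
    """Hypothetical target function with bounded support [1, 11]."""
    return max(0.0, 1.0 - ((x - 6.0) / 5.0) ** 2)

def two_layer_net(g, x, start, stop, delta, sharpness):
    """Hidden node k is a soft step at t_k; its output weight is g(t_k + delta) - g(t_k)."""
    n_nodes = int(round((stop - start) / delta))
    output = g(start)  # output bias; zero for this target
    for k in range(n_nodes):
        t_k = start + k * delta
        hidden = logistic(sharpness * (x - t_k))
        output += (g(t_k + delta) - g(t_k)) * hidden
    return output

def max_error(delta, sharpness=200.0):
    """Largest absolute error on a grid of test points from 1 to 11."""
    points = [1.0 + 0.05 * j for j in range(201)]
    return max(abs(two_layer_net(target, x, 1.0, 11.0, delta, sharpness) - target(x))
               for x in points)

coarse = max_error(delta=1.0)   # 10 hidden nodes
fine = max_error(delta=0.1)     # 100 hidden nodes
print(coarse, fine)             # more nodes should give a smaller error
```

With unit spacing the staircase is coarse; shrinking `delta` adds hidden nodes and tightens the fit, which is the sense in which enough nodes give a sufficiently accurate approximation.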