
Advanced Neural Network Architectures and Training Techniques
Explore auto-associative neural networks, bottleneck constraints, and their ability to learn complex functions. Learn about the challenges of training deep networks and techniques such as generative pre-training and stochastic restricted Boltzmann machines. Dive into deterministic shallow autoencoders and other advanced neural network concepts.
Presentation Transcript
Submitted by: Ankit Bhutani (Y9227094). Supervised by: Prof. Amitabha Mukerjee and Prof. K S Venkatesh.
AUTO-ASSOCIATIVE NEURAL NETWORKS
OUTPUT SIMILAR TO INPUT
BOTTLENECK CONSTRAINT
LINEAR ACTIVATION: PCA [Baldi et al., 1989]
NON-LINEAR PCA [Kramer, 1991]: 5-LAYERED NETWORK, ALTERNATING SIGMOID AND LINEAR ACTIVATIONS, EXTRACTS NON-LINEAR FACTORS
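The slide's claim that a linear bottleneck auto-associator performs PCA can be illustrated with a short NumPy sketch. This is my own illustration on assumed toy data and hyperparameters, not code from the thesis; following Baldi et al. (1989), the trained weights span the same subspace as the leading principal components.

```python
# Minimal sketch of a linear bottleneck auto-associator (illustrative only).
# With linear activations and squared error, the optimal hidden subspace
# coincides with the top principal components of the (centred) data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))                    # toy data: 500 samples, 20 features
X -= X.mean(axis=0)                               # centre the data, as PCA assumes

n_hidden, lr = 5, 0.01                            # bottleneck width, learning rate
W1 = rng.normal(scale=0.1, size=(20, n_hidden))   # encoder weights
W2 = rng.normal(scale=0.1, size=(n_hidden, 20))   # decoder weights

for _ in range(1000):
    H = X @ W1                                    # linear hidden code
    Z = H @ W2                                    # linear reconstruction
    err = Z - X                                   # reconstruction error
    grad_W2 = H.T @ err / len(X)                  # gradient of mean squared error
    grad_W1 = X.T @ (err @ W2.T) / len(X)
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2

print("reconstruction MSE:", np.mean((X @ W1 @ W2 - X) ** 2))
```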
ABILITY TO LEARN HIGHLY COMPLEX FUNCTIONS
TACKLES THE NON-LINEAR STRUCTURE OF THE UNDERLYING DATA
HIERARCHICAL REPRESENTATION
RESULTS FROM CIRCUIT THEORY: A SINGLE-LAYERED NETWORK WOULD NEED AN EXPONENTIALLY LARGE NUMBER OF HIDDEN UNITS
DIFFICULTY IN TRAINING DEEP NETWORKS
NON-CONVEX NATURE OF OPTIMIZATION: GETS STUCK IN LOCAL MINIMA
VANISHING GRADIENTS DURING BACKPROPAGATION
SOLUTION: "INITIAL WEIGHTS MUST BE CLOSE TO A GOOD SOLUTION" [Hinton et al., 2006]
GENERATIVE PRE-TRAINING FOLLOWED BY FINE-TUNING
PRE-TRAINING
INCREMENTAL LAYER-WISE TRAINING
EACH LAYER ONLY TRIES TO REPRODUCE THE HIDDEN-LAYER ACTIVATIONS OF THE PREVIOUS LAYER
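The pre-training recipe on this slide can be sketched in NumPy. The code below is a rough illustration under assumptions of my own (tied weights, squared error, plain gradient descent; the helper names train_shallow_autoencoder and pretrain are hypothetical), not the project's actual implementation.

```python
# Rough sketch of greedy layer-wise pre-training: each shallow autoencoder is
# trained only to reproduce the hidden-layer activations of the layer below it.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_shallow_autoencoder(data, n_hidden, lr=0.1, epochs=50):
    """Train one tied-weight sigmoid autoencoder on `data`; return encoder params."""
    n_vis = data.shape[1]
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(n_vis, n_hidden))
    b_h, b_v = np.zeros(n_hidden), np.zeros(n_vis)
    for _ in range(epochs):
        h = sigmoid(data @ W + b_h)               # encode
        z = sigmoid(h @ W.T + b_v)                # decode with tied weights
        d_z = (z - data) * z * (1 - z)            # squared-error delta at the output
        d_h = (d_z @ W) * h * (1 - h)             # delta backpropagated to the hidden layer
        W -= lr * (data.T @ d_h + d_z.T @ h) / len(data)
        b_h -= lr * d_h.mean(axis=0)
        b_v -= lr * d_z.mean(axis=0)
    return W, b_h

def pretrain(X, layer_sizes):
    """Stack shallow autoencoders, each trained on the previous layer's activations."""
    weights, inputs = [], X
    for n_hidden in layer_sizes:
        W, b = train_shallow_autoencoder(inputs, n_hidden)
        weights.append((W, b))
        inputs = sigmoid(inputs @ W + b)          # these activations feed the next layer
    return weights
```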
FINE-TUNING
INITIALIZE THE DEEP AUTOENCODER WITH THE WEIGHTS LEARNT BY PRE-TRAINING
PERFORM BACKPROPAGATION AS USUAL
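Continuing the hypothetical sketch above (it reuses that block's sigmoid helper and the weight list returned by pretrain), fine-tuning unrolls the pre-trained weights into a deep autoencoder and runs ordinary backpropagation on the inputs.

```python
# Fine-tuning sketch: initialise a deep autoencoder from the pre-trained encoder
# weights (decoder layers start as their transposes), then backpropagate as usual.
def fine_tune(X, pretrained, lr=0.05, epochs=100):
    layers = [[W.copy(), b.copy()] for W, b in pretrained]                            # encoder half
    layers += [[W.T.copy(), np.zeros(W.shape[0])] for W, _ in reversed(pretrained)]   # decoder half
    for _ in range(epochs):
        acts = [X]
        for W, b in layers:                        # forward pass through the full stack
            acts.append(sigmoid(acts[-1] @ W + b))
        delta = (acts[-1] - X) * acts[-1] * (1 - acts[-1])   # squared-error delta at the output
        for i in reversed(range(len(layers))):     # backward pass, updating layer by layer
            W, b = layers[i]
            grad_W = acts[i].T @ delta / len(X)
            grad_b = delta.mean(axis=0)
            if i > 0:
                delta = (delta @ W.T) * acts[i] * (1 - acts[i])
            W -= lr * grad_W
            b -= lr * grad_b
    return layers
```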
STOCHASTIC: RESTRICTED BOLTZMANN MACHINES (RBMs)
HIDDEN-LAYER ACTIVATIONS (0-1) ARE USED TO TAKE A PROBABILISTIC DECISION OF SAMPLING 0 OR 1
MODEL LEARNS THE JOINT PROBABILITY OF TWO BINARY DISTRIBUTIONS: ONE OVER THE INPUT AND THE OTHER OVER THE HIDDEN LAYER
EXACT METHODS ARE COMPUTATIONALLY INTRACTABLE
NUMERICAL APPROXIMATION: CONTRASTIVE DIVERGENCE
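For the contrastive-divergence approximation mentioned on the slide, here is a hedged single-step (CD-1) sketch for a binary RBM. The function name cd1_update and the hyperparameters are my own placeholders; this is a generic textbook-style update, not the thesis code.

```python
# One step of contrastive divergence (CD-1) for a binary RBM: sample the hidden
# units from the data, reconstruct the visibles, and update the parameters with
# the difference of positive-phase and negative-phase statistics.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_update(v0, W, b_h, b_v, lr=0.1):
    p_h0 = sigmoid(v0 @ W + b_h)                  # P(h = 1 | v0), positive phase
    h0 = (rng.random(p_h0.shape) < p_h0) * 1.0    # stochastic binary hidden states
    p_v1 = sigmoid(h0 @ W.T + b_v)                # reconstruction P(v = 1 | h0)
    p_h1 = sigmoid(p_v1 @ W + b_h)                # P(h = 1 | reconstruction), negative phase
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)
    b_v += lr * (v0 - p_v1).mean(axis=0)
    return W, b_h, b_v
```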
DETERMINISTIC: SHALLOW AUTOENCODERS
HIDDEN-LAYER ACTIVATIONS (0-1) ARE USED DIRECTLY AS INPUT TO THE NEXT LAYER
TRAINED BY BACKPROPAGATION
VARIANTS: DENOISING AUTOENCODERS, CONTRACTIVE AUTOENCODERS, SPARSE AUTOENCODERS
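As an example of the denoising variant listed above, the sketch below shows the masking-noise corruption step; the corruption level and function name are assumptions for illustration, not details taken from the slides.

```python
# Denoising-autoencoder idea: feed the encoder a corrupted copy of the input
# but train it to reconstruct the original, clean input.
import numpy as np

rng = np.random.default_rng(0)

def corrupt(X, corruption_level=0.3):
    """Masking noise: randomly zero out a fraction of each input vector."""
    mask = rng.random(X.shape) > corruption_level
    return X * mask

# Hypothetical usage: the forward pass encodes corrupt(X), while the
# reconstruction error is still measured against the uncorrupted X.
```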
TASK \ MODEL | RBM | SHALLOW AE
CLASSIFIER | [Hinton et al., 2006] and many others since then | Investigated by [Bengio et al., 2007], [Ranzato et al., 2007], [Vincent et al., 2008], [Rifai et al., 2011], etc.
DEEP AE | [Hinton & Salakhutdinov, 2006] | No significant results reported in literature - GAP
MNIST, Big and Small Digits
Square & Room, 2D Robot Arm, 3D Robot Arm
Libraries used: Numpy, Scipy; Theano takes care of parallelization
GPU specifications: Tesla C1060, 240 cores, 256 MB memory, 33 MHz frequency
REVERSE CROSS-ENTROPY
X: ORIGINAL INPUT
Z: OUTPUT
PARAMETERS: WEIGHTS AND BIASES
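The slide names the cost but does not show it; a common form for an autoencoder with sigmoid outputs, assumed here purely for illustration, is the cross-entropy between the input X and the reconstruction Z.

```python
# Cross-entropy reconstruction cost between input X and output Z (both in [0, 1]):
#   L(x, z) = -sum_k [ x_k * log(z_k) + (1 - x_k) * log(1 - z_k) ]
import numpy as np

def cross_entropy_reconstruction_cost(X, Z, eps=1e-10):
    """Mean cross-entropy over a batch; eps guards against log(0)."""
    Z = np.clip(Z, eps, 1 - eps)
    return -np.mean(np.sum(X * np.log(Z) + (1 - X) * np.log(1 - Z), axis=1))
```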
RESULTS FROM PRELIMINARY EXPERIMENTS
TIME TAKEN FOR TRAINING
CONTRACTIVE AUTOENCODERS TAKE VERY LONG TO TRAIN
EXPERIMENT USING SPARSE REPRESENTATIONS
STRATEGY A: BOTTLENECK
STRATEGY B: SPARSITY + BOTTLENECK
STRATEGY C: NO CONSTRAINT + BOTTLENECK
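The slides do not spell out how the sparsity constraint in Strategy B is imposed; one common choice, assumed here only for illustration, is a KL-divergence penalty that pushes each hidden unit's mean activation towards a small target value.

```python
# Hypothetical sparsity penalty: KL divergence between a target activation rho
# and the empirical mean activation of each hidden unit, plus its gradient.
import numpy as np

def sparsity_penalty_and_grad(H, rho=0.05, beta=3.0):
    """H holds hidden activations (batch x units); returns penalty and dPenalty/dH."""
    rho_hat = H.mean(axis=0)                      # mean activation per hidden unit
    penalty = beta * np.sum(rho * np.log(rho / rho_hat)
                            + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    grad = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / len(H)
    return penalty, grad                          # grad broadcasts over the batch axis
```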
MOMENTUM
INCORPORATES THE PREVIOUS UPDATE
CANCELS OUT COMPONENTS IN OPPOSITE DIRECTIONS: PREVENTS OSCILLATION
ADDS UP COMPONENTS IN THE SAME DIRECTION: SPEEDS UP TRAINING
WEIGHT DECAY
REGULARIZATION: PREVENTS OVER-FITTING
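The momentum and weight-decay heuristics on this slide correspond to the standard update rule sketched below; the learning rate, momentum coefficient, and decay value are placeholders, not the values used in the experiments.

```python
# One gradient step with momentum (reuses the previous update direction) and
# L2 weight decay (shrinks the weights to regularize and prevent over-fitting).
import numpy as np

def momentum_weight_decay_step(W, grad, velocity, lr=0.1, momentum=0.9, decay=1e-4):
    velocity = momentum * velocity - lr * (grad + decay * W)
    return W + velocity, velocity
```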
USING ALTERNATE-LAYER SPARSITY WITH MOMENTUM & WEIGHT DECAY YIELDS THE BEST RESULTS