
Image Recognition and Deep Learning Principles
Explore the evolution and key principles of image recognition, from early algorithms to deep learning basics. Discover the importance of data collections, model invariance, parameter management, and network capacity in developing image recognition systems. Dive into the realm of deep learning and its ability to capture complex feature spaces through hierarchical structures. Uncover how these advancements impact various tasks like object detection and semantic segmentation, paving the way for enhanced image understanding.
Presentation Transcript
VALSE Webinar: ICCV Pre-conference
SORT & Genetic CNN. Speaker: Lingxi Xie. Slides are available on my homepage (TALKS)! Department of Computer Science, The Johns Hopkins University, http://lingxixie.com/
We Focus on Image Recognition
Image recognition (classification) is important: it is the most basic goal of understanding an image, data are easy to collect, and large-scale datasets are available. Recognition is of little use on its own, but it helps other tasks: many of them, including instance retrieval, object detection, semantic segmentation, and boundary detection, benefit from models pre-trained on a large dataset. Meanwhile, the recognition task itself is still developing: a single label is not enough to describe an image, and recognition is being combined with natural language processing. 4/8/2025 VALSE Webinar 2017 2
Brief History: Image Recognition
Image recognition is a fundamental task: it is clearly defined, and labeled data are easy to obtain. Development in datasets: small datasets went from two classes to a few classes; mid-level datasets have tens or hundreds of classes; in the current age, datasets have more than 10,000 classes [Deng, 2009]. Evolution in algorithms: the early years used global features, e.g., color histograms; from the 2000s, local features, e.g., SIFT; in the current age, deep neural networks, e.g., AlexNet.
Key Principles: Image Recognition
Principle #1, invariance: the ability to model and capture invariance determines how well a representation transfers. Local features are often more repeatable than global features; example: handcrafted features moved from global to local. Principle #2, parameters: a large number of parameters often increases the risk of over-fitting; example: neuron connectivity moved from fully-connected to convolutional (partial connectivity and weight sharing). Principle #3, capacity: a model with a large capacity benefits more from an increase in data; example: network structures moved from shallow to deep.
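To make Principle #2 concrete, the sketch below counts parameters for a fully-connected layer and a convolutional layer that map the same input to an output of the same size. The sizes are arbitrary assumptions chosen for illustration: a 32×32×3 input, 64 output channels, and 3×3 kernels.

```python
def fc_params(h: int, w: int, c_in: int, c_out: int) -> int:
    """Fully-connected: every output unit is connected to every input value."""
    n_in = h * w * c_in
    n_out = h * w * c_out  # same output size as the conv layer, for a fair comparison
    return n_in * n_out + n_out  # weights + biases


def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Convolutional: local k x k connectivity, with weights shared across positions."""
    return k * k * c_in * c_out + c_out  # weights + biases


fc = fc_params(32, 32, 3, 64)
conv = conv_params(3, 3, 64)
print(f"fully-connected: {fc:,}")  # fully-connected: 201,392,128
print(f"convolutional:   {conv:,}")  # convolutional:   1,792
```

Partial connectivity and weight sharing together reduce the count by about five orders of magnitude in this example. This is the kind of reduction the slide credits with lowering the over-fitting risk.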
Deep Learning Basics
Deep learning constructs a very complicated mathematical function from a hierarchy of differentiable operations: we provide a large function space and let the data speak for themselves. The hierarchy often appears as a network structure, and the operations are often illustrated as links between neurons. People tend to believe that a network with sufficient depth and a sufficient number of neurons is able to fit any complicated feature space.
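The reason every operation must be differentiable is that gradients flow through the hierarchy by the chain rule. The sketch below makes this visible with a hypothetical three-step scalar "network": affine, then tanh, then linear. The analytic gradient is checked against a finite difference. The functions, values, and names are illustrative assumptions; real frameworks compute such gradients with automatic differentiation.

```python
import math


def forward(x, w1, b1, w2):
    """A tiny hierarchy of differentiable operations: affine -> tanh -> linear."""
    z = w1 * x + b1   # operation 1: affine map
    h = math.tanh(z)  # operation 2: nonlinearity
    y = w2 * h        # operation 3: linear read-out
    return y, z, h


def grad_w1(x, w1, b1, w2):
    """Analytic gradient by the chain rule: dy/dw1 = dy/dh * dh/dz * dz/dw1."""
    _, _, h = forward(x, w1, b1, w2)
    return w2 * (1.0 - h * h) * x  # uses tanh'(z) = 1 - tanh(z)^2


def numeric_grad_w1(x, w1, b1, w2, eps=1e-6):
    """Central finite difference, used only to check the analytic gradient."""
    y_plus = forward(x, w1 + eps, b1, w2)[0]
    y_minus = forward(x, w1 - eps, b1, w2)[0]
    return (y_plus - y_minus) / (2 * eps)


x, w1, b1, w2 = 0.5, 1.2, -0.3, 2.0  # arbitrary example values
analytic = grad_w1(x, w1, b1, w2)
numeric = numeric_grad_w1(x, w1, b1, w2)
print(analytic, numeric)
```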
Recognition: Background
Deeper architectures: AlexNet, the first deep network for large-scale recognition (8 layers); VGGNet, deeper structures (16 or 19 layers); GoogLeNet, multi-scale and multi-path (22 layers); ResNet, deeper networks with shortcut connections (50, 101 layers or more); DenseNet, dense connections between layers (100+ layers).
Recognition: Background (cont.)
Towards efficient network training: the basic elements are the learning rate, mini-batches, and momentum; ReLU is a nonlinear unit that alleviates vanishing gradients; Dropout introduces randomness to prevent over-fitting; batch normalization improves numerical stability.
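Here is a minimal NumPy sketch of three of the training elements above: ReLU, (inverted) dropout, and training-mode batch normalization. It is a simplification under stated assumptions. The batch shape and dropout rate are arbitrary, and batch normalization's running statistics for test time, as well as its learned-parameter updates, are omitted.

```python
import numpy as np


def relu(x):
    """ReLU: identity for positive inputs, zero otherwise."""
    return np.maximum(x, 0.0)


def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p during training and
    rescale the survivors by 1/(1-p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)


def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over the batch axis (axis 0)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=5.0, size=(256, 4))  # a mini-batch: 256 samples, 4 features
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(6), y.std(axis=0).round(3))  # roughly 0 and 1 per feature
z = dropout(relu(y), p=0.5, rng=rng)
```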
Our Work on Image Recognition
Novel network modules: L. Xie et al., Towards Reversal-Invariant Image Representation, ICCV 2015, IJCV 2017; L. Xie et al., Geometric Neural Phrase Pooling: Modeling the Spatial Co-occurrence of Neurons, ECCV 2016; Y. Wang et al., SORT: Second-Order Response Transform for Visual Recognition, ICCV 2017. A new training strategy: L. Xie et al., DisturbLabel: Regularizing CNN on the Loss Layer, CVPR 2016. Automatically discovering new network structures: L. Xie et al., Genetic CNN, ICCV 2017.
ICCV 2017: SORT: Second-Order Response Transform for Visual Recognition
Speaker: Lingxi Xie. Authors: Yan Wang, Lingxi Xie, Chenxi Liu, Siyuan Qiao, Ya Zhang, Wenjun Zhang, Qi Tian, Alan Yuille. Department of Computer Science, The Johns Hopkins University, http://lingxixie.com/
Outline
Introduction; Second-Order Response Transform; Experiments; Conclusions and Future Work
Introduction
Deep learning is the state-of-the-art machine learning theory. It uses a cascade of many layers of nonlinear neurons for feature extraction and transformation, and it learns multiple levels of feature representation: higher-level features are derived from lower-level features to form a hierarchical architecture, and multiple levels of representation correspond to different levels of abstraction.
Introduction (cont.)
Convolutional neural networks are a fundamental machine learning tool. They perform well on a wide range of problems in computer vision as well as in other research areas, and they have driven progress in many real-world applications. Theory: a multi-layer, hierarchical network often has a larger capacity, but it also requires a larger amount of data to train.
Outline
Introduction; Second-Order Response Transform; Experiments; Conclusions and Future Work
Motivation
The representation ability of deep neural networks comes from the composition of nonlinear functions. Currently, the main sources of nonlinearity are the ReLU (or sigmoid) activation and the max-pooling operation. We add a second-order term into the network to increase its nonlinearity.
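Branched Network Structures
An input data cube $\mathbf{x}$ is fed into two parallel modules, producing intermediate outputs $F_1(\mathbf{x};\theta_1)$ and $F_2(\mathbf{x};\theta_2)$, which are then fused into an output cube $\mathbf{y}$. Example 1: in the Maxout network, $F_1(\mathbf{x}) = \mathbf{W}_1\mathbf{x}$, $F_2(\mathbf{x}) = \mathbf{W}_2\mathbf{x}$, and $\mathbf{y} = \max\{F_1(\mathbf{x}), F_2(\mathbf{x})\}$ (element-wise). Example 2: in the deep ResNet, $F_1(\mathbf{x}) = \mathbf{x}$, $F_2(\mathbf{x})$ is the residual branch (weighted layers interleaved with a nonlinearity $\sigma$), and $\mathbf{y} = F_1(\mathbf{x}) + F_2(\mathbf{x})$.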
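The two examples differ only in their branch functions and fusion rule. The NumPy sketch below writes both as two-branch blocks. The assumptions are these: dense matrices stand in for convolutions, the feature size `d = 8` is arbitrary, and the residual branch `W2 @ relu(W1 @ x)` is a hypothetical stand-in for the convolution stack.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=d)
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))


def relu(v):
    return np.maximum(v, 0.0)


def maxout_block(x):
    """Example 1: two linear branches fused by an element-wise max."""
    f1 = W1 @ x
    f2 = W2 @ x
    return np.maximum(f1, f2)


def residual_block(x):
    """Example 2: an identity branch plus a nonlinear branch, fused by a sum."""
    f1 = x
    f2 = W2 @ relu(W1 @ x)  # hypothetical residual branch
    return f1 + f2


print(maxout_block(x).shape, residual_block(x).shape)
```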
Formulation
We add a second-order term into the fusion stage of $F_1(\mathbf{x})$ and $F_2(\mathbf{x})$: $\mathbf{y} = F_1(\mathbf{x}) + F_2(\mathbf{x}) + F_1(\mathbf{x}) \odot F_2(\mathbf{x})$, where $\odot$ is the element-wise product. Implementation details: gradient back-propagation is straightforward, and the operation costs less than 5% extra time and no extra memory.
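Back-propagation through the SORT fusion $y = F_1 + F_2 + F_1 \odot F_2$ is straightforward. The NumPy sketch below shows the forward pass and the hand-derived backward pass, with a finite-difference check. The branch outputs and the upstream gradient are random stand-ins, and the (batch, channels, height, width) shape is an assumption. This is an illustration of the formula, not the authors' implementation.

```python
import numpy as np


def sort_forward(f1, f2):
    """SORT fusion: y = F1 + F2 + F1 * F2, with * the element-wise product."""
    return f1 + f2 + f1 * f2


def sort_backward(f1, f2, grad_y):
    """Given dL/dy, return (dL/dF1, dL/dF2).
    Element-wise: dy/dF1 = 1 + F2 and dy/dF2 = 1 + F1."""
    return grad_y * (1.0 + f2), grad_y * (1.0 + f1)


rng = np.random.default_rng(0)
shape = (2, 3, 4, 4)             # hypothetical (batch, channels, height, width)
f1 = rng.normal(size=shape)      # stand-in for the output of branch 1
f2 = rng.normal(size=shape)      # stand-in for the output of branch 2
grad_y = rng.normal(size=shape)  # stand-in for the upstream gradient dL/dy

y = sort_forward(f1, f2)
g1, g2 = sort_backward(f1, f2, grad_y)


def loss(a):
    """A linear surrogate loss L = sum(grad_y * y), so that dL/dy = grad_y."""
    return np.sum(grad_y * sort_forward(a, f2))


# Check dL/dF1 at one element by central finite differences.
eps = 1e-6
idx = (0, 1, 2, 3)
f1_plus = f1.copy()
f1_plus[idx] += eps
f1_minus = f1.copy()
f1_minus[idx] -= eps
num = (loss(f1_plus) - loss(f1_minus)) / (2 * eps)
print(g1[idx], num)
```

The backward pass needs only the two branch outputs and the upstream gradient. For a residual block, the same function applies with the identity as the first branch: `sort_forward(x, F(x))` gives $x + F(x) + x \odot F(x)$.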
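Illustration
A single-branch network can be improved by SORT after each convolution layer is replaced by a two-branch module.
[Figure: a two-branch block (convolution layers conv-1a, conv-2a, conv-1b, conv-2b, producing $F_1(\mathbf{x})$ and $F_2(\mathbf{x})$) and a residual block (convolution layers conv-a, conv-b, producing $F(\mathbf{x})$), each followed by a fusion step. Original fusion: $\mathbf{y}_R = F_1(\mathbf{x}) + F_2(\mathbf{x})$ and $\mathbf{y}_R = \mathbf{x} + F(\mathbf{x})$. SORT fusion: $\mathbf{y}_S = F_1(\mathbf{x}) + F_2(\mathbf{x}) + F_1(\mathbf{x}) \odot F_2(\mathbf{x})$ and $\mathbf{y}_S = \mathbf{x} + F(\mathbf{x}) + \mathbf{x} \odot F(\mathbf{x})$.]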
Benefit?
What is the benefit of the second-order term? Candidate answers: increasing nonlinearity; the roles of different orders; cross-branch gradient back-propagation; other explanations?
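Increasing the Nonlinearity
ReLU and max are piecewise linear: they are nonlinear only on a lower-dimensional subset of the input space (where the linear pieces meet). A true second-order term, by contrast, is nonlinear over the entire input space.
[Table: error rates (%) of ResNet-20 on CIFAR10 with different fusion functions, including $F_1 + F_2$, $F_1 \odot F_2$, and $\max\{F_1, F_2\}$. The values on the slide are 7.60, 7.55, not converge, 7.63, ?.?? (garbled), 7.64, and 7.90; which value belongs to which fusion function cannot be recovered from the transcript.]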
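The claim that ReLU and max are nonlinear only where their pieces meet, while a product term curves everywhere, can be checked numerically. The sketch below uses a second difference, which estimates curvature, along one straight line in the $(F_1, F_2)$ plane. The line $F_1 = 1 + t$, $F_2 = 2 + 3t$ is an arbitrary assumption. The check illustrates the slide's statement and is not the paper's experiment.

```python
def second_difference(g, t=0.0, h=1e-3):
    """Central second difference (g(t+h) - 2 g(t) + g(t-h)) / h^2, an estimate of g''(t).
    It is zero (up to rounding) wherever g is linear on [t - h, t + h]."""
    return (g(t + h) - 2.0 * g(t) + g(t - h)) / (h * h)


def f1(t):
    """Branch-1 response along an arbitrary straight line in input space."""
    return 1.0 + t


def f2(t):
    """Branch-2 response along the same line."""
    return 2.0 + 3.0 * t


fusions = {
    "F1 + F2": lambda t: f1(t) + f2(t),
    "max(F1, F2)": lambda t: max(f1(t), f2(t)),
    "ReLU(F1)": lambda t: max(f1(t), 0.0),
    "F1 * F2": lambda t: f1(t) * f2(t),
}

for name, g in fusions.items():
    print(f"{name:12s} second difference: {second_difference(g):.3f}")
```

Near $t = 0$, the max and ReLU fusions are exactly linear, so their second difference is zero. The product has a constant second derivative of $2 \cdot 1 \cdot 3 = 6$ along this line.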
The Role of Different Orders
Linear terms help convergence: it is not recommended to use $F_1 \odot F_2$ alone. Nonlinear terms help representation ability: using a second-order term is better than using a piecewise linear term (such as ReLU or max). A combination of linear and nonlinear terms produces the best performance.
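Cross-Branch Gradient Back-Prop
Original form: $\mathbf{y} = F_1(\mathbf{x};\theta_1) + F_2(\mathbf{x};\theta_2)$. Here $\partial\mathbf{y}/\partial\theta_1$ depends only on $\theta_1$, and $\partial\mathbf{y}/\partial\theta_2$ depends only on $\theta_2$. SORT: $\mathbf{y} = F_1(\mathbf{x};\theta_1) + F_2(\mathbf{x};\theta_2) + F_1(\mathbf{x};\theta_1) \odot F_2(\mathbf{x};\theta_2)$. Now both $\partial\mathbf{y}/\partial\theta_1$ and $\partial\mathbf{y}/\partial\theta_2$ depend on both $\theta_1$ and $\theta_2$. A branch can therefore update its parameters based on information from the other branch.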
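Applying the chain rule to the two fusion forms, $\mathbf{y} = F_1 + F_2$ and $\mathbf{y} = F_1 + F_2 + F_1 \odot F_2$, makes the cross-branch dependence explicit. In the notation below, which is assumed for illustration, $\partial F/\partial\theta$ is a Jacobian and $\operatorname{diag}(\cdot)$ turns the element-wise scaling into a matrix product.

```latex
\begin{aligned}
\text{Original:}\quad
\frac{\partial \mathbf{y}}{\partial \theta_1} &= \frac{\partial F_1(\mathbf{x};\theta_1)}{\partial \theta_1},
\qquad
\frac{\partial \mathbf{y}}{\partial \theta_2} = \frac{\partial F_2(\mathbf{x};\theta_2)}{\partial \theta_2},\\
\text{SORT:}\quad
\frac{\partial \mathbf{y}}{\partial \theta_1} &= \operatorname{diag}\bigl(\mathbf{1} + F_2(\mathbf{x};\theta_2)\bigr)\,\frac{\partial F_1(\mathbf{x};\theta_1)}{\partial \theta_1},
\qquad
\frac{\partial \mathbf{y}}{\partial \theta_2} = \operatorname{diag}\bigl(\mathbf{1} + F_1(\mathbf{x};\theta_1)\bigr)\,\frac{\partial F_2(\mathbf{x};\theta_2)}{\partial \theta_2}.
\end{aligned}
```

In the SORT case, the factor $\mathbf{1} + F_2(\mathbf{x};\theta_2)$ carries information from branch 2 into the update of $\theta_1$, and symmetrically for $\theta_2$.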
Any Other Explanations?
This is still an open problem! Possible options: using a nonlinear kernel in visual recognition; gating, a popular idea in recurrent CNNs; the mask operation in the attention model.
Outline
Introduction; Second-Order Response Transform; Experiments; Conclusions and Future Work
Small-Scale Experiments
Datasets: CIFAR10, CIFAR100, SVHN. Networks: LeNet (5 layers), BigNet (11 layers), ResNet (20, 32, and 56 layers), WideResNet (28 layers).
Small-Scale Results
Error rates (%). For the last six networks, each cell gives the baseline result followed by the result with SORT; the SORT values are garbled in the transcript.
Network          CIFAR10          CIFAR100         SVHN
DSN (2014)       7.97             34.57            1.92
r-CNN (2015)     7.09             31.75            1.77
GePool (2016)    6.05             32.37            1.69
WRN (2016)       5.37             24.53            1.85
StocNet (2016)   5.25             24.98            1.75
DenNet (2017)    3.74             19.25            1.59
LeNet*           11.10 / ??.??    36.93 / ??.??    2.55 / ?.??
BigNet*          6.84 / ?.??      29.25 / ??.??    1.97 / ?.??
ResNet-20        7.60 / ?.??      30.66 / ??.??    2.04 / ?.??
ResNet-32        6.72 / ?.??      29.55 / ??.??    2.20 / ?.??
ResNet-56        6.00 / ?.??      27.55 / ??.??    2.22 / ?.??
WRN-28           4.78 / ?.??      22.05 / ??.??    1.80 / ?.??
ImageNet Experiments
Dataset: ILSVRC2012. Networks: AlexNet (8 layers) and ResNet (18, 34, or 50 layers). The Facebook implementation in PyTorch is used.
ImageNet Results
Error rates (%) on ILSVRC2012; the +SORT values are garbled in the transcript.
Network          Top-1 Error   Top-5 Error
AlexNet          43.19         19.87
AlexNet*         36.66         14.79
AlexNet*+SORT    ??.??         ??.??
ResNet-18        30.50         11.07
ResNet-18+SORT   ??.??         ??.??
ResNet-34        27.02         8.77
ResNet-34+SORT   ??.??         ?.??
ResNet-50        24.10         7.11
ResNet-50+SORT   ??.??         ?.??
Outline
Introduction; Second-Order Response Transform; Experiments; Conclusions and Future Work
Conclusions
SORT is a simple idea to improve deep networks. It is effective (accuracy is boosted consistently), efficient (a lightweight operation that needs less than 2% extra time and no extra memory), and applicable to a wide range of networks. The role of different terms: first-order terms provide the basic property and convergence; second-order terms provide nonlinearity.
Future Work
Applying SORT to concatenation modules? (Inception, ResNeXt, DenseNet, etc.) Adding other terms? (even higher-order or arbitrary polynomial terms; non-polynomial terms) Application to recurrent neural networks?
ICCV 2017: Genetic CNN
Speaker: Lingxi Xie. Authors: Lingxi Xie, Alan Yuille. Department of Computer Science, The Johns Hopkins University, http://lingxixie.com/
Outline
Introduction; Designing CNN Structures; Genetic CNN; Experiments; Discussions and Conclusions
Outline
Introduction; Designing CNN Structures; Genetic CNN; Experiments; Discussions and Conclusions
Designing CNN Structures
History: from linear to nonlinear, from shallow to deep, from fully-connected to convolutional. Today: a cascade of various types of nonlinear units; typical units are convolution, pooling, activation, etc.
Example Networks
LeNet [LeCun et al., 1998]
Example Networks (cont.)
AlexNet [Krizhevsky et al., 2012]
Example Networks (cont.)
Other deep networks: VGGNet [Simonyan et al., 2014]; GoogLeNet (Inception) [Szegedy et al., 2014]; Deep ResNet [He et al., 2016]; DenseNet [Huang et al., 2016]
Problem
All these network architectures are fixed, which limits the ability and complexity of the networks. There are some exceptions, such as the Stochastic Network [Huang et al., 2016], which allows the network to skip some layers during training. We point out, however, that this is still a fixed structure with a stochastic training strategy.
Outline
Introduction; Designing CNN Structures; Genetic CNN; Experiments; Discussions and Conclusions
General Idea
Model a large family of CNN architectures as a solution space; in this work, each architecture is encoded into a binary string of fixed length. Use an efficient search algorithm to explore good candidates; in this work, the genetic algorithm is used.
The Genetic Algorithm
A metaheuristic inspired by the process of natural selection, belonging to the larger class of evolutionary algorithms. It is commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover, and selection. https://en.wikipedia.org/wiki/Genetic_algorithm
The Genetic Algorithm (cont.)
Typical requirements of a genetic process: a genetic representation of each individual (a sample in the solution space), and a function to evaluate each individual (a cost or loss function).
The Genetic Algorithm (cont.)
Flowchart of a genetic process. Initialization: generate a population of individuals to start with. Selection: determine which individuals survive. Genetic operations: crossover, mutation, etc. Iteration: repeat selection and the genetic operations several times, and end the process when a termination condition holds.
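The four steps of the flowchart map directly onto a short loop. The sketch below runs them on OneMax (maximize the number of 1s in a bit string), a hypothetical stand-in for the real evaluation; in an architecture search, evaluating an individual would mean training and testing a network, which is far more expensive. Tournament selection, single-point crossover, bit-flip mutation, and all parameter values are common choices assumed here, not necessarily the ones used in the talk.

```python
import random


def fitness(bits):
    """Toy objective (OneMax): the number of 1s. Higher is better."""
    return sum(bits)


def select(population, rng, k=2):
    """Tournament selection: the fittest of k randomly drawn individuals survives."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """Single-point crossover: exchange the tails of two parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - x if rng.random() < rate else x for x in bits]


def genetic_algorithm(length=20, pop_size=30, generations=60, rate=0.02, seed=0):
    rng = random.Random(seed)
    # Initialization: a random population.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    # Iteration: terminate after a fixed number of generations.
    for _ in range(generations):
        # Selection.
        parents = [select(population, rng) for _ in range(pop_size)]
        # Genetic operations: crossover on consecutive pairs, then mutation.
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            for child in crossover(a, b, rng):
                children.append(mutate(child, rate, rng))
        population = children
    return max(population, key=fitness)


best = genetic_algorithm()
print(fitness(best), best)
```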
The Genetic Algorithm (cont.)
Example: the Traveling Salesman Problem (TSP), i.e., finding the shortest Hamiltonian cycle (a tour that visits every town exactly once) over $n$ towns. A typical genetic algorithm for TSP: Genetic representation: a permutation of $n$ numbers. Cost function: the total length of the current tour. Crossover: switching sub-sequences between two tours. Mutation: switching the positions of two towns in a tour. Termination: after a fixed number of generations.
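The TSP recipe above can be sketched in Python as follows. One substitution is made deliberately. Naively swapping sub-sequences between two permutations can duplicate or drop towns, so this sketch uses order crossover (OX): it keeps a slice of one parent and fills the remaining positions in the other parent's order, which always yields a valid permutation. Selection keeps the shorter half of the population. The instance (12 towns on a unit circle, whose optimal tour is the 12-gon perimeter), the parameters, and the function names are assumptions for illustration.

```python
import math
import random


def tour_length(tour, coords):
    """Cost function: total length of the closed tour, returning to the start town."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def order_crossover(a, b, rng):
    """Order crossover (OX): copy a slice of parent a, then fill the remaining
    positions with the missing towns in the order they appear in parent b."""
    n = len(a)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = a[i:j + 1]
    kept = set(a[i:j + 1])
    fill = [town for town in b if town not in kept]
    free = [p for p in range(n) if child[p] is None]
    for p, town in zip(free, fill):
        child[p] = town
    return child


def swap_mutation(tour, rate, rng):
    """With probability `rate`, switch the positions of two towns."""
    tour = tour[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour


def ga_tsp(coords, pop_size=60, generations=300, rate=0.3, seed=0):
    rng = random.Random(seed)
    n = len(coords)

    def cost(t):
        return tour_length(t, coords)

    # Genetic representation: each individual is a permutation of range(n).
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):  # termination: fixed number of generations
        population.sort(key=cost)
        survivors = population[: pop_size // 2]  # selection: keep the shorter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(swap_mutation(order_crossover(a, b, rng), rate, rng))
        population = survivors + children
    return min(population, key=cost)


# Hypothetical instance: 12 towns evenly spaced on a unit circle.
towns = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12)) for k in range(12)]
best = ga_tsp(towns)
print(round(tour_length(best, towns), 4), best)
```

Because the survivors are carried over unchanged, the best tour length never increases from one generation to the next.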