Masked Neural Networks: Learning Multiple Datasets Simultaneously

Explore how a masked neural network can learn multiple datasets simultaneously, leveraging the Levenberg-Marquardt optimization algorithm. The datasets share the same input vectors but have different target output values; each one is learned under a specifically selected mask, showcasing neural network parameter switching and optimization techniques.

  • Neural Networks
  • Levenberg-Marquardt
  • Masking Mechanism
  • Multiple Datasets
  • Optimization

Presentation Transcript


  1. NEURAL NETWORKS Prof. Dr. Mehmet Önder Efe Department of Computer Engineering Hacettepe University, Ankara, Turkey http://web.cs.hacettepe.edu.tr/~onderefe/ e-mail : onderefe@gmail.com

  2. Multiple Dataset Learning Can a neural network learn multiple datasets simultaneously? Consider a simple problem: each dataset has the same input vectors but different target output values. Find a masked NN such that each of the datasets is learned under a particularly selected mask.
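
A minimal sketch of this setting in Python/NumPy; the two target functions and sample count are purely illustrative, not from the presentation:

```python
import numpy as np

# Shared input vectors: 50 samples of a 1-D regression problem (illustrative)
u = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)

# Two datasets with identical inputs but different target outputs
datasets = [
    {"inputs": u, "targets": np.sin(np.pi * u)},        # dataset 1
    {"inputs": u, "targets": 0.5 * np.cos(np.pi * u)},  # dataset 2
]
```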

  3. Introduction The Levenberg-Marquardt (LM) algorithm is a powerful approach to optimizing the parameters of a neural network (NN). Given a training dataset, the algorithm synthesizes the best path toward the optimum. This work demonstrates the use of the LM optimization algorithm when more than one dataset is available and on/off type switching of NN parameters is allowed.

  4. Introduction For each dataset, a pre-selected set of parameters is allowed to be modified, and the proposed scheme reformulates the Jacobian under this switching mechanism. The results show that a NN can store the information available in different datasets through a simple modification of the original LM algorithm. The results are verified on a regression problem.

  5. Introduction

  6. On/Off Type Masking According to the figure, we deduce two immediate requirements for the masking mechanism. First, every single parameter (weight or bias) must be active for at least one dataset. Second, the masking scheme must allow information flow from each input to each output.
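
The first requirement can be checked mechanically once the masks are chosen. A small Python/NumPy sketch, assuming the masks are stored as a D x N binary array (the layout and function name are assumptions, not from the slides); the second requirement depends on the network topology and is not covered by this check:

```python
import numpy as np

def every_parameter_used(masks):
    """True if each of the N parameters is active (mask entry 1) for at
    least one of the D datasets; `masks` has shape (D, N)."""
    masks = np.asarray(masks)
    return bool(np.all(masks.sum(axis=0) >= 1))

# Example: 3 datasets, 5 parameters; the 4th parameter is never active
m = np.array([[1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0],
              [1, 1, 0, 0, 1]])
print(every_parameter_used(m))  # False: column 4 is all zeros
```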

  7. Nomenclature
     t : discrete time index
     y(t) (y_i(t)) : output vector of the neural network at time t (ith entry of y at time t)
     ŷ(t) (ŷ_i(t)) : target output vector at time t (ith entry of ŷ at time t)
     u(t) (u_i(t)) : input vector of the neural network at time t (ith entry of u at time t)
     w(t) (w_i(t)) : parameter (weight/bias) vector of the neural network at time t (ith entry of w at time t)
     I : identity matrix
     J(t) : Jacobian of the classical LM algorithm at time t
     JF(t) : Jacobian of the proposed LM algorithm at time t
     E(t) : error vector of the classical LM algorithm at time t
     EF(t) : error vector of the proposed LM algorithm at time t
     μ : step size parameter
     m^d (m_i^d) : mask vector for the dth dataset (ith entry of m^d)
     e_i^p(t) : ith entry of the error vector for the pth input/output pair at time t
     D : number of different datasets
     P_d : number of input/output pairs in dataset d
     P : number of input/output pairs for the classical single-dataset learning setting
     N : total number of adjustable parameters in the neural network
     L : loss function for a single dataset
     LF : combined loss function for the multiple-dataset setting
     K : number of NN outputs

  8. A Dataset to Learn

  9. Levenberg-Marquardt Algorithm for the Single-Dataset Case
     Parameter update:
     w(t+1) = w(t) - (μ I_{N×N} + J(t)^T J(t))^{-1} J(t)^T E(t)
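
A minimal NumPy sketch of this single update step, assuming the parameter vector w, the Jacobian J, the stacked error vector E, and the step-size parameter μ have already been computed as defined in the nomenclature:

```python
import numpy as np

def lm_step(w, J, E, mu):
    """One Levenberg-Marquardt update:
    w(t+1) = w(t) - (mu*I + J^T J)^{-1} J^T E.
    w : (N,) parameters, J : (P*K, N) Jacobian, E : (P*K,) errors."""
    N = w.size
    H = mu * np.eye(N) + J.T @ J        # damped Gauss-Newton matrix
    g = J.T @ E                         # gradient-like term
    return w - np.linalg.solve(H, g)    # solve instead of explicit inverse
```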

  10. For a single dataset, the Jacobian J(t) is the (PK × N) matrix of partial derivatives of every error entry with respect to every parameter: the row associated with pair p and output k is
      [ ∂e_k^p/∂w_1, ∂e_k^p/∂w_2, ..., ∂e_k^p/∂w_N ],   p = 1, ..., P;  k = 1, ..., K.
      The error vector E(t) stacks the PK entries e_k^p(t) in the same order, and the loss is
      L(t) = E(t)^T E(t).

  11. Algorithm 1
      1 : t = 1;
      2 : L = Inf;
      3 : μ = 1;
      4 : PerformanceLevel = ChooseLevel;
      5 : SetNeuralNetwork
      6 : while L > PerformanceLevel
      7 :   E = [];
      8 :   J = [];
      9 :   % Pattern loop
      10 :  for p = 1 to P
      11 :    y = ForwardPassPattern#p
      12 :    % Append sample error ŷ - y to E
      13 :    E = [E  ŷ - y]
      14 :    % Loop over each output of the NN
      15 :    for k = 1 to K
      16 :      Row_of_J = [];
      17 :      % Parameter (weight/bias) index
      18 :      for i = 1 to N
      19 :        Compute ∂e_k^p/∂w_i
      20 :        Row_of_J = [Row_of_J  ∂e_k^p/∂w_i]
      21 :      end

  12. Algorithm 1 (cont'd)
      22 :    J = [J ; Row_of_J]
      23 :    % One row of J(t) is ready and appended
      24 :    end
      25 :    % One K×N sub-block of J(t) is ready
      26 :  end
      27 :  % J(t) is ready, update now
      28 :  w(t+1) = w(t) - (μ I + J(t)^T J(t))^{-1} J(t)^T E(t)
      29 :  DistributeNewValuesToTheirPositions
      30 :  L = E^T E
      31 :  DisplayCost L
      32 :  Update μ IfNecessary
      33 :  RunEarlyStoppingRoutineIfNecessary
      34 :  t = t+1;
      35 : end
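
For illustration, a compact self-contained Python/NumPy sketch of Algorithm 1, assuming a one-hidden-layer tanh network, a hypothetical sine target, and a finite-difference Jacobian in place of the analytical derivatives the slides compute; the sizes, seed, and performance level are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single dataset: 1 input, 1 output (K = 1), P = 40 pairs
U = np.linspace(-1.0, 1.0, 40).reshape(-1, 1)
Y_target = np.sin(np.pi * U)

H = 8                       # hidden units (illustrative size)
N = H + H + H + 1           # input weights + hidden biases + output weights + output bias
w = 0.5 * rng.standard_normal(N)

def forward(w, U):
    """One-hidden-layer tanh network; w is the flat parameter vector."""
    W1, b1 = w[:H].reshape(H, 1), w[H:2 * H]
    W2, b2 = w[2 * H:3 * H].reshape(1, H), w[3 * H:]
    hidden = np.tanh(W1 @ U.T + b1[:, None])     # (H, P)
    return (W2 @ hidden + b2[:, None]).T          # (P, 1)

def errors(w):
    """Stacked error vector E(t) = target - output, length P*K."""
    return (Y_target - forward(w, U)).ravel()

def jacobian(w, eps=1e-6):
    """Finite-difference estimate of J(t) = de/dw, shape (P*K, N)."""
    e0 = errors(w)
    J = np.empty((e0.size, w.size))
    for i in range(w.size):
        wp = w.copy()
        wp[i] += eps
        J[:, i] = (errors(wp) - e0) / eps
    return J

mu, t, L = 1.0, 1, np.inf
while L > 1e-4 and t <= 2000:       # PerformanceLevel = 1e-4 (illustrative)
    E = errors(w)
    J = jacobian(w)
    w = w - np.linalg.solve(mu * np.eye(N) + J.T @ J, J.T @ E)
    L = E @ E
    t += 1
print(t, L)
```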

  13. Modified Levenberg-Marquardt Algorithm: mask, weight, and masked weight. Multiple masks lead to multiple subnetworks.
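
In code terms, the masked weight is simply the elementwise product of mask and weight vectors; a tiny illustrative sketch (all values are arbitrary):

```python
import numpy as np

w  = np.array([0.8, -1.2, 0.3, 2.1, -0.5])   # full parameter vector
m1 = np.array([1, 0, 1, 1, 0])                # mask for dataset 1
m2 = np.array([0, 1, 1, 0, 1])                # mask for dataset 2

w_sub1 = m1 * w   # subnetwork for dataset 1: parameters 2 and 5 switched off
w_sub2 = m2 * w   # subnetwork for dataset 2: parameters 1 and 4 switched off
```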

  14. Multiple Datasets

  15. For dataset d, the block Jacobian J_d is obtained from the classical one by multiplying each column i by the corresponding mask entry: the row associated with pair p and output k is
      [ m_1^d ∂e_k^p/∂w_1, m_2^d ∂e_k^p/∂w_2, ..., m_N^d ∂e_k^p/∂w_N ],   p = 1, ..., P_d;  k = 1, ..., K.
      The proposed Jacobian and error vector stack the D blocks:
      JF(t) = [J_1; J_2; ...; J_D]   (size (Σ_d P_d)·K × N),
      EF(t) = [E_1; E_2; ...; E_D],
      and the combined loss is LF(t) = EF(t)^T EF(t) = Σ_{d=1}^{D} E_d(t)^T E_d(t).

  16. Modified Levenberg-Marquardt Algorithm
      w(t+1) = w(t) - (μ I_{N×N} + JF(t)^T JF(t))^{-1} JF(t)^T EF(t)
      MSE(t) = LF(t) / Σ_{d=1}^{D} P_d
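
The update has the same form as the single-dataset step, with JF and EF in place of J and E; the only new quantity is the combined MSE, normalized by the total number of input/output pairs. A one-function sketch of that normalization (name and call are illustrative):

```python
def combined_mse(LF, pairs_per_dataset):
    """MSE(t) = LF(t) / sum_d P_d, where LF = EF^T EF and
    pairs_per_dataset = [P_1, ..., P_D]."""
    return LF / sum(pairs_per_dataset)

# e.g. combined_mse(0.012, [40, 40, 25]) == 0.012 / 105
```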

  17. Algorithm 2
      1 : t = 1;
      2 : LF = Inf;
      3 : μ = 1;
      4 : PerformanceLevel = ChooseLevel;
      5 : SetNeuralNetwork
      6 : SetMaskVectors
      7 : while LF > PerformanceLevel
      8 :   EF = [];
      9 :   JF = [];
      10 :  % Dataset loop
      11 :  for d = 1 to D
      12 :    % Pattern loop
      13 :    for p = 1 to P_d
      14 :      y = ForwardPassPattern#p
      15 :      % Append sample error ŷ - y to EF
      16 :      EF = [EF  ŷ - y]
      17 :      % Loop over each output of the NN
      18 :      for k = 1 to K
      19 :        Row_of_Jd = [];
      20 :        % Parameter (weight/bias) index
      21 :        for i = 1 to N

  18. Algorithm 2 (cont'd)
      22 :          Compute m_i^d ∂e_k^p/∂w_i
      23 :          Row_of_Jd = [Row_of_Jd  m_i^d ∂e_k^p/∂w_i]
      24 :        end
      25 :        Jd = [Jd ; Row_of_Jd]
      26 :        % One row of J(t) is ready and appended
      27 :      end
      28 :      % One K×N sub-block of J(t) is ready
      29 :    end
      30 :    % Jd is ready. Append to Jacobian
      31 :    JF = [JF ; Jd]
      32 :  end
      33 :  % Jacobian JF(t) is ready, update now
      34 :  w(t+1) = w(t) - (μ I + JF(t)^T JF(t))^{-1} JF(t)^T EF(t)
      35 :  DistributeNewValuesToTheirPositions
      36 :  LF = EF^T EF
      37 :  DisplayCost LF
      38 :  Update μ IfNecessary
      39 :  RunEarlyStoppingRoutineIfNecessary
      40 :  t = t+1;
      41 : end
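
A sketch of Algorithm 2's core step in Python/NumPy: assembling the masked Jacobian JF and the stacked error EF across datasets, then applying the update. The callables error_fn and jac_fn are assumptions of this sketch, standing in for per-dataset versions of the error and Jacobian routines sketched for Algorithm 1:

```python
import numpy as np

def assemble_masked_system(w, datasets, masks, error_fn, jac_fn):
    """Build the stacked masked Jacobian JF and error vector EF (slide 15).
    error_fn(w, ds) returns the error vector of dataset ds (length Pd*K);
    jac_fn(w, ds) returns its (Pd*K, N) Jacobian; masks[d] is the (N,)
    binary mask vector of dataset d."""
    J_blocks, E_blocks = [], []
    for ds, m in zip(datasets, masks):
        Jd = jac_fn(w, ds) * m          # scale column i by mask entry m_i^d
        J_blocks.append(Jd)
        E_blocks.append(error_fn(w, ds))
    return np.vstack(J_blocks), np.concatenate(E_blocks)

def masked_lm_iteration(w, datasets, masks, error_fn, jac_fn, mu=1.0):
    """One iteration of Algorithm 2's update:
    w(t+1) = w(t) - (mu*I + JF^T JF)^{-1} JF^T EF, plus the combined loss."""
    JF, EF = assemble_masked_system(w, datasets, masks, error_fn, jac_fn)
    w_new = w - np.linalg.solve(mu * np.eye(w.size) + JF.T @ JF, JF.T @ EF)
    LF = EF @ EF
    return w_new, LF
```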

  19. An Illustrative Example

  20. An Illustrative Example After 2000 iterations of running Algorithm 2, the MSE decreases to 9.882e-5 and the results shown in Fig. 2 are obtained. According to the figure, the NN is capable of learning multiple datasets simultaneously. Since training considers all training samples together, the proposed scheme is not vulnerable to the well-known catastrophic forgetting phenomenon.

  21. Setting the Subnetworks We now show how to initialize the elements of the mask vectors m^d. For this, define a random variable u that is uniformly distributed between 0 and 1, drawn afresh for each entry. Based on this, we adopted the following scheme (Threshold = 0.3):
      m_i^d = 1 if u ≤ 0.3,   m_i^d = 0 if u > 0.3,   d = 1, 2, ..., D;  i = 1, 2, ..., N.
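
A sketch of this initialization, assuming the masks are stored as a D x N array; the function name and layout are illustrative:

```python
import numpy as np

def init_masks(D, N, threshold=0.3, rng=None):
    """Draw each mask entry m_i^d from u ~ Uniform(0, 1):
    m_i^d = 1 if u <= threshold, else 0 (threshold = 0.3 above)."""
    rng = rng or np.random.default_rng()
    u = rng.uniform(0.0, 1.0, size=(D, N))
    return (u <= threshold).astype(int)

masks = init_masks(D=4, N=16, threshold=0.3)
```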

  22. Setting the Subnetworks The threshold chosen above is 0.3; one can increase or decrease it and still obtain similar results. As the threshold approaches 0, it becomes difficult to learn any of the datasets, since the degrees of freedom allocated to each dataset are reduced and each subnetwork activates too few parameters. As the threshold approaches 1, it becomes impossible to learn multiple datasets, since most parameters remain active for all datasets. Depending on the size of the NN, one should be aware of this trade-off and choose a value that yields good learning performance.

  23. Setting the Subnetworks The threshold and the NN size must be chosen such that the set of weights used for none of the datasets is empty, i.e., every weight belongs to at least one subnetwork. (Figure: Venn-style diagram of weights {w}1 through {w}16 partitioned among the sets "weights used for #1" through "weights used for #4"; the region outside all four sets must be empty.)

  24. Conclusions We propose a modification to the Levenberg-Marquardt algorithm. The approach allows multiple datasets to be learned simultaneously, and catastrophic forgetting is prevented. Each dataset is accompanied by a mask vector that activates a particular parameter set, resulting in a subnetwork; the obtained subnetwork realizes the information in the selected dataset. Simulation results demonstrate that this multiple-dataset optimization problem is solvable within the Levenberg-Marquardt setup.
