
Applications of Reinforcement Learning: Robotics and Neural Networks Insights
This lecture covers applications of reinforcement learning (robot soccer in the RoboCup), an introduction to neural networks, and classifier evaluation with N-fold cross-validation, as part of a machine learning / data mining course.
Presentation Transcript
News Wed., March 30
GHC tasks for Groups I and J, to be presented on April 4, have been posted!
Reminder: Task3 is due tomorrow, March 31, end of the day.
Today's background: Snow Monsters at Mount Zao, Japan. The Snow Monsters of Mt. Zao (thehiddenjapan.com)
March 28: Lab slides and code are available now!
Today's lecture topics: Applications of RL: Robot Soccer; Neural Networks
Call for Applications for Participation, 2022 RoboCup Standard Platform League: https://www.youtube.com/watch?v=BN7t9dTbRyI
RoboCup Federation official website
RoboCup 2017 Nagoya promotional video: https://www.japankyo.com/2017/04/wacky-weird-interesting-japanese-news-robot-soccer-world-cup-robocup-2017-nagoya-promotional-video/
Nao-Team HTWK vs. Nao Devils, Final RoboCup German Open 2018 (Bing video)
COSC 4368 Machine Learning Organization
1. Introduction to Machine Learning
2. Reinforcement Learning
3. Introduction to Supervised Learning
4. Support Vector Machines
5. Neural Networks
6. Deep Learning: Transformers & CNN
N-Fold Cross Validation
10-fold cross validation is the most popular technique to evaluate classifiers. Leave-one-out and stratified cross validation also have some popularity. Cross validation is usually performed class-stratified (the frequencies of examples of a particular class are approximately the same in each fold). Examples should be assigned to folds randomly (if not, that is cheating!).
Accuracy := % of testing examples classified correctly.
Example: 3-fold cross-validation; the examples of the dataset are subdivided into 3 disjoint sets (preserving class frequencies); then training/test-set pairs are constructed as follows:
Training: {1,2} {1,3} {2,3}
Testing:  {3}   {2}   {1}
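For illustration (not part of the slides), class-stratified 10-fold cross-validation can be run with scikit-learn; the iris data and decision-tree classifier here are arbitrary stand-ins:

```python
# Hypothetical sketch of class-stratified 10-fold cross-validation;
# scikit-learn and its iris loader are assumed to be available.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Assign examples to folds randomly while preserving class frequencies per fold.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

accuracies = []
for train_idx, test_idx in skf.split(X, y):
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    # Accuracy := % of testing examples classified correctly.
    accuracies.append(clf.score(X[test_idx], y[test_idx]))

print("10-fold accuracy: %.3f +/- %.3f" % (np.mean(accuracies), np.std(accuracies)))
```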
COSC 4368 NN Lecture(s) Organization
1. Video (Amplify Partners / 3Blue1Brown), "What is a NN?": https://www.bing.com/videos/search?q=neural+network+video&view=detail&mid=54402D363ABB8903202F54402D363ABB8903202F&FORM=VIRE (almost 9 million views; you will watch only the first 13:00 of this video)
2. Followed by discussing the slides in this slideshow
3. A TensorFlow NN demo (15-20 minutes) conducted by Group X
4. Continue the discussion of this slideshow, also viewing a second video for 12 minutes on gradient descent approaches used to learn NN weights
Neural Networks: Lecture Notes for Chapter 4, Artificial Neural Networks. Introduction to Data Mining, 2nd Edition, by Tan, Steinbach, Karpatne, Kumar. Slides 9, 13-23 added by Dr. Eick.
Artificial Neural Networks (ANN)
A black box maps inputs X1, X2, X3 to an output Y. Output Y is 1 if at least two of the three inputs are equal to 1 (and -1 otherwise):
X1 X2 X3 | Y
 1  0  0 | -1
 1  0  1 |  1
 1  1  0 |  1
 1  1  1 |  1
 0  0  1 | -1
 0  1  0 | -1
 0  1  1 |  1
 0  0  0 | -1
Artificial Neural Networks (ANN)
The same truth table can be modeled by a perceptron: input nodes X1, X2, X3 are connected to the output node Y with weights 0.3, 0.3, 0.3 and threshold t = 0.4:
Y = sign(0.3 X1 + 0.3 X2 + 0.3 X3 - 0.4), where sign(x) = 1 if x > 0 and -1 otherwise.
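This decision rule is easy to verify in code; the following small sketch (not from the slides) checks the perceptron against all eight rows of the truth table:

```python
# Verify Y = sign(0.3*X1 + 0.3*X2 + 0.3*X3 - 0.4) against the truth table above.

def perceptron(x1, x2, x3):
    s = 0.3 * x1 + 0.3 * x2 + 0.3 * x3 - 0.4
    return 1 if s > 0 else -1

table = [  # (X1, X2, X3, Y)
    (1, 0, 0, -1), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 1),
    (0, 0, 1, -1), (0, 1, 0, -1), (0, 1, 1, 1), (0, 0, 0, -1),
]

for x1, x2, x3, y in table:
    assert perceptron(x1, x2, x3) == y
print("the perceptron reproduces the truth table")
```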
Artificial Neural Networks (ANN)
Various types of neural network topology: single-layered network (perceptron) versus multi-layered network; feed-forward versus recurrent network. Various types of activation functions f:
Y = f(Σ_i w_i X_i)
Multilayer Neural Network
Hidden layers: intermediary layers between the input and output layers. More general activation functions (sigmoid, linear, etc.).
General Structure of ANN
[Figure: a feed-forward network with an input layer (x1 ... x5), a hidden layer, and an output layer; neuron i receives inputs I1, I2, I3 with weights wi1, wi2, wi3, forms the weighted sum Si, and produces output Oi = g(Si), where g is the activation function and t a threshold.]
Training an ANN means learning the weights of the neurons: Y = f(Σ_i w_i X_i)
Neural Network Terminology
A neural network is composed of a number of units (nodes) that are connected by links. Each link has a weight associated with it. Each unit has an activation level and a means to compute the activation level at the next step in time.
Most neural network units are composed of a linear component, called the input function, and a non-linear component, called the activation function. Popular activation functions include relu, tanh, and the sigmoid function.
The architecture of a neural network determines how units are connected and which activation functions are used for the network computations. Architectures are subdivided into feed-forward and recurrent networks. Moreover, single-layer and multi-layer neural networks (which contain hidden units) are distinguished.
Learning in the context of neural networks centers on finding good weights for a given architecture so that the error in performing a particular task is minimized. Most approaches center on learning a function from a set of training examples and use hill-climbing and steepest-descent hill-climbing approaches to find the weight values that lead to the lowest error.
Loss functions are mechanisms that compute a NN's error for a training set; NN training seeks weights that minimize this function.
Gradient Descent for Multilayer NN
Weight update: w_j^(k+1) = w_j^(k) - α ∂E/∂w_j
Error function: E = 1/2 Σ_{i=1..N} ( t_i - f(Σ_j w_j x_ij) )^2
The activation function f must be differentiable. For the sigmoid function the update becomes:
w_j^(k+1) = w_j^(k) + α Σ_i (t_i - o_i) o_i (1 - o_i) x_ij
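As an illustration (not part of the textbook slides), the sigmoid update rule above can be applied to a single sigmoid unit in numpy; the toy dataset below is invented:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One gradient-descent step for a single sigmoid unit:
# w_j <- w_j + alpha * sum_i (t_i - o_i) * o_i * (1 - o_i) * x_ij
def update_weights(w, X, t, alpha=0.2):
    o = sigmoid(X @ w)                   # outputs o_i for all examples
    grad_term = (t - o) * o * (1.0 - o)  # per-example error times sigmoid derivative
    return w + alpha * X.T @ grad_term

# Tiny invented dataset: 4 examples, 3 inputs (the last column acts as a bias input).
X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])
t = np.array([0., 0., 0., 1.])           # roughly an AND of the first two inputs
w = np.zeros(3)
for _ in range(2000):
    w = update_weights(w, X, t)
print(np.round(sigmoid(X @ w), 2))       # outputs approach the targets
```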
Gradient Descent for Multilayer NN
[Figure: hidden layer k-1 (neurons p and q), hidden layer k (neuron i), hidden layer k+1 (neurons x and y); weights wpi and wqi feed into neuron i, weights wix and wiy lead out of it.]
For output neurons, the weight update formula is the same as before (gradient descent for the perceptron).
For hidden neurons: w_pi^(k+1) = w_pi^(k) + α o_i (1 - o_i) (Σ_j δ_j w_ij) x_pi
Output neurons: δ_j = o_j (1 - o_j)(t_j - o_j)
Hidden neurons: δ_j = o_j (1 - o_j) Σ_k δ_k w_jk
Neural Network Learning --- Mostly Steepest Descent Hill Climbing on a Differentiable Error Function
[Figure: the current weight vector is moved along the gradient of the error function to a new weight vector.]
Important: how far you jump depends on the learning rate α, on the error |T - O|, and on the input activation of the node.
Remarks on α: if it is too low, convergence is slow; if it is too high, you might overshoot the goal.
Error Function Gradient Based on 2 Weights
Video on this topic, starting at 2:50: https://www.youtube.com/watch?v=IHZwWFHWa-w&index=3&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi&t=0s
Remark: To minimize the error/loss function, we walk in the inverse direction of the arrows, i.e., against the gradient. If the steepest gradient is, for example, (1,2), then the second weight contributes more to the error; consequently, it is adjusted twice as much as the first weight.
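A tiny numeric illustration (not from the video) of this remark, for a gradient of (1, 2); the weight values and learning rate are invented:

```python
import numpy as np

w = np.array([0.5, 0.5])      # current weight vector (invented values)
grad = np.array([1.0, 2.0])   # gradient of the error w.r.t. the two weights
alpha = 0.1                   # learning rate

w_new = w - alpha * grad      # walk against the gradient
print(w_new)                  # [0.4 0.3]: the second weight changes twice as much
```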
Watch Second 3Blue1Brown Video
Centering on using gradient descent to learn the optimal weights of a neural network: https://www.youtube.com/watch?v=IHZwWFHWa-w&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi&index=2
Watch only the first 12:24 minutes of the video!
GHC Groups K or L or M or N: I am looking for a group to give a 10-12 minute presentation on the topic "Will China be the Number One in AI in 2030?" on Monday, April 18, followed by a 5-minute discussion. Send Dr. Eick an e-mail by Wednesday, April 6, noon at the latest, if you are interested! China Number One in AI Soon??
News Mon., April 4
Reminder: Task4 is due Fri., April 8, end of the day.
The midterm exam number grades should be available by the end of the day on April 6 at the latest.
During the lecture on April 6, we will be conducting some polls asking you all kinds of questions. If you plan to attend the lecture on April 6 F2F: please bring your laptop!!
Today's background: Sunflowers, the national flower of Ukraine.
Today's lecture topics: GHC presentation by Group I (centering on Q-Learning); finish the discussion of Neural Networks; new topic: Ethical and Societal Aspects of AI; GHC presentation by Group J (centering on SVMs).
NN Computation
ai := activation of node i; zi := linear input of node i, with ai = g(zi).
g(x) = 1/(1+e^-x); g'(x) = g(x)*(1-g(x)); α is the learning rate; g is the sigmoid activation function.
[Figure: inputs I1 and I2 feed hidden nodes a3 and a4 via weights w13, w23, w14, w24; a3 and a4 feed the output node a5 via weights w35 and w45.]
Example: all weights are 0.1 except w45 = 1; α = 0.2. Training example: (I1=1, I2=1; a5=1).
a4 = g(z4) = g(x1*w14 + x2*w24) = g(0.2) = 0.550
a3 = g(z3) = g(x1*w13 + x2*w23) = g(0.2) = 0.550
a5 = g(z5) = g(a3*w35 + a4*w45) = g(0.605) = 0.647
error(a5) = 1 - 0.647 = 0.353
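This forward pass is easy to reproduce in code; the following sketch (not from the slides) uses the same weights and inputs:

```python
import math

def g(x):                       # sigmoid activation
    return 1.0 / (1.0 + math.exp(-x))

# Weights from the slide: all 0.1 except w45 = 1.
w13 = w23 = w14 = w24 = w35 = 0.1
w45 = 1.0
x1 = x2 = 1.0                   # training example (I1=1, I2=1; target a5=1)

a3 = g(x1 * w13 + x2 * w23)     # g(0.2)   ~ 0.550
a4 = g(x1 * w14 + x2 * w24)     # g(0.2)   ~ 0.550
a5 = g(a3 * w35 + a4 * w45)     # g(0.605) ~ 0.647
print(round(a3, 3), round(a4, 3), round(a5, 3), round(1 - a5, 3))  # error ~ 0.353
```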
Learning Multi-layer Neural Networks
Goal: learn good weights. Neural network learning computes the error term e = y - f(w,x) for each training example, plugs this error into a loss function, and updates the weights accordingly, moving in the direction of the steepest gradient, which most reduces the error (or the value of a regularized error function). The length of the step in the direction of steepest descent depends on the learning rate and other factors (see the later discussion).
Problem: how do we determine the true value of y / the error for hidden nodes?
Solution: approximate the error in hidden nodes by back-propagating the error in the output nodes.
Back Propagation Algorithm
1. Initialize the weights in the network (often randomly)
2. repeat
   for each example e in the training set do
      a. O = neural-net-output(network, e)  ; forward pass
      b. T = teacher output for e
      c. Calculate the error (T - O) at the output units
      d. Compute the error term δ_i for the output node
      e. Compute the error terms δ_i for the nodes of the intermediate layer
      f. Update the weights in the network: w_ij := w_ij + α * a_i * δ_j
   until all examples are classified correctly or a stopping criterion is satisfied
3. return(network)
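Below is a compact numpy sketch of this algorithm for a network with one hidden layer; the bias handling, initialization, learning rate, number of epochs, and the toy task (the "at least two of three inputs are 1" function from the earlier ANN slide) are illustrative choices, not prescribed by the slide:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(v):
    return np.append(v, 1.0)          # constant-1 input acting as a bias

def train(X, T, n_hidden=3, alpha=0.5, epochs=4000, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1] + 1, n_hidden))  # 1. initialize randomly
    W2 = rng.normal(scale=0.5, size=(n_hidden + 1, 1))
    for _ in range(epochs):                                      # 2. repeat
        for x, t in zip(X, T):                                   #    for each example e
            xb = add_bias(x)
            h = sigmoid(xb @ W1)                                 # a. forward pass
            hb = add_bias(h)
            o = sigmoid(hb @ W2)                                 #    O = network output
            delta_o = (t - o) * o * (1 - o)                      # c./d. output error term
            delta_h = h * (1 - h) * (W2[:-1] @ delta_o)          # e. hidden error terms
            W2 += alpha * np.outer(hb, delta_o)                  # f. w_ij += alpha*a_i*delta_j
            W1 += alpha * np.outer(xb, delta_h)
    return W1, W2                                                # 3. return the network

# Toy task from the earlier ANN slide: output 1 iff at least two of the three inputs are 1.
X = np.array([[x1, x2, x3] for x1 in (0., 1.) for x2 in (0., 1.) for x3 in (0., 1.)])
T = np.array([[1.0 if x.sum() >= 2 else 0.0] for x in X])
W1, W2 = train(X, T)
for x, t in zip(X, T):
    o = sigmoid(add_bias(sigmoid(add_bias(x) @ W1)) @ W2)
    print(x, t[0], round(float(o[0]), 2))    # outputs should approach the 0/1 targets
```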
Updating Weights in Neural Networks
w_ij := old_w_ij + α * input_activation_i * associated_error_j
Multi-layer network, associated error (δ_i):
1. Output node i: g'(z_i) * (T - O)
2. Intermediate node k connected to i: g'(z_k) * w_ki * error_at_node_i
[Figure: a multi-layer network with inputs I1, I2 (activations a1, a2), hidden nodes a3, a4 (weights w13, w23, w14, w24; error terms δ3, δ4), and output node a5 (weights w35, w45; error term δ5).]
Back Propagation Formula Example
g(x) = 1/(1+e^-x); g'(x) = g(x)*(1-g(x)); α is the learning rate.
ai := activation of node i; zi := linear input of node i, with ai = g(zi).
[Figure: the same network: inputs I1, I2; hidden nodes a3, a4; output node a5; weights w13, w23, w14, w24, w35, w45.]
Forward pass:
a4 = g(z4) = g(x1*w14 + x2*w24)
a3 = g(z3) = g(x1*w13 + x2*w23)
a5 = g(z5) = g(a3*w35 + a4*w45)
Error terms:
δ5 = error*g'(z5) = error*a5*(1-a5)
δ4 = δ5*w45*g'(z4) = δ5*w45*a4*(1-a4)
δ3 = δ5*w35*a3*(1-a3)
Weight updates:
w35 = w35 + α*a3*δ5
w45 = w45 + α*a4*δ5
w13 = w13 + α*x1*δ3
w23 = w23 + α*x2*δ3
w14 = w14 + α*x1*δ4
w24 = w24 + α*x2*δ4
Example BP (α = 0.2)
All weights are 0.1 except w45 = 1; α = 0.2. Training example: (x1=1, x2=1; a5=1). g is the sigmoid function: g(x) = 1/(1+e^-x), g'(x) = g(x)*(1-g(x)).
Forward pass:
a4 = g(z4) = g(x1*w14 + x2*w24) = g(0.2) = 0.550
a3 = g(z3) = g(x1*w13 + x2*w23) = g(0.2) = 0.550
a5 = g(z5) = g(a3*w35 + a4*w45) = g(0.605) = 0.647
Error terms:
δ5 = error*g'(z5) = error*a5*(1-a5) = 0.353*0.647*0.353 ≈ 0.08
δ4 = δ5*w45*a4*(1-a4) ≈ 0.02
δ3 = δ5*w35*a3*(1-a3) ≈ 0.002
Weight updates:
w35 = w35 + α*a3*δ5 = 0.1 + 0.2*0.55*0.08 ≈ 0.109
w45 = w45 + α*a4*δ5 ≈ 1.009
w13 = w13 + α*x1*δ3 ≈ 0.1004
w23 = w23 + α*x2*δ3 ≈ 0.1004
w14 = w14 + α*x1*δ4 ≈ 0.104
w24 = w24 + α*x2*δ4 ≈ 0.104
Forward pass with the adjusted weights:
a4' = g(0.208) ≈ 0.551
a3' = g(0.2008) ≈ 0.550
a5' ≈ 0.6483
a5 is approximately 0.648 with the adjusted weights!
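The single update step above can be checked with a short script (a sketch, not part of the slides); it reproduces the slide's rounded values within rounding error:

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

alpha = 0.2
x1 = x2 = 1.0
target = 1.0
w13 = w23 = w14 = w24 = w35 = 0.1
w45 = 1.0

# Forward pass
a3 = g(x1 * w13 + x2 * w23)      # ~0.550
a4 = g(x1 * w14 + x2 * w24)      # ~0.550
a5 = g(a3 * w35 + a4 * w45)      # ~0.647
error = target - a5              # ~0.353

# Error terms
d5 = error * a5 * (1 - a5)       # ~0.08
d4 = d5 * w45 * a4 * (1 - a4)    # ~0.02
d3 = d5 * w35 * a3 * (1 - a3)    # ~0.002

# Weight updates: w_ij += alpha * a_i * d_j
w35 += alpha * a3 * d5           # ~0.109
w45 += alpha * a4 * d5           # ~1.009
w13 += alpha * x1 * d3           # ~0.1004
w23 += alpha * x2 * d3
w14 += alpha * x1 * d4           # ~0.104
w24 += alpha * x2 * d4

# Forward pass with the adjusted weights
a3n = g(x1 * w13 + x2 * w23)
a4n = g(x1 * w14 + x2 * w24)
a5n = g(a3n * w35 + a4n * w45)
print(round(a5n, 3))             # ~0.649: the output moved from 0.647 toward the target 1
```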
Example BP (α = 1)
All weights are 0.1 except w45 = 1; α = 1. Training example: (x1=1, x2=1; a5=1). g is the sigmoid function.
Forward pass (as before):
a4 = g(z4) = g(x1*w14 + x2*w24) = g(0.2) = 0.550
a3 = g(z3) = g(x1*w13 + x2*w23) = g(0.2) = 0.550
a5 = g(z5) = g(a3*w35 + a4*w45) = g(0.605) = 0.647
Error terms (as before):
δ5 = error*g'(z5) = error*a5*(1-a5) = 0.353*0.647*0.353 ≈ 0.08
δ4 = δ5*w45*a4*(1-a4) ≈ 0.02
δ3 = δ5*w35*a3*(1-a3) ≈ 0.002
Weight updates with α = 1:
w35 = w35 + α*a3*δ5 = 0.1 + 1*0.55*0.08 ≈ 0.145
w45 = w45 + α*a4*δ5 ≈ 1.045
w13 = w13 + α*x1*δ3 ≈ 0.102
w23 = w23 + α*x2*δ3 ≈ 0.102
w14 = w14 + α*x1*δ4 ≈ 0.12
w24 = w24 + α*x2*δ4 ≈ 0.12
Forward pass with the adjusted weights:
a4' = g(0.24) ≈ 0.557
a3' = g(0.204) ≈ 0.554
a5' ≈ 0.66
a5 is approximately 0.66 with the adjusted weights: the larger learning rate takes a bigger step toward the target.
Activation Functions
An activation function transforms a neuron's input into its output. Features of activation functions: a squashing effect is required; it prevents accelerating growth of activation levels through the network.
1. Sigmoid or Logistic
Its range is between 0 and 1. It is an S-shaped curve. It is easy to understand and apply, but there are reasons which have made it fall out of popularity: the vanishing gradient problem and slow convergence.
2. Tanh: Hyperbolic Tangent Function
It is a rescaling of the logistic sigmoid and also an S-shaped curve. Outputs range from -1 to 1. The advantage is that negative inputs will be mapped strongly negative and zero inputs will be mapped near zero in the tanh graph. The tanh function is a popular choice for classification problems involving exactly two classes.
3. Softplus Function
Difference: outputs produced by the sigmoid and tanh functions have upper and lower limits, whereas the softplus function produces outputs in the range (0, +∞).
4. ReLU: Rectified Linear Units
Most popular. Avoids and rectifies the vanishing gradient problem. Cheap to compute, as there is no complicated math, and networks using it converge faster.
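For reference, the four activation functions discussed above can be written in a few lines of numpy (a sketch added for illustration; the sample input values are invented):

```python
import numpy as np

def sigmoid(x):                  # range (0, 1); S-shaped
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                     # range (-1, 1); rescaled sigmoid: tanh(x) = 2*sigmoid(2x) - 1
    return np.tanh(x)

def softplus(x):                 # range (0, +inf); smooth approximation of ReLU
    return np.log1p(np.exp(x))

def relu(x):                     # range [0, +inf); cheap to compute
    return np.maximum(0.0, x)

x = np.linspace(-3, 3, 7)
for f in (sigmoid, tanh, softplus, relu):
    print(f.__name__, np.round(f(x), 2))
```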
Design Issues in ANN
Number of nodes in the input layer: one input node per binary/continuous attribute; k or log2(k) nodes for each categorical attribute with k values.
Number of nodes in the output layer: one output for a binary class problem; k for a k-class problem; many other possibilities.
Which activation function(s) to use.
Number of nodes in the hidden layer.
Initial weights and biases.
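These design choices appear directly when a network is specified in code. The following is a hypothetical sketch using the Keras API (the framework choice, the layer sizes, and the 10-attribute/3-class setting are illustrative assumptions, not part of the slide):

```python
import tensorflow as tf

n_inputs, n_classes, n_hidden = 10, 3, 16   # illustrative sizes, not prescribed by the slide

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_inputs,)),               # one input node per attribute
    tf.keras.layers.Dense(n_hidden, activation="relu"),     # hidden-layer size: a design choice
    tf.keras.layers.Dense(n_classes, activation="softmax")  # k output nodes for a k-class problem
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()   # layer sizes and parameter counts; initial weights are set by the framework
```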
Characteristics of ANN
Multilayer ANNs are universal approximators, but they can suffer from overfitting if the network is too complex or if the number of training examples is too low. Gradient descent may converge to a local minimum. Model building can be very time consuming, but prediction/classification is very fast. ANNs are sensitive to noise in the training data, and it is difficult for them to handle missing attributes and symbolic attributes.
More NN Terminology (source: Wikipedia)
Batch size: the number of training examples used in each iteration. For example, if the batch size is 4, weights are adjusted using 4 training examples in each iteration of the weight-learning process. Sometimes called mini-batch size.
Epoch: one complete cycle in which the neural network has seen all the data. That is, if the batch size is 128 and the training-set size is 2048, one epoch is completed after 16 iterations. If, in this example, the NN learning used 256 iterations, that would be 16 epochs, as 16*16 = 256.
Regularization: in mathematics, statistics, finance, and computer science, particularly in machine learning and inverse problems, regularization is the process of adding information in order to solve an ill-posed problem or to prevent overfitting, usually by adding a term to an objective/cost function. Example, L1 regularization: in the case of neural networks, the θ_i are the p weights of the network, the first term of the objective is the squared prediction error, the added term is λ Σ_{i=1..p} |θ_i|, and λ is a parameter called the regularization rate.
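A small sketch (with invented numbers) of the batch/epoch bookkeeping and of an L1-regularized objective of the kind described above:

```python
import numpy as np

# Batch / epoch arithmetic from the example above.
train_size, batch_size = 2048, 128
iterations_per_epoch = train_size // batch_size                 # 16 iterations = 1 epoch
print(256 // iterations_per_epoch, "epochs in 256 iterations")  # -> 16

# L1-regularized objective: squared prediction error + lambda * sum of |theta_i|.
def l1_regularized_loss(y_true, y_pred, theta, lam=0.01):
    return np.sum((y_true - y_pred) ** 2) + lam * np.sum(np.abs(theta))

theta = np.array([0.5, -1.2, 0.0, 3.0])   # invented weights
print(l1_regularized_loss(np.array([1.0]), np.array([0.8]), theta))
```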
More NN Terminology, Part 2
Batch normalization (also known as batch norm) is a method used to make artificial neural networks more stable and less sensitive to overfitting through normalization of the layers' inputs by re-centering and re-scaling. It was proposed by Sergey Ioffe and Christian Szegedy in 2015.
Dropout: this technique consists of removing some nodes so that the NN does not become too heavy; it is applied during the training phase. The idea is that we do not want our NN to be overwhelmed by information, especially since some nodes might be redundant and useless. So, while building our algorithm, we can decide, for each training stage, to keep each node with probability p (called the keep probability) or drop it with probability 1-p (called the drop probability).
Remark: both techniques are used to alleviate overfitting.
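A minimal sketch of dropout as described above, keeping each node with probability p during training; the inverted-dropout scaling by 1/p is a common implementation detail not mentioned on the slide:

```python
import numpy as np

def dropout(activations, p=0.8, training=True, rng=np.random.default_rng(0)):
    """Keep each node with probability p, drop it with probability 1-p.
    Scaling the kept activations by 1/p keeps their expected value unchanged,
    so nothing needs to be rescaled at prediction time."""
    if not training:
        return activations
    mask = rng.random(activations.shape) < p   # True with probability p
    return activations * mask / p

h = np.array([0.2, 0.9, 0.5, 0.7, 0.1])   # invented hidden-layer activations
print(dropout(h, p=0.8))                   # some entries zeroed, the rest scaled up
```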