Learning Algorithms
Unit-II Learning Algorithms Course Outcome: Perform the training of neural networks using various learning rules. Note: The material used to prepare this presentation and notes has been taken from the internet and books, and is intended only for students' reference, not for commercial use.
Learning & Memory Associative memories (AM) are pattern storage and retrieval systems inspired by the psychological concept of the same name. As demonstrated by key findings in learning and memory research, a given stimulus (object, shape, word) is stored as a memory in the brain and is associated with related stimuli. When we experience stimuli that trigger an AM, we can recall memories associated with those inputs. Have you ever smelled and/or tasted a dish and recalled strong memories of other places and times? How about listening to an old song and remembering what you were doing when you first heard it? You hear the opening words of a familiar phrase, and you can recall the rest of the quote.
These kinds of neural networks work on the basis of pattern association, which means they can store different patterns and, when producing an output, return one of the stored patterns by matching it with the given input pattern. These types of memories are also called Content-Addressable Memory (CAM). Associative memory performs a parallel search over the stored patterns as data files. Following are the two types of associative memory we can observe: Auto Associative Memory and Hetero Associative Memory.
How Does Associative Memory Work? In conventional memory, data is stored in specific locations, called addresses, and retrieved by referencing those addresses. In associative memory, data is stored together with additional tags or metadata that describe its content. Associative memory is a repository of associated pattern pairs in some form. If the repository is triggered with a pattern, the associated pattern pair appears at the output. The input can be an exact or partial representation of a stored pattern.
If the memory is presented with an input pattern, the associated pattern is recovered automatically.
When a search is performed, the associative memory compares the search query with the tags of all stored data, and retrieves the data that matches the query. Associative memory is designed to quickly find matching data, even when the search query is incomplete or imprecise. This is achieved by using parallel processing techniques, where multiple search queries can be performed simultaneously. The search is also performed in a single step, as opposed to conventional memory where multiple steps are required to locate the data.
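To make the idea concrete, here is a minimal Python sketch of content-addressable retrieval: stored items carry binary tag patterns, a possibly incomplete query is compared against every tag at once, and the best match is returned. The tags, values, and query here are invented purely for illustration.

```python
# Minimal sketch of content-addressable retrieval (illustrative only).
import numpy as np

stored_keys = np.array([[1, 0, 1, 1, 0],
                        [0, 1, 1, 0, 1],
                        [1, 1, 0, 0, 1]])          # tags describing each stored item
stored_values = ["pattern-A", "pattern-B", "pattern-C"]

def recall(query):
    # Score every stored tag at once (a vectorized "parallel" comparison);
    # the closest tag addresses the value to retrieve.
    scores = (stored_keys == query).sum(axis=1)
    return stored_values[int(np.argmax(scores))]

print(recall(np.array([1, 0, 1, 0, 0])))           # imprecise query still recalls "pattern-A"
```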
Applications of Associative memory:
Memory allocation: it can be used in memory-allocation formats.
Networking: associative memory is used in network routing tables to quickly find the path to a destination network based on its address.
Image processing: associative memory is used in image processing applications to search for specific features or patterns within an image.
Artificial intelligence: associative memory is used in artificial intelligence applications such as expert systems and pattern recognition.
Database management: associative memory can be used in database management systems to quickly retrieve data based on its content.
Advantages of Associative memory:
It is used where search time needs to be short.
It is suitable for parallel searches.
It is often used to speed up databases.
It is used in the page tables of virtual memory and in neural networks.
Disadvantages of Associative memory:
It is more expensive than RAM.
Each cell must have storage capability and logic circuits for matching its content with an external argument.
Auto Associative Memory This is a single-layer neural network in which the input training vectors and the output target vectors are the same. The weights are determined so that the network stores a set of patterns. Architecture:
Auto-associative: X = Y (recognizes noisy versions of a pattern)
As shown in the architecture, the Auto Associative Memory network has n input training vectors and a corresponding n output target vectors. Consider x[1], x[2], x[3], ..., x[M] to be the stored pattern vectors, whose elements represent characteristics obtained from the patterns. The auto-associative memory returns the pattern vector x[m] when presented with a noisy or incomplete version of x[m]. Applications: Pattern Recognition, Bio-informatics, Voice Recognition, Signal Validation, etc.
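The behaviour described above can be illustrated with a minimal sketch, assuming Hebbian outer-product storage of bipolar patterns (the Hebb rule itself is introduced later in this unit); the stored patterns and the noisy probe are invented for illustration.

```python
# Minimal auto-associative memory sketch (bipolar patterns, Hebbian outer-product storage).
import numpy as np

patterns = np.array([[1, 1, -1, -1, 1, -1],
                     [-1, 1, 1, -1, -1, 1]])    # stored pattern vectors x[1], x[2]

# Store: sum the outer product of each pattern with itself, then zero the diagonal.
W = sum(np.outer(p, p) for p in patterns)
np.fill_diagonal(W, 0)

def recall(x_noisy):
    # Recall: one pass through the weights followed by a sign threshold.
    return np.sign(W @ x_noisy)

noisy = np.array([1, 1, -1, -1, 1, 1])          # x[1] with its last element flipped
print(recall(noisy))                            # -> [ 1  1 -1 -1  1 -1], i.e. x[1]
```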
Hetero Associative memory In this network the input training vectors and the output target vectors are not the same. The weights are determined so that the network stores a set of patterns. The hetero-associative network is static in nature; hence, there are no non-linear or delay operations. In a hetero-associative memory, the recovered pattern is generally different from the input pattern, not only in type and format but also in content. It is also known as a hetero-associative correlator.
Hetero-associative, bidirectional: X <-> Y (iterative correction of input and output)
Architecture As shown in the following figure, the architecture of the Hetero Associative Memory network has n input training vectors and m output target vectors.
Consider we have a number of key-response pairs {a(1), x(1)}, {a(2), x(2)}, ..., {a(M), x(M)}. The hetero-associative memory will give the pattern vector x(m) when a noisy or incomplete version of a(m) is given. In a hetero-associative network, the input pattern is associated with a different output pattern, allowing the network to learn and remember the associations between the two sets of patterns. This type of memory network is commonly used in applications such as data compression and data retrieval.
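As a minimal sketch of this mapping, the example below stores key-response pairs with a Hebbian outer-product weight matrix (an assumption consistent with the Hebb-rule storage described later); the key and response patterns are invented for illustration.

```python
# Minimal hetero-associative memory sketch: keys a(m) are mapped to different responses x(m).
import numpy as np

keys = np.array([[1, -1, 1, -1],       # a(1)
                 [-1, 1, -1, 1]])      # a(2)
responses = np.array([[1, -1],         # x(1)
                      [-1, 1]])        # x(2)

# Store: W sums the outer products of each key with its response (shape 4 x 2).
W = sum(np.outer(a, x) for a, x in zip(keys, responses))

def recall(a_noisy):
    # Recall: project the (possibly noisy) key through W and threshold.
    return np.sign(a_noisy @ W)

print(recall(np.array([1, -1, 1, 1])))   # noisy a(1) -> [ 1 -1], i.e. x(1)
```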
Hopfield Network The Hopfield neural network was invented by Dr. John J. Hopfield in 1982. It consists of a single layer containing one or more fully connected recurrent neurons. The Hopfield network is commonly used for auto-association and optimization tasks. A Hopfield network is a single-layered and recurrent network in which the neurons are entirely connected, i.e., each neuron is connected to every other neuron. If there are two neurons i and j, the connectivity weight wij between them is symmetric, wij = wji, and there is no self-connectivity, wii = 0. In the figure below, three neurons i = 1, 2, 3 with states Xi are shown with connectivity weights Wij.
A Hopfield network operates in a discrete fashion; in other words, the input and output patterns are discrete vectors, which can be either binary (0, 1) or bipolar (+1, -1) in nature. The network has symmetrical weights with no self-connections, i.e., wij = wji and wii = 0.
The Hopfield networks are categorized into two categories:
1. Discrete Networks: These networks give one of two discrete outputs. Based on the output, there are two further types:
Binary: In this type, the output is either 0 or 1.
Bipolar: In bipolar networks, the output is either -1 (when the computed value < 0) or 1 (when the computed value > 0).
2. Continuous Networks: Instead of a binary or bipolar output, the output value lies between 0 and 1.
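Below is a minimal sketch of a discrete bipolar Hopfield network: symmetric Hebbian weights with a zero diagonal and asynchronous sign updates repeated until the state stops changing. The stored patterns and the noisy probe are invented for illustration.

```python
# Minimal discrete (bipolar) Hopfield network sketch.
import numpy as np

patterns = np.array([[1, 1, -1, -1, 1, -1],
                     [-1, 1, 1, -1, -1, 1]])

# Training: Hebbian outer-product storage with wii = 0 and wij = wji.
W = sum(np.outer(p, p) for p in patterns)
np.fill_diagonal(W, 0)

def recall(state, max_sweeps=10):
    state = state.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):              # asynchronous update, one neuron at a time
            new = 1 if W[i] @ state >= 0 else -1
            if new != state[i]:
                state[i], changed = new, True
        if not changed:                          # converged to a stable state
            break
    return state

print(recall(np.array([1, 1, -1, -1, -1, -1])))  # noisy probe settles back to pattern 1
```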
Bidirectional Associative Memory (BAM) Bidirectional associative memory (BAM) was first proposed by Bart Kosko in 1988. The BAM network performs forward and backward associative searches for stored stimulus-response pairs. The BAM is a recurrent hetero-associative pattern-matching network that encodes binary or bipolar patterns using the Hebbian learning rule. It associates patterns from set A with patterns from set B, and vice versa. BAM neural nets can respond to input presented at either layer (input layer or output layer).
BAM Architecture: When the BAM accepts an n-dimensional input vector X from set A, the model recalls the m-dimensional vector Y from set B. Similarly, when Y is treated as input, the BAM recalls X.
Bidirectional Associative Memory (BAM) is a supervised learning model in Artificial Neural Networks. It is a hetero-associative memory: for an input pattern, it returns another pattern which is potentially of a different size. This phenomenon is very similar to the human brain. Human memory is necessarily associative: it uses a chain of mental associations to recover a lost memory, like associations of faces with names, or of exam questions with answers. The main objective of introducing such a network model is to store hetero-associative pattern pairs. It is used to retrieve a pattern given a noisy or incomplete input pattern.
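A minimal sketch of the forward and backward recall described above follows; it is an illustration rather than Kosko's full iterative formulation, and the bipolar pattern pairs are invented for the example.

```python
# Minimal BAM sketch: one correlation matrix serves both recall directions.
import numpy as np

X = np.array([[1, -1, 1, -1, 1, -1],    # set A patterns (n = 6)
              [1, 1, -1, -1, 1, 1]])
Y = np.array([[1, -1, 1],               # set B patterns (m = 3)
              [-1, 1, 1]])

W = sum(np.outer(x, y) for x, y in zip(X, Y))   # n x m correlation (Hebbian) matrix

def recall_forward(x):
    return np.sign(x @ W)                       # X layer -> Y layer

def recall_backward(y):
    return np.sign(W @ y)                       # Y layer -> X layer

print(recall_forward(X[0]))      # -> [ 1 -1  1], i.e. Y[0]
print(recall_backward(Y[1]))     # -> X[1]
```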
Auto associative Memory vs Hetero associative Memory
Auto associative Memory:
1. The input and output vectors s and t are the same.
2. The Hebb rule is used as the learning algorithm, or the weight matrix is calculated by summing the outer products of each input-output pair.
3. The auto-associative application algorithm is used to test the network.
4. E.g. colour correction, colour constancy.
Hetero associative Memory:
1. The input and output vectors s and t are different.
2. The Hebb rule is used as the learning algorithm, or the weight matrix is calculated by summing the outer products of each input-output pair.
3. The hetero-associative application algorithm is used to test the network.
4. E.g. space transformation (Fourier), dimensionality reduction.
Hebbian Learning The neuroscientific concept of Hebbian learning was introduced by Donald Hebb in his 1949 publication of The Organization of Behaviour. The basis of the theory is when our brains learn something new, neurons are activated and connected with other neurons, forming a neural network. These connections start off weak, but each time the stimulus is repeated, the connections grow stronger and stronger, and the action becomes more intuitive. A good example is the act of learning to drive. When you start out, everything you do is incredibly deliberate. You remind yourself to turn on your indicator, to check your blind spot, and so on. However, after years of experience, these processes become so automatic that you perform them without even thinking.
When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased. In other words, if two interconnected neurons are ON simultaneously, the weight associated with these neurons is increased by modifying their synaptic strength. The weight update in the Hebb rule is given by:
wi(new) = wi(old) + xi * y
Hebbian Learning Rule Algorithm
STEP 1: Initialize the weights and bias to 0, i.e. w1 = 0, w2 = 0, ..., wn = 0, b = 0.
STEP 2: Steps 3-5 have to be performed for each input training vector and target output pair s:t (s = training input vector, t = training output vector).
STEP 3: Input unit activations are set; in most cases this is an identity function (one of the types of activation function) for the input layer: xi = si for i = 1 to n.
STEP 4: The output unit activation is set: y = t.
STEP 5: Weight and bias adjustments are performed:
wi(new) = wi(old) + xi * y
b(new) = b(old) + y
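A minimal sketch of these steps in Python is given below; the bipolar OR data used in the quick check is invented for illustration (the AND function is worked out by hand in the next slides).

```python
# Minimal sketch of the Hebb-rule training steps above (bipolar data assumed).
import numpy as np

def hebb_train(samples, targets):
    w = np.zeros(samples.shape[1])       # STEP 1: weights and bias start at 0
    b = 0.0
    for s, t in zip(samples, targets):   # STEP 2: loop over the training pairs s:t
        x, y = s, t                      # STEPS 3-4: identity activations, x = s and y = t
        w = w + x * y                    # STEP 5: wi(new) = wi(old) + xi * y
        b = b + y                        #         b(new)  = b(old)  + y
    return w, float(b)

# Quick check on the bipolar OR function (data invented for illustration).
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
t = np.array([1, 1, 1, -1])
print(hebb_train(X, t))                  # -> (array([2., 2.]), 2.0)
```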
Designing a Hebb network to implement the AND function: The AND function is very simple and widely known: the output is 1/SET/ON only if both inputs are 1/SET/ON. In the example below, -1 is used instead of 0 because the Hebb network uses bipolar data rather than binary data; with binary data the product terms in the above equations would often be 0, so no weight change would occur and the calculation would go wrong.
Starting with step 1, we initialize the weights and bias to 0, so w1 = w2 = b = 0.
A) First input [x1, x2, b] = [1, 1, 1] and target y = 1. Using the initial weights as the old weights and applying the Hebb rule (wi(new) = wi(old) + xi * y):
w1(new) = w1(old) + x1*y = 0 + 1*1 = 1
w2(new) = w2(old) + x2*y = 0 + 1*1 = 1
b(new) = b(old) + y = 0 + 1 = 1
These final weights act as the initial weights when the second input pattern is presented. Remember that the weight change here is Δwi = xi * y,
hence the weight changes relating to the first input are:
Δw1 = x1*y = 1*1 = 1
Δw2 = x2*y = 1*1 = 1
Δb = y = 1
B) Second input [x1, x2, b] = [1, -1, 1] and target y = -1. The weight changes here are:
Δw1 = x1*y = 1*(-1) = -1
Δw2 = x2*y = (-1)*(-1) = 1
Δb = y = -1
The new weights are:
w1(new) = w1(old) + Δw1 = 1 - 1 = 0
w2(new) = w2(old) + Δw2 = 1 + 1 = 2
b(new) = b(old) + Δb = 1 - 1 = 0
Similarly, using the same process for the third and fourth rows, we get a new table as follows:
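As a check, here is a short Python sketch (illustrative only) that reproduces the table step by step for the bipolar AND data:

```python
# Reproducing the AND-function Hebb-rule weight table step by step (bipolar data).
inputs  = [(1, 1), (1, -1), (-1, 1), (-1, -1)]   # (x1, x2)
targets = [1, -1, -1, -1]                        # bipolar AND
w1 = w2 = b = 0
print("x1  x2   y  dw1 dw2  db |  w1  w2   b")
for (x1, x2), y in zip(inputs, targets):
    dw1, dw2, db = x1 * y, x2 * y, y             # weight/bias changes for this pair
    w1, w2, b = w1 + dw1, w2 + dw2, b + db
    print(f"{x1:3d} {x2:3d} {y:3d} {dw1:4d} {dw2:3d} {db:3d} | {w1:3d} {w2:3d} {b:3d}")
# Final weights: w1 = 2, w2 = 2, b = -2, which realizes the bipolar AND function.
```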
Competitive Learning The architecture consists of a set of hierarchically layered units in which each layer connects, via excitatory connections, with the layer immediately above it, and has inhibitory connections to units in its own layer. In the most general case, each unit in a layer receives an input from each unit in the layer immediately below it and projects to each unit in the layer immediately above it. Moreover, within a layer, the units are broken into a set of inhibitory clusters in which all elements within a cluster inhibit all other elements in the cluster. Thus the elements within a cluster at one level compete with one another to respond to the pattern appearing on the layer below. The more strongly any particular unit responds to an incoming stimulus, the more it shuts down the other members of its cluster.
Competitive learning takes place in a context of sets of hierarchically layered units. Units are represented in the diagram as dots. Units may be active or inactive. Active units are represented by filled dots, inactive ones by open dots. In general, a unit in a given layer can receive inputs from all of the units in the next lower layer and can project outputs to all of the units in the next higher layer. Connections between layers are excitatory and connections within layers are inhibitory.
Each layer consists of a set of clusters of mutually inhibitory units. The units within a cluster inhibit one another in such a way that only one unit per cluster may be active. We think of the configuration of active units on any given layer as representing the input pattern for the next higher level. There can be an arbitrary number of such layers. A given cluster contains a fixed number of units, but different clusters can have different numbers of units.
Competitive learning has the following properties:
1. The units in a given layer are broken into several sets of non-overlapping clusters. Each unit within a cluster inhibits every other unit within the cluster. Within each cluster, the unit receiving the largest input achieves its maximum value while all other units in the cluster are pushed to their minimum value. We have arbitrarily set the maximum value to 1 and the minimum value to 0.
2. Every unit in every cluster receives inputs from all members of the same set of input units.
3. A unit learns if and only if it wins the competition with the other units in its cluster.
4. A stimulus pattern Sj consists of a binary pattern in which each element of the pattern is either active or inactive. An active element is assigned the value 1 and an inactive element is assigned the value 0.
5. Each unit has a fixed amount of weight (all weights are positive) that is distributed among its input lines. The weight on the line connecting to unit i on the upper layer from unit j on the lower layer is designated wij. The fixed total amount of weight for unit i is Σj wij = 1. A unit learns by shifting weight from its inactive to its active input lines. If a unit does not respond to a particular pattern, no learning takes place in that unit. If a unit wins the competition, then each of its input lines gives up some portion of its weight and that weight is then distributed equally among the active input lines.
Mathematically, this learning rule can be stated:
Δwij = 0, if unit i loses on stimulus k
Δwij = g * (activejk / nactivek) - g * wij, if unit i wins on stimulus k
where activejk is equal to 1 if, in stimulus pattern Sk, unit j in the lower layer is active and is zero otherwise; nactivek is the number of active units in pattern Sk; and g is a small positive learning-rate constant.
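A minimal sketch of this update follows. The two binary stimulus patterns, the learning rate g = 0.1, and the fixed initial weights (chosen slightly asymmetric so the run is deterministic; in practice they would start random) are all assumptions made purely for illustration.

```python
# Minimal competitive-learning sketch: the winning unit redistributes weight onto
# its active input lines, keeping its total weight equal to 1.
import numpy as np

patterns = np.array([[1, 1, 1, 0, 0, 0],     # binary stimulus patterns
                     [0, 0, 0, 1, 1, 1]], dtype=float)
g = 0.1                                      # learning-rate constant

# Each unit starts with positive weights summing to 1 across its input lines.
W = np.array([[0.20, 0.20, 0.15, 0.15, 0.15, 0.15],
              [0.15, 0.15, 0.15, 0.15, 0.20, 0.20]])

for _ in range(50):
    for s in patterns:
        winner = np.argmax(W @ s)            # the unit receiving the largest input wins
        n_active = s.sum()                   # nactivek: number of active input lines
        W[winner] += g * (s / n_active - W[winner])   # dwij = g*activejk/nactivek - g*wij

print(np.round(W, 2))   # each unit's weight concentrates on one pattern group (~1/3 per active line)
```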
Features of Competitive Learning Each cluster classifies the stimulus set into M groups, one for each unit in the cluster. Each of the units captures roughly an equal number of stimulus patterns. It is possible to consider a cluster as forming an M-valued feature in which every stimulus pattern is classified as having exactly one of the M possible values of this feature. Thus, a cluster containing two units acts as a binary feature detector. One element of the cluster responds when a particular feature is present in the stimulus pattern, otherwise the other element responds. If there is structure in the stimulus patterns, the units will break up the patterns along structurally relevant lines. Roughly speaking, this means that the system will find clusters if they are there.
If the stimuli are highly structured, the classifications are highly stable. If the stimuli are less well structured, the classifications are more variable, and a given stimulus pattern will be responded to first by one and then by another member of the cluster. The particular grouping done by a particular cluster depends on the starting value of the weights and the sequence of stimulus patterns actually presented. A large number of clusters, each receiving inputs from the same input lines can, in general, classify the inputs into a large number of different groupings or, alternatively, discover a variety of independent features present in the stimulus population. This can provide a kind of distributed representation of the stimulus patterns. To a first approximation, the system develops clusters that minimize within-cluster distance, maximize between-cluster distance, and balance the number of patterns captured by each cluster. In general, tradeoffs must be made among these various forces and the system selects one of these tradeoffs.
Applications of Competitive Learning Competitive learning is widely used in applications like vector quantization, pattern recognition, and feature extraction, making this learning technique an essential component of various artificial neural networks, including Self-Organizing Maps (SOMs) and Adaptive Resonance Theory (ART) networks.
Error-Correction Learning Error-Correction Learning, used with supervised learning, is the technique of comparing the system output to the desired output value and using that error to direct the training. In the most direct route, the error values can be used to directly adjust the tap weights, using an algorithm such as the backpropagation algorithm. If the system output is y, and the desired system output is known to be d, the error signal can be defined as: e = d - y. Error-correction learning algorithms attempt to minimize this error signal at each training iteration. The most popular learning algorithm for use with error-correction learning is the backpropagation algorithm.
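A minimal error-correction sketch for a single linear unit is shown below (a delta-rule style update rather than full backpropagation); the training data and learning rate are invented for illustration.

```python
# Minimal error-correction sketch: compute e = d - y and nudge the weights accordingly.
import numpy as np

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # inputs
d = np.array([1.0, 2.0, 3.0])                        # desired outputs
w = np.zeros(2)
eta = 0.1                                            # learning rate

for _ in range(200):
    for x, target in zip(X, d):
        y = w @ x                 # system output
        e = target - y            # error signal e = d - y
        w += eta * e * x          # weight update driven by the error
print(np.round(w, 2))             # -> [1. 2.], the weights that reproduce the targets
```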
Gradient Descent Gradient Descent is an optimization algorithm for finding a local minimum of a differentiable function. Gradient descent in machine learning is simply used to find the values of a function's parameters (coefficients) that minimize a cost function as far as possible. "A gradient measures how much the output of a function changes if you change the inputs a little bit." The gradient descent algorithm is used to minimize an error function g(y), through the manipulation of a weight vector w.
The cost function should be a linear combination of the weight vector and an input vector x. The algorithm is:
wij[n+1] = wij[n] - η ∇g(wij[n])
Here, η is known as the step-size parameter, and it affects the rate of convergence of the algorithm. If the step size is too small, the algorithm will take a long time to converge. If the step size is too large, the algorithm might oscillate or diverge. The gradient descent algorithm works by taking the gradient of the weight space to find the path of steepest descent. By following the path of steepest descent at each iteration, we will either find a minimum, or the algorithm could diverge if the weight space is infinitely decreasing. When a minimum is found, however, there is no guarantee that it is a global minimum.
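A minimal sketch of the update on a simple one-dimensional cost (the function, starting point, and step size are invented for illustration):

```python
# Gradient descent on a 1-D bowl, illustrating w[n+1] = w[n] - eta * grad g(w[n]).
def g(w):                         # cost function with its minimum at w = 3
    return (w - 3.0) ** 2

def grad_g(w):                    # its derivative
    return 2.0 * (w - 3.0)

w, eta = 0.0, 0.1                 # starting point and step-size parameter
for n in range(50):
    w = w - eta * grad_g(w)       # follow the direction of steepest descent
print(round(w, 4))                # -> 3.0 (a minimum; for this convex g it is also global)
```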
Supervised learning Supervised learning, as the name indicates, has the presence of a supervisor acting as a teacher. Supervised learning is when we teach or train the machine using data that is well labelled, which means some data is already tagged with the correct answer. After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data (the set of training examples) and produces a correct outcome from the labelled data. During the training of an ANN under supervised learning, the input vector is presented to the network, which produces an output vector. This output vector is compared with the desired/target output vector.
An error signal is generated if there is a difference between the actual output and the desired/target output vector. On the basis of this error signal, the weights are adjusted until the actual output matches the desired output. Types of Supervised Learning:
Regression: A regression problem is when the output variable is a real value, such as "dollars" or "weight".
Classification: A classification problem is when the output variable is a category, such as "red" or "blue", "disease" or "no disease".
Supervised learning deals with, or learns from, labelled data. This implies that some data is already tagged with the correct answer.
1. Regression Regression is a type of supervised learning that is used to predict continuous values, such as house prices, stock prices, or customer churn. Regression algorithms learn a function that maps from the input features to the output value. This algorithm produces a numerical target for each example, for instance, how much revenue will be generated from a new marketing campaign. 2. Classification Classification is a type of supervised learning that is used to predict categorical values, such as whether a customer will churn or not, whether an email is spam or not, or whether a medical image shows a tumor or not. Classification algorithms learn a function that maps from the input features to a probability distribution over the output classes.
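As a small hedged illustration of the two problem types, the sketch below uses scikit-learn (a library mentioned later in this unit); the tiny datasets are made up purely for demonstration.

```python
# Regression vs classification with scikit-learn (illustrative data only).
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a continuous value (e.g. a price) from one feature.
X_reg = [[1.0], [2.0], [3.0], [4.0]]
y_reg = [10.0, 20.0, 30.0, 40.0]
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[5.0]]))          # -> about 50.0

# Classification: predict a category (0 or 1) from one feature.
X_clf = [[0.1], [0.4], [0.6], [0.9]]
y_clf = [0, 0, 1, 1]
clf = LogisticRegression().fit(X_clf, y_clf)
print(clf.predict([[0.8]]))          # -> [1]
```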
Advantages of Supervised Learning:
Explicit Feedback: Supervised learning relies on labeled data, which provides explicit feedback on the model's predictions. This feedback is valuable for model training and improvement.
Predictive Accuracy: Supervised learning models can achieve high predictive accuracy when trained on high-quality, representative data. They are effective in tasks like classification and regression.
Generalization: Well-trained supervised models can generalize their knowledge to make accurate predictions on new, unseen data points, making them suitable for real-world applications.
Interpretability: Some supervised learning algorithms, like linear regression and decision trees, provide interpretable models that allow users to understand the relationships between input features and predictions.
Wide Range of Applications: Supervised learning can be applied to a wide range of domains, including healthcare, finance, natural language processing, computer vision, and more.
Availability of Tools and Libraries: There are numerous tools, libraries (e.g., scikit-learn, TensorFlow, PyTorch), and resources available for implementing and experimenting with supervised learning algorithms.
Disadvantages of Supervised Learning:
Data Labeling Requirement: Supervised learning relies on labeled data, which can be expensive and time-consuming to obtain, especially for large datasets.
Limited to Labeled Data: The model can only make predictions on data similar to what it was trained on, limiting its ability to handle novel or unexpected situations.
Bias and Noise in Labels: If labeled data contains biases or errors, the model may learn and perpetuate those biases, leading to unfair or inaccurate predictions.
Overfitting: There is a risk of overfitting, where the model learns the training data too well, capturing noise rather than the underlying patterns. Regularization techniques are often required to mitigate this.
Feature Engineering: Selecting and engineering relevant features is a crucial step in building effective supervised learning models. Poor feature selection can lead to suboptimal performance.
Scalability: Training complex models with large datasets can be computationally expensive and time-consuming, requiring substantial computing resources.
Limited to Labeled Data Distribution: Supervised models are constrained by the distribution of labeled data and may not perform well when faced with data from a different distribution.
Privacy Concerns: In some applications, the use of labeled data may raise privacy concerns, as it can reveal sensitive information about individuals.
Imbalanced Data: When dealing with imbalanced datasets (e.g., rare disease detection), supervised models may struggle to predict minority classes accurately.
Concept Drift: Over time, the relationship between input features and the target variable may change (concept drift). Supervised models may require constant retraining to adapt to these changes.