
Understanding Bias and Variance in Machine Learning Algorithms
Learn about supervised machine learning, regression techniques, and the concepts of bias and variance in machine learning. Discover how errors in machine learning impact model accuracy and the differences between reducible and irreducible errors. Gain insights into the importance of reducing bias errors for more accurate predictions in data analysis.
Presentation Transcript
Unit-III: Supervised Learning: Regression
Department of Computer, Institute of Engineering, BHUJBAL KNOWLEDGE CITY
Supervised Machine Learning: It is an ML technique where models are trained on labeled data, i.e. the output variable is provided for each example. The models learn a mapping function from the input variables to the output variable (the labels). Regression and Classification problems are both part of Supervised Machine Learning.
Bias and Variance in Machine Learning: Machine learning allows machines to perform data analysis and make predictions. However, if the machine learning model is not accurate, it makes prediction errors, and these prediction errors are usually described in terms of bias and variance. In machine learning, some error is always present, since there is always a slight difference between the model's predictions and the actual values. The main aim of ML/data science analysts is to reduce these errors in order to get more accurate results.
Errors in Machine Learning: In machine learning, an error is a measure of how accurately an algorithm makes predictions on previously unseen data. On the basis of these errors, we select the machine learning model that performs best on the particular dataset. There are mainly two types of errors in machine learning:
Reducible errors: these errors can be reduced to improve model accuracy, and can be further classified into bias and variance.
Irreducible errors: these errors will always be present in the model, regardless of which algorithm is used.
Bias: In general, a machine learning model analyses the data, finds patterns in it, and makes predictions. During training, the model learns these patterns in the dataset and applies them to test data for prediction. When making predictions, a difference occurs between the values predicted by the model and the actual/expected values; this difference is known as bias error, or error due to bias. Bias can be defined as the inability of a machine learning algorithm such as Linear Regression to capture the true relationship between the data points.
Each algorithm begins with some amount of bias, because bias arises from the assumptions built into the model. A model has either:
Low Bias: a low-bias model makes fewer assumptions about the form of the target function.
High Bias: a high-bias model makes more assumptions and becomes unable to capture the important features of the dataset. A high-bias model also cannot perform well on new data.
Examples of machine learning algorithms with low bias are Decision Trees, k-Nearest Neighbours, and Support Vector Machines, whereas algorithms with high bias include Linear Regression, Linear Discriminant Analysis, and Logistic Regression.
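To make the distinction concrete, here is a minimal sketch, not from the original slides: the synthetic sine-curve data and the choice of models are illustrative assumptions. It fits a high-bias learner (Linear Regression) and a low-bias learner (an unpruned Decision Tree) on the same non-linear data and compares their training errors.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

# Linear Regression assumes a straight-line relationship -> high bias on sine-shaped data
linear = LinearRegression().fit(X, y)
# An unpruned decision tree makes far fewer assumptions -> low bias
tree = DecisionTreeRegressor(random_state=0).fit(X, y)

print("Linear Regression training MSE:", mean_squared_error(y, linear.predict(X)))
print("Decision Tree training MSE:   ", mean_squared_error(y, tree.predict(X)))

The high-bias model fails to capture the curve even on the data it was trained on, which matches the description above.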
Variance: Variance specifies the amount by which the prediction would change if a different training dataset were used. In simple words, variance tells us how much a random variable differs from its expected value. Ideally, a model should not vary too much from one training dataset to another, which means the algorithm should be good at capturing the hidden mapping between the input and output variables. Variance errors are classified as either low variance or high variance.
Low variance means there is only a small variation in the prediction of the target function when the training dataset changes, while high variance means a large variation in the prediction of the target function with changes in the training dataset. A model with high variance learns the training dataset very well but does not generalize well to unseen data. As a result, such a model gives good results on the training dataset but shows high error rates on the test dataset. A model with high variance has the following problems: it leads to overfitting, and it increases model complexity.
Some examples of machine learning algorithms with low variance are Linear Regression, Logistic Regression, and Linear Discriminant Analysis, whereas algorithms with high variance include Decision Trees, Support Vector Machines, and k-Nearest Neighbours.
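Variance can be illustrated with a similar hedged sketch (the synthetic data and sample sizes are assumptions): train the same algorithm on two different random samples of the data and compare its predictions at fixed test points. A large gap between the two prediction vectors indicates high variance.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(1)
X = rng.uniform(0, 6, (200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 200)
X_test = np.linspace(0, 6, 5).reshape(-1, 1)

for name, Model in [("Linear Regression", LinearRegression),
                    ("Decision Tree", DecisionTreeRegressor)]:
    preds = []
    for seed in (0, 1):
        # Two different random training samples of the same size
        idx = np.random.RandomState(seed).choice(200, 50, replace=False)
        preds.append(Model().fit(X[idx], y[idx]).predict(X_test))
    print(name, "max gap between the two fits:", round(float(np.abs(preds[0] - preds[1]).max()), 3))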
Different Combinations of Bias-Variance
1. Low-Bias, Low-Variance: The combination of low bias and low variance describes an ideal machine learning model; however, it is rarely achievable in practice.
2. Low-Bias, High-Variance: With low bias and high variance, model predictions are inconsistent but accurate on average. This case occurs when the model learns a large number of parameters, which leads to overfitting.
3. High-Bias, Low-Variance: With high bias and low variance, predictions are consistent but inaccurate on average. This case occurs when the model does not learn well from the training dataset or uses too few parameters, and it leads to underfitting.
4. High-Bias, High-Variance: With high bias and high variance, predictions are both inconsistent and inaccurate on average.
How to identify High Variance or High Bias? High variance can be identified when the model has a low training error and a high test error. High bias can be identified when the model has a high training error and the test error is almost the same as the training error.
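The rule of thumb above can be checked in code. This is a minimal sketch under assumptions (synthetic data and a deliberately unconstrained decision tree), not a recipe from the slides:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(42)
X = rng.uniform(0, 6, (300, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = DecisionTreeRegressor(random_state=42).fit(X_train, y_train)
train_err = mean_squared_error(y_train, model.predict(X_train))
test_err = mean_squared_error(y_test, model.predict(X_test))
print(f"training MSE = {train_err:.3f}, test MSE = {test_err:.3f}")

# Low training error but much higher test error  -> high variance (overfitting)
# High training error with a similar test error  -> high bias (underfitting)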
Bias-Variance Trade-Off: While building a machine learning model, it is really important to take care of bias and variance in order to avoid overfitting and underfitting. If the model is very simple, with few parameters, it may have low variance and high bias, whereas if the model has a large number of parameters, it will have high variance and low bias. So we need to strike a balance between the bias error and the variance error, and this balance is known as the Bias-Variance trade-off.
For accurate predictions, an algorithm needs both low variance and low bias. But this is difficult because bias and variance are related to each other: if we decrease the variance, the bias tends to increase, and if we decrease the bias, the variance tends to increase.
Linear Regression: Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a statistical method used for predictive analysis. Linear regression predicts continuous/real or numeric variables such as sales, salary, age, product price, etc. The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more independent variables (X), hence the name linear regression. Because the relationship is linear, the model describes how the value of the dependent variable changes as the value of the independent variable changes.
Mathematically, we can represent a linear regression as:
y = a0 + a1x + ε
where:
y = dependent variable (target variable)
x = independent variable (predictor variable)
a0 = intercept of the line
a1 = linear regression coefficient (slope)
ε = random error
The values of the x and y variables are the training data used to fit the Linear Regression model.
Types of Linear Regression:
Simple Linear Regression: If a single independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Simple Linear Regression.
Multiple Linear Regression: If more than one independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Multiple Linear Regression.
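The difference shows up directly in how the feature matrix is built. A minimal sketch, with toy data and column names that are purely illustrative:

import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({"area": [50, 70, 90, 120],
                   "rooms": [1, 2, 3, 4],
                   "price": [100, 150, 190, 260]})

# Simple linear regression: one independent variable
simple = LinearRegression().fit(df[["area"]], df["price"])
# Multiple linear regression: several independent variables
multiple = LinearRegression().fit(df[["area", "rooms"]], df["price"])

print(simple.coef_, multiple.coef_)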
Linear Regression Line: A straight line showing the relationship between the dependent and independent variables is called a regression line. A regression line can show two types of relationship:
Positive Linear Relationship: if the dependent variable increases on the Y-axis as the independent variable increases on the X-axis, the relationship is termed a positive linear relationship.
Negative Linear Relationship: if the dependent variable decreases on the Y-axis as the independent variable increases on the X-axis, the relationship is called a negative linear relationship.
Finding the best-fit line: When working with linear regression, our main goal is to find the best-fit line, which means the error between the predicted values and the actual values should be minimized; the best-fit line has the least error. Different values of the weights or coefficients of the line (a0, a1) give different regression lines, so we need to find the best values of a0 and a1, and for this we use a cost function. The cost function is used to optimize the regression coefficients or weights and measures how well a linear regression model is performing. We use the cost function to assess the accuracy of the mapping function that maps the input variable to the output variable; this mapping function is also known as the hypothesis function.
For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and the actual values. It can be written as:
MSE = (1/N) * Σ (yi - (a1xi + a0))^2
where:
N = total number of observations
yi = actual value
(a1xi + a0) = predicted value
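As a quick illustration (the numbers and the assumed coefficients a0 and a1 below are made up), the MSE can be computed directly from this definition:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y_actual = np.array([2.1, 3.9, 6.2, 8.1])

a0, a1 = 0.1, 2.0                  # assumed intercept and slope
y_pred = a1 * x + a0               # predicted values from the line

mse = np.mean((y_actual - y_pred) ** 2)   # (1/N) * sum of squared errors
print(mse)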
Residuals: The distance between an actual value and the corresponding predicted value is called a residual. If the observed points are far from the regression line, the residuals will be large and so the cost function will be high. If the scatter points are close to the regression line, the residuals will be small and hence the cost function will be low.
Gradient Descent: Gradient descent is used to minimize the MSE by calculating the gradient of the cost function. A regression model uses gradient descent to update the coefficients of the line so as to reduce the cost function. This is done by starting from randomly chosen coefficient values and then iteratively updating them until the cost function reaches its minimum.
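Here is a hedged sketch of gradient descent for simple linear regression; the learning rate, iteration count, and toy data are illustrative assumptions rather than values from the slides:

import numpy as np

def gradient_descent(x, y, lr=0.01, n_iters=5000):
    a0, a1 = 0.0, 0.0                         # start from arbitrary coefficients
    n = len(x)
    for _ in range(n_iters):
        y_pred = a1 * x + a0
        # Gradients of the MSE cost with respect to a0 and a1
        d_a0 = (-2 / n) * np.sum(y - y_pred)
        d_a1 = (-2 / n) * np.sum((y - y_pred) * x)
        a0 -= lr * d_a0                       # step in the direction that lowers the cost
        a1 -= lr * d_a1
    return a0, a1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 4.1, 6.1, 7.9, 10.2])
print(gradient_descent(x, y))                 # coefficients close to a0 ~ 0, a1 ~ 2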
Popular regression techniques: Ridge and Lasso Regression.
Ridge Regression: Ridge Regression, also known as L2 regularization, is an extension of linear regression that introduces a regularization term to reduce model complexity and help prevent overfitting. In simple terms, Ridge Regression minimizes the sum of the squared residuals plus the squared values of the parameters scaled by a factor lambda (λ).
Ridge Regression shrinks the coefficients of less significant features close to zero, but not exactly zero. By doing so, it reduces the model's complexity while still preserving its interpretability.
An Example of Ridge Regression: Let's consider a data set with three explanatory variables: A, B, and C. You can use Ridge Regression to determine how each of these variables affects the response variable Y. Ridge Regression adds a regularization term (λ) to the objective to reduce the overall complexity of the model. The modified objective is:
minimize Σ (yi - ŷi)^2 + λ Σ βj^2
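A minimal sketch of this example with scikit-learn follows. The synthetic data for A, B, and C and the alpha value are assumptions; note that scikit-learn calls the penalty factor lambda "alpha".

import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["A", "B", "C"])
# Y depends strongly on A, weakly on B, and not at all on C (plus noise)
df["Y"] = 3 * df["A"] + 0.5 * df["B"] + rng.normal(0, 0.1, 100)

ridge = Ridge(alpha=1.0)                      # alpha plays the role of lambda (L2 penalty strength)
ridge.fit(df[["A", "B", "C"]], df["Y"])
print(dict(zip(["A", "B", "C"], ridge.coef_.round(3))))
# The coefficient of C is shrunk close to zero, but not exactly zero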
Ridge Regression offers other benefits, such as improved generalization accuracy and reduced variance. It helps reduce model complexity while preserving interpretability and preventing overfitting. It is a useful technique for many data science problems.
Lasso Regression: Lasso (Least Absolute Shrinkage and Selection Operator) Regression is another regularization technique that prevents overfitting in linear regression models. Like Ridge Regression, Lasso Regression adds a regularization term to the linear regression objective function. The difference lies in the penalty used: Lasso Regression uses L1 regularization, which adds the sum of the absolute values of the coefficients multiplied by the penalty factor λ. L1 regularization, or Lasso Regression, seeks to minimize the following:
minimize Σ (yi - ŷi)^2 + λ Σ |βj|
Unlike Ridge Regression, Lasso Regression can force coefficients of less significant features to be exactly zero. As a result, Lasso Regression performs both regularization and feature selection simultaneously.
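A matching sketch for Lasso, under the same assumptions as the Ridge example (synthetic A, B, C data; the alpha value is illustrative), shows the feature-selection effect:

import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["A", "B", "C"])
df["Y"] = 3 * df["A"] + 0.5 * df["B"] + rng.normal(0, 0.1, 100)

lasso = Lasso(alpha=0.1)                      # alpha plays the role of lambda (L1 penalty strength)
lasso.fit(df[["A", "B", "C"]], df["Y"])
print(dict(zip(["A", "B", "C"], lasso.coef_.round(3))))
# Typically the coefficient of the irrelevant feature C is driven to exactly 0.0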
Evaluation Metrics: MAE, RMSE, R2. Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) are metrics used to evaluate a regression model. These metrics tell us how accurate our predictions are and what the amount of deviation from the actual values is. Technically, RMSE is the root of the mean of the squared errors and MAE is the mean of the absolute values of the errors. Here, errors are the differences between the predicted values (values predicted by our regression model) and the actual values of the variable. They are calculated as follows:
RMSE = sqrt((1/N) * Σ (yi - ŷi)^2)
MAE = (1/N) * Σ |yi - ŷi|
On close inspection, you will see that both are averages of errors. Let's understand this with an example. Say I want to predict the salary of a data scientist based on the number of years of experience. So salary is my target variable (Y) and experience is the independent variable (X). I have some random data on X and Y, and we will use Linear Regression to predict salary. Let's use pandas and scikit-learn for data loading and creating the linear model.
import pandas as pd
from sklearn.linear_model import LinearRegression

# Salary against years of experience
sal_data = {"Experience": [2, 2.2, 2.8, 4, 7, 8, 11, 12, 21, 25],
            "Salary": [7, 8, 11, 15, 22, 29, 37, 45.7, 49, 52]}

#Load data into a pandas DataFrame
df = pd.DataFrame(sal_data)
df.head(3)
#Selecting X and y variables
X = df[['Experience']]
y = df.Salary

#Creating a Simple Linear Regression Model to predict salaries
lm = LinearRegression()
lm.fit(X, y)

#Prediction of salaries by the model
yp = lm.predict(X)
print(yp)

Output:
[12.23965934 12.64846842 13.87489568 16.32775018 22.45988645 24.50393187
 30.63606813 32.68011355 51.07652234 59.25270403]
Plot of the predictions (actual data points vs. the fitted regression line):
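The plot itself can be reproduced with a short sketch, assuming matplotlib is available (the styling choices are arbitrary):

import matplotlib.pyplot as plt

plt.scatter(X, y, label="Actual salary")            # observed data points
plt.plot(X, yp, color="red", label="Predicted")     # fitted regression line
plt.xlabel("Experience (years)")
plt.ylabel("Salary")
plt.legend()
plt.show()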
From the graph above, we see that there is a gap between the predicted and actual data points. Statistically, this gap/difference is called the residual (commonly called the error), and it is what RMSE and MAE are built from. Scikit-learn provides a metrics library to calculate these values.
import numpy as np

print(f'Residuals: {y-yp}')

#RMSE and MAE computed manually
np.sqrt(np.mean(np.square(y-yp)))   #RMSE
np.mean(abs(y-yp))                  #MAE

#RMSE/MAE computation using sklearn library
from sklearn.metrics import mean_squared_error, mean_absolute_error
np.sqrt(mean_squared_error(y, yp))
mean_absolute_error(y, yp)
Output: RMSE = 6.48, MAE = 5.68. MAE is around 5.7, which seems high. Now our goal is to improve this model by reducing this error. Let's apply a polynomial transformation to experience (X), fit the same model, and see whether the errors reduce.
from sklearn.preprocessing import PolynomialFeatures

#Polynomial transformation of Experience (degree 2 by default)
pf = PolynomialFeatures()
X_poly = pf.fit_transform(X)

lm.fit(X_poly, y)
yp = lm.predict(X_poly)

#RMSE and MAE
np.sqrt(np.mean(np.square(y-yp)))
np.mean(abs(y-yp))

Output: RMSE = 2.3974, MAE = 1.6386
Plot of the polynomial model's predictions against the actual data points:
There is a third metric, the R-squared score, commonly used for regression models. It measures the proportion of the variation in the target variable that can be explained by our model. It is also called the coefficient of determination and is calculated by the formula:
R^2 = 1 - (sum of squares of residuals) / (total sum of squares) = 1 - Σ (yi - ŷi)^2 / Σ (yi - ȳ)^2
Let's compute R2 both manually, using the formula, and with the sklearn library, and compare the values. Both methods should give the same result.

#Calculating R-Squared manually
a = sum(np.square(y-yp))            # a -> sum of squares of residuals
b = sum(np.square(y-np.mean(y)))    # b -> total sum of squares
r2_value = 1 - (a/b)                # 0.979

#Calculating R2 using sklearn
from sklearn.metrics import r2_score
print(r2_score(y, yp))              # 0.979
Thus, overall we can interpret that the model explains about 98% of the variation in the target, and the typical prediction error is around 2 units (RMSE ≈ 2.4). For an ideal model, RMSE/MAE = 0 and the R2 score = 1, and all the residual points lie on the X-axis. Achieving such values for any real business problem is almost impossible!