
Understanding Generalized Linear Models
Explore the application and benefits of Generalized Linear Models in insurance, data analysis, and regression. Learn about Simple Linear Regression, assumptions, data visualization, and mapping techniques. Discover why insurers prefer GLMs and their efficiency in data processing and interpretation.
Generalized Linear Models: Theory vs. Practice. Hannah Kaufmann, Patryk Wiech, Nathan Schuele. March 28, 2019.
Presentation Outline: Why use Generalized Linear Models? Introduction to Simple Linear Regression. Data Processing. What is a Generalized Linear Model? Choosing a Model. Validation. Interpreting Results.
Why do Insurers use GLMs? Efficiency; the ability to account for relationships between variables; flexibility based on the overall purpose; error statistics; accessible software.
Simple Linear Regression: Explores the relationship between a quantitative response variable (Y) and one explanatory variable (X). It is limited to a single pair of a response and an explanatory variable. The model is $\mu_{y|x} = \beta_0 + \beta_1 x$, where $\mu_{y|x}$ represents the true mean of the response variable y given the data from the explanatory variable x.
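As a rough illustration of the slide above, the intercept and slope of a simple linear regression can be estimated with the usual least-squares formulas. The function name and the data below are hypothetical and are not taken from the presentation.

```python
from statistics import mean

def fit_simple_linear_regression(x, y):
    """Return (beta0, beta1) minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: co-variation of x and y divided by the variation of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = s_xy / s_xx
    # Intercept: forces the fitted line through the point of means.
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Hypothetical data lying exactly on y = 2 + 3x.
x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 8.0, 11.0, 14.0]
beta0, beta1 = fit_simple_linear_regression(x, y)
print(beta0, beta1)
```

Because the sample points lie exactly on a line, the fit should recover an intercept of 2 and a slope of 3. Real data would add scatter around the fitted mean.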
Linear Regression Assumptions: Random component: each component of the response vector ($Y$) is normally distributed, and all share a common constant variance $\sigma^2$. Systematic component: the p covariates are combined to give the linear predictor $\eta$ such that $\eta = X\beta$. Link function: the identity function, such that $E[Y] = \mu = \eta$.
One-Way Charts: One-way charts are a good way to begin exploring your data. They let you analyze the reliability of the data through loss ratios and the exposure distribution. Correlation issues are difficult to detect with them. They help with selecting a reference level and mapping data.
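The numbers behind a one-way chart can be sketched as a simple aggregation by level of a single variable. The record layout (`exposure`, `premium`, `loss`), the `territory` variable, and the figures are assumptions made for illustration; the loss ratio is taken as losses divided by premium.

```python
from collections import defaultdict

def one_way_summary(records, variable):
    """Aggregate exposure share and loss ratio by level of one variable."""
    totals = defaultdict(lambda: {"exposure": 0.0, "premium": 0.0, "loss": 0.0})
    for rec in records:
        level = totals[rec[variable]]
        level["exposure"] += rec["exposure"]
        level["premium"] += rec["premium"]
        level["loss"] += rec["loss"]
    total_exposure = sum(t["exposure"] for t in totals.values())
    summary = {}
    for name, t in totals.items():
        summary[name] = {
            "exposure_share": t["exposure"] / total_exposure,
            "loss_ratio": t["loss"] / t["premium"],  # assumes premium > 0
        }
    return summary

# Hypothetical policy records.
policies = [
    {"territory": "urban", "exposure": 1.0, "premium": 1000.0, "loss": 800.0},
    {"territory": "urban", "exposure": 1.0, "premium": 1000.0, "loss": 400.0},
    {"territory": "rural", "exposure": 2.0, "premium": 1000.0, "loss": 500.0},
]
summary = one_way_summary(policies, "territory")
print(summary)
```

A summary like this looks at one variable at a time, which is why it cannot reveal correlation between variables.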
Mapping Your Data: Data can be either continuous or categorical, and continuous data can be transformed into categorical data. In general, we want to group similar data levels within the same variable (for example, miles driven or ages). If we map data into groups that are too small, results can be unreliable and non-predictive, and too many degrees of freedom may cause the model not to converge. If we map data into groups that are too large, we may miss an important factor.
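One possible way to map a continuous variable to categorical bands is a lookup against sorted band edges. The edges and labels below are hypothetical; in practice they would be chosen by looking at the data.

```python
from bisect import bisect_right

# Hypothetical band edges for driver age; each edge is the
# inclusive lower bound of the next band.
AGE_EDGES = [25, 40, 65]
AGE_LABELS = ["under 25", "25-39", "40-64", "65+"]

def map_to_band(value, edges, labels):
    """Return the label of the band that contains value."""
    # bisect_right counts how many edges are <= value,
    # which is exactly the index of the band.
    return labels[bisect_right(edges, value)]

ages = [18, 25, 39, 40, 70]
bands = [map_to_band(a, AGE_EDGES, AGE_LABELS) for a in ages]
print(bands)
```

Fewer, wider bands give each level more data but can hide real differences, while many narrow bands do the opposite, which is the trade-off the slide describes.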
Selecting a Reference Level: Choose a reference level that helps with the interpretability of your model. In general, we select the level with the most exposure as the reference level. This is done so that the significance statistics produce meaningful p-values: a reference level with too little data will produce less significant p-values than one with more.
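A minimal sketch of this rule, assuming exposure has already been totaled by level: pick the level with the most exposure, then encode every other level as an indicator so that the reference level corresponds to all zeros. The level names and exposure figures are hypothetical.

```python
def pick_reference_level(exposure_by_level):
    """Return the level carrying the most exposure."""
    return max(exposure_by_level, key=exposure_by_level.get)

def dummy_encode(level, levels, reference):
    """One indicator per non-reference level; the reference maps to all zeros."""
    return [1 if level == other else 0 for other in levels if other != reference]

# Hypothetical exposure totals by age band.
exposure_by_level = {"under 25": 1200.0, "25-39": 5400.0, "40-64": 7100.0, "65+": 2300.0}
levels = list(exposure_by_level)
reference = pick_reference_level(exposure_by_level)

print(reference)
print(dummy_encode("65+", levels, reference))
print(dummy_encode(reference, levels, reference))
```

With this encoding, each fitted parameter measures a level's difference from the reference level, so a well-populated reference gives every comparison a stable baseline.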
Why not Linear Regression? It ignores any interdependencies among the variables. It assumes that the response variable is normally distributed, that it has a constant variance, and that all predictors enter additively. Why Generalized Linear Models? They consider interdependencies and allow for multivariate analysis.
GLM Assumptions: Random component: each component of the response vector ($Y$) is independent and comes from one of the distributions in the exponential family. Systematic component: the p covariates are combined to give the linear predictor $\eta$ such that $\eta = X\beta$. Note: this is unchanged from the second assumption of linear regression. Link function: the relationship between the random and systematic components is specified via a link function $g$ that is differentiable and monotonic, such that $E[Y] = \mu = g^{-1}(\eta)$.
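To make the role of the link function concrete, the sketch below pairs three common links with their inverses and applies the inverse link to a linear predictor. The choice of links shown (identity, log, logit) is an assumption for illustration; the slides do not say which links the authors used.

```python
import math

# Each entry pairs a link g with its inverse g^{-1}.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}

def mean_from_linear_predictor(eta, link="log"):
    """Apply the inverse link: mu = g^{-1}(eta)."""
    _, inverse = LINKS[link]
    return inverse(eta)

mu_identity = mean_from_linear_predictor(0.7, "identity")
mu_log = mean_from_linear_predictor(0.0, "log")
mu_logit = mean_from_linear_predictor(0.0, "logit")
print(mu_identity, mu_log, mu_logit)
```

The identity link reproduces ordinary linear regression, while the log link keeps the mean positive and makes covariate effects multiplicative.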
Basics of a GLM: The standard form of a Generalized Linear Model is $\mu_i = E[Y_i] = g^{-1}\left(\sum_j X_{ij}\beta_j + \xi_i\right)$, with $\mathrm{Var}[Y_i] = \frac{\phi V(\mu_i)}{\omega_i}$. Here $Y_i$ is the response vector, $g(x)$ is the link function, $X_{ij}$ is the design matrix, $\beta_j$ is the vector of parameters, $\xi_i$ is the vector of offsets, $\phi$ is the scale parameter of $V(x)$, $V(x)$ is the variance function ($\sigma^2$), and $\omega_i$ is the prior weight.
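Reading the standard form as mu_i = g^{-1}(sum_j X_ij * beta_j + xi_i) with Var[Y_i] = phi * V(mu_i) / omega_i, the two quantities can be evaluated for a single record as follows. This is only a sketch: the log link, the Poisson variance function, and the parameter values are assumptions chosen for a claim-frequency example, not values from the presentation.

```python
import math

def glm_mean(x_row, beta, offset=0.0, inverse_link=math.exp):
    """mu_i = g^{-1}(sum_j X_ij * beta_j + xi_i); log link assumed by default."""
    eta = sum(x * b for x, b in zip(x_row, beta)) + offset
    return inverse_link(eta)

def glm_variance(mu, phi=1.0, weight=1.0, variance_function=lambda m: m):
    """Var[Y_i] = phi * V(mu_i) / omega_i; Poisson V(mu) = mu assumed by default."""
    return phi * variance_function(mu) / weight

# Hypothetical record: an intercept plus one indicator variable,
# with log(exposure) as the offset for half a policy-year.
x_row = [1.0, 1.0]
beta = [math.log(0.10), math.log(1.5)]
exposure = 0.5
mu = glm_mean(x_row, beta, offset=math.log(exposure))
var = glm_variance(mu)
print(mu, var)
```

Under a log link the parameters act as multiplicative relativities, so the expected claim count here is the base frequency 0.10 times the relativity 1.5 times the exposure 0.5.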
Exponential Family of Distributions: Includes the Normal, Poisson, Binomial, Gamma, and Inverse Gaussian distributions. The variance function depends on the distribution chosen. Most of these distributions have a variance that strictly increases with the mean, which fits the expectation that riskier policyholders have higher variance.
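The standard variance functions for the five distributions listed can be written down directly. The dictionary and function names are hypothetical; the formulas V(mu) are the textbook ones.

```python
# Variance function V(mu) for each distribution on the slide.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,                 # constant variance
    "poisson": lambda mu: mu,
    "binomial": lambda mu: mu * (1.0 - mu),   # mu is a proportion in (0, 1)
    "gamma": lambda mu: mu ** 2,
    "inverse_gaussian": lambda mu: mu ** 3,
}

def variance(distribution, mu, phi=1.0):
    """Variance implied by the chosen distribution: phi * V(mu)."""
    return phi * VARIANCE_FUNCTIONS[distribution](mu)

for name in ("normal", "poisson", "gamma", "inverse_gaussian"):
    print(name, variance(name, 2.0), variance(name, 4.0))
```

Doubling the mean leaves the Normal variance unchanged, doubles the Poisson variance, quadruples the Gamma variance, and multiplies the Inverse Gaussian variance by eight.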
Choosing a Model: Selecting a model is more of an art than a science. As models are built and results are analyzed, the model is likely to evolve. Decisions include choosing the target and predictor variables, choosing a distribution for the target variable, finding the best form of the predictor variables, and deciding which variables to include.
Choosing a Model: Compare measures of fit. Non-penalized measures: log-likelihood and deviance. Penalized measures: AIC and BIC. Analyzing residuals: they should follow no predictable pattern and be normally distributed with constant variance (homoscedastic). Any deviation can indicate that the assumed underlying distribution is incorrect.
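The penalized measures can be computed from a model's log-likelihood using the standard definitions AIC = 2k - 2*loglik and BIC = k*ln(n) - 2*loglik, where k is the number of parameters and n the number of observations. The two fits below are hypothetical numbers made up to show the comparison.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits: model B adds three parameters for a small likelihood gain.
n_obs = 1000
model_a = {"loglik": -1520.0, "k": 5}
model_b = {"loglik": -1518.5, "k": 8}

aic_a, aic_b = aic(model_a["loglik"], model_a["k"]), aic(model_b["loglik"], model_b["k"])
bic_a = bic(model_a["loglik"], model_a["k"], n_obs)
bic_b = bic(model_b["loglik"], model_b["k"], n_obs)
print(aic_a, aic_b, bic_a, bic_b)
```

Model B has the higher log-likelihood, but both penalized measures favor model A because the gain does not justify the extra parameters. BIC penalizes each parameter more heavily than AIC once n exceeds about 7.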
Bias vs. Variance Trade-Off: Bias is the expected prediction minus the correct value. A high-bias model pays little attention to the training data, which leads to high error on both the training data and the test data. Variance is the variability of the predictions across the data. A high-variance model pays a lot of attention to the training data, which leads to overfitting and to high error on the test data.
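For squared-error loss, the trade-off is usually written as the decomposition below. It assumes a data-generating process $y = f(x) + \varepsilon$ with $E[\varepsilon] = 0$ and $\mathrm{Var}[\varepsilon] = \sigma^2$, and a fitted model $\hat{f}$ whose randomness comes from the training sample; this notation is not from the slides.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \sigma^2
```

The last term is irreducible noise, so model choice can only trade the first two terms against each other.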
P-Values: An estimate of the probability that a value of a given magnitude (or higher) arises by pure chance. Example: $P(|\beta_0| \ge 1.5) = p = 0.0012$ leads to the result that $\beta_0$ is significant, or that it is likely that $\beta_0 \ne 0$. Example: $P(|\beta_0| \ge 1.5) = p = 0.52$ leads to the result that $\beta_0$ is insignificant, or that it is likely that $\beta_0 = 0$. The effect of $\beta_0$ may be present in the data set, but would need to be seen from a macro-level.
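One common way to obtain such a p-value is a two-sided test that treats the estimate divided by its standard error as approximately standard normal. Whether the authors used this test is not stated, and the standard errors below are hypothetical values chosen so that the results land near the two p-values quoted on the slide.

```python
import math

def two_sided_p_value(estimate, standard_error):
    """P(|Z| >= |estimate / standard_error|) for a standard normal Z."""
    z = abs(estimate / standard_error)
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)), the two-sided tail area.
    return math.erfc(z / math.sqrt(2.0))

# The same estimate of 1.5 under two hypothetical standard errors.
p_precise = two_sided_p_value(1.5, 0.463)  # z is about 3.24
p_noisy = two_sided_p_value(1.5, 2.33)     # z is about 0.64
print(round(p_precise, 4), round(p_noisy, 2))
```

The same estimate is highly significant when it is measured precisely and insignificant when its standard error is large, which is why a level with little data tends to produce weaker p-values.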
Simple Quantile Plots: A validation technique that compares the actual results with the predicted results. Judgment is needed to determine the best model, based on predictive accuracy, monotonicity, and vertical separation.
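The data behind a simple quantile plot can be sketched as follows: sort records by predicted value, split them into equal-count buckets, and average the predicted and actual values in each bucket. The helper names and figures are hypothetical, and measuring vertical separation as the ratio of the top bucket's actual average to the bottom bucket's is an assumption, since the slides do not define it.

```python
def quantile_table(predicted, actual, n_buckets):
    """Sort by predicted value, split into equal-count buckets, average each.

    Assumes at least n_buckets records so that no bucket is empty.
    """
    pairs = sorted(zip(predicted, actual))
    n = len(pairs)
    rows = []
    for b in range(n_buckets):
        chunk = pairs[b * n // n_buckets:(b + 1) * n // n_buckets]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        mean_act = sum(a for _, a in chunk) / len(chunk)
        rows.append((mean_pred, mean_act))
    return rows

def is_monotonic(rows):
    """True if the average actual value never decreases across buckets."""
    actuals = [a for _, a in rows]
    return all(lo <= hi for lo, hi in zip(actuals, actuals[1:]))

def vertical_separation(rows):
    """Ratio of the highest bucket's average actual to the lowest bucket's."""
    return rows[-1][1] / rows[0][1]

# Hypothetical predicted and actual loss costs for eight policies.
predicted = [100, 120, 150, 180, 220, 260, 300, 400]
actual = [90, 130, 140, 200, 210, 280, 310, 380]
rows = quantile_table(predicted, actual, n_buckets=4)
print(rows, is_monotonic(rows), vertical_separation(rows))
```

Predictive accuracy shows up as bucket averages of predicted and actual values lying close together, monotonicity as actual averages rising from bucket to bucket, and vertical separation as a wide gap between the first and last buckets.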
Analyzing Results: Simplifying models. Smoothing results post-model, either outside of the model or by rerunning the model. Communicating results to clients. Business considerations: the reliability of the results, and how feasible it is to implement the rates.
Thank You for Your Attention. Hannah Kaufmann, (309) 807 2304, hkaufmann@pinnacleactuaries.com; Patryk Wiech, (678) 894 7263, pwiech@pinnacleactuaries.com; Nathan Schuele, (217) 278 9132, npschu1@ilstu.edu. Commitment Beyond Numbers.