
Advanced Model Evaluation Techniques in Psychometrics
Explore the world of psychometric model evaluation techniques such as RMSEA, fit-to-data evaluation, and the χ² ratio test in this detailed guide by Prof. Dr. Gavin T. L. Brown. Understand how these methods help assess model fit and reliability, with insights from renowned researchers like Marsh, Hau, and Wen.
Presentation Transcript
Fit & Estimation
HSE Psychometric School, August 2019
Prof. Dr. Gavin T. L. Brown, University of Auckland & Umeå University
Model Evaluation: Fit to Data
Because of MLE, it is possible to evaluate the fit of the model relative to the data by comparing the distributions. The chi-square (χ²) test is the foundation of model evaluation: it tests the difference between the observed (data) and expected (model) distributions, adjusted by the number of parameters and cases (degrees of freedom). However, χ² falsely penalises large N (i.e., >100) and large numbers of manifest variables, so it is a poor stand-alone test, notwithstanding vehement objections by some researchers. Wheaton, B., Muthén, B., Alwin, D. F., & Summers, G. F. (1977). Assessing reliability and stability in panel models. Sociological Methodology, 8, 84-136. doi:10.2307/270754
Marsh, H. W., Hau, K.-T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler's (1999) findings. Structural Equation Modeling, 11(3), 320-341. doi:10.1207/s15328007sem1103_2
χ² ratio test
To adjust for the complexity of a model, find the ratio of χ² to df: how much χ² per 1 df? Then determine the p value for that ratio (e.g., https://www.fourmilab.ch/rpkp/experiments/analysis/chiCalc.html). NB: all ratio values <3.80 will have p>.05; many will prefer a ratio <3.00.
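The ratio heuristic can be checked numerically without a web calculator. A minimal Python sketch (independent of the deck's R/lavaan workflow), implementing the chi-square survival function from the regularized incomplete gamma series rather than taking it from a statistics library:

```python
import math

def chi2_sf(x, df):
    """Survival function P(X > x) for a chi-square variable with df
    degrees of freedom, via the regularized lower incomplete gamma
    series P(a, x) = x^a e^-x / Gamma(a) * sum x^n / (a(a+1)...(a+n))."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = a
    while term > 1e-16 * total:
        n += 1.0
        term *= half_x / n
        total += term
    p_lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return 1.0 - p_lower

# Treat the chi-square/df ratio as a 1-df chi-square: values below the
# critical value 3.841 are non-significant at alpha = .05, which is the
# "< 3.80 gives p > .05" rule of thumb.
ratio = 9.31 / 8              # example model: chi-square = 9.31, df = 8
print(round(ratio, 2))        # 1.16
print(round(chi2_sf(ratio, 1), 2))  # 0.28
```

The same function also recovers the overall model p value directly: `chi2_sf(9.31, 8)` is about .32, matching the example model reported later in this deck.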
Root Mean Square Error of Approximation (RMSEA)
Compensates for model complexity and returns the index of fit to the original metric of the covariance matrix: RMSEA = √(F̂/df), where F̂ is the estimated population discrepancy function; in terms of the test statistic, RMSEA = √(max(χ² − df, 0)/(df(N − 1))). The truncated estimate forces the lower value to be 0, adjusting for χ², the number of cases (N), and df. But this is a point in a range, due to standard error, so a 90% CI is desirable: for any observation on a statistic there will be a sampling error. Steiger: "My original intention was that the RMSEA be employed in a more relaxed, heuristic fashion, both as an improvement on and a release from hypothesis testing." An RMSEA ≤ .10 is an interesting model. Steiger, J. H. (2000). Point estimation, hypothesis testing, and interval estimation using the RMSEA: Some comments and a reply to Hayduk and Glaser. Structural Equation Modeling, 7(2), 149-162. doi:10.1207/S15328007SEM0702_1
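The point estimate is easy to reproduce from the test statistic. A Python sketch of the commonly given formula, RMSEA = √(max(χ² − df, 0)/(df(N − 1))); note some programs divide by N rather than N − 1, so published values can differ slightly:

```python
import math

def rmsea(chisq, df, n):
    """Point estimate of RMSEA: sqrt(max(chisq - df, 0) / (df * (n - 1))).
    The max(..., 0) truncation is what forces the lower bound to zero."""
    return math.sqrt(max(chisq - df, 0.0) / (df * (n - 1)))

print(round(rmsea(100.0, 50, 201), 3))  # 0.071 (well-fitting model)
print(rmsea(40.0, 50, 201))             # 0.0 (chisq < df truncates to zero)
```

The truncation is visible in the second call: when χ² is below its degrees of freedom, the estimate is exactly zero rather than imaginary.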
RMSEA pclose
RMSEA is a point estimate in the middle of a range; the 90% confidence interval should be reported. The PCLOSE statistic shows whether it is probable that RMSEA is <.05; its accuracy is affected by sample size.

Model          RMSEA   LO 90   HI 90   PCLOSE
Default        .048    .045    .051    .899
Independence   .127    .124    .129    .000

Comparison to the independence model is not terribly interesting. The real question should be: is there a better model to explain these responses than the model I have used?
Fan, X., & Sivo, S. A. (2007). Sensitivity of fit indices to model misspecification and model types. Multivariate Behavioral Research, 42(3), 509-529. doi:10.1080/00273170701382864
RMSEA: not stable
RMSEA is robust against sample size, but not so much against mis-specification or model type: "RMSEA is dramatically higher for the two smaller models, suggesting much worse model fit of these two smaller models compared with the previous three larger models. This confirms the finding of Kenny and McCoach that RMSEA decreases (indicating better fit) as the model becomes larger." (p. 525)
Fan, X., & Sivo, S. A. (2007). Sensitivity of fit indices to model misspecification and model types. Multivariate Behavioral Research, 42(3), 509-529. doi:10.1080/00273170701382864
Comparative Fit Index: also not stable
An explicit population comparative fit coefficient that evaluates the adequacy of a particular model (Mk) in relation to Mi (the null model) and Ms (the saturated model, with all paths in the S matrix). "CFI and TLI tend to suggest worse model fit as the number of observed variables increased." Somewhat sensitive to different model types with more than 3 factors.
Standardised Root Mean Square Residual (SRMR): also not stable
An absolute measure of fit, defined as the standardized difference between the observed correlations and the predicted correlations; a value of zero indicates perfect fit. It is a positively biased measure, and that bias is greater for small-N and for low-df studies. There is no penalty for model complexity, BUT it is sensitive to model type.
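The definition translates directly into code. A Python sketch of SRMR over correlation matrices (the 3 × 3 matrices are made up purely for illustration; with mean structures the diagonal and mean residuals also enter the sum, which this sketch omits):

```python
import math

def srmr(observed, implied):
    """SRMR: root mean square of the differences between observed and
    model-implied correlations over the unique off-diagonal elements
    (lower triangle). Zero means the model reproduces the correlations
    exactly."""
    p = len(observed)
    resid_sq, count = 0.0, 0
    for i in range(p):
        for j in range(i):
            resid_sq += (observed[i][j] - implied[i][j]) ** 2
            count += 1
    return math.sqrt(resid_sq / count)

# Hypothetical observed vs model-implied correlation matrices
obs = [[1.0, 0.5, 0.4],
       [0.5, 1.0, 0.3],
       [0.4, 0.3, 1.0]]
imp = [[1.0, 0.45, 0.45],
       [0.45, 1.0, 0.35],
       [0.45, 0.35, 1.0]]
print(round(srmr(obs, imp), 3))  # 0.05
```

Because every residual here is ±.05, the root mean square is .05; real output averages residuals of mixed size, so a single large residual can hide behind a small SRMR.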
Gamma hat: stable!
GAMMA includes the number of variables. Easy spreadsheet: http://www.education.auckland.ac.nz/en/about/research/research-at-faculty/quant-dare-unit_1/tools-for-statistical-procedures.html
"there may be a limited number of indices (e.g., GAMMA) that are sufficiently robust to different models and only sensitive to model specification errors that further pursuit for establishing cut-off criteria for these indices may be warranted." (p. 526)
Fan, X., & Sivo, S. A. (2007). Sensitivity of fit indices to model misspecification and model types. Multivariate Behavioral Research, 42(3), 509-529. doi:10.1080/00273170701382864
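One commonly given formula for gamma hat, Γ̂ = p / (p + 2(χ² − df)/(N − 1)) with p manifest variables, takes only a few lines; it reproduces the gamma hat = .98 reported for the small N = 22, k = 7 example later in this deck:

```python
def gamma_hat(chisq, df, n, p):
    """Gamma hat: rescales the per-case noncentrality by the number of
    manifest variables p, which is what stabilises it across model sizes."""
    d = max(chisq - df, 0.0) / (n - 1)
    return p / (p + 2.0 * d)

print(round(gamma_hat(9.31, 8, 22, 7), 2))  # 0.98
print(gamma_hat(40.0, 50, 201, 10))         # 1.0 when chisq < df
```

Because p appears in both numerator and denominator, adding variables does not mechanically inflate or deflate the index the way it does for RMSEA.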
Evaluating Results: Which Fit Indices & What Values?
(Goodness-of-fit indices: p of χ²/df, gamma hat, CFI. Badness-of-fit indices: RMSEA and its 90%CI, SRMR.)

Decision      p of χ²/df   gamma hat & CFI   RMSEA   RMSEA 90%CI   SRMR
Good/Ideal    >.05         >.95              <.05    upper <.05    <.06
Acceptable    >.05         >.90              <.08    lower <.05    <.08
Marginal      >.01         .85-.89           <.10    all >.05      <.08
Reject        <.01         <.85              >.10    lower >.08    >.08

Report all of these! Non-rejection when multiple indices meet conventional standards, with greater weight put on more robust indicators.
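The decision table on this slide can be read as a cascade of rules, from strictest to loosest. A Python sketch of one possible reading (the exact cell boundaries are an interpretation, not an authoritative rule; gamma hat and CFI are treated as sharing cutoffs):

```python
def evaluate_fit(p_ratio, gamma, cfi, rmsea, ci_lo, ci_hi, srmr):
    """Illustrative cascade over the decision table: fall through from
    Good/Ideal to Reject. p_ratio is the p value of the chi-square/df
    ratio; ci_lo/ci_hi are the RMSEA 90% confidence limits."""
    gof = min(gamma, cfi)  # weakest of the goodness-of-fit pair decides
    if p_ratio > .05 and gof > .95 and rmsea < .05 and ci_hi < .05 and srmr < .06:
        return "good"
    if p_ratio > .05 and gof > .90 and rmsea < .08 and ci_lo < .05 and srmr < .08:
        return "acceptable"
    if p_ratio > .01 and gof >= .85 and rmsea < .10 and srmr < .08:
        return "marginal"
    return "reject"

print(evaluate_fit(.30, .98, .97, .03, .01, .045, .05))  # good
print(evaluate_fit(.20, .92, .93, .06, .03, .07, .07))   # acceptable
print(evaluate_fit(.005, .80, .70, .12, .09, .15, .10))  # reject
```

A hard-coded cascade like this is only a teaching device; the slide's own advice (report everything, weight the robust indices) argues against delegating the decision to a single function.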
Reporting fit
Bollen, K. A., & Long, S. J. (1992). Tests for structural equation models. Sociological Methods & Research, 21(2), 123-131. doi:10.1177/0049124192021002001
Consensus:
A. know your substantive area before assessing fit
B. do not rely only on the chi-square test statistic
C. report multiple fit indices
D. examine the components of fit as well as the overall model fit
E. estimate several plausible model structures as a means of determining the best fit
Reporting fit
Bollen, K. A., & Long, S. J. (1992). Tests for structural equation models. Sociological Methods & Research, 21(2), 123-131. doi:10.1177/0049124192021002001
Further advice:
- when reporting multiple fit indices, choose ones that represent different families of measures. Do NOT report only incremental fit indices (e.g., TLI or GFI-type measures); mix with other types of fit indices
- use fit indices with sampling distribution means that are not, or are only weakly, related to sample size
- prefer fit indices that take account of the degrees of freedom of a model; some fit indices (e.g., NFI, GFI) do not take account of how many parameters are used in a model
- use prior studies of the same or similar models whenever possible: model fit quality depends on the basis of comparison. In some areas, where little prior work exists, less demanding standards may be acceptable
- the objective of fitting a model is to understand a substantive area, not simply to obtain an adequate fit. While persistent, data-driven respecifications may produce measures of fit that are adequate by conventional standards, it is unlikely that the resulting model will add to our substantive understanding
Would you accept this model? Fit statistics: N = 22; k = 7; χ² = 9.31, df = 8, p = .32; χ²/df = 1.16, p = .28; Markov estimated p = .39 ± .01; CFI = .96; gamma hat = .98; RMSEA = .093, 90%CI = .000-.295, pclose = .35; SRMR = .088
Brown, G. T. L., & Marshall, J. C. (2012). The impact of training students how to write introductions for academic essays: An exploratory, longitudinal study. Assessment & Evaluation in Higher Education, 37(6), 653-670. doi:10.1080/02602938.2011.563277
Estimation
Maximum likelihood (most common): the parameter values in the data set (a sample) are taken as the most likely values in the population (not present, but to which we wish to generalise). Hence, the procedure attempts to maximise the likelihood of the input values when estimating the solution: means, standard deviations, covariances, residuals, regression weights, intercepts. Hence, it matters that the sample reflects the population and is sufficiently large that parameters are likely to apply to the population. Valid if response categories are defensibly continuous (i.e., ≥5 ordinal categories).
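The core idea, choosing parameter values that make the observed sample most likely, can be shown with the simplest possible case: the normal distribution, where the ML estimates have closed forms. A Python sketch (the data vector is made up for illustration):

```python
import math

def normal_loglik(data, mu, sigma):
    """Log-likelihood of the data under Normal(mu, sigma)."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - ss / (2 * sigma ** 2)

data = [4.2, 5.1, 3.8, 5.5, 4.9, 4.4]
# Closed-form ML estimates: the sample mean, and the sd with an n
# (not n-1) denominator.
mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))

# Any other candidate values give a lower log-likelihood:
best = normal_loglik(data, mu_hat, sigma_hat)
assert best >= normal_loglik(data, mu_hat + 0.3, sigma_hat)
assert best >= normal_loglik(data, mu_hat, sigma_hat + 0.3)
print(round(mu_hat, 2))  # 4.65
```

In SEM the same principle applies, but the "parameters" are loadings, intercepts, and (co)variances, and the maximisation is iterative rather than closed-form.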
ML Estimation
Maximum likelihood estimation of Pearson product-moment correlations is defensible for ordinal rating scales of five or more response categories (Finney & DiStefano, 2006). Additional benefit: it robustly handles moderate deviation from univariate normality (Curran, West, & Finch, 1996), especially kurtosis up to 11.00. Excessive kurtosis does not prevent analysis; it results in reduced power to reject wrong models (Foldnes, Olsson, & Foss, 2012).
Types of estimator Many different estimators depending on type of data Binary: ADF Ordinal: WLSMV Continuous: ML (MLR) NB: ML works acceptably for ordinal when response scale has >5 options
ADF
Asymptote: a straight line that continually approaches a given curve but does not meet it at any finite distance. In dichotomous 0-1 scoring it is the probability of reaching a 100% or 0% score (IRT). "Distribution free" because the data are NOT normal, so an estimator designed for this is needed. Lavaan: "WLS": weighted least squares; for complete data only.
Threshold
The threshold is the point at which the probability of switching from one ordered situation to another is 50% on the total difficulty score (θ). This is the point at which responses tend to switch from the first category to the 2nd category, and so on. In a dichotomous category there is only 1 threshold; in an ordered item with more categories there is 1 less threshold than categories. Lavaan requires insertion of the ordered command to do this.
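Under the probit parameterisation used by categorical estimators, a threshold is simply a normal quantile of the cumulative category proportions. A Python sketch using the standard library's NormalDist (the proportions are illustrative, not real data):

```python
from statistics import NormalDist

def thresholds(category_props):
    """Thresholds as standard-normal quantiles of the cumulative category
    proportions: k categories yield k-1 thresholds, since the last
    category has no threshold above it."""
    nd = NormalDist()
    cum = 0.0
    taus = []
    for p in category_props[:-1]:
        cum += p
        taus.append(round(nd.inv_cdf(cum), 2))
    return taus

# Dichotomous item: 16% wrong, 84% right -> one threshold near -1
print(thresholds([0.16, 0.84]))                    # [-0.99]
# Five ordered categories -> four thresholds
print(thresholds([0.10, 0.20, 0.30, 0.25, 0.15]))  # [-1.28, -0.52, 0.25, 1.04]
```

This also explains the sign convention seen in the output slides: an easy item (mostly correct) pushes most of the cumulative proportion into the first category's small tail, giving a negative threshold.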
Sample data: UETALQ
Data from Xu Yueting's PhD thesis: a survey of >900 university-level teachers of English in the PRC, with 24 test questions concerning assessment literacy, all MCQ, scored 0-1. Research question: do all the items belong to a factor of overall assessment literacy? Import into RStudio from SPSS; use variables q1 to q24. Published as: Xu, Y., & Brown, G. T. L. (2017). University English teacher assessment literacy: A survey-test report from China. Papers in Language Testing and Assessment, 6(1), 133-158.
WLS example syntax

#import data into RStudio
library(haven)
UETALQ_survey_data <- read_sav("C:/Users/gbro008/Google Drive/1995 to 2025 All PUBS/Teachers/working folders/2017 Xu & Brown TALQ/UETALQ-survey-data.sav")
View(UETALQ_survey_data)

#lavaan code: UETALQ 24-item dichotomous test analysis
library(lavaan)

#create model
TALQ.model <- 'TALQ =~ q1 + q2 + q3 + q4 + q5 + q6 + q7 + q8 + q9 + q10 + q11 + q12 + q13 + q14 + q15 + q16 + q17 + q18 + q19 + q20 + q21 + q22 + q23 + q24'

#cfa analysis with WLS (= ADF) estimator; the ordered argument informs the CFA that these are ordered scores
TALQ_fit <- cfa(TALQ.model, data = UETALQ_survey_data,
                ordered = c("q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8",
                            "q9", "q10", "q11", "q12", "q13", "q14", "q15", "q16",
                            "q17", "q18", "q19", "q20", "q21", "q22", "q23", "q24"),
                estimator = "WLS")

#standard output of fit and values
summary(TALQ_fit, standardized = TRUE)

#detailed fit indices
fitmeasures(TALQ_fit)

#get a picture, standardised scores shown
library(lavaanPlot)
lavaanPlot(model = TALQ_fit,
           node_options = list(shape = "box", fontname = "Helvetica"),
           edge_options = list(color = "grey"),
           coefs = TRUE, stand = TRUE)
TALQ item loadings
Standardised loadings (Std.all) of the TALQ factor on each item:

q1  .389    q7  .646    q13 .118    q19 .321
q2  .260    q8  .068    q14 .139    q20 .258
q3  .097    q9  .295    q15 .627    q21 .314
q4  .436    q10 .359    q16 .411    q22 .517
q5  .595    q11 -.224   q17 .701    q23 .651
q6  -.029   q12 -.145   q18 .173    q24 .526

Use the Std.all column to evaluate the relationship of TALQ (total score) as a factor predicting item responses.
1. What do the negative values suggest about the item belonging to the total score?
2. What do the values <.10 suggest about the items belonging to the total score?
3. What normally should you do about that?
Thresholds
Threshold estimates (standardised). Negative values indicate the changeover from wrong to right is below the average; positive indicates the value is higher than the average:

q1  -.999   q7  -.921   q13 .440    q19 .318
q2  .260    q8  .545    q14 .268    q20 -.158
q3  .081    q9  -.192   q15 -.276   q21 -.561
q4  -.167   q10 -.375   q16 -.113   q22 .066
q5  -.720   q11 .482    q17 -.921   q23 -.422
q6  .563    q12 .724    q18 .266    q24 -.468
Fit
npar 48; fmin 0.33; χ² 587.701, df 252, p = .000; baseline χ² 1301.874, baseline df 276, baseline p = .000
NFI .549; NNFI .642; RFI .506; CFI .673; TLI .642; IFI .680; RNI .673; PNFI .501
RMSEA .039, 90%CI .035-.043, RMSEA p-value 1.000
RMR .070; SRMR .067; GFI .888; AGFI .866; PGFI .746; MFI .828; CN(.05) 440.212; CN(.01) 466.136

Comprehension:
1. Is this fit good?
2. Which values do you put greater weight on?
3. What values would you want to calculate?
4. What action might you take to improve fit?
UETALQ image with standardized loadings Are the values the same as the loadings? Which items might you want to remove in a trimming process?
Multiple ordered categories
Ordered responses
Attitude surveys tend to use ordered response scales (Likert-type) that capture points on a continuum: Disagree - Neutral - Agree. But the assumption of continuity should not be automatic; a few options might not be continuous. We need an estimator for this. Remember there is a threshold between each response category.
Thresholds of ordered response categories
Just as there is a threshold in a binary response or score, so there are multiple thresholds in a multiple-response-category scale when analysed with WLSMV. Ideally, response categories are ordered appropriately: equally high and equally separated. Example: Andrich's rating-scale model analysis of a positively packed scale. Deneen, C., Brown, G. T. L., Bond, T. G., & Shroff, R. (2013). Understanding outcome-based education changes in teacher education: evaluation of a new instrument with preliminary findings. Asia-Pacific Journal of Teacher Education, 41(4), 441-456. doi:10.1080/1359866X.2013.787392
Weighted Least Squares Means & Variances
The robust weighted least squares (WLS) estimator = WLSMV (means & variances): no distributional assumptions about observed variables, and no assumption that points lie on a continuum, but a normal latent distribution underlying each observed categorical variable is assumed. It seems to work well if the sample size is 200 or better. Fit estimators are as before, plus the experimental Weighted Root Mean Square Residual (WRMR): <1.00 indicates good fit, but it is not recommended as a sole criterion.
Lavaan WLSMV
When the ordered command is used, lavaan automatically switches to the WLSMV estimator: thresholds and polychoric correlations are first estimated using two-step ML estimation through bivariate contingency tables based on the full weight matrix; parameter estimates and standard errors are then obtained using the estimated asymptotic covariance matrix of the polychoric correlation and threshold estimates to minimize the weighted least squares fit function F_WLS. A mathematically simple form of the WLS estimator incorporates only the diagonal elements of the full weight matrix [diagonally weighted least squares (DWLS)] in the fit function, to prevent software from engaging in extensive computations and encountering numerical problems in model estimation.
Data: SCoA6 Higher Education Brazil-New Zealand
This data is the same SCoA6 as before; item sf4 was dropped because the Brazil researcher didn't want it. 8 factors, 32 items; N = 1014 (NZ = 321, Brazil = 693). Response options are 6, from strongly disagree to strongly agree. A good question to ask: is the WLSMV estimation more accurate than the conventional ML estimation? Import SCoA Brazil_NZ WLSMV.sav
Lavaan syntax

#install required libraries
library(lavaan)
library(semPlot)

#create 8-factor SCoA model; nb no sf4 in Brazil data
SCoA6.modelwls <- '
Bad =~ bd1 + bd2 + bd3 + bd4 + bd5
CE =~ ce1 + ce2 + ce3 + ce4 + ce5 + ce6
IG =~ ig1 + ig2 + ig3
PE =~ pe1 + pe2
SF =~ sf1 + sf2 + sf3
SI =~ si1 + si2 + si3 + si4 + si5
SQ =~ sq1 + sq2
TI =~ ti1 + ti2 + ti3 + ti4 + ti5 + ti6
'

#cfa analysis with WLSMV estimator; the ordered argument informs the CFA that these are ordered scores
SCoAwls_fit <- cfa(SCoA6.modelwls, data = SCoA_Brazil_NZ_WLSMV,
                   ordered = c("bd1", "bd2", "bd3", "bd4", "bd5",
                               "ce1", "ce2", "ce3", "ce4", "ce5", "ce6",
                               "ig1", "ig2", "ig3", "pe1", "pe2",
                               "sf1", "sf2", "sf3",
                               "si1", "si2", "si3", "si4", "si5",
                               "sq1", "sq2",
                               "ti1", "ti2", "ti3", "ti4", "ti5", "ti6"))

#get results of cfa
summary(SCoAwls_fit, standardized = TRUE)

#detailed fit indices
fitmeasures(SCoAwls_fit)

#to get conventional correlated-factors diagram
semPaths(SCoAwls_fit, intercept = FALSE, whatLabel = "std",
         residuals = FALSE, exoCov = TRUE)
Standard output
What info is not shown that might make you more confident about this model?

lavaan 0.6-4 ended normally after 60 iterations
Optimization method: NLMINB; number of free parameters: 220; number of observations: 1014
Estimator: DWLS (robust)
Model Fit Test Statistic: 2491.169 (robust 3003.519)
Degrees of freedom: 436 (436)
P-value (chi-square): 0.000 (0.000)
Scaling correction factor: 0.882; shift parameter: 177.920 (for simple second-order correction; Mplus variant)
Loadings & covariances
Output table of loadings and factor covariances (Estimate, Std.Err, z-value, P(>|z|), Std.lv, Std.all) for the factors Bad, CE, IG, PE, SF, SI, SQ, and TI, with the first indicator of each factor fixed to 1. Any concerns?
Thresholds (BD only)
Standardised threshold estimates:

       t1       t2       t3       t4       t5
bd1   -0.151   0.733    1.240    1.570    2.039
bd2   -0.852   -0.057   0.536    1.024    1.614
bd3   -1.386   -0.439   0.139    0.603    1.245
bd4   -1.313   -0.154   0.488    0.948    1.587
bd5   -0.877   -0.212   0.260    0.746    1.406

(The p value for bd2|t2 is .149, i.e., not statistically significant.)
Are all the thresholds in the correct order? The estimates should be ordered such that t1<t2<t3<t4<t5. The lower the value, the easier it is to shift from the previous response to the next response. These are all for Bad, so we should expect that values will be relatively high if people are inclined to disagree that assessment is bad.
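The ordering check this slide asks for is mechanical. A Python sketch applied to the bd1 and bd2 threshold estimates from this output:

```python
def thresholds_ordered(taus):
    """Check that estimated thresholds increase strictly (t1 < t2 < ...),
    as they must for a well-behaved ordered response scale."""
    return all(a < b for a, b in zip(taus, taus[1:]))

# bd1 and bd2 thresholds from the output above
print(thresholds_ordered([-0.151, 0.733, 1.240, 1.570, 2.039]))   # True
print(thresholds_ordered([-0.852, -0.057, 0.536, 1.024, 1.614]))  # True
```

A strict ordering only tells you the categories are in sequence; very small gaps between adjacent thresholds (as between bd2's t1 and t2 here) still hint that respondents may not distinguish those two response options.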
Detailed fit
npar 220; df 436 (scaled 436); fmin 1.228
χ² 2491.169; scaled χ² 3003.519 (scaling factor 0.882); p = .000
Baseline χ² 108944.231, baseline df 496; scaled baseline χ² 29274.968 (scaling factor 3.768)
The output then reports the full panel of incremental indices (NFI, NNFI/TLI, RFI, IFI, CFI, RNI, PNFI), RMSEA with its 90%CI and p value, the RMR/SRMR/CRMR family, GFI/AGFI/PGFI, MFI, and critical N, each with scaled and robust variants.

So is it good, acceptable, interesting, or outright rejection?
Look at it: semPlot creation
Look at it: lavaanPlot creation
Further work: Brazil-NZ SCoA data
The WLSMV solution was OK. Would it be better if run as MLE? One item had a weak standardized loading; is the model noticeably better without it? Check the thresholds for all items: are they all ordered appropriately? The threshold for t2 in bd2 was statistically non-significant; do any other items have non-significant values for the same response threshold? If so, what does this suggest about the response scale?