Published in

The Journal of Rheumatology, p. jrheum.2022-1109, 2023

DOI: 10.3899/jrheum.2022-1109


Avoiding Blunders When Analyzing Correlated Data, Clustered Data or Repeated Measures

Distributing this paper is prohibited by the publisher


Preprint: archiving forbidden
Postprint: archiving restricted
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Rheumatology research often involves correlated and clustered data. A common error is to analyze such data as if the observations were independent, which can lead to incorrect statistical inference. The data used are a subset of the study by Raheel (2017), consisting of 633 patients with rheumatoid arthritis (RA) between 1988 and 2007. RA flare and number of swollen joints served as the binary and continuous outcomes, respectively. A generalized linear model (GLM) was fitted for each outcome, adjusting for rheumatoid factor (RF) positivity and sex. Additionally, a generalized linear mixed model (GLMM) with a random intercept and a generalized estimating equation (GEE) were used to model RA flare and number of swollen joints, respectively, to take the additional correlation into account. The β coefficients and 95% confidence intervals from the GLMs were then compared with those from the GLMM and GEE. The β coefficients were very similar between methodologies, but their standard errors increased when correlation was accounted for. Consequently, if the additional correlation is ignored, the standard error can be underestimated. This results in overstated precision: narrower confidence intervals, smaller p-values, and an increased type I error rate, thus potentially producing misleading results. It is important to model the additional correlation that occurs in correlated data.
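The effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis and does not use the study data. It generates hypothetical repeated measures with a shared patient-level random intercept, then compares a naive analysis that treats every visit as independent with a simple cluster-aware alternative that analyzes one mean per patient. The per-patient-mean approach stands in for GEE and GLMM only because it needs no external libraries. All names and parameter values (`simulate`, `naive_fit`, `cluster_fit`, the effect size, the variances) are assumptions chosen for illustration.

```python
import random
import statistics
from math import sqrt


def simulate(n_patients=200, visits=5, effect=1.0,
             sd_patient=2.0, sd_visit=1.0, seed=0):
    """Simulate (patient_id, rf_positive, outcome) rows; all values hypothetical."""
    rng = random.Random(seed)
    rows = []
    for pid in range(n_patients):
        rf = pid % 2                    # patient-level covariate, balanced groups
        u = rng.gauss(0.0, sd_patient)  # random intercept shared by all visits
        for _ in range(visits):
            y = 5.0 + effect * rf + u + rng.gauss(0.0, sd_visit)
            rows.append((pid, rf, y))
    return rows


def diff_and_se(pos, neg):
    """Difference in group means and its pooled-variance standard error."""
    n1, n0 = len(pos), len(neg)
    diff = statistics.fmean(pos) - statistics.fmean(neg)
    pooled = ((n1 - 1) * statistics.variance(pos)
              + (n0 - 1) * statistics.variance(neg)) / (n1 + n0 - 2)
    return diff, sqrt(pooled * (1 / n1 + 1 / n0))


def naive_fit(rows):
    """Treat every visit as an independent observation (the error at issue)."""
    pos = [y for _, rf, y in rows if rf == 1]
    neg = [y for _, rf, y in rows if rf == 0]
    return diff_and_se(pos, neg)


def cluster_fit(rows):
    """Collapse to one mean per patient, so the patient is the unit of analysis."""
    by_patient = {}
    for pid, rf, y in rows:
        by_patient.setdefault((pid, rf), []).append(y)
    pos = [statistics.fmean(v) for (_, rf), v in by_patient.items() if rf == 1]
    neg = [statistics.fmean(v) for (_, rf), v in by_patient.items() if rf == 0]
    return diff_and_se(pos, neg)


rows = simulate()
beta_naive, se_naive = naive_fit(rows)
beta_cluster, se_cluster = cluster_fit(rows)
print(f"naive:   beta={beta_naive:.3f}  se={se_naive:.3f}")
print(f"cluster: beta={beta_cluster:.3f}  se={se_cluster:.3f}")
```

Because every simulated patient contributes the same number of visits, the two analyses give the same point estimate, while the cluster-aware standard error is larger. This mirrors the pattern the abstract reports. With unbalanced data or a binary outcome, a GEE or GLMM fit would be the appropriate tool rather than this shortcut.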