Introduction

Insulin sensitivity is generally measured by euglycaemic–hyperinsulinaemic glucose clamp [1], insulin suppression test [2], or minimal model analysis of glucose and insulin patterns from the frequently sampled IVGTT (FSIGT) [3]. Because these tests are complex and expensive, a wide variety of surrogate measures have been proposed and used in large-scale studies. These surrogate measures generally use glucose and insulin levels in the fasting state or following an oral glucose challenge to estimate insulin sensitivity. Most surrogates were derived empirically and then validated against measures from more accurate but complex tests using cross-sectional correlations. Whether surrogates validated in this way remain robust in other research settings has not been fully addressed. Genetic studies have demonstrated different genetic contributions to insulin resistance between direct and surrogate measures [4, 5]. In this report, we compare the longitudinal performance of two commonly used surrogates, updated HOMA of insulin sensitivity (HOMA2-%S) [6, 7] and the Matsuda index [8], with minimal model-based estimates of insulin sensitivity (SI) from FSIGTs using data from two independent longitudinal studies. We used HOMA2-%S as recommended by the developers of HOMA for comparison between HOMA and other approaches to estimate insulin sensitivity. Since measurement uncertainty (error) is present in both the surrogates and the more complex measures, it will attenuate estimates of the correlation coefficient. We therefore conducted simulation studies to investigate the impact of error propagation on the differences in correlation coefficients between longitudinal and cross-sectional settings.

Methods

Participants

Data from two longitudinal studies were used, one under natural observation (BetaGene) [9] and one in response to treatment (PIPOD) [10]. Briefly, the BetaGene study included Mexican-American adults: women with recent gestational diabetes mellitus (GDM), their siblings and/or cousins, and women with normal glucose levels in pregnancy. Data from 338 individuals who had an OGTT and FSIGT at baseline and a median of 4.1 years later were used for this report. The PIPOD study included women with prior GDM who were treated with open-label pioglitazone. Data from 97 women who had an OGTT and FSIGT at baseline and 1 year later were included in this report. Insulin-modified FSIGTs were performed in the BetaGene study, and tolbutamide-modified FSIGTs were performed in the PIPOD study. All participants gave written informed consent for participation in the studies, which were approved by the institutional review boards of participating institutions.

Data analysis

FSIGT results were analysed using the MINMOD program to estimate SI [11]. HOMA2-%S was calculated using the updated version of the HOMA calculator [6] (www.dtu.ox.ac.uk/homacalculator/index.php) using fasting glucose and insulin from OGTTs. The Matsuda index of insulin sensitivity was calculated as described previously [8].
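For orientation, the Matsuda index is commonly written as 10,000 divided by the square root of the product of fasting glucose, fasting insulin, mean OGTT glucose and mean OGTT insulin, with glucose in mg/dl and insulin in µU/ml. The sketch below follows that commonly cited form; the function name, the units and the choice of sampling times are assumptions made for illustration and are not taken from this report. HOMA2-%S has no closed-form expression and is obtained from the calculator cited above, so it is not sketched here.

```python
import math


def matsuda_index(glucose_mg_dl, insulin_uU_ml):
    """Matsuda index from paired OGTT samples (hypothetical helper).

    Both sequences hold the OGTT time series, with the fasting (0 min)
    value first. Units are assumed to be mg/dl and microU/ml.
    """
    if not glucose_mg_dl or len(glucose_mg_dl) != len(insulin_uU_ml):
        raise ValueError("glucose and insulin series must be non-empty and paired")
    fasting_glucose = glucose_mg_dl[0]
    fasting_insulin = insulin_uU_ml[0]
    mean_glucose = sum(glucose_mg_dl) / len(glucose_mg_dl)
    mean_insulin = sum(insulin_uU_ml) / len(insulin_uU_ml)
    return 10000.0 / math.sqrt(
        fasting_glucose * fasting_insulin * mean_glucose * mean_insulin
    )


# Illustrative values only, not study data.
example = matsuda_index([90, 150, 160, 140, 120], [8, 60, 70, 55, 40])
```

Higher values indicate greater insulin sensitivity, which is why the index is compared directly with SI after log transformation.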

Correlation coefficients between surrogates and SI were reported from the Pearson coefficients calculated on natural log-transformed data for cross-sectional correlations, and changes in natural log-transformed data for longitudinal correlations. The equality of the baseline correlation to the longitudinal correlation for each pair of insulin sensitivity measures was tested using Fisher’s z test. In addition, changes in surrogates and SI were dichotomised as increasing values vs no change or falling values over time, and Kappa coefficients were calculated.
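The three statistics described above can be sketched as follows. This is a minimal illustration and not the SAS code used for the analyses: the function names are hypothetical, and the Fisher's z test shown is the textbook form for two independent correlations, which is an assumption because the exact variant applied to baseline and change correlations from the same participants is not specified here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_change(baseline, follow_up):
    """Change in natural log-transformed values (follow-up minus baseline)."""
    return [math.log(f) - math.log(b) for b, f in zip(baseline, follow_up)]


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided Fisher's z test, assuming independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return z, p


def kappa_for_increase(change_a, change_b):
    """Cohen's kappa for increase (> 0) vs no change or fall (<= 0).

    Undefined (division by zero) if chance agreement equals 1.
    """
    a = [c > 0 for c in change_a]
    b = [c > 0 for c in change_b]
    n = len(a)
    p_observed = sum(u == v for u, v in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (p_observed - p_expected) / (1 - p_expected)
```

A cross-sectional correlation would be `pearson` applied to the logs of the two baseline measures, and a longitudinal correlation would be `pearson` applied to the two `log_change` series.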

Measurement uncertainty was expressed as the proportion of total variance that was accounted for by the within-subject repeated measures variance, estimated from FSIGTs performed 3 months apart without any interventions. These measurements were made in a separate cohort of 109 Hispanic women [12] with similar characteristics to those of the BetaGene and PIPOD samples. The estimated measurement errors were 28% for log(HOMA2-%S) and 15% for log(SI). Measurement error for the Matsuda index was not calculated because OGTTs were not performed at 3 month intervals in the separate cohort. In the simulation studies, means and SDs of baseline log(HOMA2-%S) and log(SI), and of changes in log(HOMA2-%S) and log(SI), from the BetaGene cohort were used to generate bivariate normal data for both baseline and change, assuming various true correlations for baseline and change. Random measurement errors of 28% for log(HOMA2-%S) and 15% for log(SI) were then added to both baseline and follow-up values in the simulated data. Each simulated data set had a sample size of 340 to mimic the size of the BetaGene cohort, and the simulation was repeated 1,000 times. Pearson correlations were calculated from each of the simulated data sets and averaged across the 1,000 replications. The average correlations from the simulated data were compared with the correlations from the BetaGene cohort. SAS version 9.2 (SAS Institute, Cary, NC, USA) was used for all statistical analyses and simulations. A p value of <0.05 was considered statistically significant.
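The simulation design described above can be sketched in a few lines. This is an illustration under stated assumptions, not the SAS program used for the study: the SDs passed in are placeholders and not the BetaGene values, true change is generated independently of true baseline, means are set to zero because they do not affect a correlation, and the error fraction is interpreted as error variance divided by the sum of true baseline variance and error variance. All function and parameter names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pair(rng, sd_x, sd_y, rho):
    """One draw from a zero-mean bivariate normal with correlation rho."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return sd_x * z1, sd_y * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)


def error_sd(true_sd, error_fraction):
    """Error SD such that error var / (true var + error var) = error_fraction."""
    return true_sd * math.sqrt(error_fraction / (1 - error_fraction))


def simulate(rho_base, rho_change, sd_base, sd_change, error_fraction,
             n=340, reps=1000, seed=1):
    """Average observed baseline and change correlations over replications."""
    rng = random.Random(seed)
    err_x = error_sd(sd_base[0], error_fraction[0])
    err_y = error_sd(sd_base[1], error_fraction[1])
    total_base = total_change = 0.0
    for _ in range(reps):
        base_x, base_y, chg_x, chg_y = [], [], [], []
        for _ in range(n):
            tx, ty = correlated_pair(rng, sd_base[0], sd_base[1], rho_base)
            dx, dy = correlated_pair(rng, sd_change[0], sd_change[1], rho_change)
            # Independent measurement error at baseline and at follow-up.
            x0, y0 = tx + rng.gauss(0, err_x), ty + rng.gauss(0, err_y)
            x1 = tx + dx + rng.gauss(0, err_x)
            y1 = ty + dy + rng.gauss(0, err_y)
            base_x.append(x0)
            base_y.append(y0)
            chg_x.append(x1 - x0)
            chg_y.append(y1 - y0)
        total_base += pearson(base_x, base_y)
        total_change += pearson(chg_x, chg_y)
    return total_base / reps, total_change / reps
```

A call such as `simulate(0.8, 0.8, (0.5, 0.5), (0.5, 0.5), (0.28, 0.15))` returns the average observed baseline and change correlations for one assumed pair of true correlations. Because the observed change carries error from both visits, the change correlation is attenuated more than the baseline correlation under these assumptions.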

Results

The BetaGene sample included both males and females with wider age (18–66 vs 25–54 years) and BMI (17.1–52.9 vs 21.3–47.8 kg/m²) ranges compared with PIPOD (see electronic supplementary material [ESM] text and ESM Table 1). The PIPOD sample was slightly more obese and had worse average glucose and insulin sensitivity compared with the BetaGene sample. Baseline and changes in all three insulin sensitivity indices covered wide ranges; median changes in all three indices were negative in the BetaGene sample but positive in the PIPOD sample, consistent with the different study designs.

Cross-sectional and longitudinal correlations

At baseline, the cross-sectional correlations between SI and HOMA2-%S were 0.69 for BetaGene and 0.61 for PIPOD; the correlations between SI and Matsuda were 0.71 for BetaGene and 0.66 for PIPOD. Correlations of similar magnitude and direction were obtained using cross-sectional data at follow-up (Table 1). However, correlations calculated using changes in insulin sensitivity were 27%–49% lower than the analogous baseline cross-sectional correlations (Table 1). For SI vs HOMA2-%S, correlations for change were 0.35 for BetaGene and 0.39 for PIPOD, which were significantly lower than the baseline correlations of 0.69 for BetaGene (p < 0.0001) and 0.61 for PIPOD (p = 0.02). Likewise, correlations for change between SI and Matsuda index were 0.40 for BetaGene and 0.48 for PIPOD, which were significantly lower than the baseline correlations of 0.71 for BetaGene (p < 0.0001) and 0.66 for PIPOD (p = 0.03). Scatter plots of baseline and change data appear in Fig. 1. Kappa coefficients, which assess agreement for dichotomised changes between surrogates and SI (increasing vs no change or falling; 1.0 = perfect concordance), were also low, in the range 0.17–0.32 (Table 1).
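The 27%–49% range follows directly from the baseline and change correlations quoted above. A short arithmetic check, using only those reported values (the dictionary layout and function name are illustrative), is:

```python
# (baseline r, change r) as reported in the text for each study and surrogate.
reported = {
    ("BetaGene", "HOMA2-%S"): (0.69, 0.35),
    ("PIPOD", "HOMA2-%S"): (0.61, 0.39),
    ("BetaGene", "Matsuda"): (0.71, 0.40),
    ("PIPOD", "Matsuda"): (0.66, 0.48),
}


def relative_reduction(baseline_r, change_r):
    """Percentage by which the change correlation is lower than baseline."""
    return 100.0 * (baseline_r - change_r) / baseline_r


reductions = {
    key: round(relative_reduction(*values)) for key, values in reported.items()
}
# reductions: BetaGene HOMA2-%S 49, PIPOD HOMA2-%S 36,
#             BetaGene Matsuda 44, PIPOD Matsuda 27
```

The smallest reduction is for the Matsuda index in PIPOD and the largest is for HOMA2-%S in BetaGene.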

Table 1 Cross-sectional and longitudinal correlations between FSIGT SI and surrogate measures

Fig. 1 Scatter plots of baseline FSIGT SI and change in SI vs HOMA2-%S (a, b), and in SI vs Matsuda index (c, d) for data from the BetaGene (blue diamond, n = 338) and PIPOD (pink triangle, n = 97) studies. Baseline data are on the natural log scale; change data are changes on the natural log scale

Were reduced correlations explained by the measurement error?

Details of simulations to address this question appear in ESM text and ESM Table 2. Briefly, measurement errors of 15% for log(SI) and 28% for log(HOMA2-%S) should have caused the correlation coefficient between changes in these two variables to fall from baseline only slightly, i.e. from 0.69 to 0.56 in the BetaGene study. In fact, the observed reduction was much larger, from 0.69 to 0.35 (Table 1). This finding indicates that the lower correlation coefficient for change compared with that at baseline was not explained by propagation of measurement error alone.

Discussion

Using data from two independent longitudinal studies we showed that HOMA2-%S and the Matsuda index had much lower correlations with FSIGT SI when assessing longitudinal changes in insulin sensitivity than in cross-sectional settings. The results were consistent whether insulin-modified or tolbutamide-modified FSIGTs were used. Agreement between surrogates and SI assessed on the dichotomous scale of change was also low. The reduced correlations were not explained by measurement uncertainty, suggesting true lower validity of the surrogates against FSIGT-derived insulin sensitivity in longitudinal settings.

We are aware of two previous follow-up studies, both relatively small and of short duration, that examined change in QUICKI and change in insulin sensitivity by euglycaemic–hyperinsulinaemic clamps in patients with type 2 diabetes [13] or hypertension [14]. They concluded that QUICKI is useful; close examination of their results, however, indicates that the mean change was much less in QUICKI than in the clamp insulin sensitivity (8% vs 38%) [13], and the correlation between QUICKI and the clamp was considerably less for change than for baseline (r = 0.42 vs 0.60 [13] and r = 0.61 vs 0.82 [14]). Thus, these data appear to support our conclusion that surrogates may perform relatively poorly for change in contrast to their generally good performance in cross-sectional studies. Surrogates have not been shown to be superior to simple measures of fasting glucose and insulin in cross-sectional settings [15–17], and their performance may be race- and sex-dependent [17].

The strengths of this study were (a) its use of two independent, relatively large and long-term follow-up studies, one with natural observation and one with an insulin-sensitising drug, and (b) the evaluation of the impact of measurement uncertainty. Variability was large in all measures, whether considered cross-sectionally or as change; thus, the impact of data range restriction on the correlation is minimal. Limitations were the inclusion of only two surrogates and the inclusion of only Mexican-American study participants. Our results need to be confirmed for other ethnic groups. The performance of other indices under longitudinal settings should also be the subject of investigation.

In conclusion, our results suggest that HOMA and the Matsuda index may capture different components of changes in insulin sensitivity compared with SI. It remains to be determined which, if either, of these measures is a better indicator of important biological variables. Until that is known, we suggest caution in applying surrogate measures to studies with longitudinal designs.