Measuring bias in self-reported data.
Journal: 2017/February - International journal of behavioural & healthcare research
ISSN: 1755-3539
Abstract:
Response bias shows up in many fields of behavioural and healthcare research where self-reported data are used. We demonstrate how to use stochastic frontier estimation (SFE) to identify response bias and its covariates. In our application to a family intervention, we examine the effects of participant demographics on response bias before and after participation; gender and race/ethnicity are related to magnitude of bias and to changes in bias across time, and bias is lower at post-test than at pre-test. We discuss how SFE may be used to address the problem of 'response shift bias' - that is, a shift in metric from before to after an intervention which is caused by the intervention itself and may lead to underestimates of programme effects.

Int J Behav Healthc Res 2(4): 320-332

Measuring bias in self-reported data

1 Introduction

In this paper, we demonstrate the potential of a common econometric tool, stochastic frontier estimation (SFE), to measure response bias and its covariates in self-reported data. We illustrate the approach using self-reported measures of parenting behaviours before and after a family intervention. We demonstrate that in addition to affecting targeted behaviours, an intervention may also affect any bias associated with self-assessment of those behaviours. We show that SFE can be used to identify and correct for bias in self-assessment both before and after treatment, resulting in more accurate estimates of treatment effects.

Response bias is a widely discussed phenomenon in behavioural and healthcare research where self-reported data are used; it can arise whenever individuals offer self-assessed measures of some phenomenon. There are many reasons individuals might offer biased estimates of self-assessed behaviour, ranging from a misunderstanding of what a proper measurement is to social-desirability bias, where the respondent wants to 'look good' in the survey, even if the survey is anonymous. Response bias itself can be problematic in programme evaluation and research, but it is especially troublesome when the bias is recalibrated after an intervention. Recalibration of standards can cause a particular type of measurement bias known as 'response-shift bias' (Howard, 1980). Response-shift bias occurs when a respondent's frame of reference changes across measurement points, especially if the changed frame of reference is a function of treatment or intervention, thus confounding the treatment effect with bias recalibration. More specifically, an intervention may change respondents' understanding or awareness of the target concept and the estimation of their level of functioning with respect to that concept (Sprangers and Hoogstraten, 1989), thus changing the bias at each measurement point. In fact, some treatments or interventions are intended to change how respondents view the target concept. Further complicating matters, an intervention may affect not only a respondent's metric for targeted behaviours across time points (resulting in response-shift bias) but also other types of response bias. For example, social-desirability bias may decrease over the course of an intervention as respondents come to know and trust a service provider. Thus, it is necessary to understand the degree and type of response bias at both pretest and posttest in order to determine whether response shift has occurred.

When there is a potential for confusing bias recalibration with treatment outcomes, statistical approaches may be useful (Schwartz and Sprangers, 1999). In recent years, researchers have applied structural equation modelling (SEM) to the problem of decomposing error in order to identify response shift bias (Oort, 2005; Oort et al., 2005). In this paper, we suggest a different statistical approach which reveals response bias at a single time point as well as differences in bias across time points. Perhaps more importantly, it identifies covariates of these differences. When applied before and after an intervention, it reveals differences related to changes in respondents’ frame of reference. Thus, it can be used to decompose errors so that recalibration of the bias occurring across time points can be distinguished from simple response bias within each time point. The suggested approach is based on SFE (Aigner et al., 1977; Battese and Coelli, 1995; Meeusen and van den Broeck, 1977), a technique widely used in economics and operational research.

Our approach has two significant advantages over that proposed by Oort et al. (2005). Their approach reveals only aggregate changes in the responses and requires a minimum of two temporal sets of observations on the self-rating of interest as well as multiple measures of the item to be rated. SFE, to its credit, can identify response differences across individuals (as opposed to simply aggregate response shifts) with a single temporal observation and a single measure, so is much less data intensive. Moreover, since it identifies differences at the individual level, it allows the analyst to identify not only that responses differ by individual, but what characteristics are at the root of the differences. Thus, as long as more than one temporal observation is available for respondents, SFE can be used to systematically identify different types of response recalibration by looking at the changes at the individual level, and aggregating them. SFE again has an advantage because the causes of both bias and recalibration can be identified at the individual level.

What may superficially be seen as two disadvantages of SFE relative to SEM approaches are actually common to both methods. First, both measure response (and therefore response shift) against a common subjective metric established by the norm of the data; indeed, any systematic deviation of an individual from this norm is how we measure 'response bias'. With both SEM and SFE, if an objective metric exists, the difference between the self-rating and the objective measure is easily established. A second apparent disadvantage is that SFE requires a specific assumption of a truncated distribution for the bias (although it is possible to test this assumption statistically). While SEM can reveal response shift in individual bias without such a strong assumption, aggregate changes become manifest only if "many respondents experience the same shift in the same direction" [Oort, (2005), p.595]. Hence, operationally the assumptions are nearly equivalent.

In the next section, we explain how we model response bias and response recalibration within the SFE framework. In Section 3, we present our empirical application, including the results of our baseline model and a model with heteroscedastic errors as a robustness check. In Section 4, we discuss the relative merits of the method we propose, together with its limitations, and offer some conclusions.

2 Response bias and SFE

We are concerned with situations where individuals do not have an objective measure of some variable of interest which we denote Y*it, and we have to use a subjective measure (denoted Yit) as a proxy instead. An unbiased estimate of the variable of interest Y*it can be defined as,

E(Y_{it} \mid Y^*_{it}, Z_{it}) = E(Y_{it} \mid Y^*_{it}) = Y^*_{it}
(1)

where Yit denotes the observed measurement, Y*it is the true attribute being measured and Zit represents variables other than Y*it. When Yit is self-reported, Zit includes (often unobserved) variables affecting the frame of reference used by respondents for measuring Y*it, and (1) is not assured. Within this context, response bias is simply the case that E(Yit | Y*it, Zit) ≠ E(Yit | Y*it). The bias is upward if E(Yit | Y*it, Zit) > E(Yit | Y*it) and downward if the inequality goes the other way.
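
As a purely hypothetical numerical illustration (ours, not drawn from the data analysed later): suppose a respondent's true value is Y*it = 3.5 on a 5-point rating scale, but social desirability (an element of Zit) inflates the expected self-report, so that

E(Y_{it} \mid Y^*_{it}, Z_{it}) = 3.9 > E(Y_{it} \mid Y^*_{it}) = 3.5.

The 0.4-point gap is an upward response bias in the sense just defined.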

Our approach for measuring response bias and bias recalibration (change in response bias between two time periods) is based on the Battese and Coelli (1995) adaptation of the stochastic frontier model (SFE) independently proposed by Aigner et al. (1977), and Meeusen and van den Broeck (1977). Let

Y^*_{it} = T\beta_0 + X_{it}\beta_t + \varepsilon_{it}
(2)

where Y*it is the true (latent) outcome, T denotes some treatment or intervention,¹ Xit are variables other than the treatment that explain the outcome and εit is a random error term. For identification, we assume that εit is distributed iid N(0, σ_ε²). The observed self-reported outcome Yit is a combination of the true outcome and the response bias Y^R_it:

Y_{it} = Y^*_{it} + Y^R_{it}
(3)

We consider the specific case in which the bias term Y^R_it has a truncated-normal distribution:

Y^R_{it} = u_{it}, \qquad u_{it} > 0
(4)

where uit is a random variable which accounts for response shift away from a subjective norm response level (usually called the 'frontier' in SFE) and is distributed N(μit, σ_u²), independently of εit. Moreover,

\mu_{it} = T\delta_0 + z_{it}\delta_t
(5)

where the vector zit includes variables (other than the treatment) that explain the specific deviation from the response frontier. Subscript i indexes the individual observation and subscript t denotes time.² Substituting (2), (4) and (5) into (3), we can write

E(Y_{it}) = T\beta_0 + X_{it}\beta_t + T\delta_0 + z_{it}\delta_t + \sigma_u \frac{\phi\left((T\delta_0 + z_{it}\delta_t)/\sigma_u\right)}{\Phi\left((T\delta_0 + z_{it}\delta_t)/\sigma_u\right)}
(6)

where φ(·) and Φ(·) are the standard normal probability density function and cumulative distribution function, respectively. Any treatment effect is given by β0 in equation (6). The normal relationship between the Xs and Y is given by βt. The last three terms on the right-hand side represent the observation-specific response bias relative to this normal relationship. Treatment can affect both the maximum possible value of the measured outcome of a given individual (as defined by Xitβt) and the response bias. If treatment changes the response bias, it will be indicated by the term δ0, and the bias recalibration is given by

E(Y_{i2} - Y_{i1}) - \beta_0 = \delta_0 + \sigma_u \frac{\phi\left((T\delta_0 + z_{it}\delta_t)/\sigma_u\right)}{\Phi\left((T\delta_0 + z_{it}\delta_t)/\sigma_u\right)} - \sigma_u \frac{\phi\left(z_{it}\delta_t/\sigma_u\right)}{\Phi\left(z_{it}\delta_t/\sigma_u\right)}.
(7)

The estimated δ0 coefficient on treatment indicates how treatment has changed response bias. If δ0 = 0 there is no recalibration and the response bias, if it exists, is not affected by the treatment. Cross terms of treatment and other variables (that is, slope dummy variables) may be used if the treatment is thought to change the general way these other variables interact with functioning.
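
A reminder of the intermediate step, added here for completeness: equation (6) follows from the mean of a normal variable truncated below at zero. If u \sim N(\mu, \sigma_u^2) restricted to u > 0, then

E(u) = \mu + \sigma_u \frac{\phi(\mu/\sigma_u)}{\Phi(\mu/\sigma_u)},

and substituting \mu_{it} = T\delta_0 + z_{it}\delta_t for \mu, then adding E(Y^*_{it}) = T\beta_0 + X_{it}\beta_t, yields the right-hand side of (6).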

Recalibration can occur independently of the treatment effect. In fact, recalibration is sometimes a goal of the treatment or intervention in addition to the targeted outcome, which means a desired outcome is that δ0 ≠ 0 and E(Yi1 | Y*it) ≠ E(Yi2 | Y*it) for t ∈ {1, 2}. In other words, there is a change in individual measurement scale caused (and intended) by the intervention.
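
To make the mechanics concrete, the sketch below (ours, not the authors' estimation code) simulates data from a simplified version of the model and recovers the parameters by maximum likelihood. It assumes a single cross-section, a half-normal bias term (i.e., μit = 0 rather than the full truncated-normal specification in (5)), and a bias that is added to the latent outcome as in (3); the log-likelihood is the standard normal/half-normal stochastic frontier form (Aigner et al., 1977), with the sign of the one-sided term set for an additive bias. All variable names are illustrative.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    # Simulate a simplified version of the model: y = b0 + b1*x + eps + u, with u >= 0
    # (half-normal bias added to the latent outcome, as in equations (2)-(4) with mu = 0).
    n = 2000
    b0, b1 = 4.0, 0.5            # latent outcome equation
    sigma_e, sigma_u = 0.3, 0.8  # symmetric noise and one-sided bias scales

    x = rng.normal(size=n)
    eps = rng.normal(0.0, sigma_e, size=n)
    u = np.abs(rng.normal(0.0, sigma_u, size=n))  # half-normal response bias
    y = b0 + b1 * x + eps + u                     # observed, biased self-report

    # Normal/half-normal stochastic frontier log-likelihood. For a composed error
    # e = eps + u (one-sided term added to the frontier), the density of e is
    # (2/sigma) * phi(e/sigma) * Phi(lambda * e / sigma),
    # with sigma^2 = sigma_e^2 + sigma_u^2 and lambda = sigma_u / sigma_e.
    def negloglik(theta):
        beta0, beta1, log_se, log_su = theta
        se, su = np.exp(log_se), np.exp(log_su)
        sigma = np.sqrt(se**2 + su**2)
        lam = su / se
        e = y - beta0 - beta1 * x
        ll = (np.log(2.0) - np.log(sigma)
              + norm.logpdf(e / sigma)
              + norm.logcdf(lam * e / sigma))
        return -np.sum(ll)

    start = np.array([y.mean(), 0.0, np.log(y.std()), np.log(y.std())])
    res = minimize(negloglik, start, method="BFGS")
    beta0_hat, beta1_hat, log_se_hat, log_su_hat = res.x

    print("beta0:", round(beta0_hat, 3), "beta1:", round(beta1_hat, 3))
    print("sigma_eps:", round(float(np.exp(log_se_hat)), 3),
          "sigma_u:", round(float(np.exp(log_su_hat)), 3))

The Battese and Coelli (1995) specification used in the paper additionally lets the mean of the bias distribution depend on treatment and covariates, as in (5); the sketch omits that extension for brevity.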

3 An application to evaluation of a family intervention

We applied SFE to examine response bias and recalibration in programme evaluations of a popular, evidence-based family intervention (the Strengthening Families Program for Parents and Youth 10–14, or SFP) (Kumpfer et al., 1996). Families attend SFP once a week for seven weeks and engage in activities designed to improve family communication, decrease harsh parenting practices, and increase parents’ family management skills. At the beginning and end of a programme, parents report their level of agreement with various statements related to skills and behaviours targeted by the intervention (e.g., ‘I have clear and specific rules about my child's association with peers who use alcohol’). Consistent with the literature on response shift, we hypothesised that non-random bias would be greater at pretest than at posttest as parents changed their standards about intervention-targeted behaviours and became more conservative in their self-ratings. In other words, we expected that after the intervention parents would recalibrate their self-ratings downward, resulting in an underestimate of the programme's effects.

3.1 Sample

Our data consisted of 1437 parents who attended 94 SFP cycles in Washington State and Oregon from 2005 through 2009. 25% of the participants identified themselves as male, 72% as female, and 3% did not report gender. 27% of the participants identified themselves as Hispanic/Latino, 60% as White, 2% as Black, 4% as American Indian/Alaska Native, 3% as other or multiple race/ethnicity, and 3% did not report race/ethnicity. Almost 74% of the households included a partner or spouse of the attending parent, and 19% reported not having a spouse or partner. For almost 8% of the sample, the presence of a partner or spouse is unknown. Over 62% of our observations are from Washington State, with the remainder from Oregon.

3.2 Measures

The outcome measure consisted of 13 items assessing parenting behaviours targeted by the intervention, including communication about substance use, general communication, involvement of children in family activities and decisions, and family conflict. Items were designed by researchers of the programme's efficacy trial, and information about the scale has been reported elsewhere (Spoth et al., 1995; Spoth et al., 1998). Cronbach's alpha (a measure of internal consistency) in the current data was .85 at both pretest and posttest. Items were scored on a 5-point Likert-type scale ranging from 1 ('strongly disagree') to 5 ('strongly agree').
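
For readers who wish to reproduce the reliability figure on their own data, a minimal sketch of Cronbach's alpha for a respondents-by-items matrix follows (the item matrix here is fabricated for illustration only; it is not the study data):

    import numpy as np

    def cronbach_alpha(items):
        """Cronbach's alpha for an (n_respondents, n_items) array of item scores."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1)       # variance of each item
        total_variance = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
        return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

    # Fabricated 5-point Likert responses (200 respondents x 13 items), illustration only.
    rng = np.random.default_rng(1)
    latent = rng.normal(size=(200, 1))
    fake_items = np.clip(np.round(3 + latent + rng.normal(scale=0.8, size=(200, 13))), 1, 5)
    print(round(cronbach_alpha(fake_items), 2))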

Variables used in the analysis, including definitions and summary statistics, are presented in Table 1. Average family functioning, as measured by self-assessed parenting behaviours, increased from 3.98 at pretest to 4.27 at posttest, after participation in SFP.

Table 1

Variable names, descriptions and summary statistics

Name | Description | M | SD
Pretest functioning | Semi-continuous (0-5) | 3.979 | 0.546
Posttest functioning | Semi-continuous (0-5) | 4.273 | 0.461
Male | If Male = 1 | 0.250 | 0.433
Gender missing | If gender not reported = 1 | 0.030 | 0.170
White | If White = 1 | 0.601 | 0.490
Black | If Black = 1 | 0.023 | 0.150
Latino/Hispanic | If Latino/Hispanic = 1 | 0.269 | 0.443
Native American | If Native American = 1 | 0.040 | 0.195
Other | If Other race/ethnicity = 1 | 0.034 | 0.182
Race missing | If race not reported = 1 | 0.034 | 0.182
Age | Integer (17-73) | 38.822 | 7.846
Partner or spouse | If Partner or spouse in family = 1 | 0.736 | 0.441
Partner or spouse missing | If Partner or spouse in family not reported = 1 | 0.077 | 0.266
Partner or spouse attends | If Partner or spouse attended SFP = 1 | 0.499 | 0.500
Washington State | If family lives in Washington State = 1 | 0.622 | 0.485

3.3 Procedure

Pencil-and-paper pretests were administered as part of a standard, ongoing programme evaluation on the first night of the programme, before programme content was delivered; posttests were administered on the last night of the programme. All data are anonymous; names of programme participants are not linked to programme evaluations and are unknown to researchers. The Institutional Review Board of Washington State University issued a Certificate of Exemption for the procedures of the current study.

We used SFE to estimate (pre- and post-treatment) family functioning scores as a function primarily of demographic characteristics. Based on previous literature (Howard and Dailey, 1979), we hypothesised that the one-sided errors (response bias) would be downward, and preliminary analysis supported that hypothesis.³ Additional preliminary analysis of which variables to include among zi (including a model using all the explanatory variables) led us to conclude that three variables determined the level of bias in the family functioning assessment: age, Latino/Hispanic ethnicity, and whether the functioning measure was a pretest or posttest assessment. We used the 'xtfrontier' routine in Stata to estimate the parameters of our models. Unlike applications of SFE to technical efficiency estimation, our model does not require log-transforming the dependent variable.

3.4 The baseline model

The results of the baseline SFE model are shown in Table 2. The Wald χ² statistic indicated that the regression was highly significant. Several demographic variables influenced the assessment of family functioning at conventional levels of statistical significance. Males gave lower estimates of family functioning than did females and those with unreported gender. All non-White ethnic groups (and those with unreported race/ethnicity) assessed their family's functioning more highly than did White respondents. Participation in the Strengthening Families Program increased individuals' assessments of their family's functioning.

Table 2

SFE - total effects model

Variable | Coef. | SE | z | p-value
Functioning
Treatment | 0.156 | 0.027 | 5.87 | 0.000
Male | −0.119 | 0.020 | −6.03 | 0.000
Gender missing | −0.018 | 0.058 | −0.30 | 0.760
Black | 0.167 | 0.054 | 3.11 | 0.002
Latino/Hispanic | 0.256 | 0.029 | 8.86 | 0.000
Native American | 0.090 | 0.043 | 2.08 | 0.038
Other | 0.174 | 0.045 | 3.83 | 0.000
Race missing | 0.113 | 0.054 | 2.08 | 0.038
Age | −0.005 | 0.001 | −3.92 | 0.000
Partner or spouse | −0.026 | 0.022 | −1.18 | 0.237
Partner or spouse missing | −0.062 | 0.037 | −1.70 | 0.090
Washington State | 0.023 | 0.018 | 1.31 | 0.189
Constant | 4.605 | 0.054 | 85.63 | 0.000
μ
Treatment | −1.195 | 0.407 | −2.94 | 0.003
Hispanic | 1.100 | 0.383 | 2.87 | 0.004
Age | −0.052 | 0.028 | −1.88 | 0.061
lnsigma2 | 0.291 | 0.201 | 1.00 | 0.317
inlgtgamma | 2.559 | 0.263 | 9.72 | 0.000
σ² | 1.338 | 0.389
γ | 0.928 | 0.018
σ_u² | 1.242 | 0.383
σ_ε² | 0.096 | 0.010
Wald χ²(15) = 331.46
Prob > χ² = 0.000

We assessed bias, and its change, from the coefficient estimates for the δ parameters, where μ_i = z_iδ. Our first overall question was whether, in fact, there was a one-sided error. Three measures of unexplained variation are shown in Table 2: σ² = E(ε_i + u_i)² is the variance of the total error, which can be broken down into the component parts σ_u² = E(u_i²) and σ_ε² = E(ε_i²). The statistic γ = σ_u²/(σ_u² + σ_ε²) gives the proportion of total unexplained variation attributable to the one-sided error. To ensure 0 ≤ γ ≤ 1, the model is parameterised in terms of the inverse logit of γ, reported as inlgtgamma. Similarly, the model estimates the natural log of σ², reported as lnsigma2, and these estimates are used to derive σ², σ_ε², σ_u² and γ. As seen in the table, the estimate of inlgtgamma was highly significant, but the estimate of lnsigma2 had a p-value of 0.317, which means we cannot reject the hypothesis that all of the variation in the responses is due to respondent-specific bias. Hence, we found strong support for the one-sided variation that we call bias, and by far the most substantial portion of the unexplained variation in our data came from that source.
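
As a quick arithmetic check of the reported decomposition (our calculation from the Table 2 point estimates):

\gamma = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2} = \frac{1.242}{1.242 + 0.096} \approx 0.928,

that is, roughly 93% of the unexplained variation is attributed to the one-sided bias component.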

Three variables explained the level of bias. Latino/Hispanic respondents on average had more biased estimates of their family functioning. Looking again at equation (3), we see that this means they, relative to other ethnic groups, underestimated their family functioning. However, we found that older participants had smaller biases, thus giving closer estimates of their family's relative functioning. Of primary interest is the estimate of the treatment effect. Participation in SFP strongly lowered the bias, on average.

3.5 Decomposing the measured change in functioning

The total change in the functioning score averaged 0.295. This total change consisted of two parts as indicated by the following:

Total change = Measured postscore − Measured prescore
= (Real postvalue + Postvalue bias) − (Real prevalue + Prevalue bias)
= Real change + (Postvalue bias − Prevalue bias)

The term in parentheses is negative (the estimation indicates that treatment lowered the bias). Thus, the total change in the family functioning score underestimated the improvement due to SFP; moreover, true post-treatment family functioning was not as high, on average, as the reported post-treatment scores suggest. Table 3 shows the average estimated bias pre- and post-treatment and the average change in bias, which was −0.133. Thus, the average improvement in family functioning was underestimated by this amount (see the worked figures after Table 3).

Table 3

Averages of bias and change

Variable | M | SD
Estimated u, pre-treatment | 0.469 | 0.368
Estimated u, post-treatment | 0.335 | 0.273
Change in u, post minus pre | −0.133 | 0.346
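
As a worked illustration (ours), combining the measured average change of 0.295 with the bias averages in Table 3 under the decomposition above:

\text{real change} = 0.295 - (0.335 - 0.469) = 0.295 + 0.133 = 0.428,

so the reported pre-post gain understates the estimated true improvement by roughly 0.13 points.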

Table 4 shows the results of a regression of bias change on demographic and other characteristics. Males and Black respondents had marginally larger bias changes, while those with race/ethnicity unreported had smaller bias changes. Since the bias change was measured as postscore bias minus prescore bias, this means that the bias changed less, on average, for male and Black respondents, but more, on average, for those whose race was unreported.

Table 4

Regression of bias change

Dependent variable: change in bias

Variable | β | SE | t | p-value
Male | 0.050 | 0.023 | 2.19 | 0.029
Gender missing | 0.100 | 0.064 | 1.55 | 0.122
Black | 0.114 | 0.062 | 1.84 | 0.066
Latino/Hispanic | 0.015 | 0.022 | 0.68 | 0.496
Native American | 0.048 | 0.047 | 1.02 | 0.308
Other | 0.078 | 0.051 | 1.54 | 0.125
Race/ethnicity missing | −0.147 | 0.061 | −2.42 | 0.016
Age | 0.003 | 0.001 | 2.74 | 0.006
Partner or spouse | 0.032 | 0.028 | 1.13 | 0.258
Partner or spouse information missing | 0.051 | 0.040 | 1.27 | 0.203
Washington State | −0.002 | 0.020 | −0.11 | 0.912
Partner or spouse attended | −0.009 | 0.024 | −0.36 | 0.721
Constant | −0.303 | 0.054 | −5.65 | 0.000

Source | Sum of squares | df
Model | 3.408 | 12
Residual | 168.218 | 1,424
Total | 171.626 | 1,436
F(12, 1424) = 2.40; Prob > F = 0.0044; R-squared = 0.019

3.6 The SFE model with heteroscedastic error

One alternative to our baseline model (known as the total effects model in SFE terminology), which generated the results in Table 2, is an SFE model which allows for heteroscedasticity in εi, ui, or both. More precisely, for this model we maintained equation (3) but specified ln σ_εi² = w_iω_ε and ln σ_ui² = w_iω_u, where ω_ε and ω_u are parameter vectors to be estimated and w_i are variables that explain the heteroscedasticity. We note that w_i need not be the same in the two expressions, but since elements of ω_ε and ω_u can be zero we lose no generality by writing it as we do; in fact, in our application we used the same variables in both expressions, namely those used to explain μ in the first model. Table 5 reports the results of such a model. In this case, the one-sided error we ascribe to bias is evident from statistically significant parameters in the expression explaining σ_u².

Table 5

SFE with heteroscedasticity

Variable | Coef. | SE | z | p-value
Functioning
Treatment | 0.222 | 0.032 | 6.94 | 0.000
Male | −0.098 | 0.019 | −5.11 | 0.000
Gender missing | 0.002 | 0.057 | 0.04 | 0.970
African American | 0.159 | 0.054 | 2.95 | 0.003
Hispanic | 0.344 | 0.035 | 9.95 | 0.000
Native American | 0.096 | 0.042 | 2.27 | 0.023
Other | 0.158 | 0.044 | 3.63 | 0.000
Race missing | 0.090 | 0.053 | 1.69 | 0.091
Age | −0.001 | 0.002 | −0.65 | 0.516
Partner or spouse | −0.027 | 0.021 | −1.29 | 0.199
Partner or spouse missing | −0.044 | 0.035 | −1.25 | 0.213
Washington State | 0.017 | 0.017 | 0.98 | 0.325
Constant | 4.532 | 0.088 | 51.55 | 0.000
ln(σ_ε²)
Treatment | −0.715 | 0.187 | −3.81 | 0.000
Hispanic | −1.132 | 0.288 | −3.94 | 0.000
Age | −0.007 | 0.010 | −0.66 | 0.512
Constant | −1.906 | 0.434 | −4.39 | 0.000
ln(σ_u²)
Treatment | −0.247 | 0.116 | −2.13 | 0.033
Hispanic | 0.913 | 0.123 | 7.42 | 0.000
Age | −0.005 | 0.007 | −0.67 | 0.504
Constant | −0.761 | 0.319 | −2.39 | 0.017
Wald χ²(12) = 253.60
Prob > χ² = 0.000

We note first that the estimates in the main body of the equation were quantitatively and qualitatively very similar to those for the non-heteroscedastic SFE model. The only substantive change is that age was no longer significant at an acceptable p-value, and race unreported had a p-value of about 0.1. All signs and magnitudes were similar. Once again, results indicated that participation in SFP (treatment) strongly improved functioning. Additionally, treatment lowered the variability of both sources of unexplained variation across participants. The decreased unexplained variation due to ε is likely explained by individuals having a better idea of the constructs assessed by scale items. For our purposes, the key statistic here is the coefficient of treatment explaining σ_u². The estimated parameter was negative and significant, with a p-value of 0.03. Since the bias is one-sided, we can conclude that going through SFP significantly lowered the variability of the bias. Moreover, these estimates can be used to predict the bias of each observation; with this model the average bias fell from 0.545 to 0.492, so while the biases were larger with this model, the decrease in the average (−0.053) was about one-half the decrease we saw in the first model.
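
As a final hedged sketch, observation-level bias predictions of the kind referred to above can be computed, in the simple normal/half-normal setting of the earlier simulation, with the standard conditional-mean predictor from the SFE literature (the Jondrow et al. formula, not cited in the paper); the sign convention below assumes the bias enters additively, as in equation (3).

    import numpy as np
    from scipy.stats import norm

    def predict_bias(resid, sigma_u, sigma_e):
        """E[u | composed residual] when the residual is eps + u and u is half-normal."""
        s2 = sigma_u**2 + sigma_e**2
        mu_star = resid * sigma_u**2 / s2               # conditional mean of u given the residual
        sigma_star = np.sqrt(sigma_u**2 * sigma_e**2 / s2)
        z = mu_star / sigma_star
        return mu_star + sigma_star * norm.pdf(z) / norm.cdf(z)

    # Example, reusing names from the earlier simulation sketch (hypothetical values):
    # u_hat = predict_bias(y - beta0_hat - beta1_hat * x,
    #                      np.exp(log_su_hat), np.exp(log_se_hat))
    # print("average predicted bias:", u_hat.mean())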

4 Discussion and conclusions

As we noted earlier, bias in self-rating is of concern in a variety of research areas. In particular, the potential for recalibration of self-rating bias as a function of material or skills learned in an intervention has long been a concern to programme evaluators as it may result in underestimates of programme effectiveness (Howard and Dailey, 1979; Norman, 2003; Pratt et al., 2000; Sprangers, 1989). However, in the absence of an objective performance measurement, it has not been possible to determine whether lower posttest scores truly represent response-shift bias or instead an actual decrement in targeted behaviours or knowledge (i.e., an iatrogenic effect of treatment). By allowing evaluators to test for a decrease in response bias from pretest to posttest, SFE provides a means of resolving this conundrum.

The SFE method, however, is not without problems. The main limitation is that the estimates rely on assumptions about the distributions of the two error components. Model identification requires that one of the error terms, the bias term in our application, be one-sided. This, however, is not as strong an assumption as it looks, for two reasons. First, there is often prior information or theory that indicates the most likely direction for the bias. Second, the validity of the assumption can be tested statistically.

We presented SFE as a method to identify response bias and changes in response bias within the context of self-reported measurements, at both individual and aggregate levels. Even though we propose a novel application, the technique is not new; it has been widely used in economics and operational research for over three decades. The procedure is easy to adopt, since it is already supported by several statistical packages, including Stata (StataCorp., 2009) and Limdep (Econometric Software, Inc., 2009).

Response bias has long been a key issue in psychometrics, with response-shift bias a particular concern in programme evaluation. However, almost all statistical attempts to address the issue have been confined to using SEM to test for response shift bias at the aggregate level. As noted in the introduction, our approach has significant advantages over SEM techniques that try to measure response bias. SEM requires more data (multiple time periods and multiple measures) and measures bias only in the aggregate. SFE can identify bias with a single time period (although multiple observations are needed to identify bias recalibration) and identifies response biases across individuals. Perhaps the biggest advantage over SEM approaches is that SFE not only identifies bias but also provides information about the root causes of the bias. SFE allows simultaneous analysis of treatment effectiveness, causal factors of outcomes, and covariates of the bias, improving the statistical efficiency of the analysis over traditional SEM, which often cannot identify causal factors and covariates of bias and, when it can, requires two-step procedures. And since SFE allows the researcher to identify bias and causal factors at the individual level, it expands our ability to identify, understand, explain, and potentially correct for, response shift bias. Of course, bias at the individual level can be aggregated to measures comparable to what is learned through SEM approaches.

Acknowledgements

The authors would like to thank the anonymous referees. This study was supported in part by the National Institute on Drug Abuse (grants R21 DA025139-01A1 and R21 DA19758-01). We thank the programme providers and families who participated in the programme evaluation.


Keywords: response bias, response-shift bias, programme evaluation, stochastic frontier analysis, stochastic frontier estimation, SFE, prevention science

Footnotes

Reference to this paper should be made as follows: Rosenman, R., Tennekoon, V. and Hill, L.G. (2011). ‘Measuring bias in self-reported data’, Int. J. Behavioural and Healthcare Research, Vol. 2, No. 4/2011, pp. 320-332.

1 We present a single model that allows for pre- and post-intervention measurement of the outcome of interest and bias. If the self-reported data are not related to an intervention, β0 and δ0 (below) are identically 0 and there is only one time period, t.

2 Due to the symmetry of the normal distribution, without loss of generality we can also assume that the bias distribution is right truncated.

3 When we tried to estimate the parameters of a model with one-sided errors upward, the maximisation procedure failed to converge. A specification with one-sided errors upward but without a constant term converged, but a null hypothesis that there is a one-sided error term was rejected with near certainty, indicating that there is no sizable upward response bias. A similar analysis with the one-sided upward errors completely random (rather than dependent on treatment and other variables) was also rejected, again with near certainty. Thus, upward bias was robustly rejected.


Contributor Information

Robert Rosenman, School of Economic Sciences, Washington State University, P.O. Box 646210, Pullman, WA 99164-6210, USA.

Vidhura Tennekoon, School of Economic Sciences, Washington State University, P.O. Box 646210, Pullman, WA 99164-6210, USA ude.usw@aruhdiv..

Laura G. Hill, Department of Human Development, Washington State University, 523 Johnson Tower, Pullman WA 99164, USA ude.usw@lliharual.
