Psychol Rev 122(3): 429-460

PMID: 26120907

Separating Decision and Encoding Noise in Signal Detection Tasks

SDT and Static Criteria

In a typical yes/no signal detection experiment, an observer monitors an observation interval for the presence of a designated signal stimulus. The observer responds affirmatively if she believes the signal was present during this interval. The observer cannot respond with perfect accuracy on every trial, sometimes correctly reporting the presence of a signal when a signal stimulus in fact occurred, but sometimes incorrectly affirming the presence of a signal when a signal was not present. The hit rate (HR) is the relative frequency of saying “yes” when a signal is present; the false alarm rate (FAR) is the relative frequency of saying “yes” when a signal is not present. Misses and correct rejections are the relative frequencies of saying “no” when a signal is present and when a signal is absent. Manipulation of the observer’s ‘yes’ rate by changing task instruction, pay-off structure, or stimulus base rates elicits different values of HR and FAR, and the HR plotted against the FAR defines the receiver operating characteristic (ROC, Figure 2, left; Green & Swets, 1966).

Figure 2

Left: An ROC with three different decision criteria. When the signal strength is low, performance decreases, values of HR and FAR converge, and the ROC curve approaches the unity slope. With higher signal strength, HR and FAR diverge, so the ROC curve moves up and to the left. Right: underlying distributions of stimulus representations at the decision stage shown with high encoding noise and low decision noise (top panel) and an alternative representation with lower encoding noise and higher decision noise (bottom panel), each leading to the same performance outcome.

The data from empirical ROCs often comprise the fundamental features researchers wish to model in signal detection tasks. In most applications, SDT posits internal representations in the form of Gaussian random variables with mean values positioned along a decision axis and monotonically related to stimulus strength (Graham, 1989). Consequently, the representational distributions of two stimuli of different strength often overlap, leaving some non-zero likelihood that a stimulus sample from either stimulus class (signal present or signal absent) could have generated the internal response in a given trial. Many signal detection models assume that the observer responds by establishing a boundary or criterion along the decision axis, and chooses “yes” when the value of the sampled internal representation exceeds this criterion, and chooses “no” otherwise (Figure 2, right panels). Representations from signal present trials exceeding the criterion contribute to HR, and representations of signal absent trials exceeding the criterion contribute to FAR. Insofar as distributions of internal representations really do approximate Gaussian probability density functions, HR and FAR may be transformed into standardized scores (z-scores) to indicate the position of the criteria along the decision axis in units of the standard deviation of the underlying distributions (see Appendix A.1). Empirical zROC functions are often approximately linear, consistent with the Gaussian distribution assumption (Macmillan & Creelman, 2004). The classical SDT model does not incorporate trial-by-trial variability in the criterion position, so all response variability accrues from variations in the internal representations of the stimuli (Benjamin et al, 2009).
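The z-score transformation mentioned above can be sketched in a few lines. This is a minimal illustration of the textbook equal-variance case, not the derivation in Appendix A.1; the function names and the example rates are hypothetical.

```python
from statistics import NormalDist


def z_scores(hit_rate, false_alarm_rate):
    """Return (z_HR, z_FAR) using the inverse standard normal CDF."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.inv_cdf(hit_rate), std_normal.inv_cdf(false_alarm_rate)


def equal_variance_summary(hit_rate, false_alarm_rate):
    """Sensitivity d' and criterion location under the equal-variance model.

    Assumption: both distributions are Gaussian with the same standard
    deviation. The criterion is expressed relative to the signal-absent
    mean, in units of that common standard deviation.
    """
    z_hr, z_far = z_scores(hit_rate, false_alarm_rate)
    d_prime = z_hr - z_far
    criterion = -z_far
    return d_prime, criterion


# Hypothetical rates for illustration only (not data from the paper).
d_prime, criterion = equal_variance_summary(0.84, 0.16)
```

Plotting z_HR against z_FAR for several bias conditions gives the zROC; under the Gaussian assumption the points should fall close to a straight line.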

While some simple SDT applications assume equal variances for signal present and signal absent distributions, researchers frequently relax this equal variance assumption to account for the non-unity slopes often observed in empirical zROCs. Meanwhile, the static criterion assumption has rarely been relaxed. Early formulations of SDT excluded decision noise for two reasons (Tanner & Swets, 1954). First, because a static decision mechanism was optimal and part of a cognitive operation, an observer would not willingly choose to vary its operation from trial to trial, since this variable strategy would lead to lower overall performance (Benjamin et al., 2013; Mueller & Weidemann, 2008). Second, typical analyses of signal detection data simply could not differentiate between noise arising from representational and decision-related processes (Figure 2, right panels; see Wickelgren, 1968).

Evidence for Criterion Variability

Although practical considerations led to the omission of criterion variability in early applications of signal detection theory, lines of evidence suggesting a variable decision process predate even the Thurstonian framework (Fernberger, 1920). Later, reduced performance on absolute identification due to increased stimulus range was attributed to increased variance in identification criteria (the range effect; Pollack, 1952). Early research in auditory amplitude identification led to the explanation that the change in response variability arose due to subjects exhibiting a range-dependent criterion noise (also interpreted as memory noise; see Durlach & Braida, 1969). Later research suggested an independence between the range effect and the total number of response categories (Braida & Durlach, 1972) and specifically implicated the criterial range as the source of the performance decrement (Gravetter & Lockhead, 1973), though not to the exclusion of representation-related mechanisms as well (Luce, Nosofsky, Green, & Smith, 1982; Luce & Nosofsky, 1984; Nosofsky, 1983). Additionally, investigators have invoked criterion noise to help explain anomalies in the shape of the ROC curve (Murray, Bennett, & Sekuler, 2002; Mueller & Weidemann, 2008; Wickelgren, 1968); discrepancies in distribution-free estimates of response bias in confidence rating tasks (Mueller & Weidemann, 2008); performance decrements related to larger rating scales in confidence rating tasks (Benjamin et al., 2013); and feedback-associated manipulation (Carterette, 1966) and learning (Friedman, Carterette, Nakatani, & Ahumada, 1968) in auditory amplitude detection. Others have suggested that decision noise results from criterion-setting mechanisms for reconstructing stimulus representations at the decision level (Parks, 1966), and that criterion noise is related to non-optimal criterion shifting (Thomas, 1973, 1975). For a more extensive review, see Benjamin et al. (2009).

Although we have presented a small sample here, evidence arising from these disparate research areas has generated a great body of literature implicating the presence of criterion variability. Along with these empirical results, a literature of theoretical contributions has also emerged (e.g., Kac, 1962; Treisman, 1984; Treisman & Williams, 1985). Strictly speaking, to whatever extent quantitative models can account for the phenomena of criteria shifting, we can no longer refer to this as “noise” in the proper sense of the word. We here follow earlier writers who have disambiguated “systematic” noise from “unsystematic,” “irreducible,” or “random” noise (Levi, Klein, & Chen, 2005; Rosner & Kochanski, 2009). We now turn to the research efforts to separate and measure decision noise.

Decision Noise Methods and Models

Analysis of the categorical judgment task showed that standard signal detection experimental procedures could not generally distinguish representational noise from decision noise without significant simplifying assumptions (Rosner & Kochanski, 2009; Torgerson, 1958). The first serious research effort to understand the influence of decision noise began with Wickelgren and his study of response predictions for a variety of signal detection task conditions in the presence of significant criterion noise (although see also Tanner, 1961, for consideration of decision noise under a less rigid interpretation of decision criterion in a 2-alternative forced choice task). In a seminal paper, Wickelgren (1968) examined the ramifications of decision noise for subject performance in yes/no and confidence rating tasks. He derived functional forms for the zROC and showed that observers with non-trivial decision noise could produce linear zROCs as long as decision noise remained constant across criteria and task structure did not alter representational characteristics (see also Benjamin et al., 2009). Static criteria with Gaussian representational distributions lead to linear zROCs, but linear zROCs do not necessarily imply static criteria. Wickelgren also considered the implications of attenuated criterion noise at a primary decision boundary relative to the remaining criterion boundaries in bipolar confidence rating tasks and the data signature this affords in a zROC curve (see also Mueller & Weidemann, 2008; Murray et al, 2002). In particular, he observed that the subject could exhibit a peaked zROC when criterion noise at the primary decision boundary is significantly less than the decision noise at the remaining boundaries. Reviewing studies with greater numbers of category boundaries, he often identified larger peaks, leading to the speculation that increasing the number of category boundaries could increase decision noise. 
This finding was consistent with Miller’s famous paper on information retrieval (Miller, 1956) and the criterial range interpretation of the range effect (Gravetter & Lockhead, 1973) insofar as additional criteria lead to broader criterion spread across the decision axis.

Wickelgren’s close examination of the shape of subjects’ ROCs and zROCs became a standard diagnostic approach for criterion variability in signal detection type tasks. But because data collection in typical yes/no tasks requires bias manipulations that might alter either representational or decision processes, researchers preferred confidence rating procedures for their greater assurances of representation and decision noise stability over the duration of the experiment. However, even studies using rating procedures may have fallen short of unambiguous estimates of representation and decision variability owing to tradeoffs between these parameters in estimation (e.g., Mueller & Weidemann, 2008; Benjamin et al., 2009).

Nosofsky (1983) developed a multiple presentation method to examine the range effect with an identification task. On individual trials in his study, subjects made multiple responses to repeated identical presentations of a stimulus from one of the available stimulus classes. Although he treated each response as independent of the others, he assumed that noisy internal representations were averaged while decision noise remained constant across presentation repetitions. By separately measuring sensitivity for each presentation repetition, he demonstrated non-trivial decision and representational noise with both components increasing with larger criterion range.

Benjamin et al (2009) developed an Ensemble Recognition task similar to the multiple presentation method of Nosofsky to examine the effects of decision noise in memory recognition. In this study, subjects were first presented a study list of words they would later be asked to recognize during a test phase. During the test phase individual trials contained ensembles of one, two, or four words. Each ensemble contained either one, two, four, or no words from the previously examined study list. The Ensemble Recognition framework assumed that each word of each trial ensemble led to internal activations independent of the other words, and that either the sum or the average of these activations would comprise the internal representation at the decision stage. Similar to Nosofsky, these authors assume that the decision noise remained constant while the summing or averaging would lead to adding or averaging of the representational noise. The averaging model performed best in model selection tests and estimated a very significant role for decision noise in word recognition.

More recently, Kellen et al (2012) offered a critique of the conclusions drawn from the Ensemble Recognition study and provided new reports on the question of decision noise in memory recognition using a model generalization framework. This approach involves combining a 4-alternative forced choice task with a rating procedure under the traditional assumptions that internal representations are identical under the two regimes and that response bias does not play a role in subject response during forced choice tasks. They jointly fit their elaborated SDT model with decision noise to data from both the 4AFC and the confidence rating tasks but found virtually no significant decision noise influencing subject performance in their memory recognition experiments.

Rosner and Kochanski (RK; 2009) developed a categorical judgment model to separately estimate criterion noise at decision boundaries. They corrected an error in an earlier formal description of a categorization task that allowed for decision noise in absolute identification and confidence rating tasks (Torgerson, 1958). RK showed that the earlier formulation failed to account for the fact that truly independent noisy criteria might overlap from trial to trial and could result in predictions of negative response frequencies. Their revised formalization accounts for this overlap and can be reduced to two special cases: in the absence of decision noise the model simplifies to the traditional SDT model, and in the absence of representation noise the model simplifies to a complementary SDT model (a formulation which ascribes all response variability to noisy criteria). Using simulated experiments, RK showed parameter recovery was possible for a range of assumed parameter configurations. They argued that the general formulation of the model disambiguated the conflated parameters, and that acquiring sufficient degrees of freedom in data posed the only constraint to parameter estimation. In particular, a categorization task with N stimulus classes and M+1 response categories requires identification of the means and variances of 2N−2 stimulus parameters (assuming a reference stimulus class with mean 0 and variance 1) and 2M criterion parameters. This categorization task has NM independent data points, so that full model identification is possible only when NM > 2(N+M)−2; that is, when both N > 2 and M > 2. For the standard signal detection paradigm with 2 stimulus classes (N = 2), a solution is available only if the criterion variances are assumed equal at all category boundaries.

A New Approach

Intuitions and Rationale

We develop a framework combining two well-known experimental paradigms to estimate both representational and decision noise components in signal detection type tasks with only two stimulus classes, S₀ and S₁ (where 0 refers to signal absent trials and 1 refers to signal present trials). The first paradigm is a confidence-rating task in which subjects provide a rating R_i indicating their degree of certainty that the present trial contains a signal stimulus (Egan, Shulman, & Greenberg, 1959). The second paradigm is the multi-pass procedure, an external noise paradigm involving multiple presentations of identical stimuli (Burgess & Colborne, 1988; Green, 1964; Lu & Dosher, 2008). We show that this combination sufficiently constrains elaborated signal detection models by providing measures of agreement in addition to rating frequencies.

Here we offer some basic intuitions to illustrate our strategy for dissociating representation and decision noise components. To begin with, we simplify our exposition by considering response variability with a single criterion C with stimulus class S_h, where h = 0 or 1. If an observer responds differently to two or more trial presentations with identical stimuli, we attribute the change in response to internal noise. Researchers have explored this basic idea by adding external noise to stimulus presentations in order to estimate internal noise (Barlow, 1957; Pelli, 1990; Lu & Dosher, 1998, 2008). Examples of external noise include random assignment of contrast increments or decrements to individual pixels in a visual stimulus, samples of “white noise” added to an auditory stimulus, or any other random trial-by-trial perturbations to the stimulus. Multiple presentation methods that utilize external noise assume that the total noise degrading subject performance is a composite of component noise sources. The first component, with standard deviation σ_ext, reflects a variability in the subject’s internal representation of the external noise that is entirely correlated with the variability in the physical stimuli. This assumption implies that identical samples of external noise lead to internal representations that are partly composed of identical offsets along the decision axis. Therefore, a given sample offset reflected by this consistent noise component depends entirely on the specific noisy stimulus that evoked it¹. The second component, with standard deviation σ_{E_h}, signifies the internal noise induced during trials of stimulus class h and reflects random perturbations arising from the encoding of both signal (if present) and external noise in trial stimuli. Finally, random trial-by-trial sampling of a variable criterion with standard deviation σ_C constitutes a third component. 
The distributional parameters of the encoding noise component may be functionally related to features of a stimulus class (e.g., contrast level), but it is still stochastic in nature and results in random perturbations of the internal representation to identical stimuli. The criterion variability, by assumption, depends neither on individual stimulus samples nor on the general stimulus class. We refer to these secondary noise components as random noise (Levi & Klein, 2003) insofar as they operate independently of any external noise samples (drawn from a single distribution). Therefore, the total response variability σ_{T_h} during trial presentations of stimulus S_h is the combined result of the perturbations arising from consistent and random noise components:

$σ_{T_h}^{2} = σ_{ext}^{2} + σ_{E_h}^{2} + σ_{C}^{2}$    (1)

In a multi-pass paradigm, subjects perform a signal detection task over multiple passes of trials. Each trial from the first pass includes an independent sample of external noise. However, subsequent passes of trials contain the same stimuli and exactly identical samples of external noise as in the first pass (Figure 3). Although two passes suffice to obtain an estimate of agreement, in practice experiments often include additional passes for better accuracy and precision. Since any change in overt response to identical presentations of a stimulus reflects a change in the internal state of the observer, variability in response to identical stimuli reflects internal noise (Burgess & Colborne, 1988; Green, 1964; Lu & Dosher, 2008). Researchers can assess to what extent subject responses agree over multiple presentations of identical samples of noisy stimuli and this agreement can be used as an additional constraint to determine the ratio ${(σ_{E_{h}}^{2} + σ_{C}^{2})}^{1 / 2} / σ_{ext}$ (see Appendix). Low ratios of internal to external noise will lead to greater agreement between responses to identical stimuli, while higher ratios lead to a decline in agreement. The estimated statistic of agreement depends on the task specifications but can be measured with percent agreement (Burgess & Colborne, 1988; Spiegel & Green, 1981; Lu & Dosher, 2008), correlation (Levi & Klein, 2003), or covariance between responses to corresponding trials on successive passes.
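The relationship between the internal-to-external noise ratio and response agreement can be illustrated with a small simulation. The sketch below assumes a yes/no task with a single criterion and two passes, lumps encoding and criterion noise into one internal component with standard deviation (σ_E² + σ_C²)^{1/2}, and uses hypothetical parameter values; it is not the estimation procedure of the Appendix.

```python
import random


def two_pass_agreement(internal_to_external_ratio, n_trials=20000,
                       signal_mean=1.0, criterion=0.5, seed=0):
    """Proportion of trials with the same yes/no response on both passes.

    Assumptions (illustrative only): sigma_ext = 1, Gaussian noise, and a
    single lumped internal noise source that is redrawn on every pass.
    """
    rng = random.Random(seed)
    sigma_ext = 1.0
    sigma_int = internal_to_external_ratio * sigma_ext
    agree = 0
    for _ in range(n_trials):
        # The external noise sample is frozen: identical on both passes.
        external = rng.gauss(0.0, sigma_ext)
        responses = []
        for _pass in range(2):
            # Internal noise is independent from pass to pass.
            internal = rng.gauss(0.0, sigma_int)
            responses.append(signal_mean + external + internal > criterion)
        agree += responses[0] == responses[1]
    return agree / n_trials


low_ratio_agreement = two_pass_agreement(0.5)
high_ratio_agreement = two_pass_agreement(2.0)
```

With no internal noise the two passes agree on every trial, and agreement falls toward the level expected from the response base rates alone as the ratio grows.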

Figure 3

Left: a multi-pass procedure contains at least two runs with identical samples of external noise added to corresponding trial stimuli within each pass. Corresponding trials need not be presented according to the same stimulus schedule for each pass, but we match external noise samples with trial order here for the purpose of illustration. Right: Measures of agreement (percent agreement, covariance, correlation) between responses to corresponding trials across passes provide an additional behavioral measure to help constrain observer models.

For multi-pass experiments involving only a single decision criterion, the observed response frequency and response agreement can provide estimates of the total internal to external noise ratio in addition to sensitivity and response bias (Green, 1964; Burgess & Colborne, 1988). The separate contributions of criterion and encoding variance, however, remain underdetermined: many combinations of criterion and encoding noise are compatible with the measured HR, FAR, and agreement measures. In a multi-pass signal detection experiment with a single criterion, there are five parameters to estimate (encoding noise for each stimulus class, a mean value for the signal distribution, a criterion mean, and a criterion variance) with only four data points (HR, FAR, agreement on signal present trials, and agreement on signal absent trials).

Degrees of freedom increase with additional criteria in a rating experiment. Rosner & Kochanski (2009) demonstrated the possibility of independent estimates of criteria variability, criteria positioning, stimulus positioning, and stimulus representational noise (they did not distinguish between consistent and random components) in rating tasks with at least three stimulus levels and four response categories. Estimating these parameters with only two stimulus classes, however, requires additional constraining data measurements. In this paper, we use a multi-pass confidence rating procedure (MCR) and we measure the covariance of responses to trials of a specific stimulus class across different passes as an index of response correlation between these passes. The full covariance matrix provides a compact summary of agreement measures for the same categorization of identical trials across passes (within-category covariance along the diagonal) as well as disagreement for different categorizations of identical trials (between category covariance off the diagonal). Conceptually, if trial-by-trial responses over each pass are taken as vector elements, then the covariance gives the (mean adjusted) dot product of these response vectors. A highly positive covariance estimate implies response agreement across passes. Very low covariance (near zero) implies lack of agreement. Highly negative covariance implies not only lack of agreement but strong disagreement across passes. With low to moderate levels of internal noise, we intuitively expect positive covariance values for within-category estimates along the diagonal of the covariance matrix. For between-category covariance estimates for adjacent regions of decision space (e.g., response assignments of “2” and “3” across passes) we might expect lower though still positive values. 
For between-category covariance estimates for response assignments of nonadjacent regions (e.g., response assignments of “2” and “5” across passes), we expect nearly zero or negative covariance estimates.
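One way to make the covariance matrix concrete is to code each rating category as a 0/1 indicator per trial and compute the covariance of indicators across two passes. This is a minimal sketch under that assumption; the paper's exact estimator is defined in its Appendix, and the function name and toy responses below are hypothetical.

```python
def category_covariance(pass1, pass2, categories):
    """Covariance of category indicators across two passes.

    pass1[i] and pass2[i] are the ratings given to the same frozen stimulus
    on the first and second pass. Entry (r, s) is the population covariance
    (dividing by n) between "rated r on pass 1" and "rated s on pass 2".
    """
    if len(pass1) != len(pass2):
        raise ValueError("passes must contain the same number of trials")
    n = len(pass1)
    cov = {}
    for r in categories:
        first = [1.0 if resp == r else 0.0 for resp in pass1]
        mean_first = sum(first) / n
        for s in categories:
            second = [1.0 if resp == s else 0.0 for resp in pass2]
            mean_second = sum(second) / n
            cov[(r, s)] = sum((a - mean_first) * (b - mean_second)
                              for a, b in zip(first, second)) / n
    return cov


# Toy data with perfect agreement across passes (illustration only).
pass1 = [1, 1, 2, 2, 3, 3]
pass2 = [1, 1, 2, 2, 3, 3]
cov = category_covariance(pass1, pass2, categories=[1, 2, 3])
```

In this toy case the within-category entries on the diagonal are positive and the between-category entries are negative, which is the extreme form of the pattern described above.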

Here we show that the MCR procedure sufficiently constrains a class of decision noise models to identify all relevant parameters even when the task involves only two stimulus classes. Under the MCR procedure, each stimulus class gives us M independent response frequencies as well as M independent agreement measures for identical responses between passes. In addition to the covariance of responses for the same rating category across passes (within-category covariance: e.g., response category “2” in the first pass and “2” again in subsequent passes), the covariance of responses for different rating categories across passes may provide even stronger constraints for model fits to data (between-category covariance: e.g., response category “2” in the first pass and “3” in subsequent passes). In total, the MCR provides M(M+3) data points (2M response frequencies and M(M+1) covariance estimates) to fit 2M+3 free parameters: M criterion positions, M criterion variances, an encoding variance for the signal absent trials, an encoding variance for the signal present trials, and the mean position of the signal stimulus along the decision axis (Table 1). Therefore, the MCR procedure may provide sufficient constraints to recover all decision noise parameters for a rating task with as few as three response categories (corresponding to M = 2).

Table 1

Degrees of freedom in rating procedure tasks

Procedure	Data points	Relation	Free parameters
Rating procedure	2M	<	2M + 3
MCR procedure	2 × 2M	>	2M + 3
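The counts in Table 1 and in the preceding paragraph can be checked with simple arithmetic. The helper names below are hypothetical; the formulas are the ones stated in the text (2M response frequencies, 2M within-category agreement measures, M(M+1) covariance estimates when between-category terms are included, and 2M+3 free parameters).

```python
def free_parameters(m):
    """M criterion means, M criterion variances, two encoding variances,
    and one signal mean."""
    return 2 * m + 3


def rating_data_points(m):
    """M independent response frequencies for each of two stimulus classes."""
    return 2 * m


def mcr_data_points(m, include_between_category=False):
    """Response frequencies plus agreement measures for the MCR procedure."""
    frequencies = 2 * m
    if include_between_category:
        covariances = m * (m + 1)  # full covariance structure, both classes
    else:
        covariances = 2 * m        # within-category agreement only (Table 1)
    return frequencies + covariances


# Smallest case discussed in the text: three response categories (M = 2).
m = 2
summary = (rating_data_points(m), mcr_data_points(m),
           mcr_data_points(m, include_between_category=True),
           free_parameters(m))
```

For M = 2 this gives 4 data points for the rating procedure, 8 or 10 for the MCR procedure, and 7 free parameters.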

To illustrate this point, Figure 4 (left) shows two overlapping and nearly identical ROCs generated using very different underlying internal noise components. In one case, the encoding noise is equal for signal-absent and signal-present trials while decision noise is small for all criteria. In the second case, the encoding noise for signal-present trials is half that for signal-absent trials, while the decision noise varies markedly across criteria and even well exceeds the encoding noise at one of the decision boundaries. Yet, in spite of these very different noise profiles, the resulting ROCs are essentially the same. On the other hand, the covariance measures estimated from an MCR procedure are drastically different (Figure 4, right) and may provide additional constraints to disambiguate the underlying noise components.

Figure 4

Left: Two overlapping ROCs generated using a decision rule described by Rosner and Kochanski (2009; see decision rules below) and assuming two different underlying parameter sets. Parameters 1 (circles): encoding noise is 1 for both signal absent and signal present trials; the mean of the signal distribution is 1; criteria are located at −0.62, 0, 0.5, 1 with criterion noise at 0.1 for all criteria. Parameters 2 (+’s): encoding noise is 0.8 for signal absent trials, 0.4 for signal present trials; the signal mean is 0.92; the criteria are located at −0.15, 0, 0.5, 0.77 with corresponding criteria noise of 0.125, 1, 0.3, 0.2. All quantities given in units of the consistent noise, σ_ext. Right: covariance outcomes using the same two underlying parameter sets result in discriminably different data patterns. Within-category covariances are denoted as [r,r] and lie within the gray bar. Between-category covariances lie outside the gray bar. Blue symbols mark within- and between-category covariances for response “2”; red for response “3”; black for response “4”; and magenta shows within-category covariance for response “5”. For example, between-category covariance for response categories “3” and “5” across passes are shown as red circles and +’s at the position “r, r+2” along the abscissa.

While a greater number of independent data points relative to the number of free parameters provides a necessary condition for fitting those parameters within the context of a model, this is not sufficient all on its own (Busemeyer & Diederich, 2010). Even with more data points relative to free parameters, the data may fail to fully constrain the model and disambiguate the parameters, so that successful model identification depends on more than degrees of freedom alone.

We will provide evidence that the MCR framework allows for full parameter recovery from simulated data over a wide range of conditions. However, we first seek an intuitive demonstration of the relationship between observed data and underlying noise components. Some changes to covariance data are straightforward: representational noise for a specific stimulus class selectively depresses covariance estimates for responses to that stimulus class. The pattern of expected values becomes more complex with the introduction of decision noise, because nontrivial decision noise at even a single criterion boundary leads to changes in covariance and z-scores at all criteria owing to positional overlap. In Figure 5, we examine changes to expected values of response frequencies and covariance structure for a three-category rating task in which we selectively increase the variability of one of the criteria from zero to match the level of variability in the stimulus representation. For this very simple example, we assume that observers map internal representations to responses according to a corrected Law of Categorical Judgment as described by Rosner and Kochanski (2009; see Decision Rules below). This decision rule determines response assignment by subtracting the trial-sampled representation from each trial-sampled criterion and choosing the category whose criterion gives the least positive difference; when all differences are negative, the representation is assigned to the highest response category.
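The verbal decision rule above can be written as a short function. This is a sketch of the rule as it is described in this paragraph, assuming criteria are listed in order of their category index; it is not taken from Rosner and Kochanski's own formulation, and the function name is hypothetical.

```python
def categorize(representation, sampled_criteria):
    """Assign a rating category from one trial's sampled values.

    sampled_criteria[k] is this trial's sample of the criterion that bounds
    category k + 1 from above. Because the criteria are noisy, the samples
    need not be in increasing order on a given trial.
    """
    best_category = None
    best_difference = None
    for category, criterion in enumerate(sampled_criteria, start=1):
        difference = criterion - representation
        # Keep the criterion with the least positive difference.
        if difference > 0 and (best_difference is None
                               or difference < best_difference):
            best_category, best_difference = category, difference
    if best_category is None:
        # The representation exceeds every sampled criterion.
        return len(sampled_criteria) + 1
    return best_category


# Ordered criteria at 0 and 1 behave like the classical static case.
ordered = [categorize(x, [0.0, 1.0]) for x in (-0.5, 0.5, 1.5)]
# A lax-criterion sample (1.5) that overshoots the strict criterion (1.0).
crossed = [categorize(x, [1.5, 1.0]) for x in (0.5, 1.2, 1.8)]
```

When the criterion samples cross, a representation above the strict criterion but below the lax-criterion sample is assigned "1" rather than "3", which is the kind of reassignment the three-category example illustrates.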

Figure 5

Left-top: decision space for classical confidence rating signal detection task with no decision noise. Criterion locations lie at the means of the signal-absent and signal-present distributions. Left-center and bottom: decision space showing joint distributions when decision noise equal to the representational noise is selectively added to the more lax criterion. The center of the concentric circles represents the mean position of the lax criterion along the ordinate, and the mean position of the signal-absent distribution (center) and signal-present distribution (bottom) along the abscissa. Straight blue lines represent mean criterion positions. Numbers overlaying joint distributions denote expected response category for trial-sampled criteria and representations falling in these regions. Right: zROC (top) and covariance data (bottom) for classical signal detection task without decision noise (circles) and with decision noise equal to representational noise at the more lax criterion (crosses). Within-category covariance data lie within the gray bar, between-category covariance data lie outside the gray bar. Covariance data indicating a response of “2” in at least one pass are blue; within-category covariance for response “3” in both passes is red. See main text for more details.

We begin from the standard SDT account with no decision noise. In this case we assume that two static criteria, positioned at the means of the signal-absent and signal-present distributions, respectively, divide the decision space into three response categories (Figure 5, top-left). Our example assumes d′ = 1 with equal representational noise for the two evidence distributions. We juxtapose a second scenario in which we selectively increase the decision noise for the more lax criterion to match the representational noise, without modifying any of the other parameters. The joint distributions accounting for the variability in the criterion as well as the variability in the signal-absent and signal-present representations are shown as concentric circles (Figure 5, left middle and bottom). The vertical axis represents positions of the noisy criterion, the horizontal axis reflects positions of the noisy internal representations, and the solid blue lines reflect the position of the means of the noisy and static criteria with respect to the noisy criterion (horizontal blue lines) and representational (vertical blue lines) distributions. Finally, we superimpose rating response column and row labels A, B, C, and D for regions of the joint distributions according to the decision rule described above. For example, when trial samples of both the noisy criterion and the representation exceed the stricter (and static) criterion in region DD, some trial representations will be classified as “1” instead of “3”, depending on whether the sampled criterion exceeds the sampled representation. Similarly, trial representations will always be classified with a response category of “2” whenever a sampled criterion exceeds the static criterion while the sampled representation does not (regions AD, BD, and CD). Each column of these joint distributions illustrates how some representations falling along the decision axis become reassigned depending on the position of the trial-sampled criterion. In column C, for example, all representations remain with a response assignment of “2” except in row C where some will be reassigned to a response of “1.”

Figure 5 (right) also shows the corresponding changes to the zROC and covariance in the classical SDT treatment with no decision noise (shown as circles) and with the targeted increase in decision noise at the more lax criterion (shown as ‘+’ symbols). In the case of the zROC plot, we can see how the introduction of decision noise at the more lax criterion results in a small but noticeable change in position for the stricter criterion in z-space. Column D in the joint distributions shows that response assignments of “3” can only decrease with increased decision noise at the more lax criterion, and no responses previously mapped to “1” or “2” will be reassigned to “3” according to the parameters we have chosen for this illustration. This net loss of assignments to “3” occurs for both signal-absent and signal-present trials and is reflected by a shift in the criterion estimate in the zROC towards the bottom left. Similarly, columns A and B show how the criterion variability on signal-absent trials results in a net decrease of response assignments mapped to “1”, leading to a significant rightward shift in the more lax criterion estimate in zROC space: losses from region BB are canceled by gains in region CC, but regions AA, BA, AD, and BD all lose response assignments of “1” without corresponding counterbalancing regions. These regional reassignments also hold for signal-present trials, but in this case region CC represents a much higher likelihood under the joint density function than is counterbalanced by regions AA, BA, AD, and BD. These regional exchanges, coupled with an additional increase in “1” responses from region DD to counterbalance losses in region BB, result in a very slight net increase in response assignments of “1” with a corresponding subtle downward shift in the position of the more lax criterion in the zROC plot.

We can also observe that this increased decision noise changes the covariance data, though overall response frequency also affects this measure in addition to the correlation in responses across passes. For both signal-absent and signal-present trials, the covariances for response assignments of “3” decrease due to lower correlations and lower response frequencies when trial samples of both criterion and representation fall within region DD. Within-category covariance for response assignments of “2” also decreases with increased decision noise for signal-absent trials, since many of the regions previously assigned to “1” become remapped to “2” under the joint distribution. Although the remapping of these regions also occurs during signal-present trials, covariance for response assignments of “2” nets a small increase here because the overall response frequency increases with decision noise, while the shifted position of the signal-present joint distribution leads to a smaller drop in correlation than occurs in signal-absent trials (note the lower impact of regions AD, BA, BB, and BB). On the other hand, the between-category covariance of responses “2” and “3” becomes increasingly negative on both signal-absent and signal-present trials. These negative covariances occur because response assignments of both “2” and “3” become increasingly associated with “1” on subsequent passes, thereby decreasing the “2–3” covariance from baseline.

Decision Rules

For any task amenable to analysis within the signal detection framework, SDT assumes observers generate responses by comparing internal representations of the trial stimulus with one or more decision criteria. A decision rule constitutes a specific protocol that determines how an observer assigns an internal representation to a response. With static criteria, most straightforward decision rules predict identical responses for any given trial-sampled representation. With noisy criteria, the situation may be quite complex. When the task involves only a single noisy criterion (yes/no, 2AFC, 2IFC with bias, etc.), no ambiguity arises in this comparison. Similarly, for tasks calling for multiple criteria (rating procedures, identification, classification, etc.), it is straightforward to map a trial-sampled representation to a response as long as the noisy criteria do not overlap from trial to trial. We might even expect the operation of an enforcement mechanism maintaining ordinal relations between trial-sampled criteria (Treisman & Faulkner, 1984).

When noisy criteria have overlapping distributions, trial-sampled criteria may sometimes become disordered along the axis, requiring subjects to implement a more complicated decision rule. Simultaneous decision rules require the observers to compare the internal representation with available criteria all at once. These decision rules then determine a response category by making a unique selection among the results of these comparisons. The work in this paper focuses on several forms of simultaneous decision rules.

We first formulate the simultaneous decision rule used by RK: subtract the position of the stimulus representation from each criterion boundary and respond with the category affording the least positive distance; if all differences are negative, respond with category M+1. Following a notation similar to that used by RK, let s_h ∈ G(0, 1), where G(μ, σ) is a Gaussian random variable with mean μ and variance σ^2. Then s_h σ_{E_h} equals the random offset of the internal response from its mean position μ_{S_h} due to the subject’s encoding noise during a trial of stimulus class S_h. Also, let c_i ∈ G(0, 1), so that c_i σ_{C_i} equals a trial-sampled offset of the i-th criterion from its mean location μ_{C_i} due to the subject’s internal decision noise at that boundary. We now assume a single external noise level σ_ext = 1, so that all parameters are estimated in reference to this term. We let s_ext equal an observer’s consistent trial-by-trial offset to the internal representation due to presentation of a specific sample of Gaussian external noise, so that s_ext ∈ G(0, 1). The RK decision rule just described can be formalized as follows: for a trial-sampled stimulus of class h, choose the category m for which the following expression evaluates to true, or category M+1 if it evaluates to false for all m:

s_{ext} + s_{h} σ_{E_{h}} + μ_{S_{h}} < c_{m} σ_{C_{m}} + μ_{C_{m}} < \min_{\forall m' \neq m} [c_{m'} σ_{C_{m'}} + μ_{C_{m'}} ∣ s_{ext} + s_{h} σ_{E_{h}} + μ_{S_{h}} < c_{m'} σ_{C_{m'}} + μ_{C_{m'}}]

(2)

Klauer & Kellen (2012) proposed two alternative simultaneous decision rules. The first alternative determines the trial-by-trial response as follows: subtract each of the M criterion boundaries from the trial-sampled stimulus representation and respond with the category m+1 yielding the smallest positive distance; in the event all differences are negative, choose category 1. The second rule determines the trial-by-trial response by computing the least absolute distance between criterion boundaries and the trial-sampled representation. Specifically, subtract the stimulus representation from all M criterion boundaries, identify the criterion boundary m with the smallest absolute difference from the stimulus representation, and choose category m if that difference is positive and m+1 otherwise. This second rule has the additional consequence that rating frequencies will be symmetrically distributed when the corresponding means of criteria distributions are symmetrically distributed about an evidence distribution. Given any M > 1 trial-sampled criteria, these decision rules can be used to map any trial-sampled internal representation to overt observer responses.

To distinguish these three decision rules, we follow Kellen et al (2012) and denote RK’s Law of Categorical Judgment as LCJ (given by equation 2); we denote the second (Klauer and Kellen’s complementary version of the LCJ) as LCJ_c, and the last as LCJ_sym due to its symmetric treatment of criterial boundaries relative to trial-sampled representations. Figure 6 contrasts the response mappings for each of these three decision rules when trial-sampled criteria overlap. For a given sample of criteria, the rules prescribe different response profiles for stimuli falling in a given region along the decision axis. Note that for any given overlapping criteria the LCJ and LCJ_c prescribe entirely incongruent responses while LCJ_sym shows some response agreement with both. These differences suggest the possibility that the LCJ will produce distinctly different data patterns in the aggregate from the LCJ_c rule and moderately different patterns from the LCJ_sym rule. With these three different decision rules in hand, we examined the possibility of parameter recovery in simulated MCR experiments using simultaneous decision rules that either matched or mismatched the rule used to generate simulated data.
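The three verbal rules can be made concrete with a short Python sketch. This is an illustrative reading of the descriptions above, not the authors' code: the function names, the use of NumPy, and the handling of exact ties (a zero distance is treated as non-positive) are assumptions. Categories are numbered 1 to M+1 and criteria are indexed by identity, not by their sampled order.

```python
import numpy as np

def lcj(x, criteria):
    """LCJ: criterion minus representation; pick the least positive distance."""
    d = np.asarray(criteria, dtype=float) - x
    if not np.any(d > 0):
        return len(d) + 1                      # all differences negative: category M+1
    return int(np.argmin(np.where(d > 0, d, np.inf))) + 1

def lcj_c(x, criteria):
    """LCJ_c: representation minus criterion; respond m+1 for the least positive distance."""
    d = x - np.asarray(criteria, dtype=float)
    if not np.any(d > 0):
        return 1                               # all differences negative: category 1
    return int(np.argmin(np.where(d > 0, d, np.inf))) + 2

def lcj_sym(x, criteria):
    """LCJ_sym: nearest criterion in absolute distance; m if it lies above x, else m+1."""
    d = np.asarray(criteria, dtype=float) - x
    m = int(np.argmin(np.abs(d)))              # zero-based index of nearest criterion
    return m + 1 if d[m] > 0 else m + 2

# Ordered criteria: all three rules agree.
ordered = [0.0, 1.0]
agree = [(lcj(x, ordered), lcj_c(x, ordered), lcj_sym(x, ordered)) for x in (-0.5, 0.4, 1.5)]

# Disordered sample (C1 above C2): LCJ and LCJ_c disagree between the criteria.
swapped = [1.0, 0.0]
between = (lcj(0.4, swapped), lcj_c(0.4, swapped), lcj_sym(0.4, swapped))
```

With ordered criteria the rules give the same response for each representation; once the sampled criteria swap order, a representation lying between them is assigned to opposite ends of the scale by LCJ and LCJ_c, while LCJ_sym sides with whichever criterion is nearer.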

Figure 6

Criterion overlap and stimulus-response mapping for three different decision rules. Random trial-by-trial sampling may lead to ordinal rearrangement of criteria (C₁ and C₂). The encircled red letters A, B, C, and D denote different positions of trial-sampled stimulus representations falling along the decision axis. An observer requires an explicit decision rule to map the internal representation to a response. Under each stimulus representation, the columns of the Observer Response show how an observer operating under the LCJ, LCJ_c, and LCJ_sym decision rules classifies that stimulus representation. See main text for response mapping protocols.

Simulation Study

In the present study, we recruit the power of external noise and the MCR method in a confidence rating task to disambiguate and estimate criterion noise under the various simultaneous decision rules LCJ, LCJ_c, and LCJ_sym. We derived the expected values of the response frequencies and covariance data conditioned on trial-by-trial samples of external noise. Here in the main text we show the equations describing LCJ. For a formal description of LCJ_c and LCJ_sym, please see Appendix A.

For the LCJ decision rule, the expected response frequencies conditioned on the external noise sample s_ext for the h-th stimulus class are given as,

P (R = m ∣ s_{ext}, S_{h}) = \int ϕ (c_{m}; μ_{C_{m}}, σ_{C_{m}}) \int_{- \infty}^{μ_{C_{m}} + c_{m} σ_{C_{m}}} ϕ (s_{E_{h}}; s_{ext} + μ_{S_{h}}, σ_{E_{h}}) \prod_{m' \neq m} [1 - \int_{μ_{S_{h}} + s_{E_{h}} σ_{E_{h}} + s_{ext}}^{μ_{C_{m}} + c_{m} σ_{C_{m}}} ϕ (c_{m'}; μ_{C_{m'}}, σ_{C_{m'}}) dc_{m'}] ds_{h} dc_{m}

(3)

where ϕ(x) is the Gaussian probability density function. We then easily determine P (R = M + 1 ∣ s_ext, S_h) as $1 - \sum_{m = 1}^{M} P (R = m ∣ s_{ext}, S_{h})$. The first term in eq. 3 integrates over all possible values of the m-th criterion. The middle term integrates over stimulus representation values up to that criterion. The third term estimates the probability that the response is consistent with any other criterion. We then integrate over all external noise samples s_ext to get the overall response frequency for this stimulus class h.

P (R = m∣S_h) = ∫P(R = m∣s_ext, S_h)ϕ(s_ext)ds_ext

(4)

Similarly, across any two passes i and j, the covariance between any two response categories m and m′ is,

Cov [R_i = m, R_j = m′ ∣ S_h] = ∫P(R_i = m ∣ s_ext, S_h) P(R_j = m′ ∣ s_ext, S_h) ϕ(s_ext) ds_ext − P(R_i = m ∣ S_h) P(R_j = m′ ∣ S_h)

(5)
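As a sanity check on equations 4 and 5, it may help to work through a special case that is not in the original text: a single criterion (M = 1). Under that assumption the conditional probability collapses to P(R = 1 ∣ s_ext) = Φ((μ_C − μ_S − s_ext) / (σ_E^2 + σ_C^2)^{1/2}), where Φ is the standard normal distribution function, and the integrals over s_ext can be approximated by Gauss-Hermite quadrature. The sketch below uses NumPy's `hermegauss`; the function names and parameter values are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def phi_cdf(z):
    """Standard normal CDF applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(z) / math.sqrt(2.0)))

def single_criterion_moments(mu_s, mu_c, sigma_e, sigma_c, n_nodes=80):
    """Return P(R=1) (eq. 4) and the across-pass covariance of R=1 (eq. 5) for M = 1.

    Assumes sigma_e and sigma_c are not both zero and that sigma_ext = 1.
    """
    nodes, weights = hermegauss(n_nodes)       # quadrature for weight exp(-x^2 / 2)
    weights = weights / weights.sum()          # normalize to an N(0, 1) expectation
    internal_sd = math.sqrt(sigma_e**2 + sigma_c**2)
    p_given_ext = phi_cdf((mu_c - mu_s - nodes) / internal_sd)
    p = float(np.sum(weights * p_given_ext))
    cov = float(np.sum(weights * p_given_ext**2)) - p**2
    return p, cov

p, cov = single_criterion_moments(mu_s=1.0, mu_c=0.5, sigma_e=0.6, sigma_c=0.8)
# Closed form for the marginal: total variance is sigma_ext^2 + sigma_E^2 + sigma_C^2.
closed_form = float(phi_cdf((0.5 - 1.0) / math.sqrt(1.0 + 0.6**2 + 0.8**2)))
# Swapping encoding and decision noise leaves both moments unchanged when M = 1.
p_swapped, cov_swapped = single_criterion_moments(1.0, 0.5, 0.8, 0.6)
```

In this special case encoding and decision noise enter only through their sum, so response frequencies and covariances from a single criterion cannot separate them; the multi-criterion expressions in equations 3 to 5 are what allow the two sources to make different predictions.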

We now show that data from the MCR experiment adequately constrains the models to uniquely identify individual representational and decision noise components. We approach this problem by examining the precision, accuracy, and goodness-of-fit of recovered model parameters from simulated data. For each decision rule adopted by our simulated observer we tested parameter recovery when fitting simulation data with matched models (e.g., LCJ fitted to data generated with a simulated observer using LCJ) as well as when fitted with mismatched models (LCJ_c and LCJ_sym fitted to data generated with simulated observer using LCJ). In the multi-pass framework, response frequencies and the covariances of responses across passes are estimated. This covariance data paired with the rating response sufficiently specifies the models for independent identification of encoding and decision noise contributions.

Methods

Rationale

In order to demonstrate full parameter recovery for the model using our new framework, we simulated a number of MCR experiments under a range of noise configurations. Because MCR experiments schedule identical stimuli over each pass, data collection may require significant empirical investment. Since the minimal data for acceptable model recovery was of interest, we examined not only the possibility but also the feasibility of parameter recovery at different numbers of trials and passes per simulated experiment.

Our simulations investigated several plausible configurations for the parameters of criterion and stimulus distributions using three response categories and two stimulus classes. We focus on the minimum number of stimuli and rating categories because earlier efforts towards parameter recovery became problematic with fewer response categories. We investigated configurations in which either the criterion noise variances or the encoding noise variances were equated along the decision axis (labeled equ), increased along the decision axis (labeled asc) or decreased along the decision axis (labeled des). We assume a single external noise variance of unity for all stimulus classes, with an external noise mean of zero. For any given variance configuration, $0 \leq \max [σ_{E_{0}}^{2}, σ_{E_{1}}^{2}] \leq 1$ and $0 \leq \max [σ_{C_{1}}^{2}, σ_{C_{2}}^{2}, \dots, σ_{C_{M}}^{2}] \leq 1$ . We also normalized the sum of the highest decision and encoding noise variances to equal the variance of the external noise. In other words, $\max [σ_{E_{0}}^{2}, σ_{E_{1}}^{2}] + \max [σ_{C_{1}}^{2}, σ_{C_{2}}^{2}, \dots, σ_{C_{M}}^{2}] = σ_{ext}^{2}$ . This constraint accords with the reports of previous authors that the total internal noise lies near this level for visual and auditory detection and discrimination experiments over a considerable range of external noise levels² (Burgess & Colborne, 1988; Green, 1964; Lu & Dosher, 2008). For all other noise components, we computed variances by applying logarithmic decrements in the ascending and descending conditions. We positioned each criterion mean along the decision axis at $\frac{1}{3} {(σ_{ext}^{2} + σ_{E_{0}}^{2})}^{1 / 2}$ and $\frac{2}{3} {(σ_{ext}^{2} + σ_{E_{0}}^{2})}^{1 / 2}$ so that we could ensure a robust level of trial-by-trial criterion overlap. Finally, we kept the position of the mean of the signal distribution at ${(σ_{ext}^{2} + σ_{E_{0}}^{2})}^{1 / 2}$ . 
The various arrangements of parameter configurations are shown in Table 2 and Figure 7.
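The placement and normalization rules for the simulated observer can be written out as a small Python sketch. The function names are hypothetical, and the example variances below are placeholders chosen only to satisfy the stated constraint; the specific values used in the study are not reproduced here.

```python
import math

def observer_positions(var_e0, var_ext=1.0):
    """Criterion means at 1/3 and 2/3 of the signal-absent SD; signal mean at one SD."""
    scale = math.sqrt(var_ext + var_e0)
    return {"criterion_means": (scale / 3.0, 2.0 * scale / 3.0), "signal_mean": scale}

def satisfies_normalization(encoding_vars, decision_vars, var_ext=1.0, tol=1e-9):
    """Check max encoding variance + max decision variance equals the external variance."""
    return abs(max(encoding_vars) + max(decision_vars) - var_ext) < tol

# Placeholder configuration: equal encoding and equal decision noise variances.
positions = observer_positions(var_e0=0.5)
ok = satisfies_normalization(encoding_vars=[0.5, 0.5], decision_vars=[0.5, 0.5])
```

Because both criterion means sit within one standard deviation of the signal-absent mean, any non-trivial decision noise makes the sampled criteria swap order on a sizeable share of trials, which is the overlap the design is meant to guarantee.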

Figure 7

Probability density functions for six representative parameter configurations underlying response behavior for simulated observers. Black density functions represent signal-absent trials, red for signal-present trials, and blue for criterion noise. DN: decision noise; EN: encoding noise.

Table 2

Parameter Configurations for Simulation Study

		Encoding Noise
		Equal	Ascending
Decision Noise	0		✓
	Equal	✓	✓
	Ascending	✓	✓
	Descending		✓

The simulated experiments emulated a confidence rating detection paradigm in which an observer maintains two criteria that define three response categories. The simulated observer implemented a LCJ decision rule for all noise level configurations. We also generated simulated data with the LCJ_c and LCJ_sym decision rules for a single parameter configuration in which decision and encoding noise are equal across criterion boundaries and stimulus classes. The probability of a signal present stimulus was 0.5. The simulated experiments varied the number of trials per pass and number of passes per experiment, in addition to a specific parameter configuration. The number of trials n per pass was 250, 500, or 1000 and the number of passes was either four or six. We set the minimum number of passes to four in order to obtain variance estimates on covariance data for weighted-least squares model fitting.
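A minimal Python sketch of one simulated MCR experiment for a single stimulus class is given below. It assumes the LCJ rule, unit-variance external noise that is frozen across passes, and encoding and criterion noise that are resampled on every pass; the function name, argument names, and seed handling are illustrative assumptions, not the original simulation code.

```python
import numpy as np

def simulate_mcr(mu_s, sigma_e, mu_c, sigma_c, n_trials=500, n_passes=4, seed=0):
    """Return an (n_trials, n_passes) array of LCJ responses coded 1..M+1."""
    rng = np.random.default_rng(seed)
    mu_c = np.asarray(mu_c, dtype=float)
    sigma_c = np.asarray(sigma_c, dtype=float)
    n_crit = mu_c.size
    s_ext = rng.standard_normal(n_trials)              # same external sample on every pass
    responses = np.empty((n_trials, n_passes), dtype=int)
    for j in range(n_passes):
        # Internal representation: external offset + fresh encoding noise + class mean.
        x = s_ext + sigma_e * rng.standard_normal(n_trials) + mu_s
        # Fresh criterion samples for every trial of this pass.
        c = mu_c + sigma_c * rng.standard_normal((n_trials, n_crit))
        d = np.where(c - x[:, None] > 0, c - x[:, None], np.inf)
        nearest_above = np.argmin(d, axis=1) + 1       # least positive distance
        none_above = np.isinf(d).all(axis=1)           # representation exceeds all criteria
        responses[:, j] = np.where(none_above, n_crit + 1, nearest_above)
    return responses

noisy = simulate_mcr(mu_s=1.0, sigma_e=0.5, mu_c=[1 / 3, 2 / 3], sigma_c=[0.5, 0.5])
frozen = simulate_mcr(mu_s=1.0, sigma_e=0.0, mu_c=[1 / 3, 2 / 3], sigma_c=[0.0, 0.0])
```

With all internal noise set to zero, every pass reproduces the same responses because only the frozen external noise varies across trials; adding encoding or decision noise lowers the agreement between passes, which is what the covariance data measure.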

Data analysis

The data were arranged in this way: for each stimulus class h, we have M+1 subject response matrices R^{(m,h)} of size T × J, where J is the number of passes, T is the number of trials per pass, and m is an available response category. Each entry of R^{(m,h)} contains a 1 for a trial response to stimulus class h classified as category m and a 0 otherwise. Thus, we denote $r_{j}^{(m, h)}$ as the j-th T × 1 column vector of the matrix R^{(m,h)}, with the t-th entry $r_{t j}^{(m, h)}$ equal to 1 or 0, signifying whether or not subjects classified the stimulus from the t-th trial of the j-th pass with a classification of m. The matrix corresponding to the lowest confidence rating was dropped due to its redundancy given the other response rates and fixed trial numbers.

For every simulated experiment, we computed the relative frequency of the m-th classification rating during each pass j as

{\hat{p}}_{j} (r = m ∣ S_{h}) = \frac{1}{T} \sum_{t = 1}^{T} r_{t j}^{(m, h)}

(6)

The average of each response rating across all passes is the best and final estimate of the rating response rate. That is

\hat{p} (r = m ∣ S_{h}) = \frac{1}{J} \sum_{j = 1}^{J} {\hat{p}}_{j} (r = m ∣ S_{h})

(7)

Covariance was computed for every combination of passes for every rating category. For passes i and j, where i≠j, and category ratings m and m′, the covariance is given as,

Cov [r_{i}^{(m, h)}, r_{j}^{(m^{'}, h)}] = \frac{1}{T - 1} \sum_{t = 1}^{T} [r_{t i}^{(m, h)} - {\hat{p}}_{i} (r = m ∣ S_{h})] [r_{t j}^{(m', h)} - {\hat{p}}_{j} (r = m^{'} ∣ S_{h})]

(8)

We refer to the covariance as within category covariance when m= m′ and between category covariance when m ≠ m′. For an MCR experiment with J passes, we have $\sum_{j = 1}^{J - 1} j$ observations of within category covariance estimates for each response rating m, and $2 \sum_{j = 1}^{J - 1} j$ observations of between category covariance estimates for each response pairing of m and m′. We took the average of all pairwise estimates as our final covariance estimate between categories m and m′.
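Equations 6 to 8 translate directly into array operations. The Python sketch below assumes the responses for one stimulus class are stored as a T × J integer array of category labels (a hypothetical layout); the function name is illustrative. Within-category covariances use each unordered pair of passes once, and between-category covariances use both orderings, which reproduces the counts stated above.

```python
import numpy as np
from itertools import combinations, permutations

def mcr_statistics(responses, m, m_prime):
    """Per-pass rates (eq. 6), pooled rate (eq. 7), and pairwise covariances (eq. 8)."""
    responses = np.asarray(responses)
    n_trials, n_passes = responses.shape
    r_m = (responses == m).astype(float)            # indicator matrix for category m
    r_mp = (responses == m_prime).astype(float)     # indicator matrix for category m'
    p_j = r_m.mean(axis=0)                          # eq. 6, one rate per pass
    q_j = r_mp.mean(axis=0)
    p_hat = p_j.mean()                              # eq. 7
    if m == m_prime:
        pairs = combinations(range(n_passes), 2)    # J(J-1)/2 within-category pairs
    else:
        pairs = permutations(range(n_passes), 2)    # J(J-1) between-category pairs
    covs = np.array([
        np.sum((r_m[:, i] - p_j[i]) * (r_mp[:, j] - q_j[j])) / (n_trials - 1)
        for i, j in pairs
    ])
    return p_j, p_hat, covs

rng = np.random.default_rng(1)
demo = rng.integers(1, 4, size=(200, 4))            # placeholder responses in {1, 2, 3}
p_j, p_hat, within = mcr_statistics(demo, 2, 2)
_, _, between = mcr_statistics(demo, 2, 3)
final_within = within.mean()                        # average of all pairwise estimates
```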

Weighted least-squares model estimation requires estimates of the variance for each of the final response rates. The variability of the response rates for each pass was estimated by the variance of each response rate across all passes:

Var [p_{j} (r = m ∣ S_{h})] = \frac{1}{J - 1} \sum_{j = 1}^{J} {[{\hat{p}}_{j} (r = m ∣ S_{h}) - \hat{p} (r = m ∣ S_{h})]}^{2}

(9)

The final estimate of each response rate is the average of the response rates across passes, and the final estimate of variance for an averaged response rate across all passes is given by dividing the variance among individual passes by the total number of passes. That is,

Var [p (r = m ∣ S_{h})] = \frac{Var [p_{j} (r = m ∣ S_{h})]}{J}

(10)

Variances for covariance data were computed by first taking the variance of each within and between pass estimate and then dividing by the $\sum_{j = 1}^{J - 1} j$ or $2 \sum_{j = 1}^{J - 1} j$ possible pairing combinations, respectively.
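The variance estimates used as weights can be sketched in a few lines of Python. The function names are hypothetical, and applying the same J − 1 denominator to the pairwise covariance estimates is an assumption made here for consistency with equation 9.

```python
import numpy as np

def rate_variance(p_j):
    """Variance among per-pass rates (eq. 9) and variance of their average (eq. 10)."""
    p_j = np.asarray(p_j, dtype=float)
    var_among_passes = p_j.var(ddof=1)
    return var_among_passes, var_among_passes / p_j.size

def covariance_variance(pairwise_covs):
    """Variance of the averaged covariance: sample variance divided by the number of pairs."""
    pairwise_covs = np.asarray(pairwise_covs, dtype=float)
    return pairwise_covs.var(ddof=1) / pairwise_covs.size

var_pass, var_mean = rate_variance([0.2, 0.4])
var_cov = covariance_variance([0.01, 0.03, 0.02])
```

These estimates require at least two passes for the rates and at least two pairwise estimates for the covariances, which is one practical reason for running more than the minimum number of passes.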

Modeling

We fit the LCJ, LCJ_c, and LCJ_sym to simulated data derived from each parameter configuration and LCJ decision rule, and to simulated data derived from one parameter configuration using the LCJ_c and LCJ_sym decision rules. Model fits used a Matlab simplex optimization routine (Nelder-Mead) and a weighted least-squares cost function. The cost function heavily penalized a possible solution if any variance parameters fell below zero or if the criterion means violated their ordinal relation. At the beginning of each parameter search routine, we generated initial starting parameters by independently perturbing the true means of each parameter using a Gaussian random number generator with a standard deviation of 0.15σ_ext. Apart from penalties just stated, the constraints imposed on parameters of the simulated observer were not imposed upon the model during parameter recovery: candidate fits of criteria and signal distribution means were not restricted to specific positions along the decision axis nor were they restricted to maintain certain relative distances; nor were any decision and encoding noise variances constrained to sum to unity. We ran 250 experiments at each experimental condition and at each parameter configuration.
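The structure of the cost function can be illustrated with a short Python sketch. It is not the MATLAB routine used in the study: the function names, the size of the penalty, and the way the penalty is added to the cost are assumptions. A cost of this form could be handed to any simplex optimizer.

```python
import numpy as np

def wls_cost(observed, predicted, variances):
    """Weighted least-squares cost: squared residuals scaled by their variance estimates."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    variances = np.asarray(variances, dtype=float)
    return float(np.sum((observed - predicted) ** 2 / variances))

def constraint_penalty(criterion_means, noise_variances, penalty=1e6):
    """Large additive penalty (placeholder size) for negative variances or disordered criteria."""
    negative_variance = np.any(np.asarray(noise_variances, dtype=float) < 0)
    disordered = np.any(np.diff(np.asarray(criterion_means, dtype=float)) < 0)
    return penalty if (negative_variance or disordered) else 0.0

def total_cost(observed, predicted, variances, criterion_means, noise_variances):
    """Cost for a candidate parameter set whose model predictions are `predicted`."""
    return wls_cost(observed, predicted, variances) + constraint_penalty(
        criterion_means, noise_variances
    )

cost_ok = total_cost([1.0, 2.0], [1.0, 1.0], [1.0, 4.0], [0.3, 0.6], [0.5, 0.5])
cost_bad = total_cost([1.0, 2.0], [1.0, 1.0], [1.0, 4.0], [0.6, 0.3], [0.5, 0.5])
```

Handling the constraints through a penalty keeps the search unconstrained, which suits a derivative-free simplex method, at the price of a discontinuity in the cost surface at the constraint boundary.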

Results

We computed the median and 95% confidence interval for each model parameter using the 250 simulated runs at each parameter configuration and pass-trial combination. In every case, the actual parameter values of the simulated observer fell within the 95% confidence intervals of the estimated values for each position and variance parameter. The median parameter values recovered from the matched model were very close to the parameter values used to generate the simulated data. These results stand in contrast to the attempted parameter recovery for decision protocols of the models mismatched against decision rule of the observer. In the case of LCJ_c fitted to the data simulated with LCJ, at least one generative parameter failed to fall within the 95% confidence interval when simulations were run with four passes at 500 trials/pass or with six passes at 250 trials/pass. When we fitted LCJ_sym to the data simulated with LCJ, at least one generative parameter failed to fall within the 95% confidence intervals when simulations were run with four passes at 500 trials/pass.

We also examined the precision and accuracy of our model fits as a function of trials per pass and passes per experiment. We calculated the standard error (SE) of individual recovered parameters by computing the standard deviation of each fitted parameter across all experiments within a given noise configuration, trials/pass, and passes/experiment setting. Similarly, we estimated an individual parameter mean-squared error (MSE) by squaring the difference between the fitted parameter in each experiment and the true parameter value adopted by the simulated observer, and then averaging across all experiments within the given configuration, trials/pass, and passes/experiment setting. Mean SEs (averaged across all model parameters), as well as the SE of the most variable parameter, strictly decrease with increasing trials per pass and passes per experiment at each experimental configuration (Figure 8). Mean MSEs (again, averaged across all model parameters) also exhibit a pattern of increasing accuracy (decreasing MSE) with greater numbers of trials and passes for the correctly matched decision rule (Figure 9). The MSEs of the most poorly fitted parameters (i.e., those parameters with the highest MSE) also decrease with increasing trials and increasing passes (a single exception occurs in the DN-asc EN-des configuration at 500 trials/pass comparing four vs six passes per experiment).
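The two summary measures can be computed from a table of recovered parameters as in the Python sketch below. The array layout (one row per simulated experiment, one column per parameter), the function name, and the use of the sample standard deviation are assumptions; the numbers are placeholders, not results from the study.

```python
import numpy as np

def recovery_summary(fitted, true_params):
    """Per-parameter SE and MSE, plus their means and maxima across parameters."""
    fitted = np.asarray(fitted, dtype=float)            # shape: (n_experiments, n_params)
    true_params = np.asarray(true_params, dtype=float)  # shape: (n_params,)
    se = fitted.std(axis=0, ddof=1)                     # spread of each fitted parameter
    mse = ((fitted - true_params) ** 2).mean(axis=0)    # squared error against the truth
    return {"se": se, "mse": mse,
            "mean_se": se.mean(), "max_se": se.max(),
            "mean_mse": mse.mean(), "max_mse": mse.max()}

# Placeholder fits: three experiments, two parameters.
summary = recovery_summary([[0.9, 2.0], [1.1, 2.0], [1.0, 2.3]], [1.0, 2.0])
```

SE reflects only the scatter of the fits, whereas MSE also grows when the fits are biased, so the two can diverge when a mismatched model converges tightly on the wrong values.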

Figure 8

Standard error (SE) of parameter fits to data from simulated experiments for different pass-trial and parameter configurations. Decision noise: DN; Encoding noise: EN. Average SE across all parameters given as circles connected by solid lines. Maximum SE among parameters given as blue diamonds (4 passes/experiment) and red asterisks (6 passes/experiment). All parameter configurations show less variability in parameter fits with increasing trials and passes.

Figure 9

Average mean squared error (MSE) of parameter fits to simulated data for various pass-trial and parameter configurations. Average MSE across all parameters given as circles connected by solid lines for 4 pass and 6 pass experiments. Maximum MSE among parameters given as blue diamonds and red asterisks. (Maximum for DN-asc EN-equ at 250 trials, 4 passes is 0.465; not shown in order to preserve scale).

We also examined fits at six passes/experiment for mismatched relative to matched models (Figure 10). For both fits of LCJ_c and LCJ_sym to an observer using LCJ, the averages of the MSE for mismatched protocols do not generally monotonically decrease with trials/pass or passes/experiment. Furthermore, at six passes/experiment, fits for both mismatched models show a higher average MSE across all trials/experiment relative to MSE for the correctly matched model for all configurations except DN-0 EN-asc. The models perform equally well for simulations assuming zero decision noise because the models make identical predictions for negligible decision noise. For one parameter configuration, we used both LCJ_c and LCJ_sym as our simulation decision rule (Figure 10, bottom). Here too, accuracy improved for matched but not mismatched models with increasing trials.

Figure 10

Top and middle rows: average log mean-squared error (MSE) for model fits vs trials/pass (assuming six passes/experiment) for the LCJ, LCJ_c, and LCJ_sym matched to data simulated using the LCJ decision rule. Bottom: average (MSE) for model fits to simulations when decision noise and encoding noise are equal across criteria and stimulus classes. Bottom left: LCJ, LCJ_c, and LCJ_sym modeled to data simulated using the LCJ_c decision rule. Bottom right: LCJ, LCJ_c, and LCJ_sym modeled to data simulated using the LCJ_sym decision rule.

An important concern is whether differences in parameter recovery between matched and mismatched models correspond to goodness-of-fit when actual underlying parameters are unknown. A weighted least squares estimate (χ²) finds parameters that minimize the difference between simulated data and expected values of data based on recovered parameters. We computed χ² for each fit of matched and mismatched models to each simulated data set. We averaged across simulations from a given configuration and trials/pass setting using six passes/experiment from mismatched and correctly matched models. In this case, the average χ² fits for the correctly matched model remain nearly constant with increasing trials/experiment (Figure 11). On the other hand, average χ² for mismatched models increases with increasing trials/experiment for all configurations except DN-0 EN-asc. In contrast to the other configurations, average χ² fits for DN-0 EN-asc are notably consistent across both matched and mismatched fits. For simulated observers with zero decision noise, fits show an increasing accuracy while the log of the mean chi-square fits lies within a narrow range across all trials/experiment for all model protocols. We also investigated the frequency with which the model fits for the correctly matched model resulted in lower weighted least squares costs than fits for mismatched models. For every configuration except DN-0 EN-asc, χ² fits were lower for correctly matched models than mismatched models for at least 91% of the individual simulations with four passes and 250 trials/pass. This lower bound on success rate increased to 97% for individual simulations with six passes and 1000 trials/pass.
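The success rate described here is a simple proportion, sketched below in Python. Counting a simulation as a success only when the matched model beats every mismatched model is one possible reading of the comparison and is an assumption, as are the function name and the placeholder values.

```python
import numpy as np

def matched_success_rate(chi2_matched, chi2_mismatched):
    """Fraction of simulations in which the matched fit has the lowest chi-square.

    chi2_matched: shape (n_simulations,)
    chi2_mismatched: shape (n_simulations, n_mismatched_models)
    """
    chi2_matched = np.asarray(chi2_matched, dtype=float)
    chi2_mismatched = np.asarray(chi2_mismatched, dtype=float)
    wins = np.all(chi2_matched[:, None] < chi2_mismatched, axis=1)
    return float(wins.mean())

# Placeholder costs for four simulations and two mismatched models.
rate = matched_success_rate([1.0, 2.0, 3.0, 4.0],
                            [[2.0, 3.0], [1.0, 5.0], [4.0, 4.0], [5.0, 6.0]])
```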

Figure 11

Top and middle rows: average log χ² for model fits vs trials/pass (assuming six passes/experiment) for the LCJ, LCJ_c, and LCJ_sym matched to data simulated using the LCJ decision rule. Bottom: log χ² for model fits to simulations when decision noise and encoding noise are equal across criteria and stimulus classes. Bottom left: LCJ, LCJ_c, and LCJ_sym modeled to data simulated using the LCJ_c decision rule. Bottom right: LCJ, LCJ_c, and LCJ_sym modeled to data simulated using the LCJ_sym decision rule.

We also examined MSE and χ² for model fits to data generated using the LCJ_c and LCJ_sym decision rules for a single parameter configuration, DN-equ EN-equ (Figure 11, bottom). Similar to results when using the LCJ as a generative model, MSE decreased with additional trials for correctly matched rules but did not generally show similar decreases with mismatched rules. Again, the χ² results for models matched to the generative model remained low with increasing trials, while the χ² increased with increasing trials for mismatched models. When using LCJ_c as the generative decision rule, χ² fits for correctly matched models were lower than for mismatched models in at least 90% of the individual simulations with four passes and 250 trials/pass. This lower bound on success rate increased to 99% of individual simulations with six passes and 1000 trials/pass. However, when using LCJ_sym as the generative decision rule, the success rate of correctly matched models relative to mismatched models was substantially lower: 60% of individual simulations with four passes and 250 trials/pass, increasing to 80% with six passes and 1000 trials/pass.

Discussion

Previous attempts to estimate decision noise in simple response signal detection type tasks with two stimulus classes have required strong simplifying assumptions about the various noise components. Here we demonstrate that an MCR procedure provides a sufficiently rich data set to effectively recover decision noise parameters in many representative parameter configurations without assuming specific relationships between noise components. Importantly, this framework uses a model that permits overlapping criterion distributions and a decision rule that deals with this possible overlap.

The results show that both the precision (1/SE) and the accuracy (1/MSE) of the parameters increase with the number of trials/pass and passes/experiment. Furthermore, model fitting is not only possible, but also feasible with a number of total trials amenable to typical experiments in psychophysical studies. For all parameter configurations, it appears that parameter recovery does no worse and often improves with total number of trials up to 2000 total trials. However, within the range of 3000 to 4000 total trials, allocating fewer trials over more passes results in better average accuracy than a greater number of total trials distributed over fewer passes for some parameter configurations (cf, DN-asc EN-equ, and DN-0 EN-asc). Still, though the optimal allocation strategy may depend on the underlying parameter configuration, the accuracy generally appears to improve with total number of trials.

For the configuration assuming zero decision noise, our simulations showed that all three decision models gave accurate and precise fits to the data of simulated experiments. This result should come as no surprise because each of the protocols prescribes identical trial-by-trial responses to a trial-sampled representation when criteria remain static over the course of the experiment. However, the results for accuracy look quite different for mismatched model and simulation protocols for all configurations imposing non-trivial decision noise. In every configuration with decision noise the accuracy and χ² estimates are much worse relative to correctly matched model fits. In these cases, the accuracy generally fails to improve in any significant way with increasing trials/pass or passes/experiment and the χ² estimates become notably worse. The failure of these models to fit simulated data from mismatched protocols shows that the χ² estimates of recovered parameters for correctly matched pairings do not result from under-constrained models. It appears that some combinations of response frequencies and covariance data are simply not compatible with data sets generated by certain decision protocols. Therefore fitting a decision rule model to data derived from an MCR experiment could recover erroneous estimates of the underlying parameters when the model rule fails to match the decision strategy of the observer. At least in some cases, however, mismatched models can be ruled out by comparison to fits of models more closely aligned with decision rules used by the observer. Some positive evidence exists suggesting that the experimenter may manipulate the observer’s decision strategy by instruction and task structure (Treisman & Faulkner, 1985). However, a more parsimonious approach would attempt to disambiguate potential protocols through model selection techniques.

In a related study, we investigated the possibility of trade-offs between decision and encoding variance parameters. That is, for a given data set of response frequencies and covariance estimates, are variances associated with decision and encoding processes fungible? Using the LCJ decision rule, we generated expected values of response frequencies and covariance data using the same underlying parameter sets from our simulation study (Table 2) for three response categories. We then independently perturbed these generative parameters using a Gaussian random number generator with a standard deviation of 0.15σ_ext. We used these perturbed parameters as initial guesses in model fitting routines to assess how changes in model parameters led to departures from the expected values obtained from our generative parameters. We penalized violations of criterion ordering along the decision axis, but we did not impose on the fits the constraint that applied to our simulated observer: decision and encoding noise variances were not constrained to sum to unity. We obtained fits for 500 iterations at each parameter configuration. The norm of the difference between expected values resulting from the fitting routine and those given by the true generative parameters was always greater than zero when the search failed to converge on the true parameters. That is, we did not find any alternative model solutions that resulted in zero cost.
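The perturbation and penalty steps of this procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the fitting code used in the study: `expected_frequencies` is a hypothetical toy forward model that returns only three rating-category probabilities and omits the covariance terms, so it cannot by itself separate the two variances. The parameter layout, the penalty weight, the value of σ_ext, and the example parameter values are likewise assumptions.

```python
import math
import random

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_frequencies(params):
    """Toy stand-in forward model: probabilities of three rating categories.

    params = (mu, c1, c2, var_enc, var_dec); a hypothetical layout.
    """
    mu, c1, c2, var_enc, var_dec = params
    sd = math.sqrt(var_enc + var_dec)
    p_low = phi((c1 - mu) / sd)
    p_high = 1.0 - phi((c2 - mu) / sd)
    return (p_low, 1.0 - p_low - p_high, p_high)

def perturb(params, sd, rng):
    """Independently jitter each generative parameter with Gaussian noise."""
    return tuple(p + rng.gauss(0.0, sd) for p in params)

def cost(candidate, target, penalty=1e3):
    """Norm of the difference in expected values plus an ordering penalty."""
    mu, c1, c2, var_enc, var_dec = candidate
    if var_enc + var_dec <= 0.0:
        return math.inf  # total variance must be positive
    model = expected_frequencies(candidate)
    norm = math.sqrt(sum((m - t) ** 2 for m, t in zip(model, target)))
    # Penalize criteria that cross (c1 above c2) on the decision axis.
    return norm + penalty * max(0.0, c1 - c2)

sigma_ext = 1.0                            # assumed external noise SD
true_params = (1.0, 0.25, 0.75, 0.5, 0.5)  # hypothetical generative values
target = expected_frequencies(true_params)

rng = random.Random(0)
initial_guess = perturb(true_params, 0.15 * sigma_ext, rng)
start_cost = cost(initial_guess, target)
```

A fitting routine would then minimize `cost` starting from `initial_guess`. Note that the two variances are free to vary independently here, mirroring the unconstrained fits described above.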

Finally, we compared the expected values of the LCJ for each of our representative parameter settings with those obtained when random numbers were given as parameter inputs to the model. The sum of squared differences between model outputs for the representative parameter sets and model outputs for randomly selected parameters generally increased with the Euclidean distance between parameter sets, although the relationship was not strictly monotonic.
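A comparison of this kind can be sketched as below. The forward model `toy_outputs` is a hypothetical yes/no stand-in (hit rate and false-alarm rate from a single criterion) rather than the LCJ model, and the sampling ranges for the random parameters are assumptions chosen only for illustration.

```python
import math
import random

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def toy_outputs(params):
    """Hypothetical stand-in forward model: (hit rate, false-alarm rate)."""
    d, c, sd = params
    return (phi((d - c) / sd), phi(-c / sd))

def sse(a, b):
    """Sum of squared differences between two output vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def distance_vs_error(reference, n_draws, rng):
    """Pair parameter-space distance with output SSE for random draws."""
    ref_out = toy_outputs(reference)
    pairs = []
    for _ in range(n_draws):
        # Assumed sampling ranges for (d, c, sd).
        draw = (rng.uniform(0.0, 3.0),
                rng.uniform(-1.0, 2.0),
                rng.uniform(0.5, 2.0))
        pairs.append((math.dist(reference, draw),
                      sse(toy_outputs(draw), ref_out)))
    return sorted(pairs)  # ordered by increasing distance

reference = (1.0, 0.5, 1.0)  # hypothetical representative parameter set
pairs = distance_vs_error(reference, 200, random.Random(1))
```

Plotting the second element of each pair against the first gives the kind of distance-versus-error relationship described above; the sketch makes no claim about the shape that relationship takes for the LCJ model.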

We have demonstrated the feasibility of recovering estimates for decision noise as well as encoding noise within an expanded signal detection framework for representative parameter configurations. These configurations imposed identical positioning of the criteria and signal distribution means, and caps on the total noise at the decision stage. While we do not believe that this circumstance poses any fundamental constraints on the application of our framework, more complex configurations might lead to more variable parameter estimation. For example, a higher overall total internal noise relative to external noise would necessitate a greater number of total trials in order to achieve comparable levels of accuracy and precision in parameter estimates. Nevertheless, the total internal noise levels assumed by our simulated observer lay well within the range often reported in multi-pass experiments (Burgess & Colborne, 1988; Green, 1964; Lu & Dosher, 2008). While simulation studies cannot guarantee that the parameters of the decision noise models considered here uniquely map to confidence rating and covariance estimates, we believe the demonstrations given here provide strong evidence for the efficacy of the procedure in resolving and identifying factors underlying response variability.