A Diffusion Model Account of the Lexical Decision Task
Abstract
The diffusion model for 2-choice decisions (R. Ratcliff, 1978) was applied to data from lexical decision experiments in which word frequency, proportion of high- versus low-frequency words, and type of nonword were manipulated. The model gave a good account of all of the dependent variables—accuracy, correct and error response times, and their distributions—and provided a description of how the component processes involved in the lexical decision task were affected by experimental variables. All of the variables investigated affected the rate at which information was accumulated from the stimuli—called drift rate in the model. The different drift rates observed for the various classes of stimuli can all be explained by a 2-dimensional signal-detection representation of stimulus information. The authors discuss how this representation and the diffusion model’s decision process might be integrated with current models of lexical access.
The lexical decision task is one of the most widely used paradigms in psychology. The goal of the research described in this article was to account for lexical decision performance with the diffusion model (Ratcliff, 1978), a model that allows components of cognitive processing to be examined in two-choice decision tasks. Nine lexical decision experiments, manipulating a number of factors known to affect lexical decision performance, are presented. The diffusion model gives good fits to the data from all of the experiments, including mean response times for correct and error responses, the relative speeds of correct and error responses, the distributions of response times, and accuracy rates.
In the diffusion model, the mechanism underlying two-choice decisions is the accumulation of noisy information from a stimulus over time. Information accumulates toward one or the other of two decision criteria until one of the criteria is reached; then the response associated with that criterion is initiated. In the lexical decision task, one of the criteria is associated with a word response, the other with a nonword response. The rate with which information is accumulated is called drift rate, and it depends on the quality of information from the stimulus. In lexical decision, some stimuli are more wordlike than others, and so their rate of accumulation of information toward the word criterion is faster; other stimuli, such as random letter strings, are so un-wordlike that information accumulates quickly toward the nonword criterion. For the nine experiments presented below, the drift rates can be summarized quite simply. First, the ordering of the drift rates from largest to smallest is as follows: high-frequency words, low-frequency words, very low-frequency words, pseudowords, and random letter strings. Second, the differences among the drift rates are larger when the nonwords in an experiment are pseudowords than when they are random letter strings.
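The accumulation process described above can be illustrated with a small simulation. The following is a minimal sketch, not the fitting procedure used in this article: it approximates the continuous diffusion process by a discrete random walk with a small time step, and every name and parameter value in it (`simulate_trial`, `word_response_rate`, the boundary separation, starting point, noise scale, and drift rates) is an assumption chosen for illustration rather than an estimate from the experiments. The nondecision component of response time and across-trial variability in parameters are omitted.

```python
import random


def simulate_trial(drift, boundary=0.1, start=0.05, noise_sd=0.1,
                   dt=0.001, max_time=5.0, rng=random):
    """Simulate one two-choice decision as a discretized diffusion process.

    Evidence starts at `start` and accumulates until it reaches `boundary`
    (the "word" criterion) or 0 (the "nonword" criterion). All default
    values are illustrative assumptions, not fitted parameters.
    Returns (response, decision_time_in_seconds).
    """
    evidence = start
    elapsed = 0.0
    # Within-trial noise grows with the square root of the time step.
    step_sd = noise_sd * dt ** 0.5
    while 0.0 < evidence < boundary and elapsed < max_time:
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        elapsed += dt
    if evidence >= boundary:
        return "word", elapsed
    if evidence <= 0.0:
        return "nonword", elapsed
    return None, elapsed  # no criterion reached within max_time


def word_response_rate(drift, n_trials=500, seed=0):
    """Proportion of simulated trials that end at the "word" criterion."""
    rng = random.Random(seed)
    responses = [simulate_trial(drift, rng=rng)[0] for _ in range(n_trials)]
    return responses.count("word") / n_trials


if __name__ == "__main__":
    # Hypothetical drift rates that respect only the ordering described in
    # the text: positive values drift toward "word", negative toward "nonword".
    hypothetical_drifts = {
        "high-frequency word": 0.3,
        "low-frequency word": 0.1,
        "pseudoword": -0.1,
        "random letter string": -0.3,
    }
    for label, drift in hypothetical_drifts.items():
        print(label, word_response_rate(drift))
```

In this sketch, a larger positive drift rate should yield a higher proportion of word responses and shorter decision times, and a larger negative drift rate should do the same for nonword responses. This is the sense in which a single parameter can carry the differences among stimulus classes.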
Figure 1 outlines our framework: the relationships among lexical decision data, the diffusion model, and word recognition (lexical) models. It shows that the data do not map directly onto lexical processes but, instead, map onto them only through the mediation of the diffusion model. Data enter the diffusion model, which produces the values of drift rates for the different classes of stimuli that give the best account of the data. In this framework, the role of a word recognition model is to produce, for each stimulus, a value indicating how wordlike it is. We call this measure of how wordlike a stimulus is its wordness value (a term intended to be neutral for the purposes of this article). Wordness values map onto the drift rates that drive the diffusion decision process to produce predictions about accuracy and response time.

Figure 1. The relationships among the data, the diffusion model fits, drift rates, and models of word identification. The diffusion model fits the data and provides values of drift rate that represent how wordlike the stimulus is. The word identification models need to produce values of drift rate to provide a complete description of the data. A complete model would represent lexical processing, which would produce drift rates to feed into the diffusion model to produce predicted values of the dependent measures. RT distribs. = response time distributions.
In our framework, wordness values place fewer constraints on word recognition models for the lexical decision task than has been appreciated. All that is required is that a model produce the appropriate ordering of wordness values: from high-frequency words to low- and very low-frequency words to pseudowords and random letter strings, with larger differences among them when the nonwords in an experiment are pseudowords than when they are random letter strings. In other words, the disturbing and simple conclusion from the diffusion model’s account of lexical decision is that, beyond what can be said from a bare ordering of wordness values, the lexical decision task may have nothing to say about lexical representations or about lexical processes such as lexical access. Lexical decision data do not provide the window into the lexicon that might have been supposed in earlier research.
The framework shown in Figure 1 is counter to much previous work that has assumed lexical decision data do map directly onto lexical processes. Often, lexical decision response time (RT) has been interpreted as a direct measure of the speed with which a word can be accessed in the lexicon. For example, some researchers have argued that the well-known effect of word frequency—shorter RTs for higher frequency words—demonstrates the greater accessibility of high-frequency words (e.g., their order in a serial search, Forster, 1976; the resting levels of activation in units representing the words in a parallel processing system, Morton, 1969). However, other researchers have argued, as we do here, against a direct mapping from RT to accessibility. For example, Balota and Chumbley (1984) suggested that the effect of word frequency might be a by-product of the nature of the task itself and not a manifestation of accessibility. In the research presented here, the diffusion model makes explicit how such a by-product might come about.
The sections below begin with a detailed description of the diffusion model; then nine experiments are presented, and the model is fit to the data from each one. The main result is that the differences in performance for various classes of stimuli are all captured by drift rate, not by any of the other components of processing that make up the diffusion model.
Contributor Information
Roger Ratcliff, The Ohio State University.
Pablo Gomez, DePaul University.
Gail McKoon, The Ohio State University.


