SUMMARY
Experimental studies concerning fraud (or “red flag”) checklists often are interpreted as providing evidence that checklists are dysfunctional because their use yields results inferior to unaided judgments (Hogan et al. 2008). However, some of the criticisms leveled against checklists are directed at generic checklists applied by individual auditors who combine the cues using their own judgment. Based on a review and synthesis of the literature on the use of checklists in auditing and other fields, we offer a framework for effective use of checklists that incorporates the nature of the audit task, checklist design, checklist application, and contextual factors. Our analysis of checklist research in auditing suggests that improvements to checklist design and to checklist application methods can make checklists more effective. In particular, with regard to fraud risk assessments, customizing checklists to fit both client circumstances and the characteristics of the fraud risk assessment task, along with auditor reliance on formal cue-combination models rather than on judgmental cue combinations, could make fraud checklists more effective than extant research implies.
INTRODUCTION
Checklists are widely used in auditing to ensure compliance with various requirements, such as completing audit procedures in the appropriate sequence and without omission, collecting or compiling all relevant materials required to complete an audit file, and supporting judgment tasks by listing the key questions to be asked and answered in order to reach appropriate judgments or conclusions. However, there is a widely held view that the use of diagnostic checklists for fraud risk assessment in financial audits yields dysfunctional outcomes (e.g., Hogan, Rezaee, Riley, and Velury 2008; Jamal 2008), and this view is spreading to other uses of checklists (e.g., Wheeler and Arunachalam 2008; Seow 2011). We agree that, when checklists are incomplete, are too long, contain inappropriate content, or are applied inappropriately, they can result in “checklist failure.” However, some interpretations of audit-checklist failures take a relatively narrow perspective and blame the failure simply on the use of a checklist rather than on weaknesses in the checklist's design or application. Abandoning the use of checklists (e.g., red flag checklists) in favor of unaided judgment on the basis of such a limited perspective may not be the best option for large firms with multi-location audits that require a degree of standardization and coordination in the performance of audit tasks. Our review of literature on the use of checklists in auditing and other fields suggests that improvements in checklist design and application can make checklists more effective.
This study is motivated by the contrast between the negative views of checklists expressed in the extant audit literature (in particular, in research on the use of “red flag” checklists to support fraud risk assessment) and a considerable body of evidence on the successful use of checklists in many other fields, including psychology (Fischhoff, Slovic, and Lichtenstein 1978), aviation (FAA 2007; Turner 2001), nuclear power generation (Carvalho, dos Santos, and Vidal 2005; Carvalho, Vidal, and de Carvalho 2007), and medicine (Pronovost et al. 2006; Haynes et al. 2009; Ely, Graber, and Croskerry 2011). Gawande's (2009) book, The Checklist Manifesto, summarizes the benefits achieved by using checklists in the medical field. Kahneman, Lovallo, and Sibony (2011) support Gawande's (2009) endorsement of checklists, suggesting that checklists can be helpful in many areas, most notably in corporate decision making. In fact, Kahneman et al. (2011, 54) provide a 12-question checklist designed to protect corporate decision makers against various “defects in thinking,” although Kahneman (2011, 249) recognizes that some practitioners voice concerns about the “impersonality of procedures that are guided by statistics and checklists.” We believe that criticisms of checklists in auditing should be reassessed in light of these observations.
In this article, we review and synthesize the literature on checklists in auditing and other fields and consider a comprehensive array of factors that affect the use and effectiveness of checklists in auditing, with a focus on diagnostic judgment-oriented checklists. Raiffa (1968) defines a judgmental checklist as an approach to aiding decisions that decomposes a judgment into components that decision makers can assess, and subsequently combine. We provide a framework for thinking about checklists that recognizes that checklist-based outcomes are a joint product of the task and task environment, checklist design, checklist application, and contextual factors. We conclude that appropriately designed and adequately customized checklists, applied strategically using appropriate cue-combination methods, can be effective.
CHECKLIST USE IN AUDITING AND OTHER FIELDS
In auditing, checklists have long been utilized to assist auditors in completing a variety of tasks including client acceptance and retention decisions (Bell, Bédard, Johnstone, and Smith 2002), inherent risk assessments (Boritz, Albuquerque, and Kielstra 1991; Bédard and Graham 2002), internal control evaluations (Ashton 1974; Trotman, Yetton, and Zimmer 1983; Boritz 1985), substantive test planning (Blocher, Esposito, and Willingham 1983; Bonner, Libby, and Nelson 1996), and fraud risk assessments (Pincus 1989). In fact, checklists are a part of GAAS, as the AICPA (2002, 193–197) provides lists of red flags related to misstatements arising from fraudulent financial reporting, as well as misappropriation of assets. Ironically, just as the virtues of checklists are being extolled in other fields (e.g., Gawande 2009; Kahneman 2011), the use of checklists in auditing is being challenged, especially for tasks such as fraud risk assessment. Results of experimental studies examining the use of fraud “red flag” checklists (Pincus 1989; Asare and Wright 2004; Seow 2009) are interpreted as providing evidence that checklists are “harmful rather than helpful in detecting fraud” (Jamal 2008, 103), and that they “restrict the auditor's generation of ideas” for fraud detection (Hogan et al. 2008, 239–240). Also, recent papers question the effectiveness of checklists for audit and tax professionals, such as Seow (2011) for an internal control evaluation task, and Wheeler and Arunachalam (2008) for a tax research task. These authors argue that checklists increase confirmation bias and focus the user's attention solely on cues included in the checklist. Bamber, Carpenter, and Hammersley (2007) suggest that the importance of red flag checklists is diminished with the introduction of SAS No. 99's (AICPA 2002) fraud brainstorming requirement. The authors argue that a SAS No. 99 brainstorming session “cannot be readily reduced to a checklist or form” (Bamber et al. 2007, 4–5).
We reviewed all of the audit research that we could identify that involved the use of checklists, as well as relevant research in medicine and other fields. Overall, our literature review suggests that the question of checklists' effectiveness for various tasks is far from settled. While some studies find mixed or weak evidence of checklist usefulness (Blocher et al. 1983; Bonner et al. 1996), or report dysfunctional consequences of their application (Pincus 1989; Asare and Wright 2004; Bamber et al. 2007; Wood 2012), others conclude that checklists are useful either for performing an audit task (Butler 1985; Alon and Dwyer 2010) or for learning (Seow 2009; J. Rose, McKay, Norman, and A. Rose 2012). Ashton (1974), R. Libby and P. Libby (1989), Bonner et al. (1996), Eining, Jones, and Loebbecke (1997), Bell and Carcello (2000), A. Rose and J. Rose (2003), and Marley (2011) all document improved effectiveness related to using checklist-based aids. Effectiveness gains appear to be stronger for checklist-based audit decision aids that supplement a checklist's knowledge retrieval function with a model for knowledge aggregation. Nelson and Tan (2005, 54) suggest that decision aids “can be used to enhance independence and focus the expertise of large firms in the key areas for which they are most vulnerable to audit failure.” However, potential improvements may be attenuated by issues associated with auditor nonreliance on checklists (Boatsman, Moeckel, and Pei 1997). That is, even when the reliability of a checklist is supported by evidence, auditors may nevertheless overrule it because of a preference for relying on their own judgments or a wish to avoid particular consequences, such as exceeding the budget.
We now examine checklist use in other fields to determine whether auditing practice can benefit from insights provided by research and practice in those fields. In aviation—one of the first industries to adopt checklists—checklists are intended to support operational procedures by helping pilots detect and remediate the omission of any critical procedural step before it becomes consequential. Compliance with standards is achieved via “error trapping” (FAA 2007; Turner 2001), whereby a procedure continues only if all critical errors are intercepted and addressed. Psychology research identifies checklist applications in many areas, including nuclear power generation, in which diagnostic judgmental checklists called “fault trees” can assist personnel in determining the causes of system breakdowns (Fischhoff et al. 1978).1 In addition, Fischhoff et al. (1978) explain that fault trees can help people who deal with complex fallible systems to better describe and comprehend these systems. Carvalho et al. (2005) and Carvalho et al. (2007) report that, at nuclear power plants, personnel use both procedural and judgmental checklists. In plant procedures, the actions of operators are described by flowcharts to illustrate the sequence of the system's actions, as well as checklists in which operators document the specific manual actions they executed. According to Carvalho et al. (2005) and Carvalho et al. (2007), checklists help operators adapt to a high degree of plant automation. Specifically, when they work with checklists, operators become more aware of plant conditions, because checklists prompt them to look for information that is not readily available without a stimulus.
In healthcare, checklists initially were adopted to ensure that doctors and nurses in hospitals completed critical procedures. For example, checklists were implemented in intensive care units (ICUs) to help prevent post-surgery complications (Pronovost et al. 2006). Further, the introduction of a 19-item operating room checklist resulted in an impressive 50 percent reduction in surgical deaths (Haynes et al. 2009). For more information on the use of such checklists in a healthcare setting, see Gawande (2007) or, for a more comprehensive discussion, Gawande (2009). Recently, academics and practitioners have raised questions about whether the successful implementation of procedural checklists can be expanded to judgmental checklists that could be used for medical diagnoses (Ely et al. 2011; Winters et al. 2009; Winters, Aswani, and Pronovost 2011). Ely et al. (2011) suggest that a combination of checklists possessing different strengths could help to minimize a number of individual cognitive biases (such as anchoring, availability, base rate neglect, premature closure, and others; for details refer to Ely et al. [2011]), as well as to avoid “groupthink” when a team of doctors makes a diagnosis.
Based on our review and synthesis of the literature on checklists in auditing and other fields, we developed Table 1, which classifies checklists along two dimensions: (1) generic versus customized, and (2) procedural versus judgmental. A generic checklist is a standardized list of common items to be considered, without being tailored to specific circumstances, whereas a customized checklist is tailored to fit a specific situation (Cowperthwaite 2012). A procedural checklist provides a basic outline of the work to be performed, including all or most of the important steps to be carried out, along with their sequence, which the user can modify as required to accommodate the details of a particular application. In addition to ensuring standardization of procedures across, for example, audits, geographical locations, and audit team members, and the completeness and appropriate sequencing of procedures, checklists provide documentation of the procedures actually performed, which can facilitate review (Anderson 1977, 413). Recent findings in PCAOB inspection and peer review reports (Gramling and Watson 2009; PCAOB 2013) suggest that audit documentation remains a major area of concern for regulators, particularly for audits of accounting estimates (e.g., fair values) and for fraud risk assessment. The use of checklists, therefore, can provide the additional benefit of fulfilling documentation requirements imposed by regulatory and standard-setting bodies, such as the PCAOB.
Table 1 serves as a reminder that, when assessing the strengths and weaknesses of checklists, it is important to distinguish among the various types of checklists represented in the four cells in Table 1. As stated earlier, the focus of this paper is on judgment-oriented checklists such as those used in client acceptance and continuance, internal control evaluation, and fraud risk assessment, summarized in the cells in the bottom half, and especially the lower right hand corner of Table 1.
Our review of the literature identified factors that influence the effectiveness of judgment checklists that we have summarized in Figure 1 under four main headings: nature of the task, checklist design, checklist application, and contextual factors. Below, we discuss these in turn, using fraud risk assessment as an illustrative example of a diagnostic audit task.
NATURE OF THE TASK
The nature of the task determines the need for—and should influence the design of—decision aids, such as checklists, to enhance task performance. Although we discuss a variety of audit checklists throughout this paper, in this section we focus on fraud checklists because of the prominence of their coverage in the literature and the attention paid by regulators and practitioners to the auditor's assessment of and response to fraud risk (McKee 2010; PCAOB 2012). Fraud risk generally is classified into two main types: financial statement fraud and misappropriation of assets.2 Most of the fraud-related auditing literature addresses financial statement fraud, and we also focus on this type of fraud. One key difference between fraud detection in an audit setting and tasks in other fields in which checklists are used (e.g., nuclear power generation, aviation, medical practice) is in the strategic nature of financial reporting fraud. The fraud perpetrator uses deception to mislead auditors and counter the auditors' attempts to detect fraud through the use of red flag checklists, whereas no such intentional deception is at play in other fields in which checklists are used.
Economic game theory suggests that the most effective course of action for a fraud-perpetrating manager is to follow a randomized strategy when perpetrating the fraud (i.e., one that makes it difficult to predict where and when the fraud will be perpetrated), and that the auditor's optimal response is to randomize the audit strategy to make it more difficult for fraud-perpetrating managers to guess where and when the auditor might look for fraud (Jamal 2008, 102). However, current risk-based audit practice represents a type of focused (nonrandomized) strategy, which game theory suggests is suboptimal for countering a randomized fraud strategy, because managers can use the auditor's red flag checklists to guess the auditor's strategy and perpetrate fraud in areas that do not appear to be risky. Therefore, proponents of this view argue that a risk-based audit is more suitable for detecting errors, because they are unintentional, than fraud, which is deliberate and strategic (Jamal 2008, 102–103). Research by Hoffman and Zimbelman (2009) concludes that the strategic nature of fraud requires strategic thinking on the part of the auditor, including whether and how red flag checklists are deployed.
We believe that it is difficult for management to implement a randomized fraud strategy because of the constraining effects of the double entry accounting system and the need to recruit collaborators. These factors can reduce the unpredictability of the methods used by fraud perpetrators and can permit the successful application of checklist-based decision aids, as documented by Eining et al. (1997) and Bell and Carcello (2000). However, we agree that strategic thinking on the part of the auditor is required.
CHECKLIST DESIGN
A checklist's design greatly influences its effectiveness (e.g., Glover, Prawitt, and Spilker 1997). The framework in Figure 1 identifies several key checklist design issues that influence the effectiveness of checklists: customization (related to checklist type), comprehensiveness versus diagnosticity, and structure of the checklist content.
Customization
Recent discussions in the medical and auditing fields suggest that customizing diagnostic judgment checklists is necessary to increase their effectiveness (Ely et al. 2011; Cowperthwaite 2012). According to Ely et al. (2011), diagnostic medical checklists are tailored to specific diseases. In fraud risk assessment and, more generally, in audit settings, customization is accomplished by adapting the checklist to the characteristics particular to the client and/or the industry in which the client operates, the staff mix specific to the engagement, and to the areas of the audit in which professional judgment is important (Cowperthwaite 2012, 39).3 Cowperthwaite (2012) suggests that auditors can start with a generic checklist and then tailor it based on factors such as their knowledge of the client and their understanding of the requirements of GAAS. Customization helps the auditor omit irrelevant items from the checklist and emphasize items important to the engagement (Cowperthwaite 2012).
In the fraud risk assessment task, besides enhancing an auditor's risk assessments, an additional and important benefit of customization is its positive effect on the checklist user's ability to translate a diagnostic judgment into an appropriate audit program. Hammersley (2011) provides evidence that a red flag fraud checklist can produce appropriate fraud risk assessments, but should be supplemented with client-specific contextual cues (i.e., the client's circumstance in the year under audit) in order to effectively translate the fraud risk assessments into appropriate changes to the audit plan. These benefits of customization are in line with the critique that “using standardized ‘checklists' may not reflect an allocation of audit work weighted toward high-risk areas” (PCAOB 2006; emphasis added). In other words, customization can eliminate a mismatch between the decision aid and the task situation, as well as between the decision aid and the decision maker, which has been shown to cause “passive” application of the aid (i.e., its mechanical use without engaging in active thinking about the task) and inappropriate reliance (Glover et al. 1997). Some examples of checklists that would be useful are client acceptance/retention checklists, fraud risk checklists, and internal control evaluation checklists.
Comprehensiveness versus Diagnosticity
Two potentially conflicting design-related issues associated with checklist effectiveness are comprehensiveness of cues and the relative diagnosticity of positive and negative cues. Pincus (1989) states that the omission of apparently useful red flags in her checklist highlights the need to better investigate financial statement fraud indicators, and that developing a comprehensive set of such indicators might increase the effectiveness of fraud checklists.4 As the search for useful red flags continues, auditors should be aware of new types of red flags identified and evaluated by researchers and regulators, such as disparities between financial and nonfinancial indicators (Brazel, Jones, and Zimbelman 2009; ASC 2011); e.g., growth in revenue outpacing that in warehouse space, number of retail outlets, or employee headcount.
However, as desirable as the comprehensiveness of cues in a checklist may be, too many cues can cause diagnosticity problems because a large proportion of red flags identified in the literature and in auditing standards are ineffective in distinguishing between fraud and nonfraud engagements (Bell and Carcello 2000), which raises the issue of red flags' predictive ability. Pincus (1989) notes that her results are consistent with users of checklists underemphasizing negative indicators suggesting potential fraud. This, in turn, can result in judgment errors at the cue-combination stage (i.e., the stage at which the auditor synthesizes the cues obtained from all of the items included in the checklist into a single overall judgment of fraud risk). While there could be many reasons for such an underemphasis, Pincus (1989) believes that an excessive number of red flags with weak predictive ability likely draws attention away from those with strong predictive ability, suggesting that checklist design can be improved by identifying and utilizing a small number of highly predictive items, and dropping the large number of red flags with limited predictive ability, as in Bell and Carcello (2000).5 Pincus's (1989) suggestions are confirmed in more recent studies by Wood (2012) and Bédard and Graham (2002). Wood (2012) produces evidence that, for a checklist used in a fraud risk assessment task, nonpredictive cues “dilute” the predictive value of diagnostic cues. Bédard and Graham (2002) find that a negatively oriented checklist causes auditors to identify more risk factors compared to auditors who use a positively oriented checklist, if the client has high engagement risk.6
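The dilution effect that Wood (2012) documents can be illustrated with a small numerical sketch. The code below is purely illustrative and not drawn from any cited study: the cue values, the equal-weight combination rule, and the function name are all our assumptions. The point it makes is simple arithmetic: adding cues whose values sit near an uninformative midpoint pulls an equal-weight composite toward that midpoint, diluting the signal from the diagnostic cues.

```python
def equal_weight_risk(cue_values: list[float]) -> float:
    """Naive equal-weight composite of checklist cue values (each in [0, 1])."""
    if not cue_values:
        raise ValueError("need at least one cue value")
    return sum(cue_values) / len(cue_values)

# Two highly diagnostic red flags, both strongly present.
diagnostic = [0.9, 0.8]
# Four cues that carry no signal (values near the 0.5 midpoint).
nondiagnostic = [0.5, 0.5, 0.5, 0.5]

without_dilution = equal_weight_risk(diagnostic)                  # 0.85
with_dilution = equal_weight_risk(diagnostic + nondiagnostic)     # about 0.62
# The nonpredictive cues pull the composite toward the uninformative midpoint.
```

This is one reason a short checklist of highly predictive items can outperform a long, comprehensive one when cues are combined judgmentally with roughly equal weights.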
Wood's (2012) and Bédard and Graham's (2002) findings suggest that audit firms could improve their risk assessment procedures through relatively simple modifications to checklist design. In practice, at least some audit firms appear to be using fraud risk checklists for which red flag selection is based on statistical procedures that utilize historical data from a variety of their clients. For example, in Boatsman et al. (1997) an audit firm provided the researchers with an actual decision aid that used 24 red flag indicators to aid in fraud risk assessment. These indicators were extracted statistically from a wider set of red flags documented in prior research and professional literature, resulting in an effective checklist (Boatsman et al. 1997, 219).7 In addition, auditors should not disclose details of their audit plans, including their final checklist's composition, to the client. Moreover, if the auditor feels that there is a need to be strategic, then she/he can collect the information related to red flags with limited predictive ability, but not use them in her/his final risk assessments (however, this can increase the cost of using checklists).
Structure
Boritz (1985) highlights three benefits of applying hierarchical structures to audit checklists: structures clarify problems through decomposition and modularization of information sets; structures limit auditors' reliance on heuristics and the biases that accompany them; and structures incorporate knowledge and expertise that highlight relationships and information cues. Wilks and Zimbelman (2004) provide evidence that structural improvements can make fraud risk assessment checklists more effective. In their experiment, audit managers who used fraud checklists in conjunction with the decomposition of fraud risk assessments according to the fraud triangle (which postulates that three conditions must be present for a fraud to occur: incentive, opportunity, and attitude) assessed fraud risk better than did managers who relied on checklists without such decomposition (holistic approach), but only when opportunity and incentive cues indicated low risk. This result demonstrates that the decomposition aid is more effective than a traditional checklist in certain contexts.8 In extant auditing research, this study often is cited as evidence that checklists are not useful (Hogan et al. 2008). However, we believe that Wilks and Zimbelman's (2004) results indicate that improvements in cue organization can increase a checklist's effectiveness.
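A fraud-triangle decomposition of the kind studied by Wilks and Zimbelman (2004) can be sketched as a simple combination rule. The sketch below is hypothetical and is not their experimental instrument: the 0–1 rating scale and the multiplicative rule are our assumptions. The multiplicative form is one plausible choice because the triangle posits that all three conditions must be present, so the overall assessment stays low whenever any single component is low.

```python
def decomposed_fraud_risk(incentive: float, opportunity: float, attitude: float) -> float:
    """Combine fraud-triangle component ratings (each on a 0-1 scale).

    A multiplicative rule is assumed: since the fraud triangle posits that
    all three conditions must be present for fraud to occur, overall risk
    remains low when any single component is low.
    """
    for score in (incentive, opportunity, attitude):
        if not 0.0 <= score <= 1.0:
            raise ValueError("component ratings must lie in [0, 1]")
    return incentive * opportunity * attitude

# High incentive and attitude, but low opportunity, keeps overall risk low.
overall = decomposed_fraud_risk(0.9, 0.1, 0.8)
```

Under a holistic approach, by contrast, the auditor forms a single global judgment and the influence of any one component on the final assessment is left implicit.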
CHECKLIST APPLICATION
The framework in Figure 1 identifies two key checklist application issues that influence the effectiveness of checklists: the method of combining cues and individual versus group use of checklists.
Method of Combining Cues
Kahneman (2011) describes two reasoning systems that govern cue processing: an intuitive system (System 1) that rapidly combines cues into a composite based on prior knowledge, heuristics, stereotypes, expectancies, scripts, and schemas (Bargh and Chartrand 1999), and a deliberative system (System 2) that combines cues in accordance with a rational decision model. Because of its speed and efficiency, the intuitive reasoning system often preempts the deliberative system and leads people to rely on heuristics and biases. Sloman (1996) points out that, even when a person tries to be rule governed, responses prompted by unconscious associations with information cues encroach on judgment quality. This provides the rationale for using decision aids to compensate for the risks of relying on intuitive judgment to combine the cues that checklists help identify.
Recent behavioral studies suggest that deliberative thinking can be induced to improve fraud risk assessments. For example, if opportunistic managers anticipate and take advantage of a traditional risk-based audit strategy by concealing misstatements within low-risk accounts more often than within high-risk accounts, auditors can successfully predict and counter such managers' expectations if they are prompted to think strategically about managers' expected responses to their audit approach (Hoffman and Zimbelman 2009, 2012). This supposition suggests that checklists' usefulness for fraud risk assessments can be increased if the checklists are designed to encourage deliberative, strategic reasoning about the appropriate audit response to identified risks. One possible way to accomplish this is to incorporate items that help auditors identify and direct their attention to accounts and classes of transactions for which the traditional audit approach is expected to produce a low-risk estimate.
Beyond inducing deliberative rather than intuitive cue weighting and combination lies the possibility of using computational models. Extant psychology research suggests that individuals are not skilled at identifying complex, nonlinear relationships within data (Hammond and Summers 1972). Therefore, simple, additive models utilizing limited information, such as those based on linear regression, frequently perform better at identifying relationships than do experts, even when a large amount of data is available (e.g., Dawes 1971). Pincus (1989) mentions that she did not have a reliable empirical cue-processing model at her disposal when she conducted her study; therefore, cues were evaluated using only the auditor's judgment. In contrast, Eining et al. (1997) and Bell and Carcello (2000) had two different but reliable cue-processing models and were able to compare the usefulness of three types of fraud cue-processing aids: a checklist, a statistical model, and an expert system. Eining et al. (1997) and Bell and Carcello (2000) confirm that auditor judgment based on using the checklist alone is inferior to unaided judgment; however, the fraud risk assessments obtained with the statistical model and the expert system are far superior to those obtained through the use of unaided judgment.9 The key point that must be emphasized here is that the checklist-based identification of cues is necessary (although not sufficient) for the effective functioning of both the statistical model and the expert system. In other words, the problem with the red flag checklist is not the use of a checklist per se but the weak design of the checklist and the ineffective combination of the cues identified using the checklist.
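As a concrete, deliberately simplified sketch of a statistical cue-combination model of the general kind evaluated by Eining et al. (1997) and Bell and Carcello (2000), the logistic model below maps the presence or absence of checklist cues to a fraud-risk probability. The cue names, weights, and intercept are hypothetical illustrations, not values from the cited studies; in practice the weights would be estimated from historical engagement data.

```python
import math

def fraud_risk_score(cues: dict[str, bool], weights: dict[str, float],
                     intercept: float) -> float:
    """Logistic cue-combination model: sum the weights of the cues that are
    present, add the intercept, and map the result to a probability."""
    z = intercept + sum(weights[name] for name, present in cues.items() if present)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical red flags and weights -- for illustration only.
weights = {"mgmt_override": 1.4, "aggressive_targets": 0.9, "weak_controls": 1.1}
cues = {"mgmt_override": True, "aggressive_targets": False, "weak_controls": True}
risk = fraud_risk_score(cues, weights, intercept=-3.0)  # roughly 0.38
```

The checklist supplies the inputs (which cues were observed); the model supplies the cue combination, which is exactly the division of labor the studies above found effective.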
Using cue-combination models also can benefit other types of audit checklists including those intended for client acceptance and continuation judgments, control risk evaluation, and planning of substantive tests. For client acceptance and continuation, Bell et al. (2002) discuss a sophisticated decision aid developed by KPMG that decomposes the decision into several categories and subcategories. This process involves an extensive set of checklists that include information gathered by the audit staff, which is then combined into an overall judgment of acceptance/continuation risk by a set of mathematical algorithms. For planning checklists, Bonner et al. (1996) find that a checklist combined with a model for aggregating judgment components performs significantly better than unaided judgment, while a checklist alone gives only a slight improvement, because of the ability of the combined aid to facilitate both knowledge retrieval and aggregation. For control checklists, Ashton (1974) states that a likely reason for inconsistencies in internal control risk judgments among auditors is differential weightings of the internal control indicators they use. He suggests that a possible solution is to rely on a model that approximates “true” statistical weights for the combination of indicators into an overall judgment (Ashton 1974, 153). Libby and Libby (1989) demonstrate that making component judgments of the strength of the individual controls, and then aggregating the component judgments using a “mechanical” model is superior to audit seniors' global control risk assessments.
These results provide two key insights into the usefulness of audit checklists. First, it appears that auditors' judgment processes, when using checklist-gathered evidence, “break down” when auditors combine the cues obtained from the checklist into a final judgment. Second, the cue-combination models and expert systems are, in essence, a combination of two decision aids—a checklist for finding the cues (knowledge retrieval) and a model for aggregating the set of cues (knowledge aggregation). For the model to be effective, an appropriately small set of diagnostic cues to be processed by the model is required. Thus, a suitably designed audit checklist applied jointly with an effective cue-combination model can be a useful tool in conducting audits. In addition, it can reduce liability, as will be discussed later.
Individual versus Group Judgment
In the medical field, both procedural operating room and judgmental diagnostic checklists are used in group settings. Such settings involve doctors specializing in different clinical disciplines, nurses, and patients, who tend to use distinct terminologies to describe the same phenomena, thus causing miscommunication because of translation errors (Winters et al. 2009). Winters et al. (2009) suggest that checklists help to “democratize knowledge” by standardizing and facilitating the reliable translation of information, so that the same knowledge (awareness of the importance of a given set of cues) is imparted to persons with different perspectives, professional cultures, and educational backgrounds. There also are suggestions that general diagnostic checklists that prompt doctors to optimize their cognitive approach can reduce groupthink, whereby a diagnosis by one member of a group of doctors is hastily adopted by the others (Croskerry 2007; Ely et al. 2011).
In auditing, as in medicine, many tasks involve multiperson teams. Prior research has shown that mathematical composites of checklist-based individual judgments can outperform individuals in judgment tasks, such as internal control evaluation (Trotman et al. 1983). Fraud risk assessments in financial audits are, at least in part, an interactive group process, as SAS No. 99 (AICPA 2002) stipulates that auditors must discuss the potential for material misstatement caused by fraudulent activities, including an exchange of ideas or “brainstorming” among the members of the audit team. These brainstorming sessions have been shown to be effective (Hoffman and Zimbelman 2009, 2012), and a positive role of checklists in such group judgment and decision making is their ability to reduce miscommunication. Alon and Dwyer (2010) investigate how the brainstorming component of SAS No. 99 (AICPA 2002) influences checklist use and reliance, and the effectiveness of fraud risk assessments. Using an approach similar to that employed by Pincus (1989), Alon and Dwyer (2010) find that two-person groups of students with a checklist decision aid outperform individuals with and without the decision aid, because of information processing gains arising from the group interaction. Further, groups with the decision aid identified more quality fraud ideas than those without the aid, suggesting that checklists add to decision quality for brainstorming groups (Alon and Dwyer 2010, 248). These results counter Bamber et al.'s (2007) argument that the SAS No. 99 brainstorming requirement reduces the importance of fraud checklists.
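As a minimal illustration of the mathematical composites examined in this line of research, the simplest aggregation is an equal-weighted mean of independent checklist-based assessments. The individual scores below are hypothetical; the sketch shows only the mechanics of compositing, not any study's actual procedure.

```python
# Hypothetical sketch: each team member independently completes the checklist
# and produces a risk assessment; the composite is their simple mean, which
# tends to average out idiosyncratic individual error.

def composite_judgment(scores):
    """Equal-weighted composite of independent individual assessments."""
    if not scores:
        raise ValueError("need at least one individual assessment")
    return sum(scores) / len(scores)

team_scores = [0.4, 0.6, 0.5]  # three auditors' independent risk assessments
composite = composite_judgment(team_scores)
```

Note that this mechanical composite requires no interaction among team members, which is one reason it can outperform a single individual without incurring group-process losses.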
IMPACT OF CONTEXTUAL FACTORS
Psychology research recognizes the importance of contextual factors for both individual and group judgment and decision making (Kerr and Tindale 2004; Bonner 2008), implying that these factors also influence the use of checklists. The framework in Figure 1 identifies the following three key contextual factors that affect checklist use in auditing: performance pressures, legal liability considerations, and auditor characteristics (e.g., Ng and Tan 2003; Hammersley 2011).10
Performance Pressures
Gawande (2009) reports that procedural checklists are very effective in high-pressure work environments (e.g., operating rooms, airplane cockpits). These checklists improve outcomes because they draw users' attention to tasks that must be completed but may be overlooked in the absence of a checklist. This creates an expectation that such checklists would perform well in other high-pressure workplaces, such as those encountered by auditors. Surprisingly, research studying how procedural checklists are and should be used in audits is almost nonexistent. McDaniel (1990) investigates how the joint imposition of program structure (consisting of detailed procedures and/or checklists) and time constraints affects auditors' performance. McDaniel finds that, when time pressure is low, structure significantly increases audit effectiveness, efficiency, and consistency; when time pressure increases, audit efficiency increases but effectiveness decreases. Structure is associated with smaller decreases in audit effectiveness at higher levels of time pressure, but the difference is not significant (McDaniel 1990, 268). This result is somewhat inconsistent with the effective performance of procedural checklists in the high-pressure tasks described in Gawande (2009), so the issue requires further investigation. Blocher et al. (1983) find that, when auditors use procedural checklists for planning analytical review, they plan more analytical procedures and more tests of details, compared to when they do not use such checklists.
Legal Liability Considerations
Sutton, Young, and McKenzie (1994) investigate liability issues arising from an audit firm's use of an expert system (as discussed, such systems may be based on structured checklists) and identify two sources of risk. One is that the joint judgment of the auditor and the expert system may not supply the required level of expertise, and the auditor will be held liable under GAAS fieldwork standards (AICPA 1993; PCAOB 2001) for not properly planning and supervising the use of the system. The other is that not relying on an expert system (when one is available) may be turned against the auditor, because the existence of such a system can raise the level of expertise a prudent practitioner is expected to exercise. Jennings, Kneer, and Reckers (1993) find that, when GAAS standards are not readily at hand for a particular area, jurists use audit firms' internal decision aids as surrogate standards against which to gauge auditors' performance. Results in Lowe, Reckers, and Whitecotton (2002) indicate that, when a decision aid is reliable and the auditor has followed its recommendations, jurors attribute lower responsibility to the auditor. This suggests that jurors place a high value on decision aids and expect auditors to use them when effective ones are available.
Auditor Characteristics
Some auditors are very hesitant to rely on the judgment output of mechanical decision aids, particularly when faced with performance pressures such as financial incentives, justification, and outcome feedback.11 More experienced auditors appear to rely on decision aids less because of their confidence, perhaps even overconfidence, in their own expertise. Boatsman et al. (1997) find that auditors were more convinced of the credibility of the checklist-based decision aid when it predicted no fraud rather than fraud, and that their reliance on the decision aid was a function of their concern about the severity of anticipated penalties for incorrect decisions.12 From a practical point of view, an interesting finding of the study is that a simultaneous increase in the penalties for audit failure and for overauditing generates higher reliance on the aid. Kaplan, Reneau, and Whitecotton (2001) find that auditors with an external locus of control rely more on a mechanical cue-combination decision aid than those with an internal locus of control, and that involving decision makers in the aid's development enhances reliance (the effect is more pronounced for internal locus of control auditors). Sieck and Arkes (2005) find that people's overconfidence in the quality of their intuitive judgment adds to their reluctance to use effective cue-combining aids, and that only enhanced calibration feedback (involving answering several questions from memory) reduced overconfidence and increased reliance on the aid. Gomaa, Hunton, and Rose (2008) find that practitioners rely more on decision aids when either litigation risk or control risk is high; when both risks are high, litigation risk amplifies the auditors' awareness of legal defensibility and, thus, increases decision aid reliance, even though the auditors' confidence in the quality of their judgment deteriorates.
In other words, there is a tension between the overconfidence that could lead auditors to ignore or override decision aids and the pressure to reduce legal liability that could lead auditors to subordinate their judgment to that of the checklist.
RECOMMENDATIONS AND CONCLUSION
Our review of existing audit research indicates that procedural checklists are generally thought to perform satisfactorily, whereas there is a widely held view that the use of diagnostic checklists for fraud risk assessment in financial audits yields dysfunctional outcomes (e.g., Hogan et al. 2008; Jamal 2008), and that this view is spreading to other uses of checklists (e.g., Wheeler and Arunachalam 2008; Seow 2011). However, we believe that abandoning the use of checklists, such as red flag checklists, in favor of unaided judgment may not be the best option for large firms with multilocation audits that require a degree of standardization and coordination. Our analysis of the literature along the dimensions of the suggested framework in Figure 1 identified various strategies to improve checklist effectiveness. These strategies are presented in Table 2, using the fraud risk assessment checklist as an illustrative example. Below, we highlight some of the key recommendations in this table.
Our analysis indicates that cue processing appears to be the weak link in the use of judgmental checklists, implying that they need to be integrated with a cue-combining model (an algorithm based on a statistical model or an expert system) to achieve better performance. If an appropriate cue-combining model is not available, then checklists need to be designed to trigger deliberative strategic thinking on the part of the auditor to better counter the strategic nature of management behavior. For example, in a fraud risk assessment context, this could be accomplished by encouraging auditors to critically address accounts that the traditional audit approach labels as “low risk,” or to consider management's anticipation of changes the auditors would make to the audit program in response to identified risks.
Another option is to use groups to mitigate the weaknesses associated with unaided individual judgment (such as in a fraud risk brainstorming session), even though groups are themselves subject to a number of factors that can reduce judgmental performance (Kerr and Tindale 2004). Checklists can exert positive effects on group judgment by mitigating miscommunication, groupthink, and other such factors. In particular, they may mitigate miscommunication between various specialists (e.g., forensic, tax, and valuation specialists) and regular audit team members during the risk assessment planning phase of the audit.
Contextual factors, such as performance pressures and the legal environment, need to be considered by audit team managers, as they can affect the degree of reliance placed by audit team members on checklists and can impact the translation of assessed risk into changes in the audit program. In audit settings, the use of checklists provides the additional benefit of fulfilling documentation requirements imposed by regulatory and standard-setting bodies, such as the PCAOB. The application of judgmental checklists can produce necessary evidence supporting auditors' conclusions in areas in which appropriate documentation is currently lacking. For example, red flag checklists can be useful for documenting the estimate of the risk of material misstatement due to fraud, while checklists designed for fair value auditing (e.g., PwC 2011; Deloitte 2012) can improve the documentation in that area.
Many important questions related to the use of audit checklists remain unexplored and open for future research. Cowperthwaite (2012) asserts that customization may mitigate some of the weaknesses of generic red flag checklists while retaining some of their benefits; however, such customization has not been researched in audit settings and raises several questions.13 If the engagement team is qualified to modify the checklist (and is allowed by their firm to do so), then why would they not be able to make good use of the standard checklist? Will audit teams take the time needed to modify checklists appropriately? If the engagement team modifies the checklist, might they make it worse, not better? These questions suggest that much useful research could be conducted in this area. Gawande (2009, 120) asserts that a good checklist must possess two qualities: it should be short and precise. However, among the many studies discussed in this commentary, only Bell and Carcello (2000) and Boatsman et al. (1997) address this issue to some extent, reflecting on the theoretical and practical aspects of cues' diagnosticity. Clear criteria for the precision and the inclusion/exclusion of questions in an audit checklist are yet to be developed. Another area for research is the communication-enhancing role of checklists, both the communication among different types of specialists and the communication between supervisors and subordinates (Gawande 2009, 67, 103; KPMG 2011). Evidence needs to be obtained on what role checklists can play in integrating the diverse personnel involved in the audit process.
FOOTNOTES
1. A fault tree is a judgmental checklist organized as a decision tree. More precisely, in a fault tree, a system failure state is postulated, and sequences of more basic faults contributing to this state are laid out in a systematic manner (U.S. NRC 1981, I-8). Checklists with properties similar to fault trees have been used in auditing research and practice in connection with expert systems developed in the 1980s and 1990s for evaluating loan loss provisions, tax provisions, inherent risk assessments, going concern evaluations, and other diagnostic tasks with embedded decision trees.
2. Financial statement fraud is the intentional misstatement or omission of amounts or disclosures in financial statements, typically by top management, to deceive users. Misappropriation of assets (i.e., employee fraud) is the second main category of fraud (AICPA 2002). Although financial statement fraud occurs much less frequently than employee fraud (Wells 1997), it is, on average, much more likely to be material, because top management can influence larger amounts than employees can, and because financial misrepresentations designed to mislead users are, if successful, material by definition.
3. The staff mix includes the levels of experience possessed by the audit team members, as well as the involvement of specialists (e.g., IT specialists, forensic specialists).
4. In Pincus (1989), the participants who did not use the red flag checklist relied on a number of red flags not included in the checklist, such as public company status, the auditor's opinion about certain components of the client's governance, and the client's cash management skills.
5. Pincus's (1989) questionnaire is based on the questionnaire in Romney, Albrecht, and Cherrington (1980), which contains items with a wide variance in predictive ability (Albrecht and Romney 1986). Bell and Carcello's (2000) checklist relies on a small number of significant red flags (i.e., weak internal control environment, rapid company growth, inadequate or inconsistent profitability, undue emphasis on meeting earnings projections, lying to or evasiveness with the auditor, aggressive financial reporting, and ownership status).
6. In Bédard and Graham (2002), a checklist has a “negative” focus if client risk and its consequences are emphasized and a “positive” focus if such factors are not emphasized.
7. The aid validation demonstrated a classification accuracy of 81 percent for both fraud and nonfraud audits.
8. The decomposition is distinct from grouping checklist items, which has been incorporated in the standards (AICPA 2002; Wilks and Zimbelman 2004, 741).
9. Interestingly, a contemporaneous study by Shelton, Whittington, and Landsittel (2001) demonstrates that, in practice, Big N and second-tier audit firms did not rely widely on expert systems and other tools for processing cues obtained from their red flag checklists.
10. Performance pressures are those arising from accountability structures, time pressures, and financial and other incentives.
11. Nelson and Tan (2005) provide a literature review that includes a section on decision aids; however, a recent review by Knechel, Krishnan, Pevzner, Shefchik, and Velury (2013) barely mentions them.
12. Boatsman et al.'s (1997) decision aid consists of a list of 24 red flags with a subsequent mechanical combination of cues to estimate fraud potential. It was provided to the authors by an audit firm.
13. We are grateful to the editor for suggesting these questions.
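The fault tree structure described in the note above can be sketched as nested AND/OR gates over basic faults. The failure state and fault names below are hypothetical, chosen only to show how a postulated top event decomposes into combinations of more basic faults.

```python
# Hypothetical sketch of a fault tree: a postulated failure state at the top,
# reached through AND/OR combinations of more basic faults. Event names and
# structure are illustrative, not drawn from any cited system.

def fault(name):
    """Leaf node: a basic fault, looked up in a dict of observed states."""
    return lambda observed: observed.get(name, False)

def OR(*branches):
    """OR gate: the event occurs if any branch occurs."""
    return lambda observed: any(b(observed) for b in branches)

def AND(*branches):
    """AND gate: the event occurs only if all branches occur."""
    return lambda observed: all(b(observed) for b in branches)

# Top event: a material misstatement goes undetected only if controls fail
# AND substantive testing also fails for at least one reason.
undetected_misstatement = AND(
    fault("controls_fail"),
    OR(fault("test_scope_too_narrow"), fault("evidence_misinterpreted")),
)

result = undetected_misstatement(
    {"controls_fail": True, "test_scope_too_narrow": True}
)
```

Walking such a tree from the top event down doubles as a checklist: each leaf is a question the auditor must answer, and the gate logic replaces judgmental cue combination.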