ABSTRACT
Developing models to detect financial statement fraud involves challenges related to (1) the rarity of fraud observations, (2) the relative abundance of explanatory variables identified in the prior literature, and (3) the broad underlying definition of fraud. Following the emerging data analytics literature, we introduce and systematically evaluate three data analytics preprocessing methods to address these challenges. Results from evaluating actual cases of financial statement fraud suggest that two of these methods improve fraud prediction performance by approximately 10 percent relative to the best current techniques. Better fraud prediction can yield meaningful benefits, such as strengthening the SEC's ability to detect fraudulent filings and improving audit firms' client portfolio decisions.