This paper presents a template‐based approach to extracting data from the EDGAR database. A set of heuristic‐based templates is used to configure the trainable system so that each configuration processes one type of EDGAR filing. Such configurability is highly desirable because it makes the system expandable and flexible. The template‐based approach also enables the system to extract both structural information and content from the filings in the EDGAR database. Because structural information can be extracted from a single section or from a complete filing, the system can collect data from real‐world documents for users of financial data in both academia and industry. We use the income statement section of 10‐K filings to illustrate the system and the use of the template‐based approach.
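The abstract does not specify the template format. As a rough illustration only, the sketch below assumes a template is a set of regular-expression heuristics for one section of one filing type. Start and end markers recover the section's structure, and line-item label patterns recover its content. The names `Template`, `extract_section`, `extract_items`, `INCOME_STATEMENT`, every pattern, and the sample text are hypothetical. They are not the authors' templates or real EDGAR data, and the sketch assumes the filing has already been converted to plain text.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Template:
    """Hypothetical template: heuristics for one section of one filing type."""
    section_start: re.Pattern  # marks where the section begins (structure)
    section_end: re.Pattern    # marks where the next section begins
    line_items: Dict[str, re.Pattern]  # canonical item name -> label pattern (content)


# An amount such as "1,200", "$ 1,200" or "(700)"; an opening parenthesis marks a negative figure.
AMOUNT = re.compile(r"(\()?\s*\$?\s*(\d[\d,]*(?:\.\d+)?)")

# Illustrative patterns for a 10-K income statement (assumptions, not the paper's templates).
INCOME_STATEMENT = Template(
    section_start=re.compile(r"consolidated statements? of (income|operations)", re.I),
    section_end=re.compile(r"consolidated balance sheets?", re.I),
    line_items={
        "revenue": re.compile(r"(total )?(net )?(revenues?|sales)\b", re.I),
        "cost_of_revenue": re.compile(r"cost of (revenues?|sales|goods sold)", re.I),
        "net_income": re.compile(r"net (income|earnings|loss)", re.I),
    },
)


def extract_section(text: str, tpl: Template) -> Optional[str]:
    """Return the section text from the start marker up to the end marker, if found."""
    start = tpl.section_start.search(text)
    if start is None:
        return None
    end = tpl.section_end.search(text, start.end())
    return text[start.start():end.start() if end else len(text)]


def extract_items(section: str, tpl: Template) -> Dict[str, float]:
    """Map each canonical line item to the first amount on the line whose label matches."""
    values: Dict[str, float] = {}
    for line in section.splitlines():
        text = line.strip()
        for name, label in tpl.line_items.items():
            m = label.match(text)  # the label must start the line
            if m is None or name in values:
                continue
            a = AMOUNT.search(text, m.end())  # first amount after the label (current year)
            if a is None:
                continue
            amount = float(a.group(2).replace(",", ""))
            values[name] = -amount if a.group(1) else amount
            break  # each line feeds at most one line item
    return values


# Made-up sample text, not taken from a real filing.
SAMPLE = """\
ITEM 8. FINANCIAL STATEMENTS
CONSOLIDATED STATEMENTS OF INCOME
(in thousands)              2006       2005
Net sales               $ 1,200    $ 1,050
Cost of sales             (700)      (640)
Net income              $   150    $   120
CONSOLIDATED BALANCE SHEETS
Total assets            $ 5,000
"""

if __name__ == "__main__":
    section = extract_section(SAMPLE, INCOME_STATEMENT)
    print(extract_items(section, INCOME_STATEMENT) if section else {})
    # {'revenue': 1200.0, 'cost_of_revenue': -700.0, 'net_income': 150.0}
```

Under this assumed design, handling another filing type or section means writing a new `Template` instance rather than new extraction code, which loosely mirrors the configurability the abstract describes. Real 10‐K filings, with HTML or SGML markup, multi-column tables, and widely varying labels, would need far more robust heuristics than these patterns.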
1 December 2007
Research Article | January 01 2007
Extraction of Structure and Content from the Edgar Database: A Template‐Based Approach
Alexander Kogan
Rutgers, The State University of New Jersey, Newark
Miklos A. Vasarhelyi
Rutgers, The State University of New Jersey, Newark
Online ISSN: 1558-7940
Print ISSN: 1554-1908
American Accounting Association
2007
Journal of Emerging Technologies in Accounting (2007) 4 (1): 69–86.
Citation
Yu Cong, Alexander Kogan, Miklos A. Vasarhelyi; Extraction of Structure and Content from the Edgar Database: A Template‐Based Approach. Journal of Emerging Technologies in Accounting 1 December 2007; 4 (1): 69–86. https://doi.org/10.2308/jeta.2007.4.1.69