This paper presents a template‐based approach to extracting data from the EDGAR database. A set of heuristic‐based templates configures the trainable system so that each type of EDGAR filing is processed under a single configuration. Such configurability is highly desirable because it adds expandability and flexibility to the system. The template‐based approach also enables the system to extract both structural information and content from the filings in the EDGAR database. The ability to extract structural information from a section or a complete filing makes it possible to collect data from real‐world documents for users of financial data in both academia and industry. We use the income statement section of 10‐K filings to illustrate the system and the use of the template‐based approach.
