Reader is built upon the Document class of BatteryDataExtractor.
It processes raw XML/HTML paper files and converts each one into a Reader object, from which the paper's content can be extracted and analyzed in a structured form.
Pre-process HTML/XML files from three publishers: RSC, Elsevier, and Springer
Extract metadata: title, author, date, journal, issue, abstract, ...
Support user-defined paper files in the JSON format
Enable large-scale scientific paper retrieval and analysis
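
To make the metadata and JSON features above concrete, here is a minimal stand-alone sketch using only the Python standard library. It does not call the Reader API. The XML layout, the tag names, and the helper functions `extract_metadata` and `load_user_paper` are hypothetical and exist only for illustration; real publisher files from RSC, Elsevier, and Springer use richer, namespaced schemas.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML. This is NOT a real publisher schema.
SAMPLE_XML = """<article>
  <title>Example battery paper</title>
  <author>A. Author</author>
  <author>B. Author</author>
  <date>2022-01-01</date>
  <journal>Example Journal</journal>
  <abstract>An example abstract.</abstract>
</article>"""

# Metadata fields named in the feature list above.
FIELDS = ("title", "author", "date", "journal", "issue", "abstract")


def extract_metadata(xml_text):
    """Collect the metadata fields from the simplified XML into a dict."""
    root = ET.fromstring(xml_text)

    def text_of(tag):
        # Return the stripped text of the first matching child, or None.
        node = root.find(tag)
        return node.text.strip() if node is not None and node.text else None

    metadata = {field: text_of(field) for field in FIELDS}
    # A paper can have several authors, so keep them as a list.
    metadata["author"] = [a.text.strip() for a in root.findall("author") if a.text]
    return metadata


def load_user_paper(json_text):
    """Load a user-defined paper given as JSON, keeping only the known fields."""
    record = json.loads(json_text)
    return {field: record.get(field) for field in FIELDS}


metadata = extract_metadata(SAMPLE_XML)
# Round-trip through JSON to mimic a user-supplied paper file.
user_paper = load_user_paper(json.dumps(metadata))
```

Fields that are absent from the input, such as `issue` in this sample, come back as `None` rather than raising an error, which keeps batch processing of many papers from stopping on a single incomplete record.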