ChemDataWriter - Reader

Reader is built upon the Document class of BatteryDataExtractor.

It processes raw XML/HTML paper files and converts each one into a Reader object, from which the content of the paper can be extracted and analyzed in a structured form.

Features

Pre-process HTML/XML files from three publishers: RSC, Elsevier, and Springer

Extract metadata: title, author, date, journal, issue, abstract, ...

Support user-defined paper files in JSON format

Enable large-scale scientific paper retrieval and analysis
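To give a rough idea of what metadata extraction from a publisher HTML file involves, here is a minimal sketch that uses only the Python standard library. It is not the Reader API: the names `MetaTagParser` and `extract_metadata` are hypothetical, and it assumes the page exposes its metadata through `citation_*` meta tags, which many publisher pages do but which is not guaranteed for every file.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Hypothetical helper: collect <meta name="citation_*"> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        # Assumption: metadata is exposed as citation_* meta tags.
        if name.startswith("citation_") and content is not None:
            key = name[len("citation_"):]
            # A field such as author can repeat, so store every value in a list.
            self.metadata.setdefault(key, []).append(content)


def extract_metadata(html_text):
    """Return a dict mapping each metadata field to its list of values."""
    parser = MetaTagParser()
    parser.feed(html_text)
    return parser.metadata


# Made-up sample input, for illustration only.
sample = """<html><head>
<meta name="citation_title" content="An Example Battery Paper">
<meta name="citation_author" content="A. Author">
<meta name="citation_author" content="B. Author">
<meta name="citation_journal_title" content="Example Journal">
</head><body></body></html>"""

metadata = extract_metadata(sample)
print(metadata["title"])
print(metadata["author"])
```

Storing every field as a list keeps repeated tags such as authors without special cases. A real pipeline would also need publisher-specific handling for fields such as the abstract, which this sketch does not attempt.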

Source Code

Documentation