ChemDataWriter - Retriever

Retriever finds the text most relevant to a given topic in a corpus of scientific papers.

Built on the Haystack library, Retriever uses transformer-based models to search, query, and re-rank large volumes of text from scientific documents.

Features

Rank the text of a corpus for a given topic according to word frequency, word order, and syntax

Perform document retrieval by sweeping through text saved in a DocumentStore

Select from various retrieval methods: BM25, DensePassage, TableText, Embedding, Tfidf, MultiModal, Web

Define the number of documents to retrieve, and re-rank them by importance or date
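
To make the retrieve-then-re-rank flow listed above concrete, here is a minimal pure-Python sketch of BM25 scoring with a top-k cutoff and an optional re-rank by date. It is an illustration only and does not use Retriever or Haystack. The names `Doc`, `bm25_scores`, `retrieve`, `top_k`, and `rerank_by_date`, the whitespace tokenizer, and the `k1`/`b` defaults are all assumptions made for this example.

```python
import math
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    # Hypothetical record type for this sketch; not a Haystack class.
    text: str
    published: date


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for brevity).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the query."""
    if not docs:
        return []
    tokenized = [tokenize(d.text) for d in docs]
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation penalises long documents.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve(query, docs, top_k=2, rerank_by_date=False):
    """Keep the top_k documents by BM25 score, optionally newest first."""
    scored = sorted(zip(bm25_scores(query, docs), docs),
                    key=lambda pair: pair[0], reverse=True)
    top = [doc for _, doc in scored[:top_k]]
    if rerank_by_date:
        top.sort(key=lambda d: d.published, reverse=True)
    return top


corpus = [
    Doc("lithium ion battery cathode materials", date(2022, 8, 9)),
    Doc("perovskite solar cell efficiency", date(2021, 3, 15)),
    Doc("solid state lithium battery electrolyte battery", date(2019, 5, 1)),
]
by_score = retrieve("lithium battery", corpus, top_k=2)
by_date = retrieve("lithium battery", corpus, top_k=2, rerank_by_date=True)
```

In this toy corpus the third document ranks first by score because it mentions "battery" twice, while re-ranking by date moves the more recent first document to the top. The unrelated perovskite document is dropped by the `top_k=2` cutoff in both cases.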

Source Code

Documentation