cat-lemonade / PDFDataExtractor
A toolkit for automatically extracting semantic information from PDF files of scientific articles
☆72 · Updated last year
Alternatives and similar repositories for PDFDataExtractor:
Users who are interested in PDFDataExtractor compare it to the libraries listed below.
- Code and data for the publication "Structured information extraction from scientific text with large language models" by Dagdelen & Dunn … ☆98 · Updated last year
- ☆39 · Updated last month
- Uses publisher APIs to programmatically retrieve scientific journal articles for text mining. ☆123 · Updated last year
- Code to access the Matscholar public API. ☆64 · Updated 3 years ago
- ☆23 · Updated 7 months ago
- Service for converting and enhancing heterogeneous publisher XML formats into TEI ☆53 · Updated 7 months ago
- A Python version of getpapers ☆84 · Updated 10 months ago
- Public release of data and code for materials synthesis generation ☆73 · Updated 2 years ago
- Material Science Aware Language Model ☆96 · Updated 2 years ago
- ChemicalTagger is a tool for semantic text mining in chemistry. ☆41 · Updated 6 months ago
- Grobid module for superconductor material and properties extraction ☆21 · Updated last month
- A web scraping tool to systematically extract the text of scientific papers and corresponding metadata from university-accessible journal… ☆197 · Updated 2 years ago
- Extracts data from tables with complicated structures. ☆16 · Updated last month
- Collection of papers on text mining for materials science ☆27 · Updated 4 years ago
- ☆82 · Updated last year
- A BERT model pretrained on materials science literature ☆57 · Updated 3 years ago
- OSCAR (Open Source Chemistry Analysis Routines) is an open source extensible system for the automated annotation of chemistry in scientif… ☆31 · Updated last month
- An open-source effort towards accessible polymer data ☆33 · Updated 4 years ago
- Downloads USPTO patents and finds molecules related to keyword queries ☆58 · Updated last year
- Pipeline for automated extraction of chemical property information from scientific documents ☆18 · Updated 6 years ago
- Python PDF parser for scientific publications: content and figures ☆401 · Updated last year
- LimeSoup is a package to parse HTML or XML papers from different publishers. ☆20 · Updated 4 years ago
- Chemist AI Agent for Developing Materials Datasets with Natural Language Prompts ☆48 · Updated 5 months ago
- Extracts tables from HTML/XML files into JSON format ☆35 · Updated 4 years ago
- ☆18 · Updated 2 weeks ago
- Open Access PDF harvester, metadata aggregator and full-text ingester ☆60 · Updated 11 months ago
- Word2Vec model trained on 640k+ materials science journal articles ☆51 · Updated 7 years ago
- Code for the text-mined solid-state reactions dataset ☆74 · Updated last year
- Utility to compile a string of chemical terms into a data structure with chemical formula and composition ☆13 · Updated 3 years ago
- litreviewer is a Python package (a collection of a few Python modules) that helps researchers perform crawling, scraping, collecting (corpus)… ☆41 · Updated 9 months ago