kermitt2/article_dataset_builder

Open Access PDF harvester, metadata aggregator and full-text ingester

☆62

Alternatives and similar repositories for article_dataset_builder

Users who are interested in article_dataset_builder are comparing it to the libraries listed below.

kermitt2 / biblio-glutton
A high performance bibliographic information service: https://biblio-glutton.readthedocs.io
☆150 · Apr 8, 2026 · Updated 3 months ago
kermitt2 / arxiv_harvester
Poor man's simple harvester for arXiv resources
☆14 · Jul 14, 2023 · Updated 3 years ago
kermitt2 / grisp
Knowledge Base stuff
☆23 · Mar 1, 2026 · Updated 4 months ago
grobidOrg / grobid-client-python
Python client for GROBID Web services
☆410 · Mar 5, 2026 · Updated 4 months ago
kermitt2 / datastet
Finding mentions and citations to named and implicit research datasets from within the academic literature
☆31 · Jun 14, 2025 · Updated last year
softcite / software-mentions
Softcite software mention recognizer, finding mentions and citations to software from within the academic literature
☆85 · Jun 6, 2026 · Updated last month
cverluise / openPatstat
Load, build and explore Patstat using the Google Cloud Platform
☆10 · Jan 19, 2019 · Updated 7 years ago
opencitations / cec
Citation Extraction and Classifier
☆16 · Apr 18, 2026 · Updated 3 months ago
ourresearch / openalex-elastic-api
All the OpenAlex API endpoints that are backed by Elasticsearch
☆43 · Updated this week
anHALytics / anhalytics-core
Analytic platform for the HAL research archive (in development)
☆12 · Oct 2, 2020 · Updated 5 years ago
kermitt2 / entity-fishing
A machine learning tool for fishing entities
☆268 · Feb 27, 2026 · Updated 4 months ago
kermitt2 / grobid-astro
A machine learning software for extracting astronomical entities from scholarly documents
☆10 · Oct 31, 2022 · Updated 3 years ago
kermitt2 / biblio-glutton-extension
A browser extension providing Open Access bibliographical services
☆18 · Dec 9, 2022 · Updated 3 years ago
ourresearch / journalsdb
Open database of scholarly journals
☆11 · Oct 26, 2022 · Updated 3 years ago
DanteNoguez / StreamlitGPT
Streaming responses with Streamlit, ChatGPT and Langchain.
☆11 · Apr 7, 2023 · Updated 3 years ago
DataSeer / dataseer-ml
DataSeer machine-learning service
☆28 · Sep 4, 2025 · Updated 10 months ago
paperfetcher / paperfetcher
Pip-installable Python package to automate handsearching and citation searching for systematic reviews.
☆20 · Jul 13, 2024 · Updated 2 years ago
allenai / s2orc
S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/
☆1,073 · Apr 26, 2024 · Updated 2 years ago
allenai / ForeCite
☆35 · Sep 16, 2022 · Updated 3 years ago
howisonlab / softcite-dataset
A gold-standard dataset of software mentions in research publications.
☆39 · Jul 27, 2023 · Updated 2 years ago
grobidOrg / grobid
A machine learning software for extracting information from scholarly documents
☆5,015 · Updated this week
DriedFishMatters / zotero-meta-analysis-toolkit
Command-line tools to support meta-analysis using a library managed in Zotero
☆11 · Feb 9, 2023 · Updated 3 years ago
kermitt2 / Pub2TEI
Service for converting and enhancing heterogeneous publisher XML formats into TEI
☆65 · Apr 12, 2026 · Updated 3 months ago
Pecners / quartotemplate
☆21 · Oct 7, 2022 · Updated 3 years ago
Open-Models / Base
Brick of Knowledge on Open Models: Open Source, Open Science, Open Education, Open Collaboration, Open Hardware...
☆17 · Jun 27, 2026 · Updated 3 weeks ago
cverluise / PatCit
Making Patent Citations Uncool Again
☆113 · Jun 11, 2023 · Updated 3 years ago
UCSBCarpentry / reproducible-publications-quarto
Introduction to Reproducible Publications with Quarto
☆11 · Jan 28, 2025 · Updated last year
rtrelease / Jetson-Symbolics-Neuromorphics
Integrating Symbolic Programming and Neuromorphic Modeling for Edge Labs with NVIDIA Jetson, DGX Spark, and GPU-based DNN/ML Systems
☆16 · Jul 10, 2026 · Updated last week
kermitt2 / delft
A Deep Learning Framework for Text: https://delft.readthedocs.io/
☆416 · Updated this week
titipata / scipdf_parser
Python PDF parser for scientific publications: content and figures
☆455 · Mar 21, 2024 · Updated 2 years ago
papercast-dev / papercast
A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO…
☆53 · Mar 17, 2025 · Updated last year
grobidOrg / grobid-ner
A Named-Entity Recogniser based on Grobid.
☆55 · May 14, 2025 · Updated last year
ute / search-replace
Quarto filter extension for simple search-replace macros
☆27 · Dec 29, 2025 · Updated 6 months ago
spupyrev / gmap
GMap: Graph-to-Map visualization tool
☆22 · Jun 11, 2021 · Updated 5 years ago
Zettelkasten-Method / macOS-Tag-Converter
Convert Finder/Spotlight metadata tags to #hashtags as part of the file's contents
☆56 · Mar 16, 2018 · Updated 8 years ago
internetarchive / fatcat
Perpetual Access To The Scholarly Record
☆121 · Jul 31, 2024 · Updated last year
r-universe-org / cranlike-server
High-performance R package server
☆29 · Jul 1, 2026 · Updated 3 weeks ago
yqhuang2912 / knowtate
Knowtate is a sophisticated platform designed to elevate your academic research experience. Seamlessly blend reading, note-taking with ma…
☆12 · Sep 19, 2024 · Updated last year
greenelab / crossref
Download metadata for all DOIs using the Crossref API
☆66 · Sep 25, 2018 · Updated 7 years ago