kermitt2/arxiv_harvester

Poor man's simple harvester for arXiv resources

☆14

Alternatives and similar repositories for arxiv_harvester

Users who are interested in arxiv_harvester are comparing it to the repositories listed below.

kermitt2 / biblio-glutton-extension
A browser extension providing Open Access bibliographical services
☆18 · Dec 9, 2022 · Updated 3 years ago
kermitt2 / biblio_glutton_harvester
Open Access PDF harvester
☆42 · May 3, 2024 · Updated 2 years ago
kermitt2 / grobid-astro
A machine learning software for extracting astronomical entities from scholarly documents
☆10 · Oct 31, 2022 · Updated 3 years ago
kermitt2 / article_dataset_builder
Open Access PDF harvester, metadata aggregator and full-text ingester
☆62 · May 3, 2024 · Updated 2 years ago
Moradnejad / AgeDataset
Life, work, and mortality of 1.22M distinguished people
☆12 · Sep 13, 2022 · Updated 3 years ago
kermitt2 / xpdf-4.00
☆19 · Apr 6, 2021 · Updated 5 years ago
kermitt2 / grisp
Knowledge Base stuff
☆23 · Mar 1, 2026 · Updated 4 months ago
kermitt2 / grobid-example
Some examples of using Grobid in a third-party Java project.
☆20 · Jun 14, 2023 · Updated 3 years ago
softcite / softcite_kb
A Knowledge Base for research software relying on large-scale text mining and curated knowledge sources
☆18 · May 14, 2023 · Updated 3 years ago
hopsparser / hopsparser
A neural dependency parser that does its best
☆17 · Mar 6, 2026 · Updated 4 months ago
kermitt2 / datastet
Finding mentions and citations to named and implicit research datasets from within the academic literature
☆31 · Jun 14, 2025 · Updated last year
fniessen / gitboost
Discover a handpicked compilation of Git configuration settings and time-saving aliases. Enhance your productivity and simplify your work…
☆18 · Jul 10, 2026 · Updated 2 weeks ago
pmetzger / ShellTutorial
A thorough tutorial on using and programming with the Unix shell.
☆11 · Feb 27, 2020 · Updated 6 years ago
lfoppiano / grobid-superconductors
Grobid module for superconductor material and properties extraction
☆23 · May 17, 2025 · Updated last year
eleanorkonik / js_projs
Learning Javascript in Public
☆11 · Apr 16, 2021 · Updated 5 years ago
spupyrev / gmap
GMap: Graph-to-Map visualization tool
☆22 · Jun 11, 2021 · Updated 5 years ago
kermitt2 / Pub2TEI
Service for converting and enhancing heterogeneous publisher XML formats into TEI
☆65 · Apr 12, 2026 · Updated 3 months ago
LoicGrobol / zeldarose
Train transformer-based models.
☆28 · Apr 12, 2026 · Updated 3 months ago
YoannDupont / WiNER-fr
WiNER-fr is a free named entity corpus using French Wikinews texts.
☆17 · Feb 12, 2021 · Updated 5 years ago
tegridydev / abstract-agent
Locally hosted AI Agent Python Tool To Generate Novel Research Hypothesis + Titles + Abstracts
☆30 · Apr 30, 2025 · Updated last year
llZektorll / Microsoft-PowerShell-Fastlane
This repository contains a number of scripts that I have written or refactored to enhance its performance. All the scripts are meant to m…
☆20 · Mar 24, 2025 · Updated last year
lfoppiano / material-parsers
Material parsers and other tools and scripts, initially developed for Grobid Superconductor
☆14 · Feb 21, 2025 · Updated last year
flutrack / Flutrack.org_webapp_source_code
Flutrack platform gathers flu-related tweets from the entire world, with searching tag, words that are influenza synonyms and flu symptom…
☆13 · Apr 22, 2019 · Updated 7 years ago
llZektorll / Obsidian-SSOV
Obsidian SSOV (Student Start Obsidian Vault) is a project created for Obsidian October 2022 event with the theme "Back to School"
☆18 · Oct 27, 2022 · Updated 3 years ago
cverluise / openPatstat
Load, build and explore Patstat using the Google Cloud Platform
☆10 · Jan 19, 2019 · Updated 7 years ago
dsebastien / obsidian-update-time
Obsidian plugin that updates front matter to include creation and last update times
☆21 · Updated this week
grobidOrg / grobid-ner
A Named-Entity Recogniser based on Grobid.
☆55 · May 14, 2025 · Updated last year
EHRI / ehri-frontend
The EHRI project's portal interface.
☆15 · Jul 7, 2026 · Updated 3 weeks ago
hirmeos / entity-fishing-client-python
Repository hosting the common code for the entity-fishing clients
☆10 · May 18, 2026 · Updated 2 months ago
com3dian / Grobidmonkey
The grobidmonkey package is an open-source package designed for postprocessing GROBID outputs.
☆12 · Mar 27, 2024 · Updated 2 years ago
iskyzh / ml-gcn
Course project for CS410. Drug Molecular Toxicity Prediction with GCN + Cloud ML Infra.
☆10 · Apr 6, 2021 · Updated 5 years ago
pjox / gutf
Terminal tool that converts files encoding to UTF-8
☆10 · Oct 5, 2019 · Updated 6 years ago
anHALytics / anhalytics-core
Analytic platform for the HAL research archive (in development)
☆12 · Oct 2, 2020 · Updated 5 years ago
kermitt2 / biblio-glutton
A high performance bibliographic information service: https://biblio-glutton.readthedocs.io
☆150 · Apr 8, 2026 · Updated 3 months ago
laurentromary / stdfSpec
Specification of a stand-off element for the TEI guidelines
☆12 · Apr 29, 2021 · Updated 5 years ago
internetarchive / sandcrawler
Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki
☆28 · Jul 31, 2024 · Updated last year
microsoft / ARXGEN
Scripts to parse arxiv documents for NLP tasks
☆19 · Jun 12, 2023 · Updated 3 years ago
mlcommons / science
MLCommons Science benchmarking working group
☆14 · Apr 17, 2026 · Updated 3 months ago
istex-archives / istex-browser-extension
ISTEX button: a web extension that can dynamically insert, on the web page being viewed, a link to the full text of a document if the latt…
☆11 · May 30, 2023 · Updated 3 years ago