kermitt2/datastet

Finding mentions and citations to named and implicit research datasets from within the academic literature

☆31

Alternatives and similar repositories for datastet

Users who are interested in datastet are comparing it to the libraries listed below.

kermitt2 / grisp
Knowledge Base stuff
☆23 · Mar 1, 2026 · Updated 4 months ago
kermitt2 / grobid-astro
A machine learning software for extracting astronomical entities from scholarly documents
☆10 · Oct 31, 2022 · Updated 3 years ago
softcite / software-mentions
Softcite software mention recognizer, finding mentions and citations to software from within the academic literature
☆85 · Jun 6, 2026 · Updated last month
DataSeer / dataseer-ml
DataSeer machine-learning service
☆28 · Sep 4, 2025 · Updated 10 months ago
georgetown-cset / 1790-ai-patent-data
☆20 · Jun 2, 2022 · Updated 4 years ago
cverluise / openPatstat
Load, build and explore Patstat using the Google Cloud Platform
☆10 · Jan 19, 2019 · Updated 7 years ago
kermitt2 / biblio_glutton_harvester
Open Access PDF harvester
☆42 · May 3, 2024 · Updated 2 years ago
kermitt2 / arxiv_harvester
Poor man's simple harvester for arXiv resources
☆14 · Jul 14, 2023 · Updated 3 years ago
com3dian / Grobidmonkey
The grobidmonkey package is an open-source package designed for postprocessing GROBID outputs.
☆12 · Mar 27, 2024 · Updated 2 years ago
anHALytics / anhalytics-core
Analytic platform for the HAL research archive (in development)
☆12 · Oct 2, 2020 · Updated 5 years ago
laurentromary / stdfSpec
Specification of a stand-off element for the TEI guidelines
☆12 · Apr 29, 2021 · Updated 5 years ago
kermitt2 / biblio-glutton-extension
A browser extension providing Open Access bibliographical services
☆18 · Dec 9, 2022 · Updated 3 years ago
lfoppiano / grobid-quantities
GROBID extension for identifying and normalizing physical quantities.
☆85 · Apr 8, 2026 · Updated 3 months ago
kermitt2 / article_dataset_builder
Open Access PDF harvester, metadata aggregator and full-text ingester
☆62 · May 3, 2024 · Updated 2 years ago
lfoppiano / grobid-superconductors
Grobid module for superconductor material and properties extraction
☆23 · May 17, 2025 · Updated last year
istex-archives / istex-browser-extension
ISTEX button: a web extension able to dynamically insert, into the web page being viewed, a link to the full text of a document if the latter…
☆11 · May 30, 2023 · Updated 3 years ago
softcite / softcite_kb
A Knowledge Base for research software relying on large-scale text mining and curated knowledge sources
☆18 · May 14, 2023 · Updated 3 years ago
kermitt2 / biblio-glutton
A high performance bibliographic information service: https://biblio-glutton.readthedocs.io
☆150 · Apr 8, 2026 · Updated 3 months ago
ottowg / gsap-ner
☆10 · Oct 2, 2024 · Updated last year
kermitt2 / Pub2TEI
Service for converting and enhancing heterogeneous publisher XML formats into TEI
☆65 · Apr 12, 2026 · Updated 3 months ago
neuged / webanno_tsv
A small Python library to parse and write TSV files generated by the WebAnno software.
☆11 · Apr 14, 2025 · Updated last year
PierreSenellart / theoremkb
Collection of tools to extract semantic information from (mathematical) research articles
☆24 · Updated this week
danilo-dessi / SKG-pipeline
☆21 · May 1, 2025 · Updated last year
ScienciaLAB / document-qa
Scientific Document Insight Q/A
☆37 · Jun 7, 2026 · Updated last month
thatandromeda / hamlet
How About Machine Learning Enhancing Theses? - a pilot discovery project
☆14 · May 23, 2023 · Updated 3 years ago
malteos / scincl
Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper)
☆79 · Dec 29, 2025 · Updated 6 months ago
vwoloszyn / diaa
Inter-annotator agreement for Doccano
☆28 · May 3, 2020 · Updated 6 years ago
miroozyx / BERT_with_keras
A Keras version of Google's BERT model
☆35 · Nov 4, 2019 · Updated 6 years ago
howisonlab / softcite-dataset
A gold-standard dataset of software mentions in research publications.
☆39 · Jul 27, 2023 · Updated 2 years ago
opencitations / cec
Citation Extraction and Classifier
☆16 · Apr 18, 2026 · Updated 3 months ago
istex-archives / sisyphe
Sisyphe is a modulable NodeJS BIG-DATA analyser & transformer
☆12 · Oct 16, 2023 · Updated 2 years ago
pjox / gutf
Terminal tool that converts files encoding to UTF-8
☆10 · Oct 5, 2019 · Updated 6 years ago
lfoppiano / SuperMat
Superconductors material dataset
☆28 · Dec 5, 2023 · Updated 2 years ago
grobidOrg / grobid-client-python
Python client for GROBID Web services
☆410 · Mar 5, 2026 · Updated 4 months ago
allenai / pybart
Converter from UD-trees to BART representation
☆35 · Mar 6, 2024 · Updated 2 years ago
webis-de / scidata22-stereo-scientific-text-reuse
☆11 · Dec 2, 2024 · Updated last year
bio-ontology-research-group / Onto2Graph
Generating graph structures from OWL ontologies
☆12 · Nov 21, 2017 · Updated 8 years ago
kermitt2 / pdfalto
PDF to XML ALTO file converter
☆272 · Updated this week
lfoppiano / material-parsers
Material parsers and other tools and scripts, initially developed for Grobid Superconductor
☆14 · Feb 21, 2025 · Updated last year