kermitt2 / grobid_client_python

Python client for GROBID Web services

☆362

Alternatives and similar repositories for grobid_client_python

Users who are interested in grobid_client_python are comparing it to the libraries listed below.

allenai / s2orc-doc2json
Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)
☆439 Updated last year
titipata / scipdf_parser
Python PDF parser for scientific publications: content and figures
☆433 Updated last year
allenai / spv2
Science-parse version 2
☆247 Updated 5 years ago
allenai / s2orc
S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/
☆974 Updated last year
allenai / specter
SPECTER: Document-level Representation Learning using Citation-informed Transformers
☆557 Updated 2 years ago
allenai / science-parse
Science Parse parses scientific papers (in PDF form) and returns them in structured form.
☆674 Updated last year
allenai / vila
Incorporating VIsual LAyout Structures for Scientific Text Classification
☆180 Updated 2 years ago
allenai / pdffigures2
Given a scholarly PDF, extract figures, tables, captions, and section titles.
☆681 Updated last year
allenai / scirepeval
SciRepEval benchmark training and evaluation scripts
☆76 Updated last year
allenai / SPECTER2
☆103 Updated last year
danielnsilva / semanticscholar
Unofficial Python client library for Semantic Scholar APIs.
☆405 Updated 2 weeks ago
J535D165 / pyalex
A Python library for OpenAlex (openalex.org)
☆290 Updated 2 months ago
allenai / pawls
Software that makes labeling PDFs easy.
☆420 Updated last year
IllDepence / unarXive
A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network
☆296 Updated last year
mattbierbaum / arxiv-public-datasets
A set of scripts to grab public datasets from resources related to arXiv
☆466 Updated last year
gipplab / pdf-benchmark
A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-Domain Evaluation Framework for Academic Documents
☆26 Updated 2 years ago
allenai / mmda
multimodal document analysis
☆167 Updated last year
CeON / CERMINE
Content ExtRactor and MINEr
☆504 Updated 3 years ago
allenai / scidocs
Dataset accompanying the SPECTER model
☆139 Updated 2 years ago
allenai / s2-folks
Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.
☆242 Updated 8 months ago
kermitt2 / pdfalto
PDF to XML ALTO file converter
☆254 Updated 3 weeks ago
kermitt2 / Pub2TEI
Service for converting and enhancing heterogeneous publisher XML formats into TEI
☆57 Updated last year
WING-NUS / Neural-ParsCit
Neuralized version of the Reference String Parser component of the ParsCit package.
☆81 Updated 3 years ago
allenai / papermage
library supporting NLP and CV research on scientific papers
☆784 Updated 10 months ago
allenai / deepfigures-open
Companion code to the paper "Extracting Scientific Figures with Distantly Supervised Neural Networks" 🤖
☆142 Updated 3 years ago
fabiobatalha / crossrefapi
A python library that implements the Crossref API.
☆322 Updated 2 months ago
kermitt2 / biblio-glutton
A high performance bibliographic information service: https://biblio-glutton.readthedocs.io
☆144 Updated 3 months ago
amazon-science / ReFinED
ReFinED is an efficient and accurate entity linking (EL) system.
☆219 Updated 9 months ago
ad-freiburg / pdfact
A basic tool that extracts the structure from the PDF files of scientific articles.
☆75 Updated 3 years ago
IBM / science-result-extractor
☆93 Updated 3 years ago