allenai / pdf-component-library
☆79 Updated last year
Alternatives and similar repositories for pdf-component-library
Users who are interested in pdf-component-library are comparing it to the libraries listed below.
- Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions. ☆239 Updated 7 months ago
- Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON) ☆434 Updated last year
- This is a public repository to enable researchers to begin their journey of self-hosting data from Semantic Scholar. ☆42 Updated 10 months ago
- ☆87 Updated 10 months ago
- Get answers to research questions from 200M+ papers. ☆206 Updated last year
- 🗺️ Data Cleaning and Textual Data Visualization 🗺️ ☆187 Updated 3 months ago
- 📄 ⚙️ ETL processes for medical and scientific papers ☆399 Updated last month
- SciRepEval benchmark training and evaluation scripts ☆76 Updated last year
- ☆96 Updated last year
- A spaCy wrapper for GliNER ☆118 Updated 7 months ago
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ☆100 Updated 4 months ago
- Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM (CHI 2024 paper). LLooM automatically surfaces high-l… ☆124 Updated 3 months ago
- ☆102 Updated last year
- Python client for GROBID Web services ☆358 Updated this week
- library supporting NLP and CV research on scientific papers ☆782 Updated 10 months ago
- A high performance bibliographic information service: https://biblio-glutton.readthedocs.io ☆144 Updated 2 months ago
- Convert all of libgen to high quality markdown ☆253 Updated last year
- PDF parser powered by grobid ☆28 Updated last year
- multimodal document analysis ☆166 Updated last year
- automatic sentence highlights based on their significance to the document ☆191 Updated last year
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆52 Updated 6 months ago
- A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network ☆296 Updated 11 months ago
- Generalist and Lightweight Model for Relation Extraction (Extract any relationship types from text) ☆238 Updated 3 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆437 Updated last year
- Repository for the research work "Ontology Generation using Large Language Models", presented at ESWC 2025. ☆15 Updated last month
- Viewer for the structure extracted by Grobid on PDF documents ☆54 Updated 4 months ago
- Benchmarking PDF libraries ☆310 Updated 2 months ago
- A collection of datasets and other resources for legal text processing. ☆121 Updated 2 weeks ago
- A dataset for pretraining language models targeted for legal tasks. ☆139 Updated 3 years ago
- Code and data for the paper 'The impact of founder personalities on startup success' ☆17 Updated this week