allenai / pdf-component-library
☆65 Updated last year
Alternatives and similar repositories for pdf-component-library:
Users who are interested in pdf-component-library are comparing it to the libraries listed below.
- Factored Cognition Primer: How to write compositional language model programs ☆48 Updated 2 years ago
- SciRepEval benchmark training and evaluation scripts ☆72 Updated 9 months ago
- A spaCy wrapper for GliNER ☆108 Updated last month
- Pretraining Efficiently on S2ORC! ☆156 Updated 4 months ago
- multimodal document analysis ☆163 Updated 9 months ago
- This is a public repository to enable researchers to begin their journey of self-hosting data from Semantic Scholar. ☆40 Updated 4 months ago
- Python API for https://vespa.ai, the open big data serving engine ☆116 Updated this week
- 🗺️ Data Cleaning and Textual Data Visualization 🗺️ ☆163 Updated 8 months ago
- WikiSP, a semantic parser for Wikidata. WikiWebQuestions, a SPARQL-annotated dataset on Wikidata ☆92 Updated 4 months ago
- ☆34 Updated last year
- 🦦 weasel: A small and easy workflow system ☆75 Updated 8 months ago
- Open Access PDF harvester, metadata aggregator and full-text ingester ☆60 Updated 10 months ago
- ☆91 Updated 9 months ago
- PDF parser powered by grobid ☆25 Updated 7 months ago
- Scrollership through 20m pubmed abstracts. ☆26 Updated last year
- ☆84 Updated 9 months ago
- Analyzing and scoring reasoning traces of LLMs ☆44 Updated 6 months ago
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection ☆100 Updated last year
- Viewer for the structure extracted by Grobid on PDF documents ☆46 Updated last month
- Logical structure analysis for visually structured documents ☆86 Updated 2 years ago
- The AI Knowledge Editor ☆183 Updated 2 years ago
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ☆84 Updated 7 months ago
- Semantic search engine indexing 110 million academic publications ☆80 Updated this week
- Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON) ☆358 Updated 11 months ago
- ☆33 Updated last year
- S2APLER: S2 Agglomeration of Papers with Low Error Rate (it's for academic paper clustering) ☆16 Updated last year
- An easy way to chunk spaCy docs. ☆19 Updated 7 months ago
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆104 Updated 9 months ago
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling. ☆56 Updated 7 months ago