allenai / pdf-component-library
☆59 Updated last year
Alternatives and similar repositories for pdf-component-library:
Users who are interested in pdf-component-library are comparing it to the libraries listed below.
- SciRepEval benchmark training and evaluation scripts ☆72 Updated 9 months ago
- This is a public repository to enable researchers to begin their journey of self-hosting data from Semantic Scholar. ☆40 Updated 3 months ago
- ☆84 Updated 9 months ago
- ☆33 Updated last year
- Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions. ☆215 Updated 3 weeks ago
- ☆33 Updated last year
- PDF parser powered by grobid ☆25 Updated 6 months ago
- multimodal document analysis ☆162 Updated 8 months ago
- A Python library for OpenAlex (openalex.org) ☆198 Updated last week
- Open Access PDF harvester, metadata aggregator and full-text ingester ☆59 Updated 9 months ago
- Factored Cognition Primer: How to write compositional language model programs ☆48 Updated last year
- A high performance bibliographic information service: https://biblio-glutton.readthedocs.io ☆132 Updated 5 months ago
- Pretraining Efficiently on S2ORC! ☆156 Updated 3 months ago
- S2APLER: S2 Agglomeration of Papers with Low Error Rate (it's for academic paper clustering) ☆16 Updated last year
- Python client for GROBID Web services ☆308 Updated 3 weeks ago
- The Semantic Scholar Search Reranker ☆104 Updated 4 years ago
- ☆90 Updated 8 months ago
- MultiCite code and data. Models are available on Huggingface. ☆29 Updated 2 years ago
- WikiSP, a semantic parser for Wikidata. WikiWebQuestions, a SPARQL-annotated dataset on Wikidata ☆89 Updated 4 months ago
- The guts for computing data for OpenAlex. For more, see https://openalex.org/. ☆129 Updated last week
- Multidocument Summarization for Literature Review Shared Task 2022 ☆28 Updated 2 years ago
- Python API for https://vespa.ai, the open big data serving engine ☆113 Updated this week
- Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper) ☆67 Updated 2 years ago
- Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON) ☆356 Updated 10 months ago
- Efficient few-shot learning with cross-encoders. ☆48 Updated last year
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection ☆101 Updated last year
- A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network ☆285 Updated 4 months ago
- A spaCy wrapper for GliNER ☆107 Updated 3 weeks ago
- Completion After Prompt Probability. Make your LLM make a choice ☆74 Updated 3 months ago
- 📄 ⚙️ ETL processes for medical and scientific papers ☆374 Updated last month