IllDepence / unarXive
A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network
☆285Updated 5 months ago
Alternatives and similar repositories for unarXive:
Users who are interested in unarXive are comparing it to the repositories listed below
- ☆85Updated 10 months ago
- Pretraining Efficiently on S2ORC!☆158Updated 5 months ago
- Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)☆359Updated 11 months ago
- multimodal document analysis☆164Updated 9 months ago
- SciRepEval benchmark training and evaluation scripts☆73Updated 10 months ago
- Dataset accompanying the SPECTER model☆133Updated 2 years ago
- SPECTER: Document-level Representation Learning using Citation-informed Transformers☆539Updated last year
- Get answers to research questions from 200M+ papers.☆206Updated last year
- This repository provides details and links to the ACL anthology corpus/collection including .bib, .pdf and grobid extractions of the pdfs☆178Updated last year
- S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/☆889Updated 11 months ago
- Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper)☆67Updated 2 years ago
- Repo for Aspire - A scientific document similarity model based on matching fine-grained aspects of scientific papers.☆51Updated last year
- Data and models for the SciFact verification task.☆228Updated last year
- Incorporating VIsual LAyout Structures for Scientific Text Classification☆175Updated 2 years ago
- Dataset collection and preprocessing framework for NLP extreme multitask learning☆176Updated 2 months ago
- ☆182Updated last year
- Reproduce results and replicate training for T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)☆463Updated 2 years ago
- potato: portable text annotation tool☆324Updated this week
- [EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627☆478Updated 5 months ago
- Interpretable Evaluation for AI Systems☆363Updated 2 years ago
- The original implementation of Min et al. "Nonparametric Masked Language Modeling" (paper: https://arxiv.org/abs/2212.01349)☆157Updated 2 years ago
- Search Engines with Autoregressive Language models☆283Updated last year
- Pipeline for pulling and processing online language model pretraining data from the web☆177Updated last year
- A framework for few-shot evaluation of autoregressive language models.☆103Updated last year
- A set of scripts to grab public datasets from resources related to arXiv☆432Updated 10 months ago
- Measuring the Evolution of a Scientific Field through Citation Frames☆55Updated 6 years ago
- [ACL 2022] LinkBERT: A Knowledgeable Language Model 😎 Pretrained with Document Links☆433Updated 2 years ago
- Tk-Instruct is a Transformer model that is tuned to solve many NLP tasks by following instructions.☆179Updated 2 years ago
- ☆65Updated last year
- Reverse Instructions to generate instruction tuning data with corpus examples☆208Updated last year