allenai / peS2o
Pretraining Efficiently on S2ORC!
☆164Updated 6 months ago
Alternatives and similar repositories for peS2o
Users who are interested in peS2o are comparing it to the repositories listed below
- This project studies the performance and robustness of language models and task-adaptation methods.☆150Updated 11 months ago
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers☆128Updated last year
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets☆218Updated 6 months ago
- ☆38Updated last year
- Dataset collection and preprocessing framework for NLP extreme multitask learning☆180Updated 4 months ago
- Retrieval-Augmented Generation battle!☆50Updated 4 months ago
- Scalable training for dense retrieval models.☆292Updated 2 months ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆48Updated last year
- ☆97Updated 2 years ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search☆84Updated 5 months ago
- ☆72Updated last year
- A framework for few-shot evaluation of autoregressive language models.☆103Updated 2 years ago
- SILO Language Models code repository☆81Updated last year
- Code and model release for the paper "Task-aware Retrieval with Instructions" by Asai et al.☆162Updated last year
- Inquisitive Parrots for Search☆191Updated last year
- multimodal document analysis☆164Updated 11 months ago
- Code for "SemDeDup", a simple method for identifying and removing semantic duplicates from a dataset (data pairs which are semantically s…☆136Updated last year
- Efficient Memory-Augmented Transformers☆34Updated 2 years ago
- Finetune mistral-7b-instruct for sentence embeddings☆81Updated last year
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling.☆59Updated 9 months ago
- A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network☆289Updated 7 months ago
- The official code repo for "Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations".☆82Updated last year
- LOFT: A 1 Million+ Token Long-Context Benchmark☆193Updated 3 weeks ago
- ☆72Updated last year
- ☆147Updated last year
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners☆115Updated 8 months ago
- Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks☆208Updated last year
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]☆140Updated 6 months ago
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding.☆40Updated last month
- Retrieval as Attention☆83Updated 2 years ago