allenai / peS2o
Pretraining Efficiently on S2ORC!
☆138 · Updated last month
Related projects
Alternatives and complementary repositories for peS2o
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆193 · Updated last week
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆122 · Updated 8 months ago
- Dataset collection and preprocessing framework for NLP extreme multitask learning ☆149 · Updated 4 months ago
- This project studies the performance and robustness of language models and task-adaptation methods. ☆141 · Updated 6 months ago
- A framework for few-shot evaluation of autoregressive language models. ☆102 · Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆44 · Updated last year
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆61 · Updated 4 months ago
- The official repository for the paper Efficient Long-Text Understanding Using Short-Text Models (Ivgi et al., 2022) ☆67 · Updated last year
- Code for "SemDeDup", a simple method for identifying and removing semantic duplicates from a dataset (data pairs which are semantically s… ☆112 · Updated last year
- Code of ICLR paper: https://openreview.net/forum?id=-cqvvvb-NkI ☆91 · Updated last year
- Scalable training for dense retrieval models. ☆271 · Updated last year
- Comprehensive benchmark for RAG ☆39 · Updated 2 weeks ago
- Minimal PyTorch implementation of BM25 (with sparse tensors) ☆90 · Updated 8 months ago
- The original implementation of Min et al. "Nonparametric Masked Language Modeling" (paper: https://arxiv.org/abs/2212.01349) ☆157 · Updated last year
- DSIR large-scale data selection framework for language model training ☆231 · Updated 7 months ago
- Inquisitive Parrots for Search ☆179 · Updated 8 months ago
- Retrieval as Attention ☆83 · Updated last year
- Code and data accompanying our arXiv paper "Faithful Chain-of-Thought Reasoning". ☆155 · Updated 6 months ago
- Retrieval-Augmented Generation battle! ☆45 · Updated last week
- Code and data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆78 · Updated 3 months ago
- SILO Language Models code repository ☆80 · Updated 9 months ago
- [EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning ☆214 · Updated last year