EleutherAI / pile_dedupe
Pile Deduplication Code
☆19 · Updated 2 years ago
Alternatives and similar repositories for pile_dedupe
Users who are interested in pile_dedupe are comparing it to the repositories listed below.
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training · ☆21 · Updated 9 months ago
- [NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective · ☆31 · Updated last year
- Code for paper 'Data-Efficient FineTuning' · ☆29 · Updated 2 years ago
- ☆50 · Updated last year
- ☆28 · Updated last year
- DEMix Layers for Modular Language Modeling · ☆53 · Updated 3 years ago
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?" · ☆57 · Updated 2 years ago
- Repo for the paper "Large Language Models Struggle to Learn Long-Tail Knowledge" · ☆77 · Updated 2 years ago
- ☆11 · Updated last year
- Code for M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models · ☆23 · Updated 10 months ago
- Retrieval as Attention · ☆82 · Updated 2 years ago