EleutherAI / pile_dedupe
Pile Deduplication Code
☆17 · Updated last year
Alternatives and similar repositories for pile_dedupe:
Users who are interested in pile_dedupe are comparing it to the repositories listed below.
- Repo for the paper "Large Language Models Struggle to Learn Long-Tail Knowledge" ☆77 · Updated last year
- ☆48 · Updated 11 months ago
- [NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective ☆30 · Updated last year
- DEMix Layers for Modular Language Modeling ☆53 · Updated 3 years ago
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training ☆20 · Updated 7 months ago