Efficiently computing & storing token n-grams from large corpora
☆26 · Oct 6, 2024 · Updated last year
Alternatives and similar repositories for tokengrams
Users who are interested in tokengrams are comparing it to the libraries listed below.
- Mapping out the "memory" of neural nets with data attribution ☆45 · Updated this week
- See https://github.com/cuda-mode/triton-index/ instead! ☆11 · May 8, 2024 · Updated last year
- ☆13 · Dec 15, 2025 · Updated 2 months ago
- ☆17 · Aug 30, 2025 · Updated 6 months ago
- ☆17 · Updated this week
- Minimum Description Length probing for neural network representations ☆20 · Jan 28, 2025 · Updated last year
- Pile Deduplication Code ☆18 · May 15, 2023 · Updated 2 years ago
- URL downloader supporting checkpointing and continuous checksumming. ☆19 · Nov 29, 2023 · Updated 2 years ago
- ☆23 · Jan 27, 2025 · Updated last year
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind) ☆20 · Jan 19, 2025 · Updated last year
- Landing page for MIB: A Mechanistic Interpretability Benchmark ☆24 · Aug 15, 2025 · Updated 6 months ago
- Utilities for PyTorch distributed ☆25 · Feb 27, 2025 · Updated last year