EleutherAI/tokengrams

Efficiently computing & storing token n-grams from large corpora

☆28

Alternatives and similar repositories for tokengrams

Users who are interested in tokengrams are comparing it to the libraries listed below.

aflah02 / TokenSmith
A comprehensive toolkit for streamlining data editing, search, and inspection for large-scale language model training and interpretabilit…
☆21 · Oct 30, 2025 · Updated 8 months ago
EleutherAI / pile_dedupe
Pile Deduplication Code
☆18 · May 15, 2023 · Updated 3 years ago
JasonGross / guarantees-based-mechanistic-interpretability
☆18 · Updated this week
lacoco-lab / decompiling_transformers
Repo for Paper: Discovering Interpretable Algorithms by Decompiling Transformers to RASP
☆15 · May 25, 2026 · Updated last month
microsoft / implicitMemory
☆19 · Feb 12, 2026 · Updated 5 months ago
aaronmueller / MIB
Landing page for MIB: A Mechanistic Interpretability Benchmark
☆26 · Aug 15, 2025 · Updated 11 months ago
ejnnr / cupbearer
A library for mechanistic anomaly detection
☆22 · Jan 9, 2025 · Updated last year
EleutherAI / semantic-memorization
☆44 · Nov 17, 2024 · Updated last year
haileyschoelkopf / triton-index
See https://github.com/cuda-mode/triton-index/ instead!
☆11 · May 8, 2024 · Updated 2 years ago
unbiarirang / Fixed-Input-Parameterization
This repository contains the official code for the paper: "Prompt Injection: Parameterization of Fixed Inputs"
☆32 · Sep 13, 2024 · Updated last year
EleutherAI / mdl
Minimum Description Length probing for neural network representations
☆20 · Jan 28, 2025 · Updated last year
EleutherAI / bergson
Mapping out the "memory" of neural nets with data attribution
☆70 · Updated this week
pkunlp-icler / IKE
☆25 · Feb 27, 2023 · Updated 3 years ago
VE-FORBRYDERNE / mesh-transformer-jax
Fork of kingoflolz/mesh-transformer-jax with memory usage optimizations and support for GPT-Neo, GPT-NeoX, BLOOM, OPT and fairseq dense L…
☆22 · Nov 14, 2022 · Updated 3 years ago
AlexWan0 / infini-gram
An unofficial implementation of the Infini-gram model proposed by Liu et al. (2024)
☆33 · Jun 19, 2024 · Updated 2 years ago
samblouir / birdie
☆15 · Jun 8, 2026 · Updated last month
salesforce / simplification
☆23 · Jun 25, 2026 · Updated 3 weeks ago
tilde-research / activault
Engine for collecting, uploading, and downloading model activations
☆30 · Apr 2, 2025 · Updated last year
technion-cs-nlp / parametric-faithfulness
☆23 · Aug 30, 2025 · Updated 10 months ago
Jaded-Encoding-Thaumaturgy / vs-kernels
Kernel objects for scaling and format conversion within VapourSynth
☆12 · Nov 5, 2025 · Updated 8 months ago
lhoestq / hfjobs
Hugging Face Jobs
☆20 · Jul 11, 2025 · Updated last year
cadentj / caft
☆25 · Mar 30, 2026 · Updated 3 months ago
aws-samples / mammography-classification-workshop
☆12 · Jan 10, 2023 · Updated 3 years ago
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆20 · Jan 19, 2025 · Updated last year
EleutherAI / best-download
URL downloader supporting checkpointing and continuous checksumming.
☆19 · Nov 29, 2023 · Updated 2 years ago
Zce1112zslx / IKE
☆41 · Nov 30, 2023 · Updated 2 years ago
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆35 · Jun 11, 2025 · Updated last year
Sara-mibo / LRP_EncoderDecoder_GRU
Implementing LRP (Layer-wise Relevance Propagation) for a sequence-to-sequence model with GRU layers.
☆12 · Sep 8, 2023 · Updated 2 years ago
lucataco / serverless-template-flan-t5
Basic template for using Flan-t5 on Banana's serverless GPU platform. Ready for 1-Click deploy
☆11 · Jan 30, 2023 · Updated 3 years ago
ColCarroll / working_ml
Examples of applied machine learning
☆13 · Dec 27, 2017 · Updated 8 years ago
xuhaoxh / infini-gram-mini
☆57 · Sep 26, 2025 · Updated 9 months ago
chrismcguire / gobberish
Generates random utf-8 strings for fuzz testing character encoding problems
☆11 · Aug 21, 2015 · Updated 10 years ago
vaibhavjhawar / typst-cv-template1
A Typst Resume/CV template, inspired by Alessandro Plasmati's Graduate CV LaTeX template
☆22 · Dec 16, 2024 · Updated last year
nickjevershed / Time-serious
Automated journalism from data and time series analysis
☆11 · Mar 7, 2016 · Updated 10 years ago
chrislee973 / bible-semantic-search
☆17 · Mar 15, 2023 · Updated 3 years ago
EleutherAI / project-menu
See the issue board for the current status of active and prospective projects!
☆65 · Feb 12, 2022 · Updated 4 years ago
nesl / ExMatchina
A Deep Neural Network explanation-by-example library for generating meaningful explanations
☆18 · Nov 11, 2020 · Updated 5 years ago
mpuels / docker-py-kaldi-asr-and-model
STT Service based on Kaldi ASR
☆15 · Aug 17, 2018 · Updated 7 years ago
pnnl / GeoCLUSTER
GeoCLUSTER is a Python-based web application that provides a collection of interactive methods for streamlining the visualization of the …
☆17 · Feb 15, 2026 · Updated 5 months ago