neilrathi/token-filtering



Shaping capabilities with token-level pretraining data filtering

☆94

Alternatives and similar repositories for token-filtering

Users interested in token-filtering are comparing it to the repositories listed below.


JanTempus / tokenisation_lp
☆15 · May 20, 2026 · Updated 2 months ago
davidbrandfonbrener / color-filter-olmo
☆13 · Dec 12, 2025 · Updated 7 months ago
jennhu / lm-pragmatics
Code and data for "A fine-grained comparison of pragmatic language understanding in humans and language models"
☆11 · Dec 14, 2022 · Updated 3 years ago
EleutherAI / deep-ignorance
☆20 · Jan 7, 2026 · Updated 6 months ago
EvanZhuang / knowledge_flow
Official Implementation of Knowledge Flow Prompting
☆35 · Oct 20, 2025 · Updated 9 months ago
hkust-nlp / model-task-align-rl
[ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".
☆18 · Feb 9, 2026 · Updated 5 months ago
jennhu / metalinguistic-prompting
Materials for "Prompting is not a substitute for probability measurements in large language models" (EMNLP 2023)
☆24 · Oct 24, 2023 · Updated 2 years ago
Sphere-AI-Lab / FormalMATH-Bench
Repository of <FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models>
☆75 · Jan 8, 2026 · Updated 6 months ago
SeunggeunKimkr / PRISM
[ICML 2026] Public repository for fine-tuning Masked Diffusion Models toward provable self-correction.
☆26 · Jul 5, 2026 · Updated 2 weeks ago
g-luo / generative_latent_prior
Official PyTorch Implementation for Learning a Generative Meta-Model of LLM Activations, ICML 2026
☆90 · Apr 30, 2026 · Updated 2 months ago
krandiash / quinine
A library to create and manage configuration files, especially for machine learning projects.
☆79 · Mar 14, 2022 · Updated 4 years ago
TransluceAI / introspective-interp
Repository for "Training Language Models To Explain Their Own Computations"
☆23 · Jul 7, 2026 · Updated 2 weeks ago
lhoestq / hfjobs
Hugging Face Jobs
☆20 · Jul 11, 2025 · Updated last year
Interplay-LM-Reasoning / Interplay-LM-Reasoning
[ICML 2026 Spotlight] On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
☆162 · Jun 8, 2026 · Updated last month
UW-Madison-Lee-Lab / ReJump
☆20 · May 26, 2026 · Updated last month
allenai / olmes
Reproducible, flexible LLM evaluations
☆388 · Mar 24, 2026 · Updated 3 months ago
Phylliida / MambaLens
Mamba support for TransformerLens
☆20 · Sep 17, 2024 · Updated last year
SonicCodes / subcloning
Implementation of https://arxiv.org/pdf/2312.09299
☆21 · Jul 3, 2024 · Updated 2 years ago
tylerachang / goldfish
Goldfish: Monolingual language models for 350 languages.
☆27 · Mar 4, 2026 · Updated 4 months ago
naotoo1 / Beyond-Neural-Scaling
Implementation of Beyond Neural Scaling beating power laws for deep models and prototype-based models
☆35 · Oct 30, 2025 · Updated 8 months ago
neelnanda-io / neel-plotly
A very hacky set of functions for getting plotly to do what I want when doing mech interp research, designed to be compatible with PyTorc…
☆15 · Jun 16, 2023 · Updated 3 years ago
naver-ai / hype
[ECCV 2024] Official PyTorch implementation of "HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts"
☆20 · Nov 22, 2024 · Updated last year
allenai / olmix
☆41 · May 26, 2026 · Updated last month
Dynamics-of-Neural-Systems-Lab / cdc-fm
A flow matching model that uses data-driven geometric noise to improve the quality-generalisation tradeoff and to reduce memorisation.
☆19 · Apr 7, 2026 · Updated 3 months ago
RadicalNumerics / spear
Structured Primitives for Efficient Architecture Research
☆20 · Dec 22, 2025 · Updated 6 months ago
apartresearch / DarkBench
Benchmarking Dark Patterns in LLMs (ICLR 2025)
☆18 · Mar 29, 2025 · Updated last year
huggingface / feel
☆15 · May 26, 2026 · Updated last month
safety-research / safety-tooling
Inference API for many LLMs and other useful tools for empirical research
☆133 · May 29, 2026 · Updated last month
AIRC-KETI / Korean-Copora
☆14 · Dec 9, 2021 · Updated 4 years ago
r-three / realistic_evaluation_of_model_merging_for_compositional_generalization
☆12 · Feb 11, 2026 · Updated 5 months ago
PRIME-RL / RL-Compositionality
FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
☆68 · Jan 26, 2026 · Updated 5 months ago
microsoft / data-efficacy
Data Efficacy for Language Model Training
☆52 · May 29, 2026 · Updated last month
scaleapi / propensity-evaluation
Open-source code for propensity evaluation
☆19 · Apr 25, 2026 · Updated 2 months ago
AMD-AGI / torchtitan-amd
A PyTorch native platform for training generative AI models
☆17 · Jun 30, 2026 · Updated 3 weeks ago
ndif-team / nnterp
Unified access to Large Language Model modules using NNsight
☆116 · Jul 2, 2026 · Updated 2 weeks ago
RUCAIBox / MPOP
☆13 · Jun 16, 2021 · Updated 5 years ago
tilde-research / aurora-release
Aurora optimizer release
☆150 · Updated this week
manifoldmarkets / manifund
☆13 · Updated this week
NielsRogge / coco-eval
A tiny package supporting distributed computation of COCO metrics for PyTorch models.
☆15 · Feb 28, 2023 · Updated 3 years ago