Shaping capabilities with token-level pretraining data filtering
☆80 · Jan 28, 2026 · Updated last month
Alternatives and similar repositories for token-filtering
Users that are interested in token-filtering are comparing it to the libraries listed below
- coded with and corrected by Google Anti-Gravity ☆13 · Nov 23, 2025 · Updated 3 months ago
- A toy text-to-image model trained from scratch. ☆19 · Jun 9, 2025 · Updated 8 months ago
- A tiny package supporting distributed computation of COCO metrics for PyTorch models. ☆15 · Feb 28, 2023 · Updated 2 years ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆28 · Feb 9, 2026 · Updated 2 weeks ago
- An implementation of Anthropic's paper and essay on "A statistical approach to model evaluations" ☆17 · Oct 6, 2025 · Updated 4 months ago
- implementation of https://arxiv.org/pdf/2312.09299 ☆21 · Jul 3, 2024 · Updated last year
- Fork of Flame repo for training of some new stuff in development ☆19 · Feb 20, 2026 · Updated last week
- ☆40 · Oct 2, 2025 · Updated 4 months ago
- Experiments to assess SPADE on different LLM pipelines. ☆17 · Apr 7, 2024 · Updated last year
- Research work aimed at addressing the problem of modeling infinite-length context ☆46 · Dec 18, 2025 · Updated 2 months ago
- Load any clip model with a standardized interface ☆22 · Oct 20, 2025 · Updated 4 months ago
- Re-implementation of local descriptor HardNet training in fasta2+kornia ☆21 · Apr 6, 2020 · Updated 5 years ago
- ☆22 · Apr 28, 2025 · Updated 10 months ago
- ☆39 · Oct 31, 2025 · Updated 3 months ago
- Demos of ChatGPT's function calling/structured data support. ☆24 · Dec 21, 2023 · Updated 2 years ago
- Fluent student-teacher redteaming ☆23 · Jul 25, 2024 · Updated last year
- Simple and scalable tools for data-driven pretraining data selection. ☆29 · Jun 9, 2025 · Updated 8 months ago
- A collection of lightweight interpretability scripts to understand how LLMs think ☆89 · Updated this week
- mlctl is the control plane for MLOps. It provides a CLI and a Python SDK for supporting key operations related to MLOps, such as "model t… ☆25 · Aug 23, 2021 · Updated 4 years ago
- Sequence models in Numpy ☆25 · Oct 9, 2020 · Updated 5 years ago
- ☆37 · Sep 21, 2025 · Updated 5 months ago
- Cross Atlas Remapping via Optimal Transport ☆12 · Dec 14, 2023 · Updated 2 years ago
- Easier CUSUM control charts. Returns simple CUSUM statistics, CUSUMs with control limit calculations, and function to generate faceted … ☆26 · Nov 22, 2024 · Updated last year
- This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks. ☆34 · Oct 28, 2025 · Updated 3 months ago
- entropix style sampling + GUI ☆27 · Oct 30, 2024 · Updated last year
- ☆78 · Nov 12, 2024 · Updated last year
- Exploring mrdbourke's awesome 🔥 course on Deep Learning using Tensorflow. Beautiful Jupyter notebooks, a well structured installable Pyt… ☆29 · Nov 3, 2021 · Updated 4 years ago
- Inference API for many LLMs and other useful tools for empirical research ☆107 · Feb 16, 2026 · Updated last week
- Composable inference algorithms with LLMs and programmable logic ☆69 · Dec 4, 2024 · Updated last year
- Plan✕ is a platform for creating and publishing digital planning services ☆17 · Updated this week
- Tools for visualizing and comparing data from vertebrate retinas ☆14 · Jan 20, 2025 · Updated last year
- ☆12 · Oct 7, 2020 · Updated 5 years ago
- Jax like function transformation engine but micro, microjax ☆34 · Oct 25, 2024 · Updated last year
- Minimal Differentiable Image Reward Functions ☆107 · Aug 19, 2025 · Updated 6 months ago
- 📦 A collection of pastable code gathered from past projects ☆12 · Sep 9, 2024 · Updated last year
- Example application for creating an MVC Express + Node + TypeScript app and deploying it to Azure ☆10 · Nov 8, 2018 · Updated 7 years ago
- This repository contains the Parasol processor, which enables next-generation privacy preserving applications. Users can run arbitrary co… ☆11 · Jan 5, 2026 · Updated last month
- Project exploring 3D volumetric rendering of NEXRAD radar data. ☆11 · Oct 23, 2023 · Updated 2 years ago
- Official implementation for "Law of the Weakest Link: Cross capabilities of Large Language Models" ☆43 · Oct 1, 2024 · Updated last year