Shaping capabilities with token-level pretraining data filtering
☆94 · Jan 28, 2026 · Updated 5 months ago
Alternatives and similar repositories for token-filtering
Users who are interested in token-filtering are comparing it to the libraries listed below.
- Implementation of https://arxiv.org/pdf/2312.09299 ☆21 · Jul 3, 2024 · Updated last year
- A tiny package supporting distributed computation of COCO metrics for PyTorch models. ☆15 · Feb 28, 2023 · Updated 3 years ago
- Does patch ordering affect context-limited vision transformers? ☆17 · Oct 10, 2025 · Updated 8 months ago
- A toy text-to-image model trained from scratch. ☆19 · Jun 9, 2025 · Updated last year
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆29 · Jun 22, 2026 · Updated last week
- Implementation of numerous Vision Transformers in Google's JAX and Flax. ☆22 · Aug 30, 2022 · Updated 3 years ago
- 🌾 Universal, customizable and deployable fine-grained evaluation for text generation. ☆24 · Apr 22, 2026 · Updated 2 months ago
- Research work aimed at addressing the problem of modeling infinite-length context ☆49 · Dec 18, 2025 · Updated 6 months ago
- Simple and scalable tools for data-driven pretraining data selection. ☆29 · Jun 9, 2025 · Updated last year
- Code for experiments on self-prediction as a way to measure introspection in LLMs ☆16 · Dec 10, 2024 · Updated last year
- Code for the paper "Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages" (N… ☆17 · Apr 13, 2025 · Updated last year
- Materials for EACL2024 tutorial: Transformer-specific Interpretability ☆66 · Mar 26, 2024 · Updated 2 years ago
- Load any clip model with a standardized interface ☆22 · Oct 20, 2025 · Updated 8 months ago
- ☆22 · Dec 3, 2021 · Updated 4 years ago
- ☆14 · Jun 24, 2024 · Updated 2 years ago
- MoE training for Me and You and maybe other people ☆392 · Mar 15, 2026 · Updated 3 months ago
- Train text generation model with JavaScript. ☆15 · Jul 14, 2024 · Updated last year
- Code and data for "A fine-grained comparison of pragmatic language understanding in humans and language models" ☆11 · Dec 14, 2022 · Updated 3 years ago
- Collections of RLxLM experiments using minimal codes ☆14 · Feb 17, 2025 · Updated last year
- A full-stack online music app, developed using MERN stack (React, Express.js, MongoDB) and Electron. Libraries including Tailwind CSS, Re… ☆10 · Jul 2, 2024 · Updated last year
- QLoRA: Efficient Finetuning of Quantized LLMs ☆11 · Jul 22, 2023 · Updated 2 years ago
- Codebase from our first release. ☆58 · Feb 17, 2026 · Updated 4 months ago
- Mamba support for transformer lens ☆20 · Sep 17, 2024 · Updated last year
- ☆13 · Jul 14, 2024 · Updated last year
- Fluent student-teacher redteaming ☆23 · Jul 25, 2024 · Updated last year
- ☆22 · Apr 28, 2025 · Updated last year
- ☆24 · May 27, 2025 · Updated last year
- entropix style sampling + GUI ☆27 · Oct 30, 2024 · Updated last year
- Cross Atlas Remapping via Optimal Transport ☆12 · Dec 14, 2023 · Updated 2 years ago
- Inference API for many LLMs and other useful tools for empirical research ☆130 · May 29, 2026 · Updated last month
- Optimisation on Diffeomorphisms ☆12 · Feb 17, 2025 · Updated last year
- Collection of academic works in natural language processing, computational linguistics, and computational cognitive science that study th… ☆22 · Mar 20, 2024 · Updated 2 years ago
- [EMNLP 2025] Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards ☆68 · Sep 15, 2025 · Updated 9 months ago
- Evaluating methods for estimating aperiodic activity in electrophysiological data. ☆17 · Sep 24, 2024 · Updated last year
- [ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs ☆13 · Jun 20, 2025 · Updated last year
- Decoupled Q-Chunking ☆72 · May 3, 2026 · Updated last month
- ☆19 · Mar 4, 2025 · Updated last year
- ☆15 · Oct 31, 2023 · Updated 2 years ago
- CIFAR-10 speedrun: Trains to 94% accuracy in 1.98 seconds on a single NVIDIA A100 GPU. ☆79 · Oct 17, 2025 · Updated 8 months ago