Simple repository for training small reasoning models
☆49 · Feb 17, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for microR1
Users who are interested in microR1 are comparing it to the libraries listed below.
- JAX implementation of GPTQ quantization algorithm ☆10 · Jul 19, 2023 · Updated 2 years ago
- 🎵 muse: Music Separation ☆11 · Feb 14, 2024 · Updated 2 years ago
- ☆14 · Apr 16, 2025 · Updated 10 months ago
- An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace. ☆38 · Jun 21, 2024 · Updated last year
- https://footprints.baulab.info ☆17 · Oct 4, 2024 · Updated last year
- ☆17 · Oct 9, 2023 · Updated 2 years ago
- nanogpt turned into a chat model ☆81 · Aug 30, 2023 · Updated 2 years ago
- Build your own visual reasoning model ☆419 · Jan 13, 2026 · Updated last month
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra… ☆18 · Jan 5, 2023 · Updated 3 years ago
- ☆19 · May 6, 2023 · Updated 2 years ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" ☆27 · Oct 14, 2025 · Updated 4 months ago
- A distributed GPU-centric experience replay system for large AI models. ☆19 · Aug 1, 2023 · Updated 2 years ago
- cheap & easy LLM experiments for amateurs (alpha) ☆25 · Nov 30, 2025 · Updated 3 months ago
- ☆20 · Jun 11, 2023 · Updated 2 years ago
- QLoRA for Masked Language Modeling ☆23 · Sep 11, 2023 · Updated 2 years ago
- Algorithms for optimization tasks (operations research) ☆19 · Sep 11, 2023 · Updated 2 years ago
- Code for "Counterfactual Token Generation in Large Language Models", Arxiv 2024. ☆32 · Nov 7, 2024 · Updated last year
- A Quantum Approximate Optimization Algorithm ☆21 · Apr 6, 2018 · Updated 7 years ago
- Differential equation neural operator ☆22 · Sep 4, 2023 · Updated 2 years ago
- Stable timestamps and confidence score for words of OpenAI's Whisper outputs down to word-level. ☆24 · Dec 20, 2022 · Updated 3 years ago
- Official pytorch implementation of ZiRa, a method for incremental vision language object detection (IVLOD), which has been accepted by Neu… ☆28 · Oct 22, 2024 · Updated last year
- Library for training process reward models ☆29 · Jun 3, 2025 · Updated 9 months ago
- Educational WIP ☆70 · Feb 16, 2026 · Updated 3 weeks ago
- Due to the huge vocabulary size (151,936) of Qwen models, the Embedding and LM Head weights are excessively heavy. Therefore, this projec… ☆33 · Jan 6, 2026 · Updated 2 months ago
- Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal … ☆32 · Apr 29, 2021 · Updated 4 years ago
- Attempt at reinforcement learning with curiosity for Sonic the Hedgehog games. Number 149 on OpenAI retro contest leaderboard, but more w… ☆32 · Sep 17, 2018 · Updated 7 years ago
- rl from zero pretrain, can it be done? yes. ☆288 · Sep 28, 2025 · Updated 5 months ago
- Dive into Jax, Flax, XLA and C++ ☆32 · Apr 1, 2020 · Updated 5 years ago
- InSales e-commerce platform API bindings ☆14 · Jul 13, 2024 · Updated last year
- ☆12 · Oct 7, 2020 · Updated 5 years ago
- Detect and redact PII locally with SOTA performance ☆91 · Mar 25, 2025 · Updated 11 months ago
- Async RL Training at Scale ☆1,107 · Updated this week
- GPT2 fine-tuning pipeline with KerasNLP, TensorFlow, and TensorFlow Extended ☆33 · Sep 6, 2023 · Updated 2 years ago
- Official code for TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations ☆36 · Jan 24, 2026 · Updated last month
- ☆43 · Apr 22, 2025 · Updated 10 months ago
- A simple MLX implementation for pretraining LLMs on Apple Silicon. ☆86 · Aug 20, 2025 · Updated 6 months ago
- A Transfer Learning Study of Gas Adsorption in Metal-Organic Frameworks ☆14 · Jul 15, 2020 · Updated 5 years ago
- The GraphBench package. ☆27 · Updated this week
- Clean RL implementation using MLX ☆35 · Mar 8, 2024 · Updated 2 years ago