☆119 Updated this week
Alternatives and similar repositories for UnstableBaselines
Users interested in UnstableBaselines are comparing it to the libraries listed below.
- Automatically annotates YOLO dataset using Moondream visual model ☆20 Aug 24, 2025 Updated 6 months ago
- ☆28 Feb 13, 2026 Updated 2 weeks ago
- Ludic – an LLM-RL library for the era of experience ☆60 Jan 9, 2026 Updated last month
- ☆15 Apr 26, 2025 Updated 10 months ago
- Chat with any codebase with MCP servers in a single command ☆13 May 28, 2025 Updated 9 months ago
- Minimal (truly) muP implementation, consistent with TP4 and TP5 papers notation ☆14 Jan 2, 2026 Updated last month
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆59 Oct 18, 2025 Updated 4 months ago
- ☆14 Oct 18, 2023 Updated 2 years ago
- ☆13 Dec 12, 2025 Updated 2 months ago
- ☆14 Apr 16, 2025 Updated 10 months ago
- ☆12 Jun 2, 2023 Updated 2 years ago
- My published benchmark for a Kaggle Simulations Competition ☆28 Dec 8, 2021 Updated 4 years ago
- ☆115 Jun 11, 2025 Updated 8 months ago
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning ☆175 Sep 18, 2025 Updated 5 months ago
- Best-of-N LLM editing with auto version control (+ other unix tools) ☆39 Apr 22, 2025 Updated 10 months ago
- ☆42 Dec 27, 2024 Updated last year
- Our library for RL environments + evals ☆3,850 Updated this week
- ☆67 May 23, 2025 Updated 9 months ago
- ☆21 Apr 29, 2024 Updated last year
- Trance parser: an implementation of transition-based neural constituent parsing ☆16 Aug 9, 2021 Updated 4 years ago
- ☆20 Feb 11, 2024 Updated 2 years ago
- Extract full next-token probabilities via language model APIs ☆248 Feb 23, 2024 Updated 2 years ago
- A Domain-Specific Language, Jailbreak Attack Synthesizer and Dynamic LLM Redteaming Toolkit ☆27 Dec 5, 2024 Updated last year
- CFR implementation of a poker bot. ☆12 Feb 17, 2023 Updated 3 years ago
- Trains an agent with Proximal Policy Optimization (PPO) to beat Winter Run ☆23 May 21, 2022 Updated 3 years ago
- Setting up Vscode to work with Pytorch in C/C++ with CUDA support ☆25 Feb 5, 2025 Updated last year
- Memory-efficient Count-Min Sketch Counter (based on Madoka C++ library) ☆27 Nov 30, 2025 Updated 3 months ago
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. ☆26 Feb 11, 2026 Updated 2 weeks ago
- ☆32 Jan 26, 2026 Updated last month
- we have ai at home ☆72 Feb 18, 2026 Updated last week
- Lego for GRPO ☆30 May 27, 2025 Updated 9 months ago
- Exploring Applications of GRPO ☆250 Aug 25, 2025 Updated 6 months ago
- ☆44 Jul 22, 2024 Updated last year
- [NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards ☆1,346 Jan 16, 2026 Updated last month
- A framework for optimizing DSPy programs with RL ☆318 Jan 12, 2026 Updated last month
- Code for "Variational Reasoning for Language Models" ☆56 Sep 29, 2025 Updated 5 months ago
- Codes for "Efficient Offline Policy Optimization with a Learned Model", ICLR2023 ☆30 Jul 18, 2023 Updated 2 years ago
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ☆858 Feb 20, 2026 Updated last week
- Sotopia-RL: Reward Design for Social Intelligence ☆46 Jan 29, 2026 Updated last month