LeonGuertler / UnstableBaselines
☆28 Updated this week
Alternatives and similar repositories for UnstableBaselines
Users who are interested in UnstableBaselines compare it to the libraries listed below.
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆41 Updated last month
- Compiling useful links, papers, benchmarks, ideas, etc. ☆46 Updated 3 months ago
- Simple repository for training small reasoning models ☆33 Updated 4 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 Updated 5 months ago
- Automated Capability Discovery via Foundation Model Self-Exploration ☆52 Updated 4 months ago
- ☆52 Updated 2 weeks ago
- Simple GRPO scripts and configurations. ☆58 Updated 4 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆184 Updated this week
- Repository for the paper Stream of Search: Learning to Search in Language ☆149 Updated 4 months ago
- look how they massacred my boy ☆63 Updated 8 months ago
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆127 Updated this week
- ☆127 Updated 3 months ago
- Official repo for Learning to Reason for Long-Form Story Generation ☆63 Updated 2 months ago
- Measuring General Intelligence With Generated Games (Preprint) ☆24 Updated 3 weeks ago
- Learn online intrinsic rewards from LLM feedback ☆41 Updated 6 months ago
- ☆54 Updated last year
- ☆63 Updated last month
- Minimal but scalable implementation of large language models in JAX ☆35 Updated 7 months ago
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models ☆58 Updated 4 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆173 Updated 3 months ago
- ☆41 Updated 5 months ago
- ☆23 Updated 8 months ago
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ☆57 Updated 2 months ago
- j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models. ☆82 Updated 3 weeks ago
- A framework for optimizing DSPy programs with RL ☆76 Updated this week
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆75 Updated 10 months ago
- Small, simple agent task environments for training and evaluation ☆18 Updated 7 months ago
- Simple and efficient pytorch-native transformer training and inference (batched) ☆76 Updated last year
- Storing long contexts in tiny caches with self-study ☆67 Updated last week
- ☆38 Updated 11 months ago