brendanhogan / DeepSeekRL-Extended
Exploring Applications of GRPO
☆251 Updated 4 months ago
Alternatives and similar repositories for DeepSeekRL-Extended
Users who are interested in DeepSeekRL-Extended are comparing it to the repositories listed below.
- Build your own visual reasoning model ☆415 Updated last month
- rl from zero pretrain, can it be done? yes. ☆282 Updated 3 months ago
- Tina: Tiny Reasoning Models via LoRA ☆310 Updated 3 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆174 Updated 11 months ago
- Train your own SOTA deductive reasoning model ☆107 Updated 9 months ago
- ☆136 Updated 9 months ago
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆573 Updated 2 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆109 Updated 9 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆365 Updated last year
- ☆116 Updated 3 weeks ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆138 Updated 7 months ago
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆327 Updated last month
- Async RL Training at Scale ☆960 Updated this week
- Minimal hackable GRPO implementation ☆308 Updated 11 months ago
- minimal GRPO implementation from scratch ☆101 Updated 9 months ago
- Simple & Scalable Pretraining for Neural Architecture Research ☆305 Updated 3 weeks ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆332 Updated 2 months ago
- ⚖️ Awesome LLM Judges ⚖️ ☆146 Updated 8 months ago
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling. ☆355 Updated 6 months ago
- Curated collection of community environments ☆196 Updated last week
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆604 Updated last week
- OpenTinker is an RL-as-a-Service infrastructure for foundation models ☆424 Updated this week
- Compiling useful links, papers, benchmarks, ideas, etc. ☆45 Updated 9 months ago
- code for training & evaluating Contextual Document Embedding models ☆201 Updated 7 months ago
- ☆185 Updated last month
- NanoGPT-speedrunning for the poor T4 enjoyers ☆73 Updated 8 months ago
- Super basic implementation (gist-like) of RLMs with REPL environments. ☆290 Updated 2 months ago
- ☆225 Updated last month
- Storing long contexts in tiny caches with self-study ☆228 Updated 3 weeks ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆333 Updated 2 weeks ago