brendanhogan / DeepSeekRL-Extended
Exploring Applications of GRPO
☆185 Updated this week
Alternatives and similar repositories for DeepSeekRL-Extended:
Users that are interested in DeepSeekRL-Extended are comparing it to the libraries listed below
- Build your own visual reasoning model ☆341 Updated this week
- minimal GRPO implementation from scratch ☆85 Updated last month
- Train your own SOTA deductive reasoning model ☆88 Updated last month
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆171 Updated 3 months ago
- TTRL: Test-Time Reinforcement Learning ☆166 Updated this week
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆96 Updated last month
- An extension of the nanoGPT repository for training small MOE models. ☆131 Updated last month
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆139 Updated this week
- ☆122 Updated last month
- 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. ☆325 Updated this week
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆100 Updated last week
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆161 Updated this week
- ☆194 Updated 2 months ago
- DeMo: Decoupled Momentum Optimization ☆186 Updated 4 months ago
- ☆71 Updated this week
- Minimal hackable GRPO implementation ☆213 Updated 2 months ago
- OpenPipe ART (Agent Reinforcement Trainer): train LLM agents ☆108 Updated this week
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆217 Updated 3 weeks ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆139 Updated this week
- PyTorch building blocks for the OLMo ecosystem ☆197 Updated this week
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆230 Updated 2 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆317 Updated 4 months ago
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache ☆89 Updated this week
- My fork of Allen AI's OLMo for educational purposes. ☆30 Updated 4 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆62 Updated this week
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆385 Updated 2 weeks ago
- Compiling useful links, papers, benchmarks, ideas, etc. ☆42 Updated last month
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆204 Updated last month
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆182 Updated last week
- ☆169 Updated 2 months ago