brendanhogan / DeepSeekRL-Extended
Exploring Applications of GRPO
☆109Updated last month
Alternatives and similar repositories for DeepSeekRL-Extended:
Users who are interested in DeepSeekRL-Extended are comparing it to the libraries listed below
- Working implementation of DeepSeek MLA☆38Updated 2 months ago
- Train your own SOTA deductive reasoning model☆81Updated 3 weeks ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs☆145Updated this week
- My fork of Allen AI's OLMo for educational purposes.☆30Updated 3 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).☆209Updated last week
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate"☆131Updated last month
- Fine-tunes a student LLM using teacher feedback for improved reasoning and answer quality. Implements GRPO with teacher-provided evaluati…☆39Updated 3 weeks ago
- Efficient Triton implementation of Native Sparse Attention.☆116Updated this week
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆83Updated last week
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆83Updated last week
- Train, tune, and run inference with the Bamba model☆86Updated 2 months ago
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆91Updated 3 weeks ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆75Updated 3 weeks ago
- A collection of tricks and tools to speed up transformer models☆145Updated this week
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper☆28Updated last week
- Simple GRPO scripts and configurations.☆58Updated last month
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆168Updated 2 months ago
- Collection of autoregressive model implementations☆83Updated last month
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"☆226Updated last month
- ☆47Updated 7 months ago
- ☆16Updated 3 weeks ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch☆161Updated 2 months ago
- ☆74Updated 7 months ago
- ☆47Updated last week
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…☆310Updated 3 months ago
- PyTorch implementation of models from the Zamba2 series.☆178Updated 2 months ago
- RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct☆27Updated last month
- This is the official repository for Inheritune.☆109Updated last month
- Normalized Transformer (nGPT)☆164Updated 4 months ago
- FuseAI Project☆84Updated 2 months ago