mingyin0312 / RL4LLM
RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct
☆27 Updated last month
Alternatives and similar repositories for RL4LLM:
Users that are interested in RL4LLM are comparing it to the libraries listed below
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆91 Updated this week
- ☆47 Updated 7 months ago
- Train, tune, and infer Bamba model ☆87 Updated 2 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆168 Updated 2 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆150 Updated 3 months ago
- ☆60 Updated 11 months ago
- Exploring Applications of GRPO ☆145 Updated this week
- Collection of autoregressive model implementations ☆83 Updated last month
- Official repo of paper LM2 ☆34 Updated last month
- ☆111 Updated last month
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 Updated 7 months ago
- An extension of the nanoGPT repository for training small MOE models. ☆109 Updated 3 weeks ago
- My fork of Allen AI's OLMo for educational purposes. ☆30 Updated 3 months ago
- model activation visualiser ☆90 Updated this week
- ☆48 Updated 4 months ago
- ☆74 Updated 7 months ago
- Train your own SOTA deductive reasoning model ☆81 Updated 3 weeks ago
- A pipeline for LLM knowledge distillation ☆99 Updated this week
- ☆115 Updated 7 months ago
- NeurIPS 2024 tutorial on LLM Inference ☆39 Updated 3 months ago
- ☆66 Updated last week
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆91 Updated 3 weeks ago
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆140 Updated this week
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆53 Updated last week
- Set of scripts to finetune LLMs ☆37 Updated last year
- A simplified implementation for experimenting with Reinforcement Learning (RL) on GSM8K, inspired by RLVR and Deepseek R1. This repositor… ☆72 Updated last month
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆209 Updated last week
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 Updated 6 months ago
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper ☆28 Updated 2 weeks ago
- Fine-tunes a student LLM using teacher feedback for improved reasoning and answer quality. Implements GRPO with teacher-provided evaluati… ☆39 Updated 3 weeks ago