aeroplanepaper / GRPO-LEADView external linksLinks
☆33Nov 18, 2025Updated 2 months ago
Alternatives and similar repositories for GRPO-LEAD
Users that are interested in GRPO-LEAD are comparing it to the libraries listed below
Sorting:
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"☆74Apr 22, 2025Updated 9 months ago
- ☆60Jan 12, 2026Updated last month
- ☆17Aug 1, 2025Updated 6 months ago
- ☆14Jan 24, 2025Updated last year
- ☆16Sep 4, 2025Updated 5 months ago
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows☆19Nov 4, 2025Updated 3 months ago
- FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient LLM Reasoning☆57Oct 10, 2025Updated 4 months ago
- This is the official repo for the paper "AMO-Bench: Large Language Models Still Struggle in High School Math Competitions".☆62Feb 6, 2026Updated last week
- ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation☆25Aug 24, 2025Updated 5 months ago
- ☆29Updated this week
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆91Feb 14, 2025Updated last year
- The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis"☆35Jun 13, 2025Updated 8 months ago
- ☆15Apr 11, 2024Updated last year
- Efficient Scaling laws and collaborative pretraining.☆20Sep 18, 2025Updated 4 months ago
- ☆35May 16, 2025Updated 9 months ago
- The benchmark and datasets of the ICML 2024 paper "VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual C…☆17May 27, 2024Updated last year
- (ACL 2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation☆34May 28, 2025Updated 8 months ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning☆62Oct 24, 2025Updated 3 months ago
- A comprehensive and efficient long-context model evaluation framework☆30Feb 8, 2026Updated last week
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models (NeurIPS 2025)☆172Nov 4, 2025Updated 3 months ago
- ☆18Oct 28, 2025Updated 3 months ago
- [NeurIPS'25] The official code of "PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning"☆30Jan 12, 2026Updated last month
- SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters (ICLR 2025)☆17Aug 22, 2025Updated 5 months ago
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models".☆18Apr 25, 2025Updated 9 months ago
- [EMNLP 2024] Official repository for paper "From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis"☆21Oct 15, 2024Updated last year
- Official implementation of Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More☆24Feb 25, 2025Updated 11 months ago
- [NeurIPS'24] MemVLT: Vision-Language Tracking with Adaptive Memory-based Prompts☆18Oct 7, 2024Updated last year
- HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches☆37Oct 9, 2025Updated 4 months ago
- ☆16Jul 23, 2024Updated last year
- Awesome Long-CoT Data☆18Mar 26, 2025Updated 10 months ago
- Code, Data and Model for Paper "Learning from Peers in Reasoning Models"☆27May 13, 2025Updated 9 months ago
- AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference☆20Jan 24, 2025Updated last year
- [NeurIPS 2024] | An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding☆21Oct 10, 2024Updated last year
- Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning☆28Jul 14, 2025Updated 7 months ago
- ☆19Sep 24, 2025Updated 4 months ago
- ☆22Mar 7, 2025Updated 11 months ago
- Official repository of paper "Context-DPO: Aligning Language Models for Context-Faithfulness"☆21Feb 17, 2025Updated 11 months ago
- [ACM MM25] LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models☆23Mar 29, 2025Updated 10 months ago
- ☆33Jul 15, 2025Updated 7 months ago