☆70 · Jul 28, 2024 · Updated last year
Alternatives and similar repositories for GRPO
Users that are interested in GRPO are comparing it to the libraries listed below
- ☆13 · Sep 12, 2024 · Updated last year
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆20 · Jul 19, 2024 · Updated last year
- ☆11 · Mar 13, 2023 · Updated 2 years ago
- AutoLibra: Metric Induction for Agents from Open-Ended Human Feedback ☆17 · Oct 15, 2025 · Updated 4 months ago
- A mathematical model for Fibonacci Retracement and location entry and exit formulation using ML ☆10 · Aug 2, 2022 · Updated 3 years ago
- ☆12 · Oct 7, 2024 · Updated last year
- Code repository for the paper "The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Le… ☆13 · Jan 16, 2025 · Updated last year
- Code and data for Distributional Correlation–Aware Knowledge Distillation for Stock Trading Volume Prediction (ECML-PKDD 22) ☆15 · Sep 6, 2022 · Updated 3 years ago
- “Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition” (EMNLP 2022) ☆16 · Feb 2, 2023 · Updated 3 years ago
- ☆18 · Mar 19, 2025 · Updated 11 months ago
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆15 · Sep 4, 2024 · Updated last year
- Scripts for pushing models to huggingface repos ☆15 · Sep 11, 2025 · Updated 5 months ago
- Agent-RRM: Exploring Reasoning Reward Model for Agents ☆49 · Updated this week
- ☆20 · Mar 3, 2025 · Updated last year
- Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images ☆18 · Jun 4, 2025 · Updated 8 months ago
- A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models ☆20 · May 24, 2025 · Updated 9 months ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆19 · Mar 10, 2025 · Updated 11 months ago
- ☆17 · Apr 7, 2025 · Updated 10 months ago
- Implementation of Qformer from BLIP2 in Zeta Lego blocks. ☆48 · Nov 11, 2024 · Updated last year
- Convert CVXPY expressions to PyTorch expressions ☆19 · Jul 8, 2025 · Updated 7 months ago
- Code for "Explainable Data-Driven Optimization" (ICML 2023) ☆14 · Jul 21, 2023 · Updated 2 years ago
- Introducing Filtered Direct Preference Optimization (fDPO) that enhances language model alignment with human preferences by discarding lo… ☆16 · Nov 27, 2024 · Updated last year
- The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning” ☆17 · Feb 26, 2024 · Updated 2 years ago
- ACRE: Abstract Causal REasoning Beyond Covariation ☆19 · Dec 7, 2021 · Updated 4 years ago
- Using conversational games to evaluate powerful LLMs ☆18 · Sep 3, 2023 · Updated 2 years ago
- Implementation of Influence Function approximations for differently sized ML models, using PyTorch ☆16 · Sep 15, 2023 · Updated 2 years ago
- ☆20 · May 7, 2025 · Updated 9 months ago
- Official implementation of "Graph Meta-Reinforcement Learning for Transferable Autonomous Mobility-on-Demand" ☆16 · Mar 3, 2022 · Updated 3 years ago
- Representation Learning in RL ☆13 · Jun 1, 2022 · Updated 3 years ago
- Prepare SEC EDGAR data for working examples ☆20 · Feb 7, 2024 · Updated 2 years ago
- this is for fun, ain't it grand! ☆21 · Sep 18, 2025 · Updated 5 months ago
- ☆19 · Mar 10, 2025 · Updated 11 months ago
- [AAAI 2026] Multimodal Deepresearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework ☆45 · Jan 25, 2026 · Updated last month
- ☆20 · Nov 4, 2025 · Updated 3 months ago
- implementation of dualformer ☆24 · Mar 1, 2025 · Updated last year
- ☆21 · Jul 25, 2025 · Updated 7 months ago
- ☆22 · Dec 2, 2024 · Updated last year
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24) ☆27 · Oct 3, 2025 · Updated 5 months ago
- Just a subfolder of https://github.com/siliconflow/onediff ☆24 · Jun 24, 2024 · Updated last year