zhangxy-2019 / critique-GRPO
☆57 · Oct 2, 2025 · Updated 4 months ago
Alternatives and similar repositories for critique-GRPO
Users that are interested in critique-GRPO are comparing it to the libraries listed below
- ☆17 · Nov 20, 2024 · Updated last year
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation". ☆26 · Aug 9, 2025 · Updated 6 months ago
- ☆21 · Nov 11, 2024 · Updated last year
- The official repository of NeurIPS'25 paper "Ada-R1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization" ☆21 · Nov 9, 2025 · Updated 3 months ago
- ☆55 · Feb 11, 2026 · Updated last week
- official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and… ☆70 · Apr 2, 2025 · Updated 10 months ago
- ☆39 · Aug 6, 2025 · Updated 6 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆416 · Oct 4, 2025 · Updated 4 months ago
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning". ☆154 · Nov 2, 2024 · Updated last year
- image retrieval using metric learning ☆10 · Nov 22, 2022 · Updated 3 years ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆40 · Feb 5, 2024 · Updated 2 years ago
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training ☆91 · Dec 3, 2024 · Updated last year
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆420 · Jul 11, 2025 · Updated 7 months ago
- Optimized Circuit Generation for Secure Multiparty Computation ☆12 · Nov 25, 2019 · Updated 6 years ago
- A Redis-compatible in-memory database server written in Rust with MLua-based Lua 5.1 scripting ☆17 · Nov 28, 2025 · Updated 2 months ago
- A jailbreak tweak to respring your device using the hardware buttons ☆11 · Jun 9, 2020 · Updated 5 years ago
- Adds a simple interface and API calling method for indextts2 ☆20 · Oct 27, 2025 · Updated 3 months ago
- ☆14 · Dec 10, 2025 · Updated 2 months ago
- ☆24 · Aug 19, 2025 · Updated 5 months ago
- Diffusion for EEG ☆11 · Jan 2, 2023 · Updated 3 years ago
- grpo to train long form QA and instructions with long-form reward model ☆16 · Jul 17, 2025 · Updated 7 months ago
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment" ☆10 · May 5, 2024 · Updated last year
- Documentation at ☆14 · Mar 27, 2025 · Updated 10 months ago
- Official Implementation of "Personalized Pieces: Efficient Personalized Large Language Models through Collaborative Efforts" at EMNLP 202… ☆13 · Oct 27, 2024 · Updated last year
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence ☆10 · Mar 2, 2025 · Updated 11 months ago
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs" ☆10 · Dec 13, 2024 · Updated last year
- Repository of paper "Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis" (ACL 2025 Main) ☆19 · Jul 19, 2025 · Updated 6 months ago
- Using Xaml in the Win32 app model using DesktopWindowXamlSource ☆16 · Jul 19, 2024 · Updated last year
- Python bindings for NVIDIA CUDA APIs. ☆13 · Mar 2, 2024 · Updated last year
- ☆11 · Jan 3, 2024 · Updated 2 years ago
- Central difference Kalman filter which can work with states on a manifold ☆12 · Feb 26, 2021 · Updated 4 years ago
- a feature frontend for VINS ☆10 · Aug 27, 2018 · Updated 7 years ago
- ☆16 · Jul 19, 2024 · Updated last year
- Courier Mail Server - shared libraries ☆12 · Jan 31, 2026 · Updated 2 weeks ago
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting" ☆37 · Oct 9, 2025 · Updated 4 months ago
- LongAttn: Selecting Long-context Training Data via Token-level Attention ☆15 · Jul 16, 2025 · Updated 7 months ago
- Windows Sets sample from Build 2018 ☆11 · Apr 16, 2022 · Updated 3 years ago
- Reverse Engineering the Tabstate files for Windows Notepad ☆10 · May 1, 2024 · Updated last year
- ☆26 · Oct 16, 2025 · Updated 4 months ago