DCPO: Dynamic Adaptive Clipping for RL
☆45 · Dec 20, 2025 · Updated 2 months ago
Alternatives and similar repositories for DCPO
Users interested in DCPO are comparing it to the repositories listed below.
- ☆26 · Jul 29, 2025 · Updated 7 months ago
- ☆11 · Nov 30, 2025 · Updated 3 months ago
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models ☆17 · Nov 4, 2025 · Updated 3 months ago
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions". ☆15 · Feb 9, 2026 · Updated 3 weeks ago
- Code for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems" ☆13 · Dec 13, 2024 · Updated last year
- [ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning" ☆49 · Jan 30, 2026 · Updated last month
- [ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning ☆13 · Sep 2, 2024 · Updated last year
- Mixture-of-Basis-Experts for Compressing MoE-based LLMs ☆29 · Dec 24, 2025 · Updated 2 months ago
- CoV: Chain-of-View Prompting for Spatial Reasoning ☆51 · Jan 23, 2026 · Updated last month
- An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation ☆16 · Oct 27, 2024 · Updated last year
- The official implementation of the paper “Anchored Supervised Fine-Tuning” ☆30 · Feb 12, 2026 · Updated 2 weeks ago
- [ICLR-2026] Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning". ☆31 · Updated this week
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning ☆62 · Oct 24, 2025 · Updated 4 months ago
- Code and Data for "FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation" (ACL25) ☆29 · Oct 26, 2025 · Updated 4 months ago
- Code for paper: Unified Text-to-Image Generation and Retrieval ☆16 · Jul 6, 2024 · Updated last year
- The official implementation of Preference Data Reward-Augmentation. ☆18 · May 1, 2025 · Updated 10 months ago
- Official implementation of the paper https://arxiv.org/abs/2108.11554 ☆13 · Feb 22, 2022 · Updated 4 years ago
- Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models ☆30 · Oct 6, 2025 · Updated 4 months ago
- [ICLR'25] Code for KaSA, an official implementation of "KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models" ☆20 · Jan 16, 2025 · Updated last year
- CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning ☆35 · Aug 28, 2025 · Updated 6 months ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation". ☆26 · Aug 9, 2025 · Updated 6 months ago
- Source code of our EMNLP 2024 paper "FactAlign: Long-form Factuality Alignment of Large Language Models" ☆19 · Oct 3, 2024 · Updated last year
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1" ☆31 · Feb 22, 2026 · Updated last week
- Generative Modeling with Bayesian Sample Inference ☆24 · May 17, 2025 · Updated 9 months ago
- MetaAgent: Toward Self-Evolving Agent via Tool Meta-Learning ☆42 · Sep 3, 2025 · Updated 5 months ago
- [NeurIPS 2024] Official Implementation of GrounDiT ☆59 · Dec 12, 2024 · Updated last year
- Modality Gap–Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models ☆51 · Feb 23, 2026 · Updated last week
- ☆49 · Aug 14, 2025 · Updated 6 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆26 · Oct 17, 2024 · Updated last year
- [NeurIPS'25] Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning ☆116 · Dec 30, 2025 · Updated 2 months ago
- Resa: Transparent Reasoning Models via SAEs ☆47 · Sep 23, 2025 · Updated 5 months ago
- Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark ☆28 · Apr 22, 2025 · Updated 10 months ago
- [ICLR 2026] SparseD: Sparse Attention for Diffusion Language Models ☆58 · Feb 22, 2026 · Updated last week
- Chinese Vision-Language Understanding Evaluation ☆23 · Dec 26, 2024 · Updated last year
- MuMA-ToM: Multi-modal Multi-Agent Theory of Mind ☆38 · Jan 23, 2025 · Updated last year
- ☆60 · Jan 12, 2026 · Updated last month
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 · Jun 28, 2025 · Updated 8 months ago
- [ICML'25] Official code of paper "Fast Large Language Model Collaborative Decoding via Speculation" ☆28 · Jun 23, 2025 · Updated 8 months ago
- ☆90 · Oct 30, 2025 · Updated 4 months ago