WooooDyy/BAPO

Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping" by Zhiheng Xi et al.

☆94

Alternatives and similar repositories for BAPO

Users that are interested in BAPO are comparing it to the libraries listed below.

nex-agi / NexGAP
Nex General Agentic Data Pipeline, an end-to-end pipeline for generating high-quality agentic training data.
☆36 · Nov 19, 2025 · Updated 8 months ago
nex-agi / NexHTML
HTML Agent based on NexAU
☆16 · Nov 20, 2025 · Updated 8 months ago
nex-agi / NexA4A
Nex Agent for Agent is a meta-agent system that automatically creates specialized AI agents based on natural language requirements.
☆29 · Nov 18, 2025 · Updated 8 months ago
nex-agi / NexRL
NexRL is an ultra-loosely-coupled LLM post-training framework.
☆114 · Updated this week
nex-agi / Nex-N1
☆116 · Dec 5, 2025 · Updated 7 months ago
nex-agi / NexDR
NexDR (Nex Deep Research), a leading deep research agent that autonomously investigates complex topics and generates rich, structured rep…
☆36 · Dec 4, 2025 · Updated 7 months ago
OpenLMLab / ParallelTokenizer
Use the tokenizer in parallel to achieve superior acceleration
☆20 · Mar 21, 2024 · Updated 2 years ago
WooooDyy / BMMR
Code and resources for the NeurIPS 2025 Paper "BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset" by Zhiheng X…
☆18 · Oct 14, 2025 · Updated 9 months ago
hewei2001 / ReachQA
[EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs
☆61 · Aug 25, 2025 · Updated 11 months ago
GAIR-NLP / Safety-J
Safety-J: Evaluating Safety with Critique
☆16 · Jul 28, 2024 · Updated last year
wizard-III / ArcherCodeR
ArcherCodeR is an open-source initiative enhancing code reasoning in large language models through scalable, rule-governed reinforcement …
☆44 · Aug 6, 2025 · Updated 11 months ago
JiazhengZhang / AgentV-RL
☆15 · Apr 17, 2026 · Updated 3 months ago
nex-agi / weaver
Python SDK for Weaver.
☆17 · Updated this week
WooooDyy / LLM-Reverse-Curriculum-RL
Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…
☆116 · Feb 9, 2024 · Updated 2 years ago
ZJU-REAL / InftyThink-Plus
[ICML 2026] InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
☆34 · May 25, 2026 · Updated 2 months ago
wizard-III / Archer2.0
Archer2.0 evolves from its predecessor by introducing ASPO, which overcomes fundamental PPO-Clip limitations to prevent premature converg…
☆31 · Oct 10, 2025 · Updated 9 months ago
OpenLMLab / LongWanjuan
Towards Systematic Measurement for Long Text Quality
☆39 · Sep 5, 2024 · Updated last year
Zhou-Zoey / RMB-Reward-Model-Benchmark
☆48 · Mar 25, 2025 · Updated last year
GAIR-NLP / weak-to-strong-reasoning
☆59 · Sep 2, 2024 · Updated last year
OpenLMLab / scaling-rope
Code for Scaling Laws of RoPE-based Extrapolation
☆73 · Oct 16, 2023 · Updated 2 years ago
InternLM / POLAR
Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning.
☆166 · Sep 23, 2025 · Updated 10 months ago
january-blue / OpenNovelty
☆135 · May 12, 2026 · Updated 2 months ago
tongjingqi / Thinking-with-Video
We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S…
☆315 · Jun 21, 2026 · Updated last month
zzhang0179 / Unveiling-Linguistic-Regions-in-LLMs
[ACL 2024] Unveiling Linguistic Regions in Large Language Models
☆34 · Jun 9, 2024 · Updated 2 years ago
llmeval / LLMEval-1
[AAAI 2024] LLMEval Phase I dataset: 17 categories, 453 questions, 2186 annotators for Chinese LLM evaluation
☆114 · May 21, 2026 · Updated 2 months ago
jinzhuoran / RAG-RewardBench
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
☆18 · Dec 19, 2024 · Updated last year
hkust-nlp / RL-Verifier-Robustness
From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.
☆24 · Oct 7, 2025 · Updated 9 months ago
llmeval / LLMEval-2
[AAAI 2024] LLMEval Phase II dataset: professional domain evaluation across 12 academic disciplines
☆71 · May 21, 2026 · Updated 2 months ago
OpenMOSS / Thus-Spake-Long-Context-LLM
A survey of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation
☆62 · Mar 31, 2025 · Updated last year
KYLN24 / CritiQ
Repository of the paper "CritiQ: Mining Data Quality Criteria from Human Preferences". Code for CritiQ Flow & Training CritiQ Scorer.
☆22 · Dec 11, 2025 · Updated 7 months ago
ltzheng / SimpleTIR
[ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆401 · Mar 30, 2026 · Updated 3 months ago
multimodal-art-projection / TreePO
☆65 · Mar 30, 2026 · Updated 3 months ago
WooooDyy / AgentGym
Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhi…
☆817 · May 30, 2026 · Updated last month
GAIR-NLP / self-improvement-reversal
☆13 · Jul 14, 2024 · Updated 2 years ago
Lagooon / LeanSTaR
☆44 · Sep 19, 2024 · Updated last year
WooooDyy / AgentGym-RL
Code and implementations for the paper "AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcemen…
☆820 · Feb 15, 2026 · Updated 5 months ago
RUCAIBox / Passk_Training
The official repository of paper "Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models"
☆113 · Aug 15, 2025 · Updated 11 months ago
Kwai-Klear / RLEP
RL with Experience Replay
☆58 · Jul 27, 2025 · Updated 11 months ago
junkangwu / QAE
[ICLR 2026] Quantile Advantage Estimation for Entropy-Safe Reasoning
☆29 · Oct 14, 2025 · Updated 9 months ago