Trae1ounG/BuPO

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Trae1ounG/BuPO)

Trae1ounG / BuPO

[arxiv: 2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

☆60

Alternatives and similar repositories for BuPO

Users that are interested in BuPO are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

chenlong-clock / RULE-Unlearn
View on GitHub
[NeurIPS25] RULE: Reinforcement UnLEarning Achieves Forge-retain Pareto Optimality
☆20Oct 22, 2025Updated 9 months ago
MozerWang / promISe
View on GitHub
[COLING 2024 (Oral)] PromISe:Releasing the Capabilities of LLMs with Prompt Introspective Search
☆23Aug 26, 2024Updated last year
Trae1ounG / Pretrain_Space_RLVR
View on GitHub
[arxiv: 2604.14142] From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space
☆17Apr 16, 2026Updated 3 months ago
DA-Open / DV-World
View on GitHub
[ICML 2026] DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios
☆69Apr 29, 2026Updated 2 months ago
MozerWang / DEMO
View on GitHub
[ACL 2025 (Findings)] DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling
☆22Dec 16, 2024Updated last year
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
hanningzhang / ER-PRM
View on GitHub
☆20Dec 14, 2024Updated last year
wenge-research / TableEval
View on GitHub
This repository contains code and data for the paper "TableEval: A Real-World Benchmark for Complex, Multilingual, and Multi-Structured T…
☆30Jun 12, 2025Updated last year
Interplay-LM-Reasoning / Interplay-LM-Reasoning
View on GitHub
[ICML 2026 Spotlight] On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
☆162Jun 8, 2026Updated last month
SharkSpicy-NLP / SR-KI
View on GitHub
SR-KI: Scalable and Real-Time Knowledge Integration into LLMs via Supervised Attention
☆56Dec 6, 2025Updated 7 months ago
MozerWang / AMPO
View on GitHub
[ICLR 2026] Adaptive Social Learning via Mode Policy Optimization for Language Agents
☆51Feb 2, 2026Updated 5 months ago
wutaiqiang / MI
View on GitHub
Official code for paper "Revisiting Model Interpolation for Efficient Reasoning"
☆17Jul 14, 2026Updated last week
jinzhuoran / RAG-RewardBench
View on GitHub
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
☆18Dec 19, 2024Updated last year
Hambaobao / Marathon
View on GitHub
Marathon: A Multiple-choice Long Context Evaluation Benchmark for Large Language Models.
☆10May 16, 2024Updated 2 years ago
CLR-Lab / SimKO
View on GitHub
SimKO: Simple Pass@K Policy Optimization
☆31Oct 24, 2025Updated 9 months ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
facebookresearch / reasoning-memory
View on GitHub
Procedural Knowledge at Scale Improves ReasoningThis repository contains the minimal, end-to-end pipeline for reproducing the paper resul…
☆15Apr 1, 2026Updated 3 months ago
HongbangYuan / OmniReward
View on GitHub
☆47Dec 16, 2025Updated 7 months ago
upup-wei / RAG-ReasonAlignment
View on GitHub
☆20May 20, 2025Updated last year
Leey21 / CipherBank
View on GitHub
☆14Jun 13, 2025Updated last year
AngelaZZZ-611 / reasoning_models_probing
View on GitHub
☆22May 14, 2026Updated 2 months ago
Tim-Siu / reinforcement-distillation
View on GitHub
Code repo for "Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning"
☆33Jul 25, 2025Updated last year
nick7nlp / FastCuRL
View on GitHub
FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient LLM Reasoning (EMNLP 2025)
☆61Oct 10, 2025Updated 9 months ago
Guinan-Su / auto-merge-llm
View on GitHub
An official repository for GPTailor
☆18Jun 29, 2025Updated last year
ybwang119 / label_recovery
View on GitHub
[ICLR 2024] Towards Elminating Hard Label Constraints in Gradient Inverision Attacks
☆14Feb 6, 2024Updated 2 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
GAIR-NLP / OctoThinker
View on GitHub
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆189Jul 23, 2025Updated last year
MozerWang / Loong
View on GitHub
[EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA
☆155Dec 22, 2025Updated 7 months ago
stallone0000 / Reasoning-Skill
View on GitHub
☆20May 25, 2026Updated 2 months ago
Gen-Verse / GenEnv
View on GitHub
GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators
☆62Dec 23, 2025Updated 7 months ago
FranxYao / Complexity-Based-Prompting
View on GitHub
Complexity Based Prompting for Multi-Step Reasoning
☆17Mar 10, 2023Updated 3 years ago
eternal8080 / MV-MATH
View on GitHub
Description for MV-MATH
☆15Jul 20, 2025Updated last year
aHapBean / NITP
View on GitHub
[ICML 2026] NITP: Next Implicit Token Prediction for LLM Pre-training
☆34May 26, 2026Updated 2 months ago
TsinghuaC3I / Unify-Post-Training
View on GitHub
Towards a Unified View of Large Language Model Post-Training
☆211Sep 8, 2025Updated 10 months ago
shiweijiezero / R3L
View on GitHub
☆23Apr 5, 2026Updated 3 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
PRIME-RL / Entropy-Mechanism-of-RL
View on GitHub
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆445Jul 11, 2025Updated last year
Trae1ounG / Awesome-Parametric-Knowledge-in-LLMs
View on GitHub
Must-read papers and blogs about parametric knowledge mechanism in LLMs.
☆41May 9, 2025Updated last year
Xnhyacinth / NesyCD
View on GitHub
[AAAI 2025] Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks
☆12Jun 19, 2025Updated last year
hzy312 / knowledge-r1
View on GitHub
IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
☆70May 13, 2025Updated last year
RUCAIBox / Passk_Training
View on GitHub
The official repository of paper "Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models''
☆113Aug 15, 2025Updated 11 months ago
Trae1ounG / DyPRAG
View on GitHub
[arxiv: 2503.23895] Dynamic Parametric Retrieval Augmented Generation for Test-time Knowledge Enhancement
☆182Aug 14, 2025Updated 11 months ago
D2I-ai / dasd-thinking
View on GitHub
☆105Jan 27, 2026Updated 5 months ago