RUCAIBox / Passk_Training
The official repository of the paper "Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models"
☆111Aug 15, 2025Updated 5 months ago
Alternatives and similar repositories for Passk_Training
Users who are interested in Passk_Training are comparing it to the libraries listed below:
- ☆179Dec 5, 2025Updated 2 months ago
- ☆60Jan 12, 2026Updated last month
- ☆14Nov 12, 2025Updated 3 months ago
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25☆88Jun 16, 2025Updated 7 months ago
- Python wrapper for lean-gym☆12Apr 5, 2023Updated 2 years ago
- ☆27Jul 18, 2025Updated 6 months ago
- [NeurIPS 2025] The implementation of paper "On Reasoning Strength Planning in Large Reasoning Models"☆30Jul 6, 2025Updated 7 months ago
- The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning"☆37Oct 1, 2025Updated 4 months ago
- [NAACL 2025] Representing Rule-based Chatbots with Transformers☆23Feb 9, 2025Updated last year
- ☆42Sep 19, 2024Updated last year
- Official Implementation of the paper "Jointly Reinforcing Diversity and Quality in Language Model Generations"☆55Dec 26, 2025Updated last month
- ☆54Oct 29, 2024Updated last year
- ☆215Feb 20, 2025Updated 11 months ago
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression☆132Apr 12, 2025Updated 10 months ago
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"☆51Oct 31, 2024Updated last year
- ☆25Oct 31, 2024Updated last year
- Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue (ACL 2024)☆25Oct 18, 2025Updated 3 months ago
- Code for Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation (EVOL-RL).☆47Oct 16, 2025Updated 3 months ago
- [arxiv: 2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies☆59Feb 6, 2026Updated last week
- Towards a Unified View of Large Language Model Post-Training☆201Sep 8, 2025Updated 5 months ago
- Official repo for the paper "Learning From Mistakes Makes LLM Better Reasoner"☆60Dec 20, 2023Updated 2 years ago
- Code for the paper "Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making"☆28Jul 11, 2024Updated last year
- Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"☆113Feb 4, 2026Updated last week
- ✨✨ [ICLR 2026] Think Beyond Images☆578Sep 23, 2025Updated 4 months ago
- ☆47Aug 5, 2025Updated 6 months ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories☆40Aug 7, 2025Updated 6 months ago
- A repo for REMOD: relation extraction algorithm based on multimodality knowledge distillation☆28Jan 4, 2022Updated 4 years ago
- A LLaMA1/LLaMA2 Megatron implementation.☆28Dec 13, 2023Updated 2 years ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"☆36Jul 11, 2024Updated last year
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning☆60Oct 24, 2025Updated 3 months ago
- RL with Experience Replay☆55Jul 27, 2025Updated 6 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective☆1,205Aug 27, 2025Updated 5 months ago
- Preparing for ML Interviews.☆53Jan 12, 2026Updated last month
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective☆42Sep 18, 2025Updated 4 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models☆139Jun 12, 2024Updated last year
- Real-time airport timetable data.☆12Nov 6, 2023Updated 2 years ago
- Primus-SaFE(Stability and Fault Endurance)☆50Updated this week
- LexAI is an innovative legal assistant platform that simplifies legal processes through AI-driven tools. It offers comprehensive features…☆10Aug 8, 2024Updated last year
- This is a custom integration for Home Assistant that allows you to control and monitor IRSAP radiators through AWS Cognito authentication…☆13Jul 7, 2025Updated 7 months ago