The official repository of the paper "Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models"
☆111Aug 15, 2025Updated 6 months ago
Alternatives and similar repositories for Passk_Training
Users that are interested in Passk_Training are comparing it to the libraries listed below
- ☆181Dec 5, 2025Updated 3 months ago
- ☆64Jan 12, 2026Updated last month
- ☆14Nov 12, 2025Updated 3 months ago
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25☆89Jun 16, 2025Updated 8 months ago
- [AAAI 2024] MESED: A Multi-modal Entity Set Expansion Dataset with Fine-grained Semantic Classes and Hard Negative Entities☆16Apr 26, 2024Updated last year
- The official implementation of "Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration"☆24Feb 4, 2026Updated last month
- Python wrapper for lean-gym☆13Apr 5, 2023Updated 2 years ago
- [NeurIPS 2025] The implementation of paper "On Reasoning Strength Planning in Large Reasoning Models"☆30Jul 6, 2025Updated 8 months ago
- The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning"☆37Oct 1, 2025Updated 5 months ago
- [NAACL 2025] Representing Rule-based Chatbots with Transformers☆23Feb 9, 2025Updated last year
- ☆21Jun 5, 2025Updated 9 months ago
- ☆44Sep 19, 2024Updated last year
- Official Implementation of the paper "Jointly Reinforcing Diversity and Quality in Language Model Generations"☆57Dec 26, 2025Updated 2 months ago
- ☆216Feb 20, 2025Updated last year
- ☆25Oct 31, 2024Updated last year
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"☆51Oct 31, 2024Updated last year
- This repository lists papers, code, and datasets in Biomedical Text Summarisation based on PLM☆23Oct 4, 2022Updated 3 years ago
- Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue (ACL 2024)☆25Oct 18, 2025Updated 4 months ago
- Code for Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation (EVOL-RL).☆48Oct 16, 2025Updated 4 months ago
- Towards a Unified View of Large Language Model Post-Training☆204Sep 8, 2025Updated 6 months ago
- Official repo for the paper "Learning From Mistakes Makes LLM Better Reasoner"☆60Dec 20, 2023Updated 2 years ago
- Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"☆120Feb 4, 2026Updated last month
- ✨✨ [ICLR 2026] Think Beyond Images☆576Sep 23, 2025Updated 5 months ago
- ☆25Apr 9, 2025Updated 11 months ago
- Simple BibTeX generator for any text with \cite{}☆31Jul 13, 2024Updated last year
- AutoThink is a reinforcement learning framework designed to equip R1-style language models with adaptive reasoning capabilities. Instead …☆50Oct 14, 2025Updated 4 months ago
- A repo for REMOD: relation extraction algorithm based on multimodality knowledge distillation☆28Jan 4, 2022Updated 4 years ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories☆40Aug 7, 2025Updated 7 months ago
- Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models☆45Sep 19, 2025Updated 5 months ago
- [NeurIPS 2025] RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning☆52Oct 23, 2025Updated 4 months ago
- Convenience repo for providing access to various presentations.☆12Updated this week
- A LLaMA1/LLaMA2 Megatron implementation.☆28Dec 13, 2023Updated 2 years ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"☆36Jul 11, 2024Updated last year
- Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient☆66Aug 3, 2025Updated 7 months ago
- RL with Experience Replay☆55Jul 27, 2025Updated 7 months ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning☆62Oct 24, 2025Updated 4 months ago
- Parameter-Efficient Fine-Tuning for Foundation Models☆111Mar 31, 2025Updated 11 months ago
- A collection of visual instruction tuning datasets.☆77Mar 14, 2024Updated last year
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective☆42Sep 18, 2025Updated 5 months ago