One-shot Entropy Minimization
☆188Jun 13, 2025Updated 8 months ago
Alternatives and similar repositories for one-shot-em
Users that are interested in one-shot-em are comparing it to the libraries listed below
Sorting:
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.☆421Jul 11, 2025Updated 7 months ago
- ☆111Jun 15, 2025Updated 8 months ago
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25☆89Jun 16, 2025Updated 8 months ago
- Code repo for "Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning"☆33Jul 25, 2025Updated 7 months ago
- [ICLR2025] γ -MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models☆42Oct 28, 2025Updated 4 months ago
- Extrapolating RLVR to General Domains without Verifiers☆201Aug 12, 2025Updated 6 months ago
- Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity☆22Aug 28, 2025Updated 6 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling☆185Jul 23, 2025Updated 7 months ago
- [EMNLP 2025] WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning☆77Nov 4, 2025Updated 4 months ago
- Official implementation for Text Generation Beyond Discrete Token Sampling☆22Aug 11, 2025Updated 6 months ago
- A holistic framework for advancing LLMs as data science agents☆37Feb 3, 2026Updated last month
- Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning☆14Jun 28, 2025Updated 8 months ago
- [CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval☆21Jun 23, 2025Updated 8 months ago
- Creating Your Divine Agent 😇☆10Jan 26, 2026Updated last month
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models☆17Nov 4, 2025Updated 4 months ago
- Active Learning Helps Pretrained Models Learn the Intended Task (https://arxiv.org/abs/2204.08491) by Alex Tamkin, Dat Nguyen, Salil Desh…☆11Nov 22, 2022Updated 3 years ago
- ☆34Aug 18, 2025Updated 6 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance"☆417Oct 4, 2025Updated 5 months ago
- [NeurIPS 2025] ScaleKV: Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression☆50Nov 4, 2025Updated 4 months ago
- [ICML'25] Official code of paper "Fast Large Language Model Collaborative Decoding via Speculation"☆28Jun 23, 2025Updated 8 months ago
- [NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning☆11Oct 29, 2024Updated last year
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert…☆15Feb 4, 2025Updated last year
- ☆16Aug 1, 2024Updated last year
- Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding.☆13Nov 19, 2024Updated last year
- ☆16May 12, 2025Updated 9 months ago
- ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL (ICLR 2025 Pytorch Code)☆17May 15, 2025Updated 9 months ago
- Codes for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems"☆13Dec 13, 2024Updated last year
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".☆16Feb 9, 2026Updated last month
- [ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning"☆49Jan 30, 2026Updated last month
- Reproducing R1 for Code with Reliable Rewards☆292May 5, 2025Updated 10 months ago
- An open-world scenario domain generalization code base☆27Feb 22, 2023Updated 3 years ago
- ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs. and preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of…☆81May 30, 2025Updated 9 months ago
- ☆15Nov 7, 2024Updated last year
- SPUQ: Perturbation-Based Uncertainty Quantification for Large Language Models☆15Jun 24, 2024Updated last year
- Mixture-of-Basis-Experts for Compressing MoE-based LLMs☆29Dec 24, 2025Updated 2 months ago
- Github Repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition☆18Apr 16, 2025Updated 10 months ago
- Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models☆45Sep 19, 2025Updated 5 months ago
- [NAACL 2024] CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions☆13May 7, 2024Updated last year
- CoV: Chain-of-View Prompting for Spatial Reasoning☆51Jan 23, 2026Updated last month