UbiquantAI / one-shot-em
One-shot Entropy Minimization
☆188 · Jun 13, 2025 · Updated 8 months ago
Alternatives and similar repositories for one-shot-em
Users who are interested in one-shot-em are comparing it to the repositories listed below.
- ☆112 · Jun 15, 2025 · Updated 7 months ago
- Code repo for "Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning" ☆32 · Jul 25, 2025 · Updated 6 months ago
- [ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models ☆42 · Oct 28, 2025 · Updated 3 months ago
- [EMNLP 2025] WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning ☆73 · Nov 4, 2025 · Updated 3 months ago
- Extrapolating RLVR to General Domains without Verifiers ☆200 · Aug 12, 2025 · Updated 6 months ago
- Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity ☆22 · Aug 28, 2025 · Updated 5 months ago
- [CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval ☆22 · Jun 23, 2025 · Updated 7 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling ☆182 · Jul 23, 2025 · Updated 6 months ago
- Official implementation for Text Generation Beyond Discrete Token Sampling ☆21 · Aug 11, 2025 · Updated 6 months ago
- Active Learning Helps Pretrained Models Learn the Intended Task (https://arxiv.org/abs/2204.08491) by Alex Tamkin, Dat Nguyen, Salil Desh… ☆11 · Nov 22, 2022 · Updated 3 years ago
- Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning ☆14 · Jun 28, 2025 · Updated 7 months ago
- Creating Your Divine Agent 😇 ☆10 · Jan 26, 2026 · Updated 2 weeks ago
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions". ☆15 · Updated this week
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models ☆17 · Nov 4, 2025 · Updated 3 months ago
- A holistic framework for advancing LLMs as data science agents ☆30 · Feb 3, 2026 · Updated last week
- ☆34 · Aug 18, 2025 · Updated 5 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆413 · Oct 4, 2025 · Updated 4 months ago
- [NeurIPS 2025] ScaleKV: Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression ☆50 · Nov 4, 2025 · Updated 3 months ago
- [ICML'25] Official code of paper "Fast Large Language Model Collaborative Decoding via Speculation" ☆28 · Jun 23, 2025 · Updated 7 months ago
- Reproducing R1 for Code with Reliable Rewards ☆286 · May 5, 2025 · Updated 9 months ago
- [ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning" ☆49 · Jan 30, 2026 · Updated 2 weeks ago
- CoV: Chain-of-View Prompting for Spatial Reasoning ☆50 · Jan 23, 2026 · Updated 3 weeks ago
- Codes for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems" ☆13 · Dec 13, 2024 · Updated last year
- ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL (ICLR 2025 Pytorch Code) ☆17 · May 15, 2025 · Updated 8 months ago
- Mixture-of-Basis-Experts for Compressing MoE-based LLMs ☆27 · Dec 24, 2025 · Updated last month
- ☆16 · Aug 1, 2024 · Updated last year
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert… ☆14 · Feb 4, 2025 · Updated last year
- ☆16 · May 12, 2025 · Updated 9 months ago
- An open-world scenario domain generalization code base ☆27 · Feb 22, 2023 · Updated 2 years ago
- ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs. and preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of… ☆78 · May 30, 2025 · Updated 8 months ago
- SPUQ: Perturbation-Based Uncertainty Quantification for Large Language Models ☆15 · Jun 24, 2024 · Updated last year
- ☆15 · Nov 7, 2024 · Updated last year
- ☆12 · Jan 10, 2025 · Updated last year
- Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models ☆45 · Sep 19, 2025 · Updated 4 months ago
- [NAACL 2024] CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions ☆13 · May 7, 2024 · Updated last year
- KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality ☆40 · Dec 1, 2025 · Updated 2 months ago
- [ICML 2025] Official resources of "KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search". ☆34 · Dec 6, 2025 · Updated 2 months ago
- An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation ☆16 · Oct 27, 2024 · Updated last year
- The implementation for "DEER: Descriptive Knowledge Graph for Explaining Entity Relationships" (EMNLP '22) ☆12 · Oct 31, 2022 · Updated 3 years ago