zitian-gao / one-shot-em

One-shot Entropy Minimization

☆187

Alternatives and similar repositories for one-shot-em

Users who are interested in one-shot-em are comparing it to the libraries listed below.

Joshua-Ren / Learning_dynamics_LLM
☆185 Updated 6 months ago
LeapLabTHU / limit-of-RLVR
repo for paper https://arxiv.org/abs/2504.13837
☆271 Updated 5 months ago
ruixin31 / Spurious_Rewards
☆344 Updated 4 months ago
fscdc / Awesome-Efficient-Reasoning-Models
[TMLR 2025] Efficient Reasoning Models: A Survey
☆282 Updated last month
LINs-lab / DynMoE
[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
☆147 Updated 4 months ago
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆87 Updated 9 months ago
Dereck0602 / Awesome_Test_Time_LLMs
☆134 Updated 8 months ago
QingyangZhang / Label-Free-RLVR
☆290 Updated 5 months ago
PRIME-RL / Entropy-Mechanism-of-RL
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆396 Updated 4 months ago
GeniusHTX / TALE
☆137 Updated 2 months ago
MingyuJ666 / Rope_with_LLM
[ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…
☆86 Updated 5 months ago
multimodal-art-projection / LatentCoT-Horizon
📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning.
☆290 Updated last month
OpenRLHF / OpenRLHF-M
An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.
☆149 Updated 2 months ago
ThreeSR / Awesome-Inference-Time-Scaling
Paper List of Inference/Test Time Scaling/Computing
☆326 Updated 3 months ago
OpenBMB / RLPR
Extrapolating RLVR to General Domains without Verifiers
☆180 Updated 3 months ago
XiaoYee / Awesome_Efficient_LRM_Reasoning
😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, Agent, and Beyond
☆318 Updated last month
lzhxmu / CPPO
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models (NeurIPS 2025)
☆167 Updated last month
cmu-l3 / l1
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
☆257 Updated 6 months ago
ltzheng / SimpleTIR
End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆332 Updated 2 months ago
ElliottYan / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆380 Updated 2 months ago
Chongjie-Si / Subspace-Tuning
A generalized framework for subspace tuning methods in parameter efficient fine-tuning.
☆161 Updated 5 months ago
ritzz-ai / GUI-R1
Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents
☆205 Updated 7 months ago
EIT-NLP / Awesome-Latent-CoT
This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space.
☆215 Updated this week
GAIR-NLP / LIMR
☆213 Updated 9 months ago
hemingkx / TokenSkip
[EMNLP 2025] TokenSkip: Controllable Chain-of-Thought Compression in LLMs
☆193 Updated last week
waltonfuture / Diff-eRank
[NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models
☆55 Updated 6 months ago
TIGER-AI-Lab / VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]
☆168 Updated 6 months ago
DataArcTech / ChartMoE
[ICLR2025 Oral] ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding
☆92 Updated 8 months ago
UCSC-VLAA / VLAA-Thinking
[TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
☆143 Updated last month
NUS-TRAIL / NoisyRollout
[NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
☆98 Updated 2 months ago