waltonfuture / Diff-eRank

[NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models

☆51

Alternatives and similar repositories for Diff-eRank

Users who are interested in Diff-eRank are comparing it to the libraries listed below.


Dereck0602 / Awesome_Test_Time_LLMs
☆117 Updated 4 months ago
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆81 Updated 5 months ago
sail-sg / Attention-Sink
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
☆107 Updated last month
Ahren09 / AgentReview
Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent."
☆83 Updated 8 months ago
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆76 Updated 4 months ago
THU-KEG / AdaptThink
☆140 Updated 2 months ago
ruixin31 / Spurious_Rewards
☆323 Updated last week
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆81 Updated last month
RM-R1-UIUC / RM-R1
RM-R1: Unleashing the Reasoning Potential of Reward Models
☆120 Updated last month
zitian-gao / one-shot-em
One-shot Entropy Minimization
☆175 Updated last month
LightChen233 / reasoning-boundary
☆67 Updated last month
MingyuJ666 / Rope_with_LLM
[ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…
☆75 Updated last month
multimodal-art-projection / LatentCoT-Horizon
📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning.
☆171 Updated last week
dongxiangjue / Awesome-LLM-Self-Improvement
A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large …
☆88 Updated 7 months ago
bobxwu / learning-from-rewards-llm-papers
A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…
☆52 Updated last month
GeniusHTX / TALE
☆126 Updated 2 months ago
sail-sg / scaling-with-vocab
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
☆86 Updated 10 months ago
mathllm / MATH-V
[NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.
☆111 Updated 2 months ago
NuoJohnChen / JudgeLRM
JudgeLRM: Large Reasoning Models as a Judge
☆32 Updated 3 months ago
MingyuJ666 / Disentangling-Memory-and-Reasoning
[ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA.
☆68 Updated 2 weeks ago
cs-holder / Reasoning-Self-Evolution-Survey
☆50 Updated 5 months ago
GATECH-EIC / ACT
[ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati…
☆40 Updated last year
MiroMindAsia / MiroMind-M1
MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5, focused on advancing mathematical reasoning.
☆106 Updated this week
Joshua-Ren / Learning_dynamics_LLM
☆155 Updated 2 months ago
ltzheng / SimpleTIR
End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆162 Updated this week
LeapLabTHU / limit-of-RLVR
Repository for the paper https://arxiv.org/abs/2504.13837
☆180 Updated last month
microsoft / x-reasoner
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
☆47 Updated 3 months ago
maple-research-lab / SLOT
☆101 Updated last month
OpenSparseLLMs / LLaMA-MoE-v2
🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
☆86 Updated 8 months ago
LINs-lab / DynMoE
[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
☆121 Updated last month