waltonfuture / Diff-eRank
[NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models
☆55Updated 6 months ago
Alternatives and similar repositories for Diff-eRank
Users who are interested in Diff-eRank are comparing it to the repositories listed below
- ☆134Updated 9 months ago
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large …☆97Updated 11 months ago
- One-shot Entropy Minimization☆187Updated 5 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)☆143Updated 5 months ago
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression☆123Updated 7 months ago
- A Sober Look at Language Model Reasoning☆89Updated 3 weeks ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆88Updated 9 months ago
- JudgeLRM: Large Reasoning Models as a Judge☆40Updated this week
- ☆138Updated 2 months ago
- ☆171Updated this week
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains☆49Updated 7 months ago
- Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent."☆94Updated last year
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction☆83Updated 8 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623☆89Updated last year
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…☆86Updated 5 months ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models☆152Updated 5 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models☆147Updated 5 months ago
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…☆59Updated 5 months ago
- ☆344Updated 4 months ago
- [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.☆126Updated 6 months ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting☆34Updated last year
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25]☆95Updated 8 months ago
- Geometric-Mean Policy Optimization☆95Updated 3 weeks ago
- ☆38Updated 3 months ago
- Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL2024]☆37Updated last year
- [NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"☆136Updated last month
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models☆55Updated 10 months ago
- [ACL'25 Oral] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective☆74Updated 5 months ago
- ☆53Updated 4 months ago
- ☆53Updated 10 months ago