waltonfuture / Diff-eRank
[NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models
☆53Updated 4 months ago
Alternatives and similar repositories for Diff-eRank
Users who are interested in Diff-eRank are comparing it to the repositories listed below.
- ☆127Updated 6 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆86Updated 7 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)☆125Updated 3 months ago
- Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent."☆87Updated 10 months ago
- A Sober Look at Language Model Reasoning☆83Updated this week
- JudgeLRM: Large Reasoning Models as a Judge☆39Updated 3 weeks ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction☆82Updated 6 months ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models☆137Updated 3 months ago
- ☆38Updated last month
- ☆133Updated 3 weeks ago
- One-shot Entropy Minimization☆185Updated 3 months ago
- ☆333Updated 2 months ago
- ☆155Updated 4 months ago
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large …☆96Updated 9 months ago
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25☆68Updated 3 months ago
- ☆169Updated 4 months ago
- [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.☆116Updated 4 months ago
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression☆104Updated 5 months ago
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…☆80Updated 3 months ago
- ☆53Updated 7 months ago
- ☆96Updated 3 weeks ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models☆62Updated 9 months ago
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains☆48Updated 5 months ago
- A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architect…☆106Updated last week
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025.☆26Updated 7 months ago
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training☆89Updated 10 months ago
- Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL2024]☆35Updated last year
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…☆56Updated 3 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623☆86Updated last year
- Large Language Models Can Self-Improve in Long-context Reasoning☆73Updated 10 months ago