waltonfuture / Diff-eRank
[NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models
☆46Updated 5 months ago
Alternatives and similar repositories for Diff-eRank:
Users that are interested in Diff-eRank are comparing it to the libraries listed below
- Code for "A Sober Look at Progress in Language Model Reasoning" paper☆41Updated 3 weeks ago
- ☆55Updated 6 months ago
- ☆95Updated last month
- ☆40Updated this week
- ☆29Updated last year
- ☆40Updated 2 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆72Updated 2 weeks ago
- [NeurIPS 2024] Code and Data Repo for Paper "Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning"☆25Updated 11 months ago
- This the implementation of LeCo☆31Updated 3 months ago
- ☆39Updated last month
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large …☆75Updated 4 months ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"☆33Updated 9 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆67Updated 2 months ago
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"☆49Updated 6 months ago
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433☆25Updated 5 months ago
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization☆35Updated 2 months ago
- [EMNLP 2023 Main] Sparse Low-rank Adaptation of Pre-trained Language Models☆75Updated last year
- ☆95Updated 2 weeks ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."☆42Updated 6 months ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting☆32Updated last year
- The code and data for the paper JiuZhang3.0☆43Updated 11 months ago
- The this is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆38Updated 6 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)☆72Updated 6 months ago
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".☆54Updated 2 months ago
- Codebase for Instruction Following without Instruction Tuning☆34Updated 7 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction☆69Updated last month
- Code for "Reasoning to Learn from Latent Thoughts"☆93Updated last month
- ☆22Updated 9 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards☆44Updated 3 weeks ago
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning☆94Updated 3 weeks ago