maitrix-org / de-arena
Official repository for Decentralized Arena via Collective LLM Intelligence
☆14 Updated last month
Alternatives and similar repositories for de-arena
Users who are interested in de-arena are comparing it to the libraries listed below.
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆99 Updated last month
- ☆59 Updated 9 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆49 Updated 7 months ago
- [ICML 2024 Oral] Official code repository for MLLM-as-a-Judge. ☆71 Updated 4 months ago
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" ☆50 Updated 7 months ago
- The rule-based evaluation subset and code implementation of Omni-MATH ☆22 Updated 6 months ago
- ☆46 Updated 8 months ago
- Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning" ☆61 Updated 2 weeks ago
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆36 Updated last week
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆73 Updated 4 months ago
- Resources for the Enigmata Project. ☆44 Updated 2 weeks ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆57 Updated 8 months ago
- Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? ☆29 Updated 3 weeks ago
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ☆37 Updated 2 weeks ago
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs ☆27 Updated last month
- Revisiting Mid-training in the Era of RL Scaling ☆62 Updated 2 months ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ☆22 Updated 4 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆41 Updated 11 months ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆46 Updated last month
- [ACL' 25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models. ☆74 Updated 4 months ago
- ☆35 Updated 3 months ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆61 Updated 6 months ago
- ☆40 Updated 7 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆90 Updated last month
- [ICLR 2025 Workshop] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models" ☆25 Updated last week
- This is the implementation of LeCo ☆31 Updated 5 months ago
- A Sober Look at Language Model Reasoning ☆74 Updated last week
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model… ☆47 Updated 2 weeks ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆90 Updated 8 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆124 Updated 3 months ago