☆43 · Oct 7, 2024 · Updated last year
Alternatives and similar repositories for Auto-Arena-LLMs
Users who are interested in Auto-Arena-LLMs are comparing it to the repositories listed below
- Multi-Task instruction-tuned LLaMA ☆14 · May 5, 2023 · Updated 2 years ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 · Feb 29, 2024 · Updated 2 years ago
- [EMNLP 2023] Once Upon a *Time* in *Graph*: Relative-Time Pretraining for Complex Temporal Reasoning ☆17 · Oct 31, 2023 · Updated 2 years ago
- ☆21 · Sep 17, 2021 · Updated 4 years ago
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization ☆43 · Feb 27, 2025 · Updated last year
- [ICML 2025] Official repository for paper "OR-Bench: An Over-Refusal Benchmark for Large Language Models" ☆23 · Mar 4, 2025 · Updated last year
- ☆11 · Apr 2, 2024 · Updated last year
- [ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆194 · Mar 17, 2025 · Updated 11 months ago
- Repository for paper CELLS: A Parallel Corpus for Biomedical Lay Language Generation ☆19 · Apr 2, 2024 · Updated last year
- A Holistic Embodied Cognition Benchmark ☆18 · Apr 3, 2025 · Updated 11 months ago
- Official repository for Decentralized Arena via Collective LLM Intelligence ☆17 · May 19, 2025 · Updated 9 months ago
- [NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs ☆49 · Nov 29, 2024 · Updated last year
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆78 · Mar 12, 2024 · Updated last year
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆54 · Mar 9, 2025 · Updated 11 months ago
- [NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods. ☆23 · Jul 26, 2023 · Updated 2 years ago
- ☆21 · Oct 26, 2021 · Updated 4 years ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆31 · Dec 23, 2024 · Updated last year
- ☆31 · Jun 12, 2024 · Updated last year
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆64 · Jul 8, 2024 · Updated last year
- ☆32 · Jan 11, 2024 · Updated 2 years ago
- Data and code for paper "M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models" ☆103 · Jun 15, 2023 · Updated 2 years ago
- [AAAI 2024] SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research ☆30 · Aug 6, 2024 · Updated last year
- A benchmark suite for testing logical reasoning abilities of prompt-based models ☆32 · Nov 20, 2023 · Updated 2 years ago
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" [EMNLP25] ☆38 · Feb 1, 2026 · Updated last month
- Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs ☆41 · Feb 15, 2024 · Updated 2 years ago
- This is the implementation of LeCo ☆31 · Jan 20, 2025 · Updated last year
- A Survey on Benchmarks of Multimodal Large Language Models ☆150 · Jul 1, 2025 · Updated 8 months ago
- Profile repository of Pietro Monticone. ☆14 · Feb 22, 2026 · Updated last week
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆89 · Sep 26, 2024 · Updated last year
- [ICML 2024 Oral] Official code repository for MLLM-as-a-Judge. ☆89 · Feb 17, 2025 · Updated last year
- [ACL 2023] Code and Data for "Bidirectional Generative Framework for Cross-domain Aspect-based Sentiment Analysis" ☆39 · Aug 2, 2023 · Updated 2 years ago
- A framework for few-shot evaluation of autoregressive language models. ☆12 · Jul 14, 2025 · Updated 7 months ago
- [CVPR2024] Learning from Synthetic Human Group Activities ☆14 · Feb 24, 2025 · Updated last year
- A Swedish Natural Language Understanding Benchmark ☆11 · Dec 12, 2025 · Updated 2 months ago
- The official PyTorch implementation of "An Attentional Multi-scale Co-evolving Model for Dynamic Link Prediction" (TheWebConf'23) ☆11 · May 4, 2023 · Updated 2 years ago
- A chatbot webui that supports various multi-modal large language models ☆11 · May 8, 2023 · Updated 2 years ago
- The official GitHub repo for the open online courses: "Dive into LLMs". ☆10 · Mar 15, 2024 · Updated last year
- [ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models ☆39 · Jul 19, 2024 · Updated last year
- ☆12 · Jan 11, 2026 · Updated last month