Official repository for Decentralized Arena via Collective LLM Intelligence
☆17May 19, 2025Updated 9 months ago
Alternatives and similar repositories for de-arena
Users that are interested in de-arena are comparing it to the libraries listed below
- [EMNLP'24 (Main)] DRPO(Dynamic Rewarding with Prompt Optimization) is a tuning-free approach for self-alignment. DRPO leverages a search-…☆24Nov 17, 2024Updated last year
- [NeurIPS 2024] Code and Data Repo for Paper "Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning"☆28May 28, 2024Updated last year
- ☆35Apr 8, 2025Updated 10 months ago
- [IJCAI 2024] CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning☆24Feb 1, 2024Updated 2 years ago
- Exploring whether LLMs perform case-based or rule-based reasoning☆30Mar 2, 2024Updated 2 years ago
- This is the implementation of LeCo☆31Jan 20, 2025Updated last year
- This repo contains Duolingo English Test practice materials.☆13Jan 31, 2026Updated last month
- Public repository☆13Mar 29, 2024Updated last year
- [ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.☆89Feb 17, 2025Updated last year
- Code and data for paper "A Semantic Invariant Robust Watermark for Large Language Models" accepted by ICLR 2024.☆37Nov 13, 2024Updated last year
- A demo repository for AITuber☆10Mar 11, 2023Updated 2 years ago
- Software that runs reinout.vanrees.org☆20Feb 23, 2026Updated last week
- Undergraduate Work.☆13Jan 30, 2026Updated last month
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference …☆14Dec 12, 2024Updated last year
- [CVPR2024] Learning from Synthetic Human Group Activities☆14Feb 24, 2025Updated last year
- ☆12Jan 11, 2026Updated last month
- [ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models☆39Jul 19, 2024Updated last year
- A Swedish Natural Language Understanding Benchmark☆11Dec 12, 2025Updated 2 months ago
- ☆43Oct 7, 2024Updated last year
- (NeurIPS 2025) Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"☆47Jun 3, 2025Updated 9 months ago
- [ICLR2024] "Backdoor Federated Learning by Poisoning Backdoor-Critical Layers"☆53Dec 11, 2024Updated last year
- ☆10Oct 22, 2024Updated last year
- Code for our project CROWN (Conversational Passage Ranking by Reasoning over Word Networks)☆10Jan 11, 2024Updated 2 years ago
- Code and Data for GlitchBench☆13Feb 27, 2024Updated 2 years ago
- Align, a general text alignment function☆15Dec 7, 2023Updated 2 years ago
- Survey of available speech datasets for Polish ASR development☆17Jan 1, 2025Updated last year
- Convert datasets from Hugging Face to FiftyOne for Visualization☆11Mar 15, 2024Updated last year
- ☆11Nov 5, 2024Updated last year
- ☆15Jul 21, 2020Updated 5 years ago
- Shaping Language Models with Cognitive Insights☆15Feb 29, 2024Updated 2 years ago
- ☆12Mar 5, 2025Updated 11 months ago
- benchmarks for evaluating MT models☆11Jun 26, 2024Updated last year
- sealos deck☆11Mar 30, 2024Updated last year
- ☆11Oct 20, 2023Updated 2 years ago
- LLM benchmarks☆13Feb 22, 2024Updated 2 years ago
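- An evaluation benchmark for Chinese financial large language models: six categories, twenty-five tasks, graded evaluation; domestic models achieved grade A☆10May 6, 2024Updated last year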
- ☆11Oct 15, 2022Updated 3 years ago
- ☆10Feb 28, 2021Updated 5 years ago
- Automatically download a withny livestream.☆14Updated this week