KwanWaiChung / MT-Eval
Code and data for "MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models"
⭐ 51 · Nov 18, 2025 · Updated 2 months ago
Alternatives and similar repositories for MT-Eval
Users interested in MT-Eval are comparing it to the libraries listed below.
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues · ⭐ 141 · Jul 24, 2024 · Updated last year
- Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*… · ⭐ 13 · Aug 8, 2025 · Updated 6 months ago
- Short RL · ⭐ 17 · May 26, 2025 · Updated 8 months ago
- The official implementation of InfoRM [NeurIPS 2024]. · ⭐ 14 · Oct 25, 2025 · Updated 3 months ago
- Fork of Bliss · ⭐ 14 · Dec 13, 2025 · Updated 2 months ago
- Evaluating LLMs' multi-round chatting capability via assessing conversations generated by two LLM instances. · ⭐ 161 · May 22, 2025 · Updated 8 months ago
- ⭐ 20 · Nov 3, 2024 · Updated last year
- ⭐ 21 · Jun 27, 2024 · Updated last year
- ⭐ 20 · Aug 30, 2025 · Updated 5 months ago
- Evaluate gpt-4o on CLIcK (Korean NLP Dataset) · ⭐ 20 · May 18, 2024 · Updated last year
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs · ⭐ 24 · Sep 26, 2024 · Updated last year
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… · ⭐ 133 · Jun 4, 2024 · Updated last year
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models) · ⭐ 199 · Dec 16, 2023 · Updated 2 years ago
- Collection of papers for scalable automated alignment. · ⭐ 93 · Oct 22, 2024 · Updated last year
- Official code for the paper "Contrastive Representations for Temporal Reasoning". · ⭐ 51 · Nov 25, 2025 · Updated 2 months ago
- ⭐ 22 · Jan 3, 2025 · Updated last year
- The official GitHub page for paper "NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional St… · ⭐ 25 · May 10, 2024 · Updated last year
- ⭐ 59 · Aug 22, 2024 · Updated last year
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… · ⭐ 28 · Dec 10, 2024 · Updated last year
- Source code for GreaTer ICLR 2025 - Gradient Over Reasoning makes Smaller Language Models Strong Prompt Optimizers