Code and data for "MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models"
☆51Nov 18, 2025Updated 3 months ago
Alternatives and similar repositories for MT-Eval
Users who are interested in MT-Eval are comparing it to the repositories listed below
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues☆143Jul 24, 2024Updated last year
- Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue (ACL 2024)☆25Oct 18, 2025Updated 4 months ago
- Fork of Bliss☆14Dec 13, 2025Updated 2 months ago
- Short RL☆18May 26, 2025Updated 9 months ago
- Evaluating LLMs' multi-round chatting capability via assessing conversations generated by two LLM instances.☆161May 22, 2025Updated 9 months ago
- ☆16May 31, 2024Updated last year
- ☆20Nov 3, 2024Updated last year
- ☆21Jun 27, 2024Updated last year
- Evaluate gpt-4o on CLIcK (Korean NLP Dataset)☆20May 18, 2024Updated last year
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs☆25Sep 26, 2024Updated last year
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…☆133Jun 4, 2024Updated last year
- ☆37Jan 23, 2026Updated last month
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)☆201Dec 16, 2023Updated 2 years ago
- Collection of papers for scalable automated alignment.☆93Oct 22, 2024Updated last year
- The official GitHub page for paper "NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional St…☆25May 10, 2024Updated last year
- Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning.☆60Mar 20, 2024Updated last year
- ☆60Aug 22, 2024Updated last year
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for…☆28Dec 10, 2024Updated last year
- Source code for GreaTer ICLR 2025 - Gradient Over Reasoning makes Smaller Language Models Strong Prompt Optimizers☆36Apr 18, 2025Updated 10 months ago
- The world's first large-scale multi-modal short-video encyclopedia, where the primitive units are items, aspects, and short videos.☆66Nov 28, 2023Updated 2 years ago
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025)☆73Jun 25, 2024Updated last year
- Code and Dataset for the CVPRW Paper "Where did I leave my keys? — Episodic-Memory-Based Question Answering on Egocentric Videos"☆29Aug 28, 2023Updated 2 years ago
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism☆30Jul 17, 2024Updated last year
- ☆69Mar 21, 2024Updated last year
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following☆137Jul 8, 2024Updated last year
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"☆136Jun 5, 2024Updated last year
- ☆79Nov 19, 2024Updated last year
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models☆40Jun 10, 2024Updated last year
- A trainable user simulator☆34Jun 30, 2025Updated 8 months ago
- ☆12Updated this week
- Visual tool for SPARQL queries on graphol graphs☆10Oct 3, 2018Updated 7 years ago
- CODO is an ontology for the semantic representation and annotation of COVID-19 data in a machine-readable form for tracking history of th…☆10Apr 19, 2022Updated 3 years ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization☆96Aug 20, 2024Updated last year
- Test-time preference optimization (ICML 2025).☆178May 8, 2025Updated 10 months ago
- A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architect…☆134Jan 31, 2026Updated last month
- [NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`.☆156Nov 2, 2023Updated 2 years ago
- ☆144Sep 10, 2023Updated 2 years ago
- Implementations of Curious Replay for model-based adaptation.☆43Jul 5, 2023Updated 2 years ago
- ☆149Apr 16, 2024Updated last year