Code and data for "MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models"
☆55 · Nov 18, 2025 · Updated 6 months ago
Alternatives and similar repositories for MT-Eval
Users that are interested in MT-Eval are comparing it to the libraries listed below.
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues ☆149 · Jul 24, 2024 · Updated last year
- [ACL 2024] Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue ☆26 · Oct 18, 2025 · Updated 7 months ago
- Evaluating LLMs' multi-round chatting capability via assessing conversations generated by two LLM instances. ☆162 · May 22, 2025 · Updated last year
- Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*… ☆14 · Aug 8, 2025 · Updated 9 months ago
- Repo for the EMNLP2021 paper: Lifelong Event Detection with Knowledge Transfer ☆14 · Sep 2, 2021 · Updated 4 years ago
- The official implementation of InfoRM [NeurIPS 2024]. ☆15 · Oct 25, 2025 · Updated 7 months ago
- ☆20 · Jul 24, 2024 · Updated last year
- Source code of "Reinforcement Learning with Token-level Feedback for Controllable Text Generation" (NAACL 2024) ☆17 · Dec 8, 2024 · Updated last year
- ☆13 · Jan 31, 2025 · Updated last year
- ☆15 · Sep 6, 2024 · Updated last year
- Fork of Bliss ☆15 · Dec 13, 2025 · Updated 5 months ago
- Short RL ☆18 · Apr 16, 2026 · Updated last month
- OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, … ☆19 · Jun 25, 2024 · Updated last year
- [EMNLP 2022] Code and data for "Controllable Dialogue Simulation with In-Context Learning" ☆34 · Feb 22, 2023 · Updated 3 years ago
- ☆10 · Dec 19, 2023 · Updated 2 years ago
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models) ☆202 · Dec 16, 2023 · Updated 2 years ago
- Evaluate gpt-4o on CLIcK (Korean NLP Dataset) ☆20 · May 18, 2024 · Updated 2 years ago
- Using self-play to augment multi-turn text-to-SQL datasets ☆11 · Oct 20, 2022 · Updated 3 years ago
- ☆40 · May 14, 2026 · Updated 2 weeks ago
- ☆16 · May 31, 2024 · Updated last year
- ☆10 · Jul 13, 2024 · Updated last year
- [ICME 2019] Source code and datasets for "Semi-supervised Compatibility Learning Across Categories for Clothing Matching" ☆11 · Apr 26, 2024 · Updated 2 years ago
- A trainable user simulator ☆34 · Jun 30, 2025 · Updated 10 months ago
- Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning. ☆60 · Mar 20, 2024 · Updated 2 years ago
- ☆11 · Feb 16, 2023 · Updated 3 years ago
- ☆21 · Jun 27, 2024 · Updated last year
- ☆24 · Feb 16, 2025 · Updated last year
- PFLoRA-lib: Personalized Federated Learning with LoRA Algorithm Library focusing on privacy-protection, federated-learning, Citation, Ext… ☆14 · Sep 19, 2024 · Updated last year
- Official implementation of DapperFL. ☆13 · Oct 29, 2024 · Updated last year
- Code and Dataset for the CVPRW Paper "Where did I leave my keys? — Episodic-Memory-Based Question Answering on Egocentric Videos" ☆29 · Aug 28, 2023 · Updated 2 years ago
- Fine-tuning a codegen model on a Python instruction set using the QLoRA technique for better efficacy ☆11 · Aug 31, 2023 · Updated 2 years ago
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track) ☆102 · Feb 20, 2025 · Updated last year
- ☆67 · Mar 21, 2024 · Updated 2 years ago
- A project for evaluating LLMs on Japanese tasks ☆94 · May 13, 2026 · Updated 2 weeks ago
- Collection of papers for scalable automated alignment. ☆93 · Oct 22, 2024 · Updated last year
- DocChecker: Bootstrapping Code-Text Pretrained Language Model to Detect Inconsistency Between Code and Comment ☆16 · Jan 23, 2024 · Updated 2 years ago
- ☆14 · Jul 25, 2024 · Updated last year
- ☆20 · Nov 3, 2024 · Updated last year
- ☆13 · Sep 14, 2023 · Updated 2 years ago