Code and data for "MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models"
☆56Nov 18, 2025Updated 7 months ago
Alternatives and similar repositories for MT-Eval
Users that are interested in MT-Eval are comparing it to the libraries listed below.
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues☆150Jul 24, 2024Updated last year
- [ACL 2024] Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue☆26Oct 18, 2025Updated 8 months ago
- Evaluating LLMs' multi-round chatting capability via assessing conversations generated by two LLM instances.☆162May 22, 2025Updated last year
- Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…☆14Aug 8, 2025Updated 10 months ago
- Code for M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models☆23Jul 27, 2024Updated last year
- ☆60Aug 22, 2024Updated last year
- Source code of “Reinforcement Learning with Token-level Feedback for Controllable Text Generation” (NAACL 2024)☆17Dec 8, 2024Updated last year
- Fork of Bliss☆15Dec 13, 2025Updated 6 months ago
- Source code for the EMNLP 2021 Findings paper: Event-enhanced Knowledge Graph Embeddings☆13Sep 3, 2021Updated 4 years ago
- Short RL☆18Apr 16, 2026Updated 2 months ago
- [EMNLP 2022] Code and data for "Controllable Dialogue Simulation with In-Context Learning"☆34Feb 22, 2023Updated 3 years ago
- In-car multi-channel speech transcription system of AISHELL-5.☆44Jun 9, 2025Updated last year
- ☆18Feb 29, 2024Updated 2 years ago
- An (incomplete) overview of information extraction☆43Apr 28, 2022Updated 4 years ago
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…☆140Jun 4, 2024Updated 2 years ago
- ☆11Jul 6, 2023Updated 2 years ago
- Detect-Then-Explain Framework for Text-to-SQL task☆10Dec 6, 2023Updated 2 years ago
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)☆201Dec 16, 2023Updated 2 years ago
- Code and data for the paper: AI Sees Your Location—But With A Bias Toward The Wealthy World☆19Dec 15, 2025Updated 6 months ago
- RAG-Fusion implementation using Langchain, Weaviate and OpenAI☆13Oct 31, 2023Updated 2 years ago
- Evaluate gpt-4o on CLIcK (Korean NLP Dataset)☆20May 18, 2024Updated 2 years ago
- A tool to check for potential open source licensing problems.☆12Aug 17, 2016Updated 9 years ago
- ☆16May 31, 2024Updated 2 years ago
- ☆10Jul 13, 2024Updated last year
- ☆42May 14, 2026Updated last month
- [ICME 2019] Source code and datasets for "Semi-supervised Compatibility Learning Across Categories for Clothing Matching"☆11Apr 26, 2024Updated 2 years ago
- Accompanying repo for the DP2O paper accepted by AAAI 2024 main conference☆17Mar 28, 2024Updated 2 years ago
- A trainable user simulator☆34Jun 30, 2025Updated 11 months ago
- Mobile phone store☆11Dec 16, 2022Updated 3 years ago
- Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning.☆60Mar 20, 2024Updated 2 years ago
- Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation☆20Jun 11, 2025Updated last year
- ChineseMedicalAssistant based on Internlm-chat-7b☆17Mar 13, 2024Updated 2 years ago
- VQA-Med 2021☆23May 13, 2026Updated last month
- ☆21Jun 27, 2024Updated last year
- ☆24Feb 16, 2025Updated last year
- On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation (Findings of EMNLP 2021)☆13Nov 21, 2021Updated 4 years ago
- Official implementation of DapperFL.☆13Oct 29, 2024Updated last year
- Code and Dataset for the CVPRW Paper "Where did I leave my keys? — Episodic-Memory-Based Question Answering on Egocentric Videos"☆29Aug 28, 2023Updated 2 years ago
- Fine-tuning a codegen model on a Python instruction set using the QLoRA technique for better efficacy☆11Aug 31, 2023Updated 2 years ago