KwanWaiChung/MT-Eval

Code and data for "MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models"

☆57

Alternatives and similar repositories for MT-Eval

Users that are interested in MT-Eval are comparing it to the libraries listed below.

iwangjian / Midi-Tuning
[ACL 2024] Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue
☆26 · Oct 18, 2025 · Updated 9 months ago
mtbench101 / mt-bench-101
[ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues
☆152 · Jul 24, 2024 · Updated 2 years ago
open-compass / BotChat
Evaluating LLMs' multi-round chatting capability via assessing conversations generated by two LLM instances.
☆163 · May 22, 2025 · Updated last year
feiyang-k / AutoScale
Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…
☆14 · Aug 8, 2025 · Updated 11 months ago
lblankl / Short-RL
Short RL
☆19 · Apr 16, 2026 · Updated 3 months ago
KwanWaiChung / M4LE
Code for M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models
☆23 · Jul 27, 2024 · Updated 2 years ago
Perfec-Yu / Lifelong-ED
Repo for the EMNLP2021 paper: Lifelong Event Detection with Knowledge Transfer
☆14 · Sep 2, 2021 · Updated 4 years ago
qinyiwei / InfoBench
☆61 · Aug 22, 2024 · Updated last year
wandb / llm-kr-eval
☆20 · Jul 24, 2024 · Updated 2 years ago
felix01189 / SEED
☆14 · Jan 31, 2025 · Updated last year
zhangzx-uiuc / EventKE
Source code for the EMNLP 2021 Findings paper: Event-enhanced Knowledge Graph Embeddings
☆13 · Sep 3, 2021 · Updated 4 years ago
SharathChampzz / Leaf_Disease_Detection-Classification
Flask app which detects 15 varieties of plants [Pepper, Potato, Tomato]
☆11 · Aug 27, 2020 · Updated 5 years ago
WindyLee0822 / CTG
Source code of “Reinforcement Learning with Token-level Feedback for Controllable Text Generation” (NAACL 2024)
☆17 · Dec 8, 2024 · Updated last year
MayDomine / Seq1F1B
Sequence-level 1F1B schedule for LLMs.
☆19 · Jun 4, 2024 · Updated 2 years ago
scipopt / bliss
Fork of Bliss
☆15 · Dec 13, 2025 · Updated 7 months ago
IVADL / tomato-disease-detector
prototype of plant-disease-detector
☆10 · Apr 21, 2021 · Updated 5 years ago
Leezekun / dialogic
[EMNLP 2022] Code and data for "Controllable Dialogue Simulation with In-Context Learning"
☆34 · Feb 22, 2023 · Updated 3 years ago
RunxinXu / Make-Information-Extraction-Great-Again
An (incomplete) overview of information extraction
☆43 · Apr 28, 2022 · Updated 4 years ago
accretional / semantifly
☆15 · Sep 6, 2024 · Updated last year
tongshuangwu / llm-crowdsourcing-pipeline
☆11 · Jul 6, 2023 · Updated 3 years ago
MiuLab / Spk-Dialogue
Speaker Role Contextual Model for Dialogues
☆15 · Sep 30, 2017 · Updated 8 years ago
smallporridge / AssistRAG
☆23 · Jan 3, 2025 · Updated last year
shivanshkaushikk / rag-fusion
RAG-Fusion implementation using Langchain, Weaviate and OpenAI
☆13 · Oct 31, 2023 · Updated 2 years ago
xingyaoww / mint-bench
Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…
☆141 · Jun 4, 2024 · Updated 2 years ago
liziniu / ReMax
Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
☆202 · Dec 16, 2023 · Updated 2 years ago
Jinhsi555 / My_PlaNet
☆13 · Mar 16, 2025 · Updated last year
wbbeyourself / DTE
Detect-Then-Explain Framework for Text-to-SQL task
☆10 · Dec 6, 2023 · Updated 2 years ago
Victorwz / LaViA
☆10 · Jul 13, 2024 · Updated 2 years ago
leuchine / self_play_picard
Using self-play to augment multi-turn text-to-SQL datasets
☆12 · Oct 20, 2022 · Updated 3 years ago
horizon-llm / OpenKimi
[ICML2026] Reproduce Kimi K1.5/K2 RL algorithm and rollout system
☆19 · Apr 9, 2026 · Updated 3 months ago
CRIPAC-DIG / SCGAN
[ICME 2019] Source code and datasets for "Semi-supervised Compatibility Learning Across Categories for Clothing Matching"
☆11 · Apr 26, 2024 · Updated 2 years ago
OpenBuddy / GrandSage
☆16 · May 31, 2024 · Updated 2 years ago
corca-ai / evaluating-gpt-4o-on-CLIcK
Evaluate gpt-4o on CLIcK (Korean NLP Dataset)
☆20 · May 18, 2024 · Updated 2 years ago
chenxingqiang / PFLoRA-lib
PFLoRA-lib: Personalized Federated Learning with LoRA Algorithm Library focusing on privacy-protection, federated-learning, Citation, Ext…
☆14 · Sep 19, 2024 · Updated last year
FreedomIntelligence / PlatoLM
A trainable user simulator
☆34 · Jun 30, 2025 · Updated last year
xiaomile / ChineseMedicalAssistant
ChineseMedicalAssistant based on Internlm-chat-7b
☆17 · Mar 13, 2024 · Updated 2 years ago
clprice32 / Predicting-NBA-Game-Winners
Using decision tree and random forest models, predict the winner of an NBA regular season game
☆15 · Jun 7, 2018 · Updated 8 years ago
EhsanAghazadeh / Metaphors_in_PLMs
Probing and Generalization of Metaphorical Knowledge in Pre-Trained Language Models [ACL 2022]
☆23 · May 15, 2022 · Updated 4 years ago
ZhaoyueSun / PHEE
☆24 · Feb 16, 2025 · Updated last year