☆45 · updated Dec 12, 2024
Alternatives and similar repositories for McEval
Users who are interested in McEval are comparing it to the repositories listed below.
- ☆10 · updated Nov 14, 2024
- arXiv link: https://arxiv.org/abs/2409.01944 ☆22 · updated Feb 20, 2025
- ☆12 · updated Mar 5, 2025
- A natively Chinese, tiered benchmark for testing code ability ☆15 · updated Apr 11, 2024
- ☆16 · updated Nov 26, 2024
- This is the project repository of our ASE22 paper: Natural Test Generation for Precise Testing of Question Answering Software ☆14 · updated Dec 1, 2022
- Official repository for paper "TableBench: A Comprehensive and Complex Benchmark for Table Question Answering" ☆83 · updated May 8, 2025
- This repository open-sources our GEC system submitted by THU KELab (sz) in the CCL2023-CLTC Track 1: Multidimensional Chinese Learner Tex… ☆15 · updated Nov 25, 2023
- The data for the CRASS-benchmark ☆16 · updated Oct 24, 2022
- Dr. Zinn AI Psychotherapist is a chatbot for psychological support. You can chat with it when you need help or fun. It will listen and gi… ☆18 · updated Sep 22, 2023
- [LREC-COLING'24] HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization ☆41 · updated Mar 7, 2025
- ☆41 · updated Jun 19, 2024
- LLM evaluation. ☆16 · updated Nov 7, 2023
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types ☆25 · updated Nov 29, 2024
- The evaluation framework for the InfiCoder-Eval benchmark. ☆21 · updated Jul 22, 2024
- A lightweight script for processing HTML page to markdown format with support for code blocks ☆82 · updated Apr 14, 2024
- ☆21 · updated Aug 19, 2024
- ☆26 · updated Oct 9, 2024
- A Chinese Spell Checking Model Released on EMNLP2022. ☆22 · updated Apr 14, 2023
- ☆28 · updated Nov 10, 2025
- [ACL 2022] A hierarchical table dataset for question answering and data-to-text generation. ☆107 · updated Dec 16, 2025
- [ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models ☆119 · updated Jun 12, 2025
- [ACL 2024] CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and … ☆101 · updated Jul 29, 2024
- Repository of LV-Eval Benchmark ☆73 · updated Aug 31, 2024
- SIGIR 2022: Contrastive Learning with Hard Negative Entities for Entity Set Expansion ☆30 · updated Jan 6, 2023
- A multi-programming language benchmark for LLMs ☆298 · updated Jan 28, 2026
- NaturalCodeBench (Findings of ACL 2024) ☆68 · updated Oct 14, 2024
- Towards Systematic Measurement for Long Text Quality ☆37 · updated Sep 5, 2024
- [ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI ☆483 · updated Jan 3, 2026
- This repository is the replication package of the ICSE22 paper "FIRA: Fine-Grained Graph-Based Code Change Representation for Automated C… ☆33 · updated Apr 20, 2022
- A Chinese LLM evaluation benchmark for the automotive industry, with fine-grained assessment based on multi-turn open-ended questions ☆38 · updated Dec 26, 2023
- ☆12 · updated Nov 30, 2018
- GAOGAO-Bench-Updates is a supplement to the GAOKAO-Bench, a dataset to evaluate large language models. ☆39 · updated Jan 7, 2025
- ☆37 · updated Jan 26, 2024
- ☆36 · updated Sep 6, 2024
- e ☆43 · updated Apr 23, 2025
- [NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning ☆152 · updated Sep 19, 2025
- ☆50 · updated Aug 21, 2025
- Baselines for all tasks from Long Code Arena benchmarks 🏟️ ☆39 · updated Mar 30, 2025