MCEVAL / McEval
☆30Updated last month
Alternatives and similar repositories for McEval:
Users who are interested in McEval are comparing it to the repositories listed below
- NaturalCodeBench (Findings of ACL 2024)☆61Updated 3 months ago
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval☆77Updated 4 months ago
- An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories☆47Updated 5 months ago
- ☆28Updated 2 months ago
- ☆41Updated 7 months ago
- Generate the WizardCoder Instruct from the CodeAlpaca☆20Updated last year
- Code for our EMNLP-2023 paper: "Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks"☆24Updated last year
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"☆113Updated 7 months ago
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023)☆126Updated 5 months ago
- ☆60Updated 6 months ago
- Code for Scaling Laws of RoPE-based Extrapolation☆71Updated last year
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios☆64Updated last month
- Fantastic Data Engineering for Large Language Models☆64Updated 3 weeks ago
- MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models☆23Updated 2 months ago
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models☆38Updated 10 months ago
- ☆22Updated 2 months ago
- 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training☆97Updated 3 months ago
- Feeling confused about super alignment? Here is a reading list☆42Updated last year
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".☆59Updated 6 months ago
- ☆93Updated 3 months ago
- Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs.☆40Updated 6 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆44Updated 3 weeks ago
- Implementations of online merging optimizers proposed by Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment☆70Updated 7 months ago
- Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning.☆128Updated last year
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation.☆136Updated 6 months ago
- Repository of LV-Eval Benchmark☆56Updated 4 months ago
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues☆61Updated 5 months ago
- Towards Systematic Measurement for Long Text Quality☆31Updated 4 months ago
- A Comprehensive Benchmark for Software Development.☆88Updated 7 months ago
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models☆171Updated 3 months ago