MCEVAL / McEval
☆41 · Updated 6 months ago
Alternatives and similar repositories for McEval
Users interested in McEval are comparing it to the repositories listed below.
- NaturalCodeBench (Findings of ACL 2024) ☆65 · Updated 8 months ago
- Heuristic filtering framework for RefineCode ☆66 · Updated 3 months ago
- ☆46 · Updated last year
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval ☆83 · Updated 9 months ago
- Data processing for code LLM pretraining, fine-tuning, and DPO (industry-standard pipeline, SOTA) ☆42 · Updated 10 months ago
- Collection of papers on scalable automated alignment ☆91 · Updated 8 months ago
- Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs ☆42 · Updated 11 months ago
- Code for Scaling Laws of RoPE-based Extrapolation ☆73 · Updated last year
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ☆68 · Updated last month
- ☆31 · Updated this week
- ☆101 · Updated 8 months ago
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues ☆96 · Updated 10 months ago
- Repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models" ☆77 · Updated 11 months ago
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models" ☆131 · Updated last year
- Astraios: Parameter-Efficient Instruction Tuning Code Language Models ☆58 · Updated last year
- A distributed, extensible, secure solution for evaluating machine-generated code with unit tests in multiple programming languages ☆55 · Updated 8 months ago
- A Comprehensive Benchmark for Software Development ☆108 · Updated last year
- [LREC-COLING 2024] HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization ☆39 · Updated 3 months ago
- Code implementation of synthetic continued pretraining ☆114 · Updated 5 months ago
- InstructCoder: Instruction Tuning Large Language Models for Code Editing (Oral, ACL 2024 SRW) ☆63 · Updated 8 months ago
- Generate WizardCoder-style instructions from CodeAlpaca ☆21 · Updated last year
- ☆50 · Updated last year
- Implementations of the online merging optimizers proposed in "Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment" ☆75 · Updated last year
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆40 · Updated last year
- Code for the EMNLP 2023 paper "Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks" ☆24 · Updated last year
- ☆142 · Updated 11 months ago
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback ☆65 · Updated 9 months ago
- CFBench: A Comprehensive Constraints-Following Benchmark for LLMs ☆35 · Updated 9 months ago
- [ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models ☆101 · Updated last week
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track) ☆85 · Updated 4 months ago