MCEVAL / McEval
☆26 Updated last month
Related projects
Alternatives and complementary repositories for McEval
- Generate the WizardCoder Instruct from the CodeAlpaca ☆20 Updated last year
- A Comprehensive Benchmark for Software Development. ☆84 Updated 5 months ago
- NaturalCodeBench (Findings of ACL 2024) ☆56 Updated last month
- Codev-Bench (Code Development Benchmark), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev… ☆25 Updated 2 weeks ago
- Feeling confused about super alignment? Here is a reading list ☆43 Updated 10 months ago
- An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories ☆46 Updated 3 months ago
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models". ☆57 Updated 4 months ago
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval ☆74 Updated 2 months ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆51 Updated 3 weeks ago
- ☆88 Updated last month
- ☆89 Updated 7 months ago
- ☆40 Updated 5 months ago
- ☆71 Updated 10 months ago
- Code for Scaling Laws of RoPE-based Extrapolation ☆70 Updated last year
- Token level visualization tools for large language models ☆50 Updated last month
- 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training ☆88 Updated last month
- ☆53 Updated 4 months ago
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues ☆51 Updated 3 months ago
- ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ☆62 Updated 7 months ago
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models" ☆109 Updated 5 months ago
- ☆125 Updated last year
- ☆39 Updated 5 months ago
- Repository of LV-Eval Benchmark ☆48 Updated 2 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆147 Updated 5 months ago
- Aix-bench, the Java benchmark for code synthesis problems. ☆51 Updated 2 years ago
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆38 Updated 8 months ago
- Do Large Language Models Know What They Don’t Know? ☆85 Updated 2 weeks ago
- Collection of papers for scalable automated alignment. ☆73 Updated last month
- Astraios: Parameter-Efficient Instruction Tuning Code Language Models ☆57 Updated 7 months ago
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models ☆167 Updated last month