my-other-github-account / llm-humaneval-benchmarks
☆86, updated last year
Related projects
Alternatives and complementary repositories for llm-humaneval-benchmarks
- Open Source WizardCoder Dataset (☆153, updated last year)
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation (☆115, updated last month)
- [NeurIPS 2023 D&B] Code repository for the InterCode benchmark, https://arxiv.org/abs/2306.14898 (☆194, updated 6 months ago)
- Evol-augment any dataset online (☆55, updated last year)
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" (☆219, updated last month)
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023) (☆122, updated 3 months ago)
- Fine-tune SantaCoder for Code/Text Generation (☆186, updated last year)
- Enhancing AI Software Engineering with Repository-level Code Graph (☆96, updated 2 months ago)
- RepoQA: Evaluating Long-Context Code Understanding (☆100, updated 2 weeks ago)
- Evaluating LLMs with fewer examples (☆134, updated 7 months ago)
- A distributed, extensible, secure solution for evaluating machine-generated code with unit tests in multiple programming languages (☆42, updated last month)
- ✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems (ICLR 2024) (☆133, updated 3 months ago)
- ToolBench, an evaluation suite for LLM tool manipulation capabilities (☆145, updated 8 months ago)
- [NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation (☆270, updated 2 weeks ago)
- Benchmarking LLMs with Challenging Tasks from Real Users (☆195, updated 2 weeks ago)
- Accepted by Transactions on Machine Learning Research (TMLR) (☆119, updated last month)
- Run evaluation on LLMs using the HumanEval benchmark (☆379, updated last year); see the sketch after this list
- Spherically merge PyTorch/HF-format language models with minimal feature loss (☆112, updated last year)
- A set of utilities for running few-shot prompting experiments on large language models (☆113, updated last year)
- Load multiple LoRA modules simultaneously and automatically switch to the appropriate combination of LoRA modules to generate the best answer (☆142, updated 9 months ago)
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' (☆74, updated 10 months ago)
- Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions (☆40, updated 3 months ago)
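
Several of the entries above build on the original HumanEval harness. As a rough illustration of what "running evaluation on LLMs using the HumanEval benchmark" involves, here is a minimal sketch using OpenAI's `human-eval` package (`pip install human-eval`); `generate_one_completion` is a hypothetical placeholder for whatever model call you actually use.

```python
# Minimal sketch: produce a samples.jsonl file that the human-eval harness can score.
from human_eval.data import read_problems, write_jsonl


def generate_one_completion(prompt: str) -> str:
    # Hypothetical stand-in for a real model call (HF pipeline, API request, ...).
    # This trivial body lets the script run end to end; every task will simply fail.
    return "    pass\n"


problems = read_problems()  # maps task_id -> problem dict, each with a "prompt" field

samples = [
    {"task_id": task_id, "completion": generate_one_completion(problems[task_id]["prompt"])}
    for task_id in problems
    for _ in range(1)  # one sample per task; generate more to estimate pass@k for k > 1
]

write_jsonl("samples.jsonl", samples)
```

Scoring is then done with the package's CLI, e.g. `evaluate_functional_correctness samples.jsonl`, which executes each completion against the benchmark's unit tests and reports pass@k.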