abacaj / code-eval
Run evaluation on LLMs using the HumanEval benchmark
☆419 · Updated 2 years ago
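The sketch below shows one way such a HumanEval run can be wired up with Hugging Face Transformers and OpenAI's human-eval package; the model name, generation settings, and file names are placeholders and not necessarily the defaults this repository uses.

```python
# Minimal sketch of a HumanEval evaluation loop (assumes `transformers` and the
# `human-eval` package are installed). Model and generation settings are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from human_eval.data import read_problems, write_jsonl

model_name = "codellama/CodeLlama-7b-hf"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

problems = read_problems()  # dict: task_id -> {"prompt": ..., "test": ..., ...}
samples = []
for task_id, problem in problems.items():
    inputs = tokenizer(problem["prompt"], return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Keep only the newly generated tokens as the completion.
    completion = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    samples.append({"task_id": task_id, "completion": completion})

write_jsonl("samples.jsonl", samples)
# Score pass@k with the human-eval CLI:
#   evaluate_functional_correctness samples.jsonl
```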
Alternatives and similar repositories for code-eval
Users interested in code-eval are comparing it to the repositories listed below.
- 🐙 OctoPack: Instruction Tuning Code Large Language Models · ☆472 · Updated 7 months ago
- Open Source WizardCoder Dataset · ☆161 · Updated 2 years ago
- ☆275 · Updated 2 years ago
- Fine-tune SantaCoder for Code/Text Generation · ☆195 · Updated 2 years ago
- ☆667 · Updated 11 months ago
- A framework for the evaluation of autoregressive code generation language models · ☆981 · Updated 2 months ago
- [NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation · ☆316 · Updated 7 months ago
- This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks · ☆547 · Updated last year
- ✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024 · ☆171 · Updated last year
- [ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation" · ☆256 · Updated 11 months ago
- Official repository for LongChat and LongEval · ☆533 · Updated last year
- PaL: Program-Aided Language Models (ICML 2023) · ☆511 · Updated 2 years ago
- ☆84 · Updated 2 years ago
- ☆371 · Updated 2 years ago
- ☆312 · Updated last year
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023) · ☆157 · Updated last month
- Accepted by Transactions on Machine Learning Research (TMLR) · ☆131 · Updated 11 months ago
- [ICLR 2024] Lemur: Open Foundation Models for Language Agents · ☆556 · Updated last year
- [NeurIPS 2023 D&B] Code repository for the InterCode benchmark https://arxiv.org/abs/2306.14898 · ☆224 · Updated last year
- ☆472 · Updated last year
- Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467 · ☆295 · Updated 7 months ago
- Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context" · ☆473 · Updated last year
- Evol-augment any dataset online · ☆60 · Updated 2 years ago
- Compress your input to ChatGPT or other LLMs to let them process 2x more content and save 40% memory and GPU time · ☆397 · Updated last year
- ☆541 · Updated 10 months ago
- NexusRaven-13B, a new SOTA Open-Source LLM for function calling. This repo contains everything for reproducing our evaluation on NexusRav… · ☆317 · Updated 2 years ago
- Code and data for "MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning" [ICLR 2024] · ☆375 · Updated last year
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning · ☆661 · Updated last year
- Mass-editing thousands of facts into a transformer memory (ICLR 2023) · ☆516 · Updated last year
- A multi-programming language benchmark for LLMs · ☆276 · Updated last month