abacaj / code-evalLinks

Run evaluation on LLMs using human-eval benchmark

☆421

Alternatives and similar repositories for code-eval

Users that are interested in code-eval are comparing it to the libraries listed below

Sorting:

bigcode-project / octopack
🐙 OctoPack: Instruction Tuning Code Large Language Models
☆471Updated 8 months ago
nickrosh / evol-teacher
Open Source WizardCoder Dataset
☆160Updated 2 years ago
nlpxucan / evol-instruct
☆274Updated 2 years ago
bigcode-project / bigcode-evaluation-harness
A framework for the evaluation of autoregressive code generation language models.
☆985Updated 3 months ago
microsoft / CodeT
☆666Updated 11 months ago
declare-lab / instruct-eval
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
☆548Updated last year
DachengLi1 / LongChat
Official repository for LongChat and LongEval
☆531Updated last year
loubnabnl / santacoder-finetuning
Fine-tune SantaCoder for Code/Text Generation.
☆193Updated 2 years ago
bigcode-project / selfcodealign
[NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation
☆317Updated 8 months ago
Zyq-scut / RLTF
Accepted by Transactions on Machine Learning Research (TMLR)
☆132Updated last year
conceptofmind / toolformer
☆371Updated 2 years ago
xlang-ai / DS-1000
[ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation".
☆256Updated 11 months ago
bigcode-project / bigcode-dataset
☆475Updated last year
Re-Align / URIAL
☆312Updated last year
reasoning-machines / pal
PaL: Program-Aided Language Models (ICML 2023)
☆512Updated 2 years ago
suzgunmirac / BIG-Bench-Hard
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
☆519Updated last year
my-other-github-account / llm-humaneval-benchmarks
☆83Updated 2 years ago
Leolty / repobench
✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024
☆174Updated last year
OpenLemur / Lemur
[ICLR 2024] Lemur: Open Foundation Models for Language Agents
☆555Updated last year
allenai / lumos
Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"
☆470Updated last year
huggingface / cosmopedia
☆544Updated 11 months ago
jondurbin / bagel
A bagel, with everything.
☆324Updated last year
FranxYao / Long-Context-Data-Engineering
Implementation of paper Data Engineering for Scaling Language Models to 128K Context
☆477Updated last year
nexusflowai / NexusRaven
NexusRaven-13B, a new SOTA Open-Source LLM for function calling. This repo contains everything for reproducing our evaluation on NexusRav…
☆316Updated 2 years ago
princeton-nlp / intercode
[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
☆227Updated last year
GammaTauAI / leetcode-hard-gym
A hard gym for programming
☆161Updated last year
TIGER-AI-Lab / MAmmoTH
Code and data for "MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning" [ICLR 2024]
☆377Updated last year
lm-sys / llm-decontaminator
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
☆311Updated last year
nuprl / MultiPL-E
A multi-programming language benchmark for LLMs
☆278Updated 2 months ago
jayelm / gisting
Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467
☆296Updated 8 months ago