my-other-github-account / llm-humaneval-benchmarks

☆83

Alternatives and similar repositories for llm-humaneval-benchmarks

Users interested in llm-humaneval-benchmarks are comparing it to the repositories listed below.

emrgnt-cmplxty / zero-shot-replication
☆73Updated 2 years ago
FSoft-AI4Code / CodeCapybara
Open-source Self-Instruction Tuning Code LLM
☆169Updated 2 years ago
nlpxucan / evol-instruct
☆274Updated 2 years ago
nickrosh / evol-teacher
Open Source WizardCoder Dataset
☆160Updated 2 years ago
loubnabnl / santacoder-finetuning
Fine-tune SantaCoder for Code/Text Generation.
☆193Updated 2 years ago
abacaj / code-eval
Run evaluation on LLMs using human-eval benchmark
☆420Updated 2 years ago
manyoso / haltt4llm
This project is an attempt to create a common metric to test LLM's for progress in eliminating hallucinations which is the most serious c…
☆222Updated 2 years ago
Zyq-scut / RLTF
Accepted by Transactions on Machine Learning Research (TMLR)
☆132Updated last year
bigcode-project / selfcodealign
[NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation
☆319Updated 8 months ago
GammaTauAI / leetcode-hard-gym
A hard gym for programming
☆161Updated last year
bigcode-project / octopack
🐙 OctoPack: Instruction Tuning Code Large Language Models
☆471Updated 8 months ago
bhargaviparanjape / language-programmes
☆173Updated 2 years ago
juyongjiang / CodeUp
CodeUp: A Multilingual Code Generation Llama-X Model with Parameter-Efficient Instruction-Tuning
☆127Updated 10 months ago
NL2Code / CodeR
☆160Updated last year
luohongyin / SAIL
SAIL: Search Augmented Instruction Learning
☆157Updated 3 months ago
reasoning-machines / prompt-lib
A set of utilities for running few-shot prompting experiments on large-language models
☆123Updated 2 years ago
Gryphe / BlockMerge_Gradient
Merge Transformers language models by use of gradient parameters.
☆207Updated last year
ntunlp / ExecEval
A distributed, extensible, secure solution for evaluating machine generated code with unit tests in multiple programming languages.
☆56Updated last year
theblackcat102 / evol-dataset
evol augment any dataset online
☆59Updated 2 years ago
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆125Updated 2 years ago
bigcode-project / bigcode-analysis
Repository for analysis and experiments in the BigCode project.
☆124Updated last year
young-geng / koala_data_pipeline
The data processing pipeline for the Koala chatbot language model
☆118Updated 2 years ago
facebookresearch / cruxeval
CRUXEval: Code Reasoning, Understanding, and Execution Evaluation
☆154Updated last year
sambanova / toolbench
ToolBench, an evaluation suite for LLM tool manipulation capabilities.
☆163Updated last year
uukuguy / multi_loras
Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe…
☆158Updated last year
allenai / CommonGen-Eval
Evaluating LLMs with CommonGen-Lite
☆91Updated last year
wang-research-lab / agentinstruct
Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"
☆116Updated this week
princeton-nlp / intercode
[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
☆227Updated last year
shuyanzhou / docprompting
Data and code for "DocPrompting: Generating Code by Retrieving the Docs" @ICLR 2023
☆249Updated last year
Leolty / repobench
✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024
☆174Updated last year