all-the-noises/eval-arena

☆34

Alternatives and similar repositories for eval-arena

Users who are interested in eval-arena are comparing it to the libraries listed below.

amazon-science / llm-code-preference
Training and Benchmarking LLMs for Code Preference.
☆38 · Nov 15, 2024 · Updated last year
codetlingua / codetlingua
☆18 · Apr 15, 2024 · Updated 2 years ago
bigcode-project / astraios
Astraios: Parameter-Efficient Instruction Tuning Code Language Models
☆63 · Apr 10, 2024 · Updated 2 years ago
ise-uiuc / blazedit
Making code editing up to 7.7x faster using multi-layer speculation
☆23 · Feb 20, 2025 · Updated last year
ise-uiuc / neuri-artifact
Artifact for ESEC/FSE'23 paper "NeuRI: Diversifying DNN Generation via Inductive Rule Inference"
☆33 · Nov 13, 2023 · Updated 2 years ago
facebookresearch / cruxeval
CRUXEval: Code Reasoning, Understanding, and Execution Evaluation
☆171 · Oct 11, 2024 · Updated last year
tongshuangwu / llm-crowdsourcing-pipeline
☆11 · Jul 6, 2023 · Updated 3 years ago
ganler / memcov
Collect simple coverage information in memory.
☆11 · Oct 6, 2022 · Updated 3 years ago
3B-Group / ConvRe
🤖ConvRe🤯: An Investigation of LLMs’ Inefficacy in Understanding Converse Relations (EMNLP 2023)
☆24 · Oct 10, 2023 · Updated 2 years ago
sola-st / DyPyBench
☆17 · Nov 12, 2025 · Updated 8 months ago
LZhengisme / self-infilling
[ICML 2024] Self-Infilling Code Generation
☆18 · May 5, 2024 · Updated 2 years ago
SparksofAGI / MHPP
☆35 · Sep 14, 2025 · Updated 10 months ago
ntunlp / ExecEval
A distributed, extensible, secure solution for evaluating machine-generated code with unit tests in multiple programming languages.
☆64 · Oct 21, 2024 · Updated last year
vzhong / silg
☆20 · Jan 14, 2022 · Updated 4 years ago
LLM360 / TxT360
☆25 · Dec 18, 2024 · Updated last year
gonglinyuan / safim
☆48 · May 6, 2025 · Updated last year
llm4code / 2024
The First International Workshop on Large Language Model for Code 2024 (Co-Located with ICSE 2024)
☆18 · Oct 4, 2024 · Updated last year
evo-eval / evoeval
EvoEval: Evolving Coding Benchmarks via LLM
☆84 · Apr 6, 2024 · Updated 2 years ago
zorazrw / odex
[EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation
☆49 · Dec 22, 2023 · Updated 2 years ago
SEC-bench / SEC-bench
Automated Benchmarking of LLM Agents on Real-World Software Security Tasks [NeurIPS 2025]
☆87 · Jan 27, 2026 · Updated 5 months ago
ruiqi-zhong / nlparam
Augmenting Statistical Models with Natural Language Parameters
☆28 · Sep 17, 2024 · Updated last year
bigcode-project / bigcodebench
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
☆515 · Jan 3, 2026 · Updated 6 months ago
NEUIR / INTERVENOR
[ACL '24] Source code for paper: INTERVENOR: Prompt the Coding Ability of Large Language Models with the Interactive Chain of Repairing
☆30 · Nov 25, 2024 · Updated last year
Zhiyuan-Zeng / EvalTree
[COLM 2025] EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
☆31 · Jul 11, 2025 · Updated last year
ganler / code-r1
Reproducing R1 for Code with Reliable Rewards
☆313 · May 5, 2025 · Updated last year
jamesmurdza / humaneval-results
Evaluation results of code generation LLMs
☆32 · Sep 1, 2023 · Updated 2 years ago
facebookresearch / mbr-exec
Code for "Natural Language to Code Translation with Execution"
☆41 · Nov 2, 2022 · Updated 3 years ago
GAIR-NLP / self-improvement-reversal
☆13 · Jul 14, 2024 · Updated 2 years ago
xlang-ai / EVOR
☆70 · Dec 15, 2024 · Updated last year
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆126 · May 6, 2025 · Updated last year
xlang-ai / Binder
[ICLR 2023] Code for the paper "Binding Language Models in Symbolic Languages"
☆326 · Aug 25, 2023 · Updated 2 years ago
MetaronWang / StackPropagation-SLU-TF
A TensorFlow implementation of "A Stack-Propagation Framework with Token-Level Intent Detection for Spoken Language Understanding".
☆10 · Jan 22, 2021 · Updated 5 years ago
YihongDong / CDD-TED4LLMs
☆16 · Nov 26, 2024 · Updated last year
ntunlp / LLMSanitize
An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs).
☆61 · Aug 13, 2024 · Updated last year
RLHFlow / Directional-Preference-Alignment
Directional Preference Alignment
☆62 · Sep 23, 2024 · Updated last year
gonglinyuan / metro_t0
Code repo for "Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers" (ACL 2023)
☆22 · Nov 1, 2023 · Updated 2 years ago
ganler / ResearchReading
Reading notes on general systems research material (not limited to papers).
☆22 · Mar 17, 2021 · Updated 5 years ago
chuyg1005 / seeclick-crawler
☆20 · Apr 24, 2024 · Updated 2 years ago
SivilTaram / code-html-to-markdown
A lightweight script for converting HTML pages to Markdown, with support for code blocks
☆81 · Apr 14, 2024 · Updated 2 years ago