infi-coder/infibench-evaluation-harness

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/infi-coder/infibench-evaluation-harness)

infi-coder / infibench-evaluation-harness

The Infibench variant of bigcode-evaluation-harness --- a framework for the evaluation of autoregressive code generation language models.

☆14

Alternatives and similar repositories for infibench-evaluation-harness

Users that are interested in infibench-evaluation-harness are comparing it to the libraries listed below

Sorting:

infi-coder / infibench-evaluator
View on GitHub
The evaluation framework for the InfiCoder-Eval benchmark.
☆21Jul 22, 2024Updated last year
HumanCompatibleAI / tensor-trust
View on GitHub
A prompt injection game to collect data for robust ML research
☆68Jan 27, 2025Updated last year
SalesforceAIResearch / swecomm
View on GitHub
☆28Nov 10, 2025Updated 3 months ago
seketeam / DevEval
View on GitHub
A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories
☆36Sep 4, 2024Updated last year
fbarberis / tiktok-tts
View on GitHub
Text to audio with Tik-Tok Voices
☆13Apr 6, 2023Updated 2 years ago
TIGER-AI-Lab / TheoremQA
View on GitHub
The official repo for "TheoremQA: A Theorem-driven Question Answering dataset" (EMNLP 2023)
☆38May 15, 2024Updated last year
jspark-noticekorea / Camera-calibration-opencvSharp4
View on GitHub
calibrate camera with openCvSharp4
☆11Jun 11, 2021Updated 4 years ago
No-Raccoon1456 / NR-PROMPT-ENGINEERING
View on GitHub
☆10Sep 29, 2024Updated last year
light-and-ray / sd-webui-topaz-photo-ai-integration
View on GitHub
Topaz Photo AI upscaler inside sd-webui
☆12Jul 5, 2024Updated last year
cjerry1243 / M3Act
View on GitHub
[CVPR2024] Learning from Synthetic Human Group Activities
☆14Feb 24, 2025Updated last year
OpenGPTX / lm-evaluation-harness
View on GitHub
A framework for few-shot evaluation of autoregressive language models.
☆12Jul 14, 2025Updated 7 months ago
domaineval / DomainEval
View on GitHub
DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference …
☆14Dec 12, 2024Updated last year
iohub / OpenCopilot
View on GitHub
Copilot with deepseek and more...
☆13Mar 7, 2025Updated 11 months ago
zhimin-z / zhimin-z
View on GitHub
☆12Jan 11, 2026Updated last month
qwersyk / Newelle-tavily
View on GitHub
Simple and powerful extension for searching web and viewing website content.
☆11Apr 11, 2025Updated 10 months ago
spraakbanken / SuperLim-2
View on GitHub
A Swedish Natural Language Understanding Benchmark
☆11Dec 12, 2025Updated 2 months ago
MateoRbt / HomeServer
View on GitHub
Home server set up
☆13Oct 5, 2025Updated 5 months ago
tmpjr / pubmed
View on GitHub
A PHP 5.3+ wrapper to the NCBI/PubMed efetch API
☆12Oct 14, 2020Updated 5 years ago
open-compass / CriticEval
View on GitHub
[NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs
☆49Nov 29, 2024Updated last year
goodmike31 / pl-asr-speech-data-survey
View on GitHub
Survey of available speech datasets for Polish ASR development
☆17Jan 1, 2025Updated last year
zhimin-z / Rigid-Body-Simulation
View on GitHub
☆11Oct 15, 2022Updated 3 years ago
gokayfem / ComfyUI-Fluxpromptenhancer
View on GitHub
A Prompt Enhancer for flux.1 in ComfyUI
☆12Jan 11, 2026Updated last month
yuh-zha / Align
View on GitHub
Align, a general text alignment function
☆15Dec 7, 2023Updated 2 years ago
SciMT / SciMT-benchmark
View on GitHub
☆11Jan 3, 2024Updated 2 years ago
CLUEbenchmark / SuperCLUE-Fin
View on GitHub
中文金融大模型测评基准，六大类二十五任务、等级化评价，国内模型获得A级
☆10May 6, 2024Updated last year
tsykin / cron-env
View on GitHub
Simple Cron Jobs Scheduler using environment variables + Bun + TypeScript
☆14Dec 9, 2025Updated 2 months ago
Helsinki-NLP / OPUS-MT-testsets
View on GitHub
benchmarks for evaluating MT models
☆11Jun 26, 2024Updated last year
JsSucks / BDI
View on GitHub
BetterDiscord Installer
☆10Mar 8, 2019Updated 6 years ago
scanoss / crypto_algorithms_open_dataset
View on GitHub
☆10Jan 28, 2026Updated last month
shakfu / cyllama
View on GitHub
A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp
☆16Feb 10, 2026Updated 3 weeks ago
sadeqi-ah / react-ctx-toolkit
View on GitHub
Helper functions for React Context API inspired by @reduxjs/toolkit
☆11Nov 25, 2022Updated 3 years ago
UCLA-SEAL / DeepLearningTest
View on GitHub
Is Neuron Coverage a Meaningful Measure for Testing Deep Neural Networks? (FSE 2020)
☆10Sep 23, 2021Updated 4 years ago
stimm-ai / stimm
View on GitHub
The Open Source Voice Agent Platform. Orchestrate ultra-low latency AI pipelines for real-time conversations over WebRTC.
☆39Updated this week
roman-ryzenadvanced / Custom-Engineered-Agents-and-Tools-for-Vibe-Coders
View on GitHub
Custom Engineered Agents and Tools for Vibe Coders | Agents for TRAE.AI, Smart MCPs, GLM Models integration and more...
☆22Dec 24, 2025Updated 2 months ago
a-antoniades / swe-search
View on GitHub
☆12Nov 5, 2024Updated last year
sevdaimany / Puzzle-Solver
View on GitHub
🧩Using backtracking algorithm to solve binary puzzles
☆11Jul 17, 2021Updated 4 years ago
jeffery9 / qwen-mcp-tool
View on GitHub
MCP server that enables AI assistants to interact with Qwen code
☆23Aug 22, 2025Updated 6 months ago
hockerschwan / Kampos
View on GitHub
GUI for WireSock VPN client on Windows
☆14Jul 8, 2024Updated last year
stalwartlabs / spam-filter
View on GitHub
SPAM filter rules for Stalwart Mail Server
☆14Dec 16, 2025Updated 2 months ago