CodeClash-ai/CodeClash

Benchmarking Goal-Oriented Software Engineering

☆189

Alternatives and similar repositories for CodeClash

Users who are interested in CodeClash are comparing it to the libraries listed below.

SWE-bench / SWE-smith
[NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents
☆709 · Jul 13, 2026 · Updated last week
SWE-agent / SWE-ReX
Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.
☆554 · Updated this week
SWE-bench / sb-cli
Run SWE-bench evaluations remotely
☆78 · Aug 14, 2025 · Updated 11 months ago
abundant-ai / SWE-gen
Convert GitHub PRs into Harbor tasks
☆71 · Jul 13, 2026 · Updated last week
SALT-NLP / SWE-chat
SWE-chat: Coding Agent Interactions From Real Users in the Wild
☆29 · Apr 24, 2026 · Updated 2 months ago
SWE-agent / mini-swe-agent
The 100 line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo—b…
☆5,914 · Updated this week
microsoft / SWE-bench-Live
[NeurIPS 2025 D&B] 🚀 SWE-bench Goes Live!
☆209 · Jun 11, 2026 · Updated last month
aisa-group / PostTrainBench
Measuring how well CLI agents like Claude Code or Codex CLI can post-train base LLMs on a single H100 GPU in 10 hours
☆462 · Updated this week
SWE-bench / SWE-bench
SWE-bench: Can Language Models Resolve Real-world Github Issues?
☆5,459 · Apr 1, 2026 · Updated 3 months ago
oripress / AlgoTune
AlgoTune is a NeurIPS 2025 benchmark made up of 154 math, physics, and computer science problems. The goal is to write code that solves each…
☆109 · Jun 24, 2026 · Updated 3 weeks ago
TuringEnterprises / SWE-Bench-plus-plus
SWE-Bench-plus-plus
☆25 · Feb 5, 2026 · Updated 5 months ago
scaleapi / SWE-bench_Pro-os
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
☆485 · May 18, 2026 · Updated 2 months ago
waynchi / editbench
☆31 · Apr 7, 2026 · Updated 3 months ago
commit-0 / commit0
Commit0: Library Generation from Scratch
☆189 · Feb 24, 2026 · Updated 4 months ago
NovaSky-AI / SkyRL
SkyRL: A Modular Full-stack RL Library for LLMs
☆2,081 · Updated this week
harbor-framework / terminal-bench
A benchmark for LLMs on complicated tasks in the terminal
☆2,467 · Jul 11, 2026 · Updated last week
SakanaAI / ALE-Bench
The official repository of ALE-Bench
☆198 · Updated this week
IBM / TDD-Bench-Verified
TDD-Bench-Verified is a new benchmark for generating test cases for test-driven development (TDD)
☆33 · Jun 18, 2026 · Updated last month
Danau5tin / Orca-Agent-RL
Qwen3-14B Orchestrator Agent Reinforcement Learning. **Achieved 160% improvement** on Stanford's TerminalBench
☆102 · Nov 3, 2025 · Updated 8 months ago
SWE-Gym / SWE-Gym
Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025]
☆708 · Jul 29, 2025 · Updated 11 months ago
open-thoughts / OpenThoughts-Agent
Data recipes and robust infrastructure for training AI agents
☆260 · Updated this week
R2E-Gym / R2E-Gym
[COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents
☆307 · Jul 13, 2025 · Updated last year
harbor-framework / harbor
Framework for evaluating and improving agents
☆3,320 · Updated this week
facebookresearch / ProgramBench
Can Language Models Rebuild Programs From Scratch?
☆855 · Jul 14, 2026 · Updated last week
Siyuexi / Hue
[ESEC/FSE'23] Hue: A User-Adaptive Parser for Hybrid Logs
☆10 · Aug 24, 2023 · Updated 2 years ago
microsoft / FEA-Bench
[ACL25] FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation
☆57 · Jan 28, 2026 · Updated 5 months ago
RobustNLP / TestNER
A toolkit for testing and improving named entity recognition [ESEC/FSE'23]
☆11 · Aug 31, 2023 · Updated 2 years ago
badlogic / pi-terminal-bench
Harbor agent adapter for pi coding agent to run Terminal-Bench evaluations
☆31 · Dec 1, 2025 · Updated 7 months ago
all-the-noises / eval-arena
☆34 · Mar 21, 2026 · Updated 3 months ago
harbor-framework / terminal-bench-challenges
☆18 · Jun 18, 2026 · Updated last month
gso-bench / gso
[NeurIPS '25] GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents
☆87 · Jul 12, 2026 · Updated last week
Proximal-Labs / frontier-swe
FrontierSWE is an ultra long-horizon coding agent benchmark that tests implementation, performance eng and ML research
☆187 · Updated this week
ScalingIntelligence / kernelbench-tinker
Tinker ↔ KernelBench Integration enabling RL for GPU Kernel Generation
☆29 · Mar 5, 2026 · Updated 4 months ago
apple / ml-tic-lm
Repository for the paper: "TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining" ACL Oral 2025
☆24 · Apr 19, 2026 · Updated 3 months ago
lucidrains / populora
Implementation and explorations into PopuLoRA, Co-Evolving LLM Populations for Reasoning Self-Play
☆15 · Jun 3, 2026 · Updated last month
EleutherAI / deep-ignorance
☆20 · Jan 7, 2026 · Updated 6 months ago
Zayne-sprague / MuSR
☆57 · Aug 10, 2024 · Updated last year
pgasawa / continual-learning-bench
Continual Learning Bench
☆186 · Updated this week
SWE-EVO / SWE-EVO
☆53 · May 3, 2026 · Updated 2 months ago