Benchmarking Goal-Oriented Software Engineering
☆154 · May 5, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for CodeClash
Users who are interested in CodeClash are comparing it to the libraries listed below.
- TDD-Bench-Verified is a new benchmark for generating test cases for test-driven development (TDD) ☆29 · Apr 28, 2026 · Updated 3 weeks ago
- Run SWE-bench evaluations remotely ☆66 · Aug 14, 2025 · Updated 9 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆503 · Updated this week
- Qwen3-14B Orchestrator Agent Reinforcement Learning. **Achieved 160% improvement** on Stanford's TerminalBench ☆102 · Nov 3, 2025 · Updated 6 months ago
- MoE training for Me and You and maybe other people ☆386 · Mar 15, 2026 · Updated 2 months ago
- Harness for running and evaluating AI agents against RL environments ☆171 · May 13, 2026 · Updated last week
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆649 · Apr 27, 2026 · Updated 3 weeks ago
- Recursive Self-Aggregation evals on ARC-AGI ☆36 · Jan 26, 2026 · Updated 3 months ago
- ☆43 · May 11, 2026 · Updated last week
- ☆19 · Aug 10, 2024 · Updated last year
- Calling LLM APIs on a Raspberry Pi for lulz ☆24 · Apr 17, 2023 · Updated 3 years ago
- ☆34 · Mar 21, 2026 · Updated 2 months ago
- A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX Code ☆16 · Mar 19, 2023 · Updated 3 years ago
- 📊 LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual b… ☆63 · May 10, 2026 · Updated last week
- Production-Grade Autoresearch. Ideal for GPU kernels, ML model development, feature engineering, prompt engineering, and other optimizabl… ☆48 · May 7, 2026 · Updated 2 weeks ago
- Hand-Rolled GPU communications library ☆93 · Nov 25, 2025 · Updated 5 months ago
- moodist ☆28 · Apr 23, 2026 · Updated 3 weeks ago
- Implementation of StrongDM's Attractor spec (https://github.com/strongdm/attractor) in Rust ☆34 · May 14, 2026 · Updated last week
- An ergonomic, opinionated memory interface for AI agents ☆39 · Dec 18, 2025 · Updated 5 months ago
- Live-SWE-agent: live, runtime self-evolving software engineering agent ☆393 · Jan 19, 2026 · Updated 4 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆35 · Apr 17, 2025 · Updated last year
- Clean RL implementation using MLX ☆34 · Mar 8, 2024 · Updated 2 years ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆65 · Jul 8, 2024 · Updated last year
- a library for massive analyses of internal voids in biomolecules and ligand transport through them ☆10 · Updated this week
- Abstraction and Reasoning Corpus ☆14 · Nov 22, 2022 · Updated 3 years ago
- RACE is a multi-dimensional benchmark for code generation that focuses on Readability, mAintainability, Correctness, and Efficiency. ☆14 · Oct 12, 2024 · Updated last year
- Benchmark Large Language Models Reliably On Your Data ☆18 · Dec 27, 2025 · Updated 4 months ago
- ☆10 · Nov 6, 2024 · Updated last year
- Reached #1 on Stanford's Terminal Bench leaderboard. New SOTA on agentic coding. Sharing some insights on how it is built and some ablat… ☆67 · Nov 3, 2025 · Updated 6 months ago
- This repository provides an implementation of the DTi2Vec tool, to identify Drug-Target interaction using network embedding and ensemble … ☆12 · Sep 28, 2021 · Updated 4 years ago
- ☆136 · Oct 16, 2025 · Updated 7 months ago
- Analyse metabolic stability predictions using SHapley Additive exPlanations. ☆11 · Jul 26, 2023 · Updated 2 years ago
- ☆13 · Feb 5, 2024 · Updated 2 years ago
- ☆15 · May 17, 2022 · Updated 4 years ago
- Agent skill for managing Omarchy Linux systems with natural language ☆24 · Jan 5, 2026 · Updated 4 months ago
- A meta-repo that watches karpathy/autoresearch and adjacent systems, distills portable patterns for bounded agent-verifier research lo… ☆43 · May 8, 2026 · Updated last week
- ☆114 · Apr 10, 2026 · Updated last month
- NSA Triton Kernels written with GPT5 and Opus 4.1 ☆70 · Aug 12, 2025 · Updated 9 months ago
- run deepseek v3 on a single node. Drops unused experts from memory. ☆16 · Jan 26, 2025 · Updated last year