Benchmarking Goal-Oriented Software Engineering
☆165 May 5, 2026 Updated last month
Alternatives and similar repositories for CodeClash
Users that are interested in CodeClash are comparing it to the libraries listed below.
- a Python library that uses Reinforcement Learning (RL) to train LLMs. ☆43 Apr 15, 2026 Updated last month
- TDD-Bench-Verified is a new benchmark for generating test cases for test-driven development (TDD) ☆31 Apr 28, 2026 Updated last month
- Run SWE-bench evaluations remotely ☆69 Aug 14, 2025 Updated 9 months ago
- ☆14 Apr 16, 2025 Updated last year
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆519 Jun 1, 2026 Updated last week
- Qwen3-14B Orchestrator Agent Reinforcement Learning. **Achieved 160% improvement** on Stanford's TerminalBench ☆101 Nov 3, 2025 Updated 7 months ago
- MoE training for Me and You and maybe other people ☆386 Mar 15, 2026 Updated 2 months ago
- Harness for running and evaluating AI agents against RL environments ☆189 Updated this week
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆672 Updated this week
- ☆19 Aug 10, 2024 Updated last year
- Programmable chat templates for LLM training and inference. ☆109 Updated this week
- Calling LLM APIs on a Raspberry Pi for lulz ☆24 Apr 17, 2023 Updated 3 years ago
- ☆34 Mar 21, 2026 Updated 2 months ago
- A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX Code ☆16 Mar 19, 2023 Updated 3 years ago
- 📊 LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual b… ☆67 Jun 4, 2026 Updated last week
- Hand-Rolled GPU communications library ☆94 Nov 25, 2025 Updated 6 months ago
- Production-Grade Autoresearch. Ideal for GPU kernels, ML model development, feature engineering, prompt engineering, and other optimizabl… ☆51 Updated this week
- moodist ☆28 Apr 23, 2026 Updated last month
- An ergonomic, opinionated memory interface for AI agents ☆39 Dec 18, 2025 Updated 5 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆35 Apr 17, 2025 Updated last year
- Clean RL implementation using MLX ☆34 Mar 8, 2024 Updated 2 years ago
- 🌳 MCTS-inspired parallel beam search for conversation optimization. Explore multiple dialogue strategies simultaneously, stress-test a… ☆36 Jan 18, 2026 Updated 4 months ago
- RACE is a multi-dimensional benchmark for code generation that focuses on Readability, mAintainability, Correctness, and Efficiency. ☆14 Oct 12, 2024 Updated last year
- Abstraction and Reasoning Corpus ☆14 Nov 22, 2022 Updated 3 years ago
- Benchmark Large Language Models Reliably On Your Data ☆18 Dec 27, 2025 Updated 5 months ago
- RISC-V vector extension ISA simulation ☆18 Jun 11, 2019 Updated 7 years ago
- ☆10 Nov 6, 2024 Updated last year
- A lightweight computational physics framework, based on the organization of turboWAVE. Implements a "Simulation, PhysicsModule, ComputeTo… ☆12 Apr 1, 2026 Updated 2 months ago
- ☆139 Oct 16, 2025 Updated 7 months ago
- PIRA - Automatic Instrumentation Refinement ☆17 Mar 28, 2024 Updated 2 years ago
- ☆12 Mar 15, 2024 Updated 2 years ago
- a benchmark to evaluate the situated inductive reasoning ☆16 Jan 7, 2025 Updated last year
- nyc is so back ☆21 Jun 27, 2025 Updated 11 months ago
- run deepseek v3 on a single node. Drops unused experts from memory. ☆16 Jan 26, 2025 Updated last year
- ☆12 Mar 3, 2023 Updated 3 years ago
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆40 Jul 13, 2024 Updated last year
- ☆45 Jan 10, 2026 Updated 5 months ago
- in this repository, i'm going to implement increasingly complex llm inference optimizations ☆85 May 22, 2025 Updated last year
- Agentic RL Training at Scale ☆1,427 Updated this week