Benchmarking Goal-Oriented Software Engineering
☆122 Jan 7, 2026 Updated 2 months ago
Alternatives and similar repositories for CodeClash
Users that are interested in CodeClash are comparing it to the libraries listed below
- a Python library that uses Reinforcement Learning (RL) to train LLMs. ☆43 Updated this week
- Run SWE-bench evaluations remotely ☆60 Aug 14, 2025 Updated 7 months ago
- ☆14 Apr 16, 2025 Updated 11 months ago
- Harness for running and evaluating AI agents against RL environments ☆132 Mar 6, 2026 Updated 2 weeks ago
- MoE training for Me and You and maybe other people ☆375 Updated this week
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆597 Updated this week
- On-Page SEO Analyzer - Analyze your website's SEO performance ☆29 Mar 14, 2026 Updated last week
- A comprehensive tool for assessing AI Agents performance in simulated poker environments ☆21 Nov 27, 2024 Updated last year
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆453 Updated this week
- ☆34 Mar 5, 2026 Updated 2 weeks ago
- Live-SWE-agent: live, runtime self-evolving software engineering agent ☆334 Jan 19, 2026 Updated 2 months ago
- Hand-Rolled GPU communications library ☆87 Nov 25, 2025 Updated 3 months ago
- moodist ☆25 Mar 13, 2026 Updated last week
- Clean RL implementation using MLX ☆34 Mar 8, 2024 Updated 2 years ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆64 Jul 8, 2024 Updated last year
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆35 Apr 17, 2025 Updated 11 months ago
- An ergonomic, opinionated memory interface for AI agents ☆39 Dec 18, 2025 Updated 3 months ago
- ☆24 Updated this week
- Generate PDFs using Cloudflare Workers and Browser Rendering ☆47 Jun 20, 2025 Updated 9 months ago
- Abstraction and Reasoning Corpus ☆14 Nov 22, 2022 Updated 3 years ago
- A lightweight computational physics framework, based on the organization of turboWAVE. Implements a "Simulation, PhysicsModule, ComputeTo… ☆11 Jun 13, 2023 Updated 2 years ago
- ☆10 Nov 6, 2024 Updated last year
- PIRA - Automatic Instrumentation Refinement ☆16 Mar 28, 2024 Updated last year
- ☆15 May 17, 2022 Updated 3 years ago
- ☆13 Feb 5, 2024 Updated 2 years ago
- ☆11 Mar 15, 2024 Updated 2 years ago
- a benchmark to evaluate the situated inductive reasoning ☆15 Jan 7, 2025 Updated last year
- nyc is so back ☆21 Jun 27, 2025 Updated 8 months ago
- A meta-repo that watches karpathy/autoresearch and adjacent systems, distills portable patterns for bounded agent-verifier research lo… ☆38 Mar 11, 2026 Updated last week
- NSA Triton Kernels written with GPT5 and Opus 4.1 ☆71 Aug 12, 2025 Updated 7 months ago
- ☆12 Mar 3, 2023 Updated 3 years ago
- tuimorphic choose-your-own-adventure story game ☆18 Mar 3, 2026 Updated 2 weeks ago
- [ACL 2021] Learning to Perturb Word Embeddings for Out-of-distribution QA ☆16 May 11, 2022 Updated 3 years ago
- A benchmark for LLMs on complicated tasks in the terminal ☆1,732 Jan 22, 2026 Updated 2 months ago
- Official Code Repository for the paper "Generative Modeling on Manifolds Through Mixture of Riemannian Diffusion Processes" (ICML 2024). ☆15 Jul 21, 2024 Updated last year
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆40 Jul 13, 2024 Updated last year
- in this repository, i'm going to implement increasingly complex llm inference optimizations ☆84 May 22, 2025 Updated 9 months ago
- Async RL Training at Scale ☆1,156 Updated this week
- A series of high-performance GEMM (General Matrix Multiply) implementations Iteratively optimised for H100 GPUs in Pure CUDA. ☆73 Feb 18, 2026 Updated last month