commit-0 / commit0
Commit0: Library Generation from Scratch
☆149 Updated 3 weeks ago
Alternatives and similar repositories for commit0
Users who are interested in commit0 are comparing it to the repositories listed below
- A benchmark for LLMs on complicated tasks in the terminal ☆141 Updated this week
- Scaling Data for SWE-agents ☆220 Updated this week
- RepoQA: Evaluating Long-Context Code Understanding ☆108 Updated 7 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆173 Updated 2 months ago
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆119 Updated this week
- SWE Arena ☆33 Updated last month
- A simple unified framework for evaluating LLMs ☆215 Updated last month
- r2e: turn any github repository into a programming agent environment ☆121 Updated last month
- ⚖️ Awesome LLM Judges ⚖️ ☆103 Updated last month
- ☆114 Updated 3 months ago
- ☆58 Updated 2 weeks ago
- Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory ☆61 Updated last week
- ☆131 Updated 2 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆171 Updated 4 months ago
- Computer Agent Arena: Test & compare AI agents in real desktop apps & web environments. Code/data coming soon! ☆45 Updated last month
- ☆83 Updated last month
- Long context evaluation for large language models ☆213 Updated 3 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆38 Updated 3 weeks ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆67 Updated 2 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆169 Updated this week
- A benchmark that challenges language models to code solutions for scientific problems ☆123 Updated this week
- Train your own SOTA deductive reasoning model ☆92 Updated 2 months ago
- ☆126 Updated 2 months ago
- ☆41 Updated 4 months ago
- Open source interpretability artefacts for R1. ☆140 Updated last month
- Official repo for Learning to Reason for Long-Form Story Generation ☆60 Updated last month
- Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents ☆73 Updated last month
- Just a bunch of benchmark logs for different LLMs ☆119 Updated 10 months ago
- Functional Benchmarks and the Reasoning Gap ☆86 Updated 8 months ago
- Utilities for efficient fine-tuning, inference and evaluation of code generation models ☆21 Updated last year