commit-0 / commit0
Commit0: Library Generation from Scratch
☆161 Updated 3 months ago
Alternatives and similar repositories for commit0
Users who are interested in commit0 are comparing it to the libraries listed below
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆177 Updated 5 months ago
- r2e: turn any github repository into a programming agent environment ☆130 Updated 4 months ago
- RepoQA: Evaluating Long-Context Code Understanding ☆115 Updated 9 months ago
- Long context evaluation for large language models ☆220 Updated 5 months ago
- Train your own SOTA deductive reasoning model ☆104 Updated 5 months ago
- ☆41 Updated 7 months ago
- ☆108 Updated 2 months ago
- Evaluation of LLMs on latest math competitions ☆160 Updated 2 weeks ago
- ☆130 Updated 5 months ago
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆137 Updated this week
- Storing long contexts in tiny caches with self-study ☆140 Updated last week
- ⚖️ Awesome LLM Judges ⚖️ ☆122 Updated 4 months ago
- Small, simple agent task environments for training and evaluation ☆18 Updated 9 months ago
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval ☆49 Updated 9 months ago
- Evaluating LLMs with fewer examples ☆160 Updated last year
- ☆89 Updated 7 months ago
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆132 Updated last year
- SWE Arena ☆33 Updated last month
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 Updated 7 months ago
- j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models. ☆95 Updated last month
- A simple unified framework for evaluating LLMs ☆240 Updated 4 months ago
- ☆120 Updated 6 months ago
- ☆54 Updated last year
- ☆98 Updated 4 months ago
- A benchmark that challenges language models to code solutions for scientific problems ☆132 Updated last week
- LILO: Library Induction with Language Observations ☆88 Updated 11 months ago
- Evaluating LLMs with CommonGen-Lite ☆91 Updated last year
- Scaling Data for SWE-agents ☆378 Updated this week
- ☆139 Updated last week
- accompanying material for sleep-time compute paper ☆105 Updated 3 months ago