commit-0 / commit0

Commit0: Library Generation from Scratch

☆160

Alternatives and similar repositories for commit0

Users who are interested in commit0 are comparing it to the repositories listed below.

r2e-project / r2e
r2e: turn any github repository into a programming agent environment
☆129 · Updated 3 months ago
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆175 · Updated 4 months ago
evalplus / repoqa
RepoQA: Evaluating Long-Context Code Understanding
☆113 · Updated 9 months ago
aorwall / moatless-tree-search
☆99 · Updated 2 months ago
eth-sri / matharena
Evaluation of LLMs on latest math competitions
☆155 · Updated 2 weeks ago
ScalingIntelligence / codemonkeys
☆41 · Updated 6 months ago
ServiceNow / PipelineRL
A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
☆134 · Updated last week
PrimeIntellect-ai / genesys
☆130 · Updated 4 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆103 · Updated 5 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆88 · Updated 10 months ago
magicproduct / hash-hop
Long context evaluation for large language models
☆220 · Updated 5 months ago
SWE-bench / SWE-smith
Scaling Data for SWE-agents
☆328 · Updated this week
BigComputer-Project / SWE-Arena
SWE Arena
☆33 · Updated last month
felipemaiapolo / tinyBenchmarks
Evaluating LLMs with fewer examples
☆160 · Updated last year
InternLM / SWE-Fixer
☆108 · Updated 2 months ago
letta-ai / sleep-time-compute
accompanying material for sleep-time compute paper
☆99 · Updated 3 months ago
data-for-agents / insta
Official Repo for InSTA: Towards Internet-Scale Training For Agents
☆52 · Updated 3 weeks ago
SalesforceAIResearch / LaTRO
☆118 · Updated 5 months ago
scicode-bench / SciCode
A benchmark that challenges language models to code solutions for scientific problems
☆127 · Updated last week
withmartian / routerbench
The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System
☆131 · Updated last year
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆157 · Updated 3 months ago
google-deepmind / mishax
☆136 · Updated 4 months ago
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119 · Updated last year
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 · Updated 6 months ago
haizelabs / j1-micro
j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models.
☆95 · Updated 2 weeks ago
haizelabs / Awesome-LLM-Judges
⚖️ Awesome LLM Judges ⚖️
☆108 · Updated 3 months ago
sher222 / LeReT
Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
☆49 · Updated 9 months ago
METR / RE-Bench
☆95 · Updated 3 months ago
Alex-Gurung / ReasoningNCP
Official repo for Learning to Reason for Long-Form Story Generation
☆68 · Updated 3 months ago
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆121 · Updated last week