logic-star-ai / insights
We track and analyze the activity and performance of autonomous code agents in the wild
☆43Updated 2 months ago
Alternatives and similar repositories for insights
Users who are interested in insights compare it to the libraries listed below
- ☆112Updated 3 months ago
- ☆56Updated 7 months ago
- Coding problems used in aider's polyglot benchmark☆180Updated 9 months ago
- j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models.☆98Updated 2 months ago
- Harness used to benchmark aider against SWE Bench benchmarks☆76Updated last year
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?☆107Updated this week
- ☆133Updated this week
- ☆28Updated 3 months ago
- This repo tracks the opened and merged PRs by the top SWE coding agents by OpenAI, GitHub, and others. Updates every 3 hours.☆252Updated this week
- Public repository containing METR's DVC pipeline for eval data analysis☆110Updated 5 months ago
- Run SWE-bench evaluations remotely☆42Updated last month
- A framework for optimizing DSPy programs with RL☆182Updated this week
- Open Source Replication of Anthropic's Alignment Faking Paper☆50Updated 5 months ago
- A benchmark that challenges language models to code solutions for scientific problems☆143Updated this week
- RepoQA: Evaluating Long-Context Code Understanding☆117Updated 10 months ago
- Commit0: Library Generation from Scratch☆167Updated 4 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.☆183Updated 6 months ago
- [ICML '24] r2e: turn any github repository into a programming agent environment☆129Updated 5 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper.☆55Updated 6 months ago
- ☆224Updated 3 months ago
- A Text-Based Environment for Interactive Debugging☆266Updated this week
- SWE Arena☆34Updated 2 months ago
- Pivotal Token Search☆125Updated 2 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆62Updated 9 months ago
- accompanying material for sleep-time compute paper☆111Updated 4 months ago
- A better way of testing, inspecting, and analyzing AI Agent traces.☆40Updated this week
- ☆133Updated 6 months ago
- ☆148Updated 3 months ago
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, …☆41Updated last year
- Training-Ready RL Environments + Evals☆101Updated this week