logic-star-ai / insights
We track and analyze the activity and performance of autonomous code agents in the wild
☆47 · Updated 2 weeks ago
Alternatives and similar repositories for insights
Users who are interested in insights are comparing it to the libraries listed below.
- Harness used to benchmark aider against SWE Bench benchmarks ☆78 · Updated last year
- Public repository containing METR's DVC pipeline for eval data analysis ☆164 · Updated 8 months ago
- Harbor is a framework for running agent evaluations and creating and using RL environments. ☆213 · Updated this week
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆40 · Updated 2 months ago
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆228 · Updated last week
- A Text-Based Environment for Interactive Debugging ☆286 · Updated this week
- ☆128 · Updated 6 months ago
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? ☆233 · Updated last month
- ☆59 · Updated 10 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆190 · Updated 9 months ago
- Curated collection of community environments ☆195 · Updated last week
- Run SWE-bench evaluations remotely ☆47 · Updated 4 months ago
- ☆29 · Updated 2 months ago
- This repo tracks the opened and merged PRs by the top SWE coding agents by OpenAI, GitHub, and others. Updates every 3 hours. ☆296 · Updated this week
- ☆68 · Updated 6 months ago
- Coding problems used in aider's polyglot benchmark ☆198 · Updated last year
- SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents ☆75 · Updated this week
- ☆207 · Updated last week
- ☆136 · Updated 9 months ago
- A framework for optimizing DSPy programs with RL ☆298 · Updated last month
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you… ☆81 · Updated last year
- Storing long contexts in tiny caches with self-study ☆226 · Updated 2 weeks ago
- A system that tries to resolve all issues on a github repo with OpenHands. ☆117 · Updated last year
- Agent computer interface for AI software engineer. ☆115 · Updated 2 weeks ago
- Prompts used in the Automated Auditing Blog Post ☆127 · Updated 5 months ago
- [ICML '24] R2E: Turn any GitHub Repository into a Programming Agent Environment ☆138 · Updated 8 months ago
- ☆80 · Updated last month
- Functional Benchmarks and the Reasoning Gap ☆90 · Updated last year
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆396 · Updated this week
- [NeurIPS '25] Challenging Software Optimization Tasks for Evaluating SWE-Agents ☆58 · Updated 3 weeks ago