logic-star-ai / insights
We track and analyze the activity and performance of autonomous code agents in the wild
☆38 Updated last month
Alternatives and similar repositories for insights
Users who are interested in insights compare it with the repositories listed below
- This repo tracks the opened and merged PRs by the top SWE coding agents by OpenAI, GitHub, and others. Updates every 3 hours. ☆243 Updated this week
- A Text-Based Environment for Interactive Debugging ☆257 Updated last week
- Coding problems used in aider's polyglot benchmark ☆175 Updated 8 months ago
- Public repository containing METR's DVC pipeline for eval data analysis ☆104 Updated 4 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆54 Updated 5 months ago
- ☆28 Updated 2 months ago
- ☆109 Updated 2 months ago
- Experimental wasm32-unknown-wasi runtime for Python code execution ☆37 Updated 9 months ago
- j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models. ☆96 Updated last month
- Pivotal Token Search ☆123 Updated last month
- A framework for optimizing DSPy programs with RL ☆151 Updated last week
- Prompts used in the Automated Auditing Blog Post ☆93 Updated last month
- This repository contains the toolkit for replicating results from our technical report. ☆120 Updated last month
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆40 Updated last month
- ☆163 Updated 8 months ago
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆54 Updated 3 months ago
- A benchmarking tool for evaluating AI coding assistants on real-world software engineering tasks from the SWE-Bench dataset. ☆53 Updated 2 months ago
- Condense source code for LLM analysis by extracting essential highlights, utilizing a simplified version of Paul Gauthier's repomap techn… ☆13 Updated last year
- Red-Teaming Language Models with DSPy ☆212 Updated 6 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆296 Updated last week
- ☆133 Updated 5 months ago
- Just a bunch of benchmark logs for different LLMs ☆120 Updated last year
- Alice in Wonderland code base for experiments and raw experiments data ☆131 Updated last month
- ☆145 Updated 2 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆61 Updated 8 months ago
- ☆78 Updated 2 weeks ago
- Official Repo for CRMArena and CRMArena-Pro ☆110 Updated 2 months ago
- ☆46 Updated 7 months ago
- Foyle is a copilot to help developers deploy and operate their applications. ☆131 Updated 5 months ago
- Modular, open source LLMOps stack that separates concerns: LiteLLM unifies LLM APIs, manages routing and cost controls, and ensures high-… ☆111 Updated 6 months ago