☆50 · Apr 7, 2026 · Updated 2 months ago
Alternatives and similar repositories for SWE-QA-Bench
Users that are interested in SWE-QA-Bench are comparing it to the libraries listed below.
- something for paper agent ☆11 · Dec 18, 2024 · Updated last year
- A framework for evaluating the effectiveness of chain-of-thought reasoning in language models. ☆19 · Feb 6, 2025 · Updated last year
- Multi-Granularity LLM Debugger [ICSE2026] ☆98 · Jul 6, 2025 · Updated 11 months ago
- ☆49 · Oct 28, 2025 · Updated 7 months ago
- Interface for GenAI-Arena [NeurIPS24] ☆17 · Feb 27, 2024 · Updated 2 years ago
- Must-read papers on Repository-level Code Generation & Issue Resolution 🔥 ☆308 · May 23, 2026 · Updated 3 weeks ago
- Guide: from fragile multi-agent app to prod ready with orra - code and resources. ☆14 · Mar 24, 2025 · Updated last year
- AI powered coding Agent ☆37 · Oct 22, 2025 · Updated 7 months ago
- This is the repository for the paper titled "ThinkRepair: Self-Directed Automated Program Repair" accepted by ISSTA'24. ☆32 · Jan 10, 2026 · Updated 5 months ago
- [NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation ☆83 · Apr 28, 2026 · Updated last month
- Ultron: Collective Intelligence System — Shared Memories, Skills, and Harnesses Across Every Agent ☆155 · Updated this week
- Defect Library for LLM-enabled Software ☆25 · Dec 31, 2025 · Updated 5 months ago
- ☆22 · May 28, 2026 · Updated 3 weeks ago
- ☆168 · Mar 18, 2026 · Updated 3 months ago
- DSN jailbreak Attack & Evaluation Ensemble ☆17 · Feb 7, 2026 · Updated 4 months ago
- TDD-Bench-Verified is a new benchmark for generating test cases for test-driven development (TDD) ☆31 · Apr 28, 2026 · Updated last month
- [ICLR 2026] Official Implementation of "FeatureBench: Benchmarking Agentic Coding for Complex Feature Development" ☆76 · Updated this week
- A tool for simulating an arbitrary connection between two network endpoints ☆19 · May 31, 2019 · Updated 7 years ago
- ☆48 · Jan 6, 2025 · Updated last year
- A modern, blazing-fast SQL IDE for the cloud era. Query PostgreSQL, MySQL, SQLite & MongoDB from anywhere — your browser is your new data… ☆53 · Updated this week
- ☆17 · Feb 4, 2025 · Updated last year
- Twinkle✨: Training workbench to make your model glow. ☆238 · Jun 12, 2026 · Updated last week
- ☆92 · Mar 30, 2026 · Updated 2 months ago
- Implementation of a multi-turn Chain of Thought (CoT) reasoning system, powered by the Llama 3.1 70B model on Groq. ☆18 · Sep 22, 2024 · Updated last year
- Robotic arm using dynamixels ☆17 · Jan 17, 2021 · Updated 5 years ago
- [EMNLP 2023] CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation ☆59 · Nov 16, 2023 · Updated 2 years ago
- MCE: Clone Human Souls with LLM Native Agent Skills | Mind-cloning engineering based on LLM Agent Skills | Agent Skills | Mind Skills | Mind Clone ☆54 · Dec 21, 2025 · Updated 5 months ago
- Advanced Shodan-based scanner for discovering, verifying, and enumerating Model Context Protocol (MCP) servers and AI infrastructure tool… ☆49 · Jun 7, 2026 · Updated last week
- A production-grade implementation of an Investment Portfolio Management System created for testing LLM translation of real world legacy a… ☆26 · Oct 30, 2024 · Updated last year
- ☆69 · Mar 12, 2026 · Updated 3 months ago
- Interactive HTML Sankey Charts for MoneyMoney based on Transaction Categories ☆28 · Updated this week
- ☆44 · Jun 5, 2026 · Updated last week
- Build agents on modern serverless infra for low overhead and high-powered AI app functionality ☆30 · Jul 3, 2025 · Updated 11 months ago
- ToolFuzz is a fuzzing framework designed to test your LLM Agent tools. ☆41 · Jul 20, 2025 · Updated 10 months ago
- ☆125 · May 13, 2026 · Updated last month
- [ACL 2026] Repository of IPBench ☆23 · Apr 6, 2026 · Updated 2 months ago
- JAX Scalify: end-to-end scaled arithmetics ☆18 · Oct 30, 2024 · Updated last year
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given… ☆15 · Oct 16, 2023 · Updated 2 years ago
- TensorRT installation and Conversion from PyTorch Models ☆36 · Sep 14, 2020 · Updated 5 years ago