SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents
☆81 · Feb 6, 2026 · Updated last month
Alternatives and similar repositories for SWE-PolyBench
Users who are interested in SWE-PolyBench are comparing it to the repositories listed below.
- Official implementation for the paper, StackEval: Benchmarking LLMs in Coding Assistance, https://arxiv.org/abs/2412.05288 ☆20 · Oct 30, 2024 · Updated last year
- Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving ☆326 · Dec 18, 2025 · Updated 3 months ago
- Building an Intelligent AWS Cloud Engineer Agent with Strands Agents SDK ☆24 · Dec 16, 2025 · Updated 3 months ago
- ☆11 · Sep 7, 2023 · Updated 2 years ago
- Agentless Lite: RAG-based SWE-Bench software engineering scaffold ☆45 · Apr 15, 2025 · Updated 11 months ago
- ☆37 · May 15, 2025 · Updated 10 months ago
- ☆28 · Aug 13, 2025 · Updated 7 months ago
- Viewer for text datasets in formats like HuggingFace, JSONL, etc. ☆15 · Feb 25, 2025 · Updated last year
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆597 · Updated this week
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆650 · Jul 29, 2025 · Updated 7 months ago
- A collection of scripts and tools for analyzing SWE agents. ☆16 · May 7, 2025 · Updated 10 months ago
- ☆13 · Apr 2, 2018 · Updated 7 years ago
- Building RESTful API with Laravel [Video], published by Packt ☆12 · Jan 15, 2021 · Updated 5 years ago
- The official repository of the Omni-MATH benchmark. ☆93 · Dec 22, 2024 · Updated last year
- LLM benchmarks ☆13 · Feb 22, 2024 · Updated 2 years ago
- A repository of code examples to accompany the LSU CSC7809/7700/47000 course on AI foundation models. ☆13 · Apr 5, 2025 · Updated 11 months ago
- This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software E… ☆1,439 · Jul 18, 2025 · Updated 8 months ago
- SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner ☆35 · Jun 29, 2025 · Updated 8 months ago
- ☆11 · Mar 15, 2024 · Updated 2 years ago
- Mass Android app vulnerability analysis toolkit ☆13 · Dec 6, 2016 · Updated 9 years ago
- [NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation ☆73 · Mar 13, 2026 · Updated last week
- Capsule networks can defend against adversarial attacks using reconstruction error ☆13 · May 24, 2018 · Updated 7 years ago
- Code for our paper "Learning to Generate Unit Tests for Automated Debugging" ☆17 · Mar 7, 2025 · Updated last year
- Training tiny models to prove hard theorems ☆59 · Mar 5, 2026 · Updated 2 weeks ago
- CLI to extract article contents in bulk using Newspaper3k and multithreading. ☆12 · Apr 15, 2018 · Updated 7 years ago
- Open-source coding assistant for Visual Studio Code. Connect to LLMs from OpenAI or Google. ☆18 · Aug 14, 2023 · Updated 2 years ago
- Code for "SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER" @COLING-2022 ☆11 · Aug 20, 2022 · Updated 3 years ago
- Simple pub/sub architecture with AWS Copilot ☆10 · Feb 20, 2026 · Updated 3 weeks ago
- ☆25 · Mar 2, 2026 · Updated 2 weeks ago
- ☆11 · Oct 17, 2019 · Updated 6 years ago
- Minimum DevSecOps with Monitoring Options on Amazon EKS ☆13 · Feb 25, 2026 · Updated 3 weeks ago
- AI for Mathematics Paper List ☆17 · Jan 14, 2025 · Updated last year
- The very simple ETS wrapper simplifying cross-process ETS handling (like `Agent`, but `:ets`). ☆13 · Jun 7, 2019 · Updated 6 years ago
- leveldb backed mail repl. ☆10 · May 5, 2015 · Updated 10 years ago
- Verification Layer for Claude Code ☆87 · Updated this week
- Multilingual Code Co-Evolution Using Large Language Models ☆13 · Dec 8, 2024 · Updated last year
- ☆13 · Aug 12, 2022 · Updated 3 years ago
- A game for experimenting with sensorimotor AI. ☆16 · May 9, 2014 · Updated 11 years ago
- ☆12 · May 13, 2022 · Updated 3 years ago