Software-Engineering-Arena/SWE-Chatbot-Arena

Compare chatbots pairwise via multi‑round evaluations for SE tasks.

☆15

Alternatives and similar repositories for SWE-Chatbot-Arena

Users who are interested in SWE-Chatbot-Arena are comparing it to the repositories listed below.

SAILResearch / AgenticSZZ
☆15 · May 7, 2026 · Updated 2 months ago
AI4Scientist / learn-auto-research
AutoResearch official style beginner tutorial, from 0 to 1
☆18 · May 8, 2026 · Updated 2 months ago
zhimin-z / zhimin-z
☆12 · Jun 19, 2026 · Updated last month
AI4Maths / awesome-interactive-theorem-prover
A curated list of awesome interactive theorem prover frameworks
☆23 · Jun 19, 2026 · Updated last month
Solo-Entrepreneur / solopreneur
How to Start a Startup - AI Agent Skill
☆25 · Apr 17, 2026 · Updated 3 months ago
zhimin-z / Rigid-Body-Simulation
☆11 · Oct 15, 2022 · Updated 3 years ago
davidlee / spec-driver
You install it. Claude drives the CLI tool. At first it might seem like too much. Eventually, nothing less will make sense.
☆26 · Jun 2, 2026 · Updated last month
acnlabs / MetaSpec
Meta-specification framework for AI Agents to generate Spec-driven X toolkits automatically.
☆50 · Nov 22, 2025 · Updated 7 months ago
AI4Scientist / awesome-autoresearch
A curated list of awesome autonomous researcher frameworks
☆141 · Jul 15, 2026 · Updated last week
Engineering4AI / awesome-spec-driven-development
A curated list of awesome resources for spec-driven development (SDD)
☆212 · Jul 4, 2026 · Updated 2 weeks ago
EthicalML / awesome-production-agentic-systems
A curated list of awesome open source libraries to deploy, monitor, version and scale agentic applications and systems
☆154 · Jul 1, 2026 · Updated 3 weeks ago
SAILResearch / awesome-ai-leaderboard
A curated list of awesome leaderboard-oriented resources for AI domain
☆372 · Updated this week
zhimin-z / awesome-awesome-artificial-intelligence
A curated list of awesome curated lists of many topics related to artificial intelligence.
☆224 · Jul 9, 2026 · Updated last week
gomate-community / rageval
Evaluation tools for Retrieval-augmented Generation (RAG) methods.
☆171 · Nov 18, 2024 · Updated last year
modit-team / MODIT
MODIT: On Multi-Modal Learning of Editing Source Code.
☆20 · Apr 24, 2021 · Updated 5 years ago
9erxis / DietCode
☆20 · Mar 6, 2023 · Updated 3 years ago
ASSERT-KTH / cigar
Efficient APR with LLMs http://arxiv.org/pdf/2402.06598
☆16 · May 28, 2024 · Updated 2 years ago
clowee / OpenSZZ-Cloud-Native
SZZ Algorithm To Detect Fault-Inducing Commits
☆51 · Oct 24, 2023 · Updated 2 years ago
zzwjames / FailureLLMUnlearning
An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025)
☆39 · Feb 22, 2025 · Updated last year
xiaolin-cs / BackTime
BackTime: Backdoor Attacks on Multivariate Time Series Forecasting
☆32 · Apr 14, 2025 · Updated last year
EngineeringSoftware / CoditT5
CoditT5: Pretraining for Source Code and Natural Language Editing
☆29 · Jan 16, 2025 · Updated last year
mdrafiqulrabin / tnpa-generalizability
IST'21 & SANER'22: Semantic-Preserving Program Transformations
☆31 · Oct 25, 2022 · Updated 3 years ago
saikat107 / NatGen
☆41 · Jan 13, 2023 · Updated 3 years ago
NJUDeepEngine / CAEF
Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"
☆11 · Oct 11, 2024 · Updated last year
SkyRiver-2000 / RuleArena
[ACL 2025] RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios
☆29 · Jul 2, 2025 · Updated last year
datawhalechina / key-book
Proofs, case studies, supplementary concept notes, and reference walkthroughs for the book 《机器学习理论导引》 ("Introduction to Machine Learning Theory", nicknamed the "treasure chest book").
☆1,710 · Jul 7, 2026 · Updated 2 weeks ago
lunary-ai / llm-benchmarks
LLM benchmarks
☆13 · Feb 22, 2024 · Updated 2 years ago
CLUEbenchmark / SuperCLUE-Fin
Evaluation benchmark for Chinese financial large language models: six categories, twenty-five tasks, and graded ratings; domestic (Chinese) models achieved grade A.
☆10 · May 6, 2024 · Updated 2 years ago
NougatCA / SPT-Code
☆49 · Nov 19, 2025 · Updated 8 months ago
soarsmu / attack-pretrain-models-of-code
Replication Package for "Natural Attack for Pre-trained Models of Code", ICSE 2022
☆52 · May 31, 2026 · Updated last month
hannahxchen / automatic-paraphrase-dataset-augmentation
Code and data for automatic paraphrase dataset augmentation.
☆11 · Mar 8, 2021 · Updated 5 years ago
SciMT / SciMT-benchmark
☆11 · Jan 3, 2024 · Updated 2 years ago
kennymckormick / ARAS-Dataset
☆11 · Nov 5, 2024 · Updated last year
CLUEbenchmark / LGEB
LGEB: Benchmark of Language Generation Evaluation
☆16 · Oct 21, 2022 · Updated 3 years ago
StonyBrookNLP / tellmewhy
Website for release of TellMeWhy dataset for why question answering
☆14 · Nov 11, 2022 · Updated 3 years ago
magkai / CROWN
Code for our project CROWN (Conversational Passage Ranking by Reasoning over Word Networks)
☆10 · Jan 11, 2024 · Updated 2 years ago
HallerPatrick / pecc
[LREC-Coling 2024] PECC: Problem Extraction and Coding Challenges
☆14 · May 30, 2024 · Updated 2 years ago
gipplab / MathQA
Math-aware QA system
☆18 · May 8, 2026 · Updated 2 months ago
csitfun / ConTRoL-dataset
Dataset for AAAI paper "Natural Language Inference in Context - Investigating Contextual Reasoning over Long Texts"
☆11 · Nov 18, 2022 · Updated 3 years ago