itbench-hub / ITBench-Scenarios
Code repository for scenarios and environment setup as part of ITBench
☆14 · Updated last week
Alternatives and similar repositories for ITBench-Scenarios
Users interested in ITBench-Scenarios are comparing it to the repositories listed below:
- A holistic framework to enable the design, development, and evaluation of autonomous AIOps agents. ☆12 · Updated 7 months ago
- ☆11 · Updated last year
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models". ☆85 · Updated last year
- [ICSE'25] Aligning the Objective of LLM-based Program Repair ☆23 · Updated 10 months ago
- [NeurIPS'25] Official Implementation of RISE (Reinforcing Reasoning with Self-Verification) ☆31 · Updated 5 months ago
- [ICLR'25] OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures? ☆242 · Updated last week
- Pip-compatible CodeBLEU metric implementation for Linux/macOS/Windows ☆128 · Updated 9 months ago
- LILAC: Log Parsing using LLMs with Adaptive Parsing Cache [FSE'24] ☆64 · Updated last year
- Enhancing AI Software Engineering with Repository-level Code Graph ☆246 · Updated 9 months ago
- ☆57 · Updated last year
- A lightweight tool for detecting bugs in Graph Database Management Systems ☆15 · Updated 2 years ago
- This repo is for our ICSE 2025 submission. ☆20 · Updated last year
- A toolkit for hybrid log parsing ☆18 · Updated 2 years ago
- [ICLR 2024] Is Self-Repair a Silver Bullet for Code Generation? ☆15 · Updated last year
- The ClassEval benchmark for class-level code generation. ☆145 · Updated last year
- RepairAgent is an autonomous LLM-based agent for software repair. ☆81 · Updated 5 months ago
- [LREC-COLING'24] HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization ☆40 · Updated 10 months ago
- A Code Efficiency Benchmark for Code Generation ☆13 · Updated 7 months ago
- Large Language Models for Software Engineering ☆259 · Updated 5 months ago
- [ICSE 2023] Differentiable interpretation and failure-inducing input generation for neural network numerical bugs. ☆13 · Updated 2 years ago
- A collection of practical code generation tasks and tests in open source projects. Complementary to HumanEval by OpenAI. ☆154 · Updated last year
- AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection [ASE'23] ☆41 · Updated last year
- [VLDB'25] LEAP: LLM-powered End-to-end Automatic Library for Processing Social Science Queries on Unstructured Data ☆19 · Updated 2 months ago
- EvoEval: Evolving Coding Benchmarks via LLM ☆80 · Updated last year
- An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories ☆67 · Updated last year
- Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving ☆308 · Updated last month
- Work related to cloud incidents and failures. ☆20 · Updated last year
- ☆33 · Updated 11 months ago
- [ICML '24] R2E: Turn any GitHub Repository into a Programming Agent Environment ☆138 · Updated 9 months ago
- Code and data for the paper: On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents ☆41 · Updated last month