itbench-hub / ITBench-Scenarios
Code repository for scenarios and environment setup as part of ITBench
☆12 · Updated this week
Alternatives and similar repositories for ITBench-Scenarios
Users interested in ITBench-Scenarios are comparing it to the repositories listed below.
- A holistic framework to enable the design, development, and evaluation of autonomous AIOps agents. ☆12 · Updated 5 months ago
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models". ☆83 · Updated last year
- ☆11 · Updated 9 months ago
- [ICSE'25] Aligning the Objective of LLM-based Program Repair ☆20 · Updated 7 months ago
- Cloud incidents/failures related work. ☆19 · Updated 9 months ago
- LILAC: Log Parsing using LLMs with Adaptive Parsing Cache [FSE'24] ☆57 · Updated last year
- [NeurIPS'25] Official implementation of RISE (Reinforcing Reasoning with Self-Verification) ☆30 · Updated 2 months ago
- [ICLR'25] OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures? ☆187 · Updated last week
- Code repository for the SRE agent as part of ITBench ☆18 · Updated last month
- Reinforcement Learning for Repository-Level Code Completion ☆40 · Updated last year
- AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection [ASE'23] ☆40 · Updated last year
- [ICML '24] R2E: Turn any GitHub Repository into a Programming Agent Environment ☆133 · Updated 6 months ago
- XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts ☆35 · Updated last year
- A series of work towards achieving ACV. ☆21 · Updated last month
- A toolkit for hybrid log parsing ☆18 · Updated 2 years ago
- Log Parsing: How Far Can ChatGPT Go? (ASE 2023 - NIER Track) ☆21 · Updated last year
- EvoEval: Evolving Coding Benchmarks via LLM ☆79 · Updated last year
- [ICSE 2023] Differentiable interpretation and failure-inducing input generation for neural network numerical bugs. ☆13 · Updated last year
- Code and data to reproduce the ASPLOS'23 paper "ShapleyIQ: Influence Quantification by Shapley Values for Performance Debugging of Micros…" ☆13 · Updated last year
- ☆14 · Updated 6 months ago
- This is the artifact for the paper "Are Machine Learning Cloud APIs Used Correctly? (#421)" in ICSE 2021 ☆16 · Updated 4 years ago
- Large Language Models for Software Engineering ☆250 · Updated 3 months ago
- A Comprehensive Benchmark for Software Development. ☆115 · Updated last year
- DeepCrime - Mutation Testing Tool for Deep Learning Systems ☆15 · Updated 2 years ago
- A collection of practical code generation tasks and tests in open source projects. Complementary to HumanEval by OpenAI. ☆152 · Updated 10 months ago
- This repo is for our submission to ICSE 2025. ☆20 · Updated last year
- Benchmark ClassEval for class-level code generation. ☆145 · Updated last year
- A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories ☆33 · Updated last year
- ☁️ Benchmarking LLMs for Cloud Config Generation | LLM benchmarking for cloud scenarios ☆37 · Updated last year
- [NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test generation ☆58 · Updated last month