autoiac-project / iac-eval
[NeurIPS 24] IaC-Eval: A Code Generation Benchmark for Cloud Infrastructure-as-Code programs
☆34 · Updated last year
Alternatives and similar repositories for iac-eval
Users interested in iac-eval are comparing it to the repositories listed below.
- Zodiac: Unearthing Semantic Checks for Cloud Infrastructure-as-Code Programs, SOSP 2024 ☆15 · Updated last year
- A holistic framework to enable the design, development, and evaluation of autonomous AIOps agents ☆12 · Updated 8 months ago
- Course information for CS598: Topics in LLM Agents (Spring '25), taught by Prof. Jiaxuan You (jiaxuan@illinois.edu) ☆62 · Updated 9 months ago
- ☆35 · Updated 4 months ago
- AI-Driven Research Systems (ADRS) ☆119 · Updated last month
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆209 · Updated last year
- Predict the performance of LLM inference services ☆21 · Updated 4 months ago
- [ICML '24] R2E: Turn any GitHub Repository into a Programming Agent Environment ☆140 · Updated 9 months ago
- RouterArena: An open framework for evaluating LLM routers with standardized datasets, metrics, an automated framework, and a live leaderb… ☆66 · Updated this week
- ☆331 · Updated 6 months ago
- (ACL 2025 Main) Code for MultiAgentBench: Evaluating the Collaboration and Competition of LLM Agents https://www.arxiv.org/pdf/2503.019… ☆217 · Updated 3 months ago
- A Framework for Automated Validation of Deep Learning Training Tasks ☆61 · Updated last month
- Serverless LLM Serving for Everyone ☆647 · Updated 2 weeks ago
- An AI-Native Platform for Benchmarking SRE Agents ☆94 · Updated this week
- Code repository for scenarios and environment setup as part of ITBench ☆15 · Updated last week
- Systems for GenAI ☆159 · Updated this week
- A canonical source of GenAI energy benchmarks and measurements ☆50 · Updated 2 months ago
- ☁️ Benchmarking LLMs for Cloud Config Generation | LLM benchmarking for cloud scenarios ☆39 · Updated last year
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning models without training ☆220 · Updated 8 months ago
- ☆36 · Updated 11 months ago
- The official implementation for the paper "PENCIL: Long Thoughts with Short Memory" ☆73 · Updated 9 months ago
- ☆43 · Updated last year
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆69 · Updated last year
- "FusionFactory: Fusing LLM Capabilities with Routing Data", Tao Feng, Haozhen Zhang, Zijie Lei, Pengrui Han, Mostofa Patwary, Mohammad Sh… ☆19 · Updated last month
- ☆47 · Updated last year
- ☆15 · Updated last year
- LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step (ACL'24) ☆576 · Updated last year
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models" ☆85 · Updated last year
- [ICLR 2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation ☆248 · Updated last year
- [NeurIPS 2025] CodeCrash: Exposing LLM Fragility to Misleading Natural Language in Code Reasoning ☆16 · Updated 2 weeks ago