Predict the performance of LLM inference services
☆23 · Sep 18, 2025 · Updated 9 months ago
Alternatives and similar repositories for LLM-performance-prediction
Users who are interested in LLM-performance-prediction are comparing it to the repositories listed below.
- Cloud Native Benchmarking of Foundation Models ☆45 · Jul 31, 2025 · Updated 11 months ago
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆134 · Feb 22, 2024 · Updated 2 years ago
- Serverless Paper Reading and Discussion ☆38 · Jan 9, 2023 · Updated 3 years ago
- Failure dataset accompanying the paper "How Bad Can a Bug Get? An Empirical Analysis of Software Failures in the OpenStack Cloud Computi… ☆10 · Jun 12, 2020 · Updated 6 years ago
- ☆20 · May 14, 2025 · Updated last year
- Simulator for the datacenter, including power, cooling, server and other components ☆19 · Feb 12, 2025 · Updated last year
- ☆20 · May 10, 2025 · Updated last year
- Releasing the spot availability traces used in "Can't Be Late" paper. ☆26 · Mar 31, 2024 · Updated 2 years ago
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24] ☆24 · Nov 21, 2024 · Updated last year
- Accurate, large-scale, and extensible simulator for LLM inference Systems ☆627 · Jul 25, 2025 · Updated 11 months ago
- ☆179 · Mar 12, 2024 · Updated 2 years ago
- TraceWeaver is a research prototype for transparently tracing requests through a microservice without application instrumentation. ☆23 · Sep 2, 2024 · Updated last year
- LLM Inference analyzer for different hardware platforms ☆115 · Jun 23, 2026 · Updated last week
- ☆13 · Jun 20, 2025 · Updated last year
- ☆20 · Sep 25, 2023 · Updated 2 years ago
- Know Your Enemy To Save Cloud Energy: Energy-Performance Characterization of Machine Learning Serving (HPCA '23) ☆14 · Jun 20, 2025 · Updated last year
- ☆10 · Dec 10, 2024 · Updated last year
- ☆30 · Mar 20, 2022 · Updated 4 years ago
- 🧯 Kubernetes coverage for fault awareness and recovery, works for any LLMOps, MLOps, AI workloads. ☆35 · Jun 23, 2026 · Updated last week
- A tool to detect infrastructure issues on cloud native AI systems ☆53 · Sep 18, 2025 · Updated 9 months ago
- Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios. ☆106 · Aug 14, 2024 · Updated last year
- Official repository for paper "KeyEE: Enhancing Low-resource Generative Event Extraction with Auxiliary Keyword Sub-Prompt" ☆10 · Jun 5, 2024 · Updated 2 years ago
- PyTorch implementation for the pilot study on the robustness of latent diffusion models. ☆12 · Jun 20, 2023 · Updated 3 years ago
- LLM Serving Performance Evaluation Harness ☆84 · Feb 25, 2025 · Updated last year
- CAShift: Benchmarking Log-Based Cloud Attack Detection under Normality Shift (FSE 2025) ☆15 · Updated this week
- From Task-based to Instruction-based Automated Log Analysis ☆25 · Jan 7, 2025 · Updated last year
- E-commerce search benchmark is the first end-to-end application benchmark for e-commerce search system with personalized recommendations.… ☆44 · Feb 15, 2023 · Updated 3 years ago
- Code for SIGKDD2025 paper: An Efficient Diffusion-based Non-Autoregressive Solver for Traveling Salesman Problem ☆14 · Jan 28, 2025 · Updated last year
- Official TensorFlow implementation for "Improving the Transferability of Adversarial Samples by Path-Augmented Method" (CVPR 2023). ☆12 · Jun 16, 2023 · Updated 3 years ago
- An Observability Framework for AI Training ☆71 · Updated this week
- This repository manifests set which is made to build a prototype system of TraceZip, made by 4 pieces. ☆14 · Jul 17, 2025 · Updated 11 months ago
- ☆17 · May 29, 2025 · Updated last year
- ☆10 · Jun 4, 2024 · Updated 2 years ago
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters ☆58 · May 3, 2026 · Updated last month
- Deduplication over dis-aggregated memory for Serverless Computing ☆14 · Mar 21, 2022 · Updated 4 years ago
- ☆12 · Apr 23, 2026 · Updated 2 months ago
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆222 · Sep 21, 2024 · Updated last year
- Microsoft question-answering dataset ☆10 · Jun 16, 2023 · Updated 3 years ago
- Secure and Scalable Federated Learning using Serverless Computing ☆13 · Jan 31, 2024 · Updated 2 years ago