Predict the performance of LLM inference services
☆23 · Sep 18, 2025 · Updated 7 months ago
Alternatives and similar repositories for LLM-performance-prediction
Users who are interested in LLM-performance-prediction are comparing it to the repositories listed below.
- Cloud Native Benchmarking of Foundation Models ☆45 · Jul 31, 2025 · Updated 9 months ago
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆134 · Feb 22, 2024 · Updated 2 years ago
- Serverless Paper Reading and Discussion ☆38 · Jan 9, 2023 · Updated 3 years ago
- Failure dataset accompanying the paper "How Bad Can a Bug Get? An Empirical Analysis of Software Failures in the OpenStack Cloud Computi… ☆10 · Jun 12, 2020 · Updated 5 years ago
- Simulator for the datacenter, including power, cooling, server and other components ☆17 · Feb 12, 2025 · Updated last year
- LangBench applications and scripts ☆14 · Jun 7, 2023 · Updated 2 years ago
- ☆20 · May 10, 2025 · Updated 11 months ago
- Releasing the spot availability traces used in "Can't Be Late" paper. ☆26 · Mar 31, 2024 · Updated 2 years ago
- Accurate, large-scale, and extensible simulator for LLM inference systems ☆595 · Jul 25, 2025 · Updated 9 months ago
- ☆178 · Mar 12, 2024 · Updated 2 years ago
- TraceWeaver is a research prototype for transparently tracing requests through a microservice without application instrumentation. ☆23 · Sep 2, 2024 · Updated last year
- LLM Inference analyzer for different hardware platforms ☆111 · Apr 20, 2026 · Updated last week
- ☆13 · Jun 20, 2025 · Updated 10 months ago
- ☆18 · Oct 31, 2022 · Updated 3 years ago
- ☆10 · Dec 10, 2024 · Updated last year
- Know Your Enemy To Save Cloud Energy: Energy-Performance Characterization of Machine Learning Serving (HPCA '23) ☆14 · Jun 20, 2025 · Updated 10 months ago
- ☆30 · Mar 20, 2022 · Updated 4 years ago
- 🧯 Kubernetes coverage for fault awareness and recovery, works for any LLMOps, MLOps, AI workloads. ☆35 · Apr 19, 2026 · Updated last week
- A tool to detect infrastructure issues on cloud native AI systems ☆53 · Sep 18, 2025 · Updated 7 months ago
- Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios. ☆106 · Aug 14, 2024 · Updated last year
- LLM Serving Performance Evaluation Harness ☆85 · Feb 25, 2025 · Updated last year
- CAShift: Benchmarking Log-Based Cloud Attack Detection under Normality Shift (FSE 2025) ☆13 · May 19, 2025 · Updated 11 months ago
- Collect information about 2018 CS courses in CSE of SYSU. ☆11 · Jun 29, 2022 · Updated 3 years ago
- From Task-based to Instruction-based Automated Log Analysis ☆23 · Jan 7, 2025 · Updated last year
- E-commerce search benchmark is the first end-to-end application benchmark for e-commerce search system with personalized recommendations.… ☆44 · Feb 15, 2023 · Updated 3 years ago
- Official Tensorflow implementation for "Improving the Transferability of Adversarial Samples by Path-Augmented Method" (CVPR 2023). ☆12 · Jun 16, 2023 · Updated 2 years ago
- Code for SIGKDD2025 paper: An Efficient Diffusion-based Non-Autoregressive Solver for Traveling Salesman Problem ☆14 · Jan 28, 2025 · Updated last year
- An Observability Framework for AI Training ☆70 · Mar 25, 2026 · Updated last month
- This repository manifests set which is made to build a prototype system of TraceZip, made by 4 pieces. ☆14 · Jul 17, 2025 · Updated 9 months ago
- ☆82 · Apr 22, 2026 · Updated last week
- ☆17 · May 29, 2025 · Updated 11 months ago
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters ☆58 · Jul 23, 2024 · Updated last year
- Deduplication over dis-aggregated memory for Serverless Computing ☆14 · Mar 21, 2022 · Updated 4 years ago
- ☆12 · Apr 23, 2026 · Updated last week
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆214 · Sep 21, 2024 · Updated last year
- Microsoft question-answering dataset ☆10 · Jun 16, 2023 · Updated 2 years ago
- ☆15 · Apr 13, 2024 · Updated 2 years ago
- Secure and Scalable Federated Learning using Serverless Computing ☆12 · Jan 31, 2024 · Updated 2 years ago
- ☆16 · Jan 14, 2025 · Updated last year