fmperf-project / fmperf
Cloud Native Benchmarking of Foundation Models
☆24 · Updated 4 months ago
Alternatives and similar repositories for fmperf:
Users interested in fmperf are comparing it to the libraries listed below.
- A tool to detect infrastructure issues on cloud native AI systems ☆26 · Updated 2 weeks ago
- NVIDIA NCCL Tests for Distributed Training ☆82 · Updated this week
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆112 · Updated last year
- Efficient and easy multi-instance LLM serving ☆326 · Updated this week
- Predict the performance of LLM inference services ☆15 · Updated 8 months ago
- The driver for LMCache core to run in vLLM ☆32 · Updated last month
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆87 · Updated this week
- Automatic tuning for ML model deployment on Kubernetes ☆81 · Updated 4 months ago
- LLM Serving Performance Evaluation Harness ☆70 · Updated 2 weeks ago
- Microsoft Collective Communication Library ☆60 · Updated 3 months ago
- InstaSlice Operator facilitates slicing of accelerators using stable APIs ☆29 · Updated this week
- Fine-grained GPU sharing primitives ☆141 · Updated 5 years ago
- Holistic job manager on Kubernetes ☆112 · Updated last year
- GenAI inference performance benchmarking tool ☆19 · Updated this week
- GPU scheduler for elastic/distributed deep learning workloads in Kubernetes clusters (IC2E '23) ☆34 · Updated last year
- A resilient distributed training framework ☆89 · Updated 11 months ago
- ☆54 · Updated 5 months ago
- A low-latency & high-throughput serving engine for LLMs ☆319 · Updated last month
- An interference-aware scheduler for fine-grained GPU sharing ☆127 · Updated last month
- NCCL Fast Socket is a transport-layer plugin to improve NCCL collective communication performance on Google Cloud. ☆116 · Updated last year
- A library developed by Volcano Engine for high-performance reading and writing of PyTorch model files. ☆14 · Updated 2 months ago
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray ☆120 · Updated 2 weeks ago
- An efficient GPU resource sharing system with fine-grained control for Linux platforms. ☆78 · Updated 11 months ago
- ☆87 · Updated 4 months ago
- An experimental parallel training platform ☆54 · Updated 11 months ago
- A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems ☆148 · Updated 4 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆58 · Updated this week
- Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020 ☆126 · Updated 7 months ago
- Artifact of OSDI '24 paper "Llumnix: Dynamic Scheduling for Large Language Model Serving" ☆60 · Updated 9 months ago