ml-energy / zeus
Measure and optimize the energy consumption of your AI applications!
☆244 · Updated this week
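Before browsing the alternatives, here is a minimal sketch of what programmatic energy measurement with zeus can look like. It assumes the `ZeusMonitor` interface with `begin_window`/`end_window` and uses a placeholder `train_one_step()` function; check the project documentation for the exact API.

```python
# Minimal sketch (assumed API): wrap a measurement window around one training step.
from zeus.monitor import ZeusMonitor

def train_one_step():
    ...  # placeholder for your training or inference code

monitor = ZeusMonitor(gpu_indices=[0])  # measure energy on GPU 0

monitor.begin_window("training_step")
train_one_step()
measurement = monitor.end_window("training_step")

# The returned measurement reports elapsed time (s) and consumed energy (J).
print(f"time: {measurement.time:.2f} s, energy: {measurement.total_energy:.1f} J")
```

Repeating such windows across batch sizes or GPU power limits is the measure-then-optimize loop the library is aimed at.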
Alternatives and similar repositories for zeus:
Users interested in zeus are comparing it to the libraries listed below.
- How much energy do GenAI models consume? ☆42 · Updated 5 months ago
- A large-scale simulation framework for LLM inference ☆355 · Updated 4 months ago
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆112 · Updated last year
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆201 · Updated 4 months ago
- ACT: An Architectural Carbon Modeling Tool for Designing Sustainable Computer Systems ☆36 · Updated last month
- A resilient distributed training framework ☆91 · Updated 11 months ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆202 · Updated last year
- A low-latency & high-throughput serving engine for LLMs ☆330 · Updated last month
- LLM Serving Performance Evaluation Harness ☆73 · Updated last month
- Multi-Instance-GPU profiling tool ☆57 · Updated last year
- ☆45 · Updated 9 months ago
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks. ☆59 · Updated 2 months ago
- ☆204 · Updated 2 months ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆116 · Updated last year
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆326 · Updated this week
- PyTorch library for cost-effective, fast and easy serving of MoE models. ☆157 · Updated this week
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆150 · Updated 6 months ago
- A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems ☆156 · Updated 5 months ago
- KernelBench: Can LLMs Write GPU Kernels? A benchmark with Torch -> CUDA problems ☆239 · Updated this week
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆235 · Updated last month
- Extensible collectives library in Triton ☆84 · Updated 6 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆189 · Updated this week
- LLM serving cluster simulator ☆94 · Updated 11 months ago
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs ☆100 · Updated 3 months ago
- LLM KV cache compression made easy ☆442 · Updated last week
- Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access (ACM EuroSys '23) ☆57 · Updated last year
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization ☆337 · Updated 7 months ago
- Stateful LLM Serving ☆50 · Updated 2 weeks ago
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆32 · Updated this week
- NVIDIA Inference Xfer Library (NIXL) ☆191 · Updated this week