ml-energy / zeus
Deep Learning Energy Measurement and Optimization
☆230 · Updated this week
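Zeus measures GPU time and energy around arbitrary code regions via a monitor object. The snippet below is a minimal sketch, assuming the `zeus-ml` package's `ZeusMonitor` measurement-window API (`begin_window`/`end_window`) and its reported `time`/`total_energy` fields; treat these names as assumptions and confirm against the repository's documentation.

```python
# Minimal sketch: measuring the energy of a training loop with Zeus.
# Assumes the zeus-ml package's ZeusMonitor window API; verify against the repo docs.
import torch
from zeus.monitor import ZeusMonitor

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

monitor = ZeusMonitor(gpu_indices=[0])  # monitor GPU 0 only

monitor.begin_window("train")           # start a named measurement window
for _ in range(100):
    x = torch.randn(64, 1024, device="cuda")
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
measurement = monitor.end_window("train")  # returns elapsed time and energy

# Assumed fields: .time in seconds, .total_energy in joules summed over GPUs.
print(f"time: {measurement.time:.1f} s, energy: {measurement.total_energy:.1f} J")
```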
Alternatives and similar repositories for zeus:
Users interested in zeus are comparing it to the libraries listed below.
- How much energy do GenAI models consume? ☆41 · Updated 3 months ago
- ☆244 · Updated 5 months ago
- A resilient distributed training framework ☆88 · Updated 9 months ago
- Microsoft Collective Communication Library ☆60 · Updated last month
- A library to analyze PyTorch traces. ☆324 · Updated last month
- Applied AI experiments and examples for PyTorch ☆211 · Updated this week
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆152 · Updated last month
- Cataloging released Triton kernels. ☆156 · Updated last week
- Fast Inference of MoE Models with CPU-GPU Orchestration ☆179 · Updated 2 months ago
- Extensible collectives library in Triton ☆76 · Updated 3 months ago
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆128 · Updated last week
- Collection of kernels written in the Triton language ☆90 · Updated 2 months ago
- Fast low-bit matmul kernels in Triton ☆187 · Updated last week
- KernelBench: Can LLMs Write GPU Kernels? A benchmark with Torch -> CUDA problems ☆91 · Updated this week
- CUDA checkpoint and restore utility ☆268 · Updated 9 months ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆114 · Updated 10 months ago
- Efficient and easy multi-instance LLM serving ☆278 · Updated this week
- Cloud Native Benchmarking of Foundation Models ☆21 · Updated 2 months ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆219 · Updated this week
- A ChatGPT (GPT-3.5) & GPT-4 workload trace to optimize LLM serving systems ☆142 · Updated 3 months ago
- PyTorch library for cost-effective, fast, and easy serving of MoE models ☆112 · Updated last month
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components ☆182 · Updated this week
- ☆190 · Updated last month
- ACT: An Architectural Carbon Modeling Tool for Designing Sustainable Computer Systems ☆35 · Updated last month
- A low-latency & high-throughput serving engine for LLMs ☆296 · Updated 4 months ago
- Multi-Instance GPU profiling tool ☆56 · Updated last year
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆284 · Updated this week
- ☆64 · Updated 2 months ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆195 · Updated last year
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆272 · Updated last month