bytedance / ByteMLPerf
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.
☆246 · Updated last week
Alternatives and similar repositories for ByteMLPerf
Users interested in ByteMLPerf are comparing it to the libraries listed below.
- ☆66 · Updated last week
- A model compilation solution for various hardware ☆437 · Updated last week
- ☆148 · Updated 5 months ago
- ☆139 · Updated last year
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆87 · Updated last month
- A lightweight design for computation-communication overlap. ☆141 · Updated this week
- DeepSeek-V3/R1 inference performance simulator ☆148 · Updated 2 months ago
- Development repository for the Triton-Linalg conversion ☆188 · Updated 4 months ago
- PyTorch distributed training acceleration framework ☆49 · Updated 4 months ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆378 · Updated this week
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆396 · Updated 3 weeks ago
- ☆58 · Updated 7 months ago
- Examples of CUDA implementations by Cutlass CuTe ☆195 · Updated 4 months ago
- ☆127 · Updated 5 months ago
- GLake: optimizing GPU memory management and IO transmission. ☆467 · Updated 2 months ago
- A collection of memory efficient attention operators implemented in the Triton language. ☆272 · Updated last year
- FlagGems is an operator library for large language models implemented in the Triton Language. ☆573 · Updated this week
- ☆122 · Updated 6 months ago
- Yinghan's Code Sample ☆330 · Updated 2 years ago
- Shared Middle-Layer for Triton Compilation ☆255 · Updated this week
- Microsoft Collective Communication Library ☆349 · Updated last year
- A low-latency & high-throughput serving engine for LLMs ☆379 · Updated 3 weeks ago
- A baseline repository of Auto-Parallelism in Training Neural Networks ☆143 · Updated 2 years ago
- A benchmark suited especially for deep learning operators ☆42 · Updated 2 years ago
- ☆212 · Updated 11 months ago
- TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models. ☆94 · Updated 2 years ago
- FlagScale is a large model toolkit based on open-sourced projects. ☆301 · Updated this week
- Perplexity GPU Kernels ☆364 · Updated last week
- ☆97 · Updated 2 months ago
- ☆146 · Updated 5 months ago