bytedance / byteir
A model compilation solution for various hardware
☆377 Updated this week
Related projects
Alternatives and complementary repositories for byteir
- Development repository for the Triton-Linalg conversion ☆148 Updated 3 weeks ago
- ☆195 Updated last year
- Shared Middle-Layer for Triton Compilation ☆185 Updated this week
- Yinghan's Code Sample ☆284 Updated 2 years ago
- An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures). ☆518 Updated this week
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct… ☆296 Updated 2 months ago
- A home for the final text of all TVM RFCs. ☆101 Updated last month
- Hands-On Practical MLIR Tutorial ☆333 Updated last year
- An Easy-to-understand TensorOp Matmul Tutorial ☆287 Updated last month
- A simple high performance CUDA GEMM implementation. ☆334 Updated 10 months ago
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance. ☆276 Updated 2 years ago
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆200 Updated 3 weeks ago
- row-major matmul optimization ☆590 Updated last year
- heterogeneity-aware-lowering-and-optimization ☆253 Updated 9 months ago
- FlagGems is an operator library for large language models implemented in Triton Language. ☆328 Updated this week
- ☆398 Updated this week
- Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU). ☆103 Updated this week
- BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads. ☆815 Updated 2 months ago
- A fast communication-overlapping library for tensor parallelism on GPUs. ☆217 Updated last week
- MegCC is a deep learning model compiler with an ultra-lightweight runtime that is efficient and easy to port. ☆474 Updated 2 weeks ago
- ☆136 Updated this week
- ☆78 Updated 8 months ago
- ☆189 Updated last month
- how to learn PyTorch and OneFlow ☆347 Updated 7 months ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆401 Updated last year
- optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052 ☆457 Updated 7 months ago
- ☆79 Updated last year
- collection of benchmarks to measure basic GPU capabilities ☆264 Updated 4 months ago
- GLake: optimizing GPU memory management and IO transmission. ☆375 Updated 3 months ago
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections ☆114 Updated 2 years ago