alibaba / BladeDISC

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

☆899

Alternatives and similar repositories for BladeDISC

Users who are interested in BladeDISC are comparing it to the libraries listed below.

microsoft / nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.
☆994 Updated last year
bytedance / byteir
A model compilation solution for various hardware
☆451 Updated 2 months ago
OpenPPL / ppl.nn
A primitive library for neural networks
☆1,363 Updated 10 months ago
bytedance / ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
☆479 Updated last year
tlc-pack / relax
☆193 Updated 2 years ago
FlagOpen / FlagGems
FlagGems is an operator library for large language models implemented in the Triton Language.
☆696 Updated this week
MegEngine / MegCC
MegCC is a deep learning model compiler with an ultra-lightweight runtime, high efficiency, and easy portability.
☆486 Updated 11 months ago
tpoisonooo / how-to-optimize-gemm
row-major matmul optimization
☆682 Updated 2 months ago
Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆353 Updated 3 years ago
d2l-ai / d2l-tvm
Dive into Deep Learning Compiler
☆646 Updated 3 years ago
antgroup / glake
GLake: optimizing GPU memory management and IO transmission.
☆481 Updated 6 months ago
mlc-ai / mlc-zh
☆617 Updated last year
jiazhihao / TASO
The Tensor Algebra SuperOptimizer for Deep Learning
☆730 Updated 2 years ago
alibaba / heterogeneity-aware-lowering-and-optimization
heterogeneity-aware-lowering-and-optimization
☆256 Updated last year
Cambricon / triton-linalg
Development repository for the Triton-Linalg conversion
☆202 Updated 8 months ago
alibaba / EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
☆269 Updated 2 years ago
tensorflow / mlir-hlo
☆422 Updated last week
onnx / onnx-mlir
Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
☆921 Updated last week
pytorch / FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
☆1,453 Updated this week
buddy-compiler / buddy-mlir
An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).
☆646 Updated this week
bytedance / ByteMLPerf
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver…
☆265 Updated 2 months ago
Tencent / TPAT
TensorRT Plugin Autogen Tool
☆368 Updated 2 years ago
alibaba / TePDist
TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models.
☆97 Updated 2 years ago
Oneflow-Inc / DLPerf
DeepLearning Framework Performance Profiling Toolkit
☆292 Updated 3 years ago
llvm / torch-mlir
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
☆1,652 Updated this week
Jack47 / hack-SysML
The road to hack SysML and become a system expert
☆498 Updated last year
Cjkkkk / CUDA_gemm
A simple high performance CUDA GEMM implementation.
☆409 Updated last year
Bruce-Lee-LY / cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…
☆485 Updated last year
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,173 Updated 2 weeks ago
KnowingNothing / MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
☆385 Updated last week