sophgo / tpu-mlir
Machine learning compiler based on MLIR for Sophgo TPU.
☆612Updated this week
Related projects
Alternatives and complementary repositories for tpu-mlir
- An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).☆521Updated 2 weeks ago
- MegCC is a deep learning model compiler with an ultra-lightweight runtime, high efficiency, and easy portability☆473Updated 3 weeks ago
- Hands-On Practical MLIR Tutorial☆338Updated last year
- how to learn PyTorch and OneFlow☆349Updated 8 months ago
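- nndeploy is an end-to-end model deployment framework. Built on multi-backend inference and directed-acyclic-graph-based model deployment, it aims to give users a cross-platform, easy-to-use, high-performance model deployment experience.☆633Updated this week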
- A collection of compiler learning resources.☆2,163Updated 5 months ago
- Yinghan's Code Sample☆289Updated 2 years ago
- row-major matmul optimization☆591Updated last year
- A model compilation solution for various hardware☆378Updated this week
- Development repository for the Triton-Linalg conversion☆151Updated last month
- ppl.cv is a high-performance image processing library of openPPL supporting various platforms.☆494Updated 3 weeks ago
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.☆281Updated 2 years ago
- A simple high performance CUDA GEMM implementation.☆335Updated 10 months ago
- FlagGems is an operator library for large language models implemented in Triton Language.☆343Updated this week
- A CUDA tutorial that teaches CUDA programming from scratch☆195Updated 4 months ago
- ☆592Updated 5 months ago
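- A good project for campus recruitment (fall and spring hiring) and internships: build, hands-on and from scratch, a large-model inference framework that supports LLama2/3 and Qwen2.5.☆228Updated 2 weeks ago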
- ☆221Updated last month
- A primitive library for neural network☆1,295Updated 2 weeks ago
- How to optimize some algorithms in CUDA.☆1,603Updated last week
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…☆304Updated 2 months ago
- BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.☆814Updated last week
- PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.☆1,561Updated 7 months ago
- learning how CUDA works☆169Updated 3 months ago
- A CPU tool for benchmarking peak floating-point performance☆503Updated last month
- PaddlePaddle custom device implementation. (Custom hardware integration for PaddlePaddle)☆70Updated this week
- A self-learning tutorial for CUDA High Performance Programming.☆259Updated last week
- Compiler Infrastructure for Neural Networks☆143Updated last year
- ☆190Updated 2 months ago
- Free resource for the book AI Compiler Development Guide☆40Updated last year