dongbeiyewu / xla
☆21Updated 4 years ago
Alternatives and similar repositories for xla:
Users who are interested in xla are comparing it to the libraries listed below
- examples for tvm schedule API☆101Updated last year
- CUDA PTX-ISA Document, Chinese translation☆38Updated last month
- ☆25Updated this week
- ☆23Updated 4 years ago
- Development repository for the Triton-Linalg conversion☆185Updated 2 months ago
- Benchmark Framework for Buddy Projects☆54Updated 2 months ago
- code reading for tvm☆76Updated 3 years ago
- Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).☆115Updated 2 weeks ago
- Machine Learning Compiler Road Map☆43Updated last year
- A home for the final text of all TVM RFCs.☆102Updated 7 months ago
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling☆58Updated 11 months ago
- An MLIR-based toy DL compiler for TVM Relay.☆58Updated 2 years ago
- Triton Compiler related materials.☆28Updated 3 months ago
- ☆192Updated 2 years ago
- ☆90Updated 3 weeks ago
- ☆36Updated 3 months ago
- ☆70Updated 2 years ago
- ☆138Updated 4 months ago
- Yinghan's Code Sample☆323Updated 2 years ago
- Play with MLIR right in your browser☆134Updated last year
- Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.☆78Updated 2 years ago
- HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of…☆143Updated last week
- Dissecting NVIDIA GPU Architecture☆92Updated 2 years ago
- Some source code about matrix multiplication implementation on CUDA☆34Updated 6 years ago
- TPP experimentation on MLIR for linear algebra☆127Updated last week
- TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models.☆94Updated 2 years ago
- A model compilation solution for various hardware☆427Updated this week
- ☆102Updated last month
- ☆115Updated 4 months ago
- Shared Middle-Layer for Triton Compilation☆246Updated last week