triton for dsa
☆57Feb 12, 2026Updated 2 weeks ago
Alternatives and similar repositories for DLCompiler
Users that are interested in DLCompiler are comparing it to the libraries listed below
- DLBlas: clean and efficient kernels☆33Updated this week
- OpenVINO LLM Benchmark☆11Dec 7, 2023Updated 2 years ago
- Triton adapter for Ascend. Mirror of https://gitcode.com/ascend/triton-ascend☆110Updated this week
- LLVM/MLIR based compiler instrumentation of AMD GPU kernels☆20Jul 13, 2025Updated 7 months ago
- A Triton JIT runtime and ffi provider in C++☆31Updated this week
- LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model☆64Oct 18, 2025Updated 4 months ago
- This is a cross-chip platform collection of operators and a unified neural network library.☆16Nov 3, 2023Updated 2 years ago
- A collection of pre-compiled, state-of-the-art models in AXera's format☆22Apr 9, 2023Updated 2 years ago
- Shared Middle-Layer for Triton Compilation☆329Dec 5, 2025Updated 2 months ago
- MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr…☆52Updated this week
- tutorials about polyhedral compilation.☆61Feb 9, 2026Updated 2 weeks ago
- LLVM Backend tutorial Cpu0☆26Nov 5, 2023Updated 2 years ago
- incubator repo for CUDA-TileIR backend☆106Feb 14, 2026Updated 2 weeks ago
- DeeperGEMM: crazy optimized version☆74May 5, 2025Updated 9 months ago
- Transformers components but in Triton☆34May 9, 2025Updated 9 months ago
- SBLP 2025 MLIR Tutorial☆70Feb 8, 2026Updated 2 weeks ago
- ☆87Updated this week
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…☆469Updated this week
- FlagGems is an operator library for large language models implemented in the Triton Language.☆904Updated this week
- Development repository for the Triton-Linalg conversion☆215Feb 7, 2025Updated last year
- High Performance LLM Inference Operator Library☆739Feb 5, 2026Updated 3 weeks ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer☆163Feb 11, 2026Updated 2 weeks ago
- ☆97Mar 26, 2025Updated 11 months ago
- WaferLLM: Large Language Model Inference at Wafer Scale☆90Jan 7, 2026Updated last month
- MLIR-based toolkit targeting intel heterogeneous hardware☆50Updated this week
- ☆168Updated this week
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆106Jun 28, 2025Updated 8 months ago
- Docker cluster configuration for Hadoop☆11Jun 8, 2024Updated last year
- Math24o: High School Olympiad Mathematics Chinese Benchmark☆11Mar 27, 2025Updated 11 months ago
- ☆22Dec 11, 2025Updated 2 months ago
- A tensor computing compiler based on tile programming for GPU, CPU, or TPU☆45Feb 2, 2026Updated 3 weeks ago
- ☆18Feb 16, 2025Updated last year
- learn llvm from scratch☆14Apr 29, 2023Updated 2 years ago
- A deep learning intermediate representation for multi-platform compilation optimization☆10Oct 28, 2024Updated last year
- ☆11Oct 31, 2024Updated last year
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning☆168Nov 11, 2025Updated 3 months ago
- ☆134May 29, 2025Updated 9 months ago
- a simple general program language☆100Feb 2, 2026Updated 3 weeks ago
- Supplementary material for our paper "Compute Trends Across Three Eras of Machine Learning".☆45Mar 12, 2022Updated 3 years ago