ptillet / triton-llvm-releases
☆20 · Updated last year
Alternatives and similar repositories for triton-llvm-releases:
Users who are interested in triton-llvm-releases are comparing it to the libraries listed below.
- TileFusion is a highly efficient C++ macro kernel template library designed to elevate the level of abstraction in CUDA C for processing… ☆63 · Updated this week
- Benchmark tests supporting the TiledCUDA library. ☆15 · Updated 3 months ago
- GPTQ inference TVM kernel ☆39 · Updated 10 months ago
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated 5 months ago
- Awesome Triton Resources ☆20 · Updated 3 months ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆43 · Updated last week
- Code for Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture (accepted by PVLDB). The outdated wr… ☆9 · Updated last year
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆23 · Updated 3 weeks ago
- ☆11 · Updated 3 years ago
- ☆24 · Updated 2 months ago
- ☆22 · Updated last year
- ☆49 · Updated last year
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 · Updated 11 months ago
- Yet another Polyhedra Compiler for DeepLearning ☆19 · Updated last year
- Framework to reduce autotune overhead to zero for well known deployments. ☆62 · Updated 2 weeks ago
- Inference framework for MoE layers based on TensorRT with Python binding ☆41 · Updated 3 years ago
- ☆15 · Updated 5 months ago
- CUDA 12.2 HMM demos ☆19 · Updated 7 months ago
- ☆61 · Updated 2 weeks ago
- Hacks for PyTorch ☆18 · Updated last year
- An external memory allocator example for PyTorch. ☆14 · Updated 3 years ago
- ☆30 · Updated 9 months ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆18 · Updated last month
- Odysseus: Playground of LLM Sequence Parallelism ☆66 · Updated 8 months ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆56 · Updated 3 weeks ago
- ☆38 · Updated last year
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 · Updated last year
- Quantized Attention on GPU ☆45 · Updated 3 months ago