ptillet / triton-llvm-releases
☆20 Updated last year
Related projects
Alternatives and complementary repositories for triton-llvm-releases
- GPTQ inference TVM kernel ☆36 Updated 6 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆60 Updated 8 months ago
- ☆48 Updated 8 months ago
- ☆22 Updated 11 months ago
- CUDA 12.2 HMM demos ☆17 Updated 3 months ago
- FlexAttention w/ FlashAttention3 Support ☆27 Updated last month
- ☆55 Updated 5 months ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆46 Updated 2 months ago
- TensorRT LLM Benchmark Configuration ☆11 Updated 3 months ago
- Inference framework for MoE layers based on TensorRT with Python binding ☆41 Updated 3 years ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆20 Updated last week
- ☆11 Updated 3 years ago
- Awesome Triton Resources ☆18 Updated last month
- ☆14 Updated last month
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆63 Updated 2 years ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆53 Updated 3 weeks ago
- High-speed GEMV kernels, at most 2.7x speedup compared to pytorch baseline. ☆90 Updated 4 months ago
- pytorch-profiler ☆50 Updated last year
- ☆47 Updated 2 months ago
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 Updated last year
- Benchmarking PyTorch 2.0 different models ☆21 Updated last year
- ☆36 Updated last year
- Odysseus: Playground of LLM Sequence Parallelism ☆57 Updated 5 months ago
- extensible collectives library in triton ☆72 Updated last month
- Prototype routines for GPU quantization written using PyTorch. ☆19 Updated last week
- An external memory allocator example for PyTorch. ☆13 Updated 3 years ago
- Fast and memory-efficient exact attention ☆30 Updated 3 weeks ago
- Experiment of using Tangent to autodiff triton ☆72 Updated 9 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 Updated 5 months ago