habanero-lab / APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to OpenMP, and automatically compiles the annotated code to GPU kernels.
☆24 · Updated 2 months ago
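For illustration, here is a minimal sketch of the annotation style the description mentions. It assumes APPy's `@appy.jit` decorator and OpenMP-style `#pragma` comment directives operating on PyTorch tensors; the exact directive spellings and supported operations should be verified against the APPy README before use.

```python
# Hypothetical usage sketch of APPy's pragma-style annotations.
# The @appy.jit decorator and the "#pragma parallel for" comment below
# follow the OpenMP-like style described above; check the APPy
# documentation for the exact directive names and semantics.
import torch
import appy

@appy.jit
def vector_add(a, b, N):
    c = torch.empty_like(a)
    # The pragma asks the compiler to parallelize this loop on the GPU.
    #pragma parallel for
    for i in range(N):
        c[i] = a[i] + b[i]
    return c

a = torch.randn(1024, device="cuda")
b = torch.randn(1024, device="cuda")
c = vector_add(a, b, a.shape[0])  # the annotated loop is compiled to a GPU kernel
```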
Alternatives and similar repositories for APPy
Users interested in APPy are comparing it to the libraries listed below.
- FlexAttention w/ FlashAttention3 Support ☆27 · Updated 10 months ago
- ☆32 · Updated last year
- Quantized Attention on GPU ☆44 · Updated 9 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆76 · Updated last year
- CUDA and Triton implementations of Flash Attention with SoftmaxN ☆73 · Updated last year
- ☆77 · Updated 3 months ago
- ☆55 · Updated last year
- Benchmark tests supporting the TiledCUDA library ☆17 · Updated 9 months ago
- Transformers components but in Triton ☆34 · Updated 3 months ago
- ☆50 · Updated 3 months ago
- ☆22 · Updated last year
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Updated last year
- Implementation of Hyena Hierarchy in JAX ☆10 · Updated 2 years ago
- ☆123 · Updated 2 months ago
- Awesome Triton Resources ☆33 · Updated 3 months ago
- Framework to reduce autotune overhead to zero for well-known deployments ☆81 · Updated last week
- A bunch of kernels that might make stuff slower 😉 ☆58 · Updated this week
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆43 · Updated last month
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆160 · Updated 2 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing ☆95 · Updated last month
- ☆91 · Updated last week
- GPTQ inference TVM kernel ☆40 · Updated last year
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆94 · Updated 2 months ago
- Triton implementation of bi-directional (non-causal) linear attention ☆51 · Updated 6 months ago
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters ☆47 · Updated last year
- ☆81 · Updated 7 months ago
- ☆72 · Updated 9 months ago
- Linear Attention Sequence Parallelism (LASP) ☆86 · Updated last year
- Fast and memory-efficient exact attention ☆70 · Updated 5 months ago
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning ☆67 · Updated last month