habanero-lab / APPy
APPy (Annotated Parallelism for Python) lets users annotate Python loops and tensor expressions with OpenMP-like compiler directives, and automatically compiles the annotated code to GPU kernels.
☆23 · Updated 3 weeks ago
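The directive-based style described above mirrors OpenMP's comment pragmas. A minimal sketch of the idea, assuming a `#pragma parallel for` comment form; the directive spelling here is illustrative of the approach, not confirmed APPy syntax:

```python
import numpy as np

def axpy(a, x, y):
    """y := a*x + y, written as an explicit loop so a directive can target it."""
    # An OpenMP-style directive as a comment: in plain Python it is inert,
    # but a directive-based compiler like APPy would map the annotated loop
    # to a GPU kernel. The exact spelling is an assumption for illustration.
    #pragma parallel for
    for i in range(x.shape[0]):
        y[i] = a * x[i] + y[i]
    return y

y = axpy(2.0, np.ones(4), np.zeros(4))
print(y)  # [2. 2. 2. 2.]
```

Because the directive lives in a comment, the annotated function remains ordinary Python and runs unmodified when the compiler is not in use.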
Alternatives and similar repositories for APPy
Users interested in APPy are comparing it to the libraries listed below.
- FlexAttention w/ FlashAttention3 Support · ☆26 · Updated 9 months ago
- ☆22 · Updated last year
- Quantized Attention on GPU · ☆44 · Updated 7 months ago
- Implementation of Hyena Hierarchy in JAX · ☆10 · Updated 2 years ago
- ☆31 · Updated last year
- CUDA and Triton implementations of Flash Attention with SoftmaxN · ☆70 · Updated last year
- Awesome Triton Resources · ☆31 · Updated 2 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) · ☆24 · Updated last year
- PyTorch implementation of the Flash Spectral Transform Unit · ☆17 · Updated 9 months ago
- Odysseus: Playground of LLM Sequence Parallelism · ☆70 · Updated last year
- Accelerate LLM preference tuning via prefix sharing with a single line of code · ☆42 · Updated last week
- ☆42 · Updated last week
- ☆51 · Updated last year
- Benchmark tests supporting the TiledCUDA library · ☆16 · Updated 7 months ago
- ☆49 · Updated last month
- Repository for CPU Kernel Generation for LLM Inference · ☆26 · Updated last year
- ☆74 · Updated last month
- Framework to reduce autotune overhead to zero for well-known deployments · ☆79 · Updated this week
- Fast and memory-efficient exact attention · ☆68 · Updated 4 months ago
- Triton implementation of bi-directional (non-causal) linear attention · ☆51 · Updated 5 months ago
- Transformers components, but in Triton · ☆34 · Updated 2 months ago
- Memory Optimizations for Deep Learning (ICML 2023) · ☆64 · Updated last year
- ☆77 · Updated 5 months ago
- ☆21 · Updated 2 months ago
- A bunch of kernels that might make stuff slower 😉 · ☆54 · Updated this week
- ☆116 · Updated last month
- GPTQ inference TVM kernel · ☆40 · Updated last year
- FlashRNN - Fast RNN Kernels with I/O Awareness · ☆92 · Updated last month
- Here we will test various linear attention designs · ☆60 · Updated last year
- Open deep learning compiler stack for CPU, GPU, and specialized accelerators · ☆19 · Updated 2 weeks ago