habanero-lab / APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to OpenMP, and automatically compiles the annotated code to GPU kernels.
☆ 28 · Updated last week
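To picture the directive-based workflow described above, here is a minimal, hypothetical sketch: it assumes a `@appy.jit` decorator and an OpenMP-style `#pragma parallel for` comment, which are assumptions based on the project description rather than a verified sample of APPy's actual API.

```python
# Hypothetical sketch of APPy-style loop annotation.
# The decorator name `appy.jit` and the pragma syntax are assumptions,
# not verified against the real APPy API.
import torch
import appy  # assumed import name

@appy.jit  # assumed decorator: compiles the annotated function to a GPU kernel
def saxpy(a, x, y, N):
    #pragma parallel for  # OpenMP-like directive marking the loop as parallel
    for i in range(N):
        y[i] = a * x[i] + y[i]

x = torch.randn(1024, device="cuda")
y = torch.randn(1024, device="cuda")
saxpy(2.0, x, y, x.shape[0])
```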
Alternatives and similar repositories for APPy
Users interested in APPy are comparing it to the libraries listed below.
- FlexAttention w/ FlashAttention3 Support ☆ 27 · Updated last year
- Quantized Attention on GPU ☆ 44 · Updated last year
- ☆ 32 · Updated last year
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆ 73 · Updated last year
- Framework to reduce autotune overhead to zero for well-known deployments. ☆ 91 · Updated 3 months ago
- Transformers components but in Triton ☆ 34 · Updated 7 months ago
- ☆ 59 · Updated 2 years ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆ 24 · Updated last year
- ☆ 52 · Updated 7 months ago
- Awesome Triton Resources ☆ 39 · Updated 8 months ago
- ☆ 115 · Updated 7 months ago
- Benchmark tests supporting the TiledCUDA library. ☆ 18 · Updated last year
- Odysseus: Playground of LLM Sequence Parallelism ☆ 78 · Updated last year
- A bunch of kernels that might make stuff slower 😉 ☆ 72 · Updated this week
- ☆ 22 · Updated 2 years ago
- Xmixers: A collection of SOTA efficient token/channel mixers ☆ 28 · Updated 4 months ago
- Fast and memory-efficient exact attention ☆ 75 · Updated 10 months ago
- PyTorch implementation of the Flash Spectral Transform Unit. ☆ 21 · Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆ 104 · Updated 6 months ago
- Vortex: A Flexible and Efficient Sparse Attention Framework ☆ 43 · Updated last month
- ☆ 23 · Updated 8 months ago
- ☆ 133 · Updated 7 months ago
- Here we will test various linear attention designs. ☆ 62 · Updated last year
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters ☆ 53 · Updated last year
- GPTQ inference TVM kernel ☆ 41 · Updated last year
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆ 51 · Updated 6 months ago
- Triton implementation of bi-directional (non-causal) linear attention ☆ 60 · Updated 11 months ago
- DeeperGEMM: crazy optimized version ☆ 74 · Updated 7 months ago
- ☆ 65 · Updated 8 months ago
- ☆ 125 · Updated 4 months ago