habanero-lab / APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to OpenMP, and automatically compiles the annotated code to GPU kernels.
☆24 · Updated last month
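To give a sense of what the description above means in practice, here is a minimal sketch of the annotation style it implies: an ordinary Python loop marked with an OpenMP-like pragma comment that a compiler such as APPy could lower to a GPU kernel. The pragma spelling and the surrounding function are illustrative assumptions, not a verbatim copy of APPy's documented API.

```python
# Illustrative sketch only: the pragma spelling below is an assumption based on
# the OpenMP-style description, not APPy's documented directive syntax.
import numpy as np

def vector_scale(x, alpha):
    """Scale a 1-D array; the loop is annotated as data-parallel."""
    y = np.empty_like(x)
    # An OpenMP-like directive: each iteration is independent, so an
    # annotation-driven compiler could map iterations onto GPU threads.
    #pragma parallel for
    for i in range(x.shape[0]):
        y[i] = alpha * x[i]
    return y

# Without such a compiler, the annotation is just a comment and the function
# runs as ordinary Python/NumPy code.
print(vector_scale(np.arange(4.0), 2.0))  # [0. 2. 4. 6.]
```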
Alternatives and similar repositories for APPy
Users interested in APPy are comparing it to the libraries listed below.
- FlexAttention w/ FlashAttention3 Support ☆27 · Updated 10 months ago
- Quantized Attention on GPU ☆44 · Updated 8 months ago
- Implementation of Hyena Hierarchy in JAX ☆10 · Updated 2 years ago
- Benchmark tests supporting the TiledCUDA library. ☆16 · Updated 8 months ago
- Transformers components but in Triton ☆34 · Updated 2 months ago
- ☆22 · Updated last year
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆71 · Updated last year
- Framework to reduce autotune overhead to zero for well known deployments. ☆79 · Updated last week
- PyTorch implementation of the Flash Spectral Transform Unit. ☆17 · Updated 10 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆149 · Updated last month
- Odysseus: Playground of LLM Sequence Parallelism ☆72 · Updated last year
- ☆50 · Updated 2 months ago
- ☆32 · Updated last year
- Awesome Triton Resources ☆32 · Updated 3 months ago
- ☆52 · Updated last year
- ☆123 · Updated 2 months ago
- GPTQ inference TVM kernel ☆40 · Updated last year
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆42 · Updated last month
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters ☆46 · Updated last year
- ☆158 · Updated last year
- TensorRT LLM Benchmark Configuration ☆13 · Updated last year
- ☆79 · Updated 6 months ago
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆93 · Updated last month
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆93 · Updated last month
- The evaluation framework for training-free sparse attention in LLMs ☆86 · Updated last month
- A bunch of kernels that might make stuff slower 😉 ☆56 · Updated this week
- Fast and memory-efficient exact attention ☆69 · Updated 5 months ago
- ☆75 · Updated 2 months ago
- Linear Attention Sequence Parallelism (LASP) ☆85 · Updated last year