habanero-lab / APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to OpenMP, and automatically compiles the annotated code to GPU kernels.
☆24 · Updated 2 months ago
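As a rough illustration of the annotation style described above, here is a minimal sketch of what APPy-annotated code might look like. The decorator name `appy.jit`, the `#pragma parallel for` comment syntax, and the use of PyTorch tensors are assumptions for illustration; see the repository's README and examples for the actual API.

```python
import appy   # assumption: the package is importable as `appy`
import torch  # used here only to create GPU tensors for the example

@appy.jit  # assumption: APPy exposes a JIT-style decorator that compiles the function
def vector_add(a, b, c, n):
    # OpenMP-like comment directive asking the compiler to map this loop to a GPU kernel
    #pragma parallel for
    for i in range(n):
        c[i] = a[i] + b[i]

a = torch.rand(1024, device="cuda")
b = torch.rand(1024, device="cuda")
c = torch.empty_like(a)
vector_add(a, b, c, a.numel())
```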
Alternatives and similar repositories for APPy
Users interested in APPy are comparing it to the libraries listed below.
- FlexAttention w/ FlashAttention3 Support ☆27 · Updated 11 months ago
- Quantized Attention on GPU ☆44 · Updated 9 months ago
- Transformers components but in Triton ☆34 · Updated 4 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆77 · Updated last year
- Benchmark tests supporting the TiledCUDA library. ☆17 · Updated 9 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆73 · Updated last year
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Updated last year
- Awesome Triton Resources ☆33 · Updated 4 months ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆43 · Updated 2 months ago
- Framework to reduce autotune overhead to zero for well-known deployments. ☆81 · Updated last week
- Implementation of Hyena Hierarchy in JAX ☆10 · Updated 2 years ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆97 · Updated 2 months ago
- PyTorch implementation of the Flash Spectral Transform Unit. ☆17 · Updated 11 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆181 · Updated 3 months ago
- Linear Attention Sequence Parallelism (LASP) ☆86 · Updated last year
- A bunch of kernels that might make stuff slower 😉 ☆58 · Updated 2 weeks ago
- Here we will test various linear attention designs. ☆62 · Updated last year
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers. ☆49 · Updated 2 years ago
- The evaluation framework for training-free sparse attention in LLMs ☆91 · Updated 2 months ago
- GPTQ inference TVM kernel ☆40 · Updated last year