habanero-lab / APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to OpenMP, and automatically compiles the annotated code to GPU kernels.
☆24 Updated 2 months ago
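To illustrate the annotation model described above, here is a minimal sketch of what APPy-style code might look like. The `appy` package name, the `@appy.jit` decorator, and the `#pragma parallel for` directive are assumptions inferred from the OpenMP-like description, not confirmed API; consult the repository for the exact syntax.

```python
# Hypothetical sketch (names assumed, not confirmed against APPy's docs):
# an OpenMP-style pragma in a comment marks the loop as parallel, and a
# JIT decorator compiles the annotated function body to a GPU kernel.
import appy   # assumed package name
import torch  # GPU tensors for the kernel's inputs/outputs

@appy.jit  # assumed decorator: compiles the annotated loop for the GPU
def vector_add(a, b, c, N):
    #pragma parallel for
    for i in range(N):
        c[i] = a[i] + b[i]

# Usage: allocate tensors on the GPU and invoke the compiled kernel.
N = 1_000_000
a = torch.randn(N, device="cuda")
b = torch.randn(N, device="cuda")
c = torch.empty(N, device="cuda")
vector_add(a, b, c, N)
```

The appeal of this model, as with OpenMP, is that stripping the directive and decorator leaves plain sequential Python, so annotated code stays runnable and debuggable on the CPU.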
Alternatives and similar repositories for APPy
Users interested in APPy are comparing it to the libraries listed below:
- FlexAttention w/ FlashAttention3 Support ☆27 Updated 11 months ago
- Quantized Attention on GPU ☆44 Updated 9 months ago
- ☆22 Updated last year
- Awesome Triton Resources ☆33 Updated 4 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆73 Updated last year
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 Updated last year
- Transformers components but in Triton ☆34 Updated 4 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆77 Updated last year
- Benchmark tests supporting the TiledCUDA library. ☆17 Updated 9 months ago
- ☆32 Updated last year
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆43 Updated 2 months ago
- Repository for CPU Kernel Generation for LLM Inference ☆26 Updated 2 years ago
- Framework to reduce autotune overhead to zero for well-known deployments.