habanero-lab / APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to OpenMP, and automatically compiles the annotated code to GPU kernels.
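To illustrate the idiom, here is a minimal sketch of an annotated loop. The `#pragma parallel for simd` comment syntax and the `@appy.jit` decorator mentioned in the comments are assumptions based on the project's description and may differ from the current API; since the directive is an ordinary Python comment, the function below also runs as plain sequential Python:

```python
import numpy as np

def vector_add(a, b, c):
    # Under APPy, this loop would be decorated with @appy.jit and carry an
    # OpenMP-style directive such as the one below (exact spelling assumed);
    # without APPy installed, the comment is inert and the loop runs serially.
    #pragma parallel for simd
    for i in range(a.shape[0]):
        c[i] = a[i] + b[i]

a = np.arange(4, dtype=np.float32)
b = np.ones(4, dtype=np.float32)
c = np.empty(4, dtype=np.float32)
vector_add(a, b, c)
print(c.tolist())  # [1.0, 2.0, 3.0, 4.0]
```

The appeal of this style is that the annotated source stays valid Python: the same function can be debugged serially and compiled to a GPU kernel without rewriting it in CUDA or Triton.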
☆23 · Updated this week
Alternatives and similar repositories for APPy
Users interested in APPy are comparing it to the libraries listed below.
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated 8 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Updated last year
- Quantized Attention on GPU ☆44 · Updated 7 months ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆41 · Updated last month
- Awesome Triton Resources ☆31 · Updated last month
- Repository for CPU Kernel Generation for LLM Inference ☆26 · Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆131 · Updated last week
- Odysseus: Playground of LLM Sequence Parallelism ☆70 · Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆90 · Updated 2 weeks ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated 11 months ago
- A bunch of kernels that might make stuff slower 😉 ☆51 · Updated this week
- Make triton easier ☆46 · Updated last year
- Framework to reduce autotune overhead to zero for well-known deployments. ☆77 · Updated last week
- Linear Attention Sequence Parallelism (LASP) ☆84 · Updated last year
- DeeperGEMM: crazy optimized version ☆69 · Updated last month
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆136 · Updated this week
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters ☆47 · Updated 11 months ago
- Implementation of Hyena Hierarchy in JAX ☆10 · Updated 2 years ago
- Using FlexAttention to compute attention with different masking patterns ☆44 · Updated 9 months ago
- Extensible collectives library in Triton ☆86 · Updated 2 months ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆42 · Updated last year