habanero-lab / APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to OpenMP, and automatically compiles the annotated code to GPU kernels.
☆25 Updated this week
Alternatives and similar repositories for APPy
Users who are interested in APPy are comparing it to the libraries listed below.
- FlexAttention w/ FlashAttention3 Support ☆27 Updated last year
- Quantized Attention on GPU ☆44 Updated 11 months ago
- ☆22 Updated last year
- Benchmark tests supporting the TiledCUDA library. ☆17 Updated 11 months ago
- Transformers components but in Triton ☆34 Updated 5 months ago
- ☆50 Updated 5 months ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆84 Updated last month
- ☆103 Updated 5 months ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆51 Updated 4 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆73 Updated last year
- Awesome Triton Resources ☆36 Updated 6 months ago
- ☆32 Updated last year
- Odysseus: Playground of LLM Sequence Parallelism ☆78 Updated last year
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 Updated last year
- Xmixers: A collection of SOTA efficient token/channel mixers ☆29 Updated 2 months ago
- ☆130 Updated 5 months ago
- Implementation of Hyena Hierarchy in JAX ☆10 Updated 2 years ago
- ☆83 Updated 9 months ago
- ☆57 Updated last year
- PyTorch implementation of the Flash Spectral Transform Unit. ☆19 Updated last year
- GPTQ inference TVM kernel ☆39 Updated last year
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆103 Updated 2 weeks ago
- ☆77 Updated last year
- ☆64 Updated 6 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆102 Updated 3 weeks ago
- An easily extensible framework for understanding and optimizing CUDA operators, intended for learning purposes only ☆18 Updated last year
- A bunch of kernels that might make stuff slower 😉 ☆64 Updated this week
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆124 Updated 4 months ago
- DeeperGEMM: crazy optimized version ☆72 Updated 5 months ago
- ☆120 Updated 2 months ago