saifhaq / alma
☆20 · Updated 2 months ago
Alternatives and similar repositories for alma
Users interested in alma are comparing it to the libraries listed below.
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆567 · Updated this week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆208 · Updated 3 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆189 · Updated 2 months ago
- Slides, notes, and materials for the workshop ☆328 · Updated last year
- This repository contains the experimental PyTorch native float8 training UX ☆224 · Updated last year
- ☆324 · Updated last week
- Fast low-bit matmul kernels in Triton ☆339 · Updated last week
- Cataloging released Triton kernels. ☆251 · Updated 7 months ago
- Annotated version of the Mamba paper ☆487 · Updated last year
- ☆187 · Updated 7 months ago
- ☆227 · Updated last week
- A repository for log-time feedforward networks ☆223 · Updated last year
- Implementation of a Transformer, but completely in Triton ☆273 · Updated 3 years ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆137 · Updated last year
- ☆47 · Updated 7 months ago
- A simple but fast implementation of matrix multiplication in CUDA. ☆37 · Updated last year
- ☆162 · Updated last year
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆207 · Updated this week
- Making the official Triton tutorials actually comprehensible ☆53 · Updated 2 weeks ago
- Explore training for quantized models ☆20 · Updated last month
- ☆519 · Updated last year
- Collection of kernels written in the Triton language ☆144 · Updated 4 months ago
- Applied AI experiments and examples for PyTorch ☆289 · Updated 2 months ago
- Custom kernels in the Triton language for accelerating LLMs ☆23 · Updated last year
- An implementation of the transformer architecture as an NVIDIA CUDA kernel ☆189 · Updated last year
- Ring-attention experiments ☆147 · Updated 9 months ago
- ☆88 · Updated last year
- seqax = sequence modeling + JAX ☆165 · Updated 2 weeks ago
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆377 · Updated this week
- Implementation of Flash Attention in JAX ☆215 · Updated last year