tspeterkim / mixed-precision-from-scratch
Mixed precision training from scratch with Tensors and CUDA
☆21 · Updated 9 months ago
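For context, a minimal sketch of the technique the headline repo implements. This version leans on PyTorch's built-in AMP utilities rather than the repo's from-scratch Tensors-and-CUDA code, and the model, data, and hyperparameters are made up for illustration:

```python
import torch
from torch import nn

# Requires a CUDA device. Weights stay in fp32 ("master" weights);
# autocast casts op inputs to fp16 on the fly.
model = nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling against fp16 gradient underflow

x = torch.randn(32, 1024, device="cuda")
target = torch.randn(32, 1024, device="cuda")

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    # autocast runs eligible ops (matmuls, etc.) in fp16 while keeping
    # precision-sensitive ops in fp32
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales gradients, skips the step on inf/nan
    scaler.update()                # adjusts the scale factor for the next iteration
```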
Alternatives and similar repositories for mixed-precision-from-scratch:
Users interested in mixed-precision-from-scratch are comparing it to the libraries listed below.
- ring-attention experiments ☆123 · Updated 3 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆56 · Updated 3 weeks ago
- Experiment of using Tangent to autodiff triton ☆75 · Updated last year
- ☆75 · Updated 7 months ago
- Learn CUDA with PyTorch ☆16 · Updated 2 weeks ago
- Cataloging released Triton kernels. ☆164 · Updated last month
- Code for studying the super weight in LLM ☆79 · Updated 2 months ago
- ☆175 · Updated this week
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆67 · Updated 8 months ago
- ☆141 · Updated last year
- ☆45 · Updated last year
- ☆88 · Updated 8 months ago
- Fast low-bit matmul kernels in Triton ☆231 · Updated this week
- ☆86 · Updated 11 months ago
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated 4 months ago
- A minimal implementation of vllm. ☆33 · Updated 6 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆64 · Updated 5 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆86 · Updated this week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆114 · Updated 2 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆117 · Updated last year
- ☆125 · Updated last month
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆228 · Updated this week
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆40 · Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆17 · Updated this week
- A repo collecting pseudocode for AI research papers ☆14 · Updated last year
- Simple and efficient pytorch-native transformer training and inference (batched) ☆68 · Updated 10 months ago
- Learning about CUDA by writing PTX code. ☆35 · Updated 11 months ago
- ☆14 · Updated 7 months ago
- a minimal cache manager for PagedAttention, on top of llama3. ☆67 · Updated 5 months ago
- extensible collectives library in triton ☆82 · Updated 4 months ago