sekstini / gpupoor
☆12 · Updated 3 months ago
Alternatives and similar repositories for gpupoor:
Users interested in gpupoor are comparing it to the libraries listed below.
- PyTorch half-precision GEMM library with fused optional bias and optional ReLU/GELU ☆57 · Updated 3 months ago
- A collection of niche / personally useful PyTorch optimizers with modified code ☆17 · Updated last week
- Research implementation of Native Sparse Attention (arXiv:2502.11089) ☆53 · Updated last month
- ☆19 · Updated this week
- ☆22 · Updated 9 months ago
- Focused on fast experimentation and simplicity ☆70 · Updated 3 months ago
- ☆49 · Updated last year
- ☆112 · Updated this week
- QuIP quantization ☆52 · Updated last year
- ☆16 · Updated last year
- ☆65 · Updated 3 months ago
- [WIP] Better (FP8) attention for Hopper ☆26 · Updated last month
- GPU benchmark ☆57 · Updated 2 months ago
- Experimental GPU language with meta-programming ☆22 · Updated 6 months ago
- Implementation of Diffusion Transformers and Rectified Flow in JAX ☆21 · Updated 8 months ago
- Lightweight package that tracks and summarizes code changes using LLMs ☆32 · Updated last month
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆27 · Updated last month
- Efficient optimizers ☆185 · Updated this week
- Support for PyTorch FSDP in optimizers ☆80 · Updated 3 months ago
- ☆21 · Updated 3 weeks ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8 ☆45 · Updated 8 months ago
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆76 · Updated this week
- Experiment in using Tangent to autodiff Triton ☆78 · Updated last year
- A library for unit scaling in PyTorch ☆124 · Updated 4 months ago
- Code accompanying the paper "Generalized Interpolating Discrete Diffusion" ☆69 · Updated last week
- ☆20 · Updated last year
- JAX Scalify: end-to-end scaled arithmetic ☆16 · Updated 5 months ago
- ☆76 · Updated 8 months ago
- GoldFinch and other hybrid transformer components ☆10 · Updated 2 weeks ago
- [WIP] Transformer to embed Danbooru label sets ☆13 · Updated last year