kimbochen / md-blogs
A blog where I write about research papers and blog posts I read.
☆11 Updated this week
Related projects
Alternatives and complementary repositories for md-blogs
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆107 Updated last year
- Cataloging released Triton kernels. ☆138 Updated 2 months ago
- Solve puzzles. Learn CUDA. ☆61 Updated 11 months ago
- ☆153 Updated this week
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆157 Updated last year
- seqax = sequence modeling + JAX ☆134 Updated 4 months ago
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆80 Updated 11 months ago
- Experiment of using Tangent to autodiff triton ☆72 Updated 10 months ago
- ☆73 Updated 4 months ago
- ☆133 Updated 9 months ago
- ML/DL Math and Method notes ☆57 Updated 11 months ago
- Two implementations of ZeRO-1 optimizer sharding in JAX ☆13 Updated last year
- Learning about CUDA by writing PTX code. ☆29 Updated 8 months ago
- ring-attention experiments ☆97 Updated last month
- ☆225 Updated 4 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆66 Updated 5 months ago
- Collection of kernels written in Triton language ☆69 Updated 3 weeks ago
- extensible collectives library in triton ☆72 Updated 2 months ago
- ☆64 Updated 2 years ago
- A set of Python scripts that makes your experience on TPU better ☆40 Updated 4 months ago
- ☆198 Updated 4 months ago
- ☆269 Updated this week
- ☆83 Updated 8 months ago
- train with kittens! ☆49 Updated last month
- Proof-of-concept of global switching between numpy/jax/pytorch in a library. ☆18 Updated 5 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆84 Updated this week
- ☆49 Updated 2 weeks ago
- Applied AI experiments and examples for PyTorch ☆168 Updated 3 weeks ago
- ☆53 Updated 11 months ago