kimbochen / md-blogsLinks

A blog where I write about research papers and blog posts I read.

☆12

Alternatives and similar repositories for md-blogs

Users that are interested in md-blogs are comparing it to the libraries listed below

Sorting:

siboehm / ShallowSpeed
Small scale distributed training of sequential deep learning models, built on Numpy and MPI.
☆145Updated 2 years ago
cloneofsimo / ptx-tutorial-by-aislop
PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7)
☆66Updated 6 months ago
vdesai2014 / inference-optimization-blog-post
☆89Updated last year
MatX-inc / seqax
seqax = sequence modeling + JAX
☆167Updated 2 months ago
xrsrke / pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
☆87Updated last year
cloneofsimo / min-fsdp
☆91Updated last year
google-deepmind / asyncdiloco
☆46Updated last year
srush / anynp
Proof-of-concept of global switching between numpy/jax/pytorch in a library.
☆18Updated last year
dshah3 / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆64Updated last year
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆194Updated 4 months ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
nano-R1 / resources
Compiling useful links, papers, benchmarks, ideas, etc.
☆45Updated 7 months ago
AllanYangZhou / midGPT
Distributed pretraining of large language models (LLMs) on cloud TPU slices, with Jax and Equinox.
☆24Updated last year
gpu-mode / profiling-cuda-in-torch
☆174Updated last year
HazyResearch / train-tk
train with kittens!
☆63Updated 11 months ago
srush / GPTWorld
A puzzle to learn about prompting
☆135Updated 2 years ago
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆58Updated this week
PrimeIntellect-ai / pccl
PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP
☆131Updated last month
Jaykef / Triton-nanoGPT
Custom triton kernels for training Karpathy's nanoGPT.
☆19Updated 11 months ago
divyamakkar0 / JAXformer
A zero-to-one guide on scaling modern transformers with n-dimensional parallelism.
☆103Updated 3 weeks ago
linjames0 / Transformer-CUDA
An implementation of the transformer architecture onto an Nvidia CUDA kernel
☆190Updated 2 years ago
gau-nernst / learn-cuda
Learn CUDA with PyTorch
☆87Updated 3 weeks ago
okarthikb / state-space-models
☆28Updated last year
stas00 / ml-ways
ML/DL Math and Method notes
☆64Updated last year
yixiaoer / tpux
A set of Python scripts that makes your experience on TPU better
☆54Updated last month
young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆35Updated last month
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆131Updated 10 months ago
yixiaoer / mistral-v0.2-jax
JAX implementation of the Mistral 7b v0.2 model
☆35Updated last year
matttreed / diloco-sim
☆21Updated 9 months ago
google / torchax
☆77Updated this week