kimbochen / md-blogs
A blog where I write about research papers and blog posts I read.
★12 · Updated 4 months ago
Alternatives and similar repositories for md-blogs:
Users interested in md-blogs are comparing it to the repositories listed below:
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ★127 · Updated last year
- ML/DL Math and Method notes ★59 · Updated last year
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ★81 · Updated last year
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ★62 · Updated last week
- Write a fast kernel and run it on Discord. See how you compare against the best! ★35 · Updated this week
- Experiment of using Tangent to autodiff Triton ★78 · Updated last year
- ★76 · Updated 8 months ago
- Proof-of-concept of global switching between numpy/jax/pytorch in a library. ★18 · Updated 9 months ago
- ★43 · Updated last year
- ★87 · Updated last year
- Mixed precision training from scratch with Tensors and CUDA ★21 · Updated 10 months ago
- seqax = sequence modeling + JAX ★151 · Updated 2 weeks ago
- ★27 · Updated 8 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ★59 · Updated 2 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ★108 · Updated 3 months ago
- ★60 · Updated 3 years ago
- ★152 · Updated last year
- Learn CUDA with PyTorch ★19 · Updated 2 months ago
- ★192 · Updated this week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ★169 · Updated last week
- ★17 · Updated last year
- ring-attention experiments ★128 · Updated 5 months ago
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ★54 · Updated this week
- Custom Triton kernels for training Karpathy's nanoGPT. ★18 · Updated 5 months ago
- Project 2 (Building Large Language Models) for Stanford CS324: Understanding and Developing Large Language Models (Winter 2022) ★103 · Updated 2 years ago
- A bunch of kernels that might make stuff slower ★29 · Updated this week
- KernelBench: Can LLMs Write GPU Kernels? Benchmark with Torch -> CUDA problems ★239 · Updated this week
- Solve puzzles. Learn CUDA. ★63 · Updated last year
- Code for studying the super weight in LLMs ★94 · Updated 4 months ago
- Distributed pretraining of large language models (LLMs) on cloud TPU slices, with Jax and Equinox. ★24 · Updated 6 months ago