mryab / efficient-dl-systems
Efficient Deep Learning Systems course materials (HSE, YSDA)
☆674 · Updated 7 months ago
Related projects
Alternatives and complementary repositories for efficient-dl-systems
- What would you do with 1000 H100s... ☆895 · Updated 10 months ago
- Automatically split your PyTorch models on multiple GPUs for training & inference ☆624 · Updated 10 months ago
- Best practices & guides on how to write distributed pytorch training code ☆282 · Updated last week
- Tensors, for human consumption ☆1,111 · Updated 2 weeks ago
- Puzzles for exploring transformers ☆323 · Updated last year
- Official implementation of the paper "Linear Transformers with Learnable Kernel Functions are Better In-Context Models" ☆157 · Updated 8 months ago
- Puzzles for learning Triton ☆1,089 · Updated last month
- 🦖 X—LLM: Cutting Edge & Easy LLM Finetuning ☆380 · Updated 9 months ago
- GPU programming related news and material links ☆1,216 · Updated last month
- Slides, notes, and materials for the workshop ☆305 · Updated 5 months ago
- ☆388 · Updated 3 weeks ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆711 · Updated last month
- An ML Systems Onboarding list ☆541 · Updated 3 months ago
- Building blocks for foundation models. ☆388 · Updated 10 months ago
- LoRA and DoRA from Scratch Implementations ☆188 · Updated 8 months ago
- The full minitorch student suite. ☆1,912 · Updated 2 months ago
- Pipeline Parallelism for PyTorch ☆725 · Updated 2 months ago
- MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvement… ☆332 · Updated 2 weeks ago
- YaFSDP: Yet another Fully Sharded Data Parallel ☆846 · Updated last week
- UNet diffusion model in pure CUDA ☆573 · Updated 4 months ago
- Collection of important articles to be treated as a textbook ☆603 · Updated 7 months ago
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆157 · Updated last year
- Llama from scratch, or How to implement a paper without crying ☆517 · Updated 5 months ago
- TensorDict is a pytorch dedicated tensor container. ☆832 · Updated this week
- A library to inspect and extract intermediate layers of PyTorch models. ☆470 · Updated 2 years ago
- Solve puzzles. Improve your pytorch. ☆3,267 · Updated 3 months ago
- ☆20 · Updated 3 months ago
- Make PyTorch models up to 40% faster! Thunder is a source to source compiler for PyTorch. It enables using different hardware executors a… ☆1,193 · Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆479 · Updated 2 weeks ago
- Train to 94% on CIFAR-10 in <6.3 seconds on a single A100. Or ~95.79% in ~110 seconds (or less!) ☆1,222 · Updated last year
- Annotated version of the Mamba paper ☆455 · Updated 8 months ago