stas00 / the-art-of-debugging
The Art of Debugging
☆810 Updated 3 months ago
Related projects
Alternatives and complementary repositories for the-art-of-debugging
- Puzzles for learning Triton ☆1,089 Updated last month
- An ML Systems Onboarding list ☆541 Updated 3 months ago
- ☆388 Updated 3 weeks ago
- What would you do with 1000 H100s... ☆895 Updated 10 months ago
- GPU programming related news and material links ☆1,216 Updated last month
- Slides, notes, and materials for the workshop ☆305 Updated 5 months ago
- Puzzles for exploring transformers ☆323 Updated last year
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆711 Updated last month
- ☆235 Updated 7 months ago
- Best practices & guides on how to write distributed pytorch training code ☆282 Updated last week
- Building blocks for foundation models. ☆388 Updated 10 months ago
- LLM papers I'm reading, mostly on inference and model compression ☆691 Updated 10 months ago
- High Quality Resources on GPU Programming/Architecture ☆563 Updated 3 months ago
- For optimization algorithm research and development. ☆417 Updated this week
- Alex Krizhevsky's original code from Google Code ☆189 Updated 8 years ago
- Make PyTorch models up to 40% faster! Thunder is a source to source compiler for PyTorch. It enables using different hardware executors a… ☆1,193 Updated this week
- A JAX research toolkit for building, editing, and visualizing neural networks. ☆1,675 Updated last week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆479 Updated 2 weeks ago
- Implementation of Diffusion Transformer (DiT) in JAX ☆252 Updated 5 months ago
- The full minitorch student suite. ☆1,912 Updated 2 months ago
- Tile primitives for speedy kernels ☆1,645 Updated this week
- Solve puzzles. Improve your pytorch. ☆3,267 Updated 3 months ago
- UNet diffusion model in pure CUDA ☆573 Updated 4 months ago
- NanoGPT (124M) quality in 7.8 8xH100-minutes ☆965 Updated this week
- The Tensor (or Array) ☆408 Updated 3 months ago
- A tool to analyze and debug neural networks in pytorch. Use a GUI to traverse the computation graph and view the data from many different… ☆270 Updated 2 weeks ago
- MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvement… ☆332 Updated 2 weeks ago
- A deep dive into embeddings starting from fundamentals ☆954 Updated 7 months ago
- TensorDict is a pytorch dedicated tensor container. ☆832 Updated this week
- Solve puzzles. Learn CUDA. ☆60 Updated 10 months ago