mandliya / PMPP_notes
Notes and code for Programming Massively Parallel Processors
☆13Updated 9 months ago
Alternatives and similar repositories for PMPP_notes
Users that are interested in PMPP_notes are comparing it to the libraries listed below
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand☆195Updated 7 months ago
- ML/DL Math and Method notes☆66Updated 2 years ago
- Custom kernels in Triton language for accelerating LLMs☆27Updated last year
- GPU Kernels☆218Updated 9 months ago
- Documented and Unit Tested educational Deep Learning framework with Autograd from scratch.☆122Updated last year
- ☆29Updated last year
- Distributed training (multi-node) of a Transformer model☆92Updated last year
- ☆178Updated last year
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI.☆154Updated 2 years ago
- A blog where I write about research papers and blog posts I read.☆12Updated last year
- Learn CUDA with PyTorch☆185Updated this week
- A really tiny autograd engine☆99Updated 8 months ago
- LoRA: Low-Rank Adaptation of Large Language Models implemented using PyTorch☆121Updated 2 years ago
- ☆45Updated 8 months ago
- ☆52Updated 7 months ago
- RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct☆31Updated 11 months ago
- ☆41Updated last year
- Tutorials for Triton, a language for writing GPU kernels☆72Updated 2 years ago
- ML algorithms implementations that are good for learning the underlying principles☆26Updated last year
- ☆235Updated last year
- Pre-train BERT from scratch, with HuggingFace. Accompanies the blog post: sidsite.com/posts/bert-from-scratch☆43Updated 8 months ago
- Implementations of papers that I read; my breakdowns are on my blog☆89Updated 3 months ago
- Papers on Search, Recommendations, and Ads (搜广推)☆29Updated 6 months ago
- A repo covering pseudocode for AI research papers☆17Updated 2 years ago
- Slides, notes, and materials for the workshop☆338Updated last year
- A collection of lightweight interpretability scripts to understand how LLMs think☆89Updated this week
- Notes on quantization in neural networks☆117Updated 2 years ago
- A minimal cache manager for PagedAttention, on top of Llama 3☆130Updated last year
- Complete implementation of Llama2 with/without KV cache & inference 🚀☆49Updated last year
- A complete PyTorch implementation of Google's Gemma3 270M language model, featuring sliding window attention, RoPE positional encoding, a…☆44Updated 4 months ago