stanford-cs149 / cs149gpt
☆72 Updated last year
Alternatives and similar repositories for cs149gpt
Users that are interested in cs149gpt are comparing it to the libraries listed below
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆189 Updated last month
- Cataloging released Triton kernels. ☆238 Updated 5 months ago
- ☆219 Updated this week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆134 Updated last year
- Fast low-bit matmul kernels in Triton ☆322 Updated last week
- Custom kernels in Triton language for accelerating LLMs ☆22 Updated last year
- ☆109 Updated 3 months ago
- ☆159 Updated last year
- a minimal cache manager for PagedAttention, on top of llama3. ☆91 Updated 9 months ago
- Learning about CUDA by writing PTX code. ☆132 Updated last year
- Fastest kernels written from scratch ☆281 Updated 2 months ago
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T… ☆67 Updated 4 years ago
- Applied AI experiments and examples for PyTorch ☆277 Updated 3 weeks ago
- ☆174 Updated 5 months ago
- Reference Kernels for the Leaderboard ☆60 Updated last week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆46 Updated this week
- extensible collectives library in triton ☆86 Updated 2 months ago
- ☆212 Updated 11 months ago
- ring-attention experiments ☆144 Updated 8 months ago
- Collection of kernels written in Triton language ☆128 Updated 2 months ago
- ☆90 Updated 5 months ago
- ☆81 Updated 7 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆185 Updated 3 weeks ago
- ☆117 Updated last month
- Stanford CS149 -- Assignment 1 ☆109 Updated 8 months ago
- CUTLASS and CuTe Examples ☆57 Updated 5 months ago
- A minimal implementation of vllm. ☆43 Updated 10 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆136 Updated this week
- Perplexity GPU Kernels ☆375 Updated 2 weeks ago
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆185 Updated last year