stanford-cs149 / cs149gpt
☆52 Updated 11 months ago
Related projects
Alternatives and complementary repositories for cs149gpt
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆107 Updated last year
- Cataloging released Triton kernels. ☆134 Updated 2 months ago
- Collection of kernels written in Triton language ☆68 Updated 3 weeks ago
- Simple and fast low-bit matmul kernels in CUDA / Triton ☆145 Updated this week
- extensible collectives library in triton ☆72 Updated last month
- ring-attention experiments ☆97 Updated last month
- ☆133 Updated 9 months ago
- Custom kernels in Triton language for accelerating LLMs ☆17 Updated 7 months ago
- Applied AI experiments and examples for PyTorch ☆166 Updated 3 weeks ago
- An ML Systems Onboarding list ☆547 Updated last week
- ☆153 Updated this week
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆157 Updated last year
- ☆12 Updated last month
- LLM training in simple, raw C/CUDA ☆86 Updated 6 months ago
- Learn CUDA with PyTorch ☆14 Updated 2 weeks ago
- pytorch from scratch in pure C/CUDA and python ☆37 Updated last month
- Project 2 (Building Large Language Models) for Stanford CS324: Understanding and Developing Large Language Models (Winter 2022) ☆101 Updated last year
- Alex Krizhevsky's original code from Google Code ☆190 Updated 8 years ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆165 Updated this week
- ☆47 Updated 2 months ago
- ☆224 Updated 4 months ago
- a minimal cache manager for PagedAttention, on top of llama3. ☆45 Updated 2 months ago
- ☆45 Updated 2 weeks ago
- Learning about CUDA by writing PTX code. ☆28 Updated 8 months ago
- Slides, notes, and materials for the workshop ☆306 Updated 5 months ago
- ☆169 Updated 4 months ago
- ☆48 Updated this week
- Packages and instructions for training and inference of LLMs on NVIDIA's new GH200 machines ☆19 Updated 2 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆211 Updated 3 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆193 Updated this week