stanford-cs336 / assignment2-systems
Student version of Assignment 2 for Stanford CS336 - Language Modeling From Scratch
☆131 · Updated 4 months ago
Alternatives and similar repositories for assignment2-systems
Users interested in assignment2-systems are comparing it to the libraries listed below.
- Making the official Triton tutorials actually comprehensible · ☆75 · Updated 3 months ago
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference · ☆313 · Updated last month
- Ring-attention experiments · ☆160 · Updated last year
- Cataloging released Triton kernels · ☆274 · Updated 2 months ago
- An extension of the nanoGPT repository for training small MoE models · ☆215 · Updated 8 months ago
- LLM KV cache compression made easy · ☆701 · Updated this week
- An early-research-stage expert-parallel load balancer for MoE models based on linear programming · ☆433 · Updated 2 weeks ago
- KernelBench: Can LLMs Write GPU Kernels? Benchmark with Torch -> CUDA (+ more DSLs) · ☆683 · Updated last week
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code · ☆438 · Updated 8 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS · ☆244 · Updated 7 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand · ☆196 · Updated 6 months ago
- JAX backend for SGL · ☆187 · Updated this week
- Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch · ☆976 · Updated 3 months ago
- Coding CUDA every day! · ☆71 · Updated 3 weeks ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! · ☆164 · Updated last week
- Collection of kernels written in the Triton language · ☆172 · Updated 8 months ago
- Memory-optimized Mixture of Experts · ☆69 · Updated 4 months ago
- Dion optimizer algorithm · ☆395 · Updated 2 weeks ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer · ☆212 · Updated 5 months ago
- Efficient LLM Inference over Long Sequences · ☆392 · Updated 5 months ago
- A minimal cache manager for PagedAttention, built on top of llama3 · ☆126 · Updated last year