stanford-cs336 / assignment2-systems
Student version of Assignment 2 for Stanford CS336 - Language Modeling From Scratch
☆140Updated 5 months ago
Alternatives and similar repositories for assignment2-systems
Users who are interested in assignment2-systems are comparing it to the libraries listed below.
- ☆44Updated 9 months ago
- ☆89Updated 5 months ago
- ☆629Updated this week
- ☆228Updated 11 months ago
- [ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation☆245Updated last year
- making the official triton tutorials actually comprehensible☆80Updated 4 months ago
- ☆403Updated last year
- LLM KV cache compression made easy☆729Updated last week
- An early research stage expert-parallel load balancer for MoE models based on linear programming.☆476Updated last month
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference.☆327Updated last month
- ☆225Updated last month
- An extension of the nanoGPT repository for training small MOE models.☆219Updated 9 months ago
- ☆465Updated 3 months ago
- JAX backend for SGL☆205Updated this week
- fmchisel: Efficient Compression and Training Algorithms for Foundation Models☆81Updated 2 months ago
- Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch☆1,026Updated 3 months ago
- ring-attention experiments☆160Updated last year
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)☆718Updated last week
- Memory optimized Mixture of Experts☆72Updated 5 months ago
- Accelerating MoE with IO and Tile-aware Optimizations☆469Updated this week
- Cataloging released Triton kernels.☆278Updated 3 months ago
- ☆178Updated last year
- 🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation…☆109Updated last month
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS☆244Updated 7 months ago
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗).☆627Updated 2 months ago
- torchcomms: a modern PyTorch communications API☆314Updated this week
- Building blocks for foundation models.☆585Updated last year
- Systems for GenAI☆151Updated 8 months ago
- Implementation for FP8/INT8 Rollout for RL training without performance drop.☆282Updated last month
- The evaluation framework for training-free sparse attention in LLMs☆106Updated 2 months ago