stanford-cs336 / assignment2-systems
Student version of Assignment 2 for Stanford CS336 - Language Modeling From Scratch
☆164 · Updated 6 months ago
Alternatives and similar repositories for assignment2-systems
Users interested in assignment2-systems are comparing it to the libraries listed below.
- Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime. ☆858 · Updated this week
- ☆105 · Updated 6 months ago
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ☆825 · Updated 2 weeks ago
- ☆46 · Updated 10 months ago
- mHC kernels implemented in CUDA ☆249 · Updated 3 weeks ago
- ☆236 · Updated last year
- Block Diffusion for Ultra-Fast Speculative Decoding ☆533 · Updated this week
- Based on Nano-vLLM, a simple replication of vLLM with self-contained paged attention and flash attention implementation ☆422 · Updated this week
- LLM KV cache compression made easy ☆876 · Updated 2 weeks ago
- Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch ☆1,243 · Updated 5 months ago
- Making the official Triton tutorials actually comprehensible ☆111 · Updated 5 months ago
- dInfer: An Efficient Inference Framework for Diffusion Language Models ☆413 · Updated this week
- Physics of Language Models: Part 4.2, Canon Layers at Scale where Synthetic Pretraining Resonates in Reality ☆317 · Updated last month
- Efficient Triton implementation of Native Sparse Attention ☆262 · Updated 8 months ago
- JAX backend for SGL ☆237 · Updated this week
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆452 · Updated 4 months ago
- Speed Always Wins: A Survey on Efficient Architectures for Large Language Models ☆394 · Updated 3 months ago
- ☆413 · Updated last year
- [ICLR 2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation ☆249 · Updated last year
- Implementation of FP8/INT8 rollout for RL training without performance drop ☆289 · Updated 3 months ago
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗) ☆658 · Updated 4 months ago
- An extension of the nanoGPT repository for training small MoE models ☆236 · Updated 11 months ago
- ☆232 · Updated 2 months ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs) ☆792 · Updated 3 weeks ago
- Efficient LLM Inference over Long Sequences ☆394 · Updated 7 months ago
- ☆961 · Updated 3 months ago
- [ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection ☆155 · Updated 11 months ago
- ☆270 · Updated 8 months ago
- [ICLR 2026] QeRL enables RL for 32B LLMs on a single H100 GPU ☆484 · Updated 2 months ago
- Awesome-LLM-KV-Cache: A curated list of 📙 Awesome LLM KV Cache Papers with Codes ☆411 · Updated 11 months ago