stanford-cs336 / assignment2-systems
☆41Updated 2 months ago
Alternatives and similar repositories for assignment2-systems
Users who are interested in assignment2-systems are comparing it to the libraries listed below
- ☆198Updated 5 months ago
- ☆161Updated last year
- An extension of the nanoGPT repository for training small MOE models.☆160Updated 4 months ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.☆378Updated 4 months ago
- ☆28Updated last month
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand☆188Updated last month
- Normalized Transformer (nGPT)☆184Updated 7 months ago
- making the official triton tutorials actually comprehensible☆45Updated 3 months ago
- ☆179Updated 6 months ago
- Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch☆348Updated 3 months ago
- A collection of tricks and tools to speed up transformer models☆170Updated last month
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models☆216Updated 3 weeks ago
- ring-attention experiments☆144Updated 9 months ago
- Library for text-to-text regression, applicable to any input string representation and allows pretraining and fine-tuning over multiple r…☆86Updated this week
- ☆179Updated 7 months ago
- The evaluation framework for training-free sparse attention in LLMs☆82Updated 3 weeks ago
- Scalable toolkit for efficient model reinforcement☆499Updated this week
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache☆112Updated this week
- PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7)☆66Updated 3 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer☆138Updated last month
- in this repository, i'm going to implement increasingly complex llm inference optimizations☆63Updated last month
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters☆128Updated 7 months ago
- Load compute kernels from the Hub☆203Updated this week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS☆195Updated 2 months ago
- Exploring Applications of GRPO☆240Updated this week
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…☆87Updated 2 weeks ago
- ☆35Updated 4 months ago
- LLM KV cache compression made easy☆535Updated last week
- Efficient LLM Inference over Long Sequences☆382Updated 3 weeks ago
- ☆222Updated last month