Fridge003 / Cuda-Learn-By-Practice
Codebase for Cuda Learning
☆29 Updated last year
Alternatives and similar repositories for Cuda-Learn-By-Practice
Users that are interested in Cuda-Learn-By-Practice are comparing it to the libraries listed below
- Distributed MoE in a Single Kernel [NeurIPS '25] ☆188 Updated this week
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators ☆111 Updated 7 months ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆51 Updated 6 months ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆85 Updated this week
- ☆128 Updated 5 months ago
- Systems for GenAI ☆155 Updated last week
- ☆52 Updated 8 months ago
- 🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… ☆115 Updated 2 months ago
- Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention] ☆49 Updated 10 months ago
- An early research stage expert-parallel load balancer for MoE models based on linear programming. ☆491 Updated 2 months ago
- [ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference ☆48 Updated 7 months ago
- Estimate MFU for DeepSeekV3 ☆26 Updated last year
- ☆65 Updated 9 months ago
- JAX backend for SGL ☆232 Updated this week
- Ship correct and fast LLM kernels to PyTorch ☆139 Updated 2 weeks ago
- Autonomous GPU Kernel Generation via Deep Agents ☆223 Updated this week
- Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio… ☆82 Updated 4 months ago
- APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM tra… ☆47 Updated 3 months ago
- ☆45 Updated 10 months ago
- ByteCheckpoint: An Unified Checkpointing Library for LFMs ☆264 Updated last month
- ☆47 Updated last year
- ☆96 Updated 10 months ago
- ☆117 Updated 8 months ago
- Our first fully AI generated deep learning system ☆429 Updated last week
- ☆50 Updated 9 months ago
- DeeperGEMM: crazy optimized version ☆73 Updated 8 months ago
- ☆74 Updated this week
- ☆87 Updated last week
- An experimental communicating attention kernel based on DeepEP. ☆35 Updated 6 months ago
- A simple calculation for LLM MFU. ☆66 Updated 4 months ago