ScalingIntelligence / good-kernels
Samples of good AI generated CUDA kernels
☆99 · Updated 7 months ago
Alternatives and similar repositories for good-kernels
Users who are interested in good-kernels are comparing it to the libraries listed below.
- RWKV-7: Surpassing GPT · ☆103 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters · ☆131 · Updated last year
- ☆163 · Updated 7 months ago
- ☆115 · Updated 2 weeks ago
- ☆71 · Updated 7 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" · ☆250 · Updated 11 months ago
- Simple high-throughput inference library · ☆155 · Updated 8 months ago
- Official implementation for Training LLMs with MXFP4 · ☆116 · Updated 8 months ago
- 👷 Build compute kernels · ☆213 · Updated this week
- Work in progress. · ☆77 · Updated last month
- PyTorch implementation of models from the Zamba2 series. · ☆186 · Updated 11 months ago
- ☆218 · Updated 11 months ago
- QuIP quantization · ☆61 · Updated last year
- ☆54 · Updated last year
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration · ☆255 · Updated last year
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model · ☆261 · Updated 7 months ago
- LLM Inference on consumer devices · ☆128 · Updated 10 months ago
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning · ☆318 · Updated 2 weeks ago
- Write a fast kernel and run it on Discord. See how you compare against the best! · ☆67 · Updated this week
- The evaluation framework for training-free sparse attention in LLMs · ☆110 · Updated 3 months ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… · ☆127 · Updated 3 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity · ☆91 · Updated last year
- Ship correct and fast LLM kernels to PyTorch · ☆132 · Updated last week
- Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs · ☆110 · Updated 2 years ago
- ☆52 · Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer · ☆229 · Updated 7 months ago
- ring-attention experiments · ☆161 · Updated last year
- Memory optimized Mixture of Experts · ☆72 · Updated 5 months ago
- Lightweight toolkit package to train and fine-tune 1.58bit Language models · ☆107 · Updated 8 months ago
- Normalized Transformer (nGPT) · ☆196 · Updated last year