ScalingIntelligence / good-kernels
Samples of good AI generated CUDA kernels
☆84 · Updated last month
Alternatives and similar repositories for good-kernels
Users interested in good-kernels are comparing it to the libraries listed below.
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆128 · Updated 7 months ago
- RWKV-7: Surpassing GPT ☆92 · Updated 7 months ago
- Simple high-throughput inference library ☆120 · Updated 2 months ago
- ☆139 · Updated 3 weeks ago
- ☆71 · Updated 2 weeks ago
- ☆41 · Updated 3 weeks ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆135 · Updated this week
- LLM Inference on consumer devices ☆120 · Updated 3 months ago
- PyTorch implementation of models from the Zamba2 series. ☆183 · Updated 5 months ago
- Inference of Mamba models in pure C ☆188 · Updated last year
- Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton. ☆139 · Updated this week
- Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆72 · Updated 5 months ago
- High-Performance SGEMM on CUDA devices (a minimal baseline kernel is sketched after this list) ☆97 · Updated 5 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆46 · Updated 2 weeks ago
- ☆79 · Updated 8 months ago
- Work in progress. ☆70 · Updated 2 weeks ago
- QuIP quantization ☆54 · Updated last year
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆47 · Updated 3 months ago
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP ☆96 · Updated last month
- ☆49 · Updated 11 months ago
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆92 · Updated last month
- ☆214 · Updated 5 months ago
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… ☆59 · Updated 8 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆82 · Updated 3 weeks ago
- Token Omission Via Attention ☆128 · Updated 9 months ago
- Library for text-to-text regression, applicable to any input string representation and allows pretraining and fine-tuning over multiple r… ☆86 · Updated this week
- A collection of tricks and tools to speed up transformer models ☆170 · Updated last month
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" (a ternary matvec sketch follows the SGEMM example below) ☆154 · Updated 9 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆243 · Updated 5 months ago
- TritonParse is a tool designed to help developers analyze and debug Triton kernels by visualizing the compilation process and source code… ☆126 · Updated last week
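
For context on the SGEMM entry above: the sketch below is the naive baseline that optimized SGEMM repositories typically improve on with tiling, shared memory, and register blocking. The kernel name `sgemm_naive`, the row-major layout, and the all-ones test driver are illustrative assumptions, not code from any repository listed here.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Naive SGEMM baseline: C = alpha * A * B + beta * C, with A (M x K),
// B (K x N), C (M x N), all row-major. One thread computes one element
// of C; tuned kernels add tiling, shared memory, and vectorized loads.
__global__ void sgemm_naive(int M, int N, int K, float alpha,
                            const float *A, const float *B,
                            float beta, float *C) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k) {
            acc += A[row * K + k] * B[k * N + col];
        }
        C[row * N + col] = alpha * acc + beta * C[row * N + col];
    }
}

int main() {
    const int M = 256, N = 256, K = 256;
    float *A, *B, *C;
    cudaMallocManaged(&A, M * K * sizeof(float));
    cudaMallocManaged(&B, K * N * sizeof(float));
    cudaMallocManaged(&C, M * N * sizeof(float));
    for (int i = 0; i < M * K; ++i) A[i] = 1.0f;
    for (int i = 0; i < K * N; ++i) B[i] = 1.0f;
    for (int i = 0; i < M * N; ++i) C[i] = 0.0f;

    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    sgemm_naive<<<grid, block>>>(M, N, K, 1.0f, A, B, 0.0f, C);
    cudaDeviceSynchronize();

    printf("C[0] = %f (expected %d for all-ones inputs)\n", C[0], K);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```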
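
The "Era of 1-bit LLMs" entry above concerns ternary weights in {-1, 0, +1}. The sketch below shows why such weights let a matrix-vector product run with only additions and subtractions; the name `ternary_matvec`, the `int8_t` weight storage, and the cycling test pattern are assumptions for illustration, not that repository's kernels.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdint>

// Ternary-weight matvec: y = scale * (W x), every entry of W is -1, 0, or +1
// (stored here as int8_t). The dot product needs no multiplications, only
// additions and subtractions, which is the arithmetic simplification that
// 1.58-bit ("1-bit LLM") schemes exploit.
__global__ void ternary_matvec(int M, int N, const int8_t *W,
                               const float *x, float scale, float *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= M) return;
    float acc = 0.0f;
    for (int j = 0; j < N; ++j) {
        int8_t w = W[row * N + j];
        if (w == 1)       acc += x[j];
        else if (w == -1) acc -= x[j];
        // w == 0 contributes nothing
    }
    y[row] = scale * acc;
}

int main() {
    const int M = 128, N = 256;
    int8_t *W; float *x, *y;
    cudaMallocManaged(&W, M * N * sizeof(int8_t));
    cudaMallocManaged(&x, N * sizeof(float));
    cudaMallocManaged(&y, M * sizeof(float));
    for (int i = 0; i < M * N; ++i) W[i] = (int8_t)((i % 3) - 1);  // cycle -1, 0, +1
    for (int j = 0; j < N; ++j) x[j] = 1.0f;

    ternary_matvec<<<(M + 127) / 128, 128>>>(M, N, W, x, 0.5f, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);
    cudaFree(W); cudaFree(x); cudaFree(y);
    return 0;
}
```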