ScalingIntelligence / good-kernels

Samples of good AI-generated CUDA kernels

☆91

Alternatives and similar repositories for good-kernels

Users who are interested in good-kernels are comparing it to the repositories listed below.

BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆100 · Updated last year
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130 · Updated 11 months ago
Cornell-RelaxML / yaqa-quantization
☆62 · Updated 4 months ago
facebookresearch / fastgen
Simple high-throughput inference library
☆149 · Updated 6 months ago
amazon-science / mxfp4-llm
Official implementation for Training LLMs with MXFP4
☆102 · Updated 6 months ago
IST-DASLab / Quartet
☆106 · Updated 2 weeks ago
VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆72 · Updated 6 months ago
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆185 · Updated 9 months ago
GreenBitAI / low_bit_llama
Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs
☆110 · Updated last year
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆98 · Updated 6 months ago
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆249 · Updated 9 months ago
apple / ml-recurrent-drafter
☆218 · Updated 9 months ago
chu-tianxiang / QuIP-for-all
QuIP quantization
☆60 · Updated last year
Cornell-RelaxML / qtip
☆153 · Updated 4 months ago
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆61 · Updated last week
huggingface / kernel-builder
👷 Build compute kernels
☆178 · Updated this week
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆74 · Updated 8 months ago
IST-DASLab / QuEST
Work in progress.
☆75 · Updated 4 months ago
Infini-AI-Lab / UMbreLLa
LLM Inference on consumer devices
☆125 · Updated 8 months ago
wdlctc / mini-s
☆52 · Updated last year
tilde-research / MoMoE-impl
Memory optimized Mixture of Experts
☆69 · Updated 3 months ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆103 · Updated last month
HazyResearch / train-tk
train with kittens!
☆63 · Updated last year
meta-pytorch / BackendBench
How to ensure correctness and ship LLM generated kernels in PyTorch
☆117 · Updated this week
facebookresearch / llm-speedrunner
The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…
☆111 · Updated last month
PrimeIntellect-ai / pccl
PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP
☆138 · Updated 2 months ago
NolanoOrg / SpectraSuite
☆52 · Updated last year
astramind-ai / BitMat
An efficient implementation of the method proposed in "The Era of 1-bit LLMs"
☆154 · Updated last year
SJTU-IPADS / Bamboo
Bamboo-7B Large Language Model
☆93 · Updated last year
DeepAuto-AI / hip-attention
Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.
☆148 · Updated 2 weeks ago