huggingface / kernel-builderLinks

👷 Build compute kernels

☆87

Alternatives and similar repositories for kernel-builder

Users that are interested in kernel-builder are comparing it to the libraries listed below

Sorting:

huggingface / kernels
Load compute kernels from the Hub
☆220Updated this week
joey00072 / ohara
Collection of autoregressive model implementation
☆86Updated 3 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆127Updated 8 months ago
HazyResearch / train-tk
train with kittens!
☆61Updated 9 months ago
foundation-model-stack / bamba
Train, tune, and infer Bamba model
☆130Updated 2 months ago
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆61Updated 9 months ago
huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆117Updated 6 months ago
cloneofsimo / ptx-tutorial-by-aislop
PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7)
☆66Updated 4 months ago
PrimeIntellect-ai / pccl
PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP
☆98Updated 2 weeks ago
VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆68Updated 3 months ago
kyleliang919 / Super_Muon
☆60Updated 4 months ago
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆81Updated 2 months ago
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆43Updated 7 months ago
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆184Updated 6 months ago
IST-DASLab / Quartet
☆76Updated last month
opendatahub-io / vllm-tgis-adapter
vLLM adapter for a TGIS-compatible gRPC server.
☆33Updated this week
AnswerDotAI / cold-compress
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of…
☆138Updated 11 months ago
neuralmagic / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆141Updated last week
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆121Updated this week
tilde-research / MoMoE-impl
Memory optimized Mixture of Experts
☆51Updated last week
UmerHA / triton_util
Make triton easier
☆47Updated last year
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆86Updated last month
michaelfeil / candle-flash-attn-v3
☆12Updated 6 months ago
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆48Updated this week
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆277Updated last week
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆94Updated 8 months ago
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆244Updated 6 months ago
mgmalek / efficient_cross_entropy
☆114Updated last year
cloneofsimo / min-fsdp
☆83Updated last year
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆103Updated 4 months ago