sekstini / gpupoor

☆17

Alternatives and similar repositories for gpupoor

Users who are interested in gpupoor are comparing it to the repositories listed below.

aredden / torch-cublas-hgemm
PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu
☆72 Updated 7 months ago
WaveSpeedAI / QuantumAttention
[WIP] Better (FP8) attention for Hopper
☆31 Updated 4 months ago
sekstini / basedxl
☆18 Updated last year
KONAKONA666 / q8_kernels
☆71 Updated 6 months ago
mag- / gpu_benchmark
GPU benchmark
☆63 Updated 5 months ago
fal-ai / diffusion-speedrun
Focused on fast experimentation and simplicity
☆76 Updated 6 months ago
timudk / flux_triton
Writing FLUX in Triton
☆34 Updated 9 months ago
fal-ai-community / NativeSparseAttention
Research implementation of Native Sparse Attention (arXiv:2502.11089)
☆54 Updated 4 months ago
SwayStar123 / microdiffusion
☆47 Updated 4 months ago
chengzeyi / piflux
(WIP) Parallel inference for black-forest-labs' FLUX model.
☆19 Updated 7 months ago
Clybius / Personalized-Optimizers
A collection of niche / personally useful PyTorch optimizers with modified code.
☆20 Updated this week
cloneofsimo / efae
☆23 Updated last year
SwayStar123 / reimei
☆24 Updated 2 months ago
IST-DASLab / Quartet
☆71 Updated 2 weeks ago
NovelAI / t5
Model code for inferencing T5
☆65 Updated 4 months ago
chu-tianxiang / QuIP-for-all
QuIP quantization
☆54 Updated last year
kuterd / opal_ptx
Experimental GPU language with meta-programming
☆23 Updated 10 months ago
sandyresearch / chipmunk
🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× …
☆74 Updated 2 weeks ago
huggingface / flux-fast
Making Flux go brrr on GPUs.
☆95 Updated last week
kyegomez / SingLoRA
This repository provides a minimal, single-file implementation of SingLoRA (Single Matrix Low-Rank Adaptation) as described in the paper …
☆27 Updated last week
BlinkDL / fast.c
Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.
☆72 Updated 5 months ago
kyleliang919 / Super_Muon
☆59 Updated 3 months ago
fal-ai / stable-diffusion-benchmarks
Comparison of different stable diffusion implementations and optimizations
☆39 Updated last year
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆92 Updated 7 months ago
ethansmith2000 / ImprovedTokenMerge
☆49 Updated last year
ScalingIntelligence / good-kernels
Samples of good AI-generated CUDA kernels
☆84 Updated last month
Cornell-RelaxML / qtip
☆139 Updated 3 weeks ago
sayakpaul / q8-ltx-video
This repository shows how to use Q8 kernels with `diffusers` to optimize inference of LTX-Video on Ada GPUs.
☆21 Updated 6 months ago
cloneofsimo / repa-rf
☆32 Updated 8 months ago
sayakpaul / simple-image-recaptioning
Recaption large (Web)Datasets with vLLM and save the artifacts.
☆52 Updated 7 months ago