timudk / flux_tritonLinks

Writing FLUX in Triton

☆41

Alternatives and similar repositories for flux_triton

Users that are interested in flux_triton are comparing it to the libraries listed below

Sorting:

huggingface / flux-fast
Making Flux go brrr on GPUs.
☆150Updated 3 months ago
chengzeyi / piflux
(WIP) Parallel inference for black-forest-labs' FLUX model.
☆18Updated 11 months ago
SwayStar123 / microdiffusion
☆49Updated 8 months ago
cloneofsimo / repa-rf
☆32Updated 11 months ago
WaveSpeedAI / QuantumAttention
[WIP] Better (FP8) attention for Hopper
☆33Updated 8 months ago
KONAKONA666 / q8_kernels
☆76Updated 10 months ago
RE-N-Y / imscore
Minimal Differentiable Image Reward Functions
☆97Updated 2 months ago
mapo-t2i / mapo
Official codebase for Margin-aware Preference Optimization for Aligning Diffusion Models without Reference (MaPO).
☆82Updated last year
ai-compiler-study / triton-kernels
Triton kernels for Flux
☆22Updated 3 months ago
csguoh / IntLoRA
[ICML2025] LoRA fine-tune directly on the quantized models.
☆36Updated 11 months ago
fal-ai / diffusion-speedrun
Focused on fast experimentation and simplicity
☆75Updated 10 months ago
yandex-research / vqdm
Official repository for VQDM:Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization paper
☆34Updated last year
sayakpaul / simple-image-recaptioning
Recaption large (Web)Datasets with vllm and save the artifacts.
☆52Updated 11 months ago
Lucky-Lance / TerDiT
TerDiT: Ternary Diffusion Models with Transformers
☆71Updated last year
huggingface / diffusion-fast
Faster generation with text-to-image diffusion models.
☆228Updated 3 months ago
cloneofsimo / imagenet.int8
☆39Updated last year
cloneofsimo / min-max-in-dit
☆27Updated last year
sandyresearch / chipmunk
🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× …
☆87Updated last month
xdit-project / mochi-xdit
faster parallel inference of mochi-1 video generation model
☆125Updated 8 months ago
NVlabs / T-Stitch
[ICLR 2025] Official PyTorch implmentation of paper "T-Stitch: Accelerating Sampling in Pre-trained Diffusion Models with Trajectory Stit…
☆103Updated last year
czg1225 / AsyncDiff
[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
☆205Updated last month
Roblox / SmoothCache
Implementation of SmoothCache, a project aimed at speeding-up Diffusion Transformer (DiT) based GenAI models with error-guided caching.
☆45Updated 3 months ago
fal-ai-community / NativeSparseAttention
research impl of Native Sparse Attention (2502.11089)
☆62Updated 8 months ago
ethansmith2000 / AutoLoRADiscovery
☆28Updated last year
cloneofsimo / project_RF
☆24Updated last year
stanislavfort / Direct_Ascent_Synthesis
A demo for the Direct Ascent Synthesis: Hidden Generative Capabilities in Discriminative Models paper (https://arxiv.org/abs/2502.07753)
☆40Updated 7 months ago
cloneofsimo / efae
☆23Updated last year
mit-han-lab / VisCompare
A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders
☆24Updated 8 months ago
ethansmith2000 / ImprovedTokenMerge
☆49Updated last year
aredden / torch-cublas-hgemm
PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu
☆75Updated 10 months ago