fal-ai / flashpack

High-throughput tensor loading for PyTorch

☆197

Alternatives and similar repositories for flashpack

Users who are interested in flashpack are comparing it to the libraries listed below.

huggingface / flux-fast
Making Flux go brrr on GPUs.
☆154 Updated 4 months ago
xdit-project / mochi-xdit
faster parallel inference of mochi-1 video generation model
☆125 Updated 8 months ago
sayakpaul / simple-image-recaptioning
Recaption large (Web)Datasets with vllm and save the artifacts.
☆52 Updated 11 months ago
WaveSpeedAI / QuantumAttention
[WIP] Better (FP8) attention for Hopper
☆32 Updated 8 months ago
fal-ai / diffusion-speedrun
Focused on fast experimentation and simplicity
☆75 Updated 10 months ago
KONAKONA666 / q8_kernels
☆77 Updated 10 months ago
SwayStar123 / microdiffusion
☆49 Updated 8 months ago
fal-ai / stable-diffusion-benchmarks
Comparison of different stable diffusion implementations and optimizations
☆39 Updated last year
chengzeyi / piflux
(WIP) Parallel inference for black-forest-labs' FLUX model.
☆18 Updated last year
aredden / torch-cublas-hgemm
PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu
☆76 Updated 11 months ago
cloneofsimo / project_RF
☆24 Updated last year
fal-ai-community / llmdifftracker
Lightweight package that tracks and summarizes code changes using LLMs (Large Language Models)
☆34 Updated 8 months ago
kyegomez / SingLoRA
This repository provides a minimal, single-file implementation of SingLoRA (Single Matrix Low-Rank Adaptation) as described in the paper …
☆44 Updated last week
sayakpaul / diffusers-torchao
End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).
☆383 Updated 5 months ago
timudk / flux_triton
Writing FLUX in Triton
☆41 Updated last year
ethansmith2000 / AutoLoRADiscovery
☆27 Updated last year
gau-nernst / kokoro
https://hf.co/hexgrad/Kokoro-82M
☆14 Updated 8 months ago
sekstini / gpupoor
☆18 Updated 11 months ago
cloneofsimo / infinite-fractal-stream
☆30 Updated last year
deepreinforce-ai / CUDA-L1
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
☆244 Updated 2 weeks ago
peanutcocktail / CogVideo
Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
☆17 Updated last year
morphicfilms / frames-to-video
☆155 Updated last week
SwayStar123 / reimei
☆25 Updated last month
sandyresearch / chipmunk
🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× …
☆90 Updated 2 months ago
NovelAI / t5
Model code for inferencing T5
☆66 Updated 8 months ago
lodestone-rock / RamTorch
RAM is all you need
☆223 Updated last week
tdrussell / qlora-pipe
A pipeline parallel training script for LLMs.
☆162 Updated 6 months ago
sayakpaul / q8-ltx-video
This repository shows how to use Q8 kernels with `diffusers` to optimize inference of LTX-Video on ADA GPUs.
☆24 Updated 10 months ago
SonicCodes / lucid-v1
realtime latent world model inference demo
☆48 Updated last year
aredden / flux-fp8-api
Flux diffusion model implementation using quantized fp8 matmul & remaining layers use faster half precision accumulate, which is ~2x fast…
☆285 Updated last year