fal-ai / flashpack
High-throughput tensor loading for PyTorch
☆209 Updated this week
Alternatives and similar repositories for flashpack
Users who are interested in flashpack are comparing it to the libraries listed below.
- Making Flux go brrr on GPUs. ☆157 Updated 4 months ago
- faster parallel inference of mochi-1 video generation model ☆125 Updated 9 months ago
- Focused on fast experimentation and simplicity ☆75 Updated 11 months ago
- Recaption large (Web)Datasets with vllm and save the artifacts. ☆52 Updated last year
- [WIP] Better (FP8) attention for Hopper ☆32 Updated 9 months ago
- PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu ☆76 Updated last year
- ☆49 Updated 9 months ago
- End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training). ☆388 Updated 6 months ago
- Comparison of different stable diffusion implementations and optimizations ☆40 Updated last year
- RAM is all you need ☆251 Updated last week
- ☆166 Updated last month
- ☆24 Updated last year
- ☆18 Updated last year
- Lightweight package that tracks and summarizes code changes using LLMs (Large Language Models) ☆34 Updated 9 months ago
- Flux diffusion model implementation using quantized fp8 matmul & remaining layers use faster half precision accumulate, which is ~2x fast… ☆287 Updated last year
- ☆69 Updated last year
- ☆76 Updated 11 months ago
- (CVPR 2025) Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis ☆200 Updated 4 months ago
- Writing FLUX in Triton ☆41 Updated last year
- ☆30 Updated last year
- https://hf.co/hexgrad/Kokoro-82M ☆14 Updated 9 months ago
- Model code for inferencing T5 ☆66 Updated 8 months ago
- (WIP) Parallel inference for black-forest-labs' FLUX model. ☆18 Updated last year
- Community ComfyUI workflows running on fal.ai ☆57 Updated last year
- This repository provides a minimal, single-file implementation of SingLoRA (Single Matrix Low-Rank Adaptation) as described in the paper … ☆44 Updated last week
- ☆27 Updated 2 months ago
- ☆165 Updated last week
- ☆27 Updated last year
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆247 Updated last month
- ☆20 Updated last year