High-throughput tensor loading for PyTorch
☆221 · Jan 22, 2026 · Updated last month
Alternatives and similar repositories for flashpack
Users that are interested in flashpack are comparing it to the libraries listed below
- FlexAttention w/ FlashAttention3 Support ☆27 · Oct 5, 2024 · Updated last year
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching ☆424 · Jul 5, 2025 · Updated 7 months ago
- SR-DiT Speedrunning ImageNet Diffusion ☆126 · Dec 31, 2025 · Updated 2 months ago
- DeeperGEMM: crazy optimized version ☆74 · May 5, 2025 · Updated 9 months ago
- A survey of manufacturer-provided DRAM operating parameters and timings as specified by DRAM chip datasheets from between 1970 and 2021. … ☆11 · May 4, 2022 · Updated 3 years ago
- [ICCV 2025] Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity ☆30 · Jul 14, 2025 · Updated 7 months ago
- Focused on fast experimentation and simplicity ☆80 · Dec 24, 2024 · Updated last year
- The flutter plugin for image-to-image-transformation using PyTorch-mobile (android) and CoreML (ios) ☆11 · Feb 5, 2023 · Updated 3 years ago
- Fast and memory-efficient exact attention ☆18 · Feb 23, 2026 · Updated last week
- ☆67 · Oct 25, 2025 · Updated 4 months ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆93 · Jan 16, 2026 · Updated last month
- A big_vision inspired repo that implements a generic Auto-Encoder class capable in representation learning and generative modeling. ☆34 · Jun 26, 2024 · Updated last year
- A Powerful LoRA key converter for ComfyUI ☆28 · Nov 17, 2025 · Updated 3 months ago
- All-in-one benchmarking platform for evaluating LLM. ☆15 · Nov 12, 2025 · Updated 3 months ago
- ☆16 · Sep 24, 2024 · Updated last year
- ☆47 · Jan 31, 2026 · Updated last month
- Multi-Layer Key-Value sharing experiments on Pythia models ☆34 · Jun 14, 2024 · Updated last year
- CLIP GUI - XAI app ~ explainable (and guessable) AI with ViT & ResNet models ☆21 · Sep 13, 2024 · Updated last year
- IP Address implementation ☆18 · Mar 6, 2025 · Updated 11 months ago
- ☆21 · Updated this week
- Code accompanying the paper "A Language Model's Guide Through Latent Space". It contains functionality for training and using concept vec… ☆21 · Feb 23, 2024 · Updated 2 years ago
- Animatediff implementation. Includes a ControlNet pipeline. ☆19 · Dec 24, 2023 · Updated 2 years ago
- Context7 Scoring Library ☆28 · Sep 19, 2025 · Updated 5 months ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆766 · Updated this week
- 🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× … ☆101 · Sep 8, 2025 · Updated 5 months ago
- End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training). ☆395 · Jan 8, 2026 · Updated last month
- An experimental implementation of compiler-driven automatic sharding of models across a given device mesh. ☆52 · Updated this week
- Simple LaMa Inpainting: An easy-to-use implementation of the LaMa (Large Mask) inpainting model. Remove unwanted objects or fill in missi… ☆23 · Nov 5, 2024 · Updated last year
- Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs ☆27 · Dec 17, 2024 · Updated last year
- ☆97 · Mar 9, 2025 · Updated 11 months ago
- ☆47 · Jan 18, 2024 · Updated 2 years ago
- An open source real-time AI inference engine for seamless scaling ☆22 · Jul 2, 2025 · Updated 8 months ago
- [⭐️ WACV 2025 Oral ⭐️] PETALface: Parameter Efficient Transfer Learning for Low-resolution Face Recognition ☆31 · Jun 9, 2025 · Updated 8 months ago
- Efficient optimizers ☆285 · Dec 20, 2025 · Updated 2 months ago
- [ICML 24 NGSM workshop] Associative Recurrent Memory Transformer implementation and scripts for training and evaluation ☆61 · Updated this week
- Node to tryoff clothes ☆23 · Apr 14, 2025 · Updated 10 months ago
- Official implementation of "Single Image Iterative Subject-driven Generation and Editing". ☆100 · May 30, 2025 · Updated 9 months ago
- ☆175 · Nov 8, 2025 · Updated 3 months ago