High-performance safetensors model loader
☆143May 19, 2026Updated this week
Alternatives and similar repositories for fastsafetensors
Users that are interested in fastsafetensors are comparing it to the libraries listed below.
- A forked version of flux-fast that makes flux-fast even faster with cache-dit, 3.3x speedup on NVIDIA L20.☆24Jul 18, 2025Updated 10 months ago
- 🚀 Collection of libraries used with fms-hf-tuning to accelerate fine-tuning and training of large models.☆14Jan 30, 2026Updated 3 months ago
- ☆13May 11, 2026Updated last week
- ☆33Nov 4, 2024Updated last year
- ☆19Mar 4, 2025Updated last year
- Making Flux go brrr on GPUs.☆168Jan 5, 2026Updated 4 months ago
- ☆33Feb 3, 2025Updated last year
- EuroSys '22 - Rolis: a software approach to efficiently replicating multi-core transactions☆17Feb 28, 2024Updated 2 years ago
- DeeperGEMM: crazy optimized version☆86May 5, 2025Updated last year
- KV cache store for distributed LLM inference☆419Nov 13, 2025Updated 6 months ago
- Multiple GEMM operators are constructed with cutlass to support LLM inference.☆20Aug 3, 2025Updated 9 months ago
- Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs☆29Dec 17, 2024Updated last year
- ☆52May 19, 2025Updated last year
- ☆22Feb 26, 2025Updated last year
- A Triton JIT runtime and ffi provider in C++☆35Updated this week
- Alfred workflow for Typora.☆10Dec 31, 2025Updated 4 months ago
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.☆126Dec 25, 2025Updated 4 months ago
- A lightweight design for computation-communication overlap.☆232Jan 20, 2026Updated 4 months ago
- ByteCheckpoint: A Unified Checkpointing Library for LFMs☆279Feb 2, 2026Updated 3 months ago
- A curated list for Efficient Large Language Models☆11Mar 25, 2024Updated 2 years ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference☆667Jan 15, 2026Updated 4 months ago
- Estimate resources needed to train LLMs☆14Feb 10, 2026Updated 3 months ago
- Simplified Data Management and Sharing for Kubernetes☆18May 13, 2026Updated last week
- Linux kernel source tree for PVM☆37May 11, 2026Updated last week
- ☆13Jan 7, 2025Updated last year
- This is a fast RDMA abstraction layer that works both in the kernel and user-space.☆59Nov 12, 2024Updated last year
- Module, Model, and Tensor Serialization/Deserialization☆308Apr 30, 2026Updated 3 weeks ago
- UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation☆47Aug 26, 2025Updated 8 months ago
- ☆98Mar 26, 2025Updated last year
- An experimental communicating attention kernel based on DeepEP.☆34Jul 29, 2025Updated 9 months ago
- Whisper in TensorRT-LLM☆17Sep 21, 2023Updated 2 years ago
- A throughput-oriented high-performance serving framework for LLMs☆959Mar 29, 2026Updated last month
- Fine-tune of Florence-2 for shot categorization.☆26Mar 6, 2025Updated last year
- Flash Sculptor: Modular 3D Worlds from Objects☆33Apr 13, 2025Updated last year
- ☆33Apr 19, 2025Updated last year
- Virtual I/O acceleration technologies for KVM☆15Sep 17, 2013Updated 12 years ago
- Official implementation of ICML 2024 paper "ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking".☆48Jul 12, 2024Updated last year
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS☆33Feb 10, 2025Updated last year
- FlexAttention w/ FlashAttention3 Support☆27Oct 5, 2024Updated last year