High-performance safetensors model loader
☆121 · Updated Mar 11, 2026
Alternatives and similar repositories for fastsafetensors
Users interested in fastsafetensors are comparing it to the libraries listed below.
- A forked version of flux-fast that makes flux-fast even faster with cache-dit; 3.3× speedup on an NVIDIA L20 (☆24 · Updated Jul 18, 2025)
- 🚀 Collection of libraries used with fms-hf-tuning to accelerate fine-tuning and training of large models (☆13 · Updated Jan 30, 2026)
- CRIU-based GPU workload migration in Kubernetes (☆20 · Updated Apr 22, 2025)
- A tool for coordinated checkpoint/restore of distributed applications with CRIU (☆31 · Updated Mar 2, 2026)
- Rolis: a software approach to efficiently replicating multi-core transactions (EuroSys '22) (☆17 · Updated Feb 28, 2024)
- This action provides GitHub Actions runner OS information (☆14 · Updated Mar 1, 2026)
- NVIDIA Inference Xfer Library (NIXL) (☆945 · Updated this week)
- A Triton JIT runtime and FFI provider in C++ (☆32 · Updated this week)
- Alfred workflow for Typora (☆10 · Updated Dec 31, 2025)
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation (☆123 · Updated Dec 25, 2025)
- A Golang Prometheus backfilling library (☆10 · Updated May 30, 2023)
- A curated list for Efficient Large Language Models (☆11 · Updated Mar 25, 2024)
- VUA stands for 'VAST Undivided Attention'. It's a global KVCache storage solution optimizing LLM time to first token (TTFT) and GPU utili… (☆37 · Updated Mar 12, 2026)
- Simplified Data Management and Sharing for Kubernetes (☆18 · Updated Mar 11, 2026)
- Making Flux go brrr on GPUs (☆163 · Updated Jan 5, 2026)
- Module, Model, and Tensor Serialization/Deserialization (☆294 · Updated Feb 6, 2026)
- This is a fast RDMA abstraction layer that works both in the kernel and in user space (☆59 · Updated Nov 12, 2024)
- KV cache store for distributed LLM inference (☆399 · Updated Nov 13, 2025)
- UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation (☆46 · Updated Aug 26, 2025)
- IoT projects with Elixir/Phoenix for fukuoka.ex#11 (☆14 · Updated Apr 21, 2019)
- Hack to start another instance of the wpa_supplicant daemon (☆13 · Updated Nov 16, 2017)
- Fine-tune of Florence-2 for shot categorization (☆26 · Updated Mar 6, 2025)
- Virtual I/O acceleration technologies for KVM (☆15 · Updated Sep 17, 2013)
- PelemayBackend: a memory-saving, fault-tolerant, and distributed collection of Nx compilers and backends for embedded systems (☆26 · Updated Apr 17, 2025)
- Official implementation of the ICML 2024 paper "ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking" (☆47 · Updated Jul 12, 2024)
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for long-context transformer model training and inference (☆649 · Updated Jan 15, 2026)
- An open-source icon generation tool based on OpenAI gpt-image-1 (☆14 · Updated Nov 3, 2025)
- Pie: Programmable LLM Serving (☆131 · Updated this week)
- DeeperGEMM: a crazily optimized version (☆75 · Updated May 5, 2025)
- Tacker: Tensor-CUDA Core kernel fusion for improving GPU utilization while ensuring QoS (☆34 · Updated Feb 10, 2025)