High-performance safetensors model loader
☆111 · Updated this week
Alternatives and similar repositories for fastsafetensors
Users who are interested in fastsafetensors are comparing it to the libraries listed below.
- ☆285 · Updated this week
- ☆13 · Feb 10, 2026 · Updated 2 weeks ago
- CRIU based GPU workload migration in Kubernetes · ☆19 · Apr 22, 2025 · Updated 10 months ago
- ☆18 · Mar 4, 2025 · Updated 11 months ago
- ☆33 · Nov 4, 2024 · Updated last year
- EuroSys '22 - Rolis: a software approach to efficiently replicating multi-core transactions · ☆17 · Feb 28, 2024 · Updated 2 years ago
- Normalize text string · ☆12 · Nov 6, 2018 · Updated 7 years ago
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. · ☆123 · Dec 25, 2025 · Updated 2 months ago
- Module, Model, and Tensor Serialization/Deserialization · ☆289 · Feb 6, 2026 · Updated 3 weeks ago
- Triton kernels for Flux · ☆22 · Jul 7, 2025 · Updated 7 months ago
- A Triton JIT runtime and ffi provider in C++ · ☆31 · Updated this week
- KV cache store for distributed LLM inference · ☆392 · Nov 13, 2025 · Updated 3 months ago
- Linux kernel source tree for PVM · ☆32 · Sep 24, 2025 · Updated 5 months ago
- [ACL 2025 Main] Repository for the paper: 500xCompressor: Generalized Prompt Compression for Large Language Models · ☆56 · Jun 11, 2025 · Updated 8 months ago
- Making Flux go brrr on GPUs. · ☆163 · Jan 5, 2026 · Updated last month
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… · ☆29 · Jul 24, 2025 · Updated 7 months ago
- 🍑 relsim: Relational Visual Similarity | pip install relsim 🌍 (CVPR 2026) · ☆63 · Feb 21, 2026 · Updated last week
- High Performance KV Cache Store for LLM · ☆47 · Updated this week
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving · ☆74 · Sep 15, 2025 · Updated 5 months ago
- A throughput-oriented high-performance serving framework for LLMs · ☆946 · Oct 29, 2025 · Updated 4 months ago
- ☆24 · Jun 4, 2024 · Updated last year
- Compression for Foundation Models · ☆35 · Jul 21, 2025 · Updated 7 months ago
- ☆25 · Sep 19, 2025 · Updated 5 months ago
- Dynamic Memory Management for Serving LLMs without PagedAttention · ☆464 · May 30, 2025 · Updated 9 months ago
- LCM Full Cycle Trainer for Ostris - Ai Toolkit · ☆16 · Aug 20, 2024 · Updated last year
- Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary. · ☆28 · Feb 21, 2023 · Updated 3 years ago
- A lightweight design for computation-communication overlap. · ☆223 · Jan 20, 2026 · Updated last month
- ☆160 · Dec 27, 2024 · Updated last year
- ByteCheckpoint: An Unified Checkpointing Library for LFMs · ☆270 · Feb 2, 2026 · Updated last month
- UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation · ☆46 · Aug 26, 2025 · Updated 6 months ago
- An OS kernel module for fast **remote** fork using advanced datacenter networking (RDMA). · ☆71 · Feb 15, 2025 · Updated last year
- Consistent Autoregressive Video Generation with Long Context · ☆67 · Feb 6, 2026 · Updated 3 weeks ago
- ☆40 · Sep 1, 2025 · Updated 6 months ago
- DeeperGEMM: crazy optimized version · ☆74 · May 5, 2025 · Updated 9 months ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference · ☆644 · Jan 15, 2026 · Updated last month
- ☆34 · Feb 3, 2025 · Updated last year
- ☆30 · Sep 14, 2022 · Updated 3 years ago
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines · ☆912 · Updated this week
- The classic STREAM benchmark, extended to measure NUMA effects. · ☆38 · Aug 8, 2019 · Updated 6 years ago