foundation-model-stack / fastsafetensors
High-performance safetensors model loader
☆94 · Updated 3 weeks ago
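For context, a minimal loading sketch roughly following the usage shown in the fastsafetensors README; the SafeTensorsFileLoader / SingleGroup API is taken from that README, while the file paths and tensor name below are placeholders, not verified against the current release:

```python
# Minimal sketch of loading tensors with fastsafetensors (API names assumed
# from the project README; file paths and tensor names are placeholders).
import torch
from fastsafetensors import SafeTensorsFileLoader, SingleGroup

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# SingleGroup() stands in for a torch.distributed process group when
# loading on a single rank.
loader = SafeTensorsFileLoader(SingleGroup(), device)

# Map rank -> list of safetensors files to load on that rank.
loader.add_filenames({0: ["model-00001-of-00002.safetensors",
                          "model-00002-of-00002.safetensors"]})

# Copy file contents to the target device, then look tensors up by name.
fb = loader.copy_files_to_device()
weight = fb.get_tensor("model.layers.0.self_attn.q_proj.weight")
print(weight.shape, weight.dtype)

fb.close()
loader.close()
```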
Alternatives and similar repositories for fastsafetensors
Users interested in fastsafetensors are comparing it to the libraries listed below.
- CUDA checkpoint and restore utility ☆406 · Updated 4 months ago
- ☆31 · Updated 9 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆237 · Updated last week
- ☆48 · Updated last year
- ☆278 · Updated last week
- Module, Model, and Tensor Serialization/Deserialization ☆286 · Updated 5 months ago
- The driver for LMCache core to run in vLLM ☆60 · Updated 11 months ago
- Fast and memory-efficient exact attention ☆111 · Updated this week
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆260 · Updated this week
- Accelerating MoE with IO and Tile-aware Optimizations ☆563 · Updated 2 weeks ago
- torchcomms: a modern PyTorch communications API ☆323 · Updated last week
- DeeperGEMM: crazy optimized version ☆73 · Updated 8 months ago
- ☆206 · Updated 8 months ago
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆461 · Updated last month
- ☆76 · Updated last year
- LoRAFusion: Efficient LoRA Fine-Tuning for LLMs ☆23 · Updated 4 months ago
- Perplexity GPU Kernels ☆554 · Updated 2 months ago
- KV cache store for distributed LLM inference ☆389 · Updated 2 months ago
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆255 · Updated this week
- ☆342 · Updated this week
- Extensible collectives library in Triton ☆93 · Updated 10 months ago
- An NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library ☆87 · Updated last month
- Kernels, of the mega variety ☆657 · Updated 4 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆384 · Updated this week
- ☆71 · Updated 11 months ago
- NVIDIA Inference Xfer Library (NIXL) ☆864 · Updated this week
- ☆321 · Updated last year
- A high-performance and lightweight router for large-scale vLLM deployment ☆95 · Updated last week
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers ☆440 · Updated last month
- Toolchain built around Megatron-LM for distributed training ☆84 · Updated last month