foundation-model-stack / fastsafetensors
High-performance safetensors model loader
☆39 · Updated last week
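To make concrete what a safetensors loader parses, here is a minimal, stdlib-only sketch of the publicly documented safetensors file layout (an 8-byte little-endian header size, a JSON header mapping tensor names to dtype/shape/offsets, then a raw byte buffer). This is an illustrative toy, not fastsafetensors' implementation: the single-tensor writer, float32-only handling, and function names are assumptions for the example, and real loaders like fastsafetensors add zero-copy and GPU-direct optimizations on top of this format.

```python
# Toy reader/writer for the safetensors on-disk layout (illustrative only).
# Assumptions: float32 tensors stored as flat Python float lists, no
# "__metadata__" entry, no framework dependency.
import json
import struct

def save_safetensors(path, name, shape, values):
    """Write one float32 tensor in safetensors layout:
    8-byte little-endian header size, JSON header, raw data bytes."""
    data = struct.pack(f"<{len(values)}f", *values)
    header = {name: {"dtype": "F32", "shape": shape,
                     "data_offsets": [0, len(data)]}}
    hbytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(hbytes)))  # header size prefix
        f.write(hbytes)                          # JSON header
        f.write(data)                            # tensor byte buffer

def load_safetensors(path):
    """Parse the header, then slice each tensor out of the byte buffer.
    Returns {name: (shape, [floats])}."""
    with open(path, "rb") as f:
        (hsize,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(hsize))
        buf = f.read()
    out = {}
    for name, meta in header.items():
        start, end = meta["data_offsets"]
        count = (end - start) // 4  # 4 bytes per float32
        out[name] = (meta["shape"],
                     list(struct.unpack(f"<{count}f", buf[start:end])))
    return out
```

Because the header is self-describing and tensor data sits at fixed offsets, a loader can memory-map the file and hand out tensors without copying, which is the property high-performance loaders exploit.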
Alternatives and similar repositories for fastsafetensors
Users interested in fastsafetensors are comparing it to the libraries listed below.
- ☆38 · Updated 5 months ago
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆177 · Updated 2 weeks ago
- The driver for LMCache core to run in vLLM ☆41 · Updated 4 months ago
- Extensible collectives library in Triton ☆86 · Updated 2 months ago
- NVIDIA Inference Xfer Library (NIXL) ☆413 · Updated this week
- ☆28 · Updated 2 months ago
- CUDA checkpoint and restore utility ☆345 · Updated 4 months ago
- DeeperGEMM: crazy optimized version ☆69 · Updated last month
- Ultra and Unified CCL ☆154 · Updated this week
- ☆91 · Updated 5 months ago
- KV cache store for distributed LLM inference ☆261 · Updated 2 weeks ago
- ☆26 · Updated 3 months ago
- A lightweight design for computation-communication overlap ☆141 · Updated this week
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate ☆166 · Updated this week
- Aims to implement dual-port and multi-QP solutions in the DeepEP IBRC transport ☆50 · Updated last month
- NCCL Profiling Kit ☆137 · Updated 11 months ago
- Fast and memory-efficient exact attention ☆74 · Updated this week
- ☆81 · Updated 7 months ago
- ☆49 · Updated 3 months ago
- A low-latency & high-throughput serving engine for LLMs ☆379 · Updated 3 weeks ago
- Perplexity GPU Kernels ☆364 · Updated last week
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆123 · Updated last year
- Efficient and easy multi-instance LLM serving ☆437 · Updated this week
- ☆62 · Updated last year
- Fast low-bit matmul kernels in Triton ☆322 · Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆97 · Updated this week
- ☆55 · Updated 9 months ago
- Stateful LLM Serving ☆73 · Updated 3 months ago
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation ☆87 · Updated last month
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆78 · Updated 9 months ago