foundation-model-stack / fastsafetensors
High-performance safetensors model loader
☆25 Updated 3 weeks ago
Alternatives and similar repositories for fastsafetensors:
Users who are interested in fastsafetensors compare it to the libraries listed below.
- NVIDIA Inference Xfer Library (NIXL) ☆304 Updated this week
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆151 Updated this week
- ☆49 Updated last month
- ☆25 Updated 2 weeks ago
- CUDA checkpoint and restore utility ☆330 Updated 3 months ago
- ☆205 Updated last month
- KV cache store for distributed LLM inference ☆165 Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆88 Updated last week
- Perplexity GPU Kernels ☆272 Updated this week
- Extensible collectives library in Triton ☆85 Updated last month
- ☆186 Updated 7 months ago
- LLM Serving Performance Evaluation Harness ☆77 Updated 2 months ago
- ☆34 Updated 4 months ago
- A framework for PyTorch to enable fault management for collective communication libraries (CCL) such as NCCL ☆19 Updated last week
- Applied AI experiments and examples for PyTorch ☆262 Updated last week
- The driver for LMCache core to run in vLLM ☆38 Updated 3 months ago
- ☆117 Updated last year
- ☆70 Updated 4 months ago
- ☆53 Updated 7 months ago
- ☆304 Updated 8 months ago
- Module, Model, and Tensor Serialization/Deserialization ☆225 Updated 2 months ago
- Efficient and easy multi-instance LLM serving ☆398 Updated this week
- Fast low-bit matmul kernels in Triton ☆295 Updated this week
- Cloud Native Benchmarking of Foundation Models ☆32 Updated 5 months ago
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆360 Updated 2 weeks ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆73 Updated 8 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆224 Updated 9 months ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆157 Updated 4 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆120 Updated this week
- A low-latency & high-throughput serving engine for LLMs ☆351 Updated 2 weeks ago