huggingface / safetensors
Simple, safe way to store and distribute tensors
⭐ 3,311 · Updated this week
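The safetensors on-disk layout is deliberately simple: an 8-byte little-endian header length, a JSON header mapping each tensor name to its dtype, shape, and byte offsets, followed by the raw tensor bytes. A minimal stdlib-only sketch of that layout (a simplified illustration — real files should be written and read with the `safetensors` library itself, which also validates offsets and handles metadata):

```python
import json
import struct

def save_simple(tensors, path):
    """Write a simplified safetensors-style file.

    tensors: {name: (dtype_str, shape_list, raw_bytes)} — a toy input
    convention for this sketch, not the library's API.
    """
    header, chunks, offset = {}, [], 0
    for name, (dtype, shape, data) in tensors.items():
        header[name] = {
            "dtype": dtype,
            "shape": shape,
            "data_offsets": [offset, offset + len(data)],
        }
        chunks.append(data)
        offset += len(data)
    header_json = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_json)))  # 8-byte LE header size
        f.write(header_json)                          # JSON header
        f.write(b"".join(chunks))                     # contiguous tensor data

def load_simple(path):
    """Read the file back; returns ({name: raw_bytes}, header)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        buf = f.read()
    tensors = {
        name: buf[meta["data_offsets"][0]:meta["data_offsets"][1]]
        for name, meta in header.items()
    }
    return tensors, header
```

Because the header is plain JSON and offsets are explicit, a reader can inspect shapes and dtypes, or memory-map a single tensor, without deserializing arbitrary code — the safety property that distinguishes the format from pickle-based checkpoints.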
Alternatives and similar repositories for safetensors
Users interested in safetensors are comparing it to the libraries listed below.
- Accessible large language models via k-bit quantization for PyTorch. ⭐ 7,142 · Updated this week
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… ⭐ 2,942 · Updated this week
- Hackable and optimized Transformers building blocks, supporting a composable construction. ⭐ 9,591 · Updated last week
- An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm. ⭐ 4,873 · Updated 2 months ago
- Large Language Model Text Generation Inference ⭐ 10,236 · Updated this week
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ⭐ 2,020 · Updated 2 months ago
- Transformer related optimization, including BERT, GPT ⭐ 6,211 · Updated last year
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ⭐ 2,128 · Updated last year
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ⭐ 4,649 · Updated 2 months ago
- Fast inference engine for Transformer models ⭐ 3,856 · Updated 2 months ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ⭐ 2,193 · Updated last month
- A fast inference library for running LLMs locally on modern consumer-class GPUs ⭐ 4,213 · Updated 2 weeks ago
- Fast and memory-efficient exact attention ⭐ 17,846 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ⭐ 8,839 · Updated this week
- TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizati… ⭐ 10,734 · Updated this week
- A machine learning compiler for GPUs, CPUs, and ML accelerators ⭐ 3,280 · Updated this week
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ⭐ 2,882 · Updated last year
- PyTorch native quantization and sparsity for training and inference ⭐ 2,114 · Updated this week
- Tensor library for machine learning ⭐ 12,697 · Updated last week
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ⭐ 2,169 · Updated 8 months ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ⭐ 3,081 · Updated last week
- 4 bits quantization of LLaMA using GPTQ ⭐ 3,057 · Updated 11 months ago
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ⭐ 2,491 · Updated this week
- A pytorch quantization backend for optimum ⭐ 950 · Updated 3 weeks ago
- PyTorch extensions for high performance and large scale training. ⭐ 3,331 · Updated last month
- Development repository for the Triton language and compiler ⭐ 15,881 · Updated this week
- [Unmaintained, see README] An ecosystem of Rust libraries for working with large language models ⭐ 6,117 · Updated 11 months ago
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ⭐ 4,667 · Updated last year
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ⭐ 5,993 · Updated 2 months ago