huggingface / safetensors
Simple, safe way to store and distribute tensors
★3,345 · Updated last week
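As a quick orientation before the comparison list, here is a minimal sketch of what safetensors provides, assuming `torch` and `safetensors` are installed; the tensor names and shapes are illustrative, not from the library's docs.

```python
# Minimal sketch: serialize and reload tensors with safetensors.
# Unlike pickle-based checkpoints, loading cannot execute arbitrary code.
import torch
from safetensors.torch import save_file, load_file

tensors = {
    "embedding.weight": torch.zeros((1024, 768)),  # illustrative names/shapes
    "lm_head.weight": torch.zeros((768, 1024)),
}

save_file(tensors, "model.safetensors")   # flat, mmap-friendly on-disk format

loaded = load_file("model.safetensors")   # returns a dict of torch.Tensors
print(loaded["embedding.weight"].shape)   # torch.Size([1024, 768])
```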
Alternatives and similar repositories for safetensors
Users interested in safetensors are comparing it to the libraries listed below.
- huggingface/optimum: 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… ★2,977 · Updated this week
- bitsandbytes-foundation/bitsandbytes: Accessible large language models via k-bit quantization for PyTorch (see the quantized-loading sketch after this list). ★7,212 · Updated this week
- facebookresearch/xformers: Hackable and optimized Transformers building blocks, supporting a composable construction. ★9,696 · Updated this week
- huggingface/text-generation-inference: Large Language Model Text Generation Inference ★10,311 · Updated this week
- huggingface/accelerate: 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★8,914 · Updated this week
- pytorch/ao: PyTorch native quantization and sparsity for training and inference ★2,168 · Updated this week
- marella/ctransformers: Python bindings for the Transformer models implemented in C/C++ using GGML library. ★1,866 · Updated last year
- pytorch/torchtune: PyTorch native post-training library ★5,323 · Updated this week
- huggingface/text-embeddings-inference: A blazing fast inference solution for text embeddings models ★3,781 · Updated this week
- microsoft/DeepSpeed-MII: MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ★2,031 · Updated 2 weeks ago
- AutoGPTQ/AutoGPTQ: An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ★4,891 · Updated 3 months ago
- turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs ★4,228 · Updated this week
- turboderp/exllama: A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ★2,886 · Updated last year
- huggingface/evaluate: 🤗 Evaluate: A library for easily evaluating machine learning models and datasets. ★2,254 · Updated 3 weeks ago
- casper-hansen/AutoAWQ: AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. ★2,206 · Updated 2 months ago
- NVIDIA/FasterTransformer: Transformer-related optimization, including BERT, GPT ★6,238 · Updated last year
- facebookincubator/AITemplate: AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ★4,655 · Updated 3 months ago
- arcee-ai/mergekit: Tools for merging pretrained large language models. ★6,016 · Updated 3 weeks ago
- OpenNMT/CTranslate2: Fast inference engine for Transformer models ★3,902 · Updated 3 months ago
- tairov/llama2.mojo: Inference Llama 2 in one file of pure 🔥 ★2,115 · Updated last year
- NVIDIA/TransformerEngine: A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ★2,548 · Updated this week
- ELS-RD/kernl: Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ★1,575 · Updated last year
- microsoft/Olive: Simplify ML model finetuning, conversion, quantization, and optimization for CPUs, GPUs and NPUs. ★1,996 · Updated this week
- Dao-AILab/flash-attention: Fast and memory-efficient exact attention ★18,252 · Updated this week
- intel/intel-extension-for-transformers: ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ★2,167 · Updated 9 months ago
- huggingface/tokenizers: 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production (see the tokenizer sketch after this list). ★9,880 · Updated last week
- IST-DASLab/gptq: Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ★2,139 · Updated last year
- triton-inference-server/pytriton: PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. ★806 · Updated last week
- mit-han-lab/llm-awq: [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ★3,140 · Updated this week
- facebookresearch/fairscale: PyTorch extensions for high performance and large scale training. ★3,337 · Updated 2 months ago
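For the bitsandbytes entry above, the most common entry point is k-bit quantized model loading through 🤗 Transformers. A hedged sketch, assuming `transformers`, `bitsandbytes`, and a CUDA GPU are available; the checkpoint id is purely illustrative.

```python
# Sketch: load a causal LM with 4-bit NF4 weights via bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit NF4
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16 after dequant
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",                    # illustrative checkpoint id
    quantization_config=quant_config,
    device_map="auto",                      # place layers on available devices
)
```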
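And for the tokenizers entry, a minimal sketch of the Rust-backed Python API; the checkpoint name is illustrative.

```python
# Sketch: load a pretrained fast tokenizer and encode a string.
from tokenizers import Tokenizer

tok = Tokenizer.from_pretrained("bert-base-uncased")  # illustrative checkpoint
enc = tok.encode("Simple, safe way to store and distribute tensors")
print(enc.tokens)  # subword tokens, e.g. ['[CLS]', 'simple', ',', ...]
print(enc.ids)     # corresponding vocabulary ids
```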