huggingface / safetensorsLinks

Simple, safe way to store and distribute tensors

☆3,528

Alternatives and similar repositories for safetensors

Users that are interested in safetensors are comparing it to the libraries listed below

Sorting:

huggingface / optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization…
☆3,192Updated 2 weeks ago
bitsandbytes-foundation / bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
☆7,790Updated last week
facebookresearch / xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
☆10,131Updated 2 weeks ago
facebookincubator / AITemplate
AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N…
☆4,695Updated last month
pytorch / ao
PyTorch native quantization and sparsity for training and inference
☆2,543Updated this week
deepspeedai / DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
☆2,079Updated 5 months ago
Lightning-AI / lightning-thunder
PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily wri…
☆1,424Updated last week
openxla / xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
☆3,714Updated last week
ggml-org / ggml
Tensor library for machine learning
☆13,648Updated last week
NVIDIA / FasterTransformer
Transformer related optimization, including BERT, GPT
☆6,355Updated last year
huggingface / text-generation-inference
Large Language Model Text Generation Inference
☆10,684Updated 2 weeks ago
ELS-RD / kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab…
☆1,585Updated last year
facebookresearch / fairscale
PyTorch extensions for high performance and large scale training.
☆3,386Updated 7 months ago
huggingface / accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i…
☆9,329Updated this week
AutoGPTQ / AutoGPTQ
An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.
☆4,992Updated 7 months ago
casper-hansen / AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
☆2,276Updated 6 months ago
intel / intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl…
☆2,167Updated last year
turboderp-org / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
☆4,378Updated 3 months ago
turboderp / exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
☆2,905Updated 2 years ago
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆2,954Updated last week
microsoft / Olive
Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
☆2,196Updated this week
marella / ctransformers
Python bindings for the Transformer models implemented in C/C++ using GGML library.
☆1,877Updated last year
mit-han-lab / llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
☆3,362Updated 4 months ago
intel / intel-extension-for-pytorch
A Python package for extending the official PyTorch that can easily obtain performance on Intel platform
☆1,993Updated this week
OpenNMT / CTranslate2
Fast inference engine for Transformer models
☆4,166Updated this week
NVIDIA / TensorRT-LLM
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat…
☆12,258Updated this week
alpa-projects / alpa
Training and serving large-scale neural networks with auto parallelization.
☆3,167Updated last year
triton-lang / triton
Development repository for the Triton language and compiler
☆17,730Updated this week
huggingface / optimum-quanto
A pytorch quantization backend for optimum
☆1,011Updated last week
meta-pytorch / torchtune
PyTorch native post-training library
☆5,604Updated last week