huggingface / safetensors
Simple, safe way to store and distribute tensors
⭐3,557 · Updated this week
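For context, here is a minimal sketch of what the library does, using its PyTorch bindings (`safetensors.torch.save_file` and `load_file` are the documented entry points; the tensor names and file path below are illustrative):

```python
# Minimal safetensors round trip with the PyTorch API.
import torch
from safetensors.torch import save_file, load_file

tensors = {
    "embedding.weight": torch.zeros((1024, 768)),
    "lm_head.weight": torch.zeros((768, 1024)),
}

# Serialize as a small JSON header plus raw tensor bytes; unlike
# pickle-based .pt checkpoints, loading never executes arbitrary code.
save_file(tensors, "model.safetensors")

# Loading is safe by construction and zero-copy friendly.
restored = load_file("model.safetensors")
assert restored["embedding.weight"].shape == (1024, 768)
```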
Alternatives and similar repositories for safetensors
Users interested in safetensors are comparing it to the libraries listed below.
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… ⭐3,215 · Updated 2 weeks ago
- Accessible large language models via k-bit quantization for PyTorch (see the quantization sketch after this list). ⭐7,845 · Updated last week
- Hackable and optimized Transformers building blocks, supporting a composable construction. ⭐10,201 · Updated last week
- AITemplate is a Python framework which renders neural networks into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ⭐4,695 · Updated last week
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ⭐2,083 · Updated 5 months ago
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ⭐9,398 · Updated this week
- Large Language Model Text Generation Inference ⭐10,709 · Updated last week
- PyTorch native quantization and sparsity for training and inference ⭐2,576 · Updated last week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… ⭐3,007 · Updated last week
- PyTorch native post-training library ⭐5,619 · Updated this week
- ⭐2,920 · Updated last week
- A machine learning compiler for GPUs, CPUs, and ML accelerators ⭐3,819 · Updated this week
- PyTorch extensions for high performance and large scale training. ⭐3,390 · Updated 7 months ago
- Transformer-related optimization, including BERT and GPT ⭐6,370 · Updated last year
- A blazing fast inference solution for text embeddings models ⭐4,321 · Updated this week
- PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily wri… ⭐1,431 · Updated this week
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ⭐3,389 · Updated 5 months ago
- Development repository for the Triton language and compiler ⭐17,861 · Updated last week
- Inference Llama 2 in one file of pure 🔥 ⭐2,115 · Updated 3 weeks ago
- Training and serving large-scale neural networks with auto parallelization. ⭐3,171 · Updated 2 years ago
- A PyTorch native platform for training generative AI models ⭐4,866 · Updated this week
- Python bindings for the Transformer models implemented in C/C++ using GGML library. ⭐1,877 · Updated last year
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ⭐2,169 · Updated last year
- Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs. ⭐2,206 · Updated last week
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ⭐5,007 · Updated 8 months ago
- Fast inference engine for Transformer models ⭐4,202 · Updated this week
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ⭐1,584 · Updated last year
- A fast llama2 decoder in pure Rust. ⭐1,056 · Updated 2 years ago
- Minimalistic large language model 3D-parallelism training ⭐2,365 · Updated last week
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. ⭐834 · Updated 4 months ago
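To make the comparison concrete, the sketch below shows the k-bit loading workflow that bitsandbytes (second item above) enables through its 🤗 Transformers integration. `BitsAndBytesConfig` and `load_in_8bit` are real options of that integration; the model id is only an example, and a CUDA device plus `pip install transformers accelerate bitsandbytes` are assumed:

```python
# Illustrative example: load a causal LM with int8 linear layers via
# the transformers + bitsandbytes integration (CUDA GPU assumed).
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_8bit=True)  # quantize Linear weights to int8

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",               # example checkpoint, not prescriptive
    quantization_config=quant_config,
    device_map="auto",                 # let accelerate place weights on devices
)
print(model.get_memory_footprint())    # noticeably smaller than the fp16 footprint
```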