justinchuby / onnx-safetensors
Use safetensors with ONNX 🤗
☆73 · Updated last month
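For context, the project stores ONNX model weights in the safetensors format and loads them back onto a model. A minimal sketch of typical usage, assuming the `load_file`/`save_file` entry points the project's README documents (verify names and signatures against the current release):

```python
import onnx
import onnx_safetensors

# Load an existing ONNX model (graph structure plus initializers).
model = onnx.load("model.onnx")

# Save the model's weights (initializers) to a safetensors file.
# Assumed API per the project's README; check the repo before relying on it.
onnx_safetensors.save_file(model, "model.safetensors")

# Later, apply weights from the safetensors file back onto the model in place.
onnx_safetensors.load_file(model, "model.safetensors")
```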
Alternatives and similar repositories for onnx-safetensors
Users interested in onnx-safetensors are comparing it to the libraries listed below.
- Model compression for ONNX ☆98 · Updated last year
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python ☆408 · Updated this week
- Python bindings for ggml ☆146 · Updated last year
- Thin wrapper around GGML to make life easier ☆40 · Updated 2 weeks ago
- Visualize ONNX models with model-explorer ☆63 · Updated last month
- onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime ☆424 · Updated this week
- 🤗 Optimum ONNX: Export your model to ONNX and run inference with ONNX Runtime ☆83 · Updated last week
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆298 · Updated last year
- Common utilities for ONNX converters ☆284 · Updated 2 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆204 · Updated last week
- AI Edge Quantizer: flexible post-training quantization for LiteRT models ☆76 · Updated this week
- 🤗 Optimum ExecuTorch ☆80 · Updated last week
- AMD-related optimizations for transformer models ☆95 · Updated last month
- No-code CLI designed for accelerating ONNX workflows ☆216 · Updated 5 months ago
- 👷 Build compute kernels ☆178 · Updated this week
- The Triton backend for ONNX Runtime ☆166 · Updated last week
- ☆77 · Updated 10 months ago
- A toolkit to help optimize ONNX models ☆236 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated last year
- GGUF parser in Python ☆28 · Updated last year
- TTS support with GGML ☆193 · Updated last month
- ☆18 · Updated 11 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated last year
- Advanced ultra-low-bitrate compression techniques for the LLaMA family of LLMs ☆110 · Updated last year
- PyTorch half-precision GEMM library with fused optional bias and optional ReLU/GELU ☆76 · Updated 11 months ago
- ONNX Runtime prebuilt wheels for Apple Silicon (M1 / M2 / M3 / ARM64) ☆227 · Updated last year
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆507 · Updated this week
- ☆169 · Updated last week
- A general 2–8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ and easy export to ONNX/ONNX Runtime ☆180 · Updated 7 months ago
- Module, Model, and Tensor Serialization/Deserialization ☆272 · Updated 2 months ago