justinchuby / onnx-safetensors
Use safetensors with ONNX 🤗
☆67 · Updated 2 weeks ago
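The project's core idea is to keep the ONNX graph in its protobuf file while storing the weights in a separate safetensors file. The sketch below illustrates that idea using the plain `onnx` and `safetensors` packages rather than the onnx-safetensors API itself; the file names are placeholders, so check the repository README for the library's actual helper functions.

```python
# Minimal sketch (not the onnx-safetensors API): export the initializers of an
# ONNX model into a .safetensors file with the onnx and safetensors packages,
# which is the kind of workflow the library streamlines.
import onnx
from onnx import numpy_helper
from safetensors.numpy import save_file

# Hypothetical file names, for illustration only.
model = onnx.load("model.onnx")

# Collect every initializer (weight tensor) as a NumPy array keyed by name.
weights = {
    init.name: numpy_helper.to_array(init)
    for init in model.graph.initializer
}

# Write the weights to a safetensors file alongside the graph definition.
save_file(weights, "model.safetensors")
```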
Alternatives and similar repositories for onnx-safetensors
Users interested in onnx-safetensors are comparing it to the libraries listed below
- Model compression for ONNX ☆96 · Updated 7 months ago
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆362 · Updated this week
- Thin wrapper around GGML to make life easier ☆36 · Updated 3 weeks ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆288 · Updated last year
- Python bindings for ggml ☆142 · Updated 10 months ago
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ☆53 · Updated this week
- A toolkit to help optimize ONNX models ☆174 · Updated last week
- onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime ☆399 · Updated this week
- 🤗 Optimum ExecuTorch ☆54 · Updated last week
- No-code CLI designed for accelerating ONNX workflows ☆201 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆264 · Updated 9 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆135 · Updated this week
- TTS support with GGML ☆127 · Updated 2 weeks ago
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated 10 months ago
- The Triton backend for the ONNX Runtime. ☆155 · Updated last week
- AMD-related optimizations for transformer models ☆80 · Updated 3 weeks ago
- Visualize ONNX models with model-explorer ☆36 · Updated last month
- OpenVINO Tokenizers extension ☆37 · Updated last week
- 👷 Build compute kernels ☆74 · Updated last week
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- Common utilities for ONNX converters ☆274 · Updated 2 weeks ago
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆477 · Updated this week
- Module, Model, and Tensor Serialization/Deserialization ☆248 · Updated this week
- A user-friendly toolchain that enables the seamless execution of ONNX models using JAX as the backend. ☆115 · Updated 3 weeks ago
- GGML implementation of BERT model with Python bindings and quantization. ☆55 · Updated last year
- GGUF parser in Python ☆28 · Updated 11 months ago
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆157 · Updated 2 months ago
- PyTorch half-precision GEMM lib with fused optional bias and optional ReLU/GELU ☆72 · Updated 7 months ago
- Profile your CoreML models directly from Python 🐍 ☆28 · Updated 9 months ago
- Google TPU optimizations for transformers models