justinchuby / onnx-safetensors
Use safetensors with ONNX 🤗
☆81 · Updated last week
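onnx-safetensors lets you store and load ONNX model weights as safetensors files instead of embedding them in the .onnx protobuf. A minimal sketch of the round trip, assuming the package's `load_file`/`save_file` helpers (the exact signatures and the pip package name are assumptions, not the library's confirmed API):

```python
import onnx
import onnx_safetensors  # pip install onnx-safetensors (assumed package name)

# Load an existing ONNX model whose weights we want to externalize.
model = onnx.load("model.onnx")

# Assumed helper: write the model's initializers (weights) out to a
# .safetensors file so the .onnx file no longer has to carry them.
onnx_safetensors.save_file(model, "model.safetensors")

# Assumed helper: read the tensors back from disk and apply them
# to the model's initializers.
model = onnx_safetensors.load_file(model, "model.safetensors")
```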
Alternatives and similar repositories for onnx-safetensors
Users interested in onnx-safetensors are comparing it to the libraries listed below.
- Model compression for ONNX ☆99 · Updated last year
- Visualize ONNX models with model-explorer ☆66 · Updated last week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆418 · Updated this week
- Thin wrapper around GGML to make life easier ☆42 · Updated 2 months ago
- Python bindings for ggml ☆146 · Updated last year
- 🤗 Optimum ONNX: Export your model to ONNX and run inference with ONNX Runtime ☆112 · Updated 3 weeks ago
- No-code CLI designed for accelerating ONNX workflows ☆224 · Updated 7 months ago
- A toolkit to help optimize ONNX models ☆308 · Updated last week
- onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime ☆434 · Updated last month
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆233 · Updated last week
- Vision Transformer (ViT) inference in plain C/C++ with ggml ☆305 · Updated last year
- AMD-related optimizations for transformer models ☆96 · Updated 3 months ago
- GGUF parser in Python ☆28 · Updated last year
- The Triton backend for the ONNX Runtime ☆171 · Updated this week
- Common utilities for ONNX converters ☆291 · Updated last month
- 🤗 Optimum ExecuTorch ☆101 · Updated last week
- AI Edge Quantizer: flexible post-training quantization for LiteRT models ☆88 · Updated this week
- Efficient in-memory representation for ONNX, in Python ☆41 · Updated this week
- Module, Model, and Tensor Serialization/Deserialization ☆283 · Updated 4 months ago
- ☆70 · Updated 2 years ago
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆200 · Updated 3 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated last month
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆528 · Updated this week
- Python package of rocm-smi-lib ☆24 · Updated last month
- ☆171 · Updated 2 weeks ago
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ and easy export to onnx/onnx-runtime ☆184 · Updated 9 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆351 · Updated last year
- PyTorch half-precision GEMM lib with fused optional bias + optional relu/gelu ☆77 · Updated last year
- Notes and artifacts from the ONNX steering committee ☆28 · Updated this week
- vLLM adapter for a TGIS-compatible gRPC server ☆47 · Updated this week