justinchuby / onnx-safetensors
Use safetensors with ONNX 🤗
★76 · Updated 2 months ago
Alternatives and similar repositories for onnx-safetensors
Users interested in onnx-safetensors are comparing it to the libraries listed below.
- Model compression for ONNX ★99 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ★214 · Updated this week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ★412 · Updated this week
- Thin wrapper around GGML to make life easier ★40 · Updated last month
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ★300 · Updated last year
- Python bindings for ggml ★146 · Updated last year
- Visualize ONNX models with model-explorer ★64 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs ★267 · Updated last year
- onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime ★430 · Updated this week
- 🤗 Optimum ONNX: Export your model to ONNX and run inference with ONNX Runtime ★95 · Updated last week
- No-code CLI designed for accelerating ONNX workflows ★219 · Updated 5 months ago
- AI Edge Quantizer: flexible post-training quantization for LiteRT models ★81 · Updated 3 weeks ago
- AMD-related optimizations for transformer models ★96 · Updated last month
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ★515 · Updated this week
- A toolkit to help optimize ONNX models ★267 · Updated last week
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, roc wmma), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA en… ★48 · Updated last year
- Common utilities for ONNX converters ★288 · Updated 3 months ago
- An innovative library for efficient LLM inference via low-bit quantization ★350 · Updated last year
- The Triton backend for the ONNX Runtime ★168 · Updated last week
- Build compute kernels ★192 · Updated this week
- Python package of rocm-smi-lib ★24 · Updated last week
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ★72 · Updated 3 weeks ago
- PyTorch half-precision GEMM library with fused optional bias + optional ReLU/GELU ★76 · Updated last year
- Module, Model, and Tensor Serialization/Deserialization ★277 · Updated 3 months ago
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and easy export to ONNX/ONNX Runtime ★183 · Updated 8 months ago
- ★170 · Updated 3 weeks ago
- 🤗 Optimum ExecuTorch ★88 · Updated this week
- Development repository for the Triton language and compiler ★137 · Updated this week
- ★76 · Updated 11 months ago
- GGUF parser in Python