justinchuby / onnx-safetensors
Use safetensors with ONNX 🤗
⭐ 69 · Updated last month
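For context on what "use safetensors with ONNX" means in practice, below is a minimal sketch of the underlying idea, written against the plain `onnx` and `safetensors` packages rather than the onnx-safetensors API itself; the file paths are placeholders.

```python
# A minimal sketch of the idea (not the onnx-safetensors API): dump a model's
# initializers into a .safetensors file, then load them back into the graph.
import onnx
from onnx import numpy_helper
from safetensors.numpy import load_file, save_file

MODEL_PATH = "model.onnx"              # placeholder paths
WEIGHTS_PATH = "weights.safetensors"

# Save: collect every initializer as a numpy array keyed by tensor name.
model = onnx.load(MODEL_PATH)
weights = {init.name: numpy_helper.to_array(init) for init in model.graph.initializer}
save_file(weights, WEIGHTS_PATH)

# Load: read the tensors back and overwrite the matching initializers in place.
loaded = load_file(WEIGHTS_PATH)
for init in model.graph.initializer:
    if init.name in loaded:
        init.CopyFrom(numpy_helper.from_array(loaded[init.name], name=init.name))
onnx.save(model, "model_roundtrip.onnx")
```

The library itself packages this kind of round trip behind a small API, so in practice you would reach for it rather than hand-rolling the loop above.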
Alternatives and similar repositories for onnx-safetensors
Users interested in onnx-safetensors are comparing it to the libraries listed below.
- Model compression for ONNX · ⭐ 97 · Updated 8 months ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml · ⭐ 289 · Updated last year
- Python bindings for ggml · ⭐ 143 · Updated 11 months ago
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python; see the sketch after this list. · ⭐ 369 · Updated this week
- onnxruntime-extensions: a specialized pre- and post-processing library for ONNX Runtime · ⭐ 405 · Updated this week
- Visualize ONNX models with model-explorer · ⭐ 39 · Updated 2 months ago
- Thin wrapper around GGML to make life easier · ⭐ 40 · Updated last month
- A safetensors extension to efficiently store sparse quantized tensors on disk · ⭐ 142 · Updated this week
- No-code CLI designed for accelerating ONNX workflows · ⭐ 207 · Updated last month
- A toolkit to help optimize ONNX models · ⭐ 189 · Updated this week
- Common utilities for ONNX converters · ⭐ 276 · Updated 3 weeks ago
- AI Edge Quantizer: flexible post-training quantization for LiteRT models · ⭐ 56 · Updated 2 weeks ago
- 🤗 Optimum ExecuTorch · ⭐ 58 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ⭐ 266 · Updated 9 months ago
- 🤗 Optimum Intel: accelerate inference with Intel optimization tools · ⭐ 481 · Updated this week
- High-performance SGEMM on CUDA devices · ⭐ 98 · Updated 6 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) · ⭐ 43 · Updated 4 months ago
- ⭐ 17 · Updated 8 months ago
- AMD-related optimizations for transformer models · ⭐ 81 · Updated last month
- PyTorch half-precision GEMM library with fused optional bias + optional ReLU/GELU · ⭐ 72 · Updated 8 months ago
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, roc wmma), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA en… · ⭐ 44 · Updated 11 months ago
- An innovative library for efficient LLM inference via low-bit quantization · ⭐ 349 · Updated 11 months ago
- Common source, scripts, and utilities shared across all Triton repositories · ⭐ 75 · Updated this week
- Triton CLI is an open-source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… · ⭐ 66 · Updated this week
- Module, Model, and Tensor Serialization/Deserialization · ⭐ 250 · Updated this week
- ⭐ 73 · Updated 7 months ago
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ and easy export to ONNX/ONNX Runtime · ⭐ 175 · Updated 4 months ago
- TTS support with GGML · ⭐ 139 · Updated 2 weeks ago
- OpenVINO Tokenizers extension · ⭐ 38 · Updated last week
- A user-friendly toolchain that enables seamless execution of ONNX models using JAX as the backend · ⭐ 118 · Updated last week
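For the ONNX Script entry above, here is a minimal sketch of the authoring style it describes, assuming the public API shown in the ONNX Script README (the `script` decorator, `FLOAT` tensor annotations, and a versioned opset module); the opset version and the export call are assumptions and may differ in your install.

```python
# A minimal sketch, assuming the ONNX Script API from its README; the opset
# version here is an assumption, use whichever your onnxscript install provides.
from onnxscript import FLOAT, script
from onnxscript import opset18 as op


@script()
def add_relu(x: FLOAT[...], y: FLOAT[...]) -> FLOAT[...]:
    # Each op.* call is translated into one node of the resulting ONNX graph.
    return op.Relu(op.Add(x, y))


# The decorated function can then be exported as a regular ONNX model proto.
model_proto = add_relu.to_model_proto()
```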