justinchuby / onnx-safetensors
Use safetensors with ONNX 🤗
☆50 · Updated 3 weeks ago
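The safetensors format that this project brings to ONNX is simple enough to parse with the standard library alone: a little-endian u64 header length, a JSON header describing each tensor's dtype, shape, and byte offsets, then the raw tensor bytes. A minimal sketch of that layout (this is not the onnx-safetensors API; `read_safetensors_header` is a hypothetical helper written against the published safetensors file-format spec):

```python
import json
import struct

def read_safetensors_header(data: bytes) -> dict:
    """Parse the JSON header of a safetensors byte buffer.

    Per the safetensors format spec: the file starts with an unsigned
    64-bit little-endian length N, followed by N bytes of JSON mapping
    tensor names to their dtype, shape, and data offsets.
    """
    (header_len,) = struct.unpack_from("<Q", data, 0)
    return json.loads(data[8 : 8 + header_len].decode("utf-8"))

# Build a minimal in-memory safetensors buffer for demonstration:
# one float32 tensor "w" of shape [2], stored as 8 raw bytes.
meta = {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
header_bytes = json.dumps(meta).encode("utf-8")
buf = struct.pack("<Q", len(header_bytes)) + header_bytes + struct.pack("<2f", 1.0, 2.0)

print(read_safetensors_header(buf)["w"]["shape"])  # [2]
```

Because the header is plain JSON at a fixed offset, tensor metadata can be inspected without loading the weights themselves, which is what makes the format attractive for lazy or partial loading alongside an ONNX graph.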
Alternatives and similar repositories for onnx-safetensors:
Users interested in onnx-safetensors are comparing it to the libraries listed below.
- Model compression for ONNX · ☆88 · Updated 4 months ago
- Python bindings for ggml · ☆140 · Updated 7 months ago
- A toolkit to help optimize ONNX models · ☆129 · Updated this week
- LLM SDK for OnnxRuntime GenAI (OGA) · ☆119 · Updated this week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. · ☆329 · Updated this week
- Common utilities for ONNX converters · ☆261 · Updated 4 months ago
- PyTorch half-precision GEMM lib with fused optional bias + optional ReLU/GELU · ☆57 · Updated 4 months ago
- The Triton backend for TensorRT · ☆70 · Updated 3 weeks ago
- A safetensors extension to efficiently store sparse quantized tensors on disk · ☆92 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆262 · Updated 5 months ago
- onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime · ☆371 · Updated this week
- AMD-related optimizations for transformer models · ☆72 · Updated 4 months ago
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, with easy export to ONNX/ONNX Runtime · ☆165 · Updated 3 weeks ago
- The Triton backend for the ONNX Runtime · ☆140 · Updated 2 weeks ago
- ☆124 · Updated last year
- Load compute kernels from the Hub · ☆107 · Updated last week
- Notes and artifacts from the ONNX steering committee · ☆25 · Updated this week
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, roc wmma), mainly used for Stable Diffusion (ComfyUI) on Windows ZLUDA en… · ☆37 · Updated 7 months ago
- ☆63 · Updated last week
- Module, Model, and Tensor Serialization/Deserialization · ☆220 · Updated last month
- ☆176 · Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) · ☆40 · Updated 2 weeks ago
- 👷 Build compute kernels · ☆24 · Updated this week
- Generative AI extensions for onnxruntime · ☆667 · Updated this week
- [WIP] Better (FP8) attention for Hopper · ☆26 · Updated last month
- (WIP) Parallel inference for black-forest-labs' FLUX model · ☆18 · Updated 4 months ago
- GPU benchmark · ☆57 · Updated 2 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs · ☆87 · Updated this week
- GGUF parser in Python · ☆26 · Updated 7 months ago
- Fast low-bit matmul kernels in Triton · ☆275 · Updated this week
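As a taste of what the "GGUF parser in Python" entry above involves: per the GGUF specification, a file opens with a fixed-size header that pure stdlib code can read. A minimal sketch (`read_gguf_header` is a hypothetical helper written against the spec, not that repository's API):

```python
import struct

GGUF_MAGIC = b"GGUF"

def read_gguf_header(data: bytes) -> dict:
    """Read the fixed-size GGUF header from a byte buffer.

    Per the GGUF spec, a file begins with the 4-byte magic "GGUF",
    then a uint32 version, a uint64 tensor count, and a uint64
    metadata key/value count, all little-endian.
    """
    magic = data[:4]
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={magic!r}")
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {"version": version, "tensor_count": tensor_count, "kv_count": kv_count}

# Synthetic header: version 3, 2 tensors, 5 metadata key/value pairs.
blob = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
print(read_gguf_header(blob))  # {'version': 3, 'tensor_count': 2, 'kv_count': 5}
```

The metadata key/value section and tensor-info records that follow the header are where a full parser spends its effort; the header alone is enough to identify a GGUF file and pick the right version of the parsing logic.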