justinchuby / onnx-safetensors
Use safetensors with ONNX 🤗
⭐57 · Updated 2 months ago
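onnx-safetensors stores ONNX model weights as safetensors files. As a rough illustration of the idea only (not the package's own API), the sketch below extracts a model's initializers with the plain `onnx` package and writes them with `safetensors`; the file paths are placeholders.

```python
# Illustrative sketch: pairing safetensors with ONNX weights using the plain
# `onnx` and `safetensors` packages. This is NOT the onnx-safetensors API;
# "model.onnx" and "model.safetensors" are placeholder paths.
import onnx
from onnx import numpy_helper
from safetensors.numpy import save_file

# Load an ONNX model and collect its initializers (the trained weights).
model = onnx.load("model.onnx")
weights = {init.name: numpy_helper.to_array(init) for init in model.graph.initializer}

# Store the weights in a single safetensors file for loading elsewhere.
save_file(weights, "model.safetensors")
```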
Alternatives and similar repositories for onnx-safetensors
Users interested in onnx-safetensors are comparing it to the libraries listed below.
- Model compression for ONNX · ⭐92 · Updated 5 months ago
- Python bindings for ggml · ⭐140 · Updated 8 months ago
- 👷 Build compute kernels · ⭐37 · Updated this week
- Common utilities for ONNX converters · ⭐268 · Updated 5 months ago
- Visualize ONNX models with model-explorer · ⭐33 · Updated 2 months ago
- onnxruntime-extensions: A specialized pre- and post- processing library for ONNX Runtime · ⭐385 · Updated this week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. · ⭐349 · Updated this week
- The Triton backend for the ONNX Runtime. · ⭐144 · Updated last week
- Experiments with BitNet inference on CPU · ⭐54 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk · ⭐109 · Updated this week
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily. · ⭐169 · Updated last month
- Load compute kernels from the Hub · ⭐116 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ⭐263 · Updated 7 months ago
- Module, Model, and Tensor Serialization/Deserialization · ⭐227 · Updated this week
- ⭐14 · Updated 5 months ago
- Fast low-bit matmul kernels in Triton · ⭐299 · Updated this week
- Rust crate for some audio utilities · ⭐23 · Updated 2 months ago
- 🤗 Optimum ExecuTorch · ⭐38 · Updated last week
- ⭐207 · Updated last week
- Local LLM Server with NPU Acceleration · ⭐180 · Updated last week
- ⭐68 · Updated 4 months ago
- [WIP] Better (FP8) attention for Hopper · ⭐30 · Updated 2 months ago
- Common source, scripts and utilities shared across all Triton repositories. · ⭐71 · Updated 2 weeks ago
- a simple Flash Attention v2 implementation with ROCM (RDNA3 GPU, roc wmma), mainly used for stable diffusion(ComfyUI) in Windows ZLUDA en… · ⭐41 · Updated 8 months ago
- Thin wrapper around GGML to make life easier · ⭐29 · Updated this week
- AMD related optimizations for transformer models · ⭐75 · Updated 6 months ago
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… · ⭐62 · Updated 2 weeks ago
- MLX support for the Open Neural Network Exchange (ONNX) · ⭐48 · Updated last year
- This repository contains the experimental PyTorch native float8 training UX · ⭐224 · Updated 9 months ago
- ⭐73 · Updated 5 months ago