justinchuby / onnx-safetensors
Use safetensors with ONNX 🤗
☆54 Updated last month
Alternatives and similar repositories for onnx-safetensors:
Users who are interested in onnx-safetensors are comparing it to the libraries listed below.
- Model compression for ONNX ☆91 Updated 5 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆100 Updated last week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆341 Updated this week
- Local LLM Server with NPU Acceleration ☆156 Updated last week
- OpenVINO Tokenizers extension ☆32 Updated this week
- AMD related optimizations for transformer models ☆75 Updated 5 months ago
- Python bindings for ggml ☆140 Updated 7 months ago
- The Triton backend for the ONNX Runtime. ☆140 Updated last week
- python package of rocm-smi-lib ☆20 Updated 7 months ago
- Common utilities for ONNX converters ☆266 Updated 4 months ago
- a simple Flash Attention v2 implementation with ROCM (RDNA3 GPU, roc wmma), mainly used for stable diffusion (ComfyUI) in Windows ZLUDA en… ☆41 Updated 8 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆262 Updated 6 months ago
- Visualize ONNX models with model-explorer ☆31 Updated last month
- [WIP] Better (FP8) attention for Hopper ☆30 Updated 2 months ago
- ☆68 Updated 3 weeks ago
- Fast low-bit matmul kernels in Triton ☆291 Updated this week
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily. ☆168 Updated 3 weeks ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆40 Updated last month
- ☆196 Updated 3 weeks ago
- A Toolkit to Help Optimize Onnx Model ☆140 Updated this week
- ☆14 Updated 4 months ago
- Load compute kernels from the Hub ☆115 Updated this week
- ☆29 Updated this week
- Development repository for the Triton language and compiler ☆118 Updated this week
- onnxruntime-extensions: A specialized pre- and post- processing library for ONNX Runtime ☆375 Updated this week
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆185 Updated this week
- Notes and artifacts from the ONNX steering committee ☆26 Updated last week
- Gpu benchmark ☆59 Updated 2 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆295 Updated 2 months ago
- Module, Model, and Tensor Serialization/Deserialization ☆223 Updated 2 months ago