justinchuby / onnx-safetensors
Use safetensors with ONNX 🤗
☆50 · Updated 3 weeks ago
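The safetensors format that this project brings to ONNX is simple enough to parse with the standard library alone: a little-endian u64 header length, a JSON header describing each tensor's dtype, shape, and byte offsets, then the raw tensor bytes. A minimal sketch of that layout (this is not the onnx-safetensors API; `read_safetensors_header` is a hypothetical helper written against the published safetensors file-format spec):

```python
import json
import struct

def read_safetensors_header(data: bytes) -> dict:
    """Parse the JSON header of a safetensors byte buffer.

    Per the safetensors format spec: the file starts with an unsigned
    64-bit little-endian length N, followed by N bytes of JSON mapping
    tensor names to their dtype, shape, and data offsets.
    """
    (header_len,) = struct.unpack_from("<Q", data, 0)
    return json.loads(data[8 : 8 + header_len].decode("utf-8"))

# Build a minimal in-memory safetensors buffer for demonstration:
# one float32 tensor "w" of shape [2], stored as 8 raw bytes.
meta = {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
header_bytes = json.dumps(meta).encode("utf-8")
buf = struct.pack("<Q", len(header_bytes)) + header_bytes + struct.pack("<2f", 1.0, 2.0)

print(read_safetensors_header(buf)["w"]["shape"])  # [2]
```

Because the header is plain JSON at a fixed offset, tensor metadata can be inspected without loading the weights themselves, which is what makes the format attractive for lazy or partial loading alongside an ONNX graph.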
Alternatives and similar repositories for onnx-safetensors:
Users interested in onnx-safetensors are comparing it to the libraries listed below.
- Model compression for ONNX · ☆88 · Updated 4 months ago
- Python bindings for ggml · ☆140 · Updated 7 months ago
- A toolkit to help optimize ONNX models · ☆129 · Updated this week
- LLM SDK for OnnxRuntime GenAI (OGA) · ☆119 · Updated this week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. · ☆329 · Updated this week
- Common utilities for ONNX converters · ☆261 · Updated 4 months ago
- PyTorch half-precision GEMM lib with fused optional bias + optional ReLU/GELU · ☆57 · Updated 4 months ago
- The Triton backend for TensorRT · ☆70 · Updated 3 weeks ago
- A safetensors extension to efficiently store sparse quantized tensors on disk · ☆92 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆262 · Updated 5 months ago
- onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime · ☆371 · Updated this week
- AMD-related optimizations for transformer models · ☆72 · Updated 4 months ago
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, with easy export to ONNX/ONNX Runtime · ☆165 · Updated 3 weeks ago
- The Triton backend for the ONNX Runtime · ☆140 · Updated 2 weeks ago
- ☆124 · Updated last year
- Load compute kernels from the Hub · ☆107 · Updated last week
- Notes and artifacts from the ONNX steering committee · ☆25 · Updated this week
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, roc wmma), mainly used for Stable Diffusion (ComfyUI) on Windows ZLUDA en… · ☆37 · Updated 7 months ago
- ☆63 · Updated last week
- Module, Model, and Tensor Serialization/Deserialization · ☆220 · Updated last month
- ☆176 · Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) · ☆40 · Updated 2 weeks ago
- 👷 Build compute kernels · ☆24 · Updated this week
- Generative AI extensions for onnxruntime · ☆667 · Updated this week
- [WIP] Better (FP8) attention for Hopper · ☆26 · Updated last month
- (WIP) Parallel inference for black-forest-labs' FLUX model · ☆18 · Updated 4 months ago
- GPU benchmark · ☆57 · Updated 2 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs · ☆87 · Updated this week
- GGUF parser in Python · ☆26 · Updated 7 months ago
- Fast low-bit matmul kernels in Triton · ☆275 · Updated this week
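As a taste of what the "GGUF parser in Python" entry above involves: per the GGUF specification, a file opens with a fixed-size header that pure stdlib code can read. A minimal sketch (`read_gguf_header` is a hypothetical helper written against the spec, not that repository's API):

```python
import struct

GGUF_MAGIC = b"GGUF"

def read_gguf_header(data: bytes) -> dict:
    """Read the fixed-size GGUF header from a byte buffer.

    Per the GGUF spec, a file begins with the 4-byte magic "GGUF",
    then a uint32 version, a uint64 tensor count, and a uint64
    metadata key/value count, all little-endian.
    """
    magic = data[:4]
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={magic!r}")
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {"version": version, "tensor_count": tensor_count, "kv_count": kv_count}

# Synthetic header: version 3, 2 tensors, 5 metadata key/value pairs.
blob = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
print(read_gguf_header(blob))  # {'version': 3, 'tensor_count': 2, 'kv_count': 5}
```

The metadata key/value section and tensor-info records that follow the header are where a full parser spends its effort; the header alone is enough to identify a GGUF file and pick the right version of the parsing logic.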