justinchuby / onnx-safetensors
Use safetensors with ONNX 🤗
☆37 · Updated this week
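The repository's stated purpose is storing and loading ONNX model weights (initializers) as safetensors files. Below is a minimal sketch of that idea using plain `onnx` and `safetensors` directly; it is not onnx-safetensors' own API, and the file and tensor names are illustrative assumptions.

```python
# Sketch: replace an ONNX model's initializers with tensors loaded from a
# .safetensors file. File names and tensor names here are assumptions.
import onnx
from onnx import numpy_helper
from safetensors.numpy import load_file

model = onnx.load("model.onnx")              # ONNX graph whose initializers we want to swap
weights = load_file("weights.safetensors")   # dict[str, np.ndarray] keyed by tensor name

for init in model.graph.initializer:
    if init.name in weights:
        # Build a new TensorProto from the safetensors array and swap it in place.
        init.CopyFrom(numpy_helper.from_array(weights[init.name], name=init.name))

onnx.save(model, "model_with_weights.onnx")
```

onnx-safetensors wraps this kind of pattern behind convenience helpers; see the repository itself for the actual API.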
Alternatives and similar repositories for onnx-safetensors:
Users interested in onnx-safetensors are comparing it to the libraries listed below.
- ONNX Adapter for model-explorer ☆27 · Updated 5 months ago
- LLM SDK for OnnxRuntime GenAI (OGA) ☆90 · Updated this week
- OpenVINO Tokenizers extension ☆30 · Updated this week
- onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime ☆361 · Updated this week
- AMD-related optimizations for transformer models ☆68 · Updated 3 months ago
- Python bindings for ggml ☆140 · Updated 6 months ago
- Model compression for ONNX ☆86 · Updated 3 months ago
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆318 · Updated this week
- Notes and artifacts from the ONNX steering committee ☆25 · Updated last week
- Common utilities for ONNX converters ☆259 · Updated 3 months ago
- The Triton backend for the ONNX Runtime ☆139 · Updated this week
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ and easy export to ONNX/ONNX Runtime ☆160 · Updated 2 weeks ago
- Module, Model, and Tensor Serialization/Deserialization ☆214 · Updated last week
- LLaMa/RWKV ONNX models, quantization, and test cases ☆356 · Updated last year
- Development repository for the Triton language and compiler ☆108 · Updated this week
- Python package for rocm-smi-lib ☆20 · Updated 5 months ago
- High-Performance SGEMM on CUDA devices ☆79 · Updated last month
- The Triton backend for PyTorch TorchScript models ☆144 · Updated this week
- TORCH_LOGS parser for PT2 ☆33 · Updated this week
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) ☆173 · Updated this week
- Easy and Efficient Quantization for Transformers ☆192 · Updated 3 weeks ago
- Blazing-fast training of 🤗 Transformers on Graphcore IPUs ☆85 · Updated 11 months ago
- Google TPU optimizations for transformer models ☆100 · Updated last month
- This repository contains Dockerfiles, scripts, YAML files, Helm charts, etc. used to scale out AI containers with versions of TensorFlow … ☆35 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆260 · Updated 4 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆67 · Updated this week