diegofiori / benchmark-pytorch2.0-with-nebullvm
☆9 · Updated 2 years ago
Alternatives and similar repositories for benchmark-pytorch2.0-with-nebullvm
Users interested in benchmark-pytorch2.0-with-nebullvm are comparing it to the libraries listed below
- Model compression for ONNX ☆96 · Updated 7 months ago
- Various transformers for FSDP research ☆37 · Updated 2 years ago
- ☆32 · Updated 2 years ago
- A tool to convert a TensorRT engine/plan to a fake ONNX ☆39 · Updated 2 years ago
- NASRec Weight Sharing Neural Architecture Search for Recommender Systems ☆30 · Updated last year
- ONNX and TensorRT implementation of Whisper ☆63 · Updated 2 years ago
- ☆18 · Updated 2 weeks ago
- The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's python API. ☆135 · Updated 3 weeks ago
- ☆157 · Updated last year
- ☆18 · Updated 2 years ago
- Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT Text detection (Pytorch), included converter from Pytorch -> O… ☆33 · Updated 3 years ago
- The Triton backend for the ONNX Runtime. ☆153 · Updated last week
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆71 · Updated this week
- The Triton backend for the PyTorch TorchScript models. ☆152 · Updated last week
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆64 · Updated 2 weeks ago
- Article about deploying machine learning models using grpc, pytorch and asyncio ☆28 · Updated 2 years ago
- This code repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po… ☆92 · Updated last year
- ☆52 · Updated 4 years ago
- Count number of parameters / MACs / FLOPS for ONNX models. ☆93 · Updated 8 months ago
- ☆31 · Updated 2 years ago
- ☆17 · Updated 2 years ago
- The Triton backend for TensorRT. ☆77 · Updated last week
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ and easy export to onnx/onnx-runtime. ☆172 · Updated 2 months ago
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP ☆59 · Updated last month
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆17 · Updated last year
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆180 · Updated 2 weeks ago
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets. ☆158 · Updated last year
- Sample app code for deploying TAO Toolkit trained models to Triton ☆87 · Updated 9 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆264 · Updated 8 months ago
- Wanwu models release, code will be released soon ☆24 · Updated 2 years ago
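
For orientation, the parent repository's topic is benchmarking PyTorch 2.0 against nebullvm-optimized models. The sketch below illustrates the general shape of such a latency comparison using only stock PyTorch (eager mode vs. torch.compile); the model, input shape, and iteration counts are illustrative assumptions and are not taken from the repository's actual benchmark code.

```python
# Minimal sketch of a PyTorch 2.0 latency micro-benchmark (eager vs. torch.compile).
# Model choice, input shape, and iteration counts are assumptions for illustration only.
import time

import torch
import torchvision.models as models

model = models.resnet50().eval()          # randomly initialized weights; enough for timing
compiled = torch.compile(model)           # PyTorch 2.0 graph compilation
x = torch.randn(1, 3, 224, 224)

def bench(fn, warmup=5, iters=20):
    """Return average latency in seconds over `iters` runs after `warmup` runs."""
    with torch.no_grad():
        for _ in range(warmup):
            fn(x)
        start = time.perf_counter()
        for _ in range(iters):
            fn(x)
    return (time.perf_counter() - start) / iters

print(f"eager:    {bench(model) * 1e3:.2f} ms/iter")
print(f"compiled: {bench(compiled) * 1e3:.2f} ms/iter")
```

On GPU, a real benchmark would also synchronize with torch.cuda.synchronize() around the timed region so that asynchronous kernel launches are counted correctly.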