diegofiori / benchmark-pytorch2.0-with-nebullvm
☆9 · Updated 2 years ago
Alternatives and similar repositories for benchmark-pytorch2.0-with-nebullvm:
Users interested in benchmark-pytorch2.0-with-nebullvm are comparing it to the libraries listed below.
- Model compression for ONNX ☆81 · Updated 2 months ago
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆15 · Updated 8 months ago
- ☆28 · Updated last year
- ☆31 · Updated last year
- A project that optimizes Whisper for low-latency inference using NVIDIA TensorRT ☆69 · Updated 3 months ago
- A boilerplate to use multiprocessing for your gRPC server in your Python project ☆25 · Updated 3 years ago
- The Triton backend for TensorRT ☆68 · Updated this week
- ONNX and TensorRT implementation of Whisper ☆61 · Updated last year
- The Triton backend for the ONNX Runtime ☆136 · Updated this week
- ☆157 · Updated last year
- Comparing PyTorch, JIT and ONNX for inference with Transformers ☆17 · Updated 3 years ago
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP ☆41 · Updated 7 months ago
- A tool to convert a TensorRT engine/plan to a fake ONNX ☆37 · Updated 2 years ago
- Triton CLI is an open-source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆52 · Updated this week
- TensorRT acceleration for PyTorch native eager-mode quantization models ☆14 · Updated 5 months ago
- Article about deploying machine learning models using gRPC, PyTorch and asyncio ☆27 · Updated 2 years ago
- Various transformers for FSDP research ☆34 · Updated 2 years ago
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆57 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆257 · Updated 3 months ago
- Scailable ONNX Python tools ☆96 · Updated 2 months ago
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ and easy export to ONNX/ONNX Runtime ☆153 · Updated 3 months ago
- Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch); includes a converter from PyTorch -> O… ☆32 · Updated 3 years ago
- Simple and easy Stable Diffusion inference with LightningModule on GPU, CPU and MPS (possibly all devices supported by Lightning) ☆17 · Updated last year
- Large-scale distributed model training strategy with Colossal AI and Lightning AI ☆58 · Updated last year
- NASRec: weight-sharing neural architecture search for recommender systems ☆29 · Updated last year
- Docker scripts for building ONNX Runtime with TensorRT and OpenVINO in a manylinux environment ☆23 · Updated last year
- Easy and efficient quantization for Transformers ☆191 · Updated last month
- CLIP and SigLIP models optimized with TensorRT, with a Transformers-like API ☆19 · Updated 3 months ago
- A lightweight wrapper for PyTorch that provides a simple declarative API for context switching between devices, distributed modes, mixed-… ☆66 · Updated last year