diegofiori / benchmark-pytorch2.0-with-nebullvm
☆9 · Updated 2 years ago
Alternatives and similar repositories for benchmark-pytorch2.0-with-nebullvm:
Users interested in benchmark-pytorch2.0-with-nebullvm are comparing it to the libraries listed below.
- Model compression for ONNX ☆81 · Updated 2 months ago
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆15 · Updated 8 months ago
- ☆28 · Updated last year
- ☆31 · Updated last year
- A project that optimizes Whisper for low-latency inference using NVIDIA TensorRT ☆69 · Updated 3 months ago
- A boilerplate to use multiprocessing for your gRPC server in your Python project ☆25 · Updated 3 years ago
- The Triton backend for TensorRT ☆68 · Updated this week
- ONNX and TensorRT implementation of Whisper ☆61 · Updated last year
- The Triton backend for the ONNX Runtime ☆136 · Updated this week
- ☆157 · Updated last year
- Comparing PyTorch, JIT and ONNX for inference with Transformers ☆17 · Updated 3 years ago
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP ☆41 · Updated 7 months ago
- A tool to convert a TensorRT engine/plan to a fake ONNX ☆37 · Updated 2 years ago
- Triton CLI is an open-source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆52 · Updated this week
- TensorRT acceleration for PyTorch native eager-mode quantization models ☆14 · Updated 5 months ago
- Article about deploying machine learning models using gRPC, PyTorch and asyncio ☆27 · Updated 2 years ago
- Various transformers for FSDP research ☆34 · Updated 2 years ago
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆57 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆257 · Updated 3 months ago
- Scailable ONNX Python tools ☆96 · Updated 2 months ago
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ and easy export to ONNX/ONNX Runtime ☆153 · Updated 3 months ago
- Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch); includes a converter from PyTorch -> O… ☆32 · Updated 3 years ago
- Simple and easy Stable Diffusion inference with LightningModule on GPU, CPU and MPS (possibly all devices supported by Lightning) ☆17 · Updated last year
- Large-scale distributed model training strategy with Colossal AI and Lightning AI ☆58 · Updated last year
- NASRec: weight-sharing neural architecture search for recommender systems ☆29 · Updated last year
- Docker scripts for building ONNX Runtime with TensorRT and OpenVINO in a manylinux environment ☆23 · Updated last year
- Easy and efficient quantization for Transformers ☆191 · Updated last month
- CLIP and SigLIP models optimized with TensorRT, with a Transformers-like API ☆19 · Updated 3 months ago
- A lightweight wrapper for PyTorch that provides a simple declarative API for context switching between devices, distributed modes, mixed-… ☆66 · Updated last year