Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inference Server.
☆74Apr 8, 2026Updated last week
Alternatives and similar repositories for triton_cli
Users that are interested in triton_cli are comparing it to the libraries listed below.
- Integrating SSE with NVIDIA Triton Inference Server using a Python backend and Zephyr model. There is very little documentation on how to use …☆10May 29, 2024Updated last year
- Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs.☆221Feb 3, 2026Updated 2 months ago
- Triton backend that enables pre-processing, post-processing and other logic to be implemented in Python.☆673Updated this week
- This repository contains tutorials and examples for Triton Inference Server☆828Apr 8, 2026Updated last week
- An API for interfacing NVIDIA Triton Inference Server with Rust☆12Jun 12, 2023Updated 2 years ago
- TRITONCACHE implementation of a Redis cache☆17Apr 8, 2026Updated last week
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv…☆509Updated this week
- OpenAI compatible API for TensorRT LLM triton backend☆219Aug 1, 2024Updated last year
- Please visit https://github.com/HKUSTDial/NL2SQL360 to get the official code!☆10Sep 1, 2024Updated last year
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment.☆153Updated this week
- Infrastructure as code for GPU accelerated managed Kubernetes clusters.☆59Apr 30, 2025Updated 11 months ago
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.☆842Aug 13, 2025Updated 8 months ago
- The Triton TensorRT-LLM Backend☆931Apr 8, 2026Updated last week
- ☆22Apr 8, 2026Updated last week
- Quotek is an open source algotrading platform, written in C++.☆11Nov 12, 2020Updated 5 years ago
- The core library and APIs implementing the Triton Inference Server.☆169Updated this week
- Dataset collected from popular Russian collective blog Habrahabr.ru☆13Oct 24, 2016Updated 9 years ago
- ESG Insights AI simplifies ESG data analysis with advanced AI models, ensuring compliance with GRI standards. It helps asset managers ass…☆13Oct 31, 2024Updated last year
- An NVIDIA Triton Server workflow for OCR and the LayoutLMv3 Transformer Model☆30Sep 14, 2022Updated 3 years ago
- [ICML2023] Instant Soup Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models. Ajay Jaiswal, Shiwei Liu, Ti…☆11Nov 28, 2023Updated 2 years ago
- Repository for open inference protocol specification☆69May 12, 2025Updated 11 months ago
- MIG Partition Editor for NVIDIA GPUs☆247Updated this week
- Implementation of SmoothCache, a project aimed at speeding-up Diffusion Transformer (DiT) based GenAI models with error-guided caching.☆48Jul 17, 2025Updated 8 months ago
- Create a minimal linux distro from scratch☆29Sep 16, 2025Updated 6 months ago
- Get GDDR5 memory information and other information from AMD Radeon GPUs.☆13May 26, 2018Updated 7 years ago
- Code for Draft Attention☆101May 22, 2025Updated 10 months ago
- /j f t/ - YAML file tool☆13Feb 9, 2026Updated 2 months ago
- The Triton backend for TensorFlow.☆56Nov 22, 2025Updated 4 months ago
- [ICLR 2026] Official implementation of DiCache: Let Diffusion Model Determine Its Own Cache☆59Jan 26, 2026Updated 2 months ago
- Classification and aggregation of russian news articles. University coursework.☆18Jan 21, 2019Updated 7 years ago
- ☆65Apr 26, 2025Updated 11 months ago
- Run cloud native workloads on NVIDIA GPUs☆232Jan 22, 2026Updated 2 months ago
- ☆338Updated this week
- An experimental communicating attention kernel based on DeepEP.☆35Jul 29, 2025Updated 8 months ago
- Example of out-of-RAM k-nearest neighbors search using faiss☆18Mar 28, 2026Updated 2 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆116Apr 9, 2026Updated last week
- ☆18Updated this week
- Common source, scripts and utilities for creating Triton backends.☆370Apr 8, 2026Updated last week
- ☆13Dec 3, 2021Updated 4 years ago