Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inference Server.
☆74May 8, 2026Updated 2 weeks ago
Alternatives and similar repositories for triton_cli
Users that are interested in triton_cli are comparing it to the libraries listed below.
- ☆13May 8, 2023Updated 3 years ago
- Integrating SSE with NVIDIA Triton Inference Server using a Python backend and Zephyr model. There is very little documentation on how to use …☆10May 29, 2024Updated last year
- Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs.☆221Feb 3, 2026Updated 3 months ago
- Triton backend that enables pre-processing, post-processing and other logic to be implemented in Python.☆677Updated this week
- This repository contains tutorials and examples for Triton Inference Server☆838May 8, 2026Updated 2 weeks ago
- The DL Streamer Pipeline Zoo is a catalog of optimized media and media analytics pipelines. It includes tools for downloading pipelines a…☆16Aug 20, 2024Updated last year
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv…☆509May 13, 2026Updated last week
- OpenAI compatible API for TensorRT LLM triton backend☆220Aug 1, 2024Updated last year
- Fuses IMU readings with a complementary filter to achieve accurate pitch and roll readings.☆15Aug 23, 2021Updated 4 years ago
- Crawlers for the Taiga Corpus and Taiga Parser project, downloading resources from open sources☆14Apr 9, 2019Updated 7 years ago
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.☆844Aug 13, 2025Updated 9 months ago
- ☆26Feb 23, 2026Updated 3 months ago
- A CUDA kernel optimization toolkit for validation, benchmarking, Nsight Compute profiling, bottleneck analysis, and iterative tuning. It …☆168Apr 22, 2026Updated last month
- The Triton TensorRT-LLM Backend☆934May 7, 2026Updated 2 weeks ago
- ☆22May 8, 2026Updated 2 weeks ago
- Multiple GEMM operators are constructed with cutlass to support LLM inference.☆20Aug 3, 2025Updated 9 months ago
- This repository provides an optical character detection and recognition solution optimized for NVIDIA devices.☆88May 13, 2025Updated last year
- The core library and APIs implementing the Triton Inference Server.☆171Updated this week
- Dataset collected from the popular Russian collective blog Habrahabr.ru☆13Oct 24, 2016Updated 9 years ago
- An NVIDIA Triton Server workflow for OCR and the LayoutLMv3 Transformer Model☆30Sep 14, 2022Updated 3 years ago
- Compare multiple optimization methods on Triton to improve model service performance☆52Jan 10, 2024Updated 2 years ago
- Repository for open inference protocol specification☆72May 12, 2025Updated last year
- MIG Partition Editor for NVIDIA GPUs☆252May 18, 2026Updated last week
- Implementation of SmoothCache, a project aimed at speeding-up Diffusion Transformer (DiT) based GenAI models with error-guided caching.☆48Jul 17, 2025Updated 10 months ago
- Create a minimal linux distro from scratch☆35Apr 24, 2026Updated last month
- A collection of YAML files, Helm Charts, Operator code, and guides to act as an example reference implementation for NVIDIA NIM deploymen…☆236May 15, 2026Updated last week
- A brief understanding of ffmpeg cli through pseudocode☆11Dec 20, 2020Updated 5 years ago
- ☆18Mar 20, 2019Updated 7 years ago
- Code for Draft Attention☆103May 22, 2025Updated last year
- /j f t/ - YAML file tool☆14Apr 28, 2026Updated 3 weeks ago
- [ICLR 2026] Official implementation of DiCache: Let Diffusion Model Determine Its Own Cache☆61Jan 26, 2026Updated 4 months ago
- The Triton backend for TensorFlow.☆56Nov 22, 2025Updated 6 months ago
- Classification and aggregation of Russian news articles. University coursework.☆18Jan 21, 2019Updated 7 years ago
- ☆66Apr 26, 2025Updated last year
- Run cloud native workloads on NVIDIA GPUs☆236Jan 22, 2026Updated 4 months ago
- ☆341May 8, 2026Updated 2 weeks ago
- An experimental communicating attention kernel based on DeepEP.☆34Jul 29, 2025Updated 9 months ago
- Automatically distribute GitHub Actions workflow across repositories.☆12May 19, 2026Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs☆120Updated this week