triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
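Triton exposes an HTTP/REST endpoint implementing the KServe v2 inference protocol, so a request can be issued with nothing but the standard library. Below is a minimal sketch of building such a request body; the model name, tensor name, and values are hypothetical examples, not part of any real deployment:

```python
import json

def build_infer_request(input_name, datatype, shape, data):
    """Build a JSON request body for Triton's HTTP/REST inference endpoint.

    Triton implements the KServe v2 inference protocol: a POST to
    /v2/models/<model_name>/infer with a body describing the input tensors.
    """
    return {
        "inputs": [
            {
                "name": input_name,    # must match the model's declared input name
                "datatype": datatype,  # e.g. "FP32", "INT64"
                "shape": shape,
                "data": data,          # tensor values, flattened row-major
            }
        ]
    }

# Hypothetical model with a single FP32 input of shape [1, 4]:
body = build_infer_request("INPUT0", "FP32", [1, 4], [0.1, 0.2, 0.3, 0.4])
payload = json.dumps(body)
```

In practice this payload would be POSTed to `http://<host>:8000/v2/models/<model_name>/infer` (or sent via the `tritonclient` Python package, which wraps the same protocol over HTTP or gRPC).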
Related projects:
- Serve, optimize and scale PyTorch models in production
- NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source compone…
- Transformer-related optimization, including BERT and GPT
- Ongoing research training transformer models at scale
- Development repository for the Triton language and compiler
- Open standard for machine learning interoperability
- ONNX Runtime: cross-platform, high-performance ML inferencing and training accelerator
- Open deep learning compiler stack for CPU, GPU, and specialized accelerators
- Fast and memory-efficient exact attention
- Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i…
- A library for efficient similarity search and clustering of dense vectors
- A PyTorch extension: tools for easy mixed precision and distributed training in PyTorch
- PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
- State-of-the-art deep learning scripts organized by model, easy to train and deploy with reproducible accuracy and performance on enter…
- A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Auto…
- A collection of pre-trained, state-of-the-art models in the ONNX format
- PyTorch extensions for high-performance and large-scale training
- A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep lear…
- Tutorials for creating and using ONNX models
- TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain…
- An easy-to-use PyTorch to TensorRT converter
- The easiest way to serve AI apps and models: build reliable inference APIs, LLM apps, multi-model chains, RAG services, and much more
- Standardized serverless ML inference platform on Kubernetes
- An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model c…
- AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code, specialized for FP16 TensorCore (N…
- OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
- Large-scale self-supervised pre-training across tasks, languages, and modalities
- 🤗 PEFT: state-of-the-art parameter-efficient fine-tuning
- A high-throughput and memory-efficient inference and serving engine for LLMs