NVIDIA / nvidia-container-toolkit
Build and run containers leveraging NVIDIA GPUs
☆ 2,219 · Updated this week
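As a minimal sketch of what "build and run containers leveraging NVIDIA GPUs" looks like in practice, the toolkit's documented Docker setup registers the NVIDIA runtime and then exposes GPUs to a container via the `--gpus` flag (commands follow NVIDIA's install guide; they assume Docker and an NVIDIA driver are already installed):

```shell
# Register the NVIDIA container runtime with Docker
# (updates /etc/docker/daemon.json), then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU access from inside a container:
# nvidia-smi should list the host's GPUs
docker run --rm --gpus all ubuntu nvidia-smi
```

The same `--gpus` flag accepts a device count or specific device IDs (e.g. `--gpus '"device=0,1"'`) when a container should see only a subset of the host's GPUs.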
Related projects:
- NVIDIA container runtime library ☆810 · Updated this week
- NVIDIA container runtime ☆1,101 · Updated 10 months ago
- Simple, safe way to store and distribute tensors ☆2,755 · Updated 2 weeks ago
- NVIDIA GPU metrics exporter for Prometheus leveraging DCGM ☆852 · Updated last month
- An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management ☆4,617 · Updated last week
- TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain… ☆8,186 · Updated last week
- NVIDIA device plugin for Kubernetes ☆2,693 · Updated this week
- A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM ☆2,666 · Updated last month
- A Python package that extends official PyTorch to easily obtain performance on Intel platforms ☆1,554 · Updated this week
- DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for comm… ☆2,172 · Updated last week
- 🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools ☆2,461 · Updated this week
- Accessible large language models via k-bit quantization for PyTorch ☆6,042 · Updated this week
- Fast and memory-efficient exact attention ☆13,401 · Updated this week
- Hackable and optimized Transformers building blocks, supporting a composable construction ☆8,351 · Updated last week
- NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes ☆1,746 · Updated last week
- This repository contains tutorials and examples for Triton Inference Server ☆527 · Updated this week
- Transformer-related optimization, including BERT, GPT ☆5,773 · Updated 5 months ago
- Large Language Model Text Generation Inference ☆8,778 · Updated this week
- AMD ROCm™ Software - GitHub Home ☆4,493 · Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs… ☆1,817 · Updated this week
- Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs ☆1,524 · Updated this week
- A native-PyTorch library for LLM fine-tuning ☆3,954 · Updated this week
- NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source compone… ☆10,559 · Updated this week
- Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datase… ☆11,610 · Updated last week
- The Triton Inference Server provides an optimized cloud and edge inferencing solution ☆8,061 · Updated this week
- Lightweight, standalone C++ inference engine for Google's Gemma models ☆5,911 · Updated this week
- Multi-GPU CUDA stress test ☆1,352 · Updated last month
- NVIDIA GPU exporter for Prometheus using the nvidia-smi binary ☆831 · Updated this week
- SGLang is a fast serving framework for large language models and vision language models ☆5,162 · Updated this week
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs ☆4,184 · Updated last week