dusty-nv / NanoLLM
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
☆171 Updated 2 weeks ago
Related projects:
- ☆74 Updated this week
- A reference application for a local AI assistant with LLM and RAG ☆82 Updated 2 months ago
- A project that optimizes OWL-ViT for real-time inference with NVIDIA TensorRT. ☆232 Updated last month
- A tutorial introducing knowledge distillation as an optimization technique for deployment on NVIDIA Jetson ☆145 Updated 10 months ago
- Creation of annotated datasets from scratch using Generative AI and Foundation Computer Vision models ☆78 Updated this week
- A reference example for integrating NanoOwl with Metropolis Microservices for Jetson ☆25 Updated 3 months ago
- ☆77 Updated this week
- Quick exploration into fine-tuning Florence-2 ☆250 Updated last month
- A project that optimizes Whisper for low-latency inference using NVIDIA TensorRT ☆47 Updated 2 months ago
- Using FastChat-T5 Large Language Model, Vosk API for automatic speech recognition, and Piper for text-to-speech ☆97 Updated last year
- Use Florence-2 to auto-label data for use in training fine-tuned object detection models. ☆54 Updated last month
- TAO Toolkit deep learning networks with PyTorch backend ☆81 Updated 2 weeks ago
- Use Segment Anything 2, grounded with Florence-2, to auto-label data for use in training vision models. ☆80 Updated last month
- ASR/NLP/TTS deep learning inference library for NVIDIA Jetson using PyTorch and TensorRT ☆180 Updated 7 months ago
- Quick start scripts and tutorial notebooks to get started with TAO Toolkit ☆35 Updated 3 weeks ago
- Easy to use neural networks for NVIDIA Jetson (and desktop too!) ☆75 Updated last year
- Vision Transformer (ViT) inference in plain C/C++ with ggml ☆215 Updated 5 months ago
- Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for t… ☆205 Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆250 Updated this week
- A distilled Segment Anything (SAM) model capable of running in real time with NVIDIA TensorRT ☆623 Updated 9 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆342 Updated 2 weeks ago
- Build LLM-powered robots in your garage with MachinaScript For Robots! ☆159 Updated 4 months ago
- Python bindings for ggml ☆125 Updated 2 weeks ago
- YOLOExplorer: Iterate on your YOLO / CV datasets using SQL, vector semantic search, and more within seconds ☆119 Updated 2 weeks ago
- From-scratch implementation of a vision language model in pure PyTorch ☆149 Updated 4 months ago
- ☆351 Updated 9 months ago
- ☆158 Updated last year
- The jetson-examples repository by Seeed Studio offers a seamless, one-line command deployment to run vision AI and Generative AI models o… ☆65 Updated this week
- Real-time Depth Estimation for Jetson Orin ☆14 Updated 3 weeks ago