NVIDIA-AI-Blueprints / video-search-and-summarization
Blueprint for ingesting massive volumes of live or archived videos and extracting insights for summarization and interactive Q&A
☆46 Updated last week
Alternatives and similar repositories for video-search-and-summarization:
Users who are interested in video-search-and-summarization are comparing it to the libraries listed below.
- Inference and fine-tuning examples for vision models from 🤗 Transformers ☆132 Updated this week
- Collection of reference workflows for building intelligent agents with NIMs ☆155 Updated 3 months ago
- ☆107 Updated last month
- This repo has the code of the 3 demos I presented at Google Gemma2 DevDay Tokyo, using Gemma2 on a Jetson Orin Nano device. ☆43 Updated last month
- Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector… ☆262 Updated 6 months ago
- ☆17 Updated last month
- A utility library to help integrate Python applications with Metropolis Microservices for Jetson ☆12 Updated 4 months ago
- Route LLM requests to the best model for the task at hand. ☆43 Updated last month
- This NVIDIA RAG blueprint serves as a reference solution for a foundational Retrieval Augmented Generation (RAG) pipeline. ☆89 Updated this week
- A collection of reference AI microservices and workflows for Jetson Platform Services ☆38 Updated 3 months ago
- A reference application for a local AI assistant with LLM and RAG ☆110 Updated 5 months ago
- An integration of Segment Anything Model, Molmo, and Whisper to segment objects using voice and natural language. ☆25 Updated 2 months ago
- Multimodal AI agent with Llama 3.2: A Streamlit app that processes text, images, PDFs, and PPTs, integrating NIM microservices, Milvus, a… ☆114 Updated 7 months ago
- Accelerate your Gen AI with NVIDIA NIM and NVIDIA AI Workbench ☆159 Updated last week
- Solving Computer Vision with AI agents ☆31 Updated this week
- A DeepStream sample application demonstrating end-to-end retail video analytics for brick-and-mortar retail. ☆46 Updated 2 years ago
- Which model is the best at object detection? Which is best for small or large objects? We compare the results in a handy leaderboard. ☆70 Updated this week
- VLM driven tool that processes surveillance videos, extracts frames, and generates insightful annotations using a fine-tuned Florence-2 V… ☆110 Updated 7 months ago
- Fine Tuning Multimodal LLM "Idefics 9B" on Pokemon Go Dataset available on Hugging Face. ☆19 Updated last year
- Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which includ… ☆33 Updated 4 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆29 Updated this week
- Testbed for multimodal retrieval augmented generation techniques with FiftyOne, LlamaIndex, and Milvus ☆18 Updated 9 months ago
- Ultralytics Notebooks 🚀 ☆76 Updated 3 weeks ago
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio… ☆80 Updated 11 months ago
- An NVIDIA AI Workbench example project for an Agentic Retrieval Augmented Generation (RAG) ☆72 Updated 3 months ago
- The NVIDIA RTX™ AI Toolkit is a suite of tools and SDKs for Windows developers to customize, optimize, and deploy AI models across RTX PC… ☆149 Updated 5 months ago
- ☆112 Updated 5 months ago
- ☆68 Updated 7 months ago
- Speech To Speech: an effort for an open-sourced and modular GPT4-o ☆56 Updated 6 months ago
- ☆150 Updated this week