NVIDIA-AI-Blueprints / video-search-and-summarization
Blueprint for ingesting massive volumes of live or archived videos and extracting insights for summarization and interactive Q&A
☆23 · Updated 2 weeks ago
Alternatives and similar repositories for video-search-and-summarization:
Users interested in video-search-and-summarization are comparing it to the libraries listed below.
- Collection of reference workflows for building intelligent agents with NIMs ☆149 · Updated 2 months ago
- Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector… ☆251 · Updated 5 months ago
- ☆103 · Updated last week
- A collection of reference AI microservices and workflows for Jetson Platform Services ☆38 · Updated 2 months ago
- Inference and fine-tuning examples for vision models from 🤗 Transformers ☆70 · Updated this week
- ☆16 · Updated last week
- This repo has the code of the 3 demos I presented at Google Gemma2 DevDay Tokyo, using Gemma2 on a Jetson Orin Nano device. ☆38 · Updated 5 months ago
- ☆93 · Updated 6 months ago
- Which model is the best at object detection? Which is best for small or large objects? We compare the results in a handy leaderboard. ☆68 · Updated last week
- Eye exploration ☆25 · Updated last month
- A tutorial introducing knowledge distillation as an optimization technique for deployment on NVIDIA Jetson ☆184 · Updated last year
- A reference example for integrating NanoOwl with Metropolis Microservices for Jetson ☆30 · Updated 9 months ago
- A utility library to help integrate Python applications with Metropolis Microservices for Jetson ☆12 · Updated 3 months ago
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP ☆50 · Updated 9 months ago
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio… ☆80 · Updated 9 months ago
- Find your Twin Celebrity in Vector Space ☆17 · Updated 2 months ago
- Simple and unified interface to zero-shot computer vision models curated for robotics use cases. ☆117 · Updated this week
- Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which includ… ☆33 · Updated 2 months ago
- An integration of Segment Anything Model, Molmo, and Whisper to segment objects using voice and natural language. ☆24 · Updated last month
- A reference application for a local AI assistant with LLM and RAG ☆108 · Updated 3 months ago
- This repository stores the source code for the Mistral Hackathon 2024 in Paris ☆16 · Updated 7 months ago
- Quick start scripts and tutorial notebooks to get started with TAO Toolkit ☆75 · Updated 7 months ago
- End-to-End LLM Guide ☆104 · Updated 8 months ago
- CrewAI + Ollama + Llama3 team up to program my Arduino UNO ☆15 · Updated 10 months ago
- From-scratch implementation of a vision language model in pure PyTorch ☆207 · Updated 10 months ago
- Ultralytics Notebooks 🚀 ☆47 · Updated this week
- Multimodal AI agent with Llama 3.2: A Streamlit app that processes text, images, PDFs, and PPTs, integrating NIM microservices, Milvus, a… ☆105 · Updated 6 months ago
- Accurately locating each head's position in the crowd scenes is a crucial task in the field of crowd analysis. However, traditional densi… ☆21 · Updated last year
- ☆29 · Updated last year
- An NVIDIA AI Workbench example project for an Agentic Retrieval Augmented Generation (RAG) ☆64 · Updated last month