parsakhaz / video-understanding-engine
A powerful video summarization tool that utilizes Moondream alongside multiple AI models to provide comprehensive video understanding through audio transcription, intelligent frame selection, visual description, and content summarization.
☆24Updated last year
Alternatives and similar repositories for video-understanding-engine
Users who are interested in video-understanding-engine are comparing it to the libraries listed below.
- Examples and quickstarts for Moondream☆68Updated 7 months ago
- Turn text from websites into spoken audio with edge-tts, F5, etc. and save as mp3 files☆46Updated 7 months ago
- Gradio UI for a Cog API☆70Updated last year
- This repository is an implementation of converting sketches into lively videos using Google's Veo 3 model.☆76Updated 7 months ago
- [WIP] AI Try-On plugin for Chrome☆28Updated last year
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch☆103Updated last year
- Build Web Datasets with Ease☆33Updated last year
- Garvis: Realtime AI Voice Assistant☆39Updated last year
- ☆17Updated last year
- Whisper STT + Orpheus TTS + Gemma 3 using LM Studio to create a virtual assistant.☆80Updated last month
- A couple scripts to grab stats from email☆43Updated last year
- Jockey is a conversational video agent.☆97Updated 8 months ago
- Radiantloom Email Assist 7B is an email-assistant large language model fine-tuned from Zephyr-7B-Beta, over a custom-curated dataset of 1…☆14Updated 2 years ago
- ☆11Updated last year
- Replicate Flux LoRA image editor.☆54Updated last year
- Local character AI chatbot with chroma vector store memory and some scripts to process documents for Chroma☆34Updated last year
- ☆55Updated 4 months ago
- Use the Moondream 2 model to detect faces and their gaze directions in videos.☆46Updated last year
- Gradio-based tool to run open-source LLM models directly from Huggingface☆96Updated last year
- ☆58Updated last year
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"☆82Updated last year
- GroqChat: Local ChatGPT-like environment in your browser using the best open model, the Llama 3.1 series, on the Groq fastest inference engine.☆91Updated last year
- ☆119Updated last year
- Realtime Voice and Vision with Brilliant Labs Frame and Gemini☆68Updated 8 months ago
- ☆41Updated last year
- MCP Server implementation for Claude☆26Updated last year
- Using langchain, deeplake and openai to create a Q&A on the Mojo lang programming manual☆22Updated 2 years ago
- Generate visual podcasts about novels using open source models☆25Updated 2 years ago
- A python command-line tool to download & manage MLX AI models from Hugging Face.☆19Updated last year
- ☆42Updated last year