parsakhaz / video-understanding-engine
A powerful video summarization tool that utilizes Moondream alongside multiple AI models to provide comprehensive video understanding through audio transcription, intelligent frame selection, visual description, and content summarization.
☆23Updated 10 months ago
Alternatives and similar repositories for video-understanding-engine
Users who are interested in video-understanding-engine are comparing it to the libraries listed below.
- Garvis: Realtime AI Voice Assistant☆38Updated last year
- Jockey is a conversational video agent.☆93Updated 6 months ago
- Whisper STT + Orpheus TTS + Gemma 3 using LM Studio to create a virtual assistant.☆74Updated 7 months ago
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch☆103Updated 11 months ago
- Synthify: Seamlessly generate AI datasets with a no-code UI | https://synthify.toolstack.run☆49Updated 10 months ago
- VideoDB Python SDK☆84Updated this week
- Turn text from websites into spoken audio with edge-tts, F5, etc. and save as mp3 files☆46Updated 5 months ago
- ☆55Updated 3 months ago
- Kokoro text-to-speech using JavaScript☆63Updated 10 months ago
- Use the Moondream 2 model to detect faces and their gaze directions in videos.☆46Updated 11 months ago
- A quick and optimized solution to manage llama-based GGUF quantized models, download GGUF files, retrieve message formatting, add more mo…☆12Updated last year
- [WIP] AI Try-On plugin for Chrome☆28Updated last year
- Gradio UI for a Cog API☆72Updated last year
- ☆58Updated last year
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"☆82Updated last year
- Benchmarking LLMs as Casual Card Game AIs☆20Updated 10 months ago
- Kyutai with an "eye"☆230Updated 8 months ago
- Realtime TTS reading of large text files in your favourite voice, plus translation via LLM (Python script)☆52Updated last year
- ☆117Updated last year
- ☆18Updated 3 months ago
- Terminal Voice Assistant is a powerful and flexible tool designed to help users interact with their terminal using natural language comma…☆19Updated last year
- A lightweight recreation of OS1/Samantha from the movie Her, running locally in the browser☆112Updated 5 months ago
- George is an API leveraging AI to make it easy to control a computer with natural language.☆49Updated 11 months ago
- Very basic framework for composable parameterized large language model (Q)LoRA / (Q)Dora fine-tuning using mlx, mlx_lm, and OgbujiPT.☆43Updated 5 months ago
- A modular framework for building massively parallel agentic systems☆29Updated 3 months ago
- Simple UI for Llama-3.2-11B-Vision & Molmo-7B-D☆136Updated last year
- ☆101Updated 6 months ago
- A real-time speech-to-speech chatbot powered by Whisper Small, Llama 3.2, and Kokoro-82M.☆246Updated 10 months ago
- LoRA Explorer model to test with LoRAs using Flux.1[Dev] as the base model☆53Updated last year
- Realtime Voice and Vision with Brilliant Labs Frame and Gemini☆68Updated 7 months ago