parsakhaz / video-understanding-engine
A powerful video summarization tool that utilizes Moondream alongside multiple AI models to provide comprehensive video understanding through audio transcription, intelligent frame selection, visual description, and content summarization.
☆17 Updated 5 months ago
Alternatives and similar repositories for video-understanding-engine
Users that are interested in video-understanding-engine are comparing it to the libraries listed below
- Turn text from websites into spoken audio with edge-tts, F5, etc., and save as MP3 files ☆47 Updated 3 weeks ago
- Transcribe and summarize videos using Whisper and LLMs on the Apple MLX framework ☆75 Updated last year
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch ☆100 Updated 6 months ago
- Using the Moondream VLM with optical flow for promptable object tracking ☆68 Updated 4 months ago
- ☆51 Updated 8 months ago
- ☆91 Updated 2 months ago
- ☆53 Updated last month
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching using MLX. ☆89 Updated 2 weeks ago
- Speech To Speech: an effort for an open-sourced and modular GPT-4o ☆64 Updated 9 months ago
- Whisper STT + Orpheus TTS + Gemma 3 using LM Studio to create a virtual assistant. ☆63 Updated 2 months ago
- Run Ollama LLM models in Google Colab for free ☆36 Updated 7 months ago
- LoRA Explorer model to test with LoRAs using Flux.1[Dev] as the base model ☆50 Updated 9 months ago
- Own your AI, search the web with it🌐😎 ☆86 Updated 6 months ago
- Build Web Datasets with Ease ☆33 Updated last year
- ☆19 Updated 2 months ago
- A WebRTC server that lets you interact with an LLM using your speech and responds with generated audio. ☆134 Updated last year
- ☆132 Updated 2 months ago
- A real-time speech-to-speech chatbot powered by Whisper Small, Llama 3.2, and Kokoro-82M. ☆232 Updated 5 months ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆81 Updated 9 months ago
- [WIP] AI Try-On plugin for Chrome ☆27 Updated last year
- ☆40 Updated last year
- B-Llama3o: a Llama 3 with vision and audio understanding, as well as text, audio, and animation data output. ☆26 Updated last year
- Video+code lecture on building nanoGPT from scratch ☆69 Updated last year
- Automated LLM novelist ☆47 Updated last year
- A Python library to orchestrate LLMs in a neural network-inspired structure ☆49 Updated 9 months ago
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre… ☆36 Updated last year
- Locally running LLM with internet access ☆96 Updated 2 weeks ago
- No longer maintained: Your personal ArXiv Curator ☆40 Updated 8 months ago
- GRDN.AI app for garden optimization ☆70 Updated last year
- Service for testing out the new Qwen2.5-Omni model ☆54 Updated 2 months ago