byjlw / video-analyzer
A comprehensive video analysis tool that combines computer vision, audio transcription, and natural language processing to generate detailed descriptions of video content. This tool extracts key frames from videos, transcribes audio content, and produces natural language descriptions of the video's content.
☆468 Updated this week
Alternatives and similar repositories for video-analyzer:
Users who are interested in video-analyzer are comparing it to the libraries listed below.
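- Real-time voice-interactive digital human supporting an end-to-end voice pipeline (GLM-4-Voice - THG) and a cascaded pipeline (ASR-LLM-TTS-THG). Appearance and voice are customizable without training, voice cloning is supported, and first-packet latency is as low as 3 s… ☆647 Updated 2 months ago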
- Easegen is an open-source digital human course creation platform offering comprehensive solutions from course production and video manage… ☆172 Updated this week
- The fastest digital human algorithm, now on your desktop. ☆411 Updated last month
- A Training-free Iterative Framework for Long Story Visualization ☆702 Updated last week
- ☆136 Updated this week
- ☆541 Updated 3 months ago
- ☆377 Updated 2 months ago
- gradio WebUI for AdvancedLivePortrait ☆424 Updated 3 weeks ago
- Fuse ChatTTS with OpenVoice, upload a 10-second audio clip, and clone your personalized ChatTTS voice. ☆395 Updated 2 months ago
- An open-source AI content search engine designed specifically for content creators. Supports extraction of text, images, and short videos… ☆523 Updated 7 months ago
- [AAAI 2025] StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization ☆171 Updated 3 weeks ago
- JoyHallo: Digital human model for Mandarin ☆424 Updated 2 months ago
- AI ContentCraft is an all-in-one content creation suite that helps creators generate stories, podcast scripts, and multimedia content usi… ☆250 Updated last week
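- 百聆 (Bailing) is a GPT-4o-style voice conversation bot built from ASR+LLM+TTS, with latency as low as 800 ms; it runs on low-end hardware and supports interruption. ☆432 Updated last week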
- An open-sourced end-to-end VLM-based GUI Agent ☆637 Updated this week
- An open-source, LLM-based workflow showcase for automatic daily news collection, powered by the Agently AI application development framework. ☆469 Updated 3 months ago
- Bring portraits to life via Monitor! ☆266 Updated 5 months ago
- Free, high-quality text-to-speech API endpoint to replace OpenAI, Azure, or ElevenLabs ☆409 Updated last week
- ☆139 Updated 2 months ago
- WebDesignAgent : Towards Effortless Website Creation ☆244 Updated 4 months ago
- Parse PDFs into markdown using Vision LLMs ☆224 Updated this week
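- AigcPanel is a simple, easy-to-use, all-in-one AI digital human system that supports video synthesis, voice synthesis, and voice cloning, and simplifies local model management with one-click import and use of AI models. ☆673 Updated this week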
- Local SRT/LLM/TTS Voicechat ☆601 Updated 3 months ago
- ⚡ Insanely fast AI voice assistant with <500ms response times ☆362 Updated last month
- Memory-Guided Diffusion for Expressive Talking Video Generation ☆688 Updated this week
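- Awada is a team knowledge assistant agent built for WeChat. It learns autonomously online from sources such as group chats, official accounts, and websites (and also accepts manually uploaded documents), builds a private team knowledge base, and provides team members with Q&A, document lookup, and writing (Word) services. ☆222 Updated last month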
- A Fast TTS Engine ☆411 Updated this week
- E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with ded… ☆892 Updated 4 months ago
- Gradio-powered application that converts audio recordings of meetings into transcripts and provides concise summaries using Whisper. ☆80 Updated 3 months ago
- A solution for checking whether a file contains NSFW content. ☆433 Updated last month