datawhalechina / video-devour
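🚀 An intelligent video note-taking tool built on ASR + VLM technology that can "devour" any video and generate a structured note report containing illustrated text content and video clips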
☆49Updated 3 months ago
Alternatives and similar repositories for video-devour
Users that are interested in video-devour are comparing it to the libraries listed below
- Speech recognition built on FunASR, available in a standard version and an ONNX version (recommended).☆48Updated last year
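- This project uses the PaddlePaddle platform to build an innovative multi-model collaborative system for efficient, accurate conversion of PDF files to Markdown files.☆27Updated 10 months ago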
- SenseVoice-python: An enterprise-grade open-source multi-language ASR system from FunASR, running on onnxruntime☆109Updated 4 months ago
- Utilizes ONNX Runtime for speech activity detection.☆41Updated 2 months ago
- An AI-powered content conversion tool that transforms text, web content, or HTML code into beautifully designed card images. An AI-based content conversion…☆33Updated 6 months ago
- We Speech Transcript based on LLM, in 300 lines of code.☆183Updated 7 months ago
- ☆204Updated last year
- Github repository for ACL 2025 paper: Recent Advances in Speech Language Models: A Survey.☆178Updated 7 months ago
- A real-time voice conversation system based on Tongyi Qianwen Qwen2.5-Omni that uses the online API service and supports real-time voice interaction, dynamic voice activity detection, and streaming audio processing. …☆82Updated 9 months ago
- RealSI: Open Benchmark for Simultaneous Interpretation in Real-world Scenarios☆79Updated 7 months ago
- flow mirror models from JZX AI Labs☆43Updated last year
- Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.☆873Updated last week
- Extension of ChatTTS, 3x Faster on Windows, Support Voice Cloning and Mobile Deployment☆172Updated last year
- low-latency realtime ASR based on FireRedASR☆57Updated 7 months ago
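- A full-stack large language model reference guide that uses the most concise code to help you define, end to end, every detail from training a model from scratch to engineering deployment☆123Updated 3 weeks ago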
- A mini assistant to help you read papers quickly☆55Updated 9 months ago
- ☆69Updated last year
- Streaming ASR and TTS based on FastAPI + sherpa-onnx☆188Updated 3 months ago
- A hands-on code repository covering several mainstream large-model fine-tuning approaches, based on the Qwen3 model series☆116Updated 6 months ago
- Audio tools for Python☆16Updated 2 months ago
- ☆32Updated 7 months ago
- Step-Realtime-Console☆64Updated last month
- ☆25Updated 6 months ago
- Chinese edition of the Hugging Face Audio Course, helping learners get started quickly with the audio modality☆37Updated last year
- Dynamic Voice Actor Assignment and Emotional Narration for Realistic Story Play☆47Updated 10 months ago
- An open source chat bot architecture for voice/vision (and multimodal) assistants that runs locally (CPU/GPU bound) and remotely (I/O bound).☆87Updated last month
- Paraformer (Chinese ASR) online ONNX Runtime for Python☆53Updated last year
- Utilizes ONNX Runtime to transcribe audio into text.☆80Updated this week
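- This tutorial walks you through quickly setting up your own AI application environment, from installing and configuring Docker Desktop to deploying Dify locally and customizing AI assistant features, so you can easily build distinctive AI applications such as "Guess the Medical Case", "Sweet Talker", "Freshman Enrollment Guide", "Xiaohongshu Reading Cards", and "Interview Handbook". It also teaches you to progress from basic agents to using workflows, and then to…☆327Updated last month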
- GPT-4o-level, real-time spoken dialogue system.☆369Updated last year