showlab / whisperVideo
Find out who said what in the video.
☆23Updated this week
Alternatives and similar repositories for whisperVideo
Users who are interested in whisperVideo are comparing it to the libraries listed below.
- ☆77Updated 8 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"☆131Updated last year
- The official repo for paper "Spatial Speech Translation: Translating Across Space With Binaural Hearables"☆71Updated 5 months ago
- AudioStory: Generating Long-Form Narrative Audio with Large Language Models☆295Updated 4 months ago
- The official GitHub Page for MiniMax☆60Updated 2 months ago
- ☆146Updated 5 months ago
- [EMNLP 2025 Demo] PresentAgent: Multimodal Agent for Presentation Video Generation☆123Updated last month
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)☆366Updated 2 months ago
- ☆78Updated 2 weeks ago
- Official repository for "VideoPrism: A Foundational Visual Encoder for Video Understanding" (ICML 2024)☆349Updated last week
- Incredibly descriptive audiovisual summaries for videos☆41Updated last year
- ☆82Updated 10 months ago
- [ICCV2025] WikiAutoGen official page☆23Updated 6 months ago
- Official Pytorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models" [IEEE ICASSP 202…☆28Updated this week
- ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation☆109Updated last month
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.☆45Updated last year
- video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d…☆141Updated last month
- ☆93Updated 3 months ago
- ☆81Updated last year
- An official implementation of SwapAnyone.☆73Updated 10 months ago
- A collection of optimized utilities for text-to-audio processing, enhancing both training and inference workflows. This repository contai…☆42Updated 9 months ago
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models.☆50Updated 11 months ago
- PodAgent: A Comprehensive Framework for Podcast Generation☆123Updated 8 months ago
- Kyutai with an "eye"☆234Updated 9 months ago
- This is the official repo for the paper "LongCat-Flash-Omni Technical Report"☆456Updated last week
- ☆230Updated 2 weeks ago
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.☆165Updated 11 months ago
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions☆132Updated last year
- [arXiv] On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices☆131Updated last month
- OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.☆625Updated 2 months ago