showlab / livecc
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)
★229 · Updated last week
Alternatives and similar repositories for livecc
Users who are interested in livecc are comparing it to the repositories listed below.
- Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM. ★349 · Updated this week
- 🔥 ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt ★271 · Updated 3 weeks ago
- AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction ★317 · Updated 2 months ago
- ★336 · Updated 3 months ago
- MovieAgent: Automated Movie Generation via Multi-Agent CoT Planning ★209 · Updated 3 months ago
- The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing" ★112 · Updated last week
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning ★217 · Updated last month
- [CVPR 2025] This is an official inference code of the paper "BizGen: Advancing Article-level Visual Text Rendering for Infographics Gener… ★276 · Updated 2 months ago
- [ICLR 2025] DisPose: Disentangling Pose Guidance for Controllable Human Image Animation ★367 · Updated 5 months ago
- FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation ★433 · Updated 3 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ★131 · Updated 7 months ago
- [SIGGRAPH 2025] Official repo for paper "Any-length Video Inpainting and Editing with Plug-and-Play Context Control" ★408 · Updated 2 months ago
- All-round Creator and Editor ★223 · Updated 5 months ago
- [AAAI 2025] StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization ★213 · Updated 2 months ago
- ACTalker: an end-to-end video diffusion framework for talking head synthesis that supports both single and multi-signal control (e.g., au… ★289 · Updated 2 months ago
- Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation ★427 · Updated this week
- [ICML 2025] Official PyTorch implementation of LongVU ★383 · Updated last month
- [SIGGRAPH 2025] Official code of the paper "FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios" ★290 · Updated last month
- [CVPR 2025 Highlight] X-Dyna: Expressive Dynamic Human Image Animation ★251 · Updated 4 months ago
- MagicTryOn is a video virtual try-on framework based on a large-scale video diffusion Transformer. ★309 · Updated this week
- Pusa: Thousands Timesteps Video Diffusion Model ★199 · Updated last week
- Official implementation of the paper "MusicInfuser: Making Video Diffusion Listen and Dance" ★73 · Updated 2 months ago
- ★180 · Updated 3 weeks ago
- 🔥🔥 First-ever hour scale video understanding models ★450 · Updated 3 weeks ago
- KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution ★321 · Updated last week
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions. ★137 · Updated 4 months ago
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation ★239 · Updated 3 months ago
- ★219 · Updated last month
- Multimodal Models in Real World ★517 · Updated 4 months ago
- Official implementation of MAGREF: Masked Guidance for Any-Reference Video Generation ★178 · Updated this week