sieve-community / describe
Incredibly descriptive audiovisual summaries for videos
☆40 Updated 9 months ago
Alternatives and similar repositories for describe
Users who are interested in describe compare it to the libraries listed below
- ☆30 Updated last year
- Gradio app to track objects in video and add visual effects ☆16 Updated 7 months ago
- Fine-tune of Florence-2 for shot categorization. ☆24 Updated 2 months ago
- A multi-modal AI Model that can generate high quality novel videos with text, images, or video clips. ☆65 Updated last year
- ☆46 Updated last year
- ☆12 Updated last year
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models. ☆48 Updated 3 months ago
- Website source code for our ACM MM'23 paper "Hierarchical Masked 3D Diffusion Model for Video Outpainting". ☆42 Updated last year
- Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D… ☆35 Updated 3 months ago
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2 ☆119 Updated 6 months ago
- ☆24 Updated last year
- An open source community implementation of the model from the paper: "Movie Gen: A Cast of Media Foundation Models". Join our community … ☆60 Updated last week
- A minimalistic, hackable code base to finetune Wan video generation model ☆39 Updated 3 weeks ago
- ☆28 Updated last year
- [IJCV'24] AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort ☆150 Updated 5 months ago
- ☆12 Updated 7 months ago
- ☆13 Updated 5 months ago
- ☆76 Updated 7 months ago
- ☆176 Updated 10 months ago
- Implementation of "SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing" ☆86 Updated last year
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ☆130 Updated 5 months ago
- ☆25 Updated last year
- ☆74 Updated 7 months ago
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc. ☆37 Updated 8 months ago
- ☆31 Updated 8 months ago
- ☆29 Updated last year
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis. ☆45 Updated 8 months ago
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions ☆129 Updated last year
- ☆12 Updated last year
- ☆13 Updated last year