kaleido-lab / dolphin

General video interaction platform based on LLMs, including Video ChatGPT

☆252

Alternatives and similar repositories for dolphin

Users who are interested in dolphin compare it to the repositories listed below.

sail-sg / BindDiffusion
BindDiffusion: One Diffusion Model to Bind Them All
☆164 Updated 2 years ago
HL-hanlin / VideoDirectorGPT
official implementation of VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning (COLM 2024)
☆173 Updated last year
Zeqiang-Lai / Anything2Image
Generate image from anything with ImageBind and Stable Diffusion
☆196 Updated 2 years ago
JourneyDB / JourneyDB
☆174 Updated 2 years ago
Vision-CAIR / ChatCaptioner
Official Repository of ChatCaptioner
☆464 Updated 2 years ago
SHI-Labs / VCoder
[CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models
☆278 Updated last year
showlab / ShowAnything
☆82 Updated 2 years ago
AILab-CVC / Animate-A-Story
Retrieval-Augmented Video Generation for Telling a Story
☆258 Updated last year
cg1177 / VideoLLM
VideoLLM: Modeling Video Sequence with Large Language Models
☆158 Updated last year
EvolvingLMMs-Lab / RelateAnything
Relate Anything Model is capable of taking an image as input and utilizing SAM to identify the corresponding mask within the image.
☆456 Updated 2 years ago
mshukor / UnIVAL
[TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks.
☆228 Updated last year
AILab-CVC / TaleCrafter
[SIGGRAPH Asia 2023] An interactive story visualization tool that supports multiple characters
☆261 Updated last year
RunpeiDong / DreamLLM
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
☆453 Updated 8 months ago
OpenGVLab / ControlLLM
ControlLLM: Augment Language Models with Tools by Searching on Graphs
☆193 Updated last year
Zeqiang-Lai / Mini-DALLE3
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
☆313 Updated last year
baaivision / vid2vid-zero
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
☆355 Updated 2 years ago
AILab-CVC / SEED
Official implementation of SEED-LLaMA (ICLR 2024).
☆619 Updated 10 months ago
maitrix-org / Pandora
Pandora: Towards General World Model with Natural Language Actions and Video States
☆510 Updated 10 months ago
JiauZhang / DragDiffusion
Implementation of DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
☆227 Updated 2 years ago
yukw777 / VideoBLIP
Supercharged BLIP-2 that can handle videos
☆120 Updated last year
bytedance / Shot2Story
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
☆148 Updated 6 months ago
icoz69 / StableLLAVA
Official repo for StableLLAVA
☆95 Updated last year
invictus717 / InteractiveVideo
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions
☆128 Updated last year
G-U-N / Gen-L-Video
The official implementation for "Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising".
☆300 Updated last year
showlab / assistgpt
☆66 Updated 2 years ago
AILab-CVC / Make-Your-Video
[IEEE TVCG 2024] Customized Video Generation Using Textual and Structural Guidance
☆193 Updated last year
md-mohaiminul / VideoRecap
☆187 Updated last year
yukw777 / EILEV
EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties
☆128 Updated 9 months ago
aim-uofa / AutoStory
[IJCV'24] AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort
☆152 Updated 8 months ago
OpenGVLab / Awesome-DragGAN
Awesome-DragGAN: A curated list of papers, tutorials, repositories related to DragGAN
☆85 Updated last year