kaleido-lab / dolphin
General video interaction platform based on LLMs, including Video ChatGPT
☆252 Updated last year
Alternatives and similar repositories for dolphin:
Users who are interested in dolphin are comparing it to the repositories listed below
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆415 Updated 2 months ago
- BindDiffusion: One Diffusion Model to Bind Them All ☆165 Updated last year
- VideoLLM: Modeling Video Sequence with Large Language Models ☆154 Updated last year
- [ICLR 2024] Code for FreeNoise based on VideoCrafter ☆398 Updated 7 months ago
- ☆164 Updated last year
- Retrieval-Augmented Video Generation for Telling a Story ☆252 Updated last year
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models ☆271 Updated 9 months ago
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions. ☆117 Updated 2 weeks ago
- Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning ☆289 Updated 7 months ago
- [IEEE TVCG 2024] Customized Video Generation Using Textual and Structural Guidance ☆188 Updated 11 months ago
- ☆174 Updated 7 months ago
- Official implementation of SEED-LLaMA (ICLR 2024). ☆597 Updated 4 months ago
- Official implementation of VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning (COLM 2024) ☆170 Updated 6 months ago
- The official repository of "Video assistant towards large language model makes everything easy" ☆218 Updated last month
- [SIGGRAPH Asia 2023] An interactive story visualization tool that supports multiple characters ☆261 Updated 10 months ago
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties ☆118 Updated 3 months ago
- Official code for 'Paragraph-to-Image Generation with Information-Enriched Diffusion Model' ☆102 Updated 2 months ago
- 🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ☆447 Updated last year
- Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models ☆306 Updated last year
- [NeurIPS'23] "MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing". ☆330 Updated 8 months ago
- [IJCV'24] AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort ☆146 Updated 2 months ago
- [CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs ☆138 Updated 6 months ago
- Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models ☆350 Updated last year
- LLaVA-Interactive-Demo ☆362 Updated 6 months ago
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation ☆126 Updated last year
- Long Context Transfer from Language to Vision ☆360 Updated 2 months ago
- The official implementation for "Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising". ☆295 Updated last year
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) ☆284 Updated 3 weeks ago
- Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models" ☆385 Updated 10 months ago
- [TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks. ☆224 Updated last year