haoningwu3639 / SimpleSDM-Video
A simple and flexible PyTorch implementation of Video StableDiffusion (ZeroScope_v2) based on diffusers.
☆16 Updated last year
Alternatives and similar repositories for SimpleSDM-Video:
Users interested in SimpleSDM-Video are comparing it to the libraries listed below.
- A simple and flexible PyTorch implementation of StableDiffusion based on diffusers. ☆22 Updated 4 months ago
- A simple and flexible PyTorch implementation of StableDiffusion-3 based on diffusers for DIY and finetuning. ☆17 Updated last month
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆62 Updated 6 months ago
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos ☆21 Updated 5 months ago
- Official code for the CVPR 2024 paper "Audio-Visual Segmentation via Unlabeled Frame Exploitation" ☆11 Updated 7 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training" ☆32 Updated 11 months ago
- MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities ☆16 Updated last month
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024. ☆56 Updated 5 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆67 Updated 3 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆105 Updated last month
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆49 Updated 3 months ago
- Official implementation of TagAlign ☆34 Updated 2 months ago
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression ☆51 Updated this week
- The official repository for the paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models". ☆29 Updated this week
- A Large Multimodal Model for Pixel-Level Visual Grounding in Videos ☆41 Updated 2 months ago
- A simple and flexible PyTorch implementation of StableDiffusion-XL based on diffusers. ☆14 Updated 5 months ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆45 Updated 2 months ago
- VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆51 Updated last month
- Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner" ☆15 Updated last year
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆22 Updated 4 months ago
- Turning to Video for Transcript Sorting ☆48 Updated last year
- Official code for the CVPR 2024 paper "Discriminative Probing and Tuning for Text-to-Image Generation" ☆28 Updated 2 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆35 Updated 8 months ago
- The repository contains the official implementation of "Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation" ☆30 Updated 2 months ago