hanghuacs / MMComposition
★14 Updated 2 months ago
Alternatives and similar repositories for MMComposition:
Users who are interested in MMComposition are comparing it to the repositories listed below.
- See How Top MLLMs Understand Video Compositions. ★18 Updated 2 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ★105 Updated last month
- [NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation? ★37 Updated 8 months ago
- FQGAN: Factorized Visual Tokenization and Generation ★42 Updated last month
- ★86 Updated last month
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models ★64 Updated 8 months ago
- Official repository of NeurIPS D&B Track 2024 paper "VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understan… ★32 Updated last month
- official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024] ★73 Updated last week
- [Preprint] Number it: Temporal Grounding Videos like Flipping Manga ★55 Updated 2 months ago
- A collection of vision foundation models unifying understanding and generation. ★40 Updated last month
- Official Pytorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral). ★50 Updated last week
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering" ★92 Updated 3 months ago
- Official Implementation of VideoDPO ★49 Updated last month
- 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ★256 Updated this week
- Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024 ★50 Updated 4 months ago
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ★45 Updated 4 months ago
- Code for ROICtrl: Boosting Instance Control for Visual Generation ★101 Updated 2 months ago
- [CVPR 2024] On the Content Bias in Fréchet Video Distance ★103 Updated 4 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ★59 Updated 3 months ago
- ★23 Updated 2 months ago
- [ECCV 2024 Oral] ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction ★57 Updated 6 months ago
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos ★21 Updated 5 months ago
- Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation". ★17 Updated last week
- ★56 Updated 9 months ago
- ★23 Updated 4 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ★101 Updated 2 weeks ago
- Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023] ★94 Updated 7 months ago
- Official Implementation (Pytorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Capti… ★15 Updated 3 weeks ago