VIStA-H / GPT-4V_Social_Media
GPT-4V(ision) as A Social Media Analysis Engine
☆30 · Updated last year
Related projects
Alternatives and complementary repositories for GPT-4V_Social_Media
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos ☆34 · Updated 6 months ago
- This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension" ☆16 · Updated this week
- LLMBind: A Unified Modality-Task Integration Framework ☆17 · Updated 5 months ago
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" ☆73 · Updated 8 months ago
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models ☆50 · Updated 5 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆30 · Updated last month
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?" ☆56 · Updated last year
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024. ☆52 · Updated 2 months ago
- [NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. ☆66 · Updated 3 months ago
- Empowering Unified MLLM with Multi-granular Visual Generation ☆106 · Updated last month
- ☆72 · Updated 6 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training" ☆28 · Updated 8 months ago
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆47 · Updated 3 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆89 · Updated last week
- ☆38 · Updated last month
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆78 · Updated 8 months ago
- This is a repo to track the latest autoregressive visual generation papers. ☆50 · Updated this week
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024) ☆52 · Updated last month
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆52 · Updated 2 months ago
- Official implementation of MIA-DPO ☆41 · Updated 3 weeks ago
- 🔥 Stable, simple, state-of-the-art VQVAE toolkit & cookbook ☆42 · Updated 5 months ago
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability ☆38 · Updated 2 weeks ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆45 · Updated 5 months ago
- A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models! ☆118 · Updated 10 months ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment ☆19 · Updated last month
- ☆54 · Updated 4 months ago
- VisualGPTScore for visio-linguistic reasoning ☆26 · Updated last year
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆52 · Updated this week
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆75 · Updated 2 months ago