Mengzibin / SocialGPT
☆21 Updated 3 months ago
Alternatives and similar repositories for SocialGPT:
Users who are interested in SocialGPT are comparing it to the repositories listed below
- Accepted by CVPR 2024 ☆31 Updated 9 months ago
- Official Repository of Multi-Object Hallucination in Vision-Language Models (NeurIPS 2024) ☆27 Updated 3 months ago
- [Preprint] Number it: Temporal Grounding Videos like Flipping Manga ☆55 Updated 2 months ago
- A collection of vision foundation models unifying understanding and generation. ☆40 Updated last month
- Empowering Unified MLLM with Multi-granular Visual Generation ☆117 Updated last month
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆105 Updated last month
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆81 Updated 5 months ago
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" ☆140 Updated 5 months ago
- [NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding ☆63 Updated last month
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆37 Updated 2 weeks ago
- OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆30 Updated last month
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆144 Updated last month
- Code and data for the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation" ☆85 Updated 3 months ago
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models" ☆117 Updated 2 months ago
- 🌀 R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024) ☆74 Updated 7 months ago
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024) ☆70 Updated 4 months ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆27 Updated 3 months ago
- [ICLR 2024] Seer: Language Instructed Video Prediction with Latent Diffusion Models ☆24 Updated 9 months ago
- ☆47 Updated this week
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆36 Updated 2 months ago
- Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models ☆83 Updated 2 months ago
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics ☆78 Updated 2 weeks ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆101 Updated 2 weeks ago
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models ☆64 Updated 8 months ago
- This repository collects papers on VLLM applications. We will update new papers irregularly. ☆46 Updated 3 weeks ago
- ✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆93 Updated last week
- Language Repository for Long Video Understanding ☆31 Updated 8 months ago
- 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆256 Updated this week