Mengzibin / SocialGPT
☆21 Updated 7 months ago
Alternatives and similar repositories for SocialGPT
Users who are interested in SocialGPT are comparing it to the repositories listed below
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆122 Updated 5 months ago
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆82 Updated last month
- Accepted by CVPR 2024 ☆33 Updated last year
- A collection of vision foundation models unifying understanding and generation. ☆55 Updated 5 months ago
- ☆48 Updated 2 months ago
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding ☆41 Updated this week
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. ☆22 Updated 4 months ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆59 Updated 2 months ago
- [NeurIPS'24 spotlight] MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning ☆28 Updated 2 months ago
- Empowering Unified MLLM with Multi-granular Visual Generation ☆124 Updated 4 months ago
- Official code for MotionBench (CVPR 2025) ☆40 Updated 3 months ago
- [CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga ☆80 Updated 2 months ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆44 Updated 3 months ago
- A paper list for spatial reasoning ☆82 Updated this week
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆44 Updated 3 weeks ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆85 Updated 9 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆128 Updated last month
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ☆15 Updated 2 months ago
- TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆42 Updated 2 weeks ago
- GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography ☆62 Updated this week
- A Massive Multi-Discipline Lecture Understanding Benchmark ☆19 Updated 3 weeks ago
- Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆117 Updated 2 weeks ago
- ☆84 Updated 2 months ago
- [ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs ☆54 Updated 3 months ago
- Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models ☆111 Updated 2 months ago
- ☆32 Updated 2 months ago
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics ☆106 Updated last month
- [CVPR 2025 (Oral)] Open implementation of "RandAR" ☆155 Updated 2 months ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ☆53 Updated 2 months ago
- The official implementation of the paper "Exploring the Potential of Encoder-free Architectures in 3D LMMs" ☆52 Updated 2 weeks ago