MajorDavidZhang / Generalization_unified_VLM
☆17 Updated 2 months ago
Alternatives and similar repositories for Generalization_unified_VLM
Users who are interested in Generalization_unified_VLM are comparing it to the repositories listed below.
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025] ☆19 Updated 5 months ago
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning ☆35 Updated 3 weeks ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆45 Updated last month
- T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation (ICCV'25) ☆23 Updated last week
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆38 Updated last year
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆38 Updated 5 months ago
- Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology" ☆49 Updated 3 weeks ago
- PyTorch implementation of "Sample- and Parameter-Efficient Auto-Regressive Image Models" from CVPR 2025 ☆12 Updated 4 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆63 Updated 3 weeks ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆44 Updated 2 weeks ago
- ☆51 Updated 3 weeks ago
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆28 Updated last year
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward ☆72 Updated this week
- A curated list of papers and resources for text-to-image evaluation. ☆30 Updated last year
- ☆19 Updated last year
- On Path to Multimodal Generalist: General-Level and General-Bench ☆19 Updated 3 weeks ago
- ☆62 Updated this week
- ☆101 Updated last month
- ☆37 Updated 2 months ago
- [ICCV 2025] Dynamic-VLM ☆23 Updated 7 months ago
- [CVPR 2025 AI4CC Workshop] Official Implementation of HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editin… ☆33 Updated 3 months ago
- [Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models ☆39 Updated 4 months ago
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows ☆15 Updated last month
- Test-time Scaling for VAR models ☆21 Updated last week
- WeGeFT: Weight‑Generative Fine‑Tuning for Multi‑Faceted Efficient Adaptation of Large Models ☆21 Updated 3 weeks ago
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better ☆36 Updated last month
- Video Diffusion State Space Models ☆19 Updated last year
- HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆63 Updated 5 months ago
- ☆43 Updated 9 months ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆31 Updated 8 months ago