Vision-CAIR / InfiniBench
☆12 Updated last month
Related projects
Alternatives and complementary repositories for InfiniBench
- ☆21 Updated 3 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆24 Updated 2 weeks ago
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences ☆30 Updated last month
- 👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024) ☆34 Updated 2 weeks ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆32 Updated 5 months ago
- Video Diffusion State Space Models ☆19 Updated 7 months ago
- Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆20 Updated 2 months ago
- ☆38 Updated last month
- This is the official repo for the incoming work: ByteVideoLLM ☆15 Updated 3 weeks ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆49 Updated 5 months ago
- [NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models. ☆40 Updated last month
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆26 Updated 5 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆55 Updated 3 weeks ago
- VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) ☆22 Updated 4 months ago
- Official implementation of MIA-DPO ☆41 Updated 3 weeks ago
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆55 Updated 2 months ago
- ☆30 Updated 2 months ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆47 Updated last week
- [NeurIPS 2024] Efficient Multi-modal Models via Stage-wise Visual Context Compression ☆41 Updated 3 months ago
- ☆17 Updated 5 months ago
- ☆19 Updated 11 months ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆29 Updated last week
- Turning to Video for Transcript Sorting ☆46 Updated last year
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning ☆17 Updated 2 months ago
- ☆30 Updated this week
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆39 Updated 3 months ago
- SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image and Video Generation (arXiv: 2410.12761) ☆19 Updated last month
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images. ☆22 Updated 6 months ago