RUCAIBox / Event-Bench
Official code of *Towards Event-oriented Long Video Understanding*
☆12Updated 6 months ago
Alternatives and similar repositories for Event-Bench:
Users that are interested in Event-Bench are comparing it to the libraries listed below
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆57Updated last year
- ☆26Updated 6 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆33Updated 3 months ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models☆27Updated 2 months ago
- ☆28Updated 2 months ago
- VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)☆25Updated 7 months ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)☆43Updated 6 months ago
- The official GitHub page for ''What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins…☆18Updated last year
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)☆31Updated last year
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models☆37Updated last year
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences☆35Updated last week
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆27Updated last month
- This repo contains code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation"☆11Updated 3 weeks ago
- ☆17Updated 6 months ago
- VisualGPTScore for visio-linguistic reasoning☆26Updated last year
- [NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs☆16Updated 3 months ago
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wildrounded Conversation☆47Updated last year
- 【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"☆17Updated 4 months ago
- [EMNLP'22] Weakly-Supervised Temporal Article Grounding☆14Updated last year
- EMNLP2023 - InfoSeek: A New VQA Benchmark focus on Visual Info-Seeking Questions☆17Updated 8 months ago
- Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" Neurips 23 Spotlight☆37Updated last year
- The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".☆32Updated last year
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"☆23Updated 4 months ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆43Updated 7 months ago
- [ECCV'22 Poster] Explicit Image Caption Editing☆21Updated 2 years ago
- ☆95Updated last year
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆53Updated 5 months ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆24Updated 2 months ago
- ☆17Updated 11 months ago
- CVPR 2022 (Oral) Pytorch Code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment☆22Updated 2 years ago