showlab / VisInContext
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
☆14 Updated 2 months ago
Alternatives and similar repositories for VisInContext:
Users who are interested in VisInContext are comparing it to the repositories listed below.
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆43 Updated 2 weeks ago
- ☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆30 Updated this week
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆58 Updated 6 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆56 Updated 4 months ago
- [NeurIPS2024] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆26 Updated last month
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆78 Updated 8 months ago
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding ☆49 Updated last year
- Language Repository for Long Video Understanding ☆31 Updated 7 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆62 Updated 7 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆60 Updated 7 months ago
- ☆49 Updated last week
- ☆72 Updated 8 months ago
- Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆23 Updated 3 months ago
- ☆91 Updated 8 months ago
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" ☆77 Updated 9 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆33 Updated 2 months ago
- VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) ☆25 Updated 6 months ago
- Official Repository of Multi-Object Hallucination in Vision-Language Models (NeurIPS 2024) ☆26 Updated 2 months ago
- Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" ☆51 Updated 3 weeks ago
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆58 Updated 4 months ago
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators" ☆104 Updated last year
- ☆55 Updated 8 months ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ☆18 Updated last week
- ☆37 Updated 2 months ago
- Official repo for StableLLAVA ☆94 Updated last year
- ☆63 Updated last month
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆60 Updated 3 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆112 Updated 6 months ago
- Official code repository for Interleaved Scene Graph. ☆13 Updated last month
- ☆42 Updated 5 months ago