lzhxmu / VTW
★23 Updated 3 months ago
Related projects
Alternatives and complementary repositories for VTW
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Infe… ★75 Updated 2 weeks ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ★100 Updated 6 months ago
- [NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ★98 Updated last week
- A Survey on Benchmarks of Multimodal Large Language Models ★65 Updated last month
- Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" proposed by Pekin… ★55 Updated last month
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction". ★45 Updated 3 weeks ago
- Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input ★54 Updated 2 months ago
- ★23 Updated 6 months ago
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ★66 Updated 9 months ago
- Official repository of MMDU dataset ★75 Updated last month
- A paper list of some recent works about Token Compress for Vit and VLM ★149 Updated this week
- This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strat… ★72 Updated 7 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ★51 Updated last month
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models" ★148 Updated last month
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs ★71 Updated 2 weeks ago
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" ★73 Updated 7 months ago
- Official implementation of paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal … ★27 Updated this week
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ★45 Updated 5 months ago
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501 ★41 Updated 3 months ago
- A collection of visual instruction tuning datasets. ★75 Updated 8 months ago
- A RLHF Infrastructure for Vision-Language Models ★106 Updated last week
- Official implementation of MIA-DPO ★41 Updated 2 weeks ago
- [NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. ★66 Updated 3 months ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models. ★33 Updated last week
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024) ★41 Updated 4 months ago
- ✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ★78 Updated last week
- Making LLaVA Tiny via MoE-Knowledge Distillation ★63 Updated last month
- ★105 Updated 3 months ago
- ★77 Updated 4 months ago
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo… ★55 Updated last month