rohan598 / ConTextual
☆23Updated 4 months ago
Related projects ⓘ
Alternatives and complementary repositories for ConTextual
- VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)☆22Updated 4 months ago
- ☆45Updated last year
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension.☆59Updated 5 months ago
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…☆55Updated last month
- ☆35Updated 3 months ago
- ☆30Updated this week
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models☆22Updated last week
- [EMNLP'23] The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''☆73Updated 7 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆56Updated last year
- ☆58Updated 9 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback☆52Updated 2 months ago
- ☆21Updated 3 months ago
- An benchmark for evaluating the capabilities of large vision-language models (LVLMs)☆33Updated last year
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision☆24Updated last month
- This is the official repo for the incoming work: ByteVideoLLM☆14Updated 2 weeks ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)☆41Updated 4 months ago
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction".☆44Updated 3 weeks ago
- ☆84Updated 10 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models☆26Updated 4 months ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model☆25Updated last week
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆43Updated 5 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆45Updated 5 months ago
- The official GitHub page for ''What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins…☆18Updated last year
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning☆73Updated 6 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆17Updated last month
- Touchstone: Evaluating Vision-Language Models by Language Models☆78Updated 10 months ago
- MATH-Vision dataset and code to measure Multimodal Mathematical Reasoning capabilities.☆69Updated last month
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization☆66Updated 9 months ago
- ☆36Updated last month
- Official Code of IdealGPT☆32Updated last year