The official GitHub page for ''What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning''
☆19Nov 10, 2023Updated 2 years ago
Alternatives and similar repositories for ComVint
Users that are interested in ComVint are comparing it to the libraries listed below
- ☆101Dec 22, 2023Updated 2 years ago
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models☆37Sep 19, 2023Updated 2 years ago
- ☆92Nov 25, 2023Updated 2 years ago
- [MM 2023] Toward High Quality Facial Representation Learning☆19Oct 30, 2023Updated 2 years ago
- 🏠🔍 Auto check for new apartments in Hamburg from various real estate providers☆16Jun 2, 2024Updated last year
- M-HalDetect Dataset Release☆27Nov 4, 2023Updated 2 years ago
- Open Source + Multilingual MLLM + Fine-tuning + Distillation + More efficient models and learning + ?☆18Jan 31, 2025Updated last year
- ☆21Oct 10, 2023Updated 2 years ago
- The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''☆248Aug 21, 2025Updated 6 months ago
- ☆134Dec 22, 2023Updated 2 years ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)☆52Jul 16, 2024Updated last year
- [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning☆296Mar 13, 2024Updated last year
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆58Jun 27, 2023Updated 2 years ago
- ☆25May 13, 2024Updated last year
- Counterfactual Reasoning VQA Dataset☆28Nov 23, 2023Updated 2 years ago
- Code and Dataset for the CVPRW Paper "Where did I leave my keys? — Episodic-Memory-Based Question Answering on Egocentric Videos"☆29Aug 28, 2023Updated 2 years ago
- Exploration of Adept's multimodal fuyu-8b model. 🤓 🔍☆27Nov 7, 2023Updated 2 years ago
- ☆75Mar 7, 2024Updated last year
- Touchstone: Evaluating Vision-Language Models by Language Models☆83Jan 18, 2024Updated 2 years ago
- ☆10Feb 10, 2026Updated 3 weeks ago
- Multi-caption Text-to-Face Synthesis: Database and Algorithm☆32Mar 17, 2022Updated 3 years ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆45Jun 14, 2024Updated last year
- ☆10Feb 25, 2026Updated last week
- ☆12Jun 26, 2024Updated last year
- P1AC: Revisiting Absolute Pose From a Single Affine Correspondence☆11Mar 19, 2024Updated last year
- ☆10Sep 5, 2024Updated last year
- Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space"☆25Jul 21, 2025Updated 7 months ago
- Data Programming for Text Detection in Documents using SPEAR☆12Mar 26, 2025Updated 11 months ago
- ☆10Nov 15, 2023Updated 2 years ago
- Tools for registering images with Dicom Registration files☆12Mar 20, 2024Updated last year
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.☆360Jan 14, 2025Updated last year
- ☆88Jul 4, 2024Updated last year
- Official repo for StableLLAVA☆95Dec 22, 2023Updated 2 years ago
- [EMNLP'23] The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''☆107Aug 21, 2025Updated 6 months ago
- paper: https://arxiv.org/abs/2307.02469 page: https://lynx-llm.github.io/☆270Aug 9, 2023Updated 2 years ago
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"☆38Oct 9, 2025Updated 4 months ago
- LONGAGENT: Scaling Language Models to 128k Context through Multi-Agent Collaboration☆11Mar 11, 2024Updated last year
- ☆10Oct 4, 2023Updated 2 years ago
- [ICCV2023] DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration☆12Oct 12, 2023Updated 2 years ago