si0wang / ViCrit
☆18Updated 3 weeks ago
Alternatives and similar repositories for ViCrit
Users that are interested in ViCrit are comparing it to the libraries listed below
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆19Updated 8 months ago
- ☆42Updated 8 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆64Updated last month
- ☆45Updated 6 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆45Updated 6 months ago
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks☆14Updated last month
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension.☆68Updated last year
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…☆55Updated 8 months ago
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…☆32Updated last month
- [ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding☆45Updated last month
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning☆24Updated this week
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"☆41Updated 7 months ago
- ☆50Updated 5 months ago
- Multimodal RewardBench☆42Updated 4 months ago
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.☆33Updated last year
- Fast-Slow Thinking for Large Vision-Language Model Reasoning☆16Updated 2 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆76Updated last year
- ☆83Updated 6 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆37Updated last year
- ☆38Updated last year
- (NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights☆27Updated 8 months ago
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains☆46Updated 2 months ago
- An instruction data generation system for multimodal language models.☆33Updated 5 months ago
- ☆12Updated 6 months ago
- [ACL2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models☆77Updated last month
- The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".☆33Updated last year
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)☆45Updated 11 months ago
- Codes for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding☆35Updated this week
- Code for our Paper "All in an Aggregated Image for In-Image Learning"☆30Updated last year
- [ICCV 2025] Dynamic-VLM☆21Updated 6 months ago