VisuLogic-Benchmark / VisuLogic-Train
☆12 · Updated this week
Alternatives and similar repositories for VisuLogic-Train:
Users who are interested in VisuLogic-Train are comparing it to the libraries listed below.
- ☆77 · Updated 3 weeks ago
- ☆40 · Updated 3 months ago
- VisRL: Intention-Driven Visual Perception via Reinforced Reasoning ☆26 · Updated last month
- Official implementation of MC-LLaVA. ☆25 · Updated 2 months ago
- ☆41 · Updated 5 months ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models ☆28 · Updated 5 months ago
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model ☆27 · Updated 3 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆74 · Updated this week
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆31 · Updated 2 months ago
- ☆10 · Updated 3 months ago
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆20 · Updated 8 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆39 · Updated 2 months ago
- ☆35 · Updated 9 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆46 · Updated this week
- A Massive Multi-Discipline Lecture Understanding Benchmark ☆14 · Updated last week
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆28 · Updated 6 months ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆41 · Updated 3 months ago
- Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space' ☆13 · Updated 9 months ago
- CLIP-MoE: Mixture of Experts for CLIP ☆31 · Updated 6 months ago
- ☆38 · Updated 3 weeks ago
- ☆25 · Updated 5 months ago
- ☆33 · Updated 2 months ago
- ☆22 · Updated 2 months ago
- ☆19 · Updated 5 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆28 · Updated last month
- ☆73 · Updated 3 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ☆17 · Updated 6 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆71 · Updated 10 months ago
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆33 · Updated 2 weeks ago
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation ☆33 · Updated 7 months ago