VisuLogic-Benchmark / VisuLogic-Train
☆18 Updated last month
Alternatives and similar repositories for VisuLogic-Train
Users who are interested in VisuLogic-Train compare it to the repositories listed below.
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ☆14 Updated last month
- ☆81 Updated 2 months ago
- ☆22 Updated last month
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning ☆44 Updated 2 weeks ago
- ☆16 Updated 4 months ago
- ☆42 Updated 6 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆40 Updated 3 months ago
- ☆43 Updated 5 months ago
- ☆12 Updated 3 weeks ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆25 Updated 5 months ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models ☆29 Updated 6 months ago
- ☆30 Updated 10 months ago
- Multimodal RewardBench ☆39 Updated 3 months ago
- Assessing Context-Aware Creative Intelligence in MLLMs ☆19 Updated 2 months ago
- Official implementation of MC-LLaVA. ☆27 Updated 4 months ago
- ☆37 Updated 10 months ago
- ☆26 Updated 6 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆49 Updated this week
- ☆46 Updated last month
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆74 Updated 11 months ago
- ☆77 Updated 4 months ago
- Official implementation of MIA-DPO ☆58 Updated 4 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆31 Updated 3 months ago
- ☆36 Updated 2 weeks ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆47 Updated 2 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆37 Updated 5 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆103 Updated 2 weeks ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆47 Updated 5 months ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning ☆29 Updated last month
- ☆32 Updated 3 weeks ago