yh-hust / VisuRiddlesLinks

VisuRiddles: Fine-grained Perception is a important thing for Multimodal Large Models in Riddles Solving

☆17

Alternatives and similar repositories for VisuRiddles

Users that are interested in VisuRiddles are comparing it to the libraries listed below

Sorting:

yuyq96 / R1-Vision
R1-Vision: Let's first take a look at the image
☆48Updated 9 months ago
jefferyZhan / Griffon
Official repo of Griffon series including v1(ECCV 2024), v2(ICCV 2025), G, and R, and also the RL tool Vision-R1.
☆243Updated 3 months ago
bzluan / TextCoT
The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.
☆43Updated last year
2bgm / KIE-HVQA
☆14Updated 5 months ago
Alpha-Innovator / SimChart9K
The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.
☆26Updated last year
FanqingM / MM-Eureka-V0
MM-Eureka V0 also called R1-Multimodal-Journey, Latest version is in MM-Eureka
☆320Updated 5 months ago
opendatalab / VIGC
AAAI 2024: Visual Instruction Generation and Correction
☆93Updated last year
BAAI-DCAI / Visual-Instruction-Tuning
SVIT: Scaling up Visual Instruction Tuning
☆163Updated last year
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆276Updated last year
MengLcool / SliMM
☆21Updated 10 months ago
SY-Xuan / Pink
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
☆95Updated 10 months ago
NiceRingNode / Awesome-Generative-Models-for-OCR
[arXiv 25] Aesthetics is Cheap, Show me the Text: An Empirical Evaluation of State-of-the-Art Generative Models for OCR
☆237Updated 2 months ago
ligeng0197 / Awesome-Thinking-With-Images
Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grain…
☆97Updated 3 months ago
JierunChen / Ref-L4
Evaluation code for Ref-L4, a new REC benchmark in the LMM era
☆51Updated 10 months ago
zezeze97 / DFE-GPS
☆13Updated 4 months ago
Code-kunkun / LamRA
[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
☆172Updated 4 months ago
AFeng-x / Draw-and-Understand
[ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
☆91Updated 5 months ago
JiuTian-VL / JiuTian-LION
[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
☆152Updated 2 months ago
DrLuo / SemiETS
【CVPR 2025】SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting
☆14Updated 4 months ago
RupertLuo / VoCoT
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
☆75Updated last year
zhourax / VEGA
☆37Updated last year
MMStar-Benchmark / MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
☆199Updated last year
Liuziyu77 / RAR
The official implementation of RAR
☆92Updated last year
OpenGVLab / OmniCorpus
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
☆402Updated 6 months ago
open-compass / MMBench
Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"
☆266Updated 5 months ago
guoxy25 / Ocean-OCR
☆43Updated 9 months ago
ParadoxZW / LLaVA-UHD-Better
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
☆34Updated last year
alibaba / conv-llava
☆123Updated last year
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆180Updated last year
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆136Updated 6 months ago