NiceRingNode / Awesome-Image-Generators-for-OCR-Image-Generation-and-Editing

Evaluating SOTA image generators' generation and editing abilities in OCR tasks.

☆188

Alternatives and similar repositories for Awesome-Image-Generators-for-OCR-Image-Generation-and-Editing

Users who are interested in Awesome-Image-Generators-for-OCR-Image-Generation-and-Editing are comparing it to the repositories listed below.

NiceRingNode / LGGPT
[IJCV 2025] Smaller But Better: Unifying Layout Generation with Smaller Large Language Models
☆138 Updated 3 weeks ago
SCUT-DLVCLab / PAVENet
[IEEE TPAMI 2025] Official repository of "Privacy-Preserving Biometric Verification With Handwritten Random Digit String".
☆59 Updated last month
SCUT-DLVCLab / DOLPHIN
[IEEE TIFS 2024] Official repository of "Online Writer Retrieval with Chinese Handwritten Phrases: A Synergistic Temporal-Frequency Repre…
☆52 Updated last month
bzluan / TextCoT
The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.
☆40 Updated 9 months ago
Fantasyele / LLaVA-KD
[ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
☆87 Updated last week
minglllli / CLS-RL
Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning
☆50 Updated last month
SCUT-DLVCLab / MegaHan97K
[PR 2025] The official GitHub page of "MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Ca…
☆59 Updated this week
Wang-Xiaodong1899 / CVPR25-MLLM-Paper-List
🔥CVPR 2025 Multimodal Large Language Models Paper List
☆147 Updated 4 months ago
mrwu-mac / ControlMLLM
[NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"
☆184 Updated this week
SxJyJay / UniToken
[CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inpu…
☆86 Updated 2 months ago
MME-Benchmarks / MME-RealWorld
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
☆128 Updated 4 months ago
PhoenixZ810 / RISEBench
Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
☆69 Updated this week
foundation-multimodal-models / CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
☆58 Updated 9 months ago
saccharomycetes / mllms_know
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
☆224 Updated 2 months ago
PKU-ICST-MIPL / Finedefics_ICLR2025
☆66 Updated 2 months ago
Kwai-YuanQi / MM-RLHF
The Next Step Forward in Multimodal LLM Alignment
☆169 Updated 2 months ago
AFeng-x / Draw-and-Understand
[ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
☆82 Updated last month
ding523 / Curr_REFT
☆62 Updated last month
BAAI-DCAI / MMVU
☆48 Updated 3 months ago
MME-Benchmarks / MME-CoT
MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency
☆116 Updated 2 weeks ago
opendatalab / LEGION
The official implementation of the paper "LEGION: Learning to Ground and Explain for Synthetic Image Detection"
☆42 Updated last month
Hoar012 / RAP-MLLM
[CVPR 2025] RAP: Retrieval-Augmented Personalization
☆64 Updated 3 weeks ago
ywh187 / FitPrune
☆53 Updated 2 months ago
OpenGVLab / Mono-InternVL
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
☆48 Updated 3 months ago
yu-rp / VisualPerceptionToken
☆88 Updated 3 months ago
Liuziyu77 / RAR
The official implementation of RAR
☆88 Updated last year
MMStar-Benchmark / MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
☆185 Updated 9 months ago
RupertLuo / VoCoT
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
☆68 Updated last year
HKUST-LongGroup / CoMM
Official repository for CoMM Dataset
☆43 Updated 6 months ago
Osilly / TokenExpansion
[CVPR 2024] The official pytorch implementation of "A General and Efficient Training for Transformer via Token Expansion".
☆44 Updated last year