yuyq96/TextHawk

Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

☆68

Alternatives and similar repositories for TextHawk

Users who are interested in TextHawk are comparing it to the repositories listed below.

TenMilesLotus / DTSM
Code and data for the paper: DTSM: Toward Dense Table Structure Recognition with Text Query Encoder and Adjacent Feature Aggregator
☆13 · Apr 28, 2024 · Updated 2 years ago
lcy0604 / QT-TextSR
This repository is the implementation of "QT-TextSR: Enhancing scene text image super-resolution via efficient interaction with text reco…
☆20 · Jul 9, 2025 · Updated last year
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆183 · Oct 14, 2024 · Updated last year
nota-github / ERGO
ERGO (Efficient Reasoning & Guided Observation) is a large vision-language model trained with reinforcement learning on efficiency object…
☆19 · Feb 25, 2026 · Updated 4 months ago
shoaibahmed / llm_depth_pruning
Official implementation of the paper: "A deeper look at depth pruning of LLMs"
☆15 · Jul 24, 2024 · Updated 2 years ago
LukeForeverYoung / UReader
☆142 · Feb 13, 2024 · Updated 2 years ago
harrytea / Awesome-Document-Understanding
Document Artificial Intelligence
☆201 · Sep 28, 2025 · Updated 9 months ago
gautam-aayush / form-data-augmentation
Repository for augmenting data in forms, invoices and receipts for document image understanding
☆17 · May 6, 2021 · Updated 5 years ago
cofe-ai / Mu-scaling
Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales
☆32 · Jul 17, 2023 · Updated 3 years ago
mu-cai / matryoshka-mm
Matryoshka Multimodal Models
☆123 · Jan 22, 2025 · Updated last year
Mungeryang / colqwen3
The code used to train and run inference with the ColQwen3 model. Welcome to follow and star! ⭐️⭐️⭐️ https://huggingface.co/goodman2001/…
☆15 · Jul 4, 2026 · Updated 2 weeks ago
NathanGodey / qfilters
Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)
☆34 · Mar 7, 2025 · Updated last year
193746 / VHASR
☆11 · Oct 31, 2024 · Updated last year
HZQ950419 / Math-LLaVA
Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
☆91 · Jun 28, 2024 · Updated 2 years ago
bhanML / coteaching_plus
ICML'19: How does Disagreement Help Generalization against Label Corruption?
☆22 · Jun 30, 2019 · Updated 7 years ago
ZZZHANG-jx / WMeter-Reader
[TIM 2025] Towards Accurate Readings of Water Meters by Eliminating Transition Error: New Dataset and Effective Solution
☆19 · Mar 5, 2025 · Updated last year
RylonW / DocNLC
Official code for DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degra…
☆44 · Mar 20, 2026 · Updated 4 months ago
Yuliang-Liu / Monkey
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)
☆1,949 · Jun 2, 2026 · Updated last month
locuslab / llava-token-compression
☆47 · Nov 8, 2024 · Updated last year
DocTron-hub / Chart-R1
Chart-R1: Chain-of-Thought Supervision and Reinforcement for Advanced Chart Reasoner
☆24 · Aug 7, 2025 · Updated 11 months ago
bytedance / WildDoc
The official repo for “WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?”
☆74 · May 19, 2025 · Updated last year
nttmdlab-nlp / InstructDoc
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)
☆162 · May 31, 2024 · Updated 2 years ago
BunnySoCrazy / LA-DocFlatten
Code and Dataset for our paper: Layout-Aware Single-Image Document Flattening
☆24 · Dec 16, 2024 · Updated last year
SWHL / TrOCR-Formula-Rec
Trains a small but elegant formula recognition model based on TrOCR and the UniMER-1M dataset
☆30 · Mar 17, 2026 · Updated 4 months ago
kiaia / GIRAFFE
Extending context length of visual language models
☆12 · Dec 18, 2024 · Updated last year
SxJyJay / MORE
[ECCV 2022] MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes official implementation
☆16 · Feb 2, 2023 · Updated 3 years ago
SCUT-DLVCLab / OCR-Reasoning
[ICLR 2026] OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning
☆76 · May 26, 2026 · Updated last month
ZeningLin / PEneo
[MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.
☆41 · Apr 7, 2025 · Updated last year
thunlp / LLaVA-UHD
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
☆423 · Jul 6, 2026 · Updated 2 weeks ago
XMUDeepLIT / AVG-LLaVA
Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"
☆33 · Oct 12, 2024 · Updated last year
OpenGVLab / De-focus-Attention-Networks
Learning 1D Causal Visual Representation with De-focus Attention Networks
☆35 · Jun 7, 2024 · Updated 2 years ago
WePOINTS / WePOINTS
☆190 · Mar 13, 2026 · Updated 4 months ago
Form2Seq-Data / Dataset
Dataset corresponding to the paper: "Form2Seq : A Framework for Higher-Order Form Structure Extraction"
☆10 · Feb 17, 2021 · Updated 5 years ago
ChocoWu / SeTok
Codes for ICLR 2025 Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM
☆81 · Apr 19, 2025 · Updated last year
declare-lab / Emma-X
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning
☆84 · May 17, 2025 · Updated last year
360AILAB-NLP / 360LayoutAnalysis
360LayoutAnalysis, a series of document analysis models and datasets developed by 360 AI Research Institute
☆305 · Sep 10, 2024 · Updated last year
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆93 · Jun 17, 2024 · Updated 2 years ago
ChWick / caffe
Caffe: a fast open framework for deep learning.
☆14 · Jun 23, 2017 · Updated 9 years ago
XinshaoAmosWang / Improving-Mean-Absolute-Error-against-CCE
Mean Absolute Error Does Not Treat Examples Equally and Gradient Magnitude’s Variance Matters
☆31 · Nov 21, 2020 · Updated 5 years ago