Ucas-HaoranWei/Vary-tiny-600k

Vary-tiny codebase built on LAVIS (for training from scratch) and a dataset of PDF image-text pairs (about 600k, covering English and Chinese)

☆89

Alternatives and similar repositories for Vary-tiny-600k

Users that are interested in Vary-tiny-600k are comparing it to the libraries listed below.

Ucas-HaoranWei / Vary-family
☆57 · Jan 23, 2024 · Updated 2 years ago
LingyvKong / OneChart
[ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"
☆266 · Apr 14, 2025 · Updated last year
ucaslcl / Fox
official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"
☆196 · May 31, 2024 · Updated 2 years ago
Ucas-HaoranWei / Vary-toy
Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)
☆630 · Dec 30, 2024 · Updated last year
Ucas-HaoranWei / Vary
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
☆1,889 · Dec 30, 2024 · Updated last year
Ucas-HaoranWei / Slow-Perception
Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step
☆161 · Jul 28, 2025 · Updated 11 months ago
MosRat / got.cpp
Using llama.cpp and onnxruntime to accelerate inference of GOT-OCR2.0
☆15 · Mar 6, 2025 · Updated last year
LukeForeverYoung / UReader
☆142 · Feb 13, 2024 · Updated 2 years ago
IITB-LEAP-OCR / SPRINT
SPRINT: Script-agnostic Structure Recognition in Tables
☆16 · Mar 26, 2025 · Updated last year
1694439208 / GOT-OCR-Inference
Research on accelerating GOT-OCR for production deployment, not limited to any language
☆62 · Oct 24, 2024 · Updated last year
DocTron-hub / Chart-R1
Chart-R1: Chain-of-Thought Supervision and Reinforcement for Advanced Chart Reasoner
☆24 · Aug 7, 2025 · Updated 11 months ago
Open-Reasoner-Zero / Open-Vision-Reasoner
[NeurIPS 2025] The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reason…
☆157 · Sep 12, 2025 · Updated 10 months ago
thunlp / LLaVA-UHD
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
☆423 · Jul 6, 2026 · Updated 2 weeks ago
ElvisClaros / GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
☆23 · Sep 26, 2024 · Updated last year
Yuliang-Liu / MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
☆870 · Updated this week
microsoft / ArxivFormula
This repo is used to release the ArxivFormula dataset.
☆35 · Nov 12, 2024 · Updated last year
Yifan-Gao / open_retrieval_conversational_machine_reading
Open-Retrieval Conversational Machine Reading: A new setting & OR-ShARC dataset
☆13 · Nov 19, 2022 · Updated 3 years ago
TenMilesLotus / DTSM
Code and data for the paper: DTSM: Toward Dense Table Structure Recognition with Text Query Encoder and Adjacent Feature Aggregator
☆13 · Apr 28, 2024 · Updated 2 years ago
justliulong / OGHFYOLO
The official code for "OG-HFYOLO: Orientation Gradient Guidance and Heterogeneous Feature Fusion For Deformation Table Cell Instance Segm…
☆13 · Jul 28, 2025 · Updated 11 months ago
winter1203 / vllm_GOT2_OCR
Accelerating GOT-OCRv2 with VLLM
☆10 · Nov 15, 2024 · Updated last year
Alpha-Innovator / DocGenome
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models
☆156 · Jan 13, 2025 · Updated last year
KirariSpark / GOT-OCR-2-GUI
A GUI version of GOT-OCR that provides OCR, PDF export, batch processing, and other features, but no training functionality
☆179 · Nov 11, 2025 · Updated 8 months ago
X-PLUG / mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
☆2,408 · May 30, 2025 · Updated last year
rohan598 / ConTextual
☆27 · Jul 20, 2024 · Updated 2 years ago
SWHL / TrOCR-Formula-Rec
Training a small but elegant formula recognition model based on TrOCR and the UniMER-1M dataset
☆30 · Mar 17, 2026 · Updated 4 months ago
vis-nlp / ChartInstruct
☆28 · Jul 6, 2024 · Updated 2 years ago
dali-does / clevr-math
☆13 · May 9, 2023 · Updated 3 years ago
JunjieHu / amber
Explicit Alignment Objectives for Multilingual Bidirectional Encoders
☆14 · Apr 14, 2021 · Updated 5 years ago
RLHF-V / RLAIF-V
[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
☆456 · May 14, 2025 · Updated last year
MaxKinny / TabRecSet
A large scale camera-taken table detection and recognition dataset.
☆150 · Apr 9, 2026 · Updated 3 months ago
ZrrSkywalker / MAVIS
[ICLR 2025] Mathematical Visual Instruction Tuning for Multi-modal Large Language Models
☆156 · Dec 5, 2024 · Updated last year
FaltingsA / SSM
[IJCAI-2024] The official code of Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition
☆10 · Aug 10, 2025 · Updated 11 months ago
SALT-NLP / LLaVAR
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
☆268 · Jun 12, 2024 · Updated 2 years ago
billhhh / MnasNet-pytorch-pretrained
A pytorch pretrained model of MnasNet
☆21 · Dec 3, 2019 · Updated 6 years ago
OpenGVLab / OmniCorpus
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
☆425 · May 5, 2025 · Updated last year
CXH-Research / StainRestorer
[WACV 2025] High-Fidelity Document Stain Removal via A Large-Scale Real-World Dataset and A Memory-Augmented Transformer
☆23 · Jan 14, 2026 · Updated 6 months ago
lipiji / uChecker
Code of the COLING22 paper "uChecker: Masked Pretrained Language Models as Unsupervised Chinese Spelling Checkers"
☆19 · Aug 17, 2022 · Updated 3 years ago
Kromtar / EasyOCR-ONNX
EasyOCR modified for ONNX use
☆13 · Jul 19, 2022 · Updated 4 years ago
CMMMU-Benchmark / CMMMU
☆48 · Sep 5, 2024 · Updated last year