Ucas-HaoranWei/Vary-family

Ucas-HaoranWei / Vary-family

☆57

Alternatives and similar repositories for Vary-family

Users who are interested in Vary-family are comparing it to the repositories listed below.

Ucas-HaoranWei / Vary-toy
Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)
☆630 · Dec 30, 2024 · Updated last year
Ucas-HaoranWei / Vary-tiny-600k
Vary-tiny codebase built on LAVIS (for training from scratch) and PDF image-text pair data (about 600k pairs in English and Chinese)
☆89 · Sep 21, 2024 · Updated last year
Ucas-HaoranWei / Aircraft-KP
Keypoint dataset for airplanes
☆10 · Dec 28, 2019 · Updated 6 years ago
Ucas-HaoranWei / Vary
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
☆1,889 · Dec 30, 2024 · Updated last year
Ucas-HaoranWei / Slow-Perception
Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step
☆161 · Jul 28, 2025 · Updated 11 months ago
ucaslcl / Fox
Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"
☆196 · May 31, 2024 · Updated 2 years ago
herobd / dessurt
Official implementation for Dessurt: Document end-to-end self-supervised understanding and recognition transformer
☆62 · Jan 11, 2023 · Updated 3 years ago
LingyvKong / OneChart
[ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"
☆266 · Apr 14, 2025 · Updated last year
lucasjinreal / MLLM_Factory
A Dead Simple and Modularized Multi-Modal Training and Finetune Framework. Compatible with any LLaVA/Flamingo/QwenVL/MiniGemini etc series …
☆19 · Apr 24, 2024 · Updated 2 years ago
vivien000 / clip-demo
Minimal user-friendly demo of OpenAI's CLIP for semantic image search
☆20 · Sep 28, 2024 · Updated last year
jfma-USTC / HRDoc
Dataset and scripts for HRDoc
☆42 · Jun 21, 2023 · Updated 3 years ago
1694439208 / GOT-OCR-Inference
Research on accelerating the real-world deployment of GOT-OCR, not limited to any one language
☆62 · Oct 24, 2024 · Updated last year
chenxn2020 / GOSE
[Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document"
☆17 · Dec 1, 2023 · Updated 2 years ago
liufanfanlff / RoboUniview
☆66 · Feb 20, 2025 · Updated last year
HCIILAB / M6Doc
☆163 · May 8, 2025 · Updated last year
SALT-NLP / LLaVAR
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
☆268 · Jun 12, 2024 · Updated 2 years ago
lucasjinreal / LLaVA-Magvit2
LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifies image understanding and generation.
☆38 · Jun 20, 2024 · Updated 2 years ago
leeguandong / ComfyUI_InternVL2
An InternVL2 plugin for ComfyUI. InternVL2 is currently a good open-source multimodal large language model that performs well on document VQA.
☆13 · Aug 10, 2024 · Updated last year
LinkSoul-AI / Chinese-LLaVA
An open-source, commercially usable multimodal model that supports bilingual Chinese-English visual-text dialogue.
☆378 · Sep 23, 2023 · Updated 2 years ago
MosRat / got.cpp
Using llama.cpp and onnxruntime to accelerate inference of GOT-OCR2.0
☆15 · Mar 6, 2025 · Updated last year
gccnlp / Light-PEFT
[ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning
☆13 · Sep 2, 2024 · Updated last year
tsb0601 / MultiMon
☆25 · Jun 22, 2023 · Updated 3 years ago
PedroBarcha / context-spelling-correction
Given a text, split it into phrases and send them to Yandex's search engine. If it yields a "did you mean:", substitute the original phras…
☆11 · Dec 13, 2018 · Updated 7 years ago
liunian-Jay / MU-GOT
PDF Parsing Tool: GOT's vLLM acceleration implementation, MinerU for layout recognition, and GOT for table formula parsing.
☆66 · Nov 7, 2024 · Updated last year
large-ocr-model / large-ocr-model.github.io
☆189 · Feb 27, 2024 · Updated 2 years ago
kyegomez / Kosmos2.5
My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"
☆75 · Jun 22, 2026 · Updated 3 weeks ago
chongzhangFDU / Token-Path-Prediction-Datasets
This is the official repository of the revised datasets FUNSD-r and CORD-r, introduced in EMNLP 2023 paper Reading Order Matters: Informa…
☆17 · Mar 20, 2024 · Updated 2 years ago
SCUT-DLVCLab / Document-AI-Recommendations
Algorithms, papers, datasets, performance comparisons for Document AI.
☆209 · Mar 1, 2025 · Updated last year
Rayrtfr / FasterTransformer
Transformer-related optimization, including BERT and GPT
☆17 · Jul 29, 2023 · Updated 2 years ago
rhythm92 / Unsupervised-Pixel-Level-Domain-Adaptation-with-GAN
Implementation of Unsupervised Pixel–Level Domain Adaptation with Generative Adversarial Networks by Google
☆15 · Jan 10, 2017 · Updated 9 years ago
LukeForeverYoung / UReader
☆142 · Feb 13, 2024 · Updated 2 years ago
BunnySoCrazy / LA-DocFlatten
Code and Dataset for our paper: Layout-Aware Single-Image Document Flattening
☆24 · Dec 16, 2024 · Updated last year
SpursGoZmy / Table-LLaVA
Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train …
☆227 · Jun 12, 2025 · Updated last year
YunxinLi / LingCloud
Attaching human-like eyes to the large language model. The codes of IEEE TMM paper "LMEye: An Interactive Perception Network for Large La…
☆49 · Jul 18, 2024 · Updated 2 years ago
OpenGVLab / V2PE
[ICCV2025] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
☆60 · Apr 4, 2026 · Updated 3 months ago
mayubo2333 / MMLongBench-Doc
Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
☆149 · Sep 28, 2025 · Updated 9 months ago
CLUEbenchmark / SuperCLUE-Industry
A native Chinese industry evaluation benchmark
☆17 · Mar 21, 2024 · Updated 2 years ago
liuzhuang1024 / liuzhuang1024
You found a secret! lzmisscc/lzmisscc is a ✨special ✨ repository that you can use to add a README.md to your GitHub profile. Make sure it…
☆13 · Apr 4, 2026 · Updated 3 months ago
psunlpgroup / MultiHiertt
Data and code for ACL 2022 paper "MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data"
☆54 · Oct 22, 2024 · Updated last year