Ucas-HaoranWei / Vary-family
☆57Jan 23, 2024Updated 2 years ago
Alternatives and similar repositories for Vary-family
Users that are interested in Vary-family are comparing it to the libraries listed below
- Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)☆629Dec 30, 2024Updated last year
- [ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.☆1,896Dec 30, 2024Updated last year
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"☆195May 31, 2024Updated last year
- Official implementation for Dessurt: Document end-to-end self-supervised understanding and recognition transformer☆62Jan 11, 2023Updated 3 years ago
- ☆22Dec 11, 2025Updated 2 months ago
- Accelerating GOT-OCRv2 with VLLM☆11Nov 15, 2024Updated last year
- Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step☆159Jul 28, 2025Updated 6 months ago
- An open-source, commercially usable multimodal model that supports bilingual Chinese-English visual-text dialogue.☆378Sep 23, 2023Updated 2 years ago
- Unofficial implementation of WebFormer: The Web-page Transformer for Structure Information Extraction☆13Apr 20, 2023Updated 2 years ago
- Chinese CLIP models with SOTA performance.☆60Aug 28, 2023Updated 2 years ago
- A native Chinese industrial evaluation benchmark☆15Mar 21, 2024Updated last year
- A deep learning model for detecting tables in photos, built with ncnn.☆20Mar 2, 2025Updated 11 months ago
- A family of highly capable yet efficient large multimodal models☆191Aug 23, 2024Updated last year
- Custom Iterable Dataset Class for Large-Scale Data Loading☆14Dec 8, 2021Updated 4 years ago
- You found a secret! lzmisscc/lzmisscc is a ✨special ✨ repository that you can use to add a README.md to your GitHub profile. Make sure it…☆13Sep 4, 2023Updated 2 years ago
- Minimal user-friendly demo of OpenAI's CLIP for semantic image search☆19Sep 28, 2024Updated last year
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Oct 17, 2024Updated last year
- Dataset and scripts for HRDoc☆41Jun 21, 2023Updated 2 years ago
- ☆156May 8, 2025Updated 9 months ago
- ENet segmentation on Android, based on ncnn☆17Mar 29, 2020Updated 5 years ago
- My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"☆74Feb 6, 2026Updated last week
- This is the official repository of the revised datasets FUNSD-r and CORD-r, introduced in EMNLP 2023 paper Reading Order Matters: Informa…☆17Mar 20, 2024Updated last year
- [Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document"☆17Dec 1, 2023Updated 2 years ago
- Transformer related optimization, including BERT, GPT☆17Jul 29, 2023Updated 2 years ago
- [COLM'24] "How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?"☆22Oct 13, 2024Updated last year
- Attaching human-like eyes to the large language model. The codes of IEEE TMM paper "LMEye: An Interactive Perception Network for Large La…☆49Jul 18, 2024Updated last year
- Code and Dataset for our paper: Layout-Aware Single-Image Document Flattening☆23Dec 16, 2024Updated last year
- ☆25Jun 22, 2023Updated 2 years ago
- ☆19Mar 28, 2022Updated 3 years ago
- ☆187Feb 27, 2024Updated last year
- Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.☆203Mar 1, 2025Updated 11 months ago
- BlockchainGPT: An intuitive, chat-based platform to manage your blockchain environments using natural language processing capabilities.☆11Jul 6, 2023Updated 2 years ago
- ☆142Feb 13, 2024Updated 2 years ago
- List of environments and competitions for RL and AI training☆25Sep 10, 2025Updated 5 months ago
- On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)☆793Jul 5, 2025Updated 7 months ago
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning☆95Jan 7, 2025Updated last year
- Generator of document visual question answering datasets in English and Chinese☆24Apr 17, 2023Updated 2 years ago
- Official implementation of UPOCR: Towards unified pixel-level OCR interface (ICML 2024)☆67Jun 6, 2024Updated last year
- Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train …☆225Jun 12, 2025Updated 8 months ago