hnam-1765 / WriteViT
☆16 · Updated 4 months ago
Alternatives and similar repositories for WriteViT
Users who are interested in WriteViT are comparing it to the repositories listed below.
- ☆17 · Updated 6 months ago
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆28 · Updated 2 years ago
- WeGeFT: Weight‑Generative Fine‑Tuning for Multi‑Faceted Efficient Adaptation of Large Models ☆22 · Updated 7 months ago
- [ICME 2023] FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation ☆13 · Updated 2 years ago
- [CVPR 2025] DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding ☆26 · Updated last month
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023 ☆16 · Updated last year
- The official repo for the technical report "Scalable Mask Annotation for Video Text Spotting" ☆16 · Updated 2 years ago
- [arXiv: 2505.12307] LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? ☆34 · Updated 2 months ago
- VimTS: A Unified Video and Image Text Spotter ☆78 · Updated last year
- [NeurIPS'24] GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching ☆28 · Updated 8 months ago
- ☆13 · Updated 8 months ago
- [ICCV2025] A Token-level Text Image Foundation Model for Document Understanding ☆130 · Updated 5 months ago
- ☆33 · Updated last year
- Official PyTorch implementation of "No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding" ☆32 · Updated last year
- [T-PAMI 2025] EMOv2: Pushing 5M Vision Model Frontier ☆54 · Updated last year
- ☆11 · Updated last year
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆62 · Updated last year
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆66 · Updated last year
- [ACM MM2025] The official repository for the RealSyn dataset ☆40 · Updated last month
- Official PyTorch implementation of `[ACMMM 2023] Relational Contrastive Learning for Scene Text Recognition` ☆17 · Updated 2 years ago
- [CVPR 2025] Few-shot Recognition via Stage-Wise Retrieval-Augmented Finetuning ☆29 · Updated last month
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆64 · Updated 6 months ago
- [ICCV 2025] Dynamic-VLM ☆28 · Updated last year
- ☆32 · Updated last year
- Transactions on Multimedia (TMM25) ☆19 · Updated 10 months ago
- Fully Open Framework for Democratized Multimodal Reinforcement Learning. ☆39 · Updated last month
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆101 · Updated 6 months ago
- The official implementation of our paper "Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption" ☆38 · Updated 8 months ago
- The official implementation of ADDP (ICLR 2024) ☆12 · Updated last year
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆51 · Updated 7 months ago