hnam-1765 / WriteViT
☆16 Updated 3 months ago
Alternatives and similar repositories for WriteViT
Users who are interested in WriteViT are comparing it to the repositories listed below.
- ☆17 Updated 5 months ago
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆28 Updated 2 years ago
- Official PyTorch implementation of `[ACMMM 2023] Relational Contrastive Learning for Scene Text Recognition` ☆17 Updated 2 years ago
- [CVPR 2025] DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding ☆22 Updated 3 weeks ago
- [arXiv: 2505.12307] LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? ☆34 Updated last month
- [NeurIPS'24] GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching ☆28 Updated 7 months ago
- [ICME 2023] FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation ☆13 Updated 2 years ago
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023 ☆16 Updated last year
- The official repo for the technical report "Scalable Mask Annotation for Video Text Spotting" ☆16 Updated 2 years ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆66 Updated last year
- [ACM MM2025] The official repository for the RealSyn dataset ☆39 Updated 3 weeks ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆51 Updated 6 months ago
- Analyse and Design Deep Neural Network, Dr.Kalhor, University of Tehran ☆11 Updated last year
- VimTS: A Unified Video and Image Text Spotter ☆79 Updated last year
- (ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos" ☆44 Updated 6 months ago
- The official implementation of ADDP (ICLR 2024) ☆12 Updated last year
- [T-PAMI 2025] EMOv2: Pushing 5M Vision Model Frontier ☆54 Updated last year
- Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing" ☆53 Updated 6 months ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆44 Updated last year
- Official PyTorch implementation of "No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding" ☆32 Updated last year
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆62 Updated last year
- [CBMI 2024 Best Paper] Official repository of the paper "Is CLIP the main roadblock for fine-grained open-world perception?". ☆31 Updated 7 months ago
- WeGeFT: Weight‑Generative Fine‑Tuning for Multi‑Faceted Efficient Adaptation of Large Models ☆22 Updated 6 months ago
- Official Code for: "DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency" ☆30 Updated 2 weeks ago
- This is the official repository for the paper "Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction". ICCV … ☆22 Updated last month
- ☆13 Updated 7 months ago
- Fully Open Framework for Democratized Multimodal Reinforcement Learning. ☆33 Updated 3 weeks ago
- [ICCV 2025] Dynamic-VLM ☆28 Updated last year
- BTS: A Bi-lingual Benchmark for Text Segmentation in the Wild ☆33 Updated last year
- [NeurIPS'25] ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and R… ☆30 Updated 3 months ago