FaltingsA / SSM
[IJCAI-2024] The official code of Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition
☆10 · Updated 4 months ago
Alternatives and similar repositories for SSM
Users who are interested in SSM are comparing it to the repositories listed below
- [ECCV2024] Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors ☆19 · Updated last year
- [ICCV2023] Self-supervised Character-to-Character Distillation for Text Recognition ☆151 · Updated last year
- [CVPR2023] Self-supervised Implicit Glyph Attention for Text Recognition ☆109 · Updated 9 months ago
- ☆24 · Updated 11 months ago
- Update the latest text-related papers from top conferences ☆26 · Updated 9 months ago
- What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness ☆24 · Updated 7 months ago
- [NeurIPS'24] GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching ☆28 · Updated 6 months ago
- ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting ☆45 · Updated 8 months ago
- Transferable Decoding with Visual Entities for Zero-Shot Image Captioning, ICCV 2023 ☆161 · Updated last year
- [ICCV-2023] The official code of Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation ☆137 · Updated 5 months ago
- [arXiv 25] Aesthetics is Cheap, Show me the Text: An Empirical Evaluation of State-of-the-Art Generative Models for OCR ☆243 · Updated 3 months ago
- Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval --ICCV2023 Oral ☆91 · Updated 2 years ago
- ☆17 · Updated 4 months ago
- [ICCV 2025] LIRA ☆21 · Updated 3 weeks ago
- Official repository for paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770). ☆159 · Updated last year
- [ICCV 2023] Few shot font generation via transferring similarity guided global and quantization local styles ☆152 · Updated 3 months ago
- [ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models ☆115 · Updated 2 months ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆44 · Updated last year
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆135 · Updated 7 months ago
- Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval [CVPR 2025 Highlight] ☆62 · Updated 5 months ago
- ☆100 · Updated 4 months ago
- 【CVPR 2025】SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting ☆14 · Updated 5 months ago
- [ECCV 2024] SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation ☆45 · Updated 9 months ago
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better ☆298 · Updated 10 months ago
- A curated list of publications on image and video segmentation leveraging Multimodal Large Language Models (MLLMs), highlighting state-of… ☆174 · Updated last week
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. ☆245 · Updated 10 months ago
- [ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction ☆200 · Updated last year
- The official implementation of A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation ☆23 · Updated 4 months ago
- [PR 2025] The official GitHub page of "MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Ca… ☆70 · Updated 5 months ago
- MomentDiff: Generative Video Moment Retrieval from Random to Real--NeurIPS 2023 ☆80 · Updated 2 years ago