ytpeng-aimlab / Multi-Stage-Partitioned-Transformer-for-Efficient-Image-Deraining
☆14 Updated 2 years ago
Alternatives and similar repositories for Multi-Stage-Partitioned-Transformer-for-Efficient-Image-Deraining:
Users who are interested in Multi-Stage-Partitioned-Transformer-for-Efficient-Image-Deraining are comparing it to the repositories listed below.
- ☆13 Updated last year
- Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI) ☆193 Updated 10 months ago
- An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model". ☆139 Updated last month
- The official repo for [CVPR'23] "DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting" & [ArXiv'23] "DeepSolo++:… ☆264 Updated 8 months ago
- ☆10 Updated last year
- Document Artificial Intelligence ☆160 Updated this week
- [ICCV 2023] Code base for Revisiting Scene Text Recognition: A Data Perspective ☆188 Updated last year
- Contextual Object Detection with Multimodal Large Language Models ☆235 Updated 6 months ago
- AMC: Adaptive Multi-expert Collaborative Network for Text-guided Image Retrieval ☆19 Updated 7 months ago
- Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train … ☆196 Updated 7 months ago
- ☆132 Updated last year
- ☆9 Updated 2 years ago
- [AAAI'23 Oral] DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer ☆183 Updated last year
- Visual Instruction Tuning for Qwen2 Base Model ☆32 Updated 9 months ago
- [CVPR-2024] Official implementations of CLIP-KD: An Empirical Study of CLIP Model Distillation ☆111 Updated 9 months ago
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" ☆125 Updated 10 months ago
- The official implementation of CTRNet++. ☆11 Updated 3 months ago
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning ☆97 Updated 3 months ago
- Applied Deep Learning (2021 Spring) at National Taiwan University (NTU) CSIE ☆9 Updated 3 years ago
- On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench) ☆598 Updated 2 months ago
- (CVPR 2022) Text Spotting Transformers ☆184 Updated 2 years ago
- [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allo… ☆332 Updated 8 months ago
- Latest Advances on Modality Priors in Multimodal Large Language Models ☆13 Updated this week
- [ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua… ☆417 Updated 3 months ago
- The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer ☆53 Updated 10 months ago
- [CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding ☆267 Updated 6 months ago
- Collection of Composed Image Retrieval (CIR) papers. ☆189 Updated last week
- [CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models" ☆292 Updated last month
- (ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer ☆76 Updated last year
- 【ArXiv】PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling ☆116 Updated 6 months ago