ytpeng-aimlab / Multi-Stage-Partitioned-Transformer-for-Efficient-Image-Deraining
☆14 Updated 2 years ago
Alternatives and similar repositories for Multi-Stage-Partitioned-Transformer-for-Efficient-Image-Deraining:
Users who are interested in Multi-Stage-Partitioned-Transformer-for-Efficient-Image-Deraining are comparing it to the repositories listed below.
- ☆13 Updated last year
- AMC: Adaptive Multi-expert Collaborative Network for Text-guided Image Retrieval ☆16 Updated 7 months ago
- Collection of Composed Image Retrieval (CIR) papers. ☆171 Updated this week
- ☆9 Updated 2 years ago
- Document Artificial Intelligence ☆157 Updated 3 months ago
- An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model". ☆134 Updated 3 weeks ago
- ☆10 Updated last year
- Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI) ☆191 Updated 9 months ago
- The official repo for [CVPR'23] "DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting" & [ArXiv'23] "DeepSolo++:… ☆263 Updated 7 months ago
- [CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models" ☆284 Updated 2 weeks ago
- [AAAI'23 Oral] DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer ☆183 Updated last year
- Official implementation of SPTS: Single-Point Text Spotting (ACM MM 2022 Oral) ☆140 Updated last year
- Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval (CVPR 2023) ☆221 Updated last week
- (CVPR 2024) Bridging the Gap Between End-to-End and Two-Step Text Spotting. ☆59 Updated 9 months ago
- [ECCV2024] Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors ☆17 Updated 6 months ago
- Official Code for IJCV 2024 paper — Globally Correlation-Aware Hard Negative Generation ☆15 Updated 3 months ago
- ☆37 Updated last year
- ✨✨ Scene-Text Grounding for Text-Based Video Question Answering (arxiv) ☆14 Updated 3 weeks ago
- [CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments". ☆264 Updated 9 months ago
- This repository is the codebase of TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy ☆26 Updated 5 months ago
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆316 Updated 8 months ago
- ☆79 Updated last year
- ☆46 Updated last year
- Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train … ☆192 Updated 6 months ago
- 【ArXiv】PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling ☆114 Updated 5 months ago
- [ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua… ☆401 Updated 2 months ago
- ☆174 Updated last year
- [ICDAR 2024] (Best Student Paper🏆) Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation ☆13 Updated 6 months ago
- The official code of "Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark" ☆151 Updated 7 months ago
- ☆131 Updated last year