The official repo for [TPAMI'23] "Vision Transformer with Quadrangle Attention"
☆233Sep 25, 2025Updated 5 months ago
Alternatives and similar repositories for QFormer
Users who are interested in QFormer are comparing it to the libraries listed below
- The official repo for the technical report "Scalable Mask Annotation for Video Text Spotting"☆16May 3, 2023Updated 2 years ago
- Official repo for "S5: Scalable Semi-Supervised Semantic Segmentation in Remote Sensing"☆33Dec 4, 2025Updated 3 months ago
- ☆18Mar 19, 2025Updated 11 months ago
- (CVPR2023/TPAMI2024) Integrally Pre-Trained Transformer Pyramid Networks -- A Hierarchical Vision Transformer for Masked Image Modeling☆211Jul 28, 2024Updated last year
- SuperpixelGridMasks is an approach for sensor-based data augmentation towards image classification tasks and so on.☆14Jan 18, 2023Updated 3 years ago
- MMPD Dataset from ECCV'2024 "When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset"☆21Jul 15, 2024Updated last year
- Repository of Vision Transformer with Deformable Attention (CVPR2022) and DAT++: Spatially Dynamic Vision Transformer with Deformable Atte…☆926Apr 17, 2024Updated last year
- ☆37Oct 17, 2025Updated 4 months ago
- [arXiv: 2505.12307] LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?☆35Dec 1, 2025Updated 3 months ago
- Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object Detection"☆579Apr 24, 2022Updated 3 years ago
- A comprehensive list [AIM@IJCAI'21, P3M@MM'21, GFM@IJCV'22, RIM@CVPR'23, P3MNet@IJCV'23] of our research works related to image matting, …☆230Apr 11, 2023Updated 2 years ago
- ☆15May 23, 2025Updated 9 months ago
- ☆11Sep 2, 2024Updated last year
- [ECCV 2024] Official repository of Agent Attention☆661Nov 17, 2024Updated last year
- VMamba: Visual State Space Models, code is based on mamba☆3,054Mar 7, 2025Updated 11 months ago
- Implementation of SoundStream from the paper: "SoundStream: An End-to-End Neural Audio Codec"☆13Jan 27, 2025Updated last year
- tmp DPI☆14Dec 18, 2024Updated last year
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models☆17Nov 4, 2025Updated 4 months ago
- ☆12Mar 5, 2025Updated last year
- The official PyTorch implementation of "CSAKD: Knowledge Distillation with Cross Self-Attention for Hyperspectral and Multispectral Image…☆13Nov 7, 2024Updated last year
- Minimum viable code for the Decodable Information Bottleneck paper. PyTorch implementation.☆11Oct 20, 2020Updated 5 years ago
- ☆10Nov 26, 2023Updated 2 years ago
- ☆13Apr 19, 2024Updated last year
- Salient Objects in Clutter, arXiv, 2021 (ECCV2018 extension).☆11Jun 17, 2021Updated 4 years ago
- ☆133Jan 19, 2023Updated 3 years ago
- UAV-Rain1k: A Benchmark for Raindrop Removal from UAV Aerial Imagery (CVPRW 2024)☆31Apr 13, 2024Updated last year
- A regularized self-labeling approach to improve the generalization and robustness of fine-tuned models☆27Jun 7, 2022Updated 3 years ago
- [ICML'25] Official code of paper "Fast Large Language Model Collaborative Decoding via Speculation"☆28Jun 23, 2025Updated 8 months ago
- ☆27Aug 8, 2022Updated 3 years ago
- [CVPR L3D-IVU 2024] Official implementation for the paper "Learnable Prompt for Few-Shot Semantic Segmentation"☆14Apr 22, 2024Updated last year
- EfficientSAM + YOLO World base model for use with Autodistill.☆10Feb 21, 2024Updated 2 years ago
- ☆11Jun 12, 2024Updated last year
- LMM for VQA, TCSVT version☆11Jul 19, 2024Updated last year
- CIFAR10 ResNets implemented in JAX+Flax☆12Apr 6, 2022Updated 3 years ago
- [TPAMI 2024] The official repo for "Stereo Image Restoration via Attention-Guided Correspondence Learning"☆10Apr 21, 2024Updated last year
- Self-reimplemented version of 4D-LRM.☆65May 30, 2025Updated 9 months ago
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts☆17Apr 2, 2025Updated 11 months ago
- Official Implementation of Video-MA2MBA☆12Dec 3, 2024Updated last year
- [ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning"☆49Jan 30, 2026Updated last month