OpenGVLab / De-focus-Attention-Networks
Learning 1D Causal Visual Representation with De-focus Attention Networks
☆32 · Updated 8 months ago
Alternatives and similar repositories for De-focus-Attention-Networks:
Users interested in De-focus-Attention-Networks are comparing it to the libraries listed below.
- Code for the ICML 2023 paper "Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation" ☆37 · Updated last year
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆32 · Updated 8 months ago
- ☆29 · Updated 10 months ago
- PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆25 · Updated 2 months ago
- Sambor: Boosting Segment Anything Model Towards Open-Vocabulary Learning ☆30 · Updated last year
- [CVPR 2023] RILS: Masked Visual Reconstruction in Language Semantic Space (https://arxiv.org/abs/2301.06958) ☆44 · Updated last year
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆27 · Updated last year
- (ECCV 2024) Can OOD Object Detectors Learn from Foundation Models? ☆25 · Updated 2 months ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆34 · Updated 2 months ago
- Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter". ☆16 · Updated last year
- ☆36 · Updated last month
- [NeurIPS 2024] official code release for our paper "Revisiting the Integration of Convolution and Attention for Vision Backbone". ☆32 · Updated 3 weeks ago
- [BMVC 2024] PlainMamba: Improving Non-hierarchical Mamba in Visual Recognition ☆73 · Updated 6 months ago
- [ICCV 2023] The PyTorch implementation of TL-Align: Token-Label Alignment for Vision Transformers. ☆23 · Updated last year
- (ICCV 2023) Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation ☆46 · Updated 7 months ago
- ☆21 · Updated last year
- [CVPR 2023] Official implementation of "SAP-DETR: Bridging the Gap between Salient Points and Queries-Based Transformer Detector for Fast… ☆29 · Updated last year
- [ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation". ☆20 · Updated 4 months ago
- [ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference ☆74 · Updated 6 months ago
- (ICCV 2023) MasQCLIP for Open-Vocabulary Universal Image Segmentation ☆37 · Updated last year
- Adapting LLaMA Decoder to Vision Transformer ☆26 · Updated 9 months ago
- The official implementation of ADDP (ICLR 2024) ☆12 · Updated 10 months ago
- DiverGen (CVPR 2024) & BSGAL (ICML 2024) ☆41 · Updated 3 months ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆27 · Updated 2 months ago
- OpenMMLab Detection Toolbox and Benchmark for V3Det ☆15 · Updated 10 months ago
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆62 · Updated 6 months ago
- The official implementation of JM3D. ☆29 · Updated last year
- A PyTorch implementation of the paper "Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis" ☆42 · Updated 8 months ago
- DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution ☆39 · Updated this week