TACJu / Axial-VS
This repo contains the code for our paper MaXTron: Mask Transformer with Trajectory Attention for Video Panoptic Segmentation
☆28 Updated 3 months ago
Related projects:
- Official Pytorch Implementation of Paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Des… ☆47 Updated 2 months ago
- [ECCV2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation ☆45 Updated 2 weeks ago
- Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆107 Updated last month
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries". ☆43 Updated 4 months ago
- [NeurIPS2023] 3D-OWIS is capable of detecting unknown instances in inference, and progressively learning novel classes in the process of … ☆66 Updated 9 months ago
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight) ☆84 Updated 4 months ago
- ☆17 Updated 5 months ago
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs ☆22 Updated 3 months ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆36 Updated last month
- ☆32 Updated 8 months ago
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆96 Updated 4 months ago
- Using Segment-Anything and CLIP to generate pixel-aligned semantic features. ☆31 Updated last year
- ☆57 Updated last year
- Official Pytorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation. ☆34 Updated 3 weeks ago
- ☆26 Updated last week
- (ICLR 2024, CVPR 2024) SparseFormer ☆62 Updated 5 months ago
- Multimodal Video Understanding Framework (MVU) ☆23 Updated 4 months ago
- ☆25 Updated 11 months ago
- MIMIC: Masked Image Modeling with Image Correspondences ☆15 Updated 3 months ago
- AAPL: Adding Attributes to Prompt Learning for Vision-Language Models (CVPRw 2024) ☆29 Updated 4 months ago
- [ICCV 2023] OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation ☆48 Updated 11 months ago
- Diffusion base mining ☆37 Updated this week
- ☆35 Updated 11 months ago
- state-of-the-art open vocabulary detector on COCO/LVIS/V3Det ☆23 Updated 5 months ago
- Codes for ICML 2023 Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation ☆35 Updated last year
- Public repository for the ECCV 2024 paper "Train Till You Drop: Towards Stable and Robust Source-free Unsupervised 3D Domain Adaptation". ☆17 Updated last week
- ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues ☆49 Updated 4 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆88 Updated 2 months ago
- [CVPR24] Official Implementation of GEM (Grounding Everything Module) ☆72 Updated 9 months ago