TACJu / Axial-VS
This repo contains the code for our TMLR paper: A Simple Video Segmenter by Tracking Objects Along Axial Trajectories
☆27 · Updated 3 months ago
Alternatives and similar repositories for Axial-VS
Users who are interested in Axial-VS are comparing it to the libraries listed below.
- Official PyTorch implementation of the paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Des… · ☆55 · Updated 11 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" · ☆41 · Updated last year
- LiVOS: Light Video Object Segmentation with Gated Linear Matching (CVPR 2025) · ☆37 · Updated 2 months ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model · ☆42 · Updated 10 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment · ☆51 · Updated 5 months ago
- Official implementation of Add-SD: Rational Generation without Manual Reference · ☆27 · Updated 10 months ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" · ☆125 · Updated 10 months ago
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding · ☆48 · Updated 3 weeks ago
- Harnessing CLIP, DINO and SAM for Open Vocabulary Segmentation · ☆58 · Updated 3 months ago
- Implementation of Zero-Shot Video Semantic Segmentation [CVPR 2025] · ☆49 · Updated 3 months ago
- Implementation of "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer" · ☆44 · Updated last month
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models · ☆27 · Updated last week
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries" · ☆57 · Updated last year
- Diffusion Models as Data Mining Tools · ☆54 · Updated last month
- Repository for the first survey on SAM & SAM2 for videos · ☆51 · Updated last month
- Implementation of the MC-ViT model from the paper "Memory Consolidation Enables Long-Context Video Understanding" · ☆20 · Updated 2 months ago
- [NeurIPS 2023] 3D-OWIS is capable of detecting unknown instances during inference and progressively learning novel classes in the process of … · ☆68 · Updated last year
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… · ☆98 · Updated last year
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs · ☆26 · Updated 5 months ago
- OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models · ☆40 · Updated 3 weeks ago
- Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation · ☆31 · Updated last month
- Official implementation of "URECA: Unique Region Caption Anything" · ☆49 · Updated 2 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning · ☆70 · Updated last week