ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Understanding.
☆35Mar 3, 2026Updated this week
Alternatives and similar repositories for ASID-Caption
Users who are interested in ASID-Caption are comparing it to the libraries listed below
- Official code for "A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation".☆37Dec 15, 2023Updated 2 years ago
- [ICCV 2025] Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction☆52Sep 22, 2025Updated 5 months ago
- Official implementation of "Re-Aligning Language to Visual Objects with an Agentic Workflow"☆31Apr 20, 2025Updated 10 months ago
- ☆14Apr 19, 2025Updated 10 months ago
- Enhancing Representations through Heterogeneous Self-Supervised Learning (TPAMI 2025)☆14May 2, 2025Updated 10 months ago
- Bibliometric. A Python framework designed for the analysis and evaluation of scholarly publications.☆15Jan 16, 2026Updated last month
- DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data☆39Dec 12, 2025Updated 2 months ago
- Official implementation of ICML 2024 Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation☆57Aug 15, 2024Updated last year
- ☆20Updated this week
- InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models☆91Feb 2, 2026Updated last month
- Efficient Two-Step Networks for Temporal Action Segmentation (Neurocomputing 2021)☆17Dec 6, 2021Updated 4 years ago
- ☆36Dec 16, 2025Updated 2 months ago
- Official repository of the paper "High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation"☆45Mar 25, 2025Updated 11 months ago
- [ICCV 2023] This is the official implementation of "Multiple Planar Object Tracking"☆24Aug 19, 2023Updated 2 years ago
- Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models"☆91Feb 13, 2026Updated 2 weeks ago
- ☆28Apr 4, 2025Updated 11 months ago
- ☆26Jun 20, 2024Updated last year
- InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion☆82Dec 27, 2025Updated 2 months ago
- [ICCV2025] VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation☆33Aug 18, 2025Updated 6 months ago
- The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs☆118Jul 1, 2025Updated 8 months ago
- The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanc…☆12Oct 14, 2024Updated last year
- ☆21Dec 14, 2025Updated 2 months ago
- Official Code for 'TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction' (ICCV 2025)☆77Nov 8, 2025Updated 3 months ago
- ThinkGen: Generalized Thinking for Visual Generation☆51Dec 30, 2025Updated 2 months ago
- [CVPR 2024 Challenge] 1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation☆32Oct 18, 2024Updated last year
- The repository of VG-Refiner paper☆17Dec 9, 2025Updated 2 months ago
- This repository contains the official implementation for the series of NAIP family.☆52Jan 15, 2026Updated last month
- Project Page for "Multi-Task Dense Prediction via Mixture of Low-Rank Experts"☆89Jun 9, 2025Updated 8 months ago
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking☆13Apr 12, 2023Updated 2 years ago
- (NeurIPS 2024) Official repository of paper "Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models"☆35Mar 22, 2025Updated 11 months ago
- (ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"☆45Jul 1, 2025Updated 8 months ago
- ☆41Dec 10, 2024Updated last year
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training.☆84Dec 24, 2025Updated 2 months ago
- Transferring Genshin PVs into a freehand style with Diffusion Model.☆10Jun 5, 2024Updated last year
- This is the official repository for the paper "Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction". ICCV …☆24Dec 4, 2025Updated 3 months ago
- ☆43Dec 1, 2025Updated 3 months ago
- The 💩DaBian programming language. The 💩"DaBian" (答辩) programming language: if programming isn't 💩"DaBian", I won't learn it!☆10Sep 28, 2023Updated 2 years ago
- ☆11Jan 18, 2025Updated last year
- [ICML 2025 Spotlight] MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding☆67Jul 10, 2025Updated 7 months ago