HVision-NKU/ASID-Caption

ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Understanding.

☆68

Alternatives and similar repositories for ASID-Caption

Users who are interested in ASID-Caption are comparing it to the repositories listed below.

xzxxntxdy / PEPO
Official repo for "Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought"
☆26 · Mar 29, 2026 · Updated 3 months ago
HVision-NKU / DenseVLM
[ICCV 2025] Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction
☆53 · Sep 22, 2025 · Updated 10 months ago
lzyhha / HSSL
Enhancing Representations through Heterogeneous Self-Supervised Learning (TPAMI 2025)
☆15 · May 2, 2025 · Updated last year
HVision-NKU / MutualForcing
☆58 · Apr 28, 2026 · Updated 2 months ago
NK-JittorCV / nk-det
An open source codebase for object detection based on Jittor
☆19 · Dec 9, 2025 · Updated 7 months ago
HVision-NKU / TempSamp-R1
[Official, NeurIPS 2025] TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs.
☆17 · Jun 8, 2026 · Updated last month
lyhisme / DeST
Official code for "A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation".
☆39 · Dec 15, 2023 · Updated 2 years ago
HVision-NKU / ControlSR
☆13 · Apr 19, 2025 · Updated last year
HumanMLLM / SWIM
Official Code for See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding (CVPR 2026)
☆97 · May 20, 2026 · Updated 2 months ago
HVision-NKU / OneVAE
☆55 · Sep 21, 2025 · Updated 10 months ago
HVision-NKU / Cascade-CLIP
Official implementation of ICML 2024 Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
☆58 · Aug 15, 2024 · Updated last year
HumanMLLM / LLaVA-Scissor
The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
☆122 · Jul 1, 2025 · Updated last year
zhengyuan-xie / ECCV24_NeST
[ECCV 2024] Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation
☆39 · Mar 3, 2025 · Updated last year
yaolinli / TimeChat-Captioner
[ICML 2026] Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
☆49 · Jun 29, 2026 · Updated 3 weeks ago
HVision-NKU / AR123
Official Code for 'AR-1-to-3: Single Image to Consistent 3D Object Generation via Next-View Prediction' (ICCV 2025)
☆64 · Nov 8, 2025 · Updated 8 months ago
AVoCaDO-Captioner / AVoCaDO
https://avocado-captioner.github.io/
☆37 · Oct 16, 2025 · Updated 9 months ago
WPR001 / UGC_VideoCaptioner
☆16 · Jun 23, 2026 · Updated last month
yangzhangok / crystal
Official repository of the article "CrystaL: Spontaneous Emergence of Visual Latents in MLLMs"
☆18 · May 26, 2026 · Updated last month
HVision-NKU / TAR3D
Official Code for 'TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction' (ICCV 2025)
☆77 · Nov 8, 2025 · Updated 8 months ago
zhasion / nkuthesis
Nankai University Thesis LaTeX Template (Nankai University master's and doctoral thesis template, v2026.5)
☆17 · May 31, 2026 · Updated last month
NK-JittorCV / nk-diffusion
☆18 · Jul 2, 2026 · Updated 3 weeks ago
Adam-duan / DiffRetouch
[AAAI 2025] Official PyTorch code for the paper: "DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts"
☆25 · Jun 16, 2025 · Updated last year
HVision-NKU / MaskCLIPpp
Official repository of the paper "High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation"
☆47 · Mar 25, 2025 · Updated last year
HVision-NKU / StyleExpert
Official implementation of StyleExpert (CVPR 2026)
☆38 · Mar 19, 2026 · Updated 4 months ago
RQ-Wu / DIPO
[NeurIPS 2025] DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data
☆52 · Dec 12, 2025 · Updated 7 months ago
HVision-NKU / OffSeg
[ICCV 2025] Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment
☆58 · Oct 14, 2025 · Updated 9 months ago
HVision-NKU / GlimpsePrune
[TCSVT] Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models"
☆98 · Jun 12, 2026 · Updated last month
HVision-NKU / DepthAnythingAC
Official code for the paper: Depth Anything At Any Condition
☆344 · Aug 21, 2025 · Updated 11 months ago
ZX-Yin / DreamLifting
The code implementation for the paper "DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation".
☆30 · Sep 1, 2025 · Updated 10 months ago
NK-CS-ZZL / GS-ROR
Official Release of ACM TOG 2025 paper -- GS-ROR
☆54 · Feb 9, 2026 · Updated 5 months ago
NK-JittorCV / nk-yolo
☆24 · Jul 11, 2026 · Updated last week
ssocean / pybiblion
Bibliometric. A Python framework designed for the analysis and evaluation of scholarly publications.
☆15 · Jan 16, 2026 · Updated 6 months ago
YXB-NKU / SE-GUI
[NeurIPS 2025] "Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning"
☆108 · Oct 21, 2025 · Updated 9 months ago
Lliar-liar / Daily-Omni
This is the official repository of Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
☆42 · Apr 28, 2026 · Updated 2 months ago
NK-JittorCV / nk-remote
☆21 · Jul 9, 2025 · Updated last year
mims-harvard / Qworld
Qworld: Question-Specific Evaluation Criteria for LLMs
☆30 · Mar 26, 2026 · Updated 3 months ago
AAwcAA / WOW-Seg-Meta
☆35 · Updated this week
DragonisCV / RAM
[ECCV 2024] Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration
☆110 · Apr 21, 2026 · Updated 3 months ago
dingyue772 / OmniSIFT
[ICML 2026] OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models
☆25 · May 21, 2026 · Updated 2 months ago