naver-ai/class-query-vad

[ECCV 2024] Official PyTorch implementation of "Classification Matters: Improving Video Action Detection with Class-Specific Attention"

☆18

Alternatives and similar repositories for class-query-vad

Users who are interested in class-query-vad are comparing it to the libraries listed below.

alibaba-mmai-research / HiCo
CVPR 2022: Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency
☆18 · Aug 10, 2022 · Updated 3 years ago
musicalOffering / ActionSwitch-release
☆12 · Aug 7, 2024 · Updated last year
Pilhyeon / BAM-DETR
Official Pytorch Implementation of 'BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos'
☆36 · Feb 26, 2025 · Updated last year
KHU-VLL / CAST
[NeurIPS 2023] Official implementation of the paper "CAST: Cross-Attention in Space and Time for Video Action Recognition"
☆55 · Dec 28, 2023 · Updated 2 years ago
HYUNJS / DecAF
[ICLR 2026] Official implementation of "Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation"
☆36 · Jan 26, 2026 · Updated 6 months ago
rllab-snu / IDNet
Implementation of Learning Instance-Aware Object Detection Using Determinantal Point Processes [https://arxiv.org/pdf/1805.10765.pdf]
☆19 · Nov 21, 2023 · Updated 2 years ago
rllab-snu / Adaptive-Soft-Actor-Critic
☆20 · Aug 18, 2020 · Updated 5 years ago
rllab-snu / deep_learning_tutorial
Deep learning tutorials using tensorflow
☆22 · Oct 11, 2019 · Updated 6 years ago
HYUNJS / STTM
[ICCV 2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
☆61 · Feb 2, 2026 · Updated 5 months ago
minjoong507 / Consistency-of-Video-LLM
[CVPR 2025] Official Repository of the paper "On the Consistency of Video Large Language Models in Temporal Comprehension"
☆16 · Oct 13, 2025 · Updated 9 months ago
mbzuai-oryx / LongShOT
A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos
☆21 · Jun 20, 2026 · Updated last month
jolin830 / SlowFast-Meet-ViT
We have implemented Track #1 for ICME 2024: Spatial Action Localization on Chaotic World dataset. Our mAP on the validation set reaches …
☆14 · Nov 11, 2024 · Updated last year
lijenchang / Mask2Hand
PyTorch Implementation of "Mask2Hand: Learning to Predict the 3D Hand Pose and Shape from Shadow"
☆12 · Aug 19, 2024 · Updated last year
rllab-snu / Pedestrian-Intention-Prediction-for-Autonomous-Driving
☆30 · Nov 24, 2025 · Updated 8 months ago
rllab-snu / soft_action_particle_method
☆33 · Nov 24, 2025 · Updated 8 months ago
daeunni / Video-Skill-CoT
Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning [EMNLP 2025 Findings]"
☆18 · Aug 27, 2025 · Updated 11 months ago
NJU-LINK / IF-VidCap
The Source Code for IF-VidCap @ICLR 2026
☆19 · Oct 22, 2025 · Updated 9 months ago
ms-dot-k / LRW_ID
The speaker-labeled information of LRW dataset, which is the outcome of the paper "Speaker-adaptive Lip Reading with User-dependent Paddi…
☆10 · Oct 12, 2023 · Updated 2 years ago
david-gimeno / tailored-avsr
Official source code for the paper "Tailored Design of Audio-Visual Speech Recognition Models using Branchformers"
☆15 · Feb 24, 2025 · Updated last year
rllab-snu / Deep-Elastic-Network
Implementation of Deep Elastic Network
☆42 · Nov 24, 2025 · Updated 8 months ago
densechen / Pose-refinement
Pose refinement with differentiable rendering
☆10 · Dec 27, 2020 · Updated 5 years ago
IVY-LVLM / Video-MA2MBA
Official Implementation of Video-MA2MBA
☆12 · Dec 3, 2024 · Updated last year
andreagemelli / Action-recognition-by-2D-skeleton-analysis
Implementation of the techniques presented in "Co-occurrence Feature Learning from Skeleton Data for Action Recognition" to recognize two…
☆11 · Jul 22, 2019 · Updated 7 years ago
Jho-Yonsei / CoMoGaussian
[ICCV 2025] CoMoGaussian: Continuous Motion-Aware Gaussian Splatting from Motion-Blurred Images
☆56 · Jul 15, 2025 · Updated last year
NVlabs / FRAG
☆15 · Apr 25, 2025 · Updated last year
kdariina / CLIP-not-BoW-unimodally
Code for "CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally"
☆29 · Feb 27, 2026 · Updated 5 months ago
gulvarol / cslr2
Large-Vocabulary Continuous Sign Language Recognition, 2024
☆16 · May 30, 2024 · Updated 2 years ago
joslefaure / HERMES
[ICCV'25] HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
☆37 · Sep 10, 2025 · Updated 10 months ago
rllab-snu / tsallis_actor_critic_mujoco
Implementation of Tsallis Actor Critic method
☆61 · Nov 24, 2025 · Updated 8 months ago
sangmin-git / MMSI
Code for "Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations" (CVPR 2024 Oral)
☆19 · Jun 23, 2024 · Updated 2 years ago
IVUL-KAUST / VideoAuto-R1
[CVPR 2026] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
☆88 · Feb 27, 2026 · Updated 5 months ago
JeongHun0716 / e-mvsr
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation (ACM MM 2024)
☆20 · Mar 17, 2025 · Updated last year
Yui010206 / Ego2Web
[CVPR 2026] Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos
☆29 · Mar 25, 2026 · Updated 4 months ago
ChajinShin / Survey-on-Implicit-Neural-Representation
Survey-on-Implicit-Neural-Representation
☆36 · Mar 31, 2021 · Updated 5 years ago
NicolaNardino / Blockchain.Ethereum
Java web application backed by the Ethereum-Blockchain network. Powered by RESTful web services (JAX-RS && Spring Boot), Docker, Kuberne…
☆14 · Feb 19, 2019 · Updated 7 years ago
jingjing12110 / MixPHM
[CVPR 2023] Pytorch Code of MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
☆17 · Jul 11, 2023 · Updated 3 years ago
OpenGVLab / VKnowU
[ECCV 2026] VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs
☆16 · Feb 3, 2026 · Updated 5 months ago
qirui-chen / MultiHop-EgoQA
[AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
☆38 · May 27, 2025 · Updated last year
gyxxyg / VTG-LLM
[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
☆130 · Dec 10, 2024 · Updated last year