tomchen-ctj / CVPR23-LOVEU-AQTC
【CVPRW'23】First Place Solution to the CVPR'2023 AQTC Challenge
☆15 · Jul 18, 2023 · Updated 2 years ago
Alternatives and similar repositories for CVPR23-LOVEU-AQTC
Users who are interested in CVPR23-LOVEU-AQTC are comparing it to the repositories listed below
- 【CVPR'24】OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition ☆38 · Apr 27, 2024 · Updated last year
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model ☆17 · Feb 13, 2025 · Updated last year
- ☆16 · Apr 4, 2025 · Updated 10 months ago
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u… ☆25 · Jun 4, 2025 · Updated 8 months ago
- BEAR: a new BEnchmark on video Action Recognition ☆46 · Apr 21, 2024 · Updated last year
- Winner solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2023 workshop) ☆29 · Jan 1, 2024 · Updated 2 years ago
- The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanc… ☆12 · Oct 14, 2024 · Updated last year
- [ACM MM 2024] Frame Interpolation with Consecutive Brownian Bridge Diffusion Model ☆36 · Feb 22, 2025 · Updated 11 months ago
- Fast and general video object segmentation evaluation. ☆36 · Jan 30, 2024 · Updated 2 years ago
- A curated list of resources about long-context in large-language models and video understanding. ☆31 · Aug 8, 2023 · Updated 2 years ago
- Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact… ☆42 · Feb 5, 2025 · Updated last year
- DisTime: Distribution-based Time Representation for Video Large Language Models. ☆18 · Jul 10, 2025 · Updated 7 months ago
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking ☆13 · Apr 12, 2023 · Updated 2 years ago
- The repository of VG-Refiner paper ☆17 · Dec 9, 2025 · Updated 2 months ago
- [CVPR 2024] BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models ☆75 · Sep 11, 2024 · Updated last year
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆86 · Sep 12, 2024 · Updated last year
- ☆10 · Apr 7, 2025 · Updated 10 months ago
- ☆11 · Jan 18, 2025 · Updated last year
- [CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation ☆13 · Jun 17, 2024 · Updated last year
- [ECCV 2024] STEVE in Minecraft is for See and Think: Embodied Agent in Virtual Environment ☆41 · Dec 27, 2023 · Updated 2 years ago
- Adaptive FSS has been Accepted by AAAI 2024. A Novel Few-Shot Segmentation Framework via Prototype Enhancement ☆43 · Mar 11, 2024 · Updated last year
- The official repository of UVOSAM ☆13 · Jun 5, 2024 · Updated last year
- Progressive Language-guided Visual Learning for Multi-Task Visual Grounding ☆13 · May 9, 2025 · Updated 9 months ago
- [ICCV'25] Official PyTorch Implementation of "JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers" ☆27 · Nov 27, 2025 · Updated 2 months ago
- ☆11 · Dec 13, 2023 · Updated 2 years ago
- (2024) The Official Repository of Paper "SISP: A Benchmark Dataset for Fine-grained Ship Instance Segmentation in Panchromatic Satellite … ☆14 · Feb 7, 2024 · Updated 2 years ago
- Qwen-SAM is a reasoning-based segmentation model that integrates Qwen 2.5 VL 7B with the Segment Anything Model (SAM), enabling fine-grai… ☆24 · Jun 4, 2025 · Updated 8 months ago
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting" ☆37 · Oct 9, 2025 · Updated 4 months ago
- ☆10 · Jan 9, 2025 · Updated last year
- This repository is an official implementation of the paper A Simple Baseline for Open-World Tracking via Self-training. ☆10 · Jan 26, 2024 · Updated 2 years ago
- ☆13 · Jan 21, 2025 · Updated last year
- Official code for "Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model" ☆12 · Oct 29, 2022 · Updated 3 years ago
- Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance ☆13 · Nov 27, 2025 · Updated 2 months ago
- Placeholder ☆10 · Jul 17, 2023 · Updated 2 years ago
- Code for ACM MM 2023 paper - Regress Before Construct: Regress Autoencoder for Point Cloud Self-supervised Learning ☆14 · Jan 19, 2024 · Updated 2 years ago
- The project is an official implementation of our paper "POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery". ☆46 · Aug 16, 2023 · Updated 2 years ago
- ☆24 · Nov 27, 2025 · Updated 2 months ago
- Aggregate and Discriminate: Pseudo Clips-Guided Boundary Perception for Video Moment Retrieval ☆12 · Nov 25, 2024 · Updated last year
- ☆23 · Updated this week