[EMNLP 2024] A Video Chat Agent with Temporal Prior
☆32 · Mar 2, 2025 · Updated last year
Alternatives and similar repositories for VideoTGB
Users that are interested in VideoTGB are comparing it to the libraries listed below.
- [ICLR 2026] RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling ☆37 · Feb 25, 2026 · Updated last month
- ☆12 · Mar 4, 2022 · Updated 4 years ago
- [EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering ☆18 · Oct 9, 2024 · Updated last year
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation ☆123 · May 19, 2025 · Updated 10 months ago
- Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23) ☆19 · Mar 9, 2024 · Updated 2 years ago
- ☆14 · Dec 16, 2023 · Updated 2 years ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment ☆29 · Sep 27, 2024 · Updated last year
- This repo contains source code for Glance and Focus: Memory Prompting for Multi-Event Video Question Answering (Accepted in NeurIPS 2023) ☆31 · Jun 28, 2024 · Updated last year
- MR. Video: MapReduce is the Principle for Long Video Understanding ☆31 · Apr 23, 2025 · Updated 11 months ago
- ☆13 · Feb 22, 2021 · Updated 5 years ago
- EMNLP 2025 | TongSearch-QR ☆42 · Dec 4, 2025 · Updated 3 months ago
- Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023] ☆106 · Jul 2, 2024 · Updated last year
- A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model. ☆38 · Apr 7, 2025 · Updated 11 months ago
- The open-source code for the NeurIPS 2025 paper, "Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learn… ☆49 · Jan 5, 2026 · Updated 2 months ago
- [WACV 2025] Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection ☆16 · Mar 23, 2025 · Updated last year
- TTRV: Test-Time Reinforcement Learning for Vision–Language Models (CVPR 2026) ☆37 · Mar 8, 2026 · Updated 3 weeks ago
- ☆11 · May 2, 2022 · Updated 3 years ago
- [CVPR 2026] UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models ☆37 · Feb 21, 2026 · Updated last month
- ☆11 · Aug 7, 2024 · Updated last year
- Repository of "Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning" (NeurIPS 2023 Spotlight) ☆41 · Oct 30, 2023 · Updated 2 years ago
- VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding ☆54 · Updated this week
- ☆15 · Aug 12, 2022 · Updated 3 years ago
- [CVPR 2024] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos ☆12 · Jun 11, 2024 · Updated last year
- ☆142 · Apr 16, 2025 · Updated 11 months ago
- ☆15 · Dec 2, 2025 · Updated 3 months ago
- Official repository of "Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer", AAAI 2024 ☆27 · Mar 14, 2026 · Updated 2 weeks ago
- A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning ☆37 · Mar 12, 2026 · Updated 2 weeks ago
- IMAGEimate is an end-to-end pipeline to create realistic animatable 3D avatars from a single image using neural networks ☆13 · Dec 9, 2021 · Updated 4 years ago
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model ☆16 · Jan 31, 2024 · Updated 2 years ago
- The official implementation of paper: Estimating Egocentric 3D Human Pose in Global Space. ☆12 · Sep 23, 2023 · Updated 2 years ago
- Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024. ☆16 · Oct 25, 2024 · Updated last year
- [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models ☆158 · Dec 9, 2024 · Updated last year
- Repository to generate CLEVR-Dialog: A diagnostic dataset for Visual Dialog ☆49 · Feb 18, 2020 · Updated 6 years ago
- (IJCAI 2023) Sph2Pob: Boosting Object Detection on Spherical Images with Planar Oriented Boxes Methods ☆14 · Aug 23, 2023 · Updated 2 years ago
- A simplified version of MPN ☆13 · May 21, 2021 · Updated 4 years ago
- Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,… ☆124 · May 28, 2025 · Updated 10 months ago
- Explore and Control with Adversarial Surprise ☆10 · Jul 20, 2021 · Updated 4 years ago
- Official PyTorch code of GroundVQA (CVPR'24) ☆64 · Sep 13, 2024 · Updated last year
- A curated list of researches in object-centric learning ☆11 · Oct 14, 2024 · Updated last year