[EMNLP 2024] A Video Chat Agent with Temporal Prior
☆32Mar 2, 2025Updated last year
Alternatives and similar repositories for VideoTGB
Users that are interested in VideoTGB are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆13Feb 26, 2024Updated 2 years ago
- 【ICLR 2025 🔥】The code for Consistent In-Context Editing, an approach for tuning language models through contextual distributions, overco…☆55Apr 2, 2025Updated last year
- Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]☆10Jul 22, 2024Updated last year
- ☆12Mar 4, 2022Updated 4 years ago
- [EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering☆18Oct 9, 2024Updated last year
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- [ACL 2023] VSTAR is a multimodal dialogue dataset with scene and topic transition information☆16Oct 27, 2024Updated last year
- Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23)☆19Mar 9, 2024Updated 2 years ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment☆29Sep 27, 2024Updated last year
- This repo contains source code for Glance and Focus: Memory Prompting for Multi-Event Video Question Answering (Accepted in NeurIPS 2023)☆31Jun 28, 2024Updated last year
- [Main Conference @ EACL'26] [Workshop @ NeurIPS'24] 🎞️ LVNet.☆43Feb 10, 2026Updated 2 months ago
- Language Modeling Research Hub, a comprehensive compendium for enthusiasts and scholars delving into the fascinating realm of language mo…☆19Mar 19, 2025Updated last year
- We have implemented Track # 1 for ICME 2024: Spatial Action Localization on Chaotic World dataset. Our mAP on the validation set reaches …☆12Nov 11, 2024Updated last year
- The open-source code for the NeurIPS 2025 paper, "Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learn…☆49Jan 5, 2026Updated 3 months ago
- ☆11May 2, 2022Updated 3 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- 中国历年GDP和人口数据可视化☆13Jan 18, 2023Updated 3 years ago
- [CVPR 2026] UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models☆37Feb 21, 2026Updated last month
- ☆11Aug 7, 2024Updated last year
- ☆30Feb 12, 2026Updated 2 months ago
- VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding☆59Mar 24, 2026Updated 3 weeks ago
- [CVPR 2024] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos☆12Jun 11, 2024Updated last year
- ☆145Apr 16, 2025Updated last year
- ☆15Apr 11, 2026Updated last week
- IMAGEimate is an end-to-end pipeline to create realistic animatable 3D avatars from a single image using neural networks☆13Dec 9, 2021Updated 4 years ago
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- (2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding☆350Jul 19, 2024Updated last year
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model☆16Apr 7, 2026Updated last week
- The official implementation of paper: Estimating Egocentric 3D Human Pose in Global Space.☆12Sep 23, 2023Updated 2 years ago
- (IJCAI 2023) Sph2Pob: Boosting Object Detection on Spherical Images with Planar Oriented Boxes Methods☆14Aug 23, 2023Updated 2 years ago
- Repository to generate CLEVR-Dialog: A diagnostic dataset for Visual Dialog☆49Feb 18, 2020Updated 6 years ago
- [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models☆159Dec 9, 2024Updated last year
- A simplified version of MPN☆13May 21, 2021Updated 4 years ago
- CVPR 2021 Oral Paper PatchGenCN☆11Oct 28, 2021Updated 4 years ago
- Risky Object Localization (ROL) in a Driving Scene Dataset☆15Dec 24, 2023Updated 2 years ago
- Serverless GPU API endpoints on Runpod - Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- ☆63Apr 7, 2026Updated last week
- \infty-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation☆20Feb 14, 2025Updated last year
- [WIP🚧] 2025 up-to-date list of resources on visual tokenizers (primarily for visual generation). Give it a star 🌟 if you find it useful…☆20Jan 5, 2025Updated last year
- [arXiv 2024] PyTorch implementation of RRD: https://arxiv.org/abs/2407.12073☆15Dec 2, 2025Updated 4 months ago
- [NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering☆198Jan 14, 2024Updated 2 years ago
- [arXiv'25] LiCoMemory: Lightweight and Cognitive Agentic Memory for Efficient Long-Term Reasoning☆41Jan 6, 2026Updated 3 months ago
- ☆107Jul 30, 2024Updated last year