ADL-X / LLAVIDAL
This is the official repository of LLAVIDAL.
☆14 · Updated last month
Alternatives and similar repositories for LLAVIDAL:
Users interested in LLAVIDAL are comparing it to the repositories listed below.
- ☆29 · Updated last month
- Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023] · ☆97 · Updated 9 months ago
- E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024) · ☆57 · Updated 3 months ago
- Language Repository for Long Video Understanding · ☆31 · Updated 10 months ago
- Code for the paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" · ☆108 · Updated last month
- [ECCV 2024 Oral] ActionVOS: Actions as Prompts for Video Object Segmentation · ☆31 · Updated 4 months ago
- Official PyTorch code of GroundVQA (CVPR'24) · ☆59 · Updated 7 months ago
- [CVPR 2024] Data and benchmark code for the EgoExoLearn dataset · ☆56 · Updated 7 months ago
- [CVPR 2024] Official PyTorch implementation of the paper "One For All: Video Conversation is Feasible Without Video Instruction Tuning" · ☆32 · Updated last year
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) · ☆69 · Updated 9 months ago
- [ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning · ☆31 · Updated 2 weeks ago
- Accepted by CVPR 2024 · ☆33 · Updated 11 months ago
- [ICLR'25] Multimodal Video Understanding Framework (MVU) · ☆36 · Updated 2 months ago
- ☆29 · Updated 7 months ago
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos · ☆24 · Updated last week
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models · ☆44 · Updated 3 months ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval · ☆35 · Updated 2 weeks ago
- Action Scene Graphs for Long-Form Understanding of Egocentric Videos (CVPR 2024) · ☆38 · Updated 2 weeks ago
- Code and dataset for the CVPRW paper "Where did I leave my keys? – Episodic-Memory-Based Question Answering on Egocentric Videos" · ☆25 · Updated last year
- Affordance Grounding from Demonstration Video to Target Image (CVPR 2023) · ☆43 · Updated 9 months ago
- Official code for MotionBench (CVPR 2025) · ☆35 · Updated last month
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding · ☆52 · Updated 9 months ago
- Official implementation of the paper "Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model" · ☆60 · Updated last year
- Official PyTorch code of ReKV (ICLR'25) · ☆13 · Updated last month
- Code and data release for the paper "Learning Object State Changes in Videos: An Open-World Perspective" (CVPR 2024) · ☆32 · Updated 7 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos · ☆41 · Updated 11 months ago
- Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023) · ☆74 · Updated last month
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval · ☆41 · Updated 6 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… · ☆34 · Updated last week
- ☆88 · Updated 3 months ago