qiulu66 / EgoPlan-Bench2
☆22 Updated 2 months ago
Alternatives and similar repositories for EgoPlan-Bench2:
Users who are interested in EgoPlan-Bench2 are comparing it to the repositories listed below:
- ReNeg: Learning Negative Embedding with Reward Guidance ☆31 Updated 2 months ago
- Can 3D Vision-Language Models Truly Understand Natural Language? ☆21 Updated 11 months ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆27 Updated 3 months ago
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks ☆59 Updated 5 months ago
- ☆58 Updated last year
- Egocentric Video Understanding Dataset (EVUD) ☆26 Updated 7 months ago
- 👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024) ☆53 Updated last month
- Official Pytorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral). ☆51 Updated 2 weeks ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆30 Updated this week
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment ☆26 Updated 5 months ago
- The official implementation of JM3D. ☆29 Updated last year
- Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model' ☆33 Updated last year
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆46 Updated 4 months ago
- ☆58 Updated last year
- The repository contains the official implementation of "Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation" ☆33 Updated 3 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆32 Updated 8 months ago
- ☆27 Updated 7 months ago
- VisualGPTScore for visio-linguistic reasoning ☆27 Updated last year
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models ☆39 Updated last month
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) ☆64 Updated this week
- Official Implementation of ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25) ☆17 Updated 2 weeks ago
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆92 Updated 3 months ago
- Official code for MotionBench ☆24 Updated this week
- [CVPR 2024] Data and benchmark code for the EgoExoLearn dataset ☆54 Updated 6 months ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆28 Updated 3 months ago