WPR001 / Ego-ST
☆13 · Updated this week
Alternatives and similar repositories for Ego-ST:
Users interested in Ego-ST are comparing it to the repositories listed below.
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆40 · Updated last week
- R1-like Video-LLM for Temporal Grounding ☆62 · Updated 2 weeks ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆81 · Updated last year
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024 ☆37 · Updated 3 months ago
- [CVPR 2025] PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆84 · Updated 3 weeks ago
- [CVPR 2025] The official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models" ☆140 · Updated 3 weeks ago
- Accepted by CVPR 2024 ☆33 · Updated 10 months ago
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression ☆54 · Updated last month
- [CVPR'25] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆65 · Updated this week
- Official PyTorch code of GroundVQA (CVPR'24) ☆59 · Updated 6 months ago
- [Open LLaVA-Video-R1] ✨ First adaptation of R1 to LLaVA-Video ☆24 · Updated 2 weeks ago
- MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer ☆41 · Updated 6 months ago
- Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation" ☆22 · Updated last month
- AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segm… ☆80 · Updated 3 months ago
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆154 · Updated 2 months ago
- Evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆23 · Updated 3 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆57 · Updated 9 months ago
- A paper list for spatial reasoning ☆52 · Updated last month
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models ☆79 · Updated 5 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆42 · Updated last week
- 🔥 CVPR 2025 Multimodal Large Language Models Paper List ☆123 · Updated 3 weeks ago
- Repo for the paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs" ☆49 · Updated 3 weeks ago
- Official implementation of MIA-DPO ☆54 · Updated 2 months ago
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos ☆23 · Updated 6 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆83 · Updated 7 months ago
- Latest advances on (RL-based) multimodal reasoning and generation in Multimodal Large Language Models ☆17 · Updated this week
- Official implementation of the paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" ☆88 · Updated 3 weeks ago
- Official code for the paper "[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster" ☆64 · Updated 3 months ago
- [NeurIPS'24] Efficient and accurate memory-saving method towards W4A4 large multi-modal models ☆68 · Updated 3 months ago
- Official code for MotionBench (CVPR 2025) ☆31 · Updated last month