traveler-framework / TraveLER
[EMNLP 2024] TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering
☆15 · Updated 7 months ago
Alternatives and similar repositories for TraveLER
Users interested in TraveLER are comparing it to the repositories listed below.
- [CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation". ☆32 · Updated 2 weeks ago
- Code release for VTW (AAAI 2025) Oral ☆43 · Updated 4 months ago
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models. ☆63 · Updated 2 months ago
- Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models ☆25 · Updated this week
- A Self-Training Framework for Vision-Language Reasoning ☆80 · Updated 4 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆63 · Updated 10 months ago
- ☆48 · Updated 2 months ago
- [CVPR' 25] Interleaved-Modal Chain-of-Thought ☆45 · Updated last month
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆66 · Updated last month
- ☆46 · Updated last month
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model ☆31 · Updated 5 months ago
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency ☆38 · Updated this week
- [ICLR 2025] The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont… ☆40 · Updated 6 months ago
- [Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. ☆96 · Updated 10 months ago
- ☆24 · Updated 3 months ago
- Collections of Papers and Projects for Multimodal Reasoning. ☆105 · Updated last month
- ☆84 · Updated 2 months ago
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models ☆48 · Updated last week
- Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" ☆113 · Updated 3 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆85 · Updated 9 months ago
- ☆77 · Updated 5 months ago
- Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input ☆65 · Updated 9 months ago
- ☆101 · Updated last month
- [LLaVA-Video-R1] ✨First Adaptation of R1 to LLaVA-Video (2025-03-18) ☆28 · Updated 3 weeks ago
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) ☆50 · Updated 7 months ago
- A curated collection of resources, tools, and frameworks for developing GUI Agents. ☆51 · Updated last week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆103 · Updated last week
- TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆42 · Updated 2 weeks ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency ☆108 · Updated last month
- ☆92 · Updated 5 months ago