traveler-framework / TraveLER
[EMNLP 2024] TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering
☆13Updated 5 months ago
Alternatives and similar repositories for TraveLER:
Users who are interested in TraveLER are comparing it to the repositories listed below.
- Code release for VTW (AAAI 2025) Oral☆34Updated 3 months ago
- Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation".☆26Updated 2 months ago
- ☆85Updated this week
- Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"☆105Updated last month
- ☆86Updated 3 months ago
- This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehens…☆68Updated last month
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…☆92Updated 5 months ago
- A Self-Training Framework for Vision-Language Reasoning☆75Updated 2 months ago
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"☆82Updated last year
- [Blog 1] Recording a bug of grpo_trainer in some R1 projects☆18Updated last month
- ☆144Updated 5 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models☆53Updated 9 months ago
- Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models☆19Updated last week
- Envolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward☆30Updated 3 weeks ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models☆124Updated 11 months ago
- [CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding☆265Updated 6 months ago
- Up-to-date curated list of state-of-the-art research, papers & resources on hallucinations in large vision-language models☆117Updated last week
- Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input☆64Updated 7 months ago
- [CVPR 2025] VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning☆11Updated last month
- The official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark"☆50Updated this week
- Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]☆10Updated 8 months ago
- ☆71Updated 3 months ago
- [NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.☆94Updated 8 months ago
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.☆52Updated last month
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"☆94Updated 5 months ago
- A hot-pluggable tool for visualizing LLaVA's attention.☆15Updated last year
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs☆114Updated 5 months ago
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization☆86Updated last year
- ☆99Updated last week
- [ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.☆67Updated 2 months ago