[NeurIPS'25] ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding
☆46Sep 21, 2025Updated 5 months ago
Alternatives and similar repositories for ReAgent-V
Users that are interested in ReAgent-V are comparing it to the libraries listed below
Sorting:
- Code of LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents☆28Nov 24, 2025Updated 3 months ago
- Agentic Keyframe Search for Video Question Answering☆16Apr 7, 2025Updated 10 months ago
- When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought☆27Feb 14, 2026Updated 2 weeks ago
- [NeurIPS 2025] PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer☆28Oct 2, 2025Updated 5 months ago
- EMMA [TMLR 2025]☆12Sep 25, 2025Updated 5 months ago
- [ICLR'26] EduVisAgent: A Benchmark and Multi-Agent Framework for Pedagogical Visualization☆28Aug 5, 2025Updated 6 months ago
- TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models☆19Jan 2, 2025Updated last year
- Research works from Tencent AI Lab regarding self-evolving agents☆82Jan 30, 2026Updated last month
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"☆31Feb 22, 2026Updated last week
- ☆48Jan 13, 2026Updated last month
- [ICLR 2025] Code for Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models☆24Apr 14, 2025Updated 10 months ago
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"☆20Sep 15, 2023Updated 2 years ago
- Open-Vocabulary Panoptic Segmentation☆27Jun 15, 2025Updated 8 months ago
- [CVPR 2024] Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective☆21Aug 18, 2024Updated last year
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning☆91Apr 30, 2024Updated last year
- ☆23Sep 5, 2023Updated 2 years ago
- Official Implementation (Pytorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Capti…☆23Jan 26, 2025Updated last year
- [ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding"☆109Dec 4, 2024Updated last year
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆35Jul 15, 2025Updated 7 months ago
- Universal Video Temporal Grounding with Generative Multi-modal Large Language Models☆46Nov 25, 2025Updated 3 months ago
- ☆18Jun 10, 2025Updated 8 months ago
- The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanc…☆12Oct 14, 2024Updated last year
- ☆105Feb 4, 2026Updated 3 weeks ago
- [AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models☆38Jan 27, 2026Updated last month
- [ICRA 2026] StereoAdapter: Adapting Stereo Depth Estimation to Underwater Scenes☆20Feb 17, 2026Updated last week
- ☆65Feb 23, 2026Updated last week
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking☆13Apr 12, 2023Updated 2 years ago
- MiniMax-Provider-Verifier offers a rigorous, vendor-agnostic way to verify whether third-party deployments of the Minimax M2 model are co…☆28Feb 18, 2026Updated last week
- The repository of VG-Refiner paper☆17Dec 9, 2025Updated 2 months ago
- (ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"☆45Jul 1, 2025Updated 8 months ago
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos☆115Dec 12, 2025Updated 2 months ago
- ☆36Jul 9, 2025Updated 7 months ago
- MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)☆35Apr 23, 2024Updated last year
- Community maintained hardware plugin for vLLM on AWS Neuron☆23Updated this week
- 用Kinect2.0读取图像的深度等信息,分割出手部图像。用HOG提取手部图像信息,接着用SVM进行训练。目的是为了识别手势。☆10Jan 8, 2020Updated 6 years ago
- ☆10Apr 7, 2025Updated 10 months ago
- DisTime: Distribution-based Time Representation for Video Large Language Models.☆18Jul 10, 2025Updated 7 months ago
- Official Implementation of HIMA (COLM'25)☆19Nov 25, 2025Updated 3 months ago
- Gesture Recognition Based on ALTERA DE2-115 FPGA☆10Mar 18, 2014Updated 11 years ago