MM-FIRE / FIRE
★13 · Updated last year
Alternatives and similar repositories for FIRE
Users who are interested in FIRE are comparing it to the libraries listed below.
- MAT: Multi-modal Agent Tuning 🔥 ICLR 2025 (Spotlight) ★77 · Updated 5 months ago
- [IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning ★75 · Updated last year
- [CVPR 2025] VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning ★14 · Updated 6 months ago
- ★108 · Updated 4 months ago
- ★14 · Updated 2 years ago
- ★84 · Updated last year
- EMNLP 2023 - InfoSeek: A New VQA Benchmark Focused on Visual Info-Seeking Questions ★25 · Updated last year
- [NeurIPS 25] The official implementation of SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning ★24 · Updated 2 months ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ★69 · Updated 5 months ago
- [NeurIPS D&B Track 2024] Source code for the paper "Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge… ★22 · Updated 7 months ago
- Official Repository of LatentSeek ★70 · Updated 6 months ago
- Official repo for EscapeCraft (a 3D environment for room escape) and the benchmark MM-Escape. This work has been accepted by ICCV 2025. ★35 · Updated 5 months ago
- Evaluate Multimodal LLMs as Embodied Agents ★54 · Updated 10 months ago
- The code of the paper "DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects" ★19 · Updated 7 months ago
- ★133 · Updated last year
- ★55 · Updated last year
- MuMA-ToM: Multi-modal Multi-Agent Theory of Mind ★36 · Updated 10 months ago
- ✨✨ The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio ★50 · Updated 5 months ago
- The official implementation of "Grounded Chain-of-Thought for Multimodal Large Language Models" ★19 · Updated 4 months ago
- ★28 · Updated 10 months ago
- Extending context length of visual language models ★12 · Updated 11 months ago
- ★35 · Updated last year
- A Self-Training Framework for Vision-Language Reasoning ★87 · Updated 10 months ago
- Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025] ★43 · Updated 4 months ago
- [ICLR 2025 Spotlight] Agent Trajectory Synthesis via Guiding Replay with Web Tutorials ★45 · Updated 9 months ago
- [ACL 2025] A Neural-Symbolic Self-Training Framework ★117 · Updated 6 months ago
- Research works from Tencent AI Lab regarding self-evolving agents ★70 · Updated 3 months ago
- [NeurIPS 2025] Scaling Language-centric Omnimodal Representation Learning ★30 · Updated last month
- My commonly used tools ★63 · Updated 11 months ago
- ★21 · Updated 7 months ago