[Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics]: VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
☆102Jul 24, 2025Updated 7 months ago
Alternatives and similar repositories for VisuoThink
Users who are interested in VisuoThink are comparing it to the repositories listed below
- An implementation of Scalable Evaluation and Improvement of Document Set Expansion via Neural Positive-Unlabeled Learning without AllenNL…☆19Feb 20, 2024Updated 2 years ago
- Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]☆45Jul 22, 2025Updated 7 months ago
- ☆13Dec 9, 2024Updated last year
- OpenThinkIMG is an end-to-end open-source framework that empowers Large Vision-Language Models to think with images.☆116Jul 11, 2025Updated 7 months ago
- Course Website for OS Autumn 2021 at Fudan University☆14Feb 1, 2022Updated 4 years ago
- (CVPR2025 Highlight) Official repository of paper "Panorama Generation From NFoV Image Done Right"☆19May 29, 2025Updated 9 months ago
- ☆20Jul 22, 2025Updated 7 months ago
- Code of the COLING22 paper "uChecker: Masked Pretrained Language Models as Unsupervised Chinese Spelling Checkers"☆19Aug 17, 2022Updated 3 years ago
- [NeurIPS 2025] Codes for paper Foundation Cures Personalization: Improving Personalized Models' Prompt Consistency via Hidden Foundation …☆133Sep 20, 2025Updated 5 months ago
- Chat with Arxiv Paper 📑 (ChatGPT) / Understand papers through conversation☆27Apr 6, 2023Updated 2 years ago
- ☆132Mar 22, 2025Updated 11 months ago
- The official implementation of our NeurIPS 2025 Poster paper: Precise Diffusion Inversion: Towards Novel Samples and Few-Step Model☆88Nov 30, 2025Updated 3 months ago
- [ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models☆49Jul 7, 2025Updated 7 months ago
- A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.☆734Updated this week
- [ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers☆34Dec 30, 2024Updated last year
- ☆93Jul 7, 2025Updated 7 months ago
- MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models☆32Jan 22, 2025Updated last year
- ☆20Mar 3, 2023Updated 3 years ago
- [CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents☆59May 26, 2025Updated 9 months ago
- [ICLR 2026 Oral] Visual Planning: Let's Think Only with Images☆304Feb 24, 2026Updated last week
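- Add USDT payment collection to your website or app with no blockchain knowledge required. Supports Telegram and Dujiaoka (独角数卡), callbacks, and a range of programming languages; the whole process takes only 2 steps, so even beginners can integrate it. A USDT wallet development and automatic payment API☆214Feb 9, 2026Updated 3 weeks ago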
- Align Anything: Training All-modality Model with Feedback☆4,636Nov 27, 2025Updated 3 months ago
- A collection of awesome papers on thinking with videos.☆90Dec 1, 2025Updated 3 months ago
- 2021MXAP-DGL rank2☆35Mar 23, 2022Updated 3 years ago
- XL-VLMs: General Repository for eXplainable Large Vision Language Models☆46Sep 8, 2025Updated 5 months ago
- [NeurIPS 2025] Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models☆64Nov 27, 2025Updated 3 months ago
- 🔥minerproxy, minerproxy, minerproxy, minerproxy, minerproxy, minerproxy, minerproxy, minerproxy, minerproxy, minerproxy, mining pool fee skimming, mining pool relay, built for mining farm operations☆3,246Jan 14, 2026Updated last month
- Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning.☆3,150Dec 15, 2025Updated 2 months ago
- BERT-based AI-generated academic text detection model☆220Nov 1, 2025Updated 4 months ago
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models☆37Sep 19, 2023Updated 2 years ago
- Science-Star: A Platform for Building, Extending, and Experimenting with Scientific Agents.☆743Feb 15, 2026Updated 2 weeks ago
- ☆27Dec 30, 2025Updated 2 months ago
- the datasets of our paper☆11Feb 26, 2024Updated 2 years ago
- Instituto de Telecomunicações Deep Learning-based Point Cloud Codec☆11Jun 18, 2024Updated last year
- Building a multi-agent RAG system with advanced RAG methods☆12Jan 12, 2025Updated last year
- [MQM-APE] Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators.☆11Sep 24, 2024Updated last year
- A simple exam generator and grader written in Python with OpenCV☆14Jan 14, 2026Updated last month
- Official implementation of the paper "Light Transport-aware Diffusion Posterior Sampling for Single View Reconstruction of Volumes"☆17Aug 1, 2025Updated 7 months ago
- [EMNLP 2024 Findings] Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information☆13Oct 1, 2024Updated last year