Scaffold Prompting to promote LMMs
☆46Dec 16, 2024Updated last year
Alternatives and similar repositories for Scaffold
Users that are interested in Scaffold are comparing it to the libraries listed below
Sorting:
- Dettoolchain: A new prompting paradigm to unleash detection ability of MLLM☆45Oct 12, 2024Updated last year
- Official Implementation of UA^{2}-Agent and other baseline algorithms of "Towards Unified Alignment Between Agents, Humans, and Environme…☆19Nov 12, 2024Updated last year
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".☆12Oct 14, 2024Updated last year
- ☆12Dec 20, 2024Updated last year
- [ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data☆14Sep 30, 2023Updated 2 years ago
- The repository of the ACCV 2024 paper "FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Ge…☆11Jul 28, 2025Updated 6 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆53Jul 23, 2025Updated 7 months ago
- The Code for Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models☆17Oct 4, 2024Updated last year
- ☆18Oct 28, 2025Updated 3 months ago
- ☆17Oct 22, 2024Updated last year
- HACMan++ code release. RSS 2024.☆22Dec 23, 2024Updated last year
- Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups☆50Dec 23, 2024Updated last year
- Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space'☆19Jul 21, 2024Updated last year
- The collection of medical VLP papars☆20Jul 24, 2024Updated last year
- Implementation of Language-Conditioned Path Planning (Amber Xie, Youngwoon Lee, Pieter Abbeel, Stephen James)☆25Sep 1, 2023Updated 2 years ago
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"☆27Jan 17, 2026Updated last month
- ☆21Aug 9, 2024Updated last year
- Chest X-Ray Explainer (ChEX)☆23Jan 30, 2025Updated last year
- Code for recreating the HoS benchmark of VISOR☆22Jul 2, 2023Updated 2 years ago
- ☆20Jan 3, 2025Updated last year
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"☆145Jun 20, 2024Updated last year
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆86Oct 26, 2025Updated 4 months ago
- Code☆42Nov 24, 2025Updated 3 months ago
- [EMNLP 2024] This is the code for our paper "BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers".☆22Sep 19, 2024Updated last year
- Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection"☆27Jul 10, 2023Updated 2 years ago
- [CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention☆61Jul 16, 2024Updated last year
- ☆55Oct 25, 2025Updated 4 months ago
- Automatic Integration for Neural Spatio-Temporal Point Process models (AI-STPP) is a new paradigm for exact, efficient, non-parametric inf…☆25Oct 14, 2024Updated last year
- LogiCity@NeurIPS'24, D&B track. A multi-agent inductive learning environment for "abstractions".☆27Jun 10, 2025Updated 8 months ago
- Source Code for SIGGRAPH Asia 2024 Paper "Spatiotemporal Bilateral Gradient Filtering for Inverse Rendering"☆25Jan 29, 2025Updated last year
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models☆92Feb 16, 2025Updated last year
- LLaVa Version of RaDialog☆26May 27, 2025Updated 9 months ago
- I learn about and explain quantization☆26Apr 19, 2024Updated last year
- Pytorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models☆28Mar 22, 2024Updated last year
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆77Nov 20, 2025Updated 3 months ago
- ☆29Feb 7, 2024Updated 2 years ago
- This repository provides the data and the codes used in the AAAI'24 paper, COOPER: Coordinating Specialized Agents towards a Complex Dial…☆27Mar 1, 2024Updated last year
- [CVPR2025] Official repository for "VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide"☆28May 27, 2025Updated 9 months ago
- Code for the paper "RECAP: Towards Precise Radiology Report Generation via Dynamic Disease Progression Reasoning" (EMNLP'23 Findings).☆28Jun 12, 2025Updated 8 months ago