Scaffold Prompting to promote LMMs
☆46Dec 16, 2024Updated last year
Alternatives and similar repositories for Scaffold
Users who are interested in Scaffold are comparing it to the libraries listed below.
- Dettoolchain: A new prompting paradigm to unleash detection ability of MLLM☆45Oct 12, 2024Updated last year
- The repository of the ACCV 2024 paper "FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Ge…☆11Jul 28, 2025Updated 8 months ago
- ☆12Dec 20, 2024Updated last year
- LDA-1B: Scaling Latent Dynamics Action Model via Universal Embodied Data Ingestion☆84Mar 18, 2026Updated 3 weeks ago
- Implementation of Language-Conditioned Path Planning (Amber Xie, Youngwoon Lee, Pieter Abbeel, Stephen James)☆26Sep 1, 2023Updated 2 years ago
- Extract features and bounding boxes using the original Bottom-up Attention Faster-RCNN in a few lines of Python code☆11Sep 18, 2022Updated 3 years ago
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"☆146Jun 20, 2024Updated last year
- [CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention☆65Jul 16, 2024Updated last year
- ☆19Oct 28, 2025Updated 5 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆54Jul 23, 2025Updated 8 months ago
- The Code for Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models☆18Oct 4, 2024Updated last year
- Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space'☆19Jul 21, 2024Updated last year
- Data and code for the paper: Finding Safety Neurons in Large Language Models☆25Jan 29, 2026Updated 2 months ago
- Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups☆51Dec 23, 2024Updated last year
- Object detection and keypoint detection. A pure version of CenterNet, convenient for secondary development and easy to understand.☆21Dec 9, 2020Updated 5 years ago
- ☆21Aug 9, 2024Updated last year
- Spatial Aptitude Training for Multimodal Language Models☆27Feb 8, 2026Updated 2 months ago
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"☆27Mar 13, 2026Updated 3 weeks ago
- VLS: Steering Pretrained Robot Policies via Vision–Language Models☆46Mar 29, 2026Updated last week
- ☆36Apr 14, 2023Updated 2 years ago
- ☆20Jan 3, 2025Updated last year
- ☆56Oct 25, 2025Updated 5 months ago
- ☆14Apr 25, 2025Updated 11 months ago
- A collection of medical VLP papers☆20Jul 24, 2024Updated last year
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024)☆40Jul 13, 2024Updated last year
- ☆10Dec 15, 2024Updated last year
- [EMNLP 2024] This is the code for our paper "BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers".☆24Sep 19, 2024Updated last year
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆80Nov 20, 2025Updated 4 months ago
- Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection"☆27Jul 10, 2023Updated 2 years ago
- [ACM TOMM] Official implementation of "TextCoT: Zoom-In for Enhanced Multimodal Text-Rich Image Understanding"☆44Feb 27, 2026Updated last month
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆87Oct 26, 2025Updated 5 months ago
- Graph Cut Algorithm in CUDA☆28Jun 1, 2019Updated 6 years ago
- ☆110Jan 3, 2026Updated 3 months ago
- ☆24Nov 29, 2023Updated 2 years ago
- ☆16Mar 10, 2025Updated last year
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models☆104Feb 16, 2025Updated last year
- [CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding☆47Feb 5, 2026Updated 2 months ago
- Implementation of games202 homework in C++ and OpenGL☆13Sep 14, 2022Updated 3 years ago
- A Vision-Language Model for Spatial Affordance Prediction in Robotics☆218Jul 17, 2025Updated 8 months ago