KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding
☆68Apr 5, 2026Updated last month
Alternatives and similar repositories for KARL
Users that are interested in KARL are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆110May 27, 2025Updated 11 months ago
- This is the code repo for our paper "Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression".☆13Feb 27, 2024Updated 2 years ago
- ☆28Jan 22, 2026Updated 3 months ago
- ☆33May 9, 2025Updated last year
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆37May 9, 2026Updated last week
- Serverless GPU API endpoints on Runpod - Get Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- [CIKM 2023 Oral] This is the code repo for our CIKM‘23 paper "Text Matching Improves Sequential Recommendation by Reducing Popularity Bia…☆40Mar 17, 2024Updated 2 years ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation☆144Sep 11, 2025Updated 8 months ago
- [CVPR 2025] MicroVQA eval and 🤖RefineBot code for "MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research"…☆38Nov 25, 2025Updated 5 months ago
- Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025)☆30Oct 28, 2025Updated 6 months ago
- Code for the Paper 'On the Connection Between Adversarial Robustness and Saliency Map Interpretability' by C. Etmann, S. Lunz, P. Maass, …☆16May 9, 2019Updated 7 years ago
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models (ICLR2026)☆22Mar 29, 2026Updated last month
- ☆16Jul 23, 2024Updated last year
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation☆72Oct 17, 2025Updated 7 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning☆116Dec 24, 2025Updated 4 months ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- [CVPR'25] Conformal prediction for vision-language models. Enhancing VLMs deployment with reliability gurarantees.☆21Jun 7, 2025Updated 11 months ago
- Code for paper: Reinforced Vision Perception with Tools☆72Oct 3, 2025Updated 7 months ago
- ☆37Sep 16, 2024Updated last year
- official implementation of "Med-Unic: unifying cross-lingual medical vision-language pre-training by diminishing bias"☆17Sep 22, 2023Updated 2 years ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"☆28Jul 15, 2025Updated 10 months ago
- [CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models☆240Nov 7, 2025Updated 6 months ago
- MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning☆774Sep 7, 2025Updated 8 months ago
- Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge☆121Apr 27, 2026Updated 3 weeks ago
- ORES: Open-vocabulary Responsible Visual Synthesis☆14Dec 12, 2023Updated 2 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ☆11Oct 2, 2024Updated last year
- [ACL 2024 Oral] This is the code repo for our ACL‘24 paper "MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Mo…☆39Jun 30, 2024Updated last year
- Official Implementation of CODE☆17Sep 26, 2024Updated last year
- [NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation☆111Sep 18, 2025Updated 8 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment☆35Jul 1, 2024Updated last year
- Pruning the VLLMs☆105Dec 9, 2024Updated last year
- [ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models☆89May 20, 2025Updated last year
- [ICLR 2023] This is the code repo for our ICLR‘23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa…☆52Jul 3, 2024Updated last year
- Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.☆845May 14, 2025Updated last year
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- 【NeurIPS 2024】Dense Connector for MLLMs☆183Oct 14, 2024Updated last year
- ☆60Jun 5, 2024Updated last year
- [EMNLP 2024 Findings🔥] Official implementation of ": LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…☆103Nov 9, 2024Updated last year
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆149Apr 9, 2025Updated last year
- ☆20Jun 10, 2025Updated 11 months ago
- ☆20Dec 31, 2020Updated 5 years ago
- MMPD Dataset from ECCV'2024 "When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset"☆21Jul 15, 2024Updated last year