microsoft / react
REACT (CVPR 2023, Highlight 2.5%)
☆130 Updated last year
Related projects:
- Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone ☆127 Updated 11 months ago
- Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition" ☆251 Updated 4 months ago
- Code and Models for "GeneCIS: A Benchmark for General Conditional Image Similarity" ☆54 Updated last year
- ☆100 Updated last month
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER" ☆138 Updated last week
- ☆80 Updated 4 months ago
- Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training ☆130 Updated last year
- Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143) ☆149 Updated 9 months ago
- Densely Captioned Images (DCI) dataset repository. ☆155 Updated 2 months ago
- UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation) ☆84 Updated last year
- Open source code for AAAI 2023 Paper "BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning" ☆149 Updated last year
- [CVPR2023] The code for 《Position-guided Text Prompt for Vision-Language Pre-training》 ☆148 Updated last year
- ☆128 Updated 8 months ago
- SVIT: Scaling up Visual Instruction Tuning ☆159 Updated 3 months ago
- Toolkit for Elevater Benchmark ☆65 Updated 11 months ago
- [NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites" ☆251 Updated 8 months ago
- Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning ☆93 Updated 2 months ago
- Dense Connector for MLLMs ☆98 Updated last month
- Official repo for StableLLAVA ☆90 Updated 8 months ago
- PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models" ☆181 Updated 8 months ago
- ☆163 Updated last year
- [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning ☆244 Updated 6 months ago
- Implementation for the CVPR 2023 paper "Improving Selective Visual Question Answering by Learning from Your Peers" (https://arxiv.org/abs… ☆23 Updated last year
- ☆45 Updated 2 months ago
- Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆107 Updated 3 weeks ago
- Official implementation of the Law of Vision Representation in MLLMs ☆93 Updated last week
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" ☆67 Updated 5 months ago
- This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models" ☆138 Updated 5 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆100 Updated 2 months ago
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds ☆80 Updated 2 months ago