HaozheZhao / MICView external linksLinks
MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU
☆360Dec 18, 2023Updated 2 years ago
Alternatives and similar repositories for MIC
Users that are interested in MIC are comparing it to the libraries listed below
Sorting:
- ☆14Nov 14, 2023Updated 2 years ago
- MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU☆50Jul 13, 2025Updated 7 months ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆106Mar 14, 2024Updated last year
- ☆805Jul 8, 2024Updated last year
- Emu Series: Generative Multimodal Models from BAAI☆1,765Jan 12, 2026Updated last month
- ☆133Dec 22, 2023Updated 2 years ago
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wildrounded Conversation☆47Sep 19, 2023Updated 2 years ago
- [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning☆296Mar 13, 2024Updated last year
- Code for "SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER" @COLING-2022☆11Aug 20, 2022Updated 3 years ago
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content☆603Oct 6, 2024Updated last year
- mPLUG-Owl: The Powerful Multi-modal Large Language Model Family☆2,537Apr 2, 2025Updated 10 months ago
- Lion: Kindling Vision Intelligence within Large Language Models☆51Jan 25, 2024Updated 2 years ago
- 🦩 Official repository of paper "Visual Instruction Tuning with Polite Flamingo" (AAAI-24 Oral)☆65Dec 9, 2023Updated 2 years ago
- [Findings of EMNLP22] From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models☆19Mar 16, 2023Updated 2 years ago
- Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"☆269Jun 12, 2024Updated last year
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models☆37Sep 19, 2023Updated 2 years ago
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"☆864May 8, 2025Updated 9 months ago
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.☆952Mar 19, 2025Updated 10 months ago
- Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag…☆557Apr 21, 2024Updated last year
- Findings of EMNLP 2023: InfoCL: Alleviating Catastrophic Forgetting in Continual Text Classification from An Information Theoretic Perspe…☆14Aug 13, 2024Updated last year
- Vision Large Language Models trained on M3IT instruction tuning dataset☆17Aug 16, 2023Updated 2 years ago
- paper: https://arxiv.org/abs/2307.02469 page: https://lynx-llm.github.io/☆270Aug 9, 2023Updated 2 years ago
- InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions☆2,919May 26, 2025Updated 8 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback☆306Sep 11, 2024Updated last year
- MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning☆134Jun 20, 2023Updated 2 years ago
- [ACL 2023] Delving into the Openness of CLIP☆24Jan 11, 2023Updated 3 years ago
- (CVPR2024)A benchmark for evaluating Multimodal LLMs using multiple-choice questions.☆360Jan 14, 2025Updated last year
- Source code for paper "ATP: AMRize Than Parse! Enhancing AMR Parsing with PseudoAMRs" @NAACL-2022☆14Mar 31, 2023Updated 2 years ago
- Long Context Transfer from Language to Vision☆400Mar 18, 2025Updated 10 months ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer☆248Apr 3, 2024Updated last year
- Dataset and baseline for Coling 2022 long paper (oral): "ConFiguRe: Exploring Discourse-level Chinese Figures of Speech"☆13Jul 27, 2023Updated 2 years ago
- Rationale-enhanced language models are better continual relation learners (EMNLP 2023 Main Conference)☆12Oct 11, 2023Updated 2 years ago
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …☆504Aug 9, 2024Updated last year
- An open-source framework for training large multimodal models.☆4,068Aug 31, 2024Updated last year
- An Open-source Toolkit for LLM Development☆2,804Jan 13, 2025Updated last year
- [ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua…☆553Jan 4, 2025Updated last year
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"☆105Nov 9, 2023Updated 2 years ago
- Multimodal-GPT☆1,518Jun 4, 2023Updated 2 years ago
- 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp…☆3,319Mar 5, 2024Updated last year