FudanNLPLAB/MouSi

☆75

Alternatives and similar repositories for MouSi

Users who are interested in MouSi are comparing it to the repositories listed below.

RUCAIBox / ComVint
The official GitHub page for "What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins…
☆19 · Nov 10, 2023 · Updated 2 years ago
jiaangli / VILA
[TACL/EMNLP'24] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
☆16 · Nov 22, 2024 · Updated last year
palchenli / VL-Instruction-Tuning
☆90 · Nov 25, 2023 · Updated 2 years ago
SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆44 · Jun 28, 2024 · Updated 2 years ago
mlvlab / ProMetaR
Official implementation of CVPR 2024 paper "Prompt Learning via Meta-Regularization".
☆31 · Mar 10, 2025 · Updated last year
HashmatShadab / MambaRobustness
[CVPRW 2025] Official repository of paper titled "Towards Evaluating the Robustness of Visual State Space Models"
☆26 · Jun 8, 2025 · Updated last year
facebookresearch / DCI
Densely Captioned Images (DCI) dataset repository.
☆197 · Jul 1, 2024 · Updated 2 years ago
QUVA-Lab / PIN
Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
☆26 · Jan 14, 2025 · Updated last year
tsb0601 / MMVP
☆364 · Jan 27, 2024 · Updated 2 years ago
OpenGVLab / MM-Interleaved
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
☆255 · Apr 3, 2024 · Updated 2 years ago
TRI-ML / vlm-evaluation
VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning
☆139 · Sep 17, 2024 · Updated last year
fxmeng / mixtral_spliter
Converting Mixtral-8x7B to Mixtral-[1~7]x7B
☆22 · Mar 4, 2024 · Updated 2 years ago
PKU-YuanGroup / MoE-LLaVA
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
☆2,322 · Jul 15, 2025 · Updated last year
google / imageinwords
Data release for the ImageInWords (IIW) paper.
☆224 · Nov 17, 2024 · Updated last year
RLHF-V / RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
☆310 · Sep 11, 2024 · Updated last year
codezakh / LilT
[ICLR 23] Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning
☆40 · Jul 29, 2023 · Updated 2 years ago
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆59 · Jun 27, 2023 · Updated 3 years ago
foundation-multimodal-models / CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
☆58 · Sep 26, 2024 · Updated last year
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆110 · May 27, 2025 · Updated last year
khanrc / honeybee
Official implementation of project Honeybee (CVPR 2024)
☆468 · May 10, 2024 · Updated 2 years ago
penghao-wu / vstar
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
☆707 · Jan 7, 2024 · Updated 2 years ago
JIA-Lab-research / LLaMA-VID
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
☆861 · Jul 29, 2024 · Updated last year
opendatalab / VIGC
AAAI 2024: Visual Instruction Generation and Correction
☆97 · Feb 4, 2024 · Updated 2 years ago
CASIA-IVA-Lab / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆57 · Mar 9, 2025 · Updated last year
Terry-Xu-666 / visual_inference_chain
This repository contains the official code for our paper: Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visu…
☆25 · Nov 15, 2024 · Updated last year
TRI-ML / prismatic-vlms
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
☆1,005 · Jul 4, 2024 · Updated 2 years ago
IdoAmos / not-from-scratch
☆33 · Oct 22, 2024 · Updated last year
ChenShawn / MultiModal-Jupyter-Sandbox
Simple code sandbox supporting Jupyter-notebook-style code execution. Used for agent training.
☆24 · Dec 5, 2025 · Updated 7 months ago
mbzuai-oryx / Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
☆264 · Aug 5, 2025 · Updated 11 months ago
mcahny / rovit
RO-ViT CVPR 2023 "Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers"
☆17 · Aug 24, 2023 · Updated 2 years ago
jfkuang / CFAM
Contrast-guided Feature Adjustment Module for Visual Information Extraction
☆30 · May 23, 2023 · Updated 3 years ago
llmeval / LLMEval-2
[AAAI 2024] LLMEval Phase II dataset: professional domain evaluation across 12 academic disciplines
☆71 · May 21, 2026 · Updated 2 months ago
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆174 · Sep 25, 2024 · Updated last year
zjunlp / TRICE
[NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback
☆43 · Mar 14, 2024 · Updated 2 years ago
jy0205 / LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆603 · Oct 6, 2024 · Updated last year
gordonhu608 / MQT-LLaVA
[NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models
☆126 · Jul 1, 2024 · Updated 2 years ago
zepingyu0512 / arithmetic-mechanism
Code for the EMNLP 2024 paper: Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
☆12 · Nov 17, 2024 · Updated last year
Osilly / dynamic_llava
[ICLR 2025] The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont…
☆72 · Sep 18, 2025 · Updated 10 months ago
UX-Decoder / FIND
[NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"
☆132 · Aug 21, 2024 · Updated last year