scenarios/WeMM

☆90

Alternatives and similar repositories for WeMM

Users who are interested in WeMM are comparing it to the repositories listed below.

mynameischaos / Lion
Lion: Kindling Vision Intelligence within Large Language Models
☆51 · Jan 25, 2024 · Updated 2 years ago
PCIResearch / TransCore-M
Large Multimodal Model
☆15 · Apr 8, 2024 · Updated 2 years ago
buptlihang / CVLM
☆23 · Jan 8, 2024 · Updated 2 years ago
JiuTian-VL / JiuTian-LION
[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
☆154 · Sep 3, 2025 · Updated 10 months ago
kyegomez / PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
☆147 · Updated this week
RUCAIBox / ComVint
The official GitHub page for "What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins…
☆19 · Nov 10, 2023 · Updated 2 years ago
FuxiaoLiu / LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆297 · Mar 13, 2024 · Updated 2 years ago
Tencent-QQMM / PureMM
☆21 · Feb 29, 2024 · Updated 2 years ago
BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆77 · Mar 14, 2024 · Updated 2 years ago
InternLM / InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
☆2,921 · May 26, 2025 · Updated last year
FreedomIntelligence / MLLM-Bench
MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria
☆77 · Oct 16, 2024 · Updated last year
mightyzau / InfMLLM
☆19 · Dec 6, 2023 · Updated 2 years ago
YuchenLiu98 / COMM
PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models"
☆211 · Jan 8, 2025 · Updated last year
thunlp / Muffin
☆65 · Feb 5, 2024 · Updated 2 years ago
OpenGVLab / all-seeing
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of …
☆508 · Aug 9, 2024 · Updated last year
ResearchingDexter / ICDAR2019RecTS
Character recognition, textline recognition
☆10 · Aug 31, 2019 · Updated 6 years ago
TIGER-AI-Lab / VideoEval-Pro
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation [TMLR26]
☆15 · Jun 1, 2026 · Updated last month
luogen1996 / LLaVA-HR
[ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant
☆249 · Aug 14, 2024 · Updated last year
facebookresearch / flip
Official Open Source code for "Scaling Language-Image Pre-training via Masking"
☆428 · Mar 30, 2023 · Updated 3 years ago
OFA-Sys / TouchStone
Touchstone: Evaluating Vision-Language Models by Language Models
☆84 · Jan 18, 2024 · Updated 2 years ago
FuxiaoLiu / MMC
[NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning
☆95 · Jan 7, 2025 · Updated last year
zai-org / CogCoM
☆222 · Jul 5, 2024 · Updated 2 years ago
opendatalab / HA-DPO
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
☆104 · Jan 30, 2024 · Updated 2 years ago
shikras / shikra
☆814 · Jul 8, 2024 · Updated 2 years ago
X2FD / LVIS-INSTRUCT4V
☆134 · Dec 22, 2023 · Updated 2 years ago
TencentARC-QQ / QA-CLIP
Chinese CLIP models with SOTA performance.
☆63 · Aug 28, 2023 · Updated 2 years ago
Liuziyu77 / MMDU
Official repository of MMDU dataset
☆108 · Sep 29, 2024 · Updated last year
cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,008 · Nov 7, 2025 · Updated 8 months ago
AILab-CVC / SEED-Bench
(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
☆366 · Jan 14, 2025 · Updated last year
TIGER-AI-Lab / Mantis
Official code for the paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024 Best Paper]
☆239 · Jan 3, 2026 · Updated 6 months ago
JIA-Lab-research / MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
☆3,329 · May 4, 2024 · Updated 2 years ago
WeChatCV / D-ORCA
D-ORCA: Dialogue-Centric Optimization for Robust Audio-Visual Captioning
☆15 · Feb 11, 2026 · Updated 5 months ago
TencentYoutuResearch / PersonRetrieval-IVT
Code for ECCV 2022 Workshop paper "See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval"
☆23 · Nov 16, 2025 · Updated 8 months ago
TideDra / VL-RLHF
An RLHF Infrastructure for Vision-Language Models
☆201 · Nov 15, 2024 · Updated last year
X-PLUG / mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
☆2,537 · Apr 2, 2025 · Updated last year
Fancy-MLLM / R1-Onevision
R1-onevision, a visual language model capable of deep CoT reasoning.
☆581 · Apr 13, 2025 · Updated last year
VITA-MLLM / Woodpecker
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
☆649 · Dec 23, 2024 · Updated last year
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆964 · Aug 5, 2025 · Updated 11 months ago
Feynben / ADAS
A Simple Active-and-Adaptive Baseline for Cross-Domain 3D Semantic Segmentation
☆13 · Dec 22, 2022 · Updated 3 years ago