zai-org/CogCoM

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/zai-org/CogCoM)

zai-org / CogCoM

☆222

Alternatives and similar repositories for CogCoM

Users that are interested in CogCoM are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

GuangyanS / Sys2-LLaVA
View on GitHub
☆31Feb 10, 2025Updated last year
cambrian-mllm / cambrian
View on GitHub
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,008Nov 7, 2025Updated 8 months ago
Visual-Agent / DeepEyes
View on GitHub
☆1,251Nov 20, 2025Updated 8 months ago
RLHF-V / RLAIF-V
View on GitHub
[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
☆457May 14, 2025Updated last year
penghao-wu / vstar
View on GitHub
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
☆707Jan 7, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
njucckevin / MM-Self-Improve
View on GitHub
A Self-Training Framework for Vision-Language Reasoning
☆90Jan 23, 2025Updated last year
TencentARC / GVT
View on GitHub
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆59Jun 27, 2023Updated 3 years ago
mbzuai-oryx / groundingLMM
View on GitHub
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆964Aug 5, 2025Updated 11 months ago
EternityYW / Gemini-Commonsense-Evaluation
View on GitHub
Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"
☆38Jan 3, 2024Updated 2 years ago
zai-org / LVBench
View on GitHub
[ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark
☆145Jul 9, 2025Updated last year
zai-org / CogVLM
View on GitHub
a state-of-the-art-level open visual language model | 多模态预训练模型
☆6,743May 29, 2024Updated 2 years ago
OpenGVLab / all-seeing
View on GitHub
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆508Aug 9, 2024Updated last year
thunlp / LLaVA-UHD
View on GitHub
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
☆423Jul 6, 2026Updated 2 weeks ago
Ucas-HaoranWei / Slow-Perception
View on GitHub
Official code implementation of Slow Perception:Let's Perceive Geometric Figures Step-by-step
☆163Jul 28, 2025Updated 11 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
RUCAIBox / Virgo
View on GitHub
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆110May 27, 2025Updated last year
TideDra / VL-RLHF
View on GitHub
A RLHF Infrastructure for Vision-Language Models
☆201Nov 15, 2024Updated last year
FanqingM / MM-Eureka-V0
View on GitHub
MM-Eureka V0 also called R1-Multimodal-Journey, Latest version is in MM-Eureka
☆325Jun 21, 2025Updated last year
InternLM / InternLM-XComposer
View on GitHub
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
☆2,921May 26, 2025Updated last year
deepcs233 / Visual-CoT
View on GitHub
[Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆447Dec 22, 2024Updated last year
Wang-Xiaodong1899 / Open-R1-Video
View on GitHub
✨First Open-Source R1-like Video-LLM [2025/02/18]
☆382Jul 1, 2026Updated 3 weeks ago
dongyh20 / Insight-V
View on GitHub
[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
☆240Nov 7, 2025Updated 8 months ago
SHI-Labs / VCoder
View on GitHub
[CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models
☆280Apr 17, 2024Updated 2 years ago
lupantech / MathVista
View on GitHub
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
☆367Sep 29, 2025Updated 9 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
TempleX98 / MoVA
View on GitHub
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆174Sep 25, 2024Updated last year
alibaba / conv-llava
View on GitHub
☆128Jul 29, 2024Updated last year
TideDra / lmm-r1
View on GitHub
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
☆848May 14, 2025Updated last year
scenarios / WeMM
View on GitHub
☆90Jul 4, 2024Updated 2 years ago
luogen1996 / LLaVA-HR
View on GitHub
[ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant
☆249Aug 14, 2024Updated last year
TencentARC / ViSFT
View on GitHub
☆38Jan 20, 2024Updated 2 years ago
RUCAIBox / POPE
View on GitHub
The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''
☆266Aug 21, 2025Updated 11 months ago
YuchenLiu98 / COMM
View on GitHub
Pytorch code for paper From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
☆211Jan 8, 2025Updated last year
Liuziyu77 / Visual-RFT
View on GitHub
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'’
☆2,263Oct 29, 2025Updated 8 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
SHI-Labs / CuMo
View on GitHub
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
☆163Jun 8, 2024Updated 2 years ago
MILVLG / imp
View on GitHub
a family of highly capabale yet efficient large multimodal models
☆194Aug 23, 2024Updated last year
PKU-YuanGroup / MoE-LLaVA
View on GitHub
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
☆2,322Jul 15, 2025Updated last year
zhaochen0110 / Awesome_Think_With_Images
View on GitHub
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,493Mar 9, 2026Updated 4 months ago
JIA-Lab-research / GroupContrast
View on GitHub
[CVPR 2024] GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding
☆44Mar 15, 2024Updated 2 years ago
NVlabs / STL
View on GitHub
Official Pytorch Implementation of Self-emerging Token Labeling
☆35Mar 27, 2024Updated 2 years ago
Mini-o3 / Mini-o3
View on GitHub
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
☆423Jan 29, 2026Updated 5 months ago