om-ai-lab / OmChat
A suite of powerful and efficient multimodal language models
☆16 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for OmChat
- A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection (AAAI 2024) ☆38 · Updated 6 months ago
- A collection of strong multimodal models for building multimodal AGI agents ☆38 · Updated 4 months ago
- [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization ☆566 · Updated 5 months ago
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… ☆102 · Updated last month
- RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness ☆246 · Updated 2 weeks ago
- Research Code for Multimodal-Cognition Team in Ant Group ☆123 · Updated 4 months ago
- Vary-tiny codebase built on LAVIS (for training from scratch), plus a PDF image-text pair dataset (about 600k pairs, English/Chinese) ☆68 · Updated 2 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆53 · Updated 3 weeks ago
- Official repository of the MMDU dataset ☆75 · Updated last month
- The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM" ☆215 · Updated last month
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024) ☆58 · Updated 10 months ago
- A Multimodal Native Agent Framework for Smart Hardware and More ☆1,336 · Updated this week
- An open-source implementation for training LLaVA-NeXT ☆397 · Updated last month
- Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding" ☆129 · Updated 5 months ago
- Making LLaVA Tiny via MoE-Knowledge Distillation ☆63 · Updated last month
- OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ☆274 · Updated last week
- Applications of large language models and multimodal models, mainly covering RAG, small models, agents, cross-modal search, OCR, and more ☆124 · Updated 2 weeks ago
- Official Repository of ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning ☆211 · Updated last month
- The Hugging Face implementation of the Fine-grained Late-interaction Multi-modal Retriever ☆69 · Updated 2 months ago
- This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" ☆77 · Updated this week
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models ☆134 · Updated 2 months ago
- [ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua… ☆277 · Updated 3 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆236 · Updated 2 months ago
- MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution ☆291 · Updated last week
- [ECCV 2024] Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? ☆149 · Updated 2 months ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆246 · Updated 5 months ago