om-ai-lab / OmChat
A suite of multimodal language models that are powerful and efficient
☆17Updated 2 weeks ago
Alternatives and similar repositories for OmChat:
Users who are interested in OmChat are comparing it to the repositories listed below
- A collection of strong multimodal models for building multimodal AGI agents☆38Updated 6 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆22Updated 3 weeks ago
- A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection (AAAI 2024)☆41Updated 8 months ago
- RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness☆284Updated last month
- Research Code for Multimodal-Cognition Team in Ant Group☆133Updated 6 months ago
- Dingo: A Comprehensive Data Quality Evaluation Tool☆32Updated last week
- Personal Project: MPP-Qwen14B & MPP-Qwen-Next(Multimodal Pipeline Parallel based on Qwen-LM). Support [video/image/multi-image] {sft/conv…☆395Updated last month
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent☆209Updated 2 weeks ago
- The Hugging Face implementation of the Fine-grained Late-interaction Multi-modal Retriever.☆80Updated this week
- [ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"☆213Updated last month
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.☆197Updated last week
- Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train …☆183Updated 4 months ago
- Applications of large language models and multimodal models, mainly covering RAG, small models, agents, cross-modal search, OCR, and more☆147Updated 2 months ago
- ☆73Updated 10 months ago
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding☆126Updated last week
- Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models☆165Updated 2 months ago
- ☆167Updated last month
- Align Anything: Training All-modality Model with Feedback☆938Updated last week
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…☆109Updated 2 months ago
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR25]☆119Updated this week
- This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"☆100Updated this week
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"☆132Updated last week
- [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization☆531Updated 7 months ago
- An open-source implementation for training LLaVA-NeXT.☆375Updated 3 months ago
- OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text☆305Updated 2 months ago
- Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"☆136Updated 8 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback☆259Updated 4 months ago
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge☆128Updated 6 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆80Updated 2 weeks ago
- Long Context Transfer from Language to Vision☆359Updated 2 months ago