om-ai-lab / OmChat
A suite of powerful and efficient multimodal language models
☆16 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for OmChat
- A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection (AAAI 2024) ☆38 · Updated 6 months ago
- A collection of strong multimodal models for building multimodal AGI agents ☆38 · Updated 4 months ago
- [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization ☆566 · Updated 5 months ago
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… ☆102 · Updated last month
- RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness ☆246 · Updated 2 weeks ago
- Research Code for Multimodal-Cognition Team in Ant Group ☆123 · Updated 4 months ago
- Vary-tiny codebase built on LAVIS (for training from scratch), plus a PDF image-text pair dataset (about 600k pairs, English/Chinese) ☆68 · Updated 2 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆53 · Updated 3 weeks ago
- Official repository of the MMDU dataset ☆75 · Updated last month
- The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM" ☆215 · Updated last month
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024) ☆58 · Updated 10 months ago
- A Multimodal Native Agent Framework for Smart Hardware and More ☆1,336 · Updated this week
- An open-source implementation for training LLaVA-NeXT ☆397 · Updated last month
- Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding" ☆129 · Updated 5 months ago
- Making LLaVA Tiny via MoE-Knowledge Distillation ☆63 · Updated last month
- OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ☆274 · Updated last week
- Applications of large language models and multimodal models, mainly covering RAG, small models, agents, cross-modal search, OCR, and more ☆124 · Updated 2 weeks ago
- Official Repository of ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning ☆211 · Updated last month
- The Hugging Face implementation of the Fine-grained Late-interaction Multi-modal Retriever ☆69 · Updated 2 months ago
- This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" ☆77 · Updated this week
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models ☆134 · Updated 2 months ago
- [ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua… ☆277 · Updated 3 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆236 · Updated 2 months ago
- MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution ☆291 · Updated last week
- [ECCV 2024] Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? ☆149 · Updated 2 months ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆246 · Updated 5 months ago