om-ai-lab / OmModel
A collection of strong multimodal models for building multimodal AGI agents
☆38Updated 6 months ago
Alternatives and similar repositories for OmModel:
Users who are interested in OmModel are comparing it to the repositories listed below
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model☆23Updated last year
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆40Updated 7 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆22Updated 3 weeks ago
- Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation☆134Updated 3 months ago
- Representing Rule-based Chatbots with Transformers☆19Updated 6 months ago
- ☆73Updated 10 months ago
- 1.4B sLLM for Chinese and English - HammerLLM🔨☆44Updated 9 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆56Updated 2 months ago
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc.☆37Updated 4 months ago
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"☆132Updated last week
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.☆197Updated last week
- ☆32Updated 8 months ago
- ☆36Updated 4 months ago
- ☆17Updated last year
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆44Updated last month
- ☆19Updated 2 months ago
- PyTorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"☆26Updated this week
- Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context☆25Updated 5 months ago
- [IJCAI 2024] CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning☆23Updated 11 months ago
- ☆27Updated 4 months ago
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges☆61Updated 4 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 4 months ago
- A simple MLLM that surpasses QwenVL-Max using only open-source data, built on a 14B LLM.☆36Updated 4 months ago
- Hammer: Robust Function-Calling for On-Device Language Models via Function Masking☆47Updated 2 weeks ago
- A suite of multimodal language models that are powerful and efficient☆17Updated 2 weeks ago
- ☆27Updated 5 months ago
- ✨✨Latest Papers and Datasets on Mobile and PC GUI Agent☆95Updated 2 months ago
- 🎉 The code repository for "Parrot: Multilingual Visual Instruction Tuning" in PyTorch.☆35Updated 5 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆72Updated 3 months ago