☆116Feb 26, 2026Updated last week
Alternatives and similar repositories for INF-MLLM
Users who are interested in INF-MLLM are comparing it to the libraries listed below.
- ☆88Jul 4, 2024Updated last year
- ☆23Jan 8, 2024Updated 2 years ago
- ☆48Feb 7, 2025Updated last year
- Karras et al. (2022) diffusion models for PyTorch☆17Oct 5, 2023Updated 2 years ago
- SGLang is a fast serving framework for large language models and vision language models.☆19Updated this week
- ☆15Apr 26, 2024Updated last year
- Python image processing, reverse image search, lossless compression☆11Dec 20, 2018Updated 7 years ago
- Code for paper: Unified Text-to-Image Generation and Retrieval☆16Jul 6, 2024Updated last year
- A system that uses morph target animation to reconstruct a 3D facial mesh from 2D webcam frame images☆14Mar 7, 2017Updated 9 years ago
- Large Multimodal Model☆15Apr 8, 2024Updated last year
- A simple web/API framework for individual developers.☆19Oct 8, 2025Updated 5 months ago
- [ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents☆17Oct 12, 2024Updated last year
- [CVPR 2026] Accelerating Streaming Video Large Language Models via Hierarchical Token Compression☆45Feb 25, 2026Updated last week
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM☆101May 17, 2024Updated last year
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts☆336Jul 17, 2024Updated last year
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"☆195May 31, 2024Updated last year
- X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages☆316Aug 10, 2023Updated 2 years ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills☆763Feb 1, 2024Updated 2 years ago
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.☆314Aug 15, 2025Updated 6 months ago
- Code for "DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets", accepted at NeurIPS 2023 (Main confer…☆27Mar 29, 2024Updated last year
- Forked vLLM that supports higgs-audio model☆43Oct 27, 2025Updated 4 months ago
- Lion: Kindling Vision Intelligence within Large Language Models☆51Jan 25, 2024Updated 2 years ago
- Data and code for paper "M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models"☆103Jun 15, 2023Updated 2 years ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"☆45Apr 3, 2025Updated 11 months ago
- This repository contains information on the creation, evaluation, and benchmark models for the L+M-24 Dataset. L+M-24 will be featured as…☆30Jan 23, 2025Updated last year
- Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)☆1,948Jan 24, 2026Updated last month
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types☆32Jul 16, 2025Updated 7 months ago
- (AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions☆260Apr 14, 2024Updated last year
- ☆39Jul 25, 2024Updated last year
- ☆49Apr 11, 2025Updated 10 months ago
- ☆18Sep 23, 2025Updated 5 months ago
- mPLUG-Owl: The Powerful Multi-modal Large Language Model Family☆2,540Apr 2, 2025Updated 11 months ago
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text☆413May 5, 2025Updated 10 months ago
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept…☆83Jan 30, 2023Updated 3 years ago
- An open-source multimodal large language model based on baichuan-7b☆72Dec 7, 2023Updated 2 years ago
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"☆136Jun 5, 2024Updated last year
- Fast LLM training codebase with dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]☆40Jan 4, 2024Updated 2 years ago
- A state-of-the-art-level open visual language model | multimodal pre-trained model☆6,724May 29, 2024Updated last year
- official code for unigame☆19Nov 26, 2025Updated 3 months ago