infly-ai / INF-MLLM
☆114Jan 9, 2026Updated last month
Alternatives and similar repositories for INF-MLLM
Users who are interested in INF-MLLM are comparing it to the libraries listed below.
- ☆21Feb 29, 2024Updated last year
- The official repo of INF-34B models trained by INF Technology.☆34Jul 25, 2024Updated last year
- ☆88Jul 4, 2024Updated last year
- ☆23Jan 8, 2024Updated 2 years ago
- ☆48Feb 7, 2025Updated last year
- FR-TSVM☆12Nov 20, 2017Updated 8 years ago
- SGLang is a fast serving framework for large language models and vision language models.☆18Feb 7, 2026Updated last week
- Python image processing: reverse image search, lossless compression☆11Dec 20, 2018Updated 7 years ago
- ☆15Apr 26, 2024Updated last year
- Code for paper: Unified Text-to-Image Generation and Retrieval☆16Jul 6, 2024Updated last year
- A simple web/API framework for individual developers.☆19Oct 8, 2025Updated 4 months ago
- Accelerating Streaming Video Large Language Models via Hierarchical Token Compression☆42Jan 6, 2026Updated last month
- Large Multimodal Model☆15Apr 8, 2024Updated last year
- [ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents☆17Oct 12, 2024Updated last year
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM☆101May 17, 2024Updated last year
- MMPD Dataset from ECCV'2024 "When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset"☆21Jul 15, 2024Updated last year
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts☆336Jul 17, 2024Updated last year
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"☆195May 31, 2024Updated last year
- X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages☆316Aug 10, 2023Updated 2 years ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills☆763Feb 1, 2024Updated 2 years ago
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.☆306Aug 15, 2025Updated 5 months ago
- Code for "DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets", accepted at Neurips 2023 (Main confer…☆27Mar 29, 2024Updated last year
- Forked vLLM that supports higgs-audio model☆42Oct 27, 2025Updated 3 months ago
- ☆29May 13, 2024Updated last year
- Lion: Kindling Vision Intelligence within Large Language Models☆51Jan 25, 2024Updated 2 years ago
- This repository contains information on the creation, evaluation, and benchmark models for the L+M-24 Dataset. L+M-24 will be featured as…☆30Jan 23, 2025Updated last year
- ☆33Dec 18, 2023Updated 2 years ago
- Official implementation of the CVPR paper Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent …☆28Jun 4, 2023Updated 2 years ago
- Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)☆1,947Jan 24, 2026Updated 3 weeks ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types☆33Jul 16, 2025Updated 6 months ago
- (AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions☆260Apr 14, 2024Updated last year
- ☆48Apr 11, 2025Updated 10 months ago
- A category classification model based on natural language processing of product names, using Transformer blocks☆10Dec 5, 2022Updated 3 years ago
- ☆16Sep 23, 2025Updated 4 months ago
- mPLUG-Owl: The Powerful Multi-modal Large Language Model Family☆2,537Apr 2, 2025Updated 10 months ago
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text☆412May 5, 2025Updated 9 months ago
- InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions☆2,919May 26, 2025Updated 8 months ago
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model☆37Jan 8, 2025Updated last year
- An open-source multimodal large language model based on baichuan-7b☆72Dec 7, 2023Updated 2 years ago