INF Tech's open-source MLLMs for SOTA visual-language understanding and advanced document intelligence.
☆141Apr 30, 2026Updated this week
Alternatives and similar repositories for INF-MLLM
Users that are interested in INF-MLLM are comparing it to the libraries listed below.
- ☆21Feb 29, 2024Updated 2 years ago
- The official repo of INF-34B models trained by INF Technology.☆34Jul 25, 2024Updated last year
- ☆90Jul 4, 2024Updated last year
- ☆48Feb 7, 2025Updated last year
- ☆23Jan 8, 2024Updated 2 years ago
- [ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents☆17Oct 12, 2024Updated last year
- [CVPR 2026] Accelerating Streaming Video Large Language Models via Hierarchical Token Compression☆62Feb 25, 2026Updated 2 months ago
- Large Multimodal Model☆15Apr 8, 2024Updated 2 years ago
- ☆16Apr 26, 2024Updated 2 years ago
- X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages☆316Aug 10, 2023Updated 2 years ago
- A full codebase for replicating the results of Nougat from downloading arXiv dataset to the final evaluation. It also contains a few fixe…☆11Dec 11, 2023Updated 2 years ago
- KeyTerms centralized terminology management tool☆13Feb 7, 2019Updated 7 years ago
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model☆36Jan 8, 2025Updated last year
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.☆163Apr 6, 2026Updated 3 weeks ago
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"☆196May 31, 2024Updated last year
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills☆766Feb 1, 2024Updated 2 years ago
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM☆101May 17, 2024Updated last year
- Data and code for paper "M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models"☆104Jun 15, 2023Updated 2 years ago
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts☆338Jul 17, 2024Updated last year
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.☆320Aug 15, 2025Updated 8 months ago
- ☆12Feb 13, 2025Updated last year
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"☆48Apr 3, 2025Updated last year
- Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)☆126Nov 13, 2023Updated 2 years ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation☆1,683Apr 20, 2026Updated last week
- waymo open data utils☆11Aug 29, 2020Updated 5 years ago
- A full Python implementation of the ROUGE metric, especially for Chinese texts processing.☆16Nov 21, 2019Updated 6 years ago
- [CVPR'26] UniGame code implementation☆19Apr 21, 2026Updated last week
- MMPD Dataset from ECCV'2024 "When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset"☆21Jul 15, 2024Updated last year
- ☆59Aug 7, 2023Updated 2 years ago
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models☆153Jan 13, 2025Updated last year
- InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions☆2,924May 26, 2025Updated 11 months ago
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept…☆83Jan 30, 2023Updated 3 years ago
- Code for "DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets", accepted at Neurips 2023 (Main confer…☆28Mar 29, 2024Updated 2 years ago
- mPLUG-Owl: The Powerful Multi-modal Large Language Model Family☆2,541Apr 2, 2025Updated last year
- ☆108Feb 16, 2021Updated 5 years ago
- ☆12Nov 17, 2023Updated 2 years ago
- a state-of-the-art-level open visual language model | multimodal pre-trained model☆6,739May 29, 2024Updated last year
- Code for paper: Unified Text-to-Image Generation and Retrieval☆16Jul 6, 2024Updated last year
- Code for the NeurIPS 2020 paper "Improved analysis of clipping algorithms for non-convex optimization", including various clipping algori…☆10Feb 17, 2021Updated 5 years ago