INF Tech's open-source MLLMs for SOTA visual-language understanding and advanced document intelligence.
☆170May 15, 2026Updated last week
Alternatives and similar repositories for INF-MLLM
Users that are interested in INF-MLLM are comparing it to the libraries listed below.
- ☆21Feb 29, 2024Updated 2 years ago
- ☆19Dec 6, 2023Updated 2 years ago
- ☆90Jul 4, 2024Updated last year
- ☆48Feb 7, 2025Updated last year
- ☆23Jan 8, 2024Updated 2 years ago
- [ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents☆17Oct 12, 2024Updated last year
- FR-TSVM☆12Nov 20, 2017Updated 8 years ago
- Large Multimodal Model☆15Apr 8, 2024Updated 2 years ago
- [CVPR 2026] Accelerating Streaming Video Large Language Models via Hierarchical Token Compression☆65Feb 25, 2026Updated 2 months ago
- X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages☆316Aug 10, 2023Updated 2 years ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation☆1,742May 6, 2026Updated 2 weeks ago
- A full codebase for replicating the results of Nougat from downloading arXiv dataset to the final evaluation. It also contains a few fixe…☆11Dec 11, 2023Updated 2 years ago
- ☆41May 22, 2025Updated last year
- Karras et al. (2022) diffusion models for PyTorch☆17Oct 5, 2023Updated 2 years ago
- [NeurIPS 2023] Bilevel Coreset Selection in Continual Learning: A New Formulation and Algorithm☆15Nov 23, 2023Updated 2 years ago
- ECAI 2025☆20May 4, 2026Updated 2 weeks ago
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model☆36Jan 8, 2025Updated last year
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.☆162Apr 6, 2026Updated last month
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"☆196May 31, 2024Updated last year
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills☆766Feb 1, 2024Updated 2 years ago
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM☆101May 17, 2024Updated 2 years ago
- Data and code for paper "M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models"☆104Jun 15, 2023Updated 2 years ago
- The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs"☆73Nov 21, 2024Updated last year
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts☆339Jul 17, 2024Updated last year
- The official PyTorch code for AAAI'23 Paper "Sparse Coding in a Dual Memory System for Lifelong Learning"☆12Feb 15, 2023Updated 3 years ago
- SGLang is a fast serving framework for large language models and vision language models.☆22Updated this week
- Narrative movie understanding benchmark☆76Jun 11, 2025Updated 11 months ago
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.☆320Aug 15, 2025Updated 9 months ago
- ☆12Feb 13, 2025Updated last year
- A human-annotated, fine-grained dataset for Vision-and-Language Navigation☆17Jan 20, 2022Updated 4 years ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"☆48Apr 3, 2025Updated last year
- Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)☆126Nov 13, 2023Updated 2 years ago
- I used morphing target animation to implement a system to reconstruct 2D webcam frame images to 3D facial mesh☆14Mar 7, 2017Updated 9 years ago
- ✨✨Latest Papers and Datasets on Mobile and PC GUI Agent☆156Nov 29, 2024Updated last year
- ☆19Jan 11, 2024Updated 2 years ago
- waymo open data utils☆11Aug 29, 2020Updated 5 years ago
- A full Python implementation of the ROUGE metric, especially for Chinese texts processing.☆16Nov 21, 2019Updated 6 years ago
- Code and Data for "Characterizing Multi-Domain False News on Weibo and the Underlying User Effects"☆18Aug 24, 2022Updated 3 years ago
- mPLUG-Owl: The Powerful Multi-modal Large Language Model Family☆2,541Apr 2, 2025Updated last year