☆119Apr 8, 2026Updated this week
Alternatives and similar repositories for INF-MLLM
Users who are interested in INF-MLLM are comparing it to the libraries listed below.
- ☆21Feb 29, 2024Updated 2 years ago
- ☆19Dec 6, 2023Updated 2 years ago
- The official repo of INF-34B models trained by INF Technology.☆34Jul 25, 2024Updated last year
- ☆90Jul 4, 2024Updated last year
- ☆48Feb 7, 2025Updated last year
- ☆23Jan 8, 2024Updated 2 years ago
- Large Multimodal Model☆15Apr 8, 2024Updated 2 years ago
- X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages☆316Aug 10, 2023Updated 2 years ago
- Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding☆40Mar 16, 2025Updated last year
- Karras et al. (2022) diffusion models for PyTorch☆17Oct 5, 2023Updated 2 years ago
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model☆36Jan 8, 2025Updated last year
- ☆26Sep 26, 2025Updated 6 months ago
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.☆162Updated this week
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"☆195May 31, 2024Updated last year
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills☆767Feb 1, 2024Updated 2 years ago
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM☆101May 17, 2024Updated last year
- Data and code for paper "M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models"☆103Jun 15, 2023Updated 2 years ago
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts☆337Jul 17, 2024Updated last year
- The official PyTorch code for AAAI'23 Paper "Sparse Coding in a Dual Memory System for Lifelong Learning"☆12Feb 15, 2023Updated 3 years ago
- Narrative movie understanding benchmark☆76Jun 11, 2025Updated 10 months ago
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.☆318Aug 15, 2025Updated 7 months ago
- ☆12Feb 13, 2025Updated last year
- STABILIZING GRADIENTS FOR DEEP NEURAL NETWORKS VIA EFFICIENT SVD PARAMETERIZATION☆16Jun 5, 2018Updated 7 years ago
- A human-annotated, fine-grained dataset for Vision-and-Language Navigation☆13Jan 20, 2022Updated 4 years ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"☆46Apr 3, 2025Updated last year
- Single source publishing for vertical writing☆11Mar 15, 2021Updated 5 years ago
- Data preprocessing for CCTA☆14May 29, 2025Updated 10 months ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation☆1,630Feb 27, 2026Updated last month
- Official PyTorch implementation for paper "ProAPO: Progressively Automatic Prompt Optimization for Visual Classification". The paper is a…☆28Nov 9, 2025Updated 5 months ago
- waymo open data utils☆11Aug 29, 2020Updated 5 years ago
- Code and Data for "Characterizing Multi-Domain False News on Weibo and the Underlying User Effects"☆18Aug 24, 2022Updated 3 years ago
- official code for unigame☆19Nov 26, 2025Updated 4 months ago
- MMPD Dataset from ECCV'2024 "When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset"☆21Jul 15, 2024Updated last year
- ☆59Aug 7, 2023Updated 2 years ago
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models☆153Jan 13, 2025Updated last year
- InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions☆2,923May 26, 2025Updated 10 months ago
- presentations for busy messy hackers☆36Jan 21, 2014Updated 12 years ago
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept…☆83Jan 30, 2023Updated 3 years ago
- Code for "DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets", accepted at Neurips 2023 (Main confer…☆27Mar 29, 2024Updated 2 years ago