linjh1118 / Awesome-MLLM-For-Games
MLLM @ Game
☆13Updated last week
Alternatives and similar repositories for Awesome-MLLM-For-Games
Users that are interested in Awesome-MLLM-For-Games are comparing it to the libraries listed below
Sorting:
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 7 months ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆46Updated 5 months ago
- ☆56Updated 3 weeks ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆100Updated 2 months ago
- ✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning☆109Updated last week
- A Simple Framework of Small-scale LMMs for Video Understanding☆61Updated last week
- ☆85Updated last year
- A Self-Training Framework for Vision-Language Reasoning☆78Updated 3 months ago
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges☆67Updated 2 months ago
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501☆55Updated 9 months ago
- ☆44Updated last month
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…☆79Updated 3 months ago
- The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs"☆61Updated last week
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆24Updated 7 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models☆56Updated 10 months ago
- Official repository of MMDU dataset☆90Updated 7 months ago
- Unifying Visual Understanding and Generation with Dual Visual Vocabularies 🌈☆45Updated 3 weeks ago
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora☆41Updated last year
- official impelmentation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input☆64Updated 8 months ago
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.☆121Updated last month
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt…☆30Updated 6 months ago
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection☆75Updated last month
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆122Updated last month
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model☆44Updated 6 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning☆65Updated last week
- Assessing Context-Aware Creative Intelligence in MLLMs☆17Updated last month
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆34Updated last month
- LMM solved catastrophic forgetting, AAAI2025☆42Updated last month
- ☆41Updated 4 months ago
- ☆75Updated 4 months ago