linjh1118 / Awesome-MLLM-For-Games
MLLM @ Game
☆11Updated 3 weeks ago
Alternatives and similar repositories for Awesome-MLLM-For-Games:
Users who are interested in Awesome-MLLM-For-Games are comparing it to the repositories listed below
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 7 months ago
- ☆51Updated last week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆100Updated 2 months ago
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora☆41Updated last year
- ☆32Updated 3 months ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆44Updated 4 months ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆115Updated 2 weeks ago
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt…☆30Updated 6 months ago
- ☆83Updated 11 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆42Updated 9 months ago
- ☆73Updated last year
- ☆40Updated 3 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifying image understanding and generation.☆38Updated 10 months ago
- Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step☆125Updated 2 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024☆58Updated 2 months ago
- LMM solved catastrophic forgetting, AAAI 2025☆40Updated last week
- ☆86Updated 2 weeks ago
- Our 2nd-gen LMM☆33Updated 11 months ago
- ☆33Updated 2 months ago
- Official repository of MMDU dataset☆89Updated 6 months ago
- ☆73Updated 3 months ago
- ☆40Updated 2 weeks ago
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training☆35Updated 3 weeks ago
- A lightweight and highly efficient training framework for accelerating diffusion tasks.☆46Updated 7 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆23Updated 6 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆71Updated 3 weeks ago
- OpenMMLab Detection Toolbox and Benchmark for V3Det☆15Updated last year
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"☆35Updated 3 months ago
- Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.☆18Updated 2 years ago
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search"☆24Updated last month