Mixture-AI / meta-llama-explain
Explanation of the llama2 repo.
☆10 · Updated last year
Alternatives and similar repositories for meta-llama-explain
Users interested in meta-llama-explain are comparing it to the libraries listed below:
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆64 · Updated last month
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ☆448 · Updated 7 months ago
- Data and Code for CVPR 2025 paper "MMVU: Measuring Expert-Level Multi-Discipline Video Understanding" ☆72 · Updated 6 months ago
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI ☆136 · Updated last month
- A modified LLaVA framework for MOSS2 that makes MOSS2 a multimodal model ☆13 · Updated 11 months ago
- ☆59 · Updated last month
- [Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification ☆419 · Updated 2 weeks ago
- Repository for the ACL 2025 Findings paper "From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities" ☆52 · Updated last month
- A paper list for spatial reasoning ☆136 · Updated 2 months ago
- A Massive Multi-Discipline Lecture Understanding Benchmark ☆30 · Updated 2 months ago
- Official repository of "ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing" ☆53 · Updated 2 months ago
- ☆54 · Updated this week
- ☆67 · Updated last month
- 📖 A repository for organizing papers, code, and other resources related to unified multimodal models ☆281 · Updated 3 weeks ago
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆59Updated last year
- A Collection of Papers on Diffusion Language Models☆119Updated last week
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning☆131Updated last year
- AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper)☆314Updated last month
- [ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R…" ☆106 · Updated last month
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆72 · Updated last year
- An Arena-style Automated Evaluation Benchmark for Detailed Captioning ☆55 · Updated 3 months ago
- Official PyTorch implementation of EMOVA (CVPR 2025, https://arxiv.org/abs/2409.18042) ☆66 · Updated 5 months ago
- Recent Advances on MLLM's Reasoning Ability ☆25 · Updated 4 months ago
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions ☆233 · Updated last year
- VLM2-Bench [ACL 2025 Main]: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues ☆41 · Updated 3 months ago
- Official repository of the MMDU dataset ☆93 · Updated 11 months ago
- MMR1: Advancing the Frontiers of Multimodal Reasoning ☆163 · Updated 5 months ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation" ☆374 · Updated 3 weeks ago
- ☆111 · Updated 5 months ago
- MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision ☆25 · Updated 3 months ago