tyfeld / MMaDA-Parallel
Official Implementation of "MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation"
☆278 · Updated last month
Alternatives and similar repositories for MMaDA-Parallel
Users who are interested in MMaDA-Parallel are comparing it to the libraries listed below.
- The official GitHub repo for "Diffusion Language Models are Super Data Learners". ☆212 · Updated last month
- Official PyTorch implementation of TokenSet. ☆127 · Updated 9 months ago
- [MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆140 · Updated 5 months ago
- We introduce CausalVQA, a benchmark dataset for video question answering (VQA) composed of question-answer pairs that probe models’ under… ☆51 · Updated 4 months ago
- ☆105 · Updated 6 months ago
- [ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3 ?" ☆146 · Updated last year
- ☆139 · Updated last week
- Official implementation of "Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs". ☆94 · Updated last month
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod… ☆115 · Updated last month
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆113 · Updated 4 months ago
- The open-source code of MetaStone-S1. ☆106 · Updated 4 months ago
- Model Merging with Functional Dual Anchors ☆44 · Updated last month
- Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025) ☆43 · Updated last month
- ☆79 · Updated last month
- Pivotal Token Search ☆135 · Updated this week
- Official codes of "Monet: Reasoning in Latent Visual Space Beyond Image and Language" ☆85 · Updated last week
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation ☆68 · Updated 2 months ago
- Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think! ☆120 · Updated 9 months ago
- Code release for "LLMs can see and hear without any training" ☆454 · Updated 7 months ago
- UniDisc: A discrete diffusion model for joint multimodal generation, enabling controllable and efficient text-image synthesis, editing, a… ☆133 · Updated 8 months ago
- [EMNLP 2024] Official PyTorch implementation code for realizing the technical part of Traversal of Layers (TroL) presenting new propagati… ☆99 · Updated last year
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆212 · Updated 11 months ago
- Vision Language Models are Biased ☆104 · Updated last week
- [Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with … ☆61 · Updated last year
- ☆80 · Updated last year
- Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model. ☆95 · Updated last month
- An open-source implementation of CLIP (with TULIP support) ☆164 · Updated 7 months ago
- Official implementation for SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization. ☆50 · Updated last month
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ☆31 · Updated 7 months ago
- The code repository for the CURLoRA research paper. Stable LLM continual fine-tuning and catastrophic forgetting mitigation. ☆53 · Updated last year