AstraZeneca / vlm
Official implementation for "Diffusion Instruction Tuning"
☆31 Updated 5 months ago
Alternatives and similar repositories for vlm
Users who are interested in vlm are comparing it to the libraries listed below.
- ☆56 Updated 6 months ago
- The official repo for "D-AR: Diffusion via Autoregressive Models" ☆124 Updated 4 months ago
- The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders ☆37 Updated 5 months ago
- [NeurIPS'25 Spotlight] Boosting Generative Image Modeling via Joint Image-Feature Synthesis ☆97 Updated 2 weeks ago
- ☆71 Updated 11 months ago
- [NeurIPS 2025 Oral] Exploring Diffusion Transformer Designs via Grafting ☆61 Updated 4 months ago
- [ICLR 2024] Contextualized Diffusion Models for Text-Guided Image and Video Generation ☆70 Updated last year
- [NeurIPS '25 Spotlight] Official Pytorch implementation of "Vision Transformers Don't Need Trained Registers" ☆141 Updated last month
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆51 Updated 3 months ago
- 🐻 Uniform Discrete Diffusion with Metric Path for Video Generation ☆73 Updated this week
- Official PyTorch Implementation of "Scalable Autoregressive Image Generation with Mamba" ☆141 Updated 10 months ago
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆98 Updated last year
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆42 Updated last year
- Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement" ☆50 Updated 11 months ago
- Official repository for ReasonGen-R1 ☆73 Updated 4 months ago
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt… ☆30 Updated last year
- [ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation ☆47 Updated 2 months ago
- Official Implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding ☆170 Updated 3 weeks ago
- ☆132 Updated last month
- [CVPR2025 Highlight] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project ☆178 Updated 7 months ago
- [ICCV 2025] TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation ☆33 Updated 11 months ago
- Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025) ☆43 Updated last month
- FaceXBench: Evaluating Multimodal LLMs on Face Understanding ☆17 Updated 9 months ago
- [Preprint] UCGM: Unified Continuous Generative Models ☆169 Updated 5 months ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆71 Updated last month
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆61 Updated 3 months ago
- Diffusion Models as Data Mining Tools ☆54 Updated 6 months ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆127 Updated last year
- Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers" [ICCV 2025] ☆93 Updated 3 months ago
- The official implementation of "[MASK] is All You Need" ☆125 Updated 3 months ago