AstraZeneca / vlm
Official implementation for "Diffusion Instruction Tuning"
☆23 · Updated 2 weeks ago
Alternatives and similar repositories for vlm
Users who are interested in vlm are comparing it to the repositories listed below.
- ☆37 · Updated last month
- the official repo for "D-AR: Diffusion via Autoregressive Models" ☆98 · Updated this week
- ☆70 · Updated 7 months ago
- ☆41 · Updated 11 months ago
- Autoregressive Image Generation with Randomized Parallel Decoding ☆67 · Updated 2 months ago
- ☆37 · Updated 2 weeks ago
- Exploring Diffusion Transformer Designs via Grafting ☆33 · Updated last week
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆24 · Updated 8 months ago
- FaceXBench: Evaluating Multimodal LLMs on Face Understanding ☆14 · Updated 4 months ago
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation" ☆24 · Updated 2 months ago
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ☆15 · Updated last month
- MCPL: MULTI-CONCEPT PROMPT LEARNING ☆20 · Updated last year
- Code for ICML 2025 Paper "Highly Compressed Tokenizer Can Generate Without Training" ☆80 · Updated 2 weeks ago
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt… ☆30 · Updated 8 months ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆27 · Updated 2 weeks ago
- Official Pytorch implementation of "Vision Transformers Don't Need Trained Registers" ☆59 · Updated this week
- [CVPR2025] Official code repository for SeTa: "Scale Efficient Training for Large Datasets" ☆17 · Updated 3 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆35 · Updated 4 months ago
- 👆Pytorch implementation of "Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion" ☆27 · Updated 8 months ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆42 · Updated 10 months ago
- ☆23 · Updated last year
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types ☆19 · Updated 2 months ago
- Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models. ☆21 · Updated last month
- Pytorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 · Updated last year
- Official Repository of Personalized Visual Instruct Tuning ☆29 · Updated 3 months ago
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better ☆29 · Updated last week
- No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves ☆59 · Updated 2 weeks ago
- ☆37 · Updated 11 months ago
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆33 · Updated this week
- Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation ☆31 · Updated 2 months ago