UCSC-VLAA / OpenVision
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
☆237 · Updated 3 weeks ago
Alternatives and similar repositories for OpenVision
Users interested in OpenVision are comparing it to the libraries listed below.
- An open source implementation of CLIP (With TULIP Support) ☆147 · Updated 3 weeks ago
- [Fully open] [Encoder-free MLLM] Vision as LoRA ☆280 · Updated last week
- LLM2CLIP makes SOTA pretrained CLIP models even more SOTA. ☆520 · Updated 2 months ago
- Python library to evaluate VLMs' robustness across diverse benchmarks ☆207 · Updated this week
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆330 · Updated last month
- ConceptAttention: A method for interpreting multi-modal diffusion transformers. ☆264 · Updated last month
- Official Implementation for our NeurIPS 2024 paper, "Don't Look Twice: Run-Length Tokenization for Faster Video Transformers". ☆211 · Updated 2 months ago
- When do we not need larger vision models? ☆396 · Updated 3 months ago
- Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL). ☆129 · Updated last month
- Scaling Vision Pre-Training to 4K Resolution ☆162 · Updated this week
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆299 · Updated 2 weeks ago
- [ICML 2025] Official PyTorch implementation of LongVU ☆380 · Updated 3 weeks ago
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆321 · Updated 10 months ago
- ☆177 · Updated 7 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆203 · Updated 5 months ago
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models ☆220 · Updated 8 months ago
- MMaDA - Open-Sourced Multimodal Large Diffusion Language Models ☆934 · Updated this week
- Code for the Molmo Vision-Language Model ☆431 · Updated 5 months ago
- [CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for C… ☆248 · Updated 4 months ago
- UniDisc: A discrete diffusion model for joint multimodal generation, enabling controllable and efficient text-image synthesis, editing, a… ☆105 · Updated 2 months ago
- FlexTok: Resampling Images into 1D Token Sequences of Flexible Length ☆143 · Updated 2 weeks ago
- Visual Planning: Let's Think Only with Images ☆179 · Updated 2 weeks ago
- 🦾 EvalGIM (pronounced as "EvalGym") is an evaluation library for generative image models. It enables easy-to-use, reproducible automatic… ☆80 · Updated 5 months ago
- HART: Efficient Visual Generation with Hybrid Autoregressive Transformer ☆602 · Updated 7 months ago
- Matryoshka Multimodal Models ☆107 · Updated 4 months ago
- Code for Scaling Language-Free Visual Representation Learning (WebSSL) ☆244 · Updated last month
- GenEval: An object-focused framework for evaluating text-to-image alignment ☆287 · Updated 3 months ago
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization ☆510 · Updated 2 weeks ago
- [ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?" ☆132 · Updated 11 months ago
- DDT: Decoupled Diffusion Transformer ☆252 · Updated 2 weeks ago