Hon-Wong / VoRA
[Fully open] [Encoder-free MLLM] Vision as LoRA
⭐360 · Updated 6 months ago
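VoRA's headline idea, per its tagline, is to drop the separate vision encoder and inject vision into the LLM itself via LoRA. Below is a rough, hedged sketch of that recipe (my reading of the tagline, not the repository's actual code; every dimension, module name, and hyperparameter is an illustrative assumption): image patches are linearly projected into the LLM's token space and processed by LoRA-adapted layers of an otherwise frozen LLM.

```python
# Minimal sketch of an encoder-free "vision as LoRA" pipeline in plain
# PyTorch. This is NOT VoRA's implementation; names, sizes, and the
# single LoRA-wrapped layer are illustrative assumptions.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank (LoRA) update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)    # start as a zero update
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))

# Hypothetical sizes: a 512-dim LLM and 16x16 image patches.
d_model, patch = 512, 16
patch_embed = nn.Linear(3 * patch * patch, d_model)  # pixels -> LLM tokens

image = torch.randn(1, 3, 224, 224)
# Patchify: (1, 3, 224, 224) -> (1, 196, 3*16*16).
patches = image.unfold(2, patch, patch).unfold(3, patch, patch)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, -1, 3 * patch * patch)
vision_tokens = patch_embed(patches)                 # (1, 196, 512)

text_tokens = torch.randn(1, 32, d_model)            # stand-in for embedded text
sequence = torch.cat([vision_tokens, text_tokens], dim=1)

# One LoRA-wrapped projection standing in for an attention/MLP layer of the
# frozen LLM; only patch_embed and the LoRA factors would carry gradients.
layer = LoRALinear(nn.Linear(d_model, d_model))
print(layer(sequence).shape)                         # torch.Size([1, 228, 512])
```

The point of this style of recipe is that visual capability is added without touching the base LLM's weights: only the patch projection and the low-rank factors are trained.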
Alternatives and similar repositories for VoRA
Users interested in VoRA are comparing it to the libraries listed below
- An open source implementation of CLIP (With TULIP Support) ⭐165 · Updated 7 months ago
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ⭐310 · Updated 7 months ago
- Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL).ā194Updated 8 months ago
- [ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learningā413Updated last month
- PyTorch implementation of NEPAā230Updated 2 weeks ago
- [COLM'25] Official implementation of the Law of Vision Representation in MLLMsā172Updated 3 months ago
- Python library to evaluate the robustness of VLMs across diverse benchmarks ⭐220 · Updated 2 months ago
- [ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning ⭐157 · Updated 4 months ago
- [CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ⭐232 · Updated 2 months ago
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ⭐335 · Updated last year
- When do we not need larger vision models? ⭐413 · Updated 10 months ago
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ⭐203 · Updated 6 months ago
- [CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for C… ⭐275 · Updated 11 months ago
- Visual Planning: Let's Think Only with Images ⭐290 · Updated 7 months ago
- Code for Scaling Language-Free Visual Representation Learning (WebSSL) ⭐245 · Updated 8 months ago
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding ⭐210 · Updated 2 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ⭐168 · Updated last year
- TIPS (ICLR'25): Text-Image Pretraining with Spatial Awareness ⭐111 · Updated 8 months ago
- Pixel-Level Reasoning Model trained with RL [NeurIPS 2025] ⭐257 · Updated 2 months ago
- Scaling Vision Pre-Training to 4K Resolution ⭐217 · Updated 4 months ago
- Official Implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding ⭐186 · Updated 3 weeks ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ⭐212 · Updated last year
- EVE Series: Encoder-Free Vision-Language Models from BAAI ⭐363 · Updated 5 months ago
- LLM2CLIP makes the SOTA pretrained CLIP model even more SOTA ⭐567 · Updated last month
- Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗 ⭐297 · Updated 10 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ⭐162 · Updated last year
- Fully Open Framework for Democratized Multimodal Training ⭐679 · Updated last week
- [ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation ⭐47 · Updated last year
- [TMLR 2025 J2C] TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models ⭐50 · Updated 2 weeks ago
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ⭐147 · Updated last year