deepglint / RWKV-CLIPLinks

[EMNLP 2024] RWKV-CLIP: A Robust Vision-Language Representation Learner

☆143

Alternatives and similar repositories for RWKV-CLIP

Users that are interested in RWKV-CLIP are comparing it to the libraries listed below

Sorting:

howard-hou / VisualRWKV
VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks.
☆237Updated 5 months ago
alibaba / conv-llava
☆123Updated last year
invictus717 / MiCo
[ICCV 2025] Explore the Limits of Omni-modal Pretraining at Scale
☆118Updated last year
yfzhang114 / SliME
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
☆162Updated 10 months ago
FreedomIntelligence / LongLLaVA
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆211Updated 10 months ago
Beckschen / ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
☆210Updated last year
yuyq96 / TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
☆63Updated last year
SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆43Updated last year
ZhangXJ199 / TinyLLaVA-Video
A Simple Framework of Small-scale LMMs for Video Understanding
☆96Updated 5 months ago
FudanNLPLAB / MouSi
☆75Updated last year
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆180Updated last year
deepglint / UniME
[ACM MM 2025] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs"
☆94Updated 3 months ago
deepglint / MVT
Margin-based Vision Transformer
☆55Updated last month
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆168Updated last year
DAMO-NLP-SG / Inf-CLIP
[CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for C…
☆269Updated 10 months ago
WePOINTS / WePOINTS
☆186Updated 9 months ago
TencentARC / mllm-npu
mllm-npu: training multimodal large language models on Ascend NPUs
☆94Updated last year
scenarios / WeMM
☆87Updated last year
360CVGroup / Inner-Adaptor-Architecture
LMM solved catastrophic forgetting, AAAI2025
☆44Updated 7 months ago
ZhangAIPI / YOPO_MLLM_Pruning
Pruning the VLLMs
☆104Updated 11 months ago
luogen1996 / LLaVA-HR
[ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant
☆246Updated last year
si0wang / ThinkLite-VL
☆105Updated 5 months ago
kyegomez / PALI3
Implementation of PALI3 from the paper PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
☆145Updated 3 weeks ago
BytedanceDouyinContent / SAIL-VL2
The SAIL-VL2 series model developed by the BytedanceDouyinContent Group
☆75Updated 2 months ago
cnzzx / VSA
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
☆126Updated last year
ggg0919 / cantor
☆90Updated last year
sterzhang / image-textualization
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)
☆168Updated last year
kyegomez / NaViT
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
☆264Updated 3 weeks ago
rednote-hilab / dots.vlm1
The official repository of the dots.vlm1 instruct models proposed by rednote-hilab.
☆264Updated last month
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆148Updated last year