zh460045050 / V2L-TokenizerLinks

☆135

Alternatives and similar repositories for V2L-Tokenizer

Users that are interested in V2L-Tokenizer are comparing it to the libraries listed below

Sorting:

ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆134Updated 2 months ago
FeipengMa6 / VLoRA
[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
☆45Updated 4 months ago
wusize / F-LMM
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆100Updated 2 months ago
Hoar012 / RAP-MLLM
[CVPR 2025] RAP: Retrieval-Augmented Personalization
☆64Updated last week
callsys / GenPromp
[ICCV 2023] Generative Prompt Model for Weakly Supervised Object Localization
☆57Updated last year
UMass-Embodied-AGI / Mod-Squad
☆91Updated 2 years ago
ggjy / DeLVM
☆118Updated last year
ziqipang / LM4VisualEncoding
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
☆241Updated last year
baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆150Updated 8 months ago
ZhengYu518 / VL-Mamba
Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"
☆82Updated last year
FutureXiang / ddae
[ICCV 2023 Oral] Official Implementation of "Denoising Diffusion Autoencoders are Unified Self-supervised Learners"
☆174Updated last year
x-cls / superclass
[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training
☆211Updated 4 months ago
yayafengzi / LMM-HiMTok
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
☆58Updated 2 weeks ago
OpenGVLab / Mono-InternVL
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
☆69Updated 2 weeks ago
mc-lan / ClearCLIP
[ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
☆86Updated 4 months ago
Paranioar / UniPT
[CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"
☆67Updated 9 months ago
kyegomez / Vit-RGTS
Open source implementation of "Vision Transformers Need Registers"
☆184Updated 2 weeks ago
OliverRensu / ARM
[ICLR2025] This repository is the official implementation of our Autoregressive Pretraining with Mamba in Vision
☆83Updated 2 months ago
yichengchen24 / ACP
Official code for paper: Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
☆27Updated 5 months ago
Martinser / REG
☆62Updated 3 weeks ago
baaivision / DIVA
[ICLR 2025] Diffusion Feedback Helps CLIP See Better
☆286Updated 6 months ago
deepglint / ALIP
[ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
☆97Updated last year
chuangchuangtan / LLaVA-NeXT-Image-Llama3-Lora
LLaVA-NeXT-Image-Llama3-Lora, Modified from https://github.com/arielnlee/LLaVA-1.6-ft
☆44Updated last year
wusize / CLIPSelf
[ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
☆191Updated last year
YichaoCai1 / CLAP
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
☆53Updated 11 months ago
wangf3014 / SCLIP
Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
☆161Updated 9 months ago
mc-lan / Awesome-MLLM-Segmentation
A curated list of publications on image and video segmentation leveraging Multimodal Large Language Models (MLLMs), highlighting state-of…
☆109Updated this week
OpenGVLab / LCL
[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
☆69Updated 5 months ago
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆165Updated 10 months ago
contrastive / FreeVideoLLM
☆81Updated 9 months ago