deepglint/Victor

deepglint / Victor

ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs

☆29

Alternatives and similar repositories for Victor

Users who are interested in Victor are comparing it to the libraries listed below.

xiaoxing2001 / DeGLA
[ACM MM25] Official Pytorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding]
☆16 · Jul 15, 2025 · Updated last year
anxiangsir / Video_Benchmark_Suite
Video Benchmark Suite: Rapid Evaluation of Video Foundation Models
☆17 · Jan 10, 2025 · Updated last year
deepglint / UniME
[ACM MM 2025] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs"
☆105 · Dec 8, 2025 · Updated 7 months ago
deepglint / UniDoc-RL
UniDoc-RL: Unified Document Understanding with Reinforcement Learning
☆16 · May 21, 2026 · Updated 2 months ago
ytaek-oh / fsc-clip
[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
☆22 · Oct 8, 2024 · Updated last year
GaryGuTC / UniME-v2
[AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"
☆74 · Dec 8, 2025 · Updated 7 months ago
chs20 / fuselip
FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens
☆17 · Sep 8, 2025 · Updated 10 months ago
deepglint / MVT
Margin-based Vision Transformer
☆70 · Apr 7, 2026 · Updated 3 months ago
VisionXLab / ProCLIP
Official PyTorch implementation of ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
☆25 · Dec 4, 2025 · Updated 7 months ago
EvolvingLMMs-Lab / LLaVA-OneVision-1.5-RL
Fully Open Framework for Democratized Multimodal Reinforcement Learning.
☆51 · Dec 19, 2025 · Updated 7 months ago
deepglint / MLCD-Seg
MLCD-Seg is a zero-shot segmentation model from DeepGlint.
☆18 · Jul 4, 2025 · Updated last year
XMUDeepLIT / LLaVE
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
☆78 · May 23, 2025 · Updated last year
TIGER-AI-Lab / ABC
ABC: Achieving Better Control of Multimodal Embeddings using VLMs [TMLR2025]
☆19 · Aug 21, 2025 · Updated 10 months ago
Zi-hao-Wei / Efficient-Vision-Language-Pre-training-by-Cluster-Masking
[CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.
☆33 · May 16, 2024 · Updated 2 years ago
wuw2019 / LoTLIP
[NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
☆49 · Jan 14, 2025 · Updated last year
Multimodal-Representation-Learning-MRL / GA-DMS
[EMNLP25 Main] The official code of "Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval"
☆25 · Mar 30, 2026 · Updated 3 months ago
wntg / LLaMA-Omni
Reproduction of the LLaMA-Omni training code
☆72 · Jan 23, 2025 · Updated last year
michaelneri / unsupervised-audio-anomaly-detection
Official repository of the work "Low-complexity Unsupervised Audio Anomaly Detection exploiting Separable Convolutions and Angular Loss" …
☆11 · Nov 6, 2024 · Updated last year
MCG-NJU / VideoEval
VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model
☆15 · Jul 31, 2025 · Updated 11 months ago
lezhang7 / SAIL
[CVPR 2025 Highlight] Official Pytorch codebase for paper: "Assessing and Learning Alignment of Unimodal Vision and Language Models"
☆60 · Aug 15, 2025 · Updated 11 months ago
deepglint / DanQing
The official repo for the DanQing dataset.
☆36 · Mar 25, 2026 · Updated 3 months ago
Luodian / nano-hevc
A minimal, educational HEVC (H.265) encoder written in Python.
☆53 · Feb 23, 2026 · Updated 4 months ago
ZJUJeffLai / SAW_SSL
☆14 · Oct 31, 2022 · Updated 3 years ago
m1k2zoo / negbench
Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"
☆47 · Feb 26, 2026 · Updated 4 months ago
SALT-NLP / PersuationGames
[ACL2023, Findings] Source codes for the paper "Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduc…
☆16 · Feb 22, 2025 · Updated last year
MAGAer13 / DeCapBench
Official Code for "Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning" (ICLR 2025)
☆14 · Mar 6, 2025 · Updated last year
uvavision / SyViC
[ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data
☆13 · Sep 30, 2023 · Updated 2 years ago
LeapLabTHU / SimPro
[ICML 2024] SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning
☆31 · Sep 30, 2024 · Updated last year
jiyounglee-0523 / VisAlign
☆20 · Apr 23, 2024 · Updated 2 years ago
deepglint / RealSyn
[ACM MM2025] The official repository for the RealSyn dataset
☆39 · Dec 14, 2025 · Updated 7 months ago
OzerCanDevecioglu / Exploring-Sound-vs-Vibration-for-Robust-Fault-Detection-on-Rotating-Machinery
☆13 · Jul 4, 2024 · Updated 2 years ago
adobe-research / llava-score
☆11 · Oct 2, 2024 · Updated last year
C-Fun / Self-Attentive-Pooling-for-Efficient-Deep-Learning
Official PyTorch implementation of the paper entitled 'Self Attentive Pooling for Efficient Deep Learning'.
☆13 · May 3, 2024 · Updated 2 years ago
ShiqiYu / I2CS-loss
Beyond Softmax Loss: Intra-Concentration and Inter-Separability Loss for Classification (I2CS)
☆12 · Aug 11, 2020 · Updated 5 years ago
ExplainableML / cosmos
[CVPR 2025] COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
☆42 · Mar 27, 2025 · Updated last year
hushon / ood-diffusion
☆17 · Nov 15, 2022 · Updated 3 years ago
mingkai-zheng / SimMatchV2
SimMatchV2: Semi-Supervised Learning with Graph Consistency
☆22 · Dec 26, 2023 · Updated 2 years ago
haon-chen / mmE5
☆59 · Feb 27, 2025 · Updated last year
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆138 · May 8, 2025 · Updated last year