xiaoxing2001/DeGLA

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/xiaoxing2001/DeGLA)

xiaoxing2001 / DeGLA

[ACM MM25] Official Pytorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding]

☆16

Alternatives and similar repositories for DeGLA

Users that are interested in DeGLA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

anxiangsir / Video_Benchmark_Suite
View on GitHub
Video Benchmark Suite: Rapid Evaluation of Video Foundation Models
☆17Jan 10, 2025Updated last year
deepglint / UniME
View on GitHub
[ACM MM 2025] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs"
☆105Dec 8, 2025Updated 7 months ago
deepglint / RealSyn
View on GitHub
[ACM MM2025] The official repository for the RealSyn dataset
☆39Dec 14, 2025Updated 7 months ago
VisionXLab / ProCLIP
View on GitHub
Official PyTorch implementation of ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
☆25Dec 4, 2025Updated 7 months ago
deepglint / UniDoc-RL
View on GitHub
UniDoc-RL: Unified Document Understanding with Reinforcement Learning
☆16May 21, 2026Updated 2 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
Multimodal-Representation-Learning-MRL / GA-DMS
View on GitHub
[EMNLP25 Main]The official code of "Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval"
☆25Mar 30, 2026Updated 3 months ago
GaryGuTC / UniME-v2
View on GitHub
[AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"
☆74Dec 8, 2025Updated 7 months ago
EvolvingLMMs-Lab / LLaVA-OneVision-1.5-RL
View on GitHub
Fully Open Framework for Democratized Multimodal Reinforcement Learning.
☆51Dec 19, 2025Updated 7 months ago
deepglint / MLCD-Seg
View on GitHub
MLCD-Seg is a zero-shot segmentation model from DeepGlint.
☆18Jul 4, 2025Updated last year
deepglint / MVT
View on GitHub
Margin-based Vision Transformer
☆70Apr 7, 2026Updated 3 months ago
MCG-NJU / VideoEval
View on GitHub
VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model
☆15Jul 31, 2025Updated 11 months ago
GaryGuTC / LaPA_model
View on GitHub
[CVPRW 2024] LaPA: Latent Prompt Assist Model For Medical Visual Question Answering
☆27Apr 24, 2025Updated last year
deepglint / DanQing
View on GitHub
The official repo for the DanQing dataset.
☆36Mar 25, 2026Updated 3 months ago
Luodian / nano-hevc
View on GitHub
A minimal, educational HEVC (H.265) encoder written in Python.
☆53Feb 23, 2026Updated 4 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
chenpk00 / IS2024_stream_decoder_only_asr
View on GitHub
☆16Mar 12, 2024Updated 2 years ago
Niujunbo2002 / NativeRes-LLaVA
View on GitHub
Official code repo for our work "Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models"
☆54Jun 17, 2025Updated last year
chs20 / fuselip
View on GitHub
FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens
☆17Sep 8, 2025Updated 10 months ago
wntg / LLaMA-Omni
View on GitHub
llama-omni训练代码复现
☆72Jan 23, 2025Updated last year
XiaoBuL / OmniCLIP
View on GitHub
[ECAI-2024] OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning
☆16Jan 7, 2025Updated last year
EvolvingLMMs-Lab / OpenMMReasoner
View on GitHub
[CVPR 2026] OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
☆164Mar 30, 2026Updated 3 months ago
aimagelab / HySAC
View on GitHub
Hyperbolic Safety-Aware Vision-Language Models. CVPR 2025
☆31Apr 8, 2025Updated last year
TIGER-AI-Lab / ABC
View on GitHub
ABC: Achieving Better Control of Multimodal Embeddings using VLMs [TMLR2025]
☆19Aug 21, 2025Updated 11 months ago
Heidelberg-NLP / VALSE
View on GitHub
Data repository for the VALSE benchmark.
☆40Feb 15, 2024Updated 2 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
vinid / neg_clip
View on GitHub
NegCLIP.
☆41Feb 6, 2023Updated 3 years ago
peterant330 / KUEA
View on GitHub
[ICML'25] Kernel-based Unsupervised Embedding Alignment for Enhanced Visual Representation in Vision-language Models
☆23Sep 7, 2025Updated 10 months ago
HenryCai11 / LLM-Self-Control
View on GitHub
The official repo of paper "Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller"
☆18Aug 13, 2024Updated last year
xinyebei / 2026_finvcup_baseline
View on GitHub
信也杯2026比赛baseline
☆15Jun 17, 2026Updated last month
calisolo / Levels_image_captioning_NICE
View on GitHub
NICE challenge 2023 Track2 2nd result(total 4th) (CVPR 2023) sponsered by LG AI/Shutterstock/SNU
☆11Jun 22, 2023Updated 3 years ago
wonjune-kang / llm-speech-summarization
View on GitHub
Prompting Large Language Models with Audio for General-Purpose Speech Summarization
☆20May 14, 2025Updated last year
ShareLab-SII / CoMP-MM
View on GitHub
Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"
☆48Apr 3, 2025Updated last year
lscpku / VITATECS
View on GitHub
☆18Jul 10, 2024Updated 2 years ago
akgarhwal / Monopoly-Android-Game-Project
View on GitHub
Designed an android application using android studio 1.3, java, xml. This application is a digital version of the actual Monopoly game. I…
☆11Sep 25, 2021Updated 4 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
yanchick / Practice-ML-DEV
View on GitHub
ITMO course
☆12Mar 26, 2024Updated 2 years ago
rxtan2 / Koala-video-llm
View on GitHub
☆37Sep 16, 2024Updated last year
EvolvingLMMs-Lab / ParaVT
View on GitHub
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
☆54Jun 2, 2026Updated last month
deepglint / ALIP
View on GitHub
[ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
☆106Sep 18, 2023Updated 2 years ago
Mia-YatingYu / STDD
View on GitHub
[AAAI'25]: Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP
☆23Aug 5, 2025Updated 11 months ago
ytaek-oh / fsc-clip
View on GitHub
[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
☆22Oct 8, 2024Updated last year
gmberton / LLM-table
View on GitHub
☆17Apr 23, 2026Updated 2 months ago