yzy-bupt / LDRE

[SIGIR'2024 Best Paper Honorable Mention] Official repository for "LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval"

☆60

Alternatives and similar repositories for LDRE

Users who are interested in LDRE are comparing it to the repositories listed below.

yzy-bupt / SEIZE
[ACM MM'2024] Official repository for "Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval"
☆41 Updated 10 months ago
MCG-NJU / AWT
[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
☆109 Updated last year
Hui-design / TSPO
[AAAI 2026] ✨ TSPO: Temporal Sampling Policy Optimization for Long-form Video Language Understanding
☆94 Updated this week
yuanze-lin / REVIVE
[NeurIPS 2022] Official code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
☆104 Updated 7 months ago
LeapLabTHU / Cross-Modal-Adapter
[Pattern Recognition 2025] Cross-Modal Adapter for Vision-Language Retrieval
☆136 Updated 3 months ago
gordonhu608 / MQT-LLaVA
[NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models
☆118 Updated last year
jqtangust / hawk
🔥 🔥 🔥 [NeurIPS 2024] Official Implementation of Hawk: Learning to Understand Open-World Video Anomalies
☆223 Updated 7 months ago
URSA-MATH / URSA-MATH
☆125 Updated last month
yongliu20 / UniLSeg
[CVPR 2024] Official implementation of "Universal Segmentation at Arbitrary Granularity with Language Instruction"
☆284 Updated last year
XLearning-SCU / 2024-ICLR-Norton
Multi-granularity Correspondence Learning from Long-term Noisy Videos [ICLR 2024, Oral]
☆117 Updated last year
AlenjandroWang / ASVR
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better
☆180 Updated 5 months ago
fletcherjiang / LLMEPET
[MM'24 Oral] Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval
☆130 Updated last year
lwpyh / CoS_codes
CoS: Chain-of-Shot Prompting for Long Video Understanding
☆52 Updated 9 months ago
microsoft / DeepVideoDiscovery
Deep Video Discovery (DVD) is a deep-research style question answering agent designed for understanding extra-long videos.
☆303 Updated 2 weeks ago
Alpha-Innovator / OmniCaptioner
Official Repository of OmniCaptioner
☆165 Updated 6 months ago
shuolucs / Awesome-Out-Of-Distribution-Detection
[ACM CSUR 2025] Out-of-Distribution Detection: A Task-Oriented Survey of Recent Advances
☆150 Updated 2 months ago
mashijie1028 / GenHancer
(ICCV 2025) Enhance CLIP and MLLM's fine-grained visual representations with generative models.
☆73 Updated 4 months ago
JingyangQiao / prompt-gradient-projection
☆87 Updated last year
pokerme7777 / Compositional-Visual-Reasoning-Survey
Explain Before You Answer: A Survey on Compositional Visual Reasoning
☆292 Updated last month
Alpha-Innovator / Chimera
(ICCV-2025 Official Code) Improving Generalist Model with Domain-Specific Experts
☆85 Updated 2 weeks ago
MingXiangL / DEVIL
Evaluation of Text-to-Video Generation Models: A Dynamics Perspective [NeurIPS 2024].
☆276 Updated 11 months ago
zfr00 / Fact-R1
The official code for Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
☆33 Updated 3 months ago
RoyZry98 / MoASE-Pytorch
[AAAI 2026 Oral🔥] Official code for Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptati…
☆70 Updated last year
Dreamer312 / SEED-GRPO
The official repository of SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization
☆92 Updated last month
Everlyn-Labs / ANTRP
Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs
☆162 Updated 8 months ago
YihuaJerry / EventVAD
[MM 2025] EventVAD: Training-Free Event-Aware Video Anomaly Detection
☆505 Updated 4 months ago
ekonwang / VisuoThink
[Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics]: VisuoThink: Empowering LVLM Reasoning with Mul…
☆97 Updated 3 months ago
OatmealLiu / FineR
[ICLR'24] Democratizing Fine-grained Visual Recognition with Large Language Models
☆186 Updated last year
yuanze-lin / Learnable_Regions
[CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions"
☆226 Updated last year
steven-ccq / ViLAMP
[ICML 2025] Official repository for paper "Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation"
☆186 Updated last month