This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions", which was accepted to ACL 2024 (Findings).
☆16 · May 21, 2024 · Updated last year
Alternatives and similar repositories for IVG
Users that are interested in IVG are comparing it to the libraries listed below.
- Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner" ☆15 · Aug 9, 2023 · Updated 2 years ago
- Med-DANet Series (ECCV 2022 & WACV 2024) ☆13 · Jan 2, 2024 · Updated 2 years ago
- This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati… ☆72 · Jun 3, 2024 · Updated last year
- ☆13 · Oct 30, 2023 · Updated 2 years ago
- Entity-Aware and Motion-Aware Transformers for Language-driven Action Localization (IJCAI-22) ☆12 · Oct 11, 2022 · Updated 3 years ago
- [ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model ☆43 · Dec 25, 2024 · Updated last year
- [NeurIPS 2025] Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM ☆24 · Feb 10, 2026 · Updated 2 months ago
- ☆13 · Jul 20, 2024 · Updated last year
- Code of LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents ☆31 · Nov 24, 2025 · Updated 5 months ago
- Official PyTorch implementation Source code for Weakly Supervised Video Scene Graph Generation via Natural Language Supervision, accepted… ☆24 · Jun 13, 2025 · Updated 10 months ago
- ☆14 · Aug 13, 2021 · Updated 4 years ago
- [CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method. ☆34 · Jul 12, 2023 · Updated 2 years ago
- Official code for the ICLR 2025 paper, "Ada-K Routing: Boosting the Efficiency of MoE-based LLMs" ☆12 · Mar 1, 2025 · Updated last year
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely… ☆55 · Sep 4, 2023 · Updated 2 years ago
- [ICLR2026] The code for "Interp3D: Correspondence-Aware Interpolation for Generative Textured 3D Morphing." ☆28 · Jan 21, 2026 · Updated 3 months ago
- Welcome to the official repository of Emotion-Qwen. ☆26 · Jun 10, 2025 · Updated 10 months ago
- Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter". ☆16 · Jun 20, 2023 · Updated 2 years ago
- ☆10 · Jan 9, 2025 · Updated last year
- Adaptive FSS has been Accepted by AAAI 2024. A Novel Few-Shot Segmentation Framework via Prototype Enhancement ☆43 · Mar 11, 2024 · Updated 2 years ago
- [EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models. ☆27 · Nov 18, 2025 · Updated 5 months ago
- Official Implementation (Pytorch) of the "Representation Shift: Unifying Token Compression with FlashAttention", ICCV 2025 ☆34 · Feb 22, 2026 · Updated 2 months ago
- Official code for CVPR 2024 paper, "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models" ☆16 · Apr 22, 2024 · Updated 2 years ago
- ☆21 · Jul 6, 2022 · Updated 3 years ago
- Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks (KDD 2023) ☆28 · Feb 16, 2024 · Updated 2 years ago
- ☆18 · May 7, 2025 · Updated 11 months ago
- ☆29 · Feb 27, 2025 · Updated last year
- https://arxiv.org/abs/2102.12594 ☆14 · Oct 3, 2023 · Updated 2 years ago
- [TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset ☆310 · Dec 25, 2024 · Updated last year
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better ☆300 · Jan 23, 2025 · Updated last year
- Pytorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆25 · Updated this week
- PyTorch implementation of Data2Vec self-supervised approach for vision use cases. ☆18 · Oct 7, 2022 · Updated 3 years ago
- ☆12 · Jan 4, 2022 · Updated 4 years ago
- ☆31 · Mar 25, 2024 · Updated 2 years ago
- ☆16 · Jan 6, 2025 · Updated last year
- ☆33 · Sep 27, 2024 · Updated last year
- UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning ☆163 · Jun 2, 2025 · Updated 11 months ago
- ☆16 · Jun 5, 2023 · Updated 2 years ago
- [WACV 2024 Oral] Rethinking Visibility in Human Pose Estimation: Occluded Pose Reasoning via Transformers ☆17 · Jul 6, 2024 · Updated last year
- Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation ☆17 · Nov 20, 2022 · Updated 3 years ago