This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions", which was accepted to ACL 2024 (Findings).
☆16May 21, 2024Updated 2 years ago
Alternatives and similar repositories for IVG
Users who are interested in IVG are comparing it to the repositories listed below.
- Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner"☆15Aug 9, 2023Updated 2 years ago
- Med-DANet Series (ECCV 2022 & WACV 2024)☆13Jan 2, 2024Updated 2 years ago
- ☆13Oct 30, 2023Updated 2 years ago
- ☆22May 16, 2023Updated 3 years ago
- Entity-Aware and Motion-Aware Transformers for Language-driven Action Localization(IJCAI-22)☆12Oct 11, 2022Updated 3 years ago
- [ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model☆43Dec 25, 2024Updated last year
- [NeurIPS 2025] Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM☆27Feb 10, 2026Updated 4 months ago
- ☆13Jul 20, 2024Updated last year
- Official PyTorch implementation Source code for Weakly Supervised Video Scene Graph Generation via Natural Language Supervision, accepted…☆24Jun 13, 2025Updated last year
- ☆14Aug 13, 2021Updated 4 years ago
- [CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.☆34Jul 12, 2023Updated 2 years ago
- Official code for the ICLR 2025 paper, "Ada-K Routing: Boosting the Efficiency of MoE-based LLMs"☆12Mar 1, 2025Updated last year
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely…☆55Sep 4, 2023Updated 2 years ago
- ☆31Nov 17, 2024Updated last year
- [ICLR2026] The code for "Interp3D: Correspondence-Aware Interpolation for Generative Textured 3D Morphing."☆30Jan 21, 2026Updated 4 months ago
- Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".☆17Jun 20, 2023Updated 2 years ago
- ☆10Jan 9, 2025Updated last year
- Adaptive FSS has been Accepted by AAAI 2024. A Novel Few-Shot Segmentation Framework via Prototype Enhancement☆43Mar 11, 2024Updated 2 years ago
- [EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models.☆28Nov 18, 2025Updated 6 months ago
- Code for Learned Thresholds Token Merging and Pruning for Vision Transformers (LTMP). A technique to reduce the size of Vision Transforme…☆17Nov 24, 2024Updated last year
- Official Implementation (Pytorch) of the "Representation Shift: Unifying Token Compression with FlashAttention", ICCV 2025☆35Feb 22, 2026Updated 3 months ago
- Official code for CVPR 2024 paper, "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models"☆16Apr 22, 2024Updated 2 years ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …☆63Nov 5, 2024Updated last year
- Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks (KDD 2023)☆28Feb 16, 2024Updated 2 years ago
- ☆18May 7, 2025Updated last year
- This repo holds the official code for the paper "FreMIM: Fourier Transform Meets Masked Image Modeling for Medical Image Segmentation".☆24Jan 2, 2024Updated 2 years ago
- https://arxiv.org/abs/2102.12594☆14Oct 3, 2023Updated 2 years ago
- [TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset☆311Dec 25, 2024Updated last year
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better☆301Jan 23, 2025Updated last year
- PyTorch implementation of Data2Vec self-supervised approach for vision use cases.☆18Oct 7, 2022Updated 3 years ago
- ☆31Mar 25, 2024Updated 2 years ago
- ☆33Sep 27, 2024Updated last year
- ISMIR 2021: Curriculum Learning for Imbalanced Classification in Large Vocabulary Automatic Chord Recognition☆10Nov 8, 2021Updated 4 years ago
- UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning☆163Jun 2, 2025Updated last year
- An efficient GRPO training util.☆55Jun 13, 2025Updated last year
- ☆16Jun 5, 2023Updated 3 years ago
- [WACV 2024 Oral] Rethinking Visibility in Human Pose Estimation: Occluded Pose Reasoning via Transformers☆16Jul 6, 2024Updated last year
- Implementations from a pattern classification course, including perceptron, relaxation procedure, MSE, Fisher, Ho-Kashyap, SVM, and KNN☆13May 29, 2018Updated 8 years ago
- ☆21Oct 10, 2023Updated 2 years ago