[TMM 2025] This is the official Pytorch code for our paper "Visual Position Prompt for MLLM based Visual Grounding".
☆29 · Jul 23, 2025 · Updated 10 months ago
Alternatives and similar repositories for VPP-LLaVA
Users that are interested in VPP-LLaVA are comparing it to the libraries listed below.
- This is the official Pytorch code for our paper "Artemis: Structured Visual Reasoning for Perception Policy Learning". ☆14 · Dec 4, 2025 · Updated 5 months ago
- Implementation of the paper Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs ☆12 · Jun 7, 2025 · Updated 11 months ago
- 16k Hz Vocoder (HiFiGAN Codes and Pretrained Models) ☆18 · Apr 3, 2023 · Updated 3 years ago
- ☆13 · Dec 12, 2024 · Updated last year
- A C++ implementation of stft, melspectrogram and mel_to_stft ☆10 · Jun 2, 2022 · Updated 3 years ago
- [Tiny KWS] SparkNet: Sparse Binarization for Fast Keyword Spotting ☆17 · Aug 26, 2025 · Updated 8 months ago
- Implementation of our paper "Exploiting Unsupervised Data for Emotion Recognition in Conversations" in the Findings of EMNLP-2020. ☆13 · Nov 17, 2020 · Updated 5 years ago
- Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning [EMNLP 2025 Findings]" ☆18 · Aug 27, 2025 · Updated 8 months ago
- Pytorch Implementation of Automatic Relation-aware Graph Network Proliferation (CVPR'22, Oral) ☆26 · Apr 3, 2024 · Updated 2 years ago
- BioLiP2 database curation and web interface ☆33 · Jun 19, 2025 · Updated 11 months ago
- [CVPR'25] Attention IoU: Examining Biases in CelebA using Attention Maps ☆13 · Mar 26, 2025 · Updated last year
- [TPAMI] CTNet: Context-based Tandem Network for Semantic Segmentation ☆16 · Jun 15, 2022 · Updated 3 years ago
- Implementation of the multi-time-scale convolution layer used in the paper Multi-Time-Scale Convolution for Emotion Recognition from Spee… ☆11 · Oct 22, 2019 · Updated 6 years ago
- ☆10 · Jan 28, 2024 · Updated 2 years ago
- The official source code of our AAAI25 paper "D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matchin… ☆10 · Feb 9, 2025 · Updated last year
- [CVPR 2025] Official implementation of paper "Multi-Granularity Class Prototype Topology Distillation for Class-Incremental Source-Free … ☆18 · Apr 16, 2026 · Updated last month
- Unofficial version of LaneExtraction ☆13 · Oct 12, 2022 · Updated 3 years ago
- Source Code for the Paper "UNIFIED KEYWORD SPOTTING AND AUDIO TAGGING ON MOBILE DEVICES WITH TRANSFORMERS" ☆23 · Mar 6, 2023 · Updated 3 years ago
- Extract MFCCs from videos and make bag-of-audio-words (BOAW) representations. ☆11 · Dec 20, 2018 · Updated 7 years ago
- [ACL 2021] This is the Pytorch code for our paper "Semantic Relation-aware Difference Representation Learning for Change Captioning". ☆13 · Jan 16, 2022 · Updated 4 years ago
- [EMNLP 2024 Main] MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension ☆16 · Jan 6, 2025 · Updated last year
- Connective Cognition Network for Directional Visual Commonsense Reasoning ☆15 · May 6, 2021 · Updated 5 years ago
- RPIfield dataset for Person Re-identification ☆13 · Aug 17, 2020 · Updated 5 years ago
- [CVPR 2023] Official code for paper: Learning to Dub Movies via Hierarchical Prosody Models. ☆111 · Jun 21, 2024 · Updated last year
- Paper Statistics for CVPR'22 ☆14 · Jun 1, 2022 · Updated 3 years ago
- Repository for my paper: Deep Multilayer Perceptrons for Dimensional Speech Emotion Recognition ☆11 · Oct 24, 2023 · Updated 2 years ago
- This code is used to get images from google maps given a GPS region or a center GPS point and a Zoom level. ☆18 · Dec 16, 2024 · Updated last year
- Official PyTorch Implementation of Exploring Stochastic Autoregressive Image Modeling for Visual Representation, Accepted by AAAI 2023. ☆16 · Jul 3, 2023 · Updated 2 years ago
- ☆13 · May 21, 2023 · Updated 3 years ago
- AN INTERACTIVE REMOTE SENSING CHANGE ANALYSIS MODEL BASED ON MULTIMODAL INSTRUCTION TUNING ☆22 · Jun 16, 2025 · Updated 11 months ago
- ☆22 · May 16, 2023 · Updated 3 years ago
- ☆19 · Jul 15, 2022 · Updated 3 years ago
- ☆14 · Jul 8, 2018 · Updated 7 years ago
- [CVPR2025] Code Release of Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception ☆25 · Jun 17, 2025 · Updated 11 months ago
- [ICLR 2026] Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization ☆30 · Mar 6, 2026 · Updated 2 months ago
- ☆14 · Oct 11, 2023 · Updated 2 years ago
- Pytorch implementation for our NeurIPS 2019 paper "TAB-VCR: Tags and Attributes based VCR Baselines" https://arxiv.org/abs/1910.14671 ☆19 · May 6, 2021 · Updated 5 years ago
- Official code for Guiding Language Model Math Reasoning with Planning Tokens ☆19 · Feb 29, 2024 · Updated 2 years ago
- implementation of TDConvED for video captioning ☆13 · Mar 18, 2020 · Updated 6 years ago