zhjohnchan / SK-VG
[CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.
☆33 · Jul 12, 2023 · Updated 2 years ago
Alternatives and similar repositories for SK-VG
Users that are interested in SK-VG are comparing it to the libraries listed below
- released code for CVPR2021: Deeply Shape-guided Cascade for Instance Segmentation ☆14 · Feb 20, 2022 · Updated 3 years ago
- The Stardog Whisperer: TypeScript/JS parsers for Stardog languages ☆16 · Jul 9, 2025 · Updated 7 months ago
- This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with H… ☆16 · May 21, 2024 · Updated last year
- ☆14 · Jul 24, 2025 · Updated 6 months ago
- ☆39 · Jun 28, 2023 · Updated 2 years ago
- [TIP2020] ICNet: Information Conversion Network for RGB-D Based Salient Object Detection ☆17 · Nov 17, 2023 · Updated 2 years ago
- A curated list of research papers in Referring Expression Comprehension (REC) ☆46 · May 13, 2021 · Updated 4 years ago
- Code for ACL 2023 paper titled "Lifting the Curse of Capacity Gap in Distilling Language Models" ☆29 · Jul 14, 2023 · Updated 2 years ago
- Improving One-stage Visual Grounding by Recursive Sub-query Construction, ECCV 2020 ☆89 · Sep 30, 2021 · Updated 4 years ago
- Official codebase for "Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding" ☆22 · Dec 20, 2020 · Updated 5 years ago
- Learning Situation Hyper-Graphs for Video Question Answering ☆22 · Feb 16, 2024 · Updated last year
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of … ☆504 · Aug 9, 2024 · Updated last year
- The official PyTorch Implementation of the Paper "Adversarial Visual Robustness by Causal Intervention" ☆18 · Oct 6, 2021 · Updated 4 years ago
- ☆31 · Nov 17, 2024 · Updated last year
- [ICCV'25] HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics ☆38 · Sep 10, 2025 · Updated 5 months ago
- Official Implementation for paper "Referring Transformer: A One-step Approach to Multi-task Visual Grounding" Neurips 2021 ☆68 · May 26, 2022 · Updated 3 years ago
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- This repository is about how to build an SQLite version of the Arabic WordNet database. ☆10 · Mar 19, 2019 · Updated 6 years ago
- ☆28 · Jul 1, 2020 · Updated 5 years ago
- [CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding ☆153 · Jul 13, 2024 · Updated last year
- Lattice Recurrent Unit: Improving Convergence and Statistical Efficiency for Sequence Modeling ☆36 · Jan 30, 2018 · Updated 8 years ago
- A codebase for data crawling and preprocessing for TTS and ASR systems training. ☆22 · Feb 5, 2026 · Updated last week
- A nonparametric variational information bottleneck (NVIB) layer in Pytorch ☆11 · Apr 15, 2025 · Updated 9 months ago
- Code for reproducing the results from "CrAM: A Compression-Aware Minimizer" accepted at ICLR 2023 ☆10 · Mar 1, 2023 · Updated 2 years ago
- Deep Neural Networks for Python ☆10 · Sep 22, 2015 · Updated 10 years ago
- Poincaré Embedding ☆39 · Nov 21, 2017 · Updated 8 years ago
- Arabic Word-Embedding (Word2vec) model training from Wikipedia articles ☆11 · Dec 13, 2018 · Updated 7 years ago
- Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022 ☆96 · Dec 2, 2022 · Updated 3 years ago
- ☆38 · Nov 12, 2017 · Updated 8 years ago
- A simple Markov-chain-based music generator. It was trained on Bach's violin concertos and used as a demonstration for a lecture to physi… ☆10 · Oct 14, 2016 · Updated 9 years ago
- Code for paper Audio Visual Speaker Localization from EgoCentric Views ☆11 · Jul 3, 2024 · Updated last year
- ☆11 · Oct 24, 2024 · Updated last year
- Official code for the paper: "Perception and Semantic Aware Regularization for Sequential Confidence Calibration (CVPR2023)" ☆10 · May 15, 2024 · Updated last year
- Tensegrity Lab is for efficiently exploring spatial structures based on pure pairwise push and pull forces using Rust Language ☆11 · Feb 4, 2026 · Updated last week
- Calculate Mahalanobis distances for multivariate data. ☆12 · Mar 23, 2020 · Updated 5 years ago
- F-16 is a powerful video large language model (LLM) that perceives high-frame-rate videos, which is developed by the Department of Electr… ☆34 · Jul 3, 2025 · Updated 7 months ago
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆108 · May 29, 2025 · Updated 8 months ago
- phonetic transcription for arabic ☆10 · Sep 17, 2019 · Updated 6 years ago
- Question Dependent Recurrent Entity Network ☆13 · Sep 21, 2017 · Updated 8 years ago