zhousheng97/ViTXT-GQA

[IEEE TMM'25] Scene-Text Grounding for Text-Based Video Question Answering

☆17

Alternatives and similar repositories for ViTXT-GQA

Users who are interested in ViTXT-GQA are comparing it to the libraries listed below.

zhousheng97 / EgoTextVQA
[CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
☆52 · Jun 19, 2025 · Updated last year
kai422 / SCALE
[ICLR 2024] Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement.
☆15 · Mar 12, 2024 · Updated 2 years ago
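kalenforn / Real_time_face_recognition_with_insightface
This project implements real-time video face recognition with the MXNet framework in Python 3. It includes video transmission, face recognition, and other components, which users can adapt as needed. The whole project was built on Ubuntu 18.04.
☆16 · Dec 12, 2020 · Updated 5 years ago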
kalenforn / DCHMT
This is the official code of the paper "Differentiable Cross Modal Hashing via Multimodal Transformers"
☆18 · Mar 11, 2024 · Updated 2 years ago
yl3800 / EIGV
☆15 · Aug 12, 2022 · Updated 3 years ago
wzx99 / TMIM
☆13 · Oct 17, 2024 · Updated last year
ming71 / GCL
GCL implementation
☆14 · Mar 7, 2024 · Updated 2 years ago
nailwatts / FNIN
FNIN: A Fourier Neural Operator-based Numerical Integration Network for Surface-form-gradients
☆13 · Jan 22, 2025 · Updated last year
Hxyz-123 / GoMatching
[NeurIPS'24] GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching
☆34 · May 29, 2025 · Updated last year
weijiawu / BOVText-Benchmark
[NeurIPS2021] BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting
☆71 · Oct 9, 2023 · Updated 2 years ago
any2any-mllm / awesome-any2any
A repository collecting awesome any2any work.
☆30 · Jul 10, 2026 · Updated 2 weeks ago
hanqiu-hq / MAD
☆14 · Sep 9, 2024 · Updated last year
kai422 / CoVOS
[CVPR 2022] Accelerating Video Object Segmentation with Compressed Video
☆41 · Jul 3, 2022 · Updated 4 years ago
ZijiaLewisLu / CVPR2025-DeCafNet
Official repo for the CVPR 2025 paper "DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos"
☆17 · Mar 16, 2026 · Updated 4 months ago
DrLuo / SemiETS
[CVPR 2025] SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting
☆17 · Jul 1, 2025 · Updated last year
HDUyiming / SOCCER
We are very happy that our work has been accepted by ACM Multimedia 2024! 🥰
☆12 · Jan 8, 2025 · Updated last year
Lee-zixu / FineCIR
☆12 · Mar 31, 2025 · Updated last year
CSC2548 / image_caption_gan
☆10 · May 4, 2018 · Updated 8 years ago
Gyann-z / FDP
☆16 · Apr 21, 2025 · Updated last year
scofield7419 / MUIE-REAMO
Code for the Grounded MUIE model, REAMO
☆11 · Dec 3, 2024 · Updated last year
ezeli / InSentiCap_model
A PyTorch implementation of our paper Image Captioning with Inherent Sentiment (ICME 2021 Oral).
☆11 · Jul 18, 2022 · Updated 4 years ago
viridityzhu / InstructHumans
[IEEE TMM] InstructHumans: Editing Animated 3D Human Textures with Instructions
☆68 · Feb 28, 2026 · Updated 5 months ago
kalenforn / clip-based-cross-modal-hash
This project summarizes the CLIP-based cross-modal hashing methods. Including DCMHT, MITH, DSPH, DNPH, TwDH (Two-Step Discrete Hashing fo…
☆55 · May 26, 2026 · Updated 2 months ago
PKU-ICST-MIPL / MAI_ICLR2025
☆20 · Mar 5, 2025 · Updated last year
entalent / MemCap
Code for the paper "MemCap: Memorizing Style Knowledge for Image Captioning"
☆11 · Mar 17, 2020 · Updated 6 years ago
yl3800 / IGV
This repo contains code for Invariant Grounding for Video Question Answering
☆27 · Mar 2, 2023 · Updated 3 years ago
Chiangsonw / CaLa
The official code of "CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval"
☆15 · Sep 19, 2024 · Updated last year
hongwang600 / fashion-iq-metadata
This repo contains useful metadata for the Fashion IQ challenge: https://sites.google.com/view/lingir/fashion-iq
☆15 · Jun 28, 2019 · Updated 7 years ago
PKU-ICST-MIPL / MGAH_TMM2019
Source code of our TMM 2019 paper "Multi-pathway Generative Adversarial Hashing for Unsupervised Cross-modal Retrieval"
☆12 · Jun 17, 2019 · Updated 7 years ago
DarrenZZhang / MM23-MITH
☆21 · Apr 10, 2024 · Updated 2 years ago
blisgard / BucketedRankingBasedLosses
Official PyTorch Implementation of Bucketed Ranking-based Losses for Efficient Training of Object Detectors [ECCV2024]
☆26 · Apr 27, 2025 · Updated last year
EricWWWW / image-caption-metrics
A Python 3 library for NLP and image-caption metrics: BLEU, METEOR, CIDEr, ROUGE, SPICE, and WMD.
☆14 · Sep 13, 2022 · Updated 3 years ago
ZZDoog / ProDubber
[CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie…
☆23 · Jun 6, 2025 · Updated last year
bofang98 / UATVR
[ICCV'23] UATVR: Uncertainty-Adaptive Text-Video Retrieval
☆13 · Nov 5, 2023 · Updated 2 years ago
LijunRio / A-Self-Guided-Framework
This repository contains the code accompanying the paper "A Self-Guided Framework for Radiology Report Generation", accepted by MICCAI 20…
☆20 · Mar 11, 2024 · Updated 2 years ago
Chiaraplizz / OSNOM
Official repository from the paper "Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind"
☆17 · Mar 18, 2025 · Updated last year
daooshee / TE141K
Project website of TE141K.
☆17 · Mar 24, 2020 · Updated 6 years ago
remega / Compressd_Domain_SaliencyPrediction
Open-source code for the paper "Learning to Detect Video Saliency With HEVC Features"
☆13 · Nov 9, 2017 · Updated 8 years ago
icq-benchmark / icq-benchmark
☆19 · Jul 28, 2025 · Updated last year