Qinying-Liu/Awesome-omni-modal-understanding

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Qinying-Liu/Awesome-omni-modal-understanding)

Qinying-Liu / Awesome-omni-modal-understanding

Collection of papers about video-audio understanding

☆25

Alternatives and similar repositories for Awesome-omni-modal-understanding

Users that are interested in Awesome-omni-modal-understanding are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

LikeLidoA / BiDGANet
View on GitHub
[CAC2023] Bilateral Network with Residual U-blocks and Dual-Guided Attention for Real-time Semantic Segmentation
☆11Nov 28, 2024Updated last year
EdenHazardan / SFC
View on GitHub
☆11Jun 13, 2024Updated last year
Qinying-Liu / C3BN
View on GitHub
accepted by ICME2023 oral（CCF B）
☆64Jul 16, 2023Updated 2 years ago
thuhcsi / Contextual-Biasing-Dataset
View on GitHub
open-source Mandarian biased word dataset
☆14Sep 21, 2023Updated 2 years ago
ustc-hyin / HiMAP
View on GitHub
Code for paper: Unraveling the Shift of Visual Information Flow in MLLMs: From Phased Interaction to Efficient Inference
☆13Jun 7, 2025Updated 11 months ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
Qinying-Liu / PBRNet
View on GitHub
The code is for PBRnet for action detection
☆76Jun 10, 2021Updated 4 years ago
Qinying-Liu / CoDT
View on GitHub
The implementaion of CoDT on the task of NTU-60+->PKUMMD
☆76Feb 24, 2023Updated 3 years ago
PRIS-CV / AutoDriveRL
View on GitHub
☆19Jun 13, 2025Updated 11 months ago
Qinying-Liu / OpenWTAL
View on GitHub
a unified and simple codebase for weakly-supervised temporal action localization
☆23Sep 30, 2023Updated 2 years ago
HITsz-TMG / ICL-State-Vector
View on GitHub
☆12Jul 4, 2024Updated last year
zlngan / ASQuery
View on GitHub
☆12Jan 26, 2025Updated last year
PostMindLab / ICD
View on GitHub
[ACL 2024] Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
☆17Nov 10, 2025Updated 6 months ago
baopj / Vid-Morp
View on GitHub
☆12Dec 6, 2024Updated last year
baopj / E3M
View on GitHub
[ECCV 2024] The first zero-shot setting for spatio-temporal video grounding.
☆11Jul 16, 2024Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
ljang0 / videowebarena
View on GitHub
☆14Dec 25, 2024Updated last year
pengzhansun / Occluded-Pose-Reasoning
View on GitHub
[WACV 2024 Oral] Rethinking Visibility in Human Pose Estimation: Occluded Pose Reasoning via Transformers
☆17Jul 6, 2024Updated last year
zhyx12 / CRCo
View on GitHub
☆35Mar 20, 2023Updated 3 years ago
LaVi-Lab / FTTT
View on GitHub
[ACL 2025] Official code for ''Learning to Reason from Feedback at Test-Time''.
☆14May 16, 2025Updated last year
orangeshushu / TCM-Ladder
View on GitHub
The first multimodal QA dataset specifically designed for evaluating large TCM language models.
☆21Oct 24, 2025Updated 6 months ago
JieShibo / MemVP
View on GitHub
[ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
☆50May 12, 2024Updated 2 years ago
hxixixh / amo-release
View on GitHub
Official implementation for CVPR 2025 paper "AMO Sampler: Enhancing Text Rendering with Overshooting"
☆30May 3, 2025Updated last year
YiwengXie / FluxMem
View on GitHub
[CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding
☆61Mar 16, 2026Updated 2 months ago
TsinghuaC3I / AdsQA
View on GitHub
[ICCV 2025] AdsQA: Towards Advertisement Video Understanding Arxiv: https://arxiv.org/abs/2509.08621
☆34Oct 30, 2025Updated 6 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
xuyang-liu16 / V2Drop
View on GitHub
[CVPR 2026] Variation-aware Vision Token Dropping for Faster Large Vision-Language Models
☆30Mar 18, 2026Updated 2 months ago
nianfd / RWKV-VG
View on GitHub
☆10Dec 3, 2024Updated last year
wangzhichuan123 / DAC
View on GitHub
[ICCV 2025] Official PyTorch Code for "Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval"
☆17Aug 23, 2025Updated 8 months ago
CR400AF-A / SparseMM
View on GitHub
[ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
☆85Jan 17, 2026Updated 4 months ago
hyungjin-chung / VPS
View on GitHub
☆16Sep 11, 2025Updated 8 months ago
daiwk / BERT-pytorch
View on GitHub
Google AI 2018 BERT pytorch implementation
☆13Oct 22, 2018Updated 7 years ago
Never-wx / GCD
View on GitHub
[AAAI 2025] The official repository of our paper "GCD: Advancing Vision-Language Models for Incremental Object Detection via Global Align…
☆19Sep 10, 2025Updated 8 months ago
Specnr / FSGOptimizedSeedBank
View on GitHub
An efficient implementation of the FSG seed bank
☆11Jan 4, 2022Updated 4 years ago
tanvir-utexas / PaPr
View on GitHub
☆13Jul 3, 2024Updated last year
Open source password manager - Proton Pass • Ad
Securely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
KaiyangLi1992 / Uni-LoRA
View on GitHub
☆43Jan 16, 2026Updated 4 months ago
hkust-nlp / GUIMid
View on GitHub
☆22May 3, 2025Updated last year
sinwang20 / D2PO
View on GitHub
[ACL 2025] "World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning." https://arxiv.org/abs/2503.1…
☆18Jul 22, 2025Updated 9 months ago
mengchuang123 / VASparse-github
View on GitHub
[CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
☆50Mar 24, 2025Updated last year
Lliar-liar / Daily-Omni
View on GitHub
This is the official repository of Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
☆40Apr 28, 2026Updated 3 weeks ago
bscho333 / ReVisiT
View on GitHub
[ACL 2026 Main] Revisit What You See: Revealing Visual Semantics in Vision Tokens to Guide LVLM Decoding
☆25Nov 21, 2025Updated 5 months ago
ssppp / Click4Caption
View on GitHub
A visual LLM for image region description or QA.
☆16Jul 14, 2023Updated 2 years ago