zjr2000/GVL

zjr2000 / GVL

Official implementation for the paper "Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos"

☆28

Alternatives and similar repositories for GVL

Users that are interested in GVL are comparing it to the libraries listed below.

zjr2000 / Untrimmed-Video-Feature-Extractor
A simple and effective feature extractor for untrimmed videos
☆13 · Sep 1, 2022 · Updated 3 years ago
zjr2000 / LLMVA-GEBC
Winner solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2023 workshop)
☆29 · Jan 1, 2024 · Updated 2 years ago
NIneeeeeem / LangDC
[EMNLP 2025 Oral] Official codebase for Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors.
☆18 · Sep 7, 2025 · Updated 10 months ago
zjr2000 / REVERIE
[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
☆20 · Jul 17, 2024 · Updated 2 years ago
zjr2000 / SPES
Official Implementation for paper "Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm"
☆23 · May 8, 2026 · Updated 2 months ago
zjr2000 / Awesome-Multimodal-Chatbot
Awesome Multimodal Assistant is a curated list of multimodal chatbots/conversational assistants that utilize various modes of interaction…
☆79 · Jun 18, 2023 · Updated 3 years ago
ttengwang / PDVC
End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)
☆230 · Jan 3, 2024 · Updated 2 years ago
ttgeng233 / UniAV
Unified Audio-Visual Perception for Multi-Task Video Localization
☆33 · Apr 19, 2024 · Updated 2 years ago
Yioutpi / Awesome-3D-Understanding
☆13 · Jul 22, 2024 · Updated 2 years ago
antoyang / VidChapters
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
☆213 · Nov 13, 2023 · Updated 2 years ago
jinhyunj / EaTR
Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023)
☆55 · Sep 7, 2023 · Updated 2 years ago
Guaranteer / VidSTG-Dataset
This repository provides the dataset introduced by the paper "Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentenc…
☆70 · May 1, 2020 · Updated 6 years ago
ttgeng233 / UnAV
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
☆73 · Jan 4, 2026 · Updated 6 months ago
KoDohwan / VT-TWINS
Video-Text Representation Learning via Differentiable Weak Temporal Alignment (PyTorch implementation for the CVPR 2022 paper)
☆11 · Oct 12, 2022 · Updated 3 years ago
QiQAng / UEDVC
☆12 · May 26, 2023 · Updated 3 years ago
microsoft / LAVENDER
A Unified Framework for Video-Language Understanding
☆62 · Jun 17, 2023 · Updated 3 years ago
baiyang4 / D-LSG-Video-Caption
☆26 · Oct 20, 2021 · Updated 4 years ago
ttengwang / ESGN
Event Sequence Generation Network
☆14 · Jun 22, 2021 · Updated 5 years ago
Tanveer81 / RGNet
This is the official implementation of RGNet: A Unified Retrieval and Grounding Network for Long Videos
☆20 · Mar 3, 2025 · Updated last year
ttgeng233 / LongVALE
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025)
☆61 · Jun 9, 2025 · Updated last year
ubc-tea / FedSoup
The official Pytorch implementation of paper "FedSoup: Improving Generalization and Personalization in Federated Learning via Selective M…
☆18 · Apr 14, 2024 · Updated 2 years ago
HuiGuanLab / DL-DKD
Source code of the paper Dual Learning with Dynamic Knowledge Distillation and Soft Alignment for Partially Relevant Video Retrieval
☆19 · May 13, 2026 · Updated 2 months ago
TencentARC / ARC-Chapter
Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
☆44 · Nov 19, 2025 · Updated 8 months ago
fmu2 / snag_release
Official Implementation of SnAG (CVPR 2024)
☆59 · Apr 26, 2025 · Updated last year
lethal233 / SUSTech-CS315
CS315 Lab & Assignment in SUSTech
☆22 · Dec 14, 2021 · Updated 4 years ago
rxtan2 / Koala-video-llm
☆37 · Sep 16, 2024 · Updated last year
Darshansingh11 / AVLectures
Official repository of the paper "Unsupervised Audio-Visual Lecture Segmentation", WACV 2023
☆13 · Mar 3, 2025 · Updated last year
TencentARC / TimeLens
[CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
☆162 · Updated this week
CASIA-IVA-Lab / OPT_Questioner
Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner"
☆15 · Aug 9, 2023 · Updated 2 years ago
TianheWu / Assessor360
[NeurIPS 2023] Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment
☆38 · Oct 11, 2023 · Updated 2 years ago
waybarrios / guidance-based-video-grounding
[ICCV 2023] The official PyTorch implementation of the paper: "Localizing Moments in Long Video Via Multimodal Guidance"
☆23 · Sep 26, 2024 · Updated last year
entalent / MemCap
code for paper `MemCap: Memorizing Style Knowledge for Image Captioning`
☆11 · Mar 17, 2020 · Updated 6 years ago
TencentARC / FLM
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
☆31 · May 15, 2023 · Updated 3 years ago
wjun0830 / CGDETR
Official pytorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Gr…
☆154 · Aug 21, 2024 · Updated last year
RyanLiut / awesome-diverse-captioning
Some papers about *diverse* image (a few videos) captioning
☆25 · Apr 4, 2023 · Updated 3 years ago
jerinphilip / ilmulti
Tooling to play around with multilingual machine translation for Indian Languages.
☆22 · Mar 5, 2022 · Updated 4 years ago
md-mohaiminul / ViS4mer
☆58 · Dec 2, 2025 · Updated 7 months ago
iGuoYanjun / Memorize-When-Needed
☆23 · Jun 29, 2026 · Updated 3 weeks ago
weizhou-geek / SFSN
Implementation of QoMEX 2021 "Image Super-Resolution Quality Assessment: Structural Fidelity Versus Statistical Naturalness"
☆16 · Sep 28, 2022 · Updated 3 years ago