JacobYuan7 / RLIPv2

[ICCV 2023] RLIPv2: Fast Scaling of Relational Language-Image Pre-training

☆135

Alternatives and similar repositories for RLIPv2

Users who are interested in RLIPv2 are comparing it to the repositories listed below.

Artanic30 / HOICLIP
CVPR 2023 Accepted Paper HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models
☆68 Updated last year
JacobYuan7 / RLIP
[NeurIPS 2022 Spotlight] RLIP: Relational Language-Image Pre-training and a series of other methods to solve HOI detection and Scene Grap…
☆78 Updated last year
fredzzhang / pvic
[ICCV'23] Official PyTorch implementation for paper "Exploring Predicate Visual Context in Detecting Human-Object Interactions"
☆85 Updated last year
LilyDaytoy / OpenPVSG
Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23
☆99 Updated last year
IDEA-Research / DiffHOI
Official implementation of the paper "Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model"
☆64 Updated 2 years ago
farewellthree / BT-Adapter
[CVPR 2024] Official PyTorch implementation of the paper "One For All: Video Conversation is Feasible Without Video Instruction Tuning"
☆35 Updated last year
xingaoli / DP-HOI
Disentangled Pre-training for Human-Object Interaction Detection
☆26 Updated 2 months ago
xk-huang / segment-caption-anything
[CVPR 24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA), links for downloadin…
☆230 Updated last year
haochenheheda / LVVIS
Large-Vocabulary Video Instance Segmentation dataset
☆95 Updated last year
shikras / d-cube
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating…
☆138 Updated last year
facebookresearch / EgoVLPv2
Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023]
☆100 Updated last year
OreoChocolate / MUREN
The official code for Relational Context Learning for Human-Object Interaction Detection, CVPR2023.
☆52 Updated 2 years ago
Ziyang412 / UCoFiA
PyTorch code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)
☆66 Updated last year
wengzejia1 / Open-VCLIP
☆119 Updated last year
mlvlab / Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
☆77 Updated 7 months ago
HengLan / CGSTVG
[CVPR 2024] Context-Guided Spatio-Temporal Video Grounding
☆61 Updated last year
Ahnsun / merlin
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
☆95 Updated last year
NeeluMadan / ViFM_Survey
Foundation Models for Video Understanding: A Survey
☆141 Updated 4 months ago
PolyU-ChenLab / ETBench
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆70 Updated 10 months ago
Visual-AI / FROSTER
[ICLR 2024] FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition
☆90 Updated 10 months ago
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆64 Updated last year
xiaomabufei / FGAHOI
☆33 Updated 2 years ago
ut-vision / ActionVOS
[ECCV 2024 Oral] ActionVOS: Actions as Prompts for Video Object Segmentation
☆31 Updated 11 months ago
OpenGVLab / EgoVideo
[CVPR 2024 Champions][ICLR 2025] Solutions for EgoVis Challenges in CVPR 2024
☆132 Updated 6 months ago
Hon-Wong / Elysium
[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
☆86 Updated last year
callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆79 Updated last year
gyxxyg / VTG-LLM
[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
☆115 Updated 11 months ago
mbzuai-oryx / Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
☆259 Updated 3 months ago
SY-Xuan / Pink
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
☆95 Updated 10 months ago
Echo0125 / MAT-Memory-and-Anticipation-Transformer
[ICCV 2023] Official implementation of Memory-and-Anticipation Transformer for Online Action Understanding
☆49 Updated 2 years ago