qiulu66/EgoPlan-Bench2

☆27

Alternatives and similar repositories for EgoPlan-Bench2

Users who are interested in EgoPlan-Bench2 are comparing it to the repositories listed below.

TencentARC / Moto
[ICCV2025 Oral] Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
☆164 Oct 1, 2025 Updated 5 months ago
TencentARC / Divot
Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025)
☆86 Feb 27, 2025 Updated last year
TencentARC / Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆88 Jul 13, 2025 Updated 7 months ago
TencentARC / TVTS
Turning to Video for Transcript Sorting
☆49 Aug 27, 2023 Updated 2 years ago
TencentARC / SEED-Bench-R1
☆98 Jun 23, 2025 Updated 8 months ago
TencentARC / GRPO-CARE
☆81 Jun 23, 2025 Updated 8 months ago
Karine-Huang / GenMAC
[AAAI 2026] GenMAC for Compositional Text-to-Video Generation
☆32 Jan 10, 2026 Updated 2 months ago
TencentARC / TaCA
Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".
☆16 Jun 20, 2023 Updated 2 years ago
rxtan2 / Koala-video-llm
☆37 Sep 16, 2024 Updated last year
tmbdev-archive / webdataset-imagenet-2
A small repository demonstrating the use of Webdataset and Imagenet
☆17 Dec 19, 2023 Updated 2 years ago
mayhugotong / VideoINSTA
This is the official implementation of the EMNLP Findings paper, VideoINSTA: Zero-shot Long-Form Video Understanding via Informative Spatia…
☆24 Nov 15, 2024 Updated last year
mightyzau / InfMLLM
☆19 Dec 6, 2023 Updated 2 years ago
liyz15 / Aligning-Latent-Spaces-with-Flow-Priors
☆40 Jun 6, 2025 Updated 9 months ago
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆80 Jun 17, 2024 Updated last year
TencentARC / Plot2Code
☆23 Aug 17, 2024 Updated last year
IRL-VLA / IRL-VLA
Official repo for IRL-VLA
☆76 Aug 13, 2025 Updated 6 months ago
Jazzcharles / Egoinstructor
PyTorch implementation for Egoinstructor at CVPR 2024
☆28 Dec 1, 2024 Updated last year
AV-Odyssey / AV-Odyssey
This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"
☆31 Dec 23, 2024 Updated last year
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58 Jun 27, 2023 Updated 2 years ago
UniAdapter / UniAdapter
☆26 Mar 20, 2023 Updated 2 years ago
TencentARC / TokLIP
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
☆236 Aug 18, 2025 Updated 6 months ago
AmrinKareem / PARIS3D
Official implementation of PARIS3D (Accepted to ECCV 2024).
☆27 Sep 25, 2024 Updated last year
TencentARC / ViT-Lens
[CVPR 2024] ViT-Lens: Towards Omni-modal Representations
☆190 Feb 3, 2025 Updated last year
AILab-CVC / SEED-Bench
(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
☆361 Jan 14, 2025 Updated last year
OpenGVLab / EgoExoLearn
[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset
☆80 Aug 26, 2025 Updated 6 months ago
YuqingWang1029 / TokenBridge
[ICCV2025] TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. https://yuqingwang1029.github.io/To…
☆153 Jul 24, 2025 Updated 7 months ago
alanaai / EVUD
Egocentric Video Understanding Dataset (EVUD)
☆33 Jul 4, 2024 Updated last year
TencentARC / DSR_Suite
☆66 Feb 23, 2026 Updated 2 weeks ago
Kami-code / HandsOnVLM-release
HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction
☆41 Sep 15, 2025 Updated 5 months ago
SCZwangxiao / video-ReTaKe
Official implementation of the paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
☆40 Mar 16, 2025 Updated 11 months ago
sega-hsj / MVT-3DVG
[CVPR 2022] Multi-View Transformer for 3D Visual Grounding
☆80 Nov 9, 2022 Updated 3 years ago
3dlg-hcvc / multi3drefer
[ICCV 2023] Multi3DRefer: Grounding Text Description to Multiple 3D Objects
☆94 Oct 18, 2025 Updated 4 months ago
OpenDFM / MULTI-Benchmark
[SCIS] MULTI-Benchmark: Multimodal Understanding Leaderboard with Text and Images
☆44 Nov 19, 2025 Updated 3 months ago
ZhengYu518 / VL-Mamba
Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"
☆85 Mar 21, 2024 Updated last year
TencentARC / MindOmni
☆141 Oct 15, 2025 Updated 4 months ago
nirgreshler / bayesian-online-planning
The code for the paper "A Bayesian Approach to Online Planning" published in ICML 2024.
☆13 Jun 17, 2024 Updated last year
YangLiu9208 / VisionGRU
VisionGRU: A Linear-Complexity RNN Model for Efficient Image Analysis
☆13 Dec 26, 2024 Updated last year
ExplainableML / EgoCVR
[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
☆41 Apr 11, 2025 Updated 10 months ago
Robot-K / OpenVLA_AIRBOT
OpenVLA for AIRBOT
☆15 Aug 15, 2024 Updated last year