z-x-yang/DoraemonGPT

Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models

☆91

Alternatives and similar repositories for DoraemonGPT

Users who are interested in DoraemonGPT are comparing it to the repositories listed below.

guikunchen / FEC
[CVPR'24] Neural Clustering based Visual Representation Learning
☆44 · Oct 6, 2025 · Updated 9 months ago
lingorX / LogicSeg
(ICCV23 Oral) LOGICSEG: Parsing Visual Semantics with Neural Logic Learning and Reasoning
☆25 · Apr 11, 2024 · Updated 2 years ago
DefaultRui / BEV-Scene-Graph
[ICCV23] Bird’s-Eye-View Scene Graph for Vision-Language Navigation
☆125 · Apr 12, 2024 · Updated 2 years ago
weijianan1 / NVI
[ECCV2024] Nonverbal Interaction Detection
☆31 · Oct 30, 2024 · Updated last year
pansanity666 / INO_VOS
The official code for [ACM MM 2022] 'In-N-Out Generative Learning for Dense Unsupervised Video Segmentation'.
☆20 · Feb 22, 2023 · Updated 3 years ago
DefaultRui / VLN-VER
[CVPR24] Volumetric Environment Representation for Vision-Language Navigation
☆143 · Sep 9, 2024 · Updated last year
kagawa588 / GvSeg
This is the official implementation of "GvSeg: General and Task-Oriented Video Segmentation" (Accepted at ECCV 2024).
☆18 · Jul 15, 2024 · Updated 2 years ago
sxl142 / TEx-Face
(AAAI2024) Controllable 3D Face Generation with Conditional Style Code Diffusion
☆40 · Apr 17, 2024 · Updated 2 years ago
pansanity666 / TransHuman
Official code for ICCV 2023 paper: "TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering".
☆67 · Jan 11, 2024 · Updated 2 years ago
VamosC / CoLearning-meet-StitchUp
[TIP 2023] Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition.
☆13 · Aug 19, 2023 · Updated 2 years ago
longxiang-ai / Human101
The official implementation of "Human101: Training 100+FPS Human Gaussians in 100s from 1 View".
☆110 · Dec 27, 2023 · Updated 2 years ago
cnsdqd-dyb / VillagerAgent-Minecraft-multiagent-framework
(VillagerAgent ACL 2024) A Graph based Minecraft multi agents framework
☆95 · Jun 5, 2026 · Updated last month
BUPT-PRIV / LOAF
☆100 · Sep 5, 2023 · Updated 2 years ago
kkahatapitiya / LangRepo
Code for our ACL 2025 paper "Language Repository for Long Video Understanding"
☆36 · Jun 17, 2024 · Updated 2 years ago
leonnnop / VAR
[CVPR 2022] Visual Abductive Reasoning
☆124 · Oct 22, 2024 · Updated last year
VamosC / CapHuman
[CVPR2024] CapHuman: Capture Your Moments in Parallel Universes
☆99 · Nov 20, 2024 · Updated last year
dhg-wei / MCL
(ICML 2024) Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning
☆28 · Sep 27, 2024 · Updated last year
Dreamer312 / SEED-GRPO
The official repository of SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization
☆159 · Jan 29, 2026 · Updated 6 months ago
Ruiyang-061X / UA3D
[ICCV'25] "Harnessing Uncertainty-aware Bounding Boxes for Unsupervised 3D Object Detection".
☆26 · Jan 12, 2026 · Updated 6 months ago
weijianan1 / LogicHOI
[NeurIPS2023] Neural-Logic Human-Object Interaction Detection
☆14 · Aug 24, 2024 · Updated last year
yuexihang / DeltaPhi
Implementation for "DeltaPhi: Learning Physical Trajectory Residual for PDE Solving"
☆13 · Jun 17, 2024 · Updated 2 years ago
yoxu515 / SEEAvatar
☆47 · Mar 24, 2024 · Updated 2 years ago
lfedgeai / shifu
Kubernetes-native IoT gateway
☆14 · Jul 21, 2025 · Updated last year
lingorX / HieraSeg
CVPR2022 - Deep Hierarchical Semantic Segmentation - A structured, pixel-wise description of visual scenes in terms of the class hierarch…
☆257 · Apr 24, 2023 · Updated 3 years ago
aspirinone / CATR.github.io
☆31 · Mar 1, 2024 · Updated 2 years ago
lfedgeai / eda
Data on-Prem, Code on-the-Fly
☆15 · Nov 22, 2025 · Updated 8 months ago
lfedgeai / yomo
🦖 Stateful Serverless Framework for Edge AI Infra
☆15 · Sep 3, 2025 · Updated 10 months ago
River-Zhang / GTA
[NeurIPS 23] Official repository for NeurIPS 2023 paper "Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction"
☆112 · Sep 21, 2025 · Updated 10 months ago
ChengHan111 / DNC
Official Pytorch implementation of "Visual Recognition with Deep Nearest Centroids". (ICLR2023 Spotlight)
☆69 · Feb 1, 2023 · Updated 3 years ago
cnsdqd-dyb / Guide-GRPO
Aims for memory-efficient training (24GB VRAM) on consumer GPUs. Optimizing language models through guidance tokens in reasoning chains, …
☆28 · Feb 23, 2025 · Updated last year
traveler-framework / TraveLER
[EMNLP 2024] TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering
☆18 · Oct 31, 2024 · Updated last year
wxh1996 / VideoAgent
☆150 · Apr 16, 2025 · Updated last year
knightyxp / VideoGrain
[ICLR 2025] VideoGrain: This repo is the official implementation of "VideoGrain: Modulating Space-Time Attention for Multi-Grained Video …
☆159 · Mar 24, 2025 · Updated last year
suoych / KEDs
Implementation of the paper Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval (CVPR 2024)
☆20 · Nov 4, 2024 · Updated last year
dhg-wei / TOPA
(NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment
☆29 · Sep 27, 2024 · Updated last year
antonioo-c / Diptych-Prompting
Unofficial implementation of 'Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator'
☆10 · Dec 10, 2024 · Updated last year
knightyxp / DGL
[AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.
☆49 · Oct 14, 2024 · Updated last year
Szy-Young / ActFormer
🔥ActFormer in PyTorch (ICCV 2023)
☆67 · May 14, 2024 · Updated 2 years ago
lfedgeai / SPEAR
Distributed Cloud-Edge Collaborative AI Agent Platform
☆35 · Jun 26, 2026 · Updated last month