vinthony / academic
Yet Another Academic Homepage Template
☆18 · Updated last month
Alternatives and similar repositories for academic:
Users interested in academic are comparing it to the repositories listed below
- ☆65 · Updated this week
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks ☆59 · Updated 4 months ago
- [ECCV 2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation … ☆36 · Updated this week
- [NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding ☆62 · Updated last month
- Official implementation of "Self-Improving Video Generation" ☆59 · Updated last month
- [NeurIPSw'24] This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simu… ☆81 · Updated 3 weeks ago
- ☆42 · Updated 9 months ago
- ☆15 · Updated 3 months ago
- Language Repository for Long Video Understanding ☆31 · Updated 8 months ago
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan… ☆56 · Updated 2 months ago
- ☆27 · Updated 7 months ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆40 · Updated 3 weeks ago
- ☆66 · Updated 2 months ago
- Code release for the paper "Egocentric Video Task Translation" (CVPR 2023 Highlight) ☆32 · Updated last year
- Official Repository of Multi-Object Hallucination in Vision-Language Models (NeurIPS 2024) ☆26 · Updated 3 months ago
- Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23 ☆83 · Updated 9 months ago
- Official repository for "iVideoGPT: Interactive VideoGPTs are Scalable World Models" (NeurIPS 2024), https://arxiv.org/abs/2405.15223 ☆113 · Updated last month
- [ICCV 2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding ☆75 · Updated last year
- ☆68 · Updated 7 months ago
- [CVPR 2024] Data and benchmark code for the EgoExoLearn dataset ☆54 · Updated 5 months ago
- [ECCV 2024] STEVE in Minecraft is for See and Think: Embodied Agent in Virtual Environment ☆36 · Updated last year
- Official repository of S-Agents: Self-organizing Agents in Open-ended Environment ☆21 · Updated 11 months ago
- Code for paper "Grounding Video Models to Actions through Goal Conditioned Exploration". ☆41 · Updated last month
- Binding Touch to Everything: Learning Unified Multimodal Tactile Representations ☆28 · Updated last week
- ☆61 · Updated 5 months ago
- This repository is a collection of research papers on World Models. ☆37 · Updated last year
- Latent Motion Token as the Bridging Language for Robot Manipulation ☆72 · Updated last week
- Egocentric Video Understanding Dataset (EVUD) ☆26 · Updated 7 months ago
- Official implementation of "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data" (ICLR 2024) ☆28 · Updated 4 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆44 · Updated last year