vinthony / academic
Yet Another Academic Homepage Template
☆19 Updated 2 weeks ago
Alternatives and similar repositories for academic:
Users who are interested in academic are comparing it to the repositories listed below.
- [ICML 2024] A Touch, Vision, and Language Dataset for Multimodal Alignment ☆72 Updated 2 months ago
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks ☆58 Updated 7 months ago
- [ICLR 2024] Seer: Language Instructed Video Prediction with Latent Diffusion Models ☆31 Updated 11 months ago
- Language Repository for Long Video Understanding ☆31 Updated 10 months ago
- 🤖 [ICLR'25] Multimodal Video Understanding Framework (MVU) ☆36 Updated 2 months ago
- [ECCV2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation … ☆37 Updated 2 months ago
- Official implementation of "Self-Improving Video Generation" ☆63 Updated last month
- Code release for the paper "Egocentric Video Task Translation" (CVPR 2023 Highlight) ☆32 Updated last year
- ☆19 Updated 5 months ago
- Code release for NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models ☆85 Updated last year
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆41 Updated 3 months ago
- Official Release of NeurIPS 2023 Spotlight paper "Object-Centric Slot Diffusion" ☆65 Updated last year
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆50 Updated last month
- Training code for CLIP-FlanT5 ☆26 Updated 8 months ago
- [ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding ☆76 Updated last year
- Official code for MotionBench (CVPR 2025) ☆35 Updated last month
- Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023] ☆97 Updated 9 months ago
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators" ☆105 Updated last year
- Egocentric Video Understanding Dataset (EVUD) ☆29 Updated 9 months ago
- ElasticTok: Adaptive Tokenization for Image and Video ☆66 Updated 5 months ago
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, … ☆100 Updated 4 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆47 Updated 8 months ago
- This repo contains the official implementation of ICLR 2024 paper "Is ImageNet worth 1 video? Learning strong image encoders from 1 long … ☆87 Updated 11 months ago
- [ICCV 2023] Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models ☆84 Updated last year
- [ICLR 2024] LLM-grounded Video Diffusion Models (LVD): official implementation for the LVD paper ☆150 Updated 11 months ago
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics ☆90 Updated last month
- Code for the paper "GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos" published at CVPR 2024 ☆51 Updated last year
- ☆41 Updated last month
- Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024) ☆27 Updated 8 months ago
- Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24). ☆38 Updated 11 months ago