Official implementation of EgoThinker at NeurIPS 2025
☆24Nov 25, 2025Updated 3 months ago
Alternatives and similar repositories for EgoThinker
Users who are interested in EgoThinker are comparing it to the repositories listed below
- ☆13Apr 23, 2025Updated 10 months ago
- [NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding☆77Dec 14, 2025Updated 2 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 7 months ago
- Official Implementation of "Open-Vocabulary Audio-Visual Semantic Segmentation" [ACM MM 2024 Oral].☆35Nov 2, 2024Updated last year
- ☆28Aug 6, 2025Updated 6 months ago
- ☆34Mar 10, 2023Updated 2 years ago
- ClawPhD is an agent for research that can turn academic papers into publication-ready diagrams, posters, videos, and more.☆55Updated this week
- A codebase for data crawling and preprocessing for TTS and ASR systems training.☆22Updated this week
- A Google Chrome Extension that replaces the official New Tab page with a beautiful to-do list.☆12Mar 7, 2018Updated 7 years ago
- ☆10Nov 17, 2022Updated 3 years ago
- ECG analysis to classify anterior myocardial infarction cases.☆10May 17, 2017Updated 8 years ago
- [CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-…☆40Apr 20, 2025Updated 10 months ago
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024☆50Oct 12, 2025Updated 4 months ago
- [BMVC 2021] OMAD: Object Model with Articulated Deformations for Pose Estimation and Retrieval☆12Dec 17, 2021Updated 4 years ago
- This repository contains code used for our Multi Sentence Inference NAACL'22 paper.☆12Mar 6, 2023Updated 2 years ago
- ☆13May 21, 2024Updated last year
- [NeurIPS 2025] FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens☆20Oct 12, 2025Updated 4 months ago
- EgoToM is an egocentric theory-of-mind benchmark built on Ego4D videos, containing multi-choice questions that evaluate multimodal large …☆13Apr 1, 2025Updated 10 months ago
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition"☆12Feb 27, 2024Updated 2 years ago
- Official code release for "TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion", accepted at ICIST 2023☆12Mar 17, 2024Updated last year
- [ICCV 2025] Official repo of "EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow"☆27Oct 16, 2025Updated 4 months ago
- Just wanna see what type and how many GPUs/TPUs are used in CVPR 2025 oral papers. Fun vibe coding with LLMs.☆12Apr 24, 2025Updated 10 months ago
- A few TensorFlow techniques I'm saving for future reference.☆13Oct 4, 2016Updated 9 years ago
- [CVPR 2024] Code and datasets for 'Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos'☆13Jun 16, 2024Updated last year
- This is the official PyTorch code for our paper "Artemis: Structured Visual Reasoning for Perception Policy Learning".☆14Dec 4, 2025Updated 2 months ago
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency☆60Jun 6, 2025Updated 8 months ago
- [CVPR 2023] Cascade Evidential Learning for Open-world Weakly-supervised Temporal Action Localization☆12Jul 9, 2024Updated last year
- Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer☆28Nov 4, 2025Updated 3 months ago
- Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)☆12Jun 1, 2023Updated 2 years ago
- ☆11May 7, 2022Updated 3 years ago
- Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning☆41Updated this week
- The official repo of the paper "Cal-SFDA: Source-Free Domain-adaptive Semantic Segmentation with Differentiable Expected Calibration Erro…☆10Oct 29, 2023Updated 2 years ago
- FieldGen is a semi-automatic data generation framework that enables scalable collection of diverse, high-quality real-world manipulation …☆25Oct 28, 2025Updated 4 months ago
- ☆11Jul 31, 2022Updated 3 years ago
- Official code of Efficient Depth-Guided Urban View Synthesis☆14Dec 24, 2024Updated last year
- PyTorch implementation of YOLOv3☆11Aug 30, 2018Updated 7 years ago
- Assignments from 16-825 Learning for 3D Vision at Carnegie Mellon University☆13Apr 5, 2023Updated 2 years ago
- Visual Reaction: Learning to Play Catch with Your Drone☆13Jul 23, 2023Updated 2 years ago
- This repository contains the Adverbs in Recipes (AIR) dataset and the code for the CVPR 2023 paper: "Learning Action Changes by Me…☆13May 25, 2023Updated 2 years ago