AI4Phys / SeePhys
Official implementation for the paper "SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning"
☆39Updated last month
Alternatives and similar repositories for SeePhys
Users who are interested in SeePhys are comparing it to the repositories listed below
- AI2-THOR Data Collection Tool Based On Keyboard Interaction☆53Updated last year
- [ICML 2025] Official repository for paper "Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation"☆175Updated 3 months ago
- ☆70Updated 8 months ago
- A Gaussian dense reward framework for GUI grounding training☆223Updated 3 weeks ago
- Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models☆178Updated 10 months ago
- [ICCV 2025] Boosting MLLM Reasoning with Text-Debiased Hint-GRPO☆33Updated 2 months ago
- Multi-granularity Correspondence Learning from Long-term Noisy Videos [ICLR 2024, Oral]☆117Updated last year
- (NeurIPS 2024) Official PyTorch implementation of LOVA3☆90Updated 5 months ago
- [ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache☆42Updated last year
- [NAACL 2025 Oral] 🎉 From redundancy to relevance: Enhancing explainability in multimodal large language models☆115Updated 7 months ago
- [MM'24 Oral] Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval☆130Updated last year
- A comprehensive collection of resources focused on addressing and understanding hallucination phenomena in MLLMs.☆34Updated last year
- ✨✨Latest advancements in VLA (Vision-Language-Action) models☆82Updated 5 months ago
- Image and video Tokenizer/VAE selection guide, text and face reconstruction evaluation.☆124Updated 3 months ago
- [NeurIPS'24] Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation☆62Updated 9 months ago
- (ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator☆113Updated 5 months ago
- 🚀 [NeurIPS24] Make Vision Matter in Visual-Question-Answering (VQA)! Introducing NaturalBench, a vision-centric VQA benchmark (NeurIPS'2…☆85Updated 2 months ago
- [ICCV 2025] Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer☆127Updated last month
- Domain-Controlled Prompt Learning (AAAI2024)☆89Updated 9 months ago
- Chain-of-Spot: Interactive Reasoning Improves Large Vision-language Models☆96Updated last year
- ☆27Updated last month
- Open source code for Paper: Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions☆62Updated last month
- [NAACL 2025] SIUO: Cross-Modality Safety Alignment☆113Updated 7 months ago
- Hybrid Latent Reasoning via Reinforcement Learning☆152Updated 3 months ago
- CoS: Chain-of-Shot Prompting for Long Video Understanding☆50Updated 7 months ago
- This repository contains the core implementation of our ICML 2025 paper: "Token Signature: Predicting Chain-of-Thought Gains with Token D…☆41Updated last month
- Official Code of "GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering"☆112Updated 11 months ago
- Domain Prompt Learning with Quaternion Networks (CVPR2024 Highlight)☆79Updated 8 months ago
- WorldGPT: Empowering LLM as Multimodal World Model☆119Updated last year
- [CVPR 2024] Interactive continual learning: Fast and slow thinking☆102Updated last year