dannyXSC / Fudan_FreshmanTest
Fudan University graduate student orientation test
☆14Updated last year
Alternatives and similar repositories for Fudan_FreshmanTest
Users who are interested in Fudan_FreshmanTest are comparing it to the repositories listed below
- SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation☆175Updated 2 weeks ago
- [arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence☆42Updated last week
- Official repo and evaluation implementation of VSI-Bench☆541Updated last week
- A paper list for spatial reasoning☆119Updated last month
- [ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs☆56Updated 4 months ago
- Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models☆30Updated last week
- Official repo of VLABench, a large scale benchmark designed for fairly evaluating VLA, Embodied Agent, and VLMs.☆253Updated 2 weeks ago
- Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks☆144Updated last month
- Accepted by CVPR 2024☆35Updated last year
- [CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".☆133Updated last month
- Collections of Papers and Projects for Multimodal Reasoning.☆105Updated 2 months ago
- A python script for downloading huggingface datasets and models.☆19Updated 3 months ago
- Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"☆96Updated last week
- It's not a list of papers, but a list of paper reading lists...☆207Updated 2 months ago
- Fetch citations and abstracts of a Google Scholar paper and generate prompt for LLM☆24Updated 7 months ago
- Embodied Question Answering (EQA) benchmark and method (ICCV 2025)☆27Updated 2 weeks ago
- ☆44Updated 3 months ago
- Latest Advances on Embodied Multimodal LLMs (or Vision-Language-Action Models).☆116Updated last year
- Code for "Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers" (NeurIPS 2024)☆173Updated 3 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …☆156Updated 2 months ago
- ☆69Updated 2 weeks ago
- ☆53Updated 3 weeks ago
- ⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning.☆173Updated last month
- HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model☆254Updated last month
- [CVPR 2024] The code for paper 'Towards Learning a Generalist Model for Embodied Navigation'☆192Updated last year
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"☆215Updated 7 months ago
- 😎 up-to-date & curated list of awesome 3D Visual Grounding papers, methods & resources.☆191Updated 2 weeks ago
- A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and A…☆207Updated this week
- [ICML 2024] Official code repository for 3D embodied generalist agent LEO☆446Updated 2 months ago
- A vue-based project page template for academic papers. (in development) https://junyaohu.github.io/academic-project-page-template-vue☆274Updated last week