dannyXSC / Fudan_FreshmanTest
Fudan University graduate student orientation test
☆18 Updated last week
Alternatives and similar repositories for Fudan_FreshmanTest
Users who are interested in Fudan_FreshmanTest are comparing it to the repositories listed below.
- [ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs ☆58 Updated 6 months ago
- Collections of Papers and Projects for Multimodal Reasoning. ☆105 Updated 4 months ago
- A python script for downloading huggingface datasets and models. ☆19 Updated 4 months ago
- A paper list for spatial reasoning ☆134 Updated 2 months ago
- Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models ☆33 Updated 2 weeks ago
- [arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence ☆47 Updated 3 weeks ago
- 🔥 CVPR 2025 Multimodal Large Language Models Paper List ☆152 Updated 5 months ago
- Official repo and evaluation implementation of VSI-Bench ☆583 Updated 3 weeks ago
- Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks ☆163 Updated 3 months ago
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models" ☆245 Updated 8 months ago
- Accepted by CVPR 2024 ☆37 Updated last year
- OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding ☆59 Updated last month
- ☆85 Updated last month
- TStar is a unified temporal search framework for long-form video question answering ☆63 Updated last week
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆74 Updated last month
- Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics" ☆138 Updated last month
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆76 Updated last month
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆187 Updated 3 months ago
- A framework for unified personalized model, achieving mutual enhancement between personalized understanding and generation. Demonstrating… ☆121 Updated 3 weeks ago
- Official implementation of MC-LLaVA. ☆139 Updated 2 weeks ago
- [CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding". ☆154 Updated 3 months ago
- A tiny paper rating web ☆39 Updated 5 months ago
- Official PyTorch Code of ReKV (ICLR'25) ☆42 Updated 5 months ago
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆328 Updated 2 months ago
- [LLaVA-Video-R1] ✨ First Adaptation of R1 to LLaVA-Video (2025-03-18) ☆30 Updated 3 months ago
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models". ☆301 Updated 3 months ago
- Code for "Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers" (NeurIPS 2024) ☆193 Updated 5 months ago
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness ☆53 Updated last month
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ☆322 Updated 5 months ago
- ⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning. ☆197 Updated last month