fudan-zvg / S-Agents
Official repository of S-Agents: Self-organizing Agents in Open-ended Environment
☆26 Updated last year
Alternatives and similar repositories for S-Agents
Users who are interested in S-Agents are comparing it to the repositories listed below.
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆29 Updated 9 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆43 Updated 2 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆34 Updated 4 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆35 Updated 10 months ago
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆43 Updated 3 months ago
- ☆41 Updated 4 months ago
- Official implementation of WebVLN: Vision-and-Language Navigation on Websites ☆28 Updated last year
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆25 Updated 7 months ago
- ☆33 Updated last year
- This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR2025] ☆65 Updated 3 weeks ago
- [NeurIPSw'24] This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simu… ☆88 Updated 3 months ago
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ☆22 Updated last week
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆42 Updated 10 months ago
- ☆79 Updated last month
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆34 Updated 10 months ago
- Code for “Pretrained Language Models as Visual Planners for Human Assistance” ☆61 Updated last year
- ✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning ☆63 Updated last week
- Recursive Visual Programming (ECCV 2024) ☆17 Updated 5 months ago
- Multimodal RewardBench ☆39 Updated 2 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆36 Updated last year
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression ☆55 Updated 2 months ago
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆133 Updated 5 months ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… ☆76 Updated 5 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆59 Updated 2 months ago
- Official repo for StableLLAVA ☆95 Updated last year
- [NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking" ☆35 Updated 5 months ago
- A big_vision inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆35 Updated 10 months ago
- ☆51 Updated last year
- Code for "Interactive Task Planning with Language Models" ☆28 Updated 2 weeks ago
- Language Repository for Long Video Understanding ☆31 Updated 10 months ago