fudan-zvg / S-Agents
Official repository of S-Agents: Self-organizing Agents in Open-ended Environment
☆21 Updated last year
Alternatives and similar repositories for S-Agents:
Users who are interested in S-Agents are comparing it to the repositories listed below.
- ☆37 Updated 2 months ago
- [NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking" ☆33 Updated 4 months ago
- [NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding ☆68 Updated last week
- Official Code for ACL 2023 Outstanding Paper: World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Languag… ☆30 Updated last year
- Official implementation of "Self-Improving Video Generation" ☆61 Updated 2 weeks ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆25 Updated 5 months ago
- ☆33 Updated last year
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆29 Updated 8 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆59 Updated 8 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆35 Updated 9 months ago
- ☆38 Updated last year
- ☆66 Updated 2 months ago
- Official PyTorch Implementation for Task Vectors are Cross-Modal ☆22 Updated 3 months ago
- ☆37 Updated 2 weeks ago
- ☆25 Updated last month
- [NeurIPSw'24] This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simu… ☆83 Updated last month
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression ☆52 Updated last month
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆80 Updated last month
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆74 Updated 6 months ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… ☆68 Updated 3 months ago
- [ECCV 2024] STEVE in Minecraft is for See and Think: Embodied Agent in Virtual Environment ☆36 Updated last year
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks ☆59 Updated 5 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆41 Updated 2 weeks ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆33 Updated 8 months ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models ☆28 Updated 4 months ago
- ElasticTok: Adaptive Tokenization for Image and Video ☆60 Updated 4 months ago