fudan-zvg / S-Agents
Official repository of S-Agents: Self-organizing Agents in Open-ended Environment
☆21 Updated 11 months ago
Alternatives and similar repositories for S-Agents:
Users who are interested in S-Agents are comparing it to the libraries listed below.
- [NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking" ☆33 Updated 3 months ago
- ☆33 Updated last month
- Official implementation of "Self-Improving Video Generation" ☆59 Updated last month
- ☆33 Updated last year
- [NeurIPSw'24] This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simu… ☆80 Updated 3 weeks ago
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆36 Updated 2 weeks ago
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆27 Updated last year
- ☆26 Updated 7 months ago
- ☆68 Updated 7 months ago
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks ☆59 Updated 4 months ago
- [NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding ☆62 Updated last month
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆73 Updated 3 weeks ago
- This repository is a collection of research papers on World Models. ☆37 Updated last year
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression ☆51 Updated this week
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆35 Updated 7 months ago
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" ☆83 Updated 4 months ago
- ☆15 Updated 3 months ago
- A big_vision inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆34 Updated 7 months ago
- Learning to Identify Critical States for Reinforcement Learning from Videos (Accepted to ICCV'23) ☆26 Updated last year
- (NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights ☆24 Updated 3 months ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆119 Updated 5 months ago
- ☆23 Updated last week
- ElasticTok: Adaptive Tokenization for Image and Video ☆52 Updated 3 months ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆85 Updated 5 months ago
- Official Code for ACL 2023 Outstanding Paper: World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Languag… ☆30 Updated last year
- Official repository of paper "Subobject-level Image Tokenization" ☆65 Updated 9 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆24 Updated 4 months ago
- ☆50 Updated 4 months ago
- ☆37 Updated last year
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆59 Updated 7 months ago